Mobile VR has used OpenGL ES as its graphics API since its infancy, but nowadays most engines are pivoting toward lower-level APIs such as Vulkan and DX12, giving developers more flexibility and lower rendering overhead. This post explains some of the differences between GL and Vulkan in the VR-oriented features that are necessary for high-performance VR games. It is a fairly in-depth graphics article that requires basic knowledge of GL and Vulkan for full comprehension.
Many of the features explained below require the recently released version 7 of the Quest operating system, which includes changes to both the Adreno graphics driver and the Oculus runtime.
MSAA on mobile chipsets, unlike on PC GPUs, is done by executing the MSAA operations only on the tiler and then resolving (averaging all subsamples into the final pixel) when the tile is done, right before storing its result into main memory. This allows us to run MSAA framebuffers at close to the same speed as non-MSAA ones, since only the resolved, single-sample data crosses the bus from tile memory to main memory. This is extremely important in VR, where the very low number of pixels per eye-degree introduces noticeable edge aliasing.
In GLES, this is done by rendering into a non-MSAA texture using MSAA framebuffers initialized with extension functions such as glFramebufferTexture2DMultisampleEXT (from EXT_multisampled_render_to_texture).
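A minimal sketch of this implicit setup, assuming a GLES context where EXT_multisampled_render_to_texture is available and its function pointer has been queried (error checking omitted):

```c
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>

/* Attach a plain, non-MSAA RGBA8 texture to the framebuffer while asking
 * for 4x multisampling: the extra samples live only in tile memory and
 * are resolved automatically when each tile is stored. */
void setup_implicit_msaa_fbo(GLuint fbo, GLuint colorTex)
{
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2DMultisampleEXT(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                                         GL_TEXTURE_2D, colorTex,
                                         0 /* level */, 4 /* samples */);
}
```

Note that nothing about `colorTex` itself says "MSAA", which is exactly where the confusion below comes from.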
Although this works, its implicit nature has prompted plenty of questions, such as "I don't think I have MSAA, my texture is just RGBA8888, how do I enable MSAA?"
In Vulkan, everything is explicit. Each subpass is defined by a VkSubpassDescription structure containing pColorAttachments, pDepthStencilAttachment and pResolveAttachments. A good Vulkan application should set up MSAA by attaching a 4x (if 4x is chosen) MSAA buffer for both color and depth in the color/depth attachments, and a non-MSAA color buffer in pResolveAttachments (likely the one provided by VRAPI that you intend to show to the user). The application should also set the storeOp of both the color and depth attachments to VK_ATTACHMENT_STORE_OP_DONT_CARE. This tells the driver "I will not require this color or depth later; please do not store it out of tile memory, discard it instead". It is the well-designed equivalent of the GL invalidate calls we currently scatter throughout the GL pipeline for the depth buffer, but this time without trying to race the implicit flushes to invalidate before the color starts being stored. For bonus points, the MSAA color and depth buffers should be created as transient (VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT) and their memory lazily allocated (VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT), as those buffers won't ever actually see memory; the only thing that will is the color resolve attachment in pResolveAttachments. See below for an example of a good pipeline state with MS Arrays in color0/depth and non-MS in Resolve1.
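The attachment and subpass setup described above can be sketched as follows (a minimal fragment assuming 4x MSAA and illustrative formats; layouts, subpass dependencies and the rest of the renderpass creation are omitted):

```c
/* 0: 4x MSAA color, 1: 4x MSAA depth, 2: single-sample resolve target */
VkAttachmentDescription attachments[3] = {0};

attachments[0].format  = VK_FORMAT_R8G8B8A8_UNORM;
attachments[0].samples = VK_SAMPLE_COUNT_4_BIT;
attachments[0].loadOp  = VK_ATTACHMENT_LOAD_OP_CLEAR;
attachments[0].storeOp = VK_ATTACHMENT_STORE_OP_DONT_CARE; /* stays in tile memory */

attachments[1].format  = VK_FORMAT_D24_UNORM_S8_UINT;
attachments[1].samples = VK_SAMPLE_COUNT_4_BIT;
attachments[1].loadOp  = VK_ATTACHMENT_LOAD_OP_CLEAR;
attachments[1].storeOp = VK_ATTACHMENT_STORE_OP_DONT_CARE; /* depth never leaves the tile */

attachments[2].format  = VK_FORMAT_R8G8B8A8_UNORM;          /* VRAPI swapchain image */
attachments[2].samples = VK_SAMPLE_COUNT_1_BIT;
attachments[2].loadOp  = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
attachments[2].storeOp = VK_ATTACHMENT_STORE_OP_STORE;      /* only this one hits memory */

VkAttachmentReference colorRef   = { 0, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL };
VkAttachmentReference depthRef   = { 1, VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL };
VkAttachmentReference resolveRef = { 2, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL };

VkSubpassDescription subpass = {0};
subpass.pipelineBindPoint       = VK_PIPELINE_BIND_POINT_GRAPHICS;
subpass.colorAttachmentCount    = 1;
subpass.pColorAttachments       = &colorRef;
subpass.pResolveAttachments     = &resolveRef; /* resolved during tile store */
subpass.pDepthStencilAttachment = &depthRef;
```

The MSAA images referenced by attachments 0 and 1 would be created with VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT and backed by VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT memory, as described above.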
It is important to use this path (MSAA handling through the renderpass/subpass system) rather than the vkCmdResolveImage path. vkCmdResolveImage is a PC-focused API that doesn't make use of the tiler's ability to resolve during store, so an engine doing MSAA through vkCmdResolveImage commands would store the 4x MSAA data to memory, load it back from memory into the tiler, then resolve and write the non-MSAA result back to memory. This could easily add 3ms of GPU time.
Multiview is an extension allowing the GPU driver to execute each submitted draw call N times, on N different slices of a texture array. It is used in VR to draw the left and right eyes onto a 2-layer texture array with a single draw call. It is supported in Vulkan via the VK_KHR_multiview extension. It requires the color, depth and resolve images (and potentially FFR, as explained later) to be 2D arrays instead of 2D textures, and requires adding the VkRenderPassMultiviewCreateInfo structure to the renderpass you want to execute with multiview. It correctly supports multiple subpasses (as tested on UE4.23). The RenderDoc pipeline state picture (same as the one above) shows a multiview capture with 2 views (one for the left eye, one for the right).
VkRenderPassMultiviewCreateInfo has a pViewMasks member that takes, for each subpass, a bitmask of the views to render to. A 2-view system requires bit 0 and bit 1 to be set, for a viewMask value of 0b11.
Our runtime natively supports texture arrays as input textures for timewarp compositing. Native developers should take advantage of this by creating their swapchain textures with the VRAPI_TEXTURE_TYPE_2D_ARRAY enum, rather than rendering into multiview buffers and then manually copying them into non-multiview images to send to VRAPI (which is computationally expensive!)
From a native developer's standpoint, fixed foveated rendering (FFR) is extremely different between GL and Vulkan due to the large architectural differences between the two FFR APIs. Conceptually, FFR is a rendering setting that should apply to framebuffers, as it modifies how the GPU computes the frame without affecting the textures in any way (all texels in both the color and depth textures are filled no matter what).
The original FFR extension from QCOM was actually supposed to be applied to the framebuffer through a glFramebufferFoveationConfigQCOM function, which, to be fair, is where it should live in the API. However, for the Oculus Go launch, we wanted to introduce this feature without requiring deep engine/application changes. More importantly, we wanted our runtime to control the application's foveation settings, so that we didn't require individual developers and engines to manage FFR settings, and so that settings would be homogeneous throughout our platform. We therefore asked QCOM to store the FFR metadata in the color textures (which we control, since the runtime allocates the textures, but not the framebuffers), and thus was born glTextureFoveationParametersQCOM, whose use is pretty much hidden in the runtime from the developer's standpoint. Developers request an FFR level (off/low/medium/high) and we configure the lower-level details for them automatically.
This solution, although it has served us well on Go and Quest, has multiple challenges. First, since it is driven by VRAPI modifying the color texture metadata, every pre-pass (that doesn't render into the color texture) is non-foveated. This is fine for now, as most of our applications are single-pass, but that won't hold forever as developers start to enable (for example) HDR pipelines. Second, the QCOM functions have a hardcoded foveation function that doesn't let us select the resolution curves we really want for our field of view and lenses.
Vulkan fixes this completely by introducing the VK_EXT_fragment_density_map extension. Instead of hardcoded equations, application renderpasses that want foveated rendering add another image attachment, which is a read attachment rather than a write attachment. That attachment is an R8G8 pixel density image whose resolution is 1/32 of your normal attachments in each dimension (a 1216x1344 renderpass will have a 38x42 fragment density attachment), and it drives the resolution at which each area of the framebuffer is rendered.
See below for fragment density image example. The yellow area is our 1:1 high resolution area, with the pixel density decreasing as we go further from the center of the screen.
Although the image above has a foveated shape, this texture can contain any pattern you want. For example, in an open-world game it is entirely possible to decrease the sky's effective resolution if the developer is less interested in spending GPU time there. This would be completely impossible with the equation-based GL extension. The texture can also be bound to any renderpass, so a developer with a 2-pass HDR-LDR renderer can simply bind the texture in both renderpasses and get FFR on both.

Given that we still want VrDriver to control the foveation curves, but the renderpass is solely controlled by applications, we ended up with a hybrid approach. When the developer creates a swapchain to get the color textures from the Oculus runtime in Vulkan, under the hood it creates another swapchain: the foveation swapchain, where index 0 of the color swapchain is matched to index 0 of the foveation swapchain, and so on. A developer can request the foveation swapchain images through the new vrapi_GetTextureSwapChainBufferFoveationVulkan function and, without modifying those images in any way, should simply bind them to the renderpasses on which FFR should be enabled, at the very least the renderpass that renders into the normal VRAPI-allocated color swapchain. When the developer uses the normal FFR control API, our runtime will then modify the foveation swapchain images under the hood to generate a foveation image whose strength corresponds to the requested setting. The image above is what our runtime generates for a foveation setting of High. Without any further changes in the engine, the image will automatically switch foveation curves, and the next renderpass execution will use the new FFR settings, exactly as if the developer were using OpenGL ES's FFR extension.
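The flow described above can be sketched as follows; treat this as illustrative, since the exact vrapi_GetTextureSwapChainBufferFoveationVulkan signature may vary by Mobile SDK version, and `colorSwapChain` and `i` stand in for the application's own swapchain and frame index:

```c
/* Fetch the runtime-managed foveation image matched to color swapchain
 * index i; the runtime keeps its contents in sync with the FFR level
 * the application requested through the normal control API. */
int fdmWidth = 0, fdmHeight = 0;
VkImage foveationImage = vrapi_GetTextureSwapChainBufferFoveationVulkan(
    colorSwapChain, i, &fdmWidth, &fdmHeight);

/* Create a VkImageView over foveationImage, declare the attachment with
 * VK_IMAGE_USAGE_FRAGMENT_DENSITY_MAP_BIT_EXT usage, and reference it via
 * a VkRenderPassFragmentDensityMapCreateInfoEXT chained to the renderpass
 * create info of every pass that should be foveated. */
```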
One of the huge benefits of Vulkan is that it is a graphics API shared between PC and mobile. On top of that, it is a stateless API, meaning there is no underlying state the driver needs to carry between commands. This dramatically helps tools such as RenderDoc, which has excellent Vulkan support: it handles MSAA, multiview and VK_EXT_fragment_density_map, and works with Oculus Quest since RenderDoc 1.4. Fragment density map textures can be visualized in RenderDoc to make sure the foveation you're getting is the one you expect, and the frame is replayed with foveation on so you can see the real texture output you'll get in-HMD.
The 4.22.2, release 1.39.0 build on Oculus's UE4 GitHub fork is the first version of UE4 that we recommend for Vulkan development. It implements MSAA, multiview, FFR, sRGB-native rendering and multiple UE4 Vulkan bug fixes for issues we were experiencing on our platform. Note: to access these files on GitHub, you must be logged into a subscribed account; otherwise you will get a 404 error when accessing this link.
We're seeing pretty good performance gains with it: a development build of SunTemple with a fixed HMD camera goes from 16ms to 13ms on the render thread. On top of that, loading speed is significantly improved because shaders ship precompiled as SPIR-V instead of being compiled from GLSL at load time.
Beyond raw speed, Vulkan's improved design and inherent multithreading let us ship features that GLES couldn't, such as in-engine timer queries every frame. The GPU timer data in 'stat unit' now works and is driven by engine-generated timer queries instead of averaged-out data carried back from the runtime, and 'stat gpu' gives you per-renderpass information. Vulkan also lets us start thinking about future features like subpass-based HDR, multithreaded shader loading, simple tonemapping without resolving the base pass into RAM and back into the tiler, and so on.
Although we have found this release, combined with the latest Quest OS, to be stable on Vulkan, it is still early days for UE4-Vulkan-VR, so please let us know if you encounter any issues (bonus points if you include repro steps, RenderDoc captures, or UE4 projects!)
Everybody’s favorite VrCubeWorld_Vulkan sample in our Oculus Mobile samples directory now implements all of those features, so please take a look there for inspiration!
Hopefully this post either brought clarity to your current Vulkan graphics development efforts or, if you're a native or UE4 developer, motivated you to try Vulkan programming on Quest! It should make your graphics code clearer and faster, on top of allowing easier code sharing between PC and mobile platforms.
Check out the OC6 presentation below for more information about developing with Vulkan for mobile VR rendering. We are working to make it our preferred graphics API at Oculus, and we hope you’ll do the same.