I’m Cristiano Ferreira, one of the developer relations engineers at Oculus. I work with game developers to help their games run as efficiently as possible, and with this article I will help you do the same! This article is part 2 of a series on RenderDoc. If you’re starting from square one, I recommend taking a look at the first article:
How to Optimize your Oculus Quest App w/ RenderDoc: Getting Started + Frame Capture.
Now that we know how to use RenderDoc, let’s cover what the Oculus Quest hardware and software stack can offer you.
Why and how to use Fixed Foveated Rendering
Fixed Foveated Rendering (FFR) is a graphics feature we support in our OS that can save a substantial amount of time on pixel-fill-bound workloads. It works by rendering coarse chunks of pixels out towards the periphery of your eye buffers, using only a single pixel shader invocation per chunk. These coarse pixels largely go unnoticed when players are looking straight ahead, but they can become noticeable in certain conditions depending on your foveation level (how aggressively the peripheral pixel blocks are coarsened). These situations include:
High contrast textures that land within the foveation areas
Hard lines on textures that land within the foveation areas
Sharp text that extends into foveation areas
We support 5 foveation levels out of the box, and the level can be updated each frame with no overhead. This quick toggling makes it ideal to ramp up or down depending on the situation in your game. For instance, if you know a GPU performance spike is coming (for example, the player is about to explode a bunch of barrels), you can preemptively ramp up the FFR level on the same frame that the collision check triggering the explosion completes. This can be paired with the programmable CPU / GPU frequencies discussed in the next section in a single begin / end function pair to create a ‘performance turbo mode’.
The 5 foveation levels (mapped to enums) are:
0 - off
1 - on (low)
2 - on (medium)
3 - on (high)
4 - on (high) - same number of coarse pixels as level 3, but pushes the coarse map toward the top of the frame, for cases where there are lots of hand interactions and so that road / floor textures don’t show as many coarse pixels
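To make the level mapping concrete, here is a minimal Python sketch of the five levels as an enum, plus a small policy function for picking one. This is not the Oculus SDK: the names `FoveationLevel`, `HIGH_TOP` and `choose_foveation` are made up for illustration, and the policy itself is just one plausible way to act on the trade-offs described above.

```python
from enum import IntEnum


class FoveationLevel(IntEnum):
    """The five levels listed above, mapped to integers 0-4 (names are illustrative)."""
    OFF = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    HIGH_TOP = 4  # same coarse pixel count as HIGH, coarse map pushed to the top


def choose_foveation(gpu_spike_expected: bool, sharp_text_on_screen: bool) -> FoveationLevel:
    """Hypothetical policy: ramp up for known spikes, back off when artifacts would show."""
    if sharp_text_on_screen:
        # Sharp text reaching the periphery is one of the cases where FFR is noticeable.
        return FoveationLevel.LOW
    if gpu_spike_expected:
        return FoveationLevel.HIGH
    return FoveationLevel.MEDIUM
```

In a real project the returned value would be handed to whatever your engine exposes for FFR, such as the Unity property or Unreal Blueprint node listed in the references.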
FFR References
Unity: set the OVRManager.tiledMultiResLevel property
Unreal: Input/Oculus/Get/SetTiledMultiresLevel in Blueprints
Programmable CPU/GPU Frequencies
The Oculus Quest OS will dynamically throttle or increase CPU/GPU frequencies to save battery life or to help pump out frames faster, depending on performance heuristics. Though the OS does a great job of using past frame timings to determine the appropriate energy usage level, nobody knows your game/application better than you. Because of this, we allow developers to set the floor CPU/GPU frequencies for the device on a frame-by-frame basis with no overhead. In the exploding barrels situation described previously, you can ramp the CPU and GPU frequencies up to get ahead of any performance spikes. Exploding barrels may involve extra physics, object activations and draw calls that spike the CPU requirements, plus extra particle billboards that require extra GPU power for that frame. In other cases, maybe you just need extra CPU power while the GPU is fine, or vice versa. You can set each level independently. The API works very similarly to setting the foveation level for Fixed Foveated Rendering, with mapped enums:
CPU and/or GPU level 0: no floor for CPU/GPU frequency
CPU and/or GPU level 1: set floor to low for CPU/GPU frequency
CPU and/or GPU level 2: set floor to medium for CPU/GPU frequency
CPU and/or GPU level 3: set floor to high for CPU/GPU frequency
Keep in mind that it’s important to have a “take what you need” philosophy when choosing your floor level. Battery life must be considered so that players can play your game as long as possible before needing to recharge their device. Forcing the maximum CPU and GPU levels full time may also lead to the device overheating.
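The begin / end ‘performance turbo mode’ idea can be sketched as a Python context manager that raises the FFR level and the CPU/GPU floors for a known spike and then restores whatever was set before. Everything here is a stand-in: `DeviceState` and `turbo_mode` are hypothetical names, and a real implementation would call your engine’s FFR and CPU/GPU level APIs instead of writing to a dataclass.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class DeviceState:
    """Hypothetical stand-in for the per-frame settings a real engine would expose."""
    ffr_level: int = 0   # 0 (off) .. 4
    cpu_level: int = 0   # 0 (no floor) .. 3 (high floor)
    gpu_level: int = 0   # 0 (no floor) .. 3 (high floor)


@contextmanager
def turbo_mode(state: DeviceState, ffr: int = 3, cpu: int = 3, gpu: int = 3):
    """Raise FFR and clock floors for a known spike, then restore the previous values."""
    previous = (state.ffr_level, state.cpu_level, state.gpu_level)
    state.ffr_level, state.cpu_level, state.gpu_level = ffr, cpu, gpu
    try:
        yield state
    finally:
        # "Take what you need": always drop back so battery and thermals can recover.
        state.ffr_level, state.cpu_level, state.gpu_level = previous
```

Wrapping only the frames that contain the spike (for example, `with turbo_mode(state, cpu=3, gpu=2): ...`) keeps the elevated levels scoped to the moment they are needed, and the `finally` clause makes sure the earlier settings come back even if the wrapped code raises.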
Programmable CPU/GPU frequencies: References
What is Multisample Anti-Aliasing (MSAA)
MSAA can take your game’s visuals from good to great. There is extra overhead for 2x/4x MSAA, but on mobile the relative computational cost is nowhere near as substantial as on PC. Check out the analysis document linked in references. It’s almost always preferable to use some level of MSAA rather than rendering at native resolution (render scale) without it. Text also looks much crisper when rendering with MSAA enabled.
MSAA References
Tile Based Deferred Rendering vs. Immediate Mode Rendering
Most of the VR developers I have worked with have lots of experience developing for PC or discrete GPUs (Immediate Mode Rendering) but haven’t had as much experience on Mobile GPUs (Tile-based Rendering). The main difference between the two is that Mobile GPUs are optimized for bandwidth (minimizing the amount of external memory accesses the GPU needs during fragment shading) to keep power usage at a minimum for longer term use. To make this happen, geometry is all projected up front and assigned to a tile (a small subsection of the frame buffer) before any shading is started. After all geometry is processed, each tile is shaded and written to external memory one by one. This makes individual draws much cheaper at the cost of a ‘binning’ phase. Here is a quick simplified rundown on how each type of rendering works on the GPU once draw calls are submitted:
Immediate Mode Rendering
For each draw call:
    For each primitive in draw call:
        Execute vertex shader for each vertex.
        Execute fragment shader for each fragment covered by projected primitive.
Tile-Based Rendering
Pass 1: For each draw in render pass:
    For each primitive in draw call:
        Execute vertex shader for each vertex.
        If primitive is not culled, append to the tile list it is associated with.
Pass 2: For each tile in the render pass (note that all geometry has been projected at this phase in a single pass game/application):
    For each primitive in the tile:
        For each fragment covered by the primitive:
            Execute fragment shader for fragment.
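The two tile-based passes can be modelled in a few lines of Python. This is a toy CPU-side sketch, not how any particular GPU implements binning: the 32-pixel tile size is an arbitrary illustrative value, triangles are given as three screen-space `(x, y)` points, and binning uses the triangle’s bounding box, which is conservative (a triangle may be listed in a tile it does not actually cover).

```python
from collections import defaultdict

TILE = 32  # tile edge in pixels; an illustrative value, real tile sizes vary by GPU


def bin_triangles(triangles, width, height, tile=TILE):
    """Pass 1: assign each triangle to every tile its screen-space bounding box touches."""
    bins = defaultdict(list)
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Cull primitives that fall entirely outside the render target.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= width or min(ys) >= height:
            continue
        x0 = max(int(min(xs)), 0) // tile
        x1 = min(int(max(xs)), width - 1) // tile
        y0 = max(int(min(ys)), 0) // tile
        y1 = min(int(max(ys)), height - 1) // tile
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins[(tx, ty)].append(index)
    return bins


def shade_tiles(bins):
    """Pass 2: visit tiles one at a time; each tile is written out once when finished."""
    writes = []
    for tile_id in sorted(bins):
        # A real GPU would run the fragment shader here, in on-chip tile memory.
        writes.append((tile_id, list(bins[tile_id])))
    return writes
```

The point of the model is the ordering: no tile is shaded until every primitive has been binned, and each tile produces exactly one write to external memory no matter how many primitives landed in it.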
Implications of Tile-Based Rendering
External memory accesses are very slow, so you should be wary of any operation that might trigger them. Think of a typically implemented post-effect, as follows:
Render base pass to a temporary buffer.
Switch render target from temporary buffer to swapchain texture.
A swapchain texture is the render target that is eventually presented to the display (as opposed to a temporary buffer which may be created with different parameters).
Bind TempBuffer as an input texture for post-effect to use as a resource.
Execute post-effect using input resource onto swapchain texture.
In this scenario, the GPU is writing the input texture from tile memory to external memory to be used as an input resource. This is also called a resolve. At the Quest native headset resolution, resolving the texture can take 1 to 1.5 ms just to write to external memory (note: this does not include the additional cycles required to complete the post-effect, which touches every pixel on the high-resolution eye buffers). To render at 72 frames per second (required by Oculus Quest VRC guidelines), you’re given a frame budget of about 13.9 ms/frame (1000 milliseconds / 72 frames per second), so this operation alone can take roughly 7 to 11 percent of your total frame budget. Additionally, when you don’t render directly to the swapchain texture, you won’t receive the visual benefits of MSAA or the performance benefits of FFR, as those features only apply to the swapchain texture, not temporary buffers.
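The budget arithmetic in the paragraph above is easy to check for yourself. The short Python sketch below computes the per-frame budget at 72 Hz and the share of it that a 1 to 1.5 ms resolve consumes; the function names are illustrative, and the resolve costs are simply the figures quoted in the text.

```python
def frame_budget_ms(refresh_hz: float) -> float:
    """Time available per frame, in milliseconds."""
    return 1000.0 / refresh_hz


def budget_fraction(cost_ms: float, refresh_hz: float = 72.0) -> float:
    """Share of the frame budget consumed by a fixed cost."""
    return cost_ms / frame_budget_ms(refresh_hz)


budget = frame_budget_ms(72.0)   # about 13.89 ms
low = budget_fraction(1.0)       # 0.072, about 7% of the frame
high = budget_fraction(1.5)      # 0.108, about 11% of the frame
```

The same two functions can be reused to weigh any other fixed per-frame cost against the budget.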
Tile-Based Rendering References
Conclusion
If you haven’t already, be sure to check out the other articles in this series on RenderDoc:
Thanks for reading.
- Cristiano Ferreira