Optimizing Oculus Go for Performance

Oculus Developer Blog
Posted by Remi Palandri, Samuel Gosselin, and Chris Pruett
March 22, 2018

When it launches, Oculus Go will offer remarkable performance at an affordable price point. To deliver this, we've innovated on a few technologies that developers need to understand in order to get the most out of them. This engineering-focused article presents and explains three of them (Fixed Foveated Rendering, Dynamic Throttling, and 72 Hz Mode) to help you build the best-looking, best-performing, and most fun apps out there! Hopefully, you'll also get a glimpse into the work we do at Oculus and how we approach solving VR problems!

Fixed Foveated Rendering

We'll start with Fixed Foveated Rendering. FFR is a technology that allows the edges of the eye texture to be rendered at a lower resolution than the center. The effect, which is nearly imperceptible, lowers the fidelity of the scene in the viewer's peripheral vision. Because fewer total pixels need to be shaded, FFR can produce a significant improvement in GPU fill performance. Using FFR, many apps can dramatically increase the resolution of the eye texture that they render to on Oculus Go, improving the final image. Complex fragment shaders also benefit from this form of multi-resolution rendering. Note that unlike some other forms of foveation technologies, Oculus Go's Fixed Foveation system is not based on eye tracking. The high-resolution pixels are “fixed” in the center of the eye texture.

The Problem with Distortion and Pixel Density
Unlike traditional 2D screens, VR devices require that the image displayed to the viewer be warped to match the curvature of the lenses in the headset. This distortion allows us to perceive a higher field of view than we would get just by looking at a raw display. The image below shows the effect of distortion: a 2D plane (the horizontal line) is warped into a spherical shape.

If you look closely, you will see that the pixels of our eye texture are very unevenly represented during this distortion! Many more pixels are needed to create the post-distortion areas at the edge of the field of view (the side blue cone) than the pixels needed to create the center of our FOV. This results in a higher pixel density at the edge of the FOV than in the middle. This is highly counterproductive: users mostly look towards the center of the screen. On top of that, lenses blur the edge of the field of view, so even though many pixels have been rendered in that part of the eye texture, the sharpness of the image is lost. The GPU spends a lot of time rendering pixels at the edge of the FOV that can't be clearly seen. This time could be better spent!

Saving pixel computational time
Fixed Foveated Rendering claws some of this wasted time back by lowering the resolution of the output image during its computation. On Oculus Go, this is implemented by controlling the resolution of individual render tiles on the GPU. Oculus Go uses a Qualcomm Snapdragon 821 SoC, which, like most mobile chipsets, is a tiled renderer. Rather than execute render commands sequentially like a desktop GPU would, a tiled renderer divides its work into rectangles (or tiles) and renders those in parallel. By controlling the resolution of individual tiles, and ensuring that the tiles that fall on the edges of the eye buffer are lower resolution than those in the center, we can reduce the number of pixels that the GPU needs to fill without perceptibly lowering the quality of the post-distortion image. This translates to a very significant improvement in GPU performance for applications that render a large number of pixels (e.g. to a very high-resolution eye buffer) or employ an expensive fragment shader (e.g. dynamic lighting and shadows).

The screenshot below shows the tile resolution multiplier map for a 1024x1024 eye buffer on Oculus Go with FFR cranked up to its maximum foveation level:

In the white areas at the center of our FOV, the resolution is native: every pixel of the texture will be computed independently by the GPU. However, in the red areas, only half of the pixels will be calculated, 1/4th for the green areas, 1/8th for the blue areas, and finally 1/16th for the magenta tiles. The missing pixels will be interpolated from the calculated pixels at resolve time, when the GPU stores the result of its computation in general memory.
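To get a feel for how much shading work such a map removes, here is a minimal sketch that averages the per-tile fractions listed above over a grid of equal-sized tiles. The 4x4 layout is a made-up example for illustration only and is not the real Oculus Go tile map; the function and variable names are likewise hypothetical.

```python
from fractions import Fraction

# Fraction of pixels actually shaded in each kind of tile,
# following the colour key described in the text.
SHADED_FRACTION = {
    "white": Fraction(1, 1),
    "red": Fraction(1, 2),
    "green": Fraction(1, 4),
    "blue": Fraction(1, 8),
    "magenta": Fraction(1, 16),
}


def shaded_pixel_fraction(tile_map):
    """Return the fraction of eye-buffer pixels the GPU shades,
    assuming every tile in the grid covers the same number of pixels."""
    tiles = [colour for row in tile_map for colour in row]
    return sum(SHADED_FRACTION[colour] for colour in tiles) / len(tiles)


# Hypothetical 4x4 layout (NOT the real device map): native resolution
# in the centre, coarser tiles toward the edges and corners.
EXAMPLE_MAP = [
    ["magenta", "green", "green", "magenta"],
    ["green", "white", "white", "green"],
    ["green", "white", "white", "green"],
    ["magenta", "green", "green", "magenta"],
]

fraction = shaded_pixel_fraction(EXAMPLE_MAP)
print(f"Pixels shaded: {float(fraction):.1%} of the eye buffer")
```

The result only counts shaded pixels. The actual GPU time saved also depends on how expensive the fragment shader is and on whether the app is fill-bound in the first place.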

Fixed foveated rendering isn't a one-size-fits-all solution. The perceptibility of artifacts caused by FFR varies depending on the content of the scene. Running FFR on The Well, for example, shows noticeable changes to some of the high-contrast text elements in the starting sacred pool scene. The image below shows a part of that scene rendered with four different FFR settings (left is no FFR, right is FFR with harshest settings), focusing on the area in the bottom right of the HMD screen:

Notice that low-frequency content like the Oracle's arm and her immediate environment doesn't suffer from FFR at all. However, high-frequency textures such as the text display severe pixelation artifacts as a result of rendering at a lower resolution. You can even notice individual tiles by looking at the word "Oracle," where the left side and the right side of the word are rendered on different tiles with different foveation settings. These artifacts are still pretty hard to see from inside the headset because they appear only in our peripheral vision and are blurred by the lenses. The API for Fixed Foveated Rendering allows developers to tune the FFR level (even from frame to frame) to minimize visible artifacts if necessary. In most cases it may be sufficient to simply turn FFR on for in-game scenes and turn it down or off in menu or UI screens.

Fixed Foveated Rendering can benefit UE4, Unity, and native applications. We've seen as much as a 25% gain on pixel-intensive apps. Note that apps with very simple shaders that are not bound on GPU fill will likely not see a significant improvement from Fixed Foveated Rendering.

Dynamic Clock Throttling and Mobile Power Management

All mobile devices must deal with heat and battery issues. Mobile chipsets are very efficient, but they still generate a lot of heat and draw a lot of battery power when crunching on hard problems. On a phone or tablet, it is normal for an overtaxed mobile CPU to simply slow itself down to avoid overheating. On a VR device, that generally leads to a loss of application frame rate, which can immediately make users uncomfortable. Heating up until performance decreases isn't a good solution for a VR headset.

On Gear VR we developed a Fixed Clocks Policy to help developers manage the overall thermal cost of their applications. Developers select a CPU and GPU “level” which allows them to throttle the clock speed of those chips up and down dynamically. This system has allowed developers to route power to either the CPU or the GPU depending on the needs of their scene, and clock the entire system down during idle periods or simple scenes. One of the earliest Gear VR games, Herobound, set the CPU and GPU clocks differently based on the needs of each individual level.

On Oculus Go, we've made the management of CPU and GPU level much simpler by making it almost entirely automatic. This feature is called Dynamic Throttling.

Oculus Go applications are compatible with Gear VR, and the basic power management API remains the same. The developer can set CPU and GPU levels between 0 and 3, which translate to different clock speeds at which the chips can run. However, these levels are now treated as a baseline, and the system can choose to dynamically clock the CPU and GPU up as necessary to maintain performance. The goal of this system is to retain advantages of Gear VR's Fixed Clock Policy while trying to mitigate its drawbacks.

Oculus Go can dissipate heat much more effectively than a phone. This allowed us to create a CPU and GPU level 4, unlocking significantly more performance than the Galaxy S7. We wanted all apps on the platform, not just those that were made for Oculus Go, to benefit from this improvement. Building on previous work, we developed a dynamic system that can move the throttling levels based on CPU and GPU utilization to maximize performance and minimize thermal increase and battery drain.

Writing a dynamic throttler is harder than it sounds. One of the key challenges was just coming up with a way to actually quantify processor utilization.

For a GPU, this is a pretty straightforward task. GPUs are highly parallel systems, but they can only work on one problem at a time. Either the GPU is working on a frame (computing pixels, transferring data around, and so on), or it is doing nothing. We ended up computing GPU utilization by simply dividing the number of ticks that the GPU is doing work by the number of ticks in the GPU's execution frequency. Our pipeline is parallelized (the CPU works on the next frame while the GPU renders the last frame), so it's easy to tell if the GPU is the bottleneck or not: either the GPU is fully utilized and is causing the CPU to wait, or it is not the bottleneck. When looking at throttling opportunities, GPU utilization below 90% therefore indicates that the application is not GPU-bound and we don't need to clock it up.
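The ratio described above is simple enough to sketch in a few lines. This is an illustration only, not the runtime's actual code: the function names and the idea of passing raw tick counts are assumptions, and the 90% threshold is the one quoted in the text.

```python
# Threshold quoted in the text: below this, the app is treated as not GPU-bound.
GPU_BOUND_THRESHOLD = 0.90


def gpu_utilization(busy_ticks, total_ticks):
    """Fraction of the sampling window during which the GPU was doing work."""
    if total_ticks <= 0:
        raise ValueError("total_ticks must be positive")
    return busy_ticks / total_ticks


def is_gpu_bound(busy_ticks, total_ticks, threshold=GPU_BOUND_THRESHOLD):
    """True when utilization is high enough that the GPU may be the bottleneck."""
    return gpu_utilization(busy_ticks, total_ticks) >= threshold


# Hypothetical sample: the GPU was busy for 425 of 500 ticks in the window.
print(gpu_utilization(425, 500), is_gpu_bound(425, 500))
```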

Computing utilization for the CPU, however, is a lot harder. Oculus Go's CPU is a multicore processor: one core can be working while another is sleeping, which makes it very difficult to tell if the CPU execution speed is the bottleneck of the application's overall performance. For example, check out the trace below of a heavy Oculus Go application.

There are lots of white areas, which indicate a core that is doing nothing, so it might seem like this application has plenty of free cycles. But looking closer, one of the cores (CPU 2) is nearly RenderThread-bound: there is less than one millisecond between the end of one execution of a render thread and the start of a new cycle on the next frame! Clocking down the CPU in this case would result in frame drops.

Looking at the raw utilization of a multi-core processor can provide some useful information, but not enough to base good CPU throttling decisions on. Fortunately we had another signal built into our runtime that helped us identify apps that are good targets for throttling: the missed frame counter. When rendering in VR, our API keeps track of every frame that is missed. An app with a low missed frame count is consistently making its target frame rate, and might be able to run at a lower clock speed. On the other hand, an app missing a lot of frames, when the GPU is clearly not the bottleneck, indicates that we should clock the CPU up even if some of the cores appear to be sleeping.
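Putting the two signals together, the decision logic might look roughly like the sketch below. This is a simplified illustration under stated assumptions, not the actual Oculus Go governor: the one-step-per-sample policy, the missed-frame threshold, and all names are hypothetical. Only the 90% GPU threshold, the level 4 ceiling, and the rule that developer-requested levels act as a floor come from the text.

```python
from dataclasses import dataclass

MAX_LEVEL = 4                # Oculus Go adds a level 4 (from the text)
GPU_BUSY_THRESHOLD = 0.90    # from the text
MISSED_FRAMES_THRESHOLD = 5  # hypothetical value, chosen for illustration


@dataclass
class Sample:
    gpu_utilization: float   # 0.0 to 1.0 over the sampling window
    missed_frames: int       # frames missed during the sampling window


def next_levels(baseline_cpu, baseline_gpu, cpu, gpu, sample):
    """Return the (cpu, gpu) levels to use for the next sampling window."""
    gpu_bound = sample.gpu_utilization >= GPU_BUSY_THRESHOLD
    missing_frames = sample.missed_frames >= MISSED_FRAMES_THRESHOLD
    if gpu_bound:
        # The GPU is saturated: give it more clock, up to the ceiling.
        gpu = min(gpu + 1, MAX_LEVEL)
    elif missing_frames:
        # Frames are missed while the GPU has headroom: suspect the CPU,
        # even if some cores look idle.
        cpu = min(cpu + 1, MAX_LEVEL)
    else:
        # Making frame rate with headroom: step back toward, but never
        # below, the levels the developer asked for.
        cpu = max(cpu - 1, baseline_cpu)
        gpu = max(gpu - 1, baseline_gpu)
    return cpu, gpu


# App requested CPU 0 / GPU 2; the GPU is saturated in this window.
print(next_levels(0, 2, 0, 2, Sample(gpu_utilization=0.95, missed_frames=0)))
```

A production governor would also need hysteresis so that it does not bounce between two levels on every sample; that is left out here to keep the sketch short.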

The governor on Oculus Go will never clock the CPU or GPU below the levels specified by the developer. But it will clock up to ensure a smoother experience when it detects that the system is under significant load. The chart below is a run of the first Daedalus Level, a beautiful UE4 app with reflections and high-end shaders, on an Oculus Go with Dynamic Throttling turned off. The red line represents GPU utilization, and the magenta line at the bottom represents missed frames. The green line is the GPU level (which remains fixed because the dynamic throttler is turned off). The brown and blue lines represent CPU utilization (all cores and the worst core, respectively).

Although the first 45 seconds of the level run fine (the missed frames at the beginning are mainly due to level loading), you can see that the GPU utilization stays very high throughout the run, around 85%. After 50 seconds, it peaks above 95%, and frames start dropping as the GPU becomes the bottleneck (the magenta line describing dropped frames goes up). At its peak, 20 frames are dropped per second for an average frame rate of 40 fps.

If we run the exact same app and level with Oculus Go's Dynamic Throttling enabled, we get this graph:

Though this app has requested GPU level 2, the system clocks it up to GPU 3 to keep utilization around 75%. The system correctly assesses that this application is not CPU-bound (frames are not being missed), and so the CPU clock is left at level 0 for the entire session. Near the end of the capture, around 50 seconds in, the system notices that GPU load has increased and clocks the GPU up to level 4. The spike in GPU load ends about 15 seconds later, and the system clocks the GPU back down accordingly (in this case all the way to level 2). As you can see from the graph, Dynamic Throttling effectively removed nearly all of the missed frames, keeping this application at 60 fps without any modifications to the app itself.

When combined with Fixed Foveated Rendering, Dynamic Throttling can unlock a new level of performance on Oculus Go. This performance can be applied to greater scene or shader complexity, or can be easily used to increase eye texture resolution to improve an application's visual quality without changing its assets. Some developers might also choose to spend some of that extra frame time on a new mode that Oculus Go supports: 72 Hz Mode.

72 Hz Mode

Another new feature introduced by Oculus Go is 72 Hz Mode. With this mode Oculus Go apps can choose to target 72 frames per second instead of the normal 60 frames per second. This mode is strictly optional, and in some cases prohibitively expensive, but it can be a significant quality improvement for apps that choose to support it.

Why 72 Hz Mode?
Typically high frame rates for VR devices are associated with lowering latency, particularly when it comes to positional tracking. Oculus Go is not a positionally-tracked device, and though lower head tracking latency is comfortable, it is not the primary reason to run at 72 Hz. Rather, the purpose of this mode is to improve the visual quality of the display.

The Oculus Go display has been tuned to be comfortable at 60 Hz. 72 Hz Mode allows the display to become brighter without causing a perceptible flicker, which improves the visual quality of the screen. In particular, this mode makes the display brighter, and causes colors to pop and appear warmer.

Optimizing for 72 Hz
Any application that can accommodate 72 frames per second rendering should use 72 Hz Mode when running on Oculus Go. This means rendering at least 2.8 ms faster than usual, which is not always possible. Combined with Dynamic Throttling and Fixed Foveated Rendering, some apps may be able to simply toggle this mode on and run at a higher frame rate. Others may need to do significant optimization to achieve this level of performance. Running at 72 Hz is optional.
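The 2.8 ms figure follows directly from the per-frame time budget at each refresh rate. A quick check of the arithmetic (the helper name is just for illustration):

```python
def frame_budget_ms(refresh_hz):
    """Time available to render one frame at the given refresh rate, in ms."""
    return 1000.0 / refresh_hz


budget_60 = frame_budget_ms(60)
budget_72 = frame_budget_ms(72)
saving_needed = budget_60 - budget_72

print(f"60 Hz budget: {budget_60:.2f} ms")
print(f"72 Hz budget: {budget_72:.2f} ms")
print(f"Time to shave off each frame: {saving_needed:.2f} ms")
```

In other words, an app that uses nearly all of its frame budget at 60 Hz has to cut roughly a sixth of its per-frame cost to hold 72 Hz.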

72 Hz for Video
Video applications should carefully consider 72 Hz Mode. An application that renders video at 30 or 60 frames per second will look better at 60 fps than at 72. On the other hand, 24 Hz video looks a lot better when running in 72 Hz Mode because the display and the video frame rates can be synchronized to avoid tears (24 is an even divisor of 72). You can switch modes seamlessly at run time.
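To see why the divisor matters, the sketch below counts how many display refreshes each video frame stays on screen. It assumes a simplified model in which every refresh shows the most recent video frame whose timestamp has passed; the function name and that model are assumptions for illustration, not a description of the actual video pipeline.

```python
def refreshes_per_video_frame(display_hz, video_fps, frames=5):
    """For the first `frames` video frames, return how many display
    refreshes each one is shown for (integer rates assumed)."""
    counts = []
    for i in range(frames):
        # Frame i is visible from refresh ceil(i * display / video)
        # up to, but not including, ceil((i + 1) * display / video).
        start = -(-i * display_hz // video_fps)        # integer ceiling
        end = -(-(i + 1) * display_hz // video_fps)
        counts.append(end - start)
    return counts


for display_hz, video_fps in [(72, 24), (72, 30), (72, 60), (60, 30), (60, 60)]:
    counts = refreshes_per_video_frame(display_hz, video_fps)
    print(f"{video_fps} fps video on a {display_hz} Hz display: {counts}")
```

When the video rate divides the display rate evenly, every frame is held for the same number of refreshes. When it does not, some frames are held longer than others, and that uneven cadence is what makes 30 and 60 fps video look worse at 72 Hz than at 60 Hz.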

Go! Go! Go!

We've put a lot of effort into making Oculus Go the best standalone VR device available. With the addition of features like Fixed Foveated Rendering, Dynamic Throttling, and 72 Hz Mode, we expect that many hurdles to developing great VR software will be significantly lowered on this device. We've tried to keep your runway as straight and smooth as can be. You're cleared for liftoff. Go! Go! Go!