Tech Note: Profiling & Optimizing on Mobile Devices
Oculus Developer Blog | Posted by Gabor Szauer | May 30, 2018

This blog post explores when and what to optimize on mobile devices. It provides a high-level overview of game optimization, as well as a workflow for profiling.

When to Optimize

Whenever we talk about optimization, the first thing everyone tends to think of is Frames Per Second (FPS). While FPS is a good metric if you want to know when to optimize, it doesn't provide any insight into what to optimize. Oculus provides the OVR Metrics Tool for mobile devices.

The OVR Metrics Tool is a standalone Android application you can install on any Android device. This tool will provide a real time frame graph without any application side integration.

To use the OVR Metrics Tool, download it and install the apk. Use the -g flag when installing to automatically grant all permissions.

adb install -g OVRMetricsTool_v1.1.apk

Once installed, enable the metrics tool with the following commands:

    adb shell setprop debug.oculus.notifPackage com.oculus.ovrmonitormetricsservice
    adb shell setprop debug.oculus.notifClass com.oculus.ovrmonitormetricsservice.PanelService

The next time a VR application launches, a frame graph similar to the image above should show up. Any time the phone is restarted, the above commands need to be re-run to start the metrics tool.

Optimization Goals

A better metric for optimizing games is milliseconds per frame! Measuring each frame in milliseconds provides an actionable goal. Every frame has to finish all of its CPU and GPU work within a set amount of time (measured in milliseconds) to hit a certain frame rate.

When measuring milliseconds per frame, each frame has a budget it must fit into: 1000 milliseconds divided by the target frame rate (the quick calculation after the list below makes this explicit). Common performance targets for mobile devices are:

  • 30 FPS = 33.3 milliseconds per frame
  • 60 FPS = 16.6 milliseconds per frame
  • 72 FPS = 13.8 milliseconds per frame
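
Each budget above is just 1000 milliseconds divided by the target frame rate. A trivial C# helper (the function name is made up for this post) makes the calculation explicit:

    // Milliseconds available per frame for a given target frame rate.
    // 1000 / 30 ≈ 33.3, 1000 / 60 ≈ 16.6, 1000 / 72 ≈ 13.8
    static float FrameBudgetMs(float targetFps)
    {
        return 1000f / targetFps;
    }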

Let's take a look at a simple example of using milliseconds as a frame budget. The below graph is something you might see from any profiler. The game in question is running at 30 FPS. The goal is to get it running at 60!

When looking at the above image, we can see that one frame takes 21.3 milliseconds to update and render. The application then waits for the next V-Sync, which is 12 milliseconds away. This makes the total frame time 33.3 milliseconds, or 30 frames per second.

In order to run at 60 frames per second, the time it takes to update and render a frame must be less than 16.6 milliseconds. In the above example, the game logic takes significantly longer than rendering the game. 4.7 milliseconds need to be cut, and the game logic seems like a good place to start.

Think of the green bar in the above graph, which represents the Wait for V-Sync time, as padding. It can't be optimized away, and will always fill out the remaining time until the next frame. After optimizing, the above frame should look something like this:

After optimization, every frame fits into a snug 16.6 millisecond window. There are now two frames spanning the 33.3 milliseconds that only one frame took up before. The optimization in this example was all CPU side, reducing the game logic from 15.3 milliseconds to 9 milliseconds. The wait for V-Sync time is still there, as expected, but it only has to wait for 1.6 milliseconds. The game in the above graph is running at 60 frames per second.
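
To make the arithmetic behind the graphs explicit, here it is in code form. The timing values are hypothetical, pulled from the example above; rendering is assumed to account for the remaining 6 milliseconds of the original 21.3:

    // Hypothetical numbers from the example above (in milliseconds).
    float budgetMs    = 1000f / 60f;             // ~16.6 ms budget for 60 FPS
    float gameLogicMs = 9.0f;                    // was 15.3 ms before optimization
    float renderMs    = 6.0f;                    // 21.3 - 15.3 = 6.0 ms, unchanged
    float workMs      = gameLogicMs + renderMs;  // 15.0 ms of real work
    float vsyncWaitMs = budgetMs - workMs;       // ~1.6 ms of padding before V-Sync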

Things get a little more complicated in the real world. The example above shows a simple game loop running on one thread, where rendering happens only after all game logic is done executing. All modern engines have multi-threaded rendering. That is, there is a render thread and a main thread. The main thread executes all of the game's logic and submits draw calls to the render thread. The main thread is able to start processing the next frame while the render thread is working on rendering the current frame. This will be covered in depth in a future blog post.
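
As a very rough sketch of that split (purely illustrative C#; this is not how Unity or Unreal actually implement their render threads):

    using System.Collections.Concurrent;
    using System.Threading;

    class RenderThreadSketch
    {
        // The main thread pushes finished frames here; the render thread consumes them.
        static BlockingCollection<string> submittedFrames = new BlockingCollection<string>();

        static void Main()
        {
            var renderThread = new Thread(() =>
            {
                foreach (string frame in submittedFrames.GetConsumingEnumerable())
                {
                    // Render thread: issue the draw calls for this frame.
                }
            });
            renderThread.Start();

            for (int i = 0; i < 3; i++)
            {
                // Main thread: run the game logic for frame i, then submit it...
                submittedFrames.Add("draw calls for frame " + i);
                // ...and immediately start simulating frame i + 1 while the
                // render thread is still drawing frame i.
            }

            submittedFrames.CompleteAdding();
            renderThread.Join();
        }
    }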

Profiling Workflow (What to optimize)

In the above example, it was obvious what needed to be optimized. That was an unrealistic example; in an actual game it's usually much harder to tell what needs to be optimized. Having a plan for how to profile a game will make tracking down what needs to be optimized much easier.

CPU or GPU Bound?

The first question when profiling is: is the game in question CPU bound or GPU bound? Determining this is pretty easy: don't render anything. To do this, turn off the render camera and let the game continue to run (one way to do this in Unity is sketched after the list below). Doing this eliminates the cost of the render pipeline, i.e. culling, submitting draw calls, running shaders, etc. Keep an eye on both the frame rate of the game and the milliseconds each frame takes.

  • If the game's performance is not affected, or affected very little, the game is likely CPU bound.
  • If performance improves significantly, the game is likely GPU bound.
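
In Unity, a throwaway debug component like the one below can be dropped into a scene to turn off every camera, so the frame time without rendering can be compared to normal play (the component name is arbitrary):

    using UnityEngine;

    public class DisableRendering : MonoBehaviour
    {
        void Start()
        {
            // Camera.allCameras only returns cameras that are currently enabled.
            foreach (Camera cam in Camera.allCameras)
            {
                cam.enabled = false;
            }
        }
    }
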
CPU Bound

Common causes for a game to be CPU bound include overly complex game logic, physics simulation, garbage collection (GC) stalls, etc.
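
GC stalls in particular usually come from allocating memory every frame. A typical Unity-flavored example (the component and list below are hypothetical):

    using System.Collections.Generic;
    using UnityEngine;

    public class EnemyScanner : MonoBehaviour
    {
        // Allocating a new list inside Update() creates garbage every frame,
        // which eventually triggers a GC stall:
        //   List<GameObject> visible = new List<GameObject>();
        //
        // Reusing a single pre-allocated list avoids the per-frame allocation.
        readonly List<GameObject> visible = new List<GameObject>(64);

        void Update()
        {
            visible.Clear();
            // ... fill and use 'visible' here without allocating ...
        }
    }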

Use an instrumented profiler, like the ones built into Unity and Unreal, to track down performance bottlenecks.

Focus on optimizing only the most expensive code paths. Any game logic that takes longer than two milliseconds can probably be optimized.
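
In Unity, suspect code paths can be labeled in the built-in profiler with Profiler.BeginSample and EndSample (the component and sample names below are made up):

    using UnityEngine;
    using UnityEngine.Profiling;

    public class EnemyAI : MonoBehaviour
    {
        void Update()
        {
            Profiler.BeginSample("EnemyAI.Pathfinding");
            // ... expensive game logic shows up under this label in the profiler ...
            Profiler.EndSample();
        }
    }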

GPU Bound

When a game is GPU bound, it can generally be categorized into one of two states: vertex bound or fragment bound.

A game that is vertex bound has issues with scene complexity. On the other hand, a fragment bound game has issues with shader complexity.

The way to test for this is to render fewer pixels. You can do this by setting the game's render scale to something really small, like 0.01. This will cause fewer fragments to be rendered, but keep the scene complexity the same.

  • If performance is not affected, the game is likely vertex bound.
  • If performance improves, the game is likely fragment bound.

To change the render scale in Unity, set the eyeTextureResolutionScale. In Unity 5.6 it's called the renderScale:

UnityEngine.XR.XRSettings.eyeTextureResolutionScale = 0.01f;
/* Legacy: */ UnityEngine.VR.VRSettings.renderScale = 0.01f;

To change the render scale in Unreal, use the SetScreenPercentage function:

UHeadMountedDisplayFunctionLibrary::SetScreenPercentage(0.01f);

Vertex Bound

The way we determined whether an application is CPU or GPU bound (by disabling all render cameras) means that some CPU-side calculations, like frustum culling, get counted as vertex bound. While this isn't completely accurate, these operations are often fixed by the same actions used to optimize a vertex bound game. Common issues for a vertex bound game are:

  • Culling objects is taking too long
  • Too many draw calls are being issued
  • Too many vertices are being rendered

Tools like Unity's frame debugger or GPU profiling in Unreal can help find these bottlenecks. Simplifying geometry and reducing draw calls will often fix these issues. Consider using some kind of LOD (level of detail) system and batching draw calls.
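
As one example of cutting draw calls, Unity can combine static geometry that shares materials into batches at runtime (the component name below is a placeholder):

    using UnityEngine;

    public class BatchSetup : MonoBehaviour
    {
        void Start()
        {
            // Combine all child meshes under this object into static batches
            // so they can be drawn with fewer draw calls. The objects must
            // not move after being combined.
            StaticBatchingUtility.Combine(gameObject);
        }
    }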

Fragment Bound

If a game is fragment bound, one or more of its shaders need to be optimized. Tools like the Qualcomm Snapdragon Profiler or Mali Graphics Debugger can help identify and fix these shaders.

Per-pixel shader complexity and overdraw tend to be the main issues for fragment bound games. Be sure to read the Oculus rendering guidelines.

Rinse and Repeat

Profiling and optimizing is an iterative process. Chances are, optimizing away the first bottleneck will not get the game to its performance target. It usually takes a few rounds of optimization to reach the desired frame rate. Go through the full profiling chart after each optimization pass. If the game was CPU bound before and has since been optimized, it could end up being fill (fragment) bound next.

Below is a high-level flow chart of what to test for. Keep following the chart until you hit your target frame rate!

Additional Resources