This blog post explores when and what to optimize on mobile devices. It provides a high-level overview of game optimization, as well as a workflow for profiling.
Whenever we talk about optimization, the first thing everyone tends to think of is Frames Per Second (FPS). While FPS is a good metric if you want to know when to optimize, it doesn't provide any insight into what to optimize. Oculus provides the OVR Metrics Tool for mobile devices.
The OVR Metrics Tool is a standalone Android application you can install on any Android device. It provides a real-time frame graph without any application-side integration.
To use the OVR Metrics Tool, download it and install the APK. Use the -g flag when installing to automatically grant all permissions.
adb install -g OVRMetricsTool_v1.1.apk
Once installed, enable the metrics tool with the following commands:
adb shell setprop debug.oculus.notifPackage com.oculus.ovrmonitormetricsservice
adb shell setprop debug.oculus.notifClass com.oculus.ovrmonitormetricsservice.PanelService
The next time a VR application launches, a frame graph similar to the image above should show up. Any time the phone is restarted, the above commands need to be re-run to start the metrics tool.
A better metric for optimizing games is milliseconds per frame! Measuring each frame in milliseconds provides an actionable goal. Every frame has to finish all its CPU and GPU work in a set amount of time (usually measured in milliseconds) to hit a certain frame rate.
When measuring milliseconds per frame, each frame has a budget: a time it must fit into. Common performance targets for mobile devices include 30 FPS (33.3 milliseconds per frame) and 60 FPS (16.6 milliseconds per frame).
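The budget for any target follows directly from the frame rate: one second (1000 milliseconds) divided by the number of frames. The snippet below is a minimal sketch of that arithmetic; the function name `frame_budget_ms` is made up for illustration. Note that 1000 / 60 is about 16.67, which this post writes as 16.6.

```python
def frame_budget_ms(target_fps: float) -> float:
    """Return the time budget for a single frame, in milliseconds."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    # One second is 1000 ms, shared evenly between all frames in that second.
    return 1000.0 / target_fps


# Print the budget for the two frame rates discussed in this post.
for fps in (30, 60):
    print(f"{fps} FPS -> {frame_budget_ms(fps):.2f} ms per frame")
```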
Let's take a look at a simple example of using milliseconds as a frame budget. The below graph is something you might see from any profiler. The game in question is running at 30 FPS. The goal is to get it running at 60!
Looking at the above image, we can see that one frame takes 21.3 milliseconds to update and render. The application then waits for the next V-Sync, which is 12 milliseconds away. This makes the total frame time 33.3 milliseconds, or 30 frames per second.
In order to run at 60 frames per second, the time it takes to update and render a frame must be less than 16.6 milliseconds. In the above example, the game logic takes significantly longer than rendering the game. 4.7 milliseconds need to be cut, and the game logic seems like a good place to start.
Think of the green bar in the above graph, which represents the wait for V-Sync, as padding. It can't be optimized away, and will always fill out the time until the next frame. After optimizing, the above frame should look something like this:
After optimization, every frame fits into a snug 16.6 millisecond window. There are now two frames spanning the 33.3 milliseconds that only one frame took up before. The optimization in this example was all CPU side, reducing the game logic from 15.3 milliseconds to 9 milliseconds. The wait for V-Sync is still there, as expected, but it now lasts only 1.6 milliseconds. The game in the above graph is running at 60 frames per second.
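The padding behaviour can be reproduced with a few lines of arithmetic. The following is a rough sketch, assuming a single-threaded loop on a 60 Hz display where a frame is always presented on the first V-Sync after its work finishes; `vsync_frame` is a hypothetical helper, not part of any engine API.

```python
import math


def vsync_frame(work_ms: float, refresh_hz: float = 60.0) -> tuple:
    """Return (total_ms, wait_ms, fps) for one V-Sync-locked frame.

    work_ms is the combined update + render time of the frame.
    """
    interval_ms = 1000.0 / refresh_hz
    # The frame is shown on the first V-Sync at or after the work completes,
    # so the total frame time is a whole number of refresh intervals.
    intervals = max(1, math.ceil(work_ms / interval_ms))
    total_ms = intervals * interval_ms
    wait_ms = total_ms - work_ms  # the "padding" that cannot be optimized away
    return total_ms, wait_ms, 1000.0 / total_ms


# Before optimization: 21.3 ms of work. After: 9 ms logic + 6 ms rendering.
for work in (21.3, 15.0):
    total, wait, fps = vsync_frame(work)
    print(f"work {work:.1f} ms, wait {wait:.1f} ms, total {total:.1f} ms, {fps:.0f} FPS")
```

With 21.3 milliseconds of work the frame misses the first V-Sync and waits about 12 milliseconds for the second one, which gives 30 FPS. Trimming the work to 15 milliseconds leaves under 2 milliseconds of waiting and gives 60 FPS. Shaving the work down to, say, 17 milliseconds would not change the frame rate at all, which is why the budget matters more than the raw saving.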
Things get a little more complicated in the real world. The example above shows a simple game loop running on one thread, where rendering happens only after all game logic is done executing. Most modern engines use multithreaded rendering: there is a main thread and a render thread. The main thread executes all of the game's logic and submits draw calls to the render thread. The main thread is able to start processing the next frame while the render thread is still rendering the current one. This will be covered in depth in a future blog post.
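To see why this overlap matters, here is an idealized sketch that compares the two models. It assumes perfect overlap between the two threads and ignores synchronization overhead and V-Sync, so treat it as an upper bound on the benefit rather than a prediction. The 6.0 millisecond render cost is derived from the earlier example (21.3 total minus 15.3 of game logic), and both function names are made up for illustration.

```python
def serial_frame_ms(logic_ms: float, render_ms: float) -> float:
    """Single-threaded loop: rendering starts only after game logic ends."""
    return logic_ms + render_ms


def pipelined_frame_ms(main_ms: float, render_ms: float) -> float:
    """Idealized two-thread pipeline in steady state.

    The main thread works on frame N+1 while the render thread draws frame N,
    so the slower of the two threads sets the pace.
    """
    return max(main_ms, render_ms)


logic_ms, render_ms = 15.3, 6.0  # values taken from the example above
print(f"serial:    {serial_frame_ms(logic_ms, render_ms):.1f} ms")
print(f"pipelined: {pipelined_frame_ms(logic_ms, render_ms):.1f} ms")
```

Under these assumptions the unoptimized example would drop from 21.3 to 15.3 milliseconds per frame without touching the game logic, at the cost of the render thread running one frame behind the main thread.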
In the above example, it was obvious what needed to be optimized. That example was unrealistic; in an actual game it's usually much harder to tell what needs to be optimized. Having a plan for how to profile a game makes tracking down what needs to be optimized much easier.
CPU or GPU Bound?
The first question when profiling is whether the game is CPU bound or GPU bound. Determining this is fairly easy: don't render anything. To do this, turn off the render camera and let the game continue to run. This eliminates the cost of the render pipeline, i.e. culling, submitting draw calls, running shaders, and so on. Keep an eye on both the frame rate of the game and the milliseconds each frame takes. If the frame time barely changes with rendering disabled, the game is most likely CPU bound; if it drops substantially, the game is most likely GPU bound.
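One way to turn that experiment into a decision is to compare both measurements against the frame budget. The sketch below is an assumption about how to read the results, not a rule stated by any profiler: if the game still misses its budget with rendering disabled, the remaining work is game logic, so it is treated as CPU bound. The function name and labels are hypothetical.

```python
def classify_bound(normal_ms: float, no_render_ms: float, budget_ms: float) -> str:
    """Classify a game from two frame-time measurements.

    normal_ms    -- frame time with rendering enabled
    no_render_ms -- frame time with the render camera turned off
    budget_ms    -- target frame budget, e.g. 16.6 for 60 FPS
    """
    if normal_ms <= budget_ms:
        return "on budget"
    if no_render_ms > budget_ms:
        # Still too slow without any rendering: game logic is the problem.
        return "CPU bound"
    # Fits the budget only when nothing is rendered: the render pipeline
    # (culling, draw calls, shaders) is the problem.
    return "GPU bound"


print(classify_bound(normal_ms=21.3, no_render_ms=18.0, budget_ms=16.6))
print(classify_bound(normal_ms=21.3, no_render_ms=7.5, budget_ms=16.6))
```

The first call reports a CPU bound game and the second a GPU bound one. The input numbers are invented purely to exercise both branches.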
CPU Bound
Common causes for a game to be CPU bound are complex game logic, physics simulation, garbage collection (GC) stalls, etc.
Use an instrumented profiler, like the ones built into Unity and Unreal, to track down performance bottlenecks.
Focus on optimizing only the most expensive code paths. Any game logic that takes longer than two milliseconds can probably be optimized.
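As a small illustration of that triage, the sketch below filters per-frame timings by the two millisecond rule of thumb and sorts them by cost. It assumes the timings have already been exported from a profiler as a name-to-milliseconds mapping; the function name, the section names, and the numbers are all hypothetical.

```python
def hot_spots(samples_ms: dict, threshold_ms: float = 2.0) -> list:
    """Return (name, ms) pairs above the threshold, most expensive first."""
    over = [(name, ms) for name, ms in samples_ms.items() if ms > threshold_ms]
    return sorted(over, key=lambda item: item[1], reverse=True)


# Hypothetical per-frame timings, as they might be read off a profiler.
frame_samples = {
    "enemy_ai_update": 6.1,
    "physics_step": 4.2,
    "ui_layout": 0.8,
    "audio_mix": 0.4,
}

for name, ms in hot_spots(frame_samples):
    print(f"{name}: {ms:.1f} ms")
```

Sorting by cost keeps attention on the paths where a saving actually moves the frame time, rather than on the many small entries below the threshold.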
GPU Bound
When a game is GPU bound, it can generally be categorized into one of two states: vertex bound or fragment bound.
A game that is vertex bound has issues with scene complexity. On the other hand, a fragment bound game has issues with shader complexity.
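The way to test for this is to render fewer pixels. You can do this by setting the game's render scale to something really small, like 0.01. This causes fewer fragments to be rendered while keeping the scene complexity the same. If the frame time improves substantially, the game is most likely fragment bound; if it barely changes, it is most likely vertex bound.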
To change the render scale in Unity, set the eyeTextureResolutionScale. In Unity 5.6 it's called the renderScale:
UnityEngine.XR.XRSettings.eyeTextureResolutionScale = 0.01f;
/* Legacy: */ UnityEngine.VR.VRSettings.renderScale = 0.01f;
To change the render scale in Unreal, use the SetScreenPercentage function:
UHeadMountedDisplayFunctionLibrary::SetScreenPercentage(0.01f);
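Once both measurements are in hand, the time saved by the tiny render scale gives a rough estimate of how much of the frame is fragment work. The sketch below is one possible way to read the numbers, assuming the only thing that changed between the two runs is the render scale; the function name and return shape are made up for illustration.

```python
def classify_gpu_bound(full_scale_ms: float, tiny_scale_ms: float, budget_ms: float) -> tuple:
    """Return (label, estimated_fragment_ms) from two frame-time measurements.

    full_scale_ms -- frame time at the normal render scale
    tiny_scale_ms -- frame time at a very small render scale such as 0.01
    budget_ms     -- target frame budget, e.g. 16.6 for 60 FPS
    """
    # Time that disappeared along with the pixels is attributed to fragments.
    fragment_ms = max(0.0, full_scale_ms - tiny_scale_ms)
    if full_scale_ms <= budget_ms:
        return "on budget", fragment_ms
    if tiny_scale_ms > budget_ms:
        # Almost no pixels are shaded and the frame is still too slow.
        return "vertex bound", fragment_ms
    return "fragment bound", fragment_ms


print(classify_gpu_bound(full_scale_ms=22.0, tiny_scale_ms=19.5, budget_ms=16.6))
print(classify_gpu_bound(full_scale_ms=22.0, tiny_scale_ms=8.0, budget_ms=16.6))
```

The first call describes a game that stays over budget even with almost no pixels to shade, so scene complexity is the place to look. In the second, shrinking the render target removes most of the frame time, which points at the shaders. The input numbers are invented to exercise both branches.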
Vertex Bound
Because we determined whether an application is CPU or GPU bound by disabling all render cameras, some CPU-side calculations like frustum culling are counted as vertex bound work. While this isn't completely accurate, these operations are often optimized by the same actions we take to optimize a vertex bound game. Common issues for a vertex bound game include overly complex geometry and too many draw calls.
Tools like Unity's frame debugger or GPU profiling in Unreal can help find these bottlenecks. Simplifying geometry and reducing draw calls will often fix these issues. Consider using a level-of-detail (LOD) system and batching draw calls.
Fragment Bound
If a game is fragment bound, one or more of its shaders need to be optimized. Tools like the Qualcomm Snapdragon Profiler or Mali Graphics Debugger can help identify and fix these shaders.
Pixel complexity and overdraw tend to be the main issues for fragment bound games. Be sure to read the Oculus rendering guidelines.
Below is a high-level flow chart of what to test for. Keep following the chart until you hit frame rate!