Improving GPU Profiling on Oculus Quest
Oculus Developer Blog | Posted by Jimmy Lee, Remi Palandri | June 19, 2020

For the past year, we’ve been working with Qualcomm to build the Performance Interface Library (PIL), a low-level on-device library that exposes GPU information previously available only through Snapdragon Profiler. The library is now embedded in the Quest operating system and provides two main types of information: render stage metrics and real-time metrics. Both are accessible through two new tools, GPU Systrace and ovrgpuprofiler, which show you exactly what the GPU is doing with minimal GPU overhead.

GPU Systrace

One of the core issues with most existing GPU tools is that they measure time as draw call sequences, whereas mobile GPUs render surfaces tile after tile. Graphics APIs offer no way to query per-tile timing or other tile information, because tiling is abstracted away beneath the API. PIL, however, gives us a way to query all of this data, which effectively tells us “the GPU rendered a 1216x1344 surface with 96 tiles that are all of size 192x168, and that took 5.2ms.”

GPU Systrace integrates render stage information into systrace, so you can view the GPU and CPU workloads in the same timeline and see how your application’s CPU and GPU work interact.

[Screenshot: VrCubeworld sample trace output. The trace must be viewed in the Chrome browser.]

In the screenshot above, the GPU rendering to a surface is shown on the top row, with the CPU processes in the rows below. Soon after the GPU finishes rendering to the surface, the CPU wait operation (FenceChecker::Wait) is released. The GPU’s surface rendering is further divided into a series of render stages. Binning is where triangle vertex positions for all draw calls are calculated and assigned to bins, each of which corresponds to a partition of the drawing surface. Render represents the total cost of all vertex and fragment operations for one bin. Preempt marks the compositor, an OS-level service that runs at regular intervals to present the image submitted by the application on the screen. See our GPU Systrace documentation for a complete list of the GPU information available through the tool.

Ovrgpuprofiler Tool

Ovrgpuprofiler is a low-level CLI tool on Oculus Quest that provides access to detailed GPU information. It’s built as a lightweight CLI client that acts as a wrapper on top of Qualcomm’s PIL. It retrieves two types of information: render stage metrics (like GPU Systrace, but in text form) and real-time metrics. Its primary goal is to be a low-friction, easy-to-use tool, which is why it’s available directly through adb shell.

In an adb shell prompt, ovrgpuprofiler -m will print the list of all real-time metrics that the tool supports, the first few results being:

monterey:/ # ovrgpuprofiler -m
47 metrics supported:
1 Clocks / Second
2 GPU % Bus Busy
3 % Vertex Fetch Stall
4 % Texture Fetch Stall
5 L1 Texture Cache Miss Per Pixel
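If you want to select metrics by name rather than by number, the listing can be parsed into a lookup table. The following is a minimal Python sketch, assuming the output keeps the "<id> <name>" shape shown above; the function names (parse_metric_list, realtime_flag) are hypothetical helpers and not part of ovrgpuprofiler, and the sample text is simply copied from the listing rather than read from a device.

```python
import re

# Sample copied from the listing above. On a device, this text would come
# from running `ovrgpuprofiler -m` in an adb shell.
METRIC_LIST_SAMPLE = """\
47 metrics supported:
1 Clocks / Second
2 GPU % Bus Busy
3 % Vertex Fetch Stall
4 % Texture Fetch Stall
5 L1 Texture Cache Miss Per Pixel
"""


def parse_metric_list(text):
    """Return {metric_id: metric_name} for lines shaped like '<id> <name>'."""
    metrics = {}
    for line in text.splitlines():
        # Skip the '47 metrics supported:' header, which also starts with a number.
        if line.rstrip().endswith("supported:"):
            continue
        match = re.match(r"^\s*(\d+)\s+(.+?)\s*$", line)
        if match:
            metrics[int(match.group(1))] = match.group(2)
    return metrics


def realtime_flag(metrics, names):
    """Build the --realtime argument for the requested metric names."""
    ids = [mid for mid, name in sorted(metrics.items()) if name in names]
    return '--realtime="{}"'.format(",".join(str(i) for i in ids))


metrics = parse_metric_list(METRIC_LIST_SAMPLE)
wanted = {"% Vertex Fetch Stall", "L1 Texture Cache Miss Per Pixel"}
print(realtime_flag(metrics, wanted))  # prints --realtime="3,5"
```

Looking metrics up by name is a design choice made here on the assumption that names are more stable to read in scripts than numeric IDs; the source does not say whether IDs stay fixed across OS versions.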

To retrieve metrics 3 and 5, for example, call ovrgpuprofiler --realtime="3,5", which returns those metrics sampled every second:

monterey:/ # ovrgpuprofiler --realtime="3,5"
% Vertex Fetch Stall : 1.057
L1 Texture Cache Miss Per Pixel : 0.166

% Vertex Fetch Stall : 1.082
L1 Texture Cache Miss Per Pixel : 0.166
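Because the real-time output is plain text, it is easy to post-process. The sketch below is one possible approach in Python, assuming each sample is a group of "name : value" lines separated by a blank line, as in the output above. The helper names (parse_realtime, average_by_metric) are illustrative and not part of the tool, and the sample text is copied from the output above.

```python
# Sample copied from the output above (two one-second samples).
REALTIME_SAMPLE = """\
% Vertex Fetch Stall : 1.057
L1 Texture Cache Miss Per Pixel : 0.166

% Vertex Fetch Stall : 1.082
L1 Texture Cache Miss Per Pixel : 0.166
"""


def parse_realtime(text):
    """Split the output into samples; each sample maps metric name -> value."""
    samples, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sample.
            if current:
                samples.append(current)
                current = {}
            continue
        name, sep, value = line.rpartition(" : ")
        if not sep:
            continue  # not a 'name : value' line (for example, a shell prompt)
        try:
            current[name.strip()] = float(value)
        except ValueError:
            continue  # value was not numeric; ignore the line
    if current:
        samples.append(current)
    return samples


def average_by_metric(samples):
    """Average each metric over all samples in which it appears."""
    values = {}
    for sample in samples:
        for name, value in sample.items():
            values.setdefault(name, []).append(value)
    return {name: sum(v) / len(v) for name, v in values.items()}


samples = parse_realtime(REALTIME_SAMPLE)
for name, mean in average_by_metric(samples).items():
    print(f"{name}: {mean:.4f} (mean of {len(samples)} samples)")
```

Averaging over several one-second samples is only one way to summarize the data; for spiky workloads, the per-sample values may be more informative.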

To query render stage data, first call ovrgpuprofiler -e and relaunch the application (this is necessary to put the application’s GPU context in profiling mode, exactly as with GPU Systrace). A call to ovrgpuprofiler -t will then return information such as:

Surface 1 | 1216x1344 | color 32bit, depth 24bit, stencil 0 bit, MSAA 4 | 60 128x224 bins | 5.12 ms | 123 stages : Binning : 0.643ms Render : 2.17ms StoreColor : 0.474ms Blit : 0.002ms Preempt : 1.411ms

Render stage data allows us to answer questions like, “How long is my app actually taking to compute, and how much of that is TimeWarp?”

From this data we can see that the surface took 5.12 ms to execute, and that 1.411 ms of that was TimeWarp (the Preempt stage). With this breakdown, you have the information you need to make educated choices for your app.
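The same arithmetic can be scripted. Here is a minimal Python sketch that parses the sample line shown earlier and reports the Preempt share of the surface time. It assumes the field layout of that one sample; the regular expressions and the function names (parse_surface_line, stage_share, expected_bin_count) are hypothetical and may need adjusting for other output. It also checks that the reported bin count matches a regular grid of bins covering the surface, which holds for this sample.

```python
import math
import re

# Sample line copied from the ovrgpuprofiler -t output shown earlier.
RENDER_STAGE_SAMPLE = (
    "Surface 1 | 1216x1344 | color 32bit, depth 24bit, stencil 0 bit, MSAA 4 | "
    "60 128x224 bins | 5.12 ms | 123 stages : Binning : 0.643ms Render : 2.17ms "
    "StoreColor : 0.474ms Blit : 0.002ms Preempt : 1.411ms"
)


def parse_surface_line(line):
    """Extract surface size, bin layout, total time, and per-stage times (ms)."""
    width, height = map(int, re.search(r"\|\s*(\d+)x(\d+)\s*\|", line).groups())
    bin_count, bin_w, bin_h = map(
        int, re.search(r"(\d+)\s+(\d+)x(\d+)\s+bins", line).groups()
    )
    total_ms = float(re.search(r"\|\s*([\d.]+)\s*ms\s*\|", line).group(1))
    # Stage entries look like 'Binning : 0.643ms' (no space before 'ms').
    stages = {
        name: float(ms)
        for name, ms in re.findall(r"(\w+)\s*:\s*([\d.]+)ms", line)
    }
    return {
        "size": (width, height),
        "bin_count": bin_count,
        "bin_size": (bin_w, bin_h),
        "total_ms": total_ms,
        "stages": stages,
    }


def stage_share(info, stage):
    """Fraction of the surface's total time spent in one stage."""
    return info["stages"][stage] / info["total_ms"]


def expected_bin_count(info):
    """Number of bins if they form a regular grid covering the surface."""
    (width, height), (bin_w, bin_h) = info["size"], info["bin_size"]
    return math.ceil(width / bin_w) * math.ceil(height / bin_h)


info = parse_surface_line(RENDER_STAGE_SAMPLE)
print(f"Preempt share: {stage_share(info, 'Preempt'):.1%}")  # prints Preempt share: 27.6%
print(f"Grid bins: {expected_bin_count(info)}, reported: {info['bin_count']}")
```

Note that the stage durations listed in the sample add up to about 4.70 ms rather than 5.12 ms, so the sketch computes shares against the reported surface total instead of assuming the listed stages account for all of it.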

Conclusion

For more information on PIL, ovrgpuprofiler, and GPU Systrace, please check out our GPU Systrace and ovrgpuprofiler documentation.

This library and its accompanying tools are a work in progress. We plan to add more information and features, such as draw call metrics and additional render stages, in the near future. We’re also aware of an issue with Vulkan that creates unnecessary CPU waits when render stage tracing is used. Please comment below or post in the Developer Forums if you have any feedback, and stay tuned for more updates to the tools coming soon!