Jun 19, 2020

Improving GPU Profiling on Oculus Quest

Jimmy Lee, Remi Palandri

For the past year, we’ve been working with Qualcomm to build the Performance Interface Library (PIL), a low-level on-device library that exposes GPU information previously available only through Snapdragon Profiler. This library is now embedded in the Quest operating system and provides two main types of information: (1) render stage metrics and (2) real-time metrics. This information is accessible through two new tools, GPU Systrace and ovrgpuprofiler, which show you exactly what the GPU is doing with minimal GPU overhead.

GPU Systrace

One of the core issues with most existing GPU tools is that they measure time as draw call sequences, whereas mobile GPUs render surfaces tile after tile. The graphics API offers no way to query tile timing and information, because tiles are abstracted below it. PIL, however, gives us a way to query all of this data, which effectively tells us “the GPU rendered a 1216x1344 surface with 96 tiles that are all of size 192x168, and that took 5.2ms.”

GPU Systrace integrates render stage information into systrace, so you can view the GPU and CPU workloads in the same trace and see how your application’s CPU and GPU work interact.

VrCubeworld sample trace output (must be viewed in the Chrome browser)

In the screenshot above, the top row shows the GPU rendering to a surface, and the bottom rows show the CPU processes. Soon after the GPU finishes rendering to the surface, the CPU wait operation (FenceChecker::Wait) is released. The GPU surface rendering is further divided into a series of render stages. Binning is where triangle vertex positions for all draw calls are calculated and assigned to bins, each of which corresponds to a partition of the drawing surface. Render represents the total cost of all vertex and fragment operations for one bin. Preempt is the compositor, an OS-level service that executes at regular intervals to present the image submitted by the application onto the screen. See our GPU Systrace documentation for a complete list of the GPU information available through the tool.

Ovrgpuprofiler Tool

Ovrgpuprofiler is a low-level CLI tool on Oculus Quest that provides access to detailed GPU information. It’s built as a very lightweight CLI client that acts as a wrapper on top of Qualcomm’s PIL library. It lets you retrieve two types of information: render stage metrics (like GPU Systrace, but in text form) and real-time metrics. Its primary goal is to be a low-friction tool that is easy to use, since it’s available directly through adb shell.

At an adb shell prompt, ovrgpuprofiler -m prints the list of all real-time metrics that the tool supports. The first few results are:

monterey:/ # ovrgpuprofiler -m
47 metrics supported:
1       Clocks / Second
2       GPU % Bus Busy
3       % Vertex Fetch Stall
4       % Texture Fetch Stall
5       L1 Texture Cache Miss Per Pixel
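If you want to script against this listing, the following is a minimal Python sketch that turns it into an ID-to-name map and builds the comma-separated ID list from metric names. It assumes the listing keeps the "ID, whitespace, name" layout shown above; the helper names parse_metric_listing and realtime_argument are hypothetical and not part of the tool.

```python
import re

# Sample text copied from the listing above (only the first five metrics).
SAMPLE_LISTING = """\
47 metrics supported:
1       Clocks / Second
2       GPU % Bus Busy
3       % Vertex Fetch Stall
4       % Texture Fetch Stall
5       L1 Texture Cache Miss Per Pixel
"""

# Assumption: a metric row is "<integer id><whitespace><name>".
_METRIC_ROW = re.compile(r"^\s*(\d+)\s+(\S.*?)\s*$")
# The "N metrics supported:" header also starts with a number, so skip it.
_HEADER_ROW = re.compile(r"^\s*\d+\s+metrics supported:\s*$")


def parse_metric_listing(text):
    """Return {metric_id: metric_name} for every row that looks like a metric."""
    metrics = {}
    for line in text.splitlines():
        if _HEADER_ROW.match(line):
            continue
        match = _METRIC_ROW.match(line)
        if match:
            metrics[int(match.group(1))] = match.group(2)
    return metrics


def realtime_argument(metrics, wanted_names):
    """Build a comma-separated ID list from metric names."""
    ids_by_name = {name: metric_id for metric_id, name in metrics.items()}
    return ",".join(str(ids_by_name[name]) for name in wanted_names)


metrics = parse_metric_listing(SAMPLE_LISTING)
argument = realtime_argument(
    metrics, ["% Vertex Fetch Stall", "L1 Texture Cache Miss Per Pixel"]
)
print(argument)  # prints 3,5
```

Looking metrics up by name rather than hard-coding IDs is a cautious choice here, since nothing in this post guarantees that the numbering stays the same across OS versions.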

To retrieve metrics 3 and 5, for example, call ovrgpuprofiler --realtime="3,5", which prints those metrics sampled every second:

monterey:/ # ovrgpuprofiler --realtime="3,5"
% Vertex Fetch Stall                       :           1.057
L1 Texture Cache Miss Per Pixel            :           0.166

% Vertex Fetch Stall                       :           1.082
L1 Texture Cache Miss Per Pixel            :           0.166
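Because the real-time output is plain "name : value" text, it is easy to post-process. Here is a small Python sketch that collects the samples per metric and averages them. It assumes the layout shown above (one metric per line, value after the last colon); parse_realtime_samples is a hypothetical helper name, and the sample text is copied from the output above.

```python
from collections import defaultdict
from statistics import mean

# Two one-second samples, as printed in the example above.
SAMPLE_OUTPUT = """\
% Vertex Fetch Stall                       :           1.057
L1 Texture Cache Miss Per Pixel            :           0.166

% Vertex Fetch Stall                       :           1.082
L1 Texture Cache Miss Per Pixel            :           0.166
"""


def parse_realtime_samples(text):
    """Collect every value seen for each metric name, in order of appearance."""
    samples = defaultdict(list)
    for line in text.splitlines():
        # Split on the last colon; blank lines have no colon and are skipped.
        name, separator, value = line.rpartition(":")
        if not separator:
            continue
        try:
            samples[name.strip()].append(float(value))
        except ValueError:
            continue  # not a "name : number" row
    return dict(samples)


samples = parse_realtime_samples(SAMPLE_OUTPUT)
averages = {name: mean(values) for name, values in samples.items()}
for name, value in averages.items():
    print(f"{name}: {value:.4f}")
```

Averaging over several samples smooths out second-to-second noise before you compare two builds of your app.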

To query render stage data, first call ovrgpuprofiler -e and relaunch the application. Relaunching is necessary to put the application’s GPU context in profiling mode, exactly as with GPU Systrace. A call to ovrgpuprofiler -t then returns information such as:

Surface 1    | 1216x1344 | color 32bit, depth 24bit, stencil 0 bit, MSAA 4 | 60  128x224 bins | 5.12 ms | 123 stages :  Binning : 0.643ms Render : 2.17ms StoreColor : 0.474ms Blit : 0.002ms Preempt : 1.411ms

Render stage data allows us to answer questions like, “How long is my app actually taking to compute, and how much of that is TimeWarp?”

From this data we can see that the surface took 5.12ms to execute, and 1.411ms of that was TimeWarp (the Preempt stage), which leaves roughly 3.7ms for the app’s own rendering. With this information, you can make informed optimization choices for your app.
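The same reading can be automated. The Python sketch below parses the example line, extracts the total time and the per-stage times, and subtracts the Preempt stage from the total. The regular expressions assume the exact text layout of the example above, and parse_surface_line is a hypothetical helper name, not part of ovrgpuprofiler.

```python
import re

# The example line from above, copied verbatim.
SAMPLE_LINE = (
    "Surface 1    | 1216x1344 | color 32bit, depth 24bit, stencil 0 bit, MSAA 4 | "
    "60  128x224 bins | 5.12 ms | 123 stages :  Binning : 0.643ms Render : 2.17ms "
    "StoreColor : 0.474ms Blit : 0.002ms Preempt : 1.411ms"
)

# Assumption: the total is the only "| <number> ms |" field on the line.
_TOTAL = re.compile(r"\|\s*([\d.]+)\s*ms\s*\|")
# Assumption: each stage is printed as "<Name> : <number>ms".
_STAGE = re.compile(r"(\w+)\s*:\s*([\d.]+)\s*ms")


def parse_surface_line(line):
    """Return (total_ms, {stage_name: stage_ms}) for one surface line."""
    total = _TOTAL.search(line)
    if total is None:
        raise ValueError("no total time found in line")
    stages = {name: float(ms) for name, ms in _STAGE.findall(line)}
    return float(total.group(1)), stages


total_ms, stages = parse_surface_line(SAMPLE_LINE)
preempt_ms = stages.get("Preempt", 0.0)
app_ms = total_ms - preempt_ms  # time not spent in the compositor

print(f"total:   {total_ms:.3f} ms")
print(f"preempt: {preempt_ms:.3f} ms ({preempt_ms / total_ms:.1%} of total)")
print(f"app:     {app_ms:.3f} ms")
```

Running this over several frames and tracking app_ms is one way to see whether a change moved your own rendering cost or only shifted where the compositor interrupted the frame.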

Conclusion

For more information on PIL, ovrgpuprofiler, and GPU Systrace, please check out our GPU Systrace and ovrgpuprofiler documentation.

This library and its accompanying tools are a work in progress. We plan to add more information and features, such as draw call metrics and additional render stages, in the near future. We’re also aware of an issue with Vulkan that creates unnecessary CPU waits when render stage tracing is used. Please comment below or post in the Developer Forums if you have any feedback, and stay tuned for more updates to the tools coming soon!
