ovrgpuprofiler is a low-level CLI tool that developers can use to read an assortment of real-time GPU metrics and perform render stage tracing. It is built to make this GPU profiling data available in a convenient, low-friction manner.
ovrgpuprofiler is included with the Oculus Quest runtime and lives on the device itself.
It is recommended that you open a shell via ADB on a connected Oculus Quest when using ovrgpuprofiler. If you are not using a shell, run each command in this topic as `adb shell <command>`.
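When scripting from the host instead of working in an interactive shell, the `adb shell <command>` form can be assembled programmatically. The following is a minimal Python sketch: the helper name `adb_shell_argv` is hypothetical and not part of any Oculus tooling, and it only builds the argument list rather than running anything.

```python
import shlex


def adb_shell_argv(command):
    """Return the host-side argv that runs one on-device command via adb."""
    # The whole on-device command is passed as a single argument so that any
    # quoting inside it (for example -r"4,5,6") is left for the device shell.
    return ["adb", "shell", command]


argv = adb_shell_argv("ovrgpuprofiler -m")
# Show the equivalent command line as it could be typed on the host.
print(shlex.join(argv))
```

With a device connected, the list could be passed to `subprocess.run(argv, capture_output=True, text=True)` to capture the tool's output.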
To list all supported real-time metrics and their ID numbers, enter `ovrgpuprofiler -m` from the command line when an Oculus Quest is connected via ADB.
The beginning of the output for this command looks like the following:
    47 metrics supported:
    1 Clocks / Second
    2 GPU % Bus Busy
    3 % Vertex Fetch Stall
    4 % Texture Fetch Stall
    5 L1 Texture Cache Miss Per Pixel
    6 % Texture L1 Miss
    7 % Texture L2 Miss
    8 % Stalled on System Memory
    9 Pre-clipped Polygons/Second
    10 % Prims Trivially Rejected
    11 % Prims Clipped
As an alternative, `ovrgpuprofiler -m -v` can be used to provide the same list with more verbose descriptions for each metric.
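If the metric list needs to be looked up from a script, the non-verbose output can be turned into an ID-to-name mapping. The sketch below assumes the tool prints one metric per line in the form `<ID> <name>` after the header line; the exact whitespace on a real device may differ, and `parse_metric_list` is a hypothetical helper name.

```python
import re


def parse_metric_list(output):
    """Map metric ID (int) to metric name from `ovrgpuprofiler -m` output."""
    metrics = {}
    for line in output.splitlines():
        # Assumed line shape: "<ID> <metric name>", e.g. "4 % Texture Fetch Stall".
        match = re.match(r"\s*(\d+)\s+(.+?)\s*$", line)
        if match is None or line.rstrip().endswith("metrics supported:"):
            continue  # skip blank lines and the "N metrics supported:" header
        metrics[int(match.group(1))] = match.group(2)
    return metrics


# First lines of the sample output shown in this topic.
sample_output = """47 metrics supported:
1 Clocks / Second
2 GPU % Bus Busy
3 % Vertex Fetch Stall
4 % Texture Fetch Stall
"""
metrics = parse_metric_list(sample_output)
print(metrics[4])  # % Texture Fetch Stall
```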
To retrieve data for a metric, the command takes the following format:
`ovrgpuprofiler -r<metric ID number>`
For example, to retrieve the metric % Texture Fetch Stall (ID number 4), enter `ovrgpuprofiler -r4`. Data is printed to the console every second until Ctrl-C is pressed.
You can also request multiple metrics at once by separating ID numbers with commas in a string, such as `ovrgpuprofiler -r"4,5,6"`. The following shows output from this command:

    $ ovrgpuprofiler -r"4,5,6"
    % Texture Fetch Stall : 2.449
    L1 Texture Cache Miss Per Pixel : 0.124
    % Texture L1 Miss : 20.338
    % Texture Fetch Stall : 2.369
    L1 Texture Cache Miss Per Pixel : 0.122
    % Texture L1 Miss : 20.130
    % Texture Fetch Stall : 2.580
    L1 Texture Cache Miss Per Pixel : 0.127
    % Texture L1 Miss
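Because the tool prints a new sample every second, it can help to collect the samples per metric and summarize them. The following Python sketch assumes each sample appears as `<metric name> : <value>` on its own line, as in the output above; `parse_realtime` is a hypothetical helper, and lines without a numeric value (such as the echoed command or a sample cut off mid-line) are skipped.

```python
from collections import defaultdict
from statistics import mean


def parse_realtime(output):
    """Group `ovrgpuprofiler -r` samples into {metric name: [values]}."""
    series = defaultdict(list)
    for line in output.splitlines():
        # Split on the last colon; metric names themselves contain no colon
        # in the sample output shown in this topic.
        name, separator, value = line.rpartition(":")
        if not separator:
            continue  # not a "name : value" line
        try:
            number = float(value)
        except ValueError:
            continue  # value missing or not numeric
        series[name.strip()].append(number)
    return dict(series)


sample_output = """$ ovrgpuprofiler -r"4,5,6"
% Texture Fetch Stall : 2.449
L1 Texture Cache Miss Per Pixel : 0.124
% Texture L1 Miss : 20.338
% Texture Fetch Stall : 2.369
L1 Texture Cache Miss Per Pixel : 0.122
% Texture L1 Miss : 20.130
"""
series = parse_realtime(sample_output)
for metric_name, values in series.items():
    print(f"{metric_name}: mean {mean(values):.3f} over {len(values)} samples")
```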
Note: It is not recommended to request more than 30 real-time metrics at the same time.
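A script that assembles the `-r` argument can check the requested metric count against this guideline before running anything. This is a sketch only: `realtime_command` is a hypothetical helper, and treating the 30-metric recommendation as a hard error is a design choice made here, not behavior of ovrgpuprofiler itself.

```python
MAX_RECOMMENDED_METRICS = 30  # guideline from the note above


def realtime_command(metric_ids):
    """Build an `ovrgpuprofiler -r"..."` command line from metric IDs."""
    # Drop duplicate IDs while keeping the requested order.
    ids = list(dict.fromkeys(int(metric_id) for metric_id in metric_ids))
    if not ids:
        raise ValueError("at least one metric ID is required")
    if len(ids) > MAX_RECOMMENDED_METRICS:
        raise ValueError(
            f"{len(ids)} metrics requested; more than "
            f"{MAX_RECOMMENDED_METRICS} at once is not recommended"
        )
    return 'ovrgpuprofiler -r"{}"'.format(",".join(str(i) for i in ids))


print(realtime_command([4, 5, 6]))  # ovrgpuprofiler -r"4,5,6"
```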
ovrgpuprofiler supports render stage GPU tracing on a tile-per-tile level. Unlike direct-mode GPUs, which execute draw calls sequentially, tile-based renderers batch the draw calls for an entire surface. The surface is then split into tiles that are processed sequentially, and each tile executes all the draw calls that touch it.
ovrgpuprofiler can tell you how much time was spent in each rendering stage for each surface rendered during a trace’s duration.
Tracing on a tile-per-tile level requires the GPU context for the app being traced to be put into detailed GPU profiling mode. To set the OS to start subsequent apps in detailed GPU profiling mode, enter `ovrgpuprofiler -e`.
If an app is running when the command is entered, it must be restarted for its GPU context to be changed to detailed GPU profiling mode.
`ovrgpuprofiler -i` shows whether detailed GPU profiling mode is enabled, and `ovrgpuprofiler -d` disables it.
In addition, apps being used with ovrgpuprofiler must have the `<uses-permission android:name="android.permission.INTERNET" />` permission in their manifest.
Note: Detailed GPU profiling incurs an approximately 10% overhead in GPU rendering times. Keep this overhead in mind when reading trace output.
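As a rough way to account for this overhead, a traced time can be scaled back to an estimate of the time without detailed profiling. The sketch below assumes the overhead is a uniform 10% multiplier on every measured time, which is a simplification; the real overhead is approximate and may vary by workload. `estimate_unprofiled_ms` is a hypothetical helper.

```python
DETAILED_MODE_OVERHEAD = 0.10  # approximate figure from the note above


def estimate_unprofiled_ms(traced_ms, overhead=DETAILED_MODE_OVERHEAD):
    """Estimate the render time without detailed profiling enabled."""
    # traced = unprofiled * (1 + overhead)  =>  unprofiled = traced / (1 + overhead)
    return traced_ms / (1.0 + overhead)


# A surface traced at 5.08 ms would be estimated at roughly 4.6 ms.
print(f"{estimate_unprofiled_ms(5.08):.2f} ms")
```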
To execute a 100 ms trace on the currently running app, enter `ovrgpuprofiler -t`.
Trace length can be specified in seconds by including a number with the `-t` argument. For example, `ovrgpuprofiler -t1.2` would run a trace for 1.2 seconds.
The output of the trace is printed to the console, listing the surfaces rendered during the trace along with render stage information.
Lines from the trace output look like the following:
Surface 1 | 1216x1344 | color 32bit, depth 24bit, stencil 0 bit, MSAA 4 | 60 128x224 bins | 5.08 ms | 130 stages : Binning : 0.623ms Render : 1.877ms StoreColor : 0.309ms Blit : 0.002ms Preempt : 1.286ms
This shows that Surface 1 has a resolution of 1216x1344, 32-bit color, 24-bit depth, and uses 4x MSAA. The surface was broken down into 60 tiles (bins) of 128x224 each, and it took 5.08 ms to render in total. There were 130 render stage executions in the process, and the remaining data states how much time was spent in each render stage. Note that not every render stage will be present for each surface.
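The fields described above can be pulled out of a surface line with a small parser. The following Python sketch is written against the single sample line shown in this topic and assumes every surface line follows the same field order and separators; `parse_surface_line` and the dictionary keys are hypothetical names.

```python
import re


def parse_surface_line(line):
    """Split one trace surface line into its header fields and stage times."""
    header, _, stage_text = line.partition(" stages :")
    fields = [field.strip() for field in header.split("|")]
    width, height = (int(v) for v in fields[1].split("x"))
    bin_count, bin_size = fields[3].split()[:2]
    bin_width, bin_height = (int(v) for v in bin_size.split("x"))
    # Each stage is reported as "<Name> : <time>ms".
    stages = {
        name: float(ms)
        for name, ms in re.findall(r"(\w+)\s*:\s*([\d.]+)\s*ms", stage_text)
    }
    return {
        "surface": int(fields[0].split()[1]),
        "width": width,
        "height": height,
        "format": fields[2],
        "bin_count": int(bin_count),
        "bin_width": bin_width,
        "bin_height": bin_height,
        "total_ms": float(fields[4].split()[0]),
        "stage_count": int(fields[5]),
        "stages": stages,
    }


sample_line = (
    "Surface 1 | 1216x1344 | color 32bit, depth 24bit, stencil 0 bit, MSAA 4 "
    "| 60 128x224 bins | 5.08 ms | 130 stages : Binning : 0.623ms "
    "Render : 1.877ms StoreColor : 0.309ms Blit : 0.002ms Preempt : 1.286ms"
)
info = parse_surface_line(sample_line)
# Report the most expensive stage for this surface.
slowest = max(info["stages"], key=info["stages"].get)
print(info["total_ms"], slowest)  # 5.08 Render
```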
On Oculus Quest, ovrgpuprofiler will output one surface line per slice for multiview apps. This means that there will be one surface for each eye. You must add the render times of the two eye surfaces to get the total frame time.
On Oculus Quest 2, however, ovrgpuprofiler will output one surface line for both views of the surface, due to how the Adreno 650 GPU processes multiview commands (Hardware Multiview). On Quest 2, bins of multiview surfaces are shared between both views, so `135 96x176 bins` on a trace should be interpreted as `135 96x176x2 bins`.
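The difference between the two devices can be captured in a pair of small helpers. This is a sketch under the rules stated above: the function names and the `hardware_multiview` flag are hypothetical, and the render times passed in the example are made-up illustrative values, not measurements.

```python
def frame_render_ms(surface_ms, hardware_multiview):
    """Combine multiview surface render times (ms) into one frame time."""
    if hardware_multiview:
        # Quest 2: a single surface line already covers both views.
        if len(surface_ms) != 1:
            raise ValueError("expected one surface line for both views")
        return surface_ms[0]
    # Quest: one surface line per eye, so the two times are added.
    if len(surface_ms) != 2:
        raise ValueError("expected one surface line per eye")
    return surface_ms[0] + surface_ms[1]


def bins_label(count, width, height, hardware_multiview):
    """Describe the bins of a multiview surface as they should be read."""
    views = "x2" if hardware_multiview else ""
    return f"{count} {width}x{height}{views} bins"


print(frame_render_ms([5.0, 4.5], hardware_multiview=False))  # 9.5
print(bins_label(135, 96, 176, hardware_multiview=True))  # 135 96x176x2 bins
```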
Render stages that appear include the following:
The following are the recommended command-line arguments available for use with ovrgpuprofiler:
| Argument | Description |
|---|---|
| `-r`/`--realtime` | Prints the value of the real-time metrics every second. Accepts an optional comma-separated list of metric IDs to track. |
| `-m`/`--metrics` | Prints the list of available real-time metric IDs with their names and descriptions. |
| `-v`/`--verbose` | Adds more detailed information to most other commands. |
| `-e`/`--enable-detailed` | Enables detailed profiling mode on the GPU driver; required for render stage tracing. Only applies to applications started after this mode is enabled. |
| `-i`/`--is-detailed` | Queries whether the GPU driver is in detailed profiling mode. |
| `-t`/`--trace` | Executes a render stage trace, with an optional trace length in seconds as its argument. |