Draw Call Metrics

Draw call metrics can be retrieved by using RenderDoc for Oculus to perform a draw call trace on a frame capture. The following table lists currently available metrics.

Unless a metric's description states otherwise, percentage values are relative to the clock cycles of the draw call.

| Metric | Description |
| --- | --- |
| Clocks | Number of GPU clocks that elapsed while a draw call was being executed. If a draw call does not touch a tile, no GPU clocks are added for rendering that tile, although the tile still incurs a small fixed setup cost. |
| % Vertex Fetch Stall | Percentage of clock cycles where the GPU cannot make any more requests for vertex data. A high value for this metric implies the GPU cannot get vertex data from memory fast enough, and rendering performance may be negatively affected. |
| % Texture Fetch Stall | Percentage of clock cycles where the shader processors cannot make any more requests for texture data. A high value for this metric implies the shaders cannot get texture data from the texture pipe (L1 cache, L2 cache, or memory) fast enough, and rendering performance may be negatively affected. |
| L1 Texture Cache Miss Per Pixel | Average number of Texture L1 cache misses per pixel. Lower values for this metric imply better memory coherency. If this value is high, consider using compressed textures or reducing texture usage. |
| % Texture L1 Miss | Number of L1 texture cache misses divided by L1 texture cache requests. This metric does not consider how many texture requests are made per time period; it is a simple miss-to-request ratio. |
| % Texture L2 Miss | Number of L2 texture cache misses divided by L2 texture cache requests. This metric does not consider how many texture requests are made per time period; it is a simple miss-to-request ratio. |
| % Stalled on System Memory | Percentage of draw call cycles during which the L2 cache is stalled while waiting for data from system memory. |
| Pre-clipped Polygon | Number of polygons submitted to the GPU before any hardware clipping. |
| % Prims Trivially Rejected | Percentage of primitives that are trivially rejected. A primitive can be trivially rejected if it is outside the visible region of the render surface. These primitives are ignored by the rasterizer. |
| % Prims Clipped | Percentage of primitives clipped by the GPU, where new primitives are generated. For a primitive to be clipped, it must have a visible portion inside the viewport, but extend outside the "guardband", which is an area that surrounds the viewport and significantly reduces the number of primitives the hardware has to clip. |
| Average Vertices / Polygon | Average number of vertices per polygon. This will be around 3 for triangles, and close to 1 for triangle strips. |
| Reused Vertices / Second | Number of vertices used from the post-transform vertex buffer cache. A vertex may be used in multiple primitives; a high value for this metric (compared to the number of vertices shaded) indicates good reuse of transformed vertices, which reduces vertex shader workload. |
| Average Polygon Area | Average number of pixels per polygon. Adreno's binning architecture will count a primitive for each bin it covers, so this metric may not exactly match expectations. |
| % Shaders Busy | Percentage of time that all shader cores are busy. Generally, the shader is considered busy any time it is processing a draw, so stall cycles count as busy cycles, as do cycles where the shader core is making actual progress. The shader core will often not be busy during memory load and store operations. There are also times at the beginning of a bin when the GPU is busy getting a draw call set up (processing state, fetching vertex indices, fetching vertices) but the shader does not yet have vertices or fragments to process. |
| Vertices Shaded | Number of vertices submitted to the shader engine. |
| Fragments Shaded | Number of fragments submitted to the shader engine. |
| Vertex Instructions | Total number of scalar vertex shader instructions issued. Includes full precision ALU vertex instructions and EFU vertex instructions. Does not include medium precision instructions, since they are not used for vertex shaders. Does not include vertex fetch or texture fetch instructions. The GPU ALU/EFU hardware counters count scalar instructions. Vector operations are counted as multiple scalar operations. |
| Fragment Instructions | Total number of fragment shader instructions issued. Reported as full precision scalar ALU instructions, where 2 medium precision instructions equal 1 full precision instruction. Also includes interpolation instructions (which are executed on the ALU hardware) and EFU (Elementary Function Unit) instructions. Does not include texture fetch instructions. The GPU ALU/EFU hardware counters count scalar instructions. Vector operations are counted as multiple scalar operations. |
| Fragment ALU Instructions (Full) | Total number of full precision fragment shader instructions issued. Does not include medium precision instructions or texture fetch instructions. |
| Fragment ALU Instructions (Half) | Total number of half precision scalar fragment shader instructions issued. Does not include full precision instructions or texture fetch instructions. The Oculus Quest supports high precision (32-bit) and medium precision (16-bit) operations. If you specify "lowp" in your shader, it maps to a 16-bit operation and is counted by the Half precision counters. |
| Fragment EFU Instructions | Total number of scalar fragment shader Elementary Function Unit (EFU) instructions issued. These include math functions like sin, cos, pow, and so on. |
| Textures / Vertex | Average number of textures referenced per vertex. |
| Textures / Fragment | Average number of textures referenced per fragment. |
| ALU / Vertex | Average number of vertex scalar shader ALU instructions issued per shaded vertex. Does not include fragment shader instructions. |
| ALU / Fragment | Average number of scalar fragment shader ALU instructions issued per shaded fragment, expressed as full precision ALUs (2 mediump = 1 highp). Includes interpolation instructions. Does not include vertex shader instructions. |
| EFU / Fragment | Average number of scalar fragment shader EFU instructions issued per shaded fragment. Does not include vertex EFU instructions. |
| EFU / Vertex | Average number of scalar vertex shader EFU instructions issued per shaded vertex. Does not include fragment EFU instructions. |
| % Time Shading Fragments | Amount of time spent shading fragments compared to the total time spent shading everything. |
| % Time Shading Vertices | Amount of time spent shading vertices compared to the total time spent shading everything. |
| % Time Compute | Amount of time spent in compute work compared to the total time spent shading everything. |
| % Shader ALU Capacity Utilized | Percent of maximum shader capacity (ALU operations) utilized. For each cycle that the shaders are working, the average percentage of the total shader ALU capacity that is utilized for that cycle. The ALUs are a large SIMD array that processes many vertices or fragments at a time. If all the ALU elements in the array are active in one cycle, that ALU is operating at full capacity. However, there are times when not every ALU element is active. In the case of very small triangles, for example, the way the GPU allocates work will leave some of the (fragment) ALU elements empty. Or, if some fragments pass the z test and some nearby fragments fail, there may also be some empty slots in the ALU array. This metric attempts to convey how efficiently the workload is running on the ALUs. If the ALU capacity utilized is near 100%, the ALUs are working as efficiently as they can, with every entry in the SIMD doing useful work every cycle. If this metric is low, there are empty slots in the ALU SIMD and the system is not running as efficiently as it could. Note, however, that low ALU utilization isn’t necessarily bad. It can be low just because there isn’t a lot of ALU work compared to other work (such as texture work). |
| % Time ALUs Working | Percentage of time the ALUs are working while the shaders are busy. |
| % Time EFUs Working | Percentage of time the EFUs are working while the shaders are busy. |
| % Nearest Filtered | Percent of texels filtered using the "nearest" sampling method. |
| % Linear Filtered | Percent of texels filtered using the "linear" sampling method. |
| % Anisotropic Filtered | Percent of texels filtered using the anisotropic sampling method. |
| % Non-Base Level Textures | Percent of texels coming from a non-base MIP level. |
| Read Total (Bytes) | Total number of bytes read by the GPU from memory. This represents the total amount of data read by the GPU, regardless of which block requested the memory. |
| Write Total (Bytes) | Total number of bytes written by the GPU to memory. This represents the total amount of memory written by the GPU during the sample period, regardless of which GPU block was doing the writing. |
| Texture Memory Read BW (Bytes) | Bytes of texture data read from memory. This represents data requested by the texture pipes for any type of operation (vertex textures, fragment textures, compute operations that read a texture). |
| Vertex Memory Read (Bytes) | Bytes of vertex data read from memory. This represents data read by the vertex processing pipeline besides texture data (vertex positions, attributes). |
| SP Memory Read (Bytes) | Bytes of data read from memory by the shader processors. This represents data requested by the shader processor through an explicit load type operation. |
| Avg Bytes / Fragment | Average number of bytes transferred from main memory for each fragment. More accurately, this is the average amount of texture data read per fragment: the texture memory read divided by the number of fragments shaded. This is not a particularly precise metric, but for many graphics use cases it provides a relatively accurate picture of how much texture data is required (on average) for each pixel. |
| Avg Bytes / Vertex | Average number of bytes transferred from main memory for each vertex. This metric divides Vertex Memory Read (Bytes), described above, by the number of vertices shaded. |
| Avg Preemption Delay | Average time, in microseconds, from the preemption request to preemption start. This is an average because preemption can happen more than once. In practice, it is unlikely that the same draw call will be preempted multiple times. However, the same set of metrics is used at various levels of granularity, so multiple preemptions are more likely when this metric is measured over a longer time period. |
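
Several of the metrics above are ratios of other counters. The following Python sketch illustrates how those ratios relate to one another, based only on the descriptions in the table. It is not part of RenderDoc for Oculus: the `DrawCallCounters` class, its field names, and the sample values are hypothetical, and the full-precision-equivalent calculation leaves out the interpolation and EFU instructions that the real Fragment Instructions and ALU / Fragment metrics also count.

```python
from dataclasses import dataclass


@dataclass
class DrawCallCounters:
    """Hypothetical raw counters for one draw call (illustrative names only)."""
    l1_texture_requests: int
    l1_texture_misses: int
    texture_memory_read_bytes: int
    vertex_memory_read_bytes: int
    fragments_shaded: int
    vertices_shaded: int
    fragment_alu_full: int   # full precision scalar instructions
    fragment_alu_half: int   # half (medium) precision scalar instructions


def safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when a counter is zero."""
    return numerator / denominator if denominator else 0.0


def texture_l1_miss_percent(c: DrawCallCounters) -> float:
    # Simple miss-to-request ratio, independent of the request rate.
    return safe_div(100.0 * c.l1_texture_misses, c.l1_texture_requests)


def avg_bytes_per_fragment(c: DrawCallCounters) -> float:
    # Texture memory read divided by the number of fragments shaded.
    return safe_div(c.texture_memory_read_bytes, c.fragments_shaded)


def avg_bytes_per_vertex(c: DrawCallCounters) -> float:
    # Vertex memory read divided by the number of vertices shaded.
    return safe_div(c.vertex_memory_read_bytes, c.vertices_shaded)


def full_precision_equivalent(full: int, half: int) -> float:
    # Two medium precision instructions count as one full precision one.
    return full + half / 2


def alu_per_fragment(c: DrawCallCounters) -> float:
    # Simplification: interpolation instructions are not modeled here.
    total = full_precision_equivalent(c.fragment_alu_full, c.fragment_alu_half)
    return safe_div(total, c.fragments_shaded)


if __name__ == "__main__":
    # Made-up sample values, not taken from a real capture.
    sample = DrawCallCounters(
        l1_texture_requests=1000,
        l1_texture_misses=50,
        texture_memory_read_bytes=40000,
        vertex_memory_read_bytes=9600,
        fragments_shaded=10000,
        vertices_shaded=300,
        fragment_alu_full=60000,
        fragment_alu_half=40000,
    )
    print("% Texture L1 Miss:", texture_l1_miss_percent(sample))
    print("Avg Bytes / Fragment:", avg_bytes_per_fragment(sample))
    print("Avg Bytes / Vertex:", avg_bytes_per_vertex(sample))
    print("ALU / Fragment (approx.):", alu_per_fragment(sample))
```

Guarding each division with `safe_div` is a design choice for this sketch: a draw call that shades no fragments or issues no texture requests would otherwise raise a division error.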