Dive in to learn how the Adreno Offline Compiler (AOC) can help you analyze and optimize material shader performance.
Background
Before AOC was integrated, Unreal developers had no fast, iterative way to evaluate and improve material shader performance offline for Meta Quest. As a workaround, they relied on two common methods:
Shader Stats in Unreal Engine: Meta Quest headsets run on Adreno GPUs, but this tool only applies to Mali GPUs. The only stat available is instruction count, which is a poor proxy for real shader performance because different types of instructions have very different costs. For example, there’s a large performance difference between ALU instructions and memory access instructions.
Shader Stats in RenderDoc: To get more accurate shader performance statistics, Quest developers have used RenderDoc to collect online stats. However, this method requires RenderDoc expertise, which can add to development time. It also makes material iteration slow: launching samples on the device takes time, and RenderDoc stats can only be retrieved from the online GPU.
With the release of AOC, Quest developers can now access rich shader stats and collect them offline with minimal effort, greatly shortening material iteration time.
Adreno Offline Compiler (AOC) Introduction
As shown in the picture above, AOC provides rich shader stats for different shader types (VS, FS, binning VS, etc.) directly inside the UE5 Material Editor. AOC also supports various command-line options, including GPU architecture targets, view mask, graphics APIs, and more, which gives developers and game engines greater flexibility. Note: All AOC stats are explained in detail in OfflineCompiler.html, which is installed together with the binaries.
AOC is now integrated into Unreal Engine’s Shader Stats window, so developers can easily access and export all shader stats. Please visit our documentation for usage details.
We recommend using AOC to track performance trends: change a material’s properties, then check how the shader stats change. We don’t recommend using AOC to compare materials whose shaders are very different, for example, when one material has flow control and many memory accesses while the other has high register pressure. In such cases, it’s very difficult to estimate the performance difference from AOC’s stats alone.
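The trend-based workflow can be sketched as a comparison of two stat snapshots for the same material. This is a hypothetical Python helper, not part of AOC or Unreal Engine: it assumes you have already copied or exported the relevant stats into dictionaries, and that every stat passed in is one where a lower value is better. If any stat moves in the opposite direction from the others, the result is "mixed", which is the case where AOC alone shouldn’t be used to judge performance.

```python
from typing import Dict

# Hypothetical helper, not part of AOC. Assumption: every stat passed in is
# "lower is better" (instruction counts, register footprint, ...). Leave out
# stats where higher is better, such as "ALU fiber occupancy percentage".


def stat_deltas(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Change (after - before) for every stat reported in both snapshots."""
    return {name: after[name] - before[name] for name in set(before) & set(after)}


def classify_trend(before: Dict[str, float], after: Dict[str, float]) -> str:
    """Return "improved", "regressed", "unchanged", or "mixed"."""
    deltas = stat_deltas(before, after).values()
    better = any(d < 0 for d in deltas)
    worse = any(d > 0 for d in deltas)
    if better and worse:
        return "mixed"  # e.g., fewer texture reads but more flow control
    if better:
        return "improved"
    if worse:
        return "regressed"
    return "unchanged"
```

The important design decision is which stats to pass in. Including a stat that barely matters for the material (for example, a small ALU change next to a large drop in “Texture read instruction count”) turns a clear improvement into "mixed", so select the stats most relevant to the material being changed.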
Getting Started: Setting Up Your Environment
To demonstrate AOC’s effectiveness, we’ll go over several use cases below in which we change a material’s properties and then check the performance trend using AOC shader stats, RenderDoc shader stats, and the real-time App time. Follow the links below to set up your environment (all use cases were run using UE5, AOC 1.3, and Meta Quest 2):
AOC: Follow our AOC documentation to set up AOC and material stats.
Note: We use AOC 1.3 because RenderDoc reports the same stats, which makes it easier to compare results between the two tools (AOC 1.3 has a stat called “Shader processor utilization percentage,” while AOC 1.4 has “ALU fiber occupancy percentage”; the two are similar but not identical). For real-time GPU measurement (example output below), we use only the App time for each use case.
04-16 23:21:46.004 475 2186 I VrApi : FPS=72/72,Prd=29ms,Tear=0,Early=0,Stale1/2/5/10/max=0/0/0/0/0,VSnc=0,Lat=-1,Fov=0D,CPU4/GPU=4/3,1478/490MHz,OC=FF,TA=0/0/0,SP=N/N/N,Mem=2092MHz,Free=3030MB,PLS=0,Temp=33.7C/0.0C,TW=0.00ms,App=3.86ms,GD=0.00ms,CPU&GPU=10.07ms,LCnt=1(DR0,LM0),GPU%=0.28,CPU%=0.28(W0.32),DSF=1.00,CFL=19.79/21.58
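Because the comparisons below rely on App time, it can help to extract that field from VrApi log lines programmatically instead of reading it by eye. The following is a minimal Python sketch, assuming each line follows the `App=<value>ms` format shown in the sample line above; the function names are hypothetical and not part of AOC or the VrApi tooling. Averaging over several lines (for example, lines captured with `adb logcat -s VrApi`) smooths out frame-to-frame variation.

```python
import re
from typing import Iterable, Optional

# Matches the "App=<value>ms" field of a VrApi performance line.
# Assumption: the field is formatted as in the sample log line above.
APP_TIME_RE = re.compile(r"\bApp=(\d+(?:\.\d+)?)ms\b")


def parse_app_time_ms(log_line: str) -> Optional[float]:
    """Return the App time in milliseconds, or None if the line has no App field."""
    match = APP_TIME_RE.search(log_line)
    return float(match.group(1)) if match else None


def mean_app_time_ms(log_lines: Iterable[str]) -> Optional[float]:
    """Average App time over all lines that report one (None if none do)."""
    times = [t for t in map(parse_app_time_ms, log_lines) if t is not None]
    return sum(times) / len(times) if times else None


# Usage with a shortened copy of the sample line above.
sample = "04-16 23:21:46.004 475 2186 I VrApi : FPS=72/72,TW=0.00ms,App=3.86ms,GD=0.00ms"
print(parse_app_time_ms(sample))  # 3.86
```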
Use Cases
The four use cases below demonstrate how you can use AOC to analyze and optimize material shader performance.
- Reduce Texture Operations: The image below shows that three texture operations in Material v1.0 are replaced by a single one in Material v2.0, and that two other texture operations are also replaced by one.
- AOC Shader Stats Delta: The shader stats below are from AOC, with Material v1.0 on the left and v2.0 on the right. Both “Texture read instruction count” and “Long latency sync instruction count” decreased, which signals improved performance. Note: Some other stats (e.g., ALU) changed as well but are unlikely to affect performance as significantly.
- Validation
- App Time: App time decreased from 4.04ms for Material v1.0 to 3.86ms for Material v2.0, confirming that AOC’s stats trend is correct.
- RenderDoc Shader Stats Delta: The RenderDoc stats below show a similar trend to those from AOC. Note that the absolute values of each stat may differ between RenderDoc and AOC. For example, RenderDoc reports an “Instruction count all” of 375, while AOC reports 359.
- Change Precision: The results below were generated by creating a material based on M_Water_Ocean_Math and then switching it to full precision. The left side shows the original material and the right side shows the full-precision version. As a result, the 32-bit ALU instruction count increased substantially and the 16-bit counts dropped sharply. Other stats, such as complex instructions, register footprint, and “Shader processor utilization percentage,” follow the same trend (worse performance).
- AOC Shader Stats Delta
- Validation
- App Time: App time increased from 4.8ms for Regular Precision to 5.95ms for Full Precision, confirming that AOC’s stats trend is correct.
- RenderDoc Shader Stats Delta: The RenderDoc stats below show a similar trend to those from AOC.
- Replace Complex Math Operations with Simple Ones: In the image below, we replaced some complex math operations with simple ones in the vertex shader.
- AOC Shader Stats Delta: The complex instruction count decreases and the ALU instruction count increases. Since ALU instructions are normally much cheaper than complex instructions, we’re fairly confident that performance improves.
- Validation
- App Time: App time decreased from 4.41ms for Material v1.0 to 4.38ms for Material v2.0, confirming that AOC’s stats trend is correct.
- RenderDoc Shader Stats Delta: The RenderDoc stats below show a similar trend to those from AOC. Unlike AOC’s stats, where “ALU instruction count 32bit” moved against the overall trend, these stats contain no conflicting signals: all of them indicate improved performance. This shows that AOC and the online GPU compiler can sometimes disagree slightly on the trend of individual stats.
- Replace Simplex Noise Function with Voronoi: In the image below, we replace a material’s Simplex noise function with a Voronoi noise function.

- AOC Shader Stats Delta: “Total instruction count” and “Texture read instruction count” are significantly reduced, and the register footprint also shrinks. However, “Flow control instruction count” is higher. This is a mixed signal: it’s almost impossible to weigh the performance penalty of the extra branches against the gains from fewer texture instructions. We therefore shouldn’t draw a conclusion about the performance trend from this stats comparison; instead, we should collect real-time performance data. If we open the generated SPIR-V, we can see that these branches come mainly from loops. If one shader has one more loop than another, their performance could differ enormously.
- Validation
- App Time: App time increased from 26.8ms for Simplex noise to 124ms for Voronoi noise, confirming that AOC’s mixed signal could not be used to estimate the performance trend in this case.
- RenderDoc Shader Stats Delta: The RenderDoc stats below differ greatly from AOC’s (especially “Total instruction count”). The main reason may be a difference in branch optimization between the online GPU compiler and the offline one, which lacks loop-related contextual information. As shown below, the only major performance penalty comes from flow control, and the App time above shows how large that penalty is.
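To compare the App-time changes across the four use cases on a common scale, each one can be expressed as a relative change. The sketch below uses only the App times reported in the use cases above; the dictionary keys are illustrative labels.

```python
# App times (ms) reported in the use cases above, as (before, after).
APP_TIMES_MS = {
    "Reduce Texture Operations": (4.04, 3.86),
    "Change Precision": (4.8, 5.95),
    "Replace Complex Math Operations": (4.41, 4.38),
    "Replace Simplex Noise with Voronoi": (26.8, 124.0),
}


def relative_change_pct(before: float, after: float) -> float:
    """Percentage change from before to after (negative means faster)."""
    return (after - before) / before * 100.0


for name, (before, after) in APP_TIMES_MS.items():
    print(f"{name}: {before} ms -> {after} ms ({relative_change_pct(before, after):+.1f}%)")
```

Running it prints one line per use case, showing, for instance, that the precision change costs roughly 24% more App time, while the complex-math change moves App time by less than 1%.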
For all of the use cases above, we focused on the stats common to AOC and RenderDoc to provide an apples-to-apples comparison of AOC’s effectiveness. Even for these common stats, however, the absolute values reported by AOC and RenderDoc may differ, because the online GPU compiler (used by RenderDoc) works with additional driver data on the fly and performs many more optimizations than the offline AOC.
Adreno Offline Compiler Limitations
The use cases above demonstrate how you can better understand and optimize your apps using AOC. No single stat can tell you whether a material’s performance trend is good or bad; instead, focus on the shader stats most relevant to each material and to your app.
At a high level, parallelism and memory access are two critical factors in shader performance. Register pressure can heavily limit parallelism, while memory accesses can introduce major latency if that latency isn’t hidden well. That is why “register footprint,” “ALU fiber occupancy percentage,” and memory-related stats (including texture stats) are often worth watching.
On the other hand, if the material uses complex instructions or flow control, both of which can carry large performance penalties, you may want to focus on “Complex instruction count” and “Flow control instruction count.” ALU instructions are normally very fast, which is one reason to be careful with “Total instruction count”: it can be misleading. A higher “Total instruction count” doesn’t necessarily mean worse performance, and a closer look at the other stats gives a better picture of the actual performance trend.
Unreal Engine Integration Status
Unreal Engine’s framework doesn’t support offline compilation of the entire shader pipeline (VS+FS) well. As a result, the AOC integration in Unreal Engine always compiles each shader (VS or FS) separately, and it assumes there are no cases where shaders could be optimized significantly by compiling the whole pipeline together (for example, when a VS output isn’t used by the FS). We also assume there are only VS and FS, with no other shader types; multiview is also supported.
Currently, the integration is only supported in Meta’s Unreal Engine fork. We’re working with Epic to upstream this feature to Unreal Engine.
Get Started
As shown in the examples above, we believe AOC is a powerful tool that can help you analyze and optimize material shader performance. Because it runs offline, AOC lets you greatly reduce material iteration time. To get started, check out our documentation.
For more news and updates for developers building on Meta Quest, be sure to follow us on Twitter and Facebook.