Performance Optimization Tools

This section covers the tools that you should use when tracking down performance problems.

This guide discusses several performance optimization tools in order to lay out the workflow, or decision tree, that you follow when triaging and resolving issues. When you work on a graphics problem, your application may crash, show a black screen, or run slowly, perhaps because of high latency. Many different conditions can produce the same symptoms, so you need to find the root causes behind what you are seeing. The tools introduced in this section let you quickly rule out large classes of problems, and then drill down to the underlying issues.

The following optimization tools help you to isolate and resolve performance problems:

Lost Frame Capture

This tool captures eye displays for any frames that your application drops, and lets you replay just those frames while viewing performance statistics and graphs. This helps you quickly get a sense of where your application is failing to maintain frame rate. In many cases, this may be enough for you to discern what the problem is. Lost Frame Capture is an offline analytical tool: you place the tool into capture mode, run your VR application, stop the capture mode, and then step through the dropped frame content offline, as desired. You can also export the captured content by saving it to an Oculus Debug Archive (ODA) file. This makes it easy to share the lost frame data with others, who can then reproduce the scenarios where frames are dropped and help to diagnose the underlying issues. For more information, see Lost Frame Capture.

NVIDIA Frame Capture Analysis Tool for VR Games (FCAT VR)

FCAT VR is a tool provided by NVIDIA, but it can be used with any GPU hardware. One component of the FCAT VR Capture tool executes on your PC and uses event tracing data that is generated by Event Tracing for Windows (ETW). (The process of capturing ETW event tracing data is described in Tutorial: Optimizing a Sample Application in this guide.) A second component of FCAT VR imports the event tracing data that is captured by ETW and displays charts that help you to analyze frame timing, dropped frames, warped frames, synthesized frames (Asynchronous SpaceWarp), reprojection, and other issues. For more information, see Frame Capture Analysis Tool for VR Games.

Oculus Debug Tool

This tool provides a live heads-up display, called the Oculus Performance HUD. This information display can help you to associate performance issues with specific contexts within your VR experience, and to experiment with those situations in real time. You can view statistics such as how many compositor frames are being dropped, how much time the application is taking to render frames, how much time the compositor is taking to render frames, and so forth.

The Oculus Performance HUD enables you to quickly rule out large classes of possible issues, since you don’t want to spend effort enhancing the performance of a component that doesn’t impact the overall performance of the application. For example, if your application is dropping frames, and the CPU profile shows a large burst of usage at some point during a frame cycle, then you don’t even want to consider GPU issues (unless the CPU is waiting for the GPU).
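The triage step described above can be sketched as a small classifier. This is only an illustration: the `FrameTiming` type, its field names, and the `budget_ms` parameter are hypothetical, and the sketch assumes you have already read per-frame CPU and GPU times off the HUD. It does not detect the case where the CPU time is inflated because the CPU is waiting for the GPU; that still requires a trace.

```python
from dataclasses import dataclass


@dataclass
class FrameTiming:
    """Per-frame timings in milliseconds (hypothetical field names)."""
    cpu_ms: float  # time the application spent on the CPU for this frame
    gpu_ms: float  # time the GPU spent rendering this frame


def classify_bottleneck(t: FrameTiming, budget_ms: float) -> str:
    """Return a coarse first guess at which side to investigate."""
    cpu_over = t.cpu_ms > budget_ms
    gpu_over = t.gpu_ms > budget_ms
    if cpu_over and gpu_over:
        return "both"  # neither side fits; a trace is needed to see who waits on whom
    if cpu_over:
        return "cpu"
    if gpu_over:
        return "gpu"
    # Both sides fit the budget; if frames still drop, look at scheduling.
    return "within-budget"
```

For example, with a budget of about 11.1 ms per frame, `classify_bottleneck(FrameTiming(cpu_ms=14.0, gpu_ms=6.0), 11.1)` returns `"cpu"`, which suggests leaving GPU work alone for the moment.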

Once you have made a determination such as this, you can do important lower-level analysis by taking an ETW trace and analyzing that trace with Windows Performance Analyzer (WPA) or GPUView. For example, suppose your application is CPU bound, and is stalling around physics processing and draw call submissions. Even though the application is CPU bound, the stall may be in draw call generation, in which case you may want to use more batching. A tool such as GPUView can help you to narrow down this type of issue very precisely, so that you know exactly what needs to change in your code. For more information, see Oculus Debug Tool and Performance Head-Up Display.

SDK Statistics

You can also call the SDK directly to obtain the same statistics displayed in the Oculus Performance HUD, and then utilize those statistics within your code as desired. For example, you could write specific statistics to a console output, or invoke a debugger when a certain condition arises. For more information, see SDK Performance Statistics.
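As a rough sketch of acting on such statistics in code, the function below reports the frames at which a dropped-frame counter increases. It assumes the counter is cumulative and that you poll it once per frame; the `(frame_index, cumulative_dropped_count)` sample shape is a hypothetical stand-in for values read from the SDK, not the SDK's own data layout.

```python
from typing import Iterable, Iterator, Tuple


def new_drops(samples: Iterable[Tuple[int, int]]) -> Iterator[Tuple[int, int]]:
    """Yield (frame_index, newly_dropped) whenever the cumulative counter rises.

    Each sample is (frame_index, cumulative_dropped_count); both are
    hypothetical stand-ins for values polled from the SDK.
    """
    previous = None
    for frame_index, dropped_total in samples:
        if previous is not None and dropped_total > previous:
            yield frame_index, dropped_total - previous
        previous = dropped_total


if __name__ == "__main__":
    polled = [(100, 0), (101, 0), (102, 2), (103, 2), (104, 3)]
    for frame, count in new_drops(polled):
        # A real application might log this or break into a debugger here.
        print(f"frame {frame}: {count} newly dropped")
```

The loop body is the place to write to a console or trigger a debugger break, as described above.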

Performance Profiler

This analytical tool produces graphs based on the same types of statistical data that are available directly from the SDK and within the Oculus Performance HUD. You might use these graphs if you need to analyze patterns that play out over a given time period. For more information, see Performance Profiler.

Event Tracing for Windows (ETW)

ETW is a trace utility for performance analysis. It collects event data while the VR application is running, and then saves that data to event trace log (.etl) files. Performance analysis using ETW is centered on the events generated by the Windows kernel, which provides extensive details about the operation of the system. ETW profiles the entire system, not just the GPU. The full ETW documentation is located here: https://msdn.microsoft.com/en-us/library/windows/desktop/bb968803.

In order to analyze the ETW traces, you will typically use either Windows Performance Analyzer (WPA), or GPUView, or both.

Windows Performance Analyzer (WPA)

WPA provides a top-down view of the trace output, including contextual information that helps you to understand the system load.

When WPA loads a trace, it shows that trace as a hierarchy of events that led to each other. For example, you might find that your application spends 90% of its time rendering a camera, and that 90% of that time is spent drawing bushes. Within the bush drawing, 80% of the time goes to rendering work for the foliage, and 30% of that foliage time is spent on alpha blending. In this example, you may decide that you need to change the shading so that alpha blending is computationally cheaper. You would expect this change to improve the performance of the application as a whole.
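To judge how much such a change can help overall, it is useful to convert the nested percentages into a share of total time. The sketch below does this for the example figures, assuming each percentage is relative to its parent in the hierarchy (an interpretation of the example, not something WPA reports directly).

```python
from math import prod

# Shares from the example, each relative to its parent in the hierarchy (assumption).
nested_shares = {
    "camera rendering": 0.90,
    "drawing bushes": 0.90,
    "foliage rendering": 0.80,
    "alpha blending": 0.30,
}


def absolute_share(shares) -> float:
    """Multiply nested shares to get the fraction of total time at the leaf."""
    return prod(shares)


alpha_blending_total = absolute_share(nested_shares.values())
print(f"{alpha_blending_total:.1%}")  # prints 19.4%
```

Under that assumption, alpha blending accounts for roughly a fifth of total time, so making it twice as cheap would recover a little under 10% of the whole.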

GPUView

GPUView provides a lower-level view of the trace output, and lacks the contextual information that WPA provides. However, GPUView provides insights into the interaction between the CPU and the GPU which cannot be obtained by using WPA. Since GPUView lacks contextual information, it can be difficult to locate specific points in your application’s life cycle. However, you can use the two tools in a complementary way. For example, you can zoom in with WPA and locate the exact VR frame index that you are interested in, and then use this information to locate the same frame within GPUView.

When you are using Unity or Unreal, you can take advantage of their built-in profilers to debug most bottlenecks. For example, if your shaders are too complex, or your scripts are running for too long, those profilers will help you to track down the issue. But if your application is exhibiting a lot of contention and synchronization issues, or if there is judder that isn’t explained by application utilization, GPUView lets you look more closely at the queuing in the system. This enables you to find inefficient policies or bad synchronization mechanisms. For example, a vertex buffer that is too large may cause page faults. In that case the GPU might be fetching from system memory, which might indicate that your meshes are too complex.

GPUView helps you to resolve the following types of issues:

  • Why is the application missing VSync intervals?
  • Are new surface allocations stalling the GPU and causing the frame stuttering problem we are observing?
  • Will optimizing the CPU code improve performance, or do we need to reduce the amount of work we send to the GPU?
  • Are we sending graphics tasks to the GPU early enough in the frame, or is the GPU idle while waiting on our CPU code?

ovrlog_win10

This script starts and stops ETW tracing sessions on Windows 10. (On previous versions of Windows, use ovrlog.) The ETW trace output is used as the input to WPA and GPUView. The ovrlog_win10 and ovrlog scripts call the xperf tool, which initiates the event capture process, sets up file paths, injects all the events that are relevant to Oculus applications, and then turns off event tracing when done.

ovrlog_win10 and ovrlog are designed to capture kernel-level and application-level events, including:

  • Application start times
  • Interrupt activity
  • System responsiveness issues
  • Application resource utilization
  • and more

ovrlog_win10 and ovrlog are located here:

%PROGRAMFILES%\Oculus\Support\oculus-diagnostics\ETW> ovrlog_win10.cmd

Both ovrlog and ovrlog_win10 are shipped with the Oculus runtime.

A VSync (vertical synchronization) is a timing point that prevents any changes to the display memory until after the display finishes its current refresh cycle. VSyncs are generally your most important navigational unit when optimizing VR applications.
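Because VSyncs are the unit of navigation, it helps to express frame durations in VSync intervals. The sketch below is a minimal illustration; the function names are hypothetical, and the 90 Hz refresh rate used in the usage note is an assumption that you should replace with your headset's actual rate.

```python
import math


def vsync_period_ms(refresh_hz: float) -> float:
    """Time between two consecutive VSyncs, in milliseconds."""
    return 1000.0 / refresh_hz


def vsyncs_consumed(frame_ms: float, refresh_hz: float) -> int:
    """Number of VSync intervals a frame occupies (1 means it made its VSync)."""
    return max(1, math.ceil(frame_ms / vsync_period_ms(refresh_hz)))
```

At an assumed 90 Hz the period is about 11.11 ms, so a 9 ms frame occupies one interval, while a 12.5 ms frame occupies two and therefore misses a VSync.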

The typical ovrlog workflow is:

  1. Generate an ETW trace while running the Oculus application.
  2. Load the trace file into WPA, and locate the frame index where the problem is occurring.
  3. Load the trace file into GPUView, and locate the VSync for the desired frame index.
  4. Zoom in around the frame index, perhaps showing a surrounding 10-frame interval.
  5. Compare the problem frame to a nearby healthy frame, and in particular examine how the frames are scheduling CPU and GPU resources.
  6. Match the CPU/GPU work against the functions that they are performing within your application (by clicking on the packets, and thereby highlighting the corresponding application-level work that they are performing).
  7. Infer from this what is causing the performance issue, such as a vertex buffer that is too large.
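Steps 4 and 5 can be sketched in code if you have noted per-frame durations by hand while inspecting the trace. Everything here is hypothetical: GPUView does not export data in this form, and the dictionary of durations, the function names, and the default radius of 5 frames are illustrative assumptions.

```python
from typing import Dict, Optional


def window_around(frame_index: int, radius: int = 5) -> range:
    """Frame indices in a roughly 10-frame interval centred on frame_index."""
    return range(max(0, frame_index - radius), frame_index + radius + 1)


def nearest_healthy(durations_ms: Dict[int, float], problem: int,
                    budget_ms: float, radius: int = 5) -> Optional[int]:
    """Closest frame in the window that finished within budget, or None."""
    candidates = [
        i for i in window_around(problem, radius)
        # Frames with no recorded duration are treated as unusable.
        if i != problem and durations_ms.get(i, float("inf")) <= budget_ms
    ]
    return min(candidates, key=lambda i: abs(i - problem), default=None)
```

With durations `{98: 10.0, 99: 15.0, 100: 22.0, 101: 14.0, 102: 9.5}` and a budget of 11.1 ms, `nearest_healthy(durations, 100, 11.1)` returns `98`, which is the frame you would then compare against frame 100 in GPUView.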

For a detailed walkthrough of this procedure, see the tutorial.