In order to create the best possible VR experience, it’s important to optimize your application or game for the Meta Quest platform. This can pose a challenge because the mobile GPU can only fill so many pixels and shade so many vertices, and the CPU can only perform so much computation. The good news is that these performance constraints have been discussed at length, so there is a known path for optimization. There’s another limiting factor to running software on Quest, however: memory. Unlike most desktop computers, Quest can’t swap memory to disk, because its storage is flash memory and reading and writing it that frequently would decrease the lifespan of the hardware. We’re also constrained in the amount of RAM available: compared to most modern PCs with 16 GB, 32 GB, or even more RAM, the Quest has 4 GB of RAM and the Quest 2 has 6 GB. As you can imagine, making sure your application stays within the limit will require some thought and effort, and the first step in getting there is determining how much memory you’re actually using.
How Is Memory Allocated?

Before we can measure how much memory an application is using, we need to understand how memory is used by the system. From a programming perspective, one simply asks the system for a chunk of memory using a call like `malloc()`. This gives you some number of bytes you can read and write until you’re done with them, at which point you call `free()`. It might seem like you could get the total memory usage of an application just by adding up all the `malloc()` calls, but there’s actually quite a bit more going on under the hood.
To explain why, let’s imagine that we have a system that allocates the exact amount of memory you request in sequential order. Our uninitialized memory might look like this:

Now let’s say our first allocation (A) is 5 bytes:

Then let’s say we make two more allocations, one of 7 bytes (B) and one of 4 bytes (C). Our memory would look like this:

Now let’s say we release B and allocate a new buffer of size 5 (D). Our system, trying to pack our data into as small a space as possible, will replace B with D, but as you can see, we’ve left a 2-byte hole that could only be filled by a 1- or 2-byte allocation:

If we kept going like this, eventually we’d be left with a bunch of holes of unused memory of varying sizes, and it would be a huge task to keep track of all these holes and their sizes. This state is called fragmentation, and it’s a tough problem to solve, but modern systems do their best to avoid it by being smarter about how they allow memory to be allocated.
In reality, our system does not return sequential blocks when `malloc()` is called. Instead, it returns memory in predefined block sizes, as determined by the allocator. The default allocator for Android’s libc is jemalloc. The allocation pattern for jemalloc is to assign memory in blocks with sizes that are powers of 2, starting with the smallest allocation size of 64 bits (8 bytes). When you ask jemalloc for any allocation from 1 to 8 bytes, it will reserve an 8-byte slot; from 9 to 16 bytes, it will reserve 16 bytes; from 17 to 32 bytes, it will reserve 32 bytes; and so on. The unused portion of the slot remains unused for the duration of the allocation. This means that when you release an allocation, the allocator knows it now has a full slot of a given size available, which makes it much easier to reuse.
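You can observe this slot rounding directly from native code with `malloc_usable_size()`, which reports how much memory the allocator actually set aside for a pointer. A minimal sketch (the exact size classes you’ll see depend on the allocator version):

```c
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h> /* malloc_usable_size() on Linux/Android */

int main(void) {
    size_t requests[] = {1, 5, 8, 9, 16, 17, 33};
    for (size_t i = 0; i < sizeof(requests) / sizeof(requests[0]); i++) {
        void *p = malloc(requests[i]);
        /* The allocator reserves a whole slot; the usable size shows
           how much was really set aside for this request. */
        printf("requested %zu bytes, slot size %zu bytes\n",
               requests[i], malloc_usable_size(p));
        free(p);
    }
    return 0;
}
```

On a jemalloc-based system, requests of 1 and 5 bytes should both report an 8-byte slot, a request of 9 bytes should report 16, and so on.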
Let’s revisit our allocation example from earlier, knowing that our allocator will assign in blocks. Starting again from empty:

After allocating A (5 bytes), our first slot will be populated with A:

Then after allocating B (7 bytes), and C (4 bytes), our second slot will have B, and our third C:

If we deallocate B, our second slot opens up again, allowing us to fill it with a new allocation D:

While more memory overall is allocated using this system, the allocator has a much easier time finding locations to allocate. It simply needs to keep track of whether a slot is used or unused, not the size of each allocation, the initial offset, etc. This makes allocations faster and more efficient.
At a system level, keeping track of all these small allocations would still be quite complex and lead to fragmentation, so the OS divides memory into larger, fixed-size blocks that it can distribute to applications, called ‘pages.’ Pages on Quest are 4096 bytes in size, which is typical for many modern operating systems. There will always be a whole number of pages allocated for a process at the system level. At the application level, how the pages are utilized is determined by the allocator. This means the allocator is responsible for determining how to divide pages into individual block slots, when these slots are available for use or reuse, and when to request new pages from the OS. The conditions that determine whether a new page will be needed for an allocation are somewhat opaque, which means it is almost impossible to predict precisely how many pages will be allocated for a given set of calls to `malloc()`.
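You can confirm the page size at runtime with the standard POSIX call (a minimal sketch):

```c
#include <stdio.h>
#include <unistd.h> /* sysconf() */

int main(void) {
    /* Query the system page size; on Quest this reports 4096. */
    long page_size = sysconf(_SC_PAGESIZE);
    printf("page size: %ld bytes\n", page_size);
    return 0;
}
```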
It’s also important to understand that when the system gives pages to the application, they’re provided with virtual addresses instead of physical addresses. This allows the system to manage physical RAM that is shared between processes, and on systems that do allow swapping to disk, lets some pages be removed from RAM entirely, invisibly to the process. It also avoids the issue of fragmenting physical memory: if an application asks for a large contiguous block of memory, the system can give it as many pages as it needs from anywhere in physical memory, as long as the virtual addresses are contiguous. It should be noted that it is very possible to fragment virtual address space, especially in 32-bit applications, where the total address space is only 4 GB. This is why 64-bit applications are required on the Quest platform.
PSS, RSS, USS, VSS

Once you understand how memory is physically allocated, it seems simple enough to calculate how much memory a process is using: just add up the number of pages. However, there is one complicating wrinkle (that exists for very good reason!). If multiple processes share a common library, instead of loading multiple copies of the same library into memory, the system loads it only once, which saves memory. It does change the story on how much memory each application actually uses, though. Say process A is loaded and has a shared library uncreatively named ‘sharedlib.so’ that takes 100 MB, and process A as a whole takes 1.1 GB. Let’s say there’s also a process B that takes 1.5 GB total, including ‘sharedlib.so’. Now if both processes A and B are loaded at the same time, the total memory usage of the system is only 2.5 GB, not 2.6 GB, because ‘sharedlib.so’ is only loaded once for both processes. So how much memory is each process using? You could say process A is using 1.1 GB and process B is using 1.5 GB, because that’s how much each would take up on its own. Or you could say process A is taking up 1.1 GB and process B is taking up 1.4 GB because it loaded second, which accurately represents the total memory usage of the system. Then if process A exits, process B would be taking up 1.5 GB.
To provide a consistent way of measuring the memory usage of each process, such that all processes’ memory adds up to the total system memory, we use something called ‘Proportional Set Size’, or PSS. PSS is computed by taking all the unique memory of a process, plus each piece of shared memory divided by the number of processes that share it. So for our example, PSS for process A is 1.05 GB and PSS for process B is 1.45 GB, which adds up to our true total memory, 2.5 GB. While the PSS number does not reflect the actual memory usage of either process A or B, it is useful in the context of adding up all memory used by the system. It’s also how Android determines whether an application is using too much memory and must be terminated: by measuring its PSS.
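Working through the example numbers (process A’s unique memory is its 1.1 GB total minus the 100 MB shared library, and ‘sharedlib.so’ is shared by two processes):

$$\mathrm{PSS}_A = 1.0\,\mathrm{GB} + \frac{100\,\mathrm{MB}}{2} = 1.05\,\mathrm{GB} \qquad \mathrm{PSS}_B = 1.4\,\mathrm{GB} + \frac{100\,\mathrm{MB}}{2} = 1.45\,\mathrm{GB}$$

And the two PSS values sum to the true system total: $1.05\,\mathrm{GB} + 1.45\,\mathrm{GB} = 2.5\,\mathrm{GB}$.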
There are other ways of measuring the memory usage of an application that are reported by the system as well. RSS, or ‘Resident Set Size’, is the total amount of physical memory that a process is using, including shared memory. USS, or ‘Unique Set Size’, is the amount of unique memory a process is using (which does not count any shared memory). Finally, ‘Virtual Set Size’, or VSS, is the total of all virtually allocated memory, whether it is mapped to physical memory or not.
While PSS is used by the system to determine whether an application should be killed, it’s less useful for debugging and tuning. RSS, while not representative of the exclusive memory usage of an application, is good for tracking an application’s total memory usage and spotting when additional allocations are made. USS is also a useful metric to keep track of, as it measures memory that is solely allocated by your application. VSS is the least useful memory metric, as it has very little correlation with how close your application is to running out of space.
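To see why VSS correlates so poorly with real usage, consider a large anonymous mapping: it grows VSS immediately, but physical pages are only committed (and RSS grows) as they are touched. A minimal sketch:

```c
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    /* Reserve 256 MB of address space. VSS grows by 256 MB immediately,
       but no physical pages are committed yet, so RSS barely changes. */
    size_t size = 256u * 1024 * 1024;
    unsigned char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;

    /* Touch the first 16 MB. The kernel faults those pages in, and
       RSS grows by roughly 16 MB; the rest stays virtual-only. */
    memset(p, 1, 16u * 1024 * 1024);

    sleep(60); /* Leave time to inspect the process's VSS and RSS. */
    munmap(p, size);
    return 0;
}
```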
Let’s take our example of process A and process B from above and show what our memory metrics would look like (VSS is omitted, as it depends on virtual reservations the example doesn’t specify):

| Metric | Process A | Process B |
| --- | --- | --- |
| USS | 1.0 GB | 1.4 GB |
| PSS | 1.05 GB | 1.45 GB |
| RSS | 1.1 GB | 1.5 GB |
The remaining question is: how does one access PSS, RSS, USS, and VSS? PSS is logged in logcat once per second on Quest. It’s also available through OVR Metrics Tool, as are VSS and RSS. It’s also possible to retrieve summarized values for PSS, RSS, and USS through `dumpsys procstats`, which returns the minimum, average, and maximum of each over a period of time. The output from `dumpsys procstats` will look something like this:
* <package name> / <id> / <version code>:
TOTAL: 94% (55MB-80MB-106MB/46MB-69MB-93MB/131MB-166MB-201MB over 2)
Where these memory amounts represent:
minPSS-avgPSS-maxPSS/minUSS-avgUSS-maxUSS/minRSS-avgRSS-maxRSS
It’s also possible to query PSS, RSS, and USS through the Android Java API ActivityManager.getProcessMemoryInfo. This call will populate a MemoryInfo object, which has methods to retrieve total PSS, as well as private dirty, private clean, shared dirty, and shared clean memory. USS is the sum of private clean and private dirty memory, though it may be useful to track private clean and private dirty memory separately, as private clean memory can be easier to reclaim (‘clean’ in this case means it is unmodified from when it was read from disk). RSS is the sum of all private and shared, clean and dirty memory.
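If you’re working in native code, a rough equivalent (a sketch, assuming a kernel that exposes `/proc/self/smaps_rollup`, which recent Android kernels do; on older kernels you’d sum the same fields across `/proc/self/smaps`) is to read those fields directly:

```c
#include <stdio.h>
#include <string.h>

/* Reads one kB-valued field (e.g. "Pss:") from /proc/self/smaps_rollup. */
static long read_field_kb(const char *field) {
    FILE *f = fopen("/proc/self/smaps_rollup", "r");
    if (!f) return -1;
    char line[256];
    long value = -1;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, field, strlen(field)) == 0) {
            sscanf(line + strlen(field), " %ld", &value);
            break;
        }
    }
    fclose(f);
    return value;
}

int main(void) {
    long pss = read_field_kb("Pss:");
    long rss = read_field_kb("Rss:");
    /* USS = private clean + private dirty. */
    long uss = read_field_kb("Private_Clean:") +
               read_field_kb("Private_Dirty:");
    printf("PSS: %ld kB, RSS: %ld kB, USS: %ld kB\n", pss, rss, uss);
    return 0;
}
```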
GPU Memory

The Meta Quest and Quest 2 both have a combined system-on-chip architecture that shares RAM between the CPU and GPU. This means that when textures and graphics resources are created through GLES or Vulkan, instead of ending up in dedicated graphics card RAM, they are placed in the same main memory as any allocations made from code, such as calls to `malloc()`. As a result, all measurements of application memory also include GPU memory. While CPU-allocated memory is usually easy to track (simply marking when `malloc()` is called), GPU allocations can be trickier to keep track of, as they happen inside the graphics driver. They’re also less predictable in size, as textures may (and often do) require more memory than expected to align with page offsets.
Fortunately, there are ways to query the GPU memory usage of an application. For a high level view, you can use gpumeminfo. This tool polls once a second and sums the memory usage of each GPU resource type. Under the hood, it reads and processes `/d/kgsl/proc/<pid>/mem`, a list of all allocated GPU resources that includes each resource’s type and the amount of memory it is using. Depending on whether the application is using GLES or Vulkan, the following types will be in the list:
Engine Allocations

If you’ve accounted for your total memory usage and GPU memory usage, it’s often useful to keep track of the amount of memory that has been allocated directly by your game and game engine. Fortunately, both Unity and Unreal Engine provide tools to help you analyze what you currently have in memory at any given time.
Unity Memory Profiler

Unity’s memory profiler can be used both to track allocations as a real-time graph of memory, and to take a snapshot of the current state of memory and determine which references are keeping objects resident. Unity provides documentation on the tool here.
For more information on the snapshot usage, I’d recommend this tutorial.
Unreal Engine Memory Profiler

UE4 also contains a Profiler tool for tracking real-time memory usage.
Tips and Tricks for Reducing Memory Usage

Now that you can account for all the memory used by your application, you can take steps to reduce the amount of memory needed, so that you remain below the limit at all times. Here are a few ideas that can help.
Compress your Textures

Moving from uncompressed to compressed textures is probably the easiest way to save memory. An uncompressed ARGB texture requires 32 bits per pixel in memory, which means a single 4K texture takes 67 MB (not even including mips). Using a hardware-supported compression format such as ASTC or ETC2 can reduce this to as low as 0.89 bits per pixel, which means that same 4K texture can shrink to as little as 1.87 MB. Of course, higher compression means more detail is lost, so selecting a compression level is a matter of finding the highest level of compression that keeps the quality acceptable. As for which format to use, ASTC usually provides the highest visual quality at a given compression level, but ETC2 can be slightly more performant to sample on the GPU, so it might be preferred for performance-critical cases. Please note that in Unity versions before 2019.1.0, the quality setting was not respected for ASTC textures, causing them to look worse than they should when quality was set to ‘High.’
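The arithmetic behind those numbers, for a 4096x4096 texture:

$$4096 \times 4096 \times \frac{32\ \mathrm{bits}}{8\ \mathrm{bits/byte}} = 67{,}108{,}864\ \mathrm{bytes} \approx 67\ \mathrm{MB}$$

$$4096 \times 4096 \times \frac{0.89\ \mathrm{bits}}{8\ \mathrm{bits/byte}} \approx 1{,}866{,}465\ \mathrm{bytes} \approx 1.87\ \mathrm{MB}$$

(0.89 bits per pixel corresponds to ASTC’s largest block size, 12x12, where each 128-bit block covers 144 pixels.)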
Make sure textures and meshes only live in GPU Memory

In Unity, textures and meshes have a checkbox labeled ‘Read/Write Enabled’ which marks them as readable from code (the readable state can be queried at runtime through the ‘isReadable’ property). When a texture or mesh is marked as readable, one copy of the asset is kept in main memory while another copy is uploaded to the GPU, so double the amount of memory the asset needs for rendering is resident at all times. Likewise, when creating a texture or mesh at runtime, it’s possible to delete the main memory copy by calling Apply() with its makeNoLongerReadable parameter set to true for textures, or UploadMeshData(true) for meshes, which makes them no longer readable from code.
In Unreal Engine, textures will by default upload their data to the GPU only. Meshes can be marked with the ‘Allow CPUAccess’ flag, which will keep a copy in main memory. Make sure this is only checked for meshes you need to manipulate at runtime.
Compress your Mesh Vertices

While compressing mesh vertices sounds like the same thing as compressing textures, it’s actually a very different process. Texture compression uses a visually based algorithm to convert textures into an optimal format, while mesh compression simply swaps larger data types for smaller ones in vertex buffer attributes. For example, instead of using full floating-point precision for vertex positions, you could use half precision, and for UVs you could use fixed precision. Ideally this would be done on a per-mesh basis in the mesh editor of your choice; however, it can be more convenient to apply a preset set of vertex attribute precisions to all of your meshes. Unity allows you to do this by selecting the attributes you’d like to compress in the ‘Vertex Compression’ dropdown in Player Settings. It should be noted that enabling ‘Mesh Compression’ on the importer for a Mesh/Model will actually disable vertex compression: ‘Mesh Compression’ compresses the mesh on disk, but it will be full size when loaded into memory. Compressing your vertices has the additional advantage of potentially speeding up rendering in cases where the GPU is memory bound.
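To make the savings concrete, here’s a sketch of the same vertex attribute layout at full and reduced precision (the packed formats here are illustrative; actual layouts depend on your engine and mesh):

```c
#include <stdint.h>

/* Full precision: 3 float positions, 3 float normals, 2 float UVs = 32 bytes. */
typedef struct {
    float position[3];
    float normal[3];
    float uv[2];
} VertexFull;

/* Reduced precision: half-float positions and UVs, normals packed into
   signed bytes = 16 bytes, halving both memory and bandwidth. */
typedef struct {
    uint16_t position[3]; /* half floats, stored as raw 16-bit patterns */
    uint16_t pad;         /* keep 4-byte alignment */
    int8_t   normal[4];   /* normalized signed bytes, 4th unused */
    uint16_t uv[2];       /* half floats */
} VertexCompressed;
```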
Unreal Engine will use lower-precision data for certain channels by default. You can disable this behavior by checking the ‘Use High Precision Tangent Basis’ or ‘Use Full Precision UVs’ checkbox for individual LOD levels of your mesh.
Use Texture Streaming

Texture Streaming is a system provided by Unity and Unreal Engine that tries to load only the mipmap levels of your textures that are actually needed by the current camera view, saving memory. The system also lets you set a total texture budget, which can help prevent exceeding the total memory limit of the system. You can read Unity’s documentation of the feature here and UE4’s documentation here.
The main downside of this system is texture pop-in: since mips are loaded asynchronously, textures can suddenly jump in quality, which can be noticeable. Texture streaming can also make it hard to optimize total memory, as the behavior of the system can be unpredictable, since it is based on the camera view, which can change rapidly, especially in VR. It may also conflict with other IO operations if your game features any other dynamic loading. However, it may be a good way to reduce total memory usage depending on your application, so it is worth testing.
Optimize Texture Sizes to GPU Alignments

When textures are created, the GPU will allocate blocks of memory to keep the textures aligned, which makes them easier to page into cache when sampled. Therefore, for a texture of a given size, the GPU may round the allocation up to be slightly larger than what the texture actually requires. For example, take a look at the following chart: at certain sizes, the allocated size perfectly matches the texture size, yet a texture one pixel larger requires an allocation one step higher. This means that to maximize your memory utilization, you should size your textures to align with the GPU allocation so that no space is wasted.

For Square ASTC Textures with Square Block Sizes, the following sizes achieve the most efficient allocations:

*Textures at these sizes don’t fully utilize the 131,072 bytes allocated, but one pixel more will double the allocated size to 262,144 bytes.
For Square ASTC Textures with Non-Square Block Sizes, the following sizes are the most efficient:

ASTC 10x5 more closely follows the pattern of square block sizes (which makes sense because of its 2:1 ratio):
Minimize Unity ‘Screen’ Size

Even though VR applications don’t render directly to the screen, Unity’s default behavior is to allocate a screen-sized texture and render to it each frame (for most applications it is just cleared to black, which should take almost no time, but I have measured it taking upwards of a millisecond in some cases). In addition to the potential performance hit, this screen-sized buffer takes up a not-insignificant amount of memory. You can avoid this wasted space by shrinking the screen buffer to a minimal size by calling:

`Screen.SetResolution(16, 16, true);`
Because the screen is never actually visible, this change should be a pure performance and memory win.
Take Advantage of ‘Lazy’ Allocated Buffers in Native Vulkan Applications

If you are using Vulkan natively in your application, you can completely avoid allocating memory for buffers that should only ever be accessed in GPU tile memory (such as MSAA attachments) by creating them with VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT. This flag prevents the memory for a buffer from being allocated until it is needed; for buffers that only ever exist in tile memory, the space is never required in main memory, so the allocation never occurs. See here for more information.
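As a minimal sketch of the pattern in C (error handling omitted; `device` and `phys` are assumed to be an existing VkDevice and VkPhysicalDevice): create the MSAA attachment with transient usage, then back it with memory from a lazily allocated memory type.

```c
#include <vulkan/vulkan.h>

/* Find a memory type that is lazily allocated and compatible with the image. */
static uint32_t find_lazy_memory_type(VkPhysicalDevice phys, uint32_t type_bits) {
    VkPhysicalDeviceMemoryProperties props;
    vkGetPhysicalDeviceMemoryProperties(phys, &props);
    for (uint32_t i = 0; i < props.memoryTypeCount; i++) {
        if ((type_bits & (1u << i)) &&
            (props.memoryTypes[i].propertyFlags &
             VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT))
            return i;
    }
    return UINT32_MAX; /* Fall back to a normal device-local type if absent. */
}

VkImage create_lazy_msaa_attachment(VkDevice device, VkPhysicalDevice phys,
                                    uint32_t width, uint32_t height) {
    VkImageCreateInfo info = {
        .sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO,
        .imageType = VK_IMAGE_TYPE_2D,
        .format = VK_FORMAT_R8G8B8A8_UNORM,
        .extent = { width, height, 1 },
        .mipLevels = 1,
        .arrayLayers = 1,
        .samples = VK_SAMPLE_COUNT_4_BIT,
        .tiling = VK_IMAGE_TILING_OPTIMAL,
        /* TRANSIENT_ATTACHMENT signals that this image only needs to live
           in tile memory for the duration of the render pass. */
        .usage = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT |
                 VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT,
        .initialLayout = VK_IMAGE_LAYOUT_UNDEFINED,
    };
    VkImage image;
    vkCreateImage(device, &info, NULL, &image);

    VkMemoryRequirements reqs;
    vkGetImageMemoryRequirements(device, image, &reqs);

    VkMemoryAllocateInfo alloc = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
        .allocationSize = reqs.size,
        .memoryTypeIndex = find_lazy_memory_type(phys, reqs.memoryTypeBits),
    };
    /* In a real application, keep this handle so the memory can be freed. */
    VkDeviceMemory memory;
    vkAllocateMemory(device, &alloc, NULL, &memory);
    vkBindImageMemory(device, image, memory, 0);
    return image;
}
```

If no lazily allocated memory type exists, the allocation behaves like a normal device-local one, so this pattern degrades gracefully.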