Adreno Hardware Profile

The Adreno has a sizable (512k - 1 meg) on-chip memory that framebuffer operations are broken up into. Unlike the PowerVR or Mali tile based GPUs, the Adreno has a variable bin size based on the bytes per pixel needed for the buffers -- a 4x MSAA, 32 bit color 4x MRT, 32 bit depth render target will require 40 times as many tiles as a 16-bit depth-only rendering.

Vertex shaders are run at least twice for each vertex, once to determine in which bins the drawing will happen, and again for each bin that a triangle covers. For binning, the regular vertex shader is stripped down to only the code relevant to calculating the vertex positions. To avoid polluting the vertex cache with unused attributes, rendering with separate attribute arrays may provide some benefit. The binning is done on a per-triangle basis, not a per-draw call basis, so there is no benefit to breaking up large surfaces. Because scenes are rendered twice for stereoscopic view, and because the binning process doubles it again (at least), vertex processing is more costly than you might expect.

Avoiding any bin fills from main memory and unnecessary buffer writes is important for performance. The VrApi framework handles this optimally, but if you are doing it yourself, make sure you invalidate color buffers before using them, and discard depth buffers before flushing the eye buffer rendering. Clears still cost some performance, so invalidates should be preferred when possible.

There is no dedicated occlusion hardware like one would find in PowerVR chips, but early Z rejection is performed, so sorting draw calls to roughly front-to-back order is beneficial.

Texture compression offers significant performance benefits. Favor ETC2 compressed texture formats, but there is still sufficient performance to render scenes with 32 bit uncompressed textures on every surface if you really want to show off smooth gradients

glGenerateMipMap() is fast and efficient; you should build mipmaps even for dynamic textures (and of course for static textures). Unfortunately, on Android, many dynamic surfaces (video, camera, UI, etc) come in as SurfaceTextures / samplerExternalOES, which don't have mip levels at all. Copying to another texture and generating mipmaps there is inconvenient and costs a notable overhead, but is still worth considering.

sRGB correction is free on texture sampling, but has some cost when drawing to an sRGB framebuffer. If you have a lot of high contrast imagery, being gamma correct can reduce aliasing in the rendering. Of course, smoothing sharp contrast transitions in the source artwork can also help it.

2x MSAA runs at full speed on chip, but it still increases the number of tiles, so there is some performance cost. 4x MSAA runs at half speed, and is generally not fast enough unless the scene is very undemanding.