As with sound design, mixing a scene for VR is an art as well as a science, and the following recommendations may include caveats.
Realism is not necessarily the end goal! Keep this in mind at all times. As with lighting in computer environments, what is consistent and/or “correct” may not be aesthetically desirable. Audio teams must be careful not to back themselves into a corner by enforcing rigid notions of correctness on a VR experience.
This is especially true when considering issues such as dynamic range, attenuation curves, and direct time of arrival.
Sounds must now be placed carefully in the 3D sound field. In the past, a general approximation of location was often sufficient, since positioning was accomplished strictly through panning and attenuation. The default location for an object might be its hips or the point where its feet meet the ground plane. Once spatialization is applied, sounds played from those locations become jarring, e.g. “crotch steps” or “foot voices”.
The Oculus Audio SDK does not support directional sound sources (speakers, human voice, car horns, et cetera). However, higher level SDKs often model these using angle-based attenuation that controls the tightness of the direction. This directional attenuation should occur before the spatialization effect.
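As an illustration of angle-based attenuation, the following is a minimal sketch of a cone model applied as a gain before the signal reaches the spatializer. The function name `cone_gain`, its parameters, and the default angles are assumptions made for this example and are not part of the Oculus Audio SDK or any particular middleware.

```python
import math

def cone_gain(source_pos, source_forward, listener_pos,
              inner_angle_deg=60.0, outer_angle_deg=180.0, outer_gain=0.25):
    """Return a gain in [outer_gain, 1.0] for a directional source.

    Hypothetical cone model: full gain inside the inner cone, outer_gain
    outside the outer cone, and a linear blend in between.
    """
    to_listener = [l - s for l, s in zip(listener_pos, source_pos)]
    dist = math.sqrt(sum(c * c for c in to_listener))
    fwd_len = math.sqrt(sum(c * c for c in source_forward))
    if dist == 0.0 or fwd_len == 0.0:
        return 1.0  # direction undefined; leave the signal untouched

    # Angle between the source's facing direction and the listener.
    cos_angle = sum(a * b for a, b in zip(to_listener, source_forward))
    cos_angle = max(-1.0, min(1.0, cos_angle / (dist * fwd_len)))
    angle = math.degrees(math.acos(cos_angle))

    half_inner = inner_angle_deg / 2.0
    half_outer = outer_angle_deg / 2.0
    if angle <= half_inner:
        return 1.0
    if angle >= half_outer:
        return outer_gain
    t = (angle - half_inner) / (half_outer - half_inner)
    return 1.0 + t * (outer_gain - 1.0)
```

Narrowing the gap between the inner and outer angles makes the source more tightly directional. The resulting gain would be multiplied into the source signal before spatialization.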
Not all sounds are point sources. The Oculus Audio SDK provides volumetric sound sources to model sounds that need to be more spread out, such as waterfalls, rivers, and crowds.
The Doppler effect is the apparent change of a sound’s pitch as its source approaches or recedes. VR experiences can emulate this by altering the playback based on the relative speed of a sound source and the listener. However, it is very easy to introduce artifacts inadvertently in the process.
The Oculus Audio SDK does not have native support for the Doppler effect, but most sound systems and middleware can implement it.
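For reference, the classic Doppler relation can be sketched as a playback pitch ratio computed from the source and listener velocities along the line between them. This is a minimal sketch, not how any specific middleware does it: the function name, the 343 m/s speed of sound, and the clamping limits are all assumptions chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C (assumed)

def doppler_pitch_ratio(source_pos, source_vel, listener_pos, listener_vel,
                        speed_of_sound=SPEED_OF_SOUND, max_ratio=2.0):
    """Return the factor by which to scale playback pitch.

    Values above 1.0 mean the source and listener are closing in on each
    other; values below 1.0 mean they are moving apart.
    """
    d = [l - s for l, s in zip(listener_pos, source_pos)]
    dist = math.sqrt(sum(c * c for c in d))
    if dist == 0.0:
        return 1.0  # direction undefined; no shift
    d = [c / dist for c in d]  # unit vector from source to listener

    # Velocity components along the source-to-listener direction.
    v_src = sum(a * b for a, b in zip(source_vel, d))
    v_lis = sum(a * b for a, b in zip(listener_vel, d))

    # Keep speeds subsonic so the denominator never reaches zero.
    limit = 0.9 * speed_of_sound
    v_src = max(-limit, min(limit, v_src))
    v_lis = max(-limit, min(limit, v_lis))

    ratio = (speed_of_sound - v_lis) / (speed_of_sound - v_src)
    # Clamp the result to limit extreme pitch jumps (assumed guard).
    return max(1.0 / max_ratio, min(max_ratio, ratio))
```

In practice the ratio would likely also be smoothed over time, since abrupt changes in velocity estimates are one way the artifacts mentioned above can creep in.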
In the real world, sound takes time to travel, so there is often a noticeable delay between seeing and hearing something. For example, you would see the muzzle flash from a rifle fired at you from 100 meters away roughly 290 milliseconds before you would hear it, since sound travels at about 343 meters per second in air. Modeling propagation time incurs some additional complexity and may paradoxically make things seem less realistic, as we are conditioned by popular media to believe that loud distant actions are immediately audible.
The Oculus Audio SDK does not have native support for time-of-arrival, but it can be implemented for dramatic effect by adding a short delay in the sound system or middleware.
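The delay itself is simple arithmetic: distance divided by the speed of sound. The sketch below converts a source distance into a delay in seconds and in samples. The function names, the 343 m/s constant, and the 48 kHz default sample rate are assumptions for this example.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C (assumed)

def propagation_delay_seconds(distance_m, speed_of_sound=SPEED_OF_SOUND):
    """Time for sound to travel distance_m, in seconds."""
    return max(0.0, distance_m) / speed_of_sound

def propagation_delay_samples(distance_m, sample_rate=48000,
                              speed_of_sound=SPEED_OF_SOUND):
    """The same delay expressed as a whole number of audio samples."""
    return round(propagation_delay_seconds(distance_m, speed_of_sound) * sample_rate)
```

The sample count could then be used as the length of a delay line placed in front of the spatialized source.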
Playing stereo sounds without spatialization in VR will result in sounds that are “head locked”: they follow the user’s head movements rather than staying locked in place in the virtual world. This can detract from the spatial audio experience and should generally be avoided where possible. Doing so can be challenging for some sounds, particularly music, which is generally mixed in stereo.
For original compositions, it’s best to mix to ambisonics, which can be rotated and won’t be head locked. If that is not an option, then try to be mindful of how the music impacts the spatial audio.
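To show what rotating an ambisonic mix involves, here is a minimal sketch of a yaw-only rotation of one first-order ambisonic frame. The axis convention (X front, Y left, Z up, positive yaw turning the head to the left) and the function name are assumptions. Real decoders differ in channel order and normalization, and also handle pitch and roll.

```python
import math

def rotate_foa_yaw(w, x, y, z, head_yaw_rad):
    """Counter-rotate a first-order ambisonic frame for the listener's yaw.

    Assumed convention: X points front, Y points left, Z points up, and a
    positive yaw turns the head to the left. W (omnidirectional) and Z are
    unaffected by a rotation about the vertical axis.
    """
    c = math.cos(head_yaw_rad)
    s = math.sin(head_yaw_rad)
    x_rot = c * x + s * y
    y_rot = -s * x + c * y
    return w, x_rot, y_rot, z
```

With this convention, a sound encoded straight ahead ends up on the listener's right after the head turns 90 degrees to the left, so the sound stays fixed in the world rather than following the head.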
Spatialization incurs a performance hit for each additional sound that must be placed in the 3D sound field. This cost varies depending on the platform. For example, on a high-end PC, it may be reasonable to spatialize 50+ sounds with reflections and reverb, whereas on mobile you may be limited to fewer sounds or need to disable advanced features like reflections and reverb.
Some sounds may not benefit from spatialization even if placed in 3D in the world. For example, very low rumbles or drones offer poor directionality and could be played as standard stereo sounds with some panning and attenuation.
Aural immersion with traditional non-VR games was often impossible since many gamers or PC users relied on low-quality desktop speakers, home theaters with poor environmental isolation, or gaming headsets optimized for voice chat.
With headphones, positional tracking, and full visual immersion, it is now more important than ever that sound designers focus on the user’s audio experience.
As a 3D sound moves through space, different HRTFs and attenuation functions may become active, potentially introducing discontinuities at audio buffer boundaries. These discontinuities will often manifest as clicks, pops or ripples. They may be masked to some extent by reducing the speed of traveling sounds and by ensuring that your sounds have broad spectral content.
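To make the buffer-boundary problem concrete: if a gain jumps from one value to another between two buffers, the waveform contains a step, which is heard as a click. One common mitigation, sketched below under the assumption that buffers are plain lists of float samples, is to ramp the gain linearly across the buffer. The function name is hypothetical. Changes of HRTF filter are usually handled in a similar spirit by crossfading, which this sketch does not cover.

```python
def apply_gain_ramp(samples, start_gain, end_gain):
    """Scale a buffer while moving linearly from start_gain to end_gain.

    The last sample is scaled by end_gain, so the next buffer can continue
    from that value without a step at the boundary.
    """
    n = len(samples)
    if n == 0:
        return []
    step = (end_gain - start_gain) / n
    return [s * (start_gain + step * (i + 1)) for i, s in enumerate(samples)]
```

For each buffer, `start_gain` would be the previous buffer's `end_gain`, and `end_gain` the newly computed attenuation for the source's current position.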
While latency affects all aspects of VR, it is often viewed as a graphical issue. However, audio latency can be disruptive and immersion-breaking as well. Depending on the speed of the host system and the underlying audio layer, the latency from buffer submission to audible output may be as short as 2 ms on high-performance PCs using high-end, low-latency audio interfaces or, in the worst case, as long as hundreds of milliseconds.
High system latency becomes an issue as the relative speed between an audio source and the listener’s head increases. In a relatively static scene with a slow-moving viewer, audio latency is harder to detect.
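A rough way to reason about this is to estimate how far a sound appears to lag behind its true direction: the angular error is approximately the relative angular speed multiplied by the latency. The sketch below also shows the latency contributed by a single audio buffer. Both function names and the example figures are assumptions for illustration, and real systems add further latency from mixing, drivers, and hardware.

```python
def buffer_latency_ms(buffer_frames, sample_rate):
    """Duration of one audio buffer in milliseconds."""
    return 1000.0 * buffer_frames / sample_rate

def angular_error_deg(relative_speed_deg_per_s, latency_ms):
    """Approximate angular lag of a sound given the total audio latency."""
    return relative_speed_deg_per_s * latency_ms / 1000.0
```

For example, a 480-frame buffer at 48 kHz accounts for 10 ms on its own, and a head turning at 200 degrees per second with 50 ms of total latency would hear sources lag by about 10 degrees, whereas a nearly still head would notice almost nothing.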
Effects such as filtering, equalization, distortion, and flanging can be an important part of the virtual reality experience. For example, a low-pass filter can emulate the sound of swimming underwater, where high frequencies lose energy much more quickly than in air, or distortion may be used to simulate disorientation.
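As a sketch of the underwater example, the following one-pole low-pass filter attenuates high frequencies while passing low ones. The function name, the list-of-floats buffer format, and the 48 kHz default sample rate are assumptions. Production code would normally use the filter provided by the audio engine or middleware.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate=48000.0):
    """Filter samples with y[n] = y[n-1] + a * (x[n] - y[n-1]).

    The coefficient a is derived from the cutoff frequency; lower cutoffs
    give a smaller a and a more muffled result.
    """
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out = []
    y = 0.0  # filter state, starting from silence
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out
```

A cutoff of a few hundred hertz gives a heavily muffled result. Sweeping the cutoff as the listener crosses the water surface avoids an abrupt change in timbre.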