Head-Related Transfer Functions (HRTFs) in conjunction with attenuation provide an anechoic model of three dimensional sound, which exhibits strong directional cues but tends to sound dry and artificial due to a lack of room ambiance. To compensate for this, we can add environmental modeling to mimic the acoustic effects of nearby geometry.
In this section, we cover the core concepts that drive environmental modeling, including reverberation and reflections, example models, presence, and world acoustics.
As sounds travel through space, they reflect off surfaces, creating a series of echoes. The initial distinct echoes, called early reflections, help us determine the direction of and distance to a sound. As these echoes propagate, diminish, and interact, they create a late reverberation tail, which contributes to our sense of space.
Some 3D positional implementations layer simple “shoebox room” modeling on top of their HRTF implementation. These consist of specifying the distance and reflectivity of six parallel walls (i.e., the “shoebox”) and sometimes the listener’s position and orientation within the room. With this basic model, you can simulate early reflections from walls and late reverberation characteristics.
While far from perfect, this model is much better than artificial or no reverberation.
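To make the shoebox idea concrete, here is a minimal sketch of how the early reflections could be estimated by mirroring the source across each of the six walls (the image-source method, swapped in here for illustration; the text above does not say which technique a given implementation uses). The function name, the parameter names, the single shared reflectivity value, and the simple 1/distance gain are all assumptions made for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, approximate value for air at room temperature


def first_order_reflections(room, source, listener, reflectivity):
    """Return sorted (delay_seconds, gain) pairs for the direct path
    and the six first-order wall reflections of a shoebox room.

    room:         (width, depth, height) in metres; walls sit at 0 and at each dimension.
    source:       (x, y, z) position of the sound source inside the room.
    listener:     (x, y, z) position of the listener inside the room.
    reflectivity: hypothetical amplitude reflection coefficient in [0, 1],
                  assumed identical for all six walls.
    """
    direct = math.dist(source, listener)
    # Direct path: gain falls off with distance (clamped to avoid division by zero).
    paths = [(direct / SPEED_OF_SOUND, 1.0 / max(direct, 1e-6))]

    for axis in range(3):
        for wall in (0.0, room[axis]):
            # Mirror the source across this wall to get its "image".
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]
            distance = math.dist(image, listener)
            # The reflected path is as long as the straight line from image to listener.
            paths.append((distance / SPEED_OF_SOUND, reflectivity / distance))

    return sorted(paths)


if __name__ == "__main__":
    taps = first_order_reflections(
        room=(4.0, 5.0, 3.0),
        source=(1.0, 2.0, 1.5),
        listener=(3.0, 2.0, 1.5),
        reflectivity=0.8,
    )
    for delay, gain in taps:
        print(f"{delay * 1000:6.2f} ms  gain {gain:.3f}")
```

Each returned pair can be treated as one tap of a delay line: the earliest tap is the direct sound, and the six that follow are the early reflections. This sketch ignores listener orientation and higher-order reflections, both of which a fuller model would need to account for.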
Since modeling physical walls and late reverberations can quickly become computationally expensive, reverberation is often introduced via artificial, ad hoc methods such as those used in digital reverb units of the 1980s and 1990s. While less computationally intensive than physical models, these methods do not consider the listener’s orientation or the physical environment surrounding the listener, so they can sound less realistic.
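As an illustration of what such an artificial method can look like, the sketch below builds a very small reverb from parallel feedback comb filters, in the spirit of the classic Schroeder design. It is not a reconstruction of any particular hardware unit; the function names, delay lengths, feedback amount, and wet/dry mix are assumptions chosen for the example.

```python
def feedback_comb(signal, delay_samples, feedback):
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay_samples]."""
    out = []
    for n, x in enumerate(signal):
        # Read the output from delay_samples ago; silence before that point.
        delayed = out[n - delay_samples] if n >= delay_samples else 0.0
        out.append(x + feedback * delayed)
    return out


def simple_reverb(dry, delays=(1116, 1188, 1277, 1356), feedback=0.8, wet=0.3):
    """Mix the dry signal with the average of several parallel comb filters.

    The default delay lengths (in samples) are illustrative values only.
    """
    combs = [feedback_comb(dry, d, feedback) for d in delays]
    out = []
    for n, x in enumerate(dry):
        wet_sample = sum(c[n] for c in combs) / len(combs)
        out.append((1.0 - wet) * x + wet * wet_sample)
    return out


if __name__ == "__main__":
    # Feed a single impulse through one comb to see the decaying echoes.
    impulse = [1.0] + [0.0] * 9
    print(feedback_comb(impulse, delay_samples=3, feedback=0.5))
```

Note that nothing in this code knows where the listener is or what the room looks like, which is exactly the limitation described above: the same tail is applied regardless of position or orientation.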
Convolution reverb samples the impulse response (IR) from a specific real-world location such as a recording studio, stadium, or lecture hall. The IR can then be applied to a signal to make it sound as if it were played back from within that location; this can produce a phenomenally lifelike and immersive reverb effect. The drawback is that the real environment from which the IR was captured won’t necessarily map perfectly to the virtual world, and because the IR is captured from a single location, it won’t adapt as the user moves throughout the environment.
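Applying an IR to a signal amounts to convolving the two. The sketch below shows the operation in its most direct form, using plain Python lists and a hypothetical `convolve` helper; it is meant to show the arithmetic only. Real-time engines typically use FFT-based convolution instead, since the direct form becomes expensive for IRs that are several seconds long.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of a dry signal with an impulse response.

    Every input sample triggers a scaled copy of the whole impulse
    response, and the overlapping copies are summed.
    """
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue  # silent samples contribute nothing
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


if __name__ == "__main__":
    dry = [1.0, 0.5]
    # Toy IR: the direct sound followed by one quieter echo two samples later.
    ir = [1.0, 0.0, 0.25]
    print(convolve(dry, ir))
```

Because the IR is fixed, the output is the same wherever the listener stands in the virtual scene, which is the adaptation problem noted above.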
The “shoebox model” attempts to provide a simplified representation of an environment’s geometry. It assumes no occlusion, equal frequency absorption on all surfaces, and six parallel walls at a fixed distance from the listener’s head. Needless to say, this is a heavy simplification for the sake of performance, and as VR environments become more complex and dynamic, it may not scale properly.
Some solutions exist today to simulate diffraction and complex environmental geometry, but support is not widespread, and performance implications are still significant.
Audio contributes greatly to the overall VR experience, and high-quality spatial audio enhances immersion and creates a sense of presence: the user’s sense that they’re really in the virtual world.
Audio immersion is maximized when the listener is located inside the scene, as opposed to viewing it from afar. For example, a 3D chess game in which the player looks down at a virtual board offers less compelling spatialization opportunities than a game in which the player stands on the play field. By the same token, an audioscape in which moving elements whiz past the listener’s head with auditory verisimilitude is far more compelling than one in which audio cues cut the listener off from the action by communicating that they’re outside of the field of activity.
Equipped with the knowledge above, you should be ready to design your next soundscape and mix it in post. Be sure to review the guide on Mixing VR Audio, as well as the Overview of Audio Devices to further improve your understanding of spatial audio.
If you’re ready to kick off the technical side of VR audio design and engineering, be sure to review the following documentation: