This is the first article in a series reviewing new functionality in the Audio SDK. This post is a high-level overview of our near-field rendering tech.
* * *
Binaural 3D audio works by applying a unique filter to a sound for each ear, based on the 3D position of the sound source. The term “filter” can describe very different things, from simple EQ all the way to complex reverberation. So what are we talking about here?
Just as a reverberation filter captures in its binaural impulse response (IR) all the ways a sound can interact with the surrounding environment on its way to the listener’s ears, a binaural spatialization filter captures all the ways a sound can interact with the listener's body on its way to the ears.
In the reverberation case, the IRs are much longer and more chaotic due to the size and complexity of the environment. We’ve been taking advantage of this for years to approximate an environment with a single binaural reverb IR, because beyond the first few bounces, spatialization is buried in a fading chaos that we perceive unconsciously as a diffuse connection to our surrounding environment.
In the binaural 3D spatialization case, the IRs are tiny, but extremely directional. Beyond a few feet, the IRs don’t change much with distance. We’ve been taking advantage of this to make another approximation of 3D audio that is independent of distance and that we call “far-field”. Our HRTF database is captured/sampled around the head as a grid on a sphere rather than a volume.
We’re spatializing along azimuth and elevation angles, but not distance.
Distance is addressed by a separate, dedicated model:
Near-field rendering begins with the acknowledgement that this model doesn’t work as well when the distance between the sound and the listener shrinks to the point of being comparable to the size of the human head. In that case, spatialization and distance modeling become closely intertwined and are better synthesized from an ear-centric, rather than a head-centric, spatial reference. In far-field, the center of the world is the center of our head. In near-field, the center of the world is the ear canal entrance, and we have two of them, which in some ways makes near-field even more “binaural” than far-field.
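To make the ear-centric idea concrete, here is a minimal geometric sketch. It is not the SDK's implementation: the function names, the 8.75 cm head radius, and the axis convention (azimuth 0 straight ahead, positive to the right, ears on the left-right axis through the head center) are all assumptions made for illustration. It measures the source distance from each ear rather than from the head center, and derives the level difference that simple 1/r spreading alone would produce, ignoring head shadowing entirely.

```python
import math

HEAD_RADIUS_M = 0.0875  # assumed average head radius, not a value from the SDK


def ear_distances(az_deg, el_deg, d, head_radius=HEAD_RADIUS_M):
    """Return (left, right) distances in meters from the source to each ear.

    Assumed convention: azimuth 0 is straight ahead, positive to the right;
    the ears sit at +/- head_radius on the left-right axis. Only meaningful
    when d is larger than head_radius.
    """
    az = math.radians(az_deg)
    el = math.radians(el_deg)
    # Source position in head-centric Cartesian coordinates.
    x = d * math.cos(el) * math.sin(az)  # toward the right ear
    y = d * math.cos(el) * math.cos(az)  # toward the front
    z = d * math.sin(el)                 # upward
    left = math.sqrt((x + head_radius) ** 2 + y * y + z * z)
    right = math.sqrt((x - head_radius) ** 2 + y * y + z * z)
    return left, right


def distance_ild_db(az_deg, el_deg, d, head_radius=HEAD_RADIUS_M):
    """Right-minus-left level difference in dB from 1/r spreading alone."""
    left, right = ear_distances(az_deg, el_deg, d, head_radius)
    return 20.0 * math.log10(left / right)


near = distance_ild_db(90.0, 0.0, 0.2)  # source 20 cm away, directly to the right
far = distance_ild_db(90.0, 0.0, 2.0)   # same direction, 2 m away
print(f"near: {near:.2f} dB, far: {far:.2f} dB")
```

Under these assumptions, the source at 20 cm yields a level difference of roughly 8 dB from geometry alone, while the same direction at 2 m yields less than 1 dB. This is one way to see why a head-centric, distance-independent model is a reasonable approximation far away and a poor one up close.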
The near-field distance (the radius of the near-field sphere around the listener’s head) is commonly defined as ~0.5 to 1.0 m (up to ~3 feet, “within arm's reach”). A logical evolution from our current far-field HRTF tech would be extending it to near-field by adding more filter samples to the database (red dots) to fill up the entire near-field sphere volume all the way to the head boundary:
This will likely come down the line from R&D, but will take more resources. In the meantime, just like for the reverberation and far-field spatialization cases, we're looking for a perceptual approximation that runs fast on hardware with limited resources.
So, what's special about near-field audio?
For our approximation to work, we first have to identify the main perceptual cues of near-field rendering:
From this, the approximation will work better on the lateral sides (away from the median plane) where the ILDs and diffraction filters will be strongest, and we need full control over the reflected signal gains (early reflections and late reverberation).
Also worth noting is the absence of cues specific to ITD (Interaural Time Difference): close proximity does not affect the timing difference between the ears in a perceivable way. However, sources moving in close proximity do generate wider ITD and ILD variations than sources moving similarly but farther away (remember that pesky mosquito!).
Near-field rendering model
az: azimuth angle
el: elevation angle
d: sound distance to the listener
a: head diameter
The key physical phenomenon at play here is acoustic diffraction: the bending of waves around rigid obstacles like the head.
This phenomenon is frequency dependent:
It can be thought of as a binaural (each ear gets a different filtering effect) directional lowpass filter with a cutoff frequency directly related to the head size and to the azimuth and elevation angles. Some of that filtering is already captured in our far-field HRTFs (head diffraction is not restricted to near-field), so we're subtly accentuating the effect with a set of real-time filters parameterized in distance, azimuth and elevation.
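As a rough illustration of such a parameterized lowpass, the sketch below pairs a standard one-pole lowpass with a toy cutoff mapping. The mapping is entirely hypothetical and is not the SDK's filter design: `head_shadow_cutoff_hz`, the 17.5 cm default for `a`, the 1 m near-field radius, and the 20 kHz "open" cutoff are all assumptions. The only physical anchor is the rule of thumb that diffraction around an obstacle becomes significant once the wavelength is comparable to its size, which puts the frequency of interest near c / a. The filter is meant for the far (shadowed) ear, and it opens up fully on the median plane or at the near-field boundary, where the far-field HRTF is left to do the work.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C


def head_shadow_cutoff_hz(az_deg, el_deg, d, a=0.175,
                          near_field_radius=1.0, open_cutoff_hz=20000.0):
    """Toy cutoff for the far-ear lowpass (illustrative mapping only)."""
    # Frequency whose wavelength equals the head size a.
    f_head = SPEED_OF_SOUND / a
    # How far the source sits toward the side: 0 on the median plane, 1 at +/-90.
    laterality = abs(math.sin(math.radians(az_deg))) * math.cos(math.radians(el_deg))
    # 0 at or beyond the near-field boundary, approaching 1 close to the head.
    proximity = min(1.0, max(0.0, 1.0 - d / near_field_radius))
    amount = laterality * proximity
    # Interpolate on a log-frequency scale between "open" and f_head.
    return open_cutoff_hz * (f_head / open_cutoff_hz) ** amount


def one_pole_lowpass(samples, cutoff_hz, sample_rate=48000.0):
    """Standard one-pole lowpass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


cutoff = head_shadow_cutoff_hz(az_deg=90.0, el_deg=0.0, d=0.2)
far_ear = one_pole_lowpass([1.0, -1.0] * 64, cutoff)
```

A real implementation would use filters fitted to measured or modeled head diffraction and would treat each ear separately; the point here is only the shape of the control path, with distance and direction going in and a filter coefficient coming out.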
* * *
Our second article covers volumetric sounds and the various usage patterns for achieving better presence through sound design.