VR Accessibility Design: Audio for Accessibility

One of the biggest challenges in designing accessible audio is balancing familiar patterns, which help listeners associate sounds with meaning, against listener fatigue. Sounds are made up of many different elements: pitch, rhythm, timbre, and movement. You can use these elements to design a sonic language for your experience.

If you’re new to VR Audio Design, check out the VR Audio Design, Engineering, and Mastering Guide.

UI Sounds

To design sounds that correlate with your UI, you’ll likely want them to be relatively short and simple. Aim to create sounds that are memorable and recognizable. They should communicate the function of each element even if your player can’t read the text or understand the meaning of the button iconography.

With UI sounds, it’s important to refrain from overwhelming people with too many sounds as they click through or hover over certain buttons or menus. Use audio to amplify the meaning of each UI element, but make sure your sound design is not audibly cluttered or confusing.
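One way to keep hover and click sounds from piling up is to enforce a minimum gap between them. The sketch below is illustrative only: the class name `UiSoundThrottle` and the 80 ms default are assumptions, not values from this guide, and the right interval should be tuned by ear.

```python
class UiSoundThrottle:
    """Suppresses UI sounds that would fire too soon after the previous one."""

    def __init__(self, min_interval_s: float = 0.08):
        # Assumed default; tune this by listening to real menu navigation.
        self.min_interval_s = min_interval_s
        self._last_played_s = None  # time of the last sound that was allowed

    def should_play(self, now_s: float) -> bool:
        """Return True if a UI sound may play at time `now_s` (seconds)."""
        too_soon = (
            self._last_played_s is not None
            and now_s - self._last_played_s < self.min_interval_s
        )
        if too_soon:
            return False
        self._last_played_s = now_s
        return True
```

Call `should_play` with the current time before triggering a hover sound. A player sweeping a pointer across a dense menu then hears a few distinct cues rather than a burst of overlapping ones.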

Gameplay Sounds

Unlike UI sounds, gameplay sounds will differ in length and complexity based on the context in which they are heard. If a sound is going to be heard repeatedly at short intervals, you’ll want to design several variations of it so that your users don’t get fatigued. Listener fatigue can cause your players to become desensitized to your sounds and, as a result, potentially ignore important sonic information.

Footsteps are a good example of intentional sound variations. If your game has different types of terrain that players walk on, you might want to create different footstep sounds based on whether the player is walking on dirt, grass, or concrete. Someone who can’t tilt their head low enough to see the ground might rely on the footstep sounds to figure out where they are on the map, so each type of footstep has to be distinct enough to indicate its material. Then, within each type of footstep, your variations should still be similar enough that the associated terrain is easy to identify. For example, every variation in a group of grass footsteps should sound like the player is wearing the same type of shoe and walking on grass of a consistent height and type.
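The footstep approach above can be sketched as a small picker that keeps one pool of variations per terrain and never plays the same clip twice in a row. The class name and the clip names in the usage note are hypothetical placeholders.

```python
import random


class VariationPicker:
    """Picks a sound variation per terrain, avoiding immediate repeats."""

    def __init__(self, variations_by_terrain, rng=None):
        # Maps a terrain name to a list of clip identifiers (hypothetical names).
        self._variations = variations_by_terrain
        self._last = {}  # last clip played for each terrain
        self._rng = rng or random.Random()

    def pick(self, terrain):
        options = self._variations[terrain]
        last = self._last.get(terrain)
        # Exclude the previous clip; fall back to all options if only one exists.
        candidates = [clip for clip in options if clip != last] or options
        choice = self._rng.choice(candidates)
        self._last[terrain] = choice
        return choice
```

For example, `VariationPicker({"grass": ["grass_01", "grass_02", "grass_03"]}).pick("grass")` returns one of the three grass clips. Because variations are only ever drawn from the pool for the current terrain, the material stays identifiable while the exact sound changes.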


Pay attention to the frequency range of your individual sounds. This is important for listener comfort and for the listener’s ability to hear each sound. The human ear is not equally sensitive to all frequencies.

A sound that’s around 3,000Hz to 4,000Hz is perceived to be louder than one at 1,000Hz, even if they’re played at the same volume. The inverse is true for sounds that are lower than 300Hz; a sound lower than 300Hz sounds softer than one at 1,000Hz when played at the same volume. Remember the concept of perceived loudness as it will enable you to mix your soundscape for user comfort and accessibility.
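This guide does not name a specific loudness model. As a rough illustration, the A-weighting curve (defined in IEC 61672) is one widely used approximation of how sensitive the ear is at different frequencies, relative to 1,000Hz. It is an assumption swapped in here for illustration and is not a substitute for listening tests.

```python
import math


def a_weighting_db(frequency_hz: float) -> float:
    """Approximate relative ear sensitivity in dB (A-weighting, 0 dB at 1 kHz)."""
    if frequency_hz <= 0:
        raise ValueError("frequency_hz must be positive")
    f2 = frequency_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The +2.0 dB offset normalizes the curve to 0 dB at 1,000 Hz.
    return 20.0 * math.log10(numerator / denominator) + 2.0
```

Under this approximation, a tone around 3,000Hz comes out slightly above 0 dB and a tone at 100Hz comes out far below it, which matches the direction of the effect described above.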

Frequency range is also important when you consider players with partial hearing loss, as they may not be able to hear sounds above a certain frequency. High-frequency hearing loss is not uncommon, and it causes people to struggle with sounds within the 2,000Hz to 8,000Hz range. This can make it difficult for them to hear women’s or children’s voices, or certain consonants such as s, h, or f. Refer to the section on Captions/Subtitles for details on how you can mitigate this accessibility issue.

As people age, they may also lose the ability to hear frequencies above 12,000Hz. If there is a critical event that relies on a very high-pitched audio cue, consider adding visual cues and/or haptics.
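If your sound assets carry metadata about their dominant frequency, a simple audit can flag cues that fall into the ranges described above. The sketch below assumes such metadata exists; the function name, the constants, and the wording of the notes are illustrative, while the frequency limits come from this section.

```python
from typing import List

# Ranges taken from the text above.
HIGH_FREQUENCY_LOSS_BAND_HZ = (2000.0, 8000.0)
AGE_RELATED_LIMIT_HZ = 12000.0


def accessibility_notes(dominant_hz: float, is_critical: bool) -> List[str]:
    """Return human-readable warnings for a cue with the given dominant frequency."""
    notes = []
    low, high = HIGH_FREQUENCY_LOSS_BAND_HZ
    if low <= dominant_hz <= high:
        notes.append("may be hard to hear with high-frequency hearing loss")
    if dominant_hz > AGE_RELATED_LIMIT_HZ:
        notes.append("may be inaudible to older players")
    # Critical cues that triggered a warning should not rely on audio alone.
    if notes and is_critical:
        notes.append("add a visual cue and/or haptics")
    return notes
```

A check like this only narrows down which cues to review. It does not replace testing with players who have hearing loss.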

Audio Mixing Considerations

The audio mix is integral not only to creating a good-quality sonic experience, but also to ensuring that information is effectively communicated to the user. During this process, make sure that each individual sound sits at a comfortable listening level in relation to the others.

The volume of each individual sound should be generally balanced. If a sound is too quiet, the user might adjust their headset volume to compensate, only to be surprised by subsequent sounds being way too loud. If the opposite is true, they might turn down the headset volume and end up missing important information from a voiceover line that plays afterwards, for example.
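A quick way to spot sounds that are badly out of balance is to measure each clip and flag the ones that sit far from the rest. The sketch below uses plain RMS level in dBFS, which is only a crude stand-in for perceived loudness. The 6 dB tolerance and the function names are assumptions for illustration.

```python
import math
from statistics import median


def rms_dbfs(samples):
    """RMS level of float samples in [-1.0, 1.0], in dB relative to full scale."""
    if not samples:
        raise ValueError("samples must not be empty")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square)


def find_level_outliers(levels_db, tolerance_db=6.0):
    """Return {name: offset_db} for clips further than tolerance_db from the median."""
    center = median(levels_db.values())
    return {
        name: level - center
        for name, level in levels_db.items()
        if abs(level - center) > tolerance_db
    }
```

A positive offset means the clip is louder than the typical clip in the set, and a negative offset means it is quieter. Any flagged clip is a candidate for a manual listen, not an automatic gain change.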

You should also test your volume levels as players progress through the experience. Do they need to raise or lower the volume as they get closer to the end, or keep adjusting it the entire time? Do they need to reset the application’s volume levels every time they open it or install a software update?

Another important element of audio mixing involves the dynamic behavior of your in-game audio. Typically, any time there is spoken audio, it takes priority over the rest of the mix. If these lines aren’t easily heard over music, ambience, or sound effects, it can be extremely challenging for the player to focus on gameplay during or after the lines because they’ll be stuck trying to decipher what they just heard. Ducking is a very useful technique for these situations: it automatically lowers the level of most of the mix to prioritize specific sounds like voiceover (VO), or to direct the user’s attention to an event or object, such as the pause menu.
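Ducking as described above can be sketched as a gain that ramps down while voiceover is active and ramps back up afterwards. This is a minimal illustration, not a specific engine’s API: the class name, the -12 dB depth, and the attack and release times are all assumed values to be tuned by ear.

```python
class Ducker:
    """Ramps a bus gain (in dB) down while voice is active and back up afterwards."""

    def __init__(self, duck_db=-12.0, attack_s=0.1, release_s=0.5):
        self.duck_db = duck_db      # gain applied to music/ambience/SFX while ducked
        self.attack_s = attack_s    # time to reach full ducking
        self.release_s = release_s  # time to return to 0 dB
        self.current_db = 0.0

    def update(self, voice_active: bool, dt_s: float) -> float:
        """Advance by dt_s seconds and return the current ducking gain in dB."""
        target = self.duck_db if voice_active else 0.0
        ramp_s = self.attack_s if voice_active else self.release_s
        if ramp_s <= 0:
            self.current_db = target  # no ramp configured: jump straight to target
            return self.current_db
        max_step = abs(self.duck_db) * dt_s / ramp_s
        delta = target - self.current_db
        # Move toward the target without overshooting it.
        self.current_db += max(-max_step, min(max_step, delta))
        return self.current_db


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10.0 ** (gain_db / 20.0)
```

Call `update` once per frame and multiply the non-dialogue buses by `db_to_linear(ducker.current_db)`. A slower release than attack keeps the mix from pumping between closely spaced lines.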

Just as we’ve encouraged in all the other sections, it’s best to open some customization options up to your users. Many console and PC games have audio options that allow users to customize volume levels for music, sound effects, dialogue, and voice chat. A user who struggles with attention might want to turn down the music and sound effects in the game to ensure they don’t miss a line of dialogue that could be critical to the story.
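Per-category volume options can be modeled as a small settings object that is saved and restored between sessions, so users don’t have to redo their adjustments every time they open the application. The field names below mirror the categories mentioned above. The JSON storage format and the class name are assumptions made for this sketch.

```python
import json
from dataclasses import asdict, dataclass, fields


def _clamp01(value: float) -> float:
    """Clamp a volume to the range 0.0 (silent) to 1.0 (full)."""
    return max(0.0, min(1.0, value))


@dataclass
class AudioSettings:
    master: float = 1.0
    music: float = 1.0
    sound_effects: float = 1.0
    dialogue: float = 1.0
    voice_chat: float = 1.0

    def gain_for(self, category: str) -> float:
        """Linear gain for a category: master volume times category volume."""
        names = {f.name for f in fields(self)} - {"master"}
        if category not in names:
            raise KeyError(category)
        return _clamp01(self.master) * _clamp01(getattr(self, category))

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "AudioSettings":
        data = json.loads(text)
        known = {f.name for f in fields(cls)}
        # Ignore unknown keys so files from older or newer versions still load.
        return cls(**{k: float(v) for k, v in data.items() if k in known})
```

A player who wants to focus on dialogue could set `music=0.2` and `sound_effects=0.5`. Storing the result of `to_json()` and reloading it with `from_json()` at startup preserves that choice across sessions.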

Another common setting lets users select how wide or narrow they’d like the dynamic range to be. The wider the dynamic range, the bigger the difference in volume between the quietest sound and the loudest sound. For example, a player who experiences partial hearing loss may want to opt for a narrower dynamic range so that all the different sounds are easier to hear.
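To illustrate what a dynamic range setting does, the sketch below scales every level, in dB, toward a fixed anchor level. This is a simplified stand-in for a real compressor. The function name, the -18 dB anchor, and the preset values are assumptions, not recommendations from this guide.

```python
# Hypothetical presets: fraction of the original range that is removed.
DYNAMIC_RANGE_PRESETS = {"wide": 0.0, "medium": 0.3, "narrow": 0.6}


def narrow_dynamic_range(level_db: float, amount: float, anchor_db: float = -18.0) -> float:
    """Pull a level (in dB) toward anchor_db.

    amount = 0.0 leaves the level unchanged; amount = 1.0 moves it all the
    way to the anchor. Quiet sounds get louder and loud sounds get quieter.
    """
    amount = max(0.0, min(1.0, amount))
    return anchor_db + (level_db - anchor_db) * (1.0 - amount)
```

With `amount=0.5`, a quiet sound at -40 dB and a loud sound at -6 dB move to -29 dB and -12 dB, so the gap between them shrinks from 34 dB to 17 dB.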