Creating Spatial Audio for 360 Video using FB360 Spatial Workstation
Oculus Developer Blog | Posted by Abesh Thakur | July 25, 2017

Spatial audio is a powerful way to immerse a user and direct their attention within your VR app. Simple audio cues can steer much of a user's attention, but a fully immersive experience needs those cues to be spatialized. Spatial audio makes what we hear match what we see and what we know from experience. It lets you present audio cues from any direction, drawing the user's attention and making the VR experience believable.

In this post, we’ll cover the basics of spatial audio and provide a few how-to guides to help you make an awesome VR experience.

What is Spatial Audio?

The human brain interprets auditory signals in a specific way that allows it to make decisions about its surrounding environment. We use our two ears, in conjunction with the ability to move our heads in space, to make better decisions about the position of an audio signal and the environment the sound is in.

Spatial audio in virtual reality is the manipulation of audio signals so they mimic acoustic behavior in the real world. An accurate sonic representation of a virtual world is a very powerful way to create a compelling and immersive experience. Spatial audio not only serves as a mechanism to complete the immersion but is also very effective as a UI element: audio cues can call attention to various focal points in a narrative or draw the user into watching specific parts of a 360 video, for example.

Spatial audio can be experienced using a normal pair of headphones. No special speakers, hardware, or multi-channel headphones are required. For more details on how spatialized audio works, check out our Introduction to Virtual Reality Audio.

Try it for yourself with these examples of spatialized audio. Make sure you have your headphones on!

Through the Ages

Rapid Fire: A Brief History of Flight

Linear vs Interactive Audio Design

Playback of spatial audio works the same way whether the experience is interactive, a 360 video, or a cinematic mixed reality piece, but the workflow for creating the content differs significantly. Games and interactive experiences typically rely on audio samples played from discrete sound sources and mixed in real time relative to the position of the camera. The Oculus Audio SDK is designed to add high-quality spatialization to the middleware (such as FMOD and Wwise) and engines (Unity and Unreal) that game developers already use.

Designing audio for 360 videos typically requires a very different workflow. Sound designers work with a set of tools organized around a timeline where the final results are predetermined. Though the content is by definition linear and deterministic, this model gives sound designers maximum creative control over the experience. Digital Audio Workstations (DAWs) such as Pro Tools, Nuendo, and Reaper are typically used to craft such content. Currently for 360 video, the most common method of developing a spatial soundscape is a technology called ambisonics.

AMBISONICS

Ambisonic technology is a method to render 3D sound fields in a spherical format around a particular point in space. It is conceptually similar to 360 video except the entire spherical sound field is audible and responds to changes in head rotation. There are many ways of rendering to an ambisonic field, but all of them rely on decoding to a binaural stereo output to allow the user to perceive the spatial audio effect over a normal pair of headphones.
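To make the idea of a sound field that responds to head rotation concrete, here is a minimal sketch using a generic first-order formulation. It is not the Spatial Workstation's internal implementation. The function names are hypothetical, and the conventions are assumptions: channels in ambiX order (W, Y, Z, X) with SN3D normalization, azimuth measured counter-clockwise from straight ahead, and a head yaw that is positive when the listener turns left.

```python
import math

def encode_first_order(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample into first-order ambisonics (W, Y, Z, X).

    Assumes ambiX channel order with SN3D normalization.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                   # omnidirectional component
    y = sample * math.sin(az) * math.cos(el)     # left-right component
    z = sample * math.sin(el)                    # up-down component
    x = sample * math.cos(az) * math.cos(el)     # front-back component
    return (w, y, z, x)

def rotate_for_head_yaw(wyzx, head_yaw_deg):
    """Counter-rotate the field so sources stay fixed in the world
    while the listener's head turns by head_yaw_deg."""
    w, y, z, x = wyzx
    t = math.radians(head_yaw_deg)
    x_rot = x * math.cos(t) + y * math.sin(t)
    y_rot = -x * math.sin(t) + y * math.cos(t)
    return (w, y_rot, z, x_rot)                  # W and Z are unaffected by yaw

# A source directly to the listener's left...
field = encode_first_order(1.0, azimuth_deg=90.0)
# ...ends up straight ahead once the listener turns 90 degrees to the left.
rotated = rotate_for_head_yaw(field, head_yaw_deg=90.0)
```

The rotated field would then be decoded to binaural stereo. That decoding step is omitted here because it depends on the HRTF set and decoder in use.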

Ambisonic audio comes in different orders, each using a different number of channels. More channels result in higher spatial quality, although there is a limit to the perceived difference in sound quality beyond 3rd order ambisonics (16 channels of audio). Regardless of the number of channels used to encode the original signal, the decoded binaural output always has two channels. As the listener moves their head, the content of the decoded output stream changes accordingly, providing a 3D spatial effect.
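The relationship between order and channel count is easy to compute: a full-sphere ambisonic signal of order n uses (n + 1)^2 channels, which is where the 16 channels for 3rd order come from. A small sketch (the function name is just for illustration):

```python
def ambisonic_channel_count(order):
    """Number of channels in a full-sphere ambisonic signal of a given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2

for order in range(4):
    print(order, ambisonic_channel_count(order))
# 0 1
# 1 4
# 2 9
# 3 16
```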

Ambisonics is not the only way to render spatial audio for 360 videos. Other solutions exist, although their effectiveness, feature sets, toolchains, and final render quality vary:

  • Traditional surround sound such as 5.1 or 7.1, which can be decoded over virtual speakers and rendered binaurally over headphones. Depending on the content, the rendered sound field may suffer from ‘holes’ between the speakers and won't have the same smoothness in spatial accuracy or resolution.
  • Quad-binaural: 4 pairs of pre-rendered binaural stereo tracks, one each at 0, 90, 180, and 270 degrees. The audio streams are crossfaded based on head rotation.
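As a rough illustration of the quad-binaural approach, the sketch below blends the two pre-rendered stereo pairs nearest to the current head yaw. The function names are hypothetical, and the linear crossfade is an assumption: a real player may use an equal-power or other fade law. It also assumes that render i was made for a head orientation of i * 90 degrees.

```python
def quad_binaural_gains(head_yaw_deg):
    """Return gains for the renders at 0, 90, 180, and 270 degrees.

    Assumed behavior: linear crossfade between the two nearest renders.
    """
    yaw = head_yaw_deg % 360.0
    step = yaw // 90.0
    lower = int(step) % 4            # render at or just below the yaw
    upper = (lower + 1) % 4          # next render, wrapping 270 -> 0
    frac = (yaw - 90.0 * step) / 90.0
    gains = [0.0, 0.0, 0.0, 0.0]
    gains[lower] = 1.0 - frac
    gains[upper] += frac
    return gains

def mix_quad_binaural(frames, head_yaw_deg):
    """Mix one (left, right) frame from each of the four renders."""
    gains = quad_binaural_gains(head_yaw_deg)
    left = sum(g * f[0] for g, f in zip(gains, frames))
    right = sum(g * f[1] for g, f in zip(gains, frames))
    return (left, right)

# Halfway between the 0 and 90 degree renders, each contributes equally.
frames = [(1.0, 0.0), (0.0, 1.0), (0.0, 0.0), (0.0, 0.0)]
print(mix_quad_binaural(frames, 45.0))  # (0.5, 0.5)
```

Because only four orientations are pre-rendered, everything in between is an interpolation of finished stereo mixes rather than a true rotation of the sound field, which is one reason render quality differs between techniques.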

Using Facebook 360 Spatial Workstation to Create 360 Linear VR

The FB 360 Spatial Workstation is an end-to-end pipeline that allows sound designers to drop in audio sources, pan and sync them to scene elements, and render to a single ambisonic file that is played back on Facebook and Oculus Video. Originally developed by Two Big Ears, Facebook 360 Spatial Workstation is now a free tool provided by the Audio 360 team at Facebook.

Facebook 360 Spatial Workstation is a collection of plugins for DAWs that include a spatializer, video player, encoder, and rendering SDK, just to name a few. These plugins help authors create spatial audio content, encode it with platform-specific metadata (for Facebook, YouTube, etc.), and play it back in a client application.

The diagram above illustrates a typical end-to-end workflow focusing on sound design, asset preparation, mixing with final video, and publishing to Facebook, Oculus or other supported apps.

For most third-party apps on Gear VR and other platforms using the Rendering SDK, the sound designer prepares a .tbe file, which is delivered separately from the video. The third-party application integrates the Rendering SDK, which allows it to play back the .tbe file in sync with the video file. The documentation describes multiple APIs for synchronizing to an external clock.

For Facebook and YouTube, the Facebook 360 Encoder application creates an upload-ready .mp4 file.

CONTENT CREATION

The Facebook 360 Spatial Workstation is a collection of plugins for creating interactive spatial mixes for 360 videos.

  • The Spatialiser plugin allows the sound designer to place a sound source in space. The source can be a mono source, an ambisonics recording, or a multi-channel source such as a surround reverb. Non-mono sources act as a 'bed', while diegetic mono sources, such as dialogue and sound effects, are usually placed in the scene. Non-diegetic audio such as narration or background music is usually routed to the head-locked stereo bus, so it is part of the final mix but does not rotate with head orientation.
  • The Control plugin acts as the command centre, controlling how all audio is routed for real-time binaural playback over headphones. This plugin also manages global settings of features such as early reflections and mix focus.
  • The Video player is a built-in 360 video player that is 'slaved' to the DAW timeline, and allows the sound designer to preview the mix with the 360 video in real time, either in VR or on the desktop. Desktop mode allows rotating the video with the keyboard or mouse, which will rotate the sound field instantly, providing direct feedback during the authoring stage.
  • The Converter plugin is a utility that can rotate a mix after it has been created, or output to other formats such as 4-channel ambiX.
  • The Loudness meter provides an overview of the loudness of the entire mix when looking in a particular direction. Loudness metering for spatial mixes differs considerably from the meters built into DAWs, which usually assume static content. Because the level of a spatial mix depends on where the listener is looking, this meter provides data that helps prevent the final uploaded content from distorting when played back on the target device.

ENCODING AND ASSET PREPARATION

The Facebook 360 Encoder application takes a video file and combines the audio files into the video container, producing a file suitable for playback on Facebook and other supported platforms. It also lets you add metadata to the file describing values for the Focus feature.

This process also injects the relevant metadata into the file, making the final asset ready for upload to supported platforms.

SUPPORTED PLATFORMS FOR PLAYBACK

  1. Facebook 360 Video format (8 or 10 channel audio): Can be viewed on Facebook News Feed, or Oculus Video on Gear VR
  2. .tbe format: Apps with Rendering SDK
  3. YouTube 360
  4. Other platforms with support for ambisonics or quad-binaural format. Note that these platforms have specific instructions for preparing assets.

FURTHER READING

Here are some key resources to get started with Facebook 360 Spatial Workstation.