This guide describes how to use the Oculus Lipsync integration within your own projects. You may find it helpful to use the Unity sample scene or the included prefabs as reference. In order to use the Lipsync plugin, you must have already completed the download and setup steps to add the Oculus Lipsync assets to your Unity project.

Main Lipsync interface

To use Oculus Lipsync, a scene must include the OVRLipSync script, which is the main interface to the Oculus Lipsync plugin. We provide the LipSyncInterface prefab for convenience. You can find it in Unity under the Assets > Oculus > LipSync > Prefabs menu option. Add this to your scene to get started.

To drive your own objects with Lipsync

Add OVRLipSyncContext to each GameObject that has a morph or texture target you want to control. This script provides a number of options:

  • Show Visemes displays responses on an OVRLipSyncDebugConsole. You can add the LipSyncDebugConsole prefab for an easy way to include this.
  • Audio Loopback replays received audio as output.
  • Enable Keyboard/Touch Input enables keyboard and touch control of the two options above.

All options provide tool-tips with more information.

OVRLipSyncContextMorphTarget and OVRLipSyncContextTextureFlip are the scripts that bridge the viseme output from OVRLipSyncContext to the morph targets or textures of your model, as described in the following sections.

Geometry morph targets

OVRLipSyncContextMorphTarget requires a Skinned Mesh Renderer, which should have blend targets assigned to it (see the Lips object in the LipSyncMorphTarget_Lips prefab for an example). The mesh should include all 15 visemes generated by the OVRLipSyncContext. To see them, expand BlendShapes in the Inspector view of the Lips object.

Each blend target from sil to ou represents a viseme generated by the viseme engine. You can view each one by setting the blend target for a single viseme to 100.0. Note that sil corresponds to silence, i.e. the neutral expression, so setting it to 100 with all other values at 0 has no visible effect. We provide a reference set of viseme images here, based on those from the Viseme MPEG-4 Standard.

Visemes capture lip shapes only, so if you want your model to show expression you may need to add further shapes. Use caution when adding mouth expressions. For example, adding a laughter expression on top of viseme shapes may look uncanny. The OVRLipSyncContextMorphTarget script includes options under Viseme To Blend Targets that let you assign the 15 outputs from Oculus Lipsync to blend shapes other than the first 15.
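
The remapping of viseme outputs to blend shapes can be modelled in a few lines. The following Python sketch is an illustration only and is not the code of OVRLipSyncContextMorphTarget. It assumes that the engine produces 15 weights in the range 0 to 1 (an assumption, not stated in this guide) and scales them to the 0 to 100 blend weight range used in the Inspector. The names apply_visemes and viseme_to_blend_target are hypothetical.

```python
# Illustrative model only; not the OVRLipSyncContextMorphTarget source.
VISEME_COUNT = 15  # number of viseme outputs described in this guide


def apply_visemes(viseme_weights, viseme_to_blend_target, blend_weights):
    """Write each viseme weight into its assigned blend shape slot.

    viseme_weights: 15 floats, assumed to lie in the range 0..1.
    viseme_to_blend_target: 15 blend shape indices (hypothetical name,
        standing in for the Viseme To Blend Targets setting).
    blend_weights: mutable list of blend shape weights on a 0..100 scale.
    """
    if len(viseme_weights) != VISEME_COUNT:
        raise ValueError("expected 15 viseme weights")
    if len(viseme_to_blend_target) != VISEME_COUNT:
        raise ValueError("expected 15 blend target indices")
    for viseme_index, target in enumerate(viseme_to_blend_target):
        # Clamp to 0..1, then scale to the 0..100 blend weight range.
        clamped = min(max(viseme_weights[viseme_index], 0.0), 1.0)
        blend_weights[target] = clamped * 100.0
    return blend_weights


# Example: a mesh with 20 blend shapes whose viseme shapes start at index 5.
mapping = list(range(5, 20))
weights = [0.0] * VISEME_COUNT
weights[10] = 0.5  # eleventh viseme output is half active
shapes = apply_visemes(weights, mapping, [0.0] * 20)
```

In this example the eleventh viseme output drives blend shape 15 rather than blend shape 10, which is the kind of offset the Viseme To Blend Targets options are meant to allow.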

By selecting the Enable Viseme Test Keys option you can drive each viseme to 100% using the QWERTY row of a US-layout keyboard by default, or set the keys by expanding Viseme Test Keys.

Texture flip targets

OVRLipSyncContextTextureFlip requires a material target and a set of textures, one for each viseme, which are selected based on output from the OVRLipSyncContext. Set these textures in the Textures field, where each entry must be the texture that you want to associate with the corresponding viseme.

The logic within the OVRLipSyncContextTextureFlip script chooses only one texture per frame and assigns it to the main material texture. That material should be assigned to the model used for drawing the avatar lips.
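
As a rough illustration of choosing one texture per frame, the Python sketch below picks the texture whose viseme has the largest weight. This selection rule is an assumption made for the example, since this guide does not state how the script decides. The function name pick_texture and the texture file names are hypothetical.

```python
# Illustrative model only; not the OVRLipSyncContextTextureFlip source.

def pick_texture(viseme_weights, textures):
    """Return the texture paired with the strongest viseme this frame.

    Assumes one texture per viseme, in the same order as the weights.
    On a tie, the earliest viseme in the list wins.
    """
    if len(viseme_weights) != len(textures):
        raise ValueError("need exactly one texture per viseme")
    strongest = max(range(len(viseme_weights)),
                    key=lambda index: viseme_weights[index])
    return textures[strongest]


# Example frame: the fourth viseme dominates.
textures = ["viseme_%d.png" % i for i in range(15)]
frame = [0.0] * 15
frame[3] = 0.7
frame[9] = 0.2
chosen = pick_texture(frame, textures)
```

Because only one texture is shown at a time, weaker visemes in the same frame (index 9 here) have no visual effect, unlike the morph target approach where shapes blend.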

Other Oculus Lipsync Scripts

OVRLipSyncMicInput is for use with a GameObject which has an AudioSource attached to it. It takes input from any attached microphone and pipes it through the AudioSource.

You should look at the other scripts and prefabs included with this integration. They will provide more insight as to what is possible with Oculus Lipsync. For example, there are some helper scripts to facilitate easy on-screen (in VR) debugging.

Using Audio Spatialization with LipSync

You can use the Oculus Native Spatializer for Unity to process sound sources so that the user experiences audio in a 3D environment, relative to the user’s head orientation and location. This dramatically improves the user’s experience of sound within immersive experiences. However, when speech is subjected to spatial processing, it affects the integrity of the spoken signal. The spatial processing essentially adds noise to the signal, which degrades the viseme output. To use audio spatialization in conjunction with Oculus Lipsync, you must configure the spatializer to post-process the signal, so that the raw input signal drives the viseme engine.

By default, the Oculus Native Spatializer for Unity processes the AudioSource buffers before calling OnAudioFilterRead (which invokes the LipSync functionality, if it is enabled). To change the order, set the spatializePostEffects flag on the AudioSource to True. You can set this flag in the Unity script function OnAudioFilterRead. You can also set this flag via the Inspector window. The field will show up provided the Spatialize field is checked. For more information, see AudioSource.spatializePostEffects.
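
The effect of the processing order can be seen in a toy model. The Python sketch below is not Unity code and makes no claim about how the spatializer works internally. The spatializer is replaced by a simple attenuation, and the analysis step just records the buffer it receives. The names run_chain, attenuate, and tap are hypothetical.

```python
# Toy model of the processing order for one audio buffer; not Unity code.

def run_chain(samples, spatialize_post_effects, spatialize, analyze):
    """Return (output buffer, buffer seen by the analysis step)."""
    if spatialize_post_effects:
        # Analysis step runs first and sees the raw input signal.
        seen = analyze(samples)
        output = spatialize(samples)
    else:
        # Default order: analysis step sees the spatialized signal.
        output = spatialize(samples)
        seen = analyze(output)
    return output, seen


def attenuate(buffer):
    """Stand-in for spatial processing: halve every sample."""
    return [sample * 0.5 for sample in buffer]


def tap(buffer):
    """Stand-in for the viseme analysis: record what it receives."""
    return list(buffer)


raw = [0.2, -0.4, 0.6]
_, seen_default = run_chain(raw, False, attenuate, tap)
_, seen_post = run_chain(raw, True, attenuate, tap)
```

With the flag off, the analysis step receives the altered samples. With the flag on, it receives the buffer exactly as it was captured, which is the situation the viseme engine needs.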