This guide describes how to install and use the Oculus Lip Sync Unity integration.
The Oculus Lip Sync Unity integration (OVRLipSync) is an add-on plugin and set of scripts used to sync avatar lip movements to speech sounds. OVRLipSync analyzes an audio input stream from either a canned source or microphone input and produces a set of values, called visemes, which may be used to animate the lips of an avatar.
A viseme is a gesture or expression of the lips and face that corresponds to a particular speech sound. The term is used, for example, when discussing lip reading, where it is analogous to the concept of a phoneme, and is a basic visual unit of intelligibility. In computer animation, visemes may be used to animate avatars so that they look like they are speaking.
OVRLipSync uses a repertoire of visemes to modify avatars based on a specified audio input stream. Each viseme targets a specified morph target in an avatar and controls how strongly that target is expressed on the model. Realistic lip movement can thus sync what is being spoken to what is being seen, enhancing the visual cues one can use when populating an application with avatars, whether they are controlled by a user (locally or over a network) or are NPC avatars whose lip sync animations are generated from dialogue samples.
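The per-frame flow described above can be sketched in a few lines. This is a hypothetical, language-agnostic illustration, not the OVRLipSync API: the names `MorphTargetRig` and `apply_viseme_frame` are invented for this sketch, and in an actual Unity project this logic would live in C# scripts that drive blendshape weights on the avatar's mesh.

```python
# Illustrative sketch only: maps per-frame viseme weights onto morph
# targets. MorphTargetRig and apply_viseme_frame are hypothetical names,
# not part of the OVRLipSync integration.
from dataclasses import dataclass, field


@dataclass
class MorphTargetRig:
    """Stands in for an avatar's blendshape controller (weights 0..100)."""
    weights: dict = field(default_factory=dict)

    def set_weight(self, target: str, weight: float) -> None:
        self.weights[target] = weight


def apply_viseme_frame(rig: MorphTargetRig, frame: dict) -> None:
    """Map each viseme weight (0..1) onto its morph target (0..100)."""
    for viseme, weight in frame.items():
        clamped = max(0.0, min(1.0, weight))
        rig.set_weight(viseme, clamped * 100.0)


# Example: a frame where the avatar is mostly forming "aa".
rig = MorphTargetRig()
apply_viseme_frame(rig, {"sil": 0.25, "aa": 0.75})
print(rig.weights["aa"])  # 75.0
```

Calling this once per audio frame, with whatever weights the analysis produces, is the essence of driving an avatar's mouth from a viseme stream.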
Our system currently maps to 15 separate viseme targets: sil, PP, FF, TH, DD, kk, CH, SS, nn, RR, aa, E, ih, oh, and ou. These visemes correspond to the expressions typically made by people producing the speech sound for which they are named. For example, the viseme sil corresponds to a silent/neutral expression, PP appears to be pronouncing the first syllable in “popcorn,” FF the first syllable of “fish,” and so forth.
These targets have been selected to give the maximum range of lip movement and are agnostic to language. For more information on these 15 visemes and how they were selected, see the Viseme MPEG-4 Standard documentation.