Binocular Vision, Stereoscopic Imaging and Depth Cues

  • The brain uses differences between your eyes’ viewpoints to perceive depth.
  • Don’t neglect monocular depth cues, such as texture and lighting.
  • The most comfortable range of depths for a user to look at in the Rift is between 0.75 and 3.5 meters (1 unit in Unity = 1 meter).
  • Set the distance between the virtual cameras to the distance between the user’s pupils (the IPD), as provided by the OVR config tool.
  • Make sure the images in each eye correspond and fuse properly. Effects that appear in only one eye or differ significantly between the eyes look abnormal.

Basics

Binocular vision describes the way in which we see two views of the world simultaneously—the view from each eye is slightly different and our brain combines them into a single three-dimensional stereoscopic image, an experience known as stereopsis. The difference between what we see from our left eye and what we see from our right eye generates binocular disparity. Stereopsis occurs whether we are seeing different viewpoints of the physical world from each of our eyes or two flat pictures with appropriate differences (disparity) between them.

The Oculus Rift presents two images, one to each eye, generated by two virtual cameras separated by a short distance. Some terminology is in order: the distance between our two eyes is called the interpupillary distance (IPD), and we refer to the distance between the two rendering cameras that capture the virtual environment as the inter-camera distance (ICD). Although IPD can vary from about 52 mm to 78 mm, the average IPD (based on data from a survey of approximately 4,000 U.S. Army soldiers) is about 63.5 mm, which matches the Rift’s interaxial distance (IAD), the distance between the centers of the Rift’s lenses (as of this revision of this guide).
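
To make the terminology concrete, here is a minimal sketch in C++ (all names are hypothetical, not part of the Oculus SDK) that derives per-eye camera offsets from a measured IPD, falling back to the 63.5 mm average when no measurement is available:

    #include <cstdio>

    // Sketch: deriving per-eye camera offsets from a measured IPD.
    // All names are illustrative; this is not an SDK API.
    const float AVERAGE_IPD_METERS = 0.0635f; // ~63.5 mm survey average

    // With the ICD set equal to the IPD, each rendering camera sits
    // half that distance to either side of the head's center line.
    float EyeOffsetFromCenter(float userIpdMeters)
    {
        float icd = (userIpdMeters > 0.0f) ? userIpdMeters : AVERAGE_IPD_METERS;
        return icd * 0.5f;
    }

    int main()
    {
        // e.g., a 61 mm IPD read from the user's profile
        float offset = EyeOffsetFromCenter(0.061f);
        std::printf("left camera x: %+.4f m, right camera x: %+.4f m\n", -offset, offset);
        return 0;
    }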

Monocular depth cues

Stereopsis is just one of many depth cues our brains process. Most of the other depth cues are monocular; that is, they convey depth even when they are viewed by only one eye or appear in a flat image viewed by both eyes. In VR, motion parallax from head movement does not depend on stereopsis, but it is extremely important for conveying depth and for providing a comfortable experience to the user.

Other important depth cues include: curvilinear perspective (straight lines converge as they extend into the distance), relative scale (objects get smaller when they are farther away), occlusion (closer objects block our view of more distant objects), aerial perspective (distant objects appear fainter than close objects because the atmosphere scatters and absorbs light), texture gradients (repeating patterns get more densely packed as they recede), and lighting (highlights and shadows help us perceive the shape and position of objects). Current-generation computer-generated content already leverages many of these depth cues, but we mention them because it can be easy to neglect their importance in light of the novelty of stereoscopic 3D.

Comfortable Viewing Distances Inside the Rift

Two issues are of primary importance to understanding eye comfort when the eyes are fixating on (i.e., looking at) an object: accommodative demand and vergence demand. Accommodative demand refers to how your eyes have to adjust the shape of their lenses to bring a depth plane into focus (a process known as accommodation). Vergence demand refers to the degree to which the eyes have to rotate inwards so their lines of sight intersect at a particular depth plane. In the real world, these two are strongly correlated with one another; so much so that we have what is known as the accommodation-convergence reflex: the degree of convergence of your eyes influences the accommodation of your lenses, and vice versa.

The Rift, like any other stereoscopic 3D technology (e.g., 3D movies), creates an unusual situation that decouples accommodative and vergence demands—accommodative demand is fixed, but vergence demand can change. This is because the actual images for creating stereoscopic 3D are always presented on a screen that remains at the same distance optically, but the different images presented to each eye still require the eyes to rotate so their lines of sight converge on objects at a variety of different depth planes.
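
To make the decoupling concrete, the short sketch below computes the vergence angle the eyes must adopt to fixate at several depths while the accommodative distance stays pinned at the optics’ focal distance. The names are illustrative, and the 1.3-meter screen-equivalent distance used here is the approximation discussed in the next paragraph:

    #include <cmath>
    #include <cstdio>
    #include <initializer_list>

    // Sketch: vergence demand varies with fixation depth while
    // accommodative demand stays fixed at the optics' focal distance.
    // Vergence angle for fixation distance d and interpupillary
    // distance ipd: 2 * atan((ipd / 2) / d).
    double VergenceAngleDegrees(double ipdMeters, double distanceMeters)
    {
        const double PI = 3.14159265358979323846;
        return 2.0 * std::atan((ipdMeters * 0.5) / distanceMeters) * 180.0 / PI;
    }

    int main()
    {
        const double ipd = 0.0635;         // average IPD in meters
        const double screenDistance = 1.3; // approximate optical distance of the screen
        for (double d : {0.5, 1.3, 3.5, 10.0})
            std::printf("fixate at %5.1f m: vergence %5.2f deg, accommodation fixed at %.1f m\n",
                        d, VergenceAngleDegrees(ipd, d), screenDistance);
        return 0;
    }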

Research has looked into the degree to which the accommodative and vergence demands can differ from each other before the situation becomes uncomfortable to the viewer.[1] The current optics of the Rift are equivalent to looking at a screen approximately 1.3 meters away. (Manufacturing tolerances and the power of the Rift’s lenses mean this number is only a rough approximation.) In order to prevent eyestrain, objects that you know the user will be fixating their eyes on for an extended period of time (e.g., a menu, an object of interest in the environment) should be rendered between approximately 0.75 and 3.5 meters away.

Obviously, a complete virtual environment requires rendering some objects outside this optimally comfortable range. As long as users are not required to fixate on those objects for extended periods, they are of little concern. When programming in Unity, 1 unit corresponds to approximately 1 meter in the real world, so objects of focus should be placed 0.75 to 3.5 units away.
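
One simple way to honor this guideline is to clamp the placement distance of anything the user must fixate on for a long time. A minimal sketch, with illustrative names and the comfort bounds given above:

    #include <algorithm>
    #include <cstdio>

    // Sketch: clamp the placement distance of objects of sustained focus
    // into the comfortable range. Units are meters, which correspond
    // roughly 1:1 to Unity units. Names are illustrative.
    const float COMFORT_NEAR_M = 0.75f;
    const float COMFORT_FAR_M  = 3.5f;

    float ClampFocusDistance(float desiredMeters)
    {
        return std::min(std::max(desiredMeters, COMFORT_NEAR_M), COMFORT_FAR_M);
    }

    int main()
    {
        std::printf("menu at %.2f m\n", ClampFocusDistance(0.4f)); // pushed out to 0.75
        std::printf("sign at %.2f m\n", ClampFocusDistance(8.0f)); // pulled in to 3.50
        return 0;
    }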

Our ongoing research and development might allow future incarnations of the Rift to improve their optics to widen the range of comfortable viewing distances. No matter how this range changes, however, 2.5 meters should be a comfortable distance, making it a safe, future-proof distance for fixed items on which users will have to focus for an extended time, like menus or GUIs.

Anecdotally, some Rift users have remarked on the unusualness of seeing all objects in the world in focus when the lenses of their eyes are accommodated to the depth plane of the virtual screen. This can potentially lead to frustration or eye strain in a minority of users, as their eyes may have difficulty focusing appropriately.

Some developers have found that depth-of-field effects can be both immersive and comfortable for situations in which you know where the user is looking. For example, you might artificially blur the background behind a menu the user brings up, or blur objects that fall outside the depth plane of an object being held up for examination. This not only simulates the natural functioning of vision in the real world, but also keeps salient objects outside the user’s focus from distracting the eyes.
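
As a rough illustration of the idea, the sketch below computes a blur weight from a fragment’s distance to a known focus plane. The linear ramp and the falloff parameter are illustrative choices under the assumption that the focus depth is known (e.g., a menu), not a prescribed implementation:

    #include <cmath>
    #include <cstdio>
    #include <initializer_list>

    // Sketch: a simple distance-based blur weight for a depth-of-field
    // effect focused on a known fixation target (e.g., a menu).
    float BlurWeight(float fragmentDepthM, float focusDepthM, float falloffM)
    {
        float t = std::fabs(fragmentDepthM - focusDepthM) / falloffM;
        return t < 1.0f ? t : 1.0f; // 0 = sharp at focus, 1 = fully blurred
    }

    int main()
    {
        const float focus = 2.5f, falloff = 1.5f; // menu at a comfortable 2.5 m
        for (float depth : {0.75f, 2.5f, 3.5f, 10.0f})
            std::printf("depth %5.2f m -> blur %.2f\n", depth, BlurWeight(depth, focus, falloff));
        return 0;
    }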

We have no control over a user who chooses to behave in an unreasonable, abnormal, or unforeseeable manner. Someone in VR might choose to stand with their eyes inches away from an object and stare at it all day. Although we know this can lead to eye strain, drastic measures to prevent this anomalous case, such as setting collision detection to prevent users from walking that close to objects, would only hurt overall user experience. Your responsibility as a developer, however, is to avoid requiring the user to put themselves into circumstances we know are sub-optimal.

Effects of Inter-Camera Distance

Changing inter-camera distance, the distance between the two rendering cameras, can impact users in important ways. If the inter-camera distance is increased, it creates an experience known as hyperstereo in which depth is exaggerated; if it is decreased, depth will flatten, a state known as hypostereo. Changing inter-camera distance has two further effects on the user. First, it changes the degree to which the eyes must converge to look at a given object. As you increase inter-camera distance, users have to converge their eyes more to look at the same object, and that can lead to eyestrain. Second, it can alter the user’s sense of their own size inside the virtual environment. The latter is discussed further in Content Creation under User and Environment Scale.

Set the inter-camera distance to the user’s actual IPD to achieve veridical scale and depth in the virtual environment. If you apply a scaling factor to the ICD, apply the same factor to the entire head model, so that head movements translate the cameras consistently with the user’s real-world perceptual experience, and scale our distance-related guidelines accordingly.
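
The sketch below illustrates the invariant with a hypothetical head-model structure (not an SDK type): whatever scale is applied to the ICD is applied to every other head-model offset as well.

    #include <cstdio>

    // Sketch: apply any world-scale factor to the ICD and the entire
    // head model together. The structure is illustrative only.
    struct HeadModel
    {
        float icd;        // inter-camera distance, meters
        float eyeHeight;  // eyes above the neck pivot, meters
        float eyeForward; // eyes forward of the neck pivot, meters
    };

    // Scaling only the ICD would exaggerate or flatten stereoscopic depth
    // while head movements still translated the cameras at 1:1, giving
    // the user conflicting cues about their own size.
    HeadModel Scaled(const HeadModel& m, float scale)
    {
        return { m.icd * scale, m.eyeHeight * scale, m.eyeForward * scale };
    }

    int main()
    {
        HeadModel giant = Scaled({ 0.0635f, 0.075f, 0.08f }, 10.0f);
        std::printf("ICD %.4f m, eye height %.3f m, eye forward %.3f m\n",
                    giant.icd, giant.eyeHeight, giant.eyeForward);
        return 0;
    }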

[Figure: The inter-camera distance (ICD) between the left and right scene cameras must be proportional to the user’s interpupillary distance (IPD). Any scaling factor applied to the ICD must also be applied to the entire head model and to the distance-related guidelines provided throughout this guide.]

Potential Issues with Fusing Two Images

We often face situations in the real world where each eye gets a very different viewpoint, and we generally have little problem with it. Peeking around a corner with one eye works in VR just as well as it does in real life. In fact, the eyes’ different viewpoints can be beneficial: say you’re a special agent (in real life or VR) trying to stay hidden in some tall grass. Your eyes’ different viewpoints allow you to look “through” the grass to monitor your surroundings as if the grass weren’t even there in front of you. Doing the same in a video game on a 2D screen, however, leaves the world behind each blade of grass obscured from view.

Still, VR (like any other stereoscopic imagery) can give rise to some potentially unusual situations that can be annoying to the user. For instance, rendering effects (such as light distortion, particle effects, or light bloom) should appear in both eyes and with correct disparity. Failing to do so can give the effects the appearance of flickering or shimmering (when something appears in only one eye) or floating at the wrong depth (if disparity is off, or if a post-processing effect is not rendered at the contextual depth of the object it should be affecting, for example, a specular shading pass). It is important to ensure that the images presented to the two eyes do not differ aside from the slightly different viewing positions inherent to binocular disparity.
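
One way to enforce this is to route every effect through the same per-eye render path, parameterized only by that eye’s camera. A schematic sketch with stubbed-out passes and illustrative names (not an SDK API):

    #include <cstdio>
    #include <initializer_list>

    // Sketch: run every effect once per eye, from that eye's camera, so
    // all effects appear in both eyes with correct disparity. Stub passes
    // stand in for a real renderer.
    struct EyeCamera { const char* name; /* plus per-eye view parameters */ };

    void RenderScene(const EyeCamera& eye)     { std::printf("scene     -> %s\n", eye.name); }
    void RenderParticles(const EyeCamera& eye) { std::printf("particles -> %s\n", eye.name); }
    void RenderBloom(const EyeCamera& eye)     { std::printf("bloom     -> %s\n", eye.name); }

    void RenderFrame(const EyeCamera& left, const EyeCamera& right)
    {
        for (const EyeCamera* eye : { &left, &right })
        {
            RenderScene(*eye);
            RenderParticles(*eye); // never rendered for only one eye
            RenderBloom(*eye);
        }
    }

    int main()
    {
        EyeCamera left{ "left eye" }, right{ "right eye" };
        RenderFrame(left, right);
        return 0;
    }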

Although less likely to be a problem in a complex 3D environment, it can be important to ensure the user’s eyes receive enough information for the brain to know how to fuse and interpret the image properly. The lines and edges that make up a 3D scene are generally sufficient; however, be wary of wide swaths of repeating patterns, which could cause people to fuse the eyes’ images differently than intended. Be aware also that optical illusions of depth (such as the “hollow mask illusion,” where concave surfaces appear convex) can sometimes lead to misperceptions, particularly in situations where monocular depth cues are sparse.

[1] Shibata, T., Kim, J., Hoffman, D. M., & Banks, M. S. (2011). The zone of comfort: Predicting visual discomfort with stereo displays. Journal of Vision, 11(8), 1-29.