We’ve compiled some of the best practices we’ve learned through researching, testing, and designing with hands. Most of our experiments have been for the system, but the learnings we gathered can be applied to any experience. Our hope is that this resource saves you time and effort as you learn about this new input modality, so you don’t have to start from scratch when building experiences.
The thumb-index pinch is our preferred method of selection. The contact between your thumb and index finger can compensate for the lack of tactile feedback that other input devices typically provide.
The pinch also works well because it’s simple to perform and easy to remember. Even more importantly, it’s not a hand pose that’s frequently used, which makes it less likely that the system will pick up unintended triggers.
Note: Pinches have three states: Open, closing, and closed. The “closing” state is a way to provide confirmation of a user’s intent to act, which reassures them that the sensors are picking up their motion. It also prevents unintended selections if their thumb and index finger accidentally approach each other.
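As a minimal sketch of the three pinch states, the distance between the thumb tip and index fingertip can be bucketed with two thresholds. The threshold values, the function names, and the choice of fingertip distance as the input are illustrative assumptions, not values taken from the system.

```python
from enum import Enum


class PinchState(Enum):
    OPEN = "open"
    CLOSING = "closing"
    CLOSED = "closed"


# Hypothetical thresholds in metres; tune them against real tracking data.
CLOSED_MAX_M = 0.015
CLOSING_MAX_M = 0.045


def classify_pinch(thumb_index_distance_m: float) -> PinchState:
    """Bucket the thumb-to-index fingertip distance into a pinch state."""
    if thumb_index_distance_m <= CLOSED_MAX_M:
        return PinchState.CLOSED
    if thumb_index_distance_m <= CLOSING_MAX_M:
        return PinchState.CLOSING
    return PinchState.OPEN


def is_selection(previous: PinchState, current: PinchState) -> bool:
    """Fire a selection only on the transition into CLOSED."""
    return current is PinchState.CLOSED and previous is not PinchState.CLOSED
```

In a sketch like this, the "closing" bucket is where hover or confirmation feedback would be shown, and a selection fires only once the state actually reaches "closed".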
As mentioned in our Principles section, hands don’t provide tactile feedback the way controllers and other input devices do. It’s important to compensate for this with both visual and audio feedback throughout user interactions.
We think of this in two ways: Signifiers communicate what a user can do with a given object. Feedback confirms the user’s state throughout the interaction. Visually, objects can change shape, color, opacity and even location in space (like a button moving toward an approaching finger to signify that it’s meant to be pushed). You can also play with volume, pitch and specific tones to provide audio feedback. These elements can work together to guide users seamlessly as they navigate the system.
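The button that moves toward an approaching finger can be sketched as a mapping from finger distance to a normalized proximity value, which then drives a visual offset and an audio volume. The distances, the maximum offset, and the function names below are hypothetical placeholders.

```python
def proximity(distance_m: float, near_m: float = 0.02, far_m: float = 0.15) -> float:
    """Return 0.0 at or beyond far_m, 1.0 at or within near_m, linear in between."""
    t = (far_m - distance_m) / (far_m - near_m)
    return max(0.0, min(1.0, t))


def button_offset_m(distance_m: float, max_offset_m: float = 0.01) -> float:
    """How far the button moves toward the finger (visual signifier)."""
    return max_offset_m * proximity(distance_m)


def hover_volume(distance_m: float, max_volume: float = 0.4) -> float:
    """Volume of a hover tone; capped low so the interface does not get noisy."""
    return max_volume * proximity(distance_m)
```

Driving both channels from one proximity value keeps the visual and audio cues in step with each other as the finger approaches.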
Each of the components that we experimented with for hand tracking was designed to provide continuous feedback throughout user interactions.
Computer mice and keyboards provide satisfying haptic feedback with clicks. Controllers provide it through vibration. Since hands don’t have any comparable haptic feedback, we found that it was important to design distinct sounds that confirm when a user interacts with a component.
Note: We also learned that it can be easy to over-correct and create a system that’s too noisy, so be mindful of finding the right balance.
Using the pinch as a selection method is a way of providing tactile feedback during an interaction. The contact between your thumb and index finger can be thought of as a satisfying proxy for the click of a button, or for feeling the object you’re grasping.
See our User Interface Components section for more about signifiers and components.
Raycasting is our preferred interaction method for far-field targeting.
We’ve experimented with tracking from the wrist, but we found that hands have some natural tremor that gets magnified over distance when raycasting. To solve this, we use a secondary position on the body to stabilize the raycast.
The optimal point of origin for this secondary position varies depending on whether you’re standing or sitting. For standing experiences a shoulder ray works well, because target objects would most likely be below your shoulders. When seated, target objects are likely at a height that would require raising the wrist uncomfortably high, so an origin point near the hip is a less fatiguing alternative.
However, for most experiences you won’t know whether a user is sitting or standing, and they may even move freely between the two. Our solution was to develop a raycasting model that blends between the shoulder and the hip based on the angle of your gaze.
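One way to picture the blend is a linear interpolation between the shoulder and hip positions, weighted by gaze pitch. This is only a sketch under stated assumptions: the pitch convention (positive means looking up), the two endpoint angles, which gaze direction maps to which origin, and casting the ray from the blended origin through the wrist are all illustrative choices that the text above does not specify.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def blend_weight(gaze_pitch_deg: float,
                 shoulder_pitch_deg: float = -20.0,
                 hip_pitch_deg: float = 20.0) -> float:
    """Return 0.0 for a pure shoulder origin and 1.0 for a pure hip origin.

    Assumed mapping: looking down favours the shoulder, looking up the hip.
    """
    t = (gaze_pitch_deg - shoulder_pitch_deg) / (hip_pitch_deg - shoulder_pitch_deg)
    return max(0.0, min(1.0, t))


def lerp(a: Vec3, b: Vec3, t: float) -> Vec3:
    """Linear interpolation between two points."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def ray_origin(shoulder: Vec3, hip: Vec3, gaze_pitch_deg: float) -> Vec3:
    """Blend the stabilizing origin between shoulder and hip."""
    return lerp(shoulder, hip, blend_weight(gaze_pitch_deg))


def ray_direction(origin: Vec3, wrist: Vec3) -> Vec3:
    """Unit vector from the body origin through the wrist (an assumed ray definition)."""
    d = tuple(w - o for w, o in zip(wrist, origin))
    length = math.sqrt(sum(c * c for c in d))
    if length == 0.0:
        raise ValueError("wrist coincides with ray origin")
    return tuple(c / length for c in d)
```

Because the origin sits on the body rather than on the hand, small wrist tremors change the ray direction less than they would if the ray were cast from the wrist alone.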
There are certain limitations to computer vision that are important to consider when designing your experiences.
The sensors have a limited tracking volume, which means objects outside of their horizontal and vertical field of view won’t be detected. To make sure a user’s relevant motions are being tracked, try to avoid forcing people to reach outside of the tracking volume. Keep in mind, however, that the hand-tracking volume is larger than the display field of view — so hands may be tracked even when the user can’t see them.
The more of your hands the headset’s sensors can see, the more stable tracking will be. Try to design interactions in a way that encourages users to keep their palms facing either towards or away from the headset. Tracking will diminish if the fingers are blocked by the back of the hand or curled up in the palm.
It’s also important to avoid the overlap of two hands due to current computer vision limitations. A good way around this is to design interactions that can be performed with just one hand, which has the added benefit of making your interactions more accessible.
When designing experiences, it’s important to make sure the user can remain in a neutral body position as much as possible. Ideally, users should be able to interact with the system while keeping the arm close to the body, and the elbow in line with the hip. This allows for a more comfortable ergonomic experience, while keeping the hand in an ideal position for the tracking sensors.
Interactions should be designed to minimize muscle strain, so try not to make people reach too far from their body too frequently. When organizing information in space, the features a user will interact with most often should be closer to the body. On the flip side, the less important something is, the farther from the body you can place it.
At the most basic level, hand representation needs to fulfill two functions:
Your first instinct might be to create a realistic representation of a human hand, but this can be an expensive and difficult endeavor. Realistic hands often feel uncanny at best, and at worst can make users feel disembodied. Instead, think about what works best for the experience you’re building.
If the visual representation of hands is an important part of your immersive experience, then it’s important to make sure the representation is either anonymous enough for anyone to feel embodied (like an octopus hand), or can be customized by users to suit their needs.
In contexts where your hands are primarily an input rather than a part of the experience, we’ve found that a functionally realistic approach is ideal. This means that the representation itself lets people know what they can do with their hands within a given experience, without requiring a level of detail that can be hard to produce and easy to get wrong. You can see the approach we took to system hands in our User Interface Components section.
Imagine if every time you moved your hand, you accidentally dragged a virtual lamp with it. Establishing gating logic helps avoid that scenario by filtering out common cases of unintended interactions. Here are some examples of gating logic that we’ve had success with.
When your hands are at rest, they’re often still in the tracking field of view. But as hands relax they tend to curl up, which can trigger unintended interaction. To solve this, we’ve experimented with an idle state.
Hands enter an idle state when they’ve been lowered to a specific distance from the head and haven’t moved for a specific amount of time. Then, for a hand to become active again, it must re-enter the tracking field of view and change pinch states. This action is deliberate enough to ensure the hand won’t become active without the user’s intent.
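The idle gate can be sketched as a small state machine. The thresholds, the `HandSample` fields, and the `IdleGate` class are hypothetical; the sketch only assumes that each frame supplies the hand's distance from the head, its speed, whether it is in the tracking field of view, and its pinch state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HandSample:
    time_s: float
    distance_from_head_m: float  # how far the hand has been lowered from the head
    speed_m_s: float
    in_tracking_view: bool
    pinch_state: str             # "open", "closing", or "closed"


class IdleGate:
    def __init__(self, min_distance_m: float = 0.5,
                 max_speed_m_s: float = 0.05,
                 idle_after_s: float = 2.0) -> None:
        # All three thresholds are placeholder values.
        self.min_distance_m = min_distance_m
        self.max_speed_m_s = max_speed_m_s
        self.idle_after_s = idle_after_s
        self.idle = False
        self._still_since: Optional[float] = None
        self._last_pinch: Optional[str] = None

    def update(self, s: HandSample) -> bool:
        """Feed one sample; return True while the hand is idle."""
        if self.idle:
            # Wake only when the hand is in view AND its pinch state changes.
            changed = self._last_pinch is not None and s.pinch_state != self._last_pinch
            if s.in_tracking_view and changed:
                self.idle = False
                self._still_since = None
        else:
            lowered_and_still = (s.distance_from_head_m >= self.min_distance_m
                                 and s.speed_m_s <= self.max_speed_m_s)
            if lowered_and_still:
                if self._still_since is None:
                    self._still_since = s.time_s
                if s.time_s - self._still_since >= self.idle_after_s:
                    self.idle = True
            else:
                self._still_since = None
        if s.in_tracking_view:
            self._last_pinch = s.pinch_state
        return self.idle
```

While `update` returns True, downstream interaction logic would ignore pinches and grabs from that hand.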
Unless a user clearly indicates that they want to interact with the system, having a constant cursor and pointer can be distracting. Establishing a pointing state can let the system know when the pointer and cursor are desired.
The hand enters the pointing state when it’s pointing away from the body and toward the panel at a very specific angle. This signals the user’s intent to scroll, browse or select items, which is an appropriate time for the cursor and raycast to appear.
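A pointing-state check can be sketched as two vector tests: the hand's forward direction must lie within some angle of the direction to the panel, and it must point away from the body. The 25-degree threshold, the argument names, and the use of a single hand-forward vector are assumptions made for illustration.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return tuple(x - y for x, y in zip(a, b))


def _dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def angle_deg(a: Vec3, b: Vec3) -> float:
    """Angle between two non-zero vectors, in degrees."""
    na = math.sqrt(_dot(a, a))
    nb = math.sqrt(_dot(b, b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("zero-length vector")
    cos = max(-1.0, min(1.0, _dot(a, b) / (na * nb)))
    return math.degrees(math.acos(cos))


def is_pointing(hand_forward: Vec3, hand_pos: Vec3, body_pos: Vec3,
                panel_pos: Vec3, max_angle_deg: float = 25.0) -> bool:
    """True when the hand points away from the body and toward the panel."""
    toward_panel = angle_deg(hand_forward, _sub(panel_pos, hand_pos)) <= max_angle_deg
    away_from_body = _dot(hand_forward, _sub(hand_pos, body_pos)) > 0.0
    return toward_panel and away_from_body
```

The cursor and raycast would be shown only while `is_pointing` holds, which keeps them out of the way when the user is not addressing the panel.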
Hands don’t have a Home button, so we needed to provide users with a way of summoning the system menu while in third-party apps and experiences. To prevent users from accidentally performing the gesture while in immersive experiences, we gated the gesture in two ways: holding the palm up while looking at it, then holding a pinch.
Raising the hand toward the face both puts it in a good tracking position for the sensors and is an unusual enough pose that it’s unlikely to be performed accidentally. The hand then glows to let the user know they’ve entered this state, so they can lower their hand if this was unintentional. Finally, the user has to pinch and hold to complete the gesture and summon the system menu.
Note: While most of our interactions are performed through analog control, this system gesture is a good case study for how to make an abstract gesture feel responsive, while keeping gates in mind.
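The two gates on the system gesture can be sketched as a small state machine: the first gate (palm up and gaze on the palm) arms the gesture and triggers the glow, and the second gate (a held pinch) completes it. The hold duration, the state names, and the boolean inputs are hypothetical; detecting palm orientation and gaze is assumed to happen elsewhere.

```python
from typing import Optional


class SystemGesture:
    def __init__(self, hold_s: float = 0.6) -> None:
        self.hold_s = hold_s  # placeholder hold duration in seconds
        self._pinch_started: Optional[float] = None

    def update(self, time_s: float, palm_faces_user: bool,
               user_looks_at_palm: bool, pinch_closed: bool) -> str:
        """Return "inactive", "armed" (show the glow), or "summon"."""
        # Gate 1: palm raised toward the face while the user looks at it.
        if not (palm_faces_user and user_looks_at_palm):
            self._pinch_started = None
            return "inactive"
        # Gate 2: the pinch must be held continuously for hold_s seconds.
        if not pinch_closed:
            self._pinch_started = None
            return "armed"
        if self._pinch_started is None:
            self._pinch_started = time_s
        if time_s - self._pinch_started >= self.hold_s:
            return "summon"
        return "armed"
```

Resetting the pinch timer whenever either gate fails means that lowering the hand or releasing the pinch cancels the gesture, which gives the user a way out if they entered the state unintentionally.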