Today, we introduced High Frequency Hand Tracking, a new tracking mode that allows for better gesture detection and lower latencies. Read more about this upgrade here and continue reading below to see how it was implemented in Tiny Castles.
Tiny Castles is an "action puzzle game" built from the ground up using hand tracking as the primary input. It was developed internally by the Facebook Reality Labs Oculus Strike Team to test the interactions necessary for a hands-only game experience.
In the game, you play as a powerful god. The game's objective is simple: free your "believers" from the evil god and his minions, then advance to destroy the obelisk that powers the evil in this world. The game was built to identify the types of game mechanics that could work well with hand tracking. For this reason, it lacks real challenge and difficulty balancing, focusing instead on showcasing a variety of hand interactions.
The game consists of 5 levels and demonstrates a variety of interactions built for hand-tracking. One of the levels is the "Playground" level, which is the best place to experience all of the game mechanics in an isolated setting while the actual game levels showcase these mechanics in real game scenarios.
The first thing we needed to determine was what we could do with the Hands API. Out of the box, the Hands API provided us with the information required to render a fully articulated representation of the user’s real life hands in VR without the use of controllers, including:
Hand position and orientation
Hand size (i.e., scale)
Finger position and orientation
Pinch (finger + thumb) strength information
Pointer pose for UI raycasts
System gesture for opening the universal or application-defined menu
With Unity and the Oculus Integration, we were able to quickly stand up a simple test scene with hand tracked hands using OVRCameraRig and OVRHandPrefab.
Since an API is not an application, there were additional systems we had to build.
Although Tiny Castles is intended to be a “hands first” application, we also wanted to provide an equivalent Touch controller experience. To facilitate this, we created an abstract BaseHand class to define our core hand functionality:
Pinch (index + thumb)
Grab (for near field interactions)
Force Grab (for far field interactions)
The derived TouchHand class (used when controllers are active) implemented the required BaseHand functionality by essentially mapping button presses to animation states.
The derived HandTrackingHand class (used when hands are active) utilized the Hands API to implement the required BaseHand functionality and exposed additional hand tracking specific functionality such as per finger pinch strength and hand pose information.
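The class split described above can be sketched in plain Python (the actual game code is C# in Unity, so this is only an illustration of the pattern); the 0.7 thresholds, finger names, and `has_far_target` flag are hypothetical stand-ins:

```python
from abc import ABC, abstractmethod

class BaseHand(ABC):
    """Core hand functionality shared by both input methods."""

    @abstractmethod
    def is_pinching(self) -> bool: ...

    @abstractmethod
    def is_grabbing(self) -> bool: ...

    @abstractmethod
    def is_force_grabbing(self) -> bool: ...

class TouchHand(BaseHand):
    """Controller-driven hand: maps button/trigger state to hand actions."""

    def __init__(self):
        self.trigger = 0.0          # index trigger, 0..1
        self.grip = 0.0             # grip button, 0..1
        self.has_far_target = False

    def is_pinching(self):
        return self.trigger > 0.7

    def is_grabbing(self):
        return self.grip > 0.7

    def is_force_grabbing(self):
        return self.is_grabbing() and self.has_far_target

class HandTrackingHand(BaseHand):
    """Tracked hand: derives the same actions from per-finger pinch strengths."""

    def __init__(self):
        self.pinch_strength = {"index": 0.0, "middle": 0.0,
                               "ring": 0.0, "pinky": 0.0}
        self.has_far_target = False

    def is_pinching(self):
        return self.pinch_strength["index"] > 0.7

    def is_grabbing(self):
        # "Pinching with all fingers": one of the grab definitions
        # the team experimented with.
        return all(s > 0.7 for s in self.pinch_strength.values())

    def is_force_grabbing(self):
        return self.is_grabbing() and self.has_far_target
```

Because gameplay code talks only to `BaseHand`, switching between controllers and tracked hands requires no changes on the gameplay side.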
Hand Pose Recognition
In Tiny Castles, we defined the hand pose as a bitmask of which fingers are currently visible based upon whether or not the fingertips are contained within a sphere in the palm of the hand.
This compact representation was the foundation of our hand input system and proved to be relatively forgiving, but at the cost of some fidelity. For example, our system is not able to differentiate between visible fingers that are close together or spread apart.
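As a rough illustration of this bitmask scheme, here is a Python sketch (not the game's actual Unity code); the palm sphere center, its radius, and the finger ordering are assumptions:

```python
from math import dist  # Python 3.8+

# One bit per finger; a set bit means the fingertip is outside the palm
# sphere, i.e. the finger reads as "visible" (extended).
THUMB, INDEX, MIDDLE, RING, PINKY = (1 << i for i in range(5))

def hand_pose_mask(palm_center, radius, fingertips):
    """Pack finger visibility into a 5-bit mask.

    fingertips: tip positions from thumb to pinky. The sphere's size and
    placement are tuning values (see the grab detection discussion below).
    """
    mask = 0
    for bit, tip in zip((THUMB, INDEX, MIDDLE, RING, PINKY), fingertips):
        if dist(palm_center, tip) > radius:
            mask |= bit
    return mask
```

An open hand yields `0b11111`, a fist yields `0`, and a pointing hand yields just the `INDEX` bit; as noted above, the mask cannot distinguish spread fingers from fingers held together.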
Hand Gesture Recognition
In addition to simple hand pose recognition, we needed to build a system that would combine hand poses and movements to allow for more complex behaviors. Enter the gesture behavior tree, which is a graph of nodes that describe the set of conditions required to complete the gesture.
This data-driven system has proven to be very valuable and has allowed anyone on the team to experiment with simple to complex gestures with little to no code support.
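One minimal way to model such a node graph, sketched in Python under assumed field names (the article does not describe the actual data layout), is a sequence of conditions with per-step time limits:

```python
class GestureNode:
    """One step of a gesture: a condition on the hand state, plus a time
    limit for reaching the next step (field names are hypothetical)."""

    def __init__(self, condition, max_seconds_to_next=1.0):
        self.condition = condition
        self.max_seconds_to_next = max_seconds_to_next

class GestureGraph:
    """Advances through the nodes as their conditions are met, and resets
    to the start if the next step takes too long."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.index = 0          # next node to satisfy
        self.entered_at = None  # time the previous node was satisfied

    def update(self, hand_state, now):
        # Too slow to reach the next step: start the gesture over.
        if self.index > 0 and now - self.entered_at > self.nodes[self.index - 1].max_seconds_to_next:
            self.index, self.entered_at = 0, None
        # Check the current node against this frame's hand state.
        if self.index < len(self.nodes) and self.nodes[self.index].condition(hand_state):
            self.index += 1
            self.entered_at = now
        return self.index == len(self.nodes)  # True when the gesture completes

# Example: "make a fist, then quickly pull your hand toward you"
fist_then_pull = GestureGraph([
    GestureNode(lambda s: s["pose"] == "fist"),
    GestureNode(lambda s: s["velocity_toward_user"] > 1.0),
])
```

Because the nodes are data, designers can author new gestures by composing conditions rather than writing gameplay code.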
In order to support Tiny Castles gameplay, we needed to incorporate additional systems such as hand physics, grabbables, player abilities, NPC AI, etc. that are commonly found in commercial VR games.
What users find intuitive to do with their hands is very personal and subjective. What works well for one user does not necessarily work well for everyone, so testing with a large and diverse group of users is essential.
Determining what “feels good” requires a LOT of experimentation and iteration!
Using high-level Hand API classes such as OVRHand and OVRSkeleton allows developers to start prototyping quickly. However, using the lower level OVRPlugin interface gives developers more control.
Supporting controllers allows for easy A/B comparisons.
Use the same rig as the Hands API when building custom hand models to avoid needing to retarget incoming data.
Pinch is an easy mechanic to experiment with initially because support is included within the Hands API.
We needed to add the concept of pinch duration to determine how long a finger (or set of fingers) has been pinching. This is useful for input purposes since pinching is not quite the same as pressing a controller button.
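Pinch duration only requires a little extra state per finger; this Python sketch assumes per-finger pinch strengths in 0..1 and a hypothetical 0.7 threshold:

```python
class PinchDurationTracker:
    """Tracks how long each finger+thumb pinch has been held."""

    PINCH_THRESHOLD = 0.7  # assumed tuning value

    def __init__(self):
        self.started_at = {}  # finger -> time the current pinch began

    def update(self, pinch_strengths, now):
        for finger, strength in pinch_strengths.items():
            if strength >= self.PINCH_THRESHOLD:
                # Record the start time only once per continuous pinch.
                self.started_at.setdefault(finger, now)
            else:
                self.started_at.pop(finger, None)

    def duration(self, finger, now):
        """Seconds the finger has been pinching, or 0 if it is not."""
        start = self.started_at.get(finger)
        return now - start if start is not None else 0.0
```

This lets input code distinguish a quick pinch "tap" from a sustained pinch "hold", much like distinguishing a button click from a button hold on a controller.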
Since the concept of grabbing is not included in the Hands API, we had to do much more experimentation in this area compared to pinch.
We first needed to define what it means to “grab”. This resulted in many questions that needed to be answered, including:
At what point between an open hand and closed fist are you grabbing?
Can you grab with two fingers? Three? Four?
Once you’ve grabbed something, how do you know when to release it?
Can you grab and pinch at the same time?
We experimented with several different grab detection methods:
All-Finger Pinch
Since pinch data is readily available, we were able to define grabbing as “pinching with all fingers.”
Detection worked decently, but it required users to make a non-intuitive pose to trigger.
Physics Capsules
This method was relatively easy to implement using the physics capsule support in OVRSkeleton.
While neat at first (because physics), grabbing was difficult to do consistently in practice. The user had to be very deliberate in their actions, and because hand tracking provides no tactile feedback, we ran into common physics issues; for example, interpenetrating an object too much could cause it to launch away unexpectedly.
Palm Sphere
This method uses a sphere placed in the palm of the hand to determine whether or not the fingertips are contained within the sphere (and laid the foundation for our hand pose recognition system).
Detection generally worked well. The main challenge was in defining the appropriate size and location of the sphere. Making the sphere too large resulted in “false positive” grabs. Making the sphere too small prevented grabs from being detected at all. Placement of the sphere was also important because users do not necessarily close their hand the same way.
Finger Curl Angle
This method involves measuring the angles of the fingers relative to the bind pose. The greater the angle, the more “curled in” the finger is.
Detection worked well for the index, middle, ring, and pinky fingers. However, detection of the thumb angle was somewhat problematic depending on how the user made a fist.
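The curl measurement can be sketched as the angle between a finger's bind-pose direction and its current direction; this is an illustrative Python version, and the 90-degree grab threshold is an assumed tuning value:

```python
from math import acos, degrees, sqrt

def _angle_between(a, b):
    """Angle in degrees between two 3D direction vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b))
    # Clamp to guard against floating point drift outside [-1, 1].
    return degrees(acos(max(-1.0, min(1.0, dot / norm))))

def finger_curl(bind_dir, current_dir):
    """Curl = how far the finger has rotated away from the open-hand
    bind pose; a larger angle means more curled in."""
    return _angle_between(bind_dir, current_dir)

def is_finger_curled(bind_dir, current_dir, threshold_deg=90.0):
    # The threshold is hypothetical and would need per-finger tuning,
    # especially for the thumb.
    return finger_curl(bind_dir, current_dir) >= threshold_deg
```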
Finger Collision
This method used collision information to determine when fingers were in contact with an object. Once enough fingers were in contact with an object, the object would be grabbed.
We found it difficult to find a balance between grab and release detection that worked well across a range of objects and users.
We found that predefined and/or procedural hand poses when holding an object not only looked better, but actually felt more immersive even though the user’s virtual hand did not exactly match what their real life hand was doing. This also laid the groundwork for our crank and lever mechanics.
We found precise selection of distant objects to be non-trivial. The force grab target selection system in Tiny Castles uses a cone to determine potential targets and selects the target that is closest to the center of the cone. We also incorporated aim assist and target retention techniques.
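The cone test reduces to comparing each candidate's angular offset from the aim direction; this Python sketch assumes a hypothetical 15-degree cone half-angle:

```python
from math import acos, degrees, sqrt

def _normalize(v):
    m = sqrt(sum(x * x for x in v))
    return tuple(x / m for x in v)

def select_force_grab_target(origin, aim_dir, targets, cone_half_angle=15.0):
    """Return the target closest to the cone's center line, or None if no
    target falls inside the cone. The half-angle is a tuning assumption."""
    aim = _normalize(aim_dir)
    best, best_angle = None, cone_half_angle
    for target in targets:
        to_target = _normalize(tuple(b - a for a, b in zip(origin, target)))
        dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(aim, to_target))))
        angle = degrees(acos(dot))
        if angle <= best_angle:       # inside the cone and closest so far
            best, best_angle = target, angle
    return best
```

The aim assist and target retention mentioned above can be layered on top, for example by keeping the current target unless a new candidate is closer to the cone center by some margin (hysteresis), so the selection does not flicker between nearby objects.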
When aiming at a target, we found that using more of an “Iron Man” pose (where the user would point more with their palm) tracked more reliably than a “Star Wars” pose (where the user would have their fingers reach out toward the target) due to self-occlusion issues.
As a best practice, we recommend pose detection checks take place over a short period of time to reduce the occurrences of false positives and also momentary data inconsistencies.
As a best practice, we recommend that gestures be relatively short combinations of distinct actions. Also, try to avoid gestures that overlap common behavior to prevent false positives. For example, a gesture that consists of only making a fist will be inadvertently triggered constantly. However, a gesture that consists of making a fist then pulling your hand quickly towards you would probably not.
Tiny Castles uses a custom “gauntlet” model to represent the player’s hands. The initial design was to build all the UI into the gauntlet mesh, where the bracer portion would have 3-5 weapon mods that would transform the player’s hand into different weapons or tools.
We found that selection through the built-in gauntlet UI was too complex and cumbersome. To streamline this, we switched to focusing on one elemental power per scene, each activated with a specific hand gesture.
We found the gauntlet increased player immersion and allowed us to avoid scaling the hands (which would have complicated hand physics).
Adding inverse kinematics to the bracer on the player’s virtual forearm added an additional layer of immersion.
An ongoing challenge in hand tracking applications will be locomotion. Tiny Castles uses fixed warp points in the world that the player can pull themselves to with a simple grab+pull gesture. It feels intuitive and natural, and it works reliably.
However, this method may not work for all games. We have done some locomotion tests using a virtual controller and a two-handed gesture approach, each with varying results. Without thumbsticks and buttons, though, coming up with a free-roam locomotion standard for games could prove challenging. The Playground level in Tiny Castles contains examples of some locomotion techniques.
In-world interactions such as cranks and levers can be reliable, but placement in the world is key.
Games that let the player control the pace, such as a Myst-style game or a turn-based game, would probably work well with today's hand tracking technology.
Simple gestures, like a grab+pull, are useful for activating game events.
Continuing to hold an object in your hand even when there are tracking issues avoids a lot of player frustration.
The Playground level ended up being an excellent sandbox to experiment with and showcase various mechanics.
Interacting with UI and pressing buttons with hand tracking can be frustrating because of the lack of haptics, tracking jitter, self-occlusion, or all of the above. What we implemented, and feel works best, is to give the UI backing colliders so that your virtual fingers actually collide with the UI. These “visual haptics”, along with sounds for UI interactions, help fill the void left by the absence of physical haptics. Additionally, to avoid false button presses, we implemented a progress bar “fill” that provides feedback and prevents inadvertent user actions.
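The progress-bar "fill" approach can be sketched as a dwell timer; this Python illustration assumes a hypothetical half-second fill time:

```python
class DwellButton:
    """A button that only fires after sustained finger contact, exposing a
    0..1 fill value to drive a progress-bar visual."""

    def __init__(self, fill_seconds=0.5):  # assumed tuning value
        self.fill_seconds = fill_seconds
        self.fill = 0.0    # 0..1, rendered as the progress bar
        self.fired = False

    def update(self, finger_touching, dt):
        if finger_touching:
            self.fill = min(1.0, self.fill + dt / self.fill_seconds)
        else:
            self.fill = 0.0   # brushing past the button resets progress
            self.fired = False
        if self.fill >= 1.0 and not self.fired:
            self.fired = True
            return True       # fire the press exactly once per dwell
        return False
```

A finger that merely grazes the button never fills the bar, so jitter and accidental contact do not trigger presses.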
Adding audible feedback to all core hand interactions made the experience feel more immersive since there is generally no haptic feedback when using hands. In particular, we found that adding sounds to the grab and release mechanics helped reinforce to users when they were successful. "Hover over" sounds were also added to the force grab, force pull, and warp locomotion mechanics to let users know when a new target has been selected. These interaction sounds sit slightly louder in the audio mix to help increase user confidence.
During the development of Tiny Castles, we had to deal with several challenges that all hand tracking developers will encounter at some point.
Hand tracking data currently has more noticeable latency relative to controllers. This can complicate situations where response time is critical. Using hand physics can also increase the perceived latency, but this was necessary for Tiny Castles gameplay purposes.
There is noticeable jitter when keeping your hands relatively still.
Camera orientation also contributes to the perception of jitter. For example, if the user keeps their hands still and looks around, they will notice their hands slightly moving.
To reduce the perception of jitter in Tiny Castles, we added custom smoothing of the hand when it is not moving quickly. This helped quite a bit, but at the cost of introducing slightly more perceived latency in some situations.
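One way to implement speed-dependent smoothing of this kind is to blend toward the raw tracked position with a blend factor driven by hand speed; the thresholds and blend weights below are hypothetical tuning values, not the game's actual numbers:

```python
def smooth_hand_position(prev, raw, speed, slow_speed=0.05, fast_speed=0.5):
    """Heavy smoothing when the hand is nearly still (suppresses jitter),
    no smoothing when it moves fast (avoids added lag). Speeds in m/s."""
    if speed <= slow_speed:
        alpha = 0.1   # mostly keep the previous pose
    elif speed >= fast_speed:
        alpha = 1.0   # trust the raw tracking data completely
    else:             # linear ramp between the two regimes
        alpha = 0.1 + 0.9 * (speed - slow_speed) / (fast_speed - slow_speed)
    return tuple(p + alpha * (r - p) for p, r in zip(prev, raw))
```

This mirrors the trade-off described above: jitter is strongly damped at rest, at the cost of a little extra perceived latency when the hand starts moving.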
Low Confidence Tracking Data
There will be times when the data returned by the Hands API is low confidence. How this situation is handled is very important. Using the data as-is can result in erratic hand behavior.
In Tiny Castles, when we detect low confidence data we keep the hands in the last known good pose and swap the material so the hands are obviously red. The hands remain in this state until high confidence data is received. We found this method helped users quickly identify there was an issue and trained them to self correct as needed.
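The hold-last-good-pose behavior amounts to a small filter between the tracking data and the renderer; this Python sketch uses assumed names (the real implementation swaps a Unity material):

```python
class HandConfidenceFilter:
    """Freezes the hand in the last known-good pose while tracking
    confidence is low, and flags the state for the renderer."""

    def __init__(self):
        self.last_good_pose = None
        self.low_confidence = False

    def update(self, pose, high_confidence):
        if high_confidence:
            self.last_good_pose = pose
            self.low_confidence = False
            return pose
        self.low_confidence = True   # renderer tints the hand red
        return self.last_good_pose   # hold the last good pose
```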
Occlusion occurs when portions of the fingers and/or hand are not visible to the cameras. This can cause unexpected results when the virtual representation of the user’s hands significantly diverges from what they are doing in real life.
Unfortunately, occlusion can occur in many situations that first time hand tracking users find “intuitive” and “natural”:
Users often naturally tend to point in a way that results in self-occlusion, where the user’s arm or the back of their hand will hide their fingers. For example, when pointing to give directions.
Just like pointing, users tend to extend their hand and arm in a way that results in self-occlusion. For example, when reaching out to pick up something on the ground.
Placement of in-world buttons and UI menus should take self-occlusion into account.
Occlusion issues affected design decisions in Tiny Castles the most.
Hand tracking on Quest and Quest 2 hardware works best when hands are well within the cameras’ FOV. Unfortunately this does not cover all the areas users tend to place their hands, such as when hands are at their sides or when performing an overhand throw.
High Frequency Hand Tracking
In Tiny Castles, we observed that high frequency hand tracking results in a slight reduction in perceived latency but also significantly improved tracking quality during fast hand movements. There were no perceived changes regarding jitter and occlusion.
Developers should note that enabling high frequency hand tracking on Quest 2 reduces the maximum CPU and GPU levels available to the application.
As hand tracking capabilities continue to improve, we’re looking forward to seeing the creative ways that they can be used to create an even more intuitive and immersive VR experience. Please let us know your thoughts and hand tracking experiences in the comments or developer forum.