Hand Pose Detection
A hand pose is defined by shapes and transforms. Shapes are boolean conditions about the required position of the hand’s finger joints. Transforms are boolean conditions about the required orientation of the hand in world space. A pose is detected when the tracked hand matches that pose’s required shapes and transforms.
This topic explains poses, shape and transform recognition, and the criteria used to determine if a pose is detected. It also describes debug components you can use to visually debug poses. To make your own custom hand pose, see Build a Custom Hand Pose.
Interaction SDK includes six ready-to-use example pose prefabs. You can define your own poses using the patterns defined in these prefabs. You can experiment with pose detection using these pose prefabs in the PoseExamples sample scene.
- RockPose
- PaperPose
- ScissorsPose
- ThumbsUpPose
- ThumbsDownPose
- StopPose
Each pose prefab has these components:
- A HandRef that takes an IHand. The other components on this prefab read hand state via this reference.
- One or more ShapeRecognizerActiveState components that become active when the criteria of a specified shape are met.
- A TransformRecognizerActiveState that becomes active when a transform feature (such as a particular wrist orientation) is detected.
- An ActiveStateGroup that returns true when all dependent ActiveStates are true.
- An ActiveStateSelector and SelectorUnityEventWrapper, which can invoke events when a pose is detected.
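As a rough illustration of how these pieces compose, the sketch below models the AND logic in Python. The real components are C# MonoBehaviours; ActiveState and ActiveStateGroup here are simplified stand-ins for the SDK’s types, not its actual API.

```python
# Simplified stand-ins, not the SDK's C# types: a pose is "detected" when
# every dependent active state -- each shape plus the transform -- is active.

class ActiveState:
    """Minimal stand-in for IActiveState: anything exposing an .active flag."""
    def __init__(self, active=False):
        self.active = active

class ActiveStateGroup:
    """Active only when all dependent states are active (logical AND)."""
    def __init__(self, states):
        self.states = states

    @property
    def active(self):
        return all(s.active for s in self.states)

# Hypothetical pose: one shape condition plus one transform condition.
rock_shape = ActiveState(active=True)   # stands in for ShapeRecognizerActiveState
wrist_up = ActiveState(active=True)     # stands in for TransformRecognizerActiveState
pose = ActiveStateGroup([rock_shape, wrist_up])
print(pose.active)  # True -- both conditions hold, so the pose is detected
```

If either dependent state goes inactive, the group immediately reads as inactive, which mirrors the all-or-nothing detection rule described above.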
Poses include one or more shapes. A shape is a set of boolean conditions about the position of one or more fingers. The conditions are defined using Finger Feature states. If a tracked hand meets these conditions, the shape becomes active. If all the shapes in the pose are active and the transform is also active, the pose is detected. A pose’s shapes are stored as ShapeRecognizer assets in the pose’s ShapeRecognizerActiveState component.
ShapeRecognizerActiveState
ShapeRecognizerActiveState checks the shapes that make up a pose against the state of every finger. If they all match, the ShapeRecognizerActiveState becomes active.
ShapeRecognizer
ShapeRecognizer is a ScriptableObject that defines a shape using a set of rules called Feature Configs. Feature Configs specify a required position (state) for each of the five fingers, defined using at least one of the finger features: curl, flexion, abduction, and opposition. ShapeRecognizer is referenced by the ShapeRecognizerActiveState component to determine if a pose is active.
FingerFeatureStateProvider and FingerFeatureStateThresholds
FingerFeatureStateThresholds is a ScriptableObject that defines the state thresholds for each finger feature. A state threshold is a set of boundaries that determine when a finger has transitioned between states. For example, the curl feature has three states: open, neutral, and closed. The state thresholds for curl use an angle in degrees to define when the finger’s state has changed from open to neutral, neutral to closed, or vice versa.
Interaction SDK provides four sets of default state thresholds, which are under DefaultSettings/PoseDetection:
- DefaultThumbFeatureStateThresholds (for the thumb)
- IndexFingerFeatureStateThresholds (for the index finger)
- MiddleFingerFeatureStateThresholds (for the middle finger)
- DefaultFingerFeatureStateThresholds (for the ring and pinky fingers)
The thumb’s curl state threshold. For curl, the value is an angle in degrees.
FingerFeatureStateThresholds Example
Given the transition between two states, A <> B:
If the current state is “A”, to transition up to “B” then the angle must rise above the midpoint for that pairing by at least (width / 2.0) for “Min Time In State” seconds.
If the current state is “B”, to transition down to “A” then the angle must drop below the midpoint for that pairing by at least (width / 2.0) for “Min Time In State” seconds.
So for Curl, to transition:
- From Open > Neutral: value must be above 195 for 0.0222 seconds
- From Neutral > Open: value must be below 185 for 0.0222 seconds
- From Neutral > Closed: value must be above 210 for 0.0222 seconds
- From Closed > Neutral: value must be below 200 for 0.0222 seconds
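The transition rule above can be sketched as a small state machine. This is a conceptual Python illustration of the midpoint/width hysteresis with a minimum time in state, not the SDK’s C# implementation; the numbers are the curl values from the example (midpoints 190 and 205, width 10, 0.0222 seconds).

```python
class ThresholdStateMachine:
    """Hysteresis over ordered states: a transition fires only after the value
    stays past (midpoint +/- width / 2.0) for min_time_in_state seconds."""

    def __init__(self, states, midpoints, width, min_time_in_state):
        self.states = states          # e.g. ["open", "neutral", "closed"]
        self.midpoints = midpoints    # one midpoint per adjacent state pair
        self.width = width
        self.min_time = min_time_in_state
        self.index = 0                # start in the first state
        self.pending = None           # (target_index, seconds_accumulated)

    def update(self, value, dt):
        target = self.index
        # Transition "up" when above midpoint + width/2 of the next pair.
        if self.index < len(self.states) - 1 and \
                value > self.midpoints[self.index] + self.width / 2.0:
            target = self.index + 1
        # Transition "down" when below midpoint - width/2 of the previous pair.
        elif self.index > 0 and \
                value < self.midpoints[self.index - 1] - self.width / 2.0:
            target = self.index - 1

        if target == self.index:
            self.pending = None       # back inside the band: reset the timer
        else:
            held = self.pending[1] + dt if (
                self.pending and self.pending[0] == target) else dt
            if held >= self.min_time:
                self.index, self.pending = target, None
            else:
                self.pending = (target, held)
        return self.states[self.index]

# Curl from the example: open <-> neutral around 190, neutral <-> closed around 205.
curl = ThresholdStateMachine(["open", "neutral", "closed"], [190, 205], 10, 0.0222)
print(curl.update(196, 0.01))  # open -- above 195, but not yet for 0.0222 s
print(curl.update(196, 0.02))  # neutral -- held above 195 long enough
```

The two-sided band (195 up, 185 down) prevents the state from flickering when the measured angle hovers near a single boundary.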
Finger Features are specific finger positions that let you define a shape. There are four features:
Curl
Represents how bent the top two joints of the finger or thumb are. This feature doesn’t take the Proximal (knuckle) joint into consideration.
States:
- Open: Fingers are fully extended straight.
- Neutral: Fingers are slightly curled inwards, as if they were wrapped around a coffee mug.
- Closed (pictured): Fingers are tightly curled inwards such that the tips are almost touching the palm.
The joints used to measure the curl feature.
Flexion
The extent to which the Proximal (knuckle) joint is bent relative to the palm.
States:
- Open: The first bone on the fingers is fully extended and is parallel to the palm.
- Neutral: Somewhat bent.
- Closed: Knuckle joint is fully bent (pictured).
Warning
Flexion is only reliable on the four fingers. It can provide false positives on the thumb.
An example of flexion where the knuckle joint is in the closed state.
Abduction
Abduction is the angle between two adjacent fingers, measured at the base of those two fingers: the angle between the given finger and the adjacent finger that’s closer to the pinky. For example, abduction for the index finger is the angle between the index and middle fingers.
States:
- Open: The two fingers are spread apart (pictured for index).
- Closed: The two fingers are tightly compressed together (pictured for thumb, middle, ring).
- None: Not currently used.
Note: Abduction on the pinky is not supported.
An example of abduction. The index finger is in the open state. The thumb, middle, and ring fingers are in the closed state.
Opposition
How close a given fingertip is to the thumb tip. Can only be used on the index, middle, ring, and pinky fingers.
States:
- Touching: The fingertip joints are within ~1.5cm (pictured for index).
- Near: The fingertip joints are between ~1.5cm and ~15cm apart.
- None: The fingertip joints are greater than ~15cm apart.
An example of opposition. The index finger is in the touching state.
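As a conceptual sketch, the quantization of fingertip distance into opposition states could look like the following. The function is hypothetical, not SDK API, and uses the approximate boundaries given above.

```python
def opposition_state(distance_m):
    """Quantize fingertip-to-thumb-tip distance (metres) into an opposition
    state using the approximate ~1.5 cm and ~15 cm boundaries described
    above. The real SDK also applies time-based hysteresis via its
    threshold assets."""
    if distance_m <= 0.015:
        return "Touching"
    if distance_m <= 0.15:
        return "Near"
    return "None"

print(opposition_state(0.01))   # Touching
print(opposition_state(0.05))   # Near
print(opposition_state(0.30))   # None
```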
Poses include one or more transforms. The transform of the hand represents only its orientation and position. The orientation is evaluated only against the WristUp, WristDown, PalmDown, PalmUp, PalmTowardsFace, PalmsAwayFromFace, FingersUp, FingersDown, and PinchClear transforms.
A pose’s required transforms are listed in the pose’s TransformRecognizerActiveState component. During hand tracking, the hand’s transforms are compared to the pose’s transforms. If both sets of transforms match and all the shapes in the pose are active, then the pose is detected.
The axes that define the hand’s fingers, wrist, and palm.
TransformRecognizerActiveState
TransformRecognizerActiveState takes a hand, the current state of the hand’s transforms, a list of transform feature configs, and a transform config. To get the current state of the hand’s transforms, it uses the GetHandAspect method to retrieve the TransformFeatureStateProvider component. That component reads the raw feature values and quantizes them into TransformFeatureStates using the TransformConfig you provide in this component.
Note: Once you register a specific configuration via the RegisterConfig method, you can then query the state of each feature tracked for that configuration. It leverages FeatureStateProvider to drive state-changing logic.
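A rough sketch of that register-then-query flow, using hypothetical Python names (the SDK’s actual API is C# and differs in detail):

```python
class FeatureStateProviderSketch:
    """Hypothetical stand-in: registers a config of per-feature thresholds,
    then quantizes raw feature values into named states on demand."""

    def __init__(self):
        self.configs = {}

    def register_config(self, config_id, thresholds):
        # thresholds: feature -> ordered list of (upper_bound, state_name)
        self.configs[config_id] = thresholds

    def get_state(self, config_id, feature, raw_value):
        for upper_bound, state_name in self.configs[config_id][feature]:
            if raw_value <= upper_bound:
                return state_name
        return "None"

provider = FeatureStateProviderSketch()
# Hypothetical thresholds: a wrist angle under 30 degrees counts as active.
provider.register_config("pose_config", {"WristUp": [(30.0, "Active")]})
print(provider.get_state("pose_config", "WristUp", 12.0))  # Active
print(provider.get_state("pose_config", "WristUp", 85.0))  # None
```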
Velocity recognition components detect motion, whereas shape recognition only detects static poses. For example, shape recognition can detect a hand in a thumbs-up pose, but cannot determine if the hand is moving upward while in that pose.
There are two velocity recognition components. Both components get joint data from the JointDeltaProvider component.
Sequence
Sequence takes a list of ActivationSteps and iterates through them as they become active. Each ActivationStep consists of an IActiveState, a minimum active time, and a maximum step time. These steps function as follows:
- The IActiveState must be active for at least the minimum active time before the Sequence proceeds to the next step.
- If an IActiveState is active for longer than the maximum step time, the step will fail and the Sequence will restart.
Once the final ActivationStep in the Sequence has completed, the Sequence becomes active. If an optional KeepActiveWhile IActiveState has been provided, the Sequence will remain active as long as KeepActiveWhile is active.
The last phase of a Sequence is the optional cooldown phase, whose duration can be set in the RemainActiveCooldown field. A Sequence that is deactivating will wait for this cooldown timer to elapse before finally becoming inactive.
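The step logic described above can be sketched as follows. This is conceptual Python, simplified from the SDK’s C# Sequence; KeepActiveWhile and the cooldown phase are omitted.

```python
class SequenceSketch:
    """Advances through steps of (is_active_fn, min_active_time, max_step_time).
    A step is satisfied once its state has been active for min_active_time;
    staying active past max_step_time fails the step and restarts the
    sequence, per the rules described above."""

    def __init__(self, steps):
        self.steps = steps
        self.reset()

    def reset(self):
        self.current = 0
        self.time_active = 0.0
        self.active = False

    def update(self, dt):
        if self.active:
            return True
        is_active_fn, min_active, max_step = self.steps[self.current]
        if is_active_fn():
            self.time_active += dt
            if self.time_active > max_step:
                self.reset()                  # held too long: restart
            elif self.time_active >= min_active:
                self.current += 1             # step satisfied: advance
                self.time_active = 0.0
                if self.current == len(self.steps):
                    self.active = True        # final step done
        else:
            self.time_active = 0.0            # state dropped: timer resets
        return self.active

seq = SequenceSketch([(lambda: True, 0.1, 1.0), (lambda: True, 0.1, 1.0)])
print(seq.update(0.1))  # False -- first step satisfied, second not started
print(seq.update(0.1))  # True -- second step satisfied, Sequence is active
```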
Internal Implementation Classes
The classes described here implement the underlying functionality, but you don't need to modify them to use pose recognition.
FeatureStateProvider
FeatureStateProvider is a generic helper class that keeps track of the current state of features and uses thresholds to determine state changes. The class buffers state changes to ensure that features do not switch rapidly between states.
FeatureStateThresholdsEditor
A generic editor class that defines the look and feel of dependent editor classes such as FingerFeatureStateThresholdsEditor and TransformStatesThresholdsEditor.
FeatureDescriptionAttribute
Lets you define editor-visible descriptions, hints, and values to aid users in setting thresholds for features.
IFeatureStateThreshold
IFeatureStateThreshold is a generic interface that defines the functionality of all thresholds used in hand pose detection.
IFeatureThresholds
IFeatureThresholds provides an interface to a collection of IFeatureStateThresholds as well as MinTimeInState (the minimum time a feature must remain in a certain state before transitioning to another state).
ColliderContainsHandJointActiveState
An IActiveState that tests whether a hand joint is inside a collider. If a SphereCollider is specified, its radius is used for the check; otherwise the script relies on the collider’s bounds. This class is useful when you want to check whether a hand joint is within a certain volume while a pose is active.
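The two containment paths could be sketched like this (conceptual Python with hypothetical helpers, not the SDK’s Unity code):

```python
import math

def joint_in_sphere(joint, center, radius):
    """SphereCollider path: use the sphere's radius for the check."""
    return math.dist(joint, center) <= radius

def joint_in_bounds(joint, bounds_min, bounds_max):
    """Fallback path: axis-aligned bounds containment."""
    return all(lo <= v <= hi
               for lo, v, hi in zip(bounds_min, joint, bounds_max))

print(joint_in_sphere((0.1, 0.0, 0.0), (0.0, 0.0, 0.0), 0.2))  # True
```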
Can be attached to an object that needs to be anchored from the center eye. Position and rotation offsets can be specified, along with options to toggle the roll, pitch, and yaw of the rotation offset. You can combine this with a ColliderContainsHandJointActiveState to position a collider relative to the center eye.