Since September, I’ve worked hard with the software team at Oculus VR on the SDK for the Rift. Tracking head orientation with as little latency and error as possible was a key challenge to making it work well. From a math and engineering perspective, it is an old problem. People have wanted to track the orientation of sea vessels and land vehicles for millennia. Over the past century, clever sensing systems have been developed to track aircraft, spaceships, missiles, robots, VR headsets, and smartphones. I’ve spent many years as a robotics professor, thinking hard about localizing robots and a host of other interesting problems that mix sensors, motors, and computers. I jumped at the chance to make a tremendous impact through available source code and a developer community that is supercharged about VR gaming. I am thrilled to be at Oculus!
In the rest of this post, I will explain how our tracking method works, what challenges we faced, and why various design decisions were made. One overriding theme throughout our development has been to keep the method simple so that it is easier to understand its behavior, to optimize its performance, and to make future enhancements. We could approach the problem using standard sledgehammers, such as the Kalman filter [4] or particle filters [1], but these require significant modeling assumptions and adjustment to reach their theoretical benefits. For example, the Kalman filter is the optimal estimator for linear systems with linear measurements and Gaussian noise, but its performance outside of that range is not guaranteed. Furthermore, the method is better suited to systems with a lower sampling rate and a high degree of predictability due to a stronger motion model. Particle filters are more suited for problems in which the world state is enormous, which might include, for example, models of the surrounding obstacles (think about robots mapping their environment). Another alternative, which arises from classical linear filtering theory, is the complementary filter, which combines high-pass filtering of gyroscope data with low-pass filtering of accelerometer data. For a comparison of these approaches to Kalman filters, see [3]. Our approach is similar in spirit to complementary filtering, but due to the power of modern computers, we can run algorithms in each iteration that are specific to our tracking problem.
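To make the complementary-filter idea concrete, here is a minimal one-axis sketch in Python. It is a textbook first-order form, not the Rift SDK's method; the function name, the blend factor `alpha`, and the gyroscope bias value are all assumptions chosen for illustration.

```python
def complementary_step(angle, gyro_rate, accel_angle, dt, alpha=0.98):
    """One update of a first-order complementary filter (single axis).

    The integrated gyro angle dominates over short time scales, while the
    accelerometer-derived angle slowly pulls the estimate back over long ones.
    """
    return alpha * (angle + gyro_rate * dt) + (1.0 - alpha) * accel_angle

dt = 0.001         # 1000 updates per second
gyro_bias = 0.01   # hypothetical constant gyro error in rad/s
fused = 0.0        # complementary-filter estimate
gyro_only = 0.0    # plain gyro integration, for comparison

# Ten seconds with the head held perfectly still (true angle is 0).
for _ in range(10_000):
    gyro_only += gyro_bias * dt
    fused = complementary_step(fused, gyro_bias, 0.0, dt)
```

With these made-up numbers, plain integration accumulates the bias without bound, while the blended estimate settles at a small, fixed offset.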
Integrating gyroscope readings
The gyroscope inside of the Rift measures the head’s orientation change at a rate of 1000 times a second. The software needs to compute the current head orientation, given the previously “known” head orientation and the latest gyroscope measurement. Imagine trying to figure out how far a car has traveled by reading the speedometer. The update would look like this:
Current distance = Previous distance + Time difference × Observed speed.
As the time difference shrinks to zero, the formula gives the exact distance, assuming the speedometer is 100% accurate. In reality, the time difference is not zero and the speedometer is imperfect, causing drift error, which will be discussed later.
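The speedometer update can be sketched in a few lines of Python. The function name and the 1% speedometer error are hypothetical, picked only to show how a small measurement error turns into growing drift.

```python
def integrate_speed(speeds, dt):
    """Accumulate distance from evenly spaced speed readings."""
    distance = 0.0
    for v in speeds:
        distance += dt * v   # Current = Previous + Time difference x Speed
    return distance

dt = 0.001     # seconds between readings
n = 10_000     # ten seconds of readings

# True speed is a constant 20 m/s.
exact = integrate_speed([20.0] * n, dt)

# Hypothetical speedometer that reads 1% high.
biased = integrate_speed([20.2] * n, dt)

# Error in the computed distance; it keeps growing as more readings arrive.
drift = biased - exact
```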
Now turn to the problem of tracking a human head, which has three rotational degrees of freedom. The orientation of a 3D rigid body is often described by yaw, pitch, and roll angles. They are convenient for making figures like the one on the left, but later cause a lot of trouble due to numerical singularities (see gimbal lock) and a huge variety of alternative, incompatible definitions (see Euler angles–pronounced by Americans as “oiler angles”). We therefore use quaternions internally for representing orientation.
Suppose that the head is rotating about the Y axis only, observed by the sensor to be an angular velocity of ω radians per second. Assuming 1000 sensor readings per second, an angular version of the previous update formula is:
Current orientation = Previous orientation + 0.001 × ω.
This is exactly how the update works in the Rift SDK, but it is slightly more complicated so that it handles any combination of yaw, pitch, and roll. The gyroscope provides angular velocity with respect to all three of these, producing a 3D vector: ω = (ωx, ωy, ωz).
It is known from mathematics that every 3D orientation can be nicely described by a rotation of θ degrees about some axis poking through the origin. The rotating head can then be thought of as a spinning top that keeps changing speed and axis. Amazingly, the angular velocity vector ω is exactly the rotation axis (though you might want to normalize it). Furthermore, its length is the angular speed of rotation about that axis. So, the update equation is simply
Current quaternion = Previous quaternion * Quat(axis, angle),
in which Quat(axis, angle) means a unit quaternion that represents rotation by angle about the given axis. Unit quaternions are used because it is easy to convert them to and from the axis-angle description, and their multiplication operation combines orientations in a way that is equivalent to multiplying out their corresponding 3 by 3 rotation matrices. We also avoid numerical singularity issues associated with yaw, pitch, and roll angles.
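The quaternion update can be sketched as follows. This is an illustrative Python version, not the SDK source: the function names, the (w, x, y, z) component order, and the renormalization step are assumptions, and the per-step angle is taken to be the angular speed times the time step.

```python
import math

def quat_from_axis_angle(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation of `angle` radians about `axis`."""
    x, y, z = axis
    n = math.sqrt(x * x + y * y + z * z)
    if n == 0.0:
        return (1.0, 0.0, 0.0, 0.0)   # no rotation
    s = math.sin(angle / 2.0) / n     # dividing by n normalizes the axis
    return (math.cos(angle / 2.0), x * s, y * s, z * s)

def quat_mul(a, b):
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)

def integrate_gyro(q, omega, dt):
    """Current quaternion = Previous quaternion * Quat(axis, angle)."""
    speed = math.sqrt(sum(c * c for c in omega))   # angular speed, rad/s
    q = quat_mul(q, quat_from_axis_angle(omega, speed * dt))
    n = math.sqrt(sum(c * c for c in q))           # guard against round-off
    return tuple(c / n for c in q)

# One second of turning at 90 degrees per second about the Y axis.
q = (1.0, 0.0, 0.0, 0.0)
omega = (0.0, math.pi / 2.0, 0.0)
for _ in range(1000):
    q = integrate_gyro(q, omega, 0.001)

expected = quat_from_axis_angle((0.0, 1.0, 0.0), math.pi / 2.0)
```

With a constant rotation axis, the thousand small steps compose into the single quarter turn stored in `expected`, up to floating-point round-off.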
This method is simple and fairly accurate. It relies on some prefiltering of gyroscope readings, which occurs in hardware. More complicated numerical integration formulas could be tried (the one above is called Euler integration; see 4th-order Runge-Kutta for a better alternative), but we did not find any need when operating at 1000 measurements per second with prefiltered data. Some predictive filtering is also applied, which I plan to talk about in a later post.
After many thousands of updates, the true orientation will drift away from the calculated orientation. Therefore, other sensors are needed to bring the orientation back into correct alignment. Drift in the pitch and roll angles is called tilt error, which corresponds to confusion about which way is up. Drift in the yaw angle is called yaw error, which is confusion about which way is North, or at least which way you are facing relative to when you started. The discussion of yaw error correction is planned for a future post due to the complications of using a magnetometer; some of the ideas below, however, apply to that case as well.
To handle tilt error, let’s think about what “up” actually means. Our perception of “up” is based entirely on gravity. It is in the direction of a ray that starts at the center of the Earth and pokes through your body. We have been taught that the acceleration due to gravity is 9.81 m/s², but it actually varies by up to half a percent depending on your location on the Earth (you are actually lighter at the equator; imagine being on the edge of a huge merry-go-round!).
Gravity is expressed as an acceleration vector, so it seems natural to use an accelerometer to measure it. While standing on the earth, it is as if we are riding on a rocket that is constantly accelerating upward, which is why we are stuck to the ground. A three-axis accelerometer measures this vector, but unfortunately it also measures any additional accelerations of the sensor. When placed in the Rift, it measures the linear accelerations due to head motions, in addition to gravity. To handle this, we want to have high confidence that the gravity vector is being measured in isolation. Because the drift error grows slowly, we wait for two simple conditions to be met over a few tens of milliseconds:
In addition, all accelerometer readings are filtered by a simple moving average. If the conditions are met, then it is assumed that the accelerometer is correctly reporting the direction of “up”. Although a standard method [2], it is clearly flawed because you can accelerate the sensor downward to cancel out part of gravity, while laterally accelerating to bring the magnitude back up to 9.8. Nevertheless, it is simple and works well enough for us.
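Here is one way such a gate could look in Python. It is a sketch under loud assumptions: the two conditions used are that the acceleration magnitude stays near 9.81 (implied by the text above) and that the gyroscope reports slow rotation (my assumption, not stated here). The class name, window length, and thresholds are hypothetical.

```python
import math
from collections import deque

GRAVITY = 9.81   # nominal magnitude, m/s^2

def _norm(v):
    return math.sqrt(sum(c * c for c in v))

class GravityGate:
    """Report an averaged "up" vector only after a run of calm samples."""

    def __init__(self, window=20, accel_tol=0.5, max_rate=0.1):
        self.window = window                # samples (about 20 ms at 1000 Hz)
        self.accel_tol = accel_tol          # allowed |a| deviation, m/s^2 (assumed)
        self.max_rate = max_rate            # allowed angular speed, rad/s (assumed)
        self.recent = deque(maxlen=window)  # moving-average buffer
        self.calm = 0                       # consecutive samples passing both checks

    def update(self, accel, gyro):
        """Feed one sample; return the averaged vector if trusted, else None."""
        self.recent.append(accel)           # every reading enters the average
        ok = (abs(_norm(accel) - GRAVITY) <= self.accel_tol
              and _norm(gyro) <= self.max_rate)
        self.calm = self.calm + 1 if ok else 0
        if self.calm < self.window:
            return None
        n = len(self.recent)
        return tuple(sum(s[i] for s in self.recent) / n for i in range(3))

gate = GravityGate(window=5)
still = [gate.update((0.0, 9.81, 0.0), (0.0, 0.0, 0.0)) for _ in range(5)]
shaken = gate.update((3.0, 12.0, 0.0), (0.0, 2.0, 0.0))
```

The first four entries of `still` are `None` because the calm run is too short; one disturbed sample resets the run, so `shaken` is `None` as well.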
Now suppose that an error angle φ has been detected between what is currently believed to be “up” and the acceleration vector a = (ax, ay, az) measured by the accelerometer. The sensor fusion system then needs to apply a corrective rotation. The angle is φ, but what is the rotation axis? It must lie in the horizontal XZ plane and be perpendicular to both a and the Y axis.
Simply project a into the horizontal plane to obtain (ax, 0, az). A perpendicular vector that remains in the horizontal plane is (az, 0, -ax), which is the tilt axis. Imagine grabbing on to the tilt axis and twisting to bring a back into alignment with the Y axis.
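The tilt-axis construction can be checked numerically. The sketch below assumes that Y is the vertical axis, that the measured vector is a = (ax, ay, az), and that the tilt axis is (az, 0, -ax); the function names are hypothetical. Under the right-hand rule, twisting a about that axis by the negative of the error angle lines it up with Y.

```python
import math

def tilt_axis_and_angle(a):
    """Tilt axis (az, 0, -ax) and error angle between a and the Y axis."""
    ax, ay, az = a
    angle = math.atan2(math.hypot(ax, az), ay)   # angle between a and +Y
    return (az, 0.0, -ax), angle

def rotate_about_axis(v, axis, angle):
    """Rotate v about `axis` by `angle` radians (Rodrigues' formula)."""
    n = math.sqrt(sum(c * c for c in axis))
    if n == 0.0:
        return tuple(v)    # axis undefined: a is already vertical
    kx, ky, kz = (c / n for c in axis)
    vx, vy, vz = v
    c, s = math.cos(angle), math.sin(angle)
    dot = kx * vx + ky * vy + kz * vz
    cross = (ky * vz - kz * vy, kz * vx - kx * vz, kx * vy - ky * vx)
    return (vx * c + cross[0] * s + kx * dot * (1.0 - c),
            vy * c + cross[1] * s + ky * dot * (1.0 - c),
            vz * c + cross[2] * s + kz * dot * (1.0 - c))

a = (1.0, 9.6, -1.5)    # made-up accelerometer reading, slightly tilted
axis, phi = tilt_axis_and_angle(a)
corrected = rotate_about_axis(a, axis, -phi)   # ends up along +Y
```

Note that the axis degenerates to the zero vector when a is exactly vertical, so real code needs to treat that case separately, as the early return above does.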
Once tilt error is detected, the remaining issues are when to perform the corrective rotation and how much. This is actually an ongoing research topic, to which developers working with the Rift may bring new insights. If a player notices the tilt correction while staring in one direction, then the effect could be nauseating. On the other hand, if they turn their head quickly, perhaps all of the needed corrections can be performed without them even noticing.
We currently take the following approach. In the first few seconds after the Rift is turned on, if there is a huge tilt error, then we rotate by the entire error angle. A common situation is that the Rift could be sitting on your lap or on a table upon startup. At this point, a large correction needs to be made. Otherwise, a tiny correction is applied in each cycle of the sensor fusion. The rotation axis may frequently change while corrections are being performed. When the system knows that tilt correction needs to be performed, a critical issue is to perform it at a time and rate that the player will tend not to notice. We experimented with several alternatives, but it seems to be a matter of personal preference.
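As a rough illustration of the two regimes (one full correction at startup, tiny corrections afterwards), the Python sketch below applies a fraction `gain` of the error angle in each cycle. It assumes Y is up, a tilt axis of (az, 0, -ax), and a stationary sensor; the function names and gain values are hypothetical and are not the SDK's actual tuning.

```python
import math

def quat_from_axis_angle(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation of `angle` radians about `axis`."""
    x, y, z = axis
    n = math.sqrt(x * x + y * y + z * z)
    if n == 0.0:
        return (1.0, 0.0, 0.0, 0.0)
    s = math.sin(angle / 2.0) / n
    return (math.cos(angle / 2.0), x * s, y * s, z * s)

def quat_mul(a, b):
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)

def quat_rotate(q, v):
    """Rotate vector v by unit quaternion q."""
    w, x, y, z = q
    p = quat_mul(quat_mul(q, (0.0,) + tuple(v)), (w, -x, -y, -z))
    return p[1:]

def tilt_correct(q, accel_sensor, gain):
    """Apply a fraction `gain` of the tilt error; return (new q, error before)."""
    ax, ay, az = quat_rotate(q, accel_sensor)   # measured "up" in the world frame
    phi = math.atan2(math.hypot(ax, az), ay)    # error angle relative to +Y
    axis = (az, 0.0, -ax)                       # tilt axis
    return quat_mul(quat_from_axis_angle(axis, -gain * phi), q), phi

accel = (0.0, 9.81, 0.0)   # stationary sensor whose true orientation is identity
start = quat_from_axis_angle((1.0, 0.0, 0.0), 0.1)   # estimate is off by 0.1 rad

# Startup case: remove the whole error in one step.
snapped, _ = tilt_correct(start, accel, gain=1.0)
_, after_snap = tilt_correct(snapped, accel, gain=0.0)

# Steady state: a tiny correction in every cycle.
q = start
errors = []
for _ in range(200):
    q, phi = tilt_correct(q, accel, gain=0.02)
    errors.append(phi)
```

With a constant gain the error shrinks geometrically. A refinement in the spirit of the paragraph above would be to raise the gain while the head is turning quickly, when the correction is less likely to be noticed.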
Can you think of a better way to handle sensor fusion? Hack it up and give it a try!
[1] Doucet, A., De Freitas, N., and Gordon, N.J., Sequential Monte Carlo Methods in Practice. Springer, 2001.
[2] Favre, J., Jolles, B.M., Siegrist, O., and Aminian, K., Quaternion-based fusion of gyroscopes and accelerometers to improve 3D angle measurement, Electronics Letters, Volume 32, Issue 11, pp. 612-614, 2006.
[3] Higgins, W. T., A Comparison of Complementary and Kalman Filtering, IEEE Transactions on Aerospace and Electronic Systems, Volume 11, Issue 3, pp. 321-325, 1975.
[4] Stengel, R. F., Optimal Control and Estimation, Dover, 1986.