The Latent Power of Prediction

“You are defeated. Instead of shooting where I was, you should have shot where I was going to be. Muahahahaha!” *-Lrrr (character from Futurama, after invading Earth in Space Invaders style)*

**Introduction**

We have all learned that latency is the bane of virtual reality. Because your head freely rotates, presenting the correct image to your eyes is like firing a bullet at a moving target. The target is “sighted” by sensor fusion software, which provides the direction you are currently looking (see [8] for a survey of techniques). Latency is the time it takes between your head moving to a new orientation and the correct image arriving on your retinas. In real, old-fashioned reality, the latency is effectively zero. In VR, latency is widely recognized as a key source of disorientation and disbelief (the brain cannot be fooled). In this post, I will argue that simple prediction techniques can reduce latency so much that it is no longer the main problem. Simply present the image that corresponds to where the head is *going to be*, rather than *where it was*.

As enthusiasm for VR gaming has ramped up over the past year, recent blog articles and talks by veterans John Carmack and Michael Abrash have focused on latency as the key obstacle. In [3], Carmack provides a nice summary of the factors that contribute to latency. He offers some strategies to reduce it, but does not emphasize prediction as a serious contender. In [1], Abrash calls for a fundamental change in the way rendering and display technology are currently pursued so that latency can be reduced while maintaining high-fidelity images. I hope this happens because it could improve VR experiences on all fronts! Prediction, however, is discouraged due to increased error when the motion direction changes.

How much latency is too much? Based on VR research during the 1990s, 60 milliseconds (ms) has been commonly cited as an upper limit for acceptable VR. Even so, I definitely notice a disturbing lag when the latency is greater than 40ms. Most people agree that if latency is below 20ms, then the lag is no longer perceptible. Abrash even calls for 7ms to 15ms to be safe. How close are we in modern times? For a game running at 60 FPS, the latency when using the Oculus Rift Development Kit is typically in the range from 30ms to 50ms, including time for sensing, data arrival over USB, sensor fusion, game simulation, rendering, and video output. Other factors, such as LCD pixel switching times, game complexity, and unintentional frame buffering may drive it higher, but it is important to note that the latency period is generally shorter than it was decades ago.

Why is prediction often not taken too seriously? The most likely reason is that through decades of VR research, it has become widely known as a double-edged sword. Most of the time it works, but then makes catastrophic errors when the head motion abruptly changes. This was true across a wide range of VR and AR systems; however, the game has changed thanks to new technologies. The main factors are:

- How far into the future do we need to predict?
- How far into the past do we need to look to estimate the trajectory?

Regarding the first factor, Ronald Azuma’s highly cited 1995 thesis [2] on predictive tracking for AR insists on keeping the prediction interval “short”: Below 80ms! If the latency in a current system is only 50ms, then accurately predicting around 30 to 40ms into the future would already tackle most of the problem, even satisfying Abrash’s extreme demands.

To understand the effect of shortening the prediction interval, suppose that the head accelerates at a rate of $\alpha$ deg/sec². After $t$ seconds, the angular velocity will change by $\alpha t$ deg/sec and the orientation will change by $\frac{1}{2} \alpha t^2$ degrees. It is notoriously difficult to estimate time derivatives, making it hard to accurately measure angular acceleration [5]. Furthermore, the acceleration could change during the prediction interval. In either case, the error could grow at least quadratically with respect to the prediction interval length. For an aggressive acceleration of 1000 deg/sec² that goes unaccounted for (see the figure), the error in head orientation for 20ms is 0.2 degrees. For 40ms, it is already 0.8 degrees. At 80ms, it is already up to 3.2 degrees. Therefore, predicting 20ms into the future is **much** easier than predicting 40ms, and 40ms is much easier than 60 to 80ms. Also working to our advantage in shorter intervals is the fact that the head, while wearing a VR headset, has significant momentum. This has a smoothing effect that prohibits excessive rate changes.
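The error figures above can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming a constant acceleration that the predictor ignores entirely, so that the orientation error after an interval is one half of the acceleration times the interval squared. The function name is made up for illustration.

```python
def unmodeled_accel_error_deg(accel_deg_s2: float, interval_s: float) -> float:
    """Orientation error (degrees) caused by a constant angular acceleration
    that the predictor does not account for over the given interval."""
    return 0.5 * accel_deg_s2 * interval_s ** 2


# Aggressive acceleration of 1000 deg/sec^2, as in the text.
for interval_ms in (20, 40, 80):
    error = unmodeled_accel_error_deg(1000.0, interval_ms / 1000.0)
    print(f"{interval_ms} ms -> {error:.1f} deg")
```

Doubling the interval quadruples the error, which is why shaving the prediction interval pays off so quickly.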

Now consider the second factor: If we need to look too far into the past to reliably estimate the trend, then an implicit latency is built into the estimate. Suppose that optical tracking is performed using cameras at a nice rate of 60 frames per second. This means an estimate of the head orientation arrives every 16.67ms. If these estimates are noisy, and we would furthermore like to know how the orientation is changing, then several measurements are needed. If we use 6 samples, then we have reached 100ms into the past to determine the trend, effectively lengthening the prediction interval.

We can thank the smartphone industry for helping with this problem. MEMS-based sensors continue to improve, providing accurate, high-frequency measurements in a tiny, low-cost package. Modern gyroscopes provide angular velocity measurements at 1000Hz. Even at this incredible rate, the measurements are already prefiltered to reduce noise; raw measurements can be obtained at around 10,000Hz. So, only 1ms of MEMS gyroscope data may be more informative than looking 100ms into the past with an optical tracking system. Better yet, a few milliseconds of gyroscope data could enable accelerations or higher-order trends to be estimated.
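To make the comparison concrete, here is a small sketch of how far into the past a tracker reaches for a given number of samples. It follows the same convention as the 60 frames per second example (number of samples times the sample period); the function name is hypothetical.

```python
def lookback_ms(num_samples: int, rate_hz: float) -> float:
    """Time span (ms) covered by num_samples readings arriving at rate_hz,
    counted as num_samples sample periods."""
    return 1000.0 * num_samples / rate_hz


print(lookback_ms(6, 60.0))     # optical tracking at 60 frames per second
print(lookback_ms(6, 1000.0))   # MEMS gyroscope at 1000Hz
```

The same six samples cost 100ms of history with the camera, but only 6ms with the gyroscope.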

To summarize, the game has changed: Trackers do not need to predict as far into the future, and they barely need to look into the past.

**Some Technical Details**

Predictive tracking or filtering is an old idea, extending back to the early days of signal processing and control theory. Let the *state* refer to the quantity that we would like to track and predict. A classical example is the position, orientation, and velocity of an aircraft. Predictive filtering is based on three parts:

1. The sensor readings up until now.
2. A model of what each reading tells about the state at that time.
3. A model of how the state changes over time.

To keep it simple, let’s suppose that #1 and #2 cause no trouble: A sensor reading directly provides the current state. If we obtain sensor readings at regular intervals (for example, every 10 ms), then what will the state be one step into the future? Let $s_i$ refer to the *i*th reading. With $s_k$ as the most recent reading, a linear prediction approach looks like

$$s_{k+1} = a_0 s_k + a_1 s_{k-1} + \cdots + a_n s_{k-n},$$

in which $a_0, \ldots, a_n$ are constants chosen in advance. They provide the predictive model (#3 above). For example, $s_{k+1} = s_k$ is a simple model which predicts that the state will never change. The model $s_{k+1} = s_k + (s_k - s_{k-1}) = 2 s_k - s_{k-1}$ predicts that the state changes at a fixed rate. More complicated linear prediction filters can handle other factors, such as noise reduction and state acceleration.
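As a rough illustration of linear prediction, the sketch below forms the next state as a weighted sum of the most recent readings, with the weights fixed in advance. The function name and the sample readings are made up. The single weight 1 gives the "never changes" model, and the weights 2 and -1 give the fixed-rate model (latest reading plus the most recent change).

```python
from typing import Sequence


def linear_predict(readings: Sequence[float], coeffs: Sequence[float]) -> float:
    """Predict the next reading as
    coeffs[0] * latest + coeffs[1] * previous + ... (weights chosen in advance)."""
    if len(readings) < len(coeffs):
        raise ValueError("need at least as many readings as coefficients")
    # Walk backward from the newest reading so that it pairs with coeffs[0].
    return sum(a * s for a, s in zip(coeffs, reversed(readings)))


readings = [10.0, 12.0, 14.0]                  # state sampled at regular intervals
print(linear_predict(readings, [1.0]))         # state never changes: 14.0
print(linear_predict(readings, [2.0, -1.0]))   # state changes at a fixed rate: 16.0
```

Longer weight vectors are where noise reduction and acceleration terms would enter.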

Linear prediction is just one type of filter among many others. For example, Bayesian filters use probabilistic modeling (in #2 and #3 above) to arrive at distributions over possible current states and future states. Heavier weights are given to more likely futures. The celebrated Kalman filter is a famous special case of Bayesian filters for which all of the distributions become Gaussian and there is a nice update formula for each step. The most basic and general way to view filter design is in terms of *information states*, an idea introduced by von Neumann and Morgenstern [7]: When playing an iterative game with uncertain current state, all past information is compressed into some representation that is critical for decision making. In our case, the “decision” is specifying the future state. The information state is updated in each stage, and forms the basis for a decision. Think about what you need to keep track of to play Battleship effectively. Card counting strategies for Blackjack are another good example. Finally, what information state should a game AI maintain? For filtering from an information-state perspective, see Chapter 11 of my book.
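For readers who have not seen a Kalman filter update, here is a minimal one-dimensional sketch. It is not the filter used for head tracking in this post. It assumes a scalar state that follows a random walk, a sensor that measures the state directly, and noise variances (`process_var`, `meas_var`) that are invented for illustration. The information state is just the Gaussian's mean and variance.

```python
def kalman_step(mean: float, var: float, measurement: float,
                process_var: float, meas_var: float) -> tuple[float, float]:
    """One predict-and-update cycle of a scalar Kalman filter
    under a random-walk motion model."""
    # Predict: the state is expected to stay put, but uncertainty grows.
    var_pred = var + process_var
    # Update: blend prediction and measurement according to their variances.
    gain = var_pred / (var_pred + meas_var)
    new_mean = mean + gain * (measurement - mean)
    new_var = (1.0 - gain) * var_pred
    return new_mean, new_var


mean, var = 0.0, 1.0
for z in (2.0, 2.1, 1.9):          # hypothetical noisy sensor readings
    mean, var = kalman_step(mean, var, z, process_var=0.01, meas_var=1.0)
print(mean, var)
```

Each step shrinks the variance when a measurement arrives and inflates it slightly between measurements, which is the "nice update formula" in its simplest form.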

Now consider tracking head orientation, which means the state is a quaternion that represents head orientation. From my previous blog post, the orientation is updated every millisecond by calculating a quaternion that represents the rotation that occurred over that time interval. The critical piece of information is the current angular velocity, as measured using a gyroscope.

Consider the following methods:

1. **No prediction:** Just present the updated quaternion to the renderer.
2. **Constant rate:** Assume the currently measured angular velocity will remain constant over the latency interval.
3. **Constant acceleration:** Estimate angular acceleration and adjust angular velocity accordingly over the latency interval.

The first method seems absurd because it assumes that the head will immediately come to a complete stop and remain that way over the latency interval (recall the linear prediction model in which the state never changes). The second method extends the rotation rate over the latency interval (recall the linear prediction model in which the state changes at a fixed rate, but now we use the measured angular velocity). If the rotation rate remains constant, then the rotation axis is unchanged. We only need to extend the rotation angle about that axis to account for the longer time interval. To predict 20ms into the future, simply lengthen the time interval used to compute the rotation angle from the 1ms update interval to one that reaches 20ms into the future. The third method allows the angular velocity to change at a linear rate when looking into the future. The angular acceleration is estimated from the change in gyroscope data. For each small step 1ms into the future, the acceleration is applied to change the predicted angular velocity. For example, if the head is decelerating, then its predicted angular velocity will be smaller in each time step along the latency interval. The figure shows their differences in terms of calculated angular velocity over the prediction interval.
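The three methods can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SDK implementation: quaternions are `(w, x, y, z)` tuples, angular velocity and acceleration are 3-vectors in radians, the gyroscope is assumed to report rates in the head's own frame (so increments are multiplied on the right), and all function names are hypothetical.

```python
import math


def quat_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def rotation_quat(omega, dt):
    """Rotation produced by angular velocity omega (rad/sec) held for dt seconds."""
    speed = math.sqrt(sum(c * c for c in omega))
    if speed * dt < 1e-12:
        return (1.0, 0.0, 0.0, 0.0)
    half = 0.5 * speed * dt
    s = math.sin(half) / speed      # scales omega to axis * sin(angle / 2)
    return (math.cos(half), omega[0] * s, omega[1] * s, omega[2] * s)


def predict_none(q, omega, horizon):
    """Method 1: present the current orientation unchanged."""
    return q


def predict_constant_rate(q, omega, horizon):
    """Method 2: same axis, rotation angle extended over the whole horizon."""
    return quat_mul(q, rotation_quat(omega, horizon))


def predict_constant_accel(q, omega, alpha, horizon, step=0.001):
    """Method 3: step forward 1ms at a time, adjusting the angular velocity."""
    for _ in range(int(round(horizon / step))):
        q = quat_mul(q, rotation_quat(omega, step))
        omega = tuple(w + a * step for w, a in zip(omega, alpha))
    return q


def yaw_deg(q):
    """Rotation angle about the z axis, valid for pure z-axis rotations."""
    return math.degrees(2.0 * math.atan2(q[3], q[0]))


q0 = (1.0, 0.0, 0.0, 0.0)
omega = (0.0, 0.0, math.radians(240.0))    # turning at 240 deg/sec
alpha = (0.0, 0.0, math.radians(-850.0))   # decelerating at 850 deg/sec^2
print(yaw_deg(predict_none(q0, omega, 0.020)))
print(yaw_deg(predict_constant_rate(q0, omega, 0.020)))
print(yaw_deg(predict_constant_accel(q0, omega, alpha, 0.020)))
```

With a decelerating head, the constant-acceleration prediction lands short of the constant-rate one, which is the behavior described above.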

One remaining detail is noise reduction. Errors in estimating the current angular velocity tend to be amplified when making predictions over a long time interval. Vibrations derived from noise are particularly noticeable when the head is not rotating quickly. Therefore, we use simple smoothing filters in the estimation of current angular velocity (Methods 2 and 3) and current angular acceleration (Method 3). We use Savitzky-Golay filters, but many other methods should work just as well.
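For a feel of what Savitzky-Golay filtering does, here is a sketch using the standard 5-point coefficients for a quadratic fit. The window length and polynomial order are assumptions for illustration, not the settings used in our tracker. Both estimates refer to the center sample of the window, so with 1ms samples they describe the signal two samples ago.

```python
# Standard 5-point Savitzky-Golay coefficients for a quadratic (or cubic) fit.
SMOOTH_5 = (-3.0, 12.0, 17.0, 12.0, -3.0)   # divide by 35
DERIV_5 = (-2.0, -1.0, 0.0, 1.0, 2.0)       # divide by 10 * sample spacing


def savgol_smooth5(window):
    """Smoothed value at the center of five equally spaced samples."""
    if len(window) != 5:
        raise ValueError("window must hold exactly five samples")
    return sum(c * v for c, v in zip(SMOOTH_5, window)) / 35.0


def savgol_derivative5(window, dt):
    """Smoothed first derivative at the center of five samples spaced dt apart."""
    if len(window) != 5:
        raise ValueError("window must hold exactly five samples")
    return sum(c * v for c, v in zip(DERIV_5, window)) / (10.0 * dt)


# Hypothetical gyroscope readings (deg/sec) taken 1ms apart about one axis.
gyro = [100.2, 100.9, 102.1, 102.8, 104.1]
print(savgol_smooth5(gyro))              # smoothed angular velocity
print(savgol_derivative5(gyro, 0.001))   # estimated angular acceleration
```

The smoothed value feeds Methods 2 and 3, and the derivative estimate feeds Method 3.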

**Performance**

A simple way to evaluate performance is to record predicted values and compare them to the current estimated value after the prediction interval has passed. Note that this does not compare to actual ground truth, but it is very close because the drift error rate from gyroscope integration is very small over the prediction interval. I’ve compared the performance of several methods with prediction intervals ranging from 20ms to 100ms. The following graph shows error in terms of degrees, for a prediction interval of 20ms, using the Oculus Rift sensor over a 3 second interval:
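The evaluation procedure can be sketched as follows. To keep it short, this version works on a single rotation angle in degrees rather than full quaternions, and the function name and sample data are hypothetical. Each prediction made at step `i` is compared with the tracker's own estimate `interval_steps` later.

```python
def prediction_error_stats(predicted, estimated, interval_steps):
    """Average and worst absolute error (degrees), where predicted[i] was made
    at step i for step i + interval_steps and estimated[j] is the tracker's
    estimate at step j."""
    count = min(len(predicted), len(estimated) - interval_steps)
    if count <= 0:
        raise ValueError("not enough samples for this prediction interval")
    errors = [abs(predicted[i] - estimated[i + interval_steps])
              for i in range(count)]
    return sum(errors) / len(errors), max(errors)


# Head turning steadily by 1 degree per step; predict 2 steps ahead.
estimated = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
no_prediction = list(estimated)                        # Method 1
constant_rate = [e + 2.0 for e in estimated]           # Method 2, rate known exactly
print(prediction_error_stats(no_prediction, estimated, 2))
print(prediction_error_stats(constant_rate, estimated, 2))
```

For a steady turn, no prediction is always off by the full amount of rotation during the interval, while the constant-rate prediction is exact.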

I was wearing the Rift and turning my head back and forth, with a peak rate of about 240 deg/sec, which is fairly fast. This is close to reported peak velocities in published VR studies [4,6]. The blue line represents Method 1 (no prediction), which performs the worst. The red line shows Method 2 (constant rate), which is much improved. The yellow line shows Method 3 (constant acceleration), which performs the best in the comparison. Method 1 is used by default in the original SDK release for the Rift Development Kit, but with prediction turned on, Method 2 is used. A variant of Method 3 is expected to appear in an upcoming release.

Numerically, the angular errors for predicting 20ms into the future are:

| Method | Average error (deg) | Worst error (deg) |
|---|---|---|
| 1 | 1.46302 | 4.77900 |
| 2 | 0.19395 | 0.71637 |
| 3 | 0.07596 | 0.35879 |

During these motions, the acceleration peaked at around 850 deg/sec². The fastest I could rotate my head while wearing the Rift was about 600 deg/sec, with peak accelerations of around 20,000 deg/sec² (and my neck still hurts as I am typing this). By flipping the Rift in my hands and catching it again, I was able to obtain 1400 deg/sec and 115,000 deg/sec²; however, these speeds are unreasonable! Typical, slower motions, which are common in game play, yield around 60 deg/sec in velocity and 500 deg/sec² in peak accelerations. For both slow and fast motions with a 20ms prediction interval, Method 3 is generally superior to the others. If we double the prediction interval, then performance degrades; however, the prediction methods remain preferable to no prediction. For similar head motions as above, the results for 40ms prediction are:

| Method | Average error (deg) | Worst error (deg) |
|---|---|---|
| 1 | 3.36267 | 9.68985 |
| 2 | 0.57410 | 1.59862 |
| 3 | 0.17338 | 0.50788 |

**Discussion**

During ordinary game play, even with some fast head motions, simple prediction techniques accurately predict 20 to 40ms into the future. Subtracting this time from the actual latency results in an *effective* latency that is well below 20ms. Hooray! It appears that the holy grail has been reached. But not really. As mentioned before, other factors may drive the actual latency higher. Also, the effect of small prediction errors is difficult to assess. This ties directly into *perception*, which is an important topic missing from the discussion so far. For example, when the head is almost stationary, small perturbations are more noticeable than when the head quickly rotates. How much error is imperceptible and how does this vary with respect to angular velocity, acceleration, screen resolution, shading methods, and so on? How important is the *direction* of the error as it propagates over time? Answers to these questions would help to further improve prediction methods. At the same time, improvements in computation power, software, rendering, and display technologies (OLEDs) are expected to reduce the actual latency, which would further shorten the required prediction interval.

Latency is no longer the powerful beast that it once was. It has been beaten down and nearly defeated by modern sensing technology and effective filtering techniques. This will cause attention to shift to a host of other problems. If latency is no longer causing simulator sickness, then what about the game content? A fast ride on a virtual roller coaster may cause more disorientation than latency or other VR system artifacts. Furthermore, what kind of user interfaces are most appropriate? What game genres will emerge to provide the best VR experience? As display resolution and switching speeds improve, how should judder be addressed? The list goes on and on. Exciting times lie ahead!

**Acknowledgments**

I am grateful to Tom Forsyth, Peter Giokaris, Nate Mitchell, and Laurent Scallie for helpful discussions.

**References**

[1] Abrash, Michael, Latency: The Sine Qua Non of AR and VR, blog post, Dec. 29, 2012.

[2] Azuma, Ronald, Predictive Tracking for Augmented Reality, PhD Thesis, University of North Carolina, 1995.

[3] Carmack, John, Latency Mitigation Strategies, blog post, Feb. 22, 2013.

[4] List, Uwe H., Nonlinear Prediction of Head Movements for Helmet-Mounted Displays, Technical Report AFHRL-TP-83-45, Williams AFB, AZ: Operations Training Division, Air Force Human Resources Laboratory, 1983.

[5] Ovaska, S. J., and Valiviita, S., Angular Acceleration Measurement: A Review, IEEE Transactions on Instrumentation and Measurement, Volume 47, Number 5, Pages 1211-1217, 1998.

[6] Smith Jr., Bernard R., Digital Head Tracking and Position Prediction for Helmet Mounted Visual Display Systems, Proceedings of AIAA 22nd Aerospace Sciences Meeting, 1984.

[7] von Neumann, John, and Morgenstern, Oskar, Theory of Games and Economic Behavior, Princeton University Press, 1944.

[8] Welch, Greg, and Foxlin, Eric, Motion Tracking: No Silver Bullet, but a Respectable Arsenal, IEEE Computer Graphics and Applications, Volume 22, Number 6, Pages 24-38, 2002.