Sharing VR Through Video: Mixed-Reality in The Unspoken

Oculus Developer Blog
Posted by Shaun McCabe, Production Director, Insomniac Games
September 2, 2016

“So… what’s it like to, you know, be in VR?”

I get these sorts of questions a lot these days from friends, family, and—in this case—from a cashier at a grocery store who saw I was wearing an Oculus T-shirt. VR is becoming part of the zeitgeist and people want to know what’s so special about it.

When these questions come up, I usually answer by talking about “unprecedented levels of immersion” and “being transported to different worlds.” And while this is all true, it doesn’t really do the experience justice.

Short of strapping on the headset, however, what’s the best way to share the uniquely amazing experience of VR?

For The Unspoken, our upcoming PvP spellcasting game for Touch, the answer is mixed-reality video.

Sharing a VR fantasy in 2D

The Unspoken is all about realizing the fantasy of wielding magic in the modern world. From the beginning, our player experience goal has been “to make spellcasting with Touch feel amazing.”

We revealed The Unspoken in April 2016 through a hands-on demo for select media and influencers. With a Rift on their heads and Touch in their hands, players just got what we were going for.

For our next reveal, however, we wanted to share that experience with a wider audience—in other words, to show that spellcasting with Oculus Touch feels amazing outside of VR, for example, in a video viewed on a mobile device. That’s how we arrived at mixed-reality.

Mixed-reality denotes content that merges elements of virtual and real-world environments. It’s a broad classification and can mean anything from inserting digital objects into the real world to taking real-world objects and inserting them into a virtual world.

What we thought would work best for The Unspoken was to show real people wielding magic in our world—sharing the VR fantasy in a 2D medium. So we got to work.


Our mixed-reality research started with a Gamasutra blog post from the team behind Fantastic Contraption. It’s an excellent primer on the problem space, and the results they’re getting are, well, fantastic for their game. Their technique involves taking video streams from the game and from a real-world camera and compositing them into the final frame using OBS, an open-source video streaming tool.

Because it was simple and well-documented, we decided to use this as the basis for our proof-of-concept.

Spectator Mode

Our first step was creating a special “spectator” mode. Since The Unspoken is a networked 1v1 game, this meant allowing a user to join as an “invisible” client. Once a spectator connects, they can choose to target one of the active players in the game. After a player is targeted, a couple of special things happen:

  • Avatar rendering is disabled. Because we want the real-world player to be inserted into the shot, we disable the avatar for remote players.
  • A third-person camera is activated. We want the spectator camera to line up with the real-world camera. We achieve this by taking the synced world-space position of the targeted player’s origin, then applying a fixed offset and rotation. We’ll revisit the specifics of how we line things up later when we discuss the setup.
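As a sketch of the camera half of this, the offset math might look like the following. This is Python for illustration only; the names, the yaw-only rotation, and the axis conventions are assumptions, not our engine’s actual API:

```python
import math
from dataclasses import dataclass

@dataclass
class Transform:
    x: float
    y: float
    z: float
    yaw: float  # degrees, rotation about the vertical axis

def spectator_camera(player: Transform, offset, yaw_offset):
    """Derive the spectator camera from the synced world-space origin of
    the targeted player: rotate a fixed offset into the player's facing,
    translate, then add a fixed rotation."""
    rad = math.radians(player.yaw)
    dx, dy, dz = offset  # right, up, forward in the player's frame
    wx = player.x + dx * math.cos(rad) - dz * math.sin(rad)
    wz = player.z + dx * math.sin(rad) + dz * math.cos(rad)
    return Transform(wx, player.y + dy, wz, (player.yaw + yaw_offset) % 360.0)
```

In practice, the offset and rotation values come from measuring the physical set, which is exactly what the calibration process below produces.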

Broadcast Set

The next step was building a broadcast set. Here is a list of the equipment we used:

We also set up OBS on a “Broadcast” PC. For the trial run, it took two inputs: the video from the Spectator build and the USB camera feed. From there, we configured OBS to run a chroma-key filter on the camera input and layer it on top of the spectator input.
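To make the compositing step concrete, here’s a crude per-pixel sketch of what a chroma-key-and-layer pass does. Real keyers like the one in OBS produce soft alpha with tolerance and spill controls; the hard threshold and values here are purely illustrative:

```python
def chroma_key_alpha(pixel, key=(0, 255, 0), tolerance=120):
    """0.0 (transparent) if the pixel is near the key color, else 1.0.
    A real keyer produces soft alpha; a hard threshold keeps this short."""
    dist = sum((c - k) ** 2 for c, k in zip(pixel, key)) ** 0.5
    return 0.0 if dist < tolerance else 1.0

def composite(camera_pixel, spectator_pixel):
    """Layer the keyed camera feed on top of the spectator view."""
    a = chroma_key_alpha(camera_pixel)
    return tuple(round(a * c + (1 - a) * s)
                 for c, s in zip(camera_pixel, spectator_pixel))
```

Green-screen pixels fall through to the game render; everything else (the player) is drawn on top.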

Lining Up the Cameras

With our broadcast set assembled, it was time to tackle the problem of lining up the real-world and in-game cameras.

The first thing we did was create calibration objects: a real-world box and an in-game object with matching dimensions. Having these two objects provides a consistent reference point between the two environments.

There was a lot of trial-and-error at this stage, but here are the steps we developed for lining things up:

  1. Mark where the real-world player is going to stand in front of the green screen.
  2. Place the calibration box at the position marked in step 1.
  3. Place the real-world camera at the position/orientation that best fits the composition you’re looking for.
  4. Measure the distance and rotation from the player’s position to the camera lens (forward/back, left/right, up/down, and any rotation angles).
  5. Place the in-game calibration object (matching the real-world box in size and position) at the spot where the avatar would normally be.
  6. Set the Spectator camera offset and rotation to the values you measured in step 4.
  7. Tweak the settings until the in-game calibration object lines up with the real calibration box.
  8. Remove the calibration objects from the set and the game and have the real-world player step in.
  9. Have the player perform actions to guide any final tweaks to the in-game camera. In our case, that meant holding their arms out and casting a shield and/or charging a fireball.
  10. Save those settings out.
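The idea behind steps 5 through 7 is that calibration is correct when the virtual box projects to the same pixels the real box occupies in the camera feed. A toy pinhole projection illustrates the check (this assumes an axis-aligned camera looking down +z; it is not our engine’s actual camera code):

```python
import math

def project(point, cam_pos, fov_deg, screen=(1920, 1080)):
    """Project a world-space point through an axis-aligned pinhole camera
    at cam_pos looking down +z; returns pixel coordinates."""
    px, py, pz = (p - c for p, c in zip(point, cam_pos))
    if pz <= 0:
        raise ValueError("point is behind the camera")
    # Horizontal FOV determines the focal length in pixels.
    f = (screen[0] / 2) / math.tan(math.radians(fov_deg) / 2)
    return (screen[0] / 2 + f * px / pz, screen[1] / 2 - f * py / pz)
```

If the in-game camera uses the measured offset and the same FOV as the physical camera, each corner of the virtual box should land on the corresponding corner of the real box in the composite; any residual mismatch is what step 7 tweaks away.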


All in all, the proof-of-concept took just a few days. It was incredibly cool to see this all working and it confirmed our belief that mixed-reality was the best way to show The Unspoken in 2D.

Like any successful prototype, however, our proof-of-concept highlighted a serious problem: Our real-world players didn’t feel integrated into the world.

A lot of this comes down to art direction. The Unspoken visuals are all about spectacular spellcasting against the backdrop of an urban magic fight club—lots of particle effects in arenas set in gritty locations with moody lighting. Our initial approach didn’t handle either particularly well; virtual objects didn’t sort well with the real-world actor and we couldn’t really match the virtual lighting with real-world lights.

We’d have to try something different.

Integrating Players into the World

After discussing several different options, we decided that the best way to solve the integration problem was to get the camera data into the engine. If we could effectively stream the video to a texture and apply it to a camera-aligned quad, we’d not only solve the sorting problem, we’d also have access to all of our environment lights and post effects.

Streaming Video to the Engine

Using some sample code from Microsoft’s Media Foundation as a starting point, we added USB camera capture support to our engine. Media Foundation provides access to one frame of video at a time, so we added support for buffering several frames. With that, it was easy (at least for our engine programmer) to copy one of the frames to a texture.
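A minimal sketch of the frame-buffering idea, independent of the Media Foundation specifics (the class and its API are illustrative, not our actual engine code):

```python
from collections import deque

class FrameBuffer:
    """Hold the last `capacity` captured frames so the renderer can copy
    the newest one to a texture, or reach back a few frames to compensate
    for camera latency."""
    def __init__(self, capacity=4):
        self._frames = deque(maxlen=capacity)

    def push(self, frame):
        """Called from the capture thread as each frame arrives."""
        self._frames.append(frame)

    def get(self, delay=0):
        """delay=0 returns the newest frame, delay=1 one frame older."""
        if not self._frames:
            return None
        return self._frames[max(0, len(self._frames) - 1 - delay)]
```

Being able to reach back a few frames matters later: camera delay ends up as one of the tweakables we use to sync the video feed with the game.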

Next, we created a proxy object in our engine. The proxy is automatically enabled for spectator mode and has a single, simple behavior: It derives its position from the world-space head tracking position. From an art standpoint, the proxy is just a camera-facing quad with a custom material. The material samples a single texture input, runs a chroma-key filter, and passes the result to the diffuse output. Then, at runtime, we copy a buffered video frame into the texture input and voila, the real-world player is composited into the game!
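The proxy’s placement boils down to billboard math: put the quad at the tracked head position and rotate it about the vertical axis to face the spectator camera. A sketch, with hypothetical names and a yaw-only rotation assumed:

```python
import math

def billboard_yaw(quad_pos, cam_pos):
    """Yaw (degrees) that rotates a quad at quad_pos about the vertical
    axis so it faces a camera at cam_pos."""
    dx = cam_pos[0] - quad_pos[0]
    dz = cam_pos[2] - quad_pos[2]
    return math.degrees(math.atan2(dx, dz))

def proxy_transform(head_pos, cam_pos):
    """The proxy quad tracks the player's head and faces the camera."""
    return head_pos, billboard_yaw(head_pos, cam_pos)
```

Because the camera is static in our setup, the yaw changes only as the player moves around the tracking space.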

This was a big step forward. Because the proxy was rendered in the game engine, it sorted nicely with the visual effects and inherited the same lighting properties as the rest of the environment. The two biggest barriers to player integration were solved.

Well, mostly solved. There were still a couple of lingering issues before we were production-ready.

Final Tweaks

The first was deliciously ironic. Because the proxy was running through the engine, it was affected by the lights in the scene. The problem was, most of our spell effects are sweetened with dynamic point lights. And if point lights have a natural enemy, it’s flat surfaces. Because our proxy object was just a quad with a single normal, it was more or less the perfect nemesis to our spell effects—casting spells would often result in the player getting completely washed out.

We didn’t want to remove the point lights—they’re a big part of what makes spells feel impactful—but at the same time, we knew washed-out players wouldn’t fly. We decided to split the difference by modifying the material so we could reduce the contribution of scene lights on the proxy.

The second problem had more to do with green screens in general: chroma-key artifacts. Fortunately for us, there’s a common solution in video production called a “garbage matte,” which lets editors reduce the composited source to as small an area as possible. It’s traditionally a hand-drawn matte but, in our case, we found that supporting a rectangular clipping mask in the material was good enough.

Both corrections were folded into the final material we used in production.
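As a rough approximation of what that material does per pixel (the parameter names and default values here are illustrative, not the shipping material):

```python
def shade_proxy(uv, camera_pixel, scene_light=(1.0, 1.0, 1.0),
                light_scale=0.35, matte=(0.1, 0.9, 0.05, 0.95),
                key=(0, 255, 0), tolerance=120):
    """Per-pixel sketch of the proxy material: a rectangular garbage
    matte clips edge artifacts, a chroma key removes the green screen,
    and scene lighting is scaled down so point lights can't wash the
    player out. Returns None for fully transparent pixels."""
    u, v = uv
    left, right, top, bottom = matte
    if not (left <= u <= right and top <= v <= bottom):
        return None  # outside the garbage matte
    dist = sum((c - k) ** 2 for c, k in zip(camera_pixel, key)) ** 0.5
    if dist < tolerance:
        return None  # keyed out by the chroma filter
    # Blend between the unlit camera color and the fully lit color.
    lit = tuple(min(255.0, c * l) for c, l in zip(camera_pixel, scene_light))
    return tuple(round((1 - light_scale) * c + light_scale * l)
                 for c, l in zip(camera_pixel, lit))
```

With `light_scale` at 0, the player ignores scene lighting entirely; at 1, they take the full brunt of every point light. Somewhere in between is where spells read as impactful without washing the player out.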

With these issues fixed, we were ready to take The Unspoken mixed-reality production on the road.

Taking It On the Road

The Unspoken mixed-reality debuted in the Oculus booth at E3 2016. Oculus contracted Pixel Corps, a digital services company, to produce a live broadcast of titles featured in the booth, including mixed-reality for The Unspoken.

The mixed-reality set in the booth was far more sophisticated than what we’d cobbled together in our studio. The equipment, from the green screen to the camera to the lighting, was top-notch, and the folks from Pixel Corps were broadcast professionals. So we were a bit surprised when we first got things set up and couldn’t match the quality of what we were getting in our set back home.

What changed? Well, everything. The results we were getting in Durham were the product of a lot of trial-and-error based on local conditions. For example, our home set was at the mercy of natural lighting conditions (we have a lot of big windows in the office). Also, our HD USB camera had a different FOV than the one we were using at the show.

Luckily for us, this wasn’t a catastrophe. But it emphasized the most important lesson we learned about setting up mixed-reality in a new location: make everything tweakable!

Think about it this way: A live broadcast involves a lot of setup and broadcasting equipment can be configured for a variety of conditions. The same thing needs to be true for the game. In our case, we have a debug system that makes it pretty easy to expose engine parameters such that they can be tweaked at runtime. Even better, the parameters can be saved and then loaded at a later time. Here’s a list of what we could tweak:

  • Camera FOV (to match real-world camera)
  • Spectator camera offset
  • Calibration object offset
  • Input camera offset
  • Base transform yaw
  • Base transform offset
  • Scene luminance modifier
  • Lighting modifier
  • Input camera clipping (left, right, top, bottom)
  • Chroma key
  • Linear key
  • Spill reduction
  • Camera delay (in milliseconds)
  • Camera select (if using multiple cameras)
  • Camera resolution and frame rate

Next steps

So where do we go from here? There are a lot of things we want to do to enhance mixed-reality for The Unspoken. In typical Insomniac fashion, the “want to do” list far exceeds the “actually can do” list. But here are a few things we’re looking to do in the near future:

  • Dynamic camera. From the start, we focused on mixed-reality for a static camera. Based on our experience, we’re glad we established that limitation upfront, but we can imagine how much cooler it would be to have a camera with some freedom of movement.
  • Multiple cameras. Our current setup tries to capture both the context of a PvP match and the personal drama of the player with a single camera. To take this a step further, we’d like to support multiple cameras: for example, one that focuses on the game and another that focuses on the player.
  • Simplified setup. Now that we’ve had to set up mixed-reality in a couple of locations, we have a good handle on what it takes. In the long run, however, we want it to become a consumer-facing feature so anyone can stream The Unspoken in mixed-reality. That means digging into the setup and simplifying it as much as possible.

For many of us, Rift was love at first sight. We see a future where VR is ubiquitous. While that might be a ways off, it’s incredibly exciting to see VR enter the mainstream collective consciousness.

Someday, we won’t have to answer the “why VR” question. It’ll be part of people’s lives. Until then, technologies like mixed-reality will go a long way toward bridging the gap.

Just a couple of weeks ago, a non-gamer friend of mine asked me how work was going. I pulled up one of the mixed-reality videos from E3 and showed it to her.

Her response: “cooooool, I get to cast spells!”

Mission accomplished.