Quality Capture and Compression in React VR

Oculus Developer Blog
Posted by Eric Cheng
November 17, 2017

In April 2017, we released React VR—a library that lets developers with an understanding of JavaScript build and ship compelling VR experiences with speed and ease. These interactive immersive experiences can reach audiences across web, mobile, and VR devices. Since then, we’ve worked with Sony Pictures, the National Gallery, oyster.com, and more to build engaging content that reaches audiences online, on the go, and in VR. Today, Oculus Immersive Media Lead Eric Cheng takes us behind the scenes for the making of the British Museum React VR project.

The British Museum React VR project was an opportunity to produce an interactive VR experience accessible both in and outside of a VR headset using high-resolution, high-dynamic-range, 360° imagery. Because we had incredible access to the museum, we wanted to ensure that we captured and processed the best assets possible. In addition to its use in the React VR experience, we wanted the captured imagery to be high enough quality to be used for archival purposes. Over roughly 16 hours and two late nights of shooting that conjured up scenes from Night at the Museum, we captured and processed 136 high-resolution 360° panoramas.

The British Museum’s Virtual Reality Tour, built in React VR.

Challenging Lighting Environment
The lighting in museums, especially at night and during the blue hour, creates a challenging environment for photography. Exhibits are usually lit with warm spotlights that differ in color temperature from both ambient museum lighting and light coming in from windows. Twilight is especially challenging because incoming light from windows is blue (hence, “blue hour”). This means that captured images feature multiple illuminant sources of differing color temperatures. In addition, the range of light levels under spotlights and in unlit areas of a museum is extreme—beyond the range of today’s camera sensors; high dynamic range is required to capture details in both highlights and shadows.

360° and Parallax
Shooting in 360° is challenging because every image needs to be stitched to produce a full 360° capture (whether in camera, or in post). Multi-lens 360° cameras like back-to-back consumer cameras and DIY rigs of cobbled-together mirrorless or SLR cameras all present challenges in stitching due to the need to correct for parallax. Each camera is in a slightly different position and sees the scene from a different perspective. Even careful stitching with a lot of manual tuning can’t completely solve the problem.

To solve the stitching problem, I decided to shoot using a single camera and lens rotating around its no-parallax point (entrance pupil). When a camera system rotates around its no-parallax point, there is no parallax in resulting captures, which means that stitching is usually perfect. Parallax is what makes capturing in 3D possible, but because our goal was to produce monoscopic final output, removing parallax from our captures improved the quality of stitching. The no-parallax point in a given camera and lens system is fairly easy to find, but requires some time and experimentation.

To solve the dynamic range issue, I shot in 3-shot, HDR (high dynamic range) bursts to capture range in light levels beyond what was possible from a single exposure. Luckily, the museum was closed and I was alone (with a chaperone); we hid behind priceless artifacts during the capture of each panorama, so there were no moving subjects to deal with.

Panorama Heads and Camera Selection
Because I decided to do no-parallax 360° capture, I needed to use a panoramic tripod head designed to allow rotation around a customizable point (the no-parallax point). Manual panorama heads like the Nodal Ninja are designed specifically for this application, and they’re perfectly suitable for full panoramic capture. However, for this project, I would be capturing over 100 panoramas, and I needed an automated solution. I decided to use a Roundshot VR Drive, which is a fully-robotic, automated panorama head. Luckily, the Roundshot supports my camera of choice, a Sony a7r II, chosen for its high-quality sensor and support for sharp wide-angle lenses. For the lens, I chose the Leica Tri-Elmar-M 16-18-21mm f/4 ASPH. Lens (shot at 16mm), which gave me excellent image quality at a wide field of view.

With 30% overlap per image, the combination of 16mm lens and 3-shot HDRs yielded 84 source images per 360° panorama and a final image of around 322 megapixels.

Image resolution calculations: a 16mm lens on a full-frame camera has a ~97° field of view across the long edge of the frame and a ~73.7° field of view across the short edge. The camera was shot in vertical (portrait) orientation, so the 5304 pixels across the short edge of the frame span that 73.7°. Final output is calculated to be ~71.9 pixels per degree, or about 26,000 x 13,000 pixels (322 megapixels). To increase resolution, a longer focal length lens could be used, which would increase the number of pixels per degree (e.g., 382 megapixels w/18mm lens, 490 megapixels w/21mm lens).
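The pixels-per-degree arithmetic above can be sketched in a few lines of Python. This is a rough estimate only: it assumes a 36 x 24 mm full-frame sensor, an ideal rectilinear lens, and uniform pixel density across the field of view, and the function names are made up for illustration. It reproduces the ~73.7° field of view, the ~71.9 pixels per degree, and output dimensions near 26,000 x 13,000, but its megapixel totals will not exactly match the figures quoted above.

```python
import math


def field_of_view_deg(sensor_mm: float, focal_mm: float) -> float:
    """Angular field of view of an ideal rectilinear lens across one sensor dimension."""
    return math.degrees(2 * math.atan(sensor_mm / (2 * focal_mm)))


def equirect_estimate(pixels_across: int, sensor_mm: float, focal_mm: float):
    """Estimate pixels per degree and the size of a full 360 x 180 degree output.

    Assumption: pixel density is treated as uniform across the frame,
    which is only an approximation for a wide-angle lens.
    """
    ppd = pixels_across / field_of_view_deg(sensor_mm, focal_mm)
    width = round(360 * ppd)
    height = round(180 * ppd)
    return ppd, width, height


if __name__ == "__main__":
    # 5304 px across the 24 mm (short) edge of the sensor, per the text above.
    for focal in (16, 18, 21):
        ppd, w, h = equirect_estimate(5304, 24.0, focal)
        print(f"{focal}mm: {ppd:.1f} px/deg, about {w} x {h} px")
```

Because the estimate depends only on pixels per degree, changing the overlap percentage in this sketch would have no effect on the result.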

It’s important to note that, for the purposes of output resolution in 360°, the only variable that matters is pixels per degree. When camera hardware is fixed, the number of source images captured is related only to the chosen percentage of overlap. A large overlap percentage yields more redundancy for stitching accuracy, but it doesn’t result in a higher-resolution output.

Using a Standalone 360° Camera
I don’t want to discourage anyone from using a standalone, integrated 360° camera for shoots like this. I chose to use a panorama head and high-resolution camera with a big sensor because of the unusual access we had at the museum. The opportunity was there to do extremely high-resolution capture in an empty museum (no moving subjects), so I chose quality over convenience.

A high-resolution stills camera like the Panono camera captures fantastic 360° images and even features an HDR mode (108-megapixel spec, but 128-megapixel stitched file output). The 2016 model of the Samsung Gear 360 is also a formidable stills camera, with a 30-megapixel output, but its small sensor makes it less desirable in low-light scenarios.

Shooting Challenges
One challenge was that the Sony a7r II’s buffer isn’t very big and flushes to SD card slowly. It’s designed for high-resolution, considered shooting, not for action/sports, and it can’t sustain a fast shooting speed for very many shots. I had to tune the shooting interval on the Roundshot by trial and error so that no frames would be dropped during my 84-frame capture. If I were doing the shoot again, I would probably choose the Sony a6500 over the a7r II. The a6500 has a huge buffer (~100 raw files), which would allow for full-speed automated shooting, and the reduced sensor resolution could be compensated for by using a slightly longer lens (and more source pictures per 360°).

Another challenge was leveling the pano head for each shot while moving a tripod around on an uneven surface. It was often difficult to see the bubble level on the tripod head, and I ended up with many shots that weren’t quite level. Furthermore, I wasn’t allowed to put a tripod down in some areas and had to place a moving blanket underneath the whole rig. This was time-consuming, and precise leveling was something I often skipped. Having a quick way to automatically level a pano head would have saved a lot of time in post. Accurate IMU data could possibly have been used to do this, except that I couldn’t assume that the floor and museum itself were level in relation to gravity!

A very sturdy tripod was necessary to keep the pano head and camera stable (I used a Really Right Stuff Series 4 Carbon Fiber Tripod). Exposures had to be balanced between speed (shutter speed) and noise (ISO), and I tried to keep the longest exposures under three seconds each. Any movement of the camera would have caused the final images to be blurry.

Finally, the shoot was physically taxing. For eight hours at a time, I repeatedly lifted and moved a heavy camera system whose grip point was at chest level (the camera was at head level), and then ran back and forth to various hiding areas to stay out of the shot. After the shoot, I immediately boarded a plane and flew from London to San Francisco. When I returned home, I was wrecked! Bringing an assistant to share the physical workload would have been helpful.

Shooting Nadir Plates
In each of the rooms we captured, I took assorted pictures of the floor in case we needed source material for nadir patching. We didn’t end up using them, but it’s good practice to shoot nadir plates when doing 360° capture just in case.

Post-Production Workflow
The goal for post-production was to take a large set of overlapping HDR source images and convert them into high-quality assets for display in a WebVR experience. Here’s the workflow we used:

  1. Stitch each group of images into a large 360° panorama
  2. Level the horizon
  3. Patch nadir and remove the tripod
  4. Color grade (for consistency, to adjust for multiple illuminant colors, and for applying the desired look)
  5. Save at high resolution
  6. Resize, convert, and save in target distribution format (in our case, cubemaps at 1536px per side)

Original stitch and processed final image.

Post-Production Workflow Specifics

1. Stitching
Each 360° panorama was stitched automatically from 84 source images (28 x 3-image HDR brackets) using GoPro Kolor's Autopano Giga. Out of the 136 panoramas, 13 exhibited serious artifacts during stitching and required manual tuning before they stitched properly. The artifacts were usually a result of source imagery containing a lot of repeating patterns, and removing some of the highly-redundant data (at high and low angles of capture) usually enabled a second-pass automatic stitch to be successful. Each stitched master was saved as a 16-bit TIFF (~322 megapixels, 2.5 GB per image).

2. Horizon Leveling
I was unable to carefully level the pano head at every location, so some of the resulting panos weren’t level. These images were leveled in post using PTGui Pro. We found that the automated “straighten panorama” feature wasn’t always precise enough, so we used vertical- and horizontal-line control points and the “level panorama” feature.

3. Tripod Removal
We did tripod removal and nadir patching using PTGui Pro’s “extract floor” and “insert floor” templates and Adobe Photoshop. These templates are very powerful and make nadir patching an easy part of the workflow. We could also have used nadir plates during patching, but they weren’t necessary.

4. Color Grading
Because we started each night of shooting during twilight, the light coming from outside the windows was blue and didn’t match the interior lighting. Inside the museum, ambient and exhibit lighting were also often mismatched, although all indoor lighting was warm. This created images with severe blue tinting at and around windows, as well as the potential for color gradients across each scene. Adobe Lightroom and Photoshop were used to match the look of all images in a particular room, along with removing the blue cast from areas around windows.

Using Adobe Photoshop to remove “blue hour” color cast from windows.

5. Saving / Export
Finally, new masters were saved at high resolution before being exported into lower-resolution versions suitable for large-scale distribution as an interactive 360° tour built with React VR. We chose to start at a 6000x3000px resolution for our equirectangular panos, as it would provide a good baseline for our optimization work and ensure we delivered a high-quality experience at a download size that was reasonable across all platforms: web, mobile, and, most importantly, VR.

The final version of the tour was composed of 29 panoramic images, which, at the baseline resolution, implied a download size of roughly 15 MB per image, or about 450 MB total. As you can imagine, this is quite large for an experience consumed via the web, and that’s without taking into consideration all the remaining assets in the experience.

The first step in our optimization process was converting the equirectangular images into cubemaps. To do so, we made use of React VR’s equirect2cubemap tool, which, given an equirectangular image and an edge size in pixels, outputs a cubemap as six separate images in PNG format, leaving image compression up to the user. For our conversion, we selected 1536px as the edge size for each cube face, i.e., six faces at 1536x1536px each, or 9216x1536px in total if all six faces are laid out as a single image. We chose this edge size with the target platforms in mind: 1536px is the recommended resolution for cubemaps in VR given existing displays and headsets. Among the benefits of switching to cubemaps, two stand out:

  1. Reducing overall pixel count per image, while increasing perceived quality
  2. Reducing memory used when loading the textures, from a ~72 MB baseline to ~55 MB per pano
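To make the equirectangular-to-cubemap conversion more concrete, here is a minimal Python sketch of the underlying mapping: for a pixel on a cube face, compute the viewing direction and then the matching position in the equirectangular source. It is not the code of the equirect2cubemap tool. The face names, axis conventions (x right, y up, z forward), and function names are assumptions chosen for illustration, and a real converter would also resample the source image (for example with bilinear filtering) at the returned position.

```python
import math


def cube_direction(face: str, col: float, row: float, edge: int):
    """Viewing direction for a pixel on one cube face.

    Assumed convention (illustrative only): x points right, y up, z forward;
    col/row are pixel indices with (0, 0) at the top-left of the face.
    """
    a = 2.0 * (col + 0.5) / edge - 1.0  # -1 at the left edge, +1 at the right
    b = 2.0 * (row + 0.5) / edge - 1.0  # -1 at the top edge, +1 at the bottom
    directions = {
        "front": (a, -b, 1.0),
        "back": (-a, -b, -1.0),
        "right": (1.0, -b, -a),
        "left": (-1.0, -b, a),
        "up": (a, 1.0, b),
        "down": (a, -1.0, -b),
    }
    return directions[face]


def equirect_coords(face: str, col: float, row: float, edge: int,
                    width: int, height: int):
    """Map a cube-face pixel to continuous (x, y) coordinates in the equirect image."""
    x, y, z = cube_direction(face, col, row, edge)
    lon = math.atan2(x, z)                 # -pi..pi, 0 at the center of the front face
    lat = math.atan2(y, math.hypot(x, z))  # -pi/2..pi/2, positive looking up
    src_x = (lon / (2.0 * math.pi) + 0.5) * width
    src_y = (0.5 - lat / math.pi) * height
    # Wrap horizontally and clamp vertically so the result stays inside the image.
    return src_x % width, min(max(src_y, 0.0), height - 1.0)
```

For example, `equirect_coords("front", 767.5, 767.5, 1536, 6000, 3000)` lands at the center of a 6000x3000 source, and pixels on the shared edge of two adjacent faces map to the same source position.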

Both these benefits were a step in the right direction, but we could still do better by compressing the resulting cubemaps as JPEGs. Going from a lossless compression format like PNG to the lossy compression offered by JPEG meant we had to be careful so as not to lose all the good work we had done during capture and post-processing. This is why we generated numerous JPEG versions of the original PNG cubemaps, ranging from 30% to 100% quality in 10% increments. This let us find the balance between quality and download size that we were after.
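A quality sweep like the one described above could be scripted roughly as follows. This is a sketch under assumptions, not the tooling we used: the helper names are hypothetical, the JPEG encoder relies on the third-party Pillow library, and Pillow's `quality` setting is assumed to be comparable to the percentages quoted here.

```python
import io
from typing import Callable, Dict, Iterable, List


def quality_levels(start: int = 30, stop: int = 100, step: int = 10) -> List[int]:
    """JPEG quality settings to test: 30, 40, ..., 100 by default."""
    return list(range(start, stop + 1, step))


def sweep(encode: Callable[[int], bytes], qualities: Iterable[int]) -> Dict[int, int]:
    """Encode once per quality level and record the resulting size in bytes."""
    return {quality: len(encode(quality)) for quality in qualities}


def pillow_jpeg_encoder(png_path: str) -> Callable[[int], bytes]:
    """Build an encoder for one PNG cube face (assumes Pillow is installed)."""
    from PIL import Image  # imported lazily so the rest of the sketch runs without Pillow

    image = Image.open(png_path).convert("RGB")  # JPEG has no alpha channel

    def encode(quality: int) -> bytes:
        buffer = io.BytesIO()
        image.save(buffer, format="JPEG", quality=quality)
        return buffer.getvalue()

    return encode
```

Calling `sweep(pillow_jpeg_encoder("front.png"), quality_levels())`, where `front.png` is a placeholder path to one cube face, returns a size per quality level that can be charted against visual quality.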

The first thing we did was to compare overall download size at each compression level. As seen on the chart above, there is little to gain in overall download size and much to lose in overall image quality below the 70% mark. For this reason, we ruled out any compression quality below that number. We were now left with four JPEG qualities to choose from: 70, 80, 90, and 100, which we used to deploy four different versions of the experience and test them across a variety of devices and headsets.

Due to the nature of the contents of our experience, we found no major quality degradation at the 80% level when compared to our baseline, but there was a big download size win. The important parts of the scene, such as galleries, statues, and sculptures, remain pretty much unchanged, even when explored in VR. The reduction in quality is noticeable on text or other places with sharp edges and along the edges of surfaces with high contrast. These artifacts become more evident at the 70% mark, which is why we decided against that route. That said, only the expert eye or a very detail-oriented person will be able to easily notice the artifacts resulting from the lossy compression at 80% quality and identify them in the experience, so we could get away with our compression of choice.

So, from the original 29 equirectangular panos at 6000x3000px, totaling a ~450 MB download, we arrived at 1536px edge-sized cubemaps, compressed as JPEGs at 80% quality, for a total download size of ~40 MB. This is more than a 10x improvement in download size while requiring less memory than the original equirects: big wins when trying to reach as large an audience as possible across multiple devices and form factors.
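The pixel, memory, and download figures in this section can be sanity-checked with simple arithmetic. The sketch below assumes uncompressed textures at 4 bytes per pixel (8-bit RGBA) and decimal megabytes; both are assumptions, and the function names are illustrative. Under them, the equirect baseline comes out at 72 MB and the cubemap lands within a couple of megabytes of the ~55 MB quoted above.

```python
BYTES_PER_PIXEL = 4  # assumption: uncompressed 8-bit RGBA textures


def equirect_pixels(width: int, height: int) -> int:
    """Pixel count of an equirectangular panorama."""
    return width * height


def cubemap_pixels(edge: int) -> int:
    """Pixel count of a cubemap: six square faces."""
    return 6 * edge * edge


def texture_megabytes(pixels: int, bytes_per_pixel: int = BYTES_PER_PIXEL) -> float:
    """Approximate uncompressed texture size in decimal megabytes."""
    return pixels * bytes_per_pixel / 1e6


if __name__ == "__main__":
    baseline = equirect_pixels(6000, 3000)
    cubemap = cubemap_pixels(1536)
    print(f"equirect: {baseline} px, {texture_megabytes(baseline):.1f} MB in memory")
    print(f"cubemap:  {cubemap} px, {texture_megabytes(cubemap):.1f} MB in memory")
    # Download sizes quoted in the text: ~450 MB before, ~40 MB after.
    print(f"download reduction: {450 / 40:.2f}x")
```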

The key takeaway here is to capture at as high a quality as possible. This gives you enough flexibility to start downsizing and reach the quality and size needs your particular use case calls for. The inverse process isn’t viable: capturing images that aren’t good enough and working hard to improve them won’t lead to the same results. Moreover, by capturing images at a very high resolution you can also future-proof your work and come back to it later as display technologies, devices, and bandwidths all continue to improve. As the years go by, you can rework these assets to the values that are relevant at the time.

You can experience the British Museum’s Virtual Reality Tour at vr.britishmuseum.org.