Today we are releasing the most epic cat video ever created. Anything you want in a cat video is happening here, somewhere. It’s two hours, in 360º three dimensional virtual reality, that anyone can experience on their phone or computer.
Check it out at thecatquarium.com — load it on your phone’s YouTube app and move the phone around you. (If it doesn’t work, try the troubleshooting tips at the end of this post.) It’s like you’re in a bowl full of cats.
Then share it!!! Tell your friends! We want this video to go far and wide!
Then come on back here to learn how we did it. (Note that this post gets technical at times, but if you skip over the technical stuff you’ll still get a lot of the gist of it.)
Virtual reality is a powerful tool to create empathy and tell new stories. It’s also a great way to transport people into a different reality, and one of the most powerful ways to experience a place is to sit and observe.
The video encourages quiet observation. Watch a cat or a human, experience them, and be in the room with them. See how they interact with their environment. Let the sound of the environment wash over you. You’re somewhere else.
Not everybody has access to a cat café. Many of us, myself included, have allergies. This café is accessible to all and hypoallergenic.
My next few VR projects will dive deeply into the core of the human experience. This one dives shallowly and broadly.
“Hey Dave,” I said. Dave is one of the founders of KitTea, the San Francisco cat café around the corner from my apartment.
“Can we make a virtual reality cat video in KitTea?”
So how did you do it?
First some numbers, because I love numbers: this video was shot with 12 GoPros, each filming for 2hr15min at 60fps, creating 1.2 terabytes (that’s 1,200 gigabytes) of source material. I then stitched the video with four computers, over 10 computer-weeks(!!), into a 360º video. At its biggest the video took up 7.5 terabytes, which is comparable to popular estimates of the text content of the entire Library of Congress. Cats take up a lot of pixels.
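As a sanity check, here’s what those numbers imply per camera (pure arithmetic on the figures above):

```python
# Back-of-the-envelope check on the source footage numbers:
# 12 GoPros, 2 h 15 min each at 60 fps, ~1.2 TB total.
cameras = 12
seconds = (2 * 60 + 15) * 60           # 2h15min in seconds
total_bytes = 1.2e12                   # 1.2 terabytes of source material

frames_per_camera = seconds * 60       # 60 fps
bytes_per_camera = total_bytes / cameras
bitrate_mbps = bytes_per_camera * 8 / seconds / 1e6

print(f"{frames_per_camera:,} frames per camera")
print(f"{bytes_per_camera / 1e9:.0f} GB per camera")
print(f"~{bitrate_mbps:.0f} Mbit/s per camera")
```

Almost half a million frames per camera — and the stitcher has to look at every one of them.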
A standard 360º capture requires a ring of cameras with a good deal of overlap between their images, which get stitched together in software. Here’s one example rig with 4-5 cameras, and here’s the Ricoh Theta, which makes do with just two lenses, each capturing more than 180º.
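The relationship between lens field of view, desired overlap, and camera count is simple to sketch (the FOV and overlap figures below are illustrative assumptions, not measurements of any particular rig):

```python
import math

# How many cameras does a 360º ring need, given each lens's horizontal
# field of view and the overlap you want between neighbors for stitching?
def cameras_needed(horizontal_fov_deg, overlap_deg):
    """Each camera contributes its FOV minus the overlap it shares."""
    usable = horizontal_fov_deg - overlap_deg
    return math.ceil(360 / usable)

print(cameras_needed(120, 30))   # wide action-cam lens -> 4 cameras
print(cameras_needed(190, 10))   # Theta-like >180º lens -> 2 cameras
```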
To capture 3D, you need to create not just one panorama but two: one for the left eye and one for the right. The two panoramas are separated by the standard inter-ocular distance, i.e. the average distance between a person’s eyes. Ideally, this means that whichever direction a person looks, we’ve got a simulated left-eye perspective and a simulated right-eye perspective.
(In the real world, it’s not this perfect. The farther away the lenses are from the center of the camera, the more parallax you get and the greater the stitching errors are. You can see a ton of stitching errors in our video, which look like wormholes the cats walk in and out of, spaced in regular vertical lines. For an excellent discussion of stereo in VR, check out Elevr | Stereo Polygons.)
So we borrowed a ton of GoPros and a rig (thanks, Forrest!). Danger set up the rig, which had 12 cameras: five stereo pairs plus a top and a bottom camera. We call the rig the Medusa (because it has cool hair and steals your soul):
Stitching errors are minimized when things stay farther from the camera. To understand why, imagine a curious cat coming right up to one of the lenses. One camera will see a wet nose a few inches away, while the next camera might see only a tail. This is an extreme version of parallax. Contrast that with a very shy cat 10 feet away, where both cameras get a very similar image. It wouldn’t be hard to stitch the second image into a single panorama that appears to come from one lens, but it’d be next to impossible to do the same for the first. (You can also see why stereo rigs have worse stitching errors: the inter-ocular distance requires them to be bigger.)
As much as I wanted the cats to come right up to the lenses and explore the camera, the weird glitches would be too distracting, so we put a 5′ circular piece of sticky paper around the camera rig.
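To put rough numbers on the parallax problem (the lens spacing here is an assumption for illustration, not the Medusa’s actual baseline):

```python
import math

# Two lenses a few inches apart see a nearby object from noticeably
# different angles, but a distant object from nearly the same angle.
def disparity_deg(baseline_m, distance_m):
    """Angle between the two lenses' lines of sight to the same point."""
    return math.degrees(2 * math.atan(baseline_m / (2 * distance_m)))

baseline = 0.10   # ~4 inches between adjacent lenses (assumed)
print(f"{disparity_deg(baseline, 0.08):.1f}º for a nose 3 inches away")
print(f"{disparity_deg(baseline, 3.0):.1f}º for a shy cat 10 feet away")
```

A couple of degrees is something the stitcher can blend away; tens of degrees is a wormhole.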
Filming the video
The #CatQuarium is intentionally banal. There’s not much going on, and that’s how life is sometimes.
Most people will watch the few parts we’ve highlighted and move on, and a few very eager folks will become truly lost in another world for the full two hours. Just remember that the Google Cardboards have “FOR TEMPORARY USE” printed on them!
I gave participants in the video a few instructions:
- The video is about the cats. We are props.
- Don’t come too close to the camera.
- Try to get the cats to do interesting things.
- There are no outtakes. We’re not editing things out. What happens is what will be in the video.
The extended length of the video gave us the ability to play out a few super-long narratives and hide some cool things in there. You’ll have to watch through to see what I’m talking about.
We planned two peak periods of activity, one at xxxx which would settle into a lull and then build into another activity peak at xxxx.
The GoPros were all plugged into a USB hub. We checked all their settings, formatted their 64gb cards, then hit record on each GoPro and fed the cats. The video had begun.
Initially not much happened, and it took a few minutes for us to figure out what we were supposed to be doing in there and how to operate as props for cats. After a while, the cats warmed up to us, and soon, they were highly energized and engaged, playing joyfully with the new toys we brought them.
Synchronizing footage as closely as possible is crucial for stitching. Imagine a flash going off in the room while one camera is slightly ahead of another: the flash would start in one place and then move to other parts of the room. Or imagine a cat jumping from one table to another. When frames aren’t perfectly synced, you’re stitching the wrong frames together.
When a camera records at 30 frames per second (1/30sec per frame), it doesn’t usually expose for the entire 1/30; most cameras only capture about half of it (this is called the shutter angle, a term left over from rotary film shutters). For this reason, we recorded at 60fps, which let us get a closer sync. I then exported each of the source videos at 30fps.
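The arithmetic behind that choice: cameras started by hand and then aligned on whole frame boundaries can still be off by up to half a frame period, so doubling the frame rate halves the worst-case sync error.

```python
# Worst-case residual offset after aligning footage on whole frames:
# half of one frame period.
def worst_case_sync_ms(fps):
    frame_period_ms = 1000 / fps
    return frame_period_ms / 2

print(f"{worst_case_sync_ms(30):.1f} ms at 30 fps")   # 16.7 ms
print(f"{worst_case_sync_ms(60):.1f} ms at 60 fps")   # 8.3 ms
```

Shooting at 60fps and exporting at 30fps keeps the tighter alignment while halving the data the stitcher has to chew through.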
We clapped and shook the cameras at the beginning, but because GoPro audio is inconsistently synced to the video, I manually looked through the video at multiple points to see when, for example, a paw touched the floor or an eye opened or shut.
The magic sauce of 360º video is stitching! I used Kolor AutoPano Video Pro, which took each frame of each source video and stitched them together into composite images mapped onto an equirectangular projection (a lot like a Mercator map projection). An equirectangular image represents a sphere in a rectangle by mapping 180º of vertical field of view onto the image’s height and 360º of horizontal field of view onto its width.
xx source images -> equirectangular
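Kolor’s software handles this mapping internally, but the projection itself is simple. Here’s a minimal sketch of how a 3D view direction maps to a pixel in an equirectangular image (the coordinate conventions below are my own assumption, not AutoPano’s):

```python
import math

# Equirectangular mapping: longitude maps linearly to the x axis and
# latitude linearly to the y axis, covering 360º by 180º.
def direction_to_pixel(x, y, z, width, height):
    """Map a 3D view direction to (column, row) in the panorama."""
    lon = math.atan2(x, z)                  # -pi..pi around the viewer
    lat = math.atan2(y, math.hypot(x, z))   # -pi/2..pi/2 up/down
    col = (lon / (2 * math.pi) + 0.5) * (width - 1)
    row = (0.5 - lat / math.pi) * (height - 1)
    return col, row

# Looking straight ahead lands in the center of the frame:
print(direction_to_pixel(0, 0, 1, 5870, 5870))
```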
Oh boy, stitching. It took only a few minutes to set up the parameters for the stitch in Kolor’s AutoPano Video Pro, and then I just had to click “Render” and come back the next morning and—
Haha. Hahahahaha. If only. The setup was extremely easy, but the stitching ended up taking 10 computer-weeks! That means that I set up four computers, each with their own copy of the source material and project file, and they took 2.5 weeks to spit out the whole movie.
xx image of the computers
Why so slow? First, warping a bunch of giant images takes a lot of computations and a lot of time. Second, I wanted an extremely high resolution image, and two separate images for left and right eyes. The output was 5870×5870 pixels, or nearly 4x as many pixels as a 4k image (which is movie theater resolution).
Why so big? Well, it sounds extravagant, but consider that when YouTube shows this video, it displays about 100º of horizontal field of view — that’s less than 1/3 of the horizontal resolution. To get a full HD image at this narrow field of view, you want about 6000 pixels on a side.
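Here’s that resolution argument as arithmetic:

```python
# If the player shows ~100º of the 360º panorama and you want full HD
# (1920 px) across that window, scale up proportionally.
viewport_fov_deg = 100
viewport_width_px = 1920

panorama_width = viewport_width_px * 360 / viewport_fov_deg
print(f"~{panorama_width:.0f} px across the full panorama")
```

That works out to roughly 7,000 pixels, which is why the 5870-pixel output is already a compromise.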
(An aside: it’s worth noting that we have a long way to go in terms of storing 360º 3D information. Equirectangular images are easy to work with in existing editors, but they don’t store information very efficiently — the entire top row represents only a single point! And 3D images store more or less the same information twice.)
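A quick way to quantify that inefficiency: every pixel row spans the full image width, but the circle of latitude it represents shrinks by cos(latitude), down to a single point at the poles. Averaging over all rows gives a rough “useful pixel” fraction:

```python
import math

# Approximate the fraction of equirectangular pixels carrying unique
# information by averaging cos(latitude) over evenly spaced rows.
rows = 10000
useful = sum(
    math.cos(math.pi * (0.5 - (i + 0.5) / rows)) for i in range(rows)
) / rows
print(f"~{useful:.0%} of pixels are 'useful'")
```

The answer converges to 2/π, about 64% — roughly a third of the storage goes to oversampling the poles.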
I came into the office at least twice a day to check on the computers; it was a lot like feeding pets or watering plants. One was faster and would always be done with the render assignment I gave it. One had very little disk space, so I connected it by ethernet to another machine with plenty of storage and continuously moved frames off of it. When they all got low on disk space, I paused rendering and converted still PNG images to ProRes video files to free up space.
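The frame-sweeping chore could be scripted. Here’s a sketch of how one might automate it (the directory layout and `.png` pattern are hypothetical, not what I actually ran):

```python
import shutil
from pathlib import Path

# Sweep finished frames off a render machine's small disk onto a bigger
# one, leaving the newest few alone since they may still be mid-write.
def sweep_frames(render_dir, archive_dir, keep_newest=2):
    """Move all but the newest few frames; return how many were moved."""
    frames = sorted(Path(render_dir).glob("*.png"))
    cutoff = max(0, len(frames) - keep_newest)
    moved = 0
    for frame in frames[:cutoff]:
        shutil.move(str(frame), str(Path(archive_dir) / frame.name))
        moved += 1
    return moved
```

Run from cron every few minutes, this would have saved a few trips to the office.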
At last we were ready to output the video. I lined up all the little pieces that had been created on a timeline, added music and some surprises, and exported it, which took 60 hours to output and encode. And bam — now it’s on YouTube!
Final file sizes:
60fps h.264 GoPro footage: xx
30fps ProRes GoPro footage: xx
Stitched video PNG stills: xx
Stitched video ProRes video: xx
Encoded h.264 video: xx
Encoded h.264 video w/ 3D metadata: xx
Project files: xx
It’s not working?
The best way to watch the video is with the YouTube cell phone player. You should be able to wave your phone around you and see all parts of the ‘quarium. (BE SURE TO CHANGE THE QUALITY SETTING TO AT LEAST xx!)
If it doesn’t work, try the following steps:
- Update your YouTube app. On an iPhone, do this by going into the App Store, searching for YouTube, and clicking Update. On an Android, do the same from Google Play.
- Be sure that you’re choosing to open the video in the YouTube app, not in Safari or Chrome.
- If it’s still not 360º, open the YouTube app directly and search for Catquarium, then play the video from there.
The desktop also offers a good viewing experience, and you can move around the video by clicking and dragging.
The goal of this project was to see how epic of a video we could create on nearly zero budget. If I were doing this with a budget and on a professional level, a few things would be different:
- I’d use professional cameras. GoPros are fun, small, and easy to borrow in quantity, but a professional camera offers a degree of control that they don’t.
- I’d budget a lot of time for reshoots and visual effects to paint over the stitching errors. Where we’re at now, a lot of this still has to be done by hand.
- I wouldn’t do both 3D and 360º. 3D is cool, but it requires separating the cameras more than is otherwise necessary, giving a worse stitch and requiring more after-the-fact editing.
- I’d create an interactive viewing party. Throwing a viewing party and providing VR viewers (like Cardboards) is an awesome way to get VR out there.
- And of course, I’d build 1-3 blazing-fast custom computers with massive amounts of storage. Serious pixel-crunching power is needed.
In the future
360º/VR video is in its infancy, and we’re going to see it develop a lot in the coming years. Some thoughts:
- I don’t believe that a full 360º is necessary for most narrative experiences. This is what I keep coming away with, time and time again. The ability to look behind you *without* the ability to walk around a space is a particular and weird kind of experience; it lends itself to certain things (like the #Catquarium) but not usually to storytelling. 3D and eye-tracking, on the other hand, are key components of virtual reality and of feeling “there.” My ideal canvas at this point is something like a 210º field of view, in 3D.
I’ve played with the Glyph and am particularly excited about it — it’s high resolution, 3D, and can come with you wherever you are. There’s no reason to need to provide a full 360 wraparound on a device like that.
- The full capture of a three-dimensional space, so that you can walk through it, is a serious milestone to look forward to. You’d set up multiple 360º or lightfield arrays and create an explorable 3D model of the video. That would let you turn an immersive-theater experience like Sleep No More, or a party, into a video game / movie. I put this milestone about 5-6 years out, given current technology and where the industry looks headed.
- We need better VR players. The apps out on Android are terrible, and YouTube doesn’t allow any kind of interactivity. I like the Elevr player. We need more and better players, and it’s an easy pain point to work on.
- 360º video is here. You can experience it now, and it will grow rapidly over the next few years. VR headsets, on the other hand, will move slowly but surely. Particularly exciting is Samsung’s offer of free Gear VRs.
- That being said, I’m excited about open VR platforms like the Cardboard and Oculus, which can be used with any device in any way. The Samsung Gear is a great product, but it does its best to lock you in. Vendors see VR as a way to hook us on another app store, and this could limit the development of the technologies in fundamental ways.
- With VR, simple is better and will be for a while. A few things in this space work really well, a few things work terribly. Know which are which.