Saturday, March 28, 2009

Motion capture report from GDC09, San Francisco

I attended the Game Developers Conference (GDC'09) in San Francisco on Friday, March 27 so that you didn't have to. So if you missed GDC this year, this is your lucky day. I don't develop games and have no plans to, but GDC does have some motion-capture vendors on the exhibit ("expo") floor, plus a few other exhibitors who sell software/hardware toys that are potentially useful for hobbyist animation.

Since my interest is strictly in mocap systems and in products that make hobbyist animation faster, this is a highly-focused writeup that doesn't touch on the vast majority of vendor content from the expo floor. Many of the exhibitors were game middleware vendors, testing/QA service companies, national agencies trying to convince game developers to relocate to Canada / Germany / Scandinavia / Singapore, or schools that offer programs in computer animation. However, there were also three full-body mocap vendors who attended with full demos, so I paid my $75 student entry fee (I'm a card-carrying very-part-time student) and hit the expo floor.

Here I have a heavy focus on NaturalPoint, because they're realistically the only mocap company with a product in the hobbyist range -- and even that's pushing it, depending on what level of disposable income you have in mind when you say "hobbyist". I really should have asked the reps at Xsens and PhaseSpace what their system prices are, but I didn't, and neither of those companies is as forthcoming in print about its pricepoints as NaturalPoint is. When I've researched their prices via web searches, I'm fairly sure they've been significantly higher than the $5K-$10K pricepoint of a NaturalPoint system.

Interestingly, the 3 mocap vendors at the expo all use different technologies (optical passive marker, LED active marker, and inertial). All of them had live demos running with an actor or actress wearing the appropriate suit, and the data streaming to one or more live video monitors with a reconstructed virtual actor on-screen.


There's actually quite a bit of incremental news from NaturalPoint, though apparently they haven't published much of it on their web site yet.

The biggest news is that the new cameras are coming. This is all in the printed glossy materials they were handing out, so it's presumably all public at this point. NaturalPoint is about to roll out the "V100:R2" camera, which is the second generation camera that will replace their current "FLEX:V100" series. The R2 looks very similar to the R1, however it's got twice the FPGA programming space internally, so NaturalPoint will have more flexibility with what they can do with the camera's internal software over time.

The R2 camera will also support interchangeable lenses, which turns out to be a huge win. I don't believe that the original first-gen camera allows you to change the lens, though NaturalPoint will presumably clarify that as they get more documentation online. The first-gen camera has a 45-degree field of view, but the R2 camera will support a 60-degree field-of-view lens if you choose to install that lens type. The 60-degree lenses allow you to increase your capture area without increasing the size of the room where you're capturing. There is presumably some cost in terms of accuracy, since the cameras are still only 640x480 devices, however the brochure makes the win really clear:

-- With the original cameras, to capture a 10'x10' square space, you pretty much needed a 20'x20' room. (At least, this is what the diagram in the literature suggests). The NaturalPoint material estimates that about 20% of the volume of the 20'x20' room is usable as actual capture space in this original configuration.

-- With the R2 cameras and 60-degree lenses, to capture a 10'x10' square space you only need a 16'x16' room. NaturalPoint estimates that in this mode, about 34% of the volume of your room is usable as capture space. So the usable share of your room goes up by about 70% in relative terms (from 20% of the room to 34% of the room) just by swapping out the lenses.
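
The room-size numbers above are easy to sanity-check. The sketch below assumes the brochure figures exactly as quoted (20'x20' versus 16'x16' rooms, 20% versus 34% usable volume); the floor-share numbers are a simple back-of-envelope calculation, and the function and variable names are made up for illustration.

```python
def relative_increase(old, new):
    """Fractional change going from old to new (0.5 means +50%)."""
    return (new - old) / old

CAPTURE_SIDE_FT = 10    # target capture square, from the brochure diagram
OLD_ROOM_SIDE_FT = 20   # room needed with the original 45-degree lenses
NEW_ROOM_SIDE_FT = 16   # room needed with R2 cameras and 60-degree lenses

# Share of the floor that ends up inside the capture square.
old_floor_share = CAPTURE_SIDE_FT**2 / OLD_ROOM_SIDE_FT**2
new_floor_share = CAPTURE_SIDE_FT**2 / NEW_ROOM_SIDE_FT**2

# How much less floor the smaller room takes up.
room_floor_saving = 1 - NEW_ROOM_SIDE_FT**2 / OLD_ROOM_SIDE_FT**2

# Relative gain in usable volume, using the brochure's 20% and 34% estimates.
usable_volume_gain = relative_increase(0.20, 0.34)

print(f"floor share inside capture square: {old_floor_share:.0%} -> {new_floor_share:.0%}")
print(f"floor area saved by the smaller room: {room_floor_saving:.0%}")
print(f"relative gain in usable share of the room: {usable_volume_gain:.0%}")
```

In other words, the same 10'x10' capture square fits in a room with 36% less floor area, and the usable share of the room rises by roughly 70% relative to the old setup.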

So we can mark NaturalPoint down as probably one of the only vendors which is improving its engineering to DECREASE the amount of capture workspace that you need. This clearly is reflective of their customer base, which presumably has a heavy component in the "garage or living room hobbyist" demographic like me. Since I don't have a 20'x20' open room in my home (not even my garage will do that for me), the R2 camera now opens up the serious possibility for me of getting close to the full theoretical 10'x10' square capture space at home.

The new cameras will cost slightly more than the old -- it looks like it will be $600 rather than $550 -- however the base price for a 6-camera setup will apparently still stay at $5000, so the actual cost for a new 8-camera or 12-camera setup won't go up by much.
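
To put a rough number on "won't go up by much", here is a small sketch. It rests on one big ASSUMPTION that NaturalPoint did not confirm: that cameras beyond the 6-camera bundle are simply added at the per-camera list price. The function name is hypothetical, and real quotes may bundle hubs, cables, or software differently.

```python
BASE_BUNDLE_PRICE = 5000    # 6-camera setup, per the show-floor figures
BASE_BUNDLE_CAMERAS = 6
OLD_CAMERA_PRICE = 550      # current FLEX:V100
NEW_CAMERA_PRICE = 600      # upcoming V100:R2

def system_cost(n_cameras, camera_price):
    """ASSUMPTION: extra cameras are added to the bundle at list price."""
    extra = max(0, n_cameras - BASE_BUNDLE_CAMERAS)
    return BASE_BUNDLE_PRICE + extra * camera_price

for n in (8, 12):
    old = system_cost(n, OLD_CAMERA_PRICE)
    new = system_cost(n, NEW_CAMERA_PRICE)
    print(f"{n} cameras: ${old} -> ${new} (+${new - old})")
```

Under that assumption the difference is $100 for an 8-camera rig and $300 for a 12-camera rig.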

The second piece of news is the 3dsMax streaming plug-in, which NaturalPoint has mentioned in their online forums. They also have a sample video up on the web site that shows it in use. The plug-in works much like their MotionBuilder and Daz Studio plug-ins: you still need to run their capture software "Arena" on one PC, but now you can also fire up the Max plug-in and stream data live directly into 3dsMax, where you can target it directly onto a Biped object. Since I very recently decided to dump Maya for Max, this is also positive news for me.

The Max plug-in was running at the GDC'09 booth, so I can confirm that it was receiving data from the live mocap actor and sending it straight onto a Max biped.

It's not clear if this is officially announced or what the timetable is, however NaturalPoint suggested that they're working out a way to carry the camera "sync" signal over the USB cabling, which would mean that you no longer need to run two sets of cables to each camera; instead you'd just run the USB cable. So that will be a win for cable control and setup time once that feature is available. Right now the glossy literature still shows sync cables, so those must still be a necessity. I noticed from their web site that they've also got a new "second-generation calibration wand" which is already part of the calibration kits -- this new wand is extensible up to about 5' long, so you don't have to wave a little 18" pixie stick thing frantically in front of you to calibrate the space anymore.

Now let's do a dive into some of the geek details that you might care about if you're thinking of buying one of these systems.

-- The GDC09 NaturalPoint demo booth used a 16-camera truss-mounted setup in a total area of about 15'x20', with a performance stage and capture area of about 9'x9' (probably 10'x10') in the center. This setup allowed easy single-actor capture and allowed them to attempt dual-actor capture in the afternoon -- more on that below.

-- One thing I noticed about their setup is that one of the lower corner cameras was NOT mounted directly below the upper; rather, it was mounted an additional 18" or so backwards, which presumably helped the capture area a tiny bit. I confirmed with the sales rep that it is NOT necessary to have your lower cameras mounted directly below your upper, even though the online diagrams suggest that this is how you should do it. You have some flexibility in the horizontal positioning of the cameras. This is helpful information for somebody like me with weird pipes in my garage which will occlude upper cameras but not lower, for some corners of the garage. It sounds like I can mount each individual camera as far back in the corner of the garage as possible, as long as there's no occlusion.

-- Arena (the NaturalPoint software) was running on one dual-core laptop and I'm 98% sure that the Max plugin was running on a separate laptop.

-- In the afternoon I stopped by again and they were attempting to demo a 2-actor capture, however the system was having problems tracking one of the actors -- limbs were often going out of joint or the figure would turn sideways or such. The explanation given is that for 2-actor capture you really need a quad-core CPU, and that they were overloading their dual-core with all the cameras and the markers to solve. This was the behavior I observed as well -- it looked like the displayed point clouds were sluggish to update on the large Arena software display screen. So the rule of thumb seems to be: use a PC with at least 2 CPU cores per actor. If you ever wondered why in the world anybody would need a quad-core laptop, the answer isn't "so that you can run Microsoft Word REALLY REALLY FAST", it's "because I want to do 16-camera two-actor motion capture." You knew Dell and IBM were selling those expensive laptops for a reason, right?

-- I asked sales if it was possible to run both the Max plug-in and Arena on the same PC, and got a nervous response that suggested that CPU matters a lot in this situation. The recommended config was Arena on one PC, Max on another, then you stream the data across your local network from Arena to Max. This is probably how I'd set it up anyway, so that's not a problem in my case.

-- They also were slightly nervous when I told them that my living room is only 13'x14' -- the response was that yes, I'd get a capture volume, but it wouldn't be anything close to 10'x10'. They didn't mention the upcoming rev2 cameras with the new lenses at the time -- that was a later conversation. It still sounds like my 20'x15' garage is going to be a better choice -- in fact I had something of an extended "garage versus living room" discussion with them, the outcome of which was that I vowed to re-measure my garage size when I got home.

-- High ceilings matter. The preferred ceiling height for camera mounting seems to be about 10', with current OptiTrack online videos suggesting that your high cameras need to be AT LEAST 8' high. Unfortunately, many houses including mine have 8' ceilings. It's probably not a showstopper, however it will limit my ability to have an actor stand on a footstool or jump in the air or such.

-- The full Arena system is locked down only to the dongle, and is NOT locked to any particular CPU or machine. This means that unlike Autodesk products or similar node-locked software, you can choose to sell your entire Arena system to somebody else if you decide that mocap isn't for you anymore. NaturalPoint confirmed that this was fine, they have no issues with people reselling their systems used. To me this seems like a fairly significant selling point -- if you put down $8K for a NaturalPoint system and then you decide in a year that you really don't want it, you can probably recover a good portion of your original cost if you find a good buyer.

-- The Max plugin (I believe it's still in beta) apparently doesn't presently support anything other than a single Biped in the Max scene, i.e. it sounds like you can't yet add any geometry to the scene so that your mocap actor can touch virtual walls, lean against a virtual table, etc. They suggested that they hope to eliminate that limitation in future revs of the plug-in.
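
For anyone wondering what "stream the data across your local network" from the capture PC to the animation PC might look like in principle, here is a minimal sketch. To be clear, this is NOT NaturalPoint's actual streaming protocol: the packet layout, the function names, and the choice of UDP are all assumptions made up for illustration of the two-PC idea.

```python
import socket
import struct

# Hypothetical wire format (an assumption, not any vendor's real format):
HEADER = struct.Struct("<IH")    # frame number (uint32), marker count (uint16)
MARKER = struct.Struct("<Hfff")  # marker id (uint16), then x, y, z as float32

def pack_frame(frame_no, markers):
    """Serialize one frame; markers is a list of (marker_id, x, y, z) tuples."""
    payload = HEADER.pack(frame_no, len(markers))
    for marker in markers:
        payload += MARKER.pack(*marker)
    return payload

def unpack_frame(payload):
    """Inverse of pack_frame: returns (frame_no, list of marker tuples)."""
    frame_no, count = HEADER.unpack_from(payload, 0)
    markers = [
        MARKER.unpack_from(payload, HEADER.size + i * MARKER.size)
        for i in range(count)
    ]
    return frame_no, markers

def send_frame(sock, address, frame_no, markers):
    """Capture-PC side: send one frame to the animation PC over UDP."""
    sock.sendto(pack_frame(frame_no, markers), address)

def receive_frame(sock, max_bytes=65535):
    """Animation-PC side: block until one frame arrives, then decode it."""
    payload, _sender = sock.recvfrom(max_bytes)
    return unpack_frame(payload)
```

On the capture PC you would create a `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` and call `send_frame` once per captured frame; on the animation PC you would bind a socket of the same type to a port and loop on `receive_frame`. Splitting the work this way keeps the marker solving and the 3D application from competing for the same CPU, which seems to be the point of the recommended config.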


PhaseSpace uses an active-LED marker technology wired into the usual mocap suit, so when you're tricked up in one of these, you look like a glowing red Christmas tree. They list a camera resolution of 3600x3600 and a capture rate of 480 Hz, which is one reason why I'm fairly certain that this is an expensive system. (By comparison: NaturalPoint uses 640x480 cameras at 100 Hz.) PhaseSpace also sends the cameras into a proprietary capture box, which is actually a dual-core PC running their software. So every PhaseSpace mocap system ships with its own custom PC.

The win with the active markers is that your software should never accidentally swap markers around, which means less cleanup of your data after the capture.

PhaseSpace is the one vendor who offers an optional glove for the mocap suit that can capture finger position. The other vendors at GDC'09 don't have a way to track individual fingers -- they can only track the position of the full hand, and then you have to animate the fingers on your own. If you add the glove to the PhaseSpace system, you get extra active markers on the fingers so that the system should be able to know how the fingers are bending.

The recommended usage pipeline is to send the data into MotionBuilder -- in fact it sounds as if the PhaseSpace system is simply collecting a set of (x,y,z) datapoints for you on the proprietary PC that ships with the system, and then you stream those datapoints into MotionBuilder and use MotionBuilder to do the bone solve. This approach appears to be somewhat less than what NaturalPoint does with its software, since with NaturalPoint you can stream directly into Max or Daz Studio, and I guarantee that Daz Studio doesn't have a built-in point cloud solver that determines how to position skeleton bones using a set of tagged (x,y,z) coordinates.
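
To give a feel for what a "bone solve" has to do with a set of tagged (x,y,z) points, here is a deliberately tiny sketch: recovering a single joint angle from three labeled markers. This is not how MotionBuilder or Arena actually solve a skeleton (real solvers fit a whole rig to many markers at once); the marker names, coordinates, and function name are invented for illustration.

```python
import math

def joint_angle_deg(parent, joint, child):
    """Angle at `joint` between the segments joint->parent and joint->child.

    Assumes the three marker positions are distinct, so neither segment
    has zero length.
    """
    a = [p - j for p, j in zip(parent, joint)]
    b = [c - j for c, j in zip(child, joint)]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Clamp to guard against rounding pushing the cosine outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_angle))

# One frame of hypothetical tagged marker positions, in meters.
frame = {
    "shoulder": (0.0, 1.4, 0.0),
    "elbow": (0.3, 1.4, 0.0),
    "wrist": (0.3, 1.1, 0.0),
}
elbow_bend = joint_angle_deg(frame["shoulder"], frame["elbow"], frame["wrist"])
print(f"elbow angle: {elbow_bend:.1f} degrees")
```

With the forearm hanging straight down from a horizontal upper arm, this gives a 90-degree elbow. The important part is the word "tagged": the calculation only works if the software knows which point is the elbow and which is the wrist.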

I asked if it would be possible to use just the PhaseSpace glove without the suit, and the answer was one of those "in theory, yes" answers -- in theory yes, if I write my own intermediary software to take their datastream and send it into my preferred 3D application onto a hand rig. However they don't sell a plug-and-play hand-capture-only solution -- the glove is really meant to be used with the full suit.

Price speculation: Inition lists an 8-camera PhaseSpace system with control/capture computer and accessories at 33,590 British pounds, so it looks like we're talking easily in the US$50,000 range for a PhaseSpace system. Inition does list a lower-end 4-camera system for a mere 21,000 pounds.


Xsens sells the "MVN" (formerly "Moven") inertial motion capture system, which is derived from their earlier work on biomechanics and medical motion capture. The technology here is very different from the other two systems, and has some advantages and disadvantages. An "inertial sensor" is an accelerometer, the same type of sensor built into the Wii controllers. It measures acceleration, which you can then integrate once to get a change in velocity and a second time to get positional CHANGE, however there's nothing in the sensor that can give you absolute (xyz) position in space.
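
Here is a minimal sketch of that integration idea, using plain trapezoidal integration on a one-axis signal. It assumes an idealized sensor (no noise, no gravity, no rotation) and an actor who starts at rest, none of which holds for a real suit, and the function name is made up. Note what comes out: a displacement relative to wherever you started, never an absolute position.

```python
def integrate_twice(accel_samples, dt):
    """Acceleration -> velocity change -> displacement, by trapezoidal rule.

    ASSUMPTION: the sensor starts at rest. The result is displacement
    relative to the unknown starting position, not an absolute position.
    """
    velocity = 0.0
    displacement = 0.0
    prev_accel = accel_samples[0]
    for accel in accel_samples[1:]:
        new_velocity = velocity + 0.5 * (prev_accel + accel) * dt
        displacement += 0.5 * (velocity + new_velocity) * dt
        velocity = new_velocity
        prev_accel = accel
    return velocity, displacement

# One second of constant 2 m/s^2 acceleration, sampled at 100 Hz.
samples = [2.0] * 101
velocity, displacement = integrate_twice(samples, dt=0.01)
print(f"velocity change: {velocity:.3f} m/s, displacement: {displacement:.3f} m")
```

For this test signal the result matches the textbook answer: 2 m/s of velocity and 1 m of travel after one second.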

Their demo setup featured a one-actress system streaming live into MotionBuilder.

Some differences between the Xsens/MVN system and the other two systems shown at GDC09:

-- No markers. The system uses strictly inertial sensors, so you don't have to stick markers onto the suit. The suit does, however, have small-but-obvious inertial sensors embedded in it -- these appear as rectangular lumps on the suit.

-- No cameras. Since you have no markers, you have no cameras to capture the markers.

-- Huge capture space. Since you don't have to stay within a "capture area" bounded by cameras, your actor can go wandering all over the place and the system will still capture it. You're only limited by the range of the WIFI transmitter built into the suit, and that can go up to 100' or so. The sales rep showed this by having the actress wander around outside of the booth space, and the system captured the motion just fine.

-- Position drift. Since the system uses only inertial sensors, it has no way to confirm the absolute xyz position of the hips. The sales rep mentioned that although the mocap actress could walk all the way down the expo floor and back, and the system would capture all of that motion, there would probably be a drift of at least a few inches in position by the time she got back. Whether this makes a difference for an animation pipeline probably depends on the pipeline and the need.
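
The drift issue is easy to see with a back-of-envelope formula: a constant error b in the measured acceleration, integrated twice with no correction, produces a position error of 0.5 * b * t^2. The sketch below uses a made-up bias value purely for illustration; it is NOT an Xsens spec, and the function name is hypothetical.

```python
METERS_PER_INCH = 0.0254
ILLUSTRATIVE_BIAS = 0.001   # m/s^2, a made-up value, NOT a vendor spec

def drift_from_bias(bias_m_s2, seconds):
    """Position error from a constant accelerometer bias, integrated
    twice with no correction of any kind."""
    return 0.5 * bias_m_s2 * seconds ** 2

for seconds in (1, 10, 60):
    meters = drift_from_bias(ILLUSTRATIVE_BIAS, seconds)
    print(f"after {seconds:>2} s: {meters:.4f} m ({meters / METERS_PER_INCH:.1f} in)")
```

Because the error grows with the square of time, even this tiny bias reaches about 1.8 m after a minute of raw integration. That is far worse than the few inches the sales rep quoted for a walk down the expo floor, so the software is presumably doing a lot of correction on top of the raw sensor data.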

The Xsens system is biped-only, and streams the data into a custom application that takes the accelerometer data, applies specific knowledge of human biomechanics, and uses it to produce the human motion on the screen. In other words, there's a heavy amount of "computer algorithmic assist" going on -- the software isn't actually tracking the position of every joint, rather it gets information about the motions of SOME joints, and it uses its knowledge of how the human skeleton is wired together, and how human beings move, to figure out how the skeleton should move.

The fact that the system doesn't know the xyz position of the hips leads to some unexpected behavior when your actor climbs stairs or a stepladder. Since the system can't track the hips height, it has to make an assumption, which is: "the ground is always at the same level, and the hips will always settle to the same level after a stride." It means that if your character climbs a ladder, the Xsens system will read each step as if your character simply took a large step up, and then settled back down to the floor. You have to manually go into the software after the capture session and tell it "no, the floor moved to a higher level at this point" on the relevant footstep keyframes as your character climbs up or down. I'm a bit surprised that Xsens hasn't added a hips-height capture mechanism to their suit to solve this problem, however that would then require constraining the mocap actor to an enclosed space near to the height sensor, much as an optical system requires you to stay near the cameras. (Didn't Animazoo solve this problem with an inertial gyroscope or something, in the Gypsy system?)
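
As a rough illustration of that manual fix-up, here is a sketch of applying floor-level changes at chosen keyframes to a hip-height track that was solved under the flat-floor assumption. The data layout and function name are entirely hypothetical; this is not the Xsens software's file format or API, just the general idea of "the floor moved at this frame".

```python
def apply_floor_levels(hip_heights, floor_changes):
    """Add the floor level in effect at each frame to the solved hip height.

    hip_heights: per-frame hip height, solved as if the floor were always at 0.
    floor_changes: {frame_index: new_floor_height}, the manual corrections.
    """
    corrected = []
    floor = 0.0
    for frame, height in enumerate(hip_heights):
        # A correction stays in effect until the next one replaces it.
        floor = floor_changes.get(frame, floor)
        corrected.append(height + floor)
    return corrected

# Six frames of a climb; the solver "settled" the hips back to 1.0 m each step.
solved = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
# Manually mark that the floor rose to 0.3 m at frame 2 and 0.6 m at frame 4.
corrected = apply_floor_levels(solved, {2: 0.3, 4: 0.6})
print(corrected)
```

After the two corrections, the hips end the clip 0.6 m higher than they started instead of snapping back to floor level after every step.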

Price speculation: an online thread from early 2008 estimates a pricepoint of $50,000 for the system.


SUMMARY: NaturalPoint Arena remains the only full-body motion capture solution that I'm aware of that falls into the under-$10K category. The other vendors at GDC'09 are targeting small and mid-sized game studios who want to cut development costs by doing some mocap in-house rather than outsourcing all of their mocap needs to a specialty studio like Red Eye, but who don't want to pay $100K or more for a Vicon system. They can presumably be good choices for a 20-person studio that brings in a few million dollars a year in revenue and has a budget for R&D, but $30K to $50K for a mocap system just doesn't cut it as a pricepoint for the hobbyist. With the upcoming introduction of the new 60-degree camera lenses, NaturalPoint is continuing to make enhancements presumably aimed at helping the "living room animator" do a better job.



Disclaimer: I don't work for any of the companies mentioned here, and none of them asked me to write this show report, no money is changing hands, etc.

Copyright: This show writeup is copyright (c) 2009 by Bruce Hahne. Nonprofit, non-commercial forwarding and redistribution is permitted and encouraged. For other uses, please contact the author.