Digits: Hands-Free 3-D

Microsoft's new gadget Digits eliminates gloves from 3-D motion capture with a wrist-worn sensor


A new motion capture gadget from Microsoft Research provides all the control of a 3-D gaming glove. But Digits is a wrist-worn sensor that leaves you barehanded and free to touch other objects. With an infrared camera, a MEMS motion-sensing chip, and some software trickery, it creates a 3-D model of your hand that responds to movements with fingertip precision.

The Digits prototype allows the wearer to answer a phone call with a thumbs up, change a television channel with a flick of a finger, play video games without a controller, translate sign language into text, and maybe even touch-type without a keyboard someday.

David Kim, one of the creators, sat down with IEEE Spectrum via Skype to dissect a Digits demo video. Kim is a PhD fellow at Newcastle University who works with the Interactive 3-D Technologies group at Microsoft Research Cambridge, the same lab responsible for other crazy augmented reality projects like KinEtre, HoloDesk, and Vermeer.

Digits could easily be built as a watch-sized gadget for everyday wear, says Kim. It isn’t anywhere close to market, but here’s hoping.

David Kim: When I control computing devices with my gestures only, the technology becomes just invisible to me, and it kind of feels magical. It feels like the computer understands me.

Celia Gorman: Welcome to “The Full Spectrum.” Hi, I’m Celia Gorman. Today we’re speaking with David Kim of Newcastle University and Microsoft Research. We’re going to take a look at his new 3-D controller, called Digits. Welcome, David. Thanks for joining us. So, what exactly is Digits?

David Kim: Digits is a novel, wrist-worn 3-D hand tracker, which is indeed gloveless, which means that you can use your hand to interact with physical objects, but it also enables you to continuously interact with virtual objects in 3-D space, but also use gestures to communicate with your computing devices around you. So when Kinect first came out, our group was quite excited about the prospect of enabling bare hand interactions with virtual objects and enabling fine interactions with your fingers, not just your body limbs.

So the main component is an infrared camera sitting on the inner area of the wrist, overlooking the palm. And then we’ve got a horizontal laser line generator which projects the horizontal laser across all the fingers. We’ve got a ring of IR [infrared] LEDs, which illuminates the whole hand. And we’ve got an inertial measurement unit, which gives the forearm orientation for coarse interactions with your whole arm.

We use a single camera and we use synchronized infrared illumination. So in total we capture three different kinds of images. The first one is with all the illumination off so we can see what kind of lighting situation we have in the environment. Then we capture one camera image with the laser on to get the distance measurements from the camera. And then the infrared LEDs are used to extract the fingertips and the shape of the hand. And from those two images, the laser and the infrared, the first [unlit] image is subtracted so that we only see the signal that we want to see.
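The three-frame capture Kim describes can be illustrated with a minimal sketch. It assumes each frame is a small 8-bit grayscale grid stored as nested lists; the names `subtract_ambient` and `process_cycle` and the pixel values are hypothetical and do not come from the Digits software.

```python
from typing import List, Tuple

# Hypothetical frame type: an 8-bit grayscale image as rows of pixel values.
Image = List[List[int]]


def subtract_ambient(lit: Image, ambient: Image) -> Image:
    """Remove ambient light from an actively lit frame, clamping at zero."""
    same_rows = len(lit) == len(ambient)
    if not same_rows or any(len(a) != len(b) for a, b in zip(lit, ambient)):
        raise ValueError("frames must have the same shape")
    return [
        [max(p - q, 0) for p, q in zip(lit_row, amb_row)]
        for lit_row, amb_row in zip(lit, ambient)
    ]


def process_cycle(ambient: Image, laser: Image, led: Image) -> Tuple[Image, Image]:
    """One capture cycle: the unlit frame is subtracted from both lit frames."""
    return subtract_ambient(laser, ambient), subtract_ambient(led, ambient)


# Made-up pixel values, for illustration only.
ambient = [[10, 12], [11, 10]]   # all illumination off
laser = [[10, 200], [11, 9]]     # laser line on
led = [[90, 95], [80, 10]]       # IR LED ring on
laser_signal, led_signal = process_cycle(ambient, laser, led)
```

In this sketch, whatever remains in `laser_signal` is the laser line alone, and whatever remains in `led_signal` is the LED-lit hand with the room lighting removed.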

Celia Gorman: You talk about two different models in the video. Why wasn’t the simple model enough?

David Kim: The laser gets projected either closer to the camera or farther away, depending on how your fingers are bent or stretched. And we tried to replicate this model mathematically. We just assumed that all joints are proportional to each other, which means that we can parametrize the bend motion of each finger with one parameter, which is the distance that the laser senses. But then in the next stage we wanted to get a much truer model of the hands, which also models an independent articulation of this knuckle joint.
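The simple model Kim describes, in which one laser distance sets every joint of a finger, might be sketched as follows. All constants here are assumed calibration values, the function names are hypothetical, and the assumption that a shorter laser distance means a more bent finger is an illustration rather than a detail confirmed in the interview.

```python
from typing import Tuple

# Hypothetical calibration values, for illustration only.
D_BENT_MM = 30.0        # assumed laser distance with the finger fully bent
D_STRAIGHT_MM = 90.0    # assumed laser distance with the finger fully extended
MAX_JOINT_DEG = (90.0, 100.0, 70.0)  # assumed limits for a finger's three joints


def bend_parameter(distance_mm: float) -> float:
    """Map a laser distance to one bend value in [0, 1] (1 means fully bent)."""
    span = D_STRAIGHT_MM - D_BENT_MM
    t = (D_STRAIGHT_MM - distance_mm) / span
    return min(max(t, 0.0), 1.0)


def joint_angles(distance_mm: float) -> Tuple[float, ...]:
    """Simple model: every joint angle is proportional to the same bend value."""
    t = bend_parameter(distance_mm)
    return tuple(t * limit for limit in MAX_JOINT_DEG)


half_bent = joint_angles(60.0)  # midway between the two calibration distances
```

Because a single value drives all three joints, this sketch cannot represent a finger whose knuckle moves independently of the other joints, which is the limitation the second model addresses.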

Celia Gorman: What sort of real world applications does Digits have? And what’s your favorite?

David Kim: Our vision when we started this research project was that this could be something like a wristwatch, which is small and ubiquitous enough to be worn all the time. By pinching your fingers, [you could] place a call, reject a call, change the music, change the volume, things like that, without having to reach for the device. My favorite application is, I think, the nonvisual interface, because it is something that we don’t see right now. When I control computing devices with my gestures only, the technology becomes just invisible to me, and it kind of feels magical. It feels like the computer understands me.

Celia Gorman: You also mention that Digits is made with off-the-shelf parts. About how much does it cost to build one?

David Kim: Our research project, as you can see here, it costs around [US] $300 to $400. And that’s because we are using the best components we can get. But there’s no reason why it couldn’t be much, much cheaper.

Celia Gorman: I’m sure a lot of people would love to build their own version of Digits at home. Are you going to release the plans?

David Kim: This is a purely explorative research project. I don’t know about plans and I can’t comment about that.

Celia Gorman: So, David, where do you see this going? And what other challenges are coming up for this project?

David Kim: In terms of research challenges and challenges I’m personally interested in, we show a proof-of-concept device, which works in 80 to 90 percent of the cases. But we still have to tackle the last 10 to 20 percent to make it really work. And some of the challenges include when I bend my thumb across all my fingers, it occludes the view of the camera. And we would like to think more about how we could tackle these things, for example using machine-learning algorithms to tackle unknown situations which are tricky to handle with our current setup.

Celia Gorman: Thank you so much for joining us, David. It’s really a pleasure to hear about your new project.

David Kim: Thank you.

Celia Gorman: We’ve been speaking with David Kim from Newcastle University and Microsoft Research. Thank you so much for joining us. For IEEE Spectrum, I’m Celia Gorman.

NOTE: Transcripts are created for the convenience of our readers and listeners and may not perfectly match their associated interviews and narratives. The authoritative record of IEEE Spectrum’s video programming is the video version.