Nvidia is not particularly well known for robotics, but that’s about to change. Just a few weeks ago, Nvidia opened a shiny new robotics research lab in Seattle, within easy walking distance of the University of Washington. The Nvidia AI Robotics Research Lab is led by Dieter Fox, a professor of computer science and engineering at UW, and will eventually grow to house “close to 50 research scientists, faculty visitors, and student interns.” Nvidia’s goal is to help robots make the difficult transition from working in the lab just long enough to publish a paper to working reliably and usefully in the real world.
“The charter of the lab is to drive breakthrough robotics research to enable the next generation of robots that perform complex manipulation tasks to safely work alongside humans and transform industries such as manufacturing, logistics, healthcare, and more.”
Here’s a brief video intro to the new lab to give you an idea of how things look over there:
The mobile manipulator in the video is Franka Emika’s Panda arm mounted on top of an Nvidia development platform. The plan is to eventually upgrade the robot to dual arms, probably an omnidirectional base, and definitely more integrated sensors. The sensors in particular are key to what Nvidia wants to focus on, which is getting robots to safely and effectively work around (and with) humans. But there’s a lot more research that goes into making useful human-friendly robots: grasping, manipulation, computer vision, object recognition, human-robot interaction, and collaboration—the company has plenty of work to do.
Nvidia’s new robotics lab is led by Dieter Fox, who’s also a professor of computer science and engineering at the University of Washington, in Seattle. Photo: Nvidia
For more details on Nvidia’s approach to solving all of these robotics challenges, we spoke with Dieter Fox, senior director of robotics research at Nvidia.
IEEE Spectrum: The University of Washington has a great robotics program. How will things be different doing research with Nvidia?
Dieter Fox: The kind of research that we want to do and the research philosophy that we want to follow are motivated by the fact that, as an academic, I sometimes find our publication model a bit frustrating. It’s not very amenable to larger-scale projects that run over multiple years and integrate different research strengths. What you typically see is that your PhD student writes a paper, it gets accepted, and then I tell the student, "okay, now let’s integrate that into our larger robot system and get it to really work well in the real world." And of course the student says, "I don’t really want to do this because I cannot publish it." This publishing model strongly encourages the student to move on to a new problem; there’s no strong encouragement to make it actually work. For the students, it’s fully understandable, but as an aging roboticist, I find it a bit frustrating.
Why use the kitchen as a research environment?
We want to use this to drive our research forward. I believe that the kitchen is such a rich, challenging environment that if we can’t solve all the tasks in a kitchen, then we won’t be able to solve similar problems that also come up in other areas, such as manufacturing or healthcare. So the idea is really to use this kitchen to challenge ourselves to design and build robots that can solve these tasks.
Right now, there’s a lack of an agreed-upon environment in which people can test and compare their manipulation systems. I think kitchens could fill that role. Ikea kitchens, for example, are almost standardized across the world, so you can imagine robotics labs each buying their own kitchen, doing their own research, and then comparing their robots’ capabilities against each other.
One of Nvidia’s strengths is simulation. In what situations do you think simulation is most useful for robotics, and in what situations is it less useful? How do you make the transition between simulation success and real-world success?
I think there are different application domains for how you can use a simulator. One use is if you just want to test or debug your system; I think that’s super useful. You can also test out scenarios that you wouldn’t want to try in the real world, like in human-robot interaction, although that’s still an open research question: how can we simulate people? We’re really not able to simulate the behavior of a person yet. One way around that is virtual reality, where you can put a real person into a virtual environment, and that person can interact with a robot. You can also collect training data for a robot if you have two people, where one person controls the robot in VR and a second person mimics a human in VR as well. I think that’s going to be a really exciting application domain.
In the machine learning domain, one caveat is that some simulation scenarios are oversimplified and don’t really reflect the complexity of a real robot in the real world, so I think we have to be careful to look at tasks in simulation that actually reflect the challenges we have on real robots. For example, we are using simulation to train our object detection system. You have a camera image, and let’s say the robot wants to detect a coffee mug. If we have a model of a coffee mug, we can train a deep neural network using synthetic data in simulation, and that works very robustly in the real world. It’s really amazing; I’m surprised how well that stuff generalizes. The key trick is that in your simulator, you just randomize over lighting conditions and different properties of objects so that your deep neural network can learn to be robust, and hopefully that captures the variability it will see in the real world.
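The randomization trick Fox describes is commonly called domain randomization. Below is a minimal sketch in Python, assuming a rendering pipeline exists elsewhere: each synthetic training image gets its own randomly drawn lighting, mug color, camera pose, and amount of clutter. The parameter names, the ranges, and the `render_mug_scene` step mentioned in a comment are hypothetical illustrations, not details of Nvidia’s system.

```python
import random
from dataclasses import dataclass

# Illustrative ranges only (assumptions, not values from Nvidia's pipeline).
LIGHT_INTENSITY_RANGE = (0.2, 2.0)         # multiplier on a nominal light source
LIGHT_COLOR_TEMP_RANGE = (2700.0, 6500.0)  # kelvin, warm indoor to daylight
MUG_HUE_RANGE = (0.0, 1.0)                 # normalized hue of the mug's texture
CAMERA_DISTANCE_RANGE = (0.4, 1.5)         # meters between camera and mug
MAX_DISTRACTORS = 5                        # other random objects in the scene


@dataclass
class SceneParams:
    light_intensity: float
    light_color_temp_k: float
    mug_hue: float
    camera_distance_m: float
    camera_yaw_deg: float
    num_distractors: int


def sample_scene_params(rng: random.Random) -> SceneParams:
    """Draw one randomized scene configuration for a synthetic training image."""
    return SceneParams(
        light_intensity=rng.uniform(*LIGHT_INTENSITY_RANGE),
        light_color_temp_k=rng.uniform(*LIGHT_COLOR_TEMP_RANGE),
        mug_hue=rng.uniform(*MUG_HUE_RANGE),
        camera_distance_m=rng.uniform(*CAMERA_DISTANCE_RANGE),
        camera_yaw_deg=rng.uniform(0.0, 360.0),
        num_distractors=rng.randint(0, MAX_DISTRACTORS),
    )


def make_synthetic_batch(n: int, seed: int = 0) -> list[SceneParams]:
    """Sample n scene configurations, one per synthetic training image."""
    rng = random.Random(seed)
    batch = []
    for _ in range(n):
        params = sample_scene_params(rng)
        # Hypothetical step (not implemented here): render_mug_scene(params) would
        # produce an image plus the mug's bounding box as its training label.
        batch.append(params)
    return batch
```

Because every image draws a fresh configuration, the detector rarely sees the same lighting or background twice. The intent is to push it toward features of the mug itself rather than incidental appearance, which is what lets a model trained only on synthetic data hold up on real camera images.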
Learning control policies in simulation is harder. The problem is that your simulator does not know all the physical parameters of the real world, so if we apply a policy learned in simulation on a real robot, it’s often going to fail. In our research, we train a policy in simulation, randomizing over physics parameters, and then we try it in the real world, where it will most likely fail. But we use the data from that real-world trial to change the randomization of the physics parameters in simulation, and then we can go back and train a policy that’s more robust. We use the real world only to better randomize the parameters in our simulation.
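The loop Fox describes can be illustrated with a toy problem. The sketch below assumes a one-parameter “policy” (how fast to push a block so it slides a target distance), a single unknown physics parameter (friction), and a cross-entropy-style update that refits the friction distribution to the simulated samples whose outcome best matches the real-world trial. The task, the numbers, and all function names are hypothetical; Nvidia’s actual method may work quite differently.

```python
import math
import random
import statistics

G = 9.81        # gravitational acceleration, m/s^2
TRUE_MU = 0.6   # hidden "real-world" friction coefficient (hypothetical toy value)
MIN_MU = 0.05   # keep sampled friction coefficients physically positive


def simulate_slide(push_speed: float, mu: float) -> float:
    """Toy physics: distance a block slides after a push, d = v^2 / (2 * mu * g)."""
    return push_speed ** 2 / (2.0 * mu * G)


def sample_mu(rng: random.Random, mean: float, std: float) -> float:
    """Draw one friction value from the current randomization distribution."""
    return max(MIN_MU, rng.gauss(mean, std))


def train_policy(mean, std, target, rng, n_samples=500):
    """'Train' the policy in simulation: pick the push speed that reaches the
    target distance on average over the randomized friction values."""
    inv_mu = [1.0 / sample_mu(rng, mean, std) for _ in range(n_samples)]
    return math.sqrt(2.0 * G * target / statistics.fmean(inv_mu))


def real_trial(push_speed, rng, noise_std=0.0):
    """Stand-in for running the policy once on the physical robot."""
    return simulate_slide(push_speed, TRUE_MU) + rng.gauss(0.0, noise_std)


def update_randomization(mean, std, push_speed, observed, rng,
                         n_samples=200, n_elite=20, min_std=0.01):
    """Refit the friction distribution to the samples whose simulated outcome
    best matches the real-world observation (cross-entropy-style update)."""
    candidates = [sample_mu(rng, mean, std) for _ in range(n_samples)]
    candidates.sort(key=lambda mu: abs(simulate_slide(push_speed, mu) - observed))
    elite = candidates[:n_elite]
    return statistics.fmean(elite), max(min_std, statistics.pstdev(elite))


def sim_to_real_loop(target=0.5, iterations=5, seed=0):
    rng = random.Random(seed)
    mean, std = 0.3, 0.2  # deliberately wrong initial guess for friction
    for _ in range(iterations):
        speed = train_policy(mean, std, target, rng)  # train in simulation
        observed = real_trial(speed, rng)             # try it on the "real" robot
        mean, std = update_randomization(mean, std, speed, observed, rng)
    final_speed = train_policy(mean, std, target, rng)
    return mean, std, real_trial(final_speed, rng)
```

Running `sim_to_real_loop()` should pull the friction estimate from the deliberately wrong starting guess of 0.3 toward the hidden value of 0.6, so the final push lands close to the 0.5 m target. As in Fox’s description, the real trial is used only to reshape the randomization; the policy itself is always trained in simulation.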
So, I fully agree that it’s a really important question, but I think the research community is aware of it, and we’re really working towards making it work better and better.
Nvidia founder and CEO Jensen Huang interacts with an ABB YuMi dual-arm robot programmed to mimic a person’s gestures. Photo: NVIDIA
You mentioned that simulation is useful for object recognition and training grasping, but one of the big problems is how places like kitchens vary from one home to another. How much easier would it be if your new kitchen robot came with an entire set of dishes that your robot knew exactly how to recognize and how to pick up? Is it a useful strategy to adapt our lives a little bit to make the job of robots a lot easier?
On the one hand, a researcher might say, "you guys are crazy because you need this whole 3D model of the kitchen, and that’s not realistic because everyone will have to measure their kitchen." But, I strongly believe that in the near future, if you buy any kind of furniture or anything for your kitchen, that stuff should come with its own 3D model. These companies have 3D models of things because they need them for manufacturing; why shouldn’t they be used by robots as well?
From a research perspective, I’m also ambitious enough to believe that robots should be able to solve these tasks without requiring these models. For example, if you go into your neighbor’s kitchen and want to get a fork, you don’t need a 3D model of the kitchen: you find a drawer, recognize where the handle is, and just pull on it. We humans are able to do these kinds of things extremely robustly, and that is something we’re investigating on the research side; we’re trying to get a robot to do exactly that.
So we’re trying to go both ways—take advantage of all of these models, but at the same time, investigate through our research how we can make it work even without these models.
How much can hardware be used to help solve grasping challenges, and how much of grasping is a software problem?
It’s the co-evolution of both that we need. You need to view software and hardware jointly, not independently. There’s still so much development that has to be done on the hardware side; for example, touch sensing is a crucial capability that helps humans perform in-hand manipulation tasks and even picking tasks in a much more robust way. Right now on the hardware side, there are several systems coming out, but they still don’t have that notion of a full touch-skin over the hand of a robot. It’s usually constrained to only the fingertips.
So I think there’s still a huge amount of room for improvement on the manipulator design, both with respect to touch sensing and also to the articulation itself, and at the same time developing the control algorithms that can take advantage of these new hardware designs and sensor designs as well.
That’s actually where I think there’s some promise for deep learning, because it’s well suited for combining these different streams of sensor data. If you want to do robust manipulation, you need to put all of that information together, which means you need to use visual information and combine that with touch sensing and force feedback and use all of that to control the hand itself with closed-loop control algorithms.
Do you think it would be beneficial in an environment like a kitchen, where everything is designed for human hands, to use a five-fingered hand, or do you think using two-fingered grippers will get you most of the way there with a lot less complexity?
Let’s put it this way: two fingers with smart algorithms and sensing will get us beyond where we are right now. I don’t think it’s the ultimate solution. Also, why not have a robot with a suction cup? I’m not necessarily married to the anthropomorphic design, but since these environments are designed for us, they are obviously well suited for these kinds of hands. People who have been working in manipulation for many, many years also agree that two fingers can do a lot of stuff, but that ultimately you have to move beyond that.
Just one example: if you let a person teleoperate a robot with a two-finger gripper, they can already do a lot of stuff. That’s a clear indication that what’s lacking is the intelligence: algorithms that perceive these scenarios the way we perceive them and that generate the control commands accordingly.
Since you’re working on a mobile manipulator for a kitchen environment, when do you think we might have something like that operating in our homes?
The reason why we’re doing the kitchen is not because I believe the kitchen is going to be the big first application for these robots in the home; the tasks are still far too challenging. This is more a research direction that can represent other scenarios, like industrial manufacturing or healthcare. So I think that in the home, before we have robots that a normal family would buy to do the cooking for them, there will be many other application scenarios. For example, for people with special needs there’s just a much stronger incentive to have a robot, and there might be more constrained tasks that a robot could perform and that would still be very useful, like bringing items to a person or picking up items from the floor.
Being able to just put up a recipe and have the robot cook the whole thing for you? That’s going to be decades out. But that’s why the kitchen is exciting: we can stage it and make it progressively more complex. An intermediate step could be that the person does the hard work, and all the robot needs to do is bring the right ingredients at the right time. The robot isn’t quite a sous chef, but it does the things it can do and helps you do the stuff that you’re much better at more efficiently.
You say “the time is right to develop the next generation of robots.” What exactly is the next generation of robots?
The next generation of robots is really this notion of robots that can perform manipulation tasks alongside people, which means they are safe to work with people, they are able to recognize what’s going on around them and understand when people give them commands, they are flexible with respect to learning new tasks, and a person should be able to teach them new tasks in an easy and natural way. That’s certainly far more than 5 years out. But along the way, as we’re doing this research, we will see these robots becoming useful enough that in things like industrial applications, they will start to be deployed within that time frame for sure. That’s the nice thing: there is no end goal where we say, "until we reach there, these robots are totally useless." There will always be intermediate steps that make them worth deploying, in progressively less constrained settings over time.
It sounds like the work being done at Nvidia’s lab will be relatively open, with papers being presented at conferences and such. We’re expecting to hear about some of the initial research within just the next few months.
[ Nvidia Robotics ]