How Google Wants to Solve Robotic Grasping by Letting Robots Learn for Themselves

You are likely pretty good at picking things up. That’s nice. Part of the reason that you’re pretty good at picking things up is that when you were little, you spent a lot of time trying and failing to pick things up, and learning from your experiences. For roboticists who don’t want to wait through the equivalent of an entire robotic childhood, there are ways to streamline the process: at Google Research, they’ve set up more than a dozen robotic arms and let them work for months on picking up objects that are heavy, light, flat, large, small, rigid, soft, and translucent (although not all at once). We talk to the researchers about how their approach is unique, and why 800,000 grasps (!) is just the beginning.

Part of what makes us animals so good at grasping things is our eyes, and not just our hands. You can grab stuff with your eyes closed, but you’re much better at it if you watch your hand interacting with the object that you’re trying to pick up. In robotics, this is referred to as visual servoing, and in addition to improving grasping accuracy, it makes grasping possible when objects are moving around or changing orientation during the grasping process, a very common thing to have happen in those pesky “real-world situations.”

One of the robotic manipulators used in the data collection experiments. Each unit consisted of a 7-degree-of-freedom arm with a 2-finger gripper, and a camera mounted over the shoulder of the robot. The researchers say the camera recorded monocular RGB and depth images, but only the monocular RGB images were used for grasp success prediction. Image: Google Research

Teaching robots this skill can be tricky, because there aren’t necessarily obvious connections between sensor data and actions, especially if you have gobs of sensor data coming in all the time (like you do with vision systems). A cleverer way to do it is to just let the robots learn for themselves, instead of trying to teach them at all. At Google Research, a team of researchers, with help from colleagues at X, tasked a 7-DoF robot arm with picking up objects in clutter using monocular visual servoing, and used a deep convolutional neural network (CNN) to predict the outcome of the grasp. The CNN was continuously retraining itself (starting with a lot of fail but gradually getting better), and to speed the process along, Google threw as many as 14 robots at the problem in parallel. This is completely autonomous: all the humans had to do was fill the bins with stuff and then turn the power on.
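
The closed loop at the heart of this (score each candidate motion by how likely it is to end in a successful grasp, execute the most promising one, then look again) can be sketched in a few lines of Python. This is a toy for illustration, not Google's code: `predicted_success` is a hypothetical stand-in for the trained CNN, and it cheats by knowing the object's position, whereas the real network only sees camera images. The random candidate sampling, the names `servo_step` and `servo`, and all parameter values are assumptions.

```python
import math
import random
from typing import Tuple

Vec = Tuple[float, float, float]


def predicted_success(gripper: Vec, motion: Vec, target: Vec) -> float:
    # HYPOTHETICAL stand-in for the learned CNN. The real model scores a
    # motion from camera images; this toy just rewards ending up close
    # to a known target position.
    moved = tuple(g + m for g, m in zip(gripper, motion))
    return math.exp(-math.dist(moved, target))


def servo_step(gripper: Vec, target: Vec, n_samples: int, step: float,
               rng: random.Random) -> Vec:
    # Candidate motions: "stay put" plus random displacements in a small cube.
    candidates = [(0.0, 0.0, 0.0)] + [
        tuple(rng.uniform(-step, step) for _ in range(3))
        for _ in range(n_samples)
    ]
    # Pick the motion the predictor rates most likely to succeed.
    return max(candidates, key=lambda m: predicted_success(gripper, m, target))


def servo(gripper: Vec, target: Vec, steps: int = 20, n_samples: int = 64,
          step: float = 0.02, seed: int = 0) -> Vec:
    rng = random.Random(seed)
    for _ in range(steps):
        # Re-score from the current state at every step: the closed loop.
        motion = servo_step(gripper, target, n_samples, step, rng)
        gripper = tuple(g + m for g, m in zip(gripper, motion))
    return gripper


start, obj = (0.0, 0.0, 0.3), (0.1, -0.05, 0.0)
end = servo(start, obj)
print(f"distance to object: {math.dist(start, obj):.3f} -> {math.dist(end, obj):.3f}")
```

Because the best motion is re-chosen from the current state on every step, nothing in the loop assumes the object stays put, which is what lets visual servoing cope with objects that shift during a grasp. A real controller would also have to decide when to close the gripper; this sketch only moves.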

“In essence, the robot is constantly predicting, by observing the motion of its own hand, which kind of subsequent motion will maximize its chances of success. The result is continuous feedback: what we might call hand-eye coordination. Observing the behavior of the robot after over 800,000 grasp attempts, which is equivalent to about 3000 robot-hours of practice, we can see the beginnings of intelligent reactive behaviors. The robot observes its own gripper and corrects its motions in real time. It also exhibits interesting pre-grasp behaviors, like isolating a single object from a group. All of these behaviors emerged naturally from learning, rather than being programmed into the system.”
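
Those numbers are easy to sanity-check with a little back-of-the-envelope arithmetic. The inputs are the article's own figures (800,000 grasps, about 3,000 robot-hours, and at most 14 arms); the wall-clock figure additionally assumes, purely for illustration, that all 14 arms ran nonstop the whole time, which they did not, since the fleet grew over the course of the experiment.

```python
GRASPS = 800_000     # grasp attempts quoted above
ROBOT_HOURS = 3_000  # "about 3000 robot-hours" (an approximate figure)
MAX_ARMS = 14        # largest fleet size; fewer arms ran early on

# Average time per grasp attempt, in seconds.
seconds_per_grasp = ROBOT_HOURS * 3600 / GRASPS
# Average number of attempts per arm per hour.
grasps_per_robot_hour = GRASPS / ROBOT_HOURS
# Lower bound on wall-clock time: all 14 arms running nonstop (assumption).
min_wall_clock_days = ROBOT_HOURS / MAX_ARMS / 24

print(f"{seconds_per_grasp:.1f} s per attempt, "
      f"{grasps_per_robot_hour:.0f} attempts per robot-hour, "
      f"at least {min_wall_clock_days:.1f} days of nonstop operation")
```

That works out to 13.5 seconds per attempt and a floor of roughly nine days of continuous running, well short of the "months" mentioned at the top, which suggests the arms did not run around the clock.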

With 14 robots all working on this problem, a lot of data get collected a lot faster, but at the same time, a lot of unintentional variation gets introduced into the experiment. Cameras are positioned slightly differently, lighting is a bit different for each robot, and each of the compliant, underactuated two-finger grippers exhibits different types of wear, affecting performance:

What the grippers of the robots used for data collection looked like at the end of the experiments. The researchers say the robots “experienced different degrees of wear and tear, resulting in significant variation in gripper appearance and geometry.” Image: Google Research

The upside to this is that the robots end up with a tolerance for things like minor hardware variation and camera calibration differences, making the grasping as a whole more robust. Even so, this method can’t be generalized too much, and is unlikely to work on significantly different hardware or in different grasping environments (like trying to pick stuff up off of a shelf). In future work, the researchers plan to explore increasing the diversity of the training setup to see how much more adaptable their technique can get. They’d also like to investigate how this method could be applied to “real world” robots that are “exposed to a wide variety of environments, objects, lighting conditions, and wear and tear.”

For more info, we spoke with Sergey Levine at Google Research about what they’ve been working on:

IEEE Spectrum: Can you describe how your work is related to similar efforts, like Brown’s Million Object Challenge or UC Berkeley’s Dex-Net?

Sergey Levine: Like Dex-Net and the work at Brown, our work is predicated on the hypothesis that large datasets will have a transformative effect on robot capability. The principal difference between our work and these efforts is that we take a very direct and data-driven approach to a specific robotic problem—grasping—with minimal prior knowledge. Dex-Net uses a model-based approach and simulated data, while the Million Object Challenge at Brown has the substantially broader aim of collecting scans of a large number of objects (our approach doesn’t aim to collect scans, simply to learn to grasp from experience).

Why was the volume of data important, and what (if anything) could you have learned with more data?

We used between six and 14 arms at any given time (the number increased over the course of the experiment as more robots came online). We are still working to formally determine how much data is actually needed, but anecdotally, things started to pick up after about 200,000 grasps, and continued to improve up to 800,000 grasps (and seem likely to improve further with more data).

The volume is important for two reasons: (1) there are many possible geometric configurations of objects and grippers, and (2) additional data was always collected using the latest model, which was effective at picking out precisely those situations where the latest model was confident but incorrect, and therefore appended samples to the dataset that could improve the latest model further.
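
Point (2) is easiest to see in a deliberately tiny example. The sketch below is an illustrative toy, not the researchers' method: instead of a CNN over images, the "model" is just a running success-rate estimate for each of three hypothetical candidate grasps, and the "robot" always tries whichever grasp the latest model likes best. The prior pseudo-counts, the fixed true outcomes, and the function name `collect_with_latest_model` are made-up assumptions chosen to create one confident-but-wrong estimate.

```python
from typing import List, Tuple


def collect_with_latest_model(
    prior: List[Tuple[int, int]],  # (successes, attempts) pseudo-counts per grasp
    true_outcome: List[bool],      # HYPOTHETICAL ground truth: does each grasp work?
    rounds: int,
) -> Tuple[List[float], List[int]]:
    successes = [s for s, _ in prior]
    attempts = [n for _, n in prior]
    tried = [0] * len(prior)
    for _ in range(rounds):
        # "Latest model": current success-rate estimate for every candidate.
        estimate = [s / n for s, n in zip(successes, attempts)]
        # Collect the next sample where that model is most optimistic.
        best = max(range(len(estimate)), key=estimate.__getitem__)
        # Execute, observe, and fold the outcome back into the data.
        successes[best] += int(true_outcome[best])
        attempts[best] += 1
        tried[best] += 1
    return [s / n for s, n in zip(successes, attempts)], tried


# Grasp 0 starts out looking great (9/10) but in fact always fails.
estimate, tried = collect_with_latest_model(
    prior=[(9, 10), (6, 10), (3, 10)],
    true_outcome=[False, True, True],
    rounds=50,
)
print([round(e, 3) for e in estimate], tried)
```

After a handful of failed attempts, the overconfident estimate for grasp 0 drops below the alternatives and collection moves on, which is the self-correcting effect described above. The toy also shows the flip side of purely greedy collection: grasp 2, which the prior undervalued, is never tried, so its estimate is never corrected.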

How does your hardware design affect the technique (and success) of grasping objects? Why did you choose this particular gripper, and can the approach be adapted to any gripper?

The approach is straightforward to apply to any parallel jaw gripper, and can likely be adapted to other grippers and hands. The hardware was not designed specifically for this task; it was just the easiest hardware for us to get access to at the required volume. That said, the particular fingers we used with our gripper are well suited for picking various objects.

How can this work be generalized so that the technique could be useful to other manipulators in other environments?

It is likely that, to generalize to other manipulators, the system would need to be trained with a variety of manipulators and end effectors. The current system is a proof of concept. A practical application is likely to require more extensive training in a variety of environments, with a variety of backgrounds, and possibly in other settings (on shelves, in drawers, etc.), as well as a mechanism for higher-level direction to choose what to grasp, perhaps by constraining the sampled motor commands to specific parts of the workspace.
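
Levine's last suggestion, steering the grasp by constraining the sampled motor commands, is easy to picture with a sketch. Here, purely as an assumption for illustration, the "part of the workspace" is an axis-aligned box, and each candidate command is generated by drawing a target point uniformly inside the box and subtracting the gripper's current position, so every sampled command ends inside the region. The names `sample_commands_in_region`, `Box`, and the bin dimensions are hypothetical; this is not a description of how the researchers would implement it.

```python
import random
from typing import List, Tuple

Vec = Tuple[float, float, float]
Box = Tuple[Vec, Vec]  # hypothetical region: (lower corner, upper corner)


def sample_commands_in_region(current: Vec, region: Box, n: int,
                              rng: random.Random) -> List[Vec]:
    """Sample n displacement commands whose endpoints lie inside region."""
    lower, upper = region
    commands: List[Vec] = []
    for _ in range(n):
        # Draw a target point uniformly inside the box...
        target = tuple(rng.uniform(lo, hi) for lo, hi in zip(lower, upper))
        # ...and express it as a motion relative to the current gripper position.
        commands.append(tuple(t - c for t, c in zip(target, current)))
    return commands


# Example: only consider grasps in the left half of a made-up bin.
gripper: Vec = (0.0, 0.0, 0.3)
left_half: Box = ((-0.25, -0.2, 0.0), (0.0, 0.2, 0.05))
cmds = sample_commands_in_region(gripper, left_half, n=8, rng=random.Random(1))
print(len(cmds), "candidate commands")
```

Scoring and choosing among these candidates could proceed exactly as in an unconstrained loop; only the candidate set changes, which is why restricting the samples is a natural hook for higher-level direction about what to grasp.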

You can read a preprint of the paper “Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection” by Sergey Levine, Peter Pastor, Alex Krizhevsky, and Deirdre Quillen, on arXiv.

[ Google Research ]
