This is a guest post. The views expressed here are solely those of the author and do not represent positions of IEEE Spectrum or the IEEE.
While robots have prepared entire breakfasts since 1961, general manipulation in the real world is arguably an even more complex problem than autonomous driving. It is difficult to pinpoint exactly why, though. Closely watching the 1961 video suggests that a two-finger parallel gripper is good enough for a variety of tasks, and that it is only perception and encoded common sense that prevent a robot from performing such feats in the real world. Indeed, a recent Science article reminded us that even contact-intensive assembly tasks such as assembling a piece of furniture are well within the realm of current industrial robots. The real problem is that the number of possible manipulation behaviors is very large, and the specific behaviors required to prepare a club sandwich aren’t necessarily the same as those required to assemble a chair.
From a strictly industrial perspective, general manipulation may not even be a problem that is worth solving. Indeed, you can build a machine for anything, from preparing fantastic espresso and doing your dishes to harvesting a field of wheat or mass-producing a sneaker. This is how the majority of robots are currently employed in industry. Even those marketed as “collaborative robots” mostly become parts of a sophisticated machine in an automated line (that gets away with less safety caging). Any attempt to develop the more generalized manipulation solutions that an academic might be interested in is benchmarked against these use cases. This makes the advantage of a general solution less obvious, and such solutions risk getting stuck in a valley of inefficiency where investors and industry lose interest. However, manufacturing and delivery processes involve a long tail of highly varying manipulation steps. Even if each step is of negligible value, their cumulative cost is economically significant.
So how do we know whether a manipulation solution is general enough to unlock this value? The robotics community has proposed a series of challenges that require either solving a large variety of tasks or manipulating a large variety of objects, including RoboCup@Home, the IROS manipulation challenge, and the Amazon Picking Challenge. While these competitions push for general solutions, it is still hard to define tasks such that they could not be better solved by a specialized solution. For example, the winning team at the IROS manipulation competition in Daejeon, Korea, used a Baxter robot and a system of self-adhesive foam cubes to manipulate items like dishes and spoons. Similarly, a majority of items in the Amazon challenge can be handled using suction alone. What we really need is a single manipulation solution that does equally well at all of the above tasks.
A different perspective was provided by the industrial assembly competition at the World Robot Summit in Tokyo, which offered a ¥15M (US$130k) prize to the team that presents a generalizable solution to a series of industrial kitting and assembly tasks that can be reconfigured in a single day. Teams need to first pick a series of items with widely varying sizes (from M3 screws to electric motors to flexible rubber belts) from bins, place them into a kit, and then assemble them into a complex structure. The competition requires a manipulation solution that can both grasp and manipulate, and because the objects and tasks are not known in advance, the solution must also be easily reprogrammable on the day of the competition. If successful, such robots could be commissioned as easily as a human to help you set up your furniture, help you with your move, or take on whatever other manipulation task a human would quickly grasp but a robot currently does not.
Suction, Grippers, and Soft Robots
So what options do we have to achieve general manipulation? Industrial automation is dominated by three competing paradigms: suction, mechanical grippers and hands, and, more recently, soft robots. Suction is the dominant method, because suction cups are deformable and conform around an object even if its location is not precisely known. Vacuum can then be applied to stiffen the suction cup and create a ring-shaped constraint that limits the object’s movement. This is very attractive, as one suction cup alone can sufficiently constrain a large number of different objects. Yet suction cannot solve every grasping problem, for example when an item is too heavy or too porous, or when further manipulation requires exact transfer of forces.
Exact transfer of forces can be achieved using mechanical grippers, which are most often deployed as parallel grippers or using two four-bar linkage mechanisms. Three-finger solutions are used much more rarely, and excel when grasping cylindrical objects from the top. A challenge with stiff mechanical grippers is that gripper velocity needs to be exactly zero at impact to avoid an undesired impulse. If contact is elastic, this impulse is mostly preserved, resulting in small objects bouncing off the gripper with high velocities. Undesirable bouncing can be reduced by making the gripper deformable to make the contact more plastic, by improving accuracy in perception so that the gripper can stop closing in time, or by caging the object within the gripper to constrain possible motions.
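The bouncing effect can be illustrated with the textbook one-dimensional collision model and a coefficient of restitution (1 for a perfectly elastic contact, 0 for a perfectly plastic one). The following is a minimal Python sketch under simplifying assumptions: the finger is treated as a free body with a hypothetical effective mass, motion is along the contact normal only, and the function name and all numbers are illustrative.

```python
def post_impact_velocities(m_finger, v_finger, m_object, v_object, restitution):
    """Velocities after a 1D collision along the contact normal.

    restitution = 1.0 models a perfectly elastic contact,
    restitution = 0.0 a perfectly plastic one (bodies move on together).
    Momentum is conserved in both cases.
    """
    total_mass = m_finger + m_object
    momentum = m_finger * v_finger + m_object * v_object
    v_finger_after = (momentum + m_object * restitution * (v_object - v_finger)) / total_mass
    v_object_after = (momentum + m_finger * restitution * (v_finger - v_object)) / total_mass
    return v_finger_after, v_object_after


if __name__ == "__main__":
    # Assumed values: a 1 kg effective finger mass closing at 0.1 m/s
    # onto a 1 g part that is initially at rest.
    for e in (1.0, 0.5, 0.0):
        _, v_part = post_impact_velocities(1.0, 0.1, 0.001, 0.0, e)
        print(f"restitution={e:.1f}: part leaves at {v_part:.3f} m/s")
```

In this model a light part hit elastically by a much heavier finger leaves at almost twice the finger's closing speed, while a plastic contact leaves it moving with the finger. A finger that is at rest at the moment of contact transfers no impulse at all.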
Taken to an extreme, these measures result in completely soft grippers, whose deformability prevents the object from bouncing and whose compliance reduces the need for highly accurate perception. Successful grasping benefits from larger contact surfaces that maximize friction and reduce the rotational degrees of freedom of an object. When grasping a rod with a square cross-section using a two-finger gripper, for example, we would want to position the gripper such that its finger pads are parallel to two surfaces of the rod. A soft gripper will likely require neither the additional perception that is needed to determine the rod’s orientation nor the associated planning, as it will conform around the object. While deformability helps to reduce the requirements on perception and planning, it makes it very difficult to exert forces on an object in a controlled way. Not only is the pose of the object within the soft hand mostly unknown, deformability also prevents the accurate transfer of forces. This might not be a problem when grasping and dropping objects, but it makes any manipulation, including simple pick and place, very difficult.
The optimal gripper therefore must be able to become as hard or as soft as the situation requires, allowing objects to be grasped with as little perception and planning as possible, while removing uncertainty about the object’s pose and providing the ability for firm handling afterwards. At the same time, grasp surfaces should maintain contact once it is made. This can be achieved by combining the techniques described above. For example, a soft gripper might become stiff using granular jamming, or a suction mechanism can be complemented by an encompassing grasp to provide additional constraints. Similarly, a mechanical gripper might be enhanced with suction devices or electrostatic pads for reversible adhesion. The human hand does an amazing job at combining these properties: the combination of hard elements (bones) and soft elements (ligaments and muscles) allows it to vary its stiffness so that it can conform around objects during grasping, while being able to precisely control a tool. These effects are supported by the soft padding of our fingertips, the friction properties of our skin, and even the stickiness that you can observe when picking up a small piece of paper using just a single finger.
There is some low-hanging fruit that will allow us to combine the advantages of soft and conventional robotics to create commercially viable general manipulation solutions. One example is impedance control applied to conventional two-finger grippers. If we control the impedance of a mechanism, we are controlling the force with which it resists external motions that are imposed by the environment. Good practical results can be achieved for robotic grippers by combining simple position control with a limit on the maximum torque the motors can exert. By limiting torque, a stiff gripper can become arbitrarily deformable (within the limits of the accuracy of its torque sensors). Like its fully deformable soft-robotic equivalent, an impedance-controlled gripper can conform to an object and make up for inaccuracies in perception. At the same time, such a design can become stiff to precisely manipulate objects. Impedance control in conjunction with sensing finger position is also a form of tactile sensing. Specifically, a gripper will be able to detect the presence of objects in the environment by monitoring both position and torque. The resulting motions are gentle and make up for inaccurate perception.
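The combination of position control and a torque limit can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the controller of any particular gripper: a single finger joint is modeled as a damped first-order system, the object as a stiff spring, and the names (close_finger, GraspResult) as well as all gains and thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GraspResult:
    contact: bool    # True if the finger stalled against something
    position: float  # final joint position in rad
    torque: float    # commanded motor torque in N*m at the end


def clamp(value: float, limit: float) -> float:
    """Saturate value to the interval [-limit, +limit]."""
    return max(-limit, min(limit, value))


def close_finger(target: float, obj_pos: Optional[float],
                 kp: float = 5.0, tau_max: float = 0.2,
                 damping: float = 0.5, k_obj: float = 50.0,
                 dt: float = 1e-3, t_max: float = 5.0,
                 stall_vel: float = 1e-3, tol: float = 1e-3) -> GraspResult:
    """Drive one finger joint toward `target` with torque-limited P control.

    The finger starts at 0 rad. An object at `obj_pos` (None for no object)
    pushes back like a spring of stiffness `k_obj` once it is touched.
    """
    pos = 0.0
    tau = 0.0
    for _ in range(int(t_max / dt)):
        # Position controller whose output is capped at the torque limit.
        tau = clamp(kp * (target - pos), tau_max)
        # Reaction torque from the object, if the finger presses on it.
        tau_contact = 0.0
        if obj_pos is not None and pos > obj_pos:
            tau_contact = k_obj * (pos - obj_pos)
        vel = (tau - tau_contact) / damping
        # Tactile cue: torque is saturated but the finger no longer moves.
        if abs(tau) >= tau_max and abs(vel) < stall_vel:
            return GraspResult(True, pos, tau)
        if abs(target - pos) < tol:
            return GraspResult(False, pos, tau)
        pos += vel * dt
    return GraspResult(False, pos, tau)


if __name__ == "__main__":
    print(close_finger(target=1.0, obj_pos=0.4))   # stalls on the object
    print(close_finger(target=1.0, obj_pos=None))  # closes freely
```

With these assumed values the finger pushes into the object by only tau_max / k_obj = 0.004 rad before it stalls, however far away the commanded target is. Lowering the torque limit makes the contact gentler, and raising it lets the same mechanism hold firmly.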
Torque-controlled grippers could also serve as a platform to integrate recent results from soft robotics research into industrial practice: augmenting fingers with suction devices at their tips and palms combines the benefits of precise control of position and force with the robustness of suction-based grasping. Torque-based sensing at finger joints could be augmented by tactile sensors that measure pressure and are strategically placed across the gripper. Tactile sensors on the palm and the tip might help to differentiate whether finger motion was inhibited by running into an obstacle or by making contact with the object of interest. Tactile sensors also directly complement vision sensors by determining when contact was made, thereby improving object pose estimation, and where in the hand an object was grasped.
Recent advances in 3D perception also bring us closer to general manipulation than ever before. 3D sensors such as the Intel RealSense are able to perceive objects as close as 11 cm from the camera at accuracies that make it possible to pick out very small items such as M3 screws, and integrated solutions have become commercially available, with my lab’s spin-off Robotic Materials Inc. just releasing a beta version of its hand. It is the interplay of accurate 3D perception, impedance control for gentle interaction with the environment, and the various ways of tactile sensing to assess grasp success that can now enable robust mobile manipulation in uncertain environments.
For example, we recently demonstrated a mobile kitting task requiring a robot to retrieve three different items (an M3 screw, an injection-molded part, and a rubber belt) from bins whose location on a table was approximately known. Despite an uncertainty of ±10 cm from the autonomous cart that navigated to different waypoints in the environment, the robot was able to locate individual bins and items using 3D perception built into the hand. Torque limiting is then used to interact gently with the bin content and to minimize the impact of a possible collision. Finally, tactile sensing, here using torque measurements, is used to assess grasp success.
Mobile manipulation for a “kitting task”. In this video, the locations of the bins are approximately known and registered using 3D perception integrated in the robot’s hand. Impedance control is used for robust interaction with the bin and its content.
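One way to read “tactile sensing using torque measurements” for grasp assessment is as a simple check after the gripper has closed: if the torque limit is reached while the fingers are still apart, something is between them. The Python sketch below is an assumption about how such a check could look, not the logic used in the demonstration; the function name, thresholds, and example readings are hypothetical.

```python
from enum import Enum


class GraspOutcome(Enum):
    HOLDING = "holding"          # torque built up with the fingers still apart
    EMPTY = "empty"              # fingers closed on each other
    NOT_SETTLED = "not_settled"  # torque has not built up yet


def assess_grasp(opening_m: float, torque_nm: float, torque_limit_nm: float,
                 closed_opening_m: float = 0.001,
                 torque_fraction: float = 0.9) -> GraspOutcome:
    """Classify a grasp from finger opening and measured motor torque.

    opening_m        -- distance between the finger pads in meters
    torque_nm        -- measured motor torque in N*m
    torque_limit_nm  -- configured torque limit in N*m
    closed_opening_m -- openings at or below this count as fully closed
    torque_fraction  -- share of the limit that counts as "pressing"
    """
    if torque_nm < torque_fraction * torque_limit_nm:
        return GraspOutcome.NOT_SETTLED
    if opening_m <= closed_opening_m:
        return GraspOutcome.EMPTY
    return GraspOutcome.HOLDING


if __name__ == "__main__":
    # Assumed readings: fingers stalled 3 mm apart, e.g. on a screw shaft.
    print(assess_grasp(opening_m=0.003, torque_nm=0.2, torque_limit_nm=0.2))
    # Assumed readings: fingers met without anything in between.
    print(assess_grasp(opening_m=0.0, torque_nm=0.2, torque_limit_nm=0.2))
```

A check like this needs no sensor beyond what the torque-limited controller already provides, which is part of what makes the approach attractive for retrofitting conventional grippers.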
Despite their impressive enabling capability for general manipulation, 3D perception, impedance control, and tactile sensing are at odds with the prevailing industrial paradigm of specialized manipulation solutions. Any form of sensing takes time and puts hard limits on end-effector velocity, which must stay low to limit impact energy in case of unexpected collisions. The drivers for general manipulation will therefore be small and medium enterprises that work on a large variety of products in small volumes, and larger players who want to differentiate their products with faster production cycles and higher degrees of customization. At the same time, mobile robots are becoming ubiquitous in warehouses, hotels, and hospitals. In situations like these, specific manipulation tasks like loading, unloading, and simple maintenance might dramatically amplify the value proposition of such robots, creating the economic forces that we need to solve general manipulation.