Bipedal robots are a huge hassle. They’re expensive, complicated, fragile, and they spend most of their time almost but not quite falling over. That said, bipeds are worth it because if you want a robot to go everywhere humans go, the conventional wisdom is that the best way to do so is to make robots that can walk on two legs like most humans do. And the most frequent, most annoying two-legged thing that humans do to get places? Going up and down stairs.
Stairs have been a challenge for robots of all kinds (bipeds, quadrupeds, tracked robots, you name it) since, well, forever. And usually, when we see bipeds going up or down stairs nowadays, it involves a lot of sensing, a lot of computation, and then a fairly brittle attempt that all too often ends in tears for whoever has to put that poor biped back together again.
You’d think that the solution to bipedal stair traversal would just involve better sensing and more computation to model the stairs and carefully plan footsteps. But an approach featured in an upcoming Robotics: Science and Systems conference paper from Oregon State University and Agility Robotics does away with all of that and instead just throws a Cassie biped at random outdoor stairs with absolutely no external sensing at all. And it works spectacularly well.
A couple of things to bear in mind: Cassie is “blind” in the sense that it has no information about the stairs that it’s going up or down. The robot does get proprioceptive feedback, meaning that it knows what kind of contact its limbs are making with the stairs. Also, the researchers do an admirable job of keeping that safety tether slack, and Cassie isn’t being helped by it in the least—it’s just there to prevent a catastrophic fall.
What really bakes my noodle about this video is how amazing Cassie is at being kind of terrible at stair traversal. The robot is a total klutz: it runs into railings, stubs its toes, slips off of steps, misses steps completely, and occasionally goes backwards. Amazingly, Cassie still manages not only to not fall, but also to keep going until it gets where it needs to be.
And this is why this research is so exciting: rather than trying to develop some kind of perfect stair traversal system that relies on high-quality sensing and a lot of computation to handle stairs optimally, this approach embraces real-world constraints while achieving efficient performance that’s real-world robust, if perhaps not the most elegant.
The secret to Cassie’s stair mastery isn’t much of a secret at all, since there’s a paper about it on arXiv. The researchers used reinforcement learning to train a simulated Cassie on permutations of stairs based on typical city building codes, with flights of up to eight individual steps. To transfer the learned stair-climbing strategies (referred to as policies) effectively from simulation to the real world, the simulation included a variety of disturbances designed to represent the kinds of things that are hard to simulate accurately. For example, Cassie had its simulated joints messed with, its simulated processing speed tweaked, and even its simulated ground friction jittered around. So, even though the simulation couldn’t perfectly mimic real ground friction, randomly mixing things up means that the controller (the software telling the robot how to move) gains robustness to a much wider range of situations.
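To make that idea a bit more concrete, here is a minimal Python sketch of what this kind of randomization can look like: draw a fresh set of simulator settings before each training episode. This is an illustration only, not the researchers’ code. The names (SimParams, sample_sim_params) and every numeric range are made-up assumptions; only the three kinds of things being varied (joints, processing speed, ground friction) come from the description above.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    """Hypothetical simulator settings, re-drawn for every training episode."""
    joint_damping_scale: float  # multiplier on nominal joint damping
    policy_rate_hz: float       # how often the controller gets to act
    ground_friction: float      # friction coefficient between foot and ground


def sample_sim_params(rng: random.Random) -> SimParams:
    """Draw one randomized set of simulator settings.

    All ranges are illustrative assumptions, not values from the paper.
    """
    return SimParams(
        joint_damping_scale=rng.uniform(0.5, 1.5),
        policy_rate_hz=rng.uniform(36.0, 44.0),
        ground_friction=rng.uniform(0.5, 1.1),
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # Each episode would see a slightly different "world".
    for episode in range(3):
        print(episode, sample_sim_params(rng))
```

Because the policy never trains on the same settings twice, it can’t lean on any one value being exactly right, which is the point of the exercise.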
One peculiarity of using reinforcement learning to train a robot is that even if you come up with something that works really well, it’s sometimes unclear exactly why. You may have noticed in the first video that the researchers are only able to hypothesize about the reasons for the controller’s success, and we asked one of the authors, Kevin Green, to try and explain what’s going on:
“Deep reinforcement learning has similar issues that we are seeing in a lot of machine learning applications. It is hard to understand the reasoning for why a learned controller performs certain actions. Is it exploiting a quirk of your simulation or your reward function? Is it perhaps stuck in a local minima? Sometimes the reward function is not specific enough and the policy can exhibit strange, vestigial behaviors simply because they are not rewarded or penalized. On the other hand, a reward function can be too constraining and can lead to a policy which doesn’t fully explore the space of possible actions, limiting performance. We do our best to ensure our simulation is accurate and that our rewards are objective and descriptive. From there, we really act more like biomechanists, observing a functioning system for hints as to the strategies that it is using to be highly successful.”
One of the strategies that they observed, first author Jonah Siekmann told us, is that Cassie does better on stairs when it’s moving faster, which is a bit of a counterintuitive thing for robots generally:
“Because the robot is blind, it can choose very bad foot placements. If it tries to place its foot on the very corner of a stair and shift its weight to that foot, the resulting force pushes the robot back down the stairs. At walking speed, this isn’t much of an issue because the robot’s momentum can overcome brief moments where it is being pushed backwards. At low speeds, the momentum is not sufficient to overcome a bad foot placement, and it will keep getting knocked backwards down the stairs until it falls. At high speeds, the robot tends to skip steps which pushes the robot closer to (and sometimes over) its limits.”
These bad foot placements are what lead to some of Cassie’s more impressive feats, Siekmann says. “Some of the gnarlier descents, where Cassie skips a step or three and recovers, were especially surprising. The robot also tripped on ascent and recovered in one step a few times. The physics are complicated, so to see those accurate reactions embedded in the learned controller was exciting. We haven’t really seen that kind of robustness before.” In case you’re worried that all of that robustness is in video editing, here’s an uninterrupted video of ten stair ascents and ten stair descents, featuring plenty of gnarliness.
We asked the researchers whether Cassie is better at stairs than a blindfolded human would be. “It’s difficult to say,” Siekmann told us. “We’ve joked lots of times that Cassie is superhuman at stair climbing because in the process of filming these videos we have tripped going up the stairs ourselves while we’re focusing on the robot or on holding a camera.”
A robot being better than a human at a dynamic task like this is obviously a very high bar, but my guess is that most of us humans are actually less prepared for blind stair navigation than Cassie is, because Cassie was explicitly trained on stairs that were uneven: “a small amount of noise (± 1cm) is added to the rise and run of each step such that the stairs are never entirely uniform, to prevent the policy from deducing the precise dimensions of the stairs via proprioception and subsequently overfitting to perfectly uniform stairs.” Speaking as someone who just tried jogging up my stairs with my eyes closed in the name of science, I absolutely relied on the assumption that my stairs were uniform. And when humans can’t rely on assumptions like that, it screws us up, even if we have eyeballs equipped.
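For the curious, the uneven-stairs trick is simple enough to sketch in a few lines of Python. This is a rough illustration under stated assumptions, not the authors’ implementation: the function name make_stairs and the nominal rise and run ranges are invented for the example, while the eight-step maximum and the ± 1 cm per-step noise come from the paper as described here.

```python
import random


def make_stairs(rng, max_steps=8, rise_range=(0.10, 0.20),
                run_range=(0.24, 0.30), noise=0.01):
    """Return a list of (rise, run) pairs in meters for one flight of stairs.

    rise_range and run_range are illustrative assumptions. max_steps (8)
    and noise (0.01 m, i.e. 1 cm) follow the article's description.
    """
    n_steps = rng.randint(1, max_steps)
    # Pick one nominal step size for the whole flight...
    base_rise = rng.uniform(*rise_range)
    base_run = rng.uniform(*run_range)
    # ...then perturb every step independently so no two are identical.
    return [
        (base_rise + rng.uniform(-noise, noise),
         base_run + rng.uniform(-noise, noise))
        for _ in range(n_steps)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    for rise, run in make_stairs(rng):
        print(f"rise={rise:.3f} m, run={run:.3f} m")
```

A policy trained on flights like these never gets to assume that the next step matches the last one, which is exactly the assumption a human with closed eyes tends to make.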
Like most robot-y things, Cassie is operating under some significant constraints here. If Cassie seems even stompier than it usually is, that’s because it’s using a stair-specific controller, which is optimized for stairs and stair-like things but not much else.
“When you train neural networks to act as controllers, over time the learning algorithm refines the network so that it maximizes the reward specific to the environment that it sees,” explains Green. “This means that by training on flights of stairs, we get a very different looking controller compared to training on flat ground.” Green says that the stair controller works fine on flat ground; it’s just less efficient (and noisier). The researchers are working on ways of integrating multiple gait controllers that the robot can call on depending on what it’s trying to do; conceivably this might involve some very simple perception system just to tell the robot “hey look, there are some stairs, better engage stair mode.”
The paper ends with the statement that “this work has demonstrated surprising capabilities for blind locomotion and leaves open the question of where the limits lie.” I’m certainly surprised at Cassie’s stair capabilities, and it’ll be exciting to see what other environments this technique can be applied to. If there are limits, I’m sure that Cassie is going to try and find them.
“Blind Bipedal Stair Traversal via Sim-to-Real Reinforcement Learning,” by Jonah Siekmann, Kevin Green, John Warila, Alan Fern, and Jonathan Hurst from Oregon State University and Agility Robotics, will be presented at RSS 2021 in July.
Evan Ackerman is a senior editor at IEEE Spectrum. Since 2007, he has written over 6,000 articles on robotics and technology. He has a degree in Martian geology and is excellent at playing bagpipes.