Without a lifetime of experience to build on like humans have (and totally take for granted), robots that want to learn a new skill often have to start from scratch. Reinforcement learning lets robots learn new skills through trial and error but, especially in the case of end-to-end vision-based control policies, it takes a lot of time: The real world is a weirdly lit, friction-filled, obstacle-y mess that robots can’t understand without a frequently impractical amount of effort.
Roboticists at the University of California at Berkeley have vastly sped up this process by doing the same kind of cheating that humans do—instead of starting from scratch, you start with some previous experience that helps get you going. By leveraging a “foundation model” that was pretrained on robots driving themselves around, the researchers were able to get a small-scale robotic rally car to teach itself to race around indoor and outdoor tracks, matching human performance after just 20 minutes of practice.
That first pretraining stage happens at your leisure, by manually driving a robot (that isn’t necessarily the one that will be doing the task you care about) around different environments. The goal isn’t to teach the robot to drive fast around a course but rather the basics of not running into stuff.
With that pretrained foundation model in place, when you then move over to the little robotic rally car, it no longer has to start from scratch. Instead, you can plop it onto the course you want it to learn, drive it around once slowly to show it where you want it to go, and then let it go fully autonomous, training itself to drive faster and faster. With a low-resolution, front-facing camera and some basic state estimation, the robot attempts to reach the next checkpoint on the course as quickly as possible, leading to some interesting emergent behaviors:
The system learns the concept of a “racing line,” finding a smooth path through the lap and maximizing its speed through tight corners and chicanes. The robot learns to carry its speed into the apex, then brakes sharply to turn and accelerates out of the corner, to minimize the driving duration. With a low-friction surface, the policy learns to oversteer slightly when turning, drifting into the corner to achieve fast rotation without braking during the turn. In outdoor environments, the learned policy is also able to distinguish ground characteristics, preferring smooth, high-traction areas on and around concrete paths over areas with tall grass that impedes the robot’s motion.
The other clever bit here is the reset feature, which is necessary in real-world training. When training in simulation, it’s super easy to reset a robot that fails, but outside of simulation, a failure can (by definition) end the training if the robot gets itself stuck. That’s not a big deal if you want to spend all your time minding the robot while it learns, but if you have something better to do, the robot needs to be able to train autonomously from start to finish. In this case, if the robot hasn’t moved at least 0.5 meters in the previous 3 seconds, it knows that it’s stuck, and it will execute the simple behaviors of turning randomly, backing up, and then trying to drive forward again, which gets it unstuck eventually.
During indoor and outdoor experiments, the robot was able to learn aggressive driving comparable to that of a human expert after just 20 minutes of autonomous practice, which the researchers say “provides strong validation that deep reinforcement learning can indeed be a viable tool for learning real-world policies even from raw images, when combined with appropriate pretraining and implemented in the context of an autonomous training framework.” It’s going to take a lot more work to implement this sort of thing safely on a larger platform, but this little car is taking the first few laps in the right direction just as quickly as it possibly can.
“FastRLAP: A System for Learning High-Speed Driving via Deep RL and Autonomous Practicing,” by Kyle Stachowicz, Arjun Bhorkar, Dhruv Shah, Ilya Kostrikov, and Sergey Levine from UC Berkeley, is available on arXiv.
- Autonomous Drones Challenge Human Champions in First “Fair” Race ›
- To Fly Solo, Racing Drones Have a Need for AI Speed Training ›
- DARPA’s RACER Program Sends High-Speed Autonomous Vehicles Off-Road ›
Evan Ackerman is a senior editor at IEEE Spectrum. Since 2007, he has written over 6,000 articles on robotics and technology. He has a degree in Martian geology and is excellent at playing bagpipes.