The vast majority of the fancy autonomous flying we've seen from quadrotors has relied on some kind of external localization for position information. Usually it's a motion capture system, sometimes it's GPS, but either way, there's a little bit of cheating involved. This is not to say that we mind cheating, but the problem with cheating is that sometimes you can't cheat, and if you want your quadrotors to do tricks where you don't have access to GPS or the necessary motion capture hardware and software, you're out of luck.
Researchers are working hard towards independent autonomy for flying robots, and we've seen some impressive examples of drones that can follow paths and avoid obstacles using only onboard sensing and computing. The University of Pennsylvania has been doing some particularly amazing development in this area, and they've managed to teach a swarm of a dozen 250-gram quadrotors to fly in close formation, even though each one is using just one small camera and a simple IMU. This is probably the largest swarm of quadrotors that don't rely on motion capture or GPS.
Each little quadrotor is equipped with a Qualcomm Snapdragon Flight development board. The board includes an onboard quad-core computer, a downward-facing VGA camera with a 160° field of view, a VGA stereo camera pair, and a 4K video camera. For these flights, though, the drones are only using one or two cores of processing power (running ROS), a simple onboard IMU, and the downward-facing VGA camera.
Each quadrotor's job is to use visual inertial odometry (VIO) to estimate how far and in what direction it's moved from its starting position, which gives a good approximation of its relative location. To do this, it simply identifies and tracks visual features in its camera field of view: if the drone's camera sees an object, and that object moves right to left across the frame, the drone can infer (with some help from its IMU) that it's moving left to right. Either that, or there's an earthquake going on. Dead reckoning approaches like these do result in some amount of drift, where small errors in position estimation build up over time, but UPenn has managed to keep things under control, with overall positional errors of just over half a meter even after the drones have flown over 100 meters.
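To get a feel for how dead-reckoning drift accumulates, here is a minimal Python sketch. It is not UPenn's VIO pipeline: it simply sums per-frame displacement estimates that are each off by a small scale bias. The function name `dead_reckon` and the 0.5 percent bias are our own assumptions, with the bias picked only so that the result lands near the roughly half-meter-per-100-meters figure quoted above.

```python
import math


def dead_reckon(true_steps, scale_bias=0.005):
    """Sum per-frame displacement estimates that are each slightly wrong.

    true_steps: list of (dx, dy) true displacements in meters.
    scale_bias: hypothetical fractional error on every estimate (assumption).
    Returns the final estimated position and the error after each step.
    """
    est_x = est_y = 0.0    # where the drone thinks it is
    true_x = true_y = 0.0  # where it actually is
    errors = []
    for dx, dy in true_steps:
        true_x += dx
        true_y += dy
        # The estimate overshoots each step by the same small fraction.
        est_x += dx * (1.0 + scale_bias)
        est_y += dy * (1.0 + scale_bias)
        errors.append(math.hypot(est_x - true_x, est_y - true_y))
    return (est_x, est_y), errors


# Fly 100 m in a straight line, 10 cm per camera frame.
steps = [(0.1, 0.0)] * 1000
estimate, errors = dead_reckon(steps)
print(f"error after 100 m: {errors[-1]:.2f} m")
```

Nothing in this loop ever pulls the estimate back toward the truth, so the error can only grow with distance flown. That is the weakness of dead reckoning, and it is why a half-meter error after 100 meters is a good result.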
As each drone keeps track of its own position, it sends updates at 10 Hz over 5 GHz Wi-Fi to a ground station running ROS. The ground station collects all of those position updates, and sends commands back to the swarm to change formation. The only thing that the individual drones get back is a set of target coordinates and a time to start moving; each drone calculates its own trajectory, meaning that the ground station isn't doing all of the planning. This keeps things lightweight and distributed, so that the swarm can easily scale up to more drones. However, it's worth noting that as far as each drone is concerned, it's not really part of a swarm at all: it's just monitoring its own position and moving from coordinate to coordinate, and isn't aware (directly or indirectly) that there are other drones around it. Still, the system works very well, and as you can see from the video, the drones don't run into each other.
The outdoor testing that the video shows is notable for a few reasons. Some of it is done in very low light, which is always impressive to see for any VIO system, since it depends on identifying enough features to make a good position estimate, a tricky thing at night. And you can't tell from the video, but the average wind speed during the outdoor tests was 10 mph, with 18 mph gusts. The tracking is very robust, in other words, which makes it more likely to be useful rather than just a novel demo.
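The division of labor described here (ground station sends a goal and a start time, drone plans its own path) can be sketched in a few lines of Python. This is an illustration under our own assumptions, not the team's code: the `GoalCommand` message, the `position_at` function, and the fixed `duration` are hypothetical, and we picked a minimum-jerk quintic blend only because it is a common way to get a smooth point-to-point motion. The article doesn't say which trajectory type the drones actually use.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class GoalCommand:
    """Hypothetical message from the ground station to one drone."""
    target: Vec3       # where to go, in the drone's own frame (meters)
    start_time: float  # when to start moving (seconds)


def position_at(t: float, start_pos: Vec3, cmd: GoalCommand,
                duration: float) -> Vec3:
    """Desired position at time t, computed onboard the drone.

    Uses a minimum-jerk blend (an assumption), which starts and ends
    with zero velocity and zero acceleration.
    """
    # Normalized time, clamped so the drone holds position before the
    # start time and after it arrives.
    s = (t - cmd.start_time) / duration
    s = min(max(s, 0.0), 1.0)
    blend = 10 * s**3 - 15 * s**4 + 6 * s**5
    return tuple(p0 + blend * (p1 - p0)
                 for p0, p1 in zip(start_pos, cmd.target))


# One drone, told to move 2 m forward starting at t = 10 s.
cmd = GoalCommand(target=(2.0, 0.0, 1.0), start_time=10.0)
start = (0.0, 0.0, 1.0)
for t in (10.0, 12.0, 14.0):
    print(t, position_at(t, start, cmd, duration=4.0))
```

The command carries only a goal and a time, so the per-drone message stays the same size no matter how many drones are in the swarm.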
This research was done at the University of Pennsylvania in Vijay Kumar's lab, by Aaron Weinstein, Adam Cho, and Giuseppe Loianno. We spoke with Giuseppe for more details.
IEEE Spectrum: Why hasn't this been done before? Can you describe the key problems you were able to solve?
Giuseppe Loianno: Compared to previous work, our solution presents the following main differences. First, we developed a scalable and extensible architecture for controlling multiple vision-based quadrotors. We use the term scalable to refer to the ease of adding additional agents to the system without sacrificing overall performance. Second, this is the first time that perception, planning and control are combined for autonomous navigation of multiple interchangeable aerial vehicles (up to 12 quadrotors) without relying on GPS or an external motion capture system. Finally, we use commercially available components and provide our source code online. This is the largest swarm of autonomous quadrotors that does not rely on motion capture or GPS position. We want anyone to be able to run experiments with multiple MAVs without the use of expensive motion capture systems.
The paper mentions that "chalk and colored tape was applied to the floor to provide ample features for VIO tracking" for indoor experiments. How much is the real-world usefulness of the system constrained by its need for visual features?
Yes, it is true we used chalk and colored tape while benchmarking inside the lab but, as you can see from the video, it works outdoors, and even in poorly lit environments. The real world is feature rich. Only human-made environments have clean, polished floors.
Although only a VGA camera was used in this work, the quadrotors were equipped with much more sophisticated sensors. Is the performance of the swarm scalable with just the VGA camera and IMU, or would using other sensors be necessary? What is the minimum hardware requirement to make this work?
We think it is scalable with just a single camera and IMU. Right now all the autonomy components are running onboard, and for this reason we can have as many vehicles as the available space allows. However, other sensors, like the stereo camera, can be useful to map the environment and identify other vehicles on the fly to obtain a fully distributed solution.
It sounds like deployment of the swarm requires both a ground station and a careful initial setup of the drones at known offsets. How could the swarm be made easier to deploy and more independent of ground infrastructure?
In any multi-robot deployment the robots need to know where the other robots are. We plan to use a localization algorithm that identifies each vehicle relative to its neighbors, with loop closure algorithms. Remember that the vehicles can communicate with each other, and with a base station during flight.
What are you working on next?
We are working on robustness, loop closure, and dense reconstruction. We are developing a loop closure module to reduce the small drift that occurs with visual odometry. Right now each robot needs to know its start location, but once the loop closure problem is solved, that will no longer be needed. These vehicles will soon be deployed to obtain collaborative dense reconstruction of the environment in a distributed fashion in a disaster scenario. We are reducing the role of the ground station and we are working on new tactics to increase formation resilience. Finally, we are enhancing the human-robot interaction aspect, allowing the team of robots to support missions in coordination with humans.