AI-Powered Drone Mimics Cars and Bikes to Navigate Through City Streets

Deep-learning algorithm uses car and bicycle data sets to fly a drone autonomously


Car and bicycle image data sets were used to train DroNet, a convolutional neural network that can fly a drone through the streets of a city.
Photo: Robotics and Perception Group/University of Zurich

Two years ago, roboticists from Davide Scaramuzza's lab at the University of Zurich used a set of pictures taken by cameras mounted on a hiker’s head to train a deep neural network, which was then able to fly an inexpensive drone along forest paths without running into anything. This is cool, for two reasons: The first is that you can use this technique to make drones with minimal onboard sensing and computing fully autonomous, and the second is that you can do so without collecting dedicated drone-centric training data sets first. 

In a new paper appearing in IEEE Robotics and Automation Letters, Scaramuzza and one of his Ph.D. students, Antonio Loquercio, along with collaborators Ana I. Maqueda and Carlos R. del-Blanco from Universidad Politécnica de Madrid, in Spain, present some new work in which they’ve trained a drone to autonomously fly through the streets of a city, and they’ve done it with data collected by cars and bicycles.


Most autonomous drones (and most autonomous robots in general) that don't navigate using a preexisting map rely on some flavor of simultaneous localization and mapping, or (as the researchers put it), "map-localize-plan." Building a map, localizing yourself on that map, and then planning safe motion is certainly a reliable way to move around, but it requires big, complex, power-hungry, and of course very expensive sensors and computers. And if we’re going to make commercial drones work, that’s just not feasible.

Fortunately, it’s possible to replace all that hardware with a more data-driven approach. Given a large enough data set showing the right way of doing things, you can train a neural network to respond to simple inputs (like images from a monocular camera) with behaviors that are, if not necessarily complex, at least what a human would probably do. Unfortunately, you can't easily collect training data in a real, busy environment like a city. Fortunately, there are already plenty of data sets available for these kinds of environments, thanks to the whole self-driving car thing that’s been going on for a while. Unfortunately, these data sets aren’t ideal for training a drone not to run into things, since they do include data associating camera images with steering angles but (prudently) do not include associations between camera images and collision probabilities. Fortunately, Scaramuzza and his colleagues could just collect their own let's-not-run-into-stuff training data set by putting a GoPro on a bicycle and riding through Zurich.
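
To make that two-part training recipe concrete, here is a minimal sketch of how a single network could learn from both data sets at once: one output head regresses a steering angle from the driving images, while the other classifies the bicycle-camera frames as collision or no-collision. The sketch is written in PyTorch; the layer sizes, names, and loss weighting are illustrative assumptions, not the authors' released implementation.

# Minimal sketch of the two-task training setup described above (hypothetical
# names and layer sizes; not the authors' released code). One head regresses a
# steering angle from the driving data set, the other classifies collision vs.
# no-collision frames from the bicycle-mounted camera data set.
import torch
import torch.nn as nn

class TwoHeadNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Small shared convolutional trunk (illustrative, not DroNet's exact layers).
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.steer = nn.Linear(64, 1)    # steering angle (regression)
        self.collide = nn.Linear(64, 1)  # collision logit (classification)

    def forward(self, img):
        h = self.trunk(img)
        return self.steer(h), self.collide(h)

def combined_loss(steer_pred, steer_true, coll_logit, coll_true, beta=0.1):
    # Mean-squared error on the steering angle plus a weighted binary
    # cross-entropy on the collision label; beta is an assumed trade-off weight.
    mse = nn.functional.mse_loss(steer_pred, steer_true)
    bce = nn.functional.binary_cross_entropy_with_logits(coll_logit, coll_true)
    return mse + beta * bce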

The car data set and the bicycle data set together were used to train DroNet, a convolutional neural network that can safely fly a drone through the streets of a city. 

Using a monocular camera image as input, DroNet instructs whatever UAV it’s living inside to move forward in a plane with a specific steering angle and velocity. The velocity is moderated between ludicrous speed and zero depending on the probability of a collision. All of the training data comes from outdoor city streets, but the researchers found that it actually works pretty well in other environments too, like inside buildings and garages, even though no indoor data was used to train the network.
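
As a rough illustration of that speed modulation (the constants, the smoothing filter, and the function name are assumptions for this sketch, not the paper's exact control law), the forward velocity can simply be scaled down as the predicted collision probability rises, and driven to zero when a collision looks certain:

# Illustrative mapping from the network's two outputs to a flight command.
# V_MAX, the smoothing factor, and the interface are assumptions, not the
# authors' controller.
V_MAX = 3.0  # assumed maximum forward speed, in meters per second

def command_from_outputs(steering_angle, collision_prob, prev_speed=0.0, alpha=0.7):
    # Slow down as the collision probability rises; a probability of 1.0 stops the drone.
    target_speed = (1.0 - collision_prob) * V_MAX
    # Low-pass filter the speed command so the drone doesn't jerk between frames.
    speed = alpha * prev_speed + (1.0 - alpha) * target_speed
    return steering_angle, speed

For example, a predicted collision probability of 0.9 would cut the target speed to a tenth of V_MAX before smoothing, while a probability near zero lets the drone cruise at full speed.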

For details, we spoke with Professor Scaramuzza, who directs the Robotics and Perception Group at the University of Zurich, via email:

IEEE Spectrum: Can you describe how this research is an improvement on the quadcopter forest navigation from a few years ago?

Davide Scaramuzza: DroNet uses a completely different approach with respect to that work. In that work, in fact, the network was just recognizing a trail and deriving, from the output, a discrete action to take (center, left, right). By contrast, DroNet continuously outputs control commands, resulting in much smoother performance. Additionally, DroNet can also recognize dangerous situations, such as a pedestrian or a biker crossing its path, and can promptly react to them by stopping the drone.

How well does DroNet generalize? You show that it works indoors and in parking garages; how different would the environment need to be to challenge DroNet? Are there specific situations in which your algorithms have trouble or are unreliable?

The degree of generalization is extensively shown in the video. Sometimes, even we were surprised by how well it could actually do. We tried to understand why this happens and discovered that the network responds specifically to "line-like" features in the environment. This is the case, for example, in streets, in parking lots, in indoor corridors, and in all other environments where these line-like features are present.

However, places where those features are not indicative of the motion direction, or where they are too dense, would definitely challenge DroNet. This is the case, for example, when placing the drone in a forest without a clear trail to follow.

The researchers used a data set created by Udacity to train DroNet how to steer (top row). To train the network how to avoid collisions, the researchers collected a different set of images (bottom rows: green box shows no-collision frames; red box shows potential collision frames).
Image: Robotics and Perception Group/University of Zurich

The press release says that “the research team warns from exaggerated expectations of what lightweight, cheap drones can do.” What kinds of exaggerated expectations do you mean, and what kind of expectations should we have for drones like these?

This result is just very preliminary. At the moment, the network outputs a steering angle and a probability of collision. Therefore, the motion is constrained to a constant height. Additionally, it is not integrated with other tasks, such as exploration.

What we want to show with this research is what is possible to achieve with such a simple, shallow network (DroNet only uses eight layers and runs on a small CPU, without requiring power-consuming GPUs!). Therefore, the results we achieved are relevant for all resource-constrained platforms, and could even be applied to nano drones (palm size, and a few tens of watts of power consumption) to make them navigate through urban environments.
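
For a sense of what "small" means here, a quick budget check like the one below (reusing the illustrative two-head network sketched earlier; the input size and timing loop are assumptions, and the absolute numbers depend entirely on the machine) counts parameters and times forward passes on a plain CPU, the kind of test you would run before targeting a resource-constrained platform:

# Rough CPU budget check for a shallow two-head network (illustrative only;
# this uses the sketch network from earlier in the article, not DroNet itself).
import time
import torch

net = TwoHeadNet().eval()
n_params = sum(p.numel() for p in net.parameters())
print("parameters:", n_params)

frame = torch.zeros(1, 1, 200, 200)  # assumed single grayscale input frame
with torch.no_grad():
    start = time.time()
    for _ in range(100):
        net(frame)
print("ms per frame:", (time.time() - start) * 1000 / 100)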

What are you working on next?

Our next steps are going in the direction of making the drones more agile in their maneuvers, flying much faster, and removing the 2D motion limitation. Additionally, we would like them to be a bit more intelligent. What we are aiming for is drones that can fly around and navigate exactly like birds do!

The researchers have publicly released all of their data sets, code, and the trained networks, and you can find that good stuff along with the paper itself at the link below.

[ DroNet ]
