Despite the ubiquity of drones nowadays, it seems to be generally accepted that learning how to control them properly is just too much work. Consumer drones are increasingly being stuffed full of obstacle-avoidance systems, based on the (likely accurate) assumption that most human pilots are to some degree incompetent. It’s not that humans are entirely to blame, because controlling a drone isn’t the most intuitive thing in the world, and to make it easier, roboticists have been coming up with all kinds of creative solutions. There’s body control, face control, and even brain control, all of which offer various combinations of convenience and capability.
The more capability you want in a drone control system, usually the less convenient it is, in that it requires more processing power or infrastructure or brain probes or whatever. Developing a system that’s both easy to use and self-contained is quite a challenge, but roboticists from the University of Pennsylvania, U.S. Army Research Laboratory, and New York University are up to it—with just a pair of lightweight gaze-tracking glasses and a small computing unit, a small drone will fly wherever you look.
While we’ve seen gaze-controlled drones before, what’s new here is that the system is self-contained, and doesn’t rely on external sensors, which have been required to make a control system user-relative instead of drone-relative. For example, when you’re controlling a drone with a traditional remote, that’s drone-relative: You tell the drone to go left, and it goes to its left, irrespective of where you are, meaning that from your perspective it may go right, or forwards, or backwards, depending on its orientation relative to you.
User-relative control takes your position and orientation into account, so that the drone instead moves to your left when it receives a “go left” command. In order for this to work properly, the control system has to have a good idea of both the location and orientation of the drone, and the location and orientation of the controller (you), which is where in the past all of that external localization has been necessary. The trick, then, is being able to localize the drone relative to the user without having to invest in a motion-capture system, or even rely on GPS.
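The core of user-relative control is a frame transformation: a command given in the user's frame has to be rotated into the world (or drone) frame using the user's heading. A minimal sketch of that idea, with hypothetical function and parameter names (the paper's actual controller is not shown here):

```python
import math

def user_relative_to_world(cmd_xy, user_yaw):
    """Rotate a command given in the user's frame (forward, left)
    into the world frame, so "go left" means the *user's* left.

    cmd_xy: (forward, left) components of the command.
    user_yaw: the user's heading in radians (0 = world +x axis).
    """
    fwd, left = cmd_xy
    c, s = math.cos(user_yaw), math.sin(user_yaw)
    # Standard 2D rotation from the user's frame into the world frame
    wx = c * fwd - s * left
    wy = s * fwd + c * left
    return wx, wy

# A "go left" command (fwd=0, left=1) from a user facing world +y
# (yaw = 90 degrees) maps to roughly world -x.
world_cmd = user_relative_to_world((0.0, 1.0), math.pi / 2)
```

This is exactly why the system needs to know the user's orientation: without `user_yaw`, the rotation above can't be computed, and the command degenerates to drone-relative control.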
Using data from the eye-tracking glasses, the researchers compute a 3D navigation waypoint that tells the drone where to go. Photo: NYU
Making this happen depends on some fancy hardware, but not so fancy that you can’t buy it off the shelf. The Tobii Pro Glasses 2 is a lightweight, noninvasive, wearable eye-tracking system that also includes an IMU and an HD camera. The glasses don’t have a ton of processing power onboard, so they’re hooked up to an NVIDIA Jetson TX2, a portable module with both a CPU and a GPU. With the glasses on, the user just has to look at the drone; the camera on the glasses detects it using a deep neural network, and then calculates how far away it is based on its apparent size. Along with head orientation data from the IMU (with some additional help from the camera), this allows the system to estimate where the drone is relative to the user.
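Estimating distance from apparent size is the classic pinhole-camera relationship: a known physical size that spans fewer pixels must be farther away. A sketch of that calculation, with made-up numbers (the paper's actual drone dimensions and camera focal length are assumptions here):

```python
def estimate_distance(apparent_width_px, true_width_m, focal_length_px):
    """Pinhole camera model: depth = f * W / w, where f is the focal
    length in pixels, W the object's true width in meters, and w its
    apparent width in pixels in the image."""
    return focal_length_px * true_width_m / apparent_width_px

# Hypothetical numbers: a 0.3 m-wide drone spanning 60 px in a camera
# with a 600 px focal length works out to 3.0 m away.
depth_m = estimate_distance(60, 0.3, 600)
```

Note that this only works because the drone's physical size is known in advance; the same detection box on an unknown object would give no depth at all.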
And really, that’s the hard part. Then it’s just a matter of fixating your gaze somewhere else, having the glasses translate where your eyes are looking into a vector for the drone, and then sending a command to the drone to fly there. Er, well, there is one other hard part, which is turning where your eyes are looking into a 3D point in space rather than a 2D one, as the paper explains:
To compute the 3D navigation waypoint, we use the 2D gaze coordinate provided from the glasses to compute a pointing vector from the glasses, and then randomly select the waypoint depth within a predefined safety zone. Ideally, the 3D navigation waypoint would come directly from the eye tracking glasses, but we found in our experiments that the depth component reported by the glasses was too noisy to use effectively. In the future, we hope to further investigate this issue in order to give the user more control over depth.
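The quoted procedure, back-projecting the 2D gaze coordinate into a pointing ray and then picking a depth at random inside a safety zone, can be sketched as follows. The intrinsics, the depth range, and all function names are illustrative assumptions, not the paper's implementation:

```python
import math
import random

def gaze_to_waypoint(gaze_px, intrinsics, depth_range=(1.0, 3.0)):
    """Back-project a 2D gaze point (u, v) in pixels into a unit
    pointing ray using pinhole intrinsics (fx, fy, cx, cy), then
    choose the waypoint depth at random within a safety zone, since
    the gaze depth reported by the glasses is too noisy to use."""
    fx, fy, cx, cy = intrinsics
    u, v = gaze_px
    # Ray direction in the camera frame (z points out of the camera)
    ray = ((u - cx) / fx, (v - cy) / fy, 1.0)
    norm = math.sqrt(sum(c * c for c in ray))
    unit = tuple(c / norm for c in ray)
    # Random depth inside the predefined safety zone
    depth = random.uniform(*depth_range)
    return tuple(depth * c for c in unit)

# Gaze at the image center of a hypothetical 640x480 camera:
wp = gaze_to_waypoint((320, 240), (600, 600, 320, 240))
```

The distance from the camera to the returned waypoint always falls inside `depth_range`, which is the "safety zone" the authors describe; only the direction comes from the eyes.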
It’s somewhat remarkable that the glasses report depth information from pupil-tracking data at all, to be honest, and you can see how this would be super difficult: it means detecting the tiny difference in the vergence of your eyes between looking at something 20 feet away and something 25 feet away. Those 5 feet could easily be the difference between an intact drone and one that’s in sad little pieces on the ground, especially if the drone is being controlled by an untrained user, which is after all the idea.
The researchers are hoping that eventually, their system will enable people with very little drone experience to safely and effectively fly drones in situations where finding a dedicated drone pilot might not be realistic. It could even be used to allow one person to control a bunch of different drones simultaneously. Adding other interaction modes (visual, vocal, gestural) could add capabilities or perhaps even deal with the depth issue, while a system that works on gaze alone could potentially be ideal for people who have limited mobility.
“Human Gaze-Driven Spatial Tasking of an Autonomous MAV,” by Liangzhe Yuan, Christopher Reardon, Garrett Warnell, and Giuseppe Loianno, from the University of Pennsylvania, U.S. Army Research Laboratory, and New York University, has been submitted to ICRA 2019 and IEEE Robotics and Automation Letters. And extra special congratulations to Giuseppe Loianno, who is now one of the newest assistant professors at NYU and is starting a new Agile Robotics and Perception Lab there.
Evan Ackerman is a senior editor at IEEE Spectrum. Since 2007, he has written over 6,000 articles on robotics and technology. He has a degree in Martian geology and is excellent at playing bagpipes.