20 Teams to Compete for $10M Telerobot XPrize

Robots will explore remote physical embodiment in the ANA Avatar XPrize Finals


Evan Ackerman is IEEE Spectrum’s robotics editor.

A woman in a VR headset holding motion controllers stands next to a humanoid robotic torso on a mobile base

Ideally, autonomous robots would be capable enough to do everything we wanted them to do, and lots of people are working very hard toward that goal. Annoyingly, though, humans are extremely capable, and with the exception of tasks that require a very specific combination of strength or speed or precision, having a human in the loop is still a good way of making sure that you get the job done. But the physical meat-sack nature of humans is annoying as well, restricting us to using our talents and (equally important) having physical experiences in only one location at a time.

The ANA Avatar XPrize seeks to solve this by enabling physical, nonautonomous avatar systems that allow remote users to see, hear, touch, and interact in real time. This isn’t a new idea, but with US $10 million up for grabs, this competition is the biggest push toward avatar robotics we’ve seen since the DARPA Robotics Challenge. And after a questionable start, the challenge evolved to (I would argue) better serve its purpose, with a final event coming up in November that will definitely be worth watching.

In the future, avatars could help provide critical care and deploy immediate responses in emergency situations, or offer opportunities for exploration and new ways of collaboration, stretching the boundaries of what is possible and maximizing the impact of skill and knowledge sharing.

Avatar robots are systems designed to provide the hardware and software necessary for a remote human to experience the robot’s environment as directly as possible, and allow the human to interact with that environment. Strictly speaking, the extent to which you can call avatar systems “robots” is debatable, because the focus is usually on fidelity of experience rather than autonomy. The systems are free to assist their users with carrying out low-level tasks, but systems that are anything more than “semiautonomous” are specifically excluded from the XPrize competition.

Avatar systems are essentially very, very fancy remote-control stations connected to mobile manipulators. For the human in the loop, this typically means (at the very least) a virtual-reality headset along with some way of directly controlling a manipulator or two. However, the concept could be extended to include wearable sensors and even brain-machine interfaces. The idea is that once you have something like this up and running, distance ceases to be a factor, and you’d be able to effectively use the avatar system whether it’s in the next room over, the next continent over, or anywhere else from deep sea to high orbit.
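To make the “remote-control station” idea concrete, here is a minimal sketch in Python (with entirely hypothetical names, not drawn from any competing team’s software) of the core mapping a nonautonomous avatar performs: every robot motion command is derived directly from an operator input, scaled and clamped for safety, and no motion is ever generated on the robot’s own initiative.

```python
from dataclasses import dataclass

@dataclass
class Delta:
    """A small Cartesian displacement, in metres, on each axis."""
    x: float
    y: float
    z: float

def operator_to_robot(op_move: Delta, scale: float = 1.0,
                      max_step: float = 0.05) -> Delta:
    """Map one operator hand movement to one end-effector command.

    The command is a pure function of the operator's input: the movement
    is scaled (e.g., to shrink large gestures into fine manipulation)
    and each axis is clamped so a sudden jerk by the operator can never
    command a step larger than max_step metres.
    """
    def clamp(v: float) -> float:
        return max(-max_step, min(max_step, v * scale))

    return Delta(clamp(op_move.x), clamp(op_move.y), clamp(op_move.z))
```

In a real system the same principle applies per joint or per degree of freedom, with latency compensation and force feedback layered on top; the point of the sketch is simply that the robot moves only when, and only as, the operator does.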

The XPrize competition, sponsored by the Japanese airline ANA, has been underway for several years now. Twenty finalist teams have been selected to compete in Los Angeles in November. While each robot meets some general guidelines (mobile, safe to operate indoors and around people, under 160 kilograms, fully untethered), each team has its own unique hardware and approach to telepresence, which should make the competition incredibly exciting.

During the final event, each of these robots (and their remote operators) will have to complete 10 tasks that test the avatar’s ability to provide remote human-to-human connection, the potential to explore places where humans cannot, and the feasibility of transporting the skills of an expert human to remote locations in real time. These tasks will measure tangible things like fidelity of remote perception (including touch), localization and navigation, and manipulation. There will also be tasks targeted toward more experiential things, including effectiveness of emotional expression and natural conversational turn-taking. While the full test tasks won’t be revealed until the final event, here are some examples of what the tasks will incorporate:

  • The avatar introduces itself to the mission commander, repeats mission goals, and activates a switch
  • The avatar moves between designated areas, identifies a heavy canister, and picks it up
  • The avatar is able to utilize a drill
  • The avatar feels the texture of objects without seeing them, and retrieves a requested object

None of these example tasks necessarily seem that complicated to perform, but the key is performing them reliably and well, especially when it comes to things that aren’t quite as easy to measure in an empirical way—like being able to give (and feel) a gentle hug. The actual scoring will be done by expert judges acting as operators, which is a really great way of ensuring that the avatar interfaces are adaptable, effective, and user friendly. During the event, the scored portion of each trial will last a maximum of 25 minutes, but the 75 minutes beforehand will be spent training the operator to use the avatar system. Seventy-five minutes may sound like a lot of time, but for a new operator on a sophisticated system, it isn’t, so teams are going to need to focus not just on making their systems easy to use but also on finding an effective way of teaching people. I really appreciate this aspect of the challenge, because (as we’ve seen in both the DARPA Robotics Challenge and the DARPA Subterranean Challenge) expert operators can accomplish amazing things, but that’s not a sustainable path for the broad adoption of practical remotely operated robots.

The final event itself will be free and open to the public in Los Angeles on 4 and 5 November. Spectators (that’s you!) will be able to see both the test course and the team garages, and there will be live broadcasts of the operator control rooms as well as feeds of what the operators are experiencing through the sensors of their avatar robots. The stakes are high, with $5 million going to the winning team, $2 million to second place, and $1 million for third.

For more details on the competition, we spoke with David Locke, the senior program director of the ANA Avatar XPrize.

IEEE Spectrum: Where did the inspiration for this competition come from?

David Locke: The vision that we had with ANA was this idea of teleportation—the idea that you could literally teleport yourself somewhere else. But we knew we didn’t have the technology to do that then, and we certainly don’t have it now, so what’s the next step? What if you could put an avatar in the middle of that vision and transport yourself anywhere in the world through that system? Teleportation might not be an option, but telepresence and telexistence are, where you can actually feel physically present in a location, using the robot as your conduit.

Compared to the DRC and SubT, where do you think that these avatar systems will be on the spectrum from robot operators to robot supervisors?

Locke: When I first came to this avatar competition, I was thinking a lot about the DRC, and how we could model this competition off of it. I actually brought in the technical lead for the DRC to help me run this competition. But to be clear, the avatar that we’re talking about is nonautonomous. This robot will not be making any movements without the operator dictating that it will make those movements. I think the future of avatars could include a combination of both nonautonomy and autonomy, but right now, we’re really focused on the nonautonomy. The reason is that it’s important for us that the operator feels connected to the environment, and one of the ways of doing that is by controlling your own movements, and not having the avatar tell you where to go. We want the user to have shared interactions that feel authentic, where you feel a full sense of embodiment in the remote environment.

But isn’t it true that sometimes when you’re remotely controlling a multiple-degrees-of-freedom robot, the task of doing so is complex enough that it makes you feel disconnected from the remote environment anyway? What about assistive autonomy to smooth that interface?

Locke: During the semifinals, we did find that a lot. The judges are in there judging their experience on both a subjective and objective level, and you would see them struggle with things like grabbing puzzle pieces. And sometimes the recipient judge would have to nudge them in the right direction. I definitely think that there’s some form of autonomy that’s going to come in and play a bigger role in avatars in the future, but it’s hard to say what that will be, and I’d love to explore it in an ANA Avatar XPrize No. 2. But I think right now, this nonautonomous zone is a good space for us, as sort of the “hello world” of avatar technology’s potential.

Can you describe what the finals will be focusing on?

Locke: We’ll be going for advanced tasks in mobility, haptics, and manipulation, with a focus on three domains: connection, the ability for humans to connect using an avatar as a conduit; skill transference, the ability to transport your skill set anywhere in the world using an avatar; and then exploration, the idea that you can use an avatar to travel anywhere from your own couch or access places that are dangerous or otherwise inaccessible to humans. Those are the three main things we’re going to hit on at the finals.

If connection is an important metric for this competition, how do you judge that in a fair way?

Locke: At semifinals, it was weighted such that the operator experience and the recipient experience were the most important parts of the competition, more important than the task-level objective scoring. That rationale has evolved: after further review of our semifinals testing data, and because we saw such positive feedback and scoring from the judges regarding both the recipient and operator experiences, for finals we are now weighting the ability to complete the required tasks more heavily. Experience absolutely remains a key factor, but of the 15 possible points teams may earn at finals, 10 points will be task based, with 5 points attributed to the operator and recipient experience. This will also help from a storytelling and audience-experience perspective.

Looking at the finalist teams, there’s a lot of variety in the hardware, and more expensive hardware can make an enormous difference to how a robot performs. How does that factor into the competition?

Locke: It’s hard to say. Your team may not have the best hardware, but if you’re able to converge and integrate different technologies to make yourself successful, you have an advantage. But you know, it’s something that we face in all XPrizes, and I imagine DARPA has a little bit of this as well: How do you make it a level playing field for all the teams? Some of the steps that we took early on in this competition were to hit really hard, very early, on the fact that teams should be looking to collaborate and share ideas and tech, and to either combine and form larger teams or find other ways to support one another. We also tried to find different experts to link the teams up with for advice, and brought in a number of different supply partners for free or discounted goods. And at the semifinals testing, we did distribute $2 million in milestone prize money.

I’ve done eight of these XPrize competitions. And the one thing that always blows me away is that for a lot of these teams, it’s not just about the money or about winning—it comes from an honest place of advancing the technology, and I’m always shocked at the level of dedication that these teams have for making progress and pushing the tech forward. —David Locke

What should our expectations be for the final event?

Locke: This competition is so different from any other robotics competition, because it’s not solely about completing the tasks. It’s about exploring and connecting people, and that can be hard to demonstrate. The audience will need to really understand what they’re seeing, because it’s hard as an audience member to detect a connection between a human and a robot, right? The audience will be able to see the robot in the trial along with the recipient, but they’ll also see an operator view that shows what the operator is seeing and why they’re making the decisions they’re making as they go through the course.

People are going to have to keep in mind that this is like the very first computer. What we see isn’t going to go straight from the test course to store shelves. It’s going to take time for the technology to advance, and this will be phase No. 1. And what I would love to see is a way of continuing this challenge with XPrize year after year to help teams hone their tech and push it toward the market.

Long term, avatar robots will, we hope, go far beyond this competition. The pandemic has shown both how important physical presence can be and how flexible it sometimes has to be. Telepresence has been an important first step, which (at least for me) has given some tantalizing hints of just how powerful embodied remote presence can potentially be. The idea of a fully immersive, embodied experience is far more compelling than screen-based telepresence, and the Avatar XPrize is going to help us get there.
