New Guidelines for $10 Million Avatar XPRIZE Promise Compelling Robot Challenge

With some input from the robotics community, XPRIZE's new guidelines will make this a robotics competition worth getting excited about


Evan Ackerman is IEEE Spectrum’s robotics editor.

Conceptual illustration of a human avatar relationship.
Illustrations: iStockphoto

Earlier this year, XPRIZE announced a new challenge: a four-year global competition to “develop real life avatars,” with a US $10 million prize sponsored by All Nippon Airways (ANA). We like robot challenges, especially robot challenges with prizes big enough to attract top-notch competition, and the idea of creating remote presence systems that can do more than just send back video is a compelling one, with all kinds of potential use cases.

However, our first reaction to the sample of potential challenge scenarios published by XPRIZE was that they weren’t nearly difficult or compelling enough, meaning that the challenge wouldn’t promote the kind of cutting-edge innovation that we (and presumably XPRIZE) would like to see. To their credit, XPRIZE has put a lot of work into incorporating feedback on those initial guidelines from a variety of experts, and today they are releasing a revised version of the challenge guidelines that has been completely recalibrated for a more difficult challenge with more long-term relevance, one that we’re now super excited for.

Here’s a brief overview of XPRIZE’s inspiration for the Avatar challenge:

The ANA Avatar XPRIZE seeks to incentivize innovators around the world to imagine a future with avatars and integrate several emerging and exponential technologies to create a useful and functional physical robotic Avatar System. Current investment and research tend to focus on the development and incremental improvements of individual component technologies, rather than bringing together synergistic technologies to support transformational leaps. A successful solution to this challenge will enable humankind to take the next step in transcending the limits of physical transportation, leading to a more connected world.

An “avatar,” in this context, is a robotic system that’s designed to leverage the capabilities of having a human operating a robot remotely. While robots are good at lots of things, like being places and doing things that you don’t want to for whatever reason, complex real-world autonomy is still super hard. So tossing a human in the loop to help the robot interpret the world and make good decisions can help make a more effective system overall. 

The problem right now is that we don’t have all that many robots that are able to take advantage of remote humans in a comprehensive way. The DARPA Robotics Challenge showed us how critical humans were to robots doing complex remote tasks, but it also showed us how difficult it was to get that to work—the robots themselves generally just sent back visual information, and the operators (highly trained experts) interacted with the robots mostly using keyboard inputs, a very precise but not all that efficient or intuitive system. The Avatar XPRIZE Challenge will be very different. In order to complete the challenge, the robot avatar will need to serve as a surrogate for a remotely connected human, carrying out tasks and transmitting a variety of detailed sensory data back to the operator. 

So far, this is all essentially the same concept that we were introduced to back in March, but the details of the challenge have seen some significant updates. We spoke with Amir Banifatemi, general manager for innovation and growth at XPRIZE, who explained that the XPRIZE organization has been holding meetings with educational institutions, students, research labs, and industry players over the past several months, using their perspective and input to conduct a step-by-step revision of the challenge guidelines. They considered things like the implied meaning of the prize, what an avatar should be, what scenarios in the future an avatar might be useful for, what the overall impact could be, and how to calibrate the challenge to an appropriate difficulty.

With all of that in mind, XPRIZE has conceptually subdivided the operation and functionality of avatars into three primary modes:

Mode 1—Operator Control Mode: Avatars serve primarily as remote agency mechanisms for the Operator. In this mode the robotic Avatar only does what the Operator does. The goal of this mode is to connect the human operator with another human or humans at a distance in such a way that the Operator feels themselves to be interacting with others as if they were truly in the remote location.

Mode 2—Enhanced Avatar Mode: In this mode, the Avatar provides enhanced capabilities to the Operator, such as special skills like seeing in infrared, being able to map out and analyze an environment, or utilizing the muscle memory of experts that have trained it.

Mode 3—Semi-Autonomous Mode: The Avatar takes input such as goals from the Operator, processes that input for understanding and then executes details based on that understanding, still under primary Operator control. 

It’s important to note that XPRIZE is very deliberately deemphasizing autonomy here—“avatars that are entirely or primarily autonomous will not be considered in this competition.” Partial or assistive autonomy is just fine, but essentially, the robot should not be making high-level decisions on its own. 

Each of these modes will be tested through a series of tasks making up different realistic scenarios. A scenario could be drawn from domains like health care, family connectivity, maintenance tasks, disaster relief, or exploration, and the tasks will involve specific interactions with remote humans or objects, inspired by real-world capabilities that would be useful in a robotic avatar. Here are some examples for each of the operating modes:

Mode 1—Operator Control Mode

  • The Operator, through the Avatar, selects a ball of a color specified by the Recipient and throws it to the Recipient.
  • The Operator, via the Avatar, gives the Recipient a gentle hug. The Operator can feel some aspect of the hug returned by the Recipient.
  • The Recipient tells the other person where they had a minor injury by touching that part of the Avatar. The Operator must respond in a way that shows they know exactly where the Avatar was touched because the sensation has been conveyed from the Avatar to the Operator.
  • The Avatar holds a cube that weighs between 0.5 and 3 kg in its hand. The Operator feels the weight and can guess the weight of the cube.
  • The Avatar is blindfolded and given an odd shaped object and through touch alone must convey enough haptic information back to the Operator so they can guess what that object might be.

Mode 2—Enhanced Avatar Mode

  • The Avatar conveys the location of a surrogate for a live person or animal buried under rubble by assessing the environment for heat signatures.
  • The Avatar uses its extrasensory abilities (such as noting vibrations or smells or ultrasonic hearing) to assess the danger in a particular location and conveys that information, along with the perceived level of threat to the Operator.
  • The Avatar conveys general sensor information about the environment to the Operator via an easily readable Heads-up Display.
  • The Avatar reports on how many other humans are currently in its environment.

Mode 3—Semi-Autonomous Mode

  • The Avatar takes a command from the Operator to measure the perimeter, ingress and egress points of its environment and does so, producing a graphic map with the distances and openings clearly marked which is sent back to the Operator.
  • The Avatar continues to perform a given command such as taking a box off the shelf, or opening a door, while experiencing a network outage.
  • The Avatar continually scans sensors in its environment and sends that data back to the Operator via an easily readable Heads-up Display. Areas of concern are designated by some means of attentional focus, such as a blinking color.
  • The Avatar recovers from an unexpected glitch in balance, latency, or power.
  • The Avatar requests a spoken command to be repeated when it fails to hear correctly, such as in a noisy room.

These are all just suggestions from XPRIZE, of course, and the tasks and scenarios in the competition might be totally different. XPRIZE is also planning on incorporating scenario suggestions from the teams themselves, as well as suggestions from the public, which’ll definitely add some creative excitement to the event.

Making a robot that can do all of these things is going to be difficult. It’s certainly possible, though—there’s nothing here that robots or avatars couldn’t do, and much (if not all) of it has been demonstrated already, in many cases without a human in the loop at all. XPRIZE told us that they expect that there aren’t a lot of groups that would be able to build a robot avatar like they have in mind all on their own, and the challenge will allow groups with different specialities to work together and compete as a single team. 


As far as the actual hardware and interfaces that XPRIZE is looking for, there aren’t a whole lot of restrictions. Avatars can look like pretty much anything, and if there’s commercially available hardware that fits the bill, teams are free to just buy it. XPRIZE suggests that because avatars are supposed to interact with remote humans in human environments, and to some extent collect perception information in a way that humans can easily understand, some humanoid elements are probably at least a little bit important. But if you think you can solve this challenge with a hexapod or a drone or something, then go for it.

The guidelines specify that avatars (and their human operators) will both have access to fast and reliable connectivity, which sounds great, but we were wondering if it’s a realistic representation of the situations under which avatars might be the most useful, like in disaster areas. Banifatemi told us that XPRIZE is confident that pervasive high speed connectivity will be the norm in the future, and anyway, they don’t want to make this challenge about solving those particular communication problems. What the challenge will emphasize is latency management—being able to handle the fact that no matter how fast your Internet is, it still may take a tangible amount of time for data to make the trip from your robot to you and back again, especially if you throw wireless networks into the mix. Avatars will have to figure out how to deal with this, which is likely where much of the assistive autonomy will come in, since it may not be practical to expect the human operator to make continuous decisions in real time.
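To make the latency point a little more concrete, here is a minimal Python sketch of how an avatar system might decide when to lean on assistive autonomy. Nothing in it comes from the XPRIZE guidelines: the function name, the mode labels, and the 150-millisecond threshold are all hypothetical, chosen only to illustrate the idea of switching control modes based on the measured round-trip delay.

```python
from statistics import median

# Hypothetical threshold: above this round-trip delay, direct teleoperation
# is assumed to feel too sluggish. The value is illustrative, not from XPRIZE.
DIRECT_CONTROL_MAX_RTT_MS = 150.0


def choose_control_mode(rtt_samples_ms, link_up=True):
    """Pick a control mode from recent round-trip-time samples (milliseconds)."""
    if not link_up or not rtt_samples_ms:
        # No usable link or no measurements: keep working on the last command.
        return "continue_last_command"
    # The median is less sensitive to one-off spikes than the mean.
    typical_rtt = median(rtt_samples_ms)
    if typical_rtt <= DIRECT_CONTROL_MAX_RTT_MS:
        return "direct_operator_control"
    return "assistive_autonomy"


if __name__ == "__main__":
    print(choose_control_mode([40, 55, 60]))
    print(choose_control_mode([300, 420, 380]))
    print(choose_control_mode([], link_up=False))
```

A real system would presumably blend these modes rather than switch between them abruptly, and would have to choose its thresholds per task, but the basic shape (measure the delay, then decide how much to hand over to the robot) is the kind of latency management the challenge seems to have in mind.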

The only bit of (slightly) bad news that we have about these updated guidelines is that XPRIZE’s original idea of having untrained operators running the avatars is likely going to change. The thinking was that in order to maximize versatility and usefulness, just about anyone would be able to use these systems with a small amount of training. We liked this idea, and XPRIZE did too, but it turned out not to be practical: the issue is that it would be very difficult to make sure that the operators would be able to judge each system fairly. In order to help make sure that the success of the avatar systems can be at least somewhat generalized (and isn’t completely dependent on a highly trained and experienced dedicated expert operator), they’ll be operated by a group of reasonably experienced judges who receive a small amount of training on each system.

The timeline for the ANA Avatar XPRIZE starts today, so you can go register right now if you’re confident enough. I believe in you! XPRIZE expects to produce a more formal rules document in early 2019, and they’ll hold a summit for registered teams sometime in Q2 of next year. The rules and guidelines will continue to evolve until registration closes, at which point XPRIZE should be able to finalize the specific kinds of tasks that teams should expect to perform in the first (and only) milestone competition. That competition will take place in 2021, with up to 20 teams splitting $2 million and advancing to the finals. The winner of the finals, held in early 2022, will take home the $8 million grand prize all by themselves.

XPRIZE is looking forward to seeing the creative ways in which teams approach this challenge, Amir Banifatemi told us. They’re trying very hard not to be too prescriptive with the rules and guidelines, which is part of the reason that the revisions have been so significant. “We want people to use bold thinking and their imaginations about how they’re actually going to build these avatars,” Banifatemi says. For our part, we really appreciate how much thought and effort XPRIZE is putting into this, and how open they are to incorporating community feedback to make this competition as useful and compelling as possible, and we’re very much looking forward to seeing it kick off properly next year.

