Animal-AI Olympics Will Test AI on Intelligence Tasks Designed for Crows and Chimps

Experiments drawn from Aesop’s Fables can gauge general intelligence


Are today’s best artificial intelligence (AI) systems as smart as a mouse? A crow? A chimp? A new contest aims to find out. 

The Animal-AI Olympics, which will begin this June, aims to “benchmark the current level of various AIs against different animal species using a range of established animal cognition tasks.” At stake are bragging rights and US $10,000 in prizes.

The project, a partnership between the University of Cambridge’s Leverhulme Centre for the Future of Intelligence and GoodAI, a research institution based in Prague, is a new way to evaluate the progress of AI systems toward what researchers call artificial general intelligence.

While AI systems have recently bested humans in a host of challenging competitions, including the board game Go, the poker game Texas hold’em, and the video game StarCraft, these matchups proved only that AIs can be astoundingly good at those particular tasks. AIs have yet to demonstrate the kind of flexible intelligence that enables humans to reason, plan, and act in many different domains.

To learn more about the Animal-AI Olympics, IEEE Spectrum spoke with Matthew Crosby, one of the contest’s organizers and a postdoctoral researcher at the Leverhulme Centre and at Imperial College London.

IEEE Spectrum: What was the genesis of this project?

Matthew Crosby: This idea came out of conversations with animal-intelligence researchers. You can take an animal, put it in an environment it’s never seen before, and give it a problem to solve, like getting through some contraption. Often the animal does solve the problem. Whereas if you train an AI to be great at a specific task, it doesn’t even make sense to put it in a new environment. It won’t even try to solve the problem. It just fails to behave.

Spectrum: What makes animal-cognition tests useful and interesting to AI researchers?

Crosby: A lot of animal cognition tests involve training the animal to take food from an apparatus. The researchers want to figure out if it’s succeeding because it’s clever and worked out how the apparatus works, or if it’s just repeating the pattern that it learned through trial and error. Is it succeeding through understanding or rote memorization?

We want to translate this to the AI arena and use these experiments to test for actual understanding of, say, the physics of an environment. Will the AI understand that if the food moves out of sight, it still exists?

Spectrum: Can you give me some examples of specific tests and tasks? 

Crosby: Well, the whole point of the competition is to test the AI on tasks it hasn’t seen before, so we can’t give away too much information. But we’ve picked out a few examples to share that are famous in the animal-intelligence literature.

In one classic experiment, you put an animal in front of some upside-down, opaque cups. Under one cup, you put some food, and the animal’s job is to retrieve the food. At first you put food under the same cup every time, call it the A cup—that’s equivalent to the training phase in an AI environment. Then in the testing phase, you put the food under the A cup, then take it out, very visibly, and put it under the B cup. Some animals, like chimpanzees, will go straight to the B cup. But a lot of animals will still go to the A cup, because they’ve learned the task through memorization. 

We’re also drawing from Aesop’s Fables. One experiment is based on the fable of a thirsty crow that drops pebbles into a pitcher to raise the water inside. In the lab version, an apparatus holding food floats on water inside a test tube, too far down for the crow to reach. But the crow can learn to pick up rocks and drop them in: The rocks displace the water, the water level rises, and eventually the food is high enough for the crow to get it out. In the experiment, you can have an environment with both rocks and pieces of cork, which float and barely raise the water level. The crows learn to put the rocks in, not the cork.

Spectrum: It seems like these tests get at some pretty sophisticated aspects of intelligence, like generalizing knowledge and synthesizing new information. Maybe even creative problem solving.

Crosby: This project captures a lot of elements in AI research that are considered really hard. So far, we haven’t had benchmarks for them, because our benchmarks come from existing games that humans have played in the past. Here, we’re making tasks specifically to test things like generalization and transfer learning. Even if no one does incredibly well in the competition, it will still be useful. 

Spectrum: Is this competition intended partially to puncture the hype around AI? 

Crosby: There has been a lot of hype about AI. The successes are real, like AlphaGo beating the best human Go player in the world. But what that means for general intelligence is a lot harder for people to understand. A lot of the media reports on general intelligence are a bit overblown. 

It’s important to encourage skepticism. AI has made huge progress recently: There are problems that we can solve today that we couldn’t solve only three or four years ago. We just have to be careful about explaining what that means. An AI can be great at one task, but can it solve similar tasks that it hasn’t seen before? This competition is testing for exactly that kind of thing. Maybe we’ll be surprised by how well the AI agents do. But we think the problems we’re putting forward are very hard. 

Spectrum: What’s the procedure and schedule for this competition?

Crosby: We have about 50 tasks from the animal-intelligence literature now. In April, we’ll put out the full packets of information about the competition. In June, the competition goes live: We’ll release everything, and people can start working on it. We’ll provide lots of training environments with lots of objects, so the agents will have seen all the environments and objects, and all the details will be there for them to learn from. But it’s a generalization challenge, so in the tests they’ll have to use the objects in different ways. In December, we’ll have the final results.

Spectrum: Do you think successful AI agents will have to display common sense?

Crosby: There are research groups working on teaching AI an intuitive understanding of physics: What are the rules of the world it’s living in? I hope the people who are working in that area will enter this competition. They’re doing things that are really interesting, but may not have a test-bed that enables them to say, “We’re making good progress here.” I’m hoping they’ll hear about the Animal-AI Olympics and think, “This is our time to shine.”

About the Tech Talk blog

IEEE Spectrum’s general technology blog, featuring news, analysis, and opinions about engineering, consumer electronics, and technology and society, from the editorial staff and freelance contributors.