Deep Learning Startup Maluuba's AI Wants to Talk to You

Maluuba sees reading comprehension and conversation as key to true AI. It's built a new way to train AIs on those skills


The Canadian startup Maluuba has developed deep-learning datasets to train AI on language comprehension and dialogue
Photo: James MacDonald/Bloomberg/Getty Images

Apple’s personal assistant Siri is more of a glorified voice recognition feature of your iPhone than a deep conversation partner. A personal assistant that could truly understand human conversations and written texts might actually represent an artificial intelligence capable of matching or exceeding human intelligence. The Canadian startup Maluuba hopes to help the tech industry achieve such a breakthrough by training AI to become better at understanding languages. The key, according to Maluuba’s leaders, is building a better way to train AIs.

Like humans, AI can only get better at understanding languages by practicing. Maluuba aims to use the popular AI technique known as deep learning to improve computer systems’ language skills in key areas such as reading comprehension and having conversations. Toward that end, Maluuba has released two new sets of data designed to help deep-learning algorithms get better at those crucial language skills.

“If you teach a machine to truly understand language, you’ve truly built artificial intelligence,” says Mo Musbah, vice president of products at Maluuba. “We’re excited about teaching a machine to truly engage in conversation or language comprehension.”

Big tech companies such as Google and Microsoft already use machine learning algorithms to help automatically perform language translation. For example, the popular Google Translate service now uses deep-learning algorithms to help Google users more accurately translate written sentences from Chinese to English or vice versa. But even Google Translate still has problems translating some sentences because its underlying AI lacks the needed language comprehension skills.

The reality is that today’s AI technology is still a far cry from having the natural language skills of robots and computers depicted in science fiction films. The typical question-and-answer interactions with Apple’s Siri pale in comparison with the natural dialogue that flows between actor Joaquin Phoenix’s character and the AI named Samantha voiced by actress Scarlett Johansson in the 2013 film “Her.” Phoenix’s character eventually forms a romantic relationship with his AI companion as they share meaningful conversations that include both moments of laughter and sorrow.

Musbah brought up Samantha as an example of an AI possessing language skills far beyond today’s computer systems.

“To get to the point where you can get Samantha from ‘Her,’ you need to get the fundamental blocks of understanding language. She reads through emails and processes and provides back-and-forth dialogue. We’re excited because these are the stepping stones to get to the point where you have true AI.”

Deep-learning algorithms have the power to help AI learn on its own over time by filtering huge amounts of relevant data. In the case of fundamental language skills, that means deep-learning researchers need huge amounts of data that can challenge an AI to perform certain conversational tasks or comprehension and reasoning tasks. Creating those datasets takes both time and effort.

“The big challenge with deep learning in our space is that because it’s so data driven, the models you end up training are only as complex as the data you train them on,” says Adam Trischler, a research scientist at Maluuba.

Tech giants such as Google’s DeepMind AI lab and Facebook AI Research created the first big, publicly available datasets for machine comprehension that contained enough data to train deep-learning algorithms. DeepMind’s CNN dataset creates comprehension challenges by deleting words from certain sections of CNN news articles to create “fill-in-the-blank” questions. Facebook AI Research created a similarly large dataset by deleting certain words from the passages of children’s books.
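To make the “fill-in-the-blank” idea concrete, here is a minimal Python sketch of how a single question of this kind could be generated from a sentence. It is an illustration only, not the pipeline DeepMind or Facebook AI Research actually used: the function name `make_cloze`, the rule of blanking only words longer than four letters, and the sample sentence are all assumptions made for this example.

```python
import random


def make_cloze(sentence, rng):
    """Blank out one word of `sentence` and return (question, answer).

    Hypothetical helper for illustration; real cloze datasets use more
    careful rules for choosing which word to delete.
    """
    words = sentence.split()
    # Assumption: only blank words longer than four letters, so the
    # missing word is unlikely to be a trivial one such as "the" or "of".
    candidates = [i for i, w in enumerate(words) if len(w.strip(".,")) > 4]
    if not candidates:
        raise ValueError("no suitable word to blank out")
    idx = rng.choice(candidates)
    answer = words[idx].strip(".,")
    # Keep any trailing punctuation attached to the blank.
    words[idx] = words[idx].replace(answer, "_____")
    return " ".join(words), answer


# Made-up sample sentence, not taken from either dataset.
sentence = "The startup released a dataset of news questions."
rng = random.Random(0)  # seeded so repeated runs pick the same word
question, answer = make_cloze(sentence, rng)
print(question)
print("answer:", answer)
```

Because the answer sits in the same sentence that was blanked, a system can often recover it by matching the surrounding words, which hints at why such questions may not demand much reasoning.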

DeepMind’s and Facebook AI Research’s datasets were important first steps in training deep-learning algorithms, Trischler says. But he explains that these “fill-in-the-blank” questions can often be solved through simple methods such as context or synonym matching, rather than really challenging an AI’s language comprehension and reasoning.

So Maluuba set out to build a better dataset. It has now released the result, the “NewsQA” dataset, with more than 110,000 training questions. To build it, the startup enlisted the help of human workers through an online crowdsourcing service similar to Amazon’s Mechanical Turk. One set of workers looked at the highlights from CNN news articles and tried to come up with challenging comprehension questions. A second set of workers tried to answer those questions. And a third set of workers helped validate the pairs of questions and answers.

“We found that a large majority of the questions in our dataset do require reasoning beyond the context matching and synonym matching in previous datasets,” Trischler says. “That was our goal and we achieved that.”

Maluuba has also released a second “Frames” dataset with 1,368 dialogues to help train deep-learning algorithms on conversations. But instead of using an online crowd of anonymous workers to create the dataset, the startup invited 12 human volunteers to its Montreal-based lab. There the volunteers engaged in online chat conversations where one person pretended to be a customer looking to book a vacation and the second person pretended to be a travel agent consulting a database with information on different hotels, flights and vacation destinations.

These human-to-human conversations showed Maluuba that people frequently went back and forth on different travel routes and vacation possibilities. Examples of such dialogue challenge AI by requiring the computer systems to retain a memory of the different possibilities as a basis for comparison.

Such conversational capability remains far beyond Apple’s Siri or any online chatbots. Those can only answer questions about individual or sequential pieces of information that come in a specific order, says Layla El Asri, a research scientist at Maluuba. Previously, the most challenging dialogue dataset that was publicly available for deep-learning researchers was designed for a sequential process of searching for a restaurant with specific steps such as type of food, then budget, then geographic location.

By comparison, Maluuba’s new publicly available Frames dataset challenges deep-learning algorithms to have the memory to hold a natural conversation that can go back and forth on different points such as hotels, flights, and vacation destinations without necessarily following a specific order. The Frames dataset also allows researchers to study other aspects of natural language that still pose a huge challenge for deep-learning AI.

“The human beings did a lot of summarizing of information in the database, such as ‘The cheapest package I have is this one’ or ‘I don’t have anything under $2,000,’” El Asri says. “There is no natural-dialogue generation model that can do that kind of summarization.”
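The memory and summarizing behavior described above can be pictured with a small Python sketch. It is a toy illustration under stated assumptions, not Maluuba’s model or the Frames data format: the `DialogueMemory` class, its method names, and the travel packages and prices are all invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueMemory:
    """Hypothetical store of every option discussed so far in a chat."""

    frames: list = field(default_factory=list)

    def remember(self, **option):
        # Keep each option instead of overwriting the previous one, so the
        # conversation can return to it later in any order.
        self.frames.append(option)

    def matching(self, **criteria):
        # Return all remembered options that satisfy every criterion.
        return [
            f for f in self.frames
            if all(f.get(key) == value for key, value in criteria.items())
        ]

    def cheapest(self):
        # A summary such as "the cheapest package I have is this one".
        return min(self.frames, key=lambda f: f["price"])


# Invented packages, for illustration only.
memory = DialogueMemory()
memory.remember(destination="Paris", hotel="Hotel A", price=2400)
memory.remember(destination="Rome", hotel="Hotel B", price=1900)
memory.remember(destination="Paris", hotel="Hotel C", price=2100)

best = memory.cheapest()
paris_options = memory.matching(destination="Paris")
print("Cheapest package:", best)
print("Paris options:", paris_options)
```

Keeping a list of options is the easy part. The hard part the researchers point to is getting a learned model to decide on its own when to store, compare, and summarize options from free-form conversation.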

The Canadian startup has already begun using its datasets to train its own deep-learning algorithms to become better at both natural language comprehension and dialogue. But it has also made its new datasets publicly available to other researchers in the hopes of boosting the state of machine comprehension technology across the industry. The public release of such datasets could also raise Maluuba’s prestige if such datasets become the new industry benchmarks for testing deep-learning algorithms’ performance.

Maluuba’s bet on language as the key to elevating AI could also eventually face its own kind of test. The startup is working with a researcher at McGill University in Montreal on training an AI system that could take on the Winograd Schema Challenge, a test designed to determine how well an AI system can handle commonsense reasoning. One classic example of a Winograd Schema Challenge question goes: “I tried to put my computer inside the briefcase, but it was too small.” The AI system would have to figure out whether “it was too small” refers to the briefcase or the computer.

“The Winograd Schema Challenge is all about common sense,” Trischler says. “The reason we see that as something very important is because that goes hand-in-hand with the machine comprehension we’re working on.”

If Maluuba is right, training AI to become better at language comprehension and conversation could do much more than just deliver a more helpful Siri or smarter online chatbots. We might someday see an intelligent robot such as C-3PO or computing system such as Samantha step out of science fiction into reality.
