“I had a half-cup of oatmeal, with two tablespoons of maple syrup and a cup of coffee. Oh, I put a handful of blueberries in the oatmeal, and there was milk in the coffee. It was skim milk.”
Ask someone what they had for breakfast, and this is the kind of description you might get. And that’s one of the reasons keeping track of food intake is such a problem for tech that’s meant to help a person lose weight or stick to a diet for other reasons.
Logging food for nutrition and calories is important to sticking to a diet, according to Susan Roberts, director of the Boston-based Energy Metabolism Lab at Tufts University. “It makes people more self-aware about the junk they are eating and how little they actually enjoy it, and the shock of huge portions, et cetera. But currently, it is really tedious to log your food,” she said in a press release. “A spoken-language system that you can use with your phone would allow people to log food wherever they are eating it, with less work.”
Roberts approached the Spoken Language Systems Group at MIT for a fix, and a team of engineers presented their prototype solution last week at the IEEE International Conference on Acoustics, Speech, and Signal Processing, in Shanghai. The system can take a person’s natural speech about a meal, understand it, match it to items in a U.S. Department of Agriculture database, and retrieve the right nutrition information.
There were two main challenges, according to James Glass, the MIT senior research scientist whose student, Mandy Korpusik, presented the results in Shanghai. The first is to understand what a person is saying. That means a machine needs to know that in “bowl of oatmeal,” bowl is a quantity referring to the food oatmeal. But in “oatmeal cookie,” oatmeal is describing the food, which is the cookie.
Glass’s team used a type of machine learning called conditional random fields. It’s a form of pattern recognition that’s particularly well suited to sequential data such as gene sequences and spoken language, because it takes the context of a sample into account.
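The way context resolves a label can be sketched with a toy linear-chain decoder. The weights below are hand-set for illustration (a real conditional random field learns them from labeled data, and this is not the MIT team's model); the point is that the same word, “oatmeal,” gets a different label depending on its neighbors:

```python
LABELS = ["QUANTITY", "DESCRIPTION", "FOOD", "OTHER"]

# Hand-set weights for illustration; a trained CRF learns these from data.
EMIT = {  # how well a word fits a label, taken in isolation
    ("bowl", "QUANTITY"): 2.0,
    ("of", "OTHER"): 1.0,
    ("oatmeal", "FOOD"): 1.0,
    ("oatmeal", "DESCRIPTION"): 1.0,  # ambiguous on its own
    ("cookie", "FOOD"): 2.0,
}
TRANS = {  # how well one label follows another -- the "context"
    ("QUANTITY", "OTHER"): 1.0,
    ("OTHER", "FOOD"): 1.0,
    ("DESCRIPTION", "FOOD"): 1.5,
}

def viterbi(tokens):
    """Return the highest-scoring label sequence for the tokens."""
    # score[i][lab] = best score of any labeling of tokens[:i+1] ending in lab
    score = [{lab: EMIT.get((tokens[0], lab), 0.0) for lab in LABELS}]
    back = []
    for tok in tokens[1:]:
        prev, col, ptr = score[-1], {}, {}
        for lab in LABELS:
            best = max(LABELS, key=lambda p: prev[p] + TRANS.get((p, lab), 0.0))
            col[lab] = prev[best] + TRANS.get((best, lab), 0.0) + EMIT.get((tok, lab), 0.0)
            ptr[lab] = best
        score.append(col)
        back.append(ptr)
    # trace the best path backward from the best final label
    last = max(LABELS, key=lambda lab: score[-1][lab])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

print(viterbi(["bowl", "of", "oatmeal"]))  # ['QUANTITY', 'OTHER', 'FOOD']
print(viterbi(["oatmeal", "cookie"]))      # ['DESCRIPTION', 'FOOD']
```

Because the transition scores reward a DESCRIPTION-then-FOOD pairing, “oatmeal cookie” comes out with oatmeal as the modifier, while “bowl of oatmeal” comes out with oatmeal as the food.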
But the system still has to be taught what kinds of things to look for. And that meant human involvement—specifically a bunch of workers they found through Amazon’s Mechanical Turk. The workers described their meals and then labeled the parts of the description.
This data was used to train the system to understand meal-logging speech. Glass’s group then went back to a few of the Mechanical Turk workers for the task’s second big challenge: matching the meal labels to the USDA nutrition database.
It’s harder than it sounds. “Something like oatmeal might not even be in it,” says Glass, but there’s an entry for oats. So the system had to be trained to match labels from spoken words to where they best fit in the database’s language.
If it all seems very labor intensive, that’s because it was. But that’s the current state of the art. “Speech recognition doesn’t work the way humans work,” says Glass. “That’s the direction it needs to go into, but it’s not there yet.”
Glass stresses that the system is just a research prototype that would need improvement and real-world testing before it could be turned into a useable app.
Logging by voice might not be the only way to go in the future. Engineers at State University of New York at Buffalo and at Northeastern University in China are working on a gadget that tells what you’re eating by the sounds it picks up from your neck when you bite, chew, and swallow.