This is a guest post. The views expressed in this article are solely those of the blogger and do not represent positions of Automaton, IEEE Spectrum, or the IEEE.
When Microsoft was developing its Kinect 3D sensor, a critical task was to calibrate its algorithms to rapidly and accurately recognize parts of the human body, especially hands, to make sure the device would work in any home, with any age group, any clothing, and any kind of background object. Doing the calibration by computer alone had limitations, because computers would sometimes fail to identify a human hand in a Kinect-generated image, or would "see" a hand where none existed. So Microsoft is said to have turned to humans for help, crowdsourcing the image-tagging job using Amazon’s Mechanical Turk, the online service where people get paid for performing relatively simple tasks that computers are not good at. As a result, the Kinect now knows what all (or most) hands look like. Great!
Well, that's great if all you care about is gesture-based gaming, but from my commercial robotics-oriented perspective, the problem is that a human hand is just one "thing" among thousands -- millions?! -- out there that we would like machines to be able to identify. Imagine if a robot could promptly recognize any object in a home, office, or factory: whatever it saw or picked up, it would instantly know what it was. Now that would be great.
So the question is: Can we ever achieve that goal? Can we somehow automate or crowdsource image tagging of almost every object imaginable?