Qualcomm’s Scene-Detecting Smartphone System Is Almost Here

Photo: Qualcomm

Artificial neural networks have done many cool things in recent years, including learning how to cook food by watching YouTube videos and making cars less noisy. Qualcomm is hoping to bring neural networks to our smartphones to help them recognize the world around them, enabling them to identify objects and act upon this knowledge. With the upcoming release of Qualcomm’s smartphone processor, the Snapdragon 820, this capability could be just a few months away.

Humans are really good at identifying and classifying objects. Computers, on the other hand, find this task exceedingly difficult. Only in the past few years have we started seeing the first systems with capabilities close to those of small children. However, even those typically require significant computational power, limiting their widespread use.

For several years, researchers at Qualcomm facilities around the world have been working on a large project called Zeroth. Their goal is to discover algorithmic advances in machine learning for tasks such as visual perception and audio recognition, and to develop efficient implementations of those algorithms for power-constrained devices such as smartphones.

SceneDetect will be the first commercial implementation of this idea. It performs near real-time classification of the visual scene for a variety of categories using the Snapdragon 820 and the camera on your smartphone (or some other supported device).

SceneDetect relies on an emerging field of artificial intelligence called deep learning, and it is implemented with a deep convolutional network, a kind of artificial neural network built from layers of interconnected artificial neurons. These neurons are fed data and collectively work to solve a problem, such as recognizing an image of a dog. To train the network to recognize dogs, researchers feed images of many kinds of dogs into the network, and the network’s pattern of internal connections is adjusted until the system recognizes dogs.
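To make the idea of a convolutional layer a little more concrete, here is a minimal sketch in plain Python. It is not Qualcomm’s code: the tiny image, the hand-picked filter, and the function names `conv2d` and `relu` are assumptions chosen for illustration. In a trained network the filter values would be learned rather than written by hand.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (valid mode) and sum the products.

    This is the cross-correlation that convolutional layers compute.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


def relu(feature_map):
    """Keep positive responses and zero out the rest."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Toy 4x4 grayscale image (assumed data): dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked filter that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = relu(conv2d(image, kernel))
for row in feature_map:
    print(row)
```

The filter responds only in the column where dark meets bright. A deep network stacks many such layers, so that later layers respond to combinations of the simple patterns found by earlier ones.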

Once trained, the network is able to tell you if an image contains a dog or not, even if it has never seen the specific image of that dog before. Thus these networks are capable of “learning” the inherent characteristics of dogs and not just “remembering” dogs they have seen in the past. (For an explanation of deep learning by one of its inventors, see this interview with Yann LeCun.)
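The training loop described above can be sketched with a single artificial neuron, which is far simpler than a deep network but follows the same principle of nudging connection weights until the answers come out right. Everything here is an assumption made for illustration: the two made-up features per “image,” the toy data, and the names `train_neuron` and `predict`.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Return the neuron's estimate (0..1) that x is a dog."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_neuron(samples, labels, epochs=2000, lr=0.5, seed=0):
    """Adjust the weights by gradient descent on the log loss."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in samples[0]]
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = predict(weights, bias, x) - y  # gradient at the neuron input
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Hypothetical features per image: [ear_floppiness, snout_length].
dogs = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7]]
not_dogs = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.2]]
samples = dogs + not_dogs
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_neuron(samples, labels)

# Neither of these points appears in the training data.
unseen_dog = [0.8, 0.8]
unseen_other = [0.2, 0.17]
print("unseen dog score:", round(predict(weights, bias, unseen_dog), 3))
print("unseen non-dog score:", round(predict(weights, bias, unseen_other), 3))
```

The trained neuron scores the unseen dog above 0.5 and the unseen non-dog below it, which is the small-scale version of “learning” rather than “remembering.”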

Convolutional neural networks are widely used in image and video recognition, and SceneDetect currently recognizes between 30 and 50 categories of scenes, including birds, mountains, people, and clouds. (That range was chosen because earlier research concluded that it was a reasonable number of categories for most users.)
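Classifiers of this kind commonly end with a step that turns one raw score per category into probabilities and reports the most likely labels. The sketch below shows that step using the standard softmax function. Whether SceneDetect works exactly this way is an assumption, and the scores and the helper name `top_categories` are made up for illustration.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_categories(scores, categories, k=2):
    """Return the k most probable (category, probability) pairs."""
    probs = softmax(scores)
    ranked = sorted(zip(categories, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Four of the categories named in the article; a real system would have 30 to 50.
categories = ["birds", "mountains", "people", "clouds"]
raw_scores = [0.5, 3.2, 0.1, 2.1]  # made-up network outputs for one image

best = top_categories(raw_scores, categories, k=2)
for name, prob in best:
    print(f"{name}: {prob:.2f}")
```

For these made-up scores the top two labels are “mountains” followed by “clouds.”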

The training of SceneDetect’s neural network was performed offline using a compute cluster, and only then was it deployed to the Snapdragon-powered devices. In order to properly train a neural network to successfully identify different scenes out of countless potential ones, Qualcomm used a very large sample set of prelabeled images. This option became available only in the past few years, in part due to the pioneering work of Stanford computer vision researcher Fei-Fei Li and her colleagues. Her ImageNet research project, which has been using the crowdsourcing technology Amazon Mechanical Turk to create a database of millions of labeled images since at least 2008, was key to the training.

Qualcomm believes that there are a number of real-time applications, such as in robotics, where offloading to the cloud is not possible. However, for less time-critical or more computationally intensive situations, turning to the cloud is an option.

The world is full of unlabeled data, and Qualcomm is looking at this as a huge opportunity to use its SceneDetect technology to label our surroundings. “Using SceneDetect, your smartphone will be able to categorize the visual scene using concepts that are very similar to how humans describe a scene,” says Samir Kumar, senior director at Qualcomm Research. Automatically categorizing scenes in still images and videos will be a huge boon to search engines, both locally on your device and online. You could search through all of your photos and immediately find all of those that include your kid eating ice cream, for example. SceneDetect is already capable of recognizing both real-time and saved images, and Qualcomm is looking to extend this capability to video. So in the future, SceneDetect could automatically generate metadata for videos you upload to YouTube, including information about all of the main scenes and a list of objects that appear in them at different times.
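Once images carry labels, searching them is mostly bookkeeping. The sketch below is one way that could look, under stated assumptions: the file names, labels, and the functions `build_label_index`, `search`, and `video_metadata` are all hypothetical and not part of any Qualcomm API.

```python
from collections import defaultdict


def build_label_index(photo_labels):
    """Map each label to the set of photos that carry it."""
    index = defaultdict(set)
    for photo, labels in photo_labels.items():
        for label in labels:
            index[label].add(photo)
    return index


def search(index, *labels):
    """Return the photos tagged with every requested label."""
    if not labels:
        return set()
    result = set(index.get(labels[0], set()))
    for label in labels[1:]:
        result &= index.get(label, set())
    return result


def video_metadata(frame_labels):
    """Map each label to the sorted timestamps (in seconds) where it was seen."""
    seen = defaultdict(list)
    for timestamp, labels in sorted(frame_labels.items()):
        for label in labels:
            seen[label].append(timestamp)
    return dict(seen)


# Hypothetical labels, as a scene classifier might have produced them.
photos = {
    "img1.jpg": {"kid", "ice cream"},
    "img2.jpg": {"kid", "mountains"},
    "img3.jpg": {"clouds"},
}
index = build_label_index(photos)
matches = search(index, "kid", "ice cream")
print(matches)

# Hypothetical labels sampled every five seconds from a video.
frames = {0: ["people"], 5: ["people", "birds"], 10: ["birds"]}
timeline = video_metadata(frames)
print(timeline)
```

The same label-to-item mapping serves both cases: for photos it answers “which pictures show this?” and for video it answers “when does this appear?”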

According to Jeff Gehlhaar, vice president of technology at Qualcomm, as SceneDetect technology improves to include video localization of objects within a scene and counting of specific types of objects, it could break into whole new categories of devices. These include security and monitoring systems, robots, and a host of Internet of Things gadgets that could benefit from what will one day be “cognitive cameras.”

About the Author

Iddo Genuth is an Israel-based technology reporter. For IEEE Spectrum he’s written about smart textiles and startups.

Engineers explain Qualcomm’s SceneDetect ahead of the release of the smartphone processor that runs it