Sight for Sore Ears

Dutchman develops auditory imager for the blind

What does your cellphone do for you? If you are blind, it just might give you sight. In October, Blue Edge Bulgaria, in Sofia, a maker of software applications for cellphones, announced the development of software that turns compatible camera phones into visual aids for the blind by changing images snapped by the camera into sounds that the user's brain can reconstruct into mental pictures.

Blue Edge's software is the latest derivative of a suite of programs developed by Peter Meijer, a research physicist at Philips Research, Eindhoven, the Netherlands, who in 1998 produced the first working prototype of the vOICe system. (The three middle letters stand for "Oh, I See.")

Meijer's complete vOICe system [see drawing, "Mind's Eye"] translates moving images into sounds in real time, while Blue Edge's software transforms only still images. Compared with competing experimental aids for the blind such as tactile imagers, retinal implants, and brain implants, vOICe is a model of simplicity because it consists entirely of software and off-the-shelf equipment: a camera, stereo headphones, and a laptop computer, which a user can carry in a backpack.

Once per second, the computer scans a 64-by-64-pixel frame from left to right, one column at a time. Each pixel in a column produces a wave whose frequency indicates its position; the highest frequencies are at the top. Amplitude is based on the brightness of the pixel on a 16-tone gray scale. The brightest pixels produce waves with the highest peaks; black pixels, assigned amplitudes of zero, produce no waves.

So if 30 pixels in a column are black, only 34 of the 64 frequencies will be represented. Frequency is then translated into pitch and amplitude into volume; what a listener hears is a musical chord--admittedly a rather dissonant one--of up to 64 notes.
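The column-to-chord mapping described above can be sketched in a few lines of code. This is an illustrative reconstruction, not Meijer's actual implementation: the frequency band and sample rate are assumptions, and only the rules stated in the article are encoded (row position sets frequency, top row highest; brightness on a 16-level gray scale sets amplitude; black pixels contribute nothing).

```python
import math

ROWS = 64                        # 64-by-64-pixel frame
F_LOW, F_HIGH = 500.0, 5000.0    # assumed audible band, Hz (not from the article)
SAMPLE_RATE = 16000              # assumed audio sample rate

def row_frequency(row):
    """Map row 0 (top) .. 63 (bottom) to a frequency; the top row is highest."""
    frac = row / (ROWS - 1)
    return F_HIGH - frac * (F_HIGH - F_LOW)

def column_to_samples(column, duration=1.0 / 64):
    """Turn one column of 16-level gray values (0..15) into a chord of sine waves.

    At one frame per second and 64 columns, each column sounds for ~1/64 s.
    """
    n = int(SAMPLE_RATE * duration)
    samples = [0.0] * n
    for row, gray in enumerate(column):
        if gray == 0:            # black pixels produce no wave
            continue
        amp = gray / 15.0        # brighter pixel -> higher amplitude
        freq = row_frequency(row)
        for i in range(n):
            samples[i] += amp * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
    return samples

# Example: a column that is black except for one bright pixel at the top
column = [0] * ROWS
column[0] = 15
audio = column_to_samples(column)   # a single high-pitched tone
```

A column with several lit pixels simply sums the corresponding sine waves, which is why the listener hears a chord of up to 64 notes.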

Once data has been extracted from the 64th column, the system grabs and digitizes a new video frame. In the 20 milliseconds between the end of the tones from the last column of one frame and the beginning notes of the next, the system generates an audible click that helps orient the listener. In a sense, it says, "What you are about to hear is the left side of the [image]," says Meijer, who works on vOICe separately from his job at Philips.

To further boost the listener's spatial orientation, the stereo headphones shift the volume balance from left to right in step with the movement of the pixel scanner. This gives the person a sense of where objects are, even if they have a difficult time keeping track of where the scanner is between clicks.
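The panning described above might be modeled as follows. The article says only that the volume balance tracks the scanner's position; the linear pan law here is an assumption for illustration.

```python
COLS = 64  # columns scanned left to right, once per second

def pan_gains(col):
    """Return assumed (left_gain, right_gain) for column 0 (far left) .. 63 (far right)."""
    pos = col / (COLS - 1)       # 0.0 at the left edge, 1.0 at the right
    return (1.0 - pos, pos)      # balance shifts left to right with the scan
```

Under this law the first column sounds entirely in the left ear and the last entirely in the right, with the total gain held constant across the sweep.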

Meijer intentionally chose a low-resolution camera for capturing the images from which the soundscapes would be made, because the human ear has a far lower capacity for handling data than the eye. The bit rate that each ear can accommodate is about 15 kb/s. A camera that captures 24-bit color VGA (640-by-480-pixel) images at, say, 25 frames per second produces more than 180 Mb/s. This is 6000 times faster than the structures in the ears can vibrate in response to sound waves. But by limiting each image to 4096 pixels (instead of the 300 000 in a VGA image), with four bits per pixel (rather than 24 or more), and scanning at a single frame per second, the Dutch researcher was able to get the bit rate down to just over 16 kb/s.
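The arithmetic above can be checked directly; both figures come out in bits per second.

```python
# 24-bit color VGA at 25 frames per second
vga_bits_per_sec = 640 * 480 * 24 * 25      # 184320000 bits/s, i.e. more than 180 Mb/s

# vOICe: 64 x 64 = 4096 pixels, 4 bits of gray per pixel, one frame per second
voice_bits_per_sec = 64 * 64 * 4 * 1        # 16384 bits/s, i.e. just over 16 kb/s
```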

How is the brain able to make pictures from sounds that, to the uninitiated, resemble the noise one hears when a fax machine begins transmission? "We simply do not know yet," says Meijer. There is new evidence that the part of the brain responsible for sight, the visual cortex, will, after some training, respond to changes in pitch. And that phenomenon, part of a well-known but poorly understood cerebral adaptation capability known as brain plasticity, is currently being studied at the Heinrich Heine University of Düsseldorf in Germany using the vOICe system and other sensory substitution devices.

Meijer likens the process of learning to recognize sounds as shapes to acquiring a foreign language. Novices begin by mastering an elementary visual alphabet: a circle, a rectangle, a triangle, an oval. Since, generally speaking, all other shapes can be said to be combinations of these, the wearer of such a device begins to gain "fluency" that makes recognizing whether a door is open or closed, or whether a chair is occupied, almost second nature. A deeper understanding of the reason for such a sensory crossover could lead to quicker and more effective adaptation to the vOICe system, which Meijer says is sorely needed.

Photo: Peter Meijer

vOICe Headset

An early version. A cellphone camera or an eyeglass-mounted video camera would draw less attention.

Though it takes time to learn to use, the vOICe system has some devotees. The vOICe Web site, https://www.seeingwithsound.com, contains diary entries from one of the first people to use a vOICe device. She noted that over time, as she grew more accustomed to the system's drone, her recognition of objects around her home improved dramatically. And in a development that was a surprise even to Meijer and his colleagues, she reported the gradual development of depth perception.
