
Sultan of Sound
By Tekla S. Perry

Voice mail, speech recognition, the artificial larynx, packet-switched voice—these commonplace applications build on the pioneering research of James L. Flanagan

Image: Jordan Hollender

The challenge was clear. Take the telegraph line that ran across the bottom of the Atlantic Ocean, carrying simple pulses of electricity over its narrow 300-hertz bandwidth, and use it to transmit voice conversations. The problem was, it was 1952, and the technology was rudimentary. Bandwidth compression of voice signals, called vocoding, had been done during World War II for radio transmission of speech—but the quality of the resulting speech was inadequate.

It was up to James L. Flanagan [see sidebar, "James L. Flanagan"], then a graduate student at the Massachusetts Institute of Technology, in Cambridge, to come up with a better idea. He dismissed the then-current vocoding technology, which took 10 different frequencies of the voice signal and converted their varying amplitudes into analog signals for transmission. Instead, he went back to the fundamentals: how the elements of the voice are created and resonate within the human vocal tract. For different elements, resonances peak at different points in the spectrum; the frequencies of these points are called formant frequencies.

In human speech, the frequencies of formants depend on the shape and motion of the articulators—mouth, jaw, tongue, and lips. Different combinations of these shapes and motions create many different formants. Flanagan chose three of those. Then, by coding the variations in frequency for each formant, he supplied a more efficient means of representing speech than the technique used earlier, which coded the amplitudes of multiple signals at fixed positions of frequency. Although this formant coding was never implemented on those telegraph cables, it did set the stage for later advances in bandwidth conservation.

He also did experiments to determine how accurately the ear detects errors in such coding, establishing the engineering criteria for his own and future speech coding technology. In fact, a short letter to the editor of the Journal of the Acoustical Society of America (May 1955) describing these experiments is still his most frequently requested publication.

Flanagan's work with speech coding heralded a series of advances over the years, including a currently favored technique, linear predictive coding. In this technique, the value of a speech signal at each sample point is predicted by a combination of past samples. This type of coding is used in low-bandwidth speech communications today—in cellphones, voice mail systems, and computer-generated speech. Later ideas championed by Flanagan contributed to the development of modern automatic speech-recognition systems, audio codecs like MP3, and today's voice over IP technology.
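The prediction step described above can be illustrated with a short sketch. It assumes the autocorrelation method with a Levinson-Durbin recursion, which is one common way to fit a linear predictor; the article does not say which formulation any particular system uses. The function names are hypothetical, and the test signal is a synthetic sinusoid rather than recorded speech.

```python
import math


def autocorrelation(signal, max_lag):
    """Return r[0..max_lag], where r[lag] = sum of signal[i] * signal[i + lag]."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def lpc_coefficients(signal, order):
    """Fit a[1..order] so that x[n] is approximated by sum(a[k] * x[n - k]).

    Uses the Levinson-Durbin recursion on the signal's autocorrelation
    (an assumed, illustrative choice of fitting method).
    """
    r = autocorrelation(signal, order)
    error = r[0]
    if error == 0.0:
        return [0.0] * order  # silent input: nothing to predict
    coeffs = [0.0] * (order + 1)  # coeffs[k] multiplies x[n - k]; index 0 unused
    for i in range(1, order + 1):
        acc = r[i] - sum(coeffs[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient for this stage
        updated = coeffs[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = coeffs[j] - k * coeffs[i - j]
        coeffs = updated
        error *= 1.0 - k * k
        if error <= 0.0:
            break  # signal is already perfectly predicted
    return coeffs[1:]


def prediction_residual(signal, coeffs):
    """Return x[n] minus its prediction from the previous len(coeffs) samples."""
    order = len(coeffs)
    return [
        signal[n] - sum(coeffs[k] * signal[n - 1 - k] for k in range(order))
        for n in range(order, len(signal))
    ]


# Synthetic stand-in for one resonance: a single sinusoid.
signal = [math.sin(0.3 * n) for n in range(400)]
coeffs = lpc_coefficients(signal, order=2)
residual = prediction_residual(signal, coeffs)

signal_energy = sum(x * x for x in signal)
residual_energy = sum(e * e for e in residual)
print("predictor coefficients:", coeffs)
print("residual / signal energy:", residual_energy / signal_energy)
```

For this test signal, the residual carries only a small fraction of the signal's energy. That is the property a coder exploits: it can describe a short frame by a handful of predictor coefficients plus a compact description of the residual, rather than by every sample.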

"Wayne Gretzky once said he played hockey well because he doesn't go to where the puck is; he skates to where it is going to be," said Rich Cox, the vice president who directs the IP and Voice Services Research Lab for AT&T Labs Research in Florham Park, N.J. "That's Jim. He could see where things were going to go," even if it would take decades to get there.

For this—his "sustained leadership and outstanding contributions in speech technology"—Flanagan is being honored with the 2005 IEEE Medal of Honor.

