It's all about the algorithm--but which one will win?
Photo: Joran Hollender
Of the countless human pursuits touched by technology, music has been among the most profoundly transformed. Beginning more than a century ago, when Thomas Edison's phonograph gave rise to the recorded music industry, technology has brought music to the masses with steadily increasing efficiency, fidelity, and convenience. Today, the Internet, digital recording, and new storage technologies are coming together to prompt another momentous shift. It is liberating music from the last link to Edison's era: the dependence on physical, recorded media that has long confined it.
Of all the technologies fomenting this revolution, one of the most pivotal, and interesting, is the compression algorithm. The most common example is the ubiquitous MP3, which was a key enabler of Napster's rise in its copyright-flouting initial incarnation. MP3 is just one of an expanding array of such algorithms and formats--more than 100 at last count--that also includes such contenders as WAV, WMA, Ogg, AAC, and AC-2. Most of them use a variety of clever tricks to shrink music files by 90 percent or more, so that the data can be more economically transmitted over a network, such as the Internet, and stored on a computer or music player. They're all vying for a central role in the global recorded music industry, which now generates US $32 billion a year in revenues.
Powerful alliances are being formed. And unlike many previous technology-related business battles, technology may actually be a significant factor in this one. Consider Sony Corp.'s slick new music player, the NW-HD1. Praised for its compact size, long battery life, and clever touch-sensitive controller, the device nevertheless has been widely and bitterly criticized for its choice of compression algorithm: ATRAC3, a proprietary system used by Sony alone.
Not since the days of the PC operating system wars in the 1980s, arguably, has a software issue held so much sway over an emerging category of consumer electronics. And this time, at least, technology will weigh fairly heavily as the marketplace sorts out winners and losers.
For the algorithms, the basic tradeoff is between sound quality and how much they can compress the music files. But there are other important considerations, including the extent to which the full-fidelity, uncompressed files can be re-created from the compressed files, how copy protection is implemented, and how secure the downloaded files are from unauthorized distribution.
The contest is far from over. But already, glimpses of a seriously streamlined future for the sale and distribution of recorded music are apparent--ones that are showing the way for digital movies, too. The first part of the transition is well under way: Apple's iTunes alone is selling about $500 000 worth of music a week; additional online music services from RealNetworks, Wal-Mart, Napster, and others are also doing brisk business. The advantages over the old industry model are overwhelming: record companies don't have to ship plastic discs all over the world, and music fans need no longer clutter their homes with racks of CDs or tapes. Instead, somewhere near their favorite audio listening spots are a hard drive (or two or three) and a computer displaying a list of thousands and thousands of songs, arranged by album, by artist, or simply by mood. No more shuffling through stacks of discs; a click of a mouse changes the tune.
This revolution is not happening just in the home. It is truly everywhere. Once compressed, music files can be quickly and easily loaded into a compact, shirt-pocket-size player, where they are stored on a miniature hard-disk drive or in flash memory. The hard-disk-based systems, such as Apple's ubiquitous iPods, can store thousands of songs--your entire music library, probably.
This is just a hint of what's to come. Today, early adopters are using wireless networks to move music from their computers to audio gear all over their homes. When wireless personal area networks based on the IEEE 802.15.3 standard become commercially available, people will be storing movies this way as well. Eventually, CDs and DVDs will join the vinyl albums gathering dust in the backs of closets, and yet music and movies will be everywhere--in file servers, magnetic and semiconductor memories, communications lines, and in the air itself.
Keep in mind that it was only five years ago that the music industry was facing a civil war over the next-generation disc-based music format--the successor to the wildly successful CD. At that time, hardly anybody doubted that the music would be encoded optically on a round plastic disc the size of a CD. The quibbling was over the technology of that encoding, and two leading contenders emerged--DVD-Audio and Super-Audio CD [see sidebar, "A New Generation of CDs"].
Even then, while the audiophiles were fiddling, a few technophiles were burning. Computer-savvy music buffs had started quietly using some of the earliest digital audio formats, such as WAV and the compressed MP3 format, to move their CD collections to their computer hard drives. At the same time, other people were expanding their collections by downloading music from the Web onto CDs.
It was a grass-roots revolt that the consumer electronics manufacturers couldn't ignore, and in 1998 they came out with the first portable digital music players. Within a year, Napster, the file-sharing system, was on its way to becoming a major, if controversial, force in the recorded music industry.
Today, as compressed music moves into the big time, the formats face challenges that the pioneers didn't worry about. Playing compressed files on a high-fidelity home system demands the ability to re-create a high-quality signal. Copyright requirements, too--not a major consideration until recently--are now at the forefront as companies turn downloading into a legitimate business.
The challenge for technologists is to accommodate these new needs without interfering with the essential purpose of compression algorithms: making music files smaller. To understand this function, consider the compact disc. Music is converted into digits for a compact disc by a technique called pulse code modulation. Basically, the music is sampled 44 100 times a second, and each sample is converted into a pair of 16-bit numbers, one for each of the two stereo channels. Those numbers are then put onto the disc. Generally, CDs can store a maximum of 74 minutes of music, and one minute of music occupies about 10 megabytes of raw sample data (44 100 samples per second × 2 channels × 2 bytes × 60 seconds), before error-correction and other disc overhead is added.
Even with a broadband connection, it would take about three and a half minutes to download each minute of music on a CD. A typical 50-minute CD would take 3 hours to download. Without compression, even the largest iPod, packing a 40-GB hard drive, could store only about 67 hours of music. It is compression that turns the little player into a library in your pocket, capable of carrying about 500 hours of music.
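The arithmetic behind these figures is easy to check. The Python sketch below uses only the CD parameters already given (44 100 samples per second, two channels, 16 bits per sample). The 403.2-kb/s link speed is an assumed broadband rate, chosen because it reproduces the three-and-a-half-minute figure. The exact raw payload works out to 10 584 000 bytes per minute, and whether a 40-GB drive holds about 63 or about 68 hours of uncompressed audio depends on whether a gigabyte is counted as 10^9 or 2^30 bytes.

```python
# CD audio parameters (pulse code modulation, as described in the text)
SAMPLE_RATE = 44_100    # samples per second, per channel
CHANNELS = 2            # stereo
BYTES_PER_SAMPLE = 2    # 16-bit samples

def pcm_bytes(seconds):
    """Raw sample data for a stretch of CD audio (disc overhead not included)."""
    return SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE * seconds

BYTES_PER_MINUTE = pcm_bytes(60)

def download_minutes(audio_minutes, link_bits_per_second):
    """Minutes needed to fetch uncompressed audio over a link of the given speed."""
    return pcm_bytes(audio_minutes * 60) * 8 / link_bits_per_second / 60

def storage_hours(capacity_bytes, compression_ratio=1.0):
    """Hours of music that fit in capacity_bytes at a given compression ratio."""
    return capacity_bytes * compression_ratio / BYTES_PER_MINUTE / 60

ASSUMED_BROADBAND = 403_200  # bits per second; an assumption, not from the text

print(BYTES_PER_MINUTE)                               # 10584000
print(pcm_bytes(74 * 60))                             # 783216000, a full 74-minute CD
print(download_minutes(1, ASSUMED_BROADBAND))         # 3.5
print(download_minutes(50, ASSUMED_BROADBAND) / 60)   # about 2.9 hours for a 50-minute CD
print(storage_hours(40 * 10**9), storage_hours(40 * 2**30))  # about 63 and about 67.6
```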
The trick is to take away bits without degrading the fidelity of the sound that the listener hears. To do this, the algorithms exploit quirks in human hearing and, more specifically, in the way the human brain processes sound.
Hearing varies from person to person, based on such things as age, sex, and previous exposure to loud noise. But even what is commonly called perfect hearing isn't so perfect. Most young people can hear frequencies between about 20 hertz and 20 kilohertz. But most adults, particularly older ones, cannot hear all that much above 16 kHz. And even within the wide swath of frequencies that most people can hear, some bands register more loudly than others.
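Psychoacoustic models commonly capture the ear's uneven sensitivity with a curve called the threshold in quiet: the softest tone a listener with good hearing can detect at each frequency. The sketch below evaluates Terhardt's widely cited approximation of that curve; it models a typical young listener rather than any particular person, and the sample frequencies are arbitrary. The curve dips below 0 dB SPL around 3 to 4 kHz, where the ear is most sensitive, and rises steeply at high frequencies, to roughly 66 dB SPL at 16 kHz.

```python
import math

def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the quietest audible level (dB SPL) at f_hz.

    Lower values mean the ear is more sensitive at that frequency."""
    f = f_hz / 1000.0  # the formula is written in kilohertz
    return 3.64 * f ** -0.8 - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f ** 4

for freq in (20, 100, 1000, 3300, 10_000, 16_000):
    print(f"{freq:>6} Hz: {threshold_in_quiet_db(freq):6.1f} dB SPL")
```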
Then there are the brain's perceptual quirks: it has trouble distinguishing tones that are closely spaced in frequency, and it reflexively masks any sounds that occur immediately after sudden, louder ones. (These two effects are known as frequency and temporal masking.) It also perceives a tone differently depending on whether it sounds in isolation or together with other tones. As a result, masking models that take tones and their surrounding harmonics into account tend to be somewhat more successful than those that ignore such differences.
Compact discs ignore these limitations of human hearing, faithfully preserving all the sound between 20 Hz and about 20 kHz, and doing it flat across the entire frequency range, as though we perceived all bands equally. Compression algorithms, on the other hand, start by analyzing the frequency content of the digitized sound. They compare it with psychoacoustic models--models of human auditory perception, basically--which vary from algorithm to algorithm. The algorithm uses the models to deemphasize or discard portions of the signal that the listener isn't likely to hear.
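Frequency masking is often modeled on the Bark scale of critical bands: a loud tone raises the audibility threshold for nearby frequencies, and the effect spreads further upward in frequency than downward. The sketch below converts hertz to Bark with Zwicker's well-known approximation and applies a simple triangular spreading function. The 10-dB offset and the 25 and 10 dB-per-Bark slopes are illustrative assumptions, not values taken from MP3 or any other codec.

```python
import math

def hz_to_bark(f_hz):
    """Zwicker's approximation of the critical-band (Bark) scale."""
    return 13 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500) ** 2)

# Illustrative spreading-function parameters (assumptions):
OFFSET_DB = 10.0       # masked threshold sits this far below the masker's level
SLOPE_BELOW_DB = 25.0  # dB per Bark for targets below the masker
SLOPE_ABOVE_DB = 10.0  # dB per Bark above the masker (masking spreads upward)

def masked_threshold(masker_hz, masker_db, target_hz):
    """Level below which a target tone is hidden by the masker."""
    dz = hz_to_bark(target_hz) - hz_to_bark(masker_hz)
    slope = SLOPE_ABOVE_DB if dz > 0 else SLOPE_BELOW_DB
    return masker_db - OFFSET_DB - slope * abs(dz)

def is_masked(masker_hz, masker_db, target_hz, target_db):
    return target_db < masked_threshold(masker_hz, masker_db, target_hz)

for target_hz in (1_100, 4_000):
    verdict = "masked" if is_masked(1_000, 70, target_hz, 40) else "audible"
    print(f"40-dB tone at {target_hz} Hz beside a 70-dB, 1-kHz tone: {verdict}")
```

The 1.1-kHz tone lies within about two-thirds of a Bark of the masker and can be discarded, while the 4-kHz tone, nearly nine Bark away, must be kept.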
Then the compression algorithm typically makes another pass through the reduced signal, looking for opportunities to compress it further by eliminating redundancies. One widely used technique is Huffman coding, which assigns the shortest binary codes to the values that occur most often, so that common patterns take up fewer bits. The general idea of squeezing out redundancy can be illustrated symbolically: a repetitive string such as ABCABCABCABC could be written more compactly as (ABC) * 4.
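A minimal Huffman coder shows how frequent values end up with short codes. MP3 itself codes its quantized frequency values with fixed Huffman tables defined in the standard; the sketch below instead builds a code from the data, which is the textbook form of Huffman's algorithm. The list of quantized values is hypothetical.

```python
import heapq
from collections import Counter

def build_huffman_code(symbols):
    """Return {symbol: bit string}; frequent symbols get shorter codes."""
    freq = Counter(symbols)
    if len(freq) == 1:                      # a single distinct symbol still needs a bit
        return {next(iter(freq)): "0"}
    # Heap entries: (total weight, unique tiebreak, {symbol: code so far})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, light = heapq.heappop(heap)  # merge the two lightest subtrees
        w2, _, heavy = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in light.items()}
        merged.update({s: "1" + c for s, c in heavy.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

def huffman_encode(symbols, code):
    return "".join(code[s] for s in symbols)

def huffman_decode(bits, code):
    lookup = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in lookup:               # prefix-free, so the first match is correct
            out.append(lookup[current])
            current = ""
    return out

# Hypothetical quantized values in which zeros dominate.
values = [0, 0, 0, 0, 1, 1, -1]
code = build_huffman_code(values)
bits = huffman_encode(values, code)
print(code, len(bits))   # 10 bits, versus 14 with a fixed 2-bit code
assert huffman_decode(bits, code) == values
```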
Of course, because the musical experience is very subjective, there is no single, universally accepted set of rules for compression algorithms. In creating such an algorithm, also known as a codec, for coder/decoder, the developer must decide which and how many bits to throw out, striving for the best balance between quality and compression. It's a trickier challenge than it may seem. If you lean too far toward quality, you risk winding up with a poor compression ratio and no guarantee of consumer appreciation (recall the famous VHS/Betamax videotape battle, in which the longer-playing format vanquished the higher-quality one).
On the other hand, going too far toward minimizing file size will bother aficionados, who periodically want to pipe the music from their pocket players through their home stereos. They would be dismayed to find that a high compression ratio had obliterated details otherwise audible with a good amplifier and speakers.
The latter problem illustrates the importance of the tradeoffs between what compression specialists call lossiness and losslessness, the degree to which the original sound data files can be exactly reconstructed from the compressed data. In a perfectly lossless algorithm, the decoded stream is bit-for-bit identical to the original stream, not just representative of that data.
A common strategy for achieving lossless compression emphasizes taking advantage of repeating patterns in waveforms. The compression algorithm uses these patterns to predict the next value of the signal and then encodes the usually small difference between the expected value and the actual value. Such techniques can compress an audio file to nearly half its original size.
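The predict-and-store-the-difference idea can be sketched with a fixed second-order predictor, which guesses that the waveform keeps its current slope (the same form FLAC offers among its simplest fixed predictors). The 440-Hz test tone is an assumption for illustration; real encoders choose predictors block by block and then entropy-code the residuals.

```python
import math

def predict(known, n):
    """Fixed order-2 prediction of sample n from the already-known samples known[:n]."""
    if n == 0:
        return 0
    if n == 1:
        return known[0]
    return 2 * known[n - 1] - known[n - 2]

def predictive_encode(samples):
    """Lossless encoder: keep only the prediction error for each sample."""
    return [samples[n] - predict(samples, n) for n in range(len(samples))]

def predictive_decode(residuals):
    """Rebuild the samples exactly by adding each error back to the prediction."""
    out = []
    for n, r in enumerate(residuals):
        out.append(predict(out, n) + r)
    return out

# A 440-Hz tone as 16-bit integers sampled at 44.1 kHz (hypothetical test signal).
samples = [round(10_000 * math.sin(2 * math.pi * 440 * n / 44_100)) for n in range(1_000)]
residuals = predictive_encode(samples)
assert predictive_decode(residuals) == samples   # bit-for-bit identical
print(max(abs(s) for s in samples))              # peak sample, close to 10 000
print(max(abs(r) for r in residuals[2:]))        # steady-state errors: a few dozen at most
```

Because the residuals are hundreds of times smaller than the samples, they need far fewer bits, yet nothing is lost.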
Lossy formats are not without advantages; they yield much smaller files for a given perceived quality--the JPEG image format used in photography is one of the most common examples of lossy compression. But losslessness counters with another advantage: it enables conversion to other formats. After all, if you can re-create the original digitized signal exactly, then you can convert that signal to some other compression format without compounding the losses of an earlier lossy encoding. This feature allows backward compatibility with existing hardware and players.
The ultimate format will have a high degree of lossless compression, because this attribute is essential for the futuristic scenario in which people simply store all their music on their home computers, playing it in high fidelity through their home stereos via ultrawideband.
Another important goal for the ideal compression algorithm is easy streaming. Streaming allows data to be transferred in a stream of packets that are interpreted and played as they arrive. When you are record shopping online and you click on a sample of music from a CD you're considering buying, the music is streamed to your PC. Whether or not a format is streamable is determined by the complexity of the algorithm, the power of the user's computer, and, most critical, the speed of the connection. In other words, if the algorithm is simple enough for the computer to execute in real time and the connection speed is fast enough to keep up, then the music will be streamable.
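The streaming condition can be reduced to a simple model. The sketch assumes a constant bit rate, a constant connection speed, and no packet loss or protocol overhead; the function names and the decode-speed parameter are hypothetical. When the link is slower than the audio bit rate, the shortfall is worst at the end of the track, which fixes how long a player would have to buffer before starting.

```python
def is_streamable(bitrate_kbps, link_kbps, decode_speed=1.0):
    """decode_speed: seconds of audio the player decodes per second of wall-clock time."""
    return link_kbps >= bitrate_kbps and decode_speed >= 1.0

def prebuffer_seconds(bitrate_kbps, link_kbps, duration_s):
    """Smallest wait before playback so a track of duration_s never stalls.

    Playback consumes bitrate_kbps while the link delivers link_kbps, so the
    total download time is duration_s * bitrate_kbps / link_kbps."""
    if link_kbps >= bitrate_kbps:
        return 0.0
    return duration_s * bitrate_kbps / link_kbps - duration_s

print(is_streamable(128, 56))            # False: a 56-kb/s modem cannot keep up
print(is_streamable(128, 400))           # True
print(prebuffer_seconds(128, 64, 240))   # 240.0: wait as long as the song itself
```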
Last but certainly not least, the compression format will have to support digital rights management, or technical protection--that is, it must include technology that limits unauthorized copying and distribution.
Today's transportable compressed formats meet these ideal goals to varying degrees. The most successful compression standard by far to date is MP3, officially called MPEG-1 Audio Layer III, introduced as part of the MPEG-1 standard in 1992. The MPEG standards come from the Moving Picture Experts Group, a working group of the International Organization for Standardization, in Geneva. MPEG-1 was the first compression format to come out of that group, optimized for encoding video on CD-ROMs.
MP3 is a lossy format that compresses CD music to one-tenth its original size and works well with streaming. At its heart, the MP3 format uses an algorithm that takes the data contained in CD music relating loudness to specific points in time and transforms it instead into data relating loudness to specific frequencies. Once that is done, extraneous information can be eliminated--for instance, if at any point a frequency is too quiet for a typical listener to hear, it can be thrown away.
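The time-to-frequency step, followed by discarding what is too quiet, can be sketched in a few lines. MP3 actually uses a hybrid of a polyphase filter bank and the modified discrete cosine transform, and decides what to discard with a psychoacoustic model; this sketch substitutes a plain discrete Fourier transform and a fixed cutoff 40 dB below the loudest component, both assumptions for illustration. The 64-sample frame and its three test tones are likewise hypothetical.

```python
import cmath
import math

N = 64  # frame length in samples (illustrative; real MP3 frames are longer)

def dft(frame):
    """Naive discrete Fourier transform: time samples -> frequency coefficients."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(coeffs):
    """Inverse transform back to real-valued time samples."""
    n = len(coeffs)
    return [(sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def drop_quiet_bins(coeffs, floor_db):
    """Zero every coefficient more than -floor_db dB below the loudest one.
    The fixed floor stands in for a real psychoacoustic model."""
    peak = max(abs(c) for c in coeffs)
    floor = peak * 10 ** (floor_db / 20)
    return [c if abs(c) >= floor else 0j for c in coeffs]

# A loud tone, a moderate tone, and a very quiet one, each in an exact DFT bin.
frame = [1.0 * math.sin(2 * math.pi * 4 * t / N)
         + 0.3 * math.sin(2 * math.pi * 20 * t / N)
         + 0.001 * math.sin(2 * math.pi * 10 * t / N)
         for t in range(N)]

kept = drop_quiet_bins(dft(frame), floor_db=-40)
rebuilt = idft(kept)
nonzero = sum(1 for c in kept if c != 0)
max_error = max(abs(a - b) for a, b in zip(frame, rebuilt))
print(nonzero, "of", N, "coefficients kept; max error", max_error)
```

Only the four coefficients describing the two louder tones survive, and the rebuilt frame differs from the original by no more than the discarded quiet tone.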
MP3 gained popularity (and notoriety) as the format used for music file swapping in the late 1990s. A wide range of products support it, including the majority of PDAs and dedicated portable music players. The quality of MP3 audio depends on the compression ratio selected, on the complexity of the signal to be encoded, and on the quality of the encoder--which, as anyone who has used several different MP3 encoders will tell you, varies widely. For some listeners, MP3 audio is perfectly adequate. MP3 offered one of the highest compression ratios at the time of its introduction, but it has since been surpassed by newer formats. MP3 is easily streamable, but it contains no digital rights management tools--the reason it was the darling of Napster and other file-sharing systems.
Introduced in 2001, mp3PRO is the next generation of MP3, offering the same quality as MP3 at half the file size, with a compression ratio of 20 to 1. It takes advantage of an audio compression technique called spectral band replication, developed by Coding Technologies AB in Stockholm, Sweden, that allows more of the data for the higher frequencies to be eliminated. It then reconstructs the high frequencies using an analysis of the low-frequency data along with additional guidance information transmitted with the encoded data.
Music that is mp3PRO-encoded is hard to distinguish from CD audio, especially when played back on a relatively low-fidelity computer or personal audio player. This format is backward compatible, which means that mp3PRO files can be played in ordinary MP3 players, albeit with some degradation of quality. Thomson Consumer Products and Royal Philips Electronics have several players that support this new format [see photo]. It has a compression ratio twice that of MP3, and although it is a lossy format, it is easily streamable. It does not, however, have digital rights management tools.
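The technique behind mp3PRO is known as spectral band replication (SBR). The sketch below is a toy, magnitude-only illustration of its central idea: send the low band in full, send only a coarse energy envelope for the high band, and let the decoder rebuild the high band from low-band content scaled to that envelope. Real SBR works on a complex filter bank and transmits richer guidance data; the function names, band sizes, and the falling test spectrum here are all hypothetical.

```python
import math

def sbr_encode(spectrum, crossover, band_size):
    """Keep the low band in full; describe each high band only by its energy.
    Assumes len(spectrum) - crossover is a multiple of band_size."""
    low, high = spectrum[:crossover], spectrum[crossover:]
    envelope = [sum(m * m for m in high[i:i + band_size])
                for i in range(0, len(high), band_size)]
    return low, envelope

def sbr_decode(low, envelope, band_size):
    """Rebuild each high band from low-band magnitudes, scaled to the sent energy."""
    high = []
    for b, target in enumerate(envelope):
        start = (b * band_size) % len(low)
        patch = [low[(start + i) % len(low)] for i in range(band_size)]
        energy = sum(m * m for m in patch)
        gain = math.sqrt(target / energy) if energy > 0 else 0.0
        high.extend(m * gain for m in patch)
    return low + high

# Hypothetical magnitude spectrum: 32 bins whose level falls with frequency.
spectrum = [1.0 / (k + 1) for k in range(32)]
low, envelope = sbr_encode(spectrum, crossover=16, band_size=4)
rebuilt = sbr_decode(low, envelope, band_size=4)
print(len(low) + len(envelope), "values sent instead of", len(spectrum))  # 20 instead of 32
```

The rebuilt high band has the right energy in each band but not the original fine detail, which is why the approach works best where the ear is least discriminating.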
WAV (WAVEform audio format) is an IBM and Microsoft audio file format standard for storing audio on computers. WAV was one of the first such formats, introduced 14 years ago in Microsoft Windows 3.1, and is now most commonly used on Windows-based PCs. WAV files are virtually the same quality as files on audio CDs, but their very large size--10 megabytes per minute of audio--makes them unsuitable for everyday exchange via the Internet.
WAV audio can also be edited and manipulated with software relatively easily. As file sharing over the Internet has become popular, the WAV format has declined in popularity, primarily because WAV files take a long time to send. WAV files are usually uncompressed, so the format offers little or no size reduction but is effectively lossless; it is not streamable and has no digital rights management tools.
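Python's standard-library `wave` module can write a CD-format WAV file directly, which makes the size figure concrete. The sketch assumes a one-second, 440-Hz stereo test tone. Apart from the 44-byte header that `wave` writes for plain PCM, the file is nothing but raw samples, so a minute of such audio comes to roughly 10.6 million bytes.

```python
import io
import math
import struct
import wave

RATE = 44_100  # CD sampling rate

def stereo_tone_wav(seconds, freq_hz=440.0, amplitude=8_000):
    """Return the bytes of a 16-bit stereo PCM WAV file holding a sine tone."""
    frames = bytearray()
    for n in range(int(RATE * seconds)):
        s = int(amplitude * math.sin(2 * math.pi * freq_hz * n / RATE))
        frames += struct.pack("<hh", s, s)   # little-endian left and right samples
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)      # stereo
        w.setsampwidth(2)      # 2 bytes = 16 bits per sample
        w.setframerate(RATE)
        w.writeframes(bytes(frames))
    return buf.getvalue()

data = stereo_tone_wav(1.0)
print(len(data))   # 176444: a 44-byte header plus 176 400 bytes of samples
print(len(data) * 60 / 1e6, "MB per minute (approximately)")
```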
Advanced Audio Coding (AAC) is a lossy data compression scheme intended for audio streams. Designed to replace MP3, AAC was introduced as part of the MPEG-2 international standard, which is widely used for the transmission of digital video, and was further improved in the subsequent MPEG-4 standard and its later versions. It supports a wider range of sampling frequencies than official MP3 (8 kHz to 96 kHz, compared with the 16 kHz to 48 kHz of MP3) and handles high frequencies much better.
AAC provides better and more consistent quality than MP3 at equivalent or slightly lower bit rates. In fact, depending on the MP3 encoder used, 96-kilobit-per-second AAC can give perceptual quality as good as or better than that of 128-kb/s MP3.
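Bit rate translates directly into file size and compression ratio. The sketch assumes constant bit rates and ignores file-format overhead; the 4-minute song length is arbitrary. At 128 kb/s, MP3 comes out about 11 times smaller than CD audio, in line with the roughly one-tenth figure given for MP3 earlier, while 96-kb/s AAC is nearly 15 times smaller.

```python
CD_KBPS = 44_100 * 16 * 2 / 1000   # 1411.2 kb/s: uncompressed stereo CD audio

def compression_ratio(kbps):
    """How many times smaller than CD audio a stream of this bit rate is."""
    return CD_KBPS / kbps

def file_megabytes(kbps, seconds):
    """Size of a constant-bit-rate file, in megabytes of 10**6 bytes."""
    return kbps * 1000 * seconds / 8 / 1e6

for label, kbps in (("MP3", 128), ("AAC", 96)):
    print(f"{label} at {kbps} kb/s: {compression_ratio(kbps):.1f}:1, "
          f"{file_megabytes(kbps, 240):.2f} MB for a 4-minute song")
```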
Two other formats, aacPlus and Dolby AAC, both standardized in 2001, enhance the standard AAC with proprietary technologies. Trademarked as aacPlus by Coding Technologies, the technology is also called AAC+.
AAC should not be confused with AC-2, a codec developed by Dolby Laboratories for professional audio transmission and storage applications where encoder and decoder complexity can be similar. It is AAC that Apple has embraced for its iTunes service and that RealNetworks uses in its new online music store. In the iTunes version, Apple has added a digital rights management system it has tagged FairPlay, a name that Apple bought from Veridisc Inc., of Mundelein, Ill.
Microsoft's response to MP3 was the Windows Media Audio standard, WMA, which was released in December 2000. With the introduction of Apple's iTunes Music Store, WMA has been positioned as a competitor to the AAC format used by Apple. In compression ratio and quality, it is similar to MP3, and it offers the advantage of copy protection that keeps purchased songs from being redistributed without authorization.
Files in WMA format can be played using Windows Media Player, Winamp, and even iTunes for Windows. With the advent of Windows Media Player 9, a new lossless codec has been introduced to accompany the existing lossy codec. The new release also supports variable bit rates. WMA features strong digital rights management.
The Ogg project, now run by the Xiph.org Foundation, began in 1993. It is an open-source project, so its formats can be used without licensing fees. Various components of the project are intended to stand as alternatives to codecs that require license fees, such as MP3 and most of the rest. The Ogg family includes lossy formats (which do a serviceable job of reproducing the audio when decoded but do not reproduce the original bit stream) such as Speex, which handles voice data at low bit rates (from about 8 to 32 kb/s per channel), and Vorbis, which handles general audio data at mid- to high-level bit rates (from about 32 to 256 kb/s per channel).
The Ogg project also includes lossless formats, such as Ogg's original codec, Squish, along with its successor, FLAC (Free Lossless Audio Codec). None of the Ogg formats, however, has digital rights management tools.
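FLAC, like the predictive lossless scheme described earlier, ends up with a stream of mostly small prediction residuals, and it stores them with Rice codes. The sketch below shows the basic Rice code: each signed residual is first "folded" into a non-negative integer, then written as a unary quotient followed by k low-order bits. The residual values and the choice k = 1 are hypothetical; FLAC picks the Rice parameter adaptively for each partition of a block.

```python
def fold(n):
    """Map signed residuals to non-negative integers: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * n if n >= 0 else -2 * n - 1

def unfold(u):
    return u // 2 if u % 2 == 0 else -(u + 1) // 2

def rice_encode(values, k):
    bits = []
    for v in values:
        u = fold(v)
        q, r = u >> k, u & ((1 << k) - 1)
        bits.append("0" * q + "1")              # quotient in unary, ended by a 1
        if k:
            bits.append(format(r, f"0{k}b"))    # remainder in exactly k bits
    return "".join(bits)

def rice_decode(bits, k, count):
    values, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == "0":                 # count the unary quotient
            q += 1
            pos += 1
        pos += 1                                # skip the terminating 1
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        values.append(unfold((q << k) | r))
    return values

residuals = [0, 1, -1, 2, -2, 3]                # hypothetical prediction errors
bits = rice_encode(residuals, k=1)
print(bits, len(bits))                          # 19 bits, versus 96 as raw 16-bit samples
assert rice_decode(bits, k=1, count=len(residuals)) == residuals
```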
As one of the lossy offshoots of Ogg, the Vorbis format has a small but die-hard following that appreciates the format's good fidelity and the fact that it costs nothing. Though we may never see Ogg surpass MP3, Ogg has made major inroads in the video-game sector because game developers can use it without paying fees to anyone.
The lack of widely available audio hardware players is hindering Ogg's growth in mainstream audio, although such devices do exist, including the Neuros MP3 Digital Audio Computer with a firmware upgrade, the Rio Karma, the Xclef HD800, and the Cowon iAudio M3.
So who's pulling into the lead? It's hard to say at the moment. MP3 is still the undisputed leader, although its limitations--such as degraded quality at low bit rates--are starting to show. Certainly, with the success of Apple's iTunes service, the AAC format has surged into a strong position.
None of these formats, however, has all the characteristics necessary to dominate the market. None combines both lossless transmission and storage with the built-in ability to adapt to a variety of playback hardware.
The ultimate format is most likely to come out of the Moving Picture Experts Group, which has already brought us MP3 and AAC. This will be true despite the patents held on the MPEG algorithms that require them to be licensed for use in commercial products. Even though open standards such as Ogg have a large number of devotees and volunteer developers, they will have a difficult battle competing with internationally recognized standards in the commercial arena.
But regardless of which specific flavor of compression ultimately wins, there is no question that compression will change the way we collect and listen to music. Simply put, if you're a music fan, the best is yet to come. The audio library of the future will reside in a device about the size of a deck of playing cards that contains at least 2000 hours of your favorite music, has a wireless interface that communicates with your computer and your home and car audio systems, has a battery life of at least 90 days, and costs no more than a PDA or a cellphone. It will be music to your ears.