Ray Kurzweil's Music Revolution

The (audio) Singularity nears

I've been researching the next generation of music technologies and, not surprisingly, futurist Ray Kurzweil had plenty to say. He sent me a lengthy response describing the innovations he sees coming.

Kurzweil is best known these days for his extensive writings and lectures on the Singularity, a phenomenon that Spectrum has investigated in great detail. But he has also been intimately involved in the field of music, both at his company, Kurzweil Music Systems, and in his personal life (his father was a composer). Here's what he had to say when I asked him about the future of music:

Walk around a typical music convention (such as the National Association of Music Merchants show in Anaheim, which I attended in January) and, aside from the cacophony of music in all directions and the flamboyantly dressed performers, it looks just like a computer conference, given all of the complex hardware and software products on display for creating music.

Consider how far we’ve already come. My father was a famous conductor and composer (he conducted the Bell Symphony, the symphony orchestra of the Bell Telephone system, which appeared on TV frequently). Just to hear his compositions, he would have to raise money, hire an orchestra, run off mimeographed scores, and then engage in rehearsals. If he wanted to make substantive changes to his score, he would have to start this process over again. Now a student can command a full orchestra (or jazz band or rock group) from her dorm room with very inexpensive yet immensely powerful equipment and software. She can then distribute her creations to the world market at no cost.

The tools for creating music have thus become an information technology and as such are now subject to what I call the “law of accelerating returns,” which states that every form of information technology grows predictably and exponentially in price-performance and capacity, basically doubling about every year. Our intuition about the future is linear, not exponential, and there is a profound difference between these perspectives. Thirty linear steps get you to thirty, whereas thirty exponential steps (thirty doublings) get you to a billion. So the computer in my pocket today provides a billion times more memory and a billion times more processing power per dollar than the computer I shared with thousands of other students when I was an undergraduate.
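To make the arithmetic concrete, here is a minimal Python sketch comparing additive and doubling growth. The function names are illustrative assumptions rather than anything from Kurzweil's response, and the one-doubling-per-step model is a simplification of the trend he describes.

```python
def linear_growth(steps: int) -> int:
    """Add one unit per step, starting from zero."""
    return steps


def doubling_growth(steps: int) -> int:
    """Double once per step, starting from one."""
    return 2 ** steps


# Compare the two kinds of growth after 10, 20, and 30 steps.
for steps in (10, 20, 30):
    print(f"{steps} steps: linear = {linear_growth(steps)}, "
          f"doubling = {doubling_growth(steps):,}")
```

Thirty doublings give 1,073,741,824, which is the "billion" in the text. Ten and twenty doublings give 1,024 and 1,048,576, roughly a thousand and a million.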

Every industry and area of life will ultimately be transformed in this way. Health and medicine, for example, has recently become an information technology now that we have the software of life (the genome), the means of updating this outdated software (RNA interference to turn genes off and new forms of gene therapy to add new genes), tools to design interventions on computers, and biological simulators to test them out.


In the music field, we saw the beginning of these trends in the early 1980s, around the time that I founded Kurzweil Music Systems. Nearly thirty years ago we set out to develop a digital technology that could realistically emulate acoustic instruments, the most important and complex one being the acoustic grand piano. There were many challenges in doing this because of the many nonlinearities in the grand piano. Due to the multiple strings for each note that are slightly out of tune with each other, the overtones of the piano are not perfect multiples of the fundamental frequency; they are “inharmonic,” which gives the piano its unique, rich character of sound. Traditional samplers at the time would loop the last waveform during the decay phase of the sound, and this turned the piano sound into what sounded like an organ. We developed a unique way to retain the inharmonicity of the partials throughout the evolution of each note. We were able to model the effect of key pressure on timbre, and to capture many other complexities. The result was the Kurzweil 250, which was considered to be the first digital instrument that could realistically capture the grand piano and other orchestral instruments.

In the intervening decades the electronic music field has broken the centuries-old link between controllers and sound. Traditionally, if you wanted to create violin sounds you needed to master violin technique (and have a violin!). The problem was that very few people could master more than one or two of these techniques. Even if you mastered them all, you still could not play them all simultaneously. Most instruments are not even polyphonic. Today a musician can create any type of sound response (and keep in mind that an instrument is more than just sounds) while using any type of controller (although there is still work to be done to create a good polyphonic MIDI guitar). As a result, a new industry has arisen around controllers that are not limited by the physics of creating sound.

New technologies are also engaging the consumer as co-creator. Consider how movies feed the game industry so that a movie viewer is now able to enter the world of the movie and become a participant. The same thing is happening in the music world, as the success of music games such as Guitar Hero attests. Tod Machover of the MIT Media Lab has created music toys that allow children to compose symphonic works that are surprisingly rich and satisfying. Applications such as "Virtual DJ" and "iRemix" allow users to alter music tracks in real time, automatically splicing in synchronized beats from other songs, adjusting tempo, and generally deconstructing and rearranging content into something entirely new and personalized. Music applications for Microsoft’s Project Natal motion controller, to be released later in 2010, will allow music to be controlled through expressive body movement.

Keep in mind that these information technology tools will be at least a thousand times more powerful for the same cost in a decade, and more than a million times more powerful in two decades. As a result, the ability to immerse users in music that they participate in will move from a game to a mainstream experience in the years ahead.

In the decade ahead, music education software will become very effective at teaching music and playing skills and will be able to intelligently assess and address areas of strength and weakness. It will become routine to learn keyboard and other skills using intelligent computer-assisted music instruction. At Kurzweil Music (now a subsidiary of Hyundai), we’re working on software along these lines. “Easy play” software will involve more than just preprogrammed patterns that the user needs to follow. Rather, the instrument will follow the user. Intelligent software programs that understand music theory will instantly interpret and even predict the creator’s intentions, and adjust the composition to ensure it’s in key and follows inherent musical rules.
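As a rough illustration of the simplest piece of that idea, keeping a performance in key, here is a minimal Python sketch that snaps each played note to the nearest pitch in a chosen scale. It is not Kurzweil Music's software: the function name, the C major default, and the rule that ties resolve downward are all assumptions made for this example. Notes are standard MIDI note numbers, where 60 is middle C.

```python
# Pitch classes (semitones above C) of the C major scale. Assumed default key.
C_MAJOR = (0, 2, 4, 5, 7, 9, 11)


def snap_to_scale(note: int, scale: tuple = C_MAJOR) -> int:
    """Return the MIDI note nearest to `note` whose pitch class is in `scale`.

    Hypothetical helper: when two scale notes are equally close,
    the lower one is chosen.
    """
    if note % 12 in scale:
        return note
    # Search outward one semitone at a time, checking below before above.
    for offset in range(1, 7):
        for candidate in (note - offset, note + offset):
            if candidate % 12 in scale:
                return candidate
    return note  # Only reached if the scale is empty.


# A phrase containing out-of-key notes (C#, D#, F#, A#).
played = [60, 61, 63, 66, 70]
corrected = [snap_to_scale(n) for n in played]
print(corrected)  # [60, 60, 62, 65, 69]
```

Software of the kind described would go much further, predicting intent and applying rules of harmony and rhythm. This sketch covers only the in-key correction.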

There will be applications that will be able to gather all the subtle patterns present in one song, and apply them to another song, like a tint. Playback platform technologies will emerge that allow listeners to deeply and spontaneously interact with the music they consume – for example letting them "assign" a new singer to any song.  You could have Sinatra croon your favorite Pink Floyd tune, or transform your Green Day rock anthem into a samba. You could sample any voice, and digitally paint the song with it, or add a synthetic back-up choir of realistic human voices.

As virtual worlds become more immersive, realistic and accessible, and with ever faster computing power and communications connectivity, real-time collaborations will take place in virtual environments that will feel and sound real. Humans will remotely collaborate with other humans, or with intelligent programs that exist as a virtual DJ or avatar composer. And the memory of these performances and work sessions can be densely preserved – all nuances and details – allowing intelligent software to recall, learn from and then modify any performance. 

The interactive graphics field is grappling with a phenomenon called the “uncanny valley,” in which animations are close but not convincingly identical to real human behavior. The result is a visceral negative reaction. For this reason, many animated characters such as Shrek are distinctly different from humans. To provide completely human avatars, the interactive animation field will need to leap over the uncanny valley. There’s a similar issue with music. The early Moog synthesizers did not need to sound like real instruments, and early emulations of orchestral instruments were rejected because of the uncanny valley. The synthesizer field has successfully leaped this chasm, and synthesizers are now used for almost all commercial music (soundtracks, musicals, commercials, popular songs). The same leap has not yet been made in music collaboration software, but that’s coming over the next five to ten years.

In music controllers, we will progress from devices such as keyboards, which provide touch-sensitive switches, to fully immersive environments in which we can use our hundreds of muscles to interact with and shape rich musical tapestries.

There are still frontiers to conquer in music synthesis. The state of the art is instruments such as our Kurzweil Music PC3 series, which combine sampling with extensive digital synthesis and signal and sound processing. The broad trend in computing is to recreate the real world with realistic simulations and then expand beyond what is possible with real materials and real environments. Digital modeling has been around for a couple of decades, but the enormous computational requirements still limit the ability to realistically capture real-world instruments and effects. The next decade, however, will be the decade of digital modeling. We will be able to realistically simulate what happens to sound in instruments of real-world complexity, for example by providing a several-hundred-pole filter to simulate the effect of lifting the dampers on the several hundred strings of a piano (note that many of the 88 notes have more than one string). Once the real world is captured, we can then expand to fantastic virtual instruments with the complexity of the natural world but limited only by our imagination.

Music will remain the communication of human feelings and stories through sound.  We all communicate our feelings with words but very few of us have the opportunity to express ourselves by creating original music.  I believe the innate capacity to do this exists in all of us – the music technology that will emerge over the next decade will give voice to the music creator in all of us.
