Four-Armed Marimba Robot Uses Deep Learning to Compose Its Own Music

Georgia Tech's Shimon musical robot uses AI to compose completely original music
Photo: Georgia Tech
Georgia Tech's Shimon has analyzed thousands of songs and millions of music clips and can now compose completely original music.

The Georgia Tech Center for Music Technology, led by Gil Weinberg, has a reputation for doing incredible musical things with robots, with a mix of creativity and technical expertise in robotics and AI. We’ve seen projects like a cybernetic second arm for a drummer, a cybernetic third arm (!) for a drummer, and a bunch of interesting research on ways that robots can dynamically collaborate with humans in the context of improvisational music. That last thing usually features Shimon, a four-armed expressive robotic marimba player, which can analyze music in real time and improvise along with human performers.

It’s an impressive thing to watch, but Shimon’s talents were mostly restricted to riffing on what human musicians were doing. Now, Shimon has leveraged deep learning to create structured, coherent, and entirely original compositions of its very own.

This is Shimon’s very first original piece of music, a sort of classical-jazz fusion-y thing:

Shimon’s teacher (of sorts) is Georgia Tech Ph.D. student Mason Bretan. The melody and harmonic structure that you’re hearing are the output of a four-measure seed melody run through a neural network that’s been trained on nearly 5,000 complete songs (including music by Beethoven, The Beatles, Lady Gaga, Miles Davis, and John Coltrane), along with 2 million motifs, riffs, licks, and other foundational musical elements.

In the second piece that Shimon came up with, Bretan used a faster seed melody, and Shimon produced something completely different and noticeably brisker:

It’s important to understand that Shimon isn’t just mushing together different bits of music that it’s been programmed with, or using some kind of random-music generator. The special thing about what Shimon is doing here is that its deep neural network has, in effect, listened to those thousands of songs, and its compositions represent everything it’s learned from analyzing them. It’s able to generate harmonies and chords, and it focuses (like humans do) on the overall structure of the composition rather than simply what note should come next in an existing sequence.

Bretan calls this “higher-level musical semantics.” Shimon’s music isn’t something that we can necessarily identify with at this point, because we’re hearing the creative output of a deep-learning system. Weinberg calls Shimon’s music “beautiful, inspiring, and strange,” and we’d have to agree: This is something with coherence and structure, but it’s also completely unique.

For more details, we spoke with Bretan and Professor Weinberg over email:

IEEE Spectrum: Are the compositions that you selected to share videos of representative of what Shimon comes up with? Or did you select some that came out particularly well?

Gil Weinberg: These are the first two compositions Shimon composed using deep learning; there was no selection on our part. They represent the data set Shimon learned from and the seed motif it was fed. One can imagine that if we extended the data set to include other music, and if we provided different kinds of seed melodies, the music Shimon generated would be quite different.

If you trained the robot on only one type of music (say, classical music, or even classical music by a particular composer or school of composers), to what extent would the music it composes be identifiable as being related to the training set?

Weinberg: Shimon’s music is very much related to the training set, so if the data set had only one composer, the music would probably be quite identifiable with that composer (or genre). There is another important parameter at play, which is the seed music, which can lead to significant variations of the outcome.

Why do you feed Shimon motifs, riffs, licks, and other musical fragments, as well as complete songs? How does it integrate those two things?

Mason Bretan: We want the network to learn important structural concepts. If we draw an analogy to language, in order for someone to write a story, he or she would need to understand the concept of words, sentences, paragraphs, and so on. In music, things such as licks, motifs, passages, and so on are somewhat analogous components. To encourage learning these musical concepts, we don’t explicitly say, “Here is a motif, here is a full song, here is a lick.” Instead, we train the network dynamically, by varying the sequence length so that sometimes the network has to predict the next measure given just the previous measure, sometimes given the previous two measures, sometimes given the previous eight measures, and so on, all the way up to 16 measures.
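A rough sketch of that variable-length training scheme (hypothetical toy code, not the team's actual pipeline, with a song represented simply as a list of measures) might sample (context, target) pairs like this:

```python
import random

def make_training_pairs(song, context_lengths=(1, 2, 8, 16), n_pairs=4, seed=0):
    """Sample (context, target) training pairs from a song (a list of
    measures), varying how many previous measures the network sees."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        k = rng.choice(context_lengths)          # context length in measures
        start = rng.randrange(0, len(song) - k)  # leave room for the target
        context = song[start:start + k]          # the previous k measures
        target = song[start + k]                 # the measure to predict
        pairs.append((context, target))
    return pairs

# Toy "song": 20 measures labeled m0..m19.
song = [f"m{i}" for i in range(20)]
for context, target in make_training_pairs(song):
    print(len(context), "->", target)
```

The same network thus sees one-measure, two-measure, and 16-measure contexts during training, rather than being told explicitly which fragments are licks, motifs, or whole songs.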

Can you give us a more detailed description of the process that Shimon uses to compose original music?

Bretan: The first (and arguably most important) step is learning an effective numerical representation of a small snippet of music, like a single beat or a few beats. This is called “neural embedding.” In language modeling, you may have heard of “word to vector,” or “word2vec,” a method by which a network learns that words such as “good,” “great,” “pleasant,” and “wonderful” are all semantically similar. A similar process is done in this work for music, so that the network learns to represent small musical snippets in a way that groups similar snippets closer together.
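As a toy illustration of that idea (a hand-crafted stand-in, not the learned embedding Bretan describes), one could embed a snippet as a normalized pitch-class histogram and compare snippets with a cosine similarity; snippets built from the same pitch classes then score close to 1, unrelated ones close to 0:

```python
import math

def snippet_vector(notes, size=12):
    """Embed a snippet (a list of MIDI pitches) as a normalized
    pitch-class histogram -- a stand-in for a learned embedding."""
    vec = [0.0] * size
    for n in notes:
        vec[n % size] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    """Cosine similarity of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))

lick_a = [60, 64, 67, 72]   # C E G C
lick_b = [48, 52, 55, 60]   # same pitch classes, an octave down
lick_c = [61, 62, 63, 66]   # unrelated, chromatic pitch classes

sim_ab = cosine(snippet_vector(lick_a), snippet_vector(lick_b))
sim_ac = cosine(snippet_vector(lick_a), snippet_vector(lick_c))
print(round(sim_ab, 2), round(sim_ac, 2))  # -> 1.0 0.0
```

A learned neural embedding plays the same role, except that the notion of "similar" is discovered from the training data rather than hand-coded.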

The second part is the sequence modeling and prediction of these musical snippet vectors. A recurrent neural network is trained to make predictions given the previous measures of music. It’s not exactly the type of reinforcement learning commonly used in robotics, in which a robot learns a sequence of discrete actions to solve some problem. Instead, Shimon is predicting a sequence of numbers in a continuous space. Let’s say given the sequence “1, 2, 1, 2, 1, 2, 1” the network is trained to predict the number “2.” That means during training, the farther away it is from the number “2,” the more substantially it will update the parameters. So once the network is trained, a seed is given to the network to provide some context, and then it continuously makes predictions, which make up Shimon’s composition.
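Bretan's "1, 2, 1, 2" example can be sketched with a tiny linear predictor trained by gradient descent (a toy stand-in for the recurrent network); the points being illustrated are that the update scales with how far the prediction is from the target, and that after training, a seed is fed in and the model's own predictions feed back to produce the sequence:

```python
def train(sequence, lr=0.1, epochs=500):
    """Fit next = w*prev + b by gradient descent on squared error:
    the farther the prediction is from the target, the bigger the update."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for prev, target in zip(sequence, sequence[1:]):
            pred = w * prev + b
            err = pred - target   # update magnitude scales with the error
            w -= lr * err * prev
            b -= lr * err
    return w, b

def generate(w, b, seed, n):
    """Seed the trained model, then let its predictions feed back on themselves."""
    out = [seed]
    for _ in range(n):
        out.append(w * out[-1] + b)
    return out

w, b = train([1, 2, 1, 2, 1, 2, 1])
print([round(x) for x in generate(w, b, seed=1, n=5)])  # -> [1, 2, 1, 2, 1, 2]
```

In the real system the "numbers" are the snippet embedding vectors from the previous step, and the predictor is a recurrent network, but the train-then-seed-then-generate loop is the same.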

Does Shimon have a particular style as a composer? Can you elaborate on how Shimon’s compositions are different from the music that humans create?

Weinberg: The underlying rationale behind all of our robotic musicians is to combine music that we humans love and appreciate (captured using machine listening and machine learning) with new ways to play and think about music (using algorithms that humans can’t or don’t use). Here, the deep-learning architectures aim at capturing musical concepts and patterns that humans use. In the generation phase, we can play with the algorithms to add machine-based mathematical permutations that may lead to novel music, which may be beautiful, inspiring, and strange.

Are there practical applications for this learning and improvisation technique beyond musical composition?

Weinberg: We are using long short-term memory (LSTM) networks and unit selection; both approaches can be (and have been) used in language modeling and generation, which can be thought of as a kind of “improvisation.”
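Unit selection, a technique borrowed from concatenative speech synthesis, stitches output together from a library of stored units. A minimal sketch (with a hypothetical toy library of two-dimensional snippet vectors) picks the stored snippet whose vector best matches the network's prediction:

```python
def select_unit(predicted, library):
    """Unit selection: return the stored (name, vector) unit whose
    vector is closest (squared Euclidean distance) to the prediction."""
    return min(
        library,
        key=lambda unit: sum((p - u) ** 2 for p, u in zip(predicted, unit[1])),
    )

# Toy library of stored snippets and their (hypothetical) embedding vectors.
library = [("lick_a", [0.9, 0.1]), ("lick_b", [0.2, 0.8])]

name, _ = select_unit([0.85, 0.2], library)
print(name)  # -> lick_a
```

Chaining such selections turns a sequence of predicted vectors back into a sequence of concrete, playable musical units.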

What are you working on next?

Weinberg: We have started looking at using deep learning to learn not only from symbolic notation but also from human performances of the music in the data set. This could allow the robot to learn not only what notes to play but also how to play them so the music sounds rich and expressive (controlling parameters such as microtiming, articulation, and intonation).

Bretan: The next big questions I have are about interaction and how developing a deeper understanding of embodiment influences the compositional and perceptual processes of music. Shimon has four arms: How does that influence its interpretation of music compared to a human with two arms and 10 fingers?

Many thanks to Mason Bretan and Gil Weinberg for speaking with us. And if Shimon ever wants to learn from a bagpiper, just let me know.
[ Georgia Tech Center for Music Technology ] via [ Georgia Tech News ]



IEEE Spectrum’s award-winning robotics blog, featuring news, articles, and videos on robots, humanoids, drones, automation, artificial intelligence, and more.