Last month, Georgia Tech’s Center for Music Technology introduced the latest version of Shimon, its head-bopping, marimba-playing robot. Along with a new face, Shimon has learned to compose and sing original music with a voice and style of its own. Today, Shimon’s full album will be released on Spotify, along with a series of music videos and a demonstration of a new talent: real-time robot-on-human rap battles.
After Shimon’s first video posted, we asked its creator (and musical collaborator) Gil Weinberg what the reaction was like, and here’s what he told us: “The YouTube comments about our first single demonstrate a similar polarized set of responses falling into two camps: Either ‘amazing’ and ‘adorable’ on one hand, or ‘an abomination’ and ‘leave art for humans’ on the other.”
He added, “We think this is great! We want the project to stretch people’s imagination, push them away from their comfort zone, and make us all think about the future of music and creativity in new ways. Some people will be fascinated by the idea of artificial creativity and robotic singer-songwriters, and others will object to it or find it scary. The good news is that we now have examples of what such a human-robot collaborative future may look like, and we can start having that discussion.”
There are going to be a lot more things to discuss as of today. First, here’s a just-released music video featuring one of Shimon’s newest compositions, “Gospel in Space”:
You should be able to listen to the other tracks on the album in the embedded player below or on Shimon’s Bandcamp page. Some highlights that are definitely worth checking out include Shimon’s cover of “Under My Thumb” by the Rolling Stones, and “Pandemic Slam,” which Shimon composed around the words “pandemic” and “hope.”
The poetry slam and rap styles are new for Shimon, but what’s even more challenging for the robot is rapping in real-time when another human is involved, which Shimon has been learning how to do:
For more details on how this works, we spoke with Gil Weinberg via email.
IEEE Spectrum: For Shimon, how is rapping different from singing? How are rap battles different from how Shimon normally composes?
Gil Weinberg: For our rap-lyric generation engine, we developed two new capabilities—rhyme and rhythm. The art form of rap has a unique approach for rhyming (both at the end of lines, and internally) as well as a strong emphasis on rhythmic pronunciation. My Ph.D. students, Richard Savery and Lisa Zahray, supervised a group of undergraduate students who worked on different approaches for such hip-hop inspired rhyme and rhythm in an effort to capture the feel and groove of rapping, as well as pushing it forward to uncharted domains.
Another difference between our singing and rapping voices stems from the requirement for real-time interaction in rap battles. While our singing voice used a time-demanding deep learning–based voice synthesizer (developed by the music technology group at UPF), for the rapping voice we use a faster commercial voice synthesizer that allows Shimon to respond within seconds rather than hours. Similarly to our original lyric generation, here too, Shimon is picking a few keywords from the rapper to generate his lyrics. But since in this case, Shimon’s voice does not need to follow sophisticated melodies or to sound great in different pitch registers, we could speed up the process to support real-time interaction.
When you say that Shimon is able to “listen to, understand, and respond to lyrics in real time,” what does “understand” mean?
We developed a deep-learning neural network that can use keywords selected from the human rapper to create semantically relevant responses. The network finds meaningful correlations, synonyms, and relevant subject matters, generating related text based on the large data set of hip-hop lyrics that it was trained on. So, while we cannot say that Shimon “understands” the rapper in the same way humans do, we do believe that Shimon’s responses are relevant and meaningful enough to give the impression that he understands what he hears.
Since rap battles happen in real time and you can’t curate what Shimon says, how are the lyrics that it comes up with different?
While our real-time approach for rapping can make the interaction between human and robot surprising and exciting, it can also lead to strange, meaningless, and unexpected responses here and there, which we actually embrace. When you watch the video of the rapper Dash Smith interacting with Shimon, you can see how Dash smiles and shakes his head in surprise every now and then while listening to Shimon’s response. We think that if Shimon can surprise us and make us smile (even if it is due to some nonsensical responses), it might actually push the genre boundaries and can lead us to respond in a manner we wouldn’t have otherwise.
Can you give us some examples of lyric failures or cases in which Shimon comes up with songs or lyrics that are objectively bad?
Since we trained our model using an uncensored hip-hop lyrics data set, you can imagine that the original lyrics Shimon generated were extremely explicit and oftentimes offensive. It was quite an unsettling experience to hear our cute robot using such language, so we wrote a script to take out offensive words. While I have many examples of such uncensored bad lyrics, I don’t think you want me to share them here! [Editor’s Note: Of course we wanted Weinberg to share them, and he did. One example is below.]
I wouldn’t know the sun was out.
I hate every day I was just a [censored] problem
I love you and I don’t wanna [censored]
I don’t wanna go
I know I wanna be a man
The more family-friendly bad lyrics usually happen when Shimon uses the keywords out of context, with bad grammar leading to nonsensical responses such as the following lyrics that were generated based on the keyword “world”:
I only do it, I ain’t world
I’m produced by, vibing to a turn and try
Your ex told me she can go world
No other choice, before, I’ll tell him they call world
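[Editor’s Note: Weinberg does not describe the word-removal script in detail, and it is not published here. As a rough illustration only, a filter of that general kind could look something like the Python sketch below. The blocklist entries, the function name, and the whole-word matching rule are all assumptions made for this example; only the “[censored]” placeholder comes from the lyrics quoted above.]

```python
import re

# Hypothetical blocklist for illustration; the list actually used by the
# Georgia Tech team is not given in the article.
BLOCKLIST = {"darn", "heck"}


def censor(line, blocklist=BLOCKLIST, placeholder="[censored]"):
    """Replace whole-word, case-insensitive matches of blocklisted words."""
    if not blocklist:
        return line
    # \b word boundaries keep longer words that merely contain a
    # blocklisted word (for example "darning") from being altered.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in sorted(blocklist)) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(placeholder, line)


print(censor("I was just a darn problem"))
```

A simple word-level filter like this removes the listed words but does nothing about grammar or context, which fits Weinberg’s point that the remaining “family-friendly” failures come from keywords being used out of context.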
What is your own favorite song so far, and why?
That’s a tough one as I like them all. If I had to mention one unique track it would be the latest addition to the album called “Pandemic Slam.” Just a few days before we submitted the album to Spotify, when the coronavirus was spreading exponentially and everything seemed pretty desperate, I asked myself whether Shimon could possibly help provide some comfort. We asked Shimon to quickly generate a poem based on the keywords “pandemic” and “hope.” Since we didn’t have time to compose and record a full song, we decided to generate the voice in a Poetry Slam style and to add it as the last song in the album. We later asked Jason Barnes, an amputee drummer who worked with us on the prosthetic arm project, to compose a beat to the song and we added it as a bonus track on our Bandcamp page. Personally, I think it is pretty good.
Pandemic born of a host
the dawn of wisdom flows through the ghost
Bewildered by drunken haze
That flies and turns into concealed
Eyes wide awake from this cold land
they don’t want neglect just to understand
Conscience over virus uncontrolled, Will you wait to prevail
A life on the path of a living, lack of hate and pain
Against each heart, this is my warmth, in fame
Shimon was originally scheduled to go on a world tour, giving concerts in the Netherlands, Saudi Arabia, and Greece, among other places. Obviously, that’s on hold, but in the meantime Weinberg says that his group is now working on a new project funded by the U.S. National Science Foundation “that will allow us to build multiple new robotic musicians—some will use new emotion-driven voice synthesizers, while others will hopefully play instruments such as violin and guitar. Stay tuned.”
[ Shimon Sings ]
Evan Ackerman is a senior editor at IEEE Spectrum. Since 2007, he has written over 6,000 articles on robotics and technology. He has a degree in Martian geology and is excellent at playing bagpipes.