Wednesday, July 28, 2021

Georgia Tech’s Shimon Can Jam!

Much like a human marimbist, Shimon concentrates on playing the correct notes.

A Center-Stage, Jazz-Playing Robot Musician, But Does It Have Soul?

When most of us think of robots, we think of task-oriented machines, not grooving hepcats laying down the sound with their bandmates while bobbing their heads to the beat.

Shimon’s four arms each have two mallets. The mallets closest to its body are for the marimba’s white keys and the farther ones are for the black keys.

However, that is what the team at Georgia Tech’s Center for Music Technology created when they designed Shimon, a marimba-playing robot that can improvise jazz solos and back up human musicians based on their playing, not on a pre-programmed sequence of notes and rhythms. And it learns over time to improve its virtuosity, accompaniment and improvisational skills!

Gil Weinberg is the director of music technology at Georgia Tech.

The robot, which consists of a head and four arms, listens to a human musician, watches for visual cues and continuously adapts its improvisation and choreography to the music of the human, according to the team at the Georgia Tech Center for Music Technology that created it.

These are three of the many concept sketches that were made to develop Shimon.

This is a novel interactive system based on the notion of gestures for both musical and visual expression. To achieve this, the system uses anticipatory, beat-matched action to enable real-time synchronization with the human player. If you are an improvisational musician, this may sound creative and interesting, and if you are an orchestral musician, you can imagine the complementary contribution of machine intelligence to established compositions!

Shimon’s head can swivel around, as if it is checking out the other musicians who are playing. And, in actuality, it is. Shimon is programmed to see and hear what the other players are doing.


Shimon has given live ensemble performances of Duke Jordan’s “Jordu” and has been on the Bumbershoot Tour. Invitations for future engagements keep rolling in, so much so that its creators back in Atlanta are finding it hard to work on improvements to Shimon, as it seems to be continuously on the road.

Make no mistake, Shimon is not a modern take on a player piano, or a sequencer linked to mechanical movements to create a sophisticated dancing puppet. It is also not a novelty act. Shimon is a real, center-stage, jazz-playing musician, loaded with artificial intelligence to determine which note is the next best note to play, based in part on music theory and in part on understanding the music styles of legendary jazz musicians like Thelonious Monk, Charlie Parker and Duke Ellington, to name a few.

It also possesses innovative human-robot interaction capabilities so that it can connect with its fellow musicians. “Our motto is ‘listen like a human and improvise like a machine,’” says Gil Weinberg, director of music technology at Georgia Tech and one of the creative forces behind Shimon. He said jazz was chosen as a topic for robotics because it is a demanding medium and improvisation is a challenging form of expression.

It may be a Robot Musician but its human-robot interaction and artificial intelligence may provide lessons for robot developers everywhere no matter what problem they are trying to solve.


Shimon, Hebrew for “listener,” is the second Robot Musician created by the Georgia Tech Center for Music Technology. The first was Haile, a robot percussionist created in 2007, which was conceived to interact rhythmically with two human percussionists on a Native American pow-wow drum. It cost around $500 to construct and program.

Weinberg said the group started with percussion because rhythm is a basic musical expression. Determining tempo and beat are integral first steps to music. Once Haile had succeeded at drumming, the next step was to add tonal qualities, leveraging the lessons learned on Haile. Haile was modified to play a toy xylophone, generating melodic responses based on a genetic algorithm. While it served as an initial platform for musical experiments, the system was limited to one octave and was restricted to score-based interaction.

A marimba-playing Robot Musician seemed the next logical step. The marimba is a percussion instrument played with mallets on bars aligned on a horizontal frame, much like piano keys. The increased tonal possibilities could build upon the rhythms and melodies previously explored. However, with the increased capabilities came an increased price tag, to the tune of around $120,000 over three years, which was funded by the National Science Foundation (NSF).


It may not have a diploma from Berklee College of Music or Juilliard, but Shimon really has studied music and is playing and improvising for real. A standard approach for musical robots is to handle a stream of MIDI (Musical Instrument Digital Interface) notes and translate them into actuator movements that produce notes, and Shimon is no different in this respect. MIDI is used both for programming musical knowledge and during live performance, when Shimon is listening and playing along with human musicians in real time.

Ryan Nikolaidis.


“We have implemented a variety of music generation systems. The most recent (and, we have found, the most successful) uses a variable-order Hidden Markov Model (HMM),” explains Ryan Nikolaidis, a PhD candidate who worked on programming Shimon’s musical capabilities.

“For offline learning, we run a system on improvisations by the masters. Working with improvisation, we have focused on a jazz idiom, and therefore trained on some of the jazz masters, like Charlie Parker, John Coltrane, Thelonious Monk, Duke Ellington and others,” Nikolaidis says. “As far as analysis, it splits up musical information into chains for pitch, rhythm (inter-onset intervals), and velocity. Each set of chains is saved as an .xml file unique to each author/improviser.”
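As a rough illustration of the analysis step described above, the following Python sketch splits a list of note events into three parallel chains: pitch, inter-onset interval and velocity. The `Note` tuple, the `split_into_chains` function and the sample phrase are assumptions made up for this illustration. They are not taken from Shimon's actual code, which the article says is written in Java and C.

```python
from typing import NamedTuple


class Note(NamedTuple):
    """A hypothetical MIDI-style note event (not Shimon's real data type)."""
    onset: float   # onset time in seconds
    pitch: int     # MIDI note number, 60 = middle C
    velocity: int  # MIDI velocity, 1-127


def split_into_chains(notes):
    """Return separate pitch, inter-onset-interval (IOI) and velocity chains."""
    ordered = sorted(notes, key=lambda n: n.onset)
    pitches = [n.pitch for n in ordered]
    velocities = [n.velocity for n in ordered]
    # IOI: time from the start of one note to the start of the next
    iois = [round(b.onset - a.onset, 3) for a, b in zip(ordered, ordered[1:])]
    return {"pitch": pitches, "ioi": iois, "velocity": velocities}


# Made-up four-note phrase, used only to exercise the function.
phrase = [Note(0.0, 60, 90), Note(0.5, 63, 80), Note(0.75, 65, 100), Note(1.5, 67, 70)]
chains = split_into_chains(phrase)
```

Keeping the three dimensions in separate chains means each one can be matched and continued independently, which is what the performance-time lookup relies on.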

In performance, when the system is improvising it constantly tries to work out which note to play next and when that note should be played. It does so by referring back to these pitch, duration, and velocity chains. It looks for matches between these chains and the last notes played by itself and/or by the human performer(s). All of these matches are weighted according to the length of the match. So if the last five notes played by a human performer match five notes in a John Coltrane pitch chain, the next note of that John Coltrane pitch chain is weighted highly. Most often there is more than one match. It chooses from these matches pseudo-randomly (with weighting given to better matches).
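The matching-and-weighting idea in the paragraph above can be sketched in a few lines of Python. This is a simplified illustration and not Shimon's implementation: the function names, the `max_order` limit, the squared-length weighting and the random fallback are all assumptions, and the pitch chain is invented data, not a real transcription.

```python
import random


def find_matches(chain, recent, max_order=5):
    """Return (next_value, match_length) pairs for every place in `chain`
    where the tail of `recent` occurs, for match lengths 1..max_order."""
    matches = []
    for length in range(1, min(max_order, len(recent)) + 1):
        suffix = recent[-length:]
        # Stop one short of the end so a "next value" always exists.
        for i in range(len(chain) - length):
            if chain[i:i + length] == suffix:
                matches.append((chain[i + length], length))
    return matches


def choose_next(chain, recent, rng=random, max_order=5):
    """Pick the next value pseudo-randomly, favouring longer matches."""
    matches = find_matches(chain, recent, max_order)
    if not matches:
        return rng.choice(chain)  # fallback when nothing matches (an assumption)
    values = [value for value, _ in matches]
    weights = [length ** 2 for _, length in matches]  # illustrative weighting only
    return rng.choices(values, weights=weights, k=1)[0]


# Invented pitch chain (MIDI note numbers) and the last three notes heard.
pitch_chain = [60, 62, 64, 65, 67, 64, 65, 72]
recent_pitches = [62, 64, 65]
next_pitch = choose_next(pitch_chain, recent_pitches)
```

In this toy data the three-note match favours pitch 67, but 72 remains possible because shorter matches still contribute some weight, so repeated calls do not always return the same note.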

Similarly, chess programs run a comparable analysis of the next move, assigning higher weightings to the stronger moves after reviewing thousands of possibilities.

The programming for Shimon, in Java and C, builds on ideas set out by François Pachet’s Continuator and incorporates elements of growing intelligence, where it continually builds a knowledge database of individual styles that it recombines to create something unique. Pachet’s Continuator also uses HMMs for style modeling, but it relies more on call-and-response interaction.

Shimon’s four arms slide back and forth on the rail that is between it and the four-octave marimba.


Shimon continues to learn, so it is possible that in five years’ time you will be listening to a much more accomplished performer than the one it is today.

“This gets overlooked and it is probably what excites me most about the work. Rather than just trying to model the style of others’ performances, we are ultimately interested in how modeling can be applied to recombining multiple models to create a unique and truly creative improvisation. It’s a clear Turing test with one model. You have it model only John Coltrane, then have it play. If it plays like John Coltrane (without just replicating the original music), you’ve done your job. To me it gets interesting once you start adding two, three, or more models, because now we start to move from style modeling to language modeling (how do we sum up our past experiences to create something new?). Aside from just the idea of combining multiple models, it IS continual. Every time it plays with someone, I let it know the name of the person with whom it is playing. It saves a comprehensive model of how that person performed and interacted. If the same person comes back, it will recall their model and start adding on to it. In any future performance, it can use this information to inform the way it plays,” says Nikolaidis.

“Ultimately, I want it to be able to remember more. For instance, maybe when I played with it I liked that it used more Monk and less Ellington. Eventually we will incorporate memory so that it will recall this information from previous times that we rehearsed together and bring that to every new interaction (but only with me),” said Nikolaidis.

“Every day that I make it into the lab and get some time to jam with Shimon, its language continues to grow. My job now is just to keep trying to improve how it grows, so that it can make the best use of the information that it hears,” Nikolaidis adds.
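One simple way to picture recombining several style models is to let each model propose weighted candidates for the next note and then blend those proposals according to a mix. The Python sketch below does this. It is only an assumed illustration: the article does not say how Shimon combines models, and `blend_candidates`, the style names used as dictionary keys and all the numbers are hypothetical.

```python
from collections import defaultdict


def blend_candidates(per_style_candidates, style_mix):
    """Combine weighted next-note candidates from several style models.

    per_style_candidates: {style: [(note, weight), ...]}
    style_mix:            {style: share of influence}, e.g. {"coltrane": 0.75}
    Returns {note: probability}, or {} if nothing has any weight.
    """
    scores = defaultdict(float)
    for style, candidates in per_style_candidates.items():
        share = style_mix.get(style, 0.0)  # styles missing from the mix are ignored
        for note, weight in candidates:
            scores[note] += share * weight
    total = sum(scores.values())
    if total == 0:
        return {}
    return {note: score / total for note, score in scores.items()}


# Hypothetical candidates proposed by two style models for the next pitch.
candidates = {
    "coltrane": [(61, 3.0), (66, 1.0)],
    "monk": [(64, 2.0)],
}
probabilities = blend_candidates(candidates, {"coltrane": 0.75, "monk": 0.25})
```

Changing the mix shifts the probabilities without retraining either model, which is one way a per-performer preference could be stored and recalled later.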


Shimon was programmed to possess a degree of music theory, but Nikolaidis said he wanted to keep its knowledge of music general rather than overload it with rules. “In programming I have tried my best to avoid too many hard-coded music theory assumptions, especially any that point directly towards Western tonal theory or any exclusive theory. So, there isn’t any hardwired definition of a C chord, for instance. Rather, there is a more basic understanding of how to combine the notes that I play with the notes it plays. For instance, Thelonious Monk is totally happy with smashing those minor seconds. So, if the model is building from Monk, and I play a C chord, there’s a strong possibility that it might respond with C#s, Fs, and G#s, which would not be the case if it relied on a model of Vivaldi. The hypothesis is that it can learn a musical language (classical, folk, etc.) from modeling an amalgamation of the distinct styles that make up that language. So, in studying jazz we model folks like Coltrane and Monk and Parker. Generally when programming for Shimon I ask myself if the way I am writing the code would exclude it from being able to learn anything from Bach, to Ravi Shankar, to Stockhausen. If it does, then I am inherently limiting its potential,” Nikolaidis intones.


Shimon is a robot platform that needs to succeed on several levels. The first level is that it must play its instrument and hit the right notes. The second is that it must make gestures that show that it is interacting with its fellow musicians as well as concentrating on its own playing. And it must do both of these things in sync with each other. Further, with its round head and eye, it provides a visual connection to other musicians and the audience. Last but not least, it needs durability and reliability: it must be able to perform live for a considerable period of time in front of other musicians and an audience. For that reason, industrial-grade components were employed when developing the robot.

Several considerations influenced the physical design of Shimon. Large movements for visibility, as well as fast movements for virtuosity were employed. In addition, the goal was to allow for a wide range of sequential and simultaneous note combinations. The resulting design was a combination of fast, long-range, linear actuators, and two sets of rapid parallel solenoids, split over both registers of the instrument.


“When designing Shimon’s head, my main goal was to keep the design non-anthropomorphic, and in line with the rest of the robot, both in terms of materials, and in terms of shape and scale. I first set out to distill the core social competencies of musical interaction: keeping rhythm and communicating that internal rhythm to fellow band members; turn-taking; indicating concentration (by looking down) and expectation (by turning the gaze towards others); and coordinating anticipation and expectation in the physical space,” says Guy Hoffman, a postdoctoral associate at the Center for Music Technology who primarily worked on Shimon’s head.

“Then I asked: how do you put these competencies in as simple a package as possible? Since the rest of Shimon was very minimal, I didn’t want to design a head that was too cartoonish or too human, as some early designs indicated. Instead, I wanted to make it clear that Shimon is a machine, a robot – just a robot that can do what one needs to do to play its part in a band. And we also wanted to include a camera, so that Shimon can not only communicate its own internal state, but also detect the social cues from other band members,” Hoffman said.

Shimon jams with a young lady who is playing drums.

Hoffman also took a pragmatic and practical view. “I paid particular attention to the orientation of the motors, using my training as an animator to explore the minimal set of motors that can create a life-like movement, an exercise many animators do when they try to instill personality into simple shapes. Using the same animation principles, I tried to give Shimon’s head the right kinds of acceleration and velocity envelopes to make him groove as well as he plays. The eyelids add another level of [animation],” Hoffman explains.

The robot consists of four arms, each actuated by a voice-coil linear actuator at its base and running along a shared rail, parallel to the marimba’s long side. It can play up to four simultaneous notes with a frequency of over 10 Hz per mallet. The robot’s trajectory covers the marimba’s four octaves. The linear actuators are based on a commercial product by IAI and are controlled by a SCON trajectory controller. They can reach an acceleration of 3g and, at top speed, move at approximately one octave per 0.25 seconds.

The arms are custom-made aluminum shells housing two rotational solenoids each. The solenoids control mallets, chosen with an appropriate softness to fit the area of the marimba that they are most likely to hit. Each arm contains one mallet for the bottom-row (white) keys and one for the top-row (black) keys. Shimon’s physical structure was also designed in collaboration with Roberto Aimi, principal of Alium Labs.

The body, which consists of the arms moving in parallel along the shared rail and the linear-actuated strikers, has two degrees of freedom (DoFs) per arm. The head has four DoFs: the base (moving the neck rotationally), the neck (moving the neck up and down), head rotation, and head movement up and down. There are also two Dynamixels in the head that control the blinking of its eye.
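The speed and acceleration figures above give a feel for how far ahead of the beat an arm has to start moving. The Python sketch below estimates rest-to-rest travel time with a symmetric trapezoidal velocity profile. The profile shape and the octave width of 0.5 m are assumptions chosen for illustration (the article gives neither); only the 3g acceleration and the one-octave-per-0.25-seconds top speed come from the text.

```python
import math

G = 9.81                      # standard gravity, m/s^2
OCTAVE_WIDTH_M = 0.5          # ASSUMED octave width; not stated in the article
ACCEL = 3 * G                 # 3g, from the article
V_MAX = OCTAVE_WIDTH_M / 0.25 # one octave per 0.25 s, from the article


def travel_time(distance_m, v_max=V_MAX, accel=ACCEL):
    """Time to move `distance_m` from rest to rest with a symmetric
    trapezoidal (or, for short moves, triangular) velocity profile."""
    if distance_m <= 0:
        return 0.0
    ramp_distance = v_max ** 2 / accel  # covered while speeding up and slowing down
    if distance_m < ramp_distance:
        # Too short to reach top speed: accelerate half way, brake half way.
        return 2.0 * math.sqrt(distance_m / accel)
    return distance_m / v_max + v_max / accel


one_octave = travel_time(OCTAVE_WIDTH_M)        # roughly 0.32 s under these assumptions
four_octaves = travel_time(4 * OCTAVE_WIDTH_M)  # roughly 1.07 s under these assumptions
```

Under these assumptions the ramps add about 0.07 seconds to every long move, so a note that must land on a beat has to be planned by at least the travel time before that beat.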


If you look at some Robot Musicians, you will notice there is a distinct difference between the way Shimon is built and the way others are. In some ways, Shimon has physical limitations, which actually increase its physical performance. In other marimba Robot Musicians, for example, every bar on the marimba has a striker above it. For mechanical performance demands, this is great, because the delay between telling it to play a note and the note being played is only the time it takes for the striker above the bar to strike it.

“In Shimon’s case, it has four arms. So when Shimon goes to play a note, first he has to figure out which arm to use (the path planning for this is much more complicated than you would think, and includes collision avoidance), then move the arm, and then strike. This is absolutely a much longer and less efficient process. However, through a series of studies we have found that the anticipatory visual cues, as you see the arm moving towards the destination note, help us significantly to synchronize with the robot (which is crucial when it is truly improvising and we can’t predict exactly which notes it is going to play). Also, this limitation really changes the way you approach working with and programming for the robot. When you have a note at any time at your disposal, I think one is more inclined to just treat the robot as a synthesizer or some piece of hardware, but when the gestures are large and limiting, you need to make the best use of every movement (as we also strive for as musicians),” Nikolaidis imparts.

The mechanics and movements, and the noise resulting from them, add to Shimon’s expressiveness. “A colleague took notice of all of the mechanical noises that go on as Shimon moves, between the arms moving across the shared rail, and various DoFs cycling in the breathing patterns of Shimon’s head. It generates a lot of regular and irregular noise. [However], the more I work with Shimon, I realize this is a beautiful part of Robotic Musicianship, as the parts hum and crank…, these sounds themselves have become cues for me and I have begun anticipating movements based on hearing servos kick on,” Nikolaidis relates.

In a way, all its inefficiency increases the human-robot interaction considerably. “We use this (and other applications) with a robot that facilitates visual cues, physical embodiment, anticipation, synchronization and coordination with a human player. This cannot be achieved in traditional human-computer interaction, not to mention rich acoustic sound, as opposed to the digital sound in human-computer musical interaction,” Weinberg adds.
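To make the arm-selection problem concrete, here is a deliberately simplified Python sketch. It treats the rail as a line of bar indices, keeps the arms in left-to-right order, and picks the nearest arm that can reach the target without crossing or crowding a neighbour. Everything here is an assumption for illustration: `pick_arm`, the `min_gap` parameter and the positions are hypothetical, and the real planner described in the quote is said to be considerably more involved.

```python
def pick_arm(positions, target, min_gap=1):
    """Choose which arm should move to `target`.

    positions: current arm positions in bar indices, ordered left to right.
    Returns the index of the closest arm that can reach `target` while
    staying at least `min_gap` bars from both neighbours, or None if no
    arm can get there without a neighbour moving first.
    """
    best = None  # (travel distance, arm index)
    for i, pos in enumerate(positions):
        left_ok = i == 0 or target - positions[i - 1] >= min_gap
        right_ok = i == len(positions) - 1 or positions[i + 1] - target >= min_gap
        if left_ok and right_ok:
            cost = abs(target - pos)
            if best is None or cost < best[0]:
                best = (cost, i)
    return None if best is None else best[1]


arms = [5, 15, 25, 35]      # made-up starting positions for four arms
chosen = pick_arm(arms, 18) # arm 1 is closest and stays between arms 0 and 2
```

A fuller planner would also move blocking neighbours aside, look ahead over several upcoming notes and account for travel time, which is where most of the complexity mentioned in the quote would come from.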


So does the introduction of Robot Musicians mean that “long after robots rise up and destroy humanity, there will still be someone around pretending to like jazz,” as comedian Stephen Colbert of The Colbert Report noted in his news skit on the robot? Is this the end of life as we know it? No. This is not Terminator Jazz, but it may point the way towards a new form of musical expression whose surface is only just being scratched.

The great Pat Metheny.


Indeed, others, such as Grammy award-winning jazz guitarist Pat Metheny, who is currently on his Orchestrion Tour, which employs Robot Musicians in sync with him in live improvisation, have found the technology inspiring.

“I can only speak about what this all means to me personally, but what is represented in this project [Orchestrion] is organic to my interests and is intrinsic to the fairly odd skill set that I have had to develop, not just with this project, but with everything I have had to do to be the kind of musician that I ended up being. Knobs, wires, electricity, and all the rest are kind of part and parcel of the world I have lived in over the past 40 years or so, not unlike what reeds are to sax players and mouthpieces are to trumpet players. All of this, including computers and everything else, kind of make up my instrument. I am always naturally looking to see where that all might take me. This is the latest manifestation of that search,” Metheny explains. He adds that his use of robots can range from extremely detailed composition to 100% pure improvisation, and every shade in between, at any time.

Pat Metheny’s Orchestrion is a collection of various instruments that are programmed to play themselves while accompanying Pat during a concert.

While we shouldn’t expect robot musicians to be dominating the charts on their own any time soon, pushing human musicians aside, this may be the start of a musical trend, and we could see robot musicians appearing more frequently in the coming decade, much like we saw an explosion of sampling and sequencers in the 1980s as musicians embraced the technology of the day.

However, if robot musicians have a future, they will need to perform as well as their human counterparts. As Metheny relates, “Good notes are good notes and good playing is always good playing; and both are often elusive. That will never change no matter who [or what] is playing them.” Further, Metheny offers this equalizer, a truism in music whether it is robotic or not: “If [robots] are not capable of feeling good rhythmically, no one will have any interest in them – just like with people.”

Weinberg says that professional musicians who have played with Shimon have experienced moments of surprise and inspiration where they have asked, “How did Shimon come up with this?” While he admits those moments are at present few and far between, he expects them to increase as Shimon’s capabilities improve.


From a musical standpoint, there could be real tension between robot enthusiasts and musical purists, who could view Shimon and other robot musicians as a true threat to expression (or at the very least to their livelihoods).

Metheny provides some perspective from his own experience using robots and says the medium is merely a new way of expression; people should focus on whether the music that is the end product is good or not, and not get hung up on the technology. “Behind this or any other musical effort the basic qualities of spirit, soul, and feeling and of course a high level of content harmonically, melodically and rhythmically must be there, at least for me. It is easy to get lost in a discussion here about the ‘how’ rather than the actual music. For me, the satisfaction in what this has been so far has been 98% musical and about 2% for the tech/how aspect.”

What makes Shimon (and other robots like it) exciting is not only that it can provide new levels of entertainment and expression, but that it is reaching new heights of human-robot interaction and artificial intelligence. Shimon’s ability to determine when to solo, what to solo, what notes to hit, and when to provide accompaniment with an acoustic instrument is awe-inspiring to anyone who has tried to play an instrument (or program a robot, for that matter). Further, its ability to focus on the most interesting or lead player, by looking at him and even bobbing its head, shows an expressiveness and a connectedness beyond what a mere computer program amplifying notes from a speaker could ever hope to achieve.

Other roboticists should take note, especially if they want their robots to form bonds with their human users through strong human-robot interaction (HRI). As musicians know, playing live music is a personal experience shared among other musicians and the audience (it is communal), and performance feeds on the ability to leverage that connectedness to bring music to a higher level beyond what is written on a page, programmed, composed or intended.

One of the most important qualities Shimon possesses is that it responds to the improvised music from humans. The response is designed to be similar enough to what the human played to create a connection, but different enough (and hopefully inspiring enough) to push the music in new directions. “In addition to creating new kinds of musical interactions, we hope that this can lead to new music that humans would have never come up with in traditional settings,” Weinberg says.

Being robot and music enthusiasts, we also can’t wait for a Shimon concert in our backyards so we can see and hear this amazing new robot for ourselves.



Words by Thomas Marsh