New algorithms developed at UC Berkeley let robots learn motor tasks much more like humans do, using trial and error. Using this new approach the robot completed various tasks without prior knowledge of its surroundings, including putting a clothes hanger on a rack, assembling a toy plane, and screwing a cap on a water bottle. The research team will be presenting their latest findings in Seattle at the International Conference on Robotics and Automation (ICRA) later in May.
“What we’re reporting on here is a new approach to empowering a robot to learn,” said Professor Pieter Abbeel of UC Berkeley’s Department of Electrical Engineering and Computer Sciences in a statement. “The key is that when a robot is faced with something new, we won’t have to reprogram it. The exact same software, which encodes how the robot can learn, was used to allow the robot to learn all the different tasks we gave it.”
The UC Berkeley researchers used a new branch of artificial intelligence known as deep learning in their work. This branch is loosely inspired by the neural circuitry of the human brain when it perceives and interacts with the world.
“For all our versatility, humans are not born with a repertoire of behaviors that can be deployed like a Swiss army knife, and we do not need to be programmed,” said Levine. “Instead, we learn new skills over the course of our life from experience and from other humans. This learning process is so deeply rooted in our nervous system, that we cannot even communicate to another person precisely how the resulting skill should be executed. We can at best hope to offer pointers and guidance as they learn it on their own.”
In the experiments, the UC Berkeley researchers worked with a Willow Garage Personal Robot 2 (PR2), which they nicknamed BRETT, or Berkeley Robot for the Elimination of Tedious Tasks.
They presented BRETT with a series of motor tasks, such as placing blocks into matching openings or stacking Lego blocks. The algorithm controlling BRETT’s learning included a reward function that provided a score based upon how well the robot was doing with the task. BRETT takes in the scene, including the position of its own arms and hands, as viewed by the camera. The algorithm provides real-time feedback via the score based upon the robot’s movements. Movements that bring the robot closer to completing the task will score higher than those that do not. The score feeds back through the neural net, so the robot can learn which movements are better for the task at hand. This end-to-end training process underlies the robot’s ability to learn on its own. As the PR2 moves its joints and manipulates objects, the algorithm calculates good values for the 92,000 parameters of the neural net it needs to learn.
With this approach, when given the relevant coordinates for the beginning and end of the task, the PR2 could master a typical assignment in about 10 minutes. When the robot is not given the location for the objects in the scene and needs to learn vision and control together, the learning process takes about three hours.