U-M Researchers Are ‘Teaching’ Robots to ‘Learn’ Like Humans

Peter Mitrano demonstrates a rope manipulation robot learning experiment. // Photo by Daryl Marshke/U-M

Robotics researchers at the University of Michigan in Ann Arbor are using a new approach to “training” robots that work with soft objects like ropes and fabrics, or in cluttered environments.

By expanding the training data sets for the robots, the researchers say they are inching closer to robots that can learn on the fly like humans do, reducing learning time for new materials and environments down to a few hours rather than a week or two.

In simulations, the expanded training data set improved the success rate of a robot looping a rope around an engine block by more than 40 percent in relative terms (from 48 percent to 70 percent), and it nearly doubled the number of successes of a physical robot on a similar task.

That task is among those a robot mechanic would need to be able to do with ease. But using today’s methods, learning how to manipulate each unfamiliar hose or belt would require huge amounts of data, likely gathered for days or weeks, says Dmitry Berenson, U-M associate professor of robotics and senior author of a paper presented last week at Robotics: Science and Systems in New York City.

In that time, the robot would play around with the hose — stretching it, bringing the ends together, looping it around obstacles and so on — until it understood all the ways the hose could move.

“If the robot needs to play with the hose for a long time before being able to install it, that’s not going to work for many applications,” Berenson says.

Human mechanics likely would be unimpressed with a robot co-worker that needed that kind of time. So, Berenson and Peter Mitrano, a doctoral student in robotics, put a twist on an optimization algorithm to enable a computer to make some of the generalizations humans do — predicting how dynamics observed in one instance might repeat in others.

In one example, the robot pushed cylinders on a crowded surface. In some cases, the cylinder didn’t hit anything, while in others, it collided with other cylinders and they moved in response.

If the cylinder didn’t run into anything, that motion can be repeated anywhere on the table where the trajectory doesn’t take it into other cylinders. This is intuitive to a human, but a robot needs to get that data. And rather than doing time-consuming experiments, Mitrano and Berenson’s program can create variations on the result from that first experiment that serve the robot in the same way.

They focused on three qualities for their fabricated data: it had to be relevant, diverse, and valid. For instance, if you’re only concerned with the robot moving cylinders on the table, data on the floor is not relevant. The flip side is that the data must be diverse: all parts of the table and all angles must be explored.

“If you maximize the diversity of the data, it won’t be relevant enough. But if you maximize relevance, it won’t have enough diversity,” Mitrano says. “Both are important.”

And finally, the data must be valid. For example, any simulations that have two cylinders occupying the same space would be invalid and need to be identified as invalid so that the robot knows that won’t happen.
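The three criteria can be illustrated with a minimal sketch. This is not the researchers' program, which the article describes as a variation on an optimization algorithm; it swaps in simple rejection sampling instead. One observed collision-free push is shifted to random places (diversity), kept only if it stays on the table (relevance), and kept only if it does not overlap another cylinder (validity). The table size, cylinder radius, and all function names are assumptions made for illustration.

```python
import random
from math import hypot

# Hypothetical values, not taken from the research.
TABLE = (0.0, 0.0, 1.0, 1.0)  # x_min, y_min, x_max, y_max in metres
RADIUS = 0.05                 # cylinder radius in metres


def on_table(path):
    """Relevance: every waypoint keeps the whole cylinder on the table."""
    x_min, y_min, x_max, y_max = TABLE
    return all(
        x_min + RADIUS <= x <= x_max - RADIUS
        and y_min + RADIUS <= y <= y_max - RADIUS
        for x, y in path
    )


def collision_free(path, obstacles):
    """Validity: no waypoint overlaps another cylinder.

    Only waypoints are checked, so the path should be densely sampled.
    """
    return all(
        hypot(x - ox, y - oy) >= 2 * RADIUS
        for x, y in path
        for ox, oy in obstacles
    )


def augment(path, obstacles, n, rng, max_tries=10_000):
    """Return up to n shifted copies of a collision-free push."""
    copies = []
    for _ in range(max_tries):
        if len(copies) == n:
            break
        # Diversity: sample offsets uniformly over the whole table.
        dx, dy = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        shifted = [(x + dx, y + dy) for x, y in path]
        if on_table(shifted) and collision_free(shifted, obstacles):
            copies.append(shifted)
    return copies


observed = [(0.20, 0.20), (0.25, 0.20), (0.30, 0.20)]  # one real push
others = [(0.70, 0.70)]                                # another cylinder
extra = augment(observed, others, n=5, rng=random.Random(0))
```

Each entry in `extra` is the same straight push moved elsewhere on the table, so one physical experiment stands in for several. Shifts that would place two cylinders in the same space are discarded here; the article notes that such cases should instead be identified as invalid for the robot.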

For the rope simulation and experiment, Mitrano and Berenson expanded the data set by extrapolating the position of the rope to other locations in a virtual version of a physical space — so long as the rope would behave the same way as it had in the initial instance. Using only the initial training data, the simulated robot hooked the rope around the engine block 48 percent of the time. After training on the augmented data set, the robot succeeded 70 percent of the time.

An experiment exploring on-the-fly learning with a real robot suggested that enabling the robot to expand each attempt in this way nearly doubles its success rate over the course of 30 attempts, with 13 successful attempts rather than seven.
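As a quick check on how the reported figures relate, the short calculation below uses only the numbers quoted above. The function name is an illustrative choice, and the interpretation of "more than 40 percent" as a relative gain is an assumption based on the 48 and 70 percent figures.

```python
def relative_gain(before, after):
    """Fractional improvement of `after` over `before`."""
    return (after - before) / before


# Simulation: success rate rose from 48 percent to 70 percent.
sim_gain = relative_gain(0.48, 0.70)   # about 0.46, i.e. "more than 40 percent"

# Physical robot: 13 successes instead of 7 over 30 attempts.
real_ratio = 13 / 7                    # about 1.86, i.e. "nearly doubles"
real_rates = (7 / 30, 13 / 30)         # about 23 percent and 43 percent
```

The simulated improvement is 22 percentage points, which is roughly a 46 percent relative gain.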