Learning from demonstration is often explained from a perspective that the working
Robot-control-system is available and the human-demonstrator is using it. And indeed, if the software works, the
Human operator takes the robot-arm, makes a move with it, and the robot will reproduce the action later. For example, he teaches the robot-arm how to put a cup under a coffeemaker and press the start-button. In the replay phase, the robot is imitating this behavior 1:1. But that is not how the system works internally; it is only what the audience can observe. In reality, Learning from demonstration is much more complex. One of the first works on learning by robot apprentices (anthropomorphic robots learning by imitation) was Adrian Stoica's PhD thesis in 1995. In 1997, robotics expert
Stefan Schaal was working on the
Sarcos robot-arm. The goal was simple: solve the
pendulum swingup task. The robot itself can execute a movement, and as a result, the pendulum is moving. The problem is, that it is unclear what actions will result into which movement. It is an
Optimal control-problem which can be described with mathematical formulas but is hard to solve. The idea from Schaal was, not to use a
Brute-force solver but record the movements of a human-demonstration. The angle of the pendulum is logged over three seconds at the y-axis. This results into a diagram which produces a pattern. In computer animation, the principle is called
spline animation. That means, on the x-axis the time is given, for example 0.5 seconds, 1.0 seconds, 1.5 seconds, while on the y-axis is the variable given. In most cases it's the position of an object. In the inverted pendulum it is the angle. The overall task consists of two parts: recording the angle over time and reproducing the recorded motion. The reproducing step is surprisingly simple. As an input we know, in which time step which angle the pendulum must have. Bringing the system to a state is called “Tracking control” or
PID control. That means, we have a trajectory over time, and must find control actions to map the system to this trajectory. Other authors call the principle “steering behavior”, because the aim is to bring a robot to a given line. ==See also==