One of the most successful recipes in robot learning is to initialize the robot with a *good policy* via imitation learning and then refine the result with reinforcement learning. This recipe rests on a strong assumption: the human must be capable of providing the initial demonstration. The assumption holds well for certain quasi-static tasks (grasping, reaching, etc.) or when the robot can generate accelerations close to those of a human (very few robots have this capability). In many other cases, however, it does not hold.
For example, how would you teach a limited-torque robot arm to swing up a heavy ball to solve the cup-and-ball game? The robot cannot launch the ball vertically, as most of us would, because it lacks the torque required to generate large upward accelerations. The solution is then to swing the cup sideways to build up momentum. But how many swings are required? One? Two? … Five?
Here, we investigate how the human and robot can solve a task together when the human also lacks the intuition to provide a single, initial demonstration. We assume the robot starts with a blank policy and use policy search to provide the robot with local exploration noise, combined with human feedback in the action space. In a project led by Carlos Celemin (University of Chile) and in collaboration with Jens Kober (TU Delft), we proposed a method combining COrrective Advice Communicated by Humans (COACH) [pdf][BiBTeX] with policy search. This combination optimizes a movement primitive that is used to generate robot trajectories.
COACH allows for human feedback that is only qualitative (“go up”, “go left”), which makes it easy for non-expert users to train the robot. Under the hood, the method models the human feedback to predict the magnitude of the human advice. Moreover, human feedback can be given at any time during training: at every roll-out, once in a while, or not at all. In the latter case, the process reduces to pure autonomous learning.
Preliminary results and a brief overview of the method can be found in this poster presented at IROS 2017 [pdf].
November 2017. I am presenting our most recent work at the 1st Annual Conference on Robot Learning (CoRL 2017). The paper can be found here. The paper proposes active learning to make a robot incrementally learn and refine movement primitives. You can watch the 5 min. talk below and check the presented poster here.
Robots that can be programmed by non-technical users must be capable of learning new tasks incrementally, via demonstrations. This poses the problem of selecting when to teach a new robot skill, or when to generalize a skill based on the robot’s current repertoire. Ideally, robots should actively make such decisions. The robot must quantify the suitability of its own skill set for a given query. It must reason whether it is confident enough to execute the task by itself, or if it should request a demonstration or corrections from a human.
We investigate algorithms for active requests for incremental learning of reaching skills via human demonstrations. Gaussian processes are used to extrapolate the current skill set with confidence margins, which are then encoded as movement primitives to accurately reach the desired query in the workspace of the robot. This combination allows the robot to generalize its primitives using as few as a single demonstration.
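As a rough illustration of the idea above, the following Python sketch fits a Gaussian process that maps a query position in the workspace to primitive parameters, and uses the predictive standard deviation to decide whether to ask for a demonstration. The kernel, its hyperparameters, the threshold rule, the toy data, and all function names (`rbf_kernel`, `gp_predict`, `needs_demonstration`) are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3, signal_var=1.0):
    """Squared-exponential kernel between two sets of points (one per row)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * sq_dist / length_scale ** 2)

def gp_predict(x_train, y_train, x_query, noise_var=1e-4):
    """Zero-mean GP posterior mean and variance at the query points."""
    k_tt = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    k_qq = rbf_kernel(x_query, x_query)
    mean = k_qt @ np.linalg.solve(k_tt, y_train)
    cov = k_qq - k_qt @ np.linalg.solve(k_tt, k_qt.T)
    return mean, np.diag(cov)

def needs_demonstration(x_train, y_train, x_query, std_threshold=0.2):
    """Flag queries whose predictive std exceeds a (hypothetical) threshold."""
    mean, var = gp_predict(x_train, y_train, x_query)
    std = np.sqrt(np.maximum(var, 0.0))  # guard against tiny negative values
    return std > std_threshold, mean, std

# Toy data: two demonstrated reaching targets (x, y) and the primitive
# parameters (three made-up weights each) that reached them.
x_train = np.array([[0.0, 0.0], [0.1, 0.0]])
y_train = np.array([[0.2, 0.5, 0.1], [0.3, 0.4, 0.1]])

# One query between the demonstrations, one far away from both.
x_query = np.array([[0.05, 0.0], [2.0, 2.0]])
request, mean, std = needs_demonstration(x_train, y_train, x_query)
print(request, std)
```

In this toy setting the query lying between the two demonstrations is predicted with low uncertainty, while the distant query triggers a request for a new demonstration.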
In the video below you can see a robot indicating to the user which demonstrations should be provided to increase its repertoire of skills. The experiment also shows that the robot becomes confident in reaching objects for which demonstrations were never provided, by incrementally learning from the neighboring demonstrations.
The contribution is reported in this paper:
Maeda, G.; Ewerton, M.; Osa, T.; Busch, B. & Peters, J. “Active Incremental Learning of Robot Movement Primitives”. Proceedings of the 1st Annual Conference on Robot Learning (CoRL), Proceedings of Machine Learning Research (PMLR), 2017, 78, 37-46. [pdf]
In collaboration with the Flowers Group at Inria, we addressed the problem of how to generate robot motions that are ergonomically optimal with respect to the human partner. The proposed approach uses metrics provided by the literature on musculoskeletal disorders to optimize the kinematic posture of a calibrated human skeleton. Once this posture is found, we use the ISOEMP method to learn a trajectory for the robot based on the observation of the other human partner, and adapt it to satisfy the ergonomic solution. Details can be found in this paper:
Busch, B.; Maeda, G.; Mollard, Y.; Demangeat, M. & Lopes, M. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017
Imitation learning is useful to endow robots with skills that are difficult, if not impossible, to program by hand.
For example, a golf swing movement that exploits the redundancy of a 7 degree-of-freedom arm, or a collaborative skill that must be coordinated with the movement of a human partner.
Kinesthetic teaching and teleoperation are now widely accepted methods to provide demonstrations for imitation learning, mainly because they avoid the correspondence problem.
However, these methods are still far from ideal.
In human-robot collaboration, kinesthetic teaching is disruptive and natural interactions cannot be demonstrated.
When learning skills, the physical embodiment of the robot obstructs truly optimal and natural human demonstrations.
Ideally, robots should learn simply by observing the human.
Direct observations pose the problem that a movement that can be demonstrated well by a human may not be kinematically feasible for robot reproduction.
In this paper we address this problem by using stochastic search to simultaneously find the appropriate location of the demonstration reference frame with respect to the robot and adapt the demonstrated trajectory.
This means that a human demonstrator can show the skill anywhere without worrying whether the robot can reproduce it kinematically.
Our optimizer aims at finding a feasible mapping for the robot such that its movement resembles the original human demonstration.
Later, we used this method to generate human-like movements that also address the ergonomics of the human partner. Check this post.
Maeda, G.; Ewerton, M.; Koert, D. & Peters, J. Acquiring and Generalizing the Embodiment Mapping From Human Observations to Robot Skills. IEEE Robotics and Automation Letters, 2016, 1, 784-791. pdf here.
While probabilistic models are useful to classify and infer trajectories, a common problem is that their construction usually requires the time alignment of training data such that spatial correlations can be properly captured. In a single-agent robot case, this is usually not a problem as robots move in a controlled manner. However, when the human is the agent that provides observations, repeatability and temporal consistency become an issue: it is not trivial to align partially observed trajectories of the human with a probabilistic model, particularly online and under occlusions. Since the goal of the human movement is unknown, it is difficult to estimate the progress or phase of the movement. We approach this problem by testing many sampled hypotheses of the human’s movement speed, online. This usually allows us to recognize the human action and generate the appropriate robot trajectory. The video shows some of the benefits of estimating phases for faster robot reactions. It also shows the interesting case in which the robot tries to predict the human motion too early, leading to some awkward or erroneous coordination.
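The idea of testing several speed hypotheses against a partial observation can be sketched in a few lines of Python. This is a simplified illustration and not the method of the papers listed below: the model mean (`mean_trajectory`), the Gaussian observation noise, the one-dimensional trajectory, and the evenly spaced grid of speed candidates (used here in place of random samples so that the result is reproducible) are all assumptions made for the example.

```python
import numpy as np

def mean_trajectory(phase):
    """Hypothetical model mean as a function of phase in [0, 1]."""
    return np.sin(np.pi * phase)

def estimate_speed(obs_times, obs_values, speed_candidates,
                   nominal_duration=1.0, obs_std=0.05):
    """Score each speed hypothesis by the likelihood of the partial observation."""
    log_lik = np.empty(len(speed_candidates))
    for i, speed in enumerate(speed_candidates):
        # Map wall-clock time to phase under this speed hypothesis.
        phase = np.clip(speed * obs_times / nominal_duration, 0.0, 1.0)
        residual = obs_values - mean_trajectory(phase)
        log_lik[i] = -0.5 * np.sum((residual / obs_std) ** 2)
    # Normalize into weights over the hypotheses (uniform prior assumed).
    weights = np.exp(log_lik - log_lik.max())
    weights /= weights.sum()
    return speed_candidates[np.argmax(weights)], weights

# The human moves 1.5 times faster than nominal; only the first 0.3 s are seen.
obs_times = np.linspace(0.0, 0.3, 10)
obs_values = mean_trajectory(1.5 * obs_times)

speed_candidates = np.linspace(0.5, 2.0, 16)
best_speed, weights = estimate_speed(obs_times, obs_values, speed_candidates)

# Estimated phase of the movement at the last observation.
current_phase = best_speed * obs_times[-1]
print(best_speed, current_phase)
```

Once the most likely speed is known, the estimated phase tells the robot how far the human has progressed, which is what allows it to react before the movement is complete.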
Maeda, G.; Ewerton, M.; Neumann, G.; Lioutikov, R.; Peters, J. “Phase Estimation for Fast Action Recognition and Trajectory Generation in Human-Robot Collaboration”, Accepted. International Journal of Robotics Research (IJRR). [pdf][BibTeX]
Maeda, G.; Neumann, G.; Ewerton, M.; Lioutikov, R.; Peters, J. (2015). “A Probabilistic Framework for Semi-Autonomous Robots Based on Interaction Primitives with Phase Estimation”, International Symposium of Robotics Research (ISRR). [pdf][BiBTeX]
Ewerton, M.; Maeda, G.; Peters, J.; Neumann, G. (2015). “Learning Motor Skills from Partially Observed Movements Executed at Different Speeds”, Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 456–463. [pdf][BibTeX]
Interaction Probabilistic Movement Primitive (Interaction ProMP) is a probabilistic framework based on Movement Primitives that allows for both human action recognition and for the generation of collaborative robot policies. The parameters that describe the interaction between human-robot movements are learned via imitation learning. The procedure results in a probabilistic model from which the collaborative robot movement is obtained by (1) conditioning at the current observation of the human, and (2) inferring the corresponding robot trajectory and its uncertainty.
The illustration below summarizes the workflow of Interaction ProMP, where the distribution of human-robot parameterized trajectories is abstracted to a single bivariate Gaussian. The conditioning step is shown as the slicing of the distribution at the observation of the human. In the real case, the distribution is multivariate and correlates all the weights of all demonstrations.
These are some related publications:
Maeda, G.; Ewerton, M.; Lioutikov, R.; Ben Amor, H.; Peters, J. & Neumann, G. Learning Interaction for Collaborative Tasks with Probabilistic Movement Primitives. Proceedings of the International Conference on Humanoid Robots (HUMANOIDS), 2014, 527-534. [pdf here].
Maeda, G.; Neumann, G.; Ewerton, M.; Lioutikov, R.; Kroemer, O. & Peters, J. Probabilistic movement primitives for coordination of multiple human–robot collaborative tasks. Autonomous Robots, 2017, 41, 593-612. [pdf here].
Maeda, G.; Ewerton, M.; Neumann, G.; Lioutikov, R. & Peters, J. Phase Estimation for Fast Action Recognition and Trajectory Generation in Human-Robot Collaboration. International Journal of Robotics Research (IJRR), Accepted. [pdf here].
This code in Matlab shows a simple toy problem in which an observed agent with two degrees of freedom (DoFs) is trained together with an unobserved agent, also with two DoFs. The observed agent could be the human, and the unobserved agent the robot. Note that to collect training data we assume both agents are observed. This means that we learn the initial distribution from demonstrations. Once the model is learned (the green patch in the figure), we can observe only the human (the two blue dots) to infer a posterior distribution (the red patch), which can be used to control a robot.
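For readers without Matlab, the same conditioning step can be sketched in Python. This is not a port of the Matlab code mentioned above: it is a minimal example of standard Gaussian conditioning on a four-dimensional toy distribution (two observed human DoFs, two unobserved robot DoFs). The synthetic demonstrations, the linear coupling matrix, the noise levels, and the function name `condition_gaussian` are all assumptions made for the illustration.

```python
import numpy as np

def condition_gaussian(mean, cov, obs_idx, obs_value, obs_noise=1e-6):
    """Condition a joint Gaussian on the observed dimensions.

    Returns the posterior mean and covariance of the remaining dimensions.
    """
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    obs_idx = np.asarray(obs_idx)
    hid_idx = np.setdiff1d(np.arange(len(mean)), obs_idx)

    s_oo = cov[np.ix_(obs_idx, obs_idx)] + obs_noise * np.eye(len(obs_idx))
    s_ho = cov[np.ix_(hid_idx, obs_idx)]
    s_hh = cov[np.ix_(hid_idx, hid_idx)]

    # gain = s_ho @ inv(s_oo), computed with a linear solve (s_oo is symmetric).
    gain = np.linalg.solve(s_oo, s_ho.T).T
    post_mean = mean[hid_idx] + gain @ (np.asarray(obs_value) - mean[obs_idx])
    post_cov = s_hh - gain @ s_ho.T
    return post_mean, post_cov

# Synthetic "demonstrations": the robot DoFs depend linearly on the human DoFs.
rng = np.random.default_rng(0)
coupling = np.array([[1.0, 0.5], [-0.5, 1.0]])  # made-up human-robot coupling
human = rng.normal(size=(200, 2))
robot = human @ coupling.T + 0.05 * rng.normal(size=(200, 2))
demos = np.hstack([human, robot])

# Learn the prior distribution from the demonstrations.
prior_mean = demos.mean(axis=0)
prior_cov = np.cov(demos, rowvar=False)

# Observe only the human and infer the robot DoFs with their uncertainty.
human_obs = np.array([1.0, -0.5])
post_mean, post_cov = condition_gaussian(prior_mean, prior_cov, [0, 1], human_obs)
print(post_mean)
print(post_cov)
```

The posterior covariance is much tighter than the prior over the robot DoFs, which mirrors the shrinking from the learned distribution to the posterior described above.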
This video shows the vanilla implementation of multiple Interaction ProMPs running on an assembly task. Note that the robot response is quite slow as the human has to wait for the action recognition. The related papers are the HUMANOIDS 2014 and the AURO 2016.
We improved the robot response by proposing a probabilistic method to estimate the phase of the human as he/she moves. A simple version of this method is described in this IJRR 2017 paper, and a more sophisticated version that can also address incomplete observations can be found in this IROS 2015 paper. The next video shows the result with phase estimation.
In our quest to make the interaction as fluid as possible, we also considered predicting the possible sequences of collaborative actions by constructing a lookup table with many variations of an assembly task. The action recognition of Interaction ProMPs is combined with a nearest-neighbor search to find the most probable sequence. This method was presented in this AAAI symposium paper. The video is shown below.