While probabilistic models are useful to classify and infer trajectories, a common problem is that their construction usually requires the time alignment of training data such that spatial correlations can be properly captured. In a single-agent robot case, this is usually not a problem as robots move in a controlled manner. However, when the human is the agent that provides observations, repeatability and temporal consistency becomes an issue as it is not trivial to align partially observed trajectories of the observed human with a probabilistic model, particularly online and under occlusions. Since the goal of the human movement is unknown, it is difficult to estimate the progress or phase of the movement. We approach this problem by testing many sampled hypotheses of his/her movement speed, online. This usually allows us to recognize the human action and generate the appropriate robot trajectory. The video shows some of the benefits of estimating phases for faster robot reactions. It also shows the interesting case when the robot tries to predict the human motion too early, therefore leading to some awkward/erroneous coordination.
Maeda, G.; Ewerton, M; Neumann, G.; Lioutikov, R.; Peters, J. “Phase Estimation for Fast Action Recognition and Trajectory Generation in Human-Robot Collaboration”, Accepted. International Journal of Robotics Research (IJRR). [pdf][BibTeX]
Maeda, G.; Neumann, G.; Ewerton, M.; Lioutikov, R.; Peters, J. (2015). “A Probabilistic Framework for Semi-Autonomous Robots Based on Interaction Primitives with Phase Estimation”, International Symposium of Robotics Research (ISRR). [pdf][BiBTeX]
Ewerton, M.; Maeda, G.; Peters, J.; Neumann, G. (2015). “Learning Motor Skills from Partially Observed Movements Executed at Different Speeds”, Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 456–463. [pdf][BibTeX]
Interaction Probabilistic Movement Primitive (Interaction ProMP) is a probabilistic framework based on Movement Primitives that allows for both human action recognition and for the generation of collaborative robot policies. The parameters that describe the interaction between human-robot movements are learned via imitation learning. The procedure results in a probabilistic model from which the collaborative robot movement is obtained by (1) conditioning at the current observation of the human, and (2) inferring the corresponding robot trajectory and its uncertainty.
The illustration below summarizes the workflow of Interaction ProMP where the distribution of human-robot parameterized trajectories is abstracted to a single bivariate Gaussian. The conditioning step is shown as the slicing of the distribution a the observation of the human. In the real case, the distribution is multivariate and correlates all the weights of all demonstrations.
These are some related publications
Maeda, G.; Ewerton, M.; Lioutikov, R.; Ben Amor, H.; Peters, J. & Neumann, G. Learning Interaction for Collaborative Tasks with Probabilistic Movement Primitives Proceedings of the International Conference on Humanoid Robots (HUMANOIDS), 2014, 527-534. [pdf here].
Maeda, G.; Neumann, G.; Ewerton, M.; Lioutikov, R.; Kroemer, O. & Peters, J. Probabilistic movement primitives for coordination of multiple human–robot collaborative tasks Autonomous Robots, 2017, 41, 593-612. [pdf here].
Maeda, G.; Ewerton, M.; Neumann, G.; Lioutikov, R. & Peters, J. Phase Estimation for Fast Action Recognition and Trajectory Generation in Human-Robot Collaboration International Journal of Robotics Research (IJRR), Accepted. [pdf here].
This code in Matlab shows a simple toy problem example where an observed agent with two degrees-of-freedom (DoF) is trained together with an unobserved agent, also with two DoFs. The observed agent could be the human, and the unobserved agent the robot. Note that to collect training data we assume both agents are observed. This means that we will learn the initial distribution by demonstrations. Once the model is learned (the green patch in the figure), we can observe only the human (the two blue dots) to find out a posterior distribution (the red patch), which can be used to control a robot.
This video shows the vanilla implementation of multiple Interaction ProMPs running on an assembly task. Note that the robot response is quite slow as the human has to wait for the action recognition. The related papers are the HUMANOIDS 2014 and the AURO 2016.
We improved the robot response by proposing a probabilistic method to estimate the phase of the human as he/she moves. A simple version of this method is described in this IJRR 2017 paper and a more sophisticated version that can also address incomplete observations can be found in this IROS 2015 paper. The video next shows the result with phase estimation.
In our quest to make the interaction as fluid as possible, we also considered predicting the possible sequences of collaborative actions by constructing a lookup table with many variations of an assembly task. Interaction ProMPs’ action recognition are used with nearest-neighbor to search for the most probable sequence. This method was presented in this AAAI sympoium paper here. The video is shown below.
Here you find a minimalist code in Matlab that uses stochastic optimization for motion planning. It is inspired by a few methods: it uses the exploration of parameters proposed in STOMP, with code based on the Pi2 implementation, and the update efficiency of REPS. Despite the code being inspired by different methods, I tried to make its structure as simple as possible. Partially because it does not need the DMPs of Pi2 (as in STOMP), and because the episodic version of REPS can be implemented in a few lines of code. The main advantage is the fact that you do not need gradients, and the method works even when the cost is badly behaved, with plateaus and discontinuities. The code that you can find here will run in two different modes. In the first run, the start and goal states are anchored. In the second run, the start and goal are free to float, which allows the solution to preserve more of the original starting shape. The latter was particularly useful to solve the problem described in this paper (which you can also cite if you use the code).
Aligning the phase of the motion of different trajectories to a single reference phase is usually a recurrent problem. This is particularly an issue of probabilistic models that are constructed from multiple demonstrations. Often Dynamic Time Warping (DTW) is used. It provides a global, optimal solution and works pretty well when trajectories are somewhat similar. When trajectories are extremely different DTW tends to generate unnatural solutions, usually caused by having many time indexes of one trajectory being repeated at the other trajectory. Heuristics to avoid this problem were already addressed in the seminal paper of Sakoe and Chiba 1978. This problem is critical for trajectories of dynamical systems and one alternative is to enforce a 1:1 mapping on time indexes. This can be achieved by imposing a smooth function on the time alignment mapping using local optimization. The code in the repo provides this idea implemented in Matlab. There is no free lunch, we trade-off the heuristics of DTW by the usual heuristics of a local optimizer (initial guess, learning rate, and convergence criteria). But in my experience, the usual parameters of an optimizer are much easier to adjust and while covering a wider range of data input. Try it yourself: source code with a short explanation of the method in a pdf document can be found here. If you find the code useful and use it in your work you can cite this paper.