Robust Recognition of Complex Gestures for Natural Human-Robot Interaction

Joint work with Tobias Axenbeck and Sven Behnke

Robots coexisting with humans in everyday environments should be able to interact with them intuitively. This requires that the robots recognize typical human gestures such as head shaking and nodding, waving, or pointing. We developed a system that spots and recognizes complex gestures in monocular images. To represent people and estimate their positions, we detect and track their faces and hands using classifiers trained with AdaBoost. From this compact representation we extract a few expressive features, which serve as input to hidden Markov models (HMMs). We segment gestures into distinct phases and train a separate HMM for each phase. We then construct composed HMMs from the individual phase HMMs. Once a specific phase is recognized, we estimate the parameters of the gesture, such as the pointing target. Our system robustly spots and recognizes a variety of complex gestures and accurately estimates their parameters.
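A minimal sketch of the phase-composition idea follows, under assumptions that are ours rather than the system's. Each phase is a small left-to-right HMM over discrete observation symbols (the real system uses continuous features from the face and hand tracks). The phase names `prepare`, `hold`, and `retract` and all probabilities are hypothetical and hand-set rather than trained. The last state of each phase hands its exit probability to the first state of the next phase. Viterbi decoding then labels every frame with its active phase, and a recognized `hold` phase would be the point at which a parameter such as the pointing target could be estimated.

```python
import math


def left_to_right(n_states, p_stay):
    """Left-to-right phase HMM: stay with p_stay, otherwise advance one state.
    The last row leaves 1 - p_stay unassigned; that mass exits the phase."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        A[i][i] = p_stay
        if i + 1 < n_states:
            A[i][i + 1] = 1.0 - p_stay
    return A


def compose(phases):
    """Chain phase HMMs given as (name, A, B) into one composed HMM.
    Returns the transition matrix, emission rows, and a phase label per state."""
    n = sum(len(A) for _, A, _ in phases)
    T = [[0.0] * n for _ in range(n)]
    E, labels, offset = [], [], 0
    for k, (name, A, B) in enumerate(phases):
        m = len(A)
        for i in range(m):
            for j in range(m):
                T[offset + i][offset + j] = A[i][j]
        last = offset + m - 1
        leave = 1.0 - sum(A[m - 1])
        if k + 1 < len(phases):
            T[last][offset + m] += leave  # exit into the first state of the next phase
        else:
            T[last][last] += leave        # final phase: remain until the sequence ends
        E.extend(B)
        labels.extend([name] * m)
        offset += m
    return T, E, labels


def _log(p):
    return math.log(p) if p > 0 else -math.inf


def viterbi(T, E, obs, start=0):
    """Most likely state sequence (log domain), beginning in state `start`."""
    n = len(T)
    delta = [_log(E[s][obs[0]]) if s == start else -math.inf for s in range(n)]
    back = []
    for o in obs[1:]:
        ptr, new = [], []
        for j in range(n):
            best = max(range(n), key=lambda q: delta[q] + _log(T[q][j]))
            ptr.append(best)
            new.append(delta[best] + _log(T[best][j]) + _log(E[j][o]))
        back.append(ptr)
        delta = new
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical symbols: 0 = hand rising, 1 = hand still, 2 = hand lowering.
phases = [
    ("prepare", left_to_right(2, 0.7), [[0.8, 0.1, 0.1]] * 2),
    ("hold", left_to_right(2, 0.7), [[0.1, 0.8, 0.1]] * 2),
    ("retract", left_to_right(2, 0.7), [[0.1, 0.1, 0.8]] * 2),
]
T, E, labels = compose(phases)
obs = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
path = viterbi(T, E, obs)
print([labels[s] for s in path])
```

For this toy sequence the decoder assigns the first three frames to `prepare`, the next four to `hold`, and the last three to `retract`. Because the phases are chained left to right, a phase can only be entered after its predecessor, which is what lets a composed model spot where one phase ends and the next begins.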

Videos:

 
  • This video (XVID-MPEG4, AVI) shows that faces, facial features, and hands can be tracked robustly, even under difficult and changing lighting conditions and against a cluttered background. Our system reliably recognizes complex gestures. The video shows only the most likely recognized gesture. (The video is also available encoded with an alternative codec.)
  • We performed further experiments in a different environment. In this video (XVID-MPEG4, AVI), we show the most likely gesture individually for the left and right hand, and for bimanual gestures.

    Multimodal Interaction between a Humanoid Robot and Multiple Persons

    The purpose of our research is to develop a humanoid museum-guide robot that performs intuitive, multimodal interaction with multiple persons. Our robot Fritz uses speech, facial expressions, eye gaze, and gestures to interact with people. Depending on the audio-visual input, the robot shifts its attention between different persons in order to involve them in the conversation. Fritz performs human-like arm gestures during the conversation and also uses pointing gestures generated with its eyes, head, and arms to direct the attention of its communication partners toward objects of interest. To express its emotional state, the robot generates facial expressions and adapts its speech synthesis.

    Latest publication:

    Video:

     
  • Our communication robot Fritz presented several smaller robots to visitors at the Science Days at Europa-Park in Rust, October 2006. This video (wmv, 28 MB) shows the interaction between Fritz and the visitors.

    Metric Localization with Scale-Invariant Visual Features using a Single Perspective Camera

    The Scale-Invariant Feature Transform (SIFT) has become a popular feature extractor for vision-based applications. It has been successfully applied to metric localization and mapping using stereo vision and omnidirectional vision. We present an approach to Monte Carlo localization using SIFT features for mobile robots equipped with a single perspective camera. First, we acquire a 2D grid map of the environment that contains the visual features. To obtain a compact environment model, we down-sample the features in the final map. During localization, we cluster nearby particles and, for each cluster, estimate the set of potentially visible map features using ray casting. These relevant map features are then compared to the features extracted from the current image. The observation model used to evaluate the individual particles considers the difference between the measured and the expected angles of matched features. In real-world experiments, we demonstrate that our technique accurately tracks the position of a mobile robot. Moreover, we present experiments illustrating that a robot equipped with a different type of camera can use the same map of SIFT features for localization.
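The angle-based observation model can be sketched in a few lines. This is a minimal illustration under assumptions that are ours, not the paper's. Descriptor matching and ray-casting visibility are taken to have already produced a list of `(feature_id, measured_bearing)` pairs. Bearings are horizontal angles in the robot frame, measured counterclockwise from the camera axis. Each angular error is scored with a zero-mean Gaussian of hypothetical width `sigma`. The feature map, the particles, and all helper names are illustrative, and particle clustering is omitted (in the described approach the visible-feature set is computed once per cluster rather than per particle).

```python
import math


def wrap(angle):
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def expected_bearing(pose, feature_xy):
    """Bearing of a map feature relative to the camera axis for pose (x, y, theta)."""
    x, y, theta = pose
    fx, fy = feature_xy
    return wrap(math.atan2(fy - y, fx - x) - theta)


def particle_likelihood(pose, matches, feature_map, sigma):
    """Product of Gaussian terms on the measured-minus-expected bearing error."""
    w = 1.0
    for fid, measured in matches:
        err = wrap(measured - expected_bearing(pose, feature_map[fid]))
        w *= math.exp(-0.5 * (err / sigma) ** 2)
    return w


def weight_particles(particles, matches, feature_map, sigma=math.radians(5.0)):
    """Normalized importance weights; uniform if no particle explains the matches."""
    ws = [particle_likelihood(p, matches, feature_map, sigma) for p in particles]
    total = sum(ws)
    if total == 0.0:
        return [1.0 / len(particles)] * len(particles)
    return [w / total for w in ws]


# Hypothetical map with two features; bearings as measured from the pose (0, 0, 0).
feature_map = {0: (5.0, 0.0), 1: (5.0, 5.0)}
matches = [(0, 0.0), (1, math.atan2(5.0, 5.0))]
particles = [(0.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 0.5)]
weights = weight_particles(particles, matches, feature_map)
print(weights)
```

Scoring only angles, rather than ranges, is what makes a single perspective camera sufficient: the bearing of a feature can be read off its horizontal image coordinate without depth information.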

    Related publication:

    Animations:
     
  • The animated GIF (2 MB) shows the evolution of the particle clouds during a localization experiment. The blue dot corresponds to the true pose of the robot, and the green dot indicates the pose resulting from odometry information.
  • This video (wmv, 19 MB) shows the humanoid robot Max collecting data in an office environment. Since the robot was designed for playing soccer, its camera looks downwards. Thus, in the experiment shown here, Max has to bend backwards in order to observe the features used for localization in the environment.

    Utilizing Learned Motion Patterns to Predict Positions of People

    When people move through their environments, they do not move randomly. Instead, they usually follow specific trajectories or motion patterns that correspond to their intentions. Knowledge about such patterns enables a mobile robot to robustly keep track of persons in its environment and to improve its behavior. We propose a technique for learning collections of trajectories that characterize typical motion patterns of persons. Data recorded with laser-range finders is clustered using the expectation maximization (EM) algorithm. Based on the result of the clustering, we derive a hidden Markov model (HMM) that is applied to estimate the current and future positions of persons from sensory input. We present several experiments carried out in different environments with a mobile robot equipped with a laser-range scanner and a camera system. The results demonstrate that our approach can reliably learn motion patterns of persons, robustly estimate and predict the positions of multiple persons, and improve the navigation behavior of a mobile robot.
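The estimation and prediction step can be illustrated with a discrete Bayes filter over HMM states. In this sketch the states, their positions, the transition matrix `T`, and the Gaussian detection model (`position_likelihood`, `sigma`) are hypothetical stand-ins for what the described approach derives from the EM clustering. Only the filtering structure is the point: predict with the transition model, correct with the laser detection, and keep propagating without observations to predict future positions. Data association between several persons and the camera-based identification are omitted.

```python
import math


def predict(belief, T):
    """One motion step: b'(j) = sum_i b(i) * T[i][j]."""
    n = len(belief)
    return [sum(belief[i] * T[i][j] for i in range(n)) for j in range(n)]


def position_likelihood(detection, state_positions, sigma=0.5):
    """Assumed Gaussian likelihood of a laser detection for each HMM state."""
    dx, dy = detection
    return [math.exp(-((x - dx) ** 2 + (y - dy) ** 2) / (2 * sigma ** 2))
            for x, y in state_positions]


def correct(belief, likelihood):
    """Bayes update with the observation likelihood, then normalize."""
    post = [b * l for b, l in zip(belief, likelihood)]
    z = sum(post)
    return [p / z for p in post]


def predict_ahead(belief, T, steps):
    """Propagate the belief without observations to predict future positions."""
    for _ in range(steps):
        belief = predict(belief, T)
    return belief


# Hypothetical corridor HMM: four states in a chain, 50% chance of advancing per step.
positions = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
T = [[0.5, 0.5, 0.0, 0.0],
     [0.0, 0.5, 0.5, 0.0],
     [0.0, 0.0, 0.5, 0.5],
     [0.0, 0.0, 0.0, 1.0]]
belief = [0.25] * 4
belief = correct(predict(belief, T), position_likelihood((0.1, 0.0), positions))
ahead = predict_ahead(belief, T, steps=2)
print(belief, ahead)
```

After a detection near the first state, the filtered belief peaks there, while the two-step prediction shifts most of the mass one state further along the chain.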

    Latest publication:

    Animations:

     
  • See the MPEG video (4.7 MB) for an experiment with a single person. The video shows a scene overview (left), the results of the people-tracking system based on laser-range data (right), and the HMM (bottom) that the robot uses to maintain its belief about the position of the person. In this case, we do not use vision information because we assume that only one person is moving in the environment. In the HMM, the red dot corresponds to the position of the person provided by the laser tracking system. The size of the square at each HMM state represents the probability that the person is currently in that state.
  • See the MPEG video (12.8 MB) for an experiment with multiple persons. The video shows the camera images (left) with the areas corresponding to persons detected by the laser tracking system, as well as one HMM (right). The HMM shows the robot's belief about the position of the person who enters the corridor second (black trousers, blue shirt).
  • See the animated GIF (5.9 MB) for an experiment with two persons. The upper image depicts the belief about the position of person 1, and the lower image shows the belief about the position of person 2. The circles are detected features. The gray value of each circle represents its similarity to the person corresponding to the HMM (the darker, the more likely). Initially, the robot was quite certain that persons 1 and 2 were in the room containing resting place 3.
  • See the animated GIF (3.4 MB) for an experiment with a moving robot. Here, the robot traveled along the corridor and looked into one of the offices, where it detected person A. Whereas the robot was initially rather uncertain about where person A was, the probability of resting place 3 increased substantially after the detection.

    Adapting Navigation Strategies Using Learned Motion Patterns of People

    We propose a method for adapting the behavior of a mobile robot to the activities of the people in its surroundings. Our approach uses learned motion patterns of persons. Whenever the robot detects a person, it computes a probabilistic estimate of which motion pattern the person might be engaged in. During path planning, it then uses this belief to improve its navigation behavior. In several experiments carried out on a real robot, we demonstrate that our approach allows the robot to quickly adapt its navigation plans to the activities of the persons in its surroundings.

    Related publication:

    Animations:

    • Our mobile robot Albert moves into a doorway to let a person pass by (mpg-video).
    • Albert moves forward and waits until the likelihood of interfering with the person is low enough (mpg-video).
    • Albert moves away from a doorway to let a person enter the corresponding room (mpg-video).
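The decision behind these experiments can be sketched as follows, assuming each learned motion pattern is reduced to one expected position per time step and detections are scored with an isotropic Gaussian. The robot computes a posterior over patterns from the person's observed positions, sums the posterior mass of the patterns that pass near its planned location within a short horizon, and waits while that interference probability exceeds a threshold. The pattern names, `sigma`, `radius`, `horizon`, and `WAIT_THRESHOLD` are hypothetical. The described approach integrates the belief into path planning rather than relying on a single go/wait test.

```python
import math


def pattern_belief(observed, patterns, prior=None, sigma=0.5):
    """Posterior over motion patterns given the person's positions so far.
    observed: list of (x, y) detections at time steps 0..t.
    patterns: dict name -> list of (x, y), one expected position per time step."""
    if prior is None:
        prior = {name: 1.0 / len(patterns) for name in patterns}
    log_post = {}
    for name, traj in patterns.items():
        ll = 0.0
        for t, (ox, oy) in enumerate(observed):
            px, py = traj[min(t, len(traj) - 1)]  # hold the last point if traj is short
            ll -= ((ox - px) ** 2 + (oy - py) ** 2) / (2 * sigma ** 2)
        log_post[name] = math.log(prior[name]) + ll
    top = max(log_post.values())
    unnorm = {name: math.exp(v - top) for name, v in log_post.items()}
    z = sum(unnorm.values())
    return {name: v / z for name, v in unnorm.items()}


def interference_probability(belief, patterns, robot_xy, t_now, horizon, radius=1.0):
    """Belief mass of patterns that bring the person within `radius`
    of the robot's planned position during the next `horizon` steps."""
    rx, ry = robot_xy
    p = 0.0
    for name, b in belief.items():
        future = patterns[name][t_now + 1 : t_now + 1 + horizon]
        if any(math.hypot(x - rx, y - ry) < radius for x, y in future):
            p += b
    return p


# Hypothetical example: a person leaving a doorway at (0, 0).
patterns = {
    "enter_room": [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)],
    "walk_corridor": [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)],
}
belief = pattern_belief([(0.0, 0.0), (0.1, 0.9)], patterns)
p_corridor = interference_probability(belief, patterns, (3, 0), t_now=1, horizon=3)
p_room = interference_probability(belief, patterns, (0, 3), t_now=1, horizon=3)
WAIT_THRESHOLD = 0.2  # hypothetical
for label, p in (("corridor spot", p_corridor), ("room entrance", p_room)):
    print(label, "wait" if p > WAIT_THRESHOLD else "go", p)
```

Here the observed step toward (0, 1) makes `enter_room` far more likely, so a robot at the corridor position may proceed, whereas a robot at the room entrance should wait until the person has passed.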

    Learning Motion Patterns of People

    We propose a method to learn typical motion behaviors of persons. As people move through their environments, they usually do not move randomly. Instead, they often engage in typical motion patterns, related to specific locations they might be interested in approaching and specific trajectories they might follow in doing so. Knowledge about such patterns may enable a mobile robot to improve its people-following and obstacle-avoidance skills. We present an algorithm that learns collections of typical trajectories that characterize a person's motion patterns. Data recorded by mobile robots equipped with laser-range finders is clustered into different types of motion using the expectation maximization (EM) algorithm, which simultaneously learns multiple motion patterns. Experimental results, obtained using data collected in a domestic residence and in an office building, illustrate that highly predictive models of human motion patterns can be learned.
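A minimal sketch of EM-based trajectory clustering follows. It relies on simplifying assumptions that real laser data would not satisfy directly: every trajectory has already been resampled to the same number of points, each motion pattern is a mean trajectory with a fixed, isotropic Gaussian noise level `sigma`, and all patterns are equally likely a priori. The E-step computes how well each pattern explains each trajectory, and the M-step moves every point of each mean trajectory to the responsibility-weighted average. The number of patterns and their initialization are simply given here; the function name and data are hypothetical.

```python
import math


def em_motion_patterns(trajs, init, sigma=0.5, iters=20):
    """trajs: trajectories as equal-length lists of (x, y) points.
    init: initial mean trajectories, one per motion pattern.
    Returns the learned mean trajectories and the final responsibilities."""
    means = [list(m) for m in init]
    resp = []
    for _ in range(iters):
        # E-step: posterior responsibility of each pattern for each trajectory.
        resp = []
        for tr in trajs:
            logs = [
                -sum((x - mx) ** 2 + (y - my) ** 2
                     for (x, y), (mx, my) in zip(tr, m)) / (2 * sigma ** 2)
                for m in means
            ]
            top = max(logs)
            w = [math.exp(v - top) for v in logs]
            z = sum(w)
            resp.append([v / z for v in w])
        # M-step: each mean point becomes the responsibility-weighted average.
        for k, m in enumerate(means):
            rk = [r[k] for r in resp]
            total = sum(rk)
            if total < 1e-12:
                continue  # pattern explains no trajectory: keep its previous mean
            means[k] = [
                (sum(r * tr[t][0] for r, tr in zip(rk, trajs)) / total,
                 sum(r * tr[t][1] for r, tr in zip(rk, trajs)) / total)
                for t in range(len(m))
            ]
    return means, resp


# Hypothetical data: two trajectories heading right, two heading up.
trajs = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    [(0.0, 0.2), (1.0, 0.2), (2.0, 0.2)],
    [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)],
    [(0.2, 0.0), (0.2, 1.0), (0.2, 2.0)],
]
means, resp = em_motion_patterns(trajs, init=[trajs[0], trajs[2]])
print([max(range(len(r)), key=r.__getitem__) for r in resp])
```

Because responsibilities are soft, a trajectory that lies between two patterns contributes to both means, which is the property that distinguishes EM from a hard assignment such as k-means.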

    Related publication:

    Animation:

    Prioritized Multi-Robot Path Planning

    Coordinating the motion of multiple mobile robots is one of the fundamental problems in robotics. The predominant algorithms for coordinating teams of robots are decoupled and prioritized, which avoids the combinatorially hard planning problems typically faced by centralized approaches. While these methods are very efficient, they have two major drawbacks. First, they are incomplete, i.e., they sometimes fail to find a solution even if one exists. Second, the resulting solutions are often not optimal. We developed a method for finding and optimizing priority schemes for such prioritized and decoupled planning techniques. Existing approaches apply a single priority scheme, which makes them prone to failure in cases where valid solutions exist. By searching in the space of prioritization schemes, our approach overcomes this limitation. It performs a randomized search with hill-climbing to find solutions and to minimize the overall path length. To focus the search, our algorithm is guided by constraints generated from the task specification.
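The following sketch illustrates searching over priority schemes, with simplifications that are assumptions rather than the published method. Robots move on a 4-connected grid in discrete time. Each robot is planned by breadth-first search in (cell, time) space around the reservations of higher-priority robots, with head-on swaps forbidden. The cost of a scheme is the sum of arrival times (path lengths including waiting steps), and the search uses random restarts with pairwise-swap hill climbing. The constraint-guided focusing mentioned above is omitted, and all names, the grid, and the horizon `max_t` are hypothetical.

```python
import random
from collections import deque

MOVES = ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))  # wait or 4-connected step


def plan_one(grid, start, goal, reserved, edges, max_t=50):
    """Earliest-arrival path via BFS in (cell, time) space.
    reserved: (cell, t) pairs occupied by higher-priority robots.
    edges: (from, to, t) moves of higher-priority robots, used to forbid swaps."""
    rows, cols = len(grid), len(grid[0])
    parent = {(start, 0): None}
    frontier = deque([(start, 0)])
    while frontier:
        cell, t = frontier.popleft()
        # Accept the goal only if the robot can stay there for the rest of the horizon.
        if cell == goal and all((goal, k) not in reserved for k in range(t, max_t)):
            path, node = [], (cell, t)
            while node is not None:
                path.append(node[0])
                node = parent[node]
            return path[::-1]
        if t + 1 >= max_t:
            continue
        for dr, dc in MOVES:
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols) or grid[nxt[0]][nxt[1]]:
                continue  # off the map or an obstacle cell
            state = (nxt, t + 1)
            if state in parent or state in reserved or (nxt, cell, t) in edges:
                continue  # visited, vertex conflict, or head-on swap
            parent[state] = (cell, t)
            frontier.append(state)
    return None


def plan_all(grid, starts, goals, order, max_t=50):
    """Decoupled planning in the given priority order; returns the summed arrival time."""
    reserved, edges, total = set(), set(), 0
    for i in order:
        path = plan_one(grid, starts[i], goals[i], reserved, edges, max_t)
        if path is None:
            return float("inf")  # this priority scheme fails
        total += len(path) - 1
        for t, cell in enumerate(path):
            reserved.add((cell, t))
            if t + 1 < len(path):
                edges.add((cell, path[t + 1], t))
        for t in range(len(path), max_t):
            reserved.add((goals[i], t))  # the robot parks at its goal
    return total


def search_priorities(grid, starts, goals, restarts=10, seed=0):
    """Random restarts plus hill climbing over pairwise priority swaps."""
    rng = random.Random(seed)
    best_order, best_cost = None, float("inf")
    for _ in range(restarts):
        order = list(range(len(starts)))
        rng.shuffle(order)
        cost = plan_all(grid, starts, goals, order)
        improved = True
        while improved:
            improved = False
            for a in range(len(order)):
                for b in range(a + 1, len(order)):
                    cand = order[:]
                    cand[a], cand[b] = cand[b], cand[a]
                    c = plan_all(grid, starts, goals, cand)
                    if c < cost:
                        order, cost, improved = cand, c, True
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost


# Hypothetical scenario: a one-lane corridor (row 1) with one side pocket at (0, 3).
grid = [[1, 1, 1, 0, 1],
        [0, 0, 0, 0, 0]]
starts = [(1, 0), (1, 4)]
goals = [(1, 4), (1, 0)]
order, cost = search_priorities(grid, starts, goals)
print(order, cost)
```

In this scenario only one priority scheme works. If robot 1 plans first, it takes the straight corridor and robot 0 cannot reach the pocket in time, so a planner fixed to that order fails even though a solution exists. Giving robot 0 priority lets robot 1 duck into the pocket, which the search finds.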

    Latest publication:

    Animations: