Bristol Robotics Laboratory bio-engineering and intelligent autonomous systems

Natural Facial Expressions: Towards Empathy in Humanoids

Introduction

Jules' Face

Our work advances the modelling and generation of realistic, dynamic facial behaviour in humanoid robots. Unlike most research projects in this field, the focus is on dynamic, subtle, facial expressions, rather than static exaggerated facial displays. We have investigated ways of capturing human facial motion from video for the animation of robotic faces.

Realistic, life-like robot appearance is crucial for sophisticated face-to-face robot/human interaction. Robot appearance and behaviour need to be well matched to equivalent human responses in order to meet our expectations, which are formed from our social experience. Violation of these expectations, whether through subtle imperfections or an imbalance between appearance and behaviour, causes discomfort in the humans who perceive or observe the robot. Japanese roboticist Masahiro Mori described this effect in his theory of "The Uncanny Valley", published in the early 1970s.

Researchers predict that one day, robotic companions will work with, or assist, humans in space, care, education and many other fields. The effects of the "Uncanny Valley" would be counterproductive to efforts to achieve trustworthiness, reliability and emotional intelligence. All of these are basic requirements for robotic companions, whether they assist astronauts in space or serve as social companions for the elderly or infants.

Our Team

Our lead investigator is Peter Jaeckel (Bristol Robotics Laboratory and the University of the West of England), supervised by Professor Chris Melhuish (Bristol Robotics Laboratory) and Dr Neill Campbell (University of Bristol). He has recently been assisted through a collaboration with Carl Henrik Ek and Dr Neil Lawrence, both of the University of Manchester.

Our Robot Platforms

We employ robotic heads purchased from Hanson Robotics. You can see Jules and Eva, our robots, as they were before delivery, in these YouTube videos. Jules soon made his views of us and his ambitions apparent, as you can see in the video.

Jules has a soft, pliable face which is moved by 34 actuators to produce realistic mimicry of human expressions.

Background

In order to maintain high-level and sophisticated robot-human interaction, robots require a realistic, life-like appearance similar to that of humans. Furthermore, a life-like appearance demands appropriate robot behaviour, which is essential for meeting the expectations of human observers. Until recently, researchers have avoided high degrees of life-likeness in order to avoid the effects of the 'Uncanny Valley'. Others have argued that lowering expectations by using simple, creature-like appearances may enhance interaction quality, because humans may evaluate the interaction more positively if the robot has unexpected capabilities.

A movie of Jules imitating the expressions of an actress.

In reality, though, a lack of human-like appearance leads people to underestimate a robot's capabilities. There are significant delays between first encountering and engaging with such artificially intelligent entities. Before high-level interaction can take place, human subjects interacting with a robot need to explore its capabilities, rather than engaging with the robot straight away.

The study of the human mind and behaviour makes use of artificial interaction partners in the form of emotionally expressive robots. For experimental purposes, human interaction partners are replaced by an artificial agent to simulate human-human interaction. Physical presence makes subjects remember the interaction in more detail [Kidd, 2003]. Researchers argue that tools used for the simulation and study of human behaviour must be as natural as possible in appearance and behaviour.

There are robotics research projects that mirror facial behaviour on an emotional level [Hegel, 2006]. Others recognise facial behaviour and describe it in an emotion manifold [Littlewort, 2006]. The work in [Valstar, 2004] recognises facial behaviour on the basis of facial action units.

Humans can mimic facial expressions effortlessly, and the projection of perceived facial behaviour onto felt and performed facial behaviour plays an essential role in understanding others and their intentions. In order to generate life-like dynamic facial behaviour, recognition at a higher level, such as emotions or facial action units, will not deliver the detail required for animating an artificial face. We require data in a pseudo-continuous domain, rather than quantised intensities of facial action units. Furthermore, action units group certain muscle actions, which may or may not be present in a robotic face. Hence our investigations explore facial behaviour at the low level of muscles, or of robot actuators emulating human facial muscles. Rather than classifying facial action, we directly map from visually perceived human facial behaviour to a robotic actuator space.

Research

The AAM facial features

We investigate ways of extracting facial motion information from video footage and translating the results into control commands for robotic faces.

Facial motion data is obtained by tracking a set of facial feature landmarks in video sequences or live video. We use Active Appearance Models (AAM) [Cootes, 1998], [Stegmann, 2003], which are generative, statistical models of face texture and shape. Fitting AAMs to video frames is an iterative process, searching for a set of parameters which yield minimum error between AAM-instance and an input image. The shape of the AAM delivers a low dimensional descriptor of the facial expressions, further used in our experiments.
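
As a rough guide to the fitting step described above, the shape model and search objective can be written in the form commonly used in the AAM literature. The symbols are assumptions introduced here for illustration and are not taken from the project's own notation: x is the vector of landmark coordinates, \bar{x} the mean shape, P_s the matrix of shape modes, b_s the shape parameters, p the full set of model parameters, and g the sampled image and model textures.

```latex
x = \bar{x} + P_s \, b_s ,
\qquad
\hat{p} = \arg\min_{p} \; \bigl\lVert g_{\mathrm{image}}(p) - g_{\mathrm{model}}(p) \bigr\rVert^{2}
```

The first expression is why the fitted shape acts as a low-dimensional descriptor: a handful of parameters b_s determines all landmark positions. The second is the error that the iterative search tries to drive down for each frame.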

We have investigated techniques such as Partial Least Squares (PLS) and Gaussian Process (GP) Regression [Rasmussen, 2006] for the mapping of facial motion. We assume that muscular activity is a result of facial expressions, unlike in reality, where it is the other way around. The primary aim is to investigate the feasibility and plausibility of such techniques for modelling and mapping.

Model Training

Model Training - The training set consists of pairs of facial landmarks and the corresponding robot poses, manually set by an animator.

Translating facial motion from video footage to a robotic actuator space assumes a functional relationship between the human facial expression space and the robotic actuator position space. This means that for each observation in the human facial expression space, there exists exactly one observation in the robot servo space. In reality, facial expressions are a result of muscular activity, and once person-specific and robot-structural factors are taken into account, the assumption of a one-to-one relationship no longer appears to be valid. Hence there may be multiple or no outputs for some model inputs, which cannot be described by a closed-form functional relationship.

Model Pose Estimation

Robot pose estimation - An Active Appearance Model (AAM) is fitted to input footage. The shape of the fitted AAM gives 25 landmarks, a low dimensional representation of the facial expression. Subsequently, the input data is aligned, normalised and scaled before the estimation using the previously trained model takes place.
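
The normalisation step can be pictured with a minimal sketch. It assumes landmarks arrive as a list of (x, y) pairs and only removes translation and overall scale; rotation alignment (for example a Procrustes fit) is left out, and the function name and toy data are hypothetical rather than part of the project's code.

```python
import math


def normalise_landmarks(points):
    """Centre (x, y) landmarks on their centroid and scale them to unit RMS size."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centred = [(x - cx, y - cy) for x, y in points]
    # Root-mean-square distance of the landmarks from the centroid.
    scale = math.sqrt(sum(x * x + y * y for x, y in centred) / n)
    if scale == 0.0:
        raise ValueError("all landmarks coincide; cannot normalise scale")
    return [(x / scale, y / scale) for x, y in centred]


# Toy data (hypothetical): the same shape seen at two positions and sizes.
small = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
large = [(10.0 + 3.0 * x, 5.0 + 3.0 * y) for x, y in small]

norm_small = normalise_landmarks(small)
norm_large = normalise_landmarks(large)
```

After normalisation both toy shapes have the same coordinates, so a trained mapping sees the expression rather than the position or size of the face in the frame.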

To this end, we investigate whether extensions of Gaussian Process Latent Variable Models [Lawrence, 2005] can be used to model shared and non-shared features of human and robot facial behaviour. We aim to overcome problems of global head motion interfering with the detection of local head motion. Multiple outputs for the same input expression and the low-dimensional representation of facial expressions cause ambiguities within the data which need handling. This part of the project is a collaboration between researchers at BRL and the University of Manchester.

Conclusions

The workings of a robot head
Actuator system in a robot head.

We have investigated ways of capturing human facial motion from video and modelling it for the animation of robotic faces.

The low-dimensional descriptor of facial expressions given by the AAM shape is a very compact and simple representation of human facial expressions. However, a large amount of facial motion data is omitted. To overcome the negative effects, such as interference between global and local head motion causing ambiguous robot poses, we suggest modelling facial expressions as a functional relationship of an internal shared muscle state, not the other way around. Furthermore, in this work we take into account that there are structural and person-specific differences between the human face space, represented by the AAM descriptor space, and the robot actuator space. Some features are shared by both human and robot, but other features are non-shared and hence human- or robot-specific.

PLS is a simple and well understood way of mapping facial expressions, and it requires very little training data. Disadvantages arise from the common problem of collinearity, and manual modification and cross-validation of the model are essential. The model is not robust to tracking errors and cannot handle missing data.
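
To make the idea of a PLS mapping concrete, here is a minimal sketch of a one-component PLS model for a single output, written in plain Python. It is an illustration under stated assumptions, not the project's implementation: the function names, the two-feature input and the single "servo" output are hypothetical, and a practical model would use several components and one output per actuator.

```python
def mean(values):
    return sum(values) / len(values)


def fit_pls1(X, y):
    """Fit a one-component PLS model mapping feature rows X to a scalar output y."""
    n, p = len(X), len(X[0])
    x_mean = [mean([row[j] for row in X]) for j in range(p)]
    y_mean = mean(y)
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: the input direction with the largest covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    if norm == 0.0:
        raise ValueError("inputs carry no covariance with the output")
    w = [v / norm for v in w]
    # Scores: each training input projected onto the weight vector.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress the centred output on the scores.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return {"x_mean": x_mean, "y_mean": y_mean, "w": w, "q": q}


def predict_pls1(model, x):
    """Predict the output for one feature vector x."""
    t = sum((xj - mj) * wj for xj, mj, wj in zip(x, model["x_mean"], model["w"]))
    return model["y_mean"] + model["q"] * t


# Toy keyframes (hypothetical): two landmark-derived features, one servo value each.
X_train = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
y_train = [0.2, 0.4, 0.6, 0.8]

pls_model = fit_pls1(X_train, y_train)
servo_value = predict_pls1(pls_model, [4.0, 8.0])
```

Only four keyframes are needed to fit this toy model, which mirrors the point that PLS gets by with very little training data. The prediction is a purely linear extrapolation, so a mistracked input produces a correspondingly wrong output with no indication that anything went amiss.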

We have found that GP Regression helps to overcome problems of implausible model output due to tracking errors and operation beyond model boundaries. It is much more robust to input errors, and we have shown it to have lower prediction errors for our data. We were able to robustly map the shape of the mouth as well as eyebrow and head motion to robotic hardware in real time, which could not be achieved using a partial least squares model. However, GP Regression requires more training data and training generally takes longer.
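
A small sketch may help show how GP regression behaves for inputs far from the training data. It assumes a zero-mean GP with a squared-exponential kernel, fixed hyperparameters and a single output; the kernel choice, parameter values, function names and toy data are all assumptions made for illustration, not the settings used in this project.

```python
import math


def rbf(a, b, length=1.0, variance=1.0):
    """Squared-exponential covariance between two feature vectors."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return variance * math.exp(-0.5 * d2 / length ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def gp_predict(X, y, x_new, noise=1e-4):
    """Return the predictive mean and variance of a zero-mean GP at x_new."""
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    k_star = [rbf(xi, x_new) for xi in X]
    alpha = solve(K, y)  # (K + noise * I)^-1 y
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K, k_star)  # (K + noise * I)^-1 k_star
    var = rbf(x_new, x_new) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, var


# Toy training set (hypothetical): one expression feature, one servo value.
X_train = [[0.0], [1.0], [2.0]]
y_train = [0.1, 0.5, 0.9]

mean_near, var_near = gp_predict(X_train, y_train, [1.0])
mean_far, var_far = gp_predict(X_train, y_train, [10.0])
```

Close to a training example the predictive variance is small; far outside the training range it grows towards the prior variance and the mean falls back to zero. That variance gives a controller a way to notice implausible or out-of-range inputs rather than acting on them blindly.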

In regression tasks, only shared variance is modelled, and the non-shared variance specific to each observation space is omitted. We show that modelling non-shared variance is essential for overcoming ambiguities within the data. Similar to a human’s perception and facial motor skills, the Shared GP-LVM can infer facial expression in a bi-directional fashion. It can predict muscle pose from facial expressions as well as produce facial expressions from muscle activity.

So far it is assumed that the servo positions, updated every 40 ms, are achieved instantly and that no overshoot takes place. Servo actuator states are not accessible, but the internal control loop in each servo actuator allows us to set and achieve desired positions. However, the temporal characteristics as well as the trajectories between actuator states are unknown. Even more life-like robotic facial behaviour requires velocity and acceleration control, which should be the subject of future research.
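
As a purely illustrative sketch of what velocity control could look like at the 40 ms update rate mentioned above, the snippet below limits how far a commanded servo position may move per update. It is not the project's method: the function names, the normalised position range and the speed limit are hypothetical, and acceleration limits are not modelled.

```python
def step_towards(current, target, max_speed, dt=0.040):
    """Advance one update tick towards target, moving at most max_speed * dt."""
    max_step = max_speed * dt
    delta = target - current
    if abs(delta) <= max_step:
        return target
    return current + max_step if delta > 0 else current - max_step


def trajectory(start, target, max_speed, dt=0.040):
    """List the commanded positions, one per tick, from start until target is reached."""
    if max_speed <= 0:
        raise ValueError("max_speed must be positive")
    positions = [start]
    while positions[-1] != target:
        positions.append(step_towards(positions[-1], target, max_speed, dt))
    return positions


# Hypothetical example: normalised servo range 0..1, speed limit 5 units per second.
path = trajectory(0.0, 1.0, max_speed=5.0)
```

Each entry in `path` would be sent on one 40 ms tick, so the jump from 0 to 1 is spread over several updates instead of being requested in a single step.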

This project contributes to the modelling and generation of realistic, dynamic facial behaviour in humanoid robotic faces. Its novelty lies in modelling shared and non-shared properties to account for person and robot specific facial features and the application to physically present, robotic hardware.

We have introduced a number of modelling techniques to robotics. The first is Partial Least Squares (PLS), which is very popular in chemistry and spectroscopy. We have shown that a very small set of training data suffices for the mapping of long sequences of facial motion.

In the context of humanoid robotic research, we fully exploit the capabilities of Shared and Subspace GP-LVMs by modelling data realistically, in a reverse fashion, from muscle state to facial expression, not the other way around, as previously assumed. Furthermore, the model allows inference in a bi-directional fashion, which allows facial expressions to be generated as well as perceived.

Publications:

2007

Peter Jaeckel, Neill Campbell and Chris Melhuish, Towards Realistic Facial Behaviour in Humanoids - Mapping from Video Footage to a Robot Head. In: 10th edition of the International Conference on Rehabilitation Robotics (ICORR), IEEE, June 2007, pp. 833-838.

Rehabilitation robotics and physical therapy could greatly benefit from engaging and motivating, robotic caregivers which respond in accordance to patients’ emotional and social cues. Recent studies indicate that human-machine interactions are more believable and memorable when a physical entity is present, provided that the machine behaves in a realistic manner. It is desirable to adopt face-to-face communication because it is the most natural and efficient way of exchanging information and does not require users to alter their habits. Towards this end, we describe a process for animating a robot head, based on video input of a human head. We map from the 2D coordinates of feature points into the robot’s servo space using Partial Least Squares (PLS). Learning is done using a small set of keyframes manually created by an animator. The method is efficient, robust to tracking errors and independent of the scale of the face being tracked.

Peter Jaeckel, Neill Campbell and Chris Melhuish, Mapping from Video Footage to a Robot Head. In: Towards Autonomous Robotic Systems (Taros) 2007, pp. 9-16.

2008

Peter Jaeckel, Neill Campbell and Chris Melhuish, Facial Behaviour Mapping - from Video Footage to a Robot Head, Robotics and Autonomous Systems, Volume 56, Issue 12, 31 December 2008, pp. 1042-1049.

As autonomous robotic systems advance, they will be required and designed for interaction with humans in order to exchange information, which is essential for fulfilling their tasks. It is well established that human-machine interactions are more believable and memorable when a physical entity is present, provided that the machine behaves in a realistic manner. It is desirable to adopt face-to-face communication, because it is the most natural and efficient way of exchanging information, and does not require users to alter their habits. In this context, this paper describes a process for animating a robot head, based on video input of a human head. We map from the 2D coordinates of feature points into the robot's servo space, using Partial Least Squares (PLS). Learning is done using a small set of keyframes manually created by an animator. The method is efficient, robust to tracking errors and independent of the scale of the face being tracked.

Peter Jaeckel, Neill Campbell and Chris Melhuish, Gaussian Processes for Facial Behaviour Mapping. In: Proceedings of Taros 2008.

This paper presents a mapping of observed motion data of facial behaviour footage to a robotic face using Gaussian process regression. The model is built from a training set that consists of examples of input expressions in the form of 2-dimensional feature point locations and their appropriate robot pose in terms of servo values. Unlike the linear mapping suggested in previous work, this approach does not suffer from noise in the training data, and the model can handle 'multi-emotional' information. It is robust to inherent and hard-to-avoid errors in the training set. The mapping still performs well in cases of input corruption. The probabilistic characteristics of Gaussian process regression also give a measure of quality or confidence about an output.

2009

Peter Jaeckel, Carl Henrik Ek, Neill Campbell, Neil Lawrence and Chris Melhuish, Shared Gaussian Process Latent Variable Models for Handling Ambiguous Facial Expressions, Proceedings INTELLIGENT SYSTEMS AND AUTOMATION: 2nd Mediterranean Conference on Intelligent Systems and Automation (CISA'09), March 5, 2009 - Volume 1107, pp. 147-153.

Despite the fact that, in reality, facial expressions occur as a result of muscle actions, facial expression models assume an inverse functional relationship, which makes muscle action the result of facial expressions. Clearly, facial expression should be expressed as a result of muscle action, the reverse of what was previously suggested. Furthermore, a human facial expression space and a robot's actuator space have common features. However, there are also features that one or the other does not have. This suggests modelling shared and non-shared feature variance separately. To this end we propose Shared Gaussian Process Latent Variable Modelling (Shared GP-LVM) for models of facial expression, which assume shared and private features between an input and output space. In this work we focus on the detection of ambiguities within data sets of facial behaviour. We suggest ways of modelling and mapping facial motion from a representation of human facial expressions to a robot's actuator space. We aim to compensate for ambiguities caused by interference of global with local head motion and the constrained nature of Active Appearance Models, used for tracking.

References

[Cootes, 1998] T. Cootes, G. Edwards, and C. Taylor, Active Appearance Models, in Proceedings European Conference on Computer Vision 1998, vol. 2, June 1998, pp. 484-498.

[Lawrence, 2005] N. D. Lawrence, Probabilistic non-linear principal component analysis with Gaussian process latent variable models, Journal of Machine Learning Research, vol. 6, pp. 1783-1816, 2005.

[Stegmann, 2003] M. B. Stegmann, The AAM-API: An Open Source Active Appearance Model Implementation, Medical Image Computing and Computer-Assisted Intervention - MICCAI 2003, pp. 951-952, 2003.

[Kidd, 2003] C. Kidd, Sociable robots: The role of presence and task in human-robot interaction, Master's Thesis - Massachusetts Institute of Technology, 2003.

[Hegel, 2006] F. Hegel, T. Spexard, T. Vogt, G. Horstmann, and B. Wrede, Playing a Different Imitation Game: Interaction with an Empathic Android Robot, in Proceedings 2006 IEEE-RAS International Conference on Humanoid Robots (Humanoids06). IEEE, December 2006, pp. 56-61.

[Littlewort, 2006] G. Littlewort, M. Bartlett, I. Fasel, J. Susskind, and J. Movellan, An Automatic System for Measuring Facial Expression in Video, Image and Vision Computing, vol. 24, no. 6, pp. 615-625, 2006.

[Valstar, 2004] M. Valstar, I. Patras, and M. Pantic, Facial action unit recognition using temporal templates, in IEEE Int'l Workshop on Human-Robot Interaction 2004, 2004, pp. 253-258.

[Rasmussen, 2006] C. E. Rasmussen and C. Williams, Gaussian Processes for Machine Learning, The MIT Press, 2006.

This file last updated Friday, 28-Aug-2009 12:42:10 BST

© 2005, 2006, 2007, 2008, 2009 Bristol Robotics Laboratory, Dupont Building, University of the West of England, Coldharbour Lane, Bristol, BS16 1QY
