Lip Synchronisation for the iCub Talking Head


Ingo Keller, Katrin S. Lohan
Heriot-Watt University, School of Mathematical & Computer Sciences

Background

Motivation

In Human-Robot Interaction, feedback is vital to keep an interaction going [9]. Experiments indicate that a more animated robot face is more likely to attract attention [2]. Here, we focus on lip synchronisation to facilitate the recognition of generated speech. The McGurk Effect describes how the visual information of human speech affects auditory recognition: the visual and audio representations of a phoneme have to be coherent for humans to understand the phoneme correctly. If the phoneme sound and the visual representation do not match, understanding of that phoneme is impaired [6]. The implementation was done on the iCub Talking Head [8] and compared against the analysis suggested by Kyung-Geune Oh et al. [7] as a reference.

System

• Visemes are the visual equivalent of phonemes

• Visemes are units of distinguishable facial expressions

• Hearing-impaired people use them to compensate for the missing sound

• Trained people can use them for "lip-reading"

• More phonemes than visemes exist, so multiple phoneme-to-viseme mappings are possible [4]

System Design

• Speech and phoneme durations are generated with MaryTTS (a duration-query sketch follows this list)

• The iCub Talking Head features 4 DOF for the lips (Up, Down, Left, Right)

• 5 levels per joint were defined to compensate for small min-max joint travel distances

• Simplifying the mapping resulted in 16 visemes
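
As a concrete illustration of the MaryTTS step, the following Python sketch asks a locally running MaryTTS server for realised phoneme durations. The server address, the REALISED_DURATIONS output type, and the "<end-time> <number> <phone>" line format are assumptions based on the MaryTTS 5.x HTTP interface, not details given on the poster.

    # Minimal sketch: query a local MaryTTS server for phoneme timings.
    # Assumptions: server on localhost:59125, OUTPUT_TYPE=REALISED_DURATIONS,
    # one "<end-time-s> <number> <phone>" line per phone after a "#" header.
    import requests

    MARY_URL = "http://localhost:59125/process"  # default MaryTTS HTTP port

    def phoneme_durations(text, locale="en_US"):
        """Return a list of (phone, onset_s, duration_s) for the given text."""
        resp = requests.get(MARY_URL, params={
            "INPUT_TYPE": "TEXT",
            "OUTPUT_TYPE": "REALISED_DURATIONS",
            "LOCALE": locale,
            "INPUT_TEXT": text,
        })
        resp.raise_for_status()
        phones, last_end = [], 0.0
        for line in resp.text.splitlines():
            parts = line.split()
            if len(parts) != 3:
                continue                              # skip header/blank lines
            end, phone = float(parts[0]), parts[2]    # middle field is ignored
            phones.append((phone, last_end, end - last_end))
            last_end = end
        return phones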

Phoneme-to-Viseme Map

Table: the 16 viseme classes partition the MaryTTS phonemes (p, b, ., f, v, w, m, OI, u, aU, A, h, r, {, U, @U, tS, dZ, i, I, AI, EI, t, g, T, D, s, z, S, Z, n, N, j, d, @, V, E, r=, l, O) and assign each class a target position for the Up, Down, and Left/Right lip joints in 25% steps between 0% and 100%. A lookup sketch follows the table.
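
To make the table's use concrete, here is a minimal Python sketch of the viseme lookup and the 5-level joint quantisation. Only the 25% step grid and the four lip DOF come from the poster; the two example classes, the neutral fallback, and the joint-limit dictionary are illustrative placeholders, since the per-class values could not be fully recovered here.

    # Minimal sketch: phoneme -> viseme class -> quantised lip joint targets.
    VISEME_OF_PHONE = {            # subset for illustration, not the full map
        "p": 1, "b": 1, "m": 1,    # bilabial closure
        "tS": 10, "dZ": 10,        # affricates share one mouth shape
    }

    POSE_OF_VISEME = {             # class -> (up, down, left/right) in percent
        0:  (0, 0, 0),             # 0 = neutral fallback (assumption)
        1:  (0, 0, 50),            # example values, illustrative only
        10: (25, 75, 75),
    }

    LEVELS = (0, 25, 50, 75, 100)  # the 5 levels per joint from the poster

    def quantize(percent):
        """Snap a percentage to the nearest of the 5 joint levels."""
        return min(LEVELS, key=lambda lvl: abs(lvl - percent))

    def joint_targets(phone, joint_limits):
        """Map a phoneme to absolute positions for the 4 lip joints.

        joint_limits: {joint_name: (min, max)} in the robot's units."""
        viseme = VISEME_OF_PHONE.get(phone, 0)
        up, down, lr = POSE_OF_VISEME[viseme]
        targets = {}
        for name, pct in (("up", up), ("down", down),
                          ("left", lr), ("right", lr)):
            lo, hi = joint_limits[name]
            targets[name] = lo + quantize(pct) / 100.0 * (hi - lo)
        return targets

    if __name__ == "__main__":
        limits = {j: (0.0, 1.0) for j in ("up", "down", "left", "right")}
        print(joint_targets("p", limits))   # pose for the bilabial class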

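Finally, a sketch of the timing loop that keeps the lip poses in step with the generated speech, reusing phoneme_durations() and joint_targets() from the sketches above. set_lip_pose is a hypothetical stand-in for whatever transport actually drives the head (e.g. a YARP control board), and audio playback is assumed to start together with the loop.

    # Minimal sketch: command each lip pose at its phoneme's onset time.
    import time

    def play_visemes(phones, joint_limits, set_lip_pose):
        """phones: list of (phone, onset_s, duration_s) tuples."""
        t0 = time.monotonic()
        for phone, onset, _duration in phones:
            delay = onset - (time.monotonic() - t0)
            if delay > 0:
                time.sleep(delay)      # wait until this phoneme's onset
            set_lip_pose(joint_targets(phone, joint_limits))
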
Scenario

Figure: Comparison between iCub and FR-i 2.0

The lip synchronisation will be used in our fruit salad scenario, in which participants explain to the robot how they make a fruit salad.

References

[1] Annosoft. Phoneme mapping. http://www.annosoft.com/docs/Visemes17.html, 2015.

[2] Christoph Bartneck, Takayuki Kanda, Omar Mubin, and Abdullah Al Mahmud. Does the design of a robot influence its animacy and perceived intelligence? International Journal of Social Robotics, 1(2):195–204, April 2009.

[3] Elif Bozkurt, Cigdem Eroglu Erdem, Engin Erzin, Tanju Erdem, and Mehmet Ozkan. Comparison of phoneme and viseme based acoustic units for speech driven realistic lip animation. In Proc. of Signal Processing and Communications Applications, pages 1–4, 2007.

[4] Luca Cappelletta and Naomi Harte. Phoneme-to-viseme mapping for visual speech recognition. In ICPRAM (2), pages 322–329, 2012.

[5] Katrin Solveig Lohan, Katharina Rohlfing, Karola Pitsch, Joe Saunders, Hagen Lehmann, Christopher Nehaniv, Kerstin Fischer, and Britta Wrede. Tutor spotter: Proposing a feature set and evaluating it in a robotic system. International Journal of Social Robotics, 4:131–146, 2012.

[6] Harry McGurk and John MacDonald. Hearing lips and seeing voices. Nature, 264:746–748, 1976.

[7] Kyung-Geune Oh, Chan-Yul Jung, Yong-Gyu Lee, and Seung-Jong Kim. Real-time lip synchronization between text-to-speech (TTS) system and robot mouth. In RO-MAN, 2010 IEEE, pages 620–625. IEEE, 2010.

[8] Alberto Parmiggiani, Marco Randazzo, Marco Maggiali, Frederic Elisei, Gerard Bailly, and Giorgio Metta. An articulated talking face for the iCub. In Humanoid Robots (Humanoids), 2014 14th IEEE-RAS International Conference on, pages 1–6. IEEE, 2014.

[9] Anna-Lisa Vollmer, Manuel Mühlig, Jochen J. Steil, Karola Pitsch, Jannik Fritsch, Katharina J. Rohlfing, and Britta Wrede. Robots show us how to teach them: Feedback from robots shapes tutoring behavior during action learning. PLoS ONE, 9(3):e91349, 2014.

Acknowledgements

The presented work was supported by a James Watt Scholarship from Heriot-Watt University. We would also like to thank the iCub community for the Open Source software on which we could build our system.

Contact Details

Email: [email protected]
Web: www.macs.hw.ac.uk/RoboticsLab
Twitter: @EDINrobotics, @BrutusTT