Virtual Characters for Interactive Multimedia Presentations

Research Plan
Ali Arya

October 2006
Carleton School of Information Technology

Introduction

Multimedia content, used in a variety of areas such as entertainment, education, and the service industry, is undergoing a fundamental change. Traditionally, such content fell into clearly separate categories, for example live-action movies, animated films, and more recently computer games and multimedia-enabled web sites. Each of these categories had its own distinguishing characteristics. Movies were based on live-action pre-recorded content and linear, sequential storylines, while computer games used Computer-Generated Imagery (CGI) and were interactive by nature. Recent advances in multimedia and Human-Computer Interaction (HCI) hardware and software have made it possible to add interactivity and computer-generated audio and visual content to many systems. Extensive use of CGI in movies, DVD and Interactive TV technologies, and interactive virtual agents (avatars) on customer service web pages are notable examples. Conversely, with new developments, computer games will employ more complicated storylines and realistic imagery, and will include live-action parts. It is foreseeable that these traditionally separate fields will soon merge into a relatively unified concept we can call the Interactive Multimedia Presentation.

In almost all applications of such interactive multimedia content, including those mentioned above, computer-generated virtual characters are essential. The research proposed in this grant application consists of a group of related projects targeted at a comprehensive framework for creating virtual characters. The environments in which these characters can be utilized may have different purposes and use different technologies and styles. For instance, the system may be an online customer service tool that uses a two-dimensional virtual "agent" with close-up views, or a game with full-body 3D but stylized characters. Regardless of such differences, these characters share some common features to varying degrees. Cognitive and behavioural models governing the character's actions (e.g. personality traits and emotional states) are obvious examples. As our first category of activities, the proposed research will study such models with the aid of the cognitive science and behavioural psychology communities, and will develop intelligent computational mechanisms to apply them to virtual characters. The Facial Personality Project (pFACE) [A-1] is part of this proposal; in its early phases it has been able to associate facial actions with viewers' perception of personality parameters (Dominance and Affiliation [22]) through extensive user experiments, and then to use these findings to create software models for generating facial actions in an animation based on the intended personality profile. This and similar projects and experiments will help build autonomous and/or easy-to-control characters for various applications. A personality-rich character, for example, can fit perfectly into a real-time agent situation, or help create an offline video without the need for animators to design all the details of actions and expressions. Our research will also consider domain-specific behavioural issues, such as full-body movement styles in virtual dance versus the effect of facial actions on the perception of personality in viewers.

On the other hand, regardless of the behavioural models used to control the characters, a CGI-based system needs to deal with lower-level visualization issues, such as data types (e.g. 3D or 2D), rendering methods (e.g. realistic or stylized), and geometry modelling (e.g. head/body physics, skin and muscle representation, deformation, and action coordination for head/body parts). These form the second category of our research activities, generally referred to as visualization. Finally, the third category in our proposed research is application technology, where we will address the general technologies that make applications based on virtual characters possible. Examples of topics in this category are web service protocols and standards most suitable for agent-based systems, multimedia streaming over the Internet with a focus on character-rich scenes, and re-usable software architectures and components such as game engines for intelligent characters. In the following sections, the progress made toward such a framework and the objectives of the proposed research are discussed in more detail.

Recent Progress

The author's Ph.D. research resulted in the development of the ShowFace system (http://www.csit.carlton.ca/~arya/showface) [B-3], a component-based streaming architecture that uses Face Modelling Language (FML) [A-2], an XML-based language compatible with the MPEG-4 XMT framework, and Feature-based Image Transforms (FIX), a facial feature-based image transformation method that creates new images from a limited set of input 2D views. During my post-doctoral research, I extended the ideas from ShowFace to 3D techniques in order to develop a comprehensive environment for interactive face animation. This was done in collaboration with Steve DiPaola from Simon Fraser University, and the result was a system called iFACE (Interactive Facial Animation – Comprehensive Environment, http://img.csit.carleton.ca/iface) [A-4]. It is based on the idea of the Face Multimedia Object (FMO), a re-usable software component that provides a variety of functions to face-based applications. Considering the communicative aspects of facial actions, iFACE implements the FMO through the following features:

• A multi-dimensional parameter space for the head model, consisting of:
  o Hierarchical geometry that supports both 2D and 3D approaches by placing pixel/vertex information at the lowest level of abstraction, under regions and features
  o Knowledge, encapsulating rules of interaction and requested actions
  o Personality, representing long-term individual characteristics
  o Mood, representing relatively short emotional and sensational states
• A hierarchical architecture for exposing functionality to a variety of clients at four layers:
  o Data layer for core face object functionality with direct access
  o Stream layer for multimedia streaming objects
  o Wrapper layer for wrapper objects such as web controls
  o Application layer for user applications
• An XML-based language for content description (Face Modelling Language, FML)

The FMO can be used as a "face engine" in a variety of face-based applications. An example is design-time and run-time environments for computer games. Using the FMO, an animator can easily design face-based animations simply by specifying the desired facial actions through high-level and low-level parameters, and see the resulting animation. At run-time, the same face engine receives these specifications and creates the animation the way it was designed and viewed by the artist. On the other hand, behavioural modelling allows the FMO to be used in interactive, non-scripted situations as well. iFACE provides an interface that can work with head data of any type: simple or photo-realistic 2D, realistic or stylized 3D. User commands and programming functions are independent of the data type or head model used, except for lower-level point manipulation. The system is compatible with MPEG-4 face animation and definition parameters. It is the basis for a set of research and educational systems at the University of British Columbia, Simon Fraser University, and Carleton University; examples are pFACE (http://img.cit.carleton.ca/pface) [A-1] and MusicFace (http://ivizlab.sfu.ca/research/musicface) [A-3]. The current proposal will use and enhance iFACE as the starting point for a variety of other research projects.
The pFACE project has conducted extensive user experiments, with the help of behavioural psychologists, to study the effect of dynamic facial actions and expressions on the perception of personality in viewers, to parameterize the personality types, and to allow animators and automated facial animation systems to initiate perceptually valid facial actions based on the intended personality. MusicFace extracts the emotional content of music to generate "similar" facial animation.
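To illustrate how a client application might drive the FMO's parameter space (geometry, knowledge, personality, and mood) through such a layered interface, the following is a minimal sketch. It is only a sketch: the class, method, and parameter names are assumptions made for this illustration, not the actual iFACE API.

```python
# Illustrative sketch of a face-engine interface in the spirit of the FMO.
# Class, method, and parameter names are assumptions, not the real iFACE API.

class FaceEngine:
    def __init__(self, head_data: str):
        # Head data can be 2D images or a 3D mesh; clients are not expected
        # to care which, except for low-level point manipulation.
        self.head_data = head_data
        self.personality = {"affiliation": 0.0, "dominance": 0.0}  # long-term traits
        self.mood = {"arousal": 0.0, "valence": 0.0}               # short-term state
        self.pending_actions = []

    def set_personality(self, affiliation: float, dominance: float) -> None:
        self.personality = {"affiliation": affiliation, "dominance": dominance}

    def set_mood(self, arousal: float, valence: float) -> None:
        self.mood = {"arousal": arousal, "valence": valence}

    def request_action(self, action: str, intensity: float = 1.0) -> None:
        # High-level request; the knowledge layer would decide how (and whether)
        # to realize it, given the current personality and mood.
        self.pending_actions.append((action, intensity))

    def render_frame(self) -> str:
        # Stand-in for the actual rendering/streaming layers.
        actions = ", ".join(f"{a}({i:.1f})" for a, i in self.pending_actions) or "idle"
        self.pending_actions.clear()
        return f"[{self.head_data}] mood={self.mood} actions: {actions}"

# Design-time and run-time clients use the same engine in the same way.
engine = FaceEngine(head_data="stylized-3d-head")
engine.set_personality(affiliation=0.7, dominance=-0.2)
engine.set_mood(arousal=0.3, valence=0.6)
engine.request_action("speak", 0.8)
engine.request_action("eyebrow_raise", 0.4)
print(engine.render_frame())
```

The same object could then be wrapped for streaming or embedded as a web control, corresponding to the stream and wrapper layers listed above.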

Objectives

The objectives of the proposed research fall into three main categories: Cognitive/Behavioural Modelling, Visualization, and Application Technology. Within each of these categories, short-term and long-term objectives can be identified. The short-term objectives are mainly related to projects that have already started, or for which coordination with participating researchers and detailed planning are in progress; these projects are expected to be fully active in 2007 and 2008. The long-term objectives (and the projects associated with them) depend on the success of the short-term projects and the participation of more researchers. While pursuing both groups of objectives requires proper funding, work on the longer-term objectives cannot even be started at the moment due to limited resources. Practical priorities such as industry needs, as well as the interests of researchers, have also affected the selection of short-term and long-term objectives.

Short-term Projects and Objectives

Cognitive/Behavioural Modelling
• Effective parameterization of facial personality (pFACE project), answering the following questions: (1) What are the facial personality types, and what parameters control them? (2) How do viewers associate facial personality types and parameters with static and dynamic visual cues on the face (i.e. facial actions)? In behavioural psychology, personality types can be considered regions in a circular 2D space whose dimensions correspond to the Affiliation and Dominance parameters [22]. We need to investigate whether this model is applicable and practical for facial personality. Then, through user studies, we investigate the effect of facial actions such as eyebrow-raise and head-nod on the perception of personality (for instance, by asking users to rate short videos of animated characters for personality adjectives). The results will be analyzed and will help build a model that automatically generates facial actions that increase the probability of an intended personality type being perceived (see the first sketch after this list). The results achieved so far show that further experimentation and more complicated models may be required, especially for representing not just basic personality types but more abstract "character" types, such as "thug" or "sinister", which are widely needed by animators.
• Parameterization of facial expressions, using "expression units" to create and blend the expressions of different emotions in a perceptually valid way, and realistic combination of facial expressions such as moods with normal facial actions such as talking. Assuming that the default movements for a facial expression and a facial action are known, the movements for that action in the presence of the expression are a function of those two movements, but the function is not necessarily a simple addition or even a linear one (see the second sketch after this list). The idea here is to break expressions into small units and then combine these units appropriately in order to create a spectrum of possible expressions. This project is called xFACE and will use an approach similar to pFACE, i.e. one based on experimentation and modelling; together they provide "perceptually valid" facial animation.
• Parameterization of dance styles through motion capture, in order to edit existing performances and create new ones. The Style Processor for Interactive Choreography and Education (SPICE) is a project in the early phases of literature review, attracting interested researchers, and planning.

Visualization
• Developing a hierarchical parameter space for facial geometries. At the lowest level, this "face space" includes parameters such as the size/location of facial features; higher layers of abstraction (face types) are necessary to simplify character design. This project is referred to as FaceSpace.
• Finding an optimal function for the movement of facial points during a facial action, including deformations such as wrinkles.
• Adding support for 2D image-based animation to the iFACE system by integrating the ShowFace (2D) framework with the current iFACE (3D) and enhancing the functionality.

Application Technology
• iFACE version 2, with enhancements based on received feedback
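To make the pFACE modelling goal concrete, the first sketch below shows how a personality profile expressed as Affiliation/Dominance coordinates on the two-dimensional circumplex [22] could bias the selection of facial actions. The action names and their assumed personality associations are placeholders for illustration; the real associations are exactly what the pFACE user studies are designed to measure.

```python
import math
import random

# Hypothetical facial actions with assumed (affiliation, dominance) associations.
# Real associations would come from the pFACE user studies, not from this sketch.
FACIAL_ACTION_PROFILES = {
    "smile":         (0.9,  0.1),
    "eyebrow_raise": (0.3,  0.5),
    "head_nod":      (0.6, -0.2),
    "frown":         (-0.7, 0.4),
    "gaze_avert":    (-0.2, -0.8),
}

def action_weights(affiliation: float, dominance: float) -> dict:
    """Weight each action by its similarity to the intended personality point
    on the 2D circumplex (both inputs in [-1, 1])."""
    weights = {}
    for action, (a, d) in FACIAL_ACTION_PROFILES.items():
        # Cosine-style similarity between the intended personality and the action profile.
        dot = affiliation * a + dominance * d
        norm = math.hypot(affiliation, dominance) * math.hypot(a, d) or 1.0
        weights[action] = max(0.0, dot / norm)  # keep only supporting actions
    return weights

def sample_action(affiliation: float, dominance: float) -> str:
    """Randomly pick a facial action, biased toward the intended personality."""
    weights = action_weights(affiliation, dominance)
    actions, w = zip(*weights.items())
    return random.choices(actions, weights=[v + 1e-6 for v in w])[0]

if __name__ == "__main__":
    # A warm, mildly submissive character (high affiliation, low dominance).
    print(sample_action(affiliation=0.8, dominance=-0.3))
```

A fitted model would replace the hand-picked profile table with parameters estimated from the viewer-rating experiments described above.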
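The second sketch illustrates the non-additive blending problem raised in the xFACE item: combining an expression (e.g. a smile) with a concurrent action (e.g. talking) by attenuating the expression on the parameters the action uses, rather than simply adding the two. The parameter names and the specific blending rule are illustrative assumptions, not the xFACE model.

```python
# Minimal sketch of non-additive blending of a facial expression with a
# concurrent facial action (e.g. a smile while talking). Parameter names and
# the blending rule are illustrative assumptions, not the xFACE model itself.

def blend(expression: dict, action: dict, dominance: float = 0.7) -> dict:
    """Combine two displacement maps over shared control parameters.

    Where both the expression and the action move the same parameter, the
    action keeps most of its amplitude and the expression is attenuated,
    instead of the two simply adding up.
    """
    result = dict(expression)  # start from the expression's displacements
    for name, a_val in action.items():
        e_val = expression.get(name, 0.0)
        # Attenuate the expression in proportion to how strongly the action
        # uses this parameter; `dominance` controls how much the action wins.
        attenuation = 1.0 - dominance * min(1.0, abs(a_val))
        result[name] = e_val * attenuation + a_val
    return result

# Example: a smile (lip-corner raise) blended with an open-jaw talking pose.
smile = {"lip_corner_raise": 0.8, "cheek_raise": 0.4}
talk  = {"jaw_open": 0.6, "lip_corner_raise": 0.3}
print(blend(smile, talk))
# lip_corner_raise ends up below the 1.1 a purely additive rule would give.
```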

Long-term Projects and Objectives

Cognitive/Behavioural Modelling
• Behavioural modelling for social and conversational agents in virtual worlds and web-based communities
• Intelligent agents and learning from experience for social agents
• Affective communication remapping, i.e. translating affective content to/from facial presentations (see the sketch after the project diagram below). For example, MusicFace is an application that extracts affective information from music and creates a corresponding facial representation. This involves not only the facial expression of emotions in the form of static facial states, but also facial behaviours such as the frequency of blinking and rhythmic head movements. The project requires renewed funding and active researchers to restart.
• Music-driven virtual performances (based on ideas from the MusicFace and SPICE projects)

Visualization
• Adding support for non-photo-realistic animation (moving paintings) to the iFACE and SPICE systems

Application Technology
• iFACE for game consoles and mobile devices
• Distributed multimedia services (e.g. XML-based web services) for the face multimedia object
• Collaborative online environments, especially for e-Learning and virtual communities
• Streaming for facial animation

The following diagram shows the evolution of the projects in the proposed research.

[Project timeline diagram spanning four years: iFACE (facial animation), FaceSpace (face types and geometry), pFACE (personality modelling), xFACE (facial expressions), SPICE (virtual dance styles), iFACE for game consoles and mobile devices, non-photo-realistic rendering (NPR), MusicFace (music-driven animation), behaviour and intelligence modelling for virtual characters, animation streaming, web services, virtual communities, and music-driven virtual dance.]

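As an illustration of the affective remapping idea behind MusicFace, the following minimal sketch assumes the affective content of a music segment has already been reduced to arousal and valence values in [-1, 1]; the feature extraction itself is not shown, and the behaviour parameters and mapping constants are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class FacialBehaviour:
    """Hypothetical behaviour parameters for an animated face."""
    smile_intensity: float    # 0..1
    brow_lower: float         # 0..1
    blink_rate_hz: float      # blinks per second
    head_nod_period_s: float  # seconds per rhythmic nod

def remap_affect(arousal: float, valence: float) -> FacialBehaviour:
    """Map an (arousal, valence) reading from music analysis to facial behaviour.

    Positive valence drives smiling, negative valence drives brow lowering,
    and arousal speeds up blinking and rhythmic head movement.
    """
    return FacialBehaviour(
        smile_intensity=max(0.0, valence),
        brow_lower=max(0.0, -valence),
        blink_rate_hz=0.3 + 0.5 * (arousal + 1.0) / 2.0,      # ~0.3-0.8 Hz
        head_nod_period_s=2.0 - 1.2 * (arousal + 1.0) / 2.0,  # ~2.0-0.8 s
    )

# Example: an energetic, happy passage.
print(remap_affect(arousal=0.7, valence=0.6))
```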
Literature Pertinent to the Proposal

The Big-5 [21] and the two-dimensional circumplex [22] are among the most popular personality models in behavioural psychology. The Big-5 model considers five major personality dimensions: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism (OCEAN). Although successful in many respects, the five dimensions of the Big-5 model are (1) not sufficiently independent and (2) hard to visualize. Wiggins et al. [22] have proposed another personality model based on two dimensions: Affiliation and Dominance. They show that different personality types can be considered points around a circular structure formed in this two-dimensional space. The smaller number of dimensions allows them to be controlled more effectively and independently, and two parameters are also easier to visualize, perceive, and understand. The perception of personality types and traits based on observation has also been a subject of research in behavioural psychology [3,10]. Unfortunately, this research has not focused on facial actions, and has primarily considered the observation of full-body behaviours. Also, mainly due to production difficulties, the observations have been mostly limited to photographs or very few dynamic actions.

Russell [19] has mapped emotional states onto a 2D space controlled by Arousal and Valence. The detailed study of the facial actions involved in the expression of the six universal emotions [7] has helped the computer graphics community develop realistic facial animations. Yet the rules by which these facial expressions are combined to convey more subtle information remain less well understood by behavioural psychologists and animators [13,16]. The Facial Action Coding System (FACS) [7] was the earliest approach to systematically describe facial activity in terms of small Action Units such as left-eyelid-close and jaw-open. The MPEG-4 standard [2] extended this idea and introduced Face Definition Parameters and Face Animation Parameters. Virtual Human Markup Language (VHML) [12] and Multimodal Presentation Markup Language (MPML) [17] are examples of approaches to a more formal and high-level representation of facial actions. The advantages and shortcomings of these languages are discussed by Arya et al. [A-2].

A variety of methods have been proposed for modelling the geometry [14] and behaviour [1,5,6,8,9,11,15,18,20] of virtual characters. A lack of grounding in findings from behavioural psychology, not addressing facial actions specifically, being limited to special cases (e.g. controlling facial actions only through speech content), and using parameters that are not independent and easy to control are among the issues in these approaches that still need improvement. Detailed discussion of these issues and of ways to address them can be found in publications by the author and his colleagues [A-1,A-4]. The proposed research aims at addressing these issues.

References

[A-1] Ali Arya, Lisa Jefferies, James T. Enns, and Steve DiPaola, "Facial Actions as Visual Cues for Personality," Journal of Computer Animation and Virtual Worlds, 17:371-382, John Wiley & Sons, Ltd., 2006.
[A-2] Ali Arya and Steve DiPaola, "Face Modelling and Animation Language for MPEG-4 XMT Framework," under revision for IEEE Transactions on Multimedia, 2006.
[A-3] Steve DiPaola and Ali Arya, "Emotional Remapping of Music to Facial Animation," ACM SIGGRAPH Video Game Symposium, Boston, MA, USA, July 29-30, 2006.
[A-4] Ali Arya, Steve DiPaola, Lisa Jefferies, and James T. Enns, "Socially Communicative Characters for Interactive Applications," 14th International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision (WSCG 2006), University of West Bohemia, Plzen, Czech Republic, January 30 - February 3, 2006.

[1] Badler, N., et al., "Towards Personalities for Animated Agents with Reactive and Planning Behaviors," in Creating Personalities for Synthetic Actors: Towards Autonomous Personality Agents, R. Trappl and P. Petta (Eds.), Springer-Verlag, 1997.
[2] Battista, S., et al., "MPEG-4: A Multimedia Standard for the Third Millennium," IEEE Multimedia, vol. 6, no. 4, IEEE Press, 1999.
[3] Borkenau, P., et al., "Thin Slices of Behaviour as Cues of Personality and Intelligence," Journal of Personality and Social Psychology, vol. 86, no. 4, 599-614, 2004.
[4] Brand, M., and A. Hertzmann, "Style Machines," ACM SIGGRAPH, 2000.
[5] Cassell, J., et al., "BEAT: the Behaviour Expression Animation Toolkit," ACM SIGGRAPH, 2001.
[6] Egges, A., et al., "A Model for Personality and Emotion Simulation," Knowledge-Based Intelligent Information & Engineering Systems, 2003.
[7] Ekman, P., and W.V. Friesen, Facial Action Coding System, Consulting Psychologists Press Inc., 1978.
[8] Funge, J., et al., "Cognitive Modelling: Knowledge, Reasoning and Planning for Intelligent Characters," ACM SIGGRAPH, 1999.
[9] King, S.A., et al., "Language-driven Nonverbal Communication in a Bilingual Conversational Agent," Computer Animation and Social Agents, Geneva, 2003.
[10] Knutson, B., "Facial Expressions of Emotion Influence Interpersonal Trait Inferences," Journal of Nonverbal Behaviour, vol. 20, no. 3, 165-181, 1996.
[11] Kshirsagar, S., and N. Magnenat-Thalmann, "A Multilayer Personality Model," 2nd International Symposium on Smart Graphics, June 2002.
[12] Marriott, A., and J. Stallo, "VHML: Uncertainties and Problems. A Discussion," First International Conference on Autonomous Agents & Multi-Agent Systems, Workshop on Embodied Conversational Agents, July 2002.
[13] Paradiso, A., "An Algebra of Facial Expressions," ACM SIGGRAPH, 2000.
[14] Parke, F.I., and K. Waters, Computer Facial Animation, A. K. Peters, 2000.
[15] Pelachaud, C., and M. Bilvi, "Computational Model of Believable Conversational Agents," in Communication in MAS: Background, Current Trends and Future, Marc-Philippe Huget (Ed.), Springer-Verlag, 2003.
[16] Perlin, K., "Layered Compositing of Facial Expression," Sketches of ACM SIGGRAPH, 1997.
[17] Prendinger, H., et al., "Scripting Affective Communication with Life-like Characters in Web-based Interaction Systems," Applied Artificial Intelligence, vol. 16, no. 7-8, 2002.
[18] Rousseau, D., and B. Hayes-Roth, "Interacting with Personality-Rich Characters," Report No. KSL 97-06, Knowledge Systems Laboratory, Stanford University, 1997.
[19] Russell, J.A., "A Circumplex Model of Affect," Journal of Personality and Social Psychology, 39, 1161-1178, 1980.
[20] Smid, K., et al., "Autonomous Speaker Agent," Computer Animation and Social Agents, Geneva, 2004.
[21] Watson, D., "Strangers' Ratings of the Five Robust Personality Factors: Evidence of a Surprising Convergence With Self-Report," Journal of Personality and Social Psychology, vol. 57, no. 1, 120-128, 1989.
[22] Wiggins, J.S., et al., "Psychometric and Geometric Characteristics of the Revised Interpersonal Adjective Scales (IAS-R)," Multivariate Behavioural Research, vol. 23, 517-530, 1988.