Semantic Virtual Environments

Thesis presented to the Faculté Informatique et Communications, École Polytechnique Fédérale de Lausanne, for the degree of Docteur ès Sciences

by
Mario A. Gutiérrez A.
Master of Science in Computer Science, ITESM Campus Toluca
A native of Mexico

Jury:

Prof. Dr. Daniel Thalmann, thesis co-director
Dr. Frédéric Vexo, thesis co-director
Prof. Dr. Pascal Fua, jury president
Prof. Dr. Bianca Falcidieno, examiner
Prof. Dr. Françoise Preteux, examiner
Prof. Dr. Alain Wegmann, examiner

Lausanne, EPFL 2005


Abstract

The main problem targeted by this research is the lack of flexibility of virtual objects within a Virtual Environment. Virtual objects are difficult to reuse in applications or contexts different from the ones they were designed for. This refers mainly to the need for adaptive entities, from the points of view of both the geometric representation and the user interface. The solution we propose is based on a semantic model aimed at representing the meaning and functionality of objects in a virtual scene, including their geometric representation and user interface. This thesis is divided into four main parts. First we define a semantic model for Virtual Environments populated by digital items (virtual entities). Then we focus on three main aspects of the development and use of Virtual Environments: interaction, visualization and animation. We describe the needs and benefits of using a semantics-based representation for animating virtual characters, interacting within Virtual Environments and visualizing them in different contexts/devices (handhelds, high-resolution displays, text-based devices, etc.). The thesis concludes by exploring a novel trend in research that builds upon knowledge-base technologies and the emerging Semantic Web initiatives: the use of ontologies for Virtual Environments. A preliminary version of an ontology for Virtual Humans is presented. When using ontologies, digital entities can be understood by both humans and machines and be reused in a wide range of contexts and applications. Semantic Virtual Environments are composed of such reusable and adaptive digital items.


Version Abrégée

Unlike video, interactive virtual environments (virtual reality, video games) suffer from an important handicap: the 3D content they rely on is difficult to reuse and to adapt. Most of the time, the entire production work has to be redone when the graphics terminal or the interaction devices change. In this thesis we propose to use a semantic model to describe the objects of a virtual environment; this model goes beyond a purely geometric 3D representation. The semantic model describes an object not only through its geometry, but also through its intrinsic functionalities and the different interactions it offers. This allows great flexibility in reusing the object and adapting it to the different interaction interfaces available to the user. This thesis is divided into four main chapters. The first presents, in general terms, a semantic model for describing Virtual Environments composed of numerous virtual entities such as humans, objects that can be interacted with, etc. In the following three chapters we present the developments we have carried out using this semantic model for the classical tasks surrounding Interactive Virtual Environments: interaction with the characters and objects that populate virtual worlds, animation of virtual characters, and visualization on different display devices (handheld computers, high-resolution screens, text-based devices, etc.). The thesis concludes by presenting a new approach that proposes to use the notion of Ontology, which comes from knowledge-based systems technologies and the Semantic Web, for Virtual Environments.


Contents

Abstract
Version Abrégée
Table of Contents

1 Introduction and summary
  1.1 Motivation
  1.2 Main research topics of this work

2 The Need for Semantic Virtual Environments
  2.1 Creating Virtual Environments
  2.2 Scene Graph-based systems
  2.3 A novel representation for Virtual Environments

3 Semantic Virtual Environments
  3.1 Introduction
  3.2 Semantic Modeling
  3.3 Semantic model of a Virtual Environment
  3.4 Conclusions

4 Semantics for Interaction
  4.1 Introduction
  4.2 Handheld interfaces for semantic Virtual Environments
    4.2.1 The Mobile Animator: interactive control of virtual entities
    4.2.2 Conducting a Virtual Orchestra
    4.2.3 Controlling Haptic Virtual Environments through Handheld Interfaces
  4.3 Semantics for adaptive multimodal interfaces
  4.4 Conclusions

5 Semantics for visualization
  5.1 Introduction
  5.2 Semantic Rendering
  5.3 Augmented CGI Films: a general-purpose representation for high-quality animated content
    5.3.1 Coding of 3D animation
    5.3.2 A coding scheme for augmented CGI films
    5.3.3 User interface for augmented CGI films
    5.3.4 Conclusions

6 Semantics for Animation
  6.1 Introduction
  6.2 Reflex Movements: Representation and Algorithms for Reactive Virtual Entities
    6.2.1 The biology-based approach
    6.2.2 The computer graphics approach
    6.2.3 The virtual human neuromotor system
    6.2.4 Test application: reaction to thermic stimuli
  6.3 Semantics for autonomous virtual characters

7 Ontology-based Virtual Environments
  7.1 Semantic Virtual Environments
  7.2 Ontologies in the context of Virtual Environments
  7.3 Ontology for interactive Virtual Environments
  7.4 Ontology for Virtual Humans

8 Conclusions
  8.1 Summary of contributions
  8.2 Limitations and future work

Bibliography

A Descriptors for Augmented CGI Films

Chapter 1
Introduction and summary

1.1 Motivation

Computer-generated 3D graphics have evolved very fast in the last few years. Nowadays we are surrounded by impressive synthetic imagery that creates a rich variety of Virtual Environments: video games, Computer Generated Imagery (CGI) movies, scientific and artistic simulations, etc. However, despite such achievements, the full potential of Virtual Environments (VEs) is far from being attained. Producing, visualizing and interacting within VEs is still a complex task. Most of the time our experience with such environments is limited and oversimplified in one or more of the following aspects: interactivity, quality of graphics, animation, etc. The research we present in this thesis shows how we can create more flexible and adaptive virtual environments by using a richer representation of their building components. The building components of a VE include the 3D models used to visualize the entities making up a virtual world, as well as the interaction devices and algorithms used to animate and interact with such entities.

The central idea of this research is to define a novel representation of Virtual Environments based on the semantics of their components. VEs are no longer considered as mere 3D geometry but as a set of entities with functionalities and properties. The way these entities are rendered (visualized) depends on two factors: the knowledge associated with them and the context of interaction. Entities represented in this way are turned into digital items with increased autonomy, flexibility and adaptability. We have defined a general model to represent digital items used to populate and control interactive Virtual Environments. Our model is based on the semantics of the virtual entities and not on their geometry, as is the case in most VE applications. The semantic model we have proposed provides a way to specify alternative geometric representations for each entity and a range of functionalities. The entities are called digital items and can be not only virtual objects to be rendered as part of the scene, but also independent user interfaces and controls for animation or interaction. This semantic model allows for implementing a variety of interaction channels (multimodal interfaces), and provides a way to implement animation/interaction controls either as components of a digital item or as independent entities. Once the information has been organized and unified, the choice of interfaces and visualization techniques is more a development decision than a design problem. The next chapters describe how we applied this modeling approach in the areas of interaction, visualization and animation of entities participating within Virtual Environments.

This thesis is structured as follows. Chapter 2 identifies the need for a novel representation for Virtual Environments, based on high-level semantics and knowledge associated with the entities and objects participating in VEs. Chapter 3 presents a semantics-based representation and modeling approach for Virtual Environments. Each of the next three chapters focuses on a particular aspect of the development and use of VE applications; we describe the needs and benefits of using a semantics-based representation for interaction, visualization and animation in Virtual Environments. In Chapter 7 the thesis concludes with the description of a formal unified representation that uses an ontology-based approach.

1.2 Main research topics of this work

The following is a brief overview of the problems we intend to overcome as well as our contributions and limitations.

Semantics for interaction

Interaction mechanisms for VEs remain rather basic and limiting. Users are only able to take simple decisions: start or stop the simulation, or select actions from a pre-configured repertory, e.g. controlling a virtual character by choosing from a few predefined actions such as walk, fire, jump, and so on. This is due to several reasons. On the one hand, some advanced interaction technologies are not mature enough to be taken out of the controlled environments that are research laboratories. On the other hand, currently accepted interaction paradigms still fail to immerse the user in the simulation. Interaction within VEs is often very complex due to the inherent difficulties of using devices such as joysticks, keyboards, mice, data gloves, etc. In many cases the available options are too limited and users end up as simple spectators; this is the case of CGI films. In this thesis we present the principles we have applied to the implementation and customization of user interfaces. Chapter 4 describes the semantics-based approach we defined. Our main contribution is to provide a semantics-based framework for building human-machine interfaces with better adaptation capabilities and easy configuration. Our system is based on an ontology for interaction which models the way building components communicate with each other to create reconfigurable multimodal interfaces.

Semantics for content adaptation

Virtual Environments with high visual quality -complex rendering and special effects- are one of the main research objectives aimed at improving the believability of applications. The highest-quality computer imagery we can produce is generated through non-real-time algorithms such as ray tracing. Complex visual effects, such as those resulting from realistic illumination models, are only accessible as non-interactive sequences. State-of-the-art video games take the most out of modern graphics boards and achieve impressive results, getting closer and closer to non-real-time rendering techniques. However, producing and adapting such content to multiple platforms is very expensive, if not impossible, in terms of both human and computer resources. As technology evolves, the need for adapting high-quality content to multiple platforms grows every day. For instance, porting video games and other desktop-based applications to mobile devices requires, most of the time, re-developing them completely. This directly affects the quality and variety of content available for entertainment, business and scientific applications. There is a clear need for reusable high-quality content and applications. Chapter 5 presents a set of semantics-based technologies and methods applied to the adaptation of animated content to multiple platforms. First we present a coding scheme that allows for lossless representation of complex animation and rendering effects. The second part of the chapter presents the concept of semantic rendering: using semantic annotations for automatic and semi-automatic adaptation of virtual entities (2D/3D geometry) so they can be used on multiple platforms (PCs, mobile devices, etc.). Semantic rendering increases the reusability of digital content and provides new possibilities for user applications: transforming 3D models into 2D graphics or even text-based content.

Semantics for animation

Virtual Environments require better user interfaces and reusable/adaptable components. Nevertheless, VE components themselves also need more autonomy and intelligence. Better interaction paradigms and more portability can be achieved by incorporating an intelligent layer for controlling the behavior of virtual entities. A few years ago, a widely accepted theory established that the ultimate user interface would be an embodied agent which would communicate with the user through natural language and realistic human-like behavior. Interface research then saw the boom of talking heads. Even if impressive progress has been achieved both in terms of AI and computer graphics, the "ultimate" user interface has not been released yet. Embodied conversational agents are a promising alternative towards better human-machine interfaces. However, believable characters are difficult to develop. The difficulty lies mainly in our limited understanding of the factors that create the illusion of life and communication. In fact, talking heads often fail as communication channels between the machine and the user due to their lack of naturalness and spontaneity. The road to better human-machine paradigms and virtual environments passes through the creation of autonomous components that display natural behavior in reaction to the user's input. There is a need for reactive VE components that can act as autonomous or semi-autonomous interface elements, equipped with an intelligent and reactive layer that renders them more understandable and believable. In chapter 6 we show how a semantics-based representation can be used as the foundation for reactive virtual characters that display spontaneous behavior in the form of reflex movements and other autonomous reactions to the environment. An essential aspect of the semantics-based representation is that we consider a virtual entity (e.g. a virtual character) as a component characterized by a set of functionalities and properties, where the geometry is just one of them. This apparently subtle and even philosophical twist in the way we represent and handle the building components of a VE enables interesting possibilities for content adaptation, interaction and animation. Chapter 7 presents the unification of the concepts and contributions described in previous chapters. A unified ontology-based representation for Virtual Environments is proposed as the final contribution of this research.

Chapter 2
The Need for Semantic Virtual Environments

2.1 Creating Virtual Environments

The central area of interest of this research is Virtual Environments. There are many different definitions of this term, and they usually depend on the context of application. Virtual Environments (VEs) commonly involve the use of 3D graphics, 3D sound and real-time interaction for creating a simulation. A Virtual Environment can be defined as: an environment which is partially or totally based on computer-generated sensory input. Sensory information used to create a VE addresses three main types of senses: vision, audition and touch (haptics). The creation of VEs is a challenging problem requiring diverse areas of expertise, which may range from networking to psychology. Developing VE systems is a very expensive task in terms of time, financial and human resources. VEs can be applied in a broad range of areas, such as scientific visualization, socializing, training, psychological therapy and gaming. Such diversity of applications produces a set of requirements that makes it very difficult, if not impossible, to build a single system to fit all needs. Traditionally, the result has been the creation of monolithic systems that are highly optimized for a particular application, without any possibility of reuse for a different purpose. According to Oliveira et al. [110], the lack of reusability is due to the current trend in the VE community: developing a new VE system for each different application. The "reinventing the wheel" and "not invented here" syndromes limit innovation and delay the use of VEs in wider areas for the general public. The main problems we intend to tackle with this research are the reusability and adaptability of Virtual Environments. In addition, we will show that the approach we have followed can also be applied to enhance the believability and responsiveness of the VE. Monolithic systems such as DIVE [70], MASSIVE [55], NPSNET [97], SPLINE [5] and dVS/dVISE [56], amongst others, have proliferated in the last decade due to the lack of system flexibility for particular applications [110]. The introduction of more modular architectures led to the emergence of toolkits such as WorldToolkit [133], Avocado [143], VR Juggler [16], VHD++ [120], Virtools [149], etc. These software suites have different degrees of flexibility. Frameworks like VHD++ differ from the others due to their specialization in a particular domain, e.g. Virtual Human simulation technologies. All of them are based on a hierarchical representation of the virtual environment: a scene graph.

2.2 Scene Graph-based systems

A scene graph is an abstract logical access structure used to represent the objects composing the environment (scene data) and the relationships between them (often hierarchical attachments, such as in character animation). Scene graphs are often confused with data structures used for visibility culling or collision queries, such as octrees, BSP trees, ABTs, Kd-trees, etc. Scene graphs are used to connect game rules, physics, animation and AI systems to the graphics engine. Popular implementations of scene graph programming interfaces (APIs) include Cosmo3D (SGI), Vega Scene Graph [105], Java 3D [140], OpenSceneGraph [112] and OpenGL Performer [136]. All of them were designed for creating real-time visual simulations and other performance-oriented 3D graphics applications. OpenGL Performer is a commercial toolkit which evolved from Open Inventor [135]. Open Inventor is considered the archetypal example of a scene graph library. It presents an object-oriented programming model based on a 3D scene database (scene graph) that provides a higher layer of programming for OpenGL. Java 3D gained popularity as the main scene graph-based API for developing 3D applications with Java. It is frequently used for developing web-based applications enhanced with real-time 3D graphics [74], [44], [155]. It is also a very representative example of a scene graph-based 3D toolkit. A more precise definition of a scene graph can be borrowed from the Java 3D documentation [140]:

A scene graph is a "tree" structure that contains data arranged in a hierarchical manner. The scene graph consists of parent nodes, child nodes, and data objects. The parent nodes, called Group nodes, organize and, in some cases, control how Java 3D interprets their descendants. Group nodes serve as the glue that holds a scene graph together. Child nodes can be either Group nodes or Leaf nodes. Leaf nodes have no children. They encode the core semantic elements of a scene graph, for example, what to draw (geometry), what to play (audio), how to illuminate objects (lights), or what code to execute (behaviors). Leaf nodes refer to data objects, called NodeComponent objects. NodeComponent objects are not scene graph nodes, but they contain the data that Leaf nodes require, such as the geometry to draw or the sound sample to play.

As far as we know, all of the available scene graph implementations are based on a hierarchical spatial representation of the objects in the scene, e.g. a terrain contains a house, inside the house there is a person who is holding a hammer in her right hand. Usually, the semantic information encoded in the scene graph corresponds mainly to visualization aspects: the geometry to draw and associated effects such as a sound sample. Only elemental relationships between objects can be specified, e.g. smart objects [85] (simple tools and objects such as a hammer or a drawer), which contain information describing how they can be grasped by a virtual human and a pre-determined behavior to perform. Efforts aimed at enhancing the adaptability and reusability of VE applications, and of the entities within them, have focused on designing software component frameworks for managing the resources and building blocks of a VE system. Such is the case of the Java Adaptive Dynamic Environment (JADE) [110], which permits dynamic run-time management of all components and resources of a VE system. While this kind of initiative is successful in terms of providing reusability and interoperability at the level of source code, it does not address the fundamental problem of reusing the virtual entities that participate in the VE application. The use of scene graphs as hierarchical spatial representations is not questioned. As a result, source code implementing animation/visualization/interaction algorithms can be reused to some extent, but the knowledge associated with a virtual entity remains difficult to reuse.
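To make this limitation concrete, the sketch below (our own Python illustration, not code from the thesis or from any of the toolkits cited above; class, attribute and file names are purely illustrative) shows how a conventional scene graph encodes only spatial structure and rendering data:

    # Minimal sketch of a conventional, purely spatial scene graph: group nodes
    # define the hierarchy, leaf nodes carry only rendering data.
    # Illustrative names; this is not an actual scene graph API.

    class GroupNode:
        """Organizes children; encodes only spatial/hierarchical structure."""
        def __init__(self, name, transform=None):
            self.name = name
            self.transform = transform or (0.0, 0.0, 0.0)  # e.g. a translation
            self.children = []

        def add(self, node):
            self.children.append(node)
            return node

    class LeafNode:
        """Carries data to render or play; no functional or semantic knowledge."""
        def __init__(self, name, geometry=None, sound=None):
            self.name = name
            self.geometry = geometry  # e.g. a mesh file
            self.sound = sound        # e.g. a sample to play

    # "A terrain contains a house, inside the house a person holds a hammer."
    terrain = GroupNode("terrain")
    house = terrain.add(GroupNode("house", transform=(10.0, 0.0, 5.0)))
    person = house.add(GroupNode("person", transform=(1.0, 0.0, 1.0)))
    person.add(LeafNode("body", geometry="person.mesh"))
    right_hand = person.add(GroupNode("right_hand"))
    right_hand.add(LeafNode("hammer", geometry="hammer.mesh", sound="hit.wav"))

    # The graph answers "what to draw, and where", but nothing here says what
    # the hammer is for, how it can be grasped, or which interfaces can control
    # it; that knowledge is what semantic descriptors are meant to attach.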

2.3 A novel representation for Virtual Environments

Virtual entities such as virtual humans, or any object "living" in a VE, have various kinds of associated knowledge. For instance, when we consider a virtual human at a higher level, we don't see it just as a deformable 3D mesh whose animation is controlled by an internal hierarchy of joints and segments (an animation skeleton). Instead, we consider it as a "living" creature with some degree of autonomy and a particular role to play in the environment. Virtual characters can modify their behavior depending on the experience acquired through contact with human users and other virtual entities. Other types of knowledge associated with an entity are the different possibilities for interacting with it: the available input/output channels and the interaction devices that can be "plugged in". Virtual entities considered at such a higher semantic level are, with current approaches, practically impossible to reuse or adapt to different contexts (e.g. migrating from a graphics workstation to a handheld device or a text-based interface). Doing so still requires a huge amount of time and effort, because each software component (rendering engine, animation algorithms, interaction controllers) must be handled separately and sometimes modified in order to adapt it to the new context. The geometry itself must be modified in order to visualize the entity on a system with different computing power and display capabilities. Boier-Martin [19] proposed the idea of adaptive graphics: including methods such as data streaming, model simplification and level-of-detail management in a single system that allows optimal combinations to be selected and applied, depending on the specifics of each application. In multimedia jargon, the process of converting between different representations to adapt to different client capabilities is known as transcoding. Transcoding only considers the adaptation and reuse of the graphical side of a virtual entity. The idea of transcoding other facets of a virtual object, such as its behavior or interaction capabilities, is rather new and constitutes the central idea of this research.

Our main hypothesis is that a higher-level semantic representation of Virtual Environments can enhance the reusability and adaptation capabilities of the virtual entities participating in a VE. The semantics-based representation we propose builds upon the concept of the scene graph, with the difference that it does not focus on visual or spatial relationships between entities, but on higher-level semantics. Our approach is based on the semantics of the virtual entities and not on their geometry, as is the case in most VE applications. The semantic model we propose provides a way to specify alternative geometric representations for each entity and a range of functionalities. The entities are called digital items and can be not only virtual objects to be rendered as part of the scene, but also independent user interfaces and controls for animation or interaction. In this thesis we study three main aspects that constitute a virtual object as an independent entity: interaction capabilities, animation possibilities and visualization alternatives. The end goal we pursued was to unify these three types of high-level semantics into a general knowledge-based formal representation: an ontology for Virtual Environments.
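As a rough illustration of what transcoding beyond geometry could mean, the following sketch (our own hypothetical example, not an implementation from the thesis; entity fields, device names and file names are invented) keeps a single entity definition and selects both a geometric representation and an interface per target device, while the entity's functions stay the same:

    # Illustrative sketch: "transcoding" a virtual entity beyond its geometry.
    # The entity keeps one identity and one set of functions; only the chosen
    # representation and interface depend on the target context.

    ENTITY = {
        "name": "virtual_guide",
        "functions": ["walk", "greet", "point_at"],      # behavior kept everywhere
        "geometry": {
            "hi_res_mesh": "guide_50k_tris.mesh",
            "low_poly": "guide_2k_tris.mesh",
            "text_label": "A guide character standing at the entrance.",
        },
        "interfaces": {
            "3d_manipulator": "full joint-level control",
            "2d_gui": "sliders and buttons",
            "text_menu": "numbered command list",
        },
    }

    def transcode(entity, device):
        """Pick a geometric representation and an interface for a target device."""
        if device == "workstation":
            return entity["geometry"]["hi_res_mesh"], entity["interfaces"]["3d_manipulator"]
        if device == "handheld":
            return entity["geometry"]["low_poly"], entity["interfaces"]["2d_gui"]
        # fall back to a text-only rendering for devices without 3D support
        return entity["geometry"]["text_label"], entity["interfaces"]["text_menu"]

    for device in ("workstation", "handheld", "text_terminal"):
        shape, ui = transcode(ENTITY, device)
        print(f"{device}: render '{shape}' and control it through '{ui}'")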


Chapter 3
Semantic Virtual Environments

3.1 Introduction

Despite the multiple designs and implementations of VE frameworks and systems, the creation of reusable, scalable and adaptive content and interfaces is still an open issue. Research focuses on design patterns for reusable software components, but important questions such as the adaptability and scalability of the content itself, and of the way to interact with it, have received less attention. The contents of an interactive VE should be dynamic, but in reality the possibilities are limited due to the difficulty of updating the digital assets -usually 3D models- at a reasonable rate. For instance, adding new characters/objects or reusing them in different contexts, sometimes even on different platforms, is very expensive and sometimes unfeasible; e.g. taking a 3D character appearing in a film and using it in a video game running on a mobile phone requires re-designing everything, from the 3D model to the application. When it comes to interaction, the possibilities are reduced to the mechanisms and combinations foreseen by the designers and developers. For example, using a data glove instead of a keyboard, or controlling objects through a 3D virtual manipulator instead of a 2D GUI, usually requires rewriting components of the VR system. Our research aims at defining a semantic model of virtual environments that supports the adaptation and re-purposing of both 3D models and their interaction mechanisms. We describe a semantic representation that captures the functions, characteristics and relationships of virtual objects. The model we propose is designed to turn the objects in a virtual environment into autonomous and reusable entities that we call digital items. The next section presents an overview of the semantic modeling approach commonly used in information systems, and explains how we apply it to the area of virtual environments. The main part of this chapter details the semantics-based model we have defined to represent the virtual entities populating a virtual environment and the tools used to display and communicate -interface- with them. The semantic model presented in this chapter has been described in further detail in [68].

3.2 Semantic Modeling

The problem we are targeting is the lack of flexibility of virtual objects when it comes to reusing them in applications or contexts different from the ones they were designed for. This refers mainly to the need for adaptive entities, from both the geometric and the interface point of view. Implementing virtual objects that can modify their shape -mainly the level of detail of the geometry- while keeping their functionalities requires knowledge about how the virtual objects in the scene are related to each other and what their functions and roles are. Virtual objects can be represented not only as animatable 3D shapes but also as entities with pre-defined functions that interact with each other in different ways. Semantic models allow for representing such information. According to Rishe [125], a semantic representation offers a simple, natural, implementation-independent, flexible, and non-redundant specification of information. The word semantic means that this convention closely captures the meaning of the user's information and provides a concise, high-level description of that information. A semantic model should represent, in a human- and machine-readable format, the relations between the objects in the virtual environment and the user.

The concept of semantic modeling originated in the field of databases and information systems in general. The large amount of data contained in multimedia information systems requires semantic support for efficient navigation and retrieval. Concepts like "semantic nets" [36] have been proposed to represent the knowledge contained in a collection of heterogeneous data. Since those early works, semantic descriptors have been used as the building blocks of hypermedia knowledge bases. A more recent example of using semantics to create an efficient knowledge database can be found in [33]. The authors propose an integrated approach to the development of spatial hypertext. They use several theories and techniques concerning semantic structures, and transform them into a semantic space rendered in virtual reality. Browsing and querying become natural, inherent, and compatible activities within the same semantic space. The design principle is based on the theory of cognitive maps. Techniques such as latent semantic indexing, Pathfinder network scaling, and virtual reality modeling are used as well.

The concept of the Semantic Web, proposed by Tim Berners-Lee et al. [15], defined an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. Since then, new research has targeted the definition of better models to represent information and concepts: knowledge. For instance, OntoShare [41] is an ontology-based system for sharing information among users in a virtual community of practice. The authors use Semantic Web technology and apply it to knowledge management tools. Communities are able to automatically share information. The ontologies can change over time based on the concepts represented and the information that users choose to associate with particular concepts.

The link from knowledge-based systems to Virtual Environment applications has been gradually established through the use of virtual reality to visualize the relations between concepts and other pieces of information. This can be exemplified by the work of Chen et al. [33], where the authors represent semantic information in a VR environment. At the same time, "pure" VE modeling has started to incorporate semantics as a means to improve user interaction and the representation of virtual worlds. In a later publication, Chen et al. [34] present a knowledge-based media space modeling approach to build customizable shared virtual environments. They focus on how a semantically enriched spatial model mediates the interactive behavior of users in a domain-specific context. Enriched spatial models contribute to making the environment more attractive to visit and interact with. This work shows that a semantics-based representation can be useful to help the user understand and interact with a virtual world.

VEs are custom applications that cannot easily be adapted to different contexts. Information describing the environments is not readable by different clients -viewers and interfaces. Sharing the content or using it in a different scenario is impossible in most cases. A Semantic Virtual Environment shall provide a unified representation of the information describing a VE and allow for uniform and scalable access to heterogeneous environments [113].

Recently, the semantics of 3D objects has been explored in the fields of architecture and cultural heritage. The main idea is to emphasize the importance of the object's functions and its relations with other objects. The geometric representation is a result of the functions and role of the object in the VE. Kang and Kwon [86] have used semantic information concerning the relations between the objects composing a scene -ancient buildings in this case- to provide additional geometric representations of a given object. Their "semantic level rendering" provides the user with an alternative image of the virtual object, conveying meaningful information that would be impossible to represent otherwise. Grussenmeyer et al. [59] consider the semantic structure combined with the geometry as the root of the 3D model. The space concerned by the modeling is decomposed into different "semantic concepts" that include a semantic structure. The semantic structure is the foundation of knowledge and composition rules which can be used to assist in the semi-automatic geometric reconstruction of different objects -historic buildings in this case.

We have presented a brief state of the art on semantic modeling and its applications to knowledge-based systems and virtual environments. We conclude that the potential of using semantic information has not been fully exploited. The need for incorporating additional information into the objects of a virtual world has been clearly identified. Semantics can be used at any step of a 3D model's life cycle, starting with the modeling/design phase and going up to the exploitation of the content -navigation and retrieval. Nevertheless, in order to truly unify and extend the use of semantics for virtual environments, a general model to represent any virtual scene is required. In the next section we describe the general model we propose to represent interactive virtual environments. It is important to keep in mind that our main objectives are the reuse of content in a variety of contexts and interaction flexibility -being able to use different interfaces without re-implementing the application.

3.3 Semantic model of a Virtual Environment

For defining our general semantic model of a virtual environment we have taken into account the works on shape semantics cited in the previous section as well as basic notions coming from the domain of CAD systems. The distinction between the geometry and its functionality is a concept common in the context of feature-based CAD and also in the area of character animation. For instance, when a digital artist creates an animation sequence, he/she often uses a simplified version of the character being animated -usually a skeleton or a generic articulated shape- and afterwards the functionality (animation) is applied to one or more shapes (characters) with the same semantics (a similar hierarchical structure or skeleton). The object's shape becomes one of the various properties describing a virtual entity and depends on the available interfaces -both for visualization and communication- and on the user preferences. For example, a virtual character can be automatically visualized as a realistic high-resolution 3D shape or as a simplified low-polygon skeleton, according to the computing power of the viewer (e.g. a graphics workstation or a PDA). In any case, the character's functionality (animation) is the same. This is possible only if the character is defined neither as a deformable 3D mesh nor as a simplified skeleton, but as a virtual entity with particular semantics (a hierarchical structure describing the character's pose and animation). One of the main assumptions of our model is that geometry is not the central attribute of a virtual object, but a function of its role in the environment and the context where it is being used.

Our model draws some inspiration from the ISO standards that aim at describing digital items and rendering them adaptive: MPEG-7 and MPEG-21. MPEG-7, formally named "Multimedia Content Description Interface", is a standard for describing multimedia content data that supports some degree of interpretation of the information's meaning, which can be passed onto, or accessed by, a device or a computer code [99]. In general, what this standard tries to standardize is the way to annotate -describe- the digital content. A similar kind of descriptor will be used by our model to specify the general information of each object in the VE. But describing the content is only the first step towards implementing a truly semantic model; the notion of content adaptation must be introduced as a consequence of the knowledge -semantics- contained in the content's description. MPEG-21's approach is to define a framework to support transactions that are interoperable and highly automated, specifically taking into account digital rights management (DRM) requirements and targeting multimedia access and delivery using heterogeneous networks and terminals [21]. We believe one of the keys to heterogeneous delivery is content adaptability, which can only be achieved by considering the way a multimedia item is built -knowing what its components, functions and capabilities are.

Our objective was to define a complete representation based on the semantics and functionality of the digital items -objects- contained within a Virtual Environment. One of the innovative aspects of this representation is the "digital item approach". We consider each and every object participating in a VE application not only as a 3D geometry, but as a set of properties where the geometry is just one of them. A digital item is a logical entity which can be used in different contexts: it is able to describe itself and declare its functionalities, it can evolve over time, and its author(s) can incorporate or eliminate attributes/functions to make it more suitable for a particular application. This representation will allow for incorporating novel interaction mechanisms within VE applications. The digital items will inform about their skills and functionalities and will accept the user's input coming from different interaction devices in a transparent way. This will enable the possibility to develop more intelligent and adaptive interfaces to VEs. The digital items will play both the role of the content and of the interface to the virtual environment -e.g. autonomous items with functionalities designed to perform particular tasks following the user's orders. Moreover, as we show in the test applications' section, the semantic model allows digital items to communicate among themselves and interact or react in consequence (autonomous behavior).

We have defined the semantic model of virtual environments keeping in mind that the main attribute of a virtual entity -object or character- is its 3D shape. Virtual Environments are geometry-based applications. However, depending on the context, the shape of the entity can change. Moreover, there are contexts in which multiple shapes representing the same virtual entity must be manipulated and synchronized. For instance, when displaying a virtual character on a large projection screen, we require information for rendering a high-resolution (hi-res) 3D shape. Nevertheless, the hi-res shape could be controlled through user interactions performed on a different representation, either a simplified shape or an abstract manipulator -a menu, a slider or any other GUI control. Using independent representations for each case is costly in terms of data synchronization. Our semantic model encapsulates the set of geometric and/or abstract representations belonging to each virtual entity and associates descriptors that inform the rendering and interface systems about how to handle each object. Figure 3.1 shows a general class diagram of the semantic model for the VE and the tools that can be used to control and interact with it. The Scene is the main container of the VE model; it has references to the digital items contained in the VE. Basically, the scene is the data repository used by the digital items and other software tools such as viewers or interaction devices.


Figure 3.1: Semantic representation of an interactive virtual environment.

A Semantic Descriptor provides human- and machine-readable information about a particular digital item (we also call them virtual entities). Semantic descriptors are the entry points for the scene controller to choose the correct geometry and interface to present. The semantic descriptor is the placeholder for any information describing how the digital item is to be used and how it is related to other items in the scene. The Geometric Descriptor of a digital item specifies the type of Shape associated with the entity: a deformable mesh, an articulated body (joints and segments), etc. For instance, hierarchical structures for skeleton-based animation can be defined by the geometric descriptors, e.g. H-Anim [69], the standard specification of a hierarchical structure for humanoid character animation. Moreover, alternative skeletons with different levels of detail in the number of joints and/or the complexity of the segments -the 3D shapes associated with each joint- can be specified as well.

Handling alternative geometric representations for each virtual entity in the environment is not enough to provide a complete representation of a virtual world. Our model also reflects the relations between the entities. The semantic descriptors characterize each object/character in the scene and constitute a scene graph that can be used both for rendering and for extracting the underlying information about its contents. Digital items can contain other items or be related to each other in different ways -e.g. entities that move together, or that trigger events on other entities- and the model is general enough to express a variety of relations.

A Digital Item can be controlled in a variety of ways: interactive controls (requiring user input), autonomous animation controllers (implementations of algorithms that synthesize some kind of behavior and/or animation), etc. A Controller specifies the interaction possibilities: predefined object behaviors that react to human intervention, and the way to provide access -interfaces- to them. For instance, an interactive control can describe the animation to open and close a door, and expose the parameters controlling this behavior. This information can be used to implement a variety of interfaces to access the same parameter: 3D manipulators or 2D GUI controls. Usually, digital items in a VE are animated either in an autonomous or in an event-triggered way.

The virtual entities contained in a scene can be controlled and displayed in several ways. Different viewers and interfaces can be used depending on the devices available and the information associated with each virtual object/character. For example, a single virtual character can be rendered as a hi-res 3D model in an OpenGL-based viewer, or a simplified version can be displayed on a PDA screen -provided that the character's descriptor contains alternative representations associated with it. The character can be controlled through a 3D manipulator or a 2D GUI control. All of the above information can be stored in the corresponding semantic descriptor.
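To fix ideas, the sketch below translates the class diagram of Figure 3.1 into a few minimal Python classes. This is our own simplification rather than the thesis implementation; the class names follow the figure, but the attributes, methods and the door example are illustrative assumptions:

    # Minimal sketch of the entities in Figure 3.1 (our simplification).
    # Attribute names, method names and the door example are illustrative.

    class GeometricDescriptor:
        """Alternative shapes for one entity, indexed by a context label."""
        def __init__(self, shapes):
            self.shapes = shapes      # e.g. {"workstation": "door_hires.mesh", ...}

        def shape_for(self, context):
            return self.shapes.get(context, self.shapes.get("default"))

    class Controller:
        """A predefined behavior plus the parameters an interface may expose."""
        def __init__(self, name, parameters):
            self.name = name
            self.parameters = dict(parameters)

        def set(self, parameter, value):
            self.parameters[parameter] = value   # e.g. drive the door opening angle

    class SemanticDescriptor:
        """Human- and machine-readable description of a digital item."""
        def __init__(self, description, relations=None):
            self.description = description
            self.relations = relations or []     # links to other items in the scene

    class DigitalItem:
        def __init__(self, name, semantics, geometry, controllers):
            self.name = name
            self.semantics = semantics
            self.geometry = geometry
            self.controllers = {c.name: c for c in controllers}

    class Scene:
        """Main container: data repository shared by items, viewers, interfaces."""
        def __init__(self):
            self.items = {}

        def add(self, item):
            self.items[item.name] = item

    # A door as a digital item: one entity, several shapes, one interactive control.
    door = DigitalItem(
        "entrance_door",
        SemanticDescriptor("A door that can be opened or closed by the user."),
        GeometricDescriptor({"workstation": "door_hires.mesh",
                             "handheld": "door_lowpoly.mesh",
                             "default": "door_box.mesh"}),
        [Controller("open_close", {"opening_angle": 0.0})],
    )

    scene = Scene()
    scene.add(door)

    # Any interface (3D manipulator, 2D slider, text command) can drive the same
    # parameter; a viewer picks the shape that matches its context.
    scene.items["entrance_door"].controllers["open_close"].set("opening_angle", 90.0)
    print(door.geometry.shape_for("handheld"))   # door_lowpoly.mesh

The point of the sketch is the separation of concerns: the semantic descriptor says what the item is, the geometric descriptor says how it may be shown, and the controller says how it can be driven, independently of any particular viewer or input device.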

3.4 Conclusions

We have presented a general model to represent the digital items used to populate and control interactive Virtual Environments. Our model is based on the semantics of the virtual entities and not on their geometry, as is the case in most VE applications. The semantic model we have proposed provides a way to specify alternative geometric representations for each entity and a range of functionalities. The entities are called digital items and can be not only virtual objects to be rendered as part of the scene, but also independent user interfaces and controls for animation or interaction. The virtual entities defined with this modeling approach can be reused in a variety of visualization contexts -from realistic rendering to highly simplified representations- and contain the information required to implement different types of interfaces to control them, from 3D manipulators to "classical" 2D GUIs. Moreover, the model can also be applied in the area of animation, allowing the creation of more responsive virtual entities and improving the believability of the VE. This semantic model allows for implementing a variety of interaction channels (multimodal interfaces) and provides a way to implement animation/interaction controls either as components of a digital item or as independent entities. Once the information has been organized and unified, the choice of interfaces and visualization techniques is more a development decision than a design problem. The semantic model we have defined provides us with a set of design patterns to represent the information required for controlling and interacting within a virtual environment. The next chapters describe how we applied this modeling approach in the areas of interaction, visualization and animation of entities participating within Virtual Environments.


Chapter 4
Semantics for Interaction

4.1 Introduction

This chapter presents new interaction paradigms for Virtual Environments supported by a semantic modeling approach. VE applications are still limited to controlled environments such as research laboratories. For VEs to be integrated into everyday applications, new interaction paradigms must be designed by means of flexible tools. New interaction paradigms should build upon familiar concepts to ease user acceptance. This chapter is divided into two main parts. The first one presents a novel way to interact within Virtual Environments by means of handheld devices. The inspiration for this is the fact that remote controls are omnipresent nowadays. We intended to exploit the notion of holding a device and using it to control different entities, the only difference being that our remote control serves to interact with entities in a virtual world instead of operating physical devices. We present three different examples of using a handheld device as an interface for VEs: the mobile animator, used for interactive posing of virtual characters; a conducting interface for a virtual orchestra; and a monitoring and configuration interface for Virtual Environments with haptic feedback. Looking for familiar notions and concepts to be employed as interaction paradigms, we decided to explore the idea of gesture-based interfaces. Gesturing with the upper limbs and face is a natural communication channel between human beings. It has been demonstrated that integrating such gestures into a multimodal interface brings important benefits: ease of learning, intuitiveness, immersion in the task, etc. [37], [4]. The second part of this chapter describes a meta-interface supported by a semantic representation of VEs that allows for configuring a variety of multimodal interfaces in real time.

4.2 Handheld interfaces for semantic Virtual Environments

4.2.1 The Mobile Animator: interactive control of virtual entities

It is a well-known fact that the main objective of a VE application is to immerse the user in the simulation. There are several definitions of this concept; a simple and popular one is "creating the illusion of being there". Immersion requires different technologies to create a multisensorial representation of the VE and, at the same time, provide a way for the user to communicate -interact- within the virtual world. Depending on the technology applied, we can be fully or partially immersed in the VE. Full immersion is typically achieved by means of Head-Mounted Displays (HMDs). The semi-immersive approach consists in positioning one or more users in front of a large rear-projection screen displaying the virtual world. Stereo glasses and 3D surround sound enhance the experience. The problem with full immersion is that the HMD isolates the user from the real world and makes it difficult to share the experience and communicate with other people. The limited field of view leads to spatial orientation problems and motion sickness: the cyber-sickness effect [98], [146]. Research has been aimed at tackling spatial orientation and other related problems [25]. However, after several years of experimentation, the semi-immersive approach has gained popularity due to the possibilities it provides for direct interaction and communication between users. It has been observed that a semi-immersive VE improves the overall user experience [103]. Moreover, this kind of system can be used as a collaborative tool by the design and development teams, making the implementation of a VE application more interactive and effective. At this point it is important to clarify that we consider the term immersion in a broader sense than the "classical" VR approach: in this work we talk about "immersion in the task". We aim at creating an environment that eases the control and setup of virtual scenes, letting several users work simultaneously. At VRlab we have tested the semi-immersive approach with different interfaces.

The interaction mechanisms inside a VE are also a key component in creating the "illusion of being there", the sense of presence [153]. Interaction includes activities like changing the point of view (camera management), navigating through the 3D world, and performing specific actions involving objects and/or virtual characters -either autonomous or user-controlled avatars. In the IST-JUST European project [121] -a VE for health emergency decision training- head orientation tracking with a magnetic sensor was used for camera management and navigation, in combination with voice commands to interact with semi-autonomous characters. The head-mounted sensor turned out to be uncomfortable; as for the voice commands, the intervention of a human operator was required to ensure the correct performance of the system. In a later work we implemented a multimodal interface called "The Magic Wand" [37], in which we replaced the head tracker with a less cumbersome device, similar to a 3D mouse. We continued to use voice commands, since they are a very intuitive way of selecting among preset actions. The problem of speech recognition was partially solved by limiting the number of recognizable phrases to a small single-word vocabulary. However, this interface does not perform so well in more open scenarios, due to the inherent difficulties of natural language processing. In effect, some situations may require full control of the virtual character, e.g. there are games where the characters must adopt a very specific pose in order to reach the goal. For instance, in "The Enigma of the Sphinx" [4] -a VR game developed to demonstrate the use of the "magic wand" interface- the user must mimic the postures displayed by one virtual character and also direct another one to adopt a particular pose (figure 4.1). The problem was solved with the combination of the 3D pointing device and a set of voice commands (the names of the body parts to be moved). While this can be entertaining, it is not always the best way to specify a detailed character posture or to execute more complex tasks.

Figure 4.1: User interacting in a semi-immersive VE using "The Magic Wand".

Interactive simulations require fine-tuning of the character's posture, in particular at design time. As we have mentioned, a semi-immersive VE can be used from the early stages of development for experimenting and observing the effects of the VE on the test users -the designers themselves and the rest of the development team. Our objective was to provide low-level, joint-wise control of the characters in a multi-user Virtual Environment. For instance, one user could modify the posture of the upper body while another adjusts the position and orientation of the same character, with the changes immediately reflected on the projection screen. Each user would need access to an independent view and a way to interact with it. The problem of interaction with objects in VEs has been approached by means of 3D/2D widgets and other interaction techniques [39], [156], [24]. With this work we start exploring a new alternative based on handheld devices or PDAs (Personal Digital Assistants). One of the main advantages of this approach is the absence of floating menus and other user interface (UI) components over the simulation screen. Several interaction techniques have been proposed to avoid superimposing UI components [119], [24], but they have been implemented in the context of full-immersion applications and are not suitable for our semi-immersive system.
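To illustrate the kind of concurrent, joint-wise control we are aiming for, the following sketch (our own, hypothetical example; it is not the Mobile Animator protocol, and the message fields and joint names are invented) shows how small updates from two handheld clients could be applied to a shared character state on the rendering host:

    # Hypothetical sketch of joint-wise, multi-user control of one character.
    # The message format is invented purely to illustrate concurrent low-level
    # updates coming from different handheld clients.

    shared_character = {
        "root_position": (0.0, 0.0, 0.0),
        "joints": {"l_shoulder": (0.0, 0.0, 0.0),   # Euler angles in degrees
                   "r_shoulder": (0.0, 0.0, 0.0),
                   "l_elbow": (0.0, 0.0, 0.0)},
    }

    def apply_update(character, message):
        """Apply one update from a handheld client to the shared state."""
        if message["type"] == "joint":
            character["joints"][message["joint"]] = message["rotation"]
        elif message["type"] == "placement":
            character["root_position"] = message["position"]
        # the workstation would then re-render the character on the big screen

    # Two users working on the same character at the same time:
    apply_update(shared_character, {"user": "pda_1", "type": "joint",
                                    "joint": "r_shoulder", "rotation": (0.0, 45.0, 0.0)})
    apply_update(shared_character, {"user": "pda_2", "type": "placement",
                                    "position": (2.0, 0.0, -1.5)})
    print(shared_character)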

Handheld devices as interfaces to virtual environments

Handheld devices have attracted the attention of researchers in the field of Virtual Reality even before the required hardware was available. One of the first studies aimed at using a mobile device as an interface to a VE is the work of Fitzmaurice et al. [51]. Actually, the authors implemented their own palmtop device (the term palmtop is considered here as equivalent to "handheld": a device that can be held in the palm of the hand). A 4-inch color monitor with an attached 6D input device was used to navigate and interact through the virtual world. No additional display was used; the small monitor was the only "window" into the VE. The experiment showed the feasibility of using a handheld display in a VE and its advantages: user mobility and an intuitive way to interact with the computer. However, by that time (1993) the available offer of true handhelds was very limited and there was no hardware capable of communicating with the main system and displaying an independent interface; their system was physically connected to a workstation. Besides, this experiment was more oriented towards replacing normal-size monitors with small screens for navigation and interaction.


It was different from our concept of an interaction tool for collaboration in an immersive VE. The work of Watsen, Darken and Capps [152] was one of the first implementations of a true handheld-based interface to a Virtual Environment. The authors implemented a 2D interface on a 3Com PalmPilot to be used while standing inside a CAVE system. Basic scene and object manipulation were done through slider controls on the handheld. While this prototype confirmed the benefits of a mobile interface to a VE, the scenario where it was applied -the CAVE- posed some problems to the use of a handheld: being a full-immersion system, the CAVE made it difficult for the user to switch between the surrounding 3D world and the handheld. More recent research has continued to focus on the implementation of 2D GUIs for basic interaction: camera control and scene management (object picking, navigation through a simplified representation of the 3D environment) [72], [49]. Emphasis has been placed on the portability and mobility of the GUI itself. Other problems starting to be targeted are peer-to-peer communications between handhelds, together with client-server (handheld-main graphics workstation) interactions to control the VE and interact with other users at the same time [50]. Other authors still consider the PDA as a virtual viewer. In the Touch-Space system [35] the PDA is optically tracked and used to display virtual objects superposed on the real world. The PDA may serve as a tool to reinforce human-human contact in a virtual environment. This is a very important point to consider: apart from immersing the users into the simulation, we want them to keep in contact with each other to share their impressions and work together.


The missing component was an efficient interface to give full control of the characters and other objects in the scene. The research work we are aware of has always focused on basic manipulation of 3D objects, excluding articulated characters: translation, rotation and camera management by means of 2D controls. We decided to go one step further and created a 3D graphical representation of the characters and objects in the Virtual Environment right on the handheld. We believe the best way to interact with a virtual object is through a 3D representation controllable by virtual manipulators, similar to the animation tools included in commercial packages like Maya [100] or 3DS MAX [1]. This is why we called our interface "the mobile animator": we tried to mimic the well-tested interfaces used by professional animators and adapted them to a mobile device. The following section describes our handheld interface in detail: the object representation, the virtual manipulators, the data coding/transmission and the system architecture.

Object representation

The fundamental assumption we have made when designing the mobile animator concerns the interface. We decided to use a 3D representation of the objects on the handheld display instead of 2D controls -sliders, buttons, menus- because this seems to be the most intuitive way to obtain the low-level control we are looking for. This choice is justified by the confirmed efficacy of commercial animation tools. Professional animators manipulate characters and objects through virtual 3D manipulators, see figure 4.2.


This interface can be more intuitive than a 2D GUI, in particular when we have direct feedback on a high-resolution 3D projection. In fact, for this first prototype we targeted the manipulation of virtual characters only. The next question was about the level of detail of such a 3D representation. The immediate answer could be to reproduce the characters with the same resolution and detail as they are rendered in the VE. This option was discarded for several reasons. First, the computing power available on the currently most advanced PDAs -Pocket PCs using the Intel XScale microprocessor- is not enough to render the high-resolution geometries we project on the large screen.

Figure 4.2: Virtual manipulator for joint orientation and high-resolution character on an iPAQ.

We have implemented a testbed application to visualize the same virtual characters used in our VE applications -articulated characters with an average size of 15k polygons- on an iPAQ Pocket PC [65], see figure 4.2. We did not obtain an interactive frame rate; our best results were about 5 frames per second, not enough for real-time manipulation.

Figure 4.3: Bounding-box representation in Bones Pro.

Moreover, even if the hardware performance is expected to increase in the coming months/years, the screen of the PDA is too small to let the user appreciate realistic 3D objects. Increasing the resolution of such small displays won't help much, and increasing their size wouldn't make sense because those devices would be transformed into Tablet or Notebook PCs, impossible to hold in one hand. Here it is pertinent to explain the way we model the virtual characters we work with (virtual humans, most of them). We use the standard representation defined by the MPEG-4 FBA specification [29], [80], [81]. The foundation of this is the hierarchical structure or skeleton used to animate the virtual humans. The skeleton has been standardized by the H-Anim group [69] and adopted by MPEG. A full H-Anim skeleton contains close to 80 joints with their corresponding segments (associated geometry).


The most complex parts, due to the number of joints involved, are the hands and the trunk.

The choice was to provide the mobile users with a simplified representation and leave the realistic rendering to the large projection screen.

How simple should the representation be? Once more we took the professional tools as a reference. Bones Pro [20] is a tool intensively used by the design team at VRlab - EPFL for character posing and animation. The bounding-box representation it uses to manipulate 3D objects is rather common and intuitive, see figure 4.3.

There is another frequently used representation: a skeleton built from "balls and sticks". We discarded it because it does not give enough information about the actual dimensions of the character's limbs.

The bounding boxes help avoid collisions and impossible postures while manipulating the skeleton -even if no collision detection is performed, having a notion of the actual dimensions is welcome. However, the reduced dimensions of the handheld's display still prevent us from making an exact copy of such a representation for virtual characters.
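A minimal sketch of how such a simplified representation can be derived (our own illustration with assumed types, not the code of Bones Pro or of our system): each body segment's polygonal mesh is reduced to the axis-aligned box enclosing its vertices.

#include <algorithm>
#include <limits>
#include <vector>

struct Vec3 { float x, y, z; };

// Axis-aligned bounding box standing in for a body segment's mesh.
struct BoundingBox {
    Vec3 min{ std::numeric_limits<float>::max(),
              std::numeric_limits<float>::max(),
              std::numeric_limits<float>::max() };
    Vec3 max{ std::numeric_limits<float>::lowest(),
              std::numeric_limits<float>::lowest(),
              std::numeric_limits<float>::lowest() };
};

// Compute the box enclosing all vertices of one segment.
BoundingBox segmentBox(const std::vector<Vec3>& vertices) {
    BoundingBox box;
    for (const Vec3& v : vertices) {
        box.min.x = std::min(box.min.x, v.x);
        box.min.y = std::min(box.min.y, v.y);
        box.min.z = std::min(box.min.z, v.z);
        box.max.x = std::max(box.max.x, v.x);
        box.max.y = std::max(box.max.y, v.y);
        box.max.z = std::max(box.max.z, v.z);
    }
    return box;
}

Boxes computed in this way for each H-Anim segment are what the handheld would display and manipulate in place of the full mesh.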

To keep the representation simple we introduced different levels of detail. At the default level, the shoulders, trunk and hands are simplified. If the user needs fine control, the simplified sections can be zoomed in to access the full set of joints. Figure 4.4 shows the simple model compared to the detailed model of the upper trunk.


Interactive control of virtual characters

The graphical interface on the PDA allows the user to modify the posture of the character (joint orientations), the orientation of the character in the handheld display, and its position/orientation in the Virtual Environment, see figure 4.5. If the VE contains more than one virtual character, the interface presents a list of names to pick the character to modify. Each joint is fully modifiable through the stylus by pointing at the segment (bounding box) and dragging it to the desired position. Each joint has 3 degrees of freedom -Euler angles- which are controlled by mapping the 2D coordinates of the stylus. In fact, only two rotation axes are modifiable at a time; the third one is accessible by dragging the stylus while pressing a button configured by the user. The PDA has several built-in buttons whose function is customizable. In a similar way, the character can be viewed from different points of view by pressing the button preset for camera management and dragging the stylus until the model is in the desired orientation. This makes the manipulation of certain joints that are difficult to access from the default orientation more comfortable. While the user can modify at will the orientation of the character on the PDA screen, this parameter remains unchanged in the Virtual Environment; only the changes on the joints are reflected. To modify the position/orientation in the virtual world, a virtual manipulator is used in combination with the directional pad on the device.
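The stylus-to-joint mapping can be pictured with the following sketch (names and the sensitivity value are our assumptions, not the prototype's actual code): drag deltas are accumulated into two Euler angles of the selected joint, and into the third angle while the preset hardware button is held.

// Hypothetical joint state: three Euler angles in degrees.
struct Joint {
    float rx = 0.0f, ry = 0.0f, rz = 0.0f;
};

// Map a 2D stylus drag (dx, dy in pixels) onto the selected joint.
void applyStylusDrag(Joint& joint, float dx, float dy, bool thirdAxisButton) {
    const float sensitivity = 0.5f;        // degrees per pixel, assumed
    if (thirdAxisButton) {
        joint.rz += dy * sensitivity;      // third axis while the button is held
    } else {
        joint.ry += dx * sensitivity;      // horizontal drag drives one axis
        joint.rx += dy * sensitivity;      // vertical drag drives the second axis
    }
}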


Figure 4.4: Simple model and a detailed view of the upper trunk.

Figure 4.5: Interactive character control on the PDA: joint orientations and viewpoint.

System architecture

The Mobile Animator is the front-end of a four-layered system. As shown in figure 4.6, the system architecture has four main components:

• VE system: the software running the interactive Virtual Environment application that is projected on the large screen.
• Network layer: the software component in charge of managing the communications between the handheld device(s) and the rendering system. This component can reside on the rendering workstation or on a different computer.
• Character Data: automatic digital model adaptation.
• Handheld interface: the user interface to the VE.

The VE system is an interactive 3D graphics application that receives information from the network layer to update the characters' posture/position/orientation. It sends information concerning the characters in the scene for the network layer to adapt and deliver to the PDA. The network layer contains a multi-threaded TCP sockets application able to manage concurrent connections between one or more PDAs and the rendering system. The Character Data component is responsible for retrieving the characters' definitions from the VE application and delivering them to the PDA when requested. It queries the rendering system to get the data defining the virtual characters present in the VE: skeleton and associated geometry, position and orientation in space. With this information it prepares the simplified bounding-box representation of each of them: it replaces the polygonal mesh of each character's segment (body part) by its corresponding bounding box. The segments corresponding to the hands and trunk are simplified as described before. In order to manage concurrent communications, the network layer blocks access to a character on a per-parameter basis: one or more users can interact with the same virtual character, but they are not allowed to modify the same joint or the position/orientation in the VE at the same time. The Handheld interface communicates with the network layer to retrieve the list of available characters in the scene and sends the updated information generated by the user.
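The per-parameter locking policy could be implemented along the following lines (a sketch under our own assumptions, not the network layer's actual code): each (character, parameter) pair is owned by at most one session at a time, and a second PDA asking for the same joint is refused until it is released.

#include <map>
#include <mutex>
#include <string>
#include <utility>

// Identifies one lockable parameter: a joint (or the global
// position/orientation) of a given character.
using ParamKey = std::pair<std::string, std::string>; // {characterName, parameterName}

class ParameterLockTable {
public:
    // Returns true if the session acquired the parameter, false if
    // another session already holds it.
    bool tryLock(const ParamKey& key, int sessionId) {
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = owners_.find(key);
        if (it != owners_.end() && it->second != sessionId)
            return false;          // someone else is editing this joint
        owners_[key] = sessionId;  // acquire (or re-acquire) the lock
        return true;
    }

    void release(const ParamKey& key, int sessionId) {
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = owners_.find(key);
        if (it != owners_.end() && it->second == sessionId)
            owners_.erase(it);
    }

private:
    std::mutex mutex_;                 // the network layer is multi-threaded
    std::map<ParamKey, int> owners_;   // parameter -> owning session
};

In such a design, tryLock would be called before applying an update coming from a given PDA connection, and release when the user deselects the joint or disconnects.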


Figure 4.6: System Architecture.

Semantics of interaction with virtual characters

Figure 4.7 shows a schematic view of the main components of the system as they fit into the semantic model presented in Chapter 3. The controller associated with the character (interactive control) implements an algorithm to modify several parameters of the character as a function of the user input (through the stylus on the handheld device). User-modifiable parameters are described in the semantic descriptor (not shown in the figure) and include the rotation angles of each geometric descriptor (joints) as well as the global translation and rotation of the main descriptor (equivalent to the root joint of the skeleton). The system implementation can provide interfaces to modify the character's posture (joint angles) and its position/orientation in the virtual world. It is useful to provide alternative geometric representations for the same character, just as the modeling tools do: a simplified representation for specifying the animation, and a high-resolution 3D geometry for realistic rendering.


Figure 4.7: Semantic model of an animation system for shared virtual environments.

This is achieved by associating two different shapes with each geometric descriptor: a high-resolution shape for high-quality rendering and a simple one for direct interaction. The simplified representation is useful for providing a control interface independent of the main simulation. The semantic descriptor contains the information defining which representation to use depending on the user terminal.
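As a rough illustration (class and field names are assumptions, not the thesis implementation), the choice between the two shapes can be reduced to a simple rule driven by the terminal type recorded in the semantic descriptor:

#include <string>

// Two alternative shapes attached to the same geometric descriptor.
struct GeometricDescriptor {
    std::string highResShape;   // full polygonal mesh for the projection wall
    std::string simpleShape;    // bounding-box model for the handheld
};

// Minimal semantic descriptor: decides which representation each
// class of terminal should receive.
struct SemanticDescriptor {
    enum class Terminal { Workstation, Handheld };

    const std::string& shapeFor(const GeometricDescriptor& geom, Terminal t) const {
        return (t == Terminal::Handheld) ? geom.simpleShape : geom.highResShape;
    }
};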

Evaluation of the Mobile Animator

Tests with two users in front of the projection screen reported good results in terms of user-friendliness and effectiveness of the interface, see figure 4.8. The system truly enables more than one person to interact in a shared Virtual Environment. The main issue we have observed during evaluation was the division of the user's attention between the handheld and the simulation screen. Some users looked at the large screen only after they had finished posing their character on the PDA, to check the final result.


Figure 4.8: Two users interacting in the collaborative Virtual Environment.

Other participants selected one joint on the handheld and turned their attention to the main screen while moving the stylus. Direct contact between participants and the absence of UI components on the main simulation screen were appreciated by the test users. The handheld interface proved to be an efficient way to modify the posture and location of virtual characters in a Virtual Environment, without requiring additional applications/interfaces to be opened; the main simulation screen remains undisturbed. This kind of mobile interface can be used not only for posing characters, but for interacting with any object -virtual entity- in the Virtual Environment. In the next section we present a second example of how a handheld interface can be used for interacting and communicating the user's intentions to the entities contained in a VE: conducting an orchestra of virtual musicians.


4.2.2 Conducting a Virtual Orchestra

The ”virtual orchestra” system is presented in detail in [132]. My contribution to this work focused on the conception and implementation of a handheld interface supported by a semantic representation of the entities within the VE. The virtual orchestra combines user interaction through alternative representations of the scene, and semi-autonomous characters whose behavior is conducted (modulated) by the user. In this application, the characters are virtual musicians that can be conducted by a human user through a magnetically tracked handheld interface.

Figure 4.9: Semantic model of a virtual musician.

The objective of this work was to provide an interface to modulate (conduct) the performance of an ensemble of virtual musicians. The autonomous animation algorithm implemented as a controller on each virtual musician is based on the research work of Esmerado [48], [47].


This application is based on the semantic model proposed in chapter 3: independent digital items that can be used by a main entity as user interfaces or sources of information, and alternative representations of the digital items. Figure 4.9 presents the main system components as they fit into the semantic model. Conceiving the application as a Semantic Virtual Environment eased the implementation of the virtual orchestra: the system is composed of a set of independent digital items that are put together and intercommunicate through the semantic layer constituted by the semantic descriptors. Digital items are used as interfaces; one of them (the handheld interface) displays alternative representations of the main scene: a schematic view of the musicians in the scene, the current music score and the state of the performance parameters (dynamics, tempo). The digital item implementing the handheld interface interacts with the main items (virtual musicians) and enables the dialog (interaction) between the user and the entities in the scene. A second input device (interface) is implemented as a digital item that has no geometric descriptor; it is a logical device or controller for a magnetic tracker (Ascension's Flock of Birds [8]). The controller of the FOB item (not shown in the figure) implements a gesture analysis algorithm that translates the user's gestures into discrete values of dynamics and tempo. This information is communicated to the virtual musicians through their interactive controls. The process ends when the acquired information reaches the autonomous animation controller, which is responsible for animating the characters.
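Purely as an illustration -this is neither Esmerado's animation algorithm nor the actual gesture analysis code- the conducting controller can be pictured as turning beat events detected on the tracked device into discrete tempo and dynamics values that are pushed to every musician's interactive control:

#include <vector>

// Discrete performance parameters sent to the virtual musicians
// through their interactive controls (ranges are assumptions).
struct PerformanceParams {
    int tempoBpm = 90;    // beats per minute
    int dynamics = 64;    // MIDI-like velocity, 0..127
};

class VirtualMusician {
public:
    void setPerformance(const PerformanceParams& p) { params_ = p; }
private:
    PerformanceParams params_;
};

// Toy gesture analysis: tempo from the time between detected beats,
// dynamics from the amplitude of the conducting gesture.
class ConductingController {
public:
    explicit ConductingController(std::vector<VirtualMusician>& ensemble)
        : ensemble_(ensemble) {}

    void onBeat(double timeSeconds, double gestureAmplitude) {
        PerformanceParams p;
        if (lastBeat_ >= 0.0) {
            double interval = timeSeconds - lastBeat_;   // seconds per beat
            if (interval > 0.0)
                p.tempoBpm = static_cast<int>(60.0 / interval);
        }
        p.dynamics = static_cast<int>(gestureAmplitude * 127.0); // amplitude assumed in [0,1]
        lastBeat_ = timeSeconds;
        for (auto& musician : ensemble_)
            musician.setPerformance(p);
    }

private:
    std::vector<VirtualMusician>& ensemble_;
    double lastBeat_ = -1.0;
};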


The Music Score and Virtual Flute digital items have a double function specified in their respective semantic descriptors (not shown in the figure). On the one hand, these items have a geometric descriptor defining the way they are visualized in the virtual environment (see the virtual flutist in figure 4.9). On the other hand, they contain information about the score to play (basically a MIDI file) and the way to play the instrument (this information is used as part of the input parameters of the autonomous animation controller). The next section presents a different application involving handheld interfaces supported by a semantic model. In this case, the interface is used not only for interaction and configuration of the environment, but also for monitoring the performance of other users interacting within the VE.

4.2.3 Controlling Haptic Virtual Environments through Handheld Interfaces

This work is based on the use of haptic interfaces and reconfigurable virtual environments as tools for telerehabilitation. Further details are described in [62]. Our research focuses on implementing a telerehabilitation system for kinesthetic therapy for patients with motion coordination disorders of the upper limbs. The therapy is targeted at helping patients who have lost precision/control of their arm-hand gestures. This disorder is frequently the consequence of a trauma. The patients are unable to follow a given trajectory in space. They cannot control their movements and/or have lost the notion of spatial depth (spatial reasoning). The therapy we have designed consists of having the patient follow different trajectories with her hands while immersed in a virtual environment with haptic feedback, see figure 4.10.


Trajectories are represented as 3D pipes lying on a 2D plane in front of the patient. The idea is to keep the hands inside the pipe without touching the borders. The patient can see her hands in the virtual environment and feel when she touches the virtual object.

Figure 4.10: Telerehabilitation system: a haptic virtual environment controlled through a handheld interface. The picture on the left shows the patient's view of the VE.

The therapist uses a handheld interface that allows for creating and modifying the pipes in real-time. While the patient stays in the hospital using our teleoperation system, the therapist can monitor and control the treatment at a distance, from any location with Internet access. The handheld interface is supported by a semantic model of the haptic virtual environment.


System Architecture

Our architecture for telerehabilitation systems is based on the following requirements:

• using fully immersive environments with haptic feedback
• keeping close communication between therapist and patient
• giving the therapist full control over the virtual environment

Haptic Feedback

First we must define the specific type of virtual environment we want to use. We have chosen the full-immersion approach: a system where the user gets inside the virtual world by means of a Head Mounted Display. We believe this is an interesting alternative. Full immersion can enhance the patient's interest. This kind of system isolates the user from the real world and allows for deeper concentration on the exercise. We used a Haptic Workstation™ [78]. This device, conceived for virtual prototyping, provides two-handed force-feedback and is a versatile tool. Our architecture intends to evaluate it in the context of physical rehabilitation. Obviously, for the moment we restrict ourselves to upper-limb therapy. However, the concepts and the rest of the architecture are not hard-linked to the use of the Haptic Workstation and can take advantage of other haptic interfaces.

A "Window to the Real World"

As affirmed by Loureiro et al. [96], attention and motivation are key to recovery. We believe these can be achieved through an appealing therapy environment. However, special care should be put on the therapist-patient communication as well. Human contact is essential. The therapist not only plays the role of doctor and specialist but also acts as a coach or friend.


In a telerehabilitation scenario, the audiovisual contact should be kept by means of teleconferencing technologies. A webcam with a microphone is a convenient solution to "send" the therapist into the patient's place. In our full-immersion-based architecture we keep human contact by means of a "window to the real world", a virtual screen that displays live video of the therapist. This way, the patient immersed in a virtual environment is linked to the real world. The live image allows for demonstrating the therapy exercise and accompanying the patient through the first trials. This can be an effective way of correcting the patient's gestures and encouraging her to keep trying.

Remote Control of Virtual Environments

An on-line therapy system is not complete unless we close the communication loop. The therapist needs to monitor the patient's performance. Being able to adapt the therapy environment to the current needs of the patient is essential. Closing the communication loop with a second webcam located on the patient's side would not be enough. The therapist requires more detailed information, such as performance statistics and clinical history, and a way to modify the environment. This is where we make our main contribution. The therapist requires control over the therapy environment in order to dynamically adapt the exercises to the current needs of the patient. For instance, the patient's mood could make her get bored faster than usual. She could find the routines harder than they actually are. The therapist could take the decision of modifying, totally or partially, the current exercise to better fit the patient's mental and physical condition.

Such detailed control of the therapy environment requires an easy-to-use, non-cumbersome interface. The interface should allow the therapist to keep direct visual contact with the patient and the freedom to gesture with the arms. The therapist must be able to demonstrate the exercises and encourage the patient. Instead of placing the therapist in front of a PC with a webcam, we put the essential tools and information in the palm of her hand by means of a handheld device. PDAs or handheld devices have been successfully used to complement or even eliminate the need for PC-based interfaces to virtual environments, e.g. [65], [66], [50]. Tests have shown the feasibility of using a handheld to control and interact within a VR application. A handheld interface maximizes the user's freedom of motion without losing either control or ease of use. We apply the concept of handhelds as interaction tools to VR in the context of telerehabilitation. The central idea of our system architecture is to give the therapist the possibility of monitoring and reconfiguring the therapy environment in real-time. We want our system to be as flexible as possible. The next section describes the way we have modeled the therapy environment by means of a generic representation of virtual environments.

Semantic model of a therapy VE

Instead of implementing an ad-hoc application for a single test case, we have defined a flexible system architecture. The objective was to specify the infrastructure for developing a variety of applications involving multiple interaction terminals (haptic virtual environments, handheld/PC-based interfaces, etc.).


Figure 4.11: UML diagram of a generic semantic model for interactive virtual environments.

We designed a data model based on the semantics of virtual entities. We consider virtual objects not as 3D shapes but as items with a set of functionalities (semantics) which can be used in different contexts. Virtual entities should be rendered (visually and haptically) in different ways depending on the terminal (therapy VR environment, handheld interface, etc.).

In this case we need to render the virtual entities to be used in the therapy environment. This includes the virtual objects with which the patient interacts, as well as the virtual hands - the patient's interface. Such virtual objects must be editable by means of a mobile handheld device. At the same time, the patient's performance must be monitored using the same mobile interface. For instance, the hands of the patient should be tracked and visualized both in the therapy environment and on the handheld.


Geometric and functional descriptions, as well as state variables of the virtual entities (current position, etc.), are maintained in a central data repository. The semantic data repository acts as a mediator/translator between the handheld interface and the complex haptic virtual environment. Figure 4.11 shows a UML diagram of the main components of the semantics-based model that we have defined. The Scene is the main container of the VE model; it holds references to the digital items contained in the VE. A Semantic Descriptor provides human- and machine-readable information (XML documents) about a particular digital item or virtual entity. Semantic descriptors are the entry points for the scene controller and are used to choose the most appropriate geometry and interface to present. The semantic descriptor is the placeholder for any information describing how the digital item is to be used and how it is related to other items in the scene. The Geometric Descriptor of a digital item specifies the type of Shape associated with the entity: a 3D mesh to be used in the therapy environment, an articulated body composed of joints and segments to represent the patient's hands, etc. Hierarchical structures for skeleton-based animation -for the virtual hands- can be defined using geometric descriptors. Virtual entities can be represented with different shapes depending on the context in which they are used. For instance, on a handheld interface the therapist does not require a 3D view but only a schematic representation of both the patient's hands and the interactive entities.


Figure 4.12: Virtual entities share semantic meaning and have context-dependent shape representations.

Alternative geometric representations for each virtual entity can be defined by means of Shape descriptors; this idea is illustrated in figure 4.12. Our model also reflects the relationships between the entities. The semantic descriptors characterize each object in the scene. They constitute a scene graph that can be used both for rendering and for extracting underlying information about its contents. For instance, such information is used for collision detection and the generation of force-feedback in the haptic VE. Digital items can contain other items or be related to each other in different ways. A digital item can be edited through the handheld interface or follow the motion of the user's hands. Controllers specify the interaction possibilities and expose the parameters controlling their behavior.
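The UML diagram of figure 4.11 can be transcribed roughly as follows (a sketch with assumed names and types, not the repository's actual classes): a Scene owns digital items, and each item carries a semantic descriptor, a geometric descriptor with one or more context-dependent shapes, and optional controllers.

#include <memory>
#include <string>
#include <vector>

// One of possibly several context-dependent representations of an entity.
struct Shape {
    std::string context;   // e.g. "haptic-VE", "handheld"
    std::string data;      // mesh, schematic drawing, etc. (placeholder)
};

// Geometry of a digital item; may describe a hierarchy of joints/segments.
struct GeometricDescriptor {
    std::vector<Shape> shapes;
    std::vector<GeometricDescriptor> children;   // skeleton hierarchy
};

// Exposes the parameters that interfaces (handheld, tracker, ...) may drive.
class Controller {
public:
    virtual ~Controller() = default;
    virtual void setParameter(const std::string& name, float value) = 0;
};

// Human- and machine-readable description of an item (XML in the thesis).
struct SemanticDescriptor {
    std::string id;
    std::string description;
    std::vector<std::string> relatedItems;   // relationships to other items
};

struct DigitalItem {
    SemanticDescriptor semantics;
    GeometricDescriptor geometry;
    std::vector<std::unique_ptr<Controller>> controllers;
    std::vector<std::unique_ptr<DigitalItem>> containedItems;  // items can contain items
};

// Main container of the VE model: references all digital items in the scene.
struct Scene {
    std::vector<std::unique_ptr<DigitalItem>> items;
};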


Evaluation of the telerehabilitation environment

Preliminary informal tests have been carried out with our first prototype. For the moment, researchers from our lab have played the roles of patients and therapists. We observed that the integration of the virtual window is a valuable help in keeping the user connected to the real world. A psychiatrist took a look at our system and found that the "window to the real world" was a very positive improvement compared to other virtual therapy environments. Tests have been conducted in which the "therapist" designs an exercise and right after accompanies the "patient" in the execution of the gesture. Thanks to the handheld device, the therapist has a good range of motion freedom and can easily gesticulate with the upper body while monitoring and editing the therapy environment.

The users playing the role of patients were able to follow the gestures of the therapist on the virtual screen. According to their comments, the haptic feedback proved to be an efficient way to convey the feeling of interacting with a real object. It was easy to imagine that the 3D pipes were real objects since there was a response when touching them (force-feedback).

The semantic model provided us with a set of design patterns to represent the information required for configuring, monitoring and interacting within a virtual environment in real-time.

4.3 Semantics for adaptive multimodal interfaces

This section presents research related to the field of interactive virtual environments. We focus on developing multimodal interfaces. Detailed reviews of the state of the art can be found in [14], [115]. Oviatt [114] identifies three main types of multimodal interfaces that have reached a certain level of maturity after several years of research: speech/pen, speech/lip movement and multibiometric input. We have proposed a variation of the speech/pen interface, replacing the pen input by basic posture recognition of a magnetically tracked wand: the "Magic Wand" [38]. My work focused on the implementation of the posture recognition system. The addition of speech recognition and the integration of the interface into a VR game were done by the research team at VRlab. A detailed description of the application and the results is presented in [3]. The interface proved to be robust enough when tested by many users at a public event. Nevertheless, despite the maturity level of some multimodal technologies, the issue of interface adaptation is still to be solved. In fact, multimodal interfaces (MMI) are usually implemented as ad-hoc systems. Even if MMI are designed with a focus on flexibility, few of them are capable of adapting to different user preferences, tasks, or contexts [154]. Changing the application or the context in which an MMI is used often requires costly modifications in terms of development time and effort. This is usually a matter of making changes in the application's source code. MMI should be able to adapt dynamically to the needs and abilities of different users, as well as to different contexts of use [123].

MMI adaptation requires being able to manage in real-time the mapping between multiple user inputs and application functionalities. Different alternatives have been proposed for adaptive man-machine interfaces that can be used in a wide variety of tasks and contexts within virtual environments. Research includes adaptive interfaces for 3D worlds (e.g. [13]) but also adaptation of multimedia content (e.g. [101]). Content adaptation implies accessing the same information in different scenarios/contexts, through different interfaces. Efforts aimed at unifying management, delivery, consumption and adaptation of content led to the creation of multimedia frameworks such as MPEG-21 [28], [127]. Content adaptation requires standard representations of content features and functionalities (manipulation/interfacing information). In MPEG-21 such information is represented -declared- in the form of "Digital Items", which are defined by XML-based descriptors. XML-based descriptors are frequently used for handling the semantics of multimedia content (MPEG-7, MPEG-21) but they can be useful for representing multimodal interaction models as well. For instance, in [124] the authors present an adaptive system for applications using multimodal interfaces. They avoid implementing special (ad-hoc) solutions for special problems. All functionalities are treated coherently using a knowledge-based approach. For all multimodal inputs and outputs (speech, gestures, pen/keyboard inputs; PDA, TV screens) they use a common representation approach and generic interaction models. Interaction processing is based on an ontology-driven system. Everything the user and the system can talk about is encoded in the ontology, using an XML-based representation.


Systems such as the one presented in [124] solve the problem of accessing multimedia content through multimodal interfaces without the need for ad-hoc applications. The coherent content representation allows for implementing a variety of interaction and visualization modalities with minimum effort. However, dynamic input adaptability is not so easily achieved. Input adaptability can be defined as the ability of an interactive application to exploit alternative input devices effectively and offer users a way of adapting input interaction to suit their needs [46] (natural or close-to-natural human communication). Dragicevic [45] has proposed the "Input Configurator Toolkit", which provides a Visual Programming interface for modifying the mapping between input devices and functionalities of an application. The system adapts to special interaction devices as well as to user preferences and needs. Inputs can be mapped to different application controls, creating customized interaction techniques. For instance, speech input can be connected to a scroll-bar control. One of the advantages is the ease of use and interactivity of the visual representation. The user manipulates block diagrams representing the interaction and application devices and the connections between their respective I/O slots. The system has been used to customize mainly desktop-based applications. Devices and interface configurations are defined through a dedicated script language (ICScript). Using a non-standard language limits the portability/extensibility of the system to other programming languages/contexts. Systems like the ones presented in [124] and [46] show the need for and benefits of adaptive multimodal interfaces. We have defined the foundations for a real-time adaptive multimodal interface system. We use a visual programming interface as a front-end for dynamic configuration and input adaptation.

Figure 4.13: A multimodal interaction setup: vision-based head position tracking + PDA interaction with a virtual character.

Entities participating in a VE with multimodal interaction

Configuring a multimodal interface requires mapping the output of an interaction device to a functionality of a particular virtual entity. For this purpose, and following the semantic modeling approach detailed in chapter 3, we have defined the following semantic components.

Interaction Devices: The physical device handled by the user in order to send orders to the VE. The essential attribute of an interaction device is the data it delivers (output ports). It can be a 2D vector, a token indicating a particular gesture or word being recognized, etc.


Virtual Entities: They can be 3D animated shapes such as virtual characters, multimedia documents, a video, and so on. From the interaction point of view the most important attributes are the customizable parameters that constitute the semantics of the virtual entity (such as the rotation angle of a virtual character's joint). These are the input ports that let us communicate with them.

Modulators: Data coming from interaction devices may require some additional post-processing before reaching the controlled entity. We incorporate a mechanism to further process interaction data in the form of modulators. They are containers for modulation functions. Modulators are also used as the register unit for storing the mapping between an interaction device output and the input of a virtual entity. A multimodal interface is constituted by a set of Interaction Mappings. They can be stored and reused.
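In code, these three components could look roughly like this (names and types are our assumptions for illustration; the system itself stores them as XML descriptors): a device exposes output ports, an entity exposes input ports, and a modulator records one mapping between them together with its modulation function.

#include <functional>
#include <string>
#include <variant>
#include <vector>

// Data flowing through the system: either a recognized token
// (speech/gesture) or a normalized numeric value.
using InteractionValue = std::variant<std::string, float>;

struct OutputPort { std::string name; };   // e.g. "hand.x"
struct InputPort  { std::string name; };   // e.g. "skullbase.rotY"

struct InteractionDevice {                 // physical device + its controller
    std::string name;
    std::vector<OutputPort> outputs;
};

struct VirtualEntity {                     // character, document, video, ...
    std::string name;
    std::vector<InputPort> inputs;         // customizable parameters (semantics)
};

// One mapping record: device output -> (optional) modulation -> entity input.
struct Modulator {
    std::string deviceOutput;
    std::string entityInput;
    std::function<float(float)> modulate;  // identity for token data
};

// A multimodal interface is a reusable set of such mappings.
using InteractionMapping = std::vector<Modulator>;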

Building multimodal interfaces through Visual Programming

We have implemented a visual programming interface (VPI) for handling the main semantic components participating in the environment. It is a meta-interface that eases the task of handling the interface building blocks -interaction devices, modulators and virtual entities- and the links (mappings) between them. When developing this meta-interface we drew inspiration from well-known programming interfaces like the ones implemented in commercial software such as Virtools [149] and LabView [106]. The visual programming paradigm has several advantages when it comes to specifying relationships between entities in the form of links between output and input ports [76], [150].

The first prototype of this interface has been developed using MS-VisualBasic, which allowed for a fast implementation of the visual programming interface, at least from the graphics point of view. Interaction devices, modulators and virtual entities are represented as boxes containing the corresponding attributes. Mapping between interaction data and functionalities of the virtual entities is done by connecting I/O ports through modulators, see Figure 4.14. Interaction data can be of two types: tokens or numeric -normalized- values. Tokens are generally the output of speech recognition algorithms or high-level gesture analysis tools. In the case of numeric values, modulators can process the input data by means of a user-defined function. In the current version, the output interval (min., max. values) can be specified and the output can be modulated with a polynomial function. Figure 4.14 shows two modulation functions with three control points. Up to four control points can be used to define a modulation function.
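The thesis does not specify the exact polynomial; as an assumption, a curve defined by up to four control points can be evaluated as a Bezier-style polynomial and the result clamped to the configured output interval:

#include <algorithm>
#include <cstddef>
#include <vector>

// Evaluate a modulation curve defined by 2 to 4 control values
// (interpreted here as a Bezier polynomial, which is an assumption)
// and clamp the result to the user-specified output interval.
float modulate(float t,                                  // normalized input in [0,1]
               const std::vector<float>& controlPoints,
               float outMin, float outMax) {
    std::vector<float> pts = controlPoints;
    t = std::clamp(t, 0.0f, 1.0f);
    // De Casteljau evaluation works for any number of control points.
    for (std::size_t level = pts.size(); level > 1; --level)
        for (std::size_t i = 0; i + 1 < level; ++i)
            pts[i] = (1.0f - t) * pts[i] + t * pts[i + 1];
    return std::clamp(pts.empty() ? 0.0f : pts[0], outMin, outMax);
}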

Figure 4.14: Visual Programming meta-interface: modulating interaction data and mapping to virtual entity’s functionalities.

In the example, the orientation of a virtual character's head is controlled by the user's hand (optical tracking).
The hand tracker outputs the normalized position of the hand: a 2D vector of (0,0) means the hand is in the upper-left corner of the camera's view window, while (1,1) indicates the hand is in the lower-right corner. Modulation results in faster movements as the user's hand approaches the right or left borders of the view window. Moving the hand up and down directs the character's gaze in the same direction, but the motion speed is faster when the hand is in the center of the view window. In this configuration, we do not modify head rotation on the Z-axis. The whole configuration is stored as an interaction mapping register which can be reused and further modified. The main elements of the adaptive multimodal interface system are illustrated in Figure 4.15. Interaction mapping is done in a central component acting as a repository and interaction handler. It exchanges data between the interaction devices and the virtual environment application. Interaction devices are usually constituted by the device used to capture user input (microphone, PDA, camera, ...) and an interaction controller system that processes the raw input and normalizes -recognizes- it. Interaction controllers are responsible for communicating with the central interaction handler. This is done by sending the corresponding device -semantic- descriptor through a network connection. Once the central handler is aware of an interaction device, it can display the graphical representation of the semantic descriptor in the VPI. An analogous process occurs in the case of the VE application. Once the user loads a previously defined interaction mapping descriptor or creates a new one, the interaction handler starts processing the interaction data.


The central interaction handler modulates the data and forwards it to the corresponding input port of the Virtual Environment application. All communications are done through TCP sockets, allowing the implementation of a distributed system. The interface repository and handler is a Windows application programmed in C++ using the QT development framework. These two components synchronize over the network with the VB application, which provides the visual programming front-end. XML processing is done using the Xerces [7] and Pathan [42] libraries. They allow for parsing XML and evaluating XPath expressions for XML node selection. XPath [151] is a language for addressing parts of an XML document, pattern matching and string manipulation. This way we implemented basic database functionalities (queries, updates, ...) for the semantic descriptor repository. In the next section we describe some examples of adaptive multimodal interfaces implemented with our system.
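As a rough sketch of the transport only (an assumption on our part; the actual system relies on QT networking classes, whose calls we do not reproduce here), a descriptor update can be framed as a length-prefixed XML string written to a TCP socket with the standard BSD API:

#include <arpa/inet.h>
#include <cstddef>
#include <cstdint>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>

// Send one XML semantic-descriptor update over an already connected TCP
// socket, prefixed with its length in network byte order.
bool sendDescriptor(int socketFd, const std::string& xml) {
    uint32_t len = htonl(static_cast<uint32_t>(xml.size()));
    if (send(socketFd, &len, sizeof(len), 0) != static_cast<ssize_t>(sizeof(len)))
        return false;
    std::size_t sent = 0;
    while (sent < xml.size()) {
        ssize_t n = send(socketFd, xml.data() + sent, xml.size() - sent, 0);
        if (n <= 0)
            return false;
        sent += static_cast<std::size_t>(n);
    }
    return true;
}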

Figure 4.15: Architecture of the multimodal interface system: data exchange and storage is done through semantic descriptors.


Adaptive multimodal interfaces in action

This section shows the feasibility of our system and the benefits of using a semantics-based representation of Virtual Environments. The examples are based on 3D virtual worlds, but the principles are applicable to any multimedia environment.

Gestures-based interface

We use optical tracking of facial gestures to animate a virtual character. Tracking is based on the calculation of the optical flow between two images for a manually selected set of points. We use the implementation of the method described in [22], which is included as part of the OpenCV library [111]. Tracked points correspond to representative facial features such as the eyebrows. More robust algorithms, such as the one described in the reference work of Goto et al. [54], could be used as well. The application is based on the MPEG-4 body animation engine developed in the framework of the IST-INTERFACE project [64]. The stand-alone demo of emotional body gestures is transformed into an interactive application using a gesture-based interface. The virtual character displays the user's emotions recognized by the feature tracker.

Pen and gesture interface

In this application a handheld device is used as a direct manipulation tool for interaction within 3D virtual worlds. The interface is complemented with a gesture-based interface (head tracking). The implementation is based on the "Mobile animator", a PDA-based interface presented in [66]. Head tracking is done using the same optical flow algorithm referenced before. This example (Figure 4.17) shows the flexibility of the system for handling multiple interaction devices. Interaction data can be applied to a set of functionalities belonging to the same type, such as the joints of a virtual character.


Figure 4.16: Facial gestures interface: recognized emotions are mapped to the emotional behavior of a virtual character.

Output data coming from a different device can be mapped to a specific attribute as well, e.g. output from head tracking is mapped to the rotation angles of the skullbase joint.

4.4 Conclusions

We have presented a semantics-based representation for interactive Virtual Environments. We base our contribution on considering a VE as a group of entities that are characterized by a set of functionalities -semantics. The geometric representation is considered as an attribute, in contrast to conventional representations that privilege the geometry as the basis of the model, e.g. scene-graph-based representations. The system we have presented shows that a formalization based on the functionalities of the entities under control eases the configuration of multimodal interfaces.


Figure 4.17: Pen and gestures interface: direct manipulation of 3D objects through a multimodal interface.

Our system is adaptive in the sense that it enables run-time changes to the interaction techniques: the mapping between data from interaction devices and customizable attributes -semantics- of the entities under control. The proposed meta-interface (visual programming) is an efficient front-end to the underlying formal representation of the interaction mappings. Interface adaptation -reconfiguration- is easily achieved through a semantics-based representation. The difficulty remains in authoring the semantic content, which requires creating multiple representations for the different functionalities. For instance, the emotional gestures example needs different animations to be prepared in advance. A semantic representation is not only useful for interfacing with the content, as shown in this Chapter. Such a formalization could also be used for content adaptation (transcoding), see Chapter 5.


The functionalities and associated information can be used as criteria to translate content to different formats: adaptive rendering. For example, a 3D virtual character could be transcoded into a text description based on its semantic functionalities, e.g. "This virtual character is named Peter and is displaying his joy of working with you". Our work aims at evolving the conception of virtual environments as 3D graphics into a more descriptive representation based on the meaning -semantics- of the content instead of its appearance.

Chapter 5

Semantics for visualization

5.1 Introduction

This Chapter describes a lossless coding scheme designed to produce a new kind of semi-interactive 3D content. We emphasize the possibility of visualizing high-quality content on a variety of terminals ranging from powerful graphics workstations down to thin clients such as set-top boxes and mobile devices. The coding scheme we propose gives new expression possibilities to content producers. Building upon previous work on handheld interfaces (see Chapter 4), we also propose an intuitive interaction mechanism for controlling the interactive content. The coding scheme presented in this chapter can be incorporated into a semantics-based representation for VEs, and be the foundation for a transcoding method (semantic rendering) that takes into account not only the geometry, but also the interaction capabilities and other functionalities of each virtual entity participating in the scene or application. The use of semantic information about the entities in a VE allows for obtaining alternative representations suitable for visualization in different contexts.


The coding scheme described in this chapter provides one of those possible alternative representations. This representation of VEs can be used in the context of semi-interactive applications.

5.2 Semantic Rendering

An active self-contained entity such as the digital items defined in our semantic model (see Chapter 3) can be visualized in a variety of contexts. The term visualization can be associated with the concept of rendering. The term rendering is commonly used to designate algorithms and technologies for visualizing 3D or 2D geometry (realistic 3D rendering for films and games) or documents (web page rendering). The American Heritage Dictionary of the English Language defines rendering as a depiction or interpretation (as in painting or music), or as the act of expressing or representing something in another language or form. In this sense, rendering has a broader meaning that includes representing virtual entities not only as 3D geometry but also in other forms such as text, sound or even tactile information. The semantic descriptors defined in Chapter 3 can be used to associate with a digital item information on how to render it in graphical (images or text), tactile (for use with haptic interfaces), or auditory form (speech description, associated sounds). Virtual entities augmented with such semantics can be visualized or rendered in multiple contexts, terminals and applications. Semantic rendering offers possibilities similar to the process of transcoding. Transcoding consists of converting a media file or object from one format to another. Transcoding is often used to convert video formats (Beta to VHS, VHS to QuickTime, QuickTime to MPEG) [147].


Transcoding can also be used to enable mobile devices that have low resolution and low bit-rate capabilities to access content such as HTML and graphics files originally created for stationary, desktop clients with high-bandwidth connections [129]. In this scenario, transcoding is performed by a transcoding proxy server or device, which receives the requested document or file and uses a specified annotation to adapt it to the client [31]. Semantic rendering extends the concept of transcoding to virtual entities within an interactive virtual environment, a semantic virtual environment. In some sense, semantic rendering could be considered as an object-based transcoding technique [40], [148], with the difference that semantic rendering focuses on changing the representation modality of each object in the scene rather than its compression ratio, which is the concern of typical object-based transcoding algorithms applied to video content delivery. The goal of semantic rendering is to drastically change the representation modality (3D, 2D, audio, haptics) of virtual objects in order to be able to render the same digital item on a wide range of devices and in different contexts of application. In contrast, transcoding techniques focus on video content and try to optimize the bit-rate and quality under different circumstances such as variable network bandwidth and terminal capabilities (resolution, processing power, etc.). Semantic rendering considers different aspects related to the representation of an entity in a virtual environment: graphical appearance, animation, audio and tactile representations, interaction and functionalities within the scene. In this chapter we concentrate on defining a method to encode -represent- semi-interactive content.
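A minimal sketch of the idea (our illustration, with an assumed terminal-profile structure): given what the target device can present, the renderer picks the richest modality available and falls back to simpler ones.

enum class Modality { Geometry3D, Image2D, Audio, Text };

// Assumed terminal profile: what the target device can actually present.
struct TerminalProfile {
    bool supports3D = false;
    bool supportsImages = false;
    bool supportsAudio = false;
};

// Pick the richest representation modality supported by the terminal,
// falling back to a 2D image, then audio, and finally a text description.
Modality chooseModality(const TerminalProfile& t) {
    if (t.supports3D)     return Modality::Geometry3D;
    if (t.supportsImages) return Modality::Image2D;
    if (t.supportsAudio)  return Modality::Audio;
    return Modality::Text;
}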


5.3 Augmented CGI Films: a general-purpose representation for high-quality animated content

This section describes a set of coding and descriptor schemes for a new kind of multimedia application that we call Augmented CGI (Computer Generated Imagery) Films. Augmented CGI films are the result of blending the best of three popular multimedia products: Virtual Environments, video games and Computer Generated films. Virtual Environments (VE) are interesting due to the experience they create by mixing impressive graphics, sound, textual information and interaction. Good video games can immerse users for days, even weeks, in an engaging multimedia world. While VEs provide the most interactive experiences, they usually fail to present rich stories. Complex narratives in video games are sacrificed for the sake of interaction. The opposite occurs with CGI films. Also known as Computer Animation (CA) films, they immerse the spectator in a compelling multimedia experience by means of impressive special effects, showing images and shots that would be impossible to make in the "real world". CA films concentrate on conveying a message, a narrative with different degrees of complexity, but leave interaction aside. Researchers have already started to study this question: the tension between author and user control of narratives in VEs and multimedia systems such as films. Steiner and Tomkins [138] proposed a system architecture to better balance user freedom and author control: adaptation of narrative event presentation. They chose to stay on the side of VEs and adapt the way events are presented to drive the user's attention and achieve better comprehension of a story.


These authors believe that keeping enough interactivity -free exploration of the virtual world- is key to achieving immersion and engagement. However, they report that despite the narrative event adaptation, the level of user comprehension of narrative events is still higher in non-interactive multimedia systems (movies) than in fully interactive VEs. We have chosen the side of CGI movies and decided to introduce certain levels of interaction and additional information. Our goal is to give the user alternative ways to interpret the story without sacrificing either narrative comprehension or the story itself. Figure 5.1 shows where we situate our proposal compared to video games and CGI films. The main objective of a film is to tell a story. More precisely, a particular interpretation of a story, which is the result of the director's selection of scenes and camera paths. We believe that giving the spectator the option to select from multiple camera paths could be an interesting added value: it would introduce some level of personalization of the experience. It could also add new expressivity dimensions to this art. The director could propose alternative versions of the story in the form of different camera paths. This way the scene is presented in different ways, adapted to different user profiles (age, culture, preferences, etc.). For example, a scene of a football game could be seen using a camera situated on the highest point of the stadium, or through a camera that closely follows the ball. Each of the two camera paths offers different aspects of the action, different interpretations and emotions. The idea of providing multiple camera paths was originally proposed for DVD films. However, it is rare to have this option due to the high costs of filming a live scene using different cameras simultaneously and the limited space available on the media.


Figure 5.1: Augmented CGI films are in the middle of the multimedia applications spectrum: they are augmented in the sense of providing more interactivity than classical CGI films and more narrative complexity than video games.

Typical DVDs are able to store only about 2 hours of video, that is, only one camera path. Nevertheless, having multiple camera paths seems to be the natural next step for CGI films. We call them 3D animation movies, but we only watch them as 2D images from a single camera path, the one selected by the director. We enjoy watching these impressive 3D models but, in contrast to games, we cannot take advantage of their third dimension. We believe the coding and film descriptors we propose here provide a good foundation to enrich -augment- the expressivity and interactivity of CGI films. Of particular interest is the notion of profiling based on the camera paths.


The same animation could be rated for a general public or a restricted audience according to the camera path being used. For example, violent scenes could be shown using a camera path that hides the most aggressive images.

Several problems must be solved before we can create augmented CGI films. First, there is the question of defining an efficient coding scheme to represent high-quality animated 3D geometry. The animation and geometry representation must be compact and simple enough to be decoded on a user terminal in real-time. Target platforms for this new medium would be digital set-top boxes [108], which are powered by chips with limited performance and capacities, e.g. no Floating Point Unit. These are expected to be low-cost, consumer-oriented hardware, which usually translates into low-performance components. Moreover, for the coding scheme to be truly useful, the encoding tools should be seamlessly integrated into the current production tools. Making a CGI movie is a complex and long process involving many design, production and post-production steps. For the moment we concentrate on a production pipeline centered on a single modeling and animation tool. The movies we are able to encode are animated scenes created within such a tool, Alias Maya in this case.

Second, we need a way to exploit the 3D animation by letting the user watch the story using different camera paths, and to provide pertinent information about them and the scene. This information will be coded in the form of a film descriptor based on the MPEG-7 standard.


5.3.1 Coding of 3D animation

When reviewing the state of the art on 3D animation coding, it is natural to be pointed towards compression techniques. The ideal very low-bit-rate coding scheme capable of representing high-quality 3D images is still to be defined. Partial solutions have been found for very efficient compression and transmission of 3D interactive content. Most of the compression algorithms use a polygonal mesh representation of 3D objects. This representation is widespread because it is supported by the hardware of most of the graphics rendering cards available in the market. Compressing animated 3D geometry is still an open issue, and our proposal does not focus on obtaining high compression rates but on defining a simple generic representation. Our search for efficient coding schemes pointed us to the MPEG-4 specification [82]. MPEG-4 provides very low-bit-rate coding for virtual characters [29], [30] and efficient compression for general 3D shapes. However, the MPEG-4 coding scheme does not support advanced animation techniques, neither for virtual characters nor for general 3D objects; this forces designers to produce contents of low visual quality, created with the limited number of supported animation techniques. Efforts have been made to incorporate advanced animation techniques into the specification, but they have mainly focused on the animation of virtual characters, e.g. Bone Based Animation [122] or morphing. Furthermore, the MPEG-4 specification has grown too much to be implemented in a lightweight terminal (decoder), and the same applies to the encoder. In fact, the Moving Picture Experts Group has recently identified the need for a lightweight representation for interactive scenes [83], [104].
While focusing on mobile applications and mainly 2D content, these initiatives show the need for a generic and simple representation for animated 3D content.

Research concerning 3D geometry coding and compression includes the work of Rossignac on the Edgebreaker algorithm [128]. Focused on polygonal mesh compression, this algorithm is less complex and easier to implement in a lightweight terminal than the previous compression algorithms cited by the author. A detailed report on 3D compression algorithms can be found in [142]. Nevertheless, this research concentrates on static geometry compression. Lengyel was one of the first to consider time-dependent geometry as a streaming medium and proposed a technique for compressing data streams of dynamic meshes [94]. His algorithm is based on compressing the motion of the vertices of the animated meshes by means of a predefined set of fitting predictors. The method groups the vertices and computes the transformation that best matches the average evolution of the group. Corrections are then streamed together with the predictors. Lengyel affirms that in most cases the best possible compression would be to encode the modeling primitives used to create the animation. However, he concludes that it would be unfeasible to implement every single primitive on the run-time engine -the user terminal- since this component must remain as generic and fast as possible. The Dynapack algorithm [75] performs a space-time compression of animated triangle meshes with fixed connectivity by means of a single predictor for all of the vertices and for all key frames. This approach differs from Lengyel's algorithm and the other ones cited by the authors in the sense that it avoids the need for grouping vertices to be animated with different predictors -the ones that best fit the motion of each group.
Applying a single generic method -a single predictor- for decoding the animation simplifies the implementation of the user terminal. In this sense, the Geometry Videos technique proposed by Briceño et al. [26] goes one step further. Not only does it decode the animation by applying a generic algorithm, it also takes advantage of existing mature video processing and compression techniques to increase the compression ratio. However, the technique is viable only for a certain class of animation: the algorithm assumes all the animated meshes have fixed connectivity, and it still has to be improved in order to encode a larger class of animations in an automatic and efficient way. Karni and Gotsman [87] defined a compression scheme based on the principal component analysis (PCA) method. They represent the animation sequence (exploiting spatial correlation) using a small number of basis functions, and they exploit temporal correlation to increase the compression rate by using second-order linear prediction coding (LPC). While their algorithm achieves higher compression rates than other algorithms such as Dynapack [75], the encoding time is longer. The authors argue this is not an issue since content producers can afford multiple workstations for parallel processing.

The works cited before were focused on maximizing the data compression, i.e. the bits-per-vertex rate. These algorithms have been tested in a research context situated at some distance from film production pipelines. The authors propose methods for compressing 3D animation but do not consider problems such as how to integrate their techniques into the production tools and user context.
Moreover, the best techniques are usually lossy compression algorithms; they can degrade the quality of the animation and hence do not fulfill our requirements. The compression and coding scheme we are looking for should keep an equilibrium between the following factors: fast decoding (an acceptable frame rate on a user terminal, at least 25 fps), adaptability to the digital content production pipeline and, last but not least, the ability to encode virtually any animation effect. It should be generic enough to be integrated into modeling and animation tools and should not impose any restriction on the designers. Creating an animated sequence requires applying many different animation techniques. We need a coding scheme and an encoding tool that integrate seamlessly into the production stage. In the next section we describe our proposal for representing 3D films.

5.3.2 A coding scheme for augmented CGI films

The representation for 3D films that we have adopted is a straightforward approach which allows for transparent integration into production tools and can be implemented on low-end hardware since it does not require floating-point operations. It is important to stress the fact that this algorithm does not impose any limitation on the animation techniques used by the digital artist: physics-based animation, morphing, bone-based animation, etc. We work with the final version of the animation and do not need to know how it was produced. The coding algorithm focuses on the faithful reproduction (lossless coding) of a broad range of animation effects created with state-of-the-art modeling tools such as Maya.
We have in mind the possible implementation of the decoder in system-on-a-chip multimedia viewers (e.g. ATI's Xilleon [11]) and mobile devices such as PDAs and cellular phones, which usually lack a Floating Point Unit and have reduced computing power. Even the multimedia-enabled mobile devices powered by the new breed of graphics co-processors (e.g. ATI's Imageon 2300 [10]) will still benefit from simple decoding algorithms. As mentioned in the introduction, we tried to minimize the floating-point operations required to decode the 3D film. This prevented us from using more efficient compression algorithms such as the ones cited in section 5.3.1. Moreover, high-compression-rate algorithms tend to sacrifice the versatility of the encoder: they reduce the variety of applicable animation techniques -see the case of MPEG-4 in section 5.3.1- and/or impose relatively high hardware requirements. The 3D animation coding scheme we propose goes in a direction similar to the call for proposals from MPEG-4 for a lightweight scene representation [104]. In order to reproduce 3D animated sequences we need to represent first the geometry and then its changes over time, the animation. The next subsections detail both parts of our coding scheme.

Geometry coding

The geometric representation we have adopted is based on the indexed face set format -a polygonal mesh- and contains a list of points in 3D or 2D, an array defining the mesh, the texture file to map and the uv coordinates. The format resembles a simplified VRML file. The reason for using a polygonal mesh representation is explained in section 5.3.1. The objects are defined in global coordinates. All transformations -translation, rotation, vertex displacements, etc.- are defined by transition functions (polynomial curves).


CGI_film {
    string        filmName
    unsigned int  framesInFilm
    unsigned int  frameRate
    CGI_scene     scenes[ ]
}

CGI_scene {
    unsigned int  initFrame
    unsigned int  endFrame
    CGI_object    objectsInScene[ ]
}

CGI_object {
    string             objectName
    quantized_float    vertices[ ]
    unsigned int       coordIndex[ ]
    quantized_float    textureCoordinates[ ]
    CGI_texturedShape  texturedShapes[ ]
}

CGI_texturedShape {
    string        textureImageURL
    unsigned int  texCoordIndex[ ]
}

Figure 5.2: Coding syntax for the geometry of augmented CGI films.

The data structure for the geometry is defined in figure 5.2. CGI_film is the main geometry container; it provides basic information such as the film's name, the number of animation frames, the frame rate and a pointer to an array of CGI_scene nodes. CGI_scene is the basic building block of an augmented CGI film. It defines the 3D objects present during a certain frame interval. Dividing the film into scenes -or chapters- allows us to associate semantic descriptors which will provide information on a per-chapter basis.
The chapter descriptors will contain text descriptions and, in particular, the camera paths available to watch the scene from different viewpoints, see section 5.3.2. CGI_object defines a 3D object that participates in a scene, whether it is animated or not. CGI_objects are constituted of a single mesh, without a skeleton or any other hierarchic structure. vertices is a list of quantized floats (integer values used to represent floats) specifying the vertex coordinates (3D vectors). coordIndex is a list of integers containing the indices to the vertices describing the facets of the object's mesh; -1 is used as a separation character, e.g. 0,1,2,-1,2,1,3,-1 describes two triangles. textureCoordinates is a list of uv coordinates used to map the texture over the object's mesh. Several textures can be mapped over different mesh sections of an object. We "bake down" the output of the rendered texture, so that each single texture has every rendered feature: shadows, scattered light, illuminance, transparency, etc. Textures can be updated during the animation to reproduce lighting effects such as shadows without the need to calculate them on the client side. texturedShapes is a list of CGI_texturedShape nodes containing the URL pointing to the image texture and the indices to the textureCoordinates list used to set the uv coordinates on each vertex.
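To make the description above more concrete, the following minimal sketch shows how a decoder might consume this geometry block: it rescales the quantized vertex coordinates and splits the coordIndex list at the -1 separators. The sketch is written in Java (the language of our prototype implementations); the quantization step and the class name are assumptions made for this example and are not part of the coding scheme specification.

import java.util.ArrayList;
import java.util.List;

class GeometryDecoderSketch {
    // Assumed quantization step; in a strictly fixed-point decoder this
    // rescaling could be kept in integer arithmetic.
    static final float QUANT_STEP = 0.001f;

    // Quantized floats are stored as integers; recover approximate coordinates.
    static float[] dequantize(int[] quantized) {
        float[] v = new float[quantized.length];
        for (int i = 0; i < quantized.length; i++) {
            v[i] = quantized[i] * QUANT_STEP;
        }
        return v;
    }

    // coordIndex uses -1 as a facet separator, e.g. 0,1,2,-1,2,1,3,-1.
    static List<int[]> splitFacets(int[] coordIndex) {
        List<int[]> facets = new ArrayList<int[]>();
        List<Integer> current = new ArrayList<Integer>();
        for (int idx : coordIndex) {
            if (idx == -1) {                       // end of one facet
                int[] facet = new int[current.size()];
                for (int i = 0; i < facet.length; i++) facet[i] = current.get(i);
                facets.add(facet);
                current.clear();
            } else {
                current.add(idx);
            }
        }
        return facets;
    }
}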

Animation coding

We have defined a vertex-based animation coding. The trajectory of each vertex in the scene is coded using polynomial curves, one per vertex component. The algorithm first tries to fit the trajectory of each vertex component to a polynomial curve.

CGI_animation {
    string                  objectName
    nibble                  contentToUpdate
    CGI_transitionTexture   textureUpdates[ ]
    unsigned int            initFrame
    unsigned int            endFrame
    CGI_transitionFunction  transFunctions[ ]
}

CGI_transitionFunction {
    nibble           component
    unsigned int     polynomDegree
    quantized_float  polynomialCurve[ ]
    unsigned int     vertexCluster[ ]
}

CGI_transitionTexture {
    unsigned int       textureIndex
    CGI_texturedShape  textureShape
}

Figure 5.3: Vertex-based animation coding syntax.

Our tests showed that parabolic interpolation allows the trajectories -transition functions- of the animated vertices to be coded in a compact way. After this first step, we end up with a set of parabolic equations (second-degree polynomials) describing the trajectory of each vertex component for every vertex in the animation. Each transition function describes a vertex component trajectory for at least three animation frames (three points are needed to calculate a parabolic equation). Vertices which do not move have no transition functions; some others are described using only lines (first-degree polynomials).
In many cases a single parabolic equation is able to describe a vertex component trajectory for more than four animation frames. This depends on the nature of the animation: the difference between subsequent positions of each vertex. In order to increase the compression ratio, the vertices with the same transition function are grouped into clusters. This is done in a second step, where we analyze the whole set of transition functions and group vertex components into clusters animated by the same transition function. This is illustrated in figure 5.4.

The animation encoding also includes the rendering of the surface textures in order to "bake down" the lighting effects. At each animation frame, texture images are compared to the previous ones and, if they differ by more than a certain epsilon, a new texture update is registered together with the corresponding image. Image comparison is done on a per-pixel basis: two texture images are significantly different if they have a certain percentage of different pixels, and two pixels are different if the Euclidean distance between them surpasses a certain threshold (pixels are represented by three coordinates: the RGB components).
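The encoder-side fitting can be sketched as follows. This is only a minimal illustration of the principle, not the exact implementation: it fits a parabola exactly through three frames of one vertex component and then extends the validity interval while later frames stay within an error tolerance; the tolerance and the class name are assumptions.

class ParabolicFitterSketch {
    // Coefficients {a, b, c} of p(t) = a*t*t + b*t + c passing exactly
    // through (t0,y0), (t1,y1), (t2,y2).
    static double[] fitThreePoints(double t0, double y0,
                                   double t1, double y1,
                                   double t2, double y2) {
        double a = ((y2 - y0) / (t2 - t0) - (y1 - y0) / (t1 - t0)) / (t2 - t1);
        double b = (y1 - y0) / (t1 - t0) - a * (t0 + t1);
        double c = y0 - a * t0 * t0 - b * t0;
        return new double[] { a, b, c };
    }

    // Extend the interval covered by one transition function while the
    // prediction error stays below eps (samples[f] is the value of the
    // vertex component at frame f).
    static int lastFrameCovered(double[] p, double[] samples, int start, double eps) {
        int last = start + 2;                      // three frames used for the fit
        for (int f = last + 1; f < samples.length; f++) {
            double predicted = p[0] * f * f + p[1] * f + p[2];
            if (Math.abs(predicted - samples[f]) > eps) break;
            last = f;
        }
        return last;
    }
}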


The syntax used to represent the animation functions and texture updates is described in figure 5.3. CGI_animation is the basic update message; it indicates the object to which the current update is to be applied. contentToUpdate is a flag that indicates whether the update contains a set of transition functions and/or texture updates for the current object. textureUpdates is a list of CGI_transitionTexture nodes which indicate the index of the texture that will be updated with the new textureShape node. initFrame and endFrame define the interval in which the list of transition functions (transFunctions) will be applied. CGI_transitionFunction indicates the list of vertices (vertexCluster) to be animated by evaluating the polynomial defined in polynomialCurve (a list of polynomDegree + 1 coefficients).
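On the decoder side, applying an update is essentially one polynomial evaluation per cluster and per frame. The following sketch follows the field names of figure 5.3, but the surrounding code (vertex storage layout, component encoding, coefficient ordering) is an assumption of the example.

class TransitionDecoderSketch {
    // Evaluate one CGI_transitionFunction at the given frame and write the
    // result into the affected component of every vertex in the cluster.
    // Assumed layout: vertices stored as x0,y0,z0,x1,y1,z1,...; component
    // encodes 0 = x, 1 = y, 2 = z; coefficients stored lowest degree first.
    static void apply(float[] vertices, int component,
                      double[] polynomialCurve, int[] vertexCluster, int frame) {
        double value = 0.0;
        for (int i = polynomialCurve.length - 1; i >= 0; i--) {
            value = value * frame + polynomialCurve[i];   // Horner evaluation
        }
        for (int v : vertexCluster) {
            vertices[3 * v + component] = (float) value;
        }
    }
}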

Figure 5.4: Vertices follow parabolic trajectories that can be grouped into clusters.

Coding camera paths through semantic descriptors

The syntax described in the previous section lets us define the animation of 3D objects. However, the added value of augmented CGI films is the possibility to view the scene from different viewpoints (camera paths), introducing a new interactivity dimension. For this we need a means to associate information with the pure animated geometry data we have defined. The goal of the MPEG-7 standard is to allow interoperable searching, indexing, filtering and access of audio-visual (AV) content. MPEG-7 specifies the description of features related to the AV content as well as information related to its management [131]. It has been selected as the best way to describe the "augmented" features of our coding scheme. This section details the MPEG-7-based (semantic) descriptors we have defined for this purpose. The information we associate with the animation and geometry data is the following: a text description of the film and of each of its chapters (the CGI_scenes which are the building blocks of the film, see section 5.3.2), plus the available camera paths per scene.

Figure 5.5: XML-based descriptors for Augmented CGI films.

We decided to propose a new descriptor for the camera paths since the ones defined in the Visual [2] and the Multimedia Description Schemes [116] parts of the standard were created to describe 2D video. We did not find them suitable for specifying camera paths in 3D. The parameters needed to define a camera view in a 3D application are those used to create the view matrix (eye point, look-at point, "up" direction) and the projection matrix (field of view, aspect ratio, near and far view planes) [102]. They cannot be mapped to the camera motion parameters defined in the MPEG-7 CameraMotion descriptor [2], which defines basic camera operations such as fixed, tracking, booming, dollying, panning, tilting, rolling and zooming. Each scene in an augmented CGI film must have a short text description associated with it, as well as the definition of the available camera paths.
MPEG-7 provides descriptors to incorporate semantics (text descriptions) into video sequences; we will use them for our animation sequences. The new descriptor we propose will be used to describe the camera animation, taking into account the parameters required to construct the view and projection matrices. To describe the path and animation sequence of a camera, we must keep track of the changes of its position/orientation (view matrix parameters) and of projection matrix parameters such as the field of view. We follow the same approach used to encode the geometry animation and describe the changes in camera parameters as polynomial curves. Modeling/animation tools describe camera paths by means of spline curves, which are easy to handle by designers and provide a compact representation. We avoid using this representation to keep the decoder simple and minimize the floating-point operations: evaluating a polynomial is less expensive than interpolating a spline curve and, moreover, the evaluation routine is already implemented in the decoder to reproduce the animation. Figure 5.5 shows a schematic view of the descriptors we have defined. Appendix A shows the CGI_film descriptor defined using the MPEG-7 Description Definition Language (DDL). A sample descriptor is shown in the same appendix as well.
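As an illustration, a described camera path can be reduced to one polynomial per camera parameter (the components of the eye, look-at and up vectors, plus the field of view), evaluated per frame with the same routine used for the vertex animation. The class below is only a sketch; the actual descriptor structure is the one given in Appendix A.

class CameraPathSamplerSketch {
    double[][] eye, lookAt, up;   // one coefficient array per vector component
    double[] fieldOfView;         // polynomial for the field of view

    static double eval(double[] poly, int frame) {
        double v = 0.0;
        for (int i = poly.length - 1; i >= 0; i--) v = v * frame + poly[i];
        return v;
    }

    // Returns {ex,ey,ez, lx,ly,lz, ux,uy,uz, fov} for the requested frame,
    // i.e. everything needed to rebuild the view and projection matrices.
    double[] sample(int frame) {
        double[] out = new double[10];
        for (int i = 0; i < 3; i++) {
            out[i]     = eval(eye[i], frame);
            out[3 + i] = eval(lookAt[i], frame);
            out[6 + i] = eval(up[i], frame);
        }
        out[9] = eval(fieldOfView, frame);
        return out;
    }
}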

5.3.3 User interface for augmented CGI films

The previous section presented the coding and description representations we have defined to create augmented CGI films. We must now consider the problem of controlling and interacting with such content.

Figure 5.6: User interface for Augmented CGI films.

Since we are talking about films, the most natural interface would be something similar to the classical remote control used to operate most home electronic devices such as TV sets or DVD players. The majority of remote controls rely on text displayed on the TV screen and/or on small displays embedded in the devices (e.g. the VCR display), but not on the remote control itself. We have decided to apply the PDA-based interface presented in Chapter 4. Thanks to the semantic layer, the user has full control of the objects in the Virtual Environment. The PDA eliminates the need for overlaying menus or other widgets that obstruct the main display screen. A handheld interface introduces the "division of attention" problem: the user risks concentrating more on the PDA than on the film. To avoid this, our interface is designed in such a way that camera paths are selectable only at the beginning of the film or on a per-scene basis. At the end of each scene an overlaid logo informs the user that camera path selection is available; only during this time (a few seconds) can a new camera path be selected with the PDA, and it will be applied to the next scene.
Otherwise, the user has to stop the film and select a camera path before starting a new scene. Based on this technology, we extended the paradigm of the remote control and provided an interface that allows for selecting the camera path to apply. The text descriptions associated with the film (title, duration, etc.) and with each camera view, if any, are presented on the PDA display as well.
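The selection logic can be summarized by the following sketch: a camera path selected on the PDA is accepted only while the current scene is inside its final selection window, and it takes effect when the next scene starts. The window length and class name are assumptions made for this illustration, not values prescribed by the system.

class CameraPathSelectorSketch {
    static final int SELECTION_WINDOW_FRAMES = 75;   // about 3 seconds at 25 fps (assumed)
    private int pendingPath = -1;

    boolean selectionOpen(int frame, int sceneEndFrame) {
        return frame >= sceneEndFrame - SELECTION_WINDOW_FRAMES && frame <= sceneEndFrame;
    }

    // Called when the PDA sends a camera path identifier.
    void onPdaSelection(int cameraPathId, int frame, int sceneEndFrame) {
        if (selectionOpen(frame, sceneEndFrame)) {
            pendingPath = cameraPathId;               // applied at the next scene
        }
    }

    int pathForNextScene(int currentPath) {
        int next = (pendingPath >= 0) ? pendingPath : currentPath;
        pendingPath = -1;
        return next;
    }
}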

Figure 5.7: Main components of the implemented prototype.

5.3.4 Conclusions

We have presented a coding scheme and semantic (XML) descriptors for 3D animated scenes -CGI films- that are augmented through the incorporation of alternative viewpoints customizable by the user, as well as additional information about the scene. Augmented CGI films are the result of blending Virtual Environments and Computer Generated films.
We introduce some level of interactivity, a typical VE feature, while keeping high-quality animated graphics. We believe this gives content producers new expression possibilities to tell a story and create multimedia experiences. The coding scheme presented in this chapter can be incorporated into a semantics-based representation for VEs and be the foundation for semantic rendering, a transcoding method that takes into account not only the geometry, but also the interaction capabilities and other functionalities of each virtual entity participating in the scene or application.

Chapter 6

Semantics for Animation

6.1 Introduction

This Chapter presents our contribution to the area of Computer Animation (CA). CA appears in many contexts including gaming, scientific simulations, and as a medium for artistic animation. Motion and reactivity of entities within a Virtual Environment are essential to the believability and immersion of the application. Computer Animation involves a complex body of techniques and algorithms. Almost all computer animation today is done using keyframe systems evolved from the early 1980s. Only recently have advances such as full inverse kinematics, dynamics, flocking, automated walk cycles and 3D morphing made the leap from the academic to the commercial sector [139]. Computer Animation can be considered from two different perspectives, depending on whether it is constrained to work in real time or not. The contribution of this Chapter focuses on real-time animation of virtual characters.
Character animation is one of the most difficult problems in the field. Simulating natural living motion has led to a multitude of solutions, each attacking different aspects of the problem. We concentrate on the need for spontaneity and reactivity of virtual entities. Virtual entities that react to their environment can be used to build better human-machine interfaces. This Chapter presents our approach toward the synthesis of virtual characters capable of performing reflex movements as a reaction to events and environmental stimuli. First we describe the algorithms and techniques we developed to generate reflex movements (previously described in [67]). The second part of the Chapter explains how our technique fits within the semantics-based model presented in Chapter 3.

6.2 Reflex Movements: Representation and Algorithms for Reactive Virtual Entities

The animation of virtual humans, and computer animation in general, has always sought to produce realistic imagery and truly natural motion. One particularly difficult problem consists in animating a virtual human displaying a behavior that truly resembles a real human [12], [118]. Our goal is to animate virtual humans (body gestures) in an autonomous way, giving the illusion that the artificial human being is a living creature displaying spontaneous behavior according to its internal information and the external stimuli coming from its virtual environment. The proposed method is a real-time, distributed control system inspired by the human nervous system, able to produce natural movements in response to external and internal stimuli.
The advantage of the real-time control techniques over the "Computer Graphics movie" approach is their promise of combined autonomy and realism, as well as their focus on reusability. The autonomous control system we are presenting could be instantiated and used to animate several different characters in a variety of situations. The following is an overview of the different scientific developments focused on the study and generation of human postures and gestures. We distinguish two main trends: the biology-based (biomechanics studies) approach and the non-biology-based (more computer-graphics-oriented) approach.

6.2.1 The biology-based approach

Many biomechanical models have been developed to provide partial simulations of specific human movements: walking/running models, or simulations of dynamic postural control under unknown conditions [53], [92]. The biology-based approach has been used in research on biomechanical engineering, robotics and neurophysiology to clarify the mechanisms of human walking [109], and to model the muscular actuation systems of animals [71], [52] and human beings [27], [61]. Biologically inspired control architectures are one of the most recent trends in the field of robotics [107]. Models of the human nervous system, in conjunction with fuzzy logic, neural networks and evolutionary algorithms, have been used to better understand and control the way humans and animals move and execute typical actions such as walking, running, reaching or grasping [88], [6], [92].
Some other studies focus on the analysis of human motion to extract parametric models for posture recognition [73]. None of these developments has been specifically applied to create a general autonomous control system to drive the animation and behavior of a virtual human in the framework of a virtual reality application. However, they can provide the basis for a general motion control model.

6.2.2 The computer graphics approach

In contrast with the above-mentioned studies, the following citations are more related to the development of virtual reality and computer graphics applications in general. The most "traditional" techniques used to synthesize human gestures include the use of kinematics, dynamics or a combination of them [84], [90]. Inverse Kinematics has been used for the animation of complex articulated figures such as the human body; balance control and motion and posture correction are some of its applications [23], [126], [91]. These techniques are focused on the correction of predefined postures or common actions such as walking to make them more natural; however, they require specific goals to be predefined and do not consider spontaneous movements. They can be used as tools for animators but do not provide a system for autonomous motion. Another approach is the use of statistical analysis of observation data acquired by different means: motion capture systems, video or photographs. These methods use, in different ways, a database of pre-recorded movements to mix them, readapt them and reuse them in a variety of environments where the same kind of actions are required.
Again, the main types of movements studied are walking sequences and standard combinations of movements: walk-sit, sit-walk, etc. [12], [141], [9]. These approaches are not very suitable to be used as part of a system for the automatic generation of gestures.

Synthesizing autonomous gestures is related to the area of behavioral animation. By behavioral animation we refer to the techniques applied to the synthesis of autonomous animation of virtual characters. Autonomy is one of the most prized goals in the field of character animation. Several studies and implementations have been done with the objective of giving a virtual character the ability to perceive its virtual environment and react to it by executing adequate tasks [18], [144], creating the illusion that the synthetic character is alive: it moves by itself and achieves relatively complex tasks. Subtle and spontaneous gestures are part of low-level behaviors, and they are essential to communicate any message and convey emotions. These ideas are just beginning to be explored by the scientific community; some studies focus on non-human characters, such as dogs or other virtual animals [79]. One particularly difficult issue is the problem of what to do with the characters when there is no pre-defined task assigned to them. Research has been done to produce the so-called "idle animation": generating behavior to make the character move when there is no specific action to perform. Some work has focused on adding a pseudo-random "perturbation" to an idle posture [117], avoiding "frozen" characters. However, real people perform very different gestures when waiting for something or while attending or listening to someone.
The gestures performed during "idle states" depend on a variety of factors: emotional state, cultural background, the current situation (waiting for something, listening, or thinking), unexpected events (a change in the environment), etc., and can be classified as low-level behavior. Despite the advances in behavioral animation, autonomous virtual humans are not yet able to display the whole range of subtle gestures and mannerisms that characterize a real human being: facial expressions, body language, autonomous reactions, etc. Our work intends to advance the state of the art in the last category: providing virtual humans with the ability to react to unexpected events through reflex movements. After analyzing some of the existing techniques applied to the control and synthesis of human gestures, we see a promising direction in biologically inspired systems, especially when the goal of autonomy and naturalness in the motions is the first priority. In the state of the art we observe that the metaphor of the nervous system has been used to simulate and control the motion of a specific limb, with applications to medicine and/or robotics. We have followed this approach to build a general architecture for the synthesis of autonomous gestures. In the next section we describe our proposal in detail.

6.2.3 The virtual human neuromotor system

The main innovation of this work is the design of a distributed control architecture based on autonomous entities that intercommunicate with each other in a self-similar hierarchy (fractal architecture). We propose a control system inspired by the human nervous system which is used to generate autonomous behavior in virtual humans.
Biologically based control systems have been applied mainly to simulations oriented to robotics or to biomechanics applications. The human nervous system is divided into the central and peripheral nervous systems (CNS and PNS, respectively, see figure 6.1). The peripheral nervous system consists of sensory neurons running from stimulus receptors that inform the CNS of the received stimuli, and motor neurons running from the CNS to the muscles and glands -called effectors- that take action. The central nervous system consists of the spinal cord and the brain [89]. The CNS is a control center which receives internal and external stimuli and sends orders to the effectors by means of the motor neurons. We use this principle as the basic building block for a simplified model of the nervous system that acts as an animation controller for a virtual human.

Figure 6.1: The Central Nervous System.

The proposed model of the Virtual Human Nervous System is a distributed system capable of producing autonomous gestures (reflex movements) in reaction to a defined set of stimuli: external forces, temperature and muscle effort. This model constitutes what we call the distributed animation control system (DACS). In conjunction with the DACS we have implemented a simplified model of the human locomotion system in order to provide the DACS with a set of effectors to move the different body parts.


The Virtual Human Locomotion System models the human musculo-skeletal system as a skeleton structure, compliant with the H-Anim specification [69], whose joints can be moved by pairs of antagonist muscles, one pair for each degree of freedom. The implementation of the muscles gives importance to the effect they produce on the virtual human's joints, which is why they are called effectors. The main idea behind the DACS is the definition of a minimum control entity constituted by three main sub-entities or components: sensors, analyzers and effectors, defined as follows.

The Sensors are the virtual devices capable of gathering information from the exterior world of the virtual human, such as temperature, contact with other objects, external forces, etc., and also from the interior of the virtual human: stress, muscle effort, etc. They store the acquired information in the form of a stimulus vector, containing information on the intensity, direction and orientation of the received stimulus.

The Analyzers are entities which concentrate information coming from one or many sensors or analyzers. An analyzer's main function is to calculate the cumulative effect of the sensors attached to it. The reaction vector calculated by the analyzer is used by the effectors to generate the adequate reaction as a function of the intensity, direction and orientation of the stimulus received.

The Effectors are black boxes capable of affecting one or more degrees of freedom of the virtual human joints to which they are attached. Currently, they are implemented as inverse kinematics controllers that calculate the position of the joints as a function of the reaction vector calculated by the analyzer attached to them.
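A minimal sketch of these three roles is given below, in Java, the language of our prototype. The three-component vector representation and the method names are illustrative only; they are not the exact interfaces of the implementation.

interface Sensor {
    // Stimulus vector encoding intensity (magnitude), direction and orientation.
    float[] stimulusVector();
}

class Analyzer {
    java.util.List<Sensor> sensors = new java.util.ArrayList<Sensor>();
    java.util.List<Analyzer> children = new java.util.ArrayList<Analyzer>();

    // Cumulative effect of the attached sensors and lower-level analyzers.
    float[] reactionVector() {
        float[] r = new float[3];
        for (Sensor s : sensors) add(r, s.stimulusVector());
        for (Analyzer a : children) add(r, a.reactionVector());
        return r;
    }

    private static void add(float[] acc, float[] v) {
        for (int i = 0; i < 3; i++) acc[i] += v[i];
    }
}

interface Effector {
    // E.g. an inverse kinematics controller driving one or more joints.
    void react(float[] reactionVector);
}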


The components of the minimum control entity -sensors, analyzers and effectors- which constitute the basic building block of the DACS are shown in figure 6.2. This scheme is replicated at different hierarchic levels, providing main control centers (analyzers linked to lower-level analyzers) capable of taking high-level decisions. The control entity is capable of receiving (sensing) a set of stimuli or information, processing them and generating a reaction, which can be an order to an effector (a muscle or set of muscles) or a message to another control entity in the same or in a different hierarchic level. Each control entity (c.e.) is usually responsible for controlling one limb of the virtual human (arms, legs). The basic c.e. is reproduced in different hierarchic levels in order to coordinate the whole body. Figure 6.2 illustrates this idea: a hierarchy of control entities following a fractal-like structure (self-similar hierarchy). Each component works in an independent way; this characteristic allows the system to be distributed. The communication between control entities takes place depending on the intensity of the received stimuli; this emulates the neurotransmitter effect (the chemical substances which act as communication channels between neurons) and allows or blocks communication at different levels. The simulation of the neurotransmitter effect allows for modifying the intensity of the response to a given stimulus. For example, if the virtual human touches a hot object such as the grill of a stove, it will react with different speed or intensity depending on the actual temperature: touching a grill at 20°C is different from touching one at 150°C.
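The intensity-based gating can be sketched as follows: a reaction is propagated to an effector (or to a higher-level control entity) only when its intensity exceeds a transmission threshold, emulating the neurotransmitter effect described above. The threshold value and class name are assumptions of the example; the Sensor/Analyzer/Effector roles are the ones sketched earlier in this section.

class GatedLinkSketch {
    final float threshold;

    GatedLinkSketch(float threshold) { this.threshold = threshold; }

    // Returns true if the reaction was strong enough to be transmitted.
    boolean transmit(float[] reactionVector, Effector target) {
        float intensity = (float) Math.sqrt(
            reactionVector[0] * reactionVector[0] +
            reactionVector[1] * reactionVector[1] +
            reactionVector[2] * reactionVector[2]);
        if (intensity < threshold) return false;   // stimulus too weak: no reaction
        target.react(reactionVector);
        return true;
    }
}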


Figure 6.2: The basic control entity and the hierarchical structure of the DACS.

Spontaneous behavior can be generated depending on the overall conditions of the virtual human. Spontaneous movements, such as balancing the arms or shifting the body weight from one leg to the other in a standing posture, vary depending on the muscular and/or mental stress. People display different gestures when they stop after running or walking, depending on the amount of energy they have spent, among many other factors. These gestures can also be considered as reactions to a certain kind of stimuli. Until now we have explained the general principles of the DACS, which constitutes the nervous and motor system and provides autonomous animation to a virtual human depending on the internal and external stimuli and information coming from its environment.


In the next section we describe a test application that shows the feasibility of the proposed model.

6.2.4 Test application: reaction to thermic stimuli

To show the feasibility of using the Distributed Animation Control System as the neuromotor system of a virtual human, we have implemented a demonstration application which generates reflex movements for the arm as a reaction to thermic stimuli. The virtual human stands in front of a stove with its left hand over one of the burners. The temperature of the burner is modified in the different tests. The objective is to observe different levels of reaction depending on the perceived temperature. The reaction levels have been classified into three main regions depending on the stimulus intensity: a green zone, where no reaction is required; a yellow zone, where the reaction is controlled and the speed of the motion starts to increase depending on the stimulus intensity; and a red zone, where the reaction is more "violent" since the stimulus intensity reaches the highest values; at this level the joint limits are usually reached and a "bounding" motion is produced as a reaction. If the temperature of the burner is above 40°C, the reflex movements start to appear, ranging from a slight movement of the wrist to separate the hand from the burner, up to a violent, fast movement involving all the arm joints in order to withdraw the hand as soon as possible if the temperature rises to values around 100°C.

We used an H-Anim-compliant virtual human model and implemented an animation control for one arm (see figure 6.3), with temperature sensors (to receive external stimuli) in the palm of the hand.
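The mapping from the sensed temperature to the reaction zones described above can be sketched as follows. The 40°C boundary is the one used in the tests; treating 100°C as the start of the red zone and the linear speed scale are assumptions made for this illustration.

class ThermicReactionSketch {
    enum Zone { GREEN, YELLOW, RED }

    static Zone classify(float temperatureCelsius) {
        if (temperatureCelsius < 40f)  return Zone.GREEN;    // no reaction required
        if (temperatureCelsius < 100f) return Zone.YELLOW;   // controlled reaction
        return Zone.RED;                                     // fast, "violent" reflex
    }

    // Reaction speed grows with the stimulus intensity inside the yellow zone.
    static float reactionSpeed(float temperatureCelsius) {
        switch (classify(temperatureCelsius)) {
            case GREEN:  return 0f;
            case YELLOW: return (temperatureCelsius - 40f) / 60f;   // 0..1 (assumed scale)
            default:     return 1f;                                 // maximum speed
        }
    }
}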


Figure 6.3: The animation control for the arm.

The general algorithm used to calculate the reflex movement is the following. Analyzers and sensors are arranged in a tree structure. The sensors are sampled at each animation frame. The analyzer-sensor tree is traversed in post-order (first the children, then the local parent), allowing each analyzer to gather the information from the sensors attached to it. The process is repeated recursively, letting the analyzers concentrate the information coming from the lower levels. Each segment of the articulated character has a main analyzer. Segment analyzers are associated with the main limb analyzer, the one for the arm in this case. The limb analyzers are usually the local root of the sensor-analyzer tree and contain the overall reaction vector specifying the velocity of the reflex movement. The reaction vector contains the orientation and direction to be followed by the limb in order to react as required, i.e. rejecting the thermic stimulus.


The orientation and direction of the stimulus vector depend on the position of the sensor relative to the stimulus source; the orientation is inverted, representing the need to reject the stimulus source. The stimulus vectors are calculated as follows: the vector magnitude -the stimulus intensity- is a function of the Euclidean distance between the sensor and the stimulus source, represented by a point in 3D space. The intensity decreases with the distance; in our tests this is done in a linear way. The analyzers compute the reaction vector by calculating the vector addition of the stimulus vectors from their associated sensors.
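A possible implementation of such a sensor is sketched below: the stimulus vector points away from the source (inverted orientation) and its magnitude decreases linearly with the distance. The range constant is an assumption; the Sensor role is the one sketched in section 6.2.3.

class ThermicSensorSketch implements Sensor {
    float[] sensorPos = new float[3];
    float[] sourcePos = new float[3];
    float sourceIntensity;                  // derived from the source temperature
    static final float MAX_RANGE = 1.0f;    // assumed distance at which the stimulus vanishes

    public float[] stimulusVector() {
        float[] v = new float[3];
        float dist = 0f;
        for (int i = 0; i < 3; i++) {
            v[i] = sensorPos[i] - sourcePos[i];          // away from the source
            dist += v[i] * v[i];
        }
        dist = (float) Math.sqrt(dist);
        float intensity = Math.max(0f, sourceIntensity * (1f - dist / MAX_RANGE));
        if (dist > 0f) {
            for (int i = 0; i < 3; i++) v[i] = v[i] / dist * intensity;
        }
        return v;
    }
}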

The actual movement of the arm is calculated using inverse kinematics (IK) [43], [77]. The end effector required by the IK algorithm is the segment receiving the highest stimulus intensity -the magnitude of the reaction vector- as calculated by the segment analyzers. The trajectory to be followed by the limb -the end effector- is defined by the reaction vector computed by the main limb analyzer. The motion speed is directly proportional to the intensity of the reaction vector. As explained in the previous section, the reaction can range from no reaction at all, when the intensity falls into the tolerance interval, up to a very fast reaction causing the limb to reach the joint limits very quickly, even before the sensors detect that there is no need to reject the stimulus source anymore. In the latter case, the effector nodes -not shown in figure 6.3, but associated with each joint- force the limb to recover a comfortable posture within the joint limits.
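The per-frame reflex update for one limb can be sketched as follows: the segment whose analyzer reports the strongest reaction becomes the IK end effector, and the limb moves along the limb-level reaction vector at a speed proportional to its intensity. The IK solver interface is an abstraction introduced only for this sketch.

interface IkSolver {
    void moveTowards(String endEffectorSegment, float[] direction, float step);
}

class LimbReflexControllerSketch {
    Analyzer limbAnalyzer;                               // local root for this limb
    java.util.Map<String, Analyzer> segmentAnalyzers;    // e.g. "hand", "forearm", "upper arm"

    String pickEndEffector() {
        String best = null;
        float bestIntensity = 0f;
        for (java.util.Map.Entry<String, Analyzer> e : segmentAnalyzers.entrySet()) {
            float[] r = e.getValue().reactionVector();
            float intensity = (float) Math.sqrt(r[0] * r[0] + r[1] * r[1] + r[2] * r[2]);
            if (intensity > bestIntensity) { bestIntensity = intensity; best = e.getKey(); }
        }
        return best;                                     // the hand in the stove test
    }

    void update(float dt, IkSolver ik) {
        float[] r = limbAnalyzer.reactionVector();
        float speed = (float) Math.sqrt(r[0] * r[0] + r[1] * r[1] + r[2] * r[2]);
        String endEffector = pickEndEffector();
        if (endEffector != null && speed > 0f) {
            ik.moveTowards(endEffector, r, speed * dt);  // speed proportional to intensity
        }
    }
}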


Technical details of the implementation

The demonstration application has been implemented as a Java applet. The 3D animation is done using a 3D rendering engine for Java [134]. The components of the basic control entity are Java classes which extend the Thread class in order to be instantiated and run as independent threads. The analyzer objects continuously monitor their attached components (effectors, sensors or other analyzers) and establish the required communications. The virtual human model is an H-Anim VRML file; the stove is a conventional VRML'97 file. The demonstration was run on a PC workstation with two Xeon processors at 1.2 GHz and 1 GB of RAM, using MS-Windows 2000. The animation runs at 30 frames per second and gives the impression of a natural reaction speed. Figure 6.4 shows different levels of reaction depending on the preset temperature. For each test the simulation is reset to the initial posture of the virtual human with its left hand over the stove burner.

6.3 Semantics for autonomous virtual characters

The main idea is the design of a distributed control architecture based on autonomous entities (digital items) that intercommunicate with each other in a self-similar hierarchy (fractal architecture). We proposed a control system inspired by the human nervous system and applied it to generate autonomous behavior in virtual humans. Figure 6.5 shows the main components of the system as they fit into our semantic model. The image on the right of figure 6.5 presents the system components in a more "graphical" way.


Figure 6.4: Snapshots of the test application.


Figure 6.5: Semantic model of an autonomous character (reflex movements).

The main idea behind the "virtual nervous system" is the definition of a minimum control entity constituted by three main sub-entities or components, implemented now as digital items: sensors, analyzers and effectors. The Sensors are virtual entities capable of gathering information from the exterior world of the virtual human, such as temperature, contact with other objects, external forces, etc. They store the acquired information in the form of a stimulus vector, containing information on the intensity, direction and orientation of the received stimulus (see figure 6.5). The Analyzers are entities which concentrate information coming from one or many sensors or analyzers. An analyzer's main function is to calculate the cumulative effect of the sensors attached to it. The reaction vector calculated by the analyzer is used by the effectors to generate the adequate reaction as a function of the intensity, direction and orientation of the stimulus received. The Effectors are black boxes capable of affecting one or more degrees of freedom of the virtual human joints to which they are attached.
Currently, they are implemented as inverse kinematics controllers that calculate the position of the joints as a function of the reaction vector calculated by the analyzer attached to them. The virtual nervous system is implemented as an independent component which can be attached to any virtual character with compatible semantics (an H-Anim hierarchy). The semantic descriptor informs the VE system that this particular character uses an ArmEffector. The ArmEffector in this case is a basic control unit used to perform reflex movements as a reaction to thermic stimuli. Other digital items are defined as thermic sources in their corresponding semantic descriptors.
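A hedged sketch of what such descriptors could look like is given below. The element and attribute names are invented for this illustration; they do not reproduce the actual XML schema of our semantic descriptors.

<!-- Illustrative sketch only: element and attribute names are hypothetical. -->
<DigitalItem id="character01" type="VirtualHuman" skeleton="h-anim">
  <Controller name="ArmEffector" kind="ReflexControlUnit">
    <Sensor type="temperature" attachedTo="l_hand"/>
  </Controller>
</DigitalItem>

<DigitalItem id="stove01" type="InteractiveObject">
  <Property name="thermicSource" temperature="100" unit="Celsius"/>
</DigitalItem>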


Chapter 7

Ontology-based Virtual Environments

7.1 Semantic Virtual Environments

The term Virtual Environment (VE) as used in this thesis can be applied to any multimedia application. For us, a VE can be considered as a collection of entities, each of them defined by a set of functionalities, with a particular type of associated information and semantics. Entities can be represented in a variety of ways, e.g. as 3D/2D animated shapes, text, images, video, etc., depending on the context and application. A Virtual Environment can be a 3D world or a multimedia document containing text, images, audio, etc. In any case, a VE is composed of a set of entities with clearly defined functions and information that can be represented in different ways. Common representations of VEs are based on 3D geometry.
In contrast, we base our conception of VEs on the semantics -the functionalities- of the virtual entities that participate in them. We consider their geometric representation as one of their attributes. In Chapter 3 we defined an object representation based on the semantics and functionality of interactive digital items -virtual objects- within a Virtual Environment (VE) [68]. Every object participating in a VE application is a dynamic entity with multiple visual representations and functionalities. This allows for dynamically scaling and adapting the object's geometry and functions to different scenarios. We refer to Virtual Environments built following this modeling approach as Semantic Virtual Environments. The main chapters of this thesis describe how we have applied a semantic representation of VEs in each of the following aspects:

Interaction: dynamic adaptation of the multiple interaction techniques that can be used to communicate within a VE. The objective is to let the user access the available interaction devices, customize in real time the way of controlling the VE's functionalities and personalize the interaction techniques. The use of semantics for interaction is described in Chapter 4.

Visualization: adaptive content means using VEs in contexts with variable levels of resolution/computing power. Content adaptation implies modifying the geometry of the objects, as well as the level of interaction: interaction possibilities are different depending on the available resources (e.g. desktop or mobile scenarios). The term Semantic Rendering refers to this kind of content/interaction adaptation (see Chapter 5).


Animation: entities in a VE should display a behavior depending on the inner (coming from other virtual entities) or external (user interaction) stimuli they receive. In Chapter 6 we described how a semantic modeling approach helped the implementation of reactive autonomous characters.

Even if the work presented in the previous chapters takes into account the model proposed in Chapter 3, the implementations focus on a particular aspect, and the resulting virtual environments are only reusable to some extent. Virtual entities can be reused in different contexts and are customizable enough to fit user and application preferences: reconfigurable interfaces, adaptive visualization and animation. However, retrieving such entities, or finding an efficient way to describe and manipulate them, is not fully achieved. For instance, our digital items, as implemented so far, are difficult to use in a development context where content producers need to search for particular virtual entities to build new VEs. We have proposed a modeling approach to ease the creation of VEs, enhancing their adaptability and reusability from the development point of view, but we have not addressed the need to retrieve them and to manage the knowledge and data they represent. Our virtual entities are software components that require yet another semantic layer in order to be managed as true digital assets that ease the integration and collaboration of tools and applications. The objective is to produce content (virtual entities) that can be easily retrieved and "understood" by different applications, software tools and human users. This understanding goes beyond data translation and data integration and focuses on the process of expressing the meaning of a virtual entity, its possibilities as a passive (user-controlled avatar, scene decoration, etc.) or active (autonomous character, virtual tool, etc.) actor in a VE.


A required complement to our Semantic Virtual Environments is a way to represent in a unified manner the concepts and knowledge related to the interaction, visualization and animation of their components. The semantics-based representation can be enhanced by means of knowledge-management techniques and tools. One of the main instruments used to lay down the foundations of a knowledge-based system is an ontology. An ontology defines a common vocabulary for domain users (researchers or experts in a particular area) who need to share information in a domain. It includes machine-interpretable definitions of the basic concepts in the domain and the relations among them. In the next sections we discuss the use of ontologies in the domain of Virtual Environments. Then we describe the ontologies that we have developed: one focused on the adaptive multimodal interfaces system described in Chapter 4, and a second one that formalizes the knowledge related to the creation of virtual humans and serves as a basis for a general ontology for Virtual Environments.

7.2 Ontologies in the context of Virtual Environments

Ruffaldi et al. [130] propose the use of Virtual Environments as a means to visualize ontologies: multimodal representations of ontology-based information. This could be considered the opposite of the problem we are addressing in this research: while we try to use ontologies to represent VEs, the work of Ruffaldi aims at using VEs as visualization tools for existing ontologies.
This is similar to the approach presented by Chen [32], who uses the graphical capabilities of the virtual environment to support the visual exploration of external data within the environment itself. The common aspect of the research works cited before is the need to formalize the concept of a virtual entity, the fundamental component of a VE. In [130] the authors take into account the three aspects that we consider essential for VEs (interaction, visualization, animation). They propose an ontology for VEs that defines a basic building block for a VE (what we call a virtual entity). The VE building block is characterized by four multimodal properties: action (related to animation), visual (visualization), haptic (related to interaction) and physical (concepts related to animation and interaction). The drawback of this ontology is that it still considers VEs as a set of objects organized in a spatial hierarchy. This does not correspond with our approach, which emphasizes the functionality and role of the virtual entity over the place it occupies in 3D space.

Enabling the reuse of domain knowledge was one of the driving forces behind the recent surge in ontology research. For example, models for many different domains need to represent the notion of time. This representation includes the notions of time intervals, points in time, relative measures of time, and so on. If one group of researchers develops such an ontology in detail, others can simply reuse it for their domains. In [137] the authors propose a methodology to provide virtual worlds with multimedia, interoperability and reusability properties. Reusability enables virtual world designers to take a virtual entity pattern initially designed for a virtual world A and use it in the design of a new virtual world B.
The originality of this methodology relies on multiagent concepts and learning techniques for avoiding complex prior specification tasks to achieve interoperability and reusability of multimedia virtual entities in virtual worlds. Nevertheless, their approach is once more focused on spatial hierarchies of entities that display some behavior, or actions. They consider action as the main concept of virtual worlds. The concept of action states an absolute separation between an action (animation or behavior) and its consequences. Disconnecting actions from their consequences allows for addressing interoperability and reuse problems. This implies an explicit and formal definition of the interface between virtual entities and the virtual worlds they populate. The influences/reaction model they propose has some common points with our semantic model of reflex movements for autonomous characters (Chapter 6). However, the authors still focus on spatial relationships between virtual entities and do not propose a true formalization (ontology) that can be used for building VEs or for managing the knowledge they represent.

The work of Biuk-Aghai et al. [17] is one of the few research works that makes use of ontologies for virtual environments. The authors present a framework for integrating data mining in the design of collaborative virtual environments, in a way that facilitates not only the data collection and analysis, but also the application of the discovered knowledge. Once more, the ontology of VEs that they propose only provides knowledge about the navigation and feasible actions in the environment. Their ontologies structure VEs as a spatial structure where each part has some associated functionality. Data mining techniques can be used for discovering relations between VE components, such as which parts of the VE were used more intensively and which ones are directly connected to each other ("neighbouring" relations).


From this overview, we can conclude that our approach is rather new. We aim at modeling Virtual Environments using ontologies that take into account not only the spatial structure but also higher-level semantics such as interaction functionalities, visualization alternatives and animation mechanisms. In the next section we describe an ontology that formalizes the concepts involved in the development of VEs with adaptive multimodal interfaces.

7.3 Ontology for interactive Virtual Environments

According to Gruber [57], an ontology is a formal specification of a shared conceptualization. The systems we target are composed of two main parts: interaction devices, such as multimodal interfaces, and virtual entities that are part of a VE. The conceptualization shared by both parts consists in the abstraction of two main types of entities: interaction devices and virtual entities (3D animated shapes, images, text, etc.). The formal aspect of the specification refers to the fact that this model shall be both human and machine readable; this is achieved by means of an XML-based representation. Handling semantic descriptors defined in XML has several advantages. The first of them is portability: XML parsers are available for a wide range of hardware and software platforms. Moreover, XML has become the standard format for data representation. Standards for content annotation and description, such as MPEG-7, use XML [131]. Virtually every language and specification for semantic annotation and retrieval of digital items (multimedia content, 3D models, etc.) is based on XML. Ontological principles are well recognized as effective design rules for information systems [60], [145].
This has led to the notion of "ontology-driven information systems", which covers both the structural and temporal dimensions [60]. Our adaptive multimodal interface is supported by such a system. The structural dimension concerns a database containing the information describing both interaction devices and virtual entities (semantic descriptors). The temporal dimension is related to the interface (visual programming) that gives access to such information at run-time. The central point of our formal representation is the conception of VEs as a set of entities with a semantic meaning, entities that can be represented and affected in a variety of ways, either through user interaction or through autonomous processes. Virtual entities have a meaning -semantics- and a role to play in the environment. The way they are visualized and controlled -user interaction- depends on the application context, the interaction devices available, and the user's preferences and needs. We must provide a flexible system that allows for adapting the interfaces to the semantics -the function- of the content. The functionality of a virtual entity can be accessed in a variety of ways (multiple modalities). The user should be able to choose and configure the interaction technique that best adapts to her needs. Choosing and configuring an interaction technique translates into mapping the output of an interaction device to a functionality of a particular virtual entity. We designed an ontology expressing this basic principle. Figure 7.1 shows a diagram of the ontology for interactive Virtual Environments. On the one hand, we have a range of Interaction Devices that let the user express her intentions through multiple channels -modalities. This can be done by means of a classical mouse and keyboard or through more sophisticated interfaces such as a PDA, speech, hand gestures or a combination of them.

Figure 7.1: Ontology for interactive VEs: elements involved in a multimodal interface.

The essential attribute of an interaction device is the data it delivers (output ports). It can be a 2D vector, a token indicating a particular gesture or recognized word, etc. On the other hand, there are the Virtual Entities to be controlled. They can be 3D animated shapes such as virtual characters, multimedia documents, a video, and so on. From the interaction point of view, the most important attributes are the customizable parameters that constitute the semantics of the virtual entity (such as the rotation angle of a virtual character's joint). These are the input ports that let us communicate with them. Some virtual entities are fully manipulable by the user -e.g. the playback controls of a video- while others display some behavior in reaction to user input, for example an autonomous virtual character. Data coming from interaction devices may require some additional post-processing before reaching the controlled entity. We incorporate a mechanism to further process interaction data in the form of Modulators, which are containers for modulation functions.

We consider interaction devices as black boxes whose output has already been normalized according to some criteria; nevertheless, modulators are included in the ontology to maximize its flexibility. Modulators are also used as the registration unit that stores the mapping between an interaction device output and the input of a virtual entity. A multimodal interface is constituted by a set of Interaction Mappings, which can be stored and reused. Formalizing the concepts and entities involved in the creation of adaptive multimodal interfaces helps in the development of search engines and GUI tools that can better exploit the information and maximize its reuse. The ontology for interactive VEs can be used to share multimodal interface configurations with other applications. The knowledge expressed in this way is independent of the software implementations, which maximizes its reuse and further enhancement. The example we have described was relatively easy to specify: the concepts to formalize were few in number and their role in the domain was well defined. This is not the case for more complex aspects of the creation of VEs. For instance, the ontology for interactive VEs does not give details on the components required to build a complex entity such as a virtual character. In this ontology we consider any virtual entity as a black box of which all we know are its I/O ports. We do not consider the complexity of an autonomous character, which can react to the input coming from the user. The current ontology indicates what can make a virtual character react, but not how. In the next section, we present an ontology specialized in Virtual Humans (VH), one of the most popular kinds of virtual character. The VH ontology considers how to construct and animate them.
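To make this concrete, the following is a minimal, hypothetical sketch of how such an interaction mapping could be written down in the XML-based representation discussed above. It only illustrates the structure implied by the ontology -an interaction device exposing output ports, a virtual entity exposing input ports, and a modulator that both post-processes the data and registers the mapping; the element and attribute names are assumptions made for this example and do not reproduce the actual descriptor schema.

  <interactionMapping id="pda-controls-character">
    <!-- Interaction device: a black box described only by its output ports -->
    <interactionDevice id="pda-stylus">
      <outputPort id="stylus2D" type="vector2D"/>
    </interactionDevice>
    <!-- Virtual entity: exposed through the input ports that constitute its semantics -->
    <virtualEntity id="virtualCharacter01">
      <inputPort id="rightElbowFlexion" type="angle"/>
    </virtualEntity>
    <!-- Modulator: post-processes the device data and stores the mapping itself -->
    <modulator function="scale" factor="0.5">
      <source device="pda-stylus" port="stylus2D" component="y"/>
      <target entity="virtualCharacter01" port="rightElbowFlexion"/>
    </modulator>
  </interactionMapping>

A set of such mappings would constitute one multimodal interface configuration that can be stored, exchanged and reused.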

7.4 Ontology for Virtual Humans

Virtual Humans are complex entities composed of well-defined features and functionalities. Concepts and techniques related to the creation and exploitation of VHs, such as those described in [63], are shared by the research community. Our effort is targeted at unifying such concepts and representing them in a formal way, with the objective of supporting the creation and exploitation of VHs. The ontology described in this section was developed in the framework of the AIM@SHAPE European Network of Excellence (http://www.aimatshape.net/), as described in [63].

Developing the Ontology

The development of an ontology usually starts by defining its domain and scope, that is, by answering several basic questions known as competency questions. Competency questions (CQs) are one of the best ways to determine the scope of an ontology: they consist of a list of questions that a knowledge base built on the ontology should be able to answer [58].

Competency Questions

The proposed ontology should be able to answer the following categories of competency questions:

Model history
Is this model obtained by editing another model?
What features have been changed on model X?
What tools were involved in the synthesis/modification of this VH?
Who performed the task T on the model X?

Features listing
What is the height of the model?
Is the model male or female? Is the model European?
What are the features of this model?
Is this model obtained artificially or does it represent a real person?
Which VHs have a landmark description?
Which are the available structural descriptors for a particular VH?
Which aspects of the shape are described by the structural descriptor related to a particular VH?
Which are the standing (sitting, walking, ...) VHs?
How is the body model represented? (a mesh, a point set, ...)
Is the VH complete? (does it have a skeleton, a hierarchy of body parts, a set of landmarks attached to it?)

Questions whose answer is a function of low/high-level features
Most of these questions cannot be answered directly by the ontology, at least not in the current version. Answers will be provided by external algorithms that take as input the data retrieved through the ontology.
Which are the VHs that are fat/slim/short?
Is this VH a child or an adult? Does it have a long nose?
Is it missing any body part?
Does this VH match another VH (and how much do they match)? In particular: Are they in the same posture? Do they have the same structure? Do they have similar parts (same arm length, same fatness, similar nose)? Do they have similar anthropomorphic measures (in terms of landmarks)?
Is the model suitable for animation?
How will this VH look after 20 years? With 20 kg more? With another nose?
Does this model fit this piece of clothing? (Clothes have not been considered in the current version of the ontology. However, they could be considered as a special case of -smart- object or as a geometry with particular landmarks.)
What VH do I get if I put the head of VH1 on the body of VH2?

Animation sequences

What model does this animation use?
What are the joints affected by this animation sequence?
Are there any animation sequences lasting more than 1 minute suitable for this VH?
Are there any "running"/"football playing" animation sequences for this kind of VH?
Can the animation sequence X be applied to the VH Y? (In the case of key-frames for skeleton-based animation, this basically depends on whether the key-frame data can be matched to the skeleton of the VH.)

Animation algorithms
What are the input and output channels of a particular behavior controller (animation algorithm)?
What are the models suitable to be animated with this algorithm?
Does this VH have a vision sensor attached?
Can this VH react to sound events in its virtual environment?

Interaction with objects
What capabilities does an object provide?
What are the actions the human can execute on the object?
What are the characteristics of an object (structure, physical properties, etc.)?
How can the object be grasped?
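To illustrate how such questions relate to the underlying data, the following is a hypothetical XML fragment describing a single VH instance; a knowledge base holding this kind of information could, for example, answer "Is the VH complete?" or "Which are the standing VHs?". The element and attribute names are illustrative assumptions and are not taken from the actual ontology.

  <virtualHuman id="vh-042" gender="female" height="1.68" posture="standing">
    <!-- geometric representation of the body -->
    <geometry type="polygonalMesh" resource="vh-042-mesh"/>
    <!-- structural descriptors attached to the shape -->
    <structuralDescriptor type="animationSkeleton" standard="H-ANIM"/>
    <structuralDescriptor type="landmarkSet" resource="vh-042-landmarks"/>
    <!-- animation data and modeling history -->
    <animationSequence name="walking" durationSeconds="12.5"/>
    <history createdWith="3ds max" derivedFrom="vh-017" task="mesh editing"/>
  </virtualHuman>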

Ontology components

We have defined a first version of the VH ontology based on the competency questions listed above. The Ontology for Virtual Humans aims at organizing the knowledge and data of three main research topics and applications involving graphical representations of humans:

• Human body modeling and analysis: morphological analysis, measuring similarity, model editing/reconstruction.
• Animation of virtual humans: autonomous or pre-set animation of VHs.
• Interaction of virtual humans with virtual objects: virtual -smart- objects that contain the semantic information indicating how interactions between virtual humans and objects are to be carried out.

Figure 7.2: Main components of an Ontology for Virtual Humans.

Figure 7.2 presents a simplified diagram of the main components of the ontology; a detailed view is presented in figures 7.3 and 7.4. The main classes define the geometry of the VH, which can be represented as a polygonal mesh, NURBS, etc. The Structural Descriptor class (abbreviated as StructuralD in the simplified diagram) allows for deriving a variety of descriptors, such as nodes for topological graphs, animation skeletons (H-ANIM compliant, the standardized hierarchical structure for humanoid animation), or animation skeletons for smart objects (objects which can be manipulated by a VH).

The ontology considers that a VH can have associated information about its -modeling- history, landmarks, sensors used by behavioral animation algorithms, animation sequences (e.g. keyframes), smart objects and other accessories. The current version of the ontology for Virtual Humans is work in progress. As stated before, there are still missing components required to fulfill all the needs of a task as complex and multidisciplinary as the creation and use of Virtual Humans. However, we believe this is an important step towards a formal representation of Virtual Humans. The following are some of the main application scenarios where the ontology for Virtual Humans can play an essential role:

Virtual Characters data repository: a search engine for retrieving VHs and Smart Objects with particular features/functionalities related to animation. The categories of competency questions corresponding to this scenario are Animation sequences, Animation algorithms, Interaction with objects and, to some extent, Features listing (features linked to animation such as skeleton and geometry type).

Modeling data repository: a place where a modeler/animator could find VH shapes (whether full or partial bodies) and use them to model new VHs, or to improve or reconstruct existing ones. Categories of competency questions involved: Model history, Features listing (when referring to geometric and anthropomorphic features), and Questions whose answer is a function of low/high-level features (the ones dealing with similarity measures related to anthropomorphic features).

Shape recognition/extraction/analysis: a knowledge base able to answer competency questions linked to low-level features of the VH shape (landmarks, topological graphs, and so on). Main users would include researchers working on algorithms for recognizing features on a shape representing a virtual or real human. Data would be used in ergonomics studies, computer vision algorithms, etc.

Virtual Humans are virtual entities with a rich set of functionalities and potential uses in a VE. They can be used as interfaces, they can act as avatars to visualize the user inside a virtual world, or they can act as autonomous entities that communicate with the human and other entities in the VE. VHs cover the main aspects required to build a general VE application: interaction, visualization and animation. We can use the VH ontology as the basis for a unified ontology for Virtual Environments; this is left as future work.
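For readers who prefer Semantic Web notation, the fragment below is a minimal OWL (RDF/XML) sketch of how a few of the classes and properties shown in figures 7.2-7.4 could be declared. The namespace and identifiers are invented for this illustration and do not reproduce the actual ontology files developed within AIM@SHAPE.

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
           xmlns:owl="http://www.w3.org/2002/07/owl#"
           xml:base="http://example.org/vh">
    <!-- core classes of the sketch -->
    <owl:Class rdf:ID="VirtualHuman"/>
    <owl:Class rdf:ID="Geometry"/>
    <owl:Class rdf:ID="StructuralDescriptor"/>
    <owl:Class rdf:ID="AnimationSkeleton">
      <rdfs:subClassOf rdf:resource="#StructuralDescriptor"/>
    </owl:Class>
    <owl:Class rdf:ID="AnimationSequence"/>
    <owl:Class rdf:ID="SmartObject"/>
    <!-- properties linking a VH to its geometry and descriptors -->
    <owl:ObjectProperty rdf:ID="hasGeometry">
      <rdfs:domain rdf:resource="#VirtualHuman"/>
      <rdfs:range rdf:resource="#Geometry"/>
    </owl:ObjectProperty>
    <owl:ObjectProperty rdf:ID="hasStructuralDescriptor">
      <rdfs:domain rdf:resource="#VirtualHuman"/>
      <rdfs:range rdf:resource="#StructuralDescriptor"/>
    </owl:ObjectProperty>
  </rdf:RDF>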

Figure 7.3: Ontology for Virtual Humans: main classes associated with a VH entity.

Figure 7.4: Detailed view of the structural descriptors that represent different levels of geometric and animation-related information.

Chapter 8
Conclusions

8.1 Summary of contributions

Virtual Environments (VE) are used in a broad range of contexts, ranging from sophisticated Virtual Reality (VR) simulations to video games and all sorts of interactive applications. However, the implementation of this kind of application is highly expensive in terms of time and human resources. Moreover, the problems related to implementation limit the creation of novel applications and make it difficult to improve existing ones. Some of the main problems faced when developing interactive virtual environments include non-extensibility, limited interoperability, poor scalability, monolithic architectures, etc. Research in this area usually focuses on defining development frameworks with reusable and pluggable components. Instead of making reusable components at the level of the objects or virtual entities that participate in a VE, most of the proposals focus on reusable software components: algorithms, rendering routines, etc.

The main contribution of this research is the specification of a modeling approach for developing interactive Virtual Environments. The specification consists of a semantics-based framework that allows for implementing distributed systems composed of autonomous, self-contained digital entities. Virtual entities defined with this modeling approach can be reused in a variety of visualization contexts -from realistic rendering to highly simplified representations- and contain the information required to implement different types of interfaces to control them, from 3D manipulators to "classical" 2D GUIs. Moreover, the model can also be applied in the area of animation, allowing the creation of more responsive virtual entities and improving the believability of the VE. We have demonstrated the advantages of modeling Virtual Environments as a set of virtual entities with high-level semantics. We described different applications that were built using reusable virtual entities: multimodal adaptive interaction and the use of novel user interfaces (Chapter 4); animation of autonomous characters (Chapter 6); and dynamic visualization, i.e. efficient lossless coding of semi-interactive CGI films (Chapter 5). A remaining problem was the management of the virtual entities and the knowledge associated with them. To address this issue, we have applied Semantic Web technologies, which are targeted at easing knowledge management and information retrieval. We have shown a first example of how to formalize the main concepts and components related to Virtual Humans, one of the most complex entities of a VE, by means of ontologies (Chapter 7). Ontology-based Virtual Environments are expected to facilitate the reuse of virtual entities and their interpretation by both humans and machines.

In fact, ontologies aim to make concepts, information and knowledge readable and understandable not only by human users, but also by computers. Semantic Virtual Environments allow virtual objects to be managed as knowledgeable entities. This opens a number of possibilities: increasing the productivity of VE-based applications (re-using self-contained virtual objects instead of algorithms or low-level libraries); exploring novel interaction paradigms (thanks to meta-interfaces that ease the re-configuration and adaptation of communication channels, as described in Chapter 4); advancing the state of the art in transcoding techniques (see Chapter 5); etc.

8.2 Limitations and future work

The applications created following the principles we have defined should work like a multi-agent environment where each component can be run separately and communicate with the rest. Interprocess communication for complex systems, involving more than just a few components and with more intensive data exchange, is not fully implemented; for the moment, we have not fully addressed the issue of efficient communication between virtual entities. Current implementations use conventional message passing over TCP. Using higher-level communication protocols is left as future work. For instance, Semantic Virtual Environments could be based on HTTP/XML-based protocols like the ones used by current Web Services architectures.

Web services are a framework of software technologies designed to support interoperable machine-to-machine interaction over a network. Companies running different systems can use Web services to exchange information online with business partners, customers, and suppliers [93], [95]. We believe a similar architecture could be adopted by our Semantic Virtual Environments approach in order to deploy enterprise-class applications and authoring tools.

Concerning the accessibility of the resources and components required to build Semantic Virtual Environments, the ideal would be to have a common repository of virtual entities available to both content producers (designers, animators, etc.) and application providers (developers of applications composed of digital items). The applications described in this thesis apply different facets of a semantic modeling approach; however, there is no common repository where a developer can store and retrieve virtual entities. Future work includes applying the ontologies described in Chapter 7 to the development of knowledge management systems that allow for exchanging, retrieving and organizing virtual entities. In the same direction taken by the Semantic Web initiative, future work can be targeted at providing effective repositories of VEs and their components that are accessible to end-users, developers and researchers.

Semantic Virtual Environments enable content adaptation of virtual environments at the visualization, animation and interaction levels. A digital item can communicate the different methods and possibilities for rendering it, animating it and interacting with it. Although we have proposed the foundations for this novel content adaptation approach, we have not fully developed the idea, which we call "Semantic Rendering". Semantic Rendering is a consequence and an innovative application of our semantics-based representation.

Independent digital items can be visualized in different ways depending on the context: as text, in 2D or in 3D; the adaptation also applies to interaction. The term "Semantic Rendering" refers to an enhanced transcoding methodology based on the meaning -semantics- of the entities that compose a VE. The semantics of a virtual entity include features and attributes such as the following (a sketch of such a descriptor is given after the list):

• Interaction capabilities.
• Role(s) in virtual environment(s), e.g. autonomous character, landscape, 3D widget.
• Animation algorithms (behavior), e.g. AI algorithms used to animate this virtual entity, key-frame animation(s) suitable for this entity.
• Multiple representations: the geometric representations used to visualize this entity, such as polygonal meshes with different levels of detail, 2D representations, text-based representations, etc.
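As an illustration, the following is a minimal, hypothetical descriptor carrying these four kinds of attributes for a single virtual entity. The element names are assumptions chosen for this sketch rather than the actual schema, but they show how a client could pick the representation and interaction channel that suit it best.

  <virtualEntity id="guideCharacter" role="autonomous character">
    <interaction>
      <capability name="askQuestion" input="text"/>
      <capability name="pointAtLocation" input="vector3D"/>
    </interaction>
    <animation>
      <behaviorController type="navigation"/>
      <keyframeSequence name="greeting"/>
    </animation>
    <representations>
      <representation type="polygonalMesh" levelOfDetail="high"/>
      <representation type="polygonalMesh" levelOfDetail="low"/>
      <representation type="image2D"/>
      <representation type="textDescription"/>
    </representations>
  </virtualEntity>

A PDA could then request the low-detail mesh or the 2D image, while a graphics workstation would take the high-detail mesh, without any change to the entity itself.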

Semantic Rendering will enable adaptive VR content: the same content, with multiple levels of representation and interaction. A single Semantic Virtual Environment could be visualized and used through different devices: PDAs, graphics workstations, thin clients (e.g. set-top boxes), etc. Finally, the semantic representation we have proposed can be further exploited by means of powerful query languages to search for and retrieve individual virtual entities or entire virtual worlds. Thanks to well-defined ontologies, search criteria will range from low-level features up to high-level semantics such as natural language descriptions or interactive properties.

Content retrieved in this way will be used to build new virtual environment applications or to re-use existing ones. This could eventually lead to a transparent integration of Virtual Worlds into our daily life.

Bibliography

[1] 3ds max. 3ds max, animation modeling and rendering software. http://www.discreet.com/products/3dsmax/. [2] A. Yamada, M. Pickering, S. Jeannin, L. Cieplinski, J.-R. Ohm, M. Kim. Text of 15938-3/FCD Information Technology Multimedia Content Description Interface Part 3 Visual, Tech. Rep. N4062, ISO/IEC JTC1/SC29/WG11, Singapore, SG, March 2001. [3] T. Abaci, R. de Bondeli, J. Ciger, M. Clavien, F. Erol, M. Gutierrez, S. Noverraz, O. Renault, F. Vexo, and D. Thalmann. The enigma of the sphinx. In Proceedings of the 2003 International Conference on Cyberworlds, pages 106–113. IEEE Computer Society, 2003. [4] T. Abaci, R. de Bondeli, J. Ciger, M. Clavien, F. Erol, M. Gutierrez, S. Noverraz, O. Renault, F. Vexo, and D. Thalmann. Magic wand and enigma of the sphinx. Computers & Graphics, 28(4):477–484, August 2004. [5] D. Anderson, J. Barrus, J. Howard, C. Rich, and R. Waters. Building multiuser interactive multimedia environments at merl. IEEE Multimedia, 2(4):77–82, Winter 1995. [6] P. Andry, P. Gaussier, S. Moga, J. Banquet, and J. Nadel. Learning and communication in imitation: An autonomous robot perspective. In IEEE Transaction on Systems, Man and Cybernetics, Part A: Systems and Humans, pages 431–444, 2001. [7] Apache Software Foundation. Xerces-C++, validating XML parser. http://xml.apache.org/xerces-c/. [8] Ascension Technology Corporation. Flock of birds. http://www.ascension-tech.com/products/flockofbirds.php. [9] K. Ashida, S.-J. Lee, J. Allbeck, H. Sun, N. Badler, and D. Metaxas. Pedestrians: Creating agent behaviors through statistical analysis of observation data. In Computer Animation 2001, 2001.

[10] ATI Technologies Inc. ATI Delivers First 3D Gaming Chip For Cellphones, Press Release January 2004. http://www.ati.com/companyinfo/press/2004/4719.html. [11] ATI Technologies Inc. ATI’s XILLEONTM Offers Comprehensive Suite of Graphics and Video Solutions For Windows Media Center Extender Products, Press Release March 2004. http://www.ati.com/companyinfo/press/2004/4742.html. [12] N. Badler, D. Chi, and S. Chopra-Khullar. Virtual human animation based on movement observation and cognitive behavior models. In Computer Animation 1999, pages 128–137, 1999. [13] S. B. Banks, M. R. Stytz, and E. Santos. Towards an adaptive man-machine interface for virtual environments. In Proceedings of Intelligent Information Systems, IIS ’97, pages 90–94, 1997. [14] C. Benoit, J.-C. Martin, C. Pelachaud, L. Schomaker, and B. Suhm. Handbook of Multimodal and Spoken Dialogue Systems: Resources, Terminology and Product Evaluation, chapter AudioVisual and Multimodal Speech-Based Systems, pages 102–203. Kluwer, 2000. [15] T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. In: Scientific American, May 17, 2001. [16] A. Bierbaum, C. Just, P. Hartling, K. Meinert, A. Baker, and C. Cruz-Neira. Vr juggler: A virtual platform for virtual reality application development. In Proceedings of the Virtual Reality 2001 Conference (VR’01), pages 89–96. IEEE Computer Society, 2001. [17] R. P. Biuk-Aghai and S. J. Simoff. An integrative framework for knowledge extraction in collaborative virtual environments. In GROUP ’01: Proceedings of the 2001 International ACM SIGGROUP Conference on Supporting Group Work, pages 61–70. ACM Press, 2001. [18] B. Blumberg and T. Galyean. Multi-level direction of autonomous creatures for real-time virtual environments. In SIGGRAPH 1995, 1995. [19] I. M. Boier-Martin. Adaptive graphics. IEEE Computer Graphics and Applications, 23(1):6–10, Jan.-Feb. 2003. [20] BonesPro. Skeletal deformation system for 3ds max http://www.di-o-matic.com/products/max/bonespro/. [21] J. Bormans, J. Gelissen, and A. Perkis. Mpeg-21: The 21st century multimedia framework. IEEE Signal Processing Magazine, 20(2):53–62, 2003.

[22] J.-Y. Bouguet. Pyramidal implementation of the lucas kanade feature tracker. In OpenCV Documentation, Microprocessor Research Labs, Intel Corp., 2000. [23] R. Boulic and R. Mas. Hierarchical kinematics behaviors for complex articulated figures. In Interactive Computer Animation, Prentice Hall, pages 40–70, 1996. [24] D. Bowman. Interaction Techniques for Common Tasks in Immersive Virtual Environments. PhD thesis, Georgia Institute of Technology, June 1999. [25] D. Bowman, E. Davis, A. Badre, and L. Hodges. Maintaining spatial orientation during travel in an immersive virtual environment. In Presence: Teleoperators and Virtual Environments, volume 8, pages 618–631, 1999. [26] H. M. Briceo, P. V. Sander, L. McMillan, S. Gortler, and H. Hoppe. Geometry videos: a new representation for 3d animations. In Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pages 136–146. Eurographics Association, 2003. [27] I. Brown and G. Loeb. Design of a mathematical model of force in whole skeletal muscle. In IEEE 17th Annual Conference on Engineering in Medicine and Biology Society, pages 1243–1244, 1995. [28] I. Burnett, R. Van De Walle, K. Hill, J. Bormans, and F. Pereira. Mpeg-21: goals and achievements. IEEE Multimedia, 10(4):60–70, 2003. [29] T. Capin, E. Petajan, and J. Ostermann. Efficient modeling of virtual humans in mpeg-4. IEEE International Conference on Multimedia and Expo (ICME), 2, 2000. [30] T. Capin, E. Petajan, and J. Ostermann. Very low bitrate coding of virtual human animation in mpeg-4. IEEE International Conference on Multimedia and Expo (ICME), 2, 2000. [31] V. Cardellini, P. S. Yu, and Y.-W. Huang. Collaborative proxy system for distributed web content transcoding. In CIKM ’00: Proceedings of the ninth international conference on Information and knowledge management, pages 520–527, New York, NY, USA, 2000. ACM Press. [32] C. Chen. Information Visualization and Virtual Environments. Springer-Verlag, London, UK, 1999. [33] C. Chen and M. Czerwinski. From latent semantics to spatial hypertextan integrated approach. In HYPERTEXT ’98: Proceedings of the ninth ACM conference on Hypertext and hypermedia : links, objects, time and space—structure in hypermedia systems, pages 77–86. ACM Press, 1998.

[34] C. Chen, L. Thomas, J. Cole, and C. Chennawasin. Representing the semantics of virtual spaces. IEEE Multimedia, 6(2):54–63, 1999. [35] A. Cheok, X. Yang, Z. Ying, M. Billinghurst, and H. Kato. Touch-space: Mixed reality game space based on ubiquitous, tangible, and social computing. In Personal and Ubiquitous Computing, volume 6, january 2002. [36] J. Chudziak and M. Piotrowski. Semantic support for multimedia information system. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, pages 3914–3919, 1995. [37] J. Ciger, M. Gutierrez, F. Vexo, and D. Thalmann. The magic wand. In Proceedings of Spring Conference on Computer Graphics. (SCCG ’2003), pages 132–138, 2003. [38] J. Ciger, M. Gutierrez, F. Vexo, and D. Thalmann. The magic wand. In Proceedings of Spring Conference on Computer Graphics 2003, pages 132–138, Budmerice, Slovak Republic, 2003. [39] D. Conner, S. Snibbe, K. Herndon, D. Robbins, R. Zeleznik, and A. van Dam. Three dimensional widgets. In Symposium on Interactive 3D Graphics, March 1992. [40] R. Cucchiara, C. Grana, and A. Prati. Semantic transcoding for live video server. In MULTIMEDIA ’02: Proceedings of the tenth ACM international conference on Multimedia, pages 223–226, New York, NY, USA, 2002. ACM Press. [41] J. Davies, A. Duke, and Y. Sure. Ontoshare: a knowledge management environment for virtual communities of practice. In K-CAP ’03: Proceedings of the international conference on Knowledge capture, pages 20–27. ACM Press, 2003. [42] DecisionSoft Limited. Pathan library, XPath expressions parser and evaluator. [43] T. Deepak, A. Goswami, and N. Badler. Real-time inverse kinematics techniques for anthropomorphic limbs. In Graphical Models and Image Processing, 2000. [44] R. D¨orner and P. Grimm. Three-dimensional beans–creating web content using 3d components in a 3d authoring environment. In VRML ’00: Proceedings of the fifth symposium on Virtual reality modeling language (Web3D-VRML), pages 69–74. ACM Press, 2000. [45] P. Dragicevic. Un mod`ele d’interaction en entr´ee pour des syst`emes interactifs multi-dispositifs hautement configurables. PhD thesis, Universit´e de Nantes, March 2004.

[46] P. Dragicevic and J.-D. Fekete. The input configurator toolkit: towards high input adaptability in interactive applications. In Proceedings of the working conference on Advanced visual interfaces, pages 244–247. ACM Press, 2004. [47] J. Esmerado. A Model of Interaction between Virtual Humans and Objects: Application to Virtual Musicians. PhD thesis, Thesis No. 2502, EPFL, 2001. [48] J. Esmerado, F. Vexo, and D. Thalmann. Interaction in virtual worlds: Application to music performers. In Proceedings of Computer Graphics International (CGI), 2002. [49] E. Farella, D. Brunelli, M. Bonfigli, L. Benini, and B. Ricco. Using palmtop computers and immersive virtual reality for cooperative archaeological analysis: the appian way case study. In International Conference on Virtual Systems and Multimedia (VSMM), Gyeongju, Korea, 2002. [50] E. Farella, D. Brunelli, M. Bonfigli, L. Benini, and B. Ricco. Multi-client cooperation and wireless pda interaction in immersive virtual environment. In Euromedia 2003 Conference, Plymouth, United Kingdom, 2003. [51] G. W. Fitzmaurice, S. Zhai, and M. H. Chignell. Virtual reality for palmtop computers. ACM Transactions on Information Systems (TOIS), 11(3):197–218, 1993. [52] F. Garcia-Cordova, A. Guerrero-Gonzalez, J. Pedreno-Molina, and J. Moran. Emulation of the animal muscular actuation system in an experimental platform. In IEEE International Conference on Systems, Man, and Cybernetics, pages 64–69, 2001. [53] P. Gorce. Dynamic postural control method for biped in unknown environment. In IEEE Transactions on Systems, Man and Cybernetics, pages 616–626, 1999. [54] T. Goto, S. Kshirsagar, and N. Magnenat-Thalmann. Automatic face cloning and animation using real-time facial feature tracking and speech acquisition. IEEE Signal Processing Magazine, 18(3):17–25, May 2001. [55] C. Greenhalgh, J. Purbrick, and D. Snowdon. Inside massive-3: flexible support for data consistency and world structuring. In CVE ’00: Proceedings of the third international conference on Collaborative virtual environments, pages 119–127. ACM Press, 2000. [56] C. Grimsdale. dvs-distributed virtual environment system. In Proceedings of Computer Graphics’91, London, UK, Bleinheim Online, pages 163–170, 1991.

[57] T. Gruber. The role of a common ontology in achieving sharable, reusable knowledge bases. In Proceedings of the Second International Conference on Principles of Knowledge Representation and Reasoning, pages 601–602, 1991. [58] M. Gruninger and M. Fox. Methodology for the Design and Evaluation of Ontologies. In: Proceedings of the Workshop on Basic Ontological Issues in Knowledge Sharing, IJCAI-95, Montreal. [59] P. Grussenmeyer, M. Koehl, and M. Nour El Din. 3d geometric and semantic modelling in historic sities. XVII International Committee for Architectural Photogrammetry (CIPA) International Symposium, Brazil, 1999. [60] N. Guarino. Formal ontology and information systems. In Proceedings of FOIS 98, (Trento, Italy, June, 1998). IOS Press, pages 3–15, 1998. [61] M. Guihard and P. Gorce. Dynamic control of an artificial muscle arm. In IEEE Transactions on Systems, Man and Cybernetics, pages 813–818, 1999. [62] M. Gutierrez, P. Lemoine, D. Thalmann, and F. Vexo. Telerehabilitation: Controlling haptic virtual environments through handheld interfaces. In Proceedings of ACM Symposium on Virtual Reality Software and Technology (VRST 2004), pages 195–200, 2004. [63] M. Gutierrez, D. Thalmann, F. Vexo, L. Moccozet, N. Magnenat-Thalmann, M. Mortara, and M. Spagnuolo. An ontology of virtual humans: incorporating semantics into human shapes. In Proceedings of Workshop towards Semantic Virtual Environments (SVE05), March 2005, pages 57–67, 2005. [64] M. Gutierrez, F. Vexo, and D. Thalmann. A mpeg-4 virtual human animation engine for interactive web based applications. In Proceedings of the 11th IEEE International Workshop on Robot and Human Interactive Communication (ROMAN 2002), pages 554–559, 2002. [65] M. Gutierrez, F. Vexo, and D. Thalmann. Controlling virtual humans using pdas. In Proceedings of the 9th International Conference on Multi-Media Modeling (MMM’03), 2003. [66] M. Guti´errez, F. Vexo, and D. Thalmann. The mobile animator: Interactive character animation in collaborative virtual environments. In IEEE Virtual Reality 2004, pages 125–132. Chicago, USA, March 2004. [67] M. Gutierrez, F. Vexo, and D. Thalmann. Reflex movements for a virtual human: a biology inspired approach. In Proceedings of the 3rd Hellenic Conference on Artificial Intelligence, Special Session on Intelligent Virtual Environments (SETN 2004), Lecture Notes in Artificial Intelligence, pages 525–534. Springer Verlag, 2004.

[68] M. Gutierrez, F. Vexo, and D. Thalmann. Semantics-based representation of virtual environments. International Journal of Computer Applications in Technology (IJCAT) Special issue on ”Models and methods for representing and processing shape semantics”, 23(2/3/4):229–238, 2005. [69] H-anim. The humanoid animation working group. http://www.h-anim.org. [70] O. Hagsand. Interactive multiuser ves in the dive system. IEEE Multimedia, 3(1):30–39, Spring 1996. [71] J. He, W. Levine, and G. Loeb. The modeling of the neuro-musculo-skeletal control system of a cat hindlimb. In IEEE International Symposium on Intelligent Control, pages 406–411, 1988. [72] L. Hill and C. Cruz-Neira. Palmtop interaction methods for immersive projection technology systems. In Fourth International Immersive Projection Technology Workshop (IPT 2000), 2000. [73] C. Hu, Q. Yu, Y. Li, and S. Ma. Extraction of parametric human model for posture recognition using genetic algorithm. In Fourth IEEE International Conference on Automatic Face and Gesture Recognition, pages 518–523, 2000. [74] S. Huang, R. Baimouratov, and W. L. Nowinski. Building virtual anatomic models using java3d. In VRCAI ’04: Proceedings of the 2004 ACM SIGGRAPH international conference on Virtual Reality continuum and its applications in industry, pages 402–405. ACM Press, 2004. [75] L. Ibarria and J. Rossignac. Dynapack: space-time compression of the 3d animations of triangle meshes with fixed connectivity. In Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pages 126–135. Eurographics Association, 2003. [76] T. Ichikawa and M. Hirakawa. Visual programming, toward realization of user friendly programming environments. In Proceedings of the 1987 Fall Joint Computer Conference on Exploring technology: today and tomorrow, pages 129–137. IEEE Computer Society Press, 1987. [77] IKAN. Inverse kinematics using analytical methods. http://hms.upenn.edu/software/ik/software.html. [78] Immersion Corporation. Haptic workstation. http://www.immersion.com.

[79] D. Isla and B. Blumberg. Object persistence for synthetic creatures. In International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2002. [80] ISO/IEC 14496-1:1999. Coding of audio-visual objects: Systems, amendment 1, december 1999. [81] ISO/IEC 14496-2:1999. Coding of audio-visual objects: Systems, amendment 1, december 1999. [82] ISO/IEC 14496-2:1999. Information Technology – Coding of Audio-Visual Objects, Part 1: Systems (MPEG-4 v.2), December 1999. ISO/IEC JTC 1/SC 29/WG 11 Document No. W2739. [83] J-C. Dufourd (Editor). Rationale and Draft Requirements for a Simple Scene Description Format, October 2003. ISO/IEC JTC1/SC29/WG11 Document No. N6038. [84] Z. Kacic-Alesic, M. Nordenstam, and D. Bullock. A practical dynamics system. In Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pages 7–16. Eurographics Association, 2003. [85] M. Kallmann and D. Thalmann. Direct 3d interaction with smart objects. In VRST ’99: Proceedings of the ACM symposium on Virtual reality software and technology, pages 124–130. ACM Press, 1999. [86] H. B. Kang and Y. M. Kwon. The needs and possibilities of non-photorealistic rendering for virtual heritage. In VSMM2002: Proceedings of the 8th International Conference on Virtual Systems and Multimedia, 2002. [87] Z. Karni and C. Gotsman. Compression of soft-body animation sequences. Computers & Graphics, 28(1):25–34, February 2004. [88] A. Karniel and G. Inbar. A model for learning human reaching-movements. In 18th Annual International Conference of the IEEE on Engineering in Medicine and Biology Society, pages 619–620, 1996. [89] J. Kimball. Organization of the Nervous System, Kimball’s Biology Pages. 2003. http://users.rcn.com/jkimball.ma.ultranet/BiologyPages. [90] H. Ko and N. Badler. Animating human locomotion with inverse dynamics. In IEEE Computer Graphics and Applications, pages 50–59, 1996. [91] T. Komura, A. Kuroda, S. Kudoh, T. Lan, and Y. Shinagawa. An inverse kinematics method for 3d figures with motion data. In Computer Graphics International, 2003, pages 242–247, 2003.

[92] E. Kubica, D. Wang, and D. Winter. Feedforward and deterministic fuzzy control of balance and posture during human gait. In IEEE International Conference on Robotics and Automation, pages 2293–2298, 2001. [93] N. Leavitt. Are web services finally ready to deliver? IEEE Computer, 37(11):14–18, 2004.

[94] J. E. Lengyel. Compression of time-dependent geometry. In Proceedings of the 1999 symposium on Interactive 3D graphics, pages 89–95. ACM Press, 1999. [95] D. Liu, J. Peng, K. H. Law, and G. Wiederhold. Efficient integration of web services with distributed data flow and active mediation. In ICEC ’04: Proceedings of the 6th international conference on Electronic commerce, pages 11–20, New York, NY, USA, 2004. ACM Press. [96] R. Loureiro, F. Amirabdollahian, S. Coote, E. Stokes, and W. Harwin. Using haptics technology to deliver motivational therapies in stroke patients: Concepts and initial pilot studies. In Proceedings of EuroHaptics 2001, 2001. [97] M. R. Macedonia, D. P. Brutzman, M. J. Zyda, D. R. Pratt, P. T. Barham, J. Falby, and J. Locke. Npsnet: a multi-player 3d virtual environment over the internet. In SI3D ’95: Proceedings of the 1995 symposium on Interactive 3D graphics, pages 93–94. ACM Press, 1995. [98] F. Mantovani. Towards Cyber-Psychology: Mind, Cognitions and Society in the Internet Age, chapter VR learning: Potential and Challenges for the Use of 3D Environments in Education and Training. IOS Press, 2003. [99] J. Martinez. ”MPEG-7 Overview”, ISO/IEC JTC1/SC29/WG11N5525, Pattaya, March 2003. [100] Maya. 3d animation and effects software. http://www.alias.com/eng/products-services/maya/index.shtml. [101] M. Metso, A. Koivisto, and J. Sauvola. Multimedia adaptation for dynamic environments. In IEEE Second Workshop on Multimedia Signal Processing, pages 203–208, 1998. [102] Microsoft Developer Network. Using Matrices, DirectX 9 Tutorials, Microsoft Corporation 2004. http://msdn.microsoft.com/archive/default.asp?url=/archive /en-us/directx9 c/directx/graphics/programmingguide/ TutorialsAndSamplesAndToolsAndTips /tutorials/3/step1.asp.

[103] M. Mine. Towards virtual reality for the masses: 10 years of research at disney’s vr studio. In Proceedings of the workshop on Virtual environments 2003, pages 11–17. ACM Press, 2003. [104] Moving Picture Experts Group. Call for Proposals for Lightweight Scene Representation, Munchen – March 2004. ISO/IEC JTC1/SC29/WG11 Document No. N6337. [105] MultiGen-Paradigm, Inc. Vega Prime, http://www.multigen.com/products/runtime/vega prime/index.shtml, 2005. [106] National Instruments Corporation. LabVIEW: Graphical development environment for signal acquisition, measurement analysis, and data presentation. http://www.ni.com/labview/. [107] S. Northrup, N. Sarkar, and K. Kawamura. Biologically-inspired control architecture for a humanoid robot. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1100–1105, 2001. [108] G. O’Driscoll and G. O’Driscoll. The Essential Guide to Digital Set-Top Boxes and Interactive TV. Prentice Hall PTR, 1999. [109] S. Ok, K. Miyashita, and K. Hase. Evolving bipedal locomotion with genetic programming - a preliminary report. In Congress on Evolutionary Computation, pages 1025–1032, 2001. [110] M. Oliveira, J. Crowcroft, and M. Slater. An innovative design approach to build virtual environment systems. In EGVE ’03: Proceedings of the workshop on Virtual environments 2003, pages 143–151. ACM Press, 2003. [111] OpenCV. Open Source Computer Vision Library http://www.intel.com/research/mrl/research/opencv/. [112] OSG Community. OpenSceneGraph, http://www.openscenegraph.org, 2005. [113] K. Otto. Towards semantic virtual environments. In SVE ’05: Workshop towards Semantic Virtual Environments, pages 47–56, 2005. [114] S. Oviatt. Advances in robust multimodal interface design. IEEE Computer Graphics and Applications, 23(5):62–68, 2003. [115] S. Oviatt, P. Cohen, L. Wu, J. Vergo, L. Duncan, B. S. J. Bers, T. Holzman, T. Winograd, J. Landay, J. Larson, and D. Ferro. Designing the user interface for multimodal speech and gesture applications: State-of-the-art systems and research directions. Human Computer Interaction, 15(4):263–322, 2000.

[116] P. van Beek, A. B. Benitez, J. Heuer, J. Martinez, P. Salembier, Y. Shibata, J. R. Smith, T. Walker. Text of 15938-5/FCD Information Technology Multimedia Content Description Interface Part 5 Multimedia Description Schemes, Tech. rep. N3966, ISO/IEC JTC1/SC29/WG11, Singapore, SG, March 2001. [117] K. Perlin. Real time responsive animation with personality. In IEEE Transactions on Visualization and Computer Graphics, pages 5–15, 1995. [118] K. Perlin. Building virtual actors who can really act. In 2nd International Conference on Virtual Storytelling, 2003. [119] J. Pierce, A. Forsberg, M. Conway, S. Hong, and R. Zeleznik. Image plane interaction techniques in 3d immersive environments. In Symposium on Interactive 3D Graphics. Providence, Rhode Island, April 1997. [120] M. Ponder, B. Herbelin, T. Molet, S. Schertenlieb, B. Ulicny, G. Papagiannakis, N. Magnenat-Thalmann, and D. Thalmann. Immersive vr decision training: telling interactive stories featuring advanced virtual human simulation technologies. In EGVE ’03: Proceedings of the workshop on Virtual environments 2003, pages 97–106. ACM Press, 2003. [121] M. Ponder, B. Herbelin, T. Molet, S. Schertenlieb, B. Ulicny, G. Papagiannakis, N. Magnenat-Thalmann, and D. Thalmann. Immersive vr decision training: telling interactive stories featuring advanced virtual human simulation technologies. In EGVE ’03: Proceedings of the workshop on Virtual environments 2003, pages 97–106. ACM Press, 2003. [122] M. Preda and F. Prˆeteux. Advanced animation framework for virtual character within the mpeg-4 standard. In Proceedings IEEE International Conference on Image Processing (ICIP’2002), Rochester, NY, volume 3, pages 509–512, 22-25 September 2002. [123] L. M. Reeves, J. Lai, J. A. Larson, S. Oviatt, T. S. Balaji, S. Buisine, P. Collings, P. Cohen, B. Kraal, J.-C. Martin, M. McTear, T. Raman, K. M. Stanney, H. Su, and Q. Y. Wang. Guidelines for multimodal user interface design. Commun. ACM, 47(1):57–59, 2004. [124] N. Reithinger, J. Alexandersson, T. Becker, A. Blocher, R. Engel, M. L¨ockelt, J. M¨ uller, N. Pfleger, P. Poller, M. Streit, and V. Tschernomas. Smartkom: adaptive and flexible multimodal access to multiple applications. In Proceedings of the 5th international conference on Multimodal interfaces, pages 101–108. ACM Press, 2003. [125] N. Rishe. Database design: the semantic modeling approach. McGraw-Hill, 1992.

[126] I. Rodriguez, M. Peinado, R. Boulic, and D. Meziat. Reaching volumes generated by means of octal trees and cartesian constraints. In CGI 2003, 2003. [127] L. Rong and I. Burnett. Dynamic multimedia adaptation and updating of media streams with mpeg-21. In IEEE First Consumer Communications and Networking Conference (CCNC 2004), pages 436–441, 2004. [128] J. Rossignac. 3d compression made simple: Edgebreaker with zipandwrap on a corner-table. In Proceedings of the SMI 2001 International Conference on Shape Modeling and Applications, pages 278–283, 2001. [129] S. Roy, B. Shen, V. Sundaram, and R. Kumar. Application level hand-off support for mobile media transcoding sessions. In NOSSDAV ’02: Proceedings of the 12th international workshop on Network and operating systems support for digital audio and video, pages 95–104, New York, NY, USA, 2002. ACM Press. [130] E. Ruffaldi, C. Evangelista, and M. Bergamasco. Populating virtual environments using semantic web. In Proceedings of 1st Italian Semantic Web Workshop: Semantic Web Applications and Perspectives. http://semanticweb.deit.univpm.it/ swap2004/program.html, 2004. [131] P. Salembier. Overview of the mpeg-7 standard and of future challenges for visual information analysis. EURASIP Journal on Applied Signal Processing, 4:1–11, 2002. [132] S. Schertenleib, M. Gutierrez, F. Vexo, and D. Thalmann. Conducting a virtual orchestra: Multimedia interaction in a music performance simulator. IEEE Multimedia, Special Issue on Multisensory Communication and Experience through Multimedia, 11(3):40–49, July 2004. [133] Sense8 Corporation. WorldToolkit Reference Manual – Release 7. Mill Valley, CA, 1998. [134] Shout3D. Shout3D, Eyematic Interfaces Inc. (2001). http://www.shout3d.com. [135] Silicon Graphics, Inc. Open Inventor, http://oss.sgi.com/projects/inventor/, 2005. [136] Silicon Graphics, Inc. OpenGL Performer, http://www.sgi.com/products/software/performer/, 2005. [137] M. Soto and S. Allongue. Modeling methods for reusable and interoperable virtual entities in multimedia virtual worlds. Multimedia Tools Appl., 16(12):161–177, 2002.

[138] K. E. Steiner and J. Tomkins. Narrative event adaptation in virtual environments. In Proceedings of the 9th international conference on Intelligent user interface, pages 46–53. ACM Press, 2004. [139] D. Sturman. The state of computer animation. SIGGRAPH Comput. Graph., 32(1):57–61, 1998. [140] Sun Microsystems, Inc. The Java 3D API Specification, http://java.sun.com/products/java-media/3D/forDevelopers/J3D 1 3 API/ j3dguide/Concepts.html, 2002. [141] L. Tanco and A. Hilton. Realistic synthesis of novel human movements from a database of motion capture examples. In Workshop on Human Motion, pages 137–142, 2000. [142] G. Taubin. 3D Geometry Compression and Progressive Transmission, Eurographics State of the Art Report, September 1999. [143] H. Tramberend. Avocado: A distributed virtual reality framework. In VR ’99: Proceedings of the IEEE Virtual Reality, pages 14–21. IEEE Computer Society, 1999. [144] X. Tu and D. Terzopoulos. Artificial fishes: physics, locomotion, perception, behavior. In Proceedings of the 21st annual conference on Computer graphics and interactive techniques, pages 43–50. ACM Press, 1994. [145] M. Uschold and M. Gruninger. Ontologies: Principles, methods and applications. Knowledge Engineering Review, 11(2):93155, February 1996. [146] A. van Dam, D. H. Laidlaw, and R. M. Simpson. Experiments in immersive virtual reality for scientific visualization. Computers and Graphics, 26(4):535– 555, August 2002. [147] A. Vetro, C. Christopoulos, and H. Sun. Video transcoding architectures and techniques: an overview. Signal Processing Magazine, IEEE, 20(2):18–29, 2003. [148] A. Vetro, H. Sun, and Y. Wang. Object-based transcoding for adaptable video content delivery. IEEE Transactions on Circuits and Systems for Video Technology, 11(3):387–401, 2001. [149] Virtools SA. Interactive 3d content development tools: http://www.virtools.com. [150] D. Vodislav. A visual programming model for user interface animation. In Proceedings of IEEE Symposium on Visual Languages, 23-26 Sept., pages 344 – 351, 1997.

[151] W3C. XML Path Language (XPath). [152] K. Watsen, R. Darken, and W. Capps. A handheld computer as an interaction device to a virtual environment. In Proceedings of Third International Immersive Projection Technology Workshop (IPT 1999), Stuttgart, Germany, 1999. [153] B. Witmer and M. Singer. Measuring presence in virtual environments: A presence questionnaire. In Presence: Teleoperators and Virtual Environments, volume 7, pages 225–240, 1998. [154] B. Xiao, R. Lunsford, R. Coulston, M. Wesson, and S. Oviatt. Modeling multimodal integration patterns and performance in seniors: toward adaptive processing of individual differences. In Proceedings of the 5th international conference on Multimodal interfaces, pages 265–272. ACM Press, 2003. [155] C. Yimin, Z. Tao, W. Di, and H. Yongyi. A robot simulation, monitoring and control system based on network and java3d. In Proceedings of the 4th World Congress on Intelligent Control and Automation, pages 139–143, 2002. [156] R. Zeleznik, K. Hemdon, D. Robbins, N. Huang, T. Meyer, N. Parker, and J. Hughes. An interactive 3d toolkit for constructing 3d widgets. In SIGGRAPH, August 1993.

Appendix A

Descriptors for Augmented CGI Films



Figure A.1: CGI film descriptor (1/2)



Figure A.2: CGI film descriptor (2/2)

[XML listing not reproduced: it describes a virtual football player executing a free shot without a goalkeeper, with a default view defined as a fixed camera filming from the right of the player, followed by the numeric animation and interaction parameters of the sequence.]

Figure A.3: Example of a CGI film description (1/2)

[XML listing not reproduced: it defines two additional views for the same film, a top view ("fixed camera filming from the sky") and a mobile view ("mobile camera closely following the action").]

Figure A.4: Example of a CGI film description (2/2)

Curriculum Vitae (Last Updated: April, 2005)

Mario Arturo GUTIÉRREZ ALONSO
Research Assistant and PhD student
Virtual Reality Laboratory (VRlab)
EPFL, CH-1015 Lausanne, Switzerland
e-mail: [email protected], [email protected]

Date and place of birth: April 17, 1976; Toluca, Mexico.
Civil status: single.
Languages: Spanish (native speaker), English (fluent), French (fluent).

Education:
- Master of Science in Computer Science. ITESM Toluca Campus, Mexico (May 2001).
- B.Sc., Computer Systems Engineering. ITESM Toluca Campus, Mexico (December 1998).

Experience:

Since August 2001

Research Assistant and PhD student at VRlab-EPFL, Switzerland. Thesis subject: semantics-based models for Virtual Environments. Participation in European projects: IST-INTERFACE, IST-Worlds Studio, AIM@SHAPE Network of Excellence.

January 1999 – July 2001

Administrator of the Applied Computing Centre, ITESM Toluca Campus, Mexico. Assistant Lecturer for several graduate and undergraduate courses.

June-July 1999

Engineering internship at the Laboratory for Analysis and Architecture of Systems / French National Center for Scientific Research (LAAS/CNRS): development of a real-time data acquisition system based on Controller Area Network cards. Toulouse, France.

June-July 1998

Engineering internship at Paul Sabatier University Toulouse III: parallelisation of scientific computation codes (Monte Carlo methods for heat radiation analysis). Toulouse, France.

Selected Publications

Refereed Journals

- S. Schertenleib, M. Gutiérrez, F. Vexo, D. Thalmann, Conducting a Virtual Orchestra: Multimedia Interaction in a Music Performance Simulator. IEEE Multimedia, Special Issue on Multisensory Communication and Experience through Multimedia, July-September 2004, Vol. 11, Issue 3, pages 40-49.

- T. Abaci, R. de Bondeli, J. Ciger, M. Clavien, F. Erol, M. Gutiérrez, S. Noverraz, O. Renault, F. Vexo, D. Thalmann, Magic Wand and Enigma of the Sphinx. Computers & Graphics, Volume 28, Issue 4, August 2004, pages 477-484.

- M. Gutierrez, F. Vexo, and D. Thalmann. Semantics-based representation of virtual environments. International Journal of Computer Applications in Technology (IJCAT), Special issue on "Models and methods for representing and processing shape semantics", Vol. 23, Issue 2/3/4, pages 229-238, 2005.

- Rynson W. H. Lau, Frederick Li, Tosiyasu L. Kunii, Baining Guo, Bo Zhang, Nadia Magnenat-Thalmann, Sumedha Kshirsagar, Daniel Thalmann, Mario Gutiérrez, Emerging Web Graphics Standards and Technologies. IEEE Computer Graphics and Applications, Vol. 23, No. 1, January/February 2003, pages 66-75.

Refereed Conferences

- R. Ott, M. Gutierrez, D. Thalmann, F. Vexo, VR Haptic Interfaces for Teleoperation: an Evaluation Study. Proceedings of the IEEE Intelligent Vehicles Symposium: IV'05, June 2005, Las Vegas, Nevada, USA (to appear).

- R. Ott, M. Gutierrez, D. Thalmann, F. Vexo, Improving User Comfort in Haptic Virtual Environments through Gravity Compensation. Proceedings of the First Joint EuroHaptics Conference and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems: WorldHaptics'05, March 2005, Pisa, Italy, pages 401-409.

- M. Gutierrez, D. Thalmann, F. Vexo, L. Moccozet, N. Magnenat-Thalmann, M. Mortara, M. Spagnuolo, An Ontology of Virtual Humans: incorporating semantics into human shapes. Workshop towards Semantic Virtual Environments (SVE05), March 2005, Villars, Switzerland, pages 57-67.

- M. Gutierrez, D. Thalmann, F. Vexo, Semantic Virtual Environments with Adaptive Multimodal Interfaces. 11th International Conference on Multimedia Modelling (MMM2005), Melbourne, Australia, January 2005, pages 277-283.

- M. Gutierrez, D. Thalmann, F. Vexo, Augmented CGI Films. 2nd International Conference on Intelligent Access of Multimedia Documents (MediaNet'04), Tozeur, Tunisia, November 2004, pages 53-64.

- M. Gutierrez, P. Lemoine, D. Thalmann, F. Vexo. Telerehabilitation: Controlling Haptic Virtual Environments through Handheld Interfaces. Proceedings of ACM Symposium on Virtual Reality Software and Technology (VRST 2004), Hong Kong, November 2004, pages 195-200.

- M. Gutierrez, D. Thalmann, and F. Vexo. Creating Cyberworlds: Experiences in Computer Science Education. Proceedings of International Conference on Cyberworlds, Tokyo, Japan, November 2004, pages 401-408.

- M. Gutierrez, R. Ott, D. Thalmann, and F. Vexo. Mediators: Virtual haptic interfaces for tele-operated robots. Proceedings of the 13th IEEE International Workshop on Robot and Human Interactive Communication (RO-MAN 2004), Kurashiki, Okayama, Japan, September 2004, pages 515-520.

- P. Lemoine, M. Gutiérrez, F. Vexo, D. Thalmann, Mediators: Virtual Interfaces with Haptic Feedback. Proceedings of the 4th International Conference EuroHaptics 2004, Munich, Germany, June 2004, pages 68-73.

- M. Gutiérrez, F. Vexo, D. Thalmann, Reflex Movements for a Virtual Human: a Biology Inspired Approach. Proceedings of the 3rd Hellenic Conference on Artificial Intelligence, Special Session on Intelligent Virtual Environments (SETN 2004), Lecture Notes in Artificial Intelligence, Springer Verlag, Samos, Greece, May 2004, pages 525-534.

- M. Gutiérrez, F. Vexo, D. Thalmann, The Mobile Animator: Interactive Character Animation in Collaborative Virtual Environments. Proceedings of IEEE Virtual Reality 2004, Chicago, USA, March 2004, pages 125-132.

- J. Ciger, M. Gutiérrez, F. Vexo, D. Thalmann, The Magic Wand. Proceedings of Spring Conference on Computer Graphics (SCCG '2003), Budmerice, Slovak Republic, April 2003, pages 132-138.