Spatial Awareness in Full-Body Immersive Interactions: Where do we Stand?

Ronan Boulic1, Damien Maupu1, Manuel Peinado2, Daniel Raunhardt3

1 VRLAB, Ecole Polytechnique Fédérale de Lausanne, Station 14, 1015 Lausanne, Switzerland
2 Universidad de Alcalá, Departamento de Automática, Spain
3 BBV Software Service AG, Zug, Switzerland
{Ronan.Boulic,Damien.Maupu}@epfl.ch, {Manuel.Peinado,Daniel.Raunhardt}@gmail.com

Abstract. We are interested in developing real-time applications, such as games or virtual prototyping, that take advantage of the user's full-body input to control a wide range of entities, from a self-similar avatar to any type of animated character, including virtual humanoids of differing size and proportions. The key issue, as always in real-time interaction, is to identify the key factors that should receive computational resources to ensure the best user interaction efficiency. For this reason we first recall the definition and scope of such essential terms as immersion and presence, clarifying the confusion existing in the fields of Virtual Reality and Games. This is done in conjunction with a short literature survey relating our interaction-efficiency goal to key inspirations and findings from the field of Action Neuroscience. We then briefly describe our full-body real-time postural control with proactive local collision avoidance. The concept of obstacle spherification is introduced both to reduce local minima and to decrease the user's cognitive load while interacting in complex environments. Finally, we stress the interest of egocentric environment scaling, so that the user's egocentric space matches that of a height-differing controlled avatar.

Keywords: spatial awareness, immersion, presence, collision avoidance.

1 Introduction

Full-body interaction has a long history in Virtual Reality and is starting to enjoy significant commercial success with dedicated products. However, exploiting full-body movements is still far from achieving its full potential. Current spatial interactions are limited to the control of severely restricted gesture spaces, with limited interactions with the environment (e.g. the ball of a racquet game). In most games, the correspondence of the player's posture with the avatar's posture generally doesn't matter, as long as involvement is ensured and preserved over the whole game duration. We are interested in achieving two goals: 1) increasing the user's spatial awareness in complex environments through a closer correspondence of the user's and the avatar's postures, and 2) impersonating potentially widely different entities, ranging from a self-similar avatar to any type of animated character, including virtual humanoids with differing size and proportions. These two goals may sound contradictory when the animated character differs markedly from the user, but we are convinced that identifying the boundary conditions of distorted self-avatar acceptance is a definitely useful long-term goal. For the time being, the core issue remains, as always in real-time interaction, to identify the key factors that should receive computational resources to ensure the best interaction efficiency, whether as a gamer or as an engineer evaluating a virtual prototype for a large population of future users. Within this frame of mind it is important to recall the definition and scope of such essential terms as immersion and presence, clarifying the confusion existing in the fields of Virtual Reality and Games. This is addressed in the first part of Section 2, in conjunction with a literature survey relating our interaction-efficiency goal to key inspirations and findings from the field of Action Neuroscience. The second part of this background section deals with the handling of collision avoidance in real-time interactions. Section 3 then briefly describes our full-body real-time postural control with proactive local collision avoidance. The concept of obstacle spherification is introduced both to reduce local minima and to decrease the user's cognitive load while interacting in complex environments. Finally, we stress the interest of egocentric environment scaling, so that the user's egocentric space matches that of a height-differing controlled avatar.

2 Background

Spatial awareness, in our context, is the ability to infer one's interaction potential in a complex environment from one's continuous sensorimotor assessment of the surrounding virtual environment. It is only one aspect of the feedback and awareness considered necessary to maintain fluent collaboration in virtual environments [GMMG08]. We first address terminology issues in conjunction with a brief historical perspective. We then recall the key references on handling collision avoidance for human or humanoid robot interactions.

2.1 What Action Neuroscience Tells Us about Immersive Interactions

The dualism of mind and body from Descartes is long gone, but no single alternative theory is yet able to replace it as an explanatory framework integrating both human perception and action into a coherent whole. Among the numerous contributors of alternative views of this problem, Heidegger is often acknowledged as the first to formalize the field of embodied interactions [D01] [MS99] [ZJ98] through the neologisms ready-to-hand and present-at-hand [H27]. Both were illustrated through the use of a hammer, which can be ready-to-hand when used in a standard task; in such a case it recedes from awareness, as if it became part of the user's body. The hammer becomes present-at-hand when it is unusable and has to be examined to be fixed. Most human activity is spent in the readiness-to-hand mode, similar to a subconscious autopilot mode. On the other hand, human creativity emerges through the periods in present-at-hand mode, when problems have to be faced and solved.

We are inclined to see a nice generalization of these two modes in the work of the psychologist Csikszentmihalyi, who studied autotelic behaviors, i.e. self-motivating activities, of people who proved to be deeply involved in a complex activity without direct rewards [NvDR08][S04]. Csikszentmihalyi asserted that autotelicity arises from a subtle balance between the exertion of available skills and the addressing of new challenges. He called flow the strong form of enjoyment resulting from the performance of an autotelic activity, in which one easily loses the sense of time and of oneself [S04]. This term is now in common use in the field of game design, and even teaching, together with the term involvement. Both terms are most likely associated with the content of the interactive experience (e.g. the game design) rather than the form of the interaction (the sensory output, which is generic across a wide range of game designs). The different logical levels of content and form have not been used consistently in the literature on interactive virtual environments, hence generating some confusion about the use of the word presence (e.g. in [R03][R06]). As clearly stated by Slater, presence has nothing to do with the content of an interactive experience but only with its form [S02]. It qualifies the extent to which the simulated sensory data convey the feeling of being there, even if cognitively one knows one is not in a real-life situation [S03]. As Slater puts it, a virtual environment system can induce a high degree of presence, yet one may find the designed interaction plain boring. Recently, Slater has opted for the expression Place Illusion (PI) in lieu of presence, due to the existing confusion described above [RSSS09]. While PI refers to the static aspects of the virtual environment, the additional term Plausibility (Psi) refers to its dynamic aspects [RSSS09]. Both constitute the core of a new evaluation methodology presented in [SSC10].
A complementary view to explaining presence/PI is also grounded in the phenomenological approach of Heidegger and the more recent body of work from Gibson [FH98] [ZJ98] [ON01], both characterized by the ability to 'do' there [SS05]. However, it is useful to recall the findings from Jeannerod [J09] that question the validity of the Gibsonian approach [ON01]. They are based on the display of point-light movement sequences inspired by the original study of Gunnar Johansson on the perception of human movement [J73]. The point-light sequences belong to three families: movement without or with human character, and, for the latter class, without meaning (sign language) or with known meaning (pantomime) for the subjects. Movements without human character appear to be processed only in the visual cortex, whereas those with human character are treated in two distinct regions, depending on whether they are known (ventral "semantic" stream) or unknown (dorsal "pragmatic" [J97] goal-directed stream). One particularly interesting finding is that an accelerated human movement loses its human character and is processed only in the visual cortex. These findings have the following consequences for our field of immersive interactions with virtual humans. First, it is crucial to respect the natural dynamics of human entities when animating virtual humans; otherwise they may be disregarded as human altogether. Second, human beings have internal models of human actions that are activated not only when they perform those actions themselves but also when they see somebody else perform them. It is hence reasonable to believe that virtual humans performing good-quality movements activate the corresponding internal models of immersed subjects. Viewed known and unknown human movements are treated along different neural streams, one of which might be more difficult to verbalize a posteriori in a questionnaire, as it has no semantic information associated with it. As suggested above, it can be more difficult to assess presence through the usual means of questionnaires, as actions performed in this mode are not performed at a conscious level. In general, alternatives to questionnaires have to be devised, such as comparison with the outcome that would occur if the action were performed in a real-world setting. Physiological measurements are particularly pertinent, as Paccalin and Jeannerod report that simply viewing someone performing an action with effort induces heart- and breathing-rate variations [PJ00]. Introduced simultaneously with presence, the concept of immersion refers to the objective level of sensory fidelity a Virtual Reality system provides [SS05]. For example, the level of visual immersion depends only on the system's rendering software and display technology [BM07]. Bowman et al. have chosen to study the effect of the level of visual immersion on application effectiveness by combining various immersion components, such as field of view, field of regard (total size of the visual field surrounding the user), display size, display resolution, stereoscopy, frame rate, etc. [BM07].

2.2 Handling Collisions for Full-Body Avatar Control

Numerous approaches have been proposed for the on-line full-body motion capture of the user; we refer the interested reader to [A] [TGB00] [PMMRTB09] [UPBS08]. A method based on normalized movement allows the movement to be retargeted to a large variety of human-like avatars [PMLGKD08] [PMLGKD09].

Fig. 1. (a) In the rubber band method, the avatar’s body part (white dot) remains tangent to the obstacle surface while the real body part collides (black dot). (b) In the damping method [PMMRTB09], whenever a real body part enters the influence area of an obstacle (frame 2), its displacement is progressively damped (2-3) to ensure that no interpenetration happens (4-6). No damping is exerted when moving away from the obstacle (7).

Spatial awareness includes the proper handling of the avatar's collisions, including self-collisions [ZB94]. In fact, the control of a believable avatar should avoid collisions proactively rather than reactively; this reflects the behavior observed in monkeys [B00]. In case an effective collision happens between the user and the virtual environment, the standard approach is to repel the avatar, hence inducing a collocation error; however, such an error is less disturbing than the visual sink-in that would occur otherwise [BH97] [BRWMPB06] (hand only) [PMMRTB09] (full body). Both of these approaches are based on the rubber-band method (Fig. 1a), except that the second one has an additional damping region surrounding selected body parts, to be able to enforce the proactive damping of the segments' movements towards obstacles (Fig. 1b, Fig. 2). The original posture-variation damping was proposed in [FT87] and extended to a multiple-priority IK architecture in [PMMRTB09].
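The rubber-band behavior of Fig. 1a can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a spherical obstacle, and the function name and parameters are ours.

```python
import numpy as np

def rubber_band(real_pos, center, radius):
    # Hypothetical sketch of the rubber-band method for a spherical
    # obstacle: when the tracked (real) body part penetrates the
    # obstacle, the avatar's body part is projected back onto the
    # obstacle surface, trading a small collocation error for the
    # absence of a visual sink-in.
    offset = real_pos - center
    dist = np.linalg.norm(offset)
    if dist >= radius:
        return real_pos.copy()          # outside: the avatar follows the user
    return center + offset * (radius / dist)   # inside: snap to the surface
```

For example, a hand tracked halfway inside a unit sphere is displayed at the nearest surface point, reproducing the white dot of Fig. 1a.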

Fig. 2. Selective damping of the arm movement component towards the obstacles; a line is displayed between a segment and an obstacle [PMMRTB09].

3 Smoothing Collision Avoidance

The damping scheme recalled in the previous section is detailed in Fig. 3 in the simplified context of a point (called an observer) moving towards a planar obstacle. Only the relative displacement component along the normal is damped. Such an approach may produce a local minimum in on-line full-body interaction whenever an obstacle lies between a tracked optical marker and the following avatar segment (equivalent to the point from Fig. 3). In such a case the segment attraction is counter-balanced by the damping, as visible in Fig. 4a.

Fig. 3. Damping in the influence area (influence distance D) of a planar obstacle, for a point-shaped observer at distance d with a relative movement towards the obstacle. The relative normal displacement δn is damped by a factor (d/D)², i.e. g = δn·(d/D)² [PMMRTB09].
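The damping rule of Fig. 3 can be written compactly. This is a sketch under our own naming (the paper's IK architecture applies it per segment with priorities); the only formula taken from the source is the quadratic factor (d/D)².

```python
import numpy as np

def damp_toward_obstacle(delta_p, n, d, D):
    # Sketch of the damping scheme of Fig. 3: inside the influence area
    # (d < D), only the displacement component moving toward the obstacle
    # (against the outward unit normal n) is scaled by (d/D)**2;
    # tangential motion and motion away from the obstacle pass through
    # unchanged, as in the damping method of Fig. 1b.
    if d >= D:
        return delta_p.copy()          # outside the influence area
    dn = float(np.dot(delta_p, n))     # signed normal component
    if dn >= 0.0:
        return delta_p.copy()          # moving away: no damping
    g = dn * (d / D) ** 2              # damped normal displacement
    return delta_p + (g - dn) * n      # replace dn by g, keep tangent part
```

Note that the factor vanishes at the surface (d = 0), so the observer can never cross it, while at the boundary of the influence area (d = D) the motion is undamped and the transition is continuous.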

The present section proposes a simple and continuous alteration of an obstacle's normals, so that from afar the obstacle appears as if it were a sphere, whereas it progressively reveals its proper normals as the controlled segment gets closer to its surface. Obstacle shapes offering large flat surfaces are ideal candidates for this spherification, as we call it (see Fig. 4, bottom row). The concept is simple to put into practice. Instead of using the normal to the obstacle n, we replace it by nf, a combination of n and a "spherical" normal ns, which results from taking a vector from the obstacle centroid (see Fig. 4b). We have:

k = (d/D)^(1/m)

(1)

nf = NORMALIZE((1 − k)·n + k·ns)

(2)

In Eq. 1, k is a spherification factor which ranges from 0 at the obstacle's surface (d = 0) to 1 at the boundary of its influence area (d = D). The rationale for this factor is that we can "spherify" a lot when we are far away from the obstacle (k = 1), but not when close to it (k = 0).
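Eqs. 1-2 translate directly into code. The sketch below is illustrative (names and the value of the exponent parameter m are ours); it assumes ns is already the unit vector from the obstacle centroid to the observed point.

```python
import numpy as np

def spherified_normal(n, ns, d, D, m=2.0):
    # Sketch of Eqs. 1-2: blend the true obstacle normal n with the
    # "spherical" normal ns (unit vector from the obstacle centroid to
    # the observed point). At the boundary of the influence area (d = D)
    # k = 1 and the obstacle looks like a sphere; at the surface (d = 0)
    # k = 0 and the true normal is recovered, so no local minimum is
    # created far away and no interpenetration is allowed up close.
    k = (min(max(d, 0.0), D) / D) ** (1.0 / m)   # Eq. 1, clamped to [0, D]
    nf = (1.0 - k) * n + k * ns                  # Eq. 2, before normalization
    return nf / np.linalg.norm(nf)
```

At intermediate distances the returned normal rotates continuously from ns toward n, which is what removes the abrupt damping direction changes along flat faces.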