Taking Steps: The Influence of a Walking Technique on Presence in Virtual Reality

MEL SLATER, MARTIN USOH, and ANTHONY STEED
University of London

This article presents an interactive technique for moving through an immersive virtual environment (or "virtual reality"). The technique is suitable for applications where locomotion is restricted to ground level. The technique is derived from the idea that presence in virtual environments may be enhanced the stronger the match between proprioceptive information from human body movements and sensory feedback from the computer-generated displays. The technique is an attempt to simulate body movements associated with walking. The participant "walks in place" to move through the virtual environment across distances greater than the physical limitations imposed by the electromagnetic tracking devices. A neural network is used to analyze the stream of coordinates from the head-mounted display, to determine whether or not the participant is walking on the spot. Whenever it determines the walking behavior, the participant is moved through virtual space in the direction of his or her gaze. We discuss two experimental studies to assess the impact on presence of this method in comparison to the usual hand-pointing method of navigation in virtual reality. The studies suggest that subjective rating of presence is enhanced by the walking method, provided that participants associate subjectively with the virtual body provided in the environment. An application of the technique to climbing steps and ladders is also presented.

Categories and Subject Descriptors: H.1.2 [Models and Principles]: User/Machine Systems; H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems—artificial realities; H.5.2 [Information Interfaces and Presentation]: User Interfaces; I.3.4 [Computer Graphics]: Graphics Utilities—virtual device interfaces; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism—virtual reality

General Terms: Experimentation, Human Factors

Additional Key Words and Phrases: Immersion, locomotion, navigation, neural networks, presence, virtual environments, virtual reality
This is a substantially revised and expanded version of Slater et al. [1994a].
This work is funded by the UK Engineering and Physical Sciences Research Council (EPSRC) and the Department of Trade and Industry, through grant CTA/2 of the London Parallel Applications Centre. Anthony Steed is supported by an EPSRC research studentship.
Authors' address: Department of Computer Science and London Parallel Applications Centre, Queen Mary and Westfield College, University of London, Mile End Road, London E1 4NS, U.K.; email: {mel; bigfoot; steed}@dcs.qmw.ac.uk.
Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.
© 1995 ACM 1073-0516/95/0900-0201 $03.50
ACM Transactions on Computer-Human Interaction, Vol. 2, No. 3, September 1995, Pages 201-219.
1. INTRODUCTION

The ability to get from place to place is a fundamental requirement for action in both real and virtual environments. This requirement epitomizes what is very powerful yet what also may be flawed in virtual reality (VR) systems. These systems offer the possibility of perceptually immersing individuals into computer-generated environments, and yet the typical means for the most basic form of interaction—locomotion—do not at all match the physical actions of walking in reality. Generally, the powerful illusion of immersion may be lost through naive interaction metaphors borrowed from nonimmersive forms of human-computer interaction.

This article describes an interactive technique for locomotion in an immersive virtual environment (or "virtual reality"). The technique is suitable in applications where the participants are constrained to ground level, for example, while exploring a virtual building, as in an architectural walkthrough. The novelty of the technique is that participants carry out whole-body movements in a simulation of walking, without the necessity of hardware additional to the electromagnetic tracking devices on the head-mounted display (HMD) and glove (or 3D mouse). In brief, participants "walk in place" to move across virtual distances that are greater than the physical space determined by the range of the electromagnetic trackers. Pattern analysis of head movements as generated by the HMD predicts whether participants are walking in place or doing anything else at all. Whenever it is determined that they are walking in place, they are moved forward in the direction of gaze, so that the corresponding flow in the optical array gives the illusion of motion. Such illusory self-motion is usually called vection. Since the pattern analyzer only detects head movements characteristic of walking in place, participants are still able to take real physical steps while remaining within tracker range, causing vection corresponding to their actual movements without surplus movement.

In an earlier report [Slater et al. 1993] we presented the technique, called the Virtual Treadmill,¹ in the context of (at that time) a partially complete human factors evaluation. In this article we discuss the technique in the context of a model of presence in immersive virtual environments. We also present the implementation details and results of two empirical studies with users. The utility of this idea for climbing or descending steps and ladders is also discussed.

¹The London Parallel Applications Centre had a holding patent covering the U.K. and other countries to protect aspects of this technology.

2. VIRTUAL ENVIRONMENTS

2.1 The Proprioceptive and Sensory Data Loop

A VR system requires that the normal proprioceptive information we use unconsciously to form a mental model of the body be overlaid with sensory data that is supplied by computer-generated displays. Proprioception was
defined by Sacks [1985] as "that continuous but unconscious sensory flow from the movable parts of our body (muscles, tendons, joints), by which their position and tone and motion [are] continually monitored and adjusted, but in a way which is hidden from us because it is automatic and unconscious." Proprioception allows us to form a mental model of our body and its parts, one which describes the dynamic spatial and relational disposition of our body: we know where our left foot is (without having to look), and we can clap our two hands together (with closed eyes), all by relying on this unconscious mental model.

Tracking devices placed on the physical body may be employed to map real body movements onto corresponding movements of a virtual body (VB), the participant's self-representation in the virtual environment. A fundamental requirement for an effective virtual reality is, therefore, that there is a consistency between the proprioceptive mental model and the sensory data supplied by the displays: consistency, predictability, and completeness are required for an effective match between the mental body model and the VB.

Gibson's [1986] notion of the ambient optical array may be employed to elaborate these ideas. Gibson argued that the visual world surrounding an individual may be conceived as an arrangement consisting of a nested hierarchy of solid angles, all with the same apex and completely surrounding a position in the environment. The apex corresponds to the position of an observer, not the abstract point of the space of the mathematician: it is a position in an environment that may be occupied by an individual. Such an individual is conceived not as a disembodied observer but as a live animal standing on feet and with a head, eyes, ears, nose, and mouth, immersed in and moving through the environment: "When the position becomes occupied, something very interesting happens to the ambient array: it contains information about the body of the observer" [Gibson 1986, p. 66]. This describes the relationship between perception and self-perception: perception of the environment is inseparable from perception of the self. Regarding the optical information for self-perception, he wrote: "The optical information to specify the self, including the head, body, arms, and hands, accompanies the optical information to specify the environment. The two sources of information coexist" [Gibson 1986, p. 116].

We call this requirement the proprioceptive and sensory data loop. In order for sensory data to function correctly at the expected level of reality, the data must inform us, in all modalities, about what is occurring. For example, when we see our leg move and come into contact with a solid object, we feel it touch the object (and perhaps feel pain); we hear the sound caused by our leg hitting the object; we hear the air as the leg glides through it; and we see the object itself react in accordance with our expectations. This loop is the crucial component of a convincing reality: the "reality" is virtual when the sensory data is computer generated.
2.2 Immersion

We call a computer system that supports such an experience an "immersive virtual environment" (IVE). It is immersive since it immerses a representation of the person's body (the VB) in the computer-generated environment. It is a virtual environment in the sense defined by Ellis [1991]: consisting of content (objects and actors), geometry and dynamics, with an egocentric frame of reference, including perception of objects in depth, and giving rise to the normal ocular, auditory, vestibular, and other sensory cues and consequences. Whether or not a system can be classified as immersive depends crucially on the hardware, software, and peripherals (displays and body sensors) of that system. We use "immersion" as a description of a technology, rather than as a psychological characterization of what the system supplies to the human participant. Immersion includes the extent to which the computer displays are extensive, surrounding, inclusive, vivid, and matching.

The displays are more extensive the more sensory systems that they can accommodate. They are surrounding to the extent that information can arrive at the person's sense organs from any (virtual) direction, and to the extent that the individual can turn toward any direction and yet remain in the environment. They are inclusive to the extent that all external sensory data (from physical reality) are shut out. Their vividness is a function of the variety and richness of the sensory information they can generate [Steuer 1992]. In the context of visual displays, for example, color displays are more vivid than monochrome; high resolution is more vivid than low resolution; and displays depicting dynamically changing shadows are more vivid than those that do not. Vividness is concerned with the richness, information content, resolution, and quality of the displays. Finally, as we have argued above, immersion requires that there is a match between the participant's proprioceptive feedback about body movements and the information generated on the displays. The greater the degree of body mapping, the greater the extent to which the movements of the body can be accurately reproduced, and therefore the greater the potential match between proprioception and sensory data.

2.3 Presence

An IVE may lead to a sense of presence for a participant taking part in such an experience. Presence is the psychological sense of "being there" in the environment: it is an emergent property based on the immersive base given by the technology. However, any particular immersive system does not necessarily always lead to presence for all people: the factors that determine presence, given immersion, are an important area of study [Barfield and Weghorst 1993; Heeter 1993; Held and Durlach 1992; Loomis 1992; Sheridan 1992]. We concur with Steuer [1992] that presence is the central issue for virtual reality.

Our view concerning the relationship between immersion and presence is shown in Figure 1.

Fig. 1. Presence = f(match(prop, sense), match(rep, sense)); prop = proprioception; rep = internal representation; sense = sensory data.
The x-axis is the extent of the match between the displayed sensory data and the internal representation systems and subjective-world models typically employed by the participant. Although immersion is greater the greater the richness of the displays, as discussed above, we must also take into account the extent to which the displayed information allows particular individuals to construct their own internal mental models of reality. For example, some individuals might reject a sense of "reality" in the absence of sound: even though the visual display might allow some individuals to construct a vivid sense of "reality," others might reject it because it contradicts their personal self-model. The system might afford an excellent match for some individuals but be unsuited for others. We have explored the relationship between presence, subjectivity, and displayed data in earlier experiments [Slater et al. 1994b].

The y-axis is the extent of the match between proprioception and the displayed sensory data, as explained above. The changes to the display must be consistent with, and match through time without lag, the changes caused by the individual's motility and locomotion in the VE—whether of individual limbs or of the whole body relative to the ground.

Our general hypothesis is that presence is a positively increasing function of each of these two "matches"—that is, it increases with each of them. Note that these two axes are orthogonal—a system might provide a superb degree of visual, auditory, and tactile display immersion, so that most individuals have sufficient data to construct their internal representations successfully, but fail to provide a sufficient degree of match between the person's actions and the displayed results, thus breaking the link between sensory data and proprioception.

A further point about this hypothesis is that we would expect it to operate at many levels. At a very basic level, changes in the displays should result in suitable parasympathetic responses in, for example, the ocular and vestibular systems. When an individual visually focuses on a near object and then moves focus to a far object, the visual displays should likewise immediately respond appropriately when, for example, eye tracking is enabled. At a much higher level, when a person moves, the shadow of the virtual body on nearby surfaces should change accordingly [Slater et al. 1995]. At a similarly high level, the interactive metaphors employed in the system should match the structure of the sensory data and proprioception. This brings us
back to walking: if the optical flow indicates forward locomotion at ground level, then the proprioceptive information should correspond to this. A specific hypothesis of this article is, therefore, that the degree of presence depends on the match between proprioceptive and sensory data. The greater the match, the greater the extent to which the participant can associate with the VB as a representation of self. Since the VB is perceived as being in the VE, this should give rise to a belief (or suspension of disbelief) in the presence of self in that environment. In particular, the closer that the action required for forward locomotion corresponds to really "walking," the greater the sense of presence.
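The two-factor hypothesis can be illustrated with a small numerical sketch. The multiplicative functional form and the match scores below are invented for the example; the article claims only that presence increases with each of the two matches, not any particular form of f.

```python
# Toy sketch of the Figure 1 hypothesis:
#   Presence = f(match(prop, sense), match(rep, sense))
# The product used here is just one function that increases in each
# argument; the scores are hypothetical.

def presence(match_prop_sense: float, match_rep_sense: float) -> float:
    """Both match scores lie in [0, 1]; f need only increase in each."""
    return match_prop_sense * match_rep_sense

# Orthogonality of the two axes: superb displays (high rep-match) paired
# with a hand-pointing locomotion metaphor (low prop-match) can still
# yield lower presence than the same displays with walking in place.
pointing = presence(0.2, 0.9)   # rich displays, poor action match
walking = presence(0.8, 0.9)    # same displays, better action match
```

On this reading, improving either axis alone raises presence, and a deficit on one axis cannot be fully compensated by the other.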
3. LOCOMOTION

3.1 Other Methods

There is a tendency in VR research to use hand gestures to do everything, from grasping objects (a natural application), to scaling the world, and to navigation [Robinett and Holloway 1992; Vaananen and Bohm 1993]. This approach greatly overloads the hand gesture idea—the user has to learn a complete vocabulary of gestures in order to be effective in the virtual world. Small differences between gestures can be confusing, and in any case there is no guarantee of a correspondence among the gesture, the action to be performed, and the displayed outcome.

The standard VR metaphor for locomotion is hand pointing, with the direction of navigation determined either by gaze or by the direction of the hand gesture. The VPL method for navigation, as demonstrated for example at SIGGRAPH 90, used the DataGlove to recognize a pointing gesture, where the direction of movement was controlled by the pointing direction. Song and Norman [1993] review a number of techniques, distinguishing between navigation based on eyepoint movement and that based on object movement. Here we are interested in "naturalistic" navigation, appropriate for a walkthrough application, so we rule out navigation via manipulation of a root object in a scene hierarchy [Ware and Osborne 1990]. Fairchild et al. [1993] introduced a leaning metaphor for navigation, where the participant moves in the direction of body lean. The technique involves extending the apparent movement in virtual space in comparison with the real movement. In fact, this is an "ice skating" metaphor, which may not be appropriate, for example, for architects taking their clients on a virtual tour.

In the context of architectural walkthrough we require participants to experience a sense of moving through the virtual building interior in a manner that maximizes the match between sensory data and proprioception. Brooks [1992] used a steerable treadmill for this purpose. However, the use of any such device as a treadmill, footpads, roller skates [Iwata and Matsuda 1992], or even a large-area mat with sensing devices imposes constraints on the movements of participants. Moreover, there will always be an application where the virtual space to be covered is much larger than the physical space available—one of the major advantages of VR systems.
3.2 Walking in Place

We require that participants be able to move around in the space defined by the range of the electromagnetic tracker, such as a Polhemus sensor, by really walking, in order to cover small distances within that range. Beyond the range of the device, though, they carry out movements reminiscent of walking while staying still: they "walk in place." If this is possible, then proprioceptive information (associated with the motion of walking) matches sensory data (flow in the optical array consistent with "walking"). The new method for locomotion thus allows participants to move their virtual bodies, at ground level, across a much greater distance than the physical space afforded by the electromagnetic tracker. To cover a distance larger than the tracker range, the participant walks in place.

A major advantage of this technique over interfaces based on hand gesture is that the hand is not used at all for navigation. The direction of motion is determined by gaze: while carrying out this activity the participant just walks, in place or really, and he or she will move forward in virtual space in the direction of gaze. It is almost walking as usual, except that no forward movement takes place in physical space. (We never have to explain this to users; they pick it up automatically.) The hand may be entirely reserved for the purposes for which it is used in everyday reality, that is, the manipulation of objects and activation of controls.
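The gaze-directed movement rule can be sketched as follows. This is an illustrative reconstruction rather than the authors' code: the detector output, the walking speed constant, and the frame interval are all assumptions made for the example.

```python
import math

SPEED = 1.4  # assumed virtual walking speed, meters per second

def step_viewpoint(position, yaw, walking, dt):
    """Advance the viewpoint along the horizontal gaze direction while
    the participant is judged to be walking in place.

    position: (x, y, z) in meters; yaw: gaze heading in radians from the
    head tracker; walking: boolean output of the walking-in-place
    detector; dt: frame interval in seconds. Only the horizontal
    component of gaze is used, keeping locomotion at ground level."""
    if not walking:
        return position
    x, y, z = position
    dx, dz = math.cos(yaw), math.sin(yaw)
    return (x + SPEED * dt * dx, y, z + SPEED * dt * dz)

# Two simulated seconds of detected walking with gaze along the x-axis:
pos = (0.0, 1.7, 0.0)
for _ in range(100):
    pos = step_viewpoint(pos, yaw=0.0, walking=True, dt=0.02)
```

Because the eye height (y) is untouched and pitch is ignored, looking up or down changes the view but not the ground-level path, consistent with the constraint to ground-level locomotion.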
3.3
Implementation
The implementation a feed-forward that
detects
else. The which inputs
of
n data
to the bottom There
mz ), and
are two
the
top
corresponding We
obtain
weights
for
training net
and
the
corresponding
layers
consists
data
from
there
of ml
something
( x~, y,,
mz hidden
unit,
which
or O for anything
a person,
which
back-propagation.
down,
interspersed
moving
with
minutes.
An operator
whether
or not
corresponding
the
around,
periods
records subject
sequences
turning
of walking
units
outputs
binary is walking
the
data
into
are
head, This
used
the
to compute
training
of delta-coordinates,
and
on Computer-Human
1
The are
data, then
Interaction,
of these, to ten
corresponding together used
to
the
environsuch as
for five
neural net. The resulting network equations are then implemented machine as part of the code of the process that deals with detection indicating forward movement. ACM Transactions