Taking Steps: The Influence of a Walking Technique on Presence in Virtual Reality

MEL SLATER, MARTIN USOH, and ANTHONY STEED
University of London

This article presents an interactive technique for moving through an immersive virtual environment (or "virtual reality"). The technique is suitable for applications where locomotion is restricted to ground level. The technique is derived from the idea that presence in virtual environments may be enhanced the stronger the match between proprioceptive information from human body movements and sensory feedback from the computer-generated displays. The technique is an attempt to simulate body movements associated with walking. The participant "walks in place" to move through the virtual environment across distances greater than the physical limitations imposed by the electromagnetic tracking devices. A neural network is used to analyze the stream of coordinates from the head-mounted display, to determine whether or not the participant is walking on the spot. Whenever it determines the walking behavior, the participant is moved through virtual space in the direction of his or her gaze. We discuss two experimental studies to assess the impact on presence of this method in comparison to the usual hand-pointing method of navigation in virtual reality. The studies suggest that subjective rating of presence is enhanced by the walking method, provided that participants associate subjectively with the virtual body provided in the environment. An application of the technique to climbing steps and ladders is also presented.

Categories and Subject Descriptors: H.1.2 [Models and Principles]: User/Machine Systems; H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems—artificial realities; H.5.2 [Information Interfaces and Presentation]: User Interfaces; I.3.4 [Computer Graphics]: Graphics Utilities—virtual device interfaces; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism—virtual reality

General Terms: Experimentation, Human Factors

Additional Key Words and Phrases: Immersion, locomotion, navigation, neural networks, presence, virtual environments, virtual reality
This is a substantially revised and expanded version of Slater et al. [1994a].
This work is funded by the UK Engineering and Physical Sciences Research Council (EPSRC) and the Department of Trade and Industry, through grant CTA/2 of the London Parallel Applications Centre. Anthony Steed is supported by an EPSRC research studentship.
Authors' address: Department of Computer Science and London Parallel Applications Centre, Queen Mary and Westfield College, University of London, Mile End Road, London E1 4NS, U.K.; email: {mel; bigfoot; steed}@dcs.qmw.ac.uk.
Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.
© 1995 ACM 1073-0516/95/0900-0201 $03.50
ACM Transactions on Computer-Human Interaction, Vol. 2, No. 3, September 1995, Pages 201-219.
1. INTRODUCTION

The ability to get from place to place is a fundamental requirement for action in both real and virtual environments. This requirement epitomizes what is very powerful yet what also may be flawed in virtual reality (VR) systems. These systems offer the possibility of perceptually immersing individuals into computer-generated environments, and yet the typical means for the most basic form of interaction—locomotion—do not at all match the physical actions of walking in reality. Generally, the powerful illusion of immersion may be lost through naive interaction metaphors borrowed from nonimmersive forms of human-computer interaction.

This article describes an interactive technique for locomotion in an immersive virtual environment (or "virtual reality"). The technique is suitable in applications where the participants are constrained to ground level, for example, while exploring a virtual building, as in an architectural walkthrough. The novelty of the technique is that participants carry out whole-body movements in a simulation of walking, without the necessity of hardware additional to the electromagnetic tracking devices on the head-mounted display (HMD) and glove (or 3D mouse). In brief, participants "walk in place" to move across virtual distances that are greater than the physical space determined by the range of the electromagnetic trackers. Pattern analysis of head movements as generated by the HMD predicts whether participants are walking in place or doing anything else at all. Whenever it is determined that they are walking in place, they are moved forward in the direction of gaze, so that the corresponding flow in the optical array gives the illusion of motion. Such illusory self-motion is usually called vection. Since the pattern analyzer only detects head movements characteristic of walking in place, participants are still able to take real physical steps while remaining within tracker range, causing vection corresponding to their actual movements without surplus movement.

In an earlier report [Slater et al. 1993] we presented the technique, called the Virtual Treadmill,¹ in the context of (at that time) a partially complete human factors evaluation. In this article we discuss the technique in the context of a model of presence in immersive virtual environments. We also present the implementation details and results of two empirical studies with users. The utility of this idea for climbing or descending steps and ladders is also discussed.

¹The London Parallel Applications Centre had a holding patent covering the U.K. and other countries to protect aspects of this technology.

2. VIRTUAL ENVIRONMENTS

2.1 The Proprioceptive and Sensory Data Loop

A VR system requires that the normal proprioceptive information we use unconsciously to form a mental model of the body be overlaid with sensory data that is supplied by computer-generated displays. Proprioception was
defined by Sacks [1985] as "that continuous but unconscious sensory flow from the movable parts of our body (muscles, tendons, joints), by which their position and tone and motion [are] continually monitored and adjusted, but in a way which is hidden from us because it is automatic and unconscious." Proprioception allows us to form a mental model of our body and its parts, one which describes the dynamic spatial and relational disposition of our body: we know where our left foot is (without having to look), and we can clap our two hands together (with closed eyes), all by relying on this unconscious mental model.

Tracking devices placed on the physical body may be employed to map real body movements onto corresponding movements of a virtual body (VB), the participant's self-representation in the virtual environment. A fundamental requirement for an effective virtual reality is, therefore, that there is a consistency between the proprioceptive mental model and the sensory data supplied by the displays: consistency, predictability, and completeness are required for an effective match between the mental body model and the VB.

Gibson's [1986] notion of the ambient optical array may be employed to elaborate these ideas. Gibson argued that the visual world surrounding an individual may be conceived as an arrangement consisting of a nested hierarchy of solid angles, all with the same apex and completely surrounding a position in the environment. The apex corresponds to the position of an observer, not the abstract point of the space of the mathematician: it is a position in an environment that may be occupied by an individual. Such an individual is conceived not as a disembodied observer but as a live animal standing on feet and with a head, eyes, ears, nose, and mouth, immersed in and moving through the environment: "When the position becomes occupied, something very interesting happens to the ambient array: it contains information about the body of the observer" [Gibson 1986, p. 66]. This describes the relationship between perception and self-perception: perception of the environment is inseparable from perception of the self. Regarding the optical information for self-perception, he wrote: "The optical information to specify the self, including the head, body, arms, and hands, accompanies the optical information to specify the environment. The two sources of information coexist" [Gibson 1986, p. 116].

We call this requirement the proprioceptive and sensory data loop. In order for sensory data to function correctly at the expected level of reality, the data must inform us, in all modalities, about what is occurring. For example, when we see our leg move and come into contact with a solid object, we feel it touch the object (and perhaps feel pain); we hear the sound caused by our leg hitting the object; we hear the air as the leg glides through it; and we see the object itself react in accordance with our expectations. This loop is the crucial component of a convincing reality: the "reality" is virtual when the sensory data is computer generated.
2.2 Immersion

We call a computer system that supports such an experience an "immersive virtual environment" (IVE). It is immersive since it immerses a representation of the person's body (the VB) in the computer-generated environment. It is a virtual environment in the sense defined by Ellis [1991]: consisting of content (objects and actors), geometry and dynamics, with an egocentric frame of reference, including perception of objects in depth, and giving rise to the normal ocular, auditory, vestibular, and other sensory cues and consequences. Whether or not a system can be classified as immersive depends crucially on the hardware, software, and peripherals (displays and body sensors) of that system. We use "immersion" as a description of a technology, rather than as a psychological characterization of what the system supplies to the human participant. Immersion includes the extent to which the computer displays are extensive, surrounding, inclusive, vivid, and matching.

The displays are more extensive the more sensory systems that they can accommodate. They are surrounding to the extent that information can arrive at the person's sense organs from any (virtual) direction, and to the extent that the individual can turn toward any direction and yet remain in the environment. They are inclusive to the extent that all external sensory data (from physical reality) are shut out. Their vividness is a function of the variety and richness of the sensory information they can generate [Steuer 1992]. In the context of visual displays, for example, color displays are more vivid than monochrome; high resolution is more vivid than low resolution; and displays depicting dynamically changing shadows are more vivid than those that do not. Vividness is concerned with the richness, information content, resolution, and quality of the displays. Finally, as we have argued above, immersion requires that there is a match between the participant's proprioceptive feedback about body movements and the information generated on the displays. The greater the degree of body mapping, the greater the extent to which the movements of the body can be accurately reproduced, and therefore the greater the potential match between proprioception and sensory data.

2.3 Presence

An IVE may lead to a sense of presence for a participant taking part in such an experience. Presence is the psychological sense of "being there" in the environment: it is an emergent property based on the immersive base given by the technology. However, any particular immersive system does not necessarily always lead to presence for all people: the factors that determine presence, given immersion, are an important area of study [Barfield and Weghorst 1993; Heeter 1993; Held and Durlach 1992; Loomis 1992; Sheridan 1992]. We concur with Steuer [1992] that presence is the central issue for virtual reality.

Our view concerning the relationship between immersion and presence is shown in Figure 1.

Fig. 1. Presence = f(match(prop, sense), match(rep, sense)); prop = proprioception; rep = internal representation; sense = sensory data.
The x-axis is the extent of the match between the displayed sensory data and the internal representation systems and subjective-world models typically employed by the participant. Although immersion is greater the greater the richness of the displays, as discussed above, we must also take into account the extent to which the displayed information allows particular individuals to construct their own internal mental models of reality. For example, some individuals might reject a sense of "reality" in the absence of sound: even though the visual display might allow some individuals to construct a vivid sense of "reality," others might reject it because it contradicts their personal self-model. The system might afford an excellent match for some individuals but be unsuited for others. We have explored the relationship between presence, subjectivity, and displayed data in earlier experiments [Slater et al. 1994b].

The y-axis is the extent of the match between proprioception and the displayed sensory data, as explained above. The changes to the display must be consistent with, and match through time without lag, the changes caused by the individual's motility and locomotion in the VE—whether of individual limbs or of the whole body relative to the ground.

Our general hypothesis is that presence is a positively increasing function of each of these two "matches"—that is, it increases with each of them. Note that these two axes are orthogonal—a system might provide a superb degree of visual, auditory, and tactile display immersion, so that most individuals have sufficient data to construct their internal representations successfully, but fail to provide a sufficient degree of match between the person's actions and the displayed results, thus breaking the link between sensory data and proprioception.

A further point about this hypothesis is that we would expect it to operate at many levels. At a very basic level, changes in the displays should result in suitable parasympathetic responses in, for example, the ocular and vestibular systems. When an individual visually focuses on a near object and then moves focus to a far object, the visual displays should likewise immediately respond appropriately when, for example, eye tracking is enabled. At a much higher level, when a person moves, the shadow of the virtual body on nearby surfaces should change accordingly [Slater et al. 1995]. At a similarly high level, the interactive metaphors employed in the system should match the structure of the sensory data and proprioception. This brings us
back to walking: if the optical flow indicates forward locomotion at ground level, then the proprioceptive information should correspond to this. A specific hypothesis of this article is, therefore, that the degree of presence depends on the match between proprioceptive and sensory data. The greater the match, the greater the extent to which the participant can associate with the VB as a representation of self. Since the VB is perceived as being in the VE, this should give rise to a belief (or suspension of disbelief) in the presence of self in that environment. In particular, the closer that the action required for forward locomotion corresponds to really "walking," the greater the sense of presence.
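The two-factor hypothesis can be illustrated with a small numerical sketch. The multiplicative functional form and the match scores below are invented for the example; the article claims only that presence increases with each of the two matches, not any particular form of f.

```python
# Toy sketch of the Figure 1 hypothesis:
#   Presence = f(match(prop, sense), match(rep, sense))
# The product used here is just one function that increases in each
# argument; the scores are hypothetical.

def presence(match_prop_sense: float, match_rep_sense: float) -> float:
    """Both match scores lie in [0, 1]; f need only increase in each."""
    return match_prop_sense * match_rep_sense

# Orthogonality of the two axes: superb displays (high rep-match) paired
# with a hand-pointing locomotion metaphor (low prop-match) can still
# yield lower presence than the same displays with walking in place.
pointing = presence(0.2, 0.9)   # rich displays, poor action match
walking = presence(0.8, 0.9)    # same displays, better action match
```

On this reading, improving either axis alone raises presence, and a deficit on one axis cannot be fully compensated by the other.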
3. LOCOMOTION

3.1 Other Methods

There is a tendency in VR research to use hand gestures to do everything, from grasping objects (a natural application), to scaling the world, and to navigation [Robinett and Holloway 1992; Vaananen and Bohm 1993]. This approach greatly overloads the hand gesture idea—the user has to learn a complete vocabulary of gestures in order to be effective in the virtual world. Small differences between gestures can be confusing, and in any case there is no guarantee of a correspondence among the gesture, the action to be performed, and the displayed outcome.

The standard VR metaphor for locomotion is hand pointing, with the direction of navigation determined either by gaze or by the direction of the hand gesture. The VPL method for navigation, as demonstrated for example at SIGGRAPH 90, used the DataGlove to recognize a pointing gesture, where the direction of movement was controlled by the pointing direction. Song and Norman [1993] review a number of techniques, distinguishing between navigation based on eyepoint movement and that based on object movement. Here we are interested in "naturalistic" navigation, appropriate for a walkthrough application, so we rule out navigation via manipulation of a root object in a scene hierarchy [Ware and Osborne 1990]. Fairchild et al. [1993] introduced a leaning metaphor for navigation, where the participant moves in the direction of body lean. The technique involves extending the apparent movement in virtual space in comparison with the real movement. In fact, this is an "ice skating" metaphor, which may not be appropriate, for example, for architects taking their clients on a virtual tour.

In the context of architectural walkthrough we require participants to experience a sense of moving through the virtual building interior in a manner that maximizes the match between sensory data and proprioception. Brooks [1992] used a steerable treadmill for this purpose. However, the use of any such device as a treadmill, footpads, roller skates [Iwata and Matsuda 1992], or even a large-area mat with sensing devices imposes constraints on the movements of participants. Moreover, there will always be an application where the virtual space to be covered is much larger than the physical space available—one of the major advantages of VR systems.
3.2 Walking in Place

We require that participants be able to move around in the space defined by the range of the electromagnetic tracker, such as a Polhemus sensor, by really walking, in order to cover small distances within that range. Beyond the range of the device, though, they carry out movements reminiscent of walking while staying still: they "walk in place." If this is possible, then proprioceptive information (associated with the motion of walking) matches sensory data (flow in the optical array consistent with "walking"). The new method for locomotion thus allows participants to move their virtual bodies, at ground level, across a much greater distance than the physical space afforded by the electromagnetic tracker. To cover a distance larger than the tracker range, the participant walks in place.

A major advantage of this technique over interfaces based on hand gesture is that the hand is not used at all for navigation. The direction of motion is determined by gaze: while carrying out this activity the participant just walks, in place or really, and he or she will move forward in virtual space in the direction of gaze. It is almost walking as usual, except that no forward movement takes place in physical space. (We never have to explain this to users; they pick it up automatically.) The hand may be entirely reserved for the purposes for which it is used in everyday reality, that is, the manipulation of objects and activation of controls.
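The gaze-directed movement rule can be sketched as follows. This is an illustrative reconstruction rather than the authors' code: the detector output, the walking speed constant, and the frame interval are all assumptions made for the example.

```python
import math

SPEED = 1.4  # assumed virtual walking speed, meters per second

def step_viewpoint(position, yaw, walking, dt):
    """Advance the viewpoint along the horizontal gaze direction while
    the participant is judged to be walking in place.

    position: (x, y, z) in meters; yaw: gaze heading in radians from the
    head tracker; walking: boolean output of the walking-in-place
    detector; dt: frame interval in seconds. Only the horizontal
    component of gaze is used, keeping locomotion at ground level."""
    if not walking:
        return position
    x, y, z = position
    dx, dz = math.cos(yaw), math.sin(yaw)
    return (x + SPEED * dt * dx, y, z + SPEED * dt * dz)

# Two simulated seconds of detected walking with gaze along the x-axis:
pos = (0.0, 1.7, 0.0)
for _ in range(100):
    pos = step_viewpoint(pos, yaw=0.0, walking=True, dt=0.02)
```

Because the eye height (y) is untouched and pitch is ignored, looking up or down changes the view but not the ground-level path, consistent with the constraint to ground-level locomotion.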
3.3
Implementation
The implementation a feed-forward that
detects
else. The which inputs
of
n data
to the bottom There
mz ), and
are two
the
top
corresponding We
obtain
weights
for
training net
and
the
corresponding
layers
consists
data
from
there
of ml
something
( x~, y,,
mz hidden
unit,
which
or O for anything
a person,
which
back-propagation.
down,
interspersed
moving
with
minutes.
An operator
whether
or not
corresponding
the
around,
periods
records subject
sequences
turning
of walking
units
outputs
binary is walking
the
data
into
are
head, This
used
the
to compute
training
of delta-coordinates,
and
on Computer-Human
1
The are
data, then
Interaction,
of these, to ten
corresponding together used
to
the
environsuch as
for five
neural net. The resulting network equations are then implemented machine as part of the code of the process that deals with detection indicating forward movement. ACM Transactions