Self-Motion: Visual Perception and Visual Control

William H. Warren, Jr.

We perceive in order to move, but we must also move in order to perceive.

Gibson (1979, p. 223)

I. INTRODUCTION

Locomotion is one of the most fundamental of animal behaviors, playing an integral role in many other basic biological activities. To be adaptive, it must be guided by information about the environment, and hence the visual control of locomotion is a similarly fundamental perceptual ability. Locomotion, and self-motion in general, thus provides a model system for understanding basic principles of perception and the control of action. In the last fifteen years the topic has proven a natural point of contact between research in perception, computational vision, and neurophysiology. This chapter focuses on the visual control of human locomotion, including our current understanding of the information available in optic flow, the perception of self-motion, and the regulation of posture and gait.

A. Perception and Action


Broadly speaking, the biological function of perception is the control of action, for adaptive behavior is the evolutionary bottom line. Traditionally, the problems of perception and action have been treated as logically independent. It has been assumed that the goal of perception is to recover

Perception of Space and Motion. Copyright © 1995 by Academic Press, Inc. All rights of reproduction in any form reserved.

objective quantities such as size, distance, shape, color, and motion, yielding a general-purpose description of the scene that can provide the basis for any subsequent behavior. Considering vision in the context of action has potentially important implications for this view of perception. First, the set of action-relevant quantities recovered by the visual system may look different. There is long-standing evidence that perceptual judgments of distance, size, and shape are markedly inaccurate and non-Euclidean (Todd, Tittle, & Norman, in press). Given the assumed accuracy of everyday movements, this seems to present a paradox. However, metric information may be unnecessary for many tasks ranging from obstacle avoidance to object recognition, and metric tasks such as reaching could be governed by specific visuomotor mappings to which such perceptual "distortions" are transparent. On this view, the goal of perception is not to recover a general-purpose scene description but to extract task-specific information for the activity at hand. Second, the linear causality of sensory input to motor output is replaced by the circular causality expressed in the quote above. Animals are coupled to the environment in two ways: informationally, so they are apprised of the current state of affairs (perception), and mechanically, so they can alter that state of affairs (action). When the animal moves, this changes the state of affairs and generates new information that can reciprocally be used to guide subsequent movements, in a perception-action loop. Perception is consequently an act that unfolds over time, rather than a momentary "percept" (Shaw, Turvey, & Mace, 1981). Judgments at any instant may be qualitative or even nonveridical, and yet over the course of the act adaptive behavior emerges from the animal-environment interaction. This notion of what might be called pragmatic control descends from Gibson's (1966, 1979) emphasis on the active, exploring observer.

B. Information for Locomotor Control

Four perceptual systems could potentially contribute to the control of posture and locomotion: (1) the visual system; (2) the vestibular system, including the semicircular canals sensitive to angular acceleration and the otolith organs sensitive to linear acceleration (Howard, 1986a); (3) the somatosensory system, including joint, muscle, and cutaneous receptors (Clark & Horch, 1986; Sherrick & Cholewiak, 1986; Turvey and Carello, this volume, chap. 1); and (4) the auditory system (Dodge, 1923; Wightman and Jenison, this volume, chap. 10). The mechanical senses provide information about disturbances after the fact, whereas vision and audition can guide behavior in an anticipatory or prospective manner. Although this chapter focuses primarily on vision, several points about the other modalities should be noted.


There is an ongoing debate about the relationship between these types of information for orientation and self-motion. Under normal conditions, they often redundantly specify a postural state or disturbance (Gibson, 1966), but there are situations in which they apparently conflict. Some have argued that one modality simply dominates in such situations (Lishman & Lee, 1973; Talbot & Brookhart, 1980), but there is considerable evidence for a more subtle relation. Classical sensory integration theories argue that perception results from integrating multiple ambiguous cues, often via an internal model of body orientation (Oman, 1982). Others have argued that information is defined across the modalities such that any combination of stimulation will specify a possible state of affairs without ambiguity or interpretation (Stoffregen & Riccio, 1988, 1991). On this view, for example, standing on a rigid surface while the visual surround is moved does not present a conflict between vision and somatosensory information for upright posture, but specifies sway on a nonrigid surface. A fourth possibility is that perceptual systems are specialized to obtain information about different but overlapping aspects of the environment, which may be redundant or in conflict but is normally specific. For example, there appears to be a division of labor between modalities, such that the vestibular system is primarily sensitive to high-frequency stimulation and brief acceleration, whereas the visual and somatosensory systems are primarily sensitive to low-frequency stimulation and (for vision) constant velocity motion (Howard, 1986b). The latter are more relevant to normal postural and locomotor control.
Spontaneous postural sway is concentrated below 0.5 Hz in standing and 1.0 Hz in walking (Kay & Warren, 1995; Lestienne, Soechting, & Berthoz, 1977; Yoneda & Tokumasu, 1986), the same range in which sway responses and vection can be induced visually (Berthoz, Lacour, Soechting, & Vidal, 1979; Berthoz, Pavard, & Young, 1975; Dichgans & Brandt, 1978; Kay & Warren, 1995; van Asten, Gielen, & van der Gon, 1988a). Cutaneous information has also been shown to contribute to the stabilization of standing posture at frequencies below 1 Hz (Diener, Dichgans, Bruzek, & Selinka, 1982; Diener, Dichgans, Guschlbauer, & Mau, 1984), possibly via the distribution of pressure over the soles of the feet. On the other hand, vestibular responses to linear acceleration are only elicited at frequencies of stimulation above 1 Hz (Diener et al., 1982, 1984; Melville-Jones & Young, 1978), and the vestibular system is an order of magnitude less sensitive to the direction of linear acceleration than are the visual and somatosensory systems (Telford, Howard, & Ohmi, 1992). It thus appears that stance and locomotion are regulated primarily by visual and somatosensory information, whereas vestibular information contributes to recovery from high-frequency perturbations (Allum & Pfaltz, 1985; Nashner, Black, & Wall, 1982) and gaze stabilization during locomotion (Grossman & Leigh, 1990).
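The frequency claims above can be made concrete with a short spectral sketch. The sway trace below is synthetic and its amplitudes are invented for illustration; only the 0.5-Hz cutoff comes from the text. For a signal dominated by slow sway, nearly all spectral power falls below that cutoff:

```python
import numpy as np

def low_frequency_power_fraction(signal, fs, cutoff_hz):
    """Fraction of (non-DC) spectral power at or below cutoff_hz."""
    spectrum = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    total = spectrum[1:].sum()                    # skip the DC bin
    low = spectrum[(freqs > 0) & (freqs <= cutoff_hz)].sum()
    return low / total

# Synthetic "sway" trace: a slow 0.3-Hz drift plus a weak 2-Hz tremor.
fs = 100.0                                        # sample rate (Hz), illustrative
t = np.arange(0, 60, 1.0 / fs)
sway = 10.0 * np.sin(2 * np.pi * 0.3 * t) + 1.0 * np.sin(2 * np.pi * 2.0 * t)

frac = low_frequency_power_fraction(sway, fs, cutoff_hz=0.5)
print(f"power at or below 0.5 Hz: {frac:.2%}")    # ~99% for this synthetic trace
```

The same analysis applied to a recorded center-of-pressure trace is how the cited sway spectra are typically characterized.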


II. OPTIC FLOW

When an observer moves in a stationary environment, the light reflected to the moving eye undergoes a lawful transformation called optic flow. In addition to the static information about the three-dimensional (3-D) layout of the environment available to a stationary observer (Gillam, this volume, chap. 2; Rogers, this volume, chap. 4), optic flow contains information about both 3-D layout (Lappin, this volume, chap. 5; Todd, this volume, chap. 6) and the observer's self-motion. Helmholtz (1867/1925, p. 295) first pointed out the significance of motion parallax as a source of information for distance on the basis of the optical velocities of elements at different distances. Gibson (1947, 1950) subsequently generalized this notion to gradients of velocity 360° about the observer, produced by surfaces in depth (see Figure 1). He represented the flow pattern as an instantaneous two-dimensional (2-D) velocity field V(x, y), in which each vector corresponds to the optical motion of an environmental element and possesses a magnitude or speed and a direction.¹ The gradient of speed contains information about distance along a surface, which Gibson called motion perspective, whereas the pattern of vector directions provides information about self-motion, which he called visual kinesthesis. In particular, Gibson discovered that the focus of expansion corresponds to the current direction of self-motion or heading in the case of pure observer translation and suggested that this could be used to control steering. He noted, however, that an added rotation of the observer, such as a pursuit eye or head movement, annihilates the focus of expansion and significantly complicates the retinal flow pattern. A persistent question has been how the visual system determines heading from this complex pattern.

A general curvilinear motion of the observer can be described instantaneously as the sum of a translation and a rotation (Whittaker, 1944). Considering the resulting flow pattern on a spherical projection surface that moves with the eye (see Figure 2), observer translation generates radial flow along the meridians, called the translational component, and observer rotation generates solenoidal flow² along the parallels, called the rotational component. Although the term is often used loosely, I reserve optic flow proper for

¹ A field is a region of n-dimensional space characterized by a quantity such as velocity that has a unique value at every point and is a continuous function of position. The instantaneous structure of the velocity field is captured by streamlines, continuous field lines to which the vectors are tangent at every point. They are distinct from path lines, the actual trajectories of elements over time, and streak lines, which connect the current locations of elements that have previously passed through location (x, y). The three sets of lines coincide in the special case of steady flow, a stationary field in which the velocity at any point (x, y) does not change over time (Eskinazi, 1962).

² That is, without sources or sinks.

FIGURE 1  Optic flow field generated by observer translation parallel to a ground plane. The flow pattern is represented as an instantaneous velocity field, in which dots correspond to environmental elements, and line segments represent the associated optical velocity vectors. The vertical probe indicates the heading point.

change in structure of the optic array that is due to displacement of the point of observation, before being sampled by an eye (Gibson, 1966). Thus, optic flow is unaffected by eye rotations³ but is influenced by movement of the point of observation on a straight or curved path. Retinal flow refers to the change in structure of light on the receptor surface, in retinal coordinates, which is affected by both translation and rotation of the eye. Optic flow thus defines the informational constraints within which all adaptive behavior evolved. How such information is detected by a moving eye is a contingent question, for retinal flow depends on the structure of the eye and oculomotor behavior, and different visual systems provide different solutions (Junger & Dahmen, 1991). Both issues must be addressed for a complete understanding of the use of optic flow information.
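The decomposition of retinal flow into a depth-dependent translational component and a depth-independent rotational component can be sketched numerically. The sketch below uses the standard planar-projection flow equations (after Longuet-Higgins & Prazdny, 1980) rather than the spherical formulation used in this chapter; the camera model, sign conventions, and all numbers are illustrative assumptions:

```python
def retinal_flow(x, y, Z, T, R):
    """Image velocity (u, v) of a point at image position (x, y) and depth Z,
    for eye translation T = (Tx, Ty, Tz) and rotation R = (Rx, Ry, Rz),
    planar pinhole projection with unit focal length (after Longuet-Higgins
    & Prazdny, 1980; sign conventions vary across authors)."""
    Tx, Ty, Tz = T
    Rx, Ry, Rz = R
    # Translational component: scales with inverse depth 1/Z.
    u_t = (-Tx + x * Tz) / Z
    v_t = (-Ty + y * Tz) / Z
    # Rotational component: independent of depth.
    u_r = x * y * Rx - (1 + x**2) * Ry + y * Rz
    v_r = (1 + y**2) * Rx - x * y * Ry - x * Rz
    return u_t + u_r, v_t + v_r

# Pure translation along the line of sight: flow radiates from the image
# center (the focus of expansion), and halving depth doubles the speed.
u1, v1 = retinal_flow(0.2, 0.1, Z=10.0, T=(0.0, 0.0, 1.0), R=(0.0, 0.0, 0.0))
u2, v2 = retinal_flow(0.2, 0.1, Z=5.0, T=(0.0, 0.0, 1.0), R=(0.0, 0.0, 0.0))
print((u1, v1), (u2, v2))   # second vector is twice the first

# Adding an eye rotation superimposes a depth-independent component,
# which is what displaces or annihilates the focus of expansion.
u3, v3 = retinal_flow(0.2, 0.1, Z=10.0, T=(0.0, 0.0, 1.0), R=(0.0, 0.1, 0.0))
print((u3 - u1, v3 - v1))   # purely rotational offset
```

Note that the rotational offset is the same whatever the depth of the point, which is why no pointwise operation on a single vector can separate the two components.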

A. Formal Analysis of the Flow Field

Formal analysis of the flow field involves two reciprocal problems: (1) a description of the flow field itself, that is, how flow is generated by observer movement relative to the environment, and (2) a description of information in the flow field, the inverse problem of how flow specifies properties of the environment and self-motion. In what follows I sketch existing approaches to both. I assume that local velocities can be determined from the changing intensity field that characterizes the optic array (Hildreth & Koch, 1987), although this step could be bypassed in special cases (Horn & Weldon, 1988; Negahdaripour & Horn, 1989). Given that there is likely to be considerable noise in such motion extraction, any biologically plausible heading model must be highly robust.

³ This assumes that the center of rotation is at the nodal point of the eye, which is not strictly true (Bingham, 1993). However, optic flow induced by displacement of the nodal point during an eye rotation is probably negligible for a mobile observer.

FIGURE 2  Flow pattern represented on a spherical projection surface. (a) Translational component: observer translation yields radial flow along meridians. (b) Rotational component: observer rotation yields solenoidal flow along parallels.

1. Observer Translation

Consider first a local description of optic flow for the case of pure translation, in spherical coordinates (after Gibson, Olum, & Rosenblatt, 1955; Nakayama & Loomis, 1974; see Figure 3). The observer moves with translational velocity T relative to a fixed environmental point P, which is a distance D from the observer and a visual angle β from the heading direction.⁴ The point's optical angular speed is

β̇ = (T sin β)/D    (1)

and its direction of optical motion is along a meridian. Significantly, the direction of this vector is determined solely by the observer's heading, whereas its magnitude is influenced by both its position relative to the heading and the distance to the element. Several things follow immediately about the information in optic flow.

⁴ Vector quantities shall be in bold (e.g., velocity V with magnitude V and direction), physical variables in uppercase, and optical variables in lowercase.

FIGURE 3  Spherical coordinate system for describing observer motion. Observer O moves with translation T or rotation R with respect to point P, which is at distance D from the observer and visual angle β from the axis of translation or rotation.

a. Heading

First, as noted earlier, heading is specified by the eccentricity and elevation of the focus of expansion (Figure 1). Because vector direction is independent of distance, this radial flow pattern specifies heading in any environment. Even when the focus itself is not in view, heading is specified by triangulating two or more vectors to find their common point of intersection (Gibson, 1950). However, assuming some noise in local motion, triangulation error necessarily increases as the flow field is sampled farther from the focus and the vectors become more parallel (Koenderink & van Doorn, 1987), which was confirmed empirically by Crowell and Banks (1993). Thus, the region of the flow field that is most informative is that near the focus of expansion.

b. Scaled Distance and Time-to-Contact

Second, the distance D to a point at position β in the field is specified only up to a scale factor of observer speed T:

D = (T sin β)/β̇    (2)

Thus, absolute distance can be determined only if speed is known, and vice versa. However, the ratio D/T has the dimension of time. Lee (1974, 1976, 1980) showed that the time-to-contact T_c with the plane containing P that is
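Equations (1) and (2) can be checked numerically. A minimal sketch (the speed, distance, and angle below are arbitrary illustration values):

```python
import math

def angular_speed(T, D, beta):
    """Equation (1): optical angular speed of a point at visual angle beta
    (radians) from the heading, at distance D, for observer speed T."""
    return T * math.sin(beta) / D

def scaled_distance(beta, beta_dot):
    """Equation (2) rearranged: distance in units of observer speed (D/T)."""
    return math.sin(beta) / beta_dot

T, D, beta = 2.0, 10.0, math.radians(30)
beta_dot = angular_speed(T, D, beta)        # 2 * 0.5 / 10 = 0.1 rad/s
print(scaled_distance(beta, beta_dot))      # ~5.0 s, i.e., D/T

# The scale ambiguity: doubling both speed and distance leaves the
# optical angular speed, and hence the flow field, unchanged.
print(angular_speed(2 * T, 2 * D, beta) == beta_dot)   # True
```

The last line is the scale ambiguity stated in the text: the flow specifies only the ratio D/T, not D or T separately.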


perpendicular to the direction of travel is specified by the optic variable tau, under the assumptions of constant velocity and a rigid environment:

T_c = β/β̇ = τ_g    (3)

This version has been dubbed global tau because it requires that β be known and hence depends on the perception of heading (Tresilian, 1991). The derivation incorporates a small-angle approximation (β = tan⁻¹(X/Z) ≈ X/Z) that introduces slight overestimations of T_c when β becomes large, for example, at short times-to-contact. Local tau specifies time-to-contact for a moving object approaching a stationary observer (or vice versa), given any two points on the object, such as the contour defining a visual angle θ, and is equal to the inverse of the relative rate of expansion (Lee, 1976; Todd, 1981):

T_c = θ/θ̇ = τ_l    (4)

However, this requires two additional assumptions: the path of approach must be directly toward the observer, and the object must either be spherical or not rotating.⁵ Violations can lead to large errors, such as significant overestimations of T_c with an off-axis approach (Kaiser & Mowafy, 1993; Tresilian, 1991). A more general formulation specifies the time-to-contact with an off-axis interception point, as would be needed for catching, by incorporating information about the visual angle ψ between the object and the interception point (Bootsma & Oudejans, 1993; Tresilian, 1990, 1994):

T_c = τ_l / (1 + τ_l ψ̇ cot ψ)    (5)
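The behavior of the tau variables can be illustrated numerically. The sketch below computes global tau as β/β̇ for a point at constant lateral offset X approached at speed T, and local tau as the inverse relative expansion rate θ/θ̇ under small-angle optics; the geometry, function names, and all numbers are illustrative assumptions, not the chapter's own:

```python
import math

def global_tau(X, Z, T):
    """Global tau: visual angle beta from the heading divided by its rate
    of change, for a point at lateral offset X and depth Z, approached at
    speed T (Z shrinks at rate T; X is constant)."""
    beta = math.atan2(X, Z)
    beta_dot = X * T / (X**2 + Z**2)   # d/dt atan(X/Z) with dZ/dt = -T
    return beta / beta_dot

def local_tau(theta, theta_dot):
    """Local tau: inverse of the relative rate of expansion."""
    return theta / theta_dot

Z, T = 10.0, 2.0
true_tc = Z / T                        # actual time-to-contact: 5.0 s
print(global_tau(X=0.5, Z=Z, T=T))     # ~5.01 s: accurate while beta is small
print(global_tau(X=10.0, Z=Z, T=T))    # ~7.85 s: overestimates at beta = 45 deg

# Local tau for a 1-m object approaching head-on at 2 m/s from 10 m
# (small-angle optics: theta ~ S/Z, theta_dot ~ S*T/Z**2).
S = 1.0
print(local_tau(S / Z, S * T / Z**2))  # ~5.0 s
```

The growing error of global tau at large β is the overestimation attributed in the text to the small-angle approximation.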


Relative depth is available in the optic flow field, providing universal constraints on the evolution of visual control.

2. Observer Rotation

Consider the retinal flow produced by a pure rotation of the observer, equivalent to a rigid rotation of the world about the eye. If the observer has an angular velocity R, the optical angular speed of a point P at a visual angle β from the axis of rotation is