DRAFT IN REVIEW AT: VISUAL COGNITION Please do not quote without permission

Gaze control, change detection and the selective storage of object information while walking in a real world environment

Jason A. Droll & Miguel P. Eckstein
Dept. of Psychology, University of California, Santa Barbara
January 12, 2009
7700 words (body and figure captions)

Correspondence should be addressed to:
Jason A. Droll
Department of Psychology
University of California Santa Barbara
Santa Barbara, CA 93106
[email protected]

Jason A. Droll is now at:
Human Factors Practice, Exponent
5401 McConnell Avenue
Los Angeles, CA
[email protected]


ACKNOWLEDGMENTS This work was supported by National Institutes of Health/Public Health Service Research Grants R01 EY015925 and by IC Postdoctoral Research Fellowship Program Grant 8-444069-23149 (icpostdoc.org). We thank Carter Phelps, Jessica Gaska, Nick Bernstein, Paul Hacker and Sanjay Athalye for help running subjects and analysis of eye-tracking video.

ABSTRACT Assessments of gaze behavior and object memory are typically done in the context of experimental paradigms briefly presenting transient static images of synthetic or real scenes. Less is known about observers’ gaze behavior and memory for objects in a real-world environment. While wearing a mobile eye-tracker, twenty subjects performed a task in which they walked around a building eight times and were either told to walk normally, or to also expect to be asked about what they saw following their walk. During the course of their walk, nine objects along their path were exchanged for similar token objects of the same category (e.g. whisk broom for push broom). Observers told to prepare for a memory test were much more likely to notice object changes than observers simply told to walk normally (32% vs. 5%). Detected object changes were also fixated for longer duration prior to the change, suggesting a role of task demand in gaze control and the selective storage of visual information.


INTRODUCTION

Observers’ intrinsic proficiency at coordinating visual tasks is often neglected in laboratory experiments with designs that suppress the ecological context to which vision is best suited and adapted. Laboratory experiments often employ tasks with a fixed trial structure in which two-dimensional images of sparse synthetic stimuli are briefly presented within the confines of a computer monitor, and observers are instructed how to report their visual judgment with an arbitrary behavioral response. While such controlled methods are necessary for isolating visual and cognitive processes, and powerful for testing psychological theories, it is not clear how performance measures acquired in laboratory contexts generalize to an understanding of how vision is used in ordinary, real world behavior. The purpose of the present paper is to investigate visual processes under more natural circumstances. Specifically, we sought to test the role task demands have in observers’ use of eye movements and storage of object information in memory while walking within a real-world environment.

Task Demands and the Guidance of Eye Movements
For over forty years, it has been known that observers direct their gaze towards regions of the scene relevant to their task (Buswell, 1935; Yarbus, 1961, 1967). More recent experiments have reinforced the critical importance of task demand by demonstrating that in real world environments, across a variety of everyday tasks such as driving, walking, sports, playing a piano, visual search, hand-washing, and making tea or sandwiches, fixations are tightly linked to the momentary operations of a task (Chen & Zelinsky, 2006; Hayhoe, Shrivastava, Mruczek, & Pelz, 2003; M. Land, Mennie, & Rusted, 1999; M. F. Land, 2004; M. F. Land & Hayhoe, 2001; Mennie, Hayhoe, & Sullivan, 2007; J. Pelz, Hayhoe, & Loeber, 2001; J. B.
Pelz & Canosa, 2001; Turano, Geruschat, & Baker, 2003; Turano, Geruschat, Baker, Stahl, & Shapiro, 2001; Zelinsky, Rao, Hayhoe, & Ballard, 1997). Such task-directed behavior differs from experiments in which subjects simply view images passively, and have no expectation of what information is relevant to their goals or interests. Observers may be engaged in object recognition, remembering object locations and identity, or performing some other visual operation. While there is some relationship between fixation location and image properties such as contrast or chromatic salience, these factors usually account for only a modest proportion of the variance (Foulsham & Underwood, 2008; Itti & Koch, 2000, 2001; Parkhurst & Niebur, 2003, 2004), and image salience has negligible effect when observers are actively engaged in a task (Einhauser, Rutishauser, & Koch, 2008). In natural behavior where the task is well defined, the demands of the task also have an overwhelming influence on gaze control. Thus, if the behavioral goals are clearly defined, even for a task such as making a peanut butter and jelly sandwich, an observer’s actions can be reasonably assumed to reflect the internal cognitive representation of the task. While eye movements can be used to acquire visual information “just-in-time” for the momentary demands of a task (Ballard, Hayhoe, Pook, & Rao, 1997), there is clearly a need for storing visual information across successive fixations. What is the relationship between eye movements and visual memory? Under what conditions does gaze select information expected to be relevant for storage? Is fixated information necessarily stored in memory? Are observers sensitive to a change in visual information that was earlier stored and later re-fixated?

Estimating Memory Capacity and Memory Usage Across Tasks and Contexts
During an ordinary task such as making a sandwich, observers frequently re-fixate objects in the scene (Ballard, Hayhoe, & Pelz, 1995; Hayhoe, Shrivastava, Mruczek, & Pelz, 2003; M. Land, Mennie, & Rusted, 1999; J. Pelz, Hayhoe, & Loeber, 2001; J. B. Pelz & Canosa, 2001). Frequent re-fixations suggest that minimal information is stored internally, as if the world itself is used as a form of “external memory” (Ballard, Hayhoe, Pook, & Rao, 1997; J. K. O'Regan, 1992). Such minimal use of memory contrasts with performance in memory tasks using traditional experimental paradigms with artificial or synthetic stimuli. Performance in those tasks has suggested the capacity of working memory to be about four objects (Luck & Vogel, 1997; Vogel, Woodman, & Luck, 2001; Wheeler & Treisman, 2002). However, using more realistic stimuli, and after viewing scenes for longer durations or with a greater number of fixations, visual memory may be more robust (Hollingworth, 2006b; Irwin & Zelinsky, 2002; Melcher, 2001, 2006). When viewing images of real world scenes, previously fixated objects are identified above chance over novel objects 24 hours after viewing (Hollingworth, 2005). Such robust memory suggests that observers store detailed object information, including objects' positions within the scene and their orientation (Hollingworth, 2006a, in press; Tatler, Gilchrist, & Land, 2005). Disparate estimates of memory capacity and memory usage across experiments may be due to at least two factors. First, there are significant differences in the task context. Many memory experiments employ artificial tasks in which observers are instructed to store as much information in a scene as possible, or to seek changes between successively presented images. Natural behavior is unlikely to include such indiscriminate storage of information. Thus, it is not clear how subjects incidentally acquire and store visual information when not instructed to do so. The memory literature has a long history of distinguishing between intentionally and incidentally stored information (Craik & Tulving, 1975). For example, observers are more likely to recognize real world objects that they were instructed to remember, with evidence of selective storage for at least two days following exposure (Lampinen, Copeland, & Neuschatz, 2001). However, distinctions between intentional and incidental memory are only infrequently addressed in vision studies monitoring the encoding of objects via eye movements and the subsequent storage of this information (Castelhano & Henderson, 2005; Williams, Henderson, & Zacks, 2005). A second reason for disparate estimates of memory capacity is the use of different visual contexts. While sparse artificial stimuli, such as arrays of colored rectangles (Luck & Vogel, 1997), are arguably necessary to test vision without contamination from semantic associations, real world environments are tremendously more complex, often including hundreds of objects with different shapes, sizes and colors. Stimulus presentation in traditional experiments is also transient and brief, lasting only a few hundred milliseconds. These brief, transient images are also often static and subtend only the visual angle of the monitor, lacking the larger retinal eccentricities and depth cues of real three-dimensional environments, and observers typically exhibit a potentially artificial
central bias in gaze behavior when viewing images in the laboratory (Schumann et al., 2008; Tatler, 2007). Real environments, on the other hand, are immersive and often perpetually present, allowing observers to scrutinize scene detail at their leisure, fixating and attending to objects in a sequence and for durations determined by observers’ own internal agenda. This longer viewing time can result in improved retention of visual information (Melcher, 2006). Subjects have also shown improved performance in memory tasks when the scene is representative of a real world environment, suggesting that findings from memory tasks using artificial stimuli may not generalize to ecological behavior (Tatler, Gilchrist, & Land, 2005; Tatler & Melcher, 2007). It should also be noted that viewing objects in the context of real world scenes influences objects’ localization (Eckstein, Dresher, & Shimozaki, 2006) as well as observers’ detection of objects (De Graef, Christiaens, & d'Ydewalle, 1990).

Detecting Changes in Natural Scenes
Perhaps intermediate to the extremes of natural behavior and traditional laboratory experiments are contexts and scenes that demonstrate the phenomenon of change blindness. Change blindness refers to the phenomenon in which observers are unaware of otherwise salient changes to objects in a natural scene when the change is masked by an eye movement (Grimes, 1996), a simulated mud splash (J. K. O'Regan, Rensink, & Clark, 1999), a flickering gray screen (Blackmore, 1995; R. A. Rensink, O'Regan, & Clark, 1997), or a movie edit (Levin & Simons, 1997). These experimenters have suggested that while we can retain general information about the “gist” of a scene quite easily, memory for scene detail is fleeting and sparse (for reviews see D. Simons, 2000; D. J. Simons, 2000). Failure to notice otherwise salient changes also occurs in the real world. For example, only half of observers noticed a change in the identity of a person with whom they had been conversing a few seconds earlier (Levin, Simons, Angelone, & Chabris, 2002; Simons & Levin, 1998). It is not clear why observers are so poor at detecting changes. Some instances of change blindness appear to be caused by a failure to compare previously stored information (Angelone, Levin, & Simons, 2003; Hollingworth, 2003; Simons, Chabris, Schnur, & Levin, 2002; Varakin & Levin, 2006). However, when viewing rendered images of natural scenes, observers may also miss changes due to a failure to fixate and attend to the object before or after the change, suggesting a failure to encode the information necessary to make a comparison (Hollingworth & Henderson, 2002). Yet there is also evidence to suggest that observers are capable of encoding peripheral information when detecting object changes (Zelinsky, 2001).
The precise role of eye movements and attention in detecting changes in real world environments is not known, as there has not yet been an experiment monitoring gaze while observers encounter object changes. In an experimental paradigm similar to that of the present paper, observers infrequently detected changes while navigating through a virtual world, with changes separated in time by several seconds (Karacen & Hayhoe, 2008). As observers walked a circular 22-meter route, they passed by eight objects. After a variable number of laps, objects disappeared, appeared, were replaced, or underwent a change in their location. Rates of change detection were generally low (~25%). However, observers may have
been better at detecting the changes had they been told either to expect changes or to prepare for some sort of memory test following their walk. Also, while the virtual world was considerably more complex than most experimental displays, it was relatively sparse in comparison to the complexity of real world environments, and it is not clear whether observers’ rates of change detection would be higher or lower when interacting in a more complex real-world environment.

Monitoring Gaze and Memory as Observers Navigate Through the Real World
The present experimental manipulation was designed to examine the role of task demand on gaze control and the storage of visual information in a real-world environment. If observers are instructed to report what objects they recall encountering in their environment, they may be more likely to direct gaze towards objects, and to encode and store fixated information. Reflecting this memory, observers may also be more likely to report object changes.

METHODS

Two Walking Tasks
Subjects were met by the experimenter in the laboratory and escorted to the testing site, approximately a two-minute walk to the Life Science building. During the escorted walk, subjects were told that the purpose of the experiment was for researchers “to better understand how people moved their eyes while walking.” At the testing site, subjects were introduced to two male assistants. The first assistant helped prepare and calibrate the eye tracker. Following calibration, subjects were instructed to walk around the building eight times in a clockwise direction. In both the Walking Only and the Walking and Memory condition, subjects were instructed to walk as they would normally, at a comfortable pace. In the Walking and Memory condition, subjects were given further instruction that after their eight laps, they “would be asked about what they saw.” The purpose of the Walking and Memory condition was to assess how a modest change in task instruction might influence observers’ control of gaze and memory for objects. Fifteen subjects participated in the Walking Only condition and nine subjects participated in the Walking and Memory condition.¹ During the course of their walk, the second assistant served as an escort and walked alongside, and slightly behind, the subject. The purpose of the escort was to address any possible questions from passing pedestrians regarding the equipment worn by the subject, and to make the subject more at ease during the walk. Both the subject and the escort were instructed not to converse during the walk to avoid possible interactions between gaze used during walking and gaze during conversation. At the beginning of each lap, the experimenter would hold up a sign indicating the lap number the subject was beginning (e.g. “3” when starting the third lap). The purpose of the sign was both to keep track of the number of laps navigated during the
experiment, and to facilitate video coding. The duration of each lap was approximately 90 seconds.

¹ A different number of subjects were included in each condition because of the variable number of subjects for whom eye-tracking data were sufficiently robust to permit analysis. Subjects continued to be run until five subjects in each condition had a sufficient eye track.

Object Changes
As subjects circumnavigated the building, they passed by nine objects set out by the experimenters. The positions of the nine objects are shown in Figure 1. Each of the nine objects was consistent with the semantic context of the environment. Before each experimental session, all objects were displayed in either State 1 or State 2, as listed in Table 1. Each of these objects was present during the subjects’ first four laps. Between the fourth and the fifth lap, all of the objects were exchanged with another token object of the same category type (e.g. whisk broom to push broom). When not appearing along the course, alternate token objects were hidden either in the building, behind walls, or inside containers. To avoid the possibility of the subject noticing any substitutions being made, object changes were made by the experimenter and assistant while the subject was walking along a different side of the building. The order of object presentation was counterbalanced such that half of the subjects were first exposed to objects in State 1, followed by State 2, and the other half of subjects were presented with the reverse sequence. Before the experiment, subjects were not told about the surrounding objects, or the possibility that any objects might undergo a change between laps.
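The counterbalancing scheme can be sketched as follows. This is an illustrative sketch only: the alternating assignment by subject index is an assumption, as the text specifies only that half the subjects saw State 1 first, and the function name is hypothetical.

```python
# Illustrative counterbalancing: half the subjects see every object in
# State 1 for laps 1-4 and State 2 for laps 5-8; the other half see the
# reverse. The even/odd assignment by subject index is an assumption;
# the paper specifies only the half/half split.

def state_for_lap(subject_index, lap):
    """Return which object state (1 or 2) a subject sees on a given lap (1-8)."""
    first_state = 1 if subject_index % 2 == 0 else 2
    # All nine objects are exchanged between the fourth and fifth lap.
    if lap <= 4:
        return first_state
    return 2 if first_state == 1 else 1

# Even-indexed subject: State 1 before the change, State 2 after.
print([state_for_lap(0, lap) for lap in range(1, 9)])
# Odd-indexed subject: the reverse sequence.
print([state_for_lap(1, lap) for lap in range(1, 9)])
```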


Figure 1: Observers circumnavigated a building eight times while passing by nine objects. Each of the objects was exchanged for an alternate token between the fourth and fifth lap.


 #  Object                               State 1          State 2
 1  Sawhorse                             green            red
 2  Backpack                             purple           gray
 3  Broom                                whisk broom      push broom
 4  Luggage (square)                     red              purple
 5  Trashcan                             large gray       small green tin
 6  Traffic cone                         tall (28")       short (12")
 7  Assistant's shirt (short sleeve)     red              green
 8  Experimenter's shirt (long sleeve)   green            red
 9  Advertisement                        blue             yellow

Table 1: Objects and their two possible states. Between the subjects’ fourth and fifth lap, each of the nine objects was exchanged from one state to the other. The order of the change was counterbalanced across subjects.

Post-Experiment Questionnaire
At the completion of the eighth and final lap, subjects were asked to respond verbally to a series of questions. Initial questions were open ended and later questions were more specific. The questionnaire began with, “What did you notice during your walk?” and “What objects do you remember seeing?” The experimenter recorded whether subjects spontaneously mentioned noticing any of the experimental objects, or whether they reported noticing their change. Subsequent questions were more detailed, inquiring, “Do you remember seeing a _______ ?” mentioning one of the seven objects placed around the building (not including the two shirts). Whenever subjects reported seeing a particular object, they were asked to provide more detail, in order to determine whether they remembered the object in its initial state, before the change, or its final state, after the change, or whether they detected the change. The second-to-last question asked the subject to report the color of the assistant’s shirt without turning around to face him. The final question asked whether the subject remembered the color of the shirt the experimenter was wearing at the beginning of the experiment. Subjects wore a microphone clipped to their collar to record their responses, which were simultaneously transcribed by the experimenter.

Monitoring Gaze
Subjects wore an Applied Science Laboratories (ASL) mobile eye tracker which monitored the position of the right eye using corneal infrared reflection and pupil position. The video-based tracker was mounted on a pair of lightweight goggles and included a scene camera with a field of view coincident with each observer’s line of sight.
Alternate frames of the 60 Hz video recorded either the scene image or the eye image, resulting in an effective sampling rate of 30 Hz. Spatial accuracy was limited to approximately one degree, although
accuracy was sometimes compromised by noise due to outdoor reflections. Video from each camera was recorded by a digital tape recorder carried in a hip belt worn by the subject. Before the subject began walking the eight laps, a calibration procedure was performed: the subject stood still and fixated the end of a stick held by the experimenter in twelve different positions spanning the field of view of the scene camera. The twelve points lay on a vertical plane orthogonal to the observer’s line of sight, approximately fifteen feet from the subject, the approximate distance at which subjects were expected to fixate objects in the environment. After walking the eight laps, and after answering the questionnaire, the calibration procedure was repeated on each side of the building in order to accommodate the unique lighting conditions along each path. After each experiment, ASL software was used to separate the alternate video frames to allow for eye calibration. A digital video was then generated, displaying the image from the scene camera with a superimposed cursor indicating the subject’s momentary direction of gaze (Figure 2). Calibrations for each walkway were used to improve the track. The final video also included a frame count used as a time indicator to document the start and end of fixations.

Figure 2: Video frame captured from the scene camera. Superimposed crosshairs indicate the direction of gaze. The scene includes a small traffic cone and a red sawhorse.


Data Analysis
The timing of each fixation and the direction of gaze were determined through frame-by-frame analysis of the output video with fixation crosshairs. Video coding included documenting the start and end time of each video frame in which gaze was directed at any of the nine objects of interest, for each of the eight laps. Due to track noise from outdoor sunlight, the presence of the crosshairs was occasionally intermittent. Thus, rather than classifying fixations by applying an acceleration threshold to eye position, or a minimum number of frames at a particular location, all continuous frames with the crosshairs at a particular location were recorded as a single fixation, regardless of the number of frames in the sequence. As a result, an occasional fixation lasted only one frame (33 ms), although such single-frame fixations were a minority.
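The coding rule above (every unbroken run of frames with the crosshairs on one object counts as a single fixation, with no minimum-duration or velocity criterion) can be sketched as follows. The per-frame object labels and the function name are hypothetical illustrations; the 33 ms frame duration follows from the 30 Hz effective sampling rate stated in the Methods.

```python
from itertools import groupby

FRAME_MS = 1000 / 30  # duration of one frame at the 30 Hz effective rate

def code_fixations(frame_labels):
    """Group consecutive identically labeled frames into single fixations.

    frame_labels: one entry per video frame, in order; each entry is the
    object under the gaze crosshairs, or None when gaze is on no object
    of interest. Returns (object, start_frame, end_frame, duration_ms).
    """
    fixations = []
    frame = 0
    for label, run in groupby(frame_labels):
        n = len(list(run))
        if label is not None:
            # A run of any length counts as one fixation, even one frame.
            fixations.append((label, frame, frame + n - 1, n * FRAME_MS))
        frame += n
    return fixations

# Hypothetical coded frames: gaze passes over a broom, then a trashcan.
frames = [None, "broom", "broom", "broom", None, "trashcan", None]
print(code_fixations(frames))
```

Note that a single-frame run yields a 33 ms fixation, consistent with the occasional one-frame fixations noted above.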