Journal of Experimental Psychology: Human Perception and Performance
1998, Vol. 24, No. 2, 631-647

Copyright 1998 by the American Psychological Association, Inc.
0096-1523/98/$3.00

Recognition by Action: Dissociating Visual and Semantic Routes to Action in Normal Observers

Raffaella I. Rumiati
Scuola Internazionale Superiore di Studi Avanzati

Glyn W. Humphreys
University of Birmingham

In this article the operation of a direct visual route to action in response to objects, in addition to a semantically mediated route, is demonstrated. Four experiments were conducted in which participants made gesturing or naming responses to pictures under deadline conditions. There was a cross-over interaction in the number of visual errors relative to the number of semantic plus semantic-visual errors in the two tasks: In gesturing, compared with naming, participants made higher proportions of visual errors and lower proportions of semantic plus semantic-visual errors (Experiments 1, 3, and 4). These results suggest that naming and gesturing are dependent on separate information-processing routes from stimulus to response, with gesturing dependent on a visual route in addition to a semantic route. Partial activation of competing responses from the visual information present in objects (mediated by the visual route to action) leads to high proportions of visual errors under deadline conditions. Also, visual errors do not occur when gestures are made in response to words under a deadline (Experiment 2), which indicates that the visual route is specific to seen objects.

Raffaella I. Rumiati, Settore di Neuroscienze Cognitive, Scuola Internazionale Superiore di Studi Avanzati, Trieste, Italy; Glyn W. Humphreys, School of Psychology, University of Birmingham, Edgbaston, United Kingdom. This work was supported by grants from the Italian Ministry of University and Scientific Research and the Medical Research Council, Cambridge, United Kingdom. The research formed a partial contribution toward a PhD from the University of Bologna, Bologna, Italy. Many thanks to the Visual Cognition seminar group of the School of Psychology, University of Birmingham, and to Elisabetta Ladavas and Roberto Cubelli, for helpful comments. Correspondence concerning this article should be addressed to Raffaella I. Rumiati, Settore di Neuroscienze Cognitive, Scuola Internazionale Superiore di Studi Avanzati, Via Beirut 2-4, 34013 Trieste, Italy. Electronic mail may be sent to [email protected].

"Routes" to Action

Consider how we might select an appropriate action to make in response to a common object, such as a knife. Selection might be determined by previous association with the particular object involved or by accessing stored knowledge about the function of the object. A cutting action might be made with a knife because we retrieve functional knowledge that that object is associated with the act of eating and the cutting action is associated with this act. That is, the action might be selected following the activation of semantic (functional and associative) knowledge by the object. However, it is also possible that actions are directly associated with stored visual knowledge about objects, without full access to semantic knowledge being necessary. In other words, actions may be selected by means of a direct visual route that links them to stored visual knowledge. In this article we present new evidence indicating the involvement of such a direct visual route in the selection of actions to be performed in response to seen objects.

To date, information-processing models of action have emphasized the role of stored semantic (functional and associative) knowledge in determining which action is selected as a response to objects. For instance, the models proposed by MacKay (1985, 1987) and by Roy and Square (1985) hold that learned actions can only be generated following access to stored semantic knowledge specific to the objects presented. However, there is evidence from neuropsychological dissociations which indicates that a separate direct visual route to action operates independently of a semantic route. Riddoch and Humphreys (1987b) discussed the case of an "optic aphasic" patient, J.B., who was poor at both naming and gaining visual access to detailed semantic information about objects but who, nevertheless, could make specific learned gestures to the same stimuli. For instance, given a knife on one occasion and a fork on another, he correctly made cutting and prodding gestures with his right and left hands, respectively. However, when visual access to semantic knowledge was tested by requiring him to decide which two objects were most related to each other given a knife, a fork, and a plate, he incorrectly pointed to the fork and the plate. His poor semantic matching was not due to failure to understand the task, because he correctly decided that the knife and the fork went together, and that the plate was the odd one out, when he was given the names of the objects auditorily. Riddoch and Humphreys argued that J.B.'s relatively preserved ability to make gestures in response to visually presented objects was due to the operation of a direct route to action that bypassed the semantic system (to which he had impaired access from vision). Deficits among patients similar to J.B. have been reported by Sirigu, Duhamel, and Poncet (1991) and Hillis and Caramazza (1995).

A different pattern of deficit was shown by an apraxic patient, C.D., studied by Riddoch, Humphreys, and Price (1989). C.D. had normal object recognition and preserved
object naming. Despite this, he was impaired at making gestures in response to seen objects, and it is interesting that this impairment for actions was considerably worse than when he had only to gesture in response to the object names; that is, the impairment was specific to the visual modality. C.D.'s ability to gesture in response to auditory commands is consistent with his being able to select actions from semantic knowledge (accessed from the auditory names), that is, with the semantic route to action's being intact. However, the problem in making gestures in response to seen objects suggests a deficit associated with the visual route to action. Riddoch et al. proposed that C.D. was impaired at selecting actions when several competing actions were activated. This problem would be most apparent with visual presentation of objects if, with visual presentations, several competing actions are directly activated within the visual route. Similar impairments in other patients were reported by Assal and Regli (1980), Pilgrim and Humphreys (1991), and Motomura and Yamadori (1994).

The specificity of the gestures carried out by the optic aphasic patient J.B. (which were hand specific as well as object specific) led Riddoch and Humphreys (1987b) to propose that the direct visual route to action was based on associations between stored visual representations of objects and learned actions. These stored visual representations could be represented in a structural description system separate from semantic memory (see Humphreys, Riddoch, & Quinlan, 1988; Riddoch & Humphreys, 1987a, 1987b; Seymour, 1979). For J.B., the lesion affected visual processing after access to the structural description system had occurred but before there was access to semantic memory. Thus, accurate gestures could be based on activated structural knowledge, even though access to semantic information was disrupted. Consistent with J.B.'s being able to access stored structural descriptions, Riddoch and Humphreys reported that he could perform difficult object decision tasks at a normal level. These tasks required discrimination between pictures of real objects and pictures of novel objects created by exchanging parts between objects; the nonobjects were as perceptually good as the real objects, which forced discrimination to depend on access to stored knowledge about the visual properties of objects (see also Hillis & Caramazza, 1995).

Figure 1 illustrates the two routes to action (semantic and visual) we have outlined. Note that, within this model, actions in response to words (unlike actions in response to seen objects) may be made only following access to semantic knowledge. We know of no neuropsychological evidence for patients' being able to select correct actions in response to words without retrieving semantic information.

In the present article we provide a first set of data demonstrating the existence of separated visual and semantic routes to action with normal observers. The experiments used a paradigm in which participants had to make a gesture under deadline conditions in response to visually presented line drawings of inanimate objects. Under deadline conditions, errors can be elicited that illustrate both the type of representation that first becomes available to response processes and the type of route used to access the response

Figure 1. Suggested framework for learned actions to objects. Route "1" involves seen objects or their names accessing stored semantic knowledge. Route "2" involves seen objects accessing only stored knowledge about the structural properties of objects. Route "2" itself could be subdivided according to whether learned actions are associated with whole objects or local object parts.

(in this case, visual or semantic). Deadline techniques have been used to good effect to study response selection in object naming tasks. Under deadline conditions, participants make a preponderance of naming errors that are either both visually and semantically related or solely semantically related to the target objects (e.g., to the target cup, participants might produce the name bowl [visually and semantically related] or saucer [semantically related]; see Vitkovitch & Humphreys, 1991; Vitkovitch, Humphreys, & Lloyd-Jones, 1993). These errors indicate that naming is mediated by access to semantic representations of objects, with the semantic information activated being strictly constrained by the visual properties of the object (hence the high proportion of errors that are both visual and semantic). Semantic and semantic-visual errors occur because the deadline forces responses to be based on the information concurrently available, which can fail to distinguish items within the same semantic field. Here we used a similar deadline procedure to elicit gesture errors in response to both pictures of objects (Experiment 1) and words (Experiment 2). According to the model outlined in Figure 1, gestures in response to words should be semantically mediated. Under deadline conditions, we expect errors to be semantic-visual or semantic in nature, as are errors in naming under a deadline (Vitkovitch & Humphreys, 1991).

In contrast, gesture errors in response to objects may be based on direct partial activation of actions from the visual properties of objects (via Route 2 in Figure 1). In this case, proportionally more visual errors might arise in gesturing under a deadline in response to objects than in response to words.
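To make the contrast between the two routes concrete, here is a toy sketch (our own, not an implementation from the article; all names and lookup tables are purely illustrative) that treats each route as a mapping from a stimulus to a stored action, with words restricted to the semantically mediated route, as in Figure 1.

```python
# Toy illustration of the Figure 1 framework (not the authors' model).
VISUAL_ROUTE_ACTIONS = {"knife": "cutting", "hammer": "hammering"}    # Route 2
SEMANTIC_ROUTE_ACTIONS = {"knife": "cutting", "hammer": "hammering"}  # Route 1


def action_to_seen_object(obj: str) -> str:
    """A seen object can drive action via the direct visual route (Route 2),
    with the semantic route (Route 1) available in parallel."""
    return VISUAL_ROUTE_ACTIONS.get(obj, SEMANTIC_ROUTE_ACTIONS[obj])


def action_to_word(word: str) -> str:
    """A word carries no structural description of the object's shape, so
    action selection is possible only via semantic knowledge (Route 1)."""
    return SEMANTIC_ROUTE_ACTIONS[word]


print(action_to_seen_object("knife"))   # cutting
print(action_to_word("hammer"))         # hammering
```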

Experiment 1: Naming and Gesturing Under a Deadline In Experiment 1 participants were asked to name and, in a different block of trials, to make gestures in response to, visually presented objects under time pressure (e.g., writing in response to a pen, hammering in response to a hammer). The deadline took the form of a "beep" that occurred 450 ms after the onset of the picture target. Although this deadline was not formally applied (i.e., responses made after the deadline were included in the error analysis; see Vitkovitch & Humphreys, 1991), it served to cue participants to respond as rapidly as possible and not be overly concerned with the accuracy of performance. By comparing naming with gesturing, we aimed to see whether the two types of response were generated using the same, semantically mediated, processing route (cf. Vitkovitch & Humphreys, 1991). If they were, the relative proportions of semantic and semantic-visual errors and of visual errors should be the same across the tasks.1 Note that it is difficult to make clear predictions about the numbers of "other" types of error, because these may well depend on the relative difficulties of naming and gesturing. For instance, for whichever task is harder, there may be an increased number of errors in which participants simply fail to respond before the deadline and then make no response ("omission" errors). General effects of task difficulty may also increase the total number of semantic-visual, semantic, and visual errors. However, task difficulty should not produce a pattern of results in which one type of error is increased and another decreased in one task relative to the other, when the overall proportions of errors are the same. This last pattern of results resembles a form of double dissociation, similar to the double dissociations found in the neuropsychological literature, in which error rates are selectively increased or decreased in different tasks (Shallice, 1988). Such a data pattern suggests that the tasks are dependent on different routes through the information-processing system, with failures of the different routes leading to the contrasting error types.

Method

Participants

There were 20 participants, all members of the University of Birmingham and all with either normal or corrected-to-normal vision.

Stimuli

The stimuli were a set of 60 line drawings. Some of them were novel, others were taken from the Snodgrass and Vanderwart
(1980) norms. The stimuli had previously been given to independent observers who were to rate the visual complexity and familiarity of the pictures (20 observers for each rating). Each observer was told to judge the degree of complexity or familiarity of the pictures, which were presented on cards one after the other, by assigning a number on a scale ranging from 1 (low complexity or low familiarity) to 5 (high complexity or high familiarity). Prior to the study, baseline naming and gesturing experiments were also conducted. In the naming study, we had 35 participants give names to pictures without time pressure, to generate a measure of name agreement. In the gesturing study, we had 45 participants make gestures in response to the pictures, again without time pressure, to generate a measure of gesture agreement. Naming and gesture responses in the deadline study were scored as correct if over 10% of the participants had produced the same responses in the baseline experiments (see Snodgrass & Vanderwart, 1980, for a similar procedure).
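To illustrate how this baseline agreement criterion works in practice, here is a small sketch (our own; the function and the example responses are hypothetical, not the authors' materials or code) that derives the set of acceptable responses for one item and applies the 10% cut-off.

```python
from collections import Counter

def acceptable_responses(baseline_responses, criterion=0.10):
    """Responses given by at least `criterion` (here 10%) of the baseline
    participants count as correct for this item in the deadline study."""
    counts = Counter(baseline_responses)
    n = len(baseline_responses)
    return {response for response, k in counts.items() if k / n >= criterion}

# Hypothetical baseline naming responses for one picture (35 participants).
baseline = ["razor"] * 30 + ["shaver"] * 4 + ["hammer"]

valid = acceptable_responses(baseline)             # {'razor', 'shaver'}
print("correct" if "shaver" in valid else "error")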

Design and Procedure Participants received 60 pictures, 30 for each block. In one block they were asked to name 30 pictures as fast as they could (the naming task). In the other block they had to perform an action in response to each picture as fast as they could (the gesturing task). The order of blocks was counterbalanced across participants, and there was a gap of about 5 min between the first block and the second. The presentation of the stimuli was random both in the naming task and in the gesturing task. The design was fully balanced so that for every item, 10 naming responses and 10 gesturing responses were collected across participants. We presented the stimuli individually on the screen of a Macintosh Classic PC using the Psychlab package (Bub & Gun, 1990). The computer was placed on a table in the middle of a room provided with two video cameras, one pointed at the participant and the other at the screen. This enabled the gesture or naming response made on each trial by a participant to be linked to the object presented for subsequent scoring. Every trial started with presentation of a fixation point (for 1,000 ms) in the middle of the screen, which was followed by presentation of the stimulus for 150 ms. There was then a 300-ms blank interval followed by a beep that went on for 350 ms (the deadline signal). Between each trial there was an interval of 1,000 ms. The viewing distance was about 50 cm. Stimulus presentations and participant responses were recorded with the two-camera video system for subsequent scoring.
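The trial structure can be summarized as a fixed timeline. The sketch below is ours (the study itself used the Psychlab package on a Macintosh Classic); the durations are taken from the procedure just described, and the print calls are stand-ins rather than real stimulus-presentation code.

```python
import time

# Durations (in seconds) from the procedure described above.
TRIAL_TIMELINE = [
    ("fixation point", 1.000),   # fixation at the centre of the screen
    ("picture",        0.150),   # target stimulus
    ("blank",          0.300),   # blank interval
    ("beep",           0.350),   # deadline signal: 450 ms after picture onset
    ("inter-trial",    1.000),   # gap before the next trial
]

def run_trial(item: str) -> None:
    """Step through one deadline trial for `item`; the print calls stand in
    for drawing the picture and playing the deadline tone."""
    for event, duration in TRIAL_TIMELINE:
        print(f"{event:>14}  ({duration * 1000:.0f} ms)  "
              f"{item if event == 'picture' else ''}")
        time.sleep(duration)

run_trial("hammer")
```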

Results

Error Classification

Performance was scored in two ways. First, gesturing and naming responses were scored by two independent judges on the following simple criteria that were used previously to classify naming errors by neuropsychological patients (see Hodges, Salmon, & Butters, 1991):

1 Vitkovitch and Humphreys (1991) found that naming errors under deadline conditions were mostly visual-semantic or semantic in nature. Hence, for comparisons between naming and gesturing performance, the crucial contrast is between the proportion of semantic plus visual-semantic errors (as found in naming) and the proportion of "pure" visual errors (predicted specifically for gesturing).

Visual error (VIS): the response involved an articulated version of the response to another item that was similar in shape to the target but was neither associated with the target nor from the same functional category (e.g., razor → hammer);

Semantic error (SEM): the response involved an articulated version of the response to another item that was associated with the target or from the same functional category but not visually related to the target (e.g., hammer → saw);

Semantic-visual error (SV): the response involved an articulated version (named or gestured) of the response to another item that was both visually and semantically related to the target (e.g., cigarette → match);

Omission (OM): the participants failed to give a naming or gesture response completely;

Other errors (OTH): (a) the response was unarticulated or poorly formed or was an articulated version of the response to an object that was both visually and semantically unrelated to the target (e.g., razor → ball); (b) the response was a repeat of an earlier response given to a previous object in the session; (c) the response was appropriate to the target but was not put in an explicit context (e.g., paper clip → clip); (d) the articulated response involved an object whose name was a phonological neighbor of the target (e.g., corkscrew → screwdriver).

Each observer independently viewed the videotapes of the participants responding under deadline conditions, and each attempted to classify the gesture and naming responses as being either correct or incorrect. When the response was incorrect it was assigned to one of the error categories (see above). The error category then referred to the action that was performed, rather than to the correct action. Naming and gesturing responses were scored as correct if the response was produced by 10% or more of the participants in the (time unlimited) baseline experiment. In addition, naming responses were taken as correct if the erroneous name referred to an object that had the same gesture associated with it as the target (e.g., pen → pencil). Note that on gesture trials, the same response would be scored as correct (see Funnell, 1987, for comments on the differential discriminability of naming and gesturing responses).

In a second scoring procedure, an independent judge viewed videos of the gestures made by participants without simultaneously seeing the objects to which the gestures had been made. The judge attempted to identify each object from the gesture (where possible), and the object identified was compared with the correct object and scored as either correct or an error. Finally, nine independent observers were given the identification responses made by the judge and were asked to classify each response as a particular type of error, using the classification system described above. Responses were assigned to a particular error category if eight of nine judges agreed. The same nine observers were also given the naming errors to assign to categories. When observers classified the gestures made in response to each object (for Scoring Procedure 1), or when the independent judge identified each gesture without seeing the target object (for Scoring Procedure 2), the classification and identification

responses were not constrained by a list of possible objects used in the experiment.

Scoring Procedure 1

In the naming task, the average overall judged error rate was 15.0% (across-participants SD = 10.0%) for Judge 1 and 15.0% (across-participants SD = 9.5%) for Judge 2. In the gesturing task, the average overall error rate was 15.5% (SD = 10.0%) for Judge 1 and 15.8% (SD = 8.5%) for Judge 2. The tasks did not differ in overall difficulty for either judge. For naming responses, there was 100% agreement between the two judges as to when a visual, a semantic, or a visual and semantic error occurred. The judges agreed on the error classifications for 95% (36/38) of the errors. For gesturing responses, there was 87% (33/38) agreement between the two judges as to when an error occurred. The judges agreed on the error classifications for 92% (35/38) of the errors. (See Appendix A.)

Omissions constituted the largest proportion of errors in each task (7.0% of the total responses for each task, for each judge). For Judge 1, "other" errors occurred on 1.9% of the gesturing trials and on 2.1% of the naming trials; for Judge 2, "other" errors were noted on 1.3% and 2.1% of the gesturing and naming trials, respectively. The remaining errors (on 6.6% and 5.9% of the gesturing and naming trials, respectively, for Judge 2) were either visual, semantic-visual, or semantic in nature. The relative proportions of visual errors and of combined semantic and semantic-visual (sem/sem-vis) errors, for each judge, are given in Figure 2. Figure 2 indicates that the proportions of visual and semantic errors varied across the tasks. In gesturing, high proportions of visual errors occurred; in naming, there were more equal proportions of visual errors and semantic plus semantic-visual errors.

A hierarchical fully saturated log-linear analysis, with tests of partial association, was applied to a two-way contingency table for the factors of error type (visual vs. semantic plus semantic-visual) and task type (gesturing vs. naming).² For the data scored by Judge 1 there was no main effect of task type (χ² < 1.0) or of error type, χ²(1, N = 76) = 2.59, p = .11. However, there was an Error Type × Task Type interaction, likelihood ratio χ²(1, N = 76) = 4.41, p < .035. For the data as scored by Judge 2 there was no main effect of task type, χ² < 1.0. However, the main effect of error type, χ²(1, N = 76) = 9.08, p < .002, and the Error Type × Task Type interaction, likelihood ratio χ²(1, N = 76) = 7.21, p < .007, were both significant.

Comparisons of the visual errors and the semantic plus semantic-visual errors in the two tasks were based on the data averaged across the two judges. When visual errors only were compared with "other" errors (and semantic plus semantic-visual errors), visual errors tended to be relatively higher in gesturing than in naming (29 visual errors vs. 21 "other" errors in gesturing, and 19 visual errors vs. 32 "other" errors in naming), χ²(1, N = 101) = 4.36, p < .04, a result replicated in Experiments 4a and 4b. When semantic plus semantic-visual errors were compared with "other" errors, semantic plus semantic-visual errors were higher in naming than in gesturing (19 semantic plus semantic-visual errors vs. 32 "other" errors in naming, and 9 semantic plus semantic-visual errors vs. 41 "other" errors in gesturing), χ²(1, N = 101) = 4.67, p < .03. For gesturing, there were relatively more visual errors than semantic plus semantic-visual errors (when compared with the total number of errors; 29 out of 65 and 9 out of 85, respectively), χ²(1, N = 188) = 13.75, p < .001. For naming, the number of semantic plus semantic-visual errors did not differ from the number of visual errors (19 vs. 19).

We computed correlations between familiarity and visual complexity ratings and the total number of errors made on each object, summing across participants (and averaging across judges). The more familiar the items were, the fewer the errors in both gesturing and naming (r = -.35, p < .005 and r = -.39, p < .005, respectively). However, there was no sign of a correlation between errors and visual complexity in either gesturing or naming (r = .000 and r = .009, respectively).³

Figure 2. Percentages of visual errors (Vis) and semantic plus semantic-visual errors (Sem/Sem-Vis) in naming and gesturing tasks as a function of the total number of trials in each task (Experiment 1).

Scoring Procedure 2

When asked to identify the objects from the gestures made by participants, the judge failed to name the correct object (for the particular trial) on 46 occasions (8.0% of the total number of trials; this percentage does not include the 7% omissions made by the participants). On the trials when the objects could not be identified correctly, the additional judges agreed on the following classifications (see Appendix B): A semantic or semantic-visually related object was named on 8 occasions (17.0% of the total errors); a visually related object was named on 30 occasions (65.0% of the total errors). "Other" errors represented 22.0% of the total errors. The classifications of the misidentifications for both naming and gesturing trials are listed in Appendix A. The average numbers of visual errors and semantic plus semantic-visual errors scored when Procedure 2 was used did not differ from the average numbers scored by Judges 1 and 2 using Procedure 1 (χ² < 1.0).

The gesture data, scored using Procedure 2, were also compared with the naming error classifications agreed on by the additional nine judges (which overlapped completely with the classifications assigned by Judge 2). For this we used a hierarchical fully saturated log-linear analysis, with tests of partial association, with the main effects being error type (visual vs. semantic-visual) and task (gesturing vs. naming). There was no main effect of task type (χ² < 1.0). However, the main effect of error type, χ²(1, N = 76) = 6.46, p < .01, and the Error Type × Task Type interaction, χ²(1, N = 76) = 6.95, p < .008, were both significant.

These results were not caused simply by responses to one or two of the items in the set. When Scoring Procedure 1 was used, either visual or semantic plus semantic-visual gesture errors were recorded in response to 20 and 19 objects by Judges 1 and 2, respectively. For Judge 1, 13 of the objects had more visual than semantic plus semantic-visual errors, relative to 6 the other way (1 tie); for Judge 2, it was 15 to 4 (1 tie). For naming, visual errors no longer dominated. Visual or semantic plus semantic-visual errors were recorded in response to 24 objects. For both judges, there were more visual errors on 11 objects and more semantic plus semantic-visual errors on 11 objects (2 ties). When Scoring Procedure 2 was used, visual or semantic plus semantic-visual gesture errors were found for 20 objects. For 14, there were more visual than semantic plus semantic-visual errors, with 4 going in the other direction (2 ties). Thus, the tendency for the most likely error to be visual was found for the majority of objects to which errors were made.

² Log-linear analysis, like analysis of variance, is appropriate whenever tests of main effects and interactions are required. However, log-linear analysis is suitable for categorical data (see Norusis, 1990, pp. 321-322, for details of the specific method used here).

³ It would be interesting to test whether factors such as familiarity and complexity predicted particular types of error, but the data were too few to enable a finer-grained assessment to take place.
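As an illustration of the Error Type × Task Type test described above (and of the log-linear approach noted in the footnote), the sketch below (ours, not the authors' analysis code) applies a likelihood-ratio test of association to the averaged error counts reported for the two tasks; for a 2 × 2 table, this is equivalent to testing the interaction term of the saturated log-linear model. Because the counts are averaged over the two judges, the resulting statistic falls between the two per-judge values reported above rather than reproducing either one.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Averaged error counts from Experiment 1 (rows = gesturing, naming;
# columns = visual errors, semantic plus semantic-visual errors).
counts = np.array([[29,  9],
                   [19, 19]])

# Likelihood-ratio (G) statistic for the Task x Error Type association.
g, p, dof, expected = chi2_contingency(counts, correction=False,
                                        lambda_="log-likelihood")
print(f"G({dof}, N = {counts.sum()}) = {g:.2f}, p = {p:.3f}")
```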

Discussion The relative proportions of semantic plus semantic-visual errors and visual errors differed across the two tasks. In naming, there were about the same numbers of semantic plus semantic-visual errors and "pure" visual errors. In contrast, relative to naming, in gesturing there were more visual errors and fewer semantic plus semantic-visual errors. For gesturing there were also more pure visual than semantic plus semantic-visual errors. These results were obtained with two scoring procedures: both when judges saw the objects and gestures together and directly rated the gesture errors and when judges saw the gestures alone and tried to identify the object concerned and other judges then rated the misidentification errors. There was also a close correlation between the errors classified by both scoring procedures. The variations in the proportions of error types were not due to differences in overall task difficulty, because the overall error rates were similar for gesturing and naming. Also, gesturing produced both selective increases and decreases in error types, relative to naming. Instead, the data are consistent with the proposal that the two tasks are mediated by different forms of information. Naming is mediated by access to semantic information. Semantic plus semanticvisual errors occur when the deadline prevents name retrieval from being completed. Visual errors may occur in addition if, on occasion, participants fail to recognize particular objects visually.4 Gesturing is mediated by a direct visual route in addition to a semantic route; hence proportionately more pure visual errors occur in gesturing than in naming. Such errors arise when the deadline prevents full activation of action responses, so that actions are based on responses partially activated by visual information common to more than one object. It may even be that the visual route to action in response to objects is relatively dominant, compared with the semantic route, so that not only are visual errors increased but semantic plus semantic-visual errors are decreased in gesturing compared with naming. However, although the data are consistent with a "dual route" account of gesture responses to seen objects, they do not force this interpretation. For instance, at least two alternative proposals can be put forth. One is that the "visual" errors we have observed in gesturing do not reflect the stimuli at all but are simply the most likely errors for a given gesture response. For instance, the "visual" error of miming hammering in response to the stimulus of a razor may simply be the gesture error most likely to occur for the target response (e.g., shaving). Note that the baseline gestural errors we collected do not provide a strong test for this, because the time-unlimited baseline conditions did not force errors to occur. Of course, this account would still need to explain why the most frequent errors for a given response were gestures that would be made in response to an object visually similar to the target, but the crucial point is that it places the locus of the errors in the response process, not in direct stimulus-response (vision-action) links. We tested this account in Experiment 2, where we required participants to make gestures in response to words as well as to pictures. If "visual" gesture errors occur irrespective of the nature of

the stimuli, they should be as frequent with words as stimuli as they are with pictures as stimuli (and, again, visual errors should be more frequent than semantic plus semantic-visual errors). In contrast, if gestures in response to words are mediated only by a semantic route to action (see Figure 1), and if "visual" errors do reflect the operation of a separate visual route for seen objects, few (if any) visual errors should occur, and errors should be predominantly semantic plus semantic-visual. A second, alternative account is that the differences in error types in naming and gesturing reflect contrasts in the time course of response activation, not contrasts in the information-processing routes involved. It may be that gestures can be initiated from partially processed visual information, before semantic representations are contacted; also, this initiation of action may occur even if gestures are normally mediated by semantic information activated after visual processing of objects. A framework illustrating this idea is given in Figure 3, and we assessed this account in Experiments 3 and 4. One other point to note concerning Experiment 1 is that "other" errors (neither semantic-visual, semantic, or visual) did not differ greatly across the tasks. Deadline conditions affected naming and gesturing equally in preventing the participants from actually producing a response.

Experiment 2: Gesturing in Response to Words and to Pictures

Method

Participants

There were 18 participants, all members of the University of Birmingham and all with either normal or corrected-to-normal vision.

Stimuli

The items were the same as those in Experiment 1 except that they were sometimes presented as printed words as well as pictures.

Design and Procedure

Unless otherwise mentioned, the design and procedure were the same as those in Experiment 1. The experiment was run in two blocks, one with pictures and one with words as stimuli. For a given participant there were 30 pictures and 30 words. In each block participants were told to make a gesture as quickly as possible in response to the stimulus (picture or word). The design was fully balanced so that for each item there were nine gesturing responses to pictures and nine to words. The deadline was the same as in Experiment 1, for both picture and word stimuli. Words were presented in Geneva font, 24-point type, centered at fixation.

4 Rather more visual errors occurred in the naming task here than in the study of Vitkovitch and Humphreys (1991). This likely reflects our own stimuli being less familiar and more complex (note the absolute values for ratings).


Results

Because of the similarity of the results when the two scoring procedures were used, we used only the first scoring procedure here. The results were produced by two independent judges who used the same criteria for error classification that were used in Experiment 1. For Judge 1, the overall error rate was 12.5% (SD = 5.5%) with pictures and 11.5% (SD = 4.5%) with words. For Judge 2, the overall error rate was 11.5% (SD = 6.0%) with pictures and 12% (SD = 5.5%) with words. For gesturing responses to pictures, there was 95% (19/20) agreement between the two judges as to when the error occurred. The two judges also agreed on the error classification for 100% of the errors. For gesturing responses to words, there was 100% agreement between the two judges as to when the error occurred, and the judges agreed on the classification of 99% of the errors. Overall error rates did not differ across the stimulus types. As in Experiment 1, the majority of errors were omissions, but these did not differ across words and pictures (for pictures, 6.0% of the overall responses were omissions, and for words, 5.0% of the overall responses were omissions, for each judge). There was also a slight but nonsignificant increase in "other" errors for words over pictures (3.4% and 4.0% of the overall responses to pictures were scored as "other" errors for Judges 1 and 2, respectively). The remaining errors were visual, semantic, or semantic-visual. Figure 4 shows the proportions of visual gesture errors and semantic plus semantic-visual gesture errors to words and pictures.

Figure 3. A "single route" model, in which differences between gesturing and naming are accounted for in terms of the time and level of processing at which the responses can be initiated. (Figure annotations: under deadline conditions, gesturing is initiated from partial activation of stored visual representations; naming is initiated only after activation of stored name representations.)

Figure 4. Percentages of visual errors (Vis) and semantic plus semantic-visual errors (Sem/Sem-Vis) in naming and gesturing tasks as a function of the total number of trials in each task (Experiment 2). (Separate panels for Judge 1 and Judge 2, each showing pictures and words.)

A hierarchical fully saturated log-linear analysis, with tests of partial association, was conducted with error type (visual vs. semantic plus semantic-visual) and stimulus type (pictures vs. words) as factors. For the data scored by Judge 1, there was no main effect of stimulus type, χ²(1, N = 33) = 1.5, p = .22, or of error type (χ² < 1.0). There was, however, an Error Type × Stimulus Type interaction, likelihood ratio χ²(1, N = 33) = 25.70, p < .000. For the data scored by Judge 2, there was no main effect of stimulus type, χ²(1, N = 32) = 1.13, p < .287, or of error type (χ² < 1.0). However, an Error Type × Stimulus Type interaction was found, likelihood ratio χ²(1, N = 32) = 217.79, p [...] .05, and F < 1.0, respectively. Overall, gesturing was easier than naming. In this experiment omission errors were slightly less frequent than in Experiments 1 and 2; for gesturing, omissions were recorded on 4% and 5% of the trials by Judges 1 and 2, respectively, and for naming, omissions were noted on 3% of the trials by both judges. "Other" errors were recorded on 4.2% and 4% of the gesturing trials, and on 5.7% and 5.3% of the naming trials, by Judges 1 and 2, respectively. The remaining errors were visual, semantic, or semantic-visual. Figure 5 gives the relative proportions of visual errors and of semantic plus semantic-visual errors made in gesturing and naming. A hierarchical fully saturated log-linear analysis, with tests of partial association, was conducted with error type (visual vs. semantic plus semantic-visual) and task type (gesturing vs. naming) as factors. For the data scored by Judge 1, there was no main effect of task type, χ²(1, N = 97) = 1.75, p = .186, or of error type, χ²(1, N = 97) = 1.25, p < .264. However, an Error Type × Task Type
interaction was observed, likelihood ratio χ²(1, N = 97) = 5.46, p < .02. For the data scored by Judge 2, there was neither a main effect of task type, χ²(1, N = 94) = 2.99, p = .08, nor a main effect of error type, χ²(1, N = 94) = 1.25, p = .264. However, there was a reliable Error Type × Task Type interaction, likelihood ratio χ²(1, N = 94) = 3.91, p [...] (1, N = 63) = 15.64, p < .000. For the data scored by Judge 2, there were also reliable main effects of task type, χ²(1, N = 61) = 18.68, p < .0001, and error type, χ²(1, N = 61) = 10.24, p < .001, and an Error Type × Task Type interaction, likelihood ratio χ²(1, N = 61) = 7.81, p < .005. Figure 6 indicates a cross-over interaction between the proportions of visual errors and semantic plus semantic-visual errors in naming and gesturing. Unfortunately, because of the different numbers of trials in each task, it is not possible to assess whether absolute numbers of visual errors or semantic plus semantic-visual errors differed in the two tasks (as opposed to the proportions of the two error types, shown by the above interaction). Such comparisons can be made, however, by combining the data from Experiments 4a and 4b, putting together the data from the high-probability tasks and the low-probability tasks, respectively (because the same numbers of trials are then involved; see the Results section of Experiment 4b).

Overall errors in response to stimuli (summed across participants and averaged over judges) were again correlated with the familiarity and complexity ratings. We observed that in gesturing, participants were more likely to make errors with less familiar items (r = -.279, p < .025). No correlation between naming errors and familiarity (r = .167, ns) or between gesturing or naming errors and visual complexity was observed (r = .048 and r = .0047, respectively).

Figure 6. Percentages of visual errors (Vis) and semantic plus semantic-visual errors (Sem/Sem-Vis) in naming and gesturing tasks as a function of the total number of trials in each task (Experiment 4a).

Experiment 4b: 75% Naming and 25% Gesturing Trials

Method

Participants. Sixteen participants, all members of the University of Birmingham and all with either normal or corrected-to-normal vision, took part in the experiment.

Stimuli. The stimuli were the same as those used in Experiments 1 and 3.

Design and procedure. The experimental design and procedure were the same as those in Experiment 3 except that naming was required on 75% of the trials. The order of presentation of the stimuli and the responses was random. The design was fully balanced so that for each item 16 gesturing responses and 16 naming responses were collected.
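To make the 75%/25% manipulation concrete, the sketch below (ours; the function and variable names are hypothetical, and this is not the authors' procedure) builds a single participant's trial list with a 3:1 ratio of naming to gesturing trials in a random order. Counterbalancing which items receive which task would then be handled by rotating the assignment across participants.

```python
import random

def build_trial_list(items, p_naming=0.75, seed=0):
    """Assign each item to a naming or a gesturing trial so that about
    `p_naming` of the trials in the block require naming, then shuffle
    the presentation order."""
    rng = random.Random(seed)
    n_naming = round(len(items) * p_naming)
    tasks = ["naming"] * n_naming + ["gesturing"] * (len(items) - n_naming)
    rng.shuffle(tasks)                    # which items get which task
    trials = list(zip(items, tasks))
    rng.shuffle(trials)                   # presentation order
    return trials

items = [f"object_{i:02d}" for i in range(60)]   # placeholder item labels
for item, task in build_trial_list(items)[:5]:
    print(item, task)
```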

Results

The results were scored by two independent judges who used the same criteria used in Experiment 4a. For Judge 1, the overall error rate was 17.0% (SD = 7.0%) for naming and 23.0% (SD = 16.0%) for gesturing. For Judge 2, the overall error rate was 16.0% (SD = 6.0%) for naming and 22.0% (SD = 14.0%) for gesturing. The error rate tended to be higher for gesturing than for naming responses (in contrast to Experiment 4a), and there was a trend for this to be significant in an ANOVA across participants, F(1, 15) = 3.54, MSE = 0.14, p = .08. Omission errors were noted on
5% of the naming trials (Judge 1, 6.0%; Judge 2, 4.0%) and on 4.0% of the gesture trials (for both judges). The relative proportions of visual errors and of semantic plus semantic-visual errors, as a function of the total number of errors in each task, are shown in Figure 7. "Other" errors occurred on
4.6% of the naming trials (Judge 1, 3.1%; Judge 2, 6.0%) and on 4.8% of the gesture trials (Judge 1, 4.1%; Judge 2, 5.4%). Wrong-action or both-actions errors occurred on just 0.35% of the naming trials (all wrong actions), but they occurred on 9.0% of the gesture trials (6.0% wrong-action and 3.0% both-actions errors). The remaining errors were visual, semantic, or semantic-visual in nature. A hierarchical fully saturated log-linear analysis, with tests of partial association, was conducted with error type (visual vs. semantic plus semantic-visual) and task type (naming vs. gesturing) as factors. For the data scored by Judge 1, there were reliable main effects of task type, χ²(1, N = 58) = 24.06, p [...]

nail (SV), corkscrew (VIS) screwdriver — bat (VIS) shower —• telephone (VIS) spatula — spade (VIS), spade (VIS), spade (VIS), spade (VIS), spade (VIS), shovel (VIS), spade (VIS) stamp — doorknob (VIS) sword — knife (SV) tambourine —• bracelet (VIS), bracelet (VIS) toothbrush —> pen (VIS) tweezers —* pulling (C), pulling (C), pulling (C), pen (VIS) whisk —> screw (PERS)

Naming: Judge 2 aerosol can — lighter (VIS), saltceller (VIS), lighter (VIS) baseball bat — stick (VIS) bracelet —• band (SV), ring (SV) bulldog clip — clip (C), clip (C) cigarette — pen (VIS), chair (UN) clothes-peg — peg (C), peg (C), corkscrew —• screwdriver (PHON), hammer (UN) dart —• plying (SEM) doorknob —»doorhandle (SEM), doorhandle (SEM), doorhandle (SEM) electric drill —• screwdriver (SEM) hairclip — hairslide (SEM), clip (C) holepunch —> paperclip (SEM) ice cream —• screw (VIS), pen (VIS) lighter —> bag (VIS) razor —> hammer (VIS), hammer (VIS) ring -»nut (VIS) safe — cupboard (SV), clock (VIS)


saw —• cleaver (SV)

hammer —• sword (VIS)

screw — nail (SV)

hairclip —clip (C), clip (C), clip (C)

screwdriver —• corkscrew (PHON)

lighter—jug (VIS)

spatula — meat cleaver (SV), shovel (VIS), spade (VIS),

pliers —• scissors (SV), don't know shears (VIS), puncher (VIS)

(VIS) spinning top —• toy (SEM) stamp —• doorhandle (VIS), punch (SEM)

razor —• scraper (VIS), paint roller (VIS), hammer (VIS), hammer

stapler —»holepunch (SEM), sandal (UN)

safe —mug (PERS)

sword — knife (SV), knife (SV)

saw —• axe (VIS), hammer (SEM), don't know

(VIS), scraper (VIS), hammer (VIS) ring —• bracelet (SV)

toothbrush —• pencil (VIS)

screw — nail (SV), corkscrew (VIS), pen (VIS)

tweezers — pliers (VIS)

screwdriver —»bat (VIS)

whisk — brush (VIS), brush (VIS)

shower —• telephone (VIS) spatula — spade (VIS), spade (VIS), spade (VIS), spade (VIS), spade (VIS), shovel (VIS), spade (VIS)

Gesturing: Judge 2

stamp —• doorknob (VIS) sword —knife (SV)

aerosol can —> bottle (VIS), pliers (PERS) bracelet — pan (VIS), ring (SV), ring (SV) can opener —»pliers (VIS) cigarette —• don't know clothespeg — peg (C), peg (C)

tambourine —- bracelet (VIS), bracelet (VIS) toothbrush —• pen (VIS) tweezers —> pulling (C), pulling (C), pulling (C), pen (VIS) whisk — screw (VIS)

Appendix B

Experiment 1 Error Classification: Scoring Procedure 2

The following are the error categories agreed on by eight of nine independent judges when given the identification responses made by another independent judge who attempted to name the gestures made by participants. (VIS = visual; SEM = semantic; SV = semantic-visual; C = no context.) aerosol can —bottle (VIS) boomerang —• don't know bracelet — tambourine (VIS), ring (SV), watch (SV) can opener —> pliers (VIS), scissors (VIS) cigarette —• don't know clothespeg — peg (C), peg (C) electric drill —> screwdriver (SEM) hairclip — comb (SV), clip (C), peg (C), peg (C) lighter —jug (VIS) pliers — scissors (SV), puncher (VIS), can opener (VIS) razor — scraper (VIS), roller (VIS), hammer (VIS), hammer (VIS), scraper (VIS), hammer (VIS), ring — bracelet (SV)

saw — knife (VIS), hammer (SEM), don't know screw — nail (SV), corkscrew (VIS) screwdriver — knife (VIS) shower head — telephone (VIS) spatula — spade (VIS), shovel (VIS), spade (VIS), spade (VIS), shovel (VIS), spade (VIS) stamp — doorknob (VIS) sword — knife (VIS) tambourine — bracelet (VIS), bracelet (VIS) toothbrush — pen (VIS) tweezers — pincers (C), pincers (C), pincers (C), pen (VIS) whisk — screwdriver (VIS)

Received August 17, 1995
Revision received December 20, 1996
Accepted February 6, 1997