Copyright 1998 by the American Psychological Association, Inc. 0278-7393/98/$3.00

Journal of Experimental Psychology: Learning, Memory, and Cognition 1998, Vol. 24, No. 6, 1521-1543

Syntactic Ambiguity Resolution in Discourse: Modeling the Effects of Referential Context and Lexical Frequency

Michael J. Spivey
Cornell University

Michael K. Tanenhaus
University of Rochester

Sentences with temporarily ambiguous reduced relative clauses (e.g., The actress selected by the director believed that...) were preceded by discourse contexts biasing a main clause or a relative clause. Eye movements in the disambiguating region (by the director) revealed that, in the relative clause biasing contexts, ambiguous reduced relatives were no more difficult to process than unambiguous reduced relatives or full (unreduced) relatives. Regression analyses demonstrated that the effects of discourse context at the point of ambiguity (e.g., selected) interacted with the past participle frequency of the ambiguous verb. Reading times were modeled using a constraint-based competition framework in which multiple constraints are immediately integrated during parsing and interpretation. Simulations suggested that this framework reconciles the superficially conflicting results in the literature on referential context effects on syntactic ambiguity resolution.

The question of how expectations created by discourse context are used in syntactic ambiguity resolution has been the subject of numerous experiments during the past decade (for a recent review, see Tanenhaus & Trueswell, 1995). Most of these studies have been motivated by contrasting claims made by theories in which ambiguity resolution is primarily guided by discourse-based principles, for example, minimizing new presuppositions while continuously updating a discourse model (cf. Altmann & Steedman, 1988; Crain & Steedman, 1985), and theories in which discourse information is used only to evaluate and, if necessary, revise an initial structure assigned according to simplicity-based structural principles (cf. Frazier, 1987). The typical study has used sentences with sequences of words that are temporarily ambiguous between a structure that modifies a definite noun phrase and a structure that introduces a new discourse event or entity. In neutral contexts, the modification analysis is typically the less preferred interpretation, resulting in increased processing difficulty if the sentence is disambiguated in favor of the noun phrase modification analysis. For example, The actress selected . . . is temporarily ambiguous between a main clause in which selected is introducing a selecting event with

Michael J. Spivey, Department of Psychology, Cornell University; Michael K. Tanenhaus, Department of Brain and Cognitive Sciences, University of Rochester. This work was supported by a National Science Foundation Graduate Research Fellowship and by National Institutes of Health Grant HD27206. We thank Kathleen Eberhard, Chuck Clifton, and three anonymous reviewers for helpful comments on an earlier version of the article; Greg Stevens and Alex Reed for assistance in data collection; and Ken McRae for extensive discussions about the modeling. Correspondence concerning this article should be addressed to Michael J. Spivey, Department of Psychology, Cornell University, Ithaca, New York 14853-9365. Electronic mail may be sent to mjs41@cornell.edu.

the actress as the agent, as in The actress selected a new costume, and a relative clause in which selected modifies the actress, as in The actress selected by the director believed that her performance was perfect. Discourse context is manipulated by using contexts that introduce either one or two possible referents for the definite noun phrase. Two-referent contexts (e.g., two actresses were auditioning for a play; the director chose one of the actresses but not the other) provide discourse support for a modification analysis, because modification is required to disambiguate the referent of the noun phrase. In contrast, a context that introduces a unique referent (e.g., An actress and the producer's niece were auditioning for a play. The director chose the actress but not the niece) allows the definite noun phrase in the target sentence to be immediately integrated into the discourse model and thus supports a nonmodification analysis. Unfortunately, the results of this literature have been largely inconclusive (for a review, see Spivey-Knowlton & Tanenhaus, 1994; Tanenhaus & Trueswell, 1995). Whereas some studies have found clear effects of referential context (Altmann, Garnham, & Dennis, 1992; Altmann, Garnham, & Henstra, 1994; Altmann & Steedman, 1988; Britt, 1994; Britt, Perfetti, Garrod, & Rayner, 1992; Spivey-Knowlton, Trueswell, & Tanenhaus, 1993; Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy, 1995; van Berkum, Hagoort, & Brown, 1998), others have found weak or delayed effects (Britt et al., 1992, Experiment 3; Clifton & Ferreira, 1989; Ferreira & Clifton, 1986; Liversedge, 1994; Mitchell, Corley, & Garnham, 1992; Murray & Liversedge, 1994; Spivey-Knowlton, Trueswell, & Tanenhaus, 1993, Experiment 3).
Recently, several research groups have argued that constraint-based models of ambiguity resolution provide a useful framework for rationalizing the superficially contradictory literature on discourse context effects (MacDonald, Pearlmutter, & Seidenberg, 1994; Spivey-Knowlton, Trueswell, & Tanenhaus, 1993). Constraint-based models claim that the comprehension system continuously integrates multiple constraints to converge on a consistent


SPIVEY AND TANENHAUS

interpretation (Bates & MacWhinney, 1989; McClelland, St. John, & Taraban, 1989; MacDonald et al., 1994; Spivey-Knowlton & Sedivy, 1995; Spivey-Knowlton et al., 1993; Trueswell & Tanenhaus, 1994). Specific models differ in their details, but as a class they share two common features: (a) multiple constraints are combined to compute alternative interpretations in parallel and (b) the alternatives compete with one another during processing. Crucially, constraint-based models predict that the effectiveness of contextual constraints will be modulated by the strength of other relevant constraints. Important supporting evidence comes from two recent self-paced reading studies using prepositional phrase ambiguities (Britt, 1994; Spivey-Knowlton & Sedivy, 1995). In these studies, the noun inside the prepositional phrase (e.g., The man fixed the door with the rusty lock/screwdriver) determined whether the prepositional phrase modified the noun phrase (with the rusty lock) or the verb phrase (with the rusty screwdriver). Reading times beginning at the noun indicated whether one of the resolutions of the ambiguity was easier than the other. Both studies found that referential manipulations were most effective when verb-based biases were weakest. However, neither study quantified the strength of the contributing constraints or provided an explicit mechanism linking constraint integration to processing difficulty. This makes it difficult to evaluate claims about whether context has immediate effects or whether it operates during a second stage of processing.
Distinguishing among these alternatives using data from the prepositional phrase ambiguities used in these studies is further complicated because the studies did not use an unambiguous baseline (see, however, studies by Spivey-Knowlton, 1996, and Tanenhaus et al., 1995, reporting referential effects established by visual contexts using spoken sentences with prepositional phrase attachment ambiguities and an unambiguous baseline). This article examines the influence of discourse constraints on syntactic ambiguity resolution using the relative clause-main clause ambiguity, which affords an unambiguous baseline. In order to consider some of the relevant constraints that enter into this ambiguity, consider the fragment the actress selected . . . . A reduced relative clause is temporarily ambiguous when it contains a verb that uses the same morphological form for the past tense and the passive participle. If selected is a past tense form, the actress is the subject of a main clause in the active voice and would typically play the semantic role of agent. In contrast, if selected is a passive participle, the actress is the object of a relative clause in the passive voice and would typically play the semantic role of theme or patient, though other roles are possible depending on the argument structure of the verb. Argument structure is also important for evaluating postambiguity constraints that immediately follow the verb (MacDonald, 1994). For example, the verb selected can be used intransitively (The actress selected first) or transitively (The actress selected the lead role). If the verb is followed by a preposition, for example, by, in The actress selected by . . . . then a past tense use of selected would require an

intransitive argument structure. In contrast, a past participle use of selected would be transitive because the noun phrase preceding the verb (the actress) is the logical object. Thus, if the verb is typically used transitively, a preposition such as by immediately following the verb provides strong evidence for a relative clause. If the verb is typically used with an intransitive argument structure, then a preposition is still consistent with a main clause. This analysis suggests that, even setting aside discourse factors, a variety of constraints should be important for resolving the reduced relative ambiguity (MacDonald et al., 1994). These include, but are not limited to, (a) the semantic fit between the first noun phrase and the different thematic roles it would play in a main clause versus a relative clause; (b) the frequency with which the ambiguous verb occurs as a participle in a passive clause as compared with a past tense in an active clause; (c) the frequency with which the verb occurs with certain argument structures; and (d) the information provided by postambiguity constraints. There is now a substantial literature demonstrating the importance of each of these constraints for the processing of reduced relative clauses (Burgess, Tanenhaus, & Hoffman, 1994; MacDonald, 1994; MacDonald et al., 1994; Pearlmutter & MacDonald, 1995; Tabossi, Spivey-Knowlton, McRae, & Tanenhaus, 1994; Trueswell, 1996; Trueswell, Tanenhaus, & Garnsey, 1994). There is also evidence that these constraints interact. MacDonald et al. (1994) presented a meta-analysis showing that contextual manipulations have been most successful in studies using verbs that are frequently used as participles. Trueswell (1996) has shown that semantic fit interacts with participle frequency as estimated using the Francis and Kučera (1982) frequency counts. MacDonald (1994) has demonstrated interactions among semantic fit, preferred argument structure, and strength of postambiguity constraints.
Burgess et al. (1994) showed that thematic fit interacts with the availability of constraining parafoveal information. The importance of these constraints can be highlighted by comparing the clear garden-path sentence The horse raced past the barn fell with the sentence The land mine buried in the sand exploded. The latter sentence has the same structure but does not cause noticeable processing difficulty. In The horse raced past the barn fell, the horse is a typical agent of a racing event, raced is used about 10 times more frequently as a past tense than as a past participle (Francis & Kučera, 1982), and it is typically used intransitively. All of these factors strongly bias the main clause interpretation. In The land mine buried in the sand exploded, the land mine is an atypical agent of a burying event but a good theme, buried occurs about 5 times more frequently as a past participle than as a past tense (Francis & Kučera, 1982), and it is typically used transitively. Thus all of the constraints conspire in favor of the relative clause interpretation. From a constraint-based perspective, then, the fact that several previous studies using referential contexts with reduced relatives have failed to find context effects (e.g., Britt et al., 1992; Ferreira & Clifton, 1986; Murray & Liversedge, 1994) may be due to other constraints being so strongly biased in favor of the main clause that discourse

DISCOURSE AND LEXICAL CONSTRAINTS

effects were masked. Because the recent literature has provided insights into the nature of these constraints, it is possible to create materials in which discourse context is more likely to exhibit strong effects, therefore allowing a clear test of the hypothesis that discourse constraints can apply immediately. The research reported here had two primary goals. The first goal was to determine whether discourse context would have immediate effects on processing reduced relative clauses under conditions where opposing constraints would not be expected to overwhelm the context effects. The second goal was to model the results using a multiple constraints approach with an explicit competition algorithm (e.g., Spivey-Knowlton, 1994; Stevenson, 1994). The modeling was undertaken to evaluate the claim that discourse context interacts with other constraints and to see whether superficially conflicting patterns of results in the literature could be accounted for solely by differences in the strength of constraints due to differences in the materials. The general framework of the model will be presented in the introduction to Experiment 1, after we have described our stimulus materials.

Experiment 1

This experiment was designed to test whether discourse contexts would reduce or eliminate processing difficulty for reading temporarily ambiguous reduced relative clauses compared with an unambiguous baseline. Reduced relative clauses with morphologically unambiguous verbs were used as the baseline, allowing us to compare ambiguous and unambiguous sentences with the same structure and number of words, as recommended by Ferreira and Henderson (1993). Table 1 presents a sample set of materials. These materials were previously used in self-paced reading studies reported in Spivey-Knowlton et al. (1993). A complete listing of the materials can be found in the appendix to that article. In the

Table 1
Sample Materials From Experiments 1 and 2

Stimulus group: Example text

One-referent context: An actress and the producer's niece were auditioning for a play. The director selected/chose the actress but not the niece.

Two-referent context: Two actresses were auditioning for a play. The director selected/chose one of the actresses but not the other.

Ambiguous reduced relative target sentence: The actress/selected/by the director/believed that/her performance was perfect.

Unambiguous reduced relative target sentence (Exp. 1): The actress/chosen/by the director/believed that/her performance was perfect.

Full (unreduced) relative target sentence (Exp. 2): The actress/who was/selected/by the director/believed that/her performance was perfect.

Note. Critical recording regions are shown in target sentences. Exp. = experiment.


present experiments, we used the same experimental stimuli and design as in Spivey-Knowlton et al. (1993). However, the experimental measure in the present study, eye movements during reading, is often considered to be more sensitive to initial syntactic processing than is self-paced reading (e.g., Rayner, Sereno, Morris, Schmauder, & Clifton, 1989). The target sentences (see Table 1) began with a noun phrase-verb-prepositional phrase sequence. The noun phrase always began with a definite article followed by an animate noun (e.g., The actress). For the ambiguous sentences, the verb form was morphologically ambiguous between a simple past tense and a passive participle (e.g., selected). Unambiguous sentences were constructed either by replacing the ambiguous verb (e.g., selected) with a similar verb for which the past participle form was morphologically unambiguous (e.g., chosen, in Experiment 1) or by using a full relative clause (e.g., who was selected, in Experiment 2). The prepositional phrase always began with the preposition by and introduced an agent (e.g., by the director). The main clause biasing context introduced a single referent for the initial noun phrase in the target sentence (e.g., an actress), whereas the relative clause biasing context introduced two potential referents (e.g., two actresses). Figure 1 illustrates some of the constraints that are most relevant for these materials. Discourse context is one source of constraint, with one-referent contexts biasing the main clause more strongly than the relative clause, and two-referent contexts biasing the relative clause more than the main clause. The relative frequency with which the ambiguous verb occurs as a simple past tense and as a passive participle is another important constraint. Trueswell (1996) has shown that the more frequently the verb is used as a passive participle, the stronger the support for the relative clause structure.
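The verb-form frequency constraint can be made concrete with a little arithmetic. The sketch below is illustrative only: it turns a pair of frequencies into proportions of support for the reduced relative (RR) and the main clause (MC), using the approximate ratios quoted earlier for raced (about 10:1 in favor of the past tense) and buried (about 5:1 in favor of the past participle). The function name `form_support` is hypothetical, and the inputs are the ratios from the text, not raw Francis and Kučera counts.

```python
def form_support(participle_freq, past_tense_freq):
    """Return (support for RR, support for MC) as proportions summing to 1.

    Assumption for illustration: support is simply each form's share of the
    verb's -ed uses. The article does not spell out this exact computation here.
    """
    total = participle_freq + past_tense_freq
    if total <= 0:
        raise ValueError("need at least one observed use of the -ed form")
    return participle_freq / total, past_tense_freq / total


# Ratios quoted in the text (not raw corpus counts).
raced_rr, raced_mc = form_support(participle_freq=1, past_tense_freq=10)
buried_rr, buried_mc = form_support(participle_freq=5, past_tense_freq=1)

print(f"raced:  RR support {raced_rr:.2f}, MC support {raced_mc:.2f}")
print(f"buried: RR support {buried_rr:.2f}, MC support {buried_mc:.2f}")
```

Under this assumption, raced gives the relative clause under a tenth of the lexical support, whereas buried gives it more than four fifths, which is the asymmetry the two example sentences are meant to illustrate.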
We should note that past participle frequency, as measured by Francis and Kučera (1982) counts, is only a rough estimate of the tense and voice bias associated with a verb's -ed form.¹ Nonetheless, it captures a significant portion of the item-specific variance associated with individual verbs in reduced relative constructions (MacDonald et al., 1994; Trueswell, 1996). The prepositional phrase that follows the verb in these materials provides more evidence for the relative clause than the main clause. We used the preposition by because it provides a strong constraint in support of a relative clause: by after an -ed verb is typically used to introduce an agent in a passive construction (Hanna, Barker, & Tanenhaus, 1995;

¹ In general, the participle form of a verb can in fact have any tense and is not always in the passive voice, and frequency counts from Francis and Kučera (1982) collapse across all of these uses. However, for the purposes of distinguishing between a main clause and a reduced relative, in sentences where there is no auxiliary verb preceding the morphologically ambiguous verb, the participle form of that verb is typically in the past tense and it must be in the passive voice. Thus, the coarseness of this metric as an indicator of past participle availability limits the degree to which these simple verb-form frequency counts can be expected to account for item-by-item variance in reading times.


[Figure 1 graphic not reproduced. Recoverable labels: Provisional Interpretation of Syntactic Ambiguity (Reduced Relative, Main Clause); Main Clause Bias; Discourse Information.]

Figure 1. A schematic diagram of the competitive integration model. The syntactic alternatives receive input from four sources of constraint, which then receive feedback from the syntactic alternatives. The sources of constraint are defined in terms of the amount of probabilistic support they provide for the reduced relative and for the main clause. The dashed-line boxes indicate domains of normalization, which produce probabilistic competition. Prob. = probability; RR = reduced relative; MC = main clause.
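The architecture in the caption (normalized constraints feeding two competing alternatives, which then feed back to the constraints) can be sketched in a few lines. What follows is a minimal sketch under stated assumptions, not the authors' implementation: the update rule is one common way of writing a normalized-recurrence competition, the fixed `threshold` stands in for whatever stopping criterion the actual simulations used, and every weight and support value is hypothetical. The names `normalize`, `compete`, `WEIGHTS`, `two_referent`, and `one_referent` are introduced here for illustration.

```python
def normalize(values):
    """Scale non-negative supports so they sum to 1 (probabilistic competition)."""
    total = sum(values)
    return [v / total for v in values]


def compete(constraints, weights, threshold=0.95, max_cycles=200):
    """Run the competition; return (cycles used, [RR activation, MC activation]).

    constraints: name -> [support for RR, support for MC]
    weights:     name -> weight (assumed here to sum to 1)
    """
    support = {name: normalize(pair) for name, pair in constraints.items()}
    activation = [0.0, 0.0]
    for cycle in range(1, max_cycles + 1):
        # Integration: weighted sum of normalized support for each alternative.
        activation = [
            sum(weights[name] * support[name][alt] for name in support)
            for alt in range(2)
        ]
        if max(activation) >= threshold:
            return cycle, activation
        # Feedback: each constraint is pulled toward the currently favored
        # alternative, then renormalized within its own domain.
        support = {
            name: normalize(
                [s + activation[alt] * weights[name] * s for alt, s in enumerate(pair)]
            )
            for name, pair in support.items()
        }
    return max_cycles, activation


# Hypothetical equal weights and support values, chosen only for illustration.
WEIGHTS = {"discourse": 0.25, "verb_form": 0.25, "by_phrase": 0.25, "mc_bias": 0.25}
two_referent = {
    "discourse": [0.8, 0.2],   # two-referent context favors RR
    "verb_form": [0.7, 0.3],   # verb often used as a participle
    "by_phrase": [0.9, 0.1],   # by after an -ed verb favors RR
    "mc_bias": [0.3, 0.7],     # configurational bias favors MC
}
one_referent = dict(two_referent, discourse=[0.2, 0.8])

cycles_two, act_two = compete(two_referent, WEIGHTS)
cycles_one, act_one = compete(one_referent, WEIGHTS)
print("two-referent cycles:", cycles_two)
print("one-referent cycles:", cycles_one)
```

With these made-up values the reduced relative wins in both contexts, but the one-referent case needs more cycles to settle because the discourse constraint opposes the other evidence. In a framework of this kind, settling time is the quantity that would be mapped onto processing difficulty.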

McRae, Spivey-Knowlton, & Tanenhaus, 1998). Also, readers typically do not fixate separately on by but instead process it parafoveally when fixating on the preceding verb (Trueswell et al., 1994). Thus, for the purposes of this study, all of these constraints can be treated as being more or less available when the verb is being processed. Finally, we are also assuming a configurational bias favoring the main clause over the relative clause. A sentence-initial sequence of noun phrase-verb -ed is more typically the beginning of a main clause than the beginning of a reduced relative clause (MacDonald et al., 1994; McRae et al., 1998; Tabossi et al., 1994). For present purposes, we remain agnostic about whether this configurational bias is best characterized at a structural level (e.g., Gibson, Schütze, & Salomon, 1996; Mitchell, Cuetos, Corley, & Brysbaert, 1995; Stevenson, 1994) or whether it emerges from other more local constraints (e.g., Juliano & Tanenhaus, 1995; MacDonald et al., 1994; Tabor, Juliano, & Tanenhaus, 1997; Trueswell, 1996). Treating the clause bias as a unitary constraint allows the model to remain neutral between constraint-based models that include conditional probabilities based on categories along with other constraints (e.g., Jurafsky, 1996) and models in which the configurational bias would be eliminated when it was decomposed into other lexical constraints (MacDonald et al., 1994). In addition, treating the configurational bias as a separate constraint also allows one to simulate a two-stage model by having it precede other constraints (McRae et al., 1998). Note, however, that in the simulations of constraint-based models we are not treating the configurational bias as the sole parsing principle of a first stage in processing that precedes the use of other constraints, as in models like the two-stage garden-path model of Frazier and colleagues (Frazier, 1987). Although the constraints we selected are clearly established in the literature, there are other constraints that we did not directly model. Verb argument structure preferences and the thematic fit of the noun phrase to the agent and patient roles of the verb are perhaps the most relevant of these constraints. The extent to which our general modeling effort is compromised by not including certain item-specific constraints depends on the degree to which the items in the experiment vary along that dimension. The greater the


variability, the more important it is to include the constraint. It also depends on the extent to which the constraint is subsumed by or correlated with other constraints. The most important dimension of argument structure for the materials we used is likely to be transitivity. All of our verbs had strong transitive preferences, as suggested by the null context sentence completions in Spivey-Knowlton et al. (1993). In addition, verb form frequency is correlated with transitivity (Trueswell, 1996). Thus, the verb form frequency constraint captures some of the item-specific variability due to argument structure and the overall transitivity bias should be captured by the main clause bias. However, it seems likely that there is item-specific variability in the materials that is only captured by argument structure. Therefore, the current model will account for less of this variance than it would had argument structure been incorporated as a separate constraint. One clear example of where argument structure and verb form frequency conflict in our stimuli is the verb watched. Watched occurs much more frequently as a simple past than as a past participle (Francis & Kučera, 1982); however, it exhibited a high transitive bias in the verb frame preference norms of Connine, Ferreira, Jones, Clifton, and Frazier (1984). Nonetheless, despite not accounting for some of the variance due to argument structure, the constraints we did select were sufficient to capture a significant amount of item-specific variance. We also did not include thematic fit as a separate constraint. There were several reasons for this. The first is that there was little range in how typical the nouns were as themes for the verb they occurred with. Second, in the stimulus items we used, the typicality of the nouns as agents for specific verbs happened to be correlated with verb form frequency for that verb.
Stepwise regressions correlating item-specific ambiguity effects with thematic fit and verb form frequency showed that frequency was the better predictor, with no significant effects of goodness of agent once frequency was partialled out. Finally, it is not clear to what extent prototypical thematic fit remains a relevant constraint when the noun phrase refers to a previously established discourse entity whose role in the event referred to by the verb is strongly constrained by the discourse, as was the case for the materials used here. We are assuming a constraint-based framework in which ambiguity resolution is accomplished by integrating multiple sources of information. During processing, each constraint provides some degree of support for the main clause or the relative clause structures. These alternatives compete for activation, computed as probabilities. As Figure 1 illustrates, discourse constraints will compete with other constraints. Nonetheless, with the current materials, one would expect to see strong effects of referential context because the configurational bias will be somewhat counterbalanced by parafoveal information that supports the relative clause. Under these conditions, two-referent contexts should sharply reduce or even eliminate the main clause preference that would normally be present beginning at the verb. Referential context should also interact with tense frequency, a prediction that we explored in Experiment 2. For one-referent contexts, however, there should still be an


initial main clause preference which, as the sentence unfolds, shifts to a relative clause preference as additional evidence supports the relative clause interpretation.

Method

Participants. Twenty undergraduate students from the University of Rochester participated for course credit.

Materials. Initial construction of stimulus materials produced 24 sentences, each with a one-referent context and a two-referent context. Sentence completions on these items revealed that 8 of them did not exhibit a strong contextual constraint (cf. Spivey-Knowlton et al., 1993). To avoid testing for immediate influences from contexts that are not strongly constraining, these items were not used in the eye-tracking study. The remaining 16 sentences, with their one- and two-referent contexts, were identical to those used in self-paced reading experiments by Spivey-Knowlton et al. (1993). The eye-tracking experiment had a 2 × 2 factorial design with context (one-referent or two-referent) and ambiguity (ambiguous reduced relative or unambiguous reduced relative) as the independent variables. Four of the 16 target sentences were assigned to each of the four experimental conditions, which were rotated to create four versions of each stimulus. Each participant was exposed to only one of the four stimulus lists, and therefore to only one version of any one experimental item. (Consequently, stimulus list was later entered into the statistical analyses as a between-subjects and a between-items factor, thus making the degrees of freedom in those analyses equal to N - 4.) The 16 experimental stimuli were randomly embedded within 32 filler stimuli, with at least one filler stimulus intervening every 2 experimental stimuli. All of the experimental stimuli and half of the filler stimuli were followed by yes-no questions to verify that participants were reading carefully. In constructing the filler items, care was taken to avoid predictable contingencies that might allow participants to become aware of the experimental manipulation.
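The rotation of items through conditions described above is a standard Latin-square counterbalancing scheme. The sketch below shows one way such lists could be generated; the modulo rule in `build_lists` is an assumption made for illustration (the article states only that conditions were rotated across four lists), and the helper `error_df` simply restates the N - 4 rule given in the text.

```python
from collections import Counter

CONDITIONS = [
    ("one-referent", "ambiguous"),
    ("one-referent", "unambiguous"),
    ("two-referent", "ambiguous"),
    ("two-referent", "unambiguous"),
]


def build_lists(n_items=16):
    """Return one dict per stimulus list, mapping item index -> condition.

    Assumed rotation: item i appears in condition (i + list index) mod 4, so
    each list has 4 items per condition and each item occurs in every
    condition exactly once across the four lists.
    """
    n_cond = len(CONDITIONS)
    return [
        {item: CONDITIONS[(item + list_index) % n_cond] for item in range(n_items)}
        for list_index in range(n_cond)
    ]


def error_df(n_units, n_lists=4):
    """Error degrees of freedom when list is entered as a between-units factor."""
    return n_units - n_lists


lists = build_lists()
print(Counter(lists[0].values()))      # 4 items in each of the 4 conditions
print(error_df(20), error_df(16))      # participants and items analyses
```

With 20 participants and 16 items, the N - 4 rule gives error degrees of freedom of 16 and 12 for the participant and item analyses, respectively.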
For example, we wished to avoid having participants be able to induce that when an identical pair of referents (e.g., Two actresses) was introduced in context, then a relative clause was likely to follow. Of the 32 filler contexts, 12 began with a pair of identical referents (e.g., Two ministers were walking . . . or Two infants were playing . . .), much like the two-referent experimental contexts, but ended with temporarily ambiguous main clause sentences that referred to both entities (e.g., they). Sixteen of them began with a pair of different referents (e.g., a mathematician and a physicist were discussing . . . or a policeman and a sheriff were working . . .), much like the one-referent experimental sentences. Finally, 4 filler contexts described events primarily involving only one participant. Additionally, of the 32 filler stories (with 3-4 sentences each), 4 contained relative clauses (all of which were in contexts that introduced a pair of different referents). All other filler sentences were main clauses without embedded relatives. Previous data with these same stimuli and fillers, from self-paced reading (Spivey-Knowlton et al., 1993), showed significant effects of processing difficulty (in the one-referent context) ranging from 55 ms to 110 ms. Using a conservative typical effect size of 65 ms, and a typical standard deviation of 110 ms, we computed a power analysis for the present experimental design. For subject analyses (n = 20), the design yielded a power of .84, and for item analyses (n = 16), the design yielded a power of .77.

Procedure. Contexts and sentences were presented on a 13-inch color monitor, one line at a time. Participants read the sentences by pressing the mouse button to present each new line of text. A line of text spanned no more than 20 degrees of visual angle,


with each character and space taking up approximately 15 min arc. The critical recording regions of the target sentences were always contained on one line of text. Eye movements were monitored with a Dr. Bouis Oculometer that measured horizontal eye position continuously (the software sampled this analog signal at 1000 Hz) with accuracy to within 20 min arc. Stimulus presentation and the eye movement record were controlled by a Macintosh II computer. The participant's head was held motionless during trials by a dental bite bar. At the beginning of the session, the participant's horizontal eye positions were calibrated to horizontal screen positions, and a practice session of 8 trials was conducted. Participants were instructed to read each short story of three or four sentences as naturally as possible and to answer questions accurately. In both practice and experimental sessions, accuracy of the eye position signal was checked in between each trial and adjusted if necessary. Trials in which the track was inaccurate, or in which the participant's first fixation of the target sentence was not at the beginning of the sentence (11% of all critical trials), were excluded from further analysis. The data were not transformed in any other way. During the experimental session of 48 trials, participants were encouraged to remove themselves from the bite bar for a short break in between trials whenever they needed to for comfort. The entire session took approximately 1 hr.
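The power figures reported in the Materials paragraph (.84 for participants, .77 for items) can be roughly approximated with a simple calculation. The article does not say which test or software was used, so the sketch below rests on an assumption: a one-tailed test at alpha = .05 with a normal approximation, applied to a standardized effect of 65/110. The function name `approx_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_ms, sd_ms, n, alpha=0.05):
    """Approximate power of a one-tailed test using the normal distribution.

    Assumption: the test statistic is roughly normal with mean
    (effect / sd) * sqrt(n) under the alternative hypothesis.
    """
    z = NormalDist()                      # standard normal
    noncentrality = (effect_ms / sd_ms) * sqrt(n)
    critical = z.inv_cdf(1 - alpha)       # one-tailed critical value
    return z.cdf(noncentrality - critical)


print(round(approx_power(65, 110, 20), 2))   # participants analysis
print(round(approx_power(65, 110, 16), 2))   # items analysis
```

Under these assumptions the participant value comes out close to the reported .84, and the item value lands within about .01 of the reported .77. The small gap suggests the original computation differed in some detail, so this should be read as a plausibility check and not as a reconstruction of the authors' method.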

Results

All participants answered the questions with 80% or better accuracy, the cutoff we used for including a participant's data. For eye-movement analysis, the target sentences were segmented into four critical regions: initial noun phrase, verb, by phrase, main verb + one word. The verb region marks the introduction of ambiguity. The by phrase provides strong probabilistic disambiguation but is still formally syntactically ambiguous. The sentence could still turn out to be an intransitive main clause with, for example, a locative by phrase (e.g., The boy stood by the telephone pole). Recall, however, that all of the verbs had strong transitive preferences, making this an unlikely type of continuation. Finally, the main verb region provides complete syntactic disambiguation of the ambiguity as a reduced relative clause.

Reading times. We begin, in Table 2, with a global measure of processing difficulty, total reading time, and we follow this with a more fine-grained analysis, first-pass reading time. Total reading time was measured as the mean

total time that participants spent in each region, including regressive fixations (rereadings). Clearly, the syntactic ambiguity in the ambiguous reduced relative produced substantial processing difficulty at the by phrase in the one-referent context but not in the two-referent context. Analyses of variance (ANOVAs) were computed on total reading times at the verb region and the by phrase region; see top half of Table 3. Results showed a significant increase in total reading time at the verb due to syntactic ambiguity. However, at the by phrase, the interaction between context and ambiguity (Table 3, middle) shows that the difference between ambiguous and unambiguous sentences was reliably modulated by context. Nonetheless, results from total reading times may conflate initial processing effects and "garden-path recovery" effects (Rayner et al., 1989). We therefore examined first-pass reading times, which separate processing effects during the first forward pass through a region from processing effects during regressive eye movements. Table 4 presents the first-pass reading times for each region. Note, in the one-referent context, the substantial difference between ambiguous and unambiguous first-pass reading times at the by phrase (65 ms), compared with the absence of such a difference in the two-referent context (-7 ms). This suggests that readers experienced difficulty in processing the relative clause in the one-referent context but not in the two-referent context. ANOVAs computed for first-pass reading times are shown in the bottom half of Table 3. The only consistently reliable result for first-pass reading times was the interaction between context and ambiguity at the by phrase. This crucial interaction at the by phrase is highlighted in Figure 2. There were no significant effects at the main verb region. Nonetheless, in the two-referent context, reading times were 21 ms longer at the ambiguous verb compared with the unambiguous verb.
There are two possible explanations for this effect. First, it could reflect a rapid revision process as argued for by two-stage models. Secondly, it could reflect the fact that the amount of competition for reduced relatives in two-referent contexts depends on lexically specific factors, with significant competition occurring for those items whose verbs were more biased toward the main clause. This predicts a correlation, in the two-referent context condition, between

Table 2

Mean Total Reading Times (in Milliseconds) Across All Four Regions in Experiment 1

                                              Target sentence
Ambiguity condition          The actress   selected (chosen)   by the director   believed that
One-referent context
  Ambiguous relative             292              331                496               439
  Unambiguous relative           308              320                417               436
  Ambiguous - unambiguous        -16               11                 79                 3
Two-referent context
  Ambiguous relative             317              312                385               431
  Unambiguous relative           285              291                413               424
  Ambiguous - unambiguous         32               21                -28                 7


DISCOURSE AND LEXICAL CONSTRAINTS Table 3

Analyses of Variance for Reading Times From Experiment 1

Reading time analysis                      F        df       MSE        p
Total reading times at verb
  Context
    Participants                         2.383    1, 16     6,573     .142
    Items                                2.020    1, 12     8,406     .176
  Ambiguity
    Participants                        11.031    1, 16     6,688     .004**
    Items                                6.610    1, 12     6,213     .021*
  Context × Ambiguity
    Participants                         0.220    1, 16    11,668     .646
    Items                                0.530    1, 12     6,308     .478
Total reading times at by phrase
  Context
    Participants                         2.487    1, 16    23,715     .134
    Items                                1.575    1, 12    29,284     .229
  Ambiguity
    Participants                         0.027    1, 16    20,800     .871
    Items                                0.033    1, 12    14,961     .858
  Context × Ambiguity
    Participants                         6.201    1, 16     5,876     .024*
    Items                                7.610    1, 12     5,911     .015*
First-pass reading times at verb
  Context
    Participants                         0.598    1, 16     1,745     .451
    Items                                0.219    1, 12     3,039     .646
  Ambiguity
    Participants                         0.805    1, 16     2,007     .383
    Items                                0.237    1, 12     2,347     .634
  Context × Ambiguity
    Participants                         4.608    1, 16     1,116     .047*
    Items                                0.806    1, 12     3,049     .383
First-pass reading times at by phrase
  Context
    Participants                         5.481    1, 16     7,691     .032*
    Items                                2.461    1, 12     9,980     .143
  Ambiguity
    Participants                         2.263    1, 16     7,499     .152
    Items                                0.319    1, 12     3,626     .583
  Context × Ambiguity
    Participants                         5.195    1, 16     5,129     .037*
    Items                                7.862    1, 12     2,997     .016*

*p < .05. **p < .01.

Table 4

Mean First-Pass Reading Times (in Milliseconds) Across All Four Regions in Experiment 1

                                              Target sentence
Ambiguity condition          The actress   selected (chosen)   by the director   believed that
One-referent context
  Ambiguous relative             253              255                400               349
  Unambiguous relative           270              263                335               372
  Ambiguous - unambiguous        -17               -8                 65               -23
Two-referent context
  Ambiguous relative             261              263                318               370
  Unambiguous relative           267              242                325               358
  Ambiguous - unambiguous         -6               21                 -7                12


SPIVEYAND TANENHAUS

processing difficulty and verb form frequency, with the processing difficulty carried by those items with the highest simple past tense frequencies. This prediction was explored in Experiment 2.

Regressive eye movements. In addition to reading times, it may also be informative to analyze the frequency of regressive eye movements to various regions of the sentence. Increased frequency of rereading certain regions in certain conditions may be indicative of increased processing difficulty. As seen in Table 5, the ambiguous sentence in the one-referent context typically showed more regressive eye movements than the other conditions. However, no effects were statistically significant in this analysis.

Our model assumes that by is typically available parafoveally during fixation on the preceding verb. To see whether this was the case, we analyzed the probability of readers actually skipping (not fixating) the word by during the first pass through the sentence. If a reader fixates the verb and then the eyes saccade past by to land somewhere in the subsequent noun phrase, it may be inferred that by was processed parafoveally during viewing of the verb. The overall mean probability of skipping by on the first pass through a target sentence was .68. This is consistent with previous experiments with similar sentences (Trueswell et al., 1994).

Figure 2. Experiment 1. First-pass reading times at the by phrase for ambiguous and unambiguous reduced relative clauses in both referential contexts. (Plot: x-axis, relative clause condition, ambiguous vs. unambiguous; separate lines for the one-referent and two-referent contexts.)

Discussion

The results showed clear effects of discourse context on the resolution of the relative clause ambiguity. In the one-referent context, reading times for the by phrase were significantly longer for ambiguous relative clauses compared with unambiguous controls, reflecting the usual main clause preference. However, when the context provided referential support for the relative clause, reading times were similar for the ambiguous and unambiguous reduced relatives. Although these results clearly provide support for the hypothesis that discourse context can have immediate effects on syntactic ambiguity resolution, as proposed by Crain and Steedman (1985) and Altmann and Steedman (1988), it is important to note that our experiments were not designed to provide a test of the presuppositional hypothesis proposed by these authors. For example, referential contexts like the ones that we, and others, have used may invoke a strong expectation in the reader that the future input will discriminate between the members of the set introduced in the context (Spivey-Knowlton, 1992; Spivey-Knowlton & Sedivy, 1995). Although it is possible to deconfound purely presuppositional manipulations from expectations (cf. Sedivy & Spivey-Knowlton, 1994; Spivey-Knowlton & Sedivy, 1995), we did not attempt to do so in the materials used in these experiments. In addition, for some of the target sentences, a main clause continuation of the ambiguous fragment would have been somewhat implausible in the one-referent context. For example, one of our context items described a dragon killing a knight, and the accompanying target sentence began The knight killed . . . . In such a context, it would be highly implausible for the knight to be the agent of the killing event. Thus, the effects of the referential context might have been augmented by effects of plausibility. Note, however, that these plausibility effects can only operate in conjunction with the context. 
In the absence of the context, a main clause continuation would have been

Table 5

Mean Number of Regressive Eye Movements to a Region: Experiment 1

                                              Target sentence
Ambiguity condition          The actress   selected (chosen)   by the director   believed that
One-referent context
  Ambiguous relative             .21              .38                .43               .50
  Unambiguous relative           .15              .22                .41               .34
Two-referent context
  Ambiguous relative             .18              .26                .32               .33
  Unambiguous relative           .10              .24                .49               .35

completely plausible (e.g., knights often kill). Thus, for plausibility to come into play, the reader must have been using the context to interpret the referential noun phrase and the ambiguous verb.

Experiment 2

This experiment had two goals. The first goal was to replicate the results of Experiment 1. This is important because several previous studies have not found immediate referential effects with the reduced relative ambiguity (Britt et al., 1992; Ferreira & Clifton, 1986; Murray & Liversedge, 1994). The second goal was to examine the hypothesis that referential constraints interact with verb-specific constraints. The constraint that is most likely to have strong effects with our materials is the relative frequency with which the -ed form of the verb is used as a past tense and as a past participle. To correlate context effects with relative frequency, for individual verbs, it was necessary for us to compare the same verb in ambiguous and unambiguous relative clauses, unlike those used in Experiment 1. Thus, we compared reduced relative clauses (e.g., The actress selected by the director believed that her performance was perfect) with full, unreduced relatives (The actress who was selected by the director believed that her performance was perfect). Use of the same verbs also provides a data set that can be used for simulations with an implemented competition model.

Method

Participants. Twenty undergraduate students from the University of Rochester participated for course credit.

Materials. The materials and experimental design were identical to those of Experiment 1, except that the syntactically ambiguous sentence was compared with an unreduced relative clause, instead of a morphologically unambiguous reduced relative clause, and the contexts always used the verb that was in the target sentence (i.e., selected, not chose). Eye movements were recorded in the same manner as in Experiment 1.

Procedure. Data were collected in the same manner as in Experiment 1. Because of inaccurate tracks, or the participant's first fixation of the target sentence not being at the beginning of the sentence, 9% of the critical trials were excluded from analysis.


Results and Discussion

All participants answered the questions with 80% or better accuracy. For eye-movement analysis, the target sentences were segmented into the same four regions as before. In the unreduced relative target sentence, the relative pronoun region, who was, was excluded from analysis because it has no counterpart for comparison in the ambiguous reduced relative target sentence.

Reading times. Table 6 shows the mean total reading time that participants spent in each recording region, including regressive fixations. Participants spent more time reading the reduced relative sentence when it was preceded by the one-referent context than in any other experimental condition. In the two-referent context, participants spent about the same amount of time reading the reduced relative sentence as the unreduced relative sentence. An ANOVA for the initial verb region found a main effect of reduction, with the reduced relative clauses being read more slowly than unreduced relatives; see Table 7. At the by phrase, there were marginal main effects of both context and relative clause reduction; see Table 7. Most important, there was a reliable interaction between context and relative clause reduction; see Table 7. There were no significant results at the main verb region.

First-pass reading times are presented in Table 8. In the ANOVA for the initial verb region itself, there were no significant effects. At the by phrase, there was a reliable interaction between context and reduction but no main effects; see Table 7. The interaction at the by phrase is highlighted in Figure 3. Note the similarity between Figure 2 (from Experiment 1) and Figure 3 (from Experiment 2). Finally, there were no significant effects at the main verb region.

Regressive eye movements. Table 9 shows that the ambiguous reduced relative in the one-referent context exhibited the highest frequency of regressive eye movements. An ANOVA revealed only marginal results. There was a suggestive main effect of context at the by phrase, F1(1, 16) = 4.20, MSE = .1529, p = .057; F2(1, 12) = 4.38, MSE = .1257, p = .054. Additionally, there was an interaction between context and reduction at the main verb

Table 6

Mean Total Reading Times (in Milliseconds) Across All Four Regions in Experiment 2

                                              Target sentence
Ambiguity condition          The actress      selected      by the director   believed that
One-referent context
  Reduced relative               345             401              555               582
  Unreduced (full) relative      342             317              469               513
  Reduced - unreduced              3              84               86                69
Two-referent context
  Reduced relative               330             343              457               509
  Unreduced (full) relative      322             317              456               484
  Reduced - unreduced              8              26                1                25

Table 7

Analyses of Variance for Reading Times From Experiment 2

Reading time analysis                      F        df       MSE        p
Total reading times at verb
  Context
    Participants                         0.867    1, 16    13,578     .366
    Items                                1.385    1, 12     4,212     .258
  Reduction
    Participants                         9.23     1, 16    15,662     .008**
    Items                               16.477    1, 12     7,489     .001**
  Context × Reduction
    Participants                         0.726    1, 16     8,645     .407
    Items                                0.764    1, 12     6,418     .396
Total reading times at by phrase
  Context
    Participants                         9.408    1, 16     8,638     .007**
    Items                                3.688    1, 12    18,003     .074
  Reduction
    Participants                         3.162    1, 16    16,932     .060
    Items                                3.688    1, 12    18,003     .074
  Context × Reduction
    Participants                         5.473    1, 16     4,225     .033*
    Items                                4.347    1, 12    11,245     .050*
First-pass reading times at verb
  Context
    Participants                         0.064    1, 16     8,480     .804
    Items                                0.516    1, 12     5,715     .484
  Reduction
    Participants                         2.209    1, 16    13,449     .157
    Items                                1.561    1, 12     8,529     .231
  Context × Reduction
    Participants                         3.219    1, 16     7,560     .092
    Items                                0.220    1, 12     4,593     .884
First-pass reading times at by phrase
  Context
    Participants                         1.330    1, 16    14,282     .266
    Items                                2.687    1, 12     6,296     .127
  Reduction
    Participants                         1.838    1, 16     8,459     .194
    Items                                1.333    1, 12    11,748     .271
  Context × Reduction
    Participants                         9.017    1, 16     4,023     .008**
    Items                                5.214    1, 12     7,462     .041*

*p < .05. **p < .01.

Table 8

Mean First-Pass Reading Times (in Milliseconds) Across All Four Regions in Experiment 2

                                              Target sentence
Ambiguity condition          The actress      selected      by the director   believed that
One-referent context
  Reduced relative               293             329              477               451
  Unreduced (full) relative      299             256              406               490
  Reduced - unreduced             -6              73               71               -39
Two-referent context
  Reduced relative               310             289              403               419
  Unreduced (full) relative      307             286              418               419
  Reduced - unreduced              3               3              -15                 0

that was reliable only by items, F1(1, 16) = 1.99, MSE = .2862, p > .1; F2(1, 12) = 7.32, MSE = .1117, p < .02.

As in Experiment 1, we again analyzed the probability of readers skipping (not fixating) the word by during the first pass through the sentence. The overall mean probability of skipping by on the first pass through the sentence was .70. This is consistent with the overall mean observed in Experiment 1 and with previous work (Trueswell et al., 1994).

In summary, the critical interaction between context and relative clause reduction for first-pass reading times at the by phrase indicated an immediate influence of discourse context in the resolution of syntactic ambiguity, as we also saw in Experiment 1. When the discourse context supported the main clause interpretation, substantial processing difficulty was observed for the reduced relative clause as compared with the full (unreduced, unambiguous) relative clause. However, when the discourse context supported the relative clause interpretation by introducing two potential referents for the initial noun phrase, little or no processing difficulty was observed. The data pattern is similar to that obtained in Experiment 1 with the exception that reduction effects were observed earlier for the one-referent contexts.

Figure 3. First-pass reading times at the by phrase for reduced and unreduced (full) relative clauses in both referential contexts. Note the similarity between this interaction and that in Experiment 1 (see Figure 2). (Plot: x-axis, relative clause condition, reduced vs. unreduced; separate lines for the one-referent and two-referent contexts.)

Interactions Between Frequency and Context

To evaluate the hypothesis that referential context interacts with other relevant sources of constraint, we tested for a correlation between referential context effects and the relative frequency with which each verb is used as a past participle. As Figure 4 illustrates, the higher the past participle frequency, the more support for the relative clause at the verb. For the two-referent contexts, in which readers are biased by the discourse toward a relative clause reading, reading times for verbs with higher past participle frequencies should be faster compared with reading times for verbs with lower past participle frequencies. The logic behind this prediction is based on a competition assumption. If both the local context (verb frequency information) and the global context (referential discourse information) support the relative clause alternative, there will be less competition between the relative clause and main clause alternatives because the relative clause starts out with the vast majority of the probabilistic activation. In contrast, if the local context supports the main clause alternative (because of low frequency of the past participle form of the verb) while the global context still supports the relative clause alternative, both syntactic alternatives will be quite active and the competition will lead to substantial processing difficulty (cf. Spivey-Knowlton, 1994).

To estimate verb form frequency, we used the metric employed by Trueswell (1996). Word frequency effects, in general, follow a logarithmic function (e.g., Solomon & Postman, 1952). Therefore, we used the log of the frequency of the verb appearing as a past participle in raw tokens per million (taken from Francis & Kučera, 1982). This number must then be normalized by the overall frequency of the verb appearing in any form. The resulting metric is logVBN/logBASE, where VBN is the frequency of the past participle

Table 9

Mean Number of Regressive Eye Movements to a Region: Experiment 2

                                              Target sentence
Ambiguity condition          The actress      selected      by the director   believed that
One-referent context
  Reduced relative               .18             .35              .38               .50
  Unreduced (full) relative      .15             .27              .35               .14
Two-referent context
  Reduced relative               .13             .22              .22               .39
  Unreduced (full) relative      .07             .11              .16               .37

Figure 4. Past participle availability (logVBN/logBASE) predicts the magnitude of initial processing difficulty (Reduced - Unreduced, in milliseconds) at the verb, in the two-referent context. Each circle is a stimulus item. See text for results when the outlier is excluded. (Plot: x-axis, past participle availability, .45 to .9; y-axis, processing difficulty in milliseconds, -150 to 350.)

for that verb and BASE is the overall frequency of that verb (see Footnote 2). This value provides a rough estimate of the availability of the past participle form for a given verb. For example, the past participle availability of the verb selected is .89, and the past participle availability of watched is .48.

For each stimulus item in the two-referent context condition of Experiment 2, past participle availability was entered into a regression analysis to predict the magnitude of processing difficulty in first-pass reading times of the verb. Processing difficulty was quantified as the difference in first-pass reading time for the verb when it appeared in the reduced relative clause versus the full relative clause. Using the same verb in the full relative clause as a baseline, we were able to factor out item-specific differences that are unrelated to ambiguity. (However, it should be noted that the full relative clause is probably not a perfect zero baseline. In a strongly relative-clause-supporting context, it may be slightly infelicitous to use a full relative clause, and this may be responsible for some of the stimulus items having negative magnitudes of processing difficulty in the two-referent context.)

As predicted by a competition-based account, there was a strong negative correlation between past participle availability and processing difficulty at the verb in the two-referent context, r² = .66, p < .001 (see Figure 4). The correlation is negative because there is less competition when both the context and the lexical frequency information provide converging evidence for the relative clause.

However, as can be seen in Figure 4, the distribution of stimulus items contains one very conspicuous outlier: the verb watched. The past participle availability for watched is 2.9 standard deviations below the mean for the set of 16 verbs. Although this item performed as predicted (i.e., its processing difficulty score was 3.0 standard deviations above the mean), an outlier such as this violates the normality assumption in a regression analysis (see also the discussion by Murray and Liversedge, 1994).

In addition, the measure of past participle availability used in the regression reported thus far assumes that frequency information is equally weighted for all verbs. This assumption is likely to be incorrect for two reasons. Most important, frequency estimates are likely to be less accurate when the corpus contains few exemplars. Second, the processing system might weight frequency information more strongly for more frequently occurring verbs. For example, a verb that appears twice per million words, and is used both times as a past participle, may not bias the reader toward a reduced relative clause as strongly as a verb that appears 100 times per million and always as a past participle. However, the logVBN/logBASE metric would assign both verbs the same score. When watched and the three verbs with a base frequency of less than 20 per million were removed from the regression, past participle availability still predicted processing difficulty at the verb, r² = .34, p < .05 (see Footnote 3).

Regression analyses predicting item-by-item processing difficulty were also computed for stimuli in the one-referent contexts. In contrast to the two-referent context condition, this analysis did not reveal any significant correlations with past participle availability, r² = .03, p > .5, nor when the four outlier items were excluded, r² = .10, p > .3.

The results of these regression analyses, in the two-referent context, indicate that reading times for the individual verb that introduced the syntactic ambiguity were modulated by the degree to which its relevant lexical frequencies supported a relative clause reading. This result provides support for MacDonald et al.'s (1994) claim that context effects are modulated by lexical frequencies, and it complements Trueswell's (1996) finding that verb form frequency modulates within-sentence thematic effects. It also supports the more general claim that multiple sources of constraint contribute to the initial syntactic ambiguity resolution process.

A Competitive Integration Model

Modeling Reading Times for Individual Items

Thus far we have been assuming that there is a systematic relationship between processing difficulty, as measured by reading times, and the strength of evidence for competing syntactic alternatives. However, we have not formalized our assumptions. This is problematic because in the absence of explicit assumptions, constraint-based models are difficult to evaluate. Moreover, it is hard to compare predictions made by competing models. In this section, we describe a simple

2 The exact metric used by Trueswell (1996) was the natural log (ln) of the frequencies, but the metric is a ratio (normalized by the overall frequency); therefore, logVBN/logBASE = lnVBN/lnBASE.

3 With only watched removed from the analysis, the regression has a comparable slope to when it is included, but the correlation is only marginally reliable, r² = .24, p = .06.

implementation of the framework presented in Figure 1 and test it against the stimulus items from Experiment 2.

The computational model that we implemented is intended as a generic realization of a model in which (a) multiple constraints are combined to compute alternative interpretations in parallel and (b) the alternatives compete with one another during processing until one achieves criterion activation. Thus, our framework is broadly compatible with the general architectural assumptions made by a number of recently implemented models of syntactic ambiguity resolution (e.g., Burgess & Lund, 1994; Jurafsky, 1996; McClelland et al., 1989; Pearlmutter, Daugherty, MacDonald, & Seidenberg, 1994; Stevenson, 1994; Tabor et al., 1997; Trueswell, Kim, Lund, & Burgess, 1995). An important difference between this model and those of Burgess and Lund (1994), McClelland et al. (1989), Pearlmutter et al. (1994), Tabor et al. (1997), and Trueswell et al. (1995) is that instead of using distributed representations of the syntactic alternatives, it idealizes the competing syntactic representations into pairs of localist nodes. Although localist representations do not share some of the advantages of distributed representations (e.g., graceful degradation and a certain degree of neurophysiological plausibility), they allow a much simpler look into the model to observe and interpret its state at a particular time. On the other extreme, an important difference between this model and those of Jurafsky (1996) and Stevenson (1994) is that it produces a continuously graded degree of syntactic preference that varies from item to item (and therefore a graded degree of processing difficulty that varies from item to item), whereas the other models either produce a discrete syntactic preference and therefore a strict "garden path" or "no garden path," or group large classes of lexical items and treat them identically in determining syntactic preference.
It is important to note that none of the models just mentioned have attempted to make explicit item-by-item predictions of the degree of processing difficulty caused by syntactic ambiguity (or alleviated by discourse context), though these models could, of course, be modified to make item-specific predictions.

It is important to keep in mind that we have implemented a model of constraint integration during ambiguity resolution and not a model of how the syntactic alternatives are generated. Thus, the model cannot account for sources of variance in processing time that are due to generation of syntactic alternatives. This is clearly an important limitation of the model. However, ambiguity resolution is in itself a central component of language comprehension. Moreover, it has served as the central testing ground for evaluating conflicting claims about parsing theories, typically under conditions where a temporarily ambiguous structure is compared with an unambiguous baseline, precisely the domain in which the model is most appropriate.

Implementation. In order to develop an implementation, we needed to adopt an explicit competition algorithm and estimate the parameters for the inputs and weights for the constraints pictured in Figure 1. The algorithm we used, normalized recurrence, implements competition between syntactic alternatives using recurrent feedback and normalization (Spivey-Knowlton, 1996; see Footnote 4). This competition algorithm is similar in spirit to the approach developed by Stevenson (1994). First, each pair of constraint nodes was normalized to a sum of 1.0 for main clause (MC) and reduced relative (RR):

$$S_{c,a}(\mathrm{norm}) = \frac{S_{c,a}}{\sum_{a} S_{c,a}}. \qquad (1)$$

$S_{c,a}$ represents the activation of each constraint node (i.e., the cth constraint that is connected to the ath interpretation node). $S_{c,a}(\mathrm{norm})$ is the same variable but normalized within each constraint. Constraints were then integrated at each of the interpretation nodes by means of a weighted sum based on Equation 2.

$$I_{a} = \sum_{c} \left[ w_{c} \times S_{c,a}(\mathrm{norm}) \right]. \qquad (2)$$

The activation of the ath interpretation node is represented by $I_{a}$. The weight on the connection linking the cth constraint node to interpretation node $I_{a}$ is represented by $w_{c}$. Equation 2 was applied to each interpretation node and was summed across all constraint nodes that fed into it. Finally, Equation 3 determined how the interpretation nodes sent positive feedback to the constraints commensurate with how responsible the constraints were for that interpretation node's activation. Note that the weights were equal in both directions.

$$S_{c,a} = S_{c,a}(\mathrm{norm}) + I_{a} \times w_{c} \times S_{c,a}(\mathrm{norm}). \qquad (3)$$
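One cycle of Equations 1-3 can be sketched in a few lines of Python. This is only an illustrative reading of the three equations, not the authors' implementation: the function names, the tuple layout (RR first, MC second), and the two-constraint input values are assumptions chosen for the example.

```python
def normalize(pair):
    """Equation 1: scale one (RR, MC) constraint pair so it sums to 1.0."""
    total = sum(pair)
    return tuple(s / total for s in pair)


def competition_cycle(constraints, weights):
    """Run one cycle of normalized recurrence (Equations 1-3).

    constraints: list of (RR, MC) activations, one pair per constraint.
    weights: one weight per constraint (used in both directions).
    Returns the interpretation activations and the updated constraints.
    """
    # Equation 1: normalize within each constraint.
    normed = [normalize(pair) for pair in constraints]
    # Equation 2: weighted sum of normalized support for each interpretation.
    interp = tuple(
        sum(w * pair[a] for w, pair in zip(weights, normed))
        for a in range(2)
    )
    # Equation 3: feedback from interpretation nodes to constraint nodes.
    updated = [
        tuple(pair[a] + interp[a] * w * pair[a] for a in range(2))
        for w, pair in zip(weights, normed)
    ]
    return interp, updated


# Hypothetical inputs: one equi-biased constraint, one favoring the RR.
example_constraints = [(0.5, 0.5), (0.8, 0.2)]
example_weights = [0.5, 0.5]

first_interp, after_first = competition_cycle(example_constraints, example_weights)
second_interp, _ = competition_cycle(after_first, example_weights)
print(first_interp, second_interp)
```

Because the weights sum to 1.0 and every constraint pair is normalized first, the two interpretation activations also sum to 1.0 on every cycle, and the feedback step pushes the equi-biased constraint toward whichever interpretation is currently ahead.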

These three steps (Equations 1-3) were computed in sequence within each cycle of competition. Thus, as cycles of competition take place, the difference between the two interpretation nodes gradually increases. This integration of constraints converts disparate formats of representation into the common medium of probabilistic support for mutually exclusive interpretations and allows biases from these information sources to simultaneously contribute to resolution of the RR-MC ambiguity. Moreover, the normalized recurrence competition algorithm allows these information sources to indirectly resolve each other's ambiguities (not unlike the interactive activation model; cf. McClelland & Rumelhart, 1981, and Rumelhart & McClelland, 1982). For example, in a two-referent context with an equi-biased verb, the RR will be more active than the MC, because the context and the parafoveal by strongly support the RR. Through normalized recurrence, these biases will drive the system toward fully supporting the RR and thus gradually resolve the verb's own tense ambiguity as well.

4 Recent work in computational neuroscience has emphasized the importance of recurrent feedback (e.g., Douglas, Koch, Mahowald, Martin, & Suarez, 1995) and normalization (e.g., Carandini & Heeger, 1994) for input recognition.

Four sources of constraint were used: verb tense (derived from Francis & Kučera, 1982), a probabilistic main clause bias (derived from corpus analyses, e.g., Tabossi et al.,


1994), discourse information (derived from sentence completions in the different discourse contexts), and parafoveal information (derived from the presence of by after the verb, supporting the relative clause). Each constraint was condensed into its probabilistic support for either the RR or the MC interpretations. For example, the verb frequency information was condensed into RR-MC probabilities by dividing the individual log-transformed values by their sum according to the following equations:

P(RR) = (logVBN/logBASE) / [(logVBN/logBASE) + (logVBD/logBASE)],

P(MC) = (logVBD/logBASE) / [(logVBN/logBASE) + (logVBD/logBASE)],

where VBN is the verb's frequency as a past participle, VBD is the verb's frequency as a simple past, and BASE is the verb's overall frequency.

Ideally, biases for each of the constraints should be independently established for each of the stimulus items and the weights on the constraints should be independently motivated. In subsequent work that builds on the present work, we have used ratings and corpus analyses to establish biases for each of the constraints. Weights on constraints were set to simulate data from offline fragment completions. These weights were then used for the simulations of the online reading-time data (Hanna, Spivey-Knowlton, & Tanenhaus, 1996; McRae et al., 1998). We did not, however, have appropriate norms or completions to follow all of these procedures here. To minimize a priori assumptions, the present model used equal weights (.25) for all four constraints. The rationale for each of the biases is described in detail below.

Estimating biases. Following Trueswell (1996), we used the measure of past participle availability from the regression analyses described earlier for the biases of the verb tense constraint. The biases for the parafoveal constraint were set using a corpus analysis conducted by McRae et al. (1998) in which the Wall Street Journal and Brown corpora (Marcus, Santorini, & Marcinkiewicz, 1993) were searched for a set of 40 verbs to calculate an independent estimate of the degree to which a by phrase subsequent to a verb-ed biases a reader toward a reduced relative. We found 124 sentences in which by directly followed the -ed form of the verb, of which 99 were by phrases introducing agents in passive constructions. Thus, the by bias was set to .2 (25/124) for the main clause and .8 (99/124) for the reduced relative clause. The main clause bias was derived from a corpus analysis reported in Tabossi et al. (1994). For the verbs used in that study, 92% of sentence-initial noun phrase (NP) verb-ed sequences continued as a main clause, whereas 8% continued as a reduced relative, giving us biases of .92 and .08. Thus, the main clause and the parafoveal by biases used in the current simulations were the same as those used in McRae et al. (1998), which also used sentence-initial reduced relative clauses with animate noun phrases and by phrases.
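The verb tense and by biases described above can be worked through in a short Python sketch. The function names and the frequency counts passed in are hypothetical (the article does not list per-verb counts here); only the formulas and the 99-of-124 corpus count come from the text.

```python
import math


def past_participle_availability(vbn, base):
    """logVBN / logBASE, the availability metric used in the regressions."""
    return math.log(vbn) / math.log(base)


def tense_bias(vbn, vbd, base):
    """Return (P(RR), P(MC)) for the verb tense constraint.

    vbn, vbd, base: frequencies per million (all assumed to be > 1 so
    that every log is positive).
    """
    rr_support = math.log(vbn) / math.log(base)
    mc_support = math.log(vbd) / math.log(base)
    total = rr_support + mc_support
    return rr_support / total, mc_support / total


# Hypothetical counts per million for an illustrative verb.
p_rr, p_mc = tense_bias(vbn=40, vbd=10, base=60)

# Parafoveal by bias from the corpus count reported in the text:
# 99 of 124 by phrases introduced agents in passive constructions.
by_rr = 99 / 124
by_mc = 25 / 124

print(round(p_rr, 3), round(p_mc, 3), round(by_rr, 1), round(by_mc, 1))
```

Two properties follow from the formulas themselves. First, because the metric is a ratio of logs, the choice of log base does not matter (the point made in Footnote 2). Second, BASE cancels in P(RR) and P(MC), so the tense bias depends only on logVBN and logVBD, whereas the availability metric used in the regressions does depend on BASE.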

We did not have independent estimates of the individual strength of each discourse context. Accordingly, we assumed that each one-referent context had the same bias and each two-referent context had the same bias. The actual values of the discourse bias were set so that averaging the discourse bias with the main clause bias would approximate the mean proportion of relative clause completions obtained in the completion data when participants completed the NP verb fragment in a biased discourse context. As reported in Spivey-Knowlton et al. (1993), these values were 25% for the one-referent context and 43% for the two-referent context. This procedure assumes that the same biases and weights used in fragment completions are appropriate for online simulations, an assumption that is supported by recent work by McRae et al. (1998) and Hanna et al. (1996). We did not use the completions to set different biases for each context because these completions do not represent an independent estimate of the strength of the context: they take into account the context in conjunction with the NP and the verb. For the two-referent context, the resulting biases were .22 for the main clause and .78 for the reduced relative clause. For the one-referent context, the biases were .58 for the main clause and .42 for the reduced relative clause. Note that it might seem odd that the one-referent context should be as evenly biased between the main clause and reduced relative as it is. However, in several fragment completion studies we have found that simply including contexts with multiple different referents increases the proportion of reduced relative completions compared with completions without contexts (cf. Trueswell & Tanenhaus, 1991). 5 As shown in Panel A of Figure 5, the model takes the weighted sums of these inputs: one computed as the support for the RR, one computed as the support for the MC.

5 In previous versions of the model, conducted before the McRae et al. (1998) work was completed, we used slightly different weights and biases. The weights in these simulations were .22 (2/9) each for the discourse context, by, and the main clause bias, and .33 (3/9) for the tense bias. The main clause bias was .85/.15, the by bias was .15/.85, and the one-referent and two-referent contexts were .67/.33 and .33/.67, respectively. The context bias values were chosen to approximate the completions given a .85 main clause bias. Simulations with these weights are presented in Tanenhaus et al. (in press). We adopted the weights and biases used in the final version of this article because they involved making simpler a priori assumptions than the earlier weights. It is important to acknowledge that we did not use an algorithm that systematically searched the possible parameter space of the model as in McRae et al. (1998). Thus, we cannot prove that there is not another set of very different weights that would also fit the data at the verb. However, our experience with these simulations and others with similar data sets suggests that this is quite unlikely. In McRae et al., we used a procedure for systematically searching the parameter space to assign weights. The weights we adopted here are roughly consistent with those used by McRae et al. using this procedure. Note that we did not use the same weight fitting procedure here because we did not have appropriate offline data (gated fragment completions) to use in setting weights.
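The way the discourse biases follow from the completion data can be checked with a short calculation. The sketch below assumes that "averaging" means a simple arithmetic mean of the discourse RR bias and the main clause RR bias (.08); the function name `discourse_rr_bias` is hypothetical.

```python
def discourse_rr_bias(target_rr_proportion, main_clause_rr_bias=0.08):
    """Solve (discourse + main_clause) / 2 = target for the discourse RR bias."""
    return 2.0 * target_rr_proportion - main_clause_rr_bias


# Completion proportions reported in the text: 25% (one-referent), 43% (two-referent).
one_referent_rr = discourse_rr_bias(0.25)
two_referent_rr = discourse_rr_bias(0.43)
```

Under that assumption, the two calls recover the RR biases of .42 and .78 stated in the text, with the MC biases being the complements (.58 and .22).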


themselves and multiply with the individual weighted input values to send positive feedback, thus rewarding each input node proportional to both its separate contribution and the resultant RR or MC activation. At each time step, the input nodes accumulate activation from the feedback connections, and the activations of each pair of input nodes are renormalized to 1.0. Instead of having the activations of the MC and RR nodes accumulate iteratively, we simply recompute them as the weighted sums of their inputs at the new time step. 6 For example, a stimulus item in a two-referent context, using the verb presented, would start off with the values in Figure 5A, where the RR and MC values are the weighted sums of their respective inputs. Those weighted input values (e.g., % × .60 = .20) are then multiplied by the MC and RR activations (e.g., .20 × .57) to send feedback that is added to the current values of the original inputs (Figure 5B). Those new input values are then renormalized to 1.0, and their means are computed again at the RR and MC nodes (Figure 5C). This process (compute-mean, compute-feedback, renormalize) repeats until one of the provisional interpretations reaches a criterion activation.

Dynamic criterion. The criterion used in the model is a function of how long the alternatives have been competing. Following McRae et al. (1998), this dynamic criterion was 1 - xt, where x is a constant and t is the time step of competition. As competition lasts longer, the criterion for stopping competition gets more lenient, with the maximum competition duration being (1 - 1/n)/x time steps, where n is the number of competing alternative interpretations. (1/n would be the probabilistic activations of the n alternatives if all inputs were perfectly equi-biased [without a dynamic criterion, in such a situation, the model would compete for eternity]. 1 - 1/n is the amount of the probability space that the dynamic criterion must traverse in order to reach those equi-biased alternatives, and dividing that by x gives the number of steps [of size x] that the dynamic criterion would take to get there.) A dynamic criterion is particularly necessary for modeling eyetracking data across multiple regions of the sentence because fixation durations are partially determined by a preset "timing program" (e.g., Rayner & Pollatsek, 1989; Vaughan, 1983). Essentially, the reader will spend only so long in a given region of a sentence before making the next saccade. Explorations of the parameter space of the model applied to several studies suggest that a constant of about .01 for the dynamic criterion

[Figure 5 graphic, Panels A-C: the verb tense, parafoveal information, main clause bias, and discourse information inputs connect to the provisional interpretation of syntactic ambiguity (RR, MC).]

Figure 5. Schematic diagram of normalized recurrence in the competitive integration model. In Panel A, activation of provisional interpretations is the weighted sum of the four constraints. In Panel B, those same input values are multiplied by the activation of the provisional interpretation for feedback to the input constraints. Those constraints accumulate activation and then renormalize to one before integrating again to compute the new provisional interpretation activations (Panel C). RR = reduced relative; MC = main clause.

Because the weights sum to 1.0 and the individual inputs are normalized to one, the respective mean values in the RR node and MC node will always sum to 1.0. For the recurrence that produces competition, the activations of these provisional interpretation nodes then act like weights

6 We do this because we do not see the provisional interpretation nodes as separate representations, but rather as abstractions over the respective patterns of activation across the lower level nodes. In this way, the abstraction over the pattern of activation produces a positive feedback effect (self-reinforcing as well as cross-reinforcing) that allows the global representation to, over time, become "more than the sum of its parts." Tabor and Tanenhaus (1998) show that a dynamical systems parser will develop attractors that correspond to competing syntactic analyses, generating similar competition patterns to those obtained with a competitive integration model with localist interpretation nodes. Tabor and Tanenhaus modeled the McRae et al. (1998) results applying the framework developed in Tabor et al. (1997).


provides good fits to first-pass eye-movement data (Tanenhaus, Spivey-Knowlton, & Hanna, in press). As the model iterates toward the dynamic criterion, the syntactically available interpretations compete for probability resources. In this way, it is similar to Stevenson's (1994) competition algorithm. The number of iterations necessary to achieve the criterion activation indicates how long the syntactic alternatives are actively competing and thus how long readers spend processing the ambiguous verb in the reduced relative clause, compared with the full relative clause.

Simulation. We tested the model by determining whether it could predict the item-specific reduction effects for the 16 stimuli in the two-referent context condition. The regression analyses reported earlier established that there was item-specific variance that was correlated with tense, the one constraint for which the model had different biases for different items. The durations of competition produced by this algorithm predicted the processing difficulty in first-pass reading times at the verb (reduced relative minus full relative) for each item in the two-referent contexts at least as well as the frequency metric used in Figure 4; r² = .37, p < .02 (see Figure 6). This result arises out of the fact that the normalized recurrence competition algorithm, which uses the verb frequency information (as well as the other constraints) in an inherently nonlinear fashion, produced near equal initial activation of the RR and MC nodes for stimulus items that showed considerable processing difficulty at the verb, especially for the verb watched, the clear outlier in Figure 6. As with the verb frequency regression analyses described earlier, the verb watched is an outlier that violates the normality assumption, and the stimulus items with low frequency verbs have rather unreliable estimates of the stimulus parameters. When watched and the verbs with

[Figure 6 plot: processing difficulty (vertical axis, ticks from -150 to 350) against cycles of competition (horizontal axis, ticks from 17.5 to 40); one circle per stimulus item.]

Figure 6. The competitive integration model's predictions of processing difficulty at the verb for the 16 stimulus items from Experiment 2. Each circle is a stimulus item. Processing difficulty is computed as first-pass reading time at the verb in the reduced relative minus that in the full relative.

fewer than 20 occurrences per million were excluded from the analysis, as was done previously, the prediction was still reliable; r² = .38, p < .05. Thus, without manipulating any free parameters from item to item, only changing the input values as indicated by the lexical frequencies, the model is able to approximate the item-by-item variation in the first-pass reading time differences at the verb in the two-referent context for Experiment 2. 7
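The compute-mean, compute-feedback, renormalize cycle and the dynamic criterion 1 - xt described in this section can be sketched as follows. This is a minimal reading of the verbal description, not the authors' implementation: the function names are hypothetical, the step counter is assumed to start at 1, and the .60/.40 tense bias is taken from the worked example for the verb presented. The other three biases are the two-referent values given in the text.

```python
def integrate(inputs, weights):
    """Weighted sums of the constraint biases: returns (RR, MC) activations."""
    rr = sum(w * pair[0] for pair, w in zip(inputs, weights))
    mc = sum(w * pair[1] for pair, w in zip(inputs, weights))
    return rr, mc


def compete(inputs, weights, x=0.01, max_steps=1000):
    """Run normalized recurrence; return (cycles, RR, MC) once the criterion is met."""
    inputs = [list(pair) for pair in inputs]  # work on a copy
    for t in range(1, max_steps + 1):
        rr, mc = integrate(inputs, weights)
        # Dynamic criterion: the threshold 1 - x*t relaxes as competition continues.
        if max(rr, mc) >= 1.0 - x * t:
            return t, rr, mc
        for pair, w in zip(inputs, weights):
            # Feedback: each weighted input is multiplied by its node's activation
            # and added to the input's current value.
            pair[0] += rr * w * pair[0]
            pair[1] += mc * w * pair[1]
            # Renormalize the pair so that it sums to 1.0 again.
            total = pair[0] + pair[1]
            pair[0] /= total
            pair[1] /= total
    raise RuntimeError("criterion not reached within max_steps")


weights = [0.25, 0.25, 0.25, 0.25]
# (RR, MC) pairs: tense (assumed .60/.40), by, main clause bias, two-referent discourse.
two_referent = [(0.60, 0.40), (0.80, 0.20), (0.08, 0.92), (0.78, 0.22)]

rr0, mc0 = integrate(two_referent, weights)
cycles, rr, mc = compete(two_referent, weights)

# Perfectly equi-biased inputs never move, so only the relaxing criterion ends competition.
equi_cycles, _, _ = compete([(0.5, 0.5)] * 4, weights)
```

With these inputs the initial RR activation is .565, which is in line with the .57 used in the worked example. The equi-biased run illustrates the upper bound on competition duration, (1 - 1/n)/x, which is 50 steps for n = 2 and x = .01.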

Modeling Reading Times Across Multiple Regions

Thus far, we have restricted ourselves to modeling effects at a single region, the first verb, taking advantage of the fact that all of the relevant constraints in Experiment 2 can be treated as being simultaneously available when the ambiguity is introduced. We now extend the model to simulate results across successive regions in the sentence. This extension also allows us to apply the model to comparable experiments in the literature. For this purpose, we restrict ourselves to those experiments to which the model directly applies, namely experiments that manipulated referential contexts for reduced relative clauses compared with an unambiguous baseline. 8 This allows a preliminary test of the claim that the superficially conflicting results in the literature can be unified within a constraint-based framework. We apply the model to three different data sets. First, we show that the model provides an approximate fit to the results of Experiment 2 across all three critical regions: verb, by phrase, and main verb + one word. The model exhibits the early interaction between context and relative clause reduction, followed by no processing difficulty at the main verb region. Second, the model approximates the results of a

7 Initially, it might seem as though the appropriate next step would be to simulate the item-by-item data from Experiment 1. However, in Experiment 1, the unambiguous baseline condition involved different verbs (e.g., chosen, slain) than those in the ambiguous condition (e.g., selected, killed). Although condition means that collapse across the individual items might smooth over the noise introduced by comparing different verbs (as is done in Figure 7), an item-by-item analysis of the sort done for Experiment 2 would be essentially uninformative in the case of Experiment 1 because of word length differences and base frequency differences. What might be done instead is an item-by-item analysis of the context effect in Experiment 1 regardless of the baseline condition. For example, a difference can be computed between the mean reading time for an ambiguous verb in its one-referent context and the mean reading time for that same verb in its two-referent context. The model can then predict those item-by-item differences by taking the differences between its competition durations for each ambiguous verb in its one-referent and two-referent contexts. Across the 16 stimulus items, this model's prediction is statistically significant, r² = .25, p < .05. When watched and the three low frequency verbs are excluded from the regression analysis (leaving only 12 data points), the effect is marginal, r² = .31, p = .06.

8 Other studies of reduced relatives in referential contexts (Britt et al., 1992; Ferreira & Clifton, 1986) can also, in principle, be accommodated by the constraint-based framework. However, these studies did not use an unambiguous baseline sentence to test their effects. Therefore, a clean measure of processing difficulty cannot be computed.


incrementally approximates the competition for each fixation in the sentence. As readers made an average of about 1.75 fixations in the by phrase region (see also Trueswell et al., 1994), competition durations in this region were multiplied by 1.75. Once the dynamic criterion was reached for competition in this region, mean competition times were computed for the one-referent context items and for the two-referent context items. For the main verb region, a new input was again added: a bias of 1.0 for the RR and 0 for the MC, as this region marks the point of complete syntactic disambiguation. This new input was also given a weight of 1.0, and all weights were renormalized to one. Figure 7 shows that the model provides approximate fits for condition means for the data from Experiment 2. At the by phrase, competition had decreased for the two-referent context, whereas it had increased somewhat for the one-referent context. At the main verb, competition lasted only briefly for both contexts. The model was also implemented to simulate self-paced reading results (Spivey-Knowlton et al., 1993) using the same stimulus materials as those in Experiment 1. In word-by-word self-paced reading, the word by is not visible during reading of the verb; thus only verb tense-voice, main clause bias, and discourse information were entered into the integration with their weights normalized to one. At this region, an interesting result was observed. The model actually exhibited more competition for items in the two-referent context than for those in the one-referent context. In

word-by-word self-paced reading study that used the same stimulus materials (Spivey-Knowlton et al., 1993, Experiment 3). In these results, the sentences with two-referent contexts showed processing difficulty at the ambiguous verb, whereas sentences in one-referent contexts did not. This pattern then reversed at the NP within the by phrase. Finally, the model provides an approximate fit to the results of Murray and Liversedge (1994, Experiment 2), in which the two-referent context exhibited more processing difficulty throughout the entire target sentence. Thus, by changing only the inputs to the model (not its internal parameters), the normalized recurrence competition algorithm can account for results that show immediate context effects, delayed context effects, and even reversed context effects. For modeling the results of Experiment 2, we compared the mean competition value for the sentences in the one-referent context with that for the sentences in the two-referent context, already computed for the verb region in the previous section. For the results at the by phrase, where the noun provides strong support for the relative clause, a new input was added: a bias of .875 for the RR and .125 for the MC, consistent with McRae et al. (1998). For simplicity, this new input was given a weight of 1.0, and then all weights were normalized to one (giving it a weight of .5, and the previous constraints' weights were thus halved). The activation values of the inputs, after competition had reached the dynamic criterion from the previous region, were then used to resume competition at the new region. Thus, the model

[Figure 7 plot: human data and model results for the one-referent and two-referent contexts across three regions: "selected", "by the director", and "believed that".]

Figure 7. The competitive integration model approximates processing difficulty across all three critical regions for the two contexts in Experiment 2. Processing difficulty is computed as first-pass reading time in the reduced relative minus that in the full relative, for each region. Model results are overlaid on the data to show general correspondence of the patterns.


the one-referent context, competition at the verb was quickly resolved in favor of the MC, whereas many of the items in the two-referent context were resolved in favor of the RR, but only after substantial competition. Indeed, this pattern is reflected in the human data (Figure 8). When the input for by was included at the next region, with a weight of 1.0 (at which point all weights were renormalized to a sum of 1.0), the activations of the other inputs had become somewhat polarized because of competition at the verb. As a result, competition was about equal at the by region in the two contexts. At the next region, the, a new input was added with a .875 bias for the RR and .125 for the MC. These biases were established from a corpus analysis reported in McRae et al. (1998). As in the previous simulation, this new input was given a weight of 1.0, and all weights were renormalized to one. Competition at this region increased for the one-referent context and decreased for the two-referent context. Then, at the head noun of the by phrase, a new input was added with a bias of .99 for the RR and .01 for the MC, and its weight was set at 1.0, with all weights then being normalized to one. Again, these bias values were based on McRae et al.'s (1998) gated sentence completions. Then, at the main verb region, a new input was added with a 1.0 bias for the RR and 0 bias for the MC, and its weight was set at 1.0, with the weights then being normalized to

one. As Figure 8 shows, the model predicts a crossover data pattern that is often described as a delayed effect of context (e.g., Ferreira & Clifton, 1986). This crossover in relative processing difficulty, with greater competition for the two-referent context early in the sentence followed by greater competition for the one-referent context later in the sentence, is in fact what was found in Spivey-Knowlton et al. (1993, Experiment 3). Note, however, that the model predicts that ambiguity resolution should take place more rapidly than it does in human data. One reason why this might be the case is that one-word-at-a-time self-paced reading often shows delayed effects, presumably because processing of the word may lag behind the button press. To better simulate this inertia effect, we reduced the weight for each new input to .75. As Figure 9 shows, this lagged model does a better job of simulating the human reading time data. Finally, the model also simulates the superficially paradoxical situation in which a two-referent context actually increases processing difficulty for the relative clause throughout the entire sentence (Murray & Liversedge, 1994, Experiment 2; see also Ferreira & Clifton, 1986, Experiment 3). The verb tense information for Murray and Liversedge's (1994) verbs was determined for the 36 stimulus items, using the appendixes presented in Liversedge (1994). The verbs in their sentences were typically followed by a noun phrase beginning with the (e.g., The salesman paid the

[Figure 8 plot: human data and model results for the one-referent and two-referent contexts across five regions: "selected", "by", "the", "director", and "believed".]

Figure 8. The competitive integration model approximates the pattern of processing difficulty across the five critical regions for the two contexts in word-by-word self-paced reading (Spivey-Knowlton et al., 1993, Experiment 3). Processing difficulty is computed as self-paced reading time in the ambiguous reduced relative minus that in the morphologically unambiguous reduced relative, for each word. Model results are overlaid on the data to show general correspondence of the patterns. (As this measure of processing difficulty is qualitatively different from that in the other experiments, the raw scale factor implied by this overlay is not expected to coincide with the others.)


[Figure 9 plot: human data and model results for the one-referent and two-referent contexts across five regions: "selected", "by", "the", "director", and "believed".]

Figure 9. To better approximate the inertia and processing lag of self-paced reading, the competitive integration model simulated the Spivey-Knowlton et al. (1993; Experiment 3) results with each new incoming constraint being given a weight of .75 (instead of 1.0) prior to weight normalization.
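The weight handling for each newly arriving constraint (a raw weight of 1.0 in the main simulations, .75 in the lagged version) can be illustrated with a small sketch. The function name `add_constraint` is hypothetical; the sketch assumes that normalization simply divides every weight, old and new, by their sum.

```python
def add_constraint(weights, new_weight=1.0):
    """Append a new constraint's raw weight, then renormalize all weights to sum to 1."""
    extended = list(weights) + [new_weight]
    total = sum(extended)
    return [w / total for w in extended]


verb_region = [0.25, 0.25, 0.25, 0.25]

# Eyetracking simulations: the new input enters with a raw weight of 1.0.
by_region = add_constraint(verb_region, new_weight=1.0)

# Lagged self-paced reading simulation: the new input enters with a raw weight of .75.
lagged_region = add_constraint(verb_region, new_weight=0.75)
```

With a raw weight of 1.0 the new constraint ends up at .5 and the four earlier constraints are halved to .125, as stated in the text. With .75 the new constraint receives about .43, so earlier constraints retain somewhat more influence.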

money put it . . . . The guest grilled the steak said it . . . ), which is locally consistent with a main clause. Therefore, parafoveal information was coded as a .5 bias toward the RR and a .5 bias toward the MC. The main clause bias was the same as in the previous simulations. The discourse information was derived from their sentence completion results (in the same manner as the previous simulations), where they observed 4% reduced relative completions in the one-referent context and 10% in the two-referent context (Murray & Liversedge, 1994, Experiment 3). Using the same weights as before, mean competition values were computed for the verb. For the rest of the relative clause (typically an NP), the NP was typically a good theme for the verb, consistent with a main clause interpretation (e.g., The girl made the cake cut it . . . ). Correspondingly, this new input was given a bias of .25 toward the RR and .75 toward the MC. Its weight, as in the previous simulations, was set at 1.0, with all weights then being normalized to 1.0. Competition in the rest of the relative clause was multiplied by 1.75 (as in the simulation of the by phrase from our Experiment 2). Thus, the only differences between this simulation and the simulation of our Experiment 2 were the input values for verb tense-voice, parafoveal information, discourse information, and subsequent regions of the sentence; all weights and the constraint integration regime were the same across both simulations. As Murray and Liversedge (1994) combined the verb and the rest of the relative clause into one region, we simply added the mean competition values for

these two regions for comparison to their eye-movement data (Figure 10). For the relative clause region, the model showed slightly more competition in the two-referent context than in the one-referent context. Essentially, all items in the one-referent context were quickly resolved in favor of the MC, whereas a few of the items in the two-referent context were near equal in their MC/RR activations, and therefore there was lengthy competition between the two syntactic alternatives. For the main verb region (region of syntactic disambiguation), a new input was entered into the integration, with a 1.0 bias toward the RR and 0 bias toward the MC. Its weight was set at 1.0, and all weights were then normalized to 1.0. Again, competition at this region was slightly greater in the two-referent context than in the one-referent context. In summary, a competition model provides approximate fits for three different data patterns using a consistent set of parameters. In Experiment 2, two-referent contexts completely eliminated processing time differences between full and reduced relatives at the point of disambiguation. In Spivey-Knowlton et al. (1993, Experiment 3), two-referent contexts speeded ambiguity resolution but did not eliminate processing difficulty. In Murray and Liversedge (1994), supportive contexts actually increased processing difficulty. Our simulations suggest that this paradoxical result may be because participants sometimes did not recover from the incorrect main clause analysis for the ambiguous sentences in the one-referent contexts.


[Figure 10 plot: human data and model results for the one-referent and two-referent contexts across two regions: "grilled the steak" and "said it tasted nice".]

Figure 10. The competitive integration model approximates the pattern of processing difficulty across the two critical regions for the two contexts in an eyetracking experiment with a different set of materials (Murray & Liversedge, 1994; Experiment 2). Processing difficulty is computed as first-pass per word reading time in the reduced relative minus that in the full relative, for each region. Model results are overlaid on the data to show general correspondence of the patterns. (As this measure of processing difficulty is qualitatively different from that in the other experiments, the raw scale factor implied by this overlay is not expected to coincide with the others.)

It is important to note that the data patterns associated with each of these experiments have often been given different interpretations in the literature. The first data pattern is typically taken as evidence that context can affect initial syntactic processing, whereas the other two data patterns are taken as evidence for an initial stage of processing in which syntactic processing is encapsulated from the effects of discourse, with discourse influencing an evaluation and revision stage. Thus far we have demonstrated that a multiple constraints model using the same weights provides approximate fits to three of the different data patterns that have been obtained in the literature on discourse context effects. As our central claim is that these data patterns are best accounted for without appealing to a delay in the use of context, we also implemented two additional versions of the model using the same parameters and weights. In the first version, all of the constraints other than the main clause bias were delayed for 4 cycles of competition at the verb, the time it takes the model to resolve itself entirely in favor of a main clause. Thus this model can be seen as an implementation of a two-stage garden-path model (cf. McRae et al., 1998). To compare the goodness of fit for this model with the no-delay simulations, we calculated root mean squared error values (RMS). The RMS values for the original, no-delay simulations were 33.98 for Experiment 2, 9.15 for the self-paced

reading study, and 26.30 for the Murray and Liversedge (1994) data. The comparable simulations with the delay models had RMS values of 36.13 for Experiment 2, 12.1 for the self-paced reading study, and 26.82 for the Murray and Liversedge data. The RMS values were consistently smaller for the no-delay version, indicating that it provided better fits of the data. In the second simulation, we delayed only the discourse bias by four cycles in order to simulate a model in which all within-sentence constraints apply before discourse constraints. The RMS values were again higher than for the no-delay simulations. The RMS values were 39.70 for Experiment 2, 12.42 for the self-paced reading study, and 26.87 for the Murray and Liversedge (1994) data. Finally, for both classes of delay models, making the delay longer resulted in increasingly poorer fits to the data. It is likely that by changing the values of the weights we could have generated better fits for all of these simulations, including the original constraint-based model simulations. However, these simulations do show that with the weights and parameters we motivated for these stimuli, the no-delay model does a better job of simulating the data than the delay model. The success of these simulations provides important evidence for the claim that syntactic ambiguity resolution involves an early integration of multiple information sources

and a process of competition between syntactic alternatives. It is important to emphasize, however, that the bias values and weights used in these simulations are clearly only approximations. In future work it will be important to include more precise independently motivated item-specific parameter estimates as well as systematic procedures for assigning weights. Work building on the results presented here that moves considerably in this direction is reported in Hanna et al. (1997), McRae et al. (1998), and Tanenhaus et al. (in press). In addition, it will be important to evaluate the particulars of the competition algorithm and the function that maps competition onto reading times. For example, it is an open question as to whether normalized recurrence is the optimal competition algorithm. It is possible that explicit inhibitory connections between the RR and MC nodes (Spivey-Knowlton, 1994), or perhaps even a simple decay function followed by normalization, might be sufficient to produce competition that will fit the data. 9 Similarly, the particular function used for the dynamic criterion may be overly simplistic.

General Discussion

The research presented here makes two contributions. The first is primarily empirical. Experiments 1 and 2 showed that processing difficulty for reduced relative clauses was sharply reduced when the context contained a pair of referents for the initial noun phrase. This result strongly supports the claim that readers make immediate use of constraints established by the discourse context during ambiguity resolution. We also presented suggestive evidence that discourse context interacts with lexical frequency, a fact that accounts for discrepancies in the discourse context literature, as proposed by MacDonald et al. (1994). Consider, for example, two studies with reduced relatives that failed to find referential context effects. Britt et al. (1992, Experiment 3) had a somewhat effective contextual manipulation (17% RR completions in the two-referent context, cf. Spivey-Knowlton & Tanenhaus, 1994), but the verbs in their target sentences only weakly supported the past participle tense (mean past participle availability: .57, compared with .76 for the verbs in our study). In contrast, the verbs used by Murray and Liversedge (1994) had relatively high past participle availability (.68), but their contextual manipulation was weak (10% RR completions in the two-referent context). The second contribution is primarily theoretical. We showed that reading times for individual items, as well as superficially conflicting data patterns in the ambiguity resolution literature, can be simulated within a multiple constraints framework using a simple competition algorithm. The specific computational model described in this article demonstrates how graded variation in context effects, across stimulus items as well as across experiments, can be due to informational biases inherent in the stimulus materials, not to architectural constraints on the processing system. In fact, with fixed model parameters, and varying only stimulus parameters, the competitive integration model accounts for a variety of effects in syntactic ambiguity resolution that were previously seen as mutually exclusive.


This fact highlights the importance of developing quantitative models of sentence processing, and the dangers of making simple inferences from reading times in the absence of explicit models. For example, the standard assumption that increased processing difficulty at points of ambiguity is evidence for a garden path is clearly problematic, as is the assumption that delayed effects of a constraint mean that the constraint is not used during a putative first stage of processing. Rather, the visibility of the effects of different constraints will vary depending on their strength and availability, as well as on the presence of other relevant constraints. Thus, constraint-based principles can provide a general mechanism for understanding properties of ambiguity resolution that have often led researchers to propose linguistically specific architectural constraints.

In summary, then, the results of our experiments and simulations support claims about the importance of discourse representations in online syntactic ambiguity resolution made by proponents of referential theory (e.g., Altmann & Steedman, 1988; Crain & Steedman, 1985). They also support recent claims about the importance of lexical representations in online syntactic ambiguity resolution made by proponents of the constraint-based lexicalist approach to ambiguity resolution (e.g., MacDonald et al., 1994; Trueswell & Tanenhaus, 1994). Finally, they suggest that constraint-based models incorporating competition provide a general framework for understanding the time course with which these, and other, constraints are integrated in real-time sentence processing.
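
For readers who want a concrete picture of the kind of competition discussed above, the following Python sketch illustrates a normalized recurrence loop with a criterion that relaxes over cycles. It is a minimal sketch under stated assumptions, not the code used for the simulations reported here: the function name `normalized_recurrence`, the parameter `delta_crit`, the linear form of the dynamic criterion, and all numerical inputs are hypothetical and chosen only for illustration.

```python
def normalized_recurrence(supports, weights, delta_crit=0.01, max_cycles=200):
    """Illustrative competition between alternatives (e.g., index 0 = RR, 1 = MC).

    supports: dict mapping a constraint name to a list of positive support
              values, one per alternative (hypothetical inputs).
    weights:  dict mapping the same constraint names to weights summing to 1.
    Returns (cycles_to_criterion, integrated_activations).
    """
    supports = {c: list(v) for c, v in supports.items()}  # work on a copy
    n_alt = len(next(iter(supports.values())))
    interp = [1.0 / n_alt] * n_alt

    for cycle in range(1, max_cycles + 1):
        # Step 1: normalize each constraint's support across the alternatives.
        for c, s in supports.items():
            total = sum(s)
            supports[c] = [x / total for x in s]

        # Step 2: integrate, i.e., take the weighted sum over constraints.
        interp = [
            sum(weights[c] * supports[c][a] for c in supports)
            for a in range(n_alt)
        ]

        # Assumed dynamic criterion: it drops linearly with each cycle, so
        # closely matched alternatives eventually settle as well.
        criterion = 1.0 - delta_crit * cycle
        if max(interp) >= criterion:
            return cycle, interp

        # Step 3: feedback from the integrated activations to each constraint.
        for c, s in supports.items():
            supports[c] = [x + interp[a] * weights[c] * x for a, x in enumerate(s)]

    return max_cycles, interp


# Hypothetical inputs: both constraints favor the RR reading, or neither does.
w = {"context": 0.5, "participle": 0.5}
biased = normalized_recurrence({"context": [0.9, 0.1], "participle": [0.8, 0.2]}, w)
balanced = normalized_recurrence({"context": [0.5, 0.5], "participle": [0.5, 0.5]}, w)
print(biased[0], balanced[0])
```

In this sketch, the number of cycles needed to reach criterion plays the role of processing difficulty: inputs that agree settle in few cycles, whereas evenly balanced inputs settle only once the criterion has relaxed. Because activations are recomputed from the normalized supports on every cycle, they do not accumulate, in keeping with the description in Footnote 9.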

⁹ In the present work, we avoided explicit inhibitory connections because we see the RR and MC nodes as abstractions over patterns of activation at the lower level. Thus, they act as emergent representations with no explicit nodes in the system (hence, their activation does not accumulate; it is simply recomputed at each iteration). In addition to trying inhibitory connections, we also tried several versions of a simple decay function followed by normalization, but the results of this competition algorithm did not match the data. In the end, the normalized recurrence algorithm provided, by far, the best account of the data.

References

Altmann, G., Garnham, A., & Dennis, Y. (1992). Avoiding the garden-path: Eye movements in context. Journal of Memory and Language, 31, 685-712.
Altmann, G., Garnham, A., & Henstra, J. (1994). Effects of syntax in human sentence parsing: Evidence against a structure-based proposal mechanism. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 209-216.
Altmann, G., & Steedman, M. (1988). Interaction with context during human sentence processing. Cognition, 30, 191-238.
Bates, E., & MacWhinney, B. (1989). Functionalism and the competition model. In B. MacWhinney & E. Bates (Eds.), The crosslinguistic study of sentence processing. New York: Cambridge University Press.
Britt, M. A. (1994). The interaction of referential ambiguity and argument structure in parsing of prepositional phrases. Journal of Memory and Language, 33, 251-283.
Britt, M. A., Perfetti, C. A., Garrod, S., & Rayner, K. (1992). Parsing and discourse: Context effects and their limits. Journal of Memory and Language, 31, 293-314.
Burgess, C., & Lund, K. (1994). Multiple constraints in syntactic ambiguity resolution: A connectionist account of psycholinguistic data. Proceedings of the 16th Annual Conference of the Cognitive Science Society (pp. 90-95). Hillsdale, NJ: Erlbaum.
Burgess, C., Tanenhaus, M., & Hoffman, M. (1994). Parafoveal and semantic effects on syntactic ambiguity resolution. Proceedings of the 16th Annual Conference of the Cognitive Science Society (pp. 96-99). Hillsdale, NJ: Erlbaum.
Carandini, M., & Heeger, D. (1994). Summation and division by neurons in primate visual cortex. Science, 264, 1333-1336.
Clifton, C., & Ferreira, F. (1989). Ambiguity in context. Language and Cognitive Processes, 4, 77-104.
Connine, C., Ferreira, F., Jones, C., Clifton, C., & Frazier, L. (1984). Verb frame preferences: Descriptive norms. Journal of Psycholinguistic Research, 13, 307-319.
Crain, S., & Steedman, M. (1985). On not being led up the garden path. In D. Dowty, L. Karttunen, & A. Zwicky (Eds.), Natural language parsing. Cambridge, England: Cambridge University Press.
Douglas, R., Koch, C., Mahowald, M., Martin, K., & Suarez, H. (1995). Recurrent excitation in neocortical circuits. Science, 269, 981-985.
Ferreira, F., & Clifton, C. (1986). The independence of syntactic processing. Journal of Memory and Language, 25, 348-368.
Ferreira, F., & Henderson, J. (1993). Reading processes during syntactic analysis and reanalysis. Canadian Journal of Experimental Psychology, 47, 247-275.
Francis, W., & Kučera, H. (1982). Frequency analysis of English usage: Lexicon and grammar. Boston: Houghton Mifflin.
Frazier, L. (1987). Theories of syntactic processing. In J. Garfield (Ed.), Modularity in knowledge representation and natural language processing. Cambridge, MA: MIT Press.
Gibson, E., Schütze, C., & Salomon, A. (1996). The relationship between the frequency and the processing complexity of linguistic structure. Journal of Psycholinguistic Research, 25, 59-92.
Hanna, J., Barker, C., & Tanenhaus, M. (1995). Integrating local and discourse constraints in resolving lexical thematic ambiguities. Poster presented at the Eighth Annual City University of New York Conference on Human Sentence Processing, Tucson, AZ.
Hanna, J., Spivey-Knowlton, M., & Tanenhaus, M. (1996). Integrating discourse and lexical frequency information in resolving lexical thematic ambiguities. Proceedings of the 15th Annual Conference of the Cognitive Science Society (pp. 266-271). Mahwah, NJ: Erlbaum.
Hanna, J., Tanenhaus, M., & Spivey-Knowlton, M. (1997). Integrating contextual and sentential constraints in ambiguity resolution. Manuscript in preparation.
Juliano, C., & Tanenhaus, M. (1995). A constraint-based lexicalist account of subject/object attachment ambiguities. Journal of Psycholinguistic Research, 23, 459-471.
Jurafsky, D. (1996). A probabilistic model of lexical and syntactic access and disambiguation. Cognitive Science, 20, 137-194.
Liversedge, S. (1994). Referential context, relative clauses and syntactic parsing. Unpublished doctoral dissertation, University of Dundee, Scotland.
MacDonald, M. (1994). Probabilistic constraints and syntactic ambiguity resolution. Language and Cognitive Processes, 9, 692-715.
MacDonald, M., Pearlmutter, N., & Seidenberg, M. (1994). The lexical nature of syntactic ambiguity resolution. Psychological Review, 101, 676-703.
Marcus, M., Santorini, B., & Marcinkiewicz, M. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19, 313-330.
McClelland, J., & Rumelhart, D. (1981). An interactive activation model of context effects in letter perception: I. An account of basic findings. Psychological Review, 88, 375-407.
McClelland, J., St. John, M., & Taraban, R. (1989). Sentence comprehension: A parallel distributed approach. Language & Cognitive Processes, 4, 287-335.
McRae, K., Spivey-Knowlton, M., & Tanenhaus, M. (1998). Modeling the effects of thematic fit (and other constraints) in on-line sentence comprehension. Journal of Memory and Language, 37, 283-312.
Mitchell, D., Corley, M., & Garnham, A. (1992). Effects of context in human sentence parsing: Evidence against a discourse-based proposal mechanism. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 69-88.
Mitchell, D., Cuetos, F., Corley, M., & Brysbaert, M. (1995). Exposure-based models of human parsing: Evidence for the use of coarse-grained (nonlexical) statistical records. Journal of Psycholinguistic Research, 24, 469-488.
Murray, W., & Liversedge, S. (1994). Referential context effects on syntactic processing. In C. Clifton, K. Rayner, & L. Frazier (Eds.), Perspectives on sentence processing (pp. 359-388). Hillsdale, NJ: Erlbaum.
Pearlmutter, N., Daugherty, K., MacDonald, M., & Seidenberg, M. (1994). Modeling the use of frequency and contextual biases in sentence processing. In Proceedings of the 16th Annual Conference of the Cognitive Science Society (pp. 699-704). Hillsdale, NJ: Erlbaum.
Pearlmutter, N., & MacDonald, M. (1995). Probabilistic constraints and working memory capacity in syntactic ambiguity resolution. Journal of Memory and Language, 43, 521-542.
Rayner, K., & Pollatsek, A. (1989). The psychology of reading. Englewood Cliffs, NJ: Prentice Hall.
Rayner, K., Sereno, S., Morris, R., Schmauder, R., & Clifton, C. (1989). Eye movements and on-line language comprehension processes. Language and Cognitive Processes, 4, 21-50.
Rumelhart, D., & McClelland, J. (1982). An interactive activation model of context effects in letter perception: II. The contextual enhancement effect and some tests and extensions of the model. Psychological Review, 89, 60-94.
Sedivy, J., & Spivey-Knowlton, M. (1994). The use of structural, lexical, and pragmatic information in parsing attachment ambiguities. In C. Clifton, L. Frazier, & K. Rayner (Eds.), Perspectives on sentence processing. Hillsdale, NJ: Erlbaum.
Solomon, R., & Postman, L. (1952). Frequency of usage as a determinant of recognition thresholds for words. Journal of Experimental Psychology, 43, 195-201.
Spivey-Knowlton, M. (1992). Another context effect in sentence processing: Implications for the principle of referential support. Proceedings of the 14th Annual Conference of the Cognitive Science Society (pp. 486-491). Hillsdale, NJ: Erlbaum.
Spivey-Knowlton, M. (1994). Quantitative predictions from a constraint-based theory of syntactic ambiguity resolution. Proceedings of the 1993 Connectionist Models Summer School (pp. 130-137). Hillsdale, NJ: Erlbaum.
Spivey-Knowlton, M. (1996). Integration of linguistic and visual information: Human data and model simulations. Unpublished doctoral dissertation, University of Rochester.
Spivey-Knowlton, M., & Sedivy, J. (1995). Resolving attachment ambiguities with multiple constraints. Cognition, 55, 227-267.
Spivey-Knowlton, M., & Tanenhaus, M. (1994). Referential context and syntactic ambiguity resolution. In C. Clifton, K. Rayner, & L. Frazier (Eds.), Perspectives on sentence processing (pp. 415-439). Hillsdale, NJ: Erlbaum.

Spivey-Knowlton, M., Trueswell, J., & Tanenhaus, M. (1993). Context effects in syntactic ambiguity resolution: Discourse and semantic influences in parsing reduced relative clauses. Canadian Journal of Experimental Psychology, 47, 276-309.
Stevenson, S. (1994). A competitive attachment model for resolving syntactic ambiguities in natural language parsing. Unpublished doctoral dissertation, University of Maryland College Park.
Tabor, W., Juliano, C., & Tanenhaus, M. K. (1997). Parsing in a dynamical system: An attractor-based account of the interaction of lexical and structural constraints in sentence processing. Language & Cognitive Processes, 12, 211-271.
Tabor, W., & Tanenhaus, M. K. (1998). Dynamical models of sentence processing. Manuscript submitted for publication.
Tabossi, P., Spivey-Knowlton, M., McRae, K., & Tanenhaus, M. (1994). Semantic effects on syntactic ambiguity resolution: Evidence for a constraint-based resolution process. In C. Umiltà & M. Moscovitch (Eds.), Attention and performance XV. Hillsdale, NJ: Erlbaum.
Tanenhaus, M., Spivey-Knowlton, M., Eberhard, K., & Sedivy, J. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268, 1632-1634.
Tanenhaus, M., Spivey-Knowlton, M., & Hanna, J. (in press). Modeling the effects of discourse and thematic fit in syntactic ambiguity resolution. In M. Crocker, M. Pickering, & C. Clifton (Eds.), Architectures and Mechanisms of Language Acquisition and Processing.
Tanenhaus, M., & Trueswell, J. (1995). Sentence comprehension. In J. Miller & P. Eimas (Eds.), Speech, language, and communication. Volume 11 of the handbook of perception and cognition (pp. 217-262). San Diego, CA: Academic Press.
Trueswell, J. (1996). The role of lexical frequency in syntactic ambiguity resolution. Journal of Memory and Language, 35, 566-585.
Trueswell, J., Kim, A., Lund, K., & Burgess, C. (1995). Thematic fit as discourse instantiation of lexically specific information: A distributed processing model of thematic integration. Poster presented at the 8th Annual CUNY Conference on Human Sentence Processing, University of Arizona, Tucson, AZ.
Trueswell, J., & Tanenhaus, M. (1991). Tense, temporal context and syntactic ambiguity resolution. Language & Cognitive Processes, 6, 303-338.
Trueswell, J., & Tanenhaus, M. (1994). Toward a lexicalist approach to syntactic ambiguity resolution. In C. Clifton, L. Frazier, & K. Rayner (Eds.), Perspectives on sentence processing. Hillsdale, NJ: Erlbaum.
Trueswell, J., Tanenhaus, M., & Garnsey, S. (1994). Semantic influences on parsing: Use of thematic role information in syntactic disambiguation. Journal of Memory and Language, 33, 285-318.
van Berkum, J., Hagoort, P., & Brown, C. (1998). Rapid discourse context effects in sentence processing: ERP evidence. Talk presented at the 11th Annual City University of New York Conference on Human Sentence Processing, Rutgers University, NJ.
Vaughan, J. (1983). Control of fixation duration in visual search and memory search: Another look. Journal of Experimental Psychology: Human Perception and Performance, 8, 709-723.

Received September 25, 1995
Revision received April 10, 1998
Accepted April 10, 1998