Achieving Robust Human-Computer Communication

Susan W. McRoy
Department of Electrical Engineering and Computer Science and Department of Mathematical Sciences
University of Wisconsin-Milwaukee
Milwaukee, WI 53201
[email protected]

November 25, 1997

Abstract

This paper describes a computational approach to robust human-computer interaction. The approach relies on an explicit, declarative representation of the content and structure of the interaction that a computer system builds over the course of the interaction. In this paper, we will show how this representation allows the system to recognize and repair misunderstandings between the human and the computer. We demonstrate the utility of the representations by showing how they facilitate the repair process.


1 Introduction

In dialogs between people or between people and machines, understanding is an uncertain process. If the goals or beliefs of two discourse participants differ, one of them might interpret an event in the dialog in a way that she believes is complete and correct, although her interpretation is not the one that the other one had intended. When this happens, we as analysts (or observers) would say that a misunderstanding has occurred. The participants themselves might come to know it as a misunderstanding as well, if the problem manifests itself as something incongruous in the interaction.

Footnote 1: Here we use "discourse" to refer to the general notion of a multi-party communication that continues over a number of actions by various participants using various modalities, such as writing, mouse-clicks, or speech; "dialog" refers to the subclass of two-party interactions using various modalities, not necessarily restricted to speech.

Discourse participants can circumvent many misunderstandings by planning their actions carefully. For example, they can try to correct apparent misconceptions or try to clarify a potential ambiguity. This has been the predominant approach [Calistri-Yeh, 1991, Eller and Carberry, 1992, Goodman, 1985, McCoy, 1985, Zukerman, 1991]. However, because participants' beliefs and goals always differ to some extent, it is impossible to prevent all misunderstandings. Moreover, because people do not always recognize when communication has broken down, participants cannot always rely on each other's judgments about what has been understood.

Although misunderstanding cannot be fully prevented, discourse participants can continue to communicate by detecting and repairing the problem. For example, one participant might notice an anomaly and attribute it to a misunderstanding. Then, that participant might choose to interrupt the dialog to initiate a repair by herself or the other. However, for misunderstandings to be diagnosed and resolved, discourse participants must have access to
an internal representation of the dialog that they can use to evaluate the reasonableness of each new action [McRoy, 1995a, McRoy and Hirst, 1995a]. To construct repairs, they can use this representation to re-evaluate earlier interpretations.

This paper describes a computational representation of human-computer communication that supports flexible and coherent dialog. The representation has been incorporated in a collaborative tutoring system, called B2, that allows medical students to practice their decision-making skills and to receive natural language explanations of complex statistical relationships. The primary modality of interaction is natural language, but other modalities (such as mouse-clicks) are also supported. Each participant in the dialogs can make statements, ask questions, or generate requests. Some of these actions are used to initiate repairs. Moreover, some of these actions would be ambiguous (or uninterpretable) out of context. For example, in dialogs between students and a tutoring system that we have been building, students ask the system questions such as "Is CT the best test to rule in gallstones?" (referring to a medical case that has been under discussion) and can follow up with more context-dependent questions, such as "What about HIDA?" or "Why?"

Footnote 2: CT stands for computed tomography, a diagnostic test.

Footnote 3: HIDA stands for radio-nuclide hepatobiliary imaging, another diagnostic test.

To support robust understanding, a fine-grained representation of the dialog has been used because, in the cases of unexpected dialog actions, the system needs to explain why the student might have said what she did and to decide what it should do to restore the coherence of its model. Producing an explanation might also require that the system make some (reasonable, but perhaps unprovable) assumptions about what the other participant believes. After interpretation, these explanations become part of the system's understanding of the dialog.

The knowledge that serves as a basis for the system's reasoning includes information
about language and language use, facts about the world, and information about the content and structure of prior discourse. The representation framework used to represent this information is a mixed-depth representation, that is, one that mixes representations from different levels of abstraction or detail. A mixed-depth representation is as informative as needed, without overcommitment, because it represents each component concept at the most relevant level of abstraction, given the task at hand. This framework thus allows us to integrate information from the different sources incrementally, as they are needed. It also allows us to capture vagueness or ambiguity that is present in the utterance form.

The design of the discourse model used in our work builds on McRoy and Hirst's previous work on the Recognition and Repair of Speech Act Misunderstandings (RRM) project [McRoy and Hirst, 1993, McRoy, 1995b, McRoy and Hirst, 1995b]. This work unifies theories of speech-act production, interpretation, and repair. It combines intentional and social accounts of interaction to constrain the amount of reasoning that is necessary to identify a plausible interpretation.

B2 is being developed using the Common LISP programming language. We are using the SNePS 2.3.1 and ANALOG 1.1 tools to create the lexicon, parser, generator, and underlying knowledge representations of domain and discourse information [Ali, 1994a, Ali, 1994b, Shapiro and Group, 1992, Shapiro and Rapaport, 1992]. Developed at the State University of New York at Buffalo, SNePS (Semantic Network Processing System) provides tools for building and reasoning over nodes in a propositional semantic network. (We will define what a semantic network is shortly.)

An Internet-accessible, graphical front-end to B2 has been developed using the JAVA 1.1 programming language. It can be run using a network browser, such as Netscape. The interface that the user sees communicates with a server-side program that initiates a LISP
process. (The Bayesian network software is executed from LISP as a foreign function.)

Footnote 4: The URL for this site is http://tigger.cs.uwm.edu/wendy/b2/b2.html; however, it is only available when the server program is running.

The focus of this paper is on describing the internal representation of the dialog that is constructed by our tutoring system. It will show how the representations facilitate reasoning about understanding and misunderstanding. (It does not consider the problem of non-understanding, as it is beyond the scope of this paper.) The next section begins by reviewing some previous approaches to misunderstanding. Sections 3 and 4 introduce the kinds of information in the discourse model, along with their representation. Section 5 describes our account of coherence in dialog and its use of the representation. Section 6 then explains the discourse model that would be constructed for an example dialog. The paper concludes with Section 7.

2 Background

Most previous work on representations of interactive discourse has focused on just one aspect of communication, such as interpretation or generation. Also typical is that discourse understanding is treated as a generalization of parsing, in which plans (recipes for action) take the place of grammar rules [Lambert and Carberry, 1991, Litman, 1986, Novick and Ward, 1993]. This limits the applicability of these approaches to interactions that fit one of the pre-defined plans. However, in flexible communication between people and computers, surprises are inevitable:

- The user's attention might not be focused on the aspect of the presentation that the system expects.

- The user might not have the same understanding of what a verbal description or a graphical image is meant to convey.

- The user might lack some of the requisite knowledge of the domain necessary to interpret a proposed explanation.

Thus, the discourse model must include a representation of what actions the participants performed, what inferences those actions would warrant, and what evidence there is for a misunderstanding between the explanation system and the user. A computer system that includes information about possible misunderstandings as well as intended interpretations would have to deal with a potential combinatoric explosion of alternatives. To prevent this, a system must be able to focus its processing on the most likely interpretations and avoid extended inference, unless it is necessary.

The approach taken by McRoy and Hirst [1993, 1995b] combines intentional and social accounts of discourse to capture the expectations that help constrain discourse, without requiring extended inference about plans. In intentional accounts, speakers use their beliefs, goals, and expectations to decide what to say; when they interpret an utterance, speakers identify goals that might account for it. This process can be time-consuming because speakers may have many different goals that they are trying to achieve. In sociological accounts provided by Ethnomethodology, discourse interactions and the resolution of misunderstandings are normal activities guided by social conventions. These conventions include expected sequences of utterances, such as question-answer. The approach extends intentional accounts by using expectations deriving from social conventions to guide interpretation. As a result, it does not require reasoning about mutual beliefs or multiple levels of goals, avoiding most of the
time-consuming inference that is required by plan-based approaches such as [Allen, 1983, Carberry, 1990, Litman, 1986].

Footnote 5: Mutual beliefs are those of the form "I believe that you believe that I believe ...".

A prototype system, RRM, has been implemented in a logic-programming framework for default and abductive reasoning. The system includes an axiomatization of McRoy and Hirst's theory of how expected actions are understood and how misunderstandings can lead to unexpected actions and utterances. (The specifics of this theory will be discussed in Section 5, as it forms the basis for our current work.) An action is explained as a manifestation of misunderstanding if it is not expected, given the prior interaction, and there is a plausible reason for supposing that misunderstanding has occurred. The system builds a model of the dialog as it is communicating. This model includes a representation of the speech act interpretation that the system gave to each of the utterances (that it either produced or received), including a temporal ordering relation over the speech acts, and a representation of the beliefs that such speech acts conventionally express. The model also enables the system to form expectations about the types of speech acts that could be expected to occur next. This model also enables the system to detect inconsistencies in the beliefs that had been expressed over the course of a dialog and to resolve the inconsistencies by providing new interpretations of previous utterances.

The prototype has been used to simulate conversations involving the detection and correction of speech-act misunderstandings that occurred in human-human dialogs, previously discussed in the literature. (In the simulations, two versions of the system actually talk to each other, given some initial beliefs and goals.) Although this work demonstrated the utility of combining multiple sources of knowledge, the prototype implementation was too inefficient for practical use and did not represent enough semantic information to account for misunderstandings of word or phrase-level information. The work described here attempts
to address these shortcomings.

Representative of plan-based approaches is the work by Ardissono, Boella, and Damiano [1998]. This work also provides a unified theory of interaction that addresses misunderstandings, but does so by extending a purely intentional (plan-based) approach. (Plan-based approaches formulate knowledge as recipes for action, akin to STRIPS operators [Fikes and Nilsson, 1971], that specify the preconditions and constraints that must be true for an action to succeed, and the effects or goals that an action achieves.) Like previous plan-based approaches to discourse by Litman [1986] and Lambert and Carberry [1991], interpretations are recognized by trying to find a plan-based relation that links the utterance and the previous context. For example, a new utterance would be coherent if it seeks to gain information about the state of a constraint. Variations to domain plans are allowed by introducing meta-level actions that manage the interaction itself. The system's representation of the dialog consists of a structure that includes an instantiated plan recipe that links, by means of relations such as precondition or subgoal, all of the utterances that have been made by either participant. The values used to instantiate a recipe include more specific instances of actions (instantiated recipes), complex mutual beliefs, or goals.

Footnote 6: For example, one effect of a question is that the speaker and hearer mutually believe that the speaker has the communicative intention of expressing that the speaker has the goal of the hearer adopting a goal to answer the question.

Ardissono et al. extend the plan-recognition algorithm so that misunderstandings are recognized when the system fails to find a plan-based relation. When such failures occur, a meta-level action results in a subgoal being posted to resolve the misunderstanding; repairs are actions that are performed to satisfy these goals. Thus, in comparison with McRoy and Hirst, the meta-level action of posting a goal (e.g., to find an alternate interpretation) takes the place of the RRM theorem prover's attempt to prove that such an alternative
exists.

The strengths of purely plan-based approaches are in the uniformity of the representations that are used to specify both domain and discourse information and the widespread availability of plan-recognition algorithms. Also, the work of Ardissono et al. accounts for several types of breakdown, including ambiguities in syntax, semantics, or pragmatics. The primary weakness of all plan-based approaches is that they do not provide an explicit account of the constraints needed to focus the search for an interpretation; instead, they embed them in the control structure of the plan-recognition algorithm. Also, the domain must be kept narrow, so that the number of domain-level ambiguities, and hence the effort needed to identify plan-based relations, is minimized.

In the next section, we consider the design of the discourse model used in B2. To allow for both flexibility and robustness, we have based the B2 discourse model on the approach of McRoy and Hirst [1995b]. However, to address the uniformity and efficiency issues, B2 represents knowledge in a semantic network framework, which we will discuss in Section 4.

3 The Kinds of Information in the Discourse Model

Now we will consider the information that is represented by the B2 system. The discourse model combines information about the discourse-level actions performed by the system, as well as its interpretation of the user's utterances. Consider the dialogue shown in Figure 1.

B2: What is the best test to rule in Gallstones?
Student: HIDA.
B2: No, CT.
Student: Ok.

Figure 1: A Dialogue between the B2 System and a Medical Student

In the discourse model, this dialogue leads to the assertion of a number of propositions about what was said by each participant, how the system interpreted what was said as a discourse action, and how each action relates to prior ones. For the exchange above, the knowledge representation would include representations of facts that have been glossed as shown in Figure 2.

What was said
u1 The system asked "What is the best test to rule in gallstones?"
u2 The student said "HIDA."
u3 The system said "No."
u4 The system said "CT."
u5 The student said "Ok."

The system's interpretation of what was said
i1 The system asks "What is the best test to rule in gallstones?"
i2 The student answers "HIDA is the best test to rule in gallstones."
i3 The system disconfirms "No, HIDA is not the best test to rule in gallstones."
i4 The system informs "CT is the best test to rule in gallstones."
i5 The student agrees that "CT is the best test to rule in gallstones."

Relations
r1 i1 is the interpretation of u1
r2 i2 is the interpretation of u2
r3 i3 is the interpretation of u3
r4 i4 is the interpretation of u4
r5 i5 is the interpretation of u5
r6 i2 accepts i1
r7 i3 accepts i2
r8 i5 accepts i4
r9 i4 justifies i3

Figure 2: Propositions that would be Added to the Discourse Model

This model of the discourse is used both to interpret students' utterances and to generate the system's responses. When a user produces an utterance, the parser will generate a representation of its surface content (e.g., a word, phrase, or sentence), form (e.g., interrogative, declarative, or imperative), and attitude (e.g., action, belief, or desire). B2 then uses its model of the discourse to build an interpretation of the utterance that captures both its complete propositional content and its relationship to the preceding discourse. This model specifies what the participants have expressed in the discourse, which may be different from what they actually believe or are believed to believe. Having this discourse model allows the system greater flexibility than previous explanation systems, because it enables the system to judge whether:

- The student understood the question that was just asked and has produced a response that can be evaluated as an answer to it; or

- The student has rejected the question and is asking a question of her own; or

- The student has misunderstood the question and has produced a response that can be analyzed to determine how it might repair the misunderstanding.

For example, if the student had answered "No" instead of "HIDA" after utterance u1, it would have been a good indication that she had misread (or misheard) the question. If she had responded "What is the patient's sex?", it would have been an indication that she feels that more information about the medical case must be specified before a correct answer can be determined. Similarly, when the system generates an utterance, the discourse model will be augmented to include a representation of it and its relation to the preceding discourse. These representations of the system's own outputs are useful, because they allow it to produce a focused answer to a question like "Why?", yet still be able to respond to requests for more information, such as "What about ultrasound?".
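To make the structure of Figure 2 concrete, the following is a minimal Common Lisp sketch of how such discourse-model propositions could be stored and queried. It is an illustrative assumption, not the actual SNePS-based B2 representation; the struct and function names are invented for this example.

    ;; A minimal sketch (not the SNePS-based B2 representation): the "what was
    ;; said" / "interpretation" / "relation" propositions of Figure 2 as structs.
    (defstruct utterance id speaker text)
    (defstruct interpretation id speaker act content)
    (defstruct relation id type from to)   ; type: :interpretation-of, :accepts, :justifies

    (defparameter *model*
      (list (make-utterance :id 'u1 :speaker 'system :text "What is the best test to rule in gallstones?")
            (make-utterance :id 'u2 :speaker 'student :text "HIDA.")
            (make-interpretation :id 'i1 :speaker 'system :act 'ask
                                 :content "What is the best test to rule in gallstones?")
            (make-interpretation :id 'i2 :speaker 'student :act 'answer
                                 :content "HIDA is the best test to rule in gallstones.")
            (make-relation :id 'r1 :type :interpretation-of :from 'i1 :to 'u1)
            (make-relation :id 'r2 :type :interpretation-of :from 'i2 :to 'u2)
            (make-relation :id 'r6 :type :accepts :from 'i2 :to 'i1)))

    (defun interpretation-of (utterance-id)
      "Return the id of the interpretation recorded for UTTERANCE-ID, if any."
      (loop for p in *model*
            when (and (relation-p p)
                      (eq (relation-type p) :interpretation-of)
                      (eq (relation-to p) utterance-id))
              return (relation-from p)))

    ;; (interpretation-of 'u2) => I2, so the model can answer "how was the
    ;; student's HIDA understood?" without re-parsing the utterance.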

4 The Representation of Discourse

The discourse model has five levels of representation, shown in Figure 3. The utterance level represents the linguistic or graphical form of communication between the user and the system. The utterance sequence level represents the temporal ordering of utterances. The interpretation level represents the communicative role that the utterance achieves and the domain plan that it invokes. The exchange level represents the socially-defined relationship that determines whether a communicative act has been accepted or understood. The exchange interpretation-level represents (explicitly) the acceptance or failure of an exchange.

[Figure 3 depicts the five levels as a stack, from top to bottom: interpretation of exchanges; exchanges (pairs of interpretations); system's interpretation of each utterance; sequence of utterances; utterance level.]

Figure 3: Five Levels of Representation

These levels capture the content expressed by the student and the system, as well as the relations that link their actions to the ongoing discourse. Although many systems give questions and requests a procedural semantics (interpreting them as operations to be performed or queries to be derived), such systems are not able to recover from misunderstandings. Thus, in B2 the discourse model includes a representation of all types of communicative acts, including questions, requests, and statements of fact.


4.1 The Representation Language

The system represents both domain knowledge and discourse knowledge in a uniform framework as a propositional semantic network. A propositional semantic network is a framework for representing the concepts of a cognitive agent that can use language (hence the term semantic). The information is represented as a graph composed of nodes and labeled directed arcs. In a propositional semantic network, the propositions are represented by the nodes, rather than the arcs; arcs represent only non-conceptual binary relations between nodes. The particular semantic network framework that we are using is SNePS [Shapiro and Group, 1992] with ANALOG [Ali, 1994a, Ali, 1994b]. This framework has the following additional constraints:

1. Each node represents a unique concept.
2. Each concept represented in the network is represented by a unique node.
3. The knowledge represented about each concept is represented by the structure of the entire network connected to the node that represents that concept.

These constraints allow efficient inference when processing natural language. For example, such networks can represent complex descriptions (common in the medical domain), and can support the resolution of ellipsis and anaphora, as well as general reasoning tasks such as concept subsumption [Ali, 1994a, Ali, 1994b, Maida and Shapiro, 1982, Shapiro and Rapaport, 1987, Shapiro and Rapaport, 1992].

Propositions are expressed using case frames. A case frame is a conventionally agreed upon set of arcs emanating from a node. For example, to express that A isa B we use the MEMBER-CLASS case frame, which is a node with a MEMBER arc and a CLASS arc. The SNePS dictionary [Shapiro et al., 1994] provides a dictionary of standard case frames used in our semantic network. Additional case frames can be defined as needed.

In the knowledge representation, assertion is the mechanism that is used to indicate that a proposition is believed by the cognitive agent that is implemented by the system. We use unasserted nodes to represent descriptions of observations, including surface-level representations of the user's utterances. After interpretation, the system may decide to accept this new information as credible and use it as a basis for inference. When this occurs, it asserts the corresponding node and treats the concept as a belief. (In network diagrams, an exclamation point to the right of a node is used to indicate that it has been asserted.) Thus, for an agent, believing is not automatic; it is a result of inference that it performs.

An important advantage of this representation is that it is uniform. In B2, discourse knowledge, domain knowledge, and probabilistic knowledge (from the Bayesian net) are all represented in the same knowledge base using the same inference processes. This uniformity facilitates the interaction between different processing tasks (such as domain or discourse level reasoning).

We will now consider each of the five levels of the discourse model in turn, starting with the utterance level, shown at the bottom of Figure 3. Then, in the next section, we will consider how these representations are used to achieve robust understanding. (For more details about the knowledge representation, see [Ali et al., 1997, McRoy et al., 1997].)
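Before turning to the individual levels, here is a minimal Common Lisp sketch of the case-frame and assertion machinery just described. It is not the SNePS API; the names are assumptions made for this example only.

    ;; Minimal sketch (not the SNePS API): a propositional node is a case frame
    ;; (a set of labelled arcs) plus an ASSERTED flag that the agent sets only
    ;; after it decides to believe the proposition.
    (defstruct node id frame arcs (asserted nil))

    (defun make-member-class (id member class)
      "Build an unasserted MEMBER-CLASS proposition meaning MEMBER isa CLASS."
      (make-node :id id :frame 'member-class
                 :arcs (list (cons 'member member) (cons 'class class))))

    (defun believe (node)
      "Assert NODE, i.e. treat the proposition as one of the agent's beliefs."
      (setf (node-asserted node) t)
      node)

    ;; Example: the description "B2 is a story" is first built unasserted; only
    ;; after interpretation does the system decide to believe it.
    (defparameter *m3* (make-member-class 'm3 'b2 'story))
    (believe *m3*)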

4.2 The Utterance Level

For all inputs, the parser produces a representation of its surface content, which the system will add to its model as part of an occurrence of an event of type say. This representation describes the syntactic structure of the utterance along with some semantic annotations, such as MEMBER-CLASS information. The grammar has broad linguistic coverage (comparable to Winograd [1983]) to accommodate the variety of structures that people might use across a variety of domains. The grammar also parses and builds representations for utterances that are fragmentary and ambiguous. Such utterances often require extensive reasoning about the domain or the discourse model to fully resolve; the system avoids unnecessary inference by producing a mixed-depth representation.

A mixed-depth representation is one that may be shallow or deep in different places, depending on what was known or needed at the time the representation was created [Hirst and Ryan, 1992]. Moreover, "shallow" and "deep" are a matter of degree. A shallow representation can be a surface syntactic structure, or it can be the text itself (as a string of characters). A deep representation might be a conventional first-order (or higher-order) AI knowledge representation, taking into account such aspects of language understanding as lexical disambiguation, marking case relations, attachment of modifiers of uncertain placement, reference resolution, quantifier scoping, and distinguishing extensional, intensional, generic, and descriptive noun phrases. Unlike quasi-logical form, which is used primarily for storage of information, mixed-depth representations are well-formed propositions, subject to logical inference. Disambiguation, when it occurs, is done by reasoning.

In B2, deeper semantic analysis (which makes use of domain-specific knowledge) will occur during the interpretation phase (to be described below), if it is needed. At that stage, the system would add additional annotations to account for domain-specific referring expressions (e.g., the system might recognize that a story refers to a description of a medical case) or other underspecified meaning relations (e.g., the system might recognize that a possessive relation such as the patient's arm refers to a PART-OF relation, rather than a KINSHIP relation, such as in the patient's mother).

At the utterance level, the content of the user's utterance is always represented by what she said literally. For example, in the medical tutoring domain, the student may request a story problem directly, as an imperative sentence ("Tell me a story.") or indirectly, as a declarative sentence that expresses a desire ("I want you to tell me a story."). Figures 4-7 include representations for the following examples (an imperative sentence, a declarative sentence, an interrogative sentence, and an interrogative sentence where the syntactic object has been moved to the front):

1. Tell me a story. (See Figure 4.)
2. I want you to tell me a story. (See Figure 5.)
3. What is the best test? (See Figure 6.)
4. What does HIDA detect? (See Figure 7.)

Consider the example shown in Figure 4. In the network, square nodes correspond to potential discourse entities; round nodes are propositional concepts. Node B3 is the discourse entity corresponding to the utterance itself. Node M5 is a case frame that represents the structure of this utterance, which includes its syntactic form (imperative), the mental attitude that it expresses (action), and its content (which is represented by node M4). Node M4 is the proposition that the hearer (B1) is to perform the action described by the node M6. Node M6 indicates the type of the action, tell (B4), the object of the action (B2), and the indirect object (recipient) of the action (B5). Nodes B1, B2, and B5 represent the discourse
entities expressed by the noun phrases corresponding to the subject (the implied hearer), the object (a story), and the indirect object (me), respectively. Nodes M8 and M13 (attached to B1 and B5) are case frames to indicate that the noun phrases have been expressed as a pronoun. Attached to node B2, there is a syntactic case frame that expresses that there is a determiner, a, and a semantic case frame that expresses that the entity B2 is a member of a class story. Throughout the figure, arcs marked LEX are used to indicate the word that the speaker used; word sense discrimination is achieved by attaching additional information to the appropriate discourse entity (e.g., B2's interpretation as a medical case).

Footnote 7: Attitude is the term that we use to indicate whether the content of the sentence describes a mental state (believe, want, or desire), a state of affairs (be, have), or an event (action). Attitude is indicated by the meaning of the main verb.
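The following is a minimal sketch of the utterance-level structure just walked through, written as nested Common Lisp data rather than as a SNePS network; the node names follow Figure 4, but the plist layout itself is an assumption made for illustration.

    ;; Minimal sketch (not the SNePS network of Figure 4): the utterance-level
    ;; content of "Tell me a story." as nested case frames.  Only the surface
    ;; form is captured; interpretation happens later (Section 4.4).
    (defparameter *tell-me-a-story*
      '(:utterance b3
        :form :imperative
        :attitude :action
        :content (:agent (b1 :pronoun "hearer")            ; the implied hearer
                  :act (:action (b4 :lex "tell")
                        :object (b2 :det "a" :member-of "story")
                        :iobject (b5 :pronoun "me")))))

    ;; The indirect request "I want you to tell me a story." differs only at
    ;; this level (declarative form, attitude DESIRE, with the above embedded
    ;; as its complement); both later receive the same interpretation.
    (getf *tell-me-a-story* :form)   ; => :IMPERATIVE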


Figure 4: Node B3 represents an utterance whose form is imperative, and whose content (M4) is the proposition that the hearer (B1) will tell a story (B2) to the speaker (B5).

Figure 5: Node B3 represents an utterance whose form is declarative and whose content (which contains an embedded utterance) is that the speaker wants the hearer to tell a story to the speaker.

Figure 6: Node B3 represents an utterance whose form is interrogative and whose content is a proposition that the object designated by what (marked as the queried item) is a type of best test.

Figure 7: Node B2 represents an utterance whose form is interrogative and whose content is a proposition that HIDA detects an object designated by what.

4.3 Sequence of Utterances

The second level corresponds to the sequence of utterances. (This level is comparable to the linguistic structure in the tripartite model of [Grosz and Sidner, 1986].) In the semantic network, we represent the sequencing of utterances explicitly, with asserted propositions that use the BEFORE-AFTER case frame. The order in which utterances occurred (system and user) can be determined by traversing these structures.

1 B2: What is the best test to rule in gallstones?
2 User: HIDA
3 B2: No, it is not the case that HIDA is the best test to rule in gallstones.
4 B2: CT is the best test to rule in gallstones.

Figure 8: A Sequence of Utterances

Consider the example dialog shown in Figure 8 and its (partially glossed) representation, shown in Figure 9. In Figure 9, asserted propositional nodes M99, M103, M119, and M122 represent the events of utterances 1, 2, 3, and 4, respectively. For example, node M103 asserts that the user said HIDA. Asserted nodes M100, M104, M120, and M123 encode the order of the utterances. For example, node M120 asserts that the event of the user saying HIDA (M103) occurred just before the system said that HIDA is not the best test to rule in gallstones (M119).

Footnote 8: Propositional expressions that are written in italics (for example, ∃t. best-test(t, gallstones, jones-case)) represent subgraphs of the semantic network that are relevant, but that we have glossed for reasons of space.



Figure 9: Nodes M100, M104, M120, and M123 represent the sequence of utterances produced by the system and the user, shown in Figure 8. For example, Node M104 represents the proposition that the event M103 immediately followed event M99.
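A minimal sketch of this ordering machinery in plain Common Lisp, rather than the SNePS case frames: the proposition names are loosely modelled on Figure 9, but the list layout and function names are assumptions for illustration.

    ;; Minimal sketch (not the SNePS BEFORE-AFTER case frames): each ordering
    ;; proposition links one say-event to the one that immediately follows it,
    ;; and the utterance order is recovered by walking those links.
    (defparameter *before-after*
      '((o1 :before m99  :after m103)    ; cf. node M104 in Figure 9
        (o2 :before m103 :after m119)    ; cf. node M120
        (o3 :before m119 :after m122)))

    (defun next-event (event)
      "Return the say-event that immediately follows EVENT, or NIL."
      (loop for (nil . plist) in *before-after*
            when (eq (getf plist :before) event)
              return (getf plist :after)))

    (defun event-sequence (first-event)
      "Recover the order of the utterances by traversing the ordering links."
      (loop for e = first-event then (next-event e)
            while e collect e))

    ;; (event-sequence 'm99) => (M99 M103 M119 M122), i.e. utterances 1-4 of Figure 8.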

4.4 The Interpretation Level

In the third level, we represent the system's interpretation of each utterance. Each utterance event (from level 1) will have an associated system interpretation, which is represented using the INTERPRETATION-OF / INTERPRETATION case frame. For example, consider the interpretation of the utterance "Tell me a story." (as well as "I want you to tell me a story."), shown in Figure 10. The network indicates that the utterance has been interpreted as a request for the system to describe a case to the user. The system can distinguish multiple possible interpretations from the one that it believes is the correct one. It does this by building several case frames of this type, but only asserting one as believed. Changed beliefs are represented by unasserting certain nodes, while asserting others. (The knowledge representation framework incorporates a justification-based truth maintenance system to deal with contradictions.)


Figure 10: Node M31 is a proposition that the interpretation of Tell me a story (which is glossed in this figure) is M22. Node M22 is the proposition that the user requested that the system describe a case to the user. (Describing a case is a domain-specific action; the pronouns from the utterance level have been interpreted according to the context.)
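The following Common Lisp sketch illustrates the idea of holding several candidate interpretations of one utterance while asserting only one; it is an assumption-laden illustration, not the SNePS truth-maintenance machinery, and the alternative reading shown for "Tell me a story." is hypothetical.

    ;; Minimal sketch (not the SNePS/TMS implementation): candidate
    ;; interpretations of an utterance are built as separate propositions,
    ;; but only one is asserted (believed); a repair unasserts one candidate
    ;; and asserts another.
    (defstruct interp id of content (asserted nil))

    (defparameter *interps*
      (list (make-interp :id 'm22 :of "Tell me a story."
                         :content '(request (describe-case system user))
                         :asserted t)          ; the believed reading (Figure 10)
            (make-interp :id 'm22-alt :of "Tell me a story."
                         :content '(request (tell-narrative system user))
                         :asserted nil)))      ; a hypothetical alternative, left unasserted

    (defun reinterpret (old-id new-id)
      "Withdraw belief in one interpretation and adopt another in its place."
      (dolist (i *interps*)
        (cond ((eq (interp-id i) old-id) (setf (interp-asserted i) nil))
              ((eq (interp-id i) new-id) (setf (interp-asserted i) t)))))

    ;; (reinterpret 'm22 'm22-alt) would model a changed belief; in B2 the
    ;; contradiction bookkeeping is handled by a justification-based TMS.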

4.5 The Exchange and Exchange Interpretation Levels

The fourth and fifth levels of representation in the discourse model are exchanges and interpretations of exchanges, respectively. A conversational exchange is a pair of interpreted events that fits one of the conventional structures for dialog (e.g., QUESTION-ANSWER); it is related to the notion of adjacency pair [Schegloff and Sacks, 1973]. An exchange may comprise a sequence of utterances or it may have another exchange nested within it, as when one defers the answer to a question to ask a clarifying question. (See Figure 11.)

B2: What is the best test to rule in gallstones? (E1)
User: What are my choices? (E2)
B2: CT, HIDA, Ultrasound for Cholecystitis, and Ultrasound for Gallstones. (E2)
User: CT. (E1)

Figure 11: A pair of nested exchanges

The system's knowledge includes a specification of these structures and cues for recognizing them. When recognized, an exchange will be interpreted either as an acceptance, indicating that it has been understood and completed, or as a rejection, indicating that it was either abandoned or not understood. In cases where a repair is indicated, the system will step back through the history of exchanges to identify possible misunderstandings. (How this is done is discussed in Section 6.) Moreover, when an utterance cannot be interpreted as either the completion of an open exchange or the start of a new one, it will be taken as evidence of a failure in understanding.

Figure 12 gives the network representation of a conversational exchange and its interpretation. Node M113 represents the exchange in which the system has asked a question and the user has answered it. (This is determined by the type of action within each interpretation; the type names are indicated by nodes M7, an ask, and M106, an answer, respectively.) Using the MEMBER-CLASS case frame, propositional node M115 asserts that the node M113 is an exchange. Propositional node M112 represents the system's interpretation of this exchange: that the user has accepted the system's question (i.e., that the user has understood the question and requires no further clarification). Finally, propositional node M116 represents the system's belief that node M112 is the interpretation of the exchange represented by node M113.


Figure 12: Node M115 represents the proposition that node M113 is an exchange comprised of the events M99 and M108. M108 is the proposition that the user answered "HIDA is the best test to rule in Gallstones". Additionally, node M116 represents the proposition that the interpretation of M113 is event M112. M112 is the proposition that the user has accepted M96. (M96 is the question that the system asked in event M99.)
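A minimal Common Lisp sketch of the exchange level described above: an incoming interpretation either completes an open exchange or opens a new one, and a completed exchange is recorded as an acceptance. The pairing table and function names are assumptions for illustration, not the B2 code.

    ;; Minimal sketch (not the B2 implementation): conventional exchange
    ;; structures pair an opening act with the act that completes it; a
    ;; completed exchange is interpreted as an acceptance.
    (defparameter *exchange-types*
      '((ask . answer) (request . comply) (inform . agree)))

    (defstruct exchange opener (closer nil) (interpretation nil))

    (defun completes-p (opener-act closer-act)
      "True if CLOSER-ACT is the conventional second part for OPENER-ACT."
      (eq (cdr (assoc opener-act *exchange-types*)) closer-act))

    (defun extend-exchange (open-exchange act node)
      "Close OPEN-EXCHANGE with ACT if it fits; NIL means possible trouble."
      (when (completes-p (first (exchange-opener open-exchange)) act)
        (setf (exchange-closer open-exchange) (list act node)
              (exchange-interpretation open-exchange) 'accept)
        open-exchange))

    ;; Cf. Figure 12: the system's ASK (event M99) opens an exchange; the
    ;; user's ANSWER (M108) closes it, and the exchange is read as acceptance.
    (defparameter *e1* (make-exchange :opener '(ask m99)))
    (extend-exchange *e1* 'answer 'm108)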


4.6 Interaction among the Levels

An advantage of the network representation is the knowledge sharing between these five levels. We term this knowledge sharing associativity. This occurs because the representation is uniform and every concept is represented by a unique node. As a result, we can retrieve and make use of information that is represented in the network implicitly, by the arcs that connect propositional nodes. For example, if the system needed to explain why the user had said HIDA, it could follow the links from node M103 (shown in Figure 9) to the system's interpretation of that utterance, node M108 (shown in Figure 12), to determine that

- The user's utterance was understood as the answer within an exchange (node M113), and

- The user's answer indicated her acceptance and understanding of the discourse, up to that point (node M112).

This same representation could be used to explain why the system believed that the user had understood the system's question. This associativity in the network is vital if the interaction starts to fail, because the system will need to reassess its interpretation of the user's previous utterances or its beliefs about how the user has interpreted the system's utterances.
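The kind of link-following just described can be sketched in a few lines of Common Lisp; the arc table below mirrors the relations named in Figures 9 and 12, but the relation names and the FOLLOW helper are assumptions for this example, not SNePS path expressions.

    ;; Minimal sketch (not SNePS path-based inference): explicit arcs let the
    ;; system answer "why did the user say HIDA?" by traversal alone.
    (defparameter *arcs*
      ;; (from   relation           to)    node names follow Figures 9 and 12
      '((m108  interpretation-of  m103)    ; M108 interprets the utterance M103
        (m113  event2             m108)    ; the exchange M113 contains M108
        (m112  interpretation-of  m113)))  ; M112: that exchange was an acceptance

    (defun follow (relation to)
      "Return the nodes that stand in RELATION to the node TO."
      (loop for (from rel target) in *arcs*
            when (and (eq rel relation) (eq target to))
              collect from))

    ;; (follow 'interpretation-of 'm103) => (M108)   how the utterance was read
    ;; (follow 'event2 'm108)            => (M113)   the exchange it completes
    ;; (follow 'interpretation-of 'm113) => (M112)   ... which was an acceptance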

5 An Account of Coherence in Dialog

The discourse model described above represents the system's interpretation of what has occurred during its interaction with the user. For a system to allow for the possibility of misunderstanding, belief cannot be automatic. Instead, beliefs are conclusions (or assumptions) that it makes on the basis of whether an interpretation seems reasonable, given the
context of prior discourse. Moreover, such beliefs may later be withdrawn in the face of contradictory evidence.

McRoy and Hirst [1995a, 1995b] provide a computational theory of coherence in dialog that accounts for factors such as social expectations (including conversational exchanges), linguistic information (including the link between discourse acts and the beliefs that they express), and mental attitudes, such as belief, desire, or intention. According to this theory, an action is consistent with a discourse only if the beliefs that it expresses are consistent with the beliefs expressed by prior actions. Together, these different factors enable a system to distinguish actions that are reasonable, although not fully expected (such as an incorrect answer or a clarification question), from actions that are indications of misunderstanding.

The core of this theory includes six general schema for interpreting an exchange relationship. These schema apply to dialogs in which two dialog participants share initiative, each acting as both actor and addressee. The definitions of the schema assume that a discourse model is a representation of the subjective perspective of a dialog participant. For clarity, the definitions will be given in terms of the system and the user, assuming the system's understanding of the dialog and assuming that the system has the initiative. These same definitions will apply when the user has initiative.

Footnote 9: The definitions would also apply from the user's point of view; however, her interpretations would differ from the system's when there had been a misunderstanding. McRoy [1993] discusses this in detail, showing the representations that result when two copies of the system communicate with each other.

Plan adoption

In plan adoption, the system begins a new conversational exchange. An utterance counts as a coherent attempt to initiate a conversational exchange if:

- There is some action that the system wants the user to do.

- This action would be the expected response to the utterance.

There is an additional coherence constraint that both the beliefs that are expressed by the system's utterance and those that would be expressed by the user's response should be consistent with the beliefs already expressed in the discourse. For example, if the system asks a question, such as "What is the best test to rule in gallstones?", it is plan adoption, because the question creates an expectation for an answer. (Similarly, it is plan adoption if the student makes a request for the system to "Tell me a story." or asks the system "Why?".)

Footnote 10: The derivation of expressed beliefs from the discourse model is a process that is still under refinement. Minimally it includes implicatures, as discussed in Hinkelman [1990].

Acceptance

In acceptance, the user continues a conversational exchange in a way that affirms the previous utterance's role in the context. For example, providing the answer to a question legitimizes the question and displays the user's understanding of it. An action counts as acceptance if there is another action in the discourse that has created an expectation for this one. There is an additional constraint that the beliefs expressed by the actor must be consistent with those already expressed in the discourse. An example of acceptance would be when the student responds to a question, such as "What is the best test to rule in gallstones?", with an answer, such as "HIDA" or "I don't know.".

Other-misunderstanding

In other-misunderstanding, the user's utterance can be explained by the system as a manifestation of a misunderstanding by the user. A user's utterance manifests her
misunderstanding, if it does not display acceptance, but there is an alternative interpretation of an earlier utterance by the system for which it would display acceptance. An other-misunderstanding, if detected, normally leads to a third-turn repair (see Figure 13, which is described below). In the figure, utterance 2 can be taken to manifest the user's misunderstanding of the system's question, because the system was expecting an answer like "The patient does not have gallstones.", and the user's utterance would make sense if the referring expression a negative ultrasound were replaced by the one given by the user, an ultrasound.

Self-misunderstanding

In self-misunderstanding, the user's action can be explained as the result of a misunderstanding by the system. An action can be counted as a sign that the system has misunderstood, if:

- The user's action is inconsistent with the beliefs already expressed in the discourse by some earlier utterance.



- The prior utterance allows an alternative interpretation that renders the user's action consistent.

A self-misunderstanding, if detected, could lead the system to produce a fourth-turn repair (see Figure 14, which is described below), or the system could choose to revise its beliefs silently. In the example of Figure 14, the system interprets utterance 3 as an indication of self-misunderstanding, because it was expecting the user to acknowledge that her question (utterance 1) had been accepted and answered by utterance 2.


Third-turn repair

In third-turn repair, the system initiates a repair to be performed by the user. A third-turn repair is indicated when there has been a misunderstanding by the user (an other-misunderstanding). There is an additional constraint that the fulfillment of the system's original intention must still be possible.

Footnote 11: The terminology of third-turn repair and fourth-turn repair is from Schegloff [1992]. Turns are counted from the location of the problematic turn. Hence, a third-turn repair is one that is initiated by an actor after the addressee has produced a response that appears to demonstrate that the addressee has misunderstood. A fourth-turn repair is one that is initiated by an actor after misunderstanding some action (turn 1) that was performed by the addressee, then producing a response (turn 2), and then receiving feedback (turn 3) that indicates that there was trouble earlier. Note that each participant will have her own perspective on the problem. A third-turn repair by one participant can trigger a fourth-turn repair by the other. (This interpretation of events is slightly different from Schegloff's original, which takes the view of the analyst, or an overhearer, of a conversation.)

1 B2: Given a pretest probability of gallstones of .135, what would a negative ultrasound suggest?
2 User: An ultrasound raises the probability of gallstones.
3 B2: No. What would a negative test suggest?

Figure 13: A third-turn repair

An example of a third-turn repair by B2 is shown in Figure 13. From the system's perspective, the user has apparently misread the question, confusing a negative ultrasound with a positive one. B2 attempts to recover by re-asking its question, emphasizing the salient term.

Fourth-turn repair

In fourth-turn repair, the system initiates a self-repair, after a self-misunderstanding. A fourth-turn repair is indicated when there has been a misunderstanding by the system. There is an additional constraint that the fulfillment of the user's original intention must still be possible.


An example of a third-turn repair initiated by the user, followed by a fourth-turn repair by B2, is shown in Figure 14. In this example, the user requests an explanation. The system interprets it as a request to explain the result with respect to the current case, which includes the probability relationships. However, the user rejects this interpretation, and indicates that she wants the more general explanation, which includes the underlying cause. When utterance 2 is rejected, B2 recognizes self-misunderstanding of utterance 1, and produces the other type of explanation.

1 User: Why does a positive HIDA suggest gallstones?
2 B2: In the case of Mr Jones, the pretest probability of gallstones is 0.135. A positive HIDA test results in a post-test probability of 0.307.
3 User: I mean for what reason.
4 B2: Oh. HIDA detects cholecystitis, which is caused by gallstones.

Figure 14: A dialog that illustrates a fourth-turn repair by B2

In our system, these schema guide the construction of the upper levels of the discourse model. The representation of discourse facilitates the reasoning described by these schema by allowing the system to represent all this information in the same network and by providing mechanisms for reasoning over the information efficiently.
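Read procedurally, the schemata above amount to an ordered set of tests applied to each new action. The Common Lisp sketch below makes one such ordering explicit; the keyword flags stand in for the tests described in the text, and the ordering itself is an assumption for illustration, not the RRM axiomatization.

    ;; Minimal sketch (not the RRM axiomatization): the coherence schemata of
    ;; Section 5 as an ordered classification of a new action.  The keyword
    ;; arguments are stand-ins for the tests described in the text.
    (defun explain-action (&key expected consistent opens-exchange
                                user-alt-reading system-alt-reading)
      "Name the schema that explains an action with the given properties."
      (cond ((and expected consistent) 'acceptance)
            (opens-exchange            'plan-adoption)
            (user-alt-reading          'other-misunderstanding)   ; leads to a third-turn repair
            (system-alt-reading        'self-misunderstanding)    ; leads to a fourth-turn repair
            (t                         'non-understanding)))

    ;; Utterance 3 of Figure 14 ("I mean for what reason.") is not the expected
    ;; acknowledgement and opens no new exchange, but utterance 1 admits an
    ;; alternative reading by the system:
    (explain-action :system-alt-reading t)   ; => SELF-MISUNDERSTANDING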

6 An Example

We will now consider the example shown in Figure 14 in greater detail. Figure 15 illustrates the utterance sequence, interpretation, and exchange levels of representation that would
result after this conversation. Starting from the top of the figure, we see the following:

Footnote 12: Currently, our system produces simplistic natural language output, rather than the output shown. We are still working on improving the text planning part of our system.

[Figure 15, glossed: the utterance-level boxes repeat the four utterances of Figure 14; the interpretation-level boxes gloss them as "The user requests that the system explain the probability relation between HIDA and Gallstones" (M141), "The system tells the user the probability relations" (M151), "The user requests a repair, i.e., an alternative interpretation of an earlier utterance" (M161), and "The system tells the user the causality relations between HIDA and Gallstones" (M171); the repaired reading of utterance 1, "The user requests that the system explain the causality relation between HIDA and Gallstones", is M162.]

Figure 15: This figure shows (a gloss of) the utterance level representations, the utterance sequence-level representations, (a gloss of) the interpretation level representations, and the exchange-level representations for a dialog containing a fourth-turn repair.

- Nodes M150, M160, and M170 represent the sequencing relations that hold between utterances 1-4. (The first row of boxes glosses the subnetworks corresponding to the utterance representations, which would be similar to those discussed in Section 4.2.) The sequence nodes would be constructed using the utterance level representations produced by the parser.

- Nodes M141, M151, M161, and M171 represent the interpretations that B2 gives to utterances 1-4, at the time that it parsed them. (The second row of boxes glosses the subnetworks corresponding to the interpretations, which would be similar to those discussed in Section 4.4.) The interpretations themselves correspond to representations of the speaker's actions on the domain or the discourse; these interpretations are derived from the utterance representations on the basis of linguistic information, social conventions, and domain-specific plans. For example, to derive the interpretation of M141, the system reasons that when the user asks a why-question about an unspecified relation between two concepts in the Bayesian network, then it can be interpreted as a request for the system to perform the action describe-probability-chain on the two nodes. (Other interpretations are possible; the system need only find one that applies.) In the terminology of the previous section, M141 is explained as a case of plan-adoption, M151 is explained as acceptance, M161 is a self-misunderstanding (of utterance 1), and M171 is a fourth-turn repair, which also acts as the acceptance for the interpretation of M141.



- Nodes M152 and M172 identify the exchange structure of utterances 1 and 2 (the original interpretation) and the exchange structure of utterances 1 and 4 (the repaired interpretation). When a node is recognized as the acceptance of another node, those two nodes can be taken as a complete exchange. In the figure, M153 and M173 are propositions that the two structures, respectively, are indeed exchanges.


The exchange structure is significant because the system will step back through this structure if it needs to reason about alternative interpretations. The exchange structure indicates how each speaker displayed their understanding of the other's previous utterances. (The utterance sequence will not always provide this information because exchanges can be nested inside each other, e.g., to ask a clarifying question.)

The interpretation of M161 is special, because it neither begins a new exchange nor completes an open one. (This would be determined by its linguistic form and by the expectations created by the previous interaction.) When the system fails to find a domain plan (e.g., a request to display part of the network) or a discourse plan (e.g., a request to clarify a previous utterance or an answer to a question from the system), then it considers this to be evidence of failure. In this case, the surface form of the utterance suggests looking back in the conversation to consider an alternative domain plan. When an utterance that admits an alternative interpretation (corresponding to an alternative domain plan) is found, a new interpretation (M162) is constructed, which results in a repair action by the system to accept it. (If there had been no alternative domain plan, then the system would not be able to form any interpretation of utterance 3, a case of non-understanding, and would subsequently ask the user to provide more information about the problem.)

The result of dialog processing is thus a detailed network of propositions that indicates the content of the utterances produced by the system or the user, their role in the interaction, and the system's belief about what has been understood. If necessary, the system will be able to explain why it produced the utterances that it did and recover from situations where communication has failed.
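The walk-back procedure just described can be sketched as follows in Common Lisp; the structure and names are illustrative assumptions (in particular, DESCRIBE-CAUSAL-CHAIN is a hypothetical label for the alternative plan), not the B2 code itself.

    ;; Minimal sketch (not the B2 algorithm): when an utterance fits no open
    ;; exchange and no new plan, step back through the exchange structure for
    ;; an earlier utterance that admits an alternative reading, and repair by
    ;; adopting that reading (cf. M162 in Figure 15).
    (defstruct xchg opener (alternative nil))  ; ALTERNATIVE: another reading of the opener, if any

    (defun find-repair (exchanges)
      "Return (opener alternative-reading) for the most recent repairable exchange."
      (loop for e in (reverse exchanges)       ; most recent exchange first
            when (xchg-alternative e)
              return (list (xchg-opener e) (xchg-alternative e))))

    ;; The exchange built for utterances 1-2 of Figure 14: utterance 1 was read
    ;; as a request to DESCRIBE-PROBABILITY-CHAIN, but it also admits a
    ;; causal-explanation reading (the name DESCRIBE-CAUSAL-CHAIN is hypothetical).
    (defparameter *history*
      (list (make-xchg :opener '(m141 describe-probability-chain)
                       :alternative '(m162 describe-causal-chain))))
    (find-repair *history*)   ; => ((M141 DESCRIBE-PROBABILITY-CHAIN) (M162 DESCRIBE-CAUSAL-CHAIN))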


7 Conclusion

When a system allows users the flexibility to initiate tasks or to ask follow-up questions, it must allow for possible misunderstanding. Such a system will need to examine the evidence provided by the user's actions over a sequence of interactions, to verify that understanding has occurred. A system will also need the ability to deviate from its planned sequence of actions to address apparent failures.

Here we considered an approach to robust human-computer interaction that relies on an explicit, declarative representation of the content and structure of the interaction. The representation includes five levels:

- The utterance level represents the linguistic or graphical form of communication between the user and the system.

- The utterance sequence level represents the temporal ordering of utterances.

- The interpretation level represents the communicative role that the utterance achieves and the domain plan that it invokes.

- The exchange level represents the socially-defined relationship that determines whether a communicative act has been accepted or understood.

- The exchange interpretation-level represents (explicitly) the acceptance or failure of an exchange.

Although these structures may grow quite large, two aspects of the representation have made it tractable for an interactive system. First, there is knowledge sharing or associativity between the structures in each of the five levels. This occurs because the representation is uniform and every concept is represented by a unique node. As a result, we can retrieve and
make use of information that is represented in the network implicitly, by the arcs that connect propositional nodes. Second, the system uses mixed-depth representations, which allows it to defer building and elaborating interpretations until the information is most readily available and only when it is actually needed.

Building such a representation over the course of an interaction allows the system to recognize and repair misunderstandings, because it has a record of why it invoked the plan that it did and what evidence it had that it was successful. If necessary, it can later reevaluate this evidence in light of new, possibly contradictory (or unexplainable) observations. The construction and use of the representation is facilitated by using a framework that allows us to represent all this information in the same network and by providing mechanisms for reasoning over the information efficiently.

Acknowledgments

Thanks are due to Susan Haller and Syed Ali for their contributions to the work on B2 and for their comments on this paper. Thanks also to Graeme Hirst for his contributions to the work on miscommunication. This work has been supported by the University of Wisconsin-Milwaukee and the National Science Foundation, IRI-9701617.

References

[Ali et al., 1997] Syed S. Ali, Susan W. McRoy, and Susan M. Haller. Mixed-depth representations for dialog processing. 1997. Submitted for publication.

[Ali, 1994a] Syed S. Ali. A Logical Language for Natural Language Processing. In Proceedings of the 10th Biennial Canadian Artificial Intelligence Conference, pages 187-196, Banff, Alberta, Canada, May 16-20 1994.

[Ali, 1994b] Syed S. Ali. A "Natural Logic" for Natural Language Processing and Knowledge Representation. PhD thesis, State University of New York at Buffalo, Computer Science, January 1994.

[Allen, 1983] James Allen. Recognizing intentions from natural language utterances. In Michael Brady, Robert C. Berwick, and James Allen, editors, Computational Models of Discourse, pages 107-166. The MIT Press, 1983.

[Ardissono et al., 1998] Liliana Ardissono, Guido Boella, and Rossana Damiano. A plan-based model of misunderstandings in cooperative dialogue. International Journal of Human-Computer Studies/Knowledge Acquisition, 1998.

[Calistri-Yeh, 1991] Randall J. Calistri-Yeh. Utilizing user models to handle ambiguity and misconceptions in robust plan recognition. User Modeling and User-Adapted Interaction, 1(4):289-322, 1991.

[Carberry, 1990] Sandra Carberry. Plan Recognition in Natural Language Dialogue. The MIT Press, Cambridge, MA, 1990.

[Eller and Carberry, 1992] Rhonda Eller and Sandra Carberry. A meta-rule approach to flexible plan recognition in dialogue. User Modeling and User-Adapted Interaction, 2(1-2):27-53, 1992.

[Fikes and Nilsson, 1971] R. E. Fikes and Nils J. Nilsson. STRIPS: A new approach to the application of theorem proving to problem solving. Artificial Intelligence, 2:189-208, 1971.

[Goodman, 1985] Bradley Goodman. Repairing reference identification failures by relaxation. In The 23rd Annual Meeting of the Association for Computational Linguistics: Proceedings of the Conference, pages 204-217, Chicago, 1985.

[Grosz and Sidner, 1986] B. J. Grosz and C. L. Sidner. Attention, intentions, and the structure of discourse. Computational Linguistics, 12, 1986.

[Hinkelman, 1990] Elizabeth Ann Hinkelman. Linguistic and Pragmatic Constraints on Utterance Interpretation. PhD thesis, Department of Computer Science, University of Rochester, Rochester, New York, 1990. Published as University of Rochester Computer Science Technical Report 288, May 1990.

[Hirst and Ryan, 1992] Graeme Hirst and Mark Ryan. Mixed-depth representations for natural language text. In Paul Jacobs, editor, Text-Based Intelligent Systems. Lawrence Erlbaum Associates, 1992.

[Lambert and Carberry, 1991] Lynn Lambert and Sandra Carberry. A tri-partite plan-based model of dialogue. In 29th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, pages 47-54, Berkeley, CA, 1991.

[Litman, 1986] Diane J. Litman. Linguistic coherence: A plan-based alternative. In 24th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, pages 215-223, New York, 1986.

[Maida and Shapiro, 1982] A. S. Maida and S. C. Shapiro. Intensional concepts in propositional semantic networks. Cognitive Science, 6(4):291-330, 1982. Reprinted in R. J. Brachman and H. J. Levesque, eds., Readings in Knowledge Representation, Morgan Kaufmann, Los Altos, CA, 1985, 170-189.

[McCoy, 1985] Kathleen F. McCoy. The role of perspective in responding to property misconceptions. In Proceedings of the Ninth International Joint Conference on Artificial Intelligence, volume 2, pages 791-793, 1985.

[McRoy and Hirst, 1993] Susan W. McRoy and Graeme Hirst. Abductive explanation of dialogue misunderstandings. In 6th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, pages 277-286, Utrecht, The Netherlands, 1993.

[McRoy and Hirst, 1995a] Susan W. McRoy and Graeme Hirst. The repair of speech act misunderstandings by abductive inference. Computational Linguistics, 21(4):435-478, December 1995.

[McRoy and Hirst, 1995b] Susan W. McRoy and Graeme Hirst. The repair of speech act misunderstandings by abductive inference. Computational Linguistics, 21(4):435-478, December 1995.

[McRoy et al., 1997] Susan McRoy, Susan Haller, and Syed Ali. Uniform knowledge representation for NLP in the B2 system. Journal of Natural Language Engineering, 3(2), 1997.

[McRoy, 1993] Susan W. McRoy. Abductive Interpretation and Reinterpretation of Natural Language Utterances. PhD thesis, Department of Computer Science, University of Toronto, Toronto, Canada, 1993. Available as CSRI Technical Report No. 288, University of Toronto, Department of Computer Science.

[McRoy, 1995a] Susan W. McRoy. Misunderstanding and the negotiation of meaning. Knowledge-based Systems, 8(2-3):126-134, 1995.

[McRoy, 1995b] Susan W. McRoy. Misunderstanding and the negotiation of meaning. Knowledge-based Systems, 1995.

[Novick and Ward, 1993] David Novick and Karen Ward. Mutual beliefs of multiple conversants: A computational model of collaboration in air traffic control. In Proceedings, National Conference on Artificial Intelligence (AAAI-93), 1993.

[Schegloff and Sacks, 1973] Emanuel A. Schegloff and Harvey Sacks. Opening up closings. Semiotica, 7:289-327, 1973.

[Schegloff, 1992] Emanuel A. Schegloff. Repair after next turn: The last structurally provided defense of intersubjectivity in conversation. American Journal of Sociology, 97(5):1295-1345, 1992.

[Shapiro and Group, 1992] Stuart C. Shapiro and The SNePS Implementation Group. SNePS-2.1 User's Manual. Department of Computer Science, SUNY at Buffalo, 1992.

[Shapiro and Rapaport, 1987] S. C. Shapiro and W. J. Rapaport. SNePS considered as a fully intensional propositional semantic network. In N. Cercone and G. McCalla, editors, The Knowledge Frontier, pages 263-315. Springer-Verlag, New York, 1987.

[Shapiro and Rapaport, 1992] Stuart C. Shapiro and William J. Rapaport. The SNePS family. Computers & Mathematics with Applications, 23(2-5), 1992.

[Shapiro et al., 1994] Stuart Shapiro, William Rapaport, Sung-Hye Cho, Joongmin Choi, Elissa Feit, Susan Haller, Jason Kankiewicz, and Deepak Kumar. A dictionary of SNePS case frames, 1994.

[Winograd, 1983] Terry Winograd. Language as a Cognitive Process, Volume 1: Syntax. Addison-Wesley, Reading, MA, 1983.

[Zukerman, 1991] Ingrid Zukerman. Avoiding mis-communications in concept explanations. In Proceedings of the 13th Annual Conference of the Cognitive Science Society, pages 406-411, Chicago, IL, 1991.
