Building Spoken Language Collaborative Interface Agents

Building Spoken Language Collaborative Interface Agents Candace L. Sidner, Carolyn Boettner Lotus Development Corporation 55 Cambridge Parkway Cambridge, MA 02142 USA [email protected], carolyn [email protected] Charles Rich MERL–A Mitsubishi Electric Research Lab 201 Broadway Cambridge, MA 02139 USA [email protected] March 15, 2000 Abstract This paper reports on the development of two spoken language collaborative interface agents built with the Collagen system. It presents sample dialogues with the agents working with email applications and meeting planning applications, and discusses how these applications were created. It also discusses limitations and benefits of this approach. An abbreviated version of this paper appears as “Lessons Learned in Building Spoken Language Collaborative Interface Agents,” by Candace L. Sidner, Carolyn Boettner and Charles Rich, Workshop on Conversational Systems, ANLP-NAACL 2000, Seattle, WA, May, 2000.

1

Introduction

The underlying premise of the CollagenTM (for Collaborative agent) project is that software agents, when they interact with people, should be governed by the same principles that govern human-tohuman collaboration. To determine the principles governing human collaboration, we have relied on research in computational linguistics on collaborative discourse, specifically within the SharedPlan framework of Grosz and Sidner [2, 3, 4, 6]. This work has provided us with a computationallyspecified theory that has been empirically validated across a range of human tasks. We have implemented the algorithms and information structures of this theory in the form of a Java middleware component, a collaboration manager called Collagen, which software developers can use to implement a collaborative interface agent for any Java application. In the collaborative interface agent paradigm, illustrated abstractly in Figure 1, a software agent is able to both communicate with and observe the actions of a user on a shared application interface,

1

User

Agent communicate

observe

observe

interact

interact

Application

Figure 1. Collaborative interface agent paradigm.

Figure 2. Graphical interface for Collagen email agent.

2

and vice versa. The software agent in this paradigm takes an active role in joint problem solving, including advising the user when he gets stuck, suggesting what to do next when he gets lost, and taking care of low-level details after a high-level decision is made. The screenshot in Figure 2 shows how the collaborative interface agent paradigm is concretely realized on a user’s display. The large window in the background is the shared application, in this case, the Lotus eSuiteTM email program. The two smaller overlapping windows in the corners of the screen are the agent’s and user’s home windows, through which they communicate with each other. A key benefit of using Collagen to build an interface agent is that the collaboration manager automatically constructs a structured history of the user’s and agent’s activities. This segmented interaction history is hierarchically organized according to the goal structure of the application tasks. Among other things, this history can help re-orient the user when he gets confused or after an extended absence. It also supports high-level, task-oriented transformations, such as returning to an earlier goal. Figure 3 shows a sample segmented interaction history for the an email interaction. To apply Collagen to a particular application, the application developer must provide an abstract model of the tasks for which the application software will be used. This knowledge is formalized in a recipe library, which is then automatically compiled for use by the interface agent. This approach also allows us to easily vary an agent’s level of initiative from very passive to very active, using the same task model. For more details on the internal architecture of Collagen, see [8]. We have developed prototype interface agents using Collagen for several applications, including air travel planning [8], resource allocation, industrial control, and common PC desktop activities.

2

Prototype 1: A Collaborative Email Agent

The email agent [5] is the first Collagen-based agent we have built that supports spoken-language interaction. Our other agents avoided the need for natural language understanding by presenting the user with a dynamically-changing menu of expected utterances, which was generated from the current discourse state according to the predictions of the SharedPlan theory. Sample menus are displayed in Figure 2. The email agent, however, incorporates a speech and natural language sentence understanding system for understanding email related utterances, developed by IBM Research. This capability allowa users to collaborate either entirely in speech or with a mixture of speech and graphical actions. In addition, we have developed the Lotus NotesTM meeting planning agent, which incorporates speech and sentence level understanding using the Java Speech API, as implemented by IBM. The JSAPI toolkit provides a parser, which we use with a vocabulary and grammar we developed for the domain of meeting planning. The tags produced by the Java Speech parser are interpreted with a set of semantic rules that produce internal structures used by the Collagen agent. This agent is discussed in more detail in the next section.

2.1

A Session with a Collaborative Email Agent

With the email application, the user can read, compose and send messages as one typically does with email. The Collagen email agent, called Daffy, watches the user interaction. It can perform a few email actions on its own (such as opening and closing windows, and filling in the user’s signature on email) and can also undertake actions that the user requests in spoken utterances. In the sample session shown in Figure 4, the agent keeps a todo list for the user, explains how to accomplish email tasks for a user who is new to email, answers questions about what actions were taken in the interaction and offers suggestions about what to do next in the interaction.

3

USER: "Your turn." Done checking our to-do list. AGENT: "Let’s check our to-do list." USER: "Ok." AGENT: "The following items are on our to-do list." AGENT: "Work on email." AGENT: "Prepare a presentation about Java Beans." AGENT: "That’s all." USER: "Ok." Done working on email. AGENT: "Let’s work on email." Done explaining how to work on email. USER: "What should I do?" AGENT: "First, you reed a message." USER: "Ok." AGENT: "Then, you might store, print or delete the message." Done reading message from Brian Gorwin about WEEKLY SPEECH GROUP .... Done explaining how to read a message. Done agent selecting message from Brian Gorwin about WEEKLY SPEECH GROUP .... USER: "Select message from Brian Gorwin about WEEKLY SPEECH GROUP ...." AGENT: Select message from Brian Gorwin about WEEKLY SPEECH GROUP .... AGENT: Open message from Brian Gorwin about WEEKLY SPEECH GROUP SEMINAR. Done using the contents of message from Brian Gorwin about WEEKLY SPEECH .... AGENT: "Shall we use the contents of message from Brian Gorwin about WEEKLY ...?" Done scheduling meeting with Brian at three P.M. Thursday by scheduling. Done agent setting up meeting with Brian. USER: "Set up meeting with Brian." AGENT: Set up meeting with Brian. Done agent scheduling event tentatively at three P.M. Thursday. AGENT: "Let’s schedule event tentatively at a time." USER: "Schedule event tentatively at three P.M. Thursday." AGENT: Schedule event tentatively at three P.M. Thursday. AGENT: "There is a conflict of meeting with Brian at three P.M. Thursday with meeting with Irene Landoz at three P.M. Thursday." AGENT: "Let’s remember to reschedule meeting with Irene Landoz at three P.M. Thursday." USER: "Ok." AGENT: Finalize meeting with Brian at three P.M. Thursday. AGENT: Close message from Brian Gorwin about WEEKLY SPEECH GROUP SEMINAR. Done reacting to message from Brian Gorwin about WEEKLY SPEECH .... AGENT: "Shall we react to message from Brian Gorwin about WEEKLY ...?" Done agent forwarding message from Brian Gorwin about WEEKLY ... to Steve Krantz. USER: "Forward message from Brian Gorwin about ... to Steve Krantz." Done user saying identify the recipient as Steve Krantz. AGENT: "Who is the recipient?" USER: "The recipient is Steve Krantz." AGENT: Forward message from Brian Gorwin about ... to Steve Krantz. Done filling in message from Candy Sidner about Fwd:WEEKLY SPEECH GROUP SEMINAR. AGENT: "Shall we fill in message from Candy Sidner about Fwd:WEEKLY ...?" USER: "Ok." AGENT: Add text to message from Candy Sidner about Fwd:WEEKLY .... Stopped ccing message from Candy Sidner about Fwd:WEEKLY SPEECH .... AGENT: "Shall we cc message from Candy Sidner about Fwd:WEEKLY SPEECH GROUP SEMINAR?" USER: "No." AGENT: "Ok, your turn." Done sending, canceling or saving this message as a draft. Done agent sending message from Candy Sidner about Fwd:WEEKLY SPEECH .... USER: "Send message from Candy Sidner about Fwd:WEEKLY SPEECH GROUP ...." AGENT: Send message from Candy Sidner about Fwd:WEEKLY SPEECH .... Done storing, printing or deleting this message. AGENT: "Shall we store, print or delete this message?" USER: "Did I forward message to Steve Krantz." AGENT: "Yes, you did forward message from Brian Gorwin about ... to Steve Krantz." USER: "Done." Done interruption agent checking history. USER: "Please check history." USER: "Done."

Figure 3. Sample segmented interaction history for the an email application interaction.

4

USER: AGENT: USER: AGENT: AGENT: AGENT: AGENT: AGENT: USER: AGENT: AGENT: USER: AGENT: AGENT: AGENT: AGENT: AGENT: AGENT: AGENT: AGENT: AGENT: USER: AGENT: AGENT: AGENT: USER: AGENT: AGENT: USER: AGENT: AGENT: AGENT: USER: AGENT: AGENT: USER: AGENT: USER: AGENT: AGENT: USER: AGENT: AGENT: USER: AGENT: USER: AGENT: AGENT: USER: AGENT: USER: AGENT: USER: AGENT: USER: USER:

gives turn to agent. "Let’s check our to-do list." "Ok." "The following items are on our to-do list." "Work on email." "Prepare a presentation about Java Beans." "That’s all." USER: "Ok." "Let’s work on email." "What should I do?" "First, you reed a message." USER: "Ok." "Then, you might store, print or delete the message." "How do I read a message?" Point to where you select a message. "First, you might select a message by clicking here." USER: "Ok." Point to where you open a message. "Next, you open the message by clicking here." USER: "Ok." Stop pointing. "Then, you might use the contents of the message." USER: "Ok." "Then, you might react to the message." USER: "Ok." Point to where you close a message. "Finally, you close the message by clicking "Close" on pop-up menu here." "Select the second message. Select message from Brian Gorwin about WEEKLY SPEECH GROUP SEMINAR. Open message from Brian Gorwin about WEEKLY SPEECH GROUP SEMINAR. "Shall we use the contents of message from Brian Gorwin about WEEKLY SPEECH GROUP SEMINAR?" "This is about a meeting with Brian." Set up meeting with Brian. "Let’s schedule event tentatively at a time." "Schedule the meeting at three P.M. Thursday." Schedule event tentatively at three P.M. Thursday. "There is a conflict of meeting with Brian at three P.M. Thursday with meeting with Irene Landoz at three P.M. Thursday." "Let’s remember to reschedule meeting with Irene Landoz at three P.M. Thursday." "Ok." Finalize meeting with Brian at three P.M. Thursday. "Shall we react to message from Brian Gorwin about WEEKLY SPEECH GROUP SEMINAR?" "Forward the message." "Who is the recipient?" "The recipient is Steve Krantz." Forward message from Brian Gorwin about WEEKLY SPEECH GROUP SEMINAR to Steve Krantz. "Shall we fill in message from Candy Sidner about Fwd:WEEKLY SPEECH GROUP SEMINAR?" "Ok." Add text to message from Candy Sidner about Fwd:WEEKLY SPEECH GROUP SEMINAR. "Shall we cc message from Candy Sidner about Fwd:WEEKLY SPEECH GROUP SEMINAR?" "No." "Ok, your turn." "Send the message." Send message from Candy Sidner about Fwd:WEEKLY SPEECH GROUP SEMINAR. "Shall we store, print or delete this message?" "Did I forward message to Steve Krantz." "Yes, you did forward message from Brian Gorwin about WEEKLY SPEECH GROUP SEMINAR to Steve Krantz." "Done." "Shall we work on email?" "No." "Ok, your turn." "Please check history." "Done."

Figure 4. A Session with a Collagen email agent (unsegmented).

5

While this conversation takes place completely with speech with the agent performing all the GUI actions, the user could have undertaken any of the GUI actions herself, including selecting messages and forwarding them. In that case, the agent would have recorded the GUI action as part of the conversation. In the figure, agent actions that are not spoken are indicated without quotation marks. For example, when the user says “Forward the message,” the agent’s contribution is to forward the message, as indicated without quotation marks. The session in Figure 4 can be further illustrated by viewing the actual screen the user and agent used at various points in the conversation. In Figure 5 the email application, user window and agent window are presented. The user window displays the words understood by the speech recognition system and also in a very small sub-window, the semantic interpretation produced by the natural language sentence interpretation module, developed by IBM Research. The conversation begins with Daffy proposing to check the to-do list, to which the user agrees. The utterances Daffy speaks through speech synthesis are also presented in textual form in the agent window. Daffy has a graphical “hand” with which it can point to objects on the screen. As shown in Figure 6, Daffy responds to the user’s question “How do I read a message?” by saying “First, you might select a message by clicking here” while pointing to a message in the inbox. Daffy performs similar pointing actions for other information but does not try to point for the description “reacting to a message” because there are several GUI actions (e.g. New Message, Reply and Forward) possible at that time. The reader will note that the semantic interpretation in Figure 6 is “open mail message( message = current”) which is a poor meaning for the utterance “How do I read a message?” For non-

Figure 5. The user and Daffy start a conversation by discussing a todo list.

6

Figure 6. Daffy points to a message to be selected.

Figure 7. Daffy selects and opens a message.

7

imperative utterances, the semantic interpretation produced by the sentence understanding module occasionally was faulty. The email agent does a simple check on the recognized words, and as a result may disregard the semantic interpretation. In effect, two semantic interpretations are thus undertaken for some utterances. While this is inefficient, it was necessary to recover some limits of the semantic interpretation module provided for Daffy. When the user requests that Daffy select the second message, Daffy does so, as shown in Figure 7. Daffy selects the message (highlighted at the top of the screen) and then, on its own decision making, also opens it. Daffy’s choice results from a recipe for reading email which stipulates that typically after selecting a message, it is opened, and from the fact that Daffy is authorized to open messages. Daffy is authorized only to do actions that have reversible effects. Actions, such as sending a message, are not reversible, and Daffy will not perform them without first obtaining consent from the user. After opening the message, Daffy asks if the user wants to use the contents of the message. Daffy’s recipes tell it that users can click on message text, go to URLs or schedule meetings at this point in reading mail. While Daffy has no capabilities to read the contents of messages, it can follow actions taken once the message is open. When the user tells Daffy that the message is about a meeting with Brian in Figure 8, Daffy concludes that a meeting could be scheduled, and proposes scheduling an event at a time (the “tentatively” phrase is Daffy’s way of indicating that this action is optional). Were the user to decline at this point, Daffy is prepared to go on with the mail reading task. Instead the user says to schedule the meeting at three p.m. Thursday, which Daffy does, and reports that there is a conflict with another meeting. Daffy proposes that the conflicted meeting with the user be rescheduled and the user agrees. Rescheduling the conflicted meeting goes on the todo list for later action. Later in the sample conversation, when the user requests that Daffy forward the message (that is, the currently selected message), Daffy does not immediately present a new forwarding template. Rather Daffy asks, as shown in Figure 9, who the recipient is. Upon receiving an answer, Daffy presents to the user the template with the recipient filled in. At the point in the sample session in Figure 4 when the user says “Please check history,” Daffy presents a segmented interaction history nearly identical to the one shown in Figure 3. The figure differs because the last three lines did not occur until after the user closed the segmented interaction history. The segmented interaction history recorded by the agent for this conversation (shown in Figure 3) does not record the actual words spoken by the user in some cases. The agent does not use the spoken words, but rather the semantic interpretation, which it stores as its understanding of what the user said. When it glosses these internal representations into readable English, the descriptions are slightly different than what the user actually said. The history also does not show any of the details of the conversation for explaining how to read a message; those details were deleted in order to accommodate fitting the interaction on the page.

2.2

How We Built the Email Collaborative Agent

The Collagen system architecture, displayed in Figure 10, provides a very general framework for any collaborative agent. Each box in that diagram is a Java Bean; beans communicate via the methods and events indicated on the arrows between the beans. This modularization has been invaluable and has greatly simplified our work. To create the Daffy email agent, we instantiated “Alib; AnotherLib” (which is the recipe library of the Collagen system) with a recipe library about email of about 55 actions and 32 recipes for doing those actions; the actions included GUI primitives such as sending a message, and high level actions such as reacting to a message. We also developed 8

Figure 8. Daffy proposes scheduling a meeting.

Figure 9. Daffy asks the user about the recipient of the message.

9

an adapter (a transducer), between Daffy and the email application that served to translate Daffy’s internal description of email actions to the calls required by the application program interface [1]. Creating a recipe library and and an adapter are two necessary modules that must be supplied when wiring a collaborative interface agent to an application. To speech-enable this architecture, we made two modifications. The first was to create an additional Java bean for the synthesis module of IBM Via VoiceTM to make it a listener for messages from the AgentHome bean, and to modify the AgentHome bean to be a source for the sythesis bean. This change allowed for every utterance that a Collagen agent normally communicated to the user via text in the agent window (an object in AgentHome) to also be sent to the synthesis bean and “spoken” by the synthesis component. Our second modification modified the UserHome bean to be a listener for the IBM speech recognition and sentence understanding system, discussed previously. This system provides semantic interpretations (meanings) of the individual utterances spoken by the user. We also modified the UserHome code to take the semantic interpretations and turn them into the internal language understood by the Collagen system.

3

Prototype 2: A Collaborative Agent for Planning Meetings

Our experience with Daffy convinced us that collaborative agents would be more useful if they not only understood the task that the user was performing with the interface but could undertake more of the user’s sub-goals and thus off-load some of the burden from the user. To explore this notion, we built Dotty, a collaborative interface agent, that works with a user who is planning a meeting ˙ the dialogue in Figure 11 demonstrates, Dotty is able with a customer and is using Lotus NotesTM As to take over many of the details of planning the meeting. Dotty’s abilities result from the general facility in the Collagen system to undertake a wide variety of clarification dialogues (see [8, 6] for further details).

3.1

A Session with a Collaborative Interface Agent for Planning Meetings

This dialogue concerns an overall goal of managing sales contacts and several sub-goals, including creating a profile for a customer (which is displayed to the user as a Notes document), scheduling a meeting with that customer (which the agent undertakes by itself using facilities in Lotus Notes), finding information about companies (which is displayed as a Notes document), and a brief discussion about planning a presentation. The conversation in Figure 11 begins with the Dotty agent proposing that the user and agent manage sales contacts. The user responds by looking at a document in a Notes database from which the user learns about the new CEO of Abbott Laboratories, as demonstrated in Figure 12. Dotty observes this Notes action but does not read the content of the document. Following reading the document, the user proposes that the user and agent create a profile for Miles White, as shown in the user window of Figure 13. Creating a profile is an activity that the agent can takeover from the user. The conversation in Figure 11 illustrates the agent asking two questions and then performing two actions (goto the profile for Abbott Labs and create the profile for Miles White). As shown in Figure 14, the agent next asks if the user would like to schedule a meeting with this individual; the user’s response is also presented. Dotty asks about scheduling because its recipe for managing contacts indicates that this is a possible though not necessary next step in the interaction.

10

COLLAGEN "BEAN WIRING DIAGRAM" graphical

event

ALib; AnotherLib

property/other method

getRecipes getRecipes

Collagen UserHome

uttered

AgentHome

History

getSelected

uttered

getAgenda

push pop add hold

Discourse

AnAgent

getAgenda getStack getHistory getHold

noticed

performed

performed

getCount getInstance getAnInt

AnAdapter

AnApplication

anEvent anotherEvent

AView

AModel

window events semantic events updates

Figure 10. The architecture of the Collagen system.

11

AGENT: USER: AGENT: USER: AGENT: USER: AGENT: AGENT: AGENT: USER: AGENT: AGENT: USER: AGENT: USER: AGENT: AGENT: USER: AGENT: AGENT: USER: AGENT: AGENT: USER: AGENT: AGENT: USER: AGENT: USER: AGENT: AGENT: USER: AGENT: AGENT: USER: AGENT: AGENT: USER: AGENT: AGENT: AGENT: USER: AGENT: AGENT: AGENT: AGENT: USER: AGENT: AGENT:

"Let’s manage sales contacts." "Let’s create profile for Miles White from Abbott Laboratories." "What is the eemail address?" "The emailAddress is [email protected]." "What is the workPhone?" "The workPhone is 6173329280." Goto profile for Abbott Laboratories. Create profile for Miles White. "Do you want to schedule a presentation for Miles White?" "Schedule the meeting for December 9 from 2 to 4 pm." Make meeting entry for Miles White on December 9. "Is the meeting here or at the customer site?" "The site is here." "Which building?" "The building is Rogers Street." Locate room 5003 Center in Rogers Street. "I’m going to reserve room 5003 Center in Rogers Street." "Ok." Reserve room 5003 Center in Rogers Street. "Do you need any special equipment?" "No." "Okay." "Who will attend the meeting?" "Invite Fred Smith and Jerry Johnson." Invite Fred Smith and Jerry Johnson. "Will the customer need consulting services?" "Ok." "Identifying the consulting staff." "Invite Pam Bridges from Lotus Consulting and Jimmy Barnes from IGS." Invite Pam Bridges and Jimmy Barnes. "The following people: Pam Bridges have a conflict with the meeting time.." "Instead invite Julie Knox." Invite Julie Knox instead of Pam Bridges. "I will get back to you to confirm this meeting." "Ok." Send invitations to Fred Smith, Jerry Johnson, Julie Knox, and Jimmy Barnes. "Do you want to see company information?" "Show me information about Dover Hospital." Goto profile for Dover Hospital. "You need to plan a presentation for Miles White." "Shall we find presentations about a subject?" "No." "Okay." "I’m going to get back to you about meeting." "The following people: Jimmy Barnes have a conflict with the meeting time.." "This is message from Jimmy Barnes: I have to be in Cleveland on December 9. "Invite Jonathan Parks instead." Invite Jonathan Parks instead of Jimmy Barnes. "I will get back to you to confirm this meeting."

Jonathan Parks is available."

Figure 11. Sample session of a Collagen agent using for Lotus Notes for meeting planning (unsegmented).

12

Figure 12. Dotty and the user begin managing sales contacts.

Figure 13. The user and Dotty work on a profile for Miles White.

13

Figure 14. Dotty asks about scheduling a meeting.

Figure 15. The user and Dotty resolve a scheduling conflict.

14

Figure 16. The user wants information on another company.

Figure 17. Finding another meeting participant.

15

The conversation at this point consists of several turns between Dotty and the user in which Dotty asks about many details of the meeting that is to be set up. Dotty needs to know the location, so that it can reserve a room, and determine equipment needs as well as attendees. Given a group of attendees, Dotty checks their online calendars to see if they are available. One person, Pam Bridges, is not, and so Dotty tells the user this fact (Figure 15); the user chooses another attendee. Once the user confirms the meeting, Dotty sends out invitations to the four participants. One of Dotty’s utterances in determining who will attend the meeting is “Identify the consulting staff.” (See Figure 11 about halfway through the conversation.) This utterance is meant to convey that the agent wants the user to name the consulting staff. This utterance is not as fluent as the others in this conversation. The difference concerns the phrasal lexicon that comes with Collagen to provide the more natural utterances that appear in most of the conversation. In this one instance, the phrasal lexicon was not changed, so that the default generation capability of Collagen could be shown here. While Dotty is sending out invitations, the user may have other things to do, and so Dotty asks about whether the user wants to see information on other companies (“Do you want to see company information?” in Figure 16). The particular language of this question is not as fluent as it should be, and indicates the limits of the string concatenation methods used in Collagen for turning Dotty’s internal messages into understandable English, even with a phrasal lexicon. When the user requests information for Dover Hospital, Dotty provides the appropriate Notes document, as illustrated in Figure 16. Dotty’s recipe for managing sales contacts also contains optional actions for planning the presentation that the user will give at the meeting. Dotty asks about this next, but the user declines to do so. At that point Dotty announces “I’m going to get back to you about meeting.” Here again Dotty’s utterances are not very fluent because Dotty in fact is getting back to the user at that moment, and so the future tense is improper. Despite the disfluency, Dotty goes on the explain that one of the participants, Jimmy Barnes, has indicated that he cannot attend the meeting (see Figure 17. While in this prototype Dotty has not actually received such an email, we included this step to indicate how Dotty might proceed with such information. The user’s request that Jonathan Parks be invited instead is accepted by Dotty at the end of the conversation. We believe that Dotty illustrates how a collaborative agent can be utilized effectively in a user interface–not only to track the interaction but to undertake simple activities that it can perform. While these simple activities are now done by people, they tend to distact users from their more demanding tasks that cannot at present be undertaken by intelligent agents. Providing the knowhow and conversational skills to an intelligent agent means a time savings for users. Furthermore, the user must manage less of the task in answering Dotty’s questions than in using a GUI to click on windows and buttons and to type out all of requisite information. Dotty’s behavior demonstrates the value of collaborative agents for interface activities.

3.2

How We Built the Collaborative Meeting Planning Agent

The collaborative agent for planning meetings uses much of the architecture displayed previously in Figure 10. The agent uses a library that is far smaller than the email agent’s: 19 actions and 5 recipes. The same modifications of the UserHome and AgentHome beans are used to introduce speech to this agent. However, because NotesTM is a large and complex system, we chose to use separate Java virtual machines for Collagen and Notes. We modified the architecture as shown in Figure 18, to permit the remote method invocation (RMI) server to mediate between the adapter component of Collagen and the Notes application. We coded the adapter to register with the RMI server so that it could serve as a source of commands that would be sent to Notes and as a listener 16

for events that Notes reported. We coded a special Java applet created within Notes using the Lotus Research OverViews technology [7] to receive events from the RMI server and report events to the server. These adaptations permitted the adapter and the applet to act in accordance with the overall guidelines of the original Collagen architecture. We found the RMI addition somewhat problematic because it introduces some unpredictable and infrequent race conditions into the operation of the whole system.

4

Current Conversational Limitations of Collaborative Agents

There are a number of limitations of the conversational capabilities of collaborative agents using the Collagen system. Some of these have to do with limitations of Collagen, but others have to do with language tools used in these spoken language agents. The spoken interaction of the two collaborative agents is limited by the range of utterances that the utterance understanding components can interpret. For email, the collection of imperative utterances that can be interpreted is several hundred, but declarative utterances and questions are far more sparsely interpretable. For meeting planning, the grammar is relatively small, and the total set of understandable utterances is about 100 while the types of utterances understood is more balanced. Furthermore, we have not implemented state of the art capabilities in reference interpretation, modeling of tenses or other general pragmatic and discourse linguistic capabilities for either of these agents. More significantly, we feel these agents are limited in dealing with spoken conversational errors,

Figure 18. The architecture of the collaborative agent for planning meetings.

17

i.e. errors that arise either because the recognition system produces an error, or because the semantic interpretation is faulty (even given the correct choice of words, which, for example, occurs in Figure 6.) Our experience is that Daffy produces more semantic errors in part because its range of utterances is much larger. Errors resulting from semantic mis-interpretation are especially important because often the content of the faulty interpretation is something that the agent can respond to and does, which results in the conversation going awry. For example, if the user says to Daffy, “how do I a read a message?” and Daffy only recognizes part of the utterance as “read a message,” then Daffy will ask, “What message?” When a conversation goes awry because the agent thinks the user requested something that the user did not, generally the user would like to “undo” the request and return the conversation to before the error. In such cases we believe that the history based transformations possible in Collagen (c.f. [8]) will allow the user to turn the conversation back to before where the error occurred. In particular because the user can indicate what was misunderstood (even with the general statement “That’s not what I meant”), the conversation can begin with the context of the collaboration at the point just before problematic utterance occurred, and continue. Of course, some requested actions are difficult to undo because they have side-effects. For such cases, a much more complex conversation will be needed to determine how to proceed, but because Collagen agents can determine which actions have side-effects, they can participate more easily in a conversation about the problem. Conversations about undoing actions with side effects are an open research problem because they essentially involve negotiations. Whether communicating by speech or menus, our collaborative agents are limited by their inability to negotiate with their human partner. Consider the email conversation in Figure 4, where the agent asks “shall we cc message from Candy Sidner about fwd:WEEKLY SPEECH GROUP SEMINAR?” to which the user says “no.” The agent currently does not have any strategies for responding in the conversation other than to accept the rejection and turn the conversation back to the user (for example, Daffy says, “Ok, your turn.”). In general, for any circumstance where the agent proposes actions to perform that the user rejects, the agent only knows to give the conversational turn back to the user rather than to suggest other activities or to question the user choice. Similarly, when the user and agent disagree about some fact in the discussion, the agent has no capabilities for resolving the difference. Such behaviors are instances of the general capability for negotiation in collaboration. We are at present exploring how to use a set of strategies for negotiation of activities and beliefs that we have identified from corpora of human-human collaborations (see [9, 10]). We plan to incorporate these in the Collagen system so that the interface agents will have a wider range of negotiation capabilities, just as they now have clarification capabilities. Finally, our agents need a better model of conversational initiative. We have experimented in the Collagen system with three initiative modes, one dominated by the user, one by the agent and one that gives each some control of the conversation. The dialogues presented in this paper are all from agent initiative. None of these modes is quite right. The user dominated mode is characterized by an agent that only acts when specifically directed to or when explicitly told to take a turn in the conversation; the resulting agent acts oddly taciturn and appears to be stonewalling the conversation. In middle initiative, neither user or agent gets to provide exactly the turn that is appropriate. In the agent dominated mode, the agent constantly offers next possible actions relevant to the collaboration, and the resulting conversations feel as if the agent is not capable of sharing the conversation correctly with the user. We are currently investigating additional modes of initiative. The collaborative agent paradigm that we have implemented has several original features. The 18

conversation and collaboration model is general and does not require tuning or the implementation of special dialogue steps for the agent to participate. The model tracks the interaction and treats both the utterances of both participants and the GUI level actions as communications for the discourse; it relates these to the actions and recipes for actions. The model has facilities for richer interpretation of discourse level phenomena, such as reference and anaphora, through the use of the focus stack. Finally, when we began this research, we were not certain that the Collagen system could be used to create agents that would interact with users for many different applications. Our experience with five different applications indicates that the model has the flexibility and richness to make human and computer collaboration possible in many circumstances.

Acknowledgements The authors wish to thank the Human Language Technologies Group, IBM T.J. Watson Research Labs for use of their speech and natural language tools. We also thank Neal Lesh at MERL–A Mitsubishi Electric Research Lab for his work on Collagen.

References [1] A. H. Cheung. A collaborative interface agent for Lotus eSuite mail. Master’s thesis, Dept. of Elec. Eng. and Computer Science, Mass. Inst. of Technology, 1998. Also available as Lotus Research Technical Report 98-13. [2] B. J. Grosz and S. Kraus. Collaborative plans for complex group action. Artificial Intelligence, 86(2):269–357, October 1996. [3] B. J. Grosz and C. L. Sidner. Attention, intentions, and the structure of discourse. Computational Linguistics, 12(3):175–204, 1986. [4] B. J. Grosz and C. L. Sidner. Plans for discourse. In P. R. Cohen, J. L. Morgan, and M. E. Pollack, editors, Intentions and Communication, pages 417–444. MIT Press, Cambridge, MA, 1990. [5] D. Gruen, C. Sidner, C. Boettner, and C. Rich. A collaborative assistant for email. In Proc. ACM SIGCHI Conference on Human Factors in Computing Systems, Pittsburgh, PA, May 1999. [6] K. E. Lochbaum. A collaborative planning model of intentional structure. Computational Linguistics, 24(4), December 1998. [7] J. Patterson and S. Rohall. OverViews. www.lotus.com/research > projects > overviews, March 2000. [8] C. Rich and C. Sidner. COLLAGEN: A collaboration manager for software interface agents. User Modeling and User-Adapted Interaction, 8(3/4):315–350, 1998. Reprinted in S. Haller, S. McRoy and A. Kobsa, editors, Computational Models of Mixed-Initiative Interaction, Kluwer Academic, Norwell, MA, 1999, pp. 149–184. Also available as MERL Technical Report 97-21a. [9] C. L. Sidner. An artificial discourse language for collaborative negotiation. In Proc. 12th National Conf. on Artificial Intelligence, pages 814–819, Seattle, WA, August 1994. [10] C. L. Sidner. Negotiation in collaborative activity: A discourse analysis. Knowledge-Based Systems, 7(4):265–267, December 1994.

19