An Intelligent Speech Interface for Personal Assistants in R&D Projects

Emerson Cabrera Paraiso1,2,*

Jean-Paul A. Barthès1

1 Laboratoire Heudiasyc, Université de Technologie de Compiègne, BP 20.529, 60.205 Compiègne, France
2 Computer Science Department, Pontifícia Universidade Católica do Paraná, Curitiba, Paraná, Brazil

*Corresponding author: Tel: +33 3 44 23 44 23; fax: +33 3 44 23 44 77. E-mail address: eparaiso@hds.utc.fr. Université de Technologie de Compiègne, Génie Informatique – Centre de Recherches, BP 20.529 – 60.205 Compiègne Cedex, France



Abstract

Groupware and collaborative tools have been proposed to support cooperative work. However, they suffer from some rather severe limitations. Alternatively, multi-agent systems have been developed to improve the situation. In the latter case, the user is normally interfaced through a special agent called a personal assistant. In this paper, we describe the design of an ontology-based speech interface for personal assistants applied in the context of cooperative projects. We believe that this type of interface will improve the quality of assistance. We present the interface and its insertion into a multi-agent system designed for research and development projects, and we describe its design, highlighting the role of ontologies in semantic interpretation. As a result of this conversational speech interface, we expect an increase in the quality of assistance and a reduction in the time needed to answer users' requests.

Keywords: Speech Interface, Ontology, Personal Assistants, R&D Projects.




1. Introduction

During the past few years, different projects have been developed involving the use of multi-agent systems (MAS) to improve computer-supported cooperative work (Shen and Wang [28], Spinosa et al. [29], Tacla and Barthès [31], or Wu et al. [34]). Such an approach can potentially improve information exchanges among participants, provide support to them, improve workflow and procedure control, and provide convenient user interfaces to CSCW systems. It is an alternative to groupware and collaborative tools, which suffer from some limitations as reported by Enembreck and Barthès [6]. Tacla mentioned [30] that groupware and collaborative tools offer a solution by bundling the applications needed by the user. As a result, users can rely on shared spaces to organize their documents and can automate some simple tasks. However, as shown in [6], four main problems arise when using them:

• domain-specific tasks cannot be represented easily, i.e., tasks like "design an electrical engine" cannot be expressed with a sufficient level of detail without extensive customization;
• tools (e-mail, search engine, agenda) are usually not tightly integrated;
• users' preferences are usually ignored outside cosmetic interface customization;
• users' experiences are lost.

To overcome such limitations, we propose to use personal assistants (PAs) coupled with multi-agent systems (MAS). The particular skills of a PA are devoted to understanding its master and to presenting information intelligently and in a timely manner. The main goal of such an agent is to reduce the user's cognitive load. In principle, the approach does not suffer from the same limitations, since a PA can be developed so as to adapt to its owner, providing the necessary semantic glue to access external services in a uniform fashion. In this approach, information can be captured on the fly, improving knowledge management (KM). As we will show in detail, this strategy allows a personalized solution, since the PA integrates a group of specialized agents working exclusively for the user. It has a proactive behavior, anticipating the user's needs and saving the user's time. Agents allow the distribution and reuse of knowledge sources; hence, partial information from different sources can be used to solve a given problem. We consider each user to be a source of knowledge that the PA tries to represent and make available to other users.




Finally, PAs offer a very good opportunity to test new interface paradigms such as the one presented in this paper. As in any software application, an appropriate user interface—if possible user-friendly—is essential. Traditionally, developers use graphic-oriented interfaces (containing menus, sub-menus and dialogue boxes). Often this approach is inappropriate or not very appealing, leading to an interaction of poor quality. In the case of a PA, it may decrease the quality of assistance the agent can offer. Conversational interfaces, on the other hand, as defined by Kölzer [17], let users state what they want in their own terms, just as they would when speaking to another person. Of course, the control of the interaction is more complex, but the complexity is handled by the system. Conversational interfaces let users concentrate on their main activity and, once in a while, exchange spoken words with the PA. We are applying this approach to a multi-agent system (MAS) managing knowledge in the context of research and development projects, as detailed by Tacla and Barthès in [31]. Agents belong to a networked environment, providing a natural communication channel to the users and a flexible and dynamic approach to distributed, multidisciplinary design teams, which can reduce redundant design activities and improve coordination, as indicated by Chao et al. [4]. Communication is the backbone of a good collaborative environment. Here we use PAs to provide synchronous and asynchronous communication among the members of the project. Users can use their PA to send e-mails, have online discussions, share documents, see others' activities, delegate activities, etc. Each user may speak with the PA in English in order to control it or to ask it to perform some task. The user and her PA engage in a practical dialogue—which means that they are pursuing specific goals or solving tasks cooperatively—as defined by Allen et al. [1]. The dialogue system is task-oriented. Tasks range from simple ones like "locate a document" to more complex tasks that must be decomposed into subtasks. From the interaction point of view, we expect:

• to improve the quality of assistance; and
• to reduce the user's cognitive load.

In this paper, we discuss how a PA with a conversational speech interface can improve collaborative work and knowledge management in research and development projects. We present an ontology-based speech interface for a PA applied in the context of cooperative projects. The paper begins by describing the MAS architecture, the PA, and the impact of speech technology on the design of our approach. After that, we present the ontology-based speech architecture. We then describe how ontologies are used for syntactic and semantic interpretation and for task representation. Finally, we offer a conclusion and indicate some perspectives.

2. The MAS Architecture

This section describes the MAS architecture for KM systems in research and development (R&D) projects that we are using, following Tacla and Barthès [31]. R&D teams have no time to organize project information, nor to articulate the rationale behind the actions that generate the information. Thus, our main goal is to design a system that supports collaborative work and helps to capture and organize experiences without overloading the team members with extra work. Some general requirements for a KM system that guided the project are:

• the system must cover as much of the R&D activity as possible;
• it must save time by helping the user in day-to-day activities;
• it must support users in creating and sharing knowledge;
• it must be reliable, secure and persistent.

The first reason to employ MAS in KM systems is that, like a team, an MAS consists of a group of possibly heterogeneous and autonomous agents that share a common goal and work cooperatively to achieve it. Initially there are two types of agents: Service Agents, which provide a particular type of service corresponding to specific skills, and PAs, which are in charge of interfacing humans with the system. The particular skills of PAs are devoted to understanding their master (i.e., the user who owns them) and to presenting information intelligently and in a timely manner. In this approach, the PAs play a major role in the KM system. Firstly, they are in charge of all exchanges of information among team members. Secondly, PAs are able to organize the documentation of their master with the help of a service agent. Finally, as R&D members have to deal with knowledge-intensive tasks, they are expected to construct their own methods of work, to remember their past experiences, and if possible to access other members' past experiences. Consequently, PAs must capture and represent the team members' operations, helping them in the process of preserving and creating knowledge.
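To make the two agent roles concrete, the following sketch shows one possible way to model Service Agents and PAs as objects that advertise skills and delegate work. The class names, fields and methods are illustrative assumptions introduced for exposition, not the actual implementation used in the system.

```python
# Illustrative sketch only: a minimal model of the two agent types described above.
# Names (Agent, ServiceAgent, PersonalAssistant, skills) are assumptions for exposition.

from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Agent:
    name: str
    # A skill maps a task label (e.g. "locate-document") to a callable that performs it.
    skills: Dict[str, Callable[..., object]] = field(default_factory=dict)

    def can_do(self, task: str) -> bool:
        return task in self.skills

    def perform(self, task: str, **kwargs):
        return self.skills[task](**kwargs)


@dataclass
class ServiceAgent(Agent):
    """Provides one particular type of service (e.g. document indexing, repository access)."""


@dataclass
class PersonalAssistant(Agent):
    """Interfaces a single human user with the system and delegates work to service agents."""
    owner: str = ""
    helpers: List[ServiceAgent] = field(default_factory=list)  # service agents working for this user

    def delegate(self, task: str, **kwargs):
        # Forward the task to the first service agent that advertises the matching skill.
        for helper in self.helpers:
            if helper.can_do(task):
                return helper.perform(task, **kwargs)
        raise LookupError(f"No service agent can perform task: {task}")
```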




Figure 1. The multi-agent architecture

As shown in Figure 1, the system comprises several service agents. A service agent, called the Project Agent, holds all the values shared by the group, among them the ontologies, including the project task ontology (e.g., construction of a prototype). Team members can only extend the ontologies (i.e., add a new child concept). In this way, users can express their preferences, for instance by refining a document category from the documentation ontology. Thus, the first thing a new user does when joining the project is to download the existing ontologies. Each PA is supposed to help its master organize her documentation. In order to do that, each PA works jointly with its Organizer agent, a service agent able to build representations of the documents for later retrieval. As the architecture should be independent of any specific software tool, it integrates a service agent called the Repository Agent, which encapsulates the groupware that must be part of the architecture. This agent offers services for saving and retrieving documentation but, depending on the tool it encapsulates, it can offer other kinds of services such as Web searches or e-mail management. In the next sections we present the PA in detail, highlighting its speech interface.
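As an illustration of the extend-only policy on shared ontologies, the sketch below shows a concept tree in which users may only attach new child concepts to existing ones. The class and method names are hypothetical, chosen here for exposition rather than taken from the described system.

```python
# Hypothetical sketch of the extend-only ontology policy described above:
# users may add child concepts under existing ones, but may not delete or rename shared concepts.

from typing import Dict, Optional


class Concept:
    def __init__(self, name: str, parent: Optional["Concept"] = None):
        self.name = name
        self.parent = parent
        self.children: Dict[str, "Concept"] = {}


class ProjectOntology:
    def __init__(self, root_name: str = "document"):
        self.root = Concept(root_name)
        self._index: Dict[str, Concept] = {root_name: self.root}

    def find(self, name: str) -> Optional[Concept]:
        return self._index.get(name)

    def extend(self, parent_name: str, child_name: str) -> Concept:
        """Add a new child concept under an existing one (the only allowed modification)."""
        parent = self.find(parent_name)
        if parent is None:
            raise KeyError(f"Unknown parent concept: {parent_name}")
        if child_name in self._index:
            raise ValueError(f"Concept already exists: {child_name}")
        child = Concept(child_name, parent)
        parent.children[child_name] = child
        self._index[child_name] = child
        return child


# Example: refining a document category from a documentation ontology.
onto = ProjectOntology("document")
onto.extend("document", "technical-report")
onto.extend("technical-report", "test-protocol")
```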

3. The PA and the Speech Technology

The role of the PA and its speech interface is crucial for the effectiveness of this approach. The design and implementation of such a speech interface is a hard task and involves many different components: dialogue controllers, natural language parsers, speech recognizers and synthesizers, and knowledge manipulators, to name a few. Adding this type of interface to a PA is also a task that may require a significant effort, as reported by Milward and Beveridge [25].




The PA is a rather complex cognitive agent. Cognitive agents give the possibility of designing intelligent behaviors by specifying a set of skills. In addition, in our case, such agents run independently of any particular task to solve. Our agent is built around three main blocks: the user interface (speech-based), an Assistancy module, mainly responsible for controlling the dialogue, and a fixed body called the Agent Kernel (detailed information is available in [3]). For the design of our speech interface we made some assumptions related to the PA and to its operation:

• the PA works concurrently with all the user's activities; the user is doing her job and, once in a while, needs to interact with her PA;
• the PA may ask questions or start a dialogue if a previous user command cannot be executed, was misunderstood, or if the agent needs additional information to solve a problem;
• the PA may alert the user when an event has occurred (e.g., an incoming electronic message, or an incoming response to a search request).

Some assumptions concern the interaction:

• the nature of the application leads us to consider a master-slave relationship in which the user commands his PA;
• the user makes statements that may be questions or declarative sentences;
• the user may change the context of the conversation after a small number of utterances, introducing a break in the chain of discourse.

Speech recognition technology has advanced quickly in the last decade and is now used in commercial projects (see Kotelly [18] for further details). Many applications already have voice capabilities. Speech recognizers can already handle simple inquiries about bank balances, movie schedules, and phone call transfers. Speech recognition is an extremely complex process, quite error prone, and cannot be solved today without a great deal of knowledge about what the utterances are likely to be. Thus, the following constraints must be taken into account:

• the results from an automatic speech recognition engine may be quite far from what the user actually said; the differences may be lexically and syntactically significant;
• people address computers differently than they address other humans, trying to adapt to what they perceive as limitations of the machine, as reported by Jurafsky and Martin [15].




Regarding the last point, we believe that if we could make the computer interact with people in a more realistic way, we could reduce their cognitive load.
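The first constraint above suggests that recognizer output should not be taken at face value. The sketch below shows one simple, hypothetical way to rescore n-best recognition hypotheses against a domain vocabulary (for instance, terms collected from the project ontologies). It is not the mechanism described in this paper, only an illustration of why knowledge about likely utterances helps.

```python
# Hypothetical illustration: rescoring ASR n-best hypotheses with a domain vocabulary.
# The vocabulary would come from domain knowledge (e.g., terms found in the project ontologies).

from typing import List, Set, Tuple


def rescore(nbest: List[Tuple[str, float]], domain_terms: Set[str], bonus: float = 0.1) -> List[Tuple[str, float]]:
    """Boost hypotheses containing known domain terms, then re-rank them."""
    rescored = []
    for text, acoustic_score in nbest:
        hits = sum(1 for word in text.lower().split() if word in domain_terms)
        rescored.append((text, acoustic_score + bonus * hits))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Example: the intended request competes with an acoustically similar mis-recognition.
domain_terms = {"locate", "document", "test", "protocol", "prototype"}
nbest = [("locate the best profile", 0.62), ("locate the test protocol", 0.58)]
print(rescore(nbest, domain_terms)[0][0])  # -> "locate the test protocol"
```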

4. The Speech Interface Architecture

The global architecture of the speech interface is shown in Figure 2. It has three parts: (i) graphical and speech user interface (GSUI) modules; (ii) linguistic modules; and (iii) agency modules. GSUI modules produce outputs or collect the user's inputs, such as capturing voice and handling GUI events. Linguistic modules are responsible for lexical and syntactic analysis and for context verification. Agency modules are directly connected to the agent kernel, which can intelligently manage the dialogue and the interface with the help of an ontology. The architecture is detailed in the following paragraphs.

Figure 2. The speech interface architecture
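To make the three-part organization concrete, here is a minimal, hypothetical sketch of how an utterance might flow through GSUI, linguistic, and agency modules. The interfaces and task labels are assumptions chosen for illustration, not the actual module APIs of the system.

```python
# Hypothetical end-to-end sketch of the three-part speech interface:
# GSUI modules capture the utterance, linguistic modules analyze it,
# and agency modules hand the interpreted request to the agent kernel.

from dataclasses import dataclass, field
from typing import List


@dataclass
class Utterance:
    text: str                                      # text returned by the speech recognizer
    tokens: List[str] = field(default_factory=list)  # filled in by the linguistic modules
    task: str = ""                                 # filled in by the agency modules


class GSUIModule:
    def capture(self, recognized_text: str) -> Utterance:
        # In the real system this part would also handle GUI events and speech synthesis.
        return Utterance(text=recognized_text)


class LinguisticModule:
    def analyze(self, utterance: Utterance) -> Utterance:
        # Lexical and syntactic analysis, reduced here to simple tokenization.
        utterance.tokens = utterance.text.lower().split()
        return utterance


class AgencyModule:
    def interpret(self, utterance: Utterance) -> Utterance:
        # Map the analyzed utterance to a task label, e.g. with help from an ontology.
        if "locate" in utterance.tokens and "document" in utterance.tokens:
            utterance.task = "locate-document"
        else:
            utterance.task = "unknown"
        return utterance


# Example flow for one spoken request.
utt = GSUIModule().capture("locate the document about the prototype")
utt = LinguisticModule().analyze(utt)
utt = AgencyModule().interpret(utt)
print(utt.task)  # -> "locate-document"
```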
