Dialogue Management in an Automatic ... - Semantic Scholar

Dialogue Management in an Automatic Meteorological Information System Luis Villarejo, Nuria Castell, and Javier Hernando TALP Research Center, Universitat Politecnica de Catalunya Campus Nord - A0, Jordi Girona 1-3, 08034 Barcelona, Spain fluisv, castell, [email protected]

Abstract. In this paper we present a real automatic meteorological in-

formation system that, not only provides friendly voice access to realtime data coming from automatic sensors, but also establishes an automatic warning service on the weather. It aims to extend the availability, personalization and friendliness of the meteorological information by means of a reusable easy-to-use friendly oral natural language interface. This interface takes advantage of the improvements in speech processing, dialogue handling and the great growth of mobile telephony. After the description of the functionalities of the system and its architecture, we present in detail the features of the dialogue manager. The main goals we have considered are: to provide the right information, to design a friendly interface, and to help the user never getting lost during the dialogue. Keywords: intelligent interfaces, natural language and speech processing.

1 Introduction Meteorological information has long been provided by television and radio as weather reports scheduled at a xed timetable giving impersonal information. Nowadays there are telephonic systems that oer this information but in an unnatural way. Usually these systems oer some menus operated via the telephone keyboard and a very limited vocabulary. These features condemn the system to failure in the real world due to a lack of naturalness and uency in the dialogue. This kind of systems usually provide data more general and updated less frequently than desired. The interest and research on interactive speech systems has increased in the last years due to the extended use of telephone information systems. We should mention the TRINDI[1] (Task-Oriented Instructional Dialogue) project, which focuses in generic technology for the creation of a dialogue movement engine. Nowadays there are systems that oer a good level of interaction during the information exchange. Some examples are the following: ARISE[2] (Automatic Railway Information Systems for Europe), TRAINS[3], and BASURDE[4] (Spontaneous Speech Dialogue System in a Semantically Restricted Domain), all of

them about railway information, and ATIS[5] (Air Travel Information System), about ights. Only few systems are designed to oer meteorological information in a user-friendly way. A reference work in this eld is the JUPITER[6] project. The system described here provides personalized real-time data, in the Catalan language, on a set of meteorological conditions on each place of the Catalan geography through an easy-to-use natural language interface. It also provides an alarm and warning system, based on the same interface, that keeps the user informed on the variables of his/her interest whenever they occur. The whole system has been constructed on the basis of VoiceXML (the voice standard promoted by the World Wide Web Consortium), speci cally the voice framework and the dialogue manager. By building a voice framework based on VoiceXML, the dialogue manager becomes independent not only of the voice technology but also of the application logic. The development of this system has been promoted by the Catalan government, aTTemps project, and it is actually running. In this paper we start by presenting, in section 2, the services oered by the system and its meteorological data source, the Catalan Meteorological Service. In section 3, the system architecture is presented and a brief description on how the system works is done. After that, the issues related to the dialogue manager are discussed in section 4. Section 5 introduces the speech processing. In section 6 the evaluation of the system is discussed. At the end of the document, section 7, we present the conclusions and further work.

2 The Catalan Meteorological Service and aTTemps The information oered by the system is collected by the Servei Meteorologic Catala (SMC)[7] from four dierent sources: 1)The Xarxa Meteorologica (XMET, Meteorological Net) and the Xarxa de Vigilancia i Previsio de la Contaminacio Atmosferica (XVPCA, Atmosferic Pol-

lution's Surveillance and Forecasting Net) nets consists of ninety one automatic meteorological ground stations, scattered over the Catalan geography. These stations are constantly acquiring information on dierent variables, such as temperature or direction and force of the wind, in order to supply measures of each of them every half an hour (via satellite or modem) to the meteorological centre.

2)The Xarxa d'Instruments Oceanogra cs i Meteorologics (XIOM, Meteorological and Oceanographic Instruments Net) net consists of four automatic meteorological buoys, scattered over the Catalan coast. These buoys are constantly acquiring information on maritime variables, such as the sea disturbances, the ground swell and the height of the waves, to supply measures of each of them every sixty minutes (via radio) to the meteorological centre. 3)The rain forecast which provides information, updated twice per day, on the

rain's intensity and areas for the next thirty six hours. 4)The weather alarms which provide information, updated every half an hour, on risk situations such as hailstorms, danger of oods, avalanches or risk of res.

2.1 aTTemps Services

As we stated in the Introduction section, the system developed under the aTTemps project can be used in two ways, depending on the user needs: 1)Asking for immediate information: The user needs to know some meteorological variables or the rain forecast. The system initiates a dialogue to get the desired variable and the location of interest in order to provide the information. 2)Asking for activation/deactivation of a warning. The user needs to be noti ed (with an SMS or a voice message) on the weather conditions either when some meteorological conditions occur or just at a concrete time of the day. The system initiates the dialogue and saves the user warning pro le in its database. In all cases the user can ask for information not only in all the Catalan municipalities (949), but also in larger areas (regions, mountains, natural parks, etc), other spots which have a particular tourist interest (airport, etc), and popular abbreviations of names. In addition, if the system does not have information from the requested place, information from the closest place (climatologically speaking) will be provided.

2.2 aTTemps Dialogue Example

Here we will show a typical dialogue, translated from Catalan, in order to give an idea of the basic interaction between the user and the system. System: Welcome to the ATTEMPS meteorological warning and information system.You can always say help, correction, repetition or exit. From which Catalan place do you want information, or do you want to establish a warning? User: I am interested in Barcelona, please. System: From Badalona, which information do you need? User: I want to correct, not from Badalona. System: From which Catalan place do you want information? User: I want to know the weather right now in Barcelona. System: From Barcelona, which information do you need? User: I need the temperature, please. System: In the last measure at 14:30h, in the meteorological station of Barcelona, the temperature was 20 degrees. Do you need any other information? User: No, that was all. Thanks. System: Thanks for calling. Good bye.

3 System Architecture The system is made up of four main blocks , as can be seen in Figure 1, which correspond with: 1)the VoiceXML framework, which integrates the speech processing and the telephony hardware, 2)the VoiceXML dialogue manager (made up of documents and the application logic both in the web server), which integrates the dialogue control and the language generation, 3)the system databases and 4)the data acquisition module, which provides the real-time meteorological contents by accessing the SMC server. In order to understand later sections, a

Fig. 1. System architecture short overview of all modules (excluding the one of the dialogue manager that is going to be explained in detail in section 4) is done in this section. A more detailed overview of this system can be found in[14]. The VoiceXML framework is made up on the basis of the OpenVXI 2.0[8] from SpeechWorks[9]. An speci c development has been done in order to integrate the Dialogic[10] components for telephony purposes and the Ibervox[11] (cf. section 5) components for speech processing purposes. The databases module is composed of two main databases which store the two kinds of information that the system must maintain: the meteorological data and the user's pro les for warnings. The rst one (referred from now as meteorological database) is stored in a relational database which is updated constantly by the data acquisition module. While the second one (referred from now as user database) is stored in a LDAP (Lightweight Directory Access Protocol) database in order to optimize searches. This database is updated every time that a user activates or deactivates a warning message.

The data acquisition module consists of a process that acquires, every half an hour, the meteorological data in real time from the SMC databases and updates the local meteorological database. This process guarantees that we get the new data as soon as there is in the remote database. The system works as follows: 1) The framework waits for a telephone call. 2) When a call is received, the root VoiceXML document of the dialogue manager is executed by the framework. 3) This execution causes the reproduction of the welcome message (by means of the voice technology integrated in the framework) that gives a little notion about how to interact with the system. 4) Then the system asks to the user about which information he/she wishes. 5) The grammars that should pick up the user's answer are loaded. 6) When the user answers, his/her reply is given back to the document. Once the reply is received, the execution of the document can go on. 7)Depending on the user needs, the dialogue manager will ask another question to the user or will do a request on the server to get data from the database. 7.1) In that last case, the query is sent to the database interface and the result is picked up and used, in the server, to build a document with dynamic content that will be sent to the framework for execution. 8) Once all user requested data has been collected, the system provides the meteorological information (contained in the dynamically generated document) to the user in natural language or takes record of the warning pro le for that user.

4 The Dialogue Manager Module The main goals of the dialogue manager are: to provide the right information in the minimumtime, to be a friendly oral natural language interface that facilitates as much as possible the interaction with the user, and to guide the user in order to avoid situations where he/she would be lost. Setting up the dialogue manager strategy we have to take into account several factors including the dialogue ow, the con rmation policy, the amount of data per turn, the helping features and the language generation.

4.1 Dialogue Flow The dialogue is made up of turns in which typically a question, as clear as possible, is made to the user guiding him/her to the kind of answer expected by the system in each turn. Once the user has answered, the system processes the input and initiates the next turn that can contain another question or a message to the user. As it is seen, the automatic system takes the initiative guiding the dialogue, but it lets the user to answer with a great range of syntactic

possibilities. And what is not less important: lets the user to interrupt the system at any time in order to ask for help or repetition or even to answer a question before it is nished. This last property, called 'barge-in', strengthens the uency, naturalness and speed of the dialogue by shortening turns when a mistake is done or when the system faces an expert user who knows what the system is going to say, just listening to the beginning of the message. Taking into account all this factors, the owchart of the dialogue was designed in order to improve naturalness, as can be seen in Figure 2. Yes

No

No More info?

Final

Warning? No

Variable

Yes

Warning Place/ Warning

Variable/ Warning

No

Not exists in users DB Tel. number

Specific date Place

Variable

Exists in users DB

Value

Activation

Yes Day

Specific conditions No

Delete warning?

Time

Voice/ SMS

Confirm

More info?

Yes

Delete Warning

Yes

Fig. 2. Dialogue ow

4.2 Con rmation Policy

Every utterance got from the user must be con rmed in order to be sure that the system successfully understood what the user was saying. This is done by means of two dierent con rmation policies: explicit and implicit.Explicit con rmations are used to con rm either critical information or a big set of data by means of a direct question to the user, asking him/her whether the last information acquired was correct or not. Implicit con rmations can be used to con rm every data got from the user, the data appear by means of a short sentence at the beginning of the next question and thus without penalizing the uency of the dialogue with an extra question. In this work we mixed both policies, using implicit con rmations as a general rule, while explicit con rmations were used to con rm critical information or the nal step of a whole process. Giving as a result not only an ecient and eective interface, but also a natural dialogue

ow between the human and the system.

4.3 Amount of Data per Turn

The amount of data captured by the system in one turn is closely related, on the one hand, to the user's predisposition and naturalness in giving more than one

data in the same turn, and on the other hand, to the complexity of the task associated with each turn. We kept the balance between both aspects by recognizing more than one data only in turns that did not involve a high complexity task, like the one of getting the temperatures between the user wants to be noti ed, but isolating hard tasks, such as the one that gets the municipality, in order to improve its recognition success.

4.4 Helping Features

The helping features have been focused on two points. Firstly, a set of Catalan key words/expressions, as can be seen in Table 1, has been de ned in order to help the user to interact with the system. This key words can be accessed by saying the exact word or just an expression containing a key word (e.g. "Help", "Help me, please", "I need some help"). Secondly, an automatically adaptable helping message policy has been set up. During the preliminary evaluation of

Table 1. Catalan key words/expressions Key word or expression (Translated to English) "I need help", "Help". . . "Could you repeat?", "Repeat". . . "I want to correct", "Correction". . . "Bye", "Exit". . .

What the What the system does user needs when detects the situation Help Throws a helping message Repetition Throws the last message Correction Asks for the last value given Exit Throws a farewell message and nishes the dialogue

the system, we detected that the users missed more messages telling them what they could do in every moment. However, repeating in each turn helping information involves penalizing the uency of the dialogue because we could substantially increase the time spent in most dialogues. So we decided to take an intermediate solution. At the beginning, the system provides an initial message where welcomes the user, introduces itself and notices the user about the key words/expressions and its use: "Welcome to the Catalan government meteoro-

logical warning and information system. You can always say help, correction, repetition or exit". During the dialogue, an automatically adaptable help ser-

vice only provides help when the system detects the user did not understand something or has problems to continue with the dialogue. This help service is activated when one of the following three dierent situations is detected: the user keeps quiet, the user says something that is not understood by the system, or the user explicitly asks for help.

4.5 Language Generation

The language generation has been done using templates that are dynamically lled with the adequate data to originate the nal messages reproduced to the

user. An example of template is "In the meteorological station of [place] the [variable] at [hour] is [value] [measure]" which, once lled, can result for instance in "In the meteorological station of Blanes the temperature at 8:30 is 20 degrees". However, in order to introduce variability and more naturalness in the system's answer, the history of the dialogue and a set of dierent templates with the same meaning is kept. So in each turn of the dialogue the system gives its answers using dierent templates for the same kind of information. In addition, the system is ready to include a complete text generation module [13] designed for aTTemps, and based on linguistic components.

4.6 Implementation

Structurally the dialogue has been built by means of a set of VoiceXML dialogue documents, stored in a web server, that contained each one a dierentiated part of the dialogue. For example, the initial document gives the welcome message, gets the rst utterances from the user and invokes the document containing the part of the dialogue that ts the user needs. As we stated before, we decided to make the development of the dialogue manager as independent as possible from the managing of the speech technology in order to clarify the dialogue manager, and also to build a dialogue independent platform that can be easily reused by dialogue managers for other domains than the meteorological one. So we built a framework for dialogue managers and the dialogue manager itself based on the VoiceXML language, enabling telephony applications to be developed in an open-standards based environment.

5 The Speech Processing The recognition of the user utterance is based on non-stochastic grammars. These grammars have been developed following the Augmented Bacus-Naur Form (ABNF) that is used to specify languages, protocols and text formats. The ABNF grammar format uses special characters to de ne grammar expressions in a text string: such as '$' for non-terminal symbols, '[ ]' for optional values and 'f g' for return values. All grammars have been designed in order to give the user as much exibility as possible. Ibervox[11] has been used as the text to speech and speech recognition engine in Catalan for this system. The recognition system uses non-stochastic grammars and lling words. A con dence recognition measure is provided when the recognition has been done.

6 Evaluation Two evaluations were done while developing the system in order to accomplish three general objectives: to assess the eectiveness of the user-machine commu-

nication, to evaluate the global performance of the system, and to identify the possible lacks of the system that may have kept unnoticed to the people involved in the project. These evaluations became a part of the development process that helped to improve the system having into account the user opinion. Each evaluation consisted in giving user satisfaction surveys to the users, who answered lling a form on a web site. We decided to carry out an opinion poll letting two dierent groups of users interact with the system and polling them afterwards in order to determine which were the lacks of the system. The rst group was made up of colleagues not involved directly in this work; and the second one, with general public. In the web site, the users were asked to specify the degree in which they agree (from 1 for "completely disagree" to 6 for "completely agree") with 14 statements about the system. The form was divided into two parts: a mandatory one, with 6 statements; and an optional one, with 8 more statements. These statements were oriented to determine whether the speech recognition and the text-to-speech component were working properly, the ow of the dialogue and the behaviour of the system were natural and predictable, the user knew in every moment the actions that he/she could take, the help provided was useful, the repair strategy was useful and easy, and which were the parts of the dialogue where the user had a higher diculty to success. In this preliminary evaluation we obtained an average of 3.68 and 3.43 (over 6)for each evaluation group. These results are good enough taking into account that have been obtained while developing the system.

7 Conclusions and Future Work We have described a working automatic information system that provides two dierent services: on one hand, to obtain information about real time and personalized meteorological data, and on the other hand, to manage a meteorological warning service. Both services are accessed through a natural language interface, in Catalan, and help is provided when needed. The main advantages of the architecture of our system are: it separates the dialogue control, not only from the managing of the voice technology but also from the application logic, it provides a portable dialogue manager among dierent voice technologies, it provides an easily changeable interface, and it provides a portable voice framework among dierent domains application. All this occurs as a result of the VoiceXML standard based design and its architecture that clearly encapsulates every functional area in an independent component. The performance of our system, already running, is acceptable but, of course there are some points that can be improved. Firstly, the grammars can be enriched to recognize a greater range of utterances from the user. Secondly, the system messages and the grammars can be translated to other languages in order to incorporate speech tools in other languages satisfying tourists needs. And

nally, the text generation module[13] should be included to introduce more naturalness in the system answer.

8 Acknowledgements This work has been supported by the Catalan Secretary for the Information Society and by the Spanish Government (TIC2000-1735-C02-01). This project has been developed in collaboration with the Catalan Meteorological Service, Mensatec and ATLAS (Applied Technology on Language and Speech S.L.) companies, and the Phonetic Group of the Department of Filologa Espanola at the Autonomous University of Barcelona. We would also like to thank J. Padrell for his advices, and A. Febrer and A. Abad for his help in the early steps of the project.

References 1. TRINDI project: http://www.linglink.lu/le/projects/trindi. 2. Lamel, L., Rosset, S., Gauvain, J.L., Bennacef, S.: \The Limsi Arise System for Train Travel Information", in Proceedings of ICASSP'99(1999). 3. Allen, J.F., Miller, B.W., Ringger, E.K., Sikorski, T.: \A Robust System for Natural Spoken Dialogue", in Proceedings of ACL'96(1996). 4. A lvarez, J., Arranz, V., Castell, N., Civit, M.:\Linguistic and Logical Tools for an Advanced Interactive Speech System in Spanish", in Proceedings of IEA/AIE 2001, LNAI 2070, 2001. 5. Cohen, M., Rivlin, Z., Bratt, H.: \Speech Recognition in the ATIS Domain Using Multiple Knowledge Sources", in Proceedings of ICASSP'99(1999). 6. Zue, V., Sene, S., Glass, J.R., Polifroni, J., Pao, C., Hazen, T.J., Hetherington, L.: \Jupiter: A Telephone-Based Conversational Interface for Weather Information", IEEE Transactions on Speech and Audio Processing, vol. 8, n 1, pp. 85-96, 2000. 7. Servei Meteorologic Catala http://smc.gencat.es. 8. Eberman, B., Carter, J., Meyer, D., Goddeau, D.: \Building VoiceXML Browsers with OpenVXI", 2002. http://www2002.org/CDROM/refereed/260/index.html 9. SpeechWorks, provider of over-the-telephone automated speech recognition solutions. http://www.speechworks.com. 10. Dialogic, supplier of computer telephony products. http://www.dialogic.com 11. Ibervox from Atlas-CTI. http://www.atlas-cti.com/es/ibervoxasr.htm 12. Villarejo Munoz, L.: \Gestor de dialogo de un sistema de informacion meteorologica", in Spanish, Master Thesis in Informatics Engineering, FIB, 2002. 13. Garca Zorrilla, P. \Sistema d'acces en llenguatge natural a informacio meteorologica", in Spanish, Master Thesis in Informatics Engineering, FIB, 2002. 14. J.Hernando, J. Padrell, A. Bonafonte, N. Castell, J.B. Mario, C. Nadeu, J.A. R. Fonollosa, H. Rodriguez, A. Abad, L. Villarejo: \aTTemps: A Meteorological Information Service through the Telephone Network", Internal Report, UPC, 2002.