Usability design for spoken language dialogue

Anders Green
August 12, 2002

Abstract

This paper discusses factors and scientific approaches to usability of spoken dialogue systems. The view taken is that these approaches fit into the models for usability engineering and iterative system development. We propose that different aspects of usability engineering become important depending on whether the development is research-oriented or application-oriented. The relation between these aspects needs to be investigated in more depth to assure rapid development that still provides a basis for further research and theory building.

1 Introduction

The purpose of this article is to review scientific approaches to usability engineering for spoken language dialogue interfaces. We will investigate usability engineering for spoken dialogue systems by examining different approaches such as guidelines, evaluation metrics and design frameworks. These approaches may be grouped in different ways, but in the following we will try to discuss them in the light of the two process-oriented approaches shown in Figures 1 and 2 (Hulstijn, 2000; Gamm & Haeb-Umback, 1995).

Usability is a measure of the effectiveness, efficiency, and satisfaction with which specified users achieve specified goals in particular environments (ISO 9241). The “particular environments” we are referring to are different types of spoken dialogue interfaces. Usability may be seen as a local aspect of a system, where the focus lies on measuring particular features of the system’s components. If the focus lies on the system as an artefact in use, we generally need to look at how useful the system is to the user in a wider perspective. We therefore need to distinguish between the terms usability and usefulness. Usefulness is a measure of whether the user can achieve a certain goal using the system. Usability is important for achieving usefulness, but it is not the only criterion. The motivation and needs of users are also factors that influence the usefulness of the system. This means that systems that get low usability scores according to some metric may still be useful, given that the need and motivation are strong enough.

The aim of this article is not to cover different technical approaches to system design; the view taken in the following is rather the opposite. The assumption is that no matter what kind of technical approach has been chosen for the system, the factors affecting usability will be constant. That is not to say that the way the system is constructed from a technical point of view does not have an effect on usability.

For the sake of discussion, let us see system development as a process like the one in Figure 1, in which analysis, design, implementation and evaluation are the steps that are iterated and lead towards a usable system (Hulstijn, 2000). To this picture we may add a set of stakeholders: users, designers, developers etc., and more recently customers who are buying spoken dialogue systems.

Figure 1: Iterative development cycle (Hulstijn, 2000)

2 Spoken dialogue interfaces

There is no de facto standard for spoken dialogue system design like there is for graphical user interfaces, where direct manipulation in the windows metaphor (WIMP: Window, Icon, Menu, Pointing device) is the most common interface to personal computers. There are some areas where spoken dialogue systems are becoming more conventionalized, for instance in telephony-based interfaces for information access. This is a sign of the maturation of speech interface design and needs to be considered when attempting to find a best practice for the design and development of spoken dialogue systems. However, this article is focused on the design of spoken dialogue systems, leaving the reactive, command-based systems aside.

Figure 2: Usability engineering (Gamm & Haeb-Umback, 1995)

The spectrum of spoken dialogue systems is quite broad. At one end we have systems aiming at providing full-blown, natural conversation with computers. These very complex systems mostly reside in research laboratories or are used for entertainment. Design in these systems is aimed at improving the system’s conversational style. At the other end of the spectrum, keyword-based or touch-tone telephony applications provide task-oriented interaction giving access to databases of various kinds. To this group we may add purely reactive, command-based systems, e.g., systems providing speech access to graphical interfaces or dictation systems that comprise text editing modes. The class of systems we are considering in the following resides somewhere in between. A typical vanilla-flavour system would be equipped with an automatic speech recognizer, a linguistic interpreter, some kind of dialogue handler and a spoken language generator producing spoken prompts (either pre-recorded or using synthesised speech).
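As a rough illustration of the component chain described above for a typical system, the following Python sketch wires a stubbed recognizer, interpreter, dialogue handler and generator together. Everything in it is an assumption made for illustration: the function names, the keyword-spotting interpreter and the two train-travel slots are hypothetical and are not taken from any of the systems discussed in this article.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    """Slots collected so far (hypothetical train-travel task)."""
    slots: dict = field(default_factory=dict)


def recognize(audio: str) -> str:
    # Stand-in for an automatic speech recognizer: the "audio" is
    # already text here, so we only normalise it.
    return audio.lower().strip()


def interpret(text: str) -> dict:
    # Stand-in for the linguistic interpreter: naive keyword spotting.
    found = {}
    words = text.split()
    for prev, word in zip(words, words[1:]):
        if prev == "to":
            found["destination"] = word  # the last "to X" wins
    for day in ("today", "tomorrow"):
        if day in words:
            found["day"] = day
    return found


def handle(state: DialogueState, meaning: dict) -> str:
    # Dialogue handler: update the state, then pick the next system act.
    state.slots.update(meaning)
    for slot in ("destination", "day"):
        if slot not in state.slots:
            return f"ask_{slot}"
    return "confirm"


def generate(act: str, state: DialogueState) -> str:
    # Spoken language generator: template-based prompts that would be
    # passed on to pre-recorded or synthesised speech output.
    prompts = {
        "ask_destination": "Where do you want to go?",
        "ask_day": "When do you want to travel?",
        "confirm": "Looking for trains to {destination} {day}.",
    }
    return prompts[act].format(**state.slots)


def turn(state: DialogueState, audio: str) -> str:
    """Run one user turn through the whole chain."""
    return generate(handle(state, interpret(recognize(audio))), state)
```

Calling `turn` repeatedly with the same `DialogueState` simulates a short dialogue in which the system keeps asking until both slots are filled. Real components would of course be far more elaborate; the point is only the division of labour between the four parts.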

3 User-centered design and Usability Engineering

The idea of user-centred design involves considering input from the users at all stages in the process of creating a novel system. This iterative process has four phases: analysis, design, implementation and evaluation. In every phase a set of methods can be used, leading to the next stage. In Figure 1 this is depicted as a spiral movement. Each time a cycle is completed the complexity, or detail, of the system increases (Hulstijn, 2000).

Another way of looking at the design process is put forward by Gamm & Haeb-Umback (1995). The usability engineering lifecycle in Figure 2 consists of five phases. In the functional specification phase a set of features is determined. In the concept development phase an outline of the dialogue is created. This dialogue is then iteratively developed and formally specified during the dialogue specification phase, using a prototype developed and verified during the rapid prototyping and verification phases. There are other possible ways of looking at the design process, but we will stick to these models for the sake of argument, assuming that most available methods can be fitted into one of these general schemas.

3.1 Analysis

An important step in the design process for a spoken dialogue system is to gain a solid understanding of the intended users. We need to learn about their needs and expectations and how they behave, both in terms of linguistic behaviour and otherwise, taking all aspects that may affect the system design into consideration (Bernsen, Dybkjær, & Dybkjær, 1998; Hulstijn, 2000). One way of achieving this is to involve users during the whole process of creating a spoken dialogue system; otherwise we may end up with a system that fails to meet the requirements and subsequently is not used (Dybkjær, Bernsen, & Dybkjær, 1997). To determine whether an application meets the users’ needs and preferences, focus groups and user-experience forums can be assembled to judge whether the system concept is satisfactory (Potamianos, 1999). Focus groups can also be used early in the process to collect data for a requirements specification.

Scenarios and use cases can be developed in different ways to get a deeper understanding of the task that the system should perform. At early stages of development simple scenarios of user behaviour may be recorded to inform design. At later stages scenarios and use cases can be evaluated more formally, contributing to a requirements specification that may be used in various ways in the implementation phase (Hulstijn, 2000).

Another technique, related to Wizard-of-Oz, for information elicitation and evaluation using simulated dialogue is described in (James, Rayner, & Hockenmy., 2000). By simulating the dialogue up to a certain point it is possible to ask the user what should happen when a certain command is issued. This kind of verbal protocol is also referred to as forward simulation of system actions (James et al., 2000). In this way it is possible to test specific parts of a dialogue without having to set up a full-blown Wizard-of-Oz study. This is related to a technique for knowledge elicitation termed post-verbalization, discussed in (Karsenty, 2001). As opposed to the well-known think-aloud protocols of traditional Human-Computer Interaction, where the user continuously comments on her actions during the trial session, the user is told to comment on what was just said. This provides qualitative information at specific stages during the interaction (Karsenty, 2001).

3.1.1 Corpus-based analysis

Looking at a corpus of task-oriented human-human spoken dialogue may give a picture of what kind of vocabulary, speech acts, tasks and dialogue patterns we have to handle in the spoken dialogue system we are designing. Apart from performing linguistic analyses we may also use spoken language corpora to evaluate our system (Bernsen et al., 1998). Pre-design studies can also be used to study human-human interaction in the domain of interest. The goal of pre-design studies is to help the designer see the task from a user perspective (Yankelovich, Levow, & Marx, 1995). Pre-design studies and rapid prototyping with subsequent software redesign are claimed to be inexpensive and fast compared to reportedly costly Wizard-of-Oz studies (Yankelovich et al., 1995). A related technique is dialogue distilling, where human-human dialogue corpora are rid of human-like content, leaving only content directly relevant to the task (Jönsson & Dahlbäck, 2000).

3.2 Design

During the design phase of the iterative development cycle, different models and prototypes are produced by a designer or a team of designers. The designers’ ability to create interfaces that provide utility for their users depends on a set of different factors. One is their knowledge of users, both users in general and specific users having specific tasks. Another is the designers’ knowledge of the properties of the language(s) for which they are implementing the system. This knowledge covers linguistic theories and data on the lexical, structural and semantic properties of the language in question.

Another aspect affecting design, which unfortunately cannot be addressed within the scope of this article, is the way interface designers structure their work and how they conceive the art of design, both in terms of in-house practices and general design methodology. However, one important point about designers is made by Dybkjær & Bernsen (2000): even if designers may discover a lot of usability problems, they become too well accustomed to the system and avoid getting into trouble. Involving users counters these effects, and is worth the extra resources needed.

3.2.1 Properties of users

Human factors limit what it is possible to do with speech interfaces. Speech is transient and non-persistent; therefore human working memory becomes one of the most important constraining factors. Task complexity poses constraints on working and short-term memory (Cook, Angyus, & Campbell, 1999). Physical fatigue from speaking continuously (Shneiderman, 2000) and social issues concerning privacy and intrusion are also limiting factors.

It is very likely that users’ conception of speech systems is under constant change, given that the quality of existing systems is improving rapidly. Today most users have not used a speech interface, and may have diverse models of the capabilities of speech interfaces. The way users’ expectations and conceptions of speech interfaces affect the interaction has been addressed by Weegels (2000). In a recent study, users (n=60) were asked how they thought a spoken dialogue system for booking train tickets worked. Some users (n=6) thought that the system had some kind of speech recognizer, whereas others (n=6) thought that there was a set of fixed phrases stored in the computer. Other users (n=4) thought that they had to navigate in the system using the keypad of a telephone, as in common voice-response systems. This suggests that users have limited knowledge of the way spoken dialogue systems work (Weegels, 2000). The users of the system also underestimated what they could say. For instance, they did not believe they could say things like “tomorrow” or “next Sunday”; instead they tried to use dates. Some of the subjects (n=6) believed that they could not say things like “half past ten”, saying “ten thirty” instead.

In another study, users led to believe they were interacting with a computer provided less information to the computer, which may show that they believed that the computer could only process a little information at a time (Amalberti, Carbonell, & Falzon, 1993). In addition, some users in Weegels’ study felt that they could use only short sentences, because the use of longer ones was “punished” with more recognition errors (Weegels, 2000). It therefore seems that one strategy, when users feel that the system lacks competence, is to use telegraphic messages, trying to input information a little bit at a time. Weegels (2000) concludes that designers should be aware that users bring along expectations formed by interactions with human operators and other systems.

One way of forming users’ expectations of the system’s capabilities is to let users take a tutorial, if possible. Users may become more efficient and more satisfied if they are given the opportunity to learn the system before they use it. Initial experiences with a speech system seem to be important, and a tutorial may ensure a successful first contact with a system (Kamm, Litman, & Walker, 1998). Users’ conception of system intelligence is not heavily dependent on the phrasing: a system using a more anthropomorphic (human-like) style was not deemed more intelligent than one using a more machine-like style (Brennan, 2001).

3.2.2 Design guidelines

As speech technology matures, guidelines have been developed for different classes of systems. For spoken dialogue systems, guidelines may be grouped in two broad categories, oriented towards general and practical aspects of dialogue development.

General guidelines

In recent years there have been different attempts at providing guidelines for designing spoken language interfaces. The DISC project (Bernsen et al., 1998) and the Guidelines for the Design of Advanced Voice Dialogues (Cheepen, Gilbert, Failenschmid, & Williams, 2002) are two examples of this. Another, more constrained, example is the USI framework, Universal Speech Interfaces (Rosenfeld, Olsen, & Rudnicky, 2000), where a set of guidelines is proposed together with a set of specific keywords and phrasings forming a general language, which may be extended with a task-specific part. Rather than providing robustness from a technical point of view, they provide a limited set of universal keywords together with interaction guidelines. The claim is that the use of standardized input and output formats will improve learnability across applications.

Bernsen et al. (1998) provide seven aspects of interaction based on Gricean maxims. These neo-Gricean aspects are: Informativeness, Truth and evidence, Relevance, Manner, Partner asymmetry, Background knowledge, and Repair and clarification. Examples of using these guidelines may be found on the DISC2 webpages: www.disc2.dk.

In (Dybkjær et al., 1997) the focus is on walk-up-and-use spoken dialogue with a great deal of first-time users, and eleven human factors best practice issues are presented and discussed. Dybkjær et al. stress the importance of acquiring as much relevant information about potential users as possible. The quality of the speech recognizer and of the linguistic analysis is another set of points. Output voice quality, i.e. whether to use concatenated or synthesised speech, is another issue. Providing relevant feedback using appropriate phrasing goes back to Gricean principles. Dialogue models and initiative management form another area that needs to be considered. Handling errors in an adequate way is important, something which is coupled with providing help or interaction guidance.

In (Gamm & Haeb-Umback, 1995) a set of guidelines for consumer electronics is proposed. Some of these are directly subsumed by the Gricean guidelines of (Bernsen et al., 1998), for instance those concerning consistency and feedback. Two of them are worth examining a bit closer, namely “Give the user the choice of input modality” and “Do not overload the voice input channel”. These guidelines are related to what Bernsen et al. would consider modality appropriateness. It seems very important to consider the alternatives to speech before embarking on the development of a spoken dialogue system.

Ergonomic choices

Another set of principles is termed ergonomic choices (Rosset, Bennacef, & Lamel, 1999); these affect the dialogue management strategies of a spoken dialogue system:

– Freedom and flexibility: avoid imposing constraints as long as the dialogue flows well.
– Negotiation: the possibility to accept or refuse system proposals.
– Navigation: the identification of a change in the task.
– Initiative: who directs the progression of the dialogue.
– Contact: maintaining contact with the user, not letting the user get lost, and responding immediately.

Practically oriented guidelines

Among more practical guidelines we find Najjar, Ockerman, & Thompson (1998), who present a compilation of seventeen practically oriented guidelines for speech recognition applications. These guidelines are aimed more at the speech recognition aspects of usability. Practical techniques to guide the users’ speech are also discussed by Yankelovich (1996). Prompt design is in focus rather than discourse strategies. Yankelovich gives examples of five main prompt types: explicit and implicit prompts, and incremental prompts and tapering, where the quantity of information is varied. Explicit hints to the user about what can be said are also proposed.
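Two of the prompt types mentioned above, incremental prompts and tapering, lend themselves to a small sketch. The Python below is only an illustration under our own assumptions: the prompt wordings, the function name `choose_prompt` and the threshold `taper_after` are hypothetical and are not taken from (Yankelovich, 1996). Incremental prompting is read here as giving more explicit guidance after each failed attempt, and tapering as shortening the prompt once the user has completed the step a few times.

```python
# Hypothetical prompts for one dialogue step, ordered from terse to explicit.
PROMPTS = [
    "Destination?",                                               # tapered
    "Where do you want to go?",                                   # implicit
    "Please say the name of the station you want to travel to.",  # explicit
    "Please say a station name, for example 'Stockholm'.",        # with hint
]


def choose_prompt(failures: int, successes: int, taper_after: int = 3) -> str:
    """Pick a prompt from the number of failed attempts at this step and
    the number of times the user has completed the step before."""
    if failures == 0:
        # Tapering: shorten the prompt once the user has shown experience.
        level = 0 if successes >= taper_after else 1
    else:
        # Incremental prompting: one step more explicit per failure.
        level = min(1 + failures, len(PROMPTS) - 1)
    return PROMPTS[level]
```

A first-time caller would hear the implicit prompt, a caller who has just been misrecognized would hear the explicit one, and a practised caller would get the shortest form.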

3.2.3 Habitability

One aspect of design that has attracted some interest is habitability, which may be defined as a relation between what the system can manage and what the user feels comfortable with. Thus, a habitable interface is one where the users do not feel unnecessarily constrained by the system’s ability to understand (Hone & Baber, 2001). Providing full-blown natural interaction that gives the user maximum freedom of expression (Rosset et al., 1999) means that the designer faces a combinatorial explosion of possible phrases. There is a tradeoff between the number of possible user utterances and the lexicon and grammars employed in a system. In language the mapping of terms to referents is many-to-one; this is termed the vocabulary problem and needs to be addressed in any practical system (Brennan, 2001).

There have been some attempts at fighting the combinatorial explosion. Identifying a restricted subset of the particular language by looking at frequencies may be a way to overcome some of the problems. Even if this approach looks promising from a theoretical point of view, there are empirical counterclaims (Brennan, 2001). One way of overcoming the vocabulary problem is to use careful phrasing in system responses. This exploits a phenomenon called lexical entrainment, which means that the user’s language is coloured by the way the computer speaks: users adapt to the computer’s terms (Brennan, 2001; Zoltan-Ford, 1991). This mechanism may be used to control the behaviour of the user so that linguistic variability decreases, and it has been discussed in practical terms by (Yankelovich, 1996; ?). Another way of limiting the search space is to be aware that lexical variability is high between dialogues with different users but relatively low within a dialogue (Brennan, 2001).
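The many-to-one mapping behind the vocabulary problem, and the idea of steering users through consistent system phrasing, can be sketched as follows. This is a minimal Python illustration under our own assumptions: the synonym table, the function names and the reply wordings are hypothetical, and a real system would derive its vocabulary from corpus data rather than from a hand-written dictionary.

```python
# Hypothetical synonym table: many user terms map to one system concept.
SYNONYMS = {
    "ticket": "ticket", "fare": "ticket", "pass": "ticket",
    "departure": "departure", "leaving": "departure", "leave": "departure",
}


def canonical_terms(utterance: str) -> list[str]:
    """Map the words of an utterance to the system's preferred terms."""
    words = utterance.lower().split()
    return [SYNONYMS[w] for w in words if w in SYNONYMS]


def system_reply(utterance: str) -> str:
    """Answer using only the preferred terms, so that users who adapt to
    the system's wording drift towards the supported vocabulary."""
    terms = canonical_terms(utterance)
    if not terms:
        return "You can ask about a ticket or a departure."
    # dict.fromkeys removes duplicates while keeping the original order.
    return "You asked about: " + ", ".join(dict.fromkeys(terms)) + "."
```

Because the reply always uses "ticket" and "departure", a user who first said "fare" is exposed to the preferred term, which is the kind of entrainment effect described above.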

3.3 Implementation

One aspect of software development which affects usability engineering is the use of tools for rapid prototyping. With rapid prototyping the design iterations may be shorter and lead to improved dialogue (Gamm & Haeb-Umback, 1995). Using predefined dialogue objects like Nuance SpeechObjects (footnote 2) may facilitate the use of standards and guidelines. However, from a research point of view, too much standardization in a tool may make it hard to try out new solutions that do not easily fit into the set of features provided by a certain tool.

3.4 Evaluation

Judging from the number of available research reports on the evaluation of dialogue systems, this is the area which quantitatively has gained the most interest from the research community in the last decade. This is perhaps reflected in the great number of different parameters that have been proposed for evaluating spoken dialogue systems. In the literature covered for this article we found about fifty terms, and discussing them in detail is not within the scope of this article.

When considering methods for evaluation it is important to choose the right set of methods, since it seems that a single method cannot give the necessary understanding of the usability issues involved. There also seem to be different approaches to evaluation depending on whether the focus is practical design of speech interfaces or research. The first group of designers focuses on qualitative, rapid, low-cost and easily adaptable methods. Speech interface designers in the research community seem to be more interested in the completeness of methods, the depth of analysis and generalizability among dialogue systems.

3.4.1 Quick-and-Dirty usability

From the set of rapid methods for evaluation we may consider the qualitative methods for formative evaluation such as focus groups and interviews. The method using a simulated dialogue described in (James et al., 2000) gives the opportunity to reconfigure the system and replay parts of the dialogue, making changes almost in real time. A common method to get quick user feedback on a specific design is to use questionnaires to assess qualitative parameters associated with user satisfaction.

Footnote 2 (Nuance SpeechObjects): cf. www.nuance.com

Another approach is to use heuristic evaluation, for instance by applying a set of questions based on guidelines like CODIAL (cf. the DISC2 webpages, www.disc2.dk) (Dybkjær & Bernsen, 2000), or to use a set of expert evaluators with experience of usability issues. We may also use some quantitative measurements that are easily collected and provide rapid responses to designers, such as speech recognizer accuracy, task-completion rate and number of turns.

The rationale for using rapid methods is to be able to speed up the development process. From a more scientific point of view we might argue that the use of the more rapid methods described above makes it possible to involve the users more often, and that this in itself increases the potential of creating systems with a heightened degree of usability.

3.4.2 In-depth usability measures

In the following we will consider some frameworks for measuring usability for spoken dialogue systems. These frameworks have been developed in the research community as a means of evaluating spoken dialogue systems in depth. The most recognized model for usability evaluation is the PARADISE model, which uses a battery of combined metrics with the general aim of providing a way of predicting usability, stated as the goal: “the system’s primary objective is to maximize user satisfaction” (Kamm et al., 1998). In PARADISE, maximizing user satisfaction concerns two subgoals: maximizing task success and minimizing dialogue cost. A large set of measures is collected:

– Dialogue efficiency, looking at elapsed time and the number of system and user turns.
– Dialogue quality, in terms of recognition accuracy and a number of other measurements related to speech recognition.
– Task success, seen both as a quantitative measure comparing users’ performance on the task with a key and as a qualitative measure of users’ perceived task completion.
– User satisfaction, measured using a questionnaire containing a set of questions concerning the behavior of the system. Users judged the system according to a set of questions concerning: TTS performance, ASR performance, task ease, interaction pace, user expertise, system response, expected (system) behavior, comparable interface and future use.

The fifteen evaluation criteria of the model proposed by (Dybkjær & Bernsen, 2000) cover a broader spectrum than the measures in PARADISE. In addition to the evaluation criteria mentioned in PARADISE, some evaluation criteria are proposed concerning the choice of modality, feedback adequacy, naturalness of user speech and dialogue structure, adequacy of error handling and the number of interaction problems users face.

Usability for spoken dialogue systems can be seen in relation to general usability properties for software quality (ISO, 1991). Hulstijn (1999) distinguishes three levels of abstraction, where the first and most abstract level concerns quality factors. Usability is one of these quality factors and can at the second level be found as a set of properties concerning both design and evaluation. These properties are abstract but can be made concrete for a specific application type. At the third level we need to determine observable metrics to be able to collect data about the design. Hulstijn (1999) considers:

– effectiveness, the accuracy and completeness with which users achieve their task,
– efficiency, the relative effectiveness of a system set in relation to the effort to achieve it,
– transparency & coherence: a system is transparent if the user’s mental model of the system’s capabilities and behavior coincides with the design of the system; the coherence of the system is the degree to which the system’s utterances are combined into a well-formed dialogue,
– satisfaction, defined as the way users perceive the system, which may be seen as a measure of perceived usability.
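To make the measures discussed in this section more concrete, the sketch below computes two easily collected quantities (task-completion rate and mean number of turns) from logged dialogues, and then combines task success and dialogue cost in the spirit of PARADISE. It is an illustration only. The `Dialogue` record, the use of a binary completion flag as the success measure, the choice of turns as the only cost, and the weights `alpha` and `w` are our assumptions; PARADISE itself, as we understand it, estimates such weights by regressing user satisfaction on the normalised measures.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Dialogue:
    """One logged dialogue (hypothetical log format)."""
    task_completed: bool
    turns: int


def completion_rate(logs: list[Dialogue]) -> float:
    """Share of dialogues in which the task was completed."""
    return sum(d.task_completed for d in logs) / len(logs)


def mean_turns(logs: list[Dialogue]) -> float:
    """Average number of turns per dialogue."""
    return mean(d.turns for d in logs)


def z_scores(values: list[float]) -> list[float]:
    # Normalise to zero mean and unit variance so that measures on
    # different scales can be combined; constant input maps to zeros.
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


def performance(logs: list[Dialogue], alpha: float = 1.0,
                w: float = 1.0) -> list[float]:
    # Reward task success and penalise dialogue cost (here: turns only).
    success = z_scores([float(d.task_completed) for d in logs])
    cost = z_scores([float(d.turns) for d in logs])
    return [alpha * s - w * c for s, c in zip(success, cost)]
```

With such scores, a short successful dialogue ranks above a long successful one, which in turn ranks above a long failed one. The scores are relative to the set of logged dialogues, so they are only comparable within one data set.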

4 Concluding discussion

Modeling usability in great depth is still an open question within the research community. Hulstijn (2000) argues that the neo-Gricean principles of (Bernsen et al., 1998) can be used as guidelines, but may become confusing since maxims sometimes come into conflict. The evaluation framework of PARADISE provides a model for predicting usability by means of user satisfaction. Even if PARADISE provides an account of which properties affect usability and gives an indication of whether a system gets better or worse after it has been redesigned, it provides little guidance in terms of design choices. The principled approach taken in (Bernsen et al., 1998) (i.e. CODIAL) is somewhat more hands-on, but there is a risk that the great number of principles and issues to be considered makes designers reluctant to go through all the steps necessary to evaluate the system in depth. Using Gricean maxims as the starting point requires a great deal of theoretical understanding of linguistics on the part of the designers. The five techniques described in (Yankelovich, 1996) are practical, hands-on advice which for most purposes seems to cover what is communicated in the principle-based approach (Bernsen et al., 1998).

After considering the alternative approaches to usability and design, we propose that there are (at least) two different aspects of usability engineering for spoken dialogue systems. One aspect is more oriented towards research in usability and the other is more practically oriented towards applications. The time frame and reusability are important factors that affect these aspects. Research-oriented methods often concern collecting large amounts of data from human-to-human dialogue or simulated human-machine dialogue, with the overall goal of providing theories about human-computer dialogue that can be applied in dialogue design. Much effort is put into annotation and the possibility of re-using corpus data. Practically oriented methods strive towards generating artefacts, systems that are going to be used. Data is also collected during this process, but the focus lies on gathering information about application-specific aspects rather than theory building. Evaluation within a research framework seems to stress the importance of comparing strategies for dialogue systems (and thus the underlying theories). A more practically oriented approach focuses on evaluation for usability and usefulness of prototypes at different stages of development.

There is no doubt that these characterizations are too black-and-white and need to be refined further. More in-depth studies of spoken dialogue system design are needed before we may address the question of how to provide methods that optimize the relationship between the swiftness needed in practical application development and approaches that allow for reflection, analysis and theory development.

References

Amalberti, R., Carbonell, N., & Falzon, P. (1993). User representations of computer systems in human-computer speech interaction. International Journal of Man-Machine Studies, 38 (4), 547–566.
Bernsen, N. O., Dybkjær, H., & Dybkjær, L. (1998). Designing interactive speech systems: from first ideas to user testing. London: Springer.
Brennan, S. E. (2001). Automated spoken dialog systems. Cambridge, MA: MIT Press.
Cheepen, C., Gilbert, N., Failenschmid, K., & Williams, D. (2002). Guidelines for the design of advanced voice dialogue. Draft version at http://www.soc.surrey.ac.uk/ scs1ng/guidelines. (Department of Sociology, University of Surrey, UK and Vocalis Ltd)
Cook, M. J., Angyus, C., & Campbell, C. (1999). Speech interfaces in real-time control systems. In Proceedings of People in Control: An International Conference on Human Interfaces in Control Rooms, Cockpits and Command Centres (pp. 428–433).
Dybkjær, L., & Bernsen, N. O. (2000). Usability issues in spoken language dialogue systems. Natural Language Engineering, 6 (Parts 3/4), 243–272. (In J. v. Kuppevelt, U. Heid, & H. Kamp (Eds.), Special Issue on Best Practice in Spoken Language Dialogue Systems Engineering)
Dybkjær, L., Bernsen, N. O., & Dybkjær, H. (1997). Designing co-operativity in spoken human-machine dialogues. In K. Varghese & S. Pfleger (Eds.), (pp. 104–124). Springer Verlag.
Gamm, S., & Haeb-Umback, R. (1995). User interface design of voice controlled consumer electronics. Philips Journal of Research, 49 (4).
Hone, K., & Baber, C. (2001). Designing habitable dialogues for speech-based interaction with computers. International Journal of Human Computer Studies, 54 (4), 637–662.
Hulstijn, J. (1999). Modelling usability: development methods for dialogue systems. In J. Alexandersson (Ed.), Proceedings of the IJCAI’99 Workshop on Knowledge and Reasoning in Practical Dialogue Systems. Stockholm.
Hulstijn, J. (2000). Dialogue models for inquiry and transaction. Unpublished doctoral dissertation, University of Twente.
ISO. (1991). International standard ISO/IEC 9126. Information technology – software product evaluation – quality characteristics and guidelines for their use. International Organization for Standardization. (International Electrotechnical Commission, Geneva)
James, F., Rayner, M., & Hockenmy., B. (2000). “Do that again”: Evaluating spoken dialogue interfaces. (Under review for HCI-Aero 2000)
Jönsson, A., & Dahlbäck, N. (2000). Distilling dialogues - a method using natural dialogue corpora for dialogue systems development. In Proceedings of the 6th Applied Natural Language Processing Conference (pp. 44–51).
Kamm, C., Litman, D., & Walker, M. A. (1998). From novice to expert: The effect of tutorials on user expertise with spoken dialogue systems. In Proceedings of the International Conference on Spoken Language Processing, ICSLP98.
Karsenty, L. (2001). Adapting verbal protocol methods to investigate speech systems use. Applied Ergonomics, 32, 15–22.
Najjar, L. J., Ockerman, J. J., & Thompson, J. C. (1998). User interface design guidelines for speech recognition applications. In IEEE VRAIS 98 Workshop Interfaces for Wearable Computers.
Potamianos, A. (1999). Design principles and tools for multimodal dialog systems. In Proceedings of ESCA Workshop in Interactive Dialogue in Multimodal Systems. Kloster Irsee, Germany.
Rosenfeld, R., Olsen, D., & Rudnicky, A. (2000). Universal human-machine speech interface: A white paper (Tech. Rep.). Carnegie Mellon University. (CMU-CS-00-114)
Rosset, S., Bennacef, S., & Lamel, L. (1999). Design strategies for spoken language dialog systems. In Proceedings of the European Conference on Speech Technology, Eurospeech (pp. 1535–1538). Budapest.
Shneiderman, B. (2000). The limits of speech recognition. Communications of the ACM, 43 (9), 63–65.
Shriver, S., Rosenfeld, R., Zhu, X., Toth, A., Rudnicky, A., & Flueckiger, M. (2001). Universalizing speech: Notes from the USI project. In Proceedings of Eurospeech.
Weegels, M. F. (2000). Users’ conceptions of voice-operated information services. International Journal of Speech Technology, 3 (2), 75–82.
Yankelovich, N. (1996). How do users know what to say? ACM Interactions, 3 (6).
Yankelovich, N., Levow, G.-A., & Marx, M. (1995). Designing SpeechActs: Issues in speech user interfaces. In Proceedings of Conference on Human Factors in Computing Systems CHI ’95. Denver, CO.
Zoltan-Ford, E. (1991). How to get people to say and type what computers can understand. International Journal of Man-Machine Studies, 34, 527–547.