
Natural Spoken Dialogue Interaction: Technology, Tools, Resources and Applications

Todor Ganchev, Otilia Kocsis, Dimitrios Lyras, Nikos Katsaounos, Iosif Mporas, Alexandros Lazaridis, Theodoros Kostoulas, Stavros Ntalampiras, Mihalis Siafarikas, Ilias Kotinas, George Papadopoulos, Kyriakos Sgarbas and Nikos Fakotakis¹

1 Profile of the Artificial Intelligence Group¹ (AIG)

The Artificial Intelligence Group (AIG) is one of the seven research groups of the Wire Communications Laboratory (WCL, established in 1967) in the Department of Electrical and Computer Engineering, University of Patras, Greece. The group has more than 30 years of continuous research activity in the areas of Speech & Language Technology and Artificial Intelligence. During this period, AIG has produced over 300 scientific publications, contributed over 20 PhD dissertations, developed various resources (databases, research tools, technology-proving prototypes, etc.) and participated, as a partner or coordinator, in more than 30 RTD projects, some of which are still running. AIG develops speech, language and dialogue interaction systems for telecommunication and industrial applications, and a significant part of its research also focuses on theoretical and mathematical models of artificial intelligence methods and algorithms.

2 Tools & Resources

The AIG team owns a professional studio equipped as a smart-home environment, where various research, development and evaluation activities take place. This facility is also used for recording speech and audio-visual databases.

Over the past decades, AIG has invested significant effort in developing spoken and written language resources. In cooperation with other institutions, AIG developed three large-scale speech recognition corpora, SpeechDat(II)-FDB-5000-Greek [1], SpeechDat(Car)-Greek [2] and OrienTel [3], and participated in the development of various smaller domain-specific databases, among which is the MoveOn Motorcycle speech corpus [4], created for the needs of police information support systems. Furthermore, AIG participated in the creation of the PolyCost speaker recognition corpus [5] and developed on its own a prosodic database for text-to-speech synthesis for the Greek language [6]. Recently, AIG completed phase I of the creation of a real-world emotional speech database [7] that supports research for smart-home applications and, in cooperation with other institutions, phase I of the audio database in support of potential threat and crisis situation management activities [8]. Presently, AIG is leading the recording of a multimodal database from heterogeneous sensors (microphone arrays, video cameras, infrared cameras, 3D cameras, movement sensors), which is being created under the aegis of the FP7 Prometheus project [9-(1)]. Furthermore, various text corpora with an overall size of over 50 Mwords were created [9-(2÷4)], and recently the Korais lexicon with 80000 lemmas was completed [9-(5)].

A variety of Natural Language Processing (NLP) tools developed by AIG have been used in research projects: tools for morphological and syntactic analysis [9-(6÷7), 10, 11], lemmatizers [12, 13], grapheme-to-phoneme and phoneme-to-grapheme converters [14], etc. These tools were developed to support research on speech and language processing and on artificial intelligence in general. Finally, AIG actively participated in the development of a generic platform for multilingual interactive natural interfaces [9-(13)], which provides an environment for semi-automatic generation of spoken dialogue interaction applications.

¹ Artificial Intelligence Group, Wire Communications Laboratory, Dept. of Electrical and Computer Engineering, University of Patras, 26500, Rion-Patras, Greece. URL: http://www.wcl.ece.upatras.gr/ai, E-mail: [email protected]

3 Research and technology development

In the framework of various research and technology development projects, AIG developed innovative technology and a number of original paradigms for its use. Among the achievements are algorithms for speech enhancement [17], speaker localization and tracking [18], speech parameterization [19-21], speech recognition [22], speaker recognition [23], language and dialect recognition, emotion recognition [24], text-to-speech synthesis [25], methods for evaluation of voice-conversion systems [26], sound event recognition [27], etc.

A number of algorithms for natural language processing were also developed: a morphological analyzer employing Directed Acyclic Word Graphs [10], a grapheme-to-phoneme converter based on decision trees [11], a language-independent lemmatizer that combines two alignment models based on string similarity and on the most frequent inflexional suffixes [12, 13], a PC-KIMMO-based bidirectional grapheme-to-phoneme and phoneme-to-grapheme converter [14], a modular semantic parser [15] and a semantic analyzer for temporal expressions [16].
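To give a feel for the string-similarity idea behind the lemmatizer [12, 13], the sketch below maps an inflected word to the lexicon entry with the smallest Levenshtein edit distance. This is only a minimal illustration under stated assumptions, not the published algorithm: the function names, the toy English lexicon and the tie-breaking rule are hypothetical, and the second model based on frequent inflexional suffixes is left out.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def lemmatize(word: str, lemmas: list[str]) -> str:
    """Return the lemma closest to `word`; ties go to the earliest lemma listed."""
    return min(lemmas, key=lambda lemma: levenshtein(word, lemma))


# Hypothetical toy lexicon, for illustration only (not an AIG resource).
TOY_LEMMAS = ["walk", "talk", "run", "speak"]

print(lemmatize("walked", TOY_LEMMAS))
print(lemmatize("talking", TOY_LEMMAS))
```

A pure nearest-neighbour search like this scans the whole lexicon for every query, so a practical system would need some indexing or pruning on top of it; the sketch only shows the similarity criterion itself.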

4 Applications

Based on the speech and language technology algorithms and tools outlined in Sections 2 and 3, various spoken dialogue interaction applications were developed. Among these are a call router for the Department of Electrical and Computer Engineering, University of Patras [28], a voice portal for the University of Patras [29], etc. In cooperation with other institutions, AIG developed spoken dialogue interaction systems for smart-home environments [9-(14, 15)], access authorization services [9-(16)], spoken information teleservices [9-(8÷12)], etc.

T. Ganchev, O. Kocsis, D. Lyras, N. Katsaounos, I. Mporas, A. Lazaridis, T. Kostoulas, S. Ntalampiras, M. Siafarikas, G. Papadopoulos, K. Sgarbas, N. Fakotakis: Natural Spoken Dialogue Interaction: Technology, Tools, Resources and Applications, Proc. of the System Demonstrations of the 18th European Conference on Artificial Intelligence, ECAI 2008, ISBN: 978-960-6843-17-4, July 23-25, 2008, Patras, Greece.

5 Demonstrations

• Call router for the Department of Electrical and Computer Engineering, University of Patras,
• Voice portal for the University of Patras,
• Spoken dialogue interaction system for smart-home environment,
• Entrance manager,
• Greek voice for text-to-speech synthesis,
• Language-independent lemmatizer,
• Morphological analyzer for the Greek language,
• Phoneme-to-grapheme and grapheme-to-phoneme converter for the Greek language, etc.,
• Environment for building interactive natural interfaces.

6 REFERENCES

[1] SPEECHDAT (LE-2 4001): "Speech Databases for the Creation of Voice Driven Teleservices". The Greek SpeechDat(II) FDB-5000 corpus is available through ELDA under the code S0118: http://www.elda.fr/catalogue/en/speech/S0118.html
[2] SPEECHDAT-CAR (LE-8334): "Speech Data Basis for Voice Driven Teleservices and Control in Automotive Environments". The Greek SpeechDat-Car database is available through ELDA under the code S0146: http://www.elda.fr/catalogue/en/speech/S0146.html
[3] Project OrienTel (IST-2000-28373): "Multilingual Access to Interactive Communication Services for the Mediterranean and Middle East". http://www.speechdat.org/ORIENTEL/index.html
[4] T. Winkler, T. Kostoulas, R. Adderley, C. Bonkowski, T. Ganchev, J. Köhler, N. Fakotakis, "The MoveOn Motorcycle Speech Corpus", in Proc. of the LREC'2008, 2008.
[5] PolyCost: Available through ELDA: http://www.elda.fr/catalogue/en/speech/S0042.html
[6] P. Zervas, N. Fakotakis, G. Kokkinakis, "Development and evaluation of a prosodic database for Greek speech synthesis and research", Journal of Quantitative Linguistics, vol. 15, no. 2, pp. 154-184, 2008.
[7] T. Kostoulas, T. Ganchev, I. Mporas, N. Fakotakis, "A Real-World Emotional Speech Corpus for Modern Greek", in Proc. of the LREC'2008, 2008.
[8] S. Ntalampiras, I. Potamitis, T. Ganchev, N. Fakotakis, "Audio Database in Support of Potential Threat and Crisis Situation Management", in Proc. of the LREC'2008, 2008.
[9] RTD Projects with the participation of the AIG: http://www.wcl.ece.upatras.gr/Research/projects.htm
    (1) PROMETHEUS (FP7-ICT-214901-2007): "Prediction and interpretation of human behaviour based on probabilistic structures and heterogeneous sensors", http://www.prometheus-fp7.eu
    (2) ESPRIT-860: "Linguistic Analysis of the European Languages",
    (3) POLYGLOT (ESPRIT II-2104): "A Multilanguage Speech-to-Text and Text-to-Speech System",
    (4) TRANSLEARN (LRE, 61-016): "Interactive Corpus-based Translation Drafting Tool",
    (5) KORAIS Lexicon,
    (6) GRAMCHECK (MLAP 11): "A Grammatical and Style Checker",
    (7) TRANSLIB (LIB-3038): "Advanced Tools for Accessing Multilingual Library Catalogues",
    (8) ACCeSS (LE-1 1802): "Automated Call Center through Speech Understanding System",
    (9) VASME (TRANSPORT PL-00010): "Value Added Services for Maritime Environment",
    (10) IDAS (LE-38315): "Interactive, Telephone-based, Directory Assistance Services",
    (11) E2M (IST-2000-30167): "From e-services to mobile services",
    (12) COST 278: "Spoken Language Interaction in Telecommunication",
    (13) GEMINI (IST-2001-32343): "Generic Environment for Multilingual Interactive Natural Interfaces",
    (14) INSPIRE (IST-2001-32746): "Infotainment Management with speech Interaction via Remote-Microphones and Telephone Interfaces",
    (15) LOGOS (EHΓ-102): "A general architecture for speech recognition and (user friendly) dialogue interaction for advanced commercial applications (LOGOS)",
    (16) Amigo (IST-004182): "Ambient Intelligence for the networked home environment", Entrance Manager Application.
[10] K. Sgarbas, N. Fakotakis, G. Kokkinakis, "A Straightforward Approach to Morphological Analysis and Synthesis", in Proc. COMLEX 2000, pp. 31-34.
[11] D.P. Lyras, K. Sgarbas, N. Fakotakis, "Learning Greek Phonetic Rules using Decision-Tree Based Models", in Proc. ICEIS 2007, pp. 424-427.
[12] D.P. Lyras, K. Sgarbas, N. Fakotakis, "Using the Levenshtein Edit Distance for Automatic Lemmatization: A Case Study for Modern Greek and English", in Proc. ICTAI 2007, pp. 428-435.
[13] D.P. Lyras, K. Sgarbas, N. Fakotakis, "Applying Similarity Measures for Automatic Lemmatization: A Case Study for Modern Greek and English", Special Issue of the International Journal of Artificial Intelligence Tools dedicated to ICTAI 2007 (under review).
[14] K. Sgarbas, N. Fakotakis, G. Kokkinakis, "A PC-KIMMO-Based Bidirectional Graphemic/Phonetic Converter for Modern Greek", Literary and Linguistic Computing, vol. 13, no. 2, pp. 65-75, 1998.
[15] K. Sgarbas, N. Fakotakis, G. Kokkinakis, "A Flexible Modular Semantic Parser for the Greek Language", Technika Chronika-B, TEE, Greece (in Greek), vol. 13, no. 4, pp. 65-75, 1993.
[16] K. Sgarbas, N. Fakotakis, G. Kokkinakis, "TEMPO: A Temporal Subparser for Modern Greek", International Journal on Artificial Intelligence Tools, vol. 7, no. 1, pp. 103-120, 1998.
[17] I. Potamitis, N. Fakotakis, G. Kokkinakis, "Speech Enhancement Based on Combining Perceptual Enhancement and Short-Time Spectral attenuation", in Proc. ICSLP 2002, vol. 3, pp. 1785-1788, 2002.
[18] T. Giannakopoulos, N.-A. Tatlas, T. Ganchev, I. Potamitis, "A Practical, Real-Time Speech-Driven Home Automation Front-end", IEEE Transactions on Consumer Electronics, vol. 51, no. 2, pp. 514-523, May 2005.
[19] M. Siafarikas, T. Ganchev, N. Fakotakis, G. Kokkinakis, "Overlapping Wavelet Packet Features for Speaker Verification", in Proc. InterSpeech 2005, pp. 3121-3124.
[20] I. Mporas, T. Ganchev, M. Siafarikas, T. Kostoulas, "Comparative Evaluation of Speech Parameterizations for Speech Recognition", in Proc. ICTAI 2007, pp. 510-513.
[21] M. Siafarikas, I. Mporas, T. Ganchev, N. Fakotakis, "Speech Recognition Using Wavelet Packet Features", Journal of Wavelet Theory and Applications, ISSN 0973-6336, 2008.
[22] K. Georgila, N. Fakotakis, G. Kokkinakis, "Large Vocabulary Search Space Reduction Employing Directed Acyclic Word Graphs and Phonological Rules", International Journal of Speech Technology, Kluwer Academic Publishers, vol. 5, no. 4, pp. 355-370, 2002.
[23] T. Ganchev, I. Potamitis, N. Fakotakis, G. Kokkinakis, "Text-Independent Speaker Verification for Real Fast-Varying Noisy Environments", International Journal of Speech Technology, Kluwer Academic Publishers, vol. 7, no. 4, pp. 281-292, 2004.
[24] T. Kostoulas, T. Ganchev, N. Fakotakis, "Study on speaker-independent emotion recognition from speech on real-world data", Cost2102 Workshop, LNCS, 2007.
[25] A. Lazaridis, P. Zervas, G. Kokkinakis, "Segmental duration modeling for Greek Speech Synthesis", in Proc. ICTAI 2007, pp. 518-521.
[26] T. Ganchev, A. Lazaridis, I. Mporas, N. Fakotakis, "Performance Evaluation for Voice Conversion Systems", LNCS, 2008.
[27] S. Ntalampiras, I. Potamitis, N. Fakotakis, "Automatic Recognition of Urban Environmental Sound Events", in Proc. CIP Workshop 2008.
[28] +302610 996841
[29] +302610 996842