Language Learning & Technology
Volume 13, Number 3, October 2009
Special Issue on Technology and Learning Pronunciation

Articles

Promoting Increased Pitch Variation in Oral Presentations with Transient Visual Feedback (Abstract | Article PDF)
Rebecca Hincks & Jens Edlund, KTH Royal Institute of Technology, pp. 32–50

The Effects of Computer-assisted Pronunciation Readings on ESL Learners' Use of Pausing, Stress, Intonation, and Overall Comprehensibility (Abstract | Article PDF)
Mark Tanner and Melissa Landon, Brigham Young University, pp. 51–65

Podcasting: An Effective Tool for Honing Language Students' Pronunciation? (Abstract | Article PDF)
Lara Ducate and Lara Lomicka, The University of South Carolina, pp. 66–86

Comprehensibility and Prosody Ratings for Pronunciation Software Development (Abstract | Article PDF)
Paul Warren, Irina Elgort, and David Crabbe, Victoria University of Wellington, pp. 87–102

Columns

From the Editors (Article PDF)
by Dorothy Chun & Irene Thompson, p. 1

From the Special Issue Editor (Article PDF)
by Debra Hardison, pp. 2–3

Emerging Technologies: Speech Tools and Technologies (Article PDF)
by Robert Godwin-Jones, pp. 4–11

Announcements

News from Sponsoring Organizations (Article PDF)
pp. 12–15

Reviews
Edited by Sigrun Biesenbach-Lucas

Multimedia Learning Suite: Chinese Characters (Article PDF)
Reviewed by Ching-Ni Hsieh and Fei Fei, pp. 16–25

Academic Interactions: Communicating on Campus, by Christine B. Feak, Susan M. Reinhart, and Theresa N. Rohlck (Article PDF)
Reviewed by Heather Weger, pp. 26–31

Contact: Editors or Managing Editor Copyright © 2009 Language Learning & Technology, ISSN 1094-3501. Articles are copyrighted by their respective authors.

About Language Learning & Technology

Language Learning & Technology is a refereed journal that began publication in July 1997. The journal disseminates research on issues related to technology and language education to foreign and second language educators in the US and around the world.

Language Learning & Technology is sponsored and funded by the University of Hawai'i National Foreign Language Resource Center (NFLRC) and the Michigan State University Center for Language Education And Research (CLEAR), and is co-sponsored by the Center for Applied Linguistics (CAL).



Language Learning & Technology is a fully refereed journal with an editorial board of scholars in the fields of second language acquisition and computer-assisted language learning. The focus of the publication is not technology per se, but rather issues related to language learning and language teaching, and how they are affected or enhanced by the use of technologies.



Language Learning & Technology is published exclusively on the World Wide Web. In this way, the journal seeks to (a) reach a broad audience in a timely manner, (b) provide a multimedia format which can more fully illustrate the technologies under discussion, and (c) provide hypermedia links to related background information.



Beginning with Volume 7, Number 1, Language Learning & Technology is indexed in the Institute for Scientific Information's (ISI) Social Sciences Citation Index (SSCI), ISI Alerting Services, Social Scisearch, and Current Contents/Social and Behavioral Sciences.



Language Learning & Technology is currently published three times per year (February, June, and October).


Sponsors, Board, and Editorial Staff

Volume 13, Number 3

SPONSORS

University of Hawai`i National Foreign Language Resource Center (NFLRC)
Michigan State University Center for Language Education and Research (CLEAR)

CO-SPONSOR

Center for Applied Linguistics (CAL)

ADVISORY AND EDITORIAL BOARDS

Advisory Board

Susan Gass

Michigan State University

Richard Schmidt

University of Hawai`i at Manoa

Editorial Board

Sigrun Biesenbach-Lucas

Georgetown University

Klaus Brandl

University of Washington

Thierry Chanier

Université Blaise Pascal

Robert Godwin-Jones

Virginia Commonwealth University

Lucinda Hart-González

Second Language Testing, Inc.

Philip Hubbard

Stanford University

Michelle Knobel

Montclair State University

Marcus Kötter

University of Münster

Marie-Noelle Lamy

The Open University

Meei-Ling Liaw

National Taichung University

Lara Lomicka

University of South Carolina

Noriko Nagata

University of San Francisco

John Norris

University of Hawai`i at Manoa

Lourdes Ortega

University of Hawai`i at Manoa

Jill Pellettieri

Santa Clara University

Joy Kreeft Peyton

Center for Applied Linguistics, Washington, DC

Patrick Snellings

University of Amsterdam

Maggie Sokolik

University of California, Berkeley

Susana Sotillo

Montclair State University

Mark Warschauer

University of California, Irvine

Editorial Staff

Editors

Dorothy Chun

University of California, Santa Barbara

Irene Thompson

The George Washington University (Emerita)

Associate Editors

Trude Heift

Simon Fraser University

Carla Meskill

University at Albany, State University of New York

Managing Editor

Matthew Prior

University of Hawai`i at Manoa

Web Production Editor

Carol Wilson-Duffy

Michigan State University

Book & Multimedia Review Editor

Sigrun Biesenbach-Lucas

Georgetown University

Emerging Technologies Editor

Robert Godwin-Jones

Virginia Commonwealth University

Copy Editor

Dennis Koyama

Kanda University of International Studies


Copyright © 2009 Language Learning & Technology, ISSN 1094-3501. The contents of this publication were developed under a grant from the Department of Education (CFDA 84.229, P229A60012-96 and P229A6007). However, the contents do not necessarily represent the policy of the Department of Education, and one should not assume endorsement by the Federal Government.

Information for Contributors

Language Learning & Technology publishes articles, commentaries, and reviews relating to the application of technology to second language learning, teaching, and research.

General Policies | Articles and Commentaries | Reviews

General Policies

The following policies apply to all submissions:

1. Send articles and commentaries to [email protected]. If your submission is not acknowledged within one week, and you do not receive a vacation message, contact the Managing Editor ([email protected]).

2. Manuscripts that have already been published or are under consideration for publication elsewhere will not be considered. It is the responsibility of the author(s) to inform the editors of any similar work that has already been published or is under consideration for publication elsewhere.

3. All submissions should conform to the requirements of the Publication Manual of the American Psychological Association (5th edition). Authors are strongly encouraged to have their manuscripts proofread by an editor familiar with English academic prose and APA guidelines. Both American and British English spelling conventions are acceptable. Authors are responsible for the accuracy of all references and citations.

4. All submissions must be in Microsoft Word or HTML format. Accompanying images should be sent separately as .jpg or .gif files. Manuscripts containing attachments with extensions .com, .exe, .js, .jse, .html, .ws, or with multiple extensions (e.g., "webpage.txt.htm", "picture.htm.jpg") will be filtered by the mail system and will not reach us. Consult the Managing Editor ([email protected]) if you wish to send a file with one of these extensions.

5. Authors are strongly encouraged to take advantage of the electronic format by including hypermedia links to multimedia and other materials both within and outside the manuscript.

6. List the names, institutions, e-mail addresses, and, if applicable, WWW addresses (URLs) of all authors. Include a 50-word biographical statement for each author. This information will be temporarily removed during the blind review.

7. Authors of accepted manuscripts will assign to LLT the permanent right to electronically distribute their article, but authors will retain copyright. Authors may republish their work (in print and/or electronic format) as long as they acknowledge LLT as the original publisher.

8. Requests for republication should be addressed to the author(s). LLT should be acknowledged as the original publisher.

9. Authors of published articles, commentaries, and reviews will receive 5 free hard-copy offprints of their contributions.

10. The editors of LLT reserve the right to make editorial changes to manuscripts accepted for publication for the sake of style or clarity. Authors will be consulted only if the changes are substantive.


11. Minor edits will be made within 14 days after publication. Post-publication changes involving content will be made only if there is a problem with comprehensibility; such changes will be accompanied by a note of revision. External links will be validated at the time of publication. Broken links will be fixed at the author's request.

Articles and Commentaries

1. LLT publishes articles that report on original research or present an original framework linking second language acquisition theory, previous research, and second language learning and teaching practices that utilize technology. Articles containing only descriptions of software or pedagogical procedures, or presenting results of surveys without empirical data on actual language learning outcomes, will not be considered.

2. General guidelines are available for reporting on both quantitative and qualitative research (http://llt.msu.edu/resguide.html).

3. Articles should be no more than 8,500 words in length, including references and a 200-word abstract. Appendices should be limited to 1,500 words. Lengthy appendices should be included as hyperlinks and sent as separate files in .html or .pdf format.

4. Commentaries are short articles, typically 2,000-3,000 words, that discuss material previously published in LLT or offer interesting opinions on issues related to language learning and technology.

5. Titles should not exceed 10 words and should adequately describe the content of the article.

6. All articles and commentaries go through a two-step review process:

Step 1: Internal Review. The editors first review each manuscript to see if it meets the basic requirements (i.e., that it reports on original research or presents an original framework linking previous research, second language acquisition theory, and teaching practices) and is of sufficient quality to merit external review. Manuscripts that do not meet these requirements and are principally descriptions of classroom practices or software are not sent out for further review. The internal review takes 1-2 weeks, after which authors are notified of the results.

Step 2: External Review. Submissions that meet the basic requirements are then sent out for blind peer review by 3 experts in the field. The external review takes 2-3 months, after which the authors are sent copies of the external reviewers' comments and notified of the decision (accept as is, accept pending changes, revise and resubmit, or reject).

Reviews

1. LLT publishes reviews of professional books and software related to the use of technology in language learning, teaching, and testing.

2. LLT does not accept unsolicited reviews. Contact Review Editor Sigrun Biesenbach-Lucas ([email protected]) if you are interested in having material reviewed or in serving as a reviewer. Send materials you wish to be reviewed to:

Sigrun Biesenbach-Lucas
2133 Comus Court
Ashburn, VA 20147

3. Reviews should provide a constructive critique of the book/software and include references to theory and research in second language acquisition, computer-assisted language learning, pedagogy, or other relevant disciplines. They should also include specific ideas for classroom implementation and suggestions for additional research.

4. Reviews should be limited to 2,000 words. Reviewers are encouraged to incorporate images (e.g., screen shots or book covers) and hypermedia links that provide additional information.

5. The following information should be included in a table at the beginning of the review:

Books: Author(s); Title; Series (if applicable); Publisher; City and country; Year of publication; Number of pages; Price; ISBN

Software: Title (including previous titles, if applicable) and version number; Platform; Minimum hardware requirements; Publisher (with contact information); Support offered; Target language; Target audience (type of user, level, etc.); Price; ISBN (if applicable)


Volume 13, Number 3, October 2009: Special Issue on Technology and Pronunciation

ABSTRACTS

PROMOTING INCREASED PITCH VARIATION IN ORAL PRESENTATIONS WITH TRANSIENT VISUAL FEEDBACK
Rebecca Hincks and Jens Edlund, KTH Royal Institute of Technology

This paper investigates learner response to a novel kind of intonation feedback generated from speech analysis. Instead of displays of pitch curves, our feedback is flashing lights that show how much pitch variation the speaker has produced. The variable used to generate the feedback is the standard deviation of fundamental frequency as measured in semitones. Flat speech causes the system to show yellow lights, while more expressive speech that has used pitch to give focus to any part of an utterance generates green lights. Participants in the study were 14 Chinese students of English at intermediate and advanced levels. A group that received visual feedback was compared with a group that received audio feedback. Pitch variation was measured at four stages: in a baseline oral presentation; for the first and second halves of three hours of training; and finally in the production of a new oral presentation. Both groups increased their pitch variation with training, and the effect lasted after the training had ended. The test group showed a significantly higher increase than the control group, indicating that the feedback is effective. These positive results imply that the feedback could be beneficially used in a system for practicing oral presentations. (Article PDF)

THE EFFECTS OF COMPUTER-ASSISTED PRONUNCIATION READINGS ON ESL LEARNERS' USE OF PAUSING, STRESS, INTONATION, AND OVERALL COMPREHENSIBILITY
Mark W. Tanner and Melissa M. Landon, Brigham Young University

With research showing the benefits of pronunciation instruction aimed at suprasegmentals (Derwing, Munro, & Wiebe, 1997, 1998; Derwing & Rossiter, 2003; Hahn, 2004; McNerney & Mendelsohn, 1992), more materials are needed to provide learners with opportunities for self-directed practice. A 13-week experimental study was performed with 75 ESL learners divided into control and treatment groups. The treatment group was exposed to 11 weeks of self-directed computer-assisted practice using Cued Pronunciation Readings (CPRs). In the quasi-experimental pre-test/post-test design, speech perception and production samples were collected at Time 1 (week 1 of the study) and Time 2 (week 13). Researchers analyzed the treatment's effect on the learners' perception and production of key suprasegmental features (pausing, word stress, and sentence-final intonation) and on the learners' level of perceived comprehensibility. Results from the statistical tests revealed that the treatment had a significant effect on learners' perception of pausing and word stress and on their controlled production of stress, even with limited time spent practicing CPRs in a self-directed environment. (Article PDF)
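As a concrete illustration of the feedback variable in the Hincks and Edlund study above, the following minimal Python sketch computes the standard deviation of fundamental frequency in semitones from an F0 track and maps it to the yellow/green lights the abstract describes. The function names and the threshold value are illustrative placeholders, not values taken from the study.

    import numpy as np

    def pitch_variation(f0_hz):
        # Standard deviation of F0 in semitones, relative to the speaker's
        # median pitch, so that high and low voices are treated alike.
        f0 = np.asarray(f0_hz, dtype=float)
        voiced = f0[f0 > 0]                 # unvoiced frames are coded as 0
        semitones = 12 * np.log2(voiced / np.median(voiced))
        return float(np.std(semitones))

    def feedback_light(f0_hz, threshold=4.0):
        # Flat speech -> yellow lights; expressive speech -> green lights.
        # The threshold here is an invented placeholder.
        return "green" if pitch_variation(f0_hz) >= threshold else "yellow"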

PODCASTING: AN EFFECTIVE TOOL FOR HONING LANGUAGE STUDENTS' PRONUNCIATION?
Lara Ducate and Lara Lomicka, The University of South Carolina

This paper reports on an investigation of podcasting as a tool for honing pronunciation skills in intermediate language learning. We examined the effects of using podcasts to improve pronunciation in second language learning and how students' attitudes toward pronunciation changed over the semester. A total of 22 students in intermediate German and French courses made five scripted pronunciation recordings throughout the semester. After the pronunciation recordings, students produced three extemporaneous podcasts. Students also completed a pre- and post-survey based on Elliott's (1995) Pronunciation Attitude Inventory to assess their perspectives regarding pronunciation. Students' pronunciation, extemporaneous recordings, and surveys were analyzed to explore changes over the semester. Data analysis revealed that students' pronunciation did not significantly improve in regard to accentedness or comprehensibility, perhaps because the 16-week-long treatment was too short to foster significant improvement and there was no in-class pronunciation practice. The podcast project, however, was perceived positively by students, who appreciated the feedback given for each scripted recording and enjoyed opportunities for creativity during extemporaneous podcasts. Future studies might seek to delineate more specific guidelines or examine how teacher involvement might be adapted to the use of podcasts as a companion to classroom instruction. (Article PDF)

COMPREHENSIBILITY AND PROSODY RATINGS FOR PRONUNCIATION SOFTWARE DEVELOPMENT
Paul Warren, Irina Elgort, and David Crabbe, Victoria University of Wellington

In the context of a project developing software for pronunciation practice and feedback for Mandarin-speaking learners of English, a key issue is how to decide which features of pronunciation to focus on in giving feedback. We used naïve and experienced native-speaker ratings of comprehensibility and nativeness to establish the key features affecting the comprehensibility of utterances by a group of Chinese learners of English. Native-speaker raters assessed the comprehensibility of recorded utterances and pinpointed areas of difficulty, and then rated the same utterances for nativeness after segmental information had been filtered out. The results show that prosodic information is important for comprehensibility, and that there are no significant differences between naïve and experienced raters on either comprehensibility or nativeness judgements. This suggests that naïve judgements are a useful and accessible source of data for identifying the parameters to be used in setting up automated feedback. (Article PDF)


Language Learning & Technology http://llt.msu.edu/vol13num3/editors.pdf

October 2009, Volume 13, Number 3 p. 1

FROM THE EDITORS

Welcome to Volume 13, Number 3 of Language Learning & Technology, a special issue on Technology and Teaching Pronunciation. We hope you had a good summer and have renewed energy for the new academic year. It was a productive summer for many of you: LL&T received over 50 submissions! This special issue is guest edited by Debra Hardison, and we wish to thank her for an outstanding job throughout the process of identifying, selecting, and editing four excellent articles. In his Emerging Technologies column, Bob Godwin-Jones complements the articles with a superb summary of current speech analysis tools and technologies.

We are pleased to present two excellent review articles, edited by Sigrun Biesenbach-Lucas, who has been indefatigable as our Reviews Editor. First, Ching-Ni Hsieh and Fei Fei review Multimedia Learning Suite: Chinese Characters, one of the most easily customizable flashcard programs for learning vocabulary. Second, Heather Weger examines Academic Interactions: Communicating on Campus, a book with an accompanying DVD, designed to raise ESL learners' awareness about norms of academic discourse in the US and provide them with authentic examples of such discourse as well as practice in academic communication skills.

Our CALL Theses page has been updated with the most recent CALL dissertations, and we wish to thank Anne Rimrott (Simon Fraser University, Canada) for compiling the current list of Canadian dissertations and Jason Vickers (University at Albany, State University of New York) for updating the list of U.S. dissertations.

During the last several years, LL&T has received 150 submissions per year on average, and with a growing readership as well, we have decided to implement a new content management system, ScholarOne (formerly Manuscript Central) from Thomson Reuters. Submissions will continue to be online, and the review process will also remain entirely online, though automated and streamlined. Full details will be given in our next issue in February 2010, when the system has been officially rolled out.

LL&T continues to be a free journal, and we invite you to show monetary support by making a tax-deductible contribution directly to the "Language Learning and Tech Special Fund" through the University of Hawai‘i Foundation. This may be done online by clicking here or on the Contribute button on our homepage. Thank you for your support!

Sincerely,
Dorothy Chun & Irene Thompson


Language Learning & Technology http://llt.msu.edu/vol13num3/speced.pdf

October 2009, Volume 13, Number 3 pp. 2–3

FROM THE SPECIAL ISSUE EDITOR

In the field of language learning, the teaching of pronunciation has undergone many changes. It has progressed from the early days of drills and strict error correction, through periods of disappearance from the classroom, to contemporary approaches that address segmental and suprasegmental features within their discourse contexts. Technological advances have provided a range of tools to assist learners in the development of pronunciation skills in a variety of target languages. Visual displays of some features of speech production, such as pitch, are user-friendly and valuable sources of feedback. Increasing numbers of learners can avail themselves of such tools to practice the sounds of another language as a complement to classroom instruction or for self-study. The four articles in this special issue highlight several of these important elements: creating meaningful visual displays, developing self-directed computer-assisted pronunciation practice, assessing learners' attitudes toward the use of technology in pronunciation improvement, and rating the components of second-language speech.

In the first article, "Promoting Increased Pitch Variation in Oral Presentations with Transient Visual Feedback," Rebecca Hincks and Jens Edlund addressed the effect of a different type of intonation feedback generated from speech analysis. Instead of pitch contours, their system produced flashing lights of different colors, which showed users how much pitch variation they produced. Results of a training study revealed that learners of English as a foreign language showed a significant increase in pitch variation in giving an oral presentation and improvement in naturalness, and that they were satisfied with their training.

In "The Effects of Computer-assisted Pronunciation Readings on ESL Learners' Use of Pausing, Stress, Intonation, and Overall Comprehensibility," Mark Tanner and Melissa Landon explored the effects of self-directed computer-assisted practice involving cued pronunciation readings by learners of English as a second language. Of interest were the learners' perception and production of pausing, word stress, and sentence-final intonation. Results indicated a significant reduction in the number of instances where learners were unable to perceive pauses and stressed syllables. Participants also improved in their ability to use word stress appropriately in a controlled production task.

Learners' use of technology outside the classroom was investigated by Lara Ducate and Lara Lomicka in "Podcasting: An Effective Tool for Honing Language Students' Pronunciation?" Learners of French and German engaged in both scripted and extemporaneous podcasting of texts in their respective target languages over a period of a semester. Improvement was noted for some learners in terms of comprehensibility and accentedness. Overall perception of the podcasting project by learners was positive and suggested a way to address pronunciation as a companion to classroom instruction.

In the final paper, "Comprehensibility and Prosody Ratings for Pronunciation Software Development," Paul Warren, Irina Elgort, and David Crabbe addressed the issue of identifying the parameters to be used in establishing automated feedback systems for pronunciation. Native-speaking raters evaluated the comprehensibility of learners' recorded utterances, noting areas of difficulty. The segmental information in these utterances was then filtered out, and they were rated for nativeness.
Findings pointed to important roles for sentence prosody (stress, intonation, and rhythm), word stress, and consonant and vowel pronunciation in the assessment of comprehensibility. In addition, the authors found no significant differences between naïve and experienced raters on either task.

This special issue would not have been possible without the contributions of LLT Editor Dorothy Chun and Managing Editors Hunter Hatfield and Matthew Prior. I am very grateful to the many reviewers who offered extensive comments on the submissions.

Debra M. Hardison
Special Issue Editor


Language Learning & Technology http://llt.msu.edu/vol13num3/emerging.pdf

October 2009, Volume 13, Number 3 pp. 4–11

EMERGING TECHNOLOGIES
SPEECH TOOLS AND TECHNOLOGIES

Robert Godwin-Jones
Virginia Commonwealth University

Using computers to recognize and analyze human speech goes back at least to the 1970s. Developed initially to help the hearing or speech impaired, speech recognition was also used early on experimentally in language learning. Since the 1990s, advances in the scientific understanding of speech, as well as significant enhancements in software and hardware, have allowed speech technologies to become robust enough to be deployed in commercial enterprises. At the same time, developments in the fields of linguistics and second language acquisition have led to greater interest in using computers to help develop speaking and listening skills. As a result, there are now powerful tools and technologies available for speech analysis, and there has been significant progress in projects using speech technologies for teaching pronunciation and speaking. Some major commercial language learning software also now features these technologies. New standards are being developed in this area, which should allow advances and innovations to be shared more easily and implemented more widely.

Visualization of Speech

Since the earliest incarnations of computer-assisted language learning, it has been possible to display human speech in a graphic representation on a computer screen. The visual display generally has shown a waveform or pitch contour. Programs such as Visi-Pitch, available since the MS-DOS days, have been widely used to show a representation of a learner's utterance alongside that of a native speaker. Early pedagogical applications left it up to the learner to determine to what extent the generated contour matched the model; little guidance was given on how to improve, other than the suggestion to try again. While the actual graphics display has remained the same in many applications of speech visualization, language learning implementations today tend to provide much more useful feedback and assistance to the learner in correcting pronunciation problems. Moreover, there is recognition today that the past practice of having students work with individual sounds and sentences out of context only goes part of the way towards helping with pronunciation, as it leaves out intonation at the sentence and discourse levels. Using artificially generated sentences does not necessarily put learners on the path to communicative ability with natural speech.

Most language learning projects using speech analysis follow the same basic structure: students listen closely to model speech, paying attention to aspects of the native speaker's pronunciation, then are asked to generate the utterance themselves. They receive feedback, often both visual and auditory. In addition to spectrograms and waveforms, the visual feedback may include a representation of the mouth showing physically how the sound is to be produced. The audio feedback may be a recast of the utterance, some kind of encouragement or critique, or possibly the item produced in a different context. Students then repeat (possibly multiple times) and/or proceed to the next item. Some applications take the next step of having learners practice their pronunciation skills in communicative exercises, whether in peer-to-peer role-playing or in real or simulated social interactions.
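The contour-matching step described above can be sketched in a few lines. The following hypothetical Python fragment compares a learner's F0 track against a model speaker's by normalizing each to its own median pitch and taking a mean absolute distance in semitones; a real system would use proper time alignment (e.g., dynamic time warping) rather than the simple resampling shown here.

    import numpy as np

    def semitone_contour(f0_hz):
        # Express an F0 track (Hz) in semitones relative to the speaker's
        # median pitch, removing differences in voice register.
        f0 = np.asarray(f0_hz, dtype=float)
        voiced = f0[f0 > 0]                 # unvoiced frames are coded as 0
        return 12 * np.log2(voiced / np.median(voiced))

    def contour_distance(learner_hz, model_hz):
        # Mean absolute semitone difference after resampling both contours
        # to a common length (a crude stand-in for time alignment).
        a, b = semitone_contour(learner_hz), semitone_contour(model_hz)
        n = min(len(a), len(b))
        grid = np.linspace(0, 1, n)
        a = np.interp(grid, np.linspace(0, 1, len(a)), a)
        b = np.interp(grid, np.linspace(0, 1, len(b)), b)
        return float(np.mean(np.abs(a - b)))

A distance near zero would mean the learner's intonation closely tracks the model; a game-like interface such as the racing car described below could map this distance directly to how far the car drifts from the road.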
One of the issues often raised with traditional approaches to visualizing speech patterns is the difficulty users may have in understanding or interpreting the significance of the displays. Some speech software developers have experimented with game-like interfaces for providing visual feedback, such as using speech input from the student to control movements in a game (replacing joystick controls) or using a racing car interface in which adherence to the road is determined by how far the user's utterance diverges from the model native speaker's (Gòmez et al., 2008).

One ESL (English as a second language) pronunciation program, Hearsay, used pronunciation accuracy in a bowling environment to determine the number of pins knocked down (Dalby & Kewley-Port, 2008).

There are clear ways in which computer-based pronunciation training can be more efficient than classroom-based practice. The computer provides one-on-one individualized attention with patience enough to allow unlimited tries. Individuals proceed at a pace with which they are comfortable, processing feedback and using available tools as needed, or moving quickly through an exercise. The interaction with the computer is private, and thus likely to be less stressful for the learner than repeated teacher corrections in a classroom. Computer-based training can supply many more native speaker voices as models, a recognized benefit in learning pronunciation. A software program can also adapt to an individual learner's progress by customizing practice to that student's needs, testing for transfer of skills to other contexts or speakers, and doing interval checks to see whether knowledge has been retained. The accumulated learner data can be valuable to researchers looking to improve software and learn more about optimal approaches to pronunciation training.

Automatic Speech Recognition

While a valuable tool for pronunciation training, computer-based speech recognition can also be used to analyze learner speech more generally and to serve as the foundation for creating auditory interactions between the learner and the computer. This is possible through automatic speech recognition (ASR), which allows software to interpret the meaning of a speaker's utterance. The technology has come so far that in recent years it has been deployed widely in commercial systems such as travel reservations, stock price quotes, weather reports, and sports score reporting. Many consumers have had frequent dialogues with disembodied voices through automated phone ASR systems in a variety of help desk or customer service settings. ASR is also the basis for commercial dictation systems such as Dragon NaturallySpeaking. Such systems tend to work well with native speakers and within controlled vocabulary domains, and the dictation systems increase accuracy further by allowing the software to be individually trained to recognize particular voices.

Despite their commercial use, there are a number of concerns regarding ASR use in computer-assisted language learning (CALL). The most salient issue is that systems built around the recognition of native speaker speech may not recognize the quite different accent of a language learner. If an ASR system is not reliable enough to understand a correct utterance from a learner, this can be a devastatingly frustrating experience. If, on the other hand, the tolerance is set so low as to accept a wild approximation of the targeted sound, the learner is not likely to take away much from the interaction. ASR systems are likely to have problems as well with non-grammatical constructions and broken sentences. They tend to give accurate results when used within a pre-defined lexical domain and therefore have difficulty with non-standard vocabulary and off-topic comments. This means they are less easily deployable (i.e., effective) in environments using free-flowing natural speech. Unfortunately, that is just the environment we want our language learners to be working in.
Furthermore, such systems are complex, difficult, and expensive to develop, and not easy to integrate into language learning programs. Nevertheless, there is immense interest in ASR for CALL, with its use steadily increasing in language learning software. Despite limitations in accuracy and range, the promise of having a computer recognize spoken language holds so much potential for helping students improve speaking skills that ASR represents an irresistible attraction to CALL developers.

For an ASR system to be used with language learners, the speech recognition and analysis system needs to include components not necessary in a system targeted at native speakers. The language models, which together with the defined grammar form the database for speech recognition, need to include samples of non-native language. This is important both for the phonetic analysis and for the system's language grammar, which should be able to recognize common grammatical misconstructions. Unfortunately, collecting learner speech is not easy, due to logistical and legal issues. If the system is to be used with learners of a specific first language (L1), that simplifies the process, in that common characteristics of learner language for that group can be included. This kind of targeting makes it more likely that frequent non-standard usage will be recognized and that specific feedback for that particular error pattern can be given.

Early ASR systems used template-based recognition based on pattern matching. Today's systems are more sophisticated, complex, and "intelligent," typically using a probability-based approach known as the Hidden Markov Model (HMM), built around a very large collection of speech samples. The speech analysis works in the following way: the new input from the learner is received, digitized, and analyzed; the utterance is then compared to the stored information in the database, together with the grammar model and lexicon; a statistical process generates a confidence score for possible matches; and the system delivers an evaluation and feedback based on the results of that process. One of the variables that can be adjusted in such a system is where to set the bar in terms of pronunciation accuracy. Setting the bar higher requires closer matches from the learner; too high a setting could be discouraging and counter-productive. That decision should be based on the desired outcome for the learner using the program, namely whether it is simply comprehensibility (to the typical native speaker) or the higher goal of closeness to a native accent. In practice, this distinction is not always made, but as discussed in a seminal article, it should be a fundamental component of how a system is designed, as both the analysis and the feedback could be quite different depending on the end goal (Neri, Cucchiarini, Strik, & Boves, 2002).
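The adjustable "bar" just described amounts to a threshold on the recognizer's confidence score. A schematic Python sketch, assuming a recognizer that returns a best-match transcription and a confidence between 0 and 1 (the function name and threshold values are hypothetical, not from any system discussed here):

    def pronunciation_feedback(recognized, target, confidence,
                               accept=0.7, borderline=0.4):
        # 'accept' sets the bar: raise it to demand closeness to a native
        # model, lower it if plain comprehensibility is the goal.
        if recognized != target:
            return "not recognized -- please try again"
        if confidence >= accept:
            return "accepted"
        if confidence >= borderline:
            return "understood, but pronunciation needs work"
        return "too far from the model -- listen and repeat"

Setting accept too high forces near-native matches and risks discouraging learners; setting borderline too low lets wild approximations pass, the two failure modes discussed earlier.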

Speech Analysis and Language Learning Software

There are a number of tools for speech analysis, many of which can be and have been adapted for language learning. KayPentax (formerly Kay Elemetrics) markets the widely used Visi-Pitch, now at version four. The latest release features a waveform editor, auditory feedback, and voice games. The games are quite basic, for example, an animated graphic based on the pitch and amplitude of the sound input. The company also sells the popular Computerized Speech Lab (CSL), a powerful hardware/software speech analysis system. Its hardware is used in conjunction with a PC and offers high-quality input and output as well as a variety of software add-ons. The latter include a Video Phonetics Program, which features a synchronized display of video and acoustic data. The most widely used option with CSL is Real-Time Pitch, which provides extensive analysis capabilities for comparing two speech samples. KayPentax also sells Multi-Speech, a software-only speech analysis program with options similar to CSL's.

The EduSpeak speech recognition system from SRI International supports nine languages and includes program interfaces for use in Macromedia Director and Microsoft ActiveX. For oral language testing, the Versant suite of tests (available for English, Spanish, and Arabic) incorporates speech processing and can be taken over the phone or on a computer; a free online demo for English illustrates how the 15-minute test works. There are also non-commercial tools available. The Speech Analyzer from SIL International (formerly the Summer Institute of Linguistics) performs frequency and spectral analysis and can be used to annotate phonetic transcriptions. Of particular interest for language learning, it also allows for slowed playback and looping of audio.
SIL International makes available other tools for language analysis and recording, including Phonology Assistant and WinCECIL, both of which can be used together with Speech Analyzer. A widely used authoring tool is WinPitch LTL, a Windows desktop application. Teachers create lessons consisting of a sequence of speech models to be repeated or imitated by the learner, with the model speech displayed in graphic form on the left and the learner's input on the right. The program features Unicode word processing and Web linking. As with similar tools, WinPitch LTL offers powerful capabilities, but it is likely the rare language teacher who can find the time to create pronunciation lessons from scratch. Teachers who do, like Marjorie Chan for teaching Chinese (2003), find that the flexibility and customizability of such tools, as well, of course, as evidence of student improvement, compensate for the time and effort involved.


Other speech analysis tools include RTSPECT and WASP (both from University College London), WaveSurfer (from the Swedish Royal Institute of Technology), and the CSLU Toolkit (from Oregon Health and Science University). COLEA is a tool for speech analysis that works within Matlab, a well-known tool in mathematics education. The Sphinx Group at Carnegie Mellon University (CMU) also makes a set of open source speech recognition engines available. One of the few speech analysis programs currently available for the Macintosh and Linux platforms as well as for Windows is Praat (Dutch for 'talk'), from the Institute of Phonetic Sciences, University of Amsterdam. This is a widely used tool, perhaps in part due to the availability of an extensive tutorial, something often missing from open source tools. An interesting software tool in this area is the Hidden Markov Model Toolkit (HTK), from Cambridge University, a portable toolkit for building and manipulating hidden Markov models, used primarily in speech recognition projects. A widely used free tool for audio recording, Audacity, also features spectrogram visualizations; it is available for Windows, Mac, and Linux.

These open source tools can be used to build quite sophisticated language learning software. A free video annotation tool, Anvil, for example, allows for the import of data from Praat, so as to show voice patterns in a multi-layered video annotation. An interesting application built with Praat is SpeakGoodChinese, a Web-based tool that helps beginning Mandarin learners recognize and produce tones correctly. Instead of basing analysis and feedback on data from a collected database, the system was developed synthetically, using intelligent algorithms to calculate possible erroneous pitch tracks. A similar program for learning Mandarin is the Pinyin Tutor, which adds an interesting twist: it asks the learner to type in the pinyin (romanization of Mandarin) of a speech segment, and if the pinyin is incorrect, a synthesized voice reads the pronunciation of the pinyin as the learner entered it. Other programs using open source tools include Fluency from CMU, an English language speech-based learner-computer conversation program, and CandleTalk, a similar English conversation program developed in Taiwan.

While these programs are for the most part demonstration or research projects, there are high-profile commercial language learning programs that feature speech technologies. One of these is Tell Me More from Auralog. As can be seen in an online demo, the software engages the learner in a dialogue, asking a question (in both spoken and written form) that invites a spoken response from the learner. The answer is evaluated globally as correct or not and given a score between 1 and 7, shown as a series of rectangles. The learner sees a pitch curve and waveform display of both the learner's utterance and that of a model native speaker. In its specific pronunciation exercises, the program uses speech recognition for analysis and feedback, while showing simulated images of the mouth. It also uses voice recognition in language games, such as finding pronounced items in a word puzzle. Tell Me More allows role-play: the learner takes on a speaking role in a simulated TV program, practicing first in a "pronunciation workshop" with the lines to be delivered, then speaking those lines in the video playback together with the other characters, who are voiced by native speaker actors. Rosetta Stone also uses voice recognition.
The feedback in Rosetta Stone gives a correct/incorrect evaluation but also highlights individual mispronounced sounds, which are shown grayed out in the feedback. Several high-profile ESL/EFL programs, such as DynEd's Intelligent Tutor (formerly Dynamic English), also make extensive use of speech technologies. These programs are high-end and high-cost and, in terms of their use of voice recognition, have gotten mixed reviews in professional journals (Hincks, 2003), but they nevertheless enjoy wide use. Given their popularity, it is unfortunate that there are no studies that go beyond reviewing these products to analyze and evaluate their use in controlled language learning environments, including when they are used as a supplement in traditional classrooms.

Standards and Outlook

Creating software that incorporates speech technologies is not an easy task, and it is made more difficult by the fact that there have not been commonly accepted standards in this area. Fortunately, that has been changing recently, with the W3C's (World Wide Web Consortium) efforts in the areas of the Speech Recognition Grammar Specification (SRGS) and the Semantic Interpretation for Speech Recognition (SISR). These standards are beginning to be implemented. An SRGS-XML editor is available which uses a graphical user interface to create new grammars for voice recognition, and the Java format often used in this area, the Java Speech Grammar Format (JSGF), can be converted to SRGS, as the example below illustrates.
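As a minimal illustration, here is the same toy grammar written first in JSGF and then in SRGS's XML form; the grammar itself is invented for this example.

    #JSGF V1.0;
    grammar colors;
    // a single public rule matching one color word
    public <color> = red | green | yellow;

    <?xml version="1.0" encoding="UTF-8"?>
    <grammar xmlns="http://www.w3.org/2001/06/grammar"
             version="1.0" xml:lang="en-US" root="color">
      <!-- the same rule expressed in SRGS XML -->
      <rule id="color" scope="public">
        <one-of>
          <item>red</item>
          <item>green</item>
          <item>yellow</item>
        </one-of>
      </rule>
    </grammar>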

The widely used commercial Loquendo ASR uses SISR, and it also incorporates a subset of ECMAScript, the standardized core of JavaScript. This use of the main Web browser scripting language, along with the involvement of the main Web standards organization (W3C) in standards setting, suggests that the speech technology interface of choice in the future is likely to be a Web browser rather than the desktop application mostly used today.

A speech recognition program that makes interesting use of the Web is SPICE, another project out of CMU. It is a tool for developing speech processing for different languages and features both a tool for harvesting Web texts and a browser-based voice recorder for direct user recordings. The idea is to crowdsource development of a speech analyzer for languages for which such systems are not otherwise likely to be available; SPICE has been used to generate speech recognition tools for Afrikaans, Bulgarian, and Vietnamese. The system learns sound rules from analysis of user input, updating after each new word is added. Efforts such as this to encourage development of open source speech recognition capabilities for less commonly taught languages are important, as the small market share that instruction in these languages represents is unlikely to lead to the development of commercial products.

The ability of software to learn and improve is, in fact, another growing trend in this area. Sagawa, Mitamura, and Nyberg (2004) describe the implementation of a system of correction grammars that are dynamically generated based on analysis of dialogues entered into the system. This is bound to become an ever more important component of such systems as the drive for more accuracy continues.

Another likely direction for the use of voice analysis in language learning software is increased multimedia. Debra Hardison (2005) has shown how beneficial the inclusion of video can be for pronunciation training. As shown in the Tell Me More software, video can also be valuable in extending the knowledge gained to real-world situations. Indeed, the incorporation of real-world natural speech rather than scripted sentences is another important direction. As part of that development, it will be important for software to be able to deal effectively with discourse-level input. In one view, "the use of computer technology has furthered the dominance of sentence-level practice rather than promoting the use of discourse intonation" (Levis & Pickering, 2004). This is clearly not an easy task to take on, but it is particularly important if the software is to address not only prosody but social aspects of speech as well. Levis and Pickering demonstrate, for example, how important changes in tone can be in English for signaling attitudes toward content. Dealing with socio-linguistic aspects of language is a challenging proposition for language technology, but an intriguing one that invites new, innovative approaches.

Most of the evaluations of speech recognition projects come from the researchers themselves. It would be useful to have more reviews and comparisons of different approaches.
Also helpful would be more of the kind of study done by Engwall and Bälter (2007), which compares pronunciation training in the classroom and on the computer. Their study also explores different options for feedback than those typically given. For example, they put forward the suggestion of adding a third response for feedback, supplementing the traditional "correct" and "incorrect" with something along the lines of "satisfactory for the time being." This is part of a strategy to encourage students yet provide honest feedback: it is important to be accurate, but also to be as positive as possible, a challenge when evaluating learner pronunciation.

Feedback to users is a difficult issue, due in part to the multiple sources of error in learner speech, including environmental variables. Feedback ideally should take into account the user's language level, typical L1 interference patterns, and the user's personal learning history (i.e., consideration of issues such as whether an error may be a persistent problem that needs special treatment). For most purposes, feedback should focus on errors that adversely affect comprehension, which may in fact be intonation or rhythm issues rather than individual sounds. I agree with Engwall and Bälter that, given both the frequency of erroneous analysis and the variations in personal learning styles, the best practice in this area generally is to provide minimal feedback as a default, with much richer feedback available on request.
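Engwall and Bälter's suggestions translate directly into a simple feedback policy. Here is a sketch, with invented names and thresholds, of the two ideas just discussed: a three-way verdict instead of a binary one, and minimal feedback by default with richer diagnosis on request.

    def verdict(score, good=0.8, fair=0.55):
        # Three-way feedback: the middle verdict implements the suggested
        # "satisfactory for the time being" response.
        if score >= good:
            return "correct"
        if score >= fair:
            return "satisfactory for the time being"
        return "incorrect"

    def feedback(score, details_requested=False, diagnosis=None):
        # Minimal feedback by default; detail only on request, which also
        # limits the damage done by erroneous analyses.
        message = verdict(score)
        if details_requested and diagnosis:
            message += " (" + "; ".join(diagnosis) + ")"
        return message

For example, feedback(0.6) yields only the encouraging middle verdict, while feedback(0.6, True, ['final consonant dropped']) adds the diagnosis for a learner who asks for it.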

It would be helpful to have more studies similar to that of Precoda and Bratt (2008), which deal with telltale indicators of non-native versus native accents, including identifying particular phonemes or intonational patterns that act as signals of non-nativeness. Such studies, conducted on specific L1 (first language) and L2 (second language) combinations, would enable more specifically tailored feedback in speech applications.

Finding confidence-building approaches to computer-based pronunciation training can be particularly important in an area so fraught with emotional baggage. Games can help; Engwall and Bälter (2007) suggest exploring a karaoke-style game. An early example of using speech recognition in a game was TraciTalk, a detective story for learning ESL; it used IBM VoiceType to allow users to select from a list of choices on how to proceed to resolve the case. A game element is also introduced in Microworld, part of the Military Language Trainer (Holland, Kaplan, & Sabol, 1999), which asks users to give commands that are then carried out virtually on the computer. In fact, the incorporation of speech recognition into virtual reality is a compelling concept. An example of that combination is Zengo Sayu, an experimental program for learning Japanese, and incorporation of voice analysis is planned as a next step in a Mexican project for teaching English to engineering students. The U.S. military is also continuing in this direction in developing language-learning software: Maatman, Gratch, and Marsella (2005) describe a prototype of a "listening agent" that reacts to a user's speech using gestures and posture shifts, in an attempt to simulate what happens in actual human conversation.

Finally, a direction for the near future is the appearance of voice recognition in mobile learning programs. Efforts in this area have been underway for some time; a high-profile program has been used by the U.S. Army in Iraq to facilitate communication between American soldiers and Iraqi civilians. There are a number of projects underway that target mobile devices specifically. At CMU, for example, there is a good deal of interest in new smart phones, since they feature more powerful processors as well as consistent access to the Internet. This allows for the higher demands of voice processing as well as the possibility of bypassing the limited storage capacity of phones by accessing server-based data on demand. PocketSphinx is a lightweight speech recognition engine for mobile devices developed at CMU. The most recent version of the Apple iPhone has voice recognition built in (in 6 languages), but only for dialing numbers in the address book or playing music. Recently a voice API (application programming interface) for the iPhone, CeedVocal SDK (for English, French, German), has been released, which allows developers to build speech recognition into their iPhone applications. Google's Android phones also have some voice recognition capabilities built in. We are likely to see this trend continue and accelerate, and one would hope that future implementations look beyond catering exclusively to English language speakers, and that standards will develop in this area as well.
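To give a sense of how lightweight such engines can be to embed, here is a minimal example using the PocketSphinx Python bindings' LiveSpeech helper, which decodes continuously from the default microphone with the bundled U.S. English models. This is a sketch of the desktop bindings' documented usage, offered to give a flavor of the engine rather than as a description of any of the mobile products above.

    from pocketsphinx import LiveSpeech

    # Decode continuously from the default microphone using the default
    # U.S. English acoustic model, language model, and dictionary.
    for phrase in LiveSpeech():
        print(phrase)  # best hypothesis for each detected utterance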
Considering the scale of the effort involved in developing well-functioning speech tools, the availability of accepted standards and agreement on common basic approaches would enable sharing where feasible and would ensure an easier path for developers, who would not need to learn to work with an array of different proprietary approaches.

REFERENCES

Chan, M. (2003). The digital age and speech technology for Chinese language teaching and learning. Journal of the Chinese Language Teachers Association, 38(2), 49-86.

Dalby, J., & Kewley-Port, D. (2008). Design features of three computer-based speech training systems. In V. Holland & F. Fisher (Eds.), The path of speech technologies in computer assisted language learning (pp. 155-173). New York: Routledge.


Engwall, O., & Bälter, O. (2007). Pronunciation feedback from real and virtual language teachers. Computer Assisted Language Learning, 20(3), 235-262.

Gòmez, P., Álvarez, A., Martínez, R., Bobadilla, J., Bernal, J., Rodellar, V., & Nieto, V. (2008). Applications of formant detection in language learning. In V. Holland & F. Fisher (Eds.), The path of speech technologies in computer assisted language learning (pp. 44-66). New York: Routledge.

Hardison, D. (2005). Contextualized computer-based L2 prosody training: Evaluating the effects of discourse context and video input. CALICO Journal, 22(2), 175-190.

Hincks, R. (2003). Speech technologies for pronunciation feedback and evaluation. ReCALL, 15(1), 3-20.

Holland, V., Kaplan, J., & Sabol, M. (1999). Preliminary tests of language learning in a speech-interactive graphics microworld. CALICO Journal, 16(3), 339-359.

Levis, J., & Pickering, L. (2004). Teaching intonation in discourse using speech visualization technology. System, 32(4), 505-524.

Maatman, R., Gratch, J., & Marsella, S. (2005). Natural behavior of a listening agent. In T. Panayiotopoulos, J. Gratch, R. Aylett, D. Ballin, P. Olivier, & T. Rist (Eds.), Intelligent virtual agents (pp. 25-36). Berlin: Springer.

Neri, A., Cucchiarini, C., Strik, H., & Boves, L. (2002). The pedagogy-technology interface in computer assisted pronunciation training. Computer Assisted Language Learning, 15(5), 441-467.

Precoda, K., & Bratt, H. (2008). Perceptual underpinnings of automatic pronunciation assessment. In V. Holland & F. Fisher (Eds.), The path of speech technologies in computer assisted language learning (pp. 71-84). New York: Routledge.

Sagawa, H., Mitamura, T., & Nyberg, E. (2004). Correction grammars for error handling in a speech dialog system. In S. Dumais, D. Marcu, & S. Roukos (Eds.), HLT-NAACL 2004: Short papers (pp. 61-64). Boston: Association for Computational Linguistics.

RESOURCE LIST

Articles on speech technologies

• Interactive translation of conversational speech - IEEE article
• Learning by doing: Space-associate language learning using a sensorized environment
• Star Trek's universal translator: Coming soon to an iPhone near you? - Mobile voice analysis
• Talking paperclip inspires less irksome virtual assistant - Article on CALO (Cognitive Assistant that Learns and Organizes)
• Virtual reality breathes second life into language teaching - Project in Mexico for teaching English to engineering students

Speech analysis tools

• Anvil - Video annotation tool
• Audacity - Spectrogram visualization and audio recording and editing
• CeedVocal - Speech recognition for the iPhone
• COLEA - Speech analysis freeware
• Computerized Speech Lab (CSL) - From KayPentax
• CSLU Toolkit - Multiple tools for speech analysis, recognition, generation, and display

Language Learning & Technology

10

Robert Godwin-Jones

Speech Tools and Technologies

• • • • • •

Janus - Speech translation system Julius - Open-source large vocabulary continuous speech decoder Let's Go - Info on the mobile speech analysis project from CMU Open Mind Speech - Open-source crowd-sourcing project for speech analysis project PocketSphinx - Sphinx for handhelds Phonology Assistant - Tool using IPA (international phonetic alphabet) characters to index and display data • Praat - Open source speech analysis program (multi-OS) • SFS/RTSPECT Version 2.4 - Windows tool for real-time waveforms & spectra • Speech Analyzer - From SIL International • TalkBank CMU - Speech technology project • The CMU Sphinx Group - Open source speech recognition engines • Video Phonetics Database and Program - From KayPentax • WaveSurfer - Open source tool for sound visualization and manipulation • WinCECIL - Tool for viewing speech recordings, automatic pitch contours, and spectrograms • WinPitch LTL - Voice processing software Language learning software and projects • •

• • • • • • • • • • • •

BetterAcent Tutor - Speech analysis program for ESL CAMMIA (A Conversational Agent for Multilingual Mobile Information Access) - from Language Technologies Institute, CMU CandleTalk - Conversation tool for English Fluency - Automatic foreign language pronunciation training (CMU) Intelligent dialog overcomes speech technology limitations: The SENECa example MyET-MyCT 3 - Uses speech analysis software for tutoring English or Chinese Review of Tell Me More Chinese - From Calico Review Sakhr Software - Mobile voice applications SpeakGoodChinese - Article SpeakGoodChinese - Chinese pronunciation program SPICE (Speech Processing Interactive Creation and Evaluation) - Speech processing models Tactical Language and Culture - US Army language learning software TransTac - Spoken language communication and translation system for tactical use Zengo Sayu - An immersive educational environment for learning Japanese


NEWS FROM SPONSORING ORGANIZATIONS

Sponsors
University of Hawai‘i National Foreign Language Resource Center (NFLRC)
Michigan State University Center for Language Education and Research (CLEAR)

Co-Sponsor
Center for Applied Linguistics (CAL)

University of Hawai‘i National Foreign Language Resource Center (NFLRC)
The University of Hawai‘i National Foreign Language Resource Center engages in research and materials development projects and conducts workshops and conferences for language professionals, among its many activities.

CULTURA: WEB-BASED INTERCULTURAL EXCHANGES (OCTOBER 10-11, 2009)
The Cultura project, pioneered at MIT by Gilberte Furstenberg and her colleagues, has inspired a variety of online cultural exchanges based on a set of principles and best practices. The Cultura: Web-Based Intercultural Exchanges pre-conference event will feature presentations by a variety of educators who have created exchanges based on the Cultura model.

LANGUAGE LEARNING IN COMPUTER MEDIATED COMMUNITIES (LLCMC) CONFERENCE (OCTOBER 11-13, 2009)
Once, computers were seen as thinking machines or electronic tutors. Now the computer has become one of many devices that people use to form virtual communities of all kinds. In the field of language education, computer mediated communication (CMC) enables students to interact with one another free of space and time constraints and to participate in communities of learning with their counterparts in the target culture. The Language Learning in Computer Mediated Communities (LLCMC) Conference explores the use of computers as a medium of communication in language learning communities. Conference highlights will include a plenary talk by Dr. Gilberte Furstenberg (MIT), a special panel presentation showcasing online cultural exchanges based at the University of Hawai‘i, and a variety of intriguing concurrent sessions.

NEW NFLRC PUBLICATIONS
Second language teaching and learning in the Net Generation
Today’s young people—the Net Generation—have grown up with technology all around them. However, teachers cannot assume that students’ familiarity with technology in general transfers successfully to pedagogical settings. This volume examines various technologies and offers concrete advice on how each can be successfully implemented in the second language curriculum. Browse the table of contents.

Check out our many other publications.


OUR ONLINE JOURNALS SOLICIT SUBMISSIONS

Language Learning & Technology is a refereed online journal, jointly sponsored by the University of Hawai‘i NFLRC and the Michigan State University Center for Language Education and Research (CLEAR). LLT focuses on issues related to technology and language education. For more information on submission guidelines, visit the LLT submissions page.

Language Documentation & Conservation is a fully refereed, open-access journal sponsored by NFLRC and published exclusively in electronic form by the University of Hawai‘i Press. LD&C publishes papers on all topics related to language documentation and conservation. For more information on submission guidelines, visit the LD&C submissions page.

Reading in a Foreign Language is a refereed online journal, jointly sponsored by the University of Hawai‘i NFLRC and the Department of Second Language Studies. RFL serves as an excellent source for the latest developments in the field, both theoretical and pedagogic, including improving standards for foreign language reading. For more information on submission guidelines, visit the RFL submissions page.

Michigan State University Center for Language Education and Research (CLEAR)
CLEAR's mission is to promote the teaching and learning of foreign languages in the United States. Projects focus on materials development, professional development training, and foreign language research.

CONFERENCES
CLEAR exhibits at local and national conferences year-round. We hope to see you at ACTFL, CALICO, MiWLA, Central States, and other conferences.

NEWSLETTER
CLEAR News is a free biyearly publication covering FL teaching techniques, research, and materials. Download PDFs of back issues and subscribe at http://clear.msu.edu/clear/newsletter/.

MATERIALS DEVELOPMENT
Selected Products
The list below comprises just some of our free and low-cost materials for language educators. Be sure to visit our website occasionally for updates and announcements on new products: http://clear.msu.edu.

• CLEAR’s Rich Internet Applications initiative has been underway for over a year. RIA is a research and development lab where our programmers are working on free tools that language teachers can use to create online language teaching materials—or have their students create activities themselves!
  o NEW! Revisions (process writing and feedback tool)
  o NEW! Broadcasts (create your own podcasts)
  o Worksheets (add multimedia elements to online worksheets)
  o Audio Dropboxes (put a dropbox in any web page; students’ recordings get put into your dropbox automatically)
  o Conversations (record prompts for students to do virtual interviews and conversations)
  o Mashups (combine media elements to create a new resource for language teaching)


  o Viewpoint (record or upload videos to link from other sites or embed inside your own web pages)
  o SMILE (tool for creating interactive online exercises)
• La phonétique française (CD-ROM) – This cross-platform multimedia program consists of interactive lessons that can be used by French teachers to learn how to teach pronunciation, or by advanced students working independently.
• Introductory Business German (CD-ROM) – This CD-ROM provides a condensed, highly focused set of activities intended for use by business professionals who conduct business with Germans and German companies and wish to learn more about the German business and economics environment.
• Celebrating the World’s Languages: A Guide to Creating a World Languages Day Event (guide) – This publication provides a step-by-step guide to planning “World Languages Day,” a university event for high school students designed to stimulate interest in learning languages and to highlight the importance of cultural awareness.
• MIMEA: Multimedia Interactive Modules for Education and Assessment (German, Chinese, Arabic, Vietnamese, Korean, Russian; online video clips and activities)






Coming Soon!
• More Rich Internet Applications
• Introductory Business Chinese

The Center for Applied Linguistics (CAL)
The Center for Applied Linguistics is a private, nonprofit organization that promotes and improves the teaching and learning of languages, identifies and solves problems related to language and culture, and serves as a resource for information about language and culture. CAL carries out a wide range of activities in the fields of English as a second language, foreign languages, cultural education, and linguistics.

Featured Resources:

• CAL News – CAL News is our electronic newsletter, created to provide periodic updates about our projects and research as well as information about new publications, online resources, products, and services of interest to our readers. Visit our Web site to sign up.



• Alliance for the Advancement of Heritage Languages – Visit the Alliance Web site to browse the Heritage Language Program Profiles, view the Heritage Voices Collection, and sign up to receive the quarterly electronic newsletter, Alliance News Flash.



• Center for Research on the Educational Achievement and Teaching of English Language Learners (CREATE) – Visit the CREATE Web site to learn more about CREATE, its research, and upcoming events. To keep current on CREATE activities, sign up to receive an electronic newsletter and periodic announcements.



• CAL SIOP Professional Development Services – CAL works with schools, states, and districts to design and deliver high-quality, client-centered professional development services on the SIOP Model.




• CAL Services: Institutes on Teaching Reading to English Language Learners – In response to growing requests from K-8 educators for training materials on teaching reading to English language learners, CAL will be offering additional institutes in Washington, DC in 2010. CAL provides a variety of professional development and technical assistance services related to language education and assessment needs.

Featured Publications:
• Refugees from Iraq (Expanded Refugee Backgrounder)
• Using the SIOP Model: Professional Development Manual for Sheltered Instruction
• Developing Reading and Writing in Second Language Learners
• Realizing the Vision of Two-Way Immersion: Fostering Effective Programs and Classrooms
• What’s Different About Teaching Reading to Students Learning English?

Visit CAL’s Web site to learn more about our projects, resources, and services.


Language Learning & Technology http://llt.msu.edu/vol13num3/review1.pdf

October 2009, Volume 13, Number 3 pp. 16–25

REVIEW OF MULTIMEDIA LEARNING SUITE: CHINESE CHARACTERS

Title: Multimedia Learning Suite: Chinese Characters
Platform: Microsoft Windows® 2000, Windows XP, or Windows Vista
Minimum hardware requirements: USB 1.1 (USB 2.0 recommended); Intel® Pentium® processor or compatible; 128 MB RAM (minimum, depending on the operating system); graphics adapter and monitor capable of more than 256 colors (a resolution of 1024 x 768 or higher is recommended)
Software requirements: Microsoft Internet Explorer 6 or later; Microsoft .NET Framework 2.0 or above (included on stick); Windows Media Player 6.4 or above
Publisher: LearnLift
Support offered: (1) Complementary Study Guide; (2) Learn to Learn booklet; (3) MP3 audiobooks; (4) Website: http://www.memorylifter.com; (5) Online community board
Target language: Chinese (simplified)
Target audience: Beginner to intermediate-high, young or adult Chinese second language learners
Price: $79.95
Publication year: 2008

Review by Ching-Ni Hsieh and Fei Fei, Michigan State University

Learning to read and write Chinese characters poses a great challenge for learners of Chinese as a second language (L2) because of the complexity of the Chinese writing system (Allen, 2008). Chinese characters present a sharp contrast to alphabetic writing systems, such as that of English, in that Chinese orthography is based on the association of meaningful morphemes with graphic units, while alphabetic writing systems are based on the association of phonemes with graphemic symbols (Feldman & Siok, 1999). In addition, Chinese characters, unlike alphabetic words with their linear structure, have a square, nonlinear configuration (McNaughton & Li, 1999). Memorizing the sizeable number of Chinese characters needed to function in the language is time-consuming and requires repeated mechanical drills (Allen, 2008). Thus, technological tools that assist the learning process are gaining popularity, helped by the amount of vocabulary learning software readily available to language learners today (Allum, 2004; Liu, Jaeger, & Nakagawa, 2004).

Multimedia Learning Suite: Chinese Characters (hereafter Multimedia Learning Suite) is a vocabulary learning program specifically designed to facilitate users’ learning and long-term memorization of over 3,000 Chinese characters through multimedia flashcards. The intended users are beginning to high-intermediate level learners of Chinese. Multimedia Learning Suite is Windows-only software and ships on a USB stick, along with the MemoryLifter software, a Complementary Study Guide, and a Learn-to-Learn booklet. MemoryLifter is a free virtual flashcard program developed by LearnLift, the publisher of the Multimedia Learning Suite; it is designed to facilitate memorization of the meaning, pronunciation, and orthography of new words in different languages, including Arabic, French, Spanish, Polish, Portuguese, and Mandarin, and it can also be downloaded for free from the MemoryLifter website [www.memorylifter.com]. Multimedia Learning Suite is one of LearnLift’s latest commercial flashcard programs built on the MemoryLifter learning technology.


While using the Multimedia Learning Suite to learn Chinese characters, users can also use the MemoryLifter program to produce new virtual flashcards with text, sounds, pictures, and videos in Mandarin or any other language, since MemoryLifter is not language-specific. MemoryLifter employs the Box System, also called the Leitner system (LearnLift, 2008), and incorporates the principles of timed spacing and controlled repetition (Karpicke & Roediger, 2007). A generic procedure for how the Box System algorithm is set up in the Multimedia Learning Suite, following the information provided in the software’s Help function, is explained below; a schematic code sketch of this logic is given at the end of this section.

1. Initially, all flashcards in each learning module serve as the initial pool. Within each learning module, there are ten virtual flashcard “boxes.”
2. When users open a particular learning module, the software automatically checks whether any of the ten boxes is full. If so, a flashcard is pulled from that box and presented to the user. If none of the boxes is full, a flashcard is moved from the initial pool to box 1 and presented to the user.
3. If the user answers the card correctly (i.e., keys in the correct translation or selects the correct definition of the Chinese character presented), the card is promoted to the next higher-numbered box. Any card answered incorrectly is demoted to box 1.
4. When the software places a card in a box, it always places the card at the end of that box. When pulling cards from a box, the software uses a first-in, first-out principle.
5. The software repeats steps 2 through 4 until the pool is empty.
6. Once the pool is empty, the oldest card in the learning module is recycled. This procedure continues until all the flashcards have reached box 10.

The Box System allows learners to prioritize their learning by focusing on the Chinese characters that are difficult for them. To facilitate long-term retention, the software also adjusts the repetition spacing interval: problematic characters are shown more often, while easier characters are shown less often. The difficulty level of a character depends on the number of correct responses the user has given to it.

The Complementary Study Guide contains information about Chinese language rules, the three learning modules, and the different learning modes built into the Multimedia Learning Suite. Its step-by-step introduction to the software gives novice users an easy start. The Learn-to-Learn booklet provides concise theoretical background on how memory functions in encoding, storing, and retrieving information, and on how human beings establish and strengthen long-term memory.

The Multimedia Learning Suite USB stick also comes equipped with 89 audiobooks, organized into 16 themes that roughly correspond to the chapters in the three learning modules. The audiobooks are in MP3 format and average five minutes in length. They provide the pronunciation and English translation of the Chinese characters presented in the modules: each character is first pronounced in Chinese, followed by the English translation, and then pronounced in Chinese again. Users can easily transfer the audiobooks to a portable music device, such as an iPod, to practice on the go; they are useful for pronunciation practice when learners do not have access to a computer.
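For readers who want to experiment with this kind of scheduling, the box logic described above can be sketched in a few lines of Python. This is a minimal illustration of a Leitner-style scheduler, not LearnLift's actual implementation: the class name, method names, and per-box capacities are invented for the example, and the recycling step (6) is omitted.

    from collections import deque

    NUM_BOXES = 10

    class LeitnerDeck:
        """Illustrative Leitner-style box scheduler (not LearnLift's code)."""

        def __init__(self, cards, capacities):
            assert len(capacities) == NUM_BOXES
            self.pool = deque(cards)                        # unseen flashcards
            self.boxes = [deque() for _ in range(NUM_BOXES)]
            self.capacities = capacities                    # hypothetical per-box limits

        def next_card(self):
            # Step 2: if any box is full, pull its oldest card (first in, first out);
            # otherwise a card drawn from the pool is treated as a box-1 card.
            for i, box in enumerate(self.boxes):
                if len(box) >= self.capacities[i]:
                    return i, box.popleft()
            if self.pool:
                return 0, self.pool.popleft()
            return None                                     # step 6 (recycling) omitted

        def record_answer(self, box_index, card, correct):
            # Steps 3-4: promote on a correct answer, demote to box 1 on an error;
            # cards are always appended at the end of the target box.
            target = min(box_index + 1, NUM_BOXES - 1) if correct else 0
            self.boxes[target].append(card)

    # Drill three characters, answering everything correctly:
    deck = LeitnerDeck(["一", "二", "三"], capacities=[3] * NUM_BOXES)
    item = deck.next_card()
    while item is not None:
        box, card = item
        deck.record_answer(box, card, correct=True)
        item = deck.next_card()
    print([len(b) for b in deck.boxes])  # [0, 2, 1, 0, 0, 0, 0, 0, 0, 0]

The deques make the first-in, first-out behavior of step 4 explicit; a correct answer moves a card one box to the right, while a single error sends it all the way back to box 1, which is what concentrates practice on the characters a learner finds difficult.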


Learning modules

Multimedia Learning Suite has three built-in learning modules. Each learning module is divided into chapters, and each chapter into individual flashcards; each flashcard introduces one lexical item.

The first learning module, Numbers, Dates and Time, is organized into 10 chapters and contains 372 flashcards, that is, 372 Chinese characters. This module helps users familiarize themselves with the symbols for Chinese numbers and teaches them how to form dates and use basic time-related expressions in Chinese. Novice learners will find the characters introduced in this module useful because they are commonly used and easy to memorize.

The second learning module, Radicals, contains 32 chapters and 444 flashcards. The first chapter introduces 31 semantic radicals. Radicals are a major component of Chinese characters and carry information about a character’s meaning (Feldman & Siok, 1999). Chapters 2 through 32 are each named after one radical and consist of compound characters containing that radical. The 31 radicals (out of a total of 214) introduced in this module are widely used in compound Chinese characters and are of pedagogical importance. Research has shown that learners who are aware of the functions of radicals recognize complex Chinese words better and that knowledge of radicals serves as a powerful tool in literacy development (Shu & Anderson, 1999). Thus, learning to recognize these 31 radicals can help users guess the meaning of a new word on first encounter and can facilitate the memorization of more complex Chinese vocabulary.

The third learning module, Thematic Vocabulary, is organized into 14 chapters and contains 2,500 flashcards. Each of the 14 chapters consists of single- and multiple-character words, including nouns and verbs. The module covers a wide range of thematic vocabulary, such as food and drinks, health, the human body, leisure activities, nature, business, travel, and transportation. The breadth of the vocabulary introduced in this module can develop users’ knowledge of Chinese characters and meet their needs for using Chinese in different social contexts.

Main Screen Layout

The main window of the Multimedia Learning Suite displays one flashcard at a time from the currently active chapter. Each flashcard provides the following information about the lexical item: 1) the Chinese character; 2) audio with pronunciation by a native speaker; 3) a large image of the character; 4) Pinyin (the Romanization system for standard Mandarin); and 5) a video associated with the lexical item, if applicable. The character is introduced on the left-side panel, and the user types the response (the meaning of the character in English) on the right-side panel. Textual and aural feedback are given after the user’s response, and the last four characters answered incorrectly are presented at the bottom of the flashcard for review. A typical flashcard from the Radicals learning module is shown in Figure 1. Users can also enable a self-assessment feature that lets them indicate whether they really know the item on the flashcard: if a user gives a correct answer but checks the ‘Don’t know’ icon, the character will likewise be presented at the bottom of the flashcard for review.

The entire user interface of the Multimedia Learning Suite can be changed from its default English to French, German, Spanish, or Portuguese. For Chinese L2 learners whose first language is not English, the provision of different interface languages can come in handy. It should be noted, however, that the learning suite is mainly designed for native speakers of English, because the definitions of the Chinese characters and the feedback are both given in English.


Figure 1. Main window of the Multimedia Learning Suite.

Figure 2. Learning modes.

Customization of the learning experience

1. Learning Modes

Any of the pieces of information about a lexical item that are presented on the left panel of the flashcard can be enabled or disabled for a learning session by specifying the ‘mode’ of learning (see Figure 2). The Standard Mode displays all available information for the question, including the image, sound, and Pinyin. The Multiple Choice Mode asks users to select the correct answer from a number of options chosen randomly by the system from the answers to other cards in the chapter. The Sentences Mode asks users to type in the example sentences given in the questions rather than the English translation of the character; however, only a handful of flashcards in the three built-in learning modules contain sentences, so users may find this mode less useful, given the absence of example sentences that would create a meaningful context for vocabulary learning. The Listening Comprehension Mode presents only the pronunciation of the Chinese character, without showing its calligraphy; this mode is useful for practicing listening skills. The Image Recognition Mode tests users’ recognition of the character calligraphies without Pinyin or audio stimulus.

2. Learning Options

In addition to the Learning Modes, several Learning Options can be specified for the current learning session (see Figure 3). Some useful options that can be set or unset are: 1) a Countdown Timer, which sets a specific response time for answering questions depending on the difficulty of the character presented; 2) Display Statistics, which shows statistics about a user’s progress through a learning module (see further discussion below); 3) Display Images, which displays the calligraphies associated with the character; and 4) Show Correct Answers, which shows the correct answer after the user keys in a response.

Figure 3. Learning options.

Progress statistics

Progress through a learning module can be tracked by selecting Statistics under the Learn menu. Several useful statistics are presented as line graphs and pie charts. For example, ‘Knowledge of Learning Module’ shows the development of a user’s knowledge of a specific learning module over time (see Figure 4): the changes in the distribution of the flashcards across the boxes are plotted against the total time spent on the module up to that point. This information is useful because it helps users keep track of their learning trajectory and gauge their progress. Feedback from a Chinese learner who tried out the program indicates that learners will appreciate the statistics function, because the figures are easy to interpret and motivate learners to continue. Given that the goal of the Box System is to move all the flashcards into the highest-numbered box, it is also useful for learners to know the number of cards remaining in each box as they work through a module. The ‘current distribution’ statistics (see Figure 5) show the exact number of cards in each box within the chapters selected for learning. The number of correct and incorrect responses and the percentage of known words are calculated as well, giving users a precise picture of their progress in the current session and a sense of how much they have learned and how many more characters they need to work on in order to fully acquire the material in the specified chapters.
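To make the ‘current distribution’ display concrete, the figures described above could be derived from the box state of the LeitnerDeck sketch given earlier. The function name and report keys below are invented for the illustration; they are not the program’s actual output format.

    def current_distribution(deck, correct, incorrect):
        """Illustrative session report: cards per box and percentage known."""
        answered = correct + incorrect
        return {
            "cards_per_box": [len(box) for box in deck.boxes],   # cf. Figure 5
            "correct": correct,
            "incorrect": incorrect,
            "percent_known": round(100.0 * correct / answered, 1) if answered else 0.0,
        }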


Figure 4. Knowledge of learning module.

Figure 5. Card distribution.


The progress of an individual’s learning through a module is saved on the accompanying USB stick along with the modules, unless the user resets the progress saved in the corresponding learning module. This means that one copy of the software can ideally be used to track one individual’s progress. While storing the progress data on the USB stick is very convenient for self-study, this feature makes the software less user-friendly in a classroom or language lab setting with multiple users. Unless one copy of the software is obtained for each student, teachers who use the software as supplementary teaching material cannot track each individual student’s progress using the statistics provided by the software. In this case, other formative assessment tools, such as traditional pen-and-paper quizzes, need to be employed in order to assess students’ learning progress.

Figure 6. Creating or editing a card.

Authoring system

One major feature of the Multimedia Learning Suite is the MemoryLifter program, which allows users to create new flashcards and add them to existing chapters, add new chapters to existing learning modules, or even develop an entirely new learning module alongside the existing ones (see Figure 6). Conversely, newly created flashcards, chapters, and modules can easily be deleted from the software. The dialog box shown in Figure 6 allows users to specify a question and to add images, audio, or video files for the question from the local machine by browsing for files (all common image formats are supported; .mp3, .wav, and .mid formats are supported for audio; .avi and .wmv formats are supported for video). Audio recordings can be made on the fly if the computer is equipped with a microphone. Users can also add media items to the answer field. A default visual style for the cards, such as font, colors, and margins, can be set as preferred. Once a new flashcard is created, it can be moved to a particular chapter and, within a chapter, to a particular box. New learning modules are created as folders at a location that users specify. These folders can be transferred to any other computer and then opened by the MemoryLifter software on the student’s computer. The complete process of creating and using new flashcards is intuitive and user-friendly; users with basic computer literacy will be able to create their own flashcards after a few trials. If users encounter problems when creating flashcards, the Forum & Community section on the MemoryLifter website provides an accessible channel for posing questions and receiving timely answers.

Overall evaluation and pedagogical implications

The main strength of the Multimedia Learning Suite lies in its various mnemonic approaches to enhancing long-term retention of Chinese characters, such as standard text input, multiple choice, listening comprehension, and image recognition. Cognitive research on Chinese reading has well documented that, when learners identify a Chinese character, both the visual-orthographic component and the phonological and semantic attributes of the character are rapidly activated in the mental representation (Tan, Hoosain, & Siok, 1996). Although the flashcards in Multimedia Learning Suite are patterned after traditional flashcards, they are more effective in facilitating Chinese vocabulary learning because they provide different input modalities, such as images and sounds, that enhance the learning outcome. Multimedia Learning Suite also allows users to create new flashcards with images, audio, and video, and to add them to the program in a customized learning process. The ease of switching between learning modes offers great flexibility: for example, users can choose to practice with the Image Recognition Mode and then the Listening Comprehension Mode, or vice versa, depending on their learning style and interest. The statistics provided by the program can also help learners monitor their learning progress, as research has shown that learning strategies such as self-regulation and self-monitoring are positively related to learners’ self-efficacy, intrinsic motivation, and learning outcomes (e.g., Pintrich, 2000; Winne & Perry, 2000; Zimmerman, 1990).

As a multimedia flashcard program, the Multimedia Learning Suite is ideal for individual learners who want to learn, memorize, and recognize Chinese characters, because learners can access the program at any time based on their learning needs, without guidance or supervision by language instructors. Beginning or low-intermediate learners of Chinese, particularly those who want to acquire basic knowledge of Chinese characters as well as improve their pronunciation, will find the software intuitive. Multimedia Learning Suite also provides considerable flexibility for higher-level learners of Chinese, who can rearrange the material by creating their own flashcards and reorient the learning process according to their priorities. In addition to individual use for vocabulary learning, the software can also serve as a supplemental assessment tool, with embedded immediate feedback and statistics to regularly monitor learning progress.
Among the wealth of Chinese language learning materials and software currently on the market, many other flashcard programs, such as YellowBridge Online Chinese Flashcards and Declan’s Chinese Flashcards, are available; the functions built into the Multimedia Learning Suite are thus not unique. What sets Multimedia Learning Suite apart from its competition, however, is that almost all aspects of the learning environment can be customized to fit learners’ needs. The included MemoryLifter software provides a very intuitive authoring environment for teachers as well as individual learners to create individual flashcards, chapters composed of those cards, and even entire learning modules. The ability to review one’s progress over time across the learning modules is another real strength of the software, because it encourages learners’ self-monitoring as a successful learning strategy. Overall, Multimedia Learning Suite delivers what it was designed to do.

Nevertheless, several drawbacks of the software need to be addressed. First, the introduction of strokes and stroke order is surprisingly absent. Strokes and stroke order are crucial in learning Chinese characters, as they represent the intrinsic regularity of the Chinese writing system, and incorrectly ordered or written strokes can produce illegible or incorrect characters. In a multimedia flashcard program, this component could easily have been integrated by adding stroke-order animation when the target character is introduced. Second, the software supports only simplified Chinese characters. Although traditional Chinese characters are more complex than the simplified ones, they remain the norm in Taiwan, Hong Kong, and many other Chinese-speaking communities around the world. It would be helpful if the developers could produce a separate version introducing traditional Chinese characters in future development; the program would then be applicable to learners of both simplified and traditional Chinese. The last concern relates to the vocabulary inventory from which the 3,000 Chinese characters in the current program are drawn. There is no explanation, either in the software package or on the MemoryLifter website, of how these characters were selected. Users could be better informed about the characters to be learned if the software provided additional information about the selected vocabulary’s word frequency and difficulty level based on an established Chinese vocabulary list.

In a nutshell, the successful use of the software depends on the learning context. If the software is to be used by Chinese language instructors as a pedagogical tool in the classroom, decisions have to be made about the sequence in which characters are presented; for example, radicals should be introduced first, followed by single-element pictographic characters, then multi-element characters and compound words. For individual learners, however, since learning Chinese at the beginning level relies largely on memorization and word recognition, the software will be of great value as they embark on their first steps toward learning Chinese.

ABOUT THE REVIEWERS

Ching-Ni Hsieh is a Ph.D. candidate in the Second Language Studies Program at Michigan State University. Her research interests include language testing, individual differences, and Chinese SLA. Ching-Ni has taught Chinese at a heritage language school in Michigan and is currently developing an online Chinese oral proficiency test. E-mail: [email protected]

Fei Fei is a Ph.D. candidate in the Second Language Studies Program at Michigan State University. She has been working on language projects at the Center for Language Education and Research (CLEAR), including Multimedia Interactive Modules for Education and Assessment, Online Chinese Pragmatics Tests, and a Business Chinese CD series. E-mail: [email protected]

REFERENCES

Allen, J. R. (2008). Why learning to write Chinese is a waste of time: A modest proposal. Foreign Language Annals, 41(2), 237-251.
Allum, P. (2004). Evaluation of CALL: Initial vocabulary learning. ReCALL, 16(2), 488-501.
Declan’s Chinese Flashcards [Computer software]. (1999-2009). Retrieved from http://www.declansoftware.com/chinese/screenshots_dcfc.htm
Feldman, L. B., & Siok, W. W. T. (1999). Semantic radicals in phonetic compounds: Implications for visual character recognition in Chinese. In H.-C. Chen, A. W. Inhoff, & J. Wang (Eds.), Reading Chinese script: A cognitive analysis (pp. 19-36). Mahwah, NJ: Lawrence Erlbaum Associates.
Karpicke, J. D., & Roediger, H. L. (2007). Expanding retrieval practice promotes short-term retention, but equally spaced retrieval enhances long-term retention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33(4), 704-719.
LearnLift. (2008). Learn to learn: Tips and tricks booklet. Houston, TX: LearnLift.
Liu, C.-L., Jaeger, S., & Nakagawa, M. (2004). Online recognition of Chinese characters: The state-of-the-art. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(2), 198-213.
McNaughton, W., & Li, Y. (1999). Reading and writing Chinese: A guide to the Chinese writing system. North Clarendon, VT: Tuttle Publishing.
Pintrich, P. R. (2000). Multiple goals, multiple pathways: The role of goal orientation in learning and achievement. Journal of Educational Psychology, 92, 544-555.
Tan, L. H., Hoosain, R., & Siok, W. T. (1996). Activation of phonological codes before access to character meaning in written Chinese. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 865-882.
Winne, P. H., & Perry, N. E. (2000). Measuring self-regulated learning. In P. Pintrich, M. Boekaerts, & M. Seidner (Eds.), Handbook of self-regulation (pp. 531-566). Orlando, FL: Academic Press.
YellowBridge Online Chinese Flashcards [Online software]. (2003-2009). Retrieved from http://www.yellowbridge.com/chinese/flashcards.php
Zimmerman, B. (1990). Self-regulating academic learning and achievement: The emergence of a social cognitive perspective. Educational Psychology Review, 2, 173-201.


Language Learning & Technology http://llt.msu.edu/vol13num3/review2.pdf

October 2009, Volume 13, Number 3 pp. 26–31

REVIEW OF ACADEMIC INTERACTIONS: COMMUNICATING ON CAMPUS

Academic Interactions: Communicating on Campus [with DVD]
Christine B. Feak, Susan M. Reinhart, and Theresa N. Rohlck
2009
ISBN: 978-0-472-03332-4
US $29.50 (paperback)
216 pp.
University of Michigan Press
Ann Arbor, MI

Review by Heather Weger, Georgetown University

The internationalization of higher education has seen an increasing number of students pursuing academic interests in the U.S. (de Wit, 2002). Such learners will likely face a variety of challenges, including not only the need to participate in the academic discourse of a particular discipline, such as the natural sciences, marketing, or legal studies (Bhatia, 2002), but also the need to manage social adjustment (Ward, Bochner, & Furnham, 2001). Moreover, the U.S. academic context is itself a particular community of practice (Belcher, 1994; Lave & Wenger, 1991), characterized by norms of engagement and classroom expectations that may differ widely from the experiences international students bring with them (Casanave & Li, 2008; Shiraev & Boyd, 2008).

Designed to raise ESL learners’ awareness of such norms of engagement, Academic Interactions seeks to equip prospective U.S.-based students with “some of the basic communication skills they need to be successful in a college or university setting” (p. vii). The target audience of the book is high-intermediate to advanced ESL learners. Its explicit academic focus means that it is best suited to college-preparatory programs. In fact, some activities presuppose access to professors and matriculated university students, further indicating that the book is intended for learners “who are either in an intensive academic English program or have newly begun their academic careers at a U.S. community college, college, or university at either the undergraduate or graduate level” (p. vii).

The greatest strength of the book is its carefully designed textual analysis tasks, drawn largely from the MICASE database (http://quod.lib.umich.edu/m/micase), a publicly available resource of authentic “academic speech from across the University of Michigan campus” (p. vii). While the use of authentic speech samples does not inherently make an approach pedagogically sound (Gilmore, 2007), the book succeeds because the tasks that accompany each excerpt draw ESL learners’ attention below the surface theme of the unit to consider and practice specific language points, including mining the authentic transcripts to study functional grammar features (e.g., the use of ellipsis), new vocabulary, and pragmatic appropriateness (e.g., how to politely decline advice from an authority figure). Though the book includes other authentic sources, the MICASE transcripts are its backbone; they constitute the majority of the tasks in the five units that focus on oral interactions. (One unit focuses on email correspondence and includes authentic sample emails to professors from non-native English speakers.)


Accompanying the text is a DVD that offers content-based illustrations of the themes of a given unit. The DVD is based on role-playing and thus is not entirely authentic. However, the actors were not given predetermined scripts; instead, a situation was outlined for them (e.g., the need to visit one’s advisor during office hours to discuss changing majors), and the actors improvised the scene. At first glance, some of the scenes seem overly simplistic for the target audience of “high-intermediate to advanced students” (p. vii), but they may serve as useful homework or independent activities that can supplement the core lessons. Substantial suggestions for ways learners might use the DVD to examine specific language functions (including transcripts and lesson ideas) are accessible online (http://www.press.umich.edu/esl/tm/academicinteractions/).

Additionally, the DVD scenes provide the source for most of the book’s pronunciation foci, which are found at the end of each unit (only the email unit includes a pronunciation focus not based on the DVD). For example, one DVD scene introduced in Unit 2 shows students (actors who are improvising) discussing dorm life. To reinforce the interview genre introduced in the chapter, the pronunciation task involves transcribing a segment of this scene and analyzing it for question intonation. In Unit 4, which includes a focus on accepting and rejecting advice, recommendations, or suggestions, students analyze another DVD scene and focus on the phoneme /t/, which is particularly relevant for distinguishing a negative expression (can’t) from a positive one (can). The pronunciation foci of the other four units, however, seem less inherently linked to the communicative target of the unit. For example, Unit 3 focuses on email correspondence, a written communicative event, yet the authors include a pronunciation focus that asks learners to consider how to pronounce acronyms. The final unit targets class discussions and presentations, and it is unclear how its pronunciation focus on consonant clusters is particularly relevant to that theme.

Each unit comprises multiple tasks that take varying amounts of time to complete. Though the book gives no explicit timing guidelines, individual tasks will likely take between 5 minutes and an hour, depending on the task type. For example, the initial tasks of each unit, which focus on brainstorming or schema activation, may take only 5 minutes of class time, whereas the MICASE transcript analysis tasks will probably take 30 minutes or more. Other tasks include grammar practice (5-10 minutes) and planning and conducting interviews (30-60 minutes). Though the DVD scenes are short (all but one are less than 6 minutes), transcribing and analyzing them for each unit’s pronunciation task will take considerably longer; this task lends itself to independent work or homework.

Unit 1 contains 14 tasks and focuses on common American surnames (tasks 1-8), places (task 9), and locations and directions (tasks 10-14). Though these themes seem pitched at a level too low for the target audience of high-intermediate to advanced learners, the tasks of the unit can be exploited for three major purposes. First, the opening tasks may serve as good ice-breakers for a new class of learners, who might use them as a means of getting to know each other. Second, once the unit shifts to “Locations and Directions,” it introduces learners to working with the MICASE transcripts via close textual analysis (since such analysis may be an unfamiliar task for many learners, it may be useful to train them on the familiar theme of locations and directions). Third, the final task of the unit allows learners to explore the campus or community in which they are currently living and studying through a group project and informal presentation.

The 21 tasks of Unit 2 deal with raising learners’ awareness about life as a college/university student in the U.S. Learners consider such issues as housing, homework, and means of contacting their professors. The viewpoint of the college/university professor is also present in this unit, through MICASE transcript analysis and DVD scenes intended to broadly describe what a “typical” day for a professor might entail. The unit also introduces standards of email protocol (covered in depth in Unit 3) and raises students’ expectations regarding homework and grading. As in Unit 1, many tasks focus on unpacking MICASE transcripts in terms of several language foci, including understanding idiomatic expressions and interpreting speaker intent. Building on the work of Boxer (1993), one pragmatic language focus of the unit concerns how students can use complaining about homework as a means “to establish rapport or connections with other students” (p. 41). This unit also introduces learners to the interview genre, which prepares them to complete two interview tasks: interviewing a matriculated college/university student and interviewing someone who works in a student service office on their campus.

With a focus on email protocol, Unit 3 is the only unit in the book solely targeting a written mode of communication. The unit addresses the mechanics of email, such as appropriate subject headings, greetings, and closings. The strength of the unit, however, is that it addresses the complexities that arise from the hybrid nature of email correspondence. As the authors note, “email correspondence has features of both spoken and more formal written English, which sometimes poses a challenge when it comes to vocabulary and grammar choices” (p. 58; see also Biesenbach-Lucas, 2005). The authors outline two pragmatic heuristics to help learners manage these challenges: Grice’s Conversational Maxims (1975) and Leech’s Maxims of Politeness (1983). The unit avoids the pitfall of being overly theoretical because the theories are succinctly presented, and the accompanying 19 tasks guide learners in applying the theories’ principles to the evaluation and construction of emails. Specific pragmatic foci include how to ask for letters of recommendation, set up appointments, and apologize for absences or missed assignments. In short, the treatment of the linguistic theories is reasonably positioned within the learners’ zone of proximal development (Brophy, 1999; Vygotsky, 1986) and provides useful tools for learning how to manage the balance of formal and informal tenor (Halliday & Matthiessen, 2004) that characterizes student-instructor email correspondence. Participating in online course discussions is briefly mentioned, yet no tasks are designed around this potentially important area of course participation. As online education grows in popularity, this lacuna may need greater attention in future editions of the book.

Unit 4 consists of 17 tasks and focuses on how students can interact appropriately with professors and advisors during office hours. As in Units 1 and 2, MICASE transcripts are used to illustrate specific language points. For example, in task 7, students are presented with three examples of office hour interactions and asked to evaluate them in terms of content (why learners visit office hours), idiomatic expressions (e.g., ‘alright’ versus ‘alrighty’ versus ‘all right’), and discourse markers (e.g., the functions of words such as ‘mhm’ and ‘yep’). Other tasks in the unit focus on different means of explaining the need for help with homework assignments and on the use of modal verbs and quasi-modals in asking for and receiving advice; each language point is supported by a MICASE example. Additionally, this unit has several particularly useful DVD scenes. For example, the three scenes for “Appointments with an Advisor” might be mined for useful vocabulary (e.g., credits, prerequisite, elective, have space, off track).

The 14 tasks of Unit 5 concern classroom interactions and focus specifically on the functions of questions (asked by both instructors and students) and narratives in classroom lectures.
For example, in task 3, students examine MICASE transcripts to identify instructor questions and consider whether they are rhetorical; students then evaluate different strategies for asking questions when they have not understood part of a lecture, including brainstorming expressions for interrupting. Other tasks in the unit focus on how to hedge a response, how a wh-cleft functions, and how to understand ellipsis; MICASE transcripts illustrate each of these language features. This unit also contains a particularly useful DVD scene, “Gestures, Facial Expressions, and Body Language,” which shows three speakers discussing cross-cultural variations in non-verbal cues, including the different meanings conveyed through the use of eyebrows, pointing, and other hand gestures. Since nonverbal cues are of critical importance in interpreting face-to-face discourse, which is the focus of Units 1, 2, 4, 5, and 6, it might be useful to begin the course with this task.

The final unit focuses on two speech events: formal discussions (part 1, tasks 1-11) and panel presentations (part 2, tasks 12-27). Again, MICASE transcripts illustrate language cues that might be useful for accomplishing both speech events. In part 1, learners are introduced to the structure of a discussion, including summarizing a topic, developing discussion questions, controlling turn-taking, and listening actively. Part 1 ends with each learner taking on the role of a discussion leader. In part 2, learners are scaffolded through the process of developing their own panel presentations. This includes the planning stages (e.g., how to develop a topic and determine presenter roles) and the execution stages (e.g., how to transition between presenters, how to manage question-answer sessions, and how to develop visual aids). Two sets of comparisons in the DVD scenes of this unit may provide good discussion starters: one set contrasts how a group of students negotiates the division of labor for a group project; the other contrasts more and less successful uses of visual aids.

Academic Interactions has many strong features. First, the approach is grounded in authentic speech samples, whose use has long been considered a cornerstone of sound pedagogy (Brown, 2001). However, the mere inclusion of authentic materials does not account for the success of the textbook; rather, it is the carefully constructed tasks that guide learners to consider and practice the organizational structure, language features, and pragmatic appropriateness of each authentic excerpt. Another strong point is that the book is fairly comprehensive in addressing the likely modes of interaction in a student’s academic life: it addresses both in-class and out-of-class interactions with professors and classmates, as well as informal and formal media of communication. A third strength is that the text is highly interactive. Questions instructing students to consider language usage or to link the examples in the book to their own experiences are embedded within the descriptive paragraphs of each unit, and almost every task assumes pair or group discussion. Another strength is that the authors “have attempted to use published research as much as possible to inform [their] materials development” (p. vii); accordingly, Units 2-6 explicitly cite published research or statistics to establish the relevance of the units’ themes to successful academic interactions. Finally, teaching aids (Instructor’s Notes and supplemental suggestions for using the DVD) are available online (http://www.press.umich.edu/esl/tm/academicinteractions/).

Though this text might well serve as the foundational text for a general academic preparation class, there are two caveats to consider in planning how to use the book with a class. First, there are a large number of tasks in each unit (from 14 to 27). While each individual task may not be particularly time-consuming, almost every task assumes pair or group discussion. Thus, it seems best that the vast majority of the work be done during class time, with the exceptions of preparing for major projects (e.g., Units 1 and 6) and working through each unit’s pronunciation focus. Although learners can be assigned a set of tasks to complete with a partner outside of class, instructors must then determine how to assess the completion of the assignments, most of which are oral (the only unit that produces a substantive amount of written reflection is the unit on email correspondence). Second, and more importantly, the close textual analysis of transcripts will likely be a new skill for most students, further problematizing the assignment of many of the tasks as homework, at least until learners have been trained.
The need to work with the transcripts during class time is a caveat the authors themselves are fully aware of: they note that careful planning is necessary to avoid being unable to complete the analysis of a transcript by the end of a class: “Because momentum is hard to regain, we plan strategic stopping points in the unit rather than find ourselves with a lengthy transcript only ‘half-done’ [at the end of a class period]” (p. x). The book does not provide guidelines for how much time each transcript analysis is expected to take, presumably because this may vary greatly from class to class. To address these practical timing parameters, an instructor may need to carefully select a subset of tasks for each unit, or consider teaching only some units. Unit 1, for example, focuses on American surnames, places, and locations/directions; these themes may not be as relevant for the target audience of high-intermediate to advanced students as the need to understand email or office hour protocols. Timing constraints also suggest that prudent use of the DVD during class time is warranted; viewing the DVD scenes might best be assigned as homework or independent study.

Careful consideration of these caveats will help instructors unlock the benefits of this book for their learners. Specifically, ESL learners will gain not only awareness of, but also practice in, how language functions to accomplish academic needs (such as participating in class and preparing for group work with classmates). Although finding the right balance for one’s class might be a challenge when using the book for the first time, it will be worth the effort for both student and instructor.

ABOUT THE REVIEWER

Heather D. Weger received her Ph.D. in Applied Linguistics from Georgetown University. Her research interests include individual differences and SLA, identity construction in the adult learner, and teacher training. She is currently researching classroom motivation and teaching English in the Intensive English Language Program at Georgetown University. Email: [email protected]

REFERENCES

Belcher, D. (1994). The apprenticeship approach to advanced academic literacy: Graduate students and their mentors. English for Specific Purposes, 13(1), 23-34.
Bhatia, V. (2002). A generic view of academic discourse. In J. Flowerdew (Ed.), Academic discourse (pp. 21-39). London: Longman.
Biesenbach-Lucas, S. (2005). Communication topics and strategies in e-mail consultation: Comparison between American and international university students. Language Learning & Technology, 9(2), 24-46. Retrieved from http://llt.msu.edu
Boxer, D. (1993). Complaining and commiserating: A speech act view of solidarity in spoken American English. New York: Peter Lang.
Brophy, J. (1999). Teaching. Geneva: International Bureau of Education.
Brown, H. (2001). Teaching by principles: An interactive approach to language pedagogy. London: Longman.
Casanave, C., & Li, X. (Eds.). (2008). Learning the literacy practices of graduate school: Insiders’ reflections on academic enculturation. Ann Arbor, MI: University of Michigan Press.
de Wit, H. (2002). Internationalization of higher education in the United States of America and Europe. Westport, CT: Greenwood.
Gilmore, A. (2007). Authentic materials and authenticity in foreign language learning. Language Teaching, 40, 97-118.
Grice, H. (1975). Logic and conversation. In P. Cole & J. Morgan (Eds.), Speech acts (pp. 41-58). New York: Academic Press.
Halliday, M., & Matthiessen, C. (2004). An introduction to functional grammar. London: Hodder Arnold.
Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. Cambridge: Cambridge University Press.
Leech, G. (1983). Principles of pragmatics. London: Longman.
Shiraev, E., & Boyd, G. (Eds.). (2008). The accent of success. Ann Arbor, MI: University of Michigan Press.
Vygotsky, L. (1986). Thought and language. Cambridge: MIT Press.
Ward, C., Bochner, S., & Furnham, A. (2001). The psychology of culture shock. Philadelphia, PA: Taylor & Francis.


Language Learning & Technology http://llt.msu.edu/vol13num3/hincksedlund.pdf

October 2009, Volume 13, Number 3 pp. 32–50

PROMOTING INCREASED PITCH VARIATION IN ORAL PRESENTATIONS WITH TRANSIENT VISUAL FEEDBACK

Rebecca Hincks and Jens Edlund
KTH Royal Institute of Technology

This paper investigates learner response to a novel kind of intonation feedback generated from speech analysis. Instead of displays of pitch curves, our feedback is flashing lights that show how much pitch variation the speaker has produced. The variable used to generate the feedback is the standard deviation of fundamental frequency as measured in semitones. Flat speech causes the system to show yellow lights, while more expressive speech that has used pitch to give focus to any part of an utterance generates green lights. Participants in the study were 14 Chinese students of English at intermediate and advanced levels. A group that received visual feedback was compared with a group that received audio feedback. Pitch variation was measured at four stages: in a baseline oral presentation; for the first and second halves of three hours of training; and finally in the production of a new oral presentation. Both groups increased their pitch variation with training, and the effect lasted after the training had ended. The test group showed a significantly higher increase than the control group, indicating that the feedback is effective. These positive results imply that the feedback could be beneficially used in a system for practicing oral presentations.

INTRODUCTION

In the wake of globalization, public speaking is increasingly done in the world’s second language, English. One form of public speaking is the oral presentation, a genre often studied and taught in courses such as Academic English, Business English or Technical English. Holding an oral presentation in front of their classes gives students who have reached a certain level of communicative competence the opportunity to practice a task that they will in all likelihood meet in their working life. Teachers are given the opportunity to listen in a focused manner to the spoken production of individual students, and classes are given a chance to learn about various topics from their peers rather than from their teachers. Oral presentations can be assigned a grade, and deserve treatment as a genre in themselves, comparable to traditional written genres.

One aspect of a successful oral presentation is that the speaker has used his or her voice in a way that facilitates access to the content of the presentation. This involves temporal features, such as speaking at a pace that is appropriate for the audience, and expressive features, such as using pitch and loudness to give aural shape to the information structure of one’s intended message. This use of intonation can be a challenge for any novice public speaker, but it is more so for those who are speaking in a second language. This is particularly true for speakers whose native languages have intonational systems that differ greatly from English.

In the research reported on in this paper we have taken steps in the direction of developing a system for practicing oral presentations with feedback provided by speech technology. People who are required to hold a presentation in a second language are inclined to practice the presentations, especially if they are to receive a grade. Because of the widespread use of presentation software, most speakers are in the proximity of computers as they practice. This presents an opportunity for computer-based feedback (Hincks, 2005).
Speech recognition could be used to provide a transcript of the presentation, which could be analyzed for the presence of desirable and undesirable linguistic features. Speech recognition and analysis could also be used to give feedback on the speaker’s pronunciation and intonation. In order to
achieve the goal of presentation feedback, however, we must find ways to successfully apply speech technology to the production of free, rather than modeled, speech.

Speech analysis for teaching intonation

The term speech technology covers three basic technologies: speech analysis, speech synthesis, and speech recognition. An overview of the use of speech technology for teaching pronunciation was recently published by Levis (2008). Speech recognition can be used to provide feedback on speaker pronunciation at the phonemic level (e.g., Neri, Cucchiarini, & Strik, 2008). Since it has achieved considerable breakthroughs in recent years, it can now also be used to provide immediate transcripts of presentations made with native-accented speech (Hincks, 2009). However, we focus on speech analysis, which can be used for feedback on intonational features. Speech analysis is a technology that separates a speech signal into its component parts in order to provide information about the frequencies and intensities of the sound. A typical visual display of an analysis has three main components: the speech waveform, showing the intensity, or how loud the sound is; the spectrogram, showing the distribution of the resonant frequencies; and the display of fundamental frequency, in which a broken line, known as the contour, curve or tracing, represents pitch. Speech analysis was once available to end users mainly in the form of expensive software such as VisiPitch or SpeechViewer; however, programs such as WaveSurfer (Sjölander & Beskow, 2000) and Praat (Boersma, 2001) are now freely available via the Internet.1

The display of fundamental frequency has long been used to teach intonation patterns in a second language (Anderson-Hsieh, 1992; de Bot, 1983; Hardison, 2004; Molholt, 1988). A visual display of the pitch contour of a learner utterance can be compared to a teacher model of the utterance, in order to heighten the learner’s perception of the importance of appropriate pitch movement and to give immediate feedback on the learner’s production. The early work by de Bot (1983) established the effectiveness of giving learners audio-visual feedback on their intonation rather than audio-only feedback. Hardison (2004) showed that training in intonation with real-time visual pitch display improved learner production not only at the supra-segmental level, but also at the segmental level. Commercially available software packages for pronunciation training, such as those produced by Auralog, incorporate speech analysis, and display the user’s pitch curve along with a target model.

There are a number of limitations inherent in the way speech analysis is traditionally used for teaching intonation. One is the standard procedure of using a target model with which to compare the learner utterance. This limits the extent to which learners can use the technology on their own, and also the extent to which it can be integrated into training based on naturally occurring, authentic communication. Learners need some training in order to interpret the pitch contour. The admonition to compare with a teacher model may be interpreted by students as a requirement to match the model precisely—a task at which they are bound to fail. Furthermore, the pitch contour represents not only the intonation that is appropriate to the target language but also intonation related to, for example, speaker attitude or regional dialect.
While these features in themselves could provide further pedagogical goals for a certain type of student (Chun, 1998), the type of mimicking required to match a contour precisely is probably frustrating and counter-productive. Many learners have pronunciation goals that are oriented more toward comprehensibility than toward achieving a native-like accent. As English consolidates its position as the global lingua franca, there are more students whose goals are closer to the former than to the latter (Jenkins, 2000). Further problems stem from the fact that the fundamental frequency analysis used to create the pitch contour is an imperfect technology, with errors ranging from octave errors—the analysis frequently missing by a full octave, something that can be caused both by the nature of fundamental frequency processing and by the nature of phonation—to less easily caught errors. Ideally, maximum and minimum fundamental frequency values should be set for each individual speaker in order to limit misrepresentations in the pitch contour.
The use of speech analysis over long stretches of discourse is problematic. Scrolling windows allow for the continuous display of information, but students must be able to make connections between their speech and the fairly complex visual pitch patterns that are displayed instantaneously and simultaneously. Language students who have the opportunity to receive personal tutoring on their use of intonation in extended discourse may be presented with a series of pitch tracings—something that can only be accomplished off-line, after the speech has been produced and recorded. However, pitch contours are by nature quite different from how language in natural contexts is perceived. They constitute a static, post-hoc, abstract representation of some of the acoustic properties of utterances that are already spoken and lost, whereas the acoustics of speech are normally perceived only in the moment: they are transient and direct rather than static and analytical.

We know that giving learners feedback on intonation is valuable, and that it is enabled by the visual representation provided by speech analysis. The standard technique can be advantageously used for practicing phrases in the type of pronunciation training done at elementary levels of language training, but is inadequate for stimulating intonational development over longer stretches of discourse (Chun, 1998) such as those produced by intermediate and advanced learners who make oral presentations.

Pitch variation and movement in native and non-native public speaking

Let us now turn to what is known about the way pitch is used by native and non-native speakers as they speak in public. First-language speech that is directed to a large audience is normally characterized by more pitch variation than conversational speech (Johns-Lewis, 1986). In studies of English and Swedish, high levels of variation correlate with perceptions of speaker liveliness (Hincks, 2005; Traunmüller & Eriksson, 1995) and charisma (Rosenberg & Hirschberg, 2005; Strangert & Gustafson, 2008). The variable that can be used to represent pitch variation is the normalized standard deviation of fundamental frequency. The standard deviation will decrease with increasing amounts of data, but if the amount of data under analysis is constant, it will reflect differing amounts of variation. In our work we examine the standard deviation of a window of ten seconds of speech at a time. The window moves through the speech as it is processed. If the speaker makes little movement from his or her mean fundamental frequency, the standard deviation will be low. If the speaker has raised or lowered fundamental frequency to give focus to an important word or concept, or to indicate a change in topic, the standard deviation will be higher.

Speech that is delivered without pitch variation impairs a listener’s ability to recall information and is not favored by listeners. This was established by Hahn (2004), who studied listener response to three versions of the same short lecture: delivered with correct placement of primary stress or focus, with incorrect or unnatural focus, and with no focus at all (monotone). She demonstrated that monotonous delivery, as well as delivery with misplaced focus, significantly reduced a listener’s ability to recall the content of instructional speech, as compared to speech delivered with natural focus placement.
Furthermore, listeners preferred incorrect or unnatural focus to speech with no focus at all.

Intonation has many functions in English, many of which are related to the interaction between speakers in dialogue. In this study, however, we focus exclusively on intonational functions that are relevant for monologue. Chun (2002) summarizes the functions of intonation in English from the language learning perspective. For monologue, relevant functions would include those “beyond the sentence level for the purpose of achieving continuity and coherence within a discourse, regardless of the length of the discourse” (p. 56). For example, a presenter needs to use intonation to “mark prominence, focus, or newsworthiness of a piece of information in a discourse” and to “mark boundaries in a discourse, e.g. boundaries between sentences, paragraphs, [and] topics” (p. 56).
Roughly, pitch movement, usually to a higher level, is used in English to mark focus, and pitch resets, again to a higher level, are used to introduce new topics. If a speaker speaks with little pitch movement, in a near-monotone voice, the speaker is not making use of the potential of intonation to add structural cues that help the audience understand his or her message. The speaker also risks conveying an impression of disengagement from the topic and the audience (Pickering, 2001).

A number of researchers have pointed to the tendency for Asian L1 (first language) individuals to speak in a monotone in English. Speakers of tone languages have particular difficulties using pitch to structure discourse in English. Because in tonal languages “pitch functions to distinguish lexical rather than discourse meaning” (Wennerstrom, 1994, p. 417), they tend to strip pitch movement for discourse purposes from their production of English. Pennington and Ellis (2000) tested how speakers of Cantonese were able to remember English sentences based on prosodic information, and found that even though the subjects were competent in English, the prosodic patterns that disambiguate sentences such as Is HE driving the bus? from Is he DRIVing the bus? were not easily stored in the subjects’ memories. Their conclusion was that speakers of tone languages simply do not make use of prosodic information in English, possibly because for them pitch patterns are something that must be learned arbitrarily as part of a word’s lexical representation. In a second study, however, Pennington and Ellis showed that for certain prosodic features, improvement could be achieved when the subjects’ attention was explicitly drawn to prosodic information.

Many non-native speakers have difficulty using intonation to signal meaning and structure in their discourse. Wennerstrom (1994) studied how non-native speakers used pitch and intensity contrastively to show relationships in discourse. She found that “neither in … oral-reading nor in … free-speech tasks did the L2 (second language) groups approach the degree of pitch increase on new or contrastive information produced by native speakers. Similarly, there was less reduction of pitch and volume on … redundant words in the oral reading on the part of L2 subjects relative to native speakers” (pp. 415-416). This more monotone speech was particularly pronounced for the subjects whose native language was Thai—a tone language like Chinese. Chinese-native teaching assistants use significantly fewer rising tones than native speakers in their instructional discourse (Pickering, 2001) and thereby miss opportunities to ensure mutual understanding and establish common ground with their students. In a specific study of Chinese speakers of English, Wennerstrom (1998) found a significant relationship between the speakers’ ability to use intonation to distinguish rhetorical units in oral presentations and their scores on a test of English proficiency. Pickering (2004) applied Brazil’s (1986) model of intonational paragraphing to the instructional speech of Chinese-native teaching assistants at an American university. Intonational paragraphing gives structure to English discourse by means of pitch resets at topic changes, and a corresponding series of decreasing peaks until there is a new topic change.
By comparing intonational patterns in lab instructions given by native and non-native TAs, she showed that the non-natives lacked the ability to create intonational paragraphs and thereby to facilitate the students’ understanding of the instructions. The analysis of prosodic units in Pickering’s work was “hampered at the outset by a compression of overall pitch range in the [international teaching assistant] teaching presentations as compared to the pitch ranges found in the [native speaker teaching assistant] data set” (2004, p. 31). The Chinese natives were speaking more monotonously than their native-speaking colleagues.

Learning to speak with more variation

One pedagogic solution to the tendency for Chinese native speakers of English to speak monotonously as they hold oral presentations would be simply to give them feedback when they have used significant pitch movement in any direction. The feedback would be divorced from any connection to the semantic content of the utterance, and would basically be a measure of how non-monotonously they are speaking. While a system of this nature would not be able to tell a learner whether he or she has made pitch movement that is specifically appropriate or native-like, it should stimulate the use of more pitch variation in speakers who underuse the potential of their voices to create focus and contrast in their instructional discourse. It
could be seen as a first step toward more native-like intonation, and furthermore toward becoming a better public speaker. In analogy with other learning activities, we could say that such a system aims to teach students to swing the club without necessarily hitting the golf ball perfectly the first time. Importantly, because the system would give feedback on the production of free speech, it would stimulate and provide an environment for the autonomous practice of authentic communication such as the oral presentation.

The use of a computer environment for practicing oral presentations was inspired by the CALL theoretical framework proposed by Levy (1997), who advised that CALL designers give careful consideration to the role they expect the computer to play in the teaching and learning process. Designers should in particular be wary of assigning the role of trusted ‘tutor’ to a computer program that may deliver incorrect feedback on learner production. Here we see the computer’s role as more of a ‘tool’ than a virtual tutor—a tool that will provide a learning environment capable of responding interactively to learner production, without attempting to provide ‘right’ or ‘wrong’ answers to the way the student delivers the presentation. Like the majority of CALL systems, a presentation practice system would provide environments for skills practice where learners are rewarded for meeting certain targets. Unlike most CALL systems, however, the student input would be freely-generated speech with an authentic communicative intent. Enabling communication with a computer is no simple matter, yet much research points to the supremacy of constructive methodologies when it comes to teaching a second language. Having the computer respond to the prosody of presentation speech rather than its lexical content is one way of having it react to the communicative intent of the speaker. In such a system, the target levels for prosodic variation could be flexible, allowing for instructional scaffolding in response to the initial skills of the learner. By providing an environment for rehearsing a presentation, the system would encourage the use of self-assessment by allowing learners to record themselves as they practice. Many learners are bewildered by advice such as ‘use more variation in your speaking style’; such a system would allow them to test different styles on their own. Finally, like many applications of information and communication technologies in learning situations, the application would stimulate lifelong learning, by being available to users outside traditional classroom settings.

Our study was inspired by four points concluded from previous research:

1. Visualization of pitch movement is beneficial to learners, but current techniques have limitations.
2. Public speakers need to use varied pitch movement to structure discourse and engage with their listeners.
3. Second language speakers, especially those of tone languages, are particularly challenged when it comes to the dynamics of English pitch.
4. Learning activities are ideally based on the student’s own language, generated with an authentic communicative intent.

These findings generated the following primary research question: Will on-line visual feedback on the presence and quantity of pitch variation in learner-generated utterances stimulate the development of a speaking style that incorporates greater pitch variation?
Following previous research on technology in pronunciation training (de Bot, 1983; Motohashi-Saigo & Hardison, 2009), comparisons were made between a test group that received visual feedback and a control group that was able to access auditory feedback only. Three hypotheses were tested:
1. Visual feedback will stimulate a greater increase in pitch variation in training utterances as compared to auditory-only feedback.
2. Participants with visual feedback will be able to generalize what they have learned about pitch movement and variation to the production of a new oral presentation.
3. Participants with visual feedback will experience a greater degree of satisfaction with their training experience.

In addition, we conducted a preliminary follow-up test of human perception of the effect of the training, to ensure that the feedback did not stimulate the development of a speaking style that would be perceived as odd or unnatural.

METHOD

Base system

The system we used consists of a base system allowing students to listen to teacher recordings (targets), read transcripts of these recordings, and make their own recordings of their attempts to mimic the targets. Students may also make recordings of free readings.2 Furthermore, students can browse through targets, make new recordings and listen to their latest recording. The interface keeps track of the students’ actions, and some of this information, such as the number of times a student has attempted a target, is continuously presented to the student. The amount of control the student has over the details in the base system is limited, as it is designed for simplicity of use: file names are assigned automatically, the target files are selected by the teacher and constitute a fixed set of utterances (from the student’s perspective; the teacher can change the utterance set), and only the latest recording can be replayed. Sacrificing detailed control allows us to present the student with an interface that is easy to learn and difficult to misuse. A student session is initiated by informing the software of who the student is. After that, all student actions (listen, record, read transcript, playback) are logged and all student recordings are saved, together with information on the context in which they were recorded.

Pitch analysis

The meter is fed data from an online analysis of the recorded speech signal. The analysis used in these experiments is based on the /nailon/ online prosodic analysis software (Edlund & Heldner, 2006) and the Snack Sound Toolkit.3 As the student speaks, a fundamental frequency estimate is continuously extracted using an incremental version of getF0/RAPT (Talkin, 1995). The estimated frequency is transformed from Hz to logarithmic semitones, a move from fundamental frequency (an acoustic measure) to pitch (a perceptual measure). There are several reasons for this transformation. Semitones are perceptually relevant because they are perceptually equidistant: a rise of one semitone, from 1 to 2, is perceptually the same as a rise from 4 to 5, whereas a rise from 100 to 200 Hz is perceptually much larger than one from 400 to 500 Hz. This gives us a kind of perceptual speaker normalization, which affords easy comparison between pitch variation in different speakers. Similarly, it allows us to compare the variation of a speaker on different occasions, even if the speaker ends up speaking with a generally higher pitch on one of the occasions. Fundamental frequency distributions in Hz over a single speaker also fit a normal distribution less closely than pitch distributions expressed in semitones (Edlund & Heldner, 2007), making the following steps more reliable.
After the semitone transformation, the next step is a continuous and incremental calculation of the standard deviation of the student’s pitch over the last 10 seconds. The result is a measure of the student’s recent pitch variation.
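To make these two steps concrete, the following minimal sketch (ours, not the authors’ /nailon/ implementation) converts F0 estimates to semitones and maintains the standard deviation over the most recent ten seconds of voiced speech. The reference frequency and the speaker-specific floor and ceiling are illustrative assumptions; the rate of 100 estimates per second matches the measurement rate reported later in the paper.

    import math
    from collections import deque

    FRAME_RATE = 100     # F0 estimates per second (as reported in the paper)
    WINDOW_SECONDS = 10  # variation is computed over the last 10 s of speech
    F0_REF = 100.0       # reference frequency for the semitone scale (assumed)

    def hz_to_semitones(f0_hz, ref=F0_REF):
        """Log-transform an F0 estimate in Hz to semitones relative to ref."""
        return 12.0 * math.log2(f0_hz / ref)

    class PitchVariationTracker:
        """Track the standard deviation of pitch (in semitones) over a
        moving window as each new F0 estimate arrives."""

        def __init__(self, f0_floor=75.0, f0_ceiling=400.0):
            # Speaker-specific floor/ceiling (assumed values) discard octave
            # errors and other tracking misses before they skew the measure.
            self.f0_floor, self.f0_ceiling = f0_floor, f0_ceiling
            self.window = deque(maxlen=FRAME_RATE * WINDOW_SECONDS)

        def add_frame(self, f0_hz):
            """Feed one F0 estimate; unvoiced frames arrive as None or 0."""
            if f0_hz and self.f0_floor <= f0_hz <= self.f0_ceiling:
                self.window.append(hz_to_semitones(f0_hz))

        def variation(self):
            """Standard deviation of pitch (semitones) in the current window."""
            n = len(self.window)
            if n < 2:
                return 0.0
            mean = sum(self.window) / n
            return (sum((s - mean) ** 2 for s in self.window) / n) ** 0.5

A truly incremental implementation of the kind described above would update running sums frame by frame rather than re-summing the window on each query; the sketch favors brevity over efficiency.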


Pitch variation feedback

The base system is extended with a component providing online, instantaneous and transient feedback visualizing the degree of pitch variation the student is currently producing. The feedback is presented in a meter that is reminiscent of the amplitude bars used in the equalizers of sound systems: the current amount of variation is indicated by the number of bars that are lit up in a stack of bars, and the highest variation over the past two seconds is indicated by a lingering top bar, as seen in Figure 1. The meter has a short, constant latency of 100 ms.

Figure 1. Training interface, showing the pitch meter to the right. Green bars indicate that the speaker is speaking with relatively increased pitch variation; the single top bar represents the highest relative variation measured in the preceding two seconds.

The pitch variation fed to the meter is first normalized against a base value, that is, the pitch variation the student produced in the initial session. The meter utilizes a dampening function, making it impossible for students to max the meter out: the more bars that are lit up, the more variation is needed to light another one. The pitch meter shows yellow bars when the pitch variation is low or similar relative to the student’s initial reading, and green bars when it is higher.

The kind of feedback provided by this system is very different from the kind of feedback given by contour visualization. Its transience does not allow for post-analysis together with the student. It is designed to be used independently of expert interpretation, by students working on their own with a computer. Its automaticity potentially allows a maximum amount of time on task, where the immediacy of seeing a light flash when part of an utterance has been stressed by means of a rise in pitch should reinforce positive developments in speaking habits.
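As a rough illustration of how such a meter might be driven, here is a minimal sketch (our reconstruction, not the authors’ code). The ten-bar resolution and the exact saturating curve are assumptions, since the text specifies only the meter’s qualitative behavior: baseline normalization, dampening, the yellow/green switch, and the two-second lingering top bar.

    import time

    N_BARS = 10              # height of the bar stack (assumed)
    PEAK_HOLD_SECONDS = 2.0  # the lingering top bar covers the past two seconds

    class PitchMeter:
        """Map the speaker's current pitch variation, relative to a
        per-speaker baseline, to a stack of bars plus a lingering top bar."""

        def __init__(self, baseline_sd):
            self.baseline = baseline_sd  # SD from the speaker's initial session
            self.peaks = []              # (timestamp, bars) pairs for peak hold

        def update(self, current_sd, now=None):
            """Return (bars_lit, top_bar, colour) for one analysis frame."""
            now = time.monotonic() if now is None else now
            ratio = current_sd / self.baseline if self.baseline > 0 else 0.0
            # Saturating "dampening" curve: each additional bar demands more
            # variation, so the meter can never be maxed out. The curve form
            # is our assumption.
            damped = ratio / (1.0 + ratio)     # in [0, 1); 0.5 at baseline
            bars = round(damped * N_BARS)
            colour = "green" if ratio > 1.0 else "yellow"
            # Lingering top bar: the highest value of the past two seconds.
            self.peaks = [(t, b) for (t, b) in self.peaks
                          if now - t <= PEAK_HOLD_SECONDS]
            self.peaks.append((now, bars))
            top_bar = max(b for (_, b) in self.peaks)
            return bars, top_bar, colour

In use, baseline_sd would come from the student’s initial session, and update() would be called with each new windowed standard deviation, with the display refreshed within the 100 ms latency mentioned above.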

Participants

The test group and the control group each consisted of 7 students of engineering, 4 women and 3 men.4 The participants were recruited from English classes at Sweden’s largest technological university, and were exchange students from China, in Sweden for stays of six months to two years. Participants’ proficiency in English was judged by means of an internal placement test to be at the upper intermediate to advanced level, with one student at the lower intermediate level. The mean age of the test group was slightly lower than that of the control group, 22.3 vs. 24.5 years, but both groups had started studying English at an average age of 11. The reported years of English studies were therefore higher for the control group: 12.3 vs. 10.3 years (Table 1). The participants spoke a variety of dialects of Chinese but used Mandarin with each other and for their studies. They did not speak Swedish and were using English with their teachers and classmates. Four of them had spent four years at an English-language university in Singapore, but none of them had spent extended periods of time in an inner-circle (Kachru, 1985) English L1 country.


Table 1. Data Regarding Participants and Time on Training Task

                              Test (n = 7)       Control (n = 7)
                              M        SD        M        SD
Age                           22.3     0.90      24.5     1.51
Years studying English        10.3     3.59      12.3     1.98
Minutes of training           181      21        171      29
Repetitions of utterances     32       11        23       7
Procedure and Material

Each participant began the study by giving an oral presentation of about five minutes in length, either for their English classes or for a smaller group of students. Audio recordings were made of the presentations using a small clip-on microphone that recorded directly into a computer. The presentations were also video-recorded, and participants watched the presentations together with one of the researchers, who commented on presentation content, delivery and language.

The individualized training material for each subject was prepared from the audio recordings. A set of 10 utterances, each about 5-10 seconds in length, was extracted from each participant’s speech. The utterances were mostly non-consecutive and were chosen on the basis of their potential to provide examples of contrastive pitch movement within the individual utterance. The first researcher recorded her own (native American English) versions of them, making an effort to use her voice as expressively as possible and making more pitch contrasts than in the original student version. For example, a modeled version of a student’s flat utterance could be represented as: “And THIRDly, it will take us a lot of TIME and EFfort to READ each piece of news.” Two sample sets of utterances are shown in the Appendix.

The participants were assigned to the control or test groups following the preparation of their individualized training material. Participants were ranked in terms of the global pitch variation in their first presentation, as follows: they were first split into two lists according to gender, and each list was ordered according to initial global pitch variation. Participants were randomly assigned pair-wise from the list to the control or test group, ensuring gender balance as well as balance in initial pitch variation. Four participants who joined the study at a later date were distributed in the same manner.

Participants completed approximately three hours of training in half-hour sessions; some participants chose to occasionally have back-to-back sessions of one hour. The training sessions were spread out over a period of four weeks. The mean training time per group and the number of repeated utterances are reported in Table 1. Control participants repeated fewer utterances than test participants did; this is probably because the only feedback they could receive was listening to recordings of their production, which in itself used up some of the training time. Training took place in a quiet and private room at the university language unit, without the presence of the researchers or other onlookers.

For the first four or five sessions, participants listened to and repeated the teacher versions of their own utterances. They were instructed to listen to and repeat each of their 10 utterances between 20 and 30 times. Test group participants received the visual feedback described above and were encouraged to speak so that the meter showed a maximum number of green bars. The control group was able to listen to recordings of their production but received no other feedback.

Upon completion of the repetitions, both groups were encouraged to use the system to practice their second oral presentation, which was to be on a different topic than the first. For this practice, the part of the interface (Figure 1) designated for ‘free speech’ was used. In these sessions, once again the test participants received visual feedback on their production, while control participants were only able to listen to recordings of their speech. Within 48 hours of completing the training, the participants held another presentation, this time about ten minutes in length, for most of them as part of the examination of their English courses. This presentation was audio recorded.


Questionnaire

Participants also completed a questionnaire (Table 2) about their experience of taking part in the training. The questionnaire consisted of 10 statements about the training, to which the participants responded on a 5-point Likert scale where 5 = agreed completely and 1 = disagreed completely. Test participants responded to an additional four statements specifically about the pitch meter.

Perception Tests

In the final stage of the study, listening tests were carried out in order to ensure that the pitch variation feedback had not stimulated the development of an unnatural speaking style. Two native-speaking authorities on intonation rated one minute of speech from each of the first and second presentations. The minute between 3.00 and 4.00 was extracted from the second presentations. Because the first presentations were shorter, it was necessary to use the minute between 2.30 and 3.30 to avoid lexical cues that the presentations were coming to an end. Raters listened to sets of the 12 male files and 16 female files separately, in separately randomized orders. They rated the speech on a five-point scale for four qualities: naturalness, liveliness, pronunciation, and intelligibility.

RESULTS

Pitch variation

We measured development in two ways: over the roughly three hours of training per student, in which case we compared pitch variation in the first and the second half of the training for each of the 10 utterances used for practice, and in generalized form, by comparing pitch variation in two presentations, one before and one after training. Pitch estimations were extracted using the same software used to feed the pitch variation indicator used in training, an incremental version of the getF0/RAPT (Talkin, 1995) algorithm. Variation was calculated in a manner consistent with Hincks (2005), by calculating the standard deviation over a moving 10-second window. In the case of the training data, recordings containing noise only, or that were empty, were detected automatically and removed. For each of the 10 utterances included in the training material, the data were split into a first and a second half, and the recordings from the first half were spliced together to create one continuous sound file, as were the recordings from the second half. The averages of the windowed standard deviation of the first and the second half of training were compared.

The basic assumption was that speakers from both groups should have higher pitch variation in the latter half of training than in the first, and Hypothesis 1 states that the test group should show a greater increase than the control group. For the two presentations, the basic assumption was that both groups should show increased variation after training as compared to before. Hypothesis 2 states that the test group should show a larger increase than the control. The mean standard deviations for each data set and each of the two groups are shown in Figure 2.
The y-axis displays the mean standard deviation per moving 10-second window of speech in semitones, and the x-axis the four points of measurement: the first presentation, the first half of training, the second half of training, and the second oral presentation. The experimental group shows greater pitch variation across all points of measurement following training. Learning takes place in the first half of training, where the difference between the two groups jumps from nearly no difference to one of more than 2.5 semitones, and is maintained in the second half of training. It is then transferred to the second presentation, which took place without the presence of feedback.

Figure 2. Average pitch variation over 10 seconds of speech for the two experimental conditions during the 1st presentation, the 1st half of the training, the 2nd half of the training, and the 2nd presentation. The test group shows a statistically significant effect of the feedback they were given.

Figure 3. Average standard deviation of pitch over 10 seconds of speech for each of the participants during the 1st presentation, the 1st half of the training, the 2nd half of the training, and the 2nd presentation. Participants s1-s7 belong to the test group; s8-s14 are the control.

The effect of the feedback method (test group vs. control group) was analyzed using an ANOVA with time of measurement (1st presentation, 1st half of training, 2nd half of training, 2nd presentation) as a within-subjects factor. The sphericity assumption was met, and the main effect of time of measurement was significant (F = 8.36, p < .0005, η² = 0.45), indicating that the speech of the test group receiving visual feedback increased more in pitch variation than the control group. The between-subjects effect of feedback method was significant (F = 6.74, p = .027, η² = 0.40). The first two hypotheses are confirmed by these findings. The individual results per speaker are illustrated in Figure 3.


Expert ratings

The results of the preliminary listening test are encouraging. The purpose of rating the one-minute samples from each presentation was to eliminate concerns that the visual feedback could promote the development of an unnatural speaking style if speakers made wild pitch excursions in order to make the green lights flash. The averages of the two ratings are shown in Figure 4. The responses of only two raters provide too little data to allow for statistical analysis, and their inter-rater agreement was only moderate, with a Pearson correlation of 0.41. However, ratings for the test group indicate a positive trend toward a slight improvement in both liveliness and naturalness. The control group was also perceived to increase in liveliness, but was found to worsen in terms of naturalness. Little change is perceived for either group in pronunciation and intelligibility, two features our system did not attempt to address. Though we reiterate that this preliminary test can only give indications as to the effect of the training, we believe that the ratings show that we do not need to be concerned that test participants were prompted to put new pitch movement in unnatural places. The feedback thus did not have a damaging effect on the participants.

Figure 4. Average of two expert blind ratings of one minute of speech extracted from the first oral presentation (before training) and the second oral presentation (after training). Mean ratings (scale 1-5) are shown for the test and control groups on liveliness, naturalness, pronunciation, and intelligibility.

Questionnaire

Both control and experimental students were satisfied with their training. Table 2 shows the results of the questionnaire. On the 5-point scale, the mean responses to all questions but one were 4 or above. The mean satisfaction for the test group was slightly higher than for the control group, 4.34 vs. 4.29, but a two-sample t-test assuming equal variance of the mean responses to the 10 questionnaire statements common to both groups showed no effect of group, t(18) = .330, p > .05. Both groups must thus be seen to be equally satisfied with their training, and Hypothesis 3 must be rejected.


Table 2. Results of Questionnaire Regarding Student Satisfaction with Training

                                                                          Test (n = 7)     Control (n = 7)
#    Question                                                             M      SD        M      SD
1    Watching the video of my first presentation showed me what          4.29   .95       4.57   .53
     I had to do to improve my pronunciation
2    Receiving the teacher’s comments on the first presentation          4.86   .38       4.71   .49
     helped me improve my pronunciation
3    Listening to the teacher version of my utterances helped me         4.43   .53       4.71   .76
     improve my pronunciation
4    Imitating teacher utterances helped me improve my                   4.43   .79       4.43   .53
     pronunciation
5    Listening to my own new recordings of the utterances helped         4.43   .79       4.00   .58
     me improve my pronunciation
6    Trying to get the pitch meter to reach a high level as I            4.00   .82       n.a.
     practiced the utterances helped me improve my pronunciation
7    Listening to the recording as I practiced my second                 4.14   .90       3.86
     presentation helped me improve my pronunciation
8    Watching the pitch meter as I practiced my second                   4.29   .76       n.a.
     presentation helped me improve my pronunciation
9    My pronunciation in general has improved because of my              4.00   .58       4.29   .76
     participation in this project
10   My English intonation has improved because of my                    4.29   .49       4.14   .69
     participation in this project
11   My production of the individual sounds of English has               4.00   1.00      3.86   .69
     improved because of the project
12   My presentation skills have improved because of my                  4.43   .53       4.29   .76
     participation in the project
13   I would recommend the pitch meter to other people who want          4.71   .49       n.a.
     to improve their pronunciation
14   I understood the connection between what I did with my voice        4.57   .53       n.a.
     and the movement of the pitch meter
     General satisfaction with training (mean)                           4.35   .69       4.29

DISCUSSION

The major goal of the work underlying this study has been to stimulate intermediate and advanced learners of English to make use of the expressive potential of English intonation as they speak in public. A basic point of departure has been that English speech intended for a large audience is characterized by a large amount of pitch variation (Johns-Lewis, 1986).


We hypothesized that lights briefly flashing in response to the standard deviation of fundamental frequency would be an effective means of stimulating an increase in pitch variation in monologue. Hypothesis 1, which stated that test participants would increase pitch variation significantly more than control participants during the course of their training, was confirmed by our data. Hypothesis 2, which stated that test participants would be better able to generalize what they had learned to the production of a new presentation, was also confirmed.

Our results are in line with other research that has shown that visual feedback on pronunciation is beneficial to learners (de Bot, 1983; Hardison, 2004; Motohashi-Saigo & Hardison, 2009; Neri, Cucchiarini, & Strik, 2008). The visual channel provides information about linguistic features that can be difficult for second language learners to perceive audibly. The first language of our Chinese participants uses pitch movement to distinguish lexical meaning; these learners can therefore experience difficulty in interpreting and producing pitch movement at a discourse level in English (Pennington & Ellis, 2000; Pickering, 2004; Wennerstrom, 1994). Our feedback gave each test participant visual confirmation when they had stretched the resources of their voices beyond their own baseline values. It is possible that some participants had been using other means, particularly intensity, to give focus to their English utterances. The visual feedback rewarded them for using pitch movement only, and could have been a powerful factor in steering them in the direction of an adapted speaking style. While our data were not recorded in a way that would allow for an analysis of the interplay between intensity and pitch as Chinese speakers give focus to English utterances, this would be an interesting area for further research.

Based on the results of the questionnaire, both the experimental and the control participants felt that they had had a rewarding experience participating in this study. Hypothesis 3, which stated that participants receiving feedback would feel more positively about their pronunciation development than control students, must therefore be rejected. It is likely that all participants were pleased by the extra contact they were able to have with their English teacher, and indeed the questions mentioning the teacher received the highest responses (Table 2). This is perhaps symptomatic of the current hunger for English proficiency in Chinese culture. Although many of the participants interacted socially with each other, none was aware of the differences between the control and the test interfaces, and they seemed to think that imitating teacher models was central to the study. Test students did spend a slightly higher mean time on their training (Table 1). In a study of this nature, it is difficult to control all possible variables, and because our students were diligent, reliable and interested, we found, as did both Motohashi-Saigo and Hardison (2009) and Wang and Munro (2004), that it was possible to successfully run a study where students could participate according to their own schedules.

A serious potential concern with intonation feedback that is divorced from semantic content is that it could promote the development of an unnatural speaking style.
Speech that is produced with unnatural focus is slightly more difficult to comprehend than monotone speech, though it is still preferred to monotone speech (Hahn, 2004). Fortunately, it does not appear to be the case that speakers put new focus in unnatural places. The raters, both of whom are experienced researchers in the field of first-language intonation, judged the naturalness of the intonation of a sample from the test group’s second presentation to be at least as good as in their first presentation. This would indicate that the feedback does not have a damaging effect. The increased pitch variation that we have measured is more likely to be contributing to an improvement in the speakers’ intonation. Since researchers have pointed to the problems associated with flat tones produced by Chinese speakers making oral presentations, such as conveying the impression that the speaker is disengaged from the audience (Pickering, 2001), it could be argued that speakers should be encouraged to use more pitch movement in any direction as a step towards developing more expressive English intonation.

Given greater resources in terms of time and potential participants, it would have been interesting to compare the development of pitch variation with other kinds of feedback. For example, we could also have given completely random feedback to a third group of students to test for a placebo effect, though
that would be ethically questionable. However, the fact that our students were not aware of the differences between the two interfaces would indicate that we do not need to be concerned about a placebo effect. We could also have displayed pitch tracings of the training utterances. It has not been an objective of our study, however, to prove that our method is superior to showing pitch tracings. We simply feel that circumventing the contour visualization process allows for the more autonomous use of speech technology. A natural development in future research will be to have learners practice presentation skills without teacher models.

It is important to point out that we cannot determine from these data that speakers became better presenters as a result of their participation in this study. A successful presentation entails, of course, very many features, and using pitch well is only one of them. Other vocal features that are important are the ability to clearly articulate the sounds of the language, the rate of speech, and the ability to speak with an intensity that is appropriate to the spatial setting. In addition, there are numerous other features regarding the interaction of content, delivery and audience that play a critical role in how the presentation is received. Our presentation data, gathered as they were from real-life classroom settings, are in all likelihood too varied to allow for a study that attempted to find a correlation between pitch variation and, for example, the perceived clarity of a presentation. However, we do wish to further explore perceptions of the speakers beyond the preliminary ratings of one minute of speech per presentation. We also plan to develop feedback gauges for other intonational features, beginning with rate of speech. We see potential to develop language-specific intonation pattern detectors that could respond to, for example, a speaker’s tendency to use French intonation patterns when speaking English. Such gauges could form a type of toolbox that students and teachers could use as a resource in the preparation and assessment of oral presentations.

Our study contributes to the field in a number of ways. It is, to the best of our knowledge, the first to rely on a synthesis of online fundamental frequency data in relation to learner production. We have not shown the speakers the absolute fundamental frequency itself, but rather how much it has varied over time as represented by the standard deviation. This variable is known to characterize discourse intended for a large audience (Johns-Lewis, 1986), and is also a variable that listeners can perceive if they are asked to distinguish lively speech from monotone (Hincks, 2005; Traunmüller & Eriksson, 1995). In this paper, we have demonstrated that it is a variable that can effectively stimulate production as well. Furthermore, the variable itself provides a means of measuring, characterizing and comparing speaker intonation. It is important to point out that enormous quantities of data lie behind the values reported in our results. Measurements of fundamental frequency were made 100 times a second, for stretches of speech up to 45 minutes in length, giving tens of thousands of data points per speaker for the training utterances. By converting the Hertz values to the logarithmic semitone scale, we are able to make valid comparisons between speakers with different vocal ranges.
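As a concrete illustration of this normalization (a worked example of ours, not figures from the paper), the same rise in Hertz corresponds to quite different semitone intervals depending on the speaker’s mean frequency:

    import math

    def semitone_rise(mean_hz, rise_hz):
        """Semitone interval for a rise of rise_hz above a mean of mean_hz."""
        return 12.0 * math.log2((mean_hz + rise_hz) / mean_hz)

    # A 30 Hz excursion is a large pitch gesture for a low-pitched speaker
    # but a modest one for a high-pitched speaker:
    print(round(semitone_rise(100, 30), 1))  # 4.5 semitones
    print(round(semitone_rise(220, 30), 1))  # 2.2 semitones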
This normalization is an aspect that appears to be neglected in commercial pronunciation programs such as Auralog’s Tell Me More series, where pitch curves of speakers of different mean frequencies can be indiscriminately compared. There is a big difference in the perceptual force of a rise in pitch of 30 Hz for a speaker with a low mean frequency and for one with a high mean frequency; these differences are normalized by converting to semitones.

Secondly, our feedback can be used for the production of long stretches of free speech rather than short, system-generated utterances. It is known that intonation must be studied at a level higher than that of the word or phrase in order for speech to achieve proper cohesive force over longer stretches of discourse (Brazil, 1997; Chun, 2002; Levis & Pickering, 2004; Pickering, 2004). By presenting the learners with information about their pitch variation in the previous ten seconds of speech, we are able to incorporate and reflect the vital movement that should occur when a speaker changes topic, for example. In an ideal world, most teachers would have the time to sit with students, examine displays of pitch tracings, and discuss how the peaks of the tracings relate to each other with respect to theoretical models such as Brazil’s intonational paragraphs (Brazil, 1997; Levis & Pickering, 2004). Our system cannot approach that level
of detail, and in fact cannot make the connection between intonation and its lexical content. However, it can be used by learners on their own, in the production of any content they choose. It also has the potential for future development in the direction of more fine-grained analyses.

A third novel aspect of our feedback is that it is transient and immediate. Our lights flicker and then disappear. This is akin to the way we naturally process speech: not as something that can be captured and studied, but as sound waves that last no longer than the milliseconds it takes to perceive them. It is also more similar to the way we receive auditory and sensory feedback when we produce speech—we only hear and feel what we produce in the very instant we produce it; a moment later it is gone. Though at this point we can only speculate, it would be interesting to test whether transient feedback might be more easily integrated and automatized than higher-level feedback, which is more abstract and may require more cognitive processing and interpretation. The potential difference between transient and enduring feedback has interesting theoretical implications that could be further explored.

This study has focused on Chinese speakers because they are a group in which many speakers can be expected to produce relatively monotone speech, and where the chances of achieving measurable development in a short period of time were deemed to be greatest. However, there are all kinds of speaker groups who could benefit from presentation feedback. Like many communicative skills that are taught in advanced language classes, the lessons can apply to native speakers as well. Teachers who produce monotone speech are a problem to students everywhere (Hamilton, 2006). Nervous speakers can also tend to use a compressed speaking range, and could possibly benefit from having practiced delivery with an expanded range. Clinically, monotone speech is associated with depression (Nilsonne, 1987) and can also be a problem that speech therapists need to address with their patients. However, the primary application we envisage here is an aid for practicing, or perhaps even delivering, oral presentations.

It is vital to use one’s voice well when speaking in public. It is the channel of communication, and when it is used poorly, communication can be less than successful. If listeners either stop listening, or fail to perceive what is most important in a speaker’s message, then all actors in the situation are in effect wasting time. A speaker delivering a monologue has an even larger responsibility than a writer does, in that the speech is transient and cannot be re-read if the meaning is missed. We require writers to write clearly, and we should require speakers to speak clearly. Clear speech in English requires pitch contrasts to show given and new information and to introduce topic changes. We hope to have shown in this paper that stimulating speakers to produce more pitch variation in a practice situation has an effect that can transfer to new situations. People can learn to be better public speakers, and technology should help in the process.

NOTES

1. Available at http://www.speech.kth.se/wavesurfer/ and at http://www.praat.org.

2. Sound files illustrating the repetitions of the utterances are available at http://llt.msu.edu/vol13num3/hincks.zip. There is one file for a test participant and one for a control participant. In the files can be heard first the original utterance taken from the first oral presentation, then a repetition of the utterance made before training started. After these two baseline utterances come seven training utterances: the 1st, 5th, 10th, 15th, 20th, 25th, and 30th repetitions. Note that the control student’s intonation varies little between the repetitions, while the student who saw feedback tries different ways of speaking.

3. Sjölander 1997-2008, available at http://www.speech.kth.se/snack/.

4. The original groups each consisted of 8 participants, 4 men and 4 women. However, two outlying participants who had joined the study without being tested as to their English abilities were removed
because they differed drastically from that of the rest of the group: one turned out to have studied English for only two years and did not have the degree of proficiency to benefit from the training, and the other was about ten years older than the other participants, being a Ph.D. rather than a Master’s student.

APPENDIX: Sample Training Utterances

Sample Set 1
1. Good evening, my name is XXXX, I'm here from Nanjing Technological University Singapore, and I'm here for my half-year exchange study
2. And today I'd like to give you a short presentation on RSS, which stands for really simple syndication
3. So what RSS does is that it's an effective web technology that delivers web content to the user
4. Let me first give you an example of how we read news or blog entries in the web now
5. This procedure has been passed on maybe ten years ago and it is still going on nowadays
6. But whether we're aware of it or not, there are some problems we have in this process
7. First of all when we're reading a piece of news, we are not aware of what time it is published
8. And thirdly, it will take us a lot of time and effort to read each piece of news
9. So the old technology is a pull technology, we are pulling data from the website
10. In other words, now we are going to have a push technology: we are going to let the web content push to us directly

Sample Set 2
1. Today I'd like to talk about lithium batteries
2. Portable consumer electronic devices just like laptops, cameras and mobile phones
3. As we know, lithium is a positive ion, which is held in the anode of the electrolyte of the batteries
4. This process enables the electron's flow as an external current.
5. On the other hand, when we apply a voltage to recharge it
6. The lithium ions are driven back to the anode again, and are ready to give us power
7. And this is mainly because of the overheating, which is caused by the defect manufacturing in the battery
8. Because lithium burns violently when it is exposed to moisture
9. Then second, do not try to put out a battery fire with water
10. The right way to do that is to use a chemical-based extinguisher


ACKNOWLEDGEMENTS

We would like to thank our teaching and research colleagues at the Department of Speech, Music and Hearing for their support. In particular, we are grateful for the support from Anders Askenfelt, Beyza Björkman, Sandra Brunsberg, and Mattias Heldner, and for the contributions from Julia Hirschberg and David House. The technology used in the research was developed in part within the Swedish Research Council project #2006-2172 (Vad gör tal till samtal / What makes speech special). We also thank the Chinese students who reliably and enthusiastically participated in the study, and express our warmest gratitude for the valuable comments made by editors and anonymous reviewers.

ABOUT THE AUTHORS

Rebecca Hincks is Associate Professor of English at the Unit for Language and Communication at KTH. She teaches Technical and Scientific English at different levels, and her primary research interest is in the use of speech technology to develop spoken language skills.

E-mail: [email protected]

Jens Edlund is a researcher at the Centre for Speech Technology at KTH. His main research interest is spoken communication, both between humans and between humans and computers. He currently investigates the specifics of spoken interaction within the Swedish Research Council project #2006-2172 (Vad gör tal till samtal / What makes speech special).

E-mail: [email protected]

REFERENCES

Anderson-Hsieh, J. (1992). Using electronic visual feedback to teach suprasegmentals. System, 20(1), 51-62.

Boersma, P. (2001). PRAAT: A system for doing phonetics by computer. Glot International, 5, 341-345.

Brazil, D. (1986). The communicative value of intonation in English. Birmingham, UK: University of Birmingham, English Language Research.

Brazil, D. (1997). The communicative value of intonation in English. Cambridge, UK: Cambridge University Press.

Chun, D. (1998). Signal analysis software for teaching discourse intonation. Language Learning & Technology, 2(1), 61-77. Retrieved from http://llt.msu.edu

Chun, D. (2002). Discourse intonation in L2: From theory and research to practice. Philadelphia: John Benjamins.

de Bot, K. (1983). Visual feedback of intonation I: Effectiveness and induced practice behavior. Language and Speech, 26(4), 331-350.

Edlund, J., & Heldner, M. (2006). /nailon/ - software for online analysis of prosody. Paper presented at Interspeech 2006 - ICSLP, Pittsburgh, PA.

Edlund, J., & Heldner, M. (2007). Underpinning /nailon/: Automatic estimation of pitch range and speaker relative pitch. In C. Müller (Ed.), Speaker classification II: Fundamentals, features, and methods (pp. 229-242). Berlin/Heidelberg, Germany: Springer.


Hahn, L. D. (2004). Primary stress and intelligibility: Research to motivate the teaching of suprasegmentals. TESOL Quarterly, 38(2), 201-223.

Hamilton, A. (2006, August 4). Vocal cords need to be brushed up for the classroom. The Times Online. Retrieved from http://www.timesonline.co.uk/tol/news

Hardison, D. (2004). Generalization of computer-assisted prosody training: Quantitative and qualitative findings. Language Learning & Technology, 8(1), 34-52. Retrieved from http://llt.msu.edu

Hincks, R. (2005). Measures and perceptions of liveliness in student oral presentation speech: A proposal for an automatic feedback mechanism. System, 33(4), 575-591.

Hincks, R. (2009). Speaking rate and information content in English lingua franca oral presentations. English for Specific Purposes. Advance online publication. doi:10.1016/j.esp.2009.05.004

Jenkins, J. (2000). The phonology of English as an international language: New models, new norms, new goals. Oxford: Oxford University Press.

Johns-Lewis, C. (1986). Prosodic differentiation of discourse modes. In C. Johns-Lewis (Ed.), Intonation in discourse (pp. 199-220). Beckenham, Kent: Croom Helm.

Kachru, B. B. (1985). Standards, codification and sociolinguistic realism: The English language in the outer circle. In R. Quirk & H. Widdowson (Eds.), English in the world. Cambridge, UK: Cambridge University Press.

Levis, J. (2008). Computer technology in teaching and researching pronunciation. Annual Review of Applied Linguistics, 27, 184-202.

Levis, J., & Pickering, L. (2004). Teaching intonation in discourse using speech visualization technology. System, 32, 505-524.

Levy, M. (1997). Computer-assisted language learning. Oxford: Clarendon Press.

Molholt, G. (1988). Computer-assisted instruction in pronunciation for Chinese speakers of American English. TESOL Quarterly, 22(1), 91-111.

Motohashi-Saigo, M., & Hardison, D. (2009). Acquisition of L2 Japanese geminates: Training with waveform displays. Language Learning & Technology, 13(2), 29-47. Retrieved from http://llt.msu.edu

Neri, A., Cucchiarini, C., & Strik, H. (2008). The effectiveness of computer-based speech corrective feedback for improving segmental quality in L2 Dutch. ReCALL, 20(2), 225-243.

Nilsonne, Å. (1987). Speech in depression: A methodological study of prosody. Stockholm: Karolinska Institute.

Pennington, M., & Ellis, N. (2000). Cantonese speakers' memory for English sentences with prosodic cues. The Modern Language Journal, 84(3), 372-389.

Pickering, L. (2001). The role of tone choice in improving ITA communication in the classroom. TESOL Quarterly, 35(2), 233-255.

Pickering, L. (2004). The structure and function of intonational paragraphs in native and nonnative speaker instructional discourse. English for Specific Purposes, 23, 19-43.

Rosenberg, A., & Hirschberg, J. (2005). Acoustic/prosodic and lexical correlates of charismatic speech. Paper presented at Interspeech 2005, Lisbon, Portugal.

Sjölander, K., & Beskow, J. (2000). WaveSurfer: An open source speech tool. Paper presented at the International Conference on Spoken Language Processing 2000, Beijing.


Strangert, E., & Gustafson, J. (2008). Subject ratings, acoustic measurements and synthesis of good-speaker characteristics. Paper presented at Interspeech 2008, Brisbane, Australia.

Talkin, D. (1995). A robust algorithm for pitch tracking (RAPT). In W. B. Kleijn & K. K. Paliwal (Eds.), Speech coding and synthesis (pp. 495-518). Amsterdam: Elsevier.

Traunmüller, H., & Eriksson, A. (1995). The perceptual evaluation of F0 excursions in speech as evidenced in liveliness estimations. Journal of the Acoustical Society of America, 97(3), 1905-1915.

Wang, X., & Munro, M. (2004). Computer-based training for learning English vowel contrasts. System, 32, 539-552.

Wennerstrom, A. (1994). Intonational meaning in English discourse: A study of non-native speakers. Applied Linguistics, 15(4), 399-421.

Wennerstrom, A. (1998). Intonation as cohesion in academic discourse: A study of Chinese speakers of English. Studies in Second Language Acquisition, 20, 1-25.


Language Learning & Technology http://llt.msu.edu/vol13num3/tannerlandon.pdf

October 2009, Volume 13, Number 3 pp. 51–65

THE EFFECTS OF COMPUTER-ASSISTED PRONUNCIATION READINGS ON ESL LEARNERS' USE OF PAUSING, STRESS, INTONATION, AND OVERALL COMPREHENSIBILITY

Mark W. Tanner and Melissa M. Landon
Brigham Young University

ABSTRACT

With research showing the benefits of pronunciation instruction aimed at suprasegmentals (Derwing, Munro, & Wiebe, 1997, 1998; Derwing & Rossiter, 2003; Hahn, 2004; McNerney & Mendelsohn, 1992), more materials are needed to provide learners with opportunities for self-directed practice. A 13-week experimental study was performed with 75 ESL learners divided into control and treatment groups. The treatment group was exposed to 11 weeks of self-directed computer-assisted practice using Cued Pronunciation Readings (CPRs). In the quasi-experimental pre-test/post-test design, speech perception and production samples were collected at Time 1 (week one of the study) and Time 2 (week 13). Researchers analyzed the treatment's effect on the learners' perception and production of key suprasegmental features (pausing, word stress, and sentence-final intonation), and the learners' level of perceived comprehensibility. Results from the statistical tests revealed that the treatment had a significant effect on learners' perception of pausing and word stress and controlled production of stress, even with limited time spent practicing CPRs in a self-directed environment.

INTRODUCTION

Computer-Assisted Language Learning (CALL) is of interest to language teachers and learners because it can provide individualized instruction and immediate feedback on the correctness of a learner's response to computerized tasks (Nagata, 1993). In computer-aided pronunciation (CAP), technology has increased learners' access to their own and others' pronunciation performance through visual displays such as spectrographic analyses of individual phonemes or amplitude waves showing levels of intensity for isolated words or phrases (Anderson-Hsieh, 1992, 1994; Hardison, 2004; Molholt, 1988). While such programs can provide learners with immediate feedback regarding the accuracy of an utterance compared to that of a native speaker (NS), they typically require teacher supervision and interpretation. Pennington (1999) cites another drawback of CAP, stating that nearly all CAP programs focus exclusively on segmentals. This focus implies that intelligibility is primarily impaired by the articulation of individual sounds, and it ignores the influence of prosody. If intelligibility is prioritized above accuracy, a focus on key words, stress, rhythm, and intonation, rather than on the articulation of individual sounds, may be needed.

Empirical research has begun to confirm the importance of prosodic features in learners' overall intelligibility and perceived comprehensibility. Blau (1990) found that appropriate pausing patterns in native English speech had a significantly greater effect on non-native listeners' comprehension than either syntactic complexity or speech rate. Fayer and Krasinski (1995) found that native speakers' intelligibility judgment of non-native English speech significantly correlated with pause length. Native speakers rated non-native speakers (NNSs) with longer individual pauses and greater total pause time as less intelligible than those with more appropriate pause length. Towell, Hawkins, and Bazergui (1996) found that as British learners of French improved their pausing patterns in French, their overall fluency improved, as measured by speaking rate.
Regarding stress, Field (2005) found that when native English speech was manipulated to include incorrect lexical stress, the ability of both NS and NNS listeners to locate words in connected speech was seriously affected (p. 419). Sentence stress, also known as primary stress (Hahn, 2004), also plays a crucial role in successful communication. Hahn (2004) reported that correct sentence-level stress by an ESL speaker, compared to misplaced or omitted stress, led to improved listener comprehension and recall of content.

Regarding the importance of intonation, Levis (1999, 2002), Levis and Pickering (2004), and Jenkins (2004) have emphasized the importance of teaching intonation in context, preferably at a discourse level, rather than within isolated sentences. Wennerstrom (1998) analyzed brief lectures given by Chinese ESL speakers and found that those using appropriate intonation contours received higher ratings on a speaking skills test. Similarly, Pickering (2004) found that NS teachers consistently used discourse-level intonational cues to "emphasize relationships between semantically related sections of the discourse and highlight information structure" (p. 38) while international teaching assistants (ITAs) did not. She noted that the ITAs' weaknesses in discourse-level intonation were likely to impede students' understanding.

This research into the effects of pausing, stress, and intonation on the comprehensibility of non-native English speech has prompted teachers and materials developers to devise various techniques for incorporating suprasegmental practice into the classroom. The earliest such technique, jazz chanting (Graham, 1978), continues to be used and advocated by teachers who have students chant poems and songs to become more familiar with English rhythm, stress, and intonation (Richman, 2005). Other oral techniques are advocated in pedagogical materials to enhance learners' use of prosodic features. Mirroring, tracking, and shadowing involve imitating native speaker discourse models (Celce-Murcia, Brinton, & Goodwin, 1996). A fourth imitative technique, imitative conversation (Goodwin, 2004), has ESL learners select, analyze, and then replicate a brief one- to two-minute clip of dialogic speech from a movie or television show.

While such techniques have been promoted as ways to provide contextualized classroom practice for suprasegmentals, empirical research regarding the specific benefits of such techniques has been slow in coming. As Celce-Murcia et al. (1996) put it:

There is no consensus in the literature about which of these techniques is most effective; we advise teachers to experiment with them and get feedback from their students as the learners themselves are the ultimate judges of what they find most useful (p. 310).

Clearly, empirical research is needed to investigate the actual effects of these pedagogical techniques on learners so teachers can make sound decisions about their use.

Empirical studies conducted in recent years have focused on the effects of various types of pronunciation instruction on learners' overall levels of intelligibility and comprehensibility. Intelligibility is "the extent to which a listener actually understands an utterance" (Derwing & Munro, 2005, p. 385) and is often evaluated through transcription or listening comprehension tasks performed by a listener. Comprehensibility is "a listener's perception of how difficult it is to understand an utterance" (Derwing & Munro, 2005, p. 385). Comprehensibility (also referred to as perceived comprehensibility) is often measured using a Likert scale to rate speech samples based on listeners' perceptions of how easily the speaker or speech stimuli can be understood.
Derwing, Munro, and Wiebe (1997) investigated how a 12-week course focusing on prosodic features might impact ESL learners considered to be fossilized in their English pronunciation skills. Learners were recorded at Time 1 and Time 2 reading sentences and telling a story based on picture prompts. Results from 57 NS listeners showed significant changes at Time 2 in the ESL speakers' intelligibility, comprehensibility, and accent. The researchers concluded that pronunciation instruction deemphasizing the importance of segmental units, combined with a greater focus on prosody and general speaking characteristics, can effectively change fossilized pronunciation patterns in individuals who have spent years in an English-speaking environment.


Considering the 1997 study's positive findings, Derwing, Munro, and Wiebe (1998) conducted a second study comparing the effects of different types of instruction on NNSs' comprehensibility, accentedness, and fluency. Forty-eight intermediate ESL students were exposed to one of three types of instruction (segmental accuracy; global features such as stress, intonation, and rhythm; or no specific program of instruction) for 12 weeks. The results indicated that in the sentence-reading tasks, the segmental and global treatment groups improved from Time 1 to Time 2 in comprehensibility and accentedness. In the narrative task, only the global group improved significantly. The researchers concluded that "attention to both global and segmental concerns benefits ESL students. The global instruction, however, seems to provide the learner with skills that can be applied in extemporaneous speech production" (p. 407).

Derwing and Rossiter (2003) reanalyzed the data from the Derwing et al. (1998) study in an effort to determine how the improvements in comprehensibility and fluency reported in the results for the narrative task were manifested in students' oral productions before and after pronunciation instruction. They concluded that if the goal of pronunciation teaching is to help students become more understandable, then instruction should include a stronger emphasis on prosody (p. 14).

While teachers and learners experiment with different techniques to promote measurable gains, one challenge facing learners is the number of teachers lacking formal preparation and training to teach pronunciation. Breitkreutz, Derwing, and Rossiter (2002) surveyed ESL teachers in Canada and found that 67% reported having no training in pronunciation instruction. Derwing and Munro (2005) cite additional studies indicating the lack of prepared pronunciation teachers in other English-speaking countries such as Britain and Australia. This lack of qualified teachers results in a lack of quality pronunciation instruction, suggesting a need for materials that enable learners to direct their own pronunciation learning outside the classroom.

Studies by Hardison (2004) and Pennington and Ellis (2000) have shown that computer technology can help second language learners learn prosodic patterns if the computer tasks focus learners' attention on how prosody works within a piece of discourse. As ESL learners become more aware of how these prosodic features function, they can begin to predict where pauses should occur, which syllables and words should be stressed, and whether intonation should rise or fall at the end of an utterance. This process of perception and prediction, followed by production of prosodic features, is advocated by researchers promoting the use of pronunciation strategies (Hahn & Hahn, 2007; Sardegna & Molle, 2008). Embedding these strategies into computer-assisted materials would benefit learners by allowing them to take control of their own learning and by providing discourse-length contexts in which to practice those prosodic features that improve intelligibility and comprehensibility.

Current Study

This study is intended to empirically evaluate a self-directed, computer-assisted technique that uses oral readings to improve students' perception and production of pausing, word stress, and sentence-final intonation. Oral reading techniques have been recommended by several researchers (Anderson-Hsieh, 1990; Firth, 1992; Ricard, 1986; Walker, 2005) to raise ESL learners' awareness of individual prosodic features. In the techniques they advocate, ESL learners listen to 150- to 300-word passages recorded by NSs, mark the location of an individual suprasegmental feature (e.g., pausing or syllable stress), practice the reading orally with the appropriate feature marked, and then record themselves reading the passage.

The oral reading technique studied empirically here is referred to as CPR. This cued pronunciation reading technique differs from previous oral reading techniques discussed or advocated in the pedagogical literature in several key ways. First, the CPR tasks are almost entirely self-directed. Students are given an overview of pausing, word stress, and sentence-final intonation patterns in English. This overview raises their awareness of prosody and gives them the skills necessary to begin predicting the occurrence of prosodic features. Teachers' involvement is minimal because students complete the tasks on their own and teachers provide no feedback on students' recordings. A second difference is that students practice perceiving and producing multiple suprasegmental features within a single passage instead of separate passages for each feature. A third difference between CPR and other oral reading techniques is that learners see an answer key after each perception task so they can check their perception and prediction of prosodic features before practicing the passage orally with the recorded NS model and recording it themselves.

Since this study's focus was the influence of self-directed readings on learners' perception and production skills, participants voluntarily completed the CPR tasks as extra-credit tasks outside the normal class time. To facilitate the use of CPR tasks in a self-directed context, each CPR was set up as a series of PowerPoint slides. Audio recordings from a NS model were embedded in the PowerPoint slides to let participants listen to them as many times as necessary to complete the listening (perception) activities. The 11 CPRs used in this study covered topics ranging from telescope types to the Empire State Building to similarities between Abraham Lincoln and John F. Kennedy. Study participants used the audio recording feature in Microsoft Word to record themselves reading the passage and save the recordings as an MP3 file for later analysis.

Research Questions

The following research questions were addressed: To what extent do cued pronunciation readings practiced in a self-directed context affect intermediate ESL learners'...

1. perception of pausing, word stress, and sentence-final intonation?
2. use of pausing, word stress, and sentence-final intonation in controlled production?
3. perceived comprehensibility in spontaneous speech tasks?

METHOD

Participants

ESL Students. Seventy-five ESL learners enrolled full-time in a university ESL program participated in the study. They were all of intermediate-level proficiency and ranged in age from 17 to 54 (mean age = 25 years). All had spent between one week and two years in the US, with the median length of stay being four months. Participants reported having previously studied English for between two months and 17 years, with the median being four years. The participants' native language backgrounds were organized into three categories: Asian language speakers (e.g., Japanese, Chinese, Korean; n = 36), Romance language speakers (e.g., Spanish, Italian, Romanian; n = 34), and speakers of other languages (e.g., Haitian Creole, Russian, Armenian; n = 5). Before this study began, the ESL program assigned the ESL participants to one of six intermediate-level classes, balanced as closely as possible for English language proficiency, L1, gender, and length of time in the US. Intermediate-level ESL learners were selected for this study because their normal curriculum involved the use of self-directed computer-based language learning tasks, and this prior experience would enable them to complete the cued pronunciation readings independently.

Native Speaking (NS) Informants. Previous studies researching pronunciation errors (Anderson-Hsieh, Johnson, & Koehler, 1992) have used NS informants to establish baselines for analyzing phonological errors in passages read aloud by NNSs. For this study, the researchers determined that NNS performance on the Time 1 and Time 2 perception and controlled production tasks could only be appropriately measured against NSs' performance of the same tasks. Ten NSs, 5 males and 5 females, all enrolled in a graduate TESOL program, served as informants. All 10 NSs were from the western United States and were recruited because of their knowledge of linguistics and their ability to speak Standard American English.


Teachers. Six intermediate listening/speaking classes were randomly selected for the study, each taught by a different teacher. Three classes were randomly selected as the control group, with the other three classes serving as the treatment group. The six teachers had one to three years of formal teaching experience. Each participating teacher agreed to neither address the specifics of the study in their curriculum nor provide feedback to their students regarding the CPR tasks.

Listeners. Two listener groups were used. The first group consisted of ten novice NSs (i.e., people not accustomed to listening to or working with NNSs) recruited to evaluate spontaneous speech samples of ESL participants. These ten listeners (five men and five women) ranged in age from 21 to 52 (mean age = 29 years) and reported normal hearing acuity. The first group of listeners provided comprehensibility ratings for all 75 ESL participants. The second group of listeners included two expert judges who were both native English speakers and had advanced degrees in linguistics as well as extensive experience transcribing speech samples from native and non-native speakers. Previous studies (Anderson-Hsieh et al., 1992; Derwing & Rossiter, 2003) investigating learners' errors in controlled speech tasks have used expert judges to classify segmental or suprasegmental errors to determine whether these errors interfered with comprehensibility. The expert judges in this study independently listened to the perception and production tasks, scoring passages for errors in the suprasegmental features being evaluated (pausing, word stress, sentence-final intonation). The expert judges' inter-rater reliability scores are reported in the results section.

PROCEDURE

Collection of Speech Perception and Production Data

Speech perception and production data at Time 1 and Time 2 were collected from all NNS participants in the ESL program's computer lab. The total test time was 20 minutes, and data collection consisted of a sequence of seven computerized tasks: five spontaneous speech tasks, one perception task, and one controlled production task, in that order. The five spontaneous speech elicitation tasks were similar to those used in the Educational Testing Service (ETS) institutional version of the SPEAK (Speaking Proficiency English Assessment Kit) test (1999). In the spontaneous speech tasks, participants told a story based on a sequence of pictures, suggested potential solutions to a problem illustrated in the picture sequence, discussed the advantages and disadvantages of an issue, expressed their opinion on a debatable topic, and explained changes made in a schedule of events. For the perception task, participants listened to a passage recorded by a NS and marked pauses, stressed words, and sentence-final intonation (rising or falling) on a transcript. The controlled production task required participants to simply read a passage aloud. They were given one minute to read the passage silently before reading it aloud. The test was conducted during the participants' listening/speaking class period to avoid placing additional stress on the students.

The spontaneous speech tasks used across Time 1 and Time 2 were of a similar type but included different pictures, topics, and so on, to avoid participants remembering the task prompts and practicing over the time period of the study (Derwing & Munro, 1994). Perception and controlled production tasks remained the same so that changes in perception and performance errors could be noted across Time 1 and Time 2. Neither teachers nor students could access testing stimuli during the treatment period.

English Instruction and CPR Use

All six classes in the study were taught using the ESL program's established syllabus and materials. The overall communicative curriculum was skills-based, with attention to listening, speaking, reading, writing, and grammar. Because the study was intended to investigate the effects of self-directed computer-assisted pronunciation tasks, teachers neither graded recordings nor provided feedback. Teachers simply reminded students to complete the weekly CPR tasks, which counted as extra credit for treatment-group classes.


During week two of the study, the treatment group classes received one 65-minute instructional period from one of the researchers. During this session, terms used in the CPRs (e.g., suprasegmental, stress, intonation) were defined and students were taken through the stages necessary to complete a sample CPR. The brief period of instruction was conducted in the computer lab so that students would understand the procedures for marking the task sheet, recording the passage, and saving the recorded file. Students in the treatment group were told to submit to the researchers one completed CPR recording and task sheet each week for the 11 weeks that followed that session. The CPR tasks were organized so that students could complete one portion of each reading on their own during the ten minutes they were allotted each day to work on the task after class hours. Participants were required to complete the tasks in the computer lab to ensure that only the students in the treatment group had access to the readings, with only one reading accessible per week, and so that a lab attendant was available to students if any difficulties arose.

SCORING PROCEDURES

Perception Task

Ten NS informants listened to the same passage presented to the NNS participants and marked the location of pauses, stressed words, and the direction (up or down) of sentence-final intonation. Their markings were pooled to determine agreed-upon locations for these suprasegmental features. Nine pause locations were identified by the NS informants. Thirty-six words were identified as having definite stress. Another 13 words were selected variably by the NSs as being stressed within the discourse; these 13 were labeled in the answer key as optionally stressed, so that whether or not an ESL participant selected one of these words as a stressed word, no error would be counted. NS informants agreed 100% on sentence-final intonation (pitch direction at the end of each of the seven sentences).

Once the answer key was created, the expert judges used the key to independently score the Time 1 and Time 2 perception tasks for each of the ESL participants. The following types of error were counted. An error was counted for pausing and word stress if a feature was missing (meaning the participant should have marked the feature but did not) or incorrect (the participant did mark the feature but should not have). These two error types constituted the total number of errors identified by the participant for each suprasegmental feature. Because the location of intonation markings was fixed, an intonation error was identified if the pitch direction at the end of the sentence was incorrectly marked.

Controlled Production Task

The same 10 NSs who had been used as informants for the perception task were recorded reading aloud the passage created for the controlled production task. They followed the same procedure as the ESL participants in completing the task. The NSs were allowed to read the passage through silently before recording it, to let them see the flow of the text, identify any unfamiliar words, and practice the pronunciation of the passage. Once the speech samples were recorded, the two expert judges independently evaluated them for the three suprasegmental features being studied (pausing, word stress, and sentence-final intonation). An answer key was constructed based on the pooled responses of the NSs. For pausing, there was considerable agreement among the NSs as to locations where pauses should occur in the oral reading.
Locations where 8 or more of the NSs paused collectively were labeled as required pauses. Places where 3 to 7 of the native speakers paused collectively were labeled as optional pauses. No pausing error was recorded if the ESL participant failed to pause in the optional locations. NNS pauses in any other location were counted as errors.

Word stress scoring was similar to that of pausing. Seventy-three syllables in the passage were stressed by 8 or more of the NS informants. Any other syllables stressed by the ESL participants were counted as errors. Syllables which participants should have stressed but did not were also counted as errors. There was 100% agreement among the native speakers on pitch direction for the 11 examples of sentence-final intonation. When an ESL participant's sentence-final intonation rose instead of falling, fell instead of rising, or remained flat, it was counted as an error.
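The answer-key pooling and error counting described above amount to simple set bookkeeping. The following is a minimal Python sketch of the pausing case, assuming pause locations are encoded as word-boundary indices; the function names, data shapes, and threshold parameters are illustrative stand-ins, not the study's actual scoring materials.

    from collections import Counter

    def build_pause_key(ns_markings: list[set[int]],
                        required_min: int = 8,
                        optional_min: int = 3) -> tuple[set[int], set[int]]:
        """Pool NS informants' pause locations into required/optional sets."""
        counts = Counter(pos for marking in ns_markings for pos in marking)
        required = {p for p, n in counts.items() if n >= required_min}
        optional = {p for p, n in counts.items()
                    if optional_min <= n < required_min}
        return required, optional

    def count_pause_errors(marked: set[int],
                           required: set[int],
                           optional: set[int]) -> tuple[int, int]:
        """Missing = a required pause the participant did not mark;
        incorrect = a marked pause at neither a required nor an optional
        location. Optional locations never count as errors either way."""
        missing = len(required - marked)
        incorrect = len(marked - required - optional)
        return missing, incorrect

Word stress could be scored the same way by substituting stressed-syllable indices, with the optionally stressed words playing the role of the optional set.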

Spontaneous Speech Tasks

The five spontaneous speech tasks were included to evaluate whether the treatment significantly improved participants' level of perceived comprehensibility from Time 1 to Time 2. Ten novice NSs (people not accustomed to listening to or working with NNSs) rated these speech stimuli. Each was assigned to rate 30 students, randomly selected from each class and each time period (Time 1 or Time 2). The NSs listened to at least a 45-second speech sample from each of the five spontaneous speech tasks for each student before assigning a perceived comprehensibility score. The speech samples were recorded onto CDs in order to facilitate the rating process. Each set of spontaneous speech tasks was rated by two different listeners, resulting in four perceived comprehensibility scores for each examinee: two for Time 1 and two for Time 2, each from a different listener. The novice NS listeners used the five-point Likert scale shown in Table 1 in assigning a single perceived comprehensibility score to the spontaneous speech samples.

Table 1. Descriptors for Perceived Comprehensibility Ratings

4   Speaker is very easy to understand. Little (if any) listener effort is required. Errors (if any) are not distracting.
3   Speaker is mostly comprehensible. Listeners can understand with some effort. Errors are occasionally distracting.
2   Speaker is sometimes comprehensible. Significant listener effort is required. Errors are often distracting. Words and individual sentence meaning are usually comprehensible. Meaning of the overall recording is incomprehensible.
1   Speaker is very difficult to understand. Great listener effort is required. Errors are very distracting. Most words are intelligible, but sentence meaning is often unclear.
0   Speaker is basically incomprehensible. Only an occasional word is intelligible.
NR  Not ratable
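Because each examinee receives two listener ratings at each time point, a mean gain score per examinee follows directly from this design. A minimal sketch, assuming a hypothetical mapping from (examinee, time) pairs to the two novice listeners' 0-4 ratings:

    from statistics import mean

    def comprehensibility_gain(scores: dict[tuple[str, int], list[int]],
                               examinee: str) -> float:
        """Mean Time 2 rating minus mean Time 1 rating for one examinee."""
        time1 = mean(scores[(examinee, 1)])  # two listeners' Time 1 scores
        time2 = mean(scores[(examinee, 2)])  # two listeners' Time 2 scores
        return time2 - time1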

RESULTS

Quantitative Findings

Perception tasks. After the Time 1 and Time 2 perception tasks were scored, the number of each error type for each examinee was entered into a spreadsheet for further analysis. As described previously, errors were counted for pausing and word stress if a feature was missing (meaning the participant should have marked the feature but did not) or incorrect (the participant did mark the feature but should not have). These error types constituted the total number of participants' errors for each suprasegmental feature. Pearson product-moment correlations showed no significant correlations between or within the three suprasegmental categories (r = 0.43 at p > .05). Therefore, each type of pausing, stress, and intonation error was analyzed separately.

Other comprehensibility studies involving Time 1 and Time 2 speech samples (Derwing et al., 1997, 1998; Derwing & Rossiter, 2003) have used repeated-measures ANOVAs to analyze group performance with treatment as the between factor and time as the within factor. An extension of ANOVA that provides a statistical means for eliminating the linear effects of a particular variable is called an analysis of covariance (ANCOVA) (Vogt, 1993). Following Vogt (1993), ANCOVAs were run using participants' scores at Time 1 as the covariate to control for learners' varying abilities at the outset of the study. To account for possible Type I error with running multiple ANCOVAs, the alpha level was adjusted to .01. ANCOVAs were performed on perception task data using treatment as the independent variable and total scores for three categories of prosodic error (pausing, word stress, and sentence-final intonation) as the dependent variables. Results are given in Table 2. These ANCOVAs indicated that the overall effect of treatment was significant (p < .01) for perception of pausing and word stress, but not intonation.

Table 2. Analyses of Covariance for Perception Task: Summary

Error Category                              Source      df      F       p
Perception of Pausing                       Treatment   1, 71   9.07    .004
Perception of Word Stress                   Treatment   1, 71   21.63   < .001
Perception of Sentence-final Intonation     Treatment   1, 71   5.14    .027
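For readers who want to run this kind of analysis themselves, the sketch below shows one conventional way to fit an ANCOVA of the form described above in Python with statsmodels: the Time 1 error count enters as the covariate and group membership as the factor. The DataFrame and its column names are hypothetical stand-ins, not the authors' original analysis code.

    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    def treatment_ancova(df):
        """ANCOVA: Time 2 errors ~ Time 1 covariate + treatment factor.

        `df` is assumed to hold one row per participant with columns
        'errors_t2', 'errors_t1', and 'group' ('treatment' or 'control').
        """
        model = smf.ols("errors_t2 ~ errors_t1 + C(group)", data=df).fit()
        # The C(group) row of the Type II ANOVA table gives the treatment
        # F and p, analogous to the rows of Table 2; with multiple ANCOVAs,
        # p is judged against the adjusted alpha of .01.
        return sm.stats.anova_lm(model, typ=2)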

Additional ANCOVAs were run to identify the effects of treatment on the specific types of pausing and word stress errors (missing vs. incorrectly placed). Results showed that the treatment group made a significantly greater reduction in missing pause marks than the control group (F(1,71) = 10.07, p < .01), but not in incorrect pause marks (F(1,72) = 0.04, p = .843). For errors in perception of word stress, ANCOVA results showed that missing word stress marks were also significantly reduced for the treatment group (F(1,71) = 33.10, p < .01), but incorrect word stress marks were not (F(1,72) = 0.58, p = .450). These results indicate that the treatment enabled participants to significantly reduce the number of instances where they were unable to perceive pauses and stressed syllables.

Controlled Production Task. Once the expert judges finished independently marking each of the 75 ESL participants' Time 1 and Time 2 controlled production passages, inter-rater reliability scores were obtained. Inter-rater reliability Pearson coefficients (r) were as follows: pausing, .84; stress, .88; and intonation, .98. The few discrepancies that occurred were resolved by listening to each questionable segment while referring to the NS recordings as a baseline. As in the perception task, errors were counted for pausing and word stress in the controlled production task if a feature was missing (meaning the participant should have produced the feature but did not) or incorrect (the participant did produce the feature but should not have). The total numbers of each error type for each student, as well as their total sentence-final intonation errors, were added to the spreadsheet.

ANCOVAs were performed on the controlled production task data using treatment as the independent variable and scores for the three categories of prosodic error (pausing, word stress, and sentence-final intonation) as the dependent variables. Participants' Time 1 scores were used as a covariate. Results of these analyses, shown in Table 3, indicate that the overall effect of treatment was significant (p < .01) for controlled production of word stress. The results for controlled production of pausing and sentence-final intonation, however, were not significant.

Table 3. Analyses of Covariance for the Controlled Speech Production Task: Summary

Error Category                              Source      df      F      p
Production of Pausing                       Treatment   1, 67   2.22   .141
Production of Word Stress                   Treatment   1, 67   7.73   .007
Production of Sentence-final Intonation     Treatment   1, 67   0.33   .570
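The inter-rater reliability coefficients reported above are ordinary Pearson correlations between the two judges' independent error counts, which can be verified in a few lines; the variable names here are illustrative only.

    from scipy.stats import pearsonr

    def inter_rater_r(judge_a: list[float], judge_b: list[float]) -> float:
        """Pearson r between two judges' per-participant error counts
        for a single suprasegmental feature (e.g., pausing)."""
        r, _p = pearsonr(judge_a, judge_b)
        return r

Run over the judges' actual scores, this would be expected to reproduce the reported coefficients (.84 for pausing, .88 for stress, .98 for intonation).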


An additional ANCOVA was used to analyze the mean gains in production of word stress to determine which type(s) of error (missing, incorrect, or both) the treatment had affected. The results revealed that the treatment group participants produced fewer incorrectly stressed syllables at Time 2 (F(1,67) = 7.48, p < .01), indicating that the CPR treatment had improved their ability to use word stress appropriately.

Spontaneous Speech Tasks. A final ANCOVA was run to determine whether treatment significantly affected participants' perceived comprehensibility ratings. In the ANCOVA, the independent variable was treatment, with mean gains in perceived comprehensibility ratings as the dependent variable and Time 1 scores as the covariate. The results of the analysis showed no significant effect of treatment (F(1,69) = 0.06, p = .802).

To account for possible differences in rater severity between the NS listeners, a FACETS analysis was performed. A FACETS analysis looks at the interaction of multiple facets involved in the rating procedures, including examinee ability, test difficulty, and rater severity. To determine rater consistency, FACETS provides an expected rating for each examinee and compares it to the observed rating. Based on this comparison, it then calculates an infit mean square value for each rater, as well as a mean and standard deviation of these infit values for the group of raters. Two standard deviations from the mean (in both directions) constitute the range of raters who are acceptably consistent. Four facets were included in the analysis for this study: group (treatment vs. control), time (Time 1 vs. Time 2), students, and raters. The FACETS analysis showed that the infit scores for all ten NS listeners were within the allowable limits of 2 standard deviations (infit mean = 0.88, SD = 0.29, range = 0.30 to 1.46). These results indicate that none of the NS listeners were inconsistently severe or lenient in their ratings.
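The infit statistic that FACETS reports can be approximated outside the program. In many-facet Rasch measurement, a rater's infit mean square is commonly computed as the information-weighted mean of squared residuals between observed and model-expected ratings. The sketch below assumes those expectations and variances are supplied by a fitted Rasch model; it illustrates the statistic in a common textbook form and is not FACETS's actual implementation.

    import numpy as np

    def infit_mean_square(observed, expected, variance):
        """Infit MS for one rater: sum of squared residuals over total
        model variance, taken across all ratings the rater produced."""
        residuals = np.asarray(observed) - np.asarray(expected)
        return float(np.sum(residuals ** 2) / np.sum(variance))

Raters whose infit falls within two standard deviations of the group mean (here, 0.88 ± 2 × 0.29) would be flagged as acceptably consistent.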
Qualitative Results

Following the completion of the Time 2 testing, all 39 treatment group participants completed a brief follow-up survey regarding their experience using the CPRs. In the survey, students indicated the amount of effort they had put into completing the readings and the level of pronunciation progress they perceived in their speech. They also provided feedback on positive aspects of the readings, difficulties encountered, and recommendations for improving the CPRs. The results from this survey are discussed below. It should be noted that the ESL program administrators recommended that treatment group participants limit their work on the CPRs to ten minutes per day to allow sufficient time to complete their regular homework.

Level of Effort Expended. Of the 39 students who participated in the treatment group, 37 participants (94.8%) agreed or strongly agreed that they could have put more effort into completing the pronunciation readings. The treatment group's performance was assessed collectively, but the overall number of CPRs completed by the treatment participants varied. Nineteen students (48.7%) completed nine or more of the 11 CPRs. The remaining students completed eight or fewer of the readings. Students' reasons for not completing all of the CPRs ranged from simply forgetting to do them to not having time to complete the tasks in the computer lab.

Level of Perceived Pronunciation Progress. The follow-up survey included nine statements designed to assess the level of pronunciation progress treatment participants felt they had made by using the CPRs. Students responded to the statements using a four-point Likert scale that ranged from 1 (strongly disagree) to 4 (strongly agree). In Table 4, the statements are rank ordered by the percentage of students who agreed or strongly agreed with each statement.

As a whole, treatment group participants were positive about their use of the cued pronunciation readings. While 34 of the 39 students (87%) recognized that they had more work to do on their pronunciation, they felt they had improved. Thirty-two (82%) of the students felt that as a result of the treatment, they could understand English conversations more easily and they had increased their knowledge of English pronunciation. Thirty-one participants (79%) felt that because of the CPRs, they could communicate more effectively in situations previously difficult for them, they had more confidence when speaking English in public, and they could speak more fluently and correctly in English. Most treatment participants (64%) also noted that because of the readings, they could more easily recognize their own pronunciation errors.

Positive Aspects of CPRs. Treatment students' qualitative comments were positive about the specific ways in which they benefited from the readings. Six students (15.4%) commented that they liked seeing where the pausing and word stress occurred. For example: "I likes the fact that I had to study the reading with stress and pauses before reading. It helped me to understand better the pronunciation reading. I also liked the PowerPoint presentation. It was cool." Nine students commented that they felt the readings helped them speak clearly and more accurately: "I learned that [it] is more important to speak clear and using pauses than faster."; "I learned how to stress syllables and where to [pause]."; and "I learned a lot about correct English pronunciation and I tried to copy it every time I did those exercises. I loved it!" Other positive comments included learning new vocabulary from the readings, developing a greater awareness of suprasegmental features, and utilizing the computer's recording and playback features to identify individual pronunciation errors.

Table 4. Students' Responses Showing Perceived Level of Progress (n = 39)

Percentage who Agreed and Strongly Agreed    Statement
87% (n=34)   I feel that I still have more work to do on my pronunciation, but I can see that I have made progress
85% (n=33)   I can understand English conversations more easily
82% (n=32)   By doing the pronunciation readings, I learned a lot about English pronunciation
79% (n=31)   By doing the pronunciation readings, I learned how to speak more fluently and correctly in English
79% (n=31)   I have more confidence now when speaking English in public
77% (n=30)   I can now communicate more effectively in situations that were difficult for me before
77% (n=30)   I feel that people can now understand my speech more easily
77% (n=30)   Because of the pronunciation readings, I can work on correcting my pronunciation errors
64% (n=25)   Because of the pronunciation readings, I can more easily recognize my own pronunciation errors

Difficulties with CPRs. Students made 31 comments regarding problems they had experienced with the CPRs. These largely focused on the difficulty of the perception tasks (identifying stressed syllables and understanding the native English speakers' pronunciation of the words in the passages) and having to imitate the pronunciation patterns of the native English speakers (talking fluently, linking words and phrases together with limited pausing, and producing multi-syllabic words). Example comments:

"Most difficult was to understand where the stresses in the sentences were"
"It's difficult sometimes when the native speaker speak very fast and I couldn't understand the pronunciation of each words"
"Long words and difficult pronunciation vocabularies"
"Linked some words or phrases as native speakers do"

Recommendations for Change. Fifteen students (38.5%) identified time constraints as a concern for them in completing the readings, since they were already spending several hours a day studying English. Two of them suggested that the readings be made available for students to do at home, where it was more convenient for them to complete the tasks, rather than being required to do the tasks in the computer lab. Four students (10.3%) recommended receiving specific feedback on their final recordings to let them know what suprasegmental mistakes they continued to make. Four students (10.3%) recommended selecting more interesting topics for the readings, but they did not provide any suggestions for alternative topics.

DISCUSSION

Results from this quasi-experimental study showed that treatment had a significant effect (p < .01) on perception of pausing, perception of word stress, and controlled production of word stress. Further investigation of improvement in pause perception revealed that missing pause marks decreased significantly for the treatment group participants at Time 2, indicating that their awareness of appropriate pausing had increased and they had become better able to hear and correctly identify NS pause locations. The cued pronunciation readings also had a significant effect on the participants' ability to correctly perceive stressed syllables, specifically shown by a significant reduction in missing stress marks. In the controlled production task, the treatment participants also had significantly fewer instances of incorrectly placed stress.

The final question addressed in this study was whether treatment significantly affected ESL participants' level of perceived comprehensibility. This question is important because previous research (Derwing et al., 1997, 1998; Derwing & Rossiter, 2003) has shown that treatment can have a short-term effect on performance directly related to the type of treatment participants received. Results from the current study showed that there was no significant change in the learners' level of perceived comprehensibility at Time 2. The lack of such a finding may be due to the length of the treatment (11 weeks), the level of sustained effort put forth by the treatment group participants (a maximum of 10 minutes per day over the 11-week study), or the students' level of motivation to participate. Given these factors and others that the researchers may not be aware of, it is important to note that there still were areas in which treatment group participants improved significantly more than students in the control group.

Qualitative data gathered from treatment group participants provided additional insights into the learners' use of CPRs. They noted that the readings helped them learn a great deal about English pronunciation, learn to speak English more fluently and correctly, have more self-confidence when speaking in public or in situations difficult for them, and feel that people could understand their speech more easily.

Implications

The findings from this study provide further empirical evidence strengthening claims about the pedagogical use of oral reading techniques for pronunciation improvement. NNSs in the treatment group significantly improved their perception of pausing and word stress and their production of word stress with limited exposure to the available treatment (11 weeks, 10 minutes per day) by using self-directed, computer-assisted cued pronunciation readings.
It might be argued that participants in the treatment group merely improved their familiarity with the procedure and format for marking prosodic features. However, the fact that their markings were more correctly placed on the post-test indicates that not only were treatment group participants better able to perceive the requisite features, but their ability to predict where those features should occur also improved. It is important to remember that the treatment participants received no feedback other than the answer keys provided in the PowerPoint slides. For language instructors who do not feel comfortable teaching pronunciation or who cannot fit it into their curriculum,


self-directed, computer-assisted cued pronunciation readings can provide an effective way to help students improve their ability to perceive, predict, and produce prosodic features outside of class. A second implication is that as in other areas of language acquisition, for many learners, the ability to perceive suprasegmental features may precede the ability to correctly produce them. This implication stems from the fact that the CPR treatment had a significant effect on two areas of prosodic perception (pausing and stress), but only on one area of controlled production (stress), and no significant effect on perceived comprehensibility. This implication supports previous pedagogical advice suggesting that students should be empowered with predictive and perceptive skills necessary to improve production (Dickerson, 1995). In turn, perception and controlled production may precede improvements in learners’ spontaneous speech. A third implication is based on recommendations for change given by the treatment participants. Students may be more motivated to complete self-directed, computer-assisted pronunciation tasks if the tasks are available for them to do at a time and in a location of their choosing. In this study, participants were required to complete the tasks in the ESL program’s computer lab to control access to the readings and so that a lab attendant was available for assistance. Limitations Although this study was carefully planned and implemented, some important limitations should be noted. First, the ESL program placed heavy time constraints on the study, as mentioned earlier. A longer, more intensive treatment might have had a more significant impact on the learners’ performance. A second limitation involves the issue of feedback. Since the readings were self-directed, the participants received no teacher feedback on their weekly markings or recordings. Four students expressed that they might have benefited more from the readings if they had received specific feedback on their progress (or lack thereof). A third limitation is the varying level of effort put forth by the treatment group participants. As discussed previously, only half of the treatment group participants completed nine or more readings. Participants who completed fewer readings (particularly the five students who did fewer than three CPRs) may have contributed to the limited number of significant differences that occurred between the treatment and control groups. A final limitation involves the length of the tasks used at Time 1 and Time 2. To avoid learner fatigue while providing sufficient discourse for scoring prosodic errors, the perception and controlled-production paragraphs were fairly short (seven and 11 sentences respectively), particularly with regard to the number of tokens available for testing sentence-final intonation. A passage with more sentences and more varied pitch direction would provide additional tokens for assessing participants’ perception and production of suprasegmentals. CONCLUSION The quantitative analysis of the pre-test and post-test results revealed that treatment group participants made significant gains (p < .01) in three areas: perception of pausing, perception of word stress, and controlled production of word stress. The treatment had a significant positive effect on the participants’ total pause marking errors, specifically on the number of missing pause marks. 
For perception of stressed words, the treatment had a significant positive effect on the total number of stress marking errors, specifically in the area of missing stress marks. Finally, the treatment had a significant effect on the total number of stress placement errors produced in the controlled production task. More specifically, participants had fewer instances of stressing syllables that should not receive stress.

Suggestions for Future Research

With several popular pronunciation textbooks (Grant, 2001; Meyers & Holt, 1998; Miller, 2006) advocating the use of imitative oral readings as a pronunciation improvement technique, further empirical research needs to be done. This research will help identify which learner variables (e.g., students' proficiency level, level of motivation, attitude toward the host culture) and pedagogical variables (e.g., time on task, duration of practice, teacher feedback) most influence second language learners' ability to effectively use these types of pronunciation practice activities both inside and outside the classroom.

The results of this study provide empirical evidence supporting other researchers' claims (Hardison, 2004; Pennington & Ellis, 2000) that computer technology can help second language learners more accurately perceive and produce prosodic features. In the current study, self-directed computer-assisted CPRs provided treatment group participants an opportunity to develop their perception abilities (in pausing and word stress) and production abilities (in word stress). These findings, however, suggest that further work is needed to explore the effects that CPRs can have on language learners' perception and production of specific suprasegmental features. What impact might CPRs have if the readings were integrated into the ESL curriculum? How might periodic feedback from teachers, as called for by participants in this study's treatment group, affect learners' production of suprasegmental features? Could regular teacher feedback focused on specific errors produced by ESL learners in the CPR tasks result in significant changes in the learners' level of perceived comprehensibility? Could learners' use of CPRs over an extended time period (longer than 11 weeks) result in greater gains in the perception and production of suprasegmentals as well as improvement in ratings of perceived comprehensibility? Clearly, several issues still need to be addressed.

ACKNOWLEDGMENTS

We are grateful to the ESL learners and native English speaker raters whose participation made this study possible. We are also grateful to the three anonymous reviewers and journal editors for their helpful feedback on earlier versions of this manuscript.

ABOUT THE AUTHORS

Melissa Landon holds an MA in TESOL from Brigham Young University. She is currently a full-time mother and part-time researcher. Her research interests include assessment and pronunciation instruction. E-mail: [email protected]

Mark Tanner is an Assistant Professor in the Linguistics and English Language Department at Brigham Young University. His research interests include pronunciation pedagogy and comprehensibility research, self-directed learning, and second language teacher education. E-mail: [email protected]

REFERENCES

Anderson-Hsieh, J. (1990). Teaching suprasegmentals to international teaching assistants using field-specific materials. English for Specific Purposes, 9, 195-214.
Anderson-Hsieh, J. (1992). Using electronic visual feedback to teach suprasegmentals. System, 20, 51-62.
Anderson-Hsieh, J. (1994). Interpreting visual feedback on suprasegmentals in computer assisted pronunciation instruction. CALICO Journal, 11(4), 5-22.
Anderson-Hsieh, J., Johnson, R., & Koehler, K. (1992). The relationship between native speaker judgments of nonnative pronunciation and deviance in segmentals, prosody, and syllable structure. Language Learning, 42, 529-555.
Blau, E. K. (1990). The effect of syntax, speed, and pauses on listening comprehension. TESOL Quarterly, 24, 746-752.
Breitkreutz, J., Derwing, T. M., & Rossiter, M. J. (2002). Pronunciation teaching practices in Canada. TESL Canada Journal, 19, 51-61.
Celce-Murcia, M., Brinton, D. M., & Goodwin, J. M. (1996). Teaching pronunciation: A reference for teachers of English to speakers of other languages. New York: Cambridge University Press.
Derwing, T. M., & Munro, M. J. (2005). Second language accent and pronunciation teaching: A research-based approach. TESOL Quarterly, 39, 379-397.
Derwing, T. M., Munro, M. J., & Wiebe, G. E. (1997). Pronunciation instruction for fossilized learners: Can it help? Applied Language Learning, 8, 217-235.
Derwing, T. M., Munro, M. J., & Wiebe, G. E. (1998). Evidence in favor of a broad framework for pronunciation instruction. Language Learning, 48, 393-410.
Derwing, T. M., & Rossiter, M. J. (2003). The effects of pronunciation instruction on the accuracy, fluency, and complexity of L2 accented speech. Applied Language Learning, 13, 1-17.
Dickerson, L. (1995). Autonomy and motivation: A literature review. System, 23, 165-174.
Fayer, J., & Krasinski, E. (1995). Perception of hesitation in nonnative speech. Bilingual Review, 20, 114-121.
Field, J. (2005). Intelligibility and the listener: The role of lexical stress. TESOL Quarterly, 39, 399-423.
Firth, S. (1992). Pronunciation syllabus design: A question of focus. In P. Avery & S. Ehrlich (Eds.), Teaching American English pronunciation (pp. 173-183). Oxford: Oxford University Press.
Goodwin, J. M. (2004). Imitative conversations. Demonstration presented at the international TESOL Conference, Miami, FL.
Graham, C. (1978). Jazz chants: Rhythms of American English for students of English as a second language. New York: Oxford University Press.
Grant, L. (2001). Well said: Pronunciation for clear communication (2nd ed.). Boston, MA: Heinle & Heinle.
Hahn, L. D. (2004). Primary stress and intelligibility: Research to motivate the teaching of suprasegmentals. TESOL Quarterly, 38, 201-223.
Hahn, L. D., & Hahn, K. (2007). Phrase stress essentials. Paper presented at the international TESOL conference, Seattle, WA.
Hardison, D. M. (2004). Generalization of computer-assisted prosody training: Quantitative and qualitative findings. Language Learning & Technology, 8, 34-52. Retrieved from http://llt.msu.edu
Jenkins, J. (2004). Research in teaching pronunciation and intonation. Annual Review of Applied Linguistics, 24, 109-125.
Levis, J. M. (1999). Intonation in theory and practice, revisited. TESOL Quarterly, 33, 37-63.
Levis, J. M. (2002). Reconsidering low-rising intonation in American English. Applied Linguistics, 23, 56-82.
Levis, J. M., & Pickering, L. (2004). Teaching intonation in discourse using speech visualization technology. System, 32, 505-524.
McNerney, M., & Mendelsohn, D. (1992). Suprasegmentals in the pronunciation class: Setting priorities. In P. Avery & S. Ehrlich (Eds.), Teaching American English pronunciation (pp. 185-196). Oxford: Oxford University Press.
Meyers, C., & Holt, S. (1998). Pronunciation for success: Student workbook. Burnsville, MN: Aspen Productions.
Miller, S. F. (2006). Targeting pronunciation: Communicating clearly in English (2nd ed.). New York: Houghton Mifflin.
Molholt, G. (1988). Computer-assisted instruction in pronunciation for Chinese speakers of American English. TESOL Quarterly, 22, 91-111.
Nagata, N. (1993). Intelligent computer feedback for second language instruction. The Modern Language Journal, 77, 330-339.
Pennington, M. C. (1999). Computer-aided pronunciation pedagogy: Promise, limitations, directions. Computer Assisted Language Learning, 12, 427-440.
Pennington, M. C., & Ellis, N. C. (2000). Cantonese speakers' memory for English sentences with prosodic clues. The Modern Language Journal, 84, 372-389.
Pickering, L. (2004). The structure and function of intonational paragraphs in native and nonnative speaker instructional discourse. English for Specific Purposes, 23, 19-43.
Ricard, E. (1986). Beyond fossilization: A course in strategies and techniques in pronunciation for advanced adult learners. TESL Canada Journal, Special Issue 1, 243-253.
Richman, B. (2005). Songs for speaking: Lyrics for pronunciation. Paper presented at the Rocky Mountain TESOL Conference, Salt Lake City, UT.
Sardegna, V., & Molle, D. (2008). Empowering students with pronunciation learning strategies. Demonstration given at the international TESOL conference, New York, NY.
Towell, R., Hawkins, R., & Bazergui, N. (1996). The development of fluency in advanced learners of French. Applied Linguistics, 17, 84-119.
Vogt, W. P. (1993). Dictionary of statistics and methodology: A non-technical guide for the social sciences. Newbury Park, CA: Sage Publications.
Walker, R. (2005). Using student-produced recordings with monolingual groups to provide effective individualized pronunciation practice. TESOL Quarterly, 39, 550-558.
Wennerstrom, A. (1998). Intonation as cohesion in academic discourse: A study of Chinese speakers of English. Studies in Second Language Acquisition, 20, 1-25.

Language Learning & Technology http://llt.msu.edu/vol13num3/ducatelomicka.pdf

October 2009, Volume 13, Number 3 pp. 66–86

PODCASTING: AN EFFECTIVE TOOL FOR HONING LANGUAGE STUDENTS' PRONUNCIATION?

Lara Ducate and Lara Lomicka1
The University of South Carolina

This paper reports on an investigation of podcasting as a tool for honing pronunciation skills in intermediate language learning. We examined the effects of using podcasts to improve pronunciation in second language learning and how students' attitudes changed toward pronunciation over the semester. A total of 22 students in intermediate German and French courses made five scripted pronunciation recordings throughout the semester. After the pronunciation recordings, students produced three extemporaneous podcasts. Students also completed a pre- and post-survey based on Elliott's (1995) Pronunciation Attitude Inventory to assess their perspectives regarding pronunciation. Students' pronunciation, extemporaneous recordings, and surveys were analyzed to explore changes over the semester. Data analysis revealed that students' pronunciation did not significantly improve in regard to accentedness or comprehensibility, perhaps because the 16-week treatment was too short to foster significant improvement and there was no in-class pronunciation practice. The podcast project, however, was perceived positively by students, who appreciated the feedback given for each scripted recording and enjoyed the opportunities for creativity during extemporaneous podcasts. Future studies might seek to delineate more specific guidelines or examine how teacher involvement might be adapted to the use of podcasts as a companion to classroom instruction.

INTRODUCTION

As evidenced by this special issue on teaching pronunciation, foreign language (FL) teachers are often challenged by the ongoing debate on how to teach pronunciation across proficiency levels. While some teachers feel there is often not enough class time to practice pronunciation, including intonation or prosody (Munro & Derwing, 2007; Ramírez-Verdugo, 2006), others may not enjoy or know how to teach pronunciation, or they may believe that students simply find it boring (Stevick, Morley, & Wallace Robinett, 1975). Furthermore, some teachers may be reluctant to teach pronunciation due to lack of training in phonetics (Weinberg & Knoerr, 2003). Teaching pronunciation in a class specific to pronunciation, phonology, or phonetics may seem more feasible than in a typical language classroom. However, these types of classes normally occur only in the upper levels, so students in beginning language classes could be deprived of systematic pronunciation training until late in their language learning careers.

Historically, with the advent of the communicative approach, there may have been some confusion as to the place and role of pronunciation in language learning. Terrell (1989), for example, suggests that those teaching from a communicative approach "have not known what to do with pronunciation" (p. 197). Likewise, Pennington and Richards (1986) discuss that pronunciation is often viewed as having "limited importance" in communicative curricula (p. 207). As a result of the perceived confusion with regard to the role of pronunciation in the communicative approach, language teachers struggle to find ways to practice pronunciation in class (Lord, 2008). Further, Elliott (1995) maintains that "teachers tend to view pronunciation as the least useful of the basic language skills and therefore they generally sacrifice teaching pronunciation in order to spend valuable class time on other areas of the language" (p. 531).

Although teachers sometimes forgo pronunciation instruction to spend time on aspects of the FL that they find more important, pronunciation plays a significant role in comprehensibility (Anderson-Hsieh & Koehler, 1988). Leather (1999) points out that non-native speakers (NNSs) with poor pronunciation can
even be "personally downgraded because of their accent" (p. 35). While there are a variety of factors that affect pronunciation, including age, individual differences, motivation, and instruction (Leather, 1999; Moyer, 1999), teachers should take advantage of the factors over which they have control: instruction and exposure.

How might technology provide us with tools to address this challenge? When reflecting on computer-mediated communication and technology tools in general, Thorne and Payne (2005) suggest, "…one of the principle critiques of textual CMC (computer mediated communication) has been that oral speech and aural comprehension are not explicitly exercised" (p. 386). Podcasting may offer a possible option for practicing speaking skills outside of class. Podcasts are easy-to-create audio files that can be uploaded to the Internet and to which users can subscribe. Our study attempts to explore this option by using podcasting to hone pronunciation skills outside of class. Intermediate level students of French and German created eight podcasts (scripted and extemporaneous) in order to practice pronunciation and to apply their newly practiced pronunciation skills to a more creative, contextualized task. Students also completed a pre- and post-survey based on Elliott's (1995) Pronunciation Attitude Inventory (PAI) to assess their changing perspectives on the role of pronunciation in language learning. Students' scripted pronunciation and extemporaneous recordings as well as surveys were rated for accentedness and comprehensibility.

Research on Pronunciation

Comprehension Studies

Many studies have investigated global non-native pronunciation to assess what factors affect pronunciation (Piper & Cansin, 1988; Thompson, 1991), help improve pronunciation (Derwing & Rossiter, 2003; Graeme, 2006; Lord, 2005; Magen, 1998; O'Brien, 2004; Ramírez-Verdugo, 2006; Riney & Flege, 1998), and contribute to accent and comprehension (Brennan & Brennan, 1981; Jilka, 2000; Munro & Derwing, 2007). While the age at which someone begins learning a FL seems to have the largest effect on pronunciation (Piper & Cansin, 1988; Thompson, 1991), studies have shown that training can also help to improve students' pronunciation (Graeme, 2006; Lord, 2005; Ramírez-Verdugo, 2006). After two weeks of training on specific sounds, Graeme (2006) found that the average error rate dropped from 19.9% to 5.5%, and in a delayed post-test to 7.5%, which illustrates that focused instruction can lead to phonological changes. In another study, members of an experimental group improved significantly after listening to native speakers (NSs) and comparing their own speech with the NSs' (Ramírez-Verdugo, 2006). In a Spanish phonetics class, students who received explicit phonetics instruction improved their pronunciation on specific features (Lord, 2005). The findings of these studies show that "raising [second language (L2)] learners' awareness of the important role of intonation systems is an attainable aim" (Ramírez-Verdugo, 2006, p. 153) that can ultimately help to improve students' FL pronunciation.

In addition to comprehension, prosody represents another important aspect of pronunciation. Prosody is defined as the "patterns in individual words of stress, pitch, and tone and rhythmic and intonational patterns of longer utterances" (Pennington, 1989, p. 22).
As Munro and Derwing (1995) found, the presence of a strong accent does not necessarily hinder intelligibility; in their study, some speakers were rated as heavily accented even though the listeners understood everything. The researchers attribute this apparent contradiction to the effects of inaccurate prosody. Since prosody has been found to be one of the main reasons speech can be perceived as accented, even more than individual sounds (Anderson-Hsieh & Koehler, 1988; Munro, 1995; Pennington, 1989), prosody training for students at all levels is recommended as part of communicative language teaching (Chun, 1988; O'Brien, 2004; Pennington, 1989; Van Els & de Bot, 1987; Volle, 2005). As learners tend to use L1 (first language) intonation patterns when speaking in the L2 (Ramírez-Verdugo, 2006), they need to be explicitly taught the prosody of the L2. One way to achieve this practice, as well as practice in comprehension and accentedness, is through the use of technology.

Using Technology to Improve Pronunciation

Technology has been used in many ways to improve students' pronunciation. Since students often have a difficult time hearing their own pronunciation mistakes and judging the nativelikeness of their speech, visual displays can help to show specific sounds and the patterns of prosody (Ehsani & Knodt, 1998; Hardison, 2004; Martin, 2004; Pennington, 1989; Ramírez-Verdugo, 2006; Seferoglu, 2005). Automatic speech recognition (ASR) tools, such as WinPitch, are advantageous because they do not rely on students' own perceptions of their pronunciation; instead, they show exactly how students' sounds compare to those of NSs (Ehsani & Knodt, 1998; Martin, 2004; O'Brien, 2006). One drawback of ASR tools, however, as pointed out by O'Brien (2004), is their lack of contextualization. Technology, specifically the use of podcasts, could offer opportunities for contextualizing tasks while at the same time honing pronunciation. The next section provides a brief introduction to podcasting, including how it can be used in FL classes and how it has been utilized for pronunciation tasks. We then describe the details of a podcasting project implemented to improve students' pronunciation and prosody.

Podcasting

In recent years, Internet audio has greatly increased in popularity (McCarty, 2005). One recent example of Internet audio, a podcast, is an audio file that anyone can create using a computer, microphone, and a software program. Once posted to the web, podcasts can be accessed, downloaded to a computer or MP3 player, and played. The popularity of podcasts can be linked to the simplicity of creating, editing, publishing, and listening to them. Another reason for their rising popularity, according to Tan and Mong (n.d.), is the "…increasingly widespread ownership of MP3 players and the relative ease with which individual podcasters can create and distribute files" (p. 2). Harris Interactive (2007) reports that players are extremely popular among young adults, noting a marked increase among college students in particular. Due to the increased popularity of podcasts and ownership of MP3 devices, the use of podcasting has begun to find its way into educational settings.

Uses of Podcasting in Education

Podcasting is being used in a variety of ways in all levels and disciplines of education. More traditionally, it can be used to distribute lecture material. This material is available as a review (for those in class), or, if students or teachers are absent, a podcast can serve to distribute the missed information (Tavales & Skevoulis, 2006). Podcasting can empower students by giving them opportunities to create and publish for a real audience (Stanley, 2006) and facilitate recording and distributing news broadcasts, developing brochures, creating or listening to teachers' notes, recording lectures distributed directly to students' MP3 players, recording meeting and conference notes, supporting student projects and interviews, and providing oral history archiving and on-demand distribution (Meng, 2005).

More specific to language learning, podcasting has several theoretical underpinnings in second language acquisition (SLA) research. Swain and Lapkin (1995) recognize output as essential for second language learning. One strategy they suggest is having students listen to themselves as they edit their output, and then go back, listen again, and revise as necessary.
They can also receive feedback from other students and their instructor. This type of approach could be quite useful in podcasting, as it is easy to record, re-record, and listen to various segments of a podcast. After students record podcasts, they can listen multiple times, edit their podcasts, and comment on their classmates' recordings (see also Lord, 2008; Meng, 2005).

Although we know that the use of audio in education is far from a novelty, podcasting and MP3 devices have brought a newfound excitement to the classroom. Osaka Jogakuin College in Japan was the first school to provide iPods to incoming students. Podcasts downloaded to the iPods consisted of audio learning aids to help with the learning of English (McCarty, 2005).2 Podcasting trends can now be found
in different parts of the world—many universities and colleges3 are embarking on projects using MP3 devices and podcasting in innovative ways.

Podcasting Projects Specific to Pronunciation

However educators decide to use podcasts, it is first important to determine instructional goals (O'Bryan & Hegelheimer, 2007) and keep the emphasis on pedagogy (Rosell-Aguilar, 2007, 2009). Keeping these objectives in mind, practice with pronunciation, listening, and speaking are specific ways that foreign language teachers and learners can tap into this technological tool. Using podcasting in contextualized language learning (as opposed to simple pronunciation drills) can also be useful in that it allows teachers to contextualize pronunciation and create meaningful tasks, rather than simply have students repeat and practice lists of words or sounds. Chan and Lee (2005) note that "audio has been vastly neglected and underused as a teaching and learning medium in recent years" (p. 62). Therefore, it is not surprising that language teachers would be interested in podcasts. McQuillan (2006b) highlights several tasks that focus on oral production, such as using audio diaries, conducting interviews with native speakers, and hosting talk shows where students "can record themselves and classmates for a classroom assignment and provide speech samples to the teacher for assessment" (p. 6). Tavales and Skevoulis (2006) suggest that students can record themselves or native speakers and then engage in listening practice as they focus on pronunciation, grammar use, or intonation. Amemiya, Hasegawa, Kaneko, Miyakoda, and Tsukahara (2007) report on a study using a foreign word learning system with iPods, in which they examined pronunciation and images of the vocabulary items (n = 10) with iPods versus pen and paper. Results indicate that some of the iPod group participants claimed that they continued to hear the pronunciation of the word even when not listening to the iPod. No immediate difference between the groups was found following the experiment; however, after 2 weeks, the iPod participants retained the meaning of 40% of the English words using the system, while only 27% were retained by the conventional paper-and-pencil group.

Lord's (2008) study is one of the few research projects targeting pronunciation and podcasting specific to FL teaching. Nineteen students in an undergraduate phonetics class recorded tongue-twisters, short readings, and personal reflections on their own pronunciation. Lord used the Pronunciation Attitude Inventory (Elliott, 1995) as well as scores from six oral tasks, rated by three judges on overall pronunciation ability. Both attitudes and pronunciation abilities were assessed pre-semester and post-semester; both were found to improve. Podcasts also remained available as references for students to revisit and work on individual pronunciation issues.

Research specific to podcasting, part of the field of computer-assisted language learning (CALL), remains a young and growing area. There has consistently been a lack of empirical and SLA-based research on innovative technologies when they emerge, and most often we are confronted with a focus on student perceptions, beliefs, and attitudes. Levy (2007) claims that the researcher's approach and goals may differ depending on whether the technology is already established or just emerging. He further explains that research on emerging or new technology often begins with pilot studies or investigations of attitudes and perceptions (for example, surveys).
Since the field of podcasting in FL learning remains relatively undeveloped, it is to be expected that the work available thus far consists of reports on pilot studies and investigations of student perceptions. Young (2007), for example, in her article on iPods, developed a survey to administer to students to find out more about language students' perspectives on iPod or MP3 player use. Lee and Chan (2007) report on research with 18 students studying information technology who participated in a survey after listening to 3-5 minute podcasts (nine total) over the course of a semester. Results indicate that students perceived listening to the podcasts as worthwhile and enjoyable. O'Bryan and Hegelheimer (2006) report that over the course of a semester, graduate and undergraduate students (n = 6) listened to 14 podcasts for a listening course. Based on surveys, interviews, and a teacher reflective journal, results regarding attitudes,
feelings about podcasts, and student needs suggest that the podcasts were viewed very positively and that few technical problems occurred. These preliminary studies substantiate Levy's claim that because podcasting is an emerging technology, much of the literature surrounding it has focused on survey work or pilot studies that attempt to pave the way for more research (Lee & Chan, 2007; O'Bryan & Hegelheimer, 2006) or on the technical how-tos and practical ideas for using podcasting in the classroom (see also: Diem, 2005; Godwin-Jones, 2005; McCarty, 2005; McQuillan, 2006a; Stanley, 2006; Young, 2007). In spite of these few preliminary studies on aspects of podcasting such as learner reactions and attitudes, the field remains young and is growing exponentially. The current study sought to broaden existing research on podcasting and pronunciation and to continue to advance the research conducted to date. To further explore pronunciation within a contextualized podcasting approach, our study sought to investigate the following questions:

1. Did students' comprehensibility and accentedness improve from their pre-test to post-test?
2. Was there a difference in comprehensibility and accentedness between the extemporaneous podcasts and the scripted podcasts?
3. Did students' comprehensibility and accentedness improve with each task?
4. Did students have positive attitudes towards the pronunciation tasks and feel their pronunciation improved?

Using a mixed methodology design, qualitative and quantitative data were collected and analyzed in order to investigate these questions.

METHOD

Participants

The participants in this study consisted of 12 students learning German and 10 learners of French (n = 22), all L1 speakers of American English, enrolled at a university in the United States during one academic semester. Students were enrolled in intermediate level language classes (fourth semester) and were between 18 and 22 years old. Twelve of the students (4 in French, 8 in German) had been to French- or German-speaking countries for varying amounts of time, but none for more than a summer. Participation in the project took place over a 16-week period, and participants were selected based on a convenience sample. In other words, intact groups of students enrolled in these intermediate courses were asked to provide consent to participate in this project.4

Materials

In previous studies on pronunciation, the elicitation techniques have included repetition based on NS models (Olson & Samuels, 1973; Snow & Hoefnagel-Höhle, 1977), reading (Munro & Derwing, 2001), and extemporaneous speech (Elliott, 1995; Thompson, 1991). In order to assess the differences between scripted and extemporaneous tasks, we chose to employ two different types of elicitation techniques. Students recorded a total of 8 podcasts over the course of the semester. At the beginning of the project, students received 60 to 90 minutes of technical training on how to create and upload podcasts to their blogs. All podcast tasks were contextualized around the theme of study abroad.

Scripted Pronunciation Podcasts

Students recorded 5 scripted pronunciation podcasts (pre-, scripted 1, 2, 3, and post-), each between 2 and 3 minutes in length and related to study abroad. The texts5 used in the pre- and post- podcasts were identical, lasted about 3 minutes, and were first-hand accounts of a French or German student beginning a study abroad experience in the U.S.
The texts for podcasts 1, 2, and 3 were chosen to prepare students for the contextualized podcast tasks and were read by a NS. Students listened to the podcasts and then made their
recordings, which were posted as podcasts to personal blogs they had created for their German or French course.

Extemporaneous Podcasts

Students recorded a total of 3 extemporaneous podcasts during the semester. Texts that students listened to and completed during the pronunciation podcasts served as a model for students to use for each of these podcasts. See Table 1 for a description of each task.

Pronunciation Attitude Inventory

The pre-test PAI survey (based on Elliott, 1995) consisted of 12 Likert-type questions that assessed students' attitudes toward pronunciation and 9 background information questions (see Appendix A). The post-test PAI survey consisted of the same 12 Likert-type questions, 8 additional Likert-type questions, and 6 open-ended questions specific to students' attitudes toward the podcasting project. The 14 additional items on the post inventory assessed students' likes and dislikes with regard to the project, what they found helpful for improving their pronunciation, and any suggested changes.

Procedure

After making their own recordings, students were required to listen to classmates' extemporaneous podcasts and post comments on the content. The extemporaneous podcasts were graded by the instructor of each class using a rubric that took into account content, coherency and organization, pronunciation and fluency, accuracy, creativity, and impact on the listener (see Appendix B). For the scripted pronunciation podcasts, a NS assistant listened to each student's recorded podcast, provided a written assessment with detailed feedback to the student (see Appendix C), and occasionally left comments on the podcasts on students' blogs. All students maintained an individual blog, where each podcast was posted. The blogs and podcasts were therefore available for anyone on the Internet to visit (the sketch after Table 1 illustrates the feed format that makes such postings subscribable). Table 1 provides the timeline and details for these tasks.

Table 1. Pronunciation and Podcast Tasks

Step 1: Pre-Pronunciation Survey (PAI)
Step 2: Pre-Task Listening
Step 3: Scripted Pronunciation Podcast 1 (study abroad experience) – Listen and Pronounce
Step 4: Extemporaneous Podcast 1 – Intercultural story/misunderstanding that occurred either in the US or abroad and what you learned from it (2-3 minutes)
Step 5: Scripted Pronunciation Podcast 2 (interview with someone who had studied abroad) – Listen and Pronounce
Step 6: Extemporaneous Podcast 2 – Interview someone who has studied abroad in a French- or German-speaking country and discuss stereotypes s/he had of people in that country before s/he went and stereotypes people had of him/her as an American. (3-4 minutes)
Step 7: Scripted Pronunciation Podcast 3 (description of a French/German city) – Listen and Pronounce
Step 8: Extemporaneous Podcast 3 – Research a French/German town in which you would be interested in studying abroad. Then create a radio advertisement (what to see, do, eat, sleep, university, classes, etc.) for the city. Remember that you are trying to encourage your classmates to visit you here, so make it sound interesting. (3-4 minutes)
Step 9: Post-Task Listening
Step 10: Post-Pronunciation Survey (PAI)
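As a technical aside for readers unfamiliar with the format, what turns a blog-posted audio file into a subscribable podcast is an RSS feed whose items carry an enclosure pointing at the audio. The sketch below builds such a minimal feed with Python's standard library; the channel title, URLs, and file name are our hypothetical examples, not taken from the study.

```python
# Minimal sketch of the RSS 2.0 structure behind a podcast feed:
# a channel whose items reference audio files via <enclosure>.
# All names and URLs here are hypothetical.
import xml.etree.ElementTree as ET

rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "Intermediate French Podcasts"
ET.SubElement(channel, "link").text = "http://blogs.example.edu/student01/"
ET.SubElement(channel, "description").text = "Student pronunciation podcasts"

item = ET.SubElement(channel, "item")
ET.SubElement(item, "title").text = "Extemporaneous Podcast 1"
# The enclosure element is what podcast clients download and play.
ET.SubElement(item, "enclosure",
              url="http://blogs.example.edu/student01/podcast1.mp3",
              length="1048576", type="audio/mpeg")

print(ET.tostring(rss, encoding="unicode"))
```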

RESULTS

Data Analysis

For the purposes of analysis, students' podcasts (8 per student) were downloaded and stored on a computer or CD. They were assigned random numbers and then judged by two raters6 in each language, for a total of four judges in the study: one NS of the target language and one NNS (who tested at or above an advanced low proficiency level) for each language.7 All raters were graduate students in German or French. Before beginning to rate the samples, the judges attended a training session with the researchers, where they rated several samples from the data set together. Two judges for each language were used to account for any possible variation between raters, since Munro and Derwing (1995) found that raters notice different factors when rating comprehensibility and accentedness.

Each podcast was rated using a 5-point comprehensibility scale (completely, mostly, fairly, almost not, not comprehensible) and a 5-point accentedness scale (nativelike, almost nativelike, between nativelike and nonnative, more nonnative, and nonnative). The 5-point scale was chosen to give raters an uneven number of options, but not too many from which to choose (Anderson-Hsieh & Koehler, 1988; Bongaerts, van Summeren, Planken, & Schils, 1997; Elliott, 1995; Major, 1987; Olson & Samuels, 1973; Oyama, 1976; Snow & Hoefnagel-Höhle, 1977; Piper & Cansin, 1988; Thompson, 1991). Since a strong accent does not necessarily affect comprehensibility (Munro & Derwing, 1995), the two scales were chosen to assess both how well students can be comprehended by a NS and how accented their speech is compared to that of a NS. Accentedness and comprehensibility are two common characteristics considered in previous pronunciation studies (Anderson-Hsieh & Koehler, 1988; Bongaerts et al., 1997; Derwing & Rossiter, 2003; Derwing & Munro, 1997; Elliott, 1995; Thompson, 1991) and were therefore chosen in this study as appropriate measures of NNS pronunciation.

The samples in our study ranged from 2-4 minutes in length in order to give students a chance to ease into the text and to allow them to practice prosody. Since students were producing longer segments of speech, we hoped that they would listen both to how individual words were pronounced and to how they were strung together in the NS examples, to help them improve their prosody when speaking. To allow raters to take note of especially nativelike or non-nativelike prosody, they were instructed to listen to each sample in its entirety before applying the 5-point scale.8 Judges were instructed to rate samples using only whole numbers between 1 and 5. In order to assess the differences between extemporaneous and scripted speech, both types of samples were used (see, for example, Ramírez-Verdugo, 2006).

For the quantitative analysis of the data, the two raters' scores were averaged in each language. In 95% of the cases, the raters varied by no more than one number (2 vs. 3, for example). A Wilcoxon Signed Ranks test was used to compare the results of both the PAI pre- and post-tests and the results of the accentedness and comprehensibility scales. This method of analysis was chosen because it is a non-parametric test that analyzes the magnitude of the differences between matched pairs. Since we were analyzing the change between pairs (pre-test and post-test speech segments, for example), this test was deemed most appropriate for our analysis.
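To make this analysis concrete, the sketch below shows how such a paired pre/post comparison could be run with the Wilcoxon signed-rank test in Python's scipy library. The ratings are invented for illustration and stand for 5-point scores already averaged across the two raters; this is our sketch, not the authors' actual analysis script.

```python
# Sketch of the paired pre/post comparison described above, using the
# Wilcoxon signed-rank test from scipy. The ratings are hypothetical
# 5-point scores (1 = completely comprehensible ... 5 = not comprehensible),
# averaged across the two raters for each student.
from scipy.stats import wilcoxon

pre_test  = [2.0, 1.5, 3.0, 2.5, 2.0, 1.0, 2.5, 3.0, 2.0, 1.5]
post_test = [1.5, 1.5, 2.5, 2.0, 2.0, 1.0, 2.0, 3.0, 1.5, 1.5]

# wilcoxon() discards zero-difference pairs by default and tests whether
# the paired differences are symmetrically distributed around zero.
statistic, p_value = wilcoxon(pre_test, post_test)
print(f"W = {statistic}, p = {p_value:.3f}")
```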
In addition to the Wilcoxon Signed Ranks test, the percentage of students who improved between tasks was calculated (see Table 2). As mentioned above, for the pronunciation samples, the pre-test was compared to the post-test, the scripted tasks to the extemporaneous tasks, scripted 1 to 2 and 2 to 3, and extemporaneous 1 to 2 and 2 to 3, in a search for statistically significant differences among the samples. In addition, each podcast was analyzed according to comprehensibility and accentedness and will be discussed accordingly in the results. The French and German samples were, of course, analyzed separately, since students read different texts in different languages for each task.
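The percent-improvement figures reported below in Table 2 amount to counting, for each pair of tasks, the students whose averaged rating moved toward the better end of the scale. A minimal sketch follows; the helper name and data are ours, and we assume here that a move toward 1 (the "completely comprehensible"/"nativelike" end) counts as improvement.

```python
# Hypothetical helper for the percent-improvement comparisons in Table 2:
# the share of students whose averaged rating improved from one task to
# the next. On the 5-point scales used here, 1 is the best rating, so a
# lower score on the second task is treated as improvement (the authors'
# exact coding may differ).
def percent_improved(task_a, task_b):
    pairs = [(a, b) for a, b in zip(task_a, task_b)
             if a is not None and b is not None]  # skip missing recordings
    improved = sum(1 for a, b in pairs if b < a)
    return 100.0 * improved / len(pairs)

scripted_1 = [3.0, 2.5, 4.0, 3.5, 2.0, None, 3.0, 2.5, 4.5, 3.0]
scripted_2 = [2.5, 2.5, 3.0, 3.0, 2.0, 2.5, 2.5, 2.0, 4.0, 3.5]
print(f"S1/S2: {percent_improved(scripted_1, scripted_2):.0f}% improved")
```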

Comprehensibility Ratings

Among the German comprehensibility ratings, there were no significant differences. The difference between the first and second extemporaneous recordings (E1/E2) approached but did not reach significance (p = .066); 44% of the students (n = 9)9 improved in comprehensibility in the second extemporaneous segment. Among the French comprehensibility ratings, there was one significant difference (p = .049): 90% of the students (n = 10) received higher comprehensibility ratings on the second scripted sample compared to the first scripted sample (S1/S2).

Although there was only one significant difference regarding comprehensibility, the trends revealed in the data point to insights into learner patterns regarding comprehensibility. The finding that 30% of German students improved from the pre- to the post-test in terms of comprehensibility coincided with the trend of the entire semester, where for each task no more than 33% of students improved, except for the difference between the first and second extemporaneous tasks, which almost reached significance. While considerable improvement was apparent for the French group between the first and second scripted segments, Table 2 reveals a lower percentage (10%) of improvement for comprehensibility between the second and third scripted segments (S2/S3). Other notable improvements for the French group include a 40% improvement in pre- and post-test scores, comprehensibility improvement when comparing the extemporaneous tasks (40%, 50%), and a 40% improvement in comprehensibility when comparing the first and third scripted tasks with the respective extemporaneous tasks.

Table 2. Percent Improvement in Tasks10

Task Comparison           Pre/post  S1/S2  S2/S3  E1/E2  E2/E3  S1/E1  S2/E2  S3/E3
German Comprehensibility  30%       33%    20%    44%    33%    20%    33%    10%
German Accentedness       50%       44%    10%*   44%    33%    10%    0%*    80%
French Comprehensibility  40%       90%*   10%    40%    50%    40%    20%    40%
French Accentedness       10%       70%*   10%    20%    10%    30%    10%    40%

Note. S = scripted podcast; E = extemporaneous podcast
* Indicates significant difference at the .05 level

Accentedness Ratings

Among the German accentedness ratings, there was a significant difference between the second extemporaneous and scripted (S2/E2) samples (p = .024), with 0% (n = 9) of the students performing better on the extemporaneous sample (see Table 2). There was also a significant difference between the second and third scripted (S2/S3) segments (p = .047), with only 10% (n = 10) of students improving between treatments. Among the French accentedness ratings, there was a significant difference between the second scripted sample and the first scripted (S1/S2) sample (p = .011), with 70% (n = 10) of the students improving.

Regarding improvement of accent, 50% of German students improved from the pre- to the post-test, 44% improved between the first and second scripted tasks (S1/S2), 44% improved between the first and second extemporaneous tasks (E1/E2), and 80% improved from the third scripted to the third extemporaneous task (S3/E3). For the French students, Table 2 reveals that students' accentedness did not improve much from the pre- to post-test (10%). While the greatest increase was between the first and second scripted (S1/S2) tasks (70%), students made only minimal gains in their overall performance when comparing the scripted with the extemporaneous tasks (30%, 10%, 40%).

In addition to overall accent, the raters noted specific sounds with which students had problems. Among the German students, the largest problems concerned differentiating between the [ʏ], [yː], [ʊ] and [uː] sounds (53 out of 85 samples, 62%, were noted to have difficulties), prosody (48 out of 85: 56%),
differentiating between a [z] and an [s] (37 out of 85: 44%), pronouncing [ʁ] (36 out of 85: 42%), differentiating between [ç] and [x] (28 out of 85: 33%), pronouncing [œ] (21 out of 85: 25%), pronouncing [v] (18 out of 85: 21%), and enunciating "-tion" (10 out of 85: 12%). The fact that students had difficulty with prosody and [ʁ] was not surprising, since O'Brien (2004) had similar results in her study. On a positive note, there were also sounds that students produced that were more native sounding. For example, many students pronounced shorter, more common words, such as hier, ich, and Deutsch, with almost native-like proficiency. Eight out of the 85 samples (9%) had almost native-like prosody. For the students in French, the most challenging areas included difficulty making liaisons (14 out of 78: 18%), pronouncing the French [r] (34 out of 78: 44%), problems with silent sounds in word endings (38 out of 78: 49%), and pronouncing the sounds [y] (16 out of 78: 21%) and [ø] (11 out of 78: 14%). Thirty out of 78 (38%) of the samples were given positive comments with regard to prosody.

PAI Results

The PAI was administered at the beginning and at the end of the semester for both groups (pre: n = 22; post: n = 21) in order to compare any changes in students' attitudes with regard to pronunciation. A Wilcoxon Signed Ranks test was used to compare the results of the PAI pre- and post-tests and revealed no significant differences (see Table 3 for each p value). A second test was administered to compare the differences (gains and losses) for each question of the PAI. While a few questions revealed slight variation in the gains and losses, the test confirmed that there were no significant differences in the students' attitudes from the beginning to the end of the semester.

Table 3. p Values for PAI Wilcoxon Signed Ranks Test

Comparison       Asymp. Sig. (2-tailed)
Q1 Post-Pre      .317
Q2 Post-Pre      .317
Q3 Post-Pre      .163
Q4 Post-Pre      .655
Q5 Post-Pre      .206
Q6 Post-Pre      1.000
Q7 Post-Pre      .180
Q8 Post-Pre      .290
Q9 Post-Pre      .210
Q10 Post-Pre     .705
Q11 Post-Pre     .854
Q12 Post-Pre     .380

Note. No significant differences (alpha level of .05).
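The per-item comparisons behind Table 3 amount to running the same paired test once per questionnaire item. The loop below sketches this for two items; the item keys and Likert responses are invented for illustration, not the study's data.

```python
# Sketch of the per-question PAI comparison behind Table 3: one Wilcoxon
# signed-rank test per Likert item, pre vs. post. Responses are invented.
from scipy.stats import wilcoxon

pai_pre  = {"Q1": [4, 3, 5, 4, 2, 4, 3, 5, 4, 3],
            "Q2": [2, 3, 2, 4, 3, 2, 3, 3, 4, 2]}
pai_post = {"Q1": [4, 3, 4, 4, 2, 4, 3, 5, 4, 4],
            "Q2": [2, 3, 3, 4, 3, 2, 3, 2, 4, 2]}

for item in pai_pre:
    stat, p = wilcoxon(pai_pre[item], pai_post[item])
    print(f"{item}: p = {p:.3f}")
```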

While the statistics provide us with some information about students’ attitudes, we must also examine the short answer sections attached to the PAI questions. Although we had anticipated that more students would enjoy the extemporaneous tasks because they encouraged more creativity, 12 out of 20 (60%) participants preferred the scripted podcasts over the extemporaneous podcasts. Some students reported that they took less time and were therefore easier to accomplish: “The pronunciation podcasts were far easier and took much less time, so I liked them more, but I enjoyed the creative podcasts more.” Other students enjoyed listening to the NS model before recording themselves: “Overall, I enjoyed the pronunciation podcasts more. I think this is because on the more creative ones, I wasn’t able to hear someone else pronounce everything, so there were times when I wasn't really sure how to say something, which is kind of frustrating.” Another student felt the scripted texts “helped more with [his/her] accent.” Some students also felt the feedback provided after the scripted texts was extremely helpful: “The comments made on [the pronunciation texts] helped me to see what specifically I was doing wrong, and also it was helpful to hear the words spoken correctly; it made it easier to try to imitate those sounds.” Another student commented on the process of recording the scripted texts: “With the pronunciation podcasts, you can listen over and over until the correct pronunciation is ingrained in your head, which is helpful when you're trying to improve on that pronunciation.” Overall, students reported that they appreciated completing tasks that focused on pronunciation and the model and feedback by a NS. They also recognized the value of the extemporaneous texts for promoting creativity and simulating real life situations. Moreover, many students indicated their desire to participate in a similar project in the future.

GENERAL DISCUSSION

The first three research questions asked whether students' pronunciation improved from the pre-test to the post-test, whether there was a significant difference in pronunciation between the extemporaneous podcasts and the scripted podcasts, and whether students improved with each task. According to the statistical analysis, there were no consistent significant differences from the pre- to the post-test, over time, or between tasks. The only significant difference in terms of improvement was in French students' comprehensibility and accentedness from scripted task 1 to 2.

With regard to change over time between similar tasks, the German students did not improve significantly over time in comprehensibility. However, there were some changes with regard to accentedness. Over the course of the semester, from the pre- to the post-test, 50% of students improved their accent. There was also an improvement among 44% of German students from the first to the second extemporaneous task in comprehensibility and accentedness. A possible reason for this higher rating could be that students were conducting an interview in the second sample and therefore attended more to their pronunciation than they had in the first extemporaneous task. As suggested by Rajadurai (2007), the presence of an interlocutor may encourage students to be more comprehensible to facilitate communication. Although there is no evidence to explain the unexpected result regarding the low rate of accent improvement between scripted tasks 2 and 3, where there was a significant difference, one could speculate that perhaps the German students treated the second scripted segment with more care in their recordings, or that they were simply busy with end-of-the-semester work at the time of the third recording. The French students also did not score as well between the second and third scripted segments as they did between the first and the second in terms of both accentedness and comprehensibility. Perhaps students found the interview (second scripted segment) easier or more interesting to produce (given that they knew both the interviewer, who was their teacher, and the interviewee, who was the NS working with them) than the first scripted segment, which simply discussed the importance of study abroad.

As for the extemporaneous, contextualized tasks, students in both languages showed little improvement. This finding could be due to the fact that students focused more on what they wanted to say than on how they said it; in other words, they focused more on meaning than on form. Since the task required more creativity and students were not able to simply read a prepared text, they may not have devoted as much energy to pronunciation itself or attended as much to form. Whatever the case may be for these unexpected results, it is important to point out that within both learner groups there were no repeated significant changes over time, and in all cases students did not consistently improve with each treatment.

In terms of task type, the lack of consistent significant differences between the scripted and extemporaneous segments indicates that for these participants the two tasks were relatively similar. Only in the case of the German students did 80% rate higher for accentedness on the third extemporaneous task than on the scripted task.
Since this was an advertisement for a city where they might want to study abroad, which allowed for more creativity, perhaps motivation was higher and they were more excited about the assignment and attended more to their accent. In all other cases, though, it is interesting to note that students performed similarly whether they were reading a text or speaking extemporaneously. These results are consistent with those of Moyer (1999) and Munro and Derwing (1994), but contradict the findings of Oyama (1976) and Thompson (1991). Although it had been hypothesized that students would score higher in the scripted samples because they only had to focus on pronunciation, perhaps the lack of focus on meaning hindered their pronunciation, while the focus on meaning in the extemporaneous samples led to increased attention to pronunciation as well.

There are several possible explanations as to why these participants did not experience substantial improvement in their pronunciation in terms of accentedness and comprehensibility. First, perhaps 16 weeks is not a sufficient amount of time to make gains in pronunciation, especially in an intermediate language course whose focus is not specifically dedicated to this task. While Lord's (2008) study indicates some improvement in pronunciation, we must consider that her course focused exclusively on phonetics and pronunciation. Similarly, in Graeme's (2006) study, in which students improved their pronunciation over a semester, they focused only on specific phonemes, not global pronunciation. Although the students in the present study focused more on pronunciation than is typical in fourth-semester language classes, 8 treatments in 16 weeks does not seem to constitute enough devoted time to facilitate a marked improvement. Their in-class work focused mainly on practicing interpretive and interpersonal skills, not specifically pronunciation. As evidenced in O'Brien's (2004) research, students make larger gains in pronunciation in a study abroad context; or, as Lord (2005) illustrated, it may be necessary for students to enroll in a phonetics and phonology course specifically designed to focus on pronunciation in order to make noticeable improvement, as pronunciation was not a focus in either learner group here.

In terms of comprehensibility, it is possible that the lack of significant differences from pre- to post-test is due to a ceiling effect. Most of the participants were already completely or mostly comprehensible at the beginning of the semester, although this fact was not known when participants were recruited.11 For the German pre-test, 10 out of 12 students received a ranking of 1 (completely comprehensible) or 2 (mostly comprehensible), and for the post-test, all of the students received a 1 or a 2. For all of the other tasks throughout the semester, both scripted and extemporaneous, most students continued to receive high scores (1 or 2) for comprehensibility. For the French class, 9 out of 10 students received a rating of 1 or 2 on both the pre- and post-tests for comprehensibility. For all other tasks, students received ratings of 1, 2, or 3. While the lack of perceived change between the pre- and post-test is not encouraging, it is remarkable that most of the students in these fourth-semester German/French classes were already almost completely comprehensible and that, by the end of the semester, everyone was rated as completely or mostly comprehensible.

The results of the accentedness ratings illustrate that students remained more or less the same throughout the semester. The majority of German students (at least 8 for each sample) received a rating of 3 (between native-like and nonnative), 4 (more nonnative), or 5 (nonnative) for all of the treatments. The French students consistently received ratings of 2 or 3 throughout the semester for all treatments, although there was little improvement between the second and third pronunciation tasks. These results also suggest that 16 weeks and 8 treatments are not enough for improvement in terms of accentedness. Although students received feedback from a NS on specific phonemes they needed to practice, there was little time in class to devote to this practice.
Students were also encouraged to make use of the free tutoring to work on these issues, but they were seemingly unable to make substantial improvements on their own, or perhaps chose to focus their efforts elsewhere.

In response to the last research question, whether students had positive attitudes towards pronunciation and felt their pronunciation improved during the semester, the statistics again revealed no significant differences between the pre- and post-tests. In examining the frequencies for each question, most answers stayed the same between the PAIs. It is worth noting, however, that there was some variation for two inventory items. For item 8 (Communicating is much more important than sounding like a native speaker of French/German), more students valued communication by the end of the semester than when they began their intermediate language course. Responses to item 9 (Good pronunciation skills in French/German are not as important as learning vocabulary and grammar) indicated that by the semester's end more students valued grammar and vocabulary over pronunciation. It seemed, therefore, that students valued pronunciation less by the end of the semester. This attitude may not be surprising, since these intermediate courses emphasized communication, vocabulary, and grammar during class and
encouraged students to practice pronunciation independently outside of class as part of the podcasting project. Perhaps their high scores on comprehensibility also communicated to students that they were already comprehensible and therefore did not need to worry as much about their pronunciation as they may have previously, which could help to relieve some anxiety when speaking.

Pedagogical Implications

The results of this study suggest that podcasting and repeated recordings alone are not enough to improve pronunciation over an academic semester. Based on our findings, we have several suggestions for how FL instructors could integrate podcasting into their classes in order to lead to greater advances in pronunciation. Even though the current model in most FL textbooks is to provide pronunciation exercises for students to practice outside of class, which is similar to the design of our project, such independent study does not seem sufficient. If teachers hope that students' pronunciation will improve as a result of outside practice with CDs, MP3s, or podcasts, it may require more focused and consistent pronunciation practice in class, or meetings outside of class with a NS, in addition to the assigned tasks, ideally as a supplement to the podcasting exercises. Once students receive feedback on podcasts, for example, they could work with a NS tutor or with the class to improve specific sounds with which they had difficulty or, more generally, prosody. Another supplement to podcasting tasks could be computer-assisted visual feedback. With appropriate training, students could visually and aurally compare their sounds to those of NSs to improve specific trouble areas (Ehsani & Knodt, 1998; Hardison, 2004; Martin, 2004; O'Brien, 2006). As mentioned above, though, this software should be combined with podcasting, since ASR software often lacks a context (O'Brien, 2006) and podcasts can be recorded for a specific purpose and audience. In addition, more classroom practice in prosody, including pronunciation practice in context, would be useful to students. This type of practice could be accomplished by having students repeat longer discourse such as dialogues, as suggested by Moyer (1999), by drawing students' attention to prosody during communicative tasks, and by including prosody as a component in assessment.

Due to the small sample size, small number of raters, and limited amount of time, we have several suggestions for further research. It would be useful to conduct a similar study with lower proficiency students to see if there might be greater statistical improvement, considering that many of our pre-test ratings were near the highest rating. Since there were only 22 students in this study, the results are not generalizable. Further studies with podcasting that include more students and levels, more NS raters or NNS raters at a superior level,12 and even other languages could be conducted to investigate whether students' pronunciation might improve over a year or even longer. Since there is not a large emphasis placed on pronunciation in most beginning and intermediate language classes, our aim was for podcasts to provide this extra practice that is lacking, and we designed our tasks to encourage students to focus on and be aware of their pronunciation.
It would, therefore, be useful to examine in a follow-up study the results of a similar podcasting project conducted in conjunction with dedicated practice in class and/or with a NS tutor or visualization software to assist students with their specific difficulties. The effect of an interlocutor on pronunciation could also be examined, as the results of the German students corroborated those of Rajadurai (2007), who found that students’ pronunciation improved when speaking with someone else. While our study is based on holistic evaluation and allows primarily for a general account of pronunciation improvement, a more detailed examination of the acquisition of particular pronunciation features, as well as of the impact podcasting can have on these features, would be worth undertaking. Further, although the raters were to take into account the students’ pronunciation at both the segmental and the suprasegmental levels, only one rating scale involved accentedness, so it could have been difficult to distinguish between the two levels. Improvement at one level is not necessarily dependent on
improvement at the other level, and perhaps the lack of ratings allowing for a distinction between segments and prosody may have contributed to the lack of overall significant differences between pre- and post-tests. We suggest the use of measurement instruments allowing for a distinction between the segmental and the suprasegmental levels in future studies of this kind. In addition, samples by NSs and NNSs with little exposure to the target language should be included in the pool of samples in future studies to provide raters with a broader range of levels of comprehensibility for the purpose of comparison. It is possible that, because students in the current study were at similar levels, the raters mainly compared them with each other in terms of comprehensibility; therefore, they were judged similarly. Addressing some of the limitations in this study would provide useful data for future projects and add to the growing number of empirical studies on implementing podcasting in FL classes.

NOTES

1. Both authors contributed equally to this manuscript.
2. Other pioneering projects include those at Duke University and Middlebury College (cited in Thorne & Payne, 2005) and the University of Wisconsin (The University of Wisconsin Language Institute Website, n.d.).
3. Some examples of other universities using podcasts include the Texas Language Technology Center at UT Austin, where podcasts featuring pronunciation and grammar are offered for speakers of Spanish learning Portuguese. The University of Wisconsin at Madison’s Department of German produces podcasts for different levels of language learners studying German. See, for example: http://german.lss.wisc.edu/gdgsa/podcast
4. While we asked participants to provide information on their prior language background, we did not inquire about their prior experience with technology because a 90-minute training session was provided to all participants. Students had access to a soundproof room in the language lab where they could conduct their recordings.
5. All of the authentic texts used for the pronunciation podcasts were found on the Internet.
6. The terms raters and judges are used interchangeably in this paper.
7. Raters were selected from a pool of available Graduate Teaching Assistants (GTAs).
8. The length of speech samples has varied among pronunciation studies from one word (Flege & Munro, 1994; González-Bueno, 1997; Moyer, 1999), to a phrase or sentence (Derwing & Munro, 1997; Flege, Frieda, & Nozawa, 1997; Munro & Derwing, 1998, 2001; Riney & Flege, 1998), or even to a longer 30-90 second clip (Elliott, 1995; Piper & Cansin, 1988).
9. While there were 12 German students in the class, not all of the students completed all of the tasks. Hence, for some of the comparisons, the n is less than 12.
10. The improvement reflects an increased rating from the first task listed to the second task listed.
11. Comprehensibility is a complex feature of pronunciation that could be influenced by a number of factors. The influence of study abroad on pronunciation is an important factor that should be considered in future studies, especially considering that 12 of the 22 students in this study had been abroad.
12. NSs are normally used as raters, and there are also cases where superior-level non-native speakers have been deemed to be appropriate raters (Elliott, 1995; Lord, 2005, 2008; Olson & Samuels, 1973).

13. The questions below were added to the post-questionnaire to collect students’ feedback on the podcasting project. The pre-questionnaire consisted only of the first 12 questions in the PAI. Both questionnaires were completed on-line.

APPENDIXES

Appendix A: Pre- and Post-Surveys

The Pronunciation Attitude Inventory (PAI) (Adapted from Elliott, 1995)

Please read the following statements and choose the response that best corresponds to your beliefs and attitudes. Please answer all items using the following response categories:

5 = Always or almost always true of me
4 = Usually true of me
3 = Somewhat true of me
2 = Usually not true of me
1 = Never or almost never true of me

1. I'd like to sound as native as possible when speaking a foreign language.
2. Acquiring proper pronunciation in a foreign language is important to me.
3. I will never be able to speak a foreign language with a good accent.
4. I believe I can improve my pronunciation skills in my foreign language.
5. I believe more emphasis should be given to proper pronunciation in class.
6. One of my personal goals is to acquire proper pronunciation skills and preferably be able to pass as a near-native speaker of the language.
7. I try to imitate foreign language speakers as much as possible.
8. Communicating is much more important than sounding like a native speaker of my foreign language.
9. Good pronunciation skills in my foreign language are not as important as learning vocabulary and grammar.
10. I want to improve my accent when speaking my foreign language.
11. I'm concerned with my progress in my pronunciation of my foreign language.
12. Sounding like a native speaker is very important to me.

Additional Questions (see Note 13)

Please answer the following questions based on your experiences this semester with the blogs and podcasts. For items 1-8, respond on the scale SA = strongly agree, A = agree, N = neutral, D = disagree, SD = strongly disagree.

1. I enjoyed posting some of my assignments to my blog this semester.
2. I enjoyed reading my classmates’ blogs and listening to their podcasts this semester.
3. I feel my pronunciation improved from recording myself reading texts in the foreign language.
4. I enjoyed getting comments from my classmates on my blog.
5. I read my classmates’ comments regularly.
6. I would like to continue to work on my pronunciation by recording myself in future foreign language classes.
7. I found the comments from the native speaker grader to be helpful.
8. I found recording and listening to pronunciation to be a useful exercise.

9. Comment on the blog/podcasting assignments this semester. Which ones did you enjoy most and least? Why?
10. Did you enjoy the pronunciation or the more creative podcasts more? Why?
11. Did you find the pronunciation or the study abroad podcasts to be more helpful to your learning? Why?
12. Did you like getting feedback on your pronunciation from a native speaker? Why or why not?
13. Would you have preferred getting feedback on your pronunciation from your teacher or one of your classmates? Why or why not?
14. Is there anything you would change about this project?

Appendix B: Extemporaneous Podcast Grading Rubric

Your Podcast

5 points – Content __________
4-5 pts: topic fully discussed, with several examples from your experiences and research
2-3 pts: topic only cursorily discussed, with only one example provided
1 pt: topic barely discussed, with no examples provided

5 points – Coherency and Organization __________
4-5 points: coherent and well-organized, includes title
2-3 points: somewhat difficult to follow, includes title
1 point: not organized, no title

5 points – Pronunciation and Fluency __________
4-5 points: few errors in pronunciation; conversation flows well
2-3 points: a fair amount of pronunciation errors, but still comprehensible; many starts and stops in conversation
1 point: meaning unclear due to pronunciation errors

5 points – Accuracy __________
4-5 points: few errors in spelling and grammar
2-3 points: many spelling or grammar errors, but still comprehensible
1 point: meaning unclear due to spelling or grammar errors

5 points – Creativity __________
4-5 points: creative presentation of topic, including music, pictures, background, special effects, and/or energetic presentation
2-3 points: semi-creative presentation without additional effects
1 point: completely uncreative presentation

5 points – Impact __________
4-5 points: voice is engaging and sounds natural, includes natural pauses and hesitations, variation in voice intonation
2-3 points: voice is not very engaging, little variation in voice intonation, parts of podcast sound read aloud
1 point: voice is not at all engaging, monotone voice, entire podcast sounds read aloud

Total Points __________/30

Appendix C: Scripted Podcast Grading Rubric

Grading Rubric for Pronunciation Podcasts

Since everyone’s pronunciation strengths are different, you will be graded on completion, improvement, clarity of pronunciation, and successful posting to the blog, for a total of 15 points.

Completion (3 pts) __________
1 pt: Less than half of text read
2 pts: Almost all of text read
3 pts: Entire text read

Clarity (3 pts) __________
1 pt: Many parts of podcast hard to understand
2 pts: Parts of podcast hard to understand
3 pts: Entire podcast clear and easy to understand

Improvement (6 pts) __________
1-2 pts: No or only slight improvement from last podcast
3-4 pts: Improvement on one of 2 aspects from last podcast
5-6 pts: Improvement on both aspects from last podcast

Posted to Blog (3 pts) __________
1 pt: Not successfully posted to blog
2 pts: Posted late to blog
3 pts: Successfully posted on time to blog

Total Points __________/15

Your pronunciation goals for next time (self-assessment):

Two aspects of pronunciation you should work on for next time (teacher comments):

ACKNOWLEDGEMENTS

We are grateful to Gillian Lord, Mary Grantham O’Brien, and the three anonymous reviewers for their useful and insightful comments and suggestions. We also thank Roumen Vesselinov for his patience and advice regarding statistical analysis.

ABOUT THE AUTHORS

Lara Ducate is an associate professor of German and foreign language teaching methodology at the University of South Carolina. Her research interests focus on computer-assisted language learning, including blogs, podcasts, wikis, and teacher education.

E-mail: [email protected]

Lara Lomicka is an associate professor of French and linguistics at the University of South Carolina. Her research interests include intercultural learning, blogs, podcasts, wikis, and teacher education.

E-mail: [email protected]

REFERENCES

Amemiya, S., Hasegawa, K., Kaneko, K., Miyakoda, H., & Tsukahara, W. (2007). Development and evaluation of a foreign-word learning system by iPods. In Proceedings of the Sixth IASTED International Conference on Web-Based Education (pp. 264-269). Chamonix, France.
Anderson-Hsieh, J. R., & Koehler, K. (1988). The effect of foreign accent and speaking rate on native speaker comprehension. Language Learning, 38, 561-593.
Bongaerts, T., van Summeren, C., Planken, B., & Schils, E. (1997). Age and ultimate attainment in the pronunciation of a foreign language. Studies in Second Language Acquisition, 19, 447-465.
Brennan, E., & Brennan, J. S. (1981). Accent scaling and language attitudes: Reactions to Mexican American English speech. Language and Speech, 24(3), 207-221.
Chan, A., & Lee, M. J. W. (2005). An MP3 a day keeps the worries away: Exploring the use of podcasting to address preconceptions and alleviate pre-class anxiety amongst undergraduate information technology students. In D. H. R. Spennemann & L. Burr (Eds.), Good practice in practice: Proceedings of the Student Experience Conference (pp. 59-71). Wagga Wagga, NSW: Charles Sturt University.
Chun, D. (1988). The neglected role of intonation in communicative competence and proficiency. Modern Language Journal, 72, 295-303.
Derwing, T., & Munro, M. J. (1997). Accent, intelligibility and comprehensibility: Evidence from four L1s. Studies in Second Language Acquisition, 20, 1-16.
Derwing, T. M., & Rossiter, M. J. (2003). The effects of pronunciation instruction on the accuracy, fluency and complexity of L2 accented speech. Applied Language Learning, 13(1), 1-17.
Diem, R. (2005). Podcasting: A new way to reach students. The Language Teacher, 29(8), 45-46.
Ehsani, F., & Knodt, E. (1998). Speech technology in computer-aided language learning: Strengths and limitations of a new CALL paradigm. Language Learning & Technology, 2(1), 45-60. Retrieved from http://llt.msu.edu

Elliott, A. R. (1995). Foreign language phonology: Field independence, attitude, and the success of formal instruction in Spanish pronunciation. Modern Language Journal, 79, 530-542.
Flege, J. E., Frieda, E. M., & Nozawa, T. (1997). Amount of native-language (L1) use affects the pronunciation of an L2. Journal of Phonetics, 25, 169-186.
Flege, J. E., & Munro, M. J. (1994). The word unit in second language speech production and perception. Studies in Second Language Acquisition, 16(4), 381-411.
Godwin-Jones, R. (2005). Skype and podcasting: Disruptive technologies for language learning. Language Learning & Technology, 9(3), 9-12. Retrieved from http://llt.msu.edu
González-Bueno, M. (1997). Voice-onset-time in the perception of foreign accent by native listeners of Spanish. IRAL, 35(4), 251-267.
Graeme, C. (2006). The short and long-term effects of pronunciation instruction. Prospect, 21(1), 46-66.
Hardison, D. (2004). Generalization of computer-assisted prosody training: Quantitative and qualitative findings. Language Learning & Technology, 8(1), 34-52. Retrieved from http://llt.msu.edu
Harris Interactive. (2007). On campus and beyond. Trends & Tudes, 6(6), 1-5. Retrieved from http://www.harrisinteractive.com/news/newsletters/k12news/HI_TrendsTudes_2007_v06_i06.pdf
Jilka, M. (2000). Testing the contribution of prosody to the perception of foreign accent. New Sounds, 4, 199-207.
Leather, J. (1999). Second-language speech research: An introduction. Language Learning, 49(1), 1-56.
Lee, M. J. W., & Chan, A. (2007). Reducing the effects of isolation and promoting inclusivity for distance learners through podcasting. Turkish Online Journal of Distance Education, 8(1), 85-105. Retrieved from http://tojde.anadolu.edu.tr
Levy, M. (2007). Research and technological innovation in CALL. Innovation in Language Learning and Teaching, 1(1), 180-190.
Lord, G. (2005). (How) can we teach foreign language pronunciation? On the effects of a Spanish phonetics course. Hispania, 88(3), 557-567.
Lord, G. (2008). Podcasting communities and second language pronunciation. Foreign Language Annals, 41(2), 364-379.
Magen, H. (1998). The perception of foreign-accented speech. Journal of Phonetics, 26, 381-400.
Major, R. C. (1987). Phonological similarity, markedness, and rate of L2 acquisition. Studies in Second Language Acquisition, 9(1), 63-82.
Martin, P. (2004). WinPitch LTL II, a multimodal pronunciation software. Paper presented at the InSTIL/ICALL 2004 Symposium on Computer Assisted Learning, Venice, Italy. Retrieved from http://www.isca-speech.org/archive/icall2004/iic4_042.html
McCarty, S. (2005). Spoken Internet to go: Popularization through podcasting. JALT CALL, 1(2), 67-74. Retrieved from http://jaltcall.org/news/index.php
McQuillan, J. (2006a). Languages on the go: Tuning in to podcasting. The International Journal of Foreign Language Teaching, 2(1), 16-18. Retrieved from http://www.tprstories.com/ijflt
McQuillan, J. (2006b). iPod in education: The potential for language acquisition. White paper. Retrieved from http://edcommunity.apple.com/ali/galleryfiles/12071/iPod_Edu_Whitepaper_Language_Aquisition.pdf

Meng, P. (2005). Podcasting and vodcasting: A white paper. Retrieved from http://edmarketing.apple.com/adcinstitute/wp-content/Missouri_Podcasting_White_Paper.pdf
Moyer, A. (1999). Ultimate attainment in L2 phonology: The critical factors of age, motivation, and instruction. Studies in Second Language Acquisition, 21, 81-108.
Munro, M. J. (1995). Nonsegmental factors in foreign accent. Studies in Second Language Acquisition, 17(1), 17-34.
Munro, M. J., & Derwing, T. M. (1994). Evaluations of foreign accent in extemporaneous and read material. Language Testing, 11(3), 253-266.
Munro, M. J., & Derwing, T. M. (1995). Processing time, accent, and comprehensibility in the perception of native and foreign-accented speech. Language and Speech, 38(3), 289-306.
Munro, M. J., & Derwing, T. M. (1998). The effects of speaking rate on listener evaluations of native and foreign-accented speech. Language Learning, 48(2), 159-182.
Munro, M. J., & Derwing, T. M. (2001). Modeling perceptions of accentedness and comprehensibility of L2 speech. Studies in Second Language Acquisition, 23, 451-468.
Munro, M. J., & Derwing, T. M. (2007). The functional load principle in ESL pronunciation instruction: An exploratory study. System, 34(4), 520-531.
O’Brien, M. (2004). Pronunciation matters. Die Unterrichtspraxis, 37(1), 1-9.
O’Brien, M. (2006). Teaching pronunciation and intonation with computer technology. In L. Ducate & N. Arnold (Eds.), Calling on CALL: From theory and research to new directions in foreign language teaching (pp. 1-20). San Marcos, TX: Computer Assisted Language Instruction Consortium.
O'Bryan, A., & Hegelheimer, V. (2007). Integrating CALL into the classroom: The role of podcasting in an ESL listening strategies course. ReCALL, 19(2), 162-180.
Olson, L., & Samuels, S. J. (1973). The relationship between age and accuracy of foreign language pronunciation. The Journal of Educational Research, 66(6), 263-268.
Oyama, S. (1976). A sensitive period for the acquisition of a nonnative phonological system. Journal of Psycholinguistic Research, 5(3), 261-283.
Pennington, M. C. (1989). Teaching pronunciation from the top down. RELC Journal, 20(1), 20-38.
Pennington, M. C., & Richards, J. C. (1986). Pronunciation revisited. TESOL Quarterly, 20, 207-225.
Piper, T., & Cansin, D. (1988). Factors influencing the foreign accent. The Canadian Modern Language Review, 44(2), 334-342.
Rajadurai, J. (2007). Intelligibility studies: A consideration of empirical and ideological issues. World Englishes, 26(1), 87-98.
Ramírez-Verdugo, D. (2006). A study of intonation awareness and learning in non-native speakers of English. Language Awareness, 15(3), 141-159.
Riney, T., & Flege, J. (1998). Changes over time in global foreign accent and liquid identifiability and accuracy. Studies in Second Language Acquisition, 20, 213-243.
Rosell-Aguilar, F. (2007). Top of the pods -- In search of a podcasting "podagogy" for language learning. Computer Assisted Language Learning, 20(5), 471-492.

Rosell-Aguilar, F. (2009). Podcasting for language learning: Re-examining the potential. In L. Lomicka & G. Lord (Eds.), The next generation: Social networking and online collaboration in foreign language learning (pp. 13-34). San Marcos, TX: CALICO.
Seferoglu, G. (2005). Improving students' pronunciation through accent reduction software. British Journal of Educational Technology, 36(2), 303-316.
Snow, C. E., & Hoefnagel-Höhle, M. (1977). Age differences in the pronunciation of foreign sounds. Language and Speech, 20(4), 357-365.
Stanley, G. (2006). Podcasting: Audio on the Internet comes of age. TESL-EJ, 9(4). Retrieved from http://www-writing.berkeley.edu:16080/TESL-EJ
Stevick, E., Morley, J., & Wallace Robinett, B. (1975). Round robin on the teaching of pronunciation. TESOL Quarterly, 9(1), 81-88.
Swain, M., & Lapkin, S. (1995). Problems in output and the cognitive processes they generate: A step towards second language learning. Applied Linguistics, 16, 371-391.
Tan, Y. H., & Mong, K. T. (n.d.). Audioblogging and podcasting in education. Retrieved from http://edublog.net/wp
Tavales, S., & Skevoulis, S. (2006). Podcasts: Changing the face of e-learning. Retrieved from http://ww1.ucmss.com/books/LFS/CSREA2006/SER4351.pdf
Terrell, T. D. (1989). Teaching Spanish pronunciation in a communicative approach. In P. C. Bjarkman & R. M. Hammond (Eds.), American Spanish pronunciation: Theoretical and applied perspectives (pp. 196-214). Washington, DC: Georgetown University Press.
Thompson, I. (1991). Foreign accents revisited: The English pronunciation of Russian immigrants. Language Learning, 41(2), 177-204.
Thorne, S. L., & Payne, J. S. (2005). Evolutionary trajectories, internet-mediated expression, and language education. CALICO Journal, 22(3), 371-397.
The University of Wisconsin Language Institute Website. (n.d.). Retrieved from http://languageinstitute.wisc.edu
Van Els, T., & de Bot, K. (1987). The role of intonation in foreign accent. The Modern Language Journal, 71(2), 147-155.
Volle, L. (2005). Analyzing oral skills in voice e-mail and online interviews. Language Learning & Technology, 9(3), 146-163. Retrieved from http://llt.msu.edu
Weinberg, A., & Knoerr, H. (2003). Learning French pronunciation: Audiocassettes or multimedia? CALICO Journal, 20(2), 315-336.
Young, D. J. (2007). iPods, MP3 players and podcasts for FL learning: Current practices and future considerations. NECTFL Review, 60, 39-49.

Language Learning & Technology
http://llt.msu.edu/vol13num3/warrenelgortcrabbe.pdf

October 2009, Volume 13, Number 3, pp. 87–102

COMPREHENSIBILITY AND PROSODY RATINGS FOR PRONUNCIATION SOFTWARE DEVELOPMENT

Paul Warren, Irina Elgort, and David Crabbe
Victoria University of Wellington

In the context of a project developing software for pronunciation practice and feedback for Mandarin-speaking learners of English, a key issue is how to decide which features of pronunciation to focus on in giving feedback. We used naïve and experienced native speaker ratings of comprehensibility and nativeness to establish the key features affecting the comprehensibility of the utterances of a group of Chinese learners of English. Native speaker raters assessed the comprehensibility of recorded utterances and pinpointed areas of difficulty, and then rated the same utterances for nativeness after segmental information had been filtered out. The results show that prosodic information is important for comprehensibility, and that there are no significant differences between naïve and experienced raters on either comprehensibility or nativeness judgements. This suggests that naïve judgements are a useful and accessible source of data for identifying the parameters to be used in setting up automated feedback.

INTRODUCTION

Most learners of English need to develop their pronunciation ability to the point where it has no serious effect on comprehensibility when they are engaged in oral communication. Some develop this skill naturally over time, and to a reasonable level of accuracy, through imitation of one native speaker norm or another. Others need to work harder at it with expert guidance. Morley (1991, pp. 492-495) provides a comprehensive overview of groups of learners in special need of pedagogical support for pronunciation in both ESL (English as a second language) and EFL (English as a foreign language) settings. Yet Derwing and Munro (2005) lament the “marginalization of pronunciation within applied linguistics” (p. 382). Their informal survey shows either complete omission of pronunciation from some key publications, such as The Handbook of Second Language Acquisition (Doughty & Long, 2003), or only minimal attention to the topic (as in Hedge, 2000; Nunan, 1999). They also found few papers on pronunciation in journals in applied linguistics.

Research in Canada, Britain, and Australia shows that in addition to a lack of training in pronunciation instruction, English teachers in general do not have a strong enough background in phonetics to feel confident teaching pronunciation (Breitkreutz, Derwing, & Rossiter, 2002; Burgess & Spencer, 2000; MacDonald, 2002; Murphy, 1997). Moreover, one of the key features of a pedagogy of pronunciation is necessarily feedback on performance, and yet providing pronunciation feedback is an intensive, time-consuming activity requiring one-to-one work. It is not surprising, then, that pronunciation is often given little attention in the classroom, particularly in the communicative curriculum, where a focus on meaning has long dominated over a focus on form, including phonetic form.

In this context, the computer-assisted language learning approach appears promising, as it can enable students to work on improving their pronunciation independently, focusing on aspects of pronunciation relevant to individual needs, based on L1 (first language) background and language learning goals (Pennington, 1999). Unfortunately, it appears that much of the commercially available pronunciation software does not meet the criterion of being “linguistically and pedagogically sound” (Derwing & Munro, 2005, p. 391; see also Neri, Cucchiarini, Strik, & Boves, 2002). A key requirement for effective CAP (computer-assisted pronunciation, cf. Pennington, 1999) software is that it provides “immediate, useful feedback, especially for those features that are most important for intelligibility” (Levis, 2007, p. 186; see also Neri, et al., 2002).

The data reported in the current paper form part of a larger collaborative project involving researchers in phonology, computer science, and second language pedagogy. The project explores the provision of automated feedback on learners’ pronunciation, in the context of pronunciation development as a component of conversational fluency (Pennington & Richards, 1986). This is not, of course, a new undertaking. Many forms of automated feedback on pronunciation are now appearing, based on a comparison of the learner’s utterance with a target norm stored in the system (for reviews of the issues, see Ehsani & Knodt, 1998; Levis, 2007; Neri, et al., 2002; Pennington, 1999). Such programs are not yet developed to the point where all of the automated responses are useful in guiding learners towards improving their performance, but this is a productive field in which gradual progress is being made (see, for example, Connected Speech by Protea Textware1 or the ISLE software produced by the European ISLE Consortium2). Our aim was to ensure that software development is informed by linguistic understanding, particularly of comprehensibility parameters. The project component presented in this paper aims to identify the principal speech features that contribute to comprehensibility and nativeness.

ACCENTEDNESS AND NATIVENESS, INTELLIGIBILITY, AND COMPREHENSIBILITY

Levis (2007, p. 187) identifies “two overlapping and conflicting” principles in pronunciation research and pedagogy (see also Levis, 2005): the nativeness principle and the intelligibility principle. One characterisation of the difference between nativeness and intelligibility is that the former refers to “how strong the talker’s accent is perceived to be” (Munro & Derwing, 1995, p. 291), or “how different a speaker’s accent is from that of the L1 community” (Derwing & Munro, 2005, p. 385), while intelligibility commonly refers to the extent to which an utterance is actually understood by a listener. Although the nativeness principle continues to be reflected in English teaching curricula and in research concerned with the relationship between foreign accents and identity, the principle of intelligibility has come to the fore in the context of communicative language teaching approaches.

A commonly used alternative label to “nativeness” is “accentedness.” Derwing and Munro (1997, p. 6) use both terms for one of their tasks—their response continuum ranges from “perfectly nativelike” to “extremely accented.” For the current study we have chosen the label “nativeness,” primarily because our rating task uses low-pass filtering of speech in order to focus attention on prosodic features, and this results in the loss of the segmental details that contribute strongly to what is perceived as an accent in a language.

Two further terms that need to be carefully distinguished are intelligibility and comprehensibility. The former is frequently assessed through transcription tasks, while comprehensibility is more usually measured using human rater judgements (Derwing & Munro, 1997; Derwing, Munro, & Carbonaro, 2000; Munro & Derwing, 1995, 1999, 2001). Comprehensibility typically refers to a listener’s perception of the amount of effort involved in understanding a particular non-native speaker (NNS).
The two measures (intelligibility and comprehensibility) appear to be well correlated (e.g., Munro & Derwing, 1999), which suggests that the effort associated with understanding a NNS is indicative of the listener’s ability to correctly process the NNS utterances. In the current study we were concerned with comprehensibility ratings (i.e., a measure of the effort required by raters to understand the utterances they are asked to listen to).

PROSODY, COMPREHENSIBILITY, AND NATIVENESS

One reason for our focus on prosodic features in this study was that their impact on intelligibility has been acknowledged both in longstanding teacher beliefs and, more recently, in pronunciation instruction research (Derwing, Munro, & Wiebe, 1998; Derwing & Rossiter, 2003; Hahn, 2004). This has led to an increased recognition of the role of prosody in the comprehensibility and accentedness of native and non-native speech (Anderson-Hsieh, Johnson, & Koehler, 1992; Munro & Derwing, 1999), with prosodic factors often producing more extreme results than segmental factors (Anderson-Hsieh, et al., 1992;
Benrabah, 1997; Hutchinson, 1973; Tiffen, 1992). Indeed, inappropriate timing and stress patterns are often cited as major contributors to intelligibility deficit (Adams, 1979; Hahn, 1999, 2004; Kenworthy, 1987; Nelson, 1982) or “unnaturalness” (Ono, 1991).

There are further pedagogical reasons for a focus on prosodic aspects of non-native speech. For instance, one study of the influence of age, motivation, and instruction on phonological performance (Moyer, 1999) varied the type of phonological feedback given to learners, to include either segmental aspects alone, or both segmental and suprasegmental aspects of learners’ performance. The type of phonological feedback significantly affected learning outcomes, and “subjects who were given both suprasegmental and segmental feedback scored closer to native” (Moyer, 1999, p. 95). In another study (Derwing, et al., 1998), three instruction types were used: two based on pronunciation, that is, segmental and global (the latter including stress, intonation, and rhythm), and one with no specific pronunciation instruction (providing a control group). Sentences read aloud and learner-produced narratives were recorded at the beginning and end of a 12-week course of instruction. Non-expert native speakers rated the sentences for comprehensibility and accentedness, and excerpts from the narratives for comprehensibility, accentedness, and fluency. Training resulted in improvement in the read sentences for both the segmental and global groups, while for the narrative data only “speakers who had had instruction emphasizing prosodic features such as rhythm, intonation, and stress could apparently transfer their learning to a spontaneous production” (Derwing, et al., 1998, p. 406). Hardison (2004) also found that computer-assisted prosody training with a real-time pitch display produced significant improvement in both prosody and segmental accuracy, as judged by native speaker raters, and Hirata (2004) found a similar effect for English-speaking learners of Japanese.

Our project focused on Mandarin-speaking learners of English (MSLEs) both as the largest group of English language learners and as a group that is likely to be particularly affected by important language differences in key aspects of prosodic structure (Hansen, 2001; Pennington & Ellis, 2000; Pennington & Richards, 1986). These include the lexical use of tone in Mandarin but not in English; differences in basic rhythmic structures (Adams, 1979; Grabe, 2002); and the greater use in Mandarin of tonal range to indicate stress (Kratochvil, 1998; Shen, 1990). Thus, Chao (1980) showed that, through an association of stress with pitch, Chinese learners of English produce phrases with a pitch pattern determined by the stress patterns of the separate words, rather than using an intonation pattern more appropriate to the phrase as a whole. Similarly, Juffs (1990) found that the most frequent stress errors in Chinese English result from using a tonic stress movement to mark lexical stress, and that differences in the syllable structure of the two languages also affect stress assignment. Tajima, Port, and Dalby (1997) observed many segmental errors in Mandarin English that reflect a tendency to avoid consonant clusters by either deleting consonants or inserting epenthetic vowels (see also Hansen, 2001; Lin, 2001; Weinberger, 1997), impacting the rhythmic pattern of the utterance. They also noted a reduced difference between stressed and unstressed vowel durations.
Munro and Derwing (1999) noted that intonation is important in native speaker ratings of the comprehensibility and accentedness of Mandarin English. Rhythmic factors were highlighted by Tajima et al. (1997), who used LPC resynthesis and dynamic time warping to align Mandarin English with native English timing patterns, and found a significant increase in intelligibility, from 39% to 58%. Their alignment procedures (pp. 8-9) also involved so-called “discrete” changes (i.e., removing or inserting segments that were or were not in the original Mandarin, to match the English target). They concluded that “there is good reason to believe that non-native speakers would benefit from training programs which focus on various temporal aspects of their speech” (p. 21).

The initial prototype software module used in our project focused on stress patterns as one key feature affecting comprehensibility. Recognition trials—using a combination of features based on vowel duration, amplitude, pitch, and vowel quality—produced automatic stress recognition rates for NS (native speaker) English of up to 92.6% (Xie, Andreae, Zhang, & Warren, 2004). Duration and amplitude were the most useful features, along with the vowel quality features associated with reduced (therefore unstressed) vowels.
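To give a concrete sense of the kind of acoustic evidence such a stress recognizer draws on, the sketch below computes duration, amplitude, and pitch values for pre-segmented vowel intervals. This is a minimal sketch in Python using the librosa library, not the implementation of Xie et al. (2004); the interval input format, the pitch-tracking settings, and the exact feature set are our own assumptions.

```python
import numpy as np
import librosa

def vowel_features(wav_path, intervals):
    """Duration, amplitude, and pitch measures for vowel intervals.

    `intervals` is a list of (start_s, end_s) tuples, e.g., from a
    forced alignment -- a stand-in for the segmentation a stress
    recognizer would need, not part of Xie et al.'s (2004) system.
    """
    y, sr = librosa.load(wav_path, sr=None)
    # Frame-level F0 track; NaN where the signal is unvoiced.
    f0, voiced, _ = librosa.pyin(y, fmin=75.0, fmax=400.0, sr=sr)
    times = librosa.times_like(f0, sr=sr)
    feats = []
    for start, end in intervals:
        seg = y[int(start * sr):int(end * sr)]
        mask = (times >= start) & (times < end) & voiced
        feats.append({
            "duration_s": end - start,
            "rms_amplitude": float(np.sqrt(np.mean(seg ** 2))),
            "mean_f0_hz": float(np.nanmean(f0[mask])) if mask.any() else None,
        })
    return feats
```

A classifier trained over features like these (plus vowel-quality measures) would then label each vowel as stressed or unstressed.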

Although these results are comparable to those produced by similar systems, they still do not provide an adequate basis for feedback to language learners. Fine-tuning the parameters used by the software might result in some improvement in recognition. But so too might a strategy of allowing the software development to be informed by native speaker judgements of non-native speech, just as feedback provided to learners by CAP software should be consistent with human feedback (Cucchiarini, Strik, & Boves, 2000a; Derwing, et al., 2000; Kim, 2006; Levis, 2007). Thus, it is critical to establish which prosodic features affect NS listener judgements of comprehensibility and nativeness, in order to evaluate the analysis measures used by the software. The rest of this paper reports on the procedures we used to gather data on native speaker perceptions of MSLE utterances, and discusses the results and possible implications for the use of the data.

A RATING STUDY OF THE COMPREHENSIBILITY AND NATIVENESS OF MSLE SPEECH

Since the overall goal of our larger project was to develop interactive software for pronunciation training with a focus on prosodic aspects of learner speech, we conducted a series of tasks that aimed to establish the links between comprehensibility, nativeness, and the segmental and prosodic features of non-native speech. Comprehensibility and nativeness ratings were collected from both experienced and naïve raters. In addition, the experienced listeners were asked to identify specific areas of difficulty in the utterances they heard. We chose to ask experienced listeners because they are more likely than naïve listeners to be able to pinpoint perceived problem areas. These areas included both prosodic features, such as lexical and sentence stress, rhythm, and pitch, and segmental features, such as consonant and vowel articulation.

Our nativeness ratings focused on prosody. This was achieved by using low-pass filtered speech (see also Derwing & Munro, 1997; Munro, 1995; Trofimovich & Baker, 2006; Van Els & De Bot, 1987), removing detailed information concerning the consonant and vowel segments in the speech and causing listeners to focus on prosodic features such as the timing features of duration, rate, and rhythm, as well as amplitude and intonation. The resulting speech is incomprehensible, since it is deprived of any interpretable segmental and lexical content. Participants’ judgements of nativeness are therefore based solely on the prosodic features that are preserved under such conditions (Derwing & Munro, 1997). While it can be argued that the intonation pattern is severely de-contextualised, since for instance listeners cannot know whether pitch accents are being placed on the appropriate words or syllables for the intended meaning of the utterance, we believe that the low-pass filtered speech conveys sufficient non-segmental information for our judges to assess the nativeness of the more general prosodic aspects of the utterances. The results we present below seem to bear this out.

Another important aspect of our rating studies is that they included both experienced and naïve judgements of the same utterances. This allows us to evaluate ratings from experienced and naïve listeners in comparable conditions. This is of methodological importance, since it provides some evidence for the relative merits of using trained and experienced versus naïve listeners for such judgements.
For instance, previous research (Thompson, 1991) has indicated higher reliability in accentedness judgments from experienced raters.

Speech Material

The source materials were recordings of 5 MSLEs enrolled in 12-week English language courses at Victoria University of Wellington. Only female speaker recordings were used in this study, in order to simplify the speech analysis parameters used in the computational component of the project. The ages of these 5 speakers ranged from 21 to 27, and their language proficiency scores were at a level sufficient for entry into university undergraduate study programmes (their local test scores were equivalent to at least IELTS 6.0). They had been in New Zealand for at least 10 weeks, and all
subsequently entered degree programmes at Victoria University of Wellington. Given their location and their intentions regarding further study, it can be claimed that New Zealand English was, at the time of the experiment, their target variety.

The materials were based on a set of phonologically rich isolated sentences used in the New Zealand Spoken English Database (NZSED: Warren, 2002). Pre-selected sentences were used, rather than excerpts from free narratives (Derwing & Munro, 1997; Derwing, et al., 1998), reflecting similar studies that compare listener ratings with automatic speech recognition/evaluation (Cucchiarini, Strik, & Boves, 1997; Cucchiarini, et al., 2000a; Cucchiarini, Strik, & Boves, 2000b). Using sentence materials based on those in NZSED also meant that we had access to a large set of comparison NS materials, which was exploited in developing materials for the nativeness rating task.

We selected a set of 100 sentences from the 200 used in the NZSED project. The selected sentences contained no low-frequency words, as determined by the Range program (Nation & Heatley, 2001), and no other words that were likely to be unfamiliar to our target learner population, as judged by an experienced English language teacher. This reduced the likelihood of word mispronunciation by non-native speakers due to unfamiliarity. The 100 sentences were then read aloud (after quiet reading for familiarisation) by 5 female MSLEs. A final set of 50 utterances (10 from each of 5 speakers) was chosen so as to optimize the range of segmental and prosodic features of MSLE speech and to exclude hesitations, repeats, or restarts. The sentences in this final set had an average length of 11.3 words (range 7-15), and were long enough for rhythm and rate characteristics of the speaker to emerge. Examples are given in (1) and (2) below.

(1) The price range is smaller than any of us expected

(2) The world is becoming increasingly dangerous but hardly anyone cares
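Range is a dedicated vocabulary-profiling program; purely as an illustration of this screening step, the sketch below flags sentences containing words that fall outside a high-frequency wordlist. The wordlist file name and its one-word-per-line format are assumptions for the example, not Range's actual data format.

```python
import re

def flag_off_list_words(sentences, wordlist_path):
    # Hypothetical wordlist file: one high-frequency word per line.
    with open(wordlist_path, encoding="utf-8") as f:
        common = {line.strip().lower() for line in f if line.strip()}
    flagged = []
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        rare = [w for w in words if w not in common]
        if rare:  # keep only sentences that need a second look
            flagged.append((sentence, rare))
    return flagged

# e.g., flag_off_list_words(candidate_sentences, "first_2000_words.txt")
```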

A further 50 utterances, from age-matched female native speakers in the NZSED project, were included in the nativeness rating task. Again, this set consisted of 10 sentences from each of 5 speakers. They were different sentences from those selected for the MSLEs, and had an average length of 11.9 words. In other respects (lexical frequency, etc.) they were comparable with the MSLE sentences.

For the nativeness-rating task, the native and non-native speech materials were subjected to low-pass filtering (with a cut-off frequency of 350 Hz), removing most of the segmental information while leaving prosodic features largely intact (see also Derwing & Munro, 1997; Trofimovich & Baker, 2006). In addition to forcing the judgement of nativeness to be based on prosodic features, this also has the advantage of reducing the impact of any possible mismatch between the target English variety of the learners and that of the raters, since such a mismatch is likely to be carried by segmental features such as vowel quality.3
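The paper specifies only the 350 Hz cut-off, so the following sketch should be read as one plausible way of producing such stimuli; the choice of a Butterworth filter, its order, and the zero-phase application are our assumptions.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

def lowpass_for_prosody(in_wav, out_wav, cutoff_hz=350.0, order=8):
    """Remove segmental detail while keeping F0, amplitude, and timing."""
    sr, audio = wavfile.read(in_wav)              # 16-bit PCM assumed
    audio = audio.astype(np.float64)
    # Zero-phase Butterworth low-pass at the 350 Hz cut-off used here.
    sos = butter(order, cutoff_hz, btype="lowpass", fs=sr, output="sos")
    filtered = sosfiltfilt(sos, audio, axis=0)
    # Rescale to the 16-bit range to avoid clipping on output.
    filtered *= 32767.0 / max(1.0, np.max(np.abs(filtered)))
    wavfile.write(out_wav, sr, filtered.astype(np.int16))
```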

Raters

Ten naïve and six experienced raters were used in the study, all native speakers of New Zealand English. The naïve group consisted of staff and students of Victoria University of Wellington whose area of expertise and/or study was not related to language or linguistics. This group had no regular contact with Mandarin speakers of English or any other non-native speakers of English. The experienced group consisted of teachers in the English Proficiency Programme at Victoria University of Wellington. As is the case with many English language teachers, they had little phonetic training and minimal expert knowledge of intonation and prosody. They had minimal knowledge of Mandarin or other Chinese languages, but had considerable experience in working with Mandarin learners of English, who at the time of the study made up a sizeable proportion of the students on the English Proficiency Programme.

The majority of studies which involve native speaker ratings of L2 (second language) pronunciation use either only expert raters (Anderson-Hsieh, et al., 1992; Cucchiarini, et al., 1997, 2000a, 2000b) or only naïve raters (Derwing & Munro, 1997; Munro & Derwing, 1995). Studies that use experts sometimes
include raters from different expert backgrounds (Cucchiarini, et al., 2000a, 2000b), for example, phoneticians and speech therapists, to make a comparison and evaluate the reliability of expert ratings produced by different groups. However, to our knowledge there is only one study (Thompson, 1991) that compares the ratings of experienced and naïve raters. This is a significant issue, both because expert or experienced raters are generally harder to recruit, and because some studies show disparity between the judgements of expert or experienced raters on the one hand, and naïve or inexperienced raters on the other. For example, older studies cited in Cucchiarini et al. (2000b) indicate low reliability for expert fluency ratings. However, Thompson (1991) observed that experienced listeners were more reliable and more lenient in accentedness ratings than inexperienced listeners.

Procedure

The study consisted of two separate sessions, which differed slightly for experienced and naïve raters. In their first session (comprehensibility rating), naïve listeners completed three tasks for each utterance:

i) First, they rated the comprehensibility of the recorded utterances. The following clarification was provided to encourage raters to focus on comparable criteria: “In carrying out this rating, please think about how much effort you had to put into working out what was being said.” Raters listened to each utterance once, without seeing a transcription of the utterance, before giving a comprehensibility rating on a scale from 1 (“not easy to understand”) to 9 (“very easy to understand”). Our use of a 9-point scale is based on that of Derwing and Munro (1997), except that theirs ranged from “extremely easy to understand” to “extremely difficult or impossible to understand” (p. 5). Note also that in their methodological study of scales used in accent rating, Southwood and Flege (1999) indicate that using anything with fewer than 9 points is likely to yield unsatisfactory outcomes.

ii) Raters were then presented with the orthographic transcription in a response booklet and were asked to mark specific areas of difficulty that affected comprehensibility.

iii) Finally, raters were asked to comment in the response booklet on general areas of difficulty affecting comprehensibility across the utterance as a whole.

For tasks ii) and iii), raters were able to listen to the utterance as many times as they needed.

The experienced listeners followed the same procedure as above, except that between tasks i) and ii) they carried out the following additional task: they heard the utterance one more time, still without seeing the orthographic transcription. Their instruction screen for this part of the study read “Thinking about the utterance as a whole, indicate on the next page of your response booklet whether any of the following areas caused particular difficulty for understanding,” after which they were given a list of phonetic and prosodic features to choose from, namely pronunciation of consonants, pronunciation of vowels, word stress, sentence stress, rhythm, intonation, and rate (i.e., a range of segmental and suprasegmental features that have previously been associated with listener effort in understanding). Our intention was that using these categories would provide us with some structured information about the types of difficulty experienced by the raters. However, the raters were also able to add other areas of difficulty, in their own words.

This additional task was included in order to obtain more precise data from experienced listeners on aspects of pronunciation and prosody that might affect comprehensibility judgements, for use in our further analysis. We believed that naïve listeners would not be able to provide such data in a readily interpretable form, because of unfamiliarity with the appropriate linguistic terminology. This additional task distinguishes our study from previous comprehensibility studies, where listeners either only rate overall comprehensibility, or are required to assign specific ratings for identified features, rather than actually identifying features that cause difficulty in comprehension.

In the second session, raters were asked to provide nativeness ratings (“Enter your rating of how much this was like a native-speaker”) for each of the 100 examples of low-pass filtered speech (50 NS utterances along with the 50 NNS utterances used in the comprehensibility task), presented in random order. Listeners heard each utterance twice, and assigned a rating from 1 (“not at all native-like”) to 9 (“very like a native speaker”). Derwing and Munro (1997) similarly used a 9-point scale in their accentedness task, but with reversed endpoints, from “no accent” to “extremely strong accent.”

As well as removing segmental cues to lexical content, the low-pass filtering also eliminated voice quality information conveyed by segmental properties (e.g., by vowel quality). We believe that it is reasonable to assume that this, along with the random mix of the NNS items with previously unheard NS items, made it unlikely that listeners would have based their judgements of nativeness on remembered aspects of the NNS utterances previously heard in the comprehensibility rating session. In addition, our nativeness rating task, unlike that used by Derwing and Munro (1997), did not present raters with transcripts of the sentences to refer to while assigning nativeness ratings, ensuring that their ‘feel for’ nativeness was based solely on the available prosodic information.

Presentation of speech stimuli and collection of rating data were controlled by E-Prime software (Schneider, Eschman, & Zuccolotto, 2002). Raters entered data directly onto response sheets for the more qualitative aspects of the first session. Two presentation orders of the utterances in the first session were used; utterances were placed into two blocks, and the presentation orders differed in how these two blocks were ordered. Within each group of raters (experienced and naïve), half of the participants were randomly allocated to each order, to reduce any impact of practice effects on judgements for individual utterances, particularly effects that might result from increasing familiarity with MSLE pronunciation. For the nativeness rating session, a new random presentation order of utterances was determined for each rater by the software.
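The presentation logic itself was handled by E-Prime; purely as an illustration, the sketch below reproduces the two counterbalancing schemes described here (two block orders in session 1, a fresh random order per rater in session 2), with hypothetical rater and item identifiers.

```python
import random

def build_presentation_orders(raters, block_a, block_b, seed=2009):
    """Session 1: two block orders, counterbalanced across raters.
    Session 2: a fresh random order of all items for each rater.
    Rater/item identifiers (and the seed) are hypothetical."""
    rng = random.Random(seed)
    shuffled = list(raters)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    session1 = {r: block_a + block_b for r in shuffled[:half]}
    session1.update({r: block_b + block_a for r in shuffled[half:]})
    session2 = {}
    for r in raters:
        order = block_a + block_b
        rng.shuffle(order)
        session2[r] = order
    return session1, session2
```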
RESULTS

This section presents summary results from the two tasks, for both experienced and naïve listeners, as well as comparisons of results for the two rater groups and comparisons of the results for the two tasks. Detailed discussion of the results follows in the next section.

Reliability

So that we could be confident that our rating data would be of use in software development, we first assessed inter-rater agreement by two methods. First, we transformed the correlations between each pair of raters into Z-scores and calculated the mean (Hatch & Lazaraton, 1991). Second, to allow comparison with other published research using the same method, we calculated intraclass correlations (Shrout & Fleiss, 1979). For the comprehensibility-rating task, we obtained for the entire group of 16 raters (10 naïve, 6 experienced) a Pearson coefficient (r) of .75, significant at p < .01, and an intraclass correlation coefficient (ICC) of .954, p < .01. The equivalent analysis of the nativeness rating data for the entire group of raters over the complete set of 100 utterances (50 native speaker and 50 MSLE) gave a Pearson coefficient (r) of .74, significant at p < .01, and an ICC of .931, p < .01.

For the native speaker utterances alone, the analysis of nativeness ratings gave an r of .68 and an ICC of .824; for the non-native speaker utterances, r was .74 and ICC was .937 (all significant at p < .01). The lower figures for native speakers most likely resulted from a more restricted range of rating values given for these speakers, giving less scope for a clear correlation effect. However, statistical comparison of the ICC figures showed no significant difference between the reliability scores for ratings of native and non-native speakers. Note that our overall reliability scores compare well with those reported in the literature (e.g., r of .71 and .70 for comprehensibility and accent ratings respectively for the naïve raters reported in Derwing et al., 1998, and ICC of .968 and .987 for comprehensibility and accent ratings reported by Munro and Derwing, 2006).
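For readers who wish to reproduce these two agreement statistics, the sketch below computes the mean pairwise correlation via Fisher's Z transformation (Hatch & Lazaraton, 1991) and an intraclass correlation. We assume the ICC(2,k) variant (two-way random effects, average measures) from Shrout and Fleiss (1979); the paper does not state which variant was used.

```python
import numpy as np
from itertools import combinations

def mean_pairwise_r(ratings):
    """Mean inter-rater correlation via Fisher's Z transformation.
    `ratings` is an (n_raters, n_items) array of rating scores."""
    zs = [np.arctanh(np.corrcoef(ratings[i], ratings[j])[0, 1])
          for i, j in combinations(range(len(ratings)), 2)]
    return np.tanh(np.mean(zs))   # back-transform the mean Z to r

def icc_2k(ratings):
    """ICC(2,k): two-way random effects, average measures (assumed variant)."""
    x = np.asarray(ratings, dtype=float).T        # items x raters
    n, k = x.shape
    grand = x.mean()
    ss_items = k * np.sum((x.mean(axis=1) - grand) ** 2)
    ss_raters = n * np.sum((x.mean(axis=0) - grand) ** 2)
    ss_error = np.sum((x - grand) ** 2) - ss_items - ss_raters
    ms_items = ss_items / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_items - ms_error) / (ms_items + (ms_raters - ms_error) / n)
```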

Because we wished to assess our comprehensibility ratings against identification of problem areas by the group of experienced listeners, we also needed to assess the reliability of comprehensibility ratings given by this group alone. Munro and Derwing (1995), for example, pointed out that previous research (e.g., Gass & Varonis, 1984) has shown that comprehensibility rating “tends to improve with increased exposure to foreign-accented speech” (p. 297), which is likely to be the case with English language teachers in New Zealand, who have high exposure to MSLE pronunciation. Thompson’s (1991) experiment, in which experienced and inexperienced raters evaluated the degree of foreign accent in speech samples from Russian-born NNSs of English, also showed that experienced raters (college-educated native speakers who spoke a foreign language fluently, had lived and studied abroad, had taken a course in linguistics, and had frequent contacts with Russian speakers of English) were, as a group, significantly more lenient towards deviations in L2 pronunciation than the inexperienced NS raters. However, experienced raters’ judgements were more reliable and did not fluctuate as much, compared to inexperienced raters (Thompson, 1991). In addition, we wished to compare ratings from naïve and experienced listeners (English language teachers, in our case) in order to determine whether experienced ratings are in agreement with those given by naïve listeners, and to improve the ecological validity of the study.

Our second analysis therefore tests whether each group of raters showed a good level of reliability, and whether there were measurable differences in the comprehensibility ratings given by experienced and naïve listeners. The Pearson coefficients within each group of raters were .72 and .74 for the 6 experienced and 10 naïve listeners respectively, and the corresponding ICC values were .883 and .929. All values were significant at p < .01, and values for the two rater groups did not differ significantly from one another, indicating good and comparable levels of agreement within each of the groups. Mean ratings within each group were calculated for each of the 50 utterances. The overall means were 5.97 and 6.02 for the experienced and naïve rater groups respectively (on the 9-point scale), and a matched-pairs t-test indicated that these did not differ (t(49) = 0.528, p = .60). In addition, a correlation analysis of the utterance means for each group showed a high level of agreement between experienced and naïve listeners (r = .92, p < .001).

Comprehensibility and areas of difficulty

The above analyses have confirmed good overall levels of inter-rater reliability in both tasks, and a high level of agreement between the two rater groups in the comprehensibility task. These results give us confidence that we can generalize to naïve listeners any association that we may find between the comprehensibility ratings and the indications of areas of difficulty given by the experienced listeners. In the context of the overall project goals and our focus on prosodic features, our next analysis addressed the question of whether the comprehensibility ratings given by experts were reliably associated with these same experts’ indications of difficulty in areas related to prosodic structure, namely intonation, rhythm, stress, and rate.
Comprehensibility and areas of difficulty

The above analyses have confirmed good overall levels of inter-rater reliability in both tasks, and a high level of agreement between the two rater groups in the comprehensibility task. These results give us confidence that we can generalize to naïve listeners any association that we may find between the comprehensibility ratings and the indications of areas of difficulty given by the experienced listeners. In the context of the overall project goals and our focus on prosodic features, our next analysis addressed the question of whether the comprehensibility ratings given by experts were reliably associated with these same experts’ indications of difficulty in areas related to prosodic structure, namely intonation, rhythm, stress, and rate. (It should be noted, of course, that a positive answer to this question does not entail a negative answer to the corresponding question about segmental features; that is, features in both areas may be closely associated with comprehensibility.)

To determine whether comprehensibility ratings were associated with specific areas of difficulty identified in the utterances, a logit model (Agresti & Liu, 2001; Liang & Zeger, 1986) was applied to the experts’ rating data and the seven problem areas that were open for identification on their second hearing of the utterance (recall that this was still prior to seeing the orthographic transcription of the utterance). This analysis revealed a significant association of comprehensibility ratings with identifications of problems in each of the following areas: sentence stress, consonant pronunciation, vowel pronunciation, and intonation (each at p < .01), as well as rhythm and word stress (each at p < .05), with the strength of the association decreasing in the order given. In each case, a lower comprehensibility rating was more likely to be accompanied by an indication of a problem in that area. Unlike other authors (e.g., Munro & Derwing, 2001), we found that problems in speech rate showed no significant association with the comprehensibility rating. The generally slow rate of the NNS utterances may have made it difficult for the listeners to discriminate between them in terms of speech rate issues. (The mean rates were 3.08 and 5.48 syllables/second for the NNS and NS utterances respectively, t(98) = 20.64, p < .001.)
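By way of illustration, the sketch below shows one way such a model can be fitted with generalized estimating equations (Liang & Zeger, 1986) in the Python statsmodels library. The data layout, column names, and the exchangeable working correlation are illustrative assumptions rather than details of the original analysis, which followed Agresti and Liu (2001) in handling the multiple-choice structure of the problem-area data.

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Hypothetical long-format data: one row per expert x utterance x problem
    # area, with a binary flag for whether that area was marked as problematic
    # and the comprehensibility rating given to the utterance (assumed file).
    df = pd.read_csv("expert_ratings_long.csv")

    # One GEE logit model per problem area, clustering repeated observations
    # by utterance with an exchangeable working correlation.
    for area, sub in df.groupby("problem_area"):
        model = smf.gee("problem_flag ~ comprehensibility",
                        groups="utterance", data=sub,
                        family=sm.families.Binomial(),
                        cov_struct=sm.cov_struct.Exchangeable())
        result = model.fit()
        print(area,
              result.params["comprehensibility"],
              result.pvalues["comprehensibility"])

A negative coefficient for comprehensibility in such a model would correspond to the pattern reported above: the lower the rating, the more likely a problem flag.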


Factor analysis of the seven problem areas reduced them to five components. The first of these included significant loadings for sentence stress, intonation, and rhythm, which we can call a sentence prosody factor. The other components each loaded on a single one of the remaining four areas: consonant pronunciation, vowel pronunciation, word stress, and rate. Subsequent analysis showed significant correlations of the comprehensibility ratings with each of sentence prosody, word stress, consonant pronunciation, and vowel pronunciation (with r in the range .24 to .31).
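The extraction method, rotation, and software used for the factor analysis are not specified here, so the following sketch, using maximum-likelihood factor analysis with varimax rotation from scikit-learn on a hypothetical utterance-by-area matrix of expert problem counts, should be read as one plausible reconstruction rather than the study’s actual procedure.

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    AREAS = ["sentence stress", "intonation", "rhythm", "word stress",
             "consonants", "vowels", "rate"]

    # Hypothetical matrix: rows = utterances, columns = the seven problem
    # areas, entries = how many experts flagged that area (assumed file).
    X = np.loadtxt("problem_counts.txt")  # expected shape (50, 7)

    fa = FactorAnalysis(n_components=5, rotation="varimax", random_state=0)
    fa.fit(X)

    # Inspect which areas load together on each component; in the study,
    # sentence stress, intonation, and rhythm shared a single component.
    for i, loadings in enumerate(fa.components_):
        top = [a for a, l in zip(AREAS, loadings) if abs(l) > 0.4]
        print(f"component {i}: {top}")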
Nativeness

The next set of analyses concerned the nativeness ratings. These were obtained, as indicated above, in order to require listeners to focus on the prosodic features of the utterances. The reliability statistics reported above showed that overall inter-rater reliability was good for this task (r = .75, ICC = .954, p < .01). However, more detailed analysis shows numerically greater reliability for the 10 naïve listeners than for the 6 experts, with r at .73 and .69 and ICC at .904 and .822 for the two groups respectively (significant at p < .01). (Note that the corresponding analysis of the comprehensibility ratings showed a smaller difference between the two rater groups.) In addition, the naïve listeners made a greater distinction between native and non-native speakers (mean ratings of 6.11 and 3.90 respectively) than the experienced listeners (5.83 vs. 4.43). However, this difference was not confirmed in an Analysis of Variance of each rater’s mean ratings for each speaker group, which showed a significant main effect of speaker group (F(1,14) = 50.65, p < .001)4, but no interaction of speaker group with rater group (F(1,14) = 2.55, p > .1). Since our subsequent analysis of comprehensibility was based on data only from our naïve listeners (recall that our experts were not asked to complete this part of the test), we were reassured that the results presented in this section failed to show any significant differences between ratings from the experienced raters and those from the naïve raters.

Comprehensibility and nativeness

In identifying materials that can be used to assess the software, our goal was to isolate utterances that present difficulties on the basis of their prosodic features. The analysis of comprehensibility and the identification of problem areas went some way towards achieving this goal. The analysis of nativeness ratings also contributed in this direction, in that we could select items simply on the basis of low scores in this task. However, we were also interested in the relationship between comprehensibility and perceived nativeness, and in particular in any association between the two. The presence of a positive relationship might suggest that the prosodic features not eliminated by the low-pass filtering were indeed contributing to comprehensibility.

So our next question was whether the nativeness ratings (of low-pass filtered speech, so based largely on prosodic features) and the comprehensibility ratings (of unfiltered speech, so including segmental features) from our naïve listeners were correlated, as would be predicted by a model of comprehensibility that acknowledges the contribution of the prosodic features assessed in the nativeness rating task. Since the same MSLE utterances were used in each rating task, we addressed this question in a simple correlation analysis of the average comprehensibility and nativeness rating scores given to each MSLE utterance. In Figure 1 these rating scores for each utterance in the two tasks are plotted against each other. There was a significant overall correlation (r = .59, p < .001), confirming a positive relationship between nativeness and comprehensibility. Note also that the data are distributed in a manner suggesting that perceived nativeness provides a baseline on top of which comprehensibility is built: comprehensibility ratings usually exceeded nativeness ratings for individual utterances, and were rarely lower. Indeed, our results here mirror those of Derwing and Munro (1997), who observed that “accent ratings are harsher than perceived comprehensibility ratings” (p. 11).
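For readers unfamiliar with the technique, stimuli of the kind used in the nativeness task are typically prepared by low-pass filtering, which removes most segmental detail while leaving pitch and rhythm audible. The sketch below illustrates this with SciPy; the 350 Hz cutoff and the file names are illustrative assumptions, as the study’s exact settings are not given here.

    import soundfile as sf
    from scipy.signal import butter, filtfilt

    # Read a mono recording of one utterance (assumed file name).
    signal, sr = sf.read("utterance.wav")

    # Design a low-pass Butterworth filter. A cutoff of a few hundred Hz
    # preserves the fundamental frequency (and hence intonation and rhythm)
    # while obscuring most segmental information; 350 Hz is an illustrative
    # choice, not the study's documented setting.
    b, a = butter(N=6, Wn=350, btype="low", fs=sr)

    # Zero-phase filtering avoids shifting the signal in time.
    filtered = filtfilt(b, a, signal)
    sf.write("utterance_lowpass.wav", filtered, sr)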

Figure 1. Average ratings from naïve listeners for nativeness (horizontal axis) and comprehensibility (vertical axis) for 50 Mandarin English utterances (10 utterances from each of 5 speakers). Rating scales range from 1 to 9 in each case (see text for details). The two sets of ratings correlate significantly (r = .59, p < .001).

DISCUSSION

The preceding section has presented the main results from our rating study. These show that inter-rater reliability in the rating tasks is good, and that experienced and naïve raters show a high degree of agreement in the comprehensibility rating task, though less so in the nativeness task. In addition, comprehensibility ratings are significantly associated with experienced listeners’ identification of problems in sentence prosody (intonation, rhythm, and sentence stress) as well as in segmental pronunciation (of both vowels and consonants). Finally, naïve listener ratings in the two tasks (with and without segmental information) are significantly correlated, suggesting that the prosodic information used in the nativeness task is also important in the comprehensibility task, and confirming the analysis associating comprehensibility ratings with problem areas.

The results of the rating studies, then, provide useful information for future work towards establishing a framework for designing computer-aided pronunciation training tools. First, the studies show that experienced and naïve raters agree in their judgements of L2 comprehensibility, so there is no evidence of an advantage in using language teachers as raters. Second, the studies also show that naïve listeners are no less reliable than experienced raters in distinguishing between native and non-native accents on the basis of prosodic information alone.


Note that this pattern differs from that presented by Thompson (1991), who found greater inter-rater reliability in accentedness judgements from experienced raters than from naïve raters. It should be stressed, however, that there are important differences between her studies and ours. Most importantly, our raters listened to low-pass filtered speech to arrive at judgements of nativeness, while Thompson’s raters made accentedness judgements on unfiltered speech. Recall that we used filtered speech because of our primary interest in the prosodic aspects of speech, which were the chosen target of the computational part of our overall research project. Prosody and intonation are perhaps the least well covered aspects of pronunciation in typical English teacher-training programmes, and so it should come as no surprise that our experienced raters, English language teachers, were no more reliable than our naïve raters. In consequence, apart from being able to request ratings of specific aspects of speech production, for which a certain degree of familiarity with phonetic description would be useful, there seems little advantage in recruiting experienced raters rather than using more readily available untrained listeners.

In addition, our rating studies have confirmed that specific features of both prosodic and segmental aspects of speech, as identified by experienced raters, correlate well with overall judgements of the comprehensibility of L2 utterances by naïve listeners. This finding is in line with Munro and Derwing’s (2001) conclusion, based on previous studies (Anderson-Hsieh et al., 1992; Brennan & Brennan, 1981; Munro & Derwing, 1999), that “simple counts of segmental errors and prosodic assessments correlate well with listeners’ ratings of L2 speech on such dimensions as accentedness and comprehensibility, whether or not the listeners are phonetically trained” (p. 453). Cucchiarini et al.’s (2000a) study, which compared automatic scores produced by speech recognition algorithms with expert ratings of pronunciation quality, also showed that specific ratings collected from expert raters (phoneticians and speech therapists) were highly correlated with the overall pronunciation ratings. Cucchiarini et al. conclude that these findings “warrant the use of overall ratings of pronunciation as a sole reference for the automatic score” (p. 118).

Finally, the findings of the factor analysis, which groups together sentence stress, intonation, and rhythm as a sentence prosody factor, warrant an approach to software development that includes all three features in learning activities aimed at improving sentence prosody. This is, of course, not to deny that the other significant factors (word stress, consonant pronunciation, and vowel pronunciation) also need to be treated within the pedagogical framework used in software development.

SUMMARY

In the context of developing software that would offer useful and effective feedback to Mandarin-speaking learners of English on their pronunciation, we have assessed the relative importance of different speech features through the effect they have on the communicative quality of the utterance, as measured by comprehensibility ratings. Such data are important to the issue of how to evaluate and fine-tune the acoustic information that the software derives from learner speech and subsequently uses in assessing learner performance.
We have identified a number of issues that need to be addressed in developing pedagogical and software models for learner pronunciation instruction. It was clear that prosodic features have an important effect on comprehensibility, a finding that supports previous studies suggesting that time spent on such features is well justified (see the supporting references discussed in our Introduction). Rehearsal of prosodic features in a semi-communicative context can be provided through software that targets the features with the strongest effect on comprehensibility, and conscious awareness of those features can be raised through explanatory notes associated with the feedback that the software provides. Feedback, rehearsal, and language awareness are three learning opportunities that are well supported in curriculum development (Crabbe, 2003).


It has also been acknowledged that accuracy, relevance, and ease of interpretation are key issues in the provision of feedback through automated software for CAP. The two main problems with existing CAP software are the limitations of automatic speech recognition technologies, which have yet to reach maturity, and the lack of a clear pedagogical basis in software design. In order to address the technological limitations, the research reported here set out to establish relevant comprehensibility data to be used as a feedback parameter in developing CAP software. Our exploration of a methodology for incorporating native speaker judgements into decision-making on the parameters used in developing pronunciation feedback software offers a useful contribution in this area. Our initial results show that holistic comprehensibility ratings by naïve native speakers provide good information with which to fine-tune CAP software for prosodic features. This implies that where the development of such software incorporates native speaker judgements in determining acceptability, using naïve speakers is sufficient for the purpose. We believe that the exploration of how such native speaker judgements can be used as a parameter in selecting features for automated feedback on pronunciation is a productive area for further research.

NOTES

1. Connected Speech (2001). Protea Textware Pty Ltd. http://www.proteatextware.com.au

2. ISLE (Interactive Spoken Language Education). The ISLE Consortium. http://nats-www.informatik.uni-hamburg.de/~isle/index.html

3. A reviewer has suggested that a prosodic difference between the native and non-native recordings used in our experiment (and therefore a potential difference between target varieties for the learners and the raters) might lie in the New Zealand tendency to use High Rising Terminals (i.e., rising intonation patterns on statement utterances). In fact, these are extremely rare in sentence readings (and were absent from our recordings), since they function largely as discourse markers in conversations or in longer narratives (see Warren & Britain, 2000).

4. Levene’s test showed no significant difference in the variances for the two rater groups.

ACKNOWLEDGEMENTS

The research reported here formed part of a project funded by the New Economy Research Fund administered by the NZ Foundation for Research, Science and Technology (contract number: VICX0011). We would like to acknowledge the contributions made to this research by the other project members, Peter Andreae, Mengjie Zhang, Jason Xie, and Mike Doig, and to thank Ave Coxhead and our panels of raters for their valuable assistance. Our thanks go also to the editors and three anonymous reviewers for their constructive comments on an earlier version of this paper.

ABOUT THE AUTHORS

Paul Warren is Associate Professor in the School of Linguistics and Applied Language Studies at Victoria University of Wellington, New Zealand. Paul's primary research interests are in psycholinguistics, in particular spoken word recognition and the use of intonation in sentence processing. Since moving to New Zealand in 1994, he has combined these interests with a growing fascination with the development of New Zealand English.


E-Mail: [email protected]

Dr. Irina Elgort is a lecturer in academic development at Victoria University of Wellington, New Zealand. Her research interests include L2 vocabulary acquisition, reading, and computer-assisted language learning (CALL). She teaches a CALL paper in the MA in TESOL/Applied Linguistics programme at Victoria.

E-Mail: [email protected]

David Crabbe is Associate Professor in the School of Linguistics and Applied Language Studies at Victoria University of Wellington, New Zealand. David works on language curriculum development and learner autonomy. He teaches and supervises in the graduate applied linguistics programme at Victoria University of Wellington and has a broader management role at the university in the area of learning and teaching.

E-Mail: [email protected]

REFERENCES

Adams, C. (1979). English speech rhythm and the foreign learner. The Hague: Mouton.

Agresti, A., & Liu, I. (2001). Strategies for modeling a categorical variable allowing multiple category choices. Sociological Methods and Research, 29, 403-434.

Anderson-Hsieh, J., Johnson, R., & Koehler, K. (1992). The relationship between native speaker judgments of non-native pronunciation and deviance in segmentals, prosody, and syllable structure. Language Learning, 42, 529-555.

Benrabah, M. (1997). Word-stress: A source of unintelligibility in English. IRAL, XXXV(3), 157-165.

Breitkreutz, J., Derwing, T., & Rossiter, M. (2002). Pronunciation teaching practices in Canada. TESL Canada Journal, 19, 51-61.

Brennan, E., & Brennan, J. (1981). Measurements of accent and attitude towards Mexican-American speech. Journal of Psycholinguistic Research, 10, 487-501.

Burgess, J., & Spencer, S. (2000). Phonology and pronunciation in integrated language teaching and teacher education. System, 28, 191-215.

Chao, Y. R. (1980). Chinese tones and English stress. In L. R. Waugh & C. H. van Schooneveld (Eds.), The melody of language: Intonation and prosody (pp. 41-44). Baltimore: University Park Press.

Crabbe, D. (2003). The quality of language learning opportunities. TESOL Quarterly, 37(1), 9-34.

Cucchiarini, C., Strik, H., & Boves, L. (1997). Automatic evaluation of Dutch pronunciation by using speech recognition technology. Paper presented at the 1997 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), Santa Barbara, CA.

Cucchiarini, C., Strik, H., & Boves, L. (2000a). Different aspects of expert pronunciation quality ratings and their relation to scores produced by speech recognition algorithms. Speech Communication, 30, 109-119.

Cucchiarini, C., Strik, H., & Boves, L. (2000b). Quantitative assessment of second language learners' fluency by means of automatic speech recognition technology. Journal of the Acoustical Society of America, 107(2), 989-999.

Derwing, T. M., & Munro, M. J. (1997). Accent, intelligibility, and comprehensibility: Evidence from four L1s. Studies in Second Language Acquisition, 19(1), 1-16.


Derwing, T. M., & Munro, M. J. (2005). Second language accent and pronunciation teaching: A research-based approach. TESOL Quarterly, 39, 379-397.

Derwing, T. M., Munro, M. J., & Carbonaro, M. D. (2000). Does popular speech recognition software work with ESL speech? TESOL Quarterly, 34, 592-603.

Derwing, T. M., Munro, M. J., & Wiebe, G. (1998). Evidence in favour of a broad framework for pronunciation instruction. Language Learning, 48(3), 393-410.

Derwing, T. M., & Rossiter, M. (2003). The effects of pronunciation instruction on the accuracy, fluency, and complexity of L2 accented speech. Applied Language Learning, 13, 1-17.

Doughty, C. J., & Long, M. H. (Eds.). (2003). The handbook of second language acquisition. Malden, MA: Blackwell.

Ehsani, F., & Knodt, E. (1998). Speech technology in computer-aided language learning: Strengths and limitations of a new CALL paradigm. Language Learning & Technology, 2(1), 45-60. Retrieved from http://llt.msu.edu

Gass, S., & Varonis, E. (1984). The effect of familiarity on the comprehensibility of nonnative speech. Language Learning, 34, 65-89.

Grabe, E. (2002). Variation adds to prosodic typology. In B. Bel & I. Marlien (Eds.), Proceedings of the Speech Prosody 2002 Conference (pp. 127-132). Aix-en-Provence: Laboratoire Parole et Langage.

Hahn, L. D. (1999). Native speakers' reactions to non-native stress in English discourse. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign.

Hahn, L. D. (2004). Primary stress and intelligibility: Research to motivate the teaching of suprasegmentals. TESOL Quarterly, 38, 201-233.

Hansen, J. G. (2001). Linguistic constraints on the acquisition of English syllable codas by native speakers of Mandarin Chinese. Applied Linguistics, 22(3), 338-365.

Hardison, D. M. (2004). Generalization of computer-assisted prosody training: Quantitative and qualitative findings. Language Learning & Technology, 8, 34-52. Retrieved from http://llt.msu.edu

Hatch, E., & Lazaraton, A. (1991). The research manual: Design and statistics for applied linguistics. New York: Newbury House.

Hedge, T. (2000). Teaching and learning in the language classroom. Oxford: Oxford University Press.

Hirata, Y. (2004). Computer-assisted pronunciation training for native English speakers learning Japanese pitch and duration contrasts. Computer Assisted Language Learning, 17, 357-376.

Hutchinson, S. P. (1973). An objective index of the English-Spanish pronunciation dimension. Unpublished master's thesis, University of Texas, Austin, TX.

Juffs, A. (1990). Tone, syllable structure and interlanguage phonology: Chinese learners' stress errors. IRAL, XXVIII(2), 99-115.

Kenworthy, J. (1987). Teaching English pronunciation. New York: Longman.

Kim, I. S. (2006). Automatic speech recognition: Reliability and pedagogical implications for teaching pronunciation. Educational Technology and Society, 9(1), 322-344.

Kratochvil, P. (1998). Intonation in Beijing Chinese. In D. Hirst & A. di Cristo (Eds.), Intonation systems (pp. 417-431). Cambridge: Cambridge University Press.


Levis, J. (2005). Changing contexts and shifting paradigms in pronunciation teaching. TESOL Quarterly, 39(3), 369-378.

Levis, J. (2007). Computer technology in teaching and researching pronunciation. Annual Review of Applied Linguistics, 27, 184-202.

Liang, K.-Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13-22.

Lin, Y.-H. (2001). Syllable simplification strategies: A stylistic perspective. Language Learning, 51(4), 681-718.

MacDonald, S. (2002). Pronunciation – views and practices of reluctant teachers. Prospect, 17(3), 3-18.

Morley, J. (1991). The pronunciation component of teaching English to speakers of other languages. TESOL Quarterly, 25, 481-520.

Moyer, A. (1999). Ultimate attainment in L2 phonology: The critical factors of age, motivation and instruction. Studies in Second Language Acquisition, 21(1), 81-108.

Munro, M. J. (1995). Nonsegmental factors in foreign accent: Ratings of filtered speech. Studies in Second Language Acquisition, 17, 17-34.

Munro, M. J., & Derwing, T. M. (1995). Processing time, accent, and comprehensibility in the perception of native and foreign-accented speech. Language and Speech, 38(3), 289-306.

Munro, M. J., & Derwing, T. M. (1999). Foreign accent, comprehensibility, and intelligibility in the speech of second language learners. Language Learning, 49(Supp. 1), 285-310.

Munro, M. J., & Derwing, T. M. (2001). Modeling perceptions of the accentedness and comprehensibility of L2 speech. Studies in Second Language Acquisition, 23, 451-468.

Munro, M. J., & Derwing, T. M. (2006). The functional load principle in ESL pronunciation instruction: An exploratory study. System, 34, 520-531.

Murphy, J. (1997). Phonology courses offered by MATESOL programs in the United States. TESOL Quarterly, 31(4), 741-764.

Nation, P., & Heatley, A. (2001). RANGE: A program for measuring the lexical burden of texts. Wellington: School of Linguistics and Applied Language Studies, Victoria University of Wellington.

Nelson, C. (1982). Intelligibility and non-native varieties of English. The other tongue: English across cultures, 15, 59-73.

Neri, A., Cucchiarini, C., Strik, H., & Boves, L. (2002). The pedagogy-technology interface in computer assisted pronunciation training. Computer Assisted Language Learning, 15(5), 441-467.

Nunan, D. (1999). Second language teaching and learning. Boston: Heinle & Heinle.

Ono, Y. (1991). Experimental phonetic analysis of the speech sounds and prosodic features produced by native and non-native speakers. Language and Culture, 20, 241-288.

Pennington, M. C. (1999). Computer-aided pronunciation pedagogy: Promise, limitations, directions. Computer Assisted Language Learning, 12, 427-440.

Pennington, M. C., & Ellis, N. C. (2000). Cantonese speakers' memory for English sentences with prosodic clues. The Modern Language Journal, 84, 372-389.

Pennington, M. C., & Richards, J. (1986). Pronunciation revisited. TESOL Quarterly, 20, 207-226.


Schneider, W., Eschman, A., & Zuccolotto, A. (2002). E-Prime user's guide. Pittsburgh: Psychology Software Tools, Inc.

Shen, X.-n. S. (1990). The prosody of Mandarin Chinese (Vol. 118). Berkeley: University of California Press.

Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420-428.

Southwood, M. H., & Flege, J. E. (1999). Scaling foreign accent: Direct magnitude estimation versus interval scaling. Clinical Linguistics and Phonetics, 13, 335-349.

Tajima, K., Port, R., & Dalby, J. (1997). Effects of temporal correction on intelligibility of foreign-accented English. Journal of Phonetics, 25, 1-24.

Thompson, I. (1991). Foreign accents revisited: The English pronunciation of Russian immigrants. Language Learning, 41(2), 177-204.

Tiffen, B. (1992). A study of the intelligibility of Nigerian English. In A. van Essen & E. I. Burkart (Eds.), Homage to W. R. Lee: Essays in English as a foreign or second language (pp. 255-259). Berlin: Foris.

Trofimovich, P., & Baker, W. (2006). Learning second language suprasegmentals: Effect of L2 experience on prosody and fluency characteristics of L2 speech. Studies in Second Language Acquisition, 28, 1-30.

Van Els, T., & De Bot, K. (1987). The role of intonation in foreign accent. The Modern Language Journal, 72, 147-155.

Warren, P. (2002). NZSED: Building and using a speech database for New Zealand English. New Zealand English Journal, 16, 53-58.

Warren, P., & Britain, D. (2000). Intonation and prosody in New Zealand English. In A. Bell & K. Kuiper (Eds.), New Zealand English (pp. 146-172). Wellington: Victoria University Press.

Weinberger, S. H. (1997). Minimal segments in second language phonology. In A. James & J. Leather (Eds.), Second language speech: Structure and process (pp. 263-312). Berlin: Mouton de Gruyter.

Xie, H., Andreae, P., Zhang, M., & Warren, P. (2004). Learning models for English speech recognition. In V. Estivill-Castro (Ed.), Proceedings of the Twenty-Seventh Australasian Computer Science Conference (ACSC2004) (Vol. 26, pp. 323-329). Dunedin, New Zealand: Australian Computer Society.
