Hwang, W.-Y., Shadiev, R., Kuo, T. C. T., & Chen, N.-S. (2012). Effects of Speech-to-Text Recognition Application on Learning Performance in Synchronous Cyber Classrooms. Educational Technology & Society, 15 (1), 367–380.

Effects of Speech-to-Text Recognition Application on Learning Performance in Synchronous Cyber Classrooms

Wu-Yuin Hwang, Rustam Shadiev, Tony C. T. Kuo¹* and Nian-Shing Chen²

Graduate Institute of Network Learning Technology, National Central University, Taiwan // ¹Department of Management & Information, National Open University, Taiwan // ²Department of Information Management, National Sun Yat-Sen University, Taiwan // [email protected] // [email protected] // [email protected] // [email protected]
* Corresponding author

ABSTRACT
The aim of this study was to apply Speech-to-Text Recognition (STR) in an effort to improve learning performance in an online synchronous cyber classroom environment. Students’ perceptions and their behavioral intentions toward using STR and the effectiveness of applying STR in synchronous cyber classrooms were also investigated. After the experiment, students from the experimental group perceived that the STR mechanism was easy to use and useful for one-way lectures as well as for individual learning. Most students also expressed that they were highly motivated to use STR as a learning tool in the future. Statistical results showed moderate improvement in the experimental group’s performance over the control group on homework accomplishments. However, once the students in the experimental group became familiar with the STR-generated texts and used them as learning tools, they significantly outperformed the control group students in post-test results. Interviews with participating students revealed that STR-generated texts were beneficial to learning during and after one-way lectures. Based on our findings, it is recommended that students apply STR to enhance their understanding of teachers’ lectures in an online synchronous cyber classroom. Additionally, we recommend that students take advantage of the text generated by STR both during and after lectures.

Keywords
Speech-to-text recognition, Synchronous learning, Homework, Note-taking

ISSN 1436-4522 (online) and 1176-3647 (print). © International Forum of Educational Technology & Society (IFETS). The authors and the forum jointly retain the copyright of the articles. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear the full citation on the first page. Copyrights for components of this work owned by others than IFETS must be honoured. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from the editors at [email protected].

Introduction
A number of studies have reported the benefits of online synchronous teaching and learning for online courses, although some challenges and limitations still require resolution (Chen, Ko, Kinshuk & Lin, 2005; Hastie, Hung, Chen & Kinshuk, 2010; Wang, Chen & Levy, 2010). One of the most common concerns reported is poor audio quality due to restricted bandwidth availability and traffic congestion on the last mile of Internet access (Chen et al., 2005; Hastie et al., 2010; Wang et al., 2010). According to Chen and Wang (2008) and Kanevsky et al. (2006), students who suffer from bandwidth problems during online synchronous lectures can benefit from reading text streams, which may be synchronously typed on a keyboard or transcribed by Speech-to-Text Recognition (STR) technology. Moreover, Chen and Wang (2008) and Wald (2010) emphasized the pedagogical usefulness of text displayed simultaneously for students during a synchronous lecture, as it facilitates better learning. However, previous research tended to focus on issues related to STR application development and improvement of its recognition accuracy rate, rather than on how it can be applied to improve learning performance (Kanevsky et al., 2006; Wald & Bain, 2008; Way, Kheir & Bevilacqua, 2008). Furthermore, most studies applied STR only in a traditional face-to-face teaching setting and not in an online synchronous teaching and learning environment (Ryba, McIvor, Shakir & Paez, 2006; SRS, 2011; Wald, 2010). This study argues that teaching and learning activities in an online synchronous cyber classroom can be better facilitated by using STR technology. For example, students can follow a teacher’s lecture more easily by reading the STR-generated texts if the quality of audio degrades during communications; therefore, STR-generated texts can minimize audio communication difficulties and reduce the chance of missing important information.

At the same time, STR-generated texts can help students attain a better understanding of a lecture’s meaning, allow for simultaneous note-taking during the lecture, and help students to complete homework after the lecture. An experiment was conducted with the aim of applying STR to improve learning performance in an online synchronous cyber classroom environment as well as in a situation wherein an individual student completed homework. Students’ perceptions and behavioral intentions toward using STR and the effectiveness of applying STR on learning performance in a synchronous cyber classroom were also investigated. This study addressed three primary research questions. First, what are the students’ perceptions and behavioral intentions regarding the use of the STR technology in a synchronous learning environment? Second, do the students who use STR-generated texts perform better at accomplishing homework tasks and in post-test evaluations than the students who do not use STR technology? Third, based on interviews, are STR-generated texts beneficial to students?

The remainder of the paper is organized as follows. First, the literature on synchronous teaching and learning with STR is reviewed, and the theories and models used to analyze the usefulness of the designed STR mechanisms are discussed. A description of the study’s method follows. The results and pedagogical implications of the study are then presented. Finally, a few concluding remarks are given.

Literature Review

Online Synchronous Teaching and Learning
A number of studies have demonstrated the benefits of online synchronous teaching and learning for online courses (Chen et al., 2005; Hastie et al., 2010; Wang et al., 2010). For example, Hastie et al. (2010) argued that online synchronous teaching and learning allows teachers and students to establish their communications link, create a social presence, and negotiate learning content. They then embark on the ‘real’ synchronous interactive component of the lesson and the teacher gives the students ‘live’ feedback and evaluation. According to Chen et al. (2005), “…in many situations synchronous solutions for instruction can outperform both asynchronous online instruction and traditional face-to-face education.” However, advantages of online synchronous teaching and learning can be hindered by certain challenges and limitations. Students in the studies of Chen et al. (2005), Hastie et al. (2010), and Wang et al. (2010) occasionally experienced technological challenges during online synchronous learning caused by network traffic congestion. Specifically, the students could not hear the audio clearly. Chen and Wang (2008) drew attention to text chatting during synchronous teaching and learning by emphasizing that it can accommodate interaction in many important ways. First, text chatting can be used to supplement and complement audio when its quality becomes problematic. Second, text chatting can be used by the teacher or by an advanced student to summarize the major points of verbal exchange to review what has been previously said.

Technology of STR and its Accuracy
STR technology translates speech input into text in real time. Way et al. (2008) described STR as a process in which a teacher speaks into a microphone and the speech is recognized and shown synchronously as text for students to read. The accuracy rate is considered one of the fundamental issues in STR studies. Wald and Bain (2008) claimed that the accuracy rate of STR during a lecture depends on a teacher’s lecture experience, abilities, and familiarity with the lecture material. Way et al. (2008) suggested that STR application training should take place in order to achieve good dictation accuracy. Their study demonstrated that the accuracy rate reached 90 percent after moderate STR training. The accuracy rate of the application reached 91 percent after the STR dictionary was customized with unfamiliar domain-specific terminology.
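As an illustration only, an accuracy rate of this kind can be estimated by comparing an STR transcript against a reference transcript. The sketch below assumes a token-level measure (one minus the edit distance divided by the reference length); the studies cited above do not state how their accuracy figures were computed, and the function names and example sentences are hypothetical. For lectures in Chinese, tokens would more plausibly be characters than whitespace-separated words.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # token missing from transcript
                            curr[j - 1] + 1,      # extra token in transcript
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def accuracy_rate(reference, transcript):
    """Assumed accuracy measure: 1 - (edit distance / reference length)."""
    ref = reference.split()
    hyp = transcript.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


# Hypothetical example: one misrecognized word out of ten.
reference = "always encrypt sensitive files before you send them by email"
transcript = "always encrypt sensitive files before you sent them by email"
print(accuracy_rate(reference, transcript))  # 0.9
```

Under this assumed measure, the 85 to 90 percent thresholds discussed in the literature correspond to roughly one recognition error in every seven to ten tokens.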

STR for Education
STR has the potential to become a valuable tool in education as it makes teaching accessible and understandable to all students and it improves the quality of education (Wald, 2010). For example, the Speech Recognition in Schools (2011) project helped students to overcome difficulties in reading, writing, or spelling; the project reported significant improvements in some students’ basic reading, writing, and spelling skills with the support of STR. Ryba et al. (2006) examined STR in the university lecture theatre. Their participants reported that the system had potential as an instructional support mechanism; however, greater accuracy in the system’s recognition of lecture vocabulary needs to be achieved. Wald (2010) and Wald and Bain (2008) developed STR applications to assist deaf students and non-native speakers during lectures. According to this research, the students perceived that text generated by STR could improve learning as long as it was reasonably accurate (i.e., >85 percent). Most students used STR-generated text as an additional resource to verify and clarify what they heard as well as to take and augment their own notes. The students also believed that lecture transcriptions helped them to better understand the lecture content.

Method
This section describes the research method, the learning design, the STR application and training, and the experiment.

Research Method
The method of this study was based on three major steps, as shown in Figure 1. The first step included pretesting. The second step involved experimental treatment. The third step incorporated homework assessment and post-testing.

Figure 1. Flowchart structure for the study

Learning Design
This study conducted an experiment with appropriately designed learning activities by using STR technology in an online synchronous lecturing environment as well as in a setting wherein an individual student completed homework.

Online one-way lectures
This study used the JoinNet™ application (Wang et al., 2010) to support online learning activities, i.e., online synchronous one-way lectures in a synchronous lecturing environment. The JoinNet™ application provides such tools as a whiteboard, a chat box, and audio and video for synchronous communication purposes (see Figure 2). Additionally, a teacher may upload and present PowerPoint® slides in a synchronous cyber classroom using the whiteboard. This study employed a chat box that displayed STR-generated text to the experimental group. Originally, the teacher planned to use the STR application during online one-way lectures and hoped the application could generate the text in real time. However, Petta and Woloshyn (2001) cautioned about significant delays (up to 50 seconds) in transcript generation by the STR application. Thus, the teacher in this study changed the original, ideal approach and prepared the lecture transcripts beforehand using the STR application in order to reduce delay in transcript generation. Afterwards, the teacher copied the pre-generated STR text and inserted it into the chat box’s input field during the actual synchronous lectures. Finally, the teacher pressed the Enter key on his keyboard to display the STR-generated text inside the chat box, so that all students could read it simultaneously.

Figure 2. Screenshot of synchronous one-way lectures

Individual learning
Individual learning for students included studying the content of previous synchronous lectures and doing homework after online classes. The content of previous synchronous lectures was recorded by the JoinNet™ application and stored online so students could review it. Cooper (2007) argued that homework has an immediate effect on the retention and understanding of the material it covers; thus, students completed homework to practice and master the teaching material. According to instructional design theory, homework represents the skills, knowledge, and even attitudes of students that are stated in the instructional objectives. Smith and Ragan (2004) recommended testing complex, “high order” knowledge and skills in the real-world context in which they are actually used, generally with open-ended tasks such as essay writing. Therefore, this study employed essay writing for the students’ homework. The first homework session (HW1) related to the “e-Data Protection” synchronous lecture, and the second homework session (HW2) related to the “File Protection” synchronous lecture. The essay writing activity included: 1) summarizing an online one-way lecture; 2) generating and articulating an understanding and opinion about general concepts of the lecture; and 3) elaborating on individual knowledge and experience. Such a writing activity can challenge students to approach, learn, and explain the complexities of the subject matter in new and thought-provoking ways (Smith & Ragan, 2004).

STR and Training
This study employed Windows® Speech Recognition on the Microsoft® operating system as the STR tool. The choice was made based on the availability of this application for the students and teacher participating in the experiment. Way et al. (2008) argued that this application is similar to a variety of commercial and open-source products in performance and ease of use, and that it is available at no additional cost. Following the general recommendations of Wald and Bain (2008), the teacher started using the STR application 2 months before the experimental course. He trained the system first, and then applied it in an online synchronous cyber classroom environment after achieving an STR accuracy rate of more than 90 percent. The experimental group was engaged in STR training during the first month of the experiment. After training, these students were asked to use STR to complete homework and thereby identify strengths and limitations of the STR application. In addition, STR training could help students participate in a post-study experiment to investigate the effectiveness of applying STR on improving learning performance in a synchronous student-centered learning environment.

Research of the Experiment
This section describes the participants and procedures, the experimental design, the experimental tools, and the methods of statistical analysis.

Participants and experimental procedures
A total of 44 undergraduate students from two classes enrolled in an Information Security course participated in this study. One class with 19 students served as the control group, and the other class with 25 students served as the experimental group. Table 1 shows the participants’ gender and age distributions in the two groups. The Information Security course was administered from March to June 2009. A 2-hour class, either physical (face-to-face) or online, took place on a weekly basis. Physical and online classes were conducted in rotation. The same teacher lectured for both groups in Chinese with the same lecture content. Appendix 1 presents a timeline of the course and a list of lectured topics. After the e-Data Protection and File Protection lectures, students were assigned homework. STR technology was used for and by the experimental group only. The teacher informed the experimental group students at the first class that online one-way lectures would be conducted using the texts pre-generated by the STR application. In addition, the experimental group was encouraged to use the STR technology to complete homework and identify strengths and limitations of STR.

Table 1. Participants’ gender and age distribution in two groups
                     Control group (n=19)       Experimental group (n=25)
Category             Frequency   Percentage     Frequency   Percentage
Gender
  Male               9           47.37          10          40
  Female             10          52.63          15          60
Age (years old)
  21–30              1           5.26           3           12
  31–40              8           42.10          13          52
  41–50              5           26.32          6           24
  51 or above        5           26.32          3           12

Experimental Design
This study employed a quasi-experimental design following the general recommendations of Creswell (2008). The experiment adopted a nonequivalent control group design to evaluate differences in the control and experimental groups’ learning performance. The study administered a questionnaire survey to investigate students’ perceptions and behavioral intentions toward using STR. The study adopted an independent-samples t-test and effect size methods to test the differences in learning performance of the control and experimental groups in accomplishing homework and post-test objectives. The study conducted one-on-one semi-structured interviews with the experimental group after the experiment to explore the potential effectiveness of applying STR in synchronous cyber classrooms.
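The independent-samples comparison named above can be illustrated from group summary statistics alone. The following is a minimal sketch, not the authors' procedure: the paper does not say which variant of the t-test was used, so both the pooled-variance (Student) form and the unequal-variance (Welch) form are shown. The function names are hypothetical; the input figures are the rounded means and standard deviations reported in Table 6, with n = 19 and n = 25.

```python
import math


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Student's t statistic, assuming equal variances in both groups."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic, which does not assume equal variances."""
    return (m1 - m2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


# Control group first, experimental group second (values from Table 6).
pretest = (73.42, 14.15, 19, 74.80, 16.30, 25)
post_test = (70.79, 21.75, 19, 83.20, 12.07, 25)

print(round(pooled_t(*pretest), 3))   # close to the reported -.294
print(round(welch_t(*post_test), 3))  # close to the reported -2.239
```

A negative statistic here simply reflects that the control group mean is subtracted from first; the p-values would additionally require the t distribution, which is omitted from this sketch.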

Experimental Tools
This study employed the following tools: 1) evaluation of students’ prior knowledge (pretest); 2) assessment of students’ levels of cognitive development (HW1 and HW2); 3) evaluation of students’ learning achievement (post-test); and 4) a questionnaire survey and one-on-one semi-structured interviews regarding students’ perceptions and behavioral intentions toward using STR.

The pretest featured 20 multiple choice questions and took place during the first class. The post-test had 10 true or false questions and 10 multiple choice questions, and it took place during the last class. The content of the pretest and the post-test evaluations related to the Information Security course. Both tests were scored on a 100-point scale (with 100 as the highest score), yet the tests were different in content.

The study employed the Taxonomy for Information Security Education (van Niekerk & Thomson, 2010), adapted from Bloom (1956), to assess homework in order to determine students’ level of cognitive development. The taxonomy (see Appendix 2) includes six levels; each level increases in complexity as the learner moves through the levels. The study used the concept as the coding unit and a six-point scale for homework assessment. A score of “1” represented the lowest level of cognitive development, and a score of “6” represented the highest level. The final score for homework was the score that corresponded to the highest level of cognitive development found in the homework. For example, if the highest cognitive level in homework was identified as “4” (Analyze), then the homework was scored with a “4.” The assessments were created by a teacher with more than 10 years of teaching experience in the Information Security domain; thus, the assessments provided superior validity under this condition.

The questionnaire was designed based on the Technology Acceptance Model (Davis, 1986). Four dimensions were covered in the questionnaire: perceived ease of use of STR (PEU); perceived usefulness of STR for learning (PUL); perceived usefulness of STR during online one-way lectures (PUOWL); and behavioral intention (BI) to use STR for learning in the future. According to Davis (1986), PEU is the degree to which a student believes that using STR would be free of physical and mental effort. PUL is the degree to which a student believes that using STR for learning would enhance his or her learning performance. PUOWL is the degree to which a student believes that using STR during online one-way lectures would enhance his or her learning performance. BI is hypothesized to be a major determinant of whether or not a student actually uses STR. Responses to the questionnaire items were scored using a five-point Likert scale, anchored by the end-points “strongly disagree” (1) and “strongly agree” (5). Twenty-four valid answer sheets were obtained from the twenty-five students in the experimental group.

One-on-one semi-structured interviews with subsequent data analysis followed the general recommendations of Creswell (2008). Five students were randomly selected for the interviews. The interviews contained open-ended questions in which students were asked about the following: 1) their experience using the STR application during the experiment; and 2) their opinions about the impact of STR-generated texts on learning. Each interview took approximately 30 minutes; all interviews were audio-recorded with the permission of the interviewee and then fully transcribed for analysis. The text segments that met the criteria for providing the best research information were highlighted and coded. Next, codes were sorted to form categories; codes with similar meanings were aggregated together. The established categories produced a framework for reporting findings on the research questions.
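The homework scoring rule described in this section (each concept is coded with a cognitive level from 1 to 6, and the homework receives the highest level found) can be sketched as follows. This is an illustration under stated assumptions: the function name and the example codes are hypothetical, and only the label for level 4 (Analyze) is given in the text; the remaining labels are assumed from the revised Bloom taxonomy and may differ from those in Appendix 2.

```python
# Level labels: only 4 = "Analyze" is confirmed by the text; the rest are
# assumed from the revised Bloom taxonomy and may differ from Appendix 2.
LEVELS = {
    1: "Remember",
    2: "Understand",
    3: "Apply",
    4: "Analyze",
    5: "Evaluate",
    6: "Create",
}


def homework_score(coded_levels):
    """Return the homework score: the highest cognitive level among coded concepts."""
    if not coded_levels:
        raise ValueError("no coded concepts in this homework")
    for level in coded_levels:
        if level not in LEVELS:
            raise ValueError(f"level {level!r} is outside the six-point scale")
    return max(coded_levels)


# Hypothetical essay whose concepts were coded at levels 2, 4 and 3.
score = homework_score([2, 4, 3])
print(score, LEVELS[score])  # 4 Analyze
```

Because the score is a maximum rather than an average, a single concept coded at a high level determines the result, regardless of how many lower-level concepts the essay contains.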

Statistical Analysis Methods
The study adopted the following methods of statistical analysis:
1. Cohen’s kappa: to evaluate the inter-rater reliability of the assessments (Creswell, 2008; Punch, 2009), i.e., pretest, HW1, HW2, and post-test. The resulting values exceeded 0.72, indicating high reliability.
2. Cronbach’s α: to assess the internal consistency of the survey (Creswell, 2008). The values were PEU = 0.89, PUL = 0.94, PUOWL = 0.97, and BI = 0.84, which indicated that the reliability of the items was satisfactory.
3. Independent-samples t-test: to compare the learning performance of the control and experimental groups (Creswell, 2008) on the pretest, homework, and post-test.
4. The standardized mean difference statistic (referred to as d): Creswell (2008) suggested quantifying the practical strength of the difference between variables through effect size; this approach is important in a quantitative study, especially when a small sample size makes it hard to judge the significance of a statistical test. He suggested that an effect size of .20 is small, .50 is medium (or moderate), and .80 is large.
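To make the fourth method concrete, the sketch below computes a standardized mean difference from group means and standard deviations and labels it with the thresholds quoted above. It assumes one common form of d, in which the difference in means is divided by the square root of the average of the two group variances; the paper does not state which formula was used. The function names and the "negligible" label are hypothetical.

```python
import math


def cohens_d(mean_control, sd_control, mean_experimental, sd_experimental):
    """Standardized mean difference, assuming an equal-weight pooled SD."""
    pooled_sd = math.sqrt((sd_control ** 2 + sd_experimental ** 2) / 2)
    return (mean_experimental - mean_control) / pooled_sd


def effect_label(d):
    """Label |d| using the thresholds quoted from Creswell (2008)."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "negligible"  # label chosen for this sketch, not from the paper


# Rounded homework values from Table 6 (control first, then experimental).
d_hw1 = cohens_d(4.21, 1.47, 4.84, 0.99)
d_hw2 = cohens_d(3.79, 1.18, 4.40, 1.04)
print(round(d_hw1, 2), effect_label(d_hw1))
print(round(d_hw2, 2), effect_label(d_hw2))
```

With the rounded table values, this form gives roughly 0.50 for HW1 and 0.55 for HW2. Results for other rows can differ from the reported figures, since the exact formula and the unrounded data are not available.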

Results and Discussion
In this section the results of the study are presented as they relate to each research question and the pedagogical implications.

Research question (1): What are the students’ perceptions and behavioral intentions regarding using STR technology in a synchronous learning environment?
The questionnaire survey data analysis revealed that almost all items in the dimension “Perceived ease of STR use” were ranked high, as shown in Table 2. This indicates that students generally agreed that the STR application was easy to use. However, item no. 2 was ranked the lowest in this dimension. The interviews with students revealed the reason. The students mentioned particular cases in which it was difficult to attain a high STR recognition accuracy rate. One example is recognizing homophones, i.e., words with the same pronunciation but different meanings. Another example related to the pace of speech. When the speaker spoke too slowly, the STR application recognized one spoken word as two. Conversely, when the speaker spoke too quickly, the STR application recognized two spoken words as one (Wald & Bain, 2008; Way et al., 2008).

Table 2. Perceived ease of STR use
Scale: (1) strongly disagree, (2) disagree, (3) undecided, (4) agree, (5) strongly agree; SD = standard deviation.
No.  (1)  (2)  (3)  (4)  (5)  Mean  SD    Item
1.   1    2    1    11   9    4.04  1.08  Learning to operate the STR is easy for me.
2.   1    2    8    11   2    3.46  0.93  I find it easy to get the STR to do what I want it to do.
3.   1    1    2    18   2    3.79  0.83  Interacting with the STR does not require a lot of my mental effort.
4.   0    1    4    13   6    4.00  0.78  My interaction with the STR is clear and understandable.
5.   0    1    4    16   3    3.87  0.68  It is easy for me to become skillful at using STR.
6.   0    0    3    16   5    4.08  0.58  Overall, I found the STR easy to use.

Table 3. Perceived usefulness of STR for learning
Scale: (1) strongly disagree, (2) disagree, (3) undecided, (4) agree, (5) strongly agree; SD = standard deviation.
No.  (1)  (2)  (3)  (4)  (5)  Mean  SD    Item
7.   0    0    4    14   6    4.08  0.65  Using STR improves the quality of my learning.
8.   0    0    5    13   6    4.04  0.69  STR helps me to accomplish learning tasks more quickly.
9.   0    1    10   10   3    3.62  0.77  Using STR increases my productivity.
10.  0    3    6    12   3    3.62  0.87  Using STR enhances my effectiveness for learning.
11.  0    2    9    10   3    3.58  0.83  Using STR improves my learning achievement.
12.  0    0    4    13   7    4.12  0.68  Overall, I find STR useful in my learning.
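The means and standard deviations in Tables 2 and 3 can be recomputed from the response frequencies. The sketch below is an illustration with a hypothetical function name; it assumes the sample standard deviation (dividing by n - 1), which is the form that matches the reported values for the items checked here.

```python
import math


def likert_summary(counts):
    """Summarize a Likert item from its frequencies.

    counts[i] is the number of respondents who chose score i + 1.
    Returns (n, mean, sample standard deviation).
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two responses")
    mean = sum(score * c for score, c in enumerate(counts, start=1)) / n
    squared_dev = sum(c * (score - mean) ** 2
                      for score, c in enumerate(counts, start=1))
    return n, mean, math.sqrt(squared_dev / (n - 1))


# Item 1 in Table 2: frequencies for scores 1 to 5.
n, mean, sd = likert_summary([1, 2, 1, 11, 9])
print(n, round(mean, 2), round(sd, 2))  # 24 4.04 1.08
```

The total of 24 agrees with the number of valid answer sheets, which is a quick check that a row of frequencies was read correctly.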

Table 3 shows that almost all items in the dimension “Perceived usefulness of STR for learning” were ranked high. This demonstrates that most students agreed that STR technology was useful for learning (Ryba et al., 2006; Wald, 2010; Wald & Bain, 2008). Item no. 11 was ranked the lowest in this dimension. The interviews with the students helped explain this finding. Some students, for various reasons, did not attend online classes; they preferred to make up missed classes by studying the learning materials from textbooks or via the Internet instead of from recorded archives of online classes. These students had no idea STR-generated texts could be used during and after online classes, nor did they realize the benefits of using STR-generated texts for learning. All items in the dimensions “Perceived usefulness of STR in one-way lectures,” shown in Table 4, and “Behavioral intention to use STR,” shown in Table 5, were ranked with high scores. Most students agreed that STR-generated text during one-way lectures was useful for learning, and most students were highly motivated to continue using STR after this study.

Table 4. Perceived usefulness of STR in one-way lectures
Scale: (1) strongly disagree, (2) disagree, (3) undecided, (4) agree, (5) strongly agree; SD = standard deviation.
No.  (1)  (2)  (3)  (4)  (5)  Mean  SD    Item
13.  1    0    2    7    14   4.37  0.97  Use of STR by the teacher during one-way lectures improves the quality of my learning.
14.  1    0    2    10   11   4.25  0.94  Use of STR by the teacher during one-way lectures helps me to accomplish learning tasks more quickly.
15.  1    0    5    10   8    4.00  0.98  Use of STR by the teacher during one-way lectures increases my productivity.
16.  1    0    3    9    11   4.20  0.98  Use of STR by the teacher during one-way lectures enhances my effectiveness in learning.
17.  1    0    8    6    9    3.92  1.06  Use of STR by the teacher during one-way lectures improves my learning achievement.
18.  0    1    1    9    13   4.42  0.77  Overall, I found using STR by the teacher during one-way lectures is useful in my learning.

Table 5. Behavioral intentions to use STR
Scale: (1) strongly disagree, (2) disagree, (3) undecided, (4) agree, (5) strongly agree; SD = standard deviation.
No.  (1)  (2)  (3)  (4)  (5)  Mean  SD    Item
19.  0    0    4    9    11   4.29  0.75  I intend to continue using STR in the future.
20.  0    0    7    9    8    4.04  0.81  I plan to use STR often.
21.  0    0    5    8    11   4.25  0.79  I will strongly recommend the use of STR to others.

Research question (2): Do the students using STR-generated texts perform better in accomplishing homework and post-test objectives than the students who do not use the STR technology?
Table 6 shows the means and standard deviations of students’ scores on the pretest, homework, and post-test and the results of the t-test and effect size. According to the results, all mean values of the experimental group were higher than the mean values of the control group. The results demonstrate that many assessment items had large standard deviation values. The results of the t-test showed no significant difference in the performance of the two groups on the pretest (t=-0.294, p=0.770), HW1 (t=-1.607, p=0.119), and HW2 (t=-1.818, p=0.076). However, the results demonstrated that the students of the experimental group significantly outperformed the students of the control group on the post-test (t=-2.239, p=0.034). An effect size of 0.03 was obtained for the pretest, 0.50 for HW1, 0.55 for HW2, and 0.70 for the post-test. The results revealed that the average student in the experimental group would perform higher than the average student in the control group as follows: 0.50 standard deviations higher in HW1 achievement; 0.55 standard deviations higher in HW2 achievement; and 0.70 standard deviations higher in post-test achievement.

Table 6. Students’ scores on the pretest, homework, and post-test by group and results of the t-test and effect size
             Control           Experimental
Assessment   Mean    SD        Mean    SD        t         d
Pretest      73.42   14.15     74.80   16.30     -.294     0.03
HW1          4.21    1.47      4.84    0.99      -1.607    0.50
HW2          3.79    1.18      4.40    1.04      -1.818    0.55
Post-test    70.79   21.75     83.20   12.07     -2.239*   0.70
* p