Res Sci Educ (2010) 40:639–673 DOI 10.1007/s11165-009-9138-9

Comparison of Earth Science Achievement Between Animation-Based and Graphic-Based Testing Designs

Huang-Ching Wu · Chun-Yen Chang · Chia-Li D. Chen · Ting-Kuang Yeh · Cheng-Chueh Liu

Published online: 2 September 2009
© Springer Science + Business Media B.V. 2009

Abstract This study developed two testing devices, an animation-based test (ABT) and a graphic-based test (GBT), in the area of earth sciences, covering four domains: astronomy, meteorology, oceanography and geology. Both students' achievement on and their attitudes toward ABT, in comparison with GBT, were investigated. The purposes of this study were fourfold: (1) to examine the validity and reliability of ABT, (2) to compare ABT and GBT in terms of student achievement, (3) to investigate the impact of ABT versus GBT on the achievement of students with different levels of prior knowledge and (4) to explore the ABT participants' attitudes toward ABT in comparison with GBT. A total of 314 students, divided into two groups, participated in the study. Upon completion of the test, the students who took ABT were given the survey, the Attitude toward Animation Assessment Scale (AAAS). The results of the study indicated that ABT was a valid and reliable testing format. While no significant difference was found between the test formats in student achievement in general, practical significance emerged when the study further compared the impact of ABT versus GBT on the achievement of students with various levels of prior knowledge: students with low prior knowledge performed better on ABT, while students with high prior knowledge performed better on GBT. Finally, more than 60% of the participants who took ABT were satisfied with it and held positive attitudes toward it.

H.-C. Wu · C.-Y. Chang · T.-K. Yeh
Department of Earth Sciences, National Taiwan Normal University, Taipei, Taiwan

C.-Y. Chang
Graduate Institute of Science Education, Taipei, Taiwan

C.-Y. Chang · C.-L. D. Chen · T.-K. Yeh · C.-C. Liu
Science Education Center, National Taiwan Normal University, Taipei, Taiwan

C.-Y. Chang (*)
88, Section 4, Ting-Chou Road, Taipei 11650, Taiwan
e-mail: [email protected]
URL: http://ese.geos.ntnu.edu.tw/~chunyen


Keywords Animation · Computerized assessment · Achievement · Attitude

Introduction

An important objective of science education has been to enhance learners' scientific literacy, including, but not limited to, improved conceptual understanding of science, scientific procedural skills and problem-solving ability (American Association for the Advancement of Science 1993). Using assessment to effectively evaluate learners' scientific learning, and to find out how close learners are to this educational objective, has been considered one of the most important research issues. Traditional assessment is generally conducted with paper and pencil. However, paper-and-pencil tests may encounter limitations, including: (1) the difficulty of representing abstract concepts; many scientific concepts involving micro or large scales, such as atom-molecule theory, plate tectonics, and the relative movements of the Earth, Sun and Moon, are abstract and incomprehensible to students, and these concepts are therefore hard to delineate in test items; and (2) the difficulty of representing real-world contextual problems. Therefore, a new generation of technology-enhanced assessment, such as PISA CBAS, has been proposed worldwide with the aim of improving traditional school-based and large-scale testing. Highly illustrated materials like animation are used in these technology-enhanced assessments.

Animation, by its nature of presenting multiple images over time, enables viewers to perceive dynamic phenomena much as they would in the physical world (Cook 2006). Three characteristics of animation make it beneficial for incorporation into assessment. First, animation can serve as a depictive external representation and enables the representation of abstract concepts, such as the process of the tide rising and ebbing. Second, animation can be used in place of reality (Dancy & Beichner 2006), so it could be more effective than static graphical representations for presenting real-world contextual problems. Third, animations can motivate learners through their cosmetic appeal (Wouters et al. 2008). Most research findings suggest that animations are better than static graphics for illustrating abstract concepts and visualizing working processes (Rieber 1990; Bodemer et al. 2004). Animation, which emphasizes how the features of a graphic interact and interrelate, allows students to build mental representations of unseen processes and facilitates science learning by reducing the level of abstraction of spatial and temporal information and the load of cognitive processing (Cook 2006). These characteristics enable an animation-based test not only to complement the traditional paper-and-pencil test, but also to enhance examinees' interest in taking exams. However, in some cases animation might not be as effective as static graphics. When animation was employed to illustrate Earth rotation, students who used animation performed worse on content questions than students who used static graphics (Schnotz & Grzondziel 1996). Other research (Tversky et al. 2002) also indicated that, in general, dynamic visualizations are not more effective than static visualizations, because animations are often too complex or too fast to be perceived accurately, consisting as they do of continuous events rather than a sequence of discrete steps.
Based on cognitive load theory (Sweller 1988), a well-designed animation might decrease the extraneous cognitive load imposed on the learner by the instructional design, as well as the intrinsic cognitive load imposed by the inherent difficulty of the material, such as element interactivity. More cognitive resources would thereby be released to contribute to schema construction and automation. It is therefore important to recognize the specific conditions under which the use of animation is most effective, and to follow appropriate design principles so that observers' cognitive resources are not overwhelmed. Dancy and Beichner (2006) stated that animation appears to be most valuable under the following conditions: (1) the animation is an integral part of the question and not just a good-looking addition, and (2) the animation clarifies a misreading or misinterpretation of a situation caused by the static form of a question.

Research has specified three general principles for effectively incorporating animation into instructional design. First, interactivity is an important component of successful animations (Tversky et al. 2002; Cook 2006): interactive controls enable observers to manage the pace of the animation so that the information it conveys can be perceived more accurately. Second, the graphics of an animation should be designed with less detail than realistic ones to reduce learners' information-processing demands. However, apparent simplicity may also be a drawback: Lowe (2003) discovered that over-simplifying dynamic information reduces the mental effort learners expend on important processing activities, so the design of the animation must be appropriately, not excessively, simplified. Third, adding annotations such as arrows, highlights or other guiding devices to an animation may direct observers' attention to the critical changes and relations (Tversky et al. 2002). Lowe (2003) further suggested that since learning without direction can have negative consequences (Kirschner et al. 2006), learners could benefit from animations designed to provide a more directive learning environment through the incorporation of specific visual and temporal guidance. All in all, an animation designed to be interactive, appropriately simplified and annotated is more likely to decrease learners' cognitive load and thus release cognitive resources for genuine comprehension and learning. In our opinion, integrating animations into test design could provide science education researchers with an insightful perspective and practical strategies for reducing students' cognitive load and facilitating their comprehension of test items.

Dexter (2006) mentioned that technology assists in the assessment of learning outcomes. However, limited research has investigated the impacts and applications of animation-incorporated assessments in the field of science education. Therefore, the aims of this study are to examine the effectiveness of the animation-based test (ABT) in comparison with the graphic-based test (GBT) on students' earth science achievement and to survey students' general attitudes toward ABT and GBT. The following research questions guided this study:

1. Is ABT a valid form of assessment?
2. Is ABT a reliable form of assessment?
3. Is there any difference in students' earth science achievements between the use of ABT and GBT?
4. Is there any difference in students' earth science achievements associated with different levels of prior knowledge between the use of ABT and GBT?
5. What are students' attitudes toward ABT?

Method

Participants

The participants of the study consisted of 314 tenth graders randomly chosen from a total of 640 students in a normal high school ranked in the middle tier among the 20 senior high schools in Taipei. These participants varied in their levels of prior knowledge in earth sciences. Due to the limited number of earth science teachers and learning materials, half of the tenth graders (Group A), comprising 194 participants, studied the earth science curriculum in the first semester, while the other half (Group B), comprising 120 participants, studied it in the second semester. Since this study was conducted at the beginning of the second semester, Group A had already learned the curriculum while Group B was still learning it.

Design of the Experiment

A comparative experimental design was employed in the study. The 314 students fell into two groups according to their learning statuses: the group which had already learned the curriculum (Group A) and the group which was still in the process of learning it (Group B). Both ABT and GBT were administered within each group. There were two main purposes for this design. First, by examining the students' score differences as their learning statuses varied while the test format was held constant, the discriminating power of the tests was evaluated, i.e., whether students with more earth science knowledge were distinguished from those with less. Second, by examining the students' score differences as the test formats varied while the learning statuses were held constant, the impact of the test formats on student achievement was determined; that is, the participants' test scores were compared to determine whether a significant difference in student achievement existed between ABT and GBT. The rationale behind the division of the subject pool into two groups is best illustrated by Table 1. Finally, the students who took ABT were given the "Attitude toward Animation Assessment Scale" (AAAS) questionnaire to survey their perceptions of ABT.

Instrumentation

The animation-based test (ABT) was designed using Director MX 2004. Apart from the difference in test format, animated vs. graphic, the content, the sequence of the items and the layouts of ABT and GBT were identical.

Table 1 Design of the experiment

                Learning Status
Test Format     Curriculum Learned (Group A)    Curriculum in Progress (Group B)
ABT             A-ABT                           B-ABT
GBT             A-GBT                           B-GBT

Note: Comparisons within a row (test format constant, learning status varied) serve to determine the discriminating power of the test; comparisons within a column (learning status constant, test format varied) serve to determine the impact of test format on student achievement.


Each test consisted of 20 multiple-choice items (100 points in total) and was divided into four sections of five items each: geologic, oceanographic, meteorological and astronomical concepts. The detailed contents and the item distribution for each topic in ABT and GBT are listed in Table 2. The animations incorporated in ABT were appropriately simplified by reducing the details of the objects. Figure 1 shows a sample item from ABT: the railroad and the mountains in the animation are illustrated with straight lines and simple images. In addition to the animation, arrows and the interactive functions enabled by the "play", "initial state" and "replay" buttons are provided. The animation was designed carefully in order to decrease extraneous cognitive load, so that participants could devote more of the cognitive resources in their working memory to information processing.

This study not only measured students' earth science achievement using ABT in comparison with GBT, but also examined students' attitudes toward ABT as a new testing instrument. The scale used in this study was revised from the "Attitude toward Computerized Assessment Scale" (ATCAS), whose internal consistency values range from 0.76 to 0.92 (Smith and Caputi 2004). The newly adapted 18-question survey, the Attitude toward Animation Assessment Scale (AAAS), measured students' attitudes toward ABT from four perspectives: (1) ease of use of ABT, (2) confidence in using ABT, (3) design of ABT and (4) acceptance of ABT. Students' responses on the 4-point Likert-like questions ranged from 1 (strongly disagree) to 4 (strongly agree). The survey was administered by computer and students' responses were submitted online.

Table 2 Table of contents for ABT and GBT

Content                          Number of items
1. Fault                         4
2. Crust convergence             1
3. Hydrologic cycle              2
4. Ocean circulation             2
5. The tides                     1
6. Cyclone & anticyclone         3
7. Clouds                        1
8. Typhoon                       1
9. The motion of the Earth       1
10. The ecliptic and seasons     1
11. Local sidereal time          3
Total items                      20


Fig. 1 A sample item of ABT

Validity, Reliability and Item Analysis

To ensure the content correctness and validity of the test, the researchers solicited opinions from six earth science professors/educators, of whom one specialized in science education and five in earth sciences. To judge the quality of the test items, each reviewer received an ABT/GBT CD package and wrote down his/her comments on a checklist. The comments were generally constructive. For example, a geologist suggested that the slope of the thrust fault in one animation should be smoothed to less than 30 degrees, since such a steep slope is unlikely to occur in reality. Modifications were made accordingly, and this review of the test items by the earth science professors/educators improved the content validity.

The reliability of ABT and GBT was measured using the Kuder-Richardson Formula 20 (KR-20). According to Fraenkel and Wallen (1993), reliability coefficients should be at least 0.70 and preferably higher. The result indicated a reliability coefficient of 0.70 for both tests, which demonstrated that they were appropriate measures. To determine the reliability of the Attitude toward Animation Assessment Scale (AAAS), which surveyed participants' perceptions of ABT, Cronbach's alpha was calculated; the result was a reliability coefficient of 0.78. Since the scores on ABT and GBT were fairly consistent, scoring variation would be limited if the tests were administered to the same students again.

The study further conducted an item analysis of ABT by calculating the item difficulty and item discrimination for each of its 20 items. Item difficulty ranged from 0.20 to 0.91 with an average of 0.51, and item discrimination ranged from 0.12 to 0.57 with an average of 0.36. In general, the acceptable range for item difficulty is between 0.30 and 0.90, and item discrimination should be above 0.25. Though slightly outside these acceptable ranges, for the purpose of this study the obtained ranges of item difficulty and discrimination are still considered reasonable.
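The paper does not show how these statistics were computed. As a minimal sketch (our illustration, not the authors' code), KR-20, item difficulty and an upper-minus-lower discrimination index can be obtained from a students × items matrix of dichotomous scores; the 27% upper/lower split used below is a common convention and an assumption on our part:

import numpy as np

def kr20(scores: np.ndarray) -> float:
    """Kuder-Richardson Formula 20 for a (students x items) 0/1 matrix."""
    k = scores.shape[1]                         # number of items
    p = scores.mean(axis=0)                     # proportion answering each item correctly
    q = 1.0 - p
    var_total = scores.sum(axis=1).var(ddof=1)  # sample variance of total scores
    return (k / (k - 1)) * (1.0 - (p * q).sum() / var_total)

def item_analysis(scores: np.ndarray, split: float = 0.27):
    """Item difficulty and discrimination (proportion correct in the
    top-scoring group minus that in the bottom-scoring group)."""
    n = int(round(scores.shape[0] * split))
    order = np.argsort(scores.sum(axis=1))      # students ranked by total score
    lower, upper = scores[order[:n]], scores[order[-n:]]
    difficulty = scores.mean(axis=0)            # per-item proportion correct
    discrimination = upper.mean(axis=0) - lower.mean(axis=0)
    return difficulty, discrimination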

Data Analysis

An independent sample t-test was conducted to examine whether the mean scores differed significantly between the participants who had learned the earth science curriculum and those who were still in the process of learning it. To test whether an interaction effect existed between learner statuses (curriculum learned or curriculum in progress) and test formats (ABT or GBT), a two-way ANOVA was conducted. The magnitude of the differences between the means of the two groups was assessed through the effect size (ES), which takes into account the size of the difference between the means regardless of whether it is statistically significant. It is obtained by dividing the mean difference of the two groups by the common (pooled) standard deviation. As suggested by Cohen (1988), 0.20 represents a small ES, 0.50 a medium ES, and 0.80 a large ES. Most researchers consider an ES of 0.50 or larger an important finding.
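For readers who wish to reproduce this style of analysis, the sketch below shows the three procedures with standard tools (again our illustration, not the authors' code; it assumes a long-format table with one row per student and columns score, status and test_format, which are hypothetical names):

import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.api as sm
from statsmodels.formula.api import ols

def cohens_d(a, b) -> float:
    """Effect size: mean difference divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1))
                     / (na + nb - 2))
    return (np.mean(a) - np.mean(b)) / pooled

def analyze(df: pd.DataFrame):
    a = df.loc[df['status'] == 'A', 'score']    # curriculum learned
    b = df.loc[df['status'] == 'B', 'score']    # curriculum in progress
    t, p = stats.ttest_ind(a, b)                # discriminating-power t-test
    model = ols('score ~ C(status) * C(test_format)', data=df).fit()
    anova = sm.stats.anova_lm(model, typ=2)     # 2 x 2 ANOVA with interaction
    return t, p, cohens_d(a, b), anova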

Results

Discriminating Power Testing

Since some of the participants in this study had completed the curriculum (Group A) while others were still working on it (Group B), their learning statuses differed. Participants from both Group A and Group B were assigned to the ABT and GBT groups. An independent sample t-test was conducted to examine the mean difference between the two groups of learners. The results showed that the students in Group A scored significantly higher than those in Group B (t=4.71, p=0.000). Table 3 presents the detailed t-test results.

The Comparison of ABT and GBT Scores

A two-way ANOVA was conducted to examine whether there was an interaction effect between learner statuses (Group A or Group B) and test types (ABT or GBT). According to Table 4, the F ratio for learning status indicated a significant difference between the mean scores of the students who had learned the curriculum and those who were still learning (F=19.0, p=0.000). On the other hand, no significant difference was found between the mean scores of the two test types (F=1.6, p=0.205). There was no interaction effect between learner statuses and test types (F=1.1, p=0.293), which implies that learner status affected the scores in the same way across the two types of test.

The Impact of ABT vs. GBT on Student Achievements with Different Levels of Prior Knowledge

Besides examining the test performance of the groups as a whole, individual scores within each group were also analyzed separately. Based on their levels of prior knowledge, both groups of learners were categorized into three subgroups: high, moderate and low prior knowledge. Students' prior knowledge was measured by the scores of earth science examinations they had taken previously. Since Group A students had completed the curriculum while those in Group B were still in the process of completing it, three formal summative assessments had been administered to Group A but only one to Group B. Group A's prior knowledge was therefore measured from the average score of three assessment results, whereas Group B's was measured from a single assessment result.

Table 3 Comparison of mean scores between Group A and Group B

Group                                 n     M      SD
Group A (Curriculum Learned)          194   55.4   15.5
Group B (Curriculum in Progress)      120   47.0   15.0

df = 312, t = 4.71, p = .000
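As a quick consistency check on the reconstructed table, the reported t statistic follows from the tabled means and standard deviations via the pooled-variance formula:

$$
s_{\text{pooled}} = \sqrt{\frac{193 \cdot 15.5^2 + 119 \cdot 15.0^2}{312}} \approx 15.3,
\qquad
t = \frac{55.4 - 47.0}{15.3\sqrt{\tfrac{1}{194} + \tfrac{1}{120}}} \approx 4.7
$$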


Table 4 Descriptive data for students' scores and the results of ANOVA

Learner Status                        ABT group      GBT group
                                      Mean (SD)      Mean (SD)
Group A (Curriculum Learned)          57.2 (15.0)    52.9 (16.0)
Group B (Curriculum in Progress)      47.2 (15.3)    46.8 (14.3)

Source               SS        df    MS        F      Sig.
Test type (SSa)      375.5     1     375.5     1.6    0.205
Status (SSb)         4430.1    1     4430.1    19.0   0.000
Interaction (SSab)   258.0     1     258.0     1.1    0.293
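The within-cells (error) row of the ANOVA table did not survive extraction. However, because every effect here has a single degree of freedom, the error mean square can be recovered from any of the reported F ratios ($MS_{\text{error}} = MS_{\text{effect}}/F$), and all three rows give a consistent value, a useful sanity check on the reconstruction:

$$
MS_{\text{error}} \approx \frac{375.5}{1.6} \approx \frac{4430.1}{19.0} \approx \frac{258.0}{1.1} \approx 234
$$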

Table 5 shows the comparison of ABT and GBT scores within Groups A and B in relation to students' levels of prior knowledge. For Group A, there was a significant difference in average scores between the ABT and GBT groups: the average score of 57.2 for the ABT group was significantly higher than the average score of 52.9 for the GBT group (t=1.91, p=0.05, d=0.3, small effect size). The mean scores of the three subgroups were also examined. Students with low prior knowledge obtained 53.9 on ABT and 43.5 on GBT (t=1.94, p=0.06, d=0.7, approaching a large effect size). For Group B, the overall scores on the two types of test did not differ significantly from each other (t=0.14, p=0.89, d=0.1, small effect size). Further examination of the subgroup means showed that students with high prior knowledge scored higher on GBT than on ABT, though not statistically significantly (t=1.44, p=0.16, d=0.7, approaching a large effect size). Conversely, students with low prior knowledge obtained a higher average score on ABT than on GBT, again without statistical significance (t=0.94, p=0.36, d=0.7, approaching a large effect size). Though both findings were statistically insignificant, the medium to large effect sizes may indicate some practical significance.

Table 5 Comparison between ABT and GBT scores within Group A and Group B in relation to different levels of prior knowledge

                            ABT score              GBT score
Level of prior knowledge    n     Mean   SD        n     Mean   SD       t (p)           Cohen's d
Group A (all)               114   57.2   15.03     80    52.9   15.99    1.91 (0.05*)    0.3
  Low                       19    53.9   14.51     18    43.5   15.05    1.94 (0.06)     0.7
  Mid                       65    56.7   15.23     49    52.8   14.95    1.44 (0.15)     0.3
  High                      30    62.3   13.87     13    60.9   16.95    0.25 (0.80)     0.1
Group B (all)               79    47.2   15.31     41    46.8   14.30    0.14 (0.89)     0.1
  Low                       13    48.2   17.28     8     38.3   10.41    0.94 (0.36)     0.7
  Mid                       50    45.6   14.50     21    44.1   13.76    0.44 (0.66)     0.1
  High                      16    50.6   15.71     12    59.4   10.50    1.44 (0.16)     0.7

*p ≤ 0.05
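To illustrate how the tabled effect sizes follow from the pooled-standard-deviation formula given under Data Analysis, the Group A low prior knowledge comparison works out as:

$$
s_{\text{pooled}} = \sqrt{\frac{(19-1)\,14.51^2 + (18-1)\,15.05^2}{19+18-2}} \approx 14.8,
\qquad
d = \frac{53.9 - 43.5}{14.8} \approx 0.70
$$

which matches the reported value of 0.7.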