The Initial Knowledge State of High School Astronomy Students

Philip Michael Sadler

A Dissertation Presented to the Faculty of the Graduate School of Education of Harvard University in Partial Fulfillment of the Requirements for the Degree of Doctor of Education

1992

© 1992 Philip M. Sadler All Rights Reserved

Dedication

To Professor Jerrold Reinach Zacharias (1905-1986) of the Massachusetts Institute of Technology, a world-class scientist, a wonderful teacher, and a national leader, whom I first met in the Fall of 1969. He, more than any other person, inspired me to dedicate my life to teaching science and mathematics to young people. He was forever berating schools of education, and I can imagine Zach saying, "Philip, a Ph.D. in Physics is better than an Ed.D., especially from that school up the river." But Jerrold, that "school up the river" is actually quite wonderful. I know this is the best path to carry your traditions and ideas to the next generation of teachers and students.

Acknowledgments

This dissertation has been a long time in coming and many people have encouraged me to start and cajoled me into finishing. I wish to thank all those involved with Project STAR at the Harvard-Smithsonian Center for Astrophysics, especially Professor Irwin Shapiro, for the risk he took in bringing in an outsider to direct the project and for his thoughtful counsel in matters of both science and education; Bruce Gregory, for always having time to discuss problems of misconceptions and science learning; and Professor Darrel Hoff for encouragement, ideas, and enthusiasm for my chosen path.

Dr. Marcus Lieberman, consultant to Project STAR, has organized, coded, and provided preliminary analysis of the data. He has always had the patience to explain obscure statistical procedures and has been a sounding board for my research methodology. Heather Whipple supported these efforts by tracking participating teachers and providing them with copies of pre- and post-tests. Sam Palmer has helped me by just being himself, a profoundly creative and reflective teacher who, along with Professor Charles Whitney and Professor Owen Gingerich, has helped to mold the items on the pre-test through their many incarnations. Marvin Grossman, a long-time friend and colleague, has been a sounding board for my ideas as well.

Test items dealing with light and color were originally worked on with the advice of Professor Eleanor Duckworth. Professor Ann Young of the Rochester Institute of Technology was instrumental in developing and refining many of the test items. Jenny Bond Hickman of Phillips Andover and Paul Hickman of Belmont High School contributed by taping interviews with many of their students. This process revealed students' ideas about light and color. Professor Linda Shore of Boston University also examined light and color, as well as questions that tease out students' ideas about gravity. Dr. Matthew Schneps directed and produced A Private Universe, which has promoted many of the ideas within this dissertation to tens of thousands of teachers.

This study could not have been performed if it were not for an extraordinary group of teachers dedicated to improving Project STAR: Andrew Anzalone, Richard Ayache, Lynn Bastoni, Russell Blake, Richard Brown, Daniel Francetic, Linda French, Gita Hakerem, Jennifer Bond Hickman, Paul Hickman, Robert Holtzman, Jeff Lane, Larry Mascotti, Bruce Mellin, Fiona McDonnell, Mark Petricone, Michael Richard, Larry Sabbath, Gary Sampson, Dorothy Walk, and especially Harold Coyle, Jr., who is now on the project staff, and Bill Luzader, who spent a year's sabbatical at the Center for Astrophysics. Mark Petricone labored for a summer helping to rewrite a version of the misconception test that resulted in the improvement of many of the items.

I want to thank the National Science Foundation for the generous funding they have provided for Project STAR from 1985 to 1992 and the three Program Directors who have supported our activities: John Thorpe, Mary Kohlerman, and Gerhard Salinger. A special thanks goes to Andy Molnar of the NSF, who, as well as being an enthusiastic supporter of our efforts, has never ended a conversation with me without asking about the progress of my dissertation. The Smithsonian Institution has also supported our activities with grants and the time of scientists.


My wife Jane has put up with the long hours and grumpy moods provoked by my doctoral studies, and I appreciate her generosity of time and spirit. Thanks to my kids Benjamin, Samuel, and Daniel, who have provided so much in the way of pleasant diversion, which more than makes up for their interruptions at the keyboard. A good friend and experienced editor, Bessie Blum, proofread, commented upon, and corrected several versions of both my qualifying paper and this dissertation.

Thanks also go to my readers. Professor Israel Scheffler was my advisor through most of my doctoral work. He always found the time to advise me on my research and confided that I gave him "less trouble" than any other graduate student. Since his retirement, Professor Judah Schwartz has more than filled the breach. He was the natural choice, since he was instrumental in providing me with an exceptional undergraduate education at MIT as the Director of the Unified Science Studies Program. I am proud to have joined him as a colleague at HUGSE. Professor Irwin Shapiro has aided me with excellent feedback through his instrument of choice, the red pen. I am sure he will continue to do so in future projects. Terrence Tivnan has somehow found time to read my work and to discuss my methodology while seemingly being a reader for every other doctoral student at the Graduate School of Education. Moreover, he carried out the initial pre- and post-test analyses of Project STAR, along with Terresa Tatto, in 1987. As crude as our instrument was at that time, his shaping of the project's formative evaluation planted the seeds from which this study grew.

Thank you all. I will try to measure up to the confidence you have shown in me. I will attempt to give back to my own students the caring and guidance that you have shown to me.


Table of Contents

Dedication
Acknowledgments
Table of Contents
Abstract
I. Introduction
   A. The Problem Context: Difficulty in Learning Science
   B. Statement of the Problem
   C. Significance of this Study
II. Review of the Literature
   A. Cognitive Roots of Misconception Research
   B. History of Scientific Misconceptions
   C. A Hierarchy of Misconception Research
   D. Methodological Problems of Past Research
   E. Implications of Past Studies
III. Methodology
   A. Description of the Dataset
   B. Research Questions
   C. Hypotheses
   D. Instrument
   E. Procedure
   F. Statistical Analyses
IV. Reliability and Validity
   A. Reliability
   B. Validity Tests
V. Item Analysis Results
   A. Total Score
   B. Earth and Sun
   C. Earth and Moon
   D. Mathematics
   E. Solar System
   F. Stars
   G. Galaxies
   H. Light and Color
VI. Whole Test Results
   A. Ranking of Test Items by P-value
   B. Ranking of Test Items by D-value
   C. Mean Item Characteristic Curve
   D. Discrimination/Difficulty Graph
   E. Distribution of Correct Answers
   F. Which Questions Should Be Included on a Shortened Test?
   G. Predictors of Difficulty and Discrimination
VII. Demographic and Schooling Factors Results
   A. Demographic Factors
   B. Schooling Factors
   C. Attitude Factors
   D. Analysis of Variance
VIII. Discussion
   A. Research Questions
   B. Characterizing Misconception Questions
   C. Dissemination
   D. Errors, Omissions, and Problems
   E. Future Extensions
IX. References
X. Bibliography
Appendices
   A. School Data
   B. Pre-test Instrument
   C. P-Value, D-Value Tables
   D. Classical Test Theory Tables
   E. Item Correlation Matrix
   F. Chi-Square Analysis
Vitae

Abstract

This study of 1,414 high school earth science and astronomy students characterizes the prevalence of their astronomical misconceptions. The multiple-choice instrument was prepared by scouring the literature on scientific misconceptions for evidence of preconceptions and by drawing on the author's interviews with students. Views that were incorrect, but espoused by a large fraction of students, were included as distractors. Results have been analyzed using classical test theory. A linear multiple regression model has helped to show the relative contributions of demographic and schooling factors to the number of misconceptions held by students.

The instrument was found to be a reliable and valid test of students' misconceptions. The mean student score was 34 percent. Fifty-one student misconceptions were revealed by this test, nineteen of which were preferred by students to the correct answer. Several misconceptions appeared more frequently among the higher-performing students. Significant differences in student performance were found in several subgroups based upon schooling and demographic factors. Twenty-five percent, out of a total of 30 percent, of the variance in total test score could be accounted for by gender, race, and the level of mathematics courses taken. Grade level and previous enrollment in an earth science course were not found to be predictive of total score. Mother's education proved to be of small import; level of father's education was not significant.

This test is a useful addition to instruments that measure student misconceptions. It could find application in tests of effective intervention for conceptual learning. Significantly shortened versions of this instrument that account for 75 and 90 percent of the variance in the forty-seven-item instrument are recommended. Such tests of misconceptions may be somewhat disheartening to teachers and their students. A test made up of only misconception questions will probably have average total scores less than 40 percent. If teachers are to test their students using misconception questions, they should adjust grading policies to reflect this lower average score.


I. Introduction

A. The Problem Context: Difficulty in Learning Science

Science education in the United States is a disaster. According to tests that compare our students with those of other countries, the United States is ranked close to, if not at, the bottom of the list (International Association for the Evaluation of Educational Achievement 1988). Our major economic competitors, Germany, Korea, and Japan, place at or near the top. Science has dropped from the lofty position it held in the nation's schools in the late 1950s and 1960s. Can we reverse this decline and make our society more scientifically literate and economically competitive?

Tests of students' performance show that American students know their science facts. Where they stumble badly is in conceptual understanding and in the use of concepts to solve problems.1 Several factors could help account for these differences:

• Our textbooks are long and filled with jargon. Comparable Japanese texts are short and filled with concepts (Troost 1985).
• American students spend nearly one-half of their class time reading science texts and listening to their teachers lecture (Weiss 1987a). There is little cooperative or hands-on learning (Hofwolt 1985).
• Our texts rarely apply scientific concepts directly to the world with which students are familiar (Goodlad 1984; National Assessment of Educational Progress 1989). German schools emphasize relevance and technical applications of physics (Klein 1985).

Few would contest that our curriculum materials and teaching methods could use improvement, but in the United States, professional evaluation of curricula and teaching is for the most part ignored. Whereas Germany, Japan, and Korea have powerful Ministries of Education that create curricula and evaluate them with national tests, in the United States there are 16,000 independent school systems and almost 200,000 full- and part-time teachers of science (Fisher and Lipson 1985). Teachers are simply left on their own to design instruments to assess the learning of their own students. Such tests and quizzes as they do design are amateur efforts. They do little to discriminate between effective and poor teaching, or to measure curricula against national standards (Rutherford 1985).

1 Examples of questions from the 1986 National Assessment of Educational Progress:

Fact (level 150): To which of the following is the wolf most closely related? Buffalo, Deer, Dog, Rabbit, Sheep, I don't know

Concept (level 350): An ore sample contains 50 grams of radioisotope with a half-life of 5 seconds. After 10 seconds, how many grams of radioisotope are in the sample? 12.5 grams, 25 grams, 50 grams, 75 grams
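The concept item is harder because it requires applying the half-life relation rather than recalling an association. A worked solution, added here for illustration (not part of the NAEP item):

    m(t) = m_0 \left(\frac{1}{2}\right)^{t/T_{1/2}}, \qquad
    m(10\,\mathrm{s}) = 50\,\mathrm{g} \times \left(\frac{1}{2}\right)^{10/5} = 12.5\,\mathrm{g}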

Teaching for conceptual understanding rather than the memorization of facts and algorithms is far from easy. Many hurdles face teachers who hope to impart to their students even a few powerful scientific concepts. Perhaps the most daunting problem is that students enter the classroom with beliefs about how the natural world behaves that are at odds with accepted views in science. Students apply these "misconceptions"2 to make predictions about events, believing, for example, that gravity is the result of air pressure (Minstrell 1982a) or that light from a candle will travel farther at night than in the day (Stead and Osborne 1980). Dozens of studies in recent years show that these beliefs, constructed by the students themselves or garnered from misinformed adults, are quite tenacious. Once in place, they rarely change in the course of even the best instruction. Most students leave their science courses with no better conceptual understanding of scientific ideas than when they enter.3

The fact that students maintain conceptions that are at odds with scientific fact has major consequences. For individual students, these beliefs can become critical barriers that often keep them from succeeding in their science courses, prematurely ending their study of science (Arons 1983). Teachers become frustrated when dealing with the confusion of students who have difficulty learning simple concepts because of their faulty foundations. Yet teachers are woefully unaware of the misconceptions of their students (Sadler 1987). Perhaps the most damaging indictment of science teaching is that students who took high school physics do no better in college physics than those who chose not to take a high school course (Champagne and Klopfer 1982; Halloun and Hestenes 1985). For the U.S. educational system, it is a monumental waste of resources to teach ineffectively, yet there is little proof of the existence of teaching effective enough to alter students' misconceptions.

2 The term "scientific misconceptions," as used in this paper, refers to ideas that people possess that are different from accepted scientific views. Alternatives for the term "misconception" have been suggested by some researchers, because use of the term can seem to denigrate student ideas. After all, it is wonderful that students do come up with such amazing and original constructions. "Alternative frameworks" has been suggested as less judgmental (Driver 1978). "Preconceptions" emphasizes the prior knowledge that students bring to class (Ausubel 1978). "Naive theories" is a term that recognizes that students' ideas are theories that result from thought, not guessing (Arnaudin 1985). "Alternative conceptions" was coined to characterize those false ideas that do not change even after instruction (Hewson 1983).

3 An interesting example: in spite of a fine education, commencement-day interviews showed that 21 of 23 Harvard graduates, alumni, and faculty thought that the earth's changing distance from the sun is responsible for its seasons, or that the moon's phases are caused by the shadow of the earth (Sadler and Schneps 1988).

Increasing the efficacy of conveying scientific concepts demands that teachers become aware of, and seek to change, the misconceptions of their students (Champagne et al. 1982). Much of the science that students are required to study in school is "alien," in conflict with their experiences and thinking (Gardner 1991). Students have difficulty learning science because of the difficulty of reconciling their own beliefs with the conflicting ideas of their teacher. Misconceptions should be viewed instead as stumbling blocks for the student and signposts for the teacher (Narode 1987).

The examination of scientific misconceptions, however, is still in its infancy. Through clinical interviews and formal testing, the misconceptions that students bring to their science classes have been extensively investigated in only a few domains. The most attention has been paid to Newtonian mechanics; astronomy is much less studied. Yet astronomy is taught yearly in secondary school as a separate course to approximately 50,000 students (Weiss 1987b), as a part of middle school earth science to more than 1,000,000 students (Welch et al. 1984), and at the college level as an introductory course to more than 300,000 (Hoff 1982). For many pre-service teachers, especially those who will teach in the primary grades, a single astronomy course may be the only physical science course they take. The misconceptions in this domain have not been adequately examined in large-scale surveys; there have been many investigations of single concepts through interviews of small numbers of students.

This study characterizes the preconceptions of more than 1,400 high school students taking introductory astronomy. Demographic information helps to determine the relationship of gender, race, age, and previously taken science and mathematics courses to the quantity of misconceptions that students hold. The results of this study could help in the formulation of tests, new curriculum materials, and teaching techniques for introductory astronomy courses.

It is the intent of this study to explore the extent of common "wrong answers" among introductory astronomy students. Some may view these student responses as not being definitively indicative of particular scientific misconceptions. This is only partly true. Many previous studies have explored the "web of entailed and related" faulty answers in astronomy and linked them together into justifiable misconceptions. This test builds upon such work and attempts to quantify the prevalence of these misconceptions by having students choose among scientific explanations and distractors garnered from interviews. In my analysis, I have explicitly attempted to use statistical techniques to examine the relatedness of various misconceptions, avoiding qualitative analyses in establishing these connections in favor of a purely quantitative approach. This quantitative approach to the relatedness of misconceptions has not previously been reported in the literature.

This work is an outgrowth of Project STAR, a curriculum development project supported by the National Science Foundation and based at the Harvard-Smithsonian Center for Astrophysics. The course produced by this project has had as a main objective the study of students' ideas in astronomy and the investigation of methods for aiding students in building powerful and predictive scientific ideas.
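As a concrete illustration of this quantitative approach (my sketch with synthetic data, not the analysis code used in the study): if each misconception distractor is coded as a 0/1 indicator for each student, the relatedness of two misconceptions can be measured by the phi coefficient, which for binary variables is simply the Pearson correlation of the indicators.

    import numpy as np

    # responses: one row per student, one column per misconception distractor;
    # entry is 1 if the student chose that distractor, 0 otherwise.
    # (Synthetic data for illustration; the study used 1,414 students.)
    rng = np.random.default_rng(0)
    responses = (rng.random((1414, 5)) < 0.3).astype(float)

    # For 0/1 variables, the Pearson correlation of the indicators is the
    # phi coefficient, so np.corrcoef gives the relatedness matrix directly.
    phi = np.corrcoef(responses, rowvar=False)

    # Flag pairs of misconceptions that tend to travel together.
    pairs = [(i, j, phi[i, j])
             for i in range(phi.shape[1]) for j in range(i + 1, phi.shape[1])
             if phi[i, j] > 0.2]  # threshold is arbitrary, for illustration
    print(pairs)

A matrix of this kind corresponds to the item correlation matrix reported in Appendix E.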

B. Statement of the Problem

The purpose of this thesis is to identify and to investigate key scientific misconceptions that students may have when they enter introductory astronomy courses. It also seeks to determine the significance of schooling and demographic factors in the frequency of these misconceptions.

C. Significance of this Study

Astronomy is taught at many levels to over one million students each year. Astronomy and earth science texts are packed with sophisticated concepts that appear trivial to teachers, but interviews with students uncover the fact that students hold notions contrary to those espoused by teacher or text. There has yet to be an attempt at formulating a comprehensive instrument that will evaluate basic astronomical misconceptions in a form that is both reliable and easy to administer. The validation of such a test would provide a way for:

• Over 15,000 teachers to test their students to identify prevalent misconceptions and to plan their lessons accordingly.
• Teachers and researchers to measure the effect of instructional techniques or curriculum materials in reducing student misconceptions.
• Astronomy departments to measure the misconceptions of their graduate teaching fellows.4

This analysis of results from the 1,414 students who took a test of this kind in September of 1991 seeks to answer questions that were not examined in prior studies of student misconceptions in science. Studying the role of schooling and demographic factors would help to determine whether:

• Previous science courses appear to reduce misconceptions.
• Older students have different misconceptions than younger ones.
• Boys and girls have different misconceptions.
• Members of minority groups have different misconceptions.

The application of advanced statistical tools to this test explores how items designed to reveal misconceptions may differ from conventional multiple-choice items. This may help substantiate the claim that standardized tests exclude items dealing with misconceptions because they do not fit the recommended profile (Narode 1987). Construction of an inventory of astronomical misconceptions will help many teachers to focus on changing student preconceptions. Test items can be used as the basis of classroom discussion, to help guide laboratory activities and writing assignments, and for the assessment of students on quizzes and tests.

4 This has been suggested by Professor Al Cameron of Harvard's Department of Astronomy for implementation in September 1992.
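The "advanced statistical tools" mentioned above center on two classical-test-theory item statistics that recur throughout this dissertation: the P-value (difficulty, the fraction of students answering an item correctly) and the D-value (discrimination, the difference in that fraction between high- and low-scoring students). A minimal sketch with synthetic data; the top/bottom 27 percent grouping is a standard convention assumed here for illustration, not a detail taken from the study:

    import numpy as np

    def item_stats(correct, total_scores, frac=0.27):
        """P-value and D-value for one item.

        correct: 0/1 array, one entry per student (1 = answered item correctly).
        total_scores: each student's total test score, used to form groups.
        """
        p_value = correct.mean()  # item difficulty: fraction correct
        order = np.argsort(total_scores)
        n = max(1, int(frac * len(correct)))
        low, high = correct[order[:n]], correct[order[-n:]]
        d_value = high.mean() - low.mean()  # discrimination between groups
        return p_value, d_value

    # Synthetic example: 1,414 students, one item whose success rate
    # loosely tracks total score, as a discriminating item's would.
    rng = np.random.default_rng(1)
    totals = rng.integers(0, 48, size=1414)
    answers = (rng.random(1414) < totals / 47).astype(float)
    print(item_stats(answers, totals))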

II. Review of the Literature

A good literature review distinguishes what has been done from what needs to be done. A great deal of effort has been expended in the last fifteen years to identify student conceptions in science and to develop instructional methods to change them.

My first step was to begin a wide search to select and collect documents related to astronomy teaching. I began with my professional collection of articles, books, proceedings, and private correspondence, starting a database in the bibliographic program EndNote. Articles, books, and reports that were referred to in these documents and appeared to have some use were entered into the program. I then began computer and manual searches of databases and reference volumes, moving on to indexes of appropriate journals. Finally, I collected all the materials listed in my database, read through all of it, and sought out any useful documents referenced by these sources.

Project STAR has been acquiring articles, books, and teaching materials concerned with astronomy education since December 1985. This resource library contains well over 1,000 items.5 Over forty articles and texts from this source were used for this paper. A computer search of the ERIC (Educational Resources Information Center) CD-ROM proved valuable, searching on these descriptors in various combinations: astronomy; misconception; light; curriculum; evaluation; teaching; and secondary school. This produced about 200 abstracts, of which I determined sixty would be useful.

Articles dealing with misconceptions are widely dispersed in a variety of journals. I looked through the indexes of the following journals back to 1980 for appropriate articles: Journal of Research in Science Teaching; Science Education; American Journal of Physics; The Science Teacher; Science & Children; American Educational Research Journal; and International (originally European) Journal of Science Education. Since misconception research is recent, dissertations report on research that has been carried out in the last few years. I searched through listings of dissertations and qualifying papers at Harvard and through Dissertation Abstracts International using these descriptors: astronomy; physics; science; misconceptions; and evaluation. Twenty-three dissertations were found using this method.

Conference announcements and proceedings also turned out to be a valuable source of information, especially the International Seminars on Misconceptions and Educational Strategies in Science and Mathematics (1983, 1987), the American Association of Physics Teachers Conferences (twice yearly), and the International Conference of GIREP (1986), an international physics education society. In addition, I searched through the abstracts for the meetings of the American Astronomical Society, the National Science Teachers Association, and the International Planetarium Society.

5 This collection is located at the Science Education Department at the Harvard-Smithsonian Center for Astrophysics.

My personal correspondence was also useful. Since I have given over forty workshops and papers on science teaching throughout the world, many researchers have sent me private letters and pre-publication reports on their work.

Altogether, approximately 225 documents turned out to be pertinent to this study (see Figure 1). Some were especially fruitful. A doctoral dissertation (Schoon 1988) had surveyed the literature pertaining to misconceptions in earth science. A study had evaluated a major NSF-sponsored curriculum development project in astronomy (Klopfer 1964a). The local peaks in the graph appear to be the result of the publication of two collections of articles on misconceptions (both in 1985), and four proceedings of major conferences: two on misconceptions (1983, 1987) and two on astronomy education (1986, 1988). Since this period, the number of references has been much smaller. There has not been a conference on scientific misconceptions since 1987.

Figure 1. References and Sources by Year Published. (Graph; vertical axis: # of Items, 0-30; horizontal axis: year published, 1950-1990.)

A. Cognitive Roots of Misconception Research

What are the roots of the recent work in scientific misconceptions? How did all this interest in the student's view of the world begin? A progression of psychologists and educators has noted that students have answers for questions that differ from the answers of knowledgeable adults. These differences have been explained in several different ways and have roots in the field of cognitive psychology.

1. Disequilibrium of Piaget

The work of Jean Piaget is the earliest attempt to investigate the influence of children's prior knowledge.

Piaget developed the method of the "clinical interview" to assess how children think about certain concepts. This method involves the child explaining what is happening in a particular situation and responding to proposed changes. Perhaps this technique of careful inquiry and observation grew out of Piaget's formal training in natural history. When he began his career working on the standardization of intelligence tests in France, he was amazed at the types of wrong answers children would give and at how these answers changed qualitatively with older children (Novak 1977).

Piaget makes specific mention of "pre-concepts" when discussing youngsters' attempts to distinguish between different spatial viewpoints (Piaget and Inhelder 1929). In a series of clinical interviews, Piaget showed a drawing to a child and asked her to select a point on a three-dimensional model from which the view would be the same as the drawing. Subjects showed great difficulty in differentiating between different viewpoints and often mistook right for left. Drawings that were reversed right-for-left were often as acceptable as a true view. Piaget theorized that children eventually learn to perform these tasks through a process of "assimilation" and "accommodation." New experiences, such as taking photographs and looking at them later at a different location, will lead a child to fit this experience into his or her way of thinking about the world. Accommodation happens as the child's cognitive structure is changed permanently, leading him to give different answers than he had given previously.6

Piaget's ideas had relatively little impact on school curricula. His investigations focused on areas that were not a part of the traditional school science curriculum, for example, conservation of volume. Piaget's research results were reported in journals and books that use the specialized jargon of the psychologist, and are not easily understood by teachers and curriculum developers. As a result, little was done to improve teaching or student materials based on his findings (Anderson and Smith 1986).

That is not to say that Piaget has had no influence at all. Most pre-service teachers study some of Piaget's work in their educational psychology courses. Some curriculum development projects have been based on Piaget's ideas. The Lawrence Hall of Science (University of California at Berkeley) developed the SCIS (Science Curriculum Improvement Study) Program, which emphasizes Piaget's stage theory in its student materials and teachers' guides. This program saw great success in the 1960s and 1970s in the nation's elementary schools.

6 I have found similar difficulties in high school and college students when they were asked to identify the phase of the moon from different viewpoints (item 7 in the STAR pre-test). The ability to visualize physical systems from different points of view is important in the study of astronomy. Problems such as the changing brightness of eclipsing binary stars or the appearance of the Earth from the Moon all rely on adequate spatial reasoning abilities, which are still undeveloped in many older students.

2. Cognitive Dissonance

When two ideas seem to be in contradiction and yet both are indisputably true, the learner experiences "cognitive dissonance."7 Much has been written about trying to avoid or reduce cognitive dissonance in instruction so that these "unhappy" experiences do not occur (Festinger 1957). The preference is expressed in the literature that students should integrate new concepts into their way of thinking so that they will not run into the problem of "getting the wrong answer." Within STAR, we have found that experiences of wrongly predicting events are not detrimental, but rather can be beneficial. They provide the motivation for students to explore the inconsistency and hold the potential for students to change their ideas. Without such obvious examples of the powerlessness of their conceptions, students will learn new material by rote and promptly forget it after the term is over.

3. Preconceptions and Meaningful Learning

David P. Ausubel was the first cognitive psychologist to realize that student conceptions are amazingly tenacious and almost impossible to change through instruction (Ausubel et al. 1978). Ausubel posited that preconceptions are not simply isolated beliefs, but elements of a very stable and comprehensive view of the world, and that formal instruction vies for the student's beliefs but often loses the battle. In his Educational Psychology (1978, p. 336), he says, "the unlearning of preconceptions might very well prove to be the most determinative single factor in the acquisition and retention of subject-matter knowledge." Ausubel laments that thorough studies of these preconceptions in science had never been undertaken at the time of his publication. Yet, because of his theories and writings, hundreds have been conducted since then.

Ausubel makes a clear distinction between what he calls "meaningful" learning and rote learning.

7 For example, within the STAR curriculum, students are asked to predict what will happen to the appearance of a beautiful spectrum (white light broken up by a diffraction grating) when viewed through a piece of transparent red plastic. Many students believe that a filter changes the color of light that passes through it or that a filter acts on the spectrum to change its actual color. Students with either of these beliefs predict that the appearance of the entire spectrum will change to red or that portions of the spectrum will change to a different color. When students actually try this experiment and look through the filter at the spectrum, many get quite upset that their conception contradicts what they see. Unable to accommodate this new experience in their scheme of the world, initially many blame this situation on the filter, believing that it has some unusual or magical properties.

Meaningful learning denotes the incorporation of new concepts and facts into a student's scheme of the world that results in the restructuring of the student's knowledge.8 Anchoring ideas, or "anchors," are preconceptions that are correct and may provide a path to assimilate new concepts. Although anchors may be structurally connected to ideas that are false, they can still provide some help.9 Teachers who search out the anchoring ideas of their students prior to instruction can more effectively eliminate student misconceptions by concentrating on examples and analogies that act as a bridge between these anchors and the desired concept (Clement 1986). For example, almost all students are aware that the Sun rises and sets, and that the Sun is up in the sky in other parts of the world at times when it is night here. That the Sun rises and sets is an anchor for believing that the Moon may rise and set as well. Over time, students can observe this behavior. Since Moonset happens about an hour later every day, it takes only several days for Moonset to begin happening during daylight hours. In this way the Moon can be regularly placed in the daytime sky, and a new way of thinking about its motions must be constructed.

For students to change their misconceptions, four conditions must be fulfilled. First, students must be aware of, and dissatisfied with, the limited power of their existing conceptions. Second, any new conception must be understandable. The student need not believe it, but must be able to express it. Third, this new idea must be plausible. It must seem reasonable that this concept could be true and that it does not conflict with the way the student sees the world. Finally, it must be productive. This new idea must predict and explain more than the student's old idea (Posner et al. 1982; Roth 1985a).

8 For example, students in my Celestial Navigation course (Astronomy 2, Harvard University) must keep journals of their observations of the night sky. Many who go out on a cloudless night expect to see the Moon in the sky, regardless of time or phase. Shocked that the Moon is not visible, they often keep an eye out for it on successive nights and days until they finally see it, then track it for the rest of the month. Incorporation of their observations, through record keeping and class discussion, completely revamps how they think about the Moon's visibility. Out of the restructuring comes the ability to predict Moonrise, Moonset, and the cycle and cause of phases. Simply memorizing by rote that the Moon is not always up at night and the names of its phases would have no effect on the structure of their knowledge. It would do nothing to change their ability to make any substantial predictions of the Moon's motions or appearance.

9 In the example in footnote 8, although many students searched for the Moon at night, they did remember that on some occasions they had seen the Moon in the daytime. A correct theory of the Moon's motions must include the fact that the Moon is found in the daytime sky as much as in the nighttime sky. The anchor of seeing the Moon in the daytime can help in reformulating the learner's ideas in a positive fashion.

B. History of Scientific Misconceptions

1. Historical Recapitulation

Children's theories often appear well thought out and quite remarkable. The development of scientific conceptions in the child in many cases appears to be similar to historical theories in a field of science. Three examples are presented to document this claim.

Children often talk of heat as a substance that flows from object to object. Flames can put heat into a kettle of water, or heat can escape out of an open window. This is reminiscent of the "caloric" theory of heat. So named by Lavoisier in 1787, caloric is a fluid that will "flow from the hotter to the colder body until an equilibrium has been achieved" (Holton 1985). The mechanism of heat transfer that children describe, however, is very different from that of caloric. Caloric could flow by itself, although it is governed by certain physical laws. Youngsters rarely attribute an inherent motive force to heat, but talk of its movement through "fumes, rays, or waves" (Erickson and Tiberghien 1985).

Many students think of physical motion as dependent on a force inherent in an object. Any object, such as a ball thrown into the air, is kept in upward motion by a force that slowly diminishes, after which the ball falls back to Earth (Caramazza et al. 1981). The imparting and dissipation of "impetus" was a key element of medieval physics and can be traced as far back in origin as the Greek astronomer Hipparchus (ca. 150 BC) (McCloskey 1983). Leonardo da Vinci (1452-1519 AD) believed in this theory. Even the great Galileo (1564-1642 AD), whom many consider the first modern scientist, began his career believing in the impetus theory.

Many children describe vision as an active process, whereby emanations from their eyes travel out to the object and behold it (Guesne 1985). Plato (ca. 427-347 BC) and the Pythagoreans thought of vision in virtually the same way (Lindberg 1976). In their "extramission" theory, vision was caused when:

   Visual current issues forth [from the eyes]...and is formed into a single homogeneous body in a direct line with the eyes, in whatever quarter the stream issuing from within strikes upon any object it encounters outside. (Plato 1937, pp. 152-53)

Although to the Pythagoreans there are many additional processes (e.g., the interaction of this stream with daylight), vision is definitely an active process. This theory did not go undisputed. Aristotle (384-322 BC), for one, did not believe in extramission:

   In general it is unreasonable to suppose that seeing occurs by something issuing from the eye; that the ray of vision reaches as far as the stars, or goes to a certain point and coalesces with the object. (Aristotle 1957, p. 225)

We should not, however, draw too close a parallel between the scientific theories of old and the ideas of students. Although some common features can be found, students' theories are never as sophisticated, internally consistent, coherent, or even as powerful as those of the scientists of old (Driver et al. 1985).

Many researchers have suggested that a historical approach to the teaching of science would help reduce misconceptions (Champagne et al. 1980; Nussbaum 1986; Prather 1985). Studying science by simply reviewing the historical development of the field cannot be effective in changing students' misconceptions. Understanding what scientists believed in the past is a very difficult and time-consuming process. The evolution of scientific ideas is a field all its own:

   It is not enough to discover what our predecessors believed and leave it at that: we must try to see the world through their untutored eyes, recognize the problems which faced them, and so find out for ourselves why it was that their ideas were so different than our own. (Toulmin and Goodfield 1967, intro.)

Others suggest that modern and ancient paradigms could be contrasted by reading original sources. This would help to acknowledge preconceptions held by the students and foster a major reconceptualization (Champagne et al. 1980). Although this all seems reasonable, most astronomy texts already begin with a historical treatment of the development of astronomy, and many teachers expand on this treatment in their courses. Yet misconceptions persist in these courses (Sadler and Luzader 1988). There has yet to be a definitive study showing whether emphasis on the history of science changes even a single misconception.

One university biologist carried out an investigation into whether students' conceptions of photosynthesis across grade levels model the historical development of the concept (Wandersee 1986). By testing 1,405 students in grades 5, 8, and 11, and college sophomores, he found that younger students are more likely to hold conceptions of photosynthesis that were accepted long ago but have since been discarded. Their conceptions do not necessarily pass through the same stages that the field has passed through historically. Yet, Wandersee notes, exposing students to misconceptions of the past is still a useful activity.

Professionals in every field usually organize their own ideas around certain grand schemes: philosophical (evolution is the engine of nature); historical (the place of the Earth in the universe); or symbolic (Newton's three laws of motion are the basis of mechanics). We must guard against believing that the way we think as adults (and scientists) can provide a framework for the structure of curricula or teaching methodology. Philosophical, historical, or symbolic approaches to the teaching of science may not be effective in changing student misconceptions. They are expert views, and we, as experts, are comfortable with them. Students, however, may benefit from methods that seem objectionable or vacuous to us. The real determinant is whether they work with youngsters, not whether they appeal to us as experts who have since forgotten how we learned these concepts ourselves.

2. Concept Lists

Before the modern investigation of scientific misconceptions, several studies were conducted that had as a goal the identification of key concepts taught in secondary school science courses. Some studies undertook the measurement of concept attainment for certain groups. Others hoped that by enumerating these concepts, one could compare this course content with that of college-level courses. With this information, college courses could then be modified so that teachers-in-training would have a better foundation in the subject they were preparing to teach.

In 1983, 100 high schools in Texas participated in the assessment of the earth science knowledge of 492 randomly selected seniors (Rollins et al. 1983). Six concepts were tested by a 72-item multiple-choice instrument, with 12 questions per concept. Two of the concepts were astronomy-related: the reason for the seasons and the cause of day and night. The questions were rather simple and mixed factual and conceptual items.10 Each item consisted of a stem that posed the problem, one right answer, and three distractors. Random guessing with only four choices would have resulted in an average score of 25 percent. Students had not completely mastered either of these astronomy concepts by the end of twelfth grade: students answered 79 percent of the day-and-night questions correctly and 67 percent of the seasons questions correctly. Students who had taken more science courses performed marginally better on these items (7 to 10 percent) than the others.

In 1966, an Ed.D. candidate at Colorado State College identified 119 key astronomical concepts being taught by the ESCP (Earth Science Curriculum Project), at the time a new ninth-grade curriculum, and compared these with concepts in earth science courses being taught in local colleges (Sonnier 1966). The researcher found that the concept content of the high school and college courses was very similar: only 4 of the 119 concepts covered in high school courses were not covered in college courses. Sonnier was especially interested in whether teachers' comfort in teaching earth science was related to the courses they took in college. Sonnier found that teachers did not feel especially well prepared by their college courses, although teachers with more earth science in college knew more of the ESCP concepts. Of special significance, a negative correlation was found between the amount of college preparation in earth science and the number of concepts teachers felt they had learned in college. Teachers with more courses under their belt felt they had learned less than those with fewer courses. Perhaps these more advanced teachers recognized after several semesters that they held misconceptions that took a long time to change, and were less likely than others with less experience to assume that they had learned these concepts in college. These more experienced teachers were much more likely to attribute learning these concepts to other professional activities or simply to reading the ESCP textbook, according to Sonnier.

10 An example of the latter is:

The tilt of the earth's axis is a cause of:
A. the Coriolis Parameter
B. earthquakes
C. day and night
D. the seasons


A list of earth science concepts was prepared by fifty-four earth science professors in 1972 to help guide curriculum efforts in grades K-12 (Janke and Pella 1972). The authors collected concepts in response to this request: "list what you believe to be the three to five most important concepts in your area of specialization within earth science" (p. 225). This resulted in a list of fifty-two concepts, of which eleven related to astronomy. Of the six top choices, three were astronomical: the reason for the seasons; the cause of night and day; and the Sun as the provider of almost all the energy available on the surface of the Earth. All three are subject to misconceptions within school and adult populations.

In 1963, James L. Keuthe investigated the popular understanding of many facts and concepts in science by interviewing 100 college-bound high school seniors. Drawing from his own store of questions, he discovered several answers that were very popular, but wrong. He deemed these "misconceptions," stipulating that they were produced when a question "evoked the same error" in most of his subjects. These were not simply guesses that would produce a great number of different answers; the same wrong answers appeared again and again. Some of Keuthe's discoveries about these students were:

— 70 percent believed that the Earth's shadow causes the phases of the Moon.
— 54 percent believed that the Sun is not the nearest star.
— 33 percent were unable to state why the Sun rises in the East.

He concluded that these misconceptions were a cause for concern because students really do believe that their answers are correct. In spite of the fact that the students had perhaps once been correctly taught this material, Keuthe explained why misconceptions appear: "forgetting occurs because the memorization was rote and not in the framework of a logically meaningful system" (Keuthe 1963). This is an idea very similar to those of David Ausubel.

In 1976, a study was conducted with 220 students in grades K-4 to examine their explanations for various natural phenomena (Za'rour 1976). One question concerned the Moon. While 75 percent of the students remarked that it changed shape, 49 percent thought it changed weight as its shape changed. Za'rour found that youngsters are very observant, but waste no time in fitting their observations into their current conceptual framework. In this case, they thought that the Moon was not a sphere being illuminated in different ways, but an object that was losing part of itself.

C. A Hierarchy of Misconception Research

Since the first clinical interviews with students, misconception studies have become more sophisticated. Because the interview process is so time-consuming, many researchers have attempted to construct simple written instruments to assess the misconceptions of students. Such tests have been used to inform teachers of the potential difficulties that will arise when they are trying to teach certain concepts. These "pre-tests" have also been used to predict student achievement. Most encouraging is that attempts have been made to apply tests of misconceptions to the improvement of instruction, both by testing instructional methods and by curriculum revision.


1. Clinical Interviews

While working on his doctoral dissertation testing the efficacy of audio-tutorial methods, a Cornell graduate student found that he could not make sense of the explanations given by his second-grade subjects. His interviews were open-ended and based on getting answers to these questions: What is the shape of the Earth? How do you know that the Earth is round? Which way do we have to look to see the Earth? Why don't we see the Earth as a ball? What does one have to do to see the Earth as a ball? (Nussbaum and Novak 1976).

Although he started by following Piaget's structured clinical interview technique, Nussbaum soon found that the props he provided — a globe and a tiny figure of a person — were not having the desired result. Student explanations still did not make much sense to him. He finally settled on a series of questions about drawings he had prepared that helped to place the children into five categories based on their notions of the Earth in space. Even so, he was sure that he had not discovered all their conceptions, and he recommended that they be confirmed through further interviews. The notions that Nussbaum found prevalent in second graders were:

1. The Earth is flat, although it may be circular and hence "round."
2. The Earth is a ball, but we live inside or above an absolute "ground."
3. The Earth is a ball surrounded by space, but there is an absolute up and down.
4. The Earth is a ball, but objects fall only to the Earth's surface.
5. The Earth is a ball and objects fall toward the center of the Earth.

Nussbaum returned again to test more students and developed an "Earth Notion Classification Scheme" that roughly approximated the stages of thought students pass through in developing their notions of the Earth in space (Nussbaum 1979). Because of the ground-breaking results of Nussbaum's research, his studies were replicated by researchers around the world: in Nepal (Mali and Howe 1979); in California (Sneider and Pulos 1983); in Israel (Nussbaum 1979); with Mexican-American children (Klein 1982); and in Greece (Vosniadou and Brewer 1987). Nussbaum's results were substantiated and expanded. His scheme was refined to:

1. The Earth is flat.
2. We live within an Earth that is shaped like a ball and surrounded by space.
3. We live only on top of an Earth that is shaped like a ball and surrounded by space.
4. (#3 above) and objects fall to the surface of the ball.
5. (#3 above) and objects fall toward the center of the Earth.

Sneider and Pulos added another category to this scheme, which they named 3/4, in which students believe that objects fall to the surface of the Earth, but that people do not live all over the Earth's surface. Vosniadou and Brewer found that many children believe that there are two Earths, one that is in the sky (or in space) and the other on which we live.11

In the studies of Nussbaum and of Sneider and Pulos, many 13- and 14-year-old students still believe that the Earth is flat or that we all live inside a hollow Earth. This confounded the middle school science teachers with whom I have talked. Could it be that their students have these beliefs without the teachers being aware of them? I decided to pursue my own line of inquiry into this matter by asking teachers to predict the Earth notions of their students. For three summers (1988, 1989, 1990), as a part of my presentation of astronomy activities to teachers at the National Science Resources Center Institute and the Independent School Association of Massachusetts, I had 111 teachers of grades K-8 predict the distribution of Earth notions among students in their classes (Lightman and Sadler 1986). The level of Earth notion was based on Nussbaum's scale as described above. For each grade level, an average score was calculated for both the teachers' predictions and the actual students' performance. Sneider and Pulos had carried out testing in grades three through eight; teacher predictions were made for grades kindergarten through eight.

The results were quite surprising (see Figure 2). At each grade level, teachers predicted that their students would perform well above the average levels found by Sneider and Pulos. If the data collected by Sneider and Pulos are representative of the students in the sample teachers' classrooms, then teachers are woefully unaware of their students' conceptions of the world. Student conceptions in grade 3 (and most likely in K, 1, and 2) are of a flat or hollow Earth. Yet teachers believe that students' notions are much more sophisticated. Undoubtedly, instruction about space or world geography is very ineffective at these grades because of the disparity between the conceptions students hold and the concepts their teachers think they hold. Many students at the lower primary level think the globe in the classroom is a model of some other planet in outer space, not the one on which we live. How can they make sense of geography instruction whenever a globe is used? Teachers' awareness of their own students' misconceptions may be a fruitful subject for further study.

11 I have had discussions with my own son, when he was five years old, about his notion of the Earth: he does believe that there are two Earths, the one that is in outer space, which he sees in his picture books clustered with the other planets, and the one on which we live.

Figure 2. Earth Notions Development Level. Student interviews from Sneider and Pulos 1983 (N=134); teacher predictions from Sadler 1987-90 (N=111; Lightman and Sadler 1986). The figure plots the average Earth notion development level against grade level, K-8.

Interviewing students to find out what their ideas are is a difficult and time-consuming affair. It requires attention to finding out what youngsters really think, as opposed to having them reiterate what they have been taught. The technique of "Interviews about Instances" was developed to help get the interviewee to talk about his or her ideas (Bell et al. 1985). This method involves asking a series of open-ended ("What do you think the Earth looks like from space?") and closed-ended ("Does the Moon have night and day?") questions that the interviewer can prepare in advance. Each question can be written on a card with a simple picture or diagram that helps to illustrate it. These instances, however, are only the starting point. The interviewer must not stick to a rigid line of questioning, but must always seek to improvise questions that help in understanding the ideas of the child. Whenever the child uses scientific or specialized vocabulary, the interviewer must follow up until confident of what the child means by any such word. Children often mean quite different things than scientists do when they use scientific terms. To young children, the word "animal" usually means four-legged mammals and specifically excludes humans and insects (Carey 1985). The terms "velocity" and "acceleration," so clearly distinguishable to the physicist, are used interchangeably by many physics students (McDermott 1984). The phrase "I don't know" should be a stimulus to the interviewer to push further, albeit gently, to determine whether the student is avoiding the question because he or she thinks the interviewer is judging the answer. It is important that the interviewer be seen as interested in all of the student's ideas and not as someone who is judging the student's worth (Driver and Easley 1978). I have found that, given a little encouragement and support, students feel safer, rarely stick to an "I don't know" answer, and become more forthcoming with their ideas.

Videotaping the interview can be very helpful. It frees the interviewer's attention by removing the need to take notes, so that the subject is never told to wait while the interviewer catches up. The tapes can later be reviewed to create an inventory of student misconceptions (Sadler 1987). I have found that students seem less reticent in front of a camera than when their ideas are being recorded on paper.

Researchers recommend that teachers interview their students about their scientific conceptions (Duckworth 1987; Langford 1989; Novick and Nussbaum 1978; Nussbaum and Novak 1982). Such action is recommended to help clarify teachers' own ideas and lead to an appreciation of "children's science" (Osborne and Bell 1983). While this is an admirable suggestion, teachers have great difficulty developing expertise in this procedure.12 The misconception literature says nothing about teacher difficulties in learning to interview their students, and I could find no study that looks at how teachers can become effective interviewers. Judging from my own experience, listening to my classmates, and watching videos of teacher-led interviews, I have found that teachers have great difficulty simply because they expect to teach when they interview. Many express a desire that students will learn something as a result of the interview. This leads them to correct any student statements that they deem incorrect and, of even more consequence, to limit the exploration of statements that appear correct or use scientific jargon. Teachers tend to turn interviews into situations with which they are more familiar and comfortable: either testing sessions where they reward right answers or tutoring sessions where they instruct the students. Papers that recommend that teachers interview their students should stress two indispensable rules:

• Never try to teach in an interview.
• Continue until you find where the student's conception disagrees with your own.

12 The technique of using teachers to interview learners is used by Professor Eleanor Duckworth in two of her classes at the Harvard Graduate School of Education (T-440, Teaching and Learning; and T-150, Research Based on Understanding). Her students, most of whom have experience as teachers or soon will be teachers, learn to use clinical interviews to explore students' ideas. Even after an entire semester of practice, I have observed that many of these teachers and teachers-to-be still appear to be struggling to master this technique. To suppose that this skill can be learned merely by reading an article is naive.

2. Development of Written Instruments

The major shortcoming of finding student misconceptions through interviews is that these investigations, because of their one-to-one nature, rely on few subjects. Interview studies with more than 100 subjects are rare in the literature. Because of their small size, these investigations are not really representative of the school-age population they seek to study. Researchers have investigated various ways around this problem in order to test larger and more diverse samples of students. Prather called for the development of "reliable diagnostic tests designed to identify and classify as many common misconceptions of science as possible" (Prather 1985, p. 27). He assigns this task an immediate priority, in that any work on improving the teaching of science by reducing misconceptions must wait for appropriate evaluation instruments.

Open-ended written tests, where students fill in an explanation for the example posed, have been used by researchers: students are asked to give a written explanation for a particular event and can include diagrams if they wish. Although this technique can be quite fruitful, students often simply recall the "official" explanation for a particular phenomenon and load their explanation with jargon to cover their confusion. Even so, these written instruments can lead to the identification of many popular misconceptions, from which multiple-choice tests can be constructed (Halloun and Hestenes 1985).

Written tests have been constructed that force a choice between a single correct answer and several misconceptions that have been identified through interviews (Freyberg and Osborne 1985). This technique limits the student's incorrect responses to previously identified misconceptions. The inclusion of an "I don't know" or "none of the above" category reduces the efficacy of the test, because students are no longer obliged to choose among conceptions that may closely, but not exactly, match their own. A two-tier pencil-and-paper test has proved to be somewhat more helpful, in that it combines the elements of an open-ended test with a multiple-choice test (Treagust 1986). Each test item consists of two parts, each multiple-choice in nature. The first part asks the student to predict the outcome of a situation. The second part asks the student to select a reason for this answer and provides a space for filling in his or her own reason, if the stated answers are inadequate. In interpreting these multiple-choice tests, responses that draw more than 10 percent of the answers are usually examined in depth (Gilbert 1977).

A major shortcoming of using written instruments instead of interviews is that the results have to be interpreted in the light of differences in students' reading and writing abilities, rather than only in terms of their misconceptions (Cohen 1980). Written tests do not allow the interviewer to pursue the subject's ideas until they are clearly captured. However, in an attempt to find out if multiple-choice tests could discriminate between students' Piagetian operational levels, Gilbert interviewed 20 college students and later tested them with a written instrument (Gilbert 1977). He found the students' performance identical on 93 out of 120 items. Still, I could find no study in which the effectiveness of multiple-choice tests of misconceptions was compared with that of interviews. This could be a valuable research opportunity. Multiple-choice tests need not be used exclusively. Interviewing students along with giving them multiple-choice tests can be used to validate the tests (Halloun and Hestenes 1985). Interviewing even a modest fraction of students taking a formal written test can help describe in detail any conceptual changes that have taken place. Interviews, because of their open-ended nature, can help uncover areas of conceptual change that may not be picked up by prepared pre-tests and post-tests (Finley 1986).

Table I shows the type of misconception test used in each of the thirty-eight studies that I have examined in the course of preparing this paper: clinical interview; open-ended statements requiring writing or drawing by the subject; and multiple-choice. The median number of subjects is tabulated for each of these three types of test. The median size of studies using interviews is only thirty-six subjects, so results may often fail to reach significance. Written tests are quickly and easily administered to large groups of students; tests requiring written answers have a median size of 113 subjects, roughly three times the size of studies using interviews. Reading level and the ability to express oneself through writing become important in these tests (Stead and Osborne 1980). Open-ended questions have the advantage of uncovering unexpected misconceptions, whereas multiple-choice tests produce standardized answers that are easily compared. Multiple-choice tests show a median of 189 subjects in the studies I have examined. They require that the misconceptions of the subjects be identified previously and codified into a structure of short, unambiguous answers. Multiple-choice tests are easily scored, even by those with no knowledge of misconceptions, and their results can be used immediately. They are useful not only for studies, but also for teachers seeking to easily ascertain misconceptions in their own classrooms.


Table I. Misconception Study Sample Size by Domain and Type of Test

Study                            Domain            N (subjects)
Anderson and Karrqvist 1983b     light             21; 207
Anderson and Smith 1986          light             11; 125
Arnaudin and Mintzes 1985        human biology     50
Bouwens 1986                     light             —
Brown and Clement 1986           mechanics         50
Caramazza et al. 1981            mechanics         50
Clement 1986                     mechanics         —
Cohen 1982                       astronomy         —
Dai 1990                         astronomy         185
Dufresne et al. 1986             mechanics         42
Finley 1986                      magnets           —
Gunstone and White 1981, trial   gravity           175
Gunstone and White 1981          gravity           468
Halloun and Hestenes 1985        mechanics         1,500
Happs and Coulstock 1987         astronomy         25
Hardiman et al. 1986             mechanics         42
Kenealy 1987                     mechanics         513
Keuthe 1963                      general science   100
Klein 1982                       cosmography       —
Lightman and Miller 1989         cosmology         —
Mali and Howe 1979               cosmography       —
Nussbaum 1979                    cosmography       —
Nussbaum and Novak 1976          cosmography       —
Ogar 1986                        gravity           —
Placek 1987                      mechanics         —
Rhoneck and Grob 1987            electricity       10
Roth 1985a                       biology           18
Sadler 1987                      astronomy         25
Shipstone et al. 1987            electricity       —
Sneider and Pulos 1983           cosmography       134
Stead and Osborne 1980           light             —
Thijs 1987                       mechanics         639
Touger 1985                      astronomy         113; 99
Treagust and Smith 1986          astronomy         24; 113
Viglietta 1986                   astronomy         —
Vosniadou and Brewer 1987        cosmography       —
Wandersee 1986                   biology           —
Za'rour 1976                     general science   —

Median N by type of test: interview, 36; written or drawn, 113; multiple-choice, 189.

The advantage of interviews is that the reasons behind a student's response can be pursued by the interviewer and ambiguous responses can be clarified. However, the interviewer must be very skilled and aware of student misconceptions. Tapes or video must be transcribed for a full analysis to be completed (Stead and Osborne 1980), and this may be both time-consuming and expensive.

Figure 3. Range of Study Size by Instrument Type. The figure plots the number of studies against the number of subjects per study (10 to 10,000) for multiple-choice, written or drawn, and interview instruments.

Figure 3 illustrates the relative frequency of use of these three different test methods. Each of the studies was grouped by its number of subjects into one of four categories: 1 to 9; 10 to 99; 100 to 999; 1,000 to 9,999. Interviews are generally used in small groups, while open-ended questions and multiple-choice tests are used in large groups, with the latter preferred for very large groups.

Table II. % of Studies Using Different Instruments by Sample Size

Size of Study      Multiple-choice   Written or drawn   Interview
11 to 100          29%               38%                78%
101 to 1,000       50%               56%                22%
1,001 to 10,000    21%               6%                 0%
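The tabulation behind Figure 3 and Table II amounts to binning each study by the decade of its sample size and computing, for each instrument type, the share of its studies falling in each bin. A minimal sketch, with an invented study list standing in for the full set:

    from collections import Counter
    from math import log10

    studies = [  # (instrument, number of subjects); illustrative values only
        ("interview", 25), ("interview", 36), ("written", 113),
        ("written", 468), ("multiple-choice", 189), ("multiple-choice", 1500),
    ]

    bins = Counter()    # (decade, instrument) -> number of studies
    totals = Counter()  # instrument -> total number of studies
    for instrument, n in studies:
        decade = int(log10(n))  # 1 covers 10-99 subjects, 2 covers 100-999, ...
        bins[(decade, instrument)] += 1
        totals[instrument] += 1

    for (decade, instrument), count in sorted(bins.items()):
        share = 100 * count / totals[instrument]
        print(f"{10**decade}-{10**(decade + 1) - 1:<6} {instrument:15s} {share:5.1f}%")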

3. Comparative Studies of Scientific Misconceptions

Several studies have attempted to compile misconceptions that cover an entire scientific domain. Rather than relying on new interviews, these attempts usually build on earlier interview research. The reason given for pursuing this line of inquiry is to inform the practitioner of the broad range of misconceptions that must be dealt with while teaching a specific course.

In 1988, a doctoral student at Loyola University produced an eighteen-item multiple-choice test to identify misconceptions in earth science (Schoon 1988). Of these items, thirteen pertained to astronomy.13 Schoon developed many of his test questions from those of earlier researchers (Janke and Pella 1972; Keuthe 1963; Lightman et al. 1987; Sadler 1987). He wished to determine how the number of misconceptions varied across gender, race, grade level, geographic location, exposure to earth science courses, and the last science grade received in school. He gave his test to 1,213 students in the greater Chicago area in grades 5, 8, 11, college, and trade school. Schoon found that females had significantly more misconceptions than males (at p ≤ 0.05), although by a magnitude of only one-third of a test item. Black and Hispanic students had significantly more misconceptions than white students (at p ≤ 0.05), although again the magnitude was small, only about one-half of a test item. One might expect students at different grade levels to exhibit vastly different levels of misconceptions; however, the mean number of misconceptions of fifth-grade students was only one-half an item higher than the mean for college students (at p ≤ 0.05). A significant difference was found between urban and suburban students (at p ≤ 0.05), although it too was only about one-half a test item. Surprisingly, there was no significant difference (at p ≤ 0.05) between students who had taken earth science courses, either in college or in high school, and those who had not. Finally, no significant difference was detected in total number of misconceptions based upon the last science grade the student received (at p ≤ 0.05).

13 Here is an example of one of the test questions:
Summer is warmer than winter, because in summer:
a) The sky has fewer clouds.
b) The earth is nearer the sun.
c) The earth is better insulated.
d) The sun is higher in the sky.


The "best" students in science were no different from their classmates in the number of misconceptions they possessed.

Schoon's study shows very small but statistically significant differences in mean number of misconceptions between his subgroups. This may be a result of the similarity in knowledge of these groups, or it could be a result of problems with his test. It appears that the test was devised with little attention paid to the establishment of its validity. He did construct a pilot instrument of sixty-three multiple-choice items, but these questions do not appear to be identical to those used in the studies of earlier researchers. Three science teachers reworded questions to improve clarity and accuracy and eliminated others. A fifty-question test resulted from this procedure. Schoon's work has made a major contribution to the study of misconceptions; however, several changes would have increased its validity. A discussion of these problems of readability, comprehension, expert evaluation, reliability, and number of distractors follows.

Three fifth-grade teachers reviewed this instrument for readability and content, but no formal readability measurement was applied. I applied two readability tests to Schoon's test.14 The Flesch grade level (Hopkins 1981) is 5.8 and is based on the average number of words per sentence and the average number of syllables per word. The Gunning Fog Index (Microsoft Corporation 1991) is 7.6 and is based upon overall sentence length and the number of words with more than one syllable. Since the estimated readability of the test lies somewhere between 5.8 and 7.6, the majority of the fifth graders would have had difficulty reading it, and some of the eighth-grade students would have had difficulty as well. Had the test been designed to be more readable, the younger students might have performed the same as or possibly better than the older students in Schoon's study.

This preliminary instrument was administered on the very first day of the fall semester to seventy-five high school and college students. Students suggested that some questions be reworded. This pilot test was never tried out with students in the fifth or eighth grades. The suggestions and improvements made by high school teachers, high school students, and college students may have made the test too difficult for younger students. The author does not discuss how he made the final selection of items to include.

14 Applied by the grammar checker in the word-processing program Microsoft Word 5.0.
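For reference, the two readability measures discussed above reduce to simple formulas over counts of sentences, words, and syllables. The sketch below uses the standard published constants; whether the Word 5.0 grammar checker used these exact constants is an assumption on my part:

    def flesch_kincaid_grade(words, sentences, syllables):
        """Flesch-Kincaid grade level from total words, sentences, syllables."""
        return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

    def gunning_fog(words, sentences, complex_words):
        """Gunning Fog index; 'complex' words have three or more syllables."""
        return 0.4 * ((words / sentences) + 100 * (complex_words / words))

    # Illustrative counts for a short test item:
    print(flesch_kincaid_grade(words=60, sentences=5, syllables=90))  # about 6.8
    print(gunning_fog(words=60, sentences=5, complex_words=6))        # 8.8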


An example of the type of question that may have been too difficult for fifth graders to understand is Schoon's question #2:

Each day during the summer months, the amount of daylight:
a. is more than the day before.
b. is less than the day before.
c. is the same as the day before.
d. has nothing to do with the day before.

Fifth-grade students may not be aware which are the summer months ("Is June a summer month?"), may not know that the "amount of daylight" means the number of hours the Sun is above the horizon, or may not understand the proposed relationship between daylight on subsequent days. If June is a summer month, then the correct answer is not listed. The logical structure of some of the questions seems very complex and beyond the ability of younger students to decipher. For example, take question #15:

If a crystal can scratch glass, then:
a. it is a diamond.
b. it is not a diamond.
c. it may be a diamond.
d. it probably is not a diamond.

Students may not be able to discriminate clearly between statements that are true, may be true, are probably not true, and are not true. In addition, the "it" in these answers may be confusing to students: "it" may be construed as the crystal, the glass, or a diamond.

Schoon did not administer the test to earth science experts to determine if they agreed on all of the answers. He never administered the test in an open-ended format to determine if students had any additional misconceptions. Reliability was never formally established. Students were never asked to answer the test questions orally to check if they would still give the same answers. Students were never given the test twice to ascertain the role of guessing in their answers. There was no comparison of test scores among different but equivalent groups. The Kuder-Richardson test (Aiken 1985) would have proven useful for examining the internal consistency of the instrument (Anderson 1975).

4. Causal Models of Achievement Using Scientific Misconceptions

Students have various misconceptions that persist throughout their schooling and carry over into adulthood. With the development of instruments to ascertain misconceptions in various fields, it becomes possible to investigate the role of misconceptions in learning. Studies have been constructed to determine: How important is the holding of scientific misconceptions to student achievement? What is the relative importance of preconceptions, compared to demographic attributes and previous schooling, for student achievement? To what degree is it possible to predict students' performance in a course on the basis of the misconceptions they hold at the start?


One might suspect that students with high Intelligence Quotients hold fewer misconceptions than less bright students, or that students who receive higher grades in courses hold fewer misconceptions than those who receive lower grades. Using the statistical tools of multiple regression or factorial modeling procedures (Lohnes 1979), one can investigate the causes of student performance. Two investigations have been carried out to explore the relationship between IQ and the number of scientific misconceptions that students hold.

One study involved the administration of a fourteen-item Newtonian mechanics misconception test to two groups taking a similar physics course in the same school (Placek 1987). The gifted group of twenty-five students had a mean IQ of 146. The less gifted group had a mean IQ of 116. The test consisted of nine prediction questions that required a written response and five tasks requiring predicting the outcome of an experiment. The test items were based on those that had been administered by other researchers (Clement 1982; McDermott 1984; Minstrell 1982; Osborne 1984). Although the mean final course grades given by the physics teacher were very different (93 for the more gifted versus 81 for the less gifted), the misconception test did not show a similar dichotomy. A chi-square test detected no significant difference at the p < 0.01 level between the two groups on any of the fourteen questions. On half of these items the more gifted students did better; on half they did worse than the less gifted group.

Another investigation, in Germany, was conducted with ten students whose IQs ranged from 107 to 141 (Rhoneck and Grob 1987). These ninth graders were taught lessons on basic electricity using apparatus with which they could test their ideas. They took a multiple-choice misconception test based on misconceptions uncovered by prior investigators (Shipstone et al. 1987). Rhoneck found no significant correlation, at the p < 0.01 level, between IQ and misconceptions either at the start or at the end of the course. However, he detected a significant positive correlation of 0.90 between pre-test and post-test scores. This implies that students tended to hold on to their original conceptions throughout the course, regardless of their IQ.

Newtonian mechanics is one area within physics in which many student misconceptions have been identified through interviews. There have been so many interviews and small studies that a comprehensive inventory can be created (Halloun and Hestenes 1985). It has become possible to use such an inventory of items to find out how the holding of misconceptions may affect student performance. Three studies have sought to measure students' qualitative understanding of Newtonian mechanics and to assess the role of misconceptions in mastery of mechanics.

In 1978, a study was conducted to determine factors predictive of student achievement in an introductory, non-calculus-based physics course at the University of Pittsburgh (Champagne et al. 1980). Two classes of factors were investigated. The first were measures of the students' background, including gender and the number and type of courses previously taken in mathematics and science. The second set of factors were measured by testing the students with instruments constructed especially for the study:

— a motion preconception test, in which students watched an adult carry out demonstrations and then described their observations and offered predictions of future events (e.g., many thought objects hung on a string exerted more force on the string when closer to the floor);
— a reasoning test, which required the application of logical reasoning to representations of the real world;
— a math test, which covered skills such as use of the quadratic formula, scientific notation, and trigonometric functions.

These three specially prepared instruments were given at the start of the course to a total of 110 participants. Subjects also answered questions about their science and math backgrounds. Students took mid-term and final one-hour exams that tested their ability to apply physical principles to new problems. The authors used correlations between each of the variables and regression analysis to examine the effect of the students' previous knowledge on their mechanics achievement score (their examination grade). The only significant correlations (p < 0.01) between the variety of factors and the mechanics achievement score were with the instruments that were especially prepared for this study: the preconception test, the reasoning test, and the math skills test. Number and type of science and math courses taken were not significant.

A multiple regression analysis was used to determine the amount of variance in the achievement test that could be accounted for by each of the components. All of the background measures examined in the study (gender and years of physics, mathematics, or science taken in high school or college) had no significant effect (at p = 0.05). The authors were especially surprised at the apparent lack of effect of taking high school physics (R = 0.09, not significant at the p < 0.05 level), even though some students had taken two years of it, a standard course and an advanced placement course. Since the primary justification given by teachers at every grade level for science coursework is the preparation it provides for the next level, it appears that this reason is unjustified by the results of this study (Hofwolt 1985). High school physics courses are, on average, ineffective in preparing students properly for introductory college physics courses.

The three specially prepared tests were significant (at the p = 0.01 level) in accounting for some of the variance in the mechanics achievement score. The multiple regression analysis estimated the contribution of each factor to explaining the variation of test scores. All correlations were significant at the p = 0.01 level. In Table III the cumulative effect of these three significant factors is tabulated. Multiple R² is the cumulative amount of variance explained by the listed factor together with all factors entered before it; the "Variance explained" column gives the additional variance contributed by each factor, as a percentage.


Table III. Contribution of Factors to Variance

                             Multiple R²    Variance explained
Motion preconception test    0.056          5.6%
Logical reasoning            0.143          8.7%
Math skills                  0.325          18.3%
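The procedure behind Table III is a hierarchical regression: predictors are entered one at a time, and the increase in R² at each step is the additional variance attributed to the new factor. A minimal sketch with synthetic data (the coefficients below are invented and do not reproduce the study's values):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 110
    motion = rng.normal(size=n)     # motion preconception score
    reasoning = rng.normal(size=n)  # logical reasoning score
    math = rng.normal(size=n)       # math skills score
    achievement = 0.2 * motion + 0.3 * reasoning + 0.4 * math + rng.normal(size=n)

    def r_squared(predictors, y):
        X = np.column_stack([np.ones(len(y))] + predictors)  # intercept first
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        residuals = y - X @ beta
        return 1 - residuals.var() / y.var()

    entered, previous = [], 0.0
    for name, x in [("motion", motion), ("reasoning", reasoning), ("math", math)]:
        entered.append(x)
        r2 = r_squared(entered, achievement)
        print(f"{name:9s} cumulative R^2 = {r2:.3f}, added = {r2 - previous:.3f}")
        previous = r2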

The authors were confounded by the relatively small amount of variance in achievement score explained by the motion preconception test. They examined the students' written answers on the motion preconception test, concentrating on answers from only the highest- and lowest-performing students in the class. The contrast between these two groups was obvious: the highest-scoring students made heavy use of the terms and concepts of mechanics and were consciously aware that they had accepted the Newtonian model of the world, while low-scoring students explained motion on the basis of some other paradigm.

Champagne and Klopfer continued this line of inquiry and in 1982 carried out further analysis using the same data, but different statistical tools (Champagne and Klopfer 1982). Their application of Factorial Modeling gives a more detailed look at the structure and strength of the relationship between tested variables. It produces factors that are minimized for their intercorrelation. The advantage of this procedure is that predictions can be made concerning the effect that changes in the predictors will have on the criterion variate. In this study, they modeled the impact on the mechanics achievement score of changing the motion concepts, reasoning, or math skills scores. (Raising the score on any of several independent variables will increase the dependent variable by a calculated amount.) The example the authors give is the effect of separately raising the score on each of the pre-tests by one standard deviation. The predicted achievement score would then rise, for each test, by: motion concepts, 0.1 SD; reasoning, 0.2 SD; math skills, 0.3 SD. It is interesting that the two components that have the least to do with physics raise the achievement score the most. To improve the reasoning score, the authors suggest working with puzzles, word problems, and logical problems. To improve the math score, practice with solving equations and graphing would be most appropriate. I have often heard high school physics teachers complain that their students "can't think" and "can't do math." They lament that these abilities should have been honed in earlier grades and that there is no time in physics class for the teacher to pursue improvement in these areas. Given that high school physics instruction has virtually no predictive value for college achievement in physics, physics teachers may be wise to teach less of what they deem physics and more reasoning and math. In this way they might make a significant impact on the learning of physics by their students. A study that explores this hypothesis in comparison with conventional instruction would be very useful.

In 1985, researchers at Arizona State University carried out an investigation similar in parts to the study above (Halloun and Hestenes 1985). Two college physics teachers constructed a thirty-six-item physics misconceptions test and administered it to a much larger population than Champagne and Klopfer's: 1,500 college students taking either a first-semester calculus-based or a non-calculus-based physics course. They also ascertained the students' math skills and the number of physics and math courses taken previously. The dependent variable was the course grade, determined almost entirely from examination results. These results are consistent with those of Champagne and Klopfer. However, with the help of a large sample size, statistically significant results were found for four factors: the physics pre-test, the math pre-test, and the physics and mathematics courses taken previously. High school physics had some impact on the non-calculus-based course, but none on the calculus-based class. High school math had some impact on the calculus-based course, but almost none on the non-calculus-based class (see Table IV). Although both these factors were significant, their effect was tiny compared to the contribution of the physics and mathematics pre-tests.

Table IV. Contribution of Courses to Variance

                                R² for students in these courses:
                                Calculus-based    Non-calculus-based
Physics pre-test                0.30              0.32
Math pre-test                   0.26              0.22
All previous physics courses    0.07              0.12
All previous math courses       0.10              0.04

The R² statistic is the fraction of variance of the course grade for the student population that is accounted for by each factor independently. By using the math and physics pre-test scores together, Halloun and Hestenes found that they could predict 53 percent of the variance in letter grades of the students in the physics courses they examined. The physics pre-test score had the greatest predictive power for how well students would learn the subject. The authors suggest that professors give the pre-tests in their courses and that students with low scores be offered special instruction or a pre-physics preparatory course that is more student-centered than large lecture courses.

These two ground-breaking studies have a number of minor flaws. The factors may not have a causal effect. Simply because there is a correlation between math ability and performance in college physics does not guarantee that the math ability of students is a major cause of physics performance, no matter how plausible the idea. The authors do not fully explore alternative explanations for their data. There may have been hidden variables that accounted for these correlations. The authors also lost the opportunity to confront high school teachers of physics with their results. How would teachers explain the lack of correlation between taking their course and doing well in college physics? There is an opportunity to replicate these studies in chemistry and biology as well. Determining the role that high school science courses play in how students perform in college science courses may give some hints about what type of science curriculum is appropriate for those high school students who wish to take science in college.

Neither of these studies uses item analysis to determine which items on the math and physics pre-tests are the best predictors of student performance and which have no predictive power. A much shortened test, made up of only a few questions, would be almost trivial to administer on the first day of class. It would then be easy to shunt those students who would ordinarily do poorly in introductory physics away from the course or offer them additional help.15

5. Evaluation of Instruction and Curriculum

We all have misconceptions, and it is difficult to comprehend how they persist in spite of instruction.

15 I have taken the advice of Halloun and Hestenes in Celestial Navigation. The course enrollment is limited by the fact that the department has only twenty-five sextants. Nevertheless, for the last three semesters fifty to seventy students have shown up each semester on the first day of class. I administered a misconception test based on navigational concepts to students in the spring of 1990. The test also documented each student's math and science background, as well as his or her year. At the end of the term, a Spearman correlation (Aikens 1990) was carried out to find which questions on the misconception test were most highly correlated with students' final grades and which were not, or were negatively correlated. A stepwise regression was then performed to find a small set of questions that would account for the largest amount of variance in the student grades. I was surprised that the math and science background of the students had virtually no impact on their performance in the course and that two questions explained 73 percent of the variance in the final grade. I now use this pre-test to winnow down the number of students in the course to twenty-five, and I have noticed that the students seem to learn the material much more easily (I can cover several more topics) and get much higher grades on exams (there are fewer Ds and Fs). I have had misgivings about this being the primary criterion for selection of students (I am also swayed by students who have tried to enroll previously). This improvement could instead be the result of improvement in my teaching, the laboratory material, the text, or some combination of variables. I am continuing to give pre-tests to my students to see if there are other misconceptions that I can test for that are also predictive.
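The screening procedure described in this footnote reduces to a few lines of computation. A minimal sketch with synthetic data (items 3 and 7 are constructed to be predictive, standing in for the two questions that mattered in practice):

    import numpy as np
    from scipy.stats import spearmanr

    rng = np.random.default_rng(1)
    n_students, n_items = 60, 20
    items = rng.integers(0, 2, size=(n_students, n_items))  # 1 = misconception held
    grade = items[:, 3] + items[:, 7] + rng.normal(scale=0.5, size=n_students)

    ranked = []
    for j in range(n_items):
        rho, p = spearmanr(items[:, j], grade)
        ranked.append((abs(rho), j, rho, p))

    # The most predictive items rise to the top of the ranking.
    for _, j, rho, p in sorted(ranked, reverse=True)[:3]:
        print(f"item {j:2d}: rho = {rho:+.2f}, p = {p:.3g}")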


Many researchers have questioned the efficacy of conventional instruction in changing students' conceptions (Anderson and Smith 1986; Brumby 1984; Carey 1985; Champagne et al. 1980; Eaton 1984; Nussbaum and Novak 1976). They often found that the conceptions of students do not change as a result of instruction. Others have sought alternative methods that do work in changing students' ideas. These attempts fall into two broad areas: changing teaching methods, or changing instructional materials. Those who suggest modifications in curriculum materials or teaching methods bear the responsibility for demonstrating their effectiveness (Atkin 1963). To be taken seriously, one must show that any alternative curriculum or teaching technique makes a significant difference when compared either with a control group or with the same group prior to instruction. These methods can be qualitative, such as using clinical interviews, or quantitative, using multiple-choice tests. The number of educational experiments in which new schemes are tested is too large to comment on in this paper. I have restricted myself to a few characteristic examples that bear on astronomy teaching or on methods that deal with misconceptions.

The launching of Sputnik by the Soviet Union on October 4, 1957 gave rise to the curriculum reform movement of the late 1950s and 1960s. Many new approaches to teaching science were funded by the National Science Foundation. These programs were unusual in that they were created by large teams that included scientists, science educators, and teachers, not simply by one or two authors (Kyle 1985). Several of the programs produced are still in use today. BSCS (Biological Science Curriculum Study) Biology and PSSC (Physical Science Study Committee) Physics are two examples of programs that remain popular. ESSP (Elementary School Science Project) is an example of one project that has passed from the scene since its initial funding in 1960. ESSP was developed at the University of Illinois under the direction of Stanley Wyatt, an astronomer, and J. Myron Atkin, a science educator. Their goal was to produce:

materials that are sound astronomically, that reflect the structure of the subject as it is viewed by astronomers of stature, and that can be handled by teachers and children in actual classrooms. (Atkin 1963, p. 129)

It took only three years for this team to create a series of six booklets for elementary students, complete with descriptions of activities, text, and clever illustrations. By 1963, 350 teachers were trying the materials and providing feedback to the project when formal evaluation of the program began (Klopfer 1964b). Leopold Klopfer, then of the University of Chicago School of Education, designed and carried out a study to assess the effectiveness of ESSP materials, beginning in February and ending in May 1964. Klopfer selected a subset of ESSP materials, the first book of the six-book series, Charting the Universe, for his evaluation. He hypothesized that use of the ESSP materials would increase students' knowledge of astronomy and general understanding of science, and positively affect students' attitudes toward scientists and science.

The study used a pre-test/post-test format, and three test components were used to explore the hypotheses. A subject matter pre-test consisted of 28 questions: 15 that dealt with the subject matter specifically covered in the book, and 13 that tested for general knowledge in astronomy. The post-test included these 28 questions plus 14 additional questions. A 36-item test of understanding the processes of science, TOUS (Test On Understanding Science), was also given as a pre-test and post-test, and a "semantic differential" instrument was created and administered to determine affective changes (Osgood et al. 1957).16

The tests were given to ninety-two students during the 1963-64 school year, in four classes comprising the entire fifth grade of the University of Chicago Laboratory School. The same teacher taught all the classes in three fifty-minute periods each week for a total of ten weeks. She followed the teacher guide exactly and also chose to include most of the supplementary activities and exercises in the book. She taught each class in the same way, so the four different classes were considered to be a single population. None of the tests was administered by the teacher.

The results of these tests were depressing. The gains on the subject matter tests were quite small, although they were all significant at the p < 0.001 level. Table V shows mean pre-test and post-test scores. The "Book 1 Content" section tested items that were explicitly addressed in the ESSP materials. The "General Knowledge" section tested for gains in astronomical concepts that were not explicitly addressed in the materials; the authors had hoped that exposure to the ESSP materials would increase knowledge of related topics.

Table V. ESSP Test Scores

                     Pre-test    Post-test
Book 1 Content       37%         49%
General Knowledge    37%         43%
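As discussed below, Klopfer judged each item by whether the distribution of right and wrong answers changed from pre-test to post-test, using a chi-square test. A minimal sketch of such a test (the counts are invented, not Klopfer's):

    from scipy.stats import chi2_contingency

    pre_correct, pre_wrong = 34, 58    # hypothetical pre-test counts (n = 92)
    post_correct, post_wrong = 45, 47  # hypothetical post-test counts

    table = [[pre_correct, pre_wrong],
             [post_correct, post_wrong]]
    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # compare p with 0.05 or 0.01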

IQ accounted for only 7 percent of the variance in these scores. Klopfer went on to calculate the significance of individual test items using a chi-square test. Of the 28 test items on the pre-test, he reported that only 11 had significant changes: 5 from Book 1 and 6 from the general knowledge test. However, Klopfer used a rather generous significance level in interpreting his data, p < 0.05. If we use a more conservative level of p < 0.01, only 7 out of 28 questions showed significant gains.

Results on the TOUS were even less impressive. Klopfer found a significant difference on this test, but the gain in the mean of the class scores was less than one test item, only a 2 percent gain. Only 6 of the 36 questions showed significant gains, and, again, Klopfer was generous in setting the level of significance at p < 0.05; the differences for only 2 out of 36 questions were significant at the p < 0.01 level. Of the test population, 33 students (36 percent) actually scored lower on the post-test than on the pre-test.

The results from the 44-item semantic differential test are a bit more difficult to interpret. At the p < 0.05 level, 23 items show a significant change.

16 The semantic differential instrument, of a form in common usage today, asked students to rate certain statements on a scale of 1 to 5; for example: Doing science is: dull A B C D E exciting


At the p < 0.01 level, 17 items show a significant change. Surprisingly, most of the changes are in the negative direction. Students viewed astronomy as less exciting, reading about science as less enjoyable, scientists as less interesting, and their science teacher as less useful after the ten-week experiment.

Clearly this curriculum had some very serious problems. Students who took it seemed to be in worse shape as a result. Although they learned a few specific skills, there was almost no gain at all in their understanding of how science works, and they developed a worse attitude toward science and its practitioners as a result of the course.

There were flaws in the evaluation design as well that could have affected the results. The subject matter test questions were made up especially for the study, but no mention is made of testing their readability or of getting any feedback from teachers about the chance that fifth-grade students would be able to understand them. The questions appear to be the sort that would appeal to astronomers, but that even elementary school teachers themselves would have difficulty answering.17 I used two tests of readability to determine if this question was understandable by Klopfer's subjects. The question's Flesch Grade Level is 5.6 and its Gunning Fog Index is 7.0. These are too high for most fifth graders to understand the item. Klopfer never interviewed students to determine if they understood the questions. The test's reliability (Kuder-Richardson 21) was only 0.353 for the Book 1 items and 0.475 for the general knowledge items. These are quite low measures of reliability; for an achievement test, a reliability of even 0.66 is considered low (Aiken 1985). For a test with high reliability, students would be expected to answer the same question in exactly the same way if they took the test more than once.

Klopfer made no use of a control group. He could have used half the group for treatment and taught the other half some biology by reading stories; this might have led him to discover some other variable responsible for the poor showing of the course, such as an ineffective teacher. The population of students was also quite extraordinary: this was a laboratory school, and the mean IQ of the subjects was 124.

17 An example of this is question #13:
To find the scale of a model boat, you would
A. find the difference between the length of the model boat and the length of the real boat.
B. measure both the length of the mast and the length of the sail, since at least two measurements are always needed.
C. divide the length of the sail on the model boat by the length of the sail on the real boat.
D. multiply the length of the model boat by the length of the real boat.
E. divide the length of the real boat by the length of the model boat. (Klopfer 1964b, p. 21)


been from Klopfer’s study, he would have a hard time applying them to the average fifth grader. By having the same individual teach all of the ESSP classes, the study may be demonstrating the poor teaching of this individual and not the failings of the materials. Using more than one teacher, especially since the treatment classes were randomized, would have helped determine what factor the teachers played in the changes in student knowledge and attitude. In 1985, two researchers at Michigan State University decided to focus on trying to change the misconceptions of fifth graders concerning light (Anderson and Smith 1986). They developed a simple and efficient method for testing groups of children and embedded this within a sound evaluative design. The experiment consisted of teaching a control group of 102 fifth graders about light using a conventional elementary science textbook (Eaton 1984). The second year, another group of 125 fifth graders were taught the same content, but the text was supplemented by a teachers’ manual and a series of transparencies that helped explore students’ misconceptions about color and light. Teachers taught about light for two or three times a week during a four to six week period. Students were given identical pre-tests and post-tests. The tests were developed by the authors and were based on the content of the textbook and commonly held student beliefs. They contained forty-seven questions in year 1 and were shortened to thirty-seven questions in year 2. The tests were a mixture of open-ended, yes/no, and multiple-choice questions. It covered four topics: a. how people see; b. the nature of color vision; c. the interaction of light with various objects; and d. the structure and function of human eyes. Anderson and Smith developed a 5-point scale (-2, -1, 0, 1, 2) that rated the students’ test answers on the degree to which they were naive or scientific beliefs. The tests were corrected by coders who assigned ratings to each student’s answer to each question. To increase reliability, only questions for which two independent coders had agreements of greater than 80 percent were used during the first year. For the second year, these other questions were dropped from the test. Comparisons are made of only questions that were posed in both years. The authors conducted eleven clinical interviews, five before instruction and six after instruction, to determine the construct validity of the tests. Only two items were found to have agreement of less than 80 percent between interviews and written tests. The authors found statistically significant differences in the post-test scores of the treatment group over the control group when they used a “consistent commitment” to the scientific view as their gauge of student understanding. For example, by the post-test only 20 percent of students in the control group thought that people’s eyes see light that is reflected from objects, whereas within the treatment group the score was an impressive 78 percent.


The large gains in learning of the treatment group may not have been simply the result of the misconception focus of the lessons. The treatment group was taught by the same teachers as the control group, but a year later. The study also found measurable learning in both control and treatment groups from pre-test to post-test. Undoubtedly, teachers were more experienced in teaching about light during the second year, so larger gains should have been expected. Also, the teachers may have been less likely to transfer their own misconceptions to the students during the second year, because they were the objects of study and they learned from the teaching materials as well. The authors report that "Several of the teachers told us that the transparencies and the manual had helped them to understand color and color vision" (Anderson and Smith 1986, p. 26). The authors should have taken the precaution of administering their tests to the teachers as well, to see if the teachers' changes of conception played a role in the measured gains of the students.

The instrument that was constructed for this study relied almost exclusively on requesting explanations for events, and not on predictions, for example:

Is white light a mixture of colors of light? If you answered yes, list some of the colors that make up white light. (Anderson and Smith 1986, p. 20)

The answer used to grade this question is not even scientifically accurate. Light that appears white can be made from two (blue and orange), three (red, green, and blue, the colors of phosphors in a color television set), or up to an infinite number of colors. The so-called seven colors of the spectrum are an arbitrary set named by Isaac Newton (Newton 1721). This question and others on the test may not identify misconceptions at all, but may simply elicit the recall of what the teacher has said or what the student has read. To get a high score on this question in particular, a student needed only to answer "yes" followed by "all colors" or a listing of colors in the spectrum. This answer could easily have been memorized. A better discriminator between scientific and misconceived concepts would have been "What happens to white light when it passes through a prism?," followed by "What happens when you pass only red light through a prism?" Presented with a new situation, students would have had to use their conceptions to come up with an answer.

The rating scale used by the study changed significantly from the first year to the second, so that the results are not easily compared. The test instrument had 21 percent fewer questions in the second year than in the first; students may have had more time to complete the test, or they may simply have been fresher from taking a shorter test. The study would have been better if two teachers had been involved: the first could teach a control group during the first year and use the misconception materials the second, while the second teacher could use the misconception materials during the first year and teach the control group in the second.

The textbook defines the science class. It is generally followed slavishly by teachers, who attempt to cover its entire contents and rarely depart from the topics and the order presented in the book (Hofwolt 1985). This dominance of the text discourages alternative techniques of instruction. Texts present facts and concepts as truth, with little need for discovery or verification by the student. Students' own ideas are not involved. Basing instruction on a text is arguably not as productive as alternative instructional techniques.

In an attempt to explore why students have great difficulty learning from texts, a doctoral student at Michigan State University created text material that dealt explicitly with student misconceptions and compared it with conventional texts (Roth 1985b). Roth found that conventional texts do little to change middle school students' conception of photosynthesis (that plants make their own food, rather than absorbing it from the soil). She selected two conventional textual treatments of photosynthesis (control 1 and control 2) and constructed one of the same length herself (treatment). She assigned six students to read each text and interviewed them immediately afterwards. Pre-tests and post-tests containing multiple-choice and open-ended items were also given.

The experimental text first elicits students' conceptions by asking for definitions and explanations. It then presents experimental evidence that challenges their conceptions. Only after carefully examining and ruling out the most popular student misconceptions, by giving evidence for their falsity, is the scientific concept elucidated. Finally, students must apply this new conception to a variety of situations and problems.

Through interviews, Roth studied how students answered questions after reading the text. She found that students who read the conventional texts would answer text-based questions by recalling "big" words and facts from the text or by calling on their prior knowledge and ignoring the text. When answering questions about the real world, they would always use their prior knowledge, although they would recall examples from the text that they maintained would support their ideas. Students who read the experimental text were reported to make more use of the ideas in the text to change their conceptions. An analysis of post-test scores showed that all students who had used the experimental text were successful in giving up at least some of their misconceptions for scientific conceptions (the four conceptions tested are labeled q1 through q4 in Table VI; students with the correct conception are marked "1," and those with nonscientific conceptions are marked "0"). The students who changed their conceptions were aware that the ideas presented in the text were in conflict with their own views. They used the information in the text to help discard their own ideas in favor of more accurate ones. Roth did notice that some of the students who read the conventional text also changed their misconceptions, but not with the same frequency as those in the treatment group (see Table VI). Also, only those students whose reading levels were roughly six years above grade level could effectively use the conventional texts to change their ideas. Those who were closer to average or below average in reading level (RL) made little or no progress (Roth 1985a).


Table VI. Misconception Test Results from Text Study

Group       Name      RL      q1   q2   q3   q4
Treatment   Daryl      3.4     1    1    1    0
Treatment   Evalina    5.6     1    1    1    1
Treatment   Allison    7.6     0    1    0    1
Treatment   Doug       8.1     1    1    1    1
Treatment   Vera       8.6     1    1    1    1
Treatment   James     11.3     1    1    1    1
Treatment   Sheila    13.0     1    1    1    1
Control 1   Jill       4.0     —    —    —    —
Control 1   Maria      4.0     0    0    0    —
Control 1   Myra       6.0     0    0    0    —
Control 1   Phil       6.0     0    1    0    —
Control 1   Deborah   10.0     0    1    0    —
Control 1   Parker    13.0     0    0    0    —
Control 2   Linda      4.5     0    0    0    0
Control 2   Tracey     5.6     0    0    0    0
Control 2   Danny      7.1     0    1    0    0
Control 2   Sally      8.4     0    1    0    0
Control 2   Kevin     12.6     0    1    0    1
Control 2   Susan     12.6     1    1    1    1

The design of Roth’s study is adequate for an exploratory purpose, but the quantitative analysis is too weak to make its case. The author takes care to build a case for the types of learning strategies that each student uses to answer questions, but each is a qualitative assessment. She performs no statistical analysis on her results to find the probability that her impressive post-test results could have occurred by chance selection of those students who were ready to change their ideas. After all, there were only seven students in her treatment group. The study does not fully explore the relationship between reading level and conceptual change. Nor was statistical analysis performed to determine what fraction of the variance in outcome might be explained by use of the text, student reading level, or any other variable. In my own analysis of Roth’s data, I found her conclusions not to be well supported (see Table VII). Separate t-tests (Aiken 1985) were computed for each control group and the treatment group. Using a t-test to determine the probability that her results could have occurred by chance, I found that two of the concepts fail the test at p < 0.01 for control group 2, and one question for

control group 1. The control and treatment groups were just too small for the results of the experiment to be significant. A simple regression of student reading grade level against the test results shows a disturbingly high regression coefficient. One could interpret the results of the fourth question as resulting from differences in reading level and not differences in treatment.

Table VII, Regression Analysis

                                       Group 1      Group 2      Group 1 and 2
Treatment vs. Control group            t-test, p=   t-test, p=   regression R2 =
q1. Plants make food                   0.0005       0.009        0.037
q2. Need light to make food            0.01         0.1132       0.096
q3. Get food only by making it         0.0005       0.009        0.037
q4. Plants first get food from seeds   ———          0.0586       0.425
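To illustrate the computation, a minimal sketch follows in Python (SciPy assumed, rather than the statistical tools used in the original analysis), applying a two-sample t-test to the q1 column of Table VI for the treatment group versus control 2:

```python
# Two-sample t-test on the 0/1 outcomes for concept q1 from Table VI.
# A sketch only: it assumes a t-test on binary scores is an acceptable
# approximation for groups this small.
from scipy import stats

# q1 ("Plants make food"): 1 = scientific conception, 0 = misconception
treatment = [1, 1, 0, 1, 1, 1, 1]   # seven treatment students
control_2 = [0, 0, 0, 0, 0, 1]      # six Control 2 students

t, p = stats.ttest_ind(treatment, control_2, equal_var=False)
print(f"q1, treatment vs. control 2: t = {t:.2f}, p = {p:.4f}")
```

The same call, repeated for each concept and each control group, yields the p-values tabulated above.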

Roth's study makes wonderful reading, and many of her ideas have been incorporated in the writing of Project STAR's text. Each of our chapters now starts with prediction activities through which the students identify and clarify their own conceptions, either by answering questions or by interviewing others. Upon reflection, I believe that the Project STAR text does not handle each misconception explicitly and give students reason to disbelieve it. Revisions this spring should probably include the specific evidence needed so that each misconception can be discarded by the students who hold it.

D. Methodological Problems of Past Research

Misconception testing is in its infancy. Many of the studies that I have examined were carried out by science professors on their own students, often with little regard for accepted testing procedures. Many had severe methodological flaws, and there is little evidence that their conclusions were supported by their tests. Test items are rarely analyzed individually for appropriateness. The use of small samples of students inhibits the finding of statistically significant results; the probability that the same results could be obtained at random drops as the sample size grows. Modern statistical tools for the analysis of data are infrequently used. Each of the thirty-nine studies that I examined in the preparation of this paper had some deficiency in validity, reliability, or sampling.

Validation is the process of accumulating evidence to support inferences made from test results (Cronbach 1990); it includes not ignoring evidence that fails to support those inferences. It is the single most important measure of test quality. However, misconception tests are often devised with little attempt to assess their validity. In the process of reviewing published misconception surveys, I found that many were deficient in the establishment of their validity. For example:

• Testmakers did not clearly define their objectives. The purpose and domain of the test were not clearly stated, and the need for a new test was not justified.

7

• Test items did not measure only student misconceptions. Individual items should independently measure a single psychological construct, namely, a single misconception. Subjects who score well on the test should, on average, outperform students who do poorly on the test, on each item.
• There were no systematic criteria for minimizing test error by examining each test item. Such criteria often help to reduce random errors. Reliability was not measured by retesting or some other suitable method to determine whether students gave the same answers repeatedly or were guessing. Of course, the process of interviewing students to find their misconceptions has an inherently low reliability. The test-retest method is not very effective for interviews, since learning may occur during the initial interview, leading to different results in the second (Hoz 1983).
• The format of the test was often inappropriate. Establishment of test validity should include a discussion of why a timed test, a multiple-choice format, or a particular difficulty of items was chosen and why it is appropriate.
• Items did not appear to be statistically independent of each other. Items should not provide clues to other questions and should not assess precisely the same notions. The correlation between different test items should not be too high.
• Teachers did not feel that the items were good indicators of learning within the curriculum. Tests should be based upon objectives that teachers feel characterize their courses, rather than simply being an assemblage of random items drawn from the discipline.
• Items were not well written. They must be at the appropriate reading level and minimize complex grammar and scientific jargon.

E. Implications of Past Studies

The discovery of deeply held beliefs that are at odds with scientific conceptions has had a profound effect within the field of science education. Students are not learning many of the concepts that were thought in the past to be easy to learn. Students manage to cope with being asked to accept beliefs that they feel are foreign by simply memorizing a few terms and regurgitating them on exams, or they may accept portions of a concept and discard others that do not fit with their models. These misconceptions persist in spite of instruction. Many astronomical conceptions are covered in subjects taught in elementary school, in junior high earth science, in high school physics, and again in college (where they may be taken for granted by the professor and never discussed). No subset of students is spared from holding misconceptions. Gifted students have as many misconceptions as less gifted students. Students who take high school physics exhibit as many misconceptions in that field as those who do not. College professors are no more effective at changing students' ideas than high school teachers. All the while, students continue to pass their science courses with the same misconceptions they held upon entering, and often they get As.

8

Several research groups have attempted to produce inventories of student misconceptions through interviews. These have been found to be useful in cross-sectional studies that have determined that many misconceptions remain virtually unchanged over the course of schooling. Pre-test/post-test studies have been used to attempt to find teaching strategies and instructional materials that are effective in modifying students' beliefs. Although many people have pointed out that the progression of misconceptions that a student may hold over time bears some similarity to the historical development of the discipline, no one has yet determined whether the study of the history of a scientific field has any beneficial effect in overcoming student misconceptions.

Many researchers have suggested that conventional high school science courses do little to prepare students for courses in college, in spite of what their teachers believe. Several studies have shown that performance in science courses is highly correlated with students having few misconceptions and having a high level of skill with mathematics. No one has yet proposed doing away with high school science courses in favor of courses that deal only with student misconceptions and math skills. Such an experiment could be performed to determine if students taught in this fashion achieve more when they take additional courses in science.

Many researchers think that teachers will benefit from interviewing their own students in order to become aware of their misconceptions. Although this is a noble idea, no experiment has yet verified that it is true. Knowledge of student misconceptions on the part of the teacher may not be enough to change the students' ideas. In fact, the additional time needed to carry out interviews may decrease instructional time and could turn out to be counterproductive. The construction of curriculum materials that effect changes in student beliefs when used by classroom teachers would be a much more cost-effective solution. Progress has already been made in the development of textual material and visual aids to teaching. Alternative instructional methods, such as teachers concentrating on anchoring concepts and reasoning skills, may also prove useful.

The study of scientific misconceptions as a field has grown dramatically in the last decade. Many researchers have investigated the naive beliefs of students in a variety of science disciplines. These investigations cover grade levels from kindergarten to college. Several involve questions on astronomical subjects, but none is comprehensive in this field. Most investigations of students' misconceptions are presently performed using clinical interviews or open-ended written instruments and are dependent on the ability of the interviewer to identify the conceptions of the student. As investigatory tools, these open-ended techniques work well, but as evaluative instruments they have many failings. They are not free of the bias of the investigator, who may be looking for certain results and may find them even when they do not exist. Some investigators change the instrument used during the study or do not control for variables that may invalidate the results. Many of these studies examine too few students for their results to be statistically significant.

9

Many studies have investigated student conceptions about ideas that are included in astronomy courses at the high school and college level. It is possible to construct a comprehensive multiple-choice test using these investigations.18 The few studies that have attempted to evaluate the efficacy of experimental instructional methods or materials in changing student misconceptions are often flawed by problems with their design. Several use control groups and post-tests without pre-tests, so one cannot tell if the groups were really similar at the start. Others use pre-tests and post-tests without control groups, so they cannot separate out the effect of changes that would occur naturally over the course of the experiment. Some use populations too small for adequate statistical significance or populations not sufficiently randomized. Several studies use instruments with reading levels that are too difficult for their subjects. Not analyzing results in enough detail is another problem; examining the results from individual items or even the change in selection of distractors could prove very useful. Often the validity of a test is only ascertained by those connected with the study and outside validation is not sought. This reduces the potential discriminating power of the instrument. Sometimes no attempt is made even to ascertain the reliability of the test instrument by retesting or interviewing subjects, or by applying conventional statistical tests. The major question that still lies unanswered by this review of the literature is:

18My investigation has identified these sources of the most common student misconceptions in astronomy:

Cosmography: Klein 1982; Mali and Howe 1979; Nussbaum 1979; Nussbaum 1985; Nussbaum 1986; Nussbaum and Novak 1976; Sneider and Pulos 1983.
Cosmology: Lightman and Miller 1989; Lightman et al. 1987; Viglietta 1986.
Gravity: Gunstone and White 1981; Mali and Howe 1979; Sneider and Pulos 1983; Stead and Osborne 1981; Vincentini-Missoni 1981.
Light and Color: Anderson and Karrqvist 1983a; Anderson and Karrqvist 1983b; Anderson and Smith 1986; Bouwens 1986; Brown and Clement 1986; Eaton 1984; Eaton et al. 1983; Feher 1986; Guesne 1985; Jung 1987; Slinger 1982; Stead and Osborne 1980; Watts 1985.
Seasons: Furuness and Cohen 1989; Klein 1982; Rollins et al. 1983.
Moon Phases: Camp 1981; Cohen 1982; Cohen and Kagan 1979; Dai 1990; Kelsey 1980; Za'rour 1976.
The Solar System: Broman 1986; Dobson 1983; Edoff 1982; Friedman et al. 1980; Touger 1985; Treagust and Smith 1986.


Can instructional methods or curriculum materials dramatically and efficiently change student misconceptions?

The opportunity exists to develop, field-test, and validate an instrument for assessing student misconceptions in astronomy. This instrument should be based on the large body of interviews in this domain. It can be validated through a review by a panel of astronomers and science teachers. Reliability can be increased by attention to the instrument's reading level and assured by interviewing subjects who have taken the written test. This instrument should be embodied in a pre-test/post-test, control-group design to test for changes in the conceptions of students who are instructed with experimental materials. Treatment and control groups should be chosen with attention to their equivalence, and the groups should be large enough to ensure reasonable statistical significance for small changes in conception. Such a test can be used to examine the conceptual changes wrought in students learning with Project STAR curriculum materials (described in the following section), compared with conventional astronomy courses. If properly carried out, this could be the first significant test of science materials that seek to modify students' beliefs in science. Such an investigation has the potential to change the way curricula are designed and taught, so that misconceptions may be more efficiently replaced by scientific conceptions.


III. Methodology

This study has been carried out as a survey of a large nonrandom population. It is part of a much larger evaluative study of Project STAR using pre-testing and post-testing of control and treatment groups. For this study, the control and treatment pre-test groups have been combined and examined together, since the pre-test was given before the treatment of using the Project STAR curriculum had begun. In carrying out this study, a sixty-item multiple-choice instrument was constructed, based on interviews reported in the research literature and on interviews with high school students conducted by myself and members of the Project STAR staff. This instrument was administered to students at the start of their earth science or astronomy class. These data were collected and analyzed at Project STAR using a variety of computer-based statistics programs. In this section, I will review the sample selection procedures, describe the sample as it relates to the larger Project STAR dataset and its history, describe the research questions and my hypotheses, describe the origins of the instrument, and review the administration and analysis procedures.

A. Description of the Dataset

In an attempt to remedy some of the current problems in pre-college science instruction, the Harvard-Smithsonian Center for Astrophysics (CfA) has undertaken to develop and test a new type of high school science course. Similar to its support of curriculum development efforts in the 1960s, the National Science Foundation (NSF) has funded a team of physicists, educators, and classroom teachers to develop Project STAR (Science Teaching through its Astronomical Roots), a modern, activity-based physical science course. This six-year project receives support from the NSF, the Smithsonian Institution, and Apple Computer. The need for Project STAR emerged from growing concern that too few students were learning science at the high school level (Welch et al. 1984). Using astronomy as a focus, the development team hopes not simply to increase the enrollment in high school science courses but, more importantly, to improve students' understanding of science and its role in making sense of the world. Unlike most high school science texts, STAR's materials de-emphasize vocabulary and facts and emphasize powerful scientific concepts. The educational approach of Project STAR is based on three principles:

• Students learn best through hands-on activities (Wise 1983, Shymansky 1982).
• Mastery of a few ideas is more effective than cursory exposure to many concepts (Bloom 1971).
• Students' misconceptions, unless confronted and changed, obstruct learning (McDermott 1984).

Classroom observations of students and discussions with teachers show that students enjoy the "hands-on" nature of this course and actively participate in observation and experiment. Both teachers and students have remarked that


the limited number of concepts treated by the preliminary materials is a welcome change from the more encyclopedic coverage of other science courses. Whether students actually undergo changes in their strongly held conceptions, however, remains open to question (Sadler and Luzader 1988). A major component of the curriculum development process for Project STAR has been formative evaluation. Hundreds of students and teachers have been interviewed, and thousands of subjects have taken multiple-item pre-tests and post-tests to measure their conceptions. This dataset provides a rich repository of information on student misconceptions in astronomy and how they change in the course of instruction.

1. History of the Dataset

In 1986, I began the program of formative evaluation for Project STAR. Each year, as new curriculum materials were trial-taught in participating schools, a pre-test was administered in September and a post-test was given in either January or June, depending on whether the course was one or two semesters in length. These tests contained items covering misconceptions, astronomical facts, and math skills that students brought to the course. Each year, the gains on various test items were examined and compared with control classrooms, and we modified the curriculum materials as a result. These tests were not explicitly designed to be used for summative evaluation, but over the years they evolved into a form that has proven quite useful for diagnosing student misconceptions. Their validity and reliability have risen to such a level that they could be useful in classrooms across the country. The initial work involved creating a two-tiered misconception test that combines the elements of an open-ended test with a multiple-choice test (Treagust 1986). Each test item consisted of two parts: the first asked students to predict the outcome of a situation; the second let them provide a reason for their answer.19 In interpreting these multiple-choice tests, responses that drew more than 10 percent of the answers were examined in depth (Gilbert 1977). These items were drawn from interviews reported in the literature that related to astronomy.

19An example from this test:

On August 7, 1654, J. Kepler went outside, looked up and saw a star explode, a supernova. When do you think the star really exploded?
a. at least a few years before August 7, 1654.
b. on August 7, 1654.
c. at least a few years after August 7, 1654.
d. at some other time.
Give your reason: ____________________________________________________


In the spring of 1987, I used open-ended written tests with twenty-five ninth-grade students at Cambridge Rindge and Latin High School and then conducted interviews seeking to validate the written responses. These interviews were quite successful and documented the responses to the written instrument. Videotapes of these students explaining their astronomical ideas became the central theme in the production of a documentary film, A Private Universe, which compared the conceptions of high school students with those of Harvard graduates and faculty.20 As the curriculum project progressed, a new pre- and post-test was developed each year to match the year's evolving curriculum objectives. Items that teachers deemed unclear were rewritten. Changes in the objectives of the Project STAR course forced the elimination or modification of some of the items. Items that were answered correctly more than 80 percent of the time were deemed "anchors" and were incorporated into the curriculum materials as ideas that students would probably know and on which they could build. Examples of such ideas and facts are:

• Light takes time to reach us from the stars.
• The light year is a measure of distance.
• The Earth is 93,000,000 miles from the Sun.
• The Moon is closer to the Earth than is the Sun.

These facts were removed from subsequent tests. Distractors that were chosen less than 5 percent of the time were rewritten and replaced so that they would have greater appeal to the students. Often this meant combing the recent literature for more popular alternatives or inserting "scientific sounding" jargon.21 These changes tended to make the test more difficult each year as the distractors became more attractive.

20The film has since won four major awards: Silver Apple, National Educational Film and Video Festival, Seattle, WA (1987); Gold Medal, Documentary, Houston International Film Festival (1988); Gold Plaque Award, Chicago International Film Festival (1989); and Blue Ribbon, American Film and Video Association (1990).

21In 1987 the question dealing with the cause of the seasons was:

What causes the seasons?
A. The Earth's distance from the Sun.
B. The Earth's axis flipping back and forth as it travels around the Sun.
C. The Sun's motion around the Earth.
D. The Earth's axis always pointing in the same direction.
E. The shifting seasons on the Earth.

By 1990 the question had evolved into a different form (the percentage of students choosing each answer is given in parentheses):

The main reason for it being hotter in the summer than the winter is:
A. The Earth's distance from the Sun changes. (46%)
B. The Sun is higher in the sky in the summer. (12%, the correct answer)
C. The distance between the northern hemisphere and the Sun changes. (37%)
D. Oceans carry warm water north. (3%)
E. An increase in greenhouse gases. (3%)


2. Aspects of the Dataset

The Project STAR dataset is too unwieldy to be used in its entirety for this dissertation. I have therefore chosen to use only multiple-choice data for the most recent academic year, 1990-91 (see Appendix A for the test). Interviews and open-ended questions will not be included except as they apply to issues of validity. This decision also maximizes the number of subjects who have included demographic information on their tests. Of the sixty multiple-choice questions asked, thirteen dealt with demographics or descriptions of the students' background. Many of the questions are based on pictures, diagrams, or graphs, which are intended to reduce the reading level required of subjects. The target population for this study is students from grades eight through twelve who are beginning an earth science or astronomy course. The 1,414 subjects are the students of 22 teachers from four different groups:

• Students of seven teachers from the greater Boston area who have participated in the development and testing of Project STAR materials. This group was recruited by initially contacting each of the fifty-four high school science departments within the Route 495 area to find who, if anyone, taught astronomy. All twelve of these teachers of astronomy were interviewed individually by me and were offered a consultancy with the project. Ten teachers accepted; seven remain with the project today.

• Students of twelve teachers throughout the United States who volunteered to attend a two-week Project STAR summer institute at the CfA, where they helped to develop Project STAR materials and agreed to test them in their classrooms.22 This group was self-selected by accepting all applicants from inner-city schools with full scholarships and by requesting that teachers from suburban and private schools pay a larger fraction of their own expenses.

• Students of three teachers who teach earth science or astronomy in the same schools as the teachers in the above two groups, but who are not involved with Project STAR.

• Students of two teachers who were identified as teaching high school astronomy courses and were randomly selected from the StarNews mailing list. These teachers were successively contacted by telephone until we obtained a total of 200 control students. The primary reason for involving these teachers was to include a control group for control/treatment studies of Project STAR curriculum materials.

These teachers have characterized their schools according to a number of factors. Three teach in rural areas, sixteen in suburban districts, and three in cities. All but two teach in public schools. The economic status of their communities varies considerably: four are characterized by these teachers as low, five as low to average, nine as average, two as average to high, and one as high. School sizes vary from 325 to 2,000 students, with a mean of 1,282. The geographic distribution of these sites is shown in Figure 4.

Figure 4, Locations of School Sites
[Map of the United States marking the school sites: Long Prairie, MN; Rochester, MN; Wausau, WI; Wauwatosa, WI; Rockford, IL; Oak Park, MI; Euclid, OH; Indianapolis, IN; Maryville, MO; Hudson, NH; North Andover, MA; Andover, MA; Watertown, MA; Framingham, MA; Needham, MA; Plymouth, MA; East Weymouth, MA]

22This group initially consisted of respondents to a call for Project STAR participants announced yearly in StarNews, the project newsletter. The original list of 408 astronomy teachers responded to a national census of all 11,100 high school science department heads in the United States sent out by the project. These census cards were mailed postpaid on May 15, 1986. Since then, as I and other project staff have given over 160 papers and workshops, this list of newsletter recipients has grown to roughly 4,000 teachers.


The sixty-item tests were given to subjects within the first two weeks of the start of their astronomy or earth science class. They were given by their regular teachers during an ordinary class period of 45 to 55 minutes. Students generally finished the test in 30 to 35 minutes. An attempt was made to produce a positive testing atmosphere by having the teacher explain to each class that this was a test that would help to "design an even better course for students to take in the future" and that it would not count as a part of their grade. An identical test was given to each subject within two weeks of the end of the course as a post-test; analysis of those data, however, will not be included in this study. Tests and answer cards were shipped to participating teachers in time for the start of school. Teachers agreed not to discuss the items on the test with the students either before or after the test administration.

B. Research Questions

1. Question 1 focuses on the validity of the Project STAR misconception test.
1(a) Is the test a valid instrument for measuring the misconceptions of students entering an introductory astronomy course?
1(b) Which test items appear to be most appropriate in assessing student misconceptions in astronomy, and should be included in a revised instrument?
1(c) How reliable is this test?
2. Question 2 focuses on the misconceptions revealed by the test.
2(a) For students enrolling in a course where astronomical concepts are taught, for which concepts will students initially hold conceptions that are at odds with accepted scientific views?
3. Question 3 focuses on the demographic aspects of the subjects and their relation to scientific misconceptions.
3(a) Are differences in the quantity of misconceptions related to gender?
3(b) Are differences in the quantity of misconceptions related to ethnic heritage?
3(c) Are differences in the quantity of misconceptions related to the educational accomplishment of parents or guardians?
4. Question 4 focuses on the school-based aspects of the subjects and their relation to scientific misconceptions.
4(a) Are differences in the quantity of misconceptions related to students' grade level or age?
4(b) Are differences in the quantity of misconceptions related to students' prior completion of specific mathematics or science courses?

C. Hypotheses

It is difficult to predict reliably how the analyses described above will turn out, but based on my own experience over the years with student interviews and a cursory analysis of student responses, I had my own predictions.


I believe that the majority of test items will expose misconceptions. Distractors will be chosen by students with much greater frequency than one would expect from random choice; in many cases, distractors will be more popular than the scientifically correct answers. Teachers will, for the most part, characterize the test items as reasonable. They will predict low initial pre-test scores and high post-test scores for their own students. Some test items will probably be found to be poor indicators of student misconceptions, based on correlations with the total test score or through the application of classical test theory. I believe that demographic and schooling factors will not account for more than one-third of the variance in the total scores. Older students will have no fewer misconceptions than younger ones; this prediction is supported by evidence that students who have taken earth science have no fewer misconceptions than those who have not. Gender and ethnicity will account for no reduction in variance.

D. Instrument

The Project STAR pre-test consists of directions to the student, forty-seven content questions, and thirteen demographic questions. The content questions, for the most part, deal with misconceptions, but there are seven items that were thought to cover astronomical facts and six that address mathematical skills essential to astronomy. Demographic questions identify gender, age, ethnic heritage, grade level, math background, science background, parents' educational level, the students' educational plans, their view of the need for science in their future, and their reason for enrolling in the course. All the questions are of multiple-choice format. Each content question consists of a stem and five alternatives that are selected by filling in the corresponding cell on a computer-readable "bubble" response card. Only one of the responses is scientifically correct. The other four represent misconceptions that are found in the literature on scientific misconceptions or have been revealed through interviews with students conducted either by me or by other project staff. These distractors have been written to be as plausible as possible. An effort has been made to ensure that:

• Correct answers are free of scientific jargon.
• Correct answers are similar in length to the distractors.
• Clues to correct answers are not found in other test items.
• Distractors do not overlap conceptually.
• Distractors do not have grammatical mismatch cues in tense or number.
• All responses are concise.
• All diagrams are labeled correctly.
• All responses are within a well-defined content area.
• There is sparing use of "none-of-the-above" or "all-of-the-above" answers.


Most of all, a concerted effort has been made to include only material relevant to uncovering scientific misconceptions. The questions have been kept simple, and the reading level has been kept low.

E. Procedure

Many operational details became obvious to me only after completing this research, so I have included them in this section to help any future researcher wishing to carry out a similar study save time and energy by replicating these methods.

1. Administration

The misconception test was administered by classroom teachers, and 100-item Chatsworth data cards with five alternatives were filled out by students and then mailed back to Project STAR, where they were read by a Chatsworth computer card reader. Cards that were unreadable because they were filled out in pen or hard pencil were entered into the database by hand. Several dozen cards were spot-checked to establish the accuracy of the computer input. Data are stored in text-file format, so the dataset is accessible to both Macintosh and IBM-PC microcomputers.

2. Variables

Many variables were directly measured or calculated from the student tests. They fall into two groups: variables that vary by student and variables that vary by test item.

a. Measured Variables by Subjects

Students answered forty-seven questions that dealt with content knowledge and thirteen that described their demographic background and schooling. A total score was calculated as the number of the forty-seven content questions that a student answered correctly. All demographic and schooling variables let students choose an answer that was a category. For the purpose of analysis, all these variables were converted into numbers directly, by assigning a ranking of 1 to 5, or by the use of dummy variables. A simple regression of the data revealed that students' ages could be computed by adding 6 to their grade levels. This shows that the range of ages available for students to select should have been extended downward by one year: eighth-grade students are, for the most part, fourteen years old. By coding all students who chose "other" as their grade and "15 yrs or younger" as their age as 14-year-old eighth graders, a much better regression fit was possible (a sketch of this recoding follows Table IX below). This recoded age variable has been labeled Age*. The fraction of variance in grade level explained by students' age increased from 46 percent to 73 percent using this recoded variable. Math level follows the usual sequence of courses offered in high schools (see Table VIII). Most students begin with Algebra I and continue in the

9

sequence, Geometry, Algebra II, and Pre-Calculus or Trigonometry, until they stop taking math. Students who avoid this sequence usually follow a general math course. This progression is graphed in Figure 5.

Table VIII. Distribution of Math Level by Grade

               Grade 8   Grade 9   Grade 10   Grade 11   Grade 12   missing
General Math     88%       43%       14%         5%         8%        40%
Algebra I         8%       41%       56%        29%        10%        40%
Geometry          2%       10%       15%        36%        21%         0%
Algebra II        2%        3%       14%        25%        39%         0%
Trigonometry      1%        4%        1%         6%        23%        20%
missing           4%       13%        1%         1%         0%         0%
count            374       157       136        309        301         5

Figure 5 shows graphically how the profile of math knowledge changes from one grade level to another. In the eighth grade, most students have taken only general mathematics; by the tenth grade the majority have taken at least Algebra I, and by twelfth grade the majority have completed Algebra II. Examining the three questions that relate to the educational attainment of mothers, fathers, or the students themselves, I found that choice C, Graduate from Trade, Vocational, or Business School, and choice D, Some college, were reversed in level. Item 59, on the importance of science to a student's future occupation, was already rank-ordered as written. Table IX summarizes the coding assignments described above.



Figure 5. Math Profile by Grade Level

Table IX. Coding Assignments for Student Background Variables

               Numerical Assignment
                 1          2           3           4            5
Grade          other        9          10          11           12
Math           general    algebra I   Geometry    Algebra II   Pre-calc.
Mother's ed
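A minimal sketch of the Age* recoding described in the preceding section (pandas assumed; the column names and category strings here are illustrative, not those of the actual data file):

```python
# Sketch of the Age* recoding: students who chose "other" for grade
# and "15 yrs or younger" for age are recoded as 14-year-old eighth
# graders before the grade-by-age regression is rerun.
import pandas as pd

df = pd.DataFrame({
    "grade": ["other", "9", "10", "other", "12"],           # hypothetical rows
    "age":   ["15 yrs or younger", "15", "16",
              "15 yrs or younger", "17"],
})

mask = (df["grade"] == "other") & (df["age"] == "15 yrs or younger")
df.loc[mask, "grade"] = "8"
df.loc[mask, "age"] = "14"
df["age_star"] = pd.to_numeric(df["age"], errors="coerce")
print(df)
```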

Item                                Student     Teacher Pred.   Teacher Pred.
                                    Pre-test    Pre-test        Post-test
…                                     0.66        0.65            0.89
…                                     0.25        0.34            0.72
…                                     0.30        0.47            0.81
…                                     0.24        0.20            0.61
…                                     0.18        0.27            0.73
…                                     0.29        0.36            0.74
…                                     0.11        0.29            0.76
planets->stars                        0.44        0.49            0.86
20. Moon revolution                   0.38        0.48            0.85
23. Moon revolution (around Sun)      0.52        0.34            0.73
31. time zones                        0.46        0.25            0.63
35. astrology                         0.22        0.40            0.80
42. filters                           0.15        0.23            0.50
43. light sources                     0.39        0.35            0.71
44. light propagation - night         0.40        0.38            0.73
46. gravity                           0.29        0.23            0.65
Averages                              0.33        0.36            0.73
SD                                    0.14        0.12            0.10

Corr. coeff. with student pre-test                0.63            0.75
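The correlation of teachers' pre-test predictions with students' actual pre-test performance can be recomputed from the sixteen item rows above; a minimal sketch (numpy assumed):

```python
# Correlation of teachers' pre-test predictions with students'
# actual pre-test scores, recomputed from the item rows above.
import numpy as np

student_pre = [0.66, 0.25, 0.30, 0.24, 0.18, 0.29, 0.11, 0.44,
               0.38, 0.52, 0.46, 0.22, 0.15, 0.39, 0.40, 0.29]
teacher_pred_pre = [0.65, 0.34, 0.47, 0.20, 0.27, 0.36, 0.29, 0.49,
                    0.48, 0.34, 0.25, 0.40, 0.23, 0.35, 0.38, 0.23]

r = np.corrcoef(student_pre, teacher_pred_pre)[0, 1]
print(round(r, 2))  # 0.63, matching the reported coefficient
```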

2. Expert Validation

Fourteen graduate students in Harvard's Department of Astronomy volunteered to take a forty-question version of the pre-test in the Fall of 1988. Many also added comments to the scoresheet pointing out inconsistencies or other problems with items. Their average score on this test was 36.7 items out of 40 (92 percent correct). The lowest-scoring student answered 33 questions correctly; one student answered all the questions correctly. Only two questions were answered incorrectly by more than 40 percent of these students. One asked them to estimate what size object would just cover the Moon when held at arm's length. The other asked them to choose properties that would be the same for two stars of equal apparent brightness. Both questions were eliminated from future studies.

3. Missing Item Answers

Whereas a rather large fraction of students did not answer demographic and schooling questions on the test, fewer failed to answer individual content questions. I have attempted to identify factors that would help explain these missing data. If these missing answers are related to problem difficulty or type,


this would enter into an analysis of results and affect conclusions. If students skipped a question because it was too hard, then the actual difficulty of the question, calculated by dividing the number correct by the total number of subjects, should be revised upward, because if students had guessed at the answer rather than skipping the problem, more would have gotten it right. I created several factors that help characterize test items:

P-value: the difficulty of the question.
Item #: the order in which the question appeared.
Picture: whether the problem had an accompanying graphic.
Concept: whether the problem dealt with an astronomical concept.
Fact: whether the problem dealt with an astronomical fact.
Math: whether the problem required the exercise of a math skill.
Readability: Gunning Fog Index.

Data for each test item are presented in Table XIII.


Table XIII. Test Item Characteristics for Missing Answer Regression

Item #   # Missing   P-value   Picture   Concept   Facts   Math   GF Index
  1          0         .66        0         1        0       0      2.9
  2          6         .26        1         1        0       0      2.7
  3          4         .31        1         0        0       1      8.2
  4          1         .49        1         1        0       0      8.4
  5          1         .10        1         1        0       0     10.1
  6          6         .13        0         1        0       0      3.3
  7          4         .45        1         1        0       0     12.4
  8         11         .34        0         1        0       0      9.4
  9          5         .25        0         1        0       0     12.4
 10          8         .40        0         0        0       1     12.9
 11          5         .13        1         1        0       0      8.2
 12          3         .18        1         0        0       0     14.8
 13          8         .29        0         0        1       0      8.1
 14         12         .24        0         0        1       0      8.1
 15         10         .30        0         0        1       0      8.1
 16         16         .28        1         0        0       1      6.2
 17         11         .12        0         1        0       0      4.2
 18          5         .44        0         1        0       0      6.4
 19         11         .66        0         0        0       1      7.0
 20         19         .37        0         0        1       0      7.2
 21         12         .62        0         0        1       0      7.2
 22         20         .68        0         0        1       0      7.2
 23         11         .51        0         1        0       0      7.2
 24         10         .23        0         1        0       0      7.2
 25         15         .37        1         0        0       1      5.8
 26         16         .46        1         0        0       1     13.5
 27         19         .39        0         1        0       0      7.3
 28         17         .42        0         1        0       0      6.4
 29         19         .33        0         1        0       0      6.4
 30         14         .31        0         1        0       0     11.5
 31         16         .45        0         1        0       0      3.6
 32         13         .44        0         1        0       0      9.4
 33         22         .36        0         0        0       1     11.3
 34         26         .28        1         1        0       0      6.5
 35         20         .23        0         1        0       0      9.1
 36         25         .33        0         1        0       0      5.6
 37         23         .19        1         1        0       0      6.5
 38         30         .24        0         1        0       0      5.6
 39         33         .39        0         1        0       0      4.6
 40         30         .33        0         1        0       0      8.1
 41         33         .07        1         1        0       0      6.0
 42         35         .15        1         1        0       0      9.1
 43         33         .38        0         1        0       0      6.9
 44         38         .40        1         1        0       0      5.8
 45         43         .30        1         1        0       0      5.7
 46         51         .29        0         1        0       0      6.7
 47         44         .48        1         1        0       0      9.1
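The readability column uses the Gunning Fog Index, which estimates the years of schooling needed to read a passage comfortably. A minimal sketch of the standard formula (the study's exact counting procedure is not specified; the complex-word count, words of three or more syllables, is supplied as an input rather than estimated from text):

```python
# Gunning Fog Index: 0.4 * (average sentence length
# + 100 * fraction of complex words). A sketch of the standard
# formula; how the original study counted syllables is not stated.
def gunning_fog(total_words: int, total_sentences: int,
                complex_words: int) -> float:
    avg_sentence_len = total_words / total_sentences
    pct_complex = 100.0 * complex_words / total_words
    return 0.4 * (avg_sentence_len + pct_complex)

# Example: a 40-word item in 4 sentences with 2 three-syllable words.
print(round(gunning_fog(40, 4, 2), 1))  # 0.4 * (10 + 5) = 6.0
```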


These characteristics were entered into a regression equation to explain the variance in the number of missing answers. If the frequency of missing answers were the result of some random process, a regression analysis would explain none of the variance in the number of missing answers. This multiple regression explained 86 percent of the variance.

Regression Analysis of Missing Answers by Item Characteristics

Dependent variable is: # Missing Answers in Each Item
R2 = 86.4%    R2 (adjusted) = 84.0%
s = 4.989 with 47 - 8 = 39 degrees of freedom

Source        Sum of Squares    df    Mean Square    F-ratio
Regression       6187.31         7       884           35.5
Residual          970.907       39        24.8950

Variable      Coefficient    s.e. of Coeff    t-ratio
Constant      -3.96001       7.128            -0.556
Item #         0.838298      0.0564           14.9
P-value       -2.20499       5.529            -0.399
Picture        3.16731       1.688             1.88
Concept        3.33546       5.728             0.582
Facts          6.74269       6.078             1.11
Math           4.05691       5.730             0.708
GF Index      -0.396620      0.3068           -1.29

Reducing the regression equation to a single factor, item number, still explained 84 percent of the variance. I feel that this single factor indicates that students became bored or discouraged with the test, or ran out of time to complete it. It does not appear that other item factors have much of a role in explaining students' choice not to answer a question. In particular, the difficulty of the item (as represented by its P-value) has a t-ratio < 2; its contribution is not significant at the p = 0.05 level. Students are not avoiding questions because they are difficult. This could be tested by changing the order of questions in a new test and carrying out this analysis again; R2 should be similar if item number is responsible for missing answers.



Regression Analysis of Missing Answers by Item Characteristics

Dependent variable is: # Missing Answers in Each Item
R2 = 83.9%    R2 (adjusted) = 83.6%
s = 5.056 with 47 - 2 = 45 degrees of freedom

Source        Sum of Squares    df    Mean Square    F-ratio
Regression       6007.78         1      6008           235
Residual         1150.43        45        25.5652

Variable      Coefficient    s.e. of Coeff    t-ratio
Constant      -2.68455       1.499            -1.79
Item #         0.833488      0.0544           15.3
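Either regression can be reproduced from Table XIII with any least-squares routine. A minimal sketch of the reduced, single-factor model in Python (numpy assumed; only the first eight rows of the table are shown for brevity):

```python
# Ordinary least squares for the reduced model: number of missing
# answers regressed on item number alone. The data are the first two
# columns of Table XIII; only eight of the forty-seven rows are shown.
import numpy as np

item_number = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
n_missing = np.array([0, 6, 4, 1, 1, 6, 4, 11], dtype=float)

X = np.column_stack([np.ones_like(item_number), item_number])
coef, *_ = np.linalg.lstsq(X, n_missing, rcond=None)
# With all forty-seven rows, the fit gives an intercept of about
# -2.68 and a slope of about 0.83, matching the table above.
print("intercept, slope:", coef)
```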


V. Item Analysis Results

In this section I examine each test item individually using a variety of techniques and tests. First, I discuss the descriptive statistics for students' total scores. I have grouped the test items into categories for discussion based on conventional astronomical or curricular areas: Earth and Sun; Earth and Moon; Mathematics; Solar System; Stars; Galaxies; and Light and Color. Within each grouping, the items are subjectively ordered by complexity of the underlying concept. I trace the origins of each item in the literature or through my own interviews, and identify the correct answer and the reasons for including each distractor. This is followed by the calculated P-value and D-value for the correct answer and each distractor. Distractors with P-values greater than 0.20 are discussed. The P-value of each answer is also plotted for each quintile and its meaning discussed. I have also included suggestions for improving items.

A. Total Score

The distribution of the 1,414 total scores on the test is slightly skewed, with a mean of 16.0 and a standard deviation of 6.4 (see Figure 7). The highest score of any student on the test was forty items correct, while the lowest score recorded was only three items.

Figure 7, Histogram of Total Score
[Histogram of the 1,414 total scores]

Summary Statistics for Total Score:
Total Cases = 1414; Mean = 16.028; Median = 15; SD = 6.407; Variance = 41.062; Range = 37; Minimum = 3; Maximum = 40; 10th percentile = 9; 90th percentile = 25

Randomly guessing the answer to each question on this test would have resulted in an average score of 47/5, or 9.4, answers correct. A Monte Carlo model of 1,414 subjects, each with a random probability of choosing one of five answers on a forty-seven-item test, results in an average score of 9.45 with a standard deviation of 2.76 (see Figure 8). There may have been many students who simply guessed at the answers to questions.

Figure 8. Monte Carlo Model of Random Selection of Answers
[Histogram of simulated total scores]
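The Monte Carlo model is easy to reproduce; a minimal sketch (Python with numpy assumed) of 1,414 simulated subjects guessing among five alternatives on each of forty-seven items:

```python
# Monte Carlo model of random guessing: each of 1,414 simulated
# subjects picks one of five answers on each of 47 items, so each
# answer is correct with probability 1/5.
import numpy as np

rng = np.random.default_rng(seed=0)
scores = rng.binomial(n=47, p=0.2, size=1414)
print(scores.mean(), scores.std())  # about 9.4 and 2.7, as reported
```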

Graphs and tables describing individual test items are not numbered or labeled in this section. Every item analysis contains a table of P-values and D-values and an item response curve. Each item starts on a new page and is thereby separated from the others.
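For reference, the item statistics used throughout this section can be computed as sketched below. The exact discrimination formula used in this study is not stated; the sketch assumes the conventional index based on the difference between the upper and lower 27 percent of total scorers, and the response curves are simply per-quintile P-values.

```python
import numpy as np

def item_statistics(answers, correct, total_scores):
    """P-value of each choice, a discrimination index, and quintile curves.

    answers: array of chosen answers per student, coded 0-4 for A-E
    correct: index of the scientifically correct choice
    total_scores: array of total test scores per student
    Assumes the conventional upper-minus-lower 27% D index; the
    study's exact formula is not stated in the text.
    """
    n = len(answers)
    p_values = np.array([(answers == c).mean() for c in range(5)])
    order = np.argsort(total_scores)
    k = max(1, int(0.27 * n))
    low, high = order[:k], order[-k:]
    d_values = np.array([(answers[high] == c).mean()
                         - (answers[low] == c).mean() for c in range(5)])
    # P-value of each choice within each performance quintile
    quintiles = np.array_split(order, 5)
    curves = np.array([[(answers[q] == c).mean() for c in range(5)]
                       for q in quintiles])
    return p_values, d_values, curves
```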



B. Earth and Sun

The idea that the Sun is a star and that the Earth orbits around it is a fundamental concept that can be found in most primary school science books (Nussbaum 1986). Most teachers take it for granted that their students know the reason for day and night and the length of the year. In the prediction study carried out with Alan Lightman, participating teachers thought that two-thirds of their students would enter their classes knowing the reason for day and night and that only 36 percent would know the diameter of the Earth.

Item 21, The Earth's Rotational Period

Choose the best estimate of the time for the Earth to turn on its axis.
A. Hour    B. Day    C. Week    D. Month    E. Year

For all items on this test dealing with astronomical periods, the same five choices were given: hour, day, week, month, or year. These include three periods that are astronomical in nature: the rotational period of the Earth (a day); the orbital period of the Moon about the Earth (a month); and the orbital period of the Earth around the Sun (a year). Two measurements of duration that have no astronomical significance are also included: the hour and the week. A small study (N = 24) of second-grade students found that half of the sample thought that the Earth made one turn in 24 hours. Incorrect answers for the Earth's rotational rate ranged from "6 minutes" to "200 hours" (Klein 1982). Six of these students could demonstrate the reason for day and night; the other twelve could not.

Item 21      A     B     C     D     E
P-value     .08   .62   .06   .10   .13
D-value    -.18   .48  -.17  -.21  -.22

There are no distractors chosen with greater than .20 frequency. The majority of students appear to know the rotational period of the Earth.

[Item response curve: P-value of each answer (A-E) plotted by performance quintile]

The results show a classic item response curve that rises steeply with overall test performance. Although there may be some lower-performing students who think the Earth rotates on its axis once a year, all distractors diminish as the student performance increases. This curve is characteristic of factual information: the better students have it memorized while others do not. Distractor curves have roughly the same shape, diminishing from close to the 20 percent random level to almost zero for the high-performing students. To improve this question, the least chosen distractor—that the Earth spins in a week—could be replaced by a choice that the Earth does not turn.

Item 1, Reason for Day and Night

What causes night and day?
A. The Earth spins on its axis.
B. The Earth moves around the Sun.
C. Clouds block out the Sun's light.
D. The Earth moves into and out of the Sun's shadow.
E. The Sun goes around the Earth.

The reason for day and night is perhaps the most basic idea assumed by teachers of astronomy of their introductory students. In the prediction survey, participating teachers assumed that 65 percent of their students, on average, would enter their classes with this concept understood and, by the end of the course, 89 percent would leave knowing it. The Earth turning on its axis causing day and night has been described as one of the "most essential ideas which form the Earth conception" (Nussbaum 1985). Children as old as twelve, however, have been shown to believe that the world is a spherical shell, with the Earth forming the bottom hemisphere and air filling the top half. The Sun travels along the surface of the sphere in this model. Children think that the Sun is in the sky in the daytime and travels below the Earth at night (Nussbaum 1986). One student had integrated this with his other knowledge, explaining that, "at night the Sun travels below us... and this is how the lava in the Earth is heated." Students with this belief would choose answer "E." In a study of elementary school students, the reasons stated for day and night included that the Earth revolves around the Sun (B), that the Earth or Sun moves into a shadow (D), or that clouds block out the Sun's light (C) (Vosniadou and Brewer 1987). Another study of second-grade students found that many knew that the Sun was "on the other side of the Earth" at night, but showed no clear preference for whether it was the Earth or the Sun that moves (Klein 1982). An item similar to this one was included in the 1969 National Assessment of Educational Progress of nine-year-old (third-grade) students. The percentage choosing each answer follows in parentheses:

One reason that there is day and night on Earth is that the
Sun turns. (8%)
Moon turns. (4%)
Earth turns. (81%)
Sun gets dark at night. (6%)
I don't know. (1%)

This is a fine example of a question that is not designed to identify misconceptions. The high percentage of correct answers can be attributed to the lack of plausible distractors (Schoon 1988). Had the question included “the Earth goes around the Sun” and “the Sun goes around the Earth,” the students’ choices might have been quite different. One researcher found that although college students prefer a heliocentric explanation of the solar system and reject a geocentric model, the majority could not give convincing arguments for their view when answering an exam question on the subject after an introductory astronomy course (Touger 1985). Justifications for heliocentrism took many surprising forms; two examples: “The Sun is the center... by observation we can see that the planets move around the


Sun" (p. T-5) or "all our pictures and telescopes and space flights tell us that there is one big star, the Sun, with several smaller planets (moving) around it" (p. T-8). Touger argues that students' belief in heliocentrism is derived almost exclusively from secondary sources and lacks an empirical base. Many students believe that scientists have actually viewed the entire solar system from a vantage point in space. I would argue that this acceptance of heliocentrism as dogma, without an ability to muster a shred of supporting evidence, makes this concept an attractive and almost universal answer to astronomical questions. Much as one researcher found when interviewing young children that God was invoked frequently to explain certain natural events (Za'rour 1976), the hard-learned belief in heliocentrism is called upon as justification for any astronomical problem for which the individual cannot give evidence supporting his or her view. The following table lists the P-values and D-values for each answer to Item 1. For all the following items a similar table is presented.

Item 1       A     B     C     D     E
P-value     .66   .26   .00   .03   .04
D-value     .39  -.29  -.06  -.09  -.19

Students do very well on this question, with 66 percent of them selecting the correct answer: that the reason for day and night is that the Earth spins on its axis. From this table, one can clearly see that answer B, that day and night are caused by the Earth moving around the Sun, is preferred by .26 of the students in the survey. Surprisingly, this item is not highly correlated with Item 21 (R = 0.37). Twenty percent of the 877 students who get this question right answer Item 21 wrong. These students may not have connected the 24-hour period of the Earth's rotation with day and night. The following graph plots the P-values of each answer to Item 1 for each performance quintile of students. This and similar graphs for following items are not labeled.

[Item response curve: P-value of each answer (A-E) plotted by performance quintile]

From this graph, it appears that answer “B,” the Earth circling the Sun causes day and night, is a major misconception for all but the best performing students in the test population. Note that the answer, “the Sun goes around the Earth,” appears unattractive to students. It does not seem that they are confusing the Earth’s rotation with their observations of the Sun apparently circling the Earth. They actually think that our orbiting the Sun causes day and night.


Item 31, Time Zones

Boston is 90° east of Hawaii. If it is noon in Hawaii, in Boston it would be about:
A. Sunrise.    B. Sunset.    C. Noon.    D. Midnight.    E. Noon the next day.

This item attempts to have students apply the knowledge that the Earth makes one complete rotation in twenty-four hours along with the fact that there are 360° in a circle. With Boston 90° east of Hawaii, there should be a time difference of one quarter of a day, the time between noon and either sunrise or sunset.
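In detail: (90° / 360°) × 24 hours = 6 hours. Because the Earth turns toward the east, clocks in Boston run six hours ahead of those in Hawaii, so noon in Hawaii corresponds to about 6:00 p.m., near sunset, in Boston.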

Item 31      A     B     C     D     E
P-value     .18   .45   .08   .22   .06
D-value    -.05   .34  -.19  -.14  -.13

This question is of moderate difficulty and discriminating power. None of the distractors appears to stand out based on the total-score statistics. In the nationwide survey, teachers predicted that only 25 percent of entering students would be able to answer this question correctly. Roughly twice as many students understood this concept as teachers thought.

[Item response curve: P-value of each answer (A-E) plotted by performance quintile]

I expected that students might get the direction of rotation wrong and pick answer "A" with greater frequency. Answer "D," however, seems to attract a fair number of students. This can be explained by students not knowing that there are 360 degrees in a circle, and thus thinking that there would be a twelve-hour time difference between Boston and Hawaii. This misconception is discussed in more detail in the Mathematics section.


Item 22, Earth's Orbital Period

Choose the best estimate of the time for the Earth to go around the Sun.
A. Hour    B. Day    C. Week    D. Month    E. Year

Much like the reason for day and night, the concept of the Earth's revolution around the Sun in a year is fundamental to understanding solar system astronomy. Students who believe that the Sun orbits the Earth in a day would choose "B."

Item 22      A     B     C     D     E
P-value     .03   .16   .06   .06   .68
D-value    -.16  -.24  -.24  -.18   .48

This question appears to be relatively easy and is moderately discriminatory. No distractor appears to stand out when examining the statistics for the total test.

[Item response curve: P-value of each answer (A-E) plotted by performance quintile]

This appears to be an easy question for most students. More students get the correct answer to this item than to any other item on the test. However, there are still low-performing students who think that the Earth orbits the Sun in a day.


Item 12, When is the Sun Overhead?

How often is the Sun directly overhead at noon in your hometown?
A. Every day.
B. Only in the summer.
C. Only for the week of the summer solstice.
D. Only for one day each year.
E. Never.

This test was given to students only in the continental United States. The Sun can be seen directly overhead only between the Tropics of Cancer and Capricorn (between 23.5°N and 23.5°S latitude). The Sun is never overhead in the continental United States. The correct answer is "E. Never." In Boston, the Sun is only 25° above the horizon at noon on the winter solstice. On the first day of summer it is much higher, but still rises only to 71° altitude at its maximum. Schoon found that 12 out of 13 participating teachers and 20 out of 32 student teachers believed that the Sun is always overhead at noon (Schoon 1988).

Item 12      A     B     C     D     E
P-value     .41   .11   .12   .18   .18
D-value    -.14  -.14  -.07   .09   .27

This appears to be a difficult question, with "A" being students' preferred answer. A plurality of students believe that the Sun is always directly overhead at noon. Teachers predicted that students would do poorly on this question, predicting that only 27 percent would answer it correctly; students do much worse than teachers predict. Not knowing that the Sun is lower in the sky in the winter precludes a proper understanding of the reason for seasons. The connection between the geocentric and heliocentric frames of reference is key to understanding this concept.

[Item response curve: P-value of each answer (A-E) plotted by performance quintile]

The misconception that the Sun is overhead at noontime is the most common answer among all performance levels. These youngsters have not noticed how much longer their shadow is at noon in the winter than in the summer or how the Sun always seems to be in their eyes in the winter. Only students in the highest performance quintile show a substantial reduction in this misconception. However, many of them still cannot let go of the belief that the Sun is directly overhead at some time, choosing "D, only for one day each year," with greater frequency than other students.


Item 39, Sun's Path at the Pole

During July at the North Pole, the Sun would:
A. be overhead at noon.
B. never set.
C. be visible for 12 hours each day.
D. set in the northwest.
E. none of the above.

This item is an attempt to get students to apply the idea that the Sun's path through the sky differs at different latitudes. In the summer, the "Land of the Midnight Sun" enjoys twenty-four hours of sunlight each day, and the Sun never sets. In 1986, three Italian researchers found that the majority of 11-13-year-old pupils believed that "the Sun always rises from the same point on the horizon, the East, and always sets in the opposite point, the West" (Loria et al. 1986).

Item 39      A     B     C     D     E
P-value     .12   .39   .23   .09   .15
D-value    -.15   .41  -.17  -.19  -.04

A surprising number of students answer this question correctly, considering that they do so poorly on other questions dealing with the Sun's motion. Perhaps they have heard the fact that the Sun never sets at the pole in the summer and could recall it for this test.

[Item response curve: P-value of each answer (A-E) plotted by performance quintile]

Many students think that the Sun’s apparent motion changes little at different latitudes and dates. Many students chose “C,” that the Sun would be visible for twelve hours each day at the North Pole. The same student would probably describe the length of daylight no differently for our latitude. Only the

highest-performing students (in the highest quintile) seemed not to be taken in by this distractor.



Item 40, Duration of Daylight

Which date below has the most hours of daylight in your hometown?
A. June 15    B. July 15    C. August 15    D. September 15    E. All dates are the same.

That there is more daylight in summer than in winter is easily noticed, but days begin to get shorter after the summer solstice (June 21) has passed. June 15 is less than a week from the solstice, while July 15 is over three weeks from that date. This question helps to find out if students know what the solstice signifies.

Item 40  P-value  D-value
A        .33       .28
B        .34       .00
C        .13      -.14
D        .07      -.11
E        .10      -.13

[Figure: Item 40 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

This was a difficult question for most students. A large fraction of students at all performance levels think that the amount of daylight continues to increase after the summer solstice, or at least that days are longer in summer than in spring. Many students appear to think that, because the summer is warmer than the winter, days must be longer (Schoon 1988).


Item 4, Shape of the Earth’s Orbit Of the following choices, which looks most like the Earth’s path around the Sun?

[Figure: five diagrams, A through E, showing possible shapes of the Earth’s path around the Sun.]

Many students believe that the change of seasons is evidence of the Earth’s elliptical path (Touger 1985). Students who have this misconception would show a preference for answer “C,” since without the Earth’s varying distance from the Sun, there would be no summer and winter. Others explain that the Earth is simply closer to the Sun in the summer than the winter (Furuness and Cohen 1989); these students should show a preference for any choice but “A.” The Earth’s orbit is almost perfectly circular, with the Sun very slightly displaced from the center of the circle. At the scale of these drawings, the orbit of the Earth is indistinguishable from a perfect circle.

Item 4  P-value  D-value
A       .49       .08
B       .14      -.13
C       .28       .14
D       .08      -.20
E       .01      -.06
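How nearly circular the orbit is can be checked directly; a minimal calculation, using the Earth’s orbital eccentricity of about e = 0.017:

    ratio of minor to major axis = sqrt(1 − e²) = sqrt(1 − 0.017²) ≈ 0.99986

so the two axes of the ellipse differ by roughly one part in seven thousand, far less than the width of a pencil line in a drawing at this scale.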

This is a question of moderate difficulty with virtually no discriminating power. Students who score well on the test overall do no better on this item than students who do poorly. The choice of the correct answer, “A,” appears to be virtually independent of student performance on the entire test. With a D-value of only 0.08, a question similar to this one would be rejected from any standardized test. From the overall scores on this item, it appears that answer “C” appeals to a larger fraction of students than one would expect from random guessing.

[Figure: Item 4 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

The distractor “C,” that the Earth’s orbit is highly elliptical, is popular with all groups but is preferred more strongly by higher-performing students. Perhaps these students are more likely to have heard that our orbit is elliptical, but do not know how tiny its eccentricity really is; they use this fact to conclude that the orbit of the Earth is highly elliptical.


Item 17, The Reason for Seasons The main reason for it being hotter in summer than in winter is:
A. the Earth’s distance from the Sun changes.
B. the Sun is higher in the sky.
C. the distance between the northern hemisphere and the Sun changes.
D. ocean currents carry warm water north.
E. an increase in “greenhouse” gases.

The Sun is lower in the sky in the winter than in the summer. This change in altitude spreads the Sun’s light over a much broader area on the Earth. The Boston Curriculum Objectives (Marshall and Lancaster 1983) for fifth grade explain correctly that the reason for winter is that “the Sun is lower in the sky,” but then go on to qualify this reason with the incorrect statement, “its rays have to shine through more atmosphere before they reach us, losing heat energy in the process.”

Item 17  P-value  D-value
A        .45      -.15
B        .12       .04
C        .36       .21
D        .03      -.08
E        .03      -.10
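The spreading effect is substantial; a minimal estimate, using the Boston noon altitudes quoted earlier (about 71° in summer and 24° in winter) and the fact that the energy falling on a unit of level ground is proportional to the sine of the Sun’s altitude:

    sin 71° / sin 24° ≈ 0.95 / 0.41 ≈ 2.3

so a patch of ground in Boston collects more than twice as much energy per second from the noon Sun in summer as in winter.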

This question is both extremely difficult and fails to discriminate between students on the basis of overall performance. Teachers in the prediction survey thought students in introductory astronomy courses would score .29 before their courses; the actual P-value is less than half of that. Both answer “A” and answer “C” are far more popular than the scientifically correct response.

[Figure: Item 17 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

In answering this question, students appear to be torn between two distractors that mention changing distance. Many students believe that the Earth’s orbit is highly eccentric, so that the entire Earth is physically closer to the Sun in the summer than in the winter. A more “evolved” explanation is that the Earth leans toward the Sun in the summer and away from the Sun in the winter. This is consistent with many textbook diagrams that show one pole proportionally much closer to the Sun in the summer. The correct answer, “B,” appears to be avoided by most students at all performance levels. It is clear that students have not connected the Earth’s tilt with the altitude of the Sun in the sky during different seasons.


Item 13, Diameter of the Earth. Choose the best estimate of the diameter of the Earth. A. 1,000 miles.

B. 10,000 miles.

C. 100,000 miles. D. 1,000,000 miles.

E. 10,000,000 miles.

The Earth’s diameter is roughly 8,000 miles. A study of twenty-four second-grade students in 1982 found that boys and girls did not have a significant preference for the Sun being larger than the Earth (Klein 1982), even though the Sun is roughly one hundred times larger in diameter and one million times larger in volume.

Item 13  P-value  D-value
A        .06      -.07
B        .29       .30
C        .32       .00
D        .25      -.15
E        .08      -.18

The correct answer to this problem is preferred less often than the misconception represented by answer “C.” Although one may argue that this item tests for the knowledge of a fact, it is apparent that many students prefer a wrong answer to the correct one. It is doubtful that they have been taught that the Earth is 100,000 miles in diameter. There must be some reason why they prefer this answer to the correct one.

[Figure: Item 13 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

Not many students appear to know the diameter of the Earth. At all levels but the highest, students seem to prefer a much larger diameter for the Earth, ranging from 100,000 miles to 1,000,000 miles.


Item 14, Diameter of the Sun. Choose the best estimate of the diameter of the Sun. A. 1,000 miles. B. 10,000 miles.

C. 100,000 miles. D. 1,000,000 miles.

E. 10,000,000 miles.

The diameter of the Sun is approximately 865,000 miles, so answer “D, 1,000,000 miles,” is the closest choice.

Item 14  P-value  D-value
A        .03      -.12
B        .09      -.21
C        .17      -.08
D        .24       .11
E        .46       .14

Students prefer answer “E” to the correct answer by almost a two-to-one margin.

[Figure: Item 14 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

Students do not appear to be guessing when they answer this question: all performance groups show a preference for 10,000,000 miles as the diameter of the Sun. Coupled with the answers to the previous item, it appears that many students believe that the ratio of the Sun’s diameter to the Earth’s diameter is between 10:1 and 100:1. The actual ratio is about 110:1, so most students think the Sun and Earth are both much larger than they really are and that the Earth is much closer in size to the Sun than it really is.
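The 110:1 figure follows directly from the two diameters quoted in these items; a quick check using the document’s own round numbers:

    865,000 miles / 8,000 miles ≈ 108, or about 110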


Item 9, Scale Model of the Sun and the Earth If you used a basketball to represent the Sun, about how far away would you put a scale model of the Earth? A. 1 foot or less

B. 5 feet

C. 10 feet

D. 25 feet

E. 100 feet

The Earth is about 110 solar diameters from the Sun. Yet, textbook illustrations show the Earth and Sun to be very close in size and just a few solar diameters from each other.

Item 9  P-value  D-value
A       .07      -.07
B       .22      -.14
C       .23      -.10
D       .22       .03
E       .25       .25
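The correct choice can be worked out from the 110-solar-diameter figure; a minimal estimate, assuming a standard basketball diameter of about 9.4 inches (the ball size is an assumption, not specified in the item):

    110 × 9.4 inches ≈ 1,030 inches ≈ 86 feet

so “E. 100 feet” is the closest answer.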

This is a difficult question for most students. They get little practice in building scale models in school, and so they may have little idea of what the Earth-Sun system would be like in scale.

[Figure: Item 9 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

The pattern of responses shows no overall misconception that appeals approximately equally to all performance groups. Students with low overall scores prefer a scale distance of 5 to 10 feet from the Sun, which corresponds to the Earth orbiting at only 6 to 12 solar diameters. The correct ratio is preferred only by the top-performing students. This item could be improved by making it more like the previous questions. There appears to be little discrimination between answers “B,” “C,” “D,” and “E,” so expanding the dynamic range of the answers may be useful. Answers of 1 foot, 10 feet, 100 feet, 1,000 feet, and 10,000 feet may help to focus students on a single misconception.


Item 46, Gravity Which of the following would make you weigh half as much as you do right now?
A. Take away half of the Earth’s atmosphere.
B. Double the distance between the Earth and Sun.
C. Decrease the Earth’s rate of spin so that 1 day equals 48 hours instead of 24 hours.
D. More than one of the above.
E. None of these.

Gravity has nothing to do with the atmosphere, the distance to the Sun, or the length of the day. None of the first three actions would make you weigh less, so “E” is the correct answer. The idea that air pressure is the cause of gravity could be the result of incorporating the fact that people weigh less on the Moon, where there is no atmosphere. As a fourteen-year-old student explained, “There isn’t any gravity on the Moon...because...there’s hardly any air there, is there?” A sixteen-year-old student offered a similar explanation: “there ain’t no air in space so they’re as light as anything...if they were on the Moon they’d have to wear steel boots to keep them on the ground” (Watts 1982). Even high school physics students are not immune to this idea; roughly 15 percent believe that gravity is the result of air pressure. One expresses his idea of what holds a book on a table: “If the air was taken away, the book might drift off” (Minstrell 1982b). A 1981 study involving interviews of 179 college physics students during their first week of class uncovered similar ideas (Gunstone and White 1981). When asked to predict the effect of a change in altitude on the weight of an object, several students explicitly stated that lower air pressure would make objects weigh less, “[It is] ... common sense that the rarefied air will make the bucket weigh less” (p. 298). Through interviews with twenty-four tenth-grade students, several were found to believe that the Earth’s gravity depended entirely on either the Earth’s distance from the Sun or the Earth’s rate of rotation (Treagust and Smith 1986). When asked to explain from which of three identical planets, at different distances from the Sun, a rocket would have the easiest time lifting off, many students expected that the planet furthest from the Sun would manifest the least gravity. One explained: “it [the planet] is furthest away from the Sun and the gravitational pull is less there” (p. 365). When asked to explain if a rocket would have an easier time lifting off from a planet that was not spinning, a plurality of students (47 percent) argued that the spin was related to gravity. Here is an example of one student’s explanation, “the Earth is a fast rotating planet and it takes an enormous amount of fuel to lift a rocket off the Earth. It will be easier with a planet (other than the Earth) or (if the Earth exhibited) no rotation” (p. 365). Another student argues for his point with an analogy, “Swinging a stone on a string slowly, it will go out a little way, and fast, a long way. That is the same as gravity. Then the ‘no rotation’ planet would be easiest” (p. 365). The above misconceptions are accounted for in answers “A,” “B,” and “C.” Students who think that more than one of these ideas is operating would choose “D.” Students with a scientific understanding of gravity would choose “E.” The reason that the question was phrased to read “weigh half as much as you do right now” is that there are very slight effects on weight from each of

two distractors. Removing the atmosphere would make you weigh a few ounces more, since the atmosphere provides some buoyancy. If the Earth were not spinning, each of us would weigh a bit more. Both these effects would make us heavier, not lighter.

Item 46  P-value  D-value
A        .21      -.09
B        .13      -.14
C        .17      -.01
D        .17       .00
E        .29       .25
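The size of the spin effect mentioned above can be estimated; a minimal calculation for a person standing at the equator, where the effect is largest:

    ω = 2π / 86,164 s ≈ 7.3 × 10⁻⁵ per second
    ω²R ≈ (7.3 × 10⁻⁵)² × 6,378,000 m ≈ 0.034 m/s²

or about 0.3 percent of g; stopping the spin would add roughly half a pound to a 150-pound person, nowhere near the factor of two the question asks about.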

This is a difficult question with limited discriminating power. The correct answer is chosen more frequently than other answers by students, although not by much.

[Figure: Item 46 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

The item response curves for this problem show that only answer “A,” that the Earth’s atmosphere affects gravity, stands out as a possible misconception. Air pressure does affect weight at sea level, but by less than 1 percent; we must not confuse these minor effects with the major misconceptions. Replacing answer “B” with a less attractive alternative might help improve this problem as a test for misconceptions.


C. Earth and Moon The Moon is the Earth’s closest neighbor. It is about 2,000 miles in diameter and, as it orbits the Earth, it keeps an average distance of 240,000 miles. It is a spellbinding sight in the night sky. Over the course of a month in its orbit about the Earth, it goes through a cycle of phases. Item 11, Scale Model of the Earth and the Moon Which is the most accurate model of the Moon in relative size and distance from the Earth?

[Figure: five scale models, A through E, of the Earth and the Moon; the larger object in each model is the Earth.]

The Moon appears large in the night sky and even larger on the horizon, but its angular size is small, only one-half of a degree. That is small enough for the tip of your little finger to cover it with your hand outstretched. Picturing the Moon and Earth together from outer space, the Moon is a quarter of the Earth’s diameter and about thirty Earth diameters away. Answer “E” is an accurate scale representation of the Earth and the Moon.

Item 11  P-value  D-value
A        .10      -.10
B        .30       .12
C        .20      -.16
D        .26       .10
E        .13      -.01
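Both scale figures quoted above can be verified from the distances given at the start of this section; a quick check, using the round figures of 8,000 miles for the Earth’s diameter and 2,000 miles for the Moon’s:

    240,000 miles / 8,000 miles = 30 Earth diameters
    2,000 miles / 8,000 miles = 1/4 of the Earth’s diameter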

The correct answer to this problem is the second least popular answer; distractors “B,” “C,” and “D” are all chosen more often than the correct answer. This is a very difficult question with no discriminating power.

[Figure: Item 11 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

All students, regardless of performance, appear to prefer a model of the Earth-Moon system that is not to scale. Most students share a belief that the Moon is relatively close to the Earth, roughly from 3 to 10 diameters away. Many of the lower-performing students prefer a Moon that is close to the size of the Earth as well.


Item 15, Distance to the Moon from the Earth Choose the best estimate of the distance to the Moon from the Earth.
A. 1,000 miles.
B. 10,000 miles.
C. 100,000 miles.
D. 1,000,000 miles.
E. 10,000,000 miles.

The Earth is 240,000 miles from the Moon, on average, so “C” is the closest answer.

Item 15  P-value  D-value
A        .09      -.02
B        .18      -.06
C        .30       .15
D        .25       .02
E        .17      -.11

[Figure: Item 15 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

The correct answer is the one that is most frequently chosen for this item. Answer “D” attracts almost as many students.


Item 20, Moon’s Orbital Period Choose the best estimate of the time for the Moon to go around the Earth. A. Hour

B. Day

C. Week

D. Month

E. Year

Most students in introductory astronomy courses know that the Moon orbits the Earth (Targan 1987). Watching the Moon over twenty-four hours, it is easy to see that the Moon appears to circle the Earth in a day; the Earth’s own motion confounds the Moon’s. While the Earth is spinning, the Moon takes a leisurely trip around the Earth, rising about an hour later each day. Schoon found that 42 percent of his population knew that the Moon’s orbital period is one month, while 36 percent thought it took a day and 20 percent thought it took a year.

Item 20  P-value  D-value
A        .05      -.13
B        .37      -.21
C        .12      -.12
D        .37       .46
E        .07      -.16
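The hour-a-day delay in moonrise follows from the month-long orbit; a minimal estimate, taking 29.5 days for the cycle of phases:

    360° / 29.5 days ≈ 12° per day
    12° ÷ 15° per hour ≈ 49 minutes

so each day the sky must turn about 12° further, roughly 50 minutes, before the Moon returns to the horizon.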

The correct answer, “D,” shares the same popularity among students as answer “B.” This is a moderately difficult question with good discriminating power. From the point of view of an Earth-based observer, the Moon does appear to orbit the Earth in a day; this is precisely the misconception that this item tests. Even though most students think that the Earth spins, they cannot use this idea in helping to understand the positions of celestial bodies.

[Figure: Item 20 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

Among lower-performing students, the idea that the Moon orbits the Earth in a day is extremely popular. Higher-performing students have a much greater preference for the correct answer.


Item 23, Moon’s Solar Orbital Period Choose the best estimate of the time for the Moon to go around the Sun. A. Hour

B. Day

C. Week

D. Month

E. Year

As the Moon circles the Earth, the Earth orbits the Sun, taking the Moon along for the ride. So the Moon goes around the Sun in a year, just as the Earth does.

Item 23  P-value  D-value
A        .06      -.18
B        .10      -.21
C        .14      -.21
D        .19      -.25
E        .51       .57

This appears to be a relatively easy question for students.

[Figure: Item 23 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

Only the lower-performing groups have any preference for the Moon’s orbital period about the Sun being shorter than a year.


Item 7, Frame of Reference

[Figure: a model showing a person observing from the north pole of Mars, with the Sun, Mars, and Mars’ moon Deimos.]

The diagram above represents a model of the Sun, Mars, and one of Mars’ moons, Deimos. Please look at the model and determine which object looks most like Deimos to the person in the model who is observing from the north pole of Mars.

[Figure: five drawings, A through E, of possible appearances of Deimos.]

This item originally appeared in a book on astronomy teaching as an example of Piaget’s stage of “formal reasoning” (Schatz et al. 1978). To answer this question correctly, students must be able to switch their frame of reference from outside of this system to being on one of the objects within the system. Looking out from Mars, the dark portion of Deimos would be on one’s right, not on the left as seen from the outside.

Item 7  P-value  D-value
A       .45       .31
B       .07      -.04
C       .37      -.23
D       .04      -.08
E       .07      -.04

For all students, the notion that the dark portion of the moon remains on the left is a powerful misconception. The notion that the moon would appear any way other than half-illuminated appears to be unattractive. Students unable to answer this question correctly would have problems trying to change frames of reference.

[Figure: Item 7 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

Among students who perform less well on the entire test, answer “C” is very popular; none of the other distractors attracts many students. This is another problem that tests a student’s ability to change his or her frame of reference. Although changing perspective here appears much easier than, say, changing from a geocentric to a heliocentric frame, many students still have great difficulty. Learning about the phases of the Moon, the light curves of binaries, the apparent motion of the Sun at different latitudes, or the appearance of galaxies requires some agility with spatial thinking. Without the ability to imagine what objects look like from different perspectives, students will find many astronomical concepts virtually impossible to learn.


Item 2, Reason for Moon Phases One night you looked at the Moon and saw this: [figure]. A few days later you looked again and saw this: [figure]. Why did the Moon change shape?
A. Something passed in front of it.
B. It moved out of the Earth’s shadow.
C. It moved out of the Sun’s shadow.
D. Its far side is always dark.
E. None of the above.

The Moon’s phases are caused by the fact that our view of the lighted side of the Moon changes as the Moon orbits the Earth. The Moon has no light of its own and is illuminated by the Sun. This answer is not listed in the first four choices, so “E” is the correct answer. Even teachers are confused by this concept. The Boston Curriculum Objectives (Marshall and Lancaster 1983) urge teachers to test their students on identifying Moon phases with a drawing of two “phases,” a crescent Moon and a partial lunar eclipse. Clearly, whoever made up this guide would have answered Item 2 with “B” instead of the correct answer. In a study that interviewed fifty preservice and in-service elementary school teachers, 74 percent of respondents were found to have incorrect concepts (Cohen 1982). In this study, eleven teachers thought that clouds, a planet, or a star blocked the Moon. Two thought that the Moon is black and white, and rotates, and twenty-four implicated the Earth or its shadow. An early precursor to misconception studies examined the “sophisticated errors” of 100 recent high school graduates in 1963: seventy percent believed that the Earth’s shadow caused the phases of the Moon (Keuthe 1963).

Item 2  P-value  D-value
A       .03      -.09
B       .41       .19
C       .27      -.21
D       .04      -.05
E       .26       .06

This is a difficult question, especially because the correct reason for the phases is not listed among the answers, only “none of the above.” However, teachers in our nationwide survey predicted that .34 of entering students would know the answer to this question and that the fraction who would learn it by the end of their course would rise to .72. Two distractors, “B” and “C,” appear to be more popular than the correct answer.

[Figure: Item 2 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

This question reveals that most students have the wrong idea about the cause of the phases of the Moon. Higher-performing students appear much more likely than lower-performing students to think that the Moon’s phases are caused by the Earth’s shadow.


Item 30, Time from Moon Phases Approximately what time could it be if you saw a thin crescent Moon on the western horizon? A. Sunrise

B. Sunset

C. Noon

D. Midnight

E. Anytime of day or night.

If the thin crescent Moon is on the western horizon, the Sun must be close by it. It could only be around the time of sunset, since the Sun sets somewhere in the western part of the sky. Students who think that the Earth’s shadow causes the phases of the Moon might choose “A,” since for the Earth to be between the Moon and Sun, the Sun would have to be on the opposite side of the sky from the Moon.

Item 30  P-value  D-value
A        .25       .14
B        .31       .05
C        .09      -.19
D        .12      -.15
E        .22       .07

This question was designed to examine whether students could apply their theory of the phases of the Moon to predict the time of day from its position and phase. This item is difficult and has virtually no discriminating power.

[Figure: Item 30 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

“B” appears to be the most popular answer, but I do not believe it has been chosen for the right reason: it is chosen with uniformity by all performance groups. Perhaps this question is just too difficult. One can see that misconception “A” appears to be more popular among better-performing students.


Item 24, Moon’s Rotation Choose the best estimate of the time for the Moon to turn on its axis. A. Hour

B. Day

C. Week

D. Month

E. Year

The Moon turns on its axis once every month; it therefore always keeps the same face directed toward the Earth. Until the space age, we never knew what the far side of the Moon looked like.31 Keuthe found that 19 percent of the high school graduates whom he studied held the common belief that the Moon does not rotate (Keuthe 1963). I was not aware of this study when I originally wrote this test. In future versions of this test, this item should be modified so that “it does not spin” replaces answer “E.”

Item 24  P-value  D-value
A        .23       .02
B        .23       .00
C        .21      -.03
D        .23       .12
E        .10      -.13

Students do not appear to have a clear preference for one answer over any other. Perhaps changing “year” to “it does not spin” would attract many students.

[Figure: Item 24 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

31 The Center for Astrophysics has several globes of the Moon with only one side painted; the other side was left unpainted.


D. Mathematics Many astronomy classes rely on mathematics to present facts or concepts. Scientific notation is used to express astronomical sizes and distances. Angular measure is used to locate heavenly objects in a variety of coordinate systems. Ratio and proportion are relied upon to explain identical angular sizes. Graphs are used to illustrate relationships and patterns. None of the mathematical skills listed above is unfamiliar to students by the eighth grade, and many science teachers assume familiarity with these abilities.

Item 3, Graph Interpretation Which star on the graph has a temperature most like that of Betelgeuse?

[Figure: a graph of brightness versus temperature showing Betelgeuse and five labeled datapoints, A through E.]

The ability to interpret graphs is a fundamental skill for students studying science. Graphs can show patterns and relationships that are virtually impossible to ascertain from data alone. They usually present concepts in a concise manner that would otherwise require a great deal of descriptive writing (Weintraub 1967). The position of a point can represent two values simultaneously. Several curriculum projects in science have placed a heavy emphasis on the use of graphs including SCIS (Science Curriculum Improvement Study), SAPA (Science-A Process Approach), and ESS (Elementary Science Study) (Padilla et al. 1991). In the graph above, the datapoint closest in temperature to the star, Betelgeuse, is “D.” Many students believe that graphs are concrete representations of physical systems. To these students graphs are maps, not abstractions. A graph is thought to be a picture of a situation (Bell et al. 1987). Interpreting the “closeness” of datapoints on a scatter-plot is considered only as an omnidirectional physical closeness and not related to position with respect to one axis only. Bell, Brekke, and Swan found that only 26 percent of British high school students were able to correctly answer questions about the data in a scatter-plot similar to the one above. The graph above is a simplification of the Hertzsprung-Russell diagram, which relates stellar type (or temperature) and luminosity (or absolute brightness). During its lifetime, the star’s position on this graph will change.

Many students believe that this movement represents a real spatial movement of a star (Schatz et al. 1978). Graphs similar to this one are fixtures in introductory astronomy texts, although the data represented in this particular graph are of no import for solving the posed problem. Students who interpret the above graph in this way would think of “B” as the physically closest point to Betelgeuse. Those students who do not know how to interpret the axis might choose “A” because it is closely aligned horizontally with Betelgeuse. One would expect “E” and “C” to be chosen only by students who are guessing.

Item 3  P-value  D-value
A       .33      -.13
B       .31      -.18
C       .03      -.14
D       .31       .38
E       .01      -.02

The correct response to this question is not the most popular answer. Answers “A” and “B” attract many students.

[Figure: Item 3 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

Lower-performing students show a clear preference for distractors “A” and “B.” Those who choose “B” are choosing the closest datapoint. Those who choose “A” are answering in terms of the wrong axis.


Item 26, Graph Extrapolation This graph shows a plot of the distances of several galaxies from the Earth and the speeds at which they are moving away from us. If a galaxy were discovered to be 2,200 million light years from Earth, a good estimate of its speed would be:

[Figure: a graph of recession speed (miles/sec, 0 to 40,000) versus distance (0 to 2,500 millions of light years) for several galaxies.]

A. 0 miles/sec B. 200 miles/sec C. 16,000 miles/sec D. 25,000 miles/sec E. 32,000 miles/sec

Scientists infer relationships from collections of observations. That the universe is expanding is evidenced by the fact that distant galaxies are speeding away from us more quickly than are our closer neighbors. This relationship is typically presented in graphical form and is a good example of how graphs are used in teaching astronomy. In this item, the correct answer can be found by extrapolating a straight line through and beyond the datapoints. At a distance of 2,200 million light years from Earth, a galaxy would probably have a speed of about 32,000 miles/sec away from us. One of the nine objectives for TOGS (Test of Graphing in Science) is: “Given a graph and a situation requiring interpolation and/or extrapolation, the students will identify trends displayed in a set of data” (McKenzie and Padilla 1986). This item was included in the test to determine whether students had the graph-reading ability to comprehend cosmological arguments based on graphs of recessional velocity.

Item 26  P-value  D-value
A        .04      -.11
B        .12      -.21
C        .15      -.22
D        .22      -.16
E        .46       .49
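The extrapolation the item calls for is a one-line least-squares fit; a minimal sketch in Python, using made-up datapoints that merely mimic the trend of the plotted graph (the actual galaxy data are not reproduced here):

    import numpy as np

    # Hypothetical readings in the spirit of the plotted data:
    # distance in millions of light years, speed in miles/sec.
    distance = np.array([500.0, 1000.0, 1500.0, 2000.0])
    speed = np.array([7000.0, 14500.0, 21500.0, 29000.0])

    # Fit a straight line through the datapoints and extend it.
    slope, intercept = np.polyfit(distance, speed, 1)
    print(slope * 2200 + intercept)   # about 32,000 miles/sec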

This is a relatively easy question for many students. No misconception stands out from the table of total statistics.

[Figure: Item 26 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

For the lowest-performing group there is a preference for the extrapolation to be maintained at the level of the final datapoint, answer “D.”


Item 33, Scientific Notation Conversion Convert to scientific notation: 25,600,000
A. 2.56 × 10^5
B. 2.56 × 10^6
C. 2.56 × 10^7
D. 2.56 × 10^8
E. None of the above.

Scientific notation is a shorthand for representing very large or very small numbers. It is particularly useful in astronomy because astronomical objects are so large that manipulating quantities represented in conventional form would be very unwieldy. Most textbooks present astronomical quantities in scientific notation and assume that students know how to interpret them. The correct answer for this question is found by moving the decimal point over seven places and representing this as the seventh power of ten.

Item 33  P-value  D-value
A        .32      -.11
B        .12      -.20
C        .36       .40
D        .06      -.13
E        .12      -.10
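A quick check of the conversion, and of why zero-counting fails, can be run in Python; a minimal sketch:

    # The exponent is the number of places the decimal point moves (7),
    # not the number of zeros in 25,600,000 (5).
    value = 25_600_000
    print(f"{value:.2e}")            # prints 2.56e+07, i.e., 2.56 x 10^7
    print(str(value).count("0"))     # prints 5, the zero-counting error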

This item reveals a misconception that appears in students at all performance levels. Those who choose answer “A” are simply adding up the number of zeros in 25,600,000 and using this quantity of zeros, 5, as the exponent for the power of ten. These students have probably learned a rule by rote, to count the zeros only and use this number as an exponent. They most likely do not know what this exponent represents. Some of those who get the right answer may not know either.

[Figure: Item 33 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]


Item 10, Addition of Exponents If there are 100,000,000,000 stars in a galaxy and 100,000 galaxies in the Local Supercluster, how many stars are there in the Local Supercluster?
A. 10^5
B. 10^11
C. 10^16
D. 10^55
E. None of the above.

Multiplying large numbers is easier using scientific notation. These two numbers can be multiplied by first converting both to scientific notation and then adding the exponents (10^11 stars/galaxy × 10^5 galaxies = 10^16 stars). Adding exponents is much easier than multiplying the quantities out longhand.

Item 10  P-value  D-value
A        .11      -.14
B        .11      -.14
C        .40       .35
D        .13      -.09
E        .25      -.12
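The rule is easy to verify; a minimal check in Python:

    # Multiplying powers of ten adds their exponents: 11 + 5 = 16.
    print(10**11 * 10**5 == 10**16)   # True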

[Figure: Item 10 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

This skill does not appear to be particularly difficult for most students, although those at the low end of the performance spectrum appear to be guessing at the answer. The students who choose answer “E” are unable to calculate the answer correctly using the addition of exponents; a larger fraction of lower-performing students than of higher-performing ones choose this answer.


Item 16, Similar Triangles Looking out of a living room window, you see the following:

[Figure: a view of a house through a living room window; the window height is marked 3 ft.]

The window measures 3 feet from top to bottom. The house that you see is 24 feet tall. If you are 5 feet from the window, calculate how far you are from the house. A. 40 ft

B. 72 ft.

C. 120 ft. D. 360 ft. E. None of the above

The application of reasoning about proportions allows students to solve problems involving angular size. In this case, the problem can be solved algebraically or geometrically:

    height of window / distance to window = height of house / distance to house
    3 ft / 5 ft = 24 ft / distance to house
    distance to house = (24 ft × 5 ft) / 3 ft = 40 ft

Researchers have found that students often are not able to reason formally when it comes to thinking about proportions. Yet, explanations in textbooks require the handling of many variables (Karplus et al. 1978). Karplus later went on to suggest that “cross-multiplication” may actually inhibit the development of the understanding of proportion (Karplus et al. 1983). In a test of reasoning about proportions, only 22 out of 474 college-bound high school students could use proportions to solve a two-step problem involving shadows and similar triangles (Farrell and Farmer 1985). Students who multiply the various dimensions together will end up with the other numerical choices. Those who cannot find the result among the choices will choose “E.”

Item 16  P-value  D-value

A        .28       .19
B        .24      -.05
C        .24      -.08
D        .10      -.02
E        .14      -.06

This is a difficult question for most students. Answers “B” and “C” are popular, compared with “D” and “E.” The relatively small fraction of students choosing “E” implies that students are at least attempting to calculate the answer to this problem; the difficulty is that they make the calculation incorrectly.

[Figure: Item 16 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

Among lower-performing students, solving this problem appears to be an exercise in guesswork. They remember that one must use multiplication, so many simply multiply two or three of the numbers together to get an answer. All student groups appear to have great difficulty with this problem involving an application of simple ratios.


Item 25, Degrees in a Circle While at sea on a small boat, you see a ship on the horizon. It appears 5° in length. How many ships of the same size and at the same distance could fit around you in a circle?

A. 5
B. 36
C. 72
D. 180
E. None of the above.

Angular measure is very important in astronomy. Most objects in the sky are too far away to measure their size in any direct fashion. Astronomers must instead try to find the distance to the object and then use its angular size to compute its actual size. This question is an attempt to determine whether students think of a circle as having 360° and whether they are able to use this information: 360°/5° = 72 ships, answer “C.” Many teachers start with this as a given in teaching astronomy, as stated in an article in The Science Teacher: “All of us know that circles are divided into 360°” (Russo 1988). My belief is that for many students, any question dealing with the number of degrees would produce a rote response of 180°, since there are 180° total in the interior angles of a triangle. They would then have some preference for “B” (180°/5° = 36).

Item 25  P-value  D-value
A        .14      -.20
B        .17      -.14
C        .37       .45
D        .08      -.11
E        .22      -.13

[Figure: Item 25 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

This question appears to be easy for the highest-performing students but very difficult for those at the other end of the scale. Some students appear to think that there are only 180° in a circle. Many cannot use division to solve this simple problem in angles.


Item 19, Probability You have flipped a coin six times and it has come up heads each time. What is your best estimate of what will happen on the seventh flip?
A. definitely tails
B. probably tails
C. equal chances of heads or tails
D. probably heads
E. definitely heads

There are a few concepts in introductory astronomy courses that deal with probability. Most texts discuss astrology and the search for extraterrestrial intelligence, and some do so at great length. The idea that events can exert “odds pressure” is common in people’s conversations. Sports fans talk of basketball players with “hot hands,” assuming that it is more likely for a player to sink a basket after a long run of success than after a long run of failures. This item was constructed to see whether students had any preference for the outcome of a coin flip after a long run of heads.

Item 19  P-value  D-value
A        .04      -.12
B        .10      -.08
C        .66       .27
D        .16      -.17
E        .03      -.06
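For a fair coin the flips are independent; a minimal statement of the relevant probabilities:

    P(heads on the 7th flip | six heads so far) = P(heads) = 1/2

The run of six heads itself has probability (1/2)^6 = 1/64, but once it has happened it tells us nothing about the seventh flip.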

Students do quite well on this problem and seem comfortable with an inability to predict the exact outcome of a random event.

[Figure: Item 19 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

This appears to be a relatively easy question for groups of all performance levels; only in the lowest-performing group did any distractor rise above the .20 level. It appears, at least as measured by this problem, that students do not believe in “odds pressure.”


E. Solar System The solar system consists of the Earth, Moon, and Sun and all the planets that orbit the Sun. The various planetary moons, asteroids, comets, and other objects gravitationally bound to the Sun can also be included. The names of these objects are often covered in elementary school, but the scale is often distorted when represented in diagrams. The distances in the solar system are vast when compared with the sizes of the objects it contains.

Item 27, Visual Parallax Objects that can be seen with the unaided eye and appear to move against the background of stars during one month are always:
A. farther away from us than the stars.
B. within the solar system.
C. within the Earth’s atmosphere.
D. at the edge of the visible universe.
E. a part of a binary star system.

Objects that move against the background of stars are closer to us than the stars themselves. Airplanes and meteors are certainly within the atmosphere and move noticeably in seconds. Satellites, just above our atmosphere, move measurably in minutes. Objects such as planets or comets may move noticeably against the background of stars in a few days or months. So, objects that move in a month are definitely within our solar system—“B.”

Item 27  P-value  D-value
A        .13      -.16
B        .39       .35
C        .24      -.11
D        .15      -.15
E        .07      -.04

[Figure: Item 27 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

This question behaves like most items testing factual information: students prefer the correct explanation. There are a few students, however, who characterize any object that moves in the sky as being within the Earth’s atmosphere. Perhaps they have never seen an artificial satellite or noticed the slow trek of the planets against the fixed stars.


Item 28, Relative Distances in the Solar System Which answer shows a pattern from closest object to the Earth to farthest from the Earth?
A. Sun → Saturn → Moon
B. Saturn → Moon → Sun
C. Moon → Sun → Saturn
D. Moon → Saturn → Sun
E. Sun → Moon → Saturn

Many children’s books show drawings of the solar system as vastly out of scale, with planets and the Sun all being about the same size and distance from each other. Measuring from the Earth, we find the Sun is about 400 times further away from the Earth than is the Moon. Saturn varies between 3,500 and 4,500 times further away from us than the Moon as it orbits the Sun. So the correct answer is “C.”

Item 28  P-value  D-value
A        .05      -.19
B        .10      -.24
C        .42       .29
D        .32       .09
E        .10      -.20

[Figure: Item 28 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

Many students have little difficulty with this question. Most know that the Moon is closer to the Earth than either the Sun or Saturn. A substantial number of students, however, believe that Saturn is closer to the Earth than is the Sun. The fraction of students who choose this distractor is larger for higher-performing students.


F. Stars

Item 18, Relative Distances of Stars and Planets Which answer shows a pattern from closest object to the Earth to farthest from the Earth?
A. Space Shuttle in orbit → Stars → Pluto
B. Pluto → Space Shuttle in orbit → Stars
C. Stars → Space Shuttle in orbit → Pluto
D. Stars → Pluto → Space Shuttle in orbit
E. Space Shuttle in orbit → Pluto → Stars

Stars are very far from us on the scale of the solar system. The closest star to our solar system is about 200,000 times further away than the Sun is from the Earth. Or, to put it another way, if the Sun were a grape, the Earth would be a speck of dust three feet away and the closest star would be another grape 100 miles from us. At this scale, even Pluto would be close to the Earth, at a distance of 100 feet. Since stars and planets are almost indistinguishable in the night sky, students may not make a distinction between their distances from us. Among 200 eleven- to thirteen-year-old Italian students interviewed, there was no distinction between stars and planets (Loria et al. 1986).

Item 18  P-value  D-value
A        .26      -.17
B        .07      -.20
C        .17      -.27
D        .06      -.17
E        .44       .55
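The 100-mile figure follows from the document’s own ratio; a quick check using the grape model:

    200,000 × 3 feet = 600,000 feet ≈ 114 miles ≈ 100 miles

and Pluto, at roughly 40 times the Earth-Sun distance, would sit at about 40 × 3 feet = 120 feet, close to the 100 feet quoted above.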

[Figure: Item 18 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

The results of this question support the idea that many students believe that there are stars within the solar system, between the Earth and Pluto. Some students actually believe that the space shuttle goes out beyond the stars. I have talked to students who are mystified as to why our spaceships have not visited other solar systems, since they think our spacecraft can reach them. Many of these students can see no impediments to human colonization of the galaxy and view the possibility that visitors from other solar systems have come to the Earth as totally reasonable.


Item 8, Scale Model of the Sun and a Close Star Two grapes would make a good scale model of the Sun and a close star, if separated by: A. 1 foot.

B. 1 yard.

C. 100 yards.

D. 1 mile.

E. 100 miles.

As described in the preceding item, a good model of the Sun and a close star is two grapes separated by 100 miles.

Item 8  P-value  D-value
A       .19      -.20
B       .18      -.14
C       .14      -.04
D       .14       .04
E       .34       .29

[Figure: Item 8 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

This is a moderately difficult question for students. Surprisingly, the major misconception here appears to be that some students think that stars are only a few dozen diameters away from each other.


Item 37, Sun’s Movement against the Stars If you could see stars during the day, this is what the sky would look like at noon on a given day. The Sun is in the constellation of Gemini.

[Figure: a chart of the noon sky with the Sun in Gemini, surrounded by the constellations Leo, Cancer, Taurus, Orion, Canis Minor, and Canis Major.]

In what constellation would you expect the Sun to be located at sunset on this day?
A. Leo
B. Canis Major
C. Gemini
D. Cancer
E. Orion

From a geocentric point of view, the celestial sphere makes one complete turn about the Earth in a day, with the Sun pretty much stuck in position. Over the course of a year, the Sun slowly makes its way through the zodiac at a rate of about 1°/day until it arrives back at its starting position. If the background of stars could be seen along with the Sun, the Sun’s movement in a day would be barely perceptible against that background. The Sun would appear to be in the same constellation for a month at a time. The correct answer to this item is “C”: the Sun would set in the same constellation that it was in at noontime. Many students view the night sky as permanent and unchanging. This could be because they view the universe as static (Lightman et al. 1987). I have found that even Harvard students are surprised to find the “stars have moved” when asked to measure the position of stars over several hours. Students who think of the starry sky as static could interpret this problem in a few different ways. If they view the above chart as a picture of the sky from the northern hemisphere, west would be to the right and the Sun would set down to the right, in Canis Major (answer “B”). Viewed as a map, with north always up, west is to the left, so the Sun would set in Leo (answer “A”). Students who know that the Sun is always in the zodiac might recognize the span of Taurus, Gemini, Cancer, and Leo as part of the zodiac. They could reason that the Sun would set in the zodiacal constellation closest to the horizon, Leo (answer “A”).
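The imperceptibility of the drift is easy to quantify; a minimal estimate: in the roughly six hours between noon and sunset the Sun moves only about (6/24) × 1° = 0.25° relative to the stars, half the width of the full Moon, far too little to carry it into a neighboring constellation.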

Item 37  P-value  D-value
A        .32       .17
B        .28      -.05
C        .19       .05
D        .09      -.19
E        .10      -.02

[Figure: Item 37 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

The use of a celestial sphere allows students to test their theories by modeling the appearance of the sky and determining whether the model matches actual observations. This has been done in Project STAR activities and by others (Carter and Stuart 1989). The discriminating power of this question is close to zero; students who perform well on the test do no better than those who do poorly. Two major misconceptions are apparent: the better-performing students appear to be attracted to the Sun setting in the constellation Leo, perhaps because they recognize Leo as a sign of the zodiac, while other students are attracted to the Sun setting in Canis Major.


Item 34, Stellar Parallax

[Figure: the seven-star pattern of the Big Dipper.]

The Big Dipper would have a noticeably different shape to the unaided eye: A. if viewed from another star. B. if viewed from Pluto. C. if you looked at it a year from now. D. if you viewed it from China. E. never, it would always look the same.

The pattern of stars in the sky is a unique arrangement that remains unchanged to the naked eye from wherever one views it within the solar system. The stars are so far away that changes of viewing location of a billion miles are insignificant. Only from another star would the constellation look different. Indeed, our own Sun would appear as part of some constellation if we were in another star system.

Item 34  P-value  D-value
A        .28       .39
B        .14      -.10
C        .07      -.17
D        .08      -.15
E        .41      -.09
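How insignificant such a change of viewpoint is can be estimated from the angles involved; a rough calculation, assuming the nearest stars are about 4 light years (roughly 270,000 astronomical units) away and taking Pluto’s distance of about 40 astronomical units as the baseline:

    shift ≈ 40 / 270,000 radians ≈ 0.00015 radians ≈ 30 arcseconds

about half the angular resolution of the naked eye, so even from Pluto the Big Dipper would look unchanged.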

[Figure: Item 34 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

The invariance of the starry sky has a powerful pull on the minds of students. The view of the heavens as unchanging is much more popular than the scientific explanation that the patterns of stars change from different viewing locations. For all but the highest-performing students the stars are fixed.


Item 35, Astrology Most astronomers consider astrology to be:
A. a science.
B. a good way to determine personality traits.
C. helpful in predicting world events.
D. more than one of the above.
E. none of the above.

The topic of astrology often comes up in introductory astronomy classes; indeed, many teachers have told me that new students are often disappointed to find they will not be casting horoscopes. In Western countries, roughly one person in a thousand is practicing or studying serious astrology (Dean 1987). Although scientists view astrology as a pseudo-science and choose “E” as the correct answer, the inclusion of this question sought to determine how students viewed the subject. With more than 100 periodicals and about 1,000 books in print (about the same as for astronomy), astrology can be a highly technical and mathematical undertaking (Dean 1987). Many students can confuse this analytical sophistication with science and choose “A.” This may only reflect ignorance of what science is. More confounding is that students may actually believe that astrology can predict either personality traits, “B,” or world events, “C.” Selection of “D,” more than one of the above, must include belief in either “B” or “C.” A choice of “B,” “C,” or “D” can be viewed as a true belief in astrology as predictive.

Item 35  P-value  D-value
A        .37      -.08
B        .07      -.17
C        .07      -.16
D        .25      -.11
E        .23       .45

[Figure: Item 35 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

Students think there is something to astrology. A plurality think it is a science. Many think it helps determine either personalities or world events.


G. Galaxies Galaxies are vast collections of stars and gas that are gravitationally bound to each other. They are so large that if one were to shrink our own Sun down to the size of a basketball, the center of our Milky Way galaxy on the same scale would be 100,000,000 miles away, or at the distance of the Earth from the Sun.

Item 29, Relative Distances in the Universe Which answer shows a pattern from closest object to the Earth to farthest from the Earth?
A. center of Milky Way → Andromeda galaxy → North Star
B. center of Milky Way → North Star → Andromeda galaxy
C. Andromeda galaxy → North Star → center of Milky Way
D. North Star → Andromeda galaxy → center of Milky Way
E. North Star → center of Milky Way → Andromeda galaxy

The Sun exists as a not very special star among 100,000,000,000 others in our galaxy, the Milky Way. The closest large galaxy to us is Andromeda, which can be seen with the naked eye as a faint patch of light in the night sky. The North Star, in comparison, is relatively close to us. The correct answer is “E”: the center of the Milky Way is 100 times further away than the North Star, and Andromeda is 3,000 times further away.

Item 29  P-value  D-value
A        .12       .04
B        .23       .02
C        .14      -.18
D        .16      -.13
E        .33       .21

[Figure: Item 29 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

Students do not seem to prefer one distractor over another in this item, except for a slight preference for the center of the Milky Way being closer to us than the North Star. Perhaps some students do not know that the Milky Way is the name of our own galaxy.


Item 38, Observable Galaxies The best place to look for other galaxies in the night sky is:
A. near the Moon.
B. near dense concentrations of stars.
C. in the constellation Sagittarius.
D. away from the Milky Way.
E. close to planets.

The topic of galactic structure is often included in introductory astronomy and earth science courses. Activities on the classification of galaxies based on their structure are common, leading to the explanation that our own Milky Way is a spiral galaxy. The distribution of observable stars and globular clusters in the night sky is often used as evidence that we are within a large, relatively flat collection of stars. Since we are within a galaxy, we cannot see its structure as easily as we can view others from afar. Some teachers have proposed activities to help students construct models to describe our stellar system. Through thinking out what the night sky would look like if we were within different types of galaxies and at different positions within galaxies, a student can develop a very good idea of the shape of our galaxy and our position within it (Doménech and Casasús 1991). Since we are on an arm of our own spiral galaxy, we do not see any galaxies in the plane of the Milky Way. They are blocked by stars and dust in our own galaxy. One must look outside the galactic plane to see other galaxies. The correct answer is “D.” Galaxies are dense concentrations of stars, so students may be swayed to answer “B,” but galaxies do not appear so in the sky. Galaxies visible to the naked eye, Andromeda or the Magellanic Clouds, appear as faint patches of light. The dense concentrations of stars that we can see, such as the Pleiades, are within our own galaxy. Planets within our solar system and the Moon have nothing to do with the far-away galaxies and would actually obstruct our view of such faint objects. Sagittarius, a constellation that coincides with the center of the Milky Way, is particularly devoid of other galaxies and was simply included as a jargon-laced distractor.

Item 38  P-value  D-value
A        .08      -.19
B        .45       .13
C        .12      -.16
D        .24       .22
E        .10      -.13

[Figure: Item 38 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

The most popular answer to this item is that galaxies can be found near dense concentrations of stars in the night sky. This is a misconception. When we look out at the night sky, dense concentrations of stars are inevitably accompanied by invisible dust clouds that block our view of galaxies.


Item 36, Expansion of the Universe When the observable universe was half its present age, it was:
A. larger than it is now.
B. smaller than it is now.
C. roughly the same size as it is now.
D. exactly the same size as it is now.
E. collapsed into a black hole.

The universe is expanding; the galaxies are shooting away from each other at enormous speeds. At half of the universe’s present age it had to be smaller than it is now (“B”). In a telephone study of 1,111 American adults concerning their cosmological beliefs, only 24 percent believed that the universe is expanding (Lightman and Miller 1989). The majority preferred to think that the universe is static. Greater preference for an expanding universe was found among males, college graduates, those younger than fifty years of age, and those who were not church members. This study went on to probe for the reasons supporting each individual’s belief in the size of the universe. The most prevalent reason was “observation”: the stars in the night sky appear motionless, and this observation appears to be a fact that strongly motivates the belief in a static universe. An earlier study found that among eighty-three high school students, many expressed “fears of catastrophe to Earth” with the idea of a changing universe (Lightman et al. 1987). There appears to be a strong emotional component related to beliefs concerning the nature of the universe. In a series of interviews of Italian eleven-year-olds, a variety of interesting explanations were given for a static universe. One pupil put it succinctly: “The stars do not move. If only one will, all the universe will be untidy” (Viglietta 1986).

Item 36  P-value  D-value
A        .12      -.15
B        .39       .41
C        .23      -.17
D        .09      -.19
E        .15      -.04

[Figure: Item 36 response curves. P-value of each answer choice, A through E, plotted by performance quintile.]

All but the highest-performing students appear to prefer that the universe is constant in size over thinking that it is expanding.


H. Light and Color

Item 43, Role of Illumination You are in a completely dark room. There are no lights and no windows. Which group of objects do you believe you might be able to see?
A. bicycle reflectors, a cat’s eyes
B. silver coins, aluminum foil
C. white paper, white socks
D. more than one of these groups
E. none of these

For objects to be seen, they must either produce light or reflect light. None of the objects listed produces light and in a completely dark room, none could reflect light. The correct answer is “E,” none of the above. So how could students choose any answer but “E”? The answer is that they do not understand the role of light in illumination. A study of 102 fifthgrade students found that the most prevalent belief about the role of illumination is that we see things because light shines on objects and “brightens” them, not because light is reflected from them (Eaton 1984). Only three students from this group mentioned reflection or bouncing light in their explanation. Bright objects that are “dazzling” or unusual in their appearance may be sensed as active in some way and not passive scatterers of light (Jung 1987). In a study of twenty high school students, pupils were asked to explain “What is it that makes you see this object?” Most students never mentioned any linking mechanism between the object and the eye. Others explained that the act of seeing takes place by a “look” or “vision” going from the eye to the object (Anderson and Karrqvist 1983). Students who hold the latter view would have little problem thinking they could see nonluminous objects in the dark, since all they must do is “look” at an object to see it. Students with this view would choose answer “D.” In responding to item 43, students who view some objects as “naturally bright” would choose those objects as being seen without illumination. White paper and white socks seem to fit this category. Objects that are seen in the dark, such as bicycle reflectors and cat’s eyes, may be thought of as emitting their own light, even though they are just directional reflectors of light. Item 43 P-value D-value

Item 43       P-value   D-value
A               .17      -.19
B               .07      -.18
C               .20      -.04
D               .16      -.17
E (correct)     .38       .45

[Item response curves for Item 43: P-value of each answer choice (A–E) by student quintile.]

This relatively easy question still catches a few misconceptions. Among lower-performing students there is a slightly greater than chance choice of “A,” bicycle reflectors and cats’ eyes. Since these objects are usually seen at night and appear to glow in the dark, this answer may seem reasonable; these objects, however, glow only when reflecting light. Among higher-performing students, there is a slight preference over chance for white paper and socks. Since these objects are very reflective, they would probably be the most noticeable in a darkened room, but they would be unseen in a completely dark room.


Item 44, Light Propagation at Night
It is nighttime. Headlights from a parked automobile light up the road brightly from point A to point B. A person standing at point D can see the headlights glowing.
[Diagram: the road, with points A, B, C, and D marked at increasing distances from the headlights.]
Which statement best describes the farthest point that light from the headlights can reach?
A. Light does not leave the headlights.
B. The light reaches only as far as point A.
C. The light reaches only as far as point B.
D. The light reaches only as far as point C.
E. The light reaches at least as far as point D.

This question, in a slightly different form, was first proposed in a study of Swedish students 12 to 15 years of age (Anderson and Karrqvist 1983). Many students said that the light did not reach the observer, even though he could see the headlights. The explanation appears to be based on the conception that the light from the headlamps reaches only as far as the brightly lit road, and that a separate activity of the observer “looking” at the headlamps allows him to see them. For many students there is no connection between the headlights giving off light and their being visible. As one student remarked, “The light doesn’t reach further (than the spot on the road). The light is too weak, but the pedestrian can see it all the same” (p. 30). Another student made a similar prediction, but with a different mechanism: “Light gets weaker the further away it goes. In the end it fades out” (p. 31). All in all, only 216 out of 558 students believed that light actually reached the observer. Fifty-nine students in grades four through ten were interviewed about their ideas concerning the propagation of light (Stead and Osborne 1980). In individual interviews, and later in a multiple-choice test, they were asked to explain how far light “could go” from a variety of sources of light. Students expressed a variety of misconceptions, including “it stays there [in the candle]” and “it comes out a certain distance depending on the brightness ... but it stops after a while....” The majority of students thought that light either stayed in the source or could travel only a certain distance before it “fades away” or “just gets duller.” These students would prefer answers “A,” “B,” “C,” or “D.” Many students confuse light propagation and vision; they do not understand how the two are linked. While it may seem contradictory that a student can agree that the observer in the problem can see the headlights glowing and yet choose that the light never reaches him, many do not know that light must enter the eye to be seen. As one student remarked, “he can see it [the source], but the light doesn’t reach him”; another said, “it wouldn’t reach him [the light] but he could still see the TV” (p. 86). The word “light,” as in light coming from the headlamps, is somewhat ambiguous in this context. Students view light as not reaching the observer because the source does not perceptibly illuminate the observer (Jung 1987). Light exists only where it can be seen: in the headlights and on the road.

Item 44       P-value   D-value
A               .06      -.08
B               .10      -.20
C               .29      -.17
D               .12      -.14
E (correct)     .40       .45

The correct answer, “E,” is the most popular answer chosen. This question, at a P-value of 0.40, is of average difficulty and, with a D-value of 0.45, has above-average discriminating power.

[Item response curves for Item 44: P-value of each answer choice (A–E) by student quintile.]

Many students choose the distractor “C,” that the light reaches only as far as the end of the lit spot on the road and does not reach the eye of the observer. Among lower-performing students, it is the preferred answer.


Item 45, Propagation in the Daytime
Imagine that the parked car described in the item above has its lights on during a bright sunny day. A person standing at point D can see the headlights glowing.
[Diagram: the same road, with points A, B, C, and D marked at increasing distances from the headlights.]
Which statement best describes the farthest point that light from the headlights can reach?
A. Light does not leave the headlights.
B. The light reaches only as far as point A.
C. The light reaches only as far as point B.
D. The light reaches only as far as point C.
E. The light reaches at least as far as point D.

In the Stead and Osborne research discussed above, the authors noticed that students’ answers concerning the distance that light traveled depended on ambient lighting conditions. Among the students who thought that light traveled some distance from the source, most thought that it would travel a shorter distance in the daytime, and many thought that in the daytime the light would remain inside the source. It appears that many students believe that light is present only if there is enough light for visual effects such as shadows or bright spots to be noticeable (Stead and Osborne 1980).

Item 45       P-value   D-value
A               .26      -.11
B               .16      -.17
C               .15      -.13
D               .10      -.13
E (correct)     .30       .47

[Item response curves for Item 45: P-value of each answer choice (A–E) by student quintile.]

The misconception that light does not reach the observer still has a powerful hold on students in this item. Fewer students choose the correct answer here than in the nighttime version above.


Item 47, Shadows
[Diagram: lightbulb A (on) and lightbulb B (off) placed behind a book, which casts a shadow.]
Two identical lightbulbs are placed behind a book. If lightbulb A is on and lightbulb B is off, the book casts a shadow as shown to the right. If both lightbulbs are now turned on, which diagram best represents the shape of the shadow that will be cast by the book?
A. same shadow
B. no shadow
C. longer shadow
D. double shadow
E. shadow pointing directly toward you

Teachers have misconceptions too. Elementary school teachers have been found to think that shadows are concrete entities (Apelman 1984). As part of a large study of the process of conceptual change among elementary school teachers, ten teachers were studied to determine how their ideas about light and shadows changed as a result of a four-week intensive physics workshop (Smith 1987). Nearly all could state that a shadow is produced when light is blocked in some way, but they failed when predicting the outcomes of experiments with shadows. This lack of an accurate conceptual model led many to predict that objects placed within a shadow would themselves have shadows. Teachers took little note of the role of light sources when discussing shadows.

Item 47       P-value   D-value
A               .08      -.16
B               .14      -.14
C               .11      -.20
D (correct)     .48       .37
E               .17      -.04

This item is of average difficulty and average discrimination. There are no obvious misconceptions.

[Item response curves for Item 47: P-value of each answer choice (A–E) by student quintile.]

This question reveals no strongly held misconceptions in any subgroup based on whole-test performance. All the distractors fall below the 20 percent level that would be expected if students who did not know the correct answer were guessing randomly.


Item 32, Relative Brightness
Stars A and B appear equally bright in the night sky. However, star A actually gives off more light than star B. Which of the following is true about star A?
A. It is the same distance from us as is star B.
B. It is farther away from us than is star B.
C. It is closer to us than is star B.
D. It is the same temperature as is star B.
E. It is the same diameter as star B.

Stars vary greatly in their intrinsic brightness, or luminosity, and in their distance from Earth. It is important for students to be able to puzzle out the effect of distance on the apparent brightness of stars. This question presents a situation of two stars that appear equally bright; since star A actually gives off more light, it must be farther away than star B, answer “B.”

Item 32       P-value   D-value
A               .07      -.16
B (correct)     .44       .43
C               .37      -.19
D               .07      -.19
E               .04      -.11

This item is of moderate difficulty and discrimination. Answer “C” appears to be very attractive to many students; this distractor, along with the correct answer, makes up 81 percent of the student choices. Since the two answers represent opposite responses, it may be that students are confused by the complexity of the question.
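To make the two-variable reasoning concrete, here is a minimal Python sketch of the relationship involved; it is my own illustration (the function name and the numbers are hypothetical), not anything from the test materials.

```python
# A small sketch of the reasoning item 32 requires: apparent brightness
# depends on both luminosity and distance, b = L / (4 * pi * d**2).
import math

def apparent_brightness(luminosity: float, distance: float) -> float:
    return luminosity / (4 * math.pi * distance ** 2)

# Suppose star A gives off 4 times more light than star B but sits twice
# as far away; the two then appear equally bright, so A must be farther ("B").
print(apparent_brightness(4.0, 2.0) == apparent_brightness(1.0, 1.0))   # True
```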

[Item response curves for Item 32: P-value of each answer choice (A–E) by student quintile.]

Answer “C” appears to be preferred by lower- and average-performing students in the survey. Only the highest-performing students appear not to be particularly attracted to this response.


Item 5, Inverse Square Law on Earth
A man is reading a newspaper by the light of a single candle 5 feet away. How many candles would be needed to light up the paper to the same brightness if the candle holder were moved 10 feet from the paper?

A. 1 candle

B. 2 candles

C. 3 candles

D. 4 candles

E. More than 4 candles

This item seeks to quantify students’ view of the propagation of light. A prevalent view is that light is matter, emitted from the source, which slowly loses mass; in this view, the intensity of light slowly falls until light is no longer present (Reiner and Finegold 1987). In fact, light spreads over an area proportional to the square of the distance from its source, so doubling the distance requires four candles (“D”) to restore the same brightness. Teachers’ guides also perpetuate misconceptions. For example, one guide for eighth-grade science explains that light’s “brightness diminishes the further it gets from its source (except in the case of laser beams).” Light spreads out the farther it travels from its source, but it never “tires,” and laser beams behave exactly the same way (Marshall and Lancaster 1983).

Item 5        P-value   D-value
A               .05       .00
B               .72      -.09
C               .07      -.09
D (correct)     .10       .22
E               .07      -.01

This is a very difficult question for students, with very little discriminating power. The misconception represented by answer “B,” that light falls off as the inverse first power of distance, is the most popular answer chosen by students.

[Item response curves for Item 5: P-value of each answer choice (A–E) by student quintile.]

The popularity of answer “B” is uniform across all performance groups. Only for the highest-performing students does the choice of the correct answer even climb above the .20 line.


Item 6, Inverse Square Law in Space
Saturn is 10 times farther from the Sun than is the Earth. From Saturn the Sun would appear:
A. 100 times brighter than from the Earth.
B. 10 times brighter than from the Earth.
C. the same brightness as from the Earth.
D. 10 times dimmer than from the Earth.
E. 100 times dimmer than from the Earth.

Item 6        P-value   D-value
A               .04      -.12
B               .05      -.14
C               .08      -.09
D               .71       .13
E (correct)     .13       .07

This is a very difficult question with little discriminating power. The most popular response, “D,” represents the 1/r misconception. This item is very similar to the one above, but it asks students to apply the concept of light propagation in an astronomical context.

[Item response curves for Item 6: P-value of each answer choice (A–E) by student quintile.]

There is very little difference between performance groups for this item. The selection of misconception “D” actually increases with student performance on the whole test. Higher-performing students have a higher probability of selecting this misconception than lower-performing students.
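As a check on the physics behind items 5 and 6, the short Python sketch below applies the inverse-square law to both situations. It is my own illustration, with hypothetical names, and is not part of the original analysis.

```python
# The inverse-square reasoning behind items 5 and 6: apparent brightness
# falls as 1/r^2, not 1/r.
def relative_brightness(distance_ratio: float) -> float:
    """Brightness relative to the original distance, by the inverse-square law."""
    return 1.0 / distance_ratio ** 2

# Item 5: moving the candle from 5 ft to 10 ft doubles the distance, so each
# candle delivers 1/4 the light; four candles restore the brightness.
print(1.0 / relative_brightness(10 / 5))   # 4.0   -> answer "D"

# Item 6: Saturn is 10 times farther from the Sun than the Earth, so the Sun
# appears 1/100 as bright, i.e., 100 times dimmer.
print(1.0 / relative_brightness(10))       # 100.0 -> answer "E"

# The popular 1/r misconception would instead predict 2 candles ("B")
# and a Sun only 10 times dimmer ("D").
```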


Item 42, Filtering of Light
When green glass is placed between the flashlight and the white movie screen, a green spot appears on the screen.
[Diagram: flashlight shining through green glass onto a white movie screen, producing a green spot.]
If green glass and red glass are placed between the flashlight and the movie screen (as shown on the right), what will happen to the spot?
[Diagram: flashlight shining through green glass and then red glass onto the white movie screen.]
A. It will be green.
B. It will be yellow.
C. It will be brown.
D. It will be red.
E. It will disappear.

Objects that appear to change the color of light, such as theatrical gels and stained glass, actually selectively absorb different colors from the beam and allow some colors to pass. In the top diagram, all but green light would be removed from the beam. The addition of a second filter that absorbs all colors but red would absorb the green light and let nothing through. The correct answer is “E.” This question was extended into its present form so that students’ concepts would be discernible from their selection of multiple-choice answers. A simpler version of this question was first developed by Anderson and Karrqvist as an “interview about instances.” They asked students how a piece of colored glass could change the color of a white flashlight beam, showing them a diagram much like the first illustration in item 42. Many said that the colored glass caused the light to change color, but when questioned about the mechanism, revealed that they thought the red glass added color to the light, bent the light (perhaps like a prism), or lit up the glass so that it produced red light (Anderson and Karrqvist 1983). Only 30 out of 558 students could explain that the glass allowed selective transmission of light, that is, “It’s only the red rays that penetrate the sheet of glass” (p. 59). A study of 227 fifth-grade students showed that 72 percent thought that white light was not made up of a mixture of colors, but that white light was “clear” or “colorless” (Anderson and Smith 1986). So what does colored glass do to a beam of light? Many people believe that colored objects transform the light beam into another color, so that the green light would emerge from the red filter changed into red, “D” (Watts 1985). Other students believe that the action of filters is the same as when paint mixes; they believe that color is a property of objects, not of light (Eaton et al. 1983). Combining green and red paint would produce the color brown, so these students would answer the question as “C.” Students who think that the first filter determines the color of the light would answer “A.” Those who have mixed lights might know that two separate beams of light, a green one and a red one, when combined would appear yellow, and answer “B.”

Item 42       P-value   D-value
A               .05      -.14
B               .13      -.12
C               .52       .11
D               .13      -.04
E (correct)     .15       .15

Misconception “C” is the most frequently chosen response by students, and the choice of the correct answer is low. This is a very difficult question with little discriminating power.

[Item response curves for Item 42: P-value of each answer choice (A–E) by student quintile.]

With a D-value of 0.11, the choice of answer “C” actually increases with student performance.
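The physics of the correct answer can be made concrete with a short sketch: idealized filters multiply, band by band, so a perfect green filter followed by a perfect red filter passes nothing. The three-band model and all the names below are my own simplification, not part of the test materials.

```python
# Two idealized filters in series pass no light: transmissions multiply
# band by band.
bands = ("red", "green", "blue")
white_light  = {"red": 1.0, "green": 1.0, "blue": 1.0}   # flashlight beam
green_filter = {"red": 0.0, "green": 1.0, "blue": 0.0}   # passes only green
red_filter   = {"red": 1.0, "green": 0.0, "blue": 0.0}   # passes only red

def through(light, filt):
    """Light emerging from a filter: each band is attenuated independently."""
    return {band: light[band] * filt[band] for band in bands}

after_green = through(white_light, green_filter)   # a pure green spot
after_both  = through(after_green, red_filter)     # nothing gets through
print(after_green)   # {'red': 0.0, 'green': 1.0, 'blue': 0.0}
print(after_both)    # all zeros -> the spot disappears ("E");
                     # mixing like paint would wrongly predict brown ("C")
```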


Item 41, Mixing of Light
[Diagram: a blue car illuminated by yellow parking-lot lights.]
A driver came out of a shopping mall one night and looked at his car. His car is painted blue and the lights illuminating the parking lot are yellow. What color did his car appear to be?
A. white   B. green   C. yellow   D. blue   E. black

Color is commonly described as an innate property of an object, e.g., “the book is red” (Anderson and Smith 1986). Scientists instead view color as the selective reflectivity of objects to light at different wavelengths; the source of illumination plays a large role in determining the colored appearance of objects. A blue car reflects chiefly blue light, and since the yellow lights supply virtually no blue light, the car should appear black (“E”). In a study of fifth-grade students, Charles Anderson found that few students share this view. Only 2 students of 125 described a green book as reflecting green light, and not a single student could successfully determine the appearance of an object when viewed in colored light. Light of a given color is often thought of as a material of that color, so that combining colors, whatever their origin, is just a mixture of objects (Reiner and Finegold 1987). In this way, students can predict that a mixture of yellow light and a blue car would be green (“B”). No distinction is made between the color of the source and the reflective properties of the car.

Item 41       P-value   D-value
A               .04      -.12
B               .73       .20
C               .06      -.16
D               .08      -.07
E (correct)     .07       .05

This is a very difficult problem for students. At 0.07, it has the lowest P-value of any correct answer on the entire test. The misconception represented by answer “B,” that a blue car would look green under a yellow light, is chosen far more often than any other distractor on the test.

[Item response curves for Item 41: P-value of each answer choice (A–E) by student quintile.]

That students choose distractor “B” with increasing frequency as their overall test performance increases shows that this is a very powerful misconception.


VI. Whole Test Results

The preceding section discussed how subjects answered individual test items. This section deals with the test items in a more general fashion. Statistics are calculated that characterize the test as a whole, and comparisons are made between test items and their average characteristics, beginning with an analysis of differences in the P-values and the D-values of test items. This is followed by a discussion of mean item responses. P-values and D-values are then graphed against one another to characterize the different types of answers to questions. The frequency of the answer codes (A, B, C, D, or E) chosen by subjects is discussed in relation to guessing. Suggestions are made as to which questions should be included on a shortened test, based on a stepwise regression model. This is followed by an analysis of item characteristics and a discussion of the degree to which these help to predict item discrimination and difficulty.

A. Ranking of Test Items by P-value
P-value denotes the probability of choosing an answer, as determined a posteriori from a large sample of subjects. It most often refers to the probability of choosing the correct answer, although I have used it to characterize each answer, whether correct or not. The P-values of correct test items range from a low of 0.07 to a high of 0.68, with a mean of 0.34 and a standard deviation of 0.15. These values are quite different from the P-values of the problems that most teachers generate: in constructing a test for which the average student grade is 75/100, the average P-value of the items must be 0.75. P-values of teacher-constructed problems are higher than those of most standardized tests because the problems are easier. The common procedure of “grading on a curve” does not change the fact that these problems are relatively easy; this technique simply employs artificial means to change the inherent discriminating power of the test. A histogram of the forty-seven P-values of correct answers is plotted in Figure 9. There is a wide range of difficulties represented by the items on this test.

[Figure 9: Histogram of item P-values for the forty-seven correct answers, running from difficult (P-value near 0.00) to easy (P-value near 0.70).]
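To make these two statistics concrete, the following Python sketch shows one way they could be computed from raw answer sheets. The discrimination index used here, the proportion correct in the top-scoring group minus the proportion correct in the bottom group, split at quintiles to match the item response curves of Section V, is an assumption on my part, as are all the names; the original analysis was not performed with this code.

```python
# A sketch, under the assumptions stated above, of computing P-values and
# D-values from a matrix of letter responses.
import numpy as np

def item_statistics(responses: np.ndarray, key: np.ndarray):
    """responses: (n_students, n_items) letter codes; key: (n_items,) correct letters."""
    correct = (responses == key)        # True where a student answered correctly
    totals = correct.sum(axis=1)        # each student's total test score
    p_values = correct.mean(axis=0)     # fraction choosing the correct answer

    # Compare the bottom and top quintiles of students by total score.
    lo_cut, hi_cut = np.quantile(totals, [0.2, 0.8])
    bottom, top = totals <= lo_cut, totals >= hi_cut
    d_values = correct[top].mean(axis=0) - correct[bottom].mean(axis=0)
    return p_values, d_values
```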

In many standardized tests of academic achievement, items are restricted to only those with P-values between 0.40 and 0.80 (Osterlind 1989). Using this convention, Figure 9 shows that only fifteen of the forty-seven items on the Project STAR pre-test are acceptable. One can conclude that items that deal with misconceptions have much lower P-values and may be excluded from many tests built on such “rules of thumb.” Distributing test items by difficulty reveals that four items have P-values greater than 0.60 and comprise a grouping distinct from the other questions (see Figure 9 and Table XIV). These questions can be deemed anchors, in that the majority of students have answered them correctly.

Table XIV. Anchors Revealed by Misconception Test
22   Earth’s Orbital Period       0.69
19   Probability                  0.67
1    Reason for Day and Night     0.66
21   Earth’s Rotational Period    0.63

Clement (1986) suggests that anchors can be used to advantage in instruction as starting points to help overcome misconceptions. To do this, teachers should try to build upon these known concepts. For the four problems above, it is relatively easy to imagine ways in which the preexisting knowledge of the students can be used in teaching new concepts. Four examples follow:
• Reference should be made to the Earth’s orbital period when discussing the periods of other bodies such as planets.
• The reason for day and night should be revisited when discussing the phases of the Moon; after all, the “dark side of the Moon” is simply nighttime on the Moon.
• The 24-hour rotational period of the Earth can be brought up to explain the periodicity in the positions of the Sun and stars.
• The randomness of astrology or meteor impact can be compared to the flipping of coins.
The forty-seven-item pre-test was designed to uncover misconceptions. To that end, I have calculated the P-values of the most popular distractors in Table XV. Those that were chosen with a frequency greater than the correct answer for each problem are marked with an “X.”


Table XV. Ranking of Misconceptions by P-Value

Answer #   P-value   > Correct answer   Misconception exhibited
41B        .75       X                  Colors of light mix like paint.
5B         .72       X                  Light intensity drops as 1/r.
6D         .71       X                  Light intensity drops as 1/r.
42C        .53       X                  Colored filters mix like paints.
14E        .46       X                  The Sun is 10x larger than it really is.
17A        .46       X                  Changing distance is responsible for seasons.
38B        .46       X                  Galaxies can be seen near star clusters.
34E        .42       X                  Constellations look the same from any star.
2B         .41       X                  The Earth’s shadow makes Moon phases.
12A        .41       X                  The Sun is overhead every day.
35A        .38       X                  Astrology is a science.
20B        .37                          The Moon orbits the Earth in a day.
32C        .37                          Inability to reason with two variables.
7C         .37                          Objects look the same from the back.
17C        .37       X                  Hemispheres are at different distances from the Sun.
40B        .35       X                  Daylight lengthens in the summer.
3A         .33       X                  Inability to use one axis.
33A        .33       X                  Number of zeros is the power of ten.
37A        .33                          The Sun moves rapidly against the celestial sphere.
28D        .32       X                  Saturn is closer than the Sun.
13C        .32                          The Earth is 10x larger than it is.
36C        .32                          The universe is constant in size.
3B         .31       X                  Inability to use one axis.
11B        .30       X                  The Earth and Moon are a few diameters from each other.
44C        .30                          Light exists only where it can be seen.
37B        .29       X                  The Sun moves rapidly against the celestial sphere.
4C         .28                          The Earth’s orbit is highly elliptical.
45A        .27                          Light does not leave sources in daylight.
2C         .27                          Moon moves through the Sun’s shadow.

It is not unusual for students to prefer a misconception to the correct answer on this test. The majority of the misconceptions listed in Table XV have P-values greater than the P-values of the correct answer. Moreover, an attractive misconception has a powerful effect on the P-value of an item’s correct answer. The more attractive the misconception is, the lower the P-value of the correct answer. The P-value of the maximum distractors is plotted against the P-value of the correct answer in the graph below (Figure 10). Note the overall trend that the items with the most attractive misconception have the lowest P-values for the correct answer. All datapoints must be beneath the dotted line, which represents the limiting case of no other distractors being chosen by subjects.

[Figure 10. Effect of Distractor Popularity on Item Difficulty: the P-value of the correct answer plotted against the P-value of the maximum distractor, with the regression fit and the limiting dotted line shown.]

I have performed a simple linear regression on these data, shown in Figure 10 as a solid diagonal line. Knowing the P-value of the maximum distractor accounts for 48 percent of the variance in the P-value of the correct answer. This relationship is significant at the p = 0.05 level, since its t-ratio (6.46 in magnitude) far exceeds the critical value of 2.01 (Tuckman 1988).


Regression Analysis of Correct Answer P-value by Distractor P-value
Dependent variable is: Correct Answer
R2 = 47.6%   R2(adjusted) = 46.4%
s = 0.1056 with 48 - 2 = 46 degrees of freedom

Source       Sum of Squares   df   Mean Square   F-ratio
Regression   0.465495          1   0.465495      41.8
Residual     0.512753         46   0.011147

Variable          Coefficient   s.e. of Coeff   t-ratio
Constant           0.571137     0.0387          14.7
Max. Distractor   -0.726616     0.1124          -6.46

This regression analysis produces a model for predicting the fraction of students who choose the correct answer to any item from the P-value of its most popular distractor. The equation is:

P-value(correct answer) = 0.57 - 0.73 * P-value(maximum distractor)

For example, an item whose most popular distractor has a P-value of 0.50 is predicted to have a correct-answer P-value of 0.57 - 0.73 * 0.50 ≈ 0.21. It may appear obvious that the larger the P-value of the maximum distractor, the smaller the P-value of the correct answer.

B. Ranking of Test Items by D-value
The discriminating power of each test item has been calculated to characterize the items on the basis of how well they discriminate between students who do well or poorly on the test as a whole (see Figure 11). These D-values range from -0.01 to 0.57, with a mean of 0.29 and a standard deviation of 0.16. Items with low D-values are of little help in discriminating between students based upon their overall performance. Researchers suggest that items with D-values less than 0.40 are subject to improvement and that items with D-values less than 0.20 are unacceptable (Hopkins 1981, Ebel 1991). The D-values of test items for the Project STAR pre-test are graphed below. According to these standards, only fourteen out of forty-seven items are “very good” items and should remain unchanged.


[Figure 11: Histogram of item D-values, spanning roughly -0.10 to 0.57.]

To what degree does the inclusion of an attractive misconception distractor affect the D-value of an item? In Figure 12, I have plotted the D-value of the correct answer against the P-value of the maximum distractor.

[Figure 12. Effectiveness of Distractor Popularity on Item Discrimination: the D-value of the correct answer plotted against the P-value of the maximum distractor.]

In this case, one can see a trend: items with the highest distractor P-values have low D-values. Only those items with low distractor P-values (less than about 0.40) have D-values greater than the desired 0.40. How can the effects of the inclusion of very attractive distractors on the difficulty and discriminating power of items be characterized? The inclusion of misconception distractors significantly lowers the P-values and D-values of test items. Test-makers would probably exclude such items based on these standard measures. Classroom teachers, who have a preference for test items with relatively high P-values, have no choice but to discard these distractors. Inclusion of such items could easily lower average test grades to 50 percent or below; in order to use such items, teachers and their students must feel comfortable with these lower average test scores.

C. Mean Item Characteristic Curve
In addition to looking at item response curves (graphs of P-value versus student quintile for each of the five answers), as we have done for each question, one can combine them all to generate an average curve. This was done by first averaging the P-values of the correct answers across all items. Next, the distractors for each item were arranged separately by popularity, from the distractor with the maximum P-value to the distractor with the minimum P-value; each rank position was then averaged across all the items, as sketched in the code below. A pie graph of these data is shown in Figure 13. The correct answer was chosen 34 percent of the time. The most popular distractor was chosen almost as frequently, 32 percent of the time; these differences are not significant at the p = 0.05 level. A further analysis is carried out below. The remaining distractors were chosen much less frequently.

[Figure 13: Average P-value of correct answers and distractors. Correct answer 34%, maximum distractor 32%, 2nd distractor 16%, 3rd distractor 10%, minimum distractor 6%.]
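A short sketch of this rank-and-average procedure, as I read it, follows; the array layout and all names are assumptions of mine, not the original computation.

```python
# For each item, rank the four distractor P-values from most to least popular,
# then average the correct answer and each rank position across all items.
import numpy as np

def composite_pvalues(p_table: np.ndarray, is_correct: np.ndarray):
    """p_table: (n_items, 5) P-values; is_correct: (n_items, 5) booleans,
    exactly one True per row marking the correct answer."""
    correct_p = p_table[is_correct]                           # one value per item
    distractors = p_table[~is_correct].reshape(len(p_table), 4)
    ranked = -np.sort(-distractors, axis=1)                   # descending per item
    return correct_p.mean(), ranked.mean(axis=0)              # correct avg, 4 rank avgs
```

Averaging the same quantities within each student quintile, rather than over all students, yields the composite curves of Figure 14.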

Figure 14 shows the averages for all five answers, with P-values broken down by quintile to produce composite item response curves for the entire test. This graph shows that the correct answers, as a whole, display monotonically increasing behavior with respect to the overall performance of students; this result is essentially a consequence of overall student performance on the test. The maximum distractor average has a P-value almost independent of students’ overall performance, and the flatness of this curve is surprising. Successively less popular distractors have item response curves that are similar to each other in shape. The lowest three distractors each diminish in P-value by roughly a factor of 2 from the lowest-performing quintile to the highest.


[Figure 14. Average P-value of Correct Answers and Distractors by Quintile: composite item response curves for the correct answer and the four rank-ordered distractors.]

D. Discrimination/Difficulty Graph
P-values and D-values can be examined simultaneously by plotting them against each other, as suggested by Hopkins and Stanley. Items that are very difficult have little potential to discriminate between individuals; the same can be said of items with high P-values. Items with P-values between 0.25 and 0.75 have the potential to be highly discriminatory. In Figure 15, I have plotted three sets of data on one graph: correct answers as circles, the most chosen distractor as squares, and the least chosen distractor as diamonds. The mean value in both P-value and D-value for each of these groups is plotted as the same shape, but filled in. Almost all test items fall very near or within the diamond shape. Only two items fall into the region of “ideal” items, where their potential discriminating power is greater than 0.50. Almost all are scattered throughout the top half of the graph, where the D-value is greater than zero; few correct answers are in the bottom half. A D-value less than zero corresponds to an answer that is chosen more frequently by students performing poorly on the entire test and less frequently by students performing well. Minimum distractors, those chosen least frequently by subjects, have characteristics similar to each other; they appear constrained to one small section of the graph in Figure 15, and their average P-values and D-values are tightly grouped. The average P-value of this class of answers is 0.06, with a standard deviation of 0.03; the average D-value is -0.11, with a standard deviation of 0.06. Maximum distractors, on the other hand, are much more varied in their P-values and D-values and are much more similar to the correct answers in these characteristics. Their average P-value is 0.32, with a standard deviation of 0.14, very similar to the average P-value of correct answers, 0.34. The average D-value of maximum distractors is -0.06, with a standard deviation of 0.13; this is quite different from the average D-value of the correct answers, 0.29. The D-values of the maximum distractors are much smaller than those of the correct answers and very close to zero. This characteristic is reflected in the flatness of the corresponding curve in Figure 14.

[Figure 15. P-value (difficulty coefficient) versus D-value (discrimination coefficient) for item responses: correct answers, maximum distractors, and minimum distractors, with group means filled in and the region of “ideal items” marked.]

From the detailed graph in Figure 16 of the correct answers only, it is possible to identify those items with the highest D-values as numbers 23, 18, and 26.


[Figure 16. P-value versus D-value for correct responses only, with each point labeled by item number.]

E. Distribution of Correct Answers
In any multiple-choice test, the ideal distribution of correct answers draws equally from each of the possible choices. In this test, the distribution of correct answers is heavily biased toward category “E” (see Figure 17 and Table XVI).

Table XVI. Answer Response Frequency
         A      B      C      D      E      χ2
Count    6      9      7      7      18     10.34
%        13%    19%    15%    15%    38%

This was certainly not my intention in creating this test. I had thought that I had randomly assigned the answers. We can apply a test to find out, in much the same way as in other statistical analyses in this paper, whether this distribution could be considered random.


[Figure 17. Histogram of answer response frequency by the letter alternative (A–E) of the correct item answer.]

Chi-square, χ2, is a measure of the departure of P-values from those expected by chance. A χ2 test was performed on each category to determine if the distribution of answers could be explained by random selection at the p = 0.05 level. I used the following formula to calculate χ2 for the test answers:

χ2 = Σ (x_observed - x_expected)² / x_expected, summed over a = 1 to 5

   = [(6 - 9.4)² + (9 - 9.4)² + (7 - 9.4)² + (7 - 9.4)² + (18 - 9.4)²] / 9.4 = 97.2 / 9.4 = 10.34

In this equation, “a” ranges through each of the five answers: A, B, C, D, and E. The observed frequency count (x_observed) of each correct letter answer varies according to Table XVI. The expected frequency (x_expected) is the same for each of the five answers: x_expected = 0.20 * 47 = 9.4, so one would expect each letter to be the correct answer 9.4 times out of a total of 47. For four degrees of freedom, the critical value of χ2 is 9.49 at p = 0.05 (Tuckman 1988, p. 484). I have calculated χ2 = 10.34, so this distribution could not have occurred by chance at the p = 0.05 level. It appears that the test has too many answers that are “E,” the last choice in each question. In revisions of this test, answers should be redistributed so that the correct answers fall more equally into the five possible choices. A chi-square test was also performed on individual test items to determine if the distribution of responses to any particular test item could be explained by random guessing; for four degrees of freedom, the critical value of χ2 is again 9.49.
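The same computation can be verified in a few lines of Python; this sketch simply re-derives the χ2 value from the counts in Table XVI.

```python
# Chi-square test of the correct-answer letter distribution (Table XVI).
observed = {"A": 6, "B": 9, "C": 7, "D": 7, "E": 18}
expected = 0.20 * 47                      # 9.4 items per letter by chance

chi_square = sum((obs - expected) ** 2 / expected for obs in observed.values())
print(round(chi_square, 2))   # 10.34, exceeding the critical value of 9.49
                              # for four degrees of freedom at p = 0.05
```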

I have calculated χ2 for each item; all are greater than 9.49 (see Appendix 3). This means that the selection of answers by students cannot be accounted for by chance at the p = 0.05 level. Although some subjects were undoubtedly answering some questions in a random fashion, too few were answering this way to characterize any item as being answered randomly. The distribution of answers one would expect for each item if students answered randomly (roughly 20 percent for each answer) does not occur in this dataset. I have also calculated χ2 for each item’s four distractors to determine if the choice of distractor could be explained by chance selection. For three degrees of freedom, the critical value of χ2 is 7.82; all groups of four item distractors have χ2 > 7.82. Distractors, as a whole, were not chosen randomly at the p = 0.05 level.

F. Which Questions Should Be Included on a Shortened Test?
Many items on this test have little or no discriminating power; a few items have a great deal. It is possible to build a model from the items that account for the most variance in total test scores. I have built such a model using the technique of stepwise regression: starting with the item with the highest D-value, one tests each remaining item to find the one that accounts for the most incremental variance. This process is iterated until all items are used (a code sketch of this procedure follows Figure 18). Figure 18 below shows the amount of variance accounted for by models with increasing numbers of items.

[Figure 18. Stepwise Regression of Test Items by Variance in Total Test Score: R², the percentage of variance in total test score accounted for, plotted against the number of items in the regression equation (0 to 45).]
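The sketch below is my own simplified rendering of this forward selection procedure; the helper names are hypothetical, and for simplicity the first item is chosen by the same incremental-R² rule rather than by its D-value.

```python
# Forward stepwise selection: repeatedly add the item whose inclusion most
# increases the variance in total score accounted for (R^2).
import numpy as np

def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    X1 = np.column_stack([np.ones(len(y)), X])       # add an intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - resid.var() / y.var()

def forward_stepwise(correct: np.ndarray, n_steps: int):
    """correct: (n_students, n_items) 0/1 matrix; returns chosen items, R^2 path."""
    y = correct.sum(axis=1)                          # total test score
    chosen, path = [], []
    for _ in range(n_steps):
        candidates = [j for j in range(correct.shape[1]) if j not in chosen]
        best = max(candidates, key=lambda j: r_squared(correct[:, chosen + [j]], y))
        chosen.append(best)
        path.append(r_squared(correct[:, chosen], y))
    return chosen, path
```

Applied to the forty-seven-item response matrix, such a procedure should roughly reproduce the ordering shown in Table XVII.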


Any number of test items can be used to build a shortened test; the number of items to include is purely subjective. The more items, the more variance that can be accounted for (see Table XVII). Since the entire test has a KR-21 reliability coefficient of 0.76, one could argue that a shortened test need not account for more of the variance in the total test score than the entire test does of itself; the KR-21 can be thought of as the correlation coefficient that relates all the various ways of splitting the test in half with each other. One item alone accounts for almost one-third of the variance (how long it takes the Moon to go around the Sun). Adding two additional items accounts for more than half of the variance (graphical extrapolation, and knowing how to order the space shuttle, Pluto, and stars in distance from the Earth). Eight items account for 75 percent of the variance in the total score, so roughly one-sixth of the test questions suffice to build a good shortened test. By including twenty-one items, we can account for 90 percent of the variance. Teachers may wish to use these shortened tests so that less class time is spent in testing.

Table XVII. Stepwise Regression Results for Shortened Test
Step   Item   R^2
1      23     32%
2      26     45%
3      18     54%
4      45     61%
5      35     66%
6      25     69%
7      20     72%
8      32     75%
9      31     77%
10     8      78%
11     38     80%
12     33     81%
13     7      82%
14     28     83%
15     43     85%
16     27     86%
17     21     87%
18     13     88%
19     36     88%
20     10     89%
21     9      90%


G. Predictors of Difficulty and Discrimination
Writing questions to test misconceptions is difficult. One would like to identify factors that mark questions with high D-values; conversely, finding markers for low D-values, or for very high or very low P-values, would help to weed out items with low utility. To this end, I have sought to identify factors that account for the P-values or D-values of items. I have hypothesized that certain attributes could make questions very difficult for students. These include questions that are too ambiguous to answer, questions at the end of the test, those that present a problem in diagrammatic or graphical form, those that can be characterized as conceptual, factual, or mathematical, those that require graph reading or calculation, and those that have high reading difficulty. Questions at the end of the test would be difficult to eliminate, but if item number is a factor, different forms of the test with items in different orders would be called for. By carrying out a multiple regression analysis to account for the variance in either P-value or D-value, I have investigated the possibility that these factors could improve or degrade misconception questions. In this analysis, I have used only the Gunning Fog Index as a measure of readability; the Flesch-Kincaid and Flesch Grade Level could not be calculated reliably for some of the questions with few words. For statistical significance at the p = 0.05 level, the t-ratio for 39 degrees of freedom (47 test questions minus 7 factors and a constant) should exceed 2.03. A regression model built with all of these factors accounts for 17.1 percent of the variance in P-value; however, no single factor is significant at the p ≤ 0.05 level. None of these factors appears to be useful in determining the difficulty of test questions. A regression model built with all of the above factors accounts for only 15.6 percent of the variance in D-value; again, none of these factors is significant at the p ≤ 0.05 level. The various item characteristics cannot be used to significantly predict the discriminating power or the difficulty of these test items. One could argue that this analysis shows that many factors play no role in how students answer these questions. The position of a particular item on the test (its item number) affects the number of students who choose not to answer the item, but not its D-value or P-value. There is no significant difference in the discriminating power or difficulty of questions whether or not they contain a picture, or whether they deal with concepts, facts, or math. The readability of a question also appears to play no role. This lends support to the view that the reading level of the test items is sufficiently low that it does not affect the answers students choose.
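For reference, the Gunning Fog Index is 0.4 × (average words per sentence + 100 × the fraction of words with three or more syllables). The sketch below is my own rough rendering; in particular, the vowel-group syllable counter is a crude stand-in for whatever procedure the study used.

```python
# A rough Gunning Fog Index calculation; the syllable count is a heuristic.
import re

def syllables(word: str) -> int:
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def gunning_fog(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    complex_words = [w for w in words if syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))

print(round(gunning_fog("The universe is expanding. The galaxies are "
                        "rushing away from each other at enormous speeds."), 1))
```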


VII. Demographic and Schooling Factors Results

The purpose of this section is to investigate the connection between a variety of factors and the number of misconceptions held by students. Do gender, age, parents’ education, school background, and attitude each affect misconceptions in astronomy? The thirteen items at the end of the test ask about students’ backgrounds and attitudes, and I am interested in finding out how, if at all, these factors relate to how well or poorly students do on this test of misconceptions. I used several procedures to help determine these relationships. First, I graphed the response to each item from a table of the raw numbers. Next, I calculated the mean total score and standard deviation for each subgroup of students, based on their answers to each of the demographic and schooling questions. I also created a “boxplot” graph that shows the median and several other statistics in graphical form, as described in Figure 19.

[Figure 19. Key to graphical icons: extreme outliers (starbursts), outliers (circles), whiskers to the highest and lowest connected data values, high hinge (≈ 75%), low hinge (≈ 25%), and the median surrounded by its 95 percent confidence interval.]

The purpose of these boxplots is to present data visually, showing the differences in students’ total test scores by demographic and schooling subgroups. They allow easy comparison of key statistical features of the data and present two ways of comparing the “central tendency” of the data. The “box” in each boxplot, a small rectangle, encloses the middle 50 percent of the students who chose that particular answer, so differences in the data show up as the relative heights of the boxes. The boxplots also show the median value of the data. While medians are not means, they still make it easy to see and compare subgroups (Velleman 1988), and displaying central tendencies as medians is much less sensitive to outliers than using means. The 75 percent cutoff is called the high hinge; the 25 percent cutoff is called the low hinge. The median value of the total test score is shown as the horizontal line inside the rectangle, between the high hinge and the low hinge. A 95 percent confidence interval is superimposed as a gray area around the median. The “whiskers” extend from the box to the highest data value not above the high hinge + 1.5 × (high hinge − low hinge) and to the lowest data value not below the low hinge − 1.5 × (high hinge − low hinge). Beyond this limit, datapoints are plotted with a circle; extreme outliers, datapoints beyond 3.0 × (high hinge − low hinge), are plotted as starbursts (Velleman 1988). When two medians appear widely separated, one may think that the difference is statistically significant; medians alone, however, do not take sample size into account. These plots are therefore augmented so that one can tell whether a difference in medians is statistically significant (Velleman 1988): the shaded areas about the median line are 95 percent confidence intervals. Since the sample of this survey is only 1,414 subjects out of a theoretically infinite population, the computed median is only an approximation of the population median. Since one cannot predict this unknown value with certainty, a range for the population median can be generated with some degree of probability: with a confidence level of 0.95, it is the sample median ± 1.58 × (high hinge − low hinge)/√n. Velleman and Hoaglin (1981) discuss the derivation of this interval and of boxplots. One can see from inspection whether the medians of these subgroups differ or whether there appears to be a trend in the data. Following this, a simple linear regression model is fit for each factor against the total scores, to calculate whether the difference in means is significant and to determine the amount of variance accounted for by that single factor alone. The significance of these models is determined by the size of the t-ratio. For p = 0.05 with greater than 120 degrees of freedom (beyond this number, the critical t-ratio changes only by a tiny amount), t = 1.960 (Tuckman 1988); t-ratios greater than this have probabilities less than 0.05. For a level of significance of p = 0.01, the t-ratio must exceed 2.62.
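A small Python sketch of these boxplot statistics follows; it simply illustrates the definitions just given (hinges at the quartiles, whiskers at 1.5 times the hinge spread, and the 95 percent confidence interval for the median), with names of my own choosing.

```python
# Boxplot statistics, including the 95 percent confidence interval for the
# median: median +/- 1.58 * (high hinge - low hinge) / sqrt(n).
import numpy as np

def boxplot_stats(scores: np.ndarray) -> dict:
    low_hinge, median, high_hinge = np.percentile(scores, [25, 50, 75])
    spread = high_hinge - low_hinge              # the hinge spread (IQR)
    ci = 1.58 * spread / np.sqrt(len(scores))
    return {
        "median": median,
        "hinges": (low_hinge, high_hinge),
        "median 95% CI": (median - ci, median + ci),
        "whiskers": (scores[scores >= low_hinge - 1.5 * spread].min(),
                     scores[scores <= high_hinge + 1.5 * spread].max()),
    }
```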


A. Demographic Factors

Item 48, Gender
Sex:  A. Male  B. Female

[Bar chart: frequency breakdown of gender.]
Group     Count    %
female    609      43.1
male      658      46.5
missing   147      10.4
Total     1,414

About equal numbers of girls and boys specified their gender in this study. This is consistent with teachers’ descriptions of their astronomy and earth science classes as having roughly equal numbers of boys and girls; chemistry and physics classes are more heavily skewed toward boys.


[Boxplot: total score by gender.]
Group     Mean     SD
female    14.80    5.48
male      17.50    6.68
missing   14.51    7.27

There is a difference of 2.70 items answered correctly in the means of the two subgroups, with boys answering more questions correctly than girls. This difference is significant at the p = 0.05 level.


Regression Analysis of Total Score Based on Gender
Dependent variable is: Total
1,414 total cases, of which 147 are missing
R2 = 4.6%   R2(adjusted) = 4.5%
s = 6.136 with 1,267 - 2 = 1,265 degrees of freedom

Source       Sum of Squares   df     Mean Square   F-ratio
Regression   2292.63          1      2293          60.9
Residual     47630.2          1265   37.6524

Variable   Coefficient   s.e. of Coeff   t-ratio
Constant   14.8046       0.2486          59.5
Male       2.69236       0.3450          7.80

That boys score higher than girls on this test of science should come as no surprise. Results are similar to those obtained on National Assessment of Educational Progress science content questions (Schoon 1988). Boys score about 5 percent higher than girls on the NAEP. On this test they performed about 7 percent better than girls. Lightman and Miller found that males scored 13 percent higher than females on their test of cosmological beliefs. Girls and boys exhibit statistically significant differences in answering fifteen out of the forty-seven items. Girls score higher than boys on items 10, 11, 29, 32, and 35. Boys do better on items 3, 12, 13, 18, 19, 20, 21, 36, 43, and 47. I have examined these problems and have found no patterns or similarities that would help explain this result.


Item 50, Ethnic Heritage
What is your ethnic heritage? (Indicate the one that you consider the most important part of your background.)
A. Latin American/Caribbean  B. African  C. Asian  D. European  E. Other

[Bar chart: frequency breakdown of ethnic heritage.]
Group      Count    %
African    86       6.08
Asian      70       4.95
European   532      37.6
Latin      99       7.00
Missing    118      8.35
Other      509      36.0
Total      1,414

The analysis of total scores by ethnic heritage is made problematic by the way in which students answered this question. A large percentage of subjects chose “other” as their heritage, and about 8 percent of the subjects chose not to answer this question at all. Relatively small numbers of students were of African, Asian, or Latin descent.

[Boxplot: total scores by ethnic heritage.]
Group      Mean     SD
African    13.91    5.51
Asian      13.90    5.37
European   18.71    6.50
Latin      13.61    5.34
Missing    16.30    7.42
Other      14.26    5.44


Students who considered themselves of European heritage score much higher on the total test, while all other groups (except “other”) appear to score an average of about five points lower. I have built a multiple regression model by creating “dummy” variables for each of the ethnic heritage subgroups. I have excluded the “other” subgroup, which serves as the baseline: its mean is absorbed into the constant term, which applies when the other four variables are zero.

Regression Analysis of Total Score Based on Ethnicity
Dependent variable is: Total
1,414 total cases, of which 118 are missing
R2 = 13.0%   R2(adjusted) = 12.7%
s = 5.895 with 1296 - 5 = 1291 degrees of freedom

Source       Sum of Squares   df     Mean Square   F-ratio
Regression   6699.02          4      1675          48.2
Residual     44864.0          1291   34.7513

Variable   Coefficient   s.e. of Coeff   t-ratio
Constant   14.2692       0.2613          54.6
European   4.44889       0.3655          12.2
African    -0.350551     0.6873          -0.510
Latin      -0.652994     0.6475          -1.01
Asian      -0.369155     0.7515          -0.491

This analysis shows that identifying one’s heritage as European accounts for most of the 12.7 percent of the variance in total score explained by the model. None of the minority subgroups is significant at the p = 0.05 level; only the European subgroup is significant. Excluding all the subgroups but European still accounts for 10.6 percent of the variance. As far as this analysis is concerned, it does not appear to matter which particular minority group a subject belongs to, only whether students see themselves as of European heritage or not. This is an important finding.
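For concreteness, here is a minimal sketch of the kind of dummy-variable regression described above; the function and variable names are mine, and the original model was presumably fit with a statistics package rather than this code.

```python
# Dummy-variable regression: four 0/1 indicators (European, African, Latin,
# Asian) with "Other" as the omitted baseline category.
import numpy as np

def fit_ethnicity_model(heritage: list[str], total: np.ndarray) -> dict:
    groups = ["European", "African", "Latin", "Asian"]        # "Other" is baseline
    X = np.column_stack(
        [np.ones(len(total))] +                               # intercept = baseline mean
        [np.array([1.0 if h == g else 0.0 for h in heritage]) for g in groups]
    )
    coef, *_ = np.linalg.lstsq(X, total, rcond=None)
    return dict(zip(["Constant"] + groups, coef))
```

The coefficient on European then estimates the gap between students of European heritage and the baseline group, corresponding to the 4.45-item difference in the table above.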


Item 56, Mother’s Education
What was the highest level of education that your female parent or guardian completed?
A. Did not complete High School.
B. Graduated from High School only.
C. Graduated from Trade, Vocational, or Business School.
D. Some college.
E. College degree.

For this question and the following one concerning fathers’ education, answers “C” and “D” have been reassigned in coding, so that graduating from trade, vocational, or business school is rated higher than “some college.” A more extensive analysis is contained in Section III, Methodology, under the subheading 2. Variables.

[Bar chart: frequency breakdown of mother’s education.]
Group     Count    %
1-A       96       6.79
2-B       411      29.1
3-D       229      16.2
4-C       126      8.91
5-E       402      28.4
Missing   150      10.6
Total     1,414

Most of the subjects in this study reported on the education of their parents, which allows for an analysis based upon parental schooling. Looking at the graph, it appears that students with more highly educated mothers do better on this test.


[Boxplot: total score by mother’s education.]
Group     Mean     SD
1-A       14.15    5.86
2-B       15.14    5.64
3-D       16.06    6.18
4-C       16.73    6.97
5-E       17.56    6.64
Missing   14.86    6.99

Dependent variable is: Total
1,414 total cases, of which 150 are missing
R2 = 3.2%   R2(adjusted) = 3.2%
s = 6.222 with 1264 - 2 = 1262 degrees of freedom

Source       Sum of Squares   df     Mean Square   F-ratio
Regression   1639.55          1      1640          42.4
Residual     48854.9          1262   38.7123

Variable       Coefficient   s.e. of Coeff   t-ratio
Constant       13.4965       0.4459          30.3
Mother’s Ed.   0.818985      0.1258          6.51

Mothers’ education alone accounts for only 3.2 percent of the variance in total score.


Item 57, Father’s Education
What was the highest level of education that your male parent or guardian completed?
A. Did not complete High School.
B. Graduated from High School only.
C. Graduated from Trade, Vocational, or Business School.
D. Some college.
E. College degree.

[Bar chart: frequency breakdown of father’s education.]
Group     Count    %
1-A       103      7.28
2-B       361      25.5
3-D       133      9.41
4-C       162      11.5
5-E       478      33.8
Missing   177      12.5
Total     1,414

[Boxplot: total score by father’s education.]
Group     Mean     SD
1-A       14.00    5.26
2-B       15.69    6.31
3-D       16.93    6.36
4-C       15.50    5.90
5-E       16.93    6.54
Missing   15.24    6.85

Dependent variable is: Total
1,414 total cases, of which 177 are missing
R2 = 1.2%   R2(adjusted) = 1.2%
s = 6.300 with 1237 - 2 = 1235 degrees of freedom

Source       Sum of Squares   df     Mean Square   F-ratio
Regression   610.621          1      611           15.4
Residual     49013.5          1235   39.6870

Variable      Coefficient   s.e. of Coeff   t-ratio
Constant      14.4698       0.4617          31.3
Father’s Ed   0.484472      0.1235          3.92

Fathers’ education accounts for only 1.2 percent of the variance in total scores.


B. Schooling Factors

Item 49, Age*
Age:  A. 15 yrs or younger  B. 16 yrs  C. 17 yrs  D. 18 yrs  E. 19 yrs or older

It originally appeared that the greatest number of students chose answer “A,” 15 or younger. However, since there were many eighth-grade students in the sample, this called for a recoding, as the 15-year-old group is composed of both 14- and 15-year-old students. Students’ answers to this question have been recoded to a new variable, Age*, based on their answers to item 51 on their grade level; this allows an additional age level to be added. The recoding is discussed in detail in Section III, Methodology, under the subheading 2. Variables.
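The sketch below shows my reading of this recoding; the exact rule is given in the Methodology section, so the detail here (eighth graders who marked “A” being assigned age 14) should be treated as an assumption for illustration.

```python
# A hypothetical Age* recoding based on the description above.
def age_star(age_answer: str, grade: int) -> int | None:
    mapping = {"A": 15, "B": 16, "C": 17, "D": 18, "E": 19}
    if age_answer not in mapping:
        return None                    # missing response
    if age_answer == "A" and grade == 8:
        return 14                      # the added age level
    return mapping[age_answer]
```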

[Bar chart: frequency breakdown of Age*.]
Group     Count    %
14        389      27.5
15        275      19.5
16        279      19.7
17        295      20.9
18        99       7.00
19        17       1.20
Missing   82       5.80
Total     1,414

[Boxplot: total scores by Age*.]
Group     Mean     SD
14        13.63    4.92
15        16.38    6.34
16        16.90    6.04
17        17.67    6.98
18        16.64    6.82
≥19       11.64    3.67
Missing   16.72    8.26

By examining the 95 percent confidence intervals about each of the median test scores for each age, one can see significant overlap among students of ages 15, 16, 17, and 18; these medians are not different at the p = 0.05 level. There appears to be a curvilinear trend in the data, not a linear one. Students at the extremes of age in this study have lower test scores than those in the central region; students who are 14 or 19 appear to perform significantly worse than others. The oldest students, those who are 19 years of age or older, have the lowest scores of any age group, but they make up only 1 percent of the sample. They represent a group of students who are older than their classmates; most students this age have already graduated from high school, and these students have most likely repeated at least one grade. I have created a new variable, acceleration (discussed later), to account separately for students who are behind or ahead of their classmates.

Dependent variable is: Total
1,414 total cases, of which 82 are missing
R2 = 3.7%   R2(adjusted) = 3.7%
s = 6.162 with 1332 - 2 = 1330 degrees of freedom

Source       Sum of Squares   df     Mean Square   F-ratio
Regression   1968.21          1      1968          51.8
Residual     50530.7          1331   37.9644

Variable   Coefficient   s.e. of Coeff   t-ratio
Constant   1.80174       1.978           0.911
Age*       0.910080      0.1264          7.20

With a t-ratio of 7.20, the age of subjects in this study is significant at the p = 0.05 level, but the variance in total test scores accounted for is only 3.7 percent. The results on this test do not depend in any large way on the age of the students.


Item 51, Grade
Current Grade Level:  A. 9  B. 10  C. 11  D. 12  E. other

[Bar chart: frequency breakdown of grade.]
Group     Count    %
8         389      27.5
9         177      12.5
10        137      9.69
11        311      22.0
12        302      21.4
Missing   98       6.93
Total     1,414

Students who took this test were in grades eight to twelve, with a mean grade level of 10.0. The large number of eighth-grade students is a natural consequence of astronomy being taught as a part of earth science, as shown in the graph below.

[Figure: Science Enrollments in US Schools, 1981–82. Total enrollment by grade (7–12) for general science, life science, earth science, physical science, biology, chemistry, physics, and other courses. Source: “How Many are Enrolled in Science?” The Science Teacher, NSTA, December 1984.]

[Boxplot: total scores by grade.]
Group     Mean     SD
8         13.63    4.91
9         15.18    6.44
10        16.15    5.40
11        17.03    6.19
12        18.65    6.93
Missing   15.59    8.01

Students at higher grade levels appear to have fewer misconceptions: the mean total test score rises with each grade level. This gain is significant at the p = 0.05 level, and grade level accounts for 9.1 percent of the variance. Examining the 95 percent confidence intervals for the median test scores, one can see that these intervals overlap in grades 9, 10, and 11; students in these grades do not perform significantly differently from each other at the p = 0.05 level. Those in grade twelve do significantly better, and those in grade eight do significantly worse.

Total

1414 total cases of which 98 are missing R2 = 9.1% R2(adjusted) = 9.0% s = 5.984 with 1316 – 2 = 1314 degrees of freedom Source

Sum of Squares

df

Mean Square F-ratio

Regression

4720.72

1

4721

Residual

47047.5

1314

35.8048

Variable

Coefficient

s.e. of Coeff

t-ratio

Constant

4.04602

1.059

3.82

Grade

1.20506

0.1049

11.5

132

Although this result is significant, one must withhold judgment until after performing a multiple linear regression on these data. Students in higher grades have generally taken more mathematics and science courses, and these courses may contribute more to student scores than grade level itself.


Item 53, Earth Science
Have you completed a course in Earth Science?   A. Yes   B. No

One would assume that taking courses in science would help students overcome misconceptions. In particular, earth science, usually taught in grades eight or nine, typically deals with astronomical concepts for as much as one-quarter of the course. However, earth science taught at the eleventh- and twelfth-grade levels is frequently recommended to students who are not pursuing the more rigorous science sequence of biology, chemistry, and physics (Schoon 1988).

[Figure: frequency breakdown of earth science]

Earth Science   Count   %
No              709     50.1
Yes             538     38.0
Missing         167     11.8
Total           1,414

[Figure: box plot of total score by earth science (No / Yes / Missing)]

Group     Mean    SD
No        15.58   6.11
Yes       17.17   6.41
Missing   14.26   6.97

A large fraction of the students in this study have taken earth science. These students appear to do somewhat better on the test than students who have not taken the subject. I followed up on Schoon's suggestion that some students who take earth science may actually perform less well on certain items. On this test, students who had taken earth science did better on average on some problems and worse on others. One item exhibited a difference significant at the p = 0.05 level and was more often answered incorrectly by earth science students: item 4, which deals with the shape of the Earth's orbit. Students were more likely to choose a highly elliptical shape for the orbit after taking earth science. This weakness was offset by statistically significant gains on four other items, numbers 3, 7, 23, and 35, which cover the ability to read graphs, the ability to switch frames of reference, knowledge that the Moon takes one year to orbit the Sun, and knowledge that astrology is not a science or useful for predicting events.

Dependent variable is: Total
1414 total cases, of which 167 are missing
R^2 = 1.6%   R^2 (adjusted) = 1.5%
s = 6.244 with 1247 – 2 = 1245 degrees of freedom

Source       Sum of Squares   df     Mean Square   F-ratio
Regression   770.915          1      771           19.8
Residual     48545.0          1245   38.9920

Variable        Coefficient   s.e. of Coeff   t-ratio
Constant        15.5853       0.2345          66.5
Earth Science   1.58753       0.3570          4.45

Taking earth science appears to help students with a few misconceptions, but having taken the subject accounts for only a one and one-half item gain in total test score.


Item 54, Chemistry
Have you completed a course in Chemistry?   A. Yes   B. No

This question was included to account for students who had taken multiple science courses. It was thought that the total number of science courses taken might be a good predictor of the number of misconceptions in astronomy that students hold.

[Figure: frequency breakdown of chemistry]

Chemistry   Count   %
No          925     65.4
Yes         317     22.4
Missing     172     12.2
Total       1,414

Almost one-quarter of the students in this study have taken chemistry.

[Figure: box plot of total score by chemistry (No / Yes / Missing)]

Group     Mean    SD
No        15.34   5.74
Yes       19.16   6.89
Missing   13.92   6.92

Dependent variable is: Total
1414 total cases, of which 172 are missing
R^2 = 7.0%   R^2 (adjusted) = 7.0%
s = 6.058 with 1242 – 2 = 1240 degrees of freedom

Source       Sum of Squares   df     Mean Square   F-ratio
Regression   3445.59          1      3446          93.9
Residual     45514.1          1240   36.7050

Variable     Coefficient   s.e. of Coeff   t-ratio
Constant     15.3438       0.1992          77.0
Chemistry    3.82025       0.3943          9.69

Students who have taken chemistry do an average of almost four points better on total test score than students who have not. The reason for this difference, however, may have more to do with other factors than with having taken chemistry. Correlation and cause and effect are quite different. Unexamined factors, such as IQ, may be responsible for this difference, or factors examined in this study that are highly correlated with chemistry, such as mathematics level or grade level, may explain even more of the variance.


Item 55, Physics
Have you completed a course in Physics?   A. Yes   B. No

Many of the questions on this test deal with material that is covered in physics courses. Many items are presumed by physics teachers to be known by students upon entering their physics classes. Of particular note is the set of questions dealing with light. One would expect that students who have taken physics would do quite a bit better than others who have not.

[Figure: frequency breakdown of physics]

Physics   Count   %
No        1097    77.6
Yes       147     10.4
Missing   170     12.0
Total     1,414

[Figure: percent at each grade level (8 through 12) reporting that they have taken physics]

Few students who take introductory astronomy or earth science have taken physics. Physics is usually offered only at the twelfth-grade level for regular students and at the eleventh-grade level for accelerated students. An analysis calls the accuracy of student responses into question: fifty-eight students in grades eight, nine, and ten reported that they have taken physics. If these answers, which can be checked in some way, exhibit problems with accuracy, what can we conclude about the reliability of the answers to the other demographic questions? In this case, students may be confusing an eighth- or ninth-grade physical science course with a high school physics course. Errors in answering may extend only as far as this question. If they extend further, however, they may have the effect of reducing correlation and regression coefficients; somewhat more variance could be explained by these factors if students had answered accurately.

[Figure: box plot of total scores by physics (No / Yes / Missing)]

Group     Mean    SD
No        15.86   6.04
Yes       19.32   7.32
Missing   14.24   6.88

In spite of this problem, students who say they have taken physics score about three and one-half points higher than those who have not. This result is statistically significant at the p = 0.05 level.

Dependent variable is: Total
1414 total cases, of which 170 are missing
R^2 = 3.1%   R^2 (adjusted) = 3.1%
s = 6.208 with 1244 – 2 = 1242 degrees of freedom

Source       Sum of Squares   df     Mean Square   F-ratio
Regression   1549.52          1      1550          40.2
Residual     47858.2          1242   38.5332

Variable     Coefficient   s.e. of Coeff   t-ratio
Constant     15.8624       0.1874          84.6
Physics      3.45738       0.5452          6.34


Item 52, Math Level
Highest level math course you have completed:
A. General Math   B. Algebra I   C. Geometry   D. Algebra II   E. Pre-calculus or Trigonometry

The four-course sequence of Algebra I, Geometry, Algebra II, and Pre-calculus or Trigonometry is standard in most American high schools. Algebra I starts the college-level math sequence, and enrollment declines with each subsequent course; a student can drop out of the sequence at any grade. For the most part, those who do take no more math in high school or drop to a general math course. Several problems on this test deal with mathematics in the areas of graph reading, angular measurement, or scientific notation.

[Figure: frequency breakdown of math level]

Math Level       Count   %
1-General        456     32.2
2-Algebra I      290     20.5
3-Geometry       214     15.1
4-Algebra II     224     15.8
5-Pre-calculus   98      6.93
Missing          132     9.34
Total            1,414

[Figure: box plot of total score by math level (1 through 5 and missing)]

Math Level       Mean    SD
1-General        13.77   4.80
2-Algebra I      15.93   5.99
3-Geometry       16.36   6.30
4-Algebra II     18.58   6.39
5-Pre-calculus   21.14   7.97
Missing          15.33   7.27


A substantial rise in total test score with math level is apparent in the graph above and is substantiated by the regression analysis below.

Dependent variable is: Total
1414 total cases, of which 132 are missing
R^2 = 12.0%   R^2 (adjusted) = 12.0%
s = 5.921 with 1282 – 2 = 1280 degrees of freedom

Source       Sum of Squares   df     Mean Square   F-ratio
Regression   6137.17          1      6137          175
Residual     44871.2          1280   35.0557

Variable     Coefficient   s.e. of Coeff   t-ratio
Constant     12.1544       0.3409          35.7
Math Level   1.65046       0.1247          13.2

A background in mathematics makes a statistically significant difference in students’ performance on many problems. Problems 3, 10, 16, 19, 25, 26, and 33 were designed to test students’ misconceptions in mathematics. The regressions with these problems are all statistically significant. Math experience also appears to help students with problems 20, 22, and 45. Math level appears to hurt students in answering two questions: 11 and 41. These are significant at the p = 0.05 level.


Derived Variable: Acceleration
The grade level of a student is a somewhat redundant measure with age. By subtracting a student's age from his or her grade and adding six, one has a measure of how advanced or impeded a student is in progress through school. I have defined a factor, acceleration, that is a measure of the degree to which students are ahead of or behind their classmates. Students who are one year younger than the average for students in their grade have an acceleration measure of +1. Those who are one year older than the average of their classmates' ages have an acceleration measure of -1.

[Figure: frequency breakdown of acceleration]

Acceleration   Count   %
-4             4       0.283
-3             14      0.990
-2             16      1.13
-1             29      2.05
0              699     49.4
1              479     33.9
2              63      4.46
3              5       0.354
Missing        105     7.43
Total          1,414

Most students appear to be at the 0 or +1 level for acceleration. This disparity exists because of round-off error in students' reported ages. For example, in tenth grade, the average student turns fifteen in September, so when the test was taken, about half the students reported that they were fourteen and half reported that they were fifteen. This results in an acceleration measure of 0 for about half the students and 1 for the other half.
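A minimal sketch of this derived variable (Python; the formula follows the definition just given, grade minus age plus six):

    def acceleration(grade, age):
        # Degree to which a student is ahead of (+) or behind (-) his or
        # her classmates, per the definition above: grade - age + 6.
        return grade - age + 6

    # A 16-year-old tenth grader scores 0 (on track); a 15-year-old tenth
    # grader scores +1; a 17-year-old tenth grader (likely held back a
    # year) scores -1.
    print(acceleration(10, 16), acceleration(10, 15), acceleration(10, 17))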


[Figure: box plot of total scores by acceleration, -4 through +3 and missing]

Acceleration   Mean    SD
-4             8.75    0.95
-3             11.28   2.30
-2             13.06   3.73
-1             13.41   5.73
0              14.69   5.79
1              18.37   6.30
2              17.47   6.86
3              13.00   3.87
Missing        15.56

Note that overall performance on the test increases with increasing acceleration. However, there is a substantial reduction when acceleration exceeds +2, indicative of a nonlinear trend in the data. Students who are two or more years younger or older than their classmates appear to hold more misconceptions than their peers.

Dependent variable is: Total
1414 total cases, of which 105 are missing
R^2 = 6.7%   R^2 (adjusted) = 6.6%
s = 6.067 with 1309 – 2 = 1307 degrees of freedom

Source       Sum of Squares   df     Mean Square   F-ratio
Regression   3448.90          1      3449          93.7
Residual     48104.6          1307   36.8053

Variable       Coefficient   s.e. of Coeff   t-ratio
Constant       15.3017       0.1853          82.6
Acceleration   1.99420       0.2060          9.68

This acceleration factor is significant at the p = 0.05 level. Students with high acceleration measures appear more likely than others to answer questions 1, 21, 26, 28, 31, 39, and 40 correctly. These correlations are significant at the p = 0.05 level.

C. Attitude Factors
A set of attitude factors was originally included on this test as a way to measure whether certain attitudes changed as a result of taking the Project STAR course. The plan was to see if there were any changes from pre-test to post-test. I have included them here only for informational purposes. They are not to be treated as predictive factors.

Item 58, Educational Aspirations
What is the highest level of education that you plan to complete?
A. Not finish high school.   B. High school.   C. Trade, Vocational, or Business School.   D. Some college.   E. College degree.

[Figure: frequency breakdown of educational aspiration]

Educational Aspiration   Count   %
1-A                      49      3.47
2-B                      88      6.22
3-D                      130     9.19
4-C                      137     9.69
5-E                      883     62.4
Missing                  127     8.98
Total                    1,414

A large majority of the students in this study are planning to attend some postsecondary school.

[Figure: box plot of total score by student's educational aspirations (1 through 5 and missing)]

Group     Mean    SD
1-A       12.49   5.28
2-B       13.14   4.70
3-D       13.62   5.19
4-C       14.53   5.02
5-E       17.20   6.51
Missing   15.29   7.27

Educational Aspiration

Students with higher postsecondary aspirations appear to do much better on this test, especially those who intend to graduate from college. This trend is significant at the p = 0.05 level. Comparing the confidence intervals for the plotted medians, one can see an overlap for the first four categories and the missing data. Only students who aspire to finish college appear to perform significantly better (at the p = 0.05 level) than those with lesser goals.

Dependent variable is: Total
1414 total cases, of which 127 are missing
R^2 = 6.5%   R^2 (adjusted) = 6.4%
s = 6.110 with 1287 – 2 = 1285 degrees of freedom

Source       Sum of Squares   df     Mean Square   F-ratio
Regression   3310.56          1      3311          88.7
Residual     47967.5          1285   37.3288

Variable         Coefficient   s.e. of Coeff   t-ratio
Constant         9.98389       0.6714          14.9
Ed. Aspiration   1.41121       0.1499          9.42


Item 59, Importance of Science
How important do you feel science will be in your future occupation?
A. Not at all   B. Somewhat   C. Important   D. Very important   E. Essential

This question was included in the instrument to see if students' interest in pursuing scientific careers changed from pre-test to post-test. In the context of this study, it helps to show how students' attitudes toward scientific careers relate to their test performance.

[Figure: frequency breakdown of importance of science]

Importance of Science   Count   %
1                       207     14.6
2                       460     32.5
3                       304     21.5
4                       153     10.8
5                       163     11.5
Missing                 127     8.98
Total                   1,414

Judging from the graph above, roughly half the students who took this test are interested in pursuing careers in which science plays a major role.

[Figure: box plot of total score by importance of science (1 through 5 and missing)]

Group     Mean    SD
1         13.87   5.15
2         15.19   5.86
3         16.25   6.09
4         17.58   6.56
5         19.64   7.14
Missing   15.48   7.36

Students who are interested in scientific careers do much better on this test than those who are not. This result is significant at the p = 0.05 level.

Dependent variable is: Total
1414 total cases, of which 127 are missing
R^2 = 7.3%   R^2 (adjusted) = 7.2%
s = 6.074 with 1287 – 2 = 1285 degrees of freedom

Source       Sum of Squares   df     Mean Square   F-ratio
Regression   3733.50          1      3733          101
Residual     47408.9          1285   36.8941

Variable                Coefficient   s.e. of Coeff   t-ratio
Constant                12.3758       0.4054          30.5
Importance of Science   1.37604       0.1368          10.1


Item 60, Reason for Taking Course
Why did you decide to take this course?
A. Curiosity or interest.   B. Hobby or amateur astronomer.   C. Needed credit.   D. Recommended by an adult.   E. Friend has taken the course.

This question was included on the original test to help determine how students make the decision to take a science elective in high school.

[Figure: frequency breakdown of reason for taking course (in alphabetical order)]

Reason      Count   %
Adult       132     9.34
Credit      423     29.9
Curiosity   584     41.3
Friend      69      4.88
Hobby       53      3.75
Missing     153     10.8
Total       1,414

Relatively few students took this course because a friend recommended it or because astronomy is their hobby.


[Figure: box plot of total score by reason for taking course]

Group       Mean    SD
Adult       15.17   5.45
Credit      13.89   5.25
Curiosity   17.81   6.51
Friend      15.00   5.67
Hobby       18.77   8.42
Missing     15.36   6.98

Students who gave the reason that they were curious about astronomy or that astronomy was their hobby did better than average in total score. Those who took the course to get credit had fewer correct answers. An adult's or a friend's referral was not significant at the p = 0.05 level.

Analysis of Variance for: Total

Source            df     Sum of Squares   Mean Square   F-ratio   Probability
Curiosity (cry)   1      726.150          726.150       19.078    0.0000
Hobby (hby)       1      457.078          457.078       12.009    0.0005
Credit (crt)      1      244.372          244.372       6.4204    0.0114
Adult (adt)       1      2.60606          2.60606       0.06847   0.7936
Friend (frd)      1      6.37061          6.37061       0.16737   0.6825
Error             1408   53591.4          38.0621
Total             1413   58019.9

Dependent variable is: Total
1414 total cases, of which 153 are missing
R^2 = 8.6%   R^2 (adjusted) = 8.3%
s = 6.063 with 1261 – 5 = 1256 degrees of freedom

Source       Sum of Squares   df     Mean Square   F-ratio
Regression   4353.40          4      1088          29.6
Residual     46167.9          1256   36.7579

Variable    Coefficient   s.e. of Coeff   t-ratio
Constant    15.1742       0.5277          28.8
Curiosity   2.63911       0.5843          4.52
Hobby       3.59934       0.9859          3.65
Credit      -1.28299      0.6045          -2.12
Friend      -0.174242     0.9007          -0.193

D. Analysis of Variance
The preceding section analyzed each factor alone for its contribution to the variance in total test score. This section applies the technique of multiple linear regression to explain the variance using all of the appropriate factors simultaneously. All demographic and schooling factors are included; attitude factors are not included in this model. Multiple regression analysis is a technique for examining the effects of many independent variables on a dependent variable. In the case of this study, the independent variables are the demographic and schooling factors; the dependent variable is the total test score. Using all of these available factors shows that 30.4 percent of the variance in total test score can be accounted for. This is a substantial portion of the variance, but 69.6 percent remains unaccounted for.
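As an illustration of the technique (Python with numpy; the factor names follow the study, but the data below are synthetic stand-ins and the generating coefficients are arbitrary round numbers), a multiple regression fits all predictors at once and reports a t-ratio for each:

    import numpy as np

    def ols(X, y):
        # Least-squares fit; returns coefficients and their t-ratios.
        n, k = X.shape
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        s2 = resid @ resid / (n - k)
        se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
        return beta, beta / se

    # Columns: constant, male (0/1), European heritage (0/1), math level
    # (1-5), physics taken (0/1), acceleration, mother's education (1-5).
    rng = np.random.default_rng(2)
    n = 1054
    X = np.column_stack([
        np.ones(n),
        rng.integers(0, 2, n),    # male
        rng.integers(0, 2, n),    # European heritage
        rng.integers(1, 6, n),    # math level
        rng.integers(0, 2, n),    # physics
        rng.integers(-1, 2, n),   # acceleration
        rng.integers(1, 6, n),    # mother's education
    ])
    y = X @ np.array([8.6, 2.3, 3.3, 1.0, 2.8, 1.1, 0.5]) + rng.normal(0, 5.3, n)
    beta, t = ols(X, y)
    print(np.round(beta, 2))      # recovers the generating coefficients
    print(np.round(t, 1))         # factors with |t| < 2 are not significant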


Regression Analysis of Total Score by Demographic and Schooling Factors

Dependent variable is: Total
1414 total cases, of which 360 are missing
R^2 = 31.1%   R^2 (adjusted) = 30.2%
s = 5.331 with 1054 – 14 = 1040 degrees of freedom

Source       Sum of Squares   df     Mean Square   F-ratio
Regression   13339.2          13     1026          36.1
Residual     29555.4          1040   28.4187

Variable        Coefficient   s.e. of Coeff   t-ratio
Constant        8.56079       1.646           5.20
European        3.33754       0.3710          9.00
Male            2.31198       0.3333          6.94
Math level      1.00495       0.2027          4.96
Physics         2.75583       0.5805          4.75
Acceleration    1.06361       0.3118          3.41
Mother's ed.    0.484542      0.1345          3.60
African         -0.866513     0.7749          -1.12
Earth science   0.329892      0.3578          0.922
Chemistry       0.410729      0.4634          0.886
Father's ed.    0.105647      0.1266          0.834
Latin           0.039820      0.7012          0.057
Grade           0.004254      0.1863          0.023
Asian           0.009125      0.9449          0.010

Using all of these factors, one can see that some have large coefficients and some have small ones. The ratio of the absolute values of largest to smallest is 1,117:1. The factor coefficients appear very different in this multiple regression than when factors are treated singly in simple regression. For example, the coefficient for the impact of grade level in this model is a minuscule –0.001, while when treated alone it is 1.21. However, the key statistic in this analysis is the t-ratio; those factors with t-ratios of magnitude less than 2.00 are not significant at the p = 0.05 level. The t-ratio for grade level, when all the factors are entered together, is a measly –0.008 and is not significant at the p = 0.05 level. As a consequence of this multiple regression, many factors are revealed to be not significant at the p = 0.05 level. These nonsignificant factors are grade level, Latin heritage, African heritage, Asian heritage, the taking of earth science, and the taking of chemistry. The magnitude of the effect of each significant factor is its coefficient in the second column. Using this particular stepwise regression model, one could predict the total score of a student by adding to the constant coefficient of 8.81 points: 2.3 points if male, 1 point for each year of acceleration, 3.2 points for being of European heritage, 1 point for each math course, 2.8 points for taking physics, and 0.5 points for each level of mother's education.

Reduced Model of Total Score by Demographic and Schooling Factors

Dependent variable is: Total
1414 total cases, of which 297 are missing
R^2 = 28.8%   R^2 (adjusted) = 28.5%
s = 5.399 with 1117 – 6 = 1111 degrees of freedom

Source       Sum of Squares   df     Mean Square   F-ratio
Regression   13079.5          5      2616          89.8
Residual     32381.7          1111   29.1464

Variable       Coefficient   s.e. of Coeff   t-ratio
Constant       8.75662       0.5114          17.1
European       3.58453       0.3360          10.7
Math level     1.32554       0.1290          10.3
Male           2.38195       0.3252          7.33
Mother's ed.   0.485370      0.1191          4.08
Physics        2.05254       0.5315          3.86
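The prediction recipe quoted above is simply the fitted regression equation evaluated for one student. A sketch (Python; the coefficients are the rounded values quoted in the text and should be treated as illustrative only):

    def predict_total(male, acceleration, european, math_level, physics,
                      mothers_ed):
        # Rounded coefficients quoted in the text: constant ~8.8, +2.3 if
        # male, +1.0 per year of acceleration, +3.2 if of European
        # heritage, +1.0 per math course, +2.8 if physics was taken, and
        # +0.5 per level of mother's education.
        return (8.8 + 2.3 * male + 1.0 * acceleration + 3.2 * european
                + 1.0 * math_level + 2.8 * physics + 0.5 * mothers_ed)

    # E.g., a male student of European heritage, on grade for his age,
    # with Algebra I (level 2), no physics, mother's education level 3:
    print(predict_total(1, 0, 1, 2, 0, 3))   # about 17.8 items correct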

These factors have been recalculated by building a stepwise model. Each factor was added to a linear equation based on the maximum amount of additional variance accounted for. Six factors are statistically significant and are included in the model. What are we to make of this model? The first three factors seem to contribute the most to the regression equation; the contribution of each of the last three is small (see Table XIX).

Table XIX. Variance Explained by Reduced Regression Model

Step   Item                 R^2   R
1      Math                 12%   35%
2      European heritage    21%   46%
3      Gender               25%   50%
4      Physics course       27%   52%
5      Mother's education   28%   53%
6      Acceleration         29%   54%
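The stepwise procedure described above can be sketched as a greedy loop: at each step, try every remaining factor and keep the one that raises R^2 the most. A minimal version (Python with numpy; synthetic data, and with no stopping rule, whereas the actual procedure stops adding factors once the gain is not statistically significant):

    import numpy as np

    def r_squared(X, y):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

    def forward_stepwise(factors, y):
        # Greedily add whichever factor gives the largest gain in R^2.
        n = len(y)
        chosen, order = [np.ones(n)], []
        remaining = dict(factors)
        while remaining:
            gains = {name: r_squared(np.column_stack(chosen + [col]), y)
                     for name, col in remaining.items()}
            best = max(gains, key=gains.get)
            order.append((best, gains[best]))
            chosen.append(remaining.pop(best))
        return order   # (factor, cumulative R^2) per step, as in Table XIX

    # Synthetic stand-ins for three of the study's factors:
    rng = np.random.default_rng(3)
    n = 1117
    factors = {"math": rng.integers(1, 6, n).astype(float),
               "european": rng.integers(0, 2, n).astype(float),
               "male": rng.integers(0, 2, n).astype(float)}
    y = (9 + 1.3 * factors["math"] + 3.6 * factors["european"]
           + 2.4 * factors["male"] + rng.normal(0, 5.4, n))
    for name, r2 in forward_stepwise(factors, y):
        print(name, round(r2, 2))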

These results are graphed in Figure 20.

[Figure 20. Stepwise Regression Model of Total Score by Six Background Factors: R^2, the percent of variance accounted for, plotted for Math, European heritage, Gender, Physics course, Mother's education, and Acceleration]

To what degree are the first three factors independent of each other? One can tell from the magnitude of the correlation coefficients, summarized in Table XVIII (from Appendix E). These factors are not highly correlated.

Table XVIII. Correlation of Major Background Factors

                    Math   Euro.   Gender
Math                1.00
European heritage   0.20   1.00
Gender              0.11   0.05    1.00
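Such a table is just the matrix of pairwise correlation coefficients; a quick check (Python with numpy; synthetic stand-ins for the three factors):

    import numpy as np

    rng = np.random.default_rng(4)
    math = rng.integers(1, 6, 500).astype(float)   # math level, 1-5
    euro = rng.integers(0, 2, 500).astype(float)   # European heritage, 0/1
    male = rng.integers(0, 2, 500).astype(float)   # gender, 0/1
    # Rows and columns: math, euro, male. Small off-diagonal entries
    # indicate nearly independent factors, as in Table XVIII.
    print(np.round(np.corrcoef([math, euro, male]), 2))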

That the mathematics level of students is the major contributor to student test scores is surprising. Seven problems on the test deal with mathematics, and students with higher-level math courses do significantly better on these questions. These students also do better on several questions that are not related to mathematics: items 7, 21, 23, and 45, which deal with frames of reference, the rotational rate of the Earth, the Moon's orbital period about the Sun, and light propagation in the daytime.


VIII. Discussion

This work confirms misconceptions that have been addressed in smaller studies and develops a profile of the conceptions of introductory astronomy students. It addresses the relationship between performance on a test of misconceptions and various background experiences and factors of the subjects. I shall first address each of the research questions in turn. I will then discuss the items on the test and follow with dissemination issues and problems specific to this dissertation.

A. Research Questions

1. Validity

1(a) Is the test a valid instrument for measuring the misconceptions of students entering an introductory astronomy course?

This test is a valid instrument for measuring the misconceptions of students in astronomy. The test was generated from interviews with students about their ideas and by combing the literature for misconceptions about astronomical concepts. The instrument was pilot-tested on thousands of students to refine the wording of questions, so that distractors reflected popular ideas of the students. A group of astronomers took the test and agreed on the answers to all forty-seven questions. Graduate students in Harvard's Department of Astronomy took early versions of the test; those questions that were answered incorrectly by these students were changed or eliminated. Teachers involved in the Project STAR curriculum reviewed the test and suggested changes in items that they viewed as confusing or inaccurate. A large group of 240 teachers of introductory astronomy and earth science helped to validate the test by predicting how their own students would perform on sixteen items from this test. They predicted that their own students would answer the items correctly at a 73 percent mastery level, on average, after taking their course.

Alternative hypotheses were explored for why some questions were answered correctly with greater frequency than others. The factors explored for each item were: its order in the test, the number of students who chose not to answer, the inclusion of a picture or diagram, whether it tested a concept, a fact, or a math skill, and reading level. None was found to be significant at the p = 0.05 level. Two χ² tests were performed. The first was to determine if any student answers to items could be explained as random guessing among all the possible answers. The second looked only at the distractors, discounting any students who answered items correctly. Both hypotheses, that the statistics were the result of guessing, were rejected at the p = 0.05 level.

1(b) Which test items appear to be most appropriate in assessing student misconceptions in astronomy and should be included in revised instruments?

All test items were subject to a calculation and discussion of their difficulty (P-value) and discrimination (D-value). A stepwise regression was performed that generated a list of items that were, as a group, highly discriminatory. This technique selected a group of items that could more efficiently predict students' total scores by eliminating questions that were highly correlated. A set of seven problems was able to account for 75 percent of the variance in the population. A set of twenty-one questions could account for 90 percent of the variance. Either set could be used by researchers and teachers as a highly discriminatory test of students' astronomical misconceptions.

1(c) How reliable is this test?

Several measures of internal consistency were calculated as indicators of this instrument's reliability. These measures range from a low of 0.76 to a high of 0.80. These results are consistent with other tests of this type. For achievement tests that are used to determine whether the mean scores of subgroups are significantly different, a reliability coefficient of 0.65 is satisfactory (Aiken 1985). The test-retest method was not used.

2. Misconceptions Revealed

2(a) For students enrolling in a course where astronomical concepts are taught, for which concepts will students initially hold conceptions that are at odds with accepted scientific views?

2(b) Which misconceptions appear to be the most prevalent among students?

This test revealed that students held a level of mastery over only three of the astronomical concepts tested. They held misconceptions, however, about most of the major astronomical concepts treated in introductory astronomy and earth science courses. Fifty-one student misconceptions were revealed by this test, nineteen of which were preferred by more students than was the correct answer. These misconceptions were listed by overall student preference. An item response curve was generated for each question, comparing the P-values of the correct answer and four distractors across five student performance levels. Surprisingly, some twenty-two wrong answers were preferred with greater frequency by higher-performing students.

Perhaps the best way to characterize student understanding is to present a profile of a hypothetical student entering an earth science or astronomy course. Although there are probably no students with all of these ideas, this composite represents the average student in our sample. In this profile, I have taken some liberty in reporting misconceptions that are very popular, but may not be held by a plurality of students:

The composite student is a male high schooler who has not taken earth science, chemistry, or physics previously. His math background consists of a course in Algebra I. He identifies himself as being of European heritage. Both his father and his mother have some higher education, a few years at college or a degree from a two-year school. The student intends to go on to get a college degree and thinks that science will, at least, be somewhat important in his future occupation. He has chosen to take this astronomy course because he is curious about the subject, but he does not consider it a hobby. He has firm ideas about most scientific concepts and rarely guesses at the answers to the questions in this test.


He tends to think of the astronomical world as fixed, or, at least, as constant. He can state that the Earth turns on its axis, but he is not quite sure of the ramifications of this motion. The length of daylight, the path of the Sun in the sky, and the movement of the Sun against the background of stars are all misconceived. In his view, the Sun moves in a uniform, unchanging way, rising in the East, being overhead at noon, and setting in the West. Its path is independent of geographic location or season. He knows that the Earth orbits the Sun in a year, but thinks that its path is highly elliptical.

He has ideas about the size of and relative distance between astronomical objects that are vastly out of proportion. Both the Earth and the Sun are thought to be about ten times their actual diameter. Solar system objects are thought to be much closer to each other than they actually are. This supports his view that the seasons are caused by the Earth's changing distance from the Sun and that the Moon's phases are caused by the Earth's shadow. The Moon circles the Earth in a day while the stars appear fixed in the sky. The entire universe is compressed. Since the stars are fixed, traveling to another star would not change the appearance of constellations. Galaxies are much further away than the visible stars. The universe itself is static, neither expanding nor contracting. Gravity does not play a major role in the structure of the universe since it is not dependent on mass and distance, but only on air pressure.

The nature of light is thought to be understood. Light takes time to reach us from the stars, but its role in vision is misconceived. Light exists only where it can be seen. When a flashlight illuminates an object at night, he thinks there is no light between the flashlight and the object; light exists only where its effect can be seen by the observer's eyes. During the daytime, sunlight is thought to keep light from leaving a source. Moreover, objects can be seen without light traveling from them to one's eye. He also believes that light intensity diminishes with the inverse first power of distance. Colored objects transform the color of light, as opposed to selectively absorbing different wavelengths. Light is not believed to be composed of particles, but to be a condition.

Misconceptions in mathematics limit the usefulness of graphs and calculations in helping him understand astronomical concepts. He can extrapolate graphical data, but has difficulty reading graphs and extracting useful information or patterns. His understanding of scientific notation is poor. Order-of-magnitude calculations are difficult for him and are often performed incorrectly. He understands angles only when they are concrete and small. More abstract arguments using angular measure are not effective with him. He thinks a circle has only 180° of internal angle. Size-to-distance ratios are a foreign idea to him. He solves simple algebraic equations, but cannot apply proportional reasoning to real-world or word problems. He sees math as a separate subject with little relevance to or utility in learning science.

3. Demographic Factors and School-Based Factors

Significant differences in student performance on this test relate to several demographic factors. By using the technique of multiple linear regression, the most important factors were identified. The model constructed from this analysis accounts for roughly 30 percent of the variance in total test scores of the student population.
Thus, most of the variance remains unexplained. Somehow, students who do well on this test have been exposed to information or processes that went undetected by this test. Because overall test performance was poor, one explanation for the small amount of variance explained by the identified factors is that there was considerable guessing on the part of students and that there were other, unaccounted-for factors.

3(a) Are differences in the quantity of misconceptions related to gender?

3(b) Are differences in the quantity of misconceptions related to ethnic heritage?

3(c) Are differences in the quantity of misconceptions related to the educational accomplishment of parents or guardians?

Male students perform better on this test by about 3.5 items, or about 7 percent. Students who identify themselves as of European heritage average 5 items, or 11 percent, higher on the test than others. Both these results are significant at p = 0.05, but are not unusual for tests of ability. What is much more surprising is that other background factors play such a limited role in accounting for the variance in results. Mother's education is a small factor, explaining only an additional 1 percent of variance in the regression model. Father's education was not significant at the p = 0.05 level: the fact that fathers of some students have much more education than others was not significantly related to student scores. It appears that the educational level of parents has little influence on the quantity of misconceptions that students hold. Two possible explanations are that parents may transmit misconceptions to their children, or that, even if parents are aware of scientific conceptions, they have difficulty transmitting these views effectively to their children. These two factors are usually used to characterize the socioeconomic background of students. It appears that the income level of a student's parents has little relation to his or her conceptual understanding in astronomy. Students without the benefit of highly educated parents appear to be at no disadvantage as far as scientific misconceptions are concerned.

4(a) Are differences in the quantity of misconceptions related to a student's grade level or age?

4(b) Are differences in the quantity of misconceptions related to a student's prior completion of specific mathematics or science courses?

Only one highly significant schooling factor was uncovered by this study. The level of math courses taken explained the largest fraction of variance in the regression model. Students with several math courses had fewer misconceptions. Even for items on the test that did not deal with mathematical skills, the level of mathematics explained a large amount of the variance. Two other factors were significant, but minor in their effect: whether or not a student had taken a physics course explained an additional 2 percent of the variance, and "acceleration," whether a student is ahead of or behind her or his classmates on the basis of age, explained only an additional 1 percent. Again, the most astounding result of this multiple regression is that several factors appear to have no impact on the number of student misconceptions. Student scores were found to be independent of grade level when all other factors were taken into account. It appears that spending more time in school has no impact on scientific misconceptions. Taking an earth science course did not appear to have a significant impact on scientific misconceptions either. Even though one-quarter of most earth science curricula is astronomy, this did not appear to reduce misconceptions in the sample. Taking an additional science course or a year of chemistry did not help either.

B. Characterizing Misconception Questions

Characteristics of items that identify student misconceptions are quite different from those on standardized tests or those that teachers create for student tests. The P-values of these misconception questions are often low, because students prefer an incorrect misconception to the correct answer. These questions are less able to discriminate between students who perform well on an entire test and those who do poorly, because these questions are more difficult. Analysis of test items may show that multiple-choice misconception items are very good test questions and should be included in standardized tests in spite of their low P-values.

Creating items that reliably identify misconceptions is not easy. There is no simple rule to help create these questions except, perhaps, one: misconception questions are more difficult than ordinary questions, because they must contain very attractive distractors. Testmakers must use interviews or research results to find such distractors. A question that is answered correctly by 75 percent of subjects cannot reveal misconceptions. As revealed by this test, students often choose misconceptions with the same average frequency as correct answers; a P-value greater than 0.50 prevents this type of result. As a rule of thumb, misconception questions must have P-values less than 0.50. On this test they averaged 0.34.

This result may be somewhat disheartening to teachers and their students. A test made up of only misconception questions might indeed yield a mean score of 34 percent. In most classrooms, this would be equivalent to a letter grade of F. If teachers are to test using misconception questions, they must let their assignments of letter grades reflect this lower average P-value of questions.

C. Dissemination

Dissemination of these findings should heighten the awareness of practitioners as to the prevalence of scientific misconceptions in their students. I expect the impact of these results on committed teachers to be a reduction in the number and complexity of concepts presented in introductory astronomy courses and the implementation of explicit treatment of student misconceptions through discussion and experimentation. The presentation of a revised misconception test, in multiple-choice format, will encourage teachers and researchers to use this test as a tool for assessing the misconceptions of their own students both prior to and after instruction. Individual items may find their way onto exams and standardized tests as well. I expect to submit the results of this study in summary form to the journal Science Education. Two papers already submitted are:

Philip M. Sadler. High School Astronomy: Characteristics and Student Learning. Proceedings of the Workshop for Hands-on Astronomy for Education. Tucson, AZ: Fairborn Observatory, March 5, 1991.


Alan Lightman and Philip M. Sadler. How Well Can Science Teachers Predict Student Misconceptions before and after Instruction? Submitted to Science Education, March 1992.

This study will set the stage for further papers on the role of misconceptions in teaching astronomy. One is planned:

• The effectiveness of the Project STAR curriculum in reducing student misconceptions in astronomy. This will be an analysis of pre-test and post-test results for control and Project STAR groups.

D. Errors, Omissions, and Problems

This study is subject to the limitations of any survey. It is a sample of a total population and, even though the sample is large, it may be biased. Students in the selected schools may be different from the general population of students enrolling in introductory astronomy classes. The test itself may be biased in favor of males of European extraction who have taken many science and math classes. After all, I certainly fit that description, as do most of those who helped to create and administer the test. Perhaps some subtle nuances in wording or style worked their way into the test. Also, students may have worked harder to complete the test if they shared certain attributes with the administering teacher.

Correct answers on this test were not equally distributed; 38 percent of the correct answers corresponded to answer "E." Students may have discovered this preference and chosen this letter disproportionately, or may have avoided this answer. Future versions of this test should remove any such preference.

Demographic questions could have offered better choices. The extension of student ages downward by recoding could have been avoided by offering a wider range of possible responses. The fact that a large portion of the students chose "other" for their ethnic heritage points to a need for an expanded range of choices on future tests, or for better definitions so that all students will understand all choices.

Some questions on this test do not appear to measure misconceptions well. There is no dominant choice of an answer from students on these questions: the D-value of the correct answer is low. These questions should be reworked and retested, or eliminated from the test.

The reliability of the test could be improved. Internal consistency should not be the only measure used to evaluate reliability. A test-retest procedure could be carried out on a group of students sufficiently large to produce statistically significant results.

The tests of validity could be strengthened. Accomplished individuals, such as astronomy graduate students or astronomy teachers, should take the test in its entirety. Astronomy teachers should predict outcomes of their students on the entire test, rather than on only a subset of the test; using only sixteen items is not as strong a validation procedure as using the entire test. A comparison could also be made between how students answer the written test questions and how they choose answers when taking the test orally.
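The item statistics leaned on throughout this discussion are the P-value (difficulty: the fraction of students answering an item correctly) and the D-value (discrimination). One common way to compute a discrimination index, sketched below (Python with numpy; the exact definition used in this study is given in its earlier chapters, so treat this as illustrative), contrasts the top and bottom scorers on the whole test:

    import numpy as np

    def item_stats(correct, totals, frac=0.27):
        # P-value: fraction answering the item correctly.
        # D-value (one common definition): P among the top scorers minus
        # P among the bottom scorers, using the top and bottom 27 percent
        # of students ranked by total test score.
        correct = np.asarray(correct, dtype=float)
        order = np.argsort(totals)
        k = max(1, int(frac * len(totals)))
        p = correct.mean()
        d = correct[order[-k:]].mean() - correct[order[:k]].mean()
        return p, d

    # Ten students: 1 = answered this item correctly; totals are
    # whole-test scores.
    correct = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]
    totals = [8, 10, 12, 13, 15, 17, 19, 21, 24, 30]
    p, d = item_stats(correct, totals)
    print(p, d)   # a misconception item typically has P < 0.50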


E. Future Extensions

The level of overall understanding of astronomical concepts in this student population is appalling, even in a pre-test, and probably limits students' ability to integrate new concepts into their already well-developed frameworks of understanding. Introductory earth science and astronomy books pay scant attention to many of the ideas that students hold. Without revisiting these misconceptions, students are damned to try to place new conceptions upon faulty foundations. Perhaps this is why taking an earth science course has no impact on student conceptions in astronomy: these misconceptions were never dealt with before new ideas were taught.

Several researchers have found that students can abandon their misconceptions and learn scientifically correct ideas only with unusual teaching methods. The key technique is that students must elucidate their own preconceptions and then test them. Only by realizing that their own ideas cannot explain the outcomes of experiments or natural phenomena do students see the need for a different theory. Teachers can then present the scientifically accurate concept as a powerful idea that can predict and explain events. One consequence of accepting these new ideas is, strangely enough, that the old conceptions are forgotten; misconceptions appear to be erased from students' minds. This makes it very difficult for teachers to recall misconceptions from their own student days; they simply do not remember them. Teachers must rely on their own interviews or become familiar with the literature on scientific misconceptions in order to incorporate these ideas in their teaching.

The low test scores could be thought of as boding well for teachers of introductory earth science and astronomy courses: there is a lot of room for improving students' scientific conceptions. This test will prove to be a useful instrument in attempts to determine whether interventions that seek to modify students' conceptions produce significant change. It provides an essential baseline of student misconceptions for further studies. By analyzing post-test scores of Project STAR treatment groups and control groups, the impact of this program can be assessed and results documented for future efforts of curriculum reform.


IX. References Abacus Concepts Inc. Statview 512+. Calabasas, CA: Brainpower Inc., 1986. Aiken, Lewis R., Psychological Testing and Assessment. Boston: Allyn and Bacon, 1985. Anderson, B., and C. Karrqvist. “How Swedish pupils, aged 12-15 years, understand light and its properties.” European Journal of Science Education 5 (4 1983a): 387-402. Anderson, B., and C. Karrqvist. Light and Its Properties. Trans. by Gillian Thylander. Molndal, Sweden: University of Gothenburg, 1983b. Anderson, Charles W., and Edward L. Smith. Children’s Conceptions of Light and Color: Understanding the Role of Unseen Rays. Institute for Research on Teaching, Michigan State University, 1986. ERIC ED 270 318. Anderson, G. Encyclopedia of Educational Evaluation. London: Jossey-Bass, 1975. Apelman, M. “Critical barriers to understanding of elementary science: Learning about light and color.” In Observing Science Classrooms: Observing Science Perspectives from Research and Practice, ed. by Charles Anderson. Columbus, OH: ERIC/SMEAC, 1984. Aristotle. de Sensu. Trans. by W. S. Hett. London: Loeb Classical Library, ed., 1957. Arnaudin, Mary W., and Joel J. Mintzes. “Students’ alternative conceptions of the human circulatory system: A cross-age study.” Science Education 69 (5 1985): 721-733. Arons, Arnold B. “Student patterns of thinking and reasoning, part one of three.” The Physics Teacher (December 1983): 576-581. Atkin, J. Myron. “Some evaluation problems in a course content improvement project.” Journal of Research in Science Teaching 1 (1963): 129-132. Ausubel, D.P., J.D. Novak, and H. Hanesian. Educational Psychology: A Cognitive View. New York: Holt, Rinehart and Winston, 1978. Bell, Alan, Gard Brekke, and Malcom Swan. “Misconceptions, Conflict and Discussion in the Teaching of Graphical Interpretation.” In 2nd International Seminar on Misconception and Educational Strategies in Science and Mathematics in Ithaca, NY, ed. by Joseph D. Novak. Ithaca, NY: Cornell University Press, 1987, pp. 46-58. Bell, Beverly, Roger Osborne, and Ross Tasker. “Finding out what children think.” In Learning in Science, The Implication of Childrens’ Science, ed. by Roger J. Osborne and Peter Freyberg. Auckland, New Zealand: Heineman, 1985, pp. 151-165. Bloom, B. S. “Mastery Learning.” In Mastery Learning: Theory and Practice, ed. by J. Block. New York: Holt, Rinehart, and Winston, 1971. Bouwens, Robert E. A. “Misconceptions among pupils regarding geometrical optics.” In GIREP - Cosmos - and Educational Challenge in Copenhagen, European Space Agency, 369-370, 1986.

04

Broman, Lars. 27 Steps to the Universe. Salt Lake City: International Planetarium Society/Hansen Planetarium, 1986. Brown, David E., and John Clement. “Misconceptions concerning Newton’s law of action and reaction: The underestimated importance of the third Law.” In GIREP - Cosmos - and Educational Challenge in Copenhagen, European Space Agency, 1986, pp. 39-53. Brumby, Margaret N. “Misconceptions about the concept of natural selection by medical biology students.” Science Education 68 (4 1984): 493-503. Camp, Carol Ann. “Problem solving patterns in science: Gender and spatial ability during early adolescence.” Ed.D., University of Massachusetts, Amherst, 1981. Caramazza, A., M. McCloskey, and B. Green. “Naive beliefs in sophisticated subjects: Misconceptions about trajectories of objects.” Cognition 9 (1981): 117. Carey, Sue. Conceptual Development in Children. Cambridge, MA: MIT Press, 1985. Carter, Karl C., and Bruce R. Stuart. “Using a celestial sphere to test scientific concepts.” Journal of College Science Teaching 19 (3 1989): 164-167. Champagne, A. B. and Leo E. Klopfer. “A causal model of students’ achievement in a college physics course.” Journal of Research in Science Teaching 19 (1982): 299. Champagne, A. B., L. E. Klopfer, and J. D. Anderson. “Cognitive research and the design of science instruction.” Educational Psychologist 17 (1 1982): 31-53. Champagne, A. B., Leo E. Klopfer, and J. H. Anderson. “Factors influencing the learning of classical mechanics.” American Journal of Physics 48 (1980): 1074. Clement, John. “Overcoming students’ misconceptions in physics: The role of anchoring intuitions and analogical validity.” In GIREP - Cosmos - and Educational Challenge in Copenhagen, ed. by J. Hunt. European Space Agency, 1986, pp. 84-97. Clement, John. “Students preconceptions in introductory mechanics.” American Journal of Physics 50 (1982): 66. Cohen, H.G. “Dilemma of the objective paper-and-pencil assessment within the Piagetian framework.” Science Education 64 (1980): 741-745. Cohen, Michael R. “How can the sunlight hit the Moon if we are in the dark? Teacher’s concepts of phases of the Moon.” Paper presented at Henry Lester Smith Conference on Educational Research, 1982. Cohen, Michael R., and Martin H. Kagan. “Where does the old Moon go?” The Science Teacher 46 (1979): 22-23. Dai, Meme F. “Misconceptions about the Moon held by fifth and sixth graders in Taiwan.” National Science Teachers Association, 1990. Dean, Geoffey. “Does Astrology Need to Be True?” The Skeptical Inquirer 11 (1987): 166-184.

05

Dobson, Henry David. “An Experimental Study of the Effectiveness of the Planetarium in Teaching Selected Science Concepts in the Middle School.” Ph.D. dissertation, Pennsylvania State University, 1983. Doménech, Antonio and Elena Casasús. “Galactic structure: A constructivist approach to teaching astronomy.” School Science Review 72 (260 1991): 87-93. Driver, R., and J. Easley. “Pupils and paradigms: A review of literature related to concept development in adolescent science students.” Studies in Science Education 5 (1978): 61-84. Driver, Rosalind, Edith Guesne, and Andrée Tiberhien. “Some features of children’s ideas and their implications for teaching.” In Children’s Ideas in Science, ed. by Rosalind Driver, Edith Guesne, and Andrée Tiberhien. Philadelphia: Open University Press, 1985, pp. 193-201. Duckworth, Eleanor. “The Having of Wonderful Ideas” & Other Essays on Teaching and Learning. New York: Teachers College Press, 1987. Dufresne, Robert, William Gerace, Pamela T. Hardiman, and Jose Mestre. “Hierarchically structured problem solving in elementary mechanics: Guiding novices’ problem analysis.” In GIREP Conference, Cosmos — An Educational Challenge in Copenhagen, European Space Agency, 1986, pp. 116-130. Eaton, Janet F. “Student misconceptions interfere with science learning: case studies of fifth-grade students.” Elementary School Journal 84 (4 1984): 365-379. Eaton, Janet F., Charles W. Anderson, and Edward L. Smith. Student’s Conceptions Interfere with Learning: Case Studies of Fifth Grade Students. Institute for Research on Teaching, Michigan State University, 1983. ERIC ED 228 094. Ebel, Robert L. and David A. Frisbie. Essential of Educational Measurement. Englewood Cliffs, NJ: Prentice Hall, 1991. Edoff, James Dwight. “An experimental study of the effectiveness of manipulative use in planetarium astronomy lessons for fifth and eighth grade students.” Ed.D., Wayne State University, 1982. Erickson, Gaalen and Andree Tiberghien. “Heat and temperature.” In Children’s Ideas in Science, ed. by Rosalind Driver, Edith Guesne, and Andrée Tiberhien. Philadelphia: Open University Press, 1985, pp. 52-84. Farrell, Margaret A. and Walter A. Farmer. “Adolescents’ performance on a sequence of proportional reasoning tasks.” Journal of Research in Science Teaching 22 (6 1985): 503-518. Feher, Elsa. “Conception of Light and Color.” In American Association of Physics Teachers in Atlanta, ERIC1986. Festinger. A Theory of Cognitive Dissonance. Evanston, IL: Row, Peterson and Company, 1957. Finley, Fred N. “Evaluating instructing: the complementary use of clinical interviews.” Journal of Research in Science Teaching 23 (1986): 635-660.

06

Fisher, Kathleen M., and Joseph I. Lipson. “Science education in other countries—Issues and questions.” In Science Education in Global Perspective, ed. byMargrete Siebert Klein and F. James Rutherford. 1-11. Boulder, CO: Westview Press, 1985. Freyberg, Peter and Roger Osborne. “Constructing a survey of alternative views.” In Learning in Science, The Implication of Childrens’ Science, ed. by Roger J. Osborne and Peter Freyberg. Auckland, New Zealand: Heineman, 1985, pp. 166-167. Friedman, Alan J., Lawrence F. Lowery, Steven Pulos, Dennis Schatz, and Cary I. Sneider. Planetarium Educator’s Workshop Guide. Berkeley, CA: International Planetarium Society/Lawrence Hall of Science, 1980. Furuness, Linda Bishop, and Michael Cohen. “Children’s conception of the seasons: A comparison of three interview techniques.” Paper presented at National Association for Research in Science Teaching in 1989. Gardner, Howard. The Unschooled Mind. New York: Basic Books, 1991. Gilbert, John K. “The study of student misunderstandings in the physical sciences.” Research in Science Education (1977): 165-171. Goodlad, John. A Place Called School. New York: McGraw-Hill, 1984. Guesne, Edith. “Light.” In Children’s Ideas in Science, ed. by Rosalind Driver, Edith Guesne, and Andrée Tiberhien. Philadelphia: Open University Press, 1985, pp. 10-32. Gunstone, Richard F., and Richard T. White. “Understanding of gravity.” Science Education 65 (3 1981): 291-299. Halloun, Ibrahim Abu, and David Hestenes. “The initial knowledge state of college physics students.” American Journal of Physics 53 (11 1985): 1043-1055. Hambleton, Ronald K., H. Swaminathan, and H. Jane Rogers. Fundamentals of Item Response Theory. Newbury Park, CA: Sage Publications, 1991. Happs, John C., and Christine Coulstock. “What might parents be teaching their children about astronomy? Adult understanding of basic astronomy concepts.” Paper presented at Australian Science Education Research Association in 1987. Hardiman, Pamela T., Robert Dufresne, and William Gerace. “Physics novices’ judgments of solution similarity: When are they based on principles?” In GIREP Conference, Cosmos — an Educational Challenge in Copenhagen, European Space Agency, 1986, pp. 194-202. Hoff, D. “Astronomy for the non-science student—A status report.” The Physics Teacher March (1982): 175. Hofwolt, Clifford A. “Instructional strategies in the science classroom.” In Research within Reach: Science Education, ed. by David Holdzkom and Pamela B. Lutz.Washington, DC: National Science Teachers Association, 1985, pp. 4357.

07

Holton, Gerald. Introduction to Concepts and Theories in Physical Science. Princeton, NJ: Princeton University Press, 1985. Hopkins, Kenneth D., and Julian C. Stanley. Educational and Psychological Measurement and Evaluation, 6th ed., Englewood Cliffs, NJ: Prentice-Hall, 1981. International Association for the Evaluation of Educational Achievement. Science Achievement in 17 Countries: A Preliminary Report. New York: Teachers College, Columbia University, 1988. Janke, Delmar L. and Milton O. Pella. “Earth science concepts list for grades K-12 curriculum construction and evaluation.” Journal of Research in Science Teaching 9 (3 1972): 223-230. Jung, Walter. “Understanding students’ understandings: the case of elementary optics.” In 2nd International Seminar on Misconception and Educational Strategies in Science and Mathematics in Ithaca, NY, ed.by J.D. Novak, Cornell University Press, 1987, pp. 268-277. Karplus, Robert, et al. Science Teaching and the Development of Reasoning: Earth Science, 2d ed., Berkeley, CA: Lawrence Hall of Science, 1978. Karplus, Robert, Steven Pulos, and Elizabeth Stage. “Early adolescents’ proportional reasoning on ‘rate’” problems.” Educational Studies in Mathematics 14 (1983): 219-233. Kelsey, Linda J. “The performance of college astronomy students on two of Piaget’s projective infralogical grouping tasks and their relationship to problems dealing with the phases of the Moon.” Ph.D. dissertation, University of Iowa, 1980. Kenealy, Patrick. “A syntactic source of a common “misconception” about acceleration.” In 2nd International Seminar on Misconception and Educational Strategies in Science and Mathematics in Ithaca, NY, ed. by Joseph D. Novak, Cornell University Press, 1987, pp. 278-292. Kerlinger, Fred N. Foundations of Behavioral Research. New York: Holt, Rinehart, and Winston, 1986. Keuthe, James L. “Science concepts: A study of sophisticated errors.” Science Education 47 (4 1963): 361-364. Klein, Carol A. “Children’s concepts of the Earth and the Sun: A cross cultural study.” Science Education 65 (1 1982): 95-107. Klein, Margrete Siebert. “Two worlds of science learning: A look at the Germanies.” In Science Education in Global Perspective, ed. by Margrete Siebert Klein and F. James Rutherford. Boulder, CO: Westview Press, 1985, pp. 97154. Klopfer, Leopold. “Effectiveness and effects of ESSP astronomy materials—An illustrative study of evaluation in a curriculum development project.” Journal of Research in Science Teaching 6 (1 1964a): 64-75. Klopfer, Leopold. An evaluative study of the effectiveness and effects of astronomy materials prepared by the University of Illinois Elementary-School Science Project. ERIC, 1964b. ED032221.

Kyle, William C., Jr. “Curriculum development projects of the 1960s.” In Research within Reach: Science Education, ed. by David Holdzkom and Pamela B. Lutz. Washington, DC: National Science Teachers Association, 1985, pp. 3-24.
Langford, Peter. Children’s Thinking and Learning in Elementary School. Lancaster: Technomic, 1989.
Lightman, Alan, and Philip M. Sadler. “How can the Earth be round?” Science and Children (February 1986): 24-26.
Lightman, Alan P., and Jon D. Miller. “Contemporary cosmological beliefs.” Social Studies of Science 19 (1989): 127-136.
Lightman, Alan P., Jon D. Miller, and B. J. Leadbeater. “Contemporary cosmological beliefs.” In Misconceptions and Educational Strategies in Science and Mathematics, ed. by Joseph Novak. Ithaca, NY: Cornell University Press, 1987, pp. 309-321.
Lindberg, David C. Theories of Vision from Al-Kindi to Kepler. The Chicago History of Science and Medicine, ed. by Allen G. Debus. Chicago: University of Chicago Press, 1976.
Lohnes, P.R. “Factorial modeling in support of causal inference.” American Educational Research Journal 16 (1979): 323-340.
Loria, A., M. Michelini, and V. Mascellani. “Teaching Astronomy to Pupils Aged 11-13.” In GIREP Conference, Cosmos: An Educational Challenge, Copenhagen, ed. by J. Hunt. European Space Agency, 1986, pp. 229-233.
Mali, G., and A. Howe. “Development of earth and gravity concepts among Nepali children.” Science Education 64 (2 1979): 213-221.
Marshall, Kim, and Oliver W. Lancaster. Science: Elementary and Middle School Curriculum Objectives. Boston Public Schools, 1983.
McCloskey, M. “Intuitive Physics.” Scientific American 248 (1983): 122-130.
McDermott, Lillian C. “Research on conceptual learning in mechanics.” Physics Today (July 1984): 24-32.
McKenzie, D.L., and M.J. Padilla. “The construction and validation of the test of graphing in science (TOGS).” Journal of Research in Science Teaching 23 (1986): 571-579.
Microsoft Corporation. Microsoft Word User’s Guide. Redmond, WA: Microsoft Corporation, 1991.
Minstrell, J. “Conceptual development research in the natural setting of a secondary school classroom.” In Science for the 80’s, ed. by H. B. Rowe. Washington, DC: National Education Association, 1982a.
Minstrell, Jim. “Explaining the ‘at rest’ condition of an object.” The Physics Teacher (January 1982b): 10-14.
Narode, Ronald. “Standardized testing for misconceptions in basic mathematics.” In 2nd International Seminar on Misconceptions and Educational Strategies in Science and Mathematics, Ithaca, NY, ed. by Joseph D. Novak. Ithaca, NY: Cornell University Press, 1987, pp. 222-233.

National Assessment of Educational Progress. The Nation’s Report Card. Princeton, NJ: Educational Testing Service, 1989.
Newton, Sir Isaac. Opticks. London: William and John Innys, 1721.
Novak, Joseph D. A Theory of Education. Ithaca, NY: Cornell University Press, 1977.
Novick, Shimshon, and Joseph Nussbaum. “Using interviews to probe understanding.” The Science Teacher (November 1978): 29-30.
Nussbaum, Joseph. “Children’s conception of the Earth as a cosmic body: a cross age study.” Science Education 63 (1 1979): 83-93.
Nussbaum, Joseph. “Students’ perception of astronomical concepts.” In GIREP Conference, Cosmos: An Educational Challenge, Copenhagen, ed. by J. Hunt. European Space Agency, 1986, pp. 87-97.
Nussbaum, Joseph. “The earth as a cosmic body.” In Children’s Ideas in Science, ed. by Rosalind Driver, Edith Guesne, and Andrée Tiberghien. Philadelphia: Open University Press, 1985, pp. 170-192.
Nussbaum, Joseph, and Joseph Novak. “Alternative frameworks, conceptual conflict and accommodation: Toward a principled teaching strategy.” Instructional Science 11 (1982): 183-200.
Nussbaum, Joseph, and Joseph Novak. “An assessment of children’s concepts of the earth utilizing structured interviews.” Science Education 60 (4 1976): 535-550.
Ogar, J. “Ideas about physical phenomena in spaceships among students and pupils.” In GIREP Conference, Cosmos: An Educational Challenge, Copenhagen: European Space Agency, 1986, pp. 375-378.
Osborne, R. “Children’s dynamics.” The Physics Teacher 22 (1984): 504-508.
Osborne, Roger J., and Beverly F. Bell. “Science teaching and children’s views of the world.” European Journal of Science Education 5 (1 1983): 1-14.
Osgood, Charles E., George J. Suci, and Percy H. Tannenbaum. The Measurement of Meaning. Urbana: University of Illinois Press, 1957.
Osterlind, Steven J. Constructing Test Items. Boston: Kluwer Academic, 1989.
Piaget, Jean, and Bärbel Inhelder. The Child’s Conception of Space. Trans. by F.J. Langdon and J.L. Lunzer. New York: W.W. Norton, 1929.
Placek, Walter Anthony. “Preconceived knowledge of certain Newtonian concepts among gifted and non-gifted eleventh grade physics students.” In 2nd International Seminar on Misconceptions and Educational Strategies in Science and Mathematics, Ithaca, NY, ed. by Joseph D. Novak. Ithaca, NY: Cornell University Press, 1987, pp. 386-391.
Plato. Plato’s Cosmology: The Timaeus of Plato. Trans. by Francis M. Cornford. London: Loeb, 1937.

Posner, G. J., K. A. Strike, P. W. Hewson, and W. A. Gertzog. “Accommodation of a scientific conception: Toward a theory of conceptual change.” Science Education 66 (2 1982): 211-227.
Prather, J. Preston. Philosophical Examination of the Problem of Unlearning of Incorrect Science Concepts. National Association for Research in Science Teaching, 1985. ERIC ED256570.
Reiner, Miriam, and Menahem Finegold. “Changing students’ explanatory frameworks concerning the nature of light using real-time computer analysis of laboratory experiments and computerized explanatory simulation of e.m. radiation.” In 2nd International Seminar on Misconceptions and Educational Strategies in Science and Mathematics, Ithaca, NY, ed. by Joseph D. Novak. Ithaca, NY: Cornell University Press, 1987, pp. 368-377.
Rhoneck, Christoph von, and Karl Grob. “Representation and problem solving in basic electricity, predictors for successful learning.” In 2nd International Seminar on Misconceptions and Educational Strategies in Science and Mathematics, Ithaca, NY, ed. by J.D. Novak. Ithaca, NY: Cornell University Press, 1987, pp. 564-577.
Rollins, M. M., J. J. Denton, and D. L. Janke. “Attainment of Selected Earth Science Concepts by Texas High School Seniors.” Journal of Educational Research 77 (1983): 81-88.
Roth, Kathleen J. “Conceptual change learning and processing of science texts.” Paper presented at the American Educational Research Association, Chicago, 1985a.
Roth, K.J. “The effect of science texts on students’ misconceptions about food for plants.” Ph.D. dissertation, Michigan State University, 1985b.
Russo, Richard. “Shoot the Stars—Focus on the Earth’s Rotation.” The Science Teacher (February 1988): 25-26.
Rutherford, F. James. “Lessons from Five Countries.” In Science Education in Global Perspective, ed. by Margrete Siebert Klein and F. James Rutherford. Boulder, CO: Westview Press, 1985, pp. 207-231.
Sadler, Philip M. “Misconceptions in Astronomy.” In 2nd International Seminar on Misconceptions and Educational Strategies in Science and Mathematics, Ithaca, NY, ed. by Joseph D. Novak. Ithaca, NY: Cornell University Press, 1987, pp. 422-425.
Sadler, Philip M., and William M. Luzader. “The Teaching of Astronomy.” In International Astronomical Union Colloquium 105, Williams College, Williamstown, MA, ed. by Jay M. Pasachoff and John R. Percy. New York: Cambridge University Press, 1988, pp. 257-276.
Schatz, Dennis, Andrew Fraknoi, R. Robert Robbins, and Charles D. Smith. Effective Astronomy Teaching and Student Reasoning Ability. Berkeley, CA: Lawrence Hall of Science, 1978.
Schoon, Kenneth J. “Misconceptions in Earth and Space Sciences: A Cross-Age Study.” Ph.D. dissertation, Loyola University, 1988.

Shipstone, D. M., C. Rhoneck, W. Jung, C. Karrqvist, J.J. Dupin, S. Joshua, and P. Licht. “A study of students’ understanding of electricity in five European countries.” European Journal of Science Education (1987).
Shymansky, James A., William C. Kyle, Jr., and Jennifer M. Alport. “How effective were the hands-on science programs of yesterday?” Science and Children (November/December 1982).
Slinger, Lucille A. Studying light in the fifth grade: A case study of text-based science teaching. Institute for Research on Teaching, Michigan State University, 1982. Research Series No. 129.
Smith, Deborah. “Primary teachers’ misconceptions about light and shadows.” In 2nd International Seminar on Misconceptions and Educational Strategies in Science and Mathematics, Ithaca, NY, ed. by Joseph D. Novak. Ithaca, NY: Cornell University Press, 1987, pp. 461-476.
Sneider, Cary, and S. Pulos. “Children’s cosmographies: understanding the Earth’s shape and gravity.” Science Education 67 (2 1983): 205-222.
Sonnier, Isadore L. “A study of the number of selected ideas in astronomy found in earth science curriculum project materials being taught in college and university astronomy courses.” Ed.D. dissertation, Colorado State College, 1966.
Stead, B.F., and R.J. Osborne. “Exploring science students’ concepts of light.” Australian Science Teachers Journal 26 (3 1980): 84-90.
Stead, K.E., and R.J. Osborne. “What is gravity? Some children’s ideas.” New Zealand Science Teacher 30 (1981): 5-12.
Targan, David. “A study of conceptual change in the target domain of the lunar phases.” In 2nd International Seminar on Misconceptions and Educational Strategies in Science and Mathematics, Ithaca, NY, ed. by Joseph D. Novak. Ithaca, NY: Cornell University Press, 1987, pp. 499-511.
Thijs, Gerard D. “Conceptions of force and movement: intuitive ideas of pupils in Zimbabwe in comparison with findings from other countries.” In 2nd International Seminar on Misconceptions and Educational Strategies in Science and Mathematics, Ithaca, NY, ed. by J.D. Novak. Ithaca, NY: Cornell University, 1987, pp. 501-513.
Touger, J. S. “Students’ conceptions about planetary motion.” Paper presented at the American Association of Physics Teachers, 1985.
Toulmin, Stephen, and June Goodfield. The Fabric of the Heavens. London: Hutchinson, 1967.
Treagust, David F. “Evaluating students’ misconceptions by means of diagnostic multiple-choice items.” Research in Science Education (1986): 363-369.
Treagust, David F., and Clifton L. Smith. “Secondary students’ understanding of the solar system: implications for curriculum revision.” In GIREP Conference, Cosmos: An Educational Challenge, Copenhagen, ed. by J. Hunt. European Space Agency, 1986, pp. 363-369.

Troost, Kay Michael. “Science education in contemporary Japan.” In Science Education in Global Perspective, ed. by Margrete Siebert Klein and F. James Rutherford. Boulder, CO: Westview Press, 1985, pp. 13-66.
Tuckman, Bruce W. Conducting Educational Research, 3d ed. New York: Harcourt, Brace, Jovanovich, 1988.
Velleman, Paul F. Data Desk Handbook. Northbrook, IL: Odesta Corporation, 1988.
Velleman, Paul F., and David C. Hoaglin. Applications, Basics, and Computing of Exploratory Data Analysis. Boston: Duxbury Press, 1981.
Viglietta, M. L. “Earth, sky and motion. Some questions to identify pupil ideas.” In GIREP Conference, Cosmos: An Educational Challenge, Copenhagen, ed. by J. Hunt. European Space Agency, 1986, pp. 369-370.
Vincentini-Missoni, M. “Earth and gravity: Comparison between adults’ and children’s knowledge.” In Problems Concerning Students’ Representation of Physics and Chemistry Knowledge, University of Frankfurt, ed. by W. Jung, 1981.
Vosniadou, Stella, and William F. Brewer. “Theories of knowledge restructuring in development.” Review of Educational Research 57 (1 1987): 51-67.
Wandersee, James H. “Can the history of science help science educators anticipate students’ misconceptions?” Journal of Research in Science Teaching (1986).
Watts, D. Michael. “Gravity—don’t take it for granted!” Physics Education 17 (1982): 116-121.
Watts, D. Michael. “Student conceptions of light: a case study.” Physics Education 20 (4 1985): 183-187.
Weintraub, S. “Reading graphs, charts, and diagrams.” The Reading Teacher 20 (1967): 345-349.
Weiss, Iris. Report of the 1985-86 National Survey of Science, Mathematics, and Social Studies Education. 1987a.
Weiss, Iris R. Report of the 1985-86 National Survey of Science and Mathematics Education. Research Triangle Institute, 1987b. RTI/2938/00-FR.
Welch, W. W., L. J. Harris, and R. E. Anderson. “How many are enrolled in science?” The Science Teacher 51 (9 1984).
Wise, K. C., and J. R. Okey. “A meta-analysis of the effects of various science teaching strategies on achievement.” Journal of Research in Science Teaching 20 (1983): 419-435.
Za’rour, George I. “Interpretation of natural phenomena by Lebanese school children.” Science Education 60 (1976): 277-287.

X. Bibliography

Aikenhead, Glen Stirton. “The measurement of knowledge about science and scientists: An investigation into the development of instruments for formative evaluation.” Ed.D. dissertation, Harvard University Graduate School of Education, 1972.
Anderson, B., and C. Karrqvist. “How Swedish pupils, aged 12-15 years, understand light and its properties.” European Journal of Science Education 5 (4 1983a): 387-402.
Anderson, C. W., and E. L. Smith. “Teacher behavior associated with conceptual learning in science.” Paper presented at the American Educational Research Association, Montreal, 1983.
Arons, A. B. “Addressing students’ conceptual and cognitive needs.” In Computers in Physics Instruction, Raleigh, NC, ed. by Edward F. Redish and John S. Risley. Reading, MA: Addison-Wesley, 1988, pp. 301-308.
Arons, Arnold. A Guide to Introductory Physics Teaching. New York: John Wiley & Sons, 1990.
Ausubel, David P. “An evaluation of the ‘Conceptual Schemes’ approach to science curriculum development.” Journal of Research in Science Teaching 3 (1965): 255-264.
Balaco, M.R. “Test development related to the understanding of basic chemistry and its application to societal problems: For ChemCom curriculum (pilot study).” Dissertation Abstracts International 46 (9 1986): 2647-A.
Bishop, Roy L. “Multiple-choice questions.” In International Astronomical Union Colloquium 105, Williams College, Williamstown, MA, ed. by Jay M. Pasachoff and John R. Percy. Cambridge University Press, 1988, pp. 83-87.
Blosser, Patricia E. Secondary School Students’ Comprehension of Science Concepts: Some Findings from Misconception Research. ERIC, 1987. ERIC/SMEAC Science Education Digest 2.
Bogdan, Robert C., and Sari Knopp Biklen. Qualitative Research for Education. Boston: Allyn and Bacon, 1982.
Bowers, Roald Walker. “Effects of natural science courses upon Harvard College freshmen.” Ed.D. dissertation, Harvard University Graduate School of Education, 1952.
Bransford, J. D., and N. S. McCarrell. “A sketch of a cognitive approach to comprehension.” In Cognition and the Symbolic Processes, ed. by W. B. Weimer and D. S. Palermo. Hillsdale, NJ: Erlbaum, 1974.
Bruner, Jerome. Actual Minds, Possible Worlds. Cambridge: Harvard University Press, 1986.
Cain, Peggy W., and Daniel W. Welch. Astronomy Activities for the Classroom. South Carolina State Department of Education, 1980. Teaching Guide ED199062 SE034287.
Cangelosi, James S. Designing Tests for Evaluating Student Achievement. New York: Longman, 1990.

Carter, Carolyn, and George Bodner. “How student misconceptions of the nature of chemistry and mathematics influence problem solving.” In The Second International Seminar: Misconceptions and Educational Strategies in Science and Mathematics. Cornell University (Department of Education), 1987, pp. 69-83.
Cauldwell, Loren T. “A determination of earth science principles desirable for inclusion in science programs of general education in the secondary school.” Ph.D. dissertation, Indiana University, 1953.
Cavena, G. R., and W.H. Leonard. “Extending discretion in high school science curricula.” Science Education 69 (5 1985): 593-603.
Champagne, Audrey B., and Leslie E. Hornig. The Science Curriculum. American Association for the Advancement of Science, 1987.
Champagne, Audrey B., and Leopold E. Klopfer. “Research in science education: The cognitive psychology perspective.” In Research within Reach: Science Education, ed. by David Holdzkom and Pamela B. Lutz. Washington, DC: National Science Teachers Association, 1985, pp. 171-189.
Champagne, Audrey B., Richard F. Gunstone, and Leopold E. Klopfer. Effecting Changes in Cognitive Structures among Physics Students. ERIC, 1983. ED 229 238.
Cohen, Edward G. Attitude toward Science and Astronomy. 1980a.
Cohen, Rosalie. “Conceptual styles, culture conflict, and nonverbal tests of intelligence.” American Anthropologist 71 (1969): 828-856.
Collis, K.F., and H.A. Davey. “A technique for evaluating skill in high school science.” Journal of Research in Science Teaching 23 (1986): 651-663.
Committee on Research in Mathematics, Science, and Technology Education. Interdisciplinary Research in Mathematics, Science, and Technology Education. Washington, DC: National Academy Press, 1987.
Crawley, F., and S. Arditzoglou. “Life and physical science misconceptions of preservice elementary teachers.” Paper presented at the School Science and Mathematics Association, 1988.
Cronbach, Lee J. Essentials of Psychological Testing, 5th ed. New York: Harper & Row, 1990.
Czujko, Roman, and David Bernstein. Who Takes Science: A Report on Student Coursework in High School Science and Mathematics. American Institute of Physics, 1989. AIP R-345.
Driver, Rosalind. “Pupils’ alternative frameworks in science.” European Journal of Science Education 3 (1 1981): 93-101.
Driver, Rosalind. “The pupil as scientist.” In Physics Teaching, GIREP, Philadelphia, ed. by Uri Ganiel. Balaban International Science Services, 1980, pp. 331-345.
Evans, Alan D. “Implementation and validation of a new course in introductory astronomy at the college level.” ERIC ED162649 (1978): 1-10.
Feldstine, J. N. “Concept mapping: A method for detection of possible student misconceptions.” In International Seminar on Misconceptions and Educational Strategies in Science and Mathematics, Ithaca, NY, ed. by Joseph D. Novak and H. Helm. Ithaca, NY: Cornell University Press, 1983.
Gabel, D. L. “Research interests of secondary science teachers.” Journal of Research in Science Teaching 23 (2 1986): 145-163.
Gee, Brian. “Astronomy in School Science.” School Science 82 (1979): 31.
Gibbs, Robert E. “Observing the sky.” Department of Physics, Eastern Washington University, 1989.
Gilbert, John K., and Roger J. Osborne. “Children’s Science and Its Consequences for Teaching.” Science Education 66 (4 1982): 623-633.
Gilman, D., I. Hernandez, and R. Cripe. “The correction of general science misconceptions as a result of feedback mode in computer assisted instruction.” In Proceedings of the National Association for Research in Science Teaching, Minneapolis, 1970.
Gorodetsky, Malka, and Esther Gussarsky. “The role of students and teachers in misconceptualization of aspects of ‘chemical equilibrium’.” In The Second International Seminar: Misconceptions and Educational Strategies in Science and Mathematics. Cornell University (Department of Education), 1987, pp. 187-193.
Gregory, Bruce. Inventing Reality. New York: John Wiley & Sons, 1988.
Hale-Benson, Janice E. Black Children: Their Roots, Culture, and Learning Styles. Baltimore: Johns Hopkins University Press, 1986.
Happs, J. C., and L. Scherpenzeel. “Achieving long term change using the learner’s prior knowledge and a novel teaching setting.” In 2nd International Seminar on Misconceptions and Educational Strategies in Science and Mathematics, Ithaca, NY, ed. by Joseph D. Novak. Ithaca, NY: Cornell University Press, 1987.
Harris, D. “The place of Astronomy in schools.” Physics Education 17 (4 1982): 154-157.
Hawkins, David. “Critical Barriers to Science Learning.” Outlook 9 (1978): 3.
Healy, Mary K. “Writing in a science class: A case study of the connection between writing and learning.” Ph.D. dissertation, New York University, 1984.
Hill, Lon Clay, Jr. “Spatial thinking and learning astronomy: The implicit visual grammar of astronomical paradigms.” In International Astronomical Union Colloquium 105, Williams College, Williamstown, MA, ed. by Jay M. Pasachoff and John R. Percy. Cambridge University Press, 1988, pp. 247-248.
Hoff, D. B., L. J. Kelsey, and J. S. Neff. Activities in Astronomy, 2d ed. Dubuque, IA: Kendall/Hunt, 1984.
Holdzkom, David, and Pamela B. Lutz. Research within Reach: Science Education. National Science Teachers Association, 1985.

Holton, Gerald, F. James Rutherford, and Fletcher G. Watson. Project Physics. New York: Holt, Rinehart and Winston, 1981.
Howe, Ann C., and Bessie Stanback. “ISCS in review.” Science Education 69 (1 1985): 25-37.
Idar, J., and U. Ganiel. “Learning difficulties in high school physics: Development of a remedial teaching method and assessment of its impact on achievement.” Journal of Research in Science Teaching 22 (2 1985): 127-140.
Jackson, D., B.J. Edwards, and C.F. Berger. “Teaching the design and interpretation of graphs through computer-aided graphical data analysis.” In National Association for Research in Science Teaching, Atlanta, GA, 1990.
Janke, Delmar L., and Milton O. Pella. “Earth science concepts list for grades K-12 curriculum construction and evaluation.” Journal of Research in Science Teaching 9 (3 1972): 223-230.
Lyman, Howard B. Test Scores and What They Mean. Englewood Cliffs, NJ: Prentice-Hall, 1978.
Malone, Thomas W. “Toward a theory of intrinsically motivating instruction.” Cognitive Science 4 (1981): 333-369.
National Science Board Commission on Precollege Education in Mathematics, Science and Technology. Educating Americans for the 21st Century. National Science Foundation, 1985.
McCarthy, Francis Wadsworth. “Age placement of selected science subject matter.” Ed.D. dissertation, Harvard University Graduate School of Education, 1951.
McDermott, Lillian C., Mark L. Rosenquist, and Emily H. van Zee. “Student difficulties in connecting graphs and physics: Examples from kinematics.” American Journal of Physics 55 (6 1987): 503-513.
McNally, D. “Astronomy at school.” Physics Education 17 (4 1982): 157-160.
Miller, David, ed. Popper Selections. Princeton, NJ: Princeton University Press, 1985.
Miller, Patrick W., and Harley E. Erickson. How to Write Tests for Students. Washington, DC: National Education Association, 1990.
Minstrell, Jim. “Explaining the ‘at rest’ condition of an object.” The Physics Teacher (January 1982): 10-14.
Moore, R., and F. Sutman. “The development, field test and validation of an inventory of scientific attitudes.” Journal of Research in Science Teaching 7 (1970): 85-94.
Munby, H. “Studies involving the Scientific Attitude Inventory: What confidence can we have in this instrument?” Journal of Research in Science Teaching 20 (1983): 141-162.
Nisbett, Richard E., Geoffrey T. Fong, Darrin R. Lehman, and Patricia W. Cheng. “Teaching Reasoning.” Science 238 (1987): 625-631.
Novak, Joseph D., and D. Bob Gowin. Learning How To Learn. Cambridge: Cambridge University Press, 1984.

Nussbaum, Joseph, and Joseph Novak. “Alternative frameworks, conceptual conflict and accommodation: Toward a principled teaching strategy.” Instructional Science 11 (1982): 183-200.
Nussbaum, Joseph, and Joseph Novak. “An assessment of children’s concepts of the earth utilizing structured interviews.” Science Education 60 (4 1976): 535-550.
Omar, Abdulaziz Saud. “The effect of using diagnostic-prescriptive teaching on achievement in science of Saudi Arabian high school students.” Ph.D. dissertation, University of Kansas, 1984.
Osborne, R. J. “Some aspects of the student’s view of the world.” Research in Science Education 10 (1980): 11-18.
Osborne, R. J., and J. K. Gilbert. “A technique for exploring students’ views of the world.” Physics Education 15 (1980): 376-379.
Osborne, Roger J., and Peter Freyberg. Learning in Science: The Implications of Children’s Science. Auckland, New Zealand: Heinemann, 1985.
Othman, Mazlan. “Influence of culture on understanding astronomical concepts.” In International Astronomical Union Colloquium 105, Williams College, Williamstown, MA, ed. by Jay M. Pasachoff and John R. Percy. Cambridge University Press, 1988, pp. 239-240.
Pearson, P.D., J. Hansen, and C. Gordon. “The effect of background knowledge on young children’s comprehension of explicit and implicit information.” Journal of Reading Behavior 11 (1979): 201-209.
Piaget, Jean. The Child’s Conception of the World. London: Routledge and Kegan Paul, 1929.
Sadler, Philip M. “Astronomy in U.S. High Schools.” In GIREP Conference, Cosmos: An Educational Challenge, Copenhagen, ed. by J. Hunt. European Space Agency, 1986, pp. 261-264.
Schneps, Matthew H., and Philip M. Sadler. A Private Universe. Pyramid Films, 1988.
Schatz, Dennis, and Anton E. Lawson. “Effective astronomy teaching: Intellectual development and its implications.” Mercury (July/August 1976): 6-13.
Scheffler, Israel. Science and Subjectivity. Indianapolis: Hackett Publishing, 1982.
Seeds, Michael A. Foundations of Astronomy, 2d ed. Belmont, CA: Wadsworth, 1988.
Smith, E.L. “Teaching for conceptual change: Some ways of going wrong.” In Proceedings of the International Seminar on Misconceptions in Science and Mathematics, Cornell University, Ithaca, NY, ed. by H. Helm and J. Novak. Ithaca, NY: Cornell University Press, 1983.
Smith, Murray R. “Astronomy in the native-oriented classroom.” Journal of American Indian Education 23 (2 1984): 16-23.
Snydle, Richard W., and John F. Koser. “An activity-oriented astronomy course.” Science Activities 10 (3 1973): 16-18.

Solomon, Joan. “Messy, contradictory, and obstinately persistent: A study of out-of-school ideas about energy.” School Science Review 65 (23 1983): 225-229.
Stepans, J., and A. McCormack. “A study of scientific conceptions and attitudes toward science of prospective elementary teachers.” Paper presented at the Northern Rocky Mountain Educational Research Association, Jackson Hole, WY, 1985.
Sunal, Dennis W., and V. Carol Demchik. Astronomy Education Materials Resource Guide, 3d ed. Morgantown: West Virginia University Bookstore, 1985.
The College Board. Academic Preparation in Science. Academic Preparation Series. New York: College Board Publications, 1990.
Tinkelman, S. “Planning the objective test.” In Educational Measurement, 2d ed., ed. by R. Thorndike. Washington, DC: American Council on Education, 1971.
Tremblath, R.J. “The frequencies and origins of scientific misconception.” Ph.D. dissertation, University of Texas at Austin, 1980.
Unger, Christopher Matthew. “Conceptual change in science instruction: How might interactive, computer-based models help?” Ed.D. Qualifying Paper, Harvard University, 1988.
Watson, Fletcher G. “Astronomy at the upper school level.” Annals of the New York Academy of Sciences 198 (1972): 173-177.
Welch, Wayne. “Twenty years of science curriculum development: A look back.” In Review of Research in Education, ed. by D. Berlinger. Washington, DC: American Educational Research Association, 1979.
Wittrock, M.C. “Learning as a generative process.” Educational Psychologist 11 (1974): 87-95.
Zeilik, Michael, II. “PSI Astronomy unit: Astrology—The space age science?” American Journal of Physics 42 (7 1974): 538-542.

Appendices
A. School Data
B. Pre-test Instrument
C. P-Value, D-Value Tables
D. Classical Test Theory Tables
E. Item Correlation Matrix
F. Chi-Square Analysis

Vitae
Philip Michael Sadler
10 Carver Street, Cambridge, MA 02138

Education:
Massachusetts Institute of Technology, Cambridge, MA: B.S. Physics, 1973
Harvard Graduate School of Education, Cambridge, MA: Ed.M., 1974
Harvard Graduate School of Education, Cambridge, MA: Candidate for Ed.D., 1992

Professional Employment:
Instructor, Harvard Graduate School of Education, 9/91-present
Frances W. Wright Lecturer on Navigation, Harvard University, 1/90-present
Director, Education Department, Harvard-Smithsonian Center for Astrophysics, 2/90-present
Project Director of these NSF education projects:
  MicroObservatory, development of a robotic telescope for school use, 3/90-present
  InSIGHT, development of advanced simulations for introductory physics, 3/90-present
  SPICA, summer institutes to train astronomy workshop leaders, 5/90-present
  Project STAR, development of a high school-level astronomy course, 12/85-5/92
Vice President and Co-Founder, Peripheral and Software Marketing Inc., Newton, MA, 10/82-12/85
Vice President and Co-Founder, Computer Products Marketing Inc., Newton, MA, 8/81-12/85
President (presently on leave) and Founder, Learning Technologies Inc., Cambridge, MA, 7/77-present
Science Teacher (grades 7 and 8) and Coordinator, Carroll School, Lincoln, MA, 9/74-6/77
Staff Developer, Calculus Project, Education Development Center, Newton, MA, 7/73-8/74
Staff Member, Mathematics Project, Education Research Center, MIT, 9/71-6/73

Consulting Experience:
Bolt, Beranek, and Newman, Cambridge, MA, 1991-present
Cambridge Public Schools Science Advisory Board, Cambridge, MA, 1988-present
Boston Children’s Museum, Boston, MA, 1988-present
Science Museum of Virginia, Richmond, VA, 1988-present
Lawrence Hall of Science, Berkeley, CA, 1988-present
Apple Computer, Cupertino, CA, 1981-84
National Council of Science Museums, Calcutta, India, 1981
Children’s Television Workshop, New York, NY, 1977-78

Other Teaching Positions:
Summer Science Institute, Independent Schools Association, Concord, MA, 1987-90
Workshop Leader, National Air and Space Museum, Washington, DC, 1987-89
Workshop Leader, National Science Resources Center Summer Institute, Smithsonian Institution, Washington, DC, 1988
Amplification ’86, Mathematics Teaching Institute, Harvard Graduate School of Education, Cambridge, MA, 1986

Honors and Awards:
Margaret Noble Address, Middle Atlantic Planetarium Society, May 1991
Representative, U.S.-Soviet Commission on Education, National Academy of Education, May 1988
Executive Producer, “A Private Universe,” Pyramid Films:
  Blue Ribbon, American Film and Video Association, 1990
  Gold Medal, Documentary, Houston International Film Festival, Houston, TX, 1988
  Gold Plaque Award, Chicago International Film Festival, 1989
  Silver Apple, National Educational Film and Video Festival, Seattle, WA, 1987
Representative of the Year (Worldwide), Apple Computer, 1982

Patents:
4,164,829 Inflatable Structure, 8/21/79
4,178,701 Cylindrical Projector, 12/18/79

Teacher Certification:
Massachusetts Certificate #183612, General Science, Physics, Mathematics, grades 7-12, 8/12/74-life

Professional Memberships:
American Association for the Advancement of Science
American Association of Physics Teachers
American Astronomical Society
Association of Science-Technology Centers
Astronomical Society of the Pacific
International Planetarium Society
National Science Teachers Association