Journal of Literacy Research

Some Observations on the Reliability of Measures Used in Reading and Typographic Research
James Hartley, Susan Fraser and Peter Burnhill
Journal of Literacy Research 1975 7: 283
DOI: 10.1080/10862967509547146
The online version of this article can be found at: http://jlr.sagepub.com/content/7/3/283

Published by: http://www.sagepublications.com

On behalf of: Literacy Research Association

Version of Record - Sep 1, 1975

SOME OBSERVATIONS ON THE RELIABILITY OF MEASURES USED IN READING AND TYPOGRAPHIC RESEARCH (a)

James Hartley (b) and Susan Fraser

Peter Burnhill

Department of Psychology, University of Keele

Stafford College of Further Education

Abstract. This study assessed the reliability of nine different measures used in reading and typographic research. Test-retest correlations were calculated for university students and schoolchildren, both male and female. It was clear that some measures were more reliable than others: for example, oral reading was highly reliable, but comprehension was not. The results are discussed with reference to the objectives of different types of measure.

Much research in typography is concerned with the comparison study. Two (or more) different layouts are printed and their relative effectiveness is decided by comparing scores on measures of the ways in which readers react, read and understand. The purpose of this paper is to comment on the reliability of these different measures.

In our own research we have been interested in how layout affects the behaviour of people reading materials prepared for instructional purposes (Hartley et al., 1973; Burnhill and Hartley, 1974). We have been much vexed by the problem of which measures to use to assess differences between layouts, and by how reliable these measures might be. In addition we have been concerned by sex differences, because so few investigators seem to have attached much importance to them and yet they have been clearly present in many of our investigations. As a preliminary exercise to our main research, therefore, we have been gathering data on sex differences and on the reliability of different measures that are commonly used to assess differences between typographical layouts. The purpose of this paper is to present these data.

(a) The authors are grateful to the participating children and students, to the school staff who assisted with our investigations, and to the Social Science Research Council who financed our research.

(b) Reprints may be requested from James Hartley, University of Keele, Dept. of Psychology, Keele, Staffordshire, ST5 5BG.

Journal of Reading Behavior 1975 VII, 3

METHOD

Our procedure has been similar for most of the measures we wish to discuss, and, therefore, it can be briefly outlined here. Full details of the actual materials used, the composition of the experimental subjects, and the procedures employed are provided in the appendix to this paper.

Basically our aim has been to take a measure and to ask readers first to practice it, and then to do it twice more with very little delay between each attempt. We have then calculated the correlation between the scores obtained on the second and third attempts.

Three points to note here are:

(1) Two groups of subjects were used, university students and schoolchildren with a wide ability range, often all the children of a given age in a particular school. (The schoolchildren were aged from nine to twelve years.)

(2) The difficulty of the subject matter of the materials used for each measure was equated (e.g. by using two passages or two parts of a passage by the same author) and difficulty was also tailored to the ability of the readers by using appropriate subject matter. (The actual materials used are described in the appendix.)

(3) The typographical dimensions of the materials used for each test-retest measure were held constant, although the subject matter would vary. (Our aim was to ascertain whether or not a measure was reliable. If a measure can be shown to be reliable, one may then proceed to see if this measure also reliably detects differences between layouts. As will be argued below, a measure may be reliable, but it may not be sensitive to differences between typographical layouts.)

RESULTS

The major results from our studies are presented in Table 1. In all we have tested the reliability of nine measures, and in Table 1 we have grouped the results in terms of different types of measure. These results speak fairly clearly for themselves. What follows in this paper are some of our observations concerning each of the measures tested.
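The test-retest calculation described under METHOD (correlating each reader's score on the second attempt with the score on the third) can be sketched as follows. This is only an illustration: it assumes a Pearson product-moment correlation, which the paper does not specify, and the function name `pearson_r` and the reading times are hypothetical and are not data from this study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one set of scores has no variance")
    return sxy / sqrt(sxx * syy)


# Hypothetical reading times in seconds for six readers (NOT data from this study).
# The practice attempt is discarded; only attempts two and three are correlated.
second_attempt = [112, 95, 140, 128, 101, 150]
third_attempt = [108, 97, 135, 125, 99, 149]

reliability = pearson_r(second_attempt, third_attempt)
print(f"test-retest r = {reliability:.2f}")
```

A coefficient near 1 would indicate that readers keep roughly the same rank order and spacing across the two attempts. It says nothing about whether the measure is sensitive to differences between layouts, which is a separate question.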
DISCUSSION

The measures listed in Table 1 are divided into four main groups: oral reading, retrieval (scanning), silent reading, and comprehension.

Measures concerned with oral reading

The reliability coefficients obtained for rate of reading aloud, and rate of reading aloud inverted text, were the highest recorded in this study. Three observations here, however, are as follows:

(1) Although performance is reliable, it may not be sensitive: in other words the effects of differences between typographical layouts may not be detected by using these measures. Indeed, in a separate study (Hartley et al., 1973) we found that oral reading did not detect differences between typographic layouts. Our observations suggested that the behaviour of a student reading aloud was rather like that of the motorist who drives at a steady rate despite the road conditions: physical restrictions of comfort seemed to determine the rate of output.

(2) Although rate of reading inverted text is highly reliable, this does not indicate that it is a useful measure. In addition to the problem of lack of sensitivity (as described above), the correlation between rate of reading inverted and normal text was (with the university students) 0.18. This finding has been replicated in two unpublished studies reported by groups of the authors' students. Such findings indicate that it would not be valid to use rate of reading aloud inverted text as a measure for detecting the effects of differences between layouts (and, incidentally, they point a criticism at the work of investigators such as Kolers (1972) who use different orientations of printed text in their studies of reading).

(To the reader who may be wondering at this point just why this measure was used at all: the aim was to try to slow down the reading process so that it could be more clearly observed. It was anticipated that such a slowing down would allow the effects of differences between layouts to be detected more easily. Although this may happen, the validity of making inferences from such results, as has been indicated in the above paragraph, may be severely questioned.)

(3) The third observation is that there were intriguing sex differences in these studies which are not fully revealed by the similar sized correlation coefficients obtained for males and females: this was particularly true with the reading aloud of inverted text. The mean time taken to read approximately 200 words printed upside down for the third time by the men students was 271 secs., whereas it was 151 secs. for the women students. However, as the standard deviation was 144 secs. for the men and 65 secs. for the women, this difference (with this N) was not significant (t=2.05; .05