Ophthal. Physiol. Opt. Vol. 21, No. 1, pp. 30–35, 2001 © 2000 The College of Optometrists. Published by Elsevier Science Ltd. All rights reserved. Printed in Great Britain 0275-5408/00/$20.00 www.elsevier.com/locate/ophopt
Clinical grading of corneal staining of non-contact lens wearers
Morven Dundas 1, Alyson Walker 1 and Russell L. Woods 2
1 Private Practice, Glasgow, Scotland, UK, and 2 Schepens Eye Research Institute, Harvard Medical School, 20 Staniford Street, Boston, MA 02114, USA
Summary
To distinguish normal from pathological corneal fluorescein staining requires knowledge of background levels of staining among otherwise healthy individuals. Corneal staining of 102 non-contact lens wearing subjects was assessed using a photographic grading scale that uses a generic (0 to 4) scale to score corneal staining. Some degree of corneal staining was found on 79% of the corneas. Low inter-observer variability suggests that the corneal staining grading scale can be used successfully with decimal rather than integer scale increments. © 2000 The College of Optometrists. Published by Elsevier Science Ltd. All rights reserved.
Typically, the absence of a sign (i.e. no staining) is given a grade of zero, and numbers up to 4 are used to describe increasing levels of the sign (i.e. grade 4 staining is extreme and requires immediate intervention) (Woods, 1989). Expansion of the grading scale from these five levels (e.g. by using decimals) should increase discriminability (Bailey et al., 1991). However, while a small expansion of the five-level scale may be effective (Lloyd, 1992; Schwallie et al., 1997), a recent report suggests that inter-observer variability is a limiting factor (Efron, 1998). One method of describing the background level of staining is to measure the prevalence in a population known to have no obvious cause for staining (i.e. disease or contact lens wear). Reported prevalence of corneal staining among healthy non-contact lens wearers has varied between 4 and 78% (Caffery and Josephson, 1991; Josephson and Caffery, 1988; Korb and Exford Korb, 1970; Korb and Herman, 1979; Norn, 1970; Schwallie et al., 1997; Soni et al., 1996; Thomas et al., 1997). The earlier studies used methods of defining staining that are not open to other practitioners as the definition of staining was based on the opinion of the observer. For example, if a further 63 subjects who exhibited staining that Korb and Exford Korb (1970) did not consider significant are included, prevalence rises from 37 to 58%. The inclusion of subjects with 'significant' staining only rather than subjects with any staining may explain why the earlier studies reported a lower prevalence of staining (4–37%) than Schwallie et al. (1997) who used a clinical grading scale (78%). Schwallie et al. examined a small
Introduction
Fluorescein has been used widely for many years to assess corneal integrity. Though the precise nature of fluorescein staining is uncertain (Caffery and Josephson, 1991; Thomas et al., 1997; Wilson et al., 1995), it is commonly accepted that fluorescein staining represents compromised corneal epithelium. Common causes of staining in clinical practice include contact lens wear, keratopathies and complications of systemic disease. However, staining can occur without any obvious cause (Korb and Exford Korb, 1970; Norn, 1970). Better knowledge of the characteristics of this background staining and better methods of quantifying staining will improve the detection of corneal compromise (a signal detection task). In recent years, photographic (Cornea and Contact Lens Research Unit, 1996) and pictorial (Efron, 1997) clinical grading scales, similar to those used in clinical research for more than a decade, have become available to practitioners. These grading systems allow quantification of clinical signs using a generic ordinal scale (Woods, 1989).
Received: 10 August 1999 Revised form: 3 February 2000 Accepted: 7 February 2000
Correspondence and reprint requests to: R. L. Woods, Schepens Eye Research Institute, Harvard Medical School, 20 Staniford Street, Boston, MA 02114, USA. Tel.: +1-617-912-2589; fax: +1-617-912-0111. E-mail address: [email protected]
(R. L. Woods).
Clinical grading of corneal staining: M. Dundas et al.
Figure 1. Each of the five corneal zones was graded separately for depth and extent of fluorescein staining.
number of subjects (n = 16) over an extended period, whereas the earlier studies were conventional cross-sectional studies. However, Soni et al. (1996), using a grading scale, reported only 6% staining among adolescents. Other methodological differences include the fluorescein (e.g. concentration, mode of instillation), observation timing, equipment and sample characteristics. No study using a clinical grading scale has reported the prevalence of staining or the distribution of staining grades in a large sample population. To evaluate background staining, we conducted a cross-sectional study defining staining using the CCLRU photographic grading scale (Cornea and Contact Lens Research Unit, 1996). Also, we measured inter-observer variability when the grading scale was interpolated to a decimal scale.
Methods
Subjects
The sample consisted of 102 subjects (57 female, 45 male), mainly staff and students at a university in Scotland. All subjects were healthy non-contact lens wearers or had no contact lens wear within the previous 6 weeks. Six weeks is sufficient time for contact lens induced epithelial abrasions to heal (Duke Elder, 1973; Schwallie et al., 1997), and is consistent with that of other studies (Korb and Exford Korb, 1970; Korb and Herman, 1979; Schwallie et al., 1997). Median subject age was 22 years and age ranged from 18 to 50 years. Over 50 years of age, there are factors which could influence staining (Millodot, 1977; Millodot and Owens, 1984; Norn, 1970). Subjects with any anterior segment disease including blepharitis and conjunctivitis (Norn, 1970) or a history of corneal surgery (Xu et al., 1996) were excluded.
Procedure: prevalence study
Only the right eye of each subject was examined as a strong correlation between eyes has been reported (Begley et al., 1996). A single dose of fluorescein was administered to the superior bulbar conjunctiva with a Fluorete wetted with a single drop of unpreserved saline. The cornea was examined immediately by one of the two observers using a
slit-lamp biomicroscope with cobalt blue filter. Any observed staining was drawn on to a diagram that subdivided the cornea into five zones (Figure 1), as different zones have been reported to stain differently (Caffery and Josephson, 1991; Josephson and Caffery, 1992; Korb and Exford Korb, 1970; Korb and Herman, 1979; Schwallie et al., 1997; Thomas et al., 1997). The extent and depth of any staining observed were graded for each zone. The grading scale used was based on the CCLRU grading scale (a series of reference photographs). The two observers interpolated between the five grades in 0.1 increments (i.e. 0, 0.1, 0.2, 0.3, … 3.8, 3.9, 4) to increase the sensitivity of the scale as suggested by Bailey et al. (1991). Within each corneal zone, a grade for both depth and extent could range between 0 and 4. Therefore, the sum over the whole cornea of the grades for both extent and depth could range between 0 and 20 (i.e. 5 × 4). Before the study commenced, the three authors discussed grading strategies and compared the grades assigned to the corneal staining of human subjects, some of whom were contact lens wearers. The two observers (MD & AW) were trainee optometrists and the third author (RLW) an experienced user of clinical grading scales.
Procedure: inter-observer agreement study
The agreement between the two observers was evaluated before commencement of and at completion of the prevalence study. This assessed measurement reliability of the observers and, by inference, of the grading system. For this investigation both the extent and depth of staining grades were compared. Each observer independently (i.e. masked to the results of the other observer's examination), sequentially examined the same subject and graded any staining observed in the right eye. Fluorescein was instilled by the first observer only.
More staining is likely to be observed as the time between instillation and observation increases (Korb and Herman, 1979) and instillation of fluorescein by the second observer would introduce the complication of sequential staining (Thomas et al., 1997). To balance this source of error, the observers alternated the order of subject examination. No order effects were found in the data.
Data analysis
In addition to analyses of the 10 staining values (two scales, five corneal zones), we evaluated overall grading scores as suggested by Begley et al. (1996). For both extent and depth of staining we (1) summed the grades in the five areas; (2) averaged the grade in those zones containing staining; and (3) reported the maximum grade. Non-parametric statistical tests were used, as the grades were not normally distributed, being highly skewed towards zero. Also, it is unknown whether the CCLRU grading scale represents a truly
Ophthal. Physiol. Opt. 2001 21: No 1
Table 1. Prevalence of the overall grading scores of the 102 subjects

Grade    Extent                        Depth
         Sum    Average   Maximum      Sum    Average   Maximum
0        21     21        21           21     21        21
0.1      15     34        21           26     76        69
0.2      9      26        22           28     3         9
0.3      14     15        23           15     1         2
0.4      13     2         7            8      1         1
0.5      7      1         2            4      0         0
0.6      12     1         2            0      0         0
0.7      1      1         0            0      0         0
0.8      3      0         0            0      0         0
0.9      2      0         0            0      0         0
>1       1      1         2            0      0         0
>1       4      0         2            0      0         0
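As a worked check of the prevalence figures quoted in the text, the counts in Table 1 can be converted to proportions. The sketch below is illustrative only. It assumes that the six series of counts are, in order, the sum, average and maximum scores for extent followed by the same three for depth (an assignment inferred from the text and from the fact that sum ≥ maximum ≥ average, not one stated in the table as extracted), and it treats the last two grade bins simply as the tail above 0.9. The names `TABLE_1`, `count_at_or_above` and `prevalence_at_or_above` are hypothetical.

```python
# Counts of subjects per grade bin, transcribed from Table 1.
# Bins: 0, 0.1, ..., 0.9, then two tail bins above 0.9 (labels unclear in the source).
# ASSUMPTION: series order is extent (sum, average, maximum), then depth (same three).
TABLE_1 = {
    "extent_sum":     [21, 15, 9, 14, 13, 7, 12, 1, 3, 2, 1, 4],
    "extent_average": [21, 34, 26, 15, 2, 1, 1, 1, 0, 0, 1, 0],
    "extent_maximum": [21, 21, 22, 23, 7, 2, 2, 0, 0, 0, 2, 2],
    "depth_sum":      [21, 26, 28, 15, 8, 4, 0, 0, 0, 0, 0, 0],
    "depth_average":  [21, 76, 3, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    "depth_maximum":  [21, 69, 9, 2, 1, 0, 0, 0, 0, 0, 0, 0],
}
N_SUBJECTS = 102


def count_at_or_above(counts, first_bin):
    """Number of subjects whose score falls in bin index `first_bin` or higher."""
    return sum(counts[first_bin:])


def prevalence_at_or_above(counts, first_bin):
    """Proportion of the sample with a score in bin `first_bin` or higher."""
    return count_at_or_above(counts, first_bin) / sum(counts)


if __name__ == "__main__":
    # Bin index 1 is grade 0.1, so this is "any staining at all".
    any_staining = prevalence_at_or_above(TABLE_1["extent_maximum"], 1)
    print(f"any staining: {any_staining:.0%}")
    # Bin index 3 is grade 0.3, the alternative definition discussed in the text.
    for key in ("extent_maximum", "depth_maximum"):
        n = count_at_or_above(TABLE_1[key], 3)
        print(f"{key} >= 0.3: {n} subjects ({n / N_SUBJECTS:.0%})")
```

Under these assumptions, 81 of the 102 subjects (79%) have a non-zero grade, while a threshold of 0.3 on the maximum grade leaves 38 subjects for extent and 3 for depth.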
interval scale. Since 16 statistical tests were conducted (10 staining values and six overall scores), the Bonferroni correction was used. Thus, the statistical significance required was p < 0.003 (i.e. 0.05/16). To analyse the inter-observer agreement, frequency distributions of the discrepancy scores (i.e. observer one's grade − observer two's grade) were examined for: (1) all grades (i.e. in all five corneal zones); (2) averaged grade in those zones containing staining; and (3) the maximum grade. An evaluation of summed grades could be misleading, since the summed grade is measured on a scale that ranges from 0 to 20 not 0 to 4. This method of analysis makes certain assumptions about the data, in particular that the data is from an interval scale. These clinical grading scales are an ordinal system, but may not approximate an interval scale. In an interval scale, the difference between each subsequent level is identical (Stevens, 1951). For example, we cannot be sure that the difference between 0.5 and 1.0 units is the same as the difference between 3.0 and 3.5 units in these clinical grading scales. If the grading scale does not approximate an interval scale, inter-observer discrepancy scores are expected to vary with the level of measurement. In other words, if the grading scale approximates an interval scale, the discrepancy scores will be normally distributed, and the standard deviation of the discrepancy distribution will be independent of the range of the actual (raw) scores used in its determination. We
Figure 2. About half of the 102 subjects had fluorescein staining in the inferior and superior corneal zones, while few had central corneal staining.
found no systematic variation in our discrepancy scores (i.e. no correlation between the discrepancy score and the raw score). Therefore we completed the analysis of the discrepancy distributions in the manner recommended by Bland and Altman (1986) and estimated concordance from the standard deviations (Bailey et al., 1991).
Results
Some degree of fluorescein staining was found on 79% of the subjects' corneas (Table 1). Extent and depth of staining in each corneal zone were highly correlated (Spearman's correlation coefficient > 0.89, p < 0.0001). Half of the subjects had staining in the inferior or superior zones, but only 5% had staining in the central zone (Figure 2). For both extent and depth of staining, there was a significant difference between the corneal zones (Friedman one-way analysis of variance, p < 0.001). Frequency distributions for extent and depth grades in each of the five corneal zones are shown in Figure 3. As there was no significant difference in the extent or depth of staining found by the two observers (Mann–Whitney test, p > 0.02), we assume that there was no relative bias between observers in the allocation of grades. As shown in Table 1, a maximum grade or an average grade of greater than 0.5 units may be considered unusual. For each of the three measures (individual, average and maximum grade), we found no significant differences in the inter-observer agreement before and after the prevalence study. This suggests that the two observers were consistent in their grading during the prevalence study. There was no significant difference in inter-observer agreement between extent and depth grades. Therefore the data was combined. As shown in Figure 4, the discrepancy distributions for the three measures were apparently normally distributed, though the distribution of individual grades was significantly leptokurtic (Kolmogorov–Smirnov, p < 0.001). The standard deviations were 0.13, 0.18 and 0.17 for the individual grades, the average grades and the maximum
Figure 4. The discrepancy distributions for (A) all grades in each corneal zone; and (B) average and maximum grades for the whole cornea were all approximately normally distributed.
Figure 3. The (A) extent; and (B) depth of corneal staining differed between the five corneal zones. The frequency histograms are scaled identically.
grades, respectively. Concordance between the two observers ranged between 0.22 and 0.30 (Bailey et al., 1991). Since this analysis was conducted using corneas that had only small amounts of staining, there was a risk that the measured discrepancy distributions were affected by the proximity of most of the measurements to the end of the measurement range. This limits the amount of variability that can occur in one direction. For example, if one observer has allocated a grade of 0.2 units, the second observer could allocate a grade that differed by up to 3.8 units in one direction, but by no more than 0.2 units in the other direction. To examine this problem we conducted computer simulations using a range of "actual" discrepancy distributions to produce simulations of measured discrepancy distributions under conditions of truncated range. Distortion of the
simulated measured discrepancy distributions was minimal under the conditions of our experiment, showing a small shift of the mean, as found in our experiment (Figure 4B). An actual discrepancy distribution with a standard deviation of 0.18 units produced standard deviations similar to those found in our study. However, it was apparent that if the reliability of the two observers had been worse (i.e. a larger standard deviation of the actual discrepancy distribution), the measured discrepancy distributions would have been markedly distorted (i.e. no longer normally distributed).
Discussion
That 79% of apparently healthy non-contact lens wearers should exhibit some degree of corneal fluorescein staining seems inconsistent with clinical experience, or at least with the assumption that healthy people have no corneal staining (grade 0). Most previous studies have reported much lower prevalences of corneal staining (Caffery and Josephson, 1991; Josephson and Caffery, 1988; Korb and Exford Korb, 1970; Korb and Herman, 1979; Norn, 1970; Soni et al., 1996). Only Schwallie et al. (1997), who used a similar grading scale, have reported similar levels of staining. This discrepancy is explained
probably by differences in the definitions of staining that have been used. We used a very strict definition of staining. If we had used a different definition, we would have reported a prevalence of staining more consistent with earlier studies. For example, if we defined staining as a maximum grade ≥0.3, the prevalence would have been 38 or 3% for extent and depth, respectively (Table 1). Begley et al. (1996) found that maximum score was used slightly more commonly than the other overall scores, but the results for extent and depth were quite different. These differences between different methods of determining a combined corneal score may account for some of the discrepancies between prevalence studies. In clinical practice, a healthy patient with corneal staining grade of greater than 0.5 units should be considered as unusual. Our subject sample may not have been a good representation of the normal population found in a typical ophthalmic practice, as the majority were students and staff of a university in Scotland. Our sample was 88% Caucasian and had a median age of 22 years (range 18–50 years). New contact lens patients are often of this younger age. Adolescents may have less staining (Soni et al., 1996), while older people may have more staining (Norn, 1970) than the subjects in our study. Though there are substantial benefits to the use of clinical grading scales (Efron, 1997; Lloyd, 1992; Woods, 1989), use by clinicians remains limited. Part of that reluctance may be perceived difficulty of use. The experience of our two observers, both inexperienced clinicians, demonstrates that with moderate practice and inter-observer discussion and comparison, it was possible to use the corneal staining grading scale successfully. Our observers achieved reasonable concordance with limited practice, suggesting that experienced clinicians should not find this difficult. Bailey et al. (1991) noted that a high concordance is not necessarily good.
High concordance indicates good consistency in grading, but it also indicates that the scale could be more sensitive if finer increments were adopted. They suggested that the size of the scale increment should not exceed one standard deviation of the discrepancy distribution. We found standard deviations between 0.13 and 0.18 units, suggesting that decimal increments (0.1 units) are suitable for grading of corneal staining. Discrepancy distribution standard deviations reported in other studies have ranged from 0.17 to 0.30 units (Chong et al., 2000) through 0.5 units (Lloyd, 1992; Efron, 1998). Efron's recommendation that units of 1.0 were optimal was based on data collected from a large sample of clinicians at a contact lens conference. Experience with the grading scale of those clinicians ranged from novice to expert. In addition, many of the clinicians had a poor view of the slides displayed. It is not clear that grading of photographs (a single, static view of the condition) is the same as grading an actual eye, when the clinician has the ability to obtain multiple views of the eye over an extended period. Unlike
the other studies (Chong et al., 2000; Efron, 1998; Lloyd, 1992), only two observers were involved in our investigation of inter-observer reliability, and those observers had the advantage of a period of training in the use of the grading scales when they worked to enhance their concordance. Chong et al.'s five observers had at least 2 years' experience with clinical grading scales, but interactions between observers prior to observing the photographs were not reported. The experience of Lloyd's seven observers was not reported. The inter-observer agreement found in our study may represent a near best-case situation that may not be representative of agreement in other situations. Equally good agreement may occur within a larger group (e.g. a practice or a research group) given such discussion, but is unlikely to occur between groups. If our study does represent a best-case situation, then this suggests the limits of reliability. It is such limits that should be used to determine the minimum increments of the grading scale. One caveat to the use of 0.1 unit increments for corneal staining grading is that different increments may be appropriate for different clinical grading scales. However, Chong et al. (2000) found similar discrepancy distributions for bulbar redness, 3 and 9 o'clock staining and palpebral conjunctival roughness using three methods of grading. Since our study involved only healthy subjects with corneal staining grades usually below 0.5 units, the evaluation of inter-observer agreement was made using a very restricted range of the total grading scale (see Table 1). Since there was no variation in the discrepancy scores over the range of actual scores and our computer simulations suggested no major distortion in the distributions due to the range of actual scores, the truncated range appears not to have been a problem.
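The truncated-range check described in the text can be sketched as a small Monte Carlo simulation. This is not the authors' code, which the paper does not give. It is a minimal illustration under assumed conditions: true grades drawn uniformly from a low range (0 to 0.5 units), independent and normally distributed observer error, and recorded grades rounded to 0.1 units and clipped to the 0 to 4 scale. The function names and parameter values are hypothetical.

```python
import random
import statistics


def observed_grade(true_grade, error_sd, rng):
    """One observer's recorded grade: true grade plus normal error,
    rounded to 0.1 units and clipped to the 0-4 scale."""
    noisy = true_grade + rng.gauss(0.0, error_sd)
    return min(4.0, max(0.0, round(noisy, 1)))


def simulate_discrepancies(actual_sd, n_pairs=20000, max_true=0.5, seed=1):
    """Simulate measured discrepancies (observer 1 minus observer 2).

    `actual_sd` is the SD of the untruncated discrepancy distribution, so each
    observer's own error SD is actual_sd / sqrt(2) (independent errors assumed).
    Returns the mean and SD of the measured discrepancies.
    """
    rng = random.Random(seed)
    per_observer_sd = actual_sd / 2 ** 0.5
    diffs = []
    for _ in range(n_pairs):
        # ASSUMPTION: true grades are low and uniformly spread, as in a healthy sample.
        true_grade = rng.uniform(0.0, max_true)
        g1 = observed_grade(true_grade, per_observer_sd, rng)
        g2 = observed_grade(true_grade, per_observer_sd, rng)
        diffs.append(g1 - g2)
    return statistics.mean(diffs), statistics.stdev(diffs)


if __name__ == "__main__":
    for actual_sd in (0.18, 0.5, 1.0):
        mean, sd = simulate_discrepancies(actual_sd)
        print(f"actual SD {actual_sd:.2f} -> measured mean {mean:+.3f}, SD {sd:.3f}")
```

Under these assumptions, the floor at zero should leave a small actual standard deviation nearly unchanged and compress a large one noticeably, which is the qualitative pattern the simulations in the text report. Because both simulated observers are treated identically, this sketch does not reproduce any shift of the mean.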
If clinical grading scales can be shown to approximate an interval scale, then other estimates of reliability obtained under similar conditions should be similar. In that case, the discrepancy distribution should be independent of the range of actual grades for most distributions of actual scores. This was found by Chong et al. (2000). Further investigation of these clinical grading scales is required.
Acknowledgements
The authors have no proprietary interests in the products described. This study formed the basis of the honours project completed by M.D. and A.W.
References
Bailey, I. L., Bullimore, M. A., Raasch, T. W. and Taylor, H. R. (1991). Clinical grading and the effects of scaling. Invest. Ophthalmol. Visual Sci. 32, 422–432.
Begley, C. G., Barr, J. T., Edrington, T. B., Long, W. D., McKenney, C. D. and Chalmers, R. L. (1996). Characteristics of corneal staining in hydrogel contact lens wearers. Optom. Vision Sci. 73, 193–200.
Bland, J. M. and Altman, D. G. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 8476, 307–310.
Caffery, B. E. and Josephson, J. E. (1991). Corneal staining after sequential instillations of fluorescein over 30 days. Optom. Vision Sci. 68, 467–469.
Chong, T., Simpson, T. and Fonn, D. (2000). The repeatability of discrete and continuous anterior segment grading scales. Optom. Vision Sci. 77, 244–251.
Cornea and Contact Lens Research Unit (1996). CCLRU Grading Scales, School of Optometry, University of New South Wales, Sydney, Australia (available from Vistakon).
Duke Elder, S. (1973). Diseases of the Outer Eye. Part 2. Disease of the Cornea and Sclera, Epibulbar Manifestations of Systemic Disease Cysts and Tumours. 3rd ed., vol. 8, Henry Kimpton, London.
Efron, N. (1997). Clinical application of grading scales for contact lens complications. Optician 213, 26–35.
Efron, N. (1998). Grading scales for contact lens complications. Ophthal. Physiol. Opt. 18, 182–186.
Josephson, J. E. and Caffery, B. E. (1988). Corneal staining after instillation of topical anesthetic (SSII). Invest. Ophthalmol. Visual Sci. 29, 1096–1099.
Josephson, J. E. and Caffery, B. E. (1992). Corneal staining characteristics after sequential instillations of fluorescein. Optom. Vision Sci. 69, 570–573.
Korb, D. R. and Exford Korb, J. M. (1970). Corneal staining prior to contact lens wearing. J. Am. Optom. Assoc. 41, 228–232.
Korb, D. R. and Herman, J. P. (1979). Corneal staining subsequent to sequential fluorescein instillations. J. Am. Optom. Assoc. 50, 361–367.
Lloyd, M. (1992). Lies, statistics and clinical significance. J. Br. Contact Lens Assoc. 15, 67–70.
Millodot, M. (1977). The influence of age on the sensitivity of the cornea. Invest. Ophthalmol. Visual Sci. 16, 240–242.
Millodot, M. and Owens, H. (1984). The influence of age on the fragility of the cornea. Acta Ophthalmol. (Kbh) 62, 819–824.
Norn, M. S. (1970). Micropunctate fluorescein vital staining of the cornea. Acta Ophthalmol. 48, 108–118.
Schwallie, J. D., McKenney, C. D., Long, W. D. and McNeil, A. (1997). Corneal staining patterns in normal non-contact lens wearers. Optom. Vision Sci. 74, 92–98.
Soni, P. S., Horner, D. G. and Ross, J. (1996). Ocular response to lens care systems in adolescent soft contact lens wearers. Optom. Vision Sci. 73, 70–85.
Stevens, S. S. (1951). Handbook of Experimental Psychology, Wiley, New York.
Thomas, M. L., Szeto, V. R., Gan, C. M. and Polse, K. A. (1997). Sequential staining: the effects of sodium fluorescein, osmolarity, and pH on human corneal epithelium. Optom. Vision Sci. 74, 207–210.
Wilson, G., Ren, H. and Laurent, J. (1995). Corneal epithelial fluorescein staining. J. Am. Optom. Assoc. 66, 435–441.
Woods, R. L. (1989). Quantitative slit lamp observations in contact lens practice. J. Br. Contact Lens Assoc. Sci. Meetings 12, 42–45.
Xu, K. P., Yagi, Y. and Tsubota, K. (1996). Decrease in corneal sensitivity and change in tear function in dry eye. Cornea 15, 235–239.