British Journal of Rheumatology 1998;37:760–765
RADIOLOGICAL ASSESSMENT IN PSORIATIC ARTHRITIS
P. RAHMAN, D. D. GLADMAN,* R. J. COOK,† Y. ZHOU,‡ G. YOUNG§ and D. SALONEN¶
Department of Medicine and Institute of Medical Sciences, University of Toronto, *Department of Medicine, University of Toronto and Psoriatic Arthritis Program, The Toronto Hospital Rheumatic Disease Unit, †Department of Statistical and Actuarial Science, University of Waterloo, Waterloo, Ontario, ‡University of Waterloo, Waterloo, Ontario, §Psoriatic Arthritis Clinic, University of Toronto and ¶Department of Medical Imaging, University of Toronto, Toronto, Canada
SUMMARY
Our objective was to compare the reliability and responsiveness of the original Steinbrocker’s (OS), our modified Steinbrocker’s (MS) and Larsen’s (L) radiological scoring methods for detecting radiological change in psoriatic arthritis over time. Two sets of radiographs of the hands and feet, taken at least 2 yr apart, were selected from 68 patients. The films were presented in random order and scored independently and blind by a rheumatologist (DDG) and a radiologist (DS), using all three methods. Reliability was indexed by the intraclass correlation coefficient (ICC), and responsiveness was assessed using plots and regression analyses. All three scoring methods showed excellent interobserver and good intra-observer reliability. L and MS were equally responsive and superior to OS in detecting change in joint damage over time. Thus, the L or MS radiological scoring methods can be used to monitor disease progression in psoriatic arthritis.
KEY WORDS: Psoriatic arthritis, Radiological methods, Larsen’s method, Steinbrocker’s method, Validity, Responsiveness.
The plain radiograph of the joints is considered the current standard for assessing disease progression in inflammatory arthritis, and radiographs of the hands and feet have been the most widely used for this purpose. The first standardized radiological scoring method was devised by Steinbrocker et al. in 1949 [1]. Since then, numerous scoring methods have been proposed, which differ greatly in the type of abnormalities they are designed to capture and in the number of joints scored [2–9]. All the scoring methods evaluating peripheral joints were developed for patients with rheumatoid arthritis, and at present no single method has achieved universal acceptance. The scoring methods of Larsen et al. [10], Steinbrocker et al. [1] and Sharp et al. [11] appear to be the most commonly reported for evaluating radiographic changes and quantifying damage. While these methods have been validated in rheumatoid arthritis [12, 13], similar studies in psoriatic arthritis are lacking. Radiographic changes in psoriatic arthritis differ from those in rheumatoid arthritis: psoriatic arthritis shows a lower frequency of periarticular osteopenia and a higher prevalence of distal interphalangeal erosions, along with tuft changes, pencil-in-cup changes, bony proliferation, periostitis and bony ankylosis. Since most scoring methods involve assessment of erosions and joint destruction, it seems prudent to validate radiological methods specifically in psoriatic arthritis. Scoring methods already validated in rheumatoid arthritis were chosen because both conditions are inflammatory arthropathies and share histological features that eventually lead to joint destruction [14]. The ideal radiological scoring method to follow
psoriatic arthritis longitudinally and monitor disease progression should be valid, easy to administer, rapid and economically feasible. Larsen’s and Steinbrocker’s methods would fulfil these requirements if adequately validated according to the methodological framework of Bombardier and Tugwell [15] (content validity, face validity, criterion validity, discriminant validity and construct validity). The objective of our study was to compare the reliability, reproducibility and responsiveness of the original Steinbrocker’s method, our modified Steinbrocker’s method and Larsen’s method for detecting radiological change in psoriatic arthritis over time.
METHODS
Patients
The radiographs were selected from the University of Toronto Psoriatic Arthritis Clinic. This clinic was established in 1978 and has since enrolled patients with psoriatic arthritis in an ongoing prospective study. Patients were assessed every 6–12 months, when a standardized history, physical examination and laboratory evaluation were completed and entered into a computerized database. Routine radiographs of the hands, feet, spine and sacroiliac joints were obtained at 1–2 yr intervals.
Radiographic assessment
From our cohort, radiographs of 68 patients with a wide spectrum of disease were selected by an independent physician who was not involved in interpreting the films. Two sets of posteroanterior radiographs of the hands and feet, taken at least 2 yr apart, were obtained for each patient. Before the films were read, a training session was held to review the scoring methods. For the Steinbrocker method, each of the distal interphalangeal, proximal interphalangeal and metacarpophalangeal joints of the hand, the wrist, and the
Submitted 18 July 1997; revised version accepted 27 February 1998. Correspondence to: D. Gladman, 1-318 Main Pavilion, The Toronto Hospital, Western Division, 399 Bathurst Street, Toronto, Ontario M5T 2S8, Canada.
© 1998 British Society for Rheumatology 760
RAHMAN ET AL.: RADIOLOGICAL ASSESSMENT IN PSORIATIC ARTHRITIS
metatarsophalangeal joints and the first interphalangeal joint of each foot were scored. The wrist was counted as a single joint. Each joint was graded on a 0–4 scale (Table I). For the original Steinbrocker’s method, a single score was assigned according to the status of the worst joint [1]. Because a single global score would probably make this method insensitive to change, we modified Steinbrocker’s method to record the score of each individual joint, using the scale in Table I. Our modified Steinbrocker method is therefore based on the same scale but provides a more detailed score. For Larsen’s method, a similar distribution of joints was scored, and each joint was graded on a 0–5 scale according to Larsen’s method as modified by Rau and Herborn [16] (Table I). The distal interphalangeal joints, as well as the first metatarsophalangeal and interphalangeal joints of the feet, were added to the original Larsen’s method, as these joints are frequently affected in psoriatic arthritis. The index was expressed as the mean of all individual areas rather than as an aggregate score, as suggested by Larsen for evaluating radiographs in long-term studies [17]. The films were presented to the observers in random order and scored blind. The radiographs were read independently, using all methods, by an experienced rheumatologist (DDG) and a radiologist (DS) with expertise in musculoskeletal radiography. Twenty radiographs were recirculated to the observers, again in random order, to determine intra-observer variability. Any joint showing radiographic alterations due exclusively to degenerative arthritis was excluded, as the purpose of the study was to evaluate methods for detecting alterations in psoriatic arthritis.
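To make the difference between the three summaries concrete, the following Python sketch computes each one from per-joint grades. It is an illustration under stated assumptions, not the authors’ software: the function names are hypothetical, the modified Steinbrocker score is summarized here as the mean of the per-joint grades (the text specifies only that each joint’s grade is recorded and that overall mean ratings were analysed), and a joint excluded for purely degenerative change is represented by None.

```python
from typing import Optional, Sequence

# Hypothetical helpers: each takes the per-joint grades of one radiograph set.
# None marks a joint excluded because its changes were purely degenerative.

def _scored(grades: Sequence[Optional[int]]) -> list:
    """Drop excluded joints; fail if no joint remains to score."""
    kept = [g for g in grades if g is not None]
    if not kept:
        raise ValueError("no scorable joints")
    return kept

def original_steinbrocker(grades: Sequence[Optional[int]]) -> int:
    """Single global 0-4 score: the grade of the worst joint."""
    return max(_scored(grades))

def modified_steinbrocker(grades: Sequence[Optional[int]]) -> float:
    """Every joint keeps its 0-4 grade; summarized as the mean (an assumption)."""
    kept = _scored(grades)
    return sum(kept) / len(kept)

def larsen_index(grades: Sequence[Optional[int]]) -> float:
    """Mean of per-joint 0-5 modified Larsen grades, not an aggregate sum."""
    kept = _scored(grades)
    return sum(kept) / len(kept)

# Made-up Steinbrocker grades: five scored joints plus one excluded joint.
baseline = [0, 0, 2, 4, 1, None]
follow_up = [0, 1, 3, 4, 1, None]

print(original_steinbrocker(follow_up) - original_steinbrocker(baseline))  # 0
print(round(modified_steinbrocker(follow_up) - modified_steinbrocker(baseline), 2))  # 0.4
```

In this made-up example one joint progresses from grade 2 to grade 3 and another from 0 to 1, but the worst joint is already at grade 4, so the original Steinbrocker score does not move while the per-joint summary rises by 0.4.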
Analysis
To assess interobserver reliability, only baseline measurements were used, to ensure that the assumption of independence across patients held. Reliability assessment based on standard statistical methods assumes approximate normality of scores. Because individual joint assessments are highly discrete and skewed, overall mean ratings and mean ratings by joint location were examined. Intra-observer reliability was measured with multiple assessments on each patient, and the analysis was again based on overall mean scores and location-specific mean scores. For both interobserver and intra-observer reliability, analysis of variance was conducted with a random ‘patient’ effect. Let $\sigma_s^2$ and $\sigma_r^2$ denote the variance components reflecting subject-to-subject and observer variability, respectively. The index of reliability adopted is the intraclass correlation coefficient (ICC) [18], given by
$$\mathrm{ICC} = \frac{\sigma_s^2}{\sigma_s^2 + \sigma_r^2}.$$
Our objective in fitting these models was to obtain estimates of the ICC for Larsen’s and both Steinbrocker’s methods, and to compare them.
As there was no gold standard for measuring joint damage, the responsiveness analysis was restricted to comparing change scores between instruments, that is, the extent to which change detected by one instrument was related to change detected by another. Change scores between baseline and 2 yr were plotted for each pair of methods and analysed by regression. The indices of interest were r² and the slope of the best-fitting regression line. A slope near one indicates that the two methods being compared react to change to approximately the same degree. The different scales given in Table I (0–4 for Steinbrocker’s method, 0–5 for Larsen’s method) must, of course, be borne in mind when interpreting these results.
TABLE I
Grading for Steinbrocker’s and modified Larsen’s methods
Steinbrocker’s method
  0  Normal
  1  Soft-tissue swelling/osteopenia
  2  Erosion
  3  Erosion plus joint space narrowing
  4  Total joint destruction
Modified Larsen’s method
  0  Normal
  1  Soft-tissue swelling, osteoporosis, slight joint space narrowing
  2  Erosion with destruction of joint surface (DJS) 75%
RESULTS
The demographics of the 68 patients are listed in Table II. Larsen’s and both Steinbrocker’s methods showed excellent interobserver reliability as measured by the ICC. The interobserver ICC was 0.87 (95% CI: 0.79, 0.92) for Larsen’s method, 0.86 (95% CI: 0.76, 0.90) for the original Steinbrocker’s method and 0.86 (95% CI: 0.76, 0.90) for the modified Steinbrocker’s method. All methods also showed good intra-observer reliability for each observer. For Larsen’s method, the intra-observer ICC was 0.84 (95% CI: 0.62, 0.94) for DS and 0.85 (95% CI: 0.64, 0.95) for DDG.
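The variance-component ICC defined in the Analysis can be estimated from a one-way random-effects analysis of variance. The Python sketch below is an illustration, not the authors’ code: the function name is hypothetical, it assumes a balanced layout (every patient rated the same number of times) in which observer differences are absorbed into the error term, so the observer component is estimated by the within-patient mean square, and it does not compute the confidence intervals reported above. Each input value would be a patient’s overall mean score.

```python
from typing import Sequence

def icc_oneway(scores: Sequence[Sequence[float]]) -> float:
    """Estimate ICC = sigma_s^2 / (sigma_s^2 + sigma_r^2).

    scores[i][j] is the j-th rating (observer or repeat reading) of patient i.
    Assumes a balanced one-way random-effects model with a random patient effect.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 patients, each with the same k >= 2 ratings")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    patient_means = [sum(row) / k for row in scores]

    # Between-patient mean square (n - 1 df) and within-patient mean square (n(k - 1) df)
    ms_between = k * sum((m - grand_mean) ** 2 for m in patient_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, patient_means) for x in row
    ) / (n * (k - 1))

    # Method-of-moments variance components; a negative patient component is set to 0
    sigma_r2 = ms_within
    sigma_s2 = max((ms_between - ms_within) / k, 0.0)
    if sigma_s2 + sigma_r2 == 0:
        raise ValueError("scores show no variability")
    return sigma_s2 / (sigma_s2 + sigma_r2)

# Made-up mean scores for three patients, each read by two observers
print(round(icc_oneway([[1, 2], [3, 3], [5, 4]]), 3))  # 0.862
```

When the patient component is not truncated, the estimate equals (MS_between − MS_within)/(MS_between + (k − 1)MS_within), the usual one-way ICC. It exceeds 0.5 exactly when the patient variance component exceeds the observer component.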
For the original Steinbrocker’s method, the intra-observer ICC was 0.90 (95% CI: 0.74, 0.96) for DS and 0.86 (95% CI: 0.65, 0.95) for DDG. Finally, for the modified Steinbrocker’s method, the intra-observer ICC was 0.80 (95% CI: 0.52, 0.92) for DS and 0.81 (95% CI: 0.59, 0.93) for DDG. Because the ICC exceeds 0.5 exactly when the patient variance component exceeds the observer variance component, and all ICC values were >0.5, the error variance was always smaller than the variance of the patient effect. Since the confidence intervals for Larsen’s method and the two Steinbrocker’s methods overlap, no statistically significant difference was found among the methods with respect to interobserver or intra-observer reliability.
TABLE II
Demographics of study population (68 patients)
                                        Mean     S.D.
Age at presentation to clinic (yr)     40.86    12.42
Arthritis duration (yr)                 6.19     8.70
Age at onset of arthritis (yr)         34.67    10.79
Number of active joints                 9.36     7.73
Number of effusions                     3.15     3.03
Number of damaged joints                1.74     3.67
BRITISH JOURNAL OF RHEUMATOLOGY VOL. 37 NO. 7
To assess whether these radiographic scoring methods could detect change in psoriatic arthritis, the change scores of the instruments were compared with each other. The relative responsiveness of Larsen’s vs the original Steinbrocker’s method, as measured by the slope of the regression line, was 0.15 (95% CI: 0.06, 0.24) for DS and 0.09 (95% CI: 0.001, 0.19) for DDG (Fig. 1a and b, respectively). Thus, the original Steinbrocker’s method was less sensitive than Larsen’s in detecting radiographic change for both examiners. In contrast, the slope of Larsen’s vs the modified Steinbrocker’s method was 1.1 (95% CI: 1.0, 1.1) for DS and 0.93 (95% CI: 0.80, 1.0) for DDG (Fig. 2a and b, respectively). Thus, Larsen’s and the modified Steinbrocker’s methods were of comparable responsiveness to change for both examiners.
DISCUSSION
Steinbrocker’s is a simple scoring method, initially devised in 1949, which is still widely used today. It assesses global changes and gives an overall measure of joint damage from 0 to 4. The severity of radiological involvement is scored by assessing the degree of soft-tissue swelling, osteopenia, joint space narrowing, malalignment and bony ankylosis. It is rapid to perform and thus quite useful in clinical practice. No radiographic standards are employed with this method. Larsen’s method was introduced in 1977 [10] and has been modified on numerous occasions [16, 19, 20]. Like Steinbrocker’s method, Larsen’s assesses each joint globally. Scores range from 0 to 5 depending on the extent of osteoporosis, joint space narrowing, erosions and joint destruction. This method is based on standard reference radiographs in an attempt to improve reproducibility. Rau modified Larsen’s method by quantifying the extent of joint surface destruction required to attain stages 2–5 [16].
Larsen’s method may theoretically be more responsive than Steinbrocker’s for detecting change in any given joint, as its scale has an additional grade. Sharp’s method was developed in 1971 [21] and modified in 1985 [11]. In this method, erosions and joint space narrowing are scored separately, for a total of 35 observations in each hand. This detailed analysis provides greater sensitivity and amplitude of change than Larsen’s and Steinbrocker’s methods in patients with rheumatoid arthritis. However, Sharp’s method may be less sensitive in psoriatic arthritis than in rheumatoid arthritis, since erosions and new bone formation in psoriatic arthritis are often para-marginal or involve the shaft of the phalanges. These lesions would not be captured by Sharp’s method, which detects discrete intra-articular erosions and joint space narrowing, unlike Larsen’s and Steinbrocker’s methods, which score the joint globally. Sharp’s method was not assessed in our study, as it is time-consuming to perform and was considered impractical in a clinic setting.
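The responsiveness comparisons in the Results rest on regressing one instrument’s change scores on another’s. The Python sketch below computes change scores and the ordinary least-squares slope, intercept and r²; the function names, the made-up data and the orientation (which method is placed on the horizontal axis) are assumptions for illustration, and the confidence intervals reported in the study are not reproduced.

```python
import math
from typing import List, Sequence, Tuple

def change_scores(baseline: Sequence[float], follow_up: Sequence[float]) -> List[float]:
    """Per-patient change between the two radiograph sets (follow-up minus baseline)."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must cover the same patients")
    return [f - b for b, f in zip(baseline, follow_up)]

def ols_slope_r2(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data for at least 3 patients")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0:
        raise ValueError("x change scores are constant; slope undefined")
    slope = sxy / sxx
    intercept = my - slope * mx
    # r2 is undefined when y does not vary at all
    r2 = sxy ** 2 / (sxx * syy) if syy > 0 else math.nan
    return slope, intercept, r2

# Made-up mean scores for four patients: x from one method, y from the other
x = change_scores([1.0, 1.2, 0.5, 2.0], [1.0, 1.7, 1.5, 3.5])
y = change_scores([0.8, 1.0, 0.4, 1.6], [0.8, 1.55, 1.5, 3.25])
slope, intercept, r2 = ols_slope_r2(x, y)
print(round(slope, 2), round(r2, 2))  # 1.1 1.0
```

A slope near 1 suggests that the two instruments register change to a similar degree; because Steinbrocker’s and Larsen’s scales differ (Table I), the slope is not unit-free and should be read with those scales in mind.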
In our study, Larsen’s and both the original and modified Steinbrocker’s methods showed excellent interobserver reliability, with ICCs of 0.87, 0.86 and 0.86, respectively. These values are high and similar to those reported in studies of rheumatoid arthritis. Each reader was also highly consistent: the intra-observer ICC was 0.84 and 0.85 for DS and DDG, respectively, for Larsen’s method; 0.90 and 0.86 for the original Steinbrocker’s method; and 0.80 and 0.81 for the modified Steinbrocker’s method. The combination of excellent agreement between the two observers and high consistency within each observer suggests that all the methods are reliable. A radiographic scoring method should, however, be not only reliable but also sensitive to change. As expected, the original Steinbrocker’s method was relatively insensitive to change over time, because this expedient method, which records only the worst joint, discards a substantial amount of information. Meanwhile, Larsen’s and the modified Steinbrocker’s methods were equally responsive to change for both examiners. Thus, these two radiographic methods can be used to detect change in psoriatic arthritis. Applying the validity criteria [15] to the assessment of a radiographic scoring method, Larsen’s and the modified Steinbrocker’s methods both appear to be valid. Content validity (the choice and relative importance of each component are appropriate for the method) and face validity (the method of aggregating the individual components into a score is appropriate) appear justified, as the components (soft-tissue swelling, osteoporosis, erosions, joint space narrowing and destruction) reflect an orderly sequence of the pathogenic, histological and radiological stages resulting in joint damage.
Criterion validity (the method produces consistent results that reflect the true clinical state of the patient) is confirmed for both methods by our results, as the interobserver and intra-observer ICCs were high, indicating excellent consistency of the methods. With respect to discriminant validity (the method detects the smallest clinically significant differences), Larsen’s and the modified Steinbrocker’s methods both detected change over time and were equally responsive. It is not possible to ascertain whether the radiological change detected was the smallest clinically significant difference, as there is no clinical outcome measure, representing the minimal clinically significant radiological change, against which to compare it. Construct validity (the method agrees with results expected from the investigator’s hypothesis) seems justified, as Larsen’s and Steinbrocker’s methods detect change that mirrors the underlying pathophysiological changes, and these methods have previously been validated in rheumatoid arthritis, an inflammatory arthritis with many features similar to psoriatic arthritis. Finally, Larsen’s and Steinbrocker’s methods are feasible in clinical practice, as they are easy to score, not time-consuming and relatively inexpensive. Thus, we conclude from our study that Larsen’s and both Steinbrocker’s radiological scoring methods are reliable and reproducible. Larsen’s and the modified Steinbrocker’s methods were equally responsive and superior to the original Steinbrocker’s method in detecting radiographic change in psoriatic arthritis. Thus, Larsen’s and the modified Steinbrocker’s methods can be used to monitor disease progression, examine clinical correlations or study the effects of anti-rheumatic drugs in the radiographic assessment of psoriatic arthritis.
FIG. 1. (a) Responsiveness analysis of Larsen vs original Steinbrocker (DS). (b) Responsiveness analysis of Larsen vs original Steinbrocker (DDG).
FIG. 2. (a) Responsiveness analysis of Larsen vs modified Steinbrocker (DS). (b) Responsiveness analysis of Larsen vs modified Steinbrocker (DDG).
ACKNOWLEDGEMENT
Supported by the Medical Research Council of Canada.
REFERENCES
1. Steinbrocker O, Traeger CH, Batterman RC. Therapeutic criteria in rheumatoid arthritis. J Am Med Assoc 1949;140:659–62.
2. Kellgren JH. Radiological signs of rheumatoid arthritis. A study of observer differences in the reading of hand films. Ann Rheum Dis 1956;15:55–60.
3. Thould AK, Simon G. Assessment of radiological changes in the hands and feet in rheumatoid arthritis. Their correlation with prognosis. Ann Rheum Dis 1966;25:220–8.
4. Mall JC, Genant HK, Silcox DC et al. The efficacy of detail radiography in the evaluation of patients with rheumatoid arthritis. Radiology 1974;112:37–42.
5. Genant HK. Methods of assessing radiographic change in rheumatoid arthritis. Am J Med 1983;75:35–47.
6. Bluhm GB, Smith DW, Mikalaskek WM. A radiological method of assessment of bone and joint destruction in rheumatoid arthritis. Henry Ford Hosp Med J 1983;31:152–61.
7. Plant MJ, Saklatvala J, Borg A, Jones PW, Dawes PT. Measurement and prediction of radiological progression in early rheumatoid arthritis. J Rheumatol 1994;21:1808–13.
8. Van der Heijde DMFM, van Leeuwen MA, van Riel P et al. Biannual radiographic assessment of hands and feet in a three year prospective follow-up of patients with early rheumatoid arthritis. Arthritis Rheum 1992;35:26–34.
9. Scott DL, Coulton BL, Bacon P. Methods of X-ray assessment in rheumatoid arthritis: a re-evaluation. Br J Rheumatol 1985;24:31–9.
10. Larsen A, Dale K, Eek M. Radiographic evaluation of rheumatoid arthritis and related conditions by standard reference films. Acta Radiol Diagn 1977;18:481–91.
11. Sharp JT, Young DY, Bluhm GB et al. How many joints in the hands and wrist should be included in a score of radiological abnormalities used to assess rheumatoid arthritis? Arthritis Rheum 1985;28:1326–35.
12. Kaye J. Radiographic assessment of rheumatoid arthritis. Rheum Dis Clin North Am 1995;21:395–406.
13. Cuchacovich M, Couret M, Peray P. Precision of the Larsen and the Sharp methods of assessing radiological change in patients with rheumatoid arthritis. Arthritis Rheum 1992;35:736–9.
14. Abu-Shakra M, Gladman DD. Aetiopathogenesis of psoriatic arthritis. Rheumatol Rev 1994;3:1–6.
15. Bombardier C, Tugwell P. A methodological framework to develop and select indices for clinical trials: statistical and judgemental approaches. J Rheumatol 1982;9:753–7.
16. Rau R, Herborn G. A modified version of Larsen’s scoring method to assess radiologic changes in rheumatoid arthritis. J Rheumatol 1995;22:1976–82.
17. Larsen A. How to apply Larsen score in evaluating radiographs of rheumatoid arthritis in long term studies? J Rheumatol 1995;22:1974–5.
18. Fleiss JL. The design and analysis of clinical experiments. New York: John Wiley and Sons, 1986.
19. Wassenberg S, Rau R. Problems in evaluating radiographic findings in rheumatoid arthritis using different methods of radiographic scoring: examples of difficult cases and a study design to develop an improved scoring method. J Rheumatol 1995;22:1990–2002.
20. Rau R. Methods of scoring radiographic changes in rheumatoid arthritis. J Rheumatol 1995;22:1048–54.
21. Sharp JT, Lidsky MD, Collins LC et al. Methods of scoring the progression of radiologic changes in rheumatoid arthritis. Correlation of radiological, clinical, and laboratory abnormalities. Arthritis Rheum 1971;14:706–20.