quantifying the quality of medical x-ray images

QUANTIFYING THE QUALITY OF MEDICAL X-RAY IMAGES An evaluation based on normal anatomy for lumbar spine and chest radiography

Anders Tingberg Department of Radiation Physics, Malmö Lund University

Malmö 2000

Department of Radiation Physics, Malmö Lund University, Sweden


Doctoral Dissertation Department of Radiation Physics Lund University Malmö University Hospital SE-205 02 Malmö, Sweden

Copyright © 2000 Anders Tingberg (pp 1-54) ISBN 91-628-4225-0 Printed in Sweden by Team Offset Media, Malmö 2000


Only dead fish float with the current.


ABSTRACT

Optimisation in diagnostic radiology requires accurate methods for determination of patient absorbed dose and clinical image quality. Simple methods for evaluation of clinical image quality are at present scarce and this project aims at developing such methods. Two methods are used and further developed: fulfilment of image criteria (IC) and visual grading analysis (VGA). Clinical image quality descriptors are defined based on these two methods: image criteria score (ICS) and visual grading analysis score (VGAS), respectively. For both methods the basis is the Image Criteria of the “European Guidelines on Quality Criteria for Diagnostic Radiographic Images”. Both methods have proved to be useful for evaluation of clinical image quality. The two methods complement each other: IC is an absolute method, which means that the quality of images of different patients and produced with different radiographic techniques can be compared with each other. The separating power of IC is, however, weaker than that of VGA. VGA is the best method for comparing images produced with different radiographic techniques and has strong separating power, but the results are relative, since the quality of an image is compared to the quality of a reference image. The usefulness of the two methods has been verified by comparing the results from both of them with results from a generally accepted method for evaluation of clinical image quality, receiver operating characteristics (ROC). The results of the comparison between the two methods based on visibility of anatomical structures and the method based on detection of pathological structures (free-response forced error) indicate that the former two methods can be used for evaluation of clinical image quality as efficiently as the method based on ROC. More studies are, however, needed before a general conclusion can be drawn, including studies of other organs and of other radiographic techniques.
The results of the experimental evaluation of clinical image quality are compared with physical quantities calculated with a theoretical model based on a voxel phantom, and correlations are found. The results demonstrate that the computer model can be a useful tool in planning further experimental studies.


ABBREVIATIONS

2AFC        two-alternative forced choice
A1          the area under an AFROC curve
AEC         automatic exposure control
AFROC       alternative free-response receiver operating characteristics
AP          anterior – posterior
Az          the area under an ROC curve
DQE         detective quantum efficiency
DROC        differential ROC
ESD         entrance surface dose
ESF         edge spread function
FFE         free-response forced error
FP          false positive
FPI         false positive image
FPF         false positive fraction
FROC        free-response receiver operating characteristics
H/D curve   Hurter-Driffield curve (characteristic curve)
ICS         image criteria score
IQI         image quality index
KAP (DAP)   kerma area product (dose area product)
kVp         tube potential (peak kilo voltage)
LROC        localisation ROC
LSF         line spread function
mAs         milliampere second
MTF         modulation transfer function
NPS         noise power spectrum (Wiener spectrum)
PA          posterior – anterior
ROC         receiver operating characteristics
SNR         signal-to-noise ratio
TLD         thermoluminescent dosemeter
TP          true positive
TPF         true positive fraction
VGA         visual grading analysis
VGAS        visual grading analysis score

AAPM        American Association of Physicists in Medicine
CEC         Commission of the European Communities
EU          European Union
ICRP        International Commission on Radiological Protection
ICRU        International Commission on Radiation Units and Measurements
IEC         International Electrotechnical Commission
ISO         International Organization for Standardization


THIS THESIS IS BASED ON THE FOLLOWING PAPERS:

I. A. Almén, A. Tingberg, S. Mattsson, J. Besjakov, S. Kheddache, B. Lanhede, L. G. Månsson, M. Zankl. The influence of different technique factors on image quality of lumbar spine radiographs as evaluated by established CEC image criteria. Brit. J. Radiol., 2000, revision requested.

II. B. Lanhede, A. Tingberg, S. Kheddache, M. Carlsson, L. G. Månsson, L. Björneld, M. Widell, A. Almén, J. Besjakov and S. Mattsson. The influence of technique factors on image quality for chest radiography - Application of the CEC image quality criteria. Radiat. Prot. Dosimetry, 2000, accepted for publication.

III. A. Tingberg, C. Herrmann, A. Almén, S. Mattsson, W. Panzer, J. Besjakov, B. Lanhede, L. G. Månsson, S. Kheddache, M. Zankl. The influence of the characteristic curve on the image quality of clinical radiographs of the lumbar spine and chest. Manuscript, 2000.

IV. C. Herrmann, A. Tingberg, J. Besjakov, K. Rodenacker. Simulation of nodule-like pathology in radiographs of the lumbar spine. Radiat. Prot. Dosimetry, 2000, accepted for publication.

V. A. Tingberg, C. Herrmann, J. Besjakov, K. Rodenacker, A. Almén, P. Sund, S. Mattsson, L. G. Månsson. Evaluation of lumbar spine images with added pathology. Proceedings of SPIE, Vol 3981, pp 34-42, 2000.

VI. P. Sund, C. Herrmann, A. Tingberg, S. Kheddache, L. G. Månsson, A. Almén, S. Mattsson. Comparison of two methods for evaluating image quality of chest radiographs. Proceedings of SPIE, Vol 3981, pp 251-258, 2000.

VII. M. Sandborg, A. Tingberg, P. Sund, G. McVey, D. Dance, A. Almén, B. Lanhede, G. Alm-Carlsson, S. Mattsson, L. G. Månsson. Demonstration of correlation between physical and clinical image quality measures in chest and lumbar spine screen-film radiography. Manuscript, 2000.

The papers will be referred to in the text by their Roman numerals.


THE FOLLOWING PRELIMINARY REPORTS HAVE BEEN GIVEN:

A. Tingberg, A. Almén, J. Besjakov, S. Kheddache, B. Lanhede, S. Mattsson, L. G. Månsson: The influence of the characteristic curve on the image quality. Svenska läkaresällskapets riksstämma, Göteborg, 1998.

A. Tingberg, C. Herrmann, A. Almén, J. Besjakov, S. Mattsson, P. Sund, B. Lanhede, S. Kheddache, L. G. Månsson: Comparison of two methods for evaluation of the image quality of lumbar spine radiographs. Workshop on “Medical X-ray Imaging – Potential Impact of the new EC Directive”, Malmö, 1999.

A. Tingberg, C. Herrmann, A. Almén, J. Besjakov, S. Mattsson, P. Sund, B. Lanhede, S. Kheddache, L. G. Månsson: VGA and ROC. Röntgenveckan i Skåne, Malmö, 1999.

A. Tingberg, P. Sund, C. Herrmann, S. Kheddache, L. G. Månsson, A. Almén and S. Mattsson: Comparison of two methods for evaluation of image quality in radiographs of the lungs. Svenska läkaresällskapets riksstämma, Stockholm, 1999.


CONTENTS

1. INTRODUCTION
1.1 OPTIMISATION IN DIAGNOSTIC RADIOLOGY
1.2 EVALUATION OF IMAGE QUALITY IN DIAGNOSTIC RADIOLOGY
1.2.1 PRIMARY PHYSICAL CHARACTERISTICS OF THE IMAGING SYSTEM
1.2.2 OVERALL SYSTEM PERFORMANCE
1.3 OBSERVER PERFORMANCE METHODS
1.3.1 OBSERVER PERFORMANCE METHODS BASED ON LESION DETECTION
1.3.2 OBSERVER PERFORMANCE METHODS BASED ON VISIBILITY OF ANATOMICAL STRUCTURES
1.4 MODELLING THE IMAGING CHAIN
1.5 BACKGROUND OF THE CURRENT STUDY

2. AIMS

3. MATERIALS AND METHODS
3.1 IMAGES AND RADIOGRAPHIC TECHNIQUES
3.2 PRODUCTION OF HYBRID IMAGES
3.3 EVALUATION OF CLINICAL IMAGE QUALITY
3.4 IMAGE CRITERIA SCORE, ICS
3.5 VISUAL GRADING ANALYSIS SCORE, VGAS
3.6 FREE-RESPONSE FORCED ERROR, FFE, EXPERIMENT
3.7 VALIDATION OF CLINICAL IMAGE QUALITY EVALUATION METHODS
3.8 PREDICTION OF CLINICAL IMAGE QUALITY WITH THEORETICAL MODELS

4. RESULTS AND DISCUSSION
4.1 FULFILMENT OF IMAGE CRITERIA, ICS
4.2 VISUAL GRADING ANALYSIS, VGAS
4.3 VALIDATION OF METHODS FOR EVALUATION OF CLINICAL IMAGE QUALITY
4.4 PREDICTION OF CLINICAL IMAGE QUALITY WITH THEORETICAL MODELS

5. CONCLUSIONS
5.1 CLINICAL IMAGE QUALITY DESCRIPTORS
5.2 VALIDATION OF CLINICAL IMAGE QUALITY DESCRIPTORS
5.3 PREDICTION OF CLINICAL IMAGE QUALITY WITH THEORETICAL MODELS

6. ACKNOWLEDGEMENTS

7. REFERENCES


1. INTRODUCTION

1.1 Optimisation in diagnostic radiology

Optimisation in diagnostic radiology means that for each examination the image quality must satisfy the diagnostic requirements for making a correct diagnosis at the lowest possible patient exposure. Optimisation of diagnostic radiology is recommended by the ICRP (1996), and with the stricter legislation of the European Medical Exposure Directive (1997) optimisation of diagnostic radiology is required within Europe, as from the year 2000. Because most radiographic procedures are clearly justified and normally directly benefit the exposed individual, less attention has been given to the optimisation of protection in medical exposure than in most other applications involving ionising radiation. As a result, there is considerable scope for dose reduction in diagnostic radiology (ICRP, 1991).

At present there is a variety of methods available for the determination of patient absorbed dose, including methods based on phantom measurements as well as purely computational methods. Gray et al. (1981) measured the organ-absorbed doses for 24 diagnostic examinations by placing TLDs in an anthropomorphic phantom, and calculated the mean organ dose to six risk organs. Almén and Nilsson (1996) determined conversion factors for risk organs, according to ICRP (1991), for two radiographic examinations of children, by measuring depth dose curves and applying them to a mathematical phantom describing the sizes and locations of the internal organs (Cristy and Eckerman, 1987). Tingberg et al. (1995) determined conversion factors for risk organs by measuring the dose distributions in an anthropomorphic phantom for a CT thorax examination, and applying these distributions to the Cristy and Eckerman phantom (1987).

The computational methods are typically based on Monte-Carlo calculations. Alm Carlsson et al. (1984) determined conversion factors between the kerma area product (KAP) and the energy imparted in examinations of the head and the trunk. Rosenstein (1988) has presented a catalogue of conversion factors for calculating organ doses from the entrance surface dose. Hart et al. (1994) determined conversion factors for a number of x-ray examinations for calculation of the effective dose from the kerma area product or the entrance surface dose (ESD). Zankl et al. (2000) calculated the effective dose to patients for two examinations.

There are also reference levels for the ESD (European Commission, 1996) and KAP (Saxeböl et al., 1998) for a number of standard radiographic examinations. The interpretation of these reference levels is that the mean ESD or KAP to a group of patients should not exceed the reference levels. For individual patients the reference levels should be applied with flexibility to allow higher doses if justified by clinical judgement (ICRP, 1991). There are, however, no corresponding reference levels for clinical image quality.


1.2 Evaluation of image quality in diagnostic radiology

Before going into details of how to determine image quality, it is important to understand the concept of image quality in diagnostic radiology. An image with good quality is an image that fulfils its diagnostic purpose. For example, in repeated examinations of scoliosis a noisy image might fulfil the diagnostic purpose without being an image with the highest possible spatial resolution. The image has good quality in relation to its purpose (ICRU, 1996).

There is a wide spectrum of methods for assessment of image quality (Table 1). Some of these methods focus on the physical characteristics of the imaging systems and others on subjective assessment of image quality; some are used for the whole imaging chain including the human observer (observer performance), while others are used for parts of the systems (typically physical measurements). The measurements mentioned in the first row are usually part of a regular quality control programme. The ultimate goal of all of these methods is to establish a connection between the physical characteristics of the imaging system and the diagnostic outcome of the system for a given examination.

Table 1. Methods for quality assessment of diagnostic imaging procedures.

Level of ambition | Investigation | Measurement
Lowest | Radiographic technique | Equipment characteristics (focal spot size, grids, screen-film combination, …); exposure parameters (tube voltage, mAs-value, …)
 | Primary physical characteristics | Contrast; spatial resolution (MTF); noise (Wiener spectrum); signal-to-noise ratio (SNR)
 | Overall system performance | Detective quantum efficiency, DQE; image quality index, IQI; contrast-detail resolution
 | Images of anthropomorphic phantoms | Receiver operating characteristics, ROC, and ROC-related methods (2AFC, LROC, FROC, AFROC, FFE, DROC); visual grading analysis, VGA
Highest | Images of patients | Receiver operating characteristics, ROC, and ROC-related methods (2AFC, LROC, FROC, AFROC, FFE, DROC); visual grading analysis, VGA; image criteria

1.2.1 Primary physical characteristics of the imaging system

The imaging characteristics of a system for diagnostic radiology can be studied using various physical test phantoms. These types of tests are often employed in determining the primary image quality parameters, such as contrast, spatial resolution and noise. Generally these tests are performed with physical measurements on film or soft copy (print-out or monitor image from digital detectors).

Contrast

Contrast is a measure of the relative signal difference between two locations in an image, especially between the image of an object and the background (Cunningham, 2000). The contrast of an imaging system is described by the characteristic response curve of the system. For screen-film systems the characteristic curve has an S-shape and can be determined according to ISO 9236-1 (1996). An x-ray film can easily become over- or underexposed. In digital systems the characteristic curve is generally linear and thus the risk of over- or underexposure can be avoided. However, if the radiographers or radiologists are not properly trained there is an obvious risk that the patient exposure becomes unnecessarily high, since a digital detector, unlike film, does not set a limit through film blackening.

Spatial resolution

The spatial resolution of an imaging system is the ability to reproduce small objects, or to separate the images of two objects close to each other. The spatial resolution of an imaging system can be described by the line spread function (LSF), the edge spread function (ESF) or the modulation transfer function (MTF). The MTF is a measure of how well an imaging system reproduces the spatial frequencies of the signal variations impinging on the system. The MTF can be determined by Fourier transform of the line spread function. This method is described by, for example, Doi et al. (1982), who made measurements on screen-film systems using the LSF. In digital radiography, Fujita et al. (1985) determined the MTF using measurements of the LSF, and Samei and Flynn (1998) used the ESF to determine the MTF. Yin et al. (1990) and Boone and Seibert (1994) have described methods for determination of analytical functions to describe the line spread function and the edge spread function, respectively. By these methods the noise of the subsequent modulation transfer function can be reduced.
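The step from LSF to MTF described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of any of the cited authors: the function name `mtf_from_lsf`, the pixel pitch and the Gaussian test LSF are hypothetical, and a plain discrete Fourier transform is used without the windowing or curve fitting a real measurement would need.

```python
import cmath
import math

def mtf_from_lsf(lsf, pixel_pitch_mm):
    """Return (frequencies in cycles/mm, MTF) from a sampled line spread function.

    The MTF is taken as the modulus of the discrete Fourier transform of the
    LSF, normalised to 1 at zero spatial frequency.
    """
    n = len(lsf)
    area = abs(sum(lsf))  # zero-frequency component, used for normalisation
    freqs, mtf = [], []
    for k in range(n // 2 + 1):  # non-negative frequencies up to Nyquist
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(lsf))
        freqs.append(k / (n * pixel_pitch_mm))
        mtf.append(abs(coeff) / area)
    return freqs, mtf

# Hypothetical example: a Gaussian LSF with sigma = 0.2 mm sampled at 0.1 mm.
pitch_mm = 0.1
sigma_mm = 0.2
n_samples = 64
lsf = [math.exp(-0.5 * (((i - n_samples // 2) * pitch_mm) / sigma_mm) ** 2)
       for i in range(n_samples)]
freqs, mtf = mtf_from_lsf(lsf, pitch_mm)
```

For a Gaussian LSF the result can be checked against the analytical MTF, exp(-2 π² σ² f²), which is one reason a Gaussian was chosen for the sketch.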

Random noise

Random noise is the fluctuation of the signal over an image that results from a uniform exposure (Cunningham, 2000). Random noise can be characterised by the standard deviation of the signal variations, but a complete description must take into account the spatial frequency dependence of the noise (Yaffe, 2000). The noise power spectrum, or Wiener spectrum, describes the noise content of an image as a function of the spatial frequency. Dainty and Shaw (1974) have described a noise theory and measurements of the Wiener spectrum. Measurements of Wiener spectra have been performed by e.g. Doi et al. (1982) for screen-film systems, Giger et al. (1984) for digital radiographic systems and by Marsh et al. (1995) for various digital systems (e.g. ultrasound, digital subtraction angiography (DSA) etc.).
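To illustrate why the standard deviation alone is an incomplete description, the sketch below estimates a one-dimensional noise power spectrum from a single profile across a uniformly exposed image. It is a simplified illustration, not a measurement protocol: practical Wiener spectrum measurements use two-dimensional regions and averaging over many samples, and the function name, pixel pitch and simulated profile are all assumptions.

```python
import cmath
import math
import random
from statistics import pvariance

def noise_power_spectrum_1d(profile, pixel_pitch_mm):
    """Return (frequencies in cycles/mm, NPS) for a 1D profile of a uniform exposure.

    The mean is subtracted first; NPS_k = (pitch / N) * |DFT_k|^2. Indices above
    N/2 correspond to negative frequencies of the two-sided spectrum.
    """
    n = len(profile)
    mean_signal = sum(profile) / n
    fluctuations = [v - mean_signal for v in profile]
    nps = []
    for k in range(n):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(fluctuations))
        nps.append(pixel_pitch_mm / n * abs(coeff) ** 2)
    freqs = [k / (n * pixel_pitch_mm) for k in range(n)]
    return freqs, nps

# Hypothetical uniform exposure: mean signal 100 with Gaussian noise.
rng = random.Random(0)
pitch_mm = 0.1
profile = [100 + rng.gauss(0, 2) for _ in range(128)]
freqs, nps = noise_power_spectrum_1d(profile, pitch_mm)

# Integrating the spectrum over frequency recovers the variance, so the
# standard deviation is a single-number summary of the whole spectrum.
delta_f = 1 / (len(profile) * pitch_mm)
integrated = sum(nps) * delta_f
```

With this normalisation the integral of the spectrum equals the variance of the profile (Parseval's theorem), while the shape of the spectrum carries the additional frequency information.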

Signal-to-noise ratio, SNR

While the difference in signal amplitude of an object relative to the background (contrast) and the noise properties of the image are important characteristics of the image quality, it is the ratio between them that is the most significant indicator of image quality (Dobbins, 2000). The signal-to-noise ratio (SNR) is defined by Equation 1:

$$\mathrm{SNR} = \frac{\Delta S}{\sigma} \qquad \text{(Equation 1)}$$

where
ΔS = Difference in signal of an object and of the background
σ = Standard deviation of the signal distribution in a background area

Rose (1948) showed that the SNR of an object needs to be 5 or more to make a reliable detection by human observers possible. The SNR of a radiographic system describes the ability of the system to reproduce low-contrast objects. The quantity SNR is especially useful in digital radiography systems, where the image noise can be lowered by increasing the number of photons in the image (and also increasing the patient absorbed dose), and the contrast can be varied arbitrarily by changing the slope of the characteristic curve of the system (Yaffe, 2000).
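As a small worked illustration of Equation 1 and of the threshold of 5 reported by Rose, the sketch below computes the SNR from two lists of pixel values. The pixel values, the function names and the use of the sample standard deviation are assumptions made for the example.

```python
from statistics import mean, stdev

ROSE_THRESHOLD = 5.0  # SNR needed for reliable detection according to Rose (1948)

def signal_to_noise_ratio(object_pixels, background_pixels):
    """SNR = delta_S / sigma, following Equation 1.

    delta_S is the absolute difference between the mean object signal and the
    mean background signal; sigma is the sample standard deviation of the
    background (an assumption; the text does not specify sample vs population).
    """
    delta_s = abs(mean(object_pixels) - mean(background_pixels))
    sigma = stdev(background_pixels)
    return delta_s / sigma

def is_reliably_detectable(snr, threshold=ROSE_THRESHOLD):
    """True if the SNR reaches the threshold."""
    return snr >= threshold

# Hypothetical pixel values from a background region and an object region.
background = [100, 102, 98, 101, 99, 100, 103, 97]
obj = [112, 110, 111, 113]
snr = signal_to_noise_ratio(obj, background)  # 11.5 / 2.0 = 5.75
```

Here the background has mean 100 and standard deviation 2, the object has mean 111.5, and the resulting SNR of 5.75 is just above the threshold.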

1.2.2 Overall system performance

A combination of the primary physical characteristics (contrast, spatial resolution and noise) into a common quantity will be a measure of the overall imaging capability of a radiographic system. Such a quantity can be used to compare the total image quality of two radiographic systems. For instance, a system with high spatial resolution but with very high inherent noise can have lower overall image quality than a system with moderate spatial resolution and low noise. The combination of the primary physical characteristics can be achieved analytically or indirectly. The detective quantum efficiency and the image quality index (see below) are formed by combining primary physical characteristics into one equation. The contrast-detail resolution test (see below) combines the primary physical characteristics indirectly, by a study of the visibility of small low-contrast objects in the final image.

Detective quantum efficiency, DQE

The detective quantum efficiency (DQE) is a parameter that basically describes the overall capability of the system to use the information of the incoming photon fluence distribution for the formation of the image. It is defined as the square of the quotient of the signal-to-noise ratio of the signal out of the imaging system over the signal-to-noise ratio impinging on the imaging system (Equation 2) (Rose, 1946; Shaw, 1979). Determination of the DQE is made by combining the MTF, the Wiener spectrum and the response of the system (gradient of the characteristic curve). The DQE is a valuable quantity for comparing the performance of various imaging systems (Dobbins et al., 1995; Metz et al., 1995). In particular, it is widely used for characterising the performance of digital systems. The problem with this quantity is that it is difficult to determine and at the moment there is no widely accepted standard method for the determination of the DQE. There are, however, at least two task groups (one international, IEC, and one American, AAPM) that are currently working on establishing an acceptable standard method.

$$\mathrm{DQE} = \left( \frac{\mathrm{SNR}_{out}}{\mathrm{SNR}_{in}} \right)^{2} \qquad \text{(Equation 2)}$$
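Equation 2 can be made concrete with an idealised photon-counting example, sketched below. The assumptions are stated loudly: the incident photons follow Poisson statistics, so SNR_in is the square root of their number, and the detector is a hypothetical ideal counter that absorbs a fixed fraction of them and adds no other noise. Real detectors are frequency dependent and are not covered by this sketch.

```python
import math

def detective_quantum_efficiency(snr_out, snr_in):
    """DQE = (SNR_out / SNR_in)^2, following Equation 2."""
    return (snr_out / snr_in) ** 2

# Hypothetical numbers: 10 000 incident photons, 60 % of which are absorbed.
incident_photons = 10_000
absorbed_fraction = 0.6

snr_in = math.sqrt(incident_photons)                       # Poisson statistics
snr_out = math.sqrt(absorbed_fraction * incident_photons)  # ideal counter
dqe = detective_quantum_efficiency(snr_out, snr_in)
```

Under these assumptions the DQE equals the absorbed fraction, which is why the quantity can be read as the fraction of the incoming information that the system actually uses.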

Image quality index, IQI

Another method for describing the overall performance of an imaging system is the image quality index (IQI), described by Hessler et al. (1985a) and by Desponds et al. (1991). The IQI of an imaging system is the diameter of the smallest high contrast sphere (microcalcification) that is detectable with the system. The IQI is determined by combining the MTF, the Wiener spectrum and the contrast of the imaging system. The method has mostly been used in mammography (Hessler et al., 1985b; Verdun et al., 1996).

Contrast-detail resolution

A tool for quick determination of the performance of an imaging system is the use of a contrast-detail phantom (e.g. Thijsen et al. (1989)). The phantom consists of a PMMA slab with cylindrical holes of different diameter and depth, or, for mammography, gold discs of different diameter and thickness included in a PMMA slab. For a given diameter the observer marks the detail that is on the edge of detection (smallest visible contrast) in the image. Usually human observers are used for the evaluation, but due to the subjective nature of human observers (e.g. the acuity of the human eye, fatigue etc.) a detection algorithm implemented in a computer program may be used (Jansen and Zoetelief, 2000). Rose (1948) described the theoretical basis for this method, in the so-called Rose model. Cook et al. (1994) have used the method for chest radiography, Almén et al. (1996) for paediatric radiology, and de Paredes et al. (1998) for mammography.

1.3 Observer performance methods

In observer performance studies the quality of the whole imaging chain is evaluated, including the human observer (the radiologist). These methods give a measure of the clinical image quality of an imaging system, i.e. how good the system is for a diagnostic purpose. Ideally these tests are performed on patient images produced in the clinic. For practical reasons images of human-like phantoms and hybrid images (patient images in which structures, typically simulating pathology, have been inserted) are often used. There is a risk that observer performance methods will be influenced by the subjective nature of the observer, for example the experience of the radiologist, the viewing conditions, fatigue etc. Therefore, efforts have been made to reduce these effects.

Figure 1 shows the relationships between the observer performance methods that are described below, and the relationships that were searched for in the present study (Papers V and VI). The methods on the left side of Figure 1 (ROC and ROC-related methods) are generally very time consuming, both in the set-up and in the execution of the study. The images that should be used for ROC and ROC-related methods must contain a known signal (e.g. a tumour), and the signal must be subtle enough. If the signal is too conspicuous the quality of the imaging system will not be tested, since the signal will be fully reproduced independently of the quality of the system. The advantage of the ROC methods is that they have been widely used and are well accepted in the radiological community. The methods on the right side of Figure 1 are comparatively easy to perform: the preparations for these studies are straightforward, since images from normal patients (i.e. healthy patients) can be used. The disadvantage of these methods is that there are few studies described in the literature. Naturally it would be very desirable to find a correlation between the easy-to-perform methods on the right and the well-tried ROC methods on the left.


Figure 1. The relationships between different observer performance methods. The arrows indicate relationships that were searched for in the present study.

1.3.1 Observer performance methods based on lesion detection

Receiver operating characteristics, ROC

Receiver operating characteristics (ROC) analysis is widely accepted as the most complete way of quantifying and reporting accuracy in two-group classification tasks (Metz, 2000). By deliberately changing the “threshold of abnormality” or “critical confidence level” which is used to distinguish nominally positive images from nominally negative images, the radiologist can achieve different combinations of sensitivity and specificity. ROC analysis is ideally suited to the task of separating such “decision threshold” effects from inherent differences in diagnostic accuracy. Changing the setting of this critical confidence level changes an imaging procedure’s sensitivity (i.e. the probability that an actually-positive image will be classified correctly as “positive” with respect to a given disease) and the procedure’s specificity (the probability that an actually-negative image will be classified correctly as “negative”). ROC analysis shows all tradeoffs between sensitivity and specificity that the imaging system can achieve.


An ROC curve is a plot of all of these tradeoffs, and is formed by plotting the true positive fraction (TPF = sensitivity) against the false positive fraction (FPF = [1 − specificity]). The area under an ROC curve, Az, is a common measure of the ability of the imaging system to discriminate the signal (e.g. a tumour) from the background, i.e. Az is a measure of the image quality of the system (Metz, 1986, 1989; Chesters, 1992; Månsson, 1994). In an ROC study the observer views images containing either one signal or none. For each image the observer has to state the level of confidence of his/her decision on a scale, often from 1 to 5 (see Table 2). If the observer is totally sure that the image contains a tumour he/she calls a 5, if the observer thinks that an image might not contain a tumour then he/she calls a 2, and so on.

Table 2. A five-level confidence scale often used in ROC.

Confidence level | Meaning
1 | definitely or almost definitely negative
2 | probably negative
3 | possibly positive
4 | probably positive
5 | definitely or almost definitely positive

Based on these decisions the TPF and the FPF of each decision threshold level are calculated and an ROC curve can be constructed (Månsson, 1994), Figure 2. By means of the ROCKIT software (Metz, 1998) the TPF, the FPF and the area under an ROC curve, Az, can be calculated. The area can assume values between 0.5 and 1. The larger the area, the better the system is for detecting lesions. An ROC curve based on pure guessing would be a diagonal through the coordinates (0,0) and (1,1) and thus it has an area of 0.5.
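How the TPF and FPF follow from the confidence ratings can be sketched as below. This is a non-parametric illustration with hypothetical ratings; it is not the ROCKIT algorithm, which fits a model to the data, so the trapezoidal area computed here is only a simple stand-in for Az. The function names are assumptions.

```python
def roc_points(ratings_positive, ratings_negative, levels=(1, 2, 3, 4, 5)):
    """Return (FPF, TPF) pairs, from the strictest to the most lenient threshold.

    An image is called positive at a given threshold if its confidence rating
    is at least that threshold. The point (0, 0) is added as the start.
    """
    points = [(0.0, 0.0)]
    for threshold in sorted(levels, reverse=True):
        tpf = sum(r >= threshold for r in ratings_positive) / len(ratings_positive)
        fpf = sum(r >= threshold for r in ratings_negative) / len(ratings_negative)
        points.append((fpf, tpf))
    return points

def trapezoidal_area(points):
    """Area under the piecewise-linear curve through the given points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical ratings on the five-level scale of Table 2.
with_signal = [5, 5, 4, 4, 3, 3, 2, 5, 4, 1]     # images containing a signal
without_signal = [1, 1, 2, 2, 3, 1, 2, 4, 1, 3]  # images without a signal

points = roc_points(with_signal, without_signal)
area = trapezoidal_area(points)  # 0.82 for these ratings
```

If the two groups of images received identical rating distributions, the points would fall on the diagonal and the area would be 0.5, which corresponds to the pure-guessing case described above.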

Figure 2. Examples of curves resulting from ROC, FROC and AFROC, respectively.

ROC analysis was originally used for quantifying electrical signals, e.g. radar. It was later adapted to nuclear medicine and diagnostic radiology and the method has been widely used for assessment of the quality of an imaging system. Swets and Pickett (1982) described the theory of ROC. Hanley and McNeil (1982) described the interpretation of the area under an ROC curve. Criticism of the ROC methodology includes the fact that ROC cannot deal with multiple signals in an image or with the location of the lesion (Harrington, 1990). This can lead to a situation where the true signal in an image is missed but a false signal is indicated. This results wrongly in a true positive classification of the image. In comparisons between imaging systems, this is not a serious problem, provided that the degree of uncertainty due to localisation errors is approximately the same for all the systems being evaluated (Månsson, 1994). Even though ROC analysis has some limitations it is the most widely used method for observer performance studies, especially in the USA.

Variants of ROC

Chakraborty (2000) has made a detailed review of the different variants of ROC. A short description of some of these methods is given below.

Free-response receiver operating characteristics, FROC

To overcome the limitations of ROC analysis, several other methods have been developed. Bunch et al. (1978) proposed the free-response ROC, FROC. The images for an FROC study contain either no signal (i.e. a negative image) or one or more known signals (i.e. a positive image). The observer is asked to search each image for lesions and to indicate and rate the suspicious locations according to confidence level. A high rating means high confidence that the location in question is actually a lesion, and a low rating means that the location probably is not a lesion. Multiple responses are allowed for each image, but the observer can also choose not to indicate any locations in the image (the observer has not detected any lesions). If an indicated location is an actual lesion, the event is scored as a true positive (TP) event; otherwise it is scored as a false positive (FP). The true positive fraction is plotted against the mean number of false positives per image or sample area to form an FROC curve (Figure 2). The FROC curve is unbounded in the positive x direction and does not, by itself, lead to a summary performance index (like the area under an ROC curve, Az). The FROC curves resulting from an evaluation of two different imaging systems can, however, be compared qualitatively. The system whose curve has higher y-values for given x-values is the better system for separating lesions from the background.
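The construction of FROC points from rated marks can be sketched as follows. The data layout (a pooled list of rating and hit/miss pairs), the four-level rating scale and the function name are assumptions made for the illustration, and each lesion is assumed to be marked at most once.

```python
def froc_points(marks, n_lesions, n_images, levels=(1, 2, 3, 4)):
    """Return (mean false positives per image, true positive fraction) pairs.

    marks: list of (rating, is_lesion) tuples pooled over all images, where
    is_lesion is True if the marked location coincides with an actual lesion.
    One point is produced per rating threshold, strictest threshold first.
    """
    points = []
    for threshold in sorted(levels, reverse=True):
        true_pos = sum(1 for rating, is_lesion in marks
                       if is_lesion and rating >= threshold)
        false_pos = sum(1 for rating, is_lesion in marks
                        if not is_lesion and rating >= threshold)
        points.append((false_pos / n_images, true_pos / n_lesions))
    return points

# Hypothetical study: 4 images containing 5 lesions in total, 8 marks made.
marks = [(4, True), (4, True), (3, True), (3, False),
         (2, True), (2, False), (1, False), (1, False)]
points = froc_points(marks, n_lesions=5, n_images=4)
```

In this example one lesion is never marked, so the true positive fraction levels off at 0.8 while the number of false positives per image keeps growing as the threshold is relaxed.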

Alternative free-response receiver operating characteristics, AFROC An AFROC study is performed in the same way as an FROC study, but the outcome is interpreted somewhat differently. The observer searches the images for lesions and indicates, for each suspected location, the level of confidence that the location actually is a lesion. In AFROC, however, instead of using the mean number of false positives per image, Chakraborty and Winter (1990) suggested using the probability that an image produces one or more false positive responses. The AFROC curve is constructed by plotting the true positive fraction of detected and localised lesions as a function of the probability of at least one false positive per image (Figure 2). The area under an AFROC curve, A1, can assume any value between 0 and 1. Zero means pure guessing and 1 means perfect performance of the imaging system.
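The difference from FROC lies only in the x-axis: an image counts once, however many false positives it produces. A minimal sketch, assuming that the probability of at least one false positive is estimated from the highest-rated false positive on each image (the function name and data layout are hypothetical):

```python
def afroc_points(lesion_ratings, fp_ratings_per_image, n_lesions):
    """Return AFROC operating points as (P(>=1 FP on an image), TP fraction).

    lesion_ratings       -- ratings of correctly localised lesions, pooled
    fp_ratings_per_image -- one list of false positive ratings per image
                            (empty list for an image without false positives)
    n_lesions            -- total number of lesions in the image set
    """
    n_images = len(fp_ratings_per_image)
    # Only the highest-rated false positive decides whether an image
    # produces "one or more" false positives at a given threshold.
    highest_fp = [max(fps) for fps in fp_ratings_per_image if fps]
    thresholds = sorted(set(lesion_ratings) | set(highest_fp), reverse=True)
    points = []
    for t in thresholds:
        tpf = sum(r >= t for r in lesion_ratings) / n_lesions
        p_fp = sum(h >= t for h in highest_fp) / n_images
        points.append((p_fp, tpf))
    return points
```

Both coordinates are now bounded by 1, so an area under the curve can be defined.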

Free-response forced error, FFE With the FFE experiment a direct estimation of the area under an AFROC curve, A1, can be obtained (Chakraborty and Winter, 1990). A1 is called “FFE-score” in this text. Images with one or more lesions are evaluated. The observers are asked to search the images for suspected locations, and to rank these findings in order of confidence. The average fraction of correct findings over a number of images before a false positive location is indicated is equal to the area under the AFROC curve. The mathematics of FFE is simple and the method is very intuitive (Månsson, 1994).

1.3.2 Observer performance methods based on visibility of anatomical structures Visual grading analysis, VGA In visual grading analysis, VGA, the appearance of the whole image or parts of an image is evaluated visually. A special case of VGA is to compare the visibility of defined structures with the same structures in a reference image (Månsson, 1994). The latter approach has been used in this study and the term VGA refers to this method in the following text. The visibility of the structures is often graded on a five-level scale: clearly inferior to (-2), slightly inferior to (-1), equal to (0), slightly better than (+1) and clearly better than (+2) the structure in the reference image. Previously this evaluation method has mostly been used in chest radiography (Manninen et al., 1985; Kheddache et al., 1993; Leitz et al., 1993), but also in mammography (Olsen and Sager, 1995; Kheddache and Kvist, 1997) and skeletal radiography (De Smet et al., 1981, 1982).

Image criteria, IC A special case of visual grading analysis is the use of image criteria. The criteria state various levels of visibility of defined structures (e.g. “visually sharp reproduction of the pedicles”). The task for the observer is to decide whether the criterion is fulfilled or not in an image. With this method an absolute level of image quality can be determined from what is stated in the criterion, within the frame of reference of the observer, provided that the observer’s decision threshold is constant. The CEC has presented a list of image criteria in the European guidelines on Quality Criteria for Diagnostic Radiographic Images (European Commission, 1996). Schibilla and Moores (1995) have reviewed the history of these image criteria. Vañó et al. (1995) have used the image criteria for chest radiography and proposed a list of revised criteria. Guibelalde et al. (1996) used the image criteria for evaluation of two types of screen-film combinations for chest radiography.


1.4 Modelling the imaging chain For optimisation purposes it is necessary to use a realistic theoretical model to test different imaging conditions. The model must give the same results as human observers would in clinical detection studies (Dance et al., 1999). Detailed studies can be performed with the model, e.g. varying the tube voltage in small steps, which would be very time-consuming if real patient images and human observers were used (Dance et al., 1999). Models can also be used for predicting the effects of varying the exposure parameters beyond what is available on the market today (Sandborg et al., 1999), e.g. “Is it possible to use an air gap for lumbar spine radiography to lower the patient absorbed dose in a way similar to that for chest?”. With the aid of a theoretical model it is possible to select which radiographic techniques should be tested on patients, thereby reducing the number of patient exposures made with techniques that later prove to be useless.

1.5 Background to the present study 1.5.1 European Guidelines In the late 1980s, several studies of patient exposure in Europe had shown large variations in the strategies for radiographic examinations as well as large variations in the patient absorbed dose for a given examination (Schibilla and Moores, 1995). It was proposed that the collected dosimetric data should become the starting point for optimisation of radiographic examinations. In 1987, a group from the Radiation Protection Programme of the Commission of the European Communities (CEC) initiated a project for the establishment of quality criteria for diagnostic radiographic images (Maccia et al., 1995). The purpose of the project was to provide radiologists with a list of radiological, as well as physical and technical, criteria, to be used in evaluating the quality of routine examinations. Two clinical trials were performed in 1987 and 1991 (European Commission, 1990, 1996). Their purpose was to present a set of guidelines for the achievement of uniform strategies for standard radiographic examinations in Europe, thereby acquiring adequate image quality at low patient absorbed dose. The results of the trials in 1987 and 1991 showed large variations in technique parameters such as kVp, focal spot size, focus-film distance, speed and the use of automatic exposure control (AEC). There was a considerable difference in the patient absorbed doses for all types of examinations and projections, especially for lumbar spine. However, 50 % of the mean dose values were within the CEC reference values. No significant correlation was observed between patient absorbed dose and image quality. AEC and fast systems seemed to be suitable strategies for the optimisation of patient absorbed dose and image quality. One suggestion, as a result of the trials, was to develop quality assurance programmes and to offer the personnel training in radiation protection in order to improve the situation.


Another important outcome of the CEC trials in 1987 and 1991 was the image criteria that were proposed (Table 5 and Table 6). The image criteria were intended to specify an image in which, provided that the criteria in Table 5 and Table 6 were met, all pathological conditions relating to the disease could be expected to be diagnosed. One of the conclusions of the 1991 trial was that these criteria could be used in a reproducible way and that they represented an improvement in the evaluation of image quality compared to purely subjective judgement.

1.5.2 CEC 4th Framework Programme, Predictivity and Optimisation in Diagnostic Radiology After a joint proposal from three British, Swedish and German groups, a project “Predictivity and Optimisation in Diagnostic Radiology”, was funded by CEC in 1996 (Moores et al., 2000). The main objective of the project was to investigate and possibly to understand the relationship between physical and technical parameters of a radiographic procedure, the image quality as experienced by a human observer and the patient absorbed dose. One of the main objectives for the Malmö-Göteborg group was to use and to further improve the existing image criteria and to evaluate the image quality of clinical images produced with a variety of radiographic techniques, with methods based on the image criteria. Images produced in the clinic and images that were digitally manipulated in various ways to simulate characteristics of available imaging systems and beyond what is currently available, were evaluated. The Malmö-Göteborg group worked very closely with GSF (National Research Center for Environment and Health, Neuherberg, Germany) throughout the project. The GSF group was responsible for performing the image manipulations, as well as for calculating the effective dose to the patients, given the exposure factors actually used.


2. AIMS The aims of this thesis, which was a part of the project “Predictivity and Optimisation in Diagnostic Radiology”, were:

• To develop simple methods based on the CEC image criteria for quantifying the clinical image quality of various radiographic techniques for lumbar spine and chest radiography

• To validate these new image quality evaluation methods with generally accepted methods based on receiver operating characteristics, ROC. A correlation between the simpler methods based on the CEC image criteria and the more tedious methods based on ROC analysis would be very beneficial to routine image quality evaluation in a clinical environment

• To compare measures of image quality, predicted by a theoretical model, with the clinical image quality descriptors obtained by the new methods


3. MATERIALS AND METHODS 3.1 Images and radiographic techniques A bank of clinical AP lumbar spine and PA chest radiographs was collected together with detailed information about the exposure settings, the film processing, the patient configuration (height, weight and thickness in the beam direction) as well as the entrance surface dose (Paper I and Paper II), for several different radiographic techniques (Table 3 and Table 4). The detailed information about all factors affecting the exposure made it possible to simulate changes in the exposure settings actually used during the exposure. The first four techniques for lumbar spine and the first sixteen for chest (under the heading “Trial 1”) were produced with conventional screen-film technology, and the others were simulated by digital manipulation of various imaging parameters (Herrmann et al., 2000a). The techniques listed under “Trial 2” were produced by varying the shape of the characteristic curve. IL2, IL and A have a flatter characteristic curve than the L-film (lower contrast) and M, G, UG and UG2 have a steeper characteristic curve than the L-film (higher contrast) (Paper III). The techniques under “Trial 3” were simulated by varying the MTF and/or the noise of the system. The basis for these techniques was the characteristics of a screen-film system with speed 400 for lumbar spine and speed 320 for chest (speed is the sensitivity of the screen; the higher the speed, the higher the sensitivity). By reducing the MTF and increasing the noise (by manipulation of the Wiener spectrum) in the image, two techniques were simulated which correspond to faster screens, speed≈800 and speed>1000. For the last four techniques (for both lumbar spine and chest), either the MTF or the noise of the system was altered. The MTF or the noise was altered to the level of the 800 and the 1000 speed screens.
As an example, for the technique MTF 0, noise 2, the noise was increased to the level of the system with speed>1000 (no alteration of the MTF). After the alteration of radiographic parameters the images were printed on film for evaluation by means of conventional viewing boxes. Four experienced radiologists compared the quality of the printed images with the quality of the original films. No significant differences could be detected; thus the digitisation and printing process did not visibly influence the quality of the printed images. The procedure from original analogue image, via digitising, image manipulation and printing, to image quality evaluation is summarised in Figure 3. A thorough description of the image manipulations is given by Herrmann et al. (2000a) and of the digitising and printing process by Herrmann et al. (2000b). All of these techniques except the last four were evaluated with the image criteria method and with visual grading analysis. Images produced with the last four techniques (for both lumbar spine and chest) were only evaluated with VGA. The three techniques marked with * were used for producing the images for the FFE study.


Table 3. The radiographic techniques used for the production of the lumbar spine images. The techniques marked with * were used for producing the hybrid images. The abbreviations in the table mean kVp – speed of screen-film system – film type – and, if present, resolution level and added noise.

Radiographic technique

Trial 1
70 kVp, 400, Latitude film (L)
70 kVp, 600, Latitude film (L)
90 kVp, 400, Latitude film (L)
90 kVp, 600, Latitude film (L)

Trial 2
70 kVp, 400+600, “Infra Latitude2” film (IL2)
70 kVp, 400+600, “Infra Latitude” film (IL)
70 kVp, 400+600, Latitude film (L)
70 kVp, 400+600, Medium film (M)
70 kVp, 400+600, Gradient film (G)
70 kVp, 400+600, “Ultra Gradient” film (UG)
70 kVp, 400+600, “Ultra Gradient2” film (UG2)
90 kVp, 400+600, “Infra Latitude2” film (IL2)
90 kVp, 400+600, “Infra Latitude” film (IL)
90 kVp, 400+600, Latitude film (L)
90 kVp, 400+600, Medium film (M)
90 kVp, 400+600, Gradient film (G)
90 kVp, 400+600, “Ultra Gradient” film (UG)
90 kVp, 400+600, “Ultra Gradient2” film (UG2)

Trial 3
70 kVp, 400, Latitude film (corresponds to MTF 0, noise 0) *
70 kVp, 800, Latitude film (corresponds to MTF 1, noise 1) *
70 kVp, >1000, Latitude film (corresponds to MTF 2, noise 2) *
70 kVp, 400, Latitude film, MTF 1, noise 0
70 kVp, 400, Latitude film, MTF 2, noise 0
70 kVp, 400, Latitude film, MTF 0, noise 1
70 kVp, 400, Latitude film, MTF 0, noise 2

Table 4. The radiographic techniques used for the production of the chest images. The techniques marked with * were used for producing the hybrid images. The abbreviations in the table mean kVp – speed of screen-film system – method for scatter reduction – maximum density – film type – and, if present, resolution level and added noise.

Radiographic technique

Trial 1
102 kVp, 160, grid, 1.3, Latitude film
102 kVp, 320, grid, 1.3, Latitude film
102 kVp, 160, grid, 1.8, Latitude film
102 kVp, 320, grid, 1.8, Latitude film
102 kVp, 160, air gap, 1.3, Latitude film
102 kVp, 320, air gap, 1.3, Latitude film
102 kVp, 160, air gap, 1.8, Latitude film
102 kVp, 320, air gap, 1.8, Latitude film
141 kVp, 160, grid, 1.3, Latitude film
141 kVp, 320, grid, 1.3, Latitude film
141 kVp, 160, grid, 1.8, Latitude film
141 kVp, 320, grid, 1.8, Latitude film
141 kVp, 160, air gap, 1.3, Latitude film
141 kVp, 320, air gap, 1.3, Latitude film
141 kVp, 160, air gap, 1.8, Latitude film
141 kVp, 320, air gap, 1.8, Latitude film

Trial 2
102 kVp, 160, air gap, 1.8, Asymmetric film (A)
102 kVp, 160, air gap, 1.8, “Infra Latitude2” film (IL2)
102 kVp, 160, air gap, 1.8, “Infra Latitude” film (IL)
102 kVp, 160, air gap, 1.8, Latitude film (L)
102 kVp, 160, air gap, 1.8, Gradient film (G)
102 kVp, 320, air gap, 1.8, Asymmetric film (A)
102 kVp, 320, air gap, 1.8, “Infra Latitude2” film (IL2)
102 kVp, 320, air gap, 1.8, “Infra Latitude” film (IL)
102 kVp, 320, air gap, 1.8, Latitude film (L)
102 kVp, 320, air gap, 1.8, Gradient film (G)
141 kVp, 160, air gap, 1.8, Asymmetric film (A)
141 kVp, 160, air gap, 1.8, “Infra Latitude2” film (IL2)
141 kVp, 160, air gap, 1.8, “Infra Latitude” film (IL)
141 kVp, 160, air gap, 1.8, Latitude film (L)
141 kVp, 160, air gap, 1.8, Gradient film (G)
141 kVp, 320, air gap, 1.8, Asymmetric film (A)
141 kVp, 320, air gap, 1.8, “Infra Latitude2” film (IL2)
141 kVp, 320, air gap, 1.8, “Infra Latitude” film (IL)
141 kVp, 320, air gap, 1.8, Latitude film (L)
141 kVp, 320, air gap, 1.8, Gradient film (G)

Trial 3
141 kVp, 320, air gap, 1.8, Latitude film (corresponds to MTF 0, noise 0) *
141 kVp, 800, air gap, 1.8, Latitude film (corresponds to MTF 1, noise 1) *
141 kVp, >1000, air gap, 1.8, Latitude film (corresponds to MTF 2, noise 2) *
141 kVp, 320, air gap, 1.8, Latitude film, MTF 1, noise 0
141 kVp, 320, air gap, 1.8, Latitude film, MTF 2, noise 0
141 kVp, 320, air gap, 1.8, Latitude film, MTF 0, noise 1
141 kVp, 320, air gap, 1.8, Latitude film, MTF 0, noise 2


Figure 3. Schematic description of the production of the images for the three clinical trials.

3.2 Production of hybrid images Hybrid images are images into which artificial structures, typically pathological, are introduced, i.e. structures that are not present in the normal patient are inserted in the image. These images were used to evaluate the image quality of a radiographic system with various receiver operating characteristics (ROC) methods. In this way, the intrinsic problem of ROC analysis, that the true state of the images (positive or negative signal) must be known, can be overcome. Either the structures are placed on the surface of the patient at the time of the exposure or the structures are inserted in the image after the exposure. The structures to be added to the images must mimic, as closely as possible, the real signals (e.g. tumours) that can be found in radiographic images (Sherrier et al., 1985). Ideally the added lesions should be indistinguishable from real tumours. A method originally developed for chest images (Samei et al., 1997) was adapted (Paper IV). The addition of lesions into the chest images was rather straightforward, but for lumbar spine images there were some special problems that had to be taken care of. Previously there was no method for producing lumbar spine hybrid images; such a method had to be developed. A lesion in a lumbar spine radiograph can appear as either increased or decreased optical density, reproducing destructive and sclerotic lesions, respectively. These two types of lesions were added to the lumbar spine images. Furthermore, a destructive lesion involving cortex in a vertebral body destroys the cortical line, and in order to simulate this behaviour image processing was performed to erase the cortex line. The development of this method is described in Paper III.

3.3 Evaluation of clinical image quality The clinical image quality of the radiographs was evaluated in three so-called clinical trials. In each of these trials physical and technical parameters were varied in a controlled manner. In the first trial, the tube voltage and speed of the screen-film system were varied for the lumbar spine study while the tube voltage, the speed of the screen-film system, the maximum optical density and the method for scatter reduction were varied for the chest study. In Trial 2 the film type (i.e. the shape of the characteristic curve) was varied, and in Trial 3 the resolution and the noise of the imaging system were varied for both types of investigations. The images were evaluated using two methods based on the image criteria of the European Guidelines (European Commission, 1996): fulfilment of image criteria and visual grading analysis. For each image the observers stated whether each of the image criteria was fulfilled or not, and graded the visibility of the structures of these criteria compared to the same structures in a reference image. Based on these two evaluation methods two clinical image quality descriptors were defined, the image criteria score, ICS, and the visual grading analysis score, VGAS. All images were evaluated on film with conventional viewing boxes by seven expert radiologists, each with at least fifteen years of experience in the field. At the time of the first trial, the 1996 version of the image criteria was not available and therefore the 1990 version was used. The difference between the 1990 and the 1996 versions for the lumbar spine image criteria is negligible (only the wording for criterion 1 has been changed). The 1990 version of the lumbar spine image criteria used in Trial 1 is listed in Table 5. In the chest study the criteria that mainly relate to the positioning of the patient were omitted since they address the skill of the radiographer rather than the actual performance of the imaging system.
The image criteria used for the chest study in the first trial are listed in Table 6. A suggestion for a revision of the chest criteria was presented before Trial 1, and this revised version of the criteria was also used in Trial 1 (Paper II).

Table 5. The CEC image criteria for lumbar spine examinations presented in the CEC Quality Criteria for Diagnostic Radiographic Images and Patient Exposure Trial (1990).

Image criteria
1  Visually sharp reproduction of the upper and lower-plate surfaces, represented as lines in the centred beam area
2  Visually sharp reproduction of the pedicles
3  Reproduction of the intervertebral joints
4  Reproduction of the spinous and transverse processes
5  Visually sharp reproduction of the cortex and trabecular structures
6  Reproduction of the adjacent soft tissue, particularly the psoas shadows
7  Reproduction of the sacro-iliac joints

Table 6. The CEC image criteria for chest examinations presented in the CEC Quality Criteria for Diagnostic Radiographic Images and Patient Exposure Trial (1990).

Image criteria
1  Reproduction of the vascular pattern in the whole lung, particularly the peripheral vessels
2  Visually sharp reproduction of the trachea and proximal bronchi, the borders of the heart and aorta
3  Visually sharp reproduction of the diaphragm and costo-phrenic angles
4  Visualisation of the retrocardiac lung and mediastinum

3.4 Image criteria score, ICS For a group of images stemming from the same radiographic technique, the fraction of fulfilled criteria is calculated to form an image criteria score, the ICS. Equation 3 defines the ICS:

$$\mathrm{ICS} = \frac{\sum_{i=1}^{I} \sum_{c=1}^{C} \sum_{o=1}^{O} F_{i,c,o}}{I \cdot C \cdot O} \qquad \text{(Equation 3)}$$

where
$F_{i,c,o}$ = fulfilment of criterion c for image i and observer o; $F_{i,c,o} = 1$ if criterion c is fulfilled, otherwise $F_{i,c,o} = 0$
I = number of images
C = number of criteria
O = number of observers

The definition implies that the ICS can be used as a score for individual images, criteria or observers.
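The ICS of Equation 3 is simply the mean of the binary fulfilment decisions. A minimal sketch, assuming the decisions are stored as a nested list indexed as `fulfilment[i][c][o]` and that every observer has judged every criterion in every image (the function name and data layout are illustrative assumptions):

```python
def ics(fulfilment):
    """Image criteria score: fraction of fulfilled criteria (Equation 3).

    fulfilment[i][c][o] is 1 if criterion c is judged fulfilled for
    image i by observer o, and 0 otherwise (hypothetical data layout).
    """
    total = 0   # number of fulfilled decisions, sum of F_{i,c,o}
    count = 0   # number of decisions, I * C * O for a complete design
    for image in fulfilment:
        for criterion in image:
            for decision in criterion:
                total += decision
                count += 1
    return total / count
```

For example, two images, two criteria and two observers with five fulfilled decisions out of eight give an ICS of 0.625. Passing the decisions for a single image, criterion or observer yields the corresponding individual score.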

3.5 Visual grading analysis score, VGAS Defined structures in an image are visually compared with the corresponding structures in a reference image and graded on a five-level scale (clearly inferior to (-2), slightly inferior to (-1), equal to (0), slightly better than (+1) and clearly better than (+2) the structure in the reference image). A mean score, the visual grading analysis score, VGAS, is calculated for a group of images. VGAS is defined analogously to ICS and can also be used for individual images, structures or observers.

$$\mathrm{VGAS} = \frac{\sum_{i=1}^{I} \sum_{s=1}^{S} \sum_{o=1}^{O} G_{i,s,o}}{I \cdot S \cdot O} \qquad \text{(Equation 4)}$$

where
$G_{i,s,o}$ = grading (-2, -1, 0, +1 or +2) for image i, structure s and observer o
I = number of images
S = number of structures
O = number of observers
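A corresponding sketch for the VGAS of Equation 4, again assuming a complete nested list `grades[i][s][o]` of gradings on the five-level scale (names and layout are illustrative assumptions). The second function shows how the same data can give a score per structure:

```python
def vgas(grades):
    """Visual grading analysis score: mean grading (Equation 4).

    grades[i][s][o] is the grading (-2, -1, 0, +1 or +2) of structure s
    in image i by observer o, relative to the reference image.
    """
    values = [g for image in grades for structure in image for g in structure]
    return sum(values) / len(values)


def vgas_per_structure(grades):
    """Mean grading for each structure, averaged over images and observers."""
    n_structures = len(grades[0])
    scores = []
    for s in range(n_structures):
        values = [g for image in grades for g in image[s]]
        scores.append(sum(values) / len(values))
    return scores
```

A positive result indicates that the structures were, on average, graded as better visualised than in the reference image.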

3.6 Free-response forced error, FFE, experiment The objective of FFE is to detect lesions in images containing multiple lesions and to rank the findings in order of confidence (Chakraborty and Winter, 1990). The lesions are positioned randomly throughout an image or a part of an image. The observer has to mark the most apparent lesion with “1”, the second most with “2” and so on. This procedure continues until a false positive finding is called, i.e. the observer marks something in the image that is not a lesion. If the observer did not mark a false positive location, then he/she must continue searching the image until a false positive finding has been called. The fraction of correct findings before the false positive finding is the FFE-score and this score indicates how well the imaging system is capable of reproducing the lesions. The FFE-score, A1, is calculated for a group of images with Equation 5:

$$A_1 = \frac{\sum_{i=1}^{I} \sum_{o=1}^{O} \mathrm{TPF}_{o,i}}{I \cdot O} \qquad \text{(Equation 5)}$$

where
$\mathrm{TPF}_{o,i}$ = the quotient between the number of correct findings before observer o makes an error and the total number of lesions in image i
I = number of images
O = number of observers

3.7 Validation of clinical image quality evaluation methods To test the validity of the ICS and the VGAS as methods for evaluation of clinical image quality, comparisons were made with a method based on the well-established ROC methodology, the free-response forced error experiment (FFE). Hybrid images were produced with the characteristics of three of the radiographic techniques mentioned in Table 3 and Table 4 for lumbar spine and chest respectively, i.e. the same techniques evaluated using ICS and VGAS. The expert radiologists performed the free-response forced error experiment on the hybrid images in connection with the ICS and the VGAS study.

3.8 Prediction of clinical image quality with theoretical models The complete imaging chain for the lumbar spine and the chest study including a voxelised male anatomy (voxel phantom) was modelled and used in a Monte Carlo simulation (Paper VII). The voxel phantom was derived from three-dimensional CT data of an adult male, segmented into the human anatomy (Zubal et al., 1994). Each voxel belongs to one of 52 organs and each organ was identified with one of four tissue types: average soft tissue, healthy lung, bone or bone spongiosa (Dance et al., 1999). The size of the voxels of the phantom was adjusted to simulate the size of an average European male (ICRU, 1992). Appropriate anatomical details have been added to the phantom so that realistic estimates of the contrast and signal-to-noise ratio (SNR) of details in the normal anatomy could be made. Detailed information about the exposure conditions, the processing, the characteristics of the screen-film system, and the patient configurations was included in the model. The computer program has been validated against measurements on phantoms and patients (Dance et al., 1997; Sandborg et al., 2000). The contrast and the SNR of important anatomical details were calculated and the dynamic range of the image was determined. These quantities were determined for the radiographic techniques used in Trial 1 (see section 3.1) and compared with the ICS and VGAS for the corresponding techniques, and correlations were sought for. The purpose of these investigations was to establish a connection between quantities calculated with the model and “the truth” as evaluated by the radiologists.
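The text does not state which measure of correlation was used. As one possible illustration, a Pearson correlation coefficient between a model quantity (e.g. the calculated SNR of a detail) and an observer score (e.g. the VGAS) for the same set of techniques could be computed as sketched below. The choice of Pearson's coefficient and the function name are assumptions, and any input values would have to come from the model and the trials.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences.

    x -- model quantity per technique, e.g. calculated SNR (assumed input)
    y -- observer score per technique, e.g. VGAS (assumed input)
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A value close to +1 would indicate that techniques with a higher calculated quantity also received higher observer scores.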


4. RESULTS AND DISCUSSION 4.1 Fulfilment of image criteria, ICS The results of the evaluation of all radiographic techniques with the image criteria method for lumbar spine (Papers I & III) are presented in Figure 4 and for chest (Papers II, III & VI) in Figure 5. The ICS values are corrected for the fact that the evaluation procedure was slightly changed from Trial 1 to Trial 3. The ICS-value for the techniques which were evaluated in all three trials (70 kVp, speed 400, L-film for lumbar spine, and 141 kVp, speed 320, air gap, density 1.8, L-film for chest) should have been constant, but as a result of the change of evaluation procedure a small deviation was found between Trial 1 and Trial 2, and between Trial 1 and Trial 3. To correct for this a correction factor was calculated for the ICS values from Trial 2, and another one for the ICS values from Trial 3. These correction factors were applied to all ICS values for the techniques of the two trials respectively, and the corrected ICS values are presented in Figure 4 and 5, respectively. The radiographic techniques evaluated in Trial 1 only are white, the techniques in Trial 2 are diagonally striped and the techniques in Trial 3 are horizontally striped. The chest techniques evaluated in Trial 1 and Trial 2 are vertically striped. The 70 kVp, speed 400, L-film technique for lumbar spine and the 141 kVp, speed 320, air gap, density 1.8, L-film technique for chest evaluated in all three trials are dotted. An average score for the three trials is presented for these two techniques.
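How the correction factors were calculated is not stated here. One plausible reading, sketched purely as an assumption, is a multiplicative factor that brings the ICS of the technique common to all trials back to its Trial 1 value and is then applied to every technique of the later trial. The function names and any numbers used with them are hypothetical.

```python
def correction_factor(ref_ics_trial1, ref_ics_later_trial):
    """Factor that maps the common technique's ICS in a later trial
    onto its Trial 1 value (assumed multiplicative correction)."""
    return ref_ics_trial1 / ref_ics_later_trial


def apply_correction(ics_values, factor):
    """Apply the same factor to all ICS values of one trial.

    ics_values -- dict mapping technique label to uncorrected ICS
    """
    return {name: value * factor for name, value in ics_values.items()}
```

With this reading, the common technique receives the same corrected ICS in every trial by construction, and the other techniques of a trial are scaled by the same amount.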

Figure 4. ICS for all radiographic techniques for lumbar spine. The abbreviations on the x-axis mean kVp – speed of screen – film type.


Compared with the standard 70 kVp – speed 400 – L-film technique for lumbar spine, all 90 kVp techniques have a lower ICS regardless of speed and film type, and this also applies to the techniques with higher speed. The ICS values do not show if any one of the techniques evaluated in the three trials is superior in terms of clinical image quality to the standard technique. Stronger discriminative tests, such as VGA, must be employed in order to detect these small differences in image quality. These results show that the ICS values indicate what was expected and thus ICS can be used as a clinical image quality descriptor. The ICS is an absolute score and thus the image quality of techniques evaluated in different studies can be compared with each other. The weak discriminative power of the ICS is also demonstrated and as a result any ICS study should be accompanied by a VGA study.


Figure 5. ICS for all radiographic techniques for chest. The abbreviations on the x-axis mean kVp – speed of screen – method for scatter reduction – maximum density – film type.


Two conclusions can be drawn from the ICS results for the chest study (Figure 5): 1) A proper exposure is crucial (i.e. a maximum optical density of 1.8). The clinical image quality of the underexposed (maximum density 1.3) images is lower than that of the properly exposed images. 2) There is no correlation between clinical image quality and patient absorbed dose (i.e. the speed of the screen-film system and use of air gap or grid). An appropriate choice of radiographic technique can lower the patient absorbed dose without reducing the quality of the image.

Discussions with the radiologists in connection with the image quality evaluation trials showed that the image criteria presented in the CEC Guidelines (European Commission, 1996) were ambiguous and could be improved. A revised version of the image criteria for lumbar spine is presented in Table 7, and for chest in Table 8. More detailed descriptions of the revisions are given by Besjakov et al. (2000) for the lumbar spine criteria and by Kheddache et al. (2000) for the chest criteria.

Table 7. A suggested revision of the CEC image criteria for lumbar spine (Besjakov et al., 2000).

Image criteria
1  Visually sharp reproduction of the upper and lower-plate surfaces, represented as lines in the centred beam area directed at L3
2  Visually sharp reproduction of the pedicles in the centred beam area
3  Visually sharp reproduction of the lateral cortex
4  Reproduction of the intervertebral joints
5  Reproduction of the spinous processes
6  Reproduction of the transverse processes
7  Reproduction of the adjacent soft tissue

Table 8. A suggested revision of the CEC image criteria for chest (Kheddache et al., 2000).

Image criteria
1  Sharp visualisation of the vessels seen 3 cm from the pleural margin
2  Visualisation of the thoracic vertebrae behind the heart
3  Visualisation of the retrocardiac vessels
4  Sharp visualisation of the pleural margin
5  Sharp visualisation of vessels seen en face in the central area
6  Sharp visualisation of the hilar region
7  Visualisation of the carina with main bronchi

The locations of the structures of the revised criteria for the lumbar spine are indicated in Figure 6 and for chest in Figure 7.


Figure 6. Schematic description of locations of the lumbar spine image criteria. The numbers in the image correspond to the numbering of the image criteria in Table 7.

Figure 7. Schematic description of the locations of the chest image criteria. The numbers in the image correspond to the numbering of the image criteria in Table 8.


The discussions also revealed that the radiologists tended to view different parts of the images. As an example, take criterion 2 for lumbar spine. If one radiologist views the pedicles of all five vertebral bodies he/she may state that this criterion is not fulfilled since the fifth vertebral body is normally obscured by the pelvic bone, whereas another radiologist viewing only the central part of the lumbar spine (L2 – L4) states that the criterion is fulfilled. Furthermore, the first split second of viewing an image might result in an “overall impression” influencing further evaluation. Therefore masking of the images was introduced and used in Trial 2 and Trial 3. Manninen et al. (1985) have previously used masking for chest radiography without discussing the effects of the masking. Masks were cut so that only parts of the images were visible to the observers, thereby forcing them to view exactly the same areas of the images. The masking of the images is shown schematically in Figure 8 for lumbar spine and in Figure 9 for chest.

Figure 8. The masking of the lumbar spine images.


Figure 9. The masking of the chest images.

The standard deviation of the ICS for individual observers is a measure of how well they agree when judging the images. The standard deviations of the first trial, where the images were evaluated with the original image criteria (European Commission, 1990), and of the third trial, where the images were masked and evaluated with the revised criteria, are shown in Table 9. The standard deviation for the revised lumbar spine criteria has decreased somewhat and thus the radiologists agree more in their opinions of the quality of the images. For the chest images, the standard deviation for the revised criteria used in Trial 3 is higher than for the original image criteria. There are two possible explanations for this: 1) The number of original criteria (Table 6) was only four compared to seven for the revised criteria. Some of the original criteria deal with more than one structure, e.g. criterion 2 mentions four different structures: the trachea, the proximal bronchi, the borders of the heart and the aorta. 2) For more than 55 % of the observations (one observer and one image comprising one observation) all image criteria were fulfilled. This leads to a high ICS and thus a low separating power (the criteria are always fulfilled regardless of the actual quality of the image). The first revision of the image criteria had a higher separating power than the original criteria.


Table 9. The standard deviation of the ICS for the lumbar spine and the chest studies for the original image criteria of Trial 1, and the revised criteria of Trial 3.

                 Standard deviation
                 Trial 1    Trial 3
Lumbar spine     23.3 %     20.7 %
Chest            26.5 %     32.4 %
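As a minimal sketch of how an agreement measure like the one in Table 9 could be computed, the Python below treats the ICS of one observation as the fraction of fulfilled criteria and takes the sample standard deviation across observers for a single image. The observer names and judgements are hypothetical, and how the observations were pooled over images is not stated in this section, so the sketch illustrates the idea only.

```python
from statistics import stdev


def ics(fulfilled):
    """ICS for one observation: fraction of image criteria judged fulfilled."""
    return sum(fulfilled) / len(fulfilled)


def observer_spread(judgements):
    """Sample standard deviation of the ICS across observers for one image."""
    return stdev([ics(j) for j in judgements.values()])


# Hypothetical judgements of one image: one boolean per criterion (seven criteria).
judgements = {
    "observer_A": [True, True, False, True, True, False, True],
    "observer_B": [True, True, True, True, True, True, True],
    "observer_C": [True, False, False, True, True, False, True],
}

scores = {name: ics(j) for name, j in judgements.items()}
spread = observer_spread(judgements)
print(scores, f"SD = {100 * spread:.1f} %")
```

A smaller spread means the observers agree more closely; a spread of zero would mean that all observers fulfilled the same fraction of criteria.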

4.2 Visual grading analysis, VGAS

VGA is a powerful tool for comparison of similar exposure conditions (Månsson, 1994), and small differences between imaging systems can be detected (Olsen and Sager, 1995). VGAS indicates differences between radiographic techniques that the ICS does not reveal. VGAS for lumbar spine is presented in Figure 10 and for chest in Figure 11. Positive values of VGAS mean that the image quality of the investigated system is superior to that of the reference system, and negative values of VGAS mean that the system is inferior to the reference system. The data for the three trials are separated since the reference images were not identical and the evaluation procedure changed slightly over the trials. Note: the images were masked according to Figure 8 and Figure 9 in Trial 2 and Trial 3.
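The sign convention described above can be illustrated with a short Python sketch. It assumes, hypothetically, that each structure is graded on an integer scale from -2 (clearly inferior to the reference) to +2 (clearly superior) and that VGAS is the mean of all grades over observers, images and structures. The exact scale and averaging used in the trials are not given in this section, so both are assumptions.

```python
def vgas(grades):
    """Mean relative grade over observers, images and structures.

    grades[o][i][s] is the grade given by observer o to structure s in image i,
    relative to the reference image (assumed scale: -2 .. +2, 0 = equal).
    """
    flat = [g for per_observer in grades for per_image in per_observer for g in per_image]
    return sum(flat) / len(flat)


# Hypothetical grades: 2 observers x 2 images x 2 structures.
grades = [
    [[1, 0], [2, 1]],    # observer 1
    [[0, 0], [1, -1]],   # observer 2
]
score = vgas(grades)
print(score)  # positive: judged better than the reference on average
```

Under these assumptions, grading the reference technique against itself gives a score of zero, which matches the use of the reference film as the zero point.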

Figure 10. VGAS for all radiographic techniques for lumbar spine. The results are separated for the three trials. The abbreviations on the x-axis mean kVp – speed of screen – film type and if present decreased resolution and/or added noise.


Figure 11. VGAS for all radiographic techniques for chest. The results are separated for the three trials. The abbreviations on the x-axis mean kVp – speed of screen – method for scatter reduction – maximum optical density – film type and if present decreased resolution and/or added noise.


For lumbar spine three conclusions can be drawn from Figure 10:

1) The image quality in terms of VGAS for 70 kVp is better than for 90 kVp. The contrast is lowered too much by the use of 90 kVp. The speed of the screen-film system is of minor importance.

2) The image quality is improved for the films with a steeper characteristic curve (UG2, UG, G & M), compared to the L-film, which was the standard film in this study. The image quality is decreased for the films with a flatter characteristic curve (IL & IL2), compared to the L-film. (The L-film was used as reference, i.e. VGAS is zero for the L-film.) The 70 kVp techniques were always better than the corresponding 90 kVp techniques. It has to be noted that the results were obtained by observing a “window” showing an area around the third vertebral body (L3). Thus the image quality can be increased locally by employing a steeper characteristic curve.

3) When separating the MTF component from the noise component of a faster screen (e.g. the 1000 screen is divided into an MTF0-noise2 component and an MTF2-noise0 component), the results show that added noise decreases image quality more than reduced MTF. Actually, the image quality improves when reducing the MTF in a noisy image (e.g. from MTF0-noise2 to the 1000 screen).

Three conclusions can also be drawn from the results of the chest study (Figure 11):

1) The image quality for maximum optical density 1.8 is better than that for 1.3. The tube voltage, the speed of the screen-film system and the choice of air gap or grid have a minor influence on the image quality. This means that a dose reduction can be achieved by using 141 kVp, a screen-film system with speed 320 and an air gap for chest radiography.

2) A local increase in image quality can be achieved by using a film with a steeper characteristic curve (G) than the standard L-film technique. A steeper film has the disadvantage that the risk for under- or overexposure is higher than for the standard L-film. The image quality for films with a flatter characteristic curve is lower than for the standard L-film.

3) As for lumbar spine, added noise deteriorates image quality more than reduced MTF.

VGA is a strong discriminative test for separating radiographic techniques in terms of clinical image quality. Differences between radiographic techniques that were not detectable with ICS could be detected with VGAS. VGAS yields results that were expected in advance, and thus VGAS can be used as a descriptor for clinical image quality. The VGAS results are relative to the quality of the reference images, and this is the reason why the quality of the 70 kVp images cannot be compared to the quality of the 90 kVp images in Trial 2. For this comparison ICS must be employed.


4.3 Validation of methods for evaluation of clinical image quality

The results of the comparison between the two methods based on visibility of anatomical structures (fulfilment of image criteria and visual grading analysis) and the method based on detection of pathological structures (free-response forced error, FFE) indicate that the former two methods can be used for evaluation of clinical image quality as well as the method based on ROC (Papers V, VI and Figure 12, see also Tingberg et al. (2000)). The data points refer to the three screen-film combinations evaluated in Trial 3. One objection to this conclusion is, of course, that three data points are too few to draw any valid statistical conclusions about the type of relationship between methods based on normal anatomy and the ROC related method. However, for reasons of statistical precision in the data points, 50 images of each technique were evaluated in the FFE experiment. Due to a limited amount of time and evaluation capacity there was only room for three different radiographic techniques in the trial. In future studies it may be more valuable to lower the statistical precision and instead use more radiographic techniques. Studies of ICS and VGAS are easy to set up, even on a routine basis, for evaluation of image quality in a clinical environment. The results imply that the data yielded by such studies are objective and robust. It should be noted that for the FFE experiment, 50 images of each technique were evaluated, whereas in the ICS and VGAS studies only fifteen images of each technique were evaluated. Yet, the standard error of the mean is smaller for ICS and VGAS than for the FFE-score (Papers V, VI and Figure 12). The easier ICS and VGAS evaluation methods can therefore be employed instead of the much more cumbersome ROC analysis, and the same conclusions about the quality of different imaging systems can be drawn.
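The precision comparison above rests on the standard error of the mean, SE = s / sqrt(n), where s is the sample standard deviation and n the number of images. The following Python lines are a minimal sketch of that calculation. The per-image scores are hypothetical and only show that a small sample with little spread can have a smaller standard error than a larger sample with more spread.

```python
from math import sqrt
from statistics import stdev


def standard_error(values):
    """Standard error of the mean: sample standard deviation divided by sqrt(n)."""
    return stdev(values) / sqrt(len(values))


# Hypothetical per-image scores, NOT data from the trials.
tight_small_sample = [0.80, 0.82, 0.78, 0.81, 0.79]          # n = 5, little spread
wide_large_sample = [0.2, 0.9, 0.4, 0.7, 0.1, 0.8, 0.3,
                     0.6, 0.5, 1.0]                          # n = 10, large spread

print(standard_error(tight_small_sample))
print(standard_error(wide_large_sample))
```

Because the standard error depends on both the spread and the number of images, fewer images do not automatically imply lower precision.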


Figure 12. ICS vs. FFE-score for lumbar spine (y-axis: ICS ± S.E.; x-axis: FFE score ± S.E.).

The results of the FFE study also show that the production of hybrid images suitable for a study of detectability, such as FFE, was successful (Paper IV). Consequently, hybrid images produced by the methods described in Paper IV can be used in ROC analysis for evaluation of image quality of clinical images.

4.4 Comparison between ICS and VGAS

In Figure 13 the two new clinical image quality descriptors are compared to each other. For chest there is a strong correlation between the two descriptors (Spearman’s R=0.86, p=0.00). This means that ICS and VGAS work equally well as clinical image quality descriptors for the radiographic techniques evaluated in this study. For lumbar spine, no correlation can be detected (Spearman’s R=0.32, p=0.19). The reason for this can be found by comparing Figures 4 and 10. In Figure 4 the difference in ICS is relatively small between the techniques with a flat characteristic curve (IL & IL2) and the techniques with a steep curve (M, G, UG & UG2). The difference in VGAS (Figure 10), however, is large between flat and steep characteristic curves. Even though the radiologists consider the image quality of the flat curves to be inferior to that of the reference image, the image criteria are still fulfilled. ICS is therefore not suitable as a clinical image quality descriptor for evaluating lumbar spine imaging systems with different contrast. By excluding the data from Trial 2 a moderate correlation can be demonstrated (Spearman’s R=0.67, p=0.10) between ICS and VGAS for lumbar spine.
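Spearman’s rank correlation, used above to compare the two descriptors, is the Pearson correlation of the ranks of the paired values. The Python sketch below shows the calculation with average ranks for ties. The ICS and VGAS values are hypothetical and are not the data of Figure 13, and the p-value calculation is left out.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_r(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical mean scores for five radiographic techniques.
ics_scores = [0.62, 0.70, 0.75, 0.81, 0.88]
vgas_scores = [-0.8, -0.1, -0.3, 0.4, 0.9]
print(spearman_r(ics_scores, vgas_scores))
```

Because only the ranks enter the calculation, the coefficient is insensitive to the different scales of ICS (a fraction) and VGAS (a relative grade).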


Figure 13. ICS vs. VGAS for all radiographic techniques. Spearman’s R is presented in the figure.

4.5 Prediction of clinical image quality with theoretical models

A comparison between the physical measures calculated with the computer model and the clinical image quality descriptors evaluated by human observers (ICS and VGAS) shows that clinical image quality can be predicted well, provided the imaging chain is modelled in sufficient detail (Paper VII). The study also shows that a careful selection of the physical quantities used in the prediction is crucial, since these quantities were not identical for the lumbar spine and the chest examinations. For lumbar spine, the correlation between physical quantities and clinical image quality descriptors was strongest for the contrast and SNR of trabecular details in the vertebral bodies, and for chest, for the contrast of blood vessels, the dynamic range and the contrast of calcifications (Paper VII). The model can turn out to be a very useful tool for the optimisation of radiographic procedures.


5. CONCLUSIONS

5.1 Clinical image quality descriptors

Two types of clinical image quality descriptors, image criteria score (ICS) and visual grading analysis score (VGAS), have been developed and tested for several radiographic techniques in lumbar spine and chest radiography. Both of these descriptors have a potential for evaluation of clinical image quality. A careful selection of the image criteria formulations and of the structures which are used in the evaluations is, however, crucial. Throughout this study, the CEC image criteria for lumbar spine and chest have been further developed. The separating power of two versions of the image criteria for chest was studied in Paper II, and the results showed that the revised version of the image criteria was better at separating radiographic techniques in terms of image quality (ICS) than the original criteria. A comparison between ICS and VGAS shows that there is a strong correlation between the two descriptors for chest, but for lumbar spine such a correlation cannot be seen. The reason for this is probably that ICS is not sensitive enough for detection of differences in contrast. The difference between the various observers’ opinions of the quality of an image is still substantial for both lumbar spine and chest. The two methods complement each other: IC is an absolute method, which means that images produced with different radiographic techniques and of different patients can be compared. Still, the separating power of IC is weaker than that of VGA. VGA is an excellent method for comparing images produced with different radiographic techniques and has strong separating power, but the results are relative since the quality of an image is compared to the quality of a reference image.

5.2 Validation of clinical image quality descriptors

There is an indication of a correlation between the results of the evaluation methods based on the European image criteria (IC and VGA) and the evaluation method based on ROC methodology (FFE) (Papers V and VI and Figure 12). This means that clinical image quality can be evaluated quite easily and equally well with either ICS or VGAS instead of the much more cumbersome and tedious methods based on ROC. The setting up of an image evaluation with IC or VGA is quick since ordinary clinical images can be used, the time for the actual image observation is shorter, and the results of the study are very intuitive and easy to interpret (Paper V). Therefore, methods based on IC and VGA are suitable for optimising radiographic procedures in the clinic.


5.3 Prediction of clinical image quality with theoretical models

Until now it has been very difficult to predict clinical image quality as evaluated by human observers. The imaging systems used in Trial 1 were modelled, including a voxel phantom, and Monte Carlo calculations were used to simulate the radiation transport (Paper VII). Physical quantities were calculated in the simulated images and compared with the clinical image quality descriptors for the radiographic techniques evaluated in Trial 1. Significant correlations were found between the quantities determined by the theoretical model and the image quality evaluated by the radiologists. It should be noted that the exposure conditions have to be well known in order to be truthfully included in the model. The success of the predictions with the model proves that such predictions are possible. The theoretical model can be a useful tool in planning further experimental studies, i.e. for the investigation of imaging techniques without having to expose patients during an experimental stage.


6. ACKNOWLEDGEMENTS

To Sören Mattsson, my supervisor, for your support and enthusiasm and for always getting me back on track when I was about to quit science and instead become a computer engineer

To Anja Almén, my co-supervisor. You have taught me a lot of useful things, both about science and about life

To Jack Besjakov for friendship, help and advice, especially on the medical field

To the other members of the Malmö-Göteborg group, Lars Gunnar Månsson, Birgitta Lanhede, Susanne Kheddache and Patrik Sund for lots of fun during the EU project, especially for all the “late night meetings” at various hotels in different European cities

To Clemens Herrmann and Werner Panzer for friendship and good cooperation, and for teaching me the German way of working

To Mike Moores for never-ending enthusiasm and for keeping the project together

To the group of European radiologists who evaluated the images and in other ways contributed to the success of this study: Paloma Chimeno, Claudius Gückel, Maurice Laval-Jantet, Mario Maffessanti, Jörg Oestmann, Dieter Saure, Ulf Tylén and Graham Whitehouse

To all of my friends and colleagues at the Department of Radiation Physics, Malmö, and especially to those who carefully read and contributed to this thesis

To my wife Jonna for love, patience and encouragement over the years

To my parents Siv and Bengt for love, support and for always helping me

To my childhood friends Daniel Mattisson, Martin Jacobsson and Jesper Lans for keeping me in touch with the real world

Finally, to the Commission of the European Communities (CEC), Swedish Radiation Protection Institute (SSI) and Swedish Foundation for Strategic Research for financial support


7. REFERENCES

Alm Carlsson G., Carlsson C.A. and Persliden J. Energy imparted to the patient in diagnostic radiology: calculation of conversion factors for determining the energy imparted from measurements of the air collision kerma integrated over the beam area. Phys Med Biol 29, 1329-1341. (1984)

Almén A., Lööf M. and Mattsson S. Examination technique, image quality and patient dose in paediatric radiology. A survey including 19 Swedish hospitals. Acta Radiol 37, 337-342. (1996)

Almén A. and Nilsson M. Simple methods for estimation of dose distributions, organ doses and energy imparted in paediatric radiology. Phys Med Biol 41, 1093-1105. (1996)

Besjakov J., Tingberg A., Almén A. and Mattsson S. Further development of the European Image Criteria – An attempt to create a useful tool for clinical image quality assessment. Manuscript (2000)

Boone J.M. and Seibert J.A. An analytical edge spread function model for computer fitting and subsequent calculation of the LSF and MTF. Med Phys 21, 1541-1545. (1994)

Bunch P.C., Hamilton J.F., Sanderson G.K. and Simmons A.H. A Free-Response Approach to the Measurement and Characterization of Radiographic Observer Performance. Journal of Applied Photographic Engineering 4, 166-171. (1978)

Chakraborty D.P. The FROC, AFROC and DROC variants of the ROC analysis. In Handbook of Medical Imaging. Edited by Beutel J., Kundel H.L. and Van Metter R.L. pp 771-796. SPIE Press, Bellingham, USA. (2000)

Chakraborty D.P. and Winter L.H. Free-response methodology: alternate analysis and a new observer-performance experiment. Radiology 174, 873-881. (1990)

Chesters M.S. Human visual perception and ROC methodology in medical imaging. Phys Med Biol 37, 1433-1476. (1992)

Cook L.T., Insana M.F., McFadden M.A., Hall T.J. and Cox G.G. Comparison of the low-contrast detectability of a screen-film system and third generation computed radiography. Med Phys 21, 691-695. (1994)

Cristy M. and Eckerman K.F. Specific absorbed dose fractions of energy at various ages from internal photon sources. Appendix A: Description of the mathematical phantoms. Oak Ridge National Laboratory. ORNL/TM-8381/VI. Oak Ridge National Laboratory, Oak Ridge, TN, USA. (1987)

Cunningham I.A. Applied linear-systems theory. In Handbook of Medical Imaging. Edited by Beutel J., Kundel H.L. and Van Metter R.L. pp 79-159. SPIE Press, Bellingham, USA. (2000)

Dainty J.C. and Shaw R. Image Science; Principles, analysis and evaluation of photographic-type imaging process. Academic Press, London. (1974)


Dance D.R., Lester S.A., Alm Carlsson G., Sandborg M. and Persliden J. The use of carbon fibre material in radiographic cassettes: estimation of the dose and contrast advantages. British Journal of Radiology 70, 383-390. (1997)

Dance D.R., McVey G., Sandborg M., Persliden J. and Alm Carlsson G. Calibration and validation of a voxel phantom for use in the Monte Carlo modelling and optimisation of X-ray imaging systems. Proc SPIE, Medical Imaging: Physics of Medical Imaging 3659, 548-559. (1999)

de Paredes E.S., Fatouros P.P., Thunberg S., Cousins J.F., Wilson J. and Sedgwick T. Evaluation of a digital spot mammographic unit using contrast detail phantom. In Digital mammography. Edited by Karssemeijer N., Thijsen M., Hendriks J. and van Erning L. pp 47-50. Kluwer Academic Publishers, Dordrecht, The Netherlands. (1998)

De Smet A.A., Ritter E.M., Fritz S.L., Martin N.L., Chang C.H. and Templeton A.W. Evaluation of a new rare-earth screen for skeletal radiography. Radiology 141, 542-543. (1981)

De Smet A.A., Ritter E.M., Fritz S.L., Martin N.L., Chang C.H. and Templeton A.W. An evaluation of screen-film combinations for detail skeletal radiography. Radiology 143, 259-260. (1982)

Desponds L., Depeursinge C., Grecescu M., Hessler C., Samiri A. and Valley J.F. Image quality index (IQI) for screen-film mammography. Phys Med Biol 36, 19-33. (1991)

Dobbins 3rd J.T. Image quality metrics for digital systems. In Handbook of Medical Imaging. Edited by Beutel J., Kundel H.L. and Van Metter R.L. pp 161-222. SPIE Press, Bellingham, USA. (2000)

Dobbins 3rd J.T., Ergun D.L., Rutz L., Hinshaw D.A., Blume H. and Clark D.C. DQE(f) of four generations of computed radiography acquisition devices. Med Phys 22, 1581-1593. (1995)

Doi K., Holje G., Loo L.-N. and Chan H.-P. MTF's and Wiener Spectra of Radiographic Screen-Film Systems. Bureau of Radiological Health. US Department of Health and Human Services, Rockville, Maryland. (1982)

European Commission. CEC Quality Criteria for Diagnostic Radiographic Images and Patient Exposure Trial. EUR 12952, Brussels. (1990)

European Commission. European guidelines on quality criteria for diagnostic radiographic images. EUR 16260, Brussels. (1996)

European Union. European Medical Exposure Directive 97/43 (Euratom). (1997)

Fujita H., Doi K. and Giger M.L. Investigation of basic imaging properties in digital radiography. 6. MTFs of II-TV digital imaging systems. Med Phys 12, 713-720. (1985)

Giger M.L., Doi K. and Metz C.E. Investigation of basic imaging properties in digital radiography. 2. Noise Wiener spectrum. Med Phys 11, 797-805. (1984)


Gray J.E., Ragozzino M.W., Van Lysel M.S. and Burke T.M. Normalized organ doses for various diagnostic radiologic procedures. Am J Roentgenol 137, 463-470. (1981)

Guibelalde E., Morillo A., Fernandez J.M. and Vanó E. Short communication: Use of the European image quality criteria for screen-film comparison - application for asymmetric systems. British Journal of Radiology 69, 64-69. (1996)

Hanley J.A. and McNeil B.J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143, 29-36. (1982)

Harrington M.B. Some methodological questions concerning receiver operating characteristic (ROC) analysis as a method for assessing image quality in radiology. J Digit Imaging 3, 211-218. (1990)

Hart D., Jones D.G. and Wall B.F. Estimation of effective dose in diagnostic radiology from entrance surface dose and dose-area product measurements. NRPB-R262. National Radiological Protection Board, Chilton. (1994)

Herrmann C., Lanhede B., Tingberg A., Panzer W., Almén A., Mattsson S., Besjakov J., Månsson L.G., Kheddache S., Zankl M. and Verdun F. Methods for the digital simulation of certain image characteristics of conventional clinical radiographs of chest and lumbar spine. Manuscript (2000a)

Herrmann C., Lanhede B., Tingberg A., Panzer W., Almén A., Mattsson S., Besjakov J., Månsson L.G., Kheddache S., Zankl M. and Regulla D. A system for the digital reproduction of conventional film radiographs of chest and lumbar spine. Manuscript (2000b)

Hessler C., Depeursinge C., Grecescu M., Pochon Y., Raimondi S. and Valley J.F. Objective assessment of mammography systems. Part I: Method. Radiology 156, 215-219. (1985a)

Hessler C., Depeursinge C., Grecescu M., Pochon Y., Raimondi S. and Valley J.F. Objective assessment of mammography systems. Part II: Implementation. Radiology 156, 221-225. (1985b)

ICRP. Recommendations of the International Commission on Radiological Protection. ICRP publication 60. Pergamon Press, Oxford. (1991)

ICRP. Radiological protection and safety in medicine. ICRP Publication 73. Pergamon Press, Oxford. (1996)

ICRU. Phantoms and computational models in therapy, diagnosis and protection. ICRU Report 48. ICRU Publications, Bethesda, MD, USA. (1992)

ICRU. Medical imaging - The assessment of image quality. ICRU Report 54. ICRU Publications, Bethesda, MD, USA. (1996)

ISO. Photography - Sensitometry of screen-film systems for medical radiography - Part 1: Method for determination of sensitometric curve shape, speed and average gradient. Report 9236-1. (1996)

Jansen J.T.M. and Zoetelief J. Computer aided assessment of image quality for mammography using a contrast detail phantom. Radiat Prot Dosimetry (2000)


Kheddache S., Denbratt L., Angelhed J.E. and Schlossman D. Image intensifier based digital chest radiography. Visibility of lesions and anatomy. Acta Radiol 34, 618-621. (1993)

Kheddache S. and Kvist H. Digital mammography using storage phosphor plate technique-optimizing image processing parameters for the visibility of lesions and anatomy. Eur J Radiol 24, 237-244. (1997)

Kheddache S., Lanhede B., Månsson L.G., Besjakov J., Almén A., Tingberg A. and Mattsson S. The European Commission Guidelines on Quality Criteria for Chest Radiography - Application and Revision. Manuscript (2000)

Leitz W., Månsson L.G., Hedberg-Vikström B.R.K. and Kheddache S. In search of optimum chest radiography techniques. Br J Radiol 66, 314-321. (1993)

Maccia C., Ariche-Cohen M., Nadeau X. and Severo C. The 1991 CEC trial on quality criteria for diagnostic radiographic images. Radiat Prot Dosimetry 57, 111-117. (1995)

Manninen H., Rytkonen H., Soimakallio S., Terho E.O. and Hentunen J. Large-screen image intensifier photofluorography compared with full-size screen-film technique in chest radiography. Acta Radiol Diagn (Stockh) 26, 525-533. (1985)

Marsh D.M., Cooney P., McMahon B.P. and Malone J.F. Measurement of Wiener spectra in digital systems. Radiat Prot Dosimetry 57, 273-276. (1995)

Metz C.E. ROC methodology in radiologic imaging. Invest Radiol 21, 720-733. (1986)

Metz C.E. Some practical issues of experimental design and data analysis in radiological ROC studies. Invest Radiol 24, 234-245. (1989)

Metz C.E. ROCKIT. Department of Radiology. The University of Chicago. (1998)

Metz C.E. Fundamental ROC analysis. In Handbook of Medical Imaging. Edited by Beutel J., Kundel H.L. and Van Metter R.L. pp 751-769. SPIE Press, Bellingham, USA. (2000)

Metz C.E., Wagner R.F., Doi K., Brown D.G., Nishikawa R.M. and Myers K.J. Toward consensus on quantitative assessment of medical imaging systems. Med Phys 22, 1057-1061. (1995)

Moores B.M., Mattsson S., Månsson L.G. and Panzer W. Quality criteria development within the 4th Framework Programme. Radiation Protection Dosimetry (2000)

Månsson L.G. Evaluation of radiographic procedures. Investigations related to chest imaging. Thesis. Göteborgs universitet, Göteborg. (1994)

Olsen J.B. and Sager E.M. Subjective evaluation of image quality based on images obtained with a breast tissue phantom: comparison with a conventional image quality phantom. British Journal of Radiology 68, 160-164. (1995)

Rose A. A unified approach to the performance of photographic film, television pickup tubes and the human eye. J Soc Motion Pict Engrs 47, 273-294. (1946)

Rose A. The sensitivity performance of the human eye on an absolute scale. J Opt Soc Am 38, 196-200. (1948)

Rosenstein M. Handbook of selected tissue doses for projections common in diagnostic radiology. Federal Food and Drug Administration, US Department of Health, Education and Welfare, Bureau of Radiological Health, Rockville, MD, USA. (1988)

Samei E. and Flynn M.J. A method for measuring the presampled MTF of digital radiographic systems using an edge test device. Med Phys 25, 102-113. (1998)

Samei E., Flynn M.J. and Eyler W.R. Simulation of subtle lung nodules in projection chest radiography [published erratum appears in Radiology 1997 Jun;203(3):884]. Radiology 202, 117-124. (1997)

Sandborg M., McVey G., Dance D.R. and Alm Carlsson G. Comparison of model predictions of image quality measures with results of clinical trials in chest and lumbar spine screen-film imaging. Radiat Prot Dosimetry (2000)

Sandborg M., McVey G., Dance D.R., Alm Carlsson G. and Verdun F.R. Optimization of chest and lumbar spine radiography by Monte Carlo modeling of the patient and imaging system. Proc SPIE, Medical Imaging: Physics of Medical Imaging 3659, 444-454. (1999)

Saxeböl G., Olerud H.M., Hjardemaal O., Leitz W., Servomaa A. and Walderhaug T. Nordic guidance levels for patient doses in diagnostic radiology. Radiat Prot Dosimetry 80, 99-101. (1998)

Schibilla H. and Moores B.M. Diagnostic radiology better images - lower dose. Compromise or correlation? A European strategy with historical overview. J Belge Radiol 78, 382-387. (1995)

Shaw R. Some modern aspects of image evaluation. In The Physics of Medical Imaging: Recording System Measurements and Techniques. Edited by Haus A. pp 515-523. American Institute of Physics, New York. (1979)

Sherrier R.H., Johnson G.A., Suddarth S.A., Chiles C., Hulka C. and Ravin C.E. Digital synthesis of lung nodules. Invest Radiol 20, 933-937. (1985)

Swets J.A. and Pickett R.M. The evaluation of diagnostic systems: Methods from signal detection theory. Academic Press, New York. (1982)

Thijsen M.A.O., Thijsen H.O.M., Merx J.L., Lindeijer J.M. and Bijerk K.R. A definition of image quality: the image quality figure. In Optimization of image quality and patient exposure in diagnostic radiology. BIR report 20. Edited by Moores B.M., Wall B.F., Eriskat H. and Schibilla H. pp 29. The British Institute of Radiology, London. (1989)

Tingberg A., Almén A. and Nilsson M. Organ doses from CT thorax examinations of one year old children. Report MA-RADFYS 04. (1995)

Tingberg A., Herrmann C., Almén A., Besjakov J., Mattsson S., Sund P., Lanhede B., Kheddache S. and Månsson L.G. Comparison of two methods for evaluation of the image quality of lumbar spine radiographs. Radiat Prot Dosimetry (2000)

Vanó E., Guibelalde E., Morillo A., Alvarez-Pedrosa C.S. and Fernandez J.M. Evaluation of the European image quality criteria for chest examinations. Br J Radiol 68, 1349-1355. (1995)


Verdun F.R., Moeckli R., Valley J.F., Bochud F. and Hessler C. Survey on image quality and dose levels used in Europe for mammography. Br J Radiol 69, 762-768. (1996)

Yaffe M.J. Digital mammography. In Handbook of Medical Imaging. Edited by Beutel J., Kundel H.L. and Van Metter R.L. pp 329-372. SPIE Press, Bellingham, USA. (2000)

Yin F.F., Giger M.L. and Doi K. Measurement of the presampling modulation transfer function of film digitizers using a curve fitting technique. Med Phys 17, 962-966. (1990)

Zankl M., Panzer W. and Herrmann C. Calculation of patient dose using a human voxel phantom of variable diameter. Radiat Prot Dosimetry (2000)

Zubal I.G., Harrell C.R., Smith E.O., Rattner Z., Gindi G. and Hoffer P.B. Computerized three-dimensional segmented human anatomy. Med Phys 21, 299-302. (1994)


Paper I

The influence of different technique factors on image quality of lumbar spine radiographs as evaluated by established CEC image criteria

A Almén, PhD (1,*), A Tingberg, MSc (1), S Mattsson, Prof (1), J Besjakov, MD, PhD (2), S Kheddache, MD, PhD (3), B Lanhede, MSc (4), L G Månsson, PhD (4) and M Zankl, MSc (5)

1. Department of Radiation Physics, Malmö University Hospital, SE-205 02 Malmö, Sweden 2. Department of Diagnostic Radiology, Malmö University Hospital, SE-205 02 Malmö, Sweden 3. Department of Diagnostic Radiology, Sahlgrenska University Hospital, SE-413 45 Göteborg, Sweden 4. Department of Radiation Physics, Sahlgrenska University Hospital, SE-413 45 Göteborg, Sweden 5. GSF-National Research Center for Environment and Health, D-85764 Neuherberg, Germany

Short title: Evaluating image quality in clinical radiographs

Keywords: lumbar spine, image quality, conventional radiographs

* Address correspondence to A Almén PhD, Department of Health Physics, C2:63 Huddinge University Hospital, SE-141 86 Huddinge, Sweden


Abstract

In this study we have investigated the image quality of lumbar spine radiographs taken under careful recording of technical and physical parameters. Two technical parameters were altered: tube voltage (70 and 90 kV for the AP projection and 77 and 95 kV for the lateral projection) and sensitivity of the film-screen system (sensitivity class 400 and 600). In total, 85 images were included in the study. Entrance surface dose (ESD) was measured with TL-dosimeters. The mean value of ESD for the different technique groups varied between 1.9 mGy (90 kV, sensitivity class 400) and 4.6 mGy (70 kV, sensitivity class 400) for the AP projection and between 6.4 mGy (95 kV, sensitivity class 600) and 20.4 mGy (70 kV, sensitivity class 400) for the lateral projection. Image criteria, given in the ‘European Guidelines on Quality Criteria for Radiographic Images’, were used to assess image quality. Two evaluation methods have been employed: straightforward scoring of fulfilled image criteria, and visual grading analysis using the structures defined in the image criteria. The latter method provided a sharper distinction between groups of images taken with different radiographic techniques. The average number of fulfilled image criteria for the frontal projection varied between 0.74 (90 kV, sensitivity class 400) and 0.87 (70 kV, sensitivity class 400). For the lateral projection this number varied between 0.79 (95 kV, sensitivity class 600) and 0.84 (77 kV, sensitivity class 600). This study shows that image criteria are a useful tool in clinical studies of image quality.


Introduction Quality assurance in diagnostic radiology combines a guarantee of sufficient image quality with a reduction of patient radiation exposure to the lowest achievable. Throughout the years much work has been done to determine patient exposure and risk. However, the results have only to some extent been linked to examination technique [1,2,3,4]. Efforts have been made to describe the relation between image quality and technical and physical parameters used at the examination. Theoretical and experimental studies using phantoms, either anthropomorphic or in the form of simplified test objects, have been undertaken [5,6,7,8]. However, investigations involving real patient images produced under clinical conditions are rare, and associated with numerous difficulties. The patients can not be irradiated without limitation. Hence, in studies of highdose examinations, such as those of the lumbar spine, the images stem from different subjects which complicates the assessment of image quality. Furthermore, the inclusion of several human observers implies that both intra- and inter-observer variation must be considered. Standardised methods for evaluating clinical images have not been established and agreed upon within the radiological community. An important step towards standardising and optimising diagnostic radiology has been taken by the European Community in the form of the CEC guidelines [9]. These guidelines are the product of extensive research initiated and supported by the Commission of the European Communities (CEC). They were introduced in order to reduce patient exposure and to increase image quality throughout Europe [10,11]. The guidelines include examples of good radiographic technique and image criteria for common types of examinations. Reference dose levels are also given for these examinations. 
The guidelines have proved useful when changing the examination technique in order to lower the patient exposure to meet the reference dose level [12,13,14]. However, being able to relate image criteria to technical and physical parameters in a predictable way is also of great importance. It is therefore important to establish methods for evaluating the image quality of clinical images. In this work we have used the established CEC image criteria for this purpose. The image criteria define the minimum level of visibility for various structures (normal anatomy) in the image.

The aim of this work was
- to evaluate the usefulness of the CEC image criteria in assessing the image quality of clinical images taken with different radiographic procedures
- to use the CEC image criteria in quantifying image quality for different examination techniques
- to collect clinical images of the lumbar spine together with carefully measured technical and physical parameters, including measurements of patient exposure.

Materials and methods

The images included in this study were produced at Malmö University Hospital, Sweden, and obtained from patients referred to the Department of Radiology. No additional radiation exposure of the patients was needed for this study, and the radiographs were used in the clinic.

Examination techniques

The examination techniques used correspond to the European guidelines [9] (Table 1) and to other basic considerations of good examination technique concerning e.g. collimation. However, a tube voltage lower than suggested was also used in both the AP and the lateral projection. Furthermore, no automatic exposure control (AEC) was used. The optical density of the radiographs was measured in a soft tissue region. Two different tube voltages were applied for each projection (70 and 90 kV for the AP projection and 77 and 95 kV for the lateral projection). Two different screen-film combinations were used: Kodak TMAT L/RA film plus Kodak Regular Plus screen (sensitivity class 400) and Kodak Fast screen (sensitivity class 600). This resulted in four different technique groups for each projection (Table 2). A total of 85 radiographs of female and male patients were obtained: 41 in the AP projection and 44 in the lateral projection (Table 2). The patients were examined in a single laboratory, but two X-ray tubes (CGR RSN 742), denoted Penta-X Left (focal spot size 0.60 x 0.74 mm2) and Penta-X Right (focal spot size 0.58 x 0.67 mm2), were employed. For both tubes, the measured half value layer was 3.8 mm Al at 80 kV. Total filtration was 4.5 and 4.7 mm Al, respectively. The two tubes were connected to the same generator (Medira 150/60, HF). The two examination tables were equipped with different grids: 51 L/cm, ratio 12 and 60 L/cm, ratio 10 for the left and right tube respectively. The equipment is routinely checked in a quality control programme.

Patient characteristics

To minimise the influence of patient anatomy, only patients weighing between 55 and 85 kg and without metal implant material were included in the study. Patient data were recorded. The data, separated by technique group, are presented in Table 3.

Radiation dose measurements

The radiation dose was measured individually for each patient with thermoluminescent LiF dosimeters (TLD-100 Harshaw, USA). The dosimeters were placed in the centre of the X-ray beam on the skin of the patient. Thus the entrance surface dose was measured, including back-scattered radiation. A patient dose related to the risk for late effects, HGolem [15], was also calculated. This quantity is calculated by means of the male adult voxel phantom Golem, adjusted to the average size of the patients in the technique groups.

Evaluation of image quality

The images were evaluated by a panel of seven European expert radiologists# as part of a 3-day session of intense film reading. This group was established in 1996 in connection with the CEC project FI 4P CT950005. The seven radiologists evaluated the individual images in parallel using the CEC image criteria for lumbar spine examinations (Table 4). The evaluation procedure was divided into two parts:

a) The radiologists used the image criteria as intended, stating only whether the criteria were fulfilled or not (fulfilment of image criteria).

# J. Besjakov, MD, PhD, Malmö, Sweden; C. Gückel, MD, Basel, Switzerland; S. Kheddache, MD, PhD, Göteborg, Sweden; M. Laval Jeantet, Prof., Paris, France; M. Maffessanti, Prof., Trieste, Italy; J. W. Oestmann, Prof., Berlin, Germany; and G. Whitehouse, Prof., Liverpool, United Kingdom.

b) In the other part, the radiologists compared the image quality of the images with that of one reference image, using anatomical structures specified in the image criteria (visual grading analysis).

A limited number of the images were evaluated twice in order to assess the intra-observer variability. Inter- and intra-observer variability was assessed by means of the kappa value [16].

Fulfilment of image criteria

A number including all image criteria, the so-called image criteria score (ICS), was calculated:

$$\mathrm{ICS} = \frac{\text{number of criteria fulfilled}}{\text{total number of criteria}}, \qquad 0 \le \mathrm{ICS} \le 1$$

An ICS was calculated for each image and observer. The average ICS value for each image, using the results from all seven observers, was used as the basis for the statistical analysis.

Visual grading analysis

Two images of one patient, one AP and one lateral projection, were used as reference images in the visual grading analysis. The two reference images were taken at 80 kV and 85 kV for the AP and lateral projection respectively. The sensitivity class of the screen-film combination was 400 for both projections. The quality of the images studied was compared with that of the reference image, using the structures defined in Table 4. A five-level scale was used: clearly better than (+2), slightly better than (+1), equal to (0), slightly worse than (-1) and clearly worse than (-2) the reference image. The result of the visual grading was evaluated using a calculated visual grading analysis score (VGAS) for each image, including the observations of all observers:

$$\mathrm{VGAS} = \frac{\text{sum of scores}}{\text{number of scores}}, \qquad -2 \le \mathrm{VGAS} \le 2$$
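The two scores defined above can be illustrated with a minimal sketch. The function names and all answers and grades below are hypothetical and are not taken from the study; the sketch only assumes that each observer gives one yes/no answer per criterion and one grade from -2 to +2 per structure.

```python
from statistics import mean


def image_criteria_score(fulfilled):
    """ICS for one image and one observer: fraction of criteria judged fulfilled."""
    if not fulfilled:
        raise ValueError("at least one criterion is required")
    return sum(1 for answer in fulfilled if answer) / len(fulfilled)


def visual_grading_score(grades):
    """VGAS for one image: mean of all grades given by all observers."""
    if not grades:
        raise ValueError("at least one grade is required")
    if any(g not in (-2, -1, 0, 1, 2) for g in grades):
        raise ValueError("grades must lie on the five-level scale -2..+2")
    return sum(grades) / len(grades)


# Hypothetical answers: three observers, seven AP criteria each.
observer_answers = [
    [True, True, False, True, True, True, True],
    [True, False, False, True, True, True, True],
    [True, True, True, True, False, True, True],
]
ics_per_observer = [image_criteria_score(a) for a in observer_answers]
mean_ics = mean(ics_per_observer)  # average ICS for this image

# Hypothetical grades relative to the reference image, same observers.
observer_grades = [
    [1, 0, -1, 0, 1, 0, 0],
    [0, 0, -1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, -1, 0],
]
vgas = visual_grading_score([g for row in observer_grades for g in row])
```

Averaging the per-observer ICS values mirrors the description that the mean ICS over all observers was used as the basis for the statistical analysis.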


Statistical analysis

The values obtained from the image criteria evaluation and the visual grading analysis were analysed statistically using analysis of variance (ANOVA) to test the significance of differences between technique groups. The Newman-Keuls test was also used in order to reduce the risk of random significance.
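As a rough illustration of the first step of this analysis, the sketch below computes the one-way ANOVA F statistic for per-image scores grouped by technique. It is an assumption that a one-way layout was used; the source does not state the ANOVA design, and the Newman-Keuls follow-up test and the p-value lookup are not covered here. The scores are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    k = len(groups)
    if k < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(g) for g in groups)
    if n_total <= k:
        raise ValueError("need more observations than groups")

    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the individual values around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical mean ICS values per image for two technique groups.
ics_by_group = {
    "70-400": [0.90, 0.86, 0.81, 0.88],
    "90-400": [0.71, 0.79, 0.74, 0.70],
}
f_stat, df_b, df_w = one_way_anova_f(list(ics_by_group.values()))
```

The F statistic would then be compared with the F distribution with (df_between, df_within) degrees of freedom to obtain a p-value.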

Results

Patient characteristics and radiation dose

The age, weight, height and thickness of the patients vary considerably between the technique groups (Table 3). This will influence the result of the radiation dose measurements. Entrance surface dose and HGolem for the different radiographic techniques are presented in Tables 5a and 5b. The entrance surface doses in this study are much lower than the CEC reference dose levels of 10 mGy (AP projection) and 30 mGy (lateral projection). The higher tube voltage reduces the entrance surface dose by a factor of about two for the AP projection. The reduction when using the higher tube voltage is even larger for the lateral projection. HGolem is also reduced by about half when the higher tube voltage is employed. The expected influence of the screen-film combination is not reflected in the results: when the more sensitive combination is employed, the entrance surface dose should be reduced accordingly. An obvious reason for this inconsistency is the difference in patient size between the groups.

The effect of not using automatic exposure control on the optical density of the radiographs is presented in Tables 5a and 5b. Optical density varies considerably between the technique groups and also within each technique group. This emphasises the value of automatic exposure control, as suggested by the CEC guidelines [9].

Inter- and intra-observer variance

The rereading resulted in an average change of 16 % of the answers between the first and second readings for the different observers. This change corresponds to a kappa value varying between 0.56 and 0.92 for the different observers. This is a fairly good result [16]. Furthermore, the first reading was performed very early in the trial and was followed by thorough discussions regarding the interpretation of the criteria. Inter-observer variance was also studied and was also fairly low: a kappa value above 0.6 was observed between six of the seven observers. However, one observer consistently reported a much lower number of images fulfilling the image criteria, resulting in a poor kappa value. Since this observer was consistent throughout the study, the observer was not excluded.

Image criteria

The mean number of criteria that were fulfilled for the different radiographic techniques is given in Table 6a (AP projection) and Table 6b (lateral projection). The numbers are presented for the different criteria separately, seven criteria for the AP projection and five for the lateral projection. For the AP projection, criterion 7 (sacro-iliac joints) was nearly always fulfilled for all techniques. The least fulfilled criterion was number 2 (pedicles). Technique group 70-400 had the highest score for five of the seven criteria, but criterion 3 (intervertebral joints) was least fulfilled for this technique. For the lateral projection, technique group 77-400 had the highest acceptance rate for all criteria except criterion 2 (posterior vertebral edges) and criterion 3 (pedicles and intervertebral foramina). Technique groups 95-400 and 95-600 had the highest acceptance rate for criterion 2, but only for this criterion. For criterion 3 the acceptance rate was lowest for technique group 77-400. This was also the case for the low tube voltage, slow screen technique in the AP projection. The mean value of the image criteria score (ICS) for all observers and images for the different technique groups is shown in Figures 1a and 1b. Considering all criteria, technique group 70-400 for the AP projection and technique group 77-600 for the lateral projection fulfil most criteria.
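The kappa values quoted in the inter- and intra-observer results measure agreement beyond what chance alone would give. The sketch below assumes the unweighted Cohen's kappa for two sets of yes/no answers; the source cites Altman [16] but does not state the exact variant, and the answers shown are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equally long sequences of labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)

    # Observed proportion of identical answers.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Agreement expected by chance from the marginal frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / n**2

    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical first and second reading by one observer (1 = fulfilled).
first_reading = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
second_reading = [1, 1, 0, 1, 1, 1, 1, 0, 0, 1]
kappa = cohen_kappa(first_reading, second_reading)
```

The same function applies to inter-observer agreement by passing the answers of two different observers for the same images and criteria.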

I-8

Visual grading analysis (VGA)

The mean value of the visual grading analysis scores is shown in Figures 2a and 2b. With VGA, the best image quality was obtained for technique groups 70-400 and 77-600 for the AP and lateral projection respectively. The image criteria score and the visual grading analysis score give consistent results, ranking the technique groups in the same order of image quality.

Separability of the technique groups by means of the two methods

The statistical analysis of the material shows that some of the technique groups could be significantly separated from the others using the image criteria. For the 70-400 technique group more image criteria were fulfilled than for the two 90 kV technique groups (90-400 p=0.12 and 90-600 p=0.09). For the lateral projection the image criteria did not demonstrate any significant difference between the techniques. However, when studying the result from the visual grading analysis, more technique groups could be separated. The technique group 70-400 could be separated from the two 90 kV technique groups (90-400 p=0.02 and 90-600 p=0.04). For the lateral projection the technique group 77-400 could be separated from 95-400 (p=0.06) and 77-600 could be separated from 95-400 (p=0.07).

Discussion and conclusion

When performing studies including clinical images, the patient characteristics are of utmost importance and should be carefully noted. The known size of the patient together with the measured optical density could explain the variance of the entrance surface dose. The image quality could have been improved by using a correctly calibrated automatic exposure control, giving a lower variance of the optical density. This study shows that the reference level for the entrance surface dose is easy to meet. In fact, for all patients in the study, including persons weighing up to 85 kg, the measured entrance surface dose was lower than the reference level [9]. We therefore conclude that the national radiation protection authorities need to re-evaluate the reference levels on a national basis.


The number of images fulfilling all image criteria was rather low. This is in accordance with other studies performed during the development of the criteria [17]. All images in the present study were judged to be clinically useful. It must be stressed, as indicated in the CEC guidelines [9], that no image should be rejected even if not all image criteria are met. The image criteria themselves are not unambiguous. A discussion about their meaning was needed prior to and during the image evaluation. However, the intra- and inter-observer variability, which certainly complicates evaluation of the study results, does not necessarily serve as an argument against the image criteria and their application. As indicated for example by Robinson [18], there is always an immense intra- and inter-observer variability in the interpretation of clinical images. We conclude that the image criteria have to be discussed prior to a clinical study of image quality.

Significant differences between technique groups could be detected using both the image criteria and the visual grading analysis. This study shows that the CEC image criteria can be used for quantitative evaluation of the quality of clinical images, provided that radiographic technique and patient size are carefully audited and recorded. The visual grading analysis provided a sharper distinction between groups of images taken with different radiographic techniques. This study shows that the high tube voltage (95 kV) can be used for the lateral projections of lumbar spine radiographs without altering the image quality significantly. The higher tube voltage did not decrease the quality of the lateral radiographs as much as for the AP projection. Using the higher tube voltage for the lateral projection also decreases the radiation dose to the patients.

The image criteria important for obtaining a high image criteria score were criterion number 1 (upper and lower-plate surfaces), number 2 (pedicles) and number 5 (the cortex and trabecular structures) for the AP projection. The corresponding important image criteria for the lateral projection were number 1 (upper and lower-plate surfaces) and number 5 (the cortex and trabecular structures).


Acknowledgements

This work has been supported by the Commission of the European Communities, (FI 4P CT 95 0005) and the Swedish Radiation Protection Institute (SSI P1019.97).


References

1. Shrimpton PC, Wall BF, Jones DG, Fisher ES, Hillier MC, Kendall GM, Harrison RM. Doses to patients from routine diagnostic X-ray examinations in England. Br J Radiol 1986; 59: 749-758.

2. Padovani R, Contento G, Fabretto M, Malisan MR, Barbina V, Gozzi G. Patient doses and risk from diagnostic radiology in North-east Italy. Br J Radiol 1987; 60: 155-165.

3. Maccia C, Benedittini M, Lefaure C, Fagnani F. Doses to patients from diagnostic radiology in France. Health Phys 1988; 54: 397-408.

4. Warren-Forward HM, Millar JS. Optimisation of radiographic technique for chest radiography. Br J Radiol 1995; 68: 1221-1229.

5. Dance DR, Lester SA, Carlsson GA, Sandborg M, Persliden J. The use of carbon fibre material in radiographic cassette: estimation of the dose and contrast advantages. Br J Radiol 1997; 70: 383-390.

6. Sandborg M, Dance DR, Carlsson GA, Persliden J. Monte Carlo study of grid performance in diagnostic radiology: task dependent optimization for screen-film imaging. Br J Radiol 1994; 67: 76-85.


7. Leitz WK, Mansson LG, Hedberg-Vikstrom BR, Kheddache S. In search of optimum chest radiography techniques. Br J Radiol 1993; 66: 314-321.

8. Almén A, Lööf, Mattsson S. Examination technique, image quality and patient dose in paediatric radiology. A survey including 19 Swedish hospitals, Acta Radiologica 1996; 37: 337-342.

9. Commission of the European Communities, European guidelines on quality criteria for diagnostic radiographic images, EUR 16260 EN, Brussels: CEC, 1996.

10. Schibilla H, Moores BM. Diagnostic radiology better images - lower dose compromise or correlation? A European strategy with historical overview. J Belge Radiol 1995; 78: 382-387.

11. Moores BM. CEC quality criteria for diagnostic radiographic images - basic concepts. Rad Prot Dosim 1995; 57: 105-110.

12. Vañó E, Guibelalde E, Morillo A, Alvarez-Pedrosa CS, Fernández JM. Evaluation of the European image quality criteria for chest examination. Br J Radiol 1995; 68: 1349-1355.

13. Vañó E, Oliete S, González L, Guibelalde E, Velasco A, Fernández JM. Image quality and dose in lumbar spine examination: results of a 5 year quality control programme following the European quality criteria trial. Br J Radiol 1995; 68: 1332-1335.

14. McNeil E A, Peach D E, Temperton DH. Comparison of entrance surface doses and radiographic techniques in the West Midlands (UK) with the CEC criteria, specifically for lateral lumbar spine images. Rad Prot Dosim 1995; 57: 437-440.

15. M Zankl, W Panzer, C Herrman. Calculation of patient dose using a human voxel phantom of variable diameter. In: Medical x-ray imaging potential impact of the revised European medical exposure directive 1997. Rad Prot Dosim June 2000.

16. Altman D G. Some common problems in medical research. In: Altman DG, editors. Practical statistics for medical research: Chapman & Hall, London, 1991: 396-439.

17. Commission of the European Communities, The 1991 CEC trial on quality criteria for diagnostic radiographic images: detailed results and findings, EUR 16635 EN, Brussels: CEC, 1996.

18. Robinson PAJ. Radiology's Achilles' heel: error and variation in the interpretation of the Röntgen image. Br J Radiol 1997; 70: 1085-1098.


Tables

Table 1. Comparison of the radiographic technique used with the CEC good radiographic technique guidelines.

| No | Good radiographic technique (CEC, 1996) | Used in this study |
| --- | --- | --- |
| 1 | Grid table stationary or moving grid | yes |
| 2 | Nominal focal spot size ≤ 1.3 | yes |
| 3 | Total filtration ≥ 3.0 mm Al equivalent | yes |
| 4 | Anti scatter grid r=10; 40/cm | r=12, 51/cm & r=10, 60/cm |
| 5 | Screen film system nominal speed class >400 | yes |
| 6 | FFD: 115 (100-150) cm | yes |
| 7 | Radiographic voltage AP 75-90 kV, lat 80-95 kV | yes (also AP 70 and lat 77) |
| 8 | Automatic exposure control central chamber | no |
| 9 | Exposure time AP < 400 ms, lat < 1000 ms | yes |

Table 2. Radiographic techniques and number of radiographs (including sex of the patients) for the AP and the lateral projection.

| Projection | Abbreviation | Tube voltage (kV) | Film-screen speed class | Number of images (Female/Male) |
| --- | --- | --- | --- | --- |
| Frontal | 70-400 | 70 | 400 | 10 (8/5) |
| Frontal | 70-600 | 70 | 600 | 10 (5/5) |
| Frontal | 90-400 | 90 | 400 | 11 (5/6) |
| Frontal | 90-600 | 90 | 600 | 10 (1/9) |
| Lateral | 77-400 | 77 | 400 | 10 (5/5) |
| Lateral | 77-600 | 77 | 600 | 14 (10/4) |
| Lateral | 95-400 | 95 | 400 | 10 (1/9) |
| Lateral | 95-600 | 95 | 600 | 10 (5/5) |

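Table 3. Characteristics of the patients included in the study.

| Projection | Technique group | Statistic | Age (y) | Weight (kg) | Height (cm) | Thickness (cm) |
| --- | --- | --- | --- | --- | --- | --- |
| AP | 70-400 | mean ± SD | 53 ± 15 | 70 ± 8 | 169 ± 7 | 21 ± 3 |
| AP | 70-400 | range | 29 - 72 | 55 - 85 | 160 - 180 | 18 - 26 |
| AP | 70-600 | mean ± SD | 54 ± 22 | 69 ± 9 | 168 ± 10 | 21 ± 3 |
| AP | 70-600 | range | 19 - 82 | 52 - 82 | 154 - 179 | 16 - 26 |
| AP | 90-400 | mean ± SD | 60 ± 21 | 70 ± 9 | 170 ± 8 | 21 ± 4 |
| AP | 90-400 | range | 25 - 82 | 59 - 85 | 159 - 182 | 16 - 29 |
| AP | 90-600 | mean ± SD | 54 ± 21 | 74 ± 8 | 174 ± 7 | 23 ± 3 |
| AP | 90-600 | range | 19 - 80 | 60 - 85 | 160 - 185 | 15 - 26 |
| Lateral | 77-400 | mean ± SD | 56 ± 22 | 69 ± 9 | 168 ± 10 | 29 ± 3 |
| Lateral | 77-400 | range | 19 - 82 | 58 - 80 | 154 - 179 | 24 - 32 |
| Lateral | 77-600 | mean ± SD | 48 ± 18 | 72 ± 8 | 170 ± 7 | 28 ± 2 |
| Lateral | 77-600 | range | 18 - 72 | 57 - 85 | 160 - 183 | 25 - 32 |
| Lateral | 95-400 | mean ± SD | 49 ± 22 | 72 ± 8 | 175 ± 8 | 28 ± 3 |
| Lateral | 95-400 | range | 19 - 80 | 60 - 85 | 160 - 185 | 25 - 33 |
| Lateral | 95-600 | mean ± SD | 64 ± 19 | 71 ± 9 | 169 ± 7 | 28 ± 2 |
| Lateral | 95-600 | range | 29 - 82 | 59 - 85 | 159 - 181 | 26 - 31 |
| All pat. | | mean | 54 | 71 | 170 | AP: 22 / Lat: 28 |
| All pat. | | range | 15 - 82 | 55 - 85 | 155 - 185 | |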

Table 4. CEC image criteria, structures underlined (CEC, 1996).

| Projection | No | Criterion |
| --- | --- | --- |
| AP/PA | 1 | Visually sharp reproduction of the upper and lower-plate surfaces, represented as lines in the centred beam area |
| AP/PA | 2 | Visually sharp reproduction of the pedicles |
| AP/PA | 3 | Reproduction of the intervertebral joints |
| AP/PA | 4 | Reproduction of the spinous and transverse processes |
| AP/PA | 5 | Visually sharp reproduction of the cortex and trabecular structures |
| AP/PA | 6 | Reproduction of the adjacent soft tissue, particularly the psoas shadows |
| AP/PA | 7 | Reproduction of the sacro-iliac joints |
| Lateral | 1 | Visually sharp reproduction of the upper and lower-plate surfaces, with the resultant visualisation of the intervertebral spaces |
| Lateral | 2 | Full superimposition of the posterior vertebral edges |
| Lateral | 3 | Reproduction of the pedicles and the intervertebral foramina |
| Lateral | 4 | Visualisation of the spinous processes |
| Lateral | 5 | Visually sharp reproduction of the cortex and trabecular structures |

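Table 5a. Entrance surface dose (ESD) and HGolem for the different radiographic techniques, AP projection.

| Radiographic technique | Statistic | Film optical density | ESD (mGy) | HGolem (mSv) |
| --- | --- | --- | --- | --- |
| 70-400 | mean | 1.3 | 4.6 | 0.29 |
| 70-400 | SD | | 1.3 | 0.05 |
| 70-400 | range | 0.9 - 1.7 | 3.7 - 7.8 | |
| 70-600 | mean | 1.2 | 3.4 | 0.25 |
| 70-600 | SD | | 1.0 | 0.05 |
| 70-600 | range | 0.9 - 1.7 | 2.3 - 5.4 | |
| 90-400 | mean | 1.1 | 1.9 | 0.14 |
| 90-400 | SD | | 0.7 | 0.03 |
| 90-400 | range | 0.6 - 2.0 | 1.0 - 3.1 | |
| 90-600 | mean | 1.2 | 2.4 | 0.14 |
| 90-600 | SD | | 1.1 | 0.04 |
| 90-600 | range | 0.8 - 1.6 | 0.8 - 4.3 | |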

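Table 5b. Entrance surface dose (ESD) and HGolem for the different radiographic techniques, lateral projection.

| Radiographic technique | Statistic | Optical density | ESD (mGy) | HGolem (mSv) |
| --- | --- | --- | --- | --- |
| 77-400 | mean | 1.8 | 20.4 | 0.67 |
| 77-400 | SD | | 5.1 | 0.14 |
| 77-400 | range | 1.4 - 2.3 | 11.5 - 29.8 | |
| 77-600 | mean | 2.0 | 13.7 | 0.39 |
| 77-600 | SD | | 4.8 | 0.06 |
| 77-600 | range | 1.5 - 2.7 | 5.8 - 23.4 | |
| 95-400 | mean | 1.4 | 7.0 | 0.23 |
| 95-400 | SD | | 3.2 | 0.07 |
| 95-400 | range | 1.1 - 1.9 | 2.4 - 11.6 | |
| 95-600 | mean | 1.8 | 6.4 | 0.23 |
| 95-600 | SD | | 2.3 | 0.05 |
| 95-600 | range | 1.1 - 2.2 | 2.7 - 9.4 | |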

Table 6a. Mean value of the image criteria score (ICS) for the AP projection.

| Criterion | Statistic | 70-400 | 70-600 | 90-400 | 90-600 |
| --- | --- | --- | --- | --- | --- |
| 1 | mean | 0.90 | 0.83 | 0.73 | 0.64 |
| 1 | SD | 0.09 | 0.18 | 0.17 | 0.21 |
| 1 | range | 0.78 - 1.0 | 0.60 - 1.0 | 0.55 - 1.0 | 0.30 - 0.90 |
| 2 | mean | 0.74 | 0.75 | 0.60 | 0.56 |
| 2 | SD | 0.09 | 0.12 | 0.21 | 0.21 |
| 2 | range | 0.50 - 0.90 | 0.50 - 0.90 | 0.30 - 0.89 | 0.30 - 0.80 |
| 3 | mean | 0.77 | 0.88 | 0.87 | 0.85 |
| 3 | SD | 0.17 | 0.09 | 0.13 | 0.12 |
| 3 | range | 0.50 - 1.0 | 0.70 - 1.0 | 0.72 - 1.0 | 0.70 - 1.0 |
| 4 | mean | 0.84 | 0.77 | 0.73 | 0.77 |
| 4 | SD | 0.21 | 0.29 | 0.34 | 0.26 |
| 4 | range | 0.50 - 1.0 | 0.40 - 1.0 | 0.10 - 1.0 | 0.30 - 1.0 |
| 5 | mean | 0.80 | 0.72 | 0.56 | 0.61 |
| 5 | SD | 0.17 | 0.22 | 0.28 | 0.25 |
| 5 | range | 0.67 - 1.0 | 0.33 - 1.0 | 0.27 - 0.81 | 0.30 - 0.90 |
| 6 | mean | 0.80 | 0.73 | 0.60 | 0.64 |
| 6 | SD | 0.21 | 0.26 | 0.24 | 0.21 |
| 6 | range | 0.40 - 1.0 | 0.20 - 1.0 | 0.18 - 0.91 | 0.40 - 1.0 |
| 7 | mean | 0.97 | 0.93 | 0.92 | 0.96 |
| 7 | SD | 0.05 | 0.15 | 0.14 | 0.11 |
| 7 | range | 0.90 - 1.0 | 0.60 - 1.0 | 0.70 - 1.0 | 0.70 - 1.0 |

Table 6b. Mean value of the image criteria score (ICS) for the lateral projection.

| Criterion | Statistic | 77-400 | 77-600 | 95-400 | 95-600 |
| --- | --- | --- | --- | --- | --- |
| 1 | mean | 0.94 | 0.90 | 0.77 | 0.70 |
| 1 | SD | 0.05 | 0.11 | 0.16 | 0.27 |
| 1 | range | 0.90 - 1.0 | 0.71 - 1.0 | 0.50 - 1.0 | 0.20 - 1.0 |
| 2 | mean | 0.60 | 0.65 | 0.86 | 0.89 |
| 2 | SD | 0.08 | 0.14 | 0.13 | 0.07 |
| 2 | range | 0.50 - 0.70 | 0.50 - 0.86 | 0.60 - 1.0 | 0.80 - 1.0 |
| 3 | mean | 0.71 | 0.90 | 0.84 | 0.88 |
| 3 | SD | 0.15 | 0.09 | 0.19 | 0.13 |
| 3 | range | 0.50 - 0.90 | 0.79 - 1.0 | 0.50 - 1.0 | 0.60 - 1.0 |
| 4 | mean | 0.88 | 0.88 | 0.86 | 0.86 |
| 4 | SD | 0.11 | 0.08 | 0.10 | 0.18 |
| 4 | range | 0.70 - 1.0 | 0.79 - 1.0 | 0.70 - 1.0 | 0.50 - 1.0 |
| 5 | mean | 0.93 | 0.85 | 0.60 | 0.67 |
| 5 | SD | 0.15 | 0.17 | 0.30 | 0.38 |
| 5 | range | 0.60 - 1.0 | 0.57 - 1.0 | 0.20 - 1.0 | 0 - 1.0 |

Figure text

Figure 1a. Fulfilled image criteria (mean ICS) for different radiographic techniques, AP projection. Bars indicate one standard error.

Figure 1b. Fulfilled image criteria (mean ICS) for different radiographic techniques, lateral projection. Bars indicate one standard error.

Figure 2a. Result of visual grading analysis (mean VGAS) for different radiographic techniques, AP projection. Bars indicate one standard error.

Figure 2b. Result of visual grading analysis (mean VGAS) for different radiographic techniques, lateral projection. Bars indicate one standard error.

[Figure 1a: Image Criteria Score (axis 0.3 to 1.0) by radiographic technique: 70-400, 70-600, 90-400, 90-600]

[Figure 1b: Image Criteria Score (axis 0.3 to 1.0) by radiographic technique: 77-400, 77-600, 95-400, 95-600]

[Figure 2a: Visual Grading Analysis Score (axis -1.0 to 1.0) by radiographic technique: 70-400, 70-600, 90-400, 90-600]

[Figure 2b: Visual Grading Analysis Score (axis -1.0 to 1.0) by radiographic technique: 77-400, 77-600, 95-400, 95-600]

Paper II

THE INFLUENCE OF DIFFERENT TECHNIQUE FACTORS ON IMAGE QUALITY FOR CHEST RADIOGRAPHS – APPLICATION OF THE RECENT CEC IMAGE QUALITY CRITERIA

B Lanhede1, A Tingberg2, L G Månsson1, S Kheddache3, M Widell1, L Björneld3, P Sund1, A Almén2, J. Besjakov4, S Mattsson2, M Zankl5, W Panzer5 and C Herrmann5

1. Department of Radiation Physics, Sahlgrenska University Hospital, SE-413 45 Göteborg, Sweden
2. Department of Radiation Physics, Malmö University Hospital, SE-205 02 Malmö, Sweden
3. Department of Radiology, Sahlgrenska University Hospital, SE-413 45 Göteborg, Sweden
4. Department of Diagnostic Radiology, Malmö University Hospital, SE-205 02 Malmö, Sweden
5. GSF-National Research Center for Environment and Health, Neuherberg, Germany

Corresponding author: Birgitta Lanhede, MSc, Department of Radiation Physics, Norrlands Universitetssjukhus, SE-901 89 UMEÅ, tel: +46 90 785 24 07, fax: +46 90 785 15 88, E-mail: [email protected]

ABSTRACT

The aim of this work was to evaluate and possibly improve the CEC image criteria for radiographic chest images. Chest images of healthy volunteers were acquired using different technique factors. The image criteria were used as a tool to discriminate between the different images. The technique factors were chosen so that the image quality would differ slightly. Four different technique parameters, each with two possible settings used in clinical practice today, were studied: tube voltage - 102 and 141 kV; screen/film speed - 160 and 320; maximum optical density in the parenchyma - 1.3 and 1.8; method for scatter reduction - air gap and moving grid. The results showed that the image criteria were able to discriminate between different technique groups. Optical density 1.8 was better than 1.3 independent of the other parameters. No difference was seen for screen/film speed. No correlation was seen between the ranking of the systems and patient dose.


INTRODUCTION

The framework of the CEC research project ‘Predictivity and Optimisation in Medical Radiation Protection’ addresses fundamental operational limitations in existing radiation protection mechanisms in diagnostic radiology. An attempt was made to evaluate possible relationships between the quality of the radiological information content of the image, the imaging procedures and the radiation dose to the patient. This part of the project also tested the CEC image criteria, both the 1990 version and a revised version of the 1990 version(1), to see if the image criteria could be used for the optimisation of the image quality and the radiographic technique. At the time of the study the 1996 version(2) of the CEC image quality criteria had not been published. The first study (Trial I within the CEC research project), using conventional X-ray film-screen examinations of chest and lumbar spine(3), was performed at the University Hospitals of Göteborg and Malmö in 1996. The part of the trial dealing with chest imaging is described here.

MATERIALS AND METHODS

With approval from the Committees of Ethics and Radiation Protection at Göteborg University/Sahlgrenska University Hospital, a total of 240 PA chest radiographs were produced using healthy volunteers. In order to control all parameters in the imaging procedure, the group of volunteers was selected to be as uniform as possible. They were normal-sized women and men with a mean age of 25 years (range 19-49 years) and a mean weight of 70 kg (range 52-105 kg). The mean PA thickness was 21 cm (range 17-26 cm).

All the images were produced with a Siemens Polydoros 50s generator and a Siemens (Bi150/30/52R-100 s/n 4056 1/0,6) x-ray tube using the same x-ray stand, Siemens Vertix-E with a Siemens Jk-1 465*465 mm ion chamber (Siemens AG, Erlangen, Germany). All images were exposed with automatic exposure control using Kodak films (35x43 cm T MAT L), Kodak screens (Lanex 160/320) and Kodak cassettes (X-Omatic LW) (Kodak, Rochester, NY, USA). The films were developed in a Kodak M8 film processor with Agfa G 138 developer.

Four physical and technical parameters were studied, each with two different values: the tube voltage (102 and 141 kVp), the nominal speed class of the screen (160 and 320), the maximum density in the parenchyma (1.3 ± 0.15 and 1.8 ± 0.15), and the method for scatter reduction (air gap 30 cm with FFD 390 cm, and a grid with ratio R=12 and 40 lamellae/cm, FFD 150 cm). The total filtration was 3.7 mm Al for 102 kVp and 5.7 mm Al for 141 kVp. Any one of these settings can be found in European radiology departments today.

The air kerma (kerma in free air) at the entrance surface (Kair,e) was measured with a semiconductor detector. The air kerma is related to the entrance surface dose (ESD) by the relation ESD = Kair,e × BSF, where BSF is the backscatter factor for the radiation quality and geometry used.

Risk-related doses HGolem have been derived by Zankl et al.(4) by Monte Carlo calculations using the voxel phantom Golem. The conversion between the air kerma (Kair,e) and this risk-related dose is HGolem = Kair,e × fc(HGolem), where fc(HGolem) is a conversion coefficient (Sv/Gy)(4).
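The two conversions above are simple products, which the following minimal sketch makes explicit. The function names and all numerical values (air kerma, backscatter factor, conversion coefficient) are hypothetical placeholders chosen for illustration; they are not values from the study.

```python
def entrance_surface_dose(k_air_e_mgy, backscatter_factor):
    """ESD (mGy) from entrance air kerma (mGy) and the backscatter factor."""
    if k_air_e_mgy < 0 or backscatter_factor <= 0:
        raise ValueError("air kerma must be >= 0 and BSF must be > 0")
    return k_air_e_mgy * backscatter_factor


def risk_related_dose(k_air_e_mgy, conversion_sv_per_gy):
    """HGolem (mSv) from entrance air kerma (mGy) and fc(HGolem) in Sv/Gy."""
    if k_air_e_mgy < 0 or conversion_sv_per_gy < 0:
        raise ValueError("inputs must be non-negative")
    # mGy multiplied by Sv/Gy gives mSv.
    return k_air_e_mgy * conversion_sv_per_gy


# Hypothetical inputs for one exposure (illustrative only).
k_air_e = 0.10       # mGy, measured entrance air kerma
bsf = 1.4            # assumed backscatter factor
fc_h_golem = 0.2     # assumed conversion coefficient, Sv/Gy

esd = entrance_surface_dose(k_air_e, bsf)
h_golem = risk_related_dose(k_air_e, fc_h_golem)
```

In practice both BSF and fc(HGolem) depend on the radiation quality and geometry, so they would be looked up separately for each tube voltage and scatter-reduction method.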

The European guidelines on quality criteria for chest images as presented in the 1990 version(1) were used in the trial. A total of 10 image criteria were used (as compared to eight in the 1990 version). The first four of these image criteria were identical to the 1990 version; the latter six (5 - 10) were slightly rephrased and extended from the original four remaining criteria. This revision of the criteria was proposed after discussions within the European group of expert radiologists before the study(5). The following questions were at issue:

Does the X-ray film comply with the following image criteria?

1. Reproduction of the vascular pattern in the whole lung, particularly the peripheral vessels
2. Visually sharp reproduction of the trachea and proximal bronchi, the borders of the heart and aorta
3. Visually sharp reproduction of the diaphragm and costo-phrenic angles
4. Visualisation of the retrocardiac lung and mediastinum

Details to be sharply visualised in the parenchyma:

5. Thin linear structures (0.5 - 2 mm): fissures, peripheral vessels
6. Rounded structures (2 - 6 mm): vessels seen en face

Details to be reproduced in the mediastinum:

7. The carina with main bronchi
8. The thoracic vertebrae
9. The interface mediastinum – lung

Details to be sharply visualised in the costo-pleural junction:

10. The costo-pleural junction

All chest radiographs were examined by seven European expert radiologists. The observers viewed each radiograph separately and stated whether each of the ten image criteria was fulfilled or not with a simple “yes” or “no” answer. For a given combination of the four parameters under study an Image Criteria Score (ICS) was calculated as

$$\mathrm{ICS} = \frac{\sum_{i=1}^{I} \sum_{c=1}^{C} \sum_{o=1}^{O} F_{i,c,o}}{I \cdot C \cdot O}$$

where $F_{i,c,o}$ is the fulfilment of criterion c for image i and observer o. $F_{i,c,o} = 1$ if criterion c is fulfilled (the observer’s answer was “yes”), otherwise $F_{i,c,o} = 0$ (the observer’s answer was “no”). I, C and O are the number of images, criteria and observers, respectively. Differences in ICS values between combinations of technical and physical parameters were statistically analysed with analysis of variance (ANOVA) and the Newman-Keuls test.
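The triple sum in the ICS definition can be sketched as follows, assuming the yes/no answers for one technique group are stored as a nested list indexed by image, criterion and observer. The data layout, the function name and the example answers are hypothetical and only serve to illustrate the formula.

```python
def image_criteria_score(fulfilment):
    """ICS for one technique group.

    fulfilment[i][c][o] is 1 if observer o judged criterion c fulfilled
    for image i, and 0 otherwise.
    """
    n_images = len(fulfilment)
    if n_images == 0 or not fulfilment[0] or not fulfilment[0][0]:
        raise ValueError("need at least one image, criterion and observer")
    n_criteria = len(fulfilment[0])
    n_observers = len(fulfilment[0][0])

    total = 0
    for image in fulfilment:
        if len(image) != n_criteria:
            raise ValueError("every image needs the same number of criteria")
        for criterion in image:
            if len(criterion) != n_observers:
                raise ValueError("every criterion needs all observers")
            if any(f not in (0, 1) for f in criterion):
                raise ValueError("answers must be 0 (no) or 1 (yes)")
            total += sum(criterion)

    return total / (n_images * n_criteria * n_observers)


# Hypothetical answers: 2 images x 3 criteria x 2 observers.
answers = [
    [[1, 1], [1, 0], [0, 0]],
    [[1, 1], [1, 1], [0, 1]],
]
ics = image_criteria_score(answers)
```

Restricting the criterion index to a subset (for example the first four criteria) would give the score for the unrevised part of the criteria list under the same assumptions.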

RESULTS

The results showed that the 1990 version of the CEC image criteria and the revised criteria gave approximately the same ranking of the technique groups, but the revised criteria discriminated better between the different techniques (Table 1). Consequently, the further analysis is based on the revised criteria.

For the two density levels and for the two separate tube voltage levels, the statistical analysis showed that the density level 1.8 was significantly better than 1.3 (p