Statistical textural features for detection of ... - Semantic Scholar

IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 18, NO. 3, MARCH 1999

231

Statistical Textural Features for Detection of Microcalcifications in Digitized Mammograms Jong Kook Kim and Hyun Wook Park*

Abstract— Clustered microcalcifications on X-ray mammograms are an important sign for early detection of breast cancer. Texture-analysis methods can be applied to detect clustered microcalcifications in digitized mammograms. In this paper, a comparative study of texture-analysis methods is performed for the surrounding region-dependence method, which has been proposed by the authors, and conventional texture-analysis methods, such as the spatial gray-level dependence method, the gray-level run-length method, and the gray-level difference method. Textural features extracted by these methods are exploited to classify regions of interest (ROI’s) into positive ROI’s containing clustered microcalcifications and negative ROI’s containing normal tissues. A three-layer backpropagation neural network is used as a classifier. The results of the neural network for the texture-analysis methods are evaluated by using a receiver operating-characteristics (ROC) analysis. The surrounding region-dependence method is shown to be superior to the conventional texture-analysis methods with respect to classification accuracy and computational complexity. Index Terms—Breast cancer, clustered microcalcifications, neural network, texture analysis.

I. INTRODUCTION

B

REAST cancer is one of the major causes for the increase in mortality among middle-aged women, especially in developed countries [1]. In 1994, the mortality and the incidence rate of breast cancer were estimated as 3.9 and 9.9 per 100 000 women, respectively, in Korea [2]. The mortality of breast cancer in Korea is lower than that of other developed countries; 33.7 in the United States, 10.7 in Japan, 13.8 in Singapore, 35.5 in France, and 44.5 in Germany. However, it continues to slowly increase in Korea [2]. Mammography associated with clinical breast examination is the most efficient method for early detection of breast cancer [3]. However, it is very difficult to interpret X-ray mammograms because of the small differences in the image densities of various breast tissues, which is particularly true for dense breasts. The interpretation of mammograms by radiologists is performed by a visual examination of films for the presence of abnormalities that indicate cancerous changes. Computer-aided diagnosis (CAD) Manuscript received July 14, 1997; revised February 10, 1999. The Associate Editor responsible for coordinating the review of this paper and recommending its publication was M. Giger. Asterisk indicates corresponding author. J. K. Kim is with Software Center, Corporate Technical Operations, Samsung Electronics Co., Ltd., Seoul, Korea. *H. W. Park is with the Korean Advanced Institute of Science and Technology, Department of Electrical Engineering, 373-1 Kusung-dong, Yusong-gu, Taejon 305-701, Korea (e-mail: [email protected]). Publisher Item Identifier S 0278-0062(99)04006-9.

is a useful tool for improving diagnostic accuracy and for assisting to radiologists with film interpretation. Microcalcifications, one of the early indicators of breast cancer, are tiny granule-like deposits of calcium. The presence of clustered microcalcifications in X-ray mammograms is considered an important indicator for the detection of breast cancer, especially for individual microcalcifications with diameters up to about 0.7 mm and with an average diameter of 0.3 mm [3]. Radiologists define a cluster of microcalcifications as the presence of three or more visible microcalcifications within a square centimeter region of the mammogram [3]. The detection of clustered microcalcifications in mammograms has been of interest to many researchers [4]–[14]. Chan et al. [4] used difference-image, gray-level thresholding, and signal-extraction techniques in order to detect microcalcifications in digitized mammograms. Wu et al. [5] applied neural networks directly to the images or to preprocessed images to recognize patterns that might include microcalcifications in digitized mammograms. They showed that using neural networks, clusters of microcalcifications were more accurately distinguished in the frequency domain than they were in the spatial domain. Davies and Dance [6] proposed a local area thresholding process for automatic detection of clustered microcalcifications. Recently, a variety of schemes based on the wavelet transform have been proposed for the detection of microcalcifications [7]–[9]. Some researchers have also studied the potential of CAD to improve the radiologist’s performance in detecting of clustered microcalcifications [11], [12]. Intelligent clinical workstations for CAD of breast cancer have been developed in order to demonstrate the possibility of using CAD for clinical applications [13], [14]. Texture information plays an important role in image analysis and understanding, with potential applications in remote sensing, quality control, and medical diagnosis. Texture is one of the important characteristics used in identifying an object or a region of interest (ROI) in an image [15]. The authors previously proposed a novel statistical texture-analysis method, called the surrounding region-dependence method (SRDM), for use in detecting clustered microcalcifications in digitized mammograms [10]. In the proposed method, a motivation for using texture is the fact that the surrounding contains the texture inforregion-dependence matrix mation of an ROI. The texture coarseness or fineness of an ROI can be interpreted as the distribution of the elements in the surrounding region-dependence matrix. For example, if a texture is smooth, a pixel and its surrounding pixels will probably have similar gray values. This means that the

0278–0062/99$10.00  1999 IEEE

Authorized licensed use limited to: Korea Advanced Institute of Science and Technology. Downloaded on June 2, 2009 at 00:38 from IEEE Xplore. Restrictions apply.

232


distribution of elements should be concentrated on or near the upper left corner of the matrix. If a texture has fine details, the difference between a pixel and its surrounding pixels will probably be large. This means that the distribution of elements should be spread out along the diagonal of the matrix. For positive ROI’s containing clustered microcalcifications, the distribution of elements tends to spread to the right and/or lower right corner of the matrix. This paper will focus on a comparative study of the performances of the SRDM and other conventional statistical texture-analysis methods in detecting clustered microcalcifications in digitized mammograms. The conventional statistical texture-analysis methods compared in this paper are the spatial gray-level dependence method (SGLDM) [16], the gray-level run-length method (GLRLM) [17], and the gray-level difference method (GLDM) [18]. These texture-analysis methods have been widely used in pattern recognition fields. Textural features extracted by these methods are used to classify ROI’s into positive ROI’s containing clustered microcalcifications and negative ROI’s containing normal tissues. To evaluate the classification efficacies of these textureanalysis methods, a three-layer backpropagation neural network [19] was employed as a classifier. A receiver operatingcharacteristics (ROC) analysis [20], [21] was used to evaluate the classification performances of the textural features extracted by each texture-analysis method. ROC analysis is based on statistical decision theory and has been applied extensively to the evaluation of clinical diagnosis. The ROC curve represents the relationship between the true-positive fraction (TPF) and the false-positive fraction (FPF) for variations of the decision threshold. The TPF and the FPF denote the fraction of patients actually having the disease in question that are diagnosed as positive and the fraction of patients actually without the disease in question that are diagnosed as positive, is used as respectively. The area under the ROC curve a measure of the classification performance. A higher indicates better classification performance because a larger value of TPF is achieved at each value of FPF. An ideal performance produces an area of 1.0. The paper is organized as follows: The texture-analysis methods are described in Section II of this paper. The experimental results from the comparative study of the textureanalysis methods are presented in Section III. The three-layer backpropagation neural network used as the classifier is also described in Section III. Finally, conclusions are given in Section IV. II. TEXTURE ANALYSIS A. Surrounding Region-Dependence Method A statistical texture-analysis method, called the SRDM, for detecting clustered microcalcifications in digitized mammograms has been developed by the authors [10]. The SRDM is based on a second-order histogram in two surrounding regions. Let us consider three rectangular windows centered and on a current pixel ( , ), as shown in Fig. 1. In Fig. 1, are the inner surrounding region and the outer surrounding

Fig. 1. Configuration of the surrounding regions for the current pixel (x; y ). R1 and R2 are the inner surrounding region and the outer surrounding region, respectively. w1 ; w2 , and w3 denote the sizes of the three square windows.

region, respectively, and , , and denote the sizes of the have values three square region. In this study, , , and of three, five, and seven, respectively. These window sizes are determined considering the lesion size to be detected and the pixel size of the digitized X-ray mammograms. An ROI image is transformed into a surrounding region-dependence matrix which is defined as (1) and where is a given threshold value and the values of are the total numbers of pixels in regions and , is given as respectively. In (1), the element and (2) denotes the number of elements in the set, and is the two-dimensional (2-D) image space. In (2) and the outer count on the inner count are defined as follows: the current pixel

where

and (3) and (4) is the image intensity of the current pixel . In general, the larger the threshold value is, the more microcalcifications can be missed, whereas the smaller the value is, the more sensitive the random noise effect is, so that negative ROI’s may be classified as positive. The optimal selection of the value is very important for the classification performance. It is evident that the surrounding region-dependence matrix contains the texture information of an ROI. The texture coarseness or fineness of an image can be interpreted as the . For example, distribution of the elements in the matrix if a texture is smooth, it is very possible that a pixel and its surrounding pixels will have similar gray values. This means

where


KIM AND PARK: STATISTICAL TEXTURAL FEATURES FOR DETECTION OF MICROCALCIFICATIONS

and

233

is the reciprocal of the element, which is defined as if

(10)

otherwise.

Fig. 2. Dimension of the surrounding region-dependence matrix.

that the distribution of elements should be concentrated on or near upper left corner of the matrix in Fig. 2. If a texture has fine details, it is very possible that the difference between a pixel and its surrounding pixels will be large. This means that the distribution of elements should be spread out along the diagonal of the matrix. The distribution of elements tends to spread to the right and/or lower right corner of the matrix for positive ROI’s containing clustered microcalcifications. The examples for the element distributions of a positive and a negative ROI are described in [10]. These descriptions for the element distributions in the matrix are based on empirical observation. Feature extraction is a very important process for the overall system performance in the area of classification. From the characteristics of the element distribution in the surrounding region-dependence matrix, the following four textural features are extracted: 1) horizontal-weighted sum (HWS) HWS

The proposed features are based on the characteristics of the elements in the surrounding region-dependence matrix. The features were defined considering that the distribution of the matrix elements for positive ROI’s tended to spread to the right and/or lower right corner of the matrix. The reciprocal operation of the matrix elements and the square operation of the location index ( , , or ) have been applied to the features to emphasize the elements in the right or lower right corner of the matrix. Each feature represents the characteristics of the element distribution in the matrix. Good features should have four characteristics: reliability, independence, discrimination, and small numbers [22]. The authors described the effectiveness of the above-mentioned four textual features in terms of reliability, independence, discrimination, and small numbers [10]. B. Spatial Gray-Level Dependence Method The SGLDM [16] is based on an estimation of the second-order joint conditional probability density funcfor and . The function tions is the probability that two pixels, which are located with an intersample distance and a direction , have a gray level and a gray level . The estimated joint conditional probability density functions are defined as follows:

(5) (11)

2) vertical-weighted sum (VWS) VWS

(6)

or (12)

3) diagonal-weighted sum (DWS)

(7)

DWS

(13) 4) grid-weighted sum (GWS) GWS

(8)

is the total sum of elements in the surrounding regiondependence matrix, i.e., (9)

or (14) is where denotes the number of elements in the set, , and stands for the image intensity at the point the total number of pixel pairs within the image which have the intersample distance and direction .


234


Each of the estimated joint probability density functions can be written in matrix form, i.e., the spatial gray-level , as follows: dependence matrix, (15) is the maximum gray level. If a texture is coarse where and is small compared to the sizes of the texture elements, should the pairs of points at the intersample distance usually have similar gray level. This means that the probability is concentrated on or near distribution in the matrix its diagonal. On the other hand, for a fine texture, the gray levels of the points separated by the distance should be quite is distributed different so the probability distribution in away from its diagonal. Thirteen textural features are measured from the matrix : energy, entropy, correlation, local homogeneity, inertia, sum average, sum variance, sum entropy, difference average, difference variance, difference entropy, information measure of correlation 1, and information measure of correlation 2 [16]. In this study, four spatial gray-level dependence and ) matrixes for four different directions ( are obtained for a given distance and the textural features are calculated for each matrix.

Fig. 3. The distribution of the five category ratings for the clustered microcalcifications in positive ROI’s, which were rated subjectively by an expert mammographer.

intersample spacing. Five textural features are measured from : contrast, angular second moment, entropy, mean, and inverse difference moment [18]. In this study, four probabilitydensity functions for four different displacement vectors with are obtained and the textural features are calculated for each probability-density function. III. EXPERIMENTAL RESULTS

C. Gray-Level Run-Length Method The GLRLM [17] is based on computing the number of gray-level runs of various lengths. A gray-level run is a set of consecutive and collinear pixel points having the same graylevel value. The length of the run is the number of pixel points in the run. The gray-level run-length matrix is as follows: (16) is the maximum gray-level and is the maxiwhere , . The element mum run length which is equal to specifies the estimated number of times that a given for a gray level in the picture contains a run length direction of the angle . Four gray-level run- length matrixes and are computed for an corresponding to . ROI of : Five textural features are measured from the matrix short-run emphasis, long-runs emphasis, gray-level nonuniformity, run-length nonuniformity, and run percentage [17]. In this study, four gray-level run-length matrixes for four and ) are obtained and different directions ( the textural features are calculated for each matrix. D. Gray-Level Difference Method The GLDM [18] is based on the occurrence of two pixels which have a given absolute difference in gray level and which are separated by a specific displacement . For any let given displacement vector and be the estimated probability-density function defined by (17) In this analysis, four possible forms of the vector will be ) where is the considered: (0, ), ( , ), ( , 0), and ( ,

This section quantitatively describes the usefulness of the texture-analysis methods in detecting clustered microcalcifications in digitized mammograms. A comparative study of the classification accuracy is performed for the SRDM, the SGLDM, the GLRLM, and the GLDM. The textural features from each texture-analysis method are used to classify ROI’s into positive ROI’s containing clustered microcalcifications and negative ROI’s containing normal tissues. All textureanalysis methods are evaluated with the same database, and all the textural features are obtained from 12-bit gray-level images. A three-layer backpropagation neural network is employed as a classifier. The results from the neural network for the texture-analysis methods are evaluated by using ROC analysis. A. ROI Selection 120 X-ray mammograms were selected from the patient files based on visual criteria and biopsy results. The mammograms were digitized by a LUMISYS laser film scanner with a pixel 100 m and 12 bits per pixel. size of 100 For the comparative study of the texture-analysis methods, 172 ROI’s were selected from 120 digitized mammograms by an expert mammographer, which can represent entire feature space. Each ROI had 128 128 pixels (i.e., 1.28 1.28 cm ). Among the selected 172 ROI’s, 72 ROI’s were positive whereas 100 ROI’s were negative. Positive ROI’s included clustered microcalcifications in dense regions and/or in the glandular tissues of the breast. Negative ROI’s included normal breast areas involving ducts, breast boundaries, Cooper’s ligaments, blood vessels, and/or glandular tissues. The 72 positive ROI’s were rated into five categories by an expert mammographer. Fig. 3 shows the rating result for the database used in this study.



235

error measure [22] is given as

(20) and are the output value and the target value for where the th input pattern, respectively, and is the total number of training patterns. In this study, the training process is stopped becomes smaller than a given when the RMS error constant . C. Classification Results

Fig. 4. Structure of the three-layer backpropagation neural network.

B. Classifier The classifier employed in this research was a three-layer backpropagation neural network [19]. The backpropagation neural network optimizes the net for correct responses to the training input data set. More than one hidden layer may be beneficial for some applications, but one hidden layer is sufficient if enough hidden neurons are used [19]. Fig. 4 shows a schematic of the artificial neural network (ANN) which was used as a classifier in this research. In Fig. 4, the initial weights are randomly selected from [ 0.5, 0.5]. The textural features discussed in Section II are used as the input signals of the input layer. There is a single output node for classification into positive or negative ROI. A nonlinear sigmoid function with zero and one saturation values is used as the activation function for each neuron and is defined by (18)

is the th element of the actual output pattern where is the weight from the th produced by an input pattern, is the threshold of the th neuron. to th neurons, and The network is trained to provide a 0.9 output value for a positive ROI and a 0.1 output value for a negative ROI. In the training process, the weights between the neurons are adjusted iteratively so that the differences between the output values and the target values are minimized. The iteration can be described by (19) where is the learning rate, is the number of epochs, is the error signal, and is a momentum parameter. In (19), is the difference between the th output the error signal of the neural network and the th target output. To evaluate the network performance during the training process, a global

The neural networks were tested by using a jack-knife method [5] and a round-robin method [15]. The results were analyzed by using ROC analysis. ROC analysis was employed to evaluate the performance of the texture-analysis methods in classifying the ROI’s into positive and negative ROI’s. ROC curves were generated by LABROC1 program developed by Metz et al. [23] to fit the continuous outputs of the neural , was used as networks. The area under the ROC curve, a measure of the classification performance. The CLABROC program developed by Metz et al. [24] was employed to test the statistical significance of the difference between two ROC curves. In this program, a bivariate chi-square test is applied to the simultaneous difference between the parameters ( intercepts on the normal-deviate axes) and between the parameters (the slopes of straight lines on the normal-deviate axes) of the two ROC curves. 1) Jack-Knife Method: The classification performance was first studied using the jack-knife method and ROC analysis. For the jack-knife method, one half of the sample patterns were selected randomly from the database for training of the neural network; subsequently, the other half of the sample patterns were used for testing the trained neural network [5]. The 172 ROI’s were partitioned arbitrarily into training and test sets with equal numbers of ROI’s, i.e., each set included 86 ROI’s containing 50 negative ROI’s and 36 positive ROI’s. In this study, ten combinations of training and testing pairs were used to generate the ROC curves. All the textural features were normalized by the sample mean and standard deviation of the training set. The training set was used to train the neural network which used a backpropagation algorithm. Each training in this experiment was completed once the value of was less than 0.1, i.e., was 0.1. The learning rate and the momentum had values of 0.08 and 0.7, respectively. The optimal parameters for each texture-analysis method were analyzed. These are the numbers of hidden neurons and the intersample distance ( ) for the SGLDM and the GLDM, the number of hidden neurons and the threshold value ( ) for the SRDM, and the number of hidden neurons for the GLRLM. values for various parameters. Each Fig. 5 shows the value is the average performance from ten combinations of values for the training and testing pairs. Fig. 5(a) shows SRDM with respect to the threshold and the number of hidden neurons. The optimal performance was achieved when was 70 and the number of hidden neurons was five. In general, the larger the value of , a large number of positive


236


(a)

(b)

(c)

(d)

Fig. 5. Comparisons of classification performances for the texture-analysis methods with various parameters. The Az values were computed for various hidden neurons (i.e., 1, 3, 5, 7, 10, and 12 neurons). The Az values of the SGLDM and the GLDM were analyzed with respect to the intersample distances (i.e., d = 1; 3; 5; and 7 pixels) whereas the Az values of the SRDM were analyzed with respect to the threshold values (i.e., q = 50; 60; 70; 80; and 100). Number of hidden neurons: (a) SRDM, (b) SGLDM, (c) GLDM, and (d) GLRLM.

ROI’s can be classified as negative, whereas the smaller the value of , a large number of negative ROI’s can be classified values for the as positive. Fig. 5(b) and (c) shows the SGLDM and the GLDM with respect to the intersample distance and the number of hidden neurons. The optimal performances were achieved when was one and the numbers of hidden neurons were ten and seven for the SGLDM and the GLDM, respectively. The performance decreases as the intersample distance increases. Fig. 5(d) shows the results values for the GLRLM with respect to the number of of hidden neurons. The optimal performance was achieved for seven hidden neurons. Fig. 6 shows a comparison of the four ROC curves obtained by using the optimal parameters for each of the four textureanalysis methods. TPF and FPF denote the true-positive fracvalues tion and the false-positive fraction, respectively. The for the SRDM, the SGLDM, the GLDM, and the GLRLM are 0.93, 0.88, 0.74, and 0.68, respectively. The textural features from the SRDM have the best performance in terms of the , whereas those of the GLRLM area under the ROC curve, have the worst performance. The CLABROC program was levels corresponding to the statistical used to test the significance of the difference between the SRDM and the other conventional texture-analysis methods. The results are summarized in Table I. As shown in Table I, the difference between the SRDM and the GLDM is statistically significant level, the difference between the SRDM and at the level, the GLRLM is statistically significant at the and the difference between the SRDM and the SGLDM is not level. significant at the

Fig. 6. Comparison, based on the jack-knife method, of the ROC curves for each texture-analysis method with optimal parameters. TPF and FPF denote the true-positive fraction and the false-positive fraction, respectively. The Az values of the SRDM (at q = 70 and H N = 5 ), the SGLDM (at d = 1 and H N = 10 ), the GLDM (at d = 1 and H N = 7 ), and the GLRLM (at H N = 7) are 0.93, 0.88, 0.74, and 0.68, respectively. HN denotes the number of hidden neurons.

2) Round-Robin Method: The classification performance was also studied by the round-robin method (leave-one-out method) and ROC analysis. For sample patterns, the round-



237

TABLE I COMPARISONS OF p-LEVELS CORRESPONDING TO THE STATISTICAL SIGNIFICANCE OF THE DIFFERENCE BETWEEN THE SRDM AND THE CONVENTIONAL TEXTURE-ANALYSIS METHODS

TABLE II COMPARISON OF COMPUTATION TIME REQUIRED TO EXTRACT TEXTURAL 128 PIXELS WITH 12 BITS FEATURES FROM AN ROI, WHICH IS 128 PER PIXEL, IN HP WORKSTATION (MODEL 715 AND 100 MHz CPU)

2

Fig. 7. Comparison, based on the round-robin method, of the ROC curves for each texture-analysis method with optimal parameters. TPF and FPF denote the true-positive fraction and the false- positive fraction, respectively. The Az values of the SRDM (at q 70 and HN 5), the SGLDM (at d 1 and HN 10), the GLDM (at d 1 and HN 7), and the GLRLM (at HN 7) are 0.92, 0.86, 0.73, and 0.67, respectively. HN denotes the number of hidden neurons.

=

=

=

=

=

=

=

robin method trains the classifier with samples, then uses the one remaining sample as a test sample. Classification is repeated in this manner until all samples have been used was 172 (172 sample once as a test sample. In this study ROI’s). Among the 172 ROI’s, 72 ROI’s were positive, whereas 100 ROI’s were negative. Since the round-robin method is performed with a sample which is not used for training the ANN, these trials provide a good approximation of the general performance of the ANN. All the textural features were normalized by the sample mean and standard deviation of the data set. As shown in Fig. 6, the performance comparison using the round-robin method was performed with the optimal parameters, which were defined by the jack-knife method. Fig. 7 shows a comparison, based on the round-robin method, of the ROC curves for each texture-analysis method.

The values for the SRDM, the SGLDM, the GLDM, and the GLRLM are 0.92, 0.86, 0.73, and 0.67, respectively. The textural features extracted from the SRDM have the best performance in terms of the area under the ROC curve, , whereas the GLRLM has the worst performance. For round-robin tests, CLABROC was used to test the levels corresponding to the statistical significance of the difference between the SRDM and the other conventional texture-analysis methods. The results are summarized in Table I. As shown in Table I, the difference between the SRDM and the GLDM is level, the difference statistically significant at the between the SRDM and the GLRLM is statistically significant level, and the difference between the SRDM at the level. and the SGLDM is not significant at the 3) Computation Time: Table II shows the computation time required to extract the textural features from an ROI 128 pixels with 12 bits per pixel. All which had 128 programs were written in C language and executed on a HP workstation (model 715 and 100 MHz CPU). The SGLDM was very time consuming on 12-bit processing because the matrix size was too large. From the viewpoint of classification accuracy and computational complexity, it is apparent that the SRDM is superior to the other methods. IV. CONCLUSIONS The goal of this work was to compare the SRDM to the other conventional texture-analysis methods with respect to detection of clustered microcalcifications in digitized mammograms. The conventional texture-analysis methods addressed in this research are the SGLDM, the GLRLM, and the GLDM. This paper quantitatively analyzed the abilities of the textureanalysis methods to detect clustered microcalcifications in digitized mammograms. Textural features extracted from these methods were exploited to classify ROI’s into positive ROI’s and negative ROI’s. A three-layer backpropagation neural


238


network was employed as a classifier. The results of the classifications by the texture-analysis methods were evaluated by ROC analysis. From the viewpoint of classification accuracy and computational complexity, the SRDM was superior to the other conventional methods. A database must be representative of the population over the entire feature space in order for the network adequately to model the probability density function of each class. For a fair comparison of the texture-analysis methods, the neural network for each texture-analysis method was optimized with the sample patterns of 172 ROI’s which represented the entire feature space. For the limited number of sample patterns, the experimental results for the SRDM are promising. However, a comprehensive database that covers many more positive ROI’s and negative ROI’s will be needed for training of the neural network in order to apply our method to clinical situations involving the detection of clustered microcalcifications in mammograms. ACKNOWLEDGMENT The authors are grateful to Dr. J. M. Park and Dr. K. S. Song of Asan Medical Center in Seoul, Korea, for helpful discussion and for providing the digitized mammograms. REFERENCES [1] R. G. Bird, R. G. Wallace, and B. C. Yankaskas, “Analysis of cancers missed at screening mammography,” Radiology, vol. 184, pp. 613–617, 1992. [2] Y. O. Ahn, B. J. Park, and K. Y. Yoo, “Incidence estimation of female breast cancer among Koreans,” J. Korean Med. Sci., no. 9, pp. 328–334, 1994. [3] D. B. Kopans, Breast Imaging. Philadelphia, PA: J. B. Lippincoff, 1989, pp. 81–95. [4] H. P. Chan, K. Doi, S. Galhotra, C. J. Vyborny, H. Macmahon, et al., “Image feature analysis and computer-aided diagnosis in digital radiography—1: Automated detection of microcalcifications in mammography,” Med. Phys., vol. 14, no. 4, pp. 538–548, 1987. [5] Y. Wu, K. Doi, M. L. Giger, and R. M. Nishikawa, “Computerized detection of clustered microcalcifications in digital mammograms: Application of artificial neural networks,” Med. Phys., vol. 19, no. 3, pp. 555–560, 1992. [6] D. H. Davies and D. R. Dance, “Automatic computer detection of clustered calcifications in digital mammograms,” Phys. Med. Biol., vol. 35, no. 8, pp. 1111–1118, 1990. [7] R. N. Strickland and H. I. Hahn, “Wavelet transform for detecting microcalcifications in mammograms,” in IEEE Trans. Med. Imag., vol. 15, no. 2, pp. 218–229, 1996.

[8] H. Yoshida, K. Doi, and R. M. Nishikawa, “Automated detection of clustered microcalcifications in digital mammograms using wavelet transform techniques,” Proc. SPIE on Visual Commun. and Image Processing, vol. 2167, pp. 868–886, 1994. [9] M. N. Gurcan, Y. Yardimci, A. E. Centin, and R. Ansari, “Detection of microcalcifications in mammograms using nonlinear subband decomposition and outlier labeling,” Proc. SPIE on Visual Commun. and Image Processing, vol. 3024, pp. 909–918, 1997. [10] J. K. Kim, J. M. Park, K. S. Song, and H. W. Park, “Detection of clustered microcalcifications on mammograms using surrounding region dependence method and artificial neural network,” J. VLSI Signal Processing, vol. 18, pp. 251–262, 1998. [11] H. P. Chan, K. Doi, C. J. Vyborny, R. A Schmidt, and C. E. Metz, et. al., “Improvement in radiologists’ detection of clustered microcalcifications on mammograms,” Investigative Radiology, vol. 25, pp. 1102–1110, 1990. [12] R. M. Nishikawa, D. E. Wolverton, R. A. Schmidt, and J. Papaioannou, “Radiologists’ ability to discriminate computer-detected true and false positives from an automated scheme for the detection of clustered microcalcifications on digitized mammograms,” Proc. SPIE Med. Imag., vol. 3036, pp. 198–204, 1997. [13] K. Doi, M. L. Giger, R. M. Nishikawa, K. R. Hoffmann, and R. A. Schmidt, et. al., “Prototype clinical ‘intelligent’ work station for computer-aided diagnosis,” RSNA, no. PH087, pp. 1–10, 1995. [14] H. Kobatake, K. Okuno, M. Murakami, M. Ishida, and H. Takeo, et. al., “CAD system for full-digital mammography and its evaluation,” Proc. SPIE Med. Imag., vol. 3034, pp. 745–752, 1997. [15] M. Nadler and E. P. Smith, Pattern Recognition Engineering. New York: Wiley, 1993. [16] R. M. Haralick, K. Shanmugan, and I. Dinstein, “Textural features for image classification,” IEEE Trans. Syst., Man, Cybern., vol. SMC-3, pp. 610–621, Nov. 1973. [17] M. M. Galloway, “Texture analysis using gray level run lengths,” Comput. Graph. Image Processing, vol. 4, pp. 172–179, 1975. [18] J. S. Weszka, C. R. Dyer, and A. Rosenfeld, “A comparative study of texture measures for terrain classification,” IEEE Trans. Syst., Man, Cybern., vol. SMC-6, pp. 269–285, Apr. 1976. [19] L. Fausett, Fundamentals of Neural Networks. Englewood Cliffs, NJ: Prentice-Hall, 1994, pp. 289–333. [20] C. E. Metz, “ROC methodology in radiologic imaging,” Investigative Radiology, vol. 21, no. 9, pp. 720–733, 1986. , “Some practical issues of experimental design and data analysis [21] in radiological ROC studies,” Investigative Radiology, vol. 24, no. 3, pp. 234–245, 1989. [22] K. R. Castleman, Digital Image Processing. Englewood Cliffs, NJ: Prentice-Hall, 1996, pp. 527–534. [23] C. E. Metz, J. H. Shen, and B. A. Herman, “New methods for estimating a binormal ROC curve from continuously-distributed test results,” presented at the 1990 Annual Meeting American Statistical Association, Anaheim, CA, Aug. 7, 1990. [24] C. E. Metz, P. L. Wang, and H. B. Kronman, “A new approach for testing the significance of differences between ROC curves measured from correlated data,” in Proc. Information Processing Medical Imaging 8th Conf., Boston, MA, 1984, pp. 432–445.