Improved Parkinsonism diagnosis using a partial least squares based ...

5 downloads 50065 Views 628KB Size Report
Key words: automatic classification, 123I-ioflupane, partial least squares, support vector machines ... proprietary software to delimit regions of interest (ROIs) and.
Improved Parkinsonism diagnosis using a partial least squares based approach F. Segovia,a) J. M. Górriz, J. Ramírez, and I. Álvarez Department of Signal Theory, Networking and Communications, University of Granada, Granada 18071, Spain

J. M. Jiménez-Hoyuela and S. J. Ortega Department of Nuclear Medicine, Hospital Universitario Virgen de la Victoria, Málaga 29010, Spain

(Received 23 January 2012; revised 16 May 2012; accepted for publication 6 June 2012; published 27 June 2012) Purpose: An accurate and early diagnosis of Parkinsonian syndrome (PS) is nowadays a challenge. This syndrome includes several pathologies with similar symptoms (Parkinson’s disease, multisystem atrophy, progressive supranuclear palsy, corticobasal degeneration and others) which make the diagnosis more difficult. 123 I-ioflupane allows to obtain in vivo images of the brain that can be used to assist the PS diagnosis and provides a way to improve its accuracy. Methods: In this paper, we show a novel method to automatically classify 123 I-ioflupane images into two groups: controls or PS. The proposed methodology analyzes separately each hemisphere of the brain by means of a novel approach based on partial least squares (PLS) and support vector machine. Results: A database with 189 123 I-ioflupane images (94 controls and 95 pathological images) was used for evaluation purposes. The application of the proposed method based on PLS yields high accuracy rates up to 94.7% with sensitivity = 93.7% and specificity = 95.7%, outperforming previous approaches based on singular value decomposition, which are used as a reference. Conclusions: The use of advanced techniques based on classical signal analysis and their application to each hemisphere of the brain separately improves the (assisted) diagnosis of PS. © 2012 American Association of Physicists in Medicine. [http://dx.doi.org/10.1118/1.4730289] Key words: automatic classification, Parkinson’s disease

123

I-ioflupane, partial least squares, support vector machines,

I. INTRODUCTION Parkinsonian syndrome (PS), a.k.a. Parkinsonism, is a neurological disorder characterized by the presence of hypokinesia associated with rest tremor and/or rigidity and/or postural instability. The most common cause of Parkinsonism is the Parkinson’s disease (PD), a neurodegenerative disease that originates due to the progressive loss of dopaminergic neurons of the nigrostriatal pathway (a neural pathway that connects the substantia nigra with the striatum). As a result of this cell loss, there is a substantial decrease in the dopamine content of the striatum, and a corresponding loss of dopamine transporters.1 Nowadays, PD is the second most frequent neurodegenerative disease. About 1%–2% of people over 65 years suffer the disease and about 20%–24% of them are incorrectly diagnosed. 123 I-ioflupane (better known by its tradename DaTSCAN) is a neuroimaging radiopharmaceutical drug, widely used to assist the diagnosis of Parkinsonism. After intravenous injection, this drug binds to the dopamine transporters in the striatum. Then, using a single-photon emission computed tomography (SPECT) camera, it is possible to obtain the distribution of the radiopharmaceutical in the brain and visualize the loss of dopamine transporters.1–3 The SPECT maps obtained that way are traditionally examined by experienced clinicians to assist the PS diagnosis.4 For this purpose, clinicians often use proprietary software to delimit regions of interest (ROIs) and quantify the radiopharmaceutical uptake.5, 6 This procedure is subjective and prone to error since it requires gross changes in 4395

Med. Phys. 39 (7), July 2012

the transporter density throughout the ROI to allow the differentiation between controls and pathological images. As such, it may not be sensitive to changes in the pattern of distribution that can characterize the progression of the disease.7 Machine learning8 is a scientific discipline concerned with the design and development of algorithms that allows computers to recognize complex patterns and make intelligent decisions based on data. In combination with image analysis procedures, it allows to develop computer-aided diagnosis (CAD) systems for several neurodegenerative diseases, such as PS.9, 10 These systems not only process and analyze image data but also can determine if an image belongs to the class of normal images (healthy subjects) or pathological images (patients), performing that way an automatic diagnosis. Recently, a method7 that combines principal components analysis (PCA) and several classifiers in order to automatically diagnose PS has been proposed. The authors use the single value decomposition (SVD) to reduce the input space and improve the classifiers performance. This approach yields high accuracy rates and outperforms other widely used commercial systems, such as Brass and QuantiSPECT.11, 12 In this work, we propose an improved approach that uses partial least squares (PLS) (Ref. 13) instead of SVD. PLS and SVD are similar techniques since both provide a data decomposition using the variance, however, the former takes into account the information about the underling classes (labels) during the decomposition. In addition, we perform the classification by means of a support vector machine (SVM)

0094-2405/2012/39(7)/4395/9/$30.00

© 2012 Am. Assoc. Phys. Med.

4395

4396

Segovia et al.: Improved Parkinsonism diagnosis using a partial least squares based approach

classifier8 rather than the Bayes’ rule. Another major difference is that we analyze and classify the two hemispheres of the brain separately and then an image is classified as PS whether one of its hemispheres is classified as such. That way two goals are achieved: on the one hand, we improve the detection of the disease even when it only affects a given hemisphere (patients often develop symptoms on one side of the brain before developing bilateral disorder7 ) and, on the other hand, we reduce by half the input space for classification operating on the ROIs which are relevant in medical practice. This paper is organized as follows. Section II describes the background of the key techniques used in the paper. Section III discusses the materials and methods employed in this work, i.e., the database of 123 I-ioflupane images used for evaluation purposes and the proposed methodology based on PLS and SVM. The application of the latter methods to DaTSCAN images is shown in Sec. IV, performing a fair comparison with the results obtained in Ref. 7 which are used as a baseline. Results are discussed in Sec. V and finally, conclusions are shown in Sec. VI.

II.B. Support vector machine

Support vector machine8 separates a set of binary labeled training data by means of a hyperplane (called maximal margin hyperplane) that is maximally distant from the two classes. The objective is to build a function f : RN → ±1 using training data, that is, N-dimensional patterns xi and class labels yi so that f will correctly classify new examples (x, y) (x1 , y1 ), (x2 , y2 ), . . . , (xi , yi ) ∈ RN × ±1.

(2)

g(x) = wT x + w0 = 0,

II.A. Partial least squares

PLS (Ref. 13) is a statistical method for modeling relations between sets of observed variables by means of latent variables. The underlying assumption of all PLS methods is that the observed data are generated by a system or process which is driven by a small number of latent (not directly observed or measured) variables. Mathematically, PLS is a linear algorithm for modeling the relation between two data sets X ⊂ RN and Y ⊂ RM . After observing n data samples from each block of variables, PLS decomposes the n × N matrix of zero-mean variables, X, and the n × M matrix of zero-mean variables, Y, into the form X = TPT + E, (1)

where N and M are, respectively, the number of features and the number of properties of the observed variables. T and U are n × p matrices of the p extracted score vectors (also known as components or latent vectors), the N × p matrix P, and the M × p matrix Q are the matrices of loadings and the n × N matrix E and the n × M matrix F are the matrices of residuals (or error matrices). The x-scores in T are linear combinations of the x-variables and can be considered as good summaries of the x-variables. Similarly, the y-scores in U are linear combinations of the y-variables and can be considered as good summaries of them. The model structures of PLS and PCA are essentially the same. Both methods transform the data into a set of a few intermediate linear latent variables (components) and then, these new variables are used for regression, dimension reduction, etc. The main difference between PLS and PCA is that the former creates orthogonal weight vectors by maximizing the covariance between elements in X and Y. Thus, when PLS is used for dimension reduction, it not only conMedical Physics, Vol. 39, No. 7, July 2012

siders the variance of the samples (as PCA) but also considers the class labels. Fisher discriminant analysis (FDA) is, in this way, similar to PLS. However, FDA has the limitation that after dimensionality reduction, there are only c − 1 meaningful latent variables, where c is the number of classes being considered. Additionally, when the number of features exceeds the number of samples, the covariance estimates when applying PCA do not have full rank and the weight vectors cannot be extracted.

Linear discriminant functions define decision hyperplanes in a multidimensional space

II. BACKGROUND

Y = UQT + F,

4396

(3)

where w is the weight vector that is orthogonal to the decision hyperplane and w0 is the threshold. The optimization task consists of finding the unknown parameters w and w0 that define the decision hyperplane. When no linear separation of the training data is possible, SVM can work effectively in combination with kernel techniques so that the hyperplane defining the SVM corresponds to a nonlinear decision boundary in the input space. A kernel function is defined as K(xi , xj ) = ϕ(xi )ϕ(xj ).

(4)

The use of kernel functions avoids directly working in the high dimensional feature space, thus the training algorithm only depends on the data through dot products in Euclidean space, i.e., on terms of the form ϕ(xi )ϕ(xj ). III. MATERIALS AND METHODS III.A. Image database

A database consisting of 189 SPECT images from 189 subjects (94 controls and 95 PS patients) was used in order to evaluate the proposed methodology. The images were acquired by the “Virgen de la Victoria” hospital (Málaga, Spain) from January 2003 until December 2008 (see Table I for demographic details). All of them are from patients that TABLE I. Demographic details of the SPECT images used in this work. μ and σ stand for the average and the standard deviation, respectively. Sex

Controls Patients

Age

#

M

F

μ

σ

Range

94 95

49 54

45 41

69.26 68.29

10.16 9.62

33–89 30–87

4397

Segovia et al.: Improved Parkinsonism diagnosis using a partial least squares based approach

5

10

15

20

25

30

35

4397

40

45

F IG . 1. Comparison between a 123 I-ioflupane-SPECT brain image from a healthy subject (top row) and other one from a PS patient (bottom row). Each image is represented by its central axial slices.

attended to the hospital redirected by primary care services and who had symptoms that could be considered as Parkinsonism symptoms. Patients on treatment with drugs which have an effect, known or suspected, by a direct competitive mechanism at the level of dopaminergic transporters were excluded. The SPECT data acquisition was performed after a period of between 3 and 4 h after the intravenous injection of 185 MBq (5 mCi) of 123 I-ioflupane, with prior thyroid blocking by means of the Lugol’s solution. The gamma camera used was a Millennium model from General Electric, equipped with a dual head and general purpose collimator. A 360◦ circular orbit was made around the cranium, at 3◦ intervals, 60 images per detector with a duration of 35 s per interval, 128 × 128 matrix. Image reconstruction was carried out using filtered backprojection algorithms without attenuation correction, application of a Hanning filter (frequency 0.7) and images were obtained with transaxial cuts. All the images were spatially normalized using the SPM software14 and yielding a 73 × 73 × 45 three-dimensional functional activity map for each patient. This software performs the normalization in two steps. The first one assumes a general affine model with 12 parameters and a cost function, f, that measures the difference between a given image and a template  (i(Mxk ) − t(xk ))2 , (5) f = k

where i is the initial image, Mxk is the affine transformation for each voxel xk = {x1 , x2 , x3 }, and t is the template.15 The second step consists of several nonlinear deformations defined by a linear combination of three-dimensional discrete cosine transform (DCT) basis functions.16 This spatial normalization ensures that the same position in the volume coordinate system in different images corresponds to the same anatomical position. Moreover, the intensity of the images was normalized by equaling the background (average intensity of voxels which intensity is between 15% and 30% of maximum intensity) of each image and the template, and scaling the remaining voxels. The template used for spatial and intensity normalization was computed as follows: Medical Physics, Vol. 39, No. 7, July 2012

r The n − 1 controls of the database were spatially normalized using the remaining control (randomly selected) as template. r After spatial normalization, the average of the n controls was computed. r The average image was made symmetric and used as template. Finally, the images were reduced in order to improve the computational complexity of the system. The reduction was performed by averaging blocks of 2 × 2 × 2 voxels, thus, the final images were a size of 37 × 37 × 23 voxels. Once the images have been properly normalized, they were visually labeled by three nuclear medicine specialists from the hospital using only the information contained in the images, without any other medical information. The criteria followed, reached by consensus among the three specialists, were: a study was considered to be normal when bilateral, symmetrical uptake appeared in caudate and putamen nuclei, and abnormal when there were areas of significant reduced uptake in any of the striatal structures.4 Quantitative evaluation was not regarded for image assessment. It is worth noting that most of the patients are borderline cases.

III.B. Automatic classification based on PLS and SVM

The 123 I-ioflupane radiopharmaceutical provides brain images with high activation in the striatum, a region of high interest for the diagnosis of PS.4 Figure 1 shows 4 axial slices of two 123 I-ioflupane-SPECT images, one of them from a healthy subject and the other one from a patient. Observe that the most of the activity is gathered in the striatum. However, the images contain a lot of information (a large number of voxels) that is not relevant for the diagnosis of the disorder. The main feature of the proposed method is to automatically extract the two high-intensity regions and analyze them separately.17, 18 Thus, the first stage consists of separating the two regions by dividing the axial slices into two halves. After that, two volumes are obtained, each one containing one hemisphere of the brain. Then, a binary mask is applied to each volume in order to select only the high-intensity voxels

4398

Segovia et al.: Improved Parkinsonism diagnosis using a partial least squares based approach

4398

A LGORITHM I. Preprocessing. Input: The image database, DB = I1 , . . . , In , and the labels, Y = Y1 , . . . , Yn Output: High-intensity areas of each hemisphere (named DBL∗ and DBR∗ ) C = select controls from DB; A = average of C; A* = square and make symmetrical A; M = create binary mask with highest voxels of A*; [DBL , DBR ] = Split by half (axial axis) all the images in DB; [ML , MR ] = Split by half (axial axis) the mask M; foreach k ∈ [L, R] do DBk∗ = apply MK to all image from DBk end

estimations for the class of each image. Both estimations are logically combined as follows: if at least one is positive (PS), the image is labeled as normal. Figure 2 illustrates the whole procedure. IV. EXPERIMENTS AND RESULTS

F IG . 2. Proposed methodology.

of the striatum area. This mask is computed as follows:  1 if ci ≥ 12 maxci , mi = 0 otherwise,

(6)

where mi , i = 1 . . . n is the value (0 or 1) at position i of the mask, ci , i = 1 . . . n is the intensity of the voxel at position i of an intermediate image, c, and maxci is the highest intensity of c. The image c is computed by squaring and making symmetric (regarding to the axial axis) the average of all controls of the database. Applying this mask allows to select the voxels whose intensity is high (compared with the maximum intensity) in healthy subjects. In practice, this is equivalent to select the voxels of the striatum. Subsequently, the PLS decomposition is performed to obtain the score vectors that will be used as features by the statistical classifier. Therefore, a given image is independently represented and processed by two score vectors, one of them models the high-intensity region of the left hemisphere of the brain and the other one models the high-intensity region of the right hemisphere. Finally, a SVM classifier is used to estimate the underlying class of the image (normal or PS). The classification is carried out for the two score vectors separately, obtaining two Medical Physics, Vol. 39, No. 7, July 2012

In order to evaluate the proposed method, we have used the image database described in Sec. III.A to calculate several measures. Then, these measures have been compared with the ones obtained by a previous approach. The implementation of the experiments consists of two procedures. First, the striatum’s voxels of each hemisphere are selected for all the images. This procedure involves the division of each image into two halves (the two hemispheres) and the selection of the high-intensity area of each hemisphere by means of a binary mask as it is described in Sec. III.B (pseudocode is shown in algorithm 1). Second, the accuracy estimation is performed following a leave-one-out (LOO) crossvalidation (CV) strategy: the classifier is trained as many times as the size of the database and in each iteration an image is used for test and the remaining ones for training. The global accuracy is then calculated as the average of the accuracy achieved in each iteration. The PLS score vectors for the test images are computed as follows: T = VW,

(7)

where T is the score vector for a given image, V are the selected voxels for that image rearranged in a vector form, and W is a weight matrix obtained from the PLS decomposition of the remaining images of the database (see the Appendix for further details). Computing the score vectors that way avoids that the label of the test images are taken into account for the calculation. Since each hemisphere is used separately, a given image is represented by two score vectors, modeling the two highintensity regions of the image. The first scores of the two vectors contain the most part of the covariance between images and labels and can be used to distinguish between pathological images and controls. Once the scores vectors are extracted for both, test image and training set, the classification is performed by means of

Segovia et al.: Improved Parkinsonism diagnosis using a partial least squares based approach

4399

A LGORITHM II. Feature extraction and classification. Input: The labels, Y, and the volumes for the high intensity area of the

4399

TABLE II. Mean, standard deviation, and maximum of the 20 measures obtained with the proposed methodology by varying from 1 to 20 the size of the feature vectors. These rates are compared with the ones obtained in Ref. 7.

left and the right hemispheres, i.e. DBL∗ and DBR∗

Proposed approach

Output: Accuracy, sensitivity and specificity rates foreach ILi , IRi ∈ DBL∗ and DBR∗ respectively do foreach k ∈ [L, R] do Xk∗ = select voxels from all but Iki images in DBk∗ ; Yk∗ = select labels for volumes in Xk∗ from Y;

Accuracy Sensitivity Specificity

Baseline

Mean

Std

Maximum

Mean

Std

Maximum

93.39% 93.16% 93.62%

1.01 1.00 2.18

94.71% 94.74% 95.74%

84.71% 86.37% 83.03%

9.30 7.04 11.84

89.42% 91.58% 88.30%

[Tk∗ , Pk∗ , Wk∗ , . . . ] = SIMPLS (Xk∗ , Yk∗ ); Vki = rearrange Iki into a vector form; Tki = Vki Wk∗ ; SV Mki = train a SVM classifier using Tk∗ and Yk∗ ; classki = predict class for Tki using SV Mki end classi = classLi OR classRi ; if classi = Yi then errors = errors + 1; end

the baseline approach,7 which to our knowledge, is the best so far. The kernel function used along the SVM classification and its parameters were empirically chosen. For this purpose, we computed the accuracy of several possibilities for parameters C and σ by following a leave-one-out strategy. The mean and the maximum accuracies achieved with each pair are gathered in Fig. 4.

update accuracy, sensitivity and specificity rates;

V. DISCUSSION

end

a SVM classifier. As it is shown in algorithm 2, each hemisphere of the brain is classified individually obtaining two predictions per image. If one of them is positive (classified as pathological), the whole image will be labeled as such. Figure 3 shows the accuracy, sensitivity, and specificity obtained as a function of the number of scores which are used as input features, i.e., as a function of the feature vector dimension. The SVM classifier uses a radial basis function (RBF) as kernel with parameters σ and C equal to 5 and 1, respectively. The highest accuracy rates are obtained using few components (five or less) and then they remain quite stable and around 94%. This stability is numerically estimated by means of the mean and the standard deviation shown in Table II. The results are also compared with the ones obtained by means of

As it is shown in Table II, the methodology proposed in this work to analyze 123 I-ioflupane images provides high accuracy rates for PS diagnosis with peak values over 94%. It represents a significant improvement compared with the results obtained by the method described in Ref. 7, used here as a baseline. The main differences regarding the methodology proposed in this work are as follows:

r Preprocessing: Spatial normalization is carried out in the same way. Both methods use the SPM software, which perform a linear affine transformation followed by nonlinear warping using basis functions. However, we perform the intensity normalization by equaling the average value of the background of all images instead of using a volume centered on the occipital cortex to scale the intensity of the voxels. In addition, we do not need to

100 98 96 94 92 90 88 86 84

Accuracy Sensitivity Specificity

82 80

2

4

6 8 10 12 14 Number of PLS scores used as features

16

18

20

F IG . 3. Accuracy, sensitivity, and specificity of a CAD system for Parkinsonism based on the proposed method. Medical Physics, Vol. 39, No. 7, July 2012

Segovia et al.: Improved Parkinsonism diagnosis using a partial least squares based approach

4400

1

1

92

2

94.6

2 94.4

90

4

88

5

86

6 84

3 Parameter σ

3 Parameter σ

4400

7

94.2

4 5

94

6

93.8

7 93.6

82

8

8

80

9 10

10

78 1

2

3

4

5 6 7 Parameter C

8

9

93.4

9

10

93.2 1

2

3

4

5 6 7 Parameter C

8

9

10

F IG . 4. Mean (left) and maximum (right) accuracy achieved with the proposed method in function of the parameters σ and C for the SVM classifier (RBF kernel).

apply a mask to remove voxels outside the brain since we select only the voxels of the striatum (in practice, the voxels with a intensity above the 50% of maximum intensity in controls). r Handling of the nonbilateral disease: In some patients, the disorder affects only one side of the brain before progressing to bilateral disease.17, 18 While in the previous work the authors face the problem realigning the images to cause the hemispheres with the maximum uptake to be on the same side, we try to detect these cases and diagnose them as pathological by analyzing and classifying the hemispheres separately. r Feature extraction: The main difference between PCA (Ref. 7) and the proposed PLS is that the latter creates orthogonal weight vectors by maximizing the covariance between the images and their labels. Thus, PLS not only considers the variance of the samples but also considers the class labels. Using PLS, we get feature vectors with more “information” about how to distinguish

between controls and PS images. In a sense, PLS is a supervised approach and PCA is unsupervised. r Classification: We have chosen a SVM classifier which usually reports highest accuracy rates and generalization ability in Refs. 19–21 which are usually affected by the “small sample size” problem.22 The hyperplanes of decision defined by SVM are shown in Fig. 5. There are other widely used commercial systems for similar purpose,12 however, they have a different philosophy focused on measuring the intensity/activity of a set of previously defined regions and not on the classification. The main advantage of the methodology presented in this work is that it is fully automatic. When the system is trained, it is able to provide a diagnosis from only a 123 I-ioflupane SPECT image, i.e., no prior knowledge about the disease and its ROIs is required. However, it is worth noting that the system only reproduces the knowledge contained in the labeled images used for training. The results reported are a measure about how similar are

0.2

0.15 0 1 Support Vectors

0.15

0.05 Second PLS score

Second PLS score

0.1 0.05 0 −0.05

0 −0.05 −0.1

−0.1

−0.15

−0.15

−0.2

−0.2 −0.15

0 1 Support Vectors

0.1

−0.1

−0.05

0

0.05

First PLS score

(a)

0.1

0.15

0.2

−0.25 −0.15

−0.1

−0.05

0

0.05

0.1

0.15

0.2

First PLS score

(b)

F IG . 5. Hyperplanes of decision defined by the SVM classifier with RBF kernel when only the first two PLS scores are used as features. (a) classification of left hemispheres, (b) classification of right hemispheres. Medical Physics, Vol. 39, No. 7, July 2012

4401

Segovia et al.: Improved Parkinsonism diagnosis using a partial least squares based approach

4401

1

0.95

0.9

0.85

Sensitivity

0.8

0.75 0.7

0.65

0.6

PLS + SVM (proposed approach) (AUC=0.96831) PLS + SVM (only left hemisphere) (AUC=0.9617) PLS + SVM (only right hemisphere) (AUC=0.95515) PLS + SVM (without division into hemispheres) (AUC=0.96125) Baseline (AUC=0.91837)

0.55

0.5

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

1 − Specificity F IG . 6. ROC curves obtained by the proposed approach and the baseline. It is shown only the upper left quarter in order to better appreciate the differences.

the diagnosis given by a CAD system and the given one by a clinician team. Thus, if the training set is large enough and assuming the labels are correct (gold standard), a CAD system for PS based on the proposed methodology provides a diagnosis as reliable as the one provided by a clinician team but including the robustness of the SVM based machine learning. All the classification results shown in this work were achieved following a LOO CV scheme. CV is an effective method for estimating the actual risk of the classifier,23, 24 however, the error estimated that way could be biased if the steps of the algorithm, such as classifier parameter tuning, feature selection or classifier type selection are left out during each CV loop.25 In this sense, this principle is satisfied ensuring that: (i) the data used to test the classifier are not part of the data used to train it and (ii) the inclusion of the feature selection in a nested CV loop as shown in Sec. IV. V.A. Receiver operating characteristic (ROC) analysis

A ROC curve is a plot of the trade-off achieved between sensitivity and specificity for a classification procedure. Each

−140 −120 −100

−80

−60

−40

−20

0

20

point on the ROC curve represents a sensitivity/specificity pair corresponding to a particular decision threshold. The optimal solution is located in the upper left corner and corresponds to a sensitivity and specificity of 100%. Therefore, the closer the ROC curve is to the upper left corner, the higher the overall accuracy of the procedure.26 A value to measure this accuracy is provided by the area under the curve (AUC). Figure 6 shows a ROC curve for the proposed PLS and SVM based approach and compares it with the ones for the two intermediate classification processes (one per hemisphere) and with the one obtained if the brain is treated as a whole (without division into hemispheres). In addition, a curve for the baseline approach is shown. The two curves for intermediate one-hemisphere classification achieve similar accuracy, which suggests that the two hemispheres are equally appropriate to diagnose the disease. When the two hemispheres are treated as a whole, the results are better, however, they are not as good as possible because the cases with nonbilateral disease are not correctly diagnosed. Finally, the proposed method, described in Sec. III.B, also takes into account the information from both hemispheres but separately

−30

−20

−10

0

10

F IG . 7. Central slices (from 7 to 12 of 23) of the first (left) and the second (right) PLS-brain. Medical Physics, Vol. 39, No. 7, July 2012

20

30

40

4402

Segovia et al.: Improved Parkinsonism diagnosis using a partial least squares based approach

Control image

Pathological image

Control image (reconstructed)

Pathological image (reconstructed)

4402

F IG . 8. Central slices of a control image (left column) and a PS image (right) compared with same slices reconstructed after PLS decomposition but using only the two first scores and loadings.

and achieves a more accurate ROC curve that significantly improves the baseline. V.B. PLS-brains

Similar to the concept of eigenfaces,27 it is possible to define the concept of eigenbrains or, more precisely, PLS-brains (please see Fig. 7). As it has been shown in Sec. II.A, the PLS algorithm decomposes the data in two matrices known as scores and loadings [see Eq. (1)]. The latter would be viewed as elementary brain images and the former would represent the quantity of loadings used for building a specific image. Since the decomposition is performed so that the majority of the covariance is gathered in the first PLS-brains, the first scores are enough for classification. In addition, a given image may be reconstructed from only the first scores and loadings getting a filtered image (without low covariance information). Figure 8 shows two images from the database (one control on the left and one PS image on the right) and the same images reconstructed from their two first scores and loadings. Observe that it is easier to notice the differences between the control and the PS image using the reconstructed images. Note the difference in the shape of the high-intensity areas in the striatum. This image uses only two scores and loading for reconstruction because this number of components allows to achieve the highest accuracy rate as shown in Fig. 3. It is worth noting that a reconstruction using all the components leads to obtain virtually the original images.

have been visually labeled by experienced clinicians as control (94 images) and PS (95 images). The experiments performed yielded an accuracy rate over 94%, outperforming other previous approaches. ACKNOWLEDGMENTS This work was partly supported by the Ministerio de Ciencia e Innovación (Spain) (MICINN) under the TEC200802113 project and the Consejería de Innovación, Ciencia y Empresa (Junta de Andalucía, Spain) under the Excellence Projects P07-TIC-02566, P09-TIC-4530, and P11-TIC-7103. APPENDIX: SIMPLS ALGORITHM The SIMPLS algorithm28 was proposed by Sijmen de Jong in 1993 as an alternative to the NIPALS algorithm for PLS. The main difference between the initial algorithm, NIPALS, and SIMPLS is the kind of deflation: In the latter, no deflation of the centered data matrices X and Y is made, but the deflation is carried out for the cross-product matrix S = XT Y between the x-data and y-data.29 Pseudocode of the SIMPLS algorithm is shown in algorithm 3. A LGORITHM III. SIMPLS (Ref. 29) Input: X and Y Output: W and T S0 = XT Y

VI. CONCLUSIONS In this work, an improved approach to develop a fully automatic CAD system for Parkinsonism based on 123 I-ioflupane SPECT images has been presented. The proposed methodology independently analyzes each hemisphere of the brain with the purpose of properly diagnosing early stages of the disease when only one side of the brain is affected.17, 18 First, the brain volumes were masked to extract high-intensity voxels, usually located in the striatum area. Then, a reduced score vector was obtained per volume using a PLS algorithm. Finally, the state (normal or pathological) of a given image was determined by means of a SVM classifier. In order to evaluate this approach, a database consisting of 189 123 I-ioflupane SPECT images was used. The images Medical Physics, Vol. 39, No. 7, July 2012

for j = 1 to n do if j = 1 then Sj = S0 ; else Sj = Sj −1 − Pj −1 (PTj−1 Pj −1 )−1 PTj−1 Sj −1 ; Compute wj as the first singular vector of Sj ; wj =

wj wj ;

tj = Xwj ; tj tj ; pj = XTj tj ;

tj =

Pj = [p1 , p2 , . . . , pj −1 ]; Use wj as the jth column of W; Use tj as the jth column of T; end

4403

Segovia et al.: Improved Parkinsonism diagnosis using a partial least squares based approach

a) Author

to whom correspondence should be addressed. Electronic mail: [email protected] 1 J. Booij, G. Tissingh, G. J. Boer, J. D. Speelman, J. C. Stoof, A. G. Janssen, E. C. Wolters, and E. A. van Royen, “[123I]FP-CIT SPECT shows a pronounced decline of striatal dopamine transporter labelling in early and advanced Parkinson’s disease,” J. Neurol., Neurosurg. Psychiatry 62, 133– 140 (1997). 2 J. Booij, J. B. Habraken, P. Bergmans, G. Tissingh, A. Winogrodzka, E. C. Wolters, A. G. Janssen, J. C. Stoof, and E. A. van Royen, “Imaging of dopamine transporters with iodine-123-FP-CIT SPECT in healthy controls and patients with Parkinson’s disease,” J. Nucl. Med. 39, 1879–1884 (1998). 3 A. Winogrodzka, P. Bergmans, J. Booij, E. A. van Royen, J. C. Stoof, and E. C. Wolters, “[123I]beta-CIT SPECT is a useful method for monitoring dopaminergic degeneration in early stage Parkinson’s disease,” J. Neurol., Neurosurg. Psychiatry 74, 294–298 (2003). 4 S. J. O. Lozano, M. D. M. del Valle Torres, J. M. Jiménez-Hoyuela García, A. L. G. Cardo, and V. C. Arillo, “Diagnostic accuracy of FP-CIT SPECT in patients with Parkinsonism,” Rev. Esp. Med. Nucl. (English Ed.) 26, 277–285 (2007). 5 S. J. O. Lozano, M. D. M. del Valle Torres, E. R. Moreno, S. S. Viedma, T. A. Raissouni, and J. M. Jiménez-Hoyuela, “Quantitative evaluation of SPECT with FP-CIT. Importance of the reference area,” Rev. Esp. Med. Nucl. (English Ed.) 29, 246–250 (2010). 6 L. Tossici-Bolt, S. Hoffmann, P. Kemp, R. Mehta, and J. Fleming, “Quantification of [123I]FP-CIT SPECT brain images: an accurate technique for measurement of the specific binding ratio,” Eur. J. Nucl. Med. Mol. Imaging 33, 1491–1499 (2006). 7 D. J. Towey, P. G. Bain, and K. S. Nijran, “Automatic classification of 123IFP-CIT (DaTSCAN) SPECT images,” Nucl. Med. Commun. 32, 699–707 (2011). 8 V. N. Vapnik, Statistical Learning Theory (Wiley, New York, 1998) . 9 J. M. Górriz, F. Segovia, J. Ramírez, A. Lassl, and D. Salas-Gonzalez, “GMM based SPECT image classification for the diagnosis of Alzheimer’s disease,” Appl. Soft Comput. 11, 2313–2325 (2011). 10 C. C. Tang, K. L. Poston, T. Eckert, A. Feigin, S. Frucht, M. Gudesblatt, V. Dhawan, M. Lesser, J. P. Vonsattel, S. Fahn, and D. Eidelberg, “Differential diagnosis of Parkinsonism: A metabolic imaging study using pattern analysis,” Lancet Neurol. 9, 149–158 (2010). 11 W. Koch, P. E. Radau, C. Hamann, and K. Tatsch, “Clinical testing of an optimized software solution for an automated, observer-independent evaluation of dopamine transporter SPECT studies,” J. Nucl. Med. 46, 1109– 1118 (2005). 12 R. J. Morton, M. J. Guy, R. Clauss, P. J. Hinton, C. A. Marshall, and E. A. Clarke, “Comparison of different methods of DatSCAN quantification,” Nucl. Med. Commun. 26, 1139–1146 (2005).

Medical Physics, Vol. 39, No. 7, July 2012

13 S.

4403

Wold, H. Ruhe, H. Wold, and W. Dunn III, “The collinearity problem in linear regression. The partial least squares (PLS) approach to generalized inverse,” J. Sci. Stat. Comput. 5, 735–743 (1984). 14 Statistical Parametric Mapping: The Analysis of Functional Brain Images, edited by K. Friston et al. (Academic, New York, 2007). 15 R. P. Woods, “Spatial transformation models,” in Handbook of Medical Imaging, edited by I. N. Bankman (Academic, San Diego, 2000), Chap. 29, pp. 465–490. 16 J. Ashburner and K. J. Friston, “Nonlinear spatial normalization using basis functions,” Hum. Brain Mapp. 7, 254–266 (1999). 17 R. Djaldetti, I. Ziv, and E. Melamed, “The mystery of motor asymmetry in Parkinson’s disease,” Lancet Neurol. 5, 796–802 (2006). 18 M. Hoehn and M. Yahr, “Parkinsonism: Onset, progression and mortality” Neurology 17, 427–442 (1967). 19 Y. Lee, J. B. Seo, J. G. Lee, S. S. Kim, N. Kim, and S. H. Kang, “Performance testing of several classifiers for differentiating obstructive lung diseases based on texture analysis at high-resolution computerized tomography (HRCT),” Comput. Methods Programs Biomed. 93, 206–215 (2009). 20 M. López, J. Ramírez, J. M. Górriz, I. Álvarez, D. Salas-Gonzalez, F. Segovia, and R. Chaves, “SVM-based CAD system for early detection of the Alzheimer’s disease using kernel PCA and LDA,” Neurosci. Lett. 464, 233–238 (2009). 21 D. Salas-Gonzalez, J. M. Górriz, J. Ramírez, I. A. Illan, M. López, F. Segovia, R. Chaves, P. Padilla, and C. G. Puntonet, and Alzheimer’s Disease Neuroimage Initiative (ADNI), “Feature selection using factor analysis for Alzheimer’s diagnosis using F-FDG PET images,” Med. Phys. 37(suppl. 18), 6084–6095 (2010). 22 R. P. W. Duin, “Classifiers in almost empty spaces,” in Proceedings of the 15th International Conference on Pattern Recognition (IEEE, New York, 2000), Vol. 2, pp. 1–7. 23 C. Cortes and V. Vapnik, “Support-vector networks,” Mach. Learn. 20, 273–297 (1995). 24 K. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, “An introduction to kernel-based learning algorithms,” IEEE Trans. Neural Networks 12, 181–201 (2001). 25 S. Varma and R. Simon, “Bias in error estimation when using crossvalidation for model selection,” BMC Bioinf. 7, 91 (2006). 26 M. H. Zweig and G. Campbell, “Receiver-operating characteristic (ROC) plots: A fundamental evaluation tool in clinical medicine,” Clin. Chem. 39, 561–77 (1993). 27 M. Turk and A. Pentland, “Eigenfaces for recognition,” J. Cogn. Neurosci. 3, 71–86 (1991). 28 S. de Jong, “SIMPLS: An alternative approach to partial least squares regression,” Chemom. Intell. Lab. Syst. 18, 251–263 (1993). 29 K. Varmuza and P. Filzmoser, Introduction to Multivariate Statistical Analysis in Chemometrics (Taylor & Francis, Boca Raton, FL, 2009).