
MULTI-KERNEL SVM BASED CLASSIFICATION FOR BRAIN TUMOR SEGMENTATION OF MRI MULTI-SEQUENCE

Nan Zhang1,2,3, Su Ruan1, Stéphane Lebonvallet1, Qingmin Liao2, Yuemin Zhu3

1 CReSTIC, IUT de Troyes, 10026, France
2 Department of Electronic Engineering, Tsinghua University, 100084, China
3 CREATIS, INSA de Lyon, 69621, France
[email protected], [email protected]

ABSTRACT

In this paper, a multi-kernel SVM (Support Vector Machine) classification, integrated with a fusion process, is proposed to segment brain tumors from multi-sequence MRI images (T2, PD, FLAIR). The objective is to quantify the evolution of a tumor during a therapeutic treatment. A manual learning process about the tumor is carried out only on the first MRI examination. For the follow-up examinations, the learning is adapted automatically and the tumor is delineated without further manual input. Our method consists of two steps. The first classifies the tumor region using a multi-kernel SVM that operates on the multiple image sources and produces one result per source. The second refines the contour of the tumor region using both distance and maximum likelihood measures. Our method has been tested on real patient images. The quantitative evaluation shows the effectiveness of the proposed method.

Index Terms: brain tumor segmentation, follow-up, fusion, multi-kernel SVM.

1. INTRODUCTION

With a high risk of death, a brain tumor is a serious and malignant disease. MRI (Magnetic Resonance Imaging) is a highly sensitive and successful imaging modality for diagnosing tumors [1,2]. With different excitation sequences, MRI yields different types of images carrying different information. Fusing multi-spectral MRI sources is therefore important for good diagnosis and prognosis. Because an MRI examination produces a large amount of high-dimensional 3D data, an automatic system that processes the data and segments the tumor from multiple sources is needed to reduce both human error and workload. This remains a great challenge.

Many methods have been proposed to solve this problem. Classic methods concentrate on clustering or morphological operations, but their results are not satisfactory.
978-1-4244-5654-3/09/$26.00 ©2009 IEEE

A hybrid deformable model method, composed of several deformable models such as a shape model, a texture integration model and a graphical model, is proposed in [3]. Texture analysis is another effective approach, but the number of extracted features should be reduced and the sensitivity of the algorithms must be improved for general tumor cases [4]. A structural analysis that extracts features from both tumorous and normal tissues is studied in [5]. To discriminate all the tissues present, such as edema, white matter, grey matter and cerebrospinal fluid, a probability map is proposed in [6], which can also predict tumor growth.

SVM (Support Vector Machine), a kernel-based method for supervised classification problems, has been proven successful [2,7]. Its theory and method are described in detail in [7]. Single-kernel SVM has been widely used in data analysis, including tumor segmentation, mostly owing to its strong classification ability [6-8]. As introduced in [9], SVM is particularly suited to high-dimensional data from multiple sources, and its kernels are crucial for incorporating a priori knowledge of the application. Recent developments have shown the benefit of multi-kernel SVM [10,11]. Multi-kernel SVM exploits the particular characteristics of each source and offers more freedom to choose suitable kernels, or a weighted combination of them, especially for data from multiple heterogeneous sources. In the learning process described in [10], canonical correlation analysis (CCA) is applied to the data in all the feature spaces mapped from the input sources by the multiple kernels, where each kernel corresponds to one feature space. A modification of the Ho-Kashyap algorithm based on a squared approximation of the misclassification errors (MHKS) is then used to fuse the multiple results. In [11], a two-step algorithm is proposed in which the learning first determines the parameters of each kernel and then the parameters of the multi-kernel combination.

In this paper, a multi-kernel SVM integrated with feature selection and a fusion process is proposed to segment the brain tumor from multi-sequence MRI images.
It reduces the amount of data to be processed, speeds up the process and ultimately improves the segmentation accuracy.

ICIP 2009

2. METHOD

2.1. Overview of the segmentation system

The framework of the segmentation system is shown in Fig. 1. The main tumor segmentation step is based on multi-kernel SVM. Three MRI sequences, T2, PD (Proton Density) and FLAIR (Fluid Attenuated Inversion Recovery), are the input data.

Fig. 1. Framework of the proposed system. Blocks: image sources; feature selection of each source; multi-kernel based classification; tumor class; refinement by region growing; final tumor; adaptive training.

All the data sets are first registered to the images of the first examination. Equal numbers of points inside and outside the tumor region in one FLAIR slice are chosen randomly as learning samples. The features of each sample point are then selected with PCA (Principal Component Analysis) for each of the three volumes. Each source has its own SVM model with its own kernel parameter. This manual learning step is carried out only once, at the beginning of the whole therapeutic process. The three volumes are then classified by their respective SVM classifiers, giving multiple results that capture the characteristics of the tumor in the different sequences. A tumor class is finally obtained by applying a logical AND to these results. The kernel parameter obtained for each source is reused for that source in the following examinations.

However, owing to noise or low contrast near the boundary of the tumor region, some points belonging to the tumor may be missed. To recover these points, a supplementary step based on the contour of the tumor region is proposed. The characteristics of the tumor are measured in the tumor class. A region growing process is then performed on the boundary points, based on both the distance to the contour and the likelihood measures.

New images from later examinations are always registered to those of the first MRI examination. The tumor region obtained at the previous examination is projected onto the images of the new one to initialize the zone of interest, from which the same number of sample points as in the first examination are selected adaptively, so that a new training process on the new sources can be carried out automatically. As the patient is examined about every four months, the size and location of the tumor region cannot change very much, unlike the intensity.
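The logical AND fusion of the per-sequence classification results mentioned in this overview can be illustrated with a minimal sketch. It assumes each classifier output is already a binary mask of the same size; the function name `fuse_and` and the toy masks are hypothetical and are not taken from the original system.

```python
from typing import List

Mask = List[List[bool]]


def fuse_and(masks: List[Mask]) -> Mask:
    """Pixel-wise logical AND of equally sized binary masks."""
    if not masks:
        raise ValueError("at least one mask is required")
    rows, cols = len(masks[0]), len(masks[0][0])
    for mask in masks:
        if len(mask) != rows or any(len(row) != cols for row in mask):
            raise ValueError("all masks must have the same shape")
    # A pixel is kept only if every source labels it as tumor.
    return [[all(mask[r][c] for mask in masks) for c in range(cols)]
            for r in range(rows)]


# Toy 2x3 classification results for the three sequences (made-up values).
t2_mask = [[True, True, False], [False, True, True]]
pd_mask = [[True, False, False], [False, True, True]]
flair_mask = [[True, True, True], [False, True, False]]

tumor_class = fuse_and([t2_mask, pd_mask, flair_mask])
```

Because a pixel survives only when all three sources agree, the fused tumor class tends to be conservative, which is consistent with the later contour refinement step that recovers missed boundary points.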
A region erosion is performed before the projection to ensure the performance of the segmentation system.

2.2. Feature selection

Feature selection is an important pre-processing step for dimension reduction and classification of multiple input data. It not only avoids the curse of dimensionality but also improves the classification performance by removing irrelevant inputs. PCA, a classic feature selector, has proven to be very powerful and suited to a wide range of applications [12]. To build our multi-kernel SVM framework, PCA is used here to select features from a vector extracted from a small window centered at each sample point. Each feature vector is composed of the intensity values and seven texture parameters (mean value, standard deviation, geometric mean, harmonic mean, median absolute deviation based on mean value, skewness, and kurtosis) of each volume. All the feature vectors of each volume are arranged in the same order to form a training matrix, and the eigenvector matrix obtained at the same time is used to project all the test points into the feature space.

Tab. 1 shows the influence of the window size on the number of features for the same learning samples (shown for the FLAIR volume; T2 and PD behave in the same way). "No." denotes the number of selected features. According to Tab. 1, the number of the most representative features no longer changes from 11×11 onward. Considering both the need to retain all the main features and the computational complexity of the system, the window size is set to 11×11 pixels. With PCA, the amount of data to be processed is reduced by more than eighty percent and the time consumption by more than fifty percent.

Tab. 1. Influence of the window size on the number of features

size   5×5   7×7   9×9   11×11   13×13   15×15
No.    7     9     9     10      10      10
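As an illustration of how the raw feature vector of one sample point could be assembled before PCA, the following sketch collects the window intensities and seven texture parameters. Several points are assumptions rather than details given in the text: the deviation-based parameter is computed as the mean absolute deviation about the mean, skewness and kurtosis use population moments, intensities are assumed strictly positive (required by the geometric and harmonic means), and all function names are hypothetical. The PCA projection itself is not shown.

```python
import statistics
from typing import List, Sequence


def extract_window(image: Sequence[Sequence[float]], row: int, col: int,
                   size: int) -> List[float]:
    """Return the size x size neighbourhood centered at (row, col), row by row."""
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    half = size // 2
    if not (half <= row < len(image) - half
            and half <= col < len(image[0]) - half):
        raise ValueError("window does not fit inside the image")
    return [float(image[r][c])
            for r in range(row - half, row + half + 1)
            for c in range(col - half, col + half + 1)]


def texture_parameters(values: Sequence[float]) -> List[float]:
    """Seven texture parameters of a window (population moments, assumed)."""
    n = len(values)
    mean = statistics.fmean(values)
    centered = [v - mean for v in values]
    m2 = sum(d ** 2 for d in centered) / n
    m3 = sum(d ** 3 for d in centered) / n
    m4 = sum(d ** 4 for d in centered) / n
    abs_dev = sum(abs(d) for d in centered) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    kurtosis = m4 / m2 ** 2 if m2 > 0 else 0.0
    return [mean, m2 ** 0.5, statistics.geometric_mean(values),
            statistics.harmonic_mean(values), abs_dev, skewness, kurtosis]


def feature_vector(image: Sequence[Sequence[float]], row: int, col: int,
                   size: int = 11) -> List[float]:
    """Window intensities followed by the seven texture parameters."""
    window = extract_window(image, row, col, size)
    return window + texture_parameters(window)


# Toy 3x3 image, so a 3x3 window is used instead of the 11x11 of the paper.
toy = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
fv = feature_vector(toy, 1, 1, size=3)
```

With an 11×11 window this layout would give 121 intensities plus 7 texture values per volume, which is the kind of vector that PCA then reduces to the small number of features reported in Tab. 1.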

2.3. Multi-kernel SVM

Basically, a multi-kernel SVM can handle multiple sources in two ways. The first uses one kernel for all the sources. In this case, the fusion is twofold: of the input data, and of the results from repeated classifications. In the second, each input has its own kernel and yields its own result, and the fusion of all the results gives the final one. The kernels can be of different types or of the same type (in which case they are distinguished by different parameter values). In our application, each MRI input provides its own specific information, so a single kernel cannot reflect the characteristics of all the data sources. Moreover, a multi-kernel SVM broadens the range of kernels that can be used and improves the result estimation. For these reasons, the second type of multi-kernel SVM is used in our work. The main idea is to obtain the tumor class as the intersection of the best classified regions, each produced with the best kernel parameter of its volume. The kernel function is defined as follows:

$$K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle \qquad (1)$$

where $x_i$, $x_j$ are input vectors and $\Phi$ is a map that transforms the source data from the input space to the feature space. The kernel function can take different forms, such as linear, polynomial or Gaussian. In (1), the scalar value returned by the kernel function equals the dot product of the transformed vectors in the feature space. The kernel function can therefore, to some degree, measure the distance or describe the similarity between two high-dimensional feature vectors.
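The statement that a kernel value equals a dot product in the feature space can be checked numerically on a small case. The sketch below uses a homogeneous degree-2 polynomial kernel on 2-D inputs, for which an explicit map is known, and a Gaussian kernel to show the similarity interpretation. The specific kernel choice, the test vectors and the function names are illustrative assumptions, not part of the original method.

```python
import math
from typing import Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Homogeneous polynomial kernel of degree 2: K(x, y) = (x . y)^2."""
    return dot(x, y) ** 2


def poly2_feature_map(x: Tuple[float, float]) -> Tuple[float, float, float]:
    """Explicit map Phi for 2-D inputs with <Phi(x), Phi(y)> = (x . y)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def gaussian_kernel(x: Sequence[float], y: Sequence[float],
                    sigma: float) -> float:
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


x, y = (1.0, 2.0), (3.0, -1.0)

# Kernel evaluated in the input space ...
kernel_value = poly2_kernel(x, y)
# ... and the same quantity as a dot product in the feature space.
feature_space_value = dot(poly2_feature_map(x), poly2_feature_map(y))

# Gaussian kernel as a similarity: 1 for identical points, and a smaller
# sigma makes the similarity of distant points fall off faster.
same = gaussian_kernel(x, x, sigma=0.5)
narrow = gaussian_kernel(x, y, sigma=0.5)
wide = gaussian_kernel(x, y, sigma=1.5)
```

For the Gaussian kernel the feature space is infinite-dimensional, so the map cannot be written out as it is here; only the kernel value itself is ever computed, which is what makes the kernel formulation practical.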


The decision function for the classification of each volume with its best parameter set has the form:

$$f_n(x) = \sum_{m=1}^{M} \alpha_m y_m K_n(x, x_m) + b \qquad (2)$$

$\{\alpha_m\}$ is a series of weights and $y_m$ is the label of sample $x_m$. $M$ sample points and the $n$th kernel (the most effective one) are used in the learning. $b$ is a constant coefficient. In a two-class multi-kernel segmentation problem, combining equation (2) with the number of sources $N$, the final decision function is defined as:

$$f(x) = \sum_{n=1}^{N} \beta_n f_n(x) + b_t \qquad (3)$$

where $f_n(x)$ is the decision function defined in equation (2) for a given volume, $x$ is the input data to be classified, $\{\beta_n\}$ is another series of weights expressing the contribution of each volume to the final result, and $b_t$ is another scalar coefficient similar to $b$. Determining the decision function is therefore equivalent to obtaining all the parameter values of (3) by learning. The smaller $\beta_n$ is, the less $f_n(x)$ contributes to the final decision. When $\beta_n = 0$, the corresponding $f_n(x)$ does not influence the classification. A family of Gaussian kernel functions is chosen to form the multi-kernel in our work:

$$K_q(x, x_i) = \exp\left(-\frac{\|x - x_i\|^2}{2\sigma_q^2}\right) \qquad (4)$$

where $q$ indexes the kernels and $\sigma_q$ is the corresponding standard deviation of each kernel. Different values of $\sigma$ give different kernels.

2.4. Region growing

Starting from the tumor class obtained in the first step, the points possibly missed outside the contour can be determined using both the distance (between the point and the contour of the tumor region) and the maximum likelihood measures. Suppose that the intensity distribution of the tumor region follows a normal distribution $N(\mu, \sigma)$. The likelihood of a point $p$ in each volume is measured as:

$$l_i(p) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[-\frac{(x_i - \mu_i)^2}{2\sigma_i^2}\right] \qquad (5)$$

In (5), $i$ denotes a single volume, $\mu_i$ and $\sigma_i$ are the corresponding mean and standard deviation, and $x_i$ is the intensity of point $p$. The criterion for deciding that this point belongs to the tumor is defined as:

$$\sum_{i=1}^{3} l_i(p) > th \;\Rightarrow\; p \in \text{tumor} \qquad (6)$$

where $th$ is a threshold given by:

$$th = [1 + 0.1 \times (d - 1)] \times c \qquad (7)$$

$d$ denotes the distance between the point $p$ and the contour; it increases by one with each morphological dilation. $c$ is a constant fixed experimentally; in our work its value is 0.06. The next dilation starts from the new contour. Dilation is repeated


continuously until the contour no longer changes, which gives the final tumor result.

3. EXPERIMENT

In our study, the MRI images were acquired on a 1.5T GE (General Electric Co.) machine using an axial FSE (Fast Spin Echo) T2-weighted sequence, an axial FSE PD-weighted sequence and an axial FLAIR sequence. Each T2, PD and FLAIR volume contains 24 slices, with a voxel size of 0.47×0.47×5.5 mm3. All the images were registered to the first examination before feature selection. For the learning in the first examination, 60 samples inside and outside the tumor are chosen randomly in one FLAIR slice. Once the optimal tumor class is obtained, it is automatically adapted to segment the tumor in the following examinations.

A family of Gaussian kernel functions with a different standard deviation for each source is selected to form the multi-kernel. The parameter range that maintains the performance of the SVM is from 0.01 to 1.5. In our tests the standard deviation takes the values 0.01, 0.05, 0.1, 0.5, 1, 1.1, 1.2, 1.3, 1.4 and 1.5, with unequal steps. In our experiments, the tumor class changes little for small parameter values but changes considerably once the parameter exceeds 1. This parameter set is suitable for all the cases in our experiments, and the parameter does not change much from case to case. Each parameter value leads to one result for the tumor region. The best one is selected based on quantitative measurements of SVM performance (such as the total number of support vectors and the accuracy on the test sample points) and on comparison with the ground truth; the parameter obtained in the learning process is then automatically adapted in the follow-up examinations. The same procedure is repeated for T2 and PD with samples at the same locations as in FLAIR, and the intersection of the best results of the three volumes determines the tumor class region.
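The contour refinement of Section 2.4, which is applied to the fused tumor class at this point of the pipeline, can be sketched as follows. This is a minimal 2-D illustration under several assumptions that the text does not fix: dilation uses 4-connected neighbours, d counts the dilations performed since the initial tumor class, the threshold is read as th = [1 + 0.1 × (d − 1)] × c, and the tumor mean and standard deviation are estimated once from the initial tumor class. All names and the toy intensities are hypothetical.

```python
import math
from typing import Iterator, List, Set, Tuple

Pixel = Tuple[int, int]
Image = List[List[float]]


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density used as the per-volume likelihood of a point."""
    return (math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2))
            / (math.sqrt(2.0 * math.pi) * sigma))


def tumor_statistics(volume: Image, region: Set[Pixel]) -> Tuple[float, float]:
    """Mean and standard deviation of the intensities inside the region."""
    values = [volume[r][c] for r, c in region]
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return mu, max(sigma, 1e-6)  # guard against a zero spread


def neighbours(p: Pixel, rows: int, cols: int) -> Iterator[Pixel]:
    """4-connected neighbours of p that lie inside the image."""
    r, c = p
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < rows and 0 <= cc < cols:
            yield (rr, cc)


def grow_region(volumes: List[Image], seed: Set[Pixel],
                c: float = 0.06) -> Set[Pixel]:
    """Dilate the region one pixel at a time, keeping likely tumor points."""
    if not seed:
        raise ValueError("the initial tumor class must not be empty")
    rows, cols = len(volumes[0]), len(volumes[0][0])
    stats = [tumor_statistics(v, seed) for v in volumes]
    region = set(seed)
    d = 1
    while True:
        th = (1.0 + 0.1 * (d - 1)) * c  # threshold grows with distance
        ring = {q for p in region for q in neighbours(p, rows, cols)} - region
        accepted = {
            q for q in ring
            if sum(normal_pdf(v[q[0]][q[1]], mu, sigma)
                   for v, (mu, sigma) in zip(volumes, stats)) > th
        }
        if not accepted:  # the contour no longer changes
            return region
        region |= accepted
        d += 1


# Toy one-row images for the three sequences (made-up intensities).
flair = [[100.0, 102.0, 98.0, 101.0, 60.0, 99.0]]
t2 = [[200.0, 204.0, 196.0, 202.0, 120.0, 198.0]]
pd = [[50.0, 51.0, 49.0, 50.5, 30.0, 49.5]]
initial = {(0, 0), (0, 1), (0, 2)}

grown = grow_region([flair, t2, pd], initial)
```

In this toy case the fourth pixel is added because its intensities are close to the tumor statistics in all three images, while growth stops at the fifth pixel, so the tumor-like sixth pixel is never reached. Note that the normal density depends on the intensity scale, so the value c = 0.06 is only meaningful for the intensity range used in the original experiments.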
Region growing is then used to obtain the results for each of the three volumes, according to the specific knowledge and characteristics of each one. Fig. 2 shows the tumor class after fusion (shown in a FLAIR slice) and the final results of the three volumes after region growing. This tumor is more complex than the second one, with an irregular contour, and our algorithm handles this situation well. In the experiments on the first patient, the best parameter σ is 0.05 for FLAIR, 0.1 for T2 (twice that of FLAIR) and 0.5 for PD (ten times). The parameter for FLAIR is the smallest because FLAIR slices have a higher contrast and are easier to classify than those of T2 and PD. A larger parameter gives a weaker classifier for non-constrictive, low-contrast data. Fig. 3 gives the original slices and the final results for another patient, in the FLAIR volume only for visual convenience. It also demonstrates the effectiveness of the region growing step, which refines the final contour of the tumor (Fig. 3.b). The two final contours superimposed on the original images are shown in Fig. 3.c. Both results were confirmed by the experts.

Fig. 2. Tumor class after fusion (FLAIR) (a) and final results after region growing: (b) T2; (c) PD; (d) FLAIR.

Fig. 3. Original images (FLAIR) of the second patient (a); segmentation results, in which gray points denote the tumor class region and white points are derived from the region growing (b); final results (c).

A quantitative evaluation of the proposed method is carried out by comparison with a ground truth and is shown in Tab. 2. True positive (TP), false positive (FP) and false negative (FN) rates are used as quantitative measures of the segmentation accuracy. The total error is the sum of FN and FP. Tab. 2 also evaluates the region growing step. The comparison with a single-kernel SVM fusion [2] under the same conditions (Tab. 2) shows that our method reduces the total error and improves the accuracy, although TP is almost the same as for the single-kernel SVM. Our system has been used to follow up the tumor shown in Fig. 3 for more than one year. Tab. 3 gives the percentage reduction of the tumor volume during the therapeutic procedure. Pn (n = 1 to 5) denotes the nth examination. The examination intervals are about 4 months. The follow-up shows that the tumor decreases during the treatment of this patient.

Tab. 2. Accuracy evaluation (average of the three volumes)

Measurement       TP      FP      FN     Total error
Before dilation   93.9%   8.8%    6.1%   14.9%
Multi-kernel      98.9%   4.5%    3.1%   7.6%
Single-kernel     98.6%   11.6%   1.4%   13.0%

Tab. 3. Percentage reduction of the tumor during a therapy procedure

Period           P1~P2   P2~P3   P3~P4   P4~P5
Percentage (%)   7.40    13.06   2.39    4.29

4. CONCLUSION

A multi-kernel SVM that fuses data from different sources is proposed in this paper. The method handles multi-input data, addresses the challenging tumor segmentation problem and obtains accurate results. Compared with the traditional single-kernel SVM, it improves the results by using “good” parameters for each data source. Our future work is to study other fusion methods that integrate multiple sources efficiently, and to validate the whole process on a larger database with different patients in different therapeutic periods.

ACKNOWLEDGEMENT

The authors would like to thank Doctor J.M. Constans (CHU de Caen) very much for providing the MRI data used in this study.

5. REFERENCES

[1] J. Levman, et al., “Classification of dynamic contrast-enhanced magnetic resonance breast lesions by support vector machines”, IEEE Trans. Med. Imaging, Vol. 27 (5), pp. 688-696, 2008.
[2] S. Ruan, et al., “Tumor segmentation from a multi-spectral MRI images by using support vector machine classification”, ISBI 2007, pp. 1236-1239, 2007.
[3] D. Metaxas, et al., “Hybrid deformable models for medical segmentation and registration”, ICARCV 2006, pp. 5-8, 2006.
[4] K. Iftekharuddin, et al., “Brain tumor detection in MRI: technique and statistical validation”, ACSSC 2006, pp. 1983-1987.
[5] X. Xuan, et al., “Statistical structure analysis in MRI brain tumor segmentation”, ICIG 2007, pp. 421-426, 2007.
[6] H. Cai, et al., “Probabilistic segmentation of brain tumors based on multi-modality magnetic resonance images”, ISBI 2007, pp. 600-603, 2007.
[7] V. Vapnik, “The Nature of Statistical Learning Theory”, Springer, 1995.
[8] J. Zhou, et al., “Extraction of brain tumor from MR images using one-class SVM”, EMBS 2005, pp. 6411-6414, 2005.
[9] M. Filippone, et al., “A survey of kernel and spectral methods for clustering”, Pattern Recognition (41), pp. 176-190, 2008.
[10] Z. Wang, et al., “MultiK-MHKS: a novel multiple kernel learning algorithm”, IEEE Trans. PAMI (30), pp. 348-353, 2008.
[11] A. Rakotomamonjy, F. Bach, et al., “More efficiency in multiple kernel learning”, ICML 2007, pp. 775-782, 2007.
[12] I. T. Jolliffe, “Principal Component Analysis”, Springer, 1986.
