Domain Adaptation Methods for ECG Classification - IEEE Xplore

Yakoub Bazi, Naif Alajlan, Haikel AlHichri, and Salim Malek
Advanced Lab for Intelligent Systems Research
College of Computer and Information Sciences
King Saud University, Riyadh, Saudi Arabia
ybazi,najlan,hhichri,[email protected]

Abstract— The detection and classification of heart arrhythmias using electrocardiogram (ECG) signals has been an active area of research. Usually, to assess the effectiveness of a proposed classification method, training and test data are extracted from the same ECG records. In real scenarios, however, test data may come from different records. In this case, the classification results may be less accurate because of the statistical shift between the training and test samples. To address this issue, we investigate the capabilities of two domain adaptation methods recently proposed in the machine learning literature. The first is the domain transfer SVM, and the second is the importance weighted kernel logistic regression method. The MIT-BIH arrhythmia database is used in the experiments to assess the effectiveness of both methods.

Keywords— Electrocardiogram; Arrhythmias; Domain Transfer; Maximum Mean Discrepancy

I. INTRODUCTION

The electrocardiogram (ECG) provides an effective and practical way to analyze the condition of the heart. In recent years, computer-based analysis of ECG signals has been developed to assist cardiologists. Since an ECG signal carries important information about the functionality of the heart, it should be possible to develop an intelligent system that analyzes ECG signals and classifies them as normal or abnormal. One of the most important abnormalities is the rhythmic abnormality known as arrhythmia. Arrhythmias are related to the rate, regularity, site of origin, or conduction of the electrical impulses in the heart. In recent years, different methods have been developed to detect and classify the various arrhythmias [1-3]. Most of this research has focused on feature extraction and classification. The most common features are morphological ones such as R-R intervals, QRS width, P waves, the PR segment, and the ST segment [4-6]. Some techniques use the Fourier transform and the wavelet transform to extract time- and frequency-based features of the ECG signals [7-9]. In [10], a cardiac arrhythmia detection system was proposed that combines adaptive feature selection with modified support vector machines (SVMs).

In a classification system, domain adaptation is another important problem besides robust feature extraction. The majority of arrhythmia classification techniques in the literature draw the training and test datasets from the same records; only a few assume that they come from different records. Classification methods that take training and test samples from the same distribution are not very robust in real-world applications, where the training and test data have statistically different distributions: a conventionally trained classifier performs poorly on test data whose statistical properties differ from those of the training data. Cross-domain methods have been introduced to improve classifier performance in such situations. These cross-domain learning methods are also used in other machine intelligence applications, such as natural language processing [12] and information extraction [13]. Recently, domain adaptation metric learning was proposed to cope with the domain adaptation problem [14]. Covariate shift and sample selection bias correction are other approaches for reducing the distribution difference between the training and test datasets; these methods assign weights to the training samples to reduce the distribution gap [15,16]. In this paper, we use two domain transfer methods, namely the domain transfer SVM [17] and the importance weighted kernel logistic regression method [18], to enhance classification performance. We show that the classification performance for arrhythmias is high when the training and test datasets are taken from the same domain, and that it deteriorates when they are picked from different domains with statistically different properties. In the latter case, the domain transfer SVM gives the best performance among the methods tested.

II. DESCRIPTION

Let the source domain dataset $X_S = \{(\mathbf{x}_i^S, y_i)\}_{i=1}^{N}$, which represents different classes of arrhythmias, consist of N vectors $\mathbf{x}_i^S \in \mathbb{R}^d$ (i = 1, 2, …, N) from the d-dimensional feature space X. With each vector $\mathbf{x}_i^S$ we associate a label $y_i \in \{1, 2, \dots, M\}$, where M is the number of classes. Let the target domain dataset be $X_T = \{(\mathbf{x}_i^T, y_i)\}_{i=1}^{L}$, where L is the number of target domain vectors and the labels $y_i$ are unknown. $X_S$ and $X_T$ are drawn from two different joint distributions P and Q, respectively. The mismatch between the two distributions is measured by the Maximum Mean Discrepancy (MMD) [19], which is based on the distance between the means of 1) the source domain $X_S$ and 2) the target domain $X_T$ in a Reproducing Kernel Hilbert Space (RKHS) $\mathcal{H}$ induced by a kernel function k. Let the kernel-induced feature map be $\varphi$. The empirical MMD between the source and target domains is:

$$\mathrm{dist}(X_S, X_T) = \left\| \frac{1}{N}\sum_{i=1}^{N}\varphi(\mathbf{x}_i^S) - \frac{1}{L}\sum_{i=1}^{L}\varphi(\mathbf{x}_i^T) \right\|_{\mathcal{H}} \tag{1}$$

where $\|\cdot\|_{\mathcal{H}}$ is the RKHS norm, and $\varphi(\mathbf{x}_i^S)$ and $\varphi(\mathbf{x}_i^T)$ are the non-linear feature mappings of the source and target domain samples, respectively. The kernel function k is induced by the non-linear feature mapping $\varphi(\cdot)$, i.e., $k(\mathbf{x}_i, \mathbf{x}_j) = \varphi(\mathbf{x}_i)^\top \varphi(\mathbf{x}_j)$. Because of the distribution shift between the source and target domains, the classification performance in the target domain is expected to degrade.

This work is supported by the National Plan for Sciences and Technology (NPST), King Saud University. Project ID: 11-MED1832-02

978-1-4673-5214-7/13/$31.00 ©2013 IEEE
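As a minimal sketch, assuming a Gaussian RBF kernel (the paper does not state which kernel k is used) and plain Python lists as feature vectors, the squared empirical MMD of (1) can be computed by expanding the RKHS norm with the kernel trick. The second function evaluates the same quantity as the quadratic form $\mathbf{s}^\top\mathbf{K}\mathbf{s} = \mathrm{tr}(\mathbf{K}\mathbf{s}\mathbf{s}^\top)$, with $\mathbf{s}$ holding 1/N for each source sample and −1/L for each target sample, which is the trace form used by the domain transfer SVM. The function names and the value of `gamma` are illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def rbf_kernel(a: Vector, b: Vector, gamma: float = 0.5) -> float:
    """Assumed kernel: Gaussian RBF k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def mmd_squared(source: List[Vector], target: List[Vector], gamma: float = 0.5) -> float:
    """Squared empirical MMD of (1), expanded with the kernel trick:
    mean k(S, S) + mean k(T, T) - 2 * mean k(S, T)."""
    n, l = len(source), len(target)
    k_ss = sum(rbf_kernel(a, b, gamma) for a in source for b in source) / (n * n)
    k_tt = sum(rbf_kernel(a, b, gamma) for a in target for b in target) / (l * l)
    k_st = sum(rbf_kernel(a, b, gamma) for a in source for b in target) / (n * l)
    return k_ss + k_tt - 2.0 * k_st


def mmd_squared_trace_form(source: List[Vector], target: List[Vector], gamma: float = 0.5) -> float:
    """Same quantity written as s^T K s = tr(K S) with S = s s^T, where K is the
    (N+L) x (N+L) kernel matrix over the source samples followed by the target samples."""
    samples = list(source) + list(target)
    n, l = len(source), len(target)
    s = [1.0 / n] * n + [-1.0 / l] * l  # +1/N for source entries, -1/L for target entries
    return sum(
        s[i] * rbf_kernel(xi, xj, gamma) * s[j]
        for i, xi in enumerate(samples)
        for j, xj in enumerate(samples)
    )


if __name__ == "__main__":
    # Toy 2-D vectors (hypothetical data, not ECG features).
    src = [[0.0, 0.0], [0.1, -0.2], [-0.1, 0.1]]
    tgt = [[1.0, 1.0], [0.9, 1.2]]
    print(mmd_squared(src, tgt), mmd_squared_trace_form(src, tgt))
```

Both functions are quadratic in N + L, which is fine for illustration but would call for a vectorized implementation, or subsampling, at the beat counts used in the experiments.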

A. Pre-processing and feature extraction

ECG signals are corrupted by low-frequency noise due to respiration and by power line interference, which contains high-frequency components. All ECG signals are first filtered with a high-pass filter to remove the low-frequency components introduced by respiration; a notch filter is then used to suppress the power line interference and obtain clean ECG signals.

To feed the classification process, we adopted a subset of the features described in [2]. In particular, we used two kinds of features: 1) ECG morphology features and 2) ECG wavelet features together with the QRS width. After extracting the temporal features of interest, we normalized the durations of the segmented ECG cycles to the same periodic length according to the procedure reported in [23]. For this purpose, the mean beat period was chosen as the normalized periodic length and was represented by 300 uniformly distributed samples.

B. Domain Transfer SVM

The domain transfer SVM (DTSVM) [17] was proposed to learn both the SVM decision function $f(\mathbf{x}) = \mathbf{w}^\top\varphi(\mathbf{x}) + b$ and the kernel function k simultaneously. The method minimizes the distance between the two distributions as well as the structural risk function of the SVM:

$$[k, f] = \arg\min_{k,\,f}\; \Omega\big(\mathrm{dist}^2(X_S, X_T)\big) + \theta\, R_{k,f}(X_S) \tag{2}$$

where Ω(·) is a monotonically increasing function and θ > 0 is a trade-off parameter between the two terms.

The first term of (2) minimizes the distance between the source and target domain data. Define a column vector $\mathbf{s}$ with N + L entries, in which the first N entries are set to 1/N and the remaining L entries are set to −1/L. The squared MMD in (1) can then be written as [19, 20]:

$$\mathrm{dist}^2(X_S, X_T) = \mathrm{tr}(\mathbf{K}\mathbf{S}) \tag{3}$$

where tr(·) denotes the trace of a matrix, $\mathbf{S} = \mathbf{s}\mathbf{s}^\top$, and

$$\mathbf{K} = \begin{bmatrix} \mathbf{K}_{S,S} & \mathbf{K}_{S,T} \\ \mathbf{K}_{T,S} & \mathbf{K}_{T,T} \end{bmatrix}$$

in which $\mathbf{K}_{S,S} \in \mathbb{R}^{N \times N}$, $\mathbf{K}_{S,T} \in \mathbb{R}^{N \times L}$ (with $\mathbf{K}_{T,S} = \mathbf{K}_{S,T}^\top$), and $\mathbf{K}_{T,T} \in \mathbb{R}^{L \times L}$ are the kernel matrices of the source domain, of the cross domain from the source to the target domain, and of the target domain, respectively.

The second term of (2) minimizes the structural risk function $R_{k,f}(X_S)$ of the SVM, for better classification performance in the target domain. Let $\boldsymbol{\alpha}$ be the column vector of the dual variables $\alpha_i$ of the source domain samples, and let $\mathbf{y}$ be the label vector. The SVM is usually solved through its dual formulation:

$$\max_{\boldsymbol{\alpha}}\; \sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} y_i y_j \alpha_i \alpha_j\, k(\mathbf{x}_i, \mathbf{x}_j) \tag{4}$$

subject to $0 \le \alpha_i \le C$ and $\sum_{i=1}^{N}\alpha_i y_i = 0$, where C is the regularization parameter. Using (3) and (4), the minimax problem is written as:

$$\min_{k}\max_{\boldsymbol{\alpha}}\; \Omega\big(\mathrm{tr}(\mathbf{K}\mathbf{S})\big) + \theta\Big(\sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} y_i y_j \alpha_i \alpha_j\, k(\mathbf{x}_i, \mathbf{x}_j)\Big) \tag{5}$$

The DTSVM method assumes that the kernel function k, or equivalently the kernel matrix K, is a linear combination of a set of base kernel functions $k_m$, i.e., $\mathbf{K} = \sum_{m=1}^{M} d_m \mathbf{K}_m$, where $d_m \ge 0$ and $\sum_{m=1}^{M} d_m = 1$ (here M denotes the number of base kernels), with the further assumption that

$$\Omega\big(\mathrm{tr}(\mathbf{K}\mathbf{S})\big) = \frac{1}{2}\big(\mathrm{tr}(\mathbf{K}\mathbf{S})\big)^2 \tag{6}$$

Therefore, the final formulation can be written as:

$$\min_{\mathbf{d}}\max_{\boldsymbol{\alpha}}\; \frac{1}{2}\big(\mathrm{tr}(\mathbf{K}\mathbf{S})\big)^2 + \theta\Big(\sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} y_i y_j \alpha_i \alpha_j\, k(\mathbf{x}_i, \mathbf{x}_j)\Big) \tag{7}$$

where $k = \sum_{m=1}^{M} d_m k_m$, $\mathbf{d} = [d_1, \dots, d_M]^\top$, and $\boldsymbol{\alpha}$ is subject to the constraints of (4).

C. Importance Weighted Kernel Logistic Regression

This method uses importance weights and kernel logistic regression to cope with the shift between the source and target distributions. The importance weights, estimated with the Kullback-Leibler importance estimation procedure [21], account for the change between the distributions. The combination of importance sampling and kernel logistic regression is defined as importance weighted kernel logistic regression (IWKLR):

$$P_\delta\big(V; \{\mathbf{x}_i, y_i\}^{tr}\big) = -\sum_{i=1}^{N} \widehat{w}(\mathbf{x}_i)\,\log P(y_i \mid \mathbf{x}_i, V) + \frac{\delta}{2}\,\mathrm{tr}\big(V\mathbf{K}_{S,S}V^\top\big) \tag{8}$$

where V is the matrix of learning parameters, $\widehat{w}(\mathbf{x}_i)$ is the estimated importance weight of the i-th training sample, $\mathbf{K}_{S,S}$ is the kernel matrix of the source (training) samples, and δ is the regularization parameter. V is obtained by minimizing (8).

III. EXPERIMENTAL RESULTS

TABLE II. NUMBER OF TRAINING, SEEN TEST, AND UNSEEN TEST BEATS USED IN THE EXPERIMENTS

| Class | N | A | V | F | Total |
|---|---|---|---|---|---|
| Training beats | 11455 | 235 | 946 | 103 | 12739 |
| SEEN test beats (DS1) | 34366 | 708 | 2840 | 311 | 38225 |
| UNSEEN test beats (DS2) | 44214 | 1836 | 3219 | 388 | 59615 |
| Total | 90035 | 2779 | 7005 | 802 | 110579 |
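The entries of Table II can be cross-checked by re-adding them. The short Python sketch below uses only the numbers printed in the table; the variable names are illustrative.

```python
# Per-class beat counts re-typed from Table II, in the column order N, A, V, F.
COUNTS = {
    "Training": [11455, 235, 946, 103],
    "SEEN (DS1)": [34366, 708, 2840, 311],
    "UNSEEN (DS2)": [44214, 1836, 3219, 388],
}
# Totals as printed in the "Total" column and the "Total" row of Table II.
PRINTED_ROW_TOTALS = {"Training": 12739, "SEEN (DS1)": 38225, "UNSEEN (DS2)": 59615}
PRINTED_CLASS_TOTALS = [90035, 2779, 7005, 802]
PRINTED_GRAND_TOTAL = 110579

# Re-add the per-class entries along each row and down each class column.
row_sums = {name: sum(row) for name, row in COUNTS.items()}
class_sums = [sum(column) for column in zip(*COUNTS.values())]
grand_sum = sum(row_sums.values())

for name, printed in PRINTED_ROW_TOTALS.items():
    status = "matches" if row_sums[name] == printed else "differs"
    print(f"{name}: entries sum to {row_sums[name]}, printed total {printed} ({status})")
print("class totals match:", class_sums == PRINTED_CLASS_TOTALS)
print(f"grand total: entries sum to {grand_sum}, printed {PRINTED_GRAND_TOTAL}")
```

Run as written, it confirms the training and SEEN row totals and all four class totals, but the UNSEEN entries add up to 49,657 beats rather than 59,615, and the whole table to 100,621 beats rather than 110,579. Both printed totals exceed the per-class sums by 9,958, so either these totals or some per-class entries are misprinted.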

A. Dataset Description

In this experiment, we considered real ECG data obtained from the MIT-BIH arrhythmia database [22]. To evaluate the performance of the domain transfer SVM, we follow the standard for performance evaluation proposed by the Association for the Advancement of Medical Instrumentation (AAMI). In particular, the considered beats belong to four classes: 1) normal sinus rhythm (N), 2) atrial premature beat (A), 3) ventricular premature beat (V), and 4) fused beats (F). The beat classes included in each group are listed in Table I.

TABLE I. BEAT GROUPS AND BEAT CLASSES ACCORDING TO AAMI

| Beat group | Beat classes |
|---|---|
| Normal (N) | N, e, j, L, R |
| Ventricular Ectopic (V) | V, E |
| Super-ventricular Ectopic (S) | A, a, J, S |
| Fused (F) | F |

The dataset contains a total of 44 records. We divide this database into two sets, DS1 and DS2, containing the following record numbers:
DS1 = 101, 106, 108, 109, 112, 114, 115, 116, 118, 119, 122, 124, 201, 203, 205, 207, 208, 209, 215, 220, 223, and 230.
DS2 = 100, 103, 105, 111, 113, 117, 121, 123, 200, 202, 210, 212, 213, 214, 219, 221, 222, 228, 231, 232, 233, and 234.

Each record in DS1 is further divided into two sets of beats: a smaller set used for training the classifier and a larger set used for initial testing of the classifier. We term the latter the set of SEEN beats. The 22 records of DS2 are new to the classifier and are therefore termed the set of UNSEEN beats. The test beats are classified by three different classifiers: 1) SVM, 2) IWKLR, and 3) DTSVM. Table II shows the number of beats of each class in the training set, the SEEN test set (DS1), and the UNSEEN test set (DS2).

B. Results

The overall classification accuracy is reported in Table III, which compares the conventional SVM, IWKLR, and DTSVM. As expected, the overall accuracy on DS1 is higher than on DS2 because of the statistical shift between DS1 and DS2. The main finding is that the domain transfer SVM performs best on both sets, reaching 93% on the UNSEEN records versus 91.8% for the SVM and 87.2% for IWKLR, as it aims to reduce the difference between the SEEN and UNSEEN distributions.

TABLE III. OVERALL PERFORMANCE OF THE CLASSIFICATION TECHNIQUES FOR THE SEEN AND UNSEEN DATASETS

| Technique | Overall accuracy, DS1 (SEEN) | Overall accuracy, DS2 (UNSEEN) |
|---|---|---|
| SVM | 97% | 91.8% |
| IWKLR | 96.6% | 87.2% |
| DTSVM | 97.3% | 93% |
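As a minimal sketch of the evaluation protocol of Section III.A, the AAMI grouping of Table I and the record-level DS1/DS2 split can be encoded in Python as follows. The input format (record number, annotation symbol), the helper names, the 25% training fraction, and the per-beat random draw are hypothetical assumptions; the paper only states that each DS1 record is divided into a smaller training set and a larger SEEN set. Following Table I, the supraventricular group is labeled S here (Table II reports it as A).

```python
import random
from typing import Iterable, List, Optional, Tuple

# AAMI beat groups of Table I, keyed by MIT-BIH beat annotation symbol.
AAMI_GROUP = {
    **dict.fromkeys(("N", "e", "j", "L", "R"), "N"),
    **dict.fromkeys(("V", "E"), "V"),
    **dict.fromkeys(("A", "a", "J", "S"), "S"),
    "F": "F",
}

# Record-level split: training and SEEN beats come from DS1, UNSEEN beats from DS2.
DS1 = [101, 106, 108, 109, 112, 114, 115, 116, 118, 119, 122,
       124, 201, 203, 205, 207, 208, 209, 215, 220, 223, 230]
DS2 = [100, 103, 105, 111, 113, 117, 121, 123, 200, 202, 210,
       212, 213, 214, 219, 221, 222, 228, 231, 232, 233, 234]

Beat = Tuple[int, str]     # hypothetical input: (record number, annotation symbol)
Labeled = Tuple[int, str]  # (record number, AAMI group)


def aami_label(symbol: str) -> Optional[str]:
    """Map an annotation symbol to its AAMI group; None if it is not listed in Table I."""
    return AAMI_GROUP.get(symbol)


def split_beats(beats: Iterable[Beat], train_fraction: float = 0.25,
                seed: int = 0) -> Tuple[List[Labeled], List[Labeled], List[Labeled]]:
    """Return (train, seen, unseen) lists of (record, AAMI group) pairs.

    train_fraction and the per-beat random draw are assumptions, not the paper's procedure.
    """
    rng = random.Random(seed)
    ds1, ds2 = set(DS1), set(DS2)
    train: List[Labeled] = []
    seen: List[Labeled] = []
    unseen: List[Labeled] = []
    for record, symbol in beats:
        label = aami_label(symbol)
        if label is None:
            continue  # beat type outside the four AAMI groups of Table I
        if record in ds1:
            # Beats of DS1 records go either to training or to the SEEN test set.
            (train if rng.random() < train_fraction else seen).append((record, label))
        elif record in ds2:
            unseen.append((record, label))  # DS2 records are never used for training
    return train, seen, unseen
```

For example, `split_beats([(101, "N"), (100, "A"), (100, "Q")])` sends the beat from DS2 record 100 to the UNSEEN list as class S and drops the unlisted symbol Q. Splitting by record rather than by beat is what guarantees that every UNSEEN beat comes from a patient the classifier has never seen.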

IV. CONCLUSION

In this paper, we have investigated two domain adaptation methods for ECG beat classification and reported classification results obtained with morphological features and wavelet features with the QRS width. We compared the conventional SVM, the domain transfer SVM (DTSVM), and the importance weighted kernel logistic regression (IWKLR) method. Our findings show that the domain transfer SVM performs better than the other approaches, particularly on beats from records that were not used for training.

ACKNOWLEDGMENT

This work is supported by the National Plan for Sciences and Technology (NPST), King Saud University. Project ID: 11-MED1832-02.

REFERENCES

[1] V. X. Afonso and W. J. Tompkins, 1995. Detecting ventricular fibrillation. IEEE Eng. Med. Biol., vol. 14, pp. 152–159.
[2] F. de Chazal and R. B. Reilly, 2006. A patient adapting heart beat classifier using ECG morphology and heartbeat interval features. IEEE Trans. Biomed. Eng., vol. 53, no. 12, pp. 2535–2543.
[3] T. Inan, L. Giovangrandi, and J. T. A. Kovacs, 2006. Robust neural network based classification of premature ventricular contractions using wavelet transform and timing interval features. IEEE Trans. Biomed. Eng., vol. 53, no. 12, pp. 2507–2515.
[4] Y. Kutlu and D. Kuntalp, 2012. Feature extraction for ECG heartbeats using higher order statistics of WPD coefficients. Computer Methods and Programs in Biomedicine, Elsevier.
[5] R. V. Andreao, B. Dorizzi, and J. Boudy, 2006. ECG signal analysis through hidden Markov models. IEEE Trans. Biomed. Eng., vol. 53, no. 8, pp. 1541–1549.
[6] S. Choi, M. Adnane, G. Lee, H. Jang, Z. Jiang, and H. Park, 2010. Development of ECG beat segmentation method by combining lowpass filter and irregular R–R interval checkup strategy. Expert Systems with Applications, vol. 37, pp. 5208–5218.
[7] S. Osowski and T. H. Linh, 2001. ECG beat recognition using fuzzy hybrid neural network. IEEE Trans. Biomed. Eng., vol. 48, pp. 1265–1271.
[8] L. Khadra, A. Al-Fahoum, and S. Binajjaj, 2005. A quantitative analysis approach for cardiac arrhythmia classification using higher order spectral techniques. IEEE Trans. Biomed. Eng., vol. 52.
[9] S. Osowski, L. T. Hoai, and T. Markiewicz, 2004. Support vector machine-based expert system for reliable heartbeat recognition. IEEE Trans. Biomed. Eng., vol. 51, no. 4.
[10] S. Hu, Z. Shao, and J. Tan, 2012. A real-time cardiac arrhythmia classification system with wearable electrocardiogram. Sensors, vol. 12, no. 9, pp. 12844–12869, doi:10.3390/s120912844. http://www.mdpi.com/1424-8220/12/9/12844
[11] M. H. Song, J. Lee, S. P. Cho, K. J. Lee, and S. K. Yoo, 2005. Support vector machine based arrhythmia classification using reduced features. Int. J. Control Autom. Syst., vol. 3, no. 4, pp. 571–579.
[12] J. Blitzer, R. McDonald, and F. Pereira, 2006. Domain adaptation with structural correspondence learning. In Proc. Conference on Empirical Methods in Natural Language Processing (EMNLP '06), pp. 120–128.
[13] W. Dai, G.-R. Xue, Q. Yang, and Y. Yu, 2007. Transferring naive Bayes classifiers for text classification. In Proc. 22nd National Conference on Artificial Intelligence, vol. 1, pp. 540–545.
[14] B. Geng, D. Tao, and C. Xu, 2011. DAML: Domain adaptation metric learning. IEEE Trans. Image Processing, vol. 20, no. 10.
[15] A. Storkey and M. Sugiyama, 2006. Mixture regression for covariate shift. In NIPS.
[16] J. Huang, A. J. Smola, A. Gretton, K. M. Borgwardt, and B. Schölkopf, 2006. Correcting sample selection bias by unlabeled data. In NIPS.
[17] L. Duan, I. W. Tsang, and D. Xu, 2012. Domain transfer multiple kernel learning. IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 34, no. 3, pp. 465–479.
[18] M. Yamada, M. Sugiyama, and T. Matsui, 2010. Semi-supervised speaker identification under covariate shift. Signal Processing, vol. 90, no. 8, pp. 2353–2361.
[19] K. M. Borgwardt, A. Gretton, M. J. Rasch, H.-P. Kriegel, B. Schölkopf, and A. J. Smola, 2006. Integrating structured biological data by kernel maximum mean discrepancy. In ISMB.
[20] G. Lanckriet et al., 2004. Learning the kernel matrix with semidefinite programming. JMLR, pp. 27–72.
[21] M. Sugiyama, S. Nakajima, H. Kashima, P. von Bünau, and M. Kawanabe, 2008. Direct importance estimation with model selection and its application to covariate shift adaptation. In Advances in Neural Information Processing Systems, Cambridge, MA: MIT Press, pp. 1433–1440.
[22] R. Mark and G. Moody, 1997. MIT-BIH Arrhythmia Database [Online]. Available: http://ecg.mit.edu/dbinfo.html
[23] J. J. Wei, C. J. Chang, N. K. Shou, and G. J. Jan, 2001. ECG data compression using truncated singular value decomposition. IEEE Trans. Biomed. Eng., vol. 5, no. 4, pp. 290–299.