Journal of Mechanics in Medicine and Biology, Vol. 16, No. 1 (2016) 1640012 (19 pages)
© World Scientific Publishing Company
DOI: 10.1142/S0219519416400121

DECISION SUPPORT SYSTEM FOR ARRHYTHMIA BEATS USING ECG SIGNALS WITH DCT, DWT AND EMD METHODS: A COMPARATIVE STUDY

USHA DESAI*,†,||, ROSHAN JOY MARTIS‡, C. GURUDAS NAYAK§, G. SESHIKALA†, K. SARIKA* and RANJAN SHETTY K.¶

*Department of Electronics and Communication Engineering, NMAM Institute of Technology, Nitte, Udupi, Karnataka 574110, India
†School of Electronics and Communication Engineering, REVA University, Bengaluru 560064, India
‡Department of Electronics and Communication Engineering, St. Joseph Engineering College, Mangaloru 575028, India
§Department of Instrumentation and Control Engineering, MIT, Manipal University, Manipal 576104, India
¶Department of Cardiology, Kasturba Medical College, Manipal University, Manipal 576104, India
||[email protected]

Received 2 September 2015
Accepted 16 December 2015
Published 23 February 2016

The electrocardiogram (ECG) is a non-invasive signal used to diagnose patients with cardiac abnormalities. Subjective evaluation of ECG intervals and amplitudes by a physician can be tedious, time consuming and susceptible to observer bias. ECG signals are generated by the excitation of many cardiac myocytes, and hence the resultant signals are non-linear in nature. The subtle changes in these signals can be well represented and discriminated in transform and non-linear domains. In this paper, the performance of the Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT) and Empirical Mode Decomposition (EMD) methods is compared for automated diagnosis of five classes of beats, namely Non-ectopic (N), Supraventricular ectopic (S), Ventricular ectopic (V), Fusion (F) and Unknown (U). Six different approaches are employed in this work: (i) Principal Components (PCs) on DCT, (ii) Independent Components (ICs) on DCT, (iii) PCs on DWT, (iv) ICs on DWT, (v) PCs on EMD and (vi) ICs on EMD. Clinically significant features are selected using the ANOVA test (p < 0.0001) and fed to a k-Nearest Neighbor (k-NN) classifier. We have obtained a classification accuracy of 99.77% using the ICs on DWT method. Consistency of performance is evaluated using Cohen’s kappa statistic. The developed approach is robust and accurate, and can be employed for mass screening in cardiac healthcare.

Keywords: Preprocessing; feature extraction; dimensionality reduction; ANOVA; empirical mode decomposition; Cohen’s kappa statistic; class-specific accuracy.

|| Corresponding author.

U. Desai et al.

1. Introduction

Cardiac arrhythmias are a group of disorders caused by abnormal conduction of electrical impulses in the heart. Disturbance in the cardiac rhythm may be due to hypertension, hypokalemia, cardiomyopathy, congenital heart diseases and cardiovascular diseases (CVDs) or coronary artery diseases. Arrhythmias can be classified based on the site of origin [1]. Key health factors that increase the risk of heart disease are excessive use of alcohol and tobacco, unhealthy diet, insufficient physical activity, obesity and increased intake of processed foods [2]. Abnormalities in the cardiac rhythm are major causes of morbidity and mortality in developed and underdeveloped countries [3]. Approximately 7 million people are living with CVDs in the United Kingdom (UK). About 22,000 deaths each year due to CVDs are attributed to smoking, and 30% of children in the UK are overweight. Over 50% of adults in the UK have high blood lipid levels, and more than one-third of men and over one-fourth of women regularly exceed the government-recommended limits on alcohol consumption [4]. About 2150 Americans die every day from CVDs. The American Heart Association (AHA) reports that 16% of students in grades 9 to 12 are smokers and that, among adults, 20% of men and 16% of women are smokers [5]. Among the 16 million deaths below the age of 70 years, 82% are in middle- and low-income countries and 37% are due to CVDs [6]. As the risk factors increase around the world, prediction and prevention of cardiac arrhythmia is the main medical challenge for researchers and the health care community.

The electrocardiogram (ECG) is a P-QRS-T time-voltage signal of the electrical conduction of the heart, which helps in non-invasive clinical diagnosis [7]. Time-domain features by themselves cannot provide high discrimination in the diagnosis of cardiac health [8]. Furthermore, to increase the diagnostic accuracy, Martis et al. [9–11] and Acharya et al. [12–20] applied various linear and non-linear transform domain approaches.
The automatic analysis of ECG was initiated in Refs. 21 and 22; subsequently, the Computer-Aided Cardiac Diagnosis (CACD) system gained more significance in clinical ECG diagnosis [23]. Attributes of the ECG signal such as non-linearity and its highly transient nature [24,25] make manual screening tedious and prone to classification errors. Therefore, CACD using transform domain features of ECG can significantly improve the performance in clinical diagnosis and reduce the workload on physicians.

Several studies have been conducted on automated diagnosis of cardiac health using the ECG signal [26]. Ubeyli [27] automatically identified four classes (Normal Sinus Rhythm (NSR), Congestive Heart Failure (CHF), Atrial Fibrillation (AF) and Ventricular Tachycardia (VT)) using the eigenvector method and a Recurrent Neural Network (RNN), and obtained an accuracy of 98.06%. The same classes were also classified using eigenvector coefficients and a Support Vector Machine (SVM), with an accuracy of 98.33% [27]. Ebrahimzadeh et al. [29] classified five classes of heartbeats (NSR, Right Bundle Branch Block (RBBB), Left Bundle Branch Block (LBBB), Ventricular Premature Contraction (VPC) and Atrial Premature Contraction (APC)) using spectral and timing features, Radial Basis Function (RBF) and a hybrid bees algorithm, with an accuracy of 95.18%. Martis et al. [30] classified the same classes using a combination of Higher Order Spectra (HOS) cumulants and PCA, and reported an accuracy of 94.52% using a neural network classifier. Ge et al. [31] classified six classes (NSR, APC, Supraventricular Tachycardia (ST), VT, VPC and Ventricular Flutter (VFL)) using an autoregressive model and a Generalized Linear Model (GLM) classifier, and reported accuracies of 93.2% to 100% using their GLM algorithm. Melgani and Bazi [32] diagnosed six classes (NSR, RBBB, LBBB, VPC, APC and Paced (P)) using ECG morphology and RR interval coefficients, and reported an accuracy of 93.27% using a combination of SVM and Particle Swarm Optimization (PSO). Mishra and Raghav [33] used the Local Fractal Dimension (LFD) and automatically detected six classes of heartbeats (NSR, RBBB, LBBB, VPC, APC and P) using a k-NN classifier, achieving an accuracy of 99.49%. Daamouche et al. [34] applied DWT, PSO and SVM to classify six classes of beats (NSR, RBBB, LBBB, VPC, APC and P); their algorithm yielded an accuracy of 88.84%. Doğan and Korürek [35] applied a kernelized fuzzy c-means algorithm coupled with hybrid ant colony optimization to cluster six classes of heartbeats (NSR, VPC, Fusion of Ventricular and Normal beat, Atrial Premature, RBBB and Fusion of Paced and NSR beat), and reported an accuracy of 93.58%.

Although many transform domain methods have been used for ECG signals, the best technique is yet to be identified. Moreover, most of these methods have been tested using smaller datasets, and the statistical significance of the feature sets has not been clearly verified. The aim of this paper is to compare the linear transform domain method DCT and the non-linear transform domain methods DWT and EMD in automated classification of five classes of ECG beats. The proposed system is shown in Fig. 1.
The recently developed EMD technique of Huang et al. [36] for non-linear and non-stationary signal decomposition is an adaptive method that depicts the subtle changes in physiological signals. Acharya et al. [37] and Martis et al. [38] have applied the EMD technique in automated diagnosis of epilepsy, diabetes [39] and in denoising [40] of physiological signals. In this work, to extract the hidden information from the ECG signals, the EMD approach is applied along with the commonly used DCT and DWT methods. Using these methods, features are extracted from: (i) the DCT coefficients, (ii) the third- and fourth-level detail coefficients of the DWT sub-bands and (iii) the Intrinsic Mode Functions (IMF1 and IMF2) obtained from EMD. These features are then individually subjected to dimensionality reduction using PCA and ICA, followed by the ANOVA test. The diagnostically relevant features (p < 0.0001) are fed to the k-NN classifier for pattern recognition. The consistency of the feature sets is measured by Cohen’s kappa statistic, and performance is evaluated using class-specific accuracy (%) and overall accuracy (%).

The rest of the paper is structured as follows: Section 2 describes the dataset and tools used for the experimental work, and Sec. 3 discusses the methodology used.


Fig. 1. Proposed system.

Section 4 presents the experimental results, followed by a discussion of the results in Sec. 5. Finally, the paper concludes in Sec. 6.

2. Dataset and Tools Used

2.1. Dataset used

In this work, altogether 110,093 beats from the MIT-BIH arrhythmia database [41] are used. In this database, ECG signals are sampled at 360 Hz and beats are classified into five main arrhythmia classes based on the ANSI/AAMI EC57:1998 standard [42]. Table 1 presents the five classes (N, S, V, F and U) of ECG beats used in this paper. Beats at the start and end of the dataset that do not have enough samples are ignored, and the beats are verified by an expert cardiologist.

Table 1. Summary of dataset used.

ECG classes                     No. of beats
Non-ectopic (N)                 90,575
Supraventricular ectopic (S)    2972
Ventricular ectopic (V)         7707
Fusion (F)                      1784
Unknown (U)                     7055
Total beats                     110,093

2.2. Tools used

The proposed system is developed using the MATLAB 2013A simulation tool. The methods DCT, DWT, PCA, ANOVA, 10-fold cross-validation, confusion matrix and k-NN are implemented using MATLAB. To perform ICA, EMD and Cohen’s kappa, we have used the FastICA toolbox [43], the EMD toolbox designed by Manuel Ortigueira [44] and the kappa toolbox designed by Giuseppe Cardillo [45], respectively.

3. Methodology

The implementation of the proposed automated system involves the following stages: preprocessing, feature extraction, dimensionality reduction, statistical test, 10-fold cross-validation and classification using k-NN.

3.1. Preprocessing

ECG signals taken from the MIT-BIH arrhythmia database [41] are filtered by applying DWT-based multi-resolution analysis [46]. The Daubechies db4 mother wavelet [47] is used for the DWT, and the signal is decomposed into nine levels of sub-bands. The ninth-level approximation coefficients, covering the frequency band 0–0.351 Hz, correspond to the baseline drift and are set to zero. The ECG signal does not contain any significant information above 45 Hz, so the first two levels of detail coefficients are also set to zero. The remaining detail sub-bands, from the third to the ninth level, are used to reconstruct the filtered ECG. R-peaks are identified in the filtered ECG signal using the Pan–Tompkins algorithm [48]. The signal is then segmented around each R-peak such that each segment consists of 99 samples before and 100 samples after the R-peak. The resulting ECG beats of 200 samples, belonging to five arrhythmia classes, are used in this study.

3.2. Feature extraction

In this study, features are extracted independently using six different methods, namely: (i) PCA on DCT, (ii) ICA on DCT, (iii) PCA on DWT, (iv) ICA on DWT, (v) PCA on EMD and (vi) ICA on EMD.

3.2.1. Discrete Cosine Transform (DCT)

DCT is a linear transform method in which the basis functions are orthogonal, real-valued cosine functions.
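To make the cosine basis functions concrete, the following minimal sketch evaluates an orthonormal DCT-II directly from its definition and keeps the first third of the coefficients of a 200-sample sequence. The Gaussian test pulse, the function names and the choice of the orthonormal DCT-II variant are assumptions made for illustration; they are not taken from the authors' MATLAB implementation.

```python
import math

def dct2_orthonormal(x):
    """Orthonormal DCT-II of a real sequence, computed from the definition."""
    n = len(x)
    coeffs = []
    for k in range(n):
        # Inner product of the signal with the k-th cosine basis function.
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * s)
    return coeffs

def synthetic_beat(n=200, r_index=99):
    """Hypothetical stand-in for a segmented beat: a narrow Gaussian pulse."""
    return [math.exp(-((i - r_index) ** 2) / (2 * 4.0 ** 2)) for i in range(n)]

beat = synthetic_beat()
coeffs = dct2_orthonormal(beat)
kept = coeffs[:67]  # first one-third of the 200 coefficients

total_energy = sum(c * c for c in coeffs)
kept_energy = sum(c * c for c in kept)
print(f"energy retained: {100 * kept_energy / total_energy:.2f}%")
```

Because the transform is orthonormal, the sum of squared coefficients equals the sum of squared samples, so the retained-energy ratio is a direct measure of how much of the beat survives the truncation. A real ECG beat would of course give a different ratio from this synthetic pulse.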
DCT possesses high energy compaction, so that most of the information is packed into the first one-third of the DCT coefficients [49]. In this work, each ECG beat of 200 samples is transformed to the DCT domain, and the dimensionality of the initial 67 coefficients is reduced separately using PCA and ICA.

3.2.2. Discrete Wavelet Transform (DWT)

The wavelet transform is a non-linear time-scale decomposition method, applied mainly for noise filtering, feature extraction and data compression [46]. In this study, each ECG beat of 200 samples is decomposed into four sub-bands by applying the Daubechies db4 mother wavelet [50]. Features are extracted in the QRS-complex frequency range [52] from the third-level detail coefficients (22.5–45 Hz) and the fourth-level detail coefficients (11.25–22.5 Hz). After sub-band decomposition, the third- and fourth-level details are reduced to 31 and 19 coefficients, respectively. The coefficients of these two sub-bands are independently subjected to dimensionality reduction using PCA and ICA.

3.2.3. Empirical Mode Decomposition (EMD)

EMD is a time-frequency analysis technique used for adaptive decomposition of non-stationary and non-linear time sequences, in which the basis functions are derived directly from the signal under consideration [36]. In this work, the EMD technique decomposes each ECG beat into a finite number of oscillatory components, called Intrinsic Mode Functions (IMFs). IMFs have two basic characteristics: (i) the numbers of extrema and zero-crossings are equal or differ at most by one, and (ii) they are symmetric with respect to the local zero mean. The lower-level IMFs represent higher-frequency oscillations and the higher-level IMFs correspond to lower-frequency oscillations. In this work, the initial two IMFs of 200 samples are considered (some of the beats have only two IMFs), as shown in Fig. 2. These IMFs are individually subjected to dimensionality reduction using PCA and ICA.

3.3. Dimensionality reduction

In the current work, the PCA and ICA methods are used for dimensionality reduction of (i) the DCT coefficients, (ii) the third- and fourth-level detail coefficients of the DWT sub-bands, and (iii) IMF1 and IMF2 extracted from EMD.

3.3.1. Principal Component Analysis (PCA)

PCA is a linear dimensionality reduction technique that identifies patterns in the dataset and expresses the data so as to highlight their similarities and differences [52]. For a group of ECG beats, the PCA computation involves: (i) computing the covariance matrix, (ii) calculating its eigenvalues and eigenvectors, (iii) arranging the eigenvectors in descending order of eigenvalues, and lastly (iv) projecting the original ECG beats onto the directions of the arranged eigenvectors.
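The four steps listed above can be sketched in a few lines. This is only an illustration on a hypothetical four-point, two-feature data set: power iteration with deflation stands in for whatever eigen-solver the authors' MATLAB code used, and all function names are assumptions. The "energy (%)" value is each eigenvalue's share of the total variance.

```python
import math

def covariance(rows):
    """Step (i): centre the data and form the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    return centered, cov

def leading_eigenpairs(cov, k, iters=200):
    """Steps (ii)-(iii): power iteration with deflation returns the k largest
    eigenvalues and their eigenvectors, already in descending order."""
    d = len(cov)
    work = [row[:] for row in cov]             # copy that is deflated in place
    pairs = []
    for _ in range(k):
        v = [1.0 / (j + 1) for j in range(d)]  # arbitrary start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(work[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-15:                   # no variance left to extract
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * work[i][j] * v[j] for i in range(d) for j in range(d))
        pairs.append((lam, v))
        for i in range(d):                     # deflate the component just found
            for j in range(d):
                work[i][j] -= lam * v[i] * v[j]
    return pairs

def project(centered, pairs):
    """Step (iv): scores of each observation along the ordered eigenvectors."""
    return [[sum(c[j] * v[j] for j in range(len(c))) for _, v in pairs]
            for c in centered]

# Hypothetical toy data: four observations of two correlated features.
data = [[1.0, 1.0], [-1.0, -1.0], [0.1, -0.1], [-0.1, 0.1]]
centered, cov = covariance(data)
pairs = leading_eigenpairs(cov, k=2)
scores = project(centered, pairs)
total_variance = sum(cov[i][i] for i in range(len(cov)))
energy = [100.0 * lam / total_variance for lam, _ in pairs]
print("eigenvalues:", [round(lam, 4) for lam, _ in pairs])
print("energy (%):", [round(e, 2) for e in energy])
```

In this toy set almost all of the variance lies along the diagonal direction, so the first component carries nearly all of the energy. Keeping only the leading columns of `scores` is the dimensionality reduction step.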
The eigenvector with the maximum eigenvalue is the first principal component of the dataset. The first few components capture most of the variability present in the heartbeats. In this work, the first 12 principal components (PCs) of DCT, DWT and EMD are separately fed to the k-NN classifier for automated classification.

3.3.2. Independent Component Analysis (ICA)

ICA is a multivariate data reduction method used to reveal hidden factors present in the source signals [53]. ICA assumes the data variables to be linear or non-linear combinations of some unknown latent variables, and the mixing system is also unknown. The latent variables are assumed to be non-Gaussian and mutually independent, and are called the independent components of the observed data. In this work, ICA is computed using the FastICA toolbox [43], and the first 12 independent components (ICs) of the DCT, DWT and EMD coefficients are individually used for pattern classification using k-NN.

Fig. 2. Typical plot of S beat and F beat and corresponding first two IMFs.

3.4. ANOVA-statistical test

In this work, the significance of the initial 12 PCs and ICs is verified by applying the one-way Analysis of Variance (ANOVA) test. The discrimination among the class labels is quantified using the F-value (the ratio of the between-class variance to the within-class variance) and the p-value (p < 0.05 is preferred). A larger F-value indicates that the between-group variation is greater than the within-group variation. Every value of F corresponds to a distinct p-value, and greater F-values correspond to smaller p-values [54].

3.5. 10-fold cross-validation

The 10-fold cross-validation technique is applied to obtain a robust automated diagnosis system. In this work, cross-validation is performed by partitioning the 110,093 labeled heartbeat feature vectors into 10 equally sized disjoint divisions known as folds, ensuring that all classes are represented in each fold. In each experiment, one fold is used as the test set and the remaining nine folds are used as the training set. This procedure is repeated 10 times, so that each fold is used for testing exactly once [55].

3.6. k-Nearest Neighbor (k-NN) classifier

k-NN classification is a supervised detection method in which training patterns with class labels are used to assign a class to the test patterns [55]. The classification begins by fixing the number of samples k; where the samples are unevenly distributed, the width of the region is adjusted so that each region contains exactly k samples. This procedure is continued until the algorithm converges. In this work, we have used k = 3 and the Euclidean distance metric for the neighboring pattern search to obtain the highest classification accuracy (%).

3.7. Performance measure of classification

Performance of the developed system is evaluated based on the following measures:

(i) Class-specific accuracy:

\[ \text{Class-specific accuracy (\%)} = \frac{\text{Number of correctly detected beats of the individual class}}{\text{Total number of heartbeats of the individual class}} \times 100, \]

(ii) Overall accuracy:

\[ \text{Overall accuracy (\%)} = \frac{\text{Number of correctly detected beats from all the classes}}{\text{Total number of heartbeats from all the classes}} \times 100, \]

(iii) Cohen’s kappa statistic (κ): This statistic, introduced by Cohen [56], measures the reliability of the diagnosis. Generally, the kappa value lies in the range 0 ≤ κ ≤ 1, where 1 represents perfect consistency over the 10 folds of cross-validation.

These performance measures are evaluated in each fold of the 10-fold cross-validation and graphically represented using box plots. In addition, the confusion matrix is presented to validate the class-specific accuracy (%) and overall accuracy (%).

4. Results

ECG signals sampled at 360 Hz are subjected to nine levels of sub-band decomposition using the DWT method. The DWT coefficients in the sub-bands whose frequencies do not correspond to the ECG signal are set to zero, and the inverse DWT is then applied to denoise the ECG signal. This process removes the baseline drift and high-frequency noise from the ECG signal. Based on the reference annotations for the QRS middle points as marked in the database, a total of 110,093 ECG beats with a complete P-QRS-T cycle, belonging to five classes of arrhythmia (N, S, V, F and U), are considered in this study. These ECG beats are transformed independently using three different transform representations: DCT, DWT and EMD. Further, using PCA and ICA, six sets of features are used for pattern recognition: (i) PCs on DCT coefficients, (ii) ICs on DCT coefficients, (iii) PCs on DWT, (iv) ICs on DWT, (v) PCs on EMD decomposition and (vi) ICs on EMD. For DWT, the third- and fourth-level detail coefficients are used, and six PCs (or six ICs) are taken from each of these sub-bands. For EMD, six components each are taken from IMF1 and IMF2 after applying PCA and ICA individually. In each method, 12 components are used, as they are clinically significant (p < 0.0001). 10-fold cross-validation is applied during the classifier design, and performance is evaluated using class-specific accuracy (%), overall accuracy (%) and Cohen’s kappa statistic (κ).

The F-values of the clinically significant features for the six methods are presented in Table 2. Figure 3 depicts the percentage of energy (%) present in the first 12 PCs for the DCT+PCA methodology. It can be seen from the figure that the first PC accounts for 56.45% of the variability, the second PC for 20.56% of the variability, and so on. The DCT+PCA column of Table 2 lists the F-values for the respective PCs. Figure 4(a) shows the percentage energy (variability) contained in the respective PCs of the third-level detail sub-band coefficients for the DWT+PCA methodology. Similarly, Fig. 4(b) depicts the energy contained in each of the PCs of the fourth-level detail coefficients. The DWT+PCA column of Table 2 gives the F-values for the DWT+PCA methodology. Figures 5(a) and 5(b) depict the energy (%) present in the PCs of IMF1 and IMF2, respectively. The EMD+PCA and EMD+ICA columns of Table 2 give the F-values for the EMD+PCA and EMD+ICA methodologies, respectively. Tables 3–8 provide the confusion matrices (rows represent the gold standard and columns give the actual classifier outputs) of the five classes for the six proposed methodologies. Table 9 gives the class-specific and overall classification accuracy (%) using the six methodologies.
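The classification protocol behind these results (k = 3, Euclidean distance, 10 folds with every class present in each fold) can be sketched as follows. This is a rough illustration only: the two-class toy feature vectors, the round-robin way of building the folds and all function names are assumptions, not the authors' MATLAB code.

```python
import math
from collections import Counter, defaultdict

def stratified_folds(labels, n_folds=10):
    """Split sample indices into folds so each class is spread over all folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)  # deal out round-robin
    return folds

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training patterns closest to `query`."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def cross_validated_accuracy(features, labels, k=3, n_folds=10):
    """Overall accuracy (%) pooled over the folds; each fold is tested once."""
    correct = 0
    for test_idx in stratified_folds(labels, n_folds):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(features) if i not in held_out]
        train_y = [y for i, y in enumerate(labels) if i not in held_out]
        correct += sum(
            knn_predict(train_x, train_y, features[i], k) == labels[i]
            for i in test_idx)
    return 100.0 * correct / len(labels)

# Hypothetical two-class toy set: 20 well-separated feature vectors per class.
features = ([(0.1 * i, 0.05 * i) for i in range(20)]
            + [(5.0 + 0.1 * i, 5.0 - 0.05 * i) for i in range(20)])
labels = ["N"] * 20 + ["V"] * 20
print(f"10-fold accuracy: {cross_validated_accuracy(features, labels):.2f}%")
```

Sorting all training patterns for every query is adequate for a toy set; for the 110,093 beats used in this study a spatial index or a vectorized distance computation would be the more practical choice.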
It can be noted from Table 9 that the DWT+ICA methodology provides the highest overall classification accuracy of 99.77% and kappa coefficient (κ)

Table 2. F-values of the feature sets for the six methods used (p-value < 0.0001).

Component     DCT+PCA   DCT+ICA   DWT+PCA   DWT+ICA   EMD+PCA   EMD+ICA
C1            2892.63    534.19   4049.96    525.93    150.95     71.22
C2            1513.35    826.55   1700.06    892.58    105.58     69.13
C3             597.44    625.66    110.71   1585.41     91.88     85.49
C4             328.34    608.76    926.65    451.25     81.54    105.16
C5             192.27    719.43   1472.11    413.73     67.45    100.78
C6             196.13    765.16    221.49     867.5     93.25     78.75
C7             133.18    577.06   5177.32   1837.01    470.11    181.97
C8             125.86    516.12   5904.93   1787.18     54.02    263.17
C9              93.66    524.71   1478.62    768.12     47.65     286.9
C10             82.90    645.81    448.47   1608.32      42.8    151.64
C11             69.70    463.33    472.61   1167.36     46.03    147.55
C12             64.74    424.31    707.78   4343.43     39.86    128.47
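Each entry of Table 2 is a one-way ANOVA F-value for one component across the five beat classes. The following minimal sketch computes that statistic for three small hypothetical groups; the data and the function name are assumptions, and the p-value is left out because it needs the F distribution, which is not in the Python standard library.

```python
def anova_f(groups):
    """One-way ANOVA F statistic: between-class over within-class variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-class sum of squares, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-class sum of squares around each group's own mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical feature values for three classes.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(f"F = {anova_f(groups):.2f}")  # F = 13.00

# Moving one class further away increases the F-value.
far = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [15.0, 16.0, 17.0]]
print(f"F (better separated) = {anova_f(far):.2f}")
```

A component whose class means are far apart relative to the spread inside each class gets a large F-value, which is why the larger entries in Table 2 mark the more discriminative components.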


Fig. 3. Plot of energy (%) versus PCs using DCT method.

Fig. 4. Plot of energy (%) versus PCs using DWT method: (a) third-level detail and (b) fourth-level detail.

of 0.9926. The kappa coefficient (κ) indicates the consistency of the accuracy (%). A kappa coefficient close to one indicates that the 10 classification accuracies of the 10-fold cross-validation do not vary much, and hence the results are consistent. We have obtained the highest Cohen’s kappa coefficient for the DWT+ICA method. Hence, it can be inferred that the classifier performance is more consistent for the DWT+ICA method than for the other techniques. Figures 6 and 7 show the box plots of class-specific accuracy (%) for the five ECG classes and overall accuracy (%), using the DWT+ICA and DCT+PCA methods, respectively. The line inside each box indicates the median value over the 10 folds. The box width in the box plot of overall accuracy for the DWT+ICA methodology is less than for the other methods. Hence, the DWT+ICA


Fig. 5. Plot of energy (%) versus PCs using EMD method: (a) IMF1 and (b) IMF2.

Table 3. Confusion matrix of classification for DCT+PCA method over the 10 folds (rows: gold standard; columns: classified output).

Gold standard        N       S       V       F       U
N               90,516      18      21      14       6
S                   26    2925      12       7       2
V                   66      38    7560      33      10
F                   32       7      13    1729       3
U                    3       5       7       6    7034
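The class-specific and overall accuracy measures of Sec. 3.7 can be evaluated directly on the pooled DCT+PCA confusion matrix of Table 3. In the minimal sketch below the rows are taken as gold-standard classes, the orientation for which the row sums equal the class totals of Table 1; the function names are illustrative assumptions. The pooled values agree with the DCT+PCA entries of Table 9 to within about 0.02 percentage points.

```python
CLASSES = ["N", "S", "V", "F", "U"]

# Rows: gold-standard class; columns: classifier output (DCT+PCA, Table 3).
dct_pca = [
    [90516, 18, 21, 14, 6],
    [26, 2925, 12, 7, 2],
    [66, 38, 7560, 33, 10],
    [32, 7, 13, 1729, 3],
    [3, 5, 7, 6, 7034],
]

def class_specific_accuracy(matrix):
    """Correctly detected beats of each class over all beats of that class (%)."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]

def overall_accuracy(matrix):
    """Correctly detected beats of all classes over all beats (%)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / sum(sum(row) for row in matrix)

for name, acc in zip(CLASSES, class_specific_accuracy(dct_pca)):
    print(f"{name}: {acc:.2f}%")
print(f"Overall: {overall_accuracy(dct_pca):.2f}%")
```

The same two functions can be applied unchanged to the other five confusion matrices.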

Table 4. Confusion matrix of classification for DCT+ICA method over the 10 folds (rows: gold standard; columns: classified output).

Gold standard        N       S       V       F       U
N               90,322      60      98      64      31
S                  107    2798      44      16       7
V                  241      72    7293      71      30
F                  172      34      63    1512       3
U                   29      11      18      11    6986

method provides a consistent estimate of accuracy (%) in all the 10 folds. Also, the box width in the box plot of accuracy for Fusion (F) beats is greater than that of the other classes, which indicates that fewer F beats are detected and that the range of variation over the 10-fold cross-validation is larger. This is due to the skewed data (the numbers of samples in the five classes are unequal) considered in this study, which leads

Table 5. Confusion matrix of classification for DWT+PCA method over the 10 folds (rows: gold standard; columns: classified output).

Gold standard        N       S       V       F       U
N               89,419     805     211     121      19
S                 1106    1831      30       2       3
V                  406      62    7139      89      11
F                  241       5      85    1404      49
U                   33       3      11      25    6983

Table 6. Confusion matrix of classification for DWT+ICA method over the 10 folds (rows: gold standard; columns: classified output).

Gold standard        N       S       V       F       U
N               90,535       5      26       2       7
S                    5    2924      41       1       1
V                   27      62    7601      10       7
F                    4       1      27    1743       9
U                    3       0      12       5    7035
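Cohen's kappa can likewise be computed from a pooled confusion matrix as the observed agreement corrected for the agreement expected by chance. The sketch below applies the standard formula to the DWT+ICA matrix of Table 6, with rows taken as gold-standard classes; the function name is an illustrative assumption and the code is not the kappa toolbox used by the authors. The result is close to the value of 0.9926 reported for this method.

```python
def cohen_kappa(matrix):
    """Cohen's kappa from a square confusion matrix of counts."""
    total = sum(sum(row) for row in matrix)
    # Observed agreement: fraction of beats on the main diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    return (observed - expected) / (1 - expected)

# Rows: gold-standard class; columns: classifier output (DWT+ICA, Table 6).
dwt_ica = [
    [90535, 5, 26, 2, 7],
    [5, 2924, 41, 1, 1],
    [27, 62, 7601, 10, 7],
    [4, 1, 27, 1743, 9],
    [3, 0, 12, 5, 7035],
]
print(f"kappa = {cohen_kappa(dwt_ica):.4f}")
```

Because the Non-ectopic class dominates the dataset, the chance agreement is high (about 0.69 here), which is why kappa is a stricter figure than the raw overall accuracy.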

Table 7. Confusion matrix of classification for EMD+PCA method over the 10 folds (rows: gold standard; columns: classified output).

Gold standard        N       S       V       F       U
N               88,932     647     436     252     308
S                  795    1904      96      43     134
V                  892     114    6441     137     123
F                  677     121     168     659     159
U                  623     109      92     111    6120

Table 8. Confusion matrix of classification for EMD+ICA method over the 10 folds (rows: gold standard; columns: classified output).

Gold standard        N       S       V       F       U
N               88,417     797     632     345     384
S                  954    1800      90      43      85
V                 1267     144    5990     131     175
F                  859     116     157     560      92
U                  622      63     136      78    6156


Table 9. Results of class-specific accuracy, overall accuracy and kappa statistic using different methods over the 10 folds.

Methods used    N (%)    S (%)    V (%)    F (%)    U (%)    Overall (%)    kappa (κ)
DCT+PCA         99.93    98.41    98.09    96.93    99.70    99.70          0.9904
DCT+ICA         99.72    94.15    94.63    84.80    99.02    98.93          0.9655
DWT+PCA         98.72    61.61    92.63    78.7     98.98    96.99          0.9024
DWT+ICA         99.96    98.39    98.63    97.71    99.72    99.77          0.9926
EMD+PCA         98.18    64.11    83.56    36.86    86.75    94.52          0.8194
EMD+ICA         97.61    60.6     77.71    31.25    87.26    93.41          0.7845

Fig. 6. Box plot of class-specific accuracy (%) and overall accuracy (%) using DWTþICA method for k-NN classifier.

to overfitting of the classification model and hence results in a bias toward the class with more data samples.

5. Discussion

The objective of this paper is to evaluate the performance of three feature extraction methods, viz. DCT, DWT and EMD, and two dimensionality reduction approaches (PCA and ICA) for classification of five ECG arrhythmia classes. In our approach, the complexities in the ECG classes are extracted using linear (DCT) and non-linear (DWT and EMD) transform domain representations. The feature extraction methods applied are summarized as follows:

(i) DCT, a linear transform domain method, has the capacity to pack the spectrum of ECG beats within a few coefficients. Hence, it permits rejection of very small amplitude


Fig. 7. Box plot of class-specific accuracy (%) and overall accuracy (%) using DCTþPCA method for k-NN classifier.

coefficients without introducing any changes in the ECG beats. In this work, the 200 samples of each ECG beat are compressed to 67 (one-third) coefficients. PCA is then applied to these 67 features to obtain 12 PCs. These 12 PCs yielded an average accuracy of 99.7% using the k-NN classifier.

(ii) The DWT representation is compact, and hence most of the information is confined to a few sub-bands. In this study, during DWT sub-band decomposition, the 200 samples of each ECG beat are reduced to 31 third-level detail coefficients and 19 fourth-level detail coefficients. These DWT coefficients are individually subjected to PCA and ICA, and the initial 12 components with maximum variability are fed to the classifier. ICA on DWT achieved the highest accuracy of 99.77%, exceeding that of PCA on DCT.

(iii) EMD is a signal-driven approach [36] that decomposes the given heartbeat signal into a sum of a few Intrinsic Mode Functions (IMFs). These IMFs are adaptive and able to capture the minute non-linear fluctuations in the bio-signal. In this study, to maintain uniformity, the initial two IMFs (IMF1 and IMF2) are considered as features for all five classes. These features are then individually subjected to dimensionality reduction using PCA and ICA. Further, 12 PC features (6 each from IMF1 and IMF2) are classified, giving a maximum average accuracy of 94.52% using k-NN with 10-fold cross-validation. The EMD method is most appropriate for long-duration signals such as EEG [38]. In this study, a short segment of the signal (a beat of 200 samples) is considered. Therefore, the accuracy achieved with EMD is lower than that of the DCT and DWT methods.

Table 10 presents a summary of computer-assisted diagnosis of ECG beats using the MIT-BIH arrhythmia database [41]. The studies reported in the literature are

Table 10. Overview of studies conducted on automated classification of ECG beats using MIT-BIH arrhythmia database [41].

Studies conducted      Features                            Classifier                            Classes   Acc. (%)
Small dataset
Özbay et al. [57]      Fuzzy C means clustering            Neural Network (NN)                   10        98.90
Yu and Chen [58]       DWT                                 Probabilistic neural network (PNN)    6         99.65
Yu and Chou [59]       ICA and RR interval                 PNN                                   8         98.71
Martis et al. [60]     PCA                                 SVM                                   5         98.11*
Martis et al. [61]     PCA on Bispectrum                   SVM                                   5         93.48*
Large dataset
Jiang and Kong [62]    Hermite function and RR interval    NN                                    5         96.60
Ince et al. [63]       PCA on DWT                          Multidimensional PSO                  5         95.58
Martis et al. [64]     ICA on DWT                          PNN                                   5         99.28*
Martis et al. [65]     PCA on DCT                          PNN                                   5         99.52*
In this study          ICA on DWT                          k-NN                                  5         99.77*

*10-fold cross-validation.

divided into two groups based on the size of the dataset used (small and large). It can be inferred from the table that EMD has not been used earlier for automated diagnosis of cardiac abnormalities, which is one of the novel contributions of this work. We have used a large dataset of 110,093 heartbeats to automatically classify the five classes of arrhythmias. In this work, application of ICA on DWT features yielded class-specific accuracies of 99.96%, 98.39%, 98.63%, 97.71% and 99.72% for discriminating Non-ectopic, Supraventricular ectopic, Ventricular ectopic, Fusion and Unknown beats, respectively, using k-NN with 10-fold cross-validation. The salient features of this contribution are as follows:

(i) The signals from the entire MIT-BIH arrhythmia database [41] are used in this work.
(ii) Performance of the proposed method is validated using Cohen’s kappa statistic.
(iii) The results obtained are robust, as we have used the entire database and performed 10-fold cross-validation.
(iv) The developed system is entirely computerized, precise and non-invasive, and requires less intervention by clinicians.
(v) The developed method can be used for mass screening of cardiac health in third world countries. The results of diagnosis can be sent from remote places to hospitals, so that clinicians can cross-check the diagnosis of the developed system.
(vi) The developed software can be extended to other diseases such as diabetes, autism and other abnormalities.


6. Conclusion

In this study, the performance of the DCT, DWT and EMD methods is compared for automated arrhythmia beat detection. Our results show that the DWT technique performs better than the DCT and EMD methods. The unique signatures (significant features) derived from the ECG can characterize unfamiliar cardiac disorders. This study delineates a systematic approach to designing a CACD tool to screen the AAMI-recommended classes with high precision. The proposed decision support pattern recognition system (ICA on DWT) is able to classify five cardiac classes with an overall accuracy of 99.77%. This technique can be extended toward the diagnosis of life-threatening cardiac arrhythmias, which can help to prevent sudden cardiac deaths. Thus, our methodology can reduce the burden on physicians during large population screening and can aid the medical community in providing accurate and fast diagnosis.

Conflicts of Interest

There are no actual or potential conflicts of interest in the submission of this paper.

Acknowledgment

The authors would like to thank NMAM Institute of Technology, Nitte and REVA University, Bengaluru for providing the resources to conduct this experiment.

References

1. Bennett DH, Bennett’s Cardiac Arrhythmias: Practical Notes on Interpretation and Treatment, John Wiley & Sons, United Kingdom, 2013.
2. O’Donnell MJ, Mente A, Smyth A, Yusuf S, Salt intake and cardiovascular disease: Why are the data inconsistent?, European Heart J 34(14):1034–1040, 2013.
3. Feigin VL, Forouzanfar MH, Krishnamurthi R, Mensah GA, Connor M, Bennett DA, Moran AE et al., Global and regional burden of stroke during 1990–2010: Findings from the global burden of disease study 2010, The Lancet 383(9913):245–255, 2014.
4. British Heart Foundation (BHF), Physical Activity Statistics 2015, G1020; Accessed on: 28/01/2015, URL:.
5. Lichtman JH, Froelicher ES, Blumenthal JA, Carney RM, Doering LV, Frasure-Smith N, Freedland KE et al., Depression as a risk factor for poor prognosis among patients with acute coronary syndrome: Systematic review and recommendations: A scientific statement from the American Heart Association, Circulation 129(12):1350–1369, 2014.
6. World Health Organization (WHO), Global Status Report on Noncommunicable Diseases: 2014 Update, World Health Organization, Geneva, pp. 1–298, 2014.
7. Goldberger AL, Clinical Electrocardiography: A Simplified Approach, Mosby, St. Louis, MO, USA, 2012.
8. de Chazal P, Reilly RB, A patient-adapting heartbeat classifier using ECG morphology and heartbeat interval features, IEEE Trans Biomed Eng 53:2535–2543, 2006.
9. Martis RJ, Acharya UR, Adeli H, Current methods in electrocardiogram characterization, Comput Biol Med 48:133–149, 2014.


10. Desai U, Martis RJ, Nayak CG, Sarika K, Nayak SG, Shirva A, Nayak V, Mudassir S, Discrete cosine transform features in automated classification of cardiac arrhythmia beats, Emerging Research in Computing, Information, Communication and Applications, Springer, India, pp. 153–162, 2015.
11. Giri D, Acharya UR, Martis RJ, Sree SV, Lim TC, Ahamed T, Suri JS, Automated diagnosis of coronary artery disease affected patients using LDA, PCA, ICA and discrete wavelet transform, Knowl-Based Syst 37:274–282, 2013.
12. Acharya UR, Faust O, Sree V, Swapna G, Martis RJ, Kadri NA, Suri JS, Linear and non-linear analysis of normal and CAD-affected heart rate signals, Comput Methods Prog Biomed 113(1):55–68, 2014.
13. Faust O, Acharya UR, Krishnan SM, Min LC, Analysis of cardiac signals using spatial filling index and time-frequency domain, BioMed Eng OnLine 3:30, 2004.
14. Patidar S, Pachori RB, Acharya UR, Automated diagnosis of coronary artery disease using tunable-Q wavelet transform applied on heart rate signals, Knowl-Based Syst 82:1–10, 2015.
15. Acharya UR, Fujita H, Sudarshan VK, Sree VS, Eugene LW, Ghista DN, San Tan R, An integrated index for detection of sudden cardiac death using discrete wavelet transform and non-linear features, Knowl-Based Syst 83:149–158, 2015.
16. Acharya UR, Vidya KS, Ghista DN, Lim WJ, Molinari F, Sankaranarayanan M, Computer-aided diagnosis of diabetic subjects by heart rate variability signals using discrete wavelet transform method, Knowl-Based Syst 81:56–64, 2015.
17. Acharya UR, Fujita H, Sudarshan VK, Bhat S, Koh JE, Application of entropies for automated diagnosis of epilepsy using EEG signals: A review, Knowl-Based Syst 88:85–96, 2015.
18. Faust O, Acharya UR, Molinari F, Chattopadhyay S, Tamura T, Linear and non-linear analysis of cardiac health in diabetic subjects, Biomed Signal Process Control 7(3):295–302, 2012.
19. Acharya UR, Faust O, Kadri NA, Suri JS, Yu W, Automated identification of normal and diabetes heart rate signals using non-linear measures, Comput Biol Med 43(10):1523–1529, 2013.
20. Acharya UR, Kannathal N, Hua LM, Yi LM, Study of heart rate variability signals at sitting and lying postures, J Bodyw Mov Ther 9(2):134–141, 2005.
21. Frankel P, Rothmeier J, James D, Quaynor N, A computerized system for ECG monitoring, Comput Biomed Res 8(6):560–567, 1975.
22. Nygårds M-E, Hulting J, An automated system for ECG monitoring, Comput Biomed Res 12(2):181–202, 1979.
23. Hultgren HN, Shettigar UR, Specht DF, Clinical evaluation of a new computerized arrhythmia monitoring system, Heart Lung 4:241–251, 1975.
24. Cerutti S, Marchesi C, Advanced Methods of Biomedical Signal Processing, Vol. 27, John Wiley & Sons, 2011.
25. Akay M, Nonlinear Biomedical Signal Processing Vol. II: Dynamic Analysis and Modeling, Wiley-IEEE Press, 2000.
26. Kannathal N, Acharya UR, Ng EY, Krishnan SM, Min LC, Laxminarayan S, Cardiac health diagnosis using data fusion of cardiovascular and haemodynamic signals, Comput Methods Programs Biomed 82(2):87–96, 2006.
27. Übeyli ED, Combining recurrent neural networks with eigenvector methods for classification of ECG beats, Digital Signal Process 19(2):320–329, 2009.
28. Übeyli ED, Usage of eigenvector methods in implementation of automated diagnostic systems for ECG beats, Digital Signal Process 18(1):33–48, 2008.


29. Ebrahimzadeh A, Shakiba B, Khazaee A, Detection of electrocardiogram signals using an efficient method, Appl Soft Comput 22:108–117, 2014.
30. Martis RJ, Acharya UR, Lim CM, Mandana KM, Ray AK, Chakraborty C, Application of higher order cumulant features for cardiac health diagnosis using ECG signals, Int J Neural Syst 23(4):1350014, 2013.
31. Ge D, Srinivasan N, Krishnan SM, Cardiac arrhythmia classification using autoregressive modeling, BioMed Eng Online 1(5):1–12, 2002.
32. Melgani F, Bazi Y, Classification of electrocardiogram signals with support vector machines and particle swarm optimization, IEEE Trans Inform Technol Biomed 12(5):667–677, 2008.
33. Mishra AK, Raghav S, Local fractal dimension based ECG arrhythmia classification, Biomed Signal Process Control 5(2):114–123, 2010.
34. Daamouche A, Hamami L, Alajlan N, Melgani F, A wavelet optimization approach for ECG signal classification, Biomed Signal Process Control 7(4):342–349, 2011.
35. Doğan B, Korürek M, A new ECG beat clustering method based on kernelized fuzzy c-means and hybrid ant colony optimization for continuous domains, Appl Soft Comput 12(11):3442–3451, 2012.
36. Huang NE, Shen Z, Long SR, Wu MC, Shih HH, Zheng Q, Yen NC, Tung CC, Liu HH, The empirical mode decomposition and the Hilbert spectrum for non-linear and non-stationary time series analysis, Proc R Soc Lond A: Math, Phys Eng Sci 454:903–995, 1998.
37. Acharya UR, Vinitha Sree S, Swapna G, Martis RJ, Suri JS, Automated EEG analysis of epilepsy: A review, Knowl-Based Syst 45:147–165, 2013.
38. Martis RJ, Acharya UR, Tan JH, Petznick A, Yanti R, Chua CK, Ng EYK, Tong L, Application of empirical mode decomposition (EMD) for automated detection of epilepsy using EEG signals, Int J Neural Syst 22(06):1250027, 2012.
39. Pachori RB, Avinash P, Shashank K, Sharma R, Acharya UR, Application of empirical mode decomposition for analysis of normal and diabetic RR-interval signals, Expert Syst Appl 42(9):4567–4581, 2015.
40. Pal S, Mitra M, Empirical mode decomposition based ECG enhancement and QRS detection, Comput Biol Med 42(1):83–92, 2012.
41. Moody GB, Mark RG, The impact of the MIT-BIH arrhythmia database, IEEE Eng Med Biol Mag 20:45–50, 2001.
42. ANSI/AAMI EC57; Testing and reporting performance results of cardiac rhythm and ST segment measurement algorithms (AAMI Recommended Practice/American National Standard), Order Code: EC57–293, 1998.
43. Najarian A, Hoa NP, FastICA.
44. Manuel Ortigueira, EMD.
45. Giuseppe Cardillo, kappa.
46. Mallat SG, A theory for multiresolution signal decomposition: The wavelet representation, IEEE Trans Pattern Anal Mach Intell 11(7):674–693, 1989.
47. Singh BN, Tiwari AK, Optimal selection of wavelet basis function applied to ECG signal denoising, Digital Signal Proc 16(3):275–287, 2006.
48. Pan J, Tompkins WJ, A real-time QRS detection algorithm, IEEE Trans Biomed Eng 32(3):230–236, 1985.
49. Rao KR, Yip P, Discrete Cosine Transform: Algorithms, Advantages, Applications, Academic Press, 2014.


50. Addison PS, Wavelet transforms and the ECG: A review, Physiol Measure 26(5):R155–R199, 2005.
51. Kohler B-U, Hennig C, Orglmeister R, The principles of software QRS detection, IEEE Eng Med Biol Mag 21(1):42–57, 2002.
52. Jolliffe I, Principal Component Analysis, Springer Series in Statistics, Springer, 2002.
53. Hyvärinen A, Oja E, Independent component analysis: Algorithms and applications, Neural Netw 13(4):411–430, 2000.
54. Girden ER, ANOVA: Repeated Measures, Sage Publications, Incorporated, 1991.
55. Duda RO, Hart PE, Stork DG, Pattern Classification, John Wiley & Sons, 2012.
56. Cohen J, A coefficient of agreement for nominal scales, Educ Psychol Measure 20(1):37–46, 1960.
57. Özbay Y, Ceylan R, Karlik B, A fuzzy clustering neural network architecture for classification of ECG arrhythmias, Comput Biol Med 36(4):376–388, 2006.
58. Yu S-N, Chen Y-H, Electrocardiogram beat classification based on wavelet transformation and probabilistic neural network, Pattern Recognit Lett 28(10):1142–1150, 2007.
59. Yu S-N, Chou K-T, Integration of independent component analysis and neural networks for ECG beat classification, Expert Syst Appl 34(4):2841–2846, 2008.
60. Martis RJ, Acharya UR, Mandana KM, Ray AK, Chakraborty C, Application of principal component analysis to ECG signals for automated diagnosis of cardiac health, Expert Syst Appl 39:11792–11800, 2012.
61. Martis RJ, Acharya UR, Mandana KM, Ray AK, Chakraborty C, Cardiac decision making using higher order spectra, Biomed Signal Process Control 8(2):193–203, 2013.
62. Jiang W, Kong GS, Block-based neural networks for personalized ECG signal classification, IEEE Trans Neural Netw 18:1750–1761, 2007.
63. Ince T, Kiranyaz S, Gabbouj M, A generic and robust system for automated patient-specific classification of ECG signals, IEEE Trans Biomed Eng 56:1415–1426, 2009.
64. Martis RJ, Acharya UR, Lim CM, Suri JS, Characterization of ECG beats from cardiac arrhythmia using discrete cosine transform in PCA framework, Knowl-Based Syst 45:76–82, 2013.
65. Martis RJ, Acharya UR, Min LC, ECG beat classification using PCA, LDA, ICA and discrete wavelet transform, Biomed Signal Process Control 8(5):437–448, 2013.
