A Rough Set Based Approach for ECG Classification

Sucharita Mitra1, M. Mitra2, and B.B. Chaudhuri1

1 Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India
{sucharita r,bbc}@isical.ac.in
2 Department of Applied Physics, Faculty of Technology, University of Calcutta, Kolkata, India
[email protected]

Abstract. An inference engine for classification of Electrocardiogram (ECG) signals is developed with the help of a rule based rough set decision system. For this purpose, an automated data extraction system for ECG strips is developed using a few image processing techniques. A knowledge base is developed after consulting different medical books as well as feedback from reputed cardiologists regarding the interpretation and selection of essential time-plane features of the ECG signal. An algorithm for extraction of different time domain features is also developed with the help of differentiation techniques and syntactic approaches. Finally, a rule-based rough set decision system is generated from these time-plane features to form an inference engine for disease classification. Two sets of rules are generated for this purpose: the first set provides a general separation between normal and diseased subjects, and the second set classifies between different diseases.

Keywords: Rough set, knowledgebase, decision system, Electrocardiogram (ECG).

1 Introduction

In 1889 Waller first developed a method of recording [21, 62] the ECG voltage by the capillary electrometer introduced by Lippman in 1875. The method was improved by using the string galvanometer introduced by Einthoven in 1903. In 1906 Einthoven [16] added a new dimension by introducing the concept of vectors to represent the ECG voltages. He also standardized the electrode locations for collecting ECG signals as right arm (RA), left arm (LA) and left leg (LL), and these locations are known after him as the standard leads of Einthoven or limb leads (see fig. 1). The string galvanometer was replaced by electronic amplifiers around the 1920s, which allowed the use of less sensitive and more rugged recording devices. Next, direct writing recorders, which used ink or pigment from a ribbon to record the ECG trace on a moving paper strip, were introduced around 1946. Later on, a special heat sensitive paper was developed, which is now used almost


Fig. 1. The placement of the bipolar leads (left image) and the exploratory electrode for the unipolar chest leads (right image) in an electrocardiogram (ECG); (RA = right arm, LA = left arm, LL = left leg)

exclusively as the recording medium for electrocardiograms. Modern direct writing electrocardiographs have a frequency range extending over 0-100 Hz, which is quite adequate for clinical ECG recordings. Wilson [70] in 1934, and later Frank [17] in 1955, made considerable progress in the dipole theory of the heart. This was the first significant step toward solving the ECG interpretation problem analytically. The human body is assumed to be a uniform, homogeneous, isotropic conducting medium having the shape of a sphere containing a centric current dipole, which simulates the electrical activity of the heart. In addition, Wilson suggested that the three Einthoven leads be connected by means of three equal external resistors to an external node, now called the Wilson Central Terminal (WCT). Using the centric dipole model [55], the WCT can be shown to be at the electrical zero of the system. He also introduced the unipolar limb leads (AVR, AVL, AVF) and chest leads (V1, V2, ..., V6). These unipolar leads along with Einthoven's bipolar limb leads are used in the modern 12 lead electrocardiogram (see fig. 1). In the late 1960s, low cost microprocessors became available due to the advent of integrated-circuit technology, and development work started that has advanced immensely during the last 40 years. As a result, computerized ECG analysis began to emerge on an experimental basis with the help of low cost, on-line, real time computer systems, since such systems became both economically and technically feasible for clinical use [13]. However, research still continues to boost the sophistication of the methodology and to reduce the dimensions of the hardware so that it becomes more mobile, accurate and helpful for both doctors and patients.

1.1 Raw ECG Data Extraction

The first step of computer analysis is to acquire digitized ECG data. For this purpose, some small processing algorithms are developed which can transfer continuous data recorded on paper into digital time data, corresponding to those obtained by an A/D converter [10, 68, 69]. An instrumentation scheme using the Computer Aided Design and Drafting (AutoCAD) application package was developed by our group to capture the ECG database. Here, a single channel strip chart recorder and a digitizer with tablet attached to the RS 422/432 port of the computer are used as input devices. A digital plotter/printer is used as the output device [23]. An alternative system is reported in [5], and several others may be found in the literature.

1.2 Work on ECG Wave Segment Detection and Feature Extraction

ECG conveys information regarding the electrical function of the heart through the shape of its constituent wave segments, namely the P, QRS, and T waves (see fig. 2). Pattern recognition approaches are widely used for the detection and analysis of these waves. Direct signal pattern analysis, non-linear signal transformations, principal component analysis, and neural network (NN) based techniques are used for ECG pattern recognition and classification [8, 15, 39, 67]. The significant point extraction algorithm, based on the analysis of curvature, is an example of direct signal analysis that helps in both data reduction and pattern matching [11, 36]. Syntactic approaches are also used for ECG waveform analysis [19, 32]. In another study, a learning system is reported to grammatically classify biomedical patterns [20]. Other approaches for P and ST segment detection are reported in [35, 61, 65]. In recent years, the wavelet transform has been used to decompose the ECG signal for detection of the P wave by a neural network [72]. In another case, wavelets are employed to obtain a multiresolution representation of some example patterns for ECG signal structure extraction. Similarly, neural networks trained with the wavelet-transformed templates provide an efficient detector even for temporally varying patterns within the complete time series [66]. Recently, multi-resolution wavelet transform, wavelet decomposition and continuous wavelet transform have been combined for ECG feature extraction [29, 33, 38, 41, 42]. QRS detection is an important task in time-plane ECG analysis that helps in detecting the other time-plane features more accurately. Hidden Markov Models [12], the wavelet transform [30] and Artificial Neural Networks (ANN) [74] have been used for the detection of QRS complexes from ECG signals. A slope vector waveform based QRS detection algorithm, which is ideal for embedded real-time ECG monitoring, is also reported [73]. Various QRS detectors are compared in [18]. Baseline (see fig. 2) detection is another essential task in ECG analysis, which aids the extraction of different time domain features. Most methods for baseline or isoelectric line detection are based on the assumption that the isoelectric level of the signal lies in the area ∼80 ms left of the R-peak, where the first derivative remains zero for at least 10 ms, or is minimum in a 20 ms segment [39, 40].


Fig. 2. A typical cycle of ECG signal

1.3 Studies on ECG Classification and Abnormality Detection

Considerable research has been done to assist cardiologists with their task of diagnosing ECG recordings. The research fields range from abnormality detection to fully automated ECG diagnosing systems. A wide range of techniques has been used, including statistical pattern recognition, Expert Systems, Artificial Neural Networks, Wavelet Transform, and Fuzzy and Neuro-fuzzy Systems. The computer task for ECG interpretation comprises two distinct and sequential phases: feature extraction and classification. A set of signal measurements containing information for the characterization of the waveform is obtained by feature extraction methods. These waveform descriptors are then used to allocate the ECG to one or more diagnostic classes in the classification phase. The classifiers may be heuristic and use rules-of-thumb or employ fuzzy logic as a reasoning tool [14]. A classifier may also be statistical, using complex and even abstract signal feature probabilities to define discriminant functions for class allocation. Many approaches have been proposed to generate Expert Systems for ECG diagnosis [1, 71]. An approach to intelligent ischaemia event detection has been proposed based on ECG ST-T segment analysis, in which ST-T trends are processed by a Bayesian forecasting approach using a multistate Kalman filter [7]. Several ischemia detection methods are proposed in [37, 40, 56]. Hermite functions and Self Organizing Maps (SOM) are used for clustering ECG complexes [34]. Another system for automatic analysis of ECG is proposed in [71]. Expert knowledgebase dependent ischemia detection and automatic ECG interpretation techniques are described in [46, 58, 63]. More recently, artificial neural network techniques have been employed for signal classification [6]. Learning algorithms for two phase and three phase radial basis function (RBF) networks are proposed in [3]. Bi-group Neural Network classifiers are also utilized to examine independent feature vectors of ECG recordings for each diagnostic class, and the outputs from the classifiers are fused together to produce a single result [44].


A method has been developed with wavelet transforms as feature extractor and a Radial Basis Function Neural Network (RBFNN) as classifier for arrhythmia detection [2]. Also, Fuzzy Adaptive Resonance Theory MAP (ARTMAP) is used to classify cardiac arrhythmias [25]. A hybrid neuro-fuzzy system for ECG classification of myocardial infarction is reported in [9]. Over the past few years, rough set theory and granular computation have proved to be another soft computing tool which, in various synergetic combinations with fuzzy logic, artificial neural networks and genetic algorithms, provides a stronger framework to achieve tractability, low cost solutions, robustness and close resemblance to human-like decision making. For example, rough-fuzzy integration is the basis of the Computational Theory of Perceptions (CTP), recently explained by Zadeh, where perceptions are considered to have fuzzy boundaries and granular attribute values. Similarly, to describe different concepts or classes, crude domain knowledge in the form of rules is extracted with the help of rough-neural synergistic integration and encoded as network parameters; thus an initial knowledge-based network for efficient learning is built. In granular computation, every operation is performed on granules (clumps of similar objects or points) rather than on individual data points. As a result, the computation time is greatly reduced. As the methodology has matured, several interesting applications of the theory have surfaced, also in medicine. For example, in a medical setting, sets of interest to approximate could be the set of patients with a certain disease or outcome, or the set of patients responsive to a certain treatment. Pawlak [48] used rough set theory in Bayes' theorem and showed that it can be applied to generate a rule base identifying the presence or absence of disease. Discrete Wavelet Transform and rough set theory have been combined for classification of arrhythmia [31].

2 Basics of Electrocardiogram (ECG)

A pair of surface electrodes placed at two different locations on the body will record a repeating pattern of changes in the electrical "action potential" of the heart. The heart has four chambers, namely the left atrium, left ventricle, right atrium and right ventricle. As action potentials spread from the atria to the ventricles, the voltage measured between these two electrodes will vary in a way that provides a "picture" of the electrical activity of the heart. The nature of this picture can be varied by changing the position of the recording electrodes; different positions provide different perspectives, enabling an observer to gain a more complete picture of the electrical events. The body is a good conductor of electricity because tissue fluids contain a high concentration of ions that move (creating a current) in response to potential differences. Potential differences generated by the heart are thus conducted to the body surface where they are recorded by surface electrodes placed on the skin. The recording is called an electrocardiogram (ECG or EKG). There are two types of ECG recording leads. The bipolar limb leads record the voltage between electrodes placed on the wrists and legs. These bipolar leads


include lead I (right arm to left arm), lead II (right arm to left leg), and lead III (left arm to left leg). In the unipolar leads, voltage is recorded between a single exploratory electrode placed on the body and an electrode that is built into the electrocardiograph and maintained at zero potential (ground). The unipolar limb leads are placed on the right arm, left arm, and left leg; these are abbreviated AVR, AVL, and AVF, respectively. The unipolar chest leads are labeled one through six, starting from the midline position (fig. 1). There are thus a total of twelve standard ECG leads that "view" the changing pattern of the heart's electrical activity from different perspectives. This is important because certain abnormalities are best seen with particular leads and may not be visible with other leads. As shown in fig. 2, each cardiac cycle produces three distinct ECG wave segments designated P, QRS, and T. When a heart muscle cell is stimulated, it begins to depolarize, and the spread of depolarization through the atria causes a potential difference that is indicated by an upward deflection of the ECG line. When nearly half the mass of the atria is depolarized, this upward deflection reaches a maximum value, because the potential difference between the depolarized and unstimulated portions of the atria is at a maximum. When the entire mass of the atria is depolarized, the ECG returns to baseline, because all regions of the atria have the same polarity. In this way, the spread of atrial depolarization creates the P wave. Similarly, conduction of the impulse into the ventricles creates a potential difference that results in a sharp upward deflection of the ECG line, which then returns to the baseline as the entire mass of the ventricles becomes depolarized. The spread of the depolarization into the ventricles is represented by the QRS wave. During this time the atria do repolarize, i.e., return to their resting state, but this event is hidden by the greater depolarization occurring in the ventricles. Finally, repolarization of the ventricles produces the T wave (fig. 2). The T wave may also be notched or inverted in shape. Sometimes another rounded wave, the U wave, follows the T wave. The exact significance of this wave is not clearly known. Functionally it represents the last phase of ventricular repolarization. Normally the prominent direction of the U wave is the same as that of the T wave. Negative U waves sometimes appear with positive T waves. This abnormal situation has been noted in left ventricular hypertrophy and myocardial ischemia.

3 Ischemic Heart Disease (IHD)

Different heart diseases that can be interpreted by ECG may be broadly classified into 4 major classes. They are: (a) Atrial and Ventricular Enlargement or Chamber Enlargement, (b) Ventricular Conduction Disturbance, (c) Ischemic Heart Disease (IHD) and (d) Cardiac Rhythm Disturbance. Statistical surveys indicate that IHD is a major health burden in India and other developing countries. In this paper we concentrate on the analysis and classification of IHD. Fig. 3 shows a cross section through the heart muscle, called the cardium, which has several layers. The innermost layer, called the endocardium, is a layer of smooth lining cells.


Fig. 3. Layers of the Heart Muscle

The myocardium is the mass of heart muscle cells whose coordinated contraction causes the chambers of the heart to contract and pump blood. This next layer is thin in the atria, thicker in the right ventricle and thickest in the left ventricle. The epicardium is a fatty layer on the outer surface of the myocardium. The major coronary blood vessels, the vessels that supply blood to the heart itself, run through the epicardium. The outermost layer is the pericardium, actually two layers with a small amount of lubricating fluid between them, forming the pericardial sac which encloses the entire heart. Myocardial cells require oxygen and other nutrients supplied by the coronary arteries. If severe narrowing or complete blockage of a coronary artery causes the blood flow to be inadequate, then ischemia of the heart muscle develops. If the ischemia is more severe, permanent damage or necrosis (cell death) of a portion of heart muscle may occur. Myocardial Infarction (MI) refers to myocardial necrosis ("heart attack"), which is usually caused by severe ischemia. Myocardial ischemia or infarction may affect the entire thickness of the ventricular muscle (transmural injury) or may be localized to the inner layer of the ventricle (subendocardial ischemia or infarction). Transmural MI often (but not always) produces a typical sequence of ST-T changes and abnormal Q waves (duration 0.04 sec. or more in lead I, all three inferior leads [II, III, aVF], or leads V3 to V6). The ST-T changes can be divided into two phases: the acute phase of transmural MI is marked by ST segment elevations and sometimes tall positive T waves (hyper-acute T waves); the evolving phase is characterized by the appearance of deeply inverted T waves in the leads that showed the hyper-acute T waves and ST elevations.

3.1 Classification of MI

Infarction of the heart generally occurs in the left ventricle, which is cone shaped and divided into 4 regions (Basal, Mid, Apical and Apex) of 17 segments (6 basal,


Fig. 4. 17 standard segments and 4 walls of left ventricular cone

6 medial, 4 apical and the apex) and 4 walls (Anterior, Inferior, Septal and Lateral) [fig. 4]. MI can be classified according to the location of the damage in the walls of the left ventricular cone. Hence, if the infarction or necrosis occurs in the inferior wall, it is classified as inferior wall infarction, and the typical infarction pattern is reflected in leads II, III and AVF. Similarly, damage in the anterior wall is termed anterior wall infarction, and the signals in standard lead I, AVL and all the precordial leads (V1 to V6) show the infarction pattern. Besides these, there are two other walls known as septal (leads V1, V2) and lateral (leads I, AVL, V5, V6). Damage in the septal and lateral surfaces along with the anterior or inferior surfaces may be termed antero-lateral, antero-septal, infero-lateral or infero-septal MI, and the signal from all the leads oriented to these surfaces will show the typical infarction pattern; a sketch of this correspondence follows below.
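This wall-to-lead correspondence can be encoded directly as a lookup table for later rule generation. The following minimal sketch (plain Python, with hypothetical names and a naive matching rule) is one such encoding; it is an illustration, not the paper's classifier.

```python
# Hypothetical encoding of the wall-to-lead correspondence described above.
INFARCTION_LEADS = {
    "inferior": ["II", "III", "aVF"],
    "anterior": ["I", "aVL", "V1", "V2", "V3", "V4", "V5", "V6"],
    "septal":   ["V1", "V2"],
    "lateral":  ["I", "aVL", "V5", "V6"],
}

def localize_mi(abnormal_leads):
    """Return the walls whose oriented leads all show the infarction
    pattern; combined damage (e.g. infero-septal) emerges as multiple hits."""
    hits = [wall for wall, leads in INFARCTION_LEADS.items()
            if set(leads) <= set(abnormal_leads)]
    return "-".join(sorted(hits)) or "unlocalized"

print(localize_mi(["II", "III", "aVF", "V1", "V2"]))  # inferior-septal pattern
```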

4 Rough Sets

The theory of rough sets, introduced by Pawlak [49, 50] in 1982, has lately emerged as a key mathematical tool for managing the ambiguity that arises from vague, noisy or partial information. It is methodologically significant to the domains of artificial intelligence and cognitive sciences, especially in the representation of and reasoning with vague or imprecise knowledge, data classification, rule creation, machine learning, data mining, and knowledge discovery. The theory has also proved to be of substantial consequence in many other areas of application [52, 53, 54].

4.1 Mathematical Basics of Rough-Set Theory

Rough set theory, proposed by Pawlak [51], deals with imprecise or vague concepts. Central to the theory is an information system that can be viewed as a


data table whose columns are labeled by attributes, whose rows are labeled by objects of interest, and whose entries are the attribute values. If U and A are finite, nonempty sets, where U is the universe of objects and A is the set of attributes, then S = (U, A) is an information table in which every attribute a ∈ A is associated with a set Va of its values, called the domain of a. Any subset B of A establishes a binary relation I(B) on U, called an indiscernibility relation, satisfying the following condition: x I(B) y if and only if a(x) = a(y) for every a ∈ B, where a(x) denotes the value of attribute a for object x. Obviously I(B) is an equivalence relation. The family of all equivalence classes of I(B), i.e., the partition determined by B, will be denoted by U/I(B), or simply by U/B; the equivalence class of I(B), i.e., the block of the partition U/B, containing x will be denoted by B(x). If (x, y) ∈ I(B), it is said that x and y are B-indiscernible (indiscernible with respect to B). Equivalence classes of the relation I(B) (or blocks of the partition U/B) are referred to as B-elementary sets or B-granules. In rough set based methods, these granules are the basic building blocks of our knowledge about reality. Unions of B-granules are known as B-definable sets. Now consider X, a proper subset of the universe U. Two sets $B_*(X)$ and $B^*(X)$, called the B-lower and the B-upper approximation of X, respectively, can be defined as

$$B_*(X) = \bigcup_{x \in U} \{B(x) : B(x) \subseteq X\}. \quad (1)$$

$$B^*(X) = \bigcup_{x \in U} \{B(x) : B(x) \cap X \neq \emptyset\}. \quad (2)$$

It is clear that the B-lower approximation of a set is the union of all B-granules that are included in the set, whereas the B-upper approximation of a set is the union of all B-granules that have a nonempty intersection with the set. The set

$$BN_B(X) = B^*(X) - B_*(X)$$

is defined as the B-boundary region of X. If the boundary region of X is the empty set, i.e., $BN_B(X) = \emptyset$, then X is a crisp (exact) set with respect to B. On the other hand, if $BN_B(X) \neq \emptyset$, X is referred to as a rough (inexact) set with respect to B.
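To make these operators concrete, here is a minimal sketch (plain Python, with a hypothetical four-object information table) of the B-granule partition and the lower and upper approximations of equations (1) and (2); it is an illustration, not the paper's implementation.

```python
from collections import defaultdict

def granules(table, B):
    """Partition the universe U into B-elementary sets (granules):
    objects are B-indiscernible iff they agree on every attribute in B."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in B)].add(obj)
    return list(blocks.values())

def lower_upper(table, B, X):
    """B-lower approximation: union of granules contained in X.
    B-upper approximation: union of granules intersecting X."""
    lower, upper = set(), set()
    for g in granules(table, B):
        if g <= X:
            lower |= g
        if g & X:
            upper |= g
    return lower, upper

# Hypothetical information table: four objects, two attributes.
table = {
    "x1": {"a": 1, "b": 0}, "x2": {"a": 1, "b": 0},
    "x3": {"a": 0, "b": 1}, "x4": {"a": 0, "b": 0},
}
X = {"x1", "x3"}
low, up = lower_upper(table, ["a", "b"], X)
print(low, up)  # boundary = up - low; X is rough iff the boundary is nonempty
```

Here x1 and x2 form one granule, so X = {x1, x3} has lower approximation {x3}, upper approximation {x1, x2, x3}, and a nonempty boundary: X is rough.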

4.2 Rough-Set Description

In various rule based practical applications, rough set theory is used for obtaining the optimal number of appropriate rules needed for developing a classifier. From every information system, a minimal subset of attributes, known as a reduct, is generated. Determination of a reduct is a computationally expensive task. Different algorithms are available to generate rules from this reduct. To describe such a system more precisely, consider a decision table expressed as S = (U, C, D), where A, the set of attributes, is partitioned into two classes C, D ⊆ A, called condition and decision attributes respectively. Every x ∈ U determines a sequence c1(x), ..., cn(x), d1(x), ..., dm(x), where {c1, ..., cn} = C (conditions) and {d1, ..., dm} = D (decisions). The sequence is called a decision rule induced by x (in S) and denoted by c1(x), ..., cn(x) → d1(x), ..., dm(x), in short C →x D. Thus, the decision table determines decisions which must be taken when some conditions are satisfied. In other words, each row of the decision table specifies a decision rule which determines decisions in terms of conditions. The number $supp_x(C,D) = |C(x) \cap D(x)|$ is called the support of the decision rule C →x D, and the number $\sigma_x(C,D) = \frac{supp_x(C,D)}{|U|}$ is referred to as the strength of the decision rule C →x D. Every decision rule C →x D is allied with a certainty factor, denoted by cerx(C, D) and defined as

$$cer_x(C,D) = \frac{|C(x) \cap D(x)|}{|C(x)|} = \frac{supp_x(C,D)}{|C(x)|} = \frac{\sigma_x(C,D)}{\pi(C(x))}, \quad \text{where } \pi(C(x)) = \frac{|C(x)|}{|U|}. \quad (3)$$

The certainty factor may be interpreted as the conditional probability that y belongs to D(x) given that y belongs to C(x), symbolically πx(D|C), where y must be an object of the universal set. If cerx(C, D) = 1, then C →x D is called a certain decision rule in S; if 0 < cerx(C, D) < 1, the decision rule is referred to as an uncertain decision rule in S. Besides, a coverage factor of the decision rule is also used, denoted covx(C, D) and defined as

$$cov_x(C,D) = \frac{|C(x) \cap D(x)|}{|D(x)|} = \frac{supp_x(C,D)}{|D(x)|} = \frac{\sigma_x(C,D)}{\pi(D(x))}, \quad \text{where } \pi(D(x)) = \frac{|D(x)|}{|U|}. \quad (4)$$

Similarly, covx(C, D) = πx(C|D). If C →x D is a decision rule, then D →x C is called an inverse decision rule. Inverse decision rules can be employed to provide explanations (reasons) for decisions. Decision rules are normally represented in the form of "if ... then ..." implications, so a decision table can be converted into a set of "if ... then ..." rules, called a decision algorithm. Using this decision algorithm, the optimal rules are generated which are used for the development of the rule based classifier. Another important factor in data analysis is to find the degree of dependency γ(C, D) between condition attributes C and decision attributes D. It can be shown that D depends on C in a degree k (0 ≤ k ≤ 1), written C →k D, if

$$k = \gamma(C,D) = \frac{|POS_C(D)|}{|U|}, \quad (5)$$

where $POS_C(D) = \bigcup_{X \in U/D} C_*(X)$ is known as the positive region of the partition U/D with respect to C. The positive region is actually the set of all elements of U that can be uniquely classified into blocks of the partition U/D by means of C. If k = 0, then D is independent of C. On the other hand, if k = 1, D is fully dependent on C. Values 0 < k < 1 denote partial dependency.
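As an illustration of the above measures, the following sketch (plain Python, with a hypothetical five-row decision table) computes the support, strength, certainty factor (3) and coverage factor (4) of each decision rule, and the dependency degree (5) via the positive region; it is a sketch, not the paper's rule generator.

```python
from collections import Counter, defaultdict

# Hypothetical decision table: (condition-tuple, decision) per object.
rows = [(("yes", "high"), "sick"), (("yes", "high"), "sick"),
        (("no", "low"), "healthy"), (("yes", "low"), "healthy"),
        (("yes", "low"), "sick")]
n = len(rows)

cond = Counter(c for c, _ in rows)   # |C(x)|: size of each condition class
dec = Counter(d for _, d in rows)    # |D(x)|: size of each decision class
both = Counter(rows)                 # |C(x) ∩ D(x)|: support of each rule

for (c, d), supp in both.items():
    strength = supp / n              # sigma_x(C,D)
    certainty = supp / cond[c]       # eq. (3)
    coverage = supp / dec[d]         # eq. (4)
    print(c, "->", d, f"supp={supp} cer={certainty:.2f} cov={coverage:.2f}")

# Dependency degree gamma(C,D) = |POS_C(D)| / |U|, eq. (5): a condition
# class lies in the positive region iff it maps to a single decision.
decisions_per_class = defaultdict(set)
for c, d in rows:
    decisions_per_class[c].add(d)
pos = sum(cond[c] for c, ds in decisions_per_class.items() if len(ds) == 1)
print("gamma(C,D) =", pos / n)
```

In this toy table the class ("yes", "low") maps to two decisions, so its two rules are uncertain (certainty 0.5 each) and the dependency degree is 3/5.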

5 Materials and Methods of Analysis

The block diagram of the developed system is given in fig. 5. The detailed methodologies are described step by step below.

5.1 Development of ECG Data Extraction System

A software system is developed for getting pixel-to-pixel co-ordinate information of every ECG image with the help of a few image processing techniques [43]. For development of the off-line [GUI based] data extraction system, the paper records are scanned by a flat-bed scanner (HP Scanjet 2300C) to form an image database in TIFF format. These TIFF formatted gray tone images are converted into two tone binary images with the help of a global thresholding technique on the gray value histogram [22]. This method almost completely removes the background, including the grid lines of the paper strips, from the actual ECG signal. The remaining dotted portions of the background noise are removed by component labeling [22]. Then a thinning algorithm [22] is applied on the two tone image to avoid repetition of co-ordinate information in the dataset (fig. 6). The pixel-to-pixel co-ordinate information is extracted and calibrated according to the electrocardiographic paper to generate an ASCII datafile. A time (in sec.) vs. millivolt data-file is obtained for each of the 12 lead ECG signals after processing as above [43]. The present database contains ECGs from 85 normal and 85 diseased subjects, of which 50 patients had acute myocardial infarction (MI) and the remaining 35 had myocardial ischemia.

Fig. 5. Block Diagram of the Proposed System
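A minimal sketch of such an extraction pipeline is given below, assuming the scikit-image library as a stand-in for the thresholding, component-labeling and thinning routines of [22]; the file name, scanner resolution, size threshold and calibration constants are hypothetical.

```python
import numpy as np
from skimage import io, filters, measure, morphology

# Load a scanned ECG strip (hypothetical file) as a gray-tone image.
gray = io.imread("ecg_strip.tif", as_gray=True)

# Global thresholding on the gray-value histogram: the dark trace becomes
# foreground, and the lighter background grid is largely removed.
binary = gray < filters.threshold_otsu(gray)

# Component labeling: drop small residual dots of background noise.
labels = measure.label(binary)
sizes = np.bincount(labels.ravel())
sizes[0] = 0                                     # ignore the background label
binary = np.isin(labels, np.flatnonzero(sizes > 50))

# Thinning to a one-pixel-wide trace so that no co-ordinate is repeated.
skeleton = morphology.skeletonize(binary)

# Calibrate pixels to time (s) and amplitude (mV); constants are assumed.
px_per_mm = 11.8                                 # scanner resolution
paper_speed = 25.0                               # mm/s
mv_per_mm = 0.1                                  # assumed 10 mm/mV calibration
rows, cols = np.nonzero(skeleton)
t = cols / (px_per_mm * paper_speed)
mv = (skeleton.shape[0] - rows) * mv_per_mm / px_per_mm
order = np.argsort(t)
np.savetxt("ecg_ascii.dat", np.column_stack([t[order], mv[order]]))
```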


Table 1. Extracted database of an image (paper speed = 25 mm/s, calibration factor = 10 mv/mm, total no. of points = 615, heart rate = 85 beats/min). The table lists the digitized signal as time X (sec.) vs. amplitude Y (mv) sample pairs at 0.0032 s intervals, e.g. (0.0032, 0.36), (0.0064, 0.36), (0.0096, 0.36), ...; the full 615-point listing is abridged here.

Fig. 6. Normal (left column), Myocardial Ischemia (middle column) and Myocardial Infarction (right column) samples: original ECG image [Upper], ECG signal after removal of noise [Middle], ECG signal after thinning [Lower]

The ASCII datafile generated for the image of the ischemic sample shown in fig. 6 is given in Table 1.

5.2 Removal of Noises from ECG Signals

Electrocardiographic signals may be corrupted by different types of noise [18]. Typical examples are: 1. power line interference, 2. electrode contact noise, 3. motion artifacts, 4. muscle contraction (electromyographic, EMG) noise, 5. baseline drift and ECG amplitude modulation with respiration, and 6. electrosurgical noise. All the noises are simulated with the software package Cool Edit Pro offered by Syntrillium Software Corporation. This is done to create a realistic situation for the algorithm. The EMG is simulated by adding random (white) noise to the ECG. An FIR filter based on the Savitzky-Golay algorithm is developed to remove EMG-like white noise from the ECG signals. A 50 Hz sinusoid is modeled as power line interference and added to the ECG. The baseline drift due to respiration is modeled as a sinusoid of frequency 0.15 to 0.4 Hz. A 50 Hz notch filter is designed for rejection of the frequency band due to power line oscillation. The selection of the notch width is very important: it should not affect the notch depth, and hence should be neither too narrow nor too wide. In our experiment, the notch width is fixed at around 4 to 6 Hz, keeping 50 Hz at the center of the notch. Thus the signal is not distorted much by the notch filter; in particular, the low frequency band, which carries the most useful information, remains largely unaffected. Then a high pass filter of critical frequency 0.6 Hz is developed to block the low frequency noise that causes the baseline shift. The fundamental frequency of ECG signals generally varies from 0.8 Hz to 1.8 Hz and the ECG bandwidth is 0.8 Hz to 500 Hz. But in conventional ECG machines this bandwidth is reduced to 0.8 Hz to 80 Hz since the mechanical stylus of ECG machines cannot move


faster. Hence, the 0.6 Hz high pass filter blocks only the noise and passes the original signal. Both these FIR filters are also designed with the Cool Edit Pro software. The abrupt baseline shift is simulated by adding a dc bias for a given segment of the ECG. This noise can be blocked with the help of the high pass filter described above. Since motion artifact is similar to the baseline drift due to respiration, it is not specifically modeled. All of these noises are added to the ECG signal to simulate composite noise. This corrupted ECG signal is passed through all the filters described above to get an almost noise free ECG signal. All noise levels are varied from 10% to 30%, and the generated filters produce good responses in all cases.
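A sketch of an equivalent filtering chain in Python/SciPy is shown below, as a stand-in for the filters designed in Cool Edit Pro; the sampling rate, Savitzky-Golay window length and notch Q are assumed values, the Q chosen to match the stated 4-6 Hz notch width around 50 Hz.

```python
import numpy as np
from scipy.signal import savgol_filter, iirnotch, butter, filtfilt

fs = 500.0                                   # assumed sampling frequency, Hz

def denoise_ecg(x, fs):
    # Savitzky-Golay (FIR) smoothing to suppress EMG-like white noise.
    x = savgol_filter(x, window_length=15, polyorder=3)

    # 50 Hz notch for power-line interference; Q = 10 gives a bandwidth
    # of about 5 Hz, i.e. within the stated 4-6 Hz notch width.
    b, a = iirnotch(w0=50.0, Q=10.0, fs=fs)
    x = filtfilt(b, a, x)

    # 0.6 Hz high-pass to remove baseline drift and abrupt dc shifts.
    b, a = butter(N=2, Wn=0.6, btype="highpass", fs=fs)
    return filtfilt(b, a, x)

# Synthetic corrupted beat: stand-in ECG + 50 Hz hum + drift + white noise.
t = np.arange(0, 2.0, 1 / fs)
clean = np.sin(2 * np.pi * 1.2 * t)
noisy = clean + 0.2 * np.sin(2 * np.pi * 50 * t) \
              + 0.3 * np.sin(2 * np.pi * 0.3 * t) \
              + 0.05 * np.random.randn(t.size)
recovered = denoise_ecg(noisy, fs)
```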

5.3 Time-Plane Features Extraction

Accurate detection of the R-R interval between two consecutive ECG waves is very important for extracting the time based features from ECG signals. For this purpose, the 2nd order derivative of the captured signal is computed by the 5-point Lagrangian interpolation formula for differentiation [27], given below:

$$f'_0 = \frac{1}{12h}\left(f_{-2} - 8f_{-1} + 8f_1 - f_2\right) + \frac{h^4}{30} f^{(v)}(\xi). \quad (6)$$

ξ lies between the extreme values of the abscissas involved in the formula. After squaring the values of the 2nd order derivative, a square-derivative curve having only high positive peaks of small width in the QRS complex region is obtained (fig. 7). A small window of length W is slid over this curve to measure its area, and maximum area is obtained at those peak regions. The local maxima of these peak regions are considered as R-peaks. For this experiment the value of W is set to ∼0.07 sec. The system is tested for both noise free and noisy signals. The levels of all types of noise are increased from 0% to 30%, and 99.5% accuracy in the detection of QRS complexes is still achieved. In order to accurately detect the P wave and ST segments, the isoelectric line must be correctly identified. Most methods employed for this purpose are based on the assumption that the isoelectric level of the signal lies in the area ∼80 ms left of the R-peak, where the first derivative becomes equal to zero. In particular, let $y_1, y_2, \ldots, y_n$ be the samples of a beat [R-R interval], $y'_1, y'_2, \ldots, y'_{n-1}$ be their first differences, and $y_r$ the sample where the R-peak occurs. The isoelectric level samples $y_b$ are then defined if either of the two following criteria is satisfied:

$$y'_{r-j-\mathrm{int}(0.08f)} = 0, \quad j = 1, 2, \ldots, 0.01f \quad (7)$$

or

$$\left|y'_{r-j-\mathrm{int}(0.08f)}\right| \le \min_i \left|y'_{r-i-\mathrm{int}(0.08f)}\right|, \quad i, j = 1, 2, \ldots, 0.02f,$$

where f is the sampling frequency. After detection of the baseline, the location of the P wave is determined from the first derivative of the samples.
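The following sketch (NumPy; the sampling rate and the area threshold are assumed) mirrors the described procedure: the 5-point derivative of equation (6), squaring, a sliding ∼0.07 s area window to localize QRS regions whose local maxima are taken as R-peaks, and a simplified version of the ∼80 ms pre-R isoelectric search of equation (7).

```python
import numpy as np

def detect_r_peaks(y, fs, w_sec=0.07):
    # 5-point Lagrangian derivative, eq. (6), dropping the O(h^4) error term.
    h = 1.0 / fs
    d = np.zeros_like(y, dtype=float)
    d[2:-2] = (y[:-4] - 8 * y[1:-3] + 8 * y[3:-1] - y[4:]) / (12 * h)
    sq = d ** 2                                  # square-derivative curve

    # Sliding-window area: large only over the narrow QRS peaks.
    w = max(1, int(w_sec * fs))
    area = np.convolve(sq, np.ones(w), mode="same")
    mask = area > 0.5 * area.max()               # assumed threshold

    # Each above-threshold run is a QRS region; its local maximum is the R-peak.
    edges = np.flatnonzero(mask[1:] != mask[:-1]) + 1
    peaks = [seg[np.argmax(sq[seg])]
             for seg in np.split(np.arange(y.size), edges) if mask[seg[0]]]
    return np.asarray(peaks)

def isoelectric_level(y, r, fs):
    # Simplified form of eq. (7): in the region ~80 ms left of the R-peak,
    # take the sample of a 20 ms segment where the first difference is
    # smallest (ideally zero) as the isoelectric level.
    dy = np.abs(np.diff(y))
    start = max(0, r - int(0.08 * fs) - int(0.02 * fs))
    j = start + np.argmin(dy[start:start + int(0.02 * fs)])
    return y[j]
```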


Fig. 7. QRS complex or R-R interval Detection

The R wave can be detected very reliably, and for this reason it is used as the starting point for ST segment (fig. 8) processing and for T wave detection. In most algorithms dealing with ST segment processing, it is assumed that the ST segment begins 60 ms after the R-peak in normal sinus rhythm. In the case of tachycardia (RR-interval