
Journal of Medical Systems, Vol. 29, No. 3, June 2005 (© 2005). DOI: 10.1007/s10916-005-5187-4

Use of Support Vector Machines and Neural Network in Diagnosis of Neuromuscular Disorders

Nihal Fatma Güler¹,² and Sabri Koçer¹

In this study, a support vector machine (SVM) and a back-propagation neural network were applied to the classification of electromyogram (EMG) signals obtained from normal, neuropathy and myopathy subjects. Autoregressive (AR) modeling was used to obtain AR coefficients from the EMG signals, and these coefficients served as inputs to both base classifiers, the SVM and the artificial neural network (ANN). The results show that the SVM achieves a high level of predictive performance in the diagnosis of neuromuscular disorders and that its test performance is higher than that of the ANN.

KEY WORDS: back-propagation (BP); artificial neural network (ANN); support vector machine (SVM); electromyogram (EMG).

INTRODUCTION

The electromyogram (EMG) is a signal that measures the electrical activity in a muscle.(1) Clinical analysis of the EMG is a powerful tool for assisting the diagnosis of neuromuscular disorders such as myopathy and neuropathy.(2) The advent of computer technology allowed scientists to renew their efforts in EMG analysis so as to improve the assessment of neuromuscular disorders.(3) Computer-aided EMG processing saves time, standardizes the measurement and enables the extraction of EMG features that could not be calculated manually, which in turn permits more reliable classification. The parameters of stochastic models such as autoregressive (AR) or autoregressive moving average (ARMA) models can be used as a feature set.(4−6) In clinical EMG, AR analysis of the motor unit action potential (MUAP) was suggested as a diagnostic tool by Coatrieux,(7) Maranzana et al.(8) and Basmajian et al.(9)

¹ Department of Electronics and Computer Education, Faculty of Technical Education, Gazi University, Teknikokullar, Ankara, Turkey.
² To whom correspondence should be addressed; e-mail: [email protected].
0148-5598/05/0600-0271/0 © 2005 Springer Science+Business Media, Inc.


Artificial intelligence is a valuable tool in the medical field for the development of decision support systems.(10) It has been widely applied to nonlinear statistical modeling of large and complex databases of medical information.(11) The feedforward multilayer perceptron (MLP) neural network is a popular model in applications of artificial intelligence.(12) Abel et al.(13) examined the use of different neural network types and training methods in the electromyographic diagnosis of neuromuscular disorders. Savelberg et al.(14,15) used a neural network with the back-propagation algorithm to predict dynamic tendon forces from EMG signals. An automatic diagnostic tool for neuromuscular disorders based on feature extraction and classification of myoelectric patterns with a neural network is described by Kumaravel et al.(16) Nussbaum et al.(17) used an artificial neural network (ANN) model to predict muscle activity in response to steady-state external moment loads.(18,19) Pattichis et al.(20) examined AR and cepstral analysis for MUAP classification and disorder identification using a neural network.

Support vector machines (SVMs), developed by Vapnik,(21) have brought a great advance in solving classification and pattern recognition problems. In recent years, a number of nonlinear classification and regression SVMs have been developed and benchmarked against ANNs. The empirical performance of SVMs has been found to be generally as good as the best ANN solution. It has been hypothesized that this is because the SVM approach has fewer model parameters to optimize, which reduces the possibility of overfitting the training data and thus increases the actual performance.(22) SVM applications remain rather few in bioengineering and, in particular, in neurology.(23) The present contribution is an attempt to apply the SVM to the classification of EMG signals.
In this study, the classification performances of a support vector machine and a neural network in the diagnosis of neuromuscular disorders were compared. EMG signals from myopathy, neuropathy and normal subjects were used to analyze diagnostic performance. The feature sets obtained from Burg AR analysis of the EMG signals served as input parameters for the SVM and the ANN. These input parameters were divided into training, cross-validation and test sets. Each classifier was then trained and tested, and in the testing phase the classification performances of the SVM and the ANN were compared.

MATERIALS AND METHODS

Hardware

Figure 1 shows the architecture of the system. EMG signals are taken from the muscle fibers of patients using both needle and surface electrodes. These signals are then amplified and filtered with a 20–1000 Hz bandpass filter. A direct interface between the circuits and the computer is provided by a National Instruments DAQ board (PCI-MIO-16XE-10). Signals are digitized by A/D conversion with 16-bit resolution, and a sampling frequency of up to 100 kHz may be chosen as required. The data at the output of the A/D converter are stored on the hard disk of the computer. The stored signal is processed to generate the feature sets that the SVM and the ANN use to classify the EMG signals as belonging to myopathy, neuropathy or normal subjects.

Fig. 1. System architecture.

Experimental Protocol

All measurements from the patients and the control group were made in the Neurology Department of Hacettepe University Medical School. The nerves and muscles examined were carefully chosen by expert physicians. Diagnoses were based on the assessment of the clinical team; where required, a muscle biopsy was also performed. Neuromuscular diseases such as myopathy and neuropathy were evaluated in the subjects by expert physicians. EMG data collected from 59 subjects were analyzed: 19 normal subjects, 20 subjects suffering from myopathy and 20 subjects suffering from neuropathy. The mean age of the subjects is 28 ± 0.5 years (range 2 months to 58 years).

The filter settings on the EMG machine were 20 Hz for the lower cutoff frequency and 1 kHz for the upper cutoff frequency. The sampling frequency was set to 5 kHz for all runs, and the recording period was 4 s. Recordings were made with the patients lying comfortably in the supine position in a quiet laboratory room. For surface recordings, two different bar electrodes were used: a 40 mm sensory bar electrode for adults and a 20 mm sensory bar electrode for children and infants. Two muscle groups were selected for measurement: the biceps and the hypothenar group. At the biceps, the anode of the surface electrode was placed on the tendon and the cathode on the belly of the muscle, and recording was done during maximum forearm flexion. At the hypothenar group, the anode was placed on the head of the fifth metacarpal and the cathode on the belly of the abductor digiti minimi muscle, and recording was done during maximum finger abduction.
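As a small worked check of the recording parameters stated above, the following sketch computes the number of samples in one recording and confirms that the upper filter cutoff lies below the Nyquist frequency. The constant names are illustrative and are not taken from the original acquisition software.

```python
# Recording parameters as stated in the protocol.
SAMPLING_RATE_HZ = 5_000   # sampling frequency used for all runs
RECORD_SECONDS = 4         # recording period
HIGH_CUTOFF_HZ = 1_000     # upper cutoff of the band-pass filter

# Number of samples available to the AR analysis per recording.
samples_per_recording = SAMPLING_RATE_HZ * RECORD_SECONDS

# The highest frequency representable at this sampling rate.
nyquist_hz = SAMPLING_RATE_HZ / 2

print(samples_per_recording)         # 20000
print(nyquist_hz >= HIGH_CUTOFF_HZ)  # True
```

Each 4 s recording therefore provides 20,000 samples, and the 1 kHz pass band is well inside the 2.5 kHz Nyquist limit.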



EMG Signal Processing

There are several methods for recognizing patterns in EMG signals. Among them, AR modeling has been one of the most popular for classifying EMG signal patterns.(4−9) Autoregressive modeling can be applied to a nonstationary process provided that the process is locally stationary or wide-sense stationary. It has become a common technique for parameterizing linear systems.(24) The model is based on the linear difference equation

$$x(n) = -\sum_{k=1}^{p} a_k\, x(n-k) + e(n), \qquad n = 0, 1, \ldots, N-1 \tag{1}$$

where $x(n)$ are the signal samples, $a_k$ the AR coefficients, $p$ the model order and $e(n)$ the time series of residuals. The power spectral density of the input signal becomes

$$P_{xx}(f) = \frac{\sigma_w^2}{\left| 1 + \sum_{k=1}^{p} a_k\, e^{-j 2\pi f k} \right|^2} \tag{2}$$

where $\sigma_w^2$ is the total prediction error, which is obtained while computing the $a_k$. Spectral estimation through AR modeling consists of determining the $a_k$ parameters, which requires solving a linear system whose coefficients are given by the autocorrelation matrix of the signal. To evaluate the autocorrelation matrix, time samples of the signal before and after the considered time window have to be generated, and the various methods differ in how they extend the sample set. We selected the Burg method since it is computationally efficient, yields stable estimates, and can be optimized for real-time applications. Burg's method estimates the autocorrelation matrix without zero padding (as done, for example, by the Yule–Walker algorithm); instead it extrapolates the time samples so as to maximize the entropy of the process.(25) This approach reduces the possible spectral leakage of strong clutter components, which might otherwise obscure weak informative signals.

Selection of a suitable AR order is of critical importance to the overall accuracy of the system, so a systematic approach to order selection was adopted. The model order was obtained by minimizing the Akaike Information Criterion (AIC)(26) given by

$$\mathrm{AIC}(p) = N \ln(\rho_p) + 2p \tag{3}$$

where $p$ is the model order, $N$ the number of data points and $\rho_p$ the error variance for model order $p$.
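The estimation steps described in this section can be sketched in a few lines of standard-library Python. This is a minimal illustration of a textbook Burg recursion, AIC-based order selection as in Eq. (3), and evaluation of the spectrum in Eq. (2); it is not the authors' implementation. The function names, the synthetic AR(2) test signal and the use of the Burg error variance as $\sigma_w^2$ are assumptions made for the example.

```python
import cmath
import math
import random


def burg_ar(x, order):
    """Burg estimate for x(n) = -sum_k a_k x(n-k) + e(n).

    Returns (a, rho): the coefficients a_1..a_p and the prediction
    error variance rho_p.
    """
    n = len(x)
    f = list(x)                      # forward prediction errors
    b = list(x)                      # backward prediction errors
    a = [1.0]                        # polynomial 1 + a_1 z^-1 + ...
    rho = sum(v * v for v in x) / n  # order-0 error variance
    for m in range(1, order + 1):
        num = sum(f[i] * b[i - 1] for i in range(m, n))
        den = sum(f[i] ** 2 + b[i - 1] ** 2 for i in range(m, n))
        k = -2.0 * num / den         # reflection coefficient
        # Levinson recursion for the coefficients of order m.
        a = [1.0] + [a[i] + k * a[m - i] for i in range(1, m)] + [k]
        # Update both error sequences; walk backwards so that b[i - 1]
        # still holds the order m-1 value when it is read.
        for i in range(n - 1, m - 1, -1):
            f_old = f[i]
            f[i] = f_old + k * b[i - 1]
            b[i] = b[i - 1] + k * f_old
        rho *= 1.0 - k * k
    return a[1:], rho


def aic(n_points, rho, order):
    """Eq. (3): AIC = N ln(rho_p) + 2p."""
    return n_points * math.log(rho) + 2 * order


def select_order(x, max_order):
    """Return the order in 1..max_order with the smallest AIC."""
    return min(range(1, max_order + 1),
               key=lambda p: aic(len(x), burg_ar(x, p)[1], p))


def ar_psd(a, rho, f_norm):
    """Eq. (2) at normalized frequency f_norm (cycles per sample)."""
    denom = 1.0 + sum(ak * cmath.exp(-2j * math.pi * f_norm * (k + 1))
                      for k, ak in enumerate(a))
    return rho / abs(denom) ** 2


# Synthetic test signal (not EMG data): an AR(2) process with
# a_1 = -0.75 and a_2 = 0.5 in the sign convention of Eq. (1).
rng = random.Random(0)
signal = [0.0, 0.0]
for _ in range(2000):
    signal.append(0.75 * signal[-1] - 0.5 * signal[-2] + rng.gauss(0.0, 1.0))
signal = signal[2:]

coeffs, variance = burg_ar(signal, 2)
best_order = select_order(signal, 10)
```

On the synthetic signal the estimated coefficients should lie close to the generating values, and the AIC search should not return an order below 2. For real recordings, `ar_psd` would be evaluated on a grid of frequencies, with `f_norm` equal to the frequency in Hz divided by the sampling rate.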



GLOBAL MACHINE LEARNING MODELS

Neural Network

Multilayer perceptron neural networks are very popular and have been successfully applied in various domains.(27) ANN models were developed for the classification of EMG signals. The motivation for using ANNs lies in their capacity to make no assumptions about the underlying probability density functions of the input data and to find near-optimum solutions from incomplete datasets, and in the fact that learning is accomplished through training. The multilayer perceptron is characterized by a set of input units, a layer of output units and a number of hidden layers. The input to each unit is the sum of the weighted outputs passed from the previous layer, and the unit's output is a function of this sum. The network is trained by varying the connection weights and the neuron threshold values using the back-propagation learning algorithm.(28)

In this study, only one hidden layer was used. Classification performance might increase somewhat if the optimum number of hidden layers and the number of neurons in each layer were determined. Since searching for this optimum delays applications, support vector machine learning, which is based on statistical learning theory, was preferred.

Support Vector Machine

Support vector machine theory provides the most principled approach to the design of neural networks, eliminating the need for domain knowledge.(21) SVM theory applies to pattern classification, regression, or density estimation using an MLP with a single hidden layer. Unlike BP learning, different cost functions are used for pattern classification and regression. Most importantly, the use of SVM learning eliminates the need to select the size of the hidden layer in an MLP. In the latter case, it also eliminates the need to specify the centers of the MLP units in the hidden layer.
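To make the remark about different cost functions concrete, the sketch below shows the two losses most commonly associated with SVMs in the literature: the hinge loss for classification and the epsilon-insensitive loss for regression. The paper does not state which formulations were used, so these functions and the default value of `epsilon` are assumptions for illustration only.

```python
def hinge_loss(y_true, score):
    """Classification cost for a label y_true in {+1, -1}.

    The loss is zero once the sample lies on the correct side of the
    decision boundary with a margin of at least 1.
    """
    return max(0.0, 1.0 - y_true * score)


def epsilon_insensitive_loss(target, prediction, epsilon=0.1):
    """Regression cost: deviations smaller than epsilon are ignored."""
    return max(0.0, abs(target - prediction) - epsilon)


# A correctly classified sample outside the margin costs nothing,
# one inside the margin is penalized linearly.
loss_outside = hinge_loss(+1, 2.0)
loss_inside = hinge_loss(+1, 0.5)
```

Both losses are zero over a whole region rather than at a single point, which is why only a subset of the training samples, the support vectors, ends up determining the solution.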
Simply stated, support vectors are those data points that are most difficult to classify and are optimally separated from each other. The hidden (feature) space is chosen to be of high dimensionality so as to transform a nonlinearly separable pattern classification problem into a linearly separable one. Most importantly, in a pattern classification task, the support vectors are selected by the SVM learning algorithm so as to maximize the margin of separation between classes. The curse-of-dimensionality problem, which can plague the design of multilayer perceptrons, is avoided in SVMs through the use of quadratic programming. This technique, based directly on the input data, is used to solve for the linear weights of the output layer.

Performance Evaluation

To compare the diagnostic accuracy of the different classification methods and groups, receiver operating characteristic (ROC) analysis was used. ROC analysis is an appropriate means of displaying the relationship between sensitivity and specificity when the predictive output for two possibilities is continuous. In its tabular form, the ROC analysis displays the true and false positive and negative totals and the sensitivity and specificity for each listed cutoff value between 0 and 1. ROC curves are a more complete representation of classification performance than a single pair of sensitivity and specificity values.(29)

To analyze the output data obtained from the application, sensitivity (true positive ratio) and specificity (true negative ratio) are calculated from the confusion matrix. Sensitivity is calculated by dividing the number of positive diagnoses that agree with those of the expert physicians by the total number of positive diagnoses made by the expert physicians. The performance of classification can be determined through correct classification (CC), sensitivity (SE) and specificity (SP) analysis:

$$\mathrm{CC} = 100 \times \frac{TP + TN}{N}, \qquad \mathrm{SE} = 100 \times \frac{TP}{TP + FN}, \qquad \mathrm{SP} = 100 \times \frac{TN}{TN + FP} \tag{4}$$

where $N$ is the total number of patients ($TN + TP + FN + FP$), $TP$ the number of true positives, $TN$ the number of true negatives, $FN$ the number of false negatives and $FP$ the number of false positives.

To display the classification performance graphically, the ROC curve is computed from the output data obtained in the test. The curve shows the relation between sensitivity and specificity across correct and incorrect classifications. Furthermore, the performance of the model may be measured by calculating the area under the ROC curve.

RESULTS AND DISCUSSION

In this study, a total of 59 subjects were tested. The patients can be divided into two main groups, infants and adults: subjects aged 7 or younger were classified as infants and those aged 8 or older as adults. The distribution of the patients is summarized in Table I.

Because of the random nature of EMG signals, an AR model was used for the analysis. The Burg method was used to compute the AR coefficients, and the model order was selected using Eq. (3). The major advantages of the Burg method for estimating the parameters of the AR model are that it offers high frequency resolution, yields a stable AR model and is computationally efficient. The AR model order was selected as 50.

Table I. Patient Distribution by Age among Disorders

              Infants (ages 0, 2, ..., 7)   Adults (ages 8, ..., 58)   All
Myopathy                   8                           12               20
Normal                     9                           10               19
Neuropathy                 9                           11               20



Fig. 2. Spectral curve belonging to myopathic infant patient.

The power spectral density was plotted independently for the different groups of patients to check for similarities and variations. These can be seen in the graphs for three different cases: Fig. 2 for myopathy, Fig. 3 for neuropathy and Fig. 4 for normal. As Figs. 2–4 show, between 0 and 100 Hz the highest spectral power density is obtained from the spectral curve of the neuropathic patient, whereas the lowest is obtained from the spectral curve of the myopathic patient. The power spectral density curve of the normal subject lies between these values. Similarly, when the range of spectral power density (dB) between 0 and 1000 Hz in Figs. 2–4 is examined, the largest difference is obtained from the spectral curve of the neuropathic patient and the smallest from that of the myopathic patient; the curve of the normal subject again lies between these values.

Fig. 3. Spectral curve belonging to neuropathic adult patient.

Fig. 4. Spectral curve belonging to normal person.

To obtain a quick and more accurate diagnosis, the AR coefficients corresponding to these spectral curves were used as the input vectors for the ANN and the SVM. Twenty-seven of the 59 subjects were used as the training set, 27 as the test set and 5 as the cross-validation set. The desired output values for the input vectors of the EMG signals were defined as the vectors [1 0 0] (myopathy), [0 1 0] (neuropathy) and [0 0 1] (normal).

A three-layer network can approximate any reasonable function to any required degree of precision as long as the hidden layer can be arbitrarily large.(30) Determining the appropriate number of hidden nodes in each layer is one of the most critical tasks in ANN design. Therefore, the number of hidden-layer neurons was varied between 2 and 32, and the network was trained for each value. The averages of the minimum MSEs were plotted as a function of the number of hidden nodes, as shown in Fig. 5. The MSE curve stabilizes after five neurons, and the lowest MSE value is 0.004.

Traditional knowledge from data modeling and recent developments in learning theory clearly indicate that, after a critical point, an MLP trained with back-propagation will continue to do better on the training set while its test set performance begins to deteriorate.(21) One way to address this problem is to stop the training at the point of maximum generalization, a method called stopping with cross-validation. It has been experimentally verified that the training error always decreases as the number of iterations is increased. If, however, the error is plotted for a set of data on which the network was not trained, the error initially decreases with the number of iterations but eventually starts increasing again. Training should therefore be stopped at the point of the smallest error on the validation set.
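The stopping rule can be illustrated with a short sketch that picks the epoch at which the validation error is smallest. The function name and the two error curves are hypothetical; the numbers are made up to show the typical shape of such curves and are not the values behind Figs. 6 or 8.

```python
def best_stopping_epoch(validation_errors):
    """Return (epoch, error) for the smallest validation error.

    Epochs are counted from 1; ties resolve to the earliest epoch.
    """
    best_index = min(range(len(validation_errors)),
                     key=validation_errors.__getitem__)
    return best_index + 1, validation_errors[best_index]


# Made-up curves: the training error keeps falling, while the
# validation error turns upward after the fourth epoch.
train_mse = [0.30, 0.18, 0.11, 0.07, 0.05, 0.04, 0.03]
valid_mse = [0.32, 0.21, 0.15, 0.12, 0.13, 0.16, 0.20]

stop_epoch, stop_error = best_stopping_epoch(valid_mse)
```

In practice the weights saved at `stop_epoch` are the ones kept for testing, even if training ran for more epochs before the upturn in validation error became apparent.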



Fig. 5. MSE value at different hidden neuron numbers.

An MLP with topology 59-5-3 was trained by back-propagation. Figure 6 shows the learning curve of the ANN, with the errors on the training set and the validation set plotted on the same graph. Training was stopped at the point of the smallest error (0.0445), which was reached after 41 epochs. The elapsed time was 4 s. After the training phase, the neural network was tested by applying EMG data on which it had not been trained.
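A forward pass through a network of this shape can be sketched as follows. The layer sizes 59-5-3 and the one-hot class order (myopathy, neuropathy, normal) come from the text; the logistic activation, the random initial weights and all function names are assumptions for illustration, and no training is performed.

```python
import math
import random

CLASSES = ("myopathy", "neuropathy", "normal")  # order of the one-hot targets


def sigmoid(z):
    """Logistic activation (assumed; the paper does not name one)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """Fully connected layer: weighted sum plus bias, then activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def init_params(n_in, n_hidden, n_out, seed=0):
    """Small random weights and zero biases for a one-hidden-layer MLP."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                for _ in range(rows)]

    return {"w_hidden": matrix(n_hidden, n_in), "b_hidden": [0.0] * n_hidden,
            "w_out": matrix(n_out, n_hidden), "b_out": [0.0] * n_out}


def forward(features, params):
    """Propagate one feature vector through hidden and output layers."""
    hidden = layer(features, params["w_hidden"], params["b_hidden"])
    return layer(hidden, params["w_out"], params["b_out"])


def decode(outputs):
    """Map the output vector to the class with the largest activation."""
    return CLASSES[max(range(len(outputs)), key=outputs.__getitem__)]


params = init_params(59, 5, 3)
features = [0.0] * 59          # placeholder input, not real AR features
outputs = forward(features, params)
label = decode(outputs)
```

Back-propagation would adjust `params` so that `outputs` approaches the one-hot target of each training vector; the decoding step is the same before and after training.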

Fig. 6. MSE curve belonging to Neural Network (stopping with cross validation).

Table II. Confusion Matrix Belonging to ANN

Output/Desired   Neuropathy   Myopathy   Normal
Neuropathy            7           0         2
Myopathy              0           7         2
Normal                0           0         9

The confusion matrix gives a picture of the classification performance. In this table, the rows represent the desired classification and the columns the actual network output. From the confusion matrix in Table II, the percentages of correct classification (CC), sensitivity (SE) and specificity (SP) were calculated using Eq. (4): CC is 88.8%, SE is 77.7% and SP is 100%. The area under the ROC curve indicates the classification performance; in Fig. 7 this value is 0.927.

The same training, test and cross-validation sets were also applied to the SVM. The most important criterion is the choice of the number of training iterations, and cross-validation was used to stop the training at the optimum level. Figure 8 shows the errors on the training set and the validation set on the same graph. As seen in Fig. 8, SVM training was stopped at 170 epochs, the number of epochs being determined by cross-validation. The minimum MSE on the training set is 0.0415 at the end of 170 epochs. The elapsed time was 18 s. After the training phase, the SVM was tested by applying EMG data on which it had not been trained. The confusion matrix of the test results is shown in Table III.

Fig. 7. ROC curve belonging to ANN.
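The area under an ROC curve such as the one in Fig. 7 can be obtained by sweeping the cutoff over the classifier scores and integrating with the trapezoidal rule. The sketch below is a generic illustration; the function names are hypothetical, and the labels and scores are made-up values rather than outputs of the classifiers in this study.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points.

    labels are 1 (positive) or 0 (negative); a sample is called
    positive when its score is at least the current cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= cutoff and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up example: four positive and three negative cases.
labels = [1, 1, 1, 0, 1, 0, 0]
scores = [0.90, 0.80, 0.70, 0.60, 0.55, 0.30, 0.10]

auc = area_under_curve(roc_points(labels, scores))
```

An area of 1.0 corresponds to perfect separation of the two groups and 0.5 to chance-level ranking, which is the scale on which the reported areas are read.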



Fig. 8. MSE curve belonging to SVM (stopping with cross-validation).

From the confusion matrix in Table III, the percentages of correct classification (CC), sensitivity (SE) and specificity (SP) were calculated using Eq. (4): CC is 94.4%, SE is 88.8% and SP is 100%. The ROC curve in Fig. 9 shows the performance of the support vector machine; the area under this curve is 0.954.

When Figs. 6 and 8 are examined, the MSE of the ANN after 41 epochs is 0.0445, whereas the MSE of the SVM after 170 epochs is 0.0415. In terms of elapsed training time, the ANN was faster than the SVM; however, a lower MSE was obtained with the SVM. Overall, the SVM approach can be viewed as a special case of a general regularized ANN with a single layer, which is simpler to control than a general neural network. In particular, the number of basis functions is determined by the SVM constrained optimization algorithm and not by the user. In an ANN, by contrast, the bias/variance tradeoff is difficult to control because the number of hidden units must be determined.

The best percentage of correct classification (94.4) is obtained with the SVM, whereas the correct classification value for the ANN is 91.6. Furthermore, the balance between sensitivity and specificity is better for the SVM than for the ANN. The larger the area under the ROC curve, the higher the probability of making a correct decision. When the ROC curves in Figs. 7 and 9 are examined, the SVM classifier shows the higher performance with an area of 0.954, and the ANN classifier shows slightly lower performance over the entire ROC space with an area of 0.927.

Table III. Confusion Matrix

Output/Desired   Neuropathy   Myopathy   Normal
Neuropathy            8           0         1
Myopathy              0           8         1
Normal                0           0         9
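As a worked check of Eq. (4), the sketch below collapses the confusion matrices of Tables II and III into binary counts and evaluates CC, SE and SP. Two assumptions are made and are not stated in the paper: rows are taken as the desired class and columns as the network output, and "positive" is taken to mean either disorder while "negative" means normal. The function names are illustrative.

```python
# Rows: desired class, columns: network output, both in the order
# neuropathy, myopathy, normal (transcribed from Tables II and III).
ANN_CONFUSION = [[7, 0, 2],
                 [0, 7, 2],
                 [0, 0, 9]]
SVM_CONFUSION = [[8, 0, 1],
                 [0, 8, 1],
                 [0, 0, 9]]


def binary_counts(matrix, normal_index=2):
    """Collapse a 3-class confusion matrix to (TP, TN, FP, FN).

    Assumption: any disorder counts as positive, normal as negative.
    """
    disorders = [i for i in range(len(matrix)) if i != normal_index]
    tp = sum(matrix[d][o] for d in disorders for o in disorders)
    fn = sum(matrix[d][normal_index] for d in disorders)
    fp = sum(matrix[normal_index][o] for o in disorders)
    tn = matrix[normal_index][normal_index]
    return tp, tn, fp, fn


def eq4_metrics(matrix):
    """Evaluate CC, SE and SP of Eq. (4) as percentages."""
    tp, tn, fp, fn = binary_counts(matrix)
    n = tp + tn + fp + fn
    return {"CC": 100.0 * (tp + tn) / n,
            "SE": 100.0 * tp / (tp + fn),
            "SP": 100.0 * tn / (tn + fp)}


ann = eq4_metrics(ANN_CONFUSION)
svm = eq4_metrics(SVM_CONFUSION)
for name, metrics in (("ANN", ann), ("SVM", svm)):
    print(name, {key: round(value, 1) for key, value in metrics.items()})
```

Under these assumptions SE and SP agree with the reported values up to rounding (77.8% and 100% for the ANN, 88.9% and 100% for the SVM). The literal ratio (TP + TN)/N evaluates to 85.2% and 92.6%, whereas the reported CC figures of 88.8% and 94.4% coincide with the mean of SE and SP; the text does not say which convention was applied.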



Fig. 9. ROC curve belonging to SVM.

Despite its slow learning speed, the SVM can provide the best prediction performance, determine the network topology and estimate all the parameters automatically. It is important to note the differences between the SVM and the ANN. First, the SVM always finds a global solution, in contrast to the ANN. Second, the ANN minimizes the empirical training error, whereas the SVM does not; instead, the SVM minimizes the sum of an upper bound on the empirical training error and a penalty term that depends on the complexity of the classifier used. The proposed method therefore seems to be a potentially useful tool for the automated diagnosis of neuromuscular disorders.

CONCLUSION

In this study, the classification performances of the SVM and the ANN were compared for neuromuscular disorders. To examine diagnostic performance, EMG signals from myopathy, neuropathy and normal subjects were used. To reduce the computational cost and make the diagnosis faster and easier, AR coefficients were used as input parameters for the SVM and the ANN. In both classification systems, the sizes of the training and test sets were kept equal so that the results of the SVM and the ANN could be compared fairly. When the test performances of both classification systems are examined, the highest percentage of correct classification (94.4) is found for the SVM; for the ANN, the correct classification value is 91.6. The SVM shows the higher performance with an area of 0.954 under the ROC curve, and the ANN shows slightly lower performance over the entire ROC space with an area of 0.927.



The results of the SVM and ANN classification systems are very similar; however, we believe that the SVM is a more practical solution for our application. The SVM has significant advantages over the ANN: it can achieve a trade-off between false positives and false negatives, it always converges to the same solution for a given dataset regardless of the initial conditions, and it removes the danger of overfitting. In conclusion, the SVM learning technique has shown a high level of predictive performance in the diagnosis of neuromuscular disorders, as demonstrated by its test performance in comparison with the ANN technique.

REFERENCES

1. Basmajian, J., and De Luca, C. J., Muscles Alive, Williams & Wilkins, Baltimore, 1985.
2. Deluca, C. J., Towards Understanding the EMG Signal, Ch. 3 of Muscles Alive, fourth edition, Williams & Wilkins, Baltimore, 1978.
3. Stalberg, E., Andreassen, S., Falck, B., Lang, H., Rosenfalck, A., and Trojaborg, W., Quantitative analysis of individual motor unit potentials: A proposition for standardized terminology and criteria for measurement. J. Clin. Neurophysiol. 3(4):313–348, 1986.
4. Cadzow, J. A., ARMA modeling of time series. IEEE Trans. Pattern Anal. Mach. Intell. 1982.
5. Marple, S. L., Digital Spectral Analysis with Application, Prentice-Hall, Englewood Cliffs, NJ, 1987.
6. Graupe, D., and Cline, W. K., Functional separation of EMG signals via ARMA identification methods for prosthesis control purposes. IEEE Trans. Syst. Man Cyber. SM-5:252–259, 1975.
7. Coatrieux, J. L., Interference electromyogram processing. Part II. Experimental and simulated EMG AR modeling. Elect. Clin. Neurophysiol. 23:481–490, 1983.
8. Maranzana, M. F., Molinari, R. R., and Somma-Riva, G., The parameterization of the electromyographic signal: An approach based on simulated EMG signals. Elect. Clin. Neurophysiol. 24:47–65, 1984.
9. Basmajian, J. V., Gopal, D. N., and Ghista, D. N., Electrodiagnostic model for motor unit action potential generation. Am. J. Phys. Med. 64:460–475, 1985.
10. France, F. H. R., and Santucci, G., Perspectives of Information Processing in Medical Application Strategic Issues, Requirements and Option for the European Community, 1991.
11. Frize, M., Ennett, M., Stevenson, M., and Trigg, C. E., Clinical decision support for intensive care unit using ANN. Medical Eng. Phys. 23:217–225, 2001.
12. Basheer, I. A., and Hajmeer, M., Artificial neural networks: Fundamentals, computing, design and application. J. Microbiol. Methods 43:3–31, 2000.
13. Abel, E. W., Zacharia, P. C., Forster, A., and Farrow, T. L., Neural network analysis of the EMG interference pattern. Med. Eng. Phys. 18:12–17, 1996.
14. Savelberg, H. H., and Herzog, W., Prediction of dynamic tendon forces from electromyographic signals: An artificial neural network approach. J. Neurosci. Methods 30,78(1–2):65–74, 1997.
15. Liu, M. M., Herzog, W., and Savelberg, H. H., Dynamic muscle force predictions from EMG: An artificial neural network approach. J. Electromyogr. Kinesiol. 9(6):391–400, 1999.
16. Kumaravel, N., and Kavitha, V., Automatic diagnosis of neuromuscular disease using neural network. Biomed. Sci. Instrum. 90:245–250, 1994.
17. Nussbaum, M. A., Martin, B. J., and Chaffin, D. B., A neural network model for simulation of torso muscle coordination. J. Biomech. 30(3):251–258, 1997.
18. Nussbaum, M. A., and Chaffin, N. B., Evaluation of artificial neural network modeling to predict torso muscle activity. Ergonomics 39(12):1430–1444, 1996.
19. Nussbaum, M. A., Chaffin, D. B., and Martin, B. J., A back-propagation neural network model of lumbar muscle recruitment during moderate static exertions. J. Biomech. 28(9):1015–1024, 1995.
20. Pattichis, C. S., and Elia, G. A., Autoregressive and Cepstral analysis the motor unit potential. Med. Eng. Phys. 405–419, 1999.
21. Vapnik, V. N., Statistical Learning Theory, Wiley Series on Adaptive and Learning Systems for Signal Processing, Communications and Control, Wiley, New York, 1998.
22. Hearst, M., et al., Support vector machines. IEEE Intell. Syst. 13(4), July–August 1998.
23. Millet-Roig, J., Ventura-Galiano, R., Chorro-Gasco, F. J., and Cebrian, A., Support vector machine for arrhythmia discrimination with wavelet transform-based feature selection. Comput. Cardiol. 407–410, 2000.
24. Guler, I., Hardalac, F., and Muldur, S., Determination of aorta failure with the application of FFT, AR and wavelet methods to Doppler technique. Comput. Biol. Med. 31:229–238, 2001.
25. Proakis, J. G., and Manolakis, D. G., Digital Signal Processing: Principles, Algorithms and Applications, 2nd ed., Macmillan Publishing Company, New York, 1992.
26. Akaike, H., A new look at the statistical model identification. IEEE Trans. Autom. Control 19:716–723, 1974.
27. Haykin, S., Neural Network: A Comprehensive Foundation, Macmillan, New York, 1994.
28. Hassoun, M. H., Fundamentals of Artificial Neural Network, MIT Press, Cambridge, MA, 1995.
29. Hanley, J. A., and McNeil, B. J., The meaning and use of the area under the Receiver Operating Characteristic (ROC) curve. Radiology 143:29–36, 1982.
30. Basheer, I. A., and Hajmeer, M., Artificial neural network fundamentals, computing, design and application. J. Microb. Methods 43:3–31, 2000.