EXPERIMENTAL COMPARISON OF FUZZY AND NEURAL NETWORK TECHNIQUES IN LEARNING MODELS OF THE CENTRAL NERVOUS SYSTEM CONTROL

Jordi Cueva, Rene Alquezar and Angela Nebot
Dept. Llenguatges i Sistemes Informatics, Universitat Politecnica de Catalunya, Jordi Girona Salgado, 1-3, Barcelona 08034, Spain.
Phone: (34-3) 4015642  Fax: (34-3) 4016050
[email protected], [email protected], [email protected]

Keywords: Fuzzy systems; Inductive reasoning; Recurrent neural networks; Time-delay neural networks; Model identification; Cardiology.

ABSTRACT: The aim of this work was to apply the fuzzy inductive reasoning (FIR) methodology and both time-delay and recurrent neural networks (TDNNs and RNNs) to induce models of the central nervous system (CNS) control that accurately represent the input/output behavior available from observations of a particular patient. A comparative study of these approaches from the point of view of the predictiveness of the inferred models is presented in this paper. The model obtained by the FIR methodology greatly outperforms those obtained by TDNNs and RNNs in forecasting the output signals of the CNS control.

INTRODUCTION

The cardiovascular system is composed of the hemodynamical system and the Central Nervous System (CNS) control. Whereas the structure and functioning of the hemodynamical system are well known, and a number of quantitative models have already been developed that capture its behavior fairly accurately, the CNS control is at present still not completely understood, and no good deductive models exist that are able to describe the CNS control from physical and physiological principles. The use of other approaches, such as qualitative methodologies or neural networks, may offer an interesting alternative to classical quantitative modeling approaches, such as differential equations and NARMAX techniques, for capturing the behavior of the CNS control (Vallverdu, 93). This paper deals with the inductive inference of models of the CNS control through three different approaches. On the one hand, a qualitative model is obtained by means of the Fuzzy Inductive Reasoning (FIR) methodology. On the other hand, two connectionist models, based on time-delay and recurrent neural networks respectively, are also inferred in order to compare fuzzy and neural network approaches on the task at hand.

THE CENTRAL NERVOUS SYSTEM CONTROL

The central nervous system controls the hemodynamical system by generating the regulating signals for the blood vessels and the heart. These signals are transmitted through bundles of sympathetic and parasympathetic nerves, producing stimuli in the corresponding organs and other body parts. The functioning of the central nervous system is of high complexity and not yet fully understood. This is the reason why many of the cardiovascular system models developed so far have been designed without taking into account the effects of the CNS control. Nevertheless, individual differential equation models for each of the hypothesized control mechanisms have been postulated by various authors (Leaning et al., 83). The CNS control model is composed of five separate controllers: the heart rate controller (HRC), the peripheric resistance controller (PRC), the myocardiac contractility controller (MCC), the venous tone controller (VTC), and the coronary resistance controller (CRC). All five controller models are single-input/single-output (SISO) models driven by the same input variable, namely the Carotid Sinus Pressure. Although the Carotid Sinus Pressure is not easily measurable, it can be extracted from the differential equation model describing the hemodynamics of the cardiovascular system. The five output variables of the controller models are not even amenable to a physiological interpretation, except for the Heart Rate Controller variable, which is the inverse heart rate, measured in seconds between beats.

FUZZY INDUCTIVE REASONING

The inductive reasoning methodology was first developed by George Klir (Klir, 85) as a tool for general system analysis. Fuzzy measures were introduced into the General System Problem Solver in the late eighties, deriving the Fuzzy Inductive Reasoning (FIR) methodology. In the FIR approach, qualitative systems are represented (modeled) by a special class of finite state machines defined by an optimal mask and a behavior matrix, and the episodical behavior of the system is inferred (simulated) by a technique called fuzzy forecasting.

Since fuzzy inductive reasoning, just like all other qualitative reasoning approaches, bases its decisions on discretized (i.e., qualitative) variables, it is necessary to discretize continuous variables by means of a technique called fuzzy recoding before the identification of a qualitative model can be attempted. The fuzzy recoding technique converts quantitative values into qualitative triples. The first element of the triple is the class value, the second element is the fuzzy membership value, and the third element is the side value, which indicates whether the quantitative value lies to the left or to the right of the peak value of the associated membership function.

In the modeling process, it is desired to discover causal relations among the recoded variables that make the resulting state transition matrices as deterministic as possible. If such a relationship is found for every output variable, the behavior of the system can be forecast by iterating through the state transition matrices. The more deterministic the state transition matrices are, the higher the likelihood that the future system behavior will be predicted correctly. A possible relation among the qualitative variables for a two-variable system example could be of the form presented in Eq. (2), where ~f denotes a qualitative relationship. In the FIR methodology, Eq. (2) is represented by the matrix (called a mask) shown in Eq. (3). The FIR optimal mask function evaluates the possible masks and returns the one that has the highest quality in terms of an entropy reduction measure. Once the best model has been identified, it can be applied to the qualitative data matrices, resulting in a particular rule base, which in FIR terminology is called the behavior matrix. Once the behavior matrix and the mask are available, the system's prediction can take place using the FIR inference engine. This process is called fuzzy forecasting. The FIR inference engine is a specialization of the k-nearest-neighbor rule, commonly used in the pattern recognition field, which has proved to be very successful in different areas. For a deeper and more detailed insight into the FIR methodology, the reader is referred to (Cellier et al., 96; Nebot, 94).
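To make the recoding step concrete, the following Python fragment is a minimal sketch of fuzzy recoding into the triple (class, membership, side). It is our own illustration under the assumption of triangular membership functions centered at given landmark values; it is not the actual FIR implementation, and the landmark values used in the example are invented.

import numpy as np

def fuzzy_recode(x, landmarks):
    """Recode a quantitative value x into a qualitative triple
    (class, membership, side), given the peak (landmark) values of the
    class membership functions, sorted in ascending order.
    Illustrative only: triangular membership functions are assumed."""
    landmarks = np.asarray(landmarks, dtype=float)
    cls = int(np.argmin(np.abs(landmarks - x)))   # nearest class (0-based)
    peak = landmarks[cls]
    # width of the triangle towards the neighbouring class on x's side
    if x >= peak and cls + 1 < len(landmarks):
        width = landmarks[cls + 1] - peak
    elif x < peak and cls - 1 >= 0:
        width = peak - landmarks[cls - 1]
    else:
        width = 1.0                               # boundary class fallback
    membership = max(0.0, 1.0 - abs(x - peak) / width)
    side = 'right' if x >= peak else 'left'
    return cls + 1, membership, side              # class values start at 1

# example: a heart-rate value recoded into one of three classes
print(fuzzy_recode(0.78, landmarks=[0.65, 0.80, 0.95]))   # -> approx. (2, 0.87, 'left')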

TIME-DELAY AND RECURRENT NEURAL NETWORKS

Two types of neural network architectures can be used for learning tasks that involve dynamic input/output behavior, such as sequence classification, prediction, and temporal association: time-delay neural networks (TDNNs) and recurrent neural networks (RNNs) (Hertz et al., 91). If some fixed-length segment of the most recent input values is considered enough to perform the task successfully, then a temporal sequence can be turned into a set of spatial patterns on the input layer of a multi-layer feedforward (MLFF) net, and common backpropagation can be used for training. These architectures are called TDNNs, since the values x(t), x(t - Δt), ..., x(t - (m-1)Δt) from a signal x(t) are presented simultaneously at the network input using a moving window (shift register or tapped delay line). TDNNs have been applied to different tasks, e.g., prediction and system modelling (Lapedes and Farber, 87).

In recent years, several RNN architectures including feedback connections, together with their associated training algorithms, have been devised to cope naturally with the learning and computation of tasks involving sequences and time series (Elman, 90; Hertz et al., 91). The key point is that the recurrency lets the network remember cues from the past and encode them in its internal state representation in order to accomplish the trained task. A type of RNN that has proven useful in grammatical inference through next-symbol prediction is the first-order augmented single-layer RNN (or ASLRNN) (Sopena and Alquezar, 94), which is similar to Elman's SRN except that it is trained by a true gradient-descent method, using backpropagation for the feedforward output layer and an efficient RTRL algorithm reported in (Schmidhuber, 92) for the fully-connected recurrent hidden layer. Although the use of sigmoidal activation functions has been common in both TDNNs and RNNs, a better learning performance can be achieved using other activation functions such as the sine function (Sopena and Alquezar, 94). Moreover, TDNNs and MLFF nets with sinusoidal units can be seen as generalized discrete Fourier series with adjustable frequencies (Lapedes and Farber, 87).
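As a small illustration of the tapped-delay-line idea (our own sketch, not code from the paper), the fragment below converts a scalar signal into the fixed-length spatial patterns that a TDNN receives at its input layer.

import numpy as np

def delay_line_patterns(x, m=3):
    """Return one row per time step t containing
    [x(t), x(t-1), ..., x(t-(m-1))], i.e. the moving window that turns
    a temporal sequence into spatial input patterns for an MLFF net."""
    x = np.asarray(x, dtype=float)
    return np.vstack([x[t - m + 1:t + 1][::-1] for t in range(m - 1, len(x))])

signal = np.sin(0.3 * np.arange(10))
print(delay_line_patterns(signal, m=3).shape)   # (8, 3)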

EXPERIMENTAL PROCEDURE AND RESULTS

As mentioned previously, five controllers have been inferred by means of the FIR methodology, TDNNs, and ASLRNNs. The input and output signals of the CNS control were recorded with a sampling interval of 0.12 seconds from simulations of the purely differential equation model. The model had been tuned to represent a specific patient suffering from a coronary arterial obstruction of at least 70%, by making the four different physiological variables of the simulation model (right auricular pressure, aortic pressure, coronary blood flow, and heart rate) agree with the measurement data taken from the patient. The data used in the identification process constitute only a subset of the data available from the studied patient. More precisely, for each controller, 3500 data points were used as the training set. The models were validated by using them to forecast six data sets not employed in the training process. Each of these six test data sets, with a size of about 300 data points, contains signals representing specific morphologies, allowing the validation of the model for different system behaviors. Data set # 1 represents two consecutive Valsalva maneuvers of 10 seconds duration separated by a two-second break, data set # 2 shows two consecutive Valsalva maneuvers of 10 seconds duration separated by a four-second break, and data set # 3 exhibits two consecutive Valsalva maneuvers of 10 seconds duration separated by an eight-second break. Data set # 4 shows a single Valsalva maneuver of 10 seconds duration with an intensity (pressure) increase of 50% relative to the previous three data sets. Data set # 5 describes a single Valsalva maneuver of 20 seconds duration with nominal pressure. Finally, data set # 6 characterizes a single Valsalva maneuver of 10 seconds duration with nominal pressure. Data set # 6 is called the reference data set, since it represents a standardized Valsalva maneuver, from which all the other variants are derived by modifying a single parameter.

In the modeling process, the normalized mean square error (in percentage) between the simulated output, ŷ(t), and the system output, y(t), is used to determine the validity of each of the control models. The error equation is given in Eq. (1):

    MSE = E[(y(t) - ŷ(t))^2] / y_var · 100%        (1)

where y_var denotes the variance of y(t).
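For clarity, a short Python sketch of Eq. (1) follows (our own helper with illustrative toy signals, not code from the paper):

import numpy as np

def normalized_mse_percent(y, y_hat):
    """Eq. (1): mean squared prediction error normalized by the
    variance of the true signal, expressed as a percentage."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.mean((y - y_hat) ** 2) / np.var(y) * 100.0

# toy example: a forecast lagging the true signal by one sample
y = np.sin(0.2 * np.arange(300))
print(f"MSE = {normalized_mse_percent(y, np.roll(y, 1)):.2f}%")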

FIR CENTRAL NERVOUS SYSTEM CONTROLLER MODELS

The input and output variables of the heart rate controller subsystem were recoded into three qualitative levels each. Three classes are sufficient to obtain a good qualitative model of the system, and consequently, it was not necessary to work with more complex models. For the heart rate controller, the causal relation that was found among the variables can be represented by the following equation:

    HRC(t) = ~f(CSP(t - Δt), HRC(t - Δt), CSP(t))        (2)

which in the FIR methodology is expressed as the optimal mask

               CSP    HRC
    t - 2Δt      0      0
    t - Δt      -1     -2
    t           -3     +1                                  (3)
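The following sketch shows one way the optimal mask of Eq. (3) can be applied to class-valued data to assemble the behavior matrix and forecast HRC(t) by a nearest-neighbour lookup. It is our illustrative reading of the procedure, not the authors' FIR code; for brevity it ignores the membership and side values, uses a plain 1-nearest-neighbour rule, and the toy class sequences are invented.

import numpy as np

def behavior_matrix(csp, hrc):
    """Collect (CSP(t-dt), HRC(t-dt), CSP(t)) -> HRC(t) patterns,
    i.e. the rule base induced by the depth-two mask of Eq. (3)."""
    return np.array([(csp[t - 1], hrc[t - 1], csp[t], hrc[t])
                     for t in range(1, len(csp))])

def forecast_next(behavior, csp_prev, hrc_prev, csp_now):
    """Predict the class of HRC(t) from the stored pattern whose
    inputs are closest to the current query (1-nearest neighbour)."""
    query = np.array([csp_prev, hrc_prev, csp_now])
    distances = np.abs(behavior[:, :3] - query).sum(axis=1)
    return behavior[np.argmin(distances), 3]

# toy class-valued (recoded) signals with three qualitative levels
csp = [1, 2, 3, 3, 2, 1, 1, 2, 3, 2]
hrc = [2, 2, 3, 3, 2, 1, 1, 2, 3, 2]
B = behavior_matrix(csp, hrc)
print(forecast_next(B, csp_prev=2, hrc_prev=2, csp_now=3))   # -> 3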

Although a mask depth of three was initially proposed, the qualitative modeling (optimization) algorithm reduced the mask depth from three to two. Notice that Δt = 0.24 sec, i.e., only every second data point was used. The left of Figure 1 shows a comparison of the output obtained by forecasting data set # 1 using the FIR model (dashed line) with the true measured output (solid line). The data exhibit high-frequency oscillations modulated onto a low-frequency signal. The FIR model is capable of properly forecasting both the low-frequency and the high-frequency behavior of this signal. There is only a short interval where the FIR model was evidently unable to predict how the signal continues.

[Figure 1: two panels plotting Heart Rate Control against Samples (Ts = 0.24 sec.) for data set # 1.]

Figure 1: Heart Rate Controller data set # 1 forecast by FIR model (left) and ASLRNN (right)

The mean square error (MSE) for this signal is 5.74%. It should be noted that the forecast shown in Fig. 1 (left) is the second worst result obtained for any of the six data sets. Since even the worst result (8.64%) is fairly good, the model can be accepted as valid. The mean square errors obtained for the six test data sets are given in the HRC column of Tab. 1. The same identification procedure was used to obtain FIR models of the other four controllers. Each of those resulted in a different optimal mask. The MSE errors of the peripheric resistance, myocardiac contractility, venous tone, and coronary resistance controller models are also presented in Tab. 1. The table shows that the average errors obtained for the six validation data sets were all smaller than 5.0%. Hence, the FIR qualitative modeling methodology has been shown to work exceedingly well when confronted with cardiac data.

                  HRC       PRC       MCC       VTC       CRC
Data Set 1       5.74 %    2.08 %   15.00 %    2.59 %    3.07 %
Data Set 2       0.97 %    0.03 %    0.10 %    0.28 %    0.02 %
Data Set 3       4.94 %    1.31 %    3.77 %    7.20 %    2.62 %
Data Set 4       8.64 %   10.51 %    8.31 %    6.18 %    0.56 %
Data Set 5       0.84 %    0.62 %    1.36 %    0.42 %    0.15 %
Data Set 6       0.82 %    0.12 %    0.61 %    0.42 %    0.11 %
Average Error    3.65 %    2.44 %    4.86 %    2.85 %    1.09 %

Table 1: MSE errors of the HR, PR, MC, VT, and CR controller models induced by FIR

              Time Delay Neural Networks                     Recurrent Neural Networks
              HRC      PRC      MCC      VTC      CRC        HRC      PRC      MCC      VTC      CRC
Data Set 1   24.31%   58.15%   41.72%   41.68%  147.73%     28.25%   50.07%   55.83%   54.25%  148.65%
Data Set 2    7.47%   17.80%   20.92%   20.90%   28.35%      8.62%   16.11%   17.18%   16.93%   36.17%
Data Set 3   13.48%   41.56%   40.19%   40.22%   84.84%     16.77%   36.89%   35.60%   35.68%   83.75%
Data Set 4    6.87%   29.09%   39.80%   39.80%    4.69%      8.16%   26.97%   42.08%   41.86%    4.49%
Data Set 5   32.12%   34.73%   34.32%   34.41%   56.20%     38.24%   38.54%   36.87%   36.77%   58.50%
Data Set 6    7.86%   21.22%   27.20%   27.22%   12.32%      9.80%   18.40%   23.38%   23.12%   11.16%
Av. Error    15.35%   33.76%   34.02%   34.04%   55.69%     18.31%   31.16%   35.16%   34.77%   57.12%

Table 2: Average MSE errors of the CNS controller models inferred by TDNNs and RNNs

TDNN AND RNN CENTRAL NERVOUS SYSTEM CONTROLLER MODELS

The same training and test data sets used by the FIR methodology were also used to train and test two specific NN architectures. The first consisted of a two-layer TDNN with 1 output unit, 8 hidden units, and 5 input units corresponding to the values x(t), x(t - Δt), x(t - 2Δt), ŷ(t - Δt), ŷ(t - 2Δt), where x(t) denotes the current value of the input variable (Carotid Sinus Pressure), ŷ(t - Δt) denotes the net output at the previous time step, and Δt = 0.24 sec. Note that the moving window size agrees with the size of the FIR initial mask. The second net was a first-order ASLRNN with 1 output unit, 4 recurrent hidden units, and 2 input units corresponding to the values x(t) and ŷ(t - Δt). Both network types were built with all-sinusoidal units. The reason why ŷ(t - Δt) was used as input during training instead of y(t - Δt), the previous target output, is that the target output is not available to the net after training; thus, the net behavior was expected to be more similar during and after training in this way (indeed, this choice was shown empirically to improve the generalization performance).

For each CNS controller and architecture, five different training trials were run using different random weight initializations. All nets were trained for 1,000 epochs using a small learning rate of 0.001 to allow a smooth minimization trajectory. These parameters were tuned after some preliminary tests. For each run, the network yielding the smallest MSE error on the training set during learning was taken as the controller model and applied to the six test data sets associated with the controller. The MSE errors on the test sets were calculated. The average of these MSE errors over the five different runs is displayed in Tab. 2 for each architecture, controller, and test data set. It can be observed that the two architectures behaved similarly, but the middling generalization performance obtained by both TDNN and RNN models is very far from the excellent performance provided by the FIR methodology. One of the causes of this difference is that the networks were often unable to predict the high-frequency oscillations of the output signal (in contrast to the FIR models). The right of Figure 1 shows again the target output signal of the HRC data set # 1 (solid line), now confronted with the signal predicted by an ASLRNN (dotted line) with 28.03% MSE error.
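To make the connectionist setup more tangible, the sketch below reconstructs a TDNN with the topology described above (5 inputs, 8 sinusoidal hidden units, 1 linear output) trained by gradient descent with learning rate 0.001, feeding previous predictions back into the input window. The toy signals, weight-initialization scale, squared-error loss and helper names are our own assumptions, not the original experiment code, and the ASLRNN counterpart is omitted.

import numpy as np

rng = np.random.default_rng(0)
N_IN, N_HID, LR = 5, 8, 0.001
W1 = rng.normal(scale=0.1, size=(N_HID, N_IN + 1))   # hidden weights (+ bias)
W2 = rng.normal(scale=0.1, size=(N_HID + 1,))        # output weights (+ bias)

def forward(u):
    """u = [x(t), x(t-dt), x(t-2dt), yhat(t-dt), yhat(t-2dt)]."""
    h = np.sin(W1 @ np.append(u, 1.0))                # sinusoidal hidden layer
    return np.append(h, 1.0) @ W2, h                  # linear output unit

def train_epoch(x, y):
    """One pass over the sequence; previous *predictions* (not targets)
    are fed back into the moving window, as described in the text."""
    global W1, W2
    yhat1 = yhat2 = 0.0
    for t in range(2, len(x)):
        u = np.array([x[t], x[t - 1], x[t - 2], yhat1, yhat2])
        y_out, h = forward(u)
        err = y_out - y[t]                            # squared-error gradient
        grad_W2 = err * np.append(h, 1.0)
        delta_h = err * W2[:-1] * np.cos(W1 @ np.append(u, 1.0))
        grad_W1 = np.outer(delta_h, np.append(u, 1.0))
        W2 -= LR * grad_W2                            # gradient-descent step
        W1 -= LR * grad_W1
        yhat1, yhat2 = y_out, yhat1

# toy stand-ins for the CSP input and HRC target signals
x = np.sin(0.1 * np.arange(500))
y = 0.8 + 0.1 * np.sin(0.1 * np.arange(500) - 0.5)
for _ in range(50):   # the paper used 1,000 epochs; 50 keeps the toy run quick
    train_epoch(x, y)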

CONCLUSIONS

In this paper, a portion of the human central nervous system control, namely the portion that is responsible for the functioning of the heart, has been modeled using inductive modeling techniques. Five controller models, for a single patient, describing different control actions related to cardiovascular control, have been learned separately using the FIR qualitative inductive modeling approach and two connectionist approaches based respectively on time-delay and recurrent neural networks. It has been shown that the FIR methodology is capable of capturing the dynamic behavior of systems much more accurately than the TDNN and RNN approaches. On the other hand, the connectionist models are much simpler. The most important characteristic of the FIR models is their self-assessment capability. FIR models have a way of knowing the quality of their own predictions, a fact that dramatically increases the robustness of the forecasting process as well as the confidence that the user should have in the predictions made. Further experimentation with TDNN and RNN architectures is needed to study whether their performance can be improved significantly. For instance, more hidden units or an extra hidden layer could be added. Note, however, that even though the training data could then be learned more accurately, this does not mean that the generalization performance would be better. In addition, the behavior displayed by the nets should be investigated more deeply, e.g., their failure to learn the phase of the signal correctly.

REFERENCES

Cellier, F.E., Nebot, A., Mugica, F. and de Albornoz, A. 1996. Combined Qualitative/Quantitative Simulation Models of Continuous-Time Processes Using Fuzzy Inductive Reasoning Techniques. International Journal of General Systems, Vol. 24, Num. 1-2, pp. 95-116.

Elman, J.L. 1990. Finding structure in time. Cognitive Science, Vol. 14, pp. 179-211.

Hertz, J., Krogh, A. and Palmer, R.G. 1991. Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City.

Klir, George J. 1985. Architecture of Systems Problem Solving. Plenum Press, New York.

Lapedes, A. and Farber, R. 1987. Nonlinear signal processing using neural networks: prediction and system modelling. Tech. Rep. LA-UR-87-2662, Los Alamos National Laboratory, Los Alamos, NM.

Leaning, M.S., Pullen, H.E., Carson, E.R. and Finkelstein, L. 1983. Modelling a complex biological system: the human cardiovascular system. Trans. Inst. Meas. Control, Vol. 5, pp. 71-86.

Nebot, A. 1994. Quantitative Modeling and Simulation of Biomedical Systems Using Fuzzy Inductive Reasoning. Ph.D. thesis, Universitat Politecnica de Catalunya, Barcelona.

Schmidhuber, J. 1992. A fixed size storage O(n^3) time complexity learning algorithm for fully recurrent continually running networks. Neural Computation, Vol. 4, pp. 243-248.

Sopena, J.M. and Alquezar, R. 1994. Improvement of learning in recurrent networks by substituting the sigmoid activation function. ICANN'94, Proc. of the Int. Conf. on Artif. Neural Networks, Sorrento, Italy, Springer-Verlag, Vol. 1, pp. 417-420.

Vallverdu, M. 1993. Modelado y Simulacion del Sistema de Control Cardiovascular en Pacientes con Lesiones Coronarias (in Spanish). Ph.D. thesis, Universitat Politecnica de Catalunya, Barcelona.