
Using a Neural Network to Learn General Knowledge in a Case-Based System

Eliseo Reategui1, John A. Campbell1 and Shirley Borghetti2

1 Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK (e-mail: e.reategui, [email protected])
2 Department of Transplants, The Heart Institute of São Paulo, Av. Dr. Eneas Carvalho de Aguiar 44, 05403-000 São Paulo, SP, Brazil (e-mail: dcl [email protected])

Abstract. This paper presents a new approach for learning general knowledge in a diagnostic case-based system through the use of a neural network. We take advantage of the self-adapting nature of the neural network to discover the most relevant features and combinations of features for each diagnosis considered. The knowledge acquired by the network is interpreted and mapped into symbolic diagnosis descriptors, which are kept and used by the case-based system to guide its reasoning process, to retrieve cases from a case library and to build explanations. The neural network used in the learning process was the Combinatorial Neural Model, a network that has previously been combined with other symbolic approaches. The paper presents the method used to interpret the knowledge learned by the neural network, as well as the guidelines followed by the reasoning process of the CBR system. An initial experiment in clinical psychology is also reported, in which the case-based model introduced here was used to learn and represent the psychological profile of patients under evaluation for heart transplant.

1 Introduction

Case-Based Reasoning (CBR) emphasises the use of particular experiences in reasoning. However, general knowledge often plays an equally important role in such systems. For example, a number of early CBR systems such as CYRUS [5] or CASEY [6] compiled experiences into generalized episodes. These generalizations have been used to organize the case library and to index the cases by exploiting the differences among them. In the CBR model presented here, general knowledge acquired by a neural network is interpreted and stored in memory structures called diagnosis descriptors. Each descriptor is responsible for representing and highlighting the most important features for the identification of a diagnosis, and for associating the diagnosis with previous relevant experiences. Other efforts in the combination of symbolic and connectionist representations can be seen in [3], where the author examines the limitations of each

approach and the advantages of hybrid architectures, or in [4], where connectionist and symbolic processes have been combined in the design of a hybrid marker-passing model. The neural network chosen for learning the general knowledge was the Combinatorial Neural Model (CNM) [10]. Our choice of neural network had much to do with our previous experience with this same network, as well as with the fact that it had previously been combined successfully with symbolic approaches to solve problems in different application domains such as renal diseases [11], cardiology [7] and banking [16]. The paper is organized as follows: section 2 introduces the CNM, the operation of the network and its learning mechanism. Section 3 presents the structure of the diagnosis descriptors and details of how the knowledge stored in the neural network is interpreted and mapped into them. Section 4 introduces the reasoning process, which uses the diagnosis descriptors and previous cases to solve new problems. In the final sections we describe an experiment in clinical psychology and some initial results, discuss this work in relation to similar projects, and consider both lessons from our experience and possible future developments.

2 The Combinatorial Neural Model

The knowledge acquisition methodology of knowledge graphs inspired the development of the Combinatorial Neural Model. A knowledge graph is described as a minimal directed AND/OR acyclic graph representing the knowledge of an expert for a specific diagnostic hypothesis [8]. The structure of the CNM is therefore very similar to that of the graphs. The CNM has a feedforward topology with three layers. The input layer is formed by fuzzy-number cells. These fuzzy numbers (values in the interval [0,1]) represent the degree of confidence the user has in the information that is observed and inserted into the neural network. Cells in different layers are linked by connections with an associated weight, which represents the influence of lower-layer cells on the output of upper-layer cells.

Fig. 1. Basic structure of the Combinatorial Neural Model

Figure 1 depicts the basic structure of the CNM. Different types of information are contained in each layer of the neural network:

- output layer: contains nodes that represent the different diagnostic hypotheses;
- input layer: contains nodes that represent evidence such as symptoms, test results or any other information that supports a diagnostic hypothesis;
- intermediate (combinatorial) layer: specifies different combinations of evidence that can lead to a particular diagnostic hypothesis.

The connections of the input layer can be either excitatory or inhibitory. An excitatory connection propagates the arriving signal using its weight as an attenuating factor. An inhibitory connection performs fuzzy negation on the arriving signal X, transforming it into 1-X. The combinatorial layer is formed by hidden fuzzy AND-cells, which associate different input cells in intermediate chunks of knowledge that are relevant to the classification process. The output layer is formed by fuzzy OR-cells, which implement a competitive mechanism between the different pathways that reach a diagnostic hypothesis.
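The propagation through the three layers can be sketched as follows. The paper does not fix the fuzzy operators or state exactly where the attenuation is applied, so this minimal Python sketch assumes the common choices: fuzzy AND as min, fuzzy OR as max, and one weight per combinatorial cell.

```python
# Illustrative CNM forward pass. Assumptions (not fixed by the paper):
# fuzzy AND = min, fuzzy OR = max; an excitatory connection passes the
# signal x through, an inhibitory one negates it (1 - x); the cell's
# weight attenuates the AND result.

def cnm_forward(inputs, combinations):
    """inputs: dict feature -> confidence in [0, 1].
    combinations: list of (weight, [(feature, inhibitory), ...]) pairs,
    each a combinatorial (AND) cell feeding one output (OR) cell."""
    outputs = []
    for weight, conns in combinations:
        signals = []
        for feature, inhibitory in conns:
            x = inputs.get(feature, 0.0)
            signals.append(1.0 - x if inhibitory else x)  # fuzzy negation
        outputs.append(weight * min(signals))             # fuzzy AND, attenuated
    return max(outputs) if outputs else 0.0               # fuzzy OR

# Hypothetical evidence and two evidence chunks for one hypothesis.
obs = {"fever": 0.9, "rash": 0.8, "cough": 0.2}
cells = [
    (1.0, [("fever", False), ("rash", False)]),   # strong combination
    (0.5, [("cough", False), ("fever", True)]),   # weaker, negated fever
]
print(cnm_forward(obs, cells))  # 0.8 (the first cell dominates)
```

The max at the output cell realises the competitive mechanism described above: only the strongest pathway determines the confidence in the hypothesis.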

2.1 The neural network learning mechanism

The neural network learning process is based on two main functions:

- determining the combinations of features that are influential for each diagnosis;
- adjusting the weights of the connections that link the nodes of the input, the combinatorial and the output layers.

The CNM uses a punishment and reward learning algorithm to modify connection weights and force the network to converge to a desired behaviour intrinsically represented in a set of examples. This algorithm uses a simple version of the backpropagation mechanism [18]: it rewards the pathways that lead to correct results, and punishes the pathways that lead to incorrect results. Punishment and reward accumulators are stored for each connection of the network. The operant components of the network are not the accumulators themselves, though, but the weights that are associated with each connection and that are computed from the values held in the accumulators. After being calculated and normalized to the interval [0,1], the connection weights are used to calculate the output of each connection and to indicate how strong the connection is.
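As an illustration, the accumulator bookkeeping might look like the sketch below. The paper does not give the formula for deriving a weight from the accumulators; the normalized reward share used here is an assumption that merely respects the required [0,1] range.

```python
# Minimal sketch of punish/reward bookkeeping for one connection.
# Assumption: weight = reward / (reward + punishment), which lands in
# [0, 1]; the paper's actual normalization is not specified.

class Connection:
    def __init__(self):
        self.reward = 0
        self.punishment = 0

    def update(self, led_to_correct_result):
        # Reward pathways that led to correct results, punish the rest.
        if led_to_correct_result:
            self.reward += 1
        else:
            self.punishment += 1

    @property
    def weight(self):
        # Normalized into [0, 1]; 1.0 means the pathway was never punished.
        total = self.reward + self.punishment
        return self.reward / total if total else 0.0

c = Connection()
for correct in [True, True, True, False]:
    c.update(correct)
print(c.weight)  # 0.75
```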

3 Using the CNM to build symbolic descriptors of diagnoses

One of our main goals in this work has been to transform the mathematical knowledge stored in the neural network as accumulators and weights into more intelligible symbolic knowledge. The main task was therefore to extract from the neural network symbolic descriptors for each diagnosis considered, similar to what was done in previous work [17] on combining the CNM with a frame system. Three attributes were created to describe generalized knowledge about a diagnosis. The features referenced by these attributes are characterized by their frequency and specificity in relation to the diagnosis. Features that are specific are important for their discriminatory properties, being normally observed in one diagnosis only and not in others. Features that are frequent, on the other hand, are important for confirming a certain diagnostic hypothesis. The following attributes store the generalized knowledge of a diagnosis descriptor:

- Triggers: nodes of the neural network containing features and combinations of features that are highly significant, being both very specific and very frequent to the diagnosis. These are the nodes that, after the learning phase, have the punishment accumulator equal to 0 and the connection weight equal to 1.
- Primary features: nodes of the network containing features and combinations of features that are either very specific or very frequent to the diagnosis. These features usually provide a high degree of confidence when pointing out a diagnostic hypothesis. They were selected by taking the nodes of the network with high weight.
- Supporting features: nodes containing features and combinations of features that are not very frequent or specific, but that can reinforce the diagnostic conclusion. The weights of these nodes are lower than the weights observed for primary features.

Two additional attributes were used to identify cases of the case library that are likely to be relevant in the diagnostic process, but that would not normally be considered if only the generalized knowledge of the diagnosis descriptors were used.
- Positive remindings: the purpose of this attribute is to index the cases of the case library that are less typical and that can be useful in the analysis of new cases that do not conform with the norm. The positive remindings are selected by taking the nodes of the neural network containing features that are highly specific but not particularly frequent in the diagnosis. These are the nodes that have the punishment accumulator equal to 0 and a low reward accumulator.
- Negative remindings: the purpose of this attribute is to index previous cases containing features that would normally imply a diagnosis different from the one by which the case was classified. These nodes have the connection weight equal to 1 and the punishment accumulator different from 0. The cases used to punish these nodes during the training phase are indexed by this type of reminding.

It is important to point out here that, by eliciting the information above, we convert the "opaque" knowledge stored in the network into meaningful attributes. Furthermore, some attributes are formed with a type of knowledge that would not be taken into account in the neural network. For example, the nodes of the neural network referenced in the negative remindings would normally indicate a certain diagnosis with a high degree of confidence, without taking into account that, in some previous circumstances, the same set of features contained in the node has been observed in cases diagnosed differently. The negative remindings are used by the symbolic system in such a way that these cases are always considered for further evaluation before a final solution is passed to the user. In order to control the volume of redundant data in the case library, only a small set of cases is stored. These cases are classified as prototypical or atypical. Prototypical cases represent normative experiences, being selected from the whole set of existing cases as the ones that show the highest degree of similarity with the generalized knowledge of the diagnosis descriptors. Although these cases contain the same kind of knowledge represented in the diagnosis descriptors, it was important to have them stored in the library to enable the system to present a similar previous experience that could help the user to understand the given results. Atypical cases, on the other hand, are the cases that do not conform with the norm. They show rarer or exceptional features and contain knowledge that is not normally captured by systems that learn through generalization. The criterion for determining which cases are atypical is simple: all the existing cases that are indexed by both the positive and the negative remindings of the diagnosis descriptors are selected as atypical.
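Under the selection rules above, the mapping from trained nodes to a diagnosis descriptor, together with the atypical-case criterion, can be sketched as follows. The numeric thresholds HIGH and LOW are illustrative assumptions ("high weight" and "low reward accumulator" are not quantified in the text), and the node format is hypothetical.

```python
# Sketch of building a diagnosis descriptor from trained CNM nodes, and
# of the atypical-case criterion (cases indexed by both reminding types).
# Weights and accumulators are taken as given from training.

HIGH, LOW = 0.8, 2  # assumed cut-offs; not specified in the paper

def build_descriptor(nodes):
    """nodes: dicts with 'features', 'weight', 'reward', 'punishment',
    and 'cases' (ids of training cases that activated the node)."""
    d = {"triggers": [], "primary": [], "supporting": [],
         "positive_remindings": set(), "negative_remindings": set()}
    for n in nodes:
        if n["punishment"] == 0 and n["weight"] == 1.0:
            d["triggers"].append(n["features"])
        elif n["weight"] >= HIGH:
            d["primary"].append(n["features"])
        else:
            d["supporting"].append(n["features"])
        if n["punishment"] == 0 and n["reward"] <= LOW:
            d["positive_remindings"] |= set(n["cases"])  # specific but rare
        if n["weight"] == 1.0 and n["punishment"] != 0:
            d["negative_remindings"] |= set(n["cases"])  # punishing cases
    return d

def atypical_cases(d):
    # A stored case is atypical when both kinds of reminding index it.
    return d["positive_remindings"] & d["negative_remindings"]

# Hypothetical trained nodes (feature codes follow Fig. 2's naming).
nodes = [
    {"features": ("A0", "G0"), "weight": 1.0, "reward": 40,
     "punishment": 0, "cases": ["Case128"]},
    {"features": ("E3",), "weight": 1.0, "reward": 1,
     "punishment": 0, "cases": ["Case097"]},
    {"features": ("B1",), "weight": 1.0, "reward": 12,
     "punishment": 2, "cases": ["Case097", "Case212"]},
]
desc = build_descriptor(nodes)
print(sorted(atypical_cases(desc)))  # ['Case097']
```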

4 The reasoning process

As mentioned earlier in this paper, CBR emphasises the use of particular experiences in reasoning, in contrast with the more traditional artificial intelligence approach of using only generalized knowledge, such as production rules. However, CBR systems usually also maintain some form of generalized knowledge, in an attempt to hold a broader view of how to solve the problems that they address (e.g. at least for indexing purposes). Here, the general knowledge learned by the neural network is represented in diagnosis descriptors and used to draw hypotheses for possible solutions, to index the cases in the library and to explain the reasoning performed by the system. The cases in the library are used to support a particular hypothesis (or to choose among suggested hypotheses) and to illustrate to the user previous experiences that resemble the current one, highlighting their similarities and implying that the solutions outlined for those cases should also apply to the new one. When presented with a new case, the first step in the reasoning process is to match the given case against the triggers and primary features of the diagnosis descriptors. For simple cases, the system can usually find a single diagnostic hypothesis and exploit the supporting features just to reinforce it. For more complicated cases, the system would follow the same step described above,

but it would generate more than one hypothesis. In this event, the following procedure would be started:

- The cases indexed by the positive remindings of the hypotheses raised are retrieved and matched against the problem case. The best matches among those are selected as holders of possible solutions.
- The cases indexed by the negative remindings of the hypotheses raised are retrieved for further comparison with the problem case. Retrieved cases that closely match the problem case are also selected as holders of possible solutions.
- All the prototypical cases for the hypothetical diagnoses are matched against the problem case. The best match among these and the reminded cases from the steps above is selected to provide the final solution.
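The retrieval steps above can be sketched as below. The paper does not specify the case-matching algorithm, so a simple fraction-of-agreeing-attributes similarity stands in for it, and the case contents are invented for illustration.

```python
# Sketch of the hypothesis-resolution procedure. The similarity measure
# (fraction of shared attributes with equal values) is an assumption.

def similarity(case_a, case_b):
    # Fraction of attributes, present in both cases, on which they agree.
    shared = set(case_a) & set(case_b)
    if not shared:
        return 0.0
    return sum(case_a[k] == case_b[k] for k in shared) / len(shared)

def resolve(problem, hypotheses, library):
    """hypotheses: diagnosis names that matched on triggers/primary features.
    library: dict diagnosis -> {'reminded': [...], 'prototypical': [...]},
    where 'reminded' holds cases indexed by positive/negative remindings."""
    candidates = []
    for h in hypotheses:
        candidates += library[h]["reminded"]      # steps 1 and 2
        candidates += library[h]["prototypical"]  # step 3
    return max(candidates, key=lambda c: similarity(problem, c["features"]))

# Hypothetical library with one hypothesis raised.
problem = {"A": 0, "B": 0, "E": 3}
library = {
    "Favourable": {
        "reminded": [
            {"id": "Case097", "features": {"A": 0, "B": 0, "E": 3}}],
        "prototypical": [
            {"id": "Case128", "features": {"A": 0, "B": 1, "E": 0}}],
    },
}
best = resolve(problem, ["Favourable"], library)
print(best["id"])  # Case097
```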

Fig. 2. Some cases used in the construction of an explanation. [The figure shows a new case with features A0 G0 C0 F0 D0 H0 B1 E3, some of which match the trigger, primary and supporting features of a diagnosis descriptor; a retrieved prototypical case (Case128 - features A0 B1 C0 D0 E0 F0 G0 H0, diagnosis: Favourable); and a retrieved more atypical case (Case097 - features A0 B0 C0 D0 E3 F0 G0 H0, diagnosis: Temporarily not favourable). Legend: A0 - Psychological functioning: normal; B0 - Use of psychoactive substances: no use; B1 - Use of psychoactive substances: social use; C0 - Adherence to surgical treatment: good; D0 - Motivation and knowledge about the transplantation: good understanding; E0 - Family dynamics: satisfactory; E3 - Family dynamics: unmanageable difficulties; F0 - Psychosocial state: absence of symptoms; G0 - Intellectual resources: available; H0 - Psychiatric antecedent in the family: absent.]

After determining a final solution to the problem, the best-matching case

is presented to the user and an explanation for the reasoning process is built, based on the importance of each attribute of the final diagnosis descriptor and on the similarities of the problem case with the best-matching case. Figure 2 shows an example of the reasoning carried out for a psychosocial evaluation case. Some features of the case presented as a problem are identified as triggers, primary and supporting features for the diagnosis Favourable. The prototypical Favourable case most similar to the new case is retrieved for further evaluation. A more atypical case is also retrieved, as not all the features of the new case could be explained by the diagnosis descriptor of Favourable. The new case is diagnosed Temporarily not favourable and the following explanation is produced: "The features A0 and G0 were identified as Triggers of the diagnosis Favourable. Thus, these features indicate the diagnosis Favourable with a high degree of confidence. The features C0 and F0 were identified as Primary features of Favourable cases, and as such they confirm this diagnosis with a high degree of confidence. The feature D0 supports the same diagnostic hypothesis. Case128 is the most similar prototypical case in the library. However, it has one feature that does not match the new case: E0. Other more atypical cases were considered for further analysis, among which the most similar was Case097. A thorough comparison among the three final cases showed that the new case has a higher degree of similarity with Case097. This higher similarity is explained by the fact that the feature E3 - Family dynamics: unmanageable difficulties is more important than the feature B1 - Use of psychoactive substances: social use. Therefore, the new case can also be catalogued as atypical, and it can have the same diagnosis as Case097: Temporarily not favourable."
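A template-based construction of such explanations from the descriptor categories might look like the sketch below; the wording of the templates is an assumption modelled on the quoted output, not the system's actual text-generation code.

```python
# Illustrative template-based explanation builder. The category names
# mirror the descriptor attributes; the phrasing is hypothetical.

def explain(diagnosis, matched):
    """matched: dict category -> feature codes found in the new case."""
    parts = []
    if matched.get("triggers"):
        parts.append(f"The features {', '.join(matched['triggers'])} were "
                     f"identified as Triggers of the diagnosis {diagnosis}.")
    if matched.get("primary"):
        parts.append(f"The features {', '.join(matched['primary'])} were "
                     f"identified as Primary features of {diagnosis} cases.")
    if matched.get("supporting"):
        parts.append(f"The features {', '.join(matched['supporting'])} "
                     f"support the same diagnostic hypothesis.")
    return " ".join(parts)

print(explain("Favourable",
              {"triggers": ["A0", "G0"], "primary": ["C0", "F0"],
               "supporting": ["D0"]}))
```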

5 Discussion

CBR and neural networks have been combined in previous research, as in [1, 13], where the neural networks are used mainly for case matching and retrieval tasks. In the CBR model presented here, we have also used neural networks to learn patterns of similarity between cases, but our approach has been different in that we did not involve the neural networks themselves in the reasoning process. Instead, we tried to define symbolic interpretations for the knowledge stored in the neural networks, and to use these interpretations, combined with specific cases, to support the reasoning. The mapping of the knowledge stored in the neural network into diagnosis descriptors is similar to our previous work [17]. Here, however, we have discussed diagnosis descriptors more extensively and we have proposed a new method for eliciting symbolic knowledge from the neural network. Furthermore, by incorporating prototypical cases in the model, as well as positive and negative remindings in the diagnosis descriptors, we have defined a way to connect the diagnoses to real cases in the library. In [16] we introduced another approach for combining neural networks with CBR in a system for classifying credit card transactions. In that scheme the

two reasoning mechanisms work in an independent fashion, the answers provided by each of the individual mechanisms being merged by a mediator. The approach described in the present paper is very different in that the neural network is not used to provide answers for a posed problem. It is used to learn and form the normative knowledge kept in the diagnosis descriptors. These descriptors are the actual components of the architecture that are used in the reasoning process. The advantages of the architecture presented here over pure neural networks are very similar to those of symbolic systems over connectionist systems. For instance, the knowledge of the diagnosis descriptors is easy to understand. Hence, the descriptors can be consulted easily and used to build explanations for the reasoning process, unlike the knowledge of the neural networks. We have started to use this model in the development of a system for the psychological evaluation of candidates for heart transplants. The psychological features of heart transplant patients have been studied since 1969 [9]. The initial studies showed the importance of a patient's psychological profile for a satisfactory outcome after the transplant [2]. At present, psychological evaluation is a routine procedure before heart transplants, used in 5 out of 6 of the transplantation centres worldwide. Between 1988 and 1994, 463 psychological evaluations were performed and catalogued in a case library at the Heart Institute of São Paulo University Medical School. Each case was described in terms of the following attributes: psychological functioning, use of psychoactive substances, family dynamics, intellectual resources, adherence, motivation and knowledge about the transplant, psychological state and psychiatric antecedents in the family [14].

We have started the development of a system for psychological evaluation using part of the set of cases contained in this library. Regarding the performance of the system, we have not yet been able to collect statistical evidence showing conclusively that the CBR system is more accurate than a system using only the CNM, or vice versa. However, as the CBR system is based on the use of the general knowledge elicited from the neural network combined with a case-based matching algorithm, the two systems do sometimes present different answers. We are continuing to collect information about performance on new examples. Another difference in the behaviour of the two systems is in the way they deal with features that are exceptional, e.g. the ones referenced by positive and negative remindings. When presented with a case in which not all the features can be explained by some diagnosis descriptor, the CBR system attempts to bring into consideration the cases pointed out by the positive and negative remindings. The goal here is to determine whether these cases are more likely to hold the correct answer than the prototypical cases of the hypotheses formed. The neural network, however, would behave differently, ignoring nodes that were not activated with the highest output (which is usual for nodes with more atypical combinations of features). Regarding the organization of the system's memory, it can often be observed that in CBR systems both normative experiences and distinctive ones are represented as cases. In Protos [15], a well-known system for the diagnosis of hearing disorders, normative experiences are stored in prototype cases. The same can be

seen in more recent examples, such as in [12], where the memory of the system is split into two levels, one containing atypical cases and the other containing prototype cases. In the architecture presented in this paper, normative experiences are not represented in the form of cases, but in the form of diagnosis descriptors. The reason for this is that by generalizing cases into diagnosis descriptors, and by categorising their attributes into triggers, primary features, etc., we can guide the reasoning process of the system and build explanations.

6 Conclusion

We have presented here a new approach for learning general knowledge in a case-based system through the use of a neural network. After being trained with a set of real cases, the neural network is able to discriminate between relevant and irrelevant features for each of the diagnoses considered by the system. The network is not employed to solve new case-like problems, though. Instead, it is analysed, and the knowledge embodied in its connections is interpreted and stored in diagnosis descriptors. These descriptors are then used to guide the reasoning process in a case-based system. We are currently using the case-based model described in this paper to develop a system that performs psychological evaluation of candidates for heart transplants. For this system, we have also collected knowledge graphs showing the relevance of possible combinations of features for each of the diagnoses considered. We have compared these knowledge graphs with the knowledge acquired and stored in the diagnosis descriptors. So far, we have been able to recognise that the kind of information stored in the diagnosis descriptors is often different from the knowledge represented in the experts' knowledge graphs. Moreover, when used in practice, the knowledge graphs have a lower rate of correct classification than the method described here. Nevertheless, it remains important that the two representations (the one given by the expert and the one given by the computer) match, because it is one of our goals that the knowledge stored in the diagnosis descriptors be not only comprehensible but also useful for building appropriate explanations. The case-based model presented here is also being expanded for use in the development of other systems in different application domains: in particular, one in geology and one in banking.
In the geological application, the main goal of the system is to classify sandstone deposits in accordance with previous cases of rock classification collected by a geologist. In the banking application, the system has to determine whether a client has the profile of a good or bad payer, in accordance with previous cases in credit scoring.

References

1. L. Becker and K. Jazayeri. A connectionist approach to case-based reasoning. In K. J. Hammond, editor, Proceedings of the Case-Based Reasoning Workshop, pages 213-217, Pensacola Beach, Florida, 1989. Morgan Kaufmann.
2. S. A. Borghetti-Maio. Quality of life after cardiomyoplasty. Journal of Heart and Lung Transplantation, 13(6), 1994.
3. J. Dinsmore. Thunder in the gap. In J. Dinsmore, editor, The Connectionist and the Symbolic Paradigm: Closing the Gap. Lawrence Erlbaum Associates, Hillsdale, NJ, 1992.
4. J. A. Hendler. Marker-passing over microfeatures: towards a hybrid symbolic/connectionist model. Cognitive Science, 13:79-106, 1989.
5. J. Kolodner. Maintaining organization in a dynamic long-term memory. Cognitive Science, 7:243-280, 1983.
6. P. Koton. Reasoning about evidence in causal explanations. In Proceedings of the AAAI, pages 256-261, St. Paul, Minnesota, August 1988. AAAI Press, Cambridge, MA.
7. B. F. Leão and E. B. Reategui. A hybrid connectionist expert system to solve classificational problems. In Proceedings of Computers in Cardiology, London, UK, 1993.
8. B. F. Leão and A. F. Rocha. Proposed methodology for knowledge acquisition: a study on congenital heart disease diagnosis. Methods of Information in Medicine, 29:30-40, 1990.
9. D. T. Lunde. Psychiatric complications of heart transplant. American Journal of Psychiatry, 126(3):117-129, 1969.
10. R. J. Machado and A. F. Rocha. The combinatorial neural network: a connectionist model for knowledge based systems. In B. Bouchon-Meunier, R. R. Yager, and L. A. Zadeh, editors, Uncertainty in Knowledge Bases. Springer Verlag, 1990.
11. R. J. Machado and A. F. Rocha. A hybrid architecture for fuzzy connectionist expert systems. In A. Kandel and G. Langholz, editors, Hybrid Architectures for Intelligent Systems, pages 135-152. CRC Press, Boca Raton, 1992.
12. M. Malek and V. Rialle. A case-based reasoning system applied to neuropathy diagnosis. In M. Keane, J. P. Haton, and M. Manago, editors, Proceedings of the European Workshop on Case-Based Reasoning, Chantilly, France, 1994.
13. P. Myllymaki and H. Tirri. Massively parallel case-based reasoning with probabilistic similarity metrics. In K.-D. Althoff, K. Richter, and S. Wess, editors, Proceedings of the First European Workshop on Case-Based Reasoning, pages 48-53, Kaiserslautern, November 1993.
14. M. E. Olbrich and J. L. Levenson. Psychosocial evaluation of heart transplant candidates: an international survey of process, criteria and outcomes. Journal of Heart and Lung Transplantation, 10(6):948-955, 1991.
15. B. W. Porter, R. Bareiss, and R. C. Holte. Concept learning and heuristic classification in weak theory domains. Artificial Intelligence, 45(1-2):229-263, September 1990.
16. E. B. Reategui and J. Campbell. A classification system for credit card transactions. In M. Keane, J. P. Haton, and M. Manago, editors, Proceedings of the European Workshop on Case-Based Reasoning, Chantilly, France, 1994.
17. E. B. Reategui and B. F. Leão. Integrating neural networks with the formalism of frames. In S. Grossberg, editor, Proceedings of the World Congress on Neural Networks, Portland, Oregon, 1993. Lawrence Erlbaum Associates.
18. D. E. Rumelhart, G. E. Hinton, and J. L. McClelland. Learning internal representations by error propagation. In D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, editors, Parallel Distributed Processing: Explorations in the Microstructures of Cognition, volume 1. MIT Press, Cambridge, MA, 1986.

This article was processed using the LaTeX macro package with LLNCS style.