Unsupervised case classification using kohonen - IEEE Xplore

UNSUPERVISED CASE CLASSIFICATION USING KOHONEN ‘SELF-ORGANIZING FEATURE MAP’ IN A CASE-BASED REASONING SYSTEM

Selvakumar Manickam

Syed Sibte Raza Abidi

Health Informatics Research Group, School of Computer Sciences, Universiti Sains Malaysia, 11800 Minden, Malaysia. E-mail: selva@cs.usm.my

Health Informatics Research Group, School of Computer Sciences, Universiti Sains Malaysia, 11800 Minden, Malaysia. E-mail: sraza@cs.usm.my

remembering. Remindings facilitate human reasoning in many contexts and for many tasks, ranging from children’s simple reasoning to expert decision-making. Much of the original inspiration for the CBR approach came from the role of remindings in human reasoning.

Abstract: Case-Based Reasoning (CBR) is a relatively recent problem-solving technique that is attracting increasing attention. Major areas where CBR is used are Diagnosis, Help Desk, Assessment, Design and Decision Support. CBR solves new problems by adapting previously successful solutions to similar problems. This paper presents a technique that uses the Kohonen ‘self-organizing feature map’ (SOM) to improve the indexing and retrieval of cases in a CBR system by clustering cases with similar properties together. The SOM approach has proven to be an efficient method for clustering large data collections while simultaneously presenting the user with a planar representation of the clusters. By using SOM, the system can learn about the emergence of indices that had not previously been thought significant, which increases the flexibility of the system with respect to different indexing schemes. It can also help cases to be retrieved quickly. The nodes in the network converge to form clusters that represent groups of entities with similar properties.

The CBR approach is based on two tenets about the nature of the world. The first tenet is that the world is regular: similar problems have similar solutions. Consequently, solutions for similar prior problems are a useful starting point for new problem-solving. The second tenet is that the types of problems an agent encounters tend to recur. Consequently, future problems are likely to be similar to current problems. When the two tenets hold, it is worthwhile to remember and reuse current reasoning: case-based reasoning is an effective reasoning strategy. A CBR system needs some mechanism for retrieving similar cases from the case base. There are different approaches, depending on the case representation that was chosen. Text-based CBR systems use a keyword-driven approach. Conversational CBR systems usually have a hard-coded selection approach.

Keywords: Case-Based Reasoning, Kohonen Map, Self-Organizing Map, Case Retrieval.

CBR systems that follow the structured data approach sometimes offer the possibility of customizing the way similarity is computed. The underlying technique is called nearest neighbor retrieval. The global similarity between cases can be computed, for example, as the weighted sum of a local similarity that is computed for each feature used to describe a case. If the case base is reasonably large, it must be indexed.
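The weighted-sum similarity described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the linear local similarity, the normalisation by the total weight, and the names `local_similarity`, `global_similarity`, `weights` and `ranges` are all assumptions made for the example.

```python
from typing import Sequence


def local_similarity(a: float, b: float, value_range: float) -> float:
    """Similarity of two values of one feature (assumed linear in their difference).

    Returns 1.0 for identical values and 0.0 when they differ by the full range.
    """
    if value_range <= 0:
        return 1.0 if a == b else 0.0
    return max(0.0, 1.0 - abs(a - b) / value_range)


def global_similarity(
    query: Sequence[float],
    case: Sequence[float],
    weights: Sequence[float],
    ranges: Sequence[float],
) -> float:
    """Weighted sum of per-feature local similarities, normalised to [0, 1]."""
    weighted = sum(
        w * local_similarity(q, c, r)
        for q, c, w, r in zip(query, case, weights, ranges)
    )
    return weighted / sum(weights)


# Hypothetical three-attribute query and stored case.
score = global_similarity(
    query=(45.0, 2.0, 1.0),
    case=(50.0, 2.5, 1.0),
    weights=(1.0, 2.0, 1.0),
    ranges=(50.0, 5.0, 1.0),
)
```

Raising a feature's weight makes a mismatch on that feature cost more in the global score, which is the customisation such systems expose.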

I. INTRODUCTION

Reasoning is often modeled as a process that draws conclusions by chaining together generalized rules, starting from scratch. CBR takes a very different view. In CBR, the primary knowledge source is not generalized rules but a memory of stored cases recording specific prior episodes. In CBR, new solutions are generated not by chaining, but by retrieving the most relevant cases from memory and adapting them to fit new situations. Thus in CBR, reasoning is based on

Nearest neighbor retrieval is a search approach that selects experience based on some geometrical distance computed in the attribute space. The search engine

0-7803-6355-8/00/$10.00 © 2000 IEEE


evaluates the n-dimensional “distance” between the query and all cases in the case-base, taking into account the weights. The results are presented in order of n-dimensional “proximity”.
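A minimal sketch of this exhaustive search is given below. It assumes a weighted Euclidean distance, which the text does not specify, and the case identifiers and attribute values are invented for illustration.

```python
import math


def weighted_distance(query, case, weights):
    """Weighted Euclidean distance between two attribute vectors (assumed metric)."""
    return math.sqrt(sum(w * (q - c) ** 2 for q, c, w in zip(query, case, weights)))


def rank_cases(query, case_base, weights):
    """Score every case in the case base and return (distance, case_id), nearest first."""
    scored = [
        (weighted_distance(query, features, weights), case_id)
        for case_id, features in case_base.items()
    ]
    return sorted(scored)


# Hypothetical case base with three numeric attributes per case.
case_base = {
    "case_a": (0.0, 0.0, 0.0),
    "case_b": (1.0, 1.0, 1.0),
    "case_c": (0.2, 0.1, 0.0),
}
ranking = rank_cases((0.1, 0.1, 0.0), case_base, weights=(1.0, 1.0, 1.0))
nearest_ids = [case_id for _, case_id in ranking]
```

Every query touches every stored case, so the cost grows linearly with the size of the case base.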

The output layer, on the other hand, is used to represent the learnt patterns. Basically, it maps an input pattern to one output unit. The output layer may consist of m units, organised as a ‘two-dimensional map’, i.e. in terms of rows and columns.

The Kohonen ‘self-organizing feature map’ (SOM) incorporates the same type of knowledge representation as nearest neighbor retrieval by implementing a topological neighborhood. This paper presents how a traditional flat-structured case base can be represented in the form of a Kohonen map and how this helps the CBR system improve its performance.

So, if the breast cancer case-base contains 18 cases with 3 input attributes and 1 output attribute, it could be represented as a Kohonen Map as shown in Fig. 1.

The input and output layers are fully connected: each unit in the input layer is connected with all units in the output layer. Associated with each output unit, say Oi, is an n-dimensional weight vector (where n is the number of units in the input layer). The weight vector stores the strength of the connections from the output unit Oi to all the input units. The input attributes are thus mapped to their corresponding outputs based on the weight vectors. During learning, the weight vector of each output unit is modified; that is, as new cases come in, the weights are altered to reflect the evidence received.
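The paper does not spell out the matching or update rule, so the sketch below assumes the standard Kohonen rule: the winning unit is the one whose weight vector is closest to the input in Euclidean distance, and its weights move a fraction `learning_rate` toward the input. The grid size and input dimension follow the breast cancer example; the function names and the input values are hypothetical.

```python
import math
import random

ROWS, COLS, N_INPUTS = 3, 6, 3  # 18 output units, 3 input units


def init_weights(seed=0):
    """One n-dimensional weight vector per output unit, randomly initialised."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(N_INPUTS)] for _ in range(ROWS * COLS)]


def best_matching_unit(weights, x):
    """Index of the output unit whose weight vector is closest to input x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def update_unit(weights, i, x, learning_rate):
    """Move unit i's weight vector a fraction of the way toward input x."""
    weights[i] = [w + learning_rate * (xj - w) for w, xj in zip(weights[i], x)]


weights = init_weights()
x = [0.2, 0.7, 0.5]  # hypothetical normalised input attributes of one case
bmu = best_matching_unit(weights, x)
before = math.dist(weights[bmu], x)
update_unit(weights, bmu, x, learning_rate=0.5)
after = math.dist(weights[bmu], x)
```

In full training the same update is also applied, usually with a smaller step, to the units in the winner's topological neighbourhood, and both the learning rate and the radius shrink over time.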

II. CASE REPRESENTATION

The first step in building a case-based application is to decide how the case is stored, how the CBR retrieval engine will perform, and what kind of knowledge can be discovered by the data mining engine. A case typically contains the Determiner (Input) attribute and Outcome (Output) attribute; for example, in a breast cancer domain a case would have the following properties:

III. CASE RETRIEVAL

Table 1. Breast Cancer Domain [table body not recoverable from the source; surviving column header: Attribute]

Kohonen maps incorporate the notion of topological neighborhoods in the output layer. A topological neighborhood is a region comprising some number of output units surrounding a particular output unit, say Oi as shown in Fig. 3. The units in the topological neighbourhood of the unit Oi would be affected by its activation level. Each output unit may have a topological neighbourhood n, where the size of the neighbourhood is determined by the parameter neighbourhood radius.
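As an illustration, the neighbourhood of a unit on the 3 × 6 output grid can be computed as below. The square neighbourhood shape and the row-major numbering of units are assumptions; the text only states that a radius parameter sets the size.

```python
ROWS, COLS = 3, 6  # output grid from the breast cancer example


def neighbourhood(unit, radius, rows=ROWS, cols=COLS):
    """Indices of all units within `radius` grid steps of `unit`, including itself.

    Units are numbered row by row; the neighbourhood is clipped at the map edges.
    """
    r0, c0 = divmod(unit, cols)
    return [
        r * cols + c
        for r in range(max(0, r0 - radius), min(rows, r0 + radius + 1))
        for c in range(max(0, c0 - radius), min(cols, c0 + radius + 1))
    ]


inner = neighbourhood(7, radius=1)   # unit in row 1, column 1
corner = neighbourhood(0, radius=1)  # unit in the top-left corner
```

With radius 1 an interior unit has eight neighbours, while a corner unit has only three because the neighbourhood is cut off at the border.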



Meanwhile, a Kohonen Map comprises an input layer and an output layer. The input layer is used to present an input pattern to the Kohonen Map. It is n-dimensional, i.e., it consists of n units; in the breast cancer domain it contains 3 input units. Each dimension of the input pattern is represented by a unit in the input layer.

Figure 1. Kohonen Map Representation for Breast Cancer Case-Base [input layer: n dimensions, where n = 3; output layer: 18 units arranged as 3 rows and 6 columns]

Figure 2. Conventional Case-Based Reasoning System [surviving diagram labels: Revise, proposed solution, confirmed solution]


Figure 3. Topological Neighbourhood [legend: neighbourhood radius = 1; output unit of Case n; output units of other cases which have similar properties to Case n]

Once the case-base has been converted into the Kohonen Map representation, it can be used by the system for the reasoning process. When the system is presented with a new problem or case, the Kohonen Map maps the input attributes of the case to the best output unit, based on the weight vectors learnt during the training period. The retrieved solution is then adapted to fit the new case and output to the user. As more new cases are passed to the Kohonen Map, it adjusts the weights to provide a more accurate solution in the future; thus, it learns as new cases come in.

Fig. 2 illustrates the conventional method of a case-based reasoning system. In this methodology the index and the structure have to be determined manually by the developer of the system, which takes considerable time. Furthermore, as the case-base becomes larger, the retrieval of cases takes longer, and this slows down the whole reasoning process.

Meanwhile, Fig. 4 illustrates a case-based reasoning system that uses a Kohonen Map to represent the case-base. A one-time process is performed at the initial stage of the system, where the case-base in the conventional format is converted into a Kohonen Map representation. This is the stage where the Kohonen Map is trained using the available case-base, adjusting all the weight vectors. The accuracy of the case retrieved using the Kohonen Map increases with the number of cases used to train the network. Training on 100 cases would give very low accuracy, 500 cases nominal accuracy, and 1000 cases high accuracy.


No effort is needed to assign indices to the cases, because the Kohonen Map clusters all the cases that have similar properties, removing the need for index creation. Furthermore, retrieval of closely matching solutions is fast: if a case closely matches the new case, then all the neighbors of that case also match closely, and they can be retrieved immediately.
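The retrieval path described here can be sketched on a deliberately tiny one-dimensional map. Everything concrete in the example is hypothetical: the trained weight vectors, the stored cases and their solution labels, and the choice to file each case under its best-matching unit.

```python
import math
from collections import defaultdict

# Hypothetical trained weight vectors for a 1 x 3 map, one per output unit.
weights = [[0.1, 0.1, 0.1], [0.5, 0.5, 0.5], [0.9, 0.9, 0.9]]

# Hypothetical stored cases: (input attributes, solution label).
cases = {
    "c1": ([0.10, 0.20, 0.10], "solution_A"),
    "c2": ([0.45, 0.50, 0.55], "solution_B"),
    "c3": ([0.60, 0.50, 0.50], "solution_B"),
    "c4": ([0.95, 0.90, 0.80], "solution_C"),
}


def bmu(x):
    """Index of the unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


# One-time step: file each stored case under its best-matching unit.
unit_cases = defaultdict(list)
for case_id, (features, _solution) in cases.items():
    unit_cases[bmu(features)].append(case_id)


def retrieve(x, radius=1):
    """Cases at the query's winning unit first, then cases at neighbouring units."""
    centre = bmu(x)
    units = range(max(0, centre - radius), min(len(weights), centre + radius + 1))
    ordered = sorted(units, key=lambda u: abs(u - centre))
    return [case_id for u in ordered for case_id in unit_cases[u]]


hits = retrieve([0.5, 0.45, 0.5])
```

The query is compared only against the unit weight vectors, not against every stored case, and the candidate cases are then read directly from the winning unit and its neighbours.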

Figure 4. Case-Based Reasoning System with Kohonen Map [surviving diagram labels: One-Time Process, convert conventional domain, Kohonen map representation, current problem, Retrieve, Revise, output solution to user, confirmed solution]


IV. CONCLUSION

The idea behind this approach is to treat the problem description space and the solution space of a case as the input and output patterns respectively, and to treat the mapping of a problem to its solution as weighted connections, as illustrated in Fig. 4.


Figure 4. Problem/Solution Space as Input/Output Pattern

Although this methodology can help reduce both the development period of a case-based reasoning system and the time of case retrieval, it cannot be applied to all domains. Finally, this paper advocates the Kohonen Map as an approach for expediting the retrieval stage of a case-based reasoning system. It should be used in conjunction with traditional approaches, not in direct competition with them.
