A Neural Network Model for Monotonic Diagnostic Problem Solving

Yue Xu and Chengqi Zhang
School of Mathematical and Computer Sciences
The University of New England
Armidale, NSW 2351, Australia

Abstract. The task of diagnosis is to find a hypothesis that best explains a set of manifestations (observations). Generally, it is computationally expensive to find such a hypothesis because the number of potential hypotheses is exponentially large. Recently, many efforts have been made to find parallel processing methods that overcome this difficulty. In this paper, we propose a neural network model for diagnostic problem solving in which a diagnostic problem is treated as a combinatorial optimisation problem. One feature of the model is that the causal network is used directly as the neural network. Another feature is that the errors between the observations and the current activations of the manifestation nodes are used to guide the network computing towards optimal diagnostic hypotheses.

1 Introduction

For a set of manifestations (observations), the diagnostic inference is to find the most plausible faults or disorders which can explain why the manifestations are present. In general, an individual disorder can explain only a portion of the manifestations. Therefore, a composite hypothesis consisting of several individual disorders has to be found; such a composite hypothesis is able to explain all the manifestations. However, finding a composite hypothesis is computationally very expensive. One reason is that the number of possible combinations of individual disorders is exponentially large. Two approaches have been taken to address this difficulty. One approach is to develop traditional AI algorithms that focus the diagnostic reasoning on a restricted diagnosis space, so that the combinatorial explosion is largely avoided [1,2,3]. Another approach is to develop problem solving algorithms involving large amounts of parallel processing.

Neural network computing, an approach based on highly parallel local computations, is known to be strong at solving computationally difficult tasks. A neural network is a massively parallel distributed processor. A general neural network diagnosis model includes three parts: the network architecture, the activation rules that calculate the node activations, and the network equilibrium which indicates the stopping condition of the network computing. Probabilistic causal networks are often adopted directly as the network architectures [4,5]. The activation rules and the network equilibrium vary with different models. Given a set of manifestations, the node activations are calculated by the activation rules repeatedly until the network equilibrium is reached. Thus, a set of disorders is determined by the activations of the disorder nodes. For neural network diagnosis models, the difficulty mentioned above can be alleviated by means of the highly parallel local computations of neural networks.

In this paper, we propose a general connectionist diagnosis model in which a diagnostic problem is treated as a combinatorial optimisation problem. The errors between the observations and the current activations of the manifestation nodes are used to guide the network computing towards optimal diagnostic hypotheses. For a given observation set, the neural network model proposed here can find a composite hypothesis with the error between the observations and the manifestation activations being less than 0.1. Experimental results show that the average correct rate of the composite hypotheses found by the model is 98%.

The remainder of this paper is organised as follows. In Section 2, we characterise the diagnostic task and the causal network which represents the diagnostic knowledge. In Section 3, we present the neural network model, including the network architecture, the activation rules, and the network computing procedure. In Section 4, experimental results are given to demonstrate the efficiency of the model. Finally, Section 5 summarises the paper.

2 Formulation of Diagnosis

In this section, we first characterise the general diagnosis task and describe a constrained class of diagnosis, then characterise the causal network that is used to represent the constrained diagnosis task.

A. Characterisation of Diagnosis Tasks

A diagnostic task can be characterised as a five-tuple (D, M, e, M_I, D_I), where:

- D = {d_i | i = 1, ..., n} is a finite set of all the individual disorder (or cause) nodes.
- M = {m_j | j = 1, ..., p} is a finite set of all the manifestation (or effect) nodes to be explained.
- e is a map from ℘(D) to ℘(M), where ℘(A) denotes the power set of a set A. For every D_s ∈ ℘(D), e(D_s) specifies the subset of M which can be explained by D_s.
- M_I is a specific observation set, which is a subset of M. It is represented as M_I = {mi_j | j = 1, ..., p}. The value of M_I does not change with time.
- D_I is an explanation of M_I, i.e., e(D_I) = M_I.

M_I is the observation, which is the input; e is the relationship between ℘(D) and ℘(M), which is the network. The task of diagnosis is to derive D_I based on M_I and e.

There are four possible types of interaction between individual disorders: independence, monotonicity, incompatibility and cancellation. The diagnosis problem is nonlinear when the incompatibility interaction exists and nonmonotonic when the cancellation interaction exists. It has been proved that the general (nonlinear and nonmonotonic) diagnosis task is NP-hard [7]. In this paper, we consider only a constrained class of diagnosis: monotonic diagnosis. A diagnostic problem is monotonic if a composite explanation such as {d_1, d_2} can explain additional data that are not explained by either of its members alone, but never loses coverage. Formally, the monotonic diagnosis problem can be described as:

$$\forall D_s \subseteq D, \quad \Bigl(\bigcup_{d \in D_s} e(\{d\})\Bigr) \subseteq e(D_s)$$
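To make the condition concrete, the following Python sketch checks it by brute force over a toy disorder set. The map e, the disorder names and the effect sets are illustrative inventions, not data from the paper:

```python
from itertools import combinations

# Illustrative effect sets for three hypothetical disorders (not from the paper).
effects = {
    "d1": {"m1", "m2"},
    "d2": {"m2", "m3"},
    "d3": {"m4"},
}

def e(ds):
    """Map a set of disorders to the set of manifestations it explains.
    Here e is simply the union of the individual effect sets."""
    explained = set()
    for d in ds:
        explained |= effects[d]
    return explained

def is_monotonic(disorders):
    """Check the monotonicity condition: for every subset Ds of D,
    the union of e({d}) over d in Ds is contained in e(Ds)."""
    ds = list(disorders)
    for r in range(len(ds) + 1):
        for subset in combinations(ds, r):
            union_individual = set().union(*(e({d}) for d in subset)) if subset else set()
            if not union_individual <= e(set(subset)):
                return False
    return True

print(is_monotonic(effects))  # True: a union-based e is trivially monotonic
```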

B. Formulation of Causal Networks

A causal network consists of a directed acyclic graph and a conditional probability distribution associated with the graph. It can be characterised as a triplet (H, L, P), where:

- H is a finite set of nodes, H = {h_1, ..., h_n};
- L ⊆ H × H is a set of links or arcs, where each element of L is a pair of nodes <h_i, h_j>, h_i, h_j ∈ H;
- P is a set of probabilities. For each link <h_i, h_j> in L, there is a conditional probability P(h_j | h_i) accompanying the link, called the causal strength from h_i to h_j, which represents how strongly h_i causes h_j. If there is no causal link between h_i and h_j, P(h_j | h_i) is assumed to be zero.

A class of simplified causal networks, networks with a disjunctive relationship between disorders (noisy-OR gates), has received much attention [4,9]. In this paper, not only the disjunctive but also the conjunctive relationship is considered, and a simple version of such a causal network is used. This simple causal network includes only two layers. All the nodes in the causal network are divided into two sets: the disorder set D = {d_1, ..., d_n} and the manifestation set M = {m_1, ..., m_p}. In this case, H = D ∪ M. For each manifestation node m_j ∈ M, the interaction between the cause nodes of m_j (denoted causes(m_j)) is either disjunctive or conjunctive, but not a compound of the two relationships such as d_1 ∨ (d_2 ∧ d_3) → m_j or d_1 ∧ (d_2 ∨ d_3) → m_j, d_1, d_2, d_3 ∈ D. In order to distinguish the two relationships, each node m_j ∈ M has a node type TYPE(m_j) associated with it, TYPE(m_j) ∈ {OR, AND}. TYPE(m_j) = OR indicates that the relationship between the nodes in causes(m_j) is disjunctive, i.e., d_i1 ∨ ... ∨ d_ir → m_j, where causes(m_j) = {d_i1, ..., d_ir}. TYPE(m_j) = AND indicates that the relationship is conjunctive, i.e., d_i1 ∧ ... ∧ d_ir → m_j.

The diagnostic problem characterised by this causal network is a restricted monotonic diagnosis; the restriction comes from the exclusion of compounds of the two relationships. Two basic assumptions underlie the causal network.

- Independence assumption: All disorders are independent of each other, i.e., P(d_1 ∧ ... ∧ d_n) = P(d_1)P(d_2)...P(d_n).
- Knowledge assumption: For every disorder d_i ∈ D, its prior probability 0 < P(d_i) < 1 is given. For d_i1 ∨ ... ∨ d_ir → m_j, the conditional probabilities P(m_j | d_ik) are given, with 0 < P(m_j | d_ik) ≤ 1, k = 1, ..., r. For d_i1 ∧ ... ∧ d_ir → m_j, the conditional probability 0 < P(m_j | d_i1 ∧ ... ∧ d_ir) < 1 is given.

The notations p_i and c_ij will be used as abbreviations for P(d_i) and P(m_j | d_i), respectively.
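As an illustration of this formulation, the two-layer network can be captured in a small data structure. The following Python sketch is ours, not the authors' implementation; the field and helper names (priors, strengths, node_type, causes, effects) are hypothetical and the example numbers are made up:

```python
from dataclasses import dataclass

@dataclass
class CausalNetwork:
    """Two-layer causal network with H = D ∪ M."""
    priors: dict      # p_i = P(d_i) for each disorder node, 0 < p_i < 1
    strengths: dict   # c_ij = P(m_j | d_i), keyed by the link (d_i, m_j)
    node_type: dict   # TYPE(m_j) in {"OR", "AND"} for each manifestation node

    def causes(self, m):
        """Disorder nodes with a causal link into manifestation m."""
        return [d for (d, mj) in self.strengths if mj == m]

    def effects(self, d):
        """Manifestation nodes that disorder d can cause."""
        return [mj for (di, mj) in self.strengths if di == d]

# A hypothetical 3-disorder, 2-manifestation network.  For the AND node m2,
# every incoming strength carries the same value P(m2 | d2 and d3), matching
# the assignment described later in Section 3.C.
net = CausalNetwork(
    priors={"d1": 0.1, "d2": 0.2, "d3": 0.05},
    strengths={("d1", "m1"): 0.8, ("d2", "m1"): 0.6,
               ("d2", "m2"): 0.7, ("d3", "m2"): 0.7},
    node_type={"m1": "OR", "m2": "AND"},
)
```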

3 The Neural Network Model for the Monotonic Diagnosis Task

A neural network consists of a set of nodes connected via directed links. Numeric connection strengths, called link weights, are associated with the links. For each node h_i in the network, there is a numeric value h_i(t), called its activation, associated with it at time t. During the neural network computing, the activation h_i(t) is updated by certain activation rules. In this section, we describe a neural network model for solving the monotonic diagnostic problem. First we describe the network architecture, then the energy function, then the rules to update the node activations, and finally the computing procedure that completes the diagnosis task.

A. Network Architecture

The two-layer causal network described in Section 2.B is used directly as the network architecture. The node set of the network is D ∪ M. The activation d_i(t) of each disorder node d_i represents the possibility of d_i being an element of the solution to the diagnosis problem. The activation m_j(t) of each manifestation node m_j measures the possibility that m_j is caused by its causative disorders. The causal strengths P are used as the link weights: each link connecting nodes d_i and m_j is associated with the corresponding probabilistic causal strength c_ij which, during the computation, is used to update the activations of d_i and m_j.

For a diagnostic problem, M_I is the input data, observed before the diagnosis. Generally, if a manifestation m_j is observed as present, mi_j, the initial value of m_j, is marked as 1, and otherwise as 0. But in real-world diagnosis the presentation of some manifestations may not be clear enough to be recognised: some manifestations might present very clearly or strongly, while others present weakly. To represent this, in our model the initial values of manifestations are no longer exactly 1 or 0 but can take any value between 0 and 1. That is, 0 ≤ mi_j ≤ 1, which indicates how strongly the manifestation presents.

B. Energy Function

In real-world diagnosis, one always tries to find the best causes under which the observed manifestations are most likely. That is, the effects of the causes found by the diagnosis should be the same as, or very similar to, the observed manifestations. Let M(t_e) = {m_1(t_e), ..., m_p(t_e)} be the final manifestation activations derived by the diagnosis model through the neural network computing, and let M_I = (mi_1, ..., mi_p) be the observation as described above. The smaller the difference between M(t_e) and M_I, the better the inference of the diagnosis model works. We use the following equation to measure the error E(t) between M(t) and M_I at time t:

$$E(t) = \sum_{j=1}^{p} (mi_j - m_j(t))^2 \qquad (1)$$

The smaller E(t) is, the closer M(t) is to M_I. By certain activation rules, m_j(t) can be calculated from the activations of its cause nodes in causes(m_j); that is, m_j(t) is a function of its cause nodes, which can be expressed as m_j(t) = f(d_i(t) | d_i ∈ causes(m_j)). E(t) hence becomes a function of all the disorder nodes, and equation (1) can be expressed as

$$E(t) = F(d_1(t), \ldots, d_n(t)) \qquad (2)$$

For the diagnostic problem modelled by the neural network, to find a solution for a given set of manifestations is to derive a set of disorders whose activations minimise equation (2). Let |D| = n. If any subset D_I of D is represented as an n-dimensional vector X = (x_1, ..., x_n), where x_i = 1 if d_i ∈ D_I and x_i = 0 otherwise, then D_I can be considered one of the 2^n corners of the n-dimensional hypercube [0,1]^n. Thus, a diagnostic problem can be viewed as a discrete optimisation problem: to find one of the corners of the hypercube [0,1]^n which minimises equation (2). If X is allowed to move continuously through the interior of the hypercube, i.e., 0 ≤ x_i ≤ 1 rather than x_i ∈ {0,1}, the diagnostic problem of finding D_I with the minimum value of equation (2) is transformed into a continuous optimisation problem. Equation (2) is the energy function whose minimum value is sought.

For our neural network model, the optimisation is performed by repeatedly updating d_i(t) and m_j(t) using certain activation rules until E(t) reaches a minimum value, which is the equilibrium of the neural network. In order to reach the minimum of the energy function, d_i(t) is updated based on the current error E(t); the updated d_i(t) makes the error smaller and smaller. Initially, d_i(0) = p_i. When the network approaches its equilibrium, if d_i(t_e) > θ, for instance θ = 0.8, the disorder d_i is considered an element of the solution; otherwise d_i is rejected. Thus, all the disorders d_i with d_i(t_e) > θ form a subset of disorders D_I = {d_i | d_i(t_e) > θ}, and D_I is considered a solution for the given observation.
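A minimal sketch of the error measure of equations (1) and (2) follows, assuming manifestations and activations are stored in dictionaries keyed by node name (our layout, not the authors'):

```python
def energy(observed, activations):
    """E(t) = sum over j of (mi_j - m_j(t))^2 over all manifestation nodes.

    observed:    dict mapping m_j to mi_j, the (possibly graded) observation in [0, 1]
    activations: dict mapping m_j to m_j(t), the current manifestation activations
    """
    return sum((observed[m] - activations[m]) ** 2 for m in observed)

# Example: three manifestations, two of them observed strongly.
print(energy({"m1": 1.0, "m2": 0.6, "m3": 0.0},
             {"m1": 0.9, "m2": 0.5, "m3": 0.1}))  # 0.03
```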

C. Activation Rules

1) Manifestation Activation

Motivated by the results in [8], we developed the following activation rules to calculate m_j(t).

Theorem 1: If E_1 and E_2 are independent, then

$$P(E_1 \wedge E_2) = P(E_1)P(E_2) \qquad (3)$$
$$P(E_1 \vee E_2) = P(E_1) + P(E_2) - P(E_1)P(E_2) \qquad (4)$$

The theorem tells us how to calculate the probability of a conjunction or a disjunction of independent propositions.

Definition 1 (Operator ⊕): ⊕ is a two-argument operator defined as

$$\oplus(X_1, X_2) = P(X_1) + P(X_2) - P(X_1)P(X_2) \qquad (5)$$

For more than two propositions, formulas (3) and (4) become the following formulas (6) and (7), respectively:

$$P(E_1 \wedge E_2 \wedge \ldots \wedge E_n) = P(E_1)P(E_2) \ldots P(E_n) \qquad (6)$$
$$P(E_1 \vee E_2 \vee \ldots \vee E_n) = \oplus(\oplus(\ldots \oplus(\oplus(E_1, E_2), E_3) \ldots, E_{n-1}), E_n) \qquad (7)$$

For every m_j ∈ M, suppose causes(m_j) = {d_i1, ..., d_in}. We define the activation rule for m_j as follows.

TYPE(m_j) = AND: m_j(t) = P(d_i1 ∧ ... ∧ d_in) P(m_j | d_i1 ∧ ... ∧ d_in)

TYPE(m_j) = OR: m_j(t) = P(d_i1 ∨ ... ∨ d_in) P(m_j | d_i1 ∨ ... ∨ d_in) = P((m_j ∧ d_i1) ∨ ... ∨ (m_j ∧ d_in))

Since d_i1, ..., d_in are independent, P(m_j ∧ d_i) = P(m_j | d_i)P(d_i), where P(m_j | d_i) = c_ij and P(d_i) refers to the current activation d_i(t) of d_i at time t. According to formulas (6) and (7), the activation rules above can be expressed as the following formula:

$$\text{TYPE}(m_j) = \text{AND}: \quad m_j(t) = d_{i1}(t) \, d_{i2}(t) \ldots d_{in}(t) \, c_{i_1 j}$$
$$\text{TYPE}(m_j) = \text{OR}: \quad m_j(t) = \oplus(\oplus(\ldots \oplus(c_{i_1 j} d_{i1}(t), c_{i_2 j} d_{i2}(t)) \ldots), c_{i_n j} d_{in}(t)) \qquad (8)$$

For d_i1 ∧ ... ∧ d_in → m_j, in the neural network model the causal strength between d_ik and m_j is assigned as P(m_j | d_i1 ∧ ... ∧ d_in), i.e., P(m_j | d_ik) = P(m_j | d_i1 ∧ ... ∧ d_in), k = 1, ..., n.
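Rule (8) and the ⊕ operator of Definition 1 translate directly into code. The sketch below reuses the dictionary layout of the earlier snippets; the function names are hypothetical, not from the paper's SNNS implementation:

```python
from functools import reduce

def op(x1, x2):
    """The operator ⊕ of Definition 1: P(X1 or X2) for independent X1, X2."""
    return x1 + x2 - x1 * x2

def manifestation_activation(m, node_type, causes, c, d_act):
    """Rule (8): compute m_j(t) from the activations of its cause nodes.

    m:         the manifestation node
    node_type: "AND" or "OR", i.e. TYPE(m)
    causes:    list of disorder nodes in causes(m)
    c:         dict mapping (d_i, m_j) to the causal strength c_ij
    d_act:     dict mapping d_i to its current activation d_i(t)
    """
    if node_type == "AND":
        # m(t) = d_i1(t)...d_in(t) * c_i1j; for an AND node every c_ikj equals
        # P(m | d_i1 and ... and d_in), so any one of them serves.
        prod = 1.0
        for d in causes:
            prod *= d_act[d]
        return prod * c[(causes[0], m)]
    # OR node: fold ⊕ over the products c_ikj * d_ik(t).
    return reduce(op, (c[(d, m)] * d_act[d] for d in causes))
```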

2) Disorder Activation

The activation of a disorder node d_i is determined by the activations of the nodes in effects(d_i). The change of d_i(t) over time can be expressed approximately as d_i(t + δ) − d_i(t) = (dd_i/dt) · δ, where δ is a small positive number (the time step) and dd_i/dt is the derivative, or rate of change, of d_i(t). The activation of d_i at time t + δ is then determined by the following equation:

$$d_i(t + \delta) = d_i(t) + \frac{dd_i}{dt} \cdot \delta \qquad (9)$$

We define dd_i/dt as

$$\frac{dd_i}{dt} = \beta \sum_{m_j \in effects(d_i)} \frac{dd_{ji}}{dt}, \qquad 0 < \beta < 1 \qquad (10)$$

$$\frac{dd_{ji}}{dt} = \begin{cases} (mi_j - m_j(t)) \, c_{ij} & m_j \in effects(d_i) \\ 0 & \text{otherwise} \end{cases} \qquad (11)$$

With equations (9), (10) and (11), the activation of d_i can be calculated from the current activations d_i(t) and m_j(t), the link strengths P, and the observation M_I.
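The update of equations (9)-(11) might be coded as follows; the β and δ values are illustrative defaults, and the clipping to [0, 1] is our own safeguard, not stated in the paper:

```python
def disorder_step(d, effects, c, observed, m_act, d_act, beta=0.5, delta=0.1):
    """One update of d_i(t) following equations (9)-(11).

    effects:  list of manifestation nodes in effects(d)
    c:        dict mapping (d_i, m_j) to c_ij
    observed: dict mapping m_j to mi_j (the fixed observation M_I)
    m_act:    dict mapping m_j to its current activation m_j(t)
    d_act:    dict mapping d_i to its current activation d_i(t)
    """
    # (10)/(11): dd_i/dt = beta * sum over m_j in effects(d_i) of (mi_j - m_j(t)) * c_ij
    rate = beta * sum((observed[m] - m_act[m]) * c[(d, m)] for m in effects)
    # (9): d_i(t + delta) = d_i(t) + (dd_i/dt) * delta; clipping to [0, 1]
    # is our addition so the activation stays a probability.
    return min(1.0, max(0.0, d_act[d] + rate * delta))
```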

D. Network Computing Procedure

The network computing procedure consists of a series of iterative computations. Initially, d_i(0) = p_i and m_j(0) = 0, i = 1, ..., n, j = 1, ..., p. Based on the observation data M_I, the activations d_i(t) and m_j(t) are updated with equations (8), (9), (10) and (11). The computation is repeated until the energy function reaches its equilibrium. Here t refers to the iteration count. Each iteration consists of the following steps (initially t = 0):

1. Update m_j(t) with equation (8), j = 1, ..., p.
2. Update the activation d_i(t) with equations (9), (10) and (11), i = 1, ..., n.
3. Determine whether the network has reached its equilibrium, that is, whether |d_i(t) − d_i(t−1)| ≈ 0 for i = 1, ..., n. If not, t is assigned t + 1 and the computation returns to step 1 for the next iteration; otherwise, the computation stops. If E(t) ≈ 0, then all the d_i with d_i(t) > θ form a solution; otherwise, there is no solution to the observation M_I.

In step 3, |d_i(t) − d_i(t−1)| ≈ 0 is used to determine the network equilibrium instead of E(t) ≈ 0. This is because, for some observation sets, E(t) might never get close to 0: the observations (often called "ghost data") cannot be derived from any subset of D. In this case, |d_i(t) − d_i(t−1)| is still likely to approach 0, because d_i(t) no longer changes once E(t) reaches a stable value. So |d_i(t) − d_i(t−1)| ≈ 0 can stop the network computation when ghost data are given and there is no solution to the observation M_I. Conversely, if E(t) ≈ 0, then by equations (9), (10) and (11), |d_i(t) − d_i(t−1)| is certainly close to 0; in this case, |d_i(t) − d_i(t−1)| ≈ 0 stops the computation and a solution has been found.
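Putting the pieces together, the computing procedure can be sketched as below. It reuses the hypothetical helpers from the earlier snippets (CausalNetwork, manifestation_activation, disorder_step, energy); the tolerance `tol` stands in for the paper's informal |d_i(t) − d_i(t−1)| ≈ 0 test, and θ = 0.8 follows the text:

```python
def diagnose(net, observed, theta=0.8, tol=1e-4, max_iter=10_000):
    """Iterate rules (8)-(11) until |d_i(t) - d_i(t-1)| is close to 0 for all i.

    net:      a CausalNetwork as in the earlier sketch
    observed: dict mapping m_j to mi_j in [0, 1]
    Returns (D_I, E): D_I = {d_i | d_i(t_e) > theta} and the final error E.
    E close to 0 means D_I explains the observation; otherwise the input
    is "ghost data" and there is no solution.
    """
    d_act = dict(net.priors)                    # d_i(0) = p_i
    m_act = {m: 0.0 for m in net.node_type}     # m_j(0) = 0
    for _ in range(max_iter):
        for m in m_act:                         # step 1: rule (8)
            m_act[m] = manifestation_activation(
                m, net.node_type[m], net.causes(m), net.strengths, d_act)
        previous = dict(d_act)
        for d in d_act:                         # step 2: rules (9)-(11)
            d_act[d] = disorder_step(
                d, net.effects(d), net.strengths, observed, m_act, d_act)
        if all(abs(d_act[d] - previous[d]) < tol for d in d_act):
            break                               # step 3: equilibrium reached
    return ({d for d, a in d_act.items() if a > theta},
            energy(observed, m_act))

# Example: a strongly observed m1 and absent m2 on the hypothetical net above.
solution, err = diagnose(net, {"m1": 1.0, "m2": 0.0})
```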

4 Experiments

We have conducted a number of experiments to test the neural network model and to compare it with the model proposed in [5]. All the experiments were performed on the Stuttgart Neural Network Simulator (SNNS), which provides an efficient and flexible simulation environment for neural network research. The activation rules and the computing procedure are implemented as C functions inserted into SNNS; by means of SNNS, the parallel computation of the neural network model is realised.

The two examples used in [5] are borrowed and tested with the neural network proposed here; we call them Example 1 and Example 2 respectively. Each example has a causal network of 10 disorders and 10 manifestations. The details of the two causal networks are given in TABLE 1.

[TABLE 1: Network details of Examples 1 and 2, listing the prior probabilities p_i of the ten disorders and the causal strengths c_ij of the links in each causal network.]

For a causal network with n disorder nodes, there are 2^n potential subsets of disorders. For each of the two examples, we construct all 2^n potential cases. The basic experiment consists of three steps: observation generation, hypothesis generation, and correctness judgement. In the observation generation step, the observations for all 2^n cases are generated by the neural network model, using the 2^n disorder subsets to instantiate the disorder nodes; the observations are the effects of the disorder subsets.

Let M_OBS represent the set of the observations. In the hypothesis generation step, the observations in M_OBS are used as the input observations, and the composite hypotheses are derived by the model through the network computing. Suppose D_HYP is the set of the composite hypotheses; then each composite hypothesis in D_HYP can explain the corresponding observation in M_OBS. In order to judge whether or not the composite hypotheses are correct solutions, in the third step the manifestations caused by the composite hypotheses are compared with the observations to see whether they are close. If they are close, which means that the observations can be caused by the composite hypotheses, the composite hypotheses are correct solutions for explaining the observations.

Suppose M_HYP is the manifestation set caused by D_HYP. For each M_hyp(j) ∈ M_HYP and each corresponding M_obs(j) ∈ M_OBS, j = 1, ..., 2^n, with M_hyp(j) = {m_hyp_1(j), ..., m_hyp_p(j)} and M_obs(j) = {m_obs_1(j), ..., m_obs_p(j)}, the average error between M_hyp(j) and M_obs(j) is defined as

$$aem^{(j)} = \frac{\sum_{k=1}^{p} | m\_hyp_k^{(j)} - m\_obs_k^{(j)} |}{p}$$

and the average error between M_HYP and M_OBS is defined as

$$AEM = \frac{\sum_{j=1}^{2^n} aem^{(j)}}{2^n}$$

As mentioned in Section 3.B, if the effects of the causes found by a diagnosis are the same as, or very close to, the observed manifestations, then these causes are considered able to explain the observed manifestations. The closer the effects of the causes are to the observed manifestations, the more plausible it is that the causes form a correct solution. Therefore, the error between the observed manifestations and the effects of the causes found by the neural network model is a criterion for judging whether these causes form a correct solution: if aem(j) < ε, say ε = 0.1, the composite hypothesis D_hyp(j) is considered a correct solution explaining the manifestation set M_obs(j). The experimental results given below show that the average correct rate for the two examples is 98%.

The experimental results are given in TABLE 2. For the two examples, around 1000 of the 1023 cases each have aem less than 0.1. According to the criterion above, the composite hypotheses of these cases are correct, and the average correct rate for the two examples is 98%.
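A sketch of the correctness judgement, using the same illustrative dictionary layout as the earlier snippets; ε = 0.1 follows the text:

```python
def aem(m_hyp, m_obs):
    """Average error between one derived manifestation set and one observation."""
    return sum(abs(m_hyp[m] - m_obs[m]) for m in m_obs) / len(m_obs)

def correct_rate(cases, eps=0.1):
    """Fraction of cases judged correct, i.e. with aem below eps.

    cases: list of (m_hyp, m_obs) pairs, each a dict mapping m_j to [0, 1]
    Returns (rate, AEM) where AEM is the mean of the per-case errors.
    """
    errors = [aem(h, o) for h, o in cases]
    return (sum(err < eps for err in errors) / len(errors),
            sum(errors) / len(errors))
```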

                                              Example 1   Example 2   Total
Number of cases                                    1023        1023    2046
Number of cases with aem < 0.1                      990        1016    2006
                                                (96.7%)     (99.3%)   (98%)
Number of cases with D_hyp(j) = D(j) ∈ ℘(D)         849         932    1781
                                                (82.9%)     (91.1%)   (87%)
AEM                                              0.0091      0.0089   0.009

TABLE 2 The experimental results of the neural network model

5 Conclusion

In summary, a neural network diagnosis model has been presented in this paper. One feature of the model is that, unlike many other optimisation neural network models, the causal network is used directly as the neural network without any further transformation. To the authors' knowledge, the only other neural network diagnostic model which also uses a causal network directly as the neural network is the model proposed by Peng and Reggia [4,5]. However, the causal network used by that model is a noisy-OR network, i.e., only the independence interaction is involved, and this kind of network can only represent independent diagnostic problems. Compared with the model of Peng and Reggia, our model can represent and solve the monotonic diagnostic problem, which is more general than the independent diagnostic problem. Another feature of the model is that the errors between the observations and the current activations of the manifestation nodes are used to guide the network computing. The experimental results show that the correct rate of diagnosis with this method is very high.

References

[1] de Kleer, J., "Focusing on Probable Diagnoses", AAAI-91, Vol. 2, pp. 842-848, 1991.
[2] Raiman, O., de Kleer, J. and Saraswat, V., "Critical Reasoning", IJCAI-93, Vol. 1, pp. 18-23, 1993.
[3] Yue Xu and Chengqi Zhang, "An Improved Critical Diagnosis Reasoning Method", ICTAI-96, Vol. 1, pp. 170-173, Toulouse, France, 1996.
[4] Peng, Y. and Reggia, J., Abductive Inference Models for Diagnostic Problem Solving, New York: Springer-Verlag, 1990.
[5] Peng, Y. and Reggia, J., "A Connectionist Model for Diagnostic Problem Solving", IEEE Transactions on Systems, Man and Cybernetics, Vol. 19, No. 2, pp. 285-298, 1989.
[6] Goel, A. and Ramanujam, J., "A Neural Architecture for a Class of Abduction Problems", IEEE Transactions on Systems, Man and Cybernetics, Vol. 26, No. 6, pp. 854-860, 1996.
[7] Bylander, T., et al., "The Computational Complexity of Abduction", Artificial Intelligence, Vol. 49, pp. 25-60, 1991.
[8] Blockley, D.I., et al., "Measures of Uncertainty", Civil Engineering Systems, Vol. 1, pp. 3-9, 1988.
[9] Pearl, J., Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, San Mateo, California, 1988.