Parallel Inductive Logic Programming

Luc Dehaspe and Luc De Raedt

January 17, 1995

Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, B-3001 Heverlee, Belgium. Email: Luc.Dehaspe,[email protected]; fax: +32 16 32 75 39; telephone: +32 16 32 75 50.

Abstract

The generic task of Inductive Logic Programming (ILP) is to search a predefined subspace of first-order logic for hypotheses that in some respect explain examples and background knowledge. In this paper we consider the development of parallel implementations of ILP systems. A first part discusses the division of the ILP-task into subtasks that can be handled concurrently by multiple processes executing a common sequential ILP algorithm. We define the notion of a valid partition of an ILP-task, and test this definition against two problem specifications that have been employed within ILP. The second part of the paper focuses on the algorithmic description, prototypical implementation, and comparative evaluation of a parallel version of the clausal discovery system Claudien.

Keywords: inductive logic programming, knowledge discovery, machine learning, concurrency, first order logic.

1 Introduction

Inductive Logic Programming (ILP) [12, 13, 14] has by now become an established subfield of machine learning, theoretically situated at the intersection of inductive learning and logic programming, and field-tested through the implementation and application of a variety of ILP systems. In this paper we consider the development of parallel implementations of these systems. The classical and ambitious objective is then to obtain an algorithm with a speedup proportional to the number of processors over the best available serial algorithm (cf. [20]); notice that the aim is not to extend the solvable problem space of a problem.

A central issue in designing a computer system to support parallelism is how to break up a given task into subtasks, each of which executes in parallel with the others (cf. [11, 18]). If we want to exploit the parallel setup to the full, the distribution of the global task should not be a once-only event during initialization. Busy processes should at any time be able to cede part of their local task to free waiting processes, thus causing the global task to be recursively partitioned.

In Section 2 we define the general ILP-task, derive our notion of a (valid) partition of an ILP-task, verify this notion in the context of two problem specifications for ILP, and end up with theoretical constraints on Parallel Inductive Logic Programming (PILP). Given these constraints, the second part of the paper then focuses on the algorithmic description (Section 3.1), prototypical implementation (Section 3.2), and comparative evaluation (Section 4) of a parallel version of the clausal discovery system Claudien [2, 6]. Finally, we round off with some conclusions in Section 5.

2 Partitioning the ILP-task

Roughly speaking, ILP starts from an initial background theory $B$, some evidence (or examples) $E$, and a language bias $L$, which defines the set of well-formed clauses. The aim is then to induce a hypothesis $H \subseteq L$ that together with $B$ explains the examples $E$. Formally speaking, we have:

Given:
- a set of examples $E$
- a background theory $B$
- a language bias $L$ that defines the hypothesis space
- a notion of explanation (a semantics)

Find: a hypothesis $H \subseteq L$ that explains the examples $E$ with respect to the theory $B$.

From this general problem specification, we can derive the notion of a partition of an ILP-task (note that we will not consider splitting the background knowledge between multiple processes).

Definition 1 A partition $\{T_1, \ldots, T_n\}$ of an ILP-task $T = (B, E, L)$ is a set of ILP-tasks $T_i = (B, E_i, L_i)$ such that $E_i \subseteq E$ and $L_i \subseteq L$ for all $i$, and such that $\bigcup_{i=1}^{n} E_i = E$ and $\bigcup_{i=1}^{n} L_i = L$.

Our definition of a valid partition is an instantiation of the general constraint on the development of parallel systems according to which the results of concurrent and sequential execution should be equivalent.

Definition 2 Given:
- a common sequential ILP algorithm $A$
- an ILP task $T$
- a partition $\{T_1, \ldots, T_n\}$ of $T$
- the solution hypothesis $H$ obtained by applying algorithm $A$ to task $T$
- for all $i \in 1 \ldots n$: the partial solution hypothesis $H_i$ obtained by applying algorithm $A$ to task $T_i$

partition $\{T_1, \ldots, T_n\}$ is valid iff the union of partial hypotheses $\bigcup_{i=1}^{n} H_i$ is equivalent to $H$ (cf. Figure 1).

The problem of ILP is usually underdetermined, in the sense that one ILP task may have multiple solutions. Taking this perspective, one might generalize (or interpret) the above definition as follows: any solution of the form $\bigcup_{i=1}^{n} H_i$ (where each $H_i$ is a solution for $T_i$) should be a solution for $T$, and vice versa, any solution $H$ for $T$ should be decomposable into $\bigcup_{i=1}^{n} H_i$, where each $H_i$ is again a solution for $T_i$.

[Figure 1: Definition 2 graphically. Task $T$ is mapped by ILP algorithm $A$ to hypothesis $H$; the subtasks $T_1, \ldots, T_n$ of a partition are each mapped by $A$ to partial hypotheses $H_1, \ldots, H_n$, whose union is compared with $H$.]

In the following paragraphs we will apply Definition 2 to the two alternative notions of explanation or semantics distinguished in ILP, i.e. the so-called normal ILP setting and the nonmonotonic ILP setting, cf. [14]. A complete overview of the differences and similarities between these two settings can be found in [4, 14].

2.1 Normal ILP Setting

The normal semantics is the default setting of ILP, used in systems such as Mis [17], Foil [16], Golem [15], and Progol [19]. Its characteristic feature is that both positive and negative examples are used.

Definition 3 (normal explanation) Given background theory $B$, negative evidence $E^-$, positive evidence $E^+$, and language $L$, the aim is to find a hypothesis $H \subseteq L$ such that $B \cup H \cup E^- \not\models \Box$ (consistency) and $B \cup H \models E^+$ (completeness).

From the combination of Definitions 2 and 3 we can derive a first constraint on valid partitions in the normal setting:

Condition 1 (normal semantics - consistency)
$$\left.\begin{array}{l}
B \cup H_1 \cup E_1^- \not\models \Box \\
B \cup H_2 \cup E_2^- \not\models \Box \\
\quad\vdots \\
B \cup H_n \cup E_n^- \not\models \Box
\end{array}\right\}
\Leftrightarrow\;
B \cup \left(\bigcup_{i=1}^{n} H_i\right) \cup E^- \not\models \Box,
\quad \text{with } E^- = \bigcup_{i=1}^{n} E_i^-$$

According to Condition 1 a valid partition in the normal setting should preserve consistency with the negative examples. Monotonicity of first-order logic causes the implication from right to left to hold in general. The implication in the other direction, however, is false if the partition splits up the negative evidence, such that for some $i$, $E_i^- \neq E^-$. Consider in that respect the case where $B \cup H_i \cup E_j^- \models \Box$ for some distinct $i, j$. This implies that the division of negative evidence may result in invalid partitions. If instead we split up either positive evidence $E^+$ or language $L$, again the implication from left to right in Condition 1 does not generally hold. Consider in that respect the following counterexample, where $L$ (resp. $E^+$) is subdivided into clauses (resp. examples) for haswings and clauses (resp. examples) for yellow:

$$B = \{\, bird(tweety),\; bird(oliver),\; flies(tweety),\; flies(X) \leftarrow haswings(X) \,\}$$
$$E^- = \{\, yellow(oliver) \,\} \qquad E^+ = \{\, haswings(oliver),\; yellow(tweety) \,\}$$
$$H_1 = \{\, haswings(X) \leftarrow bird(X) \,\} \qquad H_2 = \{\, yellow(X) \leftarrow flies(X) \,\}$$
$$B \cup H_1 \cup E^- \not\models \Box \qquad B \cup H_2 \cup E^- \not\models \Box \qquad B \cup (\textstyle\bigcup_{i=1}^{2} H_i) \cup E^- \models \Box$$
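The derivation behind this counterexample can be checked mechanically. The following sketch is not from the paper; the encoding of clauses as Python tuples is an illustrative assumption. It computes the least model of the ground instances of $B$ plus each candidate hypothesis by naive forward chaining, and tests whether the negative example yellow(oliver) becomes derivable:

```python
# Sketch (not from the paper): naive forward chaining over the ground
# instances of the consistency counterexample. Clauses head(X) <- body(X)
# are encoded as (head, body) pairs; this encoding is illustrative only.

CONSTS = ["tweety", "oliver"]

B_FACTS = {("bird", "tweety"), ("bird", "oliver"), ("flies", "tweety")}
B_RULES = [("flies", "haswings")]        # flies(X) <- haswings(X)
H1 = [("haswings", "bird")]              # haswings(X) <- bird(X)
H2 = [("yellow", "flies")]               # yellow(X) <- flies(X)
NEG = ("yellow", "oliver")               # the negative example

def least_model(facts, rules):
    """Close the facts under the (unary-body) rules."""
    model, changed = set(facts), True
    while changed:
        changed = False
        for head, body in rules:
            for c in CONSTS:
                if (body, c) in model and (head, c) not in model:
                    model.add((head, c))
                    changed = True
    return model

for name, hyp in [("H1", H1), ("H2", H2), ("H1 u H2", H1 + H2)]:
    derives_neg = NEG in least_model(B_FACTS, B_RULES + hyp)
    print(f"B u {name}: derives yellow(oliver)? {derives_neg}")
# Output: False for H1 alone and H2 alone, True for their union.
```

Running this prints that $H_1$ and $H_2$ are individually consistent with the negative example, while their union violates it.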

This counterexample illustrates that the division of $E^+$ and $L$ may also result in invalid partitions. Consistency with the negative examples is not preserved in case a resolution derivation involving multiple partial hypotheses occurs. Condition 1 is satisfied only if this type of derivation is ruled out. This implies that recursive rules, or hypotheses with multiple predicates, cannot be learned if the positive evidence or the language bias is split up.

The completeness requirement mentioned second in Definition 3 further constrains the possibility of creating valid partitions in the normal setting:

Condition 2 (normal semantics - completeness)
$$\left.\begin{array}{l}
B \cup H_1 \models E_1^+ \\
B \cup H_2 \models E_2^+ \\
\quad\vdots \\
B \cup H_n \models E_n^+
\end{array}\right\}
\Leftrightarrow\;
B \cup \left(\bigcup_{i=1}^{n} H_i\right) \models E^+,
\quad \text{with } E^+ = \bigcup_{i=1}^{n} E_i^+$$

Again the monotonicity of first-order logic guarantees the equivalence in Condition 2 to hold in one direction, now from left to right. Therefore any solution $\bigcup_{i=1}^{n} H_i$ (where each $H_i$ is a solution to subtask $T_i$) will be a solution for $T$. The implication from right to left is invalid, as it may be that for a given task $T$ and partition $\{T_1, \ldots, T_n\}$, not all solutions $H$ to a learning task $T$ can be formulated as the union of solutions to subtasks $T_i$. This is illustrated in the following example:

$$B = \{\, bird(tweety),\; bird(oliver),\; yellow(tweety),\; yellow(lemon) \,\}$$
$$E^- = \{\, flies(oliver),\; flies(lemon) \,\} \qquad E^+ = \{\, haswings(tweety),\; haswings(oliver),\; flies(tweety) \,\}$$
$$E_1^+ = \{\, haswings(tweety),\; haswings(oliver) \,\} \qquad E_2^+ = \{\, flies(tweety) \,\}$$
$$H_1 = \{\, haswings(X) \leftarrow bird(X) \,\} \qquad H_2 = \{\, flies(X) \leftarrow haswings(X) \wedge yellow(X) \,\}$$
$$B \cup (\textstyle\bigcup_{i=1}^{2} H_i) \models E^+ \qquad B \cup H_1 \models E_1^+ \qquad B \cup H_2 \not\models E_2^+$$

If, due to distribution of $E^+$ or $L$, clauses for haswings and flies are induced separately, $H_2$ may not be found.
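As with the first counterexample, these completeness claims can be verified mechanically. The sketch below is not from the paper; the tuple encoding of clauses with conjunctive bodies is again an illustrative assumption. It checks $B \cup H \models E^+$ for each partial hypothesis and for their union by forward chaining:

```python
# Sketch (not from the paper): completeness check for the second
# counterexample. A clause head(X) <- b1(X), ..., bk(X) is encoded as
# (head, [b1, ..., bk]); this encoding is illustrative only.

CONSTS = ["tweety", "oliver", "lemon"]

B = {("bird", "tweety"), ("bird", "oliver"),
     ("yellow", "tweety"), ("yellow", "lemon")}
H1 = [("haswings", ["bird"])]                # haswings(X) <- bird(X)
H2 = [("flies", ["haswings", "yellow"])]     # flies(X) <- haswings(X), yellow(X)
E1_POS = {("haswings", "tweety"), ("haswings", "oliver")}
E2_POS = {("flies", "tweety")}

def least_model(facts, rules):
    """Close the facts under the rules (bodies are conjunctions)."""
    model, changed = set(facts), True
    while changed:
        changed = False
        for head, body in rules:
            for c in CONSTS:
                if all((b, c) in model for b in body) and (head, c) not in model:
                    model.add((head, c))
                    changed = True
    return model

for name, hyp, pos in [("H1", H1, E1_POS), ("H2", H2, E2_POS),
                       ("H1 u H2", H1 + H2, E1_POS | E2_POS)]:
    covered = pos <= least_model(B, hyp)
    print(f"B u {name} entails its positive examples: {covered}")
# Output: True for H1, False for H2 on its own, True for the union.
```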

2.2 Nonmonotonic ILP

The less common nonmonotonic setting of ILP is used, for instance, in the system Claudien [2]. It assumes only positive evidence $E$, and has the following notion of explanation.

Definition 4 (nonmonotonic explanation) Given background theory $B$, evidence $E$, and language $L$, target hypothesis $H$ is a maximal subset of $L$ (sometimes, see [2, 10], one also requires minimality, which means that the hypothesis should not contain redundant clauses) such that all clauses $h \in H$ are true in the minimal model of $B \cup E$.

A shorthand notation for "all clauses $h \in H$ are true in the minimal model of $B \cup E$" is "$H$ is true in $M(B \cup E)$". The instantiation of Definition 2 to the nonmonotonic case yields a single constraint on valid partitions:

Condition 3 (nonmonotonic semantics)
$$\left.\begin{array}{l}
H_1 \text{ is true in } M(B \cup E_1) \\
H_2 \text{ is true in } M(B \cup E_2) \\
\quad\vdots \\
H_n \text{ is true in } M(B \cup E_n)
\end{array}\right\}
\Leftrightarrow\;
\left(\bigcup_{i=1}^{n} H_i\right) \text{ is true in } M(B \cup E),
\quad \text{with } E = \bigcup_{i=1}^{n} E_i$$

It is easy to see that if for some $i$, $E_i \neq E$, the hypothesis $\bigcup_{i=1}^{n} H_i$ might be invalid, and therefore also the partition. However, if for all $i$, $E_i = E$, the condition is trivially satisfied. Therefore, partitioning in the nonmonotonic setting can be based on the language bias $L$. Given any partition of the language, the resulting hypotheses can always be combined into an equivalent hypothesis at the overall level.
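The following sketch is illustrative only and is not part of the Claudien implementation; the toy minimal model, constants, and clause encoding are assumptions. It shows the property on a miniature language bias: each subtask tests a different subset of candidate clauses against the same minimal model, and the union of the partial hypotheses coincides with the hypothesis found by searching the whole language.

```python
# Sketch (not the Claudien system): partitioning only the language bias L
# in the nonmonotonic setting. MODEL plays the role of the minimal model
# M(B u E); candidate clauses head(X) <- body(X) are encoded as
# (head, body) pairs. All of this is an illustrative assumption.

MODEL = {("bird", "tweety"), ("bird", "oliver"),
         ("flies", "tweety"), ("flies", "oliver"),
         ("haswings", "tweety")}
CONSTS = ["tweety", "oliver"]

L = [("flies", "bird"), ("haswings", "bird"),     # sublanguage L1
     ("bird", "flies"), ("haswings", "flies")]    # sublanguage L2

def true_in_model(head, body):
    """head(X) <- body(X) is true iff every constant satisfying the body
    also satisfies the head in the minimal model."""
    return all((head, c) in MODEL for c in CONSTS if (body, c) in MODEL)

def discover(language):
    return {clause for clause in language if true_in_model(*clause)}

L1, L2 = L[:2], L[2:]
# union of partial hypotheses = result of the sequential search over all of L
assert discover(L1) | discover(L2) == discover(L)
print(sorted(discover(L)))
```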

2.3 Constraints on PILP

The conclusion for the normal setting should be that partitioning is only permitted if one is learning a single predicate without recursion. This finding is in line with previously reported results on the comparable problem of Multiple Predicate Learning (cf. [5]). For the nonmonotonic setting, valid partitions of an ILP-task can be produced by splitting up the language bias $L$.

Before going on to the introduction of a parallel nonmonotonic system, we should point out that many other opportunities for decomposition into subtasks can be detected at lower levels of ILP algorithms, in both settings. The programmer can for instance partition at design time by using a parallel language. Or, at run time, the algorithm might evaluate several next moves through the hypothesis space concurrently. These forms of fine-grained parallelism can be both investigated and applied independently of the coarse-grained type of parallelism we focus on.

Algorithm 1: Sequential Claudien

function sequential_claudien() returns solution
    var QC : array [1 ... maxint] of clause
    QC[1] := false
    H := ∅
    while (QC ≠ ∅)
        Delete c from QC
        if c is true in the minimal model of B ∪ E
            then add c to H
            else add all refinements ρ(c) of c to QC
        QC := Prune(QC)
    endwhile
    return H
endfunc
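For readers who prefer executable pseudocode, here is a minimal Python rendering of this loop. It is a sketch only: the clause test, refinement operator, and pruning strategy are abstract parameters supplied by the caller, and the names are illustrative assumptions rather than the Claudien implementation.

```python
# Sketch (illustrative, not the Claudien implementation): the sequential
# clausal-discovery loop of Algorithm 1 with the clause test, refinement
# operator and pruning strategy left abstract.

def sequential_claudien(true_in_minimal_model, refine, prune, top_clause="false"):
    qc = [top_clause]          # QC[1] := false, the most general clause
    hypothesis = []            # H := {}
    while qc:                  # while QC is not empty
        c = qc.pop(0)          # Delete c from QC (here: oldest first)
        if true_in_minimal_model(c):
            hypothesis.append(c)      # c is a solution: add it to H
        else:
            qc.extend(refine(c))      # otherwise add all refinements rho(c)
        qc = prune(qc)                # discard unpromising candidates
    return hypothesis
```

The choice of which queue element is removed and how the queue is filtered corresponds to the Delete and Prune parameters discussed in Section 3.1.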

3 A parallel version of Claudien

3.1 Algorithmic description

The nonmonotonic ILP system Claudien is a straightforward instantiation of the generic ILP algorithm described in [14]. Algorithm 1 iteratively processes a queue QC of clauses c that are false in the least model of the knowledge base $B \cup E$ (cf. Definition 4 of the nonmonotonic setting). Iteration continues until QC is empty (as Claudien is an any-time algorithm, the user can also decide to interrupt the loop at an earlier stage). At each step, a clause c is selected from queue QC and tested against the knowledge base $B \cup E$. If c is a solution, it is added to H. Otherwise, c is overly general, i.e. there exist substitutions $\theta$ for which $body(c)\theta$ is true and $head(c)\theta$ is false in the model. As clause c is overly general it should be specialized. Applying standard ILP principles, we can use a refinement operator $\rho$ (under $\theta$-subsumption) for this (cf. [14, 17]). The result of refining c is put back on the queue, and unpromising items are pruned away. Through the instantiation of the Delete and Prune parameters, the user can tailor searching and pruning strategies to the application at hand. For more information on theoretical background, PAC-learning results, and applications of Claudien, see [2, 3, 6].

The notion of specialization imposes a graph structure on the hypothesis space, such that the overly general clauses c in QC each represent the subgraph of clauses that can be constructed via repeated applications of refinement operator $\rho$. This property of the hypothesis space offers a simple strategy for building a parallel Claudien system. As every subgraph corresponds to a subpart $L_i$ of the language bias $L$, the recursive division of the language bias can easily be realized through the repeated redistribution of queue QC over free waiting processes.

Algorithm 2 is the main function of Parallel Claudien. The input parameter n determines the degree of parallelism, i.e. the maximal number of processes that will be executing concurrently. Processes exchange information through the shared variable mailbox (more sophisticated mechanisms for interprocess communication exist, but for reasons of simplicity we continue to use the most general and basic constructs throughout). For each of the n processes, this variable contains a queue equivalent to queue QC in Algorithm 1.

Algorithm 2: Parallel Claudien: parent function

function parallel_claudien(n: integer) returns solution
    type queue = array [1 ... maxint] of clause
    var mailbox : array [1 ... n] of queue
    for all i ∈ 2 ... n do mailbox[i] := ∅ endfor
    mailbox[1, 1] := false
    H_2 := fork(parallel_claudien_child(2))
    H_3 := fork(parallel_claudien_child(3))
    ...
    H_n := fork(parallel_claudien_child(n))
    H_1 := parallel_claudien_child(1)
    H := H_1 ∪ ... ∪ H_n
    return H
endfunc

Initially, all queues in mailbox except the one of the first process are set to empty. The queue of the first process is initialized to the top node of the hypothesis space, which corresponds to the total language L. The UNIX-inspired fork instruction (UNIX is a trademark of Bell Laboratories) creates a new (child) process that executes the call given as the single argument of fork concurrently with the calling (parent) process. Algorithm 2 calls Algorithm 3 n times. The fork instruction causes n - 1 of these calls to be executed concurrently with the parent process in n - 1 newly created processes. All results are stored in H_1 ... H_n and combined into H, which is ultimately returned as the solution.

The single input parameter p of Algorithm 3 ranges between 1 and n, and identifies the present process. Global variable mailbox[p] contains a queue of clauses that represents the language partition L_p to be explored by p. The outermost loop terminates the moment this queue is empty for all processes. At that moment the local solution H_p is returned and Algorithm 3 stops. There are two more nested loops. The first one terminates either when the condition of the outer loop is fulfilled or when the current process has received a new subtask L_p. The body of this loop is empty but for the do-nothing instruction skip. After termination of this first inner loop, QC gets the value of mailbox[p]. The second inner loop is a near copy of the one in Algorithm 1. The only difference is that at the beginning of each step mailbox is searched for empty queues. If such an empty queue is found at position i in mailbox, process p cedes part of its local sublanguage L_p to process i by moving part of QC to mailbox[i]. Which part of QC is moved depends on the search strategy chosen by the user. An important general restriction is that the Move instruction should not be allowed to empty QC, as this might result in a loop where the same subtask is passed around forever. From the moment QC contains no further candidates for refinement, mailbox[p] is set to empty in order to inform the other processes that process p is ready to receive a new subtask, i.e. a new part of L.

Algorithm 3: Parallel Claudien: child function

function parallel_claudien_child(p: integer) returns solution
    H_p := ∅
    while not(∀i ∈ 1 ... n : mailbox[i] = ∅)
        while not(∀i ∈ 1 ... n : mailbox[i] = ∅) and (mailbox[p] = ∅) do skip endwhile
        QC := mailbox[p]                                            (critical section)
        while (QC ≠ ∅)
            for all i ∈ 1 ... n do
                if mailbox[i] = ∅ then Move part of QC to mailbox[i]   (critical section)
            endfor
            Delete c from QC
            if c is true in the minimal model of B ∪ E
                then add c to H_p
                else add all refinements ρ(c) of c to QC
            QC := Prune(QC)
        endwhile
        mailbox[p] := ∅
    endwhile
    return H_p
endfunc

In case common variables such as mailbox are used for interprocess communication, the synchronization problem of mutual exclusion occurs. Mutual exclusion is concerned with ensuring that a sequence of statements, called a critical section, is treated as an indivisible operation that cannot be executed by more than one process at the same time. In Algorithm 3, two critical sections are marked. They should prevent two processes from simultaneously writing to mailbox[i], and prevent an incomplete mailbox[p] from being copied to QC while it is still being written by some other process.
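To make the coordination scheme of Algorithms 2 and 3 concrete, the following sketch renders it with Python threads and a lock-protected shared mailbox. This is not the authors' code; the toy hypothesis space of bit-strings, the names, and the use of threads rather than processes are illustrative assumptions, so it demonstrates the redistribution protocol rather than real parallel speedup.

```python
# Sketch (not the authors' implementation): the parent/child scheme of
# Algorithms 2 and 3 with a shared, lock-protected mailbox. The hypothesis
# space is faked as a binary tree of bit-strings: refine(c) returns the two
# children of c, and strings of length MAX_DEPTH count as solutions.
import threading
import time

N = 4                               # degree of parallelism (parameter n)
MAX_DEPTH = 12                      # depth of the toy search space
mailbox = [[] for _ in range(N)]    # mailbox[p]: the queue handed to process p
mailbox[0] = [""]                   # top of the hypothesis space (cf. the clause 'false')
results = [set() for _ in range(N)] # H_p for each child
lock = threading.Lock()             # guards the critical sections

def refine(c):
    return [c + "0", c + "1"]

def is_solution(c):
    return len(c) == MAX_DEPTH

def child(p):
    while True:
        # first inner loop: wait for a subtask, or for global termination
        while True:
            with lock:
                all_empty = all(not q for q in mailbox)
                has_task = bool(mailbox[p])
            if all_empty or has_task:
                break
            time.sleep(0.001)
        if not has_task:            # every mailbox is empty: we are done
            return
        with lock:                  # critical section: copy the received queue
            qc = list(mailbox[p])   # mailbox[p] stays non-empty while p works
        # second inner loop: the sequential loop plus redistribution
        while qc:
            with lock:              # critical section: cede work to idle processes
                for i in range(N):
                    if not mailbox[i] and len(qc) > 1:
                        mailbox[i] = [qc.pop()]   # move part of QC, never all of it
            c = qc.pop(0)
            if is_solution(c):
                results[p].add(c)
            else:
                qc.extend(refine(c))
        with lock:
            mailbox[p] = []         # signal: ready to receive a new subtask

threads = [threading.Thread(target=child, args=(p,)) for p in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(set().union(*results)), "solutions found")   # expect 2**MAX_DEPTH = 4096
```

The `len(qc) > 1` guard plays the role of the restriction that Move must never empty QC, and the fact that mailbox[p] stays non-empty while p is working is what makes the global termination test (all mailboxes empty) safe.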

3.2 A prototypical implementation

The prototype parallel Claudien, built around the Claudien system (both Claudien and the parallel extensions are implemented in ProLog by BIM release 4.0.5; standalone versions for Solaris 2.3 and SunOS 4.1.3 are available on request), uses the most elementary concurrency organization techniques. All interprocess communication and synchronization activities introduced in Algorithms 2 and 3 go via the file system:

- setting mailbox[i] to empty in our implementation corresponds to removing the file process i (if it exists), and creating an empty file process i.free;
- a queue of clauses is assigned to mailbox[i] by creating a temporary file, writing the queue to this file, and moving the file to process i;
- partial solutions H_1 ... H_n are written to separate files that are finally concatenated to H;
- (∀i ∈ 1 ... n : mailbox[i] = ∅) succeeds when all files process 1.free ... process n.free exist;
- (mailbox[p] = ∅) fails when a file process p exists;
- the test (mailbox[i] = ∅) in the second critical section of Algorithm 3 consists of an attempt to remove the file process i.free; the test fails if this file does not exist.

Our solution to the critical section problem is based on the assumption that the UNIX operations move and remove are indivisible. As only one process can remove the file process i.free, at most one process can decide to move part of its queue to process i. Likewise, the intermediary step of writing a queue to a temporary file and then moving this file to process i prevents a partially written file from being copied to the local queue QC of process i. For more advanced strategies to implement synchronization we refer to [1, 9, 18].
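The file-based protocol can be sketched in a few lines. The snippet below is illustrative only (the original is written in BIM Prolog); the mailbox directory and the process_<i> / process_<i>.free file names are assumptions, and, like the paper, it relies on the atomicity of the UNIX rename and remove operations.

```python
# Sketch (not the authors' Prolog code): the file-system mailbox protocol
# of Section 3.2 expressed with Python's os primitives. File names and the
# directory MAILBOX_DIR are illustrative assumptions.

import os, tempfile

MAILBOX_DIR = "mailbox"
os.makedirs(MAILBOX_DIR, exist_ok=True)

def queue_file(i):  return os.path.join(MAILBOX_DIR, f"process_{i}")
def free_file(i):   return os.path.join(MAILBOX_DIR, f"process_{i}.free")

def set_mailbox_empty(i):
    """mailbox[i] := {}: remove the queue file (if any), create the .free marker."""
    try:
        os.remove(queue_file(i))
    except FileNotFoundError:
        pass
    open(free_file(i), "w").close()

def try_claim_idle_process(i):
    """The test 'mailbox[i] = {}' in the second critical section: succeed only
    if we are the one process that manages to remove the .free marker."""
    try:
        os.remove(free_file(i))
        return True
    except FileNotFoundError:
        return False

def assign_queue(i, clauses):
    """mailbox[i] := queue: write a temporary file, then rename it atomically,
    so a partially written queue is never visible to process i."""
    fd, tmp = tempfile.mkstemp(dir=MAILBOX_DIR)
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(clauses))
    os.rename(tmp, queue_file(i))

def mailbox_nonempty(i):
    """'mailbox[i] = {}' fails when the queue file for process i exists."""
    return os.path.exists(queue_file(i))

def all_mailboxes_empty(n):
    return all(os.path.exists(free_file(i)) for i in range(1, n + 1))

# Example handshake: a busy process cedes part of its queue to idle process 2.
set_mailbox_empty(2)
if try_claim_idle_process(2):
    assign_queue(2, ["clause_a", "clause_b"])
print("process 2 has work:", mailbox_nonempty(2))
```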

4 Experimental evaluation

4.1 Experimental setup

Our aim was to measure and compare the speed at which sequential and parallel Claudien traverse the same hypothesis space. We selected two applications where this space contains about 120000 nodes, and ran Claudien using a depth-first search strategy with 1, 2, 4, 8, and 16 processes. With each tested clause, and again with each solution found, we recorded the consumed cpu time in seconds. All tests were done on a SPARCserver1000 with 4 processors (as cpu time was measured, we could test parallel Claudien with degrees of concurrency above 4; it should be kept in mind, however, that the speedups reported here will only correspond to real-time speedups if a separate processor is dedicated to each concurrent process).

The first application we tested on was that of learning finite element mesh-design, which has become a standard benchmark for ILP systems (see e.g. [7, 12]). The mesh data give the following characteristics of 278 edges belonging to 5 different structures (a-e):

- the number of subedges the edge should be divided into: mesh(b11, 6);
- edge types: long(b19), short(b10), halfcircuit(b3);
- boundary conditions: fixed(b1), twosidefixed(b6);
- loading: notloaded(b1), contloaded(b22);
- geometry: neighbour(b1, b2), opposite(b1, b3), same(b1, b3).

In this knowledge base, the Claudien system discovered 65 rules such as:

mesh(E, 1) ← notimportant(E)
mesh(E1, 12) ← circuit(E1) ∧ free(E1) ∧ neighbour(E1, E2) ∧ mesh(E2, 2) ∧ short(E2)

The second application is taken from the domain of environmental monitoring (see also [8]). The goal here is to capture the expertise of an expert river ecologist who classified 292 field samples of benthic communities from British Midland Rivers. Each sample is described by means of the abundances (recorded on a scale of 0 to 6) of eighty different microinvertebrate families.

The expert classified the samples into five classes. In the experiment we limited ourselves to discovering characteristics of poorest quality water. The 279 discovered rules include:

class0(Sample) ← leuctridae(Sample, Abundance)
class0(Sample) ← nemouridae(Sample, Abundance) ∧ Abundance > 1

[Figure 2: Results of the experiment. For the mesh data and the ecology data, the top charts plot the number of explored nodes against consumed cpu time (s) and the bottom charts plot the number of solutions against consumed cpu time (s), for the serial run and for parallel runs with 2, 4, 8, and 16 processes; solutions are marked on the top charts.]

4.2 Results

The results of running Claudien with 1, 2, 4, 8, and 16 processes are reported in Figure 2. In the charts on top, the values on the y-axis are the number of explored nodes. If $n$ is the degree of concurrency, and $explored(p, t)$ the number of nodes explored by process $p$ after $p$ has consumed $t$ cpu seconds, then $y = f(t) = \sum_{p=1}^{n} explored(p, t)$. The clauses that were found to be valid are marked with a diamond. A separate chart with the number of solutions is presented in the lower half of Figure 2. The results shown in Figure 2 indicate that for up to 16 processes, the speedup is approximately proportional to the number of processes executing the task: the consumed cpu time is roughly halved each time the number of processes is doubled.

4.3 Comments

An important question related to the results of our experiments is how long we can go on adding new processes to reduce the consumed cpu time. Apart from obvious hardware restrictions (remember that we assume every process can execute on a separate processor; if not enough processors are available, they have to be switched between processes, and by ever increasing the number of processes scheduled for a single processor we will finally overload the operating system), there are mainly two software-related limitations we should take into account when trying to answer this question.

The first, application-dependent, upper boundary on the degree of concurrency stems from the fact that a (near) linear speedup can only be obtained if all processes are more or less constantly working on a subtask, i.e. if most of the time there are enough sublanguages $L_i$ available. The maximal number of candidate sublanguages available at a given time equals the total size of all local queues QC (see Algorithm 3) and is related to the application-specific average branching factor. It is for instance easy to see that in the extreme case where the branching factor equals 1, concurrency will produce no speedup at all. Secondly, interprocess communication requires a certain amount of computational overhead. If this overhead increases with the degree of concurrency, as it does with our naive implementation of parallel Claudien, there will be a point where adding more processes is useless, or even counter-productive in terms of consumed cpu time.

5 Conclusion

In this paper, we have restricted the notion of Parallel Inductive Logic Programming (PILP) to algorithms that recursively create partitions of the ILP-task, and process the subtasks concurrently. This scheme is problematic only in the normal setting of ILP, where the consistency and completeness requirements are not generally compatible with our definition of a valid partition. With the parallel implementation and experimental evaluation of the knowledge discovery system Claudien, we have demonstrated that even with a naive interprocess communication technique, PILP in the nonmonotonic setting yields linear speedup, at least for low degrees of concurrency. It remains to be investigated how the algorithms presented here can be used for building massively parallel systems, and how fine-grained parallelism can be added to make these systems still more effective.

Acknowledgements

This work is part of the Esprit Basic Research project no. 6020 on Inductive Logic Programming. Luc Dehaspe is paid by the Esprit Basic Research Action ILP (project 6020), and co-financed by the "Vlaamse Gemeenschap" through contract nr. 93/014. Luc De Raedt is supported by the Belgian National Fund for Scientific Research. The authors would like to thank Bart Demoen and Gunther Sablon for valuable comments and discussions on earlier versions of this paper.

References

[1] G.R. Andrews. Concurrent Programming: Principles and Practice. Benjamin/Cummings, 1991.
[2] L. De Raedt and M. Bruynooghe. A theory of clausal discovery. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, pages 1058-1063. Morgan Kaufmann, 1993.
[3] L. De Raedt and S. Dzeroski. First order jk-clausal theories are PAC-learnable. Artificial Intelligence, 70:375-392, 1994.
[4] L. De Raedt and N. Lavrac. The many faces of inductive logic programming. In J. Komorowski, editor, Proceedings of the 7th International Symposium on Methodologies for Intelligent Systems, Lecture Notes in Artificial Intelligence. Springer-Verlag, 1993. Invited paper.
[5] L. De Raedt, N. Lavrac, and S. Dzeroski. Multiple predicate learning. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, pages 1037-1042. Morgan Kaufmann, 1993.
[6] L. Dehaspe, W. Van Laer, and L. De Raedt. Applications of a logical discovery engine. In S. Wrobel, editor, Proceedings of the 4th International Workshop on Inductive Logic Programming, volume 237 of GMD-Studien, pages 291-304. Gesellschaft für Mathematik und Datenverarbeitung MBH, 1994.
[7] B. Dolsak and S. Muggleton. The application of inductive logic programming to finite element mesh design. In S. Muggleton, editor, Inductive Logic Programming, pages 453-472. Academic Press, 1992.
[8] S. Dzeroski, L. Dehaspe, B.M. Ruck, and W.J. Walley. Classification of river water quality data using machine learning. In Proceedings of the 5th International Conference on the Development and Application of Computer Techniques to Environmental Studies (ENVIROSOFT'94), to appear.
[9] N. Gehani and A.D. McGettrick, editors. Concurrent Programming. International Computer Science Series. Addison-Wesley, 1988.
[10] N. Helft. Induction as nonmonotonic inference. In Proceedings of the 1st International Conference on Principles of Knowledge Representation and Reasoning, pages 149-156. Morgan Kaufmann, 1989.
[11] K. Hwang and D. DeGroot, editors. Parallel Processing for Supercomputers and Artificial Intelligence. McGraw-Hill, 1989.
[12] N. Lavrac and S. Dzeroski. Inductive Logic Programming: Techniques and Applications. Ellis Horwood, 1994.
[13] S. Muggleton, editor. Inductive Logic Programming. Academic Press, 1992.
[14] S. Muggleton and L. De Raedt. Inductive logic programming: Theory and methods. Journal of Logic Programming, 19/20:629-679, 1994.
[15] S. Muggleton and C. Feng. Efficient induction of logic programs. In Proceedings of the 1st Conference on Algorithmic Learning Theory, pages 368-381. Ohmsha, Tokyo, Japan, 1990.
[16] J.R. Quinlan. Learning logical definitions from relations. Machine Learning, 5:239-266, 1990.
[17] E.Y. Shapiro. Algorithmic Program Debugging. The MIT Press, 1983.
[18] A. Silberschatz and J.L. Peterson. Operating System Concepts. Addison-Wesley, 1988.
[19] A. Srinivasan, S.H. Muggleton, R.D. King, and M.J.E. Sternberg. Mutagenesis: ILP experiments in a non-determinate biological domain. In S. Wrobel, editor, Proceedings of the 4th International Workshop on Inductive Logic Programming, volume 237 of GMD-Studien, pages 217-232. Gesellschaft für Mathematik und Datenverarbeitung MBH, 1994.
[20] B.W. Wah, G.-J. Li, and C.F. Yu. Multiprocessing of combinatorial search problems. IEEE Computer, 18(6):93-108, June 1985.