An Evolutionary Approach to Concept Learning with Structured Data

Claire J. Kennedy and Christophe Giraud-Carrier
Department of Computer Science, University of Bristol, Bristol BS8 1UB, U.K.
fkennedy,[email protected]
fax: +44-117-9545208

Abstract This paper details the implementation of a strongly-typed evolutionary programming system (STEPS) and its application to concept learning from highly-structured examples. STEPS evolves concept descriptions in the form of program trees. Predictive accuracy is used as the fitness function to be optimised through genetic operations. Empirical results with representative applications demonstrate promise.

1 Introduction

The aim of concept learning is to induce a general description of a concept from a set of specific examples. The examples and the concept description are expressed in some representation language (e.g., an attribute-value language, Horn clauses), and the learning task can be viewed as a search through the space of all possible concept descriptions for a description that both characterises the examples provided and generalises to new ones [9]. As concept learning problems of increasing complexity are tackled, increasingly expressive representation languages become necessary. The original concept learning systems used the attribute-value language, in which examples and concepts are represented as conjunctions of attribute-value pairs. The simplicity of this (propositional) representation allows efficient learners to be implemented, but also inhibits their ability to induce descriptions involving complex relations. Research in inductive logic programming [12] extends the propositional framework to the first-order setting by designing systems around the Prolog language. In this context, examples and concepts take the form of Horn clauses. Most recently, higher-order representations, based on the Escher language, have been proposed [3]. In this most expressive context, examples are closed terms and concepts are arbitrary Escher programs. Whilst the greater expressiveness of the representation language extends the applicability of concept learning, it also results in an explosion of the search space. In addition, higher-order concept learning still lacks a counterpart to the clean refinement methods of the first-order and propositional settings; consequently, the search for a solution can become intractable. Evolutionary techniques have been successfully applied to concept learning in both propositional [6] and first-order settings [14]. The idea of, and basic assumptions for, evolutionary higher-order concept learning are presented in [4]. This paper details the implementation of a strongly-typed evolutionary programming system (STEPS) and its application to the problem of concept learning within Escher. In STEPS, examples are closed terms and concept descriptions take the form of program trees. STEPS starts from a randomly generated initial population of program trees and iteratively manipulates it with genetic operators until an optimal solution is found.

The paper is organised as follows. Section 2 briefly introduces Escher and the closed term representation used by STEPS. Section 3 details the implementation of STEPS. Section 4 presents the results of experiments with some representative concept learning problems. Finally, Section 5 concludes the paper.

2 STEPS Representation

The program trees evolved by STEPS use constructs from the Escher language [8]. Escher is a strongly typed declarative programming language that integrates the best features of both functional and logic programming languages. Its syntax is based on Haskell, and it features higher-order constructs, such as set processing, that provide a facility for learning in a higher-order context. This paper focuses on the use of STEPS to evolve concept descriptions from examples. Here, the concept descriptions (or program trees) take the form of

  IF Cond THEN Ci ELSE S,

  Shape   = Circle | Triangle | Inside(Shape, Shape);
  Diagram = {(Shape, Int)};

  D1 = {(Circle,2), (Triangle,1), (Inside(Triangle,Circle),1)};
  D2 = {(Circle,3), (Triangle,2), (Inside(Circle,Triangle),1)};

Fig. 1. Example diagrams and their corresponding Escher representations

where Cond is a Boolean expression, Ci is a class label and S is either a class label or another if-then-else expression. The examples are represented as closed terms, which give a compact and self-contained description of each example. Figure 1 illustrates the closed term representation used here on two diagrams taken from Problem 47 of [2]. Each diagram contains a number of shapes, where each shape can be a circle, a triangle or a shape inside another shape. A diagram in the Escher closed term representation is a set of pairs, each consisting of a shape and the number of times that shape appears in the diagram. Thus, D1 contains 2 circles, 1 triangle and 1 triangle inside a circle.

In order to induce descriptions, it is necessary to extract parts of the individual closed terms so as to make inferences about them. This is accomplished by selector functions, which "pull" individual components out of terms. Each structure (e.g., list, set) that is used in a term has its own set of associated selector functions. The selector functions for tuples, lists and sets are:

  v a tuple:  proj_i(v)
  v a list:   exists \v2 -> v2 'elem' v,  length(v)
  v a set:    exists \v2 -> v2 'in' v,    card(v)

For example, the number of occurrences of some shape in D1 above is obtained with

  exists \x -> x 'in' D1 && proj2(x),

which appears as Figure 2(a) in tree form.

Once the components of the data structures have been extracted, conditions can be placed on them, or they can be compared to values or to other data types. For example, the following expression tests whether the number of circles in D1 is equal to 2:

  exists \x -> x 'in' D1 && (proj1(x) == Circle && proj2(x) == 2)

The equivalent tree form appears as Figure 2(b). An algorithm has been designed to automatically generate the appropriate selector functions associated with a set of types [1].
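To make the selector machinery concrete, here is a minimal Python sketch (not Escher, and not part of STEPS) that models the closed-term diagrams of Figure 1 and evaluates the condition above. All names (`proj`, `exists_in`, the shape constants) are illustrative assumptions, not STEPS identifiers.

```python
# Illustrative Python model (not Escher) of the closed-term diagrams of
# Figure 1: a diagram is a set of (shape, count) pairs. Shapes are plain
# strings/tuples here, purely for this sketch.
CIRCLE, TRIANGLE = "Circle", "Triangle"

def inside(inner, outer):
    # Inside(Shape, Shape) constructor, modelled as a tagged tuple
    return ("Inside", inner, outer)

D1 = {(CIRCLE, 2), (TRIANGLE, 1), (inside(TRIANGLE, CIRCLE), 1)}
D2 = {(CIRCLE, 3), (TRIANGLE, 2), (inside(CIRCLE, TRIANGLE), 1)}

def proj(i, v):
    """Tuple selector: proj_i(v) pulls out the i-th component (1-based)."""
    return v[i - 1]

def exists_in(s, pred):
    """Set selector: exists \\v2 -> v2 'in' s && pred(v2)."""
    return any(pred(v2) for v2 in s)

# The condition from the text: does D1 contain exactly 2 circles?
print(exists_in(D1, lambda x: proj(1, x) == CIRCLE and proj(2, x) == 2))
# -> True: D1 contains the pair (Circle, 2)
```

The same condition is false for D2, which contains three circles rather than two.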

3 Evolutionary Approach

Since Escher is a strongly typed language, an evolutionary paradigm that incorporates type information is necessary so that only type-correct programs are generated during learning. Traditional program-tree-based evolutionary paradigms, such as Genetic Programming (GP), assume the closure of all functions in the body of the program trees [7]. This means that every function in the function set must be able to take any value or data type that can be returned by any other function in the function set. While this characteristic simplifies the genetic operators, it limits the applicability of the learning technique and can lead to artificially formed solutions. In order to overcome this problem, a type system was introduced to standard GP to give Strongly Typed Genetic Programming (STGP) [10]. STGP helps to constrain the search space by allowing only type-correct programs to be considered. STEPS extends the STGP approach to allow the vast space of highly expressive Escher concept descriptions to be explored efficiently.

Fig. 2. (a) A sample selector in tree form, (b) A sample condition in tree form

3.1 Population Creation

The if-then-else form of the descriptions to be evolved by STEPS provides a template for all individuals in the population. In tree form, the template has an if-then-else node at the root, with the subtrees Cond, Ci and S as its children. Trees in the initial population are formed by randomly selecting subtrees from the problem alphabet. The total alphabet for a problem consists of the appropriate selector function subtrees, any additional functions provided by the user, and the domain-derived constants. The function set provided by the user typically includes the connective functions && and || (Boolean conjunction and disjunction) so that a number of comparisons can be made on the components of the data types. However, subtrees selected to fill a blank slot in a partially created program tree must satisfy certain constraints so that only valid Escher programs are produced. These constraints are type and variable consistency.

In order to maintain type consistency, each node in a subtree in the alphabet is annotated with a type signature indicating its argument and return types. A subtree selected to fill a blank slot must be of the appropriate return type. The program tree in Figure 3 provides an example of a type consistency violation. The type signatures in Figure 3 are in curried form, and dotted lines indicate where a subtree has been added. The addition of the Circle :: Shape subtree violates type consistency, as it is of type Shape and the function

== :: Int -> Int -> Bool requires a subtree returning type Int as its second argument.

In order to maintain variable consistency, the local variables in a subtree selected to fill a blank slot in the partially created program tree must be within the scope of a quantifier. In addition, all quantified variables in a program tree must be used in the conditions of their descendant subtrees, to avoid redundancy. The program tree in Figure 4 provides an example of a variable consistency violation. The addition of the subtree rooted at == :: Shape -> Shape -> Bool in Figure 4 violates variable consistency, as the variable v4 :: Shape is not within the scope of a quantifier. Variable consistency is further violated because the quantified variable v2 :: (Shape, Int) is not used.

3.2 Modified Crossover

Type and variable consistency need to be maintained during the evolution of the programs so that only syntactically correct programs are evolved. In addition, it is necessary to preserve the structure of the selector function subtrees. This results in a situation where crossover can only be applied to certain nodes within a program tree. These crossover points correspond to the roots of the subtrees in the function set. Once a crossover point has been randomly selected from the first parent, a crossover point that will maintain type and variable consistency can be randomly selected in the second parent. If no such crossover point is available, then an alternative crossover point is selected from the first parent and the process is repeated.

  if then else :: Bool -> Class -> Class -> Class
      == :: Int -> Int -> Bool
          card :: Diagram -> Int
              v1 :: Diagram
          Circle :: Shape    <- added subtree (type Shape where Int is required)
      Class1 :: Class
      Class2 :: Class

Fig. 3. A program tree exhibiting type consistency violation
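The constrained tree growth of Section 3.1 can be sketched as follows. This is a hedged Python illustration, not the STEPS implementation: the primitive table (`true`, `&&`, `==Int`, `card`, `2`, `v1`) is an assumed toy alphabet, and only type consistency is modelled (variable consistency is omitted for brevity).

```python
import random

# Toy STGP-style alphabet: (name, return_type, argument_types).
# All entries are illustrative, not the actual STEPS alphabet.
PRIMITIVES = [
    ("true",  "Bool",    ()),
    ("&&",    "Bool",    ("Bool", "Bool")),
    ("==Int", "Bool",    ("Int", "Int")),
    ("card",  "Int",     ("Diagram",)),
    ("2",     "Int",     ()),
    ("v1",    "Diagram", ()),
]

def grow(return_type, depth, rng):
    """Randomly grow a tree whose root returns `return_type`.

    Only primitives with the required return type are candidates, so every
    generated tree is type-correct by construction; at depth 0 only
    terminals (no arguments) are allowed, so growth terminates."""
    candidates = [p for p in PRIMITIVES
                  if p[1] == return_type and (depth > 0 or not p[2])]
    name, _, arg_types = rng.choice(candidates)
    return (name, [grow(t, depth - 1, rng) for t in arg_types])

def tree_type(node):
    """Return type of a tree's root, looked up in the primitive table."""
    return next(p[1] for p in PRIMITIVES if p[0] == node[0])

def well_typed(node):
    """Check that every node's children match its declared argument types."""
    name, children = node
    _, _, arg_types = next(p for p in PRIMITIVES if p[0] == name)
    return (len(children) == len(arg_types) and
            all(tree_type(c) == t and well_typed(c)
                for c, t in zip(children, arg_types)))

t = grow("Bool", 4, random.Random(0))
print(tree_type(t), well_typed(t))  # the root is Bool and the tree type-checks
```

Because ill-typed candidates are filtered out before selection, a violation like the Circle :: Shape insertion of Figure 3 cannot arise during creation.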

  if then else :: Bool -> Class -> Class -> Class
      exists :: (Shape,Int) -> Bool -> Bool
          v2 :: (Shape,Int)
          && :: Bool -> Bool -> Bool
              in :: (Shape,Int) -> Diagram -> Bool
                  v2 :: (Shape,Int)
                  v1 :: Diagram
              == :: Shape -> Shape -> Bool    <- added subtree
                  v4 :: Shape    (not within the scope of a quantifier)
                  Triangle :: Shape
      Class1 :: Class
      Class2 :: Class

Fig. 4. A program tree exhibiting variable consistency violation
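The crossover-point search of Section 3.2 can be sketched as below. This is an illustrative Python rendition under simplifying assumptions: trees are (name, type, children) tuples, only type compatibility is checked (variable consistency and selector-subtree preservation are omitted), and all node names are made up.

```python
import random

def nodes_with_paths(tree, path=()):
    """Yield (path, subtree) for every node; paths are child-index tuples."""
    yield path, tree
    for i, child in enumerate(tree[2]):
        yield from nodes_with_paths(child, path + (i,))

def replace_at(tree, path, new_sub):
    """Return a copy of `tree` with the subtree at `path` replaced."""
    if not path:
        return new_sub
    name, typ, children = tree
    children = list(children)
    children[path[0]] = replace_at(children[path[0]], path[1:], new_sub)
    return (name, typ, children)

def crossover(parent1, parent2, rng):
    """Pick a random point in parent1, then a type-compatible point in
    parent2; if none exists, retry with the other points of parent1."""
    points1 = list(nodes_with_paths(parent1))
    rng.shuffle(points1)
    for path1, sub1 in points1:
        compatible = [sub2 for _, sub2 in nodes_with_paths(parent2)
                      if sub2[1] == sub1[1]]      # same return type
        if compatible:
            return replace_at(parent1, path1, rng.choice(compatible))
    return parent1  # no compatible point anywhere: parent returned unchanged

p1 = ("&&", "Bool", [("==", "Bool", []), ("inDiag", "Bool", [])])
p2 = ("==", "Bool", [("proj2", "Int", []), ("2", "Int", [])])
child = crossover(p1, p2, random.Random(1))
print(child[1])  # the offspring root keeps the Bool type
```

Swapping only type-compatible subtrees guarantees that the offspring remains type-correct without any post-hoc repair.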

3.3 Mutation

During successive iterations of the evolutionary process, the amount of genetic variation in a population decreases. In an extreme case, this can lead to the loss of genetic material that is essential to the search for an optimal solution, so a method for reintroducing such lost material is required. STEPS ensures the preservation of genetic diversity through six distinct forms of mutation: the terminal and functional mutation operators of conventional GP, and four specialisations of functional mutation. The various functional mutations can only be applied at the crossover points in a program tree and must preserve type and variable consistency. The specialised functional mutations are AddConjunction, DropConjunction, AddDisjunction and DropDisjunction. AddConjunction and AddDisjunction insert an && or ||, respectively, at the node to be mutated; its first argument is the subtree originally rooted at that node, and its second argument is randomly grown. For example, if we apply the AddConjunction operator to the == node in the tree of Figure 5(a), then we could obtain the tree of Figure 5(b). The DropConjunction and DropDisjunction operators randomly select an && or || crossover point, respectively, and replace it with the subtree that makes up its first argument.
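The add/drop pair can be sketched in a few lines of Python. This is an illustration only: trees are (name, children) tuples, and the "randomly grown" second argument is supplied by the caller rather than by a real growth routine.

```python
# Sketch of the specialised mutations: AddConjunction wraps the chosen
# subtree in a new && whose second argument is grown at random (stubbed
# here via a caller-supplied thunk); DropConjunction undoes the wrapping.
def add_conjunction(subtree, grow_random_bool):
    return ("&&", [subtree, grow_random_bool()])

def drop_conjunction(and_node):
    """Replace an && node by its first argument."""
    name, children = and_node
    assert name == "&&"
    return children[0]

# The == condition of Figure 5(a), as a (name, children) tuple.
cond = ("==", [("proj2", [("v2", [])]), ("2", [])])
grown = add_conjunction(
    cond, lambda: ("==", [("proj1", [("v2", [])]), ("Circle", [])]))
print(grown[0])                          # "&&"
print(drop_conjunction(grown) == cond)   # True: dropping undoes the addition
```

Note the asymmetry: the original subtree always becomes the first argument, which is exactly why DropConjunction can safely keep the first argument and discard the rest.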

3.4 Learning Strategy

STEPS creates an initial population of a specified size, ensuring that each tree satisfies the necessary constraints and is unique. Population updates are performed by steady-state replacement. Parent program trees are selected by tournament selection and are recombined using both crossover and mutation. Fitness is evaluated as the predictive accuracy of a program tree over the set of examples. The choice of genetic operator is determined by the depth of the program tree. If the depth of the tree is greater than a specified maximum depth, then the tree is considered too big, so a mutation operator that drops a conjunction or a disjunction is used. If the depth of the selected tree is less than a specified minimum depth, then a conjunction or disjunction is added. If the depth of the program tree lies within the specified depth constraints, then any genetic operator can be applied to it. If the offspring of a program tree already exists within the population, then the tree is mutated.
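The depth-driven operator policy and tournament selection can be sketched as follows. This is a hedged illustration: the operator names follow the text, but the operators themselves are just labels here, and the depth bounds shown are the ones later used for the train problem.

```python
import random

MIN_DEPTH, MAX_DEPTH = 4, 7  # example bounds (train problem settings)

def depth(tree):
    """Depth of a (name, children) tree; a leaf has depth 1."""
    name, children = tree
    return 1 + (max(map(depth, children)) if children else 0)

def choose_operator(tree, rng):
    """Depth-driven operator choice, as described in the text."""
    d = depth(tree)
    if d > MAX_DEPTH:    # too big: shrink it
        return rng.choice(["DropConjunction", "DropDisjunction"])
    if d < MIN_DEPTH:    # too small: grow it
        return rng.choice(["AddConjunction", "AddDisjunction"])
    return rng.choice(["Crossover", "TerminalMutation", "FunctionalMutation",
                       "AddConjunction", "DropConjunction",
                       "AddDisjunction", "DropDisjunction"])

def tournament(population, fitness, k, rng):
    """Standard tournament selection: best of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=fitness)

tiny = ("v1", [])  # depth 1 < MIN_DEPTH
print(choose_operator(tiny, random.Random(0)))  # one of the Add* operators
```

Steady-state replacement then inserts each offspring back into the population immediately, rather than building a whole new generation at once.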

Fig. 5. Sample AddConjunction mutation:
(a) if exists \v2 -> v2 'in' v1 && proj2(v2) == 2 then Class1 else Class2
(b) if exists \v2 -> v2 'in' v1 && (proj2(v2) == 2 && proj1(v2) == Circle) then Class1 else Class2

4 Experiments

4.1 Michalski's Trains

Problem Description: The objective of Michalski's train problem is to generate a concept description that distinguishes trains travelling East from trains travelling West [11]. A train in the Escher closed term representation is a list of cars, with each car represented by a tuple of characteristics: shape, length, wheels, roof and load.

  type Car = (Shape,Length,Wheels,Roof,Load)
  type Train = [Car]

  direction :: Train -> Direction
  direction([(Rectangular,Short,2,Open,(Circle,1)),
             (Hexagonal,Short,2,Flat,(Triangle,1)),
             (Rectangular,Short,2,Flat,(Circle,2))]) = East;

Learning Parameters: Examples: 10; Population size: 300; Minimum depth of trees: 4; Maximum depth of trees: 7.

Results: The experiment was carried out for 10 runs, with an optimal solution found in the initial population in 2 of the 10 runs. For the remaining runs, the optimal solution was found in an average of 1187 evaluations. The following is the optimal solution found in one of the runs.

  direction(v1) = if exists \v2 -> v2 'elem' v1 &&
                     proj2(v2) == Short && proj4(v2) /= Open
                  then East else West;
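For illustration only, the evolved rule transcribes directly into Python, with cars as plain tuples in the field order of the Car type above (shape, length, wheels, roof, load); none of these names come from STEPS itself.

```python
# The evolved train rule, transcribed for illustration: a train travels
# East iff some car is Short and its roof is not Open.
# Car tuple fields: (shape, length, wheels, roof, load).
def direction(train):
    if any(car[1] == "Short" and car[3] != "Open" for car in train):
        return "East"
    return "West"

# The example train from the Escher representation above.
example = [("Rectangular", "Short", 2, "Open", ("Circle", 1)),
           ("Hexagonal",   "Short", 2, "Flat", ("Triangle", 1)),
           ("Rectangular", "Short", 2, "Flat", ("Circle", 2))]
print(direction(example))  # "East": the second car is short and closed
```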

i.e., a train is travelling East if it contains a short closed car.

4.2 Mutagenicity

Problem Description: The aim of the mutagenicity problem is to generate a concept description that distinguishes mutagenic chemical compounds from non-mutagenic ones [13, 5]. A chemical compound in the Escher closed term representation is a highly-structured term consisting of the atoms and bonds that make up the structure of the compound, together with some chemical attributes thought to be relevant to the problem by domain experts, as illustrated below.

  type Atom = (Label,Element,AtomType,Charge);
  type Bond = ({Label},BondType);
  type Molecule = (Ind1,IndA,Lumo,{Atom},{Bond});

  mutagenic :: Molecule -> Bool;
  mutagenic(True,False,-1.487,
            {(1,C,22,-0.188), ..., (28,O,40,-0.389)},
            {({1,2},7), ..., ({26,28},2)}) = True;

Learning Parameters: Examples: 188; Population size: 300; Minimum depth of trees: 5; Maximum depth of trees: 12.

Results: A 10-fold cross-validation was carried out, giving an average accuracy of 87.3% with a standard deviation of 5.8%. This is comparable to the results of other learning systems on the same data set (e.g., see [13]). The following is a solution (87.8% accurate with respect to all examples) found in 3337 evaluations.

  mutagenic(v1) = if (proj1(v1) == True || proj3(v1) < -2.368) &&
                     (not (exists \v2 -> v2 'in' proj4(v1) && proj3(v2) == 93))
                  then True else False;
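Again for illustration, the evolved mutagenicity rule transcribes into Python over tuples following the Molecule type above: (ind1, inda, lumo, atoms, bonds), with each atom a (label, element, atom_type, charge) tuple. The sample molecule below is hypothetical, made up purely to exercise the predicate.

```python
# The evolved mutagenicity rule, transcribed for illustration:
# mutagenic iff (ind1 is true, or lumo < -2.368) and no atom of type 93.
def mutagenic(molecule):
    ind1, _inda, lumo, atoms, _bonds = molecule
    return ((ind1 is True or lumo < -2.368)
            and not any(atom[2] == 93 for atom in atoms))

# Hypothetical molecule (two atoms, bonds omitted) for demonstration only.
molecule = (True, False, -1.487,
            {(1, "C", 22, -0.188), (28, "O", 40, -0.389)},
            frozenset())
print(mutagenic(molecule))  # True: ind1 holds and no atom has type 93
```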

5 Conclusion

This paper details the implementation of a strongly-typed evolutionary programming system (STEPS) and its application to the problem of learning concept descriptions from highly-structured examples. The type system allows efficient use of genetic operators to evolve complex program trees. Preliminary experiments with STEPS on two representative, higher-order concept learning tasks demonstrate promise.

Acknowledgements

This work is funded by EPSRC grant GR/L21884. The authors would like to thank the members of the Machine Learning Research Group at Bristol University for many interesting discussions relating to this work. Special thanks to Tony Bowers for his implementation of the Escher interpreter.

References

[1] A.F. Bowers, C. Giraud-Carrier, and J.W. Lloyd. Higher-order logic for knowledge representation in inductive learning. In preparation, 1999.
[2] M. Bongard. Pattern Recognition. Spartan Books, 1970.
[3] P. Flach, C. Giraud-Carrier, and J.W. Lloyd. Strongly typed inductive concept learning. In Proceedings of the International Conference on Inductive Logic Programming (ILP'98), pages 185–194, 1998.
[4] C.J. Kennedy. Evolutionary higher-order concept learning. In J.R. Koza, editor, Late Breaking Papers at the Genetic Programming 1998 Conference, University of Wisconsin, Madison, Wisconsin, USA, 22–25 July 1998. Stanford University Bookstore.
[5] R. King, S. Muggleton, A. Srinivasan, and M. Sternberg. Structure-activity relationships derived by machine learning: The use of atoms and their bond connectivities to predict mutagenicity by inductive logic programming. Proceedings of the National Academy of Sciences, 93:438–442, 1996.
[6] J.R. Koza. Concept formation and decision tree induction using the genetic programming paradigm. In H.-P. Schwefel and R. Männer, editors, Parallel Problem Solving from Nature, pages 124–128, 1990.
[7] J.R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. The MIT Press, Cambridge, Massachusetts, 1992.
[8] J.W. Lloyd. Declarative programming in Escher. Technical Report CSTR-95-013, Department of Computer Science, University of Bristol, 1995.
[9] T.M. Mitchell. Generalization as search. Artificial Intelligence, 18:203–226, 1982.
[10] D.J. Montana. Strongly typed genetic programming. Evolutionary Computation, 3(2):199–230, 1995.
[11] S. Muggleton and C.D. Page. Beyond first-order learning: Inductive learning with higher order logic. Technical Report PRG-TR-13-94, Oxford University Computing Laboratory, 1994.
[12] S. Muggleton and L. De Raedt. Inductive logic programming: Theory and methods. Journal of Logic Programming, 19/20:629–679, 1994.
[13] A. Srinivasan, S. Muggleton, R. King, and M. Sternberg. Mutagenesis: ILP experiments in a non-determinate biological domain. In S. Wrobel, editor, Proceedings of the Fourth Inductive Logic Programming Workshop. Gesellschaft für Mathematik und Datenverarbeitung MBH, 1994. GMD-Studien Nr. 237.
[14] M.L. Wong and K.S. Leung. Genetic logic programming and applications. IEEE Expert, October 1995.