Evolving Finite Automata with Two-Dimensional ... - Semantic Scholar

Evolving Finite Automata with Two-Dimensional Output for DNA Recognition and Visualization

Edgar E. Vallejo

Fernando Ramos

Computer Science Department Tecnologico de Monterrey Campus Estado de Mexico, Mexico [email protected] Abstract

This article presents a computational model for biosequence recognition and visualization. Finite automata with 2-D output recognize and create maps of DNA sequences. Populations of such machines evolve discrimination capabilities using genetic algorithms. Experimental results indicate that evolved machines are capable of recognizing HIV sequences in a collection of training and validation sets. In addition, we found that these sequences map to similar structures on the cartesian space.

Computer Science Department Tecnologico de Monterrey Campus Morelos, Mexico [email protected] from the concatenation of values of the nite automata state transition function (Vallejo and Ramos, 2001). Fitness was de ned as the number of sequences correctly classi ed. 3

We performed several runs with this model. In most runs, the evolutionary process yielded nite automata capable of classifying all sequences correctly. In addition, we found that HIV sequences map to very similar structures on the cartesian space. 4

1

FINITE AUTOMATA WITH TWO-DIMENSIONAL OUTPUT

A nite automaton with two-dimensional output is a special type of nite automata (Hopcroft and Ullman, 1979). It has a read-only input tape. The input tape head can read one symbol at a time, and in one move the input tape head shifts one square to the right. The output of the machine is produced on the cartesian space. The output tape head writes one symbol at the current position and in one move the output tape head advances one coordinate (up, down, left or right). The machine either accept or reject each sequence. 2

EXPERIMENTS

We applied a genetic algorithm to a population of nite automata with two-dimensional output to evolve the ability of discriminating a collection of DNA sequences from a collection of training and validation sets. These sets consist of positive and negative examples of HIV sequences. We used a generational genetic algorithm with tournament selection and elitism (Mitchell, 1996). Genome representation was derived

RESULTS

CONCLUSIONS

In this work, we extended the nite automata model to perform biosequence recognition and two-dimensional output. We evolved a population of such machines to perform sequence recognition. The evolutionary process yielded machines with recognition and generalization capabilities. The evolved machines were capable of producing a convenient biosequence representation. References

J.E. Hopcroft, J.D. Ullman (1979).

Introduction to

. Ad-

Automata Theory, Languages and Computation

dison Wesley Publishing Company.

M. Mitchell (1996). An Introduction rithms. The MIT Press.

to Genetic Algo-

D.W. Mount (2001). Bioinformatics: Sequence and Genome Analysis. Cold Sring Harbor Laboratory. E.E Vallejo, F. Ramos (2001). Evolving Turing Machines for Biosequence Recognition and Analysis. In J. Miller, M. Tomassini, P.L. Lanzi, C. Ryan, A.G.B. Tettamanzi W.B. Langdon (eds.) 4th European

Conference

roGP2001

on

Genetic

Programming,

Eu-

LNCS 2038. pp. 192-203. Springer-Verlag.