PatternDiviner: A Pattern Recognition Tool

Gregory S. Hill and Goran Trajkovski
Cognitive Agency and Robotics Laboratory
Towson University, 8000 York Road, Towson MD 21252-0001
E-mail: {ghill1, gtrajkovski}@towson.edu
Phone 410-704-6310, Fax 410-704-3868

Overview of Original Software

Activity is central to thought and cognition. Through interaction, autonomous agents build working representations of the environment they inhabit (Trajkovski 2007).

TamTam is a software demo built on interactionist principles (Bickhard 1980) and unsupervised learning (Buisson 2006). This applet was developed to recognize and anticipate rhythmic patterns entered via a computer keyboard. TamTam always starts with a basic set of rhythms (e.g., four full notes), and then uses a sophisticated algorithm to generate increasingly complicated child patterns, based on the previously recognized patterns input by the user.

Figure 1: TamTam Example

Figure 1 gives an example of the TamTam output for a pattern. Patterns that are successfully recognized are the ‘winners’ and are used as the parents for the next generation of more sophisticated patterns, in an approach reminiscent of genetic algorithms.

We viewed TamTam not only as a demonstrative simulation, but also as a possible learning paradigm in pattern recognition and anticipation. We studied the efficacy of converting this approach into a generalized pattern recognition tool for pattern mining in large datasets, including bioinformatics data such as gene sequences.
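
The winners-as-parents cycle can be illustrated with a short Java sketch. TamTam's own child-generation algorithm is not described here, so the rule below is an assumption: a candidate is a ‘winner’ if it occurs at least minCount times in the input, and each winner spawns children by inserting one alphabet symbol at every position (so, for example, ‘atat’ can yield ‘atgat’). The class and method names are hypothetical; the symbols could stand for note durations in TamTam or nucleotides in PatternDiviner.

```java
import java.util.*;

/**
 * Hypothetical sketch of generational pattern growth in the spirit of TamTam.
 * Recognized ("winning") patterns become parents; each parent spawns children
 * by inserting one symbol at every position. This is an assumed rule, not
 * TamTam's actual child-generation algorithm.
 */
class GenerationalGrowth {

    /** Counts possibly overlapping occurrences of pattern in input. */
    static int countOccurrences(String input, String pattern) {
        int count = 0;
        for (int i = input.indexOf(pattern); i >= 0; i = input.indexOf(pattern, i + 1)) {
            count++;
        }
        return count;
    }

    /** Children of a parent: the parent with one alphabet symbol inserted at each position. */
    static Set<String> children(String parent, char[] alphabet) {
        Set<String> result = new TreeSet<>();
        for (int pos = 0; pos <= parent.length(); pos++) {
            for (char c : alphabet) {
                result.add(parent.substring(0, pos) + c + parent.substring(pos));
            }
        }
        return result;
    }

    /**
     * Runs up to maxGenerations rounds, stopping early if no candidate wins.
     * A candidate wins if it occurs at least minCount times in the input.
     * Returns every winner found across all generations.
     */
    static Set<String> grow(String input, Set<String> seeds, char[] alphabet,
                            int minCount, int maxGenerations) {
        Set<String> allWinners = new TreeSet<>();
        Set<String> candidates = new TreeSet<>(seeds);
        for (int gen = 0; gen < maxGenerations && !candidates.isEmpty(); gen++) {
            Set<String> winners = new TreeSet<>();
            for (String p : candidates) {
                if (countOccurrences(input, p) >= minCount) {
                    winners.add(p);
                }
            }
            allWinners.addAll(winners);
            // The winners of this generation are the parents of the next one.
            Set<String> next = new TreeSet<>();
            for (String w : winners) {
                next.addAll(children(w, alphabet));
            }
            next.removeAll(allWinners); // do not re-test patterns that already won
            candidates = next;
        }
        return allWinners;
    }
}
```

With input ‘abcabcabc’, seeds ‘a’, ‘b’, ‘c’, minCount 2 and three generations, the winners grow from single symbols to ‘ab’, ‘bc’, ‘ca’ and then to ‘abc’, ‘bca’, ‘cab’. Because every child is one symbol longer than its parent, each generation probes exactly one pattern length.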

Methods

One of the issues when working with the code was that it was written as a Java applet (see Buisson 2003 for details), with all of the classes residing in one file and a consequent reliance on global variables. Based on the TamTam code, we developed PatternDiviner, a software suite for pattern mining in gene sequences. The core of the product is the pattern recognition engine (PRE), which relies on a multitude of other independent classes and interfaces, including Note.java, PatternBuilder.java, PatternDisplay.java, Sequence.java, SequenceCanvas.java, Stroke.java, TamTamPanel.java, and TouchPanel.java. PatternDisplay is an interface declaring the method updatePatternDisplay(Sequence), which TamTamPanel implements; this method is used to explicitly display the results of the PRE. In the original TamTam applet, TamTamPanel displays, as the applet's output, a graphical sequence of notes based on the pattern fed to it by the user. The core pattern recognition code was placed in PatternBuilder, which is the PRE of this system. See Figure 2 for the UML diagram of the essential classes.

Figure 2: UML Diagram

A separate class, DNAPatterns.java, implements PatternDisplay; it loads the sample dataset and feeds it to the Sequence class. We converted between the note representation in TamTam and the nucleotide bases as follows: adenine (A) corresponds to a full note, cytosine (C) to a half note, guanine (G) to a quarter note, and thymine (T) to an eighth (1/8) note. We then iterated through the gene sequence with window sizes from 4 to 22 nucleotides, as is customary in bioinformatics data mining when studying nucleotide sequence interaction. For example, given a partial gene sequence of ‘AGGGTGCGCA AATTGGCGCA …’, the first round of sequences would be ‘AGGG’, ‘GGGT’, ‘GGTG’, ‘GTGC’, etc. Any patterns recognized by the PRE are stored internally by the engine.

Work in Progress

The results of running the Salmonella gene sequence (NCBI GenBank 2006) through the engine were negative. The key sequence lengths we were interested in were between 18 and 22 nucleotides; the PRE recognized patterns of up to five nucleotides in length, but nothing longer.

There are several possible reasons for the negative result. The first, and most obvious, is that there may simply be no patterns to recognize. Another may be the particular algorithm for generating child patterns from the winning parent patterns: a winning pattern such as ‘atat’ might generate a child pattern such as ‘atgat’, which may well not occur as a larger pattern within the sequence. Finally, the tool should be tested against larger gene sequence datasets, as well as other types of data (meteorological data, traffic-flow patterns in various urban areas, etc.).

This is a work in progress. More research into alternative pattern generation schemes might be worthwhile. It would also be interesting to develop the PRE further into a more abstract and extensible class. With a little work, a base class could be developed that manages the basic pattern recognition and lets subclasses implement the specific pattern recognition algorithms the developer needs. Using a basic Factory pattern, TamTam could then be used in a variety of environments, and would be easily modifiable and testable.
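
The proposed base class plus Factory design can be sketched in Java as follows. This is a minimal illustration under our own assumptions, not PatternDiviner's code: the names PatternRecognizer, RepeatRecognizer, MotifRecognizer and RecognizerFactory, the two recognition rules, and the example motifs are all hypothetical. The base class owns the sliding-window scan described in the Methods section (windows of 4 through 22 symbols, advancing one position at a time) and the internal store of recognized patterns; subclasses supply only the recognition test. Set.of requires Java 9 or later.

```java
import java.util.*;

/**
 * Hypothetical abstract base class for a more extensible PRE. It manages the
 * sliding-window scan and the internal store of recognized patterns; subclasses
 * decide what counts as "recognized".
 */
abstract class PatternRecognizer {
    // Recognized window -> number of positions at which it was recognized.
    private final Map<String, Integer> recognized = new TreeMap<>();

    /** Subclass hook: decide whether this window is a recognized pattern. */
    protected abstract boolean recognize(String window, String sequence);

    /**
     * Slides windows of every length from minLen to maxLen (inclusive) across the
     * sequence one position at a time, e.g. "AGGG", "GGGT", "GGTG", ...
     * Results accumulate across calls, since the store is kept internally.
     */
    public Map<String, Integer> scan(String sequence, int minLen, int maxLen) {
        for (int len = minLen; len <= maxLen; len++) {
            for (int i = 0; i + len <= sequence.length(); i++) {
                String window = sequence.substring(i, i + len);
                if (recognize(window, sequence)) {
                    recognized.merge(window, 1, Integer::sum);
                }
            }
        }
        return Collections.unmodifiableMap(recognized);
    }

    /** Counts possibly overlapping occurrences of pattern in text. */
    protected static int countOccurrences(String text, String pattern) {
        int count = 0;
        for (int i = text.indexOf(pattern); i >= 0; i = text.indexOf(pattern, i + 1)) {
            count++;
        }
        return count;
    }
}

/** Recognizes any window that occurs at least minCount times in the sequence. */
class RepeatRecognizer extends PatternRecognizer {
    private final int minCount;

    RepeatRecognizer(int minCount) {
        this.minCount = minCount;
    }

    @Override
    protected boolean recognize(String window, String sequence) {
        return countOccurrences(sequence, window) >= minCount;
    }
}

/** Recognizes only windows that belong to a fixed set of known motifs. */
class MotifRecognizer extends PatternRecognizer {
    private final Set<String> motifs;

    MotifRecognizer(Set<String> motifs) {
        this.motifs = new HashSet<>(motifs);
    }

    @Override
    protected boolean recognize(String window, String sequence) {
        return motifs.contains(window);
    }
}

/** Basic Factory: callers request a recognizer by name, never a concrete class. */
final class RecognizerFactory {
    private RecognizerFactory() {}

    static PatternRecognizer create(String kind) {
        switch (kind) {
            case "repeat":
                return new RepeatRecognizer(2);
            case "motif":
                // Illustrative motifs only; real ones would be user-supplied.
                return new MotifRecognizer(Set.of("TATA", "GCGC"));
            default:
                throw new IllegalArgumentException("Unknown recognizer kind: " + kind);
        }
    }
}
```

For the sample fragment ‘AGGGTGCGCA AATTGGCGCA’ (with the space removed), RecognizerFactory.create("repeat").scan(sequence, 4, 22) recognizes ‘CGCA’, ‘GCGC’ and ‘GCGCA’, each at two positions. Adding a new recognition algorithm then means writing one subclass and one factory case; the brute-force counting in RepeatRecognizer is quadratic per window length and is meant only as an illustration.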

Acknowledgements

This work was assisted generously by Dr Jean-Christophe Buisson, and was partially funded by the Faculty Research and Development Committee of Towson University.

References

1) Bickhard, M. “Interactivist Manifesto”. Retrieved online on June 10, 2006 at http://www.lehigh.edu/~mhb0/InteractivismManifesto.pdf.
2) TamTam. Retrieved online on June 5, 2006 at http://diabeto.enseeiht.fr/tamtam/. Dr. Jean-Christophe Buisson of L’Ecole Nationale Supérieure d'Electrotechnique, d'Electronique, d'Informatique, d'Hydraulique et des Télécommunications (http://enseeiht.fr/).
3) Buisson, J.-C. “A rhythm recognition computer program to advocate interactivist perception”. Cognitive Science, Volume 28, Issue 1, January-February 2004, pp. 75-87.
4) NCBI GenBank. http://www.ncbi.nlm.nih.gov/Genbank/GenBankFtp.html
5) Trajkovski, G. “An Imitation-Based Approach to Modeling Homogenous Agents Societies”. IDEA Publishing, 2007.
6) Collins, S., and Trajkovski, G. (2006). “Attack of the Rainbow Bots: Generating Diversity through MultiAgent Systems”. In Trajkovski, G. (ed.), Diversity in Information Technology Education. Hershey, PA: InfoSys Press, pp. 196-241.
7) Trajkovski, G., Collins, S. “Autochthony Through Self-Organization: Interactivism and Emergence in a Virtual Environment”. New Ideas, Elsevier, in press.
8) Stojanov, Bozinovski, S., Trajkovski, G. “Interactionist-Expectative View on Agency and Learning”. IMACS Journal of Mathematics and Computers in Simulation, North-Holland, Amsterdam, vol. 44 (1997), pp. 295-310.