PatternDiviner: A Pattern Recognition Tool
Gregory S. Hill and Goran Trajkovski
Cognitive Agency and Robotics Laboratory
Towson University, 8000 York Road, Towson, MD 21252-0001
E-mail: {ghill1, gtrajkovski}@towson.edu
Phone: 410-704-6310, Fax: 410-704-3868
Overview of Original Software
Activity is central to thought and cognition. Through interaction, autonomous agents build working representations of the environment they inhabit (Trajkovski 2007).

TamTam is a software demo based on interactionist principles (Bickhard 1980) and on unsupervised learning (Buisson 2006). This applet was developed to recognize and anticipate rhythmic patterns entered via a computer keyboard. TamTam always starts with a basic set of rhythms (e.g., four full notes) and then uses a sophisticated algorithm to generate increasingly complicated child patterns, based on the previously recognized patterns input by the user.

Figure 1: TamTam Example

Figure 1 gives an example of the TamTam output for a pattern. Patterns that are successfully recognized are the 'winners' and are used as the parents for the next generation of more sophisticated patterns, in an approach reminiscent of genetic algorithms.

We viewed TamTam not only as a demonstrative simulation, but also as a possible learning paradigm in pattern recognition and anticipation. We studied the efficacy of converting this approach into a generalized pattern recognition tool for pattern mining in large datasets, including bioinformatics data such as gene sequences.
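The winner/parent/child cycle described above can be illustrated with a small Java sketch. It assumes a deliberately simple child-generation rule (insert one symbol at every position of a winning parent) and a simple notion of "winning" (the pattern occurs somewhere in the input). Neither rule is taken from TamTam, whose own algorithm is more sophisticated and is not reproduced here; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a generational pattern search in the spirit of TamTam.
// Assumptions (not TamTam's actual algorithm): children are formed by inserting
// one alphabet symbol into a parent, and a pattern "wins" if it occurs in the input.
final class GenerationalSearch {

    // All children of a parent: every alphabet symbol inserted at every position.
    static Set<String> children(String parent, String alphabet) {
        Set<String> out = new LinkedHashSet<>();
        for (int pos = 0; pos <= parent.length(); pos++) {
            for (char c : alphabet.toCharArray()) {
                out.add(parent.substring(0, pos) + c + parent.substring(pos));
            }
        }
        return out;
    }

    // Keep only the candidates that actually occur in the input (the "winners").
    static Set<String> winners(Set<String> candidates, String input) {
        Set<String> out = new LinkedHashSet<>();
        for (String p : candidates) {
            if (input.contains(p)) {
                out.add(p);
            }
        }
        return out;
    }

    // Each generation's winners become the parents of the next generation.
    // Stops when no child survives or maxGenerations is reached.
    static List<Set<String>> run(Set<String> seeds, String input, String alphabet, int maxGenerations) {
        List<Set<String>> history = new ArrayList<>();
        Set<String> current = winners(seeds, input);
        for (int g = 0; g < maxGenerations && !current.isEmpty(); g++) {
            history.add(current);
            Set<String> next = new LinkedHashSet<>();
            for (String parent : current) {
                next.addAll(children(parent, alphabet));
            }
            current = winners(next, input);
        }
        return history;
    }
}
```

With the alphabet "acgt", the children of a winning parent 'atat' include 'atgat'; a child survives only if it also occurs in the input, so the search halts once no longer pattern can be found.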
Methods
One of the issues when working with the TamTam code was that it was written as a Java applet (see Buisson 2003 for details), with all of the classes residing in one file and a consequent reliance on global variables. Based on the TamTam code, we developed PatternDiviner, a software suite for pattern mining in gene sequences. The core of the product is the pattern recognition engine (PRE), which relies on a number of other independent classes and interfaces, including Note.java, PatternBuilder.java, PatternDisplay.java, Sequence.java, SequenceCanvas.java, Stroke.java, TamTamPanel.java, and TouchPanel.java. PatternDisplay is an interface implemented by TamTamPanel, which provides the method updatePatternDisplay(Sequence); this method is used to explicitly display the results of the PRE. In the original TamTam applet, TamTamPanel displays the output as a graphical sequence of notes, based on the pattern the user feeds to it. The core pattern recognition code was placed in PatternBuilder, which is the PRE of this system. See Figure 2 for the UML diagram of the essential classes.

Figure 2: UML Diagram

A separate class, DNAPatterns.java, implements PatternDisplay; it loads the sample dataset and feeds it to the Sequence class. We used the following mapping between the note representation in TamTam and the nucleotide bases: adenine (A) corresponds to a full note, cytosine (C) to a half note, guanine (G) to a quarter note, and thymine (T) to an eighth (1/8) note. We then iterated through the gene sequence in windows of 4 through 22 nucleotides, as is customary in bioinformatics data mining when studying nucleotide sequence interaction. For example, given a partial gene sequence of 'AGGGTGCGCA AATTGGCGCA …', the first round of sequences would be 'AGGG', 'GGGT', 'GGTG', 'GTGC', etc. Any patterns recognized by the PRE are stored internally by the engine.

Work in Progress
The results of running the Salmonella gene sequence (NCBI GenBank 2006) through the engine were negative. The sequence lengths of key interest were between 18 and 22 nucleotides. The PRE was able to recognize patterns of up to five nucleotides in length, but nothing longer.

There are several possible reasons for the negative result. The first, and most obvious, is that there may simply be no such patterns to recognize. A second issue might be the particular algorithm for generating child patterns from the winning parent patterns: a winning pattern such as 'atat' might generate a child pattern 'atgat', which may well not occur as a longer pattern within the sequence. Finally, the tool should be tested against larger gene sequence datasets, as well as against different types of data (meteorological data, traffic-flow patterns in various urban areas, etc.).

This is a work in progress. More research into alternative pattern generation schemes might be worthwhile. It would also be interesting to develop the PRE further into a more abstract and extensible class. With a little work, a base class could be developed that manages the basic pattern recognition tasks and lets subclasses implement the specific pattern recognition algorithms a developer needs. Using a basic Factory pattern, TamTam could then be used in a variety of environments and would be easily modifiable and testable.
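The base-class-plus-factory refactoring proposed above might look roughly like the following Java sketch. PatternRecognizer, SlidingWindowRecognizer, and RecognizerFactory are hypothetical names, not PatternDiviner classes. The concrete recognizer enumerates windows of 4 to 22 bases as in the Methods section; treating a window as "recognized" when it occurs at least twice is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical base class: owns the shared recognition workflow;
// subclasses supply only the algorithm that proposes candidate patterns.
abstract class PatternRecognizer {

    // Strategy hook implemented by each concrete recognizer.
    protected abstract List<String> candidates(String input);

    // Template method: count candidates and keep those seen at least twice
    // (the "repeated at least twice" criterion is an illustrative assumption).
    public final Map<String, Integer> recognize(String input) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String p : candidates(input)) {
            counts.merge(p, 1, Integer::sum);
        }
        counts.values().removeIf(c -> c < 2);
        return counts;
    }
}

// One concrete strategy: every contiguous window of length minLen..maxLen.
final class SlidingWindowRecognizer extends PatternRecognizer {
    private final int minLen;
    private final int maxLen;

    SlidingWindowRecognizer(int minLen, int maxLen) {
        this.minLen = minLen;
        this.maxLen = maxLen;
    }

    @Override
    protected List<String> candidates(String input) {
        List<String> out = new ArrayList<>();
        for (int len = minLen; len <= maxLen; len++) {
            for (int start = 0; start + len <= input.length(); start++) {
                out.add(input.substring(start, start + len));
            }
        }
        return out;
    }
}

// Factory: client code asks for a recognizer by name and never names a concrete class.
final class RecognizerFactory {
    static PatternRecognizer create(String kind) {
        switch (kind) {
            case "dna":
                return new SlidingWindowRecognizer(4, 22); // window sizes from the Methods section
            default:
                throw new IllegalArgumentException("unknown recognizer: " + kind);
        }
    }
}
```

Because recognize() is fixed in the base class, a new algorithm (for example, one for meteorological or traffic-flow data) would only need to override candidates() and register a name in the factory.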
Acknowledgements
This work was generously assisted by Dr. Jean-Christophe Buisson and was partially funded by the Faculty Research and Development Committee of Towson University.

References
1) Bickhard, M. "Interactivist Manifesto". Retrieved online on June 10, 2006 at http://www.lehigh.edu/~mhb0/InteractivismManifesto.pdf.
2) TamTam. Retrieved online on June 5, 2006 at http://diabeto.enseeiht.fr/tamtam/. Dr. Jean-Christophe Buisson, L'Ecole Nationale Supérieure d'Electrotechnique, d'Electronique, d'Informatique, d'Hydraulique et des Télécommunications (http://enseeiht.fr/).
3) Buisson, J.-C. "A rhythm recognition computer program to advocate interactivist perception". Cognitive Science, Volume 28, Issue 1, January-February 2004, pp. 75-87.
4) NCBI GenBank. http://www.ncbi.nlm.nih.gov/Genbank/GenBankFtp.html.
5) Trajkovski, G. "An Imitation-Based Approach to Modeling Homogenous Agents Societies". IDEA Publishing, 2007.
6) Collins, S., and Trajkovski, G. "Attack of the Rainbow Bots: Generating Diversity through Multi-Agent Systems". In Trajkovski, G. (ed.), Diversity in Information Technology Education. Hershey, PA: InfoSys Press, 2006, pp. 196-241.
7) Trajkovski, G., and Collins, S. "Autochthony Through Self-Organization: Interactivism and Emergence in a Virtual Environment". New Ideas, Elsevier, in press.
8) Stojanov, Bozinovski, S., and Trajkovski, G. "Interactionist-Expectative View on Agency and Learning". IMACS Journal of Mathematics and Computers in Simulation, North-Holland, Amsterdam, vol. 44 (1997), pp. 295-310.