Recurrent Neural Networks Learn Deterministic Representations of Fuzzy Finite-State Automata

Christian W. Omlin (a), C. Lee Giles (b, c)

(a) Department of Computer Science, University of Stellenbosch, 7600 Stellenbosch, SOUTH AFRICA
(b) NEC Research Institute, Princeton, NJ 08540, USA
(c) UMIACS, University of Maryland, College Park, MD 20742, USA

E-mail: [email protected], [email protected]

Abstract

The paradigm of deterministic finite-state automata (DFAs) and their corresponding regular languages has been shown to be very useful for addressing fundamental issues in recurrent neural networks. The issues that have been addressed include knowledge representation, extraction, and refinement, as well as the development of advanced learning algorithms. Recurrent neural networks are also a very promising tool for modeling discrete dynamical systems through learning, particularly when partial prior knowledge is available. The drawback of the DFA paradigm is that it is inappropriate for modeling vague or uncertain dynamics; however, many real-world applications deal with vague or uncertain information. One way to model vague information in a dynamical system is to allow for vague state transitions, i.e. the system may be in several states at the same time with varying degrees of certainty; fuzzy finite-state automata (FFAs) are a formal equivalent of such systems. It is therefore of interest to study how uncertainty in the form of FFAs can be modeled by deterministic recurrent neural networks. We have previously proven that second-order recurrent neural networks are able to represent FFAs, i.e. recurrent networks can be constructed that assign fuzzy memberships to input strings with arbitrary accuracy. In such networks, the classification performance is independent of the string length. In this paper, we are concerned with recurrent neural networks that have been trained to behave like FFAs. In particular, we are interested in the internal representation of fuzzy states and state transitions and in the extraction of knowledge in symbolic form.

Keywords: Recurrent neural networks, fuzzy knowledge extraction, automata, languages, nonlinear systems.

1 Introduction

There has been an increased interest in combining fuzzy systems with neural networks because fuzzy neural systems merge the advantages of both paradigms (see [4] for a collection of papers). Fuzzy logic [45] provides a mathematical foundation for approximate reasoning and has proven very successful in a variety of applications [7, 9, 13, 18, 21, 22, 34, 43]. On the one hand, parameters in fuzzy systems have clear physical meanings, and rule-based and linguistic information can be incorporated into adaptive fuzzy systems in a systematic way. On the other hand, there exist powerful algorithms for training various neural network models. However, it is very rare for such hybrid systems to contain feedback in their structure. In [41], the implementation of a finite-state machine with fuzzy inputs and crisp states is realized by training a feedforward network explicitly on the state transition table using a modified backpropagation algorithm. Such implementations are inadequate for modeling systems whose state depends on variables which are not observable. Recurrent neural networks with hidden neurons are able to model such systems through learning. They are particularly well-suited for problem domains where incomplete or contradictory prior knowledge is available. We have previously shown that recurrent networks can be initialized with such prior knowledge; the objective of training networks then becomes that of knowledge revision or refinement [30].

Fuzzy finite-state automata (FFAs) can model a large class of dynamical processes whose current state depends on the current input and previous states. Unlike in the case of deterministic finite-state automata (DFAs), FFAs are not in one particular state; instead, each state is occupied to some degree defined by a fuzzy membership function. It has been shown that recurrent neural networks can represent DFAs, i.e. they can be trained to behave like DFAs, and a description of the learned knowledge can be extracted in the form of DFAs [6, 8, 10, 12, 15, 16, 28, 36, 40, 42, 46]. Thus, it is only natural to investigate whether recurrent neural networks can also be trained to behave like FFAs and whether a symbolic description can be extracted from trained networks.

Fuzzy grammars have been found to be useful in a variety of applications such as digital circuit design [25] and the analysis of X-rays [35]. Neural network implementations of fuzzy automata have been proposed in the literature [17, 41]. The synthesis method proposed in [17] uses digital design technology to implement fuzzy representations of states and outputs. The above-mentioned applications demonstrate that fuzzy automata are gaining significance as synthesis tools for a variety of problems. Thus, it would be useful to extend the role of recurrent networks as tools for knowledge revision and refinement to problem domains that can be modeled as FFAs and where prior fuzzy knowledge is available. Maclin & Shavlik have demonstrated the power of combining DFAs and neural networks in the area of computational molecular biology (protein folding): they modeled prior knowledge about protein structures as DFAs, initialized a neural network with that prior knowledge, and trained it on sample data. This approach outperformed the best known `traditional' algorithm for protein folding [23, 24].


The purpose of this paper is to show that recurrent networks can learn the behavior of FFAs, and that the learned knowledge can be extracted in symbolic form. The latter objective requires an understanding of how trained networks represent FFAs. We proved in [33] that the computational structure of deterministic recurrent neural networks is in principle rich enough to represent FFAs. We proposed an algorithm for mapping arbitrary FFAs into recurrent networks; the constructed networks assign the correct fuzzy membership to strings of arbitrary length with arbitrary accuracy. The investigation in this paper is a natural extension of that line of work to address learning capabilities. We will demonstrate that the above-mentioned FFA encoding algorithm provides a qualitatively correct prediction of the internal FFA representation in trained recurrent networks. As a consequence, empirical and theoretical results on the extraction of DFAs, the solution of the model selection problem, and the results on the use of recurrent neural networks for knowledge refinement all carry over to recurrent networks trained to behave like FFAs.

The remainder of this paper is organized as follows: We briefly review fuzzy finite-state automata (FFAs) and their encoding in recurrent neural networks in Section 2. In Section 3, we briefly discuss a recurrent network architecture capable of representing FFAs and present the relevant theoretical result. In Section 4, we report simulation results on training recurrent networks to behave like FFAs. In Section 5, we apply theoretical and empirical results on the extraction of deterministic finite-state automata to the extraction of a symbolic description of FFAs from trained networks. A summary and directions for future work in Section 6 conclude this paper.

2 Fuzzy Finite-State Automata

Here, we discuss the relationship between finite-state automata and recurrent neural networks that is relevant for mapping fuzzy automata into recurrent networks; details can be found in [28, 33]. We begin by defining the class of fuzzy automata which we are interested in learning:

Definition 2.1 A fuzzy finite-state automaton (FFA) $\widetilde{M}$ is a 6-tuple $\widetilde{M} = \langle \Sigma, Q, Z, \widetilde{R}, \widetilde{\delta}, \omega \rangle$ where $\Sigma$ and $Q$ are the same as in DFAs; $Z$ is a finite output alphabet, $\widetilde{R}$ is the fuzzy initial state, $\widetilde{\delta}: \Sigma \times Q \times [0,1] \to Q$ is the fuzzy transition map, and $\omega: Q \to Z$ is the output map.

In this paper, we consider a restricted type of fuzzy automaton whose initial state is not fuzzy, and whose $\omega$ is a function from $F$ to $Z$, where $F$ is a non-fuzzy subset of states called final states. Any fuzzy automaton as described in Definition 2.1 is equivalent to such a restricted fuzzy automaton [11]. Notice that an FFA reduces to a conventional DFA by restricting the transition weights to 1. An example of an FFA is shown in Figure 1a.² Similarly to regular grammars, fuzzy regular grammars are defined as follows:

Definition 2.2 A fuzzy regular grammar $\widetilde{G}$ is a quadruple $\widetilde{G} = \langle S, N, T, P \rangle$ where $S$ is the start symbol, $N$ and $T$ are non-terminal and terminal symbols, respectively, and $P$ is a set of productions of the form $A \xrightarrow{\theta} a$ or $A \xrightarrow{\theta} aB$ where $A, B \in N$, $a \in T$ and $0 < \theta \leq 1$.

² Under the common definition of DFAs, we have $Z = \{0, 1\}$ (i.e., a string is either rejected or accepted). We extend that definition of DFAs by making the string membership function fuzzy: DFA $M = \langle \Sigma, Q, R, F, \delta, Z, \omega \rangle$ where $Z$ is the finite set of all possible string memberships and $\omega: Q \to Z$ is a labeling defined for all DFA states.

Unlike in the case of crisp regular grammars where strings either belong or do not belong to some regular language, strings of a fuzzy language have graded membership:

Definition 2.3 Given a regular fuzzy grammar $\widetilde{G}$, the membership grade $\mu_{\widetilde{G}}(s)$ of a string $s$ in the regular language $L(\widetilde{G})$ is the maximum value of any derivation of $s$, where the value of a specific derivation of $s$ is equal to the minimum weight of the productions used:

$$\mu_{\widetilde{G}}(s) = \mu_{\widetilde{G}}(S \Rightarrow^{*} s) = \max_{S \Rightarrow^{*} s} \, \min\left[\mu_{\widetilde{G}}(S \to \alpha_1),\; \mu_{\widetilde{G}}(\alpha_1 \to \alpha_2),\; \ldots,\; \mu_{\widetilde{G}}(\alpha_m \to s)\right]$$
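To make the max-min semantics of Definition 2.3 concrete, the following Python sketch computes the membership grade of a string by dynamic programming over partial derivations (min along a derivation, max over alternative derivations). The tuple encoding of productions, the function name, and the end-of-derivation sentinel are our own illustrative choices, not notation from the paper.

```python
def membership_grade(string, productions, start='S'):
    """Max-min membership grade of `string` under a fuzzy regular grammar.

    `productions` is a list of tuples (lhs, terminal, rhs, weight) encoding
    A --theta--> a B (rhs is a non-terminal) or A --theta--> a (rhs is None).
    This encoding is an illustrative assumption, not taken from the paper.
    """
    # best[A] = largest min-of-weights value over all partial derivations that
    # generate the prefix read so far and end in non-terminal A
    best = {start: 1.0}
    for i, symbol in enumerate(string):
        last = (i == len(string) - 1)
        nxt = {}
        for lhs, terminal, rhs, theta in productions:
            if terminal != symbol or lhs not in best:
                continue
            value = min(best[lhs], theta)          # min over the production weights used
            if last and rhs is None:               # derivation completed on the last symbol
                nxt['#'] = max(nxt.get('#', 0.0), value)   # '#' marks finished derivations
            elif rhs is not None:
                nxt[rhs] = max(nxt.get(rhs, 0.0), value)   # max over alternative derivations
        best = nxt
    return best.get('#', 0.0)                      # 0 if the string has no derivation
```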

This is akin to the definition of stochastic regular languages [37], where the min- and max-operators are replaced by the product- and sum-operators, respectively. Both fuzzy and stochastic regular languages are examples of weighted regular languages [38]. However, there are also distinct differences: In stochastic regular languages, the production rules are applied according to a probability distribution (i.e., the production weights are interpreted as probabilities). If, at any stage of string generation, there exist several alternative rules, exactly one of them is applied; there exists no uncertainty about the generated string once the rule has been applied (note also that the production probabilities sum to 1). In fuzzy regular grammars, there is no question whether a production rule is applied; all applicable production rules are executed to some degree (note that the production weights do not need to sum to 1). This leaves some uncertainty or ambiguity about the generated string. Whether to choose stochastic or fuzzy regular grammars depends on the particular application.

There exists a correspondence between FFAs and fuzzy regular grammars [11]:

Theorem 2.1 For a given fuzzy automaton $\widetilde{M}$, there exists a fuzzy grammar $\widetilde{G}$ such that $L(\widetilde{M}) = L(\widetilde{G})$.

The obvious correspondence is between non-terminal and terminal symbols and non-accepting and accepting FFA states, respectively, where transitions between states are weighted with the corresponding production weight $\theta$. The following result is the basis for mapping FFAs into deterministic recurrent neural networks [39]:

Theorem 2.2 Given a regular fuzzy automaton $\widetilde{M}$, there exists a deterministic finite-state automaton $M$ with output alphabet $Z \subseteq \{\theta : \theta \text{ is a production weight}\} \cup \{0\}$ which computes the membership function $\mu: \Sigma^* \to [0,1]$ of the language $L(\widetilde{M})$.

An example of the FFA-to-DFA transformation is shown in Figure 1b.³ An immediate consequence of this theorem is the following corollary:

³ This algorithm is an extension of the standard algorithm which transforms non-deterministic finite-state automata into equivalent deterministic finite-state automata [19]; unlike the standard transformation algorithm, we must distinguish accepting states with different fuzzy membership labels. Another method of decomposing a fuzzy grammar into crisp grammars has been investigated by Zadeh using the concept of level sets [44].

Figure 1: Transformation of an FFA into its corresponding DFA: (a) A fuzzy finite-state automaton with weighted state transitions. State 1 is the automaton's start state; accepting states are drawn with double circles. Only paths that can lead to an accepting state are shown (transitions to garbage states are not shown explicitly). A transition from state $q_j$ to $q_i$ on input symbol $a_k$ with weight $\theta$ is represented as a directed arc from $q_j$ to $q_i$ labeled $a_k/\theta$. (b) The corresponding deterministic finite-state automaton which computes the membership function for strings. The accepting states are labeled with their degree of membership. Notice that all transitions in the DFA have weight 1.

Corollary 2.1 Given a regular fuzzy grammar $\widetilde{G}$, there exists an equivalent grammar in which all productions have the form $A \xrightarrow{1.0} aB$ or $A \xrightarrow{\theta} a$.
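Theorem 2.2 can be realized by a subset-construction-style procedure in which each state of the deterministic acceptor records, for every FFA state, the best (max of min-weights) value of any path reaching it on the prefix read so far. The Python sketch below illustrates one such construction under the max-min semantics and assumes a crisp start state; the data layout and names are ours, and the published algorithm [39] may differ in its details.

```python
def ffa_to_dfa(transitions, start, accepting, alphabet):
    """Build a deterministic acceptor for an FFA under max-min semantics (sketch).

    `transitions` maps (state, symbol) to a list of (next_state, weight) pairs;
    `start` is a crisp start state; `accepting` is the set of accepting states.
    A DFA state is a frozenset of (FFA state, membership) pairs.
    """
    def step(dfa_state, symbol):
        best = {}
        for q, m in dfa_state:
            for q2, theta in transitions.get((q, symbol), []):
                value = min(m, theta)                       # min along a path
                best[q2] = max(best.get(q2, 0.0), value)    # max over paths
        return frozenset(best.items())

    def label(dfa_state):
        # membership assigned to a DFA state: best value over accepting FFA states
        return max([m for q, m in dfa_state if q in accepting], default=0.0)

    start_state = frozenset({(start, 1.0)})
    dfa_transitions = {}
    labels = {start_state: label(start_state)}
    queue = [start_state]
    while queue:                                            # explore reachable DFA states
        s = queue.pop()
        for a in alphabet:
            t = step(s, a)
            dfa_transitions[(s, a)] = t
            if t not in labels:
                labels[t] = label(t)
                queue.append(t)
    return dfa_transitions, labels
```

Because the transition weights form a finite set and the min operation can only produce values from that set, only finitely many such state sets can arise, so the construction terminates; all transitions of the resulting acceptor have weight 1 and the fuzziness is carried entirely by the state labels, as in Figure 1b.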

3 Network Architecture for Fuzzy Automata

Theorem 2.2 enables us to transform any FFA into a deterministic automaton which computes the same membership function $\mu: \Sigma^* \to [0,1]$. Various methods have been proposed for implementing deterministic automata in recurrent neural networks [1, 2, 15, 14, 20, 26, 31, 28]. We use discrete-time, second-order recurrent neural networks with sigmoidal discriminant functions which update their current state according to the following equations:

$$S_i^{(t+1)} = h(\alpha_i(t)) = \frac{1}{1 + e^{-\alpha_i(t)}}, \qquad \alpha_i(t) = b_i + \sum_{j,k} W_{ijk}\, S_j^{(t)} I_k^{(t)}, \qquad (1)$$

where $b_i$ is the bias associated with hidden recurrent state neuron $S_i$, $I_k$ denotes the input neuron for symbol $a_k$, $W_{ijk}$ is a second-order weight, $\alpha_i(t)$ is the total input to neuron $S_i$ at time $t$, and $h(\cdot)$ is the sigmoidal discriminant function. The indices $i$, $j$ and $k$ run over the set of recurrent state neurons and the size of the input alphabet, respectively. The product $S_j^{(t)} I_k^{(t)}$ directly corresponds to the state transition $\delta(q_j, a_k) = q_i$. The recurrent neurons $S_j$ implement the desired finite-state dynamics, i.e. transitions between crisp states.

Figure 2: Recurrent Network Architecture for Fuzzy Finite-State Automata: The architecture consists of two parts: recurrent state neurons encode the state transitions of the deterministic acceptor, and these recurrent state neurons are connected to a linear output neuron which computes string membership.

We have proven that any deterministic automaton can be encoded in discrete-time, second-order recurrent neural networks with sigmoidal discriminant functions such that the internal state representation remains stable for strings of arbitrary length [28]. The recurrent state neurons $S_j$ connect to a linear output neuron as follows:

$$\mu_{RNN} = \sum_{j} S_j v_j \qquad (2)$$

The weights $v_j$ are just the memberships assigned to the DFA states after the transformation of an FFA into an equivalent DFA. This augmented, second-order recurrent neural network architecture is shown in Figure 2. We have previously proven the following theorem [33]:

Theorem 3.1 For any given fuzzy finite-state automaton $\widetilde{M}$, there exists an augmented, second-order recurrent neural network with sigmoidal recurrent neurons and a single linear output neuron which computes the fuzzy membership $\mu_{RNN}(s)$ for all strings $s \in \Sigma^*$ with arbitrary accuracy $\epsilon$, i.e.

$$\left|\, \mu_{\widetilde{M}}(s) - \mu_{RNN}(s) \,\right| < \epsilon .$$
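For concreteness, the state update of Equation (1) and the linear output of Equation (2) can be written out as a short sketch. The class below is illustrative only: the tensor shapes, the one-hot input encoding, and reading the output after the last symbol are our own assumptions, and the weight values would come either from the encoding algorithm of [33] or from training.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class AugmentedSecondOrderRNN:
    """Sketch of the architecture in Figure 2 (Equations 1 and 2).

    N recurrent state neurons and K input symbols (one-hot inputs);
    W has shape (N, N, K), b has shape (N,), and v holds the fuzzy
    membership weights of the linear output neuron.
    """
    def __init__(self, W, b, v, s0):
        self.W, self.b, self.v = W, b, v   # second-order weights, biases, output weights
        self.s0 = s0                       # initial state vector S^(0)

    def membership(self, symbols):
        """Process a string (sequence of symbol indices) and return mu_RNN(s)."""
        S = self.s0.copy()
        K = self.W.shape[2]
        for k in symbols:
            I = np.zeros(K)
            I[k] = 1.0                                            # one-hot input I^(t)
            # Eq. (1): alpha_i = b_i + sum_{j,k} W_ijk S_j I_k, then S = h(alpha)
            alpha = self.b + np.einsum('ijk,j,k->i', self.W, S, I)
            S = sigmoid(alpha)
        # Eq. (2): the linear output neuron computes the fuzzy string membership
        return float(self.v @ S)
```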

run #   training time   average classification error
        (epochs)        l=11    l=12    l=13    l=14    l=15
  1        5109         0.013   0.037   0.052   0.093   0.162
  2        9461         0.008   0.016   0.027   0.041   0.083
  3        3657         0.027   0.051   0.080   0.106   0.193
  4        8036         0.010   0.026   0.041   0.072   0.104
  5        4672         0.016   0.021   0.030   0.057   0.094
  6        2849         0.024   0.062   0.097   0.136   0.179
  7        6041         0.011   0.019   0.026   0.032   0.058
  8       10364         0.004   0.013   0.021   0.029   0.037
  9        8874         0.009   0.020   0.026   0.031   0.044
 10        6832         0.015   0.029   0.035   0.052   0.071

Table 1: Network Generalization Performance: The table shows the training and generalization performance of 10 different networks with 7 recurrent state neurons. Generalization performance was measured on all strings of length 11 to 15. The fuzzy string memberships were generated by the FFA shown in Figure 1b.

Training networks to behave like FFAs is about an order of magnitude slower compared to training networks to behave like DFAs of similar size and complexity. As a side effect of attaining the required output accuracy on the training set (here, we chose 0.005), the networks' internal FFA state representation generally becomes more stable compared to networks trained to behave like DFAs, i.e. the clusters that are formed in the output space of the recurrent state neurons tend to be tighter, leading to better generalization performance when tested on longer strings.

The purpose of the next section is to show that augmented recurrent neural networks that are trained on strings with fuzzy membership develop an internal FFA representation according to Theorems 2.2 and 3.1. This will not only validate the prediction made by the network construction algorithm used in Theorem 3.1, but also enable us to rely on established theoretical foundations and algorithms for extracting symbolic descriptions of the learned knowledge in the form of deterministic acceptors.

4 Learning FFAs

We trained 10 augmented, second-order, sigmoidal recurrent neural networks with 7 state neurons on the first 1,000 strings whose fuzzy membership was generated by the FFA shown in Figure 1b. All weights were initialized to random values in the interval [-0.1, 0.1]. For training, we used the backpropagation-through-time learning algorithm for recurrent networks with step size and momentum term equal to 0.5.

Empirical and theoretical evidence suggests that the order in which strings are presented as input during the learning phase can have a profound impact on a network's training performance [3, 27]. The order of positive and negative training examples and the length of the training strings are important factors. We have found that the following heuristic for data presentation works well: (1) Positive and negative example strings are alternated; this ensures that the network does not develop an inductive bias toward positive or negative example strings if that bias is not present in the training data. (2) We order the training strings according to length and adopt an incremental learning strategy. Long example strings represent long-term dependencies; because the gradient information vanishes for long strings, learning becomes increasingly inefficient as the temporal span of the dependencies increases [3].

Our strategy is to learn in cycles, where each cycle consists of three distinct phases (a sketch of this procedure is given at the end of this section): (1) Train the network on a small subset of the training data set (the `working set'); this initial data set contains only the shortest strings. These strings represent short-term dependencies which may be sufficient to infer similar, but longer-term dependencies. We refer to one pass through the working set as an epoch. (2) The training cycle ends when the network performs satisfactorily on the working set or when a maximum number of epochs in this cycle has expired. (3) The network is tested on the entire available training set. If the network does not perform satisfactorily on the training set, then a fixed number of strings are added to the working set and a new cycle starts. Otherwise, the training algorithm terminates. In general, networks do not need to be explicitly trained on the entire training set, i.e. the size of the final working set may be a fraction of the size of the entire training set. This serves as evidence that short-term dependencies learned from short strings can indeed help infer similar, longer-term dependencies for longer strings. Training was halted as soon as the network output differed from the actual string membership by no more than 0.005 for all strings of the training set.

The results of 10 simulations are shown in Table 1. We observe that the training times are about an order of magnitude slower for training networks to behave like FFAs compared to training networks to behave like DFAs for automata of similar size and complexity. In the latter case, networks only need to make a binary accept/reject decision, whereas networks trained on strings with fuzzy membership have to achieve a much higher output accuracy (here, the acceptable error was 0.005). The required accuracy accounts for the slower training.

We measured the networks' generalization performance on all strings of length up to 15. We observe that the generalization performance deteriorates with increasing string length, i.e. the output error increases. As in the case of networks trained to behave like DFAs, the internal state representation becomes unstable with increasing string length; this instability results in an increase in output error.
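The cycle-based incremental training procedure described above can be summarized as follows. The sketch is illustrative: `train_one_epoch` and `error_on` are caller-supplied placeholders for one backpropagation-through-time pass and for evaluating the maximum output error on a set of strings, and the working-set sizes are arbitrary; only the 0.005 output tolerance is taken from the experiments reported above.

```python
def incremental_training(net, training_set, train_one_epoch, error_on,
                         tolerance=0.005, initial_size=50, increment=50,
                         max_epochs_per_cycle=500):
    """Cycle-based incremental training on length-ordered strings (a sketch).

    `training_set` is a list of (string, fuzzy_membership) pairs;
    `train_one_epoch(net, examples)` and `error_on(net, examples)` are
    caller-supplied placeholders for one BPTT pass and for the maximum
    output error on a set of strings.  Working-set sizes are illustrative.
    """
    # order the training strings by length and start with the shortest ones
    data = sorted(training_set, key=lambda example: len(example[0]))
    working_set = data[:initial_size]

    while True:
        # phases (1) and (2): train on the working set until it is learned
        # or the epoch budget for this cycle is exhausted
        for _ in range(max_epochs_per_cycle):
            train_one_epoch(net, working_set)   # example ordering (alternating pos./neg.) handled inside
            if error_on(net, working_set) <= tolerance:
                break

        # phase (3): test on the entire available training set
        if error_on(net, data) <= tolerance:
            return net                          # training terminates
        # otherwise add a fixed number of strings and start a new cycle
        working_set = data[:len(working_set) + increment]
```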

5 Extracting FFAs

In order to validate the prediction of Theorem 3.1 that deterministic acceptors are indeed the appropriate internal representation for networks that are trained to behave like FFAs, we analyzed the hidden state space of recurrent neurons after training. We observed that the outputs of recurrent neurons tend to cluster. This is the same observation made for networks trained to behave like DFAs; thus, the internal representation is consistent with that of deterministic acceptors. As a consequence, the theoretical foundations and algorithms for the extraction of deterministic finite-state automata also apply to the extraction of deterministic representations of FFAs. In particular, the following theorem developed for DFA extraction also applies to FFAs [5]:

Theorem 5.1 The state space of a recurrent network modeling a given FFA must have mutually disjoint, closed sets $Q_i$ in $[0,1]^N$. The sets $Q_i$ correspond to the states $q_i$ of the deterministic acceptor of some FFA.

The following theorem asserts that FFA extraction based on an equal partitioning of the output space of recurrent neurons is appropriate [5]:

Theorem 5.2 It is sufficient to consider only the partitions created by dividing each recurrent neuron's output range into $r$ intervals in order to succeed in extracting the deterministic acceptor of the FFA being modeled, provided the quantization level $r$ is sufficiently large.

We applied an extraction algorithm based on a partitioning of the output space of recurrent neurons [29]; other algorithms have been proposed (e.g. [10]). The extraction algorithm divides the output of each of the $N$ state neurons into $r$ intervals of equal size, yielding $r^N$ partitions in the space of outputs of the state neurons. We also refer to $r$ as the quantization level. Each partition is an N-dimensional cube with edge length $1/r$. From some defined initial network state, a network's trained weights will cause it to follow a trajectory in state space as symbols are presented as input. The algorithm considers all strings in alphabetical order starting with length 1. This procedure defines a search tree with the initial state as its root; the number of successors of each node is equal to the number of symbols in the input alphabet. Links between nodes correspond to transitions between DFA states. The search is performed in breadth-first order.

When a transition from a partition reaches another (not necessarily different) partition, all paths from that new network state are followed for subsequent input symbols, creating new states in the DFA for each new input symbol, subject to the following three conditions: (1) When a previously visited partition is reached, then only the new transition is defined between the previous and the current partition; no new state is created. This corresponds to pruning the search tree at that node. (2) In general, many different network states belong to the same partition. We use the network state which first reached a partition as the new network state for the subsequent input symbols $\{a_1, a_2, \ldots, a_K\}$. (This condition only applies when two or more transitions from a partition to another partition are extracted.) (3) When a transition leads from a partition immediately back to the same partition, then a loop is created and the search tree is also pruned at that node. The algorithm terminates when no new states are created and all possible transitions from all states have been extracted.

The algorithm assigns fuzzy membership labels to extracted states based on the fuzzy network output for each acceptor state and the set of fuzzy memberships that occur in the training set. For a given state, its label is set to the fuzzy membership occurring in the training set that is closest to the actual network output for that state. We apply a modified automaton minimization algorithm to each extracted automaton. The algorithm differs from the standard algorithm [19] in that the former not only distinguishes between accepting and rejecting states, but also considers the different fuzzy memberships with which an automaton accepts strings.

Clearly, the extracted automaton depends on the quantization level $r$ chosen, i.e., in general, different automata can be extracted for different values of $r$. We solve the model selection problem by choosing the first automaton that correctly classifies the training set [29]. Furthermore, because of condition (2), different automata may be extracted depending on the order in which the successors of a node in the search tree are visited. In general, however, the order in which input symbols are presented is not significant because of the subsequent minimization of the extracted automaton.

In all of our experiments, we were able to extract the ideal minimized deterministic acceptor (see Figure 1b). Although the fuzzy string memberships in the training set are fairly close to each other (0.0, 0.2, 0.3, and 0.5), we were able to assign the correct labels to each state. We attribute this performance to the shallowness of the state search tree of the extraction algorithm and to the accuracy required by the training algorithm. While the required network output accuracy lengthens the training time, it also leads to a more stable internal state representation which in turn improves network classification performance.
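The breadth-first, partition-based extraction described above can be sketched as follows. The sketch assumes the network object exposes a `step(state, symbol)` method that applies Equation (1) for one input symbol to a given state vector and a `membership(state)` method that applies Equation (2) to a state vector; both names and the data layout are our own. Conditions (1)-(3) are handled implicitly: a previously visited partition (including the same partition, i.e. a loop) only adds a transition, and the first network state to reach a partition serves as its representative.

```python
from collections import deque

def extract_dfa(net, alphabet, r, initial_state):
    """Breadth-first extraction of a deterministic acceptor from a trained
    network by quantizing each state neuron's output into r intervals (sketch).
    """
    def partition(state):
        # map a continuous network state in [0,1]^N to its quantization cell
        return tuple(min(int(s * r), r - 1) for s in state)

    start = partition(initial_state)
    representative = {start: initial_state}   # first network state reaching a cell
    transitions = {}                          # (cell, symbol) -> cell
    outputs = {start: net.membership(initial_state)}

    queue = deque([start])
    while queue:                              # breadth-first search over partitions
        cell = queue.popleft()
        for symbol in alphabet:
            next_state = net.step(representative[cell], symbol)
            next_cell = partition(next_state)
            transitions[(cell, symbol)] = next_cell
            if next_cell not in representative:    # a new DFA state is created
                representative[next_cell] = next_state
                outputs[next_cell] = net.membership(next_state)
                queue.append(next_cell)
    # afterwards: minimize the automaton and snap each state's output to the
    # closest fuzzy membership value occurring in the training set
    return transitions, outputs
```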

6 Conclusions

Prior work demonstrated that second-order recurrent neural networks are capable of representing fuzzy finite-state automata (FFAs) when the fully recurrent network architecture is augmented with a linear output layer. The representation used the model of equivalent deterministic acceptors to encode FFAs. In this paper, we have empirically demonstrated that such networks also learn such representations, i.e. networks trained on strings with fuzzy membership $\mu \in [0,1]$ also represent the generating automata in the form of deterministic acceptors. This observation implies that the theoretical foundations and algorithms for the extraction of deterministic finite-state automata (DFAs) from networks trained on regular strings with crisp membership, and our previous empirical results on model selection, also apply to networks trained to behave like FFAs.

The drawback of this deterministic representation is that the original fuzzy model of a generating automaton, i.e. states with multiple, weighted transitions for a given input symbol, cannot be reconstructed. It remains an open problem whether other FFA representations exist for this network architecture. We have shown that a recurrent network architecture with slightly enriched neuron functionality is able to represent the `fuzziness' of state transitions, i.e. the transition weights are also parameters of the network [32]. Whether such networks can be trained to behave like FFAs and whether FFAs can be directly extracted from such networks remains to be seen.

Since the representations of FFAs and DFAs are so similar, it is very likely that our results on training recurrent networks with prior knowledge (training times for networks which are initialized with prior knowledge improve by a factor that is `proportional' to the amount of correct prior knowledge) also apply to networks trained to behave like FFAs [30]. It would be interesting to apply these methods to a real-world problem where vague prior knowledge is available that can be modeled in the form of FFAs.

References

[1] N. Alon, A. Dewdney, and T. Ott, "Efficient simulation of finite automata by neural nets," Journal of the Association for Computing Machinery, vol. 38, no. 2, pp. 495-514, April 1991.
[2] R. Alquezar and A. Sanfeliu, "An algebraic framework to represent finite state machines in single-layer recurrent neural networks," Neural Computation, vol. 7, no. 5, p. 931, 1995.
[3] Y. Bengio, P. Simard, and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult," IEEE Transactions on Neural Networks, vol. 5, pp. 157-166, 1994. Special Issue on Recurrent Neural Networks.
[4] J. Bezdek, ed., IEEE Transactions on Neural Networks, Special Issue on Fuzzy Logic and Neural Networks, vol. 3. IEEE Neural Networks Council, 1992.
[5] M. Casey, "The dynamics of discrete-time computation, with application to recurrent neural networks and finite state machine extraction," Neural Computation, vol. 8, no. 6, pp. 1135-1178, 1996.
[6] M. Casey, Computation in discrete-time dynamical systems. PhD thesis, Department of Mathematics, University of California at San Diego, La Jolla, CA, 1995.
[7] S. Chiu, S. Chand, D. Moore, and A. Chaudhary, "Fuzzy logic for control of roll and moment for a flexible wing aircraft," IEEE Control Systems Magazine, vol. 11, no. 4, pp. 42-48, 1991.
[8] A. Cleeremans, D. Servan-Schreiber, and J. McClelland, "Finite state automata and simple recurrent networks," Neural Computation, vol. 1, no. 3, pp. 372-381, 1989.
[9] J. Corbin, "A fuzzy logic-based financial transaction system," Embedded Systems Programming, vol. 7, no. 12, p. 24, 1994.
[10] S. Das and M. Mozer, "A unified gradient-descent/clustering architecture for finite state machine induction," in Advances in Neural Information Processing Systems 6 (J. Cowan, G. Tesauro, and J. Alspector, eds.), (San Francisco, CA), pp. 19-26, Morgan Kaufmann, 1994.
[11] D. Dubois and H. Prade, Fuzzy Sets and Systems: Theory and Applications, vol. 144 of Mathematics in Science and Engineering, pp. 220-226. Academic Press, 1980.
[12] J. Elman, "Finding structure in time," Cognitive Science, vol. 14, pp. 179-211, 1990.
[13] L. Franquelo and J. Chavez, "Fasy: A fuzzy-logic based tool for analog synthesis," IEEE Transactions on Computer-Aided Design of Integrated Circuits, vol. 15, no. 7, p. 705, 1996.
[14] P. Frasconi, M. Gori, M. Maggini, and G. Soda, "Representation of finite state automata in recurrent radial basis function networks," Machine Learning, vol. 23, no. 1, pp. 5-32, 1996.
[15] P. Frasconi, M. Gori, M. Maggini, and G. Soda, "Unified integration of explicit rules and learning by example in recurrent networks," IEEE Transactions on Knowledge and Data Engineering, vol. 7, no. 2, pp. 340-346, 1995.
[16] C. Giles, C. Miller, D. Chen, H. Chen, G. Sun, and Y. Lee, "Learning and extracting finite state automata with second-order recurrent neural networks," Neural Computation, vol. 4, no. 3, p. 380, 1992.
[17] J. Grantner and M. Patyra, "Synthesis and analysis of fuzzy logic finite state machine models," in Proceedings of the Third IEEE Conference on Fuzzy Systems, vol. I, pp. 205-210, 1994.
[18] T. L. Hardy, "Multi-objective decision-making under uncertainty: fuzzy logic methods," Tech. Rep. TM 106796, NASA, Washington, D.C., 1994.
[19] J. Hopcroft and J. Ullman, Introduction to Automata Theory, Languages, and Computation. Reading, MA: Addison-Wesley Publishing Company, Inc., 1979.
[20] B. Horne and D. Hush, "Bounds on the complexity of recurrent neural network implementations of finite state machines," in Advances in Neural Information Processing Systems 6, pp. 359-366, Morgan Kaufmann, 1994.
[21] W. J. M. Kickert and H. van Nauta Lemke, "Application of a fuzzy controller in a warm water plant," Automatica, vol. 12, no. 4, pp. 301-308, 1976.
[22] C. Lee, "Fuzzy logic in control systems: fuzzy logic controller," IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-20, no. 2, pp. 404-435, 1990.
[23] R. Maclin and J. Shavlik, "Refining algorithms with knowledge-based neural networks: Improving the Chou-Fasman algorithm for protein folding," in Computational Learning Theory and Natural Learning Systems (S. Hanson, G. Drastal, and R. Rivest, eds.), MIT Press, 1992.
[24] R. Maclin and J. Shavlik, "Refining domain theories expressed as finite-state automata," in Proceedings of the Eighth International Workshop on Machine Learning (ML'91) (L. Birnbaum and G. Collins, eds.), (San Mateo, CA), Morgan Kaufmann, 1991.
[25] S. Mensch and H. Lipp, "Fuzzy specification of finite state machines," in Proceedings of the European Design Automation Conference, pp. 622-626, 1990.
[26] M. Minsky, Computation: Finite and Infinite Machines, ch. 3, pp. 32-66. Englewood Cliffs, NJ: Prentice-Hall, Inc., 1967.
[27] M. Mozer, "Induction of multiscale temporal structure," in Advances in Neural Information Processing Systems 4 (J. Moody, S. Hanson, and R. Lippmann, eds.), (San Mateo, CA), pp. 275-282, Morgan Kaufmann Publishers, 1992.
[28] C. Omlin and C. Giles, "Constructing deterministic finite-state automata in recurrent neural networks," Journal of the ACM, vol. 43, no. 6, pp. 937-972, 1996.
[29] C. Omlin and C. Giles, "Extraction of rules from discrete-time recurrent neural networks," Neural Networks, vol. 9, no. 1, pp. 41-52, 1996.
[30] C. Omlin and C. Giles, "Rule revision with recurrent neural networks," IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 1, pp. 183-188, 1996.
[31] C. Omlin and C. Giles, "Stable encoding of large finite-state automata in recurrent neural networks with sigmoid discriminants," Neural Computation, vol. 8, no. 7, pp. 675-696, 1996.
[32] C. Omlin, K. Thornber, and C. Giles, "Equivalence in knowledge representation: Automata, recurrent neural networks, and dynamical fuzzy systems," tech. rep., 1998. Submitted.
[33] C. Omlin, K. Thornber, and C. Giles, "Fuzzy finite-state automata can be deterministically encoded into recurrent neural networks," IEEE Transactions on Fuzzy Systems, vol. 6, no. 1, pp. 76-89, 1998.
[34] C. Pappis and E. Mamdani, "A fuzzy logic controller for a traffic junction," IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-7, no. 10, pp. 707-717, 1977.
[35] A. Pathak and S. Pal, "Fuzzy grammars in syntactic recognition of skeletal maturity from X-rays," IEEE Transactions on Systems, Man, and Cybernetics, vol. 16, no. 5, pp. 657-667, 1986.
[36] J. Pollack, "The induction of dynamical recognizers," Machine Learning, vol. 7, pp. 227-252, 1991.
[37] M. Rabin, "Probabilistic automata," Information and Control, vol. 6, pp. 230-245, 1963.
[38] A. Salomaa, "Probabilistic and weighted grammars," Information and Control, vol. 15, pp. 529-544, 1969.
[39] M. Thomason and P. Marinos, "Deterministic acceptors of regular fuzzy languages," IEEE Transactions on Systems, Man, and Cybernetics, no. 3, pp. 228-230, 1974.
[40] P. Tino and J. Sajda, "Learning and extracting initial mealy machines with a modular neural network model," Neural Computation, vol. 7, no. 4, pp. 822-844, 1995.
[41] F. Unal and E. Khan, "A fuzzy finite state machine implementation based on a neural fuzzy system," in Proceedings of the Third International Conference on Fuzzy Systems, vol. 3, pp. 1749-1754, 1994.
[42] R. Watrous and G. Kuhn, "Induction of finite-state languages using second-order recurrent networks," Neural Computation, vol. 4, no. 3, p. 406, 1992.
[43] X. Yang and G. Kalambur, "Design for machining using expert system and fuzzy logic approach," Journal of Materials Engineering and Performance, vol. 4, no. 5, p. 599, 1995.
[44] L. Zadeh, "Fuzzy languages and their relation to human and machine intelligence," Tech. Rep. ERL-M302, Electronics Research Laboratory, University of California, Berkeley, 1971.
[45] L. Zadeh, "Fuzzy sets," Information and Control, vol. 8, pp. 338-353, 1965.
[46] Z. Zeng, R. Goodman, and P. Smyth, "Learning finite state machines with self-clustering recurrent networks," Neural Computation, vol. 5, no. 6, pp. 976-990, 1993.