Learning Non-deterministic Finite Automata from Queries and Counterexamples

T. Yokomori
Department of Computer Science and Information Mathematics, University of Electro-Communications

1 Introduction

In recent theoretical research on inductive learning, in particular on inductive inference, Angluin introduced the learning model called the minimally adequate teacher (MAT), that is, the model of learning via membership queries and equivalence queries, and showed that the class of regular languages is efficiently learnable using deterministic finite automata (DFAs) (Angluin 1987b). More specifically, she presented an algorithm which, given any regular language, learns from MAT a minimum DFA accepting the target in time polynomial in the number of states of the minimum DFA and the maximum length of any counterexample provided by the teacher.

The MAT learning model is reasonably accepted for the following reasons. First, the limited capability of learning from given example data alone is well recognized: Gold showed that finding a minimum DFA consistent with given data is computationally intractable (Gold 1978). Hence, learning models that use more than the given data are required in order to study feasible learnability. On the other hand, there is another motivation for introducing the MAT learning model, which comes from a more practical viewpoint. Suppose one wants to construct an expert system (or knowledge system) and is trying to collect inference rules by interviewing human experts. The problem here is that human experts often do not retain their expert knowledge in the form of systematically organized rules, and thus one cannot expect to obtain those expert rules directly from the experts. However, it is usually possible for experts to provide concrete examples (knowledge) derived from the rules to be collected. Hence, it is of crucial importance to find a method for achieving rule acquisition from a large number of examples through queries and answers. This requirement is nicely met by interactive learning in the MAT model, illustrated in Figure 1.1.

[Fig. 1.1. An Interactive Learning in the MAT Model. The Learner (System Software) poses queries and emits guesses $G_i$; the Teacher (Expert) answers Yes/No and supplies counterexamples.]

Following Angluin's work, several extended results on polynomial-time MAT learnability for subclasses of context-free grammars have been reported (Berman and Roos 1987; Ishizaka 1990; Shirakawa and Yokomori 1993). However, the polynomial-time learnability of the whole class of context-free grammars is still open, and the answer seems to be negative: a recent result shows that under a certain cryptographic assumption the class of context-free grammars is not learnable in polynomial time from MAT (Angluin and Kharitonov 1991).

On the other hand, it is well recognized that non-deterministic finite automata (NFAs) are useful in many domains, for both theoretical and practical reasons. We know, for example, that various theoretical properties of DFAs can be easily proved using the notion of NFAs. From a more practical viewpoint, the pattern matching problem is a typical task for which non-determinism works in a much more elegant manner than determinism. Let $t$ and $p$ be strings over $\{a, b\}$, called the text and the pattern, respectively. Assuming that $p = ab$, the problem is, given an input text $t$, to check whether $t$ ends with the pattern $p$. A finite automaton $M[p]$ is usually constructed for this purpose, and it is checked whether $t$ is accepted by $M[p]$, which accepts $\{a, b\}^* p$. The non-deterministic version of $M[p]$ is given together with its deterministic version in Figure 1.2. It is seen that an NFA yields a simpler and more natural description of the concept in question than a DFA. More generally, it may be said that human experts often retain their expert knowledge in the form of non-deterministic rules. (In the case above, it is much easier for human experts to recognize the correctness of the NFA description of their knowledge $M[p]$ than that of the DFA description.)

Returning to the MAT-learnability issues, a recent study shows that under a certain cryptographic assumption the class of NFAs is not learnable in polynomial time from MAT. More exactly, under the assumption the class is not learnable from MAT in time polynomial in the number of states of a minimum NFA equivalent to the target NFA and the maximum length of any counterexample (Angluin and Kharitonov 1991).

[Fig. 1.2. NFA and DFA for Pattern Matching: the non-deterministic and deterministic versions of $M[p]$ for $p = ab$.]
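To make the advantage concrete, the following sketch (in Python; not part of the original chapter, and the state names are assumptions, since the automaton is only drawn in Figure 1.2) simulates the three-state NFA for $\{a,b\}^* ab$ by tracking the set of currently reachable states:

# Simulate the NFA accepting {a,b}*ab by tracking reachable state sets.
# State names q0, q1, q2 are illustrative assumptions.
NFA_DELTA = {
    ('q0', 'a'): {'q0', 'q1'},   # non-deterministic guess: this 'a' may start the pattern
    ('q0', 'b'): {'q0'},
    ('q1', 'b'): {'q2'},         # complete the pattern ab
}

def ends_with_ab(text):
    states = {'q0'}
    for ch in text:
        states = set().union(*(NFA_DELTA.get((s, ch), set()) for s in states))
    return 'q2' in states

assert ends_with_ab('babab') and not ends_with_ab('abba')

The on-the-fly set simulation is exactly the subset construction that turns the NFA into the DFA of Figure 1.2, which is why the NFA description can stay smaller.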

In this chapter we shall show that the class of NFAs is learnable in polynomial time from MAT in the following sense: given any regular language $L$, an algorithm learns an NFA $M$ accepting $L$ in time polynomial in $n$ and $\ell$, where $n$ is the number of states of a minimum DFA equivalent to $M$ and $\ell$ is the maximum length of any counterexample provided during the learning process. This provides an alternative algorithm for learning regular languages in polynomial time from MAT.

The idea used in this chapter is roughly explained in two steps. First, all the necessary states and transition rules are introduced from positive counterexamples. Then, among these transition rules, wrong (incorrect) transitions are removed using negative counterexamples. A learning method based on this idea is employed by Angluin (1987a) and Ishizaka (1989, 1990), and in fact the learning algorithm proposed here is almost immediately obtained as a special case of one of our recent papers (Shirakawa and Yokomori 1993).

Using an example, we shall outline the basic idea used in the present chapter. Let $L = \{a^m b \mid m \ge 0\}$ be a target regular language. The goal is to find an NFA $M$ accepting $L$ in the MAT learning model. Suppose that, in response to the initial conjecture $M_0$ (accepting the empty set) from the learning algorithm, the string $w_1 = ab$ is given as the first (positive) counterexample. Then we construct the set of candidate states $Q_1 = \{[\lambda], [a], [ab]\}$, and set $[\lambda]$ and $[ab]$ as the initial and final states, respectively. For simplicity, let us rename $q_0 = [\lambda]$, $q_1 = [a]$, $q_2 = [ab]$. By constructing a set of transition rules $\delta_1$:

$(r_{00}\colon)\ q_0 \xrightarrow{a} q_0$, $(r'_{00}\colon)\ q_0 \xrightarrow{b} q_0$, $(r_{01}\colon)\ q_0 \xrightarrow{a} q_1$, $(r'_{01}\colon)\ q_0 \xrightarrow{b} q_1$, $(r_{02}\colon)\ q_0 \xrightarrow{a} q_2$, $(r'_{02}\colon)\ q_0 \xrightarrow{b} q_2$,
$(r_{10}\colon)\ q_1 \xrightarrow{a} q_0$, $(r'_{10}\colon)\ q_1 \xrightarrow{b} q_0$, $(r_{11}\colon)\ q_1 \xrightarrow{a} q_1$, $(r'_{11}\colon)\ q_1 \xrightarrow{b} q_1$, $(r_{12}\colon)\ q_1 \xrightarrow{a} q_2$, $(r'_{12}\colon)\ q_1 \xrightarrow{b} q_2$,
$(r_{20}\colon)\ q_2 \xrightarrow{a} q_0$, $(r'_{20}\colon)\ q_2 \xrightarrow{b} q_0$, $(r_{21}\colon)\ q_2 \xrightarrow{a} q_1$, $(r'_{21}\colon)\ q_2 \xrightarrow{b} q_1$, $(r_{22}\colon)\ q_2 \xrightarrow{a} q_2$, $(r'_{22}\colon)\ q_2 \xrightarrow{b} q_2$,

we have the first conjectured NFA $M_1 = (\{q_0, q_1, q_2\}, \{a, b\}, \delta_1, q_0, \{q_2\})$. Since $L(M_1) = \Sigma^+$, we expect a (negative) counterexample from MAT, say $w_2 = ba$. Then, after parsing $w_2$ via $M_1$, we have a transition sequence $q_0 \xrightarrow{b} q_1 \xrightarrow{a} q_2$. We check whether $[a](= q_1) \xrightarrow{a} [ab](= q_2)$ is correct for $L$ by making the membership query "$aa \in L$?". Since the answer is No, we decide that the transition rule $(r_{12}\colon)\ q_1 \xrightarrow{a} q_2$ is an incorrect rule for $L$ and remove it from $\delta_1$, making the second conjecture $M_2$ with $\delta_2 (= \delta_1 - \{r_{12}\})$. This procedure is justified by the principle called contradiction backtracing (Angluin 1987a; Ishizaka 1989, 1990; Shapiro 1981), which will be discussed in detail later.

For the next (negative) counterexample $w_3 = bb$, using the current conjecture $M_2$ we have a transition sequence $q_0 \xrightarrow{b} q_1 \xrightarrow{b} q_2$. To check whether $[a](= q_1) \xrightarrow{b} [ab](= q_2)$ is correct for $L$, we make the membership query "$ab \in L$?". Since the answer is Yes, this implies that $(r'_{01}\colon)\ q_0 \xrightarrow{b} q_1$ is incorrect for $L$, and it is removed from $\delta_2$. Thus, we have the third conjecture $M_3$ with $\delta_3 (= \delta_2 - \{r'_{01}\})$.

In a similar way, we see that when negative counterexamples $bb$, $a$, $abb$, $bab$, $bbb$, $ba$ are provided in this order, the rule sets $\{r'_{00}, r_{02}\}$, $\{r_{22}\}$, $\{r'_{10}, r'_{11}\}$, $\{r_{20}, r_{21}\}$, $\{r'_{20}, r'_{21}\}$, $\{r'_{22}\}$ are, respectively, removed from the conjecture. As a result, we have a conjecture $M' = (\{q_0, q_1, q_2\}, \{a, b\}, \delta', q_0, \{q_2\})$ with

$\delta' = \{q_0 \xrightarrow{a} q_0,\ q_0 \xrightarrow{a} q_1,\ q_0 \xrightarrow{b} q_2,\ q_1 \xrightarrow{a} q_0,\ q_1 \xrightarrow{a} q_1,\ q_1 \xrightarrow{b} q_2\}.$

If we transform this NFA $M'$ into an equivalent minimum DFA $M = (\{p_0, p_1\}, \{a, b\}, \delta, p_0, \{p_1\})$ with $\delta = \{p_0 \xrightarrow{a} p_0,\ p_0 \xrightarrow{b} p_1\}$, it is easy to see that $M$ accepts the target $L$. Thus, the above procedure works as a learning algorithm that identifies an NFA accepting a target $L$ from MAT.

This chapter is organized as follows. After providing basic definitions in Section 2, we formalize the above idea in Section 3 to show that the class of regular languages is polynomial-time learnable from MAT using NFAs. A learning algorithm LA for NFAs is first described, and then the worst-case analysis of the time complexity of the proposed algorithm, as well as its correctness, is given. The main result provides an alternative efficient MAT learning algorithm for regular languages.


Section 4 deals with related topics: a comparative analysis of the two algorithms (i.e., LA and Angluin's) for learning regular languages is given in Section 4.1; Section 4.2 presents a corollary of the main result claiming that there exists a subclass of NFAs which is polynomial-time learnable from MAT, in contrast to the fact that this seems not to be the case for the whole class of NFAs; and a practical variant of LA is briefly discussed in Section 4.3.

2 Preliminaries

2.1 Definitions and Notation

We assume the reader to be familiar with the rudiments of formal language theory (see, e.g., Harrison 1978 or Salomaa 1973). For a given finite alphabet $\Sigma$, the set of all strings of finite length (including zero) is denoted by $\Sigma^*$. (The empty string is denoted by $\lambda$.) $lg(w)$ denotes the length of a string $w$. $\Sigma^+$ denotes $\Sigma^* - \{\lambda\}$. A language $L$ over $\Sigma$ is a subset of $\Sigma^*$. For any strings $x, y \in \Sigma^*$ and any language $L$ over $\Sigma$, let $x \backslash L = \{y \mid xy \in L\}$.

A non-deterministic finite automaton (NFA) is denoted by $M = (Q, \Sigma, \delta, p_0, F)$, where $Q$ and $\Sigma$ are finite sets of states and terminals, respectively, $p_0$ is the initial state in $Q$, $F (\subseteq Q)$ is a set of final states, and $\delta$ is a finite set of transition rules of the form $p \xrightarrow{a} q$, where $p, q$ are states and $a$ is a terminal from $\Sigma$. $M$ is deterministic iff for each $a \in \Sigma$ and $p \in Q$ there exists at most one $q \in Q$ such that $p \xrightarrow{a} q$ is in $\delta$. For each $p, p', q \in Q$ and $a \in \Sigma$, $w \in \Sigma^*$, define $p \xrightarrow{\lambda} p$, and $p \xrightarrow{wa} q$ iff $p \xrightarrow{w} p'$ and $p' \xrightarrow{a} q$. For each $p \in Q$, let $L(p) = \{w \in \Sigma^* \mid p \xrightarrow{w} q,\ q \in F\}$. In particular, $L(p_0)$, equivalently denoted by $L(M)$, is called the language accepted by $M$. Two NFAs $M$ and $M'$ are equivalent iff $L(M) = L(M')$ holds. By $|Q|$ (the cardinality of $Q$) we define the size of an NFA $M$, denoted by $size(M)$. $M$ is minimum iff $size(M)$ is minimum, that is, for any NFA $M'$ that is equivalent to $M$, $size(M) \le size(M')$ holds. A language $L$ is regular if there exists an NFA $M$ such that $L = L(M)$. Since we are concerned with the learning problem for regular languages, we may, without loss of generality, restrict our consideration to $\lambda$-free regular languages.
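As a working counterpart to these definitions, here is a small sketch (an illustration added in editing, not the chapter's notation) that encodes an NFA $M = (Q, \Sigma, \delta, p_0, F)$ with $\delta$ as a set of triples $(p, a, q)$ and decides membership $w \in L(p)$ by simulating sets of reachable states:

from dataclasses import dataclass

@dataclass
class NFA:
    states: set     # Q
    alphabet: set   # Sigma
    delta: set      # transition rules p -a-> q, stored as triples (p, a, q)
    start: str      # p0
    finals: set     # F

    def accepts_from(self, p, w):
        # decide whether w is in L(p)
        current = {p}
        for a in w:
            current = {q for (s, b, q) in self.delta if s in current and b == a}
        return bool(current & self.finals)

    def accepts(self, w):
        # w in L(M) = L(p0)?
        return self.accepts_from(self.start, w)

For instance, the two-state DFA with $\delta = \{p_0 \xrightarrow{a} p_0,\ p_0 \xrightarrow{b} p_1\}$ and $F = \{p_1\}$ accepts exactly the language $a^* b$ used in the running example of Section 1.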

2.2 MAT Learning

Let $L$ be a target language to be learned over a fixed alphabet $\Sigma$. We assume the following types of queries in the learning process. A membership query proposes a string $x \in \Sigma^*$ and asks whether $x \in L$ or not; the answer is either yes or no. An equivalence query proposes an NFA $M$ and asks whether $L = L(M)$ or not; the answer is yes or no, and in the latter case it comes together with a counterexample $w$ in the symmetric difference of $L$ and $L(M)$. A counterexample $w$ is positive if it is in $L - L(M)$, and negative otherwise.


The learning protocol consisting of membership queries and equivalence queries is called a minimally adequate teacher (MAT). The purpose of the learning algorithm is to find an NFA $M = (Q, \Sigma, \delta, p_0, F)$ such that $L = L(M)$ with the help of the minimally adequate teacher.
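Since the teacher is treated abstractly here, a simple way to experiment is to simulate the MAT from a membership predicate, answering equivalence queries approximately by testing all strings up to a cutoff length. The sketch below does exactly that (the class name, the cutoff, and the brute-force equivalence test are all assumptions of this sketch; a genuine teacher would decide equivalence exactly):

from itertools import product

class Teacher:
    # A minimally adequate teacher simulated from a membership predicate.
    def __init__(self, in_target, alphabet, max_len=8):
        self.in_target = in_target    # membership oracle for L
        self.alphabet = sorted(alphabet)
        self.max_len = max_len        # cutoff for the approximate equivalence test

    def member(self, w):
        return self.in_target(w)

    def equivalent(self, accepts):
        # return (True, None), or (False, w) with w a shortest counterexample found
        for n in range(self.max_len + 1):
            for tup in product(self.alphabet, repeat=n):
                w = ''.join(tup)
                if self.in_target(w) != accepts(w):
                    return False, w
        return True, None

# example target: L = { a^m b | m >= 0 }
teacher = Teacher(lambda w: w.endswith('b') and set(w[:-1]) <= {'a'}, {'a', 'b'})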

3 Main Results

3.1 Learning NFAs

Throughout this section, for a target regular language $L$, let $M_* = (Q_*, \Sigma, \delta_*, \tilde{p}_0, F_*)$ be a minimum DFA such that $L = L(M_*)$.

3.1.1 Introducing new states

Given a positive counterexample $w$ of $L$, let $Q(w)$ be a set of new states consisting of all the prefixes of $w$, that is, $Q(w) = \{[x] \mid w = xy \text{ for some } y \in \Sigma^*\}$. The following lemma obviously holds.

Lemma 1. Let $w$ be in $L = L(M_*)$. Then, for any state $\tilde{p} \in Q_*$ that appears in the transition sequence $\tilde{p}_0 \xrightarrow{w} \tilde{q}$ (for some $\tilde{q} \in F_*$), there is an $[x]$ in $Q(w)$ such that $L(\tilde{p}) = x \backslash L$. □

3.1.2 Constructing new candidate rules

Suppose that we have a conjectured NFA $M = (Q, \Sigma, \delta, p_0, F)$ at the current stage of the learning process, where $p_0 = [\lambda]$. From the set of new states $Q(w)$ produced above and the current set of states $Q$, we construct the set of new candidate transition rules $\delta_{new}$ as follows:

$\delta_{new} = \{p \xrightarrow{a} q \mid p, q \in Q \cup Q(w),\ a \in \Sigma$, and at least one of $p$ and $q$ is in $Q(w)\}$.

(As seen below, we set $M = (Q \cup Q(w), \Sigma, \delta \cup \delta_{new}, p_0, F \cup \{[w]\})$ to obtain the new conjectured NFA for the next stage.)
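Both steps have a direct rendering in code; the following sketch (names are assumptions of this edit) names each state by its prefix string itself, so $[x]$ is represented by $x$:

def new_states(w):
    # Q(w): one state [x] for each prefix x of w, including the empty prefix
    return {w[:i] for i in range(len(w) + 1)}

def new_rules(Q, Qw, alphabet):
    # delta_new: all rules p -a-> q over Q union Q(w) with p or q a new state
    allq = Q | Qw
    return {(p, a, q) for p in allq for a in alphabet for q in allq
            if p in Qw or q in Qw}

# for w1 = 'ab': states {'', 'a', 'ab'} and the 18 candidate rules of Section 1
assert len(new_rules(set(), new_states('ab'), {'a', 'b'})) == 18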

3.1.3 Diagnosing the set of transition rules $\delta$

Let $M = (Q, \Sigma, \delta, p_0, F)$ be a conjectured NFA, where $p_0 = [\lambda]$. Let $r$ be a transition rule $p \xrightarrow{a} q$ in $\delta$, where $p = [x]$, $q = [y]$. Then $r$ is incorrect for $L$ iff there exists $w \in \Sigma^*$ such that $yw \in L$ and $xaw \notin L$. A rule is correct for $L$ iff it is not incorrect for $L$. Now, given a negative counterexample $w'$, the algorithm has to prevent the conjectured NFA from accepting $w'$ by removing wrong rules. In order to determine such wrong rules, the algorithm calls the diagnosing procedure whenever a negative counterexample is provided. Let $w'$ be in $\Sigma^+$. A transition sequence $p_0 \xrightarrow{w'} q$ (for some $q \in F$) in $M$ is denoted by $Trans(M, p_0, w')$.

The diagnosing algorithm is a modification of Shapiro's contradiction backtracing algorithm (Shapiro 1981). Given a transition sequence $Trans(M, p_0, w')$ as input, this procedure outputs a transition rule $r$ of $M$ which is incorrect for $L$, where $w' \in L(M) - L$. Let $Trans(M, p_0, w')\colon p_0 \xrightarrow{a} p \xrightarrow{w} q\ (\in F)$, where $w' = aw$ ($a \in \Sigma$, $w \in \Sigma^*$) and $p = [y]$.

procedure diag($Trans(M, p_0, w')$)
    if $w' = a \in \Sigma$, then output $r\colon p_0 \xrightarrow{a} q$ and halt
    else make a membership query "$yw \in L$?";
        if the answer is no then call diag($Trans(M, p, w)$)
        else output $r\colon p_0 \xrightarrow{a} p$ and halt
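The recursion translates almost verbatim into code. In the sketch below (a rendering under the conventions of the earlier sketches, not the chapter's own program), a transition sequence is a list of rule triples $(p, a, q)$, a state $[y]$ is identified with its naming string $y$, and member is the membership oracle:

def diag(trans, member):
    # Contradiction backtracing: given an accepting transition sequence
    # trans = [(p0, a1, p1), (p1, a2, p2), ...] for some w' in L(M) - L,
    # return one rule of the sequence that is incorrect for L.
    (p, a, q), rest = trans[0], trans[1:]
    if not rest:                      # w' is a single terminal: output p -a-> q
        return (p, a, q)
    w = ''.join(r[1] for r in rest)   # the remaining input after the first step
    if not member(q + w):             # membership query: yw in L? (q names y)
        return diag(rest, member)     # the fault lies further down the sequence
    return (p, a, q)                  # yw in L but xaw not in L: output p -a-> q

On the running example, diag([('', 'b', 'a'), ('a', 'a', 'ab')], member) asks "$aa \in L$?" and returns the rule $r_{12}$, exactly as in Section 1.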

We are now ready to prove the correctness of the diagnosing algorithm.

Lemma 2. Given a negative counterexample $w'$ and a conjectured NFA $M$, the diagnosing algorithm always halts and outputs a rule incorrect for $L$.

Proof. It is clear that, since the procedure diag is recursively called with a proper suffix of the initial input string $w'$ in $Trans(M, p_0, w')$, the algorithm always halts and outputs some rule in $\delta$. Let $p_0 \xrightarrow{u} p' \xrightarrow{a} p \xrightarrow{w} q\ (\in F)$, where $w' = uaw$ and $p_0 = [\lambda]$, $p' = [x]$, $p = [y]$, $q = [z]$. By definition, we note that $z \in L$. If $w'$ is a terminal $a$ in $\Sigma$, then the rule $r\colon p_0 \xrightarrow{a} q$ is returned, where $u = w = \lambda$ and $p' = p_0$, $p = q$. Since $z \in L$ and $w' \notin L$, the rule $r$ is incorrect for $L$. Assume that a rule $r\colon p' \xrightarrow{a} p$ is returned by this algorithm. Then, since this is the first time the answer to a membership query is yes, at this moment we have that $yw \in L$. Further, from the property of the procedure diag, it must hold that $xaw \notin L$. This implies that the rule $r$ is incorrect for $L$. □

3.1.4 Learning algorithm LA

The following is a learning algorithm LA for NFAs:

Input: a regular language $L$ over a fixed $\Sigma$.
Output: an NFA $M$ such that $L = L(M)$.
Procedure:
    initialize $M = (\{p_0\}, \Sigma, \emptyset, p_0, \emptyset)$, where $p_0 = [\lambda]$;
    repeat
        make an equivalence query for the current $M = (Q, \Sigma, \delta, p_0, F)$;
        if the answer is yes then output $M$ and halt
        else if the answer is a positive counterexample $w$
            then introduce the set of new states $Q(w)$ from $w$;
                 $Q := Q \cup Q(w)$; construct the set of new rules $\delta_{new}$;
                 $\delta := \delta \cup \delta_{new}$; $F := F \cup \{[w]\}$
            else (the answer is a negative counterexample $w'$)
                 parse $w'$ via the conjectured NFA $M$ to obtain the transition sequence $Trans(M, p_0, w')$;
                 call diag($Trans(M, p_0, w')$) to find an incorrect rule $r$;
                 $\delta := \delta - \{r\}$

A flowchart diagram for LA is given in Figure 1.3.
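Assembled from the previous sketches (new_states, new_rules, diag, and the hypothetical Teacher of Section 2.2), one possible rendering of LA reads as follows; here $Trans(M, p_0, w')$ is obtained by a naive depth-first search, whereas the chapter only assumes some polynomial-time parsing procedure:

def find_trans(rules, finals, w, state=''):
    # depth-first search for one accepting transition sequence for w
    if not w:
        return [] if state in finals else None
    for (p, a, q) in rules:
        if p == state and a == w[0]:
            tail = find_trans(rules, finals, w[1:], q)
            if tail is not None:
                return [(p, a, q)] + tail
    return None

def LA(teacher, alphabet):
    Q, rules, finals = {''}, set(), set()        # M0 accepts the empty set
    def accepts(w):
        return find_trans(rules, finals, w) is not None
    while True:
        ok, w = teacher.equivalent(accepts)      # equivalence query
        if ok:
            return Q, rules, finals              # output M and halt
        if teacher.member(w):                    # positive counterexample
            Qw = new_states(w)
            rules |= new_rules(Q, Qw, alphabet)
            Q |= Qw
            finals.add(w)                        # F := F U {[w]}
        else:                                    # negative counterexample
            rules.discard(diag(find_trans(rules, finals, w),
                               teacher.member))  # remove one incorrect rule

With the Teacher instantiated for $L = \{a^m b \mid m \ge 0\}$ as in Section 2.2, this loop converges to an NFA that agrees with $L$ on all strings up to the teacher's cutoff length.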

3.2 The Correctness and Time Analysis of LA

3.2.1 Correctness

For states $p = [x]$ in $Q$ and $\tilde{p} \in Q_*$, we say that $p$ is well-corresponding to $\tilde{p}$ iff $x \backslash L = L(\tilde{p})$ holds. A state $p$ is non-corresponding to $M_*$ iff there is no state $\tilde{p} \in Q_*$ to which $p$ is well-corresponding. A transition rule $p \xrightarrow{a} q$ in $\delta$ is well-corresponding to $\tilde{p} \xrightarrow{a} \tilde{q}$ in $\delta_*$ iff $p$ and $q$ are well-corresponding to $\tilde{p}$ and $\tilde{q}$, respectively. It is clear that every rule well-corresponding to a rule of $M_*$ is correct for $L$. We show that if a string $w'$ which is not in $L$ is accepted by the conjectured NFA $M$, then there exists at least one incorrect rule in $\delta$; that is, if $\delta$ has no incorrect rule for $L$, then $M$ accepts no string $w'$ such that $w' \notin L$.

Lemma 3. Let $M = (Q, \Sigma, \delta, p_0, F)$ be a conjectured NFA and $L$ be a target language. If all of the rules in $\delta$ are correct for $L$, then for all $p = [x] \in Q$, $L(p) \subseteq x \backslash L$ holds.

Proof. Let $w$ be a string such that $w \in L(p)$. By induction on the length $i$ of the transition sequence for $w$, we show that $w$ is also in $x \backslash L$. Suppose $i = 1$, that is, $p \xrightarrow{a} q$ for some $q \in F$. Let $q = [z]$; then $z \in L$. Suppose further that $xa \notin L$; then $p \xrightarrow{a} q$ is incorrect for $L$ (take the witness string $\lambda$ in the definition of incorrectness), contradicting the assumption. Therefore, $a \in x \backslash L$. Next, suppose that the claim holds for all $w'$ such that $lg(w') < lg(w)$. Let $p \xrightarrow{a} p' (= [x']) \xrightarrow{w'} q$ and $q \in F$, where $w = aw'$. Then, by the induction hypothesis, we have that $w' \in x' \backslash L$. Suppose $w = aw'$ is not a string in $x \backslash L$. Then, since $w' \in x' \backslash L$, we have that the rule $p \xrightarrow{a} p'$ is incorrect for $L$. This contradicts the assumption, and therefore the lemma holds. □

Corollary 4. If all of the rules in $\delta$ are correct for $L$, then $L(M) \subseteq L$. □

Let $M_* = (Q_*, \Sigma, \delta_*, \tilde{p}_0, F_*)$ be a minimum DFA such that $L = L(M_*)$.

Lemma 5. The number of positive counterexamples needed to identify a correct NFA by LA is at most $|Q_*|$ (the cardinality of the state set of $M_*$).

[Fig. 1.3. Flowchart Diagram for LA.]


Proof. Let $M = (Q, \Sigma, \delta, p_0, F)$ be a conjectured NFA from LA, and suppose a positive counterexample $w$ is given. Then we claim that at least one new state $p$ well-corresponding to some $\tilde{p}$ in $Q_*$ is introduced. For $w \in L$, let $Q_*(w)$ be the set consisting of all the states in $Q_*$ used in the transition sequence of $M_*$ for accepting $w$. By the nature of LA, if a positive counterexample $w$ is given, then there exists at least one state $p$ not contained in $Q$ but needed to accept $w$. By Lemma 1, $Q(w)$ contains all states well-corresponding to the states in $Q_*(w)$ that appear in the transition sequence $\tilde{p}_0 \xrightarrow{w} \tilde{q}\ (\in F_*)$. Further, for each new state $p$ in $Q(w)$, all the rules containing $p$ are added to the rule set $\delta$, and only incorrect rules in $\delta$ are removed by the diagnosing procedure. To sum up, whenever a positive counterexample $w$ is given, there exists at least one state $p$ in $Q(w) - Q$ that is necessary for accepting $w$ and is well-corresponding to some $\tilde{p}$ in $Q_*$. This justifies the claim and, hence, completes the proof. □

Note that this lemma implies that after receiving at most $|Q_*|$ positive counterexamples, $Q$ includes a sufficient number of states to accept the target language $L$. When $|Q_*|$ positive counterexamples have been given, the set of transition rules $\delta$ includes all the rules that are well-corresponding to the ones in $\delta_*$. In other words, any string in $L$ is accepted by the conjectured NFA $M$. Thus, at that time it holds that $L \subseteq L(M)$, and hence no more positive counterexamples are required or provided. By Lemma 2, each time a negative counterexample is given, one incorrect rule is determined and removed from the $\delta$ of the current conjecture $M$. Further, we know that no correct rule is removed at any stage. Therefore, the number of required negative counterexamples is at most the maximum number of rules of a conjectured NFA. By Corollary 4, if all incorrect rules are removed, the resulting conjectured NFA $M$ accepts no string which is not in $L$, that is, $L(M) \subseteq L$. From the facts described above, we conclude that LA always converges and outputs a correct NFA. Thus, we obtain the main theorem.

Theorem 6. For any regular language $L$, the learning algorithm LA eventually terminates and outputs an NFA $M$ accepting $L$. □

3.2.2 Time Analysis

We note that, for a positive counterexample $w$, the number of states newly introduced from $w$ in LA is obviously bounded by $lg(w)$, i.e., it holds that $|Q(w)| \le lg(w)$. Let $\ell$ be the maximum length of any counterexample provided during the learning process. Further, let $n (= |Q_*|)$ be the number of states of a minimum DFA accepting a target $L$.


Lemma 7. The total number $|\delta_{total}|$ of all rules introduced in the learning algorithm is bounded by $|\Sigma| n^2 \ell^2$.

Proof. Let $|Q_{max}|$ be the maximum number of states of any conjectured NFA. Let $\{w_1, \ldots, w_t\}$ be the set of positive counterexamples provided to LA in the entire process of learning. (Note that, by Lemma 5, $t$ is at most $|Q_*|$.) Then, from the above observation, $|Q_{max}| \le \sum_{i=1}^{t} lg(w_i) \le |Q_*| \ell$ is obtained. Further, from the manner of constructing $\delta$, $|\delta_{total}| \le |\Sigma| |Q_{max}|^2 \le |\Sigma| |Q_*|^2 \ell^2 = |\Sigma| n^2 \ell^2$ is obtained. □

Now suppose a negative counterexample $w'$ is given. Then the learning algorithm LA constructs the transition sequence $Trans(M, p_0, w')$ by parsing $w'$ via the conjectured NFA $M = (Q, \Sigma, \delta, p_0, F)$ at that time. Since there exists an algorithm that constructs a transition sequence for $w'$ in time proportional to $|\delta|\, lg(w')^2$, each parsing procedure requires time at most $|\delta_{total}| \ell^2$. From these observations, the next lemma follows.

Lemma 8. The total time complexity of LA is bounded by $O(|\Sigma|^2 n^4 \ell^6)$, where $n$ is the number of states of a minimum DFA for a target language, and $\ell$ is the maximum length of any counterexample provided.

Proof. It is obvious that the total time complexity of LA is dominated by the cost of parsing negative counterexamples to obtain the transition sequences for them. Each time a negative counterexample is provided, the number of transition rules in $\delta$ of the resulting conjecture from LA is reduced by exactly one. Further, by Lemma 7, the number of transition rules of any conjectured NFA $M$ is at most $k$, where $k = |\Sigma| n^2 \ell^2$. Hence, the total time required for parsing is bounded by

$k \ell^2 + (k-1) \ell^2 + \cdots + \ell^2 = \frac{1}{2} k (k+1) \ell^2.$ □

Thus, we have the following theorem.

Theorem 9. The total running time of LA is bounded by a polynomial in $\ell$, the maximum length of any counterexample provided during the learning process, and $n$, the number of states of a minimum DFA for a target language. □

4 Discussions

4.1 A Comparative Analysis

We now compare the main results of this chapter with those of Angluin (1987b). The time complexity of her algorithm is dominated by the task of constructing the observation table, whose size is at most $O(n^2 \ell^2 + n^3 \ell)$, and the algorithm makes at most $n$ different guesses before terminating with a correct minimum DFA for the target, where $n$ is the number of states of the minimum DFA and $\ell$ is the maximum length of any counterexample. As a result, the total time complexity is bounded by $O(n^3 \ell^2 + n^4 \ell)$. (See Table 1.1.) In the worst-case analysis, then, Angluin's algorithm has a great advantage over the algorithm LA presented in this chapter.

(1) Although Table 1.1 shows that LA does not seem better than Angluin's algorithm in the worst case, we do not know whether or not the worst case can really occur for LA. Moreover, we should like to call the reader's attention to the fact that there is a subclass of regular languages for which LA in the worst case runs faster than the other algorithm in its best case. Let $L = \{w \in \{a,b\}^* \mid lg(w) \ge 2\}\ (= \{a,b\}^* - \{\lambda, a, b\})$ be a target language to be learned. As the first (positive) counterexample, suppose $w_1 = ab$ is given. Then LA produces as its first guess $M_1$, pictured in Figure 1.4. At this moment, since there are only two negative counterexamples for $M_1$, namely $a$ and $b$, suppose $w_2 = a$ is taken as the second counterexample. Then, after diagnosing the transition sequence for $w_2$, the second guess of LA will be $M_2$. Further, for the third counterexample $w_3 = b$, LA eventually outputs $M_3$, which is correct for $L$, and terminates. (Note that this termination is independent of the choice of counterexamples, i.e., the other choice of $w_2 = b$ and $w_3 = a$ leads to the same correct NFA $M_3$.) It is important to note that no membership query is actually needed in this case, and that the choice of $w_1$ is completely arbitrary. Thus, we observe that only three counterexamples are ever provided before successful termination, even in the worst case.

On the other hand, Angluin's algorithm works as follows. As the initial guess, using three membership queries, it first produces $M'_1$, accepting the empty set, shown in Figure 1.4. Then, given a positive counterexample $w_1$, the algorithm eventually produces a correct DFA $M'_2$ after 16 membership queries and terminates, at its best. Thus, for the target language $L$, LA clearly terminates faster than Angluin's algorithm, provided that each operation involved takes equally one unit of time. This shows that the total time performance of the two algorithms strongly depends upon what class of regular languages is targeted.

(2) When we look at these two algorithms from the human-machine interface point of view, the most important factor is the response time between conjectures and, in particular, the time complexity in which the teacher (human) must get involved.


Table 1.1. Performance Comparison in the Worst-case Analysis

                                   DFA learning (Angluin)        NFA learning (LA)
Number of guesses                  $O(n)$                        $O(n^2 \ell^2)$
Total time                         $O(n^3 \ell^2 + n^4 \ell)$    $O(n^4 \ell^6)$
Membership queries per revision    $O(n \ell)$                   $\ell$

That is, a comparison of the number of membership queries required per revision of a conjecture shows that our algorithm LA is superior to Angluin's in this respect. Although the total number of guesses of LA is greater than that of Angluin's algorithm in the worst case, taking the empirical number of guesses into consideration, we believe that the former may provide a better human-machine interface for (at least some types of) application systems than the latter. Thus, it is an interesting open problem to investigate the empirical performance of LA relative to Angluin's algorithm; such a quantitative analysis would give a clearer and more practical comparison of the two algorithms.

4.2 A Subclass of NFAs

As strongly suggested by a recent result (Angluin and Kharitonov 1991), the whole class of NFAs seems not to be learnable in polynomial time from MAT. This leads us to consider a subclass of NFAs for which a positive learnability result can be obtained. We say that an NFA $M$ is polynomially deterministic if there exists a DFA $M'$ equivalent to $M$ such that the number of states of $M'$ is bounded by a polynomial in the number of states of $M$. It is trivially clear that any DFA is polynomially deterministic, and there is an NFA which is not polynomially deterministic. (For example, $L_n = \{a,b\}^* a \{a,b\}^n$ is accepted by an NFA with $n+2$ states but by no DFA with fewer than $2^{n+1}$ states.) Let $\mathrm{NFA}_{poly}$ be the class of polynomially deterministic NFAs. Then we have that $\mathrm{DFA} \subseteq \mathrm{NFA}_{poly} \subsetneq \mathrm{NFA}$, where $\mathrm{NFA}$ ($\mathrm{DFA}$) denotes the class of NFAs (DFAs). From the main result, we immediately obtain the following proposition.

Proposition 10. The class $\mathrm{NFA}_{poly}$ is learnable in polynomial time from MAT. □
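The exponential gap claimed for $L_n$ can be checked mechanically. The sketch below (an illustration, not from the chapter) builds the $(n+2)$-state NFA for $L_n = \{a,b\}^* a \{a,b\}^n$ and counts the subsets reachable in the textbook subset construction, which matches the $2^{n+1}$ bound:

def subset_count(n):
    # NFA for {a,b}^* a {a,b}^n: state 0 loops on a,b and moves to 1 on a;
    # state i (1 <= i <= n) moves to i+1 on both letters; n+1 is final.
    def step(states, ch):
        nxt = set()
        for s in states:
            if s == 0:
                nxt.add(0)
                if ch == 'a':
                    nxt.add(1)
            elif s <= n:
                nxt.add(s + 1)
        return frozenset(nxt)
    seen, todo = {frozenset({0})}, [frozenset({0})]
    while todo:                          # explore all reachable subsets
        S = todo.pop()
        for ch in 'ab':
            T = step(S, ch)
            if T not in seen:
                seen.add(T)
                todo.append(T)
    return len(seen)

assert subset_count(3) == 2 ** 4         # 2^(n+1) reachable subsets for n = 3

(These reachable subsets are in fact pairwise inequivalent, which gives the lower bound on the size of any equivalent DFA.)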

It remains open whether or not the polynomial-time learnability from MAT can be extended to the whole class $\mathrm{NFA}$, although it is strongly suggested that this is not the case.

4.3 A Practical Variant of LA

When running the algorithm LA in a practical situation, the feasibility of equivalence queries may be a problem.

[Fig. 1.4. Example Runs: the NFAs $M_1$, $M_2$, $M_3$ conjectured by LA, and the DFAs $M'_1$, $M'_2$ produced by Angluin's algorithm.]

Compared with membership queries, equivalence queries cost too much to carry out, and they are sometimes even computationally infeasible. In this respect, it would be very convenient if the equivalence queries could be replaced with some other device. Actually, the algorithm LA induces in a straightforward manner a modified version which may be viewed as a learning algorithm in the identification-in-the-limit setting (Gold 1967), where no equivalence queries are required. That is, we can turn LA into an algorithm LAm which learns a correct NFA in the limit from membership queries and a complete presentation (i.e., positive and negative examples) of a target language. Further, LAm learns any regular language $L$ consistently, conservatively, and responsively in the sense of Angluin (1980), and may be implemented to run in time polynomial in $n$ (the number of states of a minimum DFA for $L$) and $m$ (the total length of all examples provided so far). Note that Angluin's algorithm mentioned above also has a variation of this type. A flowchart diagram for LAm is given in Figure 1.5.

Proposition 11. The class of regular languages is learnable in the limit using membership queries by the algorithm LAm. Further, LAm may be implemented to run in time polynomial in $n$ (the number of states of a minimum DFA for a target language) and $m$ (the total length of all examples provided so far). □
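Under the same conventions as the LA sketch of Section 3.1.4, replacing the equivalence query by a stream of labeled examples yields one possible rendering of LAm; following the flowchart of Figure 1.5, removed rules are remembered in a set R so that later positive data cannot reintroduce them (the names and the generator interface are assumptions of this sketch, which reuses find_trans, new_states, new_rules, and diag from Section 3.1):

def LAm(examples, member, alphabet):
    # examples yields (w, is_positive) pairs of a complete presentation of L;
    # member answers membership queries.
    Q, rules, finals, R = {''}, set(), set(), set()
    for w, is_positive in examples:
        trans = find_trans(rules, finals, w)
        if is_positive and trans is None:        # positive example rejected:
            Qw = new_states(w)                   # grow the conjecture
            rules |= new_rules(Q, Qw, alphabet) - R
            Q |= Qw
            finals.add(w)
        elif not is_positive:
            while trans is not None:             # negative example accepted:
                r = diag(trans, member)          # prune, as in the Fig. 1.5 loop,
                rules.discard(r)                 # until w is rejected
                R.add(r)
                trans = find_trans(rules, finals, w)
        yield Q, rules, finals                   # current conjecture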

Thus, the algorithm LAm provides a reasonably practical, efficient algorithm for learning the class of regular languages. In related work, Ishizaka (1989) proposes an algorithm for learning regular languages based on the model inference algorithm. His algorithm learns a correct DFA, in a logic program formulation, in the limit from membership queries and a complete presentation, and may be implemented to run in time polynomial in $n$, $\ell$, and $N$ (the total number of examples provided so far). Thus, this algorithm is close to LAm but differs in that it learns DFAs in a logical formulation.

An interesting problem to be investigated is how far we can go along the line of research proposed here within the MAT learning paradigm. In fact, we have already obtained extensions to some superclasses of the regular languages (Yokomori 1992; Shirakawa and Yokomori 1993).

Acknowledgements

The author would like to thank a referee for many constructive suggestions which greatly improved the quality as well as the readability of this chapter. This work was supported in part by Grants-in-Aid for Scientific Research No. 04229105 from the Ministry of Education, Science and Culture, Japan.

[Fig. 1.5. Flowchart Diagram for LAm.]


Bibliography


1. Angluin, D. (1980). Inductive inference of formal languages from positive data. Information and Control, 45, 117-135.
2. Angluin, D. (1987a). Learning k-bounded context-free grammars. Research Report 557, Dept. of Computer Science, Yale University.
3. Angluin, D. (1987b). Learning regular sets from queries and counterexamples. Information and Computation, 75, 87-106.
4. Angluin, D. and Kharitonov, M. (1991). When won't membership queries help? In Proceedings of the 23rd ACM Symposium on Theory of Computing, 444-454.
5. Berman, P. and Roos, R. (1987). Learning one-counter languages in polynomial time. In Proceedings of the 28th IEEE Symposium on Foundations of Computer Science, 61-67.
6. Gold, E.M. (1967). Language identification in the limit. Information and Control, 10, 447-474.
7. Gold, E.M. (1978). Complexity of automaton identification from given data. Information and Control, 37, 302-320.
8. Harrison, M.A. (1978). Introduction to Formal Language Theory. Addison-Wesley, Reading, MA.
9. Ishizaka, H. (1989). Inductive inference of regular languages based on model inference. International Journal of Computer Mathematics, 27, 67-83.
10. Ishizaka, H. (1990). Polynomial time learnability of simple deterministic languages. Machine Learning, 5, 151-164.
11. Salomaa, A. (1973). Formal Languages. Academic Press, New York, NY.
12. Shapiro, E. (1981). Inductive inference of theories from facts. Research Report 192, Dept. of Computer Science, Yale University.
13. Shirakawa, H. and Yokomori, T. (1993). Polynomial-time MAT learning of c-deterministic context-free grammars. Transactions of the Information Processing Society of Japan, 34, 380-390.
14. Yokomori, T. (1992). On learning systolic languages. In Proceedings of the 3rd Workshop on Algorithmic Learning Theory, 41-52.