Fundamenta Informaticae XX (2002) 1–30
1
IOS Press
Residual Finite State Automata François Denis LIF, UMR 6166 CNRS, Université de Provence, Marseille
Aurélien Lemay, Alain Terluttey GRAPPALIFL, Université de Lille I
Abstract. We define a new variety of Nondeterministic Finite Automata (NFA): a Residual Finite State Automaton (RFSA) is an NFA all the states of which define residual languages of the language L that it recognizes ; a residual language according to a word u is the set of words v such that uv is in L. We prove that every regular language is recognized by a unique (canonical) RFSA which has a minimal number of states and a maximal number of transitions. Canonical RFSAs are based on the notion of prime residual languages, i.e. that are not the union of other residual languages. We provide an algorithmic construction of the canonical RFSA from a given NFA. We study the size of canonical RFSAs and the complexity of our constructions.
1. Introduction Regular languages are among the most studied objects in formal language theory. Deterministic finite automata (DFA) and nondeterministic finite automata (NFA) are two basic types of representation of regular languages. DFAs have many good properties: most classical constructions are polynomial and there exists a unique minimal element for a given regular language. Moreover, the MyhillNerode theorem shows that the states of a DFA correspond to natural components of the language it recognizes: its residual languages or Brzozowski derivatives or left quotients (here, we shall use the first name). NFAs are a generalization of DFAs which lose most properties of DFAs but gain concision. For example, the minimal DFA which recognizes the language an , with = fa; bg, has 2n+1 states while a minimal equivalent NFA has only n + 2 states. However, there may exist several nonisomorphic equivalent minimal NFAs and states of NFAs may correspond to no natural component of the language they recognize.
[email protected] y {lemay, terlutte}@lifl.fr
2
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
Both type of representations properties, concision and the fact that they are based on natural components of the associated language, can be necessary within certain application field such as Grammatical Inference. The main goal of Grammatical Inference is to identify some target language from examples of this language, i.e. words together with a piece of information indicating whether it belongs to the language. General machine learning properties show that targets with short representations should be identified from fewer examples than others. On the other hand, inference algorithms try to detect properties of the target language from properties of some of its examples in order to build some representation of it. However, it has been shown that regular languages can be polynomially identified from given data using DFAs representations [7, 8] but that they cannot be identified in the same conditions using NFAs representations. In consequence, languages as simple as an cannot be infered efficiently by inference algorithms using DFAs representations and NFAs representations cannot be used. Hence, it is a natural goal to look for intermediary representations having both kind of properties. In this paper, we consider NFAs all the states of which define residual languages of the language it recognizes. We call Residual Finite State Automata (RFSA) such automata. RFSAs have been introduced in [5].Clearly, all DFAs are RFSAs but the converse is false. We show that we can naturally associate with every regular language L an RFSA which has a minimal number of states and which we call the canonical RFSA of L: each of its states is associated with a prime residual language, i.e. a residual language which is not the union of other residual languages. We provide an algorithmic construction of the canonical RFSA equivalent to a given NFA which stems from the classical subset construction used to build the minimal DFA. We give some results on the size of RFSAs: for example, there are canonical RFSAs exponentially larger (resp. smaller) than equivalent minimal NFAs (resp. DFAs). Then, we study RFSAs over a oneletter alphabet and we show that the gap between the sizes of minimal DFAs and canonical RFSAs is quadratic in the worst case: remind that the gap between the sizes of minimal DFAs and NFAs can be superpolynomial. Finally, we give some complexity results which show that natural constructions and decision problems are PSPACEcomplete. In Section 2, we recall classical definitions and notations about regular languages and automata. We define RFSAs in Section 3 and we study their properties in Section 4. In particular, we introduce the notion of canonical RFSA. The construction of the canonical RFSA using the subset method is given in Section 5. In Section 6, we study some particular (and pathological) RFSAs. RFSAs over oneletter alphabet are studied in Section 7. Section 8 is devoted to the study of the complexity of decision and construction problems on RFSAs. We conclude in Section 9.
2. Preliminaries In this section, we recall some definitions on finite automata. For more information, we invite the reader to consult [9, 14].
2.1. Automata and languages Let be a finite alphabet, and let be the set of words on . We denote by " the empty word and by juj the length of a word u. For an integer n, we define n = fu 2 j juj = ng and n = fu 2 j juj ng. A language is a subset of . A nondeterministic finite automaton (NFA) is a quintuple A = h; Q; Q0 ; F; Æ i where Q is a finite set of states, Q0 Q is the set of initial states, F Q is the set of final states, Æ is the transition function
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
of the automaton defined from a subset of Q to function defined from a subset of 2Q to 2Q by
2Q .
3
We also denote by Æ the extended transition
Æ(fqg; ") = fqg, Æ(fqg; x) = Æ(q; x), where x 2 , Æ(Q0 ; u) = S Æ(fqg; u), where u 2 , 2
q Q0
Æ(fqg; ux) = Æ(Æ(q; u); x),
where x 2 and u 2 .
8 2 8 2 9 2 2 ( ) 2 ( ) i ( )\ =6 ;
Q, x , An NFA is deterministic (DFA) if Q0 contains only one element q0 and if q , q Æ Q0 ; w1 and Card Æ q; x . An NFA is trimmed if and only if q Q, w 1 , Æ q; w2 F w2 . A state q is reachable by the word u if q Æ Q0 ; u . is recognized by an NFA A ; Q; Q0 ; F; Æ if Æ Q0 ; u F and the language A word u Æ Q0 ; u F u . Let Q0 Q. We denote by LA;Q0 the language recognized by A is LA Æ Q0 ; v F v . So, LA LA;Q0 . When Q0 contains exactly one state q , we simply denote LA;Q0 by LA;q . We denote by Re the class of recognizable languages. It can be proved
( ( )) 1 9 2 ( )\ = 6 ; 2 = h = f 2 j ( ) \ 6= ;g f 2 j ( ) \ 6= ;g = ( )
8 2
that every recognizable language can be recognized by a DFA. There exists a unique minimal DFA that recognizes a given recognizable language (minimal with regard to the number of states and unique up to an isomorphism). Finally, the Kleene theorem [10] proves that the class of regular languages Reg ( ) is identical to Re ( ). The reversal uR of a word u is defined inductively by "R = " and (xv )R = v R x for x 2 and v 2 : The reversal LR of a language L is defined by LR = fuR 2 j u 2 Lg. The reversal AR of an NFA A = h; Q; Q0 ; F; Æ i is defined by AR = h; Q; F; Q0 ; Æ R i with q 2 Æ R (q 0 ; x) if and only if q 0 2 Æ (q; x). We have (LA )R = LAR .
2.2. Residual Languages Let L be a language over and let u 2 . The residual language of L with regard to u is defined by u 1 L = fv 2 j uv 2 Lg and we say that u is a characterizing word for u 1 L. The MyhillNerode theorem [11, 12] proves that the set of distinct residual languages of a language L is finite if and only if L is regular. Automata and residual languages are linked by the following properties. Let A = h; Q; Q0 ; F; Æ i be an NFA. For any state q 2 Q and any word u 2 , if q 2 Æ (Q0 ; u), then LA;q u 1 LA : Let A = h; Q; q0 ; F; Æ i be a trimmed DFA.
For every nonempty residual language u u 1 LA . For every state q
1L
A,
there exists a state q
2 Q, there exists a residual language u
1L
A
2
such that u
Q such that LA;q 1L
A
=
= LA;q .
Furthermore, if A is the minimal DFA, the correspondence between states of A and nonempty residual languages of LA is bijective.
4
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
3. Definition of Residual Finite State Automaton Definition 3.1. A Residual Finite State Automaton (RFSA) is an NFA A = h; Q; Q0 ; F; Æ i such that, for each state q 2 Q, LA;q is a residual language of LA . More formally, 8q 2 Q, 9u 2 such that LA;q
=u
1L : A
Trimmed DFAs are RFSAs. So, every regular language is recognized by a RFSA. Example 3.1. Let L = a where = fa; bg. This language is recognized by the following automata A1 , A2 and A3 (Figures 1, 2, 3):
A1 is an NFA which is neither a DFA, nor an RFSA. Languages associated with states are: a , LA ;q LA1 ;q0 , LA1 ;q2 " . As for every u in , we have uL L and so, 1 1 1 L u L, neither LA1 ;q1 nor LA1 ;q2 are residual languages.
=
=fg
A2 is the minimal DFA that recognizes L. A2 is also an RFSA. We have LA2 ;q0 a a LA2 ;q1 a 1 L, LA2 ;q2 " aa 1 L, LA2 ;q3 ab 1 L.
( )
=
= [ =
= [[f g = ( )
A3 is an RFSA. Indeed, we have LA3 ;q0 notice that A3 is not a DFA.
="
1 L, L A3 ;q1
=a
1 L, L A3 ;q2
= a = " 1 L, = a [ f"g =
= (ab)
1 L.
One can
Definition 3.2. Let A = h; Q; Q0 ; F; Æ i be an RFSA, and let q be a state of A. The word u is a characterizing word for q if LA;q = u 1 LA : The automaton A is consistent if each state q is reachable by a characterizing word for q . We say that A is strongly consistent if each state q is reachable by every characterizing word for q . Examples of not consistent RFSA and not strongly consistent RFSA are shown in Figure 4 and Figure 5. If, for every state q of an NFA A, there exists a word uq that only leads to q (that is, such that Æ (Q0 ; uq ) = fq g), then A is an RFSA since LA;q = uq 1 LA . However, next example shows that the converse property is false. Example 3.2. Let L = a b + b a and consider the automaton described in Figure 6. This automaton is a strongly consistent RFSA. But we can observe that there exists no word u such that Æ (Q0 ; u) = fq0 g.
4. Properties of Residual Finite State Automata 4.1. General Properties Definition 4.1. Let L be a regular language over and let u 2 . The residual language u 1 L is prime if it is not equal to the union of the residual languages it strictly contains: let Ru = fv 2 j v 1 L ( u 1 Lg, v 1 L ( u 1 L: u 1 L is prime if
[
2 u
v R
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
} a; b
a
q0
Figure 1.
q
a; b
q1
5
q
q2
A1 is an automaton recognizing a ; it is neither a DFA nor an RFSA.
b
q1
a
q0
a
b
q3
b
Figure 2.
a
a
~ q 2
b
A2 is the minimal DFA recognizing a.
}Y a; b
a
a
q0
a; b
j q1 Y
a; b a
j
q2
a; b
Figure 3.
A3 is the canonical RFSA recognizing a.
Figure 4. Lq0
6= "

q0
q1
a

q2
An RFSA which recognizes L = f"; ag. It is not consistent as the only word which reaches q0 is " but
1 L.

q0
a
6 q1
a,b

q2
Figure 5. An RFSA which is consistent but not strongly consistent.
6
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata

a
b
b
q q1
b
q0

a
a
q2
Figure 6. An RFSA recognizing the language a b + b a .
q q3
A residual language is composite if it is not prime. A state q of an RFSA A is prime (resp. composite) if the residual language LA;q is prime (resp. composite). Note that a prime residual language is not empty and that the set of distinct prime residual languages of a regular language is finite. Proposition 4.1. Let A = h; Q; Q0 ; F; Æ i be an RFSA. For each prime residual language u exists a state q 2 Æ (Q0 ; u) such that LA;q = u 1 LA .
1L
A,
there
Proof: As u 1 LA is prime, Æ (Q0 ; u) is not empty. Let Æ (Q0 ; u) = fq1 ; : : : ; qs g and let v1 ; : : : ; vs be words such that LA;qi = vi 1 LA for every 1 i s. We have u
As u
1L
A
1
LA
=
[s
i=1
is prime, there exists vk such that u
LA;qi
1L
A
=
[s
i=1
vi 1 L A :
= vk 1 LA = LA;qk .
ut
As a corollary, an RFSA A has at least as many states as the number of prime residual languages of LA .
4.2. Saturation operator We define a saturation operator which may add transitions and initial states to an NFA without modifying the language it recognizes. Definition 4.2. Let A = h; Q; Q0 ; F; Æ i be an NFA. The saturated of A is the automaton As = h; Q; Qs0 ; F; Æs i where Qs0 = fq 2 Q j LA;q LAg and Æs (q; x) = fq0 2 Q j xLA;q0 LA;q g for q 2 Q and x 2 . We say that an automaton A is saturated if A = As . Lemma 4.1. Let A1 and A2 be two NFAs over sharing the same set of states Q. If LA1 for every state q 2 Q, LA1 ;q = LA2 ;q , then As1 = As2 .
= LA2 and if
Proof: Let A1 = h; Q; Q0;1 ; F1 ; Æ1 i and A2 = h; Q; Q0;2 ; F2 ; Æ2 i. The state q is initial in As1 iff LA1 ;q LA1 , i.e. iff q is initial in As2 . In the same way, q 0 2 Æ1s (q; x) in As1 iff xLA1 ;q0 LA1 ;q , i.e. iff q 0 2 Æ2s (q; x) in As2 . Eventually, the state q is terminal in As1 iff " 2 LA1 ;q , i.e. iff q is terminal in As2 . ut
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
Proposition 4.2. Let A = h; Q; Q0 ; F; Æ i be an NFA and let As of A. For each q in Q, we have LA;q = LAs ;q .
7
= h; Q; Qs0 ; F; Æs i be the saturated
Proof: Clearly, LA;q LAs ;q as the saturated of an automaton is obtained by adding transitions and initial states. ut The converse inclusion can be proved by induction on the length of the words of LAs ;q . We can then deduce the following corollaries. Corollary 4.1. Let A be an NFA and As be its saturated. Then A and As recognize the same language and As = (As )s . Corollary 4.2. If A is an RFSA, then As is also an RFSA. It could have seemed better to define the saturation operator another way: in order to saturate an NFA, add initial states and transitions as long as this operation does not modify the language recognized by the automaton. Unfortunately, this procedure does not lead to a unique NFA in the general case: in the automaton defined in Figure 4, it is possible to add a transition labelled by a from q1 to q0 and from q0 to q2 without modifying the language but not simultaneously. However, we have the following lemma: Lemma 4.2. Let A be a consistent RFSA, let q; q 0 be two states of A and let x be a letter such that adding the transition (q; x; q 0 ) to A does not modify the language recognized by A. Then, xLA;q0 LA;q . Proof: Let v 2 LA;q0 and let u such that LA;q = u 1 LA and q is reachable by u. As adding the transition (q; x; q0 ) to A does not modify the language recognized by A, the word uxv 2 LA. Then, xv 2 u 1LA = LA;q . Therefore, xLA;q0 LA;q . ut Corollary 4.3. In order to compute the saturated of a consistent RFSA A, the following procedure can be applied: add initial states and transitions as long as the recognized language is not modified. Proof: A state q of A can become an initial state without changing the recognized language if and only if LA;q LA . The result directly comes from this remark and previous lemma. ut There are saturated RFSAs which are not consistent (see Figure 7). Moreover, there are consistent saturated RFSAs which are not strongly consistent (see Figure 8). However, we have the following result. Proposition 4.3. If A is a saturated RFSA and if q is a prime state of A, then for every word u such that LA;q = u 1 LA , q is reachable by u. Proof: If u = " and LA;q
=u
1L
A,
we haveLA;q
LA and as A is saturated, q is initial.
8
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
: 1 q z q1
a; b;
t
q2
x1 ; b t
q0
b;
q5
t
a; b
q3
t
x2 ; b
q4
b; d
a
b
q1w q7
q6
Figure 7. A saturated inconsistent RFSA: the state q3 defines the residual language of bb but it is not reachable by bb.
q0
1 q q1
x1 ; b a
x2 ; b
q2
q3
b; a a a b; d
1 q q4
q5
a a; b
b
q1
q7
q6
Figure 8. A saturated consistent RFSA which is not strongly consistent; aa and bb define the same residual language fa; bg but the state q5 is not reachable by bb.
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
9
Suppose now that LA;q = (ux) 1 LA , where x 2 . As LA;q is a prime residual language, by Proposition 4.1, there exists q 00 2 Æ (Q0 ; ux) such that LA;q00 = (ux) 1 LA . Let q 0 be such that q 0 2 Æ (Q0 ; u) and q 00 2 Æ (q 0 ; x). Let v 2 LA;q . As LA;q = LA;q00 , xv 2 LA;q0 . That is, xLA;q LA;q0 and as A is saturated, q 2 Æ (q 0 ; x). Therefore, q is reachable by ux in A. ut
4.3. Reduction operator We define a reduction operator which may delete states in an NFA without changing the language it recognizes. Definition 4.3. Let A = h; Q; Q0 ; F; Æ i be an NFA, and let q be a state of Q. We denote by R(q ) the set fq 0 2 Q n fq g j LA;q0 LA;q g. We say that q is erasable in A if LA;q = q0 2R(q) LA;q0 . If q is erasable, we define (A; q ) = h; Q0 ; Q00 ; F 0 ; Æ 0 i where:
S
Q0
= Qnfqg, Q00 = Q0 if q 62 Q0 , and Q00 = (Q0 n fq g) [ R(q ) otherwise, T Q0 , F0 = F for every q 0 2 Q0 and every x 2 , ( 0 Æ (q ; x) if q 62 Æ (q 0 ; x) 0 0 Æ (q ; x) = (Æ(q0 ; x) n fqg) [ R(q) otherwise.
If q is not erasable, we define (A; q ) = A.
Note that if A is a saturated NFA and if q is an erasable state in A, then (A; q ) is obtained by deleting q and its associated transitions from A ; no transitions are added. Definition 4.4. Let A be an NFA. If there is no erasable state in A, we say that A is reduced. Proposition 4.4. Let A be an NFA, q a state of A and A0 have LA;q0 = LA0 ;q0 . As a consequence, LA = LA0 .
= (A; q). For every state q0 2 Q n fqg, we
Proof: If q is not an erasable state, the proposition is straightforward. Suppose now that q is an erasable state. Let A = h; Q; Q0 ; F; Æ i and (A; q ) = A0 = h; Q0 ; Q00 ; F 0 ; Æ 0 i. We first prove by induction on n that, for every state q 6= q and every integer n,
\ n LA0;q \ n: For n = 0, if " 2 LA;q, then q 2 F and as q 6= q , q 2 F 0 and " 2 LA0 ;q. Let u = xu 2 LA;q where u 2 n, x 2 and let q1 2 Q such that q1 2 Æ(q; x) and Æ(fq1 g; u) \ F 6= ;. LA;q
10
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
2 Q0, we can apply the inductive hypothesis: u 2 LA0;q1 and as q1 2 Æ0 (q; x), we have
If q1
Otherwise, q1 = q and there exists a state q1 6= q such that LA;q1 LA;q and u 2 LA;q1 . Using 2 LA0;q1 and as q1 2 Æ0 (q; x), we have u 2 LA0;q. the inductive hypothesis again, we have u
u
2 LA0 ;q.
So, we have shown that for every state q 6= q , LA;q LA0 ;q. We prove now by induction that for every state q 6= q and every integer n, LA0 ;q
\ n LA;q \ n
For n = 0, if " 2 LA0 ;q, then q 2 F 0 F and then " 2 LA;q. 2 LA0;q where u 2 n, x 2 and let q1 2 Q0 such that q1 2 Æ0 (q; x) and Æ0 (fq1 g; u)\ Let u = xu 0 F 6= ;. We can apply the inductive hypothesis: u 2 LA;q1 . If q1 2 Æ(q; x), we directly have u 2 LA;q. q ; x). We also have u 2 LA;q. Otherwise, we have LA;q1 LA;q and q 2 Æ ( So, we have shown that for every state q 6= q , LA0 ;q = LA;q. We show that A and A0 recognize the same language by studying the two following cases.
62 Q0, we have LA = Sq02Q0 LA;q0 = Sq02Q0 LA0;q0 = Sq02Q00 LA0;q0 = LA0 . If q 2 Q0 , we have [ LA;q0 LA = q0 2Q0 [ = ( LA;q0 ) [ LA;q q0 2Q0 ;q0 6=q [ [ = ( LA;q0 ) [ ( LA;q) q0 2Q0 ;q0 6=q q2R(q ) [ [ = ( LA0 ;q0 ) [ ( LA0 ;q) q0 2Q0 ;q0 6=q q2R(q ) [ = LA0 ;q0 If q
=
2
q0 Q00
LA0 :
The reduction operator does not change the language recognized by the automaton.
ut
Corollary 4.4. The reduction operator is an internal operator in the class of RFSAs. We shall now show that saturation and reduction operators commute. Lemma 4.3. Let A = h; Q; Q0 ; F; Æ i be an NFA and let q be a state of Q. Then the automaton (As ; q ) is saturated.
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
11
Proof: Let A = h; Q; Q0 ; F; Æ i; As = h; Q; Qs0 ; F; Æ s i; (As ; q ) = h; Q0 ; Q00 ; F 0 ; Æ 0 i and let L be the language recognized by these three automata. Let q0 2 Q0 = Q n fq g. If L(As ;q);q0 L(As ;q) then LAs ;q0 LAs from Proposition 4.4. Since As is saturated, we have q0 2 Qs0 and as Q00 = Qs0 n fq g, we have q0 2 Q00 . Let x 2 and q 0 ; q 00 2 Q0 be such that xL(As ;q);q00 L(As ;q);q0 . Then, xLAs ;q00 LAs ;q0 from Proposition 4.4. Since As is saturated, we have q 00 2 Æ s (q 0 ; x) and since Æ s (q 0 ; x) n fq g Æ 0 (q 0 ; x), we have q 00 2 Æ 0 (q 0 ; x). Thenrefore (As ; q ) is saturated.
ut
Proposition 4.5. Let A = h; Q; Q0 ; F; Æ i be an NFA recognizing a regular language L and q a state of Q. We have
[(A; q)℄s = (As ; q)
Proof: We can observe that (A; q ) and (As ; q ) have the same set of states. Furthermore, languages associated with every state q 0 are identical in the two automata because of Propositions 4.2 and 4.4. Because of Lemma 4.1, the saturated of these automata are isomorphic, therefore [(A; q )℄s = [(As ; q )℄s . But as (As ; q ) is a saturated automaton by Lemma 4.3, the proposition is proved. ut
4.4. Canonical RFSA Definition 4.5. Let L be a regular language. We define the canonical RFSA A of L the following way: A = h; Q; Q0 ; F; Æ i where
is the alphabet of L, Q is the set of prime residual languages of L, so Q = fu 1 L j u 1 L is prime g, its initial states are prime residual languages included in L, so Q0 = fu 1 L 2 Q j u 1L Lg, its final states are prime residual languages containing the empty word, so F = fu 1 L 2 Q j " 2 u 1 Lg, its transition function is defined by Æ(u 1 L; x) = fv 1 L 2 Q j v 1L (ux) 1 Lg, for u 1 L 2 Q and x 2 . Example 4.1. An example of canonical RFSA is shown in Figure 3. This definition assumes that the canonical RFSA is an RFSA ; we shall prove this presumption below. We have showed that the reduction operator transforms an RFSA into an RFSA, and that it commutes with the saturation operator. We shall now show that, if A is a saturated RFSA, the reduction operator converges and that the resulting automaton is the canonical RFSA of LA .
12
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
Proposition 4.6. Let L be a regular language. If A = h; Q; Q0 ; F; Æ i is a reduced saturated RFSA recognizing L, then A is (isomorphic to) the canonical RFSA of L. Proof: As A is an RFSA, every prime residual language u 1 L of L can be defined as a language LA;q associated with some state q 2 Q (Proposition 4.1). As there are no erasable states in A, for every state q , LA;q is a prime residual language and distinct states define distinct languages. As A is saturated, prime residual languages contained in L correspond to initial states of Q0 . As A is saturated, for a prime residual language u 1 L and a letter x 2 , we have Æ (u 1 L; x) = fv 1 L 2 Q j x(v 1 L) u 1 Lg = ut fv 1 L 2 Q j v 1L (ux) 1 Lg which is the transition function of the canonical RFSA. Let A0 ; : : : ; An be a sequence of NFAs such that for every index 1 i n, there exists a state qi of Ai 1 such that Ai = (Ai 1 ; qi ). Propositions 4.5 and 4.6 show that if A0 is a saturated RFSA and if An is reduced, then An is the canonical RFSA of the language recognized by A0 . Theorem 4.1. The canonical RFSA of a regular language L is a strongly consistent RFSA which recognizes L and is minimal regarding to the number of states. Moreover, the canonical RFSA possesses a maximal number of transitions. Proof: It is an RFSA that recognizes L since it can be obtained from any RFSA that recognizes L using saturation and reduction operators, and since these two operators do not change either the language recognized by the automaton or the fact that the automaton is an RFSA. It possesses a minimal number of states because of Proposition 4.1 and it is strongly consistent from Proposition 4.3. It has a maximal number of transitions from Corollary 4.3. ut Several transitions of the canonical RFSA may be redundant as Example 4.3 shows. Unfortunately, there may exist several non isomorphic RFSAs having as many states as the canonical RFSA and a minimal number of transitions, i.e. such that no transition is redundant. However, it is possible to describe a procedure that rules out some redundant transitions in a systematic way. Definition 4.6. Let L be a regular language. For any set of distinct residual languages R of L, define max(R) = fL0 2 R j 8L00 2 R; L0 L00 ) L0 = L00 g. Let A = h; Q; Q0 ; F; Æ i be the canonical RFSA which recognizes L. The simplified canonical RFSA of L is the automaton A0 = h; Q; Q00 ; F; Æ 0 i where Q00 = max(Q0 ) and Æ 0 (q; x) = max(Æ (q; x)), for q 2 Q and x 2 . It can be shown that the simplified canonical RFSA of L is an RFSA which recognizes L. Clearly, every regular language admits a unique simplified canonical RFSA. Example 4.2. Let = fa; bg. The simplified canonical RFSA for a has 4 transitions less than the canonical RFSA and is the minimal RFSA with respect to number of states and transitions. However, it may happen that several non isomorphic RFSAs have as many states as the simplified canonical RFSA but less transitions.
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
} b
a
a
q0
j q1 Y
a; b a
13
j
q2
b
Figure 9.
The simplified canonical RFSA for a.
Example 4.3. Consider the language L = faa; ab; ba; b ; b;
; da; ea; eb; e g. With regard to the number of states, the minimal RFSAs recognizing L have 6 states. The canonical RFSA has 17 transitions (see Figure 10). The simplified canonical RFSA has 14 transitions as a 1 L = d 1 L and b 1 L = d 1 L. There are three nonisomorphic RFSAs with 13 transitions as e 1 L = a 1 L [ b 1 L = a 1 L [ 1 L = b 1 L [ 1 L and none with less transitions.
* : z j q1
a; e b; e
q0
; e
a; b; d; e
q2
q3
a; b a;
b;
z:j *
q5
a
q4
Figure 10. A canonical RFSA with 17 transitions. The simplified canonical RFSA has 14 transitions. There are three equivalent nonisomorphic RFSAs with 6 states and 13 transitions.
5. Construction of the canonical RFSA using the subset method In the previous section, we described a way to build the canonical RFSA from a given DFA using saturation and reduction operators. Starting from an NFA, this method requires to build an equivalent DFA and to check whether residual languages are composite. These checks can be very expensive, even for simple automata. In this section, we present another method which stems from a classical construction of the minimal DFA of a language and which is easier to implement. The subset construction is a classical method used to build a DFA equivalent to a given NFA. Let A = h; Q; Q0 ; F; Æ i be an NFA. The method consists in building the set of reachable sets of states of A.
14
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
We denote by QR(A) the set fp 2 2Q D (A) = h; QD ; QD0 ; FD ; ÆD i with
j 9u 2 s.t. Æ(Q0 ; u) = pg and we define the subset automaton
= QR(A) QD0 = fQ0 g FD = fp 2 QD j p \ F 6= ;g ÆD (p; x) = fÆ (p; x)g if Æ (p; x) 6= ; and ; otherwise, for p 2 QD and x 2 : The language associated with a state p in the automaton D (A) isSthe union of the languages associated with the states that compose p in the automaton A, i.e. LD(A);p = q2p LA;q . The automaton D (A) is a deterministic trimmed automaton that recognizes the same language as A. QD
We remind that the reversal of a language L (resp. of an NFA A) is denoted by LR (resp. AR ). The following result provides a method to build the minimal DFA of L. Theorem 5.1. [1] Let L be a regular language and B an NFA such that B R is a DFA that recognizes LR . Then D (B ) is the minimal DFA recognizing L.
We can deduce from this theorem that for an NFA A, D (D (AR )R ) is the minimal DFA recognizing the language LA . We adapt the subset construction technique to deal with inclusions of sets of states. Let A = h; Q; Q0 ; F; Æi be an NFA. We say that p 2 QR(A) is coverable if there exist p1; : : : ; pl 2 QR(A) n fpg, such that p = li=1 pi . We define the automaton C (A) = h; QC ; QC 0 ; FC ; ÆC i with
S
= QC 0 = FC = ÆC (p; x) =
fp 2 QR(A) j p is not coverable g fp 2 QC j p Q0g fp 2 QC j p \ F 6= ;g fp0 2 QC j p0 Æ(p; x)g for any p 2 QC and x 2 : Let A be an NFA. The automaton C (A) is an RFSA recognizing LA whose all states are QC
Lemma 5.1. reachable.
Proof: The automaton C (A) can be obtained from the DFA D (A) in three steps, by using operations like saturation and reduction defined in Section 4. The states of D (A) are associated with residual languages of L. We only have to verify that the transformations do not change these residual languages. Let A = h; Q; Q0 ; F; Æ i be an NFA. Let D (A) = h; QD ; QD0 ; FD ; ÆD i.
Build A1 = h; QD ; QDC 0 ; FD ; ÆD i where QDC 0 = fp 2 QD j p Q0 g. The new initial states p verify LD(A);p LD(A);QD0 L. This does not change the language. Build A2 = and x 2 .
h; QD ; QDC 0; FD ; ÆA2 i where ÆA2 (p; x) = fp0 2 QD j p0 Æ(p; x)g for p 2 QD
We have LD(A);p0 = LA;p0 LA;Æ(p;x) = LD(A);ÆD (p;x) and then xLD(A);p0 LD(A);p . Thus, as in Proposition 4.2, the languages associated with states are not changed: LA2 ;p = LD(A);p for p 2 QD .
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
15
Build C (A) = h; QC ; QC 0 ; FC ; ÆC i by removing coverable states from A2 .
Let p00 be a coverable state and (p; x; p00 ) be a transition leading to p00 in A2 . Let p001 ; : : : ; p00l be such that p00 = li=1 p00i . By the previous step, p00i 2 ÆA2 (p; x) for every 1 i l.
S
S
We also have xLA2 ;p00 = x( li=1 LA2 ;p00i ). So we can remove the transition (p; x; p00 ) from A2 without changing the language LA2 ;p .
When all transitions leading to p00 are removed, we can remove p00 itself ; if p00 is an initial state, each p00i belongs to QDC 0 and the language LA2 is not changed. When all coverable states are removed, we obtain the automaton C (A).
ut
Theorem 5.2. Let L be a regular language and let B be an NFA such that B R is an RFSA recognizing LR whose all states are reachable. Then C (B ) is the canonical RFSA recognizing L. In order to prove this theorem, we introduce some lemmas. Lemma 5.2. Let B = h; QB ; Q0 ; F; Æ i be an NFA such that B R is a trimmed RFSA. Let q v 2 such that LB R ;q = (v R ) 1 LR B , let p 2 QR(B ) . Then v 2 LB;p if and only if q 2 p. Proof: Let u be such that p = Æ (Q0 ; u); thus LB;p We have q
2p , , , , ,
=u
2 QB and
1L
B.
2 Æ(Q0 ; u) u 2 LB R ;q = (v R ) v R uR 2 LR B uv 2 LB v 2 u 1 LB = LB;p :
q
R
1 R
LB
ut Lemma 5.3. Let B = h; QB ; Q0 ; F; Æ i be an NFA such that B R is a trimmed RFSA. For every p, p0 2 QR(B ) , we have LB;p LB;p0 if and only if p p0 .
S
S
Proof: If p p0 , then LB;p = q2p LB;q q2p0 LB;q = LB;p0 . Conversely, let q 2 p. As B R is an RFSA, there exists a word v such that LB R ;q = (v R ) 1 LR B . From Lemma 5.2, we know that v is in LB;p . As LB;p LB;p0 , v also belongs to LB;p0 and, using Lemma 5.2, we have q 2 p0 . ut
S
S
Lemma 5.4. Let B be an NFA such that B R is a trimmed RFSA. For every p, p1 , p2 : : : pn LB;p = 1kn LB;pk is equivalent to p = 1kn pk .
2 QR(B) ,
16
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
S
S
Proof: It is obvious that p = 1kn pk implies LB;p = 1kn LB;pk . Suppose that LB;p = 1kn LB;pk . Then LB;pk LB;p for every 1 k n. Using Lemma 5.4, we have, pk p for all k and so, 1kn pk p. Let q 2 p. Since B R is a trimmed RFSA, there exists a word v such that LB R ;q = (v R ) 1 LR B . From Lemma 5.2, we have v 2 LB;p . As LB;p = 1kn LB;pk , there exists an index k such that v 2 LB;pk . ut From Lemma 5.2 again, we have q 2 pk . So p 1kn pk .
S
S
S
S
Proof: [Proof of Theorem 5.2]
Reachable sets of states of B correspond to residual languages of L and Lemma 5.4 shows that composite residual languages correspond to coverable sets of states. So, p 2 QC if and only if LB;p is a prime residual language of L and QC can naturally be identified with the set of states of the canonical RFSA. Due to Lemma 5.3, we also verify that LB;p = u 1 L implies p = Æ (Q0 ; u) ; the inverse is obvious.
=f 2 j g = =f 2 j g FC = fp 2 QC j p \ F 6= ;g. We have " 2 LB;p if and only if there exists qi 2 p \ F . So FC = fp 2 QC j " 2 LB;p g is the set of final states of the canonical RFSA. For each state p 2 QC , let up 2 be such that Æ(Q0 ; up) = p. Clearly, up 1 L = LB;p. For any p; p0 2 QC and x 2 , Æ (p; x) = Æ (Q0 ; up x) 2 R(B ) and p0 2 ÆC (p; x) iff p0 Æ (p; x): As LB;p0 = up01 L and LB;Æ(p;x) = (up x) 1 L, we can deduce from Lemma 5.4 that p0 Æ (p; x) iff up01 L (up x) 1 L: QC 0 p QC p Q0 . Lemma 5.3 tells us that p Q0 is equivalent to LB;p LB;Q0 L. p QC LB;p L corresponds to the set of initial states of the canonical RFSA. So QC 0
So, the transition functions are equivalent.
ut
We can deduce from Lemma 5.1 and Theorem 5.2 that for any NFA A, C (C (AR )R ) is the canonical RFSA of LA . The simplified canonical RFSA can be computed from the canonical RFSA by using the operator max. However, there exists a similar construction which allows to obtain it directly from any NFA. Let C 0 (A) = h; QC ; QC 0 0 ; FC ; ÆC 0 i with
= QC 0 0 = FC = ÆC (p; x) = QC
fp 2 QR(A) j p is not coverable g fp 2 QC j p Q0 and 6 9p0 2 QC s.t. p ( p0 Q0g fp 2 QC j p \ F 6= ;g fp0 2 QC j p0 Æ(p; x) and 6 9p00 2 QC s.t. p0 ( p00 Æ(p; x)g; for p 2 QC and x 2 :
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
17
The simplified canonical RFSA is obtained by C 0 (C 0 (AR )R ). Example 5.1. Figures 11, 12, 13 and 14, shows the construction of the canonical RFSA recognizing the language a2 with = fa; bg by using the subset construction. The automaton A recognizes a2 and the reversal automaton AR is deterministic. The first steps of the construction of C (A) are represented on Figure 12; they are identical to steps in the classical subset construction. The state 012 (resp. 013) is coverable with 01 and 02 (resp. 01 and 03). Since the states 012 and 013 are coverable, it is not necessary to build the next states. On the Figure 13, the coverable states 012 and 013 have been removed. Transitions reaching the state 012 (resp. 013) have been redirected to the states 01 and 02 (resp. 01 and 03). We obtain an RFSA which recognizes the language a2 and which is by accident the simplified canonical RFSA too. Finally, the canonical RFSA C (A) is obtained by saturation. The state 0 being included in each other state, every transition which reaches a state, has also to reach the state 0. Note that, as in the deterministic case, this construction may produce cumbersome intermediate automata; indeed, it is possible to find examples for which C (AR ) has an exponential number of states with regard to the number of states of A or C (C (AR )R ). Thus in the worst case, this algorithm is exponential with regard to the size of the canonical RFSA. This situation can be observed with the reversal of the automaton used in Proposition 6.2.
6. Results on size of RFSAs We classically take the number of states of an automaton as a measure of its size. It can be argued that the number of states of an automaton is suitable for DFAs but not for NFAs since the number of transitions in the latter can be quadratic with regard to the number of states. However, our results show the existence of exponential or superpolynomial gaps between the number of states of particular NFAs, RFSAs and DFAs. These results imply similar gaps between number of transitions. The size of a canonical RFSA is bounded by the size of the equivalent minimal DFA and by the size of one of its equivalent minimal NFAs. We show that both bounds can be reached despite the fact that there is an exponential gap between them. Proposition 6.1. There exist languages for which the minimal DFA has a size exponentially larger than the size of the canonical RFSA, and for which the canonical RFSA has the same size as the size of a minimal NFAs. Proof: Consider the languages Ln = an , where n is an integer and = fa; bg. It is well known that minimal NFAs for Ln have n + 2 states and that Ln has 2n+1 distinct residual languages. It is easy to verify that only n + 2 of them are prime: " 1 Ln and (abi ) 1 Ln for 0 i n.
ut
Proposition 6.2. The size of the canonical RFSA of a language L can be exponentially larger than the size of a smallest NFA recognizing L.
18
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
 0 a; b
a
1 012 s 01 k
a; b
s
2 013 s 02
a; b
s
Figure 11. A is an NFA recognizing a2 ; AR is deterministic.
 0 k
a
b
a
b
a
3
s
03
b
s
a; b
s
a
b
Figure 12. First steps of the construction of C (A).
 0 k b
a
a
s 01 k
a; b a
02
s
a
03
b
Figure 13. The coverable states have been removed and transitions are redirected; this automaton is an RFSA.
a; b
 0 k a; b
s 01 k a
a
a; b
a; b a
s 02 a
a; b
03
s
a; b Figure 14. The canonical RFSA C (A) recognizing a2 is obtained by saturation.
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata

b
a
q0
b
a
Y
j
q1
a
b
b
q3
a
19
j q2
Figure 15. Automaton A4 whose canonical RFSA is exponentially larger than A4 .
Proof: We can verify this proposition on automata An
= fa; bg, Q = fqi j 0 i n 1g, Q0 = fqi j 0 i < n=2g, F = fq0 g. Æ(qi ; a) = qi+1 for 0 i < n 1, Æ(qn and Æ (q1 ; b) = qn 1 .
= h; Q; Q0 ; F; Æi defined by
) = q0, Æ(q0 ; b) = q0, Æ(qi ; b) = qi
1; a
1
for 1 < i < n
The automaton A4 is represented in Figure 15. The reversal of An is trimmed and deterministic, thus we can apply Theorem 5.2. The automaton C (An ) is the canonical RFSA. The initial state in the subset construction has dn=2e elements. The reachable sets of states are all sets of states with dn=2e elements. So, none of them is coverable. Therefore, the canonical RFSA C (An ) has a size exponentially larger than the size of the initial ut NFA. Every nonempty residual language of a regular language L has a minimal characteristic word whose length is bounded by the number of states of the minimal DFA. Next proposition shows that this is no longer true if we consider RFSA. Proposition 6.3. There exist regular languages for which the smallest characterizing word for some residual language is longer than any polynomial in the number of states of the canonical RFSA.
20
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
 i 7
b2 ; b 3
a; b2 ; b3
q
q02
a; b2
q12
b3
K
b2 ; b 3
b2 ; b 3
q q13
a; b2 ; b3
q03
b2
a; b3
)
a; b2 ; b3
q23
Figure 16. Automaton AP for P
= f2; 3g.
Proof: Let P = fp1 ; : : : ; pn g be a set of n distinct prime numbers. h; Q; Q0 ; F; Æi by:
Let us define the automaton AP
= fag [ fbp j p 2 P g, Q = fqjp j p 2 P; 0 j < pg, Q0 = fq0p j p 2 P g, F = Q0 and
(p ) p Æ (qj ; bp0 ) p Æ (qp 1 ; bp0 ) See Figure 16 for P = f2; 3g. Æ qj ; a
= fq(pj+1)mod pg = fqjp; qjp+1g = fq0p0 g
0 j < p; p 2 P 0 j < p 1; p; p0 2 P for p; p0 2 P for
for
=
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
21
= p1 p : : : pn and let uij = aN 1bi aj for 1 i n and 0 j < pi. We can check that Æ (Q0 ; uij ) = fqj i g . Therefore, LAP ;qpi = uij1 L and AP is an RFSA. j Let 1 i n, 0 j < pi and 1 k n, 0 l < pk . If pi j 6= pk l, then api j 2 LAP ;qpi n LAP ;qpk . j l If pi j = pk l, then a2pi j 2 LAP ;qpi n LAP ;qpk . j l Let N
Therefore, all residual languages LAP ;qpi are different, none of them is included in another one and j AP has the same number of states as the canonical RFSA.
We can check that for every u 2 1. So, the smallest word uq such that LAP ;q = uq 1 L has a length juq j N . Now, let f be some polynomial. We can choose different prime numbers p1 ; : : : ; pn such that p1 : : : pn > f (p1 + : : : + pn ) = f (jAP j). ut Next proposition shows that the simplified canonical RFSA can have far less transitions that the canonical RFSA. Proposition 6.4. The number of transitions of the canonical RFSA of a language L can be quadratic wrt the number of transitions of the equivalent simplified canonical RFSA. Proof: We can verify this proposition on languages Ln = a an where n 2 N . It is easy to verify that the number of transitions of the simplified canonical RFSA of L is NQ = n +1 2 + 3N )=2 and that the number of transitions of the canonical RFSA is (NQ 1. Q For n = 3, the canonical RFSA and simplified canonical RFSA are represented in Figures 18 and 19.
ut
7. RFSAs over a oneletter alphabet Here, we consider the case where the underlying alphabet possesses only one letter and we compare the state complexity of DFAs and RFSAs. Such a study has already been done for DFAs and NFAs in [2] where it has been shown that the minimal DFA equivalent to a given NFA with m states could pm log m ) states. Here we show that the number of states of the minimal DFA over a oneletter have (e alphabet is at most quadratic in terms of the number of states of the equivalent canonical RFSA. In all Section 7, we suppose that = fag, L is a nonempty regular language over and A = h; Q; q0; F; Æi is the trimmed minimal DFA which recognizes L. Let us set nL = jQj 1. Let us define qi = Æ (q0 ; ai ) and for sake of simplicity, let Li = LA;qi , for 0 i nL . If L is infinite, Æ (q0 ; ai ) 6= ; for any integer i and previous notations can be extended: for every integer i, let qi = Æ (q0 ; ai ) and Li = LA;qi . Let mL be the smallest index such that qmL = Æ (qnL ; a) and let d = n mL + 1 be the length of the loop (see Figure 20). Note that d > 0 and that for all i n1 and every nonnegative integer k , qi+kd = qi .
22
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
 0
1
a
a
a
s
2
s
a
3
s
Figure 17. An NFA A which recognizes a a3 ; AR is deterministic.
 0 }k
a
a
s 01 }k
a a
a
a a
a
s 012 k
a
a a
s 0123
a
a
Figure 18. The canonical RFSA C (A) which recognizes a a3 .
 0
a
s
a
01
s 012
a
s 0123
Figure 19. The simplified canonical RFSA C 0 (A) which recognizes a a3 .


q0
a
q1
a
.....
a
 qmL =
a
a
a
.....
a
 qnL
Figure 20. A minimal DFA that recognizes an infinite regular language over fag (final states have been omitted).
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
23
7.1. Inclusion relations between residual languages of an infinite regular language over a oneletter alphabet We suppose in this subsection that L is infinite. Lemma 7.1. For all integers i; j; k , Li
Proof: Let u 2 Li+k . We have ai+k u u 2 Lj +k .
2
Lj ) Li+k Lj+k :
L, i.e. ak u
2
Li . Then ak u
2
Lj and aj +k u
2
L. Therefore
ut
Lemma 7.2. For all integers i; j , Li
Moreover, if Li
Lj ) d divides (i
)
j :
( Lj , min(i; j ) < mL.
Proof: Let r be an integer such that rd i + j 0 and mL + rd i 0. Let i1 = i + (mL + rd i) and j1 = j + (mL + rd i). We have Li1 = LmL +rd = LmL Lj1 and j1 mL . Applying Lemma 7.1 with k = j1 mL we obtain, LmL
Lj1 L2j1
m
L
: : : Ld(j1
L
m )+m
L
= LmL :
As A is minimal, this implies that qj1 = qmL , i.e. that d divides j1 mL and also i j . As max(i; j ) = min(i; j ) + max(i;j ) d min(i;j ) d, if Li ( Lj , we must have min(i; j ) < mL .
ut
Proposition 7.1. We have the following cases 1. If A is a loop, i.e. if mL
= 0, then there are no inclusion relations between residual languages.
2. If there are no inclusion relations between LmL residual languages.
and LnL , then there are no inclusions between
( LnL , then [Li ( Lj ) i < j ℄. If LnL ( LmL 1 , then [Li ( Lj ) i > j ℄.
3. If LmL 4.
1
1
Proof: The first assertion is clear from Lemma 7.2. If there exist some i; j such that Li ( Lj , we have min(i; j ) mL 1. If k = mL 1 min(i; j ), then Lmin(i;j )+k = LmL 1 and Lmax(i;j )+k = LnL . Now, using the Lemma 7.1, the three last points are ut clear.
24
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
7.2. Prime and composite residual languages Proposition 7.2. Every nonempty residual language of a finite regular language over a oneletter alphabet is prime. Proof: Let L be a finite regular language over and let A be its trimmed minimal DFA. We have qnL 2 F and Æ (qnL ; a) = ;. For any 0 i nL , anL i 2 Li . Let j 6= i. If anL i 2 Lj , then j < i and anL j 62 Li , i.e. Lj 6 Li . Hence, Li is prime. ut Lemma 7.3. Let L be an infinite regular language over a oneletter alphabet and let A be its trimmed minimal DFA. If some residual language of L is composite, then A is not a loop, i.e. mL > 0, and LmL 1 ( LnL . Proof: Suppose that some residual language of L is composite. From Proposition 7.1, mL > 0 and there must exists an inclusion relation between LmL 1 and LnL . Suppose that LnL ( LmL 1 . Let qj be a composite state. From Proposition 7.1 and Lemma 7.2, we have j < mL . Let u 2 LmL 1 . We have amL 1 j u 2 Lj . Let i such that Li ( Lj and amL 1 j u 2 Li . We have u 2 LmL 1+i j . From Proposition 7.1, we have i > j and from Lemma 7.2, we have i j = rd with r > 0. So, u 2 LmL 1+i j = LnL +(m 1)d = LnL . Therefore, LmL 1 = LnL which is contradictory. ut Proposition 7.3. Let L be a regular language over a oneletter alphabet and let A be its trimmed minimal DFA. If some state q of A is composite, then all the states which follow q are composite too. Proof: From previous lemmas, L is infinite, A is not a loop, LmL 1 ( LnL and Li ( Lj implies i < j . Let i be the first index such that Li is a composite residual language. Let Ri = fj j j < i and Lj ( Li g. We have Li = j 2Ri Lj . It is easy to verify that for every i k nL , we have Lk = j 2Ri Lj +k i and Lj +k i 6= Lk for every j 2 Ri . ut
S
S
7.3. Ratio between the size of the minimal DFA and the canonical RFSA for a oneletter alphabet regular language If all states of the minimal DFA A are prime, the canonical RFSA has the same size as the minimal DFA. From Proposition7.2 and Lemma 7.3, it remains to consider the case where LmL 1 ( LnL . Let QP be the set of prime states of Q and nP = jQP j. Due to Proposition 7.3, i nP implies that qi is composite. Let AS = h; QP ; fq0 g; F \ QP ; ÆS i be the trimmed NFA built from (A; qnP ) where is the reduction operator defined in Section 4.3: AS is an RFSA recognizing L that has the same number of states as the canonical RFSA. Let n0 be the smallest index such that qn0 2 ÆS (qnP 1 ; a). Lemma 7.4. For every i 0, we have jÆS (q0 ; ai )j jÆS (q0 ; ai+1 )j. Moreover, if jÆS (q0 ; ai )j jÆS (q0 ; ai+nP n0 )j for some i n0, then ÆS (q0; ai ) = ÆS (q0 ; ai+nP n0 ).
=
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
j j j i Y

a
q0
a
a
q2
q1
25
a
.....
a
jq
n 1
a
Figure 21. The minimal DFA corresponding to this canonical RFSA has (n
1)2 + 2 states.
Proof: If i < nP , ÆS (q0 ; ai ) = fqi g. Now let i nP . Due to the way ÆS is built, qj 2 ÆS (q0 ; ai ) implies n0 j < nP . We have ÆS (q0 ; ai+1 ) = qj 2ÆS (q0 ;ai ) ÆS (qj ; a). If j < nP 1, then ÆS (qj ; a) = fqj+1g and if j = nP 1, then qn0 2 ÆS (qj ; a). And as qn0 62 fqj 2ÆS (q0;ai )jj n, or there are at most nP n0 1 states qi such that f (i) = n. There is at most one state qi such that f (i) = nP
From this, we can calculate that nR
n0 .
nP + (nP 1)(nP 2) + 1 = (nP 1)2 + 2 states.
ut
The upper bound is reached by the automaton described in Figure 21.
8. Complexity results We have defined the notions of RFSAs, saturated automata, canonical RFSAs; in this section, we evaluate the complexity of constructions and decision problems linked to them. We shall mainly use the following classical complexity results concerning finite automata (quoted from [6]).
26
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
Proposition 8.1. Deciding whether two NFAs recognize the same language is a P SP ACE problem.
omplete
As an immediate corollary: given two NFAs A and A0 , deciding whether LA LA0 is a P SP ACE
omplete problem. The problem is quadratic if A and A0 are DFAs. Deciding whether the intersection of two DFAs is empty can be done in quadratic time. On the other hand: Proposition 8.2. Deciding whether the intersection of n DFAs is empty or not is a P SP ACE omplete problem. The first notion that we defined is saturation. Clearly, deciding whether a DFA is saturated is a polynomial problem. For NFAs, we have the following result. Proposition 8.3. Deciding whether an NFA is saturated is a P SP ACE
omplete problem.
Proof: Given an oracle which decides whether the language recognized by a given NFA is included in another one, we can build the saturated of a given NFA within polynomial time. Given an oracle which builds the saturated of a given NFA, we can say whether a given NFA is saturated within polynomial time. It remains to prove that the inclusion problem between two languages represented by NFAs polynomially reduces to the problem of deciding whether an NFA is saturated. Let A = h; Q; Q0 ; F; Æ i and A0 = h; Q0 ; Q00 ; F 0 ; Æ 0 i be two NFAs. We can suppose that A and A0 are trimmed, that Q \ Q0 = ; and that they have a unique initial state which cannot be reached from other states. Let Q0 = fq0 g, Q = fq0 ; q1 ; : : : ; ql g, Q00 = fq00 g and Q0 = fq00 ; q10 ; : : : ; ql00 g. We complete the alphabet by adding l + l0 + 2 new letters x1 ; : : : ; xl ; x01 ; : : : ; x0l0 ; z; t: let 0 be the new alphabet. We consider two new states qe and qf and let B = h0 ; Q [ Q0 [fqe ; qf g; fq00 g; F [ F 0 [fqf g; Æ 00 i where Æ 00 contains the transitions of Æ [ Æ 0 and the transitions defined below:
2 Æ(qi ; xi) for i = 1 : : : l and qf 2 Æ(qi0 ; x0i) for i = 1 : : : l0; qe 2 Æ (q0 ; x) \ Æ (q00 ; x) for x 2 ; qe 2 Æ (qe ; x) for x 2 ; qf 2 Æ (qe ; xi ) for i = 1 : : : l and qf 2 Æ (qe ; x0i ) for i = 1 : : : l0 ; qf 2 Æ (qf ; z ) and qf 2 Æ (qe ; t) (see Figure 22). Now, it is easy to verify that B is saturated if and only if LA 6 LA0 . qf
Next proposition shows that deciding whether a given NFA is an RFSA is also difficult. Proposition 8.4. Deciding whether an NFA is an RFSA is a P SP ACE
omplete problem.
ut
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
27
qi
Automaton A q0
q00
xi
R q e
z
j~ q >
xi , x0j , t
f
x0j
Automaton A0 qj0
Figure 22. The automaton is saturated iff Lq0
6 Lq0 . 0
28
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
Proof: First, we prove that the problem of deciding whether the union of n regular languages described by DFAs is equal to can be polynomially reduced to the problem of deciding whether an NFA is a RFSA. We can consider n DFAs A1 = h; Q1 ; q 1 ; Q1F ; Æ 1 i; : : : ; An = h; Qn ; q n ; QnF ; Æ n i where i 6= j implies Qi \ Qj = ;. Let x1 ; : : : ; xn ; y1 ; : : : ; yn ; a; b be 2n + 2 new letters. We can build the NFA A = h; Q; QI ; QF ; Æ i where
Q = Si=1:::n Qi [ fq1 ; : : : ; qn; qe; qf ; qg g where fq1; : : : ; qn; qe; qf ; qg g are new states, QI = fq1 ; : : : ; qn; qe; qf g QF = Si=1:::n QiF [ fqg g Æ = (Si=1:::n Æi ) [ (Si=1:::nf(qi ; xi; qi ); (qi; yi ; qi); (qe; a; qi )g) [ f(qe ; b; qe); (qf ; b; qf ); (qf ; a; qg )g [ f(qg ; x; qg ) j x 2 g (see Figure 23). Every states, except maybe qe , defines a residual language of the described language:
states qi correspond to residual languages of words yi , states q of sets Qi correspond to residual languages of xi u where Æ i (q i ; u) = q , qf corresponds to the residual language of b and qg corresponds to the residual language of a.
State qe defines a residual language of the language if and only if the union of recognized languages by Ai automaton is equal to . That is, A is an RFSA if and only if the union of languages described by Ai automaton is equal to It remains to show that we can decide whether a given NFA A is an RFSA within polynomial space. Consider the subset construction defined in Section 5. The reachable set of states of A can be enumerated within polynomial space: therefore, for each state q of A and for each reachable set of states, decide whether they define equal regular languages. The NFA A is an RFSA if and only if all its states are equivalent to some reachable set of states. ut Using similar techniques, it can be shown that deciding whether the saturated of a given DFA is the canonical RFSA is a PSPACEcomplete problem. Hint: given n DFAs A1 ; : : : ; An over , it is easy to build a DFA over an extended alphabet 0 such that [ni=1 LAi 6= iff the saturated of A is the canonical RFSA.
9. Conclusion The class of RFSAs can be viewed as an intermediary class between the DFAs and the NFAs. Based on an important property of the DFAs, namely the fact that each state of an automaton A must define a residual language of the language recognized by A, the RFSAs also share with the DFAs the property
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
Automaton
29
Automaton
A1
An
qn
qg
~ ~ Æ O 6 y1
yn
x1
q1
xn
q1
qn
...
a
b
Figure 23.
qe
a
I a

qf
b
A is an RFSA iff the union of languages described by the Ai is equal to .
of having a minimal canonical form. On the other hand, the canonical RFSAs can be in some cases as concise as minimal NFAs. It has been indicated in the introduction that the ideas developped in this paper come from a work done in the domain of Grammatical Inference. A main problem in this field is to infer efficiently (a representation of) a regular language from a finite set of examples of this language. Some positive results can be proved when regular languages are represented by DFAs. For example, it has been shown that Regular Languages represented by DFAs can be infered from given data ([7, 8]). In this framework, classical inference algorithms such as RPNI [13] need a polynomial number of examples relatively to the size of the minimal DFA that recognizes the language to be infered. So, regular languages as simple as an cannot be infered efficiently using these algorithms. Hence, it is a natural idea to think of using other kind of representations for regular languages, such as NFAs. Unfortunately, it has been shown that Regular Languages represented by NFAs cannot be efficiently infered from given data ([8]). The main difficulty that arises when one try to build an NFA from examples comes from the fact that states do not correspond to natural components of the associated language. So, we defined RFSAs in order to obtain an automata representation of regular languages for which states correspond to residual languages. RFSAs have been used to design grammatical inference algorithms in [3, 4].
References [1] Brzozowski, J. A.: Canonical regular expressions and minimal state graphs for definite events, in: Mathematical Theory of Automata, vol. 12 of MRI Symposia Series, 1962, 52–561. [2] Chrobak, M.: Finite automata and unary languages., Theorical Computer Science, 1986, 47(2):149–158. [3] Denis, F., Lemay, A., Terlutte, A.: Learning regular languages using non deterministic finite automata, ICGI’2000, 5th International Colloquium on Grammatical Inference, 1891, Springer Verlag, 2000.
30
F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata
[4] Denis, F., Lemay, A., Terlutte, A.: Learning regular languages using RFSA, ALT 2001, number 2225 in Lecture Notes in Artificial Intelligence, Springer Verlag, 2001. [5] Denis, F., Lemay, A., Terlutte, A.: Residual Finite State Automata, STACS 2001, 18th Annual Symposium on Theoretical Aspects of Computer Science, number 2010 in Lecture Notes in Computer Science, Springer Verlag, 2001. [6] Garey, M. R., Johnson, D. S.: Computers and Intractability, a Guide to the Theory of NPCompletness, W.H. Freeman and Co, San Francisco, 1979. [7] Gold, E.: Complexity of Automaton Identification from Given Data, Inform. Control, 37, 1978, 302–320. [8] Higuera, C. D. L.: Characteristic Sets for Polynomial Grammatical Inference, Machine Learning, 27, 1997, 125–137. [9] Hopcroft, J., Ullman, J.: Introduction to Automata Theory, Languages, and Computation, AddisonWesley, 1979. [10] Kleene, S. C.: Representation of Events in Nerve Nets and Finite Automata, in: Automata Studies, Annals of Math. Studies 34 (C. Shannon, J. McCarthy, Eds.), New Jersey, 1956. [11] Myhill, J.: Finite Automata and the Representation of Events, Technical Report 57624, WADC, 1957. [12] Nerode, A.: Linear Automaton Transformation, Proc. American Mathematical Society, 9, 1958. [13] Oncina, J., Garcia, P.: Inferring regular languages in polynomial update time, Pattern Recognition and Image Analysis, 1992. [14] Yu, S.: Handbook of Formal Languages, Regular Languages, vol. 1, chapter 2, Springer Verlag, 1997, 41–110.