REMARKS ON MULTIPLE ENTRY DETERMINISTIC FINITE AUTOMATA

Journal of Automata, Languages and Combinatorics u (v) w, x–y © Otto-von-Guericke-Universität Magdeburg


Libor Polák¹
Department of Mathematics, Masaryk University
Janáčkovo nám. 2a, 662 95 Brno, Czech Republic
e-mail: [email protected]

ABSTRACT

We investigate several aspects of multiple entry DFA’s. We consider their conversion to DFA’s. Further, we show that they appear as minimal NFA’s for certain classes of languages. Finally, we deal with their decompositions into disjoint unions of automata with fewer states.

Keywords: multiple entry DFA, minimalization, conversion, decomposition

1. Introduction

Multiple entry deterministic finite automata are non-deterministic finite automata with a very restrictive and natural kind of non-determinism. They have been studied in numerous papers: see the recent ones by Holzer, Salomaa and Yu [4] and by Malcher [9] and the references there.

After this introductory section, Section 2 contains preliminaries and recalls that the minimalization of a DFA is a “local” task (i.e., one considers a certain quotient of a given automaton) and that the problem of the minimalization of a k-entry deterministic finite automaton within the class of all such automata is NP-complete [9]. In Section 3 we complete the results from [4] concerning the DFA conversion. In Section 4 we point out that for regular languages with a group as the syntactic monoid one can find minimal NFA’s in our class. In Section 5 we show that the same is true for another class of languages. Finally, in Sections 6 and 7 we consider the so-called decompositions of our automata. This leads to a new notion of minimality: in this sense a disjoint union of several automata with fewer than n states is simpler than an n-state automaton. This can be justified from the viewpoint of distributed computing. The proposed decomposition is again of a “local” character since the resulting components are quotients of the automaton under consideration.

¹ Supported by the Institute of Theoretical Informatics, Masaryk University Brno, and partially also by Project AKTION Österreich - Tschechische Republik.


2. Preliminaries

A non-deterministic finite automaton (briefly NFA) A = (P, A, E, I, T) over a finite non-empty alphabet A consists of a finite non-empty set P of states, a transition relation E ⊆ P × A × P, and sets I, T ⊆ P of initial and terminal states. We denote by L(A) the language accepted by A and, for p ∈ P, we put L_p(A) = L(P, A, E, {p}, T).

A homomorphism of A into an NFA B = (Q, A, F, J, U) is a mapping φ : P → Q such that

(∀ p, q ∈ P, a ∈ A)( (p, a, q) ∈ E ⇒ (φ(p), a, φ(q)) ∈ F ),

and φ(I) ⊆ J, φ(T) ⊆ U. Clearly, L(A) ⊆ L(B) in such a case. An isomorphism is a bijective homomorphism for which ⇔, =, = hold above in place of ⇒, ⊆, ⊆.

Let M_k denote the class of all complete deterministic finite automata with at most k initial states. More formally, A ∈ M_k if and only if A = (P, A, ·, I, T) where P is a non-empty finite set of states, A is a non-empty finite alphabet, · : P × A → P is a transition function (naturally extended to · : P × A* → P), I ⊆ P with |I| ≤ k is a set of initial states, and T ⊆ P is a set of terminal states. Let M = ⋃_{k≥1} M_k be the class of all multiple entry DFA’s. Let det A ∈ M_1 denote the result of the classical determinization of A ∈ M (the states are { I · u | u ∈ A* }, ...).

An equivalence relation ϱ on the set of states of A = (P, A, ·, I, T) ∈ M is a congruence on A if

(∀ p, q ∈ P, a ∈ A)( p ϱ q ⇒ (p · a) ϱ (q · a) ).

We define the quotient automaton as A/ϱ = (P/ϱ, A, ·, I/ϱ, T_ϱ) where pϱ = { q ∈ P | q ϱ p }, P/ϱ = { pϱ | p ∈ P }, pϱ · a = (p · a)ϱ, I/ϱ = { iϱ | i ∈ I }, and T_ϱ = { pϱ | p ∈ P and pϱ ⊆ T }. For each successful path

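$$p_0\varrho \xrightarrow{a_1} p_1\varrho \xrightarrow{a_2} \cdots \xrightarrow{a_m} p_m\varrho$$

in A/ϱ, there exists i ∈ I with i ϱ p_0 and

$$i \xrightarrow{a_1} i \cdot a_1 \xrightarrow{a_2} \cdots \xrightarrow{a_m} i \cdot a_1 \dots a_m$$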

is a successful path in A. Thus L(A) ⊇ L(A/ϱ).

For L ⊆ A* and u ∈ A* we put u⁻¹L = { v ∈ A* | uv ∈ L }. The first part of the following result is trivial and the second part is well-known.

Result 1 Let A = (P, A, ·, I, T) ∈ M_k. The relation µ_A on P defined by

p µ_A q if and only if L_p(A) = L_q(A)

is a congruence relation on A and L(A/µ_A) = L(A). For k = 1, A/µ_A gives the minimal complete DFA for the language L(A).
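The congruence µ_A of Result 1 can be computed by the usual partition refinement. The following Python sketch is only an illustration under assumed conventions that are not part of the paper: the transition function is a dictionary `delta[(p, a)]`, and the helper names `state_equivalence` and `quotient` are hypothetical.

```python
def state_equivalence(states, alphabet, delta, terminal):
    """Return a map p -> block id such that p and q share a block
    iff L_p(A) = L_q(A) (Moore-style partition refinement)."""
    # Start from the split terminal / non-terminal.
    block = {p: (p in terminal) for p in states}
    while True:
        # Signature of p: its own block and the blocks of its successors.
        sig = {p: (block[p],) + tuple(block[delta[p, a]] for a in alphabet)
               for p in states}
        ids = {}
        new_block = {p: ids.setdefault(sig[p], len(ids)) for p in states}
        # The new partition refines the old one, so an equal number of
        # blocks means that the partition is stable.
        if len(ids) == len(set(block.values())):
            return new_block
        block = new_block


def quotient(states, alphabet, delta, initial, terminal, block):
    """Quotient automaton A/µ_A; each class is named by its block id."""
    q_states = set(block.values())
    q_delta = {(block[p], a): block[delta[p, a]]
               for p in states for a in alphabet}
    q_initial = {block[i] for i in initial}
    # Within a µ_A-class either all states are terminal or none is.
    q_terminal = {block[p] for p in terminal}
    return q_states, q_delta, q_initial, q_terminal


# Illustrative automaton (an assumption, not from the paper):
# a 4-cycle over {a} with two entries 0, 1 and terminal states 0, 2.
states = [0, 1, 2, 3]
alphabet = ["a"]
delta = {(p, "a"): (p + 1) % 4 for p in states}
block = state_equivalence(states, alphabet, delta, terminal={0, 2})
q_states, q_delta, q_initial, q_terminal = quotient(
    states, alphabet, delta, {0, 1}, {0, 2}, block)
```

In this small instance the states 0, 2 and the states 1, 3 are identified, so the quotient has two states and both of them are initial. For k ≥ 2 such a quotient is a local simplification only, in line with the remark that follows.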


For k ≥ 2 we can speak only about a local minimalization; indeed, the following holds.

Result 2 (Malcher, [9] Theorem 1) For each k ≥ 2 the following problem is NP-complete:
Instance: A ∈ M_k, m a positive integer.
Question: Is there an m-state automaton B ∈ M_k such that L(B) = L(A)?

3. M_k to DFA conversion

Result 3 (Holzer, Salomaa, Yu, [4] Lemmas 2 and 3) For A ∈ M_k with n states there exists a DFA D with at most $\sum_{i=1}^{k} \binom{n}{i}$ states such that L(D) = L(A). This upper bound is sharp.

Theorem 5 of [4] says that for A ∈ M_k with n states over the single letter alphabet {a} there exists a DFA D with at most $(n/k)^k$ states such that L(D) = L(A). This is not completely true: for the automaton

A = ( {0, 1, 2}, {a}, +1 (mod 3), {0, 1}, {0} )

we have n = 3 and k = 2, so by the theorem there should be a DFA with at most $(n/k)^k = 9/4$ states accepting L(A). This is not true, since L(A) = { a^c | c ≢ 1 (mod 3) } and the minimal DFA for this language has three states. We first correct the bound and then solve a problem from [4], namely to present a sharp upper bound.

Theorem 1 For A = (P, {a}, ·, I, T) ∈ M_k with n states there exists a DFA D with at most $\max\{\, n, (n/2)^2, \dots, (n/k)^k \,\}$ states such that L(D) = L(A).
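The counterexample above is small enough to check mechanically. The following Python sketch, with hypothetical helper names and a dictionary encoding of the transition function (both assumptions of this illustration), runs the subset construction on the reachable sets I · u and evaluates the two bounds stated in Result 3 and Theorem 1.

```python
from math import comb


def determinize(alphabet, delta, initial, terminal):
    """Subset construction restricted to the reachable sets I·u."""
    start = frozenset(initial)
    seen, todo = {start}, [start]
    d_delta = {}
    while todo:
        s = todo.pop()
        for a in alphabet:
            t = frozenset(delta[p, a] for p in s)
            d_delta[s, a] = t
            if t not in seen:
                seen.add(t)
                todo.append(t)
    # A subset is terminal iff it meets T.
    d_terminal = {s for s in seen if s & set(terminal)}
    return seen, d_delta, start, d_terminal


def general_bound(n, k):
    """Bound of Result 3: sum of binomial(n, i) for i = 1..k."""
    return sum(comb(n, i) for i in range(1, k + 1))


def unary_bound(n, k):
    """Bound of Theorem 1: max of (n/i)^i for i = 1..k."""
    return max((n / i) ** i for i in range(1, k + 1))


# The automaton ({0, 1, 2}, {a}, +1 (mod 3), {0, 1}, {0}) from the text.
delta = {(p, "a"): (p + 1) % 3 for p in range(3)}
subsets, d_delta, start, d_terminal = determinize(["a"], delta, {0, 1}, {0})
```

Here `subsets` consists of {0, 1}, {1, 2} and {0, 2}, of which only {1, 2} is not terminal. The three sets are pairwise inequivalent, so three states are needed, which exceeds (3/2)² = 9/4 and matches `unary_bound(3, 2)`.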

Proof. Step 1. We can decompose A into components: p, q ∈ P are in the same component if and only if there exist c, d ≥ 0 such that p · a^c = q · a^d. Let A_1 = (P_1, {a}, ·, I_1, T_1), ..., A_ℓ = (P_ℓ, {a}, ·, I_ℓ, T_ℓ) be all the components containing at least one initial state. Clearly, ℓ ≤ k and L(A) = L(A_1) ∪ ··· ∪ L(A_ℓ).

Step 2. Let A = (P, {a}, ·, I, T) consist of a single component and let I = {i_1, ..., i_k}. We can construct a DFA D = (D, {a}, ◦, I, U) by the classical determinization:

D = { {i_1 · a^c, ..., i_k · a^c} | c ≥ 0 },
{i_1 · a^c, ..., i_k · a^c} ◦ a = {i_1 · a^{c+1}, ..., i_k · a^{c+1}},

and q ∈ U if and only if q ∩ T ≠ ∅. Clearly, L(D) = L(A) and |D| ≤ |P|.

Step 3. For an arbitrary A, let D_1 = (D_1, {a}, ·, i_1, U_1), ..., D_ℓ = (D_ℓ, {a}, ·, i_ℓ, U_ℓ) be the DFA’s constructed in Step 2 from its components A_1, ..., A_ℓ obtained in Step 1. We construct D = (D, {a}, ◦, (i_1, ..., i_ℓ), U) as the classical product of D_1, ..., D_ℓ:

D = { (i_1 · a^c, ..., i_ℓ · a^c) | c ≥ 0 },
(i_1 · a^c, ..., i_ℓ · a^c) ◦ a = (i_1 · a^{c+1}, ..., i_ℓ · a^{c+1}),

and (p_1, ..., p_ℓ) ∈ U if and only if there exists j ∈ {1, ..., ℓ} such that p_j ∈ U_j.

Clearly, L(D) = L(D_1) ∪ ··· ∪ L(D_ℓ) and |D| ≤ |D_1| ··· |D_ℓ| ≤ |P_1| ··· |P_ℓ|. Since the geometric mean of positive numbers x_1, ..., x_ℓ is bounded by their arithmetic mean, the product x_1 ··· x_ℓ is bounded by $\left(\frac{x_1+\cdots+x_\ell}{\ell}\right)^{\ell}$. Thus

$$|D| \le \left(\frac{|P_1|+\cdots+|P_\ell|}{\ell}\right)^{\ell} \le \left(\frac{n}{\ell}\right)^{\ell}.$$

To show that the bound in Theorem 1 is sharp we will need the following.

Result 4 (communicated by R. Kučera [5]) For each integer k ≥ 2 and each constant α < 1, there exist pairwise relatively prime positive integers x_1, ..., x_k such that

$$x_1 \cdots x_k \ge \alpha \cdot \left(\frac{x_1+\cdots+x_k}{k}\right)^{k}.$$

Proof. Let p_1 < p_2 < ... be the increasing sequence of all primes. Fix a positive integer s (its role will be clarified later). Put

x_i = (p_k!)^s + p_i for i = 1, ..., k.

We show that x_1, ..., x_k are pairwise relatively prime. Suppose that a prime p divides both x_i and x_j, where 1 ≤ i < j ≤ k. Then p | (x_j − x_i) = p_j − p_i. Thus consecutively: p < p_k, p | p_k!, p | p_i, p | p_j, p = p_i, p = p_j, which contradicts i < j. Now

$$\frac{\sqrt[k]{x_1 \cdots x_k}}{\frac{x_1+\cdots+x_k}{k}} \ge \frac{x_1}{\frac{x_1+\cdots+x_k}{k}} = \frac{(p_k!)^s+2}{(p_k!)^s+\frac{p_1+\cdots+p_k}{k}},$$

which tends to 1 as s goes to infinity. Hence the k-th power of the left-hand side exceeds α for s large enough.

Theorem 2 The bound in Theorem 1 is sharp.

Proof. Using a bit of calculus, one gets that, for n ≥ ke (where e is Euler’s number),

$$\max\{\, n, (n/2)^2, \dots, (n/k)^k \,\} = (n/k)^k.$$

We will construct a sequence A_1, A_2, ... of automata from M_k with ke ≤ n_1 < n_2 < ... states for which the numbers m_1, m_2, ... of states in the minimal DFA’s for L(A_1), L(A_2), ... are such that for each α < 1 it holds

$$(\exists\, i \in \mathbb{N})\quad m_i \ge \alpha \cdot \left(\frac{n_i}{k}\right)^{k}.$$

For each sequence x_1, ..., x_k from the proof of Result 4 we construct the automaton A ∈ M_k with n = x_1 + ··· + x_k states as follows: A is the disjoint union of the components

( {0, 1, ..., x_i − 1}, {a}, +1 (mod x_i), {0}, {0, 2, 3, ..., x_i − 1} ), i = 1, ..., k.

Using the product construction as in the proof of Theorem 1 we get (up to isomorphism)

D = ( {0, 1, ..., x_1 ··· x_k − 1}, {a}, +1 (mod x_1 ··· x_k), {0}, {0, 2, 3, ..., x_1 ··· x_k − 1} ).

Both A and D accept exactly those a^c, c a non-negative integer, where c ≢ 1 (mod x_1 ··· x_k). Clearly, the automaton D is a minimal complete DFA. Since D has x_1 ··· x_k states, Result 4 yields the required inequality.
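The construction behind Result 4 and Theorem 2 can be explored numerically. The Python sketch below is an illustration only; the function names are hypothetical, and the size of the minimal DFA is computed under the assumption, valid for the purely periodic languages used here, that it equals the least period of the acceptance sequence.

```python
from fractions import Fraction
from functools import reduce
from itertools import combinations
from math import factorial, gcd, prod


def kucera_sequence(k, s):
    """x_i = (p_k!)^s + p_i for the first k primes p_1 < ... < p_k."""
    primes, candidate = [], 2
    while len(primes) < k:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    base = factorial(primes[-1]) ** s
    return [base + p for p in primes]


def am_gm_ratio(xs):
    """Exact value of x_1···x_k / ((x_1+···+x_k)/k)^k."""
    k = len(xs)
    return Fraction(prod(xs)) / Fraction(sum(xs), k) ** k


def minimal_dfa_size(xs):
    """States of the minimal complete DFA for
    { a^c : c mod x_i != 1 for some i }, i.e. the least period
    of the (purely periodic) acceptance sequence."""
    period = reduce(lambda u, v: u * v // gcd(u, v), xs)
    accepted = [any(c % x != 1 for x in xs) for c in range(period)]
    for d in range(1, period + 1):
        if period % d == 0 and all(
                accepted[c] == accepted[c % d] for c in range(period)):
            return d


xs = kucera_sequence(3, 1)      # built from the primes 2, 3, 5 and 5! = 120
coprime = all(gcd(x, y) == 1 for x, y in combinations(xs, 2))
r1 = am_gm_ratio(kucera_sequence(3, 1))
r2 = am_gm_ratio(kucera_sequence(3, 2))
size = minimal_dfa_size(kucera_sequence(2, 1))   # components of sizes 8 and 9
```

For k = 2 and s = 1 the components have 8 and 9 states, so n = 17, and the minimal DFA has 72 states, to be compared with (17/2)² = 72.25. The ratios `r1` and `r2` illustrate how the quotient approaches 1 as s grows.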



4. Minimalization of group languages

Our view of universal automata follows Polák [10]. Let L ⊆ A* be a regular language. We put

D = { u⁻¹L | u ∈ A* },
U = { u_1⁻¹L ∩ ··· ∩ u_k⁻¹L | k ≥ 0, u_1, ..., u_k ∈ A* }.

Classically, one assigns to L its (canonical) minimal DFA D = (D, A, ·, L, F) where D is the (finite) set of states, a ∈ A acts on u⁻¹L by (u⁻¹L) · a = a⁻¹(u⁻¹L), L is the initial state, and q ∈ D is a terminal state (i.e., an element of F) if and only if 1 ∈ q, where 1 denotes the empty word. The universal automaton of a language L is a (non-deterministic) automaton U = (U, A, E, I, T) where (p, a, q) ∈ E if and only if q ⊆ a⁻¹p, q ∈ U is an element of I if and only if q ⊆ L, and q ∈ T if and only if 1 ∈ q.

Result 5 (Arnold, Dicky and Nivat [1], Carrez [2]) Let U = (U, A, E, I, T) be the universal automaton of a regular language L over an alphabet A. Then
(i) U accepts L,
(ii) for each non-deterministic automaton V = (V, A, G, J, W) accepting a subset of L, the mapping

φ : p ↦ ⋂ { u⁻¹L | p is reachable from an initial state by a path labeled by u }

is an automaton homomorphism of V into U,
(iii) for each q ∈ U, we have L_q(U) = q.

A regular language L ⊆ A* is called a group language if its syntactic monoid is a group. Strongly connected components of an NFA A considered as graphs are called balls of A.

Result 6 (Lombardy, Sakarovitch [8], Prop. 1, proof of Th. 4) Let L ⊆ A* be a group language. Then
(i) The balls of the universal automaton U of L are multiple entry complete deterministic automata.
(ii) Let V = (V, A, G, J, W) be an NFA accepting L and let φ be as above. Then, for each g from the image of L in the syntactic group of L, there exists p ∈ V such that φ restricted to the ball V_g of V determined by p is a surjective homomorphism onto the ball B_g of U determined by φ(p), and B_g as a subautomaton of U accepts all words from A* which are mapped by the syntactic homomorphism to g.

The following is an immediate consequence.

Theorem 3 For a group language L there exist balls B_1, ..., B_l of U such that the union B of B_1, ..., B_l considered as a subautomaton of U forms a minimal NFA for L.


Proof. Let V = (V, A, G, J, W) be a minimal NFA for L. The homomorphism φ is injective. Let B_1, ..., B_l be the balls of U which we get when g runs through the image of L in the syntactic group of L. Due to the minimality of V, the mapping φ is a bijection of V onto the set of states of B. Finally, B considered as a subautomaton of U accepts L due to Result 6 (ii).

Examples 1 and 2 Particularly nice examples are Example 2 of [7] and the example from Proposition 8 of [3]. In the first case, the two balls are of the form

({0, 1, 2}, {a, b}, ·, 0, {1}) and ({0′, 1′}, {a, b}, ·, 0′, {1′})

where p · a = p + 1, p · b = p − 1 (mod 3) and p′ · a = p′ · b = (p + 1)′ (mod 2). In the second case the only ball is of the form:

         ↓↑   ↓          ↓
         0    1    ...   ⌈n/2⌉−1   ⌈n/2⌉     ...   n−2   n−1
    a    1    2    ...   ⌈n/2⌉     ⌈n/2⌉+1   ...   n−1   0
    b    0    n−1  ...   ⌈n/2⌉−2   ⌈n/2⌉−1   ...   n−3   n−2

(The meaning of the arrows is that 0, 1, ..., ⌈n/2⌉ − 1 are initial states and 0 is the only terminal one.)

5. Other multiple entry DFA’s which are minimal NFA’s

In [4] the authors used the automata A_{k,n} ∈ M_k, 1 ≤ k ≤ n:

         ↓    ↓          ↓                       ↑
         0    1    ...   k−1   k     ...   n−2   n−1
    a    1    2    ...   k     k+1   ...   n−1   0
    b    0    1    ...   k−1   k     ...   n−2   0
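The automata A_{k,n} are small enough to experiment with. The following Python sketch builds A_{k,n} and tests two properties that serve as hypotheses in the theorem below: every singleton occurs among the subsets reached by determinization, and every state accepts some word that no other state accepts. The encoding and all function names are assumptions of this illustration, and so is the reading of the table in which 0, ..., k − 1 are the initial states and n − 1 is the only terminal state.

```python
def make_A(k, n):
    """A_{k,n}: a is the cycle p -> p+1 (mod n); b fixes every state
    except n-1, which it sends to 0.  Assumed reading of the table:
    initial states 0..k-1, terminal state n-1."""
    delta = {}
    for p in range(n):
        delta[p, "a"] = (p + 1) % n
        delta[p, "b"] = 0 if p == n - 1 else p
    return delta, frozenset(range(k)), frozenset({n - 1})


def reachable_subsets(delta, initial):
    """All sets I·u, u a word (the states of det A)."""
    seen, todo = {initial}, [initial]
    while todo:
        s = todo.pop()
        for x in "ab":
            t = frozenset(delta[p, x] for p in s)
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen


def accepting_sets(n, delta, terminal):
    """All sets { p : u in L_p(A) }, u a word, computed backwards from T."""
    seen, todo = {terminal}, [terminal]
    while todo:
        s = todo.pop()
        for x in "ab":
            # States whose x-successor accepts the shorter word.
            t = frozenset(p for p in range(n) if delta[p, x] in s)
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen


def satisfies_hypotheses(k, n):
    delta, initial, terminal = make_A(k, n)
    singletons = {frozenset({p}) for p in range(n)}
    return (singletons <= reachable_subsets(delta, initial)
            and singletons <= accepting_sets(n, delta, terminal))


checks = [satisfies_hypotheses(k, n) for k, n in [(1, 3), (2, 4), (3, 5), (4, 4)]]
```

Both searches are exhaustive over at most 2^n subsets, so the check is only practical for small n.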

When discussing those automata we observed [6] that they are minimal in the class of all NFA’s. A generalization follows.

Theorem 4 Let A = (P, A, ·, I, T) ∈ M_k be such that
(i) by the determinization of A we get all the singletons (i.e., one element sets),
(ii) for each p ∈ P, there exists u ∈ A* such that

u ∈ L_p(A) \ ⋃ { L_q(A) | q ∈ P, q ≠ p }.

Then the automaton A is a minimal NFA for L(A).

Proof. (Suggested by a referee of the conference version of this paper.) Let P = {p_1, ..., p_n} and let u_i, v_i ∈ A*, i = 1, ..., n, be such that

I · u_i = {p_i} in det A and v_i is accepted only from p_i.



Then u_i v_j ∉ L(A) if and only if i ≠ j. Let B be an NFA with L(B) = L(A). For each i, choose a state s_i of B that lies on a successful path labeled by u_i v_i and is reached after reading u_i. If s_i = s_j for some i ≠ j, then B would accept u_i v_j, a contradiction. Hence s_1, ..., s_n are pairwise distinct and B has at least n states.

Example 3 Taking in A_{ℓ,n} any non-empty set of states as the initial ones we get automata to which our theorem applies.

Remark 1 For the automata from Sections 4 and 5 we can construct a minimal NFA as follows: Decompose L in U into a union p_1 ∪ ··· ∪ p_ℓ of states of U and form P ⊆ U as the set of all derivatives of p_1, ..., p_ℓ. Output U_P (see [10]). More precisely, each such U_P accepts L and at least one of them is a minimal NFA for L.

6. Decompositions of multiple entry DFA’s

Let A_1 ⊔ ··· ⊔ A_m denote the disjoint union of the automata A_1, ..., A_m over a common alphabet. Notice that L(A_1 ⊔ ··· ⊔ A_m) = L(A_1) ∪ ··· ∪ L(A_m). An automaton A = (P, A, ·, I, T) ∈ M is called non-trivial if |P| ≥ 2. A congruence ϱ on A is non-trivial if ϱ ≠ ∆_P (the diagonal relation { (p, p) | p ∈ P } on P). A non-trivial automaton A ∈ M is called reducible if there exists a system (ϱ_1, ..., ϱ_m) of non-trivial congruences on A satisfying the condition

L(A) = L(A/ϱ_1 ⊔ ··· ⊔ A/ϱ_m).   (∗)

In the opposite case we speak about an irreducible automaton.

Remark 2 Even the case m = 1 is meaningful: it occurs, for instance, when µ_A ≠ ∆_P.

Theorem 5 For each non-trivial automaton A = (P, A, ·, I, T) ∈ M there exists a system of congruences (ϱ_1, ..., ϱ_m) on A such that L(A) = L(A/ϱ_1 ⊔ ··· ⊔ A/ϱ_m) and A/ϱ_1, ..., A/ϱ_m are irreducible.

Proof. For an irreducible A take m = 1, ϱ_1 = ∆_P. Otherwise, let (ϱ_1, ..., ϱ_m) be a system of non-trivial congruences on A satisfying the condition (∗). Some of A/ϱ_1, ..., A/ϱ_m may be further reducible. For the sake of simplicity we consider only the last summand. So let (σ_1, ..., σ_r) be a system of non-trivial congruences on A/ϱ_m satisfying the condition (∗). For s = 1, ..., r, we define the relation τ_s on P by

p τ_s q if and only if (pϱ_m) σ_s (qϱ_m).

Then (ϱ_1, ..., ϱ_{m−1}, τ_1, ..., τ_r) is a system of non-trivial congruences on A with

L(A) = L(A/ϱ_1 ⊔ ··· ⊔ A/ϱ_{m−1} ⊔ A/τ_1 ⊔ ··· ⊔ A/τ_r).

Due to the finiteness of P, the process terminates.

Remark 3 For congruences σ, ϱ on A with ϱ ⊆ σ, we have L(A/ϱ) ⊇ L(A/σ). Consequently, when testing the irreducibility of A, it is enough to check the condition (∗) only for the atoms ϱ_1, ..., ϱ_m of the lattice of all congruences of A.


Remark 4 Let ϱ be a congruence of A ∈ M. If u, v ∈ A* determine the same transformation of the states of A, then the same is true also for the following automata: det A, (det A)/µ_{det A}, A/ϱ, det(A/ϱ), (det(A/ϱ))/µ_{det(A/ϱ)}. This can be used for checking the condition (∗).

Example 5 Consider again Example 1 with initial states 0 and 0′ and terminal ones 1 and 1′.

Example 6 Consider a complete DFA A over the alphabet A = {a, b}:

[Figure: transition diagram of A with states 0, 1, 2, 3, 4 and edges labeled a and b.]

Clearly, the equivalences ϱ, σ on the set P = {0, ..., 4} given by

P/ϱ = { {0}, {1, 2}, {3, 4} },   P/σ = { {0}, {1, 4}, {2, 3} }

are congruences on A. The quotient automata look as follows:

[Figure: transition diagrams of the quotient automata A/ϱ, with states 0, {1, 2}, {3, 4}, and A/σ, with states 0, {1, 4}, {2, 3}.]

Computing the transformation monoid of A we can see that exactly (modulo the syntactic congruence) a², ab, ba, b², a²b, bab are accepted by A. Each of these words is accepted by at least one of A/ϱ and A/σ. Thus L(A) = L(A/ϱ ⊔ A/σ). Moreover, both A/ϱ and A/σ are irreducible and A is the product of A/ϱ and A/σ.

The point of this example is that A is a minimal NFA. We can verify this by first computing the so-called basic and universal matrices of A (see [10]). Then we can see that there exist no 4 columns in the universal matrix such that each column of the basic matrix is expressible as a union of some of those columns. Finally, we use Theorem 1 of [10], part (A) ⇒ (W).



7. Decompositions in case of a single letter alphabet

In this case the computation of congruences and the search for decompositions into irreducibles are very transparent. A DFA A is of type (k, d), where k ≥ 0 and d ≥ 1, if it is of the form

A = ( {0′, ..., (k − 1)′, 0, ..., d − 1}, {a}, ·, i, T )

with 0′ · a = 1′, ..., (k − 1)′ · a = 0, 0 · a = 1, ..., (d − 1) · a = 0, where i = 0 if k = 0 and i = 0′ otherwise. We write A ∈ A(k, d).

Congruences of an A ∈ A(k, d) correspond to the pairs (ℓ, e), where 0 ≤ ℓ ≤ k and e divides d. More precisely: we first identify i and j if and only if i ≡ j (mod e), i, j ∈ {0, ..., d − 1}, to get (after a renaming of states) an automaton from A(k, e); if ℓ < k, we then identify (k − 1)′ and d − 1 to get (after a renaming of states) an automaton from A(k − 1, e). We repeat the last step k − ℓ times to get an automaton A/ϱ_{ℓ,e} ∈ A(ℓ, e).

Example 7 Let A ∈ A(2, 6) be with T = {1′, 0, 2, 3, 4}. We can easily see that the irreducible quotient automata of A are exactly A_1 = A/ϱ_{2,1}, A_2 = A/ϱ_{1,2}, A_3 = A/ϱ_{0,6} and A_4 = A/ϱ_{0,3}. The word a is accepted only by A_1, the word a⁶ is accepted only by A_2. Since L(A_1 ⊔ A_2 ⊔ A_4) = L(A) and we need all the summands, this decomposition can be considered as the “minimalization” of A in the class M.
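The claims of Example 7 can be checked on words up to a fixed length. The Python sketch below is an illustration only; it encodes the primed state i′ as `("tail", i)` and the state j as `("cycle", j)`, and it uses one possible explicit description of the classes of ϱ_{ℓ,e}, namely that i′ with i ≥ ℓ falls into the class of the cycle states congruent to i − k modulo e. The function names and the length bound `N` are assumptions of this sketch.

```python
def state_after(c, k, d):
    """State of A in A(k, d) reached from the initial state by a^c."""
    return ("tail", c) if c < k else ("cycle", (c - k) % d)


def cls(state, k, l, e):
    """Class of a state under the congruence for the pair (l, e)."""
    kind, i = state
    if kind == "tail":
        # Tail states i' with i >= l are merged into the cycle.
        return ("tail", i) if i < l else ("cycle", (i - k) % e)
    return ("cycle", i % e)


def quotient_language(k, d, terminal, l, e, bound):
    """Lengths c < bound such that a^c is accepted by the quotient."""
    states = ([("tail", i) for i in range(k)]
              + [("cycle", j) for j in range(d)])
    members = {}
    for s in states:
        members.setdefault(cls(s, k, l, e), set()).add(s)
    # A class is terminal only if all of its states are terminal.
    good = {name for name, m in members.items() if m <= terminal}
    return {c for c in range(bound)
            if cls(state_after(c, k, d), k, l, e) in good}


# Example 7: A in A(2, 6) with T = {1', 0, 2, 3, 4}.
T = {("tail", 1)} | {("cycle", j) for j in (0, 2, 3, 4)}
N = 60                                     # length bound (assumption)
L = quotient_language(2, 6, T, 2, 6, N)    # identity congruence: L(A)
L1 = quotient_language(2, 6, T, 2, 1, N)
L2 = quotient_language(2, 6, T, 1, 2, N)
L3 = quotient_language(2, 6, T, 0, 6, N)
L4 = quotient_language(2, 6, T, 0, 3, N)
```

Up to the bound, `L1 | L2 | L4` coincides with `L`, while omitting any one of the three summands loses a word (a, a⁶ or a⁵, respectively). Since all the languages involved are ultimately periodic with a short pre-period, a modest bound already gives a meaningful check, though it is not a proof.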

Example 8 Let A ∈ A(2, 6) be with T = {0′, 1, 2, 3, 5}. We see that the irreducible quotient automata of A are exactly A_1 = A/ϱ_{2,2}, A_2 = A/ϱ_{2,3}, A_3 = A/ϱ_{0,6} and A_4 = A/ϱ_{1,1}. We have three candidates for the “minimalization” of A, namely A_1 ⊔ A_2, A_1 ⊔ A_3, A_2 ⊔ A_3. The first one can be considered as the best one.

Acknowledgments

The author wishes to express his gratitude to the anonymous referee whose numerous remarks are used in the text.

References

[1] A. Arnold, A. Dicky and M. Nivat, A note about minimal non-deterministic automata, Bull. EATCS 47 (1992), 166–169.
[2] C. Carrez, On the minimalization of non-deterministic automaton, Laboratoire de Calcul de la Faculté des Sciences de l’Université de Lille, 1970.


[3] F. Denis, A. Lemay and A. Terlutte, Residual finite state automata, Fundam. Inform. 51, No. 4 (2002), 339–368.
[4] M. Holzer, K. Salomaa and S. Yu, On the state complexity of k-entry deterministic finite automata, Journal of Automata, Languages and Combinatorics 6 (2001), 453–466.
[5] R. Kučera, A personal communication, Brno, March 2005.
[6] O. Klíma and R. Rožník, A personal communication, Brno, March 2005.
[7] S. Lombardy and J. Sakarovitch, On the star height of rational languages. A new presentation for two old results, in Proc. 3rd Int. Coll. on Words, Languages and Combinatorics, World Scientific, 2003, 266–285.
[8] S. Lombardy and J. Sakarovitch, Star height of reversible languages and universal automata, in Proc. LATIN 2002, Springer Lecture Notes in Computer Science 2286, 2002, 76–90.
[9] A. Malcher, Minimizing finite automata is computationally hard, in Proc. DLT 2003, Springer Lecture Notes in Computer Science 2710, 2003, 386–397.
[10] L. Polák, Minimalizations of NFA using the universal automaton, Int. Journal of Found. of Computer Science 16, No. 5 (2005), 999–1010.