Finite Fields and Propositional Proof Systems

Michael Soltys
Department of Computing and Software, McMaster University
1280 Main Street West, Hamilton, Ontario L8S4K1, CANADA

Abstract

Propositional proof complexity is a well-established area of theoretical computer science, intimately connected with complexity theory and automated reasoning. In this paper we introduce a proof system, which we call A (a fragment of the theory LA), for formalizing algebraic proofs over arbitrary fields. We show how to translate formulas and proofs over A into propositional proofs, and conclude that properties of finite fields can be formalized with short Frege proofs when the depth is small.

Keywords: Propositional proof complexity, Frege systems, finite fields.

1. INTRODUCTION

Proof Complexity is the area of Theoretical Computer Science concerned with the length of formal derivations. It is intimately related to Complexity Theory and Automated Theorem Proving. For example, it is a well-known result of [2] that there exists a proof system in which every tautology has a proof of size polynomial in the length of the tautology if and only if NP = co-NP. On the other hand, an exponential explosion in the length of proofs is a fundamental obstacle to efficient theorem provers. For example, the shortest resolution proofs of the family of tautologies that formalize the Pigeonhole Principle have exponential size, and hence are infeasible (see [1]).

In [6] and [3] it has been shown that the main universal principles of linear algebra (such as the Cayley-Hamilton Theorem or the multiplicativity of the determinant) have efficient proofs in Extended Frege. Extended Frege is a standard propositional proof system which can be thought of as reasoning with polynomial size circuits. If a family of tautologies has uniform polynomial size proofs in Extended Frege, it is considered to have feasible proofs.
Research supported by the Natural Sciences and Engineering Research Council of Canada.

The results in [6] and [3] are field independent (i.e., they hold over any field), but the underlying field must be fixed when translating these results into propositional proofs. These translations have a dual purpose: to demonstrate the feasibility of the reasoning involved, and to provide potential tools for independence results. The translations over the field of two elements are the easiest to formalize with boolean logic; in this paper we show how to formalize algebraic reasoning with boolean logic over arbitrary finite fields. We restrict ourselves to field properties (rather than, for example, matrix properties), as it is there that the main issues with boolean finite field representations arise.

We introduce a new proof system, which we call A, for formalizing algebraic proofs over arbitrary fields. A is technically a quantifier-free, first order theory, with a finite set of axiom schemes, but it is more akin to a propositional proof system. A is a fragment of a more general theory, designed for reasoning about linear algebra, called LA (see [6]).

In the next few paragraphs we give a quick introduction to propositional proof systems. For a complete exposition see [7].

We use $a, b, c, \ldots$ to denote field as well as boolean variables, and $\alpha, \beta, \gamma, \ldots$ to denote algebraic as well as boolean formulas. We use the logical connectives $\{\neg, \lor, \land, \supset, \leftrightarrow\}$. We include the propositional constants 0 and 1 meaning "false" and "true," respectively. If $\alpha$ is a formula and $a_1, \ldots, a_m$ a sequence of variables, then we write $\alpha[\beta_1/a_1, \ldots, \beta_m/a_m]$ for the formula resulting from $\alpha$ by substituting $\beta_1, \ldots, \beta_m$ for $a_1, \ldots, a_m$.

A Frege rule is defined to be a sequence of formulas written in the form $\alpha_1, \ldots, \alpha_k \vdash \alpha_0$. In the case that the sequence $\alpha_1, \ldots, \alpha_k$ is empty, the rule is referred to as an axiom scheme. The rule is sound if $\alpha_1, \ldots, \alpha_k \models \alpha_0$, that is, if every truth-value assignment satisfying $\alpha_1, \ldots, \alpha_k$ also satisfies $\alpha_0$.
If $\alpha_1, \ldots, \alpha_k \vdash \alpha_0$ is a Frege rule, then $\varphi_0$ is inferred from $\varphi_1, \ldots, \varphi_k$ by this rule if there is a sequence of formulas $\beta_1, \ldots, \beta_m$ and variables $a_1, \ldots, a_m$ so that for all $i$, $0 \le i \le k$, $\varphi_i = \alpha_i[\beta_1/a_1, \ldots, \beta_m/a_m]$.

If $\mathcal{F}$ is a set of Frege rules and $\varphi$ a formula, then a proof of $\varphi$ in $\mathcal{F}$ from $\psi_1, \ldots, \psi_m$ is a finite sequence of formulas such that every formula in the sequence is one of $\psi_1, \ldots, \psi_m$ or inferred from earlier formulas in the sequence by a rule in $\mathcal{F}$, and the last formula is $\varphi$. The formulas in the sequence are the lines in the proof.

If $\mathcal{F}$ is a set of Frege rules, then it is implicationally complete if whenever $\psi_1, \ldots, \psi_m \models \psi_0$ then there is a proof of $\psi_0$ in $\mathcal{F}$ from $\psi_1, \ldots, \psi_m$. A Frege system is defined to be a finite set of sound Frege rules that is implicationally complete.

Example 1.1 Shoenfield's system [5, p. 21], in which the primitive connectives are $\{\neg, \lor\}$ (which is a complete set of connectives), has the following set of rules:

$\vdash \neg a \lor a$ (excluded middle)
$a \vdash b \lor a$ (weakening)
$a \lor a \vdash a$ (contraction)
$a \lor (b \lor c) \vdash (a \lor b) \lor c$ (associative rule)
$a \lor b,\ \neg a \lor c \vdash b \lor c$ (cut rule)
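Soundness of the five rules listed above can be checked mechanically by a truth table. The following is a minimal sketch, not part of the original system: the names `RULES` and `is_sound` are hypothetical, and each rule is checked on the variables $a, b, c$ only (soundness of substitution instances follows, since a truth assignment to an instance induces one to the variables).

```python
from itertools import product

# Each rule is (name, premises, conclusion); formulas are functions of (a, b, c).
# RULES and is_sound are illustrative names, not taken from the paper.
RULES = [
    ("excluded middle", [], lambda a, b, c: (not a) or a),
    ("weakening", [lambda a, b, c: a], lambda a, b, c: b or a),
    ("contraction", [lambda a, b, c: a or a], lambda a, b, c: a),
    ("associative rule", [lambda a, b, c: a or (b or c)],
     lambda a, b, c: (a or b) or c),
    ("cut rule", [lambda a, b, c: a or b, lambda a, b, c: (not a) or c],
     lambda a, b, c: b or c),
]

def is_sound(premises, conclusion):
    """True if every assignment satisfying all premises satisfies the conclusion."""
    for values in product([False, True], repeat=3):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False
    return True

if __name__ == "__main__":
    for name, premises, conclusion in RULES:
        print(name, is_sound(premises, conclusion))
```

A rule such as $a \lor b \vdash a$ is rejected by the same check, which is the kind of rule a Frege system must exclude.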

Shoenfield's system is sound and implicationally complete for the set of boolean formulas over the connectives $\{\neg, \lor\}$.

2. THE PROOF SYSTEM A

We define a new proof system, which we call A ("A" for Algebra), and which is designed to reason about fields. Technically, A is a quantifier-free first order logical theory, but in practice it is more akin to a propositional proof system. We shall show that the theorems of A are intimately related to propositional tautologies with short Frege proofs. However, the great advantage of A over propositional proof systems is the clarity of expression of properties of fields; the same properties must be encoded, and hence become less readable, in the context of propositional formulas.

We denote the variables of A by $a, b, c, \ldots$. We define the terms of A (A-terms) by structural induction. Basis case: the variables $a, b, c, \ldots$ are terms, and the field elements 0, 1 (which are in every field) are terms. Induction step: if $t, u$ are terms, then so are $(t + u)$, $(t \cdot u)$, $(-t)$, $(t^{-1})$. Note that for readability we will sometimes omit parentheses when it is clear from the context what the terms are, and we may also omit "$\cdot$" and simply write $tu$ instead of $t \cdot u$.

The formulas of A (A-formulas) are likewise defined by structural induction. Basis case: if $t, u$ are terms, then $t = u$ is an atomic formula (and $t \ne u$ denotes the negation of the atomic formula $t = u$, i.e., $\neg(t = u)$). Induction step: if $\alpha, \beta$ are formulas, then so are $\neg\alpha$, $\alpha \lor \beta$, $\alpha \land \beta$, $\alpha \supset \beta$, $\alpha \leftrightarrow \beta$. (Note that if we wanted to restrict ourselves to the basis $\{\neg, \lor\}$, then we could have defined the formulas $\alpha \land \beta$, $\alpha \supset \beta$ and $\alpha \leftrightarrow \beta$ as abbreviations for $\neg(\neg\alpha \lor \neg\beta)$, $\neg\alpha \lor \beta$ and $\neg(\neg(\neg\alpha \lor \beta) \lor \neg(\neg\beta \lor \alpha))$, respectively.)

If $\alpha$ is a formula in the language of A, and $a_1, a_2, \ldots, a_k$ is a sequence of variables, then we write $\alpha[t_1/a_1, \ldots, t_k/a_k]$ for the formula resulting from $\alpha$ by substituting the terms $t_1, \ldots, t_k$ for $a_1, \ldots, a_k$.

The rules of A are any complete set of Frege rules over the connectives $\{\neg, \lor, \land, \supset, \leftrightarrow\}$. Since any two Frege systems are p-equivalent, even over different sets of connectives (see [2]), we can use all the rules that we want (as long, of course, as we fix a finite number of them at the beginning). In particular, we will include the following two rules: $\land$-intro, $a, b \vdash a \land b$, and modus ponens, $a,\ a \supset b \vdash b$ (which is just a variant of the cut rule). As in Frege systems, $a, b$ can be replaced by arbitrary formulas of A.

Finally, A has two sets of axiom schemes: field axioms and equality axioms. We call these axioms "schemes" because they are meant to be closed under substitution of terms for variables. For example, the axiom scheme $0 + a = a$ is really a template for all the axioms of the form $0 + t = t$, for any term $t$. The field axiom schemes are:

A1 $0 + a = a$
A2 $1 \cdot a = a$
A3 $a + b = b + a$
A4 $ab = ba$
A5 $a + (b + c) = (a + b) + c$
A6 $a(bc) = (ab)c$
A7 $a(b + c) = ab + ac$
A8 $a + (-a) = 0$
A9 $a \ne 0 \supset a a^{-1} = 1$

and the equality axiom schemes:

E1 $a = a$
E2 $a = b \supset b = a$
E3 $(a = b \land b = c) \supset a = c$
E4 $(a = b \land c = d) \supset (a + c = b + d)$
E5 $(a = b \land c = d) \supset (ac = bd)$
E6 $a = b \supset -a = -b$
E7 $a = b \supset a^{-1} = b^{-1}$
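To make the role of A9 concrete, the field axiom schemes can be checked by brute force over the integers modulo $n$. This is an illustrative sketch only: the function name `satisfies_field_axioms` is hypothetical, the inverse is found by exhaustive search, and the equality axioms E1-E7 are not checked because they hold in any structure where "=" is interpreted as true equality.

```python
from itertools import product

def satisfies_field_axioms(n):
    """Check A1-A9 by brute force in Z_n with arithmetic modulo n."""
    def add(x, y):
        return (x + y) % n

    def mul(x, y):
        return (x * y) % n

    def neg(x):
        return (-x) % n

    def inv(x):
        # Exhaustive search for a multiplicative inverse; None if there is none.
        return next((y for y in range(n) if mul(x, y) == 1), None)

    for a, b, c in product(range(n), repeat=3):
        checks = [
            add(0, a) == a,                                  # A1
            mul(1, a) == a,                                  # A2
            add(a, b) == add(b, a),                          # A3
            mul(a, b) == mul(b, a),                          # A4
            add(a, add(b, c)) == add(add(a, b), c),          # A5
            mul(a, mul(b, c)) == mul(mul(a, b), c),          # A6
            mul(a, add(b, c)) == add(mul(a, b), mul(a, c)),  # A7
            add(a, neg(a)) == 0,                             # A8
            a == 0 or inv(a) is not None,                    # A9
        ]
        if not all(checks):
            return False
    return True
```

For prime moduli such as 2, 3, 5 the check succeeds; for $n = 4$ it fails at A9 because 2 has no inverse modulo 4, while A1-A8 (the commutative ring axioms) still hold.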

Note that without A9 (and E7), the axiom scheme for multiplicative inverse, and without the function symbol "$^{-1}$", we would have a theory of commutative rings.

A proof of an A-formula $\alpha$ (A-proof) is a sequence of A-formulas $\{\alpha_1, \alpha_2, \ldots, \alpha_n\}$, where $\alpha_n$ is $\alpha$, and each $\alpha_i$, $1 \le i \le n$, is either an axiom, or results from the application of a Frege rule to previous formulas.

Example 2.1 We demonstrate the proof of right-distributivity $(b + c)a = ba + ca$ (the axiom scheme A7 is left-distributivity, hence right-distributivity must be derived). We present the proof in three columns for better readability (line number, formula, justification), but formally the proof is just the sequence of formulas in the middle column. If a given line is an axiom (A1-A9 or E1-E7) the "justification" states which, and if it is not an axiom, it shows by what rule and from which previous formula(s) the given line follows.

1. $ab = ba$ [A4]
2. $ac = ca$ [A4]
3. $ab = ba \land ac = ca$ [$\land$-intro from 1,2]
4. $(ab = ba \land ac = ca) \supset ab + ac = ba + ca$ [E4]
5. $ab + ac = ba + ca$ [modus ponens 3,4]
6. $a(b + c) = ab + ac$ [A7]
7. $(a(b + c) = ab + ac \land ab + ac = ba + ca) \supset a(b + c) = ba + ca$ [E3]
8. $a(b + c) = ab + ac \land ab + ac = ba + ca$ [$\land$-intro 5,6]
9. $a(b + c) = ba + ca$ [modus ponens 7,8]
10. $a(b + c) = (b + c)a$ [A4]
11. $a(b + c) = (b + c)a \supset (b + c)a = a(b + c)$ [E2]
12. $(b + c)a = a(b + c)$ [modus ponens 10,11]
13. $((b + c)a = a(b + c) \land a(b + c) = ba + ca) \supset (b + c)a = ba + ca$ [E3]
14. $(b + c)a = a(b + c) \land a(b + c) = ba + ca$ [$\land$-intro 12,9]
15. $(b + c)a = ba + ca$ [modus ponens 13,14]

Note that the justifications are not necessary in general because they can be computed in polynomial time in the length of the proof.

Finally, any field $F$ is a standard model for A. When we speak of standard models of A, we mean a fixed finite field, and the corresponding interpretations of $\{+, \cdot, -, {}^{-1}\}$.

3. TRANSLATIONS

Given a particular finite field $F$, we show how to translate efficiently A-formulas (and A-proofs) into propositional formulas (and proofs). If $\alpha$ is an A-formula, $\|\alpha\|_F$ denotes the translation of $\alpha$ over the field $F$. When the field $F$ is clear from the context, the subscript is omitted. Similarly, if $\pi$ is an A-proof, we denote the corresponding propositional proof by $\|\pi\|_F$.

The depth of an A-term $t$, $d(t)$, is its longest nesting of operations; if the term is represented as an arithmetic circuit, its depth is the length of a maximal branch. For example, the depth of the term $(a(b + c)) + (((ab) + c)d)$ is 4 (see Figure 1). The depth of an A-formula $\alpha$, $d(\alpha)$, is the maximum of the depths of all terms appearing in it.

[Figure 1: Arithmetic circuit representation of the A-term $(a(b + c)) + (((ab) + c)d)$, with a maximal path indicated with dotted lines.]

Let $s(\alpha)$ be the size of a formula $\alpha$ (A-formula or boolean formula), where the size is simply the number of symbols. Let $s(\pi)$ be the size of a proof $\pi$ (A-proof or propositional proof), and if $\pi$ is an A-proof, let $d(\pi)$ be the largest depth of any term appearing in $\pi$. For any finite field $F$, we want the translations to satisfy the following three properties:

1. Given any A-formula $\alpha$, the size of $\|\alpha\|_F$, that is $s(\|\alpha\|_F)$, is bounded by $s(\alpha) \cdot O(|F|)^{d(\alpha)}$.

2. If $\alpha$ is a true A-formula (in a standard model $F$) then the propositional formula $\|\alpha\|_F$ is a logical consequence of the propositional axioms which state the correctness of the encoding of field elements by boolean variables.

3. If $\alpha$ is an A-formula, and $\pi$ is its A-proof, then $\|\alpha\|_F$ has a Frege proof of size bounded by $s(\pi) \cdot O(|F|)^{d(\pi)}$.

We shall call these three conditions the conditions of correctness of translations.

Any two finite fields of the same size are isomorphic, and any finite field is either isomorphic to $\mathbb{Z}_p$ ($p$ prime), or to a field of size $p^r$, where $p$ is prime and $r > 1$ (note that $\mathbb{Z}_{p^r}$ is not a field). In the next sections we show how to translate over $\mathbb{Z}_2$, $\mathbb{Z}_p$, and $F$, where $|F| = p^r$.

Note that the exponential increase of the size of the boolean formula in the depth of the original A-formula could be avoided by introducing extension definitions. This, however, would no longer be Frege but rather Extended Frege type of reasoning.
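The depth measure that appears in the exponent of the size bounds can be computed by a short recursion. The sketch below is illustrative: the tuple representation of terms and the names `depth` and `formula_depth` are assumptions, and variables and constants are assigned depth 0, a convention chosen so that the example term $(a(b + c)) + (((ab) + c)d)$ gets depth 4 as stated in the text.

```python
# Terms (assumed representation): a variable is a string, a constant is 0 or 1,
# and compound terms are tuples ("+", t, u), ("*", t, u), ("neg", t), ("inv", t).
# Formulas: ("=", t, u), ("not", f), ("or", f, g), ("and", f, g),
# ("imp", f, g), ("iff", f, g).

def depth(t):
    """Longest nesting of operations in an A-term (variables and constants: 0)."""
    if isinstance(t, (str, int)):
        return 0
    return 1 + max(depth(arg) for arg in t[1:])

def formula_depth(f):
    """Maximum depth of the terms appearing in an A-formula."""
    if f[0] == "=":
        return max(depth(f[1]), depth(f[2]))
    return max(formula_depth(g) for g in f[1:])

# The term whose arithmetic circuit is described in the text.
EXAMPLE_TERM = ("+",
                ("*", "a", ("+", "b", "c")),
                ("*", ("+", ("*", "a", "b"), "c"), "d"))
```

On `EXAMPLE_TERM` the maximal branch runs through the outer sum, the product with $d$, the inner sum, and the product $ab$.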

3.1. Translations over $\mathbb{Z}_2$

The A-formulas over the field of two elements have natural translations into propositional formulas, since zero is represented by "false," and one by "true." We first show how to translate terms. We use the following set of connectives: $\{\neg, \lor, \land, \supset, \oplus, \leftrightarrow\}$. A variable $a$ is translated as follows: $\|a\| \Rightarrow a$. For general terms:

$$\|(t + u)\| \Rightarrow (\|t\| \oplus \|u\|)$$
$$\|(t \cdot u)\| \Rightarrow (\|t\| \land \|u\|)$$
$$\|(-t)\| \Rightarrow \|t\|$$
$$\|(t^{-1})\| \Rightarrow \|t\|$$

Note that $-t$ and $t^{-1}$ translate into $\|t\|$; this is because 1 is the only non-zero element in $\mathbb{Z}_2$, and 1 is its own additive and multiplicative inverse.

We translate atomic formulas (always given by $t = u$) as follows: $\|t = u\| \Rightarrow \|t\| \leftrightarrow \|u\|$. A general A-formula is a boolean combination of atomic formulas; the translation respects boolean connectives (and parentheses), that is $\|\neg\alpha\| \Rightarrow \neg\|\alpha\|$, and for any given connective $\circ \in \{\lor, \land, \supset, \leftrightarrow\}$, $\|(\alpha \circ \beta)\| \Rightarrow (\|\alpha\| \circ \|\beta\|)$.

Example 3.1 We demonstrate the translation of the axiom $(a(b + c)) = ((ab) + (ac))$:

$$\begin{aligned}
\|(a(b + c)) = ((ab) + (ac))\|
&\Rightarrow \|(a(b + c))\| \leftrightarrow \|((ab) + (ac))\| \\
&\Rightarrow (\|a\| \land \|(b + c)\|) \leftrightarrow (\|(ab)\| \oplus \|(ac)\|) \\
&\Rightarrow (a \land (\|b\| \oplus \|c\|)) \leftrightarrow ((\|a\| \land \|b\|) \oplus (\|a\| \land \|c\|)) \\
&\Rightarrow (a \land (b \oplus c)) \leftrightarrow ((a \land b) \oplus (a \land c))
\end{aligned}$$

The recursive procedure stops when we obtain a boolean formula. Note that over the field $\mathbb{Z}_2$, the result is a propositional tautology of the same size (where size as always is the number of symbols) as the original A-formula.

Lemma 3.2 The translation procedure over $\mathbb{Z}_2$ satisfies the three conditions of correctness.

PROOF: In effect, the translation procedure can be much simplified if we observe that the resulting boolean formula can be obtained from the original A-formula by replacing each $+, \cdot$ by $\oplus, \land$, respectively, by eliminating $-$ and ${}^{-1}$, and by preserving all the other symbols. Thus, $s(\alpha) \ge s(\|\alpha\|)$. Hence the first condition of correctness is satisfied.

Over $\mathbb{Z}_2$ there are no axioms stating the correctness of the encoding of field elements, since we translate a field variable $a$ directly into a boolean variable $a$. Hence, to show the second condition of correctness, we must show that given a true A-formula $\alpha$, $\|\alpha\|$ is a tautology. As boolean connectives are translated verbatim, it is enough to consider true atomic formulas $t = u$. We want to show that if $t = u$ is true, then $\|t = u\|$ is a tautology. Let $\tau : \{\text{field variables}\} \to \{0, 1\}$ be a value assignment; given a particular $\tau$ and an A-term $t$, we have $\tau(t) \in F = \mathbb{Z}_2$, where $\tau$ is extended to terms in the obvious way (that is, $\tau(t + u) = \tau(t) + \tau(u)$, etc.). It is clear that $\tau$ can also be viewed as a truth assignment to boolean formulas (over the field $\mathbb{Z}_2$). Hence, we show the following: given an A-term $t$, for any $\tau$, $\tau(t) = 1$ iff $\tau(\|t\|) = T$. This follows immediately by structural induction on A-terms. If $t = u$ is true, then $\tau(t) = \tau(u)$ for every $\tau$, so $\|t\|$ and $\|u\|$ receive the same truth value under every assignment and $\|t\| \leftrightarrow \|u\|$ is a tautology. This proves the second condition of correctness.

Finally, to show that the third condition of correctness also holds, suppose that $\alpha$ is an A-formula, with an A-proof $\pi = \{\alpha_0, \alpha_1, \ldots, \alpha_n\}$, $\alpha_n = \alpha$. If $\alpha_i$ follows from $\alpha_{i_1}, \alpha_{i_2}, \ldots, \alpha_{i_k}$, with $i_1, i_2, \ldots, i_k < i$, by a Frege rule, then $\|\alpha_i\|$ follows from the $\|\alpha_{i_j}\|$'s by the same Frege rule. If $k = 0$, then $\alpha_i$ is an axiom. To deal with this case, we use a trick: recall that all Frege systems are p-equivalent (the famous result of [2], which states that given two different Frege systems, with different rules and even different connectives, we can transform in polynomial time the proofs in one system into proofs in the other). Thus, we stipulate that our Frege system has the translations of A1-A9 and E1-E7 as axioms. Thus, if $a + b = b + a$ is $\alpha_i$ in $\pi$, then we replace it by $(a \oplus b) \leftrightarrow (b \oplus a)$ in $\|\pi\|$. The new Frege system, with the translations of algebraic axioms as its own axioms, is p-equivalent to any other Frege system. All that remains is to show that the translations of the algebraic axioms are tautologies (i.e., sound Frege rules). This follows directly from the second condition of correctness, as the axioms are true in any standard model. Note that $s(\pi) \ge s(\|\pi\|)$ (the size of $\pi$ may be slightly bigger as it has $-$ and ${}^{-1}$, which are not present in $\|\pi\|$). This proves the third condition of correctness.
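The translation over $\mathbb{Z}_2$ described in this subsection is simple enough to sketch directly. The code below is a hedged illustration, not the author's implementation: the tuple representation of terms and formulas and all function names are assumptions, and mapping the field constants 0 and 1 to the boolean constants false and true is an assumption consistent with, but not spelled out in, the text.

```python
from itertools import product

# Assumed representation: variables are strings, constants are 0 or 1,
# terms are ("+", t, u), ("*", t, u), ("neg", t), ("inv", t);
# formulas are ("=", t, u), ("not", f), ("or"|"and"|"imp"|"iff", f, g).

def translate_term(t):
    """Translate an A-term into a boolean formula over Z_2."""
    if isinstance(t, str):            # ||a|| => a
        return t
    if isinstance(t, int):            # assumed: 0 -> False, 1 -> True
        return bool(t)
    op = t[0]
    if op == "+":                     # addition becomes exclusive or
        return ("xor", translate_term(t[1]), translate_term(t[2]))
    if op == "*":                     # multiplication becomes conjunction
        return ("and", translate_term(t[1]), translate_term(t[2]))
    if op in ("neg", "inv"):          # -t and t^{-1} both translate to ||t||
        return translate_term(t[1])
    raise ValueError(f"unknown term operator: {op}")

def translate_formula(f):
    """Translate an A-formula; connectives are preserved, '=' becomes 'iff'."""
    op = f[0]
    if op == "=":
        return ("iff", translate_term(f[1]), translate_term(f[2]))
    if op == "not":
        return ("not", translate_formula(f[1]))
    if op in ("or", "and", "imp", "iff"):
        return (op, translate_formula(f[1]), translate_formula(f[2]))
    raise ValueError(f"unknown connective: {op}")

def evaluate(b, env):
    """Evaluate a boolean formula under the truth assignment env."""
    if isinstance(b, bool):
        return b
    if isinstance(b, str):
        return env[b]
    if b[0] == "not":
        return not evaluate(b[1], env)
    x, y = evaluate(b[1], env), evaluate(b[2], env)
    return {"xor": x != y, "and": x and y, "or": x or y,
            "imp": (not x) or y, "iff": x == y}[b[0]]

def variables(b):
    """Set of boolean variables occurring in a boolean formula."""
    if isinstance(b, bool):
        return set()
    if isinstance(b, str):
        return {b}
    return set().union(*(variables(x) for x in b[1:]))

def is_tautology(b):
    names = sorted(variables(b))
    return all(evaluate(b, dict(zip(names, values)))
               for values in product([False, True], repeat=len(names)))
```

Applied to the distributivity axiom A7, `translate_formula` yields the formula obtained in Example 3.1, and `is_tautology` confirms it by truth table, in line with the second condition of correctness.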

3.2. Translations over $\mathbb{Z}_p$

We now consider fields $\mathbb{Z}_p$ (note that $p$ may be an odd prime, or 2, and in the latter case we obtain an alternative way of translating over $\mathbb{Z}_2$). Each field variable of A must be encoded with several propositional variables. There are many possible encodings; we choose the following one: a field variable $a$ is represented by the set of boolean variables $\{a_0, a_1, \ldots, a_{p-1}\}$. The idea is that only one $a_i$ will be true, and if $a_i$ is true, then $a = i$. To ensure that only one $a_i$ is true (among $\{a_0, a_1, \ldots, a_{p-1}\}$), we add the following set of axiom schemes for each field variable $a$:

$$\bigwedge_{0 \le i \le p-1} \Big( a_i \supset \bigwedge_{0 \le j \le p-1,\ j \ne i} \neg a_j \Big) \qquad (1)$$

which will ensure that if, for any $i$, $a_i$ is true, then for all $j \ne i$, $a_j$ is false; that is, it will ensure the correct representation of field elements by the boolean variables $\{a_0, a_1, \ldots, a_{p-1}\}$. The notation $\bigwedge_i \alpha_i$ is shorthand for the conjunction of all the $\alpha_i$'s, parenthesized from left to right.

We are going to use the following notation: to write $i = j \pmod{p}$, we will write $i =_p j$. We use this nonstandard notation to shorten our boolean expressions.

When translating terms we will have a new parameter, $i \in \{0, 1, \ldots, p-1\}$, and $\|t\|_i$ is true if the value of the term $t$ is $i$. Here is how the translation works: $\|a\|_i \Rightarrow a_i$, and:

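$$\|(t + u)\|_i \Rightarrow \bigvee_{k + l =_p i} (\|t\|_k \land \|u\|_l) \qquad (2)$$

$$\|(t \cdot u)\|_i \Rightarrow \bigvee_{k \cdot l =_p i} (\|t\|_k \land \|u\|_l) \qquad (3)$$

$$\|(-t)\|_i \Rightarrow \|t\|_{p - i \pmod{p}} \qquad (4)$$

$$\|(t^{-1})\|_i \Rightarrow \|t\|_j \quad \text{where } ij =_p 1 \qquad (5)$$

(Note that the "$j$" in the last line above is unique.) An atomic formula is translated as follows:

$$\|t = u\| \Rightarrow \bigwedge_{0 \le i \le p-1} (\|t\|_i \leftrightarrow \|u\|_i) \qquad (6)$$

and a boolean combination of atomic formulas is translated as before; that is, the connectives are left invariant and atomic formulas are replaced by the right-hand side of (6).

Example 3.3 We translate $(a(b + c)) = ((ab) + (ac))$ over $\mathbb{Z}_3$:

$$\begin{aligned}
\|(a(b + c)) = ((ab) + (ac))\|
&\Rightarrow \bigwedge_{0 \le i \le 2} \big( \|(a(b + c))\|_i \leftrightarrow \|((ab) + (ac))\|_i \big) \\
&\Rightarrow \bigwedge_{0 \le i \le 2} \Big( \bigvee_{kl =_3 i} (\|a\|_k \land \|(b + c)\|_l) \leftrightarrow \bigvee_{k + l =_3 i} (\|(ab)\|_k \land \|(ac)\|_l) \Big) \\
&\Rightarrow \bigwedge_{0 \le i \le 2} \Big( \bigvee_{kl =_3 i} \big( a_k \land \bigvee_{p + q =_3 l} (b_p \land c_q) \big) \leftrightarrow \bigvee_{k + l =_3 i} \big( \bigvee_{pq =_3 k} (a_p \land b_q) \land \bigvee_{pq =_3 l} (a_p \land c_q) \big) \Big)
\end{aligned}$$

[Figure 2: The structure of the boolean formula resulting from the translation of $a_1 + (a_2 + (a_3 + a_4))$. While the depth of the A-term was 1 (only the operator "+"), the depth of the boolean formula is 6.]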
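The one-hot translation over $\mathbb{Z}_p$ given by Eqs. (2)-(6) can be explored with a small evaluator. This is a sketch under stated assumptions rather than the paper's construction: instead of building the boolean formula, `tr` computes the truth value of $\|t\|_i$ directly; terms use an assumed tuple representation; and for Eq. (5) with $i = 0$, where no $j$ exists, the empty disjunction (false) is used, which is an assumption.

```python
from itertools import product

# Assumed representation: variables are strings; compound terms are
# ("+", t, u), ("*", t, u), ("neg", t), ("inv", t).
# env maps (variable, index) to a truth value, i.e. env[("a", 2)] is a_2.

def tr(t, i, p, env):
    """Truth value of ||t||_i over Z_p under the boolean assignment env."""
    if isinstance(t, str):                        # ||a||_i => a_i
        return env[(t, i)]
    op = t[0]
    if op == "+":                                 # Eq. (2)
        return any(tr(t[1], k, p, env) and tr(t[2], l, p, env)
                   for k in range(p) for l in range(p) if (k + l) % p == i)
    if op == "*":                                 # Eq. (3)
        return any(tr(t[1], k, p, env) and tr(t[2], l, p, env)
                   for k in range(p) for l in range(p) if (k * l) % p == i)
    if op == "neg":                               # Eq. (4)
        return tr(t[1], (p - i) % p, p, env)
    if op == "inv":                               # Eq. (5); false when i = 0
        return any(tr(t[1], j, p, env) for j in range(p) if (i * j) % p == 1)
    raise ValueError(f"unknown term operator: {op}")

def tr_eq(t, u, p, env):
    """Truth value of ||t = u|| as in Eq. (6)."""
    return all(tr(t, i, p, env) == tr(u, i, p, env) for i in range(p))

def one_hot_env(values, p):
    """Boolean assignment induced by a field assignment: a_i is true iff a = i."""
    return {(v, i): (x == i) for v, x in values.items() for i in range(p)}

def holds_under_encoding(t, u, names, p):
    """Check ||t = u|| under every assignment that satisfies Eq. (1) one-hot."""
    return all(tr_eq(t, u, p, one_hot_env(dict(zip(names, vals)), p))
               for vals in product(range(p), repeat=len(names)))
```

With `lhs = ("*", "a", ("+", "b", "c"))` and `rhs = ("+", ("*", "a", "b"), ("*", "a", "c"))`, `holds_under_encoding(lhs, rhs, ["a", "b", "c"], 3)` succeeds. Under an assignment that violates Eq. (1), for instance one that makes both $a_1$ and $a_2$ true together with $b_1$ and $c_1$, the translated formula is false, which illustrates why the encoding axioms are needed as hypotheses.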

There is a different scheme for translating formulas over $\mathbb{Z}_p$. In this scheme, if $a = i$, then rather than having $a_i$ "on" and all the other $a_j$ "off," we have $a_0, a_1, \ldots, a_i$ "on" and $a_{i+1}, \ldots, a_{p-1}$ "off." This way we can make use of modular connectives (see [4, §12.6] for the precise definition and axiomatization of modular connectives). A modular connective is an unbounded fan-in connective $\mathrm{MOD}_{p,i}$, where $p$ is a positive integer (we are of course interested in $p$ prime) and $i$ is in $\{0, 1, \ldots, p-1\}$, and $\mathrm{MOD}_{p,i}(\alpha_1, \alpha_2, \ldots, \alpha_n)$ is true iff $|\{j : \alpha_j \text{ is true}\}| =_p i$. With the modular connective, we translate $a + b$ over $\mathbb{Z}_p$ as follows: $\mathrm{MOD}_{p,i}(\{a_j\}_{1 \le j \le p-1}, \{b_j\}_{1 \le j \le p-1})$. Note that over $\mathbb{Z}_2$, the connective $\oplus$, if considered as a connective of unbounded fan-in, plays the role of $\mathrm{MOD}_{2,1}$.

The advantage of the modular connectives is that they do not increase the depth of summations. For example, consider the A-term $a_1 + (a_2 + (a_3 + \cdots + a_n) \cdots)$. The translation of this term, using Eq. (2), produces a boolean formula of depth $2(n - 1)$, as can be seen from the example in Figure 2. If we used modular connectives, the depth of the boolean formula resulting from a summation would also be 1. Furthermore, the size of the boolean formula when translating as in Eq. (2) is exponential in the depth of the A-formula. On the other hand, it is possible to express a lot in A with bounded depth (for example, much of linear algebra; see [6]), and modular connectives are arguably not as natural as the usual boolean connectives.

Lemma 3.4 Given any A-term $t$, the size of its translation over $\mathbb{Z}_p$, denoted $s(\|t\|)$, is bounded by $s(t)(4p)^{d(t)}$.

PROOF: We are concerned with the scheme of translation as described by Eq. (2)-(5). The increase in size comes from Eq. (2) and (3), where we have to simulate addition and multiplication with boolean connectives. Consider Eq. (2): $\|t + u\|_i$ becomes a disjunction of $p$ formulas, each of the form $\|t\|_k \land \|u\|_l$. Suppose that $d(t) \ge d(u)$. Then, $s(\|t + u\|_i) \le (4p) \cdot s(\|t\|_k)$.
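The alternative encoding with modular connectives can be checked numerically. The following sketch assumes the "thermometer" reading of the text, in which the inputs $a_1, \ldots, a_{p-1}$ passed to the connective contain exactly $a$ true bits when the field variable has value $a$; the function names are hypothetical.

```python
def mod_connective(p, i, inputs):
    """MOD_{p,i}: true iff the number of true inputs is congruent to i mod p."""
    return sum(bool(x) for x in inputs) % p == i

def unary_encode(value, p):
    """Bits a_1, ..., a_{p-1} of the encoding in which a_j is on iff 1 <= j <= value.

    The bit a_0, which is always on in this scheme, is left out because the
    translation of a + b only feeds a_1, ..., a_{p-1} to the connective.
    """
    return [j <= value for j in range(1, p)]

def sum_is(p, i, a, b):
    """Truth value of the translation of (a + b) with index i over Z_p."""
    return mod_connective(p, i, unary_encode(a, p) + unary_encode(b, p))
```

For every prime $p$ tried and all $a, b, i$ in $\{0, \ldots, p-1\}$, `sum_is(p, i, a, b)` agrees with the arithmetic condition $(a + b) \bmod p = i$, using a single connective regardless of how many summands are concatenated.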

Lemma 3.5 The translation procedure over $\mathbb{Z}_p$ satisfies the three conditions of correctness.

PROOF: The first condition of correctness is a consequence of Lemma 3.4.

To show the second condition of correctness, we have to show that if $t = u$ is true in a standard model $F = \mathbb{Z}_p$ (for any $p$), then $\|t = u\|$ is a logical consequence of the axioms given by Eq. (1) for each variable that appears in $t = u$. Let $\tau$ be a value assignment to the variables in some A-term $t$. We construct the induced truth value assignment $\tau'$ as follows: if $\tau(a) = i$, then $\tau'(a_i) = T$ and $\tau'(a_j) = F$ for all $j \ne i$. We now show by structural induction on $t$ that if $\tau(t) = i$, then $\tau'(\|t\|_i) = T$ and $\tau'(\|t\|_j) = F$ for all $j \ne i$. Suppose that $t = u$ is true in any $F = \mathbb{Z}_p$. Then for all $\tau$, $\tau(t) = \tau(u)$. Then $\tau'(\|t\|_i) = T$ iff $\tau'(\|u\|_i) = T$, and any truth assignment which satisfies the axioms given by Eq. (1) is equal to some $\tau'$, and so we are done.

Finally, to show that the third condition of correctness is satisfied, we notice that we cannot proceed as quickly as in the proof of Lemma 3.2, since over $\mathbb{Z}_p$ the algebraic axioms do not translate into tautologies, but rather into logical consequences of the axioms given by Eq. (1). This can be remedied by adding to $\|\pi\|$ an axiom of the form Eq. (1) for each variable that appears in $\pi$ (see the treatment of axioms in the proof of Lemma 3.2 to understand this more fully).

3.3. Translations over fields of size $p^r$

To translate A-formulas over fields $F$, where $|F| = p^r$, $r > 1$, we must choose a concrete representation for $F$. Unfortunately, $\mathbb{Z}_{p^r}$ is not a field, so $F \not\cong \mathbb{Z}_{p^r}$. To choose a representation, we choose an irreducible polynomial $S(x)$ in $\mathbb{Z}_p[x]$ of degree $r$, so that $\mathbb{Z}_p[x]/(S(x))$ is a field of size $p^r$, and hence any $F$ of size $p^r$ is isomorphic to it. The field $\mathbb{Z}_p[x]/(S(x))$ can be viewed as a vector space over $\mathbb{Z}_p$ with basis $\{1, x, x^2, \ldots, x^{r-1}\}$, so each element can be represented as $a_0 + a_1 x + a_2 x^2 + \cdots + a_{r-1} x^{r-1}$, where $a_i \in \mathbb{Z}_p$. Therefore, each element can be encoded with the $r$-tuple $(a_0, a_1, a_2, \ldots, a_{r-1})$.

To add two tuples we just add them component-wise, modulo $p$ for each component. To multiply two tuples, we multiply the corresponding polynomials modulo $S(x)$, and compute the coefficients of the result as a vector over the basis $\{1, x, x^2, \ldots, x^{r-1}\}$ to get the resulting tuple. The additive inverse is trivial to compute, and the multiplicative inverse can be computed with the FFT, for example.

Therefore, each field element can now be encoded with the boolean variables given by $a_i^j$, $0 \le i \le r-1$ and $0 \le j \le p-1$. An atomic formula is translated as follows:

$$\|t = u\| \Rightarrow \bigwedge_{0 \le i \le r-1,\ 0 \le j \le p-1} (\|t\|_i^j \leftrightarrow \|u\|_i^j)$$

(We have $i, j$ as we are now comparing tuples component-wise.) Terms for addition and additive inverse are translated as before, except now we have $r$ components:

$$\|(t + u)\|_i^j \Rightarrow \bigvee_{k + l =_p j} (\|t\|_i^k \land \|u\|_i^l)$$

$$\|(-t)\|_i^j \Rightarrow \|t\|_i^{p - j \pmod{p}}$$

Translating $\|(t \cdot u)\|_i^j$ is a little more involved: we compute, modulo $S(x)$, the product of the two polynomials

$$\|t\|_0^{l_0} + \|t\|_1^{l_1} x + \cdots + \|t\|_{r-1}^{l_{r-1}} x^{r-1}$$

$$\|u\|_0^{k_0} + \|u\|_1^{k_1} x + \cdots + \|u\|_{r-1}^{k_{r-1}} x^{r-1}$$

for all the values of $l_0, l_1, \ldots, l_{r-1}$ and $k_0, k_1, \ldots, k_{r-1}$, searching for an assignment to the $l$'s and the $k$'s so that the coefficient of $x^i$ in the result is $j$. This can be stated with a long boolean formula, of size $O(p^r s(t \cdot u))$. A similar trick works for $\|t^{-1}\|_i^j$.

Lemma 3.6 The translation procedure over $F$ with $|F| = p^r$ satisfies the three conditions of correctness.
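The tuple arithmetic in $\mathbb{Z}_p[x]/(S(x))$ that underlies the translation in this subsection can be sketched as follows. The code is illustrative only: the function names are hypothetical, $S(x)$ is assumed monic and is passed as a coefficient tuple `s` (lowest degree first, length $r + 1$), and the inverse is found by exhaustive search over all $p^r$ tuples rather than by any faster method.

```python
from itertools import product

def poly_add(a, b, p):
    """Component-wise addition of two r-tuples modulo p."""
    return tuple((x + y) % p for x, y in zip(a, b))

def poly_mul(a, b, p, s):
    """Multiply two r-tuples as polynomials modulo p and modulo the monic S(x)."""
    r = len(a)
    prod = [0] * (2 * r - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] = (prod[i + j] + x * y) % p
    # Reduce from the top degree down, using
    # x^r = -(s[0] + s[1] x + ... + s[r-1] x^(r-1)) modulo S(x).
    for d in range(2 * r - 2, r - 1, -1):
        coeff = prod[d]
        if coeff:
            prod[d] = 0
            for k in range(r):
                prod[d - r + k] = (prod[d - r + k] - coeff * s[k]) % p
    return tuple(prod[:r])

def poly_inv(a, p, s):
    """Multiplicative inverse by exhaustive search; None if there is none."""
    r = len(a)
    one = (1,) + (0,) * (r - 1)
    for b in product(range(p), repeat=r):
        if poly_mul(a, b, p, s) == one:
            return b
    return None

# Field of size 4: p = 2, S(x) = x^2 + x + 1 (irreducible over Z_2).
S4 = (1, 1, 1)
# Field of size 9: p = 3, S(x) = x^2 + 1 (irreducible over Z_3).
S9 = (1, 0, 1)
```

In the field of size 4 the element $x$, encoded as `(0, 1)`, satisfies $x \cdot x = x + 1$ and has inverse $x + 1$. By contrast, in $\mathbb{Z}_4$ the element 2 has no multiplicative inverse, which is why a polynomial representation is needed.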

References

[1] Paul Beame, Russell Impagliazzo, Jan Krajíček, Toniann Pitassi, Pavel Pudlák, and Alan Woods. Exponential lower bounds for the pigeonhole principle. In Proceedings of the 24th Annual ACM Symposium on Theory of Computing, pages 200-220, 1992.

[2] Stephen A. Cook and Robert A. Reckhow. On the lengths of proofs in the propositional calculus. In Proceedings of the Sixth Annual ACM Symposium on Theory of Computing, 1974. See also corrections for above in SIGACT News, Vol. 6 (1974), pp. 15-22.

[3] Stephen A. Cook and Michael Soltys. The proof complexity of linear algebra. In Seventeenth Annual IEEE Symposium on Logic in Computer Science (LICS 2002), 2002.

[4] Jan Krajíček. Bounded Arithmetic, Propositional Logic and Complexity Theory. Cambridge University Press, 1996.

[5] Joseph Shoenfield. Mathematical Logic. Addison-Wesley, 1967.

[6] Michael Soltys. The Complexity of Derivations of Matrix Identities. PhD thesis, University of Toronto, 2001.

[7] Alasdair Urquhart. The complexity of propositional proofs. The Bulletin of Symbolic Logic, 1:425-467, 1995.