Coding-Theoretic Lower Bounds for Boolean Branching Programs

Nandakishore Santhi
Department of Electrical and Computer Engineering
Department of Computer Science and Engineering
University of California San Diego
9500 Gilman Drive, La Jolla, CA 92093

[email protected]

Alexander Vardy
Department of Electrical and Computer Engineering
Department of Computer Science and Engineering
University of California San Diego
9500 Gilman Drive, La Jolla, CA 92093

[email protected]

November 4, 2004 (Preliminary version)

Abstract

We develop a general method for proving lower bounds on the complexity of branching programs. The proposed proof technique is based on a connection between branching programs and error-correcting codes and makes use of certain classical results in coding theory. Specifically, lower bounds on the complexity of branching programs computing certain important functions follow directly from lower bounds on the minimum distance of several well-known families of algebraic codes. In order to establish a connection between the two domains, we “invert” the recent results of Bazzi and Mitter which are, in turn, based upon Ajtai’s new proof techniques for the branching program model. Using the proposed method, we obtain lower bounds for deterministic boolean branching programs that compute several fundamental operations, such as finite-field multiplication, cyclic convolution, integer multiplication, matrix-vector multiplication, and the discrete Fourier transform (DFT). In several cases, our lower bounds either match the best previously known results or improve upon them.

This work was supported in part by the David and Lucile Packard Fellowship, by the National Science Foundation, and by the California Institute of Telecommunications and Information Technology at the University of California San Diego.

1. Introduction

Proving lower bounds in general models of computation is a notoriously difficult task. Topics of interest in this direction are both the bounds themselves and the methods for obtaining these bounds. Branching programs (BPs) have emerged as the standard model for nonuniform sequential computation [38]. In this paper, we present a new method for proving lower bounds on the time-space tradeoff for general BPs. Our method is based on establishing a connection between branching programs and error-correcting codes, in such a way that lower bounds on the complexity of branching programs follow directly from (known) lower bounds on the minimum distance of the corresponding codes. In general, whenever a meaningful connection between two apparently unrelated areas of research is discovered, the potential for a better handle on problems in both areas is often greatly enhanced. Indeed, using the proposed method, we obtain new lower bounds for deterministic boolean BPs that compute several important functions, such as finite-field multiplication, cyclic convolution, integer multiplication, and the discrete Fourier transform. The new bounds are more general than the lower bounds for these functions that are available in the existing literature, and thus not directly comparable with them. We have nevertheless tried to make such comparisons, and found that in most cases our bounds either match the best previously known results or improve upon them.

Relevant prior work. The branching program model of computation was introduced in [24]; it was subsequently extended and refined in a number of papers [10, 20, 27, 32, 37]. We will give a precise definition of branching programs in the next section. It is known [10] that RAMs of space S and time T with an arbitrary instruction set can be simulated by BPs of size 2^{O(S)} and time T. Thus the BP model is in many ways more general than the RAM model. This underscores the importance of establishing lower bounds on the time-space tradeoff for general BPs. Much progress in this direction has been already achieved. Borodin and Cook [10] gave an exponential lower bound on the size of read-k BPs that sort integers. A major step forward was the exponential size lower bound for nondeterministic read-k BPs due to Borodin, Razborov, and Smolensky [11]; the proof of this result in [11] paved the ground for many of the current proof methods. More recently, in a series of breakthrough papers, Beame, Jayram, and Saks [6, 8], Ajtai [2, 3], and Beame, Saks, Sun, and Vee [7] proved exponential size lower bounds for general branching programs that are only restricted in the length of their computation. Some of these papers also provide the first ever lower bounds on the time-space tradeoff for general (unrestricted) branching programs. Apart from proving lower bounds for the most general BP model possible, it is also of interest to derive such bounds for branching programs that compute a large class of functions, preferably practically important ones. Although our proof technique is general, as specific examples we consider the following functions: finite-field multiplication (FMUL), cyclic convolution (CONV), matrix-vector multiplication (MVMUL), integer multiplication (IMUL), and the discrete Fourier transform (DFT).
The best previously known (to us) results for FMUL and CONV were obtained by Sauerhoff and Woelfel [34], based on [6] and the work of Mansour, Nisan, and Tiwari [26] on the branching program complexity of universal hash classes. Integer multiplication (IMUL) is one of the most fundamental arithmetic functions. Thus rather more is known about the complexity of IMUL [9, 12, 14, 16, 31, 34, 39]. In particular, Bryant [12] showed that computing the middle bit of an n-bit integer product requires exponential size for oblivious read-once BPs, also known as ordered binary decision diagrams (OBDDs). Gergov [14] extended this result to oblivious, but not necessarily read-once, branching programs of linear length. Ponzio [31] proved that unrestricted (not necessarily oblivious) read-once BPs require space (space is the logarithm of size) at least Ω(√n). This bound has been improved to Ω(n) by Bollig and Woelfel [9]. For branching programs that are not necessarily oblivious and not necessarily read-once, the strongest lower bounds are due to Sauerhoff and Woelfel [34]. For (nondeterministic) boolean BPs, the result of [34] implies that for k = O(log n), a read-k BP computing the middle bit of IMUL requires space at least Ω(n/(k^2 3^{4k})). The discrete Fourier transform (DFT) is another fundamental operation of great practical and theoretical importance. Morgenstern [28] showed that computing an


n-point FFT using complex numbers of modulus < 1 requires at least (n/2) log n additions. However, it appears that not a lot is known about the complexity of computing the DFT in the branching program model: we are aware of only two relevant prior results. Yesha [40] proved that for a q-way BP that computes an n-point DFT on a domain of size q, the time-space product is at least Ω(n^2). Abrahamson [1] was able to improve this result by considering the expected values of time and space under the uniform distribution over all possible inputs. He showed that for q-way BPs, the product of expected time and space is Ω(n^2 log q). Observe that for boolean BPs computing the same function (cf. [30, Proposition 1]), this translates to Ω(n^2 / log q). As already mentioned, our results are based upon a connection between branching programs and error-correcting codes. Several attempts to establish such connections have been made before. In particular, Okol’nishnikova [29] considered branching programs that compute the characteristic function of a binary code C ⊆ {0,1}^n, defined by f(x) = 1 iff x ∈ C. Among other results, she proved that read-k BPs that compute this function, where k = (1 − 2c) ln n / ln ln n for a positive constant c < 0.5, must have space at least Ω(n^c). Lafferty and Vardy [21] showed that an oblivious read-once branching program (OBDD) that computes the characteristic function of a binary code C is equivalent to the minimal proper trellis for C. There is an extensive literature on minimal trellises for binary codes — see [35, 36] and references therein. In particular, Lafourcade and Vardy [22, 23] proved that asymptotically good binary codes have exponential trellis complexity. This implies, via the results of [21], that OBDDs that compute the characteristic function of such codes must have exponential size. More recently, Bazzi and Mitter [4] have shown that the minimum distance of a binary code is related to the branching program complexity of its encoder.
This essentially extends the trellis complexity results of [21–23] to a much more general computation model. It is this paper of Bazzi and Mitter [4] that paves the ground for most of the results reported herein.

Coding-theoretic lower bound for branching programs. Most of the existing lower bounds for branching programs essentially follow from a variation of the pigeonhole principle. When time is cut into short sections, the recorded history of an entire computation must be contained in the restricted memory at the beginning of a section, which makes it possible to apply pigeonhole-type arguments. Our method is completely different. The starting point for our lower bounds is the aforementioned paper of Bazzi and Mitter [4], which establishes the following result. Let C ⊆ {0,1}^n be a binary code of cardinality |C| = 2^k and let B be any deterministic boolean (2-way) branching program that encodes C in time T and space S; then it is shown in [4] that the minimum distance d of C satisfies

d = O( k (T/k)^3 (S/k)^{k/(2T)} )    (1)

The motivation of Bazzi and Mitter [4] in proving (1) was to use known results about the complexity of encoders for turbo codes (and other related codes) in order to establish asymptotic upper bounds on their minimum distance (cf. [5]). In this work, we first improve upon the results of Bazzi and Mitter [4] by providing an upper bound on the minimum distance that is stronger than (1), and also modify them by proving an even stronger bound for read-r/write-w branching programs (see Theorem 4 and Theorem 5, respectively). More importantly, we invert these results as follows. We consider families of error-correcting codes for which asymptotic lower bounds on the minimum distance are already known, or can be easily derived using well-known methods in coding theory. Moreover, we pick families of codes that can be encoded essentially by computing a certain target function f(·), preferably of practical importance. Then (1), in conjunction with a lower bound on the minimum distance of such codes, immediately implies a lower bound on the time-space tradeoff of any boolean branching program that computes the target function f(·).
For example, in Section 4, we exhibit a family of binary linear codes of dimension n and length 2n that can be encoded by computing a product of two (n-bit) elements of the finite field GF(2^n). Moreover, we prove that this family of codes attains the Gilbert-Varshamov bound [15], which means that there exist codes

in this family such that d > 0.22n for all sufficiently large n. In conjunction with (1), this implies that if B is a boolean BP that computes n-bit finite-field multiplication (FMUL), then the time T and the space S of B must satisfy (T/n)^3 (S/n)^{n/(2T)} = Ω(1). For more details on this (and a slightly stronger bound on the branching program complexity of FMUL), see Sections 3 and 4.
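To see what a constraint of the form (T/n)^3 (S/n)^{n/(2T)} = Ω(1) says quantitatively, one can solve it for the smallest admissible space S at various time budgets. The sketch below is purely illustrative and ours; it normalizes the Ω-constant to 1 (an arbitrary choice), so only the shape of the tradeoff is meaningful.

```python
def min_space(n, T, c=1.0):
    """Smallest S allowed by (T/n)**3 * (S/n)**(n/(2*T)) >= c, i.e.
    S >= n * (c * (n/T)**3) ** (2*T/n).  The constant c normalizes Omega(1)."""
    return n * (c * (n / T) ** 3) ** (2 * T / n)

n = 1 << 10
for mult in (1, 2, 4, 8):
    T = mult * n
    print(f"T = {mult}n  ->  S >= {min_space(n, T):.3e}")
```

At T = n the bound forces S = Ω(n), and it weakens rapidly as T grows, which is consistent with the discussion of superlinear time points later in this section.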

Specific results for certain target functions. We have applied the general method described in the foregoing paragraph to several well-studied target functions, and our results are summarized in Table 1 below. In each case, we were able to find families of codes that “encode” the desired function on one hand and have a sufficiently large minimum distance on the other hand (see Section 4 for more details). We note that the bounds[1] in Table 1 are based on Theorem 4 and Theorem 5, which are slightly stronger than (1). We also point out that the branching programs we consider are multi-output. As opposed to decision BPs that compute a single bit, multi-output BPs compute a number of bits and each node in the BP (rather than just the sink nodes) can, in principle, write (assign) one of the output bits.

Function                             | Time-Space Tradeoff                                          | Model
n-bit FMUL, n-bit CONV, n-bit MVMUL  | (T/n)^2 (S/n)^{n/(2T)} = Ω(1)                                | General boolean BPs
n-bit IMUL                           | ((T log n)/n)^2 ((2S log n)/n)^{n/(2T log n)} = Ω(1)         | General boolean BPs
n-bit IMUL                           | (t log n)^2 ((2S log n)/n)^{1/t} = Ω(1)                      | read-r/write-w BPs, t = max{r, w}
n-point DFT                          | (T/(n log n))^2 (S/(γ n log n))^{(γ n log n)/(2T)} = Ω(1)    | General boolean BPs, for all 0 < γ < 1

Table 1. Time-space tradeoffs for boolean BPs computing certain fundamental functions

In all cases, except for the third row in Table 1, the underlying computation model is a deterministic boolean branching program that is not restricted in any way (not necessarily oblivious, or leveled, or read/write limited, and so forth). This makes it somewhat difficult to compare our results to the best previously known bounds, since these bounds usually apply to more restricted computation models. For example, it appears that the best currently known bounds on the branching program complexity of n-bit FMUL, CONV, and IMUL are due to Sauerhoff and Woelfel [34]. Sauerhoff and Woelfel establish two types of bounds in [34]. One of these applies only to q-way BPs, where q grows as n^{O(1)} and must be at least 2^{120}. Since we are concerned with boolean (2-way) BPs, bounds of this kind are not directly comparable to those in Table 1. For boolean BPs, Sauerhoff and Woelfel [34] prove the following.

Theorem 1. Consider the set of all (nondeterministic) boolean read-r branching programs that compute the middle bit in a product of two n-bit integers. There exists a positive constant c such that for all r ≤ c log n, the space of every BP in this set is bounded below by Ω(n/(r^2 3^{4r})).

There are several important differences between Theorem 1 and our bounds for IMUL in Table 1. First, Theorem 1 applies to nondeterministic BPs whereas our results do not; in this sense Theorem 1 is more general. Second, the branching programs in Theorem 1 are decision BPs which compute only the middle bit (the coefficient of 2^{n−1}) of n-bit IMUL, whereas our bounds apply to BPs that compute all 2n bits of the product. This difference does not seem to be significant, since it is known [12, 38] that the middle bit is the “hardest” one to compute. A third difference is that the number of reads r in Theorem 1 is restricted to O(log n), whereas our bounds in the second and third rows of Table 1 hold without this restriction. Note that when the number of reads is limited to r, the computation time T is also limited, since T ≤ rn. Thus Theorem 1 applies only to the case where T = O(n log n). In this sense, our results are much more general. Finally, ignoring all these differences, we find that for almost all valid choices of r and T, our bounds on the space S are only slightly weaker than Theorem 1, by a factor of about n log log n / log n.

With regard to boolean BPs computing n-bit finite-field multiplication (FMUL) or convolution (CONV), Sauerhoff and Woelfel [34] prove the following[2] result.

Theorem 2. Let r be a positive integer such that 192 r^2 3^r ≤ n. Let m ≥ 1932 r^2 3^r be an integer such that 3^r m ≤ n. Let f(·) be a function computing any m consecutive bits of n-bit FMUL or of n-bit CONV. Then the space of every (nondeterministic) boolean read-r BP that computes f(·) is bounded below by Ω(m/(r^2 3^r)).

[1] All the logarithms in Table 1, and throughout this paper, are to base 2.

Essentially the same differences as before pertain to this case as well, the major ones being deterministic vs. nondeterministic BPs and read-r vs. unrestricted BPs. However, ignoring these differences, we can try to make a comparison as follows. If r is constant and m = Ω(n), then Theorem 2 reduces to S = Ω(n), which is exactly the same result we get from the first row of Table 1 for the case T = O(n). On the other hand, if r is allowed to grow, say r = (log n)/4, then the bound on S in Table 1 becomes stronger than Theorem 2. With regard to the DFT, the best known (to us) lower bound on the time-space tradeoff of boolean BPs, due to [1, 40], establishes TS = Ω(n^2). Here, if time is superlinear in n, then the resulting bound on the space is sublinear. In contrast, the bound in the fourth row of Table 1 makes it possible to provide superlinear bounds on space when time is also superlinear. For example, for T = ω(n log^{1−ε_T} n),[3] our results imply that S = ω(n log^{1−ε_S} n), where ε_T and ε_S are arbitrary positive constants. Thus the bound in Table 1 is substantially stronger than the best previously known bounds. In summary, for FMUL/CONV/MVMUL/DFT, our space lower bounds are stronger than the best previously known results, at least at certain superlinear time points. Our results are also inherently more general, since we do not assume artificial limitations on the number of reads per input bit.

Organization. We start in the next section with some background and definitions. In particular, we define precisely the branching program model of computation that is used throughout this paper. We also recount some elementary facts concerning error-correcting codes. In Section 3, we give a precise statement of the Bazzi-Mitter theorem [4], as well as our improvements thereupon. We then explain how these results lead to lower bounds on the time-space tradeoff of branching programs. In Section 4, we deal with specific target functions and prove the bounds compiled in Table 1. In particular, in Section 4.1, we exhibit asymptotically good binary codes that require the FMUL operation for their encoding. From this we infer a time-space tradeoff for branching programs computing FMUL. In Section 4.2, we describe classical coding-theoretic results which establish the existence of asymptotically good quasi-cyclic codes, whose encoding can be reduced to computing a cyclic convolution (CONV). Alternatively, we can recast this encoding as a matrix-vector multiplication (MVMUL). This leads to lower bounds for these operations. We also reduce the IMUL function to the CONV function (using a standard procedure), thus proving the lower bounds for IMUL. In Section 4.3, we consider Reed-Solomon codes, which can be encoded using a generalized DFT. From this, we infer the lower bound for the DFT operation. Finally, in Section 5, we describe several typical functional forms for the general lower bounds in Table 1. We also compare these results with the complexity of known algorithms [19].

[2] In fact, Theorem 2 is a simplified statement of (the first part of) Corollary 1 in [34], since Sauerhoff and Woelfel also consider branching programs that approximate f(·) with a certain error probability. However, this would make the comparison even more difficult.

[3] Recall that the definition of ω(·) is as follows: f(n) = ω(g(n)) if and only if g(n) = o(f(n)).

2. Preliminaries: background and definitions

We will need several elementary facts and definitions concerning branching programs and error-correcting codes. Due to space limitations, some of these are discussed only briefly. See Wegener [38] and MacWilliams and Sloane [25] for a detailed treatment of branching programs and codes, respectively.

2.1. The branching program model

There are many subtly different, yet essentially equivalent, definitions of branching programs in the literature. The following is the model we use throughout this paper; the definition below generally follows Ajtai [3].

Definition 1. Let D be a finite set of size q, with 0 ∈ D. A q-way branching program B with n input variables x_1, x_2, ..., x_n and m output variables y_1, y_2, ..., y_m is a four-tuple ⟨G, var_in, var_out, out⟩, where:

a. G is a finite edge-labeled directed acyclic graph, with a unique source node and a unique sink node;

b. var_in is a function defined on the non-sink nodes of G with values in the set {x_1, x_2, ..., x_n};

c. var_out is a function defined on all the nodes of G with values in the set {y_1, y_2, ..., y_m} ∪ {φ};

d. out is a function defined on all the nodes of G with values in the set D;

e. the sink node has out-degree zero; all other nodes have out-degree q. For each non-sink node v of G, the set of edges starting at v is labeled by the elements of D so that all q edges are labeled distinctly.

The branching program B computes a function f: D^n → D^m as follows. Given a ∈ D^n, we think of a as an assignment of values to the n input variables x_1, x_2, ..., x_n. A computation in B upon input a is the unique path followed from the source node to the sink node in G according to the following rules. At each node v along the path (including the source node), we read the value of the input variable var_in(v) and leave v along the unique edge whose label is equal to the value of that variable. If var_out(v) ≠ φ, we also write the output variable var_out(v) by assigning var_out(v) := out(v). The fact that G is acyclic and finite guarantees that every computation eventually terminates in the sink node. The value of f(a) is defined as the assignment of the m output variables that results when the computation terminates. If no assignment is made to an output variable y during the computation, then y = 0 is assumed by default. On the other hand, if more than one assignment is made to an output variable y, the last assignment takes precedence.

The total number of nodes in B is called its size and denoted by |B|, while S = log |B| is called the space of B. The length of a computation in B is the number of edges in the corresponding path. The time

of a computation in B is the number of edges plus the number of nodes in the corresponding path, excluding those nodes v for which var_out(v) = φ. The time T (respectively, the length L) of B is defined as the maximum time (respectively, the maximum length) of a computation in B. A branching program B is said to be oblivious if the input variables are read in the same order along all possible paths from the source to the sink in G. A read-r (respectively, write-w) branching program is one in which each input (respectively, output) variable is read at most r times (respectively, written at most w times) along any path from the source to the sink in G. Note that not all paths from the source to the sink in G are (valid) computations. If the restrictions above apply only to computation paths, then B is said to be semantically oblivious and/or semantically read-r/write-w. In this paper, we assume the former, more restrictive (often called syntactic) definitions throughout.
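Definition 1 can be exercised with a toy interpreter for the boolean case (q = 2). The data layout and names below are ours, not the paper's: nodes maps each node to (var_in, var_out, out_val), with None standing for φ, and edges maps (node, bit) to the successor node.

```python
# A toy interpreter for Definition 1 with q = 2 (boolean BP).

def run_bp(nodes, edges, source, sink, x, m):
    """Run the branching program on input bits x; return the m output bits."""
    y = [0] * m                          # unwritten outputs default to 0
    v = source
    while v != sink:
        var_in, var_out, out_val = nodes[v]
        if var_out is not None:          # write step: the last write wins
            y[var_out] = out_val
        v = edges[(v, x[var_in])]        # follow the edge labeled x[var_in]
    _, var_out, out_val = nodes[sink]    # the sink may also write an output
    if var_out is not None:
        y[var_out] = out_val
    return y

# Example: a four-node BP computing y0 = x0 AND x1.  Per Definition 1 every
# non-sink node must read some variable, so node 'b' re-reads x0 as a dummy.
nodes = {
    's': (0, None, None),                # read x0
    'a': (1, None, None),                # read x1
    'b': (0, 0, 1),                      # write y0 := 1, then read x0 again
    't': (None, None, None),             # sink
}
edges = {('s', 0): 't', ('s', 1): 'a',
         ('a', 0): 't', ('a', 1): 'b',
         ('b', 0): 't', ('b', 1): 't'}
```

Here the size is 4, so the space is S = log 4 = 2 bits, matching the definition above.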

A q-way branching program with q = 2 is said to be boolean. A useful visualization of a boolean BP is as a RAM with 1-bit-wide input registers and a working memory of S bits [2, 10]. Proving nonlinear time-space lower bounds for boolean BPs is generally considered the most challenging. Indeed, as shown in [30, Proposition 1], a factor of log q is lost when converting any space or time lower bound from a q-way BP to a boolean BP. In this paper, we denote the class of unrestricted boolean branching programs by BP_2.

2.2. Error-correcting codes

A binary[4] error-correcting code C of length n is simply a subset of {0,1}^n. A linear code is a subspace of {0,1}^n, where {0,1}^n is regarded as a vector space of dimension n over the finite field F_2. If C is a linear code, then |C| = 2^k for an integer k. We shall assume that |C| = 2^k throughout, whether C is linear or not. An encoder for C is a one-to-one function E: {0,1}^k → C. The Hamming distance between two vectors in {0,1}^n = F_2^n is simply the number of positions where they differ. The minimum distance of a code C is the minimum Hamming distance between any two distinct vectors in C. When we say that C is an (n, k, d) binary code, we mean that C ⊆ {0,1}^n is such that |C| = 2^k and the minimum distance of C is at least d.
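These definitions are easy to exercise on a toy example. The sketch below (helper names are ours) brute-forces the minimum distance of the classical (7, 4, 3) binary Hamming code, which is not discussed in the paper but is a convenient small test case.

```python
from itertools import product

def hamming_dist(u, v):
    """Number of positions in which u and v differ."""
    return sum(a != b for a, b in zip(u, v))

def min_distance(code):
    """Minimum distance of a code: brute force over all distinct pairs."""
    words = sorted(code)
    return min(hamming_dist(u, v)
               for i, u in enumerate(words) for v in words[i + 1:])

def encode(msg, G):
    """Encode msg over F2 with generator matrix G (a list of row tuples)."""
    return tuple(sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G))

# Systematic generator matrix of the (7, 4, 3) binary Hamming code.
G = [(1, 0, 0, 0, 0, 1, 1),
     (0, 1, 0, 0, 1, 0, 1),
     (0, 0, 1, 0, 1, 1, 0),
     (0, 0, 0, 1, 1, 1, 1)]
code = {encode(msg, G) for msg in product((0, 1), repeat=4)}
```

Since the code is linear, its minimum distance also equals its minimum nonzero Hamming weight, which the brute-force check confirms.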

The rate of an (n, k, d) binary code is defined as R = k/n and its relative distance is defined as δ = d/n. An infinite set of codes characterized by a certain property is called a family of codes (e.g., the family of binary linear codes). A family F is said to be asymptotically good if it contains an infinite sequence of codes C_1, C_2, ... of increasing length n, such that both the rate and the relative distance of all the codes in this sequence are bounded away from zero as n → ∞. Similarly, F is said to attain the Gilbert-Varshamov bound if it contains an infinite sequence of codes C_1, C_2, ... of increasing length, such that the relative distance of all the codes in the sequence is at least δ and their rate is at least R, with δ and R satisfying

R ≥ 1 − H_2(δ) := 1 + δ log δ + (1−δ) log(1−δ)    (2)

where H_2(·) is known as the binary entropy function. For example, the family of binary linear codes attains the Gilbert-Varshamov bound, and many other (more restrictive) families that attain this bound are known.
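For rate R = 1/2, the relative distance guaranteed by (2) is H_2^{-1}(1/2) ≈ 0.11, which is the source of the d > 0.22n figure quoted earlier for the length-2n, rate-1/2 codes of Section 4. The value can be found numerically by bisection (a sketch with our own function names; H_2 is increasing on [0, 1/2]):

```python
import math

def H2(x):
    """Binary entropy: H2(x) = -x log2 x - (1-x) log2 (1-x), with H2(0) = H2(1) = 0."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def gv_delta(R, iters=80):
    """Largest delta in [0, 1/2] with H2(delta) <= 1 - R, found by bisection:
    the Gilbert-Varshamov relative distance at rate R."""
    lo, hi = 0.0, 0.5
    for _ in range(iters):
        mid = (lo + hi) / 2
        if H2(mid) <= 1 - R:
            lo = mid
        else:
            hi = mid
    return lo

print(f"GV relative distance at rate 1/2: {gv_delta(0.5):.4f}")
```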

3. The connection between codes and branching programs

Although we have not formally defined a trellis in the previous section, loosely speaking, a trellis for a binary code C ⊆ {0,1}^n may be thought of as an oblivious, leveled, read-once/write-once branching program that computes the encoder function E: {0,1}^k → C. Arguing by partitioning the trellis for an (n, k, d) binary code C into sections of length d − 1, Lafourcade and Vardy [22] proved, back in 1995, that the logarithm S of the number of nodes in such a trellis is lower-bounded by

S ≥ ⌈k(d−1)/n⌉    (3)

Inspired by Ajtai’s proof [3] of time-space complexity tradeoffs for the HAMMING DISTANCE problem, it was shown by Bazzi and Mitter [4] that the minimum distance of a binary code C is related to the time-space complexity of a boolean branching program that computes the encoder function for C.

Theorem 3. Let C be an (n, k, d) binary code. Let B be a deterministic boolean branching program that computes the encoder function E: {0,1}^k → C in time T and space S. Then

d = O( k (T/k)^3 (S/k)^{k/(2T)} )    (4)

The arguments used for the proof of Theorem 3 in [4] have a lot in common with those used in [22] for the proof of (3). However, the proof of Bazzi and Mitter [4] is much more general and the resulting bound is stronger. Recently, by optimizing the parameters used for the proof of Theorem 3 in [4], we were able to improve upon the bound in (4) slightly. This result is presented in Theorem 4. Moreover, by slightly modifying our arguments, we proved an even stronger bound for a more restricted branching program model. This result is given in Theorem 5. The proofs of Theorem 4 and Theorem 5 can be found in [33].

[4] Up until we encounter Reed-Solomon codes in Section 4.3, we will deal exclusively with binary codes.

Theorem 4. Let C be an (n, k, d) binary code. Let B be a deterministic boolean branching program that computes the encoder function E: {0,1}^k → C in time T and space S. Then

d = O( k (T/k)^2 (S/k)^{k/(2T)} )

Theorem 5. Suppose that a binary (n, k, d) code C is computable using a read-r/write-w deterministic boolean branching program B in time T and space S. Let t = max{r, w}. Then

d = O( k t^2 (S/k)^{1/t} )

Since the bounds in Theorem 4 and Theorem 5 are slightly stronger than the bound of Bazzi and Mitter [4], it is these bounds that we will use for the proof of our results in the next section.

4. Specific lower bounds for certain fundamental algebraic operations

4.1. Complexity of finite-field multiplication

In order to familiarize the reader with the techniques that recur in the subsequent proofs, we will go through this proof in some detail. First we construct a family of binary codes using linear operations in an extension field. The binary codes are systematic of rate R = 1/2, parametrized by an element β ∈ GF(2^n).

Definition 2. Given any integer n > 0 and β ∈ GF(2^n), a (2n, n, d) binary code from the family of codes J_β is defined by the following mapping:

i ↦ c = [i | i ⋆ β]

where i is an n-dimensional binary information vector and ⋆ denotes the FMUL operation in GF(2^n).

These codes were originally used as the inner codes by Justesen [17] in his family of concatenated codes. The Justesen codes are remarkable because they were the first family of constructive codes with assured, simultaneously good, relative minimum distance and rate. Here, we will not be interested in the concatenated construction, but rather in the inner codes themselves. Using the usual counting arguments, we prove that this family contains codes which asymptotically achieve the binary Gilbert-Varshamov bound [15]. We have the following asymptotic results; for the proofs of Theorem 6 and Theorem 7, see the appendix.

Theorem 6. Let the elements of GF(2^n) be represented using the positional encoding (the natural encoding in which each position is associated with the corresponding power of a fixed primitive element of the field), for some fixed primitive element of GF(2^n). Then the codes in the family J_β, parametrized by β ∈ GF(2^n), are linear, and the family contains asymptotically good codes which meet the binary Gilbert-Varshamov bound.

Theorem 7. Any general deterministic boolean branching program B computing multiplication of elements of GF(2^n), represented using the positional encoding, in time T and space S satisfies

(T/n)^2 (S/n)^{n/(2T)} = Ω(1)
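The map of Definition 2 is easy to realize in software. The sketch below works in GF(2^8) with the well-known irreducible polynomial x^8 + x^4 + x^3 + x + 1 (the AES polynomial, 0x11B); this polynomial-basis representation and the helper names are our own illustrative choices, whereas the paper's positional encoding is defined relative to a fixed primitive element.

```python
def gf_mul(a, b, n=8, mod=0x11B):      # mod = x^8 + x^4 + x^3 + x + 1
    """Multiply a and b in GF(2^n): carry-less product, then reduction
    modulo the degree-n irreducible polynomial mod."""
    r = 0
    while b:                           # carry-less (XOR) schoolbook product
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    for i in range(r.bit_length() - 1, n - 1, -1):
        if (r >> i) & 1:               # clear bit i using the modulus
            r ^= mod << (i - n)
    return r

def encode_J(i, beta, n=8):
    """The rate-1/2 map of Definition 2: i -> [i | i * beta] (concatenated)."""
    return (i << n) | gf_mul(i, beta, n)
```

A handy sanity check: in this field 0x53 and 0xCA are multiplicative inverses (the classic example from the AES specification), so gf_mul(0x53, 0xCA) returns 1.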


4.2. Complexity of CONV, MVMUL, and IMUL

In order to prove lower bounds on the complexity of these operations, we make use of an old result from coding theory which says that the family of quasi-cyclic rate-1/2 binary codes is asymptotically good.

Definition 3. An (mn_0, mk_0) linear code is said to be quasi-cyclic with basic block length n_0 if every cyclic shift of a codeword by n_0 symbols yields another codeword.

Let us consider a binary linear code with a generator matrix of the form G = [C_1 | C_2], where G has full row rank and both C_1 and C_2 are square n × n circulant matrices. It is easy to see that this is a (2n, n) rate-1/2 quasi-cyclic code. When C_1 = I_n, the code is said to be in systematic form. We will be concerned with the case when C_2 is non-singular. Associated with any n × n circulant matrix is a polynomial of degree n − 1 representing its first row. It is well known that a circulant matrix is non-singular iff its associated polynomial is relatively prime to (x^n − 1). The following result is due to Chen, Peterson, and Weldon [13] (for ℓ = 1) and to Kasami [18] (for ℓ > 1).

Theorem 8. For an integer a, let M(a) denote the smallest positive integer m such that a | 2^m − 1. Let p be an odd prime number satisfying M(p) = p − 1 and M(p^2) > M(p). Then for any integer ℓ > 0, there exists a systematic non-singular quasi-cyclic (2p^ℓ, p^ℓ) binary code whose relative minimum weight δ_ℓ satisfies the inequality H_2(δ_ℓ) ≥ (p−1)/(2p).

Remark: Many such primes exist; examples include 3, 5, 11, 13, ..., 10006699, and so forth. Expressed in a different way, for such primes p, 2 has multiplicative order p − 1 modulo p.

Theorem 9. Any general deterministic boolean branching program B computing multiplication in the ring GF(2)[x]/(x^n − 1) (circular convolution of binary polynomials) in time T and space S satisfies

(T/n)^2 (S/n)^{n/(2T)} = Ω(1)

The proof of Theorem 9 is given in the appendix.
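The ring product in Theorem 9, i.e. circular convolution in GF(2)[x]/(x^n − 1), coincides with multiplication by a circulant matrix, which is the view used for MVMUL below. A small sketch (helper names are ours) checks the two against each other:

```python
def circ_convolve(a, b):
    """Product of binary polynomials a, b in GF(2)[x]/(x^n - 1):
    coefficient i is sum_j a_j * b_{(i-j) mod n}, reduced mod 2."""
    n = len(a)
    return [sum(a[j] & b[(i - j) % n] for j in range(n)) % 2 for i in range(n)]

def circulant_mv(c, x):
    """Multiply the n x n circulant matrix with entries C[i][j] = c[(i-j) % n]
    (its first column is c) by the vector x, over GF(2)."""
    n = len(c)
    return [sum(c[(i - j) % n] & x[j] for j in range(n)) % 2 for i in range(n)]

a = [1, 0, 1, 1, 0]
b = [0, 1, 1, 0, 1]
```

With this indexing convention, circulant_mv(b, a) and circ_convolve(a, b) compute exactly the same vector, which is the observation behind the MVMUL bound in Corollary 11.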
Unlike Theorem 9, the following corollary is valid only for a more restricted model; it will be used later for proving a lower bound for IMUL.

Corollary 10. Let r, w ≤ t. Then any read-r/write-w deterministic boolean branching program B computing multiplication in the ring GF(2)[x]/(x^n − 1) (circular convolution of binary polynomials) in space S satisfies

t^2 (S/n)^{1/t} = Ω(1)

Proof. The class of branching programs considered is restricted in the number of times any input variable is read and any output variable is written. The assertion is a simple consequence of the minimum distance bound (Theorem 5) for this restricted class of programs.

Corollary 11. Any general deterministic boolean branching program B computing a binary matrix-vector product of dimension n in time T and space S satisfies

(T/n)^2 (S/n)^{n/(2T)} = Ω(1)

Proof. Almost all random binary linear codes are asymptotically good. The result follows since an encoder for such a code may be realized using matrix-vector multiplication. Another way to see this is by noting that convolution of two polynomials is the same as the evaluation of a matrix-vector product; the result then follows from Theorem 9.

Corollary 12.

(i) Any general deterministic boolean branching program B computing the integer product of two n-bit numbers represented using the positional (2-adic) encoding in time T and space S satisfies

    ((T log n)/n)^2 ((2S log n)/n)^{n/(2T log n)} = Ω(1).

(ii) Let r, w ≤ t. Then any general read-r, write-w deterministic boolean branching program B computing the integer product of two n-bit numbers in space S satisfies

    t^2 ((2S log n)/n)^{1/t} = Ω(1).

Proof. See the appendix for a proof by reduction to CONV.

4.3. Complexity of the discrete Fourier transform (DFT)

In this section, we make use of the properties of (generalized) Reed-Solomon codes to draw conclusions about the time-space tradeoff for branching programs computing the DFT.

Definition 4. Let α = (α0, . . . , α_{n−1}), where the αi are distinct elements of GF(q^m), and let v = (v0, . . . , v_{n−1}), where the vi are any nonzero elements of GF(q^m). Let k ≤ n and let a k-dimensional information vector with elements in GF(q^m) be represented by a polynomial F(x) of degree at most k − 1. Then the generalized Reed-Solomon code, denoted GRS_{(n,k)}(α, v), is defined by the linear map over GF(q^m)

    (F0, . . . , F_{k−1}) ↦ (v0 F(α0), . . . , v_{n−1} F(α_{n−1})).

It is well known that the minimum distance d of GRS codes satisfies d = n − k + 1. This leads us to the following results; for the proofs of Theorem 13 and Corollary 14, see the appendix.

Theorem 13. Let m > log n be an integer. Then any general deterministic boolean branching program B encoding the code GRS_{(n,k)}(α, v) defined over GF(2^m) in time T and space S satisfies

    m (T/km)^2 (S/km)^{km/2T} = Ω(1).
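Definition 4 is easy to exercise at toy scale. The sketch below (an illustration only; it works over the prime field GF(7) rather than GF(2^m), purely to keep the arithmetic elementary, with arbitrarily chosen parameters n = 6, k = 3 and all-ones multipliers v) brute-forces the minimum distance of a small GRS code and confirms the MDS property d = n − k + 1:

```python
from itertools import product

p = 7                                # prime field GF(7), standing in for GF(q^m)
alpha = [1, 2, 3, 4, 5, 6]           # distinct evaluation points
v = [1, 1, 1, 1, 1, 1]               # nonzero column multipliers
n, k = len(alpha), 3

def grs_encode(msg):
    """(F_0,...,F_{k-1}) -> (v_0 F(alpha_0), ..., v_{n-1} F(alpha_{n-1})) over GF(p)."""
    return [vi * sum(f * pow(ai, j, p) for j, f in enumerate(msg)) % p
            for ai, vi in zip(alpha, v)]

# Brute-force the minimum distance; by linearity it equals the minimum weight.
d = min(sum(c != 0 for c in grs_encode(m))
        for m in product(range(p), repeat=k) if any(m))
assert d == n - k + 1                # MDS: d = n - k + 1 = 4
```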

Corollary 14. Let 0 < γ < 1 be a constant. Then any general deterministic boolean branching program B computing an n-point discrete Fourier transform in time T and space S satisfies

    log n (T/(n log n))^2 (S/(γ n log n))^{γ n log n / 2T} = Ω(1).
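The identification behind Corollary 14 — an n-point finite-field DFT of a zero-padded message is exactly a Reed-Solomon encoding with v = (1, . . . , 1) and evaluation points the powers of a primitive element — can be checked directly. The sketch below is illustrative only: it uses the prime field GF(17) (rather than the GF(2^m) of the corollary, purely for elementary arithmetic) with ω = 3, which has multiplicative order 16:

```python
p, w, n, k = 17, 3, 16, 5        # GF(17); w = 3 is a primitive 16th root of unity mod 17

def ntt(a):
    """n-point finite-field DFT over GF(p): A_i = sum_j a_j * w^(i*j)."""
    return [sum(aj * pow(w, i * j, p) for j, aj in enumerate(a)) % p for i in range(n)]

msg = [5, 0, 11, 2, 7]
# DFT of the zero-padded message = RS encoding: evaluate F(x) at alpha_i = w^i.
dft_out = ntt(msg + [0] * (n - k))
rs_out = [sum(f * pow(pow(w, i, p), j, p) for j, f in enumerate(msg)) % p
          for i in range(n)]
assert dft_out == rs_out
# Since the evaluation points are distinct, the codeword weight obeys d >= n - k + 1.
assert sum(c != 0 for c in dft_out) >= n - k + 1
```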

5. Functional forms of the lower bounds and comparison with upper bounds As observed in the introduction, lower bound results often appear in a variety of forms, making them difficult to compare. Hence, it is of interest to see typical functional forms of the lower bounds derived in this paper. It is also interesting to observe how tight the various lower bounds are in comparison with known efficient algorithms for sequential computation of the functions considered in Section 4.

Typical functional forms. A list of simultaneous time-space lower bounds which can be obtained for boolean BPs implementing the fundamental operations of interest, as implied by the tradeoff results of this paper, is compiled in Table 2 below.

Operation                | Time                                           | Space                           | Model
-------------------------|------------------------------------------------|---------------------------------|-----------------------------------
n-bit FMUL, CONV, MVMUL  | ω(n)                                           | Ω(n)                            | General BP2
n-bit FMUL, CONV, MVMUL  | ω(n log n / log(1 + ε_T log n)), ∀ε_T > 0      | ω(n^{1−ε_S}), ∀ε_S > 0          | General BP2
n-bit IMUL               | ω(n / log(1 + ε_T log n)), ∀ε_T > 0            | ω(n^{1−ε_S}), ∀ε_S > 0          | General BP2
n-bit IMUL               | t = ω(log n / (1 + ε_T log log n)), ∀ε_T > 0   | ω(n^{1−ε_S}), ∀ε_S > 0          | read-r/write-w BP2, t = max{r, w}
n-point DFT              | ω(n log^{1−ε_T} n), ∀ε_T > 0                   | ω(n log^{1−ε_S} n), ∀ε_S > 0    | General BP2
n-point DFT              | ω(n log^2 n / log^2 log n)                     | ω(n^{1−ε_S}), ∀ε_S > 0          | General BP2

Table 2. Typical functional forms for the lower bounds in Table 1

It can easily be verified (using Mathematica, for instance) that if we substitute for T and S values below the lower bounds in Table 2 into the corresponding time-space tradeoff expressions given in Table 1, those expressions vanish asymptotically as n → ∞, thereby verifying the results in Table 2.
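The same sanity check can also be scripted without a computer-algebra system. As a numerical probe (not a proof), fix S = n^{3/4} in the FMUL/CONV tradeoff of Theorem 9 and bisect for the smallest T keeping the left-hand side at least 1; the left-hand side is increasing in T, and the resulting minimum time is superlinear in n, with T/n growing as n grows:

```python
def tradeoff(T, S, n):
    """Left-hand side of the Theorem 9 tradeoff: (T/n)^2 * (S/n)^(n/2T)."""
    return (T / n) ** 2 * (S / n) ** (n / (2 * T))

def min_time(n, S):
    """Smallest T with tradeoff(T, S, n) >= 1; the LHS increases in T, so bisect."""
    lo, hi = n / 2.0, 64.0 * n       # LHS < 1 at lo and > 1 at hi for these parameters
    for _ in range(200):
        mid = (lo + hi) / 2
        if tradeoff(mid, S, n) >= 1:
            hi = mid
        else:
            lo = mid
    return hi

# Sublinear space S = n^(3/4) forces superlinear time, and the gap widens with n.
for n in (2 ** 10, 2 ** 16):
    assert min_time(n, n ** 0.75) > n
```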

Upper bounds from known efficient algorithms. The time complexity of practical algorithms is usually computed in terms of the number of bit operations, with each such operation assumed to consume one time tick. In the BP model, however, the operations involved in a state transition require only one time tick, no matter how many bits are affected in the process. Therefore it is easy to see that BP lower bounds are applicable to most sequential machines, even though tighter bounds are usually possible for specific other models. While each of the operations we considered may be trivially accomplished using a 2-way BP in space and time linear in either the input or output length, there are often clever algorithms which significantly reduce the number of bit operations. Karatsuba [19] gave a divide-and-conquer multiplication algorithm which requires T = O(n^{log 3}) and S = O(n) for n-bit multiplication. Schönhage and Strassen [19] employ divide-and-conquer and FFT techniques to give a T = O(n log n log log n), S = O(n) algorithm for integer multiplication. Both of the above algorithms are applicable to polynomial multiplication. Similarly, an n-point DFT may be performed in T = O(n log n) and S = O(n log n) using an FFT algorithm.
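For reference, Karatsuba's scheme is short enough to state in full. The sketch below is the standard three-multiplication recurrence on arbitrary-precision integers (the base-case threshold of 16 is an arbitrary choice):

```python
def karatsuba(x, y):
    """Divide-and-conquer integer multiplication in O(n^{log2 3}) bit operations."""
    if x < 16 or y < 16:
        return x * y                        # small operands: multiply directly
    m = max(x.bit_length(), y.bit_length()) // 2
    xh, xl = x >> m, x & ((1 << m) - 1)     # split x = xh*2^m + xl
    yh, yl = y >> m, y & ((1 << m) - 1)     # split y = yh*2^m + yl
    a = karatsuba(xh, yh)                   # product of high parts
    b = karatsuba(xl, yl)                   # product of low parts
    c = karatsuba(xh + xl, yh + yl) - a - b # cross terms via ONE extra product
    return (a << (2 * m)) + (c << m) + b

assert karatsuba(12345678901234567890, 98765432109876543210) == \
       12345678901234567890 * 98765432109876543210
```

The saving comes from replacing the four half-size products of schoolbook multiplication with three, giving the O(n^{log2 3}) = O(n^{1.585}) recurrence.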


References

[1] K. Abrahamson, Time-space tradeoffs for algebraic problems on general sequential machines, J. Computer and System Sciences, 43, pp. 269–289, 1991.
[2] M. Ajtai, Determinism versus non-determinism for linear time RAMs with memory restrictions, in Proc. 31st Annual ACM Symp. Theory of Computing (STOC), pp. 632–641, Atlanta, GA, May 1999.
[3] M. Ajtai, A nonlinear time lower bound for boolean branching programs, in Proc. 40th Annual Symp. Found. Computer Science (FOCS), pp. 60–70, New York, NY, October 1999.
[4] L.M.J. Bazzi and S.K. Mitter, Encoding complexity versus minimum distance, submitted to IEEE Trans. Inform. Theory, preprint 2003 (http://web.mit.edu/~louay/Public/preprints/).
[5] L.M.J. Bazzi, M. Mahdian, and D.A. Spielman, The minimum distance of turbo-like codes, submitted to IEEE Trans. Inform. Theory, May 2003 (http://math.mit.edu/~spielman/Papers/mindist.pdf).
[6] P. Beame, T.S. Jayram, and M. Saks, Time-space tradeoffs for branching programs, J. Computer and System Sciences, 63, pp. 542–572, 2001.
[7] P. Beame, M. Saks, X. Sun, and E. Vee, Time-space trade-off lower bounds for randomized computation of decision problems, Journal of the ACM, 50, pp. 154–195, 2003.
[8] P. Beame, M. Saks, and J.S. Thathachar, Time-space tradeoffs for branching programs, in Proc. 39th Annual Symp. Found. Computer Science (FOCS), pp. 254–263, Palo Alto, CA, November 1998.
[9] B. Bollig and P. Woelfel, A read-once branching program lower bound of Ω(2^{n/4}) for integer multiplication using universal hashing, in Proc. 33rd Annual ACM Symp. Theory of Computing (STOC), pp. 419–424, Heronissos, Crete, Greece, July 2001.
[10] A. Borodin and S.A. Cook, A time-space trade-off for sorting on a general sequential model of computation, SIAM J. Computing, 11, pp. 287–297, 1982.
[11] A. Borodin, A.A. Razborov, and R. Smolensky, On lower bounds for read-k-times branching programs, Computational Complexity, 3, pp. 1–18, 1993.
[12] R.E. Bryant, On the complexity of VLSI implementations and graph representations of boolean functions with applications to integer multiplication, IEEE Trans. Computers, 40, pp. 205–213, 1991.
[13] C.L. Chen, W.W. Peterson, and E.J. Weldon Jr., Some results on quasi-cyclic codes, Information and Control, 15, pp. 407–423, 1969.
[14] J. Gergov, Time-space tradeoffs for integer multiplication on various types of input oblivious sequential machines, Information Proc. Letters, 51, pp. 265–269, 1994.
[15] E.N. Gilbert, A comparison of signalling alphabets, Bell Syst. Tech. Journal, 31, pp. 504–522, 1952.
[16] S. Jukna, The graph of integer multiplication is hard for read-k-times networks, Tech. Report 95-10, Universität Trier, 1995.
[17] J. Justesen, A class of constructive asymptotically good algebraic codes, IEEE Trans. Inform. Theory, 18, pp. 652–656, 1972.
[18] T. Kasami, A Gilbert-Varshamov bound for quasi-cyclic codes of rate 1/2, IEEE Trans. Inform. Theory, 20, pp. 679–681, 1974.
[19] D.E. Knuth, The Art of Computer Programming, Volume 2: Seminumerical Algorithms, 3rd edition, Addison-Wesley, New York, NY, 1998.
[20] V.A. Kuz'min, An estimate of the complexity of the realization of functions of the algebra of logic by the simplest forms of binary programs, Metody Diskretnogo Analiza, 29, pp. 11–39, 1976.
[21] J.D. Lafferty and A. Vardy, Ordered binary decision diagrams and minimal trellises, IEEE Trans. Computers, 48, pp. 971–986, September 1999.
[22] A. Lafourcade and A. Vardy, Asymptotically good codes have infinite trellis complexity, IEEE Trans. Inform. Theory, 41, pp. 555–559, March 1995.
[23] A. Lafourcade and A. Vardy, Lower bounds on trellis complexity of block codes, IEEE Trans. Inform. Theory, 41, pp. 1938–1954, November 1995.
[24] C.Y. Lee, Representation of switching circuits by binary-decision programs, Bell Syst. Tech. Journal, 38, pp. 985–999, 1959.
[25] F.J. MacWilliams and N.J.A. Sloane, The Theory of Error Correcting Codes, North-Holland/Elsevier, Amsterdam, 1977.
[26] Y. Mansour, N. Nisan, and P. Tiwari, The computational complexity of universal hashing, Theoretical Computer Science, 107, pp. 121–133, 1993.
[27] W. Masek, A Fast Algorithm for the String Editing Problem and Decision Graph Complexity, M.Sc. Thesis, MIT, May 1976.
[28] J. Morgenstern, Note on a lower bound of the linear complexity of the fast Fourier transform, Journal of the ACM, 20, pp. 305–306, 1973.
[29] E.A. Okol'nishnikova, Lower bounds for branching programs computing characteristic functions of binary codes, Metody Diskretnogo Analiza, 51, pp. 61–83, 1991 (in Russian).
[30] J. Pagter, Time-Space Tradeoffs, Ph.D. Thesis, University of Aarhus, Denmark, March 2001.
[31] S. Ponzio, A lower bound for integer multiplication with read-once branching programs, SIAM J. Computing, 26, pp. 798–815, 1998.
[32] A.A. Razborov, Lower bounds for deterministic and nondeterministic branching programs, Lecture Notes in Computer Science, 529, pp. 47–60, 1991.
[33] N. Santhi and A. Vardy, On the branching program complexity of encoding binary codes, preprint, 2004.
[34] M. Sauerhoff and P. Woelfel, Time-space tradeoff lower bounds for integer multiplication and graphs of arithmetic functions, in Proc. 35th Annual ACM Symp. Theory of Computing (STOC), pp. 186–195, San Diego, CA, June 2003.
[35] A. Vardy, Algorithmic complexity in coding theory and the minimum distance problem, in Proc. 29th Annual ACM Symp. Theory of Computing (STOC), pp. 92–109, El Paso, TX, May 1997.
[36] A. Vardy, Trellis structure of codes, Chapter 24, pp. 1989–2118, in Handbook of Coding Theory, V.S. Pless and W.C. Huffman (Eds.), Elsevier, Amsterdam, 1998.
[37] I. Wegener, The Complexity of Boolean Functions, Wiley, New York, NY, 1987.
[38] I. Wegener, Branching Programs and Binary Decision Diagrams: Theory and Applications, SIAM Press, Philadelphia, PA, 2000.
[39] S. Winograd, On the time required to perform multiplication, Journal of the ACM, 14, pp. 793–802, 1967.
[40] Y. Yesha, Time-space tradeoffs for matrix multiplication and the discrete Fourier transform, J. Computer and System Sciences, 29, pp. 183–197, 1984.

Appendix

A. Proof of Theorem 6

The codes Jβ are by definition linear codes over GF(2^n). Furthermore, since GF(2^n) ≅ [GF(2)]^n and since we are using the positional encoding for the extension-field elements, linearity is preserved in the binary image of these codes. Hence all members of this family are (2n, n, d) binary linear codes.

Let us count the number of codes in which a given codeword c appears. Since the family of codes is linear, the information vector i = 0 maps to the codeword c = 0 in all codes. We now assume i ≠ 0 and consider two codes in this family, parametrized by β1 and β2. A particular codeword appears in both codes simultaneously iff i ∗ β1 = i ∗ β2. Since GF(2^n) is a field, i has an inverse with respect to ∗, and hence it must be that β1 = β2. Therefore we conclude that any nonzero codeword appears in exactly one code from this family.

We give below the usual argument for a Gilbert-Varshamov bound, for completeness. For some arbitrary 0 < d ≤ n, the total number of nonzero codewords of weight less than d is

    sum_{w=1}^{d−1} C(2n, w) ≤ 2^{2n H2(d/2n)} − 1,

where H2(x) = −x log x − (1 − x) log (1 − x) is the binary entropy function.

Each member in our family of codes is parametrized with a unique β ∈ GF(2^n) = {0, 1, α, . . . , α^{2^n − 2}}, where α is a primitive element of GF(2^n). So there are a total of 2^n distinct codes in the family. Therefore, provided

    2^{2n H2(d/2n)} < 2^n

holds, there must exist codes in this family with no nonzero codewords of weight less than d. In other words, there exist codes whose relative minimum distance δ satisfies

    H2(δ) > n/2n = 1 − R  ⟹  δ ∈ (H2^{−1}(1/2), 1/2] ⊂ [0.11, 0.5],

and hence these codes meet the G-V bound for binary codes of rate 1/2. The first conclusion is a consequence of the codes being linear, which means that their weight spectrum and distance spectrum are identical. The upper bound on δ follows from the Singleton bound.
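The numerical endpoint 0.11 quoted above is H2^{−1}(1/2), which a few lines of bisection recover:

```python
from math import log2

def h2(x):
    """Binary entropy function H2(x) = -x log x - (1 - x) log(1 - x)."""
    return -x * log2(x) - (1 - x) * log2(1 - x)

def h2_inv(y):
    """Inverse of H2 on (0, 1/2], by bisection (H2 is increasing there)."""
    lo, hi = 1e-12, 0.5
    for _ in range(100):
        mid = (lo + hi) / 2
        if h2(mid) < y:
            lo = mid
        else:
            hi = mid
    return hi

delta = h2_inv(0.5)
assert 0.10 < delta < 0.12     # H2^{-1}(1/2) ~ 0.110, matching the quoted 0.11
```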

B. Proof of Theorem 7

Let us suppose that for some 2-way BP B computing FMUL,

    (T/n)^2 (S/n)^{n/2T} ≠ Ω(1).

We can then trivially construct another BP B′ that encodes any code Jβ within the same space-time complexity bounds, as follows. Upon reading any bit i_j of the information vector i, B′ writes out i_j and performs the same state transition as B would. Thus if B is read-r, write-w, then so is B′. While B writes n variables, B′ writes 2n variables; therefore the maximum time over all computation paths of B′ is at most a constant factor larger than the maximum time over all computation paths of B.

But by Theorem 3, this implies that for all β the codes Jβ are asymptotically bad, so that their relative minimum distance δ → 0. However, we know that this is not true. Having reached a contradiction, it must be that FMUL cannot be computed, using any 2-way BP, more efficiently than the space-time complexity bounds above imply.

C. Proof of Theorem 9

Let the generator matrix of a good systematic non-singular quasi-cyclic (2n, n) code be given by G = [I | C], and let the binary polynomial associated with the circulant matrix C be g(x). The encoding of a binary information polynomial i(x) ∈ GF(2)[x]/(x^n − 1) can then be represented in the form

    i(x) ↦ c(x) = (i(x), i(x) ⊛ g(x)),

where ⊛ denotes the CONV operation. The proof of the theorem now follows by arguing along the same lines as for the FMUL operation. Furthermore, it is clear from the proof that a particularly difficult case of convolution is when one of the polynomials is relatively prime to (x^n − 1).
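The identity underlying this proof — multiplication by a circulant matrix is a circular convolution — can be spelled out concretely. In the sketch below (an illustration, with arbitrary g) the circulant is built column-first, C[i][j] = g[(i − j) mod n], which is the convention under which C·i reproduces i(x)g(x) mod (x^n − 1) directly; the first-row convention used in the text differs only by an index reversal:

```python
def circulant(g):
    """n x n circulant matrix with first column g: C[i][j] = g[(i - j) mod n]."""
    n = len(g)
    return [[g[(i - j) % n] for j in range(n)] for i in range(n)]

def matvec_gf2(M, x):
    """Matrix-vector product over GF(2)."""
    return [sum(r * v for r, v in zip(row, x)) % 2 for row in M]

def circ_conv(a, b):
    """Coefficients of a(x)b(x) mod (x^n - 1) over GF(2)."""
    n = len(a)
    return [sum(a[i] * b[(j - i) % n] for i in range(n)) % 2 for j in range(n)]

g = [1, 1, 0, 1, 0]
i_bits = [1, 0, 1, 1, 0]
C = circulant(g)
# Multiplying by the circulant C is exactly circular convolution with g(x),
# so the map i(x) -> (i(x), i(x) (*) g(x)) is realized by G = [I | C].
assert matvec_gf2(C, i_bits) == circ_conv(i_bits, g)
# Sanity check: each row of C is a cyclic right shift of the previous one.
assert all(C[i] == C[0][-i:] + C[0][:-i] for i in range(len(g)))
```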

D. Proof of Corollary 12

We perform a reduction from the CONV operation using a standard encoding scheme [1]. Let a = (a0, . . . , a_{k−1}) and b = (b0, . . . , b_{k−1}) represent the detached coefficients of any two polynomials in the ring GF(2)[x]/(x^k − 1). Similarly, let c = (c0, . . . , c_{k−1}) be the detached coefficient vector of the result of the CONV operation c(x) = a(x) ⊛ b(x). We now encode a and b to obtain A and B, of 2k⌈log k⌉ and k⌈log k⌉ bits respectively, as follows. Each original bit is post-fixed with ⌈log k⌉ − 1 zeros; for the encoding of a only, the entire bit string produced by the padding procedure is then repeated once. Now treat A and B as two 2k⌈log k⌉-bit numbers in the 2-adic representation and perform the integer multiplication operation to get C = A ∗ B. Padding with zeros prevents the propagation of carries, and the repetition mimics the wrap-around in circular convolution. It is easily verified that c = (c0, . . . , c_{k−1}) = (C_{j⌈log k⌉+2k} | j ∈ {0, 1, . . . , k − 1}).

Thus we see that 2k⌈log k⌉-bit IMUL is at least as complex as k-bit CONV. Define n = 2k⌈log k⌉, giving log k < log n < 2 log k for sufficiently large k. Substituting for k the equivalent expressions involving n in Theorem 9, we get part (i) of the corollary. The generality of the branching program unfortunately makes this bound rather weak. However, the bound can be strengthened if we restrict the BP to a limited number of reads and writes per variable: substituting the expressions involving n in Corollary 10 yields part (ii), a stronger bound for the restricted class of branching programs for which it is valid.
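This padding argument can be run end-to-end. The sketch below follows the same idea with a slightly more generous block width of ⌈log2 k⌉ + 1 bits (so no coefficient sum can ever carry across a block boundary) and handles the wrap-around by repeating a; the index arithmetic for extracting the result therefore differs slightly from the text:

```python
import random

def circ_conv_gf2(a, b):
    """Reference: circular convolution over GF(2)."""
    k = len(a)
    return [sum(a[i] * b[(j - i) % k] for i in range(k)) % 2 for j in range(k)]

def circ_conv_via_imul(a, b):
    """Reduce GF(2) circular convolution to ONE integer multiplication."""
    k = len(a)
    blk = k.bit_length() + 1       # block width: every coefficient sum is < 2^blk
    A = sum(a[i % k] << (i * blk) for i in range(2 * k))  # repeat a: cyclic wrap-around
    B = sum(b[j] << (j * blk) for j in range(k))
    C = A * B
    mask = (1 << blk) - 1
    # Blocks k..2k-1 of C hold the circular-convolution counts; reduce them mod 2.
    return [((C >> ((j + k) * blk)) & mask) % 2 for j in range(k)]

random.seed(7)
a = [random.randrange(2) for _ in range(11)]
b = [random.randrange(2) for _ in range(11)]
assert circ_conv_via_imul(a, b) == circ_conv_gf2(a, b)
```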


E. Proof of Theorem 13

Let 0 < γ < 1 be a real-valued parameter. We choose k = γn, so that D = n − k + 1 = ((1 − γ)/γ)k + 1. This is legitimate, as for any valid choice of v and α there exist GRS codes for every positive k ≤ n. Each element in GF(2^m) can be uniquely encoded as a binary vector of dimension m; let us fix any one such encoding. Now consider the binary image of the code GRS_{(n,k)}(α, v) obtained by replacing each symbol in GF(2^m) by its binary representation, and call this binary code C. Clearly, C is a (km/γ, km, ≥ ((1 − γ)/γ)k + 1) binary code, though it is not necessarily linear. It therefore follows that

    km (T/km)^2 (S/km)^{km/2T} = Ω(((1 − γ)/γ)k + 1).

The theorem follows upon rearranging the factors.

F. Proof of Corollary 14

Let n = 2^m − 1, k = γn, v = (1, . . . , 1), and α = GF(2^m)*, where GF(2^m)* is the cyclic multiplicative group of GF(2^m), generated by a primitive element β. The encoder for GRS_{(n,k)}(α, v) is then simply the n-point finite-field DFT of the information vector, evaluated over this multiplicative group. Making the substitutions k = γn and m = log(n + 1) in the proof of Theorem 13 gives

    γn log(n + 1) (T/(γn log(n + 1)))^2 (S/(γn log(n + 1)))^{γn log(n+1)/2T} = Ω((1 − γ)n + 1).

After rearranging and ignoring constant factors on the LHS, we get the desired result.
