Correlation Attacks on Block Ciphers

Thomas Jakobsen
Master's Thesis



January 1996
Department of Mathematics
Technical University of Denmark
Supervisor: Tom Høholdt

Abstract

This report presents a new statistical attack on iterative block ciphers called the correlation attack, which is a natural generalization of linear cryptanalysis. The attack is based on finding complex-valued functions on the input and the output of a cipher which have a high correlation. Their mutual relation is then exploited to yield information about the final round key. Introducing the notions of imbalance, I/O product, and correlation matrix, it is shown how to measure a cipher's security against the attack, and the mini-cipher IDEA(8) is found to be provably secure (assuming independence of subkeys). Links to other kinds of statistical attacks are explored. In particular, it is shown that the correlation matrix of a cipher and the matrix of differential transition probabilities used with differential cryptanalysis are connected by the 2-dimensional Fourier transform. This implies that correlation cryptanalysis and differential cryptanalysis are essentially of the same strength.

Key words: Correlation, Boolean complexity, linear cryptanalysis, partitioning cryptanalysis, differential cryptanalysis, statistical attack, block cipher, IDEA, SAFER.

Resumé

This report treats a new statistical attack on iterated block ciphers called the correlation attack. The attack, which is a natural generalization of linear cryptanalysis, builds on the occurrence of highly correlated complex-valued functions operating on plaintext and ciphertext. Their mutual relation is exploited to obtain information about the key of the last round. The notions of imbalance, I/O product, and correlation matrix are introduced, and it is shown how a cipher's security against the attack can be measured. It is demonstrated that the mini-cipher IDEA(8) is provably secure (assuming that the subkeys are independent). Connections to other kinds of statistical attacks are examined. In particular, it is shown that a cipher's correlation matrix and the matrix of differential transition probabilities used in differential cryptanalysis are connected via the 2-dimensional Fourier transform. This means that correlation cryptanalysis and differential cryptanalysis are essentially of the same strength.

Key words: Correlation, Boolean complexity, linear cryptanalysis, partitioning cryptanalysis, differential cryptanalysis, statistical attack, block cipher, IDEA, SAFER.

Here Legrand, having re-heated the parchment, submitted it to my inspection. The following characters were rudely traced, in a red tint, between the death's-head and the goat:

53‡‡†305))6*;4826)4‡.)4‡);806*;48†8¶60))85;1‡(;:‡*8†83(88)5*†;46(;88*96*?;8)*‡(;485);5*†2:*‡(;4956*2(5*-4)8¶8*;4069285);)6†8)4‡‡;1(‡9;48081;8:8‡1;48†85;4)485†528806*81(‡9;48;(88;4(‡?34;48)4‡;161;:188;‡?;

\But," said I, returning him the slip, \I am as much in the dark as ever. Were all the jewels of Golconda awaiting me on my solution of this enigma, I am quite sure that I should be unable to earn them." \And yet," said Legrand, \the solution is by no means so dicult as you might be led to imagine from the rst hasty inspection of the characters. These characters, as any one might readily guess, form a cipher { that is to say, they convey a meaning; but then, from what is known of Kidd, I could not suppose him capable of constructing any of the more abstruse cryptographs. I made up my mind, at once, that this was of a simple species { such, however, as would appear, to the crude intellect of the sailor, absolutely insoluble without the key." \And you really solved it?" \Readily; I have solved others of an abstruseness ten thousand times greater. Circumstances, and a certain bias of mind, have led me to take interest in such riddles, and it may well be doubted whether human ingenuity can construct an enigma of the kind which human ingenuity may not, by proper application, resolve. In fact, having once established connected and legible characters, I scarcely gave a thought to the mere diculty of developing their import." Sir Edgar Allan Poe The Gold Bug, 1843.

Preface

Block ciphers are often used to secure communication since they are fast and easy to implement in both software and hardware. Unlike public key systems, the theory of block ciphers has not been very firmly based in a mathematical sense, at least not until recently. Then came the differential attack of Biham and Shamir [4], and later the linear attack of Matsui [30]. These provided the basis for work dealing with provable security against statistical attacks. The new attack in this report, which is called correlation cryptanalysis, represents a natural generalization of linear cryptanalysis and is dual to differential cryptanalysis in a "time/frequency" sense. I hope it will contribute to a better understanding of the nature of both attacks and to a more thorough knowledge of how to construct good block ciphers. To my knowledge, most of the results presented in this work are original.

This report expands on results found while I spent a semester at the Signal and Information Processing Laboratory at the Swiss Federal Institute of Technology, Zurich. The work presented here has come into existence at the Department of Mathematics at the Technical University of Denmark. It has been very nice working here, and I look eagerly forward to doing my Ph.D. at this very same place.

It is a pleasure to thank my supervisor Tom Høholdt, Carlo Harpes, James Massey, and Kim Lüders-Jensen for useful comments and discussions. It has been a pleasure to work on the subject of cryptanalysis, which I find extremely interesting since it brings together such a wide variety of mathematical topics. Thanks also to Karin and my family for support and for putting up with the many hours that I spent over a piece of paper or in front of the screen.

Lyngby, January 31, 1996
Department of Mathematics
Technical University of Denmark

Thomas Jakobsen

Contents

1 Introduction
2 Preliminaries
   2.1 Cipher Model
   2.2 Miscellaneous
   2.3 The Fourier Transform
   2.4 Elements from Linear Algebra
3 The Statistical Attack
4 The Correlation Attack
   4.1 I/O Products and Correlation Matrices
   4.2 The Distribution of Imbalance Estimates
       4.2.1 Empirical Evidence for the Distribution Function
5 Attack Algorithms
   5.1 The Simple Attack
   5.2 The Advanced Attack
   5.3 Simplifying the Advanced Attack
   5.4 Success Probability
       5.4.1 The Simple Attack
       5.4.2 The Advanced Attack
6 Bounds for Multiple Rounds
   6.1 The Simple Attack
   6.2 The Advanced Attack
   6.3 Bounds Related to Composite Matrices
   6.4 Schur Stochastic Decomposition
   6.5 Construction of Secure Round Functions
7 Links to Other Statistical Attacks
   7.1 Differential Cryptanalysis
   7.2 Generalized Linear Cryptanalysis
   7.3 Partitioning Cryptanalysis
   7.4 Relations Between the Attacks
8 A Couple of Ideas
   8.1 Approximation of Boolean Functions
       8.1.1 Repeated Substitution, Expansion, and Truncation
       8.1.2 An Approach Using Resultants
       8.1.3 An Approach Using Buchberger's Algorithm
   8.2 An Authentication Scheme Using Gröbner Bases
9 Conclusion
   9.1 Suggestions for Further Work
Bibliography
A Symbols
B Abbreviations

Chapter 1

Introduction

Since the adoption of DES [36] as a standard in 1977, people have tried zealously to break the cipher, but apparently without any great success. Biham and Shamir [5] mention some of the attacks on DES that have been attempted during the years. Among these are attacks utilizing the complementation property of DES, exhaustive search and related time/memory tradeoffs, the method of formal coding, and the meet-in-the-middle attack. None of these attacks have succeeded in bringing down the complexity to less than half that of exhaustive key search.

Two approaches, however, have proven to be more successful than the rest, and in fact many a block cipher has been brought to its knees by one of these attacks. The attacks are the differential cryptanalysis (DC) of Biham and Shamir [4], introduced in an attempt to cryptanalyze DES, and the linear cryptanalysis (LC) of Matsui, which was also used to attack DES. Harpes, Kramer, and Massey generalized LC in [12] by introducing the notion of binary I/O sums, and the new attack (GLC) was shown to be in some cases more successful than ordinary LC. A similar attack, m-ary GLC, which went beyond the binary case, was explored in [17]. Harpes made further generalizations in [13] when he introduced the notion of partitioning cryptanalysis (PC) and showed that this is a still more powerful attack.

In this work, we develop a new attack, namely correlation cryptanalysis (CC), which can be seen as the natural generalization of LC since it has several nice properties which LC, PC, and most notably GLC lack. These include generalizations of Matsui's piling-up lemma and better applicability to ciphers which are not of the xor-kind. Furthermore, the attack is the "time/frequency" dual of DC, so to speak. The attack is also closely related to the correlation attack on stream ciphers, which was first pointed out by Blaser and Heinzman [6] and fully developed by Siegenthaler [45].

CC works by exploiting highly correlated functions acting on the cipher input and output, respectively. This correlation is then used to yield information about the final round key. When looking at CC, there are three main questions which naturally come to mind:


- How does one determine a cipher's security against CC?
- Given a cipher, what is the best possible attack via CC?
- How is it possible to construct ciphers which are secure against CC?

Each one of these questions will be addressed before we reach the final page. More precisely, the report is organized as follows.

Chapter 2 introduces definitions, models, and notions used in the rest of the text. A number of basic theorems, some of which the reader is probably already familiar with, are also presented to provide reference. The Fourier transform over Abelian groups is also introduced since it plays an important role in the theory of CC.

The notion of a statistical attack is defined in Chapter 3. The statistical attack is a common framework into which it is possible to put LC, GLC, PC, DC, and CC. These cryptanalyses are all known-plaintext attacks on block ciphers, and common to them all is the statistical analysis of plaintext/ciphertext pairs (P/C-pairs), which in the successful case leads to knowledge of the key of the last (or first) round.

The basic notions of the correlation attack are explained in Chapter 4. Among these are the so-called correlation matrix and the imbalance operator, which are both central tools. It is shown that the correlation of function pairs over a cipher is closely linked with the correlation matrix. It is also shown how to obtain the correlation matrix of a whole cipher given the correlation matrices of the individual rounds.

In Chapter 5, the mechanics of the correlation attack are developed and three different attack algorithms of varying complexity and strength are presented. The simple attack considers the correlation of only one function pair to yield information about the key. Inspired by the approach with "multiple approximations" of [20], the advanced attack considers several pairs of functions. The advanced attack has the drawback of being difficult to analyze, and we therefore also present a simplification of this attack. Based on analysis of the simpler attack, the chapter also presents a result which states how many P/C-pairs are required to carry out a successful attack. Finally, based on Weil's bound [28], it is shown how the Boolean complexity of an xor-cipher relates to its correlation immunity.

On the basis of the multiplicativity of correlation matrices, we derive in Chapter 6 lower and upper multiround bounds for the number of P/C-pairs required to do a successful analysis of a given cipher. The Frobenius norm of the cipher's correlation matrix plays an important role in this context. The results answer one of the questions which have arisen through the study of LC, namely how to establish by proof whether a given cipher is secure or not.

The relationships between CC and other types of statistical attacks are examined in Chapter 7. It is shown that DC and CC are dual notions in the sense that, informally speaking, they represent the same attack in, respectively, the time domain and the frequency domain. More precisely, the matrix of differential transition probabilities used with DC is the 2-dimensional Fourier transform of the correlation matrix (and vice versa). As a consequence, the strength of a full correlation attack exploiting every possible linear connection between input and output depends on exactly the same measure of weakness as the strength of a full differential attack exploiting every possible differential characteristic. Links to m-ary GLC and PC are also examined, and it is shown that these two approaches are not more successful than CC.

Chapter 8 contains a collection of ideas for new attacks which would perhaps be worth some investigation. Among these is an approach using Buchberger's algorithm to obtain probabilistic information about the key. An idea for an authentication scheme using Gröbner bases is also presented.

Finally, in Chapter 9 we summarize our results with a conclusion and some suggestions for further work.


Chapter 2

Preliminaries

In this chapter we present the notation and the definitions used throughout the report. We also present some basic theorems regarding the Fourier transform and elements from linear algebra to provide reference.

2.1 Cipher Model

We consider an iterative block cipher with a round function consisting of a keyed Abelian group operation + at the entry followed by a keyed permutation (see Fig. 2.1). Letting (G, +) denote the employed Abelian group of order n with neutral element 0, we denote by −a the additive inverse of a, and we write a − b instead of a + (−b).


Figure 2.1: The considered round function.

More formally, we are looking at an iterative block cipher of r rounds with n possible input values and a round function given by

R_k(x) = φ_{k_L}(x + k_R),

where x, k_R ∈ G and φ_{k_L} : G → G is a permutation indexed by a key k_L (by k_L and k_R we denote the "left-hand" and the "right-hand" parts of the key k).
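To make the model concrete, here is a minimal Python sketch of this round-function structure. The group Z_16, the S-box, and the way φ_{k_L} depends on k_L are invented for illustration; the text does not specify any particular instantiation.

```python
# Toy instance of the round model R_k(x) = phi_{kL}(x + kR) over G = Z_16.
# SBOX and the key-dependent shift inside phi are hypothetical choices.
N = 16
SBOX = [7, 12, 1, 9, 0, 14, 3, 10, 5, 2, 15, 4, 11, 8, 6, 13]

def phi(k_left, x):
    """Keyed permutation phi_{kL}: a fixed S-box followed by a
    key-dependent shift (both steps are bijections of Z_16)."""
    return (SBOX[x] + k_left) % N

def round_function(k_left, k_right, x):
    """One round: keyed group operation, then keyed permutation."""
    return phi(k_left, (x + k_right) % N)

def encrypt(subkeys, x):
    """An iterative cipher; subkeys is a list of (kL, kR) pairs,
    assumed independent and uniform as in the text."""
    for k_left, k_right in subkeys:
        x = round_function(k_left, k_right, x)
    return x
```

Since each round is a bijection of G, so is the whole cipher; the reduced cipher of the next paragraph corresponds to dropping the last (k_L, k_R) pair.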


As in [12], the capital letters K, X, and Y denote random variables describing respectively the key, the input, and the output of the cipher. The corresponding lower-case letters denote instances of these random variables. With superscripts on letters, e.g. Y^(j) and k^(j), we indicate that expressions belong to a certain round (j in this case); thus Y^(j) is the output of the j-th round (and the input to round j + 1). We will assume that all subkeys are independent and uniformly distributed. Although subkeys are often generated by some key schedule, this assumption can usually be made without loss of generality.

By the reduced cipher corresponding to an r-round cipher, we mean the cipher consisting of the first (r − 1) rounds of the original cipher. The expanded cipher is defined to be the cipher consisting of the original cipher followed by the inverse of the original last round (with a new subkey K̃). Figure 2.2 shows the original cipher and the corresponding expanded and reduced ciphers.


Figure 2.2: A cipher and the corresponding expanded and reduced cipher.


2.2 Miscellaneous

Let the decomposition of (G, +) into additive, cyclic groups be given by G = Z_{n_1} × Z_{n_2} × ... × Z_{n_g} (where n = n_1 · n_2 ··· n_g and g is the number of subgroups). By subscripts on elements from G we denote the corresponding elements of the cyclic subgroups, e.g., x = (x_1, x_2, ..., x_g), where x ∈ G and x_j ∈ Z_{n_j}.

By i we denote the imaginary unit and by c̄ the complex conjugate of the complex number c. Furthermore, by χ_w : G → C we denote the function given by χ_w(x) = ζ^⟨w,x⟩, where ζ = e^{2πi/n} denotes an n-th primitive root of unity in C, and ⟨w, x⟩ = Σ_{j=1}^{g} w_j x_j T_j mod n, for w, x ∈ G and T_j = n/n_j.

By Ĝ we denote the character group corresponding to (G, +). For the purposes of this report, simply let the elements of the character group be the set of functions given by Ĝ = {χ_w : w ∈ G}. Notice that χ_a(x) · χ_a(y) = χ_a(x + y), χ_a(−x) = χ̄_a(x) = χ_{−a}(x), and χ_a(x) = χ_x(a) for all χ_a ∈ Ĝ and a, x, y ∈ G. In addition, the values of a character always lie on the unit circle in C.

2.3 The Fourier Transform

Definition 2.3.1 (Fourier transform). Let G be a finite Abelian group of order n with character group Ĝ = {χ_a : a ∈ G}. Given a complex-valued function ψ : G → C, the Fourier transform of ψ is the function F{ψ} : G → C defined by

F{ψ}(a) = (1/n) Σ_{x∈G} ψ(x) · χ_{−a}(x)

for all a ∈ G.

For a cyclic group, the above definition coincides with the usual definition of the discrete Fourier transform, and for G = Z_2^w (the "xor-group") it is simply the Walsh–Hadamard transform. We will often use the well-known Parseval's identity, so we include it here (without proof).

Theorem 2.3.2 (Parseval's identity). For any function ψ : G → C,

(1/n) Σ_{x∈G} |ψ(x)|² = Σ_{w∈G} |F{ψ}(w)|².

Another useful property of the Fourier transform is the following.

Theorem 2.3.3 (Inverse Fourier transform). Any function ψ : G → C can be expressed in a unique way as a weighted sum of elements from the character group Ĝ by using the Fourier transform, as follows:

ψ(x) = Σ_{w∈G} F{ψ}(w) · χ_w(x).

For G cyclic this is just another way of stating the well-known property that a periodic function can be thought of as a sum of (co)sines.
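For a cyclic group Z_n the transform above is the discrete Fourier transform with a 1/n normalization. The following Python sketch (a naive O(n²) implementation, for illustration only) lets one check Parseval's identity, the inversion formula, and the fact that the transform of a character is a delta function.

```python
import numpy as np

def char(n, w, x):
    """Character chi_w of Z_n evaluated at x: chi_w(x) = zeta^(w*x)
    with zeta = exp(2*pi*i/n)."""
    return np.exp(2j * np.pi * w * x / n)

def fourier(psi):
    """F{psi}(a) = (1/n) * sum_x psi(x) * chi_{-a}(x)  (Definition 2.3.1)."""
    n = len(psi)
    return np.array([sum(psi[x] * char(n, -a, x) for x in range(n)) / n
                     for a in range(n)])

def inverse_fourier(F):
    """psi(x) = sum_w F{psi}(w) * chi_w(x)  (Theorem 2.3.3)."""
    n = len(F)
    return np.array([sum(F[w] * char(n, w, x) for w in range(n))
                     for x in range(n)])
```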


By δ_ab or δ_a(b) we denote Kronecker's delta function, i.e.,

δ_ab = δ_a(b) = 1 if a = b, and 0 otherwise.

Proposition 2.3.4 The Fourier transform of a character is a delta function. More precisely,

F{χ_a}(x) = δ_a(x).

2.4 Elements from Linear Algebra

By M_{r,s} we denote the set of all complex r × s matrices, and by M_r the set of all complex r × r (square) matrices. We use column vectors; by M^T we denote the transpose of the matrix M and by M* = M̄^T the Hermitian adjoint. To provide reference, we now briefly explain the notions of eigenvalues and singular values and some of their properties. The propositions are given without proof (for these, see a textbook on matrix analysis, e.g., [15] or [16]).

Definition 2.4.1 An eigenvalue λ of the square matrix M ∈ M_m is a complex number that fulfils the equation

M v = λ v

for some nonzero vector v called an eigenvector. An m × m matrix M has m (algebraic) eigenvalues including multiplicities. In decreasing order of magnitude, these will be denoted by λ_1(M), λ_2(M), ..., λ_m(M) (i.e., |λ_1(M)| ≥ |λ_2(M)| ≥ ... ≥ |λ_m(M)|).

Proposition 2.4.2 Every symmetric matrix M ∈ M_m can be factored into three matrices

M = V^{−1} Λ V,

where Λ is a diagonal matrix consisting of the eigenvalues of M and the rows of V are the corresponding eigenvectors.

Two matrices A and B are said to be similar if A = M^{−1} B M for some matrix M. Similar matrices have the same eigenvalues.

Definition 2.4.3 The singular values of the square matrix M ∈ M_m are defined to be the square roots of the eigenvalues of M* M.

Singular values are always real numbers since the eigenvalues of M* M are nonnegative. The m singular values (including multiplicities) of an m × m matrix M will be denoted by σ_1(M), σ_2(M), ..., σ_m(M), such that σ_1(M) ≥ σ_2(M) ≥ ... ≥ σ_m(M). When the elements of M are real numbers, M* = M^T and in that case the singular values of M are of course the square roots of the eigenvalues of M^T M. At some point, we will need to measure the "magnitude" of a matrix. Here the following definitions are useful.


Definition 2.4.4 Let M = [m_ab] ∈ M_n be a square matrix. Define the following matrix operators:

  l1 norm:                 ||M||_1   = Σ_ab |m_ab|
  Frobenius norm:          ||M||_2   = (Σ_ab |m_ab|²)^{1/2}
  l∞ norm:                 ||M||_∞   = max_ab |m_ab|
  Maximum column sum norm: |||M|||_1 = max_b Σ_a |m_ab|
  Spectral norm:           |||M|||_2 = σ_1(M)
  Maximum row sum norm:    |||M|||_∞ = max_a Σ_b |m_ab|
  Spectral radius:         ρ(M)      = |λ_1(M)|

Every norm is bounded by every other norm when suitably scaled. The following proposition presents the tightest bounds possible for all of the norms introduced above.

Proposition 2.4.5 Let M ∈ M_r be a square matrix. Then ||M||_1, ||M||_2, r||M||_∞, |||M|||_1, |||M|||_2, and |||M|||_∞ are all matrix norms on M, i.e.,

||M^n|| ≤ ||M||^n for all n ∈ N,

where || · || is one of the mentioned operators. Furthermore, the following table gives the best constants k such that ||M||_α ≤ k ||M||_β (row norm α, column norm β) for all M ∈ M_r:

              |||·|||_1  |||·|||_2  |||·|||_∞  ||·||_1  ||·||_2  r||·||_∞
  |||·|||_1       1         √r         r         1        √r        1
  |||·|||_2      √r          1        √r         1         1        1
  |||·|||_∞       r         √r         1         1        √r        1
  ||·||_1         r       r^{3/2}      r         1         r        r
  ||·||_2        √r         √r        √r         1         1        1
  r||·||_∞        r          r         r         r         r        1

The spectral radius ρ(·) (which is not a matrix norm) is bounded from above by every matrix norm.

The table is from [43].
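The operators of Definition 2.4.4 are straightforward to compute with numpy; the sketch below evaluates all of them and spot-checks a few of the bounds from the table (the √r and r^{3/2} constants, and the fact that ρ is dominated by every norm) on a random matrix.

```python
import numpy as np

def matrix_measures(M):
    """The operators from Definition 2.4.4, plus the spectral radius."""
    return {
        "l1": np.abs(M).sum(),                              # ||M||_1
        "frobenius": np.sqrt((np.abs(M) ** 2).sum()),       # ||M||_2
        "linf": np.abs(M).max(),                            # ||M||_inf
        "max_col_sum": np.abs(M).sum(axis=0).max(),         # |||M|||_1
        "spectral": np.linalg.svd(M, compute_uv=False)[0],  # |||M|||_2 = sigma_1(M)
        "max_row_sum": np.abs(M).sum(axis=1).max(),         # |||M|||_inf
        "rho": np.abs(np.linalg.eigvals(M)).max(),          # rho(M), not a norm
    }
```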

Proposition 2.4.6 Let C be a doubly stochastic matrix. Then the spectral radius ρ(C) equals 1. Moreover, e = (1, 1, ..., 1) is an eigenvector with eigenvalue 1. It is easily checked that Ce = e since C is doubly stochastic.

Definition 2.4.7 By the Hadamard product or Schur product M ∘ N of the two matrices M = [m_ab] and N = [n_ab] with the same dimensions we define the matrix which is the elementwise product of M and N. More formally,

M ∘ N = [m_ab · n_ab].

For more details on the properties of the Hadamard product, see [16].


Proposition 2.4.8 The Hadamard product is submultiplicative with respect to the spectral norm in the following sense. For any A, B ∈ M_m we have

Σ_{j=1}^{k} σ_j(A ∘ B) ≤ Σ_{j=1}^{k} σ_j(A) · σ_j(B)

for k = 1, 2, ..., m. In particular,

|||A ∘ B|||_2 ≤ |||A|||_2 · |||B|||_2.

Definition 2.4.9 A square matrix C is said to be Schur stochastic, unitary stochastic, or orthostochastic if it has the form

C = U ∘ Ū

for some unitary matrix U.

We mention without proof that every orthostochastic matrix is doubly stochastic.
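A quick numerical illustration (using a random unitary matrix obtained from a QR factorization, which is an arbitrary choice): C = U ∘ Ū has entries |u_ab|², and one can check directly that it is doubly stochastic and, per Proposition 2.4.6, has spectral radius 1 with e as an eigenvector.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6

# Random unitary U from the QR factorization of a random complex matrix.
Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
U, _ = np.linalg.qr(Z)

# Orthostochastic matrix C = U o conj(U): entries are |u_ab|^2.
C = (U * U.conj()).real
```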

Definition 2.4.10 The Kronecker product of V = [v_ab] ∈ M_{r,s} and W = [w_ab] ∈ M_{t,u} is denoted by V ⊗ W and is defined to be the block matrix

V ⊗ W = [ v_{1,1}·W  v_{1,2}·W  ...  v_{1,s}·W
          v_{2,1}·W  v_{2,2}·W  ...  v_{2,s}·W
             ...        ...     ...     ...
          v_{r,1}·W  v_{r,2}·W  ...  v_{r,s}·W ].

Proposition 2.4.11 Some basic properties of the Kronecker product are

(A ⊗ B)^T = A^T ⊗ B^T,
(A ⊗ B)* = A* ⊗ B*,

and

(A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).
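The mixed-product and transpose properties are easy to verify numerically; a small numpy check (on arbitrary random 3 × 3 matrices):

```python
import numpy as np

rng = np.random.default_rng(3)
A, B, C, D = (rng.normal(size=(3, 3)) for _ in range(4))

# Mixed-product property: (A (x) B)(C (x) D) = (AC) (x) (BD).
left = np.kron(A, B) @ np.kron(C, D)
right = np.kron(A @ C, B @ D)

# Transpose property: (A (x) B)^T = A^T (x) B^T.
t_left = np.kron(A, B).T
t_right = np.kron(A.T, B.T)
```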

Chapter 3

The Statistical Attack

In [35], Murphy, Piper, Walker, and Wild present a general setting which is useful for characterizing a number of cryptanalytic attacks based on statistical testing. Their work applies to both block ciphers and stream ciphers. In this chapter and in Chapter 7, we put LC, GLC, PC, DC, and CC into this setting. However, the chapter can be read independently of [35] since we develop our own notation and do not follow the conventions of Murphy et al.

A common element of LC, GLC, PC, DC, and CC is the statistical analysis of P/C-pairs, which in the successful case leads to determination of the final round key. More specifically, all five attacks depend on the existence of a random variable S which depends on the cipher input X and the reduced cipher output Y^(r−1) in such a way that information is leaked. For this to be the case, S must be uniformly distributed when X and Y^(r−1) are randomly chosen and independent of each other, and as non-uniformly distributed as possible (with respect to some measure) when X and Y^(r−1) describe an actual P/C-pair of the reduced cipher.

In the following, let V be some set and let T be a family of balanced functions of the form s : G² → V (i.e., s(X, Y) is uniformly distributed over V when (X, Y) is uniformly distributed over G²). Furthermore, let a likelihood evaluator L be a certain measure of non-uniformity; more precisely, let L be a function which maps a random variable S with values in V into the nonnegative real numbers, such that L(S) = 0 if S is uniformly distributed over V. Define by an attack descriptor the pair (T, L), where T is a family of functions onto V as described above and L is a likelihood evaluator used to measure the non-uniformity of V-valued random variables. We will use the above model of a statistical attack here and in the following chapters to describe all of the above-mentioned attacks.

Example 3.0.1 Linear cryptanalysis as introduced by Matsui [30, 31] has an attack descriptor given by

T = {s | s : Z_2^w × Z_2^w → Z_2; a, b ∈ Z_2^w \ {0}; s(x, y) = a · x ⊕ b · y for all x, y}

and

L(S) = |P[S = 0] − 1/2| for all S,


where ⊕ denotes addition modulo 2, w is the bit-width of the cipher, and · denotes the inner product over Z_2^w; i.e., T is the set of all binary linear relations between input and output, and the value of the likelihood evaluator L is defined to be the bias of its argument.

The following is an explanation of the basic attack with descriptor (T, L) as it applies to our cipher model (for actual ciphers like DES there might be more efficient approaches). As usual, X and Y denote the cipher input and output, respectively. First, we guess a last-round key k̃. With this guess we obtain the one-round decryption of the ciphertext, Ỹ = (R_k̃^(r))^{−1}(Y). There are now two possibilities:

- We have guessed the correct last-round key.
- We have guessed a wrong last-round key.

For each of these possibilities there is a corresponding distribution of Ỹ.

Figure 3.1: Cipher output given a correct and a wrong last-round key guess.

If we have guessed the key correctly (the first case), then Ỹ equals Y^(r−1) (see Figure 3.1), and therefore the two pairs (X, Ỹ) and (X, Y^(r−1)) are identically distributed (i.e., Ỹ is the output of the reduced cipher). If we have guessed a wrong key (the second case), then Ỹ is generally not equal to Y^(r−1). Instead, (X, Ỹ) = (X, Y^(r+1)) follows the distribution of a P/C-pair from an expanded cipher.

To distinguish between the two cases, we do a statistical test of the hypothesis H1: (X, Ỹ) is distributed like (X, Y^(r−1)) (success) against the alternative H2: (X, Ỹ) is distributed like (X, Y^(r+1)) (failure). If the test points at H1 being correct, we declare our key guess to be the correct last-round key. If the test points at the alternative H2, we discard our key guess and repeat the whole procedure with another last-round key guess. See Figure 3.2 for a schematic of the attack (the figure is from [14]).

Figure 3.2: Schematic of the statistical attack.

To carry out the test in practice, we consider a function s from T and the corresponding random variables S = s(X, Ỹ), S^(1..r−1) = s(X, Y^(r−1)), and S^(1..r+1) = s(X, Y^(r+1)). We then test the simple hypothesis

H1: L(S) = L(S^(1..r−1))

against the simple alternative

H2: L(S) = L(S^(1..r+1)).

To do this, we first have to compute an estimate L̃(S) of L(S) based on a sufficient number of P/C-pairs. There are several ways the test can proceed. The Neyman–Pearson lemma [27] is useful for distinguishing between two simple hypotheses.


This requires knowledge of the probability density function of L̃(S) (more about that and the relations to CC later).

Some simplifications are possible if we assume that the random variable S^(1..r+1) is uniformly distributed. This is a valid assumption since Y^(r+1) has passed through two more rounds than Y^(r−1) and therefore, loosely speaking, S^(1..r+1) = s(X, Y^(r+1)) is "more uniformly distributed" than S^(1..r−1) = s(X, Y^(r−1)). This phenomenon is formulated in [12] for GLC as the "Hypothesis of Wrong-Key Randomization". The simplification results in the new test

H1: L(S) = L(S^(1..r−1)) against H2: L(S) ≠ L(S^(1..r−1))

or, alternatively,

H1: L(S) = L(S^(1..r−1)) against H2: L(S) < L(S^(1..r−1)).

An advantage of this test is that it does not require prior knowledge of L(S^(1..r+1)), and if L is appropriately chosen, then knowledge of the probability density function of L̃ is not necessary either. As candidate for the last-round key we simply choose the key which has the highest value of L. The following summarizes in a more algorithmic form the simplest possible attack (K denotes the keyspace of the last round):

    choose a suitable s ∈ T
    for k̃ ∈ K do begin
        Ỹ := (R_k̃^(r))^{−1}(Y)
        compute the estimate L̃(s(X, Ỹ)) based on the available P/C-pairs
        L[k̃] := L̃(s(X, Ỹ))
    end
    output all keys k̃ that maximize L[k̃]

It is possible to improve upon this attack by considering several functions from T instead of only one (namely s). How to do this optimally for the correlation attack will be revealed later on. Next, after having recovered the last-round key, we can proceed by yet another statistical attack to recover the key of the round next to last, and so on. Or, if tractable, we can do an exhaustive key search on the remaining part of the key (taking advantage of relations between bits in the subkeys caused by a key schedule, if any).
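In Python, the key-guessing loop above might be sketched as follows. The cipher-specific ingredients (the statistic s, the one-round inverse, and the likelihood evaluator) are passed in as parameters. The demo scenario at the bottom is deliberately transparent and entirely invented: a single key-addition "round" over Z_16, with a statistic that is constantly 0 under the correct guess and balanced under wrong guesses.

```python
def simple_attack(pc_pairs, key_space, round_inverse, s, evaluator):
    """Score every last-round key guess by the estimated likelihood of
    s(X, Ytilde), where Ytilde is the one-round decryption of Y under the
    guess; return all guesses attaining the maximal score."""
    scores = {}
    for k in key_space:
        samples = [s(x, round_inverse(k, y)) for x, y in pc_pairs]
        scores[k] = evaluator(samples)
    best = max(scores.values())
    return [k for k, score in scores.items() if score == best]

def bias_evaluator(samples):
    """Likelihood evaluator for binary statistics: the estimated bias
    |P[S = 0] - 1/2| (cf. Example 3.0.1); it is 0 for a balanced sample."""
    return abs(samples.count(0) / len(samples) - 0.5)

# Transparent demo: the "cipher" is one key addition over Z_16 (secret
# key 11); the statistic is 0 exactly when the one-round decryption
# matches the plaintext, and balanced otherwise.
pc_pairs = [(x, (x + 11) % 16) for x in range(16)]

def statistic(x, y_tilde):
    return 0 if y_tilde == x else (y_tilde & 1)

recovered = simple_attack(pc_pairs, range(16),
                          lambda k, y: (y - k) % 16,
                          statistic, bias_evaluator)
```

Under the correct guess every sample is 0 (estimated bias 0.5); under any wrong guess the samples are balanced (bias 0), so the maximizer is the secret key.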

15 There exist short-cuts that decrease the number of keys that we need to test. For instance, it is possible to speed up an attack if one can get just some information about Y (r?1) by guessing correctly only some of the bits of the last-round key. More formally, we say that the two keys k~1 and k~2 belong(r) to the same key class if the random variables s(X; Y~1 ) and s(X; Y~2) with Y~1 = (Rk~ )?1 (Y ) and Y~2 = (R(k~r))?1(Y ) are identically distributed for any s 2 T . This relation induces a set of equivalence classes on the key space. The test can only distinguish between keys which are not equivalent, i.e., we can only infer to which key class the correct last-round key belongs { in return we need not test as many keys as earlier. We will not deal further with the subject of key classes, instead we refer to [14]. The attack can also be extended by simultaneously trying to guess the rst-round and the last-round key. This approach also will not be considered. To prevent undermining our complexity estimations by short-cuts like the above we will sometimes tacitly assume that the adversary is computationally unbounded. Thus, if he can't break the cipher by a statistical attack, then nobody can (when even a computationally unbounded adversary can fail a statistical test, the reason lies in the limited number of P/C-pairs). 1


Chapter 4

The Correlation Attack

In this chapter we introduce the basic theory and notions related to the correlation attack. We then proceed to deduce the probability density function for certain so-called imbalance estimates.

4.1 I/O Products and Correlation Matrices

Basic notions of CC are the I/O product, the imbalance operator, and the correlation matrix. These notions are all useful tools for obtaining knowledge about function pairs that have a high correlation. They will be explained in this section. First, however, we need to define certain pairs of functions called I/O pairs.

Definition 4.1.1 I/O pair. Let an I/O pair $(f, g)$ over $G$ be a pair of functions, $f: G \to \mathbb{C}$ and $g: G \to \mathbb{C}$, such that $E[f] = E[g] = 0$ and $E[|f|^2] = E[|g|^2] = 1$. The function $f$ is called the input function and the function $g$ is called the output function. By $\mathcal{P}_G$ we denote the set of all possible I/O pairs over $G$.

The following is the CC-equivalent of the linear relations used with LC.

Definition 4.1.2 I/O product. Given an I/O pair $(f, g)$ and two random variables $X$ and $Y$, define by the I/O product (input/output product) corresponding to $(f, g)$, $X$, and $Y$ the following expression:

$$S = f(X) \cdot \overline{g(Y)}.$$

We sometimes use $S_{fg}$ to denote the I/O product corresponding to $(f, g)$. Similar to the notion of characteristics used with DC, we define the following.

Definition 4.1.3 The linear I/O pair with characteristic $(a, b)$, where $a, b \in G \setminus \{0\}$, is defined to be the I/O pair $(\chi^a, \chi^b)$, where $\chi^a$ and $\chi^b$ denote the characters corresponding to $a$ and $b$. The corresponding linear I/O product $L_{ab}(X, Y)$ is defined to be

$$L_{ab}(X, Y) = \chi^a(X) \cdot \chi^{-b}(Y).$$

We omit $X$ and $Y$ and use $L_{ab}$ if it is implicitly given to which random variables we refer.
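For concreteness: when $G = (\mathbb{Z}_n, +)$, the characters are $\chi^a(x) = e^{2\pi i a x / n}$. The following minimal Python sketch (the modulus $n = 8$ and the toy map $y = 5x \bmod n$ are my illustrative choices, not examples from the thesis) checks the I/O-pair conditions for a character and computes the imbalance of a linear I/O product:

```python
import cmath

n = 8

def chi(a, x):
    """Character chi^a(x) = exp(2*pi*i*a*x/n) of the additive group Z_n."""
    return cmath.exp(2j * cmath.pi * a * x / n)

# chi^a with a != 0 qualifies as an input/output function: zero mean, unit mean square
a = 3
mean = sum(chi(a, x) for x in range(n)) / n
mean_sq = sum(abs(chi(a, x)) ** 2 for x in range(n)) / n

# imbalance of the linear I/O product L_ab for the toy map y = 5x mod n;
# chi^a(x) * conj(chi^b(5x)) sums to n exactly when 5b = a (mod n)
def imbalance(a, b):
    s = sum(chi(a, x) * chi(b, (5 * x) % n).conjugate() for x in range(n)) / n
    return abs(s) ** 2
```

For an affine map the imbalance is either 0 or 1, which is the degenerate picture that the correlation matrix machinery below generalizes.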


It is easily seen that $(\chi^a, \chi^b)$ is indeed an I/O pair since $E[\chi^a] = E[\chi^b] = 0$ and $E[|\chi^a|^2] = E[|\chi^b|^2] = 1$. To measure the non-uniformity of the distribution of an I/O product, we introduce the following notion, which is analogous to the bias used by Matsui [30] to measure the non-uniformity of linear approximations.

Definition 4.1.4 Imbalance. Given an I/O product $S$, define by the imbalance of $S$, denoted by $I(S)$, the squared absolute expected value of $S$, i.e.,

$$I(S) = |E[S]|^2.$$

If the random variable $S$ depends on the random variable $K$, we let $I(S \mid K = k)$, or just $I(S \mid k)$, denote the imbalance of $S$ given that $K = k$. The average-key imbalance $\bar{I}(S)$ of the I/O product $S$ is defined to be the expected value of the imbalance of $S$ over all possible keys $K$, i.e.,

$$\bar{I}(S) = \frac{1}{|\mathcal{K}|} \sum_{k \in \mathcal{K}} I(S \mid K = k),$$

where $\mathcal{K}$ denotes the key space. An I/O product with average-key imbalance 1 is said to be guaranteed.

In other words, the imbalance of the I/O product $S = f(X) \cdot \overline{g(Y)}$ is simply the squared correlation between $f(X)$ and $g(Y)$.

Remark 4.1.5 For xor-type ciphers, that is, ciphers acting on the group $(\mathbb{Z}_2^w, \oplus)$ for some $w$, where $\oplus$ represents componentwise addition modulo 2, the I/O product and the imbalance defined above coincide with the linear relation and 4 times the square of the bias used by Matsui. Thus, results derived for CC are possible to adapt for use with LC as well.

The imbalance operator will be used as likelihood estimator in the attack. To summarize, the attack descriptor of a correlation attack is given by

$$T = \{ s \mid s: G^2 \to \mathbb{C},\ s(x, y) = f(x) \cdot \overline{g(y)} \text{ for all } x, y,\ (f, g) \in \mathcal{P}_G \},$$
$$L(S) = |E[S]|^2 \text{ for all } S.$$

The following, which is called the hypothesis of fixed key equivalence, was introduced in [12] for use with GLC.

Hypothesis 4.1.6 Fixed key equivalence. For every I/O product $S$ and almost every key $k$,

$$I(S \mid k) \approx \bar{I}(S).$$

Roughly, it states that the correlational properties of a cipher are independent of the actual choice of key, meaning that we can use the average-key imbalance of an I/O product as a good approximation of the actual fixed-key imbalance. As indicated by the context, we will sometimes assume that the hypothesis holds. In [25], Lai, Massey, and Murphy define a matrix of differential transition probabilities for use with DC. Similarly, here we put the imbalances of all linear I/O products into a matrix called the correlation matrix.


Definition 4.1.7 Correlation matrix. Given a keyed function $e_k: G \to G$ with input $X$ and output $Y = e_K(X)$, define by the correlation matrix of that function the $(n-1) \times (n-1)$ matrix $C = [c_{ab}]$ with $c_{ab} = \bar{I}(L_{ab}(X, Y))$ for $a, b \in G \setminus \{0\}$.

Notice that the above definition of a correlation matrix is not identical to the definition normally used in probability theory since, e.g., a correlation matrix in the above sense is not always positive semi-definite. A more correct but also more cumbersome choice of words might be the cross-correlation matrix (in probability theory, the correlation matrix is more like a kind of autocorrelation matrix). The definition, however, complies with the notation of Daemen, Govaerts, and Vandewalle, who also use the words correlation matrix in conjunction with (binary) LC.

Example 4.1.8 The correlation matrix $C = [c_{ab}]$ of the function $e_k(x) = x + k$ with $x, k \in G$ is the identity matrix $I$, since

$$c_{ab} = \bar{I}(\chi^a(X) \cdot \chi^{-b}(X + K)) = \frac{1}{n} \sum_{k \in G} I(S \mid K = k)$$
$$= \frac{1}{n} \sum_{k \in G} \left| \frac{1}{n} \sum_{x \in G} \chi^a(x) \cdot \chi^{-b}(x + k) \right|^2 = \frac{1}{n} \sum_{k \in G} \left| \frac{1}{n} \sum_{x \in G} \chi^a(x) \cdot \chi^{-b}(x) \cdot \chi^{-b}(k) \right|^2$$
$$= \left| \frac{1}{n} \sum_{x \in G} \chi^{a-b}(x) \right|^2 = \delta_{ab}.$$

The next observation is crucial to most of the following theory.

Proposition 4.1.9 All correlation matrices are doubly stochastic.

Proof Let $C$ be the correlation matrix of the function $e_k: G \to G$ and let $\mathcal{K}$ denote the key space. That $C$ is doubly stochastic follows easily from Parseval's identity. The sum of the elements in the $b$-th row of $C$ is

$$\sum_{a \in G \setminus \{0\}} c_{ab} = \sum_{a \in G \setminus \{0\}} \bar{I}(L_{ab}) = \sum_{a \in G \setminus \{0\}} \frac{1}{|\mathcal{K}|} \sum_{k \in \mathcal{K}} \left| \frac{1}{n} \sum_{x \in G} \chi^a(x) \cdot \chi^{-b}(e_k(x)) \right|^2.$$


Since $\chi^0 = 1$ and $E[\chi^{-b}] = 0$, we may include $a = 0$ in the sum without altering the result. This makes it possible to apply Parseval's identity and obtain

$$\frac{1}{|\mathcal{K}|} \sum_{k \in \mathcal{K}} \sum_{a \in G} \left| \frac{1}{n} \sum_{x \in G} \chi^a(x) \cdot \chi^{-b}(e_k(x)) \right|^2 = \frac{1}{n \cdot |\mathcal{K}|} \sum_{k \in \mathcal{K}} \sum_{x \in G} \left| \chi^{-b}(e_k(x)) \right|^2 = 1.$$

In a similar way the column sums can be found to equal 1 as well. □
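Both Example 4.1.8 and Proposition 4.1.9 are easy to confirm numerically. A small pure-Python sketch (the group $\mathbb{Z}_8$ and the fixed permutation `P` are my illustrative choices) computes correlation matrices directly from the definition:

```python
import cmath

n = 8
W = [cmath.exp(2j * cmath.pi * m / n) for m in range(n)]   # n-th roots of unity

def chi(a, x):
    """Character chi^a(x) of (Z_n, +)."""
    return W[(a * x) % n]

def corr_matrix(perms):
    """(n-1)x(n-1) correlation matrix, averaged over the listed keyed permutations."""
    C = [[0.0] * (n - 1) for _ in range(n - 1)]
    for a in range(1, n):
        for b in range(1, n):
            for p in perms:
                t = sum(chi(a, x) * chi(b, p[x]).conjugate() for x in range(n)) / n
                C[a - 1][b - 1] += abs(t) ** 2 / len(perms)
    return C

# Example 4.1.8: the correlation matrix of e_k(x) = x + k is the identity
C_id = corr_matrix([[(x + k) % n for x in range(n)] for k in range(n)])

# Proposition 4.1.9: an arbitrary keyed permutation gives a doubly stochastic matrix
P = [3, 6, 0, 5, 7, 1, 4, 2]
C_perm = corr_matrix([[P[(x + k) % n] for x in range(n)] for k in range(n)])
```

`C_id` comes out as the identity, and every row and column of `C_perm` sums to 1, as the proposition demands.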

Proposition 4.1.10 It is possible to express the elements of a correlation matrix by using the Fourier transform. More precisely, let $e_k: G \to G$ be a keyed permutation and let $C = [c_{ab}]$ be the corresponding correlation matrix. Then

$$c_{ab} = \frac{1}{|\mathcal{K}|} \sum_{k \in \mathcal{K}} \left| \mathcal{F}\{\chi^b \circ e_k\}(a) \right|^2.$$

In this way it is possible to speed up the computation of correlation matrices by a factor of $n / \log n$ by using the Fast Fourier Transform (FFT). For information about the FFT algorithm and how to implement it, see [37].

Proof Follows directly from the definitions of a correlation matrix and the Fourier transform:

$$c_{ab} = \bar{I}(L_{ab}) = \frac{1}{|\mathcal{K}|} \sum_{k \in \mathcal{K}} \left| \frac{1}{n} \sum_{x \in G} \chi^a(x) \cdot \chi^{-b}(e_k(x)) \right|^2 = \frac{1}{|\mathcal{K}|} \sum_{k \in \mathcal{K}} \left| \frac{1}{n} \sum_{x \in G} \chi^b(e_k(x)) \cdot \chi^{-a}(x) \right|^2 = \frac{1}{|\mathcal{K}|} \sum_{k \in \mathcal{K}} \left| \mathcal{F}\{\chi^b \circ e_k\}(a) \right|^2. \qquad \Box$$
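The FFT shortcut of Proposition 4.1.10 can be sketched with NumPy, whose `fft` uses exactly the sign convention needed here ($\sum_x h(x) e^{-2\pi i a x/n}$); the permutation below is again an arbitrary illustrative choice:

```python
import numpy as np

n = 8
x = np.arange(n)
perm = np.array([3, 6, 0, 5, 7, 1, 4, 2])            # illustrative fixed permutation of Z_8

def corr_via_fft():
    """c_ab = (1/|K|) * sum_k |F{chi^b o e_k}(a)|^2 for e_k(x) = perm[(x + k) % n]."""
    C = np.zeros((n, n))
    for k in range(n):
        y = perm[(x + k) % n]
        for b in range(n):
            h = np.exp(2j * np.pi * b * y / n)        # chi^b composed with e_k
            C[:, b] += np.abs(np.fft.fft(h) / n) ** 2 # all rows a at once via the FFT
    return C[1:, 1:] / n                              # drop a = 0, b = 0; average over keys

def corr_direct():
    C = np.zeros((n, n))
    for k in range(n):
        y = perm[(x + k) % n]
        for a in range(n):
            for b in range(n):
                C[a, b] += abs(np.mean(np.exp(2j * np.pi * (a * x - b * y) / n))) ** 2
    return C[1:, 1:] / n

M_fft, M_direct = corr_via_fft(), corr_direct()
```

The two matrices agree; for each key and each $b$, one length-$n$ FFT replaces $n$ separate inner sums, which is the $n/\log n$ speed-up mentioned above.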

As decryption is carried out by applying the inverse encryption function, the following is useful to know.

Proposition 4.1.11 Let $C = [c_{ab}]$ be the correlation matrix corresponding to the permutation $e_k: G \to G$ and let $D = [d_{ab}]$ be the correlation matrix corresponding to the inverse function $e_k^{-1}: G \to G$. Then

$$C = D^T. \qquad (4.1)$$

Proof The property is proved simply by inserting the definitions in question:

$$c_{ab} = \bar{I}(\chi^a(X) \cdot \chi^{-b}(e_K(X))) = \bar{I}(\chi^b(Y) \cdot \chi^{-a}(e_K^{-1}(Y))) = d_{ba}. \qquad \Box$$

As a direct result of this, we have the following.


Corollary 4.1.12 The correlation matrix of an involution is symmetric.

Proof Let the function $e: G \to G$ be an involution, i.e., $e^{-1} = e$, and let $C$ be the corresponding correlation matrix. According to (4.1), $C = C^T$. □

Since a large percentage of ciphers use involutory round functions to facilitate decryption, the correlation matrix of a cipher is often symmetric.

Remark 4.1.13 PES (Proposed Encryption System) is a block cipher constructed by Lai and Massey [24]. It was later replaced in [25] by the slightly modified IPES (Improved PES), which is now known as IDEA (International Data Encryption Algorithm). PES uses an involution as round permutation. As mentioned, this results in a symmetric correlation matrix $C$. The symmetry also appears in the matrix of differential transition probabilities $D$ which is used with differential cryptanalysis. The purpose of modifying PES was exactly to get rid of this symmetry. The modification caused a permutation of the columns of both $C$ and $D$, which made the cipher stronger. Generally, symmetric matrices cause a higher maximum multiround imbalance than non-symmetric ones.

In an attack, we exploit I/O products with a high imbalance. The question is how to find these for a given cipher. Clearly, trying every possible I/O product and computing the average-key imbalance by running through all possible keys and inputs is an immense task, which is, to say the least, impractical. However, by considering the correlation matrix it is possible to compute the average-key imbalance of any I/O product in an easy way. This is formulated in the following proposition.

Proposition 4.1.14 Let there be given a cipher with a group operation at the entry and at the exit, i.e., let the encryption function $e_{jkl}: G \to G$ be given by $e_{jkl}(x) = \tilde{e}_k(x + j) + l$, where $j, l \in G$ are the subkeys used in the initial and final group operations, and $k \in \mathcal{K}$ is the rest of the key (cf. Figure 4.1). In addition, let $(f, g)$

Figure 4.1: A cipher with a group operation at the entry and at the exit ($Y = \tilde{e}_K(X + J) + L$).


be an I/O pair over $G$, let $C$ denote the correlation matrix of the cipher, and let $S$ be the I/O product corresponding to $(f, g)$. Then

$$\bar{I}(S) = \hat{f}^T C \hat{g}, \qquad (4.2)$$

where $\hat{f}$ and $\hat{g}$ are the Fourier power spectrum vectors of $f$ and $g$, respectively, i.e., $\hat{f}_a = |\mathcal{F}\{f\}(a)|^2$ and $\hat{g}_b = |\mathcal{F}\{g\}(b)|^2$ for $a, b \in G \setminus \{0\}$.

Proof According to the definition of average-key imbalance, we have

$$\bar{I}(S) = \frac{1}{|\mathcal{K}| n^2} \sum_{j,l \in G} \sum_{k \in \mathcal{K}} I(S \mid j, k, l) = \frac{1}{|\mathcal{K}| n^2} \sum_{j,l \in G} \sum_{k \in \mathcal{K}} \left| \frac{1}{n} \sum_{x \in G} f(x) \cdot \overline{g(y)} \right|^2, \qquad (4.3)$$

where $y = \tilde{e}_k(x + j) + l$. To simplify notation, let $F_a = \mathcal{F}\{f\}(a)$ and $G_b = \mathcal{F}\{g\}(b)$. Now we express $f(x)$ by the inverse Fourier transform

$$f(x) = \sum_{a \in G} F_a \cdot \chi^a(x). \qquad (4.4)$$

Similarly, for $g(y)$ we obtain

$$g(y) = \sum_{b \in G} G_b \cdot \chi^b(y) = \sum_{b \in G} G_b \cdot \chi^b(\tilde{e}_k(x + j) + l). \qquad (4.5)$$

Substitution of (4.4) and (4.5) into (4.3) yields

$$\bar{I}(S) = \frac{1}{|\mathcal{K}| n^2} \sum_{j,l \in G} \sum_{k \in \mathcal{K}} \left| \frac{1}{n} \sum_{x \in G} \left[ \sum_{a \in G} F_a \chi^a(x) \right] \cdot \overline{\left[ \sum_{b \in G} G_b \chi^b(\tilde{e}_k(x + j) + l) \right]} \right|^2.$$

Recall that $\overline{\chi(x)} = \chi(-x)$, so

$$\bar{I}(S) = \frac{1}{|\mathcal{K}| n^2} \sum_{j,l \in G} \sum_{k \in \mathcal{K}} \left| \sum_{a,b \in G} F_a \overline{G_b} \cdot \frac{1}{n} \sum_{x \in G} \chi^a(x) \cdot \chi^{-b}(\tilde{e}_k(x + j) + l) \right|^2.$$

Recall also that $\chi(x + y) = \chi(x) \cdot \chi(y)$. Thus

$$\bar{I}(S) = \frac{1}{|\mathcal{K}| n^2} \sum_{j,l \in G} \sum_{k \in \mathcal{K}} \left| \sum_{a,b \in G} F_a \overline{G_b} \cdot \frac{1}{n} \sum_{x \in G} \chi^a(x) \cdot \chi^{-b}(\tilde{e}_k(x + j)) \cdot \chi^{-b}(l) \right|^2$$
$$= \frac{1}{|\mathcal{K}| n^2} \sum_{j,l \in G} \sum_{k \in \mathcal{K}} \left| \sum_{a,b \in G} F_a \overline{G_b} \cdot \frac{1}{n} \sum_{x \in G} \chi^a(x - j) \cdot \chi^{-b}(\tilde{e}_k(x)) \cdot \chi^{-b}(l) \right|^2$$
$$= \frac{1}{|\mathcal{K}| n^2} \sum_{j,l \in G} \sum_{k \in \mathcal{K}} \left| \sum_{a,b \in G} F_a \overline{G_b} \cdot \chi^a(-j) \cdot \chi^{-b}(l) \cdot \frac{1}{n} \sum_{x \in G} \chi^a(x) \cdot \chi^{-b}(\tilde{e}_k(x)) \right|^2.$$

Since $\chi^{-b}(l) = \chi^b(-l)$, we get

$$\bar{I}(S) = \frac{1}{|\mathcal{K}| n^2} \sum_{j,l \in G} \sum_{k \in \mathcal{K}} \left| \sum_{a,b \in G} F_a \overline{G_b} \cdot \chi^a(-j) \cdot \chi^b(-l) \cdot \frac{1}{n} \sum_{x \in G} \chi^a(x) \cdot \chi^{-b}(\tilde{e}_k(x)) \right|^2.$$

Application of Parseval's identity twice (with respect to $j$ and to $l$) yields

$$\bar{I}(S) = \frac{1}{|\mathcal{K}|} \sum_{k \in \mathcal{K}} \sum_{a,b \in G} \left| F_a \overline{G_b} \cdot \frac{1}{n} \sum_{x \in G} \chi^a(x) \cdot \chi^{-b}(\tilde{e}_k(x)) \right|^2.$$

Since $E[f] = 0$ and $E[g] = 0$ by the definition of an I/O pair, we have $\hat{f}_0 = \hat{g}_0 = 0$, and finally we obtain (4.2):

$$\bar{I}(S) = \sum_{a,b \in G \setminus \{0\}} |F_a|^2 \cdot |G_b|^2 \cdot \frac{1}{|\mathcal{K}|} \sum_{k \in \mathcal{K}} \left| \frac{1}{n} \sum_{x \in G} \chi^a(x) \cdot \chi^{-b}(\tilde{e}_k(x)) \right|^2$$
$$= \sum_{a,b \in G \setminus \{0\}} \hat{f}_a \hat{g}_b \cdot \frac{1}{|\mathcal{K}|} \sum_{k \in \mathcal{K}} I(L_{ab}(X, Y) \mid k) = \sum_{a,b \in G \setminus \{0\}} \hat{f}_a \hat{g}_b \cdot \bar{I}(L_{ab}(X, Y)) = \hat{f}^T C \hat{g}. \qquad \Box$$
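Proposition 4.1.14 can be checked numerically. In the sketch below (all choices are illustrative: the group $\mathbb{Z}_8$, the core permutations $\tilde{e}_k(x) = kx \bmod 8$ for odd $k$, and a random normalized I/O pair), the average-key imbalance over all key triples $(j, k, l)$ is compared with $\hat{f}^T C \hat{g}$:

```python
import cmath, itertools, random

n = 8
W = [cmath.exp(2j * cmath.pi * m / n) for m in range(n)]
def chi(a, x): return W[(a * x) % n]

KEYS = [1, 3, 5, 7]                        # units of Z_8: e~_k(x) = k*x mod 8 is a permutation
def core(k, x): return (k * x) % n

# correlation matrix C of the core permutation, averaged over k
C = [[0.0] * (n - 1) for _ in range(n - 1)]
for a in range(1, n):
    for b in range(1, n):
        for k in KEYS:
            t = sum(chi(a, x) * chi(b, core(k, x)).conjugate() for x in range(n)) / n
            C[a - 1][b - 1] += abs(t) ** 2 / len(KEYS)

# a random zero-mean, unit-mean-square I/O pair (f, g)
random.seed(3)
def normalize(v):
    m = sum(v) / n
    v = [z - m for z in v]
    p = (sum(abs(z) ** 2 for z in v) / n) ** 0.5
    return [z / p for z in v]
f = normalize([complex(random.random(), random.random()) for _ in range(n)])
g = normalize([complex(random.random(), random.random()) for _ in range(n)])

# left side of (4.2): average-key imbalance of S = f(X) * conj(g(Y)) over all (j, k, l)
lhs = 0.0
for j, k, l in itertools.product(range(n), KEYS, range(n)):
    e = sum(f[x] * g[(core(k, (x + j) % n) + l) % n].conjugate() for x in range(n)) / n
    lhs += abs(e) ** 2
lhs /= n * n * len(KEYS)

# right side: f^T C g^ using the Fourier power spectra of f and g
def spec(v):
    return [abs(sum(v[x] * chi(a, x).conjugate() for x in range(n)) / n) ** 2
            for a in range(1, n)]
fh, gh = spec(f), spec(g)
rhs = sum(fh[a] * C[a][b] * gh[b] for a in range(n - 1) for b in range(n - 1))
```

Within floating-point accuracy the two sides coincide, so the imbalance of any I/O product of this toy cipher can be read off from its $7 \times 7$ correlation matrix.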

Proposition 4.1.14 still leaves us with the problem of determining the correlation matrix of a whole cipher. This, too, would be an immense task if it were not for the following proposition, which lets us determine the correlation matrix of an iterated block cipher given the correlation matrices of the individual rounds.

Proposition 4.1.15 The correlation matrix of a multiround cipher consisting of keyed permutations separated by keyed group operations is the product of the correlation matrices of the individual permutations. In other words,

$$C^{(1..r)} = \prod_{s=1}^{r} C^{(s)},$$

where $r$ is the number of rounds, $C^{(1..r)} = [c_{ab}^{(1..r)}]$ is the correlation matrix of the entire cipher, and $C^{(s)}$ is the correlation matrix of the permutation of round $s$.

Proof First, the result is shown for two rounds. We denote the first-round key by $j$, the intermediate subkey used with the group operation by $k$, and the second-round key by $l$. The first round function is denoted by $Q_j$, and the second by $R_l$, i.e., the encryption function is given by $e_{jkl}(x) = R_l(Q_j(x) + k)$, where $j \in \mathcal{J}$, $k \in G$, and $l \in \mathcal{L}$. Finally, the random variables $W$, $X$, $Y$, and $Z$ denote the input and the output of round one, and the input and output of round two, respectively (cf. Figure 4.2). Then we have


Figure 4.2: Two rounds separated by a group operation ($Z = R_L(Q_J(W) + K)$).

$$\sum_{t \in G} c^{(1)}_{at} c^{(2)}_{tb} = \sum_{t \in G} \bar{I}(\chi^a(W) \cdot \chi^{-t}(X)) \cdot \bar{I}(\chi^t(Y) \cdot \chi^{-b}(Z))$$
$$= \sum_{t \in G} \left[ \frac{1}{|\mathcal{J}|} \sum_{j \in \mathcal{J}} I(\chi^a(Q_j^{-1}(X)) \cdot \chi^{-t}(X) \mid J = j) \right] \cdot \left[ \frac{1}{|\mathcal{L}|} \sum_{l \in \mathcal{L}} I(\chi^t(Y) \cdot \chi^{-b}(R_l(Y)) \mid L = l) \right]$$
$$= \sum_{t \in G} \left[ \frac{1}{|\mathcal{J}|} \sum_{j \in \mathcal{J}} \left| \frac{1}{n} \sum_{x \in G} \chi^a(Q_j^{-1}(x)) \cdot \chi^{-t}(x) \right|^2 \right] \cdot \left[ \frac{1}{|\mathcal{L}|} \sum_{l \in \mathcal{L}} \left| \frac{1}{n} \sum_{y \in G} \chi^t(y) \cdot \chi^{-b}(R_l(y)) \right|^2 \right]$$
$$= \frac{1}{|\mathcal{J}| \cdot |\mathcal{L}|} \sum_{t \in G} \sum_{j \in \mathcal{J}, l \in \mathcal{L}} \left| \frac{1}{n^2} \sum_{x,y \in G} \chi^a(Q_j^{-1}(x)) \cdot \chi^{-b}(R_l(y)) \cdot \chi^t(y - x) \right|^2.$$

By applying Parseval's identity with respect to $t$ this becomes

$$\frac{1}{|\mathcal{J}| \cdot |\mathcal{L}| \cdot n} \sum_{t \in G} \sum_{j \in \mathcal{J}, l \in \mathcal{L}} \left| \frac{1}{n} \sum_{x,y \in G} \chi^a(Q_j^{-1}(x)) \cdot \chi^{-b}(R_l(y)) \cdot \delta_t(y - x) \right|^2.$$

Whenever $y - x \neq t$, the terms in the sum disappear, leaving the terms where $y = x + t$. The expression simplifies to

$$\frac{1}{|\mathcal{J}| \cdot |\mathcal{L}| \cdot n} \sum_{t \in G} \sum_{j \in \mathcal{J}, l \in \mathcal{L}} \left| \frac{1}{n} \sum_{x \in G} \chi^a(Q_j^{-1}(x)) \cdot \chi^{-b}(R_l(x + t)) \right|^2.$$

Substituting $k = t$ and $w = Q_j^{-1}(x)$ yields

$$\frac{1}{|\mathcal{J}| \cdot |\mathcal{L}| \cdot n} \sum_{j \in \mathcal{J}, k \in G, l \in \mathcal{L}} \left| \frac{1}{n} \sum_{w \in G} \chi^a(w) \cdot \chi^{-b}(R_l(Q_j(w) + k)) \right|^2$$
$$= \frac{1}{|\mathcal{J}| \cdot |\mathcal{L}| \cdot n} \sum_{j \in \mathcal{J}, k \in G, l \in \mathcal{L}} I(\chi^a(W) \cdot \chi^{-b}(Z) \mid j, k, l) = \bar{I}(\chi^a(W) \cdot \chi^{-b}(Z)) = c^{(1..2)}_{ab}.$$

Thus

$$c^{(1..2)}_{ab} = \sum_{t} c^{(1)}_{at} c^{(2)}_{tb}$$

for all $a, b \in G \setminus \{0\}$, i.e., $C^{(1..2)} = C^{(1)} \cdot C^{(2)}$. By induction this property generalizes to more than two rounds. □
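Proposition 4.1.15 can likewise be verified on a toy instance. Below (the group $\mathbb{Z}_8$ and the two base permutations are my illustrative choices), the correlation matrix of the full two-round cipher $Z = R_l(Q_j(W) + k)$, averaged over all key triples, is compared with the product $C^{(1)} C^{(2)}$:

```python
import cmath

n = 8
W = [cmath.exp(2j * cmath.pi * m / n) for m in range(n)]
def chi(a, x): return W[(a * x) % n]

# two fixed illustrative permutations; the keyed rounds are Q_j(w) = P1[(w+j) % n]
# and R_l(y) = P2[(y+l) % n], with the group-operation key k added in between
P1 = [3, 6, 0, 5, 7, 1, 4, 2]
P2 = [5, 2, 7, 0, 3, 6, 1, 4]

def corr_matrix(perms):
    C = [[0.0] * (n - 1) for _ in range(n - 1)]
    for a in range(1, n):
        for b in range(1, n):
            for p in perms:
                t = sum(chi(a, x) * chi(b, p[x]).conjugate() for x in range(n)) / n
                C[a - 1][b - 1] += abs(t) ** 2 / len(perms)
    return C

C1 = corr_matrix([[P1[(w + j) % n] for w in range(n)] for j in range(n)])
C2 = corr_matrix([[P2[(y + l) % n] for y in range(n)] for l in range(n)])

# full two-round cipher Z = R_l(Q_j(W) + k): one permutation per key triple (j, k, l)
full = [[P2[((P1[(w + j) % n] + k) % n + l) % n] for w in range(n)]
        for j in range(n) for k in range(n) for l in range(n)]
C12 = corr_matrix(full)

prod = [[sum(C1[a][t] * C2[t][b] for t in range(n - 1)) for b in range(n - 1)]
        for a in range(n - 1)]
```

Note that the matrix product runs over $t \in G \setminus \{0\}$ only; the $t = 0$ term vanishes because $\bar{I}(\chi^a(W) \cdot \chi^0(X)) = |E[\chi^a(W)]|^2 = 0$ for uniform $W$.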

4.2 The Distribution of Imbalance Estimates

In this section we turn to the examination of the probability density function of imbalance estimates. As indicated by the attack descriptor, the likelihood estimator of a correlation attack is the imbalance operator $I(\cdot)$. Thus, in the statistical attack we must construct an estimate $\tilde{I}(S)$ of $I(S)$ given some I/O product $S = f(X) \cdot \overline{g(\tilde{Y})}$ and a number $N$ of P/C-pairs. Since $I(S) = |E[S]|^2$, the natural choice of estimate is

$$\tilde{I}(S) = \left| \frac{1}{N} \sum_{(x,y) \in Z} f(x) \cdot \overline{g(\tilde{y})} \right|^2.$$

We mention without proof that this estimator is biased but consistent. To do hypothesis testing via this value, we need the distribution of $\tilde{I}(S)$. The following definition presents the probability density function of $\tilde{I}(S)$. We later prove that this is indeed the correct density function.

Definition 4.2.1 Imbalance distribution. Let the imbalance distribution with imbalance parameter $J$ and sample size parameter $N$ be the probability distribution with density $\xi(\cdot; J, N): \mathbb{R}^+ \cup \{0\} \to \mathbb{R}^+ \cup \{0\}$ defined by

$$\xi(x; J, N) = \frac{2N}{1-J} \cdot h\!\left( \frac{2N}{1-J} x; \frac{2N}{1-J} J \right), \qquad (4.6)$$

where $h(\cdot; s)$ represents the probability density function of the non-central $\chi^2$-distribution with 2 degrees of freedom and skewness parameter $s$.

The following gives a more explicit expression.


Proposition 4.2.2 The density function $\xi$ of the imbalance distribution with parameters $J$ and $N$ is given by

$$\xi(x; J, N) = \frac{N}{(1-J)\sqrt{\pi}}\, e^{-N(x+J)/(1-J)} \sum_{r=0}^{\infty} \lambda_r \qquad (4.7)$$

with

$$\lambda_r = \frac{1}{(2r)!} \left( \left( \frac{2N}{1-J} \right)^2 J x \right)^r \frac{\Gamma(r + \frac{1}{2})}{\Gamma(r + 1)}, \qquad (4.8)$$

where $\Gamma$ represents the gamma function.

Proof The non-central $\chi^2$-distribution $h_d: \mathbb{R}^+ \cup \{0\} \to \mathbb{R}^+ \cup \{0\}$ with $d$ degrees of freedom and skewness parameter $s$ has density given by (see [34])

$$h_d(x; s) = 2^{-d/2} \pi^{-1/2} e^{-(x+s)/2} x^{(d-2)/2} \sum_{r=0}^{\infty} \lambda_r,$$

where

$$\lambda_r = \frac{1}{(2r)!} (sx)^r \frac{\Gamma(r + \frac{1}{2})}{\Gamma((d + 2r)/2)}.$$

For $d = 2$ this simplifies to

$$h_2(x; s) = \frac{1}{2\sqrt{\pi}}\, e^{-(x+s)/2} \sum_{r=0}^{\infty} \lambda_r \qquad (4.9)$$

with

$$\lambda_r = \frac{1}{(2r)!} (sx)^r \frac{\Gamma(r + \frac{1}{2})}{\Gamma(r + 1)}.$$

Expanding $\frac{2N}{1-J} \cdot h_2(\frac{2N}{1-J} x; \frac{2N}{1-J} J)$ and simplifying the result yields (4.7) and (4.8). □

As one can see, the probability density function is rather complex. However, for small values of $J$, it simplifies greatly.

Proposition 4.2.3 If $J \ll (\frac{1}{2N})^2$, then the imbalance distribution with parameters $J$ and $N$ closely resembles an exponential distribution. More formally, $\xi(x; J, N) \approx h(x; J, N)$ where

$$h(x; J, N) = \frac{N}{1-J}\, e^{-N(x+J)/(1-J)}$$

with accumulated error

$$\epsilon = \int_0^{\infty} |h(x; J, N) - \xi(x; J, N)| \, dx = 1 - e^{-NJ/(1-J)}.$$


Proof Recall from Proposition 4.2.2 that the probability density is given by (4.7) and (4.8). For $r > 0$ all three factors of $\lambda_r$ are decreasing in $r$ (in particular, the ratio $\Gamma(r + \frac{1}{2})/\Gamma(r + 1)$ is decreasing in $r$). Thus for $r > 0$ and $x \leq 1$,

$$\lambda_r \leq \frac{1}{2!} \left( \left( \frac{2N}{1-J} \right)^2 J x \right)^r \frac{\Gamma(\frac{3}{2})}{\Gamma(2)} \leq \left( \left( \frac{2N}{1-J} \right)^2 J \right)^r \lambda_0.$$

For $J \ll (\frac{1}{2N})^2$ this implies

$$\sum_{r=1}^{\infty} \lambda_r \leq \lambda_0 \sum_{r=1}^{\infty} \left( \left( \frac{2N}{1-J} \right)^2 J \right)^r = \lambda_0 \cdot \frac{\left( \frac{2N}{1-J} \right)^2 J}{1 - \left( \frac{2N}{1-J} \right)^2 J} \approx 0.$$

Since $\lambda_0 = \sqrt{\pi}$, this means altogether that $\sum_{r=0}^{\infty} \lambda_r \approx \sqrt{\pi}$, and thus by (4.7) we have

$$\xi(x; J, N) \approx \frac{N}{1-J}\, e^{-N(x+J)/(1-J)}.$$

Since all values of $\lambda_r$ are positive, we have $h(x; J, N) < \xi(x; J, N)$, and thus the accumulated error is given by

$$\epsilon = \int_0^{\infty} \left( \xi(x; J, N) - h(x; J, N) \right) dx = 1 - \int_0^{\infty} h(x; J, N) \, dx = 1 - e^{-NJ/(1-J)}.$$

Note that $h(x)$ does not define a probability density function since $\int_0^{\infty} h(x) \, dx \neq 1$; however, for small $J$ the value is close to 1. □

Unfortunately it is not often the case that $J \ll (\frac{1}{2N})^2$, since we usually have a large value of $N$. In that case, Algorithm XI is useful for evaluating $\xi(x; J, N)$.

CHAPTER 4. THE CORRELATION ATTACK

28

Algorithm for evaluating the probability density of the imbalance distribution.

procedure XI(x, J, N)
// input:  x, the I/O product imbalance estimate.
//         J, the imbalance parameter.
//         N, the sample size parameter.
// output: The value of ξ(x; J, N).
begin
  ε := 0.00001
  r := 0
  λ_0 := 1
  s_0 := λ_0
  ~x := x · 2N/(1 − J)
  a := J · 2N/(1 − J)
  while λ_r > ε do begin
    r := r + 1
    λ_r := λ_{r−1} · a~x/(2r)²
    s_r := s_{r−1} + λ_r
  end
  r_last := r
  return N/(1 − J) · e^{−(~x + a)/2} · s_{r_last}
end

The value of ε can be changed to suit the actual need for precision.
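Algorithm XI ports directly to Python. The sketch below follows the pseudocode above (the default tolerance and the test parameters $J$ and $N$ are arbitrary choices of mine):

```python
import math

def xi(x, J, N, eps=1e-12):
    """Density xi(x; J, N) of the imbalance distribution, evaluated by the
    series of Algorithm XI (lam holds lambda_r divided by sqrt(pi))."""
    x_t = x * 2.0 * N / (1.0 - J)      # scaled argument ~x
    a = J * 2.0 * N / (1.0 - J)        # skewness parameter of the chi^2 law
    lam, s, r = 1.0, 1.0, 0
    while lam > eps:
        r += 1
        lam *= a * x_t / (2.0 * r) ** 2
        s += lam
    return N / (1.0 - J) * math.exp(-(x_t + a) / 2.0) * s

# sanity check: the density should integrate to (almost) 1
dx = 1e-4
total = sum(xi(i * dx, 0.01, 100) * dx for i in range(2000))
```

Each iteration only multiplies by $a\tilde{x}/(2r)^2$, so the series converges quickly and no gamma functions need to be evaluated explicitly.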

Proposition 4.2.4 Algorithm XI computes the imbalance distribution with the required precision.

Proof Denote by $\bar{\lambda}_r$ and $s_r$ the values computed by the algorithm. The variable $\bar{\lambda}_r$ is assigned the value $\bar{\lambda}_{r-1} \cdot a\tilde{x}/(2r)^2$, and thus by induction we have

$$\bar{\lambda}_r = \left( \frac{1}{2 \cdot 4 \cdot 6 \cdots (2r)} \right)^2 (a\tilde{x})^r \qquad (4.10)$$

since $\bar{\lambda}_0 = 1$. Also by induction we see that $s_r = \sum_{j=0}^{r} \bar{\lambda}_j$. Recall the recurrence relation $\Gamma(x + 1) = x\Gamma(x)$ and that $\Gamma(1) = 1$ and $\Gamma(\frac{1}{2}) = \sqrt{\pi}$. This implies $\Gamma(n + 1) = n!$ and $\Gamma(n + \frac{1}{2}) = \frac{1}{2} \cdot \frac{3}{2} \cdot \frac{5}{2} \cdots \frac{2n-1}{2} \sqrt{\pi}$ for $n \in \mathbb{N}$. Using this with (4.10) yields

$$\bar{\lambda}_r = \frac{1 \cdot 3 \cdot 5 \cdots (2r-1)}{1 \cdot 2 \cdots (2r)} \cdot \frac{(a\tilde{x})^r}{2 \cdot 4 \cdot 6 \cdots (2r)} = \frac{1}{(2r)!} (a\tilde{x})^r \frac{\Gamma(r + \frac{1}{2})}{\sqrt{\pi}\, \Gamma(r + 1)}.$$

Substituting $\tilde{x} = x \cdot 2N/(1-J)$ and $a = J \cdot 2N/(1-J)$ yields $\bar{\lambda}_r = \lambda_r / \sqrt{\pi}$ with $\lambda_r$ as defined in (4.8). The return value is thus

$$\frac{N}{1-J}\, e^{-(\tilde{x}+a)/2} \sum_{r=0}^{r_{\mathrm{last}}} \bar{\lambda}_r = \frac{N}{(1-J)\sqrt{\pi}}\, e^{-N(x+J)/(1-J)} \sum_{r=0}^{r_{\mathrm{last}}} \lambda_r \approx \frac{N}{(1-J)\sqrt{\pi}}\, e^{-N(x+J)/(1-J)} \sum_{r=0}^{\infty} \lambda_r = \xi(x; J, N).$$

The while-statement secures the required precision. The algorithm terminates since $r!$ grows faster than $k^r$ for positive $k$ and $r \in \mathbb{N}$. □

We now present the proof that $\xi$ actually is the density function of imbalance estimates.

Proposition 4.2.5 Let $S$ be an I/O product, and let $S_1, S_2, \ldots, S_N$ denote $N$ independent random variables (called samples), each with the same distribution as $S$. Furthermore, let the random variable $\tilde{I}(S)$ be an estimate of $I(S)$ based on the $N$ samples of $S$, i.e., let

$$\tilde{I}(S) = \left| \frac{1}{N} \sum_{j=1}^{N} S_j \right|^2. \qquad (4.11)$$

Then, assuming that $N$ is large, the distribution of $\tilde{I}(S)$ is close to the imbalance distribution $\xi(\cdot; I(S), N)$.

Proof Let $X$ and $Y$ denote cipher input and output, as usual. Furthermore, let $K$ denote the key used in the final group operation, and let $\bar{Y}$ denote the input to the last group operation, i.e., $Y = \bar{Y} + K$. Assume that the I/O product is given by $S = f(X) \cdot \overline{g(\bar{Y})}$. We wish to determine the distribution of (4.11). In the following, let $A = \mathrm{Re}(S)$, $B = \mathrm{Im}(S)$, $A_j = \mathrm{Re}(S_j)$, and $B_j = \mathrm{Im}(S_j)$ for all $j = 1, 2, \ldots, N$. Then

$$I(S) = |E[S]|^2 = E[A]^2 + E[B]^2.$$

The sum of the variances of $A$ and $B$ is

$$V[A] + V[B] = E[A^2] - (E[A])^2 + E[B^2] - (E[B])^2 = E[A^2 + B^2] - I(S) = E[|S|^2] - I(S) = E[|f(X)|^2 \cdot |g(\bar{Y})|^2] - I(S).$$

Computing the expected value over all possible $k, x \in G$, where $\bar{y} = y - k$ denotes the last-round input corresponding to the ciphertext $y$ belonging to the plaintext $x$, yields

$$-I(S) + \frac{1}{n^2} \sum_{k \in G} \sum_{x \in G} |f(x)|^2 \cdot |g(y - k)|^2 = -I(S) + \frac{1}{n} \sum_{x \in G} |f(x)|^2 \cdot \left( \frac{1}{n} \sum_{k \in G} |g(k)|^2 \right) = E[|f(X)|^2] \cdot E[|g(K)|^2] - I(S),$$

since for fixed $x$ the value $y - k$ runs through all of $G$ as $k$ does. Since $(f, g)$ is an I/O pair, we have $E[|f(X)|^2] = E[|g(K)|^2] = 1$, resulting in the simplified relation

$$V[A] + V[B] = 1 - I(S).$$

We have

$$\tilde{I}(S) = \tilde{A}^2 + \tilde{B}^2, \qquad (4.12)$$

where

$$\tilde{A} = \frac{1}{N} \sum_{i=1}^{N} A_i \quad \text{and} \quad \tilde{B} = \frac{1}{N} \sum_{i=1}^{N} B_i.$$

As $N$ grows, the central limit theorem tells us that the distribution of $\tilde{A}$ approaches a normal distribution with mean value $E[A]$ and variance $V[A]/N$. Similarly, the distribution of $\tilde{B}$ approaches a normal distribution with mean value $E[B]$ and variance $V[B]/N$. To proceed, we now assume that $V[A]$ and $V[B]$ are equal, i.e., that

$$V[A] = V[B] = \frac{1 - I(S)}{2}.$$

If $V[A] \neq V[B]$, we simply rotate $S = A + iB$ in the complex plane until $V[A] = V[B]$ by multiplying $S$ by a suitable complex number $w$ of magnitude 1. This causes no loss of generality since $\tilde{I}(wS) = \tilde{I}(S)$ when $|w| = 1$. Dividing both sides of (4.12) by $V[A]/N$, we now obtain

$$\frac{2N}{1 - I(S)} \tilde{I}(S) = \tilde{C}^2 + \tilde{D}^2,$$

where $\tilde{C}$ and $\tilde{D}$ are random variables following normal distributions with variance 1 and mean values

$$E[\tilde{C}] = E[A] \sqrt{\frac{2N}{1 - I(S)}} \quad \text{and} \quad E[\tilde{D}] = E[B] \sqrt{\frac{2N}{1 - I(S)}},$$

respectively. According to [34], the sum of $d$ squared normally distributed random variables with variance 1 follows the non-central $\chi^2$-distribution with $d$ degrees of freedom and skewness parameter $s$ given by the sum of the squared mean values. Consequently, $\frac{2N}{1 - I(S)} \tilde{I}(S)$ follows the non-central $\chi^2$-distribution with two degrees of freedom and skewness parameter

$$s = E[\tilde{C}]^2 + E[\tilde{D}]^2 = \left( E[A]^2 + E[B]^2 \right) \frac{2N}{1 - I(S)} = \frac{2N \cdot I(S)}{1 - I(S)}.$$

Let $h(\cdot; s)$ denote the corresponding probability density function of $\frac{2N}{1 - I(S)} \tilde{I}(S)$. Then the density function $\xi$ of $\tilde{I}(S)$ is

$$\xi(x; I(S), N) = \frac{2N}{1 - I(S)} \cdot h\!\left( \frac{2N}{1 - I(S)} \cdot x; s \right).$$

Substituting $s = \frac{2N \cdot I(S)}{1 - I(S)}$ and $I(S) = J$ and simplifying yields (4.6). □

4.2.1 Empirical Evidence for the Distribution Function

Computer simulations verify the validity of Proposition 4.2.5. Figures 4.3 and 4.4 show both the theoretical distribution and the empirical distribution of $\tilde{I}(S)$ for various values of $N$ and $I(S)$. The empirical distribution is indicated by the histogram and the theoretical distribution by the graph. Note how the variance decreases and how the peak moves closer to $x = I(S)$ as $N$ grows. The empirical distributions were calculated on the basis of randomly chosen permutations over $\mathbb{Z}_{256}$. Each histogram represents 20000 estimates of $\tilde{I}(S)$.
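In the same spirit as the simulations behind Figures 4.3 and 4.4, a small Monte Carlo sketch (the seed and sample sizes are my choices) checks the mean of the imbalance estimates against $I(S) + (1 - I(S))/N$, the mean value that falls out of the proof above and is used later in the proof of Proposition 5.3.2:

```python
import cmath, random

random.seed(7)
n, N, trials = 256, 1000, 400
W = [cmath.exp(2j * cmath.pi * m / n) for m in range(n)]   # table of n-th roots of unity

perm = list(range(n))
random.shuffle(perm)                 # a random fixed permutation playing the cipher's role

# exact imbalance of the linear I/O product chi^1(X) * conj(chi^1(Y)) for this permutation
true_I = abs(sum(W[x] * W[perm[x]].conjugate() for x in range(n)) / n) ** 2

estimates = []
for _ in range(trials):
    acc = 0
    for _ in range(N):
        x = random.randrange(n)      # i.i.d. uniform samples, as Proposition 4.2.5 assumes
        acc += W[x] * W[perm[x]].conjugate()
    estimates.append(abs(acc / N) ** 2)

mean_est = sum(estimates) / trials
expected = true_I + (1.0 - true_I) / N
```

Because $|S| = 1$ for character pairs, the relation $E[\tilde{I}(S)] = I(S) + (1 - I(S))/N$ is exact here, so the empirical mean lands on it up to Monte Carlo noise.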


Figure 4.3: Some theoretical and empirical distributions of $\tilde{I}(S)$ (three panels: $I(S) = 0.004302$ with $N = 100$, $N = 1000$, and $N = 10000$).


Figure 4.4: Some theoretical and empirical distributions of $\tilde{I}(S)$ (three panels: $I(S) = 0.008562$ with $N = 100$, $N = 1000$, and $N = 10000$).


Chapter 5

Attack Algorithms

Based on the previous derivations of the probability distribution of I/O products, we now present three attack algorithms of different complexity and strength. In this chapter, $\mathcal{K}$ denotes the set of possible last-round keys.

5.1 The Simple Attack

By the term "simple" we refer to the usage of only one I/O pair $(f, g)$. The simple attack is carried out by hypothesis testing via the Neyman-Pearson Lemma [27] (cf. Chapter 3). In the following, $X$ and $Y$ denote the cipher input and output as usual, whereas $\tilde{k}$ denotes a last-round key guess, and $\tilde{Y} = R_{\tilde{k}}^{-1}(Y)$ the corresponding input to the last round given that the last-round key is $\tilde{k}$ (cf. Figure 3.2). Let $(f, g)$ be an I/O pair and let $S = f(X) \cdot \overline{g(\tilde{Y})}$ be the corresponding I/O product. Similarly, let

$$S^{(1..r-1)} = f(X) \cdot \overline{g(Y^{(1..r-1)})} \quad \text{and} \quad S^{(1..r+1)} = f(X) \cdot \overline{g(Y^{(1..r+1)})}$$

be the I/O products over the reduced and the expanded cipher, respectively. Furthermore, let there be given $N$ P/C-pair samples, and let $\tilde{I}(S)$ denote the corresponding imbalance estimate of $I(S)$ based upon these samples. For each guess of the last-round key there are two possibilities: 1) the guess is the correct last-round key; or 2) the guess is a wrong last-round key. If possibility 1) is correct, then we have $\tilde{Y} = Y^{(1..r-1)}$ and thus the distribution of the imbalance estimate $\tilde{I}(S)$ will be $\xi(x; \bar{I}(S^{(1..r-1)}), N)$ (assuming that the hypothesis of fixed key equivalence holds). If possibility 2) is correct, $\tilde{I}(S)$ will have the distribution $\xi(x; \bar{I}(S^{(1..r+1)}), N)$. To determine the correct possibility, we must distinguish, for each guess of the last-round key, between two $\xi$-distributions, one with parameter $\bar{I}(S^{(1..r-1)})$ and one with parameter $\bar{I}(S^{(1..r+1)})$. According to the Neyman-Pearson Lemma, the likelihood ratio

$$\frac{\xi(\tilde{I}(S); \bar{I}(S^{(1..r-1)}), N)}{\xi(\tilde{I}(S); \bar{I}(S^{(1..r+1)}), N)}$$

provides the best possible test value to accomplish this. The higher this value is, the more probable it is that $I(S) = \bar{I}(S^{(1..r-1)})$, meaning that $\tilde{Y} = Y^{(1..r-1)}$ (i.e., we


have chosen the correct key). Thus, to find the maximum likelihood key we must search for the last-round key that maximizes the likelihood ratio. This is done in the following algorithm.

Algorithm for attacking a block cipher.

procedure ATTACK1((f, g), Z)
// input:  (f, g), an I/O pair
//         Z, the set of collected P/C-pairs
// output: The set of maximum likelihood last-round keys
begin
  for ~k ∈ K do begin
    ~I := 0
    for (x, y) ∈ Z do begin
      ~y := R_~k^{-1}(y)
      ~I := ~I + f(x) · conj(g(~y))
    end
    ~I := |~I/N|²
    L[~k] := ξ(~I; Ī(S_fg^(1..r-1)), N) / ξ(~I; Ī(S_fg^(1..r+1)), N)
  end
  return {~k : L[~k] = max_k L[k]}
end

The complexity of the above algorithm is $O(N \cdot |\mathcal{K}|)$ if we approximate the complexity of evaluating $\xi$ by $O(1)$.
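A drastically simplified but runnable illustration of this kind of key ranking: a one-round toy cipher over $(\mathbb{Z}_{16}, +)$ (so the "reduced cipher" seen by a correct guess is the identity), with keys ranked by the imbalance estimate alone instead of the full likelihood ratio. The 4-bit S-box (PRESENT's, used here merely as a concrete permutation) and all parameters are my choices, not the thesis's:

```python
import cmath

n = 16
SBOX = [12, 5, 6, 11, 9, 0, 10, 13, 3, 14, 15, 8, 4, 7, 1, 2]
SINV = [SBOX.index(v) for v in range(n)]
W = [cmath.exp(2j * cmath.pi * m / n) for m in range(n)]   # chi^1(x) = W[x]

def encrypt(p, k):
    """One-round toy cipher: substitution followed by key addition mod 16."""
    return (SBOX[p] + k) % n

def attack(pairs):
    """Rank last-round key guesses by the estimated imbalance of the linear
    I/O product chi^1(X) * conj(chi^1(~Y)); return the best-scoring guess."""
    scores = {}
    for guess in range(n):
        acc = 0
        for p, c in pairs:
            y = SINV[(c - guess) % n]          # partial decryption R_k^{-1}
            acc += W[p] * W[y].conjugate()
        scores[guess] = abs(acc / len(pairs)) ** 2
    return max(scores, key=scores.get)

key = 7
pairs = [(p, encrypt(p, key)) for p in range(n)]   # the full codebook as P/C-pairs
recovered = attack(pairs)
```

For the correct guess the partial decryption undoes the round exactly, giving imbalance 1; every wrong guess yields a non-affine map and hence a strictly smaller estimate, so the argmax recovers the key.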

5.2 The Advanced Attack

In the above approach we use only one I/O pair to distinguish the correct key from the wrong keys. However, it is possible to use several I/O pairs in an improved attack, equivalent to the approach with "multiple approximations" considered in [20] for use with linear cryptanalysis. This is done in the following, where we consider a set of $m$ I/O pairs $A = \{(f_1, g_1), (f_2, g_2), \ldots, (f_m, g_m)\}$ and the corresponding I/O products $\mathcal{S} = \{S_1, S_2, \ldots, S_m\}$, where $S_t = f_t(X) \cdot \overline{g_t(\tilde{Y})}$ for $t = 1, 2, \ldots, m$. We assume independence between the estimated imbalances $\tilde{I}(S_1), \tilde{I}(S_2), \ldots, \tilde{I}(S_m)$, implying that their joint distribution is given by the probability density function $h(j_1, j_2, \ldots, j_m) = \prod_{t=1}^{m} \xi(j_t; \bar{I}(S_t), N)$. The likelihood ratio of the Neyman-Pearson Lemma is then

$$\prod_{t=1}^{m} \frac{\xi(\tilde{I}(S_t); \bar{I}(S_t^{(1..r-1)}), N)}{\xi(\tilde{I}(S_t); \bar{I}(S_t^{(1..r+1)}), N)},$$


resulting in the algorithm below. In the actual implementation, we evaluate the log-likelihood ratio

$$\sum_{t=1}^{m} \left[ \log \xi(\tilde{I}(S_t); \bar{I}(S_t^{(1..r-1)}), N) - \log \xi(\tilde{I}(S_t); \bar{I}(S_t^{(1..r+1)}), N) \right]$$

so as not to cause over- or underflow. Since log is an increasing function, this is compatible with the old approach.

Algorithm for attacking a block cipher.

procedure ATTACK2(A, Z)
// input:  A = {(f1, g1), (f2, g2), ..., (fm, gm)}, the set of I/O pairs
//         Z, the set of collected P/C-pairs
// output: The set of maximum likelihood last-round keys
begin
  for ~k ∈ K do begin
    L[~k] := 0
    for t := 1, 2, ..., m do begin
      (f, g) := (f_t, g_t)
      ~I := 0
      for (x, y) ∈ Z do begin
        ~y := R_~k^{-1}(y)
        ~I := ~I + f(x) · conj(g(~y))
      end
      ~I := |~I/N|²
      L[~k] := L[~k] + log ξ(~I; Ī(S_t^(1..r-1)), N) − log ξ(~I; Ī(S_t^(1..r+1)), N)
    end
  end
  return {~k : L[~k] = max_k L[k]}
end

The time complexity of the above algorithm is $O(m \cdot N \cdot |\mathcal{K}|)$, again assuming that evaluation of $\xi$ is possible in $O(1)$ time. The memory complexities of both Algorithm ATTACK1 and ATTACK2 are not very high if at each step of the algorithm we store only the key guesses which have maximum $L[\cdot]$.

5.3 Simplifying the Advanced Attack

The two algorithms presented above have one major drawback in common: while they may be optimal in a maximum-likelihood sense, they are difficult to analyze, since they give rise to some very complex probability distributions. In this section we present a simplification of the advanced attack to facilitate analysis.


In Algorithm ATTACK2 several $\xi$-distributed random variables are considered, namely the imbalance estimates for each of the I/O products $S_1, S_2, \ldots, S_m$. In the following approach, a new random variable $\tilde{I}_\alpha$ is introduced which is a weighted sum of the various imbalance estimates. This sum almost follows a normal distribution, which makes it a lot easier to do the hypothesis testing (we do not have to consider the high-complexity $\xi$-function). To simplify further, we assume that $\tilde{I}(S_j)$ is distributed the same way for all $S_j$ whenever a wrong key is used, more precisely that the imbalance over the expanded cipher satisfies $\bar{I}(S_t^{(1..r+1)}) = \frac{1}{n-1}$ for all $t$ (corresponding to identical entries in the correlation matrix of the expanded cipher). As mentioned earlier, this is called the hypothesis of wrong key randomization and was formulated in [12] for use with binary GLC. The hypothesis was demonstrated to hold for a variety of actual ciphers in [14]. Intuitively, it is reasonable that $\bar{I}(S^{(1..r+1)})$ is close to $\frac{1}{n-1}$, at least when compared to $\bar{I}(S^{(1..r-1)})$, since the expanded cipher has two more rounds than the reduced cipher (resulting in a "more random" output). The following defines the weighted sum mentioned above.

Definition 5.3.1 Let $\mathcal{S} = \{S_1, S_2, \ldots, S_m\}$ be a set of $m$ I/O products. Furthermore, let $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_m)$ with $\|\alpha\|_2 = 1$ be a vector consisting of $m$ real numbers. The random variable $\tilde{I}_\alpha(\mathcal{S})$ is then defined by the following weighted sum of imbalance estimates:

$$\tilde{I}_\alpha(\mathcal{S}) = \sum_{j=1}^{m} \alpha_j \cdot \tilde{I}(S_j).$$

One of the advantages of using this sum is its nice distribution, which is presented in the following proposition.

Proposition 5.3.2 Let $\mathcal{S} = \{S_1, S_2, \ldots, S_m\}$ be a set of $m$ I/O products. If $I(S_1), I(S_2), \ldots, I(S_m)$ are independent and small, then the random variable $\tilde{I}_\alpha(\mathcal{S})$ based upon $N$ samples follows a normal distribution with mean value

$$E[\tilde{I}_\alpha(\mathcal{S})] = \sum_{j} \alpha_j \left( I(S_j) + \frac{1}{N} \right)$$

and variance

$$V[\tilde{I}_\alpha(\mathcal{S})] = N^{-2}.$$

Proof According to the proof of Proposition 4.2.5, the random variable

$$\tilde{J} = \frac{2N}{1 - I(S)} \tilde{I}(S)$$

follows a non-central $\chi^2$-distribution with 2 degrees of freedom and skewness parameter $s = 2N \cdot I(S)/(1 - I(S))$. This means that $\tilde{J}$ has mean value $E[\tilde{J}] = 2 + s$ and variance $V[\tilde{J}] = 4 + 4s$ (see [19]). Consequently, the random variable

$$\tilde{I}(S) = \frac{1 - I(S)}{2N} \tilde{J}$$

has mean value

$$E[\tilde{I}(S)] = (2 + s) \cdot \frac{1 - I(S)}{2N} = \frac{1 - I(S)}{N} + I(S) \approx I(S) + \frac{1}{N}$$

and variance

$$V[\tilde{I}(S)] = (4 + 4s) \left( \frac{1 - I(S)}{2N} \right)^2 = \frac{(1 - I(S))^2}{N^2} + \frac{2 I(S)(1 - I(S))}{N} \approx \frac{1}{N^2},$$

where the approximations use that $I(S)$ is small. Then by the central limit theorem, $\tilde{I}_\alpha(\mathcal{S})$ follows a normal distribution with mean value

$$E[\tilde{I}_\alpha(\mathcal{S})] = \sum_{t=1}^{m} \alpha_t \cdot E[\tilde{I}(S_t)] = \sum_{t=1}^{m} \alpha_t \left( I(S_t) + \frac{1}{N} \right)$$

and variance

$$V[\tilde{I}_\alpha(\mathcal{S})] = \sum_{t=1}^{m} \alpha_t^2 \cdot V[\tilde{I}(S_t)] = \sum_{t=1}^{m} \alpha_t^2 / N^2 = N^{-2},$$

since $\|\alpha\|_2 = 1$. □

Thus, if our guess $\tilde{k}$ equals the correct key, then the random variable $\tilde{I}_\alpha(\mathcal{S})$ follows a normal distribution with mean value

$$\mu_1 = E[\tilde{I}_\alpha(\mathcal{S}^{(1..r-1)})] = \sum_{t=1}^{m} \alpha_t \left( I(S_t^{(1..r-1)}) + \frac{1}{N} \right)$$

according to Proposition 5.3.2. If we have guessed a wrong key, then by the hypothesis of wrong key randomization, $\tilde{I}_\alpha(\mathcal{S})$ has mean value

$$\mu_2 = E[\tilde{I}_\alpha(\mathcal{S}^{(1..r+1)})] = \sum_{t=1}^{m} \alpha_t \left( I(S_t^{(1..r+1)}) + \frac{1}{N} \right) \approx \sum_{t} \alpha_t \left( \frac{1}{n-1} + \frac{1}{N} \right).$$


In either case, the variance is N^(−2). With this knowledge of μ_1 and μ_2 it is possible to find the exact choice of the vector α which results in the best attack. We want to distinguish between the distributions N(μ_1, N^(−2)) and N(μ_2, N^(−2)) (here N(μ, V) denotes the normal distribution with mean value μ and variance V). Since the variances are identical and constant with respect to α in the two cases, to get the most powerful statistical test we should choose α = (α_1, α_2, ..., α_m) with ‖α‖_2 = 1 such that the two mean values μ_1 and μ_2 are as far apart as possible. Assuming that the hypothesis of fixed-key equivalence holds, this happens when the value

    μ_1 − μ_2 = Σ_t α_t · (I(S_t^(1::r−1)) − 1/(n−1))

is as large as possible. By Cauchy's inequality this happens for the unique choice of α where

    α_1 / (I(S_1^(1::r−1)) − 1/(n−1)) = α_2 / (I(S_2^(1::r−1)) − 1/(n−1)) = ... = α_m / (I(S_m^(1::r−1)) − 1/(n−1)),

i.e.,

    α_t = (I(S_t^(1::r−1)) − 1/(n−1)) / l                                          (5.1)

for all t = 1, 2, ..., m, with

    l = [ Σ_{t=1}^m (I(S_t^(1::r−1)) − 1/(n−1))² ]^(1/2),

since ‖α‖_2 = 1. This corresponds to the mean values

    μ_1 = Σ_{t=1}^m α_t · (I(S_t^(1::r−1)) + 1/N)
        = Σ_{t=1}^m [(I(S_t^(1::r−1)) − 1/(n−1)) / l] · (I(S_t^(1::r−1)) + 1/N)      (5.2)

and

    μ_2 = Σ_{t=1}^m α_t · (1/(n−1) + 1/N)
        = Σ_{t=1}^m [(I(S_t^(1::r−1)) − 1/(n−1)) / l] · (1/(n−1) + 1/N)              (5.3)

and a difference of

    μ_1 − μ_2 = Σ_{t=1}^m α_t · (I(S_t^(1::r−1)) + 1/N − 1/(n−1) − 1/N)
              = Σ_{t=1}^m (I(S_t^(1::r−1)) − 1/(n−1))² / l
              = [ Σ_{t=1}^m (I(S_t^(1::r−1)) − 1/(n−1))² ]^(1/2).                    (5.4)
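The optimal weighting (5.1) is easy to compute once estimates of the reduced-cipher imbalances are available. A minimal Python sketch, using made-up imbalance values for three I/O products (the numbers are purely illustrative, not taken from any cipher in this thesis):

```python
from math import sqrt

def optimal_weights(imbalances, n):
    """Cauchy-optimal weight vector alpha of (5.1): alpha_t is proportional
    to I(S_t^(1..r-1)) - 1/(n-1), normalised so that ||alpha||_2 = 1."""
    deviations = [i_t - 1.0 / (n - 1) for i_t in imbalances]
    l = sqrt(sum(d * d for d in deviations))  # the constant l of (5.1)
    return [d / l for d in deviations], l

# Hypothetical reduced-cipher imbalances for three I/O products:
alpha, l = optimal_weights([0.020, 0.010, 0.008], n=256)
# ||alpha||_2 = 1, and the separation mu_1 - mu_2 of (5.4) equals l.
print(alpha, l)
```

Note that the achieved separation μ_1 − μ_2 is exactly the normalization constant l, which is what (5.4) states.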

We can now present the algorithm of the simplified attack.

Algorithm for attacking a block cipher.

procedure ATTACK3(A, Z)
// input:  A = {(f_1, g_1), (f_2, g_2), ..., (f_m, g_m)}, the set of I/O pairs
//         Z, the set of collected P/C-pairs
// output: the set of maximum-likelihood last-round keys
begin
  for k̃ ∈ K do
  begin
    L[k̃] := 0
    for t := 1, 2, ..., m do
    begin
      (f, g) := (f_t, g_t)
      Ĩ := 0
      for (x, y) ∈ Z do
      begin
        ỹ := R_k̃^(−1)(y)
        Ĩ := Ĩ + f(x) · g(ỹ)
      end
      Ĩ := |Ĩ|² / N²
      L[k̃] := L[k̃] + Ĩ · (I(S_t^(1::r−1)) − 1/(n−1))
    end
  end
  return {k̃ : L[k̃] = max_k L[k]}
end

In the algorithm, the constant value of l from (5.1) has been multiplied out to simplify (this is allowed since multiplication by a constant does not change the occurrence of the maximum value L[·]).
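The procedure translates almost line for line into Python. The following sketch is not the thesis's implementation; the toy cipher in the demonstration (one S-box layer, borrowed from the PRESENT cipher, followed by a 4-bit xor key), the key set, and the single I/O product are all hypothetical stand-ins chosen so that the reduced cipher is the identity:

```python
def attack3(io_products, pc_pairs, round_inv, keys, n):
    """Sketch of procedure ATTACK3. The I/O products (with their
    reduced-cipher imbalances), the last-round inverse round_inv and the
    key set are supplied by the caller; nothing here is tied to a
    particular cipher."""
    N = len(pc_pairs)
    L = {}
    for k in keys:
        total = 0.0
        for f, g, imb in io_products:             # imb = I(S_t^(1..r-1))
            acc = 0 + 0j
            for x, y in pc_pairs:
                acc += f(x) * g(round_inv(k, y))
            est = abs(acc) ** 2 / N ** 2          # imbalance estimate of S_t
            total += est * (imb - 1.0 / (n - 1))  # weight of (5.1), l dropped
        L[k] = total
    best = max(L.values())
    return {k for k in keys if L[k] == best}

# Toy demonstration (all choices hypothetical): "cipher" = S-box then xor key,
# so decrypting the last round with the right key recovers x and f(x)*g(x) = 1.
S = [0xC, 5, 6, 0xB, 9, 0, 0xA, 0xD, 3, 0xE, 0xF, 8, 4, 7, 1, 2]
S_inv = [S.index(v) for v in range(16)]
parity = lambda x: (-1) ** bin(x).count("1")
pairs = [(x, S[x] ^ 6) for x in range(16)]        # last-round key is 6
cands = attack3([(parity, parity, 1.0)], pairs,
                lambda k, y: S_inv[y ^ k], range(16), n=16)
```

For the correct key the accumulated sum reaches its maximal magnitude N, so the correct key is always among the returned candidates.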


5.4 Success Probability

In this section we consider how to estimate the success probability of the algorithms ATTACK1 and ATTACK3. Clearly, ATTACK1 is most successful when the considered I/O product has a large imbalance over the reduced cipher. Similarly, ATTACK3 performs best when one considers as many I/O products as possible.

5.4.1 The Simple Attack

As mentioned, the potential power of ATTACK1, when applied to a given cipher, can be measured by the maximum imbalance over the corresponding reduced cipher. The next theorem will show how to obtain this value and the corresponding I/O product(s) that attain this imbalance. As a side result, we will see that linear I/O products are in some sense the best. However, before formulating this, we need a lemma.

Lemma 5.4.1 Given two vectors v and w, each with nonnegative components and Σ_j v_j = 1, the following inequality holds:

    v · w ≤ w_max,

where w_max = max_j w_j. Equality holds if and only if v_a = 0 whenever w_a ≠ w_max.

Proof Expanding the inner product yields

    v · w = Σ_j v_j · w_j ≤ Σ_j v_j · w_max = w_max.

It is easily seen that we have equality if and only if v_j is nonzero only where w_j = w_max. □

We are now ready to formulate a proposition with which one can find maximum imbalances over certain ciphers.

Proposition 5.4.2 Maximum imbalance. Let there be given a cipher with a group operation at the entry and at the exit, and let C = [c_ab] denote the correlation matrix. Furthermore, let X and Y denote the cipher input and output, respectively. Then the maximum average-key imbalance over all possible I/O products,

    I_max = max_{(f,g) ∈ PG} I(f(X) · g(Y)),

is given by the maximum element of the correlation matrix C. More formally,

    I_max = max_{a,b} c_ab.

Moreover, the linear I/O sums in the set {L_ab : c_ab = I_max} attain this maximum as their imbalance.


This result tells us that, in some sense, the linear I/O products are the best.

Proof Let (f, g) be an I/O pair, and let f̂ and ĝ denote the vectors of the corresponding Fourier power spectra as in Proposition 4.1.14. Furthermore, let S = f(X) · g(Y). Then according to (4.2),

    I(S) = f̂^T C ĝ.                                                         (5.5)

Recall that by Parseval's identity and the definition of a correlation pair we have Σ_a f̂_a = Σ_b ĝ_b = 1. We first show that I(S) is maximized by choosing f̂ = e_a and ĝ = e_b for some values of a and b (e_j being the j-th basis vector, i.e., the all-zero vector except for a 1 at position j). Assume that we already know a ĝ which maximizes (5.5) (together with some f̂). Then w = Cĝ is a vector with nonnegative components. According to Lemma 5.4.1, choosing f̂ = e_a such that w_a = max_j w_j will maximize the value of f̂^T w = f̂^T C ĝ. In a similar manner one can show that choosing ĝ = e_b for some b will maximize (5.5). Thus, the maximum is given by e_a^T C e_b = c_ab for some values of a and b. Application of the inverse Fourier transform yields the corresponding I/O product:

    F^(−1){e_a}(X) · F^(−1){e_b}(Y) = χ_a(X) · χ_b^(−1)(Y) = L_ab(X, Y). □

Proposition 5.4.2 proves the hypothesis of Harpes that the maximum imbalance of the so-called I/O sums used with GLC is attained by a homomorphic¹ I/O sum (at least for ciphers working over the xor-group (Z_2^w, +), since in this case CC is equivalent to GLC). In the following example we apply Proposition 5.4.2 to one round of the cipher IDEA(8) and to the full cipher.

Example 5.4.3 The block cipher IDEA [24, 25] is an example of a cipher with a group operation at the entry and at the exit. To be more exact, the generalized version IDEA(w) with bit width w, v = w/4, m = 2^v, and m + 1 prime is given by the computational flowchart of Figure 5.1. Here ⊕ represents addition over Z_2^v (xor), + represents addition modulo m, and ⊙ represents multiplication modulo m + 1 with 0 ≡ m (!). Accordingly, the group operation at the entry of the cipher is given by a ∘ b = (a_1 ⊙ b_1, a_2 + b_2, a_3 + b_3, a_4 ⊙ b_4). The corresponding group (G, ∘) is decomposable into four cyclic groups, viz., G = Z_{m+1} × Z_m × Z_m × Z_{m+1}. To transform the cipher into a cipher which uses only additive cyclic groups, we have to incorporate the isomorphism from (Z_{2^v} + 1, ⊙) to (Z_{2^v}, +) and back into the keyed round permutation. More precisely, let φ_{k^(5,6)} : G → G denote the permutation in the round function following the group operation ∘. Define the new round function φ̃_{k^(5,6)} : G̃ → G̃ by

    φ̃_{k^(5,6)}(x_1, x_2, x_3, x_4) = log_{1,4}(φ_{k^(5,6)}(exp x_1, x_2, x_3, exp x_4)),

¹ The GLC notion of a homomorphic I/O sum is equivalent to the notion of a linear I/O product.


[Figure 5.1: The block cipher IDEA. The flowchart shows the input words X_1, X_2, X_3, X_4 combined with the subkeys Z_1^(1), ..., Z_4^(1) by the group operation ∘, followed by the round permutation φ keyed with Z_5^(1) and Z_6^(1), seven more rounds, and the output permutation with subkeys Z_1^(9), ..., Z_4^(9) producing Y_1, Y_2, Y_3, Y_4.]


where the function log_{1,4} : G → G̃ is defined by

    log_{1,4}(x_1, x_2, x_3, x_4) = (log x_1, x_2, x_3, log x_4).

Here log and exp are respectively the discrete logarithm function and the exponential function with respect to the same basis (an arbitrary generator of Z_{m+1}). The new equivalent flowchart is given by Figure 5.2.

[Figure 5.2: A block cipher which is equivalent to IDEA. The flowchart equals Figure 5.1 except that EXP is applied to words 1 and 4 before the round permutation φ̃ and LOG is applied to them after it.]

The function φ̃_{k^(5,6)} can be thought of as the bijection of a cipher equivalent to IDEA which uses the group operation a ∘̃ b = (a_1 + b_1, a_2 + b_2, a_3 + b_3, a_4 + b_4) instead of the usual ∘. The mini-version of IDEA called IDEA(8) has w = 8 and thus m = 4. For this cipher we have found the correlation matrix C = [c_ab] corresponding to the altered round function φ̃ by using a computer program. The matrix C has a maximum value of 0.25 at c_ab with, e.g., a = (0, 0, 0, 1) and b = (0, 0, 2, 0). The character corresponding to a = (a_1, a_2, a_3, a_4) ∈ Z_4^4 is χ_a(x) = i^(a_1x_1 + a_2x_2 + a_3x_3 + a_4x_4). This means that the highest average-key imbalance over one round is 0.25 and that this imbalance


is attained by the I/O product

    S_max = χ_(0,0,0,1)(X) · χ_(0,0,2,0)^(−1)(Y) = i^(X_4) · (−1)^(Y_3)

(among others). We now use Propositions 4.1.15 and 5.4.2 to find the maximum imbalance over the reduced cipher. IDEA(8) has 8 rounds, and thus to obtain the maximum possible imbalance for an I/O product over the reduced cipher we compute the 7th power of the correlation matrix C. We get I_max = max_{a,b} (C^7)_ab = 0.004994 for a = (0, 2, 2, 0) and b = (1, 0, 2, 0). This imbalance is attained by the I/O product

    S^(1::7) = i^(2X_2 + 2X_3 − Y_1 − 2Y_3) = (−i)^(Y_1) · (−1)^(X_2 + X_3 + Y_3)

(among others).

For xor-ciphers, a bound on the diagonal elements of the correlation matrix can be obtained by considering the degree of the cipher expressed as a polynomial from Z_2^w[x].

Proposition 5.4.4 Let there be given an xor-cipher e_k : Z_2^w → Z_2^w with correlation matrix C = [c_ab]. Assume that c_aa ≠ 1 for all a, i.e., none of the I/O products corresponding to diagonal elements of the correlation matrix are guaranteed. In addition, let d denote the maximum degree over all k of e_k expressed as a polynomial from Z_2^w[x], i.e., let

    d = max_k deg(e_k(x)).

Then

    c_aa ≤ (d − 1)² / n

for all a, where n = 2^w.

In fact, the proposition holds for all ciphers over groups whose cyclic decomposition consists only of groups of prime order, e.g., G = Z_3 × Z_5^4 or G = Z_9257. However, most ciphers work over a group whose order is a power of 2, which is why we have covered the above case only.

Proof Let (f, g) = (χ_a, χ_a) be a linear I/O pair corresponding to the diagonal element c_aa, and let s_k(x) = χ_a(x) · χ_a^(−1)(e_k(x)). Furthermore, let S = s_K(X) be the I/O sum corresponding to (f, g). The diagonal element c_aa is expressible in the following way:

    c_aa = I(S) = I(χ_a(X) · χ_a^(−1)(Y)) = I(χ_a(X − Y)).

Weil's theorem ([28], Theorem 5.38) tells us that for any non-trivial character χ, we have

    | Σ_{x ∈ Z_2^w} χ(h(x)) | ≤ (deg(h(x)) − 1) · n^(1/2)                    (5.6)

if h ∈ Z_2^w[x] is not of the form

    h = u² − u + v                                                           (5.7)


for some u ∈ Z_2^w[x] and v ∈ Z_2^w. The inequality (5.6) implies

    ( (1/n) | Σ_{x ∈ Z_2^w} s_k(x) | )² ≤ (deg(x − e_k(x)) − 1)² / n.

The left-hand side of the above expression equals I(S | K = k). We are now left to prove that h(x) = x − e_k(x) is not of the form (5.7). However, if x − e_k(x) were of the indicated form, then I(S | K = k) would equal 1 since, in that case, the left-hand side of (5.6) would equal n (for an explanation of this, see [28]). But we have assumed that there are no guaranteed I/O products, and we have a contradiction. We finally obtain

    I(S) = (1/|K|) Σ_{k ∈ K} I(S | K = k)
         ≤ (1/|K|) Σ_{k ∈ K} (deg(x − e_k(x)) − 1)² / n
         ≤ (d − 1)² / n. □

Proposition 5.4.4 states that if the ciphertext is expressible as a polynomial of sufficiently low degree with respect to the plaintext, then the maximum imbalance of certain I/O products over the cipher is low. The proposition provides a more analytical approach for examining some ciphers, since it is easy to obtain an upper bound on the degree of polynomials over a whole cipher when the degrees of polynomials over the individual rounds are known. Low Boolean complexity, however, is not a good design criterion, since ciphers of low polynomial degree are susceptible to attacks other than statistical ones. At the least, care should be taken when using Proposition 5.4.4 to design a cipher. Siegenthaler [46] (see also [47]) has shown a somewhat related result for stream ciphers based on combining Boolean linear feedback shift registers. More precisely, he has shown that there is a linear trade-off between the correlation immunity and the Boolean complexity of the combiner if the combiner is memoryless.

5.4.2 The Advanced Attack

Clearly, the advanced attack is more powerful than the simple attack since we receive information from several I/O products. In this section, we seek to determine how many P/C-pairs ATTACK3 requires for a successful cryptanalysis. This number is also an upper bound on the number of P/C-pairs required by ATTACK1 (since this is a weaker attack). The strongest attack with Algorithm ATTACK3 of course uses all the I/O pairs at our disposal, e.g., all the linear I/O products S = {L_ab : a, b ∈ G} with imbalances as indicated by the correlation matrix (additional I/O products would be redundant since the linear I/O products completely describe the correlational properties, cf. Proposition 4.1.14). We call this a full attack. However, since all the imbalance


estimates corresponding to these linear I/O products are not always independent, it is not possible to use Proposition 5.3.2 directly to acquire information about the distribution of Ĩ_α(S), since this proposition requires the random variables to be independent. In the following, however, we assume that the imbalance estimates of the I/O products from S are independent. Since independent estimates leak more information than correlated estimates, this hypothetical situation is a much stronger attack than otherwise possible. Consequently, if this hypothetical full attack with independent imbalance estimates requires N P/C-pairs to succeed, then the "real" attack requires more than N pairs. Therefore, the following theorems, which determine N for the hypothetical attack, can be thought of as lower bounds for N for real attacks. Next, after choosing the optimal α given by (5.1), we proceed to determine the distribution of Ĩ_α(S) given a correct and a wrong key guess, respectively. According to Proposition 5.3.2, the two distributions are the normal distributions N(μ_1, N^(−2)) and N(μ_2, N^(−2)) with μ_1 and μ_2 given by (5.2) and (5.3). Thus, to distinguish between correct and wrong keys, we wish to test which one of the two hypotheses

    H_0: Ĩ_α(S) ∈ N(μ_1, N^(−2))  and  H_1: Ĩ_α(S) ∈ N(μ_2, N^(−2))

is true. This is equivalent to testing between the hypotheses

    H_0: Ĩ_α − μ_2 ∈ N(μ_1 − μ_2, N^(−2))  and  H_1: Ĩ_α − μ_2 ∈ N(0, N^(−2)).

According to (5.4),

    μ_1 − μ_2 = [ Σ_{t=1}^m (I(S_t^(1::r−1)) − 1/(n−1))² ]^(1/2),

which in this case equals

    [ Σ_{a,b ∈ G\{0}} (I(L_ab) − 1/(n−1))² ]^(1/2) = [ Σ_{a,b ∈ G\{0}} (c_ab − 1/(n−1))² ]^(1/2) = ‖C − E‖_2.

Based on this, we now derive a formula for determining how many P/C-pairs one has to consider in order to mount a successful attack against a given cipher for which the value of ‖C − E‖_2 is known. In the following, |K| will denote the number of last-round keys.

Theorem 5.4.5 Given a block cipher, let C be the correlation matrix of the corresponding reduced cipher and let |K| denote the number of possible last-round keys. Then the minimum number N of P/C-pairs required for a full correlation attack with independent imbalance estimates to succeed with probability p is

    N = (Φ^(−1)(1 − 1/|K|) + Φ^(−1)(p)) / ‖C − E‖_2,                         (5.8)


where Φ(x) = P[N(0,1) ≤ x] is the cumulative normal distribution function and E = [e_ab] is given by e_ab = 1/(n−1).

Proof Consider algorithm ATTACK3. Instead of outputting the last-round key k̃ which has the highest value of Ĩ_α, assume that there is a constant γ above which we declare Ĩ_α − μ_2 to belong to N(‖C − E‖_2, N^(−2)) and below which we declare Ĩ_α − μ_2 to belong to N(0, N^(−2)) (corresponding to guessing a correct and a wrong key, respectively). We do not want any false positives (i.e., last-round key guesses which are declared correct when they are not), and thus for Z_1 ∈ N(0, N^(−2)) we want

    P[Z_1 > γ] ≤ 1/|K|.                                                      (5.9)

Similarly, we do not want a false negative, i.e., we want the attack algorithm to succeed with a reasonable probability p. In other words, for Z_0 ∈ N(‖C − E‖_2, N^(−2)) we want

    P[Z_0 > γ] ≥ p.                                                         (5.10)

Both of these inequalities have to hold for the attack to succeed. Thus the smallest number N for which they hold for some value of γ is the value we are looking for. Equation (5.9) is equivalent to

    P[Z̃_1 > Nγ] ≤ 1/|K|,

where Z̃_1 = N · Z_1 ∈ N(0, 1). This inequality has the solution

    γ ≥ Φ^(−1)(1 − 1/|K|) / N.                                              (5.11)

Equation (5.10) is equivalent to

    P[Z̃_0 > N · (γ − ‖C − E‖_2)] ≥ p,

where Z̃_0 = N · (Z_0 − ‖C − E‖_2) ∈ N(0, 1). This inequality has the solution

    γ ≤ Φ^(−1)(1 − p) / N + ‖C − E‖_2.                                      (5.12)

The critical point where the right-hand sides of (5.11) and (5.12) are equal is now obtained by solving

    Φ^(−1)(1 − 1/|K|) / N = Φ^(−1)(1 − p) / N + ‖C − E‖_2

with respect to N. The solution is

    N = (Φ^(−1)(1 − 1/|K|) − Φ^(−1)(1 − p)) / ‖C − E‖_2,


which simplifies to (5.8), since −Φ^(−1)(1 − p) = Φ^(−1)(p). □

The following table shows the value of Φ^(−1)(1 − 1/|K|) + Φ^(−1)(p) for various values of the success probability p (columns) and |K| = 2^b, where b is the number of bits in the last-round key guess (rows).

     b |  50%   90%   95%  97.5%   99%  99.99%  99.9999%
    ---+------------------------------------------------
     1 |  0     1.28  1.64  1.96   2.37   3.72    4.75
     2 |  0.67  1.96  2.32  2.63   3.00   4.39    5.43
     3 |  1.15  2.43  2.80  3.11   3.47   4.87    5.90
     4 |  1.53  2.82  3.18  3.49   3.86   5.25    6.29
     5 |  1.86  3.14  3.51  3.82   4.19   5.58    6.62
     6 |  2.15  3.44  3.80  4.11   4.48   5.87    6.91
     7 |  2.42  3.70  4.06  4.38   4.74   6.14    7.17
     8 |  2.66  3.94  4.30  4.62   4.99   6.38    7.41
     9 |  2.89  4.17  4.53  4.85   5.21   6.60    7.64
    10 |  3.10  4.38  4.74  5.06   5.42   6.82    7.85
    11 |  3.30  4.58  4.94  5.26   5.62   7.02    8.05
    12 |  3.49  4.77  5.13  5.45   5.81   7.21    8.24
    13 |  3.67  4.95  5.31  5.63   5.99   7.39    8.42
    14 |  3.84  5.12  5.49  5.80   6.17   7.56    8.60
    15 |  4.01  5.29  5.65  5.97   6.34   7.73    8.76
    16 |  4.17  5.45  5.81  6.13   6.50   7.89    8.92

Theorem 5.4.5 sometimes makes it possible to prove that a given cipher is secure against CC. This happens if, e.g., N exceeds the number of possible P/C-pairs.

Example 5.4.6 In this example we show that the cipher IDEA(8) is immune to correlation cryptanalysis. Let C^7 denote the correlation matrix of the reduced cipher as found in Example 5.4.3. The corresponding value of the Frobenius norm is ‖C^7 − E‖_2 = 0.007600. Then according to (5.8) and the above table, the minimum number of P/C-pairs required to correctly guess the 12-bit last-round key with a probability of more than p = 50% is

    N = (Φ^(−1)(1 − 1/|K|) + Φ^(−1)(p)) / ‖C^7 − E‖_2 ≈ 3.49 / 0.007600 ≈ 459,

which is more than the available number of P/C-pairs (namely 2^8 = 256). Consequently, IDEA(8) is secure against CC (assuming independent round keys). As we shall see later, this implies that IDEA(8) is secure against LC, GLC, PC, and DC, too.
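Both the table entries and the example's count can be reproduced with the standard normal quantile function. A sketch in Python (the Frobenius-norm value 0.007600 is the one computed in the example above):

```python
from statistics import NormalDist

Phi_inv = NormalDist().inv_cdf  # the standard normal quantile function

def required_pairs(frobenius, key_bits, p):
    """N of (5.8): P/C-pairs needed for success probability p against a
    cipher whose reduced-cipher correlation matrix C satisfies
    ||C - E||_2 = frobenius, with a 2**key_bits last-round key space."""
    return (Phi_inv(1 - 2.0 ** -key_bits) + Phi_inv(p)) / frobenius

# Reproduces the b = 12, p = 50% table entry (about 3.49) and Example 5.4.6:
numerator = Phi_inv(1 - 2.0 ** -12) + Phi_inv(0.5)
N = required_pairs(0.007600, key_bits=12, p=0.5)  # about 459 > 2^8 = 256
print(numerator, N)
```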

Chapter 6

Bounds for Multiple Rounds

When examining the security of a cipher against CC, it is neither always desirable nor practical to have to consider the correlation matrix of the whole cipher, since it is generally hard to obtain. In the following, we will provide various upper bounds for multiple rounds which depend on only one value derived from the correlation matrix of the round function. With these bounds, one can easily determine how many rounds a cipher should have to make it secure against CC. We will assume that the round functions of the considered cipher are identical, implying identical correlation matrices for each round. The results, however, are adaptable for use with ciphers using several different round functions. First, we will derive bounds related to ATTACK1, i.e., bounds that tell us something about the maximum value of the correlation matrix of the whole cipher; then we present bounds related to ATTACK3, telling us how many P/C-pairs are required for a successful attack.

6.1 The Simple Attack

The success of the simple attack depends on finding an I/O product with a high imbalance. In this subsection we develop upper bounds for the imbalance over a given cipher. First, however, we need a lemma.

Lemma 6.1.1 Let C be a correlation matrix, and let E = [e_ab] be the (n−1) × (n−1) matrix with e_ab = 1/(n−1) for a, b ∈ G \ {0}. Then

    C^r − E = (C − E)^r                                                      (6.1)

for all r ∈ N. Furthermore,

    σ_2(C) = σ_1(C − E),                                                     (6.2)

i.e., the second highest singular value of C is the highest singular value of C − E.

Proof To prove (6.1), we use induction on r. Clearly, (6.1) holds for r = 1. Then for r = s + 1 we get

    (C − E)^(s+1) = (C − E)^s (C − E).


Using the induction hypothesis C^s − E = (C − E)^s we get

    (C^s − E)(C − E) = C^(s+1) − C^s E − EC + E².

Since C and C^s are doubly stochastic (and E² = E), the above simplifies to

    C^(s+1) − E − E + E = C^(s+1) − E,

ending the proof of (6.1). Next, write D = C^T C in the form D = V* Λ V, where Λ is the diagonal matrix with the eigenvalues of D (including multiplicities) in descending order of magnitude and the rows of V contain the corresponding eigenvectors. Since C and thereby D are doubly stochastic, 1 is the highest eigenvalue of D, and u = (n−1)^(−1/2) · (1, 1, ..., 1) is a corresponding normalized eigenvector. In other words, Λ = diag(1, σ_2(C)², σ_3(C)², ..., σ_{n−1}(C)²) (recall that the eigenvalues of D are the squared singular values of C), and the first row of V is u. Furthermore, since D is Hermitian, V is unitary, i.e., V^(−1) = V*. Another way to write E is

    E = u u^T = V* · diag(1, 0, ..., 0) · V,

and thus

    σ_1(C − E)² = λ_1((C − E)^T (C − E))
                = λ_1(C^T C − C^T E − E^T C + E^T E)
                = λ_1(D − E − E + E)
                = λ_1(V* · diag(1, λ_2(D), ..., λ_{n−1}(D)) · V − V* · diag(1, 0, ..., 0) · V)
                = λ_1(V* · diag(0, λ_2(D), ..., λ_{n−1}(D)) · V)
                = λ_2(D) = σ_2(C)²,

proving (6.2). □

In fact, the lemma holds for all doubly stochastic matrices C, as is easily checked. The following theorem gives an upper bound for the imbalance of an I/O product over a whole cipher when the correlation matrix of the round function is known.
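Identity (6.1) is easy to check numerically. A minimal pure-Python sketch with a small made-up doubly stochastic matrix (here E is the 3 × 3 matrix with uniform entries 1/3, playing the role of the (n−1) × (n−1) matrix of the lemma):

```python
def matmul(A, B):
    m = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]

def matpow(A, r):
    P = A
    for _ in range(r - 1):
        P = matmul(P, A)
    return P

def matsub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# A small doubly stochastic stand-in for a correlation matrix (made up):
C = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]
E = [[1.0 / 3] * 3 for _ in range(3)]

lhs = matsub(matpow(C, 4), E)   # C^r - E
rhs = matpow(matsub(C, E), 4)   # (C - E)^r
```

The two sides agree entrywise up to floating-point rounding, as the lemma predicts.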


Theorem 6.1.2 Let there be given a cipher with r rounds, each with correlation matrix C. Then the average-key imbalance of any I/O product S over that cipher is bounded by

    I(S) ≤ 1/(n−1) + ‖C − E‖^r,

with ‖·‖ replaced by any one of the following matrix norms: ‖·‖_1, ‖·‖_2, |||·|||_1, |||·|||_2, or |||·|||_∞.

Proof According to Proposition 2.4.5, the max norm of C = [c_ab], i.e., ‖C‖_max = max_{a,b} |c_ab|, is upper-bounded by any of the listed matrix norms. Using first Propositions 5.4.2 and 4.1.15, then the triangle inequality and (6.1) of Lemma 6.1.1, we obtain

    I(S) = max_{a,b} (C^r)_ab
         ≤ 1/(n−1) + ‖C^r − E‖_max
         ≤ 1/(n−1) + ‖C^r − E‖
         = 1/(n−1) + ‖(C − E)^r‖
         ≤ 1/(n−1) + ‖C − E‖^r,

where ‖·‖ is any one of the listed norms. □

Corollary 6.1.3 Let there be given a cipher with r rounds, each with correlation matrix C. Then the average-key imbalance of any I/O product S over that cipher is bounded by an expression involving a singular value of C, namely

    I(S) ≤ 1/(n−1) + σ_2(C)^r.

Proof Follows from (6.2) and application of Theorem 6.1.2 with ‖·‖ = |||·|||_2, the spectral norm. □

Thus, for ciphers with σ_2 < 1 we can force the maximum imbalance as close to 1/(n−1) as we wish by using an appropriate number of rounds. Tighter upper bounds can be derived by considering two or more rounds at a time, i.e., for r divisible by m, compute 1/(n−1) + [σ_2(C^m)]^(r/m). Using m = r of course gives us the correct maximum imbalance right away (via Proposition 5.4.2), since we then compute C^r. Corollary 6.1.3 leaves us with the problem of finding the second highest singular value of a given correlation matrix. With the following proposition, we can find an upper bound for the second highest singular value.


Proposition 6.1.4 The second highest singular value of the correlation matrix C = [c_ab] is bounded from above by

    σ_2(C) ≤ min{ [1 − Σ_b min_a (C^T C)_ab]^(1/2), [1 − Σ_a min_b (C^T C)_ab]^(1/2) }.

Proof The second highest eigenvalue of a doubly stochastic matrix M = [m_ab] obeys the following bound:

    λ_2(M) ≤ min{ 1 − Σ_b min_a m_ab, 1 − Σ_a min_b m_ab }.

For a proof of this, consult [2], Theorem 5.10, where a similar bound is proved for stochastic matrices. This finishes the proof, since the singular values of C are just the square roots of the eigenvalues of the doubly stochastic matrix C^T C. □

Example 6.1.5 In this example, we use the correlation matrix from Example 5.4.3 to obtain an upper bound for the imbalance of a multiround I/O product over IDEA(8). Let S^(1::r) denote an I/O product over r rounds. By Proposition 6.1.4, we have σ_2(C) ≤ 0.9813, and since I(S^(1::r)) ≤ σ_2(C)^r + 1/255 according to Corollary 6.1.3, the following table is obtained:

    r              1       2       3       4       5       6       7       8
    I(S^(1::r)) ≤  0.9853  0.9669  0.9489  0.9312  0.9139  0.8969  0.8802  0.8638

As we have already seen in Example 5.4.3, the true maximum imbalance for 7 rounds is approximately 0.004994. Thus, Proposition 6.1.4 provides us with a bound, but it is very loose and should be used only for a preliminary cipher evaluation.

As the above example has shown us, the bound given in Proposition 6.1.4 is not very tight, and unfortunately, as the size of the correlation matrix grows, in most cases the bound gets even worse. This is due to the increased probability of getting very small minima in the rows and columns of the matrix C^T C, resulting in a bound closer to 1. Furthermore, the bound requires the calculation of C^T C, which is time-consuming. To establish a tighter multiround bound we need a way to compute a better approximation of the second highest singular value than that provided by Proposition 6.1.4. In the following, we present an algorithm for finding the second highest singular value of a doubly stochastic matrix. The algorithm is a variant of the well-known iterative algorithm (the power method) for computing a numerical approximation of the spectral radius of a given matrix.


Algorithm for finding the second highest singular value of a correlation matrix.

procedure SINGULARVALUE(C, ε)
// input:  C, a doubly stochastic matrix
//         ε, the precision
// output: the second highest singular value of C
begin
  x := (1, 0, ..., 0)
  D := C − E, where E_ab = 1/(n−1) for all a, b
  λ := 1
  repeat
    z := D^T (Dx)
    λ~ := λ
    λ := max_j z_j
    x := z/λ
  until |λ − λ~| < ε
  return(√λ)
end
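Procedure SINGULARVALUE translates directly into Python. The matrix used in the demonstration is a made-up 3 × 3 doubly stochastic circulant whose second singular value is exactly √0.07 (for an m × m matrix, E has entries 1/m, in the role of the (n−1) × (n−1) matrix of the pseudocode):

```python
from math import sqrt

def singular_value_2(C, eps=1e-12):
    """Power iteration on D^T D with D = C - E, following procedure
    SINGULARVALUE: returns the second highest singular value of the
    doubly stochastic matrix C."""
    m = len(C)
    D = [[C[i][j] - 1.0 / m for j in range(m)] for i in range(m)]
    Dt = [[D[j][i] for j in range(m)] for i in range(m)]
    mv = lambda A, v: [sum(A[i][j] * v[j] for j in range(m)) for i in range(m)]
    x = [1.0] + [0.0] * (m - 1)
    lam, prev = 1.0, 0.0
    while abs(lam - prev) >= eps:
        z = mv(Dt, mv(D, x))        # z := D^T (D x)
        prev, lam = lam, max(z)     # estimate of the largest eigenvalue
        x = [zj / lam for zj in z]  # rescale to prevent overflow
    return sqrt(lam)

# Made-up doubly stochastic test matrix (second singular value sqrt(0.07)):
C = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]
print(singular_value_2(C))
```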

The algorithm is based upon the following proposition.

Proposition 6.1.6 Let M be an m × m matrix, and let λ_1, λ_2, ..., λ_m be the algebraic eigenvalues of M (including multiplicities), such that |λ_1| ≥ |λ_2| ≥ ... ≥ |λ_m|. Furthermore, let v_j denote the eigenvector corresponding to λ_j, and let w_0 ∈ C^m be an arbitrary vector with a non-zero component in the v_1 direction. Finally, let w_r = M w_{r−1} for r = 1, 2, .... Then the spectral radius ρ(M) of M is given by

    ρ(M) = lim_{r→∞} (y_r^T w_{r+1}) / (y_r^T w_r),

where y_0, y_1, ... are any non-zero vectors.

Proof The eigenvectors form a basis, and thus we may write

    w_0 = c_1 v_1 + c_2 v_2 + ... + c_m v_m,

where c_1, c_2, ..., c_m ∈ C and c_1 ≠ 0 (w_0 has a non-zero component in the v_1 direction). Then

    w_r = c_1 λ_1^r v_1 + c_2 λ_2^r v_2 + ... + c_m λ_m^r v_m
        = λ_1^r · (c_1 v_1 + c_2 (λ_2/λ_1)^r v_2 + ... + c_m (λ_m/λ_1)^r v_m).

For r sufficiently large, this sum will be dominated by the term c_1 λ_1^r v_1, and therefore

    (y_r^T w_{r+1}) / (y_r^T w_r) ≈ (y_r^T v_1 c_1 λ_1^(r+1)) / (y_r^T v_1 c_1 λ_1^r) = λ_1.


Since the spectral radius of a matrix is just the maximum absolute eigenvalue, the proof is finished. □

What follows is a brief explanation of the algorithm. First, the algorithm uses Lemma 6.1.1 to get rid of the highest singular value of C by subtracting E from C. It then proceeds by computing the new highest singular value (i.e., the original second highest singular value); this is done by finding the highest eigenvalue of D^T D and returning its square root. Letting λ := max_j z_j and computing x := z/λ is equivalent to choosing y_r of Proposition 6.1.6 to be the all-zero vector except for a 1 at the position where z has its maximum (this gives fast convergence). Finally, z is scaled down to prevent overflow. On some (very rare) occasions, the initial value x = (1, 0, ..., 0) has no component in the direction corresponding to the highest eigenvalue of D. In this case, the algorithm does not produce the correct result. To solve this problem, simply try another value of x. Algorithm SINGULARVALUE has complexity O(j · n²), where j is the number of iterations required. As one can see from the above proof, j depends upon the need for precision and upon the ratio between the third and the second highest singular values of C, of which we know nothing a priori. However, the number of iterations should not grow very high even if the ratio is only slightly below 1. The factor n² is due to the two matrix/vector multiplications done in the step z := D^T(Dx). By computing Dx first instead of D^T D we avoid the time-consuming calculation of a matrix product. Hansen treats other methods for computing singular values in [11].

Example 6.1.7 After 56 iterations, Algorithm SINGULARVALUE finds the 4-decimal approximation of the second highest singular value of the correlation matrix of the round function of IDEA(8) to be 0.6792. Again, let S^(1::r) denote an I/O product over r rounds. Using Corollary 6.1.3, we obtain the following table:

    r              1       2       3       4       5       6       7       8
    I(S^(1::r)) ≤  0.6832  0.4653  0.3173  0.2168  0.1485  0.1021  0.0706  0.0492

These values are closer to the true maximum value of I(S^(1::r)) than those presented in Example 6.1.5.
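The table can be regenerated from the single value σ_2 ≈ 0.6792 via Corollary 6.1.3 (small deviations in the last digit stem from the rounding of σ_2 to four decimals):

```python
# Corollary 6.1.3 with sigma_2 = 0.6792 and n = 256 (IDEA(8)): the
# multiround bound sigma_2^r + 1/(n-1) for r = 1, ..., 8.
sigma2 = 0.6792
bounds = [sigma2 ** r + 1.0 / 255 for r in range(1, 9)]
print([round(b, 4) for b in bounds])
```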

6.2 The Advanced Attack

Recall the formula (5.8) from the previous chapter for obtaining the number N of P/C-pairs required for a successful attack. This number depends on the Frobenius norm ‖·‖_2 of a certain matrix. Since ‖·‖_2 is a matrix norm and since the correlation matrix is multiplicative, it is easy to obtain lower bounds for N also for multiround ciphers. This is done in this section in a manner similar to the proofs of, e.g., Theorem 6.1.2 and Corollary 6.1.3, where upper bounds on the imbalance were obtained by considering various matrix norms.


Proposition 6.2.1 Let C be the correlation matrix of the round function of an r-round cipher. Then the number of samples N necessary for a successful correlation attack obeys the following bounds:

    N ≥ (Φ^(−1)(1 − 1/|K|) + Φ^(−1)(p)) / ‖C − E‖_2^(r−1)
      ≥ (Φ^(−1)(1 − 1/|K|) + Φ^(−1)(p)) / ‖C − E‖_1^(r−1),

where p is the desired success probability and |K| is the number of last-round key guesses. Some other bounds are

    N ≥ (Φ^(−1)(1 − 1/|K|) + Φ^(−1)(p)) / (√n · ‖C − E‖^(r−1)),

with ‖·‖ replaced by either |||·|||_2 (the spectral norm), |||·|||_1 (the maximum absolute column sum norm), or |||·|||_∞ (the maximum absolute row sum norm). The last 3 bounds are particularly useful if ‖C − E‖_2 ≥ 1 (in this case the first 2 bounds are worthless).

Proof Let C^(1::r−1) denote the correlation matrix of the reduced cipher. Using Theorem 5.4.5 and Proposition 2.4.5 yields

    N = (Φ^(−1)(1 − 1/|K|) + Φ^(−1)(p)) / ‖C^(1::r−1) − E‖_2
      = (Φ^(−1)(1 − 1/|K|) + Φ^(−1)(p)) / ‖C^(r−1) − E‖_2
      = (Φ^(−1)(1 − 1/|K|) + Φ^(−1)(p)) / ‖(C − E)^(r−1)‖_2
      ≥ (Φ^(−1)(1 − 1/|K|) + Φ^(−1)(p)) / ‖C − E‖_2^(r−1)
      ≥ (Φ^(−1)(1 − 1/|K|) + Φ^(−1)(p)) / ‖C − E‖_1^(r−1).

The other bounds follow analogously (since ‖D^(r−1)‖_2 ≤ √n · ‖D‖^(r−1) for the listed operator norms, according to the table of Proposition 2.4.5). □

Assume that we are designing a cipher, and that we want it to be immune against CC. In other words, we want the CC attack to be at least as computationally expensive as a brute-force attack. The expected number of encryptions by a brute-force attack is (1/2)|K^(1::r)| = (1/2)|K|^r, where |K^(1::r)| and |K| are the number of possible keys for the whole cipher and for one round, respectively (since one must on average expect to run through half of the keys before finding the correct one). The expected number of operations gone through by a simple CC attack (i.e., using one I/O pair only) is (1/2)|K|·N. Recalling (5.8), the condition for security can

be expressed by

    ½|K|^r ≤ ½|K| · (Φ⁻¹(1 − 1/|K|) + Φ⁻¹(p)) / ‖C − E‖₂^(r−1).

Solving this with respect to integer-valued r produces

    r = ⌈ log(Φ⁻¹(1 − 1/|K|) + Φ⁻¹(p)) / log(|K| · ‖C − E‖₂) ⌉ + 1,

which is the required number of rounds.

CC becomes impossible for even a computationally unbounded adversary when the required number of P/C-pairs N exceeds the number n of possible plaintexts (note, however, that DES is an example of a cipher where the number of possible keys is less than the number of possible plaintexts). In this case the required number of rounds r can be found by solving

    (Φ⁻¹(1 − 1/|K|) + Φ⁻¹(p)) / ‖C − E‖₂^(r−1) ≥ n.

The integer solution to this is

    r = ⌈ (log(Φ⁻¹(1 − 1/|K|) + Φ⁻¹(p)) − log n) / log ‖C − E‖₂ ⌉ + 1.

Incidentally, notice that if an adversary has N = n available P/C-pair samples, then he also has the entire mapping x ↦ y of plaintexts x to ciphertexts y, leaving no need to find the key.

Until now, we have derived only lower multiround bounds on the number N of P/C-pairs necessary for a successful attack. The number of pairs actually needed may be higher. The following proposition gives us an upper bound on the sufficient number of P/C-pairs to break a cipher. This bound is useful for an attacker only, since it tells us the maximum number of pairs needed; the actual number might be smaller. This makes it possible to determine if a cipher is breakable by CC, whereas the previous bounds can only determine if a cipher is secure. The bound is based on the fact that all matrix norms are bounded from below by the spectral radius ρ.

Proposition 6.2.2 Let C be the correlation matrix of the round function of an r-round cipher. Then the number of samples N required for a successful correlation attack is upper-bounded by

    N ≤ (Φ⁻¹(1 − 1/|K|) + Φ⁻¹(p)) / ρ(C − E)^(r−1).

With this proposition it is possible to determine a lower bound on the number of rounds for which the cipher can be broken.

Proof For any matrix norm ‖·‖ and any matrix M, one has ‖M‖ ≥ ρ(M).
Equation (5.8) yields

    N = (Φ⁻¹(1 − 1/|K|) + Φ⁻¹(p)) / ‖(C − E)^(r−1)‖₂


      ≤ (Φ⁻¹(1 − 1/|K|) + Φ⁻¹(p)) / ρ((C − E)^(r−1))
      = (Φ⁻¹(1 − 1/|K|) + Φ⁻¹(p)) / ρ(C − E)^(r−1),

since ρ(M^r) = λ₁(M^r) = λ₁(M)^r = ρ(M)^r for any matrix M. □
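The round-count formulas above are easy to evaluate numerically. The sketch below transcribes the codebook-exhaustion formula, r = ⌈(log A − log n)/log ‖C − E‖₂⌉ + 1 with A = Φ⁻¹(1 − 1/|K|) + Φ⁻¹(p), using the standard normal quantile from Python's standard library; the parameter values in the example are purely illustrative, not taken from the thesis.

```python
import math
from statistics import NormalDist

def rounds_exceeding_codebook(key_count: int, norm_CE: float,
                              n: int, p: float = 0.95) -> int:
    """Smallest integer r with
    (Phi^-1(1 - 1/|K|) + Phi^-1(p)) / ||C - E||_2^(r-1) >= n,
    i.e. r = ceil((log A - log n) / log ||C - E||_2) + 1."""
    phi_inv = NormalDist().inv_cdf          # Phi^-1, the normal quantile
    A = phi_inv(1 - 1 / key_count) + phi_inv(p)
    return math.ceil((math.log(A) - math.log(n)) / math.log(norm_CE)) + 1

# e.g. 2^16 last-round key guesses, ||C - E||_2 = 0.25, 2^16 plaintexts:
r = rounds_exceeding_codebook(2 ** 16, 0.25, 2 ** 16)
```

Note that the formula is only meaningful in the regime ‖C − E‖₂ < 1, where the denominator log ‖C − E‖₂ is negative and the required round count grows as the norm approaches 1.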

Comments

The matrix norm technique for obtaining upper bounds for multiple rounds is useful for DC and PC as well (since powers of doubly stochastic matrices are also considered in these two cases).

As mentioned in the beginning of the chapter, the two sections above have considered only ciphers with round functions which are identical from round to round. This made it possible to obtain the correlation matrix of the reduced cipher by computing C^(r−1), where C is the correlation matrix of the round function and r is the number of rounds. This, in turn, resulted in bounds in which the expression ‖C − E‖^(r−1) occurred (where ‖·‖ is some matrix norm). Similar bounds for ciphers with round functions which are not identical are easily obtained simply by replacing the expression ‖C − E‖^(r−1) with the analogous expression ‖C^(1) − E‖ · ‖C^(2) − E‖ ⋯ ‖C^(r−1) − E‖, where C^(1), C^(2), …, C^(r−1) are the correlation matrices of the various rounds. This fact follows by more or less the same arguments as used in the proofs above.

6.3 Bounds Related to Composite Matrices

Even after having reduced the problem of estimating the security of a cipher to that of evaluating various matrix norms of the correlation matrix of the round function, the method is still impractical when it comes to direct computation, since for real-world ciphers one often has n = 2^64. However, if the cipher has a certain structure, then it is possible to derive certain norms in a more analytical way. In this section, ⊗ denotes the Kronecker product. The main tool is the following lemma.

Lemma 6.3.1 Consider the matrix function φ : M_m × M_n → M_mn given by

    φ(M, N) = Σ_{a,b=0}^p g_ab · M^a ⊗ N^b,

where g_ab ∈ C. Let φ(x, y) be the complex bivariate polynomial

    φ(x, y) = Σ_{a,b=0}^p g_ab · x^a y^b.

Then the eigenvalues of the matrix φ(M, N) are the mn numbers φ(λ_r(M), λ_s(N)) where r = 1, 2, …, m and s = 1, 2, …, n.


Proof For a proof, see [26], Section 8.3. □
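The lemma is easy to check numerically. In the sketch below (illustrative values, not from the thesis), M and N are chosen upper triangular so that their eigenvalues, and those of φ(M, N), can be read directly off the diagonals:

```python
def kron(A, B):
    """Kronecker product of two square matrices given as lists of lists."""
    n = len(B)
    size = len(A) * n
    return [[A[i // n][j // n] * B[i % n][j % n] for j in range(size)]
            for i in range(size)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_pow(A, p):
    """A^p by repeated multiplication (A^0 is the identity)."""
    R = [[1.0 if i == j else 0.0 for j in range(len(A))] for i in range(len(A))]
    for _ in range(p):
        R = matmul(R, A)
    return R

def phi_matrix(M, N, g):
    """phi(M, N) = sum_{a,b} g[a][b] * (M^a kron N^b), as in Lemma 6.3.1."""
    size = len(M) * len(N)
    out = [[0.0] * size for _ in range(size)]
    for a, row in enumerate(g):
        for b, gab in enumerate(row):
            K = kron(mat_pow(M, a), mat_pow(N, b))
            for i in range(size):
                for j in range(size):
                    out[i][j] += gab * K[i][j]
    return out

# Upper triangular inputs: eigenvalues are the diagonal entries.
M = [[2.0, 1.0], [0.0, 3.0]]        # eigenvalues 2, 3
N = [[5.0, 4.0], [0.0, 7.0]]        # eigenvalues 5, 7
g = [[1.0, 2.0], [3.0, 4.0]]        # phi(x, y) = 1 + 2y + 3x + 4xy
P = phi_matrix(M, N, g)             # triangular, so its diagonal holds
diag = [P[i][i] for i in range(4)]  # the eigenvalues phi(lambda_r, lambda_s)
```

The diagonal entry at index (r, s) equals φ(λ_r(M), λ_s(N)), exactly as the lemma asserts.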

Corollary 6.3.2 Let C = A ⊗ B. Then the singular values of C are the products of the singular values of A and B.

Proof First notice that due to Lemma 6.3.1, the corollary clearly holds if we replace the term "singular values" by "eigenvalues". Now recall that the singular values of the matrix C = A ⊗ B are the square roots of the eigenvalues of the matrix C^T C. Due to properties of the Kronecker product, we have

    C^T C = (A ⊗ B)^T (A ⊗ B) = (A^T ⊗ B^T)(A ⊗ B) = (A^T A) ⊗ (B^T B).

By Lemma 6.3.1, the singular values of C are thus given by

    { √(λ_r(A^T A) · λ_s(B^T B)) : r = 1, 2, …, m and s = 1, 2, …, n }.

However, since √(λ_r(A^T A)) = σ_r(A) and √(λ_s(B^T B)) = σ_s(B), the elements in this set are in fact the products of the singular values of A and B. □
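One easily checkable consequence: since the squared Frobenius norm is the sum of the squared singular values, the corollary implies ‖A ⊗ B‖₂ = ‖A‖₂ · ‖B‖₂. A quick numerical sanity check with random illustrative matrices:

```python
import math
import random

def kron(A, B):
    """Kronecker product of two square matrices (lists of lists)."""
    n = len(B)
    size = len(A) * n
    return [[A[i // n][j // n] * B[i % n][j % n] for j in range(size)]
            for i in range(size)]

def frob(A):
    """Frobenius norm of a real matrix."""
    return math.sqrt(sum(x * x for row in A for x in row))

random.seed(1)
A = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
B = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]

# sums of squared singular values multiply, so Frobenius norms multiply:
assert abs(frob(kron(A, B)) - frob(A) * frob(B)) < 1e-12
```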

Corollary 6.3.3 Let M be the composite matrix M₁ ⊗ M₂ ⊗ ⋯ ⊗ M_s where M_j is doubly stochastic for all j = 1, 2, …, s. Then the second highest singular value of M is max{σ₂(M₁), σ₂(M₂), …, σ₂(M_s)}.

Proof For s = 2 the result follows easily from Corollary 6.3.2 and the fact that the highest singular value of a doubly stochastic matrix is 1. The property generalizes to s > 2 by induction. □

Before presenting the next proposition, we need a definition.

Definition 6.3.4 Define by the extended correlation matrix C̃ corresponding to the correlation matrix C the matrix

          [ 1  0 ⋯ 0 ]
    C̃  =  [ 0         ]
          [ ⋮    C    ]
          [ 0         ],

i.e., the cases a = 0 and b = 0 are included in the matrix, so to speak (cf. Definition 4.1.7).

Note that the extended correlation matrix is also doubly stochastic.
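Both the bordered shape and the double stochasticity are easy to verify numerically; the sketch below (with an illustrative 2×2 matrix) also checks the two fixed vectors (1, 0, …, 0) and (1, 1, …, 1) of C̃ᵀC̃, which will matter later in the chapter:

```python
def extend(C):
    """Border C with a 1 in the upper-left corner (Definition 6.3.4)."""
    n = len(C)
    return [[1.0] + [0.0] * n] + [[0.0] + row[:] for row in C]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

C = [[0.7, 0.3], [0.3, 0.7]]        # a small doubly stochastic matrix
Ct = extend(C)

# C~ is doubly stochastic: all row sums and column sums equal 1.
assert all(abs(sum(row) - 1.0) < 1e-12 for row in Ct)
assert all(abs(sum(Ct[i][j] for i in range(3)) - 1.0) < 1e-12 for j in range(3))

# (1,0,...,0) and (1,...,1) are fixed by the Gram matrix C~^T C~, which is
# why an extended correlation matrix has at least two unit singular values.
G = [[sum(Ct[k][i] * Ct[k][j] for k in range(3)) for j in range(3)]
     for i in range(3)]
for v in ([1.0, 0.0, 0.0], [1.0, 1.0, 1.0]):
    assert all(abs(w - x) < 1e-12 for w, x in zip(matvec(G, v), v))
```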

6.3. BOUNDS RELATED TO COMPOSITE MATRICES X1

61

X2

X8 (G, +)

J

[1]

K

Q

[2]

Q

[3]

[4]

Q

[5]

Q

Q

[6]

Q

[7]

Q

[8]

Q

Linear transform

T Y1

QK

Y2

Y8

Figure 6.1: A composite cipher with a linear transform.

Theorem 6.3.5 Consider a cipher with input x = (x₁, x₂, …, x_w) where each component x_j belongs to some common set G, and a round function described by the following procedure: first a key is added to each component (over some Abelian group (G, +)), then a keyed permutation is applied to each component of the result, and finally all resulting components are combined by an invertible linear transform (cf. Figure 6.1). More formally, let the round function be given by

    R_{j,k}(x) = T · Q_k(x₁ + j₁, x₂ + j₂, …, x_w + j_w),

where the function Q_k : G^w → G^w is defined for k = (k₁, k₂, …, k_w) by

    Q_k(x) = (Q^[1]_{k₁}(x₁), Q^[2]_{k₂}(x₂), …, Q^[w]_{k_w}(x_w)),

where Q^[s]_{k_s} : G → G is a permutation and T is a w × w matrix which is invertible over G. Let C̃ be the extended correlation matrix of R, and let C̃^[s] = [c̃^[s]_ab] be the extended correlation matrix of Q^[s]. Then

(i) C̃ = (C̃^[1] ⊗ C̃^[2] ⊗ ⋯ ⊗ C̃^[w]) P, where P = [p_ab] with p_ab = δ_{a,bT}, and

(ii) σ₂(C̃) = max{σ₂(C̃^[1]), σ₂(C̃^[2]), …, σ₂(C̃^[w])}.

Proof Let Ḡ = G^w, let χ^a denote the character over (Ḡ, +) corresponding to a ∈ Ḡ, and let χ^{a_s} denote the character over (G, +) corresponding to a_s ∈ G. Then by the definition of a correlation matrix,

    c̃_ab = (1/(|J| · |K|)) Σ_{j∈J, k∈K} | (1/n) Σ_{x∈Ḡ} χ^a(x) · χ^{−b}(T · Q_k(x + j)) |²,


where J = Ḡ and K are the respective key spaces. Substitution of x + j by x yields

    (1/(|J| · |K|)) Σ_{j∈J, k∈K} | (1/n) Σ_{x∈Ḡ} χ^a(−j) · χ^a(x) · χ^{−b}(T · Q_k(x)) |²
    = (1/|K|) Σ_{k∈K} | (1/n) Σ_{x∈Ḡ} χ^a(x) · χ^{−b}(T · Q_k(x)) |².

Now define the matrix D̃ = [d̃_ab] by

    d̃_ab = (1/|K|) Σ_{k∈K} | (1/n) Σ_{x∈Ḡ} χ^a(x) · χ^{−b}(Q_k(x)) |²    (6.3)

for all a, b ∈ Ḡ and note that

    c̃_ab = d̃_{a,bT}.

Since T is non-singular, the function f(b) = b · T is a permutation, and consequently D̃ is simply a column permutation of C̃. Thus we may write

    C̃ = D̃P    (6.4)

where P = [p_ab] is the permutation matrix defined by p_ab = δ_{a,bT}. By (6.3) and the definition of a character, we have

    d̃_ab = (1/|K|) Σ_{k∈K} | (1/n) Σ_{x∈Ḡ} (Π_{s=1}^w χ^{a_s}(x_s)) · (Π_{s=1}^w χ^{−b_s}(Q^[s]_{k_s}(x_s))) |²
         = Π_{s=1}^w (1/|K_s|) Σ_{k_s∈K_s} | (1/|G|) Σ_{x_s∈G} χ^{a_s}(x_s) · χ^{−b_s}(Q^[s]_{k_s}(x_s)) |²

(writing K = K₁ × ⋯ × K_w and n = |G|^w), which equals

    Π_{s=1}^w I(L_{a_s,b_s}(X_s, Q^[s]_{K_s}(X_s))),

where L_{a_s,b_s} denotes the indicated linear I/O sum over G. This, in turn, equals Π_{s=1}^w c̃^[s]_{a_s,b_s} and hence, due to the definition of the Kronecker product,

    D̃ = C̃^[1] ⊗ C̃^[2] ⊗ ⋯ ⊗ C̃^[w],

where C̃^[s] is the extended correlation matrix of the function Q^[s]. By (6.4),

    C̃ = (C̃^[1] ⊗ C̃^[2] ⊗ ⋯ ⊗ C̃^[w]) P,

proving (i).

To prove (ii), notice that the matrix C̃^T C̃ = P^T D̃^T D̃P is similar to D̃^T D̃ since permutation matrices are unitary and real-valued. Thus C̃ and D̃ have the same singular values because similar matrices have the same eigenvalues. By Corollary 6.3.3 the second highest singular value of D̃ is max{σ₂(C̃^[1]), σ₂(C̃^[2]), …, σ₂(C̃^[w])}. This concludes the proof. □
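The similarity step used for (ii) can be checked directly: if C = DP for a permutation matrix P, then CᵀC = Pᵀ(DᵀD)P, so the two Gram matrices are similar and C and D share singular values. A small sketch with illustrative matrices:

```python
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

random.seed(7)
D = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
P = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]      # a permutation matrix
C = matmul(D, P)                           # column-permuted D

lhs = matmul(transpose(C), C)              # C^T C
rhs = matmul(transpose(P), matmul(matmul(transpose(D), D), P))
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-12
           for i in range(3) for j in range(3))
```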


Unfortunately, (ii) of Theorem 6.3.5 is trivial, since the value of the right-hand side is 1. This is because extended correlation matrices have at least two singular values which are 1. This fact is proved by noting that the matrix C̃^T C̃ has the form of an extended correlation matrix, and therefore both (1, 0, …, 0) and (1, 1, …, 1) are eigenvectors corresponding to an eigenvalue of 1, implying that C̃ has two singular values with the value 1. In spite of (ii) not being useful as such, we have included it anyway, since the idea behind it might prove, at a later time, to be useful for obtaining knowledge about powers of the round correlation matrix of a structured cipher.

In the following example, it is shown how to apply (i) of Theorem 6.3.5 to a modification of a real cipher.

Example 6.3.6 We consider a slight modification of the cipher SAFER K-64 designed by Massey [29] for Cylink. See Figure 6.2 for a schematic of one round of the cipher.


Figure 6.2: One round of the block cipher SAFER.

The EXP and LOG boxes are nonlinear permutations over ℤ₂₅₆, and the PHT box represents a linear (over (ℤ₂₅₆, +)), invertible transform called the Pseudo-Hadamard Transform; ⊕ denotes bitwise addition modulo 2 over ℤ₂⁸ (XOR), and + denotes addition modulo 256 over ℤ₂₅₆ (ADD). To bring SAFER K-64 into a form where Theorem 6.3.5 applies, we introduce some minor changes in the first and third layers by


exchanging some of the ADD and XOR operations, such that the first layer in our version consists of ADD operations only and the third layer consists of XOR operations only. The cipher is now on the form of Figure 6.1, since the first layer represents 8 parallel, identical group operations over (ℤ₂₅₆, +), the function Q^[j]_{k_j} is given by either Q^[j]_{k_j}(x_j) = EXP(x_j) ⊕ k_j or Q^[j]_{k_j}(x_j) = LOG(x_j) ⊕ k_j in accordance with the value of j, and the linear transform T is represented by the PHT layers. Since it is feasible to find the correlation matrices of the functions Q^[j], j = 1, 2, …, 8, by direct computation, it is possible with Theorem 6.3.5 to obtain a representation of the correlation matrix of SAFER in a more structured form. Recently, certain weaknesses in SAFER have been pointed out by Knudsen [21, 22].
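For concreteness, the SAFER building blocks are commonly described as EXP(i) = 45^i mod 257 (with the value 256 encoded as the byte 0), LOG as its inverse, and the 2-point PHT as (a, b) ↦ (2a + b, a + b) mod 256. The sketch below follows that common description (an assumption here, not restated in the thesis text above) and checks that EXP is a permutation and that the 2-PHT is invertible over ℤ₂₅₆:

```python
# EXP/LOG as commonly described for SAFER: 45 generates the multiplicative
# group of GF(257); the group element 256 is encoded as the byte 0.
EXP = [pow(45, i, 257) % 256 for i in range(256)]
LOG = [0] * 256
for i, v in enumerate(EXP):
    LOG[v] = i

def pht2(a, b):
    """2-point Pseudo-Hadamard Transform over Z_256."""
    return ((2 * a + b) % 256, (a + b) % 256)

def pht2_inv(u, v):
    # the matrix [[2, 1], [1, 1]] has determinant 1, hence is invertible
    return ((u - v) % 256, (2 * v - u) % 256)

assert sorted(EXP) == list(range(256))            # EXP is a permutation
assert all(LOG[EXP[i]] == i for i in range(256))  # LOG inverts EXP
assert pht2_inv(*pht2(7, 200)) == (7, 200)        # 2-PHT is invertible
```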

6.4 Schur Stochastic Decomposition

While investigating the properties of the correlation matrix, we stumbled upon some structure which has connections to Schur stochastic matrices. While the results in this section are not immediately useful, we have included them since they might provide a foundation for further research into this area.

Consider the correlation matrix C_R of an unkeyed permutation R : G → G. Such matrices appear when studying ciphers in which the key is introduced by group operations only, e.g., when the round function is given by R_k(x) = R(x + k) (since the corresponding correlation matrix C_{R_k} equals C_R). As we shall see, all correlation matrices are expressible as sums of such "unkeyed" correlation matrices.

Proposition 6.4.1 The correlation matrix C of an unkeyed permutation R : G → G may be written as

    C = (FPF*) ∘ conj(FPF*),

where ∘ denotes the Hadamard product, F = [f_ab] is the (n − 1) × n truncated Fourier matrix defined by f_ab = (1/√n) · χ^{−a}(b) for a ∈ G∖{0}, b ∈ G, and P = [p_ab] is the permutation matrix defined by p_ab = δ_{a,φ(b)}. Since the matrix FPF* is unitary, one may also write

    C = (FPF*) ∘ ((FPF*)^{−1})^T.

Proof Let D = [d_ab] = (FPF*) ∘ conj(FPF*). Then by inserting the definitions of P and F, the following is obtained:

    d_ab = | (FPF*)_ab |²
         = | Σ_x F_ax · (PF*)_xb |²
         = | Σ_x F_{−a,x} · F_{φ(x),b} |²
         = | (1/n) Σ_x χ^a(x) · χ^{−b}(φ(x)) |²
         = I(L_ab(X, Y)),


where Y = φ(X). This equals c_ab by the definition of a correlation matrix. That FPF* is unitary follows from the fact that

    (FPF*)(FPF*)* = FPF*FP^T F* = I.    □

Note that F, FP and PF are Vandermonde matrices. Matrices of the form A ∘ (A^{−1})^T are widely used in a certain approach to designing chemical engineering plants. Here A is called the gain matrix and A ∘ (A^{−1})^T is called the relative gain array (see [32]). For a mathematical treatment, see [16] and [18].
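For a 2 × 2 gain matrix the relative gain array can be written down directly, and its characteristic property, that every row and column sums to 1, is easy to check (illustrative values):

```python
def rga2(A):
    """Relative gain array A o (A^{-1})^T of a 2x2 matrix A."""
    (a, b), (c, d) = A
    det = a * d - b * c
    inv_T = [[d / det, -c / det], [-b / det, a / det]]   # (A^{-1})^T
    return [[A[i][j] * inv_T[i][j] for j in range(2)] for i in range(2)]

R = rga2([[3.0, 1.0], [2.0, 5.0]])
assert all(abs(sum(row) - 1.0) < 1e-12 for row in R)                 # rows
assert all(abs(R[0][j] + R[1][j] - 1.0) < 1e-12 for j in range(2))   # columns
```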

Corollary 6.4.2 Let C be the correlation matrix of an unkeyed function over some Abelian group. Then C is Schur stochastic, i.e., C = U ∘ conj(U) where U is unitary.

Proof Follows directly from Proposition 6.4.1 and the definition of a Schur stochastic matrix. □

The following proposition presents a decomposition of any correlation matrix into Schur stochastic matrices.

Proposition 6.4.3 The correlation matrix C of a keyed permutation e_k : G → G with k ∈ K is expressible as a sum of |K| Schur stochastic matrices. More formally,

    C = (1/|K|) Σ_{k∈K} C^[k],

where C^[k] = U^[k] ∘ conj(U^[k]) with U^[k] = [u^[k]_ab] = FP^[k]F* unitary for all k ∈ K and P^[k] = [δ_{a,e_k(b)}].

Proof Let C = [c_ab] be the correlation matrix of e_k. If we fix k, the function e_k can be thought of as an unkeyed permutation, and thus by the definition of the correlation matrix we get

    c_ab = I(L_ab(X, e_K(X)))
         = (1/|K|) Σ_{k∈K} I(L_ab(X, e_K(X)) | K = k)
         = (1/|K|) Σ_{k∈K} c^[k]_ab,

where C^[k] = [c^[k]_ab] is the correlation matrix of the "unkeyed" permutation e_k (given the fixed k). According to Proposition 6.4.1, C^[k] is Schur stochastic and furthermore C^[k] = (FP^[k]F*) ∘ conj(FP^[k]F*). □

In the following, the sum in Proposition 6.4.3 will be called the Schur decomposition of C. Since ordinary matrix addition is subadditive with respect to every matrix norm, we have


Proposition 6.4.4 Given a permutation e_k : G → G, let C be the corresponding correlation matrix. Then the second highest singular value of C is bounded from above by

    σ₂(C) ≤ −1 + (1/|K|) Σ_{k∈K} 2 · σ₁(U^[k])²                    (6.5)
          ≤ −1 + (1/|K|) Σ_{k∈K} 2 · [σ₁(F)² · σ₁(P^[k])]²        (6.6)

with U^[k], P^[k], and F defined as in Proposition 6.4.3.

Proof Let

    C = (1/|K|) Σ_{k∈K} C^[k]

be the Schur decomposition of Proposition 6.4.3. As mentioned in Chapter 2, the Hadamard product ∘ is submultiplicative with respect to the spectral norm. Furthermore, ordinary matrix addition is subadditive with respect to every matrix norm. Thus,

    σ₂(C) = σ₂((1/|K|) Σ_{k∈K} C^[k])
          ≤ (1/|K|) Σ_{k∈K} σ₂(C^[k])
          = (1/|K|) Σ_{k∈K} [σ₁(C^[k]) + σ₂(C^[k]) − 1]    (since σ₁(C^[k]) = 1)
          = −1 + (1/|K|) Σ_{k∈K} [σ₁(U^[k] ∘ conj(U^[k])) + σ₂(U^[k] ∘ conj(U^[k]))].

By the submultiplicativity of the Hadamard product, this expression is bounded from above by

    −1 + (1/|K|) Σ_{k∈K} [σ₁(U^[k]) · σ₁(conj(U^[k])) + σ₂(U^[k]) · σ₂(conj(U^[k]))]
    = −1 + (1/|K|) Σ_{k∈K} [σ₁(U^[k])² + σ₂(U^[k])²]
    ≤ −1 + (1/|K|) Σ_{k∈K} 2 · σ₁(U^[k])²,

proving (6.5). Further deductions yield

    −1 + (1/|K|) Σ_{k∈K} 2 · σ₁(U^[k])²
    = −1 + (1/|K|) Σ_{k∈K} 2 · [σ₁(FP^[k]F*)]²
    ≤ −1 + (1/|K|) Σ_{k∈K} 2 · [σ₁(F) · σ₁(P^[k]) · σ₁(F*)]²
    = −1 + (1/|K|) Σ_{k∈K} 2 · [σ₁(F)² · σ₁(P^[k])]². □

Unfortunately, the bound is trivial, since the right-hand side of (6.5) equals 1. This is due to the fact that all singular values of a unitary matrix U equal 1, since U*U = I (and the matrices U^[k] are unitary). The proposition is included, however, since the techniques used in the proof might be possible to expand upon.
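To close the section with a concrete computation: for a toy keyed permutation over ℤ₈ (the permutation S below is an arbitrary illustrative choice, not from the thesis), the correlation matrix is exactly the average over keys of the "unkeyed" terms |(1/n) Σ_x χ^a(x) · conj(χ^b(e_k(x)))|² appearing in the Schur decomposition, and it comes out doubly stochastic as the theory predicts:

```python
import cmath

n = 8
omega = cmath.exp(2j * cmath.pi / n)

def chi(a, x):
    """Character chi^a(x) = omega^(a*x) of (Z_n, +)."""
    return omega ** (a * x)

S = [3, 6, 0, 5, 7, 1, 4, 2]            # an arbitrary permutation of Z_8
def e(k, x):                            # round function R_k(x) = R(x + k)
    return S[(x + k) % n]

def corr(a, b):
    """c_ab = (1/|K|) sum_k |(1/n) sum_x chi^a(x) conj(chi^b(e_k(x)))|^2."""
    total = 0.0
    for k in range(n):                  # average of the unkeyed c^[k]_ab
        s = sum(chi(a, x) * chi(b, e(k, x)).conjugate() for x in range(n))
        total += abs(s / n) ** 2
    return total / n

C = [[corr(a, b) for b in range(1, n)] for a in range(1, n)]
# every row and every column of C sums to 1: C is doubly stochastic
```

(Here, since e_k(x) = S[x + k], all the unkeyed matrices C^[k] coincide with C_S, illustrating the remark at the start of the section.)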

6.5 Construction of Secure Round Functions

Until now we have dealt exclusively with evaluating the security of a given cipher. Another interesting problem is how to construct ciphers which are secure against CC. One approach would be to use a sufficient number of rounds; as the preceding results in this chapter have demonstrated, this results in ciphers that are secure against the statistical attack. Another approach, however, is choosing a round function with a good correlation matrix C (in the proper sense). Round functions with correlation matrices whose elements are all close to 1/(n − 1) have high correlation immunity, since the corresponding value of the Frobenius norm ‖C − E‖₂ is close to 0. This is where the so-called bent functions come in handy. A bent function [9, 23, 41, 42] is a function which has a perfectly flat Fourier power spectrum. Unfortunately, bent functions which are at the same time permutations do not exist, but "almost" bent permutations exist which have an "almost" flat Fourier power spectrum. The links between (binary-valued) bent functions and cryptography have already been thoroughly studied [33, 38, 39] due to these functions' nonlinear properties. It has been shown that using almost bent functions as round functions gives immunity against LC and DC [40]. The same holds true for CC, and this fact is easily proven by considering Proposition 4.1.10 and the definition of a bent function.
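The flat-spectrum property is easy to exhibit on a classical example: the Boolean function f(x₁, …, x₄) = x₁x₂ ⊕ x₃x₄ is bent, so every Walsh coefficient has absolute value 2^(n/2) = 4 (the choice of function is illustrative):

```python
def f(x):
    """f = x1*x2 XOR x3*x4 on 4 input bits packed into an integer."""
    b = [(x >> i) & 1 for i in range(4)]
    return (b[0] & b[1]) ^ (b[2] & b[3])

def dot(a, x):
    """Inner product a.x over GF(2)."""
    return bin(a & x).count("1") & 1

# Walsh spectrum W_f(a) = sum_x (-1)^(f(x) + a.x)
W = [sum((-1) ** (f(x) ^ dot(a, x)) for x in range(16)) for a in range(16)]
assert all(abs(w) == 4 for w in W)   # perfectly flat: f is bent
```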


Chapter 7

Links to Other Statistical Attacks

In this chapter we will explore the connection between CC and the statistical attacks DC, GLC, and PC. There are strong links "downwards" to all three attacks in the sense that CC is the most general approach.

7.1 Differential Cryptanalysis

Differential cryptanalysis [4, 5] can be thought of as a statistical attack with descriptor given by

    T = { s | s : G² → G; Δ₁, Δ₂ ∈ G∖{0}; s(x, y) = e(x + Δ₁) − y − Δ₂ for all x, y }

and

    L(S) = |P[S = 0]| for all S,

where Δ₁ and Δ₂ denote the input and output difference, respectively, and e : G → G is the encryption function. If S = s(X, Y) where s ∈ T and X and Y are cipher input and output respectively, then the value of the likelihood estimator L(S) is simply a measure of how often the output difference is Δ₂ given that the input difference is Δ₁.

The following is the definition of the DC counterpart to the correlation matrix.

Definition 7.1.1 Matrix of Differential Transition Probabilities. Given a cipher e_k : G → G, the matrix of differential transition probabilities D = [d_{Δ₁Δ₂}] is defined to be the (n − 1) × (n − 1) matrix given by

    d_{Δ₁Δ₂} = P[e_K(X + Δ₁) − e_K(X) − Δ₂ = 0]

for Δ₁, Δ₂ ∈ G∖{0}. In other words, the elements are simply the probabilities of transition from a certain input difference to a certain output difference. It is easily shown that this matrix is doubly stochastic, too.

There is a more intimate connection between the correlation matrix C of a cipher and the matrix of differential transition probabilities D. Before it is revealed, we need a definition. The definition of the extended matrix D̃ is similar to the definition of the extended correlation matrix (cf. Definition 6.3.4).
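A toy computation shows the doubly stochastic property directly (the group ℤ₈ and the permutation S are illustrative choices, not from the thesis):

```python
n = 8
S = [3, 6, 0, 5, 7, 1, 4, 2]            # an arbitrary permutation of Z_8
def e(k, x):                            # keyed toy cipher e_k(x) = S[x + k]
    return S[(x + k) % n]

def dprob(d1, d2):
    """P[e_K(X + d1) - e_K(X) - d2 = 0] with X and K uniform on Z_n."""
    hits = sum(1 for k in range(n) for x in range(n)
               if (e(k, (x + d1) % n) - e(k, x)) % n == d2)
    return hits / (n * n)

D = [[dprob(d1, d2) for d2 in range(1, n)] for d1 in range(1, n)]
# rows and columns of D each sum to 1: the matrix of differential
# transition probabilities is doubly stochastic
```

The row sums are 1 because a nonzero input difference through a permutation always yields some nonzero output difference, and symmetrically for the columns.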


Definition 7.1.2 Define by the extended matrix D̃ corresponding to the matrix of differential transition probabilities D the matrix

          [ 1  0 ⋯ 0 ]
    D̃  =  [ 0         ]
          [ ⋮    D    ]
          [ 0         ].

Theorem 7.1.3 Given a cipher, let C̃ denote the extended correlation matrix and let D̃ denote the extended matrix of differential transition probabilities. Then C̃ is the two-dimensional Fourier transform of D̃ and vice versa. More precisely,

    C̃ = F D̃ F*   and   D̃ = F* C̃ F,

where F = [f_ab] is defined by f_ab = (1/√n) · χ^{−a}(b) for a, b ∈ G.

Proof Let F_{Δ₁}{h(Δ₁, Δ₂)} denote the Fourier transform of the function h : G² → C with respect to the variable Δ₁. Similarly, let F⁻¹_{Δ₂}{h(Δ₁, Δ₂)} denote the inverse Fourier transform of the function h : G² → C with respect to the variable Δ₂. We have to show that

    c̃_ab = F_{Δ₁}{ F⁻¹_{Δ₂}{ d̃_{Δ₁Δ₂} }(b) }(a).

Starting with the right-hand side and letting K denote the key space, we obtain

    F_{Δ₁}{ F⁻¹_{Δ₂}{ P[e_K(X + Δ₁) − e_K(X) − Δ₂ = 0] }(b) }(a)
    = F_{Δ₁}{ F⁻¹_{Δ₂}{ (1/(n · |K|)) · |{(x, k) : e_k(x) − e_k(x − Δ₁) = Δ₂}| }(b) }(a)
    = F_{Δ₁}{ F⁻¹_{Δ₂}{ (1/(n · |K|)) Σ_{k∈K} Σ_{x∈G} δ_{Δ₂}(e_k(x) − e_k(x − Δ₁)) }(b) }(a).

Applying the inverse Fourier transform yields

    F_{Δ₁}{ (1/(n · |K|)) Σ_{k∈K} Σ_{x∈G} χ^b(e_k(x) − e_k(x − Δ₁)) }(a)
    = F_{Δ₁}{ (1/(n · |K|)) Σ_{k∈K} Σ_{x∈G} χ^b(e_k(x)) · χ^{−b}(e_k(x − Δ₁)) }(a)