The Algebra of Linear Algebra

1 Vector spaces


The formal definition of a vector space requires a set of objects (the vectors in the space $V$) together with a field of numbers $F$ (generally we will take this to be the field of real numbers, but in some instances we will find it useful to work over the field of complex numbers), and two operations.

1. The first operation is scalar multiplication, which lets the underlying number field operate on the elements of $V$, together with the requirement that $V$ is closed under scalar multiplication: if $v \in V$ and $c \in F$, then we require that $cv \in V$ for all such $v$ and $c$.

2. The second operation is vector addition, and we require that $V$ be closed under this operation as well: if $v_1$ and $v_2$ are elements of $V$, we require that $v_1 + v_2 \in V$.

3. You are already familiar with the concept of vector spaces in the form of finite-dimensional Euclidean vector spaces, and most of what we do in these lectures will involve these spaces (and, indeed, will also assume that the dot product operation is well defined, so that we can define the matrix operations associated with linear transformations). However, it will also be useful to consider some less common instances of vector spaces, so it will be useful to examine these before we apply them.

(a) The vector space $\mathcal{F}$ of real-valued functions $f : [0,1] \to \mathbb{R}$. The collection of all real-valued functions on the unit interval is a vector space: if we take any scalar $c \in \mathbb{R}$ and any function $f : [0,1] \to \mathbb{R}$, then $cf : [0,1] \to \mathbb{R}$, so this space is closed under scalar multiplication; and if $f, g$ are real-valued functions on $[0,1]$, then $f + g$ (with addition defined pointwise: $(f+g)(t) = f(t) + g(t)$) is again a real-valued function on $[0,1]$, so the space is closed under vector addition. Hence, $\mathcal{F}$ is a vector space. $\mathcal{F}$ is obviously a very big vector space, since there are infinitely many real numbers between zero and one; nevertheless, we can think of elements of $\mathcal{F}$ as vectors, each component of which is simply the value of some function at a point $t \in [0,1]$.

(b) The vector space $P_n$ of polynomial functions from $\mathbb{R} \to \mathbb{R}$ of degree less than or equal to $n < \infty$. That scalar multiplication and vector addition hold for this space is obvious. Note that for this space we can actually choose a basis consisting of the polynomials $B = \{1, x, x^2, \ldots, x^n\}$. This is obviously a basis, since every polynomial of degree less than or equal to $n$ can be written as a linear combination of these vectors.


For another example of a basis for this space, which we will make use of later, consider a collection of $n+1$ distinct scalars $\{c_0, \ldots, c_n\}$ and the polynomials (known as Lagrange polynomials)
$$p_j(x) = \prod_{i \neq j} \frac{x - c_i}{c_j - c_i} = \frac{x - c_0}{c_j - c_0} \cdots \frac{x - c_{j-1}}{c_j - c_{j-1}} \cdot \frac{x - c_{j+1}}{c_j - c_{j+1}} \cdots \frac{x - c_n}{c_j - c_n}.$$

Let us show that these polynomials are a linearly independent collection. Suppose not. Then there exist $a_i$, $i = 0, \ldots, n$, not all zero, such that
$$\sum_{i=0}^{n} a_i p_i(x) = 0.$$
Since this must hold pointwise for any value of $x$, we can check the condition at $x = c_i$, $i = 0, \ldots, n$. At $c_i$, however, the Lagrange polynomials have the property that
$$p_j(c_i) = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases} = \delta_{ij},$$
where $\delta_{ij}$ is the so-called Kronecker delta, which is zero if $i \neq j$ and one otherwise. Hence,
$$0 = \sum_{i=0}^{n} a_i p_i(c_j) = a_j \quad \Longrightarrow \quad a_j = 0.$$
Since this holds for all $j$, the Lagrange polynomials are linearly independent; being $n+1$ independent vectors in the $(n+1)$-dimensional space $P_n$, they therefore span the space and form a basis. Since the Lagrange polynomials are a basis, any polynomial $f$ of degree less than or equal to $n$ can be written as a linear combination of them:
$$f(x) = \sum_{i=0}^{n} \alpha_i p_i(x).$$

If we evaluate this expression at $x = c_j$ for one of the distinct scalars used in defining the Lagrange polynomials, we find that
$$f(c_j) = \sum_{i=0}^{n} \alpha_i p_i(c_j) = \alpha_j,$$
so that for any polynomial $f$,
$$f(x) = \sum_{i=0}^{n} f(c_i) p_i(x).$$
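This interpolation identity is easy to check numerically. A minimal sketch (the nodes $c_i$ and the cubic $f$ below are arbitrary illustrative choices, not taken from the text):

```python
import numpy as np

def lagrange_basis(nodes, j, x):
    """Evaluate the j-th Lagrange polynomial p_j(x) for the given nodes."""
    return np.prod([(x - c) / (nodes[j] - c)
                    for i, c in enumerate(nodes) if i != j])

nodes = [0.0, 1.0, 2.0, 3.0]              # n + 1 = 4 distinct scalars c_0..c_3
f = lambda x: 2 * x**3 - x + 5            # any polynomial of degree <= 3

# p_j(c_i) is the Kronecker delta
assert abs(lagrange_basis(nodes, 2, nodes[2]) - 1.0) < 1e-12
assert abs(lagrange_basis(nodes, 2, nodes[0])) < 1e-12

# f(x) = sum_i f(c_i) p_i(x), checked at an arbitrary point
x = 1.7
reconstructed = sum(f(c) * lagrange_basis(nodes, j, x)
                    for j, c in enumerate(nodes))
assert abs(reconstructed - f(x)) < 1e-9
```

Because four nodes determine any cubic exactly, the reconstruction agrees with $f$ everywhere, not just at the nodes.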

(c) We can usefully expand the idea of linear independence among polynomials by introducing the concept of an ideal in the space of polynomials having real-valued coefficients, which we denote by $P$ (which allows for polynomials of arbitrary finite degree).

i. Definition. An ideal in the space $P$ is a subspace $M$ of $P$ such that $fg$ belongs to $M$ whenever $f \in P$ and $g \in M$.

ii. For an example of an ideal, let $d_1, \ldots, d_n$ be a finite number of polynomials in $P$. Then the sum of the subspaces $d_i P$ is a subspace $M \subseteq P$, and $M$ is also an ideal. To see this, suppose $p \in M$. Then there exist polynomials $f_1, \ldots, f_n$ such that $p = f_1 d_1 + \cdots + f_n d_n$. If $g$ is an arbitrary polynomial in $P$, then
$$pg = (f_1 g) d_1 + \cdots + (f_n g) d_n,$$
so $pg \in M$.

2 Linear operators and matrices

1. Every $n \times n$ matrix represents a so-called linear operator relative to some specified basis. To see this, consider a matrix $M$, and a similarity transformation induced by a non-singular matrix $H$. Let $B = H^{-1} M H$ and examine the characteristic function
$$\chi(\lambda) = |B - \lambda I| = |H^{-1} M H - \lambda I| = \left|H^{-1} [M - \lambda I] H\right| = |H^{-1}| \, |M - \lambda I| \, |H| = |M - \lambda I|.$$

It follows that the matrices $M$ and $B$ have the same characteristic function and hence the same eigenvalues (the corresponding eigenvectors are related by the change of basis: if $Mx = \lambda x$, then $B(H^{-1}x) = \lambda (H^{-1}x)$). Since the eigenvalues and eigenvectors determine how the linear transformation given by $M$ or $B$ operates on an arbitrary vector $x \in \mathbb{R}^n$, we can define the linear transformation (or operator) without reference to any underlying basis vectors, and simply refer to the transformation $T$.

2. Polynomials in $T$.

(a) Given an arbitrary polynomial $p(x) = c_1 + c_2 x + c_3 x^2 + \cdots + c_k x^k$, we define a polynomial in the operator $T$ as the linear operator given by
$$p(T) = c_1 I + c_2 T + c_3 T^2 + \cdots + c_k T^k,$$
where $I$ is the identity operator which leaves the vector $x \in \mathbb{R}^n$ unchanged, and for any $j$, $T^j$ is the linear operator given by acting on $x$ with $T$ for $j$ iterations. Note that this definition is independent of any specification of basis, since if the matrix $M$ represents $T$ in some basis and $B = H^{-1} M H$, then
$$p(M) = c_1 I + c_2 M + \cdots + c_k M^k = c_1 I + c_2 H B H^{-1} + \cdots + c_k H B^k H^{-1} = H \left[c_1 I + c_2 B + \cdots + c_k B^k\right] H^{-1};$$
thus $p(M)$ and $p(B)$ are related by a similarity transformation and thus represent the same operator.

(b) Annihilating polynomials: Every linear operator $T$ is annihilated by some polynomial in powers of $T$. Proof: Suppose the matrix $M$ represents $T$ and consider the polynomial
$$q(M) = c_0 I + c_1 M + \cdots + c_{n^2} M^{n^2}.$$
Since every $n \times n$ matrix can also be represented as a vector of dimension $n^2$ (equivalently, the space of $n \times n$ matrices is equivalent to $\mathbb{R}^{n^2}$), the polynomial $q$ can be viewed as a linear combination of $n^2 + 1$ vectors in $\mathbb{R}^{n^2}$. This collection of vectors must be linearly dependent (since the number of vectors exceeds the dimension of the space in which they lie), so there exists $q$ such that $q(M) = 0$ with not every coefficient $c_i = 0$ in $q(M)$. This establishes the result.
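The basis independence of a polynomial in an operator can be checked numerically. A minimal sketch, with a hypothetical matrix $M$, change of basis $H$, and polynomial $p$:

```python
import numpy as np

M = np.array([[2., 1.], [0., 3.]])     # a sample matrix representing T
H = np.array([[1., 2.], [1., 3.]])     # any non-singular change of basis
Hinv = np.linalg.inv(H)
B = Hinv @ M @ H                       # T in the new basis

def p(X):
    """p(X) = I + 2X + 3X^2 applied to a square matrix X."""
    return np.eye(2) + 2 * X + 3 * (X @ X)

# p(M) = H p(B) H^{-1}: the two matrices represent the same operator
assert np.allclose(p(M), H @ p(B) @ Hinv)
```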

3 Canonical forms

1. Definition: Let $T$ be a linear operator on a vector space $V$. We say that $T$ is diagonalizable if there is a basis of $V$ each vector of which is an eigenvector of $T$.

2. Some examples.

(a) Consider the $3 \times 3$ matrix
$$A = \begin{bmatrix} 3 & 1 & -1 \\ 2 & 2 & -1 \\ 2 & 2 & 0 \end{bmatrix}.$$

The characteristic polynomial of this matrix is
$$|\lambda I - A| = \begin{vmatrix} \lambda - 3 & -1 & 1 \\ -2 & \lambda - 2 & 1 \\ -2 & -2 & \lambda \end{vmatrix} = \lambda^3 - 5\lambda^2 + 8\lambda - 4 = (\lambda - 1)(\lambda - 2)^2.$$
Hence, the eigenvalues of $A$ are 1 and 2 (with multiplicity 2). Let us find the eigenvectors of $A$.

i. For the root 1,
$$[A - I]x = \begin{bmatrix} 2 & 1 & -1 \\ 2 & 1 & -1 \\ 2 & 2 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2x_1 + x_2 - x_3 \\ 2x_1 + x_2 - x_3 \\ 2x_1 + 2x_2 - x_3 \end{bmatrix} = 0.$$
We can drop the first or second equation, since one of them is redundant. If we then subtract the third equation from the first, we find $x_2 = 0$ and hence $x_3 = 2x_1$. The subspace of $\mathbb{R}^3$ spanned by the eigenvector corresponding to the root 1 thus has dimension 1: any vector colinear with $\alpha_1 = [1, 0, 2]$ will be an eigenvector of $A$. For the second root, we have
$$[A - 2I]x = \begin{bmatrix} 1 & 1 & -1 \\ 2 & 0 & -1 \\ 2 & 2 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1 + x_2 - x_3 \\ 2x_1 - x_3 \\ 2x_1 + 2x_2 - 2x_3 \end{bmatrix} = 0.$$
In this case, the third equation is twice the first, so we drop the third. From the second equation, we have $x_3 = 2x_1$. Substituting for $x_3$ in the first equation yields
$$x_1 + x_2 - 2x_1 = x_2 - x_1 = 0,$$
so that $x_1 = x_2$. Thus, we again have a one-dimensional eigenspace corresponding to the root 2, and any vector colinear with $\alpha_2 = [1, 1, 2]$ will be an eigenvector corresponding to the root 2. Note that from our definition of when an operator is diagonalizable, the operator represented by $A$ in this example cannot be diagonalized, since the eigenvectors don't span all of $\mathbb{R}^3$.

(b) Now consider the operator represented by
$$A = \begin{bmatrix} 5 & -6 & -6 \\ -1 & 4 & 2 \\ 3 & -6 & -4 \end{bmatrix}.$$

We compute the characteristic equation of this operator using row and column operations:
$$|\lambda I - A| = \begin{vmatrix} \lambda - 5 & 6 & 6 \\ 1 & \lambda - 4 & -2 \\ -3 & 6 & \lambda + 4 \end{vmatrix}.$$
We add $-1$ times the last column to the second column to get
$$|\lambda I - A| = \begin{vmatrix} \lambda - 5 & 0 & 6 \\ 1 & \lambda - 2 & -2 \\ -3 & 2 - \lambda & \lambda + 4 \end{vmatrix}.$$
Aside on row and column operations: To see that the determinant above is unchanged by the column operation just performed, note that the column operation is equivalent to multiplying the matrix $\lambda I - A$ on the right by the matrix
$$E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}.$$
Since the determinant of a product is the product of the determinants, and the elementary operator $E$ has determinant 1, the result follows. Continuing, we can now factor $\lambda - 2$ out of the second column to get
$$|\lambda I - A| = (\lambda - 2) \begin{vmatrix} \lambda - 5 & 0 & 6 \\ 1 & 1 & -2 \\ -3 & -1 & \lambda + 4 \end{vmatrix}.$$
Now, add the second row to the third to get
$$|\lambda I - A| = (\lambda - 2) \begin{vmatrix} \lambda - 5 & 0 & 6 \\ 1 & 1 & -2 \\ -2 & 0 & \lambda + 2 \end{vmatrix}.$$
Using Laplace expansion on the second column, we now get
$$|\lambda I - A| = (\lambda - 2) \begin{vmatrix} \lambda - 5 & 6 \\ -2 & \lambda + 2 \end{vmatrix} = (\lambda - 2)\left[(\lambda - 5)(\lambda + 2) + 12\right] = (\lambda - 2)\left(\lambda^2 - 3\lambda + 2\right) = (\lambda - 2)^2 (\lambda - 1).$$

Next, we find the characteristic vectors associated with these eigenvalues. For the root 1,
$$[A - I]x = \begin{bmatrix} 4 & -6 & -6 \\ -1 & 3 & 2 \\ 3 & -6 & -5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4x_1 - 6x_2 - 6x_3 \\ -x_1 + 3x_2 + 2x_3 \\ 3x_1 - 6x_2 - 5x_3 \end{bmatrix} = 0.$$
If we add the third equation to the second, we get
$$\begin{bmatrix} 4x_1 - 6x_2 - 6x_3 \\ 2x_1 - 3x_2 - 3x_3 \\ 3x_1 - 6x_2 - 5x_3 \end{bmatrix} = 0,$$
which then implies that the first and second equations are redundant. Dropping the first equation, we then have $x_1 = 3x_2 + 2x_3$ from the second equation of the original system. Substituting for $x_1$ in the third equation yields
$$9x_2 + 6x_3 - 6x_2 - 5x_3 = 0,$$
or $3x_2 + x_3 = 0$, so that $x_3 = -3x_2$ and hence $x_1 = -3x_2$. The eigenspace corresponding to the root 1 is then one-dimensional and consists of all vectors colinear with $a_1 = [-3, 1, -3]$. For the second root, we have
$$A - 2I = \begin{bmatrix} 3 & -6 & -6 \\ -1 & 2 & 2 \\ 3 & -6 & -6 \end{bmatrix}.$$
Note that all of the columns of $A - 2I$ are colinear, so that this matrix has rank 1. This means that the subspace of vectors which annihilate $A - 2I$ satisfies the equation
$$3x_1 - 6x_2 - 6x_3 = 0,$$
or $x_1 = 2x_2 + 2x_3$, which yields a two-dimensional subspace spanned by the vectors $a_2 = [2, 1, 0]$ and $a_3 = [2, 0, 1]$. For this example, the space spanned by the characteristic vectors is three-dimensional, so by our definition of diagonalizability, the matrix $A$ can be diagonalized by a similarity transformation via the matrix
$$H = \begin{bmatrix} -3 & 2 & 2 \\ 1 & 1 & 0 \\ -3 & 0 & 1 \end{bmatrix}.$$

3. The two examples above show that when the characteristic function of an operator has repeated roots, the operator may or may not be diagonalizable, depending on the dimension of the eigenspace associated with the repeated roots. Hence, we would like to be able to characterize when an operator is diagonalizable and, if it is not, to find the simplest possible representation of the operator relative to some basis. These representations are called the canonical forms of the operator, and the rest of the lecture will focus on characterizing these forms.
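The contrast between the two examples can be confirmed numerically; here is a minimal sketch with the two matrices as read off the computations above:

```python
import numpy as np

A_defective = np.array([[3., 1., -1.],
                        [2., 2., -1.],
                        [2., 2., 0.]])        # example (a)
A_diagonalizable = np.array([[5., -6., -6.],
                             [-1., 4., 2.],
                             [3., -6., -4.]])  # example (b)

# both share the characteristic polynomial (x - 1)(x - 2)^2 = x^3 - 5x^2 + 8x - 4
assert np.allclose(np.poly(A_defective), [1., -5., 8., -4.])
assert np.allclose(np.poly(A_diagonalizable), [1., -5., 8., -4.])

# geometric multiplicity of the repeated root 2 is 3 - rank(A - 2I)
I = np.eye(3)
assert 3 - np.linalg.matrix_rank(A_defective - 2 * I) == 1       # defective
assert 3 - np.linalg.matrix_rank(A_diagonalizable - 2 * I) == 2  # full eigenspace
```

The repeated root contributes only a one-dimensional eigenspace in the first case, and a two-dimensional one in the second, which is exactly the difference between the two examples.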


(a) Definition: Let $T$ be a linear operator on a finite-dimensional vector space $V$. The minimal polynomial for $T$ is the (unique) monic polynomial having least degree which annihilates $T$. In the definition, "monic" means the coefficient corresponding to the term of the polynomial of highest degree is 1. To see that the minimal polynomial is unique, suppose there are two distinct monic polynomials $p(x)$ and $q(x)$ of the same (least) degree which annihilate $T$. Then any linear combination of the two polynomials also annihilates $T$, since
$$[ap + bq](T) = a\,p(T) + b\,q(T) = 0.$$
Consider the polynomial $p - q$. By the previous result, $[p - q](T) = 0$. But because both $p$ and $q$ are monic, $p - q$ is a non-zero polynomial of degree smaller than that of either $p$ or $q$, which (after rescaling to make it monic) contradicts the assumption that $p$ and $q$ had least degree.

(b) Theorem: Let $T$ be a linear operator on an $n$-dimensional vector space $V$. Then the characteristic and minimal polynomials for $T$ (or any matrix representing $T$) have the same roots, except for multiplicities.

Proof: Let $p$ be the minimal polynomial for $T$, and let $c$ be a scalar. We wish to show that $p(c) = 0$ if and only if $c$ is a characteristic value of $T$. Suppose first that $p(c) = 0$. Then
$$p = (x - c)\,q,$$
where $q$ is a polynomial of degree smaller than that of $p$. By definition of the minimal polynomial, we know that $q(T) \neq 0$. Choose a vector $\alpha$ such that $q(T)\alpha \neq 0$ and let $\beta = q(T)\alpha$. Then
$$0 = p(T)\alpha = (T - cI)\,q(T)\alpha = (T - cI)\beta.$$
It now follows that $\beta$ is an eigenvector of $T$ corresponding to the eigenvalue $c$. Next, suppose $c$ is an eigenvalue of $T$ with eigenvector $\alpha$, so that $T\alpha = c\alpha$. Then $p(T)\alpha = p(c)\alpha$. This follows easily from the fact that for any exponent $j$, $T^j\alpha = T^{j-1}T\alpha = c\,T^{j-1}\alpha = \cdots = c^j\alpha$. Now, since $p(T) = 0$ while $\alpha \neq 0$, it follows that $p(c) = 0$.

Now, suppose $T$ is a diagonalizable operator. Then the minimal polynomial of $T$ must be
$$p(x) = (x - c_1)(x - c_2)\cdots(x - c_k),$$

where $c_1, \ldots, c_k$ are the distinct eigenvalues of $T$. To see why this must be so, consider any eigenvector $\alpha$ of $T$. One of the matrices $T - c_i I$ maps $\alpha$ to zero. Since any polynomial in an operator commutes with the operator (which can be seen from the fact that
$$(T - c_1 I)(T - c_2 I) = T^2 - (c_1 + c_2)T + c_1 c_2 I = (T - c_2 I)(T - c_1 I)$$
and an obvious induction argument), it follows that
$$(T - c_1 I)(T - c_2 I)\cdots(T - c_k I)\,\alpha = 0$$
for every eigenvector $\alpha$. Since the collection of eigenvectors forms a basis for $\mathbb{R}^n$, the matrix $H = [a_1, \ldots, a_n]$ having eigenvectors of $T$ for its columns has full rank, while
$$(T - c_1 I)(T - c_2 I)\cdots(T - c_k I)\,H = 0.$$
It must, therefore, be that $(T - c_1 I)(T - c_2 I)\cdots(T - c_k I) = 0$. Since, by the theorem above, every eigenvalue $c_i$ is a root of the minimal polynomial, no product with fewer factors can annihilate $T$, so $p$ is the minimal polynomial for $T$.
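For the two $3 \times 3$ examples from earlier, this can be checked directly: the product of distinct linear factors annihilates the diagonalizable matrix, while the defective one also needs the repeated factor. A minimal numerical sketch:

```python
import numpy as np

I = np.eye(3)
A_diag = np.array([[5., -6., -6.], [-1., 4., 2.], [3., -6., -4.]])  # diagonalizable
A_def = np.array([[3., 1., -1.], [2., 2., -1.], [2., 2., 0.]])      # defective

# (A - I)(A - 2I) = 0 for the diagonalizable matrix: minimal polynomial (x-1)(x-2)
assert np.allclose((A_diag - I) @ (A_diag - 2 * I), 0)

# the defective matrix is not annihilated until the factor (x - 2) is squared
assert not np.allclose((A_def - I) @ (A_def - 2 * I), 0)
assert np.allclose((A_def - I) @ (A_def - 2 * I) @ (A_def - 2 * I), 0)
```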

(c) Theorem (Cayley-Hamilton). Let $T$ be a linear operator on a finite-dimensional vector space $V$. If $f$ is the characteristic polynomial of $T$, then $f(T) = 0$.

Proof: Given any square matrix $B$, we know from the theory of determinants that
$$[\operatorname{adj} B]\,B = |B|\,I,$$
where $\operatorname{adj} B$ is the classical adjoint matrix of $B$ (i.e. the transpose of the matrix of cofactors of $B$). (Note that when $B$ is non-singular, so that $|B| \neq 0$, this equation corresponds to Cramer's rule for calculating the inverse of a non-singular matrix.) Let $A$ represent the transformation $T$ and let $Q(\lambda) = \operatorname{adj}[A - \lambda I]$. Then
$$Q(\lambda)\,[A - \lambda I] = |A - \lambda I|\,I = \left(c_0 + c_1\lambda + \cdots + c_n\lambda^n\right) I,$$
where $|A - \lambda I| = c_0 + c_1\lambda + \cdots + c_n\lambda^n$ is the characteristic polynomial. Note that the adjoint matrix $Q(\lambda)$ can be written in the form
$$Q(\lambda) = Q_0 + Q_1\lambda + \cdots + Q_r\lambda^r$$
by collecting components of $Q(\lambda)$ corresponding to like powers of $\lambda$ together in the coefficient matrices $Q_i$. Hence, we have
$$\left[Q_0 + Q_1\lambda + \cdots + Q_r\lambda^r\right][A - \lambda I] = \left(c_0 + c_1\lambda + \cdots + c_n\lambda^n\right) I.$$
Evaluating this expression at $\lambda = A$ (which is legitimate here because each coefficient matrix $Q_i$ is itself a polynomial in $A$ and so commutes with $A$), we get
$$0 = c_0 I + c_1 A + \cdots + c_n A^n = f(A).$$
Since this remains true under similarity transformations, it follows that $f(T) = 0$.
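The Cayley-Hamilton theorem is easy to check numerically on the earlier $3 \times 3$ example (any square matrix would do):

```python
import numpy as np

A = np.array([[5., -6., -6.], [-1., 4., 2.], [3., -6., -4.]])

coeffs = np.poly(A)   # characteristic polynomial coefficients, highest power first
f_of_A = sum(c * np.linalg.matrix_power(A, len(coeffs) - 1 - i)
             for i, c in enumerate(coeffs))

assert np.allclose(f_of_A, 0)   # f(A) = 0
```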

In conjunction with the result above showing that the minimal and characteristic polynomials have the same roots, the Cayley-Hamilton theorem implies that the minimal polynomial divides the characteristic polynomial.

4. Direct-Sum Decompositions

(a) Invariant Subspaces. Let $W \subseteq V$ be a subspace. We say that $W$ is invariant under the action of the linear operator $T$ if $T(W) \subseteq W$. The eigenspaces of any operator are obviously invariant subspaces, since for any eigenvector $\alpha$ of $T$, $T\alpha = \lambda\alpha$ is obviously an element of the eigenspace.

(b) Definition: Let $W_1, \ldots, W_k$ be subspaces of $V$. We say that the $W_i$ are independent if
$$\alpha_1 + \cdots + \alpha_k = 0 \quad \text{for } \alpha_i \in W_i,\; i = 1, \ldots, k,$$
implies $\alpha_i = 0$ for all $i$.

(c) Definition: Let $W_1 + \cdots + W_k$ denote the set of all vectors which are generated by linear combinations of vectors in the $W_i$. If the subspaces $W_i$ are all independent, then we call the sum $W_1 + \cdots + W_k$ a direct sum and denote it by $W = W_1 \oplus \cdots \oplus W_k$. If $\alpha = \alpha_1 + \cdots + \alpha_k$ for $\alpha_i \in W_i$ and the subspaces $W_i$ are independent, then the expression for $\alpha$ as a sum of components of the $W_i$ is unique. To see this, suppose there is some other collection of vectors $\beta_i \in W_i$ such that $\alpha = \beta_1 + \cdots + \beta_k$. Then
$$0 = (\alpha_1 - \beta_1) + \cdots + (\alpha_k - \beta_k),$$
which, by independence, implies $\beta_i = \alpha_i$ for all $i$. Thus, when the subspaces are independent, we can operate with the vectors in $W$ as $k$-tuples $(\alpha_1, \ldots, \alpha_k)$, $\alpha_i \in W_i$, in the same way that we operate with vectors in $\mathbb{R}^k$ as $k$-tuples of numbers.

(d) Lemma. Let $V$ be a finite-dimensional vector space and $W = W_1 + \cdots + W_k$ for the collection of subspaces $W_i \subseteq V$. Then the following are equivalent.

i. $W_1, \ldots, W_k$ are independent.

ii. For each $j$, $2 \leq j \leq k$, we have
$$W_j \cap (W_1 + \cdots + W_{j-1}) = \{0\}.$$

iii. If $B_i$ is an ordered basis for $W_i$, $1 \leq i \leq k$, then the collection $B = (B_1, \ldots, B_k)$ is an ordered basis for $W$.

Proof: Assume first that the subspaces are independent, and consider a vector $\beta \in W_j \cap (W_1 + \cdots + W_{j-1})$. Then there are vectors $\alpha_i$ in $W_i$, $i = 1, \ldots, j-1$, such that $\beta = \alpha_1 + \cdots + \alpha_{j-1}$. This implies that
$$\alpha_1 + \cdots + \alpha_{j-1} + (-\beta) + 0 + \cdots + 0 = 0.$$

By the independence assumption, then, it must be that $\beta = \alpha_1 = \cdots = \alpha_{j-1} = 0$, so i implies ii.

To show that ii implies i, suppose $\alpha_1 + \cdots + \alpha_k = 0$ for $\alpha_i \in W_i$, $i = 1, \ldots, k$, with not all $\alpha_i = 0$. Let $j$ be the largest index for which $\alpha_j \neq 0$. Then $\alpha_j = -\alpha_1 - \cdots - \alpha_{j-1}$ is a non-zero vector in $W_j \cap (W_1 + \cdots + W_{j-1})$. Hence, if the subspaces $W_i$ are not independent, condition ii doesn't hold. By logical negation, it follows that ii implies i. We leave the proof of the equivalence of i and ii with iii to the student.

(e) Projections. If $V$ is a vector space, a projection of $V$ is a linear operator $E$ on $V$ such that $E^2 = E$. If $E$ is a projection, we let $R_E$ denote the range of $E$, and $N_E$ the null space (or nullity), which consists of all vectors annihilated by $E$.

i. A vector $\alpha$ is in $R_E$ if and only if $E\alpha = \alpha$. If $\alpha = E\beta$, then $E\alpha = E^2\beta = E\beta = \alpha$. Conversely, if $\alpha = E\alpha$, then $\alpha$ is obviously in $R_E$.

ii. $V = R_E \oplus N_E$.

iii. The unique expression for any vector $\alpha \in V$ as a sum of vectors in $R_E$ and $N_E$ is $\alpha = E\alpha + (I - E)\alpha$.

Since the projection $E$ is uniquely determined by its range and null space, we will frequently call $E$ the projection on $R_E$ along $N_E$. Note that any projection can be diagonalized trivially: pick a basis for $R_E$ and a separate basis for $N_E$. Then, in terms of the basis for $V$ obtained by combining the two bases, the projection operator takes the form
$$E = \begin{bmatrix} I_E & 0 \\ 0 & 0 \end{bmatrix},$$
where $I_E$ is an identity matrix of dimension equal to the dimension of $R_E$. We can use the concept of projection operators to describe and characterize direct-sum decompositions of the vector space $V$.

(f) Theorem. If $V = W_1 \oplus \cdots \oplus W_k$ is a direct-sum decomposition, then there exist $k$ linear operators $E_1, \ldots, E_k$ on $V$ such that

i. for all $j$, $E_j^2 = E_j$ ($E_j$ is a projection);
ii. $E_i E_j = 0$ if $i \neq j$;
iii. $I = E_1 + \cdots + E_k$;
iv. the range of $E_i$ is $W_i$.

Conversely, if $E_1, \ldots, E_k$ are $k$ linear operators on $V$ satisfying i, ii, iii, and iv, then $V = W_1 \oplus \cdots \oplus W_k$ is a direct-sum decomposition.
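Here is a minimal numerical sketch of the theorem: projections built from a hypothetical basis adapted to a direct sum $\mathbb{R}^3 = W_1 \oplus W_2$ satisfy conditions i-iv:

```python
import numpy as np

# columns b1, b2 span W1; column b3 spans W2 (an arbitrary basis of R^3)
H = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [0., 0., 1.]])
Hinv = np.linalg.inv(H)

# keep the W1-coordinates (resp. the W2-coordinate), zero out the rest
E1 = H @ np.diag([1., 1., 0.]) @ Hinv
E2 = H @ np.diag([0., 0., 1.]) @ Hinv

assert np.allclose(E1 @ E1, E1) and np.allclose(E2 @ E2, E2)  # i: projections
assert np.allclose(E1 @ E2, 0) and np.allclose(E2 @ E1, 0)    # ii: E_i E_j = 0
assert np.allclose(E1 + E2, np.eye(3))                        # iii: sum to I
assert np.linalg.matrix_rank(E1) == 2                         # iv: range E1 = W1
```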

5. Invariant Direct-Sums

We are interested in direct-sum decompositions $V = W_1 \oplus \cdots \oplus W_k$ where each of the subspaces $W_i$ is invariant under the action of some operator $T$. Given such a decomposition, the operator $T$ induces a linear operator $T_i$ on each $W_i$ by restriction. The action of the operator $T$ can then be described by considering
$$\alpha = \alpha_1 + \cdots + \alpha_k \quad \text{for } \alpha_i \in W_i,$$
so that
$$T\alpha = T_1\alpha_1 + \cdots + T_k\alpha_k.$$
We then say that the operator $T$ is a direct sum of the operators $T_i$. Note that in terms of matrix representations of the operator $T$, this decomposition says that if we select a basis for $V$ consisting of bases for the $W_i$, then the matrix representation of $T$ takes the form
$$A = \begin{bmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_k \end{bmatrix},$$
where the block matrices $A_i$ are $d_i \times d_i$ with $d_i = \dim W_i$. We then say that the matrix $A$ is a direct sum of the matrices $A_i$. It will be most convenient to describe the decomposition in terms of the projections on the invariant subspaces, and the following result gives us a way of expressing the invariance in terms of the projection operators.
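The block-diagonal representation can be sketched numerically: build $T$ from hypothetical blocks $A_1$, $A_2$ in an adapted basis $H$, and check that $T$ leaves $W_1$ (the span of the first two columns of $H$) invariant while $H^{-1} T H$ is block diagonal:

```python
import numpy as np

A1 = np.array([[0., 1.], [-2., 3.]])   # block for W1 (dim 2)
A2 = np.array([[5.]])                  # block for W2 (dim 1)
block = np.zeros((3, 3))
block[:2, :2] = A1
block[2:, 2:] = A2

H = np.array([[1., 0., 1.],
              [1., 1., 0.],
              [0., 1., 1.]])           # adapted basis as columns (invertible)
T = H @ block @ np.linalg.inv(H)       # the operator in the standard basis

# T maps the columns spanning W1 back into W1 (exact least-squares fit)
W1 = H[:, :2]
coords, *_ = np.linalg.lstsq(W1, T @ W1, rcond=None)
assert np.allclose(W1 @ coords, T @ W1)

# changing to the adapted basis recovers the direct sum of the blocks
assert np.allclose(np.linalg.inv(H) @ T @ H, block)
```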

where the block matrices Ai are di di with di = dim Wi : We then say that the matrix A is a direct sum of the matrices Ai : It will be most convenient to describe the decomposition in terms of the projections on the invariant subspaces, and the following result gives us a way of expressing the invariance in terms of the projection operators. (a) Theorem. Let T be a linear operator on the vector space V and let W1 ::: Wk be a direct sum decomposition of V , with E1 ; :::; Ek the associated projections on the subspaces. Then a necessary and su¢ cient condition for each subspace Wi to be invariant under T is that T commutes with each of the projections Ei ; i.e. T Ei = Ei T for i = 1; :::; k: Proof: Suppose T commutes with each Ei : Let Ei = ; and T

2 Wi : Then

= T Ei = Ei T

so that T is in the range of Ei , i.e. Wi is invariant under T: Next, suppose that each Wi is invariant for T: We need to show that this implies that the projections and T commute. Let 2 V be any vector. Then = E1 + + Ek 12

and T

= T E1 +

T Ek :

Since Ei 2 Wi , which is invariant under T; we must have T Ei Ei i for some vector i : Then Ej T E i

= Ej Ei =

=

i

0 if j 6= i : Ej j if i = j

Thus, Ej T

= Ej T E 1 + = Ej j = T Ej :

Since this holds for any

+ Ej T E k

2 V; it follows that T Ei = Ei T:

(b) We will now apply the machinery of direct-sum decomposition to the case of a diagonalizable linear operator, since this is a particularly simple case which illustrates the power behind the decomposition geometry.

Theorem. Let $T$ be a linear operator on a finite-dimensional vector space $V$. If $T$ is diagonalizable and if $\lambda_1, \ldots, \lambda_k$ are the distinct eigenvalues of $T$, then there exist linear operators $E_1, \ldots, E_k$ on $V$ such that

i. $T = \lambda_1 E_1 + \cdots + \lambda_k E_k$;
ii. $I = E_1 + \cdots + E_k$;
iii. $E_i E_j = 0$ if $i \neq j$;
iv. $E_i^2 = E_i$ for all $i$;
v. the range of $E_i$ is the eigenspace of $T$ associated with $\lambda_i$.

Conversely, if there exist $k$ distinct scalars $\lambda_1, \ldots, \lambda_k$ and $k$ non-zero linear operators $E_1, \ldots, E_k$ satisfying conditions i-iii, then $T$ is diagonalizable, the scalars $\lambda_i$ are the eigenvalues of $T$, and conditions iv and v hold also.

Proof: Suppose first that $T$ is diagonalizable, with distinct eigenvalues $\lambda_1, \ldots, \lambda_k$. Let $W_i$ be the eigenspace associated with $\lambda_i$. Then
$$V = W_1 \oplus \cdots \oplus W_k.$$
Let $E_1, \ldots, E_k$ be the projections associated with this decomposition. Then ii, iii, iv and v are satisfied by construction. To show that i is satisfied, pick a vector $\alpha \in V$. Then
$$\alpha = E_1\alpha + \cdots + E_k\alpha,$$
and hence
$$T\alpha = T E_1\alpha + \cdots + T E_k\alpha = \lambda_1 E_1\alpha + \cdots + \lambda_k E_k\alpha$$
(using the fact that $T$ commutes with the projections, since the $W_i$ are invariant subspaces). Since $\alpha$ was arbitrary, it follows that
$$T = \lambda_1 E_1 + \cdots + \lambda_k E_k$$
and i holds.

To show the converse, suppose we are given a linear operator $T$ along with distinct scalars $\lambda_1, \ldots, \lambda_k$ and non-zero operators $E_1, \ldots, E_k$ which satisfy i, ii, and iii. Since $E_i E_j = 0$ for $i \neq j$, multiplying $I = E_1 + \cdots + E_k$ by $E_i$ yields immediately that $E_i = E_i^2$, so iv is satisfied. Multiplying $T = \lambda_1 E_1 + \cdots + \lambda_k E_k$ by $E_i$, we have $T E_i = \lambda_i E_i$, or $(T - \lambda_i I) E_i = 0$. This says that any vector in the range of $E_i$ is in the null space of $T - \lambda_i I$. Since we have assumed that $E_i \neq 0$, it then follows that there is some non-zero vector in the null space of $T - \lambda_i I$, and hence $\lambda_i$ is an eigenvalue of $T$. The collection of scalars $\lambda_1, \ldots, \lambda_k$ are also all of the eigenvalues of $T$, since, if $\lambda$ is any scalar, then
$$T - \lambda I = \lambda_1 E_1 + \cdots + \lambda_k E_k - \lambda(E_1 + \cdots + E_k) = (\lambda_1 - \lambda)E_1 + \cdots + (\lambda_k - \lambda)E_k.$$
Hence, if $(T - \lambda I)\alpha = 0$ for some $\alpha \neq 0$, then we must have $(\lambda_i - \lambda)E_i\alpha = 0$ for all $i$, and for at least one $i$, $E_i\alpha \neq 0$, which then requires that $\lambda = \lambda_i$. We have now shown, therefore, that every non-zero vector in the range of any $E_j$ is a characteristic vector of $T$. Since $I = E_1 + \cdots + E_k$ implies that the eigenspaces span $V$, it follows that $T$ is diagonalizable. It then remains to show condition v, that the range of each $E_j$ is in fact the eigenspace of $T$ corresponding to the eigenvalue $\lambda_j$. To show this, we need to show that the null space of $T - \lambda_i I$ is exactly the range of $E_i$. But this follows from the fact that if $T\alpha = \lambda_i\alpha$, then
$$0 = T\alpha - \lambda_i\alpha = \lambda_1 E_1\alpha + \cdots + \lambda_k E_k\alpha - \lambda_i\left[E_1\alpha + \cdots + E_k\alpha\right] = \sum_{j=1}^{k} (\lambda_j - \lambda_i) E_j\alpha.$$
This implies that for $j \neq i$, we must have $E_j\alpha = 0$. Since $\alpha = E_1\alpha + \cdots + E_k\alpha$ and $E_j\alpha = 0$ for $j \neq i$, we have $\alpha = E_i\alpha$, and hence $\alpha$ is in the range of $E_i$.

(c) Polynomials, revisited. One of the features of the direct-sum decomposition $T = \lambda_1 E_1 + \cdots + \lambda_k E_k$ is that if $g$ is any polynomial over the real numbers, then
$$g(T) = g(\lambda_1) E_1 + \cdots + g(\lambda_k) E_k.$$

This is easily shown by computing powers of $T$ using the direct-sum decomposition. For example,
$$T^2 = \left[\lambda_1 E_1 + \cdots + \lambda_k E_k\right]\left[\lambda_1 E_1 + \cdots + \lambda_k E_k\right] = \sum_{i=1}^{k}\sum_{j=1}^{k} \lambda_i\lambda_j E_i E_j = \sum_{i=1}^{k} \lambda_i^2 E_i$$
(since the $E_i$ are projections with $E_i E_j = 0$ for $i \neq j$). By a simple induction argument, then, it follows that
$$T^r = \sum_{i=1}^{k} \lambda_i^r E_i,$$
and hence that any polynomial $g(T) = a_0 I + a_1 T + \cdots + a_n T^n$ will be given by
$$g(T) = \sum_{i=1}^{k} \left(a_0 + a_1\lambda_i + \cdots + a_n\lambda_i^n\right) E_i.$$
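This spectral form of $g(T)$ can be checked numerically on the diagonalizable example from earlier; the rank-one pieces below are assembled from the eigenvector matrix returned by `numpy`:

```python
import numpy as np

A = np.array([[5., -6., -6.], [-1., 4., 2.], [3., -6., -4.]])  # eigenvalues 1, 2, 2

lams, H = np.linalg.eig(A)
Hinv = np.linalg.inv(H)

# rank-one pieces (column i of H times row i of H^{-1}); a repeated eigenvalue
# just has its eigenspace projection split across several pieces, which is
# harmless for the identity being checked
E = [np.outer(H[:, i], Hinv[i, :]) for i in range(3)]

def g(x):
    return x**3 - 2. * x + 7.          # an arbitrary sample polynomial

g_of_A = np.linalg.matrix_power(A, 3) - 2. * A + 7. * np.eye(3)
assert np.allclose(g_of_A, sum(g(l) * Ei for l, Ei in zip(lams, E)))
```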

It is interesting to note what happens if we apply this result to the Lagrange polynomials
$$p_j(x) = \prod_{i \neq j} \frac{x - \lambda_i}{\lambda_j - \lambda_i},$$
where we take the scalars $\lambda_i$ to be the distinct eigenvalues of the operator $T$. In this case,
$$p_j(T) = p_j(\lambda_1) E_1 + \cdots + p_j(\lambda_k) E_k.$$
Since $p_j(\lambda_i) = \delta_{ij}$, we have
$$p_j(T) = E_j,$$

1 E1

+

+

k Ek

and for any polynomial g(T ); g(T ) = g ( 15

1 ) E1

+

g(

k )Ek :

Hence, g(T ) = 0 if and only if g( i ) = 0 for all i; which implies that the minimal polynomial for T is p(x) = (x

1)

(x

k) :

Next, suppose T is a linear operator with minimal polynomial p(x) = (x

1)

(x

k) :

Form the Lagrange polynomials pj (x) =

Y x

i

j

i6=j

:

i

From the result we showed earlier on the spanning properties of the Lagrange polynomials, we know that for any polynomial $g$ of degree less than or equal to $k - 1$,
$$g(x) = \sum_{i=1}^{k} g(\lambda_i)\, p_i(x).$$
Taking $g$ to be the constant polynomial equal to 1, we get
$$1 = \sum_{i=1}^{k} p_i(x).$$
Taking $g$ equal to the polynomial $g(x) = x$, we get
$$x = \sum_{i=1}^{k} \lambda_i\, p_i(x).$$
Now, let $E_j = p_j(T)$. Then the expressions above give us
$$I = \sum_{j=1}^{k} E_j$$
and
$$T = \sum_{j=1}^{k} \lambda_j E_j.$$

Now, observe that if $i \neq j$ then the polynomial
$$p_i(x)\,p_j(x) = \prod_{m \neq i} \frac{x - \lambda_m}{\lambda_i - \lambda_m} \cdot \prod_{m \neq j} \frac{x - \lambda_m}{\lambda_j - \lambda_m}$$
is divisible by the minimal polynomial $p$, since $p_i p_j$ contains every $(x - \lambda_m)$ as a factor. Hence, $p_i(T)\,p_j(T) = 0$ (since the minimal polynomial annihilates $T$), and we have that $E_i E_j = 0$ if $i \neq j$.

Finally, note that for all $i$, $E_i \neq 0$, since $p_i$ has degree $k - 1$ and cannot annihilate $T$ without contradicting the assumption that the minimal polynomial $p$ has degree $k$. Applying the previous theorem, then, we conclude that $T$ is diagonalizable.

(d) Let's apply these results to the operator represented by the matrix
$$A = \begin{bmatrix} 5 & -6 & -6 \\ -1 & 4 & 2 \\ 3 & -6 & -4 \end{bmatrix},$$
which we examined previously. This matrix is diagonalizable and has minimal polynomial $p(\lambda) = (\lambda - 2)(\lambda - 1)$. The Lagrange polynomials corresponding to this are
$$p_2(\lambda) = \frac{\lambda - 1}{2 - 1} = \lambda - 1 \quad \text{and} \quad p_1(\lambda) = \frac{\lambda - 2}{1 - 2} = 2 - \lambda.$$
Hence, the projection operators are
$$E_2 = A - I,$$
associated with the eigenvalue 2, and
$$E_1 = 2I - A,$$
associated with the unit eigenvalue. Clearly, $E_1 + E_2 = 2I - A + A - I = I$. Also, $E_1 E_2 = -p(A) = 0$, and
$$E_1 + 2E_2 = 2I - A + 2A - 2I = A.$$
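These claims are quick to verify numerically:

```python
import numpy as np

A = np.array([[5., -6., -6.], [-1., 4., 2.], [3., -6., -4.]])
I = np.eye(3)
E1 = 2 * I - A       # p1(A), the projection for the eigenvalue 1
E2 = A - I           # p2(A), the projection for the eigenvalue 2

assert np.allclose(E1 + E2, I)           # resolution of the identity
assert np.allclose(E1 @ E2, 0)           # E1 E2 = 0 (p annihilates A)
assert np.allclose(E1 @ E1, E1)          # idempotent
assert np.allclose(E1 + 2 * E2, A)       # A = 1*E1 + 2*E2
assert np.allclose(A @ E2, 2 * E2)       # range of E2 is the eigenspace for 2
```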

6. The Primary Decomposition Theorem

As we saw from a previous example, not all operators are diagonalizable. This occurs when the minimal polynomial of the operator fails to factor into distinct linear factors, which can happen for two reasons. First, the field of numbers over which we are permitted to look for roots may not be algebraically closed, in the sense that every polynomial with coefficients in the given field has roots in this field. An example of this is the polynomial $p(x) = x^2 + 1$. This polynomial has real-valued coefficients, but no real roots. A linear operator on $\mathbb{R}^2$ having $p$ as its minimal polynomial is
$$A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix},$$
so this matrix can't be diagonalized over the field of real numbers. This situation is relatively easy to remedy, since the deficiency lies in the field of numbers we are working with rather than the operator. If we extend the field from the field of real numbers to the field of complex numbers, then we can factor the polynomial $p(x) = x^2 + 1 = (x + i)(x - i)$, where $i$ is the square root of $-1$. Since this polynomial factors completely into linear factors, our previous results imply that $A$ can be diagonalized over the field of complex numbers. You should verify directly, by computing the eigenvectors and performing the similarity transformation, that this is in fact so.

The more difficult problem occurs when the minimal polynomial factors completely, but nevertheless contains factors with multiplicity greater than one. In this case, the operator simply can't be diagonalized, no matter what underlying field we work with. This, then, is a fundamental property of the operator itself, and the associated canonical form for the operator is correspondingly more complicated. Despite this complication, we can use a geometric approach parallel to the one we used for diagonalizable operators to analyze the general case. The key result is the following (known as the primary decomposition theorem).

(a) Theorem. Let $T$ be a linear operator on a finite-dimensional vector space $V$. Let $p$ be the minimal polynomial for $T$, and assume that $p$ factors as
$$p = p_1^{r_1}\cdots p_k^{r_k},$$
where the $p_i$ are distinct irreducible monic polynomials and the $r_i$ are positive integers. Let $W_i$ be the null space of $p_i(T)^{r_i}$, $i = 1, \ldots, k$. Then

i. $V = W_1 \oplus \cdots \oplus W_k$;
ii. each $W_i$ is invariant under $T$;
iii. if $T_i$ is the operator induced on $W_i$ by $T$, then the minimal polynomial for $T_i$ is $p_i^{r_i}$.

Proof. We want to develop the proof of this result along the same lines we used for the diagonalizable case, so we begin by looking for the projection operators associated with each of the subspaces $W_i$. For each $i$, let
$$f_i = \frac{p}{p_i^{r_i}} = \prod_{j \neq i} p_j^{r_j}.$$
Since the $p_i$, $i = 1, \ldots, k$, are distinct prime polynomials, the polynomials $f_i$, $i = 1, \ldots, k$, can be shown to be relatively prime (we leave this easy proof to the student). Since these polynomials generate the whole space of polynomials, it follows that there exist polynomials $g_1, \ldots, g_k$ such that
$$\sum_{i=1}^{k} f_i g_i = 1.$$
Note also that if $i \neq j$ then the product $f_i f_j$ is divisible by the minimal polynomial $p$. Let the polynomial $h_i = f_i g_i$ and define the
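A quick numerical check confirms that this operator has no real eigenvalues, but diagonalizes once complex scalars are allowed:

```python
import numpy as np

A = np.array([[0., -1.], [1., 0.]])    # minimal polynomial x^2 + 1

lams, H = np.linalg.eig(A)             # eigenvalues are +i and -i
assert np.iscomplexobj(lams)
assert np.allclose(sorted(lams, key=lambda z: z.imag), [-1j, 1j])

# the similarity transformation H^{-1} A H recovers the diagonal of eigenvalues
D = np.linalg.inv(H) @ A @ H
assert np.allclose(D, np.diag(lams))
```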

operator $E_i = h_i(T)$. We will show that the operators $E_i$, $i = 1, \ldots, k$, are the projections we are after. First, since $\sum_{i=1}^{k} h_i = 1$ and $p$ divides $f_i f_j$ for $i \neq j$, it follows that
$$E_1 + \cdots + E_k = I$$
and $E_i E_j = 0$ for $i \neq j$. Thus, the $E_i$ are projections which correspond to some direct-sum decomposition of $V$. We wish to show that the range of each $E_i$ is in fact the subspace $W_i$. Suppose first that $\alpha$ is in the range of $E_i$. Then $\alpha = E_i\alpha$, so that
$$p_i(T)^{r_i}\alpha = p_i(T)^{r_i} E_i\alpha = p_i(T)^{r_i} f_i(T) g_i(T)\alpha = 0,$$
since $p_i^{r_i} f_i g_i$ is divisible by $p$. Hence, any vector in the range of $E_i$ is in the null space of $p_i(T)^{r_i}$, i.e. in $W_i$. Conversely, suppose $\alpha$ is in the null space of $p_i(T)^{r_i}$. Then, for $j \neq i$, $f_j g_j$ is divisible by $p_i^{r_i}$, so that $f_j(T) g_j(T)\alpha = 0$, which is to say $E_j\alpha = 0$ for $j \neq i$. It now follows that $E_i\alpha = \alpha$, which completes the proof of statement i.

To show that $W_i$ is invariant under $T$, we show that $p_i(T)^{r_i} T\alpha = 0$ if $\alpha \in W_i$. Since $p_i(T)^{r_i}$ is itself a polynomial in $T$, say $\gamma_0 I + \gamma_1 T + \cdots + \gamma_m T^m$, we have

=

0

+

1T

+

= T 0 + 1T + = T pi (T )ri = 0

ri T

ri

ri T

T ri

r

since 2 Wi : It follows, then that T is in the nullity of pi (T ) i and hence is in Wi : This establishes ii (along with the fact that any polynomial in an operator commutes with the operator). Condition iii follows from this since the operator Ti obtained by restricting T to r Wi is annihilated by pi (T ) i which implies that the minimal polynori mial for Ti divides pi (T ) : Conversely, suppose q is any polynomial which annihilates Ti : Then 0

= q(Ti ) = q (T Ei ) = q(T )Ei (since Ei is a projection) = q(T )fi (T )gi (T ):

Now, gi (T ) 6= 0 since this would imply that Ei = 0; which it is not. Hence, it must be that q(T )fi (T ) = 0; which implies, in turn, that the minimal polynomial divides q(T )fi (T ): Since the minimal polynomial is pri i fi ; it should be clear that pri i divides q; and hence pri i is the minimal polynomial for Ti : 19

(b) Let us consider the case where each polynomial $p_i$ is linear: $p_i = (x - \lambda_i)$. In this case, the range of $E_i$ is the null space of $(T - \lambda_i I)^{r_i}$. Let
$$D = \lambda_1 E_1 + \cdots + \lambda_k E_k.$$
By construction (and a previous theorem), $D$ is a diagonalizable operator, which we call the diagonalizable part of $T$. Consider the operator $N = T - D$. Since
$$T = T E_1 + \cdots + T E_k$$
and
$$D = \lambda_1 E_1 + \cdots + \lambda_k E_k,$$
we have
$$N = (T - \lambda_1 I) E_1 + \cdots + (T - \lambda_k I) E_k.$$
It then follows that
$$N^r = (T - \lambda_1 I)^r E_1 + \cdots + (T - \lambda_k I)^r E_k$$
(using the properties of projections), and hence, for $r \geq r_i$ for all $i$, $N^r = 0$. We call an operator on $V$ having the property that $N^r = 0$ for some positive integer $r$ a nilpotent operator. We can now state the following result.

(c) Theorem. Let $T$ be a linear operator on the finite-dimensional vector space $V$, and suppose that the minimal polynomial for $T$ decomposes into a product of linear polynomials. Then there is a diagonalizable operator $D$ on $V$ and a nilpotent operator $N$ on $V$ such that
i. $T = D + N$;
ii. $DN = ND$.
The diagonalizable operator $D$ and the nilpotent operator $N$ are uniquely determined by the conditions above, and each of them is a polynomial in $T$.
Proof. We have already shown all the results above except for the uniqueness. To show this, suppose there is another pair of operators $D'$ and $N'$ with $T = D' + N'$, $D'$ diagonalizable, $N'$ nilpotent, and $D'N' = N'D'$. We need to show that $D' = D$ and $N' = N$. Since $D'$ and $N'$ commute with one another while $T = D' + N'$, $D'$ and $N'$ commute with $T$, and hence with any polynomial in $T$. They thus commute with $D$ and $N$ as well. We then have $D + N = D' + N'$, or
$$D - D' = N' - N.$$

The operator $N' - N$ is nilpotent, since
$$(N' - N)^r = \sum_{j=0}^{r} \binom{r}{j} (N')^{r-j} (-N)^j$$
(using the binomial formula, which applies here because $N$ and $N'$ commute). Obviously, if $r$ is large enough, every term in this sum contains a high enough power of $N$ or of $N'$ to vanish, so $(N' - N)^r = 0$, and hence $N' - N$ is nilpotent. From the previous calculation, then, the operator $D - D'$ is nilpotent. Because both $D$ and $D'$ are diagonalizable and commute, these operators can be diagonalized simultaneously. To see this, write
$$D = \lambda_1 E_1 + \cdots + \lambda_k E_k$$
and
$$D' = \lambda'_1 E'_1 + \cdots + \lambda'_k E'_k.$$

Since $D$ and $D'$ commute, while the projections are polynomials in $D$ and $D'$ respectively, the projections in each expression above all commute, and each commutes with $D$ and $D'$. Hence, $D$ and $D'$ have the same invariant subspaces, and the similarity transformation that diagonalizes one operator will simultaneously diagonalize the other. Thus, with $D - D'$ both diagonalizable and nilpotent, its minimal polynomial must be of the form $p(\lambda) = \lambda^r$ (since every matrix is annihilated by its minimal polynomial and we know that nilpotent operators satisfy $N^r = 0$). Because $D - D'$ is diagonalizable, however, it must be that $p(\lambda) = \lambda$, which then implies that $D - D' = 0$. Hence, $D = D'$ and $N = N'$.

7. The Jordan form

(a) Cyclic Subspaces and Annihilators
i. Definition. If $\alpha$ is any vector in the finite-dimensional vector space $V$, and $T$ is a linear operator on $V$, the $T$-cyclic subspace generated by $\alpha$ is the subspace $Z(\alpha; T)$ of all vectors of the form $g(T)\alpha$, where $g \in P$. If $Z(\alpha; T) = V$, then we call $\alpha$ a cyclic vector for $T$.
ii. Examples: The identity operator (on a space of dimension greater than one) has no cyclic vector, since every non-zero vector in $V$ generates a one-dimensional cyclic subspace consisting of the multiples of that vector. For an example of an operator with a cyclic vector, consider the operator represented by the matrix
$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$
The first unit vector $e_1$ is a cyclic vector for this operator, since $T e_1 = e_2$, and $e_1$ and $e_2$ span $\mathbb{R}^2$.
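Cyclicity of a vector can be tested mechanically by checking that the Krylov vectors $\alpha, T\alpha, T^2\alpha, \ldots$ span the space. A small sketch of the two examples above (my own illustration, not from the notes):

```python
import numpy as np

T = np.array([[0.0, 1.0],
              [1.0, 0.0]])
e1 = np.array([1.0, 0.0])

# Z(e1; T) is spanned by the Krylov vectors e1, T e1 (dimension 2 suffices).
krylov = np.column_stack([e1, T @ e1])
assert np.linalg.matrix_rank(krylov) == 2   # e1 is cyclic: Z(e1; T) = R^2

# By contrast, for the identity operator the Krylov vectors of any
# starting vector are all parallel, so no vector is cyclic.
I = np.eye(2)
krylov_id = np.column_stack([e1, I @ e1])
assert np.linalg.matrix_rank(krylov_id) == 1
print("e1 is cyclic for T; the identity operator has no cyclic vector")
```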

iii. Definition. If $\alpha$ is any vector in $V$, the $T$-annihilator of $\alpha$ is the ideal $M(\alpha; T) \subseteq P$ consisting of all polynomials $g$ over the number field we are working with such that $g(T)\alpha = 0$. The unique monic polynomial $p_\alpha$ which generates this ideal will also be called the $T$-annihilator of $\alpha$. Note that $M(\alpha; T)$ is a non-zero ideal in $P$ since it contains the minimal polynomial for $T$. Also, it should be clear that $\deg(p_\alpha) > 0$ unless $\alpha$ is the zero vector.

(b) Theorem. Let $\alpha$ be any non-zero vector in $V$ and let $p_\alpha$ be the $T$-annihilator of $\alpha$.
i. The degree of $p_\alpha$ is equal to the dimension of the cyclic subspace $Z(\alpha; T)$.
ii. If the degree of $p_\alpha$ is $k$, then the vectors $\alpha, T\alpha, T^2\alpha, \ldots, T^{k-1}\alpha$ form a basis for $Z(\alpha; T)$.
iii. If $U$ is the linear operator on $Z(\alpha; T)$ induced by $T$, then the minimal polynomial for $U$ is $p_\alpha$.

Proof: Let $g$ be any polynomial in $P$, and write
$$g = p_\alpha q + r$$
where either $r = 0$ or $\deg(r) < \deg(p_\alpha) = k$. The polynomial $p_\alpha q$ is in the $T$-annihilator of $\alpha$, so that
$$g(T)\alpha = p_\alpha(T) q(T)\alpha + r(T)\alpha = r(T)\alpha.$$
Since $r = 0$ or $\deg(r) < k$, the vector $r(T)\alpha$ is a linear combination of the vectors $\alpha, T\alpha, \ldots, T^{k-1}\alpha$, and, since $g(T)\alpha$ is a typical vector in $Z(\alpha; T)$, this shows that these $k$ vectors span $Z(\alpha; T)$. These vectors are also linearly independent, since any non-trivial linear relation between them would give us a non-zero polynomial $g$ of degree less than $k$ such that $g(T)\alpha = 0$. But this contradicts the assumption that the polynomial $p_\alpha$ generates the $T$-annihilator (and therefore has smallest degree among all polynomials in the ideal). This establishes conditions i and ii. To show iii, let $U$ be the linear operator obtained from $T$ by restricting $T$ to the subspace $Z(\alpha; T)$. If $g \in P$, then
$$p_\alpha(U) g(T)\alpha = p_\alpha(T) g(T)\alpha = g(T) p_\alpha(T)\alpha = 0$$
since $g(T)\alpha \in Z(\alpha; T)$. Hence, $p_\alpha(U)$ is the zero operator on $Z(\alpha; T)$; and no polynomial of degree less than $k$ can annihilate $U$, since it would in particular annihilate $\alpha$, contradicting the minimality of $p_\alpha$. Thus $p_\alpha$ is the minimal polynomial of $U$.

(c) We will now apply the ideas on cyclic subspaces to the problem of finding the canonical representation of an arbitrary operator $T$. We begin by considering operators which have a cyclic vector $\alpha$. Denote such an operator by $U$, and assume it operates on a space $W$ of dimension $k$. By the previous theorem, the vectors $\alpha, U\alpha, \ldots, U^{k-1}\alpha$ form a basis for $W$, and the annihilator $p_\alpha$ of $\alpha$ is the minimal polynomial for $U$. (Note that since the degree of $p_\alpha$ is equal to the dimension of the space $W$, this implies that $p_\alpha$ is also the characteristic polynomial of $U$.) If we let $\alpha_i = U^{i-1}\alpha$, $i = 1, \ldots, k$, then the action of $U$ on the ordered basis $B = \{\alpha_1, \ldots, \alpha_k\}$ is
$$U\alpha_i = \alpha_{i+1} \quad \text{for } i = 1, \ldots, k-1,$$
$$U\alpha_k = -c_0\alpha_1 - c_1\alpha_2 - \cdots - c_{k-1}\alpha_k,$$
where
$$p_\alpha = c_0 + c_1 x + \cdots + c_{k-1} x^{k-1} + x^k.$$
The expression for $U\alpha_k$ follows from the fact that $p_\alpha(U)\alpha = 0$, i.e.
$$U^k\alpha = -c_{k-1} U^{k-1}\alpha - \cdots - c_1 U\alpha - c_0\alpha.$$
Hence, in this basis, the matrix representation of $U$ is
$$\begin{bmatrix}
0 & 0 & \cdots & 0 & -c_0 \\
1 & 0 & \cdots & 0 & -c_1 \\
0 & 1 & \cdots & 0 & -c_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & -c_{k-1}
\end{bmatrix}.$$
This matrix is called the companion matrix of the monic polynomial $p_\alpha$.

(d) Theorem. If $U$ is a linear operator on the finite-dimensional space $W$, then $U$ has a cyclic vector if and only if there is some ordered basis for $W$ in which $U$ is represented by the companion matrix of the minimal polynomial for $U$.
Proof: We have just shown that if $U$ has a cyclic vector, then there is such an ordered basis for $W$. Conversely, if we have some ordered basis $B = \{\alpha_1, \ldots, \alpha_k\}$ for $W$ in which $U$ is represented by the companion matrix of its minimal polynomial, it should be clear that the vector $\alpha_1$ is a cyclic vector.

(e) Corollary. If $A$ is the companion matrix of a monic polynomial $p$, then $p$ is both the minimal and characteristic polynomial of $A$.
Proof: By theorem 7.b above, we know that $p$ is the minimal polynomial for $A$. We show by direct calculation that $p$ is also the characteristic polynomial for $A$. Write

$$|xI - A| = \begin{vmatrix}
x & 0 & \cdots & 0 & c_0 \\
-1 & x & \cdots & 0 & c_1 \\
0 & -1 & \cdots & 0 & c_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & -1 & x + c_{k-1}
\end{vmatrix}.$$
We now perform a series of column operations. Multiply the second-to-last column by $x + c_{k-1}$ and add it to the last column. This yields a last column of the form
$$\begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ x^2 + c_{k-1}x + c_{k-2} \\ 0 \end{bmatrix}.$$
Now, multiply the third-to-last column by $x^2 + c_{k-1}x + c_{k-2}$ and add it to the last column. This yields a last column of the form
$$\begin{bmatrix} c_0 \\ \vdots \\ x^3 + c_{k-1}x^2 + c_{k-2}x + c_{k-3} \\ 0 \\ 0 \end{bmatrix}.$$
Continuing in this way with column operations, we end up with
$$|xI - A| = \begin{vmatrix}
x & 0 & \cdots & 0 & p(x) \\
-1 & x & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & -1 & 0
\end{vmatrix}$$
where $p(x) = x^k + c_{k-1}x^{k-1} + \cdots + c_1 x + c_0$. Adding $x$ times each of the last $k-1$ rows to the row above it (working from the bottom row up), we further reduce the determinant to
$$|xI - A| = \begin{vmatrix}
0 & 0 & \cdots & 0 & p(x) \\
-1 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & -1 & 0
\end{vmatrix}.$$
In partitioned form, we now have
$$|xI - A| = \begin{vmatrix} 0^T & p(x) \\ -I_{k-1} & 0 \end{vmatrix}.$$
A Laplace expansion on the first row now yields $|xI - A| = (-1)^{1+k}\,p(x)\,|{-I_{k-1}}| = p(x)$, and hence
$$|A - xI| = (-1)^k p(x).$$
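The corollary is easy to test numerically. The sketch below (my own, with an arbitrarily chosen polynomial) builds the companion matrix of $p(x) = x^3 - 2x^2 - 5x + 6 = (x-1)(x+2)(x-3)$ in the convention used above (ones on the subdiagonal, $-c_i$ down the last column) and checks that its eigenvalues are exactly the roots of $p$:

```python
import numpy as np

# p(x) = x^3 - 2x^2 - 5x + 6 = (x - 1)(x + 2)(x - 3),
# so c0 = 6, c1 = -5, c2 = -2 in p = c0 + c1 x + c2 x^2 + x^3.
c = [6.0, -5.0, -2.0]
k = len(c)

# Companion matrix: ones on the subdiagonal, -c_i in the last column.
A = np.zeros((k, k))
A[1:, :-1] = np.eye(k - 1)
A[:, -1] = [-ci for ci in c]

eigs = np.sort(np.linalg.eigvals(A).real)
assert np.allclose(eigs, [-2.0, 1.0, 3.0])   # the roots of p
print("eigenvalues:", eigs)
```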

Since the $(-1)^k$ doesn't matter for the determination of the eigenvalues of $A$, it follows that $\mathrm{ch}(x) = p(x)$.

(f) Cyclic Decomposition and the Jordan Form
The cyclic decomposition is accomplished by finding a set of vectors $\alpha_i$, $i = 1, \ldots, k$, such that the operator $T$ acting on each $\alpha_i$ generates a cyclic subspace (which is obviously invariant for the operator $T$) and such that the complementary subspace is also invariant, and hence can be decomposed further into cyclic subspaces. The main result we will be interested in is the following.
Cyclic Decomposition Theorem. Let $T$ be a linear operator on a finite-dimensional vector space $V$. There exist non-zero vectors $\alpha_1, \ldots, \alpha_r$ in $V$ with respective $T$-annihilators $p_1, \ldots, p_r$ such that
i. $V = Z(\alpha_1; T) \oplus \cdots \oplus Z(\alpha_r; T)$;
ii. $p_k$ divides $p_{k-1}$, $k = 2, \ldots, r$.
Furthermore, the integer $r$ and the annihilators $p_1, \ldots, p_r$ are uniquely determined by i and ii, and the fact that no $\alpha_k$ is $0$.
Since the proof of this theorem is both difficult and long, we will simply take this result as given. We now wish to apply this result as follows. Suppose the operator $N$ is nilpotent. The cyclic decomposition theorem applied to this operator gives us a positive integer $r$ and $r$ non-zero vectors $\alpha_1, \ldots, \alpha_r$, together with their $N$-annihilators $p_1, \ldots, p_r$, such that
$$V = Z(\alpha_1; N) \oplus \cdots \oplus Z(\alpha_r; N)$$
and $p_{i+1}$ divides $p_i$. Since $N$ is nilpotent, its minimal polynomial is just $p(x) = x^k$ for some $k \leq n$. Each of the annihilators takes the form $p_i(x) = x^{k_i}$, since the annihilators divide the minimal polynomial. This last result follows from the fact that the minimal polynomial annihilates the operator itself, $p(N) = 0$, from which it follows that $p(N)\alpha = 0$ for any $\alpha \in V$. In particular, $p$ lies in each annihilator ideal $M(\alpha_i; N)$, which is generated by $p_i(x)$, so each $p_i$ divides $p$ and is therefore a power of $x$. The divisibility condition above then says simply that $k_1 \geq k_2 \geq \cdots \geq k_r \geq 1$. For each $i$, the companion matrix for $p_i$ is the $k_i \times k_i$ matrix
$$A_i = \begin{bmatrix}
0 & 0 & \cdots & 0 & 0 \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{bmatrix}.$$
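The block sizes $k_i$ in the cyclic decomposition of a nilpotent operator can be recovered from the ranks of its powers: the number of blocks of size at least $j$ is $\operatorname{rank}(N^{j-1}) - \operatorname{rank}(N^j)$, a standard fact not proved in the notes. A sketch on an $N$ assembled from blocks of sizes 2 and 1 (my own illustrative example):

```python
import numpy as np

# N is the direct sum of elementary nilpotent blocks of sizes 2 and 1.
N = np.zeros((3, 3))
N[1, 0] = 1.0        # the 2x2 block; the 1x1 block is the zero in the corner

assert np.allclose(np.linalg.matrix_power(N, 2), 0)   # nilpotent: N^2 = 0

def blocks_at_least(N, j):
    """Number of cyclic blocks of size >= j: rank(N^(j-1)) - rank(N^j)."""
    r = lambda M: np.linalg.matrix_rank(M)
    return r(np.linalg.matrix_power(N, j - 1)) - r(np.linalg.matrix_power(N, j))

assert blocks_at_least(N, 1) == 2   # two blocks in total (k1 = 2, k2 = 1)
assert blocks_at_least(N, 2) == 1   # exactly one block of size >= 2
print("block sizes recovered: one 2x2 and one 1x1")
```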

Thus, the cyclic decomposition theorem gives us an ordered basis for $V$ in which the matrix of $N$ is the direct sum of elementary nilpotent matrices $A_i$. Now, go back to the primary decomposition theorem applied to an arbitrary operator $T$ on $V$. Assume that the characteristic polynomial for $T$ factors as
$$\mathrm{ch}(\lambda) = (\lambda - c_1)^{d_1} \cdots (\lambda - c_k)^{d_k}$$
where $c_1, \ldots, c_k$ are distinct eigenvalues of $T$ and, for each $i$, $d_i \geq 1$. Then the minimal polynomial for $T$ will be
$$p(\lambda) = (\lambda - c_1)^{r_1} \cdots (\lambda - c_k)^{r_k}$$
where $1 \leq r_i \leq d_i$. If $W_i$ is the null space of $(T - c_i I)^{r_i}$, then the primary decomposition theorem tells us that
$$V = W_1 \oplus \cdots \oplus W_k$$
and the operator $T_i$ induced on $W_i$ has minimal polynomial $(\lambda - c_i)^{r_i}$. Let $N_i$ be the linear operator on $W_i$ defined by $N_i = T_i - c_i I$. Then $N_i$ is nilpotent and has minimal polynomial $\lambda^{r_i}$. To see this, calculate the characteristic polynomial of $N = T - c_i I$. This is given by
$$\mathrm{ch}_N(\lambda) = |N - \lambda I| = |T - (\lambda + c_i) I|.$$
It follows that $\mathrm{ch}_N(\lambda)$ is the same as the characteristic polynomial for $T$ with the change of variable $x = \lambda + c_i$. The roots of $\mathrm{ch}_N(\lambda) = 0$ are therefore given by $c_j - c_i$. When we restrict to the invariant subspace corresponding to the root $c_i$, we have $\mathrm{ch}_{N_i}(\lambda) = 0$ if and only if $\lambda = 0$. This then implies that $\mathrm{ch}_{N_i}(\lambda) = \pm\lambda^k$ for some integer $k \geq 1$, and $N_i$ is nilpotent. On $W_i$, $T$ acts like $N_i$ plus the scalar $c_i$ times the identity operator. Suppose we choose a basis for the subspace $W_i$ corresponding to the cyclic decomposition for the nilpotent operator $N_i$. Then the matrix of $T_i$ in this ordered basis will be the direct sum of matrices of the form
$$J = \begin{bmatrix}
c_i & 0 & \cdots & 0 & 0 \\
1 & c_i & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & c_i
\end{bmatrix}.$$
The matrix $J$ is called an elementary Jordan matrix. The full decomposition of $T_i$ will take the form
$$A_i = \begin{bmatrix}
J_1^i & 0 & \cdots & 0 \\
0 & J_2^i & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & J_{n_i}^i
\end{bmatrix}$$

where the dimensions of each of the elementary Jordan matrices $J_k^i$ associated with the eigenvalue $c_i$ decrease as we read from left to right. The exact dimensions of the $J_k^i$ depend on the dimensions of the cyclic subspaces in the cyclic decomposition of $N_i$. Finally, this procedure yields the canonical Jordan form for the full operator $T$ as a direct sum of matrices of the form $A_i$:
$$A = \begin{bmatrix}
A_1 & 0 & \cdots & 0 \\
0 & A_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & A_k
\end{bmatrix}.$$

(g) Example. Go back to the matrix
$$A = \begin{bmatrix} 3 & 1 & -1 \\ 2 & 2 & -1 \\ 2 & 2 & 0 \end{bmatrix}.$$
We found previously that this matrix has minimal polynomial equal to its characteristic polynomial,
$$\mathrm{ch}(\lambda) = (\lambda - 1)(\lambda - 2)^2.$$
This matrix cannot be diagonalized, so we instead put it in Jordan form. From our previous analysis, we found that the vector $\alpha_1 = [1, 0, 2]$ spanned the eigenspace corresponding to the unit eigenvalue. We found that the vector $\alpha_2 = [1, 1, 2]$ was annihilated by $A - 2I$, but because this operator has rank 2, we could not find a second invariant direction. Consider
$$[A - 2I]^2 = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 0 & 0 \\ 2 & -2 & 0 \end{bmatrix}.$$
This matrix clearly has rank one. Its null space consists of the vectors $\alpha = [x, y, z]$ having $x = y$. Note that $\alpha_2$ satisfies this condition, so the null space of $A - 2I$ is contained in the null space of $[A - 2I]^2$. There are, however, other linearly independent vectors in the null space of $[A - 2I]^2$ which are not in the null space of $A - 2I$. Pick such a vector; $\alpha_3 = [0, 0, 1]$ will do, for example. Note next that any vector in the null space of $[A - 2I]^2$ which is independent of $\alpha_2$ generates a cyclic subspace for $A$, since all of these vectors are mapped by $A - 2I$ into the subspace spanned by $\alpha_2$. To see this, calculate
$$[A - 2I]\alpha = \begin{bmatrix} 1 & 1 & -1 \\ 2 & 0 & -1 \\ 2 & 2 & -2 \end{bmatrix} \begin{bmatrix} x \\ x \\ z \end{bmatrix} = \begin{bmatrix} 2x - z \\ 2x - z \\ 4x - 2z \end{bmatrix} = (2x - z)\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}.$$
Hence, as long as $2x \neq z$ (i.e. as long as the vector is independent of $\alpha_2$), $A - 2I$ maps $\alpha$ into the null space of $A - 2I$. The subspace $S$ of $\mathbb{R}^3$ spanned by the vectors $\alpha_2$ and $\alpha_3$ is invariant under the action of $A$. To see this, let $\beta = w\alpha_2 + v\alpha_3$ be a linear combination of these vectors. Then
$$A\beta = wA\alpha_2 + vA\alpha_3 = 2w\alpha_2 + v[2\alpha_3 - \alpha_2].$$
The last step follows from the fact that
$$[A - 2I]\alpha_3 = -\alpha_2 \quad \Rightarrow \quad A\alpha_3 = 2\alpha_3 - \alpha_2.$$
Hence, $A\beta$ is a linear combination of $\alpha_2$ and $\alpha_3$, so the subspace $S$ is invariant. To put $A$ into Jordan form, we generate the cyclic basis for $S$, starting with $\alpha_3$. By construction, $[A - 2I]\alpha_3 = -\alpha_2$. Hence, our basis will be $[\alpha_3, -\alpha_2, \alpha_1]$. The matrix for the similarity transformation is given by
$$H = \begin{bmatrix} 0 & -1 & 1 \\ 0 & -1 & 0 \\ 1 & -2 & 2 \end{bmatrix}.$$
The inverse of this matrix is
$$H^{-1} = \begin{bmatrix} -2 & 0 & 1 \\ 0 & -1 & 0 \\ 1 & -1 & 0 \end{bmatrix}$$
so that
$$H^{-1}A = \begin{bmatrix} -2 & 0 & 1 \\ 0 & -1 & 0 \\ 1 & -1 & 0 \end{bmatrix} \begin{bmatrix} 3 & 1 & -1 \\ 2 & 2 & -1 \\ 2 & 2 & 0 \end{bmatrix} = \begin{bmatrix} -4 & 0 & 2 \\ -2 & -2 & 1 \\ 1 & -1 & 0 \end{bmatrix}$$
and
$$H^{-1}AH = \begin{bmatrix} -4 & 0 & 2 \\ -2 & -2 & 1 \\ 1 & -1 & 0 \end{bmatrix} \begin{bmatrix} 0 & -1 & 1 \\ 0 & -1 & 0 \\ 1 & -2 & 2 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} = J.$$
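The example can be checked in a few lines of numpy (a verification sketch of mine, not part of the original computation):

```python
import numpy as np

A = np.array([[3.0, 1.0, -1.0],
              [2.0, 2.0, -1.0],
              [2.0, 2.0,  0.0]])

# Cyclic basis [alpha_3, -alpha_2, alpha_1] as the columns of H.
H = np.array([[0.0, -1.0, 1.0],
              [0.0, -1.0, 0.0],
              [1.0, -2.0, 2.0]])

J = np.linalg.inv(H) @ A @ H
expected = np.array([[2.0, 0.0, 0.0],
                     [1.0, 2.0, 0.0],
                     [0.0, 0.0, 1.0]])
assert np.allclose(J, expected)          # H^{-1} A H = J

# (A - 2I)^2 has rank one, as claimed in the text.
M = A - 2 * np.eye(3)
assert np.linalg.matrix_rank(M @ M) == 1
print("Jordan form verified")
```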

(h) We can use this procedure for the general case as well. Focus on the invariant subspace $W_i$ of $T$ given by the primary decomposition theorem. As before, the operator $N_i = T_i - c_i I$ obtained by restricting to $W_i$ is nilpotent, with minimal polynomial $p(x) = x^{r_i}$ for $1 \leq r_i \leq d_i$, where $d_i$ is the dimension of the subspace $W_i$. Let $S_1$ be the null space of $N = T - c_i I$. If $S_1$ has dimension $d_i$, we are done. So suppose $\dim S_1 < d_i$. Let $S_2$ be the complement of $S_1$ in the null space of $[T - c_i I]^2$. We take the complement here because $S_1$ is obviously contained in the null space of $[T - c_i I]^2$. Equally obvious is the fact that if $S_2$ is trivial, $\dim S_1 = d_i$. Now, any vector $\alpha \in S_2$ must be mapped by $N$ into $S_1$. This follows from the fact that $N(N\alpha) = 0$, so $N\alpha \in S_1$. The same calculation we used for the example will establish that the subspace spanned by $S_1$ and $S_2$ is invariant under $T$. If $\dim S_1 + \dim S_2 = d_i$, we are done. Otherwise, we continue by generating the space $S_3$, which will be the complement of $S_1 \oplus S_2$ in the null space of $[T - c_i I]^3$. This process will eventually generate a sequence of subspaces such that $W_i = S_1 \oplus \cdots \oplus S_k$ for some finite $k \leq d_i$. We can then generate the cyclic subspaces corresponding to $W_i$ (or equivalently, the cyclic basis) by starting with a basis for $S_k$, and then operating on this basis with $N$ (which, recall, maps everything in $S_k$ into $S_{k-1}$). The cyclic basis can then be used to put the component $T_i$ of $T$ into Jordan form.

8. Symmetric Operators
Operators which can be represented by symmetric matrices, i.e. matrices $M$ such that $M^T = M$, are called (not surprisingly) symmetric operators. Symmetric operators arise naturally in economic models in a variety of contexts. In constrained optimization problems, second-order conditions characterize when a solution to the first-order conditions is in fact a maximum or minimum, and this generally involves looking at the matrix of second derivatives of the Lagrangian for the optimization problem. Young's theorem (from Advanced Calculus) tells us immediately that this matrix is symmetric. In a different context, one of the things one generally proves in the study of demand analysis is that the Slutsky matrix (which is the second derivative matrix of the expenditure function) is also symmetric.
Symmetric operators have two useful properties. First, the eigenvalues of any real-valued symmetric matrix are real. Second, any symmetric matrix is diagonalizable via an orthogonal similarity transformation. Equivalently, the eigenvectors of the matrix can be chosen so as to form an orthonormal basis. We develop these results below.

(a) Theorem. All eigenvalues of the symmetric matrix $M$ are real.
Proof: Suppose $\lambda$ is an eigenvalue (real or complex) of $M$. Then there is at least one eigenvector $\alpha$ (which may have complex components) such that $M\alpha = \lambda\alpha$. Take the complex conjugate of this (keeping in mind that the elements of $M$ are real, so that the conjugate matrix $\bar{M} = M$):
$$M\bar{\alpha} = \bar{\lambda}\bar{\alpha}.$$

Multiply the expression $M\alpha = \lambda\alpha$ by $\bar{\alpha}^T$ and the expression $M\bar{\alpha} = \bar{\lambda}\bar{\alpha}$ by $\alpha^T$, and subtract the two resulting expressions to get
$$\bar{\alpha}^T M\alpha - \alpha^T M\bar{\alpha} = (\lambda - \bar{\lambda})\,\bar{\alpha}^T\alpha$$
since $\bar{\alpha}^T\alpha = \sum_{i=1}^n \bar{\alpha}_i\alpha_i$ is a real-valued scalar (so $\alpha^T\bar{\alpha} = \bar{\alpha}^T\alpha$). Note that $\bar{\alpha}^T\alpha$ is also strictly positive as long as $\alpha \neq 0$, since $\bar{\alpha}_i\alpha_i = |\alpha_i|^2 > 0$ as long as $\alpha_i \neq 0$. Next, since the quadratic form $\bar{\alpha}^T M\alpha$ is just a number and the transpose of a number is just the number itself, it follows that
$$\bar{\alpha}^T M\alpha = \left(\bar{\alpha}^T M\alpha\right)^T = \alpha^T M^T\bar{\alpha} = \alpha^T M\bar{\alpha}$$
since $M$ is symmetric. Hence $\bar{\alpha}^T M\alpha - \alpha^T M\bar{\alpha} = 0$, so we must have $\lambda = \bar{\lambda}$, so that $\lambda$ is real.
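Numerically, this is why eigensolvers for symmetric matrices can work entirely in real arithmetic. A quick check with a generic symmetric matrix (an illustrative sketch of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
M = B + B.T                      # symmetric by construction

# Even the general (complex-capable) eigensolver returns eigenvalues
# with vanishing imaginary part when M is symmetric.
eigs = np.linalg.eigvals(M)
assert np.allclose(eigs.imag, 0.0)

# eigvalsh exploits symmetry and returns real values directly (ascending).
assert np.allclose(np.sort(eigs.real), np.linalg.eigvalsh(M))
print("all eigenvalues real")
```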

(b) Theorem. If $M$ is symmetric, $M$ is diagonalizable.
Proof: Suppose not. Then the characteristic polynomial of $M$ has some repeated factor $(\lambda - c_i)^{r_i}$, and the null space of $M - c_i I$ has dimension strictly less than $r_i$. In this case, our previous work shows that there exists some vector $\alpha$ in the null space of $[M - c_i I]^2$ which is not in the null space of $M - c_i I$ but which is mapped into the null space of $M - c_i I$ by this matrix. Hence,
$$\beta = [M - c_i I]\alpha$$
is non-zero and lies in the null space of $M - c_i I$. Since $\beta$ is non-zero, $\|\beta\|^2 > 0$. But
$$\|\beta\|^2 = \beta^T\beta = \alpha^T [M - c_i I]^T [M - c_i I]\alpha = \alpha^T [M - c_i I]^2\alpha = 0$$
since $M$ is symmetric (so that $[M - c_i I]^T = M - c_i I$).
But this is a contradiction. Hence, the nullity of M ci I must have maximal dimension ri ; and the matrix is diagonalizable. (c) Theorem. The eigenvectors corresponding to di¤erent eigenvalues of a symmetric matrix are orthogonal. Proof: Let M i= i i and for i 6= j: Then

M

j

=

j

T j M

i

=

T i j

=

j

j

i

and T i M

30

j

T i

j:

Since

T j M

i

T i M

=

j,

(

subtracting the expressions above yields T i

j)

i

j

=0

which then implies Ti j = 0 since the eigenvalues are assumed to be distinct. Note that this argument won’t work is M is not symmetric, since the eigenvalues of the transpose of a non-symmetric matrix need not be the same as those of the original matrix. Finally, since we know that M is diagonalizable, if some eigenvalue is repeated, we can choose the basis of the invariant subspace corresponding to that eigenvalue so that it is orthogonal. In most applications, since the actual length of the eigenvectors doesn’t matter, we normalize the basis vectors to have norm one. In this case, we obtain what is called an orthonormal basis. (d) Here is another (longer) proof that if M is symmetric, it is diagonalizable. If all the eigenvectors of M are distinct, the argument above shows that there is a full orthonormal basis for the vector space V consisting of eigenvectors of M; and obviously, in this basis, M is diagonal. So, assume some eigenvalue j occurs with multiplicity r 2: We will show that we can …nd r linearly independent orthonormal eigenvectors corresponding to the eigenvalue j : Since there cannot be more than r linearly independent eigenvectors corresponding to this eigenvalue, this will imply that M is diagonalizable. We proceed by construction. Let 1 be an eigenvector (having norm 1) of M corresponding to i : Choose vectors u1 ; :::; un 1 such that [ 1 ; u1 ; :::; un 1 ] is an orthonormal basis for V: Consider the matrix Q1 = [

1 ; u1 ; :::; un 1 ] :

Then M Q1

= [M = [ j

1 ; M u1 ; :::; M un 1 ] 1 ; M u1 ; :::; M un 1 ]

and 2

6 6 QT1 M Q1 = 6 4

T 1 T j u1

j

T 1 M u1 T u1 M u1

1 1

.. .

T j un 1 1

.. . 1 M u1

uTn

..

. uTn

Since the basis is orthonormal, we know that uTj 1; :::; n 1; while T 1 M uj

= uTj M

1

T j uj

1

= =

0: 31

3

T 1 M un 1 T u1 M u n 1

.. . 1 M un 1

by symmetry of M

1

7 7 7: 5

= 0 for j =

Hence, letting

ij

= uTi M uj and

=

M1 = QT1 M Q1 =

ij

; we have 0

j

0

where the matrix is symmetric and of order n 1: Since M1 and M are similar, they have the same characteristic polynomial. For M1 ; we have chM ( ) = chM1 ( ) = (

j) j

Ij :

Thus, since we have assumed that j is a root of positive multiplicity for M; it must also be a root of ch ( ) = j Ij : It then follows that the matrix [M1

j I]

=

0 0

0 j In 1

has null space of at least dimension two, so we can …nd a second eigenvector 2 which is orthogonal to 1 and linearly independent of it. Indeed, since 1 is represented by the …rst unit vector in the basis consisting of the columns of Q1 ; if we take 2 to have its …rst component zero, it will be orthogonal to 1 : The rest of the components of 2 are then determined by …nding a non-zero vector ^ 2 such that [ j In 1 ] ^ 2 = 0: Constructing a new basis [ 1 ; 2 ; u1 :::; un 2 ] and forming the orthogonal matrix Q2 as above, we obtain 2 3 0 0 j 0 5 M2 = 4 0 j 0 0 0 with 0 a symmetric order n 2 matrix. Continuing in this fashion, we eventually obtain a diagonal matrix with all eigenvalues on the main diagonal, each repeated as many times as their multiplicity in the characteristic polynomial.

(e) Diagonal matrices are particularly easy to work with. For example, consider the de…niteness criterion for determining whether we are at a maximum or minimum in an optimization problem. For a minimum, we need the second derivative matrix (which is symmetric) to be positive de…nite. This means that for any vectors x 2 Rn , the quadratic form xT M x > 0: If we de…ne a change of variable using the basis of eigenvectors of M; y = Hx; where H is the orthogonal matrix consisting of the eigenvectors of M; we have xT M x = y T HM H T y = yT y

32

where

is diagonal. From this, it is clear that yT y =

n X

2 i yi

i=1

from which it follows that the quadratic form will be positive if and only if every i > 0: This condition also gives the familiar condition on the minors of the matrix M that one frequently sees, which requires that all the principle minors of M be strictly positive. With a small amount of e¤ort, you can show that the principle minors are una¤ected by orthogonal similarity transformations, from which the desired result follows. For negative de…niteness (which we need to ensure we are at a maximum), the matrix M of second partial derivatives must be negative de…nite, which occurs if and only if all eigenvalues are negative. This, in turn yields the condition on the principle minors requiring that they alternate in sign. (f) A second example of the usefulness of the diagonalization process can be found in studying di¤erential equations. Suppose we have a di¤erential equation of the form x_ = Ax where A is some diagonalizable matrix. Clearly, 0 is a steady-state for this system, since A0 = 0 implies the system doesn’t move. What, if anything can be say about the stability of this system if we start at a point other than x = 0? Make a change of variable to y = H 1 x where H is a matrix of eigenvectors of A: In terms of the new variables, we have x_ = H y_ = AHy or y_

= H 1 AHy = y:

Since is diagonal, the system of di¤erential equations is said to uncouple, and we can treat each component of the system as an independent system y_ i = i yi which can be solved by considering y_ i = yi

i:

Integrating, and recognizing that the left-hand side integrates to the log of yi ; we get ln yi (t) = i t + C 33

or yi (t) = Ae

it

where A = eC : Clearly, then, limt!1 yi (t) = 0 if and only if i < 0 (assuming i is real; if i is complex, then we require that its real part be negative).

34