Spectral Theory in Hilbert Space

Lectures, fall 2008

Christer Bennewitz

Copyright © 1993–2008 by Christer Bennewitz

Preface

The aim of these notes is to present a reasonably complete exposition of Hilbert space theory, up to and including the spectral theorem for the case of a (possibly unbounded) selfadjoint operator. As an application, eigenfunction expansions for regular and singular boundary value problems of ordinary differential equations are discussed. We first do this for the simplest Sturm-Liouville equation, and then, using very similar methods of proof, for a fairly general type of first order systems, which include so called Hamiltonian systems.

Prerequisites are modest, but a good understanding of Lebesgue integration is assumed, including the concept of absolute continuity. Some previous exposure to linear algebra and basic functional analysis (uniform boundedness principle, closed graph theorem and maybe weak∗ compactness of the unit ball in a (separable) Banach space) is expected from the reader, but in the two places where we could have used weak∗ compactness, a direct proof has been given. The standard proofs of the Banach-Steinhaus and closed graph theorems are given in Appendix A. A brief exposition of the Riemann-Stieltjes integral, sufficient for our needs, is given in Appendix B. A few elementary facts about ordinary linear differential equations are used; these are proved in Appendix C. In addition, some facts from elementary analytic function theory are used. Apart from this the lectures are essentially self-contained.

Egevang, August 2008
Christer Bennewitz


Contents

Preface
Chapter 0. Introduction
Chapter 1. Linear spaces
 Exercises for Chapter 1
Chapter 2. Spaces with scalar product
 Exercises for Chapter 2
Chapter 3. Hilbert space
 Exercises for Chapter 3
Chapter 4. Operators
 Exercises for Chapter 4
Chapter 5. Resolvents
 Exercises for Chapter 5
Chapter 6. Nevanlinna functions
Chapter 7. The spectral theorem
 Exercises for Chapter 7
Chapter 8. Compactness
 Exercises for Chapter 8
Chapter 9. Extension theory
 1. Symmetric operators
 2. Symmetric relations
 Exercises for Chapter 9
Chapter 10. Boundary conditions
 Exercises for Chapter 10
Chapter 11. Sturm-Liouville equations
 Exercises for Chapter 11
Chapter 12. Inverse spectral theory
 1. Asymptotics of the m-function
 2. Uniqueness theorems
Chapter 13. First order systems
 Exercises for Chapter 13
Chapter 14. Eigenfunction expansions
 Exercises for Chapter 14
Chapter 15. Singular problems
 Exercises for Chapter 15
Appendix A. Functional analysis
Appendix B. Stieltjes integrals
 Exercises for Appendix B
Appendix C. Linear first order systems
Bibliography

CHAPTER 0

Introduction

Hilbert space is the most immediate generalization to the infinite dimensional case of finite dimensional Euclidean spaces (i.e., essentially R^n for real, and C^n for complex vector spaces). Probably its most important uses, and certainly its historical roots, are in spectral theory. Spectral theory for differential equations originates with the method of separation of variables, used to solve many of the equations of mathematical physics. This leads directly to the problem of expanding an ‘arbitrary’ function in terms of eigenfunctions of the reduced equation, which is the central problem of spectral theory.

A simple example is that of a vibrating string. The string is supposed to be stretched over an interval I ⊂ R, be fixed at the endpoints a, b and vibrate transversally (i.e., in a direction perpendicular to the interval I) in a plane containing I. The string can then be described by a real-valued function u(x, t) giving the location at time t of the point of the string which moves on the normal to I through the point x ∈ I. In appropriate units the function u will then (for sufficiently small vibrations, i.e., we are dealing with a linearization of a more accurate model) satisfy the following equation:

(0.1)   ∂²u/∂x² = ∂²u/∂t²                       (wave equation)
        u(a, t) = u(b, t) = 0 for t > 0         (boundary conditions)
        u(x, 0) and ∂u/∂t (x, 0) given          (initial conditions).

The idea in separating variables is first to disregard the initial conditions and try to find solutions to the differential equation that satisfy the boundary condition and are standing waves, i.e., of the special form u(x, t) = f(x)g(t). The linearity of the equation implies that sums of solutions are also solutions (the superposition principle), so if we can find enough standing waves there is the possibility that any solution might be a superposition of standing waves. By substituting f(x)g(t) for u in (0.1) it follows that f″(x)/f(x) = g″(t)/g(t). Since the left hand side does not depend on t, and the right hand side not on x, both sides are in fact equal to a constant −λ. Since the general solution of the equation g″(t) + λg(t) = 0 is a linear combination of sin(√λ t) and cos(√λ t), it follows that

(0.2)   −f″ = λf in I,   f(a) = f(b) = 0,

and that g(t) = A sin(√λ t) + B cos(√λ t) for some constants A and B. As is easily seen, (0.2) has non-trivial solutions only when λ is an element of the sequence {λj}_{j=1}^∞, where λj = (jπ/(b − a))². The numbers λ1, λ2, . . . are the eigenvalues of (0.2), and the corresponding solutions (non-trivial multiples of sin(jπ(x − a)/(b − a))) are the eigenfunctions of (0.2). The set of eigenvalues is called the spectrum of (0.2). In general, a superposition of standing waves is therefore of the form

u(x, t) = Σ_j (Aj sin(√λj t) + Bj cos(√λj t)) sin(√λj (x − a)).

If we assume that we may differentiate the sum term by term, the initial conditions of (0.1) therefore require that

Σ_j Bj sin(jπ(x − a)/(b − a))   and   Σ_j Aj (jπ/(b − a)) sin(jπ(x − a)/(b − a))

are given functions. The question of whether (0.1) has a solution which is a superposition of standing waves for arbitrary initial conditions is then clearly seen to amount to the question whether an ‘arbitrary’ function may be written as a series Σ uj, where each term is an eigenfunction of (0.2), i.e., a solution for λ equal to one of the eigenvalues. We shall eventually show this to be possible for much more general differential equations than (0.1).

The technique above was used systematically by Fourier in his Théorie analytique de la chaleur (1822) to solve problems of heat conduction, which in the simplest cases (like our example) lead to what are now called Fourier series expansions. Fourier was never able to give a satisfactory proof of the completeness of the eigenfunctions, i.e., the fact that essentially arbitrary functions can be expanded in Fourier series.
This problem was solved by Dirichlet somewhat later, and at about the same time (1830) Sturm and Liouville independently showed weaker completeness results for more general ordinary differential equations of the form −(pu′)′ + qu = λu, with boundary conditions of the form Au + Bpu′ = 0 to be satisfied at the endpoints of the given interval. Here p and q are given, sufficiently regular functions, and A, B are given real constants, not both 0 and possibly different at the two endpoints. The Fourier cases correspond to p ≡ 1, q ≡ 0 and A or B equal to 0.

For the Fourier equation, the distance between successive eigenvalues decreases as the length of the base interval increases, and as the base interval approaches the whole real line, the eigenvalues accumulate everywhere on the positive real line. The Fourier series is then replaced by a continuous superposition, i.e., an integral, and we get the classical Fourier transform. Thus a continuous spectrum appears,

and this is typical of problems where the basic domain is unbounded, or the coefficients of the equation have sufficiently bad singularities at the boundary. In 1910 Hermann Weyl [12] gave the first rigorous treatment, in the case of an equation of Sturm-Liouville type, of cases where continuous spectra can occur. Weyl’s treatment was based on the then recently proved spectral theorem of Hilbert. Hilbert’s theorem was a generalization of the usual diagonalization of a quadratic form to the case of infinitely many variables. Hilbert applied it to certain integral operators, but it is not directly applicable to differential operators, since these are ‘unbounded’ in a sense we will discuss in Chapter 4.

With the creation of quantum mechanics in the late 1920s these matters became of basic importance to physics, and mathematicians, who had not advanced much beyond the results of Weyl, took the matter up again. The outcome was the general spectral theorem, generally attributed to John von Neumann (1928), although essentially the same theorem had been proved by Torsten Carleman in 1923, in a less abstract setting. Von Neumann’s theorem is an abstract result, and detailed applications to differential operators of reasonable generality had to wait until the early 1950s. In the meantime many independent results about expansions in eigenfunctions had been given, particularly for ordinary differential equations.

In these lectures we will prove von Neumann’s theorem. We will then apply this theorem to differential equations, including those that give rise to the classical Fourier series and Fourier transform. Once one has a result about expansion in eigenfunctions, a host of other questions appear, some of which we will discuss in these notes. Sample questions are:

• How do eigenvalues and eigenfunctions depend on the domain I and on the form of the equation (its order, coefficients etc.)? A partial answer is given if one can calculate the asymptotic distribution of the eigenvalues, i.e., approximate the growth of λj as a function of j. For simple ordinary differential operators this can be done by fairly elementary means. The first such result for a partial differential equation was given by Weyl in 1912, and his method was later improved and extended by Courant.

• How well does the expansion converge when expanding different classes of functions? Again, for ordinary differential operators some questions of this type can be handled by elementary methods, but in general the answer lies in the explicit asymptotic behavior of the so called spectral projectors. The first such asymptotic result was given by Carleman in 1934, and his method has been the basis for most later results.

• Can the equation be reconstructed if the spectrum is known? If not, what else must one know? If different equations can have the same spectrum, how many different equations? What do they have in common? Questions like these are part of what is called inverse spectral theory. Really satisfactory answers have only been obtained for the equation −u″ + qu = λu, notably by Gelfand and Levitan in the early 1950s. Pioneering work was done by Göran Borg in the 1940s.

• Another aspect of the first point is the following: given a ‘base’ equation (corresponding to a ‘free particle’ in quantum mechanics) and another equation which outside some bounded region is close to the base equation (an ‘obstacle’ has been introduced), how can one relate the eigenfunctions for the two equations? The main questions of so called scattering theory are of this type.

• Related to the previous point is the problem of inverse scattering. Here one is given scattering data, i.e., the answer to the question in the previous point, and the question is whether the equation is determined by scattering data, whether there is a method for reconstructing the equation from the scattering data, and similar questions. Many questions of this kind are of great importance to applications.

CHAPTER 1

Linear spaces

This chapter is intended to be a quick review of the basic facts about linear spaces. In the definition below the set K can be any field, although usually only the fields R of real numbers and C of complex numbers are of interest.

Definition 1.1. A linear space or vector space over K is a set L provided with an addition +, which to every pair of elements u, v ∈ L associates an element u + v ∈ L, and a multiplication, which to every λ ∈ K and u ∈ L associates an element λu ∈ L. The following rules for calculation hold:
(1) (u + v) + w = u + (v + w) for all u, v and w in L. (associativity)
(2) There is an element 0 ∈ L such that u + 0 = 0 + u = u for every u ∈ L. (existence of neutral element)
(3) For every u ∈ L there exists v ∈ L such that u + v = v + u = 0. One denotes v by −u. (existence of additive inverse)
(4) u + v = v + u for all u, v ∈ L. (commutativity)
(5) λ(u + v) = λu + λv for all λ ∈ K and all u, v ∈ L.
(6) (λ + µ)u = λu + µu for all λ, µ ∈ K and all u ∈ L.
(7) λ(µu) = (λµ)u for all λ, µ ∈ K and all u ∈ L.
(8) 1u = u for all u ∈ L.

If K = R we have a real linear space, if K = C a complex linear space. Axioms 1–3 above say that L is a group under addition, axiom 4 that the group is abelian (or commutative). Axioms 5 and 6 are distributive laws and axiom 7 an associative law related to the multiplication by scalars, whereas axiom 8 gives a kind of normalization for the multiplication by scalars. Note that by restricting oneself to multiplying only by real numbers, any complex space may also be viewed as a real linear space. Conversely, every real linear space can be ‘extended’ to a complex linear space (Exercise 1.1). We will therefore only consider complex linear spaces in the sequel.

Let M be an arbitrary set and let C^M be the set of complex-valued functions defined on M. Then C^M, provided with the obvious definitions of the linear operations, is a complex linear space (Exercise 1.2). In the case when M = {1, 2, . . . , n} one writes C^n instead of C^{1,2,...,n}. An element u ∈ C^n is of course given by the values u(1), u(2), . . . , u(n)


of u so one may also regard C^n as the set of ordered n-tuples of complex numbers. The corresponding real space is the usual R^n.

If L is a linear space and V a subset of L which is itself a linear space, using the linear operations inherited from L, one says that V is a linear subspace of L.

Proposition 1.2. A non-empty subset V of L is a linear subspace of L if and only if u + v ∈ V and λu ∈ V for all u, v ∈ V and λ ∈ C.

The proof is left as an exercise (Exercise 1.3). If u1, u2, . . . , uk are elements of a linear space L we denote by [u1, u2, . . . , uk] the linear hull of u1, u2, . . . , uk, i.e., the set of all linear combinations λ1u1 + · · · + λkuk, where λ1, . . . , λk ∈ C. It is not hard to see that linear hulls are always subspaces (Exercise 1.5). One says that u1, . . . , uk generate L if L = [u1, . . . , uk], and any linear space which is the linear hull of a finite number of its elements is called finitely generated or finite-dimensional. A linear space which is not finitely generated is called infinite-dimensional.

It is clear that if, for example, u1 is a linear combination of u2, . . . , uk, then [u1, . . . , uk] = [u2, . . . , uk]. If none of u1, . . . , uk is a linear combination of the others one says that u1, . . . , uk are linearly independent. It is clear that any finitely generated space has a set of linearly independent generators; one simply starts with a set of generators and goes through them one by one, at each step discarding any generator which is a linear combination of those coming before it. A set of linearly independent generators for L is called a basis for L. A given finite-dimensional space L can of course be generated by many different bases. However, a fundamental fact is that all such bases of L have the same number of elements, called the dimension of L. This follows immediately from the following theorem.

Theorem 1.3. Suppose u1, . . . , uk generate L, and that v1, . . . , vj are linearly independent elements of L. Then j ≤ k.

Proof. Since u1, . . . , uk generate L we have v1 = x11u1 + · · · + x1kuk for some coefficients x11, . . . , x1k, which are not all 0 since v1 ≠ 0. By renumbering u1, . . . , uk we may assume x11 ≠ 0. Then u1 = (v1 − x12u2 − · · · − x1kuk)/x11, and therefore v1, u2, . . . , uk generate L. In particular, v2 = x21v1 + x22u2 + · · · + x2kuk for some coefficients x21, . . . , x2k. We can not have x22 = · · · = x2k = 0 since v1, v2 are linearly independent. By renumbering u2, . . . , uk, if necessary, we may assume x22 ≠ 0. It follows as before that v1, v2, u3, . . . , uk generate L. We can continue in this way until we run out of either v’s (if j ≤ k) or u’s (if j > k). But if j > k we would get that v1, . . . , vk generate L, in particular that vj is a linear combination of v1, . . . , vk, which contradicts the linear independence of the v’s. Hence j ≤ k. □

For a finite-dimensional space the existence and uniqueness of coordinates for any vector with respect to an arbitrary basis now follows


easily (Exercise 1.6). More importantly for us, it is also clear that L is infinite dimensional if and only if every linearly independent subset of L can be extended to a linearly independent subset of L with arbitrarily many elements. This usually makes it quite easy to see that a given space is infinite dimensional (Exercise 1.7).

If V and W are both linear subspaces of some larger linear space L, then the linear span [V, W] of V and W is the set [V, W] = {u | u = v + w where v ∈ V and w ∈ W}. This is obviously a linear subspace of L. If in addition V ∩ W = {0}, then for any u ∈ [V, W] there are unique elements v ∈ V and w ∈ W such that u = v + w. In this case [V, W] is called the direct sum of V and W and is denoted by V ∔ W. The proof of these facts is left as an exercise (Exercise 1.9).

If V is a linear subspace of L we can create a new linear space L/V, the quotient space of L by V, in the following way. We say that two elements u and v of L are equivalent if u − v ∈ V. It is immediately seen that any u is equivalent to itself, that u is equivalent to v if v is equivalent to u, and that if u is equivalent to v, and v to w, then u is equivalent to w. It then easily follows that we may split L into equivalence classes such that every vector is equivalent to all vectors in the same equivalence class, but not to any other vectors. The equivalence class containing u is denoted by u + V, and then u + V = v + V precisely if u − v ∈ V. We now define L/V as the set of equivalence classes, where addition is defined by (u + V) + (v + V) = (u + v) + V and multiplication by scalar as λ(u + V) = λu + V. It is easily seen that these operations are well defined and that L/V becomes a linear space with neutral element 0 + V (Exercise 1.10). One defines codim V = dim L/V. We end the chapter by a fundamental fact about quotient spaces.

Theorem 1.4. dim V + codim V = dim L.

We leave the proof for Exercise 1.11.
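Theorem 1.3 can be illustrated numerically. The sketch below assumes NumPy (the notes themselves contain no code), and the dimensions and random data are arbitrary example choices: it draws vectors from the span of k generators and checks that no more than k of them can be linearly independent.

```python
import numpy as np

rng = np.random.default_rng(0)

# k = 3 generators spanning a subspace of C^5 (random example data)
k = 3
gens = rng.standard_normal((5, k)) + 1j * rng.standard_normal((5, k))

# Draw 4 vectors from the span of the generators: columns of gens @ coeffs.
coeffs = rng.standard_normal((k, 4)) + 1j * rng.standard_normal((k, 4))
vectors = gens @ coeffs

# Theorem 1.3: at most k of these vectors can be linearly independent, so
# the rank of the matrix having them as columns cannot exceed k.
assert np.linalg.matrix_rank(vectors) <= k

# The generators themselves are (almost surely) linearly independent.
assert np.linalg.matrix_rank(gens) == k
```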


Exercises for Chapter 1

Exercise 1.1. Let L be a real linear space, and let V be the set of ordered pairs (u, v) of elements of L with addition defined componentwise. Show that V becomes a complex linear space if one defines (x + iy)(u, v) = (xu − yv, xv + yu) for real x, y. Also show that L can be ‘identified’ with the subset of elements of V of the form (u, 0), in the sense that there is a one-to-one correspondence between the two sets preserving the linear operations (for real scalars).

Exercise 1.2. Let M be an arbitrary set and let C^M be the set of complex-valued functions defined on M. Show that C^M, provided with the obvious definitions of the linear operations, is a complex linear space.

Exercise 1.3. Prove Proposition 1.2.

Exercise 1.4. Let M be a non-empty subset of R^n. Which of the following choices of L make it into a linear subspace of C^M?
(1) L = {u ∈ C^M | |u(x)| < 1 for all x ∈ M}.
(2) L = C(M) = {u ∈ C^M | u is continuous in M}.
(3) L = {u ∈ C(M) | u is bounded on M}.
(4) L = L(M) = {u ∈ C^M | u is Lebesgue integrable over M}.

Exercise 1.5. Let L be a linear space and uj ∈ L, j = 1, . . . , k. Show that [u1, u2, . . . , uk] is a linear subspace of L.

Exercise 1.6. Show that if e1, . . . , en is a basis for L, then for each u ∈ L there are uniquely determined complex numbers x1, . . . , xn, called coordinates for u, such that u = x1e1 + · · · + xnen.

Exercise 1.7. Verify that L is infinite dimensional if and only if every linearly independent subset of L can be extended to a linearly independent subset of L with arbitrarily many elements. Then show that u1, . . . , uk are linearly independent if and only if λ1u1 + · · · + λkuk = 0 only for λ1 = · · · = λk = 0. Also show that C^M is finite-dimensional if and only if the set M has finitely many elements.

Exercise 1.8. Let M be an open subset of R^n. Verify that L is infinite-dimensional for each of the choices of L in Exercise 1.4 which make L into a linear space.

Exercise 1.9. Prove all statements in the penultimate paragraph of the chapter.

Exercise 1.10. Prove that if L is a linear space and V a subspace, then L/V is a well defined linear space.

Exercise 1.11. Prove Theorem 1.4.

CHAPTER 2

Spaces with scalar product

If one wants to do analysis in a linear space, some structure in addition to the linearity is needed. This is because one needs some way to define limits and continuity, and this requires an appropriate definition of what a neighborhood of a point is. Thus one must introduce a topology in the space. We will not deal with the general notion of topological vector space here, but only with the following particularly convenient way to introduce a topology in a linear space, which also covers most cases of importance to analysis.

A metric space is a set M provided with a metric, which is a function d : M × M → R such that for any x, y, z ∈ M the following holds:
(1) d(x, y) ≥ 0, and = 0 if and only if x = y. (positive definite)
(2) d(x, y) = d(y, x). (symmetric)
(3) d(x, y) ≤ d(x, z) + d(z, y). (triangle inequality)

A neighborhood of x ∈ M is then a subset O of M such that for some ε > 0 the set O contains all y ∈ M for which d(x, y) < ε. An open set is a set which is a neighborhood of all its points, and a closed set is one with an open complement. One says that a sequence x1, x2, . . . of elements in M converges to x ∈ M if d(xj, x) → 0 as j → ∞.

The most convenient, but not the only important, way of introducing a metric in a linear space L is via a norm (Exercise 2.1). A norm on L is a function ‖·‖ : L → R such that for any u, v ∈ L and λ ∈ C:
(1) ‖u‖ ≥ 0, and = 0 if and only if u = 0. (positive definite)
(2) ‖λu‖ = |λ|‖u‖. (positive homogeneous)
(3) ‖u + v‖ ≤ ‖u‖ + ‖v‖. (triangle inequality)

The usual norm in the real space R³ is of course obtained from the dot product (x1, x2, x3) · (y1, y2, y3) = x1y1 + x2y2 + x3y3 by setting ‖x‖ = √(x · x). For an infinite-dimensional linear space L, it is sometimes possible to define a norm similarly by setting ‖u‖ = √⟨u, u⟩, where ⟨·, ·⟩ is a scalar product on L. A scalar product is a function L × L → C such that for all u, v and w in L and all λ, µ ∈ C the following holds:
(1) ⟨λu + µv, w⟩ = λ⟨u, w⟩ + µ⟨v, w⟩. (linearity in first argument)
(2) ⟨u, v⟩ = ⟨v, u⟩*, where * denotes complex conjugation. (Hermitian symmetry)
(3) ⟨u, u⟩ ≥ 0, with equality only if u = 0. (positive definite)

If instead of (3) there holds only
(3′) ⟨u, u⟩ ≥ 0, (positive semi-definite)


one speaks about a semi-scalar product. Note that (2) implies that ⟨u, u⟩ is real, so that (3) makes sense. Also note that by combining (1) and (2) we have ⟨w, λu + µv⟩ = λ*⟨w, u⟩ + µ*⟨w, v⟩. One says that the scalar product is anti-linear in its second argument (warning: in the so called Dirac formalism in quantum mechanics the scalar product is instead anti-linear in the first argument and linear in the second). Together with (1) this makes the scalar product into a sesqui-linear (= 1½-linear) form. In words: a scalar product is a Hermitian, sesqui-linear and positive definite form.

We now assume that we have a scalar product on L and define ‖u‖ = √⟨u, u⟩ for any u ∈ L. To show that this definition makes ‖·‖ into a norm we need the following basic theorem.

Theorem 2.1 (Cauchy-Schwarz). If ⟨·, ·⟩ is a semi-scalar product on L, then for all u, v ∈ L

|⟨u, v⟩|² ≤ ⟨u, u⟩⟨v, v⟩.

Proof. For arbitrary complex λ we have

0 ≤ ⟨λu + v, λu + v⟩ = |λ|²⟨u, u⟩ + λ⟨u, v⟩ + λ*⟨v, u⟩ + ⟨v, v⟩.

For λ = −r⟨v, u⟩ with real r we obtain

0 ≤ r²|⟨u, v⟩|²⟨u, u⟩ − 2r|⟨u, v⟩|² + ⟨v, v⟩.

If ⟨u, u⟩ = 0 but ⟨u, v⟩ ≠ 0 this expression becomes negative for r > ⟨v, v⟩/(2|⟨u, v⟩|²), which is a contradiction. Hence ⟨u, u⟩ = 0 implies ⟨u, v⟩ = 0, so that the theorem is true in the case when ⟨u, u⟩ = 0. If ⟨u, u⟩ ≠ 0 we set r = 1/⟨u, u⟩ and obtain, after multiplication by ⟨u, u⟩, that 0 ≤ −|⟨u, v⟩|² + ⟨u, u⟩⟨v, v⟩, which proves the theorem. □

In the case of a scalar product, defining ‖u‖ = √⟨u, u⟩, we may write the Cauchy-Schwarz inequality as |⟨u, v⟩| ≤ ‖u‖‖v‖. In this case it is also easy to see when there is equality in Cauchy-Schwarz’ inequality. To see that ‖·‖ is a norm on L the only non-trivial point is to verify that the triangle inequality holds; but this follows from Cauchy-Schwarz’ inequality (Exercise 2.4).
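On the finite-dimensional space C⁴ the scalar product axioms and Theorem 2.1 can be verified numerically. The sketch below assumes NumPy and uses the standard scalar product on C⁴ (linear in the first argument, as in these notes); the particular vectors and scalars are arbitrary test data.

```python
import numpy as np

def sp(u, v):
    """Standard scalar product on C^n: linear in the first argument,
    anti-linear in the second, following the convention of these notes.
    (np.vdot conjugates its first argument, hence the swap.)"""
    return np.vdot(v, u)

rng = np.random.default_rng(1)
u, v, w = (rng.standard_normal(4) + 1j * rng.standard_normal(4) for _ in range(3))
lam, mu = 2.0 - 1.0j, 0.5 + 3.0j

# (1) linearity in the first argument
assert np.isclose(sp(lam * u + mu * v, w), lam * sp(u, w) + mu * sp(v, w))
# (2) Hermitian symmetry
assert np.isclose(sp(u, v), np.conj(sp(v, u)))
# anti-linearity in the second argument
assert np.isclose(sp(w, lam * u + mu * v),
                  np.conj(lam) * sp(w, u) + np.conj(mu) * sp(w, v))
# (3) positivity, and the Cauchy-Schwarz inequality of Theorem 2.1
norm = lambda x: np.sqrt(sp(x, x).real)
assert sp(u, u).real > 0
assert abs(sp(u, v)) <= norm(u) * norm(v) + 1e-12
# equality holds when one vector is a multiple of the other
assert np.isclose(abs(sp(u, 3j * u)), norm(u) * norm(3j * u))
```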
Recall that in a finite dimensional space with scalar product it is particularly convenient to use an orthonormal basis, since this makes it very easy to calculate the coordinates of any vector. In fact, if x1, . . . , xn are the coordinates of u in the orthonormal basis e1, . . . , en, then xj = ⟨u, ej⟩ (recall that e1, . . . , en is called orthonormal if all basis elements have norm 1 and ⟨ej, ek⟩ = 0 for j ≠ k). Given an arbitrary basis it is easy to construct an orthonormal basis by use of the Gram-Schmidt method (see the proof of Lemma 2.2).

In an infinite-dimensional space one can not find a (finite) basis. The best one can hope for are infinitely many vectors e1, e2, . . . such that each finite subset is linearly independent, and any vector is the limit in norm of a sequence of finite linear combinations of e1, e2, . . . . Again, it will turn out to be very convenient if e1, e2, . . . is an orthonormal sequence, i.e., ‖ej‖ = 1 for j = 1, 2, . . . and ⟨ej, ek⟩ = 0 for j ≠ k. The following lemma is easily proved by use of the Gram-Schmidt procedure.


Lemma 2.2. Any infinite-dimensional linear space L with scalar product contains an orthonormal sequence.

Proof. According to Chapter 1 we can find a linearly independent sequence in L, i.e., a sequence u1, u2, . . . such that u1, . . . , uk are linearly independent for any k. Put e1 = u1/‖u1‖ and v2 = u2 − ⟨u2, e1⟩e1. Next put e2 = v2/‖v2‖. If we have already found e1, . . . , ek, put

v_{k+1} = u_{k+1} − Σ_{j=1}^k ⟨u_{k+1}, ej⟩ej   and   e_{k+1} = v_{k+1}/‖v_{k+1}‖.

I claim that this procedure will lead to a well defined orthonormal sequence e1, e2, . . . . This is left for the reader to verify (Exercise 2.6). □

Supposing we have an orthonormal sequence e1, e2, . . . in L, a natural question is: how well can one approximate (in the norm of L) an arbitrary vector u ∈ L by finite linear combinations of e1, e2, . . . ? Here is the answer:

Lemma 2.3. Suppose e1, e2, . . . is an orthonormal sequence in L and put, for any u ∈ L, ûj = ⟨u, ej⟩. Then we have

(2.1)   ‖u − Σ_{j=1}^k λj ej‖² = ‖u‖² − Σ_{j=1}^k |ûj|² + Σ_{j=1}^k |λj − ûj|²

for any complex numbers λ1, . . . , λk.

The proof is by calculation (Exercise 2.7). The interpretation of Lemma 2.3 is very interesting. The identity (2.1) says that if we want to choose a linear combination Σ_{j=1}^k λj ej of e1, . . . , ek which approximates u well in norm, the best choice of coefficients is to take λj = ûj, j = 1, . . . , k. Furthermore, with this choice, the error is given exactly by ‖u − Σ_{j=1}^k ûj ej‖² = ‖u‖² − Σ_{j=1}^k |ûj|². One calls the coefficients û1, û2, . . . the (generalized) Fourier coefficients of u with respect to the orthonormal sequence e1, e2, . . . . The following theorem is an immediate consequence of Lemma 2.3 (Exercise 2.8).

Theorem 2.4 (Bessel’s inequality). For any u the series Σ_{j=1}^∞ |ûj|² converges and one has

Σ_{j=1}^∞ |ûj|² ≤ ‖u‖².

Another immediate consequence of Lemma 2.3 is the next theorem (cf. Exercise 2.9).

Theorem 2.5 (Parseval’s formula). The series Σ_{j=1}^∞ ûj ej converges (in norm) to u if and only if Σ_{j=1}^∞ |ûj|² = ‖u‖².

There is also a slightly more general form of Parseval’s formula.

Corollary 2.6. Suppose Σ_{j=1}^∞ |ûj|² = ‖u‖² for some u ∈ L. Then Σ_{j=1}^∞ ûj v̂j* = ⟨u, v⟩ for any v ∈ L.


Proof. Consider the following form on L:

[u, v] = ⟨u, v⟩ − Σ_{j=1}^∞ ûj v̂j*.

Since |ûj v̂j*| ≤ ½(|ûj|² + |v̂j|²) by the arithmetic-geometric inequality, Bessel’s inequality shows that the series is absolutely convergent. It follows that [·, ·] is a Hermitian, sesqui-linear form on L. Because of Bessel’s inequality it is also positive (but not positive definite). Thus [·, ·] is a semi-scalar product on L. Applying Cauchy-Schwarz’ inequality we obtain |[u, v]|² ≤ [u, u][v, v]. By assumption [u, u] = ‖u‖² − Σ_{j=1}^∞ |ûj|² = 0, so that the corollary follows. □

It is now obvious that the closest analogy to an orthonormal basis in an infinite-dimensional space with scalar product is an orthonormal sequence with the additional property of the following definition.

Definition 2.7. An orthonormal sequence in L is called complete if the Parseval identity ‖u‖² = Σ_{j=1}^∞ |ûj|² holds for every u ∈ L.

It is by no means clear that we can always find complete orthonormal sequences in a given space. This requires the space to be separable.

Definition 2.8. A metric space M is called separable if it has a dense, countable subset. This means a sequence u1, u2, . . . of elements of M such that for any u ∈ M, and any ε > 0, there is an element uj of the sequence for which d(u, uj) < ε.

The vast majority of spaces used in analysis are separable (Exercise 2.10), but there are exceptions (Exercise 2.12).

Theorem 2.9. An infinite-dimensional linear space with scalar product is separable if and only if it contains a complete orthonormal sequence.

The proof is left as an exercise (Exercise 2.11). Suppose e1, e2, . . . is a complete orthonormal sequence in L. We then know that any u ∈ L may be written as u = Σ_{j=1}^∞ ûj ej, where the series converges in norm. Furthermore the numerical series Σ_{j=1}^∞ |ûj|² converges to ‖u‖². The following question now arises: given a sequence λ1, λ2, . . . of complex numbers for which Σ_{j=1}^∞ |λj|² converges, does there exist an element u ∈ L for which λ1, λ2, . . . are the Fourier coefficients? Equivalently, does Σ_{j=1}^∞ λj ej converge to an element u ∈ L in norm? As it turns out, this is not always the case. The property required of L is that it is complete. Warning: this is a totally different property from the completeness of orthonormal sequences we discussed earlier! To explain what it is, we need a few definitions.

Definition 2.10. A Cauchy sequence in a metric space M is a sequence u1, u2, . . . of elements of M such that d(uj, uk) → 0 as j, k → ∞. More exactly: to every ε > 0 there exists a number ω such that d(uj, uk) < ε if j > ω and k > ω.

It is clear by use of the triangle inequality that any convergent sequence is a Cauchy sequence. Far more interesting is the fact that this implication may sometimes be reversed.

Definition 2.11. A metric space M is called complete if every Cauchy sequence converges to an element in M. A normed linear space which is complete is called a Banach space.

If the norm derives from a scalar product, Σ_{j=1}^∞ |λj|² converges and e1, e2, . . . is an orthonormal sequence, we put uk = Σ_{j=1}^k λj ej. If k < n we then have (the second equality is a special case of Lemma 2.3)

‖un − uk‖² = ‖Σ_{j=k+1}^n λj ej‖² = Σ_{j=k+1}^n |λj|² = Σ_{j=1}^n |λj|² − Σ_{j=1}^k |λj|².

Since Σ_{j=1}^∞ |λj|² converges, the right hand side → 0 as k, n → ∞. Hence u1, u2, . . . is a Cauchy sequence in L. It therefore follows that if L is complete, then Σ_{j=1}^∞ λj ej actually converges in norm to an element of L. On the other hand, if L is not complete and e1, e2, . . . is an orthonormal sequence, then λ1, λ2, . . . may be chosen so that the series Σ_{j=1}^∞ λj ej does not converge in L although Σ_{j=1}^∞ |λj|² is convergent (Exercise 2.14).

Exercises for Chapter 2

Exercise 2.1. Show that if ‖·‖ is a norm on L, then d(u, v) = ‖u − v‖ is a metric on L.

Exercise 2.2. Show that d(x, y) = arctan|x − y| is a metric on R which can be extended to a metric on the set of extended reals R̄ = R ∪ {−∞} ∪ {∞}.

Exercise 2.3. Consider the linear space C¹[0, 1], consisting of complex-valued, differentiable functions with continuous derivative, defined in [0, 1]. Show that the following are all norms on C¹[0, 1]:
• ‖u‖_∞ = sup_{0≤x≤1} |u(x)|,
• ‖u‖_1 = ∫₀¹ |u|,
• ‖u‖_{1,∞} = ‖u′‖_∞ + ‖u‖_∞.
Invent some more norms in the same spirit!

Exercise 2.4. Find all cases of equality in Cauchy-Schwarz’ inequality for a scalar product! Then show that ‖·‖, defined by ‖u‖ = √⟨u, u⟩, where ⟨·, ·⟩ is a scalar product, is a norm.


Exercise 2.5. Show that ⟨u, v⟩ = ∫_0^1 u(x)v̄(x) dx is a scalar product on the space C[0, 1] of continuous, complex-valued functions defined on [0, 1].

Exercise 2.6. Finish the proof of Lemma 2.2.

Exercise 2.7. Prove Lemma 2.3.

Exercise 2.8. Prove Bessel's inequality!

Exercise 2.9. Prove Parseval's formula!

Exercise 2.10. It is well known that the set of step functions which are identically 0 outside a compact subinterval of an interval I is dense in L²(I). Use this to show that L²(I) is separable.

Exercise 2.11. Prove Theorem 2.9. Hint: Use Gram-Schmidt!

Exercise 2.12. Let L be the set of complex-valued functions u of the form u(x) = Σ_{j=1}^k λ_j e^{iα_j x}, where α_1, ..., α_k are (a finite number of) different real numbers and λ_1, ..., λ_k are complex numbers. Show that L is a linear subspace of C(R) (the functions continuous on the real line) on which ⟨u, v⟩ = lim_{T→∞} (1/2T) ∫_{−T}^T u v̄ serves as a scalar product. Then show that the norm of e^{iαx} is 1 for any α ∈ R and that e^{iαx} is orthogonal to e^{iβx} as soon as α ≠ β. Conclude that L is not separable.

Exercise 2.13. Show that as metric spaces the set Q of rational numbers is not complete but the set R of reals is.

Exercise 2.14. Suppose L is a space with scalar product which is not complete, and that e_1, e_2, ... is a complete orthonormal sequence in L. Show that there exists a sequence λ_1, λ_2, ... of complex numbers such that Σ|λ_j|² < ∞ but Σλ_j e_j does not converge to any element of L.
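The Cauchy-sequence computation earlier in the chapter — ‖u_n − u_k‖² equals a tail of the series Σ|λ_j|² — can be watched numerically. Below is a sketch in Python/numpy; the coefficients λ_j = 1/j are an arbitrary illustrative choice of square-summable sequence, not taken from the text.

```python
import numpy as np

# Square-summable coefficients: sum of λ_j² is π²/6 < ∞ (λ_j = 1/j is
# an arbitrary illustrative choice).
N = 100000
lam = 1.0 / np.arange(1.0, N + 1)

# With e_j the j:th standard unit sequence in l², the partial sum
# u_k = λ_1 e_1 + ... + λ_k e_k is (λ_1,...,λ_k,0,0,...), so
#   ||u_n − u_k||² = Σ_{j=k+1}^n λ_j²,
# exactly as computed in the text.
def dist_sq(k, n):
    return np.sum(lam[k:n] ** 2)   # entries k..n−1 hold λ_{k+1},...,λ_n

# The tails go to 0, so the partial sums form a Cauchy sequence:
assert dist_sq(10, 20) > dist_sq(100, 200) > dist_sq(10000, 20000)
assert dist_sq(10000, N) < 1e-3

# Since l² is complete (next chapter), the limit is (λ_1, λ_2, ...)
# itself, with ||u||² = Σ λ_j² ≈ π²/6.
assert abs(np.sum(lam ** 2) - np.pi ** 2 / 6) < 1e-4
```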

CHAPTER 3

Hilbert space

A Hilbert space is a linear space H (we will as always assume that the scalars are complex numbers) provided with a scalar product such that the space is also complete, i.e., any Cauchy sequence (with respect to the norm induced by the scalar product) converges to an element of H. We denote the scalar product of u and v ∈ H by ⟨u, v⟩ and the norm of u by ‖u‖ = √⟨u, u⟩. It is usually required, and we will follow this convention, that the space be separable as well, i.e., that there is a countable, dense subset. Recall that this means that any element can be arbitrarily well approximated in norm by elements of this dense subset. In the present case this means that H has a complete orthonormal sequence, and conversely, if the space has a complete orthonormal sequence it is separable (Theorem 2.9). As is usual we will also assume that H is infinite-dimensional.

Example 3.1. The space ℓ² consists of all infinite sequences u = (u_1, u_2, ...) of complex numbers for which Σ|u_j|² < ∞, i.e., which are square summable. The scalar product of u with v = (v_1, v_2, ...) is defined as ⟨u, v⟩ = Σ u_j v̄_j. This series is absolutely convergent since |u_j v̄_j| ≤ (|u_j|² + |v_j|²)/2 and u, v are square summable. Show that ℓ² is a Hilbert space (Exercise 3.1)!

The space Hilbert himself dealt with was ℓ². Actually, any Hilbert space is isometrically isomorphic to ℓ², i.e., there is a bijective (one-to-one and onto) linear map H ∋ u ↦ û ∈ ℓ² such that ⟨u, v⟩ = ⟨û, v̂⟩ for any u and v in H (Exercise 3.2). This is the reason any complete, separable and infinite-dimensional space with scalar product is called a Hilbert space. However, there are infinitely many isomorphisms that will serve, and none of them is 'natural', i.e., in general to be preferred to any other, so the fact that all Hilbert spaces are isomorphic is not particularly useful in practice.

Example 3.2.
The most important example of a Hilbert space is L²(Ω, µ), where Ω is some domain in Rⁿ and µ is a (Radon) measure defined there; often µ is simply Lebesgue measure. The space consists of (equivalence classes of) complex-valued functions on Ω, measurable with respect to µ and with integrable square over Ω with respect to µ. That this space is separable and complete is proved in courses on the theory of integration.
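As a numerical illustration of Examples 3.1 and 3.2 (and of Parseval's identity from Chapter 2), the sketch below expands a function in the orthonormal sequence e_n(x) = e^{2πinx} of L²(0, 1) and checks that its Fourier coefficients form a square-summable sequence with Σ|û_n|² ≈ ‖u‖², which is the concrete content of the isomorphism with ℓ². The test function u(x) = x, the truncation and the quadrature are my own choices, not from the text.

```python
import numpy as np

# The functions e_n(x) = exp(2πinx), n in Z, form a complete
# orthonormal sequence in L²(0,1); u -> (Fourier coefficients of u)
# realizes the isomorphism with l².
x = np.linspace(0.0, 1.0, 20001)
h = x[1] - x[0]
trap = lambda f: np.sum((f[:-1] + f[1:]) / 2) * h   # trapezoid rule

u = x.copy()                           # u(x) = x, an element of L²(0,1)
norm_sq = trap(np.abs(u) ** 2)         # ||u||² = 1/3

# Fourier coefficients û_n = <u, e_n> = ∫ u(x) exp(-2πinx) dx.
ns = np.arange(-200, 201)
coeff = np.array([trap(u * np.exp(-2j * np.pi * n * x)) for n in ns])

# Parseval: the truncated sum Σ |û_n|² is close to ||u||².
assert abs(norm_sq - 1.0 / 3.0) < 1e-6
assert abs(np.sum(np.abs(coeff) ** 2) - norm_sq) < 1e-3
```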


Given a normed space one may of course ask whether there is a scalar product on the space which gives rise to the given norm in the usual way. Here is a simple criterion.

Lemma 3.3 (parallelogram identity). If u and v are elements of H, then ‖u + v‖² + ‖u − v‖² = 2‖u‖² + 2‖v‖².

Proof. A simple calculation gives ‖u ± v‖² = ⟨u ± v, u ± v⟩ = ‖u‖² ± (⟨u, v⟩ + ⟨v, u⟩) + ‖v‖². Adding this for the two signs, the parallelogram identity follows. □

The name parallelogram identity comes from the fact that the lemma can be interpreted geometrically, as saying that the sum of the squares of the lengths of the sides in a parallelogram equals the sum of the squares of the lengths of the diagonals. This is a theorem that can be found in Euclid's Elements. Given a normed space, Lemma 3.3 shows that a necessary condition for the norm to be associated with a scalar product is that the parallelogram identity holds for all vectors in the space. It was proved by von Neumann in 1929 that this is also sufficient (Exercise 3.3). We shall soon have another use for the parallelogram identity.

In practice it is quite common that one has a space with scalar product which is not complete (such a space is often called a pre-Hilbert space). In order to use Hilbert space theory, one must then embed the space in a larger space which is complete. The process is called completion and is fully analogous to the extension of the rational numbers to the reals, which is also done to make the Cauchy convergence principle valid. In very brief outline the process is as follows. Starting with a (not complete) normed linear space L, let L_c be the set of all Cauchy sequences in L. The set L_c is made into a linear space in the obvious way. We may embed L in L_c by identifying u ∈ L with the sequence (u, u, u, ...). In L_c we may introduce a semi-norm ‖·‖ (i.e., a norm except that there may be non-zero elements u in the space for which ‖u‖ = 0) by setting ‖(u_1, u_2, ...)‖ = lim ‖u_j‖.
Now let N_c be the subspace of L_c consisting of all elements with semi-norm 0, and put H = L_c/N_c, i.e., elements in L_c are identified whenever the distance between them is 0. One may now prove that ‖·‖ induces a norm on H under which H is complete, and that through the identification above we may consider the original space L as a dense subset of H. If the original norm came from a scalar product, then so will the norm of H. We leave it to the reader to verify the details, using the hints provided (Exercise 3.4).

The process above is satisfactory in that it shows that any normed space may be 'completed' (in fact, the same process works in any metric space). Equivalence classes of Cauchy sequences are of course rather abstract objects, but in concrete cases one can often identify the elements


of the completion of a given space with more concrete objects. So, for example, one may view L²(Ω, µ) as the completion, in the appropriate norm, of the linear space C_0(Ω) of functions which are continuous in Ω and 0 outside a compact subset of Ω.

In the sequel H is always assumed to be a Hilbert space. There are two properties which make Hilbert spaces far more convenient to deal with than more general spaces. The first is that any closed, linear subspace has a topological complement which can be chosen in a canonical way (Theorem 3.7). The second is that a Hilbert space can be identified with its topological dual (Theorem 3.8). Both these properties are actually true even if the space is not assumed separable (and of course if the space is finite-dimensional), as our proofs will show. To prove them we start with the following definition.

Definition 3.4. A set M is called convex if it contains all line-segments connecting two elements of the set, i.e., if u and v ∈ M, then tu + (1 − t)v ∈ M for all t ∈ [0, 1].

A subset of a metric space is of course called closed if all limits of convergent sequences contained in the subset are themselves in the subset. It is easily seen that this is equivalent to the complement of the subset being open, in the sense that it is a neighborhood of all its points (check this!).

Lemma 3.5. Any closed, convex subset K of H has a unique element of smallest norm.

Proof. Put d = inf{‖u‖ | u ∈ K}. Let u_1, u_2, ... be a minimizing sequence, i.e., u_j ∈ K and ‖u_j‖ → d. By the parallelogram identity we then have ‖u_j − u_k‖² = 2‖u_j‖² + 2‖u_k‖² − 4‖(u_j + u_k)/2‖². On the right hand side the first two terms both tend to 2d² as j, k → ∞. By convexity (u_j + u_k)/2 ∈ K, so the subtracted term is ≥ 4d². Therefore u_1, u_2, ... is a Cauchy sequence, and has a limit u which obviously has norm d and is in K, since K is closed.
If u and v are both minimizing elements, replacing u_j by u and u_k by v in the calculation above immediately shows that u = v, so the minimizing element is unique. □

Lemma 3.6. Suppose M is a proper (i.e., M ≠ H) closed, linear subspace of H. Then there is a non-trivial normal to M, i.e., an element u ≠ 0 in H such that ⟨u, v⟩ = 0 for all v ∈ M.

Proof. Let w ∉ M and put K = w + M. Then K is obviously closed and convex, so it has a smallest element u, which is non-zero since 0 ∉ K. Let v ≠ 0 be in M, so that u + av ∈ K for any scalar a. Hence ‖u‖² ≤ ‖u + av‖² = ‖u‖² + 2 Re(a⟨v, u⟩) + |a|²‖v‖². Setting a = −⟨u, v⟩/‖v‖² we obtain −(|⟨u, v⟩|/‖v‖)² ≥ 0, so that ⟨u, v⟩ = 0. □
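The minimization argument of Lemmas 3.5 and 3.6 can be tested in a finite-dimensional model, where the closed convex set K = w + M and the minimizing element are explicitly computable. A sketch in Python/numpy; the dimensions and random data are arbitrary choices, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 3

# Parallelogram identity (Lemma 3.3) for the standard norm on C^6:
u = rng.normal(size=n) + 1j * rng.normal(size=n)
w = rng.normal(size=n) + 1j * rng.normal(size=n)
nsq = lambda z: np.linalg.norm(z) ** 2
assert np.isclose(nsq(u + w) + nsq(u - w), 2 * nsq(u) + 2 * nsq(w))

# Lemma 3.6 in finite dimensions: M = column span of B is a proper
# closed subspace, and the smallest element of K = w + M is a
# nontrivial normal to M.
B = rng.normal(size=(n, m)) + 1j * rng.normal(size=(n, m))
Q, _ = np.linalg.qr(B)              # orthonormal basis of M (columns of Q)
u0 = w - Q @ (Q.conj().T @ w)       # w minus its projection onto M

# u0 is orthogonal to M ...
assert np.allclose(Q.conj().T @ u0, 0)
# ... and minimizes the norm over K = w + M:
for _ in range(100):
    c = rng.normal(size=m) + 1j * rng.normal(size=m)
    assert np.linalg.norm(u0) <= np.linalg.norm(w + B @ c) + 1e-9
```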


Two subspaces M and N are said to be orthogonal if every element in M is orthogonal to every element in N. Then clearly M ∩ N = {0}, so the direct sum of M and N is defined. In the case at hand this is called the orthogonal sum of M and N and denoted by M ⊕ N. Thus M ⊕ N is the set of all sums u + v with u ∈ M and v ∈ N. If M and N are closed, orthogonal subspaces of H, then their orthogonal sum is also a closed subspace of H (Exercise 3.5). If A is an arbitrary subset of H we define A^⊥ = {u ∈ H | ⟨u, v⟩ = 0 for all v ∈ A}. This is called the orthogonal complement of A. It is easy to see that A^⊥ is a closed linear subspace of H, that A ⊂ B implies B^⊥ ⊂ A^⊥, and that A ⊂ (A^⊥)^⊥ (Exercise 3.6). When M is a linear subspace of H an alternative way of writing M^⊥ is H ⊖ M. This makes sense because of the following theorem of central importance.

Theorem 3.7. Suppose M is a closed linear subspace of H. Then M ⊕ M^⊥ = H.

Proof. M ⊕ M^⊥ is a closed linear subspace of H, so if it is not all of H, then it has a non-trivial normal u by Lemma 3.6. But if u is orthogonal to both M and M^⊥, then u ∈ M^⊥ ∩ (M^⊥)^⊥, which shows that u cannot be ≠ 0. The theorem follows. □

A nearly obvious consequence of Theorem 3.7 is that M^⊥⊥ = M for any closed linear subspace M of H (Exercise 3.7).

A linear form ℓ on H is a complex-valued linear function on H. Naturally ℓ is said to be continuous if ℓ(u_j) → ℓ(u) whenever u_j → u. The set of continuous linear forms on a Banach space B (or a more general topological vector space) is made into a linear space in an obvious way. This space is called the dual of B, and is denoted by B′. A continuous linear form on a Banach space B has to be bounded in the sense that there is a constant C such that |ℓ(u)| ≤ C‖u‖ for any u ∈ B. For suppose not. Then there exists a sequence of elements u_1, u_2, ... of B for which |ℓ(u_j)|/‖u_j‖ → ∞. Setting v_j = u_j/ℓ(u_j) we then have v_j → 0 but |ℓ(v_j)| = 1, which does not tend to 0, so ℓ can not be continuous.
Conversely, if ℓ is bounded by C then |ℓ(u_j) − ℓ(u)| = |ℓ(u_j − u)| ≤ C‖u_j − u‖ → 0 if u_j → u, so a bounded linear form is continuous. The smallest possible bound of a linear form ℓ is called the norm of ℓ, denoted ‖ℓ‖. It is easy to see that provided with this norm B′ is complete, so the dual of a Banach space is a Banach space (Exercise 3.8). A familiar example is given by the space L^p(Ω, µ) for 1 ≤ p < ∞, where Ω is a domain in Rⁿ and µ a Radon measure defined in Ω. The dual of this space is L^q(Ω, µ), where q is the conjugate exponent to p, in the sense that 1/p + 1/q = 1. A simple example of a bounded linear form on a Hilbert space H is ℓ(u) = ⟨u, v⟩, where v is some fixed element of


H. By Cauchy-Schwarz' inequality |ℓ(u)| ≤ ‖v‖‖u‖, so ‖ℓ‖ ≤ ‖v‖. But ℓ(v) = ‖v‖², so actually ‖ℓ‖ = ‖v‖. The following theorem, which has far-reaching consequences for many applications to analysis, says that this is the only kind of bounded linear form there is on a Hilbert space. In other words, the theorem allows us to identify the dual of a Hilbert space with the space itself.

Theorem 3.8 (Riesz' representation theorem). For any bounded linear form ℓ on H there is a unique element v ∈ H such that ℓ(u) = ⟨u, v⟩ for all u ∈ H. The norm of ℓ is then ‖ℓ‖ = ‖v‖.

Proof. The uniqueness of v is clear, since the difference of two possible choices of v must be orthogonal to all of H (for example to itself). If ℓ(u) = 0 for all u then we may take v = 0. Otherwise we set M = {u ∈ H | ℓ(u) = 0}, which is obviously linear because ℓ is, and closed since ℓ is continuous. Since M is not all of H it has a normal w ≠ 0 by Lemma 3.6, and we may assume ‖w‖ = 1. If now u is arbitrary in H we put u_1 = u − (ℓ(u)/ℓ(w))w, so that ℓ(u_1) = ℓ(u) − ℓ(u) = 0, i.e., u_1 ∈ M, so ⟨u_1, w⟩ = 0. Hence ⟨u, w⟩ = (ℓ(u)/ℓ(w))⟨w, w⟩ = ℓ(u)/ℓ(w), so ℓ(u) = ⟨u, v⟩ where v = λ̄w with λ = ℓ(w). We have already proved that ‖ℓ‖ = ‖v‖. □

So far we have tacitly assumed that convergence in a Hilbert space means convergence in norm, i.e., u_j → u means ‖u_j − u‖ → 0. This is called strong convergence; one writes s-lim u_j = u or u_j → u. There is also another notion of convergence which is very important. By definition u_j tends to u weakly, in symbols w-lim u_j = u or u_j ⇀ u, if ⟨u_j, v⟩ → ⟨u, v⟩ for every v ∈ H. It is obvious that strong convergence implies weak convergence to the same limit (the scalar product is continuous in its arguments by Cauchy-Schwarz), but the converse is not true (Exercise 3.9). We have the following important theorem.

Theorem 3.9. Every bounded sequence in H has a weakly convergent subsequence. Conversely, every weakly convergent sequence is bounded.

Proof.
The first claim is a consequence of the weak* compactness of the unit ball of the dual of a Banach space. Since we do not want to assume knowledge of this, we will give a direct proof. To this end, suppose v_1, v_2, ... is the given sequence, bounded by C, and let e_1, e_2, ... be a complete orthonormal sequence in H. The numerical sequence {⟨v_j, e_1⟩}_{j=1}^∞ is then bounded and so has a convergent subsequence, corresponding to a subsequence {v_{1j}}_{j=1}^∞ of the v:s, by the Bolzano-Weierstrass theorem. The numerical sequence {⟨v_{1j}, e_2⟩}_{j=1}^∞ is again bounded, so it has a convergent subsequence, corresponding to a subsequence {v_{2j}}_{j=1}^∞ of {v_{1j}}_{j=1}^∞. Proceeding in this manner we get a sequence of sequences {v_{kj}}_{j=1}^∞, k = 1, 2, ..., each of which is a subsequence of those preceding it, and with the property that


v̂_n = lim_{j→∞} ⟨v_{nj}, e_n⟩ exists. I claim that {v_{jj}}_{j=1}^∞ converges weakly to v = Σ v̂_n e_n. Note that {⟨v_{jj}, e_n⟩}_{j=1}^∞ converges to v̂_n, since from j = n on it is a subsequence of {⟨v_{nj}, e_n⟩}_{j=1}^∞. Furthermore Σ_{n=1}^N |v̂_n|² ≤ C² for all N, since it is the limit as j → ∞ of Σ_{n=1}^N |⟨v_{Nj}, e_n⟩|², which by Bessel's inequality is bounded by ‖v_{Nj}‖² ≤ C². It follows that Σ_{n=1}^∞ |v̂_n|² ≤ C², so that v is actually an element of H. To show the weak convergence, let u = Σ û_n e_n be arbitrary in H, and suppose ε > 0 is given arbitrarily. Writing u = u′ + u″, where u′ = Σ_{n=1}^N û_n e_n, we may choose N so large that ‖u″‖ < ε, so that |⟨v_{jj}, u″⟩| < Cε. Furthermore |⟨v, u″⟩| ≤ Cε and ⟨v_{jj}, u′⟩ → ⟨v, u′⟩, so lim sup_{j→∞} |⟨v_{jj}, u⟩ − ⟨v, u⟩| ≤ 2Cε. Since ε > 0 is arbitrary the weak convergence follows.

The converse is an immediate consequence of the Banach-Steinhaus principle of uniform boundedness.

Theorem 3.10 (Banach-Steinhaus). Let ℓ_1, ℓ_2, ... be a sequence of bounded linear forms on a Banach space B which is pointwise bounded, i.e., such that for each u ∈ B the sequence ℓ_1(u), ℓ_2(u), ... is bounded. Then ℓ_1, ℓ_2, ... is uniformly bounded, i.e., there is a constant C such that |ℓ_j(u)| ≤ C‖u‖ for every u ∈ B and j = 1, 2, ....

Assuming Theorem 3.10 (for a proof, see Appendix A), we can complete the proof of Theorem 3.9, since a weakly convergent sequence v_1, v_2, ... can be identified with a sequence of linear forms ℓ_1, ℓ_2, ... by setting ℓ_j(u) = ⟨u, v_j⟩. Since a convergent sequence of numbers is bounded, it follows that we have a pointwise bounded sequence of linear forms. By Theorem 3.10 there is a constant C such that |⟨u, v_j⟩| ≤ C‖u‖ for every u ∈ H and j = 1, 2, .... In particular, setting u = v_j gives ‖v_j‖ ≤ C for every j. □


Exercises for Chapter 3

Exercise 3.1. Prove the completeness of ℓ²! Hint: Given a Cauchy sequence, show first that each coordinate converges.

Exercise 3.2. Prove that any Hilbert space is isometrically isomorphic to ℓ², i.e., there is a bijective (one-to-one and onto) linear map H ∋ u ↦ û ∈ ℓ² such that ⟨u, v⟩ = ⟨û, v̂⟩ for any u and v in H.

Exercise 3.3. Suppose L is a linear space with norm ‖·‖ which satisfies the parallelogram identity for all u, v ∈ L. Show that ⟨u, v⟩ = (1/4) Σ_{k=0}^3 i^k ‖u + i^k v‖² is a scalar product on L.
Hint: Show first that ⟨u, u⟩ = ‖u‖², that ⟨v, u⟩ is the complex conjugate of ⟨u, v⟩, and that ⟨iu, v⟩ = i⟨u, v⟩. Then show that ⟨u + v, w⟩ − ⟨u, w⟩ − ⟨v, w⟩ = 0, and from that ⟨λu, v⟩ = λ⟨u, v⟩ for any rational number λ. Finally use continuity.

Exercise 3.4. Show that the semi-norm on the space L_c defined in the text is well-defined, i.e., that the limit lim ‖u_j‖ exists for any element (u_1, u_2, ...) ∈ L_c. Then verify that H = L_c/N_c can be given a norm under which it is complete, that L may be viewed as isometrically and densely embedded in H, and that H is a Euclidean space (a space with scalar product) if L is.

Exercise 3.5. Show that if M and N are closed, orthogonal subspaces of H, then M ⊕ N is also closed.

Exercise 3.6. Show that if A ⊂ H, then A^⊥ is a closed linear subspace of H, that A ⊂ B implies B^⊥ ⊂ A^⊥, and that A ⊂ (A^⊥)^⊥.

Exercise 3.7. Verify that M^⊥⊥ = M for any closed linear subspace M of H, and also that for an arbitrary set A ⊂ H the smallest closed linear subspace containing A is A^⊥⊥.

Exercise 3.8. Show that a bounded linear form on a Banach space B has a least bound, which is a norm on B′, and that B′ is complete under this norm.

Exercise 3.9. Show that an orthonormal sequence does not converge strongly to anything but tends weakly to 0. Conclude that if in a Euclidean space every weakly convergent sequence is convergent, then the space is finite-dimensional.
Hint: Show that the distance between two arbitrary elements in the sequence is √2, and use Bessel's inequality to show weak convergence to 0.
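Exercise 3.9 can be previewed numerically in a truncated ℓ²: the standard unit sequences e_n are orthonormal, any two of them are √2 apart, and ⟨e_n, u⟩ → 0 for any fixed square-summable u. The element u = (1, 1/2, 1/3, ...) and the truncation length are arbitrary choices of mine.

```python
import numpy as np

# Model l² by long finite vectors; e(n) is the n:th standard unit sequence.
N = 10000
e = lambda n: np.eye(1, N, n)[0]

# Any two distinct members of an orthonormal sequence are sqrt(2) apart,
# so the sequence cannot converge strongly:
assert np.isclose(np.linalg.norm(e(3) - e(7)), np.sqrt(2))

# But <e_n, u> = u_n -> 0 for a fixed square-summable u: weak convergence to 0.
u = 1.0 / np.arange(1.0, N + 1)
inner = [np.vdot(u, e(n)) for n in (10, 100, 1000)]   # just the entries u_n
assert abs(inner[0]) > abs(inner[1]) > abs(inner[2])
assert abs(inner[2]) < 1e-2
```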

CHAPTER 4

Operators

A bounded linear operator from a Banach space B_1 to another Banach space B_2 is a linear mapping T : B_1 → B_2 such that for some constant C we have ‖Tu‖_2 ≤ C‖u‖_1 for every u ∈ B_1. The smallest such constant C is called the norm of the operator T and denoted by ‖T‖. Like in the discussion of linear forms in the last chapter it follows that the boundedness of T is equivalent to continuity, in the sense that ‖Tu_j − Tu‖_2 → 0 if ‖u_j − u‖_1 → 0 (Exercise 4.1). If B_1 = B_2 = B one says that T is an operator on B. The operator-norm defined above has the following properties (here T : B_1 → B_2 and S are bounded linear operators, and B_1, B_2 and B_3 Banach spaces):
(1) ‖T‖ ≥ 0, with equality only if T = 0,
(2) ‖λT‖ = |λ|‖T‖ for any λ ∈ C,
(3) ‖S + T‖ ≤ ‖S‖ + ‖T‖ if S : B_1 → B_2,
(4) ‖ST‖ ≤ ‖S‖‖T‖ if S : B_2 → B_3.
We leave the proof to the reader (Exercise 4.1). Thus we have made the set of bounded operators from B_1 to B_2 into a normed space B(B_1, B_2). In fact, B(B_1, B_2) is a Banach space (Exercise 4.2). We write B(B) for the bounded operators on B. Because of property (4), B(B) is called a Banach algebra.

Now let H_1 and H_2 be Hilbert spaces. Then every bounded operator T : H_1 → H_2 has an adjoint T* : H_2 → H_1 (also operators between general Banach spaces, or even more general topological vector spaces, have adjoints, but they will not concern us here), defined as follows. Consider a fixed element v ∈ H_2 and the linear form H_1 ∋ u ↦ ⟨Tu, v⟩_2, which is obviously bounded by ‖T‖‖v‖_2. By Riesz' representation theorem there is therefore a unique element v* ∈ H_1 such that ⟨Tu, v⟩_2 = ⟨u, v*⟩_1. By the uniqueness, and since ⟨Tu, v⟩_2 depends anti-linearly on v, it follows that T* : v ↦ v* is a linear operator from H_2 to H_1. It is also bounded, since ‖v*‖_1² = ⟨Tv*, v⟩_2 ≤ ‖T‖‖v*‖_1‖v‖_2, so that ‖T*‖ ≤ ‖T‖. The adjoint has the following properties.

Proposition 4.1. The adjoint operation B(H_1, H_2) ∋ T ↦ T* ∈ B(H_2, H_1) has the properties:
(1) (T_1 + T_2)* = T_1* + T_2*,
(2) (λT)* = λ̄T* for any complex number λ,
(3) (T_2 T_1)* = T_1* T_2* if T_2 : H_2 → H_3,
(4) T** = T,


(5) ‖T*‖ = ‖T‖,
(6) ‖T*T‖ = ‖T‖².

Proof. The first four properties are very easy to show and are left as exercises for the reader. To prove (5), note that we have already shown that ‖T*‖ ≤ ‖T‖, and combining this with (4) gives the opposite inequality. Use of (5) shows that ‖T*T‖ ≤ ‖T*‖‖T‖ = ‖T‖², and the opposite inequality follows from ‖Tu‖_2² = ⟨T*Tu, u⟩_1 ≤ ‖T*Tu‖_1‖u‖_1 ≤ ‖T*T‖‖u‖_1², so (6) follows. The reader is asked to fill in the details missing in the proof (Exercise 4.3). □

If H_1 = H_2 = H_3 = H, then the properties (1)–(4) above are the properties required for the star operation to be called an involution on the algebra B(H), and a Banach algebra with an involution, also satisfying (5) and (6), is called a B*-algebra.

There are no less than three different useful notions of convergence for operators in B(H_1, H_2). We say that T_j tends to T
• uniformly if ‖T_j − T‖ → 0, denoted by T_j ⇒ T,
• strongly if ‖T_j u − Tu‖_2 → 0 for every u ∈ H_1, denoted T_j → T, and
• weakly if ⟨T_j u, v⟩_2 → ⟨Tu, v⟩_2 for all u ∈ H_1 and v ∈ H_2, denoted T_j ⇀ T.
It is clear that uniform convergence implies strong convergence and strong convergence implies weak convergence, and it is also easy to see that neither of these implications can be reversed.

Of particular interest are so called projection operators. A projection P on H is an operator in B(H) for which P² = P. If P is a projection then so is I − P, where I is the identity on H, since (I − P)(I − P) = I − P − P + P² = I − P. Setting M = PH and N = (I − P)H it follows that M is the null-space of I − P, since M clearly consists of those elements u ∈ H for which Pu = u. Similarly N is the null-space of P. Since P and I − P are bounded (i.e., continuous) it therefore follows that M and N are closed. It also follows that M ∩ N = {0} and that the direct sum M ∔ N of M and N is H (this means that any element of H can be written uniquely as u + v with u ∈ M and v ∈ N).
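The distinction between a mere projection (P² = P) and an orthogonal one can be made concrete in R², with M and N chosen by hand. The subspaces below are an assumed toy example, not taken from the text.

```python
import numpy as np

# Oblique projection in R²: M = span{(1,0)}, N = span{(1,1)}.
# P maps w = u + v (u in M, v in N) to u.  Writing w in the basis
# {(1,0),(1,1)} and keeping the first component gives the matrix:
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])              # columns span M and N
P = B @ np.diag([1.0, 0.0]) @ np.linalg.inv(B)

assert np.allclose(P @ P, P)            # P is a projection ...
assert not np.allclose(P, P.T)          # ... but not selfadjoint: M, N not orthogonal

# Orthogonal projection onto M: here both P² = P and P* = P hold,
# which is exactly the characterization the text establishes next.
Q = np.array([[1.0, 0.0],
              [0.0, 0.0]])
assert np.allclose(Q @ Q, Q) and np.allclose(Q, Q.T)
```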
Conversely, if M and N are linear subspaces of H with M ∩ N = {0} and M ∔ N = H, then we may define a linear map P satisfying P² = P by setting Pw = u if w = u + v with u ∈ M and v ∈ N. As we have seen, P can not be bounded unless M and N are closed. There is a converse to this: If M and N are closed, then P is bounded. This follows immediately from the closed graph theorem (Exercise 4.4). In the case when the projection P, and thus also I − P, is bounded, the direct sum M ∔ N is called topological. If M and N happen to be orthogonal subspaces, P is called an orthogonal projection. Obviously N = M^⊥ then, since the direct sum of M and N is all of H. We have the following characterization of orthogonal projections.


Proposition 4.2. A projection P is orthogonal if and only if it satisfies P* = P.

Proof. If P* = P and u ∈ M, v ∈ N, then ⟨u, v⟩ = ⟨Pu, v⟩ = ⟨u, P*v⟩ = ⟨u, Pv⟩ = ⟨u, 0⟩ = 0, so M and N are orthogonal. Conversely, suppose M and N orthogonal. For arbitrary u, v ∈ H we then have ⟨Pu, v⟩ = ⟨Pu, Pv⟩ + ⟨Pu, (I − P)v⟩ = ⟨Pu, Pv⟩, so that also ⟨u, Pv⟩ = ⟨Pu, Pv⟩. Hence ⟨Pu, v⟩ = ⟨u, Pv⟩ holds generally, i.e., P* = P. □

An operator T for which T* = T is called selfadjoint. Hence an orthogonal projection is the same as a selfadjoint projection. We will have much more to say about selfadjoint operators in a more general context later.

Another class of operators of great interest are the unitary operators. This is an operator U : H_1 → H_2 for which U* = U^{−1}. Since ⟨Uu, Uv⟩_2 = ⟨U*Uu, v⟩_1 = ⟨u, v⟩_1, the operator U preserves the scalar product; such an operator is called isometric. If U is isometric we have ⟨u, v⟩_1 = ⟨Uu, Uv⟩_2 = ⟨U*Uu, v⟩_1, so that U* is a left inverse of U for any isometric operator. If dim H_1 = dim H_2 < ∞, then a left inverse of a linear operator is also a right inverse, so in this case isometric and unitary (orthogonal in the case of a real space) are the same thing. If dim H_1 ≠ dim H_2, or if both spaces are infinite-dimensional, however, this is not the case. For example, in the space ℓ² we may define U(x_1, x_2, ...) = (0, x_1, x_2, ...), which is obviously isometric (this is a so called shift operator), but the vector (1, 0, 0, ...) is not the image of anything, so the operator is not unitary. Its adjoint is U*(x_1, x_2, ...) = (x_2, x_3, ...), which is only a partial isometry, namely an isometry on the vectors orthogonal to (1, 0, 0, ...). See also Exercise 4.8.

It is never possible to interpret a differential operator as a bounded operator on some Hilbert space of functions. We therefore need to discuss unbounded operators as well.
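Before turning to unbounded operators, the shift operator just discussed can be modelled on finitely many coordinates. Viewing U as a map C^N → C^{N+1} keeps the isometry exact (it also matches the remark about spaces of different dimension); the dimension N is an arbitrary choice of mine.

```python
import numpy as np

# Shift operator modelled on finitely many coordinates:
# U maps (x1,...,xN) in C^N to (0,x1,...,xN) in C^(N+1).
N = 5
U = np.zeros((N + 1, N))
for j in range(N):
    U[j + 1, j] = 1.0
Ustar = U.T    # U*(x1,...,x_{N+1}) = (x2,...,x_{N+1}): drops the first coordinate

# U* U = I: U is isometric (preserves the scalar product) ...
assert np.allclose(Ustar @ U, np.eye(N))
# ... but U U* != I: (1,0,...,0) is not in the range of U, so U is not
# unitary, and U* is only a partial isometry.
assert np.allclose(U @ Ustar, np.diag([0.0] + [1.0] * N))
```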
Similarly, we will need to discuss operators that are not defined on all of H_1. Thus we now consider a linear operator T : D(T) → H_2, where the domain D(T) of T is some linear subset of H_1. T is not assumed to be bounded. Another such operator S is said to be an extension of T if D(T) ⊂ D(S) and Su = Tu for every u ∈ D(T). We then write T ⊂ S. We must discuss the concept of adjoint. The form u ↦ ⟨Tu, v⟩_2 is, for fixed v ∈ H_2, only defined for u ∈ D(T), and though linear it is not necessarily bounded, so there may not be any v* ∈ H_1 such that ⟨Tu, v⟩_2 = ⟨u, v*⟩_1 for all u ∈ D(T). Even if there is, it may not be uniquely determined, since if w ∈ D(T)^⊥ we could replace v* by v* + w with no change in ⟨u, v*⟩_1. We therefore make the basic assumption that D(T)^⊥ = {0}, i.e., that D(T) is dense in H_1. T is then said to be densely defined (we will discuss the case of an operator which is not densely defined in Chapter 9). In this case v* ∈ H_1 is
clearly uniquely determined by v ∈ H_2, if it exists. It is also obvious that v* depends linearly on v, so we define D(T*) to be those v ∈ H_2 for which we can find a v* ∈ H_1, and set T*v = v*. There is no reason to expect the adjoint T* to be densely defined. In fact, we may have D(T*) = {0}, so T* may not itself have an adjoint. To understand this rather confusing situation it turns out to be useful to consider graphs of operators.

The graph of T is the set G_T = {(u, Tu) | u ∈ D(T)}. This set is clearly linear and may be considered a linear subset of the orthogonal direct sum H_1 ⊕ H_2, consisting of all pairs (u_1, u_2) with u_1 ∈ H_1 and u_2 ∈ H_2, with the natural linear operations and provided with the scalar product ⟨(u_1, u_2), (v_1, v_2)⟩ = ⟨u_1, v_1⟩_1 + ⟨u_2, v_2⟩_2. This makes H_1 ⊕ H_2 into a Hilbert space (Exercise 4.6). We now define the boundary operator U : H_1 ⊕ H_2 → H_2 ⊕ H_1 by U(u_1, u_2) = (−iu_2, iu_1) (the terminology is explained in Chapter 9). It is clear that U is isometric and surjective (onto H_2 ⊕ H_1). It follows that U is unitary. If H_1 = H_2 = H it is clear that U is selfadjoint and involutary (i.e., U² is the identity). Now put

(4.1)    (G_T)* := U((H_1 ⊕ H_2) ⊖ G_T) = (H_2 ⊕ H_1) ⊖ U G_T.

The second equality is left to the reader to verify, who should also verify that (G_T)* is the graph of an operator (i.e., the second component of each element in (G_T)* is uniquely determined by the first) if and only if T is densely defined. If T is densely defined we now define T* to be the operator whose graph is (G_T)*. This means that T* is the operator whose graph consists of all pairs (v, v*) ∈ H_2 ⊕ H_1 such that ⟨Tu, v⟩_2 = ⟨u, v*⟩_1 for all u ∈ D(T), i.e., our original definition. An immediate consequence of (4.1) is that T ⊂ S implies S* ⊂ T*.

We say that an operator is closed if its graph is closed as a subspace of H_1 ⊕ H_2. This is an important property; in many ways the property of being closed is almost as good as being bounded. An everywhere defined operator is actually closed if and only if it is bounded (Exercise 4.7). It is clear that all adjoints, having graphs that are orthogonal complements, are closed. Not all operators are closeable, i.e., have closed extensions; for this it is required that the closure of G_T is a graph. But it is clear from (4.1) that the closure of the graph is (G_{T*})*. So, we have proved the following proposition.

Proposition 4.3. Suppose T is a densely defined operator in a Hilbert space H. Then T is closeable if and only if the adjoint T* is densely defined. The smallest closed extension (the closure) T̄ of T is then T**.

The proof is left to Exercise 4.9. Note that if T is closed, its domain D(T) becomes a Hilbert space if provided with the scalar product ⟨u, v⟩_T = ⟨u, v⟩_1 + ⟨Tu, Tv⟩_2.
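Formula (4.1) can be sanity-checked in finite dimensions, where every operator is bounded and orthogonal complements are computable. In the sketch below the matrix T is a random choice of mine; the check confirms that the graph of the conjugate-transpose matrix is exactly (H_2 ⊕ H_1) ⊖ U(G_T).

```python
import numpy as np

rng = np.random.default_rng(4)

# An operator T: C^2 -> C^3 given by a complex matrix; H1 ⊕ H2 = C^5.
T = rng.normal(size=(3, 2)) + 1j * rng.normal(size=(3, 2))

# Columns spanning the graph G_T = {(u, Tu)} inside H1 ⊕ H2.
G = np.vstack([np.eye(2), T])                       # 5 x 2

# Boundary operator U(u1, u2) = (-i u2, i u1): H1 ⊕ H2 -> H2 ⊕ H1.
def bdry(x):
    return np.concatenate([-1j * x[2:], 1j * x[:2]])

UG = np.column_stack([bdry(g) for g in G.T])        # spans U(G_T) in H2 ⊕ H1

# Candidate for (G_T)*: the graph {(v, T* v)} of the conjugate transpose.
Gstar = np.vstack([np.eye(3), T.conj().T])          # 5 x 3

# Every graph vector of T* is orthogonal to U(G_T) ...
overlap = UG.conj().T @ Gstar                       # matrix of scalar products
assert np.allclose(overlap, 0)
# ... and 2 + 3 = 5 = dim(H2 ⊕ H1), so the graph of T* is exactly
# (H2 ⊕ H1) ⊖ U(G_T), confirming formula (4.1) in this model.
assert np.linalg.matrix_rank(np.column_stack([UG, Gstar])) == 5
```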


In the rest of this chapter we assume that H_1 = H_2 = H. A densely defined operator T is then said to be symmetric if T ⊂ T*, in other words, if ⟨Tu, v⟩ = ⟨u, Tv⟩ for all u, v ∈ D(T). Thus ⟨Tu, u⟩ is always real for a symmetric operator. It therefore makes sense to say that a symmetric operator is positive if ⟨Tu, u⟩ ≥ 0 for all u ∈ D(T). A densely defined symmetric operator is always closeable, since T* is automatically densely defined, being an extension of T. If actually T = T*, the operator is said to be selfadjoint. This is an important property, because these are the operators for which we will prove the spectral theorem. In practice it is usually quite easy to see if an operator is symmetric, but much more difficult to decide whether a symmetric operator is selfadjoint. When one wants to interpret a differential operator as a Hilbert space operator one has to choose a domain of definition; in many cases it is clear how one may choose a dense domain so that the operator becomes symmetric. With luck this operator may have a selfadjoint closure³, in which case the operator is said to be essentially selfadjoint. Otherwise, given a symmetric T, one will look for selfadjoint extensions of T. If S is a symmetric extension of T, we get T ⊂ S ⊂ S* ⊂ T*, so that any selfadjoint extension of T is a restriction of the adjoint T*. There is now obviously a need for a theory of symmetric extensions of a symmetric operator. We will postpone the discussion of this until Chapter 9. Right now we will instead study some very simple, but typical, examples.

Example 4.4. Consider the differential operator d/dx on some open interval I. We want to interpret it as a densely defined operator in the Hilbert space L²(I), and so must choose a suitable domain.
A convenient choice, which would work for any differential operator with smooth coefficients, is the set C0∞(I) of infinitely differentiable functions on I with compact support, i.e., each function is 0 outside some compact subset of I. It is well known that C0∞(I) is dense in L2(I). Let us denote the corresponding operator by T0; it is usually called the minimal operator for d/dx. Sometimes it is the closure of this operator which is called the minimal operator, but this will make no difference to the calculations in the sequel. We now need to calculate the adjoint of the minimal operator. Let v ∈ D(T0∗). This means that there is an element v∗ ∈ L2(I) such that ∫_I ϕ′v̄ = ∫_I ϕv̄∗ for all ϕ ∈ C0∞(I), and that T0∗v = v∗. Integrating by parts we have ∫_I ϕv̄∗ = −∫_I ϕ′(∫v̄∗), since the boundary terms vanish. Here ∫v∗ denotes any integral function of v∗. Thus we have ∫_I ϕ′(v̄ + ∫v̄∗) = 0 for all ϕ ∈ C0∞(I). We need the following lemma.

³This is the same as T∗ being selfadjoint. Show this!


Lemma 4.5 (du Bois Reymond). Suppose u is locally square integrable on ℝ, i.e., u ∈ L2(I) for every bounded real interval I. Also suppose that ∫ uϕ′ = 0 for every ϕ ∈ C0∞(ℝ). Then u is (almost everywhere) equal to a constant.

Assuming the truth of the lemma for the moment it follows that, choosing the appropriate representative in the equivalence class of v, v̄ + ∫v̄∗ is constant. Hence v is locally absolutely continuous with derivative −v∗. It follows that D(T0∗) consists of the functions in L2(I) which are locally absolutely continuous in I with derivative in L2(I), and that T0∗v = −v′. Conversely, all such functions are in D(T0∗), as follows immediately by partial integration in ∫_I ϕ′v̄ = ∫_I ϕv̄∗. The operator T0∗ is therefore also a differential operator, generated by −d/dx. The differential operator −d/dx is called the formal adjoint of d/dx, and the operator T0∗ is called the maximal operator belonging to −d/dx. In the same way any linear differential operator (with sufficiently smooth coefficients) has a formal adjoint, obtained by integration by parts. For ordinary differential operators with smooth coefficients one can always calculate adjoints in essentially the way we just did; for partial differential operators matters are more subtle and one needs to use the language of distribution theory.

Proof of Lemma 4.5. Let ψ ∈ C0∞(ℝ) and assume that ∫ψ = 1. Given ϕ ∈ C0∞(ℝ) we put ϕ0(x) = ψ(x)∫_{−∞}^{∞} ϕ and Φ(x) = ∫_{−∞}^{x} (ϕ − ϕ0). It is clear that Φ is infinitely differentiable. It also has compact support (why?), so ∫_{−∞}^{∞} uΦ′ = 0 by assumption. But ∫_{−∞}^{∞} uΦ′ = ∫_{−∞}^{∞} uϕ − (∫_{−∞}^{∞} uψ)(∫_{−∞}^{∞} ϕ), so that ∫_{−∞}^{∞} (u − K)ϕ = 0, where K = ∫_{−∞}^{∞} uψ does not depend on ϕ. Since C0∞(ℝ) is dense in L2(ℝ) this proves that u = K a.e., so that u is constant. □

For the minimal operator of a differential operator to be symmetric it is clear that the differential operator has to be formally symmetric, i.e., the formal adjoint has to coincide with the original operator.
In Example 4.4 we have D(T0) ⊂ D(T0∗), but there is a minus sign preventing T0 from being symmetric. However, it is clear that had we instead started with the differential operator −i d/dx, then the minimal operator would have been symmetric, with the domains of the minimal and maximal operators unchanged. One may then ask for possible selfadjoint extensions of the minimal operator, or equivalently for selfadjoint restrictions of the maximal operator.

Example 4.6. Let T1 be the maximal operator of −i d/dx on the interval I. Let u, v ∈ D(T1) and a, b ∈ I. Then ⟨T1u, v⟩ − ⟨u, T1v⟩, computed over (a, b), equals −i ∫_a^b (u′v̄ + uv̄′) = iu(a)v̄(a) − iu(b)v̄(b). Since u, v, T1u and T1v are all in L2(I), the limit of uv̄ exists at both endpoints of I. Consider the case I = ℝ. Since |u(x)|² has limits as x → ±∞ and is integrable, the limits


must both be 0. Hence ⟨T1u, v⟩ − ⟨u, T1v⟩ = 0 for any u, v ∈ D(T1), so the maximal operator is symmetric and therefore selfadjoint (how does this follow?). It also follows that the maximal operator is the closure of the minimal operator, so the minimal operator is essentially selfadjoint.

Example 4.7. Consider the same operator as in Example 4.6, but for the interval (0, ∞). If u ∈ D(T1) we obtain ⟨T1u, u⟩ − ⟨u, T1u⟩ = i|u(0)|². To have a symmetric restriction of T1 we must therefore require u(0) = 0, and with this restriction on the domain of T1 we obtain a maximal symmetric operator T. If now u ∈ D(T) and v ∈ D(T1) we obtain ⟨Tu, v⟩ − ⟨u, T1v⟩ = iu(0)v̄(0) = 0, so that T∗ = T1. T is therefore not selfadjoint, so no matter how we choose the domain, the differential operator −i d/dx, though formally symmetric, will not be selfadjoint in L2(0, ∞). One says that −i d/dx has no selfadjoint realization in L2(0, ∞).

Example 4.8. We finally consider the operator of Example 4.6 for the interval (−π, π). We now have
(4.2)

⟨T1u, v⟩ − ⟨u, T1v⟩ = −i(u(π)v̄(π) − u(−π)v̄(−π)).

In particular, for u = v it follows that for u to be in the domain of a symmetric restriction of T1 we must require |u(π)| = |u(−π)|, so that u satisfies the boundary condition u(π) = e^{iθ}u(−π) for some real θ. From (4.2) it then follows that if v is in the domain of the adjoint, then v will have to satisfy the same boundary condition. On the other hand, if we impose this condition, then the resulting operator will be selfadjoint (because its adjoint will be symmetric). It follows that restricting the domain of T1 by such a boundary condition is exactly what is required to obtain a selfadjoint restriction. Each θ ∈ [0, 2π) gives a different selfadjoint realization, and there are no others.

The examples show that there may be a unique selfadjoint realization of our formally symmetric differential operator, none at all, or infinitely many, depending on circumstances. It can be a very difficult problem to decide which of these possibilities occurs in a given case. In particular, much effort has been devoted to deciding whether a given differential operator on a given domain has a unique selfadjoint realization.
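The θ-family can be made concrete. The functions u_n(x) = e^{i(n+θ/(2π))x} (a standard guess, not written out in the text) satisfy the boundary condition u(π) = e^{iθ}u(−π) and are eigenfunctions of −i d/dx with eigenvalues n + θ/(2π), n ∈ ℤ; a numerical sketch:

```python
# Sketch for Example 4.8: eigenpairs of -i d/dx on (-pi, pi) with the
# boundary condition u(pi) = e^{i theta} u(-pi). The eigenfunctions are
# assumed to be u_n(x) = exp(i(n + theta/(2 pi)) x), an ansatz checked below.
import cmath, math

theta = 1.2          # any theta in [0, 2 pi)
for n in (-2, 0, 3):
    lam = n + theta / (2 * math.pi)
    u = lambda x: cmath.exp(1j * lam * x)
    # boundary condition u(pi) = e^{i theta} u(-pi)
    assert abs(u(math.pi) - cmath.exp(1j * theta) * u(-math.pi)) < 1e-12
    # eigenvalue equation -i u' = lam u, via a central difference quotient
    x, h = 0.7, 1e-6
    du = (u(x + h) - u(x - h)) / (2 * h)
    assert abs(-1j * du - lam * u(x)) < 1e-6
```

Distinct values of θ give disjoint sets of eigenvalues, which is one way to see that the selfadjoint realizations really are different.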


Exercises for Chapter 4

Exercise 4.1. Prove that boundedness is equivalent to continuity for a linear operator between normed spaces. Then prove the properties of the operator norm listed at the beginning of the chapter.

Exercise 4.2. Suppose B1 and B2 are Banach spaces. Show that so is B(B1, B2).

Exercise 4.3. Fill in the details of the proof of Proposition 4.1.

Exercise 4.4. Show that if M and N are closed subspaces of H with M ∩ N = {0} and M ∔ N = H, then the corresponding projections onto M and N are bounded operators. Hint: The closed graph theorem!

Exercise 4.5. Show that a non-trivial (i.e., the range is not {0}) projection is orthogonal if and only if its operator norm is 1.

Exercise 4.6. Suppose H1 and H2 are Hilbert spaces. Show that the orthogonal direct sum H1 ⊕ H2 is also a Hilbert space.

Exercise 4.7. Show that a bounded, everywhere defined operator is automatically closed, and conversely that an everywhere defined, closed operator is bounded. Hint: The closed graph theorem!

Exercise 4.8. Show that if U is unitary, then all eigen-values λ of U have absolute value |λ| = 1. Also show that if e1 and e2 are eigen-vectors corresponding to eigen-values λ1 and λ2 respectively, then e1 and e2 are orthogonal if λ1 ≠ λ2.

Exercise 4.9. Show that if T is densely defined and closeable, then the closure of T is T∗∗.
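For Exercise 4.8 a concrete finite-dimensional check may help; the unitary U below and its eigenpairs are an arbitrary illustrative choice:

```python
# Sketch for Exercise 4.8: a concrete unitary U with eigenvalues +/- i
# (both of modulus 1) whose eigenvectors are orthogonal.

def inner(u, v):
    # scalar product, linear in the first argument
    return sum(x * y.conjugate() for x, y in zip(u, v))

U = [[0, 1], [-1, 0]]                 # U*U = I, so U is unitary
pairs = [(1j, [1, 1j]), (-1j, [1, -1j])]
for lam, e in pairs:
    Ue = [U[0][0]*e[0] + U[0][1]*e[1], U[1][0]*e[0] + U[1][1]*e[1]]
    assert all(abs(Ue[k] - lam * e[k]) < 1e-12 for k in range(2))
    assert abs(abs(lam) - 1) < 1e-12  # eigen-values on the unit circle
e1, e2 = pairs[0][1], pairs[1][1]
assert abs(inner(e1, e2)) < 1e-12     # eigenvectors are orthogonal
```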

CHAPTER 5

Resolvents

We now consider a closed, densely defined operator T in the Hilbert space H. We define the solvability and deficiency spaces of T at λ by

Sλ = {u ∈ H | (T − λ)v = u for some v ∈ D(T)},
Dλ = {u ∈ D(T∗) | T∗u = λ̄u}.

The following basic lemma is valid.

Lemma 5.1. Suppose T is closed and densely defined. Then
(1) Dλ = H ⊖ Sλ.
(2) If T is symmetric and Im λ ≠ 0, then Sλ is closed and H = Sλ ⊕ Dλ.
(3) If T is selfadjoint and Im λ ≠ 0, then (T − λ)v = u is uniquely solvable for any u ∈ H (i.e., Sλ = H), T has no non-real eigen-values (i.e., Dλ = {0}), and ‖v‖ ≤ ‖u‖/|Im λ|.

Proof. Any element of the graph of T is of the form (v, λv + u), where u ∈ Sλ. To see this, simply put u = Tv − λv for any v ∈ D(T). Now ⟨Tv, w⟩ − ⟨v, λ̄w⟩ = ⟨u + λv, w⟩ − ⟨v, λ̄w⟩ = ⟨u, w⟩, so it follows that (w, λ̄w) ∈ GT∗, i.e., w ∈ Dλ, if and only if w is orthogonal to Sλ. This proves (1).

If T is symmetric and (v, λv + u) ∈ GT, then ⟨λv + u, v⟩ = ⟨v, λv + u⟩, i.e., Im λ‖v‖² = Im⟨v, u⟩, which is ≤ ‖v‖‖u‖ by the Cauchy-Schwarz inequality. If Im λ ≠ 0 we obtain ‖v‖ ≤ ‖u‖/|Im λ|, so that v is uniquely determined by u; in particular T has no non-real eigen-values. Furthermore, suppose that u1, u2, . . . is a sequence in Sλ converging to u, and that (vj, λvj + uj) ∈ GT. Then v1, v2, . . . is also a Cauchy sequence, since ‖vj − vk‖ ≤ ‖uj − uk‖/|Im λ|. Thus vj tends to some limit v, and since T is closed we have (v, λv + u) ∈ GT. Hence u ∈ Sλ, so that Sλ is closed and (2) follows.

Finally, if T is selfadjoint, then T∗ = T is symmetric, so it has no non-real eigen-values. If Im λ ≠ 0 it follows that Dλ = {0}, so (3) follows and the proof is complete. □

In the rest of this chapter we assume that T is a selfadjoint operator. We define the resolvent set of T as

ρ(T) = {λ ∈ ℂ | T − λ has a bounded, everywhere defined inverse},


and the spectrum σ(T) of T as the complement of ρ(T). By Lemma 5.1 (3) the spectrum is a subset of the real line. For every λ ∈ ρ(T) we now define the resolvent of T at λ as the operator Rλ = (T − λ)⁻¹. The resolvent has the following properties.

Theorem 5.2. The resolvent of a selfadjoint operator T has the properties:
(1) ‖Rλ‖ ≤ 1/|Im λ| if Im λ ≠ 0.
(2) (Rλ)∗ = Rλ̄ for λ ∈ ρ(T).
(3) Rλ − Rµ = (λ − µ)RλRµ for λ and µ ∈ ρ(T).

The last statement is called the (first) resolvent relation.

Proof. The first claim is simply a re-statement of Lemma 5.1 (3). Note that all elements of GT are of the form (Rλu, λRλu + u). Now w∗ = (Rλ)∗w precisely if ⟨Rλu, w⟩ = ⟨u, w∗⟩ for all u ∈ H. Adding λ⟨Rλu, w∗⟩ to both sides we obtain ⟨Rλu, λ̄w∗ + w⟩ = ⟨λRλu + u, w∗⟩, so that (w∗, λ̄w∗ + w) ∈ GT, i.e., w∗ = Rλ̄w. This proves (2). Finally, suppose (w, µw + u) ∈ GT. Since GT is linear it follows that (v, λv + u) ∈ GT if and only if (v, λv + u) − (w, µw + u) = (v − w, λ(v − w) + (λ − µ)w) ∈ GT. But this means exactly that (3) holds. □

Theorem 5.3. The resolvent set ρ(T) is open, and the function λ ↦ Rλ is analytic in the uniform operator topology as a B(H)-valued function. This means (by definition) that Rλ can be expanded in a power series with respect to λ around any point in ρ(T), and that the series converges in operator norm in a neighborhood of the point. In fact, if µ ∈ ρ(T), then λ ∈ ρ(T) for |λ − µ| < 1/‖Rµ‖ and

Rλ = Σ_{k=0}^{∞} (λ − µ)^k Rµ^{k+1}  for |λ − µ| < 1/‖Rµ‖.

Finally, the function ρ(T) ∋ λ ↦ ⟨Rλu, v⟩ is analytic for all u, v ∈ H, and for u = v it maps the upper and lower half-planes into themselves.

Proof. The series is norm convergent if |λ − µ| < 1/‖Rµ‖, since ‖(λ − µ)^k Rµ^{k+1}‖ ≤ ‖Rµ‖(|λ − µ|‖Rµ‖)^k, which is a term in a convergent geometric series. Writing T − λ = T − µ − (λ − µ) and applying this to the series from the left and right, one immediately sees that the series represents the inverse of T − λ. We have verified the formula for Rλ, and it also follows that ρ(T) is open. Now by Theorem 5.2 we have 2i Im⟨Rλu, u⟩ = ⟨Rλu, u⟩ − ⟨u, Rλu⟩ = ⟨(Rλ − Rλ̄)u, u⟩ = 2i Im λ⟨RλRλ̄u, u⟩ = 2i Im λ‖Rλu‖². It follows that Im⟨Rλu, u⟩ has the same sign as Im λ. The analyticity of ⟨Rλu, v⟩ follows since we have a power series expansion of it around any point in ρ(T), by the series for Rλ. Alternatively, from Theorem 5.2 (3) it easily follows that (d/dλ)⟨Rλu, v⟩ = ⟨Rλ²u, v⟩ (Exercise 5.1). □
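The properties just proved can be checked numerically in the diagonal model, where a selfadjoint T acts as multiplication by real numbers t_j and Rλ as multiplication by 1/(t_j − λ). A sketch (all data arbitrary, not from the text):

```python
# Diagonal-model sketch of Theorem 5.2 and the Nevanlinna property:
# the resolvent bound, the first resolvent relation, and the sign of
# Im <R_lam u, u> for Im lam > 0.
import math

t = [-2.0, 0.5, 3.0]               # the spectrum of T
u = [1 + 1j, -2j, 0.3 + 0j]
lam, mu = 1.0 + 0.25j, -0.5 - 2j   # two non-real points

norm = lambda w: math.sqrt(sum(abs(x) ** 2 for x in w))
res = lambda z: [uj / (tj - z) for uj, tj in zip(u, t)]

# ||R_lam u|| <= ||u||/|Im lam|, since |t_j - lam| >= |Im lam| for real t_j
assert norm(res(lam)) <= norm(u) / abs(lam.imag) + 1e-12

# first resolvent relation R_lam - R_mu = (lam - mu) R_lam R_mu, entrywise
for tj in t:
    rl, rm = 1 / (tj - lam), 1 / (tj - mu)
    assert abs((rl - rm) - (lam - mu) * rl * rm) < 1e-12

# Im <R_lam u, u> has the sign of Im lam
q = sum(vj * uj.conjugate() for vj, uj in zip(res(lam), u))
assert q.imag > 0
```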


Analytic functions that map the upper and lower half-planes into themselves have particularly nice properties. Our proof of the general spectral theorem will be based on the fact that ⟨Rλu, u⟩ is such a function, so we will make a detailed study of them in the next chapter.

That ρ(T) is open means of course that the spectrum is always a closed subset of ℝ. It is customary to divide the spectrum into (at least) two disjoint subsets, the point spectrum σp(T) and the continuous spectrum σc(T), defined as follows:

σp(T) = {λ ∈ ℂ | T − λ is not one-to-one},
σc(T) = σ(T) \ σp(T).

This means that the point spectrum consists of the eigen-values of T, and the continuous spectrum of those λ for which Sλ is dense in H but not closed. This follows since (T − λ)⁻¹ is automatically bounded if Sλ = H, by the closed graph theorem (Exercise 5.2). For non-selfadjoint operators there is a further possibility; one may have Sλ non-dense even if λ is not an eigen-value. Such values of λ constitute the residual spectrum, which by Lemma 5.1 is empty for selfadjoint operators.

An eigen-value of a selfadjoint operator is said to have finite multiplicity if the eigenspace is finite-dimensional. Removing from the spectrum all isolated points which are eigen-values of finite multiplicity leaves one with the essential spectrum. The name comes from the fact that the essential spectrum is quite stable under perturbations (changes) of the operator T, but we will not discuss such matters here.

Exercises for Chapter 5

Exercise 5.1. Suppose that Rλ is the resolvent of a selfadjoint operator T in a Hilbert space H. Show directly from Theorem 5.2 (3) that if u, v ∈ H, then λ ↦ ⟨Rλu, v⟩ is analytic (has a complex derivative) for λ ∈ ρ(T), and find an expression for the derivative. Also show that if u ∈ H, then λ ↦ ⟨Rλu, u⟩ is increasing at every point of ρ(T) ∩ ℝ.

Exercise 5.2. Show that if T is a closed operator with Sλ = H and λ ∉ σp(T), then λ ∈ ρ(T). Hint: The closed graph theorem!
Exercise 5.3. Show that if T is a selfadjoint operator, then U = (T + i)(T − i)⁻¹ = I + 2iR_i is unitary. Conversely, if U is unitary and 1 is not an eigen-value, then T = i(U + I)(U − I)⁻¹ is selfadjoint. What can one do if 1 is an eigen-value? This transform, reminiscent of a Möbius transform, is called the Cayley transform, and was the basis for von Neumann's proof of the spectral theorem for unbounded operators.
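In the diagonal model the Cayley transform of Exercise 5.3 is the scalar Möbius map t ↦ (t + i)/(t − i); a small numerical check (the sample values of t are arbitrary):

```python
# Sketch of the Cayley transform in the diagonal model: for real t
# (a point of the spectrum of a selfadjoint T) the number
# u = (t + i)/(t - i) lies on the unit circle, and the inverse map
# t = i(u + 1)/(u - 1) recovers t.
for t in (-5.0, 0.0, 0.3, 42.0):
    u = (t + 1j) / (t - 1j)
    assert abs(abs(u) - 1) < 1e-12     # U is unitary
    assert abs(u - 1) > 1e-12          # 1 is never attained for real t
    back = 1j * (u + 1) / (u - 1)
    assert abs(back - t) < 1e-9        # the inverse transform
```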

CHAPTER 6

Nevanlinna functions

Our proof of the spectral theorem is based on the following representation theorem.

Theorem 6.1. Suppose F is analytic in ℂ \ ℝ, F(λ̄) is the complex conjugate of F(λ), and F maps each of the upper and lower half-planes into themselves. Then there exist a unique, left-continuous, increasing function ρ with ρ(0) = 0 and ∫_{−∞}^{∞} dρ(t)/(1 + t²) < ∞, and unique real constants α and β ≥ 0, such that

(6.1)  F(λ) = α + βλ + ∫_{−∞}^{∞} (1/(t − λ) − t/(1 + t²)) dρ(t),

where the integral is absolutely convergent. For the meaning of such an integral, see Appendix B.

Functions F with the properties in the theorem are usually called Nevanlinna, Herglotz or Pick functions. I am not sure who first proved the theorem, but results of this type play an important role in the classical book Eindeutige analytische Funktionen by Rolf Nevanlinna (1930). We will tackle the proof through a sequence of lemmas.

Lemma 6.2 (H. A. Schwarz). Let G be analytic in the unit disk, and put u(R, θ) = Re G(Re^{iθ}). For |z| < R < 1 we then have

(6.2)  G(z) = i Im G(0) + (1/2π) ∫_{−π}^{π} ((Re^{iθ} + z)/(Re^{iθ} − z)) u(R, θ) dθ.

Proof. According to Poisson's integral formula (see e.g. Chapter 6 of Ahlfors: Complex Analysis (McGraw-Hill 1966)), we have

Re G(z) = (1/2π) ∫_{−π}^{π} ((R² − |z|²)/|Re^{iθ} − z|²) u(R, θ) dθ.

The integral here is easily seen to be the real part of the integral in (6.2). The latter is obviously analytic in z for |z| < R < 1, so the two sides of (6.2) can only differ by an imaginary constant. However, for z = 0 the integral is real, so (6.2) follows. □

The formula (6.2) is not applicable for R = 1, since we do not know whether Re G has reasonable boundary values on the unit circle.


However, if one assumes that Re G ≥ 0 the boundary values exist at least in the sense of measure, and one has the following theorem.

Theorem 6.3 (Riesz-Herglotz). Let G be analytic in the unit disk with positive real part. Then there exists an increasing function σ on [−π, π] such that

G(z) = i Im G(0) + (1/2π) ∫_{−π}^{π} ((e^{iθ} + z)/(e^{iθ} − z)) dσ(θ).

With a suitable normalization the function σ will also be unique, but we will not use this. To prove Theorem 6.3 we need some kind of compactness result, so that we can obtain the theorem as a limiting case of Lemma 6.2. What is needed is weak∗ compactness in the dual of the continuous functions on a compact interval, provided with the maximum norm. This is the classical Helly theorem. Since we assume minimal knowledge of functional analysis we will give the classical proof.

Lemma 6.4 (Helly).
(1) Suppose {ρj} is a uniformly bounded¹ sequence of increasing functions on an interval I. Then there is a subsequence converging pointwise to an increasing function.
(2) Suppose {ρj} is a uniformly bounded sequence of increasing functions on a compact interval I, converging pointwise to ρ. Then

(6.3)  ∫_I f dρj → ∫_I f dρ  as j → ∞,

for any function f continuous on I.

Proof. Let r1, r2, . . . be a dense sequence in I, for example an enumeration of the rational numbers in I. By the Bolzano-Weierstrass theorem we may choose a subsequence {ρ1j} of {ρj} so that ρ1j(r1) converges. Similarly, we may choose a subsequence {ρ2j} of {ρ1j} such that ρ2j(r2) converges; as a subsequence of ρ1j(r1) the sequence ρ2j(r1) still converges. Continuing in this fashion, we obtain a sequence of sequences {ρkj}, k = 1, 2, . . . , such that each sequence is a subsequence of those coming before it, and such that ρ(rn) = lim_{j→∞} ρkj(rn) exists for n ≤ k. Thus ρjj(rn) → ρ(rn) as j → ∞ for every n, since ρjj(rn) is a subsequence of ρnj(rn) from j = n on. Clearly ρ is increasing, so if x ∈ I but x ≠ rn for all n, we may choose an increasing subsequence rjk, k = 1, 2, . . . , converging to x, and define ρ(x) = lim_{k→∞} ρ(rjk). Suppose x is a point of continuity of ρ. If rk < x < rn we get ρjj(rk) − ρ(rn) ≤ ρjj(x) − ρ(x) ≤ ρjj(rn) − ρ(rk). Given ε > 0 we may

¹I.e., all the functions are bounded by a fixed constant.


choose k and n such that ρ(rn) − ρ(rk) < ε. We then obtain

−ε ≤ lim inf_{j→∞} (ρjj(x) − ρ(x)) ≤ lim sup_{j→∞} (ρjj(x) − ρ(x)) ≤ ε.

Hence {ρjj} converges pointwise to ρ, except possibly at points of discontinuity of ρ. But there are at most countably many such discontinuities, ρ being increasing. Hence, repeating the trick of extracting subsequences and then using the 'diagonal' sequence, we get a subsequence of the original sequence which converges everywhere in I. This proves (1).

If f is the characteristic function of a compact interval whose endpoints are points of continuity for ρ and all ρj, it is obvious that (6.3) holds. It follows that (6.3) holds if f is a step function with all discontinuities at points where ρ and all ρj are continuous. If f is continuous and ε > 0 we may, by uniform continuity, choose such a step function g so that sup_I |f − g| < ε. If C is a common bound for all ρj we then obtain |∫_I (f − g) dρ| < 2Cε, and similarly with ρ replaced by ρj. It follows that lim_{j→∞} |∫_I f dρj − ∫_I f dρ| ≤ 4Cε, and since ε > 0 is arbitrary (2) follows. □

Proof of Theorem 6.3. According to Lemma 6.2 we have, for |z| < 1,

G(Rz) = i Im G(0) + (1/2π) ∫_{−π}^{π} ((e^{iθ} + z)/(e^{iθ} − z)) dσR(θ),

where σR(θ) = ∫_{−π}^{θ} Re G(Re^{iϕ}) dϕ. Hence σR is increasing, ≥ 0 and bounded from above by σR(π). Now Re G is a harmonic function, so it has the mean value property, which means that σR(π) = 2π Re G(0). This is independent of R, so by Helly's theorem we may choose a sequence Rj ↑ 1 such that σRj converges to an increasing function σ. Use of the second part of Helly's theorem completes the proof. □

To prove the uniqueness of the function ρ of Theorem 6.1 we need the following simple, but important, lemma.

Lemma 6.5 (Stieltjes' inversion formula). Let ρ be complex-valued of locally bounded variation, and such that ∫_{−∞}^{∞} dρ(t)/(t² + 1) is absolutely convergent. Suppose F(λ) is given by (6.1). Then if y < x are points of continuity of ρ we have

ρ(x) − ρ(y) = lim_{ε↓0} (1/2πi) ∫_y^x (F(s + iε) − F(s − iε)) ds
            = lim_{ε↓0} (1/π) ∫_y^x ∫_{−∞}^{∞} (ε/((t − s)² + ε²)) dρ(t) ds.


Proof. By absolute convergence we may change the order of integration in the last integral. The inner integral is then easily calculated to be

(1/π)(arctan((x − t)/ε) − arctan((y − t)/ε)).

This is bounded by 1, and also by a constant multiple of 1/t² if ε is bounded (verify this!). Furthermore it converges pointwise to 0 outside [y, x], and to 1 in (y, x) (and to ½ for t = x and t = y). The lemma follows by dominated convergence. □

Proof of Theorem 6.1. The uniqueness of ρ follows immediately on applying the Stieltjes inversion formula to the imaginary part of (6.1) for λ = s + iε.

We obtain (6.1) from the Riesz-Herglotz theorem by a change of variable. The mapping z = (1 + iλ)/(1 − iλ) maps the upper half-plane bijectively onto the unit disk, so G(z) = −iF(λ) is defined for z in the unit disk and has positive real part. Applying Theorem 6.3 we obtain, after simplification,

F(λ) = Re F(i) + (1/2π) ∫_{−π}^{π} ((1 + λ tan(θ/2))/(tan(θ/2) − λ)) dσ(θ).

Setting t = tan(θ/2) maps the open interval (−π, π) onto the real axis. For θ = ±π the integrand equals λ, so any mass of σ at ±π gives rise to a term βλ with β ≥ 0. After the change of variable we get

F(λ) = α + βλ + ∫_{−∞}^{∞} ((1 + tλ)/(t − λ)) dτ(t),

where we have set α = Re F(i) and τ(t) = σ(θ)/(2π). Since

(1 + tλ)/(t − λ) = (1/(t − λ) − t/(1 + t²))(1 + t²),

we now obtain (6.1) by setting ρ(t) = ∫_0^t (1 + s²) dτ(s). It remains to show the uniqueness of α and β. However, setting λ = i, it is clear that α = Re F(i), and since we already know that ρ is unique, so is β. □

Actually one can calculate β directly from F, since by dominated convergence Im F(iν)/ν → β as ν → ∞. It is usual to refer to β as the 'mass at infinity', an expression explained by our proof. Note, however, that it is the mass of τ at infinity and not that of ρ!
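For a purely discrete measure the Stieltjes inversion formula of Lemma 6.5 can be checked numerically; the masses and test interval below are arbitrary illustrative choices. The inner integral is evaluated in closed form by the arctan expression from the proof:

```python
# Numerical sketch of the Stieltjes inversion formula for a discrete
# measure d rho = sum of w_k at t_k, so F(lambda) = sum w_k/(t_k - lambda)
# (alpha = beta = 0). The s-integral of eps/((t-s)^2 + eps^2) is done
# in closed form via arctan, as in the proof of Lemma 6.5.
import math

masses = [(-1.0, 0.5), (0.3, 2.0), (2.5, 1.0)]   # pairs (t_k, w_k)

def inversion(y, x, eps):
    return sum(w * (math.atan((x - t) / eps) - math.atan((y - t) / eps))
               / math.pi for t, w in masses)

# (y, x) = (0, 1) contains only the mass 2.0 at t = 0.3, and y, x are
# points of continuity of rho, so rho(x) - rho(y) should tend to 2.0.
for eps in (1e-3, 1e-5):
    assert abs(inversion(0.0, 1.0, eps) - 2.0) < 100 * eps
```

The error is of order ε, coming both from the mass inside the interval and from the masses outside it, which is why the tolerance above scales with ε.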

CHAPTER 7

The spectral theorem

Theorem 7.1 (Spectral theorem). Suppose T is selfadjoint. Then there exists a unique, increasing and left-continuous family {Et}, t ∈ ℝ, of orthogonal projections with the following properties:
• Et commutes with T, in the sense that TEt is the closure of EtT.
• Et → 0 as t → −∞ and Et → I (= identity on H) as t → ∞ (strong convergence).
• T = ∫_{−∞}^{∞} t dEt in the following sense: u ∈ D(T) if and only if ∫_{−∞}^{∞} t² d⟨Etu, u⟩ < ∞, and then ⟨Tu, v⟩ = ∫_{−∞}^{∞} t d⟨Etu, v⟩ and ‖Tu‖² = ∫_{−∞}^{∞} t² d⟨Etu, u⟩.

The family {Et}, t ∈ ℝ, of projections is called the resolution of the identity for T. The formula T = ∫_{−∞}^{∞} t dEt can be made sense of directly by introducing Stieltjes integrals with respect to operator-valued increasing functions. This is a simple generalization of the scalar-valued case. Although we then, formally, get a slightly stronger statement, it does not appear to be any more useful than the statement above. We will therefore omit this. For the proof we need two lemmas, the first of which actually contains the main step of the proof.

Lemma 7.2. For f, g ∈ H there is a unique left-continuous function σf,g of bounded variation, with σf,g(−∞) = 0, and the following properties:
• σf,g is Hermitian in f, g (i.e., σf,g is the complex conjugate of σg,f and is linear in f), and σf,f is increasing.
• (f, g) ↦ ∫ dσf,g is a bounded sesquilinear form on H. In fact, we even have ∫_{−∞}^{∞} |dσf,g| ≤ ‖f‖‖g‖.
• ⟨Rλf, g⟩ = ∫_{−∞}^{∞} dσf,g(t)/(t − λ).

Proof. The uniqueness of σf,g follows from the Stieltjes inversion formula, applied to F(λ) = ⟨Rλf, g⟩. Since ⟨Rλf, g⟩ is sesquilinear in f, g and (Rλ)∗ = Rλ̄, it then follows that σf,g is Hermitian if it exists. However, by Theorem 5.3 the function λ ↦ ⟨Rλf, f⟩ is a Nevanlinna


function of λ for any f, so we have

(7.1)  ⟨Rλf, f⟩ = α + βλ + ∫_{−∞}^{∞} (1/(t − λ) − t/(1 + t²)) dσf,f(t),

where σf,f is increasing and α, β may depend on f. Since ‖Rλ‖ ≤ 1/|Im λ|, we find that ‖f‖² is an upper bound for |ν⟨R_{iν}f, f⟩| for ν ∈ ℝ, the imaginary part of which is βν² + ∫_{−∞}^{∞} (ν²/(t² + ν²)) dσf,f(t). Hence β = 0, and by Fatou's lemma we get, as ν → ∞, that ∫_{−∞}^{∞} dσf,f ≤ ‖f‖². A more elementary argument is the following: For ν, ε > 0 we have

(1/(1 + ε²)) ∫_{−νε}^{νε} dσf,f ≤ ∫_{−∞}^{∞} (ν²/(t² + ν²)) dσf,f(t) ≤ ‖f‖²,

since 1/(1 + ε²) ≤ ν²/(ν² + t²) for |t| ≤ νε, so letting ν → ∞, and then ε → 0, we obtain the same bound. We may now assume σf,f to be normalized so as to be left-continuous with σf,f(−∞) = 0. Clearly ∫_{−∞}^{∞} (t/(1 + t²)) dσf,f(t) is absolutely convergent, so this part of the integral in (7.1) may be incorporated in the constant α. So, with absolute convergence, we have ⟨Rλf, f⟩ = α′ + ∫_{−∞}^{∞} dσf,f(t)/(t − λ). However, as λ → ∞ along the imaginary axis, both the left hand side and the integral tend to 0 (Exercise 7.1), so we must have α′ = 0. The proof is now finished in the case f = g. By the polarization identity (Exercise 7.2)

⟨Rλf, g⟩ = (1/4) Σ_{k=0}^{3} i^k ⟨Rλ(f + i^k g), f + i^k g⟩,

so we obtain ⟨Rλf, g⟩ = ∫_{−∞}^{∞} dσf,g(t)/(t − λ) by setting

σf,g = (1/4) Σ_{k=0}^{3} i^k σ_{f+i^k g, f+i^k g}.

The function σf,g has the correct normalization, so only the bound on the total variation remains to be proved. But if ∆ is an interval, then (f, g) ↦ ∫_∆ dσf,g is a semi-scalar product on H, so the Cauchy-Schwarz inequality |∫_∆ dσf,g|² ≤ ∫_∆ dσf,f ∫_∆ dσg,g is valid. For ∆ = ℝ this shows that ∫_ℝ dσf,g is bounded by ‖f‖‖g‖. If {∆j} is a partition of ℝ into disjoint intervals we obtain

Σ_j |∫_{∆j} dσf,g| ≤ Σ_j (∫_{∆j} dσf,f)^{1/2} (∫_{∆j} dσg,g)^{1/2}
  ≤ (Σ_j ∫_{∆j} dσf,f)^{1/2} (Σ_j ∫_{∆j} dσg,g)^{1/2} ≤ ‖f‖‖g‖,


where the second inequality is the Cauchy-Schwarz inequality in ℓ². The proof is complete. □

Lemma 7.3. ∫_{−∞}^{∞} dσf,g = ⟨f, g⟩ for any f, g ∈ H.

Proof. Assume first that f ∈ D(T), so that f = Rλ(v − λf), where v = Tf. Thus ⟨f, g⟩ = −λ⟨Rλf, g⟩ + ⟨Rλv, g⟩. Since −iν ∫_{−∞}^{∞} dσf,g(t)/(t − iν) → ∫_{−∞}^{∞} dσf,g as ν → ∞ by bounded convergence (Exercise 7.1), the lemma is true for f ∈ D(T), which is dense in H. But ∫_{−∞}^{∞} dσf,g is a bounded Hermitian form on H since |∫_{−∞}^{∞} dσf,g| ≤ ∫_{−∞}^{∞} |dσf,g| ≤ ‖f‖‖g‖ by Lemma 7.2, so the general case follows by continuity. □

Proof of the spectral theorem. We first show the uniqueness of the resolution of the identity. So, assume a resolution of the identity with all the properties claimed exists. Then EtEs = E_{min(s,t)}, so for w ∈ D(T) and fixed s we obtain

∫_{−∞}^{s} d⟨EtTw, v⟩ = ⟨EsTw, v⟩ = ⟨TEsw, v⟩ = ∫_{−∞}^{∞} t d⟨EtEsw, v⟩ = ∫_{−∞}^{s} t d⟨Etw, v⟩.

Thus d⟨EtTw, v⟩ = t d⟨Etw, v⟩ as measures. Now suppose w = Rλu. We then get

d⟨Etu, v⟩/(t − λ) = d⟨Et(T − λ)Rλu, v⟩/(t − λ) = d⟨EtRλu, v⟩.

It follows that ⟨Rλu, v⟩ = ∫ d⟨Etu, v⟩/(t − λ). The uniqueness of the spectral projectors therefore follows from the Stieltjes inversion formula.

The linear form f ↦ σf,g(t) is bounded for each g ∈ H (by ‖g‖, according to Lemma 7.2). By the Riesz representation theorem it is therefore of the form ⟨f, gt⟩, where ‖gt‖ ≤ ‖g‖. It is obvious that gt depends linearly on g, so gt = Etg, where Et is a linear operator with norm ≤ 1, which is selfadjoint since σf,g is Hermitian. Furthermore Etf ⇀ 0 as t → −∞ by the normalization of σf,g, and Etf ⇀ f as t → ∞ (weak convergence) by Lemma 7.3. Suppose we knew that Et is a projection. Since Et is selfadjoint it is then an orthogonal projection. It follows that ‖Etf‖² = ⟨Etf, f⟩ → 0 as t → −∞, and similarly ‖f − Etf‖² = ⟨f − Etf, f⟩ → 0 as t → ∞. Hence we only need to show that Et is a projection increasing with t, and the statements about T.


The resolvent relation Rλ − Rµ = (λ − µ)RλRµ may be expressed as

∫_{−∞}^{∞} d⟨Etf, g⟩/((t − λ)(t − µ)) = ∫_{−∞}^{∞} d⟨EtRµf, g⟩/(t − λ)

(check this!), so the uniqueness of the Stieltjes transform shows that ⟨EtRµf, g⟩ = ∫_{−∞}^{t} d⟨Esf, g⟩/(s − µ). But

⟨EtRµf, g⟩ = ⟨Rµf, Etg⟩ = ∫_{−∞}^{∞} d⟨Esf, Etg⟩/(s − µ).

So, again by uniqueness, ⟨Esf, Etg⟩ = ⟨Euf, g⟩ where u = min(s, t), i.e., EtEs = E_{min(s,t)}. For s = t this shows that Et is a projection, and if t > s we get 0 ≤ (Et − Es)∗(Et − Es) = (Et − Es)² = Et − Es, so that {Et}, t ∈ ℝ, is an increasing family of orthogonal projections.

Now suppose f ∈ D(T) and v = Tf. For any non-real λ we then have f = Rλ(v − λf), or Rλv = f + λRλf. Since 1 + λ/(t − λ) = t/(t − λ) we therefore obtain

∫_{−∞}^{∞} dσv,g(t)/(t − λ) = ∫_{−∞}^{∞} t dσf,g(t)/(t − λ),

so that σv,g(t) = ∫_{−∞}^{t} s dσf,g(s). In particular, ⟨Tf, g⟩ = ∫_{−∞}^{∞} t d⟨Etf, g⟩. We also get σv,v(t) = ∫_{−∞}^{t} s dσf,v(s) = ∫_{−∞}^{t} s² dσf,f(s), so that ‖Tf‖² = ∫_{−∞}^{∞} s² d⟨Esf, f⟩.

Next we prove that any u ∈ H for which ∫_{−∞}^{∞} s² d⟨Esu, u⟩ < ∞ is in D(T). To see this, note that

∫_∆ |d⟨Esu, v⟩| ≤ √(∫_∆ d⟨Esu, u⟩ ∫_∆ d⟨Esv, v⟩)

if ∆ is a finite union of intervals. This follows just as in the proof of Lemma 7.2. Now let ∆k = {s | 2^{k−1} < |s| ≤ 2^k}, k ∈ ℤ. Then

|∫_{∆k} s d⟨Esu, v⟩| ≤ 2^k ∫_{∆k} |d⟨Esu, v⟩| ≤ 2^k √(∫_{∆k} d⟨Esu, u⟩ ∫_{∆k} d⟨Esv, v⟩)
  ≤ 2√(∫_{∆k} s² d⟨Esu, u⟩ ∫_{∆k} d⟨Esv, v⟩).

If now ∫_{−∞}^{∞} s² d⟨Esu, u⟩ < ∞, we obtain from this by adding over all k and using the Cauchy-Schwarz inequality for sums that |∫_{−∞}^{∞} s d⟨Esu, v⟩| ≤ 2√(∫_{−∞}^{∞} s² d⟨Esu, u⟩) ‖v‖, so that the anti-linear form v ↦ ∫_{−∞}^{∞} s d⟨Esu, v⟩

is bounded on H. It is therefore, by the Riesz representation theorem, a scalar product ⟨u1, v⟩. It is obvious that u1 depends linearly on u, i.e., there is a linear operator S so that u1 = Su. It is clear that S is symmetric and an extension of T, so we have T ⊂ S ⊂ S∗ ⊂ T∗ = T. Hence S = T, so the claims about D(T) are verified.

Finally, we must prove that TEt is the closure of EtT. From what we just proved it follows that if u ∈ D(T) then Etu ∈ D(T). For v ∈ H we then have ⟨TEtu, v⟩ = ∫_{−∞}^{∞} s d⟨EsEtu, v⟩ = ∫_{−∞}^{∞} s d⟨Esu, Etv⟩ = ⟨Tu, Etv⟩ = ⟨EtTu, v⟩, so TEt is an extension of EtT. Since Et is bounded and T closed, it follows that TEt is closed (Exercise 7.3). Now suppose Etu ∈ D(T). We must find uj ∈ D(T) such that uj → u and EtTuj → TEtu. Since D(T) is dense in H we can find vj ∈ D(T) so that vj → u. Now set uj = vj − Etvj + Etu. Clearly uj ∈ D(T), uj → u and EtTuj = TEtuj = TEtu, and the proof is complete. □

The operator Et is called the spectral projector for the interval (−∞, t). The spectral projector for the interval (a, b) is E_{(a,b)} = Eb − E_{a+}, where E_{a+} is the right hand limit of Et at a. Similarly E_{[a,b]} = E_{b+} − Ea, etc. For a general Borel set M ⊂ ℝ the spectral projector is defined to be EM = ∫_M dEt. Show that this is actually an orthogonal projection for any Borel set M! Obviously the various parts of the spectrum (point spectrum etc.) are determined by the behavior of the spectral projectors. We end this chapter with a theorem which makes this connection explicit.

Theorem 7.4.
(1) λ ∈ σp(T) if and only if Et jumps at t = λ, i.e., E_{{λ}} = E_{[λ,λ]} ≠ 0.
(2) λ ∈ ρ(T) ∩ ℝ if and only if Et is constant in a neighborhood of t = λ.

It follows that the continuous spectrum consists of those points of increase of Et which are not jumps¹.

Proof. If Et jumps at λ we can find a unit vector e in the range of E_{{λ}}, i.e., such that E_{{λ}}e = e. It follows immediately from the spectral theorem that e ∈ D(T) and (T − λ)e = 0.
Conversely, suppose that e is a unit vector with Te = λe. Then

0 = ‖(T − λ)e‖² = ∫_{−∞}^{∞} (t − λ)² d⟨Ete, e⟩,

so that the support of the non-zero, non-negative measure d⟨Ete, e⟩ is contained in {λ}. Hence Et jumps at λ, and the proof of (1) is complete.

Now assume Et is constant in (λ − ε, λ + ε). Then λ is not an eigen-value of T, so Sλ is dense in H. Thus the inverse of T − λ exists

¹A point of increase for Et is a point λ such that E∆ ≠ 0 for every open ∆ ∋ λ.


as a closed, densely defined operator. We need only show that this inverse is bounded to see that its domain is all of H, so that λ ∈ ρ(T). But k(T − λ)uk² = ∫_{−∞}^∞ (t − λ)² dhEt u, ui ≥ ε² ∫_{−∞}^∞ dhEt u, ui = ε² kuk², so the inverse of T − λ is bounded by 1/ε.
Conversely, assume that Et is not constant near λ. Then there are arbitrarily short intervals ∆ containing λ such that E∆ ≠ 0, i.e., there are non-zero vectors u such that E∆ u = u. But then k(T − λ)uk ≤ |∆| kuk, where |∆| is the length of ∆. Hence we can find a sequence of unit vectors uj, j = 1, 2, . . . , for which (T − λ)uj → 0 (a 'singular sequence'). Consequently either T − λ is not injective, or else its inverse is unbounded, so λ ∉ ρ(T). ¤

Exercises for Chapter 7

Exercise 7.1. Suppose σ is increasing and ∫_{−∞}^∞ dσ < ∞. Show that ∫_{−∞}^∞ (−λ/(t − λ)) dσ(t) → ∫_{−∞}^∞ dσ as λ → ∞ along any non-real ray originating in the origin. In particular, ∫_{−∞}^∞ dσ(t)/(t − λ) → 0.

Exercise 7.2. Suppose B(·, ·) is a sesqui-linear form on a complex linear space. Show the polarization identity

B(u, v) = (1/4) Σ_{k=0}^{3} i^k B(u + i^k v, u + i^k v).
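The identity is easy to check numerically. The following sketch (an illustration only, not part of the exercise) verifies it for a sesqui-linear form B given by an arbitrary matrix M, with B linear in u and conjugate-linear in v, matching the convention for scalar products used in these notes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

def B(u, v):
    # Sesqui-linear form: linear in u, conjugate-linear in v.
    return np.conj(v) @ M @ u

u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Polarization identity: B(u, v) = (1/4) sum_{k=0}^{3} i^k B(u + i^k v, u + i^k v)
recovered = sum(1j**k * B(u + 1j**k * v, u + 1j**k * v) for k in range(4)) / 4

assert np.isclose(recovered, B(u, v))
```

Note that no symmetry of B is needed: the terms B(u, u) and B(v, v) cancel when summed against the fourth roots of unity.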

Exercise 7.3. Show that if T is a closed operator on H and S is bounded and everywhere defined, then T S, but not necessarily ST, is closed.

Exercise 7.4. Show that if T is selfadjoint and f is a continuous function defined on σ(T), then f(T) = ∫_{−∞}^∞ f(t) dEt defines a densely defined operator, which is bounded if f is, and selfadjoint if f is real-valued. Also show that (f(T))∗ = f̄(T), that (f(T))∗ has the same domain as f(T) and commutes with it in a reasonable sense, and that (fg)(T) = f(T)g(T). This is the functional calculus for a selfadjoint operator, and it also makes sense for arbitrary Borel functions. The integral is made sense of in the same way as in the statement of the spectral theorem.

Exercise 7.5. Let T be selfadjoint and put H(t) = e^{−itT}, t ∈ R, the exponential being defined as in the previous exercise. Show that H(t + s) = H(t)H(s) for real t and s (a group of operators), that H(t) is unitary, and that if u0 ∈ D(T), then u(t) = H(t)u0 solves the Schrödinger equation T u = iu′(t) with initial data u(0) = u0. Similarly, if T ≥ 0 and t ≥ 0, show that K(t) = e^{−tT} is selfadjoint and bounded, that K(t + s) = K(t)K(s) for s ≥ 0 and t ≥ 0 (a semigroup of operators), and that if u0 ∈ H then u(t) = K(t)u0 solves the heat equation u′(t) = −T u for t > 0 with initial data u(0) = u0.
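In finite dimension the spectral integral of Exercises 7.4–7.5 is a finite sum over an orthonormal eigenbasis, so the claims of Exercise 7.5 can be checked directly. The sketch below uses a random Hermitian matrix in place of T; the matrix, time values, step size and tolerances are arbitrary choices made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
T = (A + A.conj().T) / 2                   # a selfadjoint "operator" on C^5

# Spectral decomposition; H(t) = e^{-itT} acts as e^{-i t t_j} on each eigenvector.
eigval, eigvec = np.linalg.eigh(T)

def H(t):
    return eigvec @ np.diag(np.exp(-1j * t * eigval)) @ eigvec.conj().T

t, s = 0.7, -1.3
group_law = np.allclose(H(t + s), H(t) @ H(s))
unitary = np.allclose(H(t) @ H(t).conj().T, np.eye(n))

# u(t) = H(t)u0 solves the Schroedinger equation T u = i u'(t).
u0 = rng.standard_normal(n) + 1j * rng.standard_normal(n)
h = 1e-6
du = (H(t + h) - H(t - h)) @ u0 / (2 * h)  # central difference in t
schrodinger = np.allclose(T @ (H(t) @ u0), 1j * du, atol=1e-5)

assert group_law and unitary and schrodinger
```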

CHAPTER 8

Compactness

If a selfadjoint operator T has a complete orthonormal sequence of eigenvectors e1, e2, . . . , then for any f ∈ H we have f = Σ_j f̂j ej where f̂j = hf, ej i are the generalized Fourier coefficients; we have a generalized Fourier series. However, σp(T) can still be very complicated; it may for example be dense in R (so that σ(T) = R), and each eigenvalue can have infinite multiplicity. We have a considerably simpler situation, more similar to the case of classical Fourier series, if the resolvent is compact.

Definition 8.1.
• A subset of a Hilbert space is called precompact (or relatively compact) if every sequence of points in the set has a strongly convergent subsequence.
• An operator A : H1 → H2 is called compact if it maps bounded sets into precompact ones.

Note that in an infinite-dimensional space it is not enough for a set to be bounded (or even closed and bounded) for it to be precompact. For example, the closed unit sphere is closed and bounded, and it contains an orthonormal sequence. But no orthonormal sequence has a strongly convergent subsequence! The second point of the definition means that if {uj} is a bounded sequence in H1, then {Auj} has a subsequence which converges strongly in H2.

Theorem 8.2.
(1) The operator A is compact if and only if every weakly convergent sequence is mapped onto a strongly convergent sequence. Equivalently, if uj ⇀ 0 implies that Auj → 0.
(2) If A : H1 → H2 is compact and B : H3 → H1 bounded, then AB is compact.
(3) If A : H1 → H2 is compact and B : H2 → H3 bounded, then BA is compact.
(4) If A : H1 → H2 is compact, then so is A∗ : H2 → H1.

Proof. If uj ⇀ u then uj − u ⇀ 0, and if A(uj − u) → 0 then Auj → Au. Thus the last statement of (1) is obvious. By Theorem 3.9 every bounded sequence has a weakly convergent subsequence, so if A maps weakly convergent sequences into strongly convergent ones, then A is compact. Conversely, suppose uj ⇀ u and A is compact. Since


weakly convergent sequences are bounded (Theorem 3.9), any subsequence of {Auj} has a convergent subsequence. Suppose Aujk → v. Then for any w ∈ H we have hv, wi = lim hAujk, wi = lim hujk, A∗wi = hu, A∗wi = hAu, wi, so that v = Au. Hence the only point of accumulation of {Auj} is Au, so Auj → Au¹. This completes the proof of (1). We leave the rest of the proof as an exercise for the reader (Exercise 8.1). ¤

Theorem 8.3. Suppose T is selfadjoint and its resolvent Rµ is compact for some µ. Then Rλ is compact for all λ ∈ ρ(T), and T has discrete spectrum, i.e., σ(T) consists of isolated eigenvalues of finite multiplicity.

Proof. By the resolvent relation Rλ = (I + (λ − µ)Rλ)Rµ, where I is the identity, so the first factor to the right is bounded. Hence Rλ is compact by Theorem 8.2.3.
Now let ∆ be a bounded interval. If u ∈ E∆H then kRλuk² = ∫_∆ dhEt u, ui/|t − λ|² ≥ Kkuk², where K = inf_{t∈∆} |t − λ|^{−2} > 0 (verify this calculation!). We have Rλuj → 0 if uj ⇀ 0, so the inequality shows that any weakly convergent sequence in E∆H is strongly convergent (i.e., the identity operator on E∆H is compact). This implies that E∆H has finite dimension (for example since an orthonormal sequence converges weakly to 0 but is not strongly convergent). In particular eigenspaces are finite-dimensional. It also follows that any bounded interval can only contain a finite number of points of increase for Et, because projections belonging to disjoint intervals have orthogonal ranges (Exercise 8.2). This completes the proof. ¤

Resolvents for different selfadjoint extensions of a symmetric operator are closely related. In particular, we have the following theorem.

Theorem 8.4. Suppose a densely defined symmetric operator T0 has a selfadjoint extension with compact resolvent and that dim Dλ < ∞ for some λ ∈ C \ R. Then every selfadjoint extension of T0 has compact resolvent.

Proof. Let Im λ ≠ 0 and let Rλ, R̃λ be resolvents of selfadjoint extensions of T0. Then A = Rλ − R̃λ has its range in Dλ, since Rλu and R̃λu both solve the equation T0∗v = λv + u. It follows that A is a compact operator, since if {uj} is a bounded sequence in H, then {Auj} is a bounded sequence in a finite-dimensional space, so by the Bolzano-Weierstrass theorem it has a convergent subsequence. If R̃λ is compact it therefore follows that Rλ = R̃λ + A is compact. ¤

¹If not, there would be a neighborhood O of Au and a subsequence of {Auj} lying outside O. But we could then find a convergent subsequence which does not converge to Au, a contradiction.


A natural question is now: How do I, in a concrete case, recognize that an operator is compact? One class of compact operators which is sometimes easy to recognize is the class of Hilbert-Schmidt operators.

Definition 8.5. A : H → H is called a Hilbert-Schmidt operator if for some complete orthonormal sequence e1, e2, . . . we have Σ_j kAejk² < ∞. The number |||A||| = (Σ_j kAejk²)^{1/2} is called the Hilbert-Schmidt norm of A.

Lemma 8.6. |||A||| is independent of the particular complete orthonormal sequence used in the definition, it is a norm, |||A||| = |||A∗|||, and any Hilbert-Schmidt operator is compact. The set of Hilbert-Schmidt operators on H is a Hilbert space in the Hilbert-Schmidt norm.

Proof. It is clear that ||| · ||| is a norm. Now suppose {ej} and {fj} are arbitrary complete orthonormal sequences. Using Parseval's formula twice it follows that Σ_j kAejk² = Σ_{j,k} |hAej, fki|² = Σ_{j,k} |hej, A∗fki|² = Σ_k kA∗fkk². Thus the Hilbert-Schmidt norm has the claimed properties. To see that A is compact, suppose uk ⇀ 0 and let ε > 0. Choose N so large that Σ_{j>N} kA∗ejk² < ε and let C be a bound for the sequence {uk}. By Parseval's formula we then have kAukk² = Σ_j |hAuk, eji|² = Σ_j |huk, A∗eji|². We obtain

kAukk² ≤ Σ_{j=1}^{N} |huk, A∗eji|² + C²ε → C²ε

as k → ∞, since |huk, A∗eji| ≤ CkA∗ejk and huk, A∗eji → 0 for each fixed j. It follows that Auk → 0, so that A is compact. We leave the proof of the last statement as an exercise for the reader (Exercise 8.4). ¤

It is usual to consider a differential operator defined in some domain Ω ⊂ Rⁿ as an operator in the space L²(Ω, w), where w > 0 is measurable and the scalar product in the space is given by hu, vi = ∫_Ω u(x)v(x)w(x) dx. In all reasonable cases the resolvent of such an operator can be realized as an integral operator, i.e., an operator of the form

(8.1) Au(x) = ∫_Ω g(x, y)u(y)w(y) dy for x ∈ Ω.

The function g, defined in Ω × Ω, is called the integral kernel of the operator A. The integral kernel of the resolvent of a differential operator is usually called Green's function for the operator.

Theorem 8.7. Assume g(x, y) is measurable as a function of both its variables and that y ↦ g(x, y) is in L²(Ω, w) for a.a. x ∈ Ω. Then the operator A of (8.1) is a Hilbert-Schmidt operator in L²(Ω, w) if and


only if g ∈ L²(Ω, w) ⊗ L²(Ω, w), i.e., if and only if

∫∫_{Ω×Ω} |g(x, y)|² w(x)w(y) dx dy < ∞.

Proof. Let {ej} be a complete orthonormal sequence in the space L²(Ω, w). For fixed x ∈ Ω we may view Aej(x) as the j:th Fourier coefficient of g(x, ·), so Parseval's formula gives Σ_j |Aej(x)|² = ∫_Ω |g(x, y)|² w(y) dy for a.a. x ∈ Ω. By monotone convergence the product of this function by w is in L¹(Ω) if and only if the Hilbert-Schmidt norm of A is finite. The theorem now follows by an application of Tonelli's theorem (i.e., a positive, measurable function is integrable over Ω × Ω if and only if the iterated integral is finite). ¤

Example 8.8. Consider the operator T in L²(−π, π) with domain D(T) consisting of those absolutely continuous functions u with derivative in L²(−π, π) for which u(π) = u(−π), given by T u = −i du/dx (cf. Example 4.8). This operator is selfadjoint and its resolvent is given by Rλu(x) = ∫_{−π}^{π} g(x, y, λ)u(y) dy, where Green's function g(x, y, λ) is given by

g(x, y, λ) = −(e^{−iλπ}/(2 sin λπ)) e^{iλ(x−y)} for y < x,
g(x, y, λ) = −(e^{iλπ}/(2 sin λπ)) e^{iλ(x−y)} for y > x.

The reader should verify this! Since ∫∫ |g(x, y, λ)|² dx dy < ∞ for non-integer λ the resolvent is a Hilbert-Schmidt operator, so it is compact.
Now consider the operator of Example 4.6. Green's function is now only defined for non-real λ and given by

(8.2) g(x, y, λ) = i (Im λ/|Im λ|) e^{iλ(x−y)} if (x − y) Im λ > 0, and 0 otherwise.

The reader should verify this as well! In this case there is no value of λ for which g(·, ·, λ) ∈ L²(R²), so the resolvent is not a Hilbert-Schmidt operator.
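The Hilbert-Schmidt claim of Example 8.8 can be checked numerically: the eigenfunctions of T are e^{inx}/√(2π) with eigenvalues n ∈ Z, so |||Rλ|||² = Σ_n |n − λ|^{−2}, and this should agree with ∫∫ |g|² dx dy. Since |g(x, y, λ)|² depends only on s = x − y, the double integral reduces to a single one. The sketch below compares the two; the value λ = 0.5 + 0.3i and the truncation sizes are arbitrary choices.

```python
import numpy as np

# |||R_lam|||^2 two ways for T = -i d/dx, periodic on L^2(-pi, pi).
# Kernel side: int int |g|^2 dx dy reduces, with b = Im(lam), to
#   int_0^{2 pi} (2 pi - s) (|c+|^2 e^{-2 b s} + |c-|^2 e^{2 b s}) ds,
# where c+ = -e^{-i lam pi}/(2 sin lam pi), c- = -e^{i lam pi}/(2 sin lam pi).
lam = 0.5 + 0.3j
b = lam.imag
c_plus = -np.exp(-1j * lam * np.pi) / (2 * np.sin(lam * np.pi))
c_minus = -np.exp(1j * lam * np.pi) / (2 * np.sin(lam * np.pi))

s = np.linspace(0.0, 2 * np.pi, 200001)
integrand = (2 * np.pi - s) * (abs(c_plus)**2 * np.exp(-2 * b * s)
                               + abs(c_minus)**2 * np.exp(2 * b * s))
hs_from_kernel = np.sum((integrand[1:] + integrand[:-1]) / 2) * (s[1] - s[0])

# Spectral side: sum over the eigenvalues n of T.
n = np.arange(-200000, 200001)
hs_from_spectrum = np.sum(1.0 / np.abs(n - lam)**2)

assert abs(hs_from_kernel - hs_from_spectrum) / hs_from_spectrum < 1e-3
```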


Exercises for Chapter 8

Exercise 8.1. Prove Theorem 8.2(2)–(4).

Exercise 8.2. Show that if ∆1 and ∆2 are disjoint intervals and {Et}t∈R a resolution of the identity, then the ranges of E∆1 and E∆2 are orthogonal. Generalize to the case when ∆1 and ∆2 are arbitrary Borel sets in R.

Exercise 8.3. Show the converse of Theorem 8.3, i.e., if the spectrum consists of isolated eigenvalues of finite multiplicity, then the resolvent is compact.
Hint: Let λ1, λ2, . . . be the eigenvalues ordered by increasing absolute value and repeated according to multiplicity, and let the corresponding normalized eigenvectors be e1, e2, . . . . Show that kRλuk² = Σ_j |hu, eji|²/|λ − λj|² and use this to see that Rλuk → 0 if uk ⇀ 0.

Exercise 8.4. Prove the last statement of Lemma 8.6.

Exercise 8.5. Verify all claims made in Example 8.8.

Exercise 8.6. Let T be a selfadjoint operator. Show that if the resolvent Rλ of T is a Hilbert-Schmidt operator and λj, j = 1, 2, . . . are the non-zero eigenvalues of T, then Σ_{j=1}^∞ λj^{−2} < ∞.
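Exercise 8.6 can be illustrated in a concrete case (an illustration, not a solution): for T u = −u″ on (0, π) with Dirichlet boundary conditions the eigenvalues are n², and the inverse of T is the integral operator with the well-known kernel g(x, y) = x(π − y)/π for x ≤ y. By Theorem 8.7 its squared Hilbert-Schmidt norm is ∫∫ g², which should equal Σ_n n^{−4} = π⁴/90.

```python
import numpy as np

# T u = -u'' on (0, pi), u(0) = u(pi) = 0, has eigenvalues n^2, n = 1, 2, ...,
# and its inverse has kernel g(x, y) = x(pi - y)/pi for x <= y (g symmetric).
# Prediction: int int g^2 dx dy = sum n^{-4} = pi^4/90.
N = 2000
x = (np.arange(N) + 0.5) * np.pi / N              # midpoint grid on (0, pi)
X, Y = np.meshgrid(x, x, indexing="ij")
g = np.where(X <= Y, X * (np.pi - Y), Y * (np.pi - X)) / np.pi
hs_sq = np.sum(g**2) * (np.pi / N)**2             # midpoint rule for the integral

assert abs(hs_sq - np.pi**4 / 90) < 1e-3
```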

CHAPTER 9

Extension theory

We will here complete the discussion of selfadjoint extensions of a symmetric operator begun in Chapter 4. This material is originally due to von Neumann, although our proofs are different, and we will also discuss an extension of von Neumann's theory needed in Chapter 13.

1. Symmetric operators

We shall find criteria for the existence of selfadjoint extensions of a densely defined symmetric operator, which according to the discussion just before Example 4.4 must be restrictions of the adjoint operator. We shall deal extensively with the graphs of various operators, and it will be convenient to use the same notation for the graph of an operator T as for T itself. Note that if T is a closed operator on the Hilbert space H, then its graph is a closed subspace of H ⊕ H, so in this case T is itself a Hilbert space. Recall that with the present notation we have T∗ = U((H ⊕ H) ⊖ T) = (H ⊕ H) ⊖ UT according to (4.1), where U : H ⊕ H ∋ (u, v) ↦ (−iv, iu) is the boundary operator introduced in Chapter 4. Also recall that U is selfadjoint, unitary and an involution on H ⊕ H.
So, assume we have a densely defined symmetric operator T. We want to investigate what selfadjoint extensions, if any, T has. Since T ⊂ T∗ the adjoint is densely defined, and thus the closure T̄ = T∗∗ exists (Proposition 4.3) and is also symmetric. We may therefore as well assume that T is closed to begin with. Recall that if S is a symmetric extension of T, then it is a restriction of T∗, since we then have T ⊂ S ⊂ S∗ ⊂ T∗. Now put

D±i = {U ∈ T∗ | UU = ±U},

where the first U in UU denotes the boundary operator. Note that U ∈ T∗ means exactly that U = (u, T∗u) for some u ∈ D(T∗). It is immediately seen that Di and D−i consist of the elements of T∗ of the form (u, iu) and (u, −iu) respectively, so that u satisfies the equation T∗u = iu respectively T∗u = −iu. We may therefore identify these spaces with the deficiency spaces D±i introduced in Chapter 5, and they are therefore also called deficiency spaces.

Theorem 9.1 (von Neumann). If T is a closed and symmetric operator, then T∗ = T ⊕ Di ⊕ D−i.


Proof. The facts that Di and D−i are eigenspaces of the unitary operator U for different eigenvalues and that hT, UT∗i = 0 imply that T, Di and D−i are orthogonal subspaces of T∗ (cf. Exercise 4.8). It remains to show that Di ⊕ D−i contains T∗ ⊖ T. However, U ∈ T∗ ⊖ T implies U ∈ H² ⊖ T and thus UU ∈ T∗. Denoting the identity on H² by I and using U² = I one obtains U+ = ½(I + U)U ∈ Di and U− = ½(I − U)U ∈ D−i. Clearly U = U+ + U−, so this proves the theorem. ¤

We define the deficiency indices of T to be n+ = dim Di and n− = dim D−i, so these are natural numbers or ∞. We may now characterize the symmetric extensions of T.

Theorem 9.2. If S is a closed, symmetric extension of the closed symmetric operator T, then S = T ⊕ D where D is a subspace of Di ⊕ D−i such that D = {u + Ju | u ∈ D(J) ⊂ Di} for some linear isometry J of a closed subspace D(J) of Di onto part of D−i. Conversely, every such space D gives rise to a closed symmetric extension S = T ⊕ D of T.

The proof is obvious after noting that if u+, v+ ∈ Di and u−, v− ∈ D−i, then hu+, v+i = hu−, v−i precisely if hu+ + u−, U(v+ + v−)i = 0. Some immediate consequences of Theorem 9.2 are as follows.

Corollary 9.3. The closed symmetric operator T is maximal symmetric precisely if one of n+ and n− equals zero, and selfadjoint precisely if n+ = n− = 0.

Corollary 9.4. If S is the symmetric extension of the closed symmetric operator T given as in Theorem 9.2 by the isometry J with domain D(J) ⊂ Di and range RJ ⊂ D−i, then the deficiency spaces for S are Di(S) = Di ⊖ D(J) and D−i(S) = D−i ⊖ RJ respectively.

Proof. If D ⊂ Di ⊕ D−i and S = T ⊕ D is symmetric, then u ∈ Di(S) ⊂ Di precisely if hT ⊕ D, Uui = 0. But hT, Uui = 0, and if u+ + u− ∈ D with u+ ∈ Di, u− ∈ D−i, then hu+ + u−, ui = hu+, ui, which shows that Di(S) = Di ⊖ D(J). Similarly the statement about D−i(S) follows. ¤

Corollary 9.5. Every symmetric operator has a maximal symmetric extension. If one of n+ and n− is finite, then all or none of the maximal symmetric extensions are selfadjoint, depending on whether n+ = n− or not. If n+ = n− = ∞, however, some maximal symmetric extensions are selfadjoint and some are not.
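The graph formalism is easy to sanity-check in finite dimension, where every operator is a bounded matrix (so the deficiency indices vanish) and the adjoint graph is simply the graph of the conjugate transpose. The sketch below verifies the formula T∗ = U((H ⊕ H) ⊖ T) for a random 4 × 4 matrix; the size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# Graph of A as a subspace of H (+) H = C^{2n}: the columns span {(x, Ax)}.
G = np.vstack([np.eye(n), A])
Q, _ = np.linalg.qr(G)
P = Q @ Q.conj().T                                 # projection onto graph(A)

# Boundary operator U(u, v) = (-iv, iu).
U = np.block([[np.zeros((n, n)), -1j * np.eye(n)],
              [1j * np.eye(n), np.zeros((n, n))]])

# U is selfadjoint, unitary and an involution.
assert np.allclose(U, U.conj().T) and np.allclose(U @ U, np.eye(2 * n))

# T* = U((H (+) H) - T): the image under U of the orthogonal complement...
P_adj = U @ (np.eye(2 * n) - P) @ U.conj().T

# ...which should be the graph of the adjoint matrix A*.
Gs, _ = np.linalg.qr(np.vstack([np.eye(n), A.conj().T]))
P_star = Gs @ Gs.conj().T

assert np.allclose(P_adj, P_star)
```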


We will now generalize Theorem 9.1. To do this, we use the notation of Lemma 5.1. Define

Dλ = {(u, λu) ∈ T∗} = {(u, λu) | u ∈ Dλ},
Eλ = {(u, λu + v) ∈ T∗ | v ∈ Dλ̄}.

It is clear that Eλ for non-real λ is the direct sum of Dλ and Dλ̄, since with a = i/(2 Im λ) we have (u, λu + v) = a(v, λ̄v) + (u − av, λ(u − av)). This direct sum is topological (i.e., the projections from Eλ onto Dλ and Dλ̄ are bounded) since all three spaces are obviously closed, so the assertion follows from the closed graph theorem. Carry out the argument as an exercise! We can now prove the following theorem.

Theorem 9.6. For any non-real λ we have T∗ = T +̇ Eλ as a topological direct sum.

Proof. Since all spaces involved are closed, it is enough to show the formula algebraically (the reason is as above). Let (u, v) ∈ T∗. By Lemma 5.1.1 H = Sλ ⊕ Dλ̄, so we may write v − λu = w0 + wλ̄ with wλ̄ ∈ Dλ̄ and w0 ∈ Sλ. We can find u0 ∈ H such that (u0, λu0 + w0) ∈ T, so (u, v) = (u0, λu0 + w0) + (u − u0, λ(u − u0) + wλ̄). The last term is obviously in Eλ. If (u, v) ∈ T ∩ Eλ we have v − λu ∈ Sλ ∩ Dλ̄ = {0}, so that v = λu; since a symmetric operator has no non-real eigenvalues this gives u = 0, and the sum is direct. ¤

Corollary 9.7. If Im λ > 0, then dim Dλ = n+ and dim Dλ̄ = n−.

Proof. Suppose U = (u, T∗u) and V = (v, T∗v) are in T∗. The boundary form

hU, UVi = i(hu, T∗vi − hT∗u, vi)

is a bounded Hermitian form on T∗. It is immediately verified that it is positive definite on Dλ, negative definite on Dλ̄, non-positive on T +̇ Dλ̄ and non-negative on T +̇ Dλ.
Let µ be a complex number with Im µ > 0. We get a linear map of Dµ into Dλ in the following way. Given u ∈ Dµ we may write u = u0 + uλ + uλ̄ uniquely with u0 ∈ T, uλ ∈ Dλ and uλ̄ ∈ Dλ̄, according to Theorem 9.6. Let the image of u in Dλ be uλ. Then uλ can not be 0 unless u is, since the boundary form is positive definite on Dµ but non-positive on T +̇ Dλ̄. It follows that dim Dµ ≤ dim Dλ. By symmetry the dimensions of Dλ and Dµ are then equal, i.e., dim Dλ = n+. Similarly one shows that dim Dλ̄ = n−. ¤

2. Symmetric relations

This section is a simplified version of Section 1 of [2]. Most of it can also be found in [1]. The theory of symmetric and selfadjoint relations is an easy extension of the corresponding theory for operators, but will be essential for Chapters 13 and 14.


We call a (closed) linear subspace T of H² = H ⊕ H a (closed) linear relation on H. This is a generalization of the concept of (the graph of) a linear operator which will turn out to be useful in the following chapters. We still denote by U the boundary operator on H² and define the adjoint of the linear relation T on H by

T∗ = H² ⊖ UT = U(H² ⊖ T).

Clearly T∗ is a closed linear relation on H. Note that by not insisting that T and T∗ are graphs we can, for example, now deal with adjoints of non-densely defined operators. Naturally T is called symmetric if T ⊂ T∗ and selfadjoint if T = T∗.

Proposition 9.8. Let T ⊂ S be linear relations on H. Then S∗ ⊂ T∗. The closure of T is T̄ = T∗∗ and (T̄)∗ = T∗.

The reader should prove this proposition as an exercise. It is very easy to obtain a spectral theorem for selfadjoint relations as a corollary to the spectral theorem of Chapter 7. Given a relation T we call the set D(T) = {u ∈ H | (u, v) ∈ T for some v ∈ H} the domain of T. Now let HT be the closure of D(T) in H and put

H∞ = {u ∈ H | (0, u) ∈ T∗}.

One may view H∞ as the 'eigenspace' of T∗ corresponding to the 'eigenvalue' ∞.

Proposition 9.9. H = HT ⊕ H∞.

Proof. We have h(u, v), U(0, w)i = ihu, wi, so that (0, w) ∈ T∗ precisely when w ∈ H ⊖ D(T). The proposition follows. ¤

Now assume T is selfadjoint and put T∞ = {0} × H∞ and T̃ = T ∩ HT². Then it is clear that T = T̃ ⊕ T∞, so we have split T into its many-valued part T∞ and T̃, which is called the operator part of T because of the following theorem.

Theorem 9.10 (Spectral theorem for selfadjoint relations). If T is selfadjoint, then T̃ is the graph of a densely defined selfadjoint operator in HT with domain D(T).

Proof. T̃ is the graph of a densely defined operator on HT since (0, w) ∈ T̃ implies w ∈ H∞ ∩ HT = {0}. T̃ is selfadjoint since T̃ = T ⊖ T∞, so its adjoint (in HT) is T̃∗ = HT² ∩ (T∗ ⊕ UT∞) = HT² ∩ T = T̃ (check this calculation carefully!). ¤

It is now clear that we get a resolution of the identity for T by adjoining the orthogonal projector onto H∞ to the resolution of the identity for T̃.
Assume we have a symmetric relation T. We want to investigate what selfadjoint extensions, if any, T has. Since the closure T̄ of T is also symmetric we may as well assume that T is closed to begin with. Just as in the case of operators, if S is a symmetric extension of T,


then it is a restriction of T∗, since we then have T ⊂ S ⊂ S∗ ⊂ T∗. Now put

D±i = {u ∈ T∗ | Uu = ±u}.

It is immediately seen that Di and D−i consist of the elements of T∗ of the form (u, iu) and (u, −iu) respectively. We call them the deficiency spaces of T. The following generalizes von Neumann's formula.

Theorem 9.11. If T is a closed and symmetric relation, then T∗ = T ⊕ Di ⊕ D−i.

The proof is the same as for Theorem 9.1 and is left as Exercise 9.6. As before we define the deficiency indices of T to be n+ = dim Di and n− = dim D−i, so these are again natural numbers or ∞. The next theorem is completely analogous to Theorem 9.2, with essentially the same proof, so we leave this as Exercise 9.7.

Theorem 9.12. If S is a closed, symmetric extension of the closed symmetric relation T, then S = T ⊕ D where D is a subspace of Di ⊕ D−i such that D = {u + Ju | u ∈ D(J) ⊂ Di} for some linear isometry J of a closed subspace D(J) of Di onto part of D−i. Conversely, every such space D gives rise to a closed symmetric extension S = T ⊕ D of T.

The following consequences of Theorem 9.12 are completely analogous to Corollaries 9.3–9.5, and their proofs are left as Exercise 9.8.

Corollary 9.13. The closed symmetric relation T is maximal symmetric precisely if one of n+ and n− equals zero, and selfadjoint precisely if n+ = n− = 0.

Corollary 9.14. If S is the symmetric extension of the closed symmetric relation T given as in Theorem 9.12 by the isometry J with domain D(J) ⊂ Di and range RJ ⊂ D−i, then the deficiency spaces for S are Di(S) = Di ⊖ D(J) and D−i(S) = D−i ⊖ RJ respectively.

Corollary 9.15. Every symmetric relation has a maximal symmetric extension. If one of n+ and n− is finite, then all or none of the maximal symmetric extensions are selfadjoint, depending on whether n+ = n− or not. If n+ = n− = ∞, however, some maximal symmetric extensions are selfadjoint and some are not.
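Theorem 9.11 can be checked for a small concrete relation. On H = C², T = span{(e1, 0)} is a closed symmetric relation which is not the graph of a densely defined operator. The toy sketch below computes T∗ = H² ⊖ UT and the deficiency spaces numerically and verifies the orthogonal decomposition.

```python
import numpy as np

n = 2
I4 = np.eye(2 * n)

# A closed symmetric relation on H = C^2 which is not the graph of a densely
# defined operator: T = span{(e1, 0)} (domain span{e1}, value 0).
T = np.zeros((2 * n, 1), dtype=complex)
T[0, 0] = 1.0

# Boundary operator U(u, v) = (-iv, iu) on H (+) H = C^4.
U = np.block([[np.zeros((n, n)), -1j * np.eye(n)],
              [1j * np.eye(n), np.zeros((n, n))]])

def proj(cols):
    # Orthogonal projection onto the column span.
    q, _ = np.linalg.qr(cols)
    return q @ q.conj().T

def nullspace(M, tol=1e-10):
    _, sv, vh = np.linalg.svd(M)
    return vh[int(np.sum(sv > tol)):].conj().T

# Adjoint relation T* = H^2 (-) UT, represented by its orthogonal projection.
P_T = proj(T)
P_adj = I4 - proj(U @ T)

assert np.allclose(P_adj @ T, T)          # T is symmetric: T is contained in T*

# Deficiency spaces: solutions of Uw = +/- w lying inside T*.
Di = nullspace(np.vstack([I4 - P_adj, U - I4]))
Dmi = nullspace(np.vstack([I4 - P_adj, U + I4]))

assert Di.shape[1] == 1 and Dmi.shape[1] == 1    # deficiency indices (1, 1)

# Theorem 9.11: T* = T (+) D_i (+) D_{-i} as an orthogonal sum.
assert np.allclose(P_T + proj(Di) + proj(Dmi), P_adj)
```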
We will now prove a theorem generalizing Theorem 9.6. To do this, first note that Lemma 5.1 remains valid for relations, with the obvious


definitions of Sλ and Dλ and identical proofs. We now define

Dλ = {(u, λu) ∈ T∗} = {(u, λu) | u ∈ Dλ},
Eλ = {(u, λu + v) ∈ T∗ | v ∈ Dλ̄}.

As before it is clear that Eλ for non-real λ is the direct sum of Dλ and Dλ̄, since with a = i/(2 Im λ) we have (u, λu + v) = a(v, λ̄v) + (u − av, λ(u − av)), and that this direct sum is topological (i.e., the projections from Eλ onto Dλ and Dλ̄ are bounded).

Theorem 9.16. For any non-real λ we have T∗ = T +̇ Eλ as a topological direct sum.

Corollary 9.17. If Im λ > 0, then dim Dλ = n+ and dim Dλ̄ = n−.

The proofs of Theorem 9.16 and Corollary 9.17 are the same as those of Theorem 9.6 and Corollary 9.7 respectively, and are left as exercises.


Exercises for Chapter 9

Exercise 9.1. Fill in all missing details in the proofs of Theorem 9.2 and Corollaries 9.3–9.5.

Exercise 9.2. Show that if n+ = n− < ∞ and one selfadjoint extension of a symmetric operator has compact resolvent, then every other selfadjoint extension also has compact resolvent.
Hint: The difference of the resolvents of two selfadjoint extensions of a symmetric operator has range contained in Dλ.

Exercise 9.3. Suppose T is a closed and symmetric operator on H, that λ ∈ R and that Sλ is closed. Show that if λ is not an eigenvalue of T, then T∗ is the topological direct sum of T and Eλ and that n+ = n−. You may also show that if Sλ is closed but λ is an eigenvalue of T, then one still has n+ = n−.

Exercise 9.4. Suppose T is a symmetric and positive operator, i.e., hTu, ui ≥ 0 for every u ∈ D(T). Use the previous exercise to show that T has a selfadjoint extension (this is a theorem by von Neumann).

Exercise 9.5. Suppose T is a symmetric and positive operator. By the previous exercise T has at least one selfadjoint extension. Prove that there exists a positive selfadjoint extension (the so called Friedrichs extension). This is a theorem by Friedrichs.
Hint: First define [u, v] = hTu, vi + hu, vi for u, v ∈ D(T), show that this is a scalar product, and let H1 be the completion of D(T) in the corresponding norm. Next show that H1 may be identified with a subset of H and that for any u ∈ H the map H1 ∋ v ↦ hv, ui is a bounded linear form on H1. Conclude that hu, vi = [u, Gv] for u ∈ H1 and v ∈ H, where G is an operator on H with range in H1. Finally show that G⁻¹ − I, where I is the identity, is a positive selfadjoint extension of T.

Exercise 9.6. Prove Theorem 9.11.

Exercise 9.7. Prove Theorem 9.12.

Exercise 9.8. Prove Corollaries 9.13–9.15.

Exercise 9.9. Prove Theorem 9.16.

Exercise 9.10. Prove Corollary 9.17.

CHAPTER 10

Boundary conditions

A simple example of a formally symmetric differential equation is given by the general Sturm-Liouville equation

(10.1) −(pu′)′ + qu = wf.

Here the coefficients p, q and w are given real-valued functions on a given interval I. Standard existence and uniqueness theorems for the initial value problem are valid if 1/p, q and w are all in Lloc(I). There are (at least) two Hermitian forms naturally associated with this equation, namely ∫_I (pu′v′ + quv) and ∫_I uvw. Under appropriate positivity conditions either of these forms is a suitable choice of scalar product for a Hilbert space in which to study (10.1). The corresponding problems are then called left definite and right definite respectively. We will not discuss left definite problems in these lectures.
If p is not differentiable it is most convenient to interpret (10.1) as a first order system

( 0  1)       ( q    0  )     ( w  0 )
(−1  0) U′ +  ( 0  −1/p ) U = ( 0  0 ) V.

This equation becomes equivalent to (10.1) on setting U = (u, −pu′)ᵀ and letting the first component of V be f. It is a special case of a fairly general first order system

(10.2) Ju′ + Qu = Wv,

where J is a constant n × n matrix which is invertible and skew-Hermitian (i.e., J∗ = −J), and the coefficients Q and W are n × n matrix-valued functions which are locally integrable on I. In addition Q is assumed Hermitian and W positive semi-definite, and u, v are n × 1 matrix-valued functions. We shall study such systems in Chapters 13–15. Here we shall just deal with the case of the simple inhomogeneous scalar Sturm-Liouville equation

(10.3) −u″ + qu = λu + f

and the corresponding homogeneous eigenvalue problem −u″ + qu = λu. The latter is often called the one-dimensional Schrödinger equation. In later chapters we shall then see that with minor additional technical complications we may deal with the first order system (10.2) in much


the same way. This will of course include the more general Sturm-Liouville equation (10.1).
We shall study (10.3) in the Hilbert space L²(I), where I is an interval and the function q is real-valued and locally integrable in I, i.e., integrable on every compact subinterval of I. In L²(I) the scalar product is hu, vi = ∫_I uv. Before we begin we need to quote a few standard facts about Sturm-Liouville equations. Basic for what follows is the following existence and uniqueness theorem.

Theorem 10.1. Suppose q is locally integrable in an interval I and that c ∈ I. Then, for any locally integrable function f and arbitrary complex constants A, B and λ the initial value problem

−u″ + qu = λu + f in I,  u(c) = A,  u′(c) = B

has a unique, continuously differentiable solution u with locally absolutely continuous derivative defined in I. If A, B are independent of λ, the solution u(x, λ) and its x-derivative are entire functions of λ, locally uniformly in x.

We shall use this only when f is actually locally square integrable. The theorem has the following immediate consequence.

Corollary 10.2. Let q, λ and I be as in Theorem 10.1. Then the set of solutions of −u″ + qu = λu in I is a 2-dimensional linear space.

If one rewrites −u″ + qu = λu + f as a first order system according to the prescription following (10.1), then Theorem 10.1 and Corollary 10.2 become special cases of the theorems for first order systems given in Appendix C.
In order to get a spectral theory for (10.3) we need to define a minimal operator, show that it is densely defined and symmetric, calculate its adjoint and find the selfadjoint restrictions of the adjoint. We define Tc to be the operator u ↦ −u″ + qu with domain consisting of those continuously differentiable functions u which have compact support, i.e., are zero outside some compact subinterval of the interior of I, and which are such that u′ is locally absolutely continuous with −u″ + qu ∈ L²(I).
We will show that Tc is densely defined and symmetric and then calculate its adjoint, but first we need some preparation. If u, v are differentiable functions we define [u, v] = u(x)v′(x) − u′(x)v(x). This is called the Wronskian of u and v. It is clear that [u, v] = −[v, u], in particular [u, u] = 0. The following elementary fact is very important.

Proposition 10.3. If u and v are linearly independent solutions of −v″ + qv = λv on I, then the Wronskian [u, v] is a non-zero constant on I.


Proof. Differentiating we obtain [u, v]′ = uv″ − u″v = u(q − λ)v − (q − λ)uv = 0, so that the Wronskian is constant. If the constant is zero, then given any point c ∈ I the vectors (u(c), u′(c)) and (v(c), v′(c)) are proportional. Since the initial value problem has a unique solution this implies that u and v are proportional. ¤

Now let v1 and v2 be solutions of −v″ + qv = λv in I such that [v1, v2] = 1 in I. There are certainly such solutions. We may for example pick a point c ∈ I and specify initial values v1(c) = 1, v1′(c) = 0 respectively v2(c) = 0, v2′(c) = 1. By Proposition 10.3 the Wronskian is constant, equal to its value at c, which is 1. The following lemma states a version of the classical method known as variation of constants for solving the inhomogeneous equation in terms of the solutions of the homogeneous equation.

Lemma 10.4. Let v1, v2 be solutions of −v″ + qv = λv with [v1, v2] = 1, let c ∈ I and suppose f is locally integrable in I. Then the solution u of −u″ + qu = λu + f with initial data u(c) = u′(c) = 0 is given by

(10.4) u(x) = v1(x) ∫_c^x v2(y)f(y) dy − v2(x) ∫_c^x v1(y)f(y) dy.

Proof. With u given by (10.4) clearly u(c) = 0. Differentiating we obtain

u′(x) = v1′(x) ∫_c^x v2 f − v2′(x) ∫_c^x v1 f,

since the two other terms obtained cancel. Thus u′(c) = 0. Differentiating again we obtain

u″(x) = v1″(x) ∫_c^x v2 f − v2″(x) ∫_c^x v1 f − [v1, v2]f(x) = (q(x) − λ)u(x) − f(x),

which was to be proved. ¤
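Formula (10.4) is easy to test numerically. The sketch below (the choices q(x) = x², λ = 1, f = cos and the interval [0, 1] are arbitrary) integrates the homogeneous equation with a Runge-Kutta scheme, checks that the Wronskian [v1, v2] stays equal to 1 (Proposition 10.3), builds u from (10.4) by quadrature, and verifies the inhomogeneous equation and the initial data.

```python
import numpy as np

# Numerical check of formula (10.4) for -u'' + qu = lam*u + f on [0, 1].
q = lambda x: x**2
f = np.cos
lam, c = 1.0, 0.0
N = 4000
x = np.linspace(c, 1.0, N + 1)
h = x[1] - x[0]

def rhs(xx, y):
    # y = (v, v'); the homogeneous equation -v'' + qv = lam*v, i.e. v'' = (q - lam)v.
    return np.array([y[1], (q(xx) - lam) * y[0]])

def rk4(y0):
    ys = [np.array(y0)]
    for i in range(N):
        y = ys[-1]
        k1 = rhs(x[i], y)
        k2 = rhs(x[i] + h / 2, y + h / 2 * k1)
        k3 = rhs(x[i] + h / 2, y + h / 2 * k2)
        k4 = rhs(x[i] + h, y + h * k3)
        ys.append(y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4))
    return np.array(ys)

v1 = rk4([1.0, 0.0])          # v1(c) = 1, v1'(c) = 0
v2 = rk4([0.0, 1.0])          # v2(c) = 0, v2'(c) = 1

# Proposition 10.3: [v1, v2] = v1 v2' - v1' v2 is constant (= 1 here).
wronskian = v1[:, 0] * v2[:, 1] - v1[:, 1] * v2[:, 0]

def cumtrapz(g):
    out = np.zeros_like(g)
    out[1:] = np.cumsum((g[1:] + g[:-1]) / 2) * h
    return out

# Formula (10.4).
u = v1[:, 0] * cumtrapz(v2[:, 0] * f(x)) - v2[:, 0] * cumtrapz(v1[:, 0] * f(x))

# Residual of -u'' + (q - lam)u - f, by central differences.
upp = (u[2:] - 2 * u[1:-1] + u[:-2]) / h**2
residual = -upp + (q(x[1:-1]) - lam) * u[1:-1] - f(x[1:-1])

assert np.allclose(wronskian, 1.0, atol=1e-8)
assert np.max(np.abs(residual)) < 1e-4
assert abs(u[0]) < 1e-12 and abs((u[1] - u[0]) / h) < 1e-3
```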

Corollary 10.5. If f ∈ L¹(I) has compact support in I, then −u″ + qu = f has a solution u with compact support in I if and only if ∫_I vf = 0 for all solutions v of the homogeneous equation −v″ + qv = 0.

Proof. If we choose c to the left of the support of f, then by Lemma 10.4 the function u given by (10.4) is the only solution of −u″ + qu = f which vanishes to the left of c. Since v1, v2 are linearly independent, the equation has a solution of compact support if and only if f is orthogonal to both v1 and v2, which form a basis for the solutions of the homogeneous equation. The corollary follows. ¤

Lemma 10.6. The operator Tc is densely defined and symmetric. Furthermore, if u ∈ D(Tc∗) and f = Tc∗u, then u is differentiable with


locally absolutely continuous derivative and satisfies −u″ + qu = f. Conversely, if u, f ∈ L²(I) and this equation is satisfied, then u ∈ D(Tc∗) and Tc∗u = f.

Proof. Let u1 be a solution of −u1″ + qu1 = f. Assume u0 is in the domain of Tc and put f0 = Tcu0. Integrating by parts twice we get

(10.5) ∫_I u0 f = ∫_I u0 (−u1″ + qu1) = ∫_I (−u0″ + qu0) u1 = ∫_I f0 u1.

So, if f is orthogonal to the domain of Tc, then u1 is orthogonal to all compactly supported elements f0 ∈ L²(I) for which there is a solution u0 of −u0″ + qu0 = f0 with compact support. By Corollary 10.5 it follows that u1 solves −v″ + qv = 0, so that f = 0. Thus Tc is densely defined. The calculation (10.5) also proves the converse part of the lemma. Furthermore, if u is in the domain of Tc∗ with Tc∗u = f, we obtain 0 = hu0, fi − hf0, ui = hf0, u1 − ui. Just as before it follows that u1 − u solves the equation −v″ + qv = 0. It follows that u solves the equation −u″ + qu = f, so that Tc ⊂ Tc∗. The proof is complete. ¤

Being symmetric and densely defined, Tc is closeable, and we define the minimal operator T0 as the closure of Tc and denote the domain of T0 (the minimal domain) by D0. Similarly, the maximal operator T1 is T1 := Tc∗ with domain D1 ⊃ D0. Thus the maximal domain D1 consists of all differentiable functions u ∈ L²(I) such that u′ is locally absolutely continuous and T1u = −u″ + qu ∈ L²(I). We can now apply the theory of Chapter 9. The deficiency indices of T0 are accordingly the numbers of solutions of −u″ + qu = iu and −u″ + qu = −iu respectively which are linearly independent and in L²(I). Since each of these equations has only 2 linearly independent solutions, the deficiency indices can be no larger than 2. For the equation (10.3) the deficiency indices are always equal, since if u solves −u″ + qu = λu, then ū solves the equation with λ replaced by λ̄, and linear independence is preserved when conjugating functions. Thus, for our equation there are only three possibilities: The deficiency indices may both be 2, both may be 1, or both may be 0. We shall see later that all three cases occur, depending on the choice of q and I.
We will now take a closer look at how selfadjoint realizations are determined as restrictions of the maximal operator. Suppose u1 and u2 ∈ D1. Then the boundary form (cf.
Chapter 9) is

(10.6)  ⟨(u₁, T₁u₁), U(u₂, T₁u₂)⟩ = i∫_I (u₁(−ū₂″ + qū₂) − (−u₁″ + qu₁)ū₂) = i∫_I (−u₁ū₂″ + u₁″ū₂) = −i∫_I [u₁, u₂]′ = −i lim_{K→I} [u₁, u₂]|_∂K,
the limit being taken over compact subintervals K of I. We must restrict T₁ so that this vanishes. In some sense this means that the restriction of T₁ to a selfadjoint operator T is obtained by boundary conditions, since the limit clearly only depends on the values of u₁ and u₂ in arbitrarily small neighborhoods of the endpoints of I. This is of course the motivation for the terms boundary operator and boundary form.

The simplest case is when an endpoint is an element of I. This means that the endpoint is a finite number, and that q is integrable near the endpoint. Such an endpoint is called regular; otherwise the endpoint is singular. If both endpoints are regular, we say that we are dealing with a regular problem. We have a singular problem if at least one of the endpoints is infinite, or if q ∉ L¹(I).

Consider now a regular problem. It is clear that the deficiency indices are both 2 in the regular case, since all solutions of −u″ + qu = iu are continuous on the compact interval I and thus in L²(I). We shall investigate which boundary conditions yield selfadjoint restrictions of T₁. The boundary form depends only on the boundary values (u(a), u′(a), u(b), u′(b)), and the possible boundary values constitute a linear subspace of ℂ⁴. On the other hand, the boundary form is positive definite on Dᵢ and negative definite on D₋ᵢ, both of which are 2-dimensional spaces. The boundary values for the deficiency spaces therefore span two 2-dimensional spaces which do not overlap. It follows that as u ranges through D₁ the boundary values range through all of ℂ⁴. The boundary conditions need to restrict the 4-dimensional space Dᵢ ⊕ D₋ᵢ to the 2-dimensional space D of Theorem 9.2, so two independent linear conditions are needed. This means that there are 2 × 2 matrices A and B such that the boundary conditions are given by AU(a) + BU(b) = 0, where U = (u, −u′)ᵀ is regarded as a column vector.
Linear independence of the conditions means that the 2 × 4 matrix (A, B) must have linearly independent rows. Consider first the case when A is invertible. Then the condition is of the form U(a) = SU(b), where S = −A⁻¹B. If J denotes the 2 × 2 matrix with rows (0, 1) and (−1, 0), the boundary form is −i{(U₂(a))*JU₁(a) − (U₂(b))*JU₁(b)}, so symmetry requires this to vanish. Inserting U(a) = SU(b), the condition becomes (U₂(b))*(S*JS − J)U₁(b) = 0, where U₁(b) and U₂(b) are arbitrary 2 × 1 matrices. Thus it follows that the condition U(a) = SU(b) gives a selfadjoint restriction of T₁ precisely if S satisfies S*JS = J. Such a matrix S is called symplectic. Important special cases are when S is plus or minus the unit matrix; these cases are called periodic and antiperiodic boundary conditions respectively. Another valid choice is S = J. Since det J = 1 ≠ 0 it is clear that any symplectic matrix S satisfies |det S| = 1 (see also Exercise 10.1). In particular, it is invertible. It is clear that the inverse


of a symplectic matrix is also symplectic (show this!), so assuming instead that B is invertible again leads to boundary conditions of the form U(a) = SU(b) with a symplectic S.

It remains to consider the case when neither A nor B is invertible. Neither A nor B can then be zero, since the other matrix would then have to be invertible. Thus A and B both have linearly dependent rows, one of which has to be non-zero. We may assume the first row of A to be non-zero, and then adding an appropriate multiple of the first row to the second row of (A, B) we may assume the second row of A to be zero. The second row of B will then be non-zero, since the rows of (A, B) are linearly independent, and adding an appropriate multiple of the second row to the first we may cancel the first row of B. At this point the first row gives a condition on U(a) and the second a condition on U(b). Such boundary conditions are called separated.

We end the discussion of the regular case by determining which separated boundary conditions give rise to selfadjoint restrictions of T₁. Separated boundary conditions require u₁ū₂′ − u₁′ū₂ to vanish at each endpoint. One possibility is of course to require u₁ (and u₂) to vanish there. Such a boundary condition is called a Dirichlet condition. If there is an element u₁ in the domain of the selfadjoint realization for which u₁(a) does not vanish, we obtain for u₂ = u₁ that 0 = u₁′(a)ū₁(a) − u₁(a)ū₁′(a) = 2i Im(u₁′(a)ū₁(a)), so that u₁′(a)ū₁(a) is real. Equivalently, u₁′(a)/u₁(a) is real, say = −h ∈ ℝ. If u is any other element of the domain, the condition for symmetry becomes 0 = u′(a)ū₁(a) − u(a)ū₁′(a) = (u′(a) + hu(a))ū₁(a), so that we must have u′(a) + hu(a) = 0. On the other hand, imposing this condition on all elements of the maximal domain clearly makes the boundary form at a vanish. In particular, if h = 0 we have a Neumann boundary condition.
We may of course find α ∈ (0, π) such that h = cot α, and multiplying through by sin α the boundary condition becomes

(10.7)  u(a) cos α + u′(a) sin α = 0,

and then α = 0 gives a Dirichlet condition. For α = π/2 we obtain a Neumann condition, and any separated, selfadjoint boundary condition at a is given by (10.7) for some α ∈ [0, π).

To summarize: Separated, symmetric boundary conditions for a Sturm-Liouville equation are of the form (10.7) at a, with a similar condition at b (possibly for a different value of α, of course). Important special cases are α = 0, a Dirichlet condition, and α = π/2, a Neumann condition. Every other selfadjoint realization is given by coupled boundary conditions U(a) = SU(b) for a symplectic matrix S. Important special cases are periodic and antiperiodic boundary conditions.
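The algebra behind the coupled conditions is easy to test numerically. The following sketch (Python; the particular matrices, tolerances and helper names are our own choices, not from the text) verifies the symplectic condition S*JS = J for S = ±I and S = J, checks |det S| = 1, and tests a matrix of the form e^{iθ}P with P real and det P = 1, as in Exercise 10.1.

```python
# Numerical sanity check of the symplectic condition S*JS = J.
# 2x2 matrices are represented as lists of rows; entries may be complex.

import cmath

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def conj_transpose(A):          # the adjoint S* of a 2x2 matrix
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

J = [[0, 1], [-1, 0]]

def is_symplectic(S, tol=1e-12):
    P = mat_mul(mat_mul(conj_transpose(S), J), S)   # compute S*JS
    return all(abs(P[i][j] - J[i][j]) < tol for i in range(2) for j in range(2))

# periodic (S = I), antiperiodic (S = -I) and S = J are all symplectic,
# and every symplectic S has |det S| = 1:
for S in ([[1, 0], [0, 1]], [[-1, 0], [0, -1]], J):
    assert is_symplectic(S) and abs(abs(det(S)) - 1) < 1e-12

# a matrix of the form e^{i theta} P with P real, det P = 1 (cf. Exercise 10.1)
theta = 0.7
P = [[2, 3], [1, 2]]                                # real with det P = 1
S = [[cmath.exp(1j * theta) * P[i][j] for j in range(2)] for i in range(2)]
assert is_symplectic(S)
```

Note that for S = e^{iθ}P the check reduces to PᵀJP = det(P)·J, which explains the form claimed in Exercise 10.1.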


Let us now consider the singular case. We first treat the case when one endpoint is regular and the other singular. So, assume that I = [a, b) with a regular and b possibly singular.

Lemma 10.7. There are elements of D₁ which vanish in a neighborhood of b and have arbitrarily prescribed initial values u(a) and u′(a).

Proof. Let c ∈ (a, b) and let f ∈ L²(a, b) vanish in (c, b). Now solve −u″ + qu = f with initial data u(c) = u′(c) = 0, so that u vanishes in (c, b). It follows that u ∈ D₁, and we need to show that u(a) and u′(a) can be chosen arbitrarily by selection of f. Note that if −v″ + qv = 0, integrating by parts twice shows that

⟨f, v⟩ = ∫_a^c (−u″ + qu)v̄ = [−u′v̄ + uv̄′]_a^c = u′(a)v̄(a) − u(a)v̄′(a).

If v₁ and v₂ are solutions of −v″ + qv = 0 satisfying v₁(a) = 1, v₁′(a) = 0 and v₂(a) = 0, v₂′(a) = −1 respectively, we obtain u(a) = ⟨f, v₂⟩ and u′(a) = ⟨f, v₁⟩. Since v₁, v₂ are linearly independent, we can choose f to give arbitrary values to these, for example by choosing f as an appropriate linear combination of v₁ and v₂ in [a, c]. □

The fact that T₁ and T₀ are closed means that their domains are Hilbert spaces with norm-square ‖u‖₁² = ‖u‖² + ‖T₁u‖². We shall always view D₁ and D₀ as spaces in this way. We also note that if u ∈ D₁, then u is continuously differentiable. If K is a compact interval we define C¹(K) to be the linear space of continuously differentiable functions on K, provided with the norm ‖u‖_K = sup_K |u| + sup_K |u′|. Convergence of a sequence {uⱼ}₁^∞ in this space therefore means uniform convergence on K of uⱼ and uⱼ′ as j → ∞. This space is easily seen to be complete, and thus a Banach space. As we noted above, if K is a compact subinterval of I, then the restriction to K of any element of D₁ is in C¹(K). We will need the following fact.

Lemma 10.8. For every compact subinterval K ⊂ I there exists a constant C_K such that ‖u‖_K ≤ C_K‖u‖₁ for every u ∈ D₁. In particular, the linear forms D₁ ∋ u ↦ u(x) and D₁ ∋ u ↦ u′(x) are locally uniformly bounded in x.

Proof. The restriction map D₁ ∋ u ↦ u ∈ C¹(K) is linear, and we will show that this map is closed. By the closed graph theorem (see Appendix A) it then follows that the map is bounded, which is the statement of the lemma. To show that the map is closed we must show that if uⱼ → u in D₁ and the restrictions to K of uⱼ converge to ũ in C¹(K), then the restriction to K of u equals ũ. But this is clear, since if uⱼ converges in L²(I) to u, then the restrictions to K converge in L²(K) to the restriction of u to K. At the same time the restrictions to K converge


uniformly to ũ, so that ∫_K |uⱼ − u|² converges both to 0 and to ∫_K |ũ − u|². It follows that u = ũ a.e. in K. □

A bounded Hermitian form on a Hilbert space H is a map H × H ∋ (u, v) ↦ B(u, v) ∈ ℂ such that |B(u, v)| ≤ C‖u‖‖v‖ for some constant C. It is clear that the boundedness of a Hermitian form is equivalent to it being continuous as a function of its arguments. The boundary form i(⟨u, T₁v⟩ − ⟨T₁u, v⟩) is a bounded Hermitian form on D₁, i.e., it is a Hermitian form in u, v and is bounded by ‖u‖₁‖v‖₁, and by Lemma 10.8 the boundary form at a, i.e., u′(a)v̄(a) − u(a)v̄′(a), is also a bounded Hermitian form (bounded by 2C_K‖u‖₁‖v‖₁ if a ∈ K). Since

i(⟨u, T₁v⟩ − ⟨T₁u, v⟩) = −i lim_{x→b} [u, v](x) + i[u, v](a),

we see that i lim_{x→b} (u′(x)v̄(x) − u(x)v̄′(x)), the boundary form at b, is also a bounded Hermitian form on D₁. Since the forms at a and b vanish if u is in the domain of Tc, i.e., if u vanishes near a and b, it follows that they also vanish if u ∈ D₀. In particular, if u ∈ D₀, then u(a) = u′(a) = 0. Now T₀ is the adjoint of T₁, so it follows that this is the only condition at a for an element of D₁ to be in D₀, since it guarantees that the form at a vanishes. Of course, u ∈ D₀ also requires that the form at b vanishes. Now let Tₐ be the closure of the restriction of T₁ to those elements of D₁ which vanish near b, and let Dₐ be the domain of Tₐ. Then Lemma 10.7 and the boundedness of the forms at a and b show that the boundary form at b vanishes on Dₐ and that dim D₁/D₀ ≥ dim Dₐ/D₀ ≥ 2. We obtain the following theorem.

Theorem 10.9. If the interval I has one regular endpoint a, then n₊ = n₋ ≥ 1. If n₊ = n₋ = 1, then the boundary form at the singular endpoint vanishes on D₁, and any selfadjoint restriction of T₁ is given by a boundary condition of the form (10.7) at a and no condition at all at b.

Proof. If n₊ = n₋ = 0, then T₁ = T₀, so that T₁ is selfadjoint according to Theorem 9.1. But then we could not have dim D₁/D₀ ≥ 2. If n₊ = n₋ = 1, then 2 = dim D₁/D₀ ≥ dim Dₐ/D₀ ≥ 2, so that we must have D₁ = Dₐ. Thus the boundary form at the singular endpoint vanishes on D₁, and the boundary form at a vanishes precisely if we impose a boundary condition of the form (10.7). □

If n₊ = n₋ = 2 we obtain a selfadjoint restriction of T₁ by imposing two appropriate boundary conditions. One of them can be a condition of the form (10.7), and then a condition at the singular endpoint also has to be imposed. There are also selfadjoint restrictions obtained by imposing coupled boundary conditions; see Exercise 10.3. Whether one obtains deficiency indices 1 or 2 when one endpoint is regular clearly only depends on conditions near the singular endpoint.


It is customary to say that a singular endpoint is in the limit point condition if the deficiency indices are 1, and in the limit circle condition if the deficiency indices are 2. The terminology derives from the methods Weyl [12] used in 1910 to construct the resolvent of a Sturm-Liouville operator. If an interval has two singular endpoints in the limit point condition, it is clear that T₁ is selfadjoint, since the boundary form vanishes on D₁. No boundary conditions are therefore required in this case, and one often says that the operator Tc is essentially selfadjoint, since its closure T₀ = T₁ is selfadjoint. If one or both of the endpoints are in the limit circle condition, we have a situation similar to when the endpoint is regular, and need to impose boundary conditions of a similar type. Note, however, that at a limit circle endpoint the limits of u(x) and u′(x) for an element u ∈ D₁ do not necessarily exist. To formulate the boundary conditions in explicit terms one may instead use the idea of Exercise 10.3.

It is clearly an important problem to find explicit conditions on the interval and the coefficient q which guarantee limit point or limit circle conditions. A large number of different criteria for this are available today. We end the chapter by proving a simple criterion, known already to Weyl (with a more complicated proof), for the limit point condition at an infinite interval endpoint.

Theorem 10.10. Suppose q is bounded from below near ∞. Then (10.3) is in the limit point condition at ∞.

Proof. Suppose q > C on [a, ∞). Let u be the solution of −u″ + qu = (C + i)u with initial data u(a) = 1, u′(a) = 0. We have

(Re(u′ū))′ = Re(|u′|² + u″ū) = |u′|² + Re((q − C − i)|u|²) = |u′|² + (q − C)|u|² ≥ 0.

Thus Re(u′ū) is increasing, and Re(u′(a)ū(a)) = 0, so Re(u′ū) ≥ 0. But (|u|²)′ = 2 Re(u′ū), so |u|² is increasing. It follows that |u|² ≥ 1, so that we can not have u ∈ L²(a, ∞). Thus the deficiency indices are < 2.
□ For a more general result, see Exercise 10.4.

Exercises for Chapter 10

Exercise 10.1. Show that a symplectic 2 × 2 matrix S is of the form e^{iθ}P, where θ ∈ ℝ and P is a real 2 × 2 matrix with determinant 1. Also show that the inverses and adjoints of symplectic matrices are symplectic.

Exercise 10.2 (Hard!). Suppose that all solutions of −v″ + qv = λv are in L²(I) for some real or non-real λ. Show that this is then true for all complex λ.


Hint: If −u″ + qu = µu, write this as −u″ + (q − λ)u = (µ − λ)u and use the variation of constants formula, thinking of (µ − λ)u as an inhomogeneous term, to write down an integral equation for u in terms of solutions of −v″ + qv = λv. Using an initial point sufficiently close to an endpoint, use estimates in this integral equation to show that u is square integrable near the endpoint.

Exercise 10.3. Show that [u₁, v₁][u₂, v₂] − [u₁, v₂][u₂, v₁] = [u₁, u₂][v₁, v₂] for differentiable functions u₁, u₂, v₁, v₂. Next show that if [v₁, v₂] = 1, then the boundary form for u₁, u₂ ∈ D₁ at b equals lim_b ([u₁, v₁][u₂, v₂] − [u₁, v₂][u₂, v₁]). Furthermore, show that if −v″ + qv = λv and −u″ + qu = f, then ([u, v])′ = (f − λu)v. Finally, show that if all solutions of −v″ + qv = λv are in L²(a, b) and if u ∈ D₁, then the limit at b of [u, v] exists. Conclude that in the case n₊ = n₋ = 2 selfadjoint boundary conditions may be described by conditions on the values of [u, v₁], [u, v₂] at the endpoints, of exactly the same form as we described them for the regular case on the values of u, u′.

Exercise 10.4. Show that (10.3) is in the limit point condition at ∞ if q = q₀ + q₁, where q₀ is bounded from below and q₁ ∈ L¹(0, ∞).

Hint: First show that |u(x)|² ≤ 2∫_0^x |u′||u| if u(0) = 0, then multiply by |q₁| and integrate. Conclude that there is a constant A so that 2∫_0^x |q₁||u|² ≤ ∫_0^x |u′|² + A∫_0^x |u|² for all x > 0. Now show, similarly to the proof of Theorem 10.10, that |u|² is increasing if u(0) = 0, u′(0) = 1 and u satisfies −u″ + qu = λu for an appropriate λ.
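The monotonicity argument in the proof of Theorem 10.10 can also be watched numerically. The sketch below (our own illustration, with the ad hoc choices q(x) = x², C = 0, interval [0, 3] and a classical Runge-Kutta step) integrates −u″ + qu = (C + i)u with u(0) = 1, u′(0) = 0 and confirms that |u|² is nondecreasing, so that u cannot be square integrable at infinity.

```python
# Integrate -u'' + q u = (C + i) u, i.e. u'' = (q - C - i) u, with
# u(0) = 1, u'(0) = 0, and check that |u|^2 is nondecreasing, as the
# proof of Theorem 10.10 predicts.  q(x) = x^2 and C = 0 are ad hoc
# choices with q bounded from below.

def rhs(x, y):
    u, up = y
    q = x * x
    return (up, (q - 0 - 1j) * u)    # C = 0

def rk4(a, b, n):
    h = (b - a) / n
    x, y = a, (1 + 0j, 0 + 0j)
    vals = [abs(y[0]) ** 2]          # record |u|^2 along the way
    for _ in range(n):
        k1 = rhs(x, y)
        k2 = rhs(x + h/2, tuple(y[i] + h/2 * k1[i] for i in range(2)))
        k3 = rhs(x + h/2, tuple(y[i] + h/2 * k2[i] for i in range(2)))
        k4 = rhs(x + h, tuple(y[i] + h * k3[i] for i in range(2)))
        y = tuple(y[i] + h/6 * (k1[i] + 2*k2[i] + 2*k3[i] + k4[i])
                  for i in range(2))
        x += h
        vals.append(abs(y[0]) ** 2)
    return vals

mod2 = rk4(0.0, 3.0, 3000)
assert all(mod2[i+1] >= mod2[i] - 1e-9 for i in range(len(mod2) - 1))
assert mod2[-1] >= 1.0               # |u|^2 >= 1, so u is not in L^2 near infinity
```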

CHAPTER 11

Sturm-Liouville equations

The spectral theorem we proved in Chapter 7 is very powerful, but sometimes its abstract nature is a drawback, and one needs a more explicit expansion, analogous to Fourier series or Fourier transforms. A general theorem of this type was proved by von Neumann in 1949, but it is still of a fairly abstract nature. It can be applied to elliptic partial differential equations (Gårding around 1952), but gives more satisfactory results when applied to ordinary differential equations. How to do this was described by Gårding in an appendix to John, Bers and Schechter: Partial Differential Equations (1964). A slightly more general situation was treated in [2]. For Sturm-Liouville equations one can, however, just as easily obtain an expansion theorem directly. We will do that in this chapter.

As in our proof of the spectral theorem, we will deduce our results from properties of the resolvent, but we now need a more explicit description of the resolvent operator. The first step is to prove that the resolvent is actually an integral operator. First note that all elements of D₁ are continuously differentiable with locally absolutely continuous derivative, and according to Lemma 10.8 point evaluations of elements of D₁ (and of their derivatives) are locally uniformly bounded linear forms on D₁.

If T is a selfadjoint realization of (10.3) in L²(I), its resolvent Rλ is a bounded operator on L²(I) for every λ in the resolvent set. If E denotes the identity on L²(I) we have (T − λ)Rλ = E, so that TRλ = E + λRλ. Thus ‖TRλ‖ ≤ 1 + |λ|‖Rλ‖. Since Rλu is in the domain of T, we may also view the resolvent as an operator Rλ : L²(I) → D₁, where D₁ is viewed as a Hilbert space provided with the graph norm, as on page 65. This operator is bounded, since

‖Rλu‖₁² = ‖Rλu‖² + ‖TRλu‖² ≤ (‖Rλ‖² + (|λ|‖Rλ‖ + 1)²)‖u‖².

It is also clear that the analyticity of Rλ implies the analyticity of TRλ = E + λRλ, and therefore the analyticity of Rλ : L²(I) → D₁.
We obtain the following theorem.

Theorem 11.1. Suppose I is an interval, and that T is a selfadjoint realization in L²(I) of the equation (10.3). Then the resolvent Rλ of T may be viewed as a bounded linear map from L²(I) to C¹(K), for any compact subinterval K of I, which depends analytically on λ ∈ ρ(T) in the uniform operator topology. Furthermore, there exists Green's


function g(x, y, λ), which is in L²(I) as a function of y for every x ∈ I, and such that Rλu(x) = ⟨u, g(x, ·, λ)⟩ for any u ∈ L²(I). There is also a kernel g₁(x, y, λ), in L²(I) as a function of y for every x ∈ I, such that (Rλu)′(x) = ⟨u, g₁(x, ·, λ)⟩ for any u ∈ L²(I).

Proof. We already noted that ρ(T) ∋ λ ↦ Rλ ∈ B(L²(I), D₁) is analytic in the uniform operator topology. Furthermore, the restriction operator I_K : D₁ → C¹(K) is bounded and independent of λ. Hence ρ(T) ∋ λ ↦ I_K Rλ is analytic in the uniform operator topology. In particular, for fixed λ ∈ ρ(T) and any x ∈ I, the linear form L²(I) ∋ u ↦ (I_K Rλu)(x) = Rλu(x) is (locally uniformly) bounded. By Riesz' representation theorem we have Rλu(x) = ⟨u, g(x, ·, λ)⟩, where y ↦ g(x, y, λ) is in L²(I). Similarly, since L²(I) ∋ u ↦ (Rλu)′(x) is a bounded linear form for each x ∈ I, the kernel g₁ exists. □

Among other things, Theorem 11.1 tells us that if uⱼ → u in L²(I), then Rλuⱼ → Rλu in C¹(K), so that Rλuⱼ and its derivative converge locally uniformly. This is actually true even if uⱼ just converges weakly, but all we need is the following weaker result.

Lemma 11.2. Suppose Rλ is the resolvent of a selfadjoint realization T as above. Then if uⱼ ⇀ 0 weakly in L²(I), it follows that both Rλuⱼ → 0 and (Rλuⱼ)′ → 0 pointwise and locally boundedly.

Proof. Rλuⱼ(x) = ⟨uⱼ, g(x, ·, λ)⟩ → 0, since y ↦ g(x, y, λ) is in L²(I) for any x ∈ I. Now let K be a compact subinterval of I. A weakly convergent sequence in L²(I) is bounded, so since Rλ maps L²(I) boundedly into C¹(K), it follows that Rλuⱼ(x) is bounded independently of j and x for x ∈ K. Similarly for the sequence of derivatives. □

Corollary 11.3. If the interval I is compact, then any selfadjoint restriction T of T₁ has compact resolvent. Hence T has a complete orthonormal sequence of eigenfunctions in L²(I).

Proof. Suppose uⱼ ⇀ 0 weakly in L²(I).
If I is compact, then Lemma 11.2 implies that Rλ uj → 0 pointwise and boundedly in I, and hence by dominated convergence Rλ uj → 0 in L2 (I). Thus Rλ is compact. The last statement follows from Theorem 8.3. For a different proof, see Corollary 11.7. ¤ If T has compact resolvent, then the generalized Fourier series of any u ∈ L2 (I) converges to u in L2 (I). For functions in the domain of T much stronger convergence is obtained. Corollary 11.4. Suppose T has a complete orthonormal sequence of eigenfunctions in L2 (I). If u is in the domain of T , then the generalized Fourier series of u, as well as the differentiated series, converges locally uniformly in I. In particular, if I is compact, the convergence is uniform in I.


Proof. Suppose u is in the domain of T, i.e., Tu = v for some v ∈ L²(I), and let ṽ = v − iu, so that u = Rᵢṽ. If e is an eigenfunction of T with eigenvalue λ we have Te = λe, or (T + i)e = (λ + i)e, so that R₋ᵢe = e/(λ + i). It follows that

⟨u, e⟩e = ⟨Rᵢṽ, e⟩e = ⟨ṽ, R₋ᵢe⟩e = (1/(λ − i))⟨ṽ, e⟩e = ⟨ṽ, e⟩Rᵢe.

If s_N u denotes the N:th partial sum of the Fourier series for u, it follows that s_N u = Rᵢ s_N ṽ, where s_N ṽ is the N:th partial sum for ṽ. Since s_N ṽ → ṽ in L²(I), it follows from Theorem 11.1 and the remark after it that s_N u → u in C¹(K), for any compact subinterval K of I. □

The convergence is actually even better than the corollary shows, since it is absolute and uniform (see Exercise 11.2).

Example 11.5. Consider the equation −u″ = λu, first in L²(−π, π), with periodic boundary conditions u(−π) = u(π), u′(−π) = u′(π). The general solution is u(x) = A cos(√λ x) + B sin(√λ x), where A, B are constants. The boundary conditions may be viewed as linear equations for determining the constants A and B, and if there is going to be a non-trivial solution, the determinant must vanish. The determinant is

| 0               2 sin(√λ π) |
| −2 sin(√λ π)    0           |  = 4 sin²(√λ π),

so that λ = k², where k ∈ ℕ. For each eigenvalue k² > 0 we have two linearly independent eigenfunctions cos(kx) and sin(kx). For the eigenvalue 0 the eigenfunction is 1/√2. These functions are orthonormal if we use the scalar product ⟨u, v⟩ = (1/π)∫_{−π}^π uv̄ (check!). We obtain the classical (real) Fourier series f(x) = a₀/2 + Σ_{k=1}^∞ (a_k cos kx + b_k sin kx), where a₀ = ⟨f, 1⟩, a_k = ⟨f(x), cos kx⟩ for k > 0, and b_k = ⟨f(x), sin kx⟩. In this case Corollary 11.4 states that the series for u, as well as that for u′, converges uniformly if u is continuously differentiable with an absolutely continuous derivative such that u″ ∈ L²(−π, π).

Now consider the same equation in L²(0, π), with separated boundary conditions u(0) = 0 and u(π) = 0.
Applying this to the general solution we obtain first A = 0 and then B sin(√λ π) = 0, so a non-trivial solution exists only if λ = k² for a positive integer k. Thus the eigenfunctions are sin x, sin 2x, . . . . These are orthonormal if the scalar product used is ⟨u, v⟩ = (2/π)∫_0^π uv̄. We obtain a sine series f(x) = Σ_{k=1}^∞ b_k sin(kx), where b_k = ⟨f(x), sin kx⟩. This is the series expansion relevant to the vibrating string problem discussed in Chapter 0 (if the length of the string is π).

Finally, consider the same equation, still in L²(0, π), but now with separated boundary conditions u′(0) = 0 and u′(π) = 0. Applying this to the general solution we obtain first B = 0 and then A sin(√λ π) = 0, so a non-trivial solution requires λ = k² for a non-negative integer k. Thus the eigenfunctions are 1/√2, cos x, cos 2x, . . . . These are orthonormal with the same scalar product as in the previous example. We


obtain a cosine series f(x) = a₀/2 + Σ_{k=1}^∞ a_k cos(kx), where a₀ = ⟨f, 1⟩ and a_k = ⟨f(x), cos kx⟩.
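As a numerical check on the sine-series case (our own example, not from the text): u(x) = x(π − x) satisfies the Dirichlet conditions and has u″ ∈ L²(0, π), so by Corollary 11.4 its sine series converges uniformly; its coefficients have the closed form b_k = 8/(πk³) for odd k and 0 for even k.

```python
import math

# Sine series on (0, pi): b_k = <f, sin(kx)> = (2/pi) * int_0^pi f(x) sin(kx) dx,
# here for f(x) = x(pi - x), with closed form b_k = 8/(pi k^3) for odd k.

def f(x):
    return x * (math.pi - x)

def simpson(g, a, b, n=2000):            # composite Simpson rule, n even
    h = (b - a) / n
    s = g(a) + g(b) + sum((4 if i % 2 else 2) * g(a + i * h) for i in range(1, n))
    return s * h / 3

def bk(k):
    return (2 / math.pi) * simpson(lambda x: f(x) * math.sin(k * x), 0, math.pi)

for k in range(1, 8):                    # coefficients match the closed form
    exact = 8 / (math.pi * k ** 3) if k % 2 else 0.0
    assert abs(bk(k) - exact) < 1e-8

# the partial sums converge (here at x = pi/2, where f(pi/2) = pi^2/4)
s = sum(bk(k) * math.sin(k * math.pi / 2) for k in range(1, 40))
assert abs(s - math.pi ** 2 / 4) < 1e-3
```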

We have thus retrieved some of the classical versions of Fourier series, but it is clear that many other variants are obtained by simply varying the boundary conditions, and that many more examples are obtained by choosing a non-zero q in (10.3).

We now have a satisfactory eigenfunction expansion theory for regular boundary value problems, so we turn next to singular problems. We then need to take a much closer look at Green's function. We shall here primarily look at the case of separated boundary conditions for I = [a, b), where a is a regular endpoint and b possibly singular, and refer the reader to the theory of Chapter 15 for the general case. With this assumption Green's function has a particularly simple structure. Assume that ϕ, θ are solutions of −u″ + qu = λu with initial data ϕ(a, λ) = −sin α, ϕ′(a, λ) = cos α and θ(a, λ) = cos α, θ′(a, λ) = sin α.

Theorem 11.6. Suppose I = [a, b) with a regular, and that T is given by the separated condition (10.7) at a, and another separated condition at b if needed, i.e., if b is regular or in the limit circle condition. If Im λ ≠ 0, then g(x, y, λ) = ϕ(min(x, y), λ)ψ(max(x, y), λ), where ψ is called the Weyl solution and is given by ψ(x, λ) = θ(x, λ) + m(λ)ϕ(x, λ). Here m(λ) is called the Weyl-Titchmarsh m-coefficient and is a Nevanlinna function in the sense of Chapter 6. The kernel g₁ is g₁(x, y, λ) = ϕ′(x, λ)ψ(y, λ) if x < y and g₁(x, y, λ) = ϕ(y, λ)ψ′(x, λ) if x > y.

Proof. It is easily verified that [θ, ϕ] = 1. Now ϕ satisfies the boundary condition at a, and can therefore only satisfy the boundary condition at b if λ is an eigenvalue and thus real. On the other hand, there will be a solution in L²(a, b) satisfying the boundary condition at b, since if the deficiency indices are 1 there is no condition at b, and if the deficiency indices are 2, then the condition at b is a linear, homogeneous condition on a two-dimensional space, which leaves a space of dimension 1.
Thus we may find a unique m(λ) so that ψ = θ + mϕ satisfies the boundary condition at b. It follows that [ψ, ϕ] = [θ, ϕ] + m[ϕ, ϕ] = 1. Now setting v(x) = ⟨u, g(x, ·, λ)⟩ and assuming that u ∈ L²(a, b) has compact support, we obtain

v(x) = ψ(x, λ)∫_a^x uϕ(·, λ) + ϕ(x, λ)∫_x^b uψ(·, λ),

so that v(a) = −sin α ∫_a^b uψ(·, λ). Differentiating we obtain

(11.1)  v′(x) = ψ′(x, λ)∫_a^x uϕ(·, λ) + ϕ′(x, λ)∫_x^b uψ(·, λ),


since the other two terms obtained cancel. Thus v′(a) = cos α ∫_a^b uψ(·, λ), so v satisfies the boundary condition at a. If x is to the right of the support of u we obtain v(x) = ψ(x, λ)∫_a^b uϕ(·, λ), so that v also satisfies the boundary condition at b, being a multiple of ψ near b. Differentiating again we obtain −v″(x) + (q(x) − λ)v(x) = [ψ, ϕ]u(x) = u(x). It follows that v = Rλu and, since compactly supported functions are dense in L²(a, b), that g(x, y, λ) is Green's function for our operator. From (11.1) it now follows that the kernel g₁ is as stated.

It remains to show that m is a Nevanlinna function. If u and v both have compact supports in I we have

⟨Rλu, v⟩ = ∬ g(x, y, λ)u(y)v̄(x) dxdy,

the double integral being absolutely convergent. Similarly

⟨u, Rλ̄v⟩ = ∬ ḡ(y, x, λ̄)u(y)v̄(x) dxdy,

and since the integrals are equal for all u, v by Theorem 5.2.2 we obtain g(x, y, λ) = ḡ(y, x, λ̄) or, if x < y,

ϕ(x, λ)θ(y, λ) + ϕ(x, λ)ϕ(y, λ)m(λ) = ϕ(x, λ)θ(y, λ) + ϕ(x, λ)ϕ(y, λ)m̄(λ̄),

since ϕ(·, λ̄) = ϕ̄(·, λ) and similarly for θ. Since ϕ(x, λ) ≠ 0 for non-real λ (why?) it follows that m(λ) = m̄(λ̄). Now λ ↦ Rλu(x) is analytic for non-real λ, and for compactly supported u

Rλu(x) = θ(x, λ)∫_a^x uϕ(·, λ) + ϕ(x, λ)∫_x^b uθ(·, λ) + m(λ)ϕ(x, λ)∫_a^b uϕ(·, λ).

The first two terms on the right are obviously entire functions according to Theorem 10.1, as is the coefficient of m(λ), and since by choice of u we may always assume that this coefficient is non-zero in a neighborhood of any given λ, it follows that m(λ) is analytic for non-real λ. Finally, integration by parts shows that

λ∫_a^x |ψ|² = [−ψ′ψ̄]_a^x + ∫_a^x (|ψ′|² + q|ψ|²).


Taking the imaginary part of this, and using the fact that ψ satisfies the boundary condition at b so that Im(ψ′ψ̄) → 0 at b, we obtain

(11.2)  0 ≤ ∫_a^b |ψ(·, λ)|² = Im m(λ)/Im λ,

since a simple calculation shows that Im(ψ′(a, λ)ψ̄(a, λ)) = Im m(λ). It follows that m has all the required properties of a Nevanlinna function. □

Before we proceed, we note the following corollary, which completes our results for the case of a discrete spectrum.

Corollary 11.7. Suppose both endpoints of I are either regular or in the limit circle condition. Then for any selfadjoint realization T the resolvent is compact. Thus there is a complete orthonormal sequence of eigenfunctions.

Proof. By Theorem 8.4 it is enough to prove the corollary when T is given by separated boundary conditions. But as in the proof of Theorem 11.6 we can then find non-trivial solutions ψ₋(·, λ) and ψ₊(·, λ) of −v″ + qv = λv satisfying the boundary conditions to the left and right respectively. If Im λ ≠ 0, the solutions ψ±(·, λ) can not be linearly dependent, since this would give a non-real eigenvalue for T. We may therefore assume [ψ₊, ψ₋] = 1, multiplying ψ₋ by a constant if necessary. But then it is seen that ψ₋(min(x, y), λ)ψ₊(max(x, y), λ) is Green's function for T, just as in the proof of Theorem 11.6. It is clear that the assumption implies that the deficiency indices equal 2, so that ψ± are in L²(I). However, an easy calculation now shows that

∬_{I×I} |g(x, y, λ)|² dxdy ≤ 2‖ψ₋‖²‖ψ₊‖² < ∞.

Thus, according to Theorem 8.7, the resolvent is a Hilbert-Schmidt operator, so that it is compact. □

If at least one of the interval endpoints is singular and in the limit point condition, the resolvent may not be compact (but it can be!). In this case the only boundary condition will be a separated boundary condition at the other endpoint, unless this is also in the limit point condition, when no boundary conditions at all are required.

We now return to the situation treated in Theorem 11.6, when I = [a, b) with a regular, and T is given by the separated condition (10.7) at a, and another separated condition at b if needed. Since the m-coefficient is a Nevanlinna function, there is a unique increasing and left-continuous function ρ with ρ(0) = 0 and unique real


numbers A and B ≥ 0 such that

m(λ) = A + Bλ + ∫_{−∞}^∞ (1/(t − λ) − t/(t² + 1)) dρ(t).
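A completely explicit example (standard, but not computed in the text; the computation here is our own): for q = 0 and α = 0 on [a, b) = [0, ∞) one finds θ(x, λ) = cos(√λ x) and ϕ(x, λ) = sin(√λ x)/√λ, and the Weyl solution must be a multiple of e^{i√λ x} with Im √λ > 0, so that m(λ) = i√λ. The sketch below checks the Nevanlinna properties and the identity (11.2) for this m.

```python
import cmath

# Free half-line case q = 0, alpha = 0 on [0, infinity): the Weyl solution is
# psi(x, lam) = exp(i sqrt(lam) x) with Im sqrt(lam) > 0, so m(lam) = i sqrt(lam).
# (A standard example, worked out by us; it is not computed in the text.)

def m(lam):
    r = cmath.sqrt(lam)
    if r.imag < 0:                 # choose the root with Im r > 0, so that
        r = -r                     # e^{irx} is square integrable at infinity
    return 1j * r

for lam in (1 + 1j, -2 + 0.5j, 0.3 + 4j):
    # Nevanlinna property: the upper half plane is mapped into itself
    assert m(lam).imag > 0

    # identity (11.2): the integral of |psi|^2 over (0, infinity) equals
    # 1/(2 Im sqrt(lam)), which must coincide with Im m(lam) / Im lam
    r = m(lam) / 1j
    assert abs(1 / (2 * r.imag) - m(lam).imag / lam.imag) < 1e-9

    # symmetry m(conj lam) = conj m(lam), as in the proof of Theorem 11.6
    assert abs(m(lam.conjugate()) - m(lam).conjugate()) < 1e-9
```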

The spectral measure dρ gives rise to a Hilbert space L²ρ, which consists of those functions û which are measurable with respect to dρ and for which ‖û‖ρ² = ∫_{−∞}^∞ |û|² dρ is finite. Alternatively, we may think of L²ρ as the completion in this norm of the compactly supported, continuous functions. These alternative definitions give the same space, but we will not prove this here. We denote the scalar product in L²ρ by ⟨·, ·⟩ρ. The main result of this chapter is the following.

Theorem 11.8.
(1) If u ∈ L²(a, b), the integral ∫_a^x uϕ(·, t) converges in L²ρ as x → b. The limit is called the generalized Fourier transform of u and is denoted by F(u) or û. We write this as û(t) = ⟨u, ϕ(·, t)⟩, although the integral may not converge pointwise.
(2) The mapping u ↦ û is unitary between L²(a, b) and L²ρ, so that the Parseval formula ⟨u, v⟩ = ⟨û, v̂⟩ρ is valid if u, v ∈ L²(a, b).
(3) The integral ∫_K û(t)ϕ(x, t) dρ(t) converges in L²(a, b) as K → ℝ through compact intervals. If û = F(u) the limit is u, so the integral is the inverse of the generalized Fourier transform. Again, we write u(x) = ⟨û, ϕ(x, ·)⟩ρ for u ∈ L²(a, b), although the integral may not converge pointwise.
(4) Let E_∆ denote the spectral projector of T for the interval ∆. Then E_∆u(x) = ∫_∆ ûϕ(x, ·) dρ.
(5) If u ∈ D(T), then F(Tu)(t) = tû(t). Conversely, if û and tû(t) are in L²ρ, then F⁻¹(û) ∈ D(T).

Before we prove this theorem, let us interpret it in terms of the spectral theorem. If the interval ∆ shrinks to a point t, then E_∆ tends to zero unless t is an eigenvalue, in which case we obtain the projection on the eigenspace. By (4) this means that the eigenvalues are precisely those points at which the function ρ has a (jump) discontinuity; continuous spectrum thus corresponds to points where ρ is continuous but which are still points of increase for ρ, i.e., there is no neighborhood of the point where ρ is constant.
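Continuing the free example m(λ) = i√λ from above (again our own illustration, not from the text): Stieltjes inversion gives dρ(t) = (√t/π) dt on [0, ∞), and with ϕ(x, t) = sin(√t x)/√t the expansion is essentially the classical sine transform. For u(x) = x e^{−x} one computes û(t) = 2/(1 + t)² by hand, and the Parseval formula of Theorem 11.8 can be verified numerically:

```python
import math

# Parseval check for q = 0, alpha = 0 on [0, infinity):
# phi(x, t) = sin(sqrt(t) x)/sqrt(t), d rho(t) = (sqrt(t)/pi) dt.
# For u(x) = x e^{-x}: u_hat(t) = 2/(1 + t)^2, and ||u||^2 = 1/4.
# (The example and the quadrature choices below are our own.)

def simpson(g, a, b, n):                  # composite Simpson rule, n even
    h = (b - a) / n
    s = g(a) + g(b) + sum((4 if i % 2 else 2) * g(a + i * h) for i in range(1, n))
    return s * h / 3

# ||u||^2 = int_0^infinity x^2 e^{-2x} dx = 1/4
norm2 = simpson(lambda x: x * x * math.exp(-2 * x), 0.0, 40.0, 20000)
assert abs(norm2 - 0.25) < 1e-9

# the closed form u_hat(t) = 2/(1+t)^2 agrees with direct integration
def u_hat_direct(t):
    k = math.sqrt(t)
    return simpson(lambda x: x * math.exp(-x) * math.sin(k * x) / k, 0.0, 40.0, 20000)

for t in (0.5, 1.0, 3.0):
    assert abs(u_hat_direct(t) - 2 / (1 + t) ** 2) < 1e-6

# Parseval: int_0^infinity |u_hat(t)|^2 (sqrt(t)/pi) dt = 1/4
parseval_rhs = simpson(lambda t: (2 / (1 + t) ** 2) ** 2 * math.sqrt(t) / math.pi,
                       0.0, 2000.0, 200000)
assert abs(parseval_rhs - 0.25) < 1e-3
```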
In terms of measure theory: the atomic part of the measure dρ determines the eigenvalues, and the diffuse part of dρ determines the continuous spectrum.

We will prove Theorem 11.8 through a long (but finite!) sequence of lemmas. First note that for u ∈ L²(a, b) with compact support in [a, b) the function û(λ) = ⟨u, ϕ(·, λ̄)⟩ is an entire function of λ, since ϕ(x, λ) is entire, locally uniformly in x, according to Theorem 10.1.

Lemma 11.9. The function ⟨Rλu, v⟩ − m(λ)û(λ)\overline{v̂(λ̄)} is entire for all u, v ∈ L²(a, b) with compact supports in [a, b).


11. STURM-LIOUVILLE EQUATIONS

Proof. If the supports are inside $[a,c]$, direct calculation shows that the function is
\[
\int_a^c \Bigl( \theta(x,\lambda)\int_a^x u\varphi(\cdot,\lambda) + \varphi(x,\lambda)\int_x^c u\theta(\cdot,\lambda) \Bigr)\overline{v(x)}\,dx.
\]
This is obviously an entire function of $\lambda$. □

Lemma 11.10. Let $\sigma$ be increasing and differentiable at $0$. Then $\int_{-1}^1\int_{-1}^1 \frac{d\sigma(t)}{\sqrt{t^2+s^2}}\,ds$ converges.

Proof. Integrating by parts we have, for $s \neq 0$,
\[
\int_{-1}^1 \frac{d\sigma(t)}{\sqrt{t^2+s^2}} = \frac{\sigma(1)-\sigma(-1)}{\sqrt{1+s^2}} - \int_{-1}^1 \frac{\sigma(t)-\sigma(0)}{t}\, t\frac{d}{dt}\Bigl(\frac{1}{\sqrt{t^2+s^2}}\Bigr)\,dt.
\]
The first factor in the last integral is bounded since $\sigma'(0)$ exists, and the second factor is negative since $(t^2+s^2)^{-1/2}$ decreases with $|t|$. Furthermore, the integral with respect to $t$ of the second factor is integrable with respect to $s$, by calculation (check this!). Thus the double integral is absolutely convergent. □

As usual we denote the spectral projectors belonging to $T$ by $E_t$.

Lemma 11.11. Let $u \in L^2(a,b)$ have compact support in $[a,b)$ and assume $c < d$ to be points of differentiability for both $\langle E_t u,u\rangle$ and $\rho(t)$. Then
\[
(11.3)\qquad \langle E_d u,u\rangle - \langle E_c u,u\rangle = \int_c^d |\hat u(t)|^2\,d\rho(t).
\]

Proof. Let $\Gamma$ be the positively oriented rectangle with corners in $c \pm i$, $d \pm i$. According to Lemma 11.9
\[
\oint_\Gamma \langle R_\lambda u,u\rangle\,d\lambda = \oint_\Gamma \hat u(\lambda)\overline{\hat u(\bar\lambda)}\,m(\lambda)\,d\lambda
\]
if either of these integrals exists. However, by the Nevanlinna representation of $m$ (the entire part $A + B\lambda$ contributes nothing to the integral over the closed contour),
\[
\oint_\Gamma \hat u(\lambda)\overline{\hat u(\bar\lambda)}\,m(\lambda)\,d\lambda = \oint_\Gamma \int_{-\infty}^\infty \hat u(\lambda)\overline{\hat u(\bar\lambda)}\Bigl(\frac{1}{t-\lambda} - \frac{t}{t^2+1}\Bigr)\,d\rho(t)\,d\lambda.
\]
The double integral is absolutely convergent except perhaps where $t = \lambda$. The difficulty is thus caused by
\[
\int_{-1}^1 ds \int_{\mu-1}^{\mu+1} \frac{\hat u(\mu+is)\overline{\hat u(\mu-is)}}{t-\mu-is}\,d\rho(t)
\]
for $\mu = c, d$. However, Lemma 11.10 ensures the absolute convergence of these integrals. Changing the order of integration gives
\[
\oint_\Gamma \hat u(\lambda)\overline{\hat u(\bar\lambda)}\,m(\lambda)\,d\lambda = \int_{-\infty}^\infty \oint_\Gamma \hat u(\lambda)\overline{\hat u(\bar\lambda)}\Bigl(\frac{1}{t-\lambda} - \frac{t}{t^2+1}\Bigr)\,d\lambda\,d\rho(t) = -2\pi i \int_c^d |\hat u(t)|^2\,d\rho(t),
\]
since for $c < t < d$ the residue of the inner integrand is $-|\hat u(t)|^2$, whereas $t = c, d$ do not carry any mass and the inner integrand is regular for $t < c$ and $t > d$. Similarly we have
\[
\oint_\Gamma \langle R_\lambda u,u\rangle\,d\lambda = \oint_\Gamma \int_{-\infty}^\infty \frac{d\langle E_t u,u\rangle}{t-\lambda}\,d\lambda = -2\pi i \int_c^d d\langle E_t u,u\rangle,
\]
which completes the proof. □
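The convergence provided by Lemma 11.10 is what tames the singularity at $t = \lambda$ in the proof above. In the model case $\sigma(t) = t$ the double integral can even be evaluated in closed form, which makes for a quick numerical sanity check (a sketch using scipy, for illustration only; it is not part of the notes):

```python
import numpy as np
from scipy.integrate import dblquad

# Model case of Lemma 11.10: sigma(t) = t, so d(sigma) = dt.  By symmetry in s
# we integrate over s in [0, 1], t in [-1, 1] and double; the integrand has an
# integrable singularity at the origin.
half, _ = dblquad(lambda t, s: 1.0 / np.sqrt(t*t + s*s), 0, 1, -1, 1)
total = 2 * half
# closed form: four copies of the unit-square integral 2*log(1 + sqrt(2))
assert abs(total - 8 * np.log(1 + np.sqrt(2))) < 1e-4
```

The closed form follows from $\int_0^x dy/\sqrt{x^2+y^2} = \log(1+\sqrt2)$, independent of $x$.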

Lemma 11.12. If $u \in L^2(a,b)$ the generalized Fourier transform $\hat u \in L^2_\rho$ exists as the $L^2_\rho$-limit of $\int_a^x u\varphi(\cdot,t)$ as $x \to b$. Furthermore,
\[
\langle E_t u,v\rangle = \int_{-\infty}^t \hat u\,\overline{\hat v}\,d\rho.
\]
In particular, $\langle u,v\rangle = \langle\hat u,\hat v\rangle_\rho$ if $u$ and $v \in L^2(a,b)$.

Proof. If $u$ has compact support, Lemma 11.11 shows that (11.3) holds for a dense set of values $c, d$, since functions of bounded variation are a.e. differentiable. Since both $E_t$ and $\rho$ are left-continuous we obtain, by letting $d \uparrow t$ and $c \to -\infty$ through such values,
\[
\langle E_t u,v\rangle = \int_{-\infty}^t \hat u\,\overline{\hat v}\,d\rho
\]
when $u, v$ have compact supports; first for $u = v$ and then in general by polarization. As $t \to \infty$ we also obtain that $\langle u,v\rangle = \langle\hat u,\hat v\rangle_\rho$ when $u$ and $v$ have compact supports. For arbitrary $u \in L^2(a,b)$ we set, for $c \in (a,b)$,
\[
u_c(x) = \begin{cases} u(x) & \text{for } x < c,\\ 0 & \text{otherwise,} \end{cases}
\]
and obtain a transform $\hat u_c$. If also $d \in (a,b)$ it follows that $\|\hat u_c - \hat u_d\|_\rho = \|u_c - u_d\|$, and since $u_c \to u$ in $L^2(a,b)$ as $c \to b$, Cauchy's convergence principle shows that $\hat u_c$ converges to an element $\hat u \in L^2_\rho$ as $c \to b$. The lemma now follows in full generality by continuity. □
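What has been proved so far has a transparent finite-dimensional analogue which may be kept in mind during the remaining proofs: for a Hermitian matrix the transform is the coefficient map onto an orthonormal eigenbasis, $\langle E_t u,u\rangle$ is a pure jump function of $t$, and Parseval's formula is the classical one. A numpy sketch (an illustration only; the random matrix and vector are arbitrary test data):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
T = (A + A.conj().T) / 2               # a Hermitian "operator"
eigvals, e = np.linalg.eigh(T)         # columns of e: orthonormal eigenvectors

u = rng.standard_normal(6) + 1j * rng.standard_normal(6)
u_hat = e.conj().T @ u                 # "generalized Fourier coefficients"

# Parseval: ||u||^2 = sum_j |u_hat_j|^2
assert np.isclose(np.vdot(u, u).real, np.sum(np.abs(u_hat)**2))

# the transform diagonalizes T: coefficients of T u are t * u_hat(t)
assert np.allclose(e.conj().T @ (T @ u), eigvals * u_hat)

# <E_t u, u> increases only by the jumps |u_hat_j|^2 at the eigenvalues
def spectral_fn(t):
    return np.sum(np.abs(u_hat[eigvals < t])**2)

jumps = sum(spectral_fn(l + 1e-9) - spectral_fn(l - 1e-9) for l in eigvals)
assert np.isclose(jumps, np.vdot(u, u).real)
```

Here the spectral measure is purely atomic, so the whole norm is carried by the jumps; in the continuous-spectrum case of Theorem 11.8 the jumps are replaced by $|\hat u|^2\,d\rho$.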


Note that we have proved that $\mathcal F$ is an isometry from $L^2(a,b)$ to $L^2_\rho$.

Lemma 11.13. The integral $\int_K \hat u\varphi(x,\cdot)\,d\rho$ is in $L^2(a,b)$ if $K$ is a compact interval and $\hat u \in L^2_\rho$, and as $K \to \mathbb R$ the integral converges in $L^2(a,b)$. The limit $\mathcal F^{-1}(\hat u)$ is called the inverse transform of $\hat u$. If $u \in L^2(a,b)$ then $\mathcal F^{-1}(\mathcal F(u)) = u$. Moreover, $\mathcal F^{-1}(\hat u) = 0$ if and only if $\hat u$ is orthogonal in $L^2_\rho$ to all generalized Fourier transforms.

Proof. If $\hat u \in L^2_\rho$ has compact support, then $u(x) = \langle\hat u,\varphi(x,\cdot)\rangle_\rho$ is continuous, so $u_c \in L^2(a,b)$ for $c \in (a,b)$, and has a transform $\hat u_c$. We have
\[
\|u_c\|^2 = \int_a^c \Bigl(\int_{-\infty}^\infty \hat u\varphi(x,\cdot)\,d\rho\Bigr)\overline{u(x)}\,dx.
\]
Considered as a double integral this is absolutely convergent, so changing the order of integration we obtain
\[
\|u_c\|^2 = \int_{-\infty}^\infty \Bigl(\int_a^c \overline{u}\,\varphi(\cdot,t)\Bigr)\hat u(t)\,d\rho(t) = \langle\hat u,\hat u_c\rangle_\rho \le \|\hat u\|_\rho\|\hat u_c\|_\rho = \|\hat u\|_\rho\|u_c\|,
\]
according to Lemma 11.12. Hence $\|u_c\| \le \|\hat u\|_\rho$, so $u \in L^2(a,b)$, and $\|u\| \le \|\hat u\|_\rho$. If now $\hat u \in L^2_\rho$ is arbitrary, this inequality shows (as in the proof of Lemma 11.12) that $\int_K \hat u(t)\varphi(x,t)\,d\rho(t)$ converges in $L^2(a,b)$ as $K \to \mathbb R$ through compact intervals; call the limit $u_1$. If $v \in L^2(a,b)$, $\hat v$ is its generalized Fourier transform, $K$ is a compact interval, and $c \in (a,b)$, we have
\[
\int_K \Bigl(\int_a^c \overline{v(x)}\varphi(x,t)\,dx\Bigr)\hat u(t)\,d\rho(t) = \int_a^c \overline{v(x)} \int_K \hat u(t)\varphi(x,t)\,d\rho(t)\,dx
\]
by absolute convergence. Letting $c \to b$ and $K \to \mathbb R$ we obtain $\langle\hat u,\hat v\rangle_\rho = \langle u_1,v\rangle$. If $\hat u$ is the transform of $u$, then by Lemma 11.12 $u_1 - u$ is orthogonal to $L^2(a,b)$, so $u_1 = u$. Similarly, $u_1 = 0$ precisely if $\hat u$ is orthogonal to all transforms. □

We have shown the inverse transform to be the adjoint of the transform as an operator from $L^2(a,b)$ into $L^2_\rho$. The basic remaining difficulty is to prove that the transform is surjective, i.e., according to Lemma 11.13, that the inverse transform is injective. The following lemma will enable us to prove this.

Lemma 11.14. The transform of $R_\lambda u$ is $\hat u(t)/(t-\lambda)$.


Proof. By Lemma 11.12, $\langle E_t u,v\rangle = \int_{-\infty}^t \hat u\,\overline{\hat v}\,d\rho$, so that
\[
\langle R_\lambda u,v\rangle = \int_{-\infty}^\infty \frac{d\langle E_t u,v\rangle}{t-\lambda} = \int_{-\infty}^\infty \frac{\hat u(t)\overline{\hat v(t)}}{t-\lambda}\,d\rho(t) = \langle \hat u(t)/(t-\lambda), \hat v(t)\rangle_\rho.
\]
By properties of the resolvent
\[
\|R_\lambda u\|^2 = \frac{1}{2i\operatorname{Im}\lambda}\langle (R_\lambda - R_{\bar\lambda})u, u\rangle = \int_{-\infty}^\infty \frac{d\langle E_t u,u\rangle}{|t-\lambda|^2} = \|\hat u(t)/(t-\lambda)\|_\rho^2.
\]
Setting $v = R_\lambda u$ and using Lemma 11.12, it therefore follows that $\|\hat u(t)/(t-\lambda)\|_\rho^2 = \langle\hat u(t)/(t-\lambda), \mathcal F(R_\lambda u)\rangle_\rho = \|\mathcal F(R_\lambda u)\|_\rho^2$. It follows that $\|\hat u(t)/(t-\lambda) - \mathcal F(R_\lambda u)\|_\rho = 0$, which was to be proved. □

Lemma 11.15. The generalized Fourier transform is unitary from $L^2(a,b)$ to $L^2_\rho$, and the inverse transform is the inverse of this map.

Proof. According to Lemma 11.13 we need only show that if $\hat u \in L^2_\rho$ has inverse transform $0$, then $\hat u = 0$. Now, according to Lemma 11.14, $\mathcal F(v)(t)/(t-\lambda)$ is a transform for all $v \in L^2(a,b)$ and non-real $\lambda$. Thus we have $\langle\hat u(t)/(t-\lambda), \mathcal F(v)(t)\rangle_\rho = 0$ for all non-real $\lambda$ if $\hat u$ is orthogonal to all transforms. But we can view this scalar product as the Stieltjes transform of the measure $\int_{-\infty}^t \hat u\,\overline{\mathcal F(v)}\,d\rho$, so applying the inversion formula of Lemma 6.5 we have $\int_K \hat u\,\overline{\mathcal F(v)}\,d\rho = 0$ for all compact intervals $K$ and all $v \in L^2(a,b)$. Thus the cutoff of $\hat u$, which equals $\hat u$ in $K$ and $0$ outside, is also orthogonal to all transforms, i.e., has inverse transform $0$ according to Lemma 11.13. It follows that
\[
v(x) = \int_K \hat u(t)\varphi(x,t)\,d\rho(t)
\]
is the zero element of $L^2(a,b)$ for any compact interval $K$. Differentiating under the integral sign we also see that $v'(x) = \int_K \hat u\varphi'(x,\cdot)\,d\rho$ is the zero element of $L^2(a,b)$. But these functions are continuous, so they are pointwise $0$. Now $0 = v'(a)\cos\alpha - v(a)\sin\alpha = \int_K \hat u\,d\rho$. Thus $\hat u\,d\rho$ is the zero measure, so that $\hat u = 0$ as an element of $L^2_\rho$. □

Lemma 11.16. If $u \in D(T)$, then $\mathcal F(Tu)(t) = t\hat u(t)$. Conversely, if $\hat u$ and $t\hat u(t)$ are in $L^2_\rho$, then $\mathcal F^{-1}(\hat u) \in D(T)$.

Proof. We have $u \in D(T)$ if and only if $u = R_\lambda(Tu - \lambda u)$, which holds if and only if $\hat u(t) = (\mathcal F(Tu)(t) - \lambda\hat u(t))/(t-\lambda)$, i.e., $\mathcal F(Tu)(t) = t\hat u(t)$, according to Lemmas 11.14 and 11.15. □

This completes the proof of Theorem 11.8. We also have the following analogue of Corollary 11.4.


Theorem 11.17. Suppose $u \in D(T)$. Then the inverse transform $\langle\hat u,\varphi(x,\cdot)\rangle_\rho$ converges locally uniformly to $u(x)$.

Proof. The proof is very similar to that of Corollary 11.4. Put $v = (T-i)u$, so that $u = R_i v$. Let $K$ be a compact interval, and put $u_K(x) = \int_K \hat u(t)\varphi(x,t)\,d\rho(t) = \mathcal F^{-1}(\chi\hat u)(x)$, where $\chi$ is the characteristic function of $K$. Define $v_K$ similarly. Then by Lemma 11.14
\[
R_i v_K = \mathcal F^{-1}\Bigl(\frac{\chi(t)\hat v(t)}{t-i}\Bigr) = \mathcal F^{-1}(\chi\hat u) = u_K.
\]
Since $v_K \to v$ in $L^2(a,b)$ as $K \to \mathbb R$, it follows from Theorem 11.1 that $u_K \to u$ in $C^1(L)$ as $K \to \mathbb R$, for any compact subinterval $L$ of $[a,b)$. □

Example 11.18 (Sine and cosine transforms). Let us interpret Theorem 11.8 for the case of the equation $-u'' = \lambda u$ on the interval $[0,\infty)$. We shall look at the cases when the boundary condition at $0$ is either a Dirichlet condition ($\alpha = 0$ in (10.7)) or a Neumann condition ($\alpha = \pi/2$). The general solution of the equation is $u(x) = Ae^{\sqrt{-\lambda}\,x} + Be^{-\sqrt{-\lambda}\,x}$. Let the root be the principal branch, i.e., the branch where the real part is $\ge 0$. Then the only solutions in $L^2(0,\infty)$ are, unless $\lambda \ge 0$, the multiples of $e^{-\sqrt{-\lambda}\,x} = \cos(i\sqrt{-\lambda}\,x) + i\sin(i\sqrt{-\lambda}\,x)$. It follows that the equation is in the limit point condition at infinity (this is also a consequence of Theorem 10.10).

With a Dirichlet condition at $0$ we have $\theta(x,\lambda) = \cos(i\sqrt{-\lambda}\,x)$ and $\varphi(x,\lambda) = -i\sin(i\sqrt{-\lambda}\,x)/\sqrt{-\lambda}$. It follows that the $m$-function is $m_D(\lambda) = -\sqrt{-\lambda}$. Similarly, the $m$-function in the case of a Neumann condition at $0$ is $m_N(\lambda) = 1/\sqrt{-\lambda}$, using again the principal branch of the root. Using the Stieltjes inversion formula, Lemma 6.5, we see that the corresponding spectral measures are given by $d\rho_D(t) = \frac1\pi\sqrt t\,dt$ for $t \ge 0$, $d\rho_D = 0$ in $(-\infty,0)$, respectively $d\rho_N(t) = \frac{dt}{\pi\sqrt t}$ for $t \ge 0$, $d\rho_N = 0$ in $(-\infty,0)$. If $u \in L^2(0,\infty)$ and we define $\hat u(t) = \int_0^\infty u(x)\frac{\sin(\sqrt t\,x)}{\sqrt t}\,dx$, as a generalized integral converging in $L^2_{\rho_D}$, then the inversion formula reads $u(x) = \frac1\pi\int_0^\infty \hat u(t)\sin(\sqrt t\,x)\,dt$.
In this case one usually changes variable in the transform and defines the sine transform $S(u)(\xi) = \int_0^\infty u(x)\sin(\xi x)\,dx = \xi\hat u(\xi^2)$. Changing variable to $\xi = \sqrt t$ in the inversion formula above then shows that $u(x) = \frac2\pi\int_0^\infty S(u)(\xi)\sin(\xi x)\,d\xi$.

Similarly, if we set $\hat u(t) = \int_0^\infty u(x)\cos(\sqrt t\,x)\,dx$, the inversion formula obtained is $u(x) = \frac1\pi\int_0^\infty \hat u(t)\frac{\cos(\sqrt t\,x)}{\sqrt t}\,dt$. In this case it is again common to use $\xi = \sqrt t$ as the transform variable, so one defines the cosine transform $C(u)(\xi) = \int_0^\infty u(x)\cos(\xi x)\,dx$. Changing variables in the inversion formula above then gives the inversion formula $u(x) = \frac2\pi\int_0^\infty C(u)(\xi)\cos(\xi x)\,d\xi$ for the cosine transform.
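For a concrete function the sine-transform pair can be verified numerically. The sketch below takes $u(x) = e^{-x}$, whose sine transform is $S(u)(\xi) = \xi/(1+\xi^2)$, and uses scipy's Fourier-weighted quadrature for the oscillatory integrals (all choices are for illustration only):

```python
import numpy as np
from scipy.integrate import quad

# forward transform of u(x) = exp(-x): S(u)(xi) = xi / (1 + xi^2)
xi = 1.7
S, _ = quad(lambda x: np.exp(-x), 0, np.inf, weight='sin', wvar=xi)
assert np.isclose(S, xi / (1 + xi**2))

# inversion: u(x) = (2/pi) * integral_0^oo S(u)(xi) sin(xi x) d(xi)
x = 1.0
val, _ = quad(lambda t: t / (1 + t**2), 0, np.inf, weight='sin', wvar=x)
assert np.isclose(2/np.pi * val, np.exp(-x), atol=1e-6)
```

The second integral converges only conditionally, which is why the `weight='sin'` option (designed for Fourier-type integrals over infinite ranges) is used rather than plain quadrature.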


Note that there are no eigenvalues in either of these cases; the spectrum is purely continuous.

Exercises for Chapter 11

Exercise 11.1. Show that if $K$ is a compact interval, then $C^1(K)$ is a Banach space with the norm $\sup_{x\in K}|u(x)| + \sup_{x\in K}|u'(x)|$. If you know some topology, also show that if $I$ is an arbitrary interval, then $C(I)$ is a Fréchet space (a linear Hausdorff space with the topology given by a countable family of seminorms, which is also complete), under the topology of locally uniform convergence.

Exercise 11.2. With the assumptions of Corollary 11.4 the Fourier series for $u$ in the domain of $T$ actually converges absolutely and locally uniformly to $u$. If $\lambda_1, \lambda_2, \dots$ are the eigenvalues and $e_1, e_2, \dots$ the corresponding orthonormal eigenfunctions, use Parseval's formula to show that, pointwise in $x$,
\[
\|g(x,\cdot,\lambda)\|^2 = \sum_j \Bigl|\frac{e_j(x)}{\lambda_j-\lambda}\Bigr|^2,
\]
with natural notation. Then show that as an $L^2(I)$-valued function $x \mapsto g(x,\cdot,\lambda)$ is locally bounded, i.e., $x \mapsto \|g(x,\cdot,\lambda)\|$ is bounded on any compact subinterval of $I$. If $v = R_\lambda u$ and $\hat u_j$ is the $j$:th Fourier coefficient of $u$, then $\hat v_j = \langle R_\lambda u, e_j\rangle = \langle u, R_{\bar\lambda}e_j\rangle = \hat u_j/(\lambda_j - \lambda)$. Show that this implies that $\sum_{j>n}|\hat v_j e_j(x)|$ tends locally uniformly to $0$ as $n \to \infty$.

CHAPTER 12

Inverse spectral theory

In this chapter we continue to study the simple Sturm-Liouville equation $-u'' + qu = \lambda u$, on an interval with at least one regular endpoint. Our aim is to give some results on inverse spectral theory, i.e., questions related to the determination of the equation, in this case the potential $q$, from spectral data such as eigenvalues, spectral measures or similar things. Our object of study is the eigenvalue problem
(12.1)

$-u'' + qu = \lambda u$ on $[0,b)$,

(12.2)

$u(0)\cos\alpha + u'(0)\sin\alpha = 0$.

Here $\alpha$ is an arbitrary, fixed number in $[0,\pi)$, so that the boundary condition is an arbitrary separated boundary condition. We assume $q \in L^1_{\mathrm{loc}}[0,b)$, i.e., $q$ integrable on any interval $[0,c]$ with $c \in (0,b)$, so that $0$ is a regular endpoint for the equation. The other endpoint $b$ may be infinite or finite, in the latter case singular or regular. If the deficiency indices for the equation in $L^2(0,b)$ are $(1,1)$ the operator corresponding to (12.1), (12.2) is selfadjoint; if they are $(2,2)$ a boundary condition at $b$ is required to obtain a selfadjoint operator. We assume that, if necessary, a choice of boundary condition at $b$ is made, so that we are dealing with a selfadjoint operator which we will call $T$. If the deficiency indices are $(2,2)$ we know the spectrum is discrete (Theorem 11.7), but when the deficiency indices are $(1,1)$ the spectrum can be of any type.

As in Chapter 11, let $\varphi$ and $\theta$ be solutions of (12.1) satisfying the initial conditions
\[
(12.3)\qquad \begin{cases}\varphi(0,\lambda) = -\sin\alpha,\\ \varphi'(0,\lambda) = \cos\alpha,\end{cases}\qquad \begin{cases}\theta(0,\lambda) = \cos\alpha,\\ \theta'(0,\lambda) = \sin\alpha.\end{cases}
\]
Then Green's function for $T$ is given by $g(x,y,\lambda) = \varphi(\min(x,y),\lambda)\psi(\max(x,y),\lambda)$, where $\psi(x,\lambda) = \theta(x,\lambda) + m(\lambda)\varphi(x,\lambda)$ and the Titchmarsh-Weyl $m$-function $m(\lambda)$ is determined so that $\psi$ satisfies the boundary condition at $b$. In particular $\psi \in L^2(0,b)$. Let the Nevanlinna representation of $m$ be
\[
m(\lambda) = A + B\lambda + \int_{-\infty}^\infty \Bigl(\frac{1}{t-\lambda} - \frac{t}{t^2+1}\Bigr)\,d\rho(t),
\]



where $A \in \mathbb R$, $B \ge 0$ and $\rho$ increases ($d\rho$ is a positive measure) with $\int_{-\infty}^\infty \frac{d\rho(t)}{t^2+1} < \infty$. The transform space $L^2_\rho$ consists of those functions $\hat u$, measurable with respect to $d\rho$, for which $\|\hat u\|_\rho^2 = \int_{-\infty}^\infty |\hat u|^2\,d\rho$ is finite. The generalized Fourier transform of $u \in L^2(0,b)$ is
\[
\hat u(t) = \int_0^b u(x)\varphi(x,t)\,dx,
\]
converging in $L^2_\rho$, and with inverse given by
\[
u(x) = \int_{-\infty}^\infty \hat u(t)\varphi(x,t)\,d\rho(t),
\]
which converges in $L^2(0,b)$. Furthermore, $\|u\| = \|\hat u\|_\rho$ (Parseval), and $u \in D(T)$ if and only if $\hat u$ and $t\hat u(t)$ are in $L^2_\rho$, in which case $\widehat{Tu}(t) = t\hat u(t)$.

In the case when one has a discrete spectrum, which means that the spectrum consists of isolated eigenvalues (of finite multiplicity), the function $\rho$ is a step function, with a step at each eigenvalue. Suppose the eigenvalues are $\lambda_1, \lambda_2, \dots$ and that the size of the step is $c_j = \lim_{\varepsilon\downarrow0}(\rho(\lambda_j+\varepsilon) - \rho(\lambda_j-\varepsilon))$. Then the inverse transform takes the form
\[
u(x) = \sum_{j=1}^\infty \hat u(\lambda_j)\varphi(x,\lambda_j)c_j,
\]

where $\hat u(\lambda_j) = \langle u,\varphi(\cdot,\lambda_j)\rangle$. For $u = \varphi(\cdot,\lambda_j)$ the expansion becomes $\varphi(x,\lambda_j) = \|\varphi(\cdot,\lambda_j)\|^2\varphi(x,\lambda_j)c_j$. It follows that $c_j = \|\varphi(\cdot,\lambda_j)\|^{-2}$. Note that $\varphi(\cdot,\lambda_j)$ is an eigenfunction associated with $\lambda_j$, so the jump $c_j$ of $\rho$ at $\lambda_j$ is the so-called normalization constant for the eigenfunction. The name comes from the fact that a normalized eigenfunction is given by $e_j = \sqrt{c_j}\,\varphi(\cdot,\lambda_j)$. We have shown the following proposition.

Proposition 12.1. In the case of a discrete spectrum, knowledge of the spectral function $\rho$ is equivalent to knowing the eigenvalues and the corresponding normalization constants.

1. Asymptotics of the m-function

In order to discuss some results in inverse spectral theory we need a few results on the asymptotic behavior of the $m$-function for large $\lambda$. We denote by $m_\alpha(\lambda)$ the $m$-function for the boundary condition (12.2) and some fixed boundary condition at $b$. The following theorem is a simplified version of a result from [3].

Theorem 12.2. We have

\[
m_0(\lambda) = -\sqrt{-\lambda} + o(|\lambda|^{1/2})
\]


as $\lambda \to \infty$ along any non-real ray¹. Similarly, for $0 < \alpha < \pi$,
\[
m_\alpha(\lambda) = \cot\alpha + (\sqrt{-\lambda}\,\sin^2\alpha)^{-1} + o(|\lambda|^{-1/2})
\]
as $\lambda \to \infty$ along any non-real ray.

By a non-real ray we always mean a half-line starting at the origin which is not part of the real line. Here and later the square root is always the principal branch, i.e., the branch with positive real part.

Now note that, up to constant multiples, the Weyl solution $\psi$ is determined by the boundary condition at $b$. For $\alpha = 0$ we have $\psi'(0,\lambda)/\psi(0,\lambda) = m_0(\lambda)$, so keeping a fixed boundary condition at $b$ we obtain $m_0(\lambda) = (\sin\alpha + m_\alpha(\lambda)\cos\alpha)/(\cos\alpha - m_\alpha(\lambda)\sin\alpha)$. Solving for $m_\alpha$ gives

\[
m_\alpha(\lambda) = \frac{m_0(\lambda)\cos\alpha - \sin\alpha}{m_0(\lambda)\sin\alpha + \cos\alpha} = \cot\alpha - \frac{1}{m_0(\lambda)\sin^2\alpha} + \frac{\cos\alpha}{m_0(\lambda)\sin^2\alpha\,(m_0(\lambda)\sin\alpha + \cos\alpha)}.
\]
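This partial-fraction manipulation is elementary but easy to get wrong, so a quick numerical spot check is worthwhile; the sketch below (illustration only) evaluates both sides at a few arbitrary complex values of $m_0$ and angles $\alpha$:

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(5):
    a = rng.uniform(0.1, np.pi - 0.1)                 # an angle alpha in (0, pi)
    m0 = complex(rng.normal(), rng.normal() + 3.0)    # "m0" in the upper half plane
    lhs = (m0*np.cos(a) - np.sin(a)) / (m0*np.sin(a) + np.cos(a))
    rhs = (1/np.tan(a)
           - 1/(m0*np.sin(a)**2)
           + np.cos(a)/(m0*np.sin(a)**2*(m0*np.sin(a) + np.cos(a))))
    assert abs(lhs - rhs) < 1e-12
```

Note in particular that the last term carries a factor $\sin^2\alpha$ in its denominator; with $m_0(\lambda) \sim -\sqrt{-\lambda}$ it is $O(|\lambda|^{-1})$ and so is absorbed in the error term of Theorem 12.2.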

Thus, the formula for $m_0$ immediately implies that for $m_\alpha$, $0 < \alpha < \pi$, so that we only have to prove the formula for $m_0$. This will require good asymptotic estimates of the solutions $\varphi$ and $\theta$.

Lemma 12.3. If $u$ solves $-u'' + qu = \lambda u$ with fixed initial data in $0$ one has
\[
(12.4)\qquad u(x) = u(0)\Bigl(\cosh(x\sqrt{-\lambda}) + O(1)\bigl(e^{\int_0^x|q|/\sqrt{|\lambda|}} - 1\bigr)e^{x\sqrt{-\lambda}}\Bigr) + \frac{u'(0)}{\sqrt{-\lambda}}\Bigl(\sinh(x\sqrt{-\lambda}) + O(1)\bigl(e^{\int_0^x|q|/\sqrt{|\lambda|}} - 1\bigr)e^{x\sqrt{-\lambda}}\Bigr),
\]
uniformly in $x$, $\lambda$.

Proof. Solving the equation $u'' + \lambda u = f$ and then replacing $f$ by $qu$ gives
\[
(12.5)\qquad u(x) = \cosh(kx)u(0) + \frac{\sinh(kx)}{k}u'(0) + \int_0^x \frac{\sinh(k(x-t))}{k}q(t)u(t)\,dt,
\]
where we have written $k$ for $\sqrt{-\lambda}$. Setting
\[
g(x) = \Bigl|u(x) - \cosh(kx)u(0) - \frac{\sinh(kx)}{k}u'(0)\Bigr|e^{-x\operatorname{Re}k},
\]

¹If $g$ is a positive function the notation $f(\lambda) = o(g(\lambda))$ as $\lambda \to \infty$ means $f(\lambda)/g(\lambda) \to 0$ as $\lambda \to \infty$.


easy estimates give
\[
g(x) \le \frac{c(\lambda)}{|k|}\int_0^x|q| + \frac{1}{|k|}\int_0^x|q|g,
\]
where $c(\lambda) = |u(0)| + |u'(0)|/|k|$. Integrating after multiplying by the integrating factor $|q(x)|\exp\bigl(-\int_0^x|q|/|k|\bigr)$ we obtain
\[
g(x) \le c(\lambda)\bigl(e^{\int_0^x|q|/\sqrt{|\lambda|}} - 1\bigr).
\]
The estimate for $u$ follows immediately from this. □
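The bound just proved can be tested numerically. In the sketch below the potential $q(x) = \cos x$, the value $\lambda = 100i$ and the initial data $u(0) = 1$, $u'(0) = 0$ are arbitrary test choices (a hand-rolled RK4 integration, not part of the notes); the weighted deviation $g(1)$ from $\cosh(x\sqrt{-\lambda})$ should then lie below $e^{\int_0^1|q|/\sqrt{|\lambda|}} - 1 = e^{\sin 1/10} - 1$:

```python
import numpy as np

lam = 100j
k = np.sqrt(-lam)                       # principal branch, Re k > 0

def rhs(x, y):
    # y = (u, u'),  u'' = (q - lambda) u  with q(x) = cos x
    return np.array([y[1], (np.cos(x) - lam) * y[0]])

y = np.array([1.0 + 0j, 0.0 + 0j])      # u(0) = 1, u'(0) = 0
n = 20000
h = 1.0 / n
x = 0.0
for _ in range(n):                      # classical RK4 up to x = 1
    k1 = rhs(x, y)
    k2 = rhs(x + h/2, y + h/2 * k1)
    k3 = rhs(x + h/2, y + h/2 * k2)
    k4 = rhs(x + h, y + h * k3)
    y = y + h/6 * (k1 + 2*k2 + 2*k3 + k4)
    x += h

# weighted deviation at x = 1, as in the proof (here c(lambda) = 1)
g = abs(y[0] - np.cosh(k)) * np.exp(-k.real)
bound = np.exp(np.sin(1.0) / abs(k)) - 1    # int_0^1 |cos| = sin 1, |k| = sqrt|lam|
assert g < bound
```

The margin is not large (the bound is roughly twice the actual deviation here), which reflects that (12.4) is close to optimal in the size of its error term.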

Proof of Theorem 12.2. As noted, we only need to prove the theorem for $\alpha = 0$, so assume this. Now let $\lambda = r\mu$, where $\mu$ is in some fixed, compact subset of $\mathbb C\setminus\mathbb R$, and $r > 0$ is large. We define $\varphi_r(x,\mu) = \sqrt r\,\varphi(x/\sqrt r, r\mu)$ and $\theta_r(x,\mu) = \theta(x/\sqrt r, r\mu)$. Then $\varphi_r$ and $\theta_r$ satisfy the initial conditions (12.3) for $\alpha = 0$ and satisfy the equation $-u'' + q_r u = \mu u$ on $(0, b\sqrt r)$, where $q_r(x) = q(x/\sqrt r)/r$ (check!). From Lemma 12.3 it immediately follows that, locally uniformly in $x$, $\mu$, we have $\varphi_r(x,\mu) \to \sinh(x\sqrt{-\mu})/\sqrt{-\mu}$ and $\theta_r(x,\mu) \to \cosh(x\sqrt{-\mu})$ as $r \to \infty$.

Now let $m_r(\mu) = m(r\mu)$ and make the change of variable $x = y/\sqrt r$ in (11.2). This gives $\int_0^{b\sqrt r}|\theta_r + m_r\varphi_r|^2 = \operatorname{Im}(m_r(\mu))/\operatorname{Im}\mu$, so if $c > 0$ we have
\[
(12.6)\qquad \int_0^c|\theta_r(\cdot,\mu) + m_r(\mu)\varphi_r(\cdot,\mu)|^2 \le \frac{\operatorname{Im}m_r(\mu)}{\operatorname{Im}\mu}
\]
as soon as $b\sqrt r \ge c$. The inequality may be rewritten as
\[
|m_r(\mu) - C_r| \le R_r,
\]
where $C_r$ and $R_r$ are easily expressed in terms of $\int_0^c \theta_r\overline{\varphi_r}$, $\int_0^c|\theta_r|^2$ and $\int_0^c|\varphi_r|^2$. The inequality therefore confines $m_r$ to a disk $K_r(c)$, and it is clear from Lemma 12.3 that as $r \to \infty$ the coefficients converge, locally uniformly for $\mu \in \mathbb C\setminus\mathbb R$, to those in the corresponding disk $K(c)$ for the case $q = 0$. Therefore, given any neighborhood $\Omega$ of $K(c)$, we must have $m_r(\mu) \in \Omega$ for all sufficiently large $r$. This is true for any $c > 0$, and it is obvious from (12.6) that $K(c)$ decreases as a function of $c$. We shall show presently that only the point $-\sqrt{-\mu}$ is common to all $K(c)$, and then it follows that $m_r(\mu) \to -\sqrt{-\mu}$, locally uniformly for $\mu \in \mathbb C\setminus\mathbb R$. But this means that $m(\lambda) = -\sqrt{-\lambda}(1 + o(1))$ as $\lambda \to \infty$ in a closed, non-real sector with vertex at the origin, and thus proves the theorem.

It remains to show that $\bigcap_{c>0}K(c) = \{-\sqrt{-\mu}\}$. But any point $\ell$ in the intersection corresponds to a solution $u(x) = \cosh(x\sqrt{-\mu}) + \ell\sinh(x\sqrt{-\mu})/\sqrt{-\mu}$ with $u \in L^2(0,\infty)$, since we have $\int_0^c|u|^2 \le \operatorname{Im}\ell/\operatorname{Im}\mu$ for all $c > 0$. Thus the only possible value is $\ell = -\sqrt{-\mu}$. On the other


hand, the equation with $q = 0$ has a Weyl solution on $[0,\infty)$, so that in fact this value of $\ell$ gives a point which is in all $K(c)$. This may of course also be verified directly (do it!). The proof is now complete. □

2. Uniqueness theorems

Given $q$, $b$, and the boundary conditions, one may in principle determine $m$ and thus $d\rho$. We will take as our basic inverse problem to determine $q$ (and possibly $b$ and the boundary conditions) when $d\rho$ is given. Around 1950 Gelfand and Levitan [9] gave a rather complete solution to this problem. Their solution includes uniqueness, i.e., a proof that different boundary value problems cannot yield the same spectral measure; reconstruction, i.e., a method (an integral equation) whereby one, at least in principle, can determine $q$ from the spectral measure; and characterization, i.e., a description of those measures that are spectral measures for some equation.

To discuss the full Gelfand-Levitan theory here would take us too far afield. Instead we will confine ourselves to the problem of uniqueness, i.e., to showing that two different operators cannot have the same spectral measure. This problem was solved independently by Borg [8] and Marčenko [10] just before the Gelfand-Levitan theory appeared. To state the theorem we introduce, in addition to the operator $T$, another similar operator $\tilde T$, corresponding to a boundary condition of the form (12.2), but with an angle $\tilde\alpha \in [0,\pi)$, an interval $[0,\tilde b)$, a potential $\tilde q$ and, if needed, a boundary condition at $\tilde b$. Let the corresponding spectral measure be $d\tilde\rho$.

Theorem 12.4 (Borg-Marčenko). If $d\rho = d\tilde\rho$, then $\tilde T = T$, i.e., $\tilde\alpha = \alpha$, $\tilde b = b$ and $\tilde q = q$.

A few years ago Barry Simon [11] proved a 'local' version of this uniqueness theorem. This was a product of a new strategy developed by Simon for obtaining the results of Gelfand and Levitan. I will give my own proof [6], which is quite elementary and does not use the machinery of Simon. We will use the same idea to prove Theorem 12.4.
In order to state Simon’s theorem, one should first note that knowing m is essentially equivalent to knowing dρ, at least if the boundary condition (12.2) is known. Knowing m one can in fact find dρ via the Stieltjes inversion formula, and knowing dρ one may calculate the integral in the representation of m. By Theorem 12.2 we always have B = 0, and A may be determined (if α 6= 0) since we also have m(iν) → cot α as ν → ±∞. We denote the m-functions associated with T and T˜ by m and m ˜ respectively. Then Simon’s theorem is the following. ˜b). Then Theorem 12.5 (Simon). Suppose that 0 < a ≤ min(b, √ 2(a−ε) Re −λ α=α ˜ and q = q˜ a.e. on (0, a) if (m(λ) − m(λ))e ˜ → 0 for


every $\varepsilon > 0$ as $\lambda \to \infty$ along some non-real ray. Conversely, if $\alpha = \tilde\alpha$ and $q = \tilde q$ on $(0,a)$, then $(m(\lambda) - \tilde m(\lambda))e^{2(a-\varepsilon)\operatorname{Re}\sqrt{-\lambda}} \to 0$ for every $\varepsilon > 0$ as $\lambda \to \infty$ along any non-real ray.

We will prove both theorems by the same method, the crucial point of which is the following lemma.

Lemma 12.6. For any fixed $x \in (0,b)$ we have $\varphi(x,\lambda)\psi(x,\lambda) \to 0$ as $\lambda \to \infty$ along a non-real ray.

Note that $\varphi(x,\lambda)\psi(x,\lambda)$ is Green's function on the diagonal $x = y$. We shall postpone the proof a moment and see how the theorem follows from it. We first have a corollary.

Corollary 12.7. Suppose $\alpha = \tilde\alpha = 0$ or $\alpha \neq 0 \neq \tilde\alpha$. Then both $\tilde\varphi(x,\lambda)\psi(x,\lambda)$ and $\varphi(x,\lambda)\tilde\psi(x,\lambda)$ tend to $0$ as $\lambda \to \infty$ along a non-real ray, locally uniformly in $x$.

Proof. Clearly (12.4) implies that for fixed $x$ and $\tilde\alpha \neq 0$ we have $\varphi(x,\lambda)/\tilde\varphi(x,\lambda) \to \sin\alpha/\sin\tilde\alpha$ as $\lambda \to \infty$ along a non-real ray. If $\alpha = \tilde\alpha = 0$ we instead obtain the limit $1$, so the corollary follows from Lemma 12.6. □

We shall also need a standard theorem from complex analysis, which is a slight elaboration of the maximum principle.

Theorem 12.8 (Phragmén-Lindelöf). Suppose $f$ is analytic in a closed sector bounded by two rays from the origin, that it is bounded on the rays, and that $|f(z)| \le Ae^{B|z|^{1/2}}$ in the sector, for some constants $A$ and $B$. Then $f$ is bounded in the sector.

This is just one of the simplest versions of a general class of theorems, which are all known under the names of Phragmén and Lindelöf. Proofs are given in many textbooks on complex analysis, but for the reader's convenience we also give a proof here.

Proof. We may without loss of generality assume that the rays are given by the angles $\pm\beta$. Let $\varepsilon > 0$ and $F(z) = e^{-\varepsilon z^\gamma}f(z)$, where $1/2 < \gamma < \pi/(2\beta)$ and the branch of $z^\gamma$ is chosen to be positive for positive real $z$. Now, for $z = re^{\pm i\beta}$ we have $|F(z)| = e^{-\varepsilon r^\gamma\cos(\beta\gamma)}|f(z)|$, where $\cos(\beta\gamma) > 0$. Let $M$ be a bound for $f$ on the rays. Then we have $|F(z)| \le M$ on the rays.
For $z = Re^{i\delta}$ with $|\delta| \le \beta$ we have $|F(z)| \le A\exp(BR^{1/2} - \varepsilon R^\gamma\cos(\beta\gamma))$, which tends to $0$ as $R \to \infty$. Thus, on all circular sectors bounded by the rays we have $|F(z)| \le M$ on the boundary if the radius $R$ is sufficiently large. By the maximum principle this also holds in the interior of the circular sector. Since $R$ can be chosen arbitrarily large,


the bound is valid in the entire domain bounded by the rays. It follows that if $z$ is in this domain, then $|f(z)| \le Me^{\varepsilon|z|^\gamma}$, and letting $\varepsilon \to 0$ we obtain the desired result. □

Proof of Theorem 12.4. According to the Nevanlinna representation formula for $m$ and $\tilde m$, their difference is constant $= C$, since the linear term $B\lambda$ is always absent by the asymptotic formulas of Theorem 12.2. In particular, since Dirichlet $m$-functions are always unbounded near $\infty$ on a non-real ray and all others are bounded, we must have either $\alpha = \tilde\alpha = 0$ or $\alpha \neq 0 \neq \tilde\alpha$ if $d\rho = d\tilde\rho$. Thus, according to Corollary 12.7, the difference $\tilde\varphi(x,\lambda)\psi(x,\lambda) - \varphi(x,\lambda)\tilde\psi(x,\lambda)$ tends to $0$ as $\lambda \to \infty$ along a non-real ray. This difference is
\[
\tilde\varphi(x,\lambda)\theta(x,\lambda) - \varphi(x,\lambda)\tilde\theta(x,\lambda) + C\varphi(x,\lambda)\tilde\varphi(x,\lambda),
\]
which is an entire function of $\lambda$ tending to $0$ along non-real rays, and it may be bounded by a multiple of $e^{B|\lambda|^{1/2}}$ for some constant $B$ according to (12.4). By Theorem 12.8 such a function is bounded in the entire plane, and therefore constant by Liouville's theorem, hence identically $0$ since the limit is zero along the rays. It follows that $\theta(x,\lambda)/\varphi(x,\lambda) = \tilde\theta(x,\lambda)/\tilde\varphi(x,\lambda) + C$ for all $x$, $\lambda$. Differentiating with respect to $x$, using the fact that $\theta'\varphi - \theta\varphi' = 1$, we obtain $\varphi^2(x,\lambda) = \tilde\varphi^2(x,\lambda)$. Taking the logarithmic derivative of this we obtain $\frac{\varphi'(x,\lambda)}{\varphi(x,\lambda)} = \frac{\tilde\varphi'(x,\lambda)}{\tilde\varphi(x,\lambda)}$. For $x = 0$ this gives $\alpha = \tilde\alpha$, and thus that $m$ and $\tilde m$ are asymptotically the same. Thus $C = 0$, so that $m = \tilde m$. Differentiating once more we obtain $\varphi''/\varphi = \tilde\varphi''/\tilde\varphi$, which means that $q = \tilde q$ on $(0,\min(b,\tilde b))$. From this follows that $\varphi = \tilde\varphi$ and $\theta = \tilde\theta$, and thus also $\psi = \tilde\psi$, on $(0,\min(b,\tilde b))$. This implies that $b = \tilde b$, since otherwise $\psi$ (or $\tilde\psi$) would satisfy selfadjoint boundary conditions both at $b$ and $\tilde b$, so that $\psi$ would be an eigenfunction to a non-real eigenvalue for a selfadjoint operator. Since $\psi = \tilde\psi$ also the boundary conditions at $b = \tilde b$ (if any) are the same. It follows that $T = \tilde T$. □

Proof of Theorem 12.5.
Our starting point is that if $\alpha = \tilde\alpha$ the functions $\tilde\varphi(x,\lambda)\psi(x,\lambda)$ and $\varphi(x,\lambda)\tilde\psi(x,\lambda)$ tend to $0$ as $\lambda \to \infty$ along a non-real ray. Their difference is
\[
(12.7)\qquad \tilde\varphi(x,\lambda)\theta(x,\lambda) - \varphi(x,\lambda)\tilde\theta(x,\lambda) + (m(\lambda) - \tilde m(\lambda))\varphi(x,\lambda)\tilde\varphi(x,\lambda).
\]
Suppose first that $\alpha = \tilde\alpha$ and $q = \tilde q$ on $(0,a)$. Then the first two terms cancel on $(0,a)$, so that $(m(\lambda) - \tilde m(\lambda))\varphi(x,\lambda)\tilde\varphi(x,\lambda) \to 0$ as $\lambda \to \infty$ along non-real rays if $x \in (0,a)$. By (12.4) this implies that $(m(\lambda) - \tilde m(\lambda))e^{2(a-\varepsilon)\operatorname{Re}\sqrt{-\lambda}} \to 0$ as $\lambda \to \infty$ along any non-real ray. Conversely, the estimate for $m - \tilde m$ implies first that $\alpha = \tilde\alpha$ and then that for $0 < x < a$ the last term of (12.7) tends to $0$ according to assumption and (12.4), so that the entire function $\tilde\varphi(x,\lambda)\theta(x,\lambda)\,-$


$\varphi(x,\lambda)\tilde\theta(x,\lambda)$ of $\lambda$ also tends to $0$ along a non-real ray, and by symmetry also along its conjugate. However, as in the proof of Theorem 12.4 this entire function is bounded by $e^{B|\lambda|^{1/2}}$ for some constant $B$, so by the Phragmén-Lindelöf theorem it vanishes for all $x \in (0,a)$. It follows that $q = \tilde q$ in $(0,a)$ exactly as in the proof of Theorem 12.4. □

It only remains to prove Lemma 12.6.

Proof of Lemma 12.6. Note that for $\alpha = 0$ (Dirichlet's boundary condition) we have $\psi(0,\lambda) = 1$ and $\psi'(0,\lambda) = m(\lambda)$. Since only $\psi$ and its multiples satisfy the boundary condition at $b$, we have $m(\lambda) = u'(0,\lambda)/u(0,\lambda)$ for any solution of $-u'' + qu = \lambda u$ satisfying the boundary condition at $b$. But consider now the interval $[a,b)$ for $0 < a < b$ and the corresponding operator generated by our differential equation in $L^2(a,b)$ with the Dirichlet boundary condition at $a$ and the same boundary condition as before at $b$. It follows that its $m$-function is given by $\psi'(a,\lambda)/\psi(a,\lambda)$. Similarly, $-\varphi'(a,\lambda)/\varphi(a,\lambda)$ is the $m$-function corresponding to the interval $(0,a]$, considering $a$ as the initial point, provided with the Dirichlet boundary condition, and using the boundary condition (12.2) at $0$. The change in sign is due to the fact that the initial point of the interval is now to the right of the other endpoint. Now, since $\varphi\psi' - \varphi'\psi \equiv 1$ we have $1/(\varphi\psi) = (\varphi\psi' - \varphi'\psi)/(\varphi\psi) = \psi'/\psi - \varphi'/\varphi$, so this is a sum of two Dirichlet $m$-functions. According to Theorem 12.2 all such $m$-functions are asymptotic to $-\sqrt{-\lambda}$ as $\lambda \to \infty$ along a non-real ray, which immediately implies that $\varphi(a,\lambda)\psi(a,\lambda) \to 0$. □

We make some final remarks. One may generalize Simon's theorem to the more general Sturm-Liouville equation $-(pu')' + qu = \lambda u$, where $1/p$ and $q$ are real-valued and locally integrable, provided one can show appropriate growth estimates for the solutions and that $\tilde\varphi\psi \to 0$ as before.
I showed in [4, 5] that $\tilde\varphi\psi \to 0$ in the appropriate manner, provided $1/p$ is in $L^r_{\mathrm{loc}}$ for some $r > 1$ and $q - \tilde q$ is in $L^{r'}_{\mathrm{loc}}$, where $r'$ is the conjugate exponent to $r$. For example, if $1/p$ is locally bounded it is enough with local integrability of $q$ and $\tilde q$. Simon's theorem therefore generalizes to this situation. The condition on the $m$-functions then has to be replaced by $m(\lambda) - \tilde m(\lambda) = O\bigl(\exp\bigl(-2\int_0^{a-\varepsilon}\operatorname{Re}\sqrt{-\lambda/p}\bigr)\bigr)$. As far as the original Borg-Marčenko theorem is concerned, it is now well known exactly to what extent the coefficients $p$, $q$ and $w$ in the equation (10.1), as well as the interval and boundary conditions, are determined by the spectral measure; see [7].

CHAPTER 13

First order systems

We shall here study the spectral theory of the general first order system
(13.1)

$Ju' + Qu = Wv$,

where $J$ is a constant $n\times n$ matrix which is invertible and skew-Hermitian (i.e., $J^* = -J$) and the coefficients $Q$ and $W$ are $n\times n$ matrix-valued functions which are locally integrable on $I$. In addition $Q$ is assumed Hermitian and $W$ positive semi-definite. As we shall see, these properties ensure the proper symmetry of the differential expression. The functions $u$ and $v$ are $n\times1$ matrix-valued on $I$. In the special case when $n$ is even and $J = \begin{pmatrix}0&I\\-I&0\end{pmatrix}$, $I$ being the unit matrix of order $n/2$, systems of the form (13.1) are usually called Hamiltonian systems.

The following existence and uniqueness theorem is fundamental.

Theorem 13.1. Suppose $A$ is an $n\times n$ matrix-valued function with locally integrable entries in an interval $I$, and that $B$ is an $n\times1$ matrix-valued function, also locally integrable in $I$. Assume further that $c \in I$ and $C$ is an $n\times1$ matrix. Then the initial value problem
\[
u' + Au = B \ \text{in } I, \qquad u(c) = C,
\]
has a unique $n\times1$ matrix-valued solution $u$ with locally absolutely continuous entries defined in $I$.

The theorem has the following immediate consequence.

Corollary 13.2. The set of solutions to $u' + Au = 0$ in $I$ is an $n$-dimensional linear space.

Proofs for Theorem 13.1 and Corollary 13.2 are given in Appendix C. We will apply them for $A = J^{-1}(Q - \lambda W)$, where $\lambda \in \mathbb C$, and $B = J^{-1}Wv$.

We shall study (13.1) in the Hilbert space $L^2_W$ of equivalence classes of $n\times1$ matrix-valued Lebesgue measurable functions $u$ for which $u^*Wu$ is integrable over $I$. In this space the scalar product is $\langle u,v\rangle = \int_I v^*Wu$. Two functions $u$ and $\tilde u$ are considered equivalent if the integral $\int_I(u-\tilde u)^*W(u-\tilde u) = 0$. Note that this means that they can be very different pointwise. For example, in the case of the system equivalent to (10.1) the second component of an element of $L^2_W$ is completely undetermined.
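Theorem 13.1 and its corollary are easy to illustrate numerically: solving $u' + Au = 0$ for the standard basis initial vectors produces a fundamental matrix that stays invertible, so the solutions remain a basis. In the sketch below (scipy; $A(x)$ is an arbitrary test coefficient, not from the notes) $\operatorname{tr}A = 0$, so by Liouville's formula the determinant should in fact stay equal to $1$:

```python
import numpy as np
from scipy.integrate import solve_ivp

def A(x):                               # arbitrary test coefficient with tr A = 0
    return np.array([[0.0, 1.0], [np.sin(x), 0.0]])

def rhs(x, u):                          # u' = -A(x) u
    return -(A(x) @ u)

cols = []
for e0 in np.eye(2):                    # basis initial data at c = 0
    sol = solve_ivp(rhs, (0.0, 5.0), e0, rtol=1e-10, atol=1e-12)
    cols.append(sol.y[:, -1])
Phi = np.column_stack(cols)             # fundamental matrix evaluated at x = 5

# tr A = 0, so det Phi(x) = 1 for all x (Liouville); in particular invertible
assert np.isclose(np.linalg.det(Phi), 1.0, atol=1e-4)
```

Invertibility of the fundamental matrix at every point is exactly the statement that the solution space is $n$-dimensional.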


Since $W$ is assumed locally integrable it is clear that constant $n\times1$ matrices are locally in $L^2_W$, so (each component of) $Wu$ is locally integrable if $u \in L^2_W$. It is also clear that $u$ and $\tilde u$ are two different representatives of the same equivalence class in $L^2_W$ precisely if $Wu = W\tilde u$ almost everywhere (Exercise 13.1).

Example 13.3. Any standard scalar differential equation may be written on the form (13.1) with a constant, skew-Hermitian $J$. If it is possible to do this so that $Q$ and $W$ are Hermitian, the differential equation is called formally symmetric. We have already seen this in the case of the Sturm-Liouville equation (10.1), which will be formally symmetric if $p$, $q$ and $w$ are real-valued. The first order scalar equation $iu' + qu = wv$ is already of the proper form and formally symmetric if $q$ and $w$ are real-valued. The fourth order equation $(p_2u'')'' - (p_1u')' + p_0u = wv$ may be written on the form (13.1) by setting
\[
U = \begin{pmatrix}u\\ u'\\ (p_2u'')' - p_1u'\\ -p_2u''\end{pmatrix},\qquad
J = \begin{pmatrix}0&0&1&0\\ 0&0&0&1\\ -1&0&0&0\\ 0&-1&0&0\end{pmatrix}
\quad\text{and}\quad
Q = \begin{pmatrix}p_0&0&0&0\\ 0&p_1&1&0\\ 0&1&0&0\\ 0&0&0&-1/p_2\end{pmatrix},
\]
with $W$ the matrix whose only non-zero entry is $w$ in the upper left corner,

as is readily seen, and it will be formally symmetric if the coefficients w, p0 , p1 and p2 are real-valued. In order to get a spectral theory for (13.1) it is convenient to use the theory of symmetric relations, since it is sometimes not possible to find a densely defined symmetric operator realizing the equation. Consequently, we must begin by defining a minimal relation, show that it is symmetric, calculate its adjoint and find the selfadjoint restrictions of the adjoint. We define the minimal relation T0 to be the closure in L2W ⊕ L2W of the set of pairs (u, v) of elements in L2W with compact support in the interior of I (i.e., which are 0 outside some compact subinterval of the interior of I which may be different for different pairs (u, v)) and such that u is locally absolutely continuous and satisfies the equation Ju0 + Qu = W v. This relation between u and v may or may not be an operator (Exercise 13.2). The next step is to calculate the adjoint of T0 . In order to do this, we shall again use the classical variation of constants formula, now in a more general form than in Lemma 10.4. Below we always assume that c is a fixed (but arbitrary) point in I. Let F (x, λ) be a n × n matrix-valued solution of JF 0 + QF = λW F with F (c, λ) invertible. This means precisely that the columns of F are a basis for the solutions of (13.1) for v = λu. Such a solution is called a fundamental matrix for this equation. We will always in addition suppose that S = F (c, λ) is independent of λ and symplectic, i.e., S ∗ JS = J. We may for example take S equal to the n × n unit matrix or, if J is unitary, S = J. Lemma 13.4. We have F ∗ (x, λ)JF (x, λ) = J for any complex λ and x ∈ I. The solution u of Ju0 + Qu = λW u + W v with initial data


u(c) = 0 is given by

(13.2)   u(x) = F(x, λ)J⁻¹ ∫_c^x F*(y, λ̄)W(y)v(y) dy.

Proof. We have (F*(x, λ̄)JF(x, λ))′ = −(JF′(x, λ̄))*F(x, λ) + F*(x, λ̄)JF′(x, λ) = 0, using the differential equation. It follows that F*(x, λ̄)JF(x, λ) is constant. Since it equals J for x = c, this is its value for all x ∈ I. It follows that J⁻¹F*(x, λ̄) is the inverse matrix of JF(x, λ). Straightforward differentiation now shows that (13.2) solves the equation. □
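Lemma 13.4 lends itself to a numerical sanity check. In the sketch below the 2 × 2 coefficients J, Q, W are hypothetical, chosen only to satisfy the standing assumptions, and the fundamental matrix is computed with a hand-rolled RK4 step so that complex λ needs no library support.

```python
import numpy as np

J = np.array([[0, -1], [1, 0]], dtype=complex)          # skew-Hermitian, invertible
Q = lambda x: np.array([[np.cos(x), 0], [0, 2.0]], dtype=complex)  # Hermitian
W = lambda x: np.array([[1.0, 0], [0, 0]], dtype=complex)          # positive semi-definite

def fundamental(lam, xs):
    """RK4 integration of JF' + QF = lam*W*F, i.e. F' = J^{-1}(lam*W - Q)F, F(xs[0]) = I."""
    Jinv = np.linalg.inv(J)
    F = np.eye(2, dtype=complex)
    rhs = lambda x, F: Jinv @ ((lam * W(x) - Q(x)) @ F)
    for x0, x1 in zip(xs[:-1], xs[1:]):
        h = x1 - x0
        k1 = rhs(x0, F)
        k2 = rhs(x0 + h / 2, F + h / 2 * k1)
        k3 = rhs(x0 + h / 2, F + h / 2 * k2)
        k4 = rhs(x1, F + h * k3)
        F = F + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return F

xs = np.linspace(0.0, 1.0, 2001)
lam = 0.3 + 1.1j
F_lam = fundamental(lam, xs)
F_bar = fundamental(np.conj(lam), xs)
# the identity of Lemma 13.4: F*(x, conj(lam)) J F(x, lam) = J for all x
assert np.allclose(F_bar.conj().T @ J @ F_lam, J, atol=1e-7)
```

Note that the conjugated argument λ̄ in the first factor is essential; with F*(x, λ) the product is not constant for non-real λ.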

Corollary 13.5. If v ∈ L²_W has compact support in I, then (13.1) has a solution u with compact support in I if and only if ∫_I v*Wu₀ = 0 for all solutions u₀ of the homogeneous equation (13.1) with v = 0.

Proof. If we choose c to the left of the support of v, then by Lemma 13.4 the function u(x) = F(x)J⁻¹ ∫_c^x F*Wv is the only solution of (13.1) which vanishes to the left of c. Since F(x)J⁻¹ is invertible, (13.1) has a solution of compact support if and only if ∫_I F*Wv = 0. But the columns of F are linearly independent, so they are a basis for the solutions of the homogeneous equation. The corollary follows. □

Lemma 13.6. Suppose (u, v) ∈ T₀*. Then there is a representative of the equivalence class u, also denoted by u, which is absolutely continuous and satisfies Ju′ + Qu = Wv. Conversely, if this holds, then (u, v) ∈ T₀*.

Proof. Let u₁ be a solution of Ju₁′ + Qu₁ = Wv and assume (u₀, v₀) ∈ T₀ has compact support. Integrating by parts we get

∫_I v*Wu₀ = ∫_I (Ju₁′ + Qu₁)*u₀ = ∫_I u₁*(Ju₀′ + Qu₀) = ∫_I u₁*Wv₀.

This proves the converse part of the lemma. We also have 0 = ⟨u₀, v⟩ − ⟨v₀, u⟩ = ⟨v₀, u₁ − u⟩. Here v₀ is an arbitrary compactly supported element of L²_W for which there exists a compactly supported element u₀ ∈ L²_W satisfying Ju₀′ + Qu₀ = Wv₀. By Corollary 13.5 it follows that u₁ − u solves the homogeneous equation, i.e., u solves (13.1). □

It now follows that T₀ is symmetric and that its adjoint is given by the maximal relation T₁, consisting of all pairs (u, v) in L²_W × L²_W such that u is (the equivalence class of) a locally absolutely continuous function for which Ju′ + Qu = Wv. We can now apply the theory of Chapter 9.2. The deficiency indices of T₀ are accordingly the number of solutions of Ju′ + Qu = iWu and Ju′ + Qu = −iWu respectively which are linearly independent in L²_W. Since there are altogether only


n (pointwise) linearly independent solutions of these equations, the deficiency indices can be no larger than n; in particular they are both finite. We now make the following basic assumption.

Assumption 13.7. If K is a sufficiently large, compact subinterval of I there is no non-trivial solution of Ju′ + Qu = 0 with ∫_K u*Wu = 0.

Note that if there is a solution with u*Wu ≡ 0, then Wu = 0, so u actually also solves Ju′ + Qu = λWu for any complex λ. The assumption automatically holds if (13.1) is equivalent to a Sturm-Liouville equation, or more generally an equation of the types discussed in Example 13.3 and Exercise 13.3. One reason for making the assumption is that it ensures that the deficiency indices of T₀ are precisely equal to the dimensions of the spaces of those solutions of Ju′ + Qu = ±iWu which have finite norm, but the assumption will be even more important in the next chapter.

According to Corollary 9.15 there will be selfadjoint realizations of (13.1) precisely if the deficiency indices are equal. We will in the rest of this chapter assume that a selfadjoint extension of T₀ exists. Some simple criteria that ensure this are given in the following proposition, but if these do not apply it can, in a concrete case, be very difficult to determine whether there are selfadjoint realizations or not.

Proposition 13.8. The minimal relation T₀ has equal deficiency indices if either of the following conditions is satisfied:
(1) J, Q and W are real-valued.
(2) The interval I is compact.

Proof. If u ∈ L²_W satisfies Ju′ + Qu = λWu and the coefficients are real-valued, then conjugation shows that ū is still in L²_W and Jū′ + Qū = λ̄Wū. There is therefore a one-to-one correspondence between D_λ and D_λ̄ which obviously preserves linear independence. It follows that n₊ = n₋. If I is compact, then solutions of Ju′ + Qu = λWu are absolutely continuous in I, and W is integrable in I, so that all solutions are in L²_W. Thus n₊ = n₋ = n.
□

Example 13.9. Note that J* = −J, so J can be real-valued only if n is even (show this!). Suppose u solves the equation Σ_{k=0}^m (p_k u^(k))^(k) = iwu, where the coefficients p₀, …, p_m are real-valued and w > 0. Then ū satisfies Σ_{k=0}^m (p_k ū^(k))^(k) = −iwū. It follows that if (13.1) is equivalent to an equation of this form, then its deficiency indices are always equal, so that selfadjoint realizations exist. This is in particular the case for the Sturm-Liouville equation (10.1).

We will now take a closer look at how selfadjoint realizations are determined as restrictions of the maximal relation. Suppose (u₁, v₁)


and (u₂, v₂) ∈ T₁. Then the boundary form (cf. Chapter 9) is

(13.3)   ⟨(u₁, v₁), U(u₂, v₂)⟩ = i ∫_I (v₂*Wu₁ − u₂*Wv₁)
         = i ∫_I ((Ju₂′ + Qu₂)*u₁ − u₂*(Ju₁′ + Qu₁))
         = −i ∫_I (u₂*Ju₁)′ = −i lim_{K→I} [u₂*Ju₁]_K,

the limit being taken over compact subintervals K of I. We must restrict T₁ so that this vanishes. As in Chapter 10 this means that the restriction of T₁ to a selfadjoint relation T is obtained by boundary conditions, since the limit clearly only depends on the values of u₁ and u₂ in arbitrarily small neighborhoods of the endpoints of I. An endpoint is called regular if it is a finite number and Q and W are integrable near the endpoint. Otherwise the endpoint is singular. If both endpoints are regular, we again say that we are dealing with a regular problem. We have a singular problem if at least one of the endpoints is infinite, or if at least one of Q and W is not integrable on I.

Consider now the regular case. Since it is clear that both deficiency indices equal n in the regular case, there are always selfadjoint realizations. To see what they look like, let ũ be the boundary value of (u, v) ∈ T₁, i.e., ũ = ( u(a), u(b) )ᵀ. Also put B = ( iJ 0 ; 0 −iJ ), so that the boundary form is ũ₂*Bũ₁. Now if u ∈ D_i then ⟨u, Uu⟩ = ⟨u, u⟩, so that the boundary form is positive definite on D_i. Similarly it is negative definite on D₋ᵢ (cf. Corollary 9.17). Since dim D_i ⊕ D₋ᵢ = 2n, the rank of the boundary form is 2n on this space, so that the boundary values of this space, and a fortiori those of T₁, range through all of C²ⁿ. Since ⟨T₁, UT₀⟩ = 0 it follows that the boundary value of any element of T₀ is 0. Conversely, to guarantee that ⟨T₁, Uu⟩ = 0 for some u ∈ T₁ it is obviously enough that the boundary value of u vanishes. Hence the minimal relation consists exactly of those elements of the maximal relation which have boundary value 0. It is now clear that any maximal symmetric restriction of T₁ is obtained by restricting the boundary values to a maximal subspace of C²ⁿ on which the boundary form vanishes, a so called maximal isotropic space for B.
We know, since the deficiency indices are finite and equal, that all such maximal symmetric restrictions are actually selfadjoint (Corollary 9.15). Since the problem of finding maximal isotropic spaces for B is a purely algebraic one, we consider the problem of identifying all selfadjoint restrictions of T₁ as solved in the regular case. See also Exercise 13.4.
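A minimal sketch of what these isotropic spaces look like, for the hypothetical scalar case n = 1 with J = −i (as for the operator −i d/dx): the boundary form matrix becomes B = diag(iJ, −iJ) = diag(1, −1), and the maximal isotropic subspaces of C² are exactly the lines spanned by (1, S) with |S| = 1, i.e., boundary conditions u(b) = S u(a).

```python
import numpy as np

# n = 1, J = -i: boundary form matrix B = diag(iJ, -iJ) = diag(1, -1)
B = np.diag([1.0, -1.0]).astype(complex)
for theta in np.linspace(0.0, 2 * np.pi, 7):
    S = np.exp(1j * theta)        # S*JS = J here simply means |S| = 1
    w = np.array([1.0, S])
    # span{(1, S)} is isotropic for B: w*Bw = 1 - |S|^2 = 0
    assert abs(w.conj() @ B @ w) < 1e-12
w_bad = np.array([1.0, 2.0])      # u(b) = 2u(a) is not isotropic,
assert abs(w_bad.conj() @ B @ w_bad) > 1  # hence gives no selfadjoint restriction
```

Thus every selfadjoint realization in this model case is of the form u(b) = e^{iθ}u(a); periodic boundary conditions correspond to θ = 0.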


Clearly all these restrictions are obtained by restricting the boundary values of elements in T₁ to certain n-dimensional subspaces of C²ⁿ, i.e., by imposing n linear, homogeneous boundary conditions on T₁. We consider a few special cases. One selfadjoint realization is obtained by imposing periodic boundary conditions u(b) = u(a), or more generally u(b) = Su(a), where S is a fixed matrix satisfying S*JS = J. As already mentioned, such a matrix S is often called symplectic, at least in the case when S is real and J = ( 0 I ; −I 0 ), so that n is even. Another possibility occurs if the invertible Hermitian matrix iJ has an equal number of positive and negative eigen-values (this obviously requires n to be even). In that case we may impose separated boundary conditions, i.e., conditions that make both u*(a)Ju(a) and u*(b)Ju(b) vanish. Boundary conditions which are not separated are called coupled. It must be emphasized that for n > 2 there are selfadjoint realizations which are determined by some conditions imposed only on the value at one of the endpoints, and some conditions involving the values at both endpoints.

Let us now turn to the general, not necessarily regular case. We first need to briefly discuss Hermitian forms of finite rank. If B is a Hermitian form on a linear space L we set L_B = {u ∈ L | B(u, L) = 0}, which is a subspace of L. The rank of B is codim L_B (= dim L/L_B). In the sequel we assume that B has finite rank. If M is a subspace on which the form B is non-degenerate, i.e., there is no non-zero element u ∈ M such that B(u, v) = 0 for all v ∈ M, then we must have L_B ∩ M = {0}, so that M has to be finite-dimensional. This means, of course, that after introducing a basis in M the form B is given on M by an invertible matrix. If B is non-degenerate on M, then for every u ∈ L there is a unique element v ∈ M (the B-projection of u on M) such that B(u − v, M) = 0 (Exercise 13.5).
If B is non-degenerate on M, but not on any proper superspace of M, we say that M is maximal non-degenerate for B. Of course this means exactly that L_B ∩ M = {0} and dim M = rank B, so that L = M ∔ L_B as a direct sum. We call a subspace P of L on which B is positive definite a maximal positive definite space for B if P has no proper superspaces on which B is positive definite. If B is positive definite on P, then clearly dim P ≤ rank B. It follows that forms of finite rank always have maximal positive definite spaces. Similarly for negative definite spaces.

Proposition 13.10 (Sylvester's law of inertia). Suppose B is a Hermitian form of finite rank on a linear space L. Then all maximal positive definite subspaces for B have the same dimension. Similarly for maximal negative definite subspaces.

Proof. Suppose P is maximal positive definite for B and that P̃ is another positive definite space for B. Then the B-projection on P is injective as a linear map B_P : P̃ → P. For if not, there exists


a non-zero u ∈ P̃ such that B(u, P) = 0. But then B is positive definite on the linear hull of u and P, since B(αu + βv, αu + βv) = |α|²B(u, u) + |β|²B(v, v) for any v ∈ P. This contradicts the maximality of P as a positive definite space. From the standard fact dim P̃ = dim B_P(P̃) + dim{u ∈ P̃ | B_P u = 0} it now follows that dim P̃ ≤ dim P. By symmetry all maximal positive definite subspaces for B have the same dimension. Similarly, all maximal negative definite spaces for B have the same dimension. □

If P is any maximal positive definite subspace, and N any maximal negative definite subspace, for B, we set r₊ = dim P and r₋ = dim N. The pair (r₊, r₋) is called the signature of the form B.

Proposition 13.11. Suppose P and N are maximal as positive and negative definite subspaces for a Hermitian form B of finite rank. Then P ∩ N = {0}, the direct sum P ∔ N is a maximal non-degenerate space for B, and rank B = r₊ + r₋.

Proof. Clearly B can not be both positive and negative on the same vector u, so P ∩ N = {0}. B is obviously (check!) non-degenerate on P ∔ N, and if P ∔ N is not maximal there exists u ∉ P ∔ N such that B is non-degenerate on the linear hull M of u and P ∔ N. We may assume B(u, P ∔ N) = 0, since otherwise we can subtract from u its B-projection on P ∔ N. We cannot have B(u, u) = 0, since B would then be degenerate on M. But if B(u, u) > 0, then B would be positive definite on the linear hull of u and P, contradicting the maximality of P. Similarly, if B(u, u) < 0 we would get a contradiction to the maximality of N. Therefore P ∔ N is maximal non-degenerate, so that r₊ + r₋ = rank B. □

Two Hermitian forms B_a and B_b of finite rank are said to be independent if each has a maximal non-degenerate space M_a respectively M_b such that B_a(M_b, L) = B_b(M_a, L) = 0. It is then clear that M_a ∩ M_b = {0} and that M_a ∔ M_b is maximal non-degenerate for B_b − B_a. If (r₊ᵃ, r₋ᵃ) and (r₊ᵇ, r₋ᵇ) are the signatures of B_a and B_b respectively, it follows that (r₋ᵃ + r₊ᵇ, r₊ᵃ + r₋ᵇ) is the signature of B_b − B_a.

Now consider (13.3) and suppose I = (a, b). If u₁ = (u₁, v₁) and u₂ = (u₂, v₂) ∈ T₁, then −iu₂*Ju₁ has a limit both in a and b by (13.3). We denote these limits B_a(u₁, u₂) and B_b(u₁, u₂) respectively and call them the boundary forms at a and b respectively. Clearly B_a and B_b are Hermitian forms on T₁. Being limits of forms of rank n they both have ranks ≤ n (Exercise 13.6). They are also independent. This follows from the next lemma.

Lemma 13.12. Suppose (u, v) ∈ T₁. Then there exists (u₁, v₁) in T₁ such that (u₁, v₁) = (u, v) in a right neighborhood of a and (u₁, v₁) vanishes in a left neighborhood of b.


Proof. Let [c, d] be a compact subinterval of I = (a, b) such that ∫_c^d F*(·, λ̄)WF(·, λ) is invertible, and put v₁ = v in (a, c] and v₁ ≡ 0 in [d, b). Now let

u₁(x) = F(x, λ)( u(c) + J⁻¹ ∫_c^x F*(y, λ̄)W(y)v₁(y) dy ).

It is clear that u₁ = u in (a, c], and if we choose v₁ appropriately in [c, d] we can achieve that u₁ ≡ 0 in [d, b). In fact, setting v₁(x) = −F(x, λ)( ∫_c^d F*(·, λ̄)WF(·, λ) )⁻¹ Ju(c) in this interval will do. □

It follows that (u − u₁, v − v₁) ∈ T₁ is 0 near a and equals (u, v) near b. We can therefore find a maximal non-degenerate space for B_a consisting of elements of T₁ vanishing near b, and similarly a maximal non-degenerate space for B_b consisting of elements of T₁ vanishing near a. Thus B_a and B_b are independent, as claimed. Since the signature of the complete boundary form B_b − B_a is (n₊, n₋), the independence of B_a and B_b implies that n₊ = r₋ᵃ + r₊ᵇ and n₋ = r₊ᵃ + r₋ᵇ, using the notation introduced above for the signatures of B_a and B_b. According to Corollary 9.15, T₁ has selfadjoint restrictions precisely if n₊ = n₋. Reasoning as in the regular case it follows that there are selfadjoint restrictions defined by separated boundary conditions precisely if r₊ᵃ = r₋ᵃ and r₊ᵇ = r₋ᵇ, from which n₊ = n₋ follows. In fact, from any two of these relations the third clearly follows.

Consider finally the case when a is a regular endpoint but b possibly is singular. In this case B_a is given by B_a(u₁, u₂) = iu₂(a)*Ju₁(a), with notation as above. Clearly r₊ᵃ is the number of positive eigenvalues of iJ and r₋ᵃ the number of negative eigenvalues. It follows that selfadjoint restrictions of T₁ defined by separated boundary conditions exist if and only if the deficiency indices are equal and iJ has an equal number of positive and negative eigenvalues; in particular n must be even. In the Sturm-Liouville case all these conditions are fulfilled, as we already know.

Exercises for Chapter 13

Exercise 13.1. Show that u and ũ are elements of the same equivalence class in L²_W if and only if Wu = Wũ a.e.

Exercise 13.2.
Verify that T0 is the graph of an operator if (13.1) is equivalent to an equation of the type (10.1) (or more generally an equation of the type discussed in Exercise 13.3) and w > 0 a.e. in I. Also show that in this case Assumption 13.7 holds. Try to show this assuming only that w ≥ 0 but w > 0 on a subset of I of positive measure (this is considerably harder).


Exercise 13.3. Show that the differential equation iu‴ = wv (here i = √−1) can be written on the form (13.1).
Also show that the equation Σ_{k=0}^m (p_k u^(k))^(k) = wv can be written on this form if the coefficients w and p₀, p₁, …, p_m satisfy appropriate conditions (state these conditions!).
Hint: Put U = (u, u′, u″)ᵀ in the first case. In the second case, let U be the matrix with 2m rows u₀, …, u_{2m−1}, where u_j = u^(j) and u_{m+j} = (−1)^j Σ_{k=j+1}^m (p_k u^(k))^(k−j−1) for j = 0, …, m − 1.

Exercise 13.4. Find all selfadjoint realizations of a regular Sturm-Liouville equation.
More generally, assume J⁻¹ = J* = −J and show that the eigen-values of B are ±1, both with multiplicity n. Then describe all maximal isotropic spaces for B.

Exercise 13.5. Suppose B is a Hermitian form of finite rank on a Hilbert space L, and that B is non-degenerate on a subspace M. Show that for any u ∈ L there is a unique v ∈ M, the B-projection on M, such that B(u − v, M) = 0. Also show that B(u − v, L) = 0 if, and only if, M is maximal non-degenerate.

Exercise 13.6. Suppose B₁, B₂, … is a sequence of Hermitian forms on L with finite rank, all of signature (r₊, r₋), and suppose B_j(u, v) → B(u, v) as j → ∞, for any u, v ∈ L. Show that B is a Hermitian form on L of finite rank and signature (s₊, s₋), where s₊ ≤ r₊ and s₋ ≤ r₋.
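The claim in Exercise 13.4 about the eigen-values of B can be sanity-checked numerically. The sketch below is illustrative only (a hypothetical choice of J with n = 2), not a solution to the exercise.

```python
import numpy as np

# n = 2 example: J with J^{-1} = J* = -J
J = np.array([[0, 1], [-1, 0]], dtype=complex)
assert np.allclose(np.linalg.inv(J), -J)
assert np.allclose(J.conj().T, -J)

# boundary form matrix B = diag(iJ, -iJ) from the regular case
Z = np.zeros((2, 2))
B = np.block([[1j * J, Z], [Z, -1j * J]])
assert np.allclose(B, B.conj().T)               # B is Hermitian
ev = np.sort(np.linalg.eigvalsh(B))
assert np.allclose(ev, [-1.0, -1.0, 1.0, 1.0])  # eigenvalues ±1, multiplicity n
```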

CHAPTER 14

Eigenfunction expansions

Just as in Chapter 11, we will deduce our results for the system (13.1) from a detailed description of the resolvent. As before we will prove that the resolvent is actually an integral operator. To see this, first note that according to Lemma 13.6 all elements of D₁ are locally absolutely continuous, in particular they are in C(I). The set C(I) becomes a Fréchet space if provided with the topology of locally uniform convergence; with a little loss of elegance we may restrict ourselves to consider C(K) for an arbitrary compact subinterval K ⊂ I. This is a Banach space with norm ‖u‖_K = sup_{x∈K} |u(x)|, |·| denoting the norm of an n × 1 matrix (Exercise 14.1). The set T₁ is a closed subspace of H ⊕ H, since T₁ is a closed relation. It follows from Assumption 13.7 that the map T₁ ∋ (u, v) ↦ u ∈ C(I) is well defined, i.e., there can not be two different locally absolutely continuous functions u in the same L²_W-equivalence class satisfying (13.1) for the same v. The restriction map I_K : T₁ ∋ (u, v) ↦ u ∈ C(K) is therefore a linear map between Banach spaces.

Proposition 14.1. For every compact subinterval K ⊂ I there exists a constant C_K such that ‖u‖_K ≤ C_K ‖(u, v)‖_W for any (u, v) ∈ T₁.

Proof. We shall show that the restriction map I_K is a closed operator if K is sufficiently large. Since I_K is everywhere defined in the Hilbert space T₁ it follows by the closed graph theorem (Appendix A) that I_K is a bounded operator, which is the statement of the proposition. Now suppose (u_j, v_j) → (u, v) in T₁ and u_j → ũ in C(K). We must show that I_K(u, v) = ũ, i.e., u = ũ pointwise in K. We have 0 ≤ ∫_K (u − u_j)*W(u − u_j) ≤ ‖u − u_j‖² and by Lemma 13.4

u_j(x) = F(x, λ)( u_j(c) + J⁻¹ ∫_c^x F*(y, λ̄)W(y)v_j(y) dy ),

so letting j → ∞ it is clear that ∫_K (u − ũ)*W(u − ũ) = 0 and that ũ satisfies Jũ′ + Qũ = Wv, so Assumption 13.7 shows that u − ũ = 0 pointwise in K if K is sufficiently large. Hence I_K is closed, and we are done. □


We can now show that the resolvent is an integral operator. First note that if T is a selfadjoint realization of (13.1), i.e., a selfadjoint restriction of T₁, then setting H_T = D(T), the resolvent R_λ of the operator part T̃ of T is an operator on H_T, defined for λ ∈ ρ(T̃). We define the resolvent set ρ(T) = ρ(T̃) and extend R_λ to all of L²_W by setting R_λ H_∞ = 0, and it is then clear that the resolvent has all the properties of Theorems 5.2 and 5.3; the only difference is that the resolvent is perhaps no longer injective. Given u ∈ L²_W we obtain the element (R_λu, λR_λu + u) ∈ T₁. (To see this, write u = u_T + u_∞ with u_T ∈ H_T and (0, u_∞) ∈ T; then T̃R_λu_T = (T̃ − λ)R_λu_T + λR_λu_T = u_T + λR_λu, so that (R_λu, λR_λu + u) = (R_λu, λR_λu + u_T) + (0, u_∞) ∈ T ⊂ T₁.) We may therefore also view the resolvent as an operator R̃_λ : L²_W → T₁. This operator is bounded, since ‖(R_λu, λR_λu + u)‖_W ≤ ((1 + |λ|)‖R_λ‖ + 1)‖u‖_W. Hence ‖R̃_λ‖ ≤ (1 + |λ|)‖R_λ‖ + 1, where ‖R_λ‖ is the norm of R_λ as an operator on H_T. It is also clear that the analyticity of R_λ implies the analyticity of R̃_λ. We obtain the following theorem.

Theorem 14.2. Suppose I is an arbitrary interval, and that T is a selfadjoint realization in L²_W of the system (13.1), satisfying Assumption 13.7. Then the resolvent R_λ of T may be viewed as a bounded linear map from L²_W to C(K), for any compact subinterval K of I, which depends analytically on λ ∈ ρ(T), in the uniform operator topology. Furthermore, there exists Green's function G(x, y, λ), an n × n matrix-valued function, such that R_λu(x) = ⟨u, G*(x, ·, λ)⟩_W for any u ∈ L²_W. The columns of y ↦ G*(x, y, λ) are in H_T = D(T) for any x ∈ I.

Proof. We already noted that ρ(T) ∋ λ ↦ R̃_λ ∈ B(L²_W, T₁) is analytic in the uniform operator topology. Furthermore, the restriction operator I_K : T₁ → C(K) is bounded and independent of λ. Hence ρ(T) ∋ λ ↦ I_K R̃_λ is analytic in the uniform operator topology. In particular, for fixed λ ∈ ρ(T) and any x ∈ I, the components of the linear map L²_W ∋ u ↦ (I_K R̃_λ u)(x) = R_λu(x) are bounded linear forms. By Riesz' representation theorem we have R_λu(x) = ⟨u, G*(x, ·, λ)⟩_W, where the columns of y ↦ G*(x, y, λ) are in L²_W. Since R_λu = 0 for u ∈ H_∞ it follows that the columns of G*(x, ·, λ) are actually in H_T for each x ∈ I. □

Among other things, Theorem 14.2 tells us that if u_j → u in L²_W, then R_λu_j → R_λu in C(K), so that R_λu_j converges locally uniformly. This is actually true even if u_j just converges weakly, but all we need is the following weaker result.

Lemma 14.3. Suppose R_λ is the resolvent of a selfadjoint relation T as above. Then if u_j ⇀ 0 weakly in L²_W, it follows that R_λu_j → 0 pointwise and locally boundedly.


Proof. R_λu_j(x) = ⟨u_j, G*(x, ·, λ)⟩_W → 0, since the columns of y ↦ G*(x, y, λ) are in L²_W for any x ∈ I. Now let K be a compact subinterval of I. A weakly convergent sequence in L²_W is bounded, so since R_λ maps L²_W boundedly into C(K), it follows that R_λu_j(x) is bounded independently of j and x for x ∈ K. □

Corollary 14.4. If the interval I is compact, then any selfadjoint restriction T of T₁ has compact resolvent. Hence T has a complete orthonormal sequence of eigenfunctions in H_T.

Proof. Suppose u_j ⇀ 0 weakly in L²_W. If I is compact, then Lemma 14.3 implies that R_λu_j → 0 pointwise and boundedly in I, and hence by dominated convergence R_λu_j → 0 in L²_W. Thus R_λ is compact. The last statement follows from Theorem 8.3. □

If T has compact resolvent, then the generalized Fourier series of any u ∈ H_T converges to u in L²_W; if we just have u ∈ L²_W the series converges to the projection of u onto H_T. For functions in the domain of T much stronger convergence is obtained.

Corollary 14.5. Suppose T has a complete orthonormal sequence of eigenfunctions in H_T. If u ∈ D(T), then the generalized Fourier series of u converges locally uniformly in I. In particular, if I is compact, the convergence is uniform in I.

Proof. Suppose u ∈ D(T) = D(T̃), i.e., T̃u = v for some v ∈ H_T, and let ṽ = v − iu, so that u = R_i ṽ. If e is an eigenfunction of T with eigenvalue λ we have T̃e = λe, or (T̃ + i)e = (λ + i)e, so that R₋ᵢe = e/(λ + i). It follows that ⟨u, e⟩_W e = ⟨R_i ṽ, e⟩_W e = ⟨ṽ, R₋ᵢe⟩_W e = (1/(λ − i))⟨ṽ, e⟩_W e = ⟨ṽ, e⟩_W R_i e. If s_N u denotes the N:th partial sum of the Fourier series for u, it follows that s_N u = R_i s_N ṽ, where s_N ṽ is the N:th partial sum for ṽ. Since s_N ṽ → ṽ in H_T, it follows from Theorem 14.2 and the remark after it that s_N u → u in C(K), for any compact subinterval K of I. □

The convergence is actually even better than the corollary shows, since it is absolute and uniform (see Exercise 14.2).

Example 14.6.
Consider the operator of Example 4.8, i.e., −i d/dx considered in L²(−π, π), with the boundary condition u(−π) = u(π). This is a regular, selfadjoint realization of (13.1) for n = 1, J = −i, Q = 0 and W = 1, and it is clear that H_∞ = {0}. Hence there is a complete orthonormal sequence of eigenfunctions in L²(−π, π). The solutions of −iu′ = λu are the multiples of e^{iλx}, and the boundary condition implies that λ is an integer. We obtain the classical (complex) Fourier series expansion u(x) = Σ_{k=−∞}^{∞} û_k e^{ikx}, where û_k = (1/2π) ∫_{−π}^{π} u(x)e^{−ikx} dx. According to our results, the series converges in L²(−π, π) for any u ∈ L²(−π, π), and uniformly if u is absolutely continuous with derivative in L²(−π, π).
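The classical expansion can be tested numerically. The sketch below is illustrative (the test function x², grid size and tolerances are arbitrary choices): it computes the Fourier coefficients by Riemann sums and checks that the partial sums converge uniformly, as Corollary 14.5 predicts for an absolutely continuous u with u(−π) = u(π) and derivative in L².

```python
import numpy as np

Mpts = 4000
dx = 2 * np.pi / Mpts
x = -np.pi + dx * np.arange(Mpts)
u = x**2                      # absolutely continuous, u(-pi) = u(pi), u' in L^2
N = 100
ks = np.arange(-N, N + 1)
# \hat u_k = (1/2pi) \int u e^{-ikx} dx, approximated by Riemann sums
uhat = np.array([(u * np.exp(-1j * k * x)).sum() * dx / (2 * np.pi) for k in ks])
sN = (uhat[:, None] * np.exp(1j * ks[:, None] * x[None, :])).sum(axis=0).real
assert abs(uhat[N] - np.pi**2 / 3) < 1e-5   # \hat u_0 = pi^2/3
assert abs(uhat[N + 1] - (-2.0)) < 1e-3     # \hat u_k = 2(-1)^k/k^2, here k = 1
assert np.max(np.abs(sN - u)) < 0.1         # uniform error is about the tail 4/N
```

For a function with a jump in its periodic extension the last assertion would fail however large N is taken, which is the familiar Gibbs phenomenon; uniform convergence really does use u(−π) = u(π).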


Exercises for Chapter 14

Exercise 14.1. Show that if K is a compact interval, then C(K) is a Banach space with the norm sup_{x∈K} |u(x)|. Also show that if I is an arbitrary interval, then C(I) is a Fréchet space (a linear Hausdorff space with the topology given by a countable family of seminorms, which is also complete), under the topology of locally uniform convergence.

Exercise 14.2. With the assumptions of Corollary 14.5 the Fourier series for u ∈ D(T) actually converges absolutely and uniformly to u. This may be proved just as for the case of a Sturm-Liouville equation, which was considered in Exercise 11.2. Do it!

CHAPTER 15

Singular problems

We now have a satisfactory eigenfunction expansion theory for regular boundary value problems, so we turn next to singular problems. We then need to take a much closer look at Green's function. To do this, we fix an arbitrary point c ∈ I; if I contains one of its endpoints, this is the preferred choice for c. Next, let F(x, λ) be a fundamental matrix for JF′ + QF = λWF with λ-independent, symplectic initial data in c. We will need the following theorem.

Theorem 15.1. A solution u(x, λ) of Ju′ + Qu = λWu with initial data independent of λ is an entire function of λ, locally uniformly with respect to x.

This means that u(x, λ) is analytic as a function of λ in the whole complex plane, and that the difference quotients (u(x, λ + h) − u(x, λ))/h converge locally uniformly in x as h → 0. The proof is given in Appendix C. We can now give the following detailed description of Green's function.

Theorem 15.2. Green's function has the following properties:
(1) For λ ∈ ρ(T) we have R_λu(x) = ⟨u, G*(x, ·, λ)⟩_W.
(2) As functions of y, the columns of G*(x, y, λ) satisfy the equation Ju′ + Qu = λ̄Wu for y ≠ x.
(3) As functions of y, the columns of G*(x, y, λ) satisfy the boundary conditions that determine T as a restriction of T₁, for any x interior to I.
(4) G*(x, y, λ) = G(y, x, λ̄), for all x, y ∈ I and λ ∈ ρ(T).
(5) G(x, y, λ) − G(x, y, µ) = (λ − µ)⟨G*(y, ·, µ̄), G*(x, ·, λ)⟩_W = (λ − µ)R_λG*(y, ·, µ̄)(x), for all x, y ∈ I and λ, µ ∈ ρ(T).
Furthermore, there exists an n × n matrix-valued function M(λ), defined in ρ(T) and satisfying M*(λ) = M(λ̄), such that

(15.1)   G(x, y, λ) = F(x, λ)( M(λ) ± ½J⁻¹ )F*(y, λ̄),

where the sign of ½J⁻¹ should be positive for x > y, negative for x < y.

Proof. We already know (1). Now let K be a compact subinterval of I, (u, v) ∈ T₀ with support in K, and suppose x ∉ K. We have u ∈ D(T₀) ⊂ D(T) and (u, v) = (u, λu + (v − λu)), so that u = R_λ(v − λu).


We obtain

0 = u(x) = R_λ(v − λu)(x) = ⟨v − λu, G*(x, ·, λ)⟩_W = ⟨v, G*(x, ·, λ)⟩_W − ⟨u, λ̄G*(x, ·, λ)⟩_W.

But according to Lemma 13.6 this means that each column of y ↦ G*(x, y, λ) is in the domain of the maximal relation for (13.1) on the intervals I ∩ (−∞, x) and I ∩ (x, ∞) and satisfies the equation Ju′ + Qu = λ̄Wu on these intervals, so (2) follows. It also follows that we have

G*(x, y, λ) = F(y, λ̄)P₊*(x, λ) for y < x,   G*(x, y, λ) = F(y, λ̄)P₋*(x, λ) for y > x,

for some n × n matrix-valued functions P₊ and P₋. If u is compactly supported and in L²_W we have, for x outside the convex hull of the support of u,

(15.2)   R_λu(x) = P±(x, λ)⟨u, F(·, λ̄)⟩_W.

The function v = R_λu satisfies the equation Jv′ + Qv = λWv + Wu, so we may write P±(x, λ) = F(x, λ)H±(λ), and since R_λu ∈ D(T) it certainly satisfies the boundary conditions determining T. If the support of u is large enough the scalar product in (15.2) can be any column vector, in view of Assumption 13.7, so for every y each column of x ↦ G(x, y, λ) also satisfies the boundary conditions determining T. This proves (3).

If the endpoints of I are a and b respectively we now have

R_λu(x) = F(x, λ)( ∫_a^x H₊(λ)F*(·, λ̄)Wu + ∫_x^b H₋(λ)F*(·, λ̄)Wu ).

Differentiating this we obtain

JR_λu′ + (Q − λW)R_λu = JF(x, λ)(H₊(λ) − H₋(λ))F*(x, λ̄)W(x)u(x),

so JF(x, λ)(H₊(λ) − H₋(λ))F*(x, λ̄) should be the unit matrix (actually, one must again argue using Assumption 13.7; we leave the details to the reader). In view of the fact that JF(x, λ) is the inverse of J⁻¹F*(x, λ̄), this means that H₊(λ) − H₋(λ) = J⁻¹. If we define M(λ) = (H₊(λ) + H₋(λ))/2 we now obtain (15.1).

If now u and v both have compact supports we have

⟨R_λu, v⟩_W = ∬ v*(x)W(x)G(x, y, λ)W(y)u(y) dxdy,


the double integral being absolutely convergent. Similarly

⟨u, R_λ̄v⟩_W = ∬ v*(x)W(x)G*(y, x, λ̄)W(y)u(y) dxdy,

and since the integrals are equal by Theorem 5.2 (2) and

G(x, y, λ) − G*(y, x, λ̄) = F(x, λ)(M(λ) − M*(λ̄))F*(y, λ̄),

we obtain ⟨F(·, λ), v⟩_W (M(λ) − M*(λ̄)) ⟨u, F(·, λ̄)⟩_W = 0. By Assumption 13.7 this implies that M(λ) = M*(λ̄), and thus (4).

Finally, to prove (5) we use the resolvent relation, Theorem 5.2 (3). For u ∈ L²_W this gives

⟨u, G*(x, ·, λ) − G*(x, ·, µ)⟩_W = R_λu(x) − R_µu(x) = (λ − µ)R_λR_µu(x)
   = (λ − µ)⟨R_µu, G*(x, ·, λ)⟩_W = ⟨u, (λ̄ − µ̄)R_µ̄G*(x, ·, λ)⟩_W.

Now R_µ̄G*(x, ·, λ)(y) = ⟨G*(x, ·, λ), G*(y, ·, µ̄)⟩_W. Thus

G(x, y, λ) − G(x, y, µ) = (λ − µ)⟨G*(y, ·, µ̄), G*(x, ·, λ)⟩_W = (λ − µ)R_λG*(y, ·, µ̄)(x),

since both sides are clearly in D(T). This proves (5).

□
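For the constant-coefficient model T = −i d/dx on (−π, π) with u(−π) = u(π) (n = 1, J = −i, F(x, λ) = e^{iλx}), Green's function can be written down by hand. The formulas below come from that assumed side computation, not from the text: G(x, y, λ) = i e^{iλ(x−y)}( z/(1 − z) + [y < x] ) with z = e^{2πiλ}, which has exactly the form (15.1) with M(λ) = i(1 + z)/(2(1 − z)).

```python
import numpy as np

lam = 0.5 + 0.5j
z = np.exp(2j * np.pi * lam)
M = 1j * (1 + z) / (2 * (1 - z))   # candidate M(lam) for this model problem

def G(x, y):
    # i e^{i lam (x-y)} (z/(1-z) + [y < x]); (y < x) acts as the indicator
    return 1j * np.exp(1j * lam * (x - y)) * (z / (1 - z) + (y < x))

# M should map the upper half plane to itself (the Nevanlinna property)
assert M.imag > 0

# R_lam applied to the eigenfunction e^{iky} must return e^{ikx}/(k - lam)
npts = 100000
dy = 2 * np.pi / npts
y = -np.pi + dy * np.arange(npts)
k = 3
for xp in (-2.0, 0.3, 1.7):
    u = (G(xp, y) * np.exp(1j * k * y)).sum() * dy
    assert abs(u - np.exp(1j * k * xp) / (k - lam)) < 1e-2
```

The jump of size iJ⁻¹ = ½J⁻¹ − (−½J⁻¹) across y = x is visible in the indicator term, matching the sign convention of (15.1).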

Before we proceed, we note the following corollary, which completes our results for the case of a discrete spectrum.

Corollary 15.3. Suppose for some non-real λ that all solutions of Ju′ + Qu = λWu and Ju′ + Qu = λ̄Wu are in L²_W. Then for any selfadjoint realization T the resolvent of T is compact.

In other words, if the deficiency indices are maximal, then the resolvent is compact. Actually, the assumptions are here a bit stronger than needed. In fact, it is not difficult to show (Exercise 15.1) that if all solutions are in L²_W for some λ, real or not, then the same is true for all λ.

Proof. One could use a version of Theorem 8.7 valid for L²_W and show that R_λ is a Hilbert-Schmidt operator. Here is an alternative proof. Suppose u_j ⇀ 0 weakly in L²_W and let I = (a, b). Then ∫_a^x F*(y, λ̄)W(y)u_j(y) dy and ∫_x^b F*(y, λ̄)W(y)u_j(y) dy are both bounded uniformly with respect to x, by Cauchy-Schwarz and since the columns of F(·, λ̄) are in L²_W. The latter fact also shows that the integrals tend pointwise to 0 as j → ∞. Since also the columns of F(·, λ) are in L²_W it follows that R_λu_j → 0 strongly in L²_W by dominated convergence. □


We will give an expansion theorem generalizing the Fourier series expansion obtained for a discrete spectrum. The first step is the following lemma.

Lemma 15.4. Let $M(\lambda)$ be as in Theorem 15.2. Then there is a unique increasing and left-continuous matrix-valued function $P$ with $P(0) = 0$ and unique Hermitian matrices $A$ and $B \ge 0$ such that
$$(15.3)\qquad M(\lambda) = A + B\lambda + \int_{-\infty}^{\infty}\Bigl(\frac{1}{t-\lambda} - \frac{t}{t^2+1}\Bigr)\,dP(t).$$

Proof. If $S = F(c,\lambda)$, Theorem 15.2 (5) gives $S(M(\lambda) - M(\mu))S^* = (\lambda-\mu)R_\lambda G^*(c,\cdot,\mu)(c)$, where the constant matrix $S$ is invertible. Thus $M(\lambda)$ is analytic in $\rho(T)$, since the resolvent $R_\lambda: L^2_W \to C(K)$ is. Furthermore, for $\mu = \overline\lambda$ non-real we obtain
$$\frac{1}{2i\operatorname{Im}\lambda}\bigl(M(\lambda) - M^*(\lambda)\bigr) = \frac{1}{2i\operatorname{Im}\lambda}\bigl(M(\lambda) - M(\overline\lambda)\bigr) = S^{-1}\langle G^*(c,\cdot,\overline\lambda), G^*(c,\cdot,\overline\lambda)\rangle_W (S^{-1})^* \ge 0.$$
Thus $M$ is a 'matrix-valued Nevanlinna function'. We now obtain the representation (15.3) by applying Theorem 6.1 to the Nevanlinna function $m(\lambda,u) = u^* M(\lambda) u$, where $u$ is an $n \times 1$ matrix. Clearly the quantities $\alpha$, $\beta$ and $\rho$ in the representation (6.1) are Hermitian forms in $u$, so (15.3) follows. □

The function $P$ is called the spectral matrix for $T$. We now define the Hilbert space $L^2_P$ in the following way. We consider $n \times 1$ matrix-valued Borel functions $\hat u$, so that they are measurable with respect to all elements of $dP$, and for which the integral $\int_{-\infty}^{\infty} \hat u^*(t)\,dP(t)\,\hat u(t) < \infty$. The elements of $L^2_P$ are equivalence classes of such functions, two functions $u$, $v$ being equivalent if they are equal a.e. with respect to $dP$, i.e., if $dP\,(u-v)$ has all elements equal to the zero measure. We denote the scalar product in this space by $\langle\cdot,\cdot\rangle_P$ and the norm by $\|\cdot\|_P$. Note that one may write the scalar product in a somewhat more familiar way by using the Radon–Nikodym theorem to find a measure $d\mu$ with respect to which all the entries of $dP$ are absolutely continuous; one may for example let $d\mu$ be the sum of all diagonal elements of $dP$. One then has $dP = \Omega\,d\mu$, where $\Omega$ is a non-negative matrix of functions locally integrable with respect to $d\mu$, and the scalar product is $\langle\hat u, \hat v\rangle_P = \int_{-\infty}^{\infty} \hat v^* \Omega \hat u\,d\mu$. Alternatively, we may define $L^2_P$ as the completion of the compactly supported, continuous $n \times 1$ matrix-valued functions with respect to the norm $\|\cdot\|_P$. These alternative definitions give the same space (Exercise 15.2). The main result of this chapter is the following.


Theorem 15.5.
(1) The integral $\int_K F^*(y,t)W(y)u(y)\,dy$ converges in $L^2_P$ for $u \in L^2_W$ as $K \to I$ through compact subintervals of $I$. The limit is called the generalized Fourier transform of $u$ and is denoted by $\mathcal F(u)$ or $\hat u$. We write this as $\hat u(t) = \langle u, F(\cdot,t)\rangle_W$, although the integral may not converge pointwise.
(2) The mapping $u \mapsto \hat u$ has kernel $H_\infty$ and is unitary between $H_T$ and $L^2_P$, so that the Parseval formula $\langle u,v\rangle_W = \langle\hat u,\hat v\rangle_P$ holds if $u, v \in L^2_W$ and at least one of them is in $H_T$.
(3) The integral $\int_K F(x,t)\,dP(t)\,\hat u(t)$ converges in $H_T$ as $K \to \mathbb R$ through compact intervals. If $\hat u = \mathcal F(u)$ the limit is $P_T u$, where $P_T$ is the orthogonal projection onto $H_T$. In particular the integral is the inverse of the generalized Fourier transform on $H_T$. Again, we write $u(x) = \langle\hat u, F^*(x,\cdot)\rangle_P$ for $u \in H_T$, although the integral may not converge pointwise.
(4) Let $E_\Delta$ denote the spectral projector of $\tilde T$ for the interval $\Delta$. Then $E_\Delta u(x) = \int_\Delta F(x,t)\,dP(t)\,\hat u(t)$.
(5) If $(u,v) \in T$ then $\mathcal F(v)(t) = t\hat u(t)$. Conversely, if $\hat u$ and $t\hat u(t)$ are in $L^2_P$, then $\mathcal F^{-1}(\hat u) \in D(T)$.

We will prove Theorem 15.5 through a sequence of lemmas. First note that for $u \in L^2_W$ with compact support, the function $\hat u(\lambda) = \langle u, F(\cdot,\lambda)\rangle_W$ is an entire, matrix-valued function of $\lambda$, since $F(x,\lambda)$, and thus also $F^*(x,\lambda)$, is entire, locally uniformly in $x$, according to Theorem 15.1.

Lemma 15.6. The function $\langle R_\lambda u, v\rangle_W - \hat v^*(\lambda)M(\lambda)\hat u(\lambda)$ is entire for all $u, v \in L^2_W$ with compact supports.

Proof. If the supports are inside $[a,b]$, direct calculation shows that the function equals
$$\frac12\int_a^b v^*(x)W(x)F(x,\lambda)J^{-1}\Bigl(\int_a^x F^*(y,\lambda)W(y)u(y)\,dy - \int_x^b F^*(y,\lambda)W(y)u(y)\,dy\Bigr)dx.$$
This is obviously an entire function of $\lambda$. □

As usual we denote the spectral projectors belonging to $T$ (i.e., those belonging to $\tilde T$) by $E_t$.

Lemma 15.7. Let $u \in L^2_W$ have compact support and assume $a < b$ to be points of differentiability for both $\langle E_t u, u\rangle$ and $P(t)$. Then
$$(15.4)\qquad \langle E_b u, u\rangle - \langle E_a u, u\rangle = \int_a^b \hat u^*(t)\,dP(t)\,\hat u(t).$$


Proof. Let $\Gamma$ be the positively oriented rectangle with corners in $a \pm i$, $b \pm i$. According to Lemma 15.6
$$\oint_\Gamma \langle R_\lambda u, u\rangle\,d\lambda = \oint_\Gamma \hat u^*(\lambda)M(\lambda)\hat u(\lambda)\,d\lambda$$
if either of these integrals exists. However, by Lemma 15.4,
$$\oint_\Gamma \hat u^*(\lambda)M(\lambda)\hat u(\lambda)\,d\lambda = \oint_\Gamma \hat u^*(\lambda)\int_{-\infty}^{\infty}\Bigl(\frac{1}{t-\lambda}-\frac{t}{t^2+1}\Bigr)\,dP(t)\,\hat u(\lambda)\,d\lambda.$$
The double integral is absolutely convergent except perhaps where $t = \lambda$. The difficulty is thus caused by
$$\int_{-1}^{1}ds\int_{\mu-1}^{\mu+1}\frac{\hat u^*(\mu-is)\,dP(t)\,\hat u(\mu+is)}{t-\mu-is}$$
for $\mu = a, b$. However, Lemma 11.10 ensures the absolute convergence of these integrals. Changing the order of integration gives
$$\oint_\Gamma \hat u^*(\lambda)M(\lambda)\hat u(\lambda)\,d\lambda = \int_{-\infty}^{\infty}\oint_\Gamma \hat u^*(\lambda)\,dP(t)\,\hat u(\lambda)\Bigl(\frac{1}{t-\lambda}-\frac{t}{t^2+1}\Bigr)\,d\lambda = -2\pi i\int_a^b \hat u^*(t)\,dP(t)\,\hat u(t),$$
since for $a < t < b$ the residue of the inner integral is $-\hat u^*(t)\,dP(t)\,\hat u(t)$, whereas $t = a, b$ do not carry any mass and the inner integrand is regular for $t < a$ and $t > b$. Similarly we have
$$\oint_\Gamma \langle R_\lambda u, u\rangle\,d\lambda = \int_{-\infty}^{\infty}\oint_\Gamma \frac{d\langle E_t u, u\rangle}{t-\lambda}\,d\lambda = -2\pi i\int_a^b d\langle E_t u, u\rangle,$$
which completes the proof. □
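The residue computation at the heart of this proof is easy to check numerically: for the positively oriented rectangle $\Gamma$ with corners $a \pm i$, $b \pm i$ one has $\oint_\Gamma d\lambda/(t-\lambda) = -2\pi i$ when $a < t < b$, and $0$ when $t$ lies outside $[a,b]$. The following Python sketch (an illustration added here, not part of the text; the helper name `rectangle_integral` and the grid size are ad hoc choices) verifies this with a complex trapezoidal rule.

```python
import numpy as np

def rectangle_integral(f, a, b, n=4000):
    """Integrate f(lam) d(lam) along the positively oriented rectangle
    with corners a-i, b-i, b+i, a+i, using the trapezoidal rule."""
    corners = [a - 1j, b - 1j, b + 1j, a + 1j, a - 1j]
    total = 0j
    for z0, z1 in zip(corners, corners[1:]):
        lam = z0 + np.linspace(0.0, 1.0, n) * (z1 - z0)  # points on this edge
        vals = f(lam)
        total += np.sum((lam[1:] - lam[:-1]) * (vals[1:] + vals[:-1]) / 2)
    return total

a, b = 0.0, 1.0
inside = rectangle_integral(lambda lam: 1.0 / (0.5 - lam), a, b)   # pole t = 1/2 inside
outside = rectangle_integral(lambda lam: 1.0 / (2.0 - lam), a, b)  # pole t = 2 outside

print(inside)   # close to -2*pi*i
print(outside)  # close to 0
```

The sign $-2\pi i$ comes from the residue of $\lambda \mapsto 1/(t-\lambda)$ at $\lambda = t$ being $-1$, exactly as used in the proof above.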

Lemma 15.8. If $u \in L^2_W$, the generalized Fourier transform $\hat u \in L^2_P$ exists as the $L^2_P$-limit of $\int_K F^*(y,t)W(y)u(y)\,dy$ as $K \to I$ through compact subintervals of $I$. Furthermore,
$$\langle E_t u, v\rangle_W = \int_{-\infty}^{t} \hat v^*\,dP\,\hat u.$$
In particular, $\langle P_T u, v\rangle_W = \langle\hat u,\hat v\rangle_P$ if $u$ and $v \in L^2_W$.


Proof. If $u$ has compact support, Lemma 15.7 shows that (15.4) holds for a dense set of values $a$, $b$, since functions of bounded variation are a.e. differentiable. Since both $E_t$ and $P$ are left-continuous we obtain, by letting $b \uparrow t$, $a \to -\infty$ through such values,
$$\langle E_t u, v\rangle = \int_{-\infty}^{t} \hat v^*\,dP\,\hat u$$
when $u$, $v$ have compact supports; first for $u = v$ and then in general by polarization. If $P_T$ is the projection of $L^2_W$ onto $H_T$ we obtain as $t \to \infty$ also that $\langle P_T u, v\rangle_W = \langle\hat u, \hat v\rangle_P$ when $u$ and $v$ have compact supports. For arbitrary $u \in L^2_W$ we set, for a compact subinterval $K$ of $I$,
$$u_K(x) = \begin{cases} u(x) & \text{for } x \in K,\\ 0 & \text{otherwise,}\end{cases}$$
and obtain a transform $\hat u_K$. If $L$ is another compact subinterval of $I$ it follows that $\|\hat u_K - \hat u_L\|_P = \|P_T(u_K - u_L)\|_W \le \|u_K - u_L\|_W$, and since $u_K \to u$ in $L^2_W$ as $K \to I$, Cauchy's convergence principle shows that $\hat u_K$ converges to an element $\hat u \in L^2_P$ as $K \to I$. The lemma now follows in full generality by continuity. □

Note that we have proved that $\mathcal F$ is an isometry on $H_T$, and a partial isometry on $L^2_W$.

Lemma 15.9. The integral $\int_K F(x,t)\,dP(t)\,\hat u(t)$ is in $H_T$ if $K$ is a compact interval and $\hat u \in L^2_P$, and as $K \to \mathbb R$ the integral converges in $H_T$. The limit $\mathcal F^{-1}(\hat u)$ is called the inverse transform of $\hat u$. If $u \in L^2_W$ then $\mathcal F^{-1}(\mathcal F(u)) = P_T u$. $\mathcal F^{-1}(\hat u) = 0$ if and only if $\hat u$ is orthogonal in $L^2_P$ to all generalized Fourier transforms.

Proof. If $\hat u \in L^2_P$ has compact support, then $u(x) = \langle\hat u, F^*(x,\cdot)\rangle_P$ is continuous, so $u_K \in L^2_W$ for compact subintervals $K$ of $I$, and has a transform $\hat u_K$. We have
$$\|u_K\|_W^2 = \int_K u^*(x)W(x)\int_{-\infty}^{\infty} F(x,t)\,dP(t)\,\hat u(t)\,dx.$$
Considered as a double integral this is absolutely convergent, so changing the order of integration we obtain
$$\|u_K\|_W^2 = \int_{-\infty}^{\infty}\Bigl(\int_K F^*(x,t)W(x)u(x)\,dx\Bigr)^{*} dP(t)\,\hat u(t) = \langle\hat u, \hat u_K\rangle_P \le \|\hat u\|_P\|\hat u_K\|_P \le \|\hat u\|_P\|u_K\|_W,$$
according to Lemma 15.8. Hence $\|u_K\|_W \le \|\hat u\|_P$, so $u \in L^2_W$, and $\|u\|_W \le \|\hat u\|_P$. If now $\hat u \in L^2_P$ is arbitrary, this inequality shows (like


in the proof of Lemma 15.8) that $\int_K F(x,t)\,dP(t)\,\hat u(t)$ converges in $L^2_W$ as $K \to \mathbb R$ through compact intervals; call the limit $u_1$. If $v \in L^2_W$, $\hat v$ is its generalized Fourier transform, $K$ is a compact interval, and $L$ a compact subinterval of $I$, we have
$$\int_K\Bigl(\int_L F^*(x,t)W(x)v(x)\,dx\Bigr)^{*} dP(t)\,\hat u(t) = \int_L v^*(x)W(x)\int_K F(x,t)\,dP(t)\,\hat u(t)\,dx$$
by absolute convergence. Letting $L \to I$ and $K \to \mathbb R$ we obtain $\langle\hat u, \hat v\rangle_P = \langle u_1, v\rangle_W$. If $\hat u$ is the transform of $u$, then by Lemma 15.8 $u_1 - u$ is orthogonal to $H_T$, so $u_1 = P_T u$. Similarly, $u_1 = 0$ precisely if $\hat u$ is orthogonal to all transforms. □

We have shown the inverse transform to be the adjoint of the transform as an operator from $L^2_W$ into $L^2_P$. The basic remaining difficulty is to prove that the transform is surjective, i.e., according to Lemma 15.9, that the inverse transform is injective. The following lemma will enable us to prove this.

Lemma 15.10. The transform of $R_\lambda u$ is $\hat u(t)/(t-\lambda)$.

Proof. By Lemma 15.8, $\langle E_t u, v\rangle_W = \int_{-\infty}^{t} \hat v^*\,dP\,\hat u$, so that
$$\langle R_\lambda u, v\rangle_W = \int_{-\infty}^{\infty}\frac{d\langle E_t u, v\rangle}{t-\lambda} = \int_{-\infty}^{\infty}\frac{\hat v^*(t)\,dP(t)\,\hat u(t)}{t-\lambda} = \langle\hat u(t)/(t-\lambda), \hat v(t)\rangle_P.$$
By properties of the resolvent
$$\|R_\lambda u\|^2 = \frac{1}{2i\operatorname{Im}\lambda}\langle R_\lambda u - R_{\overline\lambda} u, u\rangle_W = \int_{-\infty}^{\infty}\frac{d\langle E_t u, u\rangle_W}{|t-\lambda|^2} = \|\hat u(t)/(t-\lambda)\|_P^2.$$
Setting $v = R_\lambda u$ and using Lemma 15.8, it therefore follows that
$$\|\hat u(t)/(t-\lambda)\|_P^2 = \langle\hat u(t)/(t-\lambda), \mathcal F(R_\lambda u)\rangle_P = \|\mathcal F(R_\lambda u)\|_P^2.$$
It follows that $\|\hat u(t)/(t-\lambda) - \mathcal F(R_\lambda u)\|_P = 0$, which was to be proved. □

Lemma 15.11. The generalized Fourier transform is unitary from $H_T$ to $L^2_P$ and the inverse transform is the inverse of this map.


Proof. According to Lemma 15.9 we need only show that if $\hat u \in L^2_P$ has inverse transform $0$, then $\hat u = 0$. Now, according to Lemma 15.10, $\mathcal F(v)(t)/(t-\lambda)$ is a transform for all $v \in L^2_W$ and non-real $\lambda$. Thus we have $\langle\hat u(t)/(t-\lambda), \mathcal F(v)(t)\rangle_P = 0$ for all non-real $\lambda$ if $\hat u$ is orthogonal to all transforms. But we can view this scalar product as the Stieltjes transform of the measure $\int_{-\infty}^{t} \mathcal F(v)^*\,dP\,\hat u$, so applying the inversion formula Lemma 6.5 we have $\int_K \mathcal F(v)^*\,dP\,\hat u = 0$ for all compact intervals $K$ and all $v \in L^2_W$. Thus the cutoff of $\hat u$, which equals $\hat u$ in $K$ and $0$ outside, is also orthogonal to all transforms, i.e., has inverse transform $0$ according to Lemma 15.9. It follows that
$$\int_K F(x,t)\,dP(t)\,\hat u(t)$$
is the zero element of $L^2_W$ for any compact interval $K$. Now multiply this from the left by $F^*(x,s)W(x)$ and integrate with respect to $x$ over a large compact subinterval $L \subset I$. We obtain
$$\int_K B(s,t)\,dP(t)\,\hat u(t) = 0 \quad\text{for every } s,$$
where $B(s,t) = \int_L F^*(x,s)W(x)F(x,t)\,dx$. Thus $B(s,t)\,dP(t)\,\hat u(t)$ is the zero measure for all $s$. By Assumption 13.7 the matrix $B(s,t)$ is invertible for $s = t$, so by continuity it is, given $s$, invertible for $t$ sufficiently close to $s$. Thus, varying $s$, it follows that $dP(t)\,\hat u(t)$ is the zero measure in a neighborhood of every point. But this means that $\hat u = 0$ as an element of $L^2_P$. □

Lemma 15.12. If $(u,v) \in T$, then $\hat v(t) = t\hat u(t)$. Conversely, if $\hat u$ and $t\hat u(t)$ are in $L^2_P$, then $\mathcal F^{-1}(\hat u) \in D(T)$.

Proof. We have $(u,v) \in T$ if and only if $u = R_\lambda(v - \lambda u)$, which holds if and only if $\hat u(t) = (\hat v(t) - \lambda\hat u(t))/(t-\lambda)$, i.e., $\hat v(t) = t\hat u(t)$, according to Lemmas 15.10 and 15.11. □

This completes the proof of Theorem 15.5. We also have the following analogue of Corollary 14.5.

Theorem 15.13. Suppose $u \in D(T)$. Then the inverse transform $\langle\hat u, F^*(x,\cdot)\rangle_P$ converges locally uniformly to $u(x)$.

Proof. The proof is very similar to that of Corollary 14.5. Put $v = (\tilde T - i)u$, so that $v \in H_T$ and $u = R_i v$. Let $K$ be a compact interval, and put $u_K(x) = \int_K F(x,t)\,dP(t)\,\hat u(t) = \mathcal F^{-1}(\chi\hat u)(x)$, where $\chi$ is the characteristic function of $K$. Define $v_K$ similarly. Then by Lemma 15.10
$$R_i v_K = \mathcal F^{-1}\Bigl(\frac{\chi(t)\hat v(t)}{t-i}\Bigr) = \mathcal F^{-1}(\chi\hat u) = u_K.$$


Since $v_K \to v$ in $L^2_W$ as $K \to \mathbb R$, it follows from Theorem 14.2 that $u_K \to u$ in $C(L)$ as $K \to \mathbb R$, for any compact subinterval $L$ of $I$. □

Example 15.14. Let us interpret Theorem 15.5 for the case of the operator of Example 4.6, Green's function of which is given in Example 8.8. Comparing (8.2) with (15.1), we see that $M(\lambda) = i/2$ for $\lambda$ in the upper half plane. By Lemma 6.5 the corresponding spectral measure is $P(t) = \lim_{\varepsilon\to 0}\frac1\pi\int_0^t \operatorname{Im} M(\mu + i\varepsilon)\,d\mu = \frac{t}{2\pi}$. This means that if $f \in L^2(\mathbb R)$, then as $a, b \to \infty$ the integral $\int_{-a}^b f(x)e^{-ixt}\,dx$ converges in the sense of $L^2(\mathbb R)$ to a function $\hat f \in L^2(\mathbb R)$. Furthermore the integral $\frac{1}{2\pi}\int_{-a}^b \hat f(t)e^{ixt}\,dt$ converges in the same sense to $f$ as $a$ and $b \to \infty$. We also conclude that $\int_{-\infty}^{\infty}|f|^2 = \frac{1}{2\pi}\int_{-\infty}^{\infty}|\hat f|^2$. Finally, if $f$ is locally absolutely continuous and together with its derivative in $L^2(\mathbb R)$, then the transform of $-if'$ is $t\hat f(t)$, and conversely, if $\hat f$ and $t\hat f(t)$ are both in $L^2(\mathbb R)$, then the inverse transform of $\hat f$ is locally absolutely continuous, its derivative is in $L^2(\mathbb R)$ and is the inverse transform of $it\hat f(t)$. We also get from Theorem 15.13 that if $f$ has these properties, then the inverse transform of $\hat f$ converges absolutely and locally uniformly to $f$. Actually, it is here easy to see that the convergence is uniform on the whole axis, but nevertheless it is clear that we have retrieved all the basic properties of the classical Fourier transform.

Exercises for Chapter 15

Exercise 15.1. Use, e.g., estimates in the variation of constants formula Lemma 13.4 for $v = (\lambda-\mu)u$ to show that if all columns of $F(x,\mu)$ are in $L^2_W$, then so are those of $F(x,\lambda)$.

Exercise 15.2. Show that the two definitions of $L^2_P$ given in the text are equivalent. What needs to be proved is that any measurable $n \times 1$ matrix-valued function with finite norm can be approximated in norm by a similar function which is $C_0^\infty$.
Hint: Use a cutoff and convolution with a $C_0^\infty$-function of small support.

Exercise 15.3. In Lemma 15.9 it is claimed that for every compact interval $K$ the integral $\int_K F(x,t)\,dP(t)\,\hat u(t) \in H_T$, but this is never proved; or is it? Clarify this point!

Exercise 15.4. Consider, as in the beginning of Chapter 10, the first order system corresponding to a general Sturm–Liouville equation $-(pu')' + qu = \lambda wu$ on $[a,b)$, where $1/p$, $q$ and $w$ are integrable on any interval $[a,x]$, $x \in (a,b)$. Also assume that $p$ and $q$ are real-valued functions and $w \ge 0$ and not a.e. equal to $0$. Consider a selfadjoint realization given by separated


boundary conditions (cf. Chapters 10 and 13). This will be a condition at $a$, and if the boundary form does not vanish at $b$, also a condition at $b$. Choose the point $c = 0$ and the fundamental matrix $F$ such that its first column satisfies the boundary condition at $a$. Show that
$$M(\lambda) = \begin{pmatrix} m(\lambda) & \tfrac12\\[2pt] \tfrac12 & 0\end{pmatrix},$$
where the Titchmarsh–Weyl function $m(\lambda)$ is a scalar-valued Nevanlinna function.

Now write $F = \begin{pmatrix}\varphi & \theta\\ -p\varphi' & -p\theta'\end{pmatrix}$. Show that there is a scalar Green's function for the operator given by
$$g(x,y,\lambda) = \begin{cases}\varphi(x,\lambda)\psi(y,\lambda), & x < y,\\ \psi(x,\lambda)\varphi(y,\lambda), & y < x,\end{cases}$$
where $\psi(x,\lambda) = \theta(x,\lambda) + m(\lambda)\varphi(x,\lambda)$, with the property that the solution of $-(pu')' + qu = \lambda wu + wv$ which is in $L^2_w$ and satisfies the boundary conditions is given by $u(x) = R_\lambda v(x) = \int_a^b g(x,y,\lambda)v(y)\,dy$. Show also that the spectral matrix is $P = \begin{pmatrix}\rho & 0\\ 0 & 0\end{pmatrix}$, where the spectral function $\rho$ is the function in the representation (6.1) for the function $m(\lambda)$, and that
$$\operatorname{Im} m(\lambda) = \operatorname{Im}\lambda\int_a^b |\psi(x,\lambda)|^2\,dx.$$
Finally show that the generalized Fourier transform of $\psi$ is always given by $\hat\psi(t,\lambda) = 1/(t-\lambda)$.

Thus the spectral theory for the general Sturm–Liouville equation has precisely the same basic features as for the simple case treated in Chapter 11.
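The classical transform recovered in Example 15.14 can be checked numerically. The following Python sketch (an illustration added here, not part of the text; the grids, the test function $f(x) = e^{-x^2/2}$, and the tolerances are ad hoc assumptions) approximates $\hat f(t) = \int f(x)e^{-ixt}\,dx$ by a Riemann sum and verifies the Parseval relation and the inversion formula of the example.

```python
import numpy as np

# f(x) = exp(-x^2/2); with the kernel e^{-ixt} of Example 15.14 its
# transform is fhat(t) = sqrt(2*pi) * exp(-t^2/2).
x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]
f = np.exp(-x**2 / 2)

t = np.linspace(-6.0, 6.0, 601)
dt = t[1] - t[0]
# fhat(t) ~ sum_k f(x_k) e^{-i x_k t} dx  (Riemann sum approximation)
fhat = (f[None, :] * np.exp(-1j * np.outer(t, x))).sum(axis=1) * dx

# Parseval: int |f|^2 dx = (1/(2*pi)) int |fhat|^2 dt; both sides ~ sqrt(pi)
lhs = np.sum(np.abs(f)**2) * dx
rhs = np.sum(np.abs(fhat)**2) * dt / (2 * np.pi)

# Inversion at x = 0: f(0) = (1/(2*pi)) int fhat(t) dt = 1
f0 = (np.sum(fhat) * dt / (2 * np.pi)).real

print(lhs, rhs, f0)
```

Since the Gaussian and its transform decay rapidly, truncating the integrals to finite intervals costs essentially nothing here; for a general $f \in L^2(\mathbb R)$ the integrals converge only in the $L^2$ sense, exactly as the example states.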

APPENDIX A

Functional analysis

In this appendix we will give the proofs of some standard theorems from functional analysis. They are all valid in more general situations than stated here. As is usual, our proofs will be based upon the following important theorem. We have stated it for a Banach space, but the proof would be the same in any complete metric space.

Theorem A.1 (Baire). Suppose $B$ is a Banach space and $F_1, F_2, \dots$ a sequence of closed subsets of $B$. If all $F_n$ fail to have interior points, so does $\bigcup_{n=1}^\infty F_n$. In particular, the union is a proper subset of $B$.

Proof. Let $\overline B_0 = \{x \in B \mid \|x - x_0\| \le R_0\}$ be an arbitrary closed ball. We must show that it cannot be contained in $\bigcup_{n=1}^\infty F_n$. We do this by first selecting a decreasing sequence of closed balls $\overline B_0 \supset \overline B_1 \supset \overline B_2 \supset \cdots$ such that the radii $R_n \to 0$ and $\overline B_n \cap F_n = \emptyset$ for each $n$. For if we already have chosen $\overline B_0, \dots, \overline B_n$ we can find a point $x_{n+1} \in B_n$ (the interior of $\overline B_n$) which is not contained in $F_{n+1}$, since $F_{n+1}$ has no interior points. Since $F_{n+1}$ is closed we can choose a closed ball $\overline B_{n+1} \subset B_n$, centered at $x_{n+1}$, which does not intersect $F_{n+1}$. If we also make sure that the radius $R_{n+1}$ is at most half the radius $R_n$ of $\overline B_n$, it follows by induction that we may find a sequence of balls as required.

For $k > n$ we have $x_k \in \overline B_n$, so that $\|x_k - x_n\| \le R_n \to 0$; thus $x_1, x_2, \dots$ is a Cauchy sequence, and converges to a limit $x$. We have $x \in \overline B_n$ for every $n$, since $x_k \in \overline B_n$ for $k > n$ and $\overline B_n$ is closed. Thus $x$ is not contained in any $F_n$. $\overline B_0$ being arbitrary, it follows that no ball is contained in $\bigcup_{n=1}^\infty F_n$, which therefore has no interior points, and the proof is complete. □

A set which is a subset of the union of countably many closed sets without interior points is said to be of the first category. More picturesquely, such a set is said to be meager. Meager subsets of $\mathbb R^n$ have many properties in common with, or analogous to, sets of Lebesgue measure zero. There is no direct connection, however, since a meager set may have positive measure, and a set of measure zero does not have to be meager. A set which is not meager is said to be of the second category, or to be non-meager (how about fat?). The basic properties of meager sets are the following.


Proposition A.2. A subset of a meager set is meager, a countable union of meager sets is meager, and no meager set has an interior point.

Proof. The first two claims are left as exercises for the reader to verify; the third claim is Baire's theorem. □

The following theorem is one of the cornerstones of functional analysis.

Theorem A.3 (Banach). Suppose $B_1$ and $B_2$ are Banach spaces and $T: B_1 \to B_2$ a bounded, injective (one-to-one) linear map. If the range of $T$ is not meager, in particular if it is all of $B_2$, then $T$ has a bounded inverse, and the range is all of $B_2$.

Proof. We denote the norm in $B_j$ by $\|\cdot\|_j$. Let $A_n = \{Tx \mid \|x\|_1 \le n\}$ be the image of the closed ball with radius $n$, centered at $0$ in $B_1$. The balls expand to all of $B_1$ as $n \to \infty$, so the range of $T$ is $\bigcup_{n=1}^\infty A_n \subset \bigcup_{n=1}^\infty \overline{A_n}$. The range not being meager, at least one $\overline{A_n}$ must have an interior point $y_0$. Thus we can find $r > 0$ so that $\{y_0 + y \mid \|y\|_2 < r\} \subset \overline{A_n}$. Since $\overline{A_n}$ is symmetric with respect to the origin, also $-y_0 + y \in \overline{A_n}$ if $\|y\|_2 < r$. Furthermore, $\overline{A_n}$ is convex, as the closure of (the linear image of) a convex set. It follows that $y = \frac12((y_0 + y) + (-y_0 + y)) \in \overline{A_n}$. Thus $0$ is an interior point of $\overline{A_n}$. Since all $A_n$ are similar ($A_n = nA_1$), $0$ is also an interior point of $\overline{A_1}$. This means that there is a number $C > 0$ such that any $y \in B_2$ for which $\|y\|_2 \le C$ is in $\overline{A_1}$. For such $y$ we may therefore find $x \in B_1$ with $\|x\|_1 \le 1$ such that $Tx$ is arbitrarily close to $y$; for example, we may find $x \in B_1$ with $\|x\|_1 \le 1$ such that $\|y - Tx\|_2 \le \frac12 C$. For arbitrary non-zero $y \in B_2$ we set $\tilde y = \frac{C}{\|y\|_2}\,y$, and then have $\|\tilde y\|_2 = C$, so we can find $\tilde x$ with $\|\tilde x\|_1 \le 1$ and $\|\tilde y - T\tilde x\|_2 \le \frac12 C$. Setting $x = \frac{\|y\|_2}{C}\,\tilde x$ we obtain
$$(A.1)\qquad \|x\|_1 \le \tfrac1C\|y\|_2 \quad\text{and}\quad \|y - Tx\|_2 \le \tfrac12\|y\|_2.$$
Thus, to any $y \in B_2$ we may find $x \in B_1$ so that (A.1) holds (for $y = 0$, take $x = 0$).

We now construct two sequences $\{x_j\}_{j=0}^\infty$ and $\{y_j\}_{j=0}^\infty$, in $B_1$ respectively $B_2$, by first setting $y_0 = y$. If $y_n$ is already defined, we use (A.1) to define $x_n$ and $y_{n+1}$ so that $\|x_n\|_1 \le \frac1C\|y_n\|_2$, $y_{n+1} = y_n - Tx_n$, and $\|y_{n+1}\|_2 \le \frac12\|y_n\|_2$. We obtain $\|y_n\|_2 \le 2^{-n}\|y\|_2$ and $\|x_n\|_1 \le \frac1C 2^{-n}\|y\|_2$ from this. Furthermore, $Tx_n = y_n - y_{n+1}$, so adding we obtain $T(\sum_{j=0}^n x_j) = y - y_{n+1} \to y$ as $n \to \infty$. But the series $\sum_{j=0}^\infty \|x_j\|_1$ converges, since it is dominated by $\frac1C\|y\|_2\sum_{j=0}^\infty 2^{-j} = \frac2C\|y\|_2$. Since $B_1$ is complete, the series $\sum_{j=0}^\infty x_j$ therefore converges to some $x \in B_1$ satisfying $\|x\|_1 \le \frac2C\|y\|_2$, and since $T$ is continuous we also obtain $Tx = y$. In other words, we can solve $Tx = y$ for any $y \in B_2$, so the inverse of $T$ is defined everywhere, and the inverse is bounded by $\frac2C$, so it is continuous. The proof is complete. □

In these notes we do not actually use Banach's theorem, but only the following simple corollary (which is actually equivalent to Banach's theorem). Recall that a linear map $T: B_1 \to B_2$ is called closed if the graph $\{(u, Tu) \mid u \in D(T)\}$ is a closed subset of $B_1 \oplus B_2$; equivalently, if $u_j \to u$ in $B_1$ and $Tu_j \to v$ in $B_2$ implies that $u \in D(T)$ and $Tu = v$.

Corollary A.4 (Closed graph theorem). Suppose $T$ is a closed linear operator $T: B_1 \to B_2$, defined on all of $B_1$. Then $T$ is bounded.

Proof. The graph $\{(u, Tu) \mid u \in B_1\}$ is by assumption a Banach space with norm $\|(u, Tu)\| = \|u\|_1 + \|Tu\|_2$, where $\|\cdot\|_j$ denotes the norm of $B_j$. The map $(u, Tu) \mapsto u$ is linear, defined on this Banach space, with range equal to $B_1$, and it has norm $\le 1$. It is obviously injective, so by Banach's theorem the inverse is bounded, i.e., there is a constant $C$ so that $\|(u, Tu)\| \le C\|u\|_1$. Hence also $\|Tu\|_2 \le C\|u\|_1$, so that $T$ is bounded. □

In Chapter 3 we used the Banach–Steinhaus theorem, Theorem 3.10. Since no extra effort is involved, we prove the following slightly more general theorem.

Theorem A.5 (Banach–Steinhaus; uniform boundedness principle). Suppose $B$ is a Banach space, $L$ a normed linear space, and $M$ a subset of the set $\mathcal L(B, L)$ of all bounded linear maps from $B$ into $L$. Suppose $M$ is pointwise bounded, i.e., for each $x \in B$ there exists a constant $C_x$ such that $\|Tx\|_L \le C_x$ for every $T \in M$. Then $M$ is uniformly bounded, i.e., there is a constant $C$ such that $\|Tx\|_L \le C\|x\|_B$ for all $x \in B$ and all $T \in M$.

Proof. Put $F_n = \{x \in B \mid \|Tx\|_L \le n \text{ for all } T \in M\}$. Then $F_n$ is closed, as the intersection of the closed sets which are inverse images of the closed interval $[0, n]$ under the continuous functions $B \ni x \mapsto \|Tx\|_L \in \mathbb R$, $T \in M$. The assumption means that $\bigcup_{n=1}^\infty F_n = B$. By Baire's theorem at least one $F_n$ must have an interior point. Since $F_n$ is convex (if $x, y \in F_n$ and $0 \le t \le 1$, then $\|tTx + (1-t)Ty\|_L \le t\|Tx\|_L + (1-t)\|Ty\|_L \le n$) and symmetric with respect to the origin, it follows, as in the proof of Banach's theorem, that $0$ is an interior point of $F_n$. Thus, for some $r > 0$ we have $\|Tx\|_L \le n$ for all $T \in M$ if $\|x\|_B \le r$. By homogeneity it follows that $\|Tx\|_L \le \frac{n}{r}\|x\|_B$ for all $T \in M$ and $x \in B$. □
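The completeness of $B$ is essential in Theorem A.5. On the normed space of finitely supported sequences with the sup norm — which is not complete — a pointwise bounded family need not be uniformly bounded. The following small Python illustration (added here, not part of the text; the helper `T` and the sample sequence are ad hoc choices) exhibits the standard counterexample $T_n x = n\,x_n$.

```python
# On the normed space of finitely supported real sequences with the sup
# norm (which is NOT complete), the linear functionals T_n(x) = n * x_n
# are pointwise bounded -- for each fixed x only finitely many are
# non-zero -- yet ||T_n|| = n, so no uniform bound exists.  This shows
# why Theorem A.5 needs completeness of B.

def T(n, x):
    """Apply T_n to a finitely supported sequence, given as a list."""
    return n * x[n] if n < len(x) else 0.0

x = [1.0, -2.0, 0.5]                     # one fixed element of the space
pointwise_bound = max(abs(T(n, x)) for n in range(100))
print(pointwise_bound)                   # 2.0: sup over n is finite for this x

# Norm of T_n, witnessed on the unit vector e_n:
norms = [abs(T(n, [0.0] * n + [1.0])) for n in range(1, 6)]
print(norms)                             # 1.0, 2.0, ..., 5.0: unbounded in n
```

Baire's theorem fails on this space precisely because it is not complete, so the sets $F_n$ of the proof above can all be without interior points while still covering the space.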

APPENDIX B

Stieltjes integrals

The Riemann–Stieltjes integral is a simple generalization of the (one-dimensional) Riemann integral. To define it, let $f$ and $g$ be two functions defined on the compact interval $[a,b]$. For every partition $\Delta = \{x_j\}_{j=0}^n$ of $[a,b]$, i.e., $a = x_0 < x_1 < \cdots < x_n = b$, we let the mesh of $\Delta$ be $|\Delta| = \max(x_k - x_{k-1})$. This is the length of the longest subinterval of $[a,b]$ in the partition. We also choose from each subinterval $[x_{k-1}, x_k]$ a point $\xi_k$ and form the sum
$$s = \sum_{k=1}^n f(\xi_k)\bigl(g(x_k) - g(x_{k-1})\bigr).$$
Now suppose that $s$ tends to a limit as $|\Delta| \to 0$, independently of the partition $\Delta$ and the choice of the points $\xi_k$. The exact meaning of this is the following: there exists a number $I$ such that for every $\varepsilon > 0$ there is a $\delta > 0$ such that $|s - I| < \varepsilon$ as soon as $|\Delta| < \delta$. In this case we say that the integrand $f$ is Riemann–Stieltjes integrable with respect to the integrator $g$, and that the corresponding integral equals $I$. We denote this integral by $\int_a^b f(x)\,dg(x)$ or simply $\int_a^b f\,dg$. The choice $g(x) = x$ gives us, of course, the ordinary Riemann integral.

Proposition B.1. A function $f$ is integrable with respect to a function $g$ if and only if for every $\varepsilon > 0$ there exists a $\delta > 0$ such that for any two partitions $\Delta$ and $\Delta'$ and the corresponding sums $s$ and $s'$, we have $|s - s'| < \varepsilon$ as soon as $|\Delta|$ and $|\Delta'|$ are both $< \delta$.

This is of course a version of the Cauchy convergence principle. We leave the proof as an exercise (Exercise B.1). From the definition the following calculation rules follow immediately (Exercise B.2).
$$(1)\quad \int_a^b f_1\,dg + \int_a^b f_2\,dg = \int_a^b (f_1 + f_2)\,dg,$$
$$(2)\quad C\int_a^b f\,dg = \int_a^b Cf\,dg,$$
$$(3)\quad \int_a^b f\,dg_1 + \int_a^b f\,dg_2 = \int_a^b f\,d(g_1 + g_2),$$
$$(4)\quad C\int_a^b f\,dg = \int_a^b f\,d(Cg),$$
$$(5)\quad \int_a^b f\,dg = \int_a^d f\,dg + \int_d^b f\,dg \quad\text{for } a < d < b,$$
where $f$, $f_1$, $f_2$, $g$, $g_1$ and $g_2$ are functions, $C$ a constant, and the formulas should be interpreted to mean that if the integrals to the left of the equality sign exist, then so do the integrals to the right, and equality holds.

Proposition B.2 (Change of variables). Suppose that $h$ is continuous and increasing and $f$ is integrable with respect to $g$ over $[h(a), h(b)]$. Then the composite function $f \circ h$ is integrable with respect to $g \circ h$ over $[a,b]$ and
$$\int_{h(a)}^{h(b)} f\,dg = \int_a^b (f \circ h)\,d(g \circ h).$$

We leave the proof also of this proposition to the reader (Exercise B.3). The formula for integration by parts takes the following nicely symmetric form in the context of the Stieltjes integral.

Theorem B.3 (Integration by parts). If $f$ is integrable with respect to $g$, then $g$ is also integrable with respect to $f$ and
$$\int_a^b g\,df = f(b)g(b) - f(a)g(a) - \int_a^b f\,dg.$$

Proof. Let $a = x_0 < x_1 < \cdots < x_n = b$ be a partition $\Delta$ of $[a,b]$ and suppose $x_{k-1} \le \xi_k \le x_k$, $k = 1, \dots, n$. Set $\xi_0 = a$, $\xi_{n+1} = b$. Then $a = \xi_0 \le \xi_1 \le \cdots \le \xi_{n+1} = b$ gives a partition $\Delta'$ (one discards any $\xi_{k+1}$ which is equal to $\xi_k$) of $[a,b]$ for which $|\Delta'| \le 2|\Delta|$ (check this!). We have $\xi_k \le x_k \le \xi_{k+1}$ and
$$s = \sum_{k=1}^n g(\xi_k)\bigl(f(x_k) - f(x_{k-1})\bigr) = \sum_{k=1}^n g(\xi_k)f(x_k) - \sum_{k=0}^{n-1} g(\xi_{k+1})f(x_k)$$
$$= f(b)g(b) - f(a)g(a) - \sum_{k=0}^n f(x_k)\bigl(g(\xi_{k+1}) - g(\xi_k)\bigr).$$
If $|\Delta| \to 0$ we have $|\Delta'| \to 0$, so the last sum converges to $\int_a^b f\,dg$ (note that if $\xi_{k+1} = \xi_k$ then the corresponding term in the sum is $0$). It follows that $s$ converges to $f(b)g(b) - f(a)g(a) - \int_a^b f\,dg$, and the theorem follows. □
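The defining Riemann–Stieltjes sums and the integration by parts formula of Theorem B.3 are easy to experiment with numerically. The following Python sketch (an illustration added here, not part of the text; the helper name `stieltjes_sum` and the choices $f = \cos$, $g(x) = x^2$ and the grid are ad hoc assumptions) forms sums over a fine partition with midpoint tags and checks that $\int_a^b g\,df = f(b)g(b) - f(a)g(a) - \int_a^b f\,dg$.

```python
import numpy as np

def stieltjes_sum(f, g, grid):
    """Riemann-Stieltjes sum  sum_k f(xi_k)(g(x_k) - g(x_{k-1}))
    over the partition 'grid', with midpoint tags xi_k."""
    x = np.asarray(grid, dtype=float)
    xi = (x[:-1] + x[1:]) / 2
    return float(np.sum(f(xi) * np.diff(g(x))))

a, b = 0.0, 1.0
grid = np.linspace(a, b, 20001)   # a fine partition, |Delta| = 5e-5
f = np.cos                        # integrand
g = lambda s: s**2                # increasing integrator

lhs = stieltjes_sum(g, f, grid)                            # int_a^b g df
rhs = f(b) * g(b) - f(a) * g(a) - stieltjes_sum(f, g, grid)
print(lhs, rhs)   # the two sides of Theorem B.3 agree closely
```

Since $f$ and $g$ are smooth here, the sums converge to the integrals as the mesh tends to $0$, exactly as in the definition; the agreement of `lhs` and `rhs` is the content of Theorem B.3.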


Note that Theorem B.3 is a statement about the Riemann–Stieltjes integral; for more general (Lebesgue–Stieltjes) integrals it is not true without further assumptions about $f$ and $g$. The reason is that the Riemann–Stieltjes integrals cannot exist if $f$ and $g$ have discontinuities in common (Exercise B.4), whereas the Lebesgue–Stieltjes integrals exist as soon as $f$ and $g$ are, for example, both monotone. In such a case the integration by parts formula only holds under additional assumptions, for example if $f$ is continuous to the right and $g$ to the left in any common point of discontinuity, or if both $f$ and $g$ are normal, i.e., their values at points of discontinuity are the averages of the corresponding left and right hand limits.

So far we don't know that any function is integrable with respect to any other (except for $g(x) = x$, which is the case of the Riemann integral).

Theorem B.4. If $g$ is non-decreasing on $[a,b]$, then every continuous function $f$ is integrable with respect to $g$ and we have
$$\Bigl|\int_a^b f\,dg\Bigr| \le \max_{[a,b]}|f|\,\bigl(g(b) - g(a)\bigr).$$

Proof. Let $\Delta'$ and $\Delta''$ be partitions $a = x_0' < x_1' < \cdots < x_m' = b$ and $a = x_0'' < x_1'' < \cdots < x_n'' = b$ of $[a,b]$ and consider the corresponding Riemann–Stieltjes sums $s' = \sum_{k=1}^m f(\xi_k')(g(x_k') - g(x_{k-1}'))$ and $s'' = \sum_{k=1}^n f(\xi_k'')(g(x_k'') - g(x_{k-1}''))$. If we introduce the partition $\Delta = \Delta' \cup \Delta''$, supposing it to be $a = x_0 < x_1 < \cdots < x_p = b$, we can write
$$s' - s'' = \sum_{j=1}^p \bigl(f(\xi'_{k_j}) - f(\xi''_{q_j})\bigr)\bigl(g(x_j) - g(x_{j-1})\bigr),$$
where $k_j = k$ for all $j$ for which $[x_{j-1}, x_j] \subset [x'_{k-1}, x'_k]$ and $q_j = k$ for all $j$ for which $[x_{j-1}, x_j] \subset [x''_{k-1}, x''_k]$ (check this carefully!). Thus, for all $j$, $\xi'_{k_j}$ and $x_j$ are in the same subinterval of the partition $\Delta'$, and $\xi''_{q_j}$ and $x_j$ in the same subinterval of the partition $\Delta''$. It follows that $|\xi'_{k_j} - \xi''_{q_j}| \le |\xi'_{k_j} - x_j| + |\xi''_{q_j} - x_j| \le |\Delta'| + |\Delta''|$ for all $j$. Since $f$ is uniformly continuous on $[a,b]$, this means that given $\varepsilon > 0$, we have $|f(\xi'_{k_j}) - f(\xi''_{q_j})| \le \varepsilon$ if $|\Delta'|$ and $|\Delta''|$ are both small enough. It follows that $|s' - s''| \le \varepsilon\sum_{j=1}^p |g(x_j) - g(x_{j-1})| = \varepsilon(g(b) - g(a))$ for small enough $|\Delta'|$ and $|\Delta''|$. Thus $f$ is integrable with respect to $g$ according to Proposition B.1. We also have $|s'| \le \sum_{k=1}^m |f(\xi_k')||g(x_k') - g(x_{k-1}')| \le \max_{[a,b]}|f|\,(g(b) - g(a))$, so the proof is complete. □

As a generalization of Theorem B.4 we may of course take $g$ to be any function which is the difference of two non-decreasing functions. Such a function is called a function of bounded variation. We shall briefly discuss such functions; the main point is that they are characterized by having finite total variation.


Definition B.5. Let $f$ be a real-valued function defined on $[a,b]$. Then the total variation of $f$ over $[a,b]$ is
$$(B.1)\qquad V(f) = \sup_\Delta \sum_{k=1}^n |f(x_k) - f(x_{k-1})|,$$
the supremum taken over all partitions $\Delta = \{x_0, x_1, \dots, x_n\}$ of $[a,b]$. We have $0 \le V(f) \le +\infty$, and if $V(f)$ is finite, we say that $f$ has bounded variation on $[a,b]$.

When the interval considered is not obvious from the context, one may write the total variation of $f$ over $[a,b]$ as $V_a^b(f)$; another common notation is $\int_a^b |df|$. As we mentioned above, a function of bounded variation can also be characterized as a function which is the difference of two non-decreasing functions.

Theorem B.6.
(1) The total variation $V_a^b(f)$ is an interval additive function, i.e., if $a < x < b$ we have $V_a^x(f) + V_x^b(f) = V_a^b(f)$.
(2) A function of bounded variation on an interval $[a,b]$ may be written as the difference of two non-decreasing functions. Conversely, any such difference is of bounded variation.
(3) If $f$ is of bounded variation on $[a,b]$, then there are non-decreasing functions $P$ and $N$, such that $f(x) = f(a) + P(x) - N(x)$, called the positive and negative variation functions of $f$ on $[a,b]$, with the following property: for any pair of non-decreasing functions $u$, $v$ for which $f = u - v$ we have $u(x) \ge u(a) + P(x)$ and $v(x) \ge v(a) + N(x)$ for $a \le x \le b$.

Proof. It is clear that if $a < x < b$ and $\Delta$, $\Delta'$ are partitions of $[a,x]$ respectively $[x,b]$, then $\Delta \cup \Delta'$ is a partition of $[a,b]$; the corresponding sum is therefore $\le V_a^b(f)$. Taking the supremum over $\Delta$ and then $\Delta'$ it follows that $V_a^x(f) + V_x^b(f) \le V_a^b(f)$. On the other hand, in calculating $V_a^b(f)$ we may restrict ourselves to partitions $\Delta$ containing $x$, since adding new points can only increase the sum (B.1). If $\Delta = \{x_0, \dots, x_n\}$ and $x = x_p$ we have $\sum_{k=1}^p |f(x_k) - f(x_{k-1})| \le V_a^x(f)$ respectively $\sum_{k=p+1}^n |f(x_k) - f(x_{k-1})| \le V_x^b(f)$. Taking the supremum over all $\Delta$ we obtain $V_a^b(f) \le V_a^x(f) + V_x^b(f)$. The interval additivity of the total variation follows.

Setting $T(x) = V_a^x(f)$, the function $T$ is finite in $[a,b]$; it is called the total variation function of $f$ over $[a,b]$. Since by interval additivity $T(y) - T(x) = V_x^y(f) \ge |f(y) - f(x)| \ge \pm(f(y) - f(x))$ if $a \le x \le y \le b$, it also follows that $T$ is non-decreasing, as are $P = \frac12(T + f - f(a))$ and $N = \frac12(T - f + f(a))$. But then $f = (f(a) + P) - N$ is a splitting of $f$ into a difference of non-decreasing functions. Note also that $T = P + N$. Conversely, if $u$ and $v$ are non-decreasing functions on $[a,b]$


and $\{x_0, \dots, x_n\}$ a partition of $[a,x]$, $a < x \le b$, then
$$\sum_{k=1}^n \bigl|(u(x_k) - v(x_k)) - (u(x_{k-1}) - v(x_{k-1}))\bigr| \le \sum_{k=1}^n |u(x_k) - u(x_{k-1})| + \sum_{k=1}^n |v(x_k) - v(x_{k-1})|$$
$$= u(x) - u(a) + v(x) - v(a),$$
so that $V_a^x(u - v) \le u(x) + v(x) - (u(a) + v(a))$. In particular, for $x = b$ this shows that $u - v$ is of bounded variation on $[a,b]$. The inequality also shows that if $f = u - v$, then $P(x) = \frac12(T(x) + f(x) - f(a)) \le \frac12(u(x) - u(a) + v(x) - v(a) + f(x) - f(a)) = u(x) - u(a)$. Similarly one shows that $N(x) \le v(x) - v(a)$, so that the proof is complete. □

We remark that a complex-valued function (of a real variable) is said to be of bounded variation if its real and imaginary parts are. If $T_r$ and $T_i$ are the total variation functions of the real and imaginary parts of $f$, then one defines the total variation function of $f$ to be $T = \sqrt{T_r^2 + T_i^2}$ (sometimes the definition $T = T_r + T_i$ is used). One may also use Definition B.5 for complex-valued functions, and then it is easily seen that $\sqrt{T_r^2 + T_i^2} \le T \le T_r + T_i$.

Since a monotone function can have only jump discontinuities, and at most countably many of them, also functions of bounded variation can have at most countably many discontinuities, all of them jump discontinuities. Moreover, it is easy to see that the positive and negative variation functions (and therefore the total variation function) are continuous wherever $f$ is (Exercise B.7).

Corollary B.7. If $g$ is of bounded variation on $[a,b]$, then every continuous function $f$ is integrable with respect to $g$ and we have
$$(B.2)\qquad \Bigl|\int_a^b f\,dg\Bigr| \le \max_{[a,b]}|f|\,V_a^b(g).$$

Proof. The integrability statement follows immediately from Theorem B.4 on writing g as the difference of non-decreasing functions. To obtain the inequality, consider a Riemann-Stieltjes sum

s = ∑_{k=1}^n f(ξ_k)(g(x_k) − g(x_{k−1})).

We obtain

|s| ≤ ∑_{k=1}^n |f(ξ_k)||g(x_k) − g(x_{k−1})| ≤ max_{[a,b]}|f| ∑_{k=1}^n |g(x_k) − g(x_{k−1})| ≤ max_{[a,b]}|f| · V_a^b(g).

Since this inequality holds for all Riemann-Stieltjes sums, it also holds for their limit, which is ∫_a^b f dg. □

In some cases a Stieltjes integral reduces to an ordinary Lebesgue integral.

Theorem B.8. Suppose f is continuous and g absolutely continuous on [a, b]. Then fg′ ∈ L¹(a, b) and ∫_a^b f dg = ∫_a^b f(x)g′(x) dx, where the second integral is a Lebesgue integral.

The proof of Theorem B.8 is left as an exercise (Exercise B.8).
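The bound (B.2) and the reduction in Theorem B.8 are easy to check numerically. The sketch below is my own illustration, with hypothetical choices f(x) = cos x and g(x) = x² on [0, 1] (so g is absolutely continuous and non-decreasing, with V_0^1(g) = g(1) − g(0) = 1): it forms a Riemann-Stieltjes sum for ∫_0^1 f dg, compares it with the Lebesgue integral ∫_0^1 f(x)g′(x) dx, and checks the estimate against the total variation of g.

```python
import math

def rs_sum(f, g, a, b, n):
    """Riemann-Stieltjes sum of f dg over [a, b], n equal subintervals,
    f evaluated at the left endpoint of each subinterval."""
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    return sum(f(xs[k - 1]) * (g(xs[k]) - g(xs[k - 1])) for k in range(1, n + 1))

def total_variation(g, a, b, n):
    """Variation of g over a partition with n equal subintervals; for g of
    bounded variation this increases to V_a^b(g) as the partition refines."""
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    return sum(abs(g(xs[k]) - g(xs[k - 1])) for k in range(1, n + 1))

f = math.cos
g = lambda x: x * x  # absolutely continuous, g'(x) = 2x

s = rs_sum(f, g, 0.0, 1.0, 100000)
exact = 2 * (math.sin(1) + math.cos(1) - 1)  # ∫_0^1 cos(x)·2x dx, by parts

print(s, exact)  # the two agree (Theorem B.8)
# bound (B.2): |s| ≤ max|cos| · V_0^1(g) = 1 · 1
print(abs(s) <= total_variation(g, 0.0, 1.0, 1000))
```

Since g is non-decreasing here, the variation sum telescopes to g(1) − g(0) for every partition; for an oscillating g the partition would have to be refined before the sum approaches V_a^b(g).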


Exercises for Appendix B

Exercise B.1. Prove Proposition B.1.

Exercise B.2. Prove the calculation rules (1)–(5).

Exercise B.3. Prove Proposition B.2.

Exercise B.4. Show that if f and g have a common point of discontinuity in [a, b], then f is not Riemann-Stieltjes integrable with respect to g over [a, b].

Exercise B.5. Show that if f is absolutely continuous on [a, b], then f is of bounded variation on [a, b], and V_a^b(f) = ∫_a^b |f′|.
Hint: First show V_a^b(f) ≥ ∫_a^b |f′|. To show the other direction, write (B.1) in the form ∫_a^b φf′ for a step function φ and use Hölder's inequality.

Exercise B.6. Show that the set of all functions of bounded variation on an interval [a, b] is made into a normed linear space by setting ‖f‖ = |f(a)| + V_a^b(f). Convergence in this norm is called convergence in variation. Show that convergence in variation implies uniform convergence, and that the normed space just introduced is complete (any Cauchy sequence of functions in the space converges in variation to a function of bounded variation).

Exercise B.7. Show that a monotone function can have at most countably many discontinuities, all of them jump discontinuities. Also show that if a function of bounded variation is continuous to the left (right) at a point, then so are its positive and negative variation functions, and that only if the function jumps up (down) will the positive (negative) variation function have a jump.
Hint: How many jumps of size > 1/j can there be?

Exercise B.8. Prove Theorem B.8. Also show that if g is absolutely continuous on [a, b], then any Riemann integrable f is integrable with respect to g and the same formula holds.
Hint: ∑ f(ξ_k)(g(x_k) − g(x_{k−1})) = ∫ φg′, where φ is a step function converging to f.

Exercise B.9. Suppose f, g are continuous and ρ of bounded variation in (a, b). Put σ(t) = ∫_c^t f(s) dρ(s) for some c ∈ (a, b). Show that

∫_a^b g(t) dσ(t) = ∫_a^b g(t)f(t) dρ(t).

Hint: Integrate both sides by parts, first replacing (a, b) by an arbitrary compact subinterval.
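The substitution rule of Exercise B.9 can also be sanity-checked numerically (this is not a proof, just an illustration under hypothetical smooth choices of my own: f(s) = s, ρ(s) = s², g(t) = cos t on [0, 1] with c = 0, so that σ(t) = ∫_0^t s · 2s ds = 2t³/3). Both sides then approximate the same integral ∫_0^1 2t² cos t dt.

```python
import math

f = lambda s: s
rho = lambda s: s * s
g = math.cos
sigma = lambda t: 2 * t ** 3 / 3  # σ(t) = ∫_0^t f dρ, computed in closed form

a, b, n = 0.0, 1.0, 20000
xs = [a + (b - a) * k / n for k in range(n + 1)]

# Riemann-Stieltjes sums for the two sides of the identity in Exercise B.9
lhs = sum(g(xs[k - 1]) * (sigma(xs[k]) - sigma(xs[k - 1])) for k in range(1, n + 1))
rhs = sum(g(xs[k - 1]) * f(xs[k - 1]) * (rho(xs[k]) - rho(xs[k - 1]))
          for k in range(1, n + 1))

print(lhs, rhs)  # both sums agree up to discretization error
```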

APPENDIX C

Linear first order systems

In this appendix we will prove some standard results about linear first order systems of differential equations which are used in the text. We will prove no more than we actually need, although the theorems have easy generalizations to non-linear equations, more complicated parameter dependence, etc. The first result is the standard existence and uniqueness theorem, Theorem 13.1, which also implies Theorem 10.1.

Theorem. Suppose A is an n × n matrix-valued function with locally integrable entries in an interval I, and that B is an n × 1 matrix-valued function, locally integrable in I. Assume further that c ∈ I and C is an n × 1 matrix. Then the initial value problem

(C.1)    u′ = Au + B in I,    u(c) = C,

has a unique n × 1 matrix-valued solution u with locally absolutely continuous entries defined in I.

Corollaries 13.2 and 10.2 are immediate consequences of the theorem.

Corollary. Let A and I be as in the previous theorem. Then the set of solutions to u′ = Au in I is an n-dimensional linear space.

Proof. It is clear that any linear combination of solutions is also a solution, so the set of solutions is a linear space. We must show that it has dimension n. Let u_k solve the initial value problem with u_k(c) equal to the k-th column of the n × n unit matrix. If u is any solution of the equation, and the components of u(c) are x₁, ..., xₙ, then the function x₁u₁ + ··· + xₙuₙ is also a solution with the same initial data. It therefore coincides with u, and it is clear that no other linear combination of u₁, ..., uₙ has the same initial data as u. It follows that u₁, ..., uₙ is a basis for the space of solutions, which therefore is n-dimensional. □

Finally we shall prove Theorem 15.1.

Theorem. A solution u(x, λ) of Ju′ + Qu = λWu with initial data independent of λ is an entire function of λ, locally uniformly with respect to x.
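The corollary can be made concrete with a small numerical sketch (my own illustration, not from the text): for a constant 2 × 2 matrix A we integrate u′ = Au with the two columns of the identity matrix as initial data, obtaining basis solutions u₁, u₂, and check that the solution with initial data C = (x₁, x₂)ᵀ coincides with x₁u₁ + x₂u₂.

```python
def rk4_step(A, u, h):
    """One classical Runge-Kutta step for u' = A u (A a constant 2×2 matrix)."""
    mul = lambda v: [A[0][0]*v[0] + A[0][1]*v[1], A[1][0]*v[0] + A[1][1]*v[1]]
    k1 = mul(u)
    k2 = mul([u[i] + h/2*k1[i] for i in range(2)])
    k3 = mul([u[i] + h/2*k2[i] for i in range(2)])
    k4 = mul([u[i] + h*k3[i] for i in range(2)])
    return [u[i] + h/6*(k1[i] + 2*k2[i] + 2*k3[i] + k4[i]) for i in range(2)]

def solve(A, C, x0, x1, n=1000):
    """Approximate u(x1) for u' = A u, u(x0) = C."""
    u, h = list(C), (x1 - x0)/n
    for _ in range(n):
        u = rk4_step(A, u, h)
    return u

A = [[0.0, 1.0], [-1.0, 0.0]]        # y'' = -y written as a first order system
u1 = solve(A, [1.0, 0.0], 0.0, 1.0)  # basis solution: first unit vector at c = 0
u2 = solve(A, [0.0, 1.0], 0.0, 1.0)  # basis solution: second unit vector
x1_, x2_ = 2.0, -3.0
u = solve(A, [x1_, x2_], 0.0, 1.0)   # solution with initial data C = (2, -3)

# u coincides with x1·u1 + x2·u2, as in the proof of the corollary
combo = [x1_*u1[i] + x2_*u2[i] for i in range(2)]
print(u, combo)
```

For this A the exact basis solution u₁ is (cos x, −sin x), so the scheme can also be checked against the known solution.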


If we integrate the differential equation in (C.1) from c to x, using the initial data, we get the integral equation

(C.2)    u(x) = H(x) + ∫_c^x Au,

where H(x) = C + ∫_c^x B. Conversely, if u is continuous and solves (C.2), then u has initial data H(c) = C and is locally absolutely continuous (being an integral function). Differentiation gives u′ = Au + B, so that the initial value problem is equivalent to the integral equation (C.2). In the case of Theorem 15.1, we put A = J⁻¹(λW − Q) and B = 0 to get an equation of the form (C.1). We therefore need to show the following theorems.

Theorem C.1. Suppose A has locally integrable, and H locally absolutely continuous, elements. Then the integral equation (C.2) has a unique, locally absolutely continuous solution.

Theorem C.2. Suppose that A depends analytically on a parameter λ, in the sense that there is a matrix A′(x, λ) which is locally integrable with respect to x, and such that ∫_J |(1/h)(A(x, λ+h) − A(x, λ)) − A′(x, λ)| dx → 0 as h → 0, for all compact subintervals J of I, and all λ in some open set Ω ⊂ C. Then the solution u(x, λ) of (C.2) is analytic for λ ∈ Ω, locally uniformly in x.

Proof of Theorem C.1. We will find a series expansion for the solution. To do this, we set u₀ = H, and if u_k is already defined, we set u_{k+1}(x) = ∫_c^x Au_k. It is then clear that u_k is defined inductively for k = 0, 1, ..., and all u_k are (absolutely) continuous. I claim that

sup_{[c,x]}|u_k| ≤ sup_{[c,x]}|H| · (1/k!) (∫_c^x |A|)^k    for k = 0, 1, ...,

for x > c, and a similar inequality with c and x interchanged for x < c. Here |·| denotes a norm on n-vectors, and also the corresponding subordinate matrix norm (so that |Au| ≤ |A||u|). Indeed, the inequality is trivial for k = 0, and supposing it valid for k, we obtain

|u_{k+1}(x)| ≤ ∫_c^x |A||u_k| ≤ sup_{[c,x]}|H| · (1/k!) ∫_c^x |A(t)| (∫_c^t |A|)^k dt
    = sup_{[c,x]}|H| · (1/(k+1)!) (∫_c^x |A|)^{k+1}

for c < x, and a similar inequality for x < c. It follows that the series u = ∑_{k=0}^∞ u_k is absolutely and uniformly convergent on any compact


subinterval of I. Therefore u is continuous, and

u(x) = ∑_{k=0}^∞ u_k(x) = H(x) + ∑_{k=0}^∞ ∫_c^x Au_k = H(x) + ∫_c^x A ∑_{k=0}^∞ u_k = H(x) + ∫_c^x Au.

Thus (C.2) has a solution. To prove the uniqueness, we need the following lemma.

Lemma C.3 (Gronwall). Suppose f ∈ C(I) is real-valued, h is a non-negative constant, and g is a locally integrable and non-negative function. Suppose that 0 ≤ f(x) ≤ h + |∫_c^x gf| for x ∈ I. Then f(x) ≤ h exp(|∫_c^x g|) for x ∈ I.

The uniqueness of the solution of (C.2) follows directly from this. For suppose v is the difference of two solutions. Then v(x) = ∫_c^x Av, so setting f = |v| and g = |A| we obtain 0 ≤ f(x) ≤ |∫_c^x gf|. Hence f ≡ 0 by Lemma C.3, and thus v ≡ 0, so that (C.2) has at most one solution. □

It remains to prove the lemma.

Proof of Lemma C.3. We will prove the lemma for c < x, leaving the other case as an exercise for the reader. Set F(x) = h + ∫_c^x gf. Then f ≤ F and F′ = gf, so that F′ ≤ gF. Multiplying by the integrating factor exp(−∫_c^x g) we get d/dx (F(x) exp(−∫_c^x g)) ≤ 0, so that F(x) exp(−∫_c^x g) is non-increasing. Thus F(x) exp(−∫_c^x g) ≤ F(c) = h for x ≥ c. We obtain f(x) ≤ F(x) ≤ h exp(∫_c^x g), which was to be proved. □

Proof of Theorem C.2. It is clear from their definitions that the functions u_k in the proof of Theorem C.1 are analytic in Ω as functions of λ, locally uniformly in x (this is a trivial induction). But the solution u is the locally uniform limit, in x and λ, of the partial sums ∑_{k=0}^j u_k. Since uniform limits of analytic functions are analytic, we are done. □
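The series construction in the proof of Theorem C.1 is Picard iteration, and it can be run numerically. The sketch below is my own illustration under hypothetical choices (the scalar case u′ = au with a = 1.5, u(c) = 1 on [0, 1], so H ≡ 1 and A ≡ a): the iterates u_k are then exactly the Taylor terms (a(x − c))^k/k!, matching the factorial bound in the proof, and the partial sum converges to the exact solution e^{a(x−c)}.

```python
import math

def picard(a, c, x_end, n_steps=2000, n_iter=25):
    """Successive approximations u_0 = H ≡ 1, u_{k+1}(x) = ∫_c^x a·u_k(t) dt
    for u' = a u, u(c) = 1; returns the partial sum ∑ u_k on a grid."""
    h = (x_end - c) / n_steps
    u_k = [1.0] * (n_steps + 1)          # u_0 = H ≡ 1
    total = list(u_k)                    # running partial sum ∑ u_k
    for _ in range(n_iter):
        nxt = [0.0] * (n_steps + 1)
        for i in range(1, n_steps + 1):  # cumulative trapezoidal integral of a·u_k
            nxt[i] = nxt[i - 1] + h * a * (u_k[i - 1] + u_k[i]) / 2
        u_k = nxt
        total = [t + v for t, v in zip(total, u_k)]
    return total

a, c, x_end = 1.5, 0.0, 1.0
u = picard(a, c, x_end)
print(u[-1], math.exp(a * (x_end - c)))  # partial sum vs. the exact solution
```

The k! in the denominator of the estimate is what makes the iteration converge for every a and every interval length, just as in the proof.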

Bibliography

1. C. Bennewitz, Symmetric relations on a Hilbert space, Conference on the Theory of Ordinary and Partial Differential Equations (Univ. Dundee, Dundee, 1972), Lecture Notes in Math., vol. 280, Springer, Berlin, 1972, pp. 212–218.
2. C. Bennewitz, Spectral theory for pairs of differential operators, Ark. Mat. 15 (1977), no. 1, 33–61.
3. C. Bennewitz, Spectral asymptotics for Sturm-Liouville equations, Proc. London Math. Soc. (3) 59 (1989), no. 2, 294–338. MR 91b:34141
4. C. Bennewitz, A uniqueness theorem in inverse spectral theory, Lecture at the 1997 Birman symposium in Stockholm. Unpublished, 1997.
5. C. Bennewitz, Two theorems in inverse spectral theory, Preprints in Mathematical Sciences 2000:15, Lund University, 2000.
6. C. Bennewitz, A proof of the local Borg-Marchenko theorem, Comm. Math. Phys. 218 (2001), no. 1, 131–132. MR 2001m:34035
7. C. Bennewitz, A Paley-Wiener theorem with applications to inverse spectral theory, Advances in differential equations and mathematical physics (Birmingham, AL, 2002), Contemp. Math., vol. 327, Amer. Math. Soc., Providence, RI, 2003, pp. 21–31. MR 1 991 529
8. G. Borg, Uniqueness theorems in the spectral theory of y″ + (λ − q(x))y = 0, Proc. 11th Scandinavian Congress of Mathematicians (Oslo), Johan Grundt Tanums Forlag, 1952, pp. 276–287.
9. I. M. Gelfand and B. M. Levitan, On the determination of a differential equation from its spectral function, Izv. Akad. Nauk SSSR 15 (1951), 309–360; English transl. in Amer. Math. Soc. Transl. Ser. 2, 1 (1955), 253–304.
10. V. A. Marčenko, Some questions in the theory of one-dimensional second-order linear differential operators. I, Trudy Moskov. Mat. Obšč. 1 (1952), 327–340; also in Amer. Math. Soc. Transl. (2) 101 (1973), 1–104.
11. B. Simon, A new approach to inverse spectral theory, I. Fundamental formalism, Annals of Math. 150 (1999), 1–29.
12. H. Weyl, Über gewöhnliche Differentialgleichungen mit Singularitäten und die zugehörigen Entwicklungen willkürlicher Funktionen, Math. Ann. 68 (1910), 220–269.
