Topics in Conformal Geometry and Dynamics

0 downloads 0 Views 759KB Size Report
thank Trinity for financial assistance in the form of a research scholarship. ...... the tori C/Λ for lattices Λ in C. The remaining Riemann surfaces covered by D are called ...... Applying this result to the sequences in condition (1) in the statement,.
Topics in Conformal Geometry and Dynamics

Edward Thomas Crane Trinity College.

This dissertation is submitted for the degree of Doctor of Philosophy

University of Cambridge

December 2003

For Jacqueline

i

Acknowledgements I would like to thank my supervisor, Keith Carne, for all the careful attention and encouragement he gave me, listening patiently while I explained all sorts of unpromising ideas and pointing me towards more fruitful ones. His confidence in me was a great help. Matthew Fayers put a great deal of care into proofreading an early draft of this thesis, eliminating hundreds of grammatical and typographical errors and helping me to improve the style and presentation. Ben Green asked me a question that led me to think about the question answered by Chapter 4. I have had many useful mathematical conversations with Dinesh Markose and Alan Beardon, among others. This thesis was typeset using Latex, with the AMS Latex packages and Paul Taylor’s diagrams package. I would like to thank the Engineering and Physical Sciences Research Council for the research studentship which funded this PhD research. Trinity College has been a wonderful home and has given me a first rate mathematical education over the course of seven years. I also wish to to thank Trinity for financial assistance in the form of a research scholarship. Most importantly, I would like to thank Jacqui for her love and support while I have been doing my PhD. I could not have done it without her!

ii This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration except where specifically indicated in the text. This thesis is not substantially the same as any that has been submitted, or is concurrently being submitted, for any other degree, diploma, or other qualification.

Edward Crane

Contents List of notation

vii

Introduction

ix

1 Preliminaries 1.1

1

Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

1

1.1.1

Proper metric spaces . . . . . . . . . . . . . . . . . . .

1

1.1.2

Tight measures and the weak topology . . . . . . . . .

2

1.2

Kantorovich metrics . . . . . . . . . . . . . . . . . . . . . . .

4

1.3

Prohorov metrics . . . . . . . . . . . . . . . . . . . . . . . . .

7

1.4

Ergodic theory . . . . . . . . . . . . . . . . . . . . . . . . . .

9

1.5

Geometric function theory . . . . . . . . . . . . . . . . . . . . 11

2 Iterated function systems 2.1

2.2

13

Iterated function systems . . . . . . . . . . . . . . . . . . . . . 13 2.1.1

Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.1.2

Linear operators associated to an IFS . . . . . . . . . . 15

Motivation and outline of results . . . . . . . . . . . . . . . . 18 2.2.1

Improving the Contraction Mapping Theorem . . . . . 18 iii

CONTENTS

iv

2.2.2

The Wolff–Denjoy Theorem . . . . . . . . . . . . . . . 20

2.2.3

Ambroladze’s theorem . . . . . . . . . . . . . . . . . . 20

2.2.4

Outline of Results . . . . . . . . . . . . . . . . . . . . . 21

2.3

2.4

2.5

Non-uniformly contracting IFSs . . . . . . . . . . . . . . . . . 23 2.3.1

Stability via Kantorovich metrics . . . . . . . . . . . . 24

2.3.2

Ergodic properties in the stable case . . . . . . . . . . 28

Analytic IFSs on D . . . . . . . . . . . . . . . . . . . . . . . . 31 2.4.1

Wolff–Denjoy Conjecture for analytic IFSs on D . . . . 31

2.4.2

An improved sufficient condition for stability . . . . . . 34

Convergence of reverse iterates . . . . . . . . . . . . . . . . . . 37 2.5.1

A generalisation of Letac’s Principle

. . . . . . . . . . 37

2.5.2

Converse for non-uniformly contracting IFSs . . . . . . 39

2.5.3

The natural extension of the skew product . . . . . . . 47

2.5.4

Strong converse for metrically taut spaces . . . . . . . 55

3 Holomorphic correspondences 3.1

3.2

62

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 3.1.1

Correspondences – the algebraic viewpoint . . . . . . . 63

3.1.2

The geometric function theory viewpoint . . . . . . . . 65

3.1.3

Dynamics on the Jacobian . . . . . . . . . . . . . . . . 65

3.1.4

Separable correspondences . . . . . . . . . . . . . . . . 66

3.1.5

Correspondences as Markov chains . . . . . . . . . . . 67

Background results . . . . . . . . . . . . . . . . . . . . . . . . 69 3.2.1

Invariant measures for rational maps, rational semigroups and Kleinian groups . . . . . . . . . . . . . . . 69

3.2.2

Example: the AGM correspondence . . . . . . . . . . . 72

CONTENTS

v

3.2.3

Critically finite correspondences . . . . . . . . . . . . . 73

3.2.4

Lifting the AGM correspondence . . . . . . . . . . . . 76

3.3

3.4

3.5

Invariant measures of critically finite correspondences . . . . . 77 3.3.1

Latt`es rational maps . . . . . . . . . . . . . . . . . . . 78

3.3.2

Galois correspondences . . . . . . . . . . . . . . . . . . 79

3.3.3

A dynamical dichotomy . . . . . . . . . . . . . . . . . 80

3.3.4

Asymptotic stability . . . . . . . . . . . . . . . . . . . 84

Forward critically finite correspondences . . . . . . . . . . . . 89 3.4.1

The inverse of a hyperbolic rational map . . . . . . . . 89

3.4.2

Forward critical finiteness . . . . . . . . . . . . . . . . 92

3.4.3

The tangential correspondence on a quartic

3.4.4

The Fermat quartic . . . . . . . . . . . . . . . . . . . . 95

3.4.5

Asymptotic stability of the inverse . . . . . . . . . . . 99

. . . . . . 93

Correspondences on elliptic curves . . . . . . . . . . . . . . . . 100 3.5.1

Correspondences with no singular points . . . . . . . . 101

3.5.2

Correspondences of bidegree (2,2) on a torus with two singular points in each direction . . . . . . . . . . . . . 103

3.6

Dynamics of a D8 example . . . . . . . . . . . . . . . . . . . . 111 3.6.1

Algebraic equations for the (2, 2) correspondences on √

the torus C/ 1, i/ 2 . . . . . . . . . . . . . . . . . . 112

3.6.2

Dynamics of f0 . . . . . . . . . . . . . . . . . . . . . . 115

4 Area distortion of polynomial mappings

118

4.1

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

4.2

Logarithmic capacity . . . . . . . . . . . . . . . . . . . . . . . 120

4.3

Condenser capacity . . . . . . . . . . . . . . . . . . . . . . . . 123

CONTENTS 4.4

vi

Proof of Theorem 4.2 . . . . . . . . . . . . . . . . . . . . . . . 126

5 Smale’s Mean Value Conjecture 5.1

Smale’s Mean Value Conjecture . . . . . . . . . . . . . . . . . 128 5.1.1

5.2

128

Algebraic results on Smale’s Conjecture . . . . . . . . . 130

The mean value conjecture for rational maps . . . . . . . . . . 139 5.2.1

Rational maps with all critical points fixed . . . . . . . 142

5.2.2

Forbidden multipliers . . . . . . . . . . . . . . . . . . . 146

A Applications of IFS

150

B Known stability results for IFS

153

C Dependence of stability on p

157

Bibliography

165

List of notation

N . . . . . . . . . . . . . . . . . . . . . . . . the natural numbers, which begin at 1. C . . . . . . . . . . . . . . . . . . . . . . . . the complex numbers. b = P1 (C) . . . . . . . . . . . . . . . the Riemann sphere. C D = {z ∈ C : |z| < 1} . . . . .the open unit disc in C. D = {z ∈ C : |z| ≤ 1} . . . . the closed unit disc in C. dhyp (z, w) . . . . . . . . . . . . . . . . the distance between z and w in the natural complete hyperbolic metric on a hyperbolic Riemann surface. B . . . . . . . . . . . . . . . . . . . . . . . . the Borel σ-algebra. B(X) . . . . . . . . . . . . . . . . . . . . the space of real Borel-measurable functions on X. B(X, Y ) . . . . . . . . . . . . . . . . . . the set of Borel measurable functions from X to Y. C(X, Y ) . . . . . . . . . . . . . . . . . the space of continuous maps from X to Y , with the compact-open topology.

vii

LIST OF NOTATION

viii

C(X) . . . . . . . . . . . . . . . . . . . . the Banach space of bounded continuous realvalued functions on a topological space X, with the supremum norm. C0 (X) . . . . . . . . . . . . . . . . . . . the closed subspace of functions in C(X) that vanish at infinity. M(X) . . . . . . . . . . . . . . . . . . . the space of signed tight (regular) Borel measures on the topological space X. M+ (X) . . . . . . . . . . . . . . . . . . the subspace of M(X) consisting of positive measures. P(X) . . . . . . . . . . . . . . . . . . . . the space of tight Borel probability measures on X. Γ(n) . . . . . . . . . . . . . . . . . . . . . The subgroup of PSL2 (Z) represented by matrices congruent to the identity modulo n. Γ0 (n) . . . . . . . . . . . . . . . . . . . . The subgroup of PSL2 (Z)  whose  elements are a b  where c ≡ 0 represented by matrices  c d (mod n). Γ0 (m, n) . . . . . . . . . . . . . . . . . The subgroup of PSL2 (Z)  whose  elements are a b  where c ≡ 0 represented by matrices  c d (mod m) and b ≡ 0 (mod n).

Introduction The chapters of this thesis concern themselves with various aspects of a single theme: conformal mappings between Riemann surfaces have strongly controlled geometry and this has interesting dynamical consequences when we compose sequences of conformal mappings. Chapter 1 covers various technical prerequisites. It is intended to make the thesis reasonably self-contained. The reader is likely to be familiar with some of this material. In Chapter 2 we study a particular class of iterated function systems. An iterated function system is a kind of random dynamical system. In an ordinary discrete-time dynamical system, we have a state space X and a map f : X → X which we apply repeatedly (iterate). We are typically interested in the long-term statistical behaviour of an orbit x0 ,

x1 = f (x0 ),

x2 = f (x1 ),

x3 = f (x2 ), . . . .

An iterated function system differs in that the map Fn that must be applied to step from xn−1 to xn is not fixed, but is a random variable; the maps at different times are chosen independently, all from the same probability distribution. Now the orbit is random; it forms a Markov chain. We recommend two readable and up-to-date introductions to the subject [27, 69]. ix

INTRODUCTION

x

We were motivated by [3], which studied iterated function systems in which the maps Fn are drawn from a finite set of analytic self-maps of the open unit disc in the complex plane. In that paper a stability result for the corresponding Markov Chain was proved using the Schwarz–Pick Lemma, a contractivity property of such maps. We improve the sufficient condition for stability in this situation. We then define the class of non-uniformly contracting iterated function systems by abstracting the contractivity property; as its author remarked, the method of [3] applies to give a stability theorem for this class. We give an alternative proof, formalising a coupling argument by the use of probability metrics. We then prove the continuous dependence of the stationary distribution on the maps and probabilities. After proving a weak law of large numbers for this class, we conclude by giving a necessary and sufficient condition for stability in terms of the reverse iterates. These are the maps obtained by composing the random maps Fn in reverse order. In chapter 3, we study another type of random dynamical system that belongs to complex dynamics. This is the iteration of multivalued algebraic functions, or holomorphic correspondences. This is a relatively new subject, but already a number of different aspects have been explored in the literature. The fact that the setting is a surface with a moduli space should lead to an interaction between dynamical systems and arithmetic algebraic geometry. We study some ergodic properties of the iteration of the critically finite correspondences defined by Bullett [19]. These correspondences are actually arithmetic algebraic objects, also called modular correspondences. A stability result for critically finite correspondences that we hoped was new has recently appeared elsewhere [25]; we present our proof anyway, as is some-

INTRODUCTION

xi

what different. We then use some ideas from Chapter 2 to prove a stability result for a larger class of correspondences which are analogous to critically finite rational maps. As far as we know this result is new. Finally we study some concrete examples of correspondences. These are correspondences on certain elliptic curves with complex multiplication; they are rigid in that the topology of the correspondence determines the analytic isomorphism class of the elliptic curve. Each of these correspondences gives rise to a natural family of random dynamical systems parameterised by the underlying elliptic curve. We also use these correspondences to construct examples for the second stability result mentioned above. Chapter 4 continues the first part of our theme, namely the restrictions on the metric geometry of a map imposed by conformality. We study how polynomial mappings in one complex variable distort Euclidean areas of sets. Using classical potential theory we prove that if p is monic and K is a Lebesgue measurable subset of C, then  deg p Area (p−1 (K)) Area(K) . ≤ π π In chapter 5 we study another question concerning the Euclidean geometry of polynomials in one complex variable. This is Smale’s Mean Value Conjecture, a well-known conjecture which sets bounds on where a polynomial p maps any point z ∈ C in terms of the locations of the critical points ζi of p and their images under p. It states that if p0 (z) 6= 0 then p(z) − p(ζi ) ≤ 1. min i (z − ζi ) p0 (z) The result is known with 1 replaced by 4 on the right-hand side, or better constants in terms of the degree of p. Our result on the Conjecture itself is

INTRODUCTION

xii

a reduction: we show that it suffices to give a proof for pairs (p, z) for which the values in the minimum on the left-hand side are all equal. Unfortunately we have not yet been able to make any use of this reduction to improve the constant. We then proceed to formulate a Mean Value Conjecture for rational maps, and prove it with weaker constant. Finally we study a special case of these conjectures, that in which all the critical points are fixed.

Chapter 1 Preliminaries In this chapter we collect many of the definitions and technical results that we will need. None of the material presented in this chapter is new, with the possible exception of Lemma 1.8 and Lemma 1.10, which we are not aware of in the literature.

1.1 1.1.1

Analysis Proper metric spaces

A metric space (X, d) is proper if every closed ball of any positive radius is compact, which happens if and only if for every x ∈ X the distance function d(x, ·) is a proper map. For example, a Riemannian manifold is proper if and only if it is complete, by the Hopf–Rinow Theorem. Proper metric spaces are the appropriate setting for most of the new results in Chapter 2. Every proper metric space is complete and locally compact and has a countable compact exhaustion so is separable. These properties allow us to use machinery from 1

CHAPTER 1. PRELIMINARIES

2

functional analysis, and ensure various desirable properties of probability metrics. Several of our proofs will make explicit use of the compactness of closed balls. Throughout Chapter 2 (X, d) stands for a proper and nonempty metric space. In contrast Y stands for a topological space, subject to conditions that vary from paragraph to paragraph.

1.1.2

Tight measures and the weak topology

Let Y be a topological space and µ be a finite Borel measure µ on Y . µ is regular if for every Borel set A ⊂ X, µ(A) = sup{µ(K) : K compact, K ⊂ A} . µ is tight if this equation holds when A = X. If µ is regular then for every Borel A ⊂ X we have µ(A) = inf{µ(V ) : V open, V ⊃ A} . When Y is metrizable, µ is tight if and only if µ is regular [29, Theorem 7.1.3]. A law on Y is a Borel probability measure. P(Y ) denotes the set of regular laws on Y . A Polish space is a topological space which can be metrized to be separable and complete. Theorem 1.1 (Ulam). [29, Theorem 7.1.4] Let Y be a Polish space. Then every finite Borel measure on Y is tight. A topological space is called universally measurable if every law on it is tight. As a consequence of Ulam’s theorem, every proper metric space is

CHAPTER 1. PRELIMINARIES

3

universally measurable. In fact a topological space is universally measurable if it is Borel-measurably isomorphic to a Borel subset of a Polish space [29, 57]. Let Y be a locally compact non-empty Hausdorff space. Let C(Y ) denote the Banach space of bounded continuous functions on Y , with the supremum norm. C0 (Y ) is the closed subspace of C(Y ) consisting of functions f that vanish at infinity, i. e. for any  > 0 there exists a compact set K outside which |f | < . Let M(Y ) denote the space of regular finite signed Borel measures on Y , a Banach space with the total variation norm. Theorem 1.2 (Riesz Representation Theorem). For any locally compact non-empty Hausdorff space Y , M(Y ) is the dual of C0 (Y ) with respect to the integration pairing. This statement may be obtained by considering complex conjugates in [63, Theorem 6.19]. We will use the weak topology on M(Y ) that arises from this representation. Convergence in norm implies but is not implied by weak convergence. The weak topology on M(Y ) is Hausdorff and only depends on the topology of Y , not on the metric. A finite signed Borel measure integrates all bounded continuous functions, but if Y is not compact, there are bounded linear functionals on C(Y ) which are not represented by integration against a measure (take a Banach limit of point evaluations along a sequence that tends to infinity). Lemma 1.3. Let Y be a locally compact non-empty Hausdorff space and µn a tight law for each n ≥ 0. Then µn → µ0 weakly against C0 (Y ) if and only if µn → µ0 weakly against C(Y ).

CHAPTER 1. PRELIMINARIES

4

The proof of Lemma 1.3 uses local compactness via Urysohn’s Lemma. If Y is not compact then a sequence of probability measures may converge weakly to a positive measure of mass less than 1, perhaps even 0; in that case we do not have weak convergence against C(Y ). A subset S ⊂ P(Y ) is uniformly tight if for each  > 0 there exists a compact set J ⊂ Y such that ν(J) > 1 −  for every ν ∈ S. Lemma 1.4 (Le Cam). [29, Theorem 11.5.3] Let (Y, d) be a locally compact metric space, and mun ∈ P(Y ) for n ≥ 0. If µn → µ0 , then the set {µ, µ1 , µ2 , . . . } is uniformly tight. Lemma 1.5 (Prohorov). [29, Theorem 11.5.4] Let (Y, d) be a Polish space and S ⊂ P(Y ). Then S is relatively compact if and only if it is uniformly tight.

1.2

Kantorovich metrics

Let (Y, d) be a separable metric space. Define the subspace Pd ⊂ P(X) by   Z Pd = µ ∈ P(X) : d (y0 , y) dµ(y) < ∞ . Pd is independent of the choice of y0 ∈ Y . For any ν1 , ν2 ∈ Pd , let P ν1 ,ν2 stand for the space of laws on Y × Y with marginals ν1 and ν2 . The Kantorovich distance Kd (ν1 , ν2 ) is defined by Z Kd (ν1 , ν2 ) = inf d(x, y) dm(x, y) : m ∈ P ν1 ,ν2

(1.1)

We write Lip(Y ) for the space of Lipschitz functions f : Y → R, with seminorm kf kL , the best Lipschitz constant of f . The Wasserstein distance

CHAPTER 1. PRELIMINARIES

5

is defined by Z γ (ν1 , ν2 ) = sup

 f (y)d (ν1 − ν2 ) (y) : kf kL ≤ 1 .

Theorem 1.6 (Properties of Kd ). • Kd (ν1 , ν2 ) = γ (ν1 , ν2 ). • The infimum in equation 1.1 is attained. • Kd is a metric on Pd (Y ). • The following are equivalent: 1. Kd (µn , µ) → 0 as n → ∞; 2. µn → µ weakly and for some (and hence any) fixed x0 ∈ X, we have Z

Z d (x, x0 ) dµn (x) →

d (x, x0 ) dµ(x) ;

3. µn → µ weakly and for some (and hence any) fixed x0 ∈ X, we have Z lim sup

R→∞

n

d (x, x0 ) 1(d (x, x ) > R) dµn (x) = 0 . 0

• A subset A ⊂ Pd is relatively compact with respect to the metric Kd if and only if it is relatively compact in the weak topology and Z lim sup d (x, x0 ) 1(d (x, x ) > R) dm(x) = 0 . R→∞ m∈A

0

• If (Y, d) is complete, then (Pd , Kd ) is also complete.

CHAPTER 1. PRELIMINARIES

6

The first three facts are proved in [29, §11.8]. The remaining results are special cases of [61, Theorems 6.2.1, 6.3.1, 6.3.2 & 6.3.3]. Note that if (X, d) has finite diameter then the integral conditions in the convergence criteria are redundant, so Kd metrizes the weak topology on P(X). Lemma 1.7. Suppose that (X, c) is a proper metric space. Then every ball of finite radius in the metric space (Pc , Kc ) is uniformly tight. Proof. Consider the ball B(µ, R). Given any  > 0, there exists a compact set K ⊂ X with µ(K) > 1 − . K is contained in some c-ball B (x0 , r) ⊂ X. The ball K 0 = B (x0 , r + R/) is compact because (X, c) is proper. If a probability measure ν has ν(X \ K 0 ) > 2 then Kc (µ, ν) > R. Suppose that we wish to study a particular probability measure that happens not to lie in Pc . Then we change the metric on X to c0 (x, y) = ϕ(c(x, y)) . + When ϕ : R+ 0 → R0 is continuous, strictly increasing and concave with

ϕ(0) = 0, this yields a new metric space (X, c0 ), with the same topology as (X, c). Indeed, the balls in the c0 -metric of radius less than sup ϕ are precisely the balls of finite radius in the c-metric. If c is unbounded but ϕ is bounded then Pc0 = P(X). However, in that case (X, c0 ) is not proper, so Lemma 1.7 cannot be applied. On the other hand if (X, c) is proper and ϕ(t) → ∞ as t → ∞ then (X, c0 ) is also proper.

CHAPTER 1. PRELIMINARIES

7

Lemma 1.8. Given a separable metric space (X, c) and finitely many probability measures µ1 , . . . µk on X, there exists a continuous, strictly increasing and concave + function ϕ : R+ 0 → R0 such that ϕ(0) = 0, ϕ(t) → ∞ as t → ∞ and such

that all the µi are in Pc0 , where c0 (x, y) = ϕ(c(x, y)). Proof. Define gi (t) = µi ({x ∈ X : c(x, x0 ) ≥ t}) . These are decreasing functions such that gi (t) → 0 as t → ∞. We will construct ϕ as a piecewise linear continuous function the vertices of whose graph are at (tn , n) for n ∈ N. We will define inductively the values tn at which ϕ (tn ) = n. Note that ϕ is concave if and only if (tn − tn−1 ) is an increasing sequence. Set t0 = 0, t1 = 1 and then for n ≥ 2 set  tn = inf t : gi (t) ≤ (n + 1)−3 for i = 1, . . . , k and tn − tn−1 ≥ tn−1 − tn−2 . Then ϕ is certainly a continuous, strictly increasing, concave function with ϕ(0) = 0 and ϕ(t) → ∞ as t → ∞. Moreover, for n ≥ 2 we have Z ϕ (c (x, x0 )) 1 [n ≤ ϕ (c (x, x0 )) ≤ n + 1] dµi (x) ≤ (n+1)·gi (tn ) ≤ (n+1)−2 , so Z ϕ (c (x, x0 )) dµi (x) < ∞ , i. e. µi ∈ Pc0 where c0 = ϕ(c).

1.3

Prohorov metrics

The weak topology on P(X) is usually metrized by the Prohorov metric. Although we will not use it, we discuss it here for two reasons. We will need

CHAPTER 1. PRELIMINARIES

8

it in appendix B to explain the context of our results in the literature on IFSs, and the comparison with the Kantorovich metric serves to point out the advantages of the Kantorovich metric for our applications. Let (X, c) be a separable metric space. We denote the -neighbourhood of a subset A by A . The Prohorov distance πc between two elements µ, ν ∈ P(X) is defined to be infimum of all  > 0 such that for all Borel A ⊂ X µ(A) ≤ ν (A ) +  and ν(A) ≤ µ (A ) + . [16] gives more information about the Prohorov metric. The important properties are that πc is a metric on P(X) and that it metrizes the weak topology. Moreover, if (X, c) is separable and complete then so is (P(X), πc ). The following result gives an alternative way to define πc , close in spirit to our first definition of the Kantorovich metric Kc . Lemma 1.9 (Strassen–Dudley Theorem). Suppose πc (µ, ν) < α. Then there exists a measure m ∈ P µ,ν such that m({(x, y) : c(x, y) > α}) < α. Note that the measure m acts as a certificate of the fact that πc (µ, ν) < α. For a proof, see [16, Theorem 6.9]. The following inequalities relate the Prohorov and Kantorovich metrics associated to a separable metric space (X, c). Lemma 1.10. 1. πc (µ1 , µ2 ) ≤

p

Kc (µ1 , µ2 ).

2. Kc (µ1 , µ2 ) ≤ πc (µ1 , µ2 ) (1 + diamc (X)).

CHAPTER 1. PRELIMINARIES Proof.

9

1. A minimizing measure on X × X for Kc is a certificate as in the

Strassen–Dudley Theorem. 2. The certificate m provided by the Strassen–Dudley Theorem has Kantorovich integral at most equal to the right-hand side.

The analogue of Lemma 1.7 for Prohorov metrics fails.

1.4

Ergodic theory

A measure-preserving dynamical system (m. p. d. s. ) is defined as a quadruple (W, B, ν, T ), where W is a set, B is a σ-algebra of subsets of W , ν is a positive measure on B, B is complete with respect to ν (i.e. contains all subsets of ν-null sets), and T : W → W is a function such that T −1 (B) ⊂ B and T∗ ν = ν. We allow T to be defined only ν-a.e. We will often make use of the following result [58, p. 34]. Theorem 1.11 (Poincar´ e’s Recurrence Theorem). Let (W, B, ν, T ) be a m. p. d. s. and suppose that ν(A) > 0. Then for ν-a.e. x ∈ A, the sequence (T n (x))n∈N returns to A infinitely often. Let (W1 , B1 , ν1 , T1 ) and (W2 , B2 , ν2 , T2 ) be m. p. d. s. Then α : W2 → W1 is a homomorphism or factor map if α−1 (B1 ) ⊂ B2 , α∗ (ν2 ) = ν1 , α ◦ T2 = T1 ◦ α,

ν2 -a.e.

CHAPTER 1. PRELIMINARIES

10

We say that T2 is an extension of T1 and that T1 is a factor of T2 . If there also exists a homomorphism β : W1 → W2 such that α ◦ β = id, ν1 -a.e. (or equivalently β ◦ α = id, ν2 -a.e.) then we say α is an isomorphism. We say that a homomorphism or a measurable function is essentially unique if any two possible values of it agree a.e. with respect to the relevant measure. ˆ , B, ˆ νˆ, Tˆ) such that Tˆ is Any m. p. d. s. (W, B, ν, T ) has an extension (W invertible and such that any homomorphism ψ from an invertible m. p. d. s. to (W, B, ν, T ) is essentially uniquely expressible as ψ = φ ◦ χ. This is called the natural extension of (W, B, ν, T ). Being a universal object it is unique (up to an essentially unique isomorphism).1 If T is ergodic with respect to ν then Tˆ is ergodic with respect to νˆ. ˆ to be The standard construction of the natural extension is to take W the space of infinite backward orbits of T , i. e. sequences (wn )∞ n=0 such that wn = T wn+1 for all n, with the σ-algebra inherited from the product σalgebra on W N . The map Tˆ is given by T acting co-ordinatewise, and Tˆ−1 is the left-shift. There is a unique measure νˆ such that for each co-ordinate ˆ → W , we have (πm ) νˆ = ν. Indeed, a compactness projection map πm : W ∗ argument on trees of pre-images shows that this condition specifies νˆ(Am ) ˆ consistently for sets of the form Am = {(wn )∞ n=0 ∈ W : wm ∈ A}; the Carath´eodory Extension Theorem then gives existence and uniqueness. 1

However, a universal object does depend on the category with respect to which it is

universal! Some authors do not insist that B be complete with respect to ν; their natural extension may have a smaller σ-algebra than ours.

CHAPTER 1. PRELIMINARIES

1.5

11

Geometric function theory

b the Riemann sphere, and by D We denote by C the complex numbers, by C the open unit disc in C. Theorem 1.12 (Uniformisation of Riemann surfaces). Let R be any Riemann surface. Then R is the quotient of precisely one of b C, and D by a group of automorphisms that acts the Riemann surfaces C, freely and properly discontinuously; this group is unique up to conjugation. The universal cover of any Riemann surface has a unique complex structure that makes the covering map analytic, so the theorem amounts to proving that every simply-connected Riemann surface is isomorphic to precisely b C and D. The only case in which the universal cover is C b is when one of C, b The only quotients of C are C itself, the once-punctured plane, and R = C. the tori C/Λ for lattices Λ in C. The remaining Riemann surfaces covered by D are called hyperbolic. In particular, any domain in C which omits at least two points of C is hyperbolic, as is any Riemann surface whose fundamental group is non-abelian. The automorphisms of the disc D are also isometries of the Poincar´e metric on D, which is the unique complete conformal Riemannian metric on D of constant curvature −1. (A Riemannian metric on a Riemann surface is conformal when with respect to some (hence any) local co-ordinate z it has the form ds2 = ρ(z)|dz|2 , where ρ is some smooth positive function.) It follows that any hyperbolic Riemann surface R has a unique complete conformal metric with curvature −1, which we call the hyperbolic metric on R. We denote the corresponding distance function dR or dhyp when there will be no confusion over which Riemann surface is

CHAPTER 1. PRELIMINARIES

12

being referred to. Let f : R → S be an analytic map between hyperbolic Riemann surfaces. The derivative Df (z) is a scalar multiple of an isometry of the tangent spaces at z and f (z); this scalar is written f # (z) and called the hyperbolic derivative of f at z. Lemma 1.13 (Schwarz–Pick Lemma). Let f : R → S be an analytic map between two hyperbolic Riemann surfaces. Then f does not increase distances, i. e. for all z, w ∈ R, dS (f (z), f (w)) ≤ dR (z, w) . If there are two distinct points z and w for which equality holds then f is a local isometry. Moreover, for each z ∈ R, f # (z) ≤ 1, with equality at any single point if and only if f is a local isometry. The following generalisation is apparently due to Nehari [55]. Lemma 1.14 (Branched Schwarz Lemma). Let f, b : D → D be analytic, with f (0) = 0 = b(0), and suppose that b is a finite Blaschke product. If the valency of f is at least the valency of b at each point of D, then |f 0 (0)| ≤ |b0 (0)|. If |b0 (0)| = 6 0 then equality holds if and only if f ∼ = λb, where λ is a constant of modulus 1.

Chapter 2 Iterated function systems 2.1 2.1.1

Iterated function systems Definitions

Let Y be a topological space with Borel σ-algebra B. Let C(Y, Y ) denote the space of continuous maps from Y to Y , and B(Y, Y ) the set of Borelmeasurable maps from Y to Y . An Iterated Function System (IFS) F consists of the following data: a topological space Y , a probability space (Ω, M, P), and a function f : Ω → B(Y, Y ). We will abuse notation by using f also to stand for the corresponding map Ω × Y → Y . We insist that the latter map be measurable (i. e. for every A ∈ B, f −1 (A) ∈ M × B). We will think of f as a random map of Y into itself. We call the IFS F continuous when f : Ω → C(Y, Y ). In the special case in which Ω is finite, we call F an N -map IFS. Then we denote the possible values of f by f1 , f2 , . . . , fN , taken with probabilities p1 , p2 , . . . , pN

13

CHAPTER 2. ITERATED FUNCTION SYSTEMS

14

respectively. We interpret F dynamically by constructing a sequence of independent random maps (Fi )∞ i=1 , each map distributed as f is. We think of the map Fi being applied to Y at time i. The (random) orbit of a (possibly random) point x0 ∈ Y is the random sequence (xn ), where xn = Fn ◦ Fn−1 ◦ · · · ◦ F1 (x0 ) . This may also be thought of as the orbit of a discrete-time Markov chain (xi )∞ i=0 , where the probability of a transition into a Borel set A ⊂ Y is defined by P (xi+1 ∈ A|xi = y) = P (y, A) = P (f (y) ∈ A) . P (y, A) is a measurable function of y [50, Lemma 2.1]. To fix notation, the probability space on which the sequence (Fn ) is defined is the unilateral shift space Ω = {1, 2, . . . N }N with σ-algebra M obtained by completing the product σ-algebra with respect to the Bernoulli measure Bp . Then Fn = fωn . In order to interpret F in terms of dynamical systems, we define the left-shift map σ : Ω → Ω given by σ : (ω1 , ω2 , ω3 , . . . ) 7→ (ω2 , ω2 , ω3 , . . . ) . and the skew-product map τ : Y × Ω → Y × Ω given by τ (x, (ω1 , ω2 , ω3 , . . . )) = (fω1 (x), (ω2 , ω3 , ω4 , . . . )) .

CHAPTER 2. ITERATED FUNCTION SYSTEMS

2.1.2

15

Linear operators associated to an IFS

Let B(Y ) denote the space of Borel-measurable real-valued functions on the topological space Y . The composition of Borel-measurable functions is Borelmeasurable, so we can associate to any Borel-measurable map h : Y → Y a pull-back operator h∗ : B(Y ) → B(Y ), defined by h∗ (g) = g ◦ h. If h is continuous then h∗ also acts on C(Y ). The push-forward of a Borel measure µ on Y by h is defined by h∗ (µ)(E) = µ (h−1 (E)) for each Borel set E ⊂ Y . Since h−1 : B → B is an algebra homomorphism, h∗ (µ) is another Borel measure. If h is continuous then h∗ takes tight measures to tight measures. The Perron–Frobenius or transfer operator F∗ of an IFS F on Y acts on g ∈ B(Y ) by ∗

Z



(F g) (y) := E (f (g)(y)) =

g ◦ f (y) dP(f ) ,

where f is the random map in the definition of F and E is the expectation with respect to P. Since g ◦ f is Borel-measurable, Fubini’s Theorem tells us that F ∗ (g) ∈ B(Y ). The Markov operator F∗ acts on the space of all Borel measures on Y by (F∗ µ) (A) := E (f∗ (µ)(A)) = E µ f

−1

 (A) =

Z P(f (x) ∈ A) dµ(x) ,

where A is any Borel subset of Y . We say that F has the weak Feller property if F ∗ acts on C(Y ). We will call F good if it has the weak Feller property and F∗ acts on P(Y ). The weak Feller property does not imply that F∗ sends tight measures to tight measures, although if Y is universally measurable then this is automatic. Every continuous N -map IFS is good. If F is a good IFS then F∗ is the adjoint of F ∗ (with respect to the duality described by

CHAPTER 2. ITERATED FUNCTION SYSTEMS

16

theorem 1.2), i. e. for every g ∈ C(Y ), Z Z g(y) d (F∗ µ) (y) = (F ∗ g) (y) dµ(y) . Furthermore, if F is good then F ∗ is a bounded operator of norm 1 on C(Y ). In chapter 3 we will use IFSs that are good but not continuous. A finite Borel measure µ on Y is invariant for F when F∗ µ = µ. Note that µ is invariant for F if and only if µ is a stationary distribution for the associated Markov chain, which happens if and only if µ × Bp is an invariant measure for τ . When µ is invariant for F, the Cartesian projection map π : Y × Ω → Ω makes the m. p. d. s. (Y × Ω, B × M, µ × Bp , τ ) into an extension of the m. p. d. s. (Ω, M, Bp , σ). The Krylov–Bogolubov Theorem states that any continuous self-map of a non-empty compact space has an invariant law. The following standard result is the analogue for IFSs. Lemma 2.1 (Krylov–Bogolubov for IFSs). Let Z be a non-empty compact metric space and let R : C(Z) → C(Z) be a bounded linear operator such that R(1) = 1. Then the adjoint R∗ acts on P(Z) and has a fixed point there. In particular, any good IFS on Z has an invariant law. Proof. R∗ is continuous with respect to the weak topology on P(Z). The Banach–Alaoglu Theorem says that the unit ball of M(Z) is weakly compact; P(Z) is a closed subset of this ball. The Schauder–Tychonov Fixed Point Theorem [64, Theorem 5.28] says that any continuous self-map of a nonempty compact convex subset of a locally convex topological vector space has a fixed point.

CHAPTER 2. ITERATED FUNCTION SYSTEMS

17

It is sometimes of interest to construct an invariant law. Choose any µ0 ∈ P(Z) and for each n ∈ N define n 1 X ∗k µn = R µ0 . n + 1 k=0

Since P(Z) is weakly compact, there is a subsequence µni that converges weakly to µ ∈ P(Z), say. R ∗ µn − µn =

 1 R∗ n+1 µ0 − µ0 . n+1

The norm of this is at most 2/(n + 1) so it converges weakly to 0. Thus R∗ µn → µ weakly as n → ∞. Since R∗ is continuous, R∗ µn → R∗ µ as n → ∞, so R∗ µ = µ. When X is not compact, P(X) is not weakly closed, let alone compact, and an IFS F on X may fail to have invariant law. T : P(X) → P(X) is asymptotically stable if there is a T -invariant law µ such that for every ν ∈ P(X), we have T n ν → µ weakly as n → ∞. An asymptotically stable operator evidently has a unique invariant law. We call an IFS F asymptotically stable when F∗ is asymptotically stable, which happens if and only if for every g ∈ C0 (Y ), the sequence (F ∗ )n g converges pointwise to a constant function.

CHAPTER 2. ITERATED FUNCTION SYSTEMS

2.2

18

Motivation and outline of results

2.2.1

Improving the Contraction Mapping Theorem

Let (X, d) be a complete and non-empty metric space. Suppose that f : X → X is a contraction, i. e. for some K < 1 and all x, y ∈ X, d(f (x), f (y)) ≤ Kd(x, y). The Contraction Mapping Theorem says that f has a unique fixed point x0 which attracts all orbits, i. e. for all y ∈ X, f n (y) → x0 as n → ∞. The conclusion obviously fails if we allow K = 1. For a more delicate result, define f : X → X to be strictly distance-decreasing if for all distinct x, y ∈ X, d(f (x), f (y)) < d(x, y). Such maps are sometimes known as contractive, but we will avoid that term as is it often used in looser senses. For strictly distance-decreasing maps we have the following analogue of the Contraction Mapping Theorem. It appears in [12]; here we give a proof to serve as a model for the proofs of later results. Proposition 2.2. Suppose (X, d) is a proper metric space and f : X → X is strictly distancedecreasing. Then either f has a unique fixed point which attracts all points of X, or the orbit of each point tends to infinity (and this convergence is locally uniform). Proof. Firstly note that f is continuous. Suppose that the orbit of some point x ∈ X does not tend to infinity, i. e. f ◦n (x) is in some compact set K for infinitely many n. Then the orbit has a convergent subsequence f ◦nk (x) → x0 as k → ∞. Because f is continuous, f ◦(1+nk ) (x) → f (x0 ) as n → ∞. How ever, the distances d f ◦n (x), f ◦(n+1) (x) form a strictly decreasing sequence whose limit is d (x0 , f (x0 )), by continuity of f and of the distance function.

CHAPTER 2. ITERATED FUNCTION SYSTEMS

19

If d (x0 , f (x0 )) = 0 then x0 is a fixed point. In that case, for any other point y ∈ X, d (f ◦n (y), x0 ) is a strictly decreasing sequence. Suppose for a contradiction that it converges to  > 0. By the properness condition, the orbit of y is precompact so we may choose a convergent subsequence f ◦mk (y) → y0 . Then d (y0 , x0 ) = , so d (f (y0 ) , x0 ) < , so by continuity of f , we have d (f 1+mk (y), x0 ) <  for k sufficiently large, a contradiction. Hence the fixed point attracts all orbits. Finally we must rule out the possibility that d (x0 , f (x0 )) =  > 0. If this were the case, then d (f (x0 ) , f (f (x0 ))) <  and by continuity of f we would have for sufficiently large k  d f ◦(1+nk ) (x), f ◦(2+nk ) (x) <  , a contradiction. The alternative is that for each compact set K, the orbit of x only visits K finitely many times, i. e. f ◦n (x) → ∞. In that case, for any compact set L, f ◦n (L) ∩ L = ∅ for all but finitely many n. Indeed, let R be the diameter of L and let NR (L) be the closed R-neighbourhood of L, which is compact by the properness condition. Fixing some x ∈ L, the above intersection is certainly empty when f ◦n (x) 6∈ NR (L), because f does not increase distances. Example 2.1. Let (X, d) be the set of non-negative integers with the metric d(m, n) =

1 + 2− max(m,n) 2

for m 6= n.

Let f : 0 7→ 0 and f : m 7→ m + 1 for m > 0. Then f is strictly distancedecreasing. The topology is discrete, so X is locally compact. The point 0 is

CHAPTER 2. ITERATED FUNCTION SYSTEMS

20

fixed, but the orbit of 1 is not precompact. Here X is complete, separable and locally compact, with a countable compact exhaustion, but the conclusion of Proposition 2.2 fails. Properness is genuinely required.

2.2.2

The Wolff–Denjoy Theorem

Let D be the open unit disc in the complex plane, equipped with the usual hyperbolic metric, which is proper. Suppose that f : D → D is analytic but is not a conformal automorphism of D. The Schwarz–Pick Lemma tells us that f is strictly distance-decreasing, so proposition 2.2 applies. In fact, a stronger conclusion can be obtained, using not the one-point compactification of D, but a bigger compactification, D. Theorem 2.3 (Wolff–Denjoy). Suppose that f : D → D is analytic. Unless f is an automorphism with a fixed point, then there is an α ∈ D such that for all z ∈ D, f n (z) → α as n → ∞. Note that the theorem does not assume that f extends continuously to D. A simple proof of this theorem given by Beardon [12] is reproduced in [23, §IV.3].

2.2.3

Ambroladze’s theorem

Let F be an N -map IFS on D whose maps are analytic. Suppose that all the maps are automorphisms of D. If they have a common fixed point then there are infinitely many invariant measures and each orbit stays on a circle. If they have no common fixed point then it is known that there is no invariant

CHAPTER 2. ITERATED FUNCTION SYSTEMS

21

b measure on D. However, the automorphisms extend to M¨obius maps of C and the extended IFS has at least one invariant measure on the unit circle, although it need not be asymptotically stable. Henceforth we assume that with positive probability the random map f : D → D is not an automorphism. Theorem 2.4 (Ambroladze). [3] Let F be an N -map IFS of analytic endomorphisms of D, and suppose that at least one map, say f1 , is not an automorphism. Then there is a measure µ such that for any Borel law ν, the iterates F∗n ν converge weakly to µ. Either µ = 0 and F has no invariant law, or µ is the unique invariant law for F. A sufficient condition for an invariant law to exist is that f1 (D) is relatively compact in D. In particular an IFS of this type is automatically asymptotically stable if it has an invariant law.

2.2.4

Outline of Results

Theorem 2.4 is a generalisation to IFSs of proposition 2.2 for the special case of analytic self-maps of D. In §2.3 we make precise the suggestion in [3] of generalising Theorem 2.4 to more general metric spaces and maps. We introduce the class of non-uniformly contracting IFSs and prove Theorem 2.6, which is a generalisation to these IFSs both of Theorem 2.4 and of Proposition 2.2. Ambroladze’s proof applies almost verbatim to prove Theorem 2.6; here we give an alternative proof using a Kantorovich metric, in preparation for using a similar method in the iteration of correspondences. In §2.3.2 we study some consequences for IFSs of asymptotic stability, giving a weak law

CHAPTER 2. ITERATED FUNCTION SYSTEMS

22

of large numbers for typical orbits of F, and for all orbits in the non-uniformly contracting case. An alternative way to generalise theorem 2.4 would be to produce a Wolff– Denjoy Theorem for analytic IFSs on D. Suppose that F∗n µ → 0 as n → ∞, weakly against C0 (D). Is there necessarily a law µ ˜ supported on the unit ˜ weakly against continuous functions on D? We circle ∂D such that F∗n ν → µ make some minor progress on this question in §2.4.1. In §2.4.2 we improve the sufficient condition for stability in Theorem 2.4. Theorem 2.16 gives a geometric sufficient condition which will be useful in Chapter 3 because it interacts well with covering maps. Define the reverse iterates of the IFS F to be the random maps Gn = F1 ◦ F2 ◦ · · · ◦ Fn . One of the two proofs in [3] of the sufficient condition for existence of an invariant law in Theorem 2.4 is based on the following result. Theorem 2.5 (Letac’s Principle). [47]. Suppose that the sequence of reverse iterates (Gn )∞ n=1 almost surely converges pointwise to a (random) constant map. Then the distribution of that constant limit is the unique invariant measure for F. We will use Letac’s Principle in proving Theorem 2.16. In §2.5 we ask which asymptotically stable IFSs can be proven to be asymptotically stable using Letac’s Principle. We show in Corollary 2.25 that if an N -map analytic IFS on D has an invariant law then the sequence of reverse iterates almost surely converge pointwise to a (random) constant limit.

CHAPTER 2. ITERATED FUNCTION SYSTEMS

2.3

23

Non-uniformly contracting IFSs

Definition. An N -map IFS F is non-uniformly contracting if • Each fi ∈ Lip(1), i. e. each fi is non-expanding: d (fi (x), fi (y)) ≤ d(x, y)

for all x, y ∈ X.

• At least one map, f1 , say, is strictly distance-decreasing, i. e. d (fi (x), fi (y)) < d(x, y)

for all x 6= y ∈ X.

(When X is not compact, f1 need not be Lipschitz with any constant less than 1.) This terminology is new; we are not aware of any conflicting meanings in the literature. Here is the generalisation of Theorem 2.4 to the setting of a non-uniformly contracting IFS. Theorem 2.6. Let X be a proper metric space. Let F = (f1 , . . . , fm : X → X; p1 , . . . pm ) be a non-uniformly contracting IFS. 1. If F has an invariant law µ, then F is asymptotically stable. In particular, µ is the unique invariant law. 2. If F has no invariant law then for any ν ∈ P(X), F∗n ν → 0 weakly as n → ∞.

CHAPTER 2. ITERATED FUNCTION SYSTEMS

24

Proof. The proof in [3] applies almost word-for-word to this generalisation, writing X in place of D. We have to replace the limn→∞ |Zn | = 1 with Zn → ∞ (meaning that for any compact set K ⊂ X, Zn 6∈ K for sufficiently large n). To generalise the proof of [3, Lemma 1], one needs the following statement. Suppose (xn ) → ∞ as n → ∞ and d (xn , yn ) is bounded, then yn → ∞ as n → ∞. This is true if and only if X is proper. The properness condition also allows us to use the original proof of [3, Lemma 3], where it is necessary that for any compact ball, the closed ball with the same centre but five times the radius is also compact.

Something along these lines was suggested in [3] but not made precise (no explicit condition was given on the metric space). All we claim to have added is the information that properness of X is the appropriate condition. That properness cannot be weakened much is demonstrated by example 2.1 above. The context of this result in the IFSs literature is discussed in appendix B.

2.3.1

Stability via Kantorovich metrics

We will now give an alternative proof of part 1 of Theorem 2.6, using the method of Kantorovich distances. This will serve as a model for the proofs of several asymptotic stability results. Suppose that (X, d) is a proper metric space and F is a non-uniformly contracting IFS on (X, d). If c = ϕ(d) is a modified metric as in §1.2, where

CHAPTER 2. ITERATED FUNCTION SYSTEMS

25

ϕ is strictly increasing, then F is also a non-uniformly contracting IFS on (X, c). The following lemma is the key to our use of Kantorovich metrics. Lemma 2.7. Let (X, c) be a proper metric space and let F be a non-uniformly contracting IFS on (X, c). Then the push-forward F∗ acts on Pc and is strictly distance-decreasing with respect to Kc . Also, F∗ does not increase the Prohorov distance πc . Proof. Suppose µ ∈ Pc . Then Z Z N X c (x, x0 ) d (F∗ µ) (x) = pi c (fi (x), x0 ) dµ(x) ≤

i=1 N X



Z

pi c (fi (x0 ) , x0 ) +

 c (fi (x), fi (x0 )) dµ(x)

i=1

≤ A+

N X

Z pi

c (x, x0 ) ,

i=1

where the additive constant A does not depend on the choice of µ. Thus F∗ acts on Pc . Suppose that ν1 , ν2 ∈ Pc are distinct. The infimum in the defining equation 1.1 is attained by a measure m0 ∈ P ν1 ,ν2 . The maps fi act on X × X via fi × fi : (x, y) 7→ (fi (x), fi (y)), so we can define F∗ m0 =

N X

pi (fi × fi )∗ m0 .

i=1

Note that π1 F∗ (m0 ) = F∗ ν1 and π2 m00 = F∗ ν2 . Since f1 is strictly distancedecreasing, we have Z Kc (F∗ ν1 , F∗ ν2 ) ≤

c(x, y) dm00

Z
α}) < α. But we also have F∗ (m)({(x, y) : c(x, y) > α}) < m({(x, y) : c(x, y) > α}) because the maps fi do not increase distance. So F∗ m is a certificate for the inequality πc (F∗ ν1 , F∗ ν2 ) < α. We now give an example to show that F∗ need not decrease Prohorov distances. Let δx and δy be the unit masses at distinct points x, y ∈ X such that c(x, y) < 1. Then πc (δx , δy ) = c(x, y). Let f1 be a constant map with value z, where c(x, z) and c(y, z) both exceed c(x, y), and let f2 be the identity map. Taking positive probabilities p1 and p2 = 1 − p1 > c(x, y) we have a 2-map non-uniformly contracting IFS, for which πc (F∗ (δx ) , F∗ (δy )) = πc (δx , δy ). Lemma 2.8. Let F be a non-uniformly contracting IFS on a proper metric space (X, d). Suppose that F has an invariant law µ, and take any ν ∈ P(X). Then the sequence F∗n ν is uniformly tight. First proof. By Lemma 1.8 there is a metric c = ϕ(d) such that ϕ is concave, strictly increasing and unbounded (so (X, c) is proper and F is non-uniformly contracting on (X, c)) and such that µ, ν ∈ Pc . By Lemma 2.7 the measures F∗n ν all lie in the ball B (µ, Kc (µ, ν)) in the Kc metric, which by Lemma 1.7 is uniformly tight. Second proof. Let  > 0. Because µ and ν are both tight, there is some compact set K ⊂ X such that µ(X \ K) <  and ν(X \ K) < . Let

CHAPTER 2. ITERATED FUNCTION SYSTEMS

27

m = µ × ν ∈ P(X × X). Then m(K × K) > 1 − 2. Using the push-forward F∗ on P(X × X) defined in the proof of Lemma 2.7, we have (π1 )∗ (F∗n m) = F∗n µ = µ and

(π2 )∗ (F∗n m) = F∗n ν .

Let r = diam(K) and let Nr (K) be the closure of the r-neighbourhood of K, which by the properness property is also compact. If (x, y) ∈ K × K and Fn ◦ · · · ◦ F1 (x) ∈ K then Fn ◦ · · · ◦ F1 (y) ∈ Nr (K). It follows that F∗n m (K × Nr (K)) is at least 1 − 3, and hence F∗n (ν) (Nr (K)) > 1 − 3. This bound is independent of n, as required. We can now give our alternative proof of the first part of Theorem 2.6. Alternative proof of Theorem 2.6(1). Consider the modified metric ρ = d/(1 + d), for which diamρ (X) ≤ 1, so Pρ = P(X). Let ν ∈ P(X). By Lemma 2.7 the sequence Kρ (F∗n ν, µ) is strictly decreasing. Suppose it converges to a positive limit λ. By Lemma 2.8, we can find a weakly convergent subsequence F∗nk ν → ν˜. Because Kρ metrizes the weak topology, we find Kρ (µ, ν˜) = λ. But since F∗ is distancedecreasing and therefore continuous, we also have F∗1+nk ν → F∗ ν˜. Then we have λ = Kρ (F∗ ν˜, µ) < Kρ (˜ ν , µ) = λ, a contradiction. Since Kρ (F∗n ν, µ) is strictly decreasing and does not converge to any positive limit, it must converge to 0, which implies that F∗n ν → µ weakly. So the fixed point µ attracts all laws, as required.

CHAPTER 2. ITERATED FUNCTION SYSTEMS

2.3.2

28

Ergodic properties in the stable case

A measurable dynamical system is called uniquely ergodic if it has only one non-zero invariant measure, up to multiplication by scalars. The invariant measure of a uniquely ergodic system is necessarily ergodic. The following lemma is a version of that statement for IFSs. It is a simple consequence of results of Ohno presented in [50, §I.1.2], but it is not explicitly stated in this way there. Lemma 2.9. Suppose that an N -map IFS F = (f1 , . . . , fN ; p1 , . . . , pN ) is uniquely ergodic, i. e. has a unique invariant law µ. Write Bp for the corresponding Bernoulli measure on the shift space {1, . . . N }N . Then the measure µ × Bp is ergodic for the skew-product map τ . Proof. Let µ be any F∗ -invariant Borel measure on X. We say that a bounded measurable function g on X is (F ∗ , µ)-invariant if F ∗ g = g, µ-a.e; a set A ⊂ X is called (F ∗ , µ)-invariant if its characteristic function χA is (F ∗ , µ)invariant. We say µ is ergodic if every F ∗ -invariant function is constant µ-a.e. It is shown in [50, §I.1.2] that µ is ergodic if and only if every (F ∗ , µ)-invariant set has µ-measure equal to 0 or 1, and that µ is ergodic if and only if µ × Bp is ergodic for the skew product map (in the usual sense of the term). If A is an (F ∗ , µ)-invariant set with 0 < µ(A) < 1, then consider the measure µ0 given by µ0 (B) = µ(A ∩ B)/µ(A) . This defines an F∗ -invariant law on X different from µ. Hence the uniqueness of the invariant measure implies that any (F ∗ , µ)-invariant set has µ-measure

CHAPTER 2. ITERATED FUNCTION SYSTEMS

29

either 0 or 1. This completes the proof of Lemma 2.9. The following corollaries tell us some circumstances in which we can plot a good picture of the invariant measure µ of an IFS F just by plotting sufficiently many points of a single random orbit. Corollary 2.10. If an N -map IFS F is asymptotically stable with invariant law µ then for µa.e. x0 ∈ X, we have the following weak convergence of empirical measures, almost surely: k−1

1X δx → µ . k i=0 i Recall that x0 is a non-random point, whose random orbit under the IFS F is the sequence (xi ). Proof. By Lemma 2.9, the measure µ × Bp is ergodic for the skew-product τ . For any function g ∈ C0 (X), we have g ◦ π1 ∈ L1 (X × Ω), so we can apply Birkhoff’s Pointwise Ergodic Theorem to conclude that k−1

1X lim g (xi ) = g (x0 , ω) k→∞ k i=0 exists (µ × Bp )-a.s. and is a τ -invariant function, hence equal to the constant R g := g dµ, for (µ × Bp )-a.e. point (x0 , ω). Now we use the fact that C0 (X) is separable (this follows from the properness of X, using a countable compact exhaustion). If g (x0 , ω) exists and equals g for each element g of a countable dense subset of C0 (X) (which again happens (µ × Bp )-a.e.) then it exists and equals g for every g ∈ C0 (X), which is precisely the statement that the empirical measures converge weakly to µ.

CHAPTER 2. ITERATED FUNCTION SYSTEMS

30

Corollary 2.11 (Weak law of large numbers). Suppose that X is a proper metric space and that the N -map IFS F on X is non-uniformly contracting and has an invariant law. Then for every x0 ∈ X, we have the following weak convergence of empirical measures, almost surely: k−1

1X δx → µ . k i=0 i Proof. Observe that if we have two sequences (xi ) and (yi ) such that the P empirical measures k1 k−1 i=0 δyi converge weakly to a law µ on X and such that d (xi , yi ) → 0 as i → ∞, then the empirical measures of the sequence (xi ) must also converge weakly to µ. This follows from the fact that every element of C0 (X) is uniformly continuous. By corollary 2.10 the former happens for µ-a.e. y0 ∈ X. Let K ⊂ X be a compact set such that µ(K) > 0. Consider the set A = K × {ω ∈ Ω : ω1 = 1}. Poincar´e’s Recurrence Theorem tells us that for (µ × Bp )-a.e. point (y0 , ω) ∈ A, τ n (y0 , ω) ∈ A for infinitely many n. Thus we may choose a point y0 ∈ X for which Bp -a.s. the empirical measures of (yi ) converge weakly to µ and also for infinitely many i, we have yi ∈ K and Fi+1 = f1 . For such a y0 and for any x0 ∈ X, define the random sequences (yi ) and (xi ) on the same shift space (Ω, Bp ). This is a simple form of coupling of the two Markov chains (xi ) and (yi ). We must show that d (xi , yi ) → 0 a.s. as i → ∞. Since d (xi , yi ) is non-increasing and X is proper, xi returns to some compact set L whenever yi returns to K. Suppose that d (xi , yi ) 6→ 0 as i → ∞; then we could choose a sequence ik along which xik → xˆ and yik → yˆ and such that ω1+ik = 1. Then d (xi , yi ) & d (ˆ x, yˆ) as i → ∞, but d (f1 (ˆ x), f1 (ˆ y )) < d(ˆ x, yˆ), so for k sufficiently large, d (x1+ik , y1+ik ) < d(ˆ x, yˆ), which is a contradiction.

CHAPTER 2. ITERATED FUNCTION SYSTEMS

2.4 2.4.1

31

Analytic IFSs on D Wolff–Denjoy Conjecture for analytic IFSs on D

Suppose F is an N -map IFS of analytic maps of D, and suppose that F has no invariant law. Theorem 2.4 then says that for any measure ν ∈ P(D) we have F∗n µ → 0 weakly against C0 (D), so the sequence F∗n µ eventually leaves every compact subset of P(D). The weak topology on P(D) coincides with  the subspace topology of P(D) ⊂ P D . In the case of a 1-map IFS, the  Wolff–Denjoy Theorem says that in P D we have F∗n ν → δα as n → ∞, where δα is the unit point mass at α ∈ D. Conjecture 1 (Wolff–Denjoy for IFSs). Let F be an N -map analytic IFS on D. Then there exists a measure µ ∈   P D such that for every ν ∈ P(D), we have F∗n ν → µ weakly against C D . Even in the special case where the maps extend continuously to D, the extension of F to D may not be asymptotically stable. For example, there are many invariant measures on ∂D for the map z 7→ z 2 . Lemma 2.12. Suppose that ϕ : R+ → R+ is an unbounded concave function with ϕ(0) = 0, and let ρ be the metric ϕ (dhyp ) on D, so that (Pρ (D), Kρ ) is a proper metric space. Let µn , νn ∈ Pρ (D) for each n ∈ N and let µ ∈ P(∂D). If µn → µ  weakly in P D as n → ∞ and the sequence Kρ (µn , νn ) is bounded, then  νn → µ weakly in P D . Corollary 2.13. Suppose that µ0 ∈ P(D), that µ ∈ P(∂D), and that there is a sequence

CHAPTER 2. ITERATED FUNCTION SYSTEMS

32

 nk → ∞ such that F∗nk µ0 → µ weakly in P D . Then for any ν ∈ P(D), we  have F∗nk ν → µ weakly in P D . Thus it suffices to prove conjecture 1 for just one ν ∈ P(D). Corollary 2.14.  If µ0 ∈ P(D) and µ ∈ P(∂D) then in P D we have F∗nk µ0 → µ as k → ∞ if and only if F∗1+nk µ0 → µ as k → ∞. Proof of Lemma 2.12 and Corollaries 2.13 and 2.14.  Let KE be the Kantorovich metric on P D associated to the Euclidean metric. Since D has finite Euclidean diameter, KE metrizes the weak topology  on P D , so we have KE (µn , µ) → 0 as n → ∞. Now KE (νn , µ) ≤ KE (µn , µ) + KE (µn , νn ) , so it suffices to prove that KE (µn , νn ) → 0. Let 0 < R < 1, and let DR be the open disc about 0 of hyperbolic radius R. We claim that for any x ∈ D and y ∈ D \ DR , and for R large enough, |x − y| ≤ c(R)ρ(x, y) ,

(2.1)

where c(R) → 0 as R → ∞. To show this, suppose that  > 0; we will find R0 such that if R ≥ R0 then (2.1) holds with  in place of c(R). Since |x − y| ≤ 2, this is immediate when ρ(x, y) ≥ 2/. Since ϕ is unbounded, this occurs for dhyp (x, y) ≥ S, where S is a constant depending on . On the other hand, since ϕ is concave, there is a constant A > 0 such that ρ(x, y) ≥ Adhyp (x, y) whenever dhyp (x, y) < S. Suppose that dhyp (0, y) ≥ R. Then dhyp (0, z) ≥ R−S for every point z on the hyperbolic geodesic γ from x

CHAPTER 2. ITERATED FUNCTION SYSTEMS

33

to y. Therefore for R sufficiently large, (say R ≥ R0 ), the hyperbolic density at each point on γ is at least 1/(A). Putting this together we have Z 1 1 ρ(x, y) = ϕ(dhyp (x, y)) ≥ A dhyp (x, y) ≥ A |dz| ≥ |x − y| , A γ  as required.  Now µn → µ weakly in P D and µ is supported on ∂D, so µn → 0 weakly in P(D). By the above comparison of metrics, KE (µn , νn ) ≤ 2 µn (DR ) + c(R) Kρ (µn , νn ) , which we can make arbitrarily small by choosing R and then n sufficiently large. For corollary 2.13, use Lemma 1.8 to choose ϕ so that ν, µ0 ∈ Pρ , then apply Lemma 2.7 to get boundedness of Kρ (F∗n µ0 , F∗n ν). Corollary 2.14 is proved similarly, taking ν = F∗ µ0 . To prove conjecture 1 it would be sufficient to construct a non-empty subset of P(D), forward-invariant under F∗ , with only one limit point in P(∂D). This is how Beardon’s proof of the Wolff–Denjoy Theorem works; the forward-invariant set there is a Euclidean disc tangent to the unit circle at one point, itself a limit of hyperbolic discs. Unfortunately, similar limits of Kantorovich balls in P(D) do not have unique weak limits in P(D), so further ideas are required.

CHAPTER 2. ITERATED FUNCTION SYSTEMS

2.4.2

34

An improved sufficient condition for stability

Combining ideas from [3] and [13] we can obtain a better sufficient condition than the one given by Ambroladze in [3] for an N -map IFS of analytic self maps of D to have an invariant measure. We will show that if the image of some finite composition of the maps is a Bloch subdomain of D then there exists an invariant law. Definition. Let U be a hyperbolic Riemann surface, with hyperbolic metric d. Then a subdomain V ⊂ U is a Bloch domain if and only if it has finite inradius, i. e. r = sup d(x, U \V ) < ∞. x∈V

Bloch subdomains are useful here because of the following result. Theorem 2.15. [13, Theorem 4.1] Suppose that U is a hyperbolic Riemann surface. If V is a Bloch subdomain of U with inradius r and f : U → V is conformal then f is Lipschitz with constant tanh r with respect to the hyperbolic metric on U . Remark: In [13] the theorem is only stated for plane domains, but the proof applies to arbitrary hyperbolic surfaces. Theorem 2.16. Let U be a hyperbolic Riemann surface, and let F be an IFS specified by analytic maps f1 , . . . , fN : U → U and probabilities pi > 0. Suppose that there is some finite composition of maps fi1 ◦ · · · ◦ fim that maps U into a Bloch subdomain of U . Then F is asymptotically stable.

CHAPTER 2. ITERATED FUNCTION SYSTEMS

35

Proof. By Theorem 2.6 we just have to show that F has an invariant law. We will use Letac’s principle, so we have to show that the sequence of reverse iterates converges to a constant map with probability 1. Since f1 (U ) is a Bloch domain, we can take κ < 1 a Lipschitz constant for f1 with respect to the hyperbolic metric on U . Pick x0 ∈ U . Define A = max {d (x0 , fi (x0 )) : i = 1 . . . N } . Then for any x ∈ U we have d(x, fj (x)) ≤ d(x, x0 ) + d(x0 , fj (x0 )) + d(fj (x0 ), fj (x)) ≤ 2d(x, x0 ) + A. Now for any x ∈ U put ym = F1 ◦ · · · ◦ Fm (x). Let h(t) be the (random) number of values 1 ≤ i ≤ t such that Fi = f1 . Then d(ym , ym+n ) ≤ ≤

m+n−1 X t=m m+n−1 X

d(yt , yt+1 )

(2.2)

κh(t) d(x, ft+1 (x))

(2.3)

t=m

≤ (2d(x, x0 ) + A)

m+n−1 X

κh(t)

(2.4)

t=m

Claim.

P∞

t=1

κh(t) < ∞ almost surely.

It follows from the claim that ym is almost surely a Cauchy sequence. U is a complete metric space, so we know that (ym ) converges almost surely. The inequality 2.4 above shows that the convergence is locally uniform in x; 0 furthermore if ym = F1 ◦ · · · ◦ Fm (x0 ), then 0 d(ym , ym ) ≤ κh(t) d(x, x0 ).

CHAPTER 2. ITERATED FUNCTION SYSTEMS

36

It also follows from the claim that h(t) → ∞ almost surely and therefore the limit function is almost surely constant. It remains to prove the claim. The convergence of the sum is a tail event, so has probability 0 or 1; we prove it converges with positive probability. Let wn be the number of steps between the (n − 1)th occurence and nth occurrence of Fi = f1 . The wn are non-negative and have finite expectation C, so P(wn ≥ κ−n/2 ) ≤ C.κn/2 . The wn are independent, so −n/2

P(∀n ≥ 1, wn ≤ κ

)≥1−

∞ Y

(1 − C.κn/2 ) > 0,

n=1

because

P∞

n=1

C.κn/2 converges. Now ∞ X t=1

h(t)

κ

=

∞ X

wn κn−1 ,

n=1

the right-hand side of which converges with positive probability, as required.

Note that the conditions for Theorem 2.16 may be satisfied even when no individual fi maps into a Bloch subdomain. Bloch subdomains form a strictly larger class than relatively compact subdomains. A particular advantage is that the Bloch property is preserved under lifting to covering spaces. In Chapter 3 this will allow us to apply Theorem 2.16 to prove uniqueness of invariant measures for certain holomorphic correspondences.

CHAPTER 2. ITERATED FUNCTION SYSTEMS

2.5

Convergence of reverse iterates

2.5.1

A generalisation of Letac’s Principle

37

Here we give a version of Letac’s Principle with weaker (albeit more complicated) hypotheses than the original. Theorem 2.17 (Generalised Letac’s Principle). Let F be any IFS of continuous maps. Suppose that for every sequence (nk ) in some non-empty family S, the corresponding subsequence of reverse iterates, i. e. Gnk , almost surely converges pointwise to a (random) constant. Suppose that ∞ 1. whenever (nk )∞ k=1 ∈ S, then also (1 + nk )k=1 ∈ S, and

2. for any two sequences (ak ), (bk ) ∈ S, there is a sequence (ck ) ∈ S such that (ak ) and (ck ) have a common infinite subsequence and (bk ) and (ck ) have a common infinite subsequence. Then the distribution of the limit point does not depend on the choice of (nk ) from S and is the unique invariant measure for F. The conditions here are more subtle than they appear at first sight. If Gnk almost surely converges pointwise then so does G1+nk , since F2 ◦ · · · ◦ Fnk +1 is identically distributed to Gnk and appending the continuous map F1 on the left does not affect convergence. Condition (1) is needed here because S need not be the family of all sequences such that the corresponding sequence of reverse iterates almost surely converges. (It seems unlikely that one could establish condition (2) for that family without using the conclusion of the theorem.)

CHAPTER 2. ITERATED FUNCTION SYSTEMS

38

Proof. Let (ak ), (bk ), (ck ) be as in condition (2) in the statement. By hypothesis, the sequences Gak , Gbk and Gck all converge pointwise almost surely to random constants a, b and c. Since (ak ) and (ck ) share a common infinite subsequence, we must have a = c and likewise we must have b = c. Therefore the limit does not depend on the choice of sequence from S. Applying this result to the sequences in condition (1) in the statement, we have lim G1+nk (x) = lim Gnk (x) a.s.,

k→∞

k→∞

so these two limits have the same distribution. Consider the shifted sequence of i. i. d. random maps Fei = Fi+1 . Note that    ∞ ∞ ei be the sequence of e Fi and (Fi )i=1 are identically distributed. Let G i=1   reverse iterates corresponding to Fei . Because F1 is continuous, we have lim G1+nk (x) = F1

k→∞



 en (x) . lim G k

k→∞

We have shown above that the distribution µ of the left-hand side equals the distribution of limk→∞ Gnk (x) . By construction this is also the distribution en (x). Since F1 is independent of (Fei )∞ , we obtain µ = F∗ µ, of limk→∞ G i=1 k as required. For uniqueness, let µ0 be any F∗ -invariant measure. Let x be a random point of X, independent of (Fi ) and distributed according to µ0 . Then the random variables Gnk (x) are all distributed according to µ0 . Almost sure pointwise convergence implies weak convergence in distribution, so µ0 = µ.

CHAPTER 2. ITERATED FUNCTION SYSTEMS

2.5.2

39

Converse for non-uniformly contracting IFSs

Our reason for generalising Letac’s Principle in Theorem 2.17 is the following result. It says that for a non-uniformly contracting IFS with an invariant law, there always exists a family of sequences satisfying the conditions of Theorem 2.17. So our generalised Letac’s Principle applies to every nonuniformly contracting IFS. Theorem 2.18 (Partial Converse of Theorem 2.17). Let (X, d) be a proper metric space. Let F = (f1 , . . . , fm : X → X ; p1 , . . . pm ) be a non-uniformly contracting IFS. Suppose that F has an invariant law µ. Then there exists a sequence (ak )∞ k=1 of positive integers with the following property: for any sequence (nk )∞ k=1 of positive integers with n1 ≥ a1 and nk − nk−1 ≥ ak for all k ≥ 2, the corresponding subsequence of reverse iterates Gnk = F1 ◦ · · · ◦ Fnk almost surely converges locally uniformly to a (random) constant. The family of such sequences satisfies the conditions on the family S in Theorem 2.17, so the limit does not depend on the choice of the sequence (nk ) and the distribution of the limit is the unique F∗ -invariant measure, namely µ. Proof. The key to the proof is the following nesting lemma, which depends on the same hypotheses as the theorem. Let α ∈ X be an arbitrarily chosen base point, which will remain fixed for the remainder of §2.5.

CHAPTER 2. ITERATED FUNCTION SYSTEMS

40

Lemma 2.19. There exists an increasing sequence of radii (rk )∞ k=0 , with rk → ∞ as k → ∞, and an increasing sequence of gaps ak ∈ N with the following property. For any sequence of times nk ∈ N such that n0 = 0 and nk − nk−1 ≥ ak for all k ≥ 1, there are almost surely infinitely many (random) offsets j ∈ N such that Fnj = f1

and

(∀k ≥ 1) g(k+j) (B (α, rk )) ⊂ B (α, rk−1 ) ,

(2.5)

where gk = Fnk−1 +1 ◦ · · · ◦ Fnk . We will prove Lemma 2.19 after showing how it is used to prove Theorem 2.18. The following geometric lemma is useful in both of these proofs. Lemma 2.20. Let K and L0 , L1 , L2 , L3 , . . . be compact subsets of a metric space (X, d), and suppose that f : X → X is strictly distance-decreasing. Suppose that for all i, Li ⊂ K and diam (Li+1 ) ≤ diam (f (Li )) . Then diam (Li ) ≤ ti , where ti → 0 as i → ∞, and ti depends only on i, K and f . Proof. We have diam (f (Li )) ≤ ϕ (diam(Li )) ,

CHAPTER 2. ITERATED FUNCTION SYSTEMS

41

where for t ≥ 0, ϕ(t) is defined by ϕ(t) = sup {d (f (x), f (y)) : x, y ∈ K and d(x, y) ≤ t} .

(2.6)

It follows that for all i ≥ 1 diam(Li ) ≤ ϕ◦i (diam(K)) . Since f is continuous and the subset of X × X over which the supremum in (2.6) is taken is compact, the supremum is attained, say at (xt , yt ). Therefore for all t > 0 we have ϕ(t) < t. We claim that for any t0 > 0, the iterates ti = ϕ◦i (t0 ) → 0 as i → ∞. Indeed, we have tn & r as n → ∞, and we can choose a subsequence along which (xt , yt ) converges to (xr , yr ). We conclude that d (xr , yr ) = r and also d (f (xr ) , f (yr )) = r, a contradiction unless r = 0. This completes the proof of lemma 2.20. Now we will show how to use Lemma 2.19 to prove Theorem 2.18. The conclusion of the theorem asserts the existence of a gap sequence (ak ); this will be the sequence (ak ) given by Lemma 2.19. Let (nk )∞ k=1 be a sequence (as in the theorem) such that n1 ≥ a1 and for all k ≥ 2, nk − nk−1 ≥ ak . Set n0 = 0 to get a sequence as in the lemma. Let j1 , j2 , . . . be the (random) sequence of values of j for which the nesting condition (2.5) is satisfied. We have B (α, r0 ) ⊃ g1+jm (B (α, r1 )) ⊃ g1+jm ◦ g2+jm (B (α, r2 )) ⊃ . . .  . . . ⊃ g1+jm ◦· · ·◦gjm+1 B α, r(jm+1 −jm ) ⊃ g1+jm ◦· · ·◦gjm+1 (B (α, r0 )) . (2.7) Fix a ∈ N and consider the sequence of sets Im = ga ◦ . . . gjm (B (α, r0 )) ,

CHAPTER 2. ITERATED FUNCTION SYSTEMS

42

defined for m sufficiently large, say m ≥ m0 (where m0 is minimal subject to jm0 ≥ a). Since (X, d) is proper, each Im is compact. Because of the inclusions (2.7) they are nested: whenever Im is defined then Im ⊃ Im+1 ⊃ Im+2 ⊃ . . . . We wish to show that almost surely their intersection is a (random) point, which we will call za . Since (X, d) is complete, it suffices to show that diam (Im ) → 0 as m → ∞. This follows from Lemma 2.20. To see this, fix m for the moment and in Lemma 2.20 take f = f1 , K = B (α, r0 ) and (for 0 ≤ i ≤ m − m0 )   Li = g1+jm−i ◦ · · · ◦ gj1+m−i ◦ · · · ◦ g1+jm−1 ◦ · · · ◦ gjm (B (α, r0 )) . (2.8) When they are defined, diam (Li+1 ) ≤ diam (f1 (Li )) because all the maps of F do not increase distances and the rightmost map in each ‘block’ in (2.8) is f1 . So diam (Im ) ≤ diam (Lm−m0 ) ≤ tm−m0 , where tm−m0 is the bound from Lemma 2.20 that only depends on K and f . Now we let m vary, but our choices of K and f do not vary, so the bounds tm−m0 converge to 0 according to Lemma 2.20. We have shown that diam (Im ) → 0 as m → ∞, as required. We digress briefly here to note that if we let a ∈ N vary then we obtain as above a random sequence (za ) of intersection points. For a ≥ 1 they satisfy za = ga (za+1 ) . We will return to the idea of a random infinite reverse orbit in the proof of Theorem 2.24.

CHAPTER 2. ITERATED FUNCTION SYSTEMS

43

Let L ⊂ X be any compact set. For i sufficiently large, L ⊂ B (α, ri ) so g1+jm ◦ · · · ◦ gi+jm (L) ⊂ B (α, r0 ) . Applying g1 ◦· · ·◦gjm to both sides, we find that the compact sets g1 ◦· · ·◦gk (L) converge to the point z1 as k → ∞. We have shown that the sequence Gnk = g1 ◦ · · · ◦ gk converges locally uniformly to the constant map with (random) value z1 . This is the first conclusion of Theorem 2.18. Next we show that the family of sequences (nk )∞ k=1 such that n1 ≥ a1 and nk − nk−1 ≥ ak for all k ≥ 2 satisfies the conditions of the generalised Letac principle. Condition (1) is obviously satisfied. For condition (2), to find the sequence (ck ), just take alternately a term from the sequence (ak ) and then a term from the sequence (bk ), always ensuring that the gaps are sufficiently long. This completes the proof of Theorem 2.18. Proof of Lemma 2.19. Choose any sequence of probabilities q1 , q2 , · · · ∈ (0, 1), such that

P

qk < ∞.

We will define inductively an increasing sequence of natural numbers (ak )∞ k=1 and a sequence of positive radii (rk )∞ k=0 such that rk → ∞ as k → ∞ and for k ≥ 1 we have (m ≥ ak ) =⇒ P [F1 ◦ · · · ◦ Fm (B (α, rk )) ⊂ B (α, rk−1 )] > 1 − qk .

(2.9)

Fix any positive integer b. We will show that with probability 1 the nesting event 2.5 occurs for some j ≥ b. This implies that with probability 1 it happens for infinitely many j.

CHAPTER 2. ITERATED FUNCTION SYSTEMS

44

As in the statement of Lemma 2.19, we consider a sequence (nk )∞ k=0 such that n0 = 0 and nk − nk−1 ≥ ak for all k ≥ 1. Also recall the definition of the compositions gk = Fnk−1 +1 ◦ · · · ◦ Fnk . Using the independence of the maps Fi , condition (2.9) gives P [gk (B (α, rk )) ⊂ B (α, rk−1 )] > 1 − qk . Since the sequence (ak ) is increasing, we have (for each positive integer j) P [gj+k (B (α, rk )) ⊂ B (α, rk−1 )] > 1 − qk . Since the maps Fi are i. i. d. , the compositions gk are independent. Therefore for any m ∈ N we have P [(∀k ≥ m) gj+k (B (α, rk )) ⊂ B (α, rk−1 )] >

∞ Y

(1 − qk ) > 0.

k=m

Now by independence of the maps Fn we obtain   Fnj = f1 and for all 1 ≤ k < m  = 1. P ∃j ≥ b such that gj+k (B (α, rk )) ⊂ B (α, rk−1 ) Combining the last two equations gives   ∞ Y Fnj = f1 and for all 1 ≤ k < m   P ∃j ≥ b such that > (1 − qk ) . gj+k (B (α, rk )) ⊂ B (α, rk−1 ) k=m But m was arbitrary and 

Q∞

k=m

P ∃j ≥ b such that

(1 − qk ) % 1 as m → ∞, so we find  Fnj = f1 and for all k ≥ 1  = 1. gj+k (B (α, rk )) ⊂ B (α, rk−1 )

CHAPTER 2. ITERATED FUNCTION SYSTEMS

45

It remains to show that we can construct the sequences nk and rk as required; we choose the rk first, and then show that the condition describing which sequences nk we can use is of the form nk+1 − nk ≥ ak , as in the statement of the theorem. Let us choose the radii rk , for k ≥ 0. We know that F∗n (δα ) → µ weakly, so we also have

n−1

1X i F (δα ) → µ n i=0 ∗

as n → ∞.

Therefore we can choose radii rk and integers mk such that for each k ≥ 1, m ≥ mk−1 implies both F∗m (δα ) (B (α, rk−1 − 1)) > 1 − and

qk 3

m−1 p 1 qk 1 X i . F∗ (δα ) (B (α, rk−1 − 1)) > 1 − m i=0 6

(2.10)

(2.11)

At the same time we make sure that ri → ∞ as i → ∞. From (2.11) it follows that   qk  Fi ◦ · · · ◦ F1 (α) ∈ B (α, rk−1 − 1) for   (∀n ≥ mk−1 ) P >1− . p1  3 at least n 1 − values 0 ≤ i < n 2 (2.12) Let Xn be the number of maps F1 , . . . , Fn that take the value f1 . Let En be the event that both Xn ≥

2np1 3

and Fi ◦ · · · ◦ F1 (α) ∈ B (α, rk−1 − 1) for at least n(1 − p1 /2) of the values 0 ≤ i < n. (2.13)

CHAPTER 2. ITERATED FUNCTION SYSTEMS When En occurs, there must be at least

2np1 3



np1 2



46 values 0 ≤ i < n such

that both Fi ◦ · · · ◦ F1 (α) ∈ B (α, rk−1 − 1) and Fi+1 = f1 .

(2.14)

It follows from (2.12) that   2np1 qk − P Xn ≤ . P (En ) ≥ 1 − 3 3 Xn is binomially distributed with parameters (n, p1 ), so for n sufficiently large, 

2np1 P Xn ≤ 3


ak−1 . Then if n ≥ ak we have P [F1 ◦ · · · ◦ Fm (B (α, rk )) ⊂ B (α, rk−1 )] > 1 − qk . We have established that (2.9) is satisfied. This completes the proof of Lemma 2.19.

2.5.3

The natural extension of the skew product

The reader may have wondered why we defined the skew product τ as an extension of the unilateral shift rather than the bilateral shift. In this section we treat the skew product τˆ over the bilateral shift that arises from a stable non-uniformly contracting N -map IFS F. Using Theorem 2.18, we will show that there is a unique τˆ-invariant law that makes τˆ into an extension of the bilateral shift with a given Bernoulli measure, and that this extension is in fact an isomorphism; both m. p. d. s. are isomorphic to the natural extension of the original skew product τ . The invertibility of τˆ as an m. p. d. s. will enable us in section §2.5.4 to apply the Birkhoff ergodic theorem to study the sequence of reverse iterates. The idea of using the natural extension of the skew product to obtain ergodic theorems about the reverse iterates has been appeared elsewhere, e.g. [30]. The bilateral shift space is ˆ = {1, . . . , N }Z , Ω

CHAPTER 2. ITERATED FUNCTION SYSTEMS

48

ˆ which is the completion of the product σ-algebra with with the σ-algebra M respect to the Bernoulli measure Bp that corresponds to the probability vector p = (p1 , . . . , pN ). The left-shift map σ ˆ acts on a symbol sequence ω = (ωi )∞ i=−∞ thus: (ˆ σ ω)i = ωi+1 .   ˆ M ˆ,B ˆp , σ There is a homomorphism φ from Ω, ˆ to (Ω, M, Bp , σ), given by restriction:  φ (ωi )i∈Z = (ωi )i∈N . Suppose that (W, C, ν, T ) is an invertible m. p. d. s. with a homomorphism α to (Ω, M, Bp , σ). Then there is an essentially unique homomorphism α ˆ: ˆ such that α = φ ◦ α W →Ω ˆ . We can construct α ˆ by setting  (ˆ α(w))i = α(T i w) 0 ,

for i ∈ Z.



 ˆ ˆ ˆ It follows that Ω, M , Bp , σ ˆ is the natural extension of (Ω, M, Bp , σ). Now suppose that (X, d) is a proper metric space with Borel σ-algebra B. Suppose that F is a stable non-uniformly contracting N -map IFS on (X, d), with probability vector p and invariant law µ. Recall the skew-product map τ : X × Ω → X × Ω defined in §2.1.1. We denote the Cartesian projections by πΩ : X × Ω → Ω , πX : X × Ω → X . πΩ is a homomorphism from (X × Ω, B × M, µ × Bp , τ ) to the unilateral shift (Ω, M, Bp , σ).

CHAPTER 2. ITERATED FUNCTION SYSTEMS

49

We now define the skew product over the bilateral shift: ˆ →X ×Ω ˆ, τˆ : X × Ω

(x, ω) 7→ (fω1 (x), σ ˆ ω) ,

ˆ → Ω. ˆ and the Cartesian projection πΩˆ : X × Ω The following diagram commutes:

(id, φ)

ˆ X ×Ω

- X ×Ω

τ

τˆ

πΩ

-

-

ˆ (id, φ) X ×Ω

πΩˆ

- X ×Ω

πΩˆ ?

φ

ˆ Ω

? -Ω

πΩ σ

σˆ

ˆ Ω

φ

-

-

?

? -Ω

ˆ so as to make the left, We wish to put a probability measure on X × Ω right, top and bottom faces of this diagram into extensions of m. p. d. s., ˆ are given the Bernoulli measures Bp and B ˆp respectively and when Ω and Ω X × Ω is given the probability measure µ × Bp .

CHAPTER 2. ITERATED FUNCTION SYSTEMS

50

Lemma 2.21. ˆ such that π ˆ is a There exists a unique Borel probability measure Q on X × Ω Ω     ˆ ˆ ˆ ˆ ˆ homomorphism of m. p. d. s. from X × Ω, B × M, Q, τˆ to Ω, M , Bp , σ ˆ , ˆ is the completion of the σ-algebra B × M ˆ with respect to where B × M Q. Moreover, πΩˆ is then an isomorphism between natural extensions of (X × Ω, B × M, µ × Bp , τ ). Proof. We begin by constructing a suitable measure Q using Theorem 2.18. Now that we are working with a bilateral shift, it is convenient to connect the reverse iterates with the skew product. To do this, we will discard our original definition of Gn in terms of the maps F1 , . . . , Fn and instead model ˆ M, ˆ B ˆp ) by extending the the reverse iterates Gn on the probability space (Ω, definition Fn = fωn

for each n ∈ Z ,

and then for n ∈ N setting Gn = F0 ◦ F−1 · · · ◦ F1−n = fω0 ◦ fω−1 ◦ · · · ◦ fω1−n . We have not changed the distribution of the sequence (Gn )∞ 1 , so Theorem 2.18 still applies to the sequence (Gn ). Here is the connection of our new reverse iterates with the skew product: τ n (x, σ ˆ −n ω) = (Gn (ω)(x), ω) .

(2.16)

Let (nk ) be any suitable sequence for Theorem 2.18. Then we define L(ω) to be the random constant limit in Theorem 2.18, i.e. L(ω) = lim Gnk (α) . k→∞

CHAPTER 2. ITERATED FUNCTION SYSTEMS

51

L(ω) is a Bp -a.e. defined function of ω. For all i ∈ Z and n ∈ N, we have Gn+1 (ˆ σ i ω) = fωi ◦ Gn (ˆ σ i−1 ω).

(2.17)

Since the sequence (1 + nk ) is also admissible for Theorem 2.18, taking the limit in (2.17) as k → ∞ yields L(ˆ σ i ω) = fωi ◦ L(ˆ σ i−1 ω) .

(2.18)

Consider the map ˆ →X ×ω L# = (L, id) : Ω ˆ. L# is Borel-measurable because L is Borel-measurable, being a Bp -a.e. limit of continuous functions. We define Q using as the push-forward by L# of the Bernoulli measure: ˆp . Q = (L# )∗ B To show that Q is a τˆ-invariant probability measure, it suffices to check that L# ◦ σ ˆ = τ ◦ L# , which is the content of (2.18). Since πΩˆ ◦ L# = id, we have (πΩˆ )∗ Q = Bp , L# ◦ πΩˆ = id ,

and also Q-a.e. ,

so πΩˆ is an isomorphism as required. The uniqueness of Q will follow once we show that (id, φ) is a homomorphism that makes τˆ into the natural extension of τ . To this end, let α be a homomorphism from (W, C, ν, T ) to (X × Ω, B × M, µ × Bp , τ ). Then πΩ ◦ α

CHAPTER 2. ITERATED FUNCTION SYSTEMS

52

is a homomorphism, so is essentially uniquely expressible as α = φ ◦ β, where   ˆ M ˆ,B ˆp , σ β is a homomorphism from (W, C, ν, T ) to Ω, ˆ . We wish to prove that α(w) = (L(β(w)), φ(β(w))) ,

for ν-a.e. w ∈ W

(2.19)

for then α factors in an essentially unique way through β as α = (id, φ) ◦ L# ◦ β = (L, φ) ◦ β . To prove (2.19) we use the locally uniform convergence of the reverse iterates Gnk given by Theorem 2.18. Let  > 0. Since (X, d) is proper, we can choose a compact set K ⊂ X such that 1 −  < µ(K) = ν(K × Ω) . −1 Let (nk )∞ k=1 be a suitable sequence in Theorem 2.18. Because ν is T

invariant, we have ν



  w ∈ W : α T −nk (w) ∈ K for infinitely many k ∈ N > 1 −  .

On the other hand, Bp -a.s. Gnk converges uniformly on K to L(ω) as k → ∞. It follows that ν ({w ∈ W : πX (α(w)) = L(β(w))}) > 1 −  , which proves (2.19) since  was arbitrary. The reader should note that the τˆ-invariant measure Q is in general not ˆ co-ordinates are not independent. a product measure, i.e. the X and Ω Lemma 2.21 genuinely relies on fact that the maps are non-expanding. For a counterexample, consider the 1-map IFS given by the doubling map

CHAPTER 2. ITERATED FUNCTION SYSTEMS

53

modulo 1, with µ being Lebesgue measure. In this case the skew product m. p. d. s. is isomorphic to the unilateral Bernoulli shift on two symbols  with probabilities 12 , 12 , although the underlying shift is the Bernoulli shift on one symbol, i.e. the trivial m. p. d. s. . The natural extensions are the corresponding bilateral shifts, which are not isomorphic. We can use L to construct the isomorphism from σ ˆ to the standard construction of the natural extension of τ that was explained in §1.4. To do this, we must associate to Bp -almost every symbol sequence ω = (ωn )∞ −∞ a bi-infinite orbit (xi ) (ω), satisfying (∀i ∈ Z)

xi = fωi (xi−1 ) = Fi (xi−1 ) .

(2.20)

For every i ∈ Z, the distribution of the i-th co-ordinate xi must be µ. These conditions are satisfied when we set   xi (ω) := L σ ˆ i ω = πX τˆi (x0 , ω) . τˆ−1 is ergodic, since it is isomorphic to σ ˆ . This is useful because the random backwards orbit (x−n )∞ ˆ−1 . n=0 is a projection of the forward orbit of τ In particular we can apply the Birkhoff Pointwise Ergodic Theorem to τˆ−1 to obtain information about averages of functions along the sequence (x−n )∞ n=0 . Since Gn (x−n ) = x0 , we can use this to obtain information about the complete sequence of reverse iterates, where previously we only had information about sparse subsequences.

CHAPTER 2. ITERATED FUNCTION SYSTEMS

54

Lemma 2.22. For Bp -a.e. ω ∈ Ω, the backward orbit (x−n )∞ n=0 (ω) constructed above is uniquely determined by (2.20) together with the following property: for any compact set K ⊂ X, n−1

1X χK (x−j ) → µ(K) as n → ∞ , n j=0 where χK is the characteristic function of K. Proof. To see that the property is satisfied Bp -almost surely, we apply the   ˆ Q, τˆ ˆ B × M, Birkhoff Pointwise Ergodic Theorem to the m. p. d. s. X × Ω, and the L1 function χK ◦ πX , using χK (x−n ) = χK ◦ πX ◦ τˆ−n (x0 , ω) and Z R χK ◦ πX dQ = χK dˆ µ = µ(K) . For the uniqueness, suppose that for some symbol sequence ω there are two orbits (xn ) and (yn ), both satisfying this criterion for every compact K ⊂ X. We may choose a compact set K such that µ(K) > 12 . Then n−1

lim inf n→∞

1X χK (x−j ) χK (y−j ) ≥ 2µ(K) − 1 > 0, n j=0

Bp -a.s.,

so at infinitely many negative times we must have d (x−j , y−j ) ≤ diam(K). Because the maps fi do not increase distance, we have d (x−j , y−j ) ≤ diam(K) for all j ∈ Z. As was shown in the last part of the proof of Lemma 2.21, Bp -a.s. there is a compact K 0 ⊂ X such that x−nk ∈ K 0 for infinitely many k ∈ N. For these values of k, y−nk is in K 00 , the closed diam(K)neighbourhood of K 0 , which is compact because X is proper. Meanwhile,

CHAPTER 2. ITERATED FUNCTION SYSTEMS

55

Gnk converges uniformly on K 00 to x0 , so y0 = Gnk (y−nk ) → x0

as k → ∞,

i.e. y0 = x0 , Bp -a.s. The criterion of the lemma is unchanged if we make the index of summation run from some fixed m instead of 0; the above argument then yields xm = ym , Bp -a.s.

2.5.4

Strong converse for metrically taut spaces

Definition (Iancu and Williams). A metric space (X, d) is taut if for every x, y ∈ X and every  > 0 there exists a sequence x = z0 , z1 , z2 , . . . , zn = y of points such that d (zi−1 , zi ) ≤  for all i = 1, . . . , n, and n X

d (zi−1 , zi ) ≤ d(x, y) +  .

i=1

For example, every Riemannian manifold is a taut metric space. Lemma 2.23. Suppose that (X, d) is proper. Then (X, d) is taut if and only if for every x 6= y ∈ X and 0 < t < d(x, y), there exists z ∈ X with d(x, z) = t and d(z, y) = d(x, y) − t. Proof. The ‘if’ direction is trivial. For ‘only if’, suppose we are given x 6= y in a taut metric space (X, d) and 0 < t < d(x, y). Then for any  > 0 we obtain a chain of points zi as in the definition of tautness, and we choose z to be the first of these such that d(x, z ) ≥ t

CHAPTER 2. ITERATED FUNCTION SYSTEMS

56

Then d(x, z ) ≤ t +  and d(z ) ≤ d(x, y) − t +  . Since (X, d) is proper, the ball B(x, d(x, y)) is compact, so we can choose a sequence i → 0 as i → ∞ such that zi converges; take z in the lemma to be the limit of this sequence. We will in fact only use the equivalent condition given by Lemma 2.23. Various properties of general taut metric spaces are developed in [43]. Theorem 2.24. Suppose that (X, d) is proper and metrically taut. Let F = (f1 , . . . , fN : X → X; p1 , . . . pN ) be a non-uniformly contracting IFS, on (X, d) and suppose that F has an invariant law µ. Then the sequence (Gk ) of reverse iterates almost surely converges locally uniformly to a (random) constant. Corollary 2.25. Suppose that F is an N -map analytic IFS on D with an invariant law µ. Then the sequence of reverse iterates almost surely converges locally uniformly to a random constant, whose distribution is µ. Proof. The condition of metric tautness only comes into play in Lemma 2.27, near the end of the proof. Firstly we rephrase the conclusion of the theorem as a large deviations statement. The sequence of maps Gn converges locally uniformly to the

CHAPTER 2. ITERATED FUNCTION SYSTEMS

57

constant x0 if and only if for every neighbourhood U of x0 and every compact set K ⊂ X we have K ⊂ G−1 n (U ) for n sufficiently large. We know that x−n ∈ G−1 n ({x0 }). Also, there is a non-decreasing sequence (sn )∞ n=0 such that d (x−n , y) ≤ sn =⇒ Gn (y) ∈ U . So the question is whether the sequence (sn ) grows rapidly enough that for sufficiently large n this will be satisfied for all y ∈ K. Thus we need 1. a statement about the large deviations of d (α, x−n ) and 2. a statement about the growth of sn = d (x−n , X\G−1 n (U )). It is for the second of these that we will use the metric tautness property. We will deal first with the common building blocks for the two statements above. The next step in the proof of Theorem 2.24 is to use the natural extension of the skew product map τ associated with F, as in 2.5.3. We assume that f1 is strictly distance-decreasing. Let K ⊂ X be compact. Now apply the Birkhoff Pointwise Ergodic Theorem in the same way as in the proof of Lemma 2.22 to the indicator function for the event (x0 ∈ K and ω1 = 1) . Since the co-ordinate x0 is a function purely of the ωi : i ≤ 0, and ω1 is independent of these, we have Z χ {x0 ∈ K and ω1 = 1} dˆ µ = p1 µ(K). Hence n−1

1X χ {x−j ∈ K and ω1−j = 1} = p1 µ(K) n→∞ n j=0 lim

Bp -a.s.

CHAPTER 2. ITERATED FUNCTION SYSTEMS

58

So with probability 1, the random backwards orbit reasonably often lands in K when the next map is f1 . We are now in a position to prove a suitable large deviations result for d (α, x−n ). Lemma 2.26. Fix any positive distance t. Then Bp -almost surely, d (α, x−n ) < nt for all sufficiently large n. Proof. The proof is by contradiction. The statement “d (α, x−n ) > nt for infinitely many positive n” in itself does not violate the ergodic properties established above. However, if d (α, x−n ) > nt then d (α, x−m ) > mt/2 for n ≤ m ≤ n(1 + c), for some c > 0 which depends on t but not on n. The reason is that d (α, x−j ) ≤ d (α, F−j (α)) ≤

+

d F−j (α), F−j x−(1+j)  max d (α, fi (α)) + d α, x−(1+j) .

i=1,2,...,N

Define R=

max d (α, fi (α)) < ∞ .

i=1,2,...,N

If d (α, x−n ) > nt then d (α, x−j ) > jt/2 whenever  n≤j≤n

R+t R + t/2

 .



CHAPTER 2. ITERATED FUNCTION SYSTEMS

59

We may choose a large compact set K ⊂ X such that µ(K) >

R + 2t . R+t

Suppose now that for infinitely many positive n we have d (α, x−n ) > nt. Then for infinitely many positive n we have n−1

R + 2t 1X χK (x−j ) < , n j=0 R+t contrary to Lemma 2.22. Next we use the metric tautness property to give a lower bound on the growth of the pre-images of an arbitrary neighbourhood U of x0 . Lemma 2.27. There is a distance t0 > 0 such that  0 sn = d x−n , X\G−1 n (U ) ≥ nt for all sufficiently large n, Bp -a.s. Proof. Choose  > 0 so that B (x0 , ) ⊂ U . Define, for a ≤ b, r(x, a, b) = sup {r : Fa ◦ · · · ◦ Fb (B(x, r)) ⊂ B (Fa ◦ · · · ◦ Fb (x), )}  = d x, X\ (Fa ◦ · · · ◦ Fb )−1 (B(Fa ◦ · · · ◦ Fb (x), ) . We now show that (r(x, a, b) − ) is superadditive in the sense that for a ≤ b ≤ c we have (r(x, a, c)−) ≥ (r(x, b+1, c)−) + (r (Fb+1 ◦ · · · ◦ Fc (x), a, b) − ) . (2.21) To see this, let y be any point in the closed ball B(x, S) where S is the right-hand side of (2.21).

We have to show that Fa ◦ · · · ◦ Fc (y) lies in

CHAPTER 2. ITERATED FUNCTION SYSTEMS

60

B (Fa ◦ · · · ◦ Fc (x), ). The metric tautness condition tells us that there is a point z such that d(x, z) ≤ r(x, b + 1, c) and d(z, y) ≤ r (Fb+1 ◦ · · · ◦ Fc (x), a, b) −  . Then, by definition, d (Fb+1 ◦ · · · ◦ Fc (x), Fb+1 ◦ · · · ◦ Fc (z)) ≤ . Because the maps do not increase distance, d (Fb+1 ◦ · · · ◦ Fc (z), Fb+1 ◦ · · · ◦ Fc (y)) ≤ d(z, y) , so by the triangle inequality, d (Fb+1 ◦ · · · ◦ Fc (x), Fb+1 ◦ · · · ◦ Fc (y)) ≤  + d(z, y) ≤ r (Fb+1 ◦ · · · ◦ Fc (x), a, b) . Therefore we have d (Fa ◦ · · · ◦ Fc (x), Fa ◦ · · · ◦ Fc (y)) ≤  , which establishes (2.21). Now we apply the superadditivity repeatedly to obtain r (x−n , 1 − n, 0) −  ≥

n X

r (x−j , 1 − j, 1 − j) − .

j=1

Next, we show that when Fk = f1 , then r(x, k, k) −  > 0 for every x ∈ X. Suppose not, then there is a point x ∈ X and a sequence of points (zn ) such that d (zn , x) →  as n → ∞ and d (f1 (zn ) , f1 (x)) > . Because X is

CHAPTER 2. ITERATED FUNCTION SYSTEMS

61

proper, there is a convergent subsequence with limit z, say. Then d(z, x) =  but d (f1 (z), f1 (x)) ≥ . Since f1 is strictly distance-decreasing, this is a contradiction. We now apply the Birkhoff Pointwise Ergodic Theorem to the the averages of the positive function r(ω) = r (x−1 , 0, 0) along the forward orbits of the ergodic transformation σ ˆ −1 . We find that Bp -a.s., Z n 1X lim inf r (x−j , j − 1, j − 1) −  ≥ E(r(ω) − ) dµ n→∞ n j=1

>

0.

(We do not know that the function r is in L1 (Bp ), so we have to apply the Birkhoff Pointwise Ergodic Theorem to truncations of the function in order to get this, which is why we have a lim inf rather than a limit). Hence lim inf n→∞

r (x−n , 0, n − 1) −  n

>

0,

Bp -a.s.

Therefore the conclusion of the lemma is satisfied by taking Z 0 t = E(r(ω) − ) dµ .

At the cost of using a less well-known result, we could shorten this proof slightly by applying Kingman’s Subadditive Ergodic Theorem instead of Birkhoff’s Pointwise Ergodic Theorem. Finally we combine Lemma 2.26 and Lemma 2.27 to complete the proof of the theorem. Choose t = 2t0 , so that for sufficiently large n we have  0 d x−n , X\G−1 n (U ) ≥ d (α, x−n ) + nt , and K ⊂ B(α, nt0 ) for sufficiently large n. Therefore K ⊂ G−1 n (U )

for sufficiently large n, Bp -a.s.

Chapter 3 Holomorphic correspondences 3.1

Introduction

In this chapter we will study the geometry and dynamics of holomorphic correspondences on compact Riemann surfaces. Polynomials and rational maps on the Riemann sphere are objects belonging to complex algebraic geometry, yet in studying their dynamics, complex analysis must be applied in ways that do not belong to algebraic geometry. This is because we are interested in describing asymptotic behaviour with respect to the usual topology rather than the Zariski topology. We will do the same with holomorphic correspondences, introducing them as objects of algebraic geometry then switching to the point of view of geometric function theory to obtain conclusions about the invariant probability measures of the associated Markov chains. In doing this we will make use of some of the results and techniques of Chapter 2.

62

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

3.1.1

63

Correspondences – the algebraic viewpoint

The following definitions mostly follow the exposition given in [37, §2.5], although some of the notation follows [21]. Let C and C 0 be (smooth, projective) algebraic curves over C, which we may also think of as compact Riemann surfaces. The dth symmetric product (C 0 )(d) is a compact complex manifold [37, p. 236]. A correspondence T : C → C 0 of degree d associates to every point p ∈ C an effective divisor T (p) of degree d on C 0 , varying holomorphically with p, in the sense that it is a holomorphic map from C to (C 0 )(d) . Equivalently we can specify T by its curve of correspondence D = {(p, q) : q ∈ T (p)} ⊂ C × C 0 . Then T (p) is obtained by pullback of divisors, as T (p) = i∗p (D), where ip is the inclusion map C 0 → C × C 0 such that ip (q) = (p, q). When D is irreducible then we call the correspondence T irreducible. Note that D need not be a smooth curve in C ×C 0 , but nevertheless there is a compact Riemann ˜ (the desingularization of D), with analytic maps π1 : D ˜ → C and surface D ˜ → C 0 such that π2 : D T (p) =

X

vπ1 (z) . π2 (z) = π2 ◦ π1∗ (p) ,

˜ : π1 (z) = p z∈D where vπ1 (z) is the valency of π1 at z. We will sometimes think of the correspondence as a multivalued map T = π2 ◦ π1−1 . The inverse of the correspondence T is the correspondence T −1 given by π1 ◦π2∗ . (We have reversed the roles of C and C 0 while keeping the same curve of correspondence.) The degree d0 of T −1 need not be the same as that of T .

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

64

˜ is called forwardThe pair (d0 , d) is called the bidegree of T . A point z ∈ D singular if vπ1 (z) > 1 and backward-singular if vπ2 (z) > 1. Accordingly a point p ∈ C is called forward-singular if T (p) is supported on fewer than d points, which happens if and only if p = π1 (z) for some forward-singular ˜ Similarly q ∈ C 0 is called backward-singular if and only if point z ∈ D. T −1 (q) is supported on fewer than d0 points. In [37], the point (p, q) ∈ D is called a coincident point if q appears in T (p) with multiplicity greater than 1. If D has a node at (p, q), then (p, q) is a coincident point but need not be forward-singular. Coincident points are important in enumerative algebraic geometry, but do not have much connection with dynamics; in contrast forward-singular and backward-singular points turn out to be very important in studying the dynamics of a correspondence. b →C b is a rational map of degree n. Then we may Suppose that R : C consider R as a correspondence of bidegree (n, 1) and R−1 as a correspondence of bidegree (1, n). Note that a critical point of R is not a singular point, but a critical value is both a backward-singular point for the correspondence R and a forward-singular point of the correspondence R−1 . Given two correspondences T : C → C 0 and S : C 0 → C 00 , we may P compose them as follows. Expressing T (p) = q∈C 0 m(q).q, we define S ◦ P T (p) = q∈C 0 m(q)T (q) , an effective divisor on C 00 . Note that unless the correspondence T has bidegree (1, 1), then the composite T −1 ◦ T will not be the identity. If d0 = 1 then T −1 ◦ T will be the correspondence p 7→ d.p. From the point of view of dynamics, we will wish to consider two correspondences T : C → C and T 0 : C 0 → C 0 as equivalent when there exists an isomorphism of algebraic curves φ : C → C 0 such that φ−1 ◦ T 0 ◦ φ = T as

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

65

divisors. When this happens, we say that T and T 0 are conjugate.

3.1.2

The geometric function theory viewpoint

A holomorphic correspondence is a multivalued analytic map between compact Riemann surfaces which is locally defined except at a discrete set of singular points by a finite set of of analytic branches. Each connected component of the graph of the correspondence in the neighbourhood of a singular point is given analytically by a Puiseux expansion, i. e. a power series in some fractional power of a local co-ordinate.

3.1.3

Dynamics on the Jacobian

In order to iterate the correspondence T , we must have C 0 = C. To carry out iteration using only the constructions of algebraic geometry, we must extend T linearly to T : Div(C) → Div(C); then T n will associate to each point of C a divisor of degree dn . This does not constitute an interesting dynamical system because the degree of the divisors varies with time, so it is not very meaningful to compare distinct points of an orbit. A natural way to surmount this difficulty is to consider the action of T on Div0 (C). Consider a principal divisor (g) on C. We have T ((g)) = (π2 )∗ ◦ π1∗ ((g)) = (π2 )∗ ((g ◦ π1 )) = (h) , where h is the meromorphic function on C such that h(z) is the product of the values of g ◦ π1 over the points of π2−1 (z), repeated according to multiplicity. h is a non-zero meromorphic function. Thus T maps principal divisors to

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

66

principal divisors. Hence T descends to a linear map on the Jacobian of C, which we write as T : Pic0 (C) → Pic0 (C). When C has positive genus, a correspondence T : C → C is said to have valence k ∈ Z if the linear equivalence class of the divisor T (p) + k.p does not depend on p. This happens if and only if T : Pic0 (C) → Pic0 (C) is multiplication by −k. Not every correspondence has valency, but on a generic Riemann surface there are no correspondences without valency, since the Jacobian is a complex torus which typically has no endomorphisms other than multiplication by elements of Z. For example, an elliptic curve (which is isomorphic to its Jacobian) has a correspondence without valency if and only if it has complex multiplication: in this case the map T : Pic0 (C) → Pic0 (C) is a non-integer endomorphism. Conversely any non-integer endomorphism is itself a correspondence without valency. In §3.5.2 we will construct some irreducible correspondences of bidegree (2, 2) without valency.

3.1.4

Separable correspondences

A holomorphic correspondence T : C → C 0 of bidegree (d0 , d) is said to be separable if T = π3−1 ◦ π4 for some analytic maps π3 : C 0 → C 00

and

π4 : C → C 00 .

If T is separable and the divisors T (p) and T (q) have a point in common, then T (p) = T (q). In the dynamical case C = C 0 , we can understand everything

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

67

about the dynamics of a separable correspondence by studying the associated correspondence π4 ◦ π3−1 : C 00 → C 00 . This correspondence will sometimes be simpler to study because the genus of C 00 is less than or equal to the genus of C. Lemma 3.1. b then T has valence 0. If T : C → C is separable with C 00 = C, Proof. In this case π3 and π4 are meromorphic functions and T (p) − T (q) = π3∗ (π4 (p) − π4 (q)), which is a principal divisor since π4 (p) − π4 (q) is principal.

3.1.5

Correspondences as Markov chains

Given a correspondence T : C → C of degree d > 0, we will consider an associated Markov chain taking values in C, defined as follows. Let X0 be a C-valued random variable, possibly constant. Express T (Xn ) =

k X

m i . pi ,

i=1

for distinct points pi ∈ C, then set P(Xn+1 = pi ) = mi /d . We also insist that the transitions at distinct times are mutually independent and independent of X0 . In general this Markov chain cannot be expressed as an IFS of continuous (single-valued) maps. We could make finitely many branch cuts along smooth arcs. Then the orbits (Xn )∞ 1 would be those of an IFS F of finitely

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

68

many functions, each one analytic except on the chosen arcs. However the choice of arcs is rather arbitrary and because the individual functions are not continuous at the cuts we cannot apply any of the theory of continuous IFS. We consider the curve C with the usual topology (inherited from a projective embedding), not the Zariski topology. This makes C compact. For a continuous function f on C, we define the pull-back T ∗ (f ) to be the function given by T ∗ (f )(p) = where T (p) =

P

X mi d

f (pi ) ,

mi pi . Observe that T ∗ acts on C(C). This is easily checked

locally. In fact T ∗ is the Perron–Frobenius operator of the IFS F, which is therefore good in the sense of §2.1.2. Furthermore T ∗ : C(C) → C(C) is a bounded linear operator. We can also describe the Markov operator of F purely in terms of T , as follows. Given a Borel set A, let n(T, z, A) denote the number of points of T (z) in A, counted according to multiplicity. Then, given a law µ on C, we define its push-forward by Z 1 T∗ (µ)(A) = n(T, z, A) dµ(z) . d C We will call T∗ the push-forward operator and recall that it is adjoint to T ∗ . We will be interested in invariant laws, those for which T∗ (µ) = µ. Lemma 2.1 tells us that there exists at least one invariant probability measure for T .1 Note that T∗ acts continuously on M(C) (with the weak topology) and in particular acts continuously on P(C). 1

We warn the reader that an invariant measure for T need not be invariant for T −1 .

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

3.2 3.2.1

69

Background results Invariant measures for rational maps, rational semigroups and Kleinian groups

b →C b be a rational map of degree d ≥ 2. As a correspondence, T Let T : C has bidegree (d, 1). A T −1 -invariant measure is necessarily T -invariant (see footnote 1). In the context of T -invariant measures, the standard terminology is to say that a T -invariant measure is balanced when it is also T −1 -invariant. T may have a finite backward-invariant set E, called the exceptional set, consisting of at most two points. Choose any point z0 ∈ / E and use it as the initial state of the Markov chain (zn ) associated to the inverse correspondence T −1 . What can be said about the statistical behaviour of the sequence (zn )∞ n=0 ? This question was answered for a polynomial map by Brolin in the 1960s and then for an arbitrary rational map in the early 1980s by Lyubich [49], and at the same time by Ma˜ n´e et al. [32, 52]. Write µn for the distribution of zn . The laws µn converge weakly to a doublyinvariant law µ whose support is J , the Julia set of T . Moreover, for every Borel set B on which T is injective, we have µ(T (B)) = d · µ(B); in other words, the Jacobian of T with respect to µ is d everywhere. µ is the unique invariant measure of maximal entropy for T , with entropy log d. Also, T is an exact endomorphism with respect to the measure µ, which means that b with µ(A) > 0, we have µ(T n (A)) → 1 as n → ∞. for any Borel set A ⊂ C (Exactness is the measure-theoretic analogue of the topological transitivity of T on J ). It follows that T is strong-mixing, hence also ergodic, with respect to µ. The natural extension of the system (T, µ) is also ergodic,

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

70

so if z0 is chosen randomly according to µ then with probability one the random backward orbit (zn )∞ n=0 approximates µ in the following sense: for b any continuous function f on C, n−1

1X f (zi ) lim n→∞ n i=0

Z =

f dµ.

Put another way, the sequence of empirical measures n−1

1X µ ˆn = δz n i=0 i almost surely converges weakly to µ. Let G be a finitely generated (but not finite) Kleinian group, with a given set H of generators. Construct an iterated function system F using these b G also acts on hyperbolic 3-space generators and their inverses, acting on C. H3 . Suppose x ∈ H3 is not fixed by any element of G, then the sequence of images of x under the reverse iterates is a random walk on the Cayley graph of (G, H) embedded in H3 . This random walk almost surely converges b and the distribution µ of this limit point is to a unique point in ∂H3 = C, an invariant measure for F supported on the limit set of G.2 In fact if G is non-elementary then F is asymptotically stable. The topological dynamics of semigroups of rational maps have been studied by Hinkkanen and Martin [41] and others. To make sense of the notion of an invariant measure for a semigroup of rational maps, one needs to consider an iterated function system defined by rational maps f1 , . . . , fN and probability vector p = (p1 , . . . , pN ). For such an IFS, Sumi [71] proved that 2

Of course the measure µ depends on the probability vector of F, unlike the Patterson–

Sullivan measure, which only depends on G.

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

71

among all invariant measures for the skew product, there is a unique one that maximises the relative metric entropy of the skew product over the unilateral Bernoulli shift. This maximal measure is in fact obtained by iterating the Perron-Frobenius operator for the inverse of the skew product, which involves the inverse correspondences of the fi . The maximal relative metric entropy is

N X

pi log (deg fi ) ;

i=1

the topological entropy of the skew product itself is the maximum of the P entropies of these invariant measures equal to log deg fi . Many direct analogies can be drawn between concepts and results in the theory of iteration of rational maps and in the theory of Kleinian groups. This relationship, called Sullivan’s dictionary, has substantial predictive power. Hopes have been expressed that it can be extended to include results about certain classes of holomorphic correspondences. It is not immediately obvious how to define such objects as the Julia set or limit set for a general correspondence; [21] describes how one might try to generalise the topologicaldynamical descriptions of Julia set and limit set. On the other hand, since Julia sets of rational maps and limit sets of Kleinian groups both arise as the supports of maximal entropy invariant measures of correspondences, it makes sense to study invariant measures for general holomorphic correspondences. In the light of Sumi’s results, the next step should be to study invariant measures and stability for the iteration of an irreducible correspondence. In this thesis we obtain stability and entropy results about invariant measures for certain special classes of irreducible correspondences, namely critically finite correspondences and forward critically finite correspondences.

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

3.2.2

72

Example: the AGM correspondence

Let a0 ≥ b0 > 0. For n ≥ 0 an+1 =

an + b n 2

and

bn+1 =

p

Consider the canonical definite elliptic integral Z π/2 dθ G (a0 , b0 ) = , 2 2 a0 sin θ + b20 cos2 θ 0

an b n .

a≥b>0

Gauss observed that G (ai , bi ) = G (ai+1 , bi+1 ) , and that the sequences an and bn converge rapidly to a common limit M (a0 , b0 ). This enabled him to evaluate definite elliptic integrals with unprecedented precision, since π . 2M (a0 , b0 )) M (a0 , b0 ) is called the arithmetic-geometric mean of a and b. Like Newton’s G(a0 , b0 ) =

method for finding a root of a polynomial, this method displays quadratic convergence: the error, once small enough, is roughly squared at each step of  √  a + b the iteration. If we try to extend the transformation (a, b) 7→ 2 , ab to complex values of a and b, we immediately encounter the problem of which square root to choose. One solution is to choose both! Let us simplify the transformation by projectivizing it, to obtain T :z=

an an+1 z+1 7→ = √ . bn bn+1 2 z

This is the AGM correspondence. It is an irreducible correspondence of b as we may check by exhibiting D b and the ˜ =C bidegree (2, 2) on P1 (C) = C, maps π1 : z 7→ z 2

and π2 : z 7→

z2 + 1 . 2z

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

3.2.3

73

Critically finite correspondences

The results of this chapter were inspired by [20] and [19]. In §3.2.3–3.2.4 we summarise the statement and proof of [19, Prop. 1] and its application to the AGM correspondence. In doing so we will set up the notation that we need for later sections. Definition. A holomorphic correspondence T : C → C is critically finite if all forwardsingular and backward-singular points have finite grand orbits, i. e. there is a finite completely invariant set of points of C containing all the singular points. Suppose that a correspondence T : C → C is critically finite, and that A ⊂ C is the smallest completely invariant set containing all the singular ˜ Set U = D ˜ \ π1−1 (A). points. Then π1−1 (A) = π2−1 (A), a finite subset of D. Then the restrictions π1 |U and π2 |U are unbranched covering maps of X. For the moment we will not consider the cases where C = P1 and T has a complete critical orbit consting of at most two points, or C has genus 1 and T has no singular points. In all remaining cases, X and U are hyperbolic Riemann surfaces, so we may uniformise U via an unbranched covering map πU : H → U , which is the quotient map for some Fuchsian group G0 < PSL2 (R). Then π1 ◦ πU and π2 ◦ πU are two different unbranched covering maps of X; they are the quotient maps for Fuchsian groups G1 and G2 respectively, both containing G0 . Because H is simply-connected, we can lift π2 ◦ πU to an analytic map M : H → H. Because π2 ◦ πU is an unbranched covering map, the lift M is a M¨obius automorphism of H, i. e. an element of

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

74

PSL2 (R). The following diagram commutes:

H

M H

πU

πU ?

U

f

? -U

π2

π1

T

-

?

X

π1 ? -X

It follows that G2 = M −1 G1 M . Irreducibility implies that π1 and π2 do not simultaneously factor through any quotient of U , so G0 = G1 ∩ G2 . If the bidegree of T is (m, n) then G0 has index n = deg π1 in G1 and index m = deg π2 in G2 . When the intersection of two subgroups of PSL2 (R) or of PSL2 (C) is of finite index in each, they are said to be commensurable. The hyperbolic area of U is n times that of X because π1 is an unbranched covering map; similarly it is m times the area of X. These areas are finite, so m = n. Conversely, suppose we are given a finitely generated Fuchsian group of the first kind, i.e. G1 < PSL2 (R) such that V = H/G1 is a compact surface

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

75

minus finitely many punctures. Also suppose that M ∈ PSL2 (R) is such that G0 = M −1 G1 M ∩ G1 has finite index both in G1 and in G2 = M −1 G1 M . Let the quotient map for G1 be q1 : H → V . Then q2 = q1 ◦ M : H → H/G1 is a quotient map for G2 . Define U = H/G0 . Both q1 and q2 factor through the quotient map πU : H → U , so there exists unique unbranched covering maps πi : U → V ,

i = 1, 2,

such that π i ◦ π U = qi . Now we have a correspondence T : π2 ◦ π1−1 : V → V . Now fill in the punctures of U and V to obtain compact surfaces U 0 and V 0 . Apply the Removable Singularity Theorem for punctured Riemann surfaces to extend the maps π1 and π2 to branched coverings from U 0 to V 0 , thereby extending T to a correspondence T : V 0 → V 0 . Such correspondences (and formal sums of them) are called modular 3 . The above working shows that a correspondence on a compact Riemann surface is modular if and only if it is 3

In [26], correspondances modulaires are defined in a similar fashion beginning with a

more general simply-connected non-compact algebraic group in place of PSL2 (R)

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

76

critically finite with a finite invariant set of at least three points; this is the content of [19, Prop. 1]. One useful consequence of the above analysis is that we can easily describe the nth iterate of T . For n ≥ 0 define S[n] = {gn ◦ M ◦ gn−1 ◦ M ◦ · · · ◦ M ◦ g1 ◦ M ◦ g0 : all gi ∈ G1 } . S[n] is the set of lifts of branches of T n to H, (w. r. t. the quotient map q1 = π1 ◦ πU ). The branches themselves are of the form q1 ◦ γ ◦ q1−1 , where γ ∈ S[n]. We will also need a notation for the semigroup of all lifts of branches of iterates of T , so we define ∞ [ S= S[n] . n=0

Finally we must decide when the pairs (G1 , M ) and (G01 , M 0 ) give rise to conjugate correspondences. There are two choices to be made to recover such a pair from a correspondence, namely the choice of the covering group from a family of conjugate subgroups of PSL2 (R) and the choice of the lift M . Thus the pairs (G1 , M ) and (G01 , M 0 ) give rise to conjugate correspondences precisely when there exist h ∈ PSL2 (R) and g ∈ G1 such that G01 = h−1 G1 h and M 0 = h−1 g M h. Bullett remarks that composing M with an element of the normaliser of G1 in PSL2 (R) leaves G2 unchanged, but may alter the correspondence. So it is not enough merely to specify the groups G1 and G2 ; we really need to know the conjugating element M .

3.2.4

Lifting the AGM correspondence

The AGM correspondence z 7→

z+1 √ 2 z

is critically finite: the set {0, 1, −1, ∞}

is completely invariant. The point 1 is a critical point for both branches, the

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

77

values being 1 and −1. The two branches at −1 are analytic but happen both to take the value 0 there. 0 maps to ∞ in a forward-singular fashion and ∞ maps to itself in a forward-singular fashion. The covering group of b \ {0, 1, −1, ∞} is C ,    a b c ≡ 0 (mod 4)   ∈ SL2 (Z) : ±I. G1 = Γ0 (4, 2) =   c d b ≡ 0 (mod 2)   The map M is z 7→ z/2, represented by 

√ 1/ 2 0

     a b 1 0 ≡  G0 = Γ(4) =   c d 0 1



0 √ , so that 2

(mod 4)

, 

±I.



Notice that the AGM is conjugate to its own inverse via the automorphism z 7→

z+1 z−1

b This automorphism lifts to h : z 7→ −1/z on H. Note that h of C.

conjugates G1 to G2 = Γ0 (2, 4), conjugates G2 to G1 and conjugates M to M −1 .

3.3

Invariant measures of critically finite correspondences

Let T : C → C be a critically finite correspondence. The restriction of T to the complete critical orbit A gives a Markov chain with finitely many states, so there is at least one invariant measure supported on A; this need not also be invariant for T −1 . More interestingly we can restrict T to the complement X = C \ A, which has finite hyperbolic area 2π(2g − 2 + n), where g is

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

78

the genus of C and n = |A|. Observe that the hyperbolic area measure is invariant for both T and T −1 , because the lift M is a hyperbolic isometry. Dividing the hyperbolic area measure by its total mass, we obtain a doubly invariant probability measure µh on X. Let T be a holomorphic correspondence on a quasi-projective curve C, i.e. a compact Riemann surface punctured at finitely many points. In [26] it is shown that if T∗ preserves the hyperbolic area measure associated to a punctured surface C \{z1 , . . . , zn } then T is in fact a modular correspondence.

3.3.1

Latt` es rational maps

Before proving any theorems, let us draw an analogy between critically finite correspondences and certain special rational maps, originally constructed by Latt`es. Let ℘ be the Weierstrass function of the elliptic curve E = C/Λ, b with where Λ is a lattice in C. Then ℘ is a degree 2 branched cover of C, four critical points at the half-lattice points. ℘ is an even function. Now let z 7→ az induce an endomorphism ϕ : E → E (for which we need aΛ ⊂ Λ). b ℘−1 (x) consists Suppose that |a| > 1 so that deg φ > 1. Then for any x ∈ C, of two points ±t ∈ E, and ℘(at) = ℘(−at) = R(x) defines a rational map R. The Lyubich measure for R is the ℘-image of the normalised Euclidean area b The best measure on E; in particular its support, the Julia set of R, is C. way to think about the action of the Perron–Frobenius operator for R−1 is in terms of the Perron-Frobenius operator associated to the correspondence z 7→ a−1 z on E.

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

3.3.2

79

Galois correspondences

There is an easy way to produce examples of critically finite correspondences. Let R : C → C 0 be any branched covering map between compact Riemann surfaces and define the Galois correspondence of R by TR : p 7→ R∗ (R(p)), a correspondence on C. The images of p under TR are all the points that map to the same point as does p under R. Since R has only finitely many critical points and each grand orbit of TR is a fibre of R (hence finite), TR is critically finite. TR will generally not be irreducible (it has the identity correspondence as a component), but each irreducible component T of TR will also be a critically finite correspondence. In the terminology of [26], these are interior modular correspondences, while all other irreducible modular correspondences are exterior. A modular correspondence specified by (G, M ) (as in §3.2.3) is interior if and only if the group generated by G and M is discrete. Thus an interior correspondence T has finite dynamics, meaning that the set of irreducible correspondences that arise as Zariski components of the graphs of iterates of T is finite. There are at most deg(R) such components; indeed at each point there are at most deg(R) branches of iterates of T . To avoid confusion, let us mention that R need not be a Galois covering (also called a regular or a normal covering, meaning that R is the quotient map for a group of automorphisms of C). It is such a covering if and only if the components of TR are all single-valued. So from the point of view of correspondences, the situation is most interesting when R is not a Galois covering.

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

3.3.3

80

A dynamical dichotomy

Theorem 3.2. Let T : C → C be an irreducible critically finite correspondence with complete critical orbit A. Suppose that the Riemann surface X = C \ A is hyperbolic. Then exactly one of the following holds. 1. M and G1 generate a dense subgroup of PSL2 (R). In this case the the semigroup generated by M and G1 is also dense in PSL2 (R). In particular T has an infinite forward orbit, and if A 6= ∅ then each point of X has a forward orbit under T whose closure in C meets A. 2. M and G1 generate a discrete subgroup of PSL2 (R). In this case, T is a component of a Galois correspondence, arising from a map of degree at most 42(2g − 2 + |A|). The basic dichotomy in Theorem 3.2 that hG1 , M i is either discrete or dense is not new. [19] points out that the existence of an infinite orbit of T implies that the group generated by M and G1 is dense in PSL2 (R) and hence by a deep theorem of Margulis that G1 is a weakly arithmetic group, i. e. some conjugate of G1 is commensurable with PSL2 (Z). It follows that G1 contains parabolic elements, and hence that A 6=, so in fact the last part of item 1 always applies. However, we did not wish to rely on Margulis’ result in what is otherwise a fairly elementary theorem. The parts of Theorem 3.2 that may be new are the result that the semigroup generated by G1 and M is dense and the result that shows that the existence of a grand orbit with more than 42(2g − 2 + |A|) points guarantees the existence of an infinite orbit.

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

81

Proof. Consider the semigroup S generated by G1 and M , and its closure S. We will prove below that S is in fact a subgroup of PSL2 (R), and hence is also the closure of the subgroup hG1 , M i of PSL2 (R) generated by G1 and M . Since S is closed, it is in fact a Lie subgroup of PSL2 (R) [66, Theorem VII.2.5]. Furthermore, S is invariant under conjugation by any element of G1 . In particular there are two hyperbolic elements of G1 with b that both conjugate S to itself. The connected disjoint fixed point sets in C component of S containing the identity is invariant under conjugation by the same elements. The connected Lie subgroups of PSLn (R) were classified in [36]; in the simple case of PSL2 (R), the only possibilities are the trivial subgroup, the one-parameter groups with either a fixed point in H or a fixed geodesic in H, and PSL2 (R) itself. Of these, only the trivial subgroup and PSL2 (R) are invariant under conjugation by two hyperbolic elements that have disjoint fixed point sets. Hence S is either discrete or all of PSL2 (R). In the discrete case, we must have S = hG1 , M i. Hurwitz’s Theorem tells us that the hyperbolic area of a fundamental domain for this group is at least π/21, so [hG1 , M i : G1 ] ≤ 42(2g − 2 + |A|) . The correspondence T is a component of the Galois correspondence arising from the map R : H/G1 → H/hG1 , M i . It remains to prove that S is a subgroup of PSL2 (R). Using continuity of right-multiplication, Lemma 3.3 below implies that the inverse of any element of S lies in S. Since taking the inverse in PSL2 (R) is continuous, S is also closed under inversion.

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

82

Remark. [25] gives a proof that if a closed monoid in a simply-connected semisimple Lie group of rank one contains a lattice then it is in fact a subgroup. This result could be used in place of Lemma 3.3 in the proof of Theorem 3.2. On the other hand the proof below of Lemma 3.3 would also provide a proof of the above-mentioned result about closed monoids that makes direct use of the finite volume of the quotient by the lattice, whereas [25] uses the hyperbolic metric, the classification of isometries and some clever conjugation arguments. Lemma 3.3 (Recurrence for critically finite correspondences). For any g ∈ S, the identity element I of PSL2 (R) is in the closure of S ◦ g. Remark: This lemma has consequences for the Markov chain (Xn )∞ n=0 associated to T . In particular, for any  > 0, and any finite orbit segment x0 , x1 , . . . , xn (satisfying xi+1 ∈ T (xi ) for i = 0, . . . n − 1) we have P(∃ k > n : d(Xk , X0 ) <  | X0 = x0 , X1 = x1 , . . . , Xn = xn ) > 0 . Proof. Make geodesic branch cuts on X = C \ A to leave a connected and simply-connected domain U . On U , the correspondence T is described by m branches f1 , . . . , fm , each of which is a local isometry from U into X. Because the branch cuts have total hyperbolic area 0, the branches fi are well-defined modulo 0 as Borel-measurable maps from U to U , (where modulo 0 refers to the normalised hyperbolic area measure µh ). The individual maps fi need not preserve µh , but nevertheless µh is an invariant probability measure for the Borel-measurable IFS F, whose maps are these m branches of T on U , assigned equal probabilities. Let T1 U be the unit tangent bundle of U (where U has the Riemannian metric inherited from the hyperbolic metric

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

83

on X). Let f˜i be the map (fi , Dfi ) induced by fi on T1 U , and let F˜ be the associated measurable IFS. Write λ for the normalised Lebesgue measure on each fibre of T1 U , and λ × µh for the product measure on T1 U . Let B1/m be the Bernoulli measure associated to the probability vector (1/m, . . . , 1/m). Then B1/m ×(λ×µh ) is an invariant measure for the associated skew-product map τ : {1, . . . , m}N × T1 U → {1, . . . , m}N × T1 U . Let V˜ be any neighbourhood of the identity in PSL2 (R). Identify PSL2 (R) with T1 H by fixing some base point (˜ x0 , v˜0 ) ∈ T1 H and making the identification γ ∈ PSL2 (R)



(γ(˜ x0 ), Dγ(˜ v0 )) .

We may assume that x˜0 covers a point x0 ∈ U which is not mapped onto any of our branch cuts under any branch of any iterate of T . The covering map π1 : T1 H → T1 C maps V˜ homeomorphically onto a neighbourhood V of a point (x0 , v0 ) in T1 U . The element g ∈ S may be represented as g = γn ◦ M ◦ γn−1 ◦ M ◦ · · · ◦ M ◦ γ0 , for some (not necessarily unique) elements γ0 , . . . , γn ∈ G1 . Choose one such representation. Making V smaller if necessary, this representation corresponds to applying the composition of some particular sequence of n of the maps f˜i to the neighbourhood V . That sequence defines a cylinder set Cyl(g) ⊂ {1, . . . , m}N with Bernoulli measure m−n . Making V smaller if necessary we may assume that V is an open neighbourhood of (x0 , v0 ). Then

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

84

V is measurable and λ × µh (V ) > 0. It follows that B1/m × (λ × µh ) (Cyl(g) × V ) > 0 . We now apply Poincar´e’s Recurrence Theorem to the skew-product map τ for ˜ It says that there exists a point in Cyl(g) × V that returns to Cyl(g) × V F. after some positive number of iterations of the skew-product map τ . In particular, there is an element h ∈ S such that h ◦ g ∈ V˜ . The lemma follows.

3.3.4

Asymptotic stability

If T : C → C is an interior correspondence then it has infinitely many invariant probability measures. On the other hand if T : C → C is an irreducible critically finite correspondence with an infinite orbit, then Theorem 3.2 and Lemma 3.3 give us some topological information about the possible orbits of the Markov chain of T , but we can use them to prove more. Theorem 3.4. Let T : C → C be a critically finite correspondence with complete critical orbit A, such that the Riemann surface X = C \ A is hyperbolic. Suppose that T is not a Galois correspondence, or equivalently that T has an infinite orbit. Then the normalised hyperbolic area measure is the unique invariant probability measure for the restriction of T to X. Furthermore the restriction of T to X is asymptotically stable: for any Borel probability measure ν on X, we have T∗n ν → µh as n → ∞. This theorem has now appeared in a preprint by Clozel and Otal [25], of which we became aware only after producing the proof below. Although we

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

85

cannot claim priority for the result, we have kept it in this thesis because our proof is quite different from theirs, at least on the surface. The proof in [25] proves the locally uniform convergence of the sequence fn = (T ∗ )n f to a constant function, for each bounded and uniformly continuous function f on C. A simple coupling argument is used to show that the fn are uniformly continuous, and hence have a convergent subsequence. Then the density of hG1 , M i in PSL2 (R) and some inequalities between L2 norms are used to show that any subsequential limit of the fn must be constant. As remarked after the statement of Theorem 3.2, there cannot exist a correspondence with an infinite orbit but without singular points on a compact hyperbolic Riemann surface. So A is non-empty and X has at least one cusp. Since we do not know an elementary proof of this, we give two versions of our proof of Theorem 3.4. The simpler of the two makes the assumption that A is non-empty. Proof of Theorem 3.4. We use the method of the Kantorovich metric to obtain the asymptotic stability of T , which of course implies uniqueness of the invariant measure. The method of proof is almost identical to the argument of §2.3.1. Let ν 6= µh be a Borel probability measure on X. Let d be the hyperbolic metric on X, and consider a modified metric c = ϕ(d), as provided by Lemma 1.8, such that µ, ν ∈ Pc and (X, c) is a proper metric space. We will show that Kc (T∗ (ν), T∗ (µh )) ≤ Kc (ν, µh ) ,

(3.1)

∃n ∈ N such that Kc (T∗n (ν), T∗n (µh )) < Kc (ν, µh ) .

(3.2)

This plays the rˆole of Lemma 2.7. After this, the argument given in §2.3.1

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

86

works almost word-for-word, subject to replacing F∗ by T∗ and changing the hypotheses of Lemma 2.8 in an obvious way. The variable n in equation (3.2) does not interfere with the proof. Let us prove (3.1). The infimum in the definition of Kc (ν, µh ) is attained by some measure m on X × X, whose marginals are ν and µh . To each pair (x, y) ∈ X ×X we associate a shortest directed geodesic segment γ(x, y) from x to y in the hyperbolic metric on X. We can do this in a measurable fashion. Pushing forward by this map takes m to a measure m0 on the space GX of directed geodesic segments in X.4 The branches of T map geodesic segments to geodesic segments, so by weighting the branches each with probability

1 m

(where the bidegree of T is (m, m)), we obtain a Markov operator P acting on the space of measures M(GX). Project P (m0 ) back to a measure P (m) on X × X, whose marginals are T∗ (ν) and T∗ (µh ). We have Z Kc (T∗ (ν), t∗ (µh )) ≤

c(x, y) d(P (m)(x, y)) Z = c(x, y) dm(x, y) = Kc (ν, µh ) .

Now we prove (3.2), assuming that A is non-empty. Because S is dense in PSL2 (R), at least one of the images of any given minimal geodesic segment under some iterate of T must fail to be a minimal geodesic segment because it winds around a puncture and intersects itself. We produce a modifed Markov operator Q by mapping each geodesic segment to the weighted formal sum of minimal geodesic segments that join the endpoints of each of its images 4

Of course GX is essentially the tangent bundle of X, but it is better here to think of

geodesic segments than of tangent vectors.

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

87

under T . Then a countable additivity argument shows that for some n ∈ N, Z Z n c(x, y) d (Q (m)) (x, y) < c(x, y) d (P n (m)) (x, y) , which implies (3.2). Finally we prove (3.2) without assuming that A is non-empty. This proof involves a more substantial modification of the operator P . Since ν 6= µh , the measure m is not supported on the diagonal in X × X. By a countable covering argument, there exists (x0 , y0 ) ∈ X × X such that m (B(x0 , ), B(y0 , )) = η > 0 ,

where  = d(x0 , y0 )/6 .

Recall that we defined S[n] to be the set of lifts of branches of T n . Now we define R[n] = {K −1 ◦ K 0 : K, K 0 ∈ S[n]} , and R = ∪∞ n=0 R[n] . We may think of R[n] as the set of all geometrical relationships between lifts of different branches of T n with respect to q1 . Lemma 3.5. Under the hypotheses of Theorem 3.4, R is dense in PSL2 (R). Because R is dense in PSL2 (R), there exist K 0 , K ∈ S[n] for some n ∈ N, such that  d K −1 ◦ K 0 (x0 ), y0 < 

and

 d K −1 ◦ K 0 (y0 ), x0 <  .

Hence (K(x0 ), K 0 (y0 )) < 

and

d (K(y0 ), K 0 (x0 )) <  .

(3.3)

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

88

Let A be the set of minimal geodesics that begin in B(x0 , ) and end in B(y0 , ). Let s ∈ A. The images of s under the branches of T n corresponding to K and K 0 are geodesic segments s1 and s2 , both of the same length as s, which is at least 4. Let s01 be the minimal geodesic segment from the start point of s1 to the end point of s2 and let s02 be the minimal geodesic segment from the start point of s2 to the end point of s1 . Both s01 and s02 have length at most 3, because of (3.3) and the triangle inequality. Replacing s1 and s2 by s01 and s02 whenever they arise in this way yields a probability measure on GX whose endpoint projections are the same as those of P n (m0 ). However, the integral of the length of its segments (in the c-metric) is strictly smaller than the corresponding integral for P n (m0 ). This suffices to prove (3.2) and completes the proof of Theorem 3.4. Proof of Lemma 3.5. Note that R[n] is closed under taking of inverses in PSL2 (R). We will show that R is a subgroup of PSL2 (R). For this it suffices to prove that for any F ∈ R[m] and G ∈ R[n], G ◦ F is in R. To this end, let U be any neighbourhood of G ◦ F in PSL2 (R). From the definition of S[n] we see that for any H ∈ S[m] and G = L−1 ◦ L0 ∈ R[n], we have H −1 ◦ G ◦ H = (L ◦ H)−1 ◦ (L0 ◦ H 0 ) ∈ R[n + m] .

(3.4)

Given the neighbourhood U ◦ F −1 of G in PSL2 (R), there exists a neighbourhood V of the identity such that for all H ∈ V we have H −1 ◦G◦H ∈ U ◦F −1 . Indeed, the function H 7→ H −1 ◦ G ◦ H is continuous and maps the identity to G. Now suppose F = K −1 ◦ K 0 , where K, K 0 ∈ S[m]. We can write F = (J ◦ K)−1 ◦ (J ◦ K 0 ) ,

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

89

where J ∈ S[N ] for some large N and J ◦ K ∈ V . This is possible because S is dense in PSL2 (R); we just have to choose J ∈ V ◦ K −1 . Then U contains (J ◦ K)−1 ◦ G ◦ (J ◦ K) ◦ F = (L ◦ J ◦ K)−1 ◦ (L0 ◦ J ◦ K) , which is in R[n + m + N ]. Since U was arbitrary, we see that G ◦ F ∈ R. We now know that R is a closed (hence Lie) subgroup of PSL2 (R), and that it is invariant under conjugation by any element of S, hence also by any element of S = PSL2 (R). Therefore R = PSL2 (R) as required. This completes the proof of Lemma 3.5.

3.4 3.4.1

Forward critically finite correspondences The inverse of a hyperbolic rational map

b Let C be the Consider a rational map T of degree d ≥ 2 acting on C. n set of critical points of T and let P C = ∪∞ n=1 T (C) be the postcritical set.

Suppose for convenience that the Julia set J of T does not contain the point ∞ (replace T by a conjugate if necessary). When P C ∩ J = ∅, we say that T is hyperbolic. [23, §V.2] gives basic results about hyperbolic rational maps, in particular that T is hyperbolic if and only if every critical point lies in the basin of an attracting periodic orbit of T . We will exclude from the present discussion the simple cases when T is conjugate to z 7→ z d or b C is a to z 7→ z −d . Then P C contains at least three points, so U = C\P hyperbolic domain. Since U contains no critical values of T , near any point of U there are d distinct analytic branches of T −1 , which can be analytically continued along any path in U , although they cannot be defined as single-

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

90

valued maps right around any postcritical point. However, we can choose a single-valued lift of each of these branches to D, obtaining analytic maps f1 , . . . , fd : D → D that make the following diagram commute: fi

inclusion

T

inclusion

D −−−→ fi (D) −−−−−→ D       πy πy πy U ←−−− T −1 (U ) −−−−−→ U

Because the iterated pre-images of any point not in the exceptional set E accumulate on J , there must be points in T −1 (P C) \ P C. Since fi (D) omits every lift of any such point, fi is not an isometry, so the Schwarz–Pick Lemma tells us that fi is strictly distance-decreasing. It follows that T is locally expanding w. r. t. the hyperbolic metric on U . In particular, on the compact and T -invariant set J ⊂ U , we find that the hyperbolic derivative satisfies T # ≥ ρ > 1. Since J is compact, the Euclidean metric on C is comparable on J to the hyperbolic metric on U . Hence T is expanding on J in the sense that there are constants α > 0 and A > 1 such that |(T n )0 (z)| > aAn for all z ∈ J and all n ∈ N. In fact this only happens when T is hyperbolic, so hyperbolic rational maps are also called expanding rational maps. At this point we depart from the treatment given in [23], by considering the iterated function system F associated to the lifts fi . Let F1 , F2 , F3 , . . . be a sequence of i. i. d. random maps from D to itself, with P(Fn = fi ) = d1 .

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

91

Choose x0 ∈ D with z0 = π(x0 ). Now for each n ∈ N, set xn = Fn ◦ · · · ◦ F1 (x0 ). Then zn = π(xn ) reconstructs the Markov chain (zn ) described in 3.1.5. As remarked above, it is known that the distributions µn of zn = π(xn ) converge weakly to a probability distribution on U . We now ask whether the distributions νn of xn necessarily converge to a probability distribution on D. It is not obviously so. Indeed, D covers U with infinitely many sheets, so the space of measures that project via π to the limit measure µ is not even compact. However, as a corollary of Theorem 2.16, we will prove that in fact the νn do converge weakly to a probability measure ν on D: Proposition 3.6. b \ P C. Let F be any IFS on D which is a lift of the restriction of T −1 to C Then F is asymptotically stable. The maps fi all avoid all the lifts of points in T −1 (P C) \ P C. Unfortunately it is not true that the complement of these points is a Bloch domain in D, because in U there are points of U (near the postcritical set) that in the hyperbolic metric on U are arbitrarily far from all such points. Claim: there is a subdomain V ⊂ U such that T −1 (V ) is relatively compact in V . It follows that π −1 (T −1 (V )) is a Bloch subdomain of π −1 (V ) (although not relatively compact in π −1 (V ), and is forward-invariant for each of the maps f1 , . . . , fd . Theorem 2.16 now tells us that the restriction of F to π −1 (V ) is asymptotically stable. Of course its invariant probability measure is also an invariant probability measure for F, so F itself is asymptotically stable, by Ambroladze’s Theorem.

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

92

To prove the claim, we will construct the domain V by specifying its complement, a closed forward-invariant neighbourhood of the postcritical set. There are two types of critical point to consider: the periodic ones and the non-periodic ones. For any periodic cycle of T , say z1 , z2 , . . . , zk , containing at least one critical point of T , there exist small closed discs Bi about zi such that T (Bi ) is contained in the interior of Bi+1 (subscripts modulo k). Any non-periodic critical point is attracted under iteration of T to some attracting cycle [23, §V]. Fix such a cycle. Again we can find a cycle of closed discs Bi about the points of the cycle, each mapping strictly inside the next. Now for each critical point z attracted to that cycle, we have T m (z) in the interior of Bj , for some m and j. Then we can find a sequence of closed discs about the points T (z), T 2 (z), . . . T m−1 (z), each mapping strictly inside the next and the last mapping strictly inside Bj . Do this for all the critical points. There are only finitely many critical points (at most 2d − 2), so we have now chosen a finite collection of closed discs, whose union contains the closure of the postcritical set. The domain V is now defined to be the complement of the union of all these discs; by construction T −1 (V ) is relatively compact in V , as required.

3.4.2

Forward critical finiteness

In §3.4.1 we were able to prove asymptotic stability of a certain correspondence by lifting its branches to the universal cover of a suitable subdomain and applying iterated function system results. We will try to extend this method to apply to a larger class of correspondences T : C → C. We need to find a closed proper subset of C that is forward-invariant under T and

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

93

contains all the backward-singular points. Definition. The correspondence T is forward critically finite when the union of the forward orbits of the backward-singular points is finite. We will denote by P C the postcritical set, i.e. the union of the backwardsingular points and their forward orbits. Note that every postcritically finite rational map is forward critically finite, although it is only be critically finite as a correspondence if it is conjugate to z 7→ z n or z 7→ z −n . For a forward critically finite correspondence T of bidegree (m, n) on a curve C, we may delete the set P C from C. The punctured Riemann surface X = C \ P C is typically hyperbolic, uniformised by the unit disc D, in which case we can choose lifts of the m branches of T −1 to m analytic maps D → D. If we prove that the corresponding IFS is asymptotically stable, then as a corollary the restriction of T −1 to C \ P C is asymptotically stable. If we can also show that any T −1 -invariant probability measure must assign mass zero to P C, it will follow that T −1 is asymptotically stable.

3.4.3

The tangential correspondence on a quartic

Let C be a smooth curve of degree 4 in P2 (C). Every such curve is the canonical curve of its underlying Riemann surface. In particular a compact Riemann surface of genus 3 has a unique embedding of this form, up to automorphisms of P2 (C). For each point p ∈ C, the tangent line to C at p cuts out a divisor S(p) of degree 4 on C, and T (p) = S(p) − 2p is an effective divisor of degree 2. The correspondence T is called the tangential

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

94

correspondence on C. It has bidegree (10, 2), which is computed using the Riemann-Hurwitz formula in [37, p. 290]. A point p ∈ C is a flex of C when p ∈ T (p). A point at which the tangent to C meets C with multiplicity 4 is called a hyperflex. At a simple flex p, one branch of T has a fixed point at p, while the other branch has a critical point at p. These are the only critical points of T . Thus the backward-singular points of T are the points q such that T (p) = p + q, where p is a simple flex. If p is a hyperflex of C then the graph of T has an ordinary double point at (p, p); thus p is a fixed-point for both branches of T in a neighbourhood of p. In fact it is a repelling fixed point of each branch: by putting the curve C in √ a suitable normal form at p, we compute that the multipliers are −1 ± −2. The forward-singular points of T are the points whose tangent line is a bitangent to C. We do not count the tangent at a hyperflex as a bitangent. Note that the set of forward-singular points of T is forward-closed under T . In fact they form super-repelling cycles of length 2. Let B be the number of bitangent lines to C, H the number of hyperflexes, and F the number of simple flexes. We always have F + 2H = 24 and B + H = 28 (see [37, p. 549]). Since H ≤ 12, we have B ≥ 16. Let D be the desingularised graph of T . Then T = π2 ◦ π1−1 , where π1 : D → C has degree 2 and is branched over 2B points, and π2 : D → C has degree 10 and is branched over F points. The fact that π1 has degree 2 and is branched shows that D is irreducible. The genus of D is gD = (F + 42)/2 = B + 5. If C is a generic smooth plane quartic, then C has no hyperflexes, 24 simple flexes and 28 bitangent lines. In this case, T has 56 forward-singular points and 24 backward-singular points and the genus of its graph is 33.

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

95

The tangential correspondence on a smooth plane curve of degree at least 4 is never separable. Indeed, if T (p) and T (q) have two points in common (or a common multiple point) then the tangent lines are p and q are equal, so p and q are either equal or lie on a bitangent. To ask for the tangential correspondence on a smooth plane quartic to be forward critically finite is to ask a great deal: the forward orbit of each of the simple flexes must be finite. One way to achieve this is dicussed in the next section.

3.4.4

The Fermat quartic

The Fermat quartic is the smooth plane curve C given in homogeneous coordinates by X 4 + Y 4 + Z 4 = 0. The Fermat quartic has 12 hyperflexes, namely those points described by homogeneous co-ordinates consisting of one zero and two eighth roots of unity. C has 16 other bitangent lines, meeting C in 32 distinct points. Note that all the flexes of C are hyperflexes, so T has no backward-singular points. Hence T is forward critically finite. Because π2 is unbranched, there is no obstruction to analytic continuation of any branch of T −1 along any path on C. Let πD : D → D be an analytic universal covering map. Then πC = π2 ◦ πD is a universal covering map for C with deck transformation group ΓC isomorphic to the fundamental group of C based at p = πC (0). ΓD is a subgroup of ΓC of index 10. ΓC and ΓD both act freely and properly discontinuously on D. Because it has degree 2, π1 is a normal covering, so there is a group GC containing ΓD with index 2 such that the quotient map for GC is π1 ◦ πD . However, GC does not act freely on D; it is a so-called Belyi uniformisation of C.

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

96

Consider the ten distinct branches of T −1 in a simply-connected neighbourhood of p in C. Choose lifts of each of these branches and analytically continue them to obtain analytic maps f1 , . . . , f10 : D → D. We study the IFS F obtained from these maps with p1 = · · · = p10 = 1/10. Lemma 3.7. The maps f1 , . . . , f10 are all Lipschitz with some constant ρ < 1, with respect to the hyperbolic metric on D. Proof. Each map fi has critical points at each point of 32 orbits of ΓD in the unit disc (These orbits depend on i.) Since D is compact, it has finite diameter in the hyperbolic metric. Thus each point z ∈ D is within distance diam(D) of a critical point of fi . The present lemma now follows from the branched Schwarz lemma (Lemma 1.14). The monodromy action of the fundamental group of C based at p permutes the set of ten branches transitively because D is irreducible. Thus the maps fi are related to each other by many relations of the form fi = gij ◦ fj ◦ γij , where γij ∈ ΓC and gij ∈ GC . Proposition 3.8. The inverse of the tangential correspondence on the Fermat quartic is asymptotically stable, with exponential convergence with respect to either the Kantorovich or Prohorov metrics associated to the hyperbolic metric on C. Proof. Let ν be any Borel probability measure on C. Then we can find a measure ν˜ supported on a compact fundamental domain L for the group ΓC , such that (πC )∗ ν˜ = ν. From lemma 3.7, there is a compact set L0 ⊃ L that

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

97

is forward-invariant under the IFS F. It is now a standard result that there is an invariant law µ for F and that F∗n ν˜ → µ exponentially as n → ∞, with respect to either the Kantorovich metric or the Prohorov metric on the hyperbolic plane. The projection to C does not increase hyperbolic distances and therefore does not increase the Kantorovich or Prohorov metrics. For an alternative proof not involving the lifts fi , we could argue as follows. T −1 has at least one invariant probability measure µ, as explained in §3.1.5. Carry out the a coupling argument as in our proof of Theorem 3.4, using the Kantorovich metric Kd associated to the hyperbolic metric d on C. Lemma 3.7 leads easily to exponential convergence in the Kantorovich metric. Because the diameter of C is finite, the Kantorovich metric Kd is bi-Lipschitz equivalent to the Prohorov metric by Lemma 1.10. Let us point out that the Fermat quartic example is rather surprising. One might conclude that there should be many correspondences on hyperbolic surfaces with no singular points in one direction, by reasoning along the following lines. If R is a Riemann surface of genus g ≥ 2, then R has many unbranched covers from surfaces of higher genus. If one of these surfaces also has a branched cover of R, then we may compose the inverse of the unbranched cover with the branched cover to obtain an irreducible correspondence T with no forward-singular points. Perhaps we could find such surfaces R by variation within the moduli space of marked surfaces? Unfortunately a dimension count shows that we should not expect to find such surfaces for dimensional reasons alone. Some extra symmetry seems to be required. The Fermat curve has a large automorphism group, of order 96. According to [51], there is only one other smooth plane quartic that has 12

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

98

Figure 3.1: A Maple plot of the image under the map ψ : (X : Y : Z) 7→ X/Y of a random orbit segment of length 7000 for the inverse of the tangential b correspondence on the Fermat quartic X 4 +Y 4 +Z 4 = 0. The map ψ : C → C is the quotient map for a subgroup of order 4 of the automorphism group of the Fermat quartic, branched with index 4 over each of the fourth roots of −1, which explains why the empirical measure looks dense at those points.

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

99

hyperflexes and no simple flexes, namely the curve with 24 automorphisms given by  X 4 + Y 4 + Z 4 + 3 X 2Y 2 + X 2Z 2 + Y 2Z 2 = 0 . We might also try looking for pairs of Fuchsian groups such that D/Γ ∼ = D/∆, where Γ is free, ∆ is not free, and Γ and ∆ are commensurable.

3.4.5

Asymptotic stability of the inverse

Theorem 3.9. Let the correspondence T : C → C be forward critically finite but not critically finite, and let A be the union of the forward orbits of all backwardsingular points. Suppose that the Riemann surface X = C \ A is hyperbolic. Then the restriction of T −1 to X is asymptotically stable. Proof. If T has no backward-singular points but there are forward-singular points then the proof is just like the proof we used for the tangential correspondence on the Fermat quartic. If there are no singular points at all but there is an infinite orbit, then C cannot be hyperbolic, because it would have to be weakly arithmetic. Now assume that there are some backward-singular points. Restrict T −1 to X. Lift all branches to the universal cover of X. These give analytic maps which do not increase the hyperbolic distance. There is an invariant measure for T −1 , as explained in §3.1.5. We use the Kantorovich metric coupling argument on X, as in our proof of Theorem 3.4. We need to prove that the Kantorovich distance with respect to modifications of the hyperbolic metric on X is strictly decreased at each

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

100

step. All we need to prove is that the branches of T −1 are not all local hyperbolic isometries. If they were all local isometries then there could be no deficient points for T −1 : X → X (i.e. points in X with at least one T -image in A). From the construction of A, this would imply that A was completely invariant. As X would also contain no forward-singular points, it would follow that the correspondence was critically finite, contrary to the hypothesis.

3.5

Correspondences on elliptic curves

In this section we consider irreducible correspondences on compact Riemann surfaces of genus 1. We begin by applying the Riemann-Hurwitz formula to relate the numbers of singular points to the bidegree. Let C be any Riemann surface of genus 1, which we will think of also as an elliptic curve. Let T be a correspondence of bidegree (m, n) on C. As usual we write D for the desingularised graph of T and gD for the genus of D. The forward-singular points of T are the v1 critical points of the degree n analytic map π1 : D → C, and the backward-singular points of T are the v2 critical points of the degree m map π2 : D → C (all counted with multiplicity). The Riemann-Hurwitz formula tells us that v1 = 2(gD − 1) = v2 . b then the dynamics of T are best As remarked earlier, if T is separable to C b so in this section we will only consider such correspondences studied on C, when they arise in families of non-separable ones. However, we will need to consider correspondences which are separable to other surfaces of genus 1.

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

3.5.1

101

Correspondences with no singular points

Suppose that T is an irreducible correspondence with no singular points on a genus 1 surface C, of bidegree (m, n). Then the desingularised graph D of T is an unbranched cover of C, so is also a compact surface of genus 1. Make C and D into elliptic curves by giving them a marked point (called 0), in such a way that π1 : D → C satisfies π1 (0) = 0, where T = π2 ◦ π1−1 . Then π1 is an isogeny, i.e. both an analytic map and a group homomorphism. We may represent D as C/Λ0 , where Λ0 is a lattice in C. The analysis of section 3.2.3 applies, with the commensurable Fuchsian groups G1 and G2 replaced by commensurable lattices Λ1 and Λ2 in C.

5

The map M is now

an automorphism of C rather than H, so it is of the form M : z 7→ az + b. If M is a translation, then it commutes with the translations in Λ1 , so Λ2 = Λ1 and T is merely an automorphism of C. Henceforth suppose that M is not a translation. After conjugating everything by a suitable translation, (which does not change the groups Λi ), we may assume that M is of the form z 7→ az. Thinking of the lattices as subsets of C we have aΛ2 = Λ1 . It need not be the case that m = n (the area argument that we used in the hyperbolic case does not apply because branches of the correspondence need not be local Euclidean isometries, merely similarities). Note that all the covering groups are subgroups of the abelian group of translations of C, which we will write as C. This allows us to carry the algebra further than we could in the hyperbolic case. Indeed, the quotient 5

This application is implicit in [19].

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

102

map πD : C → D = C/Λ0 is actually a homomorphism of abelian groups; we may identify Λ0 = ker πD . Now πD (Λ1 ) and πD (Λ2 ) are subgroups of D of orders n and m respectively, with trivial intersection. These subgroups therefore generate a subgroup K < D of order mn, isomorphic to their direct −1 product.6 The group generated by Λ1 and Λ2 is also a lattice, Λ3 = πD (K),

and it follows that the correspondence T is separable to the quotient surface C/Λ3 . Nevertheless, T need not have valency – for example it could be a non-integer endomorphism of the elliptic curve. Thus we see that Lemma b is replaced by an elliptic curve. In this case, 3.1 no longer applies when C passing to the associated correspondence on C/Λ3 would not help us to study the dynamics because it reduces neither the genus nor the bidegree. Let π1∗ : C → D be the dual isogeny, so that π1∗ ◦ π1 is multiplication by n = deg (π1 ) on D and π1 ◦ π1∗ is multiplication by n on C. Then T = (π2 ◦ π1∗ ) ◦ (π1 ◦ π1∗ )−1 . The factor on the right is the inverse of the isogeny C → C given by multiplication by n. The factor on the left may be expressed z 7→ α(z) + t, where α : C → C is an isogeny, t is a point of C and addition is in the group C. In the case when C does not have complex multiplication, α must be multiplication by an integer m0 . Thus we may express T as multiplication by a rational m0 /n followed by a translation. By changing the choice of 0 ∈ C to a fixed point of T , we can dispense with the translation. Put m0 /n in its 6

In contrast, two commensurable Fuchsian groups may together generate a non-discrete

subgroup of PSL2 (R). For example, Jørgensen’s inequality shows that the commensurable groups corresponding to the third power of the AGM correspondence generate a nondiscrete group.

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

103

lowest form p/q. Then the bidegree of T is actually (p2 , q 2 ). It follows that if either m or n is not square then C does have complex multiplication. In the special case of bidegree (2, 2), we must have an isogeny of C of degree 4 which is not simply the composition of an automorphism of C with multiplication by 2. We can analyse these in terms of Λ0 = h1, τ i, where |τ | ≥ 1 and −1/2 < Re(τ ) ≤ 1/2. In terms of these generators, there are only three possibilities for the unordered pair of lattices Λ1 and Λ2 . They must be two of h1, τ /2i, h1/2, τ i and 1, (1 + τ )/2. On the other hand they are related by Λ1 = aΛ2 , where we may assume that |a| = 1 and Im(a) > 0. Since 1 ∈ Λ2 , we have a ∈ Λ1 , so 2a ∈ Λ0 so 2a ∈ {τ − 2, τ − 1, τ, τ + 1, τ + 2}. Each of these possibilities gives us a quadratic equation for a with integer coefficients, √ and we find after some calculation that a must be i or (1 + −3)/2, and that √ aΛ0 = Λ0 . The elliptic curves involved are C = C/h1, 2ii or C = C/h1, −3i respectively. Since we can dispense with the translation by choosing the zero in C to be a fixed point of T , we get finite dynamics in both cases.

3.5.2

Correspondences of bidegree (2,2) on a torus with two singular points in each direction

We want π1 and π2 to be branched covers of degree 2 from a compact Riemann surface R to a torus S, with precisely two critical values each, and therefore precisely two critical points each. The Riemann-Hurwitz formula tells us that R must have genus 2. Then π1 and π2 are quotient maps associated to involutions I1 and I2 of R, with precisely two fixed points each. Since we want f = π2 ◦ π1−1 to be a genuine (2, 2) correspondence and not an automorphism of the torus S, the involutions must be distinct. Now I1 and

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

104

I2 generate a group G = hI1 , I2 i of automorphisms of R, a subgroup of the finite automorphism group Aut(R), of order at least 4. Lemma 3.10. Any two distinct involutions a, b of any compact Riemann surface R generate a dihedral group of automorphisms. Proof. Every element of ha, bi can be expressed as an alternating word in the letters a and b. The alternating words of odd length are all conjugate either to a or to b, hence are not the identity. However, Aut(R) is finite so some even-length word is the identity. Now ab and its inverse ba have the same order, say m. We know m ≥ 2 since a−1 = a 6= b. Then ha, bi is the dihedral group of order 2m. We may view the quotient map πG : R → R/G as an analytic branched cover of degree |G| ≥ 4, branched at the (finitely many) fixed points of G. This makes R/G into a Riemann surface, which the Riemann-Hurwitz formula tells us must have genus 0. Now the quotient map πG factors through S in two ways: πG = π3 ◦ π1 = π4 ◦ π2 , where π3 and π4 are analytic branched covers of degree |G|/2. If |G| = 4, i. e. if I1 and I2 commute, then π3 and π4 are of degree 2, and in fact f = π4−1 ◦ π3 , so f is separable. This construction gives us the family of (2, 2) correspondences on the sphere with four forward-singular and four backward-singular points. Every one of these is M¨obius conjugate to some correspondence given by M ◦ ℘ ◦ T ◦ ℘−1 , where ℘ is the Weierstrass ℘-function for the (unique) torus branched

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

105

over the four forward singular points, T is a torus automorphism (which we may always take to be a translation), and M is a M¨obius map. Composing the same maps but starting instead at the torus, we get a separable (2, 2) correspondence on the torus: ℘−1 ◦ M ◦ ℘ ◦ T . When this is non-degenerate, its graph is a genus 2 surface, and separability tells us that the induced involutions do commute. The reader may wonder at this point whether are there any non-separable (2, 2) correspondences on any torus. We will show that there are, and attempt to classify all such correspondences. To show that a correspondence f is non-separable, it suffices to exhibit distinct points x, y ∈ S such that f (x) and f (y) have distinct but not disjoint sets of values. We have seen above that a non-separable (2, 2) correspondence on a torus S is π2 ◦ π1−1 , where for j = 1, 2, πj is the quotient map for an involution Ij , and where I1 and I2 do not commute. Lemma 3.11. Let R be a compact genus 2 Riemann surface and I a non-trivial involution of R with precisely two fixed points. Write S for the torus R/I obtained by identifying z with I(z) for each z ∈ R, and choose 0 ∈ S so that the branch points of the quotient map πI : R → S are α and −α. Then the involution w 7→ −w on S has two single-valued lifts to R, say J and K. Both are involutions commuting with I, and IJ = JI = K. We can label the lifts so that J has two fixed points and K has six fixed points. Proof. The involution of S induces an involution of the fundamental group of S \ {α, −α} which necessarily preserves the holonomy of the cover by R (this could fail if the cover had more than two sheets); hence the involution

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

106

lifts, and the two lifts are related by the sheet-swapping involution I. One of the lifts has two fixed points over 0 ∈ S while the other swaps the two points over 0. Their squares are both lifts of the identity map on S, with fixed points, and therefore both lifts are involutions. An involution of a genus 2 surface has either 2 or 6 fixed points. The quotient by the group {1, I, J, K} is the Riemann sphere and the Riemann-Hurwitz formula tells us that the quotient map has 10 critical points. Each of those critical points is a fixed point of just one of I, J or K (since they cannot share any fixed points). Thus without loss of generality we may take J to have two fixed points and K to have 6. Since {1, I, J, K} is a subgroup of order four of Aut(R), we find that whenever Aut(R) contains an involution with two fixed points, then 4 divides |Aut(R)|. In particular in order for R to be the graph of a non-separable (2, 2) correspondence on a torus, we must have |Aut(R)| ≥ 8. An involution of a compact Riemann surface R of genus g ≥ 2 is called b which hyperelliptic if it induces a branched covering map of degree 2 to C, happens if and only if it has 2g + 2 fixed points. When g = 2 then R necessarily has a hyperelliptic involution. The following lemma is standard. Lemma 3.12. Let R be a compact Riemann surface of genus g ≥ 2. Then R has at most one hyperelliptic involution. Proof. A hyperelliptic involution presents R as a curve w2 = f (z) where f is a polynomial of degree 2g + 2 with simple roots. The ratios of holomorphic differentials on R generate the subfield of meromorphic functions C(z) ⊂

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

107

C(R), which in turn determines the hyperelliptic involution: two points x, y ∈ R are related by the involution if and only if every function in this subfield takes the same value at x and y. Corollary 3.13. Let R be a compact genus 2 Riemann surface, and suppose that I1 , I2 are two non-commuting involutions of R with precisely two fixed points each. For m = 1, 2, construct Jm , Km associated to Im as in Lemma 3.11, so that K1 , K2 have six fixed points each. Then K1 = K2 , and I1 and I2 descend to give b involutions A1 , A2 of R/K1 = C. Note that I1 I2 is not the identity, nor is it K, since then we would have I2 = I1 K = J1 , which commutes with I1 . Thus A1 A2 does not descend to the identity map on R/K, so A1 6= A2 . Now A1 and A2 may or may not commute, so we have two cases to consider: Lemma 3.14. Assume the conditions of Corollary 3.13. 1. If A1 A2 = A2 A1 then hI1 , I2 i = D8 . The space of conformal classes of compact genus 2 Riemann surfaces with D8 symmetry is one-complex dimensional, and there are just three examples in which the quotient tori R/I1 and R/I2 are isomorphic, the j-invariants being (20)3 in one example and −(15)3 in the other two (complex conjugate) examples. 2. If A1 A2 6= A2 A1 then hI1 , I2 i = D12 . The space of conformal classes of compact genus 2 Riemann surfaces with D12 symmetry is one-dimensional (over C), and there are just five examples in which the quotient tori

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

108

R/I1 and R/I2 are isomorphic, the j-invariants being 2.(2.3.5)3 in one example, and also −(25 )3 and (20)3 , there being two complex conjugate cases of each. Here j is the j-invariant of an elliptic curve, with the normalisation com2

3

−λ+1) mon in number theory, namely j = 28 (λλ2 (λ−1) 2 for an elliptic curve branched

over {0, 1, ∞, λ}. Note that this is 1728 times Klein’s elliptic modular function. Proof. We treat the two cases separately. 1. Suppose A1 and A2 commute. Since I1 and I2 do not commute, we must have I1 I2 = I2 I1 K. Note that K commutes with I1 and with I2 , so I1 I2 I1 I2 = K = I2 I1 I2 I1 , hence ord(I1 I2 ) = 4, and we have D8 as required. The conformal type of R is determined by the conformal b with four branch type of the orbifold quotient R/hI1 , I2 i, which is C points marked. In fact by a M¨obius conjugation we may assume that b as z 7→ −z and that A2 acts as z 7→ 1/z. We still have a A1 acts on C choice of which lift of A1 to call I1 and which to call J1 , and of how to label the lifts of A2 as I2 and J2 . Since we may conjugate by A1 and A2 we lose no generality by taking I1 to fix the two points of R lying b and I2 to fix the two points lying over 1. over 0 ∈ R/K = C, b is branched over six points, which The quotient map R 7→ R/K = C are permuted with no fixed points by A1 and permuted with no fixed points by A2 (because I1 and I2 share no fixed points with K). Since b is determined by its action on three points, an automorphism of C A1 and A2 must act by different but commuting fixed-point-free in-

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

109

volutions of the set of six branch points. These two involutions have one common two-cycle, consisting of the solutions of z = −1/z, i. e. ±i.

The remaining four branch points are x, −x, 1/x, −1/x for

b \ {0, ∞, 1, −1, i, −i}. We may write the quotient map some x ∈ C R/K → R/hI1 , I2 i as φ : z 7→ 14 (z + 1/z)2 , which is a degree 4 map with critical points {0, ∞, ±1, ±i}, and critical values 0, 1, ∞. Now the torus R/I1 is determined by the four critical values of the degree 2 quotient map R/I1 → R/hI1 , Ki = (R/K)/A1 . The quotient map R/K → (R/K)/A1 is realised by z 7→ z 2 . Now, I1 fixes the two points over 0 ∈ R/K, so these two points are critical for R → R/I1 . Hence they are critical for the degree 4 quotient map R → R/hI1 , Ki = b whose critical values are 0, ∞, −1, x2 , 1/x2 . Thus (R/K)/A1 = C, R/I1 is the torus obtained as a degree 2 branched cover of the sphere branched over ∞, −1, x2 , 1/x2 . Similar considerations apply to R/I2 . This time the quotient map (R/K) → (R/K)/A2 is represented by z 7→ 21 (z+1/z) and then the critical values for the degree 4 quotient map b are 1, −1, 0, 1 (x + 1/x), −1 (x + 1/x). R → R/hI2 , Ki = (R/K)/A2 = C 2 2 However 1 is accounted for by the fact that R → R/I2 is critical at the two points over 1 ∈ R/K. We obtain a non-separable dynamical correspondence of bidegree (2, 2) when R/I1 and R/I2 are isomorphic. This happens precisely when their j-invariants are equal, i. e. when the cross-ratios     1 −1 2 2 ∞, −1, x , 1/x , −1, 0, (x + 1/x), (x + 1/x) 2 2 b generated by z 7→ 1 − z lie in the same orbit of the action of S3 on C

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

110

and z 7→ 1/z. We write w = 14 (x + 1/x)2 for the co-ordinate on the quotient sphere R/hI1 , I2 i. Then j(R/I1 ) =

64(4w − 3)3 (w − 1)

j(R/I2 ) =

64(3 + w)3 . (w − 1)2

and

Comparing these shows that j(R/I1 ) = j(R/I2 ) if and only if 64w3 − 209w2 + 243w − 162 = 0 . The roots of this polynomial are w = 2,

√ 9(9 ± 5 −7) . w= 128

The corresponding tori have j-invariants (20)3 and (−15)3 respectively. The torus with j-invariant (20)3 is .D √ E C 1, i/ 2 . The torus with j-invariant (−15)3 is √ + * 1+i 7 . C 1, 2 2. Consider the action of A1 and A2 permuting the six branch points of the quotient map R → R/K. Neither fixes any of those points (since I1 and K share no fixed points, for example). We assume now that A1 and A2 do not commute; in particular A1 A2 cannot fix all six points since it is a non-identity M¨obius map. The only possibility is that A1 A2 has order 3 with no fixed points; hence (I1 I2 )3 is either 1 or K. It can be 1, but then we may replace I2 with I2 K, which also has two fixed points and does not commute with I1 , so as to get K.

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

111

The elliptic curves arising here are rather special, being five of the 13 elliptic curves that have complex multiplication and integer j-invariant. These are the elliptic curves whose ring of endomorphisms is bigger than Z but is nevertheless a principal ideal domain. They arise from quadratic number fields with class number equal to 1. The fact that these curves have complex multiplication may be explained by the action on the Jacobian, if we can check that the correspondences do not have valency. Perhaps an analysis of the actions of I1 and I2 on the Jacobian of R would lead to a more conceptual explanation of why we get these particular curves. At the moment we can only remark that it is no surprise because these are in some sense the simplest elliptic curves. The affine linear examples of irreducible (2, 2) correspondences with no singular points also had complex multiplication with integer j-invariants: the lattice h1, 2ii gives j = (66)3 , while the √ lattice h1, −3i gives j = 2.(30)3 .

3.6

Dynamics of a D8 example

This section contains the preparatory work for a numerical exploration of one of the examples computed in the previous section. To study the dynamics numerically we need a fairly precise and rapid means of calculation – repeated calls to functions that evaluate elliptic integrals or theta functions are to be avoided. Instead we represent the genus 2 surface as a hyperelliptic curve, find the involutions I1 and I2 explicitly, and represent their quotients and quotient maps explicitly as hyperelliptic curves and algebraic maps. Fi-

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

112

nally we must represent algebraically the isomorphism between the two tori, using the known M¨obius map sending the branch points of one to the branch points of the other. To compose with a translation of the torus we need the elliptic curve addition formula, which again is an algebraic map. For a (2, 2) correspondence one should be able to get the hard computational work down to finding one square root per iteration. Of course, the validity of any iterative computations depends on the stability of the system, which is one reason for being interested in results such as Theorem 3.9.

3.6.1

Algebraic equations for the (2, 2) correspondences √

on the torus C/ 1, i/ 2

We follow the notation and normalisation from the proof of Lemma 3.14. The genus 2 surface R has hyperelliptic involution K, with R → R/K branched √ over the six points ±i, ±1 ± 2, i. e. R is the compactification of the curve in C2 given by y 2 = (x2 + 1)(x2 − 2x − 1)(x2 + 2x − 1). We have involutions I1 : (x, y) 7→ (−x, y) , I2 : (x, y) 7→ (1/x, y/x3 ) b by forgetting the second inducing the involutions A1 and A2 on R/K = C co-ordinate. The quotient map R → R/I1 is represented as (x, y) 7→ (x2 , y),

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

113

mapping onto the torus R/I1 represented as the curve t2 = (z + 1)(z 2 − 6z + 1), √ i. e. a double cover of the z-plane branched over z = −1, 3 ± 2 2, ∞. The quotient map R → R/I2 is represented as   1 1 2 (x + 1/x), (y/x + y/x ) , (x, y) 7→ 2 4 mapping onto the torus R/I2 represented as the curve t2 = z(z 2 − 2)(z + 1), √ i. e. a double cover of the z-plane branched over z = 0, ± 2, −1. The M¨obius map that sends the latter set of branch points to the former is z 7→

z−1 , z+1

which induces a map R/I2 → R/I1 given by

√  z − 1 t −8 (z, t) 7→ , z + 1 (z + 1)2 √ n. b. The choice of a value for −8 corresponds to a choice of one of the 

two isomorphisms R/I2 → R/I1 covering the M¨obius map z 7→

z−1 . z+1

It will

be convenient to take the (unique) point of R/I1 over z = ∞ to be the zero for the group law on R/I1 , because then ζ 7→ −ζ on the torus is represented by (z, t) 7→ (z, −t), i. e. by the induced action of K on R/I1 . Then the two isomorphisms are each other’s negative. Now we compose these maps to get a (2, 2) correspondence f0 : R/I1 → R/I1 , given by  (z, t) 7→

! √ √ 2 1− z t −8 √ √ , . 1+ z (1 + z)3

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

114

The correspondence f0 has two forward singular points, (0, ±1). Each √ of these has only one image, namely (0, 1) 7→ (1, −8) and (0, −1) 7→ √ (1, − −8). The point over z = ∞ is not forward-singular since R 7→ R/I1 is branched over ∞. The two points (0, 1) and (0, −1) are also the two critical values of f0 (i.e backward-singular points). They are one of the images of √ √ each of (1, −8) and (1, − −8), the other image of each of these being the point over z = ∞, which itself maps back to those two points. The two branches of f0 take the same value at (−1, 0), which is in fact a fixed point. Thus there are three points in the torus where f0 has only one value. Finally we make explicit the group law on R/I1 , having chosen the point over z = ∞ as the zero. Three points sum to zero if they are collinear. Thus we get the addition law (z0 , t0 ) ⊕ (z1 , t1 ) = (z2 , t2 ), where  z2 = 5 +

t1 − t0 z1 − z0

2 − z1 − z0 , 

t2 = −t0 − (z2 − z0 )

t1 − t0 z1 − z0

 .

We will use Greek letters to denote points of R/I1 . We now have a family of correspondences fζ , one for each ζ ∈ R/I1 . If ζ = (z0 , t0 ), we have  fζ : (z, t) 7→

! √ √ 2 t −8 1− z √ √ , ⊕ (z0 , t0 ). 1+ z (1 + z)3

We had two choices for the isomorphism R/I2 → R/I1 ; let’s call the √ √ √ √ resulting families fζ (taking −8 = i 8) and gζ (taking −8 = −i 8).

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

115

Then fζ = −g−ζ . The correspondence f0 is odd with respect to the group law, so fζ (−θ) = ζ − f0 (θ) = −f−ζ (θ), (by which we mean they have the same pairs of values). Thus fζ and f−ζ are conjugate (via K). The same holds for the family gζ , so fζ = gζ ◦K = K ◦g−ζ , and fζ ◦ fζ = gζ ◦ g−ζ .

3.6.2

Dynamics of f0

As is evident from its algebraic formula, f0 has a factor correspondence obtained by ignoring the second co-ordinate. The factor is the (2, 2) corresponb given by dence on C

√ 2 1− z √ z 7→ . 1+ z This correspondence is separable; in fact it has the M¨obius involution w 7→ 

w+3 w−1

as a factor via the map z 7→ 12 (z + 1/z). The correspondence f02 has two

algebraic components: (z, t) 7→ (z, −t), occurring in two ways, and √ (z, t) 7→ (1/z, ±t/z z). So even iterates of (z, t) eventually take just four values, and odd iterates eventually take just four values (possibly fewer for certain points (z, t)). This is rather similar to the situation in §3.5.1. However, inserting a generic translation of the torus destroys the factor, and allows the dynamics to be much more complicated.

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

116

Note that f0 is a critically finite correspondence, so after removing the complete critical orbit we could lift to the hyperbolic plane H and describe the correspondence by means of a discrete subgroup G of SL2 (R) and a commensurate conjugate group A−1 GA (as Bullett did for critically finite b [19]). It would be interesting to describe these groups correspondences on C explicitly. As a result of numerical experiments we discovered a rather nice geometrical way to describe the action of f0 . Slit the torus by removing all the points (z, t) with z ∈ [0, ∞]. This corresponds to cutting along the boundary of a certain rectangular fundamental domain in the covering plane, then making two short slits part of the way in towards the centre of the rectangle from the midpoints of the long sides. The vertices of this rectangle all √ correspond to the same point of the curve, namely (3 + 2 2, 0). The centre of the rectangle is the coincident fixed point (−1, 0). The short slits end at the forward-singular points of f0 , which are (0, −1) and (0, 1), and meet the √ long sides of the rectangle at (3 − 2 2, 0). Now each branch of f0 consists of an automorphism of this slit-rectangle, conjugate to a rotation of the unit disc; both branches fix the central point of the rectangle, and they rotate by π/4 or 3π/4 about that point. The two short sides of the rectangle and each half of each long side represent one eighth of the boundary of the domain (in terms of harmonic measure with respect to the central point), and both sides of a slit together represent a further one eighth. It is rather amusing to see how the two branches fit with each other when we cross a boundary segment of the slit fundamental domain. Consider the equivalence relation whose equivalence classes are grand

CHAPTER 3. HOLOMORPHIC CORRESPONDENCES

117

orbits of f0 . The quotient map for this relation is a degree 8 meromorphic function. In fact it is a Belyi map: we can arrange for it to be branched only over 0, 1, ∞ and for the slits in the previous paragraph to form some of the edges of the associated dessin d’enfant on the elliptic curve. f0 has a coincident fixed point, i.e. a point which is fixed for both branches of f0 ; taking this to be the zero of the elliptic curve, we can easily compute that the action on the Jacobian (which is naturally identified the elliptic √ curve itself) is multiplication by i 2. In particular f0 does not have valency. The topological description of f0 in terms of the Belyi map defined above does in fact determine uniquely the complex analytic structure of the elliptic curve. Likewise a topological description of the correspondence f0 allows us to compute the action on the Jacobian (which is a homological invariant) and this again determines the complex analytic structure uniquely. It is possible to describe topological correspondences in terms of pairs of topological branched covering maps. However, these need not always be realised by any holomorphic correspondence. It would be interesting to understand the obstruction. Now choose the zero in the elliptic curve to be the midpoint of the long √ side of the slit, which is the point (3 − 2 2, 0) in our algebraic description. √ The composition of f0 followed by multiplication by i 2 gives us a forwardcritically finite correspondence. This is our first explicit example for Theorem 3.9 in which there are singular points in both directions.

Chapter 4 Area distortion of polynomial mappings This chapter has been accepted for publication in the Bulletin of the London Mathematical Society under the title ‘The area of polynomial images and pre-images’.

4.1

Introduction

A lemniscate is the level set of a complex polynomial, i.e. E(p, r) = {z ∈ C : |p(z)| = rn } , where p is a polynomial of degree n over C. A well-known example is Bernoulli’s lemniscate, the degree 4 plane curve |z 2 − 1| = 1. The region enclosed by a lemniscate is a sublevel set for the modulus of a polynomial, and it is natural to ask how large this set can be. To take account of scaling, 118

CHAPTER 4. AREA DISTORTION OF POLYNOMIAL MAPPINGS 119 we take p to be monic. With this normalisation, P´olya proved in [59] that the area (two-dimensional Lebesgue measure dA) enclosed by the lemniscate E(p, 1) is bounded by a constant that does not depend on the choice of p. Theorem 4.1 ((P´ olya’s inequality)). Let p be a monic complex polynomial of degree n and let D be a closed disc in C. Then the Euclidean area of  1/n p−1 (D) is at most π Area(D) , with equality only when p : z 7→ a(z −b)n +c π and the centre of D is the point c, which is the unique critical value of p. Here we have made a slight reformulation of P´olya’s original statement, intended to suggest a generalisation to an arbitrary measurable set of finite area, in place of the disc D. This generalisation is the main result of this chapter. Theorem 4.2. Let p be a monic polynomial of degree n over C. Let K be any measurable subset of the plane. Then  1/n Area(K) −1 Area(p (K)) ≤ π , π with equality if and only if K is a disc, up to a set of measure zero, and p has a unique critical value at the centre of that disc. In fact we will establish the following bound on the area of the image of a set under a monic polynomial mapping. Theorem 4.3. Let p be a monic polynomial of degree n over C, and L be any measurable subset of the plane. Define the multiplicity n(z, p, L) to be the number of p-pre-images of z in L, counted according to their valency. Then the area of p(L) counted with multiplicity satisfies n  Z Z Area(L) 0 2 , n(z, p, L) dA(z) = |p (z)| dA(z) ≥ nπ π C L

CHAPTER 4. AREA DISTORTION OF POLYNOMIAL MAPPINGS 120 with equality if and only if L is a disc (up to a set of measure zero) and p has a unique critical value at the centre of that disc. To deduce Theorem 4.2 from Theorem 4.3, take L = p−1 (K), which maps onto K with multiplicity n everywhere, so that Z 1 Area(K) = |p0 (w)|2 dA(w) . n p−1 (K) A survey of area estimates for lemniscates has recently been given by Lubinsky [48], with a view towards applications in the convergence theory of Pad´e approximation. In [31], Eremenko and Hayman address the related problem of bounding the length of E(p, r). Fryntov and Rossi [33] have obtained a hyperbolic analogue of P´olya’s inequality, giving a sharp upper bound for the hyperbolic area of the pre-image of a hyperbolic disc under a finite Blaschke product. This raises the question of finding analogues for finite Blaschke products of theorems 4.2 and 4.3.

4.2

Logarithmic capacity

In this section and the next we outline the machinery that we will use to prove Theorem 4.3. Logarithmic capacity and condenser capacity are powerful tools of potential theory with many alternative descriptions. In order to make this paper as accessible as possible, we describe them only in terms of polynomials and rational functions. The results in this section are all classical, but the author does not know of one comprehensive reference. For a clear introduction to potential theory in C, including proofs of our statements about logarithmic capacity, see [62]. Further information about the capacity

CHAPTER 4. AREA DISTORTION OF POLYNOMIAL MAPPINGS 121 and modulus of a condenser may be found in [4, 5, 39, 65], and the appendix of [67]. Ahlfors [2, Ch. 4] casts condenser capacity as a conformal invariant of extremal length type, called extremal distance. Given a compact subset K ⊂ C and a polynomial q, we write kqkK = max |q(z)| . z∈K

Let M be the set of all monic complex polynomials. The logarithmic capacity of K is cap (K) = inf{(kqkK )1/ deg q : q ∈ M} Logarithmic capacity may also be characterised as the unique non-negative real function of compact sets satisfying the following four conditions.  1. cap D = 1, where D is the closed unit disc in C. 2. (monotonicity) If K1 ⊂ K2 then cap (K1 ) ≤ cap (K2 ). 3. (outer continuity) If Kn ↓ K then cap (Kn ) ↓ cap (K). 4. If p(z) = an z n + . . . a0 is a complex polynomial with an 6= 0, then −1

cap (p (K)) =



cap (K) |an |

1/n .

Thus for any monic polynomial p, the capacity of the lemniscate E(p, r) is r. Using a linear polynomial in condition (4) we see that cap (aK + b) = a cap (K) , so logarithmic capacity is a one-dimensional measurement of the size of K. If d is the diameter of K then cap (K) ≤ d/2, and if K is connected then

CHAPTER 4. AREA DISTORTION OF POLYNOMIAL MAPPINGS 122 cap (K) ≥ d/4. Thus it is not surprising that there is an ‘isoperimetric’ inequality comparing logarithmic capacity and area. A compact set K ⊂ C is called polar when cap (K) = 0. Theorem 4.4. For any compact set K ⊂ C, Area(K) ≤ π cap (K)2 , with equality if and only if K is the union of a closed disc and a polar set. For a proof using a simple kind of symmetrization, see [62, Thm. 5.3.5]. Because of condition (4) above, Theorem 4.1 is the special case of this isoperimetric inequality in which K is restricted to be a lemniscate. In fact P´olya first proved Theorem 4.1 by a clever use of Gronwall’s area formula then used it to deduce Theorem 4.4 [48, 59]. To make this deduction, consider the ˜ = C \ U , where U is the unbounded component of C \ K. The filled-in set K ˜ = cap (K). On the other maximum modulus theorem shows that cap (K) ˜ by approximathand the following result shows that we can estimate cap (K) ˜ from outside by filled-in lemniscates, whose areas are bounded above ing K in Theorem 4.1. Theorem 4.5 ((Hilbert’s Lemniscate Theorem)). Let K be a compact subset of C such that C\K is connected, and let V be any open neighbourhood of K. Then there exists a complex polynomial q such that |q(z)| >1 kqkK

for all z ∈ C \ V .

We can use the scaling properties of logarithmic capacity to formulate a scale-invariant version of Theorem 4.2. For a non-polar compact set K,

CHAPTER 4. AREA DISTORTION OF POLYNOMIAL MAPPINGS 123 define Area(K) . π cap (K)2 Then ρ(K) is a measure of the roundness of K: from Theorem 4.4 we find ρ(K) =

that ρ(K) ∈ [0, 1] and that ρ(K) = 1 if and only if K is a full-measure subset of a disc. Divide both sides of the inequality of Theorem 4.2 by cap (K)2/n = cap (p−1 (K))2 to obtain the following theorem. Theorem 4.6. If p is any complex polynomial of degree n, not necessarily monic, and K is any non-polar compact subset of the plane, then ρ(p−1 (K)) ≤ ρ(K)1/n . This is sharp for each value of ρ(K). Indeed, any value of ρ(K) in [0, 1] can be achieved by taking K = [0, 1] ∪ {z ∈ C : |z| ≤ α} and p : z 7→ z n . This gives equality in Theorem 4.2 and hence also in Theorem 4.6.

4.3

Condenser capacity

Definition. A plane condenser is a pair (K, L) of disjoint compact subsets b of the Riemann sphere C. b → R admissible for the condenser We call a continuous function f : C (K, L) if f = 0 on K, f = 1 on L and f is continuously diffferentiable on b C\(K ∪L). The condenser capacity cap (K, L) is the infimum over admissible f of the Dirichlet integral 1 D(f ) = 2π

Z

|∇f (z)|2 dA(z) .

C

Caveat: in some works the condenser capacity is given as half of this. Condenser capacity has the following properties:

CHAPTER 4. AREA DISTORTION OF POLYNOMIAL MAPPINGS 124 b \ D(0, S)) = 1/(log S − log R), 1. cap (D(0, R), C where D(0, R) is the open disc of radius R about 0. 2. (monotonicity) If K1 ⊂ K2 and L1 ⊂ L2 then cap (K1 , L1 ) ≤ cap (K2 , L2 ). 3. (outer continuity) If (Kn , Ln ) is a sequence of condensers such that Kn ↓ K and Ln ↓ L as n → ∞, then cap (Kn , Ln ) → cap (K, L). 4. (conformal invariance) Suppose that (E, F ) and (K, L) are condensers, b \ (E ∪ F ) and V = C b \ (K ∪ L). Suppose that ϕ : U → V is with U = C analytic and n-valent, i.e. every point in V has precisely n pre-images in U (counted with multiplicity). Finally suppose that ϕ(z) → K as z → E in U and ϕ(z) → L as z → F in U . Then cap (E, F ) = n cap (K, L) . To prove (4), approximate K and L by their closed -neighbourhoods K and L , whose boundaries are regular for the Dirichlet problem. Then there is a unique extremal function f in the definition of cap (K , L ). It is the b \ (K ∪ L ). Consider the unique admissible function that is harmonic on C pulled-back condenser (E 0 , F 0 ) = (E ∪ ϕ−1 (K ), F ∪ ϕ−1 (L )). The pullback f ◦ ϕ extended by 0 on E and by 1 on F is admissible for (E 0 , F 0 ) and b \ (E 0 ∪ F 0 ). Therefore it is the unique extremal function for harmonic on C (E 0 , F 0 ). We have Z D(f ◦ ϕ) =

|∇(f ◦ ϕ)(z)|2 dA(z)

0 ∪F 0 ) b C\(E

Z =

|(∇f )(ϕ(z))|2 |ϕ0 (z)|2 dA(z)

0 ∪F 0 ) b C\(E

Z = n b C\(K  ∪L )

|∇f (w)|2 dA(w)

=

n D(f ) .

CHAPTER 4. AREA DISTORTION OF POLYNOMIAL MAPPINGS 125 Finally take the limit as  & 0 and apply outer continuity of condenser capacity. An easy consequence of the conformal invariance is that for any rational function R of degree n, we have 

minL |R| maxK |R|

1/n ≤ exp(1/cap (K, L)) .

In fact the right-hand side is the supremum of the left-hand side as R ranges over all rational functions (see [34]). This is the analogue for condensers of our initial definition of logarithmic capacity. Whereas logarithmic capacity scales one-dimensionally, the condenser capacity is invariant under scaling, so we cannot use it to estimate the area of K or L. However, in the case that ∞ ∈ L we can estimate the ratio of the b \ L: Euclidean areas of K and C Theorem 4.7. [Carleman, 1918] Let (K, L) be a condenser with ∞ ∈ L. Then 2 ≤ log cap (K, L)

b \ L) Area(C Area(K)

! ,

b \ L and K are concentric discs in the plane, up with equality if and only if C to the addition of closed sets of logarithmic capacity zero to K and L. The proof of Carleman’s inequality uses the fact that the Dirichlet integral D(f ) does not increase when f is replaced by its Schwarz symmetrization. This is the function S(f ) whose superlevel sets are concentric discs with the same Euclidean area as the corresponding superlevel sets of f . For details, see the classic book of P´olya and Szeg¨o, [60], or [5] for a more recent account and a generalisation to variable metrics.

CHAPTER 4. AREA DISTORTION OF POLYNOMIAL MAPPINGS 126 The logarithmic capacity can be recovered from condenser capacities: # " 1 − log R . − log cap (K) = lim b \ D(0, R)) R→∞ cap (K, C This is easy to prove using the characterisations of logarithmic capacity and condenser capacity in terms of minimal energy of Borel measures. It reveals the isoperimetric theorem for logarithmic capacity as a limiting case of Carleman’s isoperimetric inequality for condenser capacity.

4.4

Proof of Theorem 4.2

Lemma 4.8. For any complex polynomial g of degree d, Z 2x |g(w)| 1{|g(w)|≤x} dA(w) ≥ Area({w ∈ C : |g(w)| ≤ x}) . d+2 C Proof. By conformal invariance of condenser capacity, we have     d b \ B(0, x) , g −1 (B(0, s)) = cap g −1 C . (log x − log s) Theorem 4.7 gives  s 2/d Area ({w ∈ C : s ≤ |g(w)| ≤ x}) ≥ 1− , Area ({w ∈ C : |g(w)| ≤ x}) x so Z

Z |g(w)| 1{|g(w)|≤x} dA(w) =

C

x

Area ({w ∈ C : s ≤ |g(w)| ≤ x}) ds Z x  s 2/d  ≥ Area ({w ∈ C : |g(w)| ≤ x}) 1− ds x 0 2x = Area({w ∈ C : |g(w)| ≤ x}) . d+2 0

CHAPTER 4. AREA DISTORTION OF POLYNOMIAL MAPPINGS 127 Now fix a monic polynomial p and A > 0. Among all measurable sets K satisfying Area(K) = A, the Dirichlet integral Z |p0 (w)|2 dA(w) K

is minimised when K is the sublevel set Kt = {w ∈ C : |p0 (w)|2 ≤ t}. Here t is determined uniquely by the condition Area(Kt ) = A. The polynomial z 7→ (p0 (z)/n)2 is monic, with degree 2n − 2, so Theorem 4.1 gives  A = Area(Kt ) ≤ π

π(t/n2 )2 π

1/(2n−2) .

Rearranging this we have  n−1 A . t≥n π 2

Now we apply Lemma 4.8 to the polynomial g = (p0 )2 to obtain Z Z 0 2 |p (w)| dA(w) = |p0 (w)|2 1{|p0 (w)|2 ≤t} dA(w) Kt

C

2t tA Area (Kt ) = 2n   n n A . ≥ nπ π ≥

For equality, we must have equality in our application of P´olya’s inequality, so p must be p : z 7→ (z − b)n + c, and K can differ from disc Kt at most by a set of 2–dimensional Lebesgue measure zero. This completes the proof of Theorem 4.3.

Chapter 5 Smale’s Mean Value Conjecture Smale’s mean value conjecture is a well-known inequality constraining the location of critical points and critical values of a polynomial mapping the complex plane into itself. We give an algebraic proof that for any isolated local extremum for Smale’s mean value conjecture, all the objective values are equal. We also generalise Smale’s conjecture to rational maps of the Riemann sphere, proving a version which only fails to be best possible by a constant multiplicative factor. We also discuss the special case of rational maps all of whose critical points are fixed, giving a construction based on the Newton–Raphson method.

5.1

Smale’s Mean Value Conjecture

Let p be any polynomial with coefficients in C. We say that ζ ∈ C is a critical point of p if p0 (ζ) = 0. Then p(ζ) is the corresponding critical value. In 1981 Stephen Smale proved the following result about critical points and

128

CHAPTER 5. SMALE’S MEAN VALUE CONJECTURE

129

critical values of polynomials. Theorem 5.1. Let p be a non-linear polynomial and z any given complex number. Then there exists a critical point ζ of p such that p(ζ) − p(z) 0 ζ − z ≤ 4 |p (z)| . p(ζ)−p(z) The values of (ζ−z)p 0 (z) as ζ ranges over all critical points of p are called the objective values, and any critical point ζ which minimises p(ζ)−p(z) ζ−z among all critical points is called an essential critical point with respect to z. Smale asked whether the constant 4 could be reduced to 1, or in the case of a polynomial of degree n, to 1 − 1/n, which would be best possible in view of the example p(z) = z n + z. Thus Smale’s mean value conjecture is the statement that p(ζi ) − p(z) n − 1 ≤ . min (ζi − z)p0 (z) n Various special cases have been solved, but at present the best known result that applies to all polynomials replaces 4 by 4(n−2)/(n−1) [15]. It is widely assumed that for each n ≥ 2 there exists at least one polynomial of degree n that achieves the best possible constant for polynomials of degree n. We will refer to such extremal polynomials as globally maximal polynomials. In fact it is conjectured that the polynomial p(z) = z n + z is (after normalisation) the unique globally maximal polynomial. David Tischler has shown that this polynomial is an isolated local maximum (see [72] and [73]).

CHAPTER 5. SMALE’S MEAN VALUE CONJECTURE

5.1.1

130

Algebraic results on Smale’s Conjecture

We will work in the family Gn = {monic polynomials p of degree n with p(0) = 0 and p0 (0) = 1}. Each such polynomial is of the form p(z) = z n + an−1 z n−1 + · · · + a2 z 2 + z, and the vector (a2 , . . . , an−1 ) ∈ Cn−2 gives co-ordinates for the parameter space. Note that by pre- and post-composition with affine maps, every polynomial of degree n can be put into a unique such form, and that these transformations do not change the objective values in Smale’s mean value conjecture. Proposition 5.2. For each n ≥ 2 there exists a polynomial Qn such that (λ1 , . . . , λn−1 ) is the vector of objective values λi =

p(ζi ) ζi p0 (0)

for some polynomial p of degree n if and only if (λ1 , . . . , λn−1 ) ∈ X \ T , where X is the affine hypersurface in Cn−1 defined by Qn = 0 and T is some proper subvariety of X. In particular the set of possible vectors of objective values is an open subset of X in the usual topology. Remark: the related problems of constructing a polynomial with given critical points or with given critical values do not have this feature: in those problems, every vector of objective values is possible (see [14]).

CHAPTER 5. SMALE’S MEAN VALUE CONJECTURE Proof. Let q be the polynomial given by q(z) = objective values λi =

p(ζi ) ζi

p(z) , z

131

where p ∈ Gn . The

are the values of t for which q(z) − t and p0 (z) have

a common root in z, so they are the roots of the resultant   n−1 Y n−1 Y p0 (z) R(t) = Res q(z) − t, = (αi − ζj ), n i=1 j=1 where the αi are the roots of q(z) = t as a polynomial in z, and the ζj are the roots of p0 (z) = 0, both repeated according to multiplicity. Note that the multiplicity of roots of R(t) counts the λi appropriately. We can calculate the resultant R(t) from Bezout’s or Sylvester’s determinant, so the coefficients of R(t) are polynomials in the coefficients ai . One can easily check that the leading term of R(t) is (−1)n−1 tn−1 , independent of the choice of p. (This is a useful consequence of our normalisation of p). Let sk be the sum of the products of the λi taken k at a time (repeating the λi according to their multiplicity). These elementary symmetric functions are given by the coefficients of R(t) taken with appropriate signs. Thus for k = 1, . . . , n − 1 there are polynomials σk in n − 2 variables such that sk = σk (a2 , . . . , an−1 ). Now there must be an algebraic relation Rn (σ1 , . . . , σn−1 ) = 0 between the polynomials σk in the polynomial ring C[a2 , . . . an−1 ], since there are n − 1 of them. (If not, the polynomials σi would generate a subring of C[a2 , . . . , an−1 ] of (Krull) dimension n − 1, which is impossible.) In fact we can choose Rn to be irreducible; later we will check that this implies that the map ϕ : (a2 , . . . , an−1 ) 7→ (σ1 (a2 , . . . an−1 ), . . . , σn−1 (a2 , . . . , an−1 ) is a dominant morphism. On the other hand there is no non-trivial algebraic relation between the sk in the polynomial ring generated by the variables λi . Therefore

CHAPTER 5. SMALE’S MEAN VALUE CONJECTURE

132

substituting the expansions of the sk gives the required non-trivial algebraic relation between the λi , say Qn (λ1 , . . . λn−1 ) = 0. For the converse, we will show in Lemma 5.3 that the image of ϕ is (n − 2)dimensional, then the irreducibility of Rn implies that ϕ is dominant. Therefore the image of ϕ is W \ S, where W is the affine hypersurface defined by Rn = 0 and S is some proper subvariety of W . In particular the image of ϕ is an open subset of W in the usual topology (as well as in the Zariski topology). Now the map that sends the vector of roots of a monic polynomial of degree (n−1) to the corresponding vector of elementary symmetric functions is a finite surjective morphism ψ : Cn−1 → Cn−1 , of degree (n − 1)!, so the quasi-affine variety ψ −1 (W \ S) = X \ T has the properties described by the theorem. We may also wish to include in S the image under ϕ of the vectors (a2 , . . . , an−1 ) corresponding to polynomials with repeated roots; this is conQn−1 λi = 0. tained in the subvariety of W defined by i=1 Lemma 5.3. The image of ϕ is (n − 2)-dimensional. Proof. We consider the vector of values ci = p(ζi )/ζi as functions of the critical points ζ1 , . . . , ζn−1 , after dropping the requirement that p0 (0) = 1. We claim that the mapping χ : (ζ1 , . . . , ζn−1 ) → (c1 , . . . , cn−1 ) is a dominant morphism from Cn−1 to Cn−1 . In fact we will show that there exists a polynomial p0 at which the derivative of χ has rank (n − 1), such that Qn−1 the product function π : (ζ1 , . . . , ζn−1 ) 7→ i=1 ζi takes the value 1 and has

CHAPTER 5. SMALE’S MEAN VALUE CONJECTURE

133

non-zero derivative at p0 . The image of ϕ is the image of χ restricted to the level set (π = 1), so the implicit function theorem yields the lemma. Let rk be the k-th elementary symmetric function of the critical points. We may take p to be monic, so that 0

p (z) = n

n−1 Y

 (z − ζi ) = n z n−1 − r1 z n−2 + r2 z n−3 − · · · + (−1)n−1 rn−1 ,

i=1

p(z) z n−1 z n−2 z0 = − r1 + · · · + (−1)n−1 rn−1 . nz n n−1 1 In particular, p(ζi )/(nζi ) is a polynomial function of the critical points. At the point Z0 = (1, 1, , . . . , 1) we can compute the derivative matrix of χ as follows. For i 6= j,      n−2 n−2 n−2 ∂ p(ζi ) 0 1 2 = − + − + · · · + (−1)n−1 ∂ζj nζi n−1 n−2 n−3 (−1)n−1 . = n−1

n−2 n−2



1

For i = j, " #      n−2 n−2 n−2 n−2 ∂ p(ζi ) = − 0 + 1 − 2 + · · · + (−1)n−1 n−2 + ∂ζi nζi n−1 n−2 n−3 1       n−1 1 n−1 1 − 1− + ... 1− n 0 n−1 1    1 n−1 n−1 · · · + (−1) 1− 1 n−1 n−1 n−1 (−1) (−1) = + . n−1 n It follows that the determinant of the derivative of χ at Z0 is non-zero (because

1−n n

is not a root of the characteristic polynomial of the (n − 1) by

(n − 1) matrix each of whose entries is 1.

CHAPTER 5. SMALE’S MEAN VALUE CONJECTURE

134

As a consequence of Proposition 5.2, we obtain are able to obtain Theorem 5.4, which may be a useful step towards a proof of Smale’s conjecture. Bear in mind that Tischler has shown that for each degree n there is at least one polynomial locally strictly maximal in Gn for Smale’s conjecture, namely i) z n +z, (see [72] and [73]) (i. e. it is an isolated local maximum for min ζp(ζ ). 0 i p (0) We know that the vector of objective values for any polynomial p lies inside a relatively open subset of the analytic set X \ T . Recall that we call a critical is minimal among all critical points of p. point ζ of p essential when p(ζ) ζ We are not aware of any proof in the literature of the following highly plausible conjecture, although it is likely that the quasiconformal deformation method of Tischler could be adapted to prove it, if it were known that the best possible constant for Smale’s mean value conjecture is a strictly increasing function of the degree. Conjecture 2. The (globally) maximal polynomials of degree n are contained in a compact region of the parameter space Gn . Theorem 5.4. For any isolated local maximal polynomial of degree n, all the objective values in Smale’s mean value conjecture have equal modulus, i. e. every critical point is essential. If conjecture 2 is true then the same applies to any globally maximal polynomial. Even without conjecture 2, if there exists a globally maximal polynomial of degree n, then there exists a globally maximal polynomial for which all the objective values have equal modulus. The usefulness of this theorem is that (subject to proving conjecture 2) it reduces proving Smale’s mean value conjecture to proving it for the special

CHAPTER 5. SMALE’S MEAN VALUE CONJECTURE

135

case where all the objective values are equal in modulus. This is a condition that might be suitable for attack by results of geometric function theory. Proof. Suppose that pˆ is a maximal polynomial for which the moduli of the ˆ i are not all equal; in fact suppose that the essential critical objective values λ points are ζ1 , . . . ζk , (repeated according to multiplicity), where k ≤ n − 2. ˆ1, . . . λ ˆ n−1 ) ∈ Cn−1 . Consider the complex line L in Cn−1 Write v0 = (λ ˆ 1 , . . . , xn−2 = λ ˆ 2 . Suppose that L were contained in X, (the given by x1 = λ analytic hypersurface Qn = 0). Then L ∩ T must be finite since v0 ∈ L \ T . But L ∩ (X \ T ) cannot be all of L \ T because then all points of L \ T outside a disc in L (and including v0 ) would come from polynomials achieving the same minimum objective modulus in Smale’s conjecture as v0 , contrary to the hypothesis that pˆ is a locally maximal polynomial. If conjecture 2 is true, then because ϕ is continuous and ψ is proper, this implies that the vectors of objective values for globally maximal polynomials of degree n are also contained in a compact set, which again shows the impossibility of L ⊂ X. The Weierstrass Preparation Theorem has the following consequence. The zero locus of an analytic function f (z1 , . . . , zn−2 , w), not vanishing identically on the w-axis, projects locally onto the hyperplane (w = 0) as a finite-sheeted cover branched over the zero locus of an analytic function. (See [37], chapter 0.1, for details). Applying this to f = Qn , it follows that there is an subset U ⊂ X, open in ˆ 1 |, |x2 | > |λ ˆ 2 |, . . . , |xn−2 | > the usual topology, close to v0 , on which |x1 | > |λ ˆ n−2 |, but |xn | is as close as we like to |λ ˆ n−1 |. In U there must be a point v1 |λ

CHAPTER 5. SMALE’S MEAN VALUE CONJECTURE

136

of X ⊂ T because T is a proper subvariety of X. Then v1 shows that v0 was not maximal for Smale’s mean value conjecture after all. This contradiction shows that it is not possible for the objective values of a locally maximal polynomial not to be all equal. Finally, if conjecture 2 is false and pˆ is a globally maximal polynomial then we cannot use the Weierstrass Preparation Theorem, but we can find ˆ 1 |; indeed we can choose such a point of the complex line L with |xn−1 | = |λ a point to avoid T because T cannot contain a circle in L. By permuting the co-ordinates and repeating the argument we eventually obtain a globally maximal polynomial all of whose objective values are equal in modulus. Lemma 5.5. For any p in Gn , we have n Y i=1

|λi | =

disc(p) . nn−1

In particular for a maximal polynomial p in Gn , we have for all i (n − 1)/n ≤ min |λi | = i

(disc(q))1/(n−1) (disc(p))1/(n−1) = . n n

Hence any maximal polynomial p satisfies disc(p) ≥ (n − 1)(n−1) . Proof. Suppose that p is a maximal polynomial in Gn . Let z0 = 0, z1 , . . . , zn−1 be the (distinct) roots of p, let q(z) = p(z)/z and let ζ1 , . . . ζn−1 be the roots of p0 , repeated according to multiplicity. Then the constant coefficient of

CHAPTER 5. SMALE’S MEAN VALUE CONJECTURE

137

R(t) is n−1 n−1 Y n−1 Y Y p0 (zj ) p0 Res(q, ) = (zj − ζi ) = n n j=1 i=1 j=1

= n

n−1 Y j=0

=

n−1 Y1Y p0 (zj ) =n (zj − zi ) n n j=0 i6=j

(−1)n(n−1)/2 disc(p), nn−1

where disc(p) stands for the discriminant of p. The condition p0 (0) = 1 tells us that disc(p) = ±disc(q). The leading term of R(t) is (−1)n−1 tn−1 . Recalling that the λi are the roots of R(t), we find that n−1 Y

|λi | =

i=1

disc(p) . nn−1

Since p is maximal for Smale’s mean value conjecture, we know from Theorem 5.4 that the |λi | are all equal, which gives the first part of the corollary. The second part arises from the fact that we know |λi | ≥

n−1 n

for a maximal

polynomial, because of the example z n + z. This immediately gives us as a corollary a result originally proved by Tischler using a different method. Theorem 5.6 (Tischler). Among polynomials p in Fn such that the roots of q(z) = p(z)/z all lie on the unit circle, p(z) = z n + z is maximal for Smale’s mean value conjecture. Proof. It is well-known that among all monic polynomials q whose roots all lie on the unit circle, the modulus of the discriminant is maximised uniquely when those roots are equally spaced. Observe that the minimum of the |λi |

CHAPTER 5. SMALE’S MEAN VALUE CONJECTURE

138

is at most their geometric mean, with equality only if they are all equal. The further condition that q(0) = p0 (0) = 1 leaves us with a unique polynomial maximal among those whose roots lie on the unit circle, namely q(z) = z n−1 + 1. Any explicit information that we can obtain about the polynomial Qn may help us to pin down the extremal polynomials. For example, in the case n = 3 we have Q3 (λ1 , λ2 ) = (λ1 − 1/2)(λ2 − 1/2) − 1/36. Setting Q3 = 0 forces either λ1 or λ2 to lie in the closed disc of radius 1/4 centred on 1/2, therefore to have modulus at most 3/4. The only case of equality here is λ1 = λ2 = 3/4, which corresponds (uniquely, after normalisation) to the polynomial p(z) = z 3 + z. The relation Q4 is already formidable: Let u = 4(λ1 − 1/2), v = 4(λ2 − 1/2), and w = 4(λ3 − 1/2). Then Q4 (λ1 , λ2 , λ3 ) = 3 − 4 (u + v + w) − 4 (uv + vw + wu) + 8 (u2 v + v 2 u + v 2 w + w2 v + w2 u + u2 w) + 33 uvw + 4 uvw(u + v + w) − 16 uvw(u2 + v 2 + w2 ) − 52 uvw(uv + vw + wu) − 3u2 v 2 w2 + 72 u2 v 2 w2 (u + v + w) − 81 u3 v 3 w3 . Unfortunately it is not possible to separate the variables to rewrite the equation Q4 = 0 as R(λ1 ) · R(λ2 ) · R(λ3 ) = constant,

CHAPTER 5. SMALE’S MEAN VALUE CONJECTURE

139

where R is any rational function. As for Q5 , a crude estimate to bound its degree shows that the degree is at most 3420; it is probably rather smaller, but nevertheless Q5 is likely to be impossible to calculate explicitly. One further thing we can do is apply calculus to variation of the arguments n of λi . This gives the condition for a maximal polynomial that arg(λi ∂Q ) ∂λi

must be equal for all i = 1, .., n − 1. Together with the condition that the λi have equal moduli and satisfy Qn , this gives in theory enough constraints to leave only a finite set of candidate maxima to examine. However since we are unable to make effective use of this information, we omit the details.

5.2

The mean value conjecture for rational maps

In Smale’s original proof (to obtain the constant 4), the only fact about polynomials as special examples of rational maps that is used is that P (∞) = ∞. The proof does not use the fact that ∞ is a critical point for P or that there are no other pre-images of ∞. This allows us to use the Smale’s proof to prove a version of the mean value conjecture for rational maps: Theorem 5.7. Let R be any rational map of degree at least 2, and let x, y be points of b with R(x) 6= R(y), such that x is not a critical point of R. Then there C exists a critical point ζ of R and a M¨ obius map M such that M ◦ R fixes each of x, y and ζ, and with respect to any local co-ordinate at x we have |(M ◦ R)0 (x)| ≥ 1/4.

CHAPTER 5. SMALE’S MEAN VALUE CONJECTURE

140

Remark: The special case with y = ∞ and R a polynomial is a simple reformulation of Smale’s result. Because x is fixed by M ◦ R, the derivative at x of M ◦ R does not depend on the choice of local co-ordinate. ˆ = T ◦R◦ Proof. The statement is certainly true for R if it is true for R S, where T and S are M¨obius maps. Choose a M¨obius map S such that S(∞) = y and S(0) = x, and a M¨obius map T such that T (R(y)) = ∞ and ˆ = T ◦ R ◦ S is a rational map fixing 0 and ∞. Let T (R(x)) = 0. Then R D be the largest open disc centred on 0 that carries a single-valued branch ˆ −1 mapping 0 to 0; let D ˜ be the image of β, i. e. the component of β of R ˆ −1 (D) containing 0. There is a critical point ζ of R ˆ on the boundary ∂ D, ˜ R ˆ with R(ζ) ∈ ∂D. Now β is a univalent map from the disc about 0 of radius ˆ ˜ which omits ζ, so Koebe’s 1 -Theorem gives |R(ζ)| to the domain D, 4 ˆ −1 )0 (0)| ≤ 4 |(R

|ζ| . ˆ |R(ζ)|

We now ask what is the best possible constant in Theorem 5.7 to replace 1/4, (perhaps in terms of the degree of R). The example R(z) = z n + z, y = ∞ shows that n/(n − 1) would be best possible. The proof precludes the case of equality, because then the branch β of R−1 would cover the whole of the Riemann sphere except for a slit, and the complement of D could not be covered by R. A degree 2 rational map must have exactly two simple critical points; after M¨obius maps at both ends we may assume that the map is z 7→ z 2 , and that x = 1 in the theorem; in this case the derivative at x is 2. So the situation for degree 2 is no different from the polynomial case.

CHAPTER 5. SMALE’S MEAN VALUE CONJECTURE

141

The reader may think that this generalised mean value conjecture looks a little unnatural due to the unequal roles of x and y. We can make it more symmetrical by letting y be a variable critical point, giving it the same status as ζ. Here is the analogue of Smale’s Theorem: Theorem 5.8. b be given, such Let R be a rational map of degree at least 2, and x ∈ C that R(x) is not a critical value of R. Then there exist two critical points ζ and κ of R such that if M is the M¨ obius map that makes x, ζ and κ fixed points of M ◦ R, then with respect to any local co-ordinate at x we have |(M ◦ R)0 (x)| ≥ 1/2. Proof. The conclusion is unchanged if we replace R by M ◦ R ◦ N , where M and N are any M¨obius maps, so w. l. o. g. we may assume x = ∞ = R(x); then all the critical values of R are finite and there are at least two of them. Let K be the convex hull of the critical values. Then there exist two critical points ζ and κ such that diam(K) = |R(ζ) − R(κ)| > 0. Now there exists b \ K, taking ∞ to ∞, and a single-valued branch β of R−1 defined on C b \D→C b \ K be a Riemann map fixing ∞. The omitting ζ and κ. Let f : C logarithmic capacity of a compact subset of C of diameter d is at most d/2 [62, Theorem 5.3.4] so f # (∞) ≥ 2/|R(ζ)−R(κ)|. The Koebe 41 -Theorem tells us that (β ◦ f )# (∞) ≤ 4/|ζ − κ|, so we have

1 R# (∞)

= β # (∞) ≤ 2 |R(ζ)−R(κ)| . |ζ−κ|

Take M to be the affine map that makes M ◦ R fix ζ, κ and ∞. Then M # (∞) =

|R(ζ)−R(κ)| , |ζ−κ|

which gives the result.

As before, we ask what is the best possible constant to replace 1/2 in Theorem 5.8. Taking R(z) = (z n − nz)/(1 − n), for which zero and all the

CHAPTER 5. SMALE’S MEAN VALUE CONJECTURE

142

critical points including ∞ are fixed, there is only one choice for M , namely the identity map; in this case we have (M ◦ R)0 (0) = (n/n − 1). So the constant n/(n − 1) to replace 1/2 in Theorem 5.8 would be best possible, and the constant 1 would be the best possible bound independent of the degree of R.

5.2.1

Rational maps with all critical points fixed

If we can arrange for every critical point to be fixed and for there to be at least one non-critical fixed point left over, then we get an example for Theorem 5.8 in which there is only one choice of M¨obius map M , making it easy to compute an upper bound on the best possible constant for that theorem. A rational map of degree n ≥ 2 has 2n − 2 critical points and n + 1 fixed points, both counted with multiplicity, so there will have to be some multiple critical points. In this section we will present a method of constructing rational maps all of whose critical points are fixed, and which have only one further fixed point (which after conjugation we may take to be ∞). Note that there are rational maps all of whose critical points are fixed but which do not have this additional property, for example z 7→ z n for n ≥ 3. Suppose that g is a polynomial of degree n with no repeated roots, such that all the roots of g 00 are also roots of g; it follows that g 0 has no multiple roots. Then let Rg be the associated Newton-Raphson map Rg (z) = z −

g(z) . g 0 (z)

Rg has a superattracting fixed point (i. e. a critical fixed point) at each of

CHAPTER 5. SMALE’S MEAN VALUE CONJECTURE

143

the roots of g. In fact Rg0 (z) =

g(z)g 00 (z) , g 0 (z)2

so the only critical points of Rg are the roots of g. Observe that Rg has no multiple poles since g 0 has no multiple roots. In particular, every critical point of Rg is a fixed point. Note that Rg has a fixed point at ∞ with multiplier R# (∞) =

n . n−1

Any such Rg will show that the best possible

constant for Theorem 5.8 is at most n/(n − 1), just as the polynomial p(z) = (z n + nz)/(n − 1) does. In fact, when we take h(z) = a(z − c)n + b(z − c), we find that Rh is M¨obius-conjugate to p, via z 7→ 1/z; moreover, this is the only case of this construction for which Rg can be conjugate to a polynomial! Here is an example of an Rg which is not conjugate to a polynomial, but all of whose critical points are fixed. Take h(z) = z 4 + 2z 3 + 6z 2 + 5z + 4 = (z 2 + z + 4)(z 2 + z + 1) . For this h, we have Rh (z) =

3z 4 + 4z 3 + 6z 2 − 4 4z 3 + 6z 2 + 12z + 5

h00 (z) = 12(z 2 + z + 1) . Since h has no repeated roots, the roots of h00 are roots of h but not of h0 . Thus Rh has two finite fixed points of valency three, two finite fixed points of valency two, and no other critical points. Rh is not conjugate to a polynomial because it does not have a fixed point of valency equal to its degree. In Theorem 5.8, it seems likely that there are several extremal rational maps for each degree. This would be interesting for the following reason: any method that solves Smale’s mean value conjecture might be expected also

CHAPTER 5. SMALE’S MEAN VALUE CONJECTURE

144

to apply to give the best possible constant for Theorem 5.8, so it should not rely on the uniqueness of the extremum! The following lemma characterises those rational maps that occur as Newton-Raphson maps of rational functions in terms of their fixed points and the multipliers at those fixed points. Buff and Henriksen [22] characterise the rational maps that occur as K¨onig’s methods for polynomials, and this includes Newton’s method for polynomials as a special case for which they give priority to [40, Prop. 2.1.2]. The extension here to Newton maps of rational functions may possibly be new. Lemma 5.9 (Characterisation of Newton-Raphson maps of rational functions). A rational map R is the Newton-Raphson map associated to some non-linear rational function g if and only if all the fixed points of R are simple and each finite fixed point χ of R satisfies

1 1−R0 (χ)

∈ Z. Moreover, g is a polynomial if

and only if these integers are all positive. The condition that all fixed points be simple applies even to any fixed point at ∞. Proof. Suppose that g(z) = p(z)/q(z) in lowest terms is a rational function of degree n ≥ 2 and R = Rg (z) := z −

g(z) . g 0 (z)

The fixed points of R are

precisely the roots and poles χ of g. If g has order m 6= 0 at χ ∈ C then   d g(z) 1 0 1 − R (χ) = = . 0 dz g (z) z=χ m For the converse, consider Z g(z) := exp

z

dz . z − R(z)

CHAPTER 5. SMALE’S MEAN VALUE CONJECTURE

145

This function is certainly locally defined away from fixed points of R and satisfies the Newton-Raphson equation R(z) = z− gg(z) 0 (z) . The integrand has no multiple poles since at each fixed point of R we are told

1 z−R(z)

d (z −R(z)) dz

6=

0. We can therefore express the integrand in partial fractions as X Ai 1 = q(z) + , z − R(z) z − χ i i where q is a polynomial. R does not have a multiple fixed point at ∞, so z − R(z) → ∞ as z → ∞, hence q = 0. Near a fixed point χi of R we know that z − R(z) = (z − χi )/m + O((z − χi )2 ) for some positive integer mi , so Ai = mi . Now we can perform the integration explicitly: Z z X dz = mi log(z − χi ) + c , z − R(z) i so g(z) = exp(c).

Y (z − χi )mi , i

which is a rational function; it is a polynomial when all mi ≥ 1. The following special case is also a special case of [22, Prop. 4]. Corollary 5.10 (Characterisation of Newton-Raphson maps associated to polynomials without repeated roots). The following are equivalent for a rational map R: 1. R has a simple fixed point at ∞ and all the finite fixed points of R are also critical points of R; 2. R(z) = z − roots.

g(z) g 0 (z)

for some non-linear polynomial g with no repeated

CHAPTER 5. SMALE’S MEAN VALUE CONJECTURE

146

In this situation, the fixed point of R at ∞ is not critical; indeed its multiplier is n/(n − 1), where n = deg g = deg R. If a Newton-Raphson map Rg associated to a polynomial g has all its critical points fixed, then all of its finite fixed points must be critical, and hence g cannot have any repeated root. Indeed, Rg would otherwise have a non-critical fixed point of multiplier less than 1 in modulus, and then a suitable iterate of Rg would violate Theorem 5.8. Note that for a Newton-Raphson map associated to a rational function g of degree n, we could have repelling fixed points associated to poles of g, but the multiplier would be k/(k − 1) for a pole of g of order k. It is possible for the degree of Rg to be less than the degree of g, so we might still hope to produce better examples for Theorem 5.8 using the present construction. However in the next section we will show that the multiplier n/(n − 1) is the smallest possible multiplier of a non-critical fixed point of any Rg of degree n whose critical points are all fixed. A final observation: R has a simple non-critical fixed point at ∞ precisely when R(z) b ∈ C \ {0, 1, ∞}. z→∞ z lim

5.2.2

Forbidden multipliers

The special case of Smale’s mean value conjecture for polynomials in which the critical points are all fixed is often referred to as Kostrikin’s conjecture. It was observed by Shub that if the critical points of a polynomial are all fixed, then the multiplier at each remaining fixed point must be greater than or equal to 1 in modulus (just plug a suitable iterate of the given map into

CHAPTER 5. SMALE’S MEAN VALUE CONJECTURE

147

Smale’s Theorem). We will give a different argument which in fact applies in general to rational maps whose critical points are all fixed, and shows that the multiplier must be strictly greater than 1 in modulus. To see this, consider first the case in which R has only two critical values. b \ {critical values} to Then R pulls back a complete Euclidean metric from C b \ R−1 ({critical values}), so there can only be two critical points, and the C map is M¨obius-conjugate to z 7→ z n , in which the multiplier at the remaining fixed points is n. Otherwise, there are at least three critical values, and the b \ {critical values} → C b \ R−1 ({critical values}) lifts multivalued map R−1 : C to a conformal isomorphism between the universal covers of these two domains. Thus any branch of R−1 is locally a hyperbolic isometry between the b b −1 ({critical values}). natural hyperbolic metrics on C\{critical values} to C\R b b −1 ({critical values}) However, the inclusion map I : C\{critical values} → C\R omits some points because it is impossible for each of the critical values to have only one pre-image. Therefore the inclusion map is everywhere a strict contraction between the hyperbolic metrics. At any non-critical fixed point this shows that the multiplier is greater than 1 in modulus. The following theorem excludes further values of the multiplier at any non-critical fixed point, thus making a small amount of progress on Kostrikin’s Conjecture. Theorem 5.11. Suppose that R is a rational map all of whose critical points are fixed. Suppose that R has degree n and has m critical points (not counting multiplicity). Then R has exactly n + 1 fixed points, of which n + 1 − m are non-critical. The multiplier at any non-critical point does not lie in the closed disc whose

CHAPTER 5. SMALE’S MEAN VALUE CONJECTURE

148

2 diameter is the interval [1, 1+ n+m−2 ], except in the case where m = n and the

multiplier of the remaining fixed point is n/(n − 1), which is on the boundary of this disc. Proof. We use the notion of residue fixed point index for holomorphic maps. If f : U → C is holomorphic on an open set U ⊂ C and z0 is an isolated fixed point of f , then the residue fixed point index of f at z0 is defined as Z 1 dz ι(f, z0 ) = , 2πi z − f (z) where the integral is taken around a positively-oriented circle around z0 so small that it contains no other fixed points of f . If the multiplier of f at z0 is λ 6= 1 then ι(f, z0 ) =

1 1−λ

. If the fixed point is simple then the index is

still well-defined and finite, but this formula does not apply. Theorem 5.12 (Rational Fixed Point Theorem). [54, Theorem 12.4] b→C b which is not the identity map, the sum over For any rational map f : C all the fixed points of the residue fixed point index is 1. The first part of Theorem 5.11 is a consequence of Shub’s observation that any non-critical fixed points must be repelling; in particular they are simple, so there are no multiple fixed points of R. The residue fixed point index of a fixed critical point is 1. For a non-critical fixed point, the multiplier has modulus strictly greater than one, so the residue fixed point index has real part strictly less than 1/2. In fact in the case where there are n critical fixed points, the rational fixed point theorem shows that the remaining fixed point must have residue fixed point index 1 − n, so must have multiplier n/(n − 1). In the general case, select a particular non-critical fixed point z0 . We get a contribution of less than m + (n−m) to the real part of the total index from all 2

CHAPTER 5. SMALE’S MEAN VALUE CONJECTURE

149

the other fixed points, so the residue fixed point index ι(R, z0 ) has real part greater than 1 − (m +

(n−m) ) 2

=1−

m+n , 2

and hence the multiplier does not

lie in the disc whose diameter is the interval [1, 1 +

2 ], m+n−2

as required. In

fact, further small regions of excluded values of the multiplier may be found by considering iterates of R. Note that in the case m < n, this disc of excluded values contains the multiplier

n n−1

in its interior. Since the multipliers at the fixed points of a

Newton-Raphson map associated to a rational function are real and nonnegative, Theorem 5.11 implies that one cannot improve on the multiplier n/(n − 1) by using Newton-Raphson maps associated to rational functions.

Appendix A Applications of IFS The motivation in this thesis for studying IFSs arises from conformal dynamical systems. To put our results in context, we discuss here various other applications of IFSs. An IFS is often used to model a discrete dynamical system perturbed by random noise, perhaps as a discrete approximation to a stochastic differential equation. Typically the unperturbed system will be stable, say with a locally attracting fixed point. The IFS corresponding to a small perturbation of this system may have an invariant measure concentrated near the original fixed point. However, it may also happen for a dynamical system with many distinct locally attracting fixed points that even small perturbations are asymptotically stable. A well-known application of N -map IFSs is in computer graphics, for generating natural-looking landscapes and textures. Michael Barnsley’s fern [7] is a famous example. These pictures are produced by plotting the points of a random orbit of an N-map continuous IFS. In order to be sure that this will 150

APPENDIX A. APPLICATIONS OF IFS

151

produce the required picture, we would like to know that with probability one the distribution of the first n points of the orbit will converge (as n → ∞) to some non-random probability measure, which represents the picture. As we will see (particularly in section 2.3.2), this requirement is closely related to another property of IFSs called asymptotic stability, which will be of central importance to us. For image compression one must solve the inverse problem of finding an IFS with a reasonably small number of simple maps whose invariant measure approximates a given probability measure. One approach is to look for an IFS of affine maps, using moments of the target measure to determine parameters. Although there are commercial image compression systems which encode digital images in terms of collections of IFSs, they have not been successful because they are computationally intensive in comparison to compression methods based on Fourier or wavelet transforms (such as JPEG) yet they do not typically achieve much better compression ratios. Another common reason for introducing an IFS is to represent a given Markov chain. Consider a Markov Chain with state-space Y and transition probabilities P (x, A), where for each x ∈ Y , P (x, ·) is a Borel probability measure on Y , and for each Borel set A ⊂ Y , P (·, A) is a Borel-measurable function. Any such Markov Chain is representable by an IFS (see chapter 1 of [50]). However not every such Markov Chain can be represented by a continuous IFS. Indeed, for a continuous IFS, the map from Y to P(Y ) (with the weak topology) given by x 7→ P (x, ·) is necessarily continuous. Blumenthal and Corson [17] gave a partial converse: they showed that any Markov chain on a connected, locally connected and compact space Y such that the

APPENDIX A. APPLICATIONS OF IFS

152

transition map Y → P(Y ) is continuous, and such that for each x ∈ Y the support of P (x, ·) is the whole of Y , is representable by an IFS of continuous maps. Another restrictive condition on a Markov chain is the existence of a representation by an IFS with finitely many maps (not necessarily continuous). Such a representation exists if and only if the transition probabilities P (x, A) take only finitely many distinct values. Finally, representability by a continuous N -map IFS is a more subtle question. For example, consider the 2 Markov Chain (xi )∞ i=0 with state space C such that P(xn+1 = xn ) = 1, where

the two square roots are taken with equal probability when xn 6= 0. The corresponding map C → P(C) is continuous and the transition probabilities take only finitely many values, yet this chain cannot be represented using any continuous IFS. The survey article [27] gives an example of the the use of an IFS to represent the waiting time process in the G/G/1 queue, and gives references to further applications in queuing theory. The same article describes the recent method of Propp and Wilson which uses backward iteration of a suitably contractive IFS to simulate exactly from a distribution on a very large finite state space, discussing in particular the Ising model on a large but finite grid in two dimensions. That method is based on Letac’s principle, among other ideas, and in this connection theorem 2.24 may be of interest.

Appendix B Known stability results for IFS There are many papers in which IFSs are proved to be asymptotically stable as a consequence of contractivity conditions on the defining maps. [27] is a very readable recent survey article on applications of contractive IFSs. It includes a complete proof of the following theorem (which was known before) and gives references to papers containing stronger results. Theorem B.1. Let F be an IFS on a complete separable metric space (Y, d). Suppose that f is a.s. Lipschitz, with E log Lip(f ) < 0, and that there exist α > 0 and β > 0 such that P(Lip(f ) > u) < α/uβ for all u > 0. Then F is asymptotically stable and there exists a point x0 ∈ X and constants a > 0, b > 0 and 0 < r < 1 such that πd ((F∗ )n δx , µ) ≤ (a + bd(x, x0 ))rn , where πd is the Prohorov metric and µ is the invariant law of F. The idea of using the natural extension of the skew product to relate the 153

APPENDIX B. KNOWN STABILITY RESULTS FOR IFS

154

forward and reverse iterates was used by Elton [30] to prove stability results and ergodic theorems in the situation of the above theorem, but allowing the sequence of maps (Fn ) to be a more general stationary process than a Bernoulli process. Then the sequence Fn ◦ · · · ◦ F1 (Z0 ) no longer forms a Markov chain, but what has been called a Markov chain in a random environment. This raises the question of whether our results on non-uniformly contracting IFSs can be extended to deal with stationary sequences of maps. For IFSs on a complete separable metric space (X, d), Stenflo [70] supposes that there exists c < 1 such that for all x, y ∈ X, E(d(f (x), f (y)) ≤ cd(x, y) , and E(d(x, f (x)) < ∞ He proves that there exists a unique invariant law µ and that for any x ∈ X, Kd (F∗n (δx ), µ) = O(cn ) , uniformly on bounded subsets. For non-uniformly contracting IFSs, one cannot hope to give any similar bound on the rate of convergence. In the same situation, Stenflo proves the continuous dependence of the invariant law on the parameters defining the IFS. A more common assumption of average contractivity is a spatially uniform bound E (log d(f (x), f (y)) − log d(x, y)) < − < 0 . Barnsley and Elton [10] prove asymptotic stability of IFSs using a slightly extended version of this. [9] shows asymptotic stability using this assumption and allowing place-dependent probabilities; in the same situation [74]

APPENDIX B. KNOWN STABILITY RESULTS FOR IFS

155

proves strong results (including a law of the iterated logarithm) about the behaviour of the average of a continuous function h over the first n steps of the orbit, approximating a suitable normalisation of this sequence by a Brownian motion. The subject of iterated function systems driven by stationary sequences of maps was also developed in Romania under the name of dependence with complete connections; a paper from that school relevant to the present discussion is [38], which generalises the result of [10]. Define the local Lipschitz constant of a map h : X → X to be d(f (x), f (y)) . d(x, y) y→x

Dx h = lim sup

Steinsaltz [68] studies IFSs that he calls locally contractive. Here X is a convex subset of Rn with the Euclidean metric, and it is assumed that there is a function φ : X → [1, ∞) and a constant c < 1 such that for all n ∈ N and all x ∈ X, E (Dx (F1 ◦ · · · ◦ Fn )) ≤ φ(x)cn .

(B.1)

The main result is that F is asymptotically stable, with an explicit rate of convergence given for the sequence of reverse iterates. A sufficient condition for (B.1) to hold is given that does not involve iteration. Note that the brief literature survey in [68] repeatedly omits the crucial condition of completeness. It would be interesting to see whether Steinsaltz’ results can be applied to simplify any of the results about analytic IFSs on D in Chapter 2. There are non-uniformly contracting IFSs which do not fall into any of the classes discussed above, and of course each of these classes contains IFSs which are not non-uniformly contracting.

APPENDIX B. KNOWN STABILITY RESULTS FOR IFS

156

In the setting of IFSs of maps that do not increase distance, two recent papers [1, 53] go even beyond non-uniform contractivity. They prove the stability of the IFS on R+ defined by fω (x) = |x − ω| , where ω ∈ Ω = R+ and the law P of ω is compactly supported but not supported on any lattice in R. Note that in this case each map is Lipschitz with constant 1, but none of the maps is strictly distance-decreasing. Moreover, the stability result is very delicate indeed: in the N -map case, stability fails when P is supported on a lattice, so the unstable IFSs are dense in the parameter space.

Appendix C Dependence of stability on p In this appendix we give some examples of analytic IFS on D. They show that no condition on the maps alone can be both necessary and sufficient for stability. In all three examples, the maps are fixed but the existence of an invariant probability measure depends on the choice of the associated positive probability vector. From the point of view of random walks this is not surprising, but in the context of IFSs it is interesting – Lasota and Yorke [46] showed that for the closely related class of non-expansive IFSs on compact metric spaces, stability depends only on the maps and not on the probabilities. The examples also demonstrate that for a fixed set of maps, neither the existence nor the absence of an invariant measure is necessarily a convex condition on the probability vector. In particular the condition for stability is not necessarily linear. Example C.1. Define a 2-map IFS F by f1 : z 7→ z 2 157

APPENDIX C. DEPENDENCE OF STABILITY ON P

158

and f2 : z 7→

z+τ , 1 + τz

where τ > 0 is chosen so that f2 is the hyperbolic isometry of D that translates the geodesic along the real axis by hyperbolic distance log 2 towards 1. The interval [0, 1) is a forward-invariant set for both maps. If F is asymptotically stable then its restriction to [0, 1) is asymptotically stable, and conversely if the restriction is asymptotically stable then the unrestricted system has an invariant probability distribution, so by Ambroladze’s Theorem is asymptotically stable. The restriction to [0, 1) behaves rather like a random walk with a reflecting barrier at 0, since on the real axis near 1, z 7→ z 2 displaces points towards 0 by a hyperbolic distance that approaches log 2. When p1 >

1 2

we will show how to bound the Markov chain Hn = Fn ◦ · · · ◦ F1 (0) be-

tween two simpler random walks with retaining barriers, both coupled to Hn . Each of these bounding walks has a unique invariant probability distribution, to which its empirical distribution almost surely converges. Letting δ0 be the unit mass at 0, it follows that F∗n (δ0 ) 9 0 weakly; hence by Ambroladze’s Theorem F has an invariant probability measure and is asymptotically stable. The invariant measure is supported on [0, 1) because each F∗n (δ0 ) is supported on [0, 1). Lower bound: Consider a random walk Zn on the points xm of [0, 1) such that d(0, xm ) = m log 2, m ∈ N, defined as follows. Z0 = 0; the transition from Zn−1 to Zn is a step to the right when Fn = f2 and a step to the left when Fn = f1 , with the exception that if Zn−1 = 0 and Fn = f1 then Zn = 0 again. Since for all x ∈ (0, 1), we have d(x2 , x) < log 2, we obtain Zn ≤ Hn for all n, as required. It is well known that the chain Zn is recurrent (persistent)

APPENDIX C. DEPENDENCE OF STABILITY ON P

159

when p1 > p2 . Upper bound: The upper bound is slightly more complicated. Let R ∈ (0, 1). Consider a random walk Yn on (0, 1) such that Y0 = 0 and the transition from Yn−1 to Yn is a step to the right by a hyperbolic distance log 2 when Fn = f2 but is a step to the left by a hyperbolic distance

p2 +p1 2p1

log 2 when

Fn = f1 , except when this would make Yn < R, in which case Yn = Yn−1 . (We have introduced a retaining barrier at R). Except when Yn is close to R, the expected increment in d(0, Yn ) is −p1 p12p+p1 2 log 2 + p2 log 2 =

p2 −p1 2

log 2,

so when p1 > p2 , the chain (Yn ) has an invariant distribution. To ensure that Yn is an upper bound for Hn , we need only choose R large enough that for √ √ all 1 > t ≥ R, we have d t, t > p22p+p1 1 log 2, and since d t, t → log 2 as t → 1, there does exist such an R. When p1 ≥ p2 , the chain Zn is still a lower bound for Hn , but it has no invariant distribution (even though in the case p1 = p2 =

1 2

it is recurrent).

In particular, for any t ∈ (0, 1), P(Zn ∈ [0, t]) → 0 as n → ∞, and since Zn ≤ Hn , we have P(Hn ∈ [0, t]) ≤ P(Zn ∈ [0, t]). Therefore F∗n (δ0 ) → 0 weakly, so by Ambroladze’s Theorem F has no invariant measure. Example C.2. In this example, stability is not a convex condition on the probability vector. The details are somewhat technical (using hyperbolic trigonometry) but the basic idea is as in example C.1: we find some Liapunov function φ on D (i. e. a continuous real-valued function with φ(z) → ∞ as z → ∂D) and consider the process φ(Fn ◦ · · · ◦ F1 (0)). Then we construct a Markov chain with values in R that is a lower or upper bound for this process and which we can prove is transient or positively recurrent, respectively. This general idea appears a great deal in the literature on IFSs.

APPENDIX C. DEPENDENCE OF STABILITY ON P

160

The maps for this example will all be conjugate to g : z 7→ z 2 , so individually they are easy to understand. Pick two points x1 , x2 of D with d(x1 , x2 ) = t, and fix a large positive integer N . (In due course we will specify how large N and t must be). For i = 1, 2, let σi be a hyperbolic isometry 2πi

that carries xi to 0. Let h be the rotation h : z 7→ e N z. Then the maps of our example are the following, as j runs over {0, . . . , N − 1} and i runs over {1, 2}: fi,j = σi−1 ◦ hj ◦ g ◦ σi . Note that hj ◦ g is in fact a conjugate of g. To specify the probabilities associated to these maps, choose α ∈ (0, 1). Set the probability p1,j associated to each map f1,j equal to α/N , and the probability p2,j associated to each map f2,j equal to (1 − α)/N . We have now defined an IFS; call it F. Claim. 1. For α sufficiently close to 0 or to 1, F is asymptotically stable. 2. For α = 21 , F has no invariant measure. Proof.

1. Swapping σ1 and σ2 gives another system satisfying the above

description, but with α and 1 − α swapped; thus it suffices to consider the case where α is close to 0, so that the f1,j occur only rarely. The maps f2,j have fixed point x2 , so we will look at rn = d(x2 , Hn ), where Hn := Fn ◦ · · · ◦ F1 (x2 ). When Fn = f1,j for some j, we apply the

APPENDIX C. DEPENDENCE OF STABILITY ON P

161

triangle inequality twice to obtain rn − rn−1



d(x2 , x1 ) + d(x1 , Hn ) − d(x2 , Hn−1 )



d(x2 , x1 ) + d(x1 , Hn−1 ) − d(x2 , Hn−1 )



d(x1 , x2 ) + d(x1 , x2 )

=

2t.

If Fn = f2,j for some j, and rn−1 is large, then rn −rn−1 is approximately − log 2. Just as in example 1, we can bound rn above by a random walk on R with a retaining lower barrier, which has an invariant distribution when 2αt < (1 − α) log 2. This shows that for sufficiently small α, F itself is asymptotically stable. 2. To deal with the case α = 21 , we introduce the random sequences (in ) and (jn ) such that Fn = fin ,jn , and define an = d(Hn , xin ),

bn = d(Hn , x3−in ).

Our aim is to show that with probability 1, rn → ∞ as n → ∞. We begin with some hyperbolic trigonometry. cosh d(0, z) = ∴

cosh d(0, z 2 ) =

1 1 − |z|2

1 1 1 = . 4 2 1 − |z| 1 − |z| 1 + |z|2

Hence if in = in−1 , we have cosh rn ≥ 21 cosh rn−1 . Now suppose that in 6= in−1 . We shall use the analogue for hyperbolic triangles of the cosine rule. Let θ be the angle at x3−in in the geodesic triangle with vertices at x1 , x2 and Hn−1 . The side lengths

APPENDIX C. DEPENDENCE OF STABILITY ON P

162

are d(x1 , x2 ) = t, d(xin , Hn−1 ) = sn−1 and d(x3−in , Hn−1 ) = rn−1 . The hyperbolic cosine rule now gives cosh sn−1 = cosh t cosh rn−1 − cos θ sinh t sinh rn−1 . Dividing through by cosh rn−1 , we have cosh sn−1 = cosh t − cos θ sinh t tanh rn−1 . cosh rn−1 The angle θ takes one of N possible values evenly spaced around the circle, and which of those values it takes is independent of rn−1 and sn−1 . So if we require N to be even, with probability at least 21 , the contribution of the final term in the above equation is non-negative, so cosh sn−1 cosh rn−1

≥ cosh t. When can the ratio

when cos θ >

cosh t−1 sinh t

cosh sn−1 cosh rn−1

be less than 1? Precisely

= tanh( 2t ). We can require t to be sufficiently

large that cos Nπ < tanh( 2t ), and then the probability that

cosh sn−1 cosh rn−1