The Folk Theorem in Repeated Games of ... - Semantic Scholar

9 downloads 0 Views 477KB Size Report
information game is perturbed by a small amount of incomplete information. Keywords: Reputation, Folk Theorem, repeated games, incomplete in- formation.
The Folk Theorem in Repeated Games of Incomplete Information Martin W. Cripps and Jonathan P. Thomas University of Warwick, Coventry CV4 7AL, UK. This version: May 1997

Abstract: The paper analyzes the Nash equilibria of two-person dis-

counted repeated games with one-sided incomplete information. If the informed player is arbitrarily patient relative to the uninformed player, then the characterization is essentially the same as that in the undiscounted case. This implies that even small amounts of incomplete information can lead to a discontinuous change in the equilibrium payo set. For the case of equal discount factors, however, a result akin to the Folk Theorem holds when a complete information game is perturbed by a small amount of incomplete information. Keywords: Reputation, Folk Theorem, repeated games, incomplete in-

formation. JEL Classification Numbers: C73, D83, L14.

 Our thanks are due to seminar participants at Erasmus University, Carlos III Madrid, Cambridge, LSE,

Warwick, Essex, UCL, Southampton, York and to conference participants at the 1996 summer workshop at MEDS Northwestern University, and the 1995 Fields Institute Conference Toronto, and also to Mike Peters, Carlo Perroni and Marco Celentani for their comments and suggestions.

1. Introduction The Folk Theorem for discounted repeated games of complete information asserts that any feasible strictly individually rational payo vector can be sustained in equilibrium provided both players are suciently patient. A natural question arising from the literature surrounding the Folk Theorem, is whether a similar result holds when there is asymmetric information about the players' preferences over outcomes in the game. This is of interest in its own right given that situations of incomplete information are common in economic models. It is also important for the question of how robust complete information results are to small perturbations of the information structure. Suppose there is a small chance that an opponent's preferences might be di erent from their representation in some complete information game: does this small perturbation lead to a substantial change in the predictions of the model? In this paper we consider discounted repeated games between two players when the stage-game payo matrix of one of the players in unknown to the other player (but rates of discount are common knowledge). We shall investigate to what extent, as the players become patient, the (Nash) equilibria of the game can be characterized by a corresponding Folk Theorem result. We nd that the answer to this question depends critically upon what is assumed about the relative rates of patience of the two players. If the players' discount factors are equal, then as they become very patient we can establish a continuity result for small perturbations of complete information games. If the informed player is suciently more patient relative to the uninformed player, however, it turns out that the characterization is di erent, and such a continuity result no longer holds; we establish that all equilibria are approximately payo equivalent to equilibria in which the informed player acts to reveal her information at the start of the game (see the end of the following subsection for a more complete summary of the main results). The situation where one or more players' preferences may be unknown to the opponent(s) has received relatively little attention in the discounted repeated games literature, despite considerable work on `reputation' models where perturbations of preferences are in terms of irrational or commitment types. Undiscounted repeated games of incomplete information have, however, been studied in some depth, especially in the zero-sum case. 1

(A brief review of the relevant results is contained in Section 3.) Some recent results exist for the discounted case, however. Kalai and Lehrer (1993)1 and Jordan (1995) have established that play must converge to Nash play of the true game. Jordan (1995) has also proved the existence of an equilibrium for this class of games. Perfect Bayesian equilibria of such games must have a Markov property (Bergin (1989)). McKelvey and Palfrey (1992, 1993) have studied belief stationary equilibria. The results of Kalai and Lehrer and Jordan on convergence to Nash play are informative about the long-run behaviour of an equilibrium, but to be able to say anything about the overall payo s from the beginning of the game|what we are interested in here|it is necesssary to know something about how rapidly convergence takes place relative to the rate of discounting of payo s and also, possibly, what happens in the shorter run.2 By exploiting a fundamental result due to Fudenberg and Levine (1992) on the speed of learning, the case where the informed player is arbitrarily patient relative to the uninformed player can be completely solved purely on the basis of \long-run" considerations. A more detailed consideration of the shorter run is needed for the symmetric discounting case.

1.1. An Example and Statement of Main Results In this subsection, to motivate the main ideas of the paper, we shall analyse a simple example of an incomplete information game. While the general argument is more complex, the example illustrates well the main issues and results. We consider the game in Figure 1| a perturbed version of the Battle of the Sexes|in which there are two players, 1 (row player) and 2 (column player). Player 1 has a stage-game payo matrix given by either A1 or A2, while player 2 has the payo matrix B . As in the usual Bayes-Nash approach, we imagine that nature selects A,  = 1 or 2, before the start of the game with probabilities (1 ? p) and p respectively (0 < p < 1), and only player 1 observes this choice; hence p is player 2's prior belief that player 1 is of \type 2". Each period t = 0; 1; 2; : : : the simultaneous move stage game (A; B ) is played out, and it is assumed that players can observe all previous actions. For the moment assume that there is a common discount Their approach also covers a broader learning environment than the standard one. In this respect the undiscounted case is much more straightforward since long-run behaviour and payo s can be characterised using martingale arguments, and the speed of convergence to the long-run does not a ect average payo s. 1 2

2

factor , 0 <  < 1, with the payo at time t being weighted by (1 ? )t. Apart from the incomplete information, the game is a standard discounted repeated game. Type 2 is

U

3

D

0

L

1 0 0 1

(A1; B )

R 0

U

1

3

D

0

L 1

1

0

0

R 0 3

(A2; B ) Figure 1 ? Battle of the Sexes

a \commitment type" of player 1; playing the top row (U ) is a dominant strategy in the repeated game for this type.3 It will become clear that a crucial issue in characterizing the possible equilibria is the speed at which the uninformed player learns. To start, consider rst a fully revealing Nash equilibrium in which the types distinguish themselves in the rst period (i.e., type 1 chooses D and type 2 chooses U ). What conditions must such an equilibrium satisfy? After the rst period, t = 0, the game essentially becomes one of two possible complete information games, depending on which type is revealed. If type 2 is revealed, the equilibrium thereafter involves U being played each period. Clearly player 2 must respond so as to get at least his minmax payo (= 3=4) from t = 1 onwards (i.e., the equilibrium must be individually rational for both players). On the other hand, if type 1 is revealed, in addition to the usual constraints on equilibria of a complete information game (essentially, again, individual rationality for both players), there is an incentive compatibility constraint which must be observed. Type 1 has the option at t = 0 of mimicking type 2. Consequently the incentive compatibility constraint requires that the payo type 1 gets from revealing her type and playing out some complete information equilibrium of the game (A1; B ) must be at least as great as the payo she would get from mimicking type 2. This latter payo , discounted to period t = 1, is at least 9=4, since in the type 2 equilibrium player 2 must play L (against U ) at least three quarters of the While the analysis below does not generally restrict the payo matrices of the various types of player 1, such a commitment type simpli es the example in two important ways. First, on the equilibrium path, type 2 always chooses U , and consequently we need only concentrate on the behaviour of type 1. Secondly, punishing player 1 for deviating is simple since only type 1 needs to be punished. 3

3

time in order to get at least his minmax payo . Hence, in any fully revealing equilibrium, for  near one, type 1 must receive at least approximately 9=4. Since this argument is independent of p, it follows that even small perturbations of the repeated Battle of the Sexes game only have fully revealing equilibria in which type 1 does substantially better than her minmax payo of 3=4. It is a result from the theory of undiscounted incomplete information games (of the type considered here) that all equilibria are payo equivalent to revealing equilibria; that is, even for equilibria where revelation does not occur immediately, there are payo equivalent revealing equilibria (see Section 3 for a full discussion of these results). We shall show that this is not the case with symmetric discounting. Indeed for small perturbations of a complete information game, and as  goes to one, there will be equilibria, not (in general) involving immediate revelation, which give overall payo s close to any payo s that are feasible and individually rational in the complete information game. Hence for symmetric discounting there is a continuity result for small perturbations of the information structure of complete information games. To see how type 1 can receive payo s substantially below 9=4 in the example when  is close to one, it is only necessary to delay revelation, so both types of player 1 pool on U for t = 0; 1; : : : ; T ? 1, and at t = T type 1 reveals her type by playing D. If type 2 is revealed at time T , then an equilibrium of the complete information game between type 2 and player 2 is played out from T + 1, which gives player 2 a payo of 0:8 (discounted to T + 1); this is possible for  close enough to one. Notice that if type 1 was to mimic type 2, she would receive a payo of 2:4 in this continuation equilibrium. If type 1 is revealed, then an equilibrium of the repeated Battle of the Sexes game is played out, in which type 1 receives 2:5 and player 2 receives 1:5 | an equilibrium for  near one. At T , player 2 plays R, and so incentive compatibility is satis ed. (A deviation by player 2 at T is not pro table for  near one assuming minmax punishments are used.) Hence we have constructed a revealing equilibrium starting at t = T , provided  is close to one. In this equilibrium type 1 receives a payo of (1 ? ) + (2:5), and player 2 receives p  (0:8)+(1 ? p)((1 ? )(3)+ (1:5)). For  close to one and p close to zero, these payo s are respectively approximately 2:5 and 1:5. Next, for t < T , (U; R) is played out by both types. Choose T so that the weight in the discounted sum placed on the rst T 4

periods, (1 ? ) Tt=0?1 t, is approximately 0:4, so player 2 gets a payo of approximately (0:4)(0) + (0:6)(1:5) = 0:9, provided p is very small, and hence the threat of minmax punishments will prevent a deviation. Type 1 of player 1 gets approximately (0:4)(0) + (0:6)(2:5) = 1:5. So, provided  is close to one and p is close to zero, an equilibrium can be constructed in which type 1 gets considerably less than 9=4, her minimum payo in a revealing equilibrium. Revelation still occurs, but not immediately. P

Is it possible to drive type 1 down to her minmax payo of 3=4 in such a pure strategy equilibrium? The answer is no. The worst equilibrium for type 1 would involve a revealing equilibrium from T which gave her exactly 9=4 and which gave player 2 the highest feasible payo consistent with this, i.e., 7=4.4 The initial pooling phase can have a weight in the discounted sum of no more than 4=7, since player 2 then gets overall at most (3=7)(7=4) = 3=4 when p is very small (a larger p can only reduce the weight on the pooling phase); any extension of the pooling phase would imply that player 2's overall payo drops below his minmax payo . This gives a lower bound on type 1's payo of (3=7)  (9=4) = 27=28 > 3=4; no other con guration can lead to a lower payo . To construct an equilibrium in which type 1 is held close to her minmax payo it is necessary not only to delay revelation, but to make it gradual. One such equilibrium consists of the previously constructed delayed revelation equilibrium, preceded by an initial pooling phase of (U; R) which ends with a random move by type 1. More precisely, suppose that at t = T 0 type 1 plays D with probability 0:5, while player 2 plays R. If type 1 chooses U , the above equilibrium is followed. If type 2 chooses D, a complete information equilibrium is played which gives type 1 the same payo she would get from playing U , that is, approximately 1:5. Player 2 receives the maximum payo consistent with this, approximately 2:5 (for  close to one payo s at T 0 are insigni cant). Now payo s discounted to T 0 are approximately 1:5 for type 1 and (0:5)(0:9)+(0:5)(2:5) = 1:7 for player 2. By having the initial phase of (U; R) suciently long, type 1's payo can be reduced close to 0:75, her minmax payo , without violating individual rationality for player 2 (provided the weight in the discounted sum on the rst phase is approximately 0:5). To conclude, for all  suciently close to one, and all p suciently close to zero, type 1 can be held arbitrarily close to her minmax payo . 4

It is being assumed that  is close to one so the actual period of revelation is insigni cant for payo s.

5

The example illustrates the need for a mixed-strategy equilibrium to attain certain payo s | stated di erently, learning by player 2 must be gradual; this is in contrast to the complete information case where only pure-strategy equilibria are needed. The example also illustrates the purpose served by the gradual learning: it permits player 2's overall payo to be increased relative to the payo player 2 receives along the path played by type 2 of player 1. In turn, this permits actions to be taken along that path which are undesirable for type 2 and hence also for type 1. This relaxes the incentive compatibility constraint, which was the constraint that guaranteed type 1 a high payo in the revealing equilibrium. The characterization that type 1 can be driven down to her minmax payo for small perturbations fails, however, if player 1 is very patient relative to player 2. In the example it was argued that type 1's payo could be held down by gradual learning because the path followed by type 2 could then be made unattractive from type 1's point of view, thus relaxing the incentive compatibility constraint. If, however, player 1 is very patient relative to player 2, the part of the game in which type 2's path is unattractive becomes unimportant from the point of view of player 1; this implies that the incentive compatibility constraint will again require that type 1 gets at least approximately 9=4 (in this case there is a discontinuity with the complete information game as p goes to zero). It is the periods of learning which can be used to hold type 1's payo down, but for a very patient player 1, the periods of learning are insigni cant in the calculation of payo s. In what follows, we shall treat these two discounting cases in reverse order to the above discussion. Our rst main result, in Section 3, states that for arbitrary given initial beliefs, for a xed value of player 2's discount factor, and if player 1's discount factor is suciently close to one, the equilibrium payo s to player 1 (for each of a nite number of types) must approximately satisfy the conditions of a revealing equilibrium. This implies a continuity result with the undiscounted case (holding prior beliefs constant): as the players' discount factors go to one, if player 1's discount factor goes to one suciently fast relative to that of player 2, then the limiting set of equilibrium payo s for player 1 must satisfy the necessary conditions appropriate for the model with no discounting. In Section 4, the symmetric discounting case is analysed. We establish a continuity result with complete information games as the probability of one of the types goes to 6

one: for any feasible strictly individually rational payo vector in the game between this type and player 2, there is a Nash equilibrium of the incomplete information game with these payo s (to this type of player 1 and to player 2) provided the players are suciently patient and provided initial beliefs put suciently high probability on this type. Since there is no such continuity result for undiscounted games as the size of the perturbation goes to zero, it can be concluded that the equilibrium characterization which exists for the undiscounted case is only the limit (as discount factors go to one, holding beliefs constant) of the discounted case if the limit is taken in a particular way,5 and notably it is not the limit of the discounted case if both players' discount factors are equal.

2. The Model The in nitely repeated game ?(p; 1; 2) is de ned as follows. There are two players called \1" (she) and \2" (he). At the start of the game, player 1's \type" k is drawn from a nite set K (where K also denotes the number of elements) according to the probability distribution p = (pk )k2K 2 K (the unit simplex of 0 for all k. In every period t = 0; 1; 2; : : :, player 1 selects an \action" it out of a nite action space I , while player 2 simultaneously chooses an action j t from the nite set J , where I and J have at least two elements. Payo s at stage t to type k of player 1 and to player 2 are respectivelyAk(it; j t) and B (it; j t). Player i discounts payo s with discount factor i, with the payo to type k of player 1 being a~k = (1 ? 1) 1t=0 1t Ak (it; j t), and that to player 2 being ~b = t t t t t (1 ? 2) 1 t=0 2 B (i ; j ). Both players observe the realized action pro le (i ; j ) after each period. Let H t = (I J )t+1 be the set of all possible histories ht up to and including period t. A (behavioral) strategy for type k of player 1 is a sequence of maps k = (k0; k1;   ), kt : H t?1 ! I . We de ne  = (k )k2K . Likewise, a strategy for player 2 is a sequence of maps  = ( 0;  1;   ),  t : H t?1 ! J . The prior probability distribution p, together with a pair of strategies (;  ), will induce a probability distribution over in nite histories P

P

5 This provides another contrast with complete information games where it can be shown that the projection of the set of equilibrium payo s onto the more patient player's axis converges, as both discount factors go to one, to the Folk Theorem projection, so that in terms of the payo s possible to the more patient player, the way the two discount factors go to one does not matter.

7

and hence over discounted payo s. We use Ep;; to denote expectations with respect to this distribution, and abbreviate to E where there is no ambiguity. Players are assumed to maximize expected payo s, and a Nash equilibrium is de ned as a pair of strategies (;  ) such that, for each k, Ep;; [~ak j k]  Ep;0; [~ak j k] for all 0, and Ep;; [~b]  Ep;; 0 [~b] for all  0. Finally we shall need the following. Let ^ak := ming2J maxf 2I Ak (f; g) be type k's minmax payo , where we use the notational abuse that Ak (f; g) is the expected value of Ak(i; j ) when mixed actions f and g are followed. Likewise player 2's minmax payo is given by ^b := minf 2I maxg2J B (f; g).

3. A Relatively Patient Informed Player We start by considering the case where the discount factor of player 2 is taken as xed, and we let the discount factor of player 1, the informed player, go to one. This case corresponds closely to the undiscounted case; necessary conditions which must be satis ed by player 1's payo s in the undiscounted case must also be (asymptotically) satis ed in the discounted case as 1 ! 1. These necessary conditions can be interpreted as requiring payo equivalence to some fully revealing equilibrium. Before we consider the discounted case, we brie y review the existing results in the undiscounted case.

3.1 Review of the Undiscounted Case Hart (1985) gave a complete characterization for the general class of undiscounted games with one-sided incomplete information, which includes the possibility that the uninformed player is unaware of his own payo function. (This characterization is based on bimartingales.) For the case we are interested in, namely where each player is aware of their own payo s but one of the players does not know the payo s of the other player,6 a much simpler characterization has been found, using Hart's results, by Shalev (1994).7 Denote this game by ?(p; 1; 1). We shall show that essentially the same characterization That is, in the general case the informed player knows both her own and the opponent's payo matrix, while the opponent knows only the probability distribution over the possible types of the informed player; here the uninformed player's payo matrix does not vary with the informed player's type. 7 For a direct proof of these results which does not use the bi-martingale approach, see Koren (1988). Forges (1992) provides a survey of the literature. 6

8

as that of Shalev can be obtained for the discounted case provided the informed player is arbitrarily patient relative to the uninformed player. We de ne rst individual rationality in this setting. Punishment strategies for player 2 are more complex than in the complete information setting, because all possible types of player 1 must simultaneously be punished. In a repeated game without discounting this problem is solved by Blackwell's approachability result (Blackwell (1956)). Let x := (xk )k2K 2 3 > 0. There exists a  < 1 such that if  > , then ?(p; ) has an equilibrium with the payo s (( 1; :::;  K ); ) satisfying: (a) ak (3) ? =2 >  k > ak (3) ?  for all k, and (b)   ^b + 3. We wish to construct an equilibrium where type 1 mimics each type k > 1 with positive probability, and then repeatedly plays out a sequence of actions, speci c to type k, which gives her a low payo (as in the equilibrium of Section 4.1). This construction allows us to construct an equilibrium of ?(p; ) in which type 1 gets a low payo . It is, therefore, necessary to nd K ? 1 such sequences of actions, one for each type k > 1, that holds type 1 close to her minmax level. If this were all that is required, then calculating these sequences would be a trivial problem. We will, however, need these sequences to satisfy two further properties which considerably complicate their construction. The rst is incentive compatibility; that is, each type k > 1 prefers to play out her nite sequence of actions rather than another type's nite sequence. The second is individual rationality; we require there to exist punishments which player 2 can impose on the types of player 1 if they deviate from these nite sequences. The incentive compatibility of the sequences is a relatively simple condition to ensure; one could simply choose the sequence for type k to maximize her payo conditional on type 1 receiving no more than her minmax payo . The main problem arises in choosing these sequences to ensure the individual rationality of all types, that is, the existence of feasible punishments which will deter deviations from playing out the equilibrium. There are two distinct types of deviation by player 1 that the punishments must deter: The rst type of deviation from our equilibrium is if type k truthfully signals her type, but then deviates when she should be repeatedly playing the nite sequence of actions. Of course, any punishment imposed on this behaviour must be e ective in deterring both type 1 and type k from deviating, because at our equilibrium both types will be playing out k's nite sequence with positive probability. The second type of deviation that the punishments must prevent is if type k > 1 mimics type k0 > 1 and then, after some period of time playing k0's nite sequence, deviates and takes a punishment (it may be that the punishments imposed on type k0 are more attractive than the equilibrium payo of type k). We will need one more piece of notation. Recall that ak () is type k's largest payo in the set Gk (). By taking  = 0 de ne (7)

a := (a1; :::; aK) 2 1 of player 1 signal their type and type 1 randomly mimics one of the types k0 > 1. This is not simple to implement, because type 1 must be made indi erent between mimicking all other types. To ensure her indi erence it is necessary that player 2 randomizes in the period that type k signals and that the outcome of player 2's randomization determines the equilibrium of the two-type game that is subsequently played.

Theorem 3 Assume A.1 and let 0 <  <  and (a1; b) 2 G1( ) be given. Then there exists  < 1, p1 < 1 such that for all p with p1 > p1 and for all  >  the game ?(p; ) has an equilibrium with the payo s (( 1; :::; K ); ) 2 0 be given and consider any Nash equilibrium

and any history ht which has positive probability in this equilibrium conditional upon type k. Suppose that conditional upon player 1 being type k the expected continuation payo for player 2 is i h (13) E ~bt+1 j ht ; k  ^b ?  : Then there exists a nite integer N and a number  > 0, both depending only on 2 and , such that (14) k qN ( j ht) ? qN ( j ht; k) k >  :

Proof: To simplify notation let qN = qN ( j ht ) and q^ N = qN ( j ht ; k). Choose N to be the smallest integer such that 2N bmax1??b2min < 2(1? 2) : Next, de ne V~2t+1 (y N ) to be the payo to player 2 over the next N periods discounted to period t + 1, that is

V~2t+1 (y N ) =

(15)

tX +N r=t+1

2r?t?1 B(ir ; j r) :

P For probability distribution qN its expectation is EqN V~2t+1 (y N ) = yN qN V~2t+1 (y N ), and since this is a continuous function of qN , with compact domain, there exists an  > 0 such that k qN ? q^N k   implies that h

(16)

i

EqN V~2t+1 (y N ) ? Eq^N V~2t+1(y N )  2(1 ?  ) : 2



h

i

h

25

i

Let  be as in the statement of the lemma, and assume to the contrary of the lemma k qN ? q^ N k  ; then by (16)

(17)

i h i h E bt2+1 j ht ? E bt2+1 j ht; k i i h h < (1 ? 2 )EqN V~2t+1 (yN ) + 2N bmax ? (1 ? 2 )Eq^N V~2t+1 (yN ) ? 2N bmin i h i h < (1 ? 2 ) EqN V~2t+1 (y N ) ? Eq^N V~2t+1 (y N ) + 2N (bmax ? bmin) < : h

i

But using (13) this implies E ~bt+1 j ht < ^b which is impossible.

Q.E.D.

The next result shows that if player 1 follows the strategy of type k, then there can be only a nite number of periods in which the probability distribution over outcomes predicted by player 2 di ers signi cantly from the true distribution. Eventually, player 2 will predict future play (almost) correctly. Given integers N and n, with N > 0 and 0  n < N , de ne the set T (n; N ) = fn; n + N; n + 2N; : : :g. Suppose that at the end of each of the periods t 2 T (n; N ) player 2 makes predictions about the course of play over the following N periods. The result says that if type k is the true type of player 1 then, no matter how small pk and what strategies the other types of player 1 are supposed to play, in almost all periods player 2 will make predictions which are very close to the true predictions given player 1's type. The result is a straightforward adaptation of the main theorem of Fudenberg and Levine (1992, Theorem 4.1) which is stated for the case N = 1.

Result A. 1 (Fudenberg and Levine) Given integers N and n, with N > 0 and 0  n < N , and for every  > 0, > 0 and a type k of player 1 with pk > 0, there is an m depending only on N ,  , , and pk such that for any (;  ) and any ht consistent with (;  ), the probability, conditional on player 1's true type being k, that there are more than m periods t 2 T (n; N ) with (18)

k qN ( j ht) ? qN ( j ht; k) k >

is less than  .

Proof of Lemma 1: Fix an equilibrium and a type k and choose  = =3 in Lemma A.1; then there is an N and an  such that (14) holds whenever (13) holds. Set =  in Result A.1, take any integer n, 0  n < N , and set  = 3N (^b?bmin ) (assuming that ^b > bmin ; the lemma is trivial otherwise). Then by Result A.1 there is an m ( nite) such that the probability that inequality (14) holds more than m times in T (n; N ) is less than  , so the probability that inequality (13) holds more than m times in T (n; N ) must also be less than  . Hence, considering all values for n, 0  n < N , we have that the probability, conditional upon type k, that the inequality i h (19) E ~bt+1 j ht ; k  ^b ? =3 holds more than Nm times is smaller than N = 3(^b?bmin ) .

26

Next,

i

h

h

i

(20) E ~bt+1 j k = E (1 ? 2 )B(it+1 ; j t+1) + 2~bt+2 j k ; so i h i h (21) (1 ? 2 )E B (it+1 ; j t+1) j k = E ~bt+1 ? 2~bt+2 j k : Hence, player 2's payo against type k in the equilibrium, calculated using player 1's discount factor, is

B(k (1)) = (1 ? 1 )

(22)

1

X

t=0

h

1t E B(it; j t) j k

i

1 i h ? 1 X t E ~bt ?  ~bt+1 j k  = 11 ? 2 2 t=0 1 #) ( " 1 h i h i X 1 ?  1 E 1t (1 ? 2)~bt+1 ht ; k k : = 1 ?  E ~b1 j k + E 2 t=0

Using the result on the number of times (19) holds, for 1 > 2 the random variable 1



E 1t (1 ? 2)~bt+1 ht ; k t=0     ?  1 2 ^ ^ (23)  1 ?  (b ? 3 ) ? (1 ? 2)(b ? bmin)Nm 1 with probability at least (1 ? N ) conditional on k, where we are using the fact that in the event that (19) fails no more than Nm times, subtracting (^b ? bmin ) Nm times undiscounted yields a payo lower than the minimum possible. The random variable is at least 11??12 bmin otherwise. Using this in (22) gives a lower bound, say (1; 2), so that B (k (1))  (1; 2), and notice that (1; 2) is independent of the particular equilibrium studied. Next, taking the limit as 1 ! 1 yields    + Nb ; ^ (24) b ? lim (  ;  ) = (1 ? N ) min 1 2  !1 3 X

h

i

1

hence, since N =

 3(^b?bmin ) , we get

 ^b ?  ? ^b ? bmin ?  lim (  ;  ) = 1 2 ^ 1 !1 3 3(b ? bmin ) 3 2 = ^b ? 23 + ^  9(b ? bmin ) (25) > ^b ? 23 : Choosing  (1k) such that (1; 2) is within 3 of its limit ( (1k) depends only upon pk ,  and 2), we have for 1   (1k) (26) B(k (1))  ^b ?  : 

Set  1 = maxk2K f (1k) g and the result follows.



Q.E.D.

Proof of Theorem 1:

27

We take 2 and p to be xed throughout the proof. First consider condition (i) of De nition 1 of 0 , individual rationality (for player 1). Let (;  ) be a Nash equilibrium pair of strategies for the game ?(p; 1; 2), and suppose that the equilibrium payo pro le for player 1, a = (Ak (k (1 )))k2K , is not individually rational. Then by (1), there exists q 2 K such that q  a < a(q). By the minimax theorem, min q  a < fmax 2I g2J

(27)

X

k

qk Ak (f; g) ;

so that if player 1 plays a mixed action f  which attains the maximum in (27),

q  a
q  a ;

since given that   does not vary with type, conditioning on k does not a ect the distribution over histories. Because q 2 K , it follows that

Ep;; [~ak j k] > ak

(31)

for at least one k, contradicting the de nition of equilibrium. Hence individual rationality must be satisi ed for player 1 for any value of 1 ; that is, a satis es (1). Next, condition (ii) of De nition 1 (incentive compatibility) must be satis ed for any 1 , 0 < 1 < 1, since in any Nash equilibrium Ak (k (1))  Ak (k0 (1 )) for all k, k0 by the de nition of equilibrium (recall that Ak (k (1)) is the equilibrium payo of type k of player 1, and Ak (k0 (1 )) is the payo type k would get from following the strategy of type k0 ). Finally, individual rationality for player 2 must be dealt with. De ne ^ := f(k )k2K 2  j Ak (k )  Ak (k0 ) all k; k0and(Ak (k ))k2K is individually rationalg ; and de ne the compact valued correspondence : [0; 1) !!  by (32)

n

o

() = (k )k2K j Bk (k )  ^b ?  all k 2 K :

Since is clearly an upper hemi-continuous function of , it follows that the correspondence given by \ ^ , which is non-empty (Shalev (1994)), is also upper hemi-continuous. Moreover, if the linear function A((k )k2K ) := (A1(1 ); A2(2 ); : : :; AK (K )) is de ned on , the correspondence given by A[ () \ ^ ] is an upper hemi-continuous function of , with value A at  = 0. Hence given , there is a  > 0 such that for 0   < , all payo s in A[ () \ ^ ] lie

28

within  of A. Choose  in Lemma 1 to be ; the corresponding  1 is therefore as required for (6) to hold. Q.E.D. Proof of Lemma 2

Before we prove Lemma 2, we start with two preliminary results. The rst is an approximation result which allows correlated strategies to be approximated by average behaviour along deterministic sequences of action pro les.

Result A. 2 Let  > 0 be given. There is a ^() < 1 such that if  > ^() and given

any correlated strategy  2 KP, then there exists a sequence of actions f(it; j t)g1 1  t Ak (it ; j t), for all k 2 K , and B ( ) = (1t=0 ? such that: A (  ) = (1 ?  ) k t=0 P1 t t t ) t=0  B(i ; j ); moreover

t?s t t (1 ?  ) 1 t=s  Ak (i ; j ) ? Ak ( )  =2 P1 t?s t t (1 ?  ) t=s  B (i ; j ) ? B ( )  =2

P

s = 0; 1; 2; :::; 8k 2 K; s = 0; 1; 2; ::: :

Proof: The proof can be adapted from the proof of Lemma 2 in Fudenberg and Maskin (1991). Q.E.D. It follows immediately that given  > 0, there is a  ()  ^() such that (ak ; b) 2 Gk () are equilibrium payo s for any  >  (). Next, consider the following strategies, which will be used to construct an equilibrium in which a single randomization occurs. The proof of Lemma 2 will require the iteration of this ?1 as in the statement of Lemma 2 and construction. Take  > 0 to be given and also f(^it; ^j t)gTt=0   an arbitrary (a1; b ) 2 G1(3). Assume that  > maxf();  g.

Type 1 : In period 0 play ^i0 with probability 1/2 and ~i 6= ^i0 with probability 1/2. T ?1

If (^i0; ^j 0) is played in period zero continue to play the sequence f^itgt=0 n times and then in period nT begin playing the equilibrium strategy to get the payo s (a1; b) 2 G1(3). If (~i; ^j 0) is played in period zero play an equilibrium of 1 ( ) to get the payo s (x; f (x)) 2 G1 (2). (Both payo s are equilibrium payo s by the assumption that  >  ().) Minmax all deviations by player 2. Player 2 : In period 0 play ^j 0. If (^i0; ^j 0) is played in period zero continue to play ?1 n times and then in period nT begin playing the equilibrium the sequence f^j tgTt=0 strategy to get the payo s (a1 ; b) 2 G1(3). If (~i; ^j 0) is played in period zero play an equilibrium of 1 ( ) to get the payo s (x; f (x)) 2 G1 (2). Minmax all deviations by player 1. Call the strategies de ned above ^ (n; a1; b) for type 1 and ^(n; a1; b) for player 2 (we suppress the dependence on x and the implicit dependence of the continuation equilibria on  ). We will now establish the following result.

29

Lemma A. 2 Let  > 0 be given; also let f(^it; ^j t)gTt=0?1 and  (as in the statement of

Lemma 2) be given, and (a1 ; b) 2 G1 (3) also be given. If  > maxf ();  ; [4M=(+ 4M )]1=T g then there exists (x; f (x)) 2 G1 (2) so that (^ (n; a1; b); ^(n; a1; b)) is an equilibrium of 1( ), provided a1 + =2 < a1 (2) and n is the largest integer satisfying (33) b(n) := (1 ?  nT )B^ +  nT b > ^b + 2; (34) a1 (n) := (1 ?  nT )A^1 +  nT a1 > a1 (2) + =2:

Proof: The strategies are an equilibrium of 1 ( ) provided: (a) type 1 is indi erent when she randomizes in period zero, and (b) no player prefers to deviate when playing out the sequence f(^it; ^j t)gTt=0?1 n times. Type 1 is indi erent in period zero if we can nd an equilibrium with the payo s (x; f (x)) 2 G1 (2) where the payo x satis es x = a1 (n) ? (1 ? ) A (~i; ^j 0): (35)

  1 But (35) implies that ja1(n) ? xj < 2M (1 ?  )= < =2, where the last inequality follows from our assumptions on  . This implies a1 (2) < x < a1 (2); the lower bound follows as a1 (n) satis es a1(2) + =2 < a1 (n), the upper bound is true since x  a1 + =2 < a1(2). So there exists a pair (x; f (x)) 2 G1(2) where x satis es (35). Type 1's expected payo from continuing to play the sequence when there are t periods of the current sequence and n0  n repetitions of the sequence left to play satis es (1 ?  )

t?1

X

s=0

 sA1 (^iT ?t+s ; ^j T ?t+s)+  ta1 (n0)  ?M (1 ?  T )+  T a(n)  ?M (1 ?  T )+  T (^a1 +2) :

This follows as a1 (n0)  a1 (n). Type 1's payo from deviation is bounded above by (1 ?  T )M + T a^1, so a sucient condition for deviation not to be pro table,  T ( + M )  M , is given in the proposition. An identical argument using the fact that b(n0)  minfb(n); bg shows that player 2 also does not bene t from deviating when they are playing out the sequence n times. Q.E.D. Lemma 2 shows that, by iterating the strategies described in the equilibrium of Lemma A.2, it is possible to play the sequence enough times to nd an equilibrium where type 1 receives a payo close to a1(). We start with an equilibrium of 1 ( ) which has payo s (a1; b) close to the maximum feasible and individually rational payo to type 1 in G1(3), namely satisfying: (a1; b) 2 G1 (3) and a1 (2) ? =2 > a1 > a1 (3) ? . The lemma takes this equilibrium and nds a new equilibrium with the payo s (a1 (n); (b(n) + (1 ?  )B (~i; ^j 0) + f (x))=2), where, by construction, a1(n) < a1 . If the payo s at this new equilibrium satisfy (a1(n); (b(n) + (1 ? )B(~i; ^j 0) + f (x))=2) 2 G1(3), then it is possible to apply the lemma a second time to nd a further equilibrium of 1( ) where type 1 receives the payo a1 (n + n0 ) < a1 (n) < a1 . Again if this new equilibrium gives players payo s in G1(3) it will be possible to iterate the lemma a third time, to nd further equilibria of 1 ( ) where type 1 receives even lower payo s. We de ne (^(N ); ^(N )) to be the strategies that iteratively apply Lemma A.2 to the equi?1 is played out N times; if the last librium with payo s (a1; b) where the sequence f(^it; ^j t)gTt=0 iteration requires a full application of Lemma A.2 then omit the randomization in period 0. (The dependence on the discount factor  and  is suppressed.) There is an upper bound on the number of times Lemma A.2 can be applied, and hence on N ; let Nmax be this upper bound

30

on N . (We show that the strategies (^(Nmax); ^(Nmax)) will imply that a1 is close to a1 (2).) Randomizations by player 1 occur at each new application of Lemma A.2. Proof of Lemma 2: Let ~ := maxf ();  ; [4M=( + 4M )]1=T g. We will rst show that the payo to type 1 at the equilibrium (^(Nmax); ^(Nmax)) is no greater than a1 (4) + 2. If, in the last feasible iteration of Lemma A.2, the constraint a1 (n) > a1 (2) + =2 binds, then a1 (n + 1)  a1 (2) + =2, but since (36) a1(n) ? a1(n + 1) =  nT (1 ?  T )(a1 ? A^1) < (1 ?  T )2M < =2 (where the last inequality follows by our assumption on  ), this implies a(n) < a1 (2) + . On the other hand, if the constraint b(n) > ^b + 2 binds in the last feasible iteration of Lemma A.2 and that (a(n); (b(n) + (1 ?  )B (~i; ^j 0) + f (x))=2) 62 G1(3) then there are two separate cases to consider: (1) If (b(n) + (1 ?  )B (~i; ^j 0) + f (x))=2 > ^b + 3, but (a(n); (b(n) + (1 ?  )B (~i; ^j 0) + f (x))=2) 62 G1(3), then it must be that a1 (n) < a1 (3). (2) If (b(n)+(1?)B(~i; ^j 0)+f (x))=2 < ^b + 3, then (1 ?  )B (~i; ^j 0) + f (x) < ^b + 4 (as b(n) > ^b + 2) so a(n) < a1 (4). The payo to type 1 at the equilibrium (^(Nmax); ^(Nmax)) is thus no greater than a1(4)+ . Therefore, type 1's payo at the equilibrium (^(N ); ^(N )) ranges from a1 (4)+  (for N large) to a1 > a1(3) ?  (for N = 0). By (36), type 1's payo at the equilibrium (^ (N ); ^(N )) increases by at most =2 as N increases in integer steps. Thus there must be a value N for which type 1's payo is within =4 of any point in [a1 (4) + ; a1 (3) ? ]. For any given value of  the equilibrium (^(Nmax); ^(Nmax)) is well de ned, so: there are ?1 is played and there are only only a nite number of periods when the sequence f(^it; ^j t)gTt=0 a nite number of occasions when type 1 randomizes over the actions ^i0 and ~i. Thus for any given value  , there is a strictly positive probability r of always playing ^i0 and not deviating from the sequence. To prove part (b) of the lemma, note that in Lemma A.2 payo s at the equilibrium (^ (n; a1; b); ^(n; a1; b)) are not dependent on the discount factor (apart from minor integer problems), but are determined by the constraints (34) and (33). Thus as  increases the number of iterations of Lemma A.2 necessary when playing out the equilibrium (^ ;  ) will not Q.E.D. vary. And so there is a strictly positive probability r of playing ^i0 for all  > ~. Proof of Lemma 3:

Since  < =3 there exists a pro le of IR payo s, (!k )k2K , in ?(p;  ) satisfying ! k +3 < Ak (k ). There also exists (1 ;  2; :::; K ) 2 (IJ )K such that !k + 17=8  ak (3) ? 7=8 < Ak (k ) < ak (3) ? 3=4, and B(k ) > ^b +3, for all k. We will assume, without loss of generality, that these correlated strategies can also be chosen so that for all  <  it is true that A1 (k ) < A1 (1)?2 for all k > 1. (This is without loss of generality, because it is always possible to choose (1;  2; :::; K) so that A1 (k )  A1 (1) by rede ning 1 where necessary. But if A1 (k ) = A1(1 ) for some k > 1 and all  < , we could replicate the revealing equilibrium below with a partially pooling equilibrium where types 1 and k pooled their actions.) By Result A.2, for any  > ^() we can specify K sequences of action pro les f(itk ; jkt )g1 t=0 such that

Ak0 (k ) = (1 ? ) B(k ) = (1 ? )

1

X

s=0 1 X s=0

 tAk0 (itk ; jkt );  tB(itk ; jkt );

31

8k; k0 2 K; 8k 2 K:

By Result A.2 we can also choose these sequences so that player k0 's continuation payo s are within =2 of Ak0 (k ) at all future times. In this proof we will choose  < 1 so that (i)  > ^(), (ii)  >  , (iii)  > [16M=(16M + 1 =K )] , (iv)  > [(^b3 +  + M )=(Bk (k ) + M )]1=K for all k. The second condition ensures that player 2 can hold the types of player 1 to within  of any IR payo s. The third ensures that the loss from signalling is at most =8 and the last condition ensures that player 2 never gets less than ^b + 3. We now show that the following strategies are an equilibrium of ?(p;  ): Player 2 begins by playing the xed sequence of actions associated with type 1, fj1t g, and if he observes player 1 deviating from her corresponding sequence fit1g in period t = k ? 2, for t = 0; 1; :::; K ? 2, he interprets this move as a signal that player 1 is type k. When type k is signalled he then begins to play out the sequence fjkt g1 t=0 from the beginning and expects player 1 to play out the corresponding sequence fitk g1 . If player 1 deviates from the sequence fit1g in period t > K ? 2, t=0 t or deviates from the sequence fik g once type k has been signalled, then player 2 punishes these deviations by holding her to the payo s (!k )k2K + 1 (de ned above). This is possible as  >  . Each of player 1's types play a best response to this strategy of player 2 and minmax player 2 if he deviates from the above strategy. If each type k signals truthfully her expected payo is bounded below by ak (3) ? , because, by the assumption 16M (1 ?  K ) <  K , the payo s obtained over the rst K ? 1 periods contribute at most =8 to her total payo . Thus the optimal response of type k's to 2's strategy must give her a payo ,  k , satisfying  k > ak (3) ? , since she always has the option of signalling truthfully. In general the optimal response for type k will be to signal some type k0 and will never trigger the punishment from player 2. Suppose this were false, and it is optimal for type k to signal type k0 and to trigger the punishment after s periods of mimicking type k0 . Her payo from playing out the sequence f(itk0 ; jkt 0 )g1 t=0 in its entirety can be decomposed into her average payo over the rst s periods, x, and her average payo over the remaining periods, y ; Ak (k0 ) = (1 ?  s)x +  sy. By the construction of the sequence we know that at any point in time the continuation payo satis es y > Ak (k0 ) ? =2, by combining these two facts there is an upper bound on x; (1 ?  s)x < (1 ?  s)Ak (k0 ) +  s =2. Thus her payo from mimicking type k0 and then deviating in period s is bounded above by (1 ?  s )Ak (k0 ) +  s =2 + (1 ?  ) s M +  s+1 (!k + ): If she prefers to be punished from time s we must have that Ak (k0 )  ! k + 3=2, because her payo from continuing to play fitk0 g1 t=0 is at least Ak (k0 ) ? =2 (by the construction of the action sequences). This upper bound for Ak (k0 ) implies that her payo from mimicking type k0 and then deviating after s periods is less than !k + 2. By constuction this is strictly less than the payo from truthful revelation, described above, which gives a contradiction. It is now clear that when the types k > 1 of player 1 use an optimal response to player 2's strategy his expected payo , , is determined by playing at most K ? 1 arbitrary actions followed by one of the xed sequences described above. But we have assumed that (1 ?  K )(?M ) + K B(k ) > ^b + 3; therefore,   ^b + 3. Player 2 cannot pro tably deviate from her strategy since her expected payo from abiding by her equilibrium strategy is bounded below by ^b + 3, where as her maximum payo from deviating is at most (1 ?  )(M ) + ^b and the lower bound on  is sucient to ensure that the former is greater than the latter. Q.E.D.

32

Proof of Lemma 4:

Consider the constrained optimization k ( ) ? ak ; (37) subject to A1 ( )  ^a1: maxIJ A 2 a1 ? A1 ( ) As a1 > a^1, by assumption A.1, the maximand is well de ned. As the constraint set is nonempty (by the Minimax Theorem) and compact there is a solution k0 to the optimization for all k0 > 1. We aim to show that the point z, de ned in (8), is individually rational. We must, therefore, show that the set fxjx  zg is approachable. By Zamir (1992), for example, it is sucient to show that for any q 2 0. By the de nition of a, ak  Ak (i; g^), together with the fact that q  0, this implies q(a ? z) > 0. A substitution from the de nition of z shows this is equivalent to K

!

k (k ) ? ak > 0: (a1 ? a^1 ) q1 + qk A (39) k=2 A1 (k ) ? a1 We must show that if (39) holds, q((A1(i; ^g1); :::; AK(i; ^g1)) ? z)  0 for all i 2 I . It is sucient to show q((A1( ); :::; AK( )) ? z)  0 for all  such that A1( )  a^1 . A substitution for z then X

gives

q((A1(); :::; AK()) ? z)

K

 ) ? a  A (  k k k = q1 (A1 ( ) ? a^1 ) + qk Ak ( ) ? ak + (a1 ? ^a1) A ( ) ? a 1 1 k k=2   K X A (  ) ? a  ) ? a  A (  a  ? a ^ k k k k 1 1 k = (A1 ( ) ? a^1 )q1 + (a1 ? A1 ( )) qk + a1 ? A1 ( ) a1 ? A1 ( ) A1(k ) ? a1 k=2 K A ( ) ? a ! X  (A1() ? a1) q1 + qk Ak (k ) ? ak  0 8A1()  a^1: 1 k 1 k=2 The rst inequality arises because  is replaced by  k in (Ak ( ) ? ak )=(a1 ? A1 ( )) and this is therefore maximized on the set of  's with A1 ( )  a^1 . The nal inequality then follows from (39). Thus if q((A1(i; ^g); :::; AK(i; ^g)) ? z) > 0 it must be true that q((A1(i; ^g1); :::; AK (i; ^g1)) ? z)  0. We can conclude that z is individually rational. Q.E.D. X



Proof of Corollary: For k0 = 2; 3; :::; K , let  (k0) be a rational approximation to the correlated strategy k0 , such that kk0 ?  (k0)k < =4 for k0 = 2; 3; :::; K . There exists a positive integer T such that T (k0)ij

33

is an integer for all k0 = 2; 3; :::; K , i 2 I and j 2 J , (where  (k)ij denotes the ij th element of the correlated strategy  (k). Choose the K ? 1 sequences so that the action pair (i; j ) appears ?1. Continuity then ensures that there exists   such T(k0)ij times in the sequence f(isk0 ; jks0 )gTs=0  that for all  >  the result holds. Q.E.D. Proof of Theorem 3:

Some de nitions and notation Choose Q > 0 to be an upper bound on the di erence between ak () and ak for all  2 (0; ), in particular, choose Q so that (40)

ak ? ak (3) + 3=4 < Q

8k 2 K; 0 <  < :

We will also de ne a positive constant R as follows ? ? ?a ?  ? A (  ) k k k ? ?: R := max (41) ? k a1 ? A1( k ) ? From Lemma 4(b) we have that Ak ( k ) ? ak  Ak (k0 ) ? ak ; (42) a1 ? A1(k ) a1 ? A1 (k0 )

8k; k0 = 2; 3; :::; K:

Without loss of generality we will assume that this inequality is strict when k 6= k0 , that is, Ak (k ) ? ak > Ak (k0 ) ? ak ; 8k; k0 = 2; 3; :::; K k 6= k0 : (43) a1 ? A1 (k ) a1 ? A1 (k0 )

(This is because if there exists a k 6= k0 satisfying (42) with equality, we can nd an equilibrium where both type k and k0 send the same signal in the signalling phase and play out the same nite sequence of actions N times before nally revealing their types.) Recall that  > 0 is given, where  < . Choose  > 0 so that: (i)  <  ; (ii) for all k; k0 = 2; 3; :::; K with k 6= k0 it is true that A^k;k ? ak > A^k;k0 ? ak + (2 + R); (44) a1 ? A^1;k a1 ? A1;k0

(iii) 0    1 such that a^1+(1?)a1 > a^1 + implies z+(1?)a is (2+(Q+1)(R+1))-IR. ((ii) is possible as we have assumed (43) and the payo s from playing out the action sequences can be made aribitrarily close to the payo from playing the correlated strategies  k , jA^k;k0 ? Ak (k0 )j < =2, by Corollary 1. (iii) is possible because the sets of -IR payo s are convex and these sets converge to the set of IR payo s as  ! 0. So (a) as the point a is (2 + (Q + 1)(R + 1))-IR for  suciently small, (b) the set of -IR payo s is convex and converges to the set of IR payo s as  ! 0, and (c) the point z is IR, then the convex combination (1 ? )z + a, for a given  < 1 will be (2 + (Q + 1)(R + 1))-IR provided  is suciently small.) Let T and   be as de ned in the Corollary, and choose  = maxf~;  ; g, where ~ is as de ned in Lemma 2,  is de ned below De nition 1, and  is de ned in Lemma 3.

34

1. The Game with Two Types : arbitrary payo for type 1 Recall that Lemma 2 de ned an equilibrium (^ (N ); ^ (N )) of the complete information game where, with occasional randomizations, type 1 and player 2 play out a nite sequence of actions N times and then settle on an equilibrium with the payo s (a1 ; b). Recall also that type 1's ?1 (de ned in the Corollary) is not average payo over the nite sequence of actions f(^isk ; ^jks )gTs=0 greater than a^1 +  for all  >  , and that the equilibrium with payo s ( 1;  2; :::;  K ; ) de ned in Lemma 3 has payo s that satisfy ( 1 ; ) 2 G1 (3) and a1 (2) ? =2 >  1 > a1 (3) ? . Let a01 2 [a1 (4) + ; a1(3) ? ] be given; then by Lemma 2 there exists N and strategies (^(k; N ); ^(k; N )) that, for  close to 1, are an equilibrium of 1 ( ), in which type 1 gets a ?1 is played N times with payo within =4 of a01 . At this equilibrium the sequence f(^isk ; ^jks )gTs=0 occasional randomizations by type 1 and nally, if 1 has not deviated from the sequence, play settles on an equilibrium of 1 ( ) where the players receive the payo s ( 1; ). By Lemma 2, there is a probability of at least r that type 1 ends up playing the equilibrium with payo s ( 1; ). We will now de ne strategies in the game ?(p;  ) where p1 > 1 ? r, pk < r and pk0 = 0 for all k0 6= 1; k. ?1 N times and then plays out the equilibrium Type k plays out the nite sequence f(^isk ; ^jks )gTs=0 described in Lemma 3. Deviations by player 2 from his equilibrium strategy are minmaxed. Type 1 plays a strategy so that from player 2's perspective the combined actions of types 1 and k over the rst TN periods replicate the strategy ^ (k; N ), de ned above, and, after TN periods of playing the sequence, type 1 settles down to play the equilibrium de ned in Lemma 3. Thus, in periods where ^ (k; N ) requires player 1 to randomize with probability 1=2, type 1 actually deviates from the sequence with probability more than 1=2 to compensate for the fact that type k never deviates from the sequence. Provided pk < r it is feasible for type 1 and type k to replicate the strategy ^(k; N ) in this fashion. Deviations by player 2 from his equilibrium strategy are minmaxed. Player 2 will play out the strategy ^(k; N ) on the equilibrium path over the rst TN periods with the terminal equilibrium de ned in Lemma 3 being played thereafter, or one of the revealing equilibria if type 1 has revealed her type. However, if player 1 uses a pure action that deviates from her equilibrium strategy (i.e., a probability zero action), then player 2 responds in the following way. He rst calculates type 1's expected payo if she were to continue playing out her strategy (call this c). Then he takes the convex combination z + (1 ? )a, of the point z (de ned in (8)) and the point a (de ned in (7)), that gives type 1 exactly the payo c, that is,  = (a1 ? c)=(a1 ? a^1 ). By the construction above, if c > a^1 +  then this convex combination is (2 + (1 + R)(1 + Q))-IR. That is, there exists a vector of IR payo s (!1 ; ::::; !K) 2  . To show that these strategies form an equilibrium of the game ?(p;  ) which gives positive

35

probability only to types f1; kg, it is sucient to show that type 1 and type k do not bene t by deviating from their equilibrium strategy. (Lemma 2 guarantees that type 1 is indi erent in periods when she must randomize and that player 2 is playing an optimal response to types 1 and k.) By the construction above, if  >  then type 1's expected payo from deviation is at most (1 ?  )M +  (!1 + ), whereas her expected payo from continuing, c, satis es c > !1 + 3; our assumption on  is sucient to ensure deviation is suboptimal. Finally, it must be shown that type k cannot pro tably deviate. Type k can deviate from the equilibrium by mimicking type 1 revealing her type (by playing ~i at a point of randomization), and then by continuing to mimic type 1, playing out an equilibrium of the game 1 ( ). If this is pro table, then instead of the above described equilibrium, there will be a pooling equilibrium where types 1 and k play out the sequence for a nite number of times, with type 1 randomizing as before, and then both diverge from it together at some point with probability one. Hence, without loss of generality, we can ignore this possibility. Next, if c and d are respectively type 1 and type k's continuation payo s in the period a deviation happened then the punishments satisfy

!k + (2 + (1 + R)(1 + Q))  ak ? (a1 ? c) aak ?? AAk((k))

1 1 k 0 TN ^ = d + f(1 ?  )Akk +  TN 0 k ? dg +  TN 0 fak ?  k g + (1 ?  TN 0 )fAk ( k ) ? A^kk g Ak (k ) n(1 ?  TN 0 )fA^ ? A ( )g ? f(1 ?  TN 0 )A^ +  TN 0  ? cg + aak ? 1 1k 1k 1 k 1 ? A1 (k ) o ?TN 0 fa1 ? 1g

(46)

< d + f(1 ?  TN 0 )A^kk +  TN 0 k ? dg + Q + =2 ? Ak (k ) n=2 ? (1 ? TN 0 )A^ ? TN 0  + c + Qo : + aak ? 1k 1 1 A1 (k )

The nal inequality follows from (40), Ak (k ) ? A^kk < =2 and A^1k ? A1 (k ) < =2 (which follows from Corollary 1). Type 1's continuation payo , c, is determined either by (i) continued playing out of the sequence f(^isk ; ^jks)g followed by the terminal equilibrium, or by (ii) her payo from playing out a revealing equilibrium (if she has already revealed her type). In case (i) if type 1 has N 0 complete repetitions of the sequence left to perform (but fewer than N00 +1), then analogously 0 with the derivation of (36) type 1's payo satis es j(1 ? 0 TN )A^1k + TN 1 ? cj  =2 and type 0 TN ^ k's continuation payo , d, satis es j(1 ?  )Akk +  TN  k ? dj  =2. These inequalities, and (41), in (46) give !k + 2 + (Q + 1)(R + 1) < d + (Q + 1)(R + 1); thus !k + 2 < d and a deviation for type k is not pro table in this case. In case (ii) type 1 and type k's payo from playing out the revealing equilibrium at any time t must be within =2 of her initial payo from the equilibrium (by Result 3). Thus again their payo s from continuing with the equilibrium TN 0 )A^1k +  TN 0  1 ? cj  =2 and type k's continuation payo d satis es will satisfy j(1 ?  d > (1 ?  TN 0 )A^kk +  TN 0  k ? =2, where N 0 in this case is the period in which type 1 deviated from playing the xed sequence. So the argument of case (i) above will apply again.

2. The game with two types : arbitrary payo for type 1 and player 2 The strategies above are an equilibrium, so, given any a1 2 (a1 (4) + ; a1 (3) ? ), the game ?(p;  ) has an equilibrium with the payo s ( 1; ) where type 1's payo , 1 , satis es j 1 ? a1 j
 and p1 > 1 ? r and pk0 = 0 for all k0 62 f1; kg. Given this result we can now nd an equilibrium of ?(p;  ) where both type 1's and player 2's payo s, ( 1 ; ), satisfy k( 1; ) ? (a1; b)k <  for any pair (a1; b) 2 G1( ), provided that: (i) a1 2 (a1(4)+ ; a1(3) ? ), (ii)  >  , (iii) p1 is suciently large. Moreover, we will construct an equilibrium where player 2's payo  can be chosen to vary continuously. To do this it is necessary to alter rst period strategies in the equilibrium above so that type 1 randomizes | with probability 1 ?  she plays out the equilibrium described above and with probability  she reveals her type by playing ~i = 6 ^i0, and play follows an equilibrium of the complete information game in which rst period actions are (~i; ^j 0) and where type 1 and player 2 receive the payo s ( 1 ; b) 2 G1 (). As type 1 is indi erent, such a change will still give an equilibrium of the game with incomplete information, provided that any deviation from the equilibrium of the complete information game with the payo s ( 1; b) is punished in the same way as a deviation in period zero of the equilibrium of the incomplete information game, and provided the discount factor is high enough to prevent a deviation by player 2 in period 0. Type 1 and player 2's expected payo s from these strategies are ( 1 ; ) = ( 1 ; p1b + (1 ? p1) ), so the di erence between player 2's payo and b satis es

jp1b + (1 ? p1) ? bj = j ? bj(1 ? p1)  2M (1 ? p1) By choosing  = (1 ? =(4M ))=p1 we can ensure that  is within =2 of b. If type 1 does not reveal her type, then 2's posterior for type k is (1 ? p1)=(1 ? p1 ). Thus, given , if p1 is suciently close to unity and  = (1 ? =(4M ))=p1, then  is within =2 of b and type 2 will still attach posterior probability of less than r to type k. Thus it is possible for there to be an initial

move where type 1 reveals her type with high probability, and even if she does not reveal her type there is still suciently high probability attached to type 1 in player 2's posterior for the equilibrium with the payo s ( 1; ) to be played out. Since  >  (), the payo s ( 1 ; b) 2 G1() can be chosen so that b varies continuously in the interior of G1 () when this is non-empty. In this case it is possible to continuously vary the payo  received by player 2 in the two-type game.

3 The game with many types We now describe the players' strategies in the repeated game of incomplete information ?(p;  ) where all types are given positive probability, and show that these strategies are an equilibrium with payo s satisfying (10). The play in the game is divided into a signalling phase, where all types are given positive probability, and a payo phase where only two types of player 1 are given positive probability. If K = 2, the theorem follows from part 2 above, so henceforth it is assumed that K > 2. The Signalling Phase : Periods t=0,1,...,K-3: The players use the following strategies: Type k, where k = 2; 3; :::; K ? 1, plays action it = 1 in periods t = 0; 1; :::; k ? 3 and in period t = k ? 2 she plays action i = 2 to signal her type. Type K plays action it = 1 in periods t = 0; 1; :::; K ? 3. Type 1 chooses a type k = 2; 3; :::; K with equal probability and mimics her signalling strategy. (All of the types of player 1 minmax player 2 if she chooses a pure action that is not played with positive probability in the signalling phase.) Player 2 plays action j = 1 with probability q 0 and action j = 2 with probability 1 ? q 0 in period zero. If, in period t < K ? 2, player 1 used action i = 1 in all past periods, then player 2 plays action j = 1 with probability qt and action j = 2 with probability 1 ? q t.

37

The signalling phase, thus, nishes when player 1 plays action it = 2 in a period t < K ? 2 (this signals that player 1 is type 1 or type k = t + 2), or if player 1 has played it = 1 in all periods t = 0; 1; 2; :::; K ? 3 (this signals player 1 is type 1 or type K ). At the end of the signalling phase only two types of player 1, f1; kg, will be given positive probability by player 2. We will use h(k; 1) to denote the history up to and including the point where type k is signalled and j t = 1 in the period the signal was sent and h(k; 2) to denote the history where type k was signalled and j t = 2 in the period the signal was sent. The construction of the signalling phase ensures that when it is nished player 2's posterior beliefs will still attach positive probability to type 1, moreover by increasing the prior p1 these posteriors can give arbitrarily high probability to type 1. Thus, it is possible to choose p1 suciently high so that the equilibrium of the game with two types (described above) can be played after the signalling phase. After the Signalling: If the history is h(K; 1) or h(K; 2) the players play out the equilibrium described in part 2 of the proof where type 1 and type K are given positive probability, with payo s to type 1 and player 2, ( y1 ; y), satisfying j y1 ? a1 j < =2 and j y ? bj < =2. If the history is h(k; 1) (respectively h(k; 2)), where k < K , an equilibrium of the game giving positive probability to types f1; kg is played (as described in part 2) where type 1 and player 2 receive the payo s (to be de ned below) ( y1 (k; 1); y(k; 1)) (respectively ( y1(k; 2); y(k; 2))). We must rst show that each player is indi erent in the periods she/he is required to randomize. Consider rst period t = K ? 3; in this period type 1 plays action i = 1 and i = 2 with equal probability and player 2 randomizes between j = 1 and j = 2. First consider type 1's payo s from her two actions: (i = 1) (i = 2)

(1 ?  )A1 (1; q K ?3) +  y1; (1 ?  )A1 (2; q K ?3) +  fq K ?3 y1 (K ? 1; 1) + (1 ? q K ?3 ) y1 (K ? 1; 2)g;

where A1 (i; q K ?3) is an abuse of notation that denotes type 1's stage-game payo when she uses action i and player 2 plays (q K ?3 ; 1 ? q K ?3 ) on the actions j = 1 and j = 2. By our assumption on  we have (1 ?  )jA1(1; q K ?3) ? A1(2; q K ?3)j < (1 ?  )2M < =2. So a sucient condition for the existence of a q K ?3 that makes type 1 indi erent between actions i = 1 and i = 2 in period K ? 2 is that the payo s y1 (K ? 1; 1) and y1 (K ? 1; 2) satisfy

y1 ? 3=2 < y1(K ? 1; 1) < y1 ? =2; y1 + 3=2 > y1 (K ? 1; 2) > y1 + =2: Such a choice is possible by (36), the fact that the payo s y1 (K ? 1; j ) decrease in steps of less than =2 in part 2 of the proof, and (a1; b) 2 G1( ). For player 2 to be indi erent between actions j = 1 and j = 2, let (; 1 ? ) denote the probability he attaches to i = 1 and i = 2 in period K ? 3, so that  = (pK + p1=(K ? 1))=(pK ?1 + pK + 2p1 =(K ? 1)); then his payo s to

(47) (48)

his two actions are (abusing notation as above) (j = 1) (j = 2)

(1 ?  )B (; 1) +  [ y + (1 ? ) y(K ? 1; 1)]; (1 ?  )B (; 2) +  [ y + (1 ? ) y(K ? 1; 2)]:

Player 2 is indi erent between these two actions if (49)

[(1 ?  )= ]fB (; 1) ? B (; 2)g = (1 ? )f y(K ? 1; 2) ? y(K ? 1; 1)g:

38

Since: (i) the LHS of (49) is less than =2 in absolute value by the assumption on  , and 1 ?  is bounded away from unity (and will never be less than 1=2 when p1 is large), and (ii) the payo s y(K ? 1; 1) and y(K ? 1; 2) at the two equilibria can be made to vary continuously when Int G1() 6= ;, by part 2, it is possible to choose equilibria such that the payo s ( y1 (K ? 1; 1); y(K ? 1; 1)), ( y1(K ? 1; 2); y(K ? 1; 2)) satisfy (47), (48) and (49) where j y(K ? 1; i) ? bj < =2 for i = 1; 2. (When Int G1() = ; it is necessary to have type 1 vary the probability he mimics type K ? 1 and the right of (49) is continuous in this.) The above construction ensures that in period t = K ? 3 it is possible to make all players willing to randomize in the manner prescribed by their strategies. An iterative application of this argument for periods t = K ? 4; K ? 5; :::; 0 will then justify all of the randomizations in all of the periods of the signalling phase. Type 1 is indi erent among all her actions, so her equilibrium payo equals what she gets if she mimics type K ; this is bounded below by (1 ?  K ?2 )(?M ) +  K ?2 (a1 ? =2) and bounded above by (1 ?  K ?2 )(M ) +  K ?2 (a1 + =2), so, by our assumption on  , her payo can be at most  from a1 . A similar argument shows that player 2's payo is also within  of b. To establish the strategies above are an equilibrium we now show that no player wishes to deviate from her/his equilibrium strategies. Lemma 2 established that player 2 does not bene t by deviating from the equilibrium above. Part 2 of this proof ensures that type 1 does not bene t by deviating from the strategies described above, nor does type k bene t by deviating when she has signalled that she is type k. The only additional possible deviation that can arise when there are many types is when type k mimics type k0 , or mimics type k0 and then deviates to take a punishment. Now suppose type k sends the signal of type k0 and then plays out her nite sequence N 0 times before settling at the equilibrium described in Lemma 3. From (9) her payo from this is (1 ?  TN 0 )A^k;k0 +  TN 0  k , whereas her payo from playing her equilibrium strategy can be written as (1 ?  TN )A^k;k +  TN k . Also let c be type 1's expected payo from mimicking type k and c0 be her expected payo from mimicking type k0 , that is, (50) c = (1 ?  TN )A^1;k +  TN 1 = (1 ?  TN )(A^1;k ? 1 ) + 1 (51) c0 = (1 ?  TN 0 )A^1;k0 +  TN 0  1 = (1 ?  TN 0 )(A^1;k0 ? 1 ) + 1 Thus we have the following sucient condition for type k to play her equilibrium strategy: (1 ?  TN )A^k;k +  TN  k > (1 ?  TN 0 )A^k;k0 +  TN 0  k + 2; or equivalently (1 ?  TN )(A^k;k ?  k ) > (1 ?  TN 0 )(A^k;k0 ? k ) + 2; or A^k;k ?  k ( ? c) > A^k;k0 ? k ( ? c0) + 2;  1 ? A^1;k 1  1 ? A^1;k0 1 where the last inequality follows from substitution for (1 ?  TN ) from (50) and for (1 ?  TN 0 ) from (51). Since in equilibrium jc ? c0 j <  and (44) holds, it is optimal for type k to play her equilibrium strategy. If type k mimics type k0 and then deviates when type 1's continuation payo is c, the strategies described in part 1 of the proof impose the same punishment on type k as the punishment she would have received if she had truthfully signalled her type and again deviated when type 1's continuation payo was c. A repetition of the above argument shows that this latter option is strictly preferred to the former, and hence a fortiori type k prefers to use her equilibrium strategy. Q.E.D.

39