Two-person repeated games with finite automata - Center for the ...

Int J Game Theory (2000) 29:309–325


Two-person repeated games with finite automata

Abraham Neyman¹, Daijiro Okada²

¹ Institute of Mathematics, The Hebrew University of Jerusalem, Givat Ram, Jerusalem 91904, Israel, and SUNY at Stony Brook, Stony Brook, NY 11794-4384, USA (email: [email protected])
² Department of Economics, SUNY at Stony Brook, Stony Brook, NY 11794-4384, USA (email: [email protected])

Received February 1997 / revised version March 2000

Abstract. We study two-person repeated games in which a player with a restricted set of strategies plays against an unrestricted player. An exogenously given bound on the complexity of strategies, measured by the size of the smallest automata that implement them, restricts the strategies available to a player. We examine the asymptotic behavior of the set of equilibrium payoffs as the bound on the strategic complexity of the restricted player tends to infinity, but sufficiently slowly. Results from the study of the zero-sum case provide the individually rational payoff levels.

JEL classification: C73, C72.

Key words: repeated games, finite automata

1. Introduction

The object of this study is two-person non-zero-sum repeated games in which there is a bound on the complexity of strategies for only one of the players. Throughout the paper player 1 will be the restricted player. We employ automata to represent repeated game strategies. The complexity of a strategy is then defined to be the smallest number of states of an automaton required to implement it. The specific models of repeated games studied are (1) the finitely repeated game $G^n(m(n))$ and (2) the $\lambda$-discounted game $G_\lambda(m(\lambda))$. Here, $m(\cdot)$ is the bound on the number of states of automata available to player 1, and it is a function of the number of repetitions, $n$, or the discount factor, $\lambda$. We examine the set of limit points of equilibrium payoffs of $G^n(m(n))$ (resp. $G_\lambda(m(\lambda))$) as


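$m(n) \to \infty$ ($n \to \infty$) (resp. $m(\lambda) \to \infty$ ($\lambda \to 1$)). The particular case considered in this paper is when $m(\cdot)$ grows "sufficiently slowly".¹ Formally, we will examine the cases when

$$\lim_{n \to \infty} \frac{m(n) \log m(n)}{n} = 0 \tag{1.1}$$

and

$$\lim_{\lambda \to 1} (1 - \lambda)\, m(\lambda) \log m(\lambda) = 0. \tag{1.2}$$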
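As a rough numerical illustration of condition (1.1), the following sketch evaluates $m(n)\log m(n)/n$ for three growth rates. The function and variable names are hypothetical and chosen only for this example. For $m(n) = n^{1/2}$ the ratio shrinks toward 0, for $m(n) = n/\log n$ it equals $1 - \log\log n/\log n$ and so creeps toward 1, and for $m(n) = n$ it equals $\log n$.

```python
import math

def ratio(m, n):
    """Return m(n) * log m(n) / n, the quantity appearing in condition (1.1)."""
    mn = m(n)
    return mn * math.log(mn) / n

# Hypothetical candidate bounds m(n), keyed by a readable label.
candidates = {
    "n^0.5": lambda n: n ** 0.5,
    "n / log n": lambda n: n / math.log(n),
    "n": lambda n: float(n),
}

for name, m in candidates.items():
    values = [ratio(m, 10 ** k) for k in (3, 6, 9, 12)]
    print(name, ["%.4f" % v for v in values])
```

Only the first candidate satisfies (1.1); the other two show that $m(n)/n \to 0$ alone is not enough.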

We will show that, under these conditions on $m(\cdot)$, the Hausdorff limits of the sets of equilibrium payoffs, $\mathrm{Lim}_{n \to \infty} E(m(n))$ and $\mathrm{Lim}_{\lambda \to 1} E(m(\lambda))$, exist and coincide with the set of feasible payoffs above certain individually rational levels. For player 1 this level is his maxmin payoff in the one-shot game, where max ranges over his pure actions and min ranges over player 2's pure actions; for player 2 it is her minmax payoff in the one-shot game, where min ranges over player 1's pure actions and max ranges over her own pure actions. The determination of these individually rational levels, under the conditions (1.1) and (1.2), can be provided by our analysis of the zero-sum case, Neyman and Okada (1999). In this paper, however, we will explicitly construct a strategy of player 2 that effectively punishes the restricted player 1. This will provide a simpler proof of a result originally proved in Neyman and Okada (1999) using the concept of entropy. See also Neyman and Okada (2000) for an alternative proof. It will be seen that the equilibria that we construct are in pure strategies and the equilibrium paths are cyclic.

Our result for finitely repeated games implies in particular that, in the finitely repeated prisoner's dilemma, the friendly, or nearly friendly, outcomes can be achieved in an equilibrium when there is a bound on the strategic complexity of only one of the players. In addition, Folk-Theorem type results like ours imply that in non-zero-sum games, being restricted in terms of strategic possibility is not necessarily detrimental even against a powerful unrestricted player. Only the punishment will be severe for the restricted player.

Related literature includes Neyman (1999) and Papadimitriou and Yannakakis (1994), which contain several results on the asymptotic behavior of the set of equilibrium payoffs of two-person finitely repeated games when there are bounds on the strategic complexity for both players.
¹ For example, the condition (1.1) holds for all functions $m(n) = n^{\alpha}$ where $0 < \alpha < 1$, while it is violated for $m(n) = n$. In addition, the function $m(n) = n/\log n$, for which $m(n)/n \to 0$ ($n \to \infty$) but $m(n)/n^{\alpha} \to \infty$ ($n \to \infty$) for all $0 < \alpha < 1$, violates (1.1). The authors thank a referee for this comment.

These results encompass Neyman (1985)'s justification of cooperation in the finitely repeated prisoner's dilemma. For example, the main theorem of Neyman (1999) states that if the two bounds on the size of automata are subexponential as a function of min{the number of repetitions, the larger bound}, then the asymptotic folk theorem is obtained. More precisely, let $G$ be a two-person game in strategic form. Denote by $v_i$ the minmax payoff for player $i$, where min ranges over the other player's mixed actions and max ranges over $i$'s own pure actions. Let $G^n(m_1(n), m_2(n))$ be the $n$-fold repetition of $G$ in which player $i$'s strategies are restricted to those implementable by automata of size at most $m_i(n)$, a function of $n$. If the sequence of triples $(n, m_1(n), m_2(n))_{n=1}^{\infty}$ satisfies the conditions $\min\{m_1(n), m_2(n)\} \to \infty$ ($n \to \infty$) and

$$\lim_{n \to \infty} \frac{\log(\max\{m_1(n), m_2(n)\})}{\min\{n, m_1(n), m_2(n)\}} = 0,$$

then the set of equilibrium payoff vectors of $G^n(m_1(n), m_2(n))$ converges to the set of payoff vectors which are feasible and give player $i$ at least $v_i$. Zemel (1989) contains results similar to Neyman (1985) but using modified finite automata which can send messages, in addition to those conveyed through the actions taken, during the play.

Similar results have been obtained for other classes of repeated games. Ben-Porath (1993) studies undiscounted infinitely repeated games with finite automata. Let $G = ((A_i)_{i=1}^{N}, (r_i)_{i=1}^{N})$ be an $N$-person game in strategic form, and

$$v_i = \min_{q \in \prod_{j \ne i} \Delta(A_j)} \; \max_{a_i \in A_i} E_q(r_i(a_i, b))$$

and

$$w_i = \max_{p \in \Delta(A_i)} \; \min_{b \in \prod_{j \ne i} A_j} E_p(r_i(a_i, b)),$$

where $\Delta(X)$ denotes the set of probability distributions on a set $X$ and $E_\mu$ denotes the expectation with respect to the probability $\mu$. Consider the infinitely repeated game in which player $i$ has a complexity bound $m_i(k)$, parameterized by a positive integer $k$, with $m_1(k) \le \cdots \le m_N(k)$ and $m_1(k) \to \infty$ ($k \to \infty$). Denote this game by $G^\infty(m_1(k), \ldots, m_N(k))$ and the set of its equilibrium payoffs by $E^\infty(m_1(k), \ldots, m_N(k))$. One of his results asserts that if

$$\lim_{k \to \infty} \frac{\log m_N(k)}{m_1(k)} = 0,$$

then (i) the set of feasible payoffs which give each player $i$ at least $v_i$ is included in $\liminf_{k \to \infty} E^\infty(m_1(k), \ldots, m_N(k))$ and (ii) $\limsup_{k \to \infty} E^\infty(m_1(k), \ldots, m_N(k))$ is included in the set of feasible payoffs which give each player $i$ at least $w_i$. Note that $v_i \ge w_i$. For two-person games, we have $v_i = w_i$. Hence one can conclude that $\lim_{k \to \infty} E^\infty(m_1(k), m_2(k))$ exists and coincides with the set of equilibrium payoffs of the infinitely repeated game without complexity bound (the Folk Theorem). This result crucially depends on the study of the two-person zero-sum case, which provides the individually rational levels $v_i$ and $w_i$. The exact asymptotics, i.e., the limit, if it exists, of $E^\infty(m_1(k), \ldots, m_N(k))$ for the $N$-person case is not known. See Section 4 of Neyman (1997). Lehrer (1988)


contains a similar result for two-person games with bounded recall. Also see Lehrer (1994) for the $N$-person case with bounded recall.

The main contribution of this paper is a determination of individually rational payoff levels together with the construction of equilibria which have certain robustness properties and can be applied to a wider variety of conditions on the order of magnitude of the complexity bound $m(\cdot)$.

The next section introduces the model of repeated games and finite automata. In Section 3 we construct player 2's strategy, which will be used to punish player 1 in the equilibria constructed in the subsequent sections. The results on the asymptotics of the set of equilibrium payoff vectors are presented in Section 4 (the finitely repeated games) and Section 5 (the discounted games). Section 6 concludes the paper.

2. Repeated games and automata

Let $G = (A, B, h, k)$ be a two-person game in strategic form where $A$ and $B$ are finite sets of actions, and $h : A \times B \to \mathbb{R}$ and $k : A \times B \to \mathbb{R}$ are the payoff functions of players 1 and 2, respectively. We call $G$ the stage game. Throughout the paper we will assume without loss of generality that all payoffs are nonnegative, i.e., $h(a, b) \ge 0$ and $k(a, b) \ge 0$ for all $(a, b) \in A \times B$. Denote the maxmin value of the stage game for player 1 in pure actions by $h_*$ and the minmax value for player 2 in pure actions by $k^*$, i.e.,

$$h_* = \max_{a \in A} \min_{b \in B} h(a, b)$$

and

$$k^* = \min_{a \in A} \max_{b \in B} k(a, b).$$
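The two pure-action levels just defined are easy to compute by enumeration. The sketch below is only an illustration: the function names are hypothetical, and the payoff tables are those of the 2×2 game discussed with Figure 1 in Section 4, with player 1 choosing the row (T or B) and player 2 the column (L or R).

```python
def pure_maxmin(h, A, B):
    """Player 1's maxmin in pure actions: max over a of min over b of h(a, b)."""
    return max(min(h[a, b] for b in B) for a in A)

def pure_minmax(k, A, B):
    """Player 2's minmax in pure actions: min over a of max over b of k(a, b)."""
    return min(max(k[a, b] for b in B) for a in A)

A = ["T", "B"]
B = ["L", "R"]
# Payoff dictionaries keyed by (row action, column action).
h = {("T", "L"): 0, ("T", "R"): 1, ("B", "L"): 2, ("B", "R"): 0}
k = {("T", "L"): 0, ("T", "R"): 2, ("B", "L"): 1, ("B", "R"): 0}

h_low = pure_maxmin(h, A, B)   # the level written h_* in the text
k_up = pure_minmax(k, A, B)    # the level written k^* in the text
print(h_low, k_up)
```

For this game the pair of levels is (0, 1), which agrees with the value reported for the same example in Section 4.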

Also set $\|h\| = \max_{a, b} |h(a, b)|$ and $\|k\| = \max_{a, b} |k(a, b)|$.

Given $G = (A, B, h, k)$ we next describe a new game in which $G$ is played repeatedly (with complete information and standard signaling). For each positive integer $n$, let $S_n$ (resp. $T_n$) be the set of mappings from $(A \times B)^{n-1}$ to $A$ (resp. to $B$), where $(A \times B)^0 = \{\emptyset\}$. A pure strategy of player 1 (resp. player 2) is an element of $S = \prod_n S_n$ (resp. $T = \prod_n T_n$). Equivalently, $S$ (resp. $T$) is the set of all mappings from the set of all finite histories $\bigcup_{n \ge 1} (A \times B)^{n-1}$ to $A$ (resp. $B$). A mixed strategy of player 1 (resp. player 2) is a probability distribution on $S$ (resp. $T$). The sets of mixed strategies are denoted by $\Delta(S)$ and $\Delta(T)$.

Every pair of pure strategies $(s, t)$ induces a play $\omega(s, t) = (\omega_l(s, t))_{l=1}^{\infty} \in (A \times B)^{\infty}$ where $\omega_l(s, t)$ is defined inductively as

$$\omega_l(s, t) = (a_l, b_l) = \begin{cases} (s_1(\emptyset), t_1(\emptyset)) & \text{for } l = 1, \\ (s_l(\omega_1, \ldots, \omega_{l-1}),\, t_l(\omega_1, \ldots, \omega_{l-1})) & \text{for } l > 1. \end{cases}$$

Accordingly, every pair $(\sigma, \tau)$ of mixed strategies induces a random play $\omega(\sigma, \tau) = (\omega_l(\sigma, \tau))_{l=1}^{\infty}$. We denote the corresponding probability distribution on the set of plays $(A \times B)^{\infty}$ by $P_{\sigma, \tau}$ and the expectation with respect to $P_{\sigma, \tau}$ by $E_{\sigma, \tau}$.

For each positive integer $n$ we define the $n$-average payoff function of player 1, $h_n : S \times T \to \mathbb{R}$, by $h_n(s, t) = (1/n) \sum_{l=1}^{n} h(\omega_l(s, t))$. Also, for each $\lambda \in [0, 1)$ we define the $\lambda$-discounted payoff function of player 1, $h_\lambda : S \times T \to \mathbb{R}$, by $h_\lambda(s, t) = (1 - \lambda) \sum_{l=1}^{\infty} \lambda^{l-1} h(\omega_l(s, t))$. The $n$-average and the $\lambda$-discounted payoff functions of player 2, $k_n$ and $k_\lambda$, are similarly defined. The


bilinear extensions of $h_n$, $h_\lambda$, $k_n$ and $k_\lambda$ to $\Delta(S) \times \Delta(T)$ are denoted by the same symbols. Thus, for example, $h_n(\sigma, \tau) = E_{\sigma, \tau}\left[(1/n) \sum_{l=1}^{n} h(a_l, b_l)\right]$.

In this paper we study two classes of repeated games differentiated by their payoff functions.

Finitely repeated game: $G^n = (S, T, h_n, k_n)$
The $\lambda$-discounted game: $G_\lambda = (S, T, h_\lambda, k_\lambda)$

If two pure strategies of a player induce the same play against any pure strategy of the other player, they are said to be equivalent. For example, player 1's pure strategies $s$ and $s'$ are equivalent if $\omega_l(s, t) = \omega_l(s', t)$ for every pure strategy $t$ of player 2 and all stages $l = 1, 2, \ldots$. Extending this notion to mixed strategies, we say that two strategies of a player are equivalent if, against any strategy of the other player, they induce the same probability over the plays of a repeated game.

Given the stage game $G = (A, B, h, k)$, an automaton of player 1 is defined by a four-tuple $M = \langle Q, q_1, f, g \rangle$. The first component $Q$ is a set of states, and $q_1 \in Q$ is an initial state. The third component is an action function, $f : Q \to A$, and the last component is a transition function, $g : Q \times B \to Q$. By the size of an automaton we mean the cardinality of the set of its states, $|Q|$.

An automaton $M$ plays a repeated game as follows. At each stage $n$ it takes the action prescribed by $f$ for the current state $q_n$, i.e., $f(q_n)$; the state at the first stage is $q_1$. Then it changes its state to $q_{n+1}$ specified by $g$ as a function of the current state $q_n$ and player 2's action $b_n$, that is, $q_{n+1} = g(q_n, b_n)$.

Every automaton $M$ induces a pure strategy $s$ for player 1 in a repeated game in the following manner. First, for any sequence of player 2's actions $b_1, \ldots, b_n$ ($n \ge 2$), define an extension of the transition function inductively by $g(q, b_1, \ldots, b_n) = g(g(q, b_1, \ldots, b_{n-1}), b_n)$. Then for any history $\omega = ((a_1, b_1), \ldots, (a_n, b_n)) \in (A \times B)^n$, set $s(\omega) = f(g(q_1, b_1, \ldots, b_n))$, which is the action taken at stage $n + 1$ ($n \ge 1$). At the first stage, $s(\emptyset) = f(q_1)$. Also, for every pure strategy $s \in S$ in a repeated game, there is an automaton that induces an equivalent strategy.

If $s$ is equivalent to a pure strategy induced by an automaton, we say that $s$ is implementable by that automaton. The size of the smallest automaton that implements a pure strategy serves as a measure of complexity of that strategy. To be more precise, for a given $s = (s_n) \in S$, we say that a finite history $\omega = ((a_1, b_1), \ldots, (a_l, b_l))$ is compatible with $s$ if $a_n = s_n((a_1, b_1), \ldots, (a_{n-1}, b_{n-1}))$ for every $n = 1, \ldots, l$. Also, for an arbitrary finite history $\omega$ of length $l$, define the induced strategy $s|\omega = ((s|\omega)_n)$ by $(s|\omega)_n(\omega') = s_{l+n}(\omega\omega')$ for each $\omega' \in (A \times B)^{n-1}$, where $\omega\omega'$ is the concatenation of $\omega$ and $\omega'$. The number of distinct, or nonequivalent, strategies induced by $s$ and finite histories compatible with $s$ can be considered as a measure of complexity of implementing $s$. Indeed, it can be shown that


the size of the smallest automaton that implements $s$ equals the number of equivalence classes of $\{(s|\omega) \mid \omega \text{ is a finite history compatible with } s\}$. Kalai and Stanford (1988) provide an analogous result for full automata, whose transitions depend on the player's own action as well as the actions of the other players. Henceforth, by the complexity of a pure strategy $s$ we mean the size of the smallest automaton that implements $s$. For each positive integer $m$, we denote by $S(m)$ the subset of $S$ consisting of those pure strategies of player 1 whose complexity is at most $m$.

3. Individually rational payoff levels

We present in this section a result which will be used in deriving most of the subsequent results. The situation under consideration is the one in which player 1 is restricted to a finite set of pure strategies. The nature of this set is arbitrary. In particular, it may contain pure strategies which cannot be implemented by any finite automaton.

Theorem 3.1. For every finite subset $S'$ of $S$ there exists $\hat{t} \in T$ such that for all $s \in S'$

(i) $h_n(s, \hat{t}) \le h_* + \dfrac{\|h\| \log_2 |S'|}{n}$ for all $n = 1, 2, \ldots$,

and

(ii) $h_\lambda(s, \hat{t}) \le h_* + (1 - \lambda) \|h\| \log_2 |S'|$ for all $\lambda \in [0, 1)$.
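The strategy $\hat{t}$ in Theorem 3.1 is built in the proof by a halving argument: player 2 responds to the action chosen by the largest group of still-compatible strategies in $S'$. The following is a minimal sketch of that idea, under the assumption that every element of $S'$ is given as a finite automaton $\langle Q, q_1, f, g \rangle$ as in Section 2. The class, the function, the four example automata and the payoff table are hypothetical illustrations, not objects taken from the paper.

```python
import math
from collections import Counter

class Automaton:
    """Player 1 automaton <Q, q1, f, g>; f: state -> action, g: (state, b) -> state."""
    def __init__(self, q1, f, g):
        self.q1, self.f, self.g = q1, f, g

def play_against_punisher(s, candidates, h, B, n):
    """Play s (assumed to be an element of `candidates`) for n stages against t-hat.

    Returns (average payoff of player 1, number of stages at which the action
    of s differs from the majority action a(omega))."""
    live = [(c, c.q1) for c in candidates]   # strategies compatible with the history
    q = s.q1
    total, surprises = 0.0, 0
    for _ in range(n):
        votes = Counter(c.f[state] for c, state in live)
        majority = votes.most_common(1)[0][0]        # a(omega), ties broken arbitrarily
        b = min(B, key=lambda y: h[majority, y])     # t-hat(omega): minimize h(a(omega), .)
        a = s.f[q]                                   # action actually taken by player 1
        total += h[a, b]
        surprises += (a != majority)
        # Keep only the strategies that also played a, and advance their states.
        live = [(c, c.g[state, b]) for c, state in live if c.f[state] == a]
        q = s.g[q, b]
    return total / n, surprises

# Illustrative data (assumptions made for this sketch).
B = ["L", "R"]
h = {("T", "L"): 0, ("T", "R"): 1, ("B", "L"): 2, ("B", "R"): 0}
always_T = Automaton(0, {0: "T"}, {(0, "L"): 0, (0, "R"): 0})
always_B = Automaton(0, {0: "B"}, {(0, "L"): 0, (0, "R"): 0})
alternate = Automaton(0, {0: "T", 1: "B"},
                      {(0, "L"): 1, (0, "R"): 1, (1, "L"): 0, (1, "R"): 0})
react = Automaton(0, {0: "T", 1: "B"},
                  {(0, "L"): 1, (1, "L"): 1, (0, "R"): 0, (1, "R"): 0})
S_prime = [always_T, always_B, alternate, react]

n = 20
h_star = max(min(h[a, b] for b in B) for a in ("T", "B"))    # pure maxmin of h
bound = h_star + max(h.values()) * math.log2(len(S_prime)) / n
for s in S_prime:
    avg, surprises = play_against_punisher(s, S_prime, h, B, n)
    print(avg, surprises, avg <= bound)
```

Each stage at which player 1 departs from the majority action at least halves the list `live`, so such stages number at most $\log_2 |S'|$; at every other stage player 1 receives at most $h_*$. This is the mechanism behind bound (i).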

Proof: For each finite history $\omega = (\omega_1, \ldots, \omega_l)$, where $\omega_j = (a_j, b_j)$, let $S'(\omega)$ be the set of strategies in $S'$ that are compatible with $\omega$, i.e.,

$$S'(\omega) = \{ s \in S' \mid s(\emptyset) = a_1, \text{ and } s(\omega_1, \ldots, \omega_{j-1}) = a_j \text{ for all } j = 2, \ldots, l \}.$$

For each $a \in A$ let $S'(\omega, a)$ be the set of strategies in $S'(\omega)$ that take the action $a$ at the history $\omega$, i.e., $S'(\omega, a) = \{ s \in S'(\omega) \mid s(\omega) = a \}$. Clearly, if $a \ne a'$, then $S'(\omega, a)$ and $S'(\omega, a')$ are disjoint, and $\bigcup_{a \in A} S'(\omega, a) = S'(\omega)$. Let $a(\omega)$ be an action of player 1 such that $|S'(\omega, a(\omega))| \ge |S'(\omega, a)|$ for all $a \in A$. Notice that if $a \ne a(\omega)$, then $|S'(\omega, a)|$ is at most one half of $|S'(\omega)|$: otherwise, $|S'(\omega, a(\omega))| + |S'(\omega, a)| \ge 2 |S'(\omega, a)| > |S'(\omega)|$, a contradiction. This implies that for every $(a, b) \in A \times B$ with $a \ne a(\omega)$, if $\omega' = (\omega_1, \ldots, \omega_l, (a, b))$, then

$$|S'(\omega')| \le \frac{|S'(\omega)|}{2}.$$

Define $\hat{t} \in T$ by $\hat{t}(\omega) \in \arg\min_{b \in B} h(a(\omega), b)$. Take $s \in S'$ arbitrarily and let $(\omega_1, \omega_2, \ldots)$ be the play generated by $(s, \hat{t})$, $\omega_j = (a_j, b_j)$. Denote $\omega^l = (\omega_1, \ldots, \omega_l)$. Of course, $s$ is compatible with $\omega^l$ for every $l$, i.e., $s \in S'(\omega^l)$. Therefore for all $n$,

$$|S'| \, 2^{-\sum_{l=1}^{n} I(a_l \ne a(\omega^{l-1}))} \ge |S'(\omega^n)| \ge 1,$$

where $I$ is the indicator function. This implies that $\sum_{l=1}^{n} I(a_l \ne a(\omega^{l-1})) \le \log_2 |S'|$, that is, the number of stages at which player 1's action differs from $a(\omega^{l-1})$ is at most $\log_2 |S'|$.

Now let $y = (y_1, y_2, \ldots)$ be a nonincreasing sequence of nonnegative real numbers such that $\sum_{l=1}^{\infty} y_l = 1$. Define $h_y : S \times T \to \mathbb{R}$ by $h_y(s, t) = \sum_{l=1}^{\infty} y_l h(\omega_l(s, t))$. Take $s \in S'$. Then, since $h(\omega_l) \le h_* I(a_l = a(\omega^{l-1})) + \|h\| I(a_l \ne a(\omega^{l-1}))$ for every $l = 1, 2, \ldots$, we have

$$\begin{aligned}
h_y(s, \hat{t}) &\le \sum_{l=1}^{\infty} y_l \left( h_* I(a_l = a(\omega^{l-1})) + \|h\| I(a_l \ne a(\omega^{l-1})) \right) \\
&\le h_* + \|h\| \sum_{l=1}^{\infty} y_l I(a_l \ne a(\omega^{l-1})) \\
&\le h_* + \|h\| \sum_{l=1}^{\infty} y_1 I(a_l \ne a(\omega^{l-1})) \\
&\le h_* + y_1 \|h\| \log_2 |S'|.
\end{aligned}$$

(Recall our assumption $h(a, b) \ge 0$.) Note that if $y_l = 1/n$ for $l = 1, \ldots, n$ and $y_l = 0$ for $l > n$, then $h_y = h_n$. Hence (i). Also, if $y_l = (1 - \lambda) \lambda^{l-1}$, $l = 1, 2, \ldots$, then $h_y = h_\lambda$. This proves (ii). Q.E.D.

Remark 3.1: If $s$ and $s'$ in $S'$ are equivalent, then for every finite history $\omega$ and every action $a \in A$, either both $s$ and $s'$ are in $S'(\omega, a)$ or neither is. Therefore one can replace $\log_2 |S'|$ in the statement of Theorem 3.1 by $\log_2 |S'/\!\sim|$, where $S'/\!\sim$ is the set of equivalence classes of $S'$.

Remark 3.2: Let $S_1, S_2, \ldots$ be a nondecreasing sequence of finite subsets of $S$. If $\log |S_n| / n \to 0$ as $n \to \infty$, then Theorem 3.1 (i) implies that for every $\varepsilon > 0$, there is $n_0$ such that for each $n \ge n_0$ there is $t \in T$ for which $\max_{s \in S_n} h_n(s, t) \le h_* + \varepsilon$. A similar result is obtained from Theorem 3.1 (ii) for the $\lambda$-discounted payoff by replacing the sequence $S_1, S_2, \ldots$ by a net $(S_\lambda \mid 0 \le \lambda < 1)$ and the condition $\log |S_n| / n \to 0$ as $n \to \infty$ by $(1 - \lambda) \log |S_\lambda| \to 0$ as $\lambda \to 1$.

4. Finitely repeated game $G^n(m(n))$

In this section we study the modified version of the finitely repeated game, $G^n(m(n)) = (S(m(n)), T, h_n, k_n)$. The bound on the complexity of player 1's


strategy, $m(\cdot)$, is a function of the number of repetitions $n$. Player 1 is allowed to use a mixed strategy provided that its support lies in $S(m(n))$. He can also use a behavioral strategy $\sigma : \bigcup_{l \ge 0} (A \times B)^l \to \Delta(A)$ as long as it is equivalent to a mixed strategy with support in $S(m(n))$.

A simple counting argument shows that the number of finite automata of size $m$ is at most $m^{Cm}$ for some positive constant $C$. Thus the number of equivalence classes of $S(m)$ is also bounded by $m^{Cm}$. The next lemma follows from Theorem 3.1 (i), Remark 3.1 and Remark 3.2.

Lemma 4.1. Suppose that $m(n) \log m(n) / n \to 0$ as $n \to \infty$. Then for every $\varepsilon > 0$, there is $n_0$ such that for each $n > n_0$ there is $t \in T$ such that

$$h_n(s, t) \le h_* + \varepsilon \quad \text{for all } s \in S(m(n)).$$
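To see how the counting bound feeds into Lemma 4.1, one can make the count explicit. The sketch below assumes the following count, which is one natural way to obtain a bound of the form $m^{Cm}$ and is not necessarily the authors' own: an automaton with states labelled $1, \ldots, m$ is determined by its initial state ($m$ choices), its action function ($|A|^m$ choices) and its transition function ($m^{m|B|}$ choices). The function names are hypothetical.

```python
import math

def log2_automata_count(m, num_a, num_b):
    """log2 of m * |A|^m * m^(m|B|), the number of labelled automata with m states."""
    return math.log2(m) + m * math.log2(num_a) + m * num_b * math.log2(m)

def punishment_slack(n, m, num_a, num_b, norm_h):
    """Bound on h_n(s, t-hat) - h_* from Theorem 3.1 (i), with |S'| replaced by the count."""
    return norm_h * log2_automata_count(m, num_a, num_b) / n

# Assumed example: |A| = |B| = 2, ||h|| = 1, and m(n) = floor(sqrt(n)).
for n in (10 ** 3, 10 ** 5, 10 ** 7):
    m = int(n ** 0.5)
    print(n, m, punishment_slack(n, m, 2, 2, 1.0))
```

Since the logarithm of the count is of order $m \log m$, the slack vanishes exactly when $m(n) \log m(n)/n \to 0$, which is the hypothesis of the lemma.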

Remark 4.1: As an immediate corollary of this lemma, we obtain the following result concerning the asymptotics of the value of two-person zero-sum repeated games with finite automata, which was proved in Neyman and Okada (1998a).² Let $G = (A, B, h, k)$ be a two-person zero-sum game, i.e., $k = -h$. Denote the value of $G^n(m(n))$ by $V^n(m(n))$.

Corollary 4.1. If $m(n) \log m(n) / n \to 0$ as $n \to \infty$, then $V^n(m(n)) \to h_*$ as $n \to \infty$.

Denote by $E^n(m(n))$ the set of (Nash) equilibrium payoff vectors of $G^n(m(n))$. The next theorem describes the asymptotics of $E^n(m(n))$ when $m(n)$ grows sufficiently slowly. The convergence of sets is with respect to the Hausdorff topology in $\mathbb{R}^2$. To state the theorem formally we need some more notation. Let $F$ be the convex hull of the set of payoffs feasible in pure actions of the stage game, that is,

$$F = \mathrm{Co}\{ (h(a, b), k(a, b)) \mid (a, b) \in A \times B \},$$

and let

$$\tilde{F} = \{ (x, y) \in F \mid x \ge h_*,\ y \ge k^* \}.$$

The set $\tilde{F}$ is nonempty. For example, let $a^* \in \arg\max_{a \in A} [\min_{b \in B} h(a, b)]$ and $b^* \in \arg\max_{b \in B} k(a^*, b)$. Then it is easily seen that $h(a^*, b^*) \ge h_*$ and $k(a^*, b^*) \ge k^*$. Note that the point $(h_*, k^*)$ does not necessarily belong to $F$ and thus it may not belong to $\tilde{F}$. For example, for the $2 \times 2$ stage game

         L       R
    T   0, 0    1, 2
    B   2, 1    0, 0

we have $(h_*, k^*) = (0, 1)$ and $F = \mathrm{Co}\{(0, 0), (1, 2), (2, 1)\}$. So $(h_*, k^*) \notin F$, and $\tilde{F} = \mathrm{Co}\{(\tfrac{1}{2}, 1), (1, 2), (2, 1)\}$. See Figure 1.

Theorem 4.1. If $m(n) \to \infty$, $m(n) \log m(n) / n \to 0$ as $n \to \infty$, and if there is $(x, y) \in \tilde{F}$ with $x > h_*$, then $E^n(m(n)) \to \tilde{F}$ as $n \to \infty$.

² The previous proof utilized the notion of entropy.


Fig. 1. $(h_*, k^*)$ may not be in $F$

As demonstrated in the next example, the conclusion of the theorem fails if the condition on $\tilde{F}$ is not satisfied.

Example (Neyman (1999)). Consider the $2 \times 2$ stage game given below.

         L       R
    T   1, 3    0, 4
    B   1, 1    1, 0

Observe that $h_* = k^* = 1$ and $\tilde{F} = \{(1, y) \mid 1 \le y \le 3\}$. For this game $E^n(m(n)) = \{(1, 1)\}$ for every $n$ regardless of $m(n)$. To see this, first note that player 1 must receive 1 at every stage on any equilibrium path, and he can guarantee 1 with an automaton of size 1 (play B at every stage). Suppose that, in some equilibrium $(\sigma, \tau)$ of $G^n(m(n))$, player 2 received more than 1. Then $(T, L)$ must be played with positive probability at some stage on the equilibrium path. Let $\tilde{n}$ be the last stage at which $(T, L)$ is played with positive probability, i.e.,

$$\tilde{n} = \max\{ n' \mid 1 \le n' \le n,\ P_{\sigma, \tau}((a_{n'}, b_{n'}) = (T, L)) > 0 \}.$$

Define $\tilde{\tau} = (\tilde{\tau}_l) \in \Delta(T)$ as

$$\tilde{\tau}_l(\omega) = \begin{cases} \tau_l(\omega) & \text{for } 1 \le l \le \tilde{n} - 1, \\ R & \text{for } l = \tilde{n}, \\ L & \text{for } l > \tilde{n}. \end{cases}$$

Then it is easily verified that $k_n(\sigma, \tilde{\tau}) > k_n(\sigma, \tau)$, contradicting the supposition that $(\sigma, \tau)$ is an equilibrium. □

We now turn to the proof of Theorem 4.1. Given a point $z \in \mathbb{R}^2$ and a nonempty compact set $Z \subset \mathbb{R}^2$, define $d(z, Z) = \min_{z' \in Z} \|z - z'\|$. Since $\tilde{F}$ and $E^n(m(n))$ are nonempty compact subsets of $F$, which is compact, the


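conclusion of the theorem, $E^n(m(n)) \to \tilde{F}$, is equivalent to

$$\overline{\mathrm{Lim}}_{n \to \infty} E^n(m(n)) = \underline{\mathrm{Lim}}_{n \to \infty} E^n(m(n)) = \tilde{F},$$

where

$$\overline{\mathrm{Lim}}_{n \to \infty} E^n(m(n)) = \{ z \mid \forall \varepsilon > 0,\ \forall n',\ \exists n \ge n' \text{ such that } d(z, E^n(m(n))) < \varepsilon \},$$

and

$$\underline{\mathrm{Lim}}_{n \to \infty} E^n(m(n)) = \{ z \mid \forall \varepsilon > 0,\ \exists n_0 \text{ such that } \forall n \ge n_0,\ d(z, E^n(m(n))) < \varepsilon \}.$$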

We will establish the identity of the three sets through a pair of claims. Note that the first claim requires neither $m(n) \to \infty$ ($n \to \infty$) nor the condition on $\tilde{F}$ present in the statement of Theorem 4.1.

Claim 4.1. If $m(n) \log m(n) / n \to 0$ as $n \to \infty$, then $\overline{\mathrm{Lim}}_{n \to \infty} E^n(m(n)) \subset \tilde{F}$.

Proof: Obviously, $E^n(m(n))$ is included in the set of payoff vectors achieved by mixed strategies, which in turn is a subset of $F$. As the set $F$ is closed, $\overline{\mathrm{Lim}}_{n \to \infty} E^n(m(n)) \subset F$. Take $(x, y) \in \overline{\mathrm{Lim}}_{n \to \infty} E^n(m(n))$. First, the fact that player 1 can guarantee $h_*$ at every stage using a pure action $a^* \in \arg\max_{a \in A} [\min_{b \in B} h(a, b)]$ shows that $x \ge h_*$. Next, if $G$ is zero-sum so that $h = -k$, then

$$h_* = \max_{a \in A} \min_{b \in B} \left(-k(a, b)\right) = -\min_{a \in A} \max_{b \in B} k(a, b) = -k^*.$$

Applying Lemma 4.1 to the zero-sum game in which player 1's payoff is $-k$, it follows that for every $\varepsilon > 0$, there is $n_0$ such that for each $n \ge n_0$ player 2 has a pure strategy $t$ such that

$$k_n(s, t) \ge k^* - \varepsilon \quad \text{for all } s \in S(m(n)).$$

Therefore, for every $n \ge n_0$, player 2 must receive at least $k^* - \varepsilon$ in any equilibrium of $G^n(m(n))$. This implies that $y \ge k^*$. Q.E.D.

Claim 4.2. If $m(n) \to \infty$, $m(n) \log m(n) / n \to 0$ as $n \to \infty$, and if there is $(x, y) \in \tilde{F}$ with $x > h_*$, then $\tilde{F} \subset \underline{\mathrm{Lim}}_{n \to \infty} E^n(m(n))$.

Proof: First, we deal with the case in which there is $(x, y)$ in $\tilde{F}$ with $x > h_*$ and $y > k^*$. To show $\tilde{F} \subset \underline{\mathrm{Lim}}_{n \to \infty} E^n(m(n))$ it suffices to show that, for every $\delta > 0$, the set $\tilde{F}_\delta = \{ (x, y) \in \tilde{F} \mid x > h_* + \delta,\ y > k^* + \delta \}$ is contained in $\underline{\mathrm{Lim}}_{n \to \infty} E^n(m(n))$. Let $K = \max\{\|h\|, \|k\|\}$. Since we have assumed that the payoffs are nonnegative, it follows that $|r(a, b) - r(a', b')| \le K$ for all $r = h, k$ and $(a, b), (a', b') \in A \times B$.

Fix a $\delta > 0$ for which $\tilde{F}_\delta \ne \emptyset$ and take $(x, y) \in \tilde{F}_\delta$. Let $\varepsilon > 0$ be sufficiently small so that $\varepsilon < \min\{1, K/4\}$, $x > h_* + 2\varepsilon$, and $y > k^* + 2\varepsilon$. Let $(a_i, b_i) \in A \times B$, $i = 1, 2, 3$, be such that $(x, y)$ is a convex combination of $(h(a_i, b_i), k(a_i, b_i))$, $i = 1, 2, 3$. Thus there are $\alpha_i \ge 0$, $i = 1, 2, 3$, such that $\alpha_1 + \alpha_2 + \alpha_3 = 1$ and $(x, y) = \sum_{i=1}^{3} \alpha_i (h(a_i, b_i), k(a_i, b_i))$. Assume that $k(a_1, b_1) \le k(a_2, b_2) \le k(a_3, b_3)$ and, without loss of generality, $\alpha_3 > 0$. Let $d$ be a sufficiently large positive integer so that, setting $d_1 = [\alpha_1 d]$, $d_2 = [\alpha_2 d]$, $d_3 = d - (d_1 + d_2)$, and $(\bar{x}, \bar{y}) = (1/d) \sum_{i=1}^{3} d_i (h(a_i, b_i), k(a_i, b_i))$, the following inequalities hold:

$$\|(x, y) - (\bar{x}, \bar{y})\| < \frac{\varepsilon}{2}, \tag{4.1}$$

and

$$d_3 (\bar{y} - k^*) > K. \tag{4.2}$$

Note that $(\bar{x}, \bar{y})$ converges to $(x, y)$ as $d$ tends to infinity and thus (4.1) holds for a sufficiently large $d$. Also, since $\alpha_3 > 0$ implies that $d_3 \to \infty$ as $d \to \infty$, and since $y > k^*$, (4.2) holds for a sufficiently large $d$. Let $b_4 \in B$ be a best response of player 2 to the action $a_3$ of player 1.

Define a sequence of action pairs of length $d$, $\xi = (\xi_1, \ldots, \xi_d)$, by

$$\xi_j = \begin{cases} (a_1, b_1) & \text{for } j = 1, \ldots, d_1, \\ (a_2, b_2) & \text{for } j = d_1 + 1, \ldots, d_1 + d_2, \\ (a_3, b_3) & \text{for } j = d_1 + d_2 + 1, \ldots, d. \end{cases}$$

(Recall that $d_1 + d_2 + d_3 = d$.) Define a sequence of action pairs $\omega = (\omega_1, \ldots, \omega_n) \in (A \times B)^n$ as follows. Let $q = [n/d]$. In the last $d$ stages, $\omega$ coincides with $\xi$ up to one stage before the end and then finishes with $(a_3, b_4)$:

$$(\omega_{n-d+1}, \ldots, \omega_n) = (\xi_1, \ldots, \xi_{d-1}, (a_3, b_4)).$$

From stage $n - qd + 1$ up to stage $n - d$, $\xi$ is repeated $q - 1$ times:

$$(\omega_{n-qd+1}, \ldots, \omega_{n-d}) = \text{``}(\xi_1, \ldots, \xi_d) \text{ repeated } q - 1 \text{ times''}.$$

Finally, in the first $n - qd$ stages, the tail part of $\xi$ is played:

$$(\omega_1, \ldots, \omega_{n-qd}) = (\xi_{(q+1)d-n+1}, \ldots, \xi_d).$$

Notice that $(\omega_1, \ldots, \omega_{n-1})$ is $d$-periodic. Clearly, for every $p = 1, \ldots, n - 1$,

$$\sum_{l=p+1}^{n} h(\omega_l) \ge (n - p)\bar{x} - dK. \tag{4.3}$$

The assumption $k(a_1, b_1) \le k(a_2, b_2) \le k(a_3, b_3)$ and the choice of $b_4$ imply that for every $p < n$,

$$\sum_{l=p+1}^{n} k(\omega_l) \ge (n - p)\bar{y}. \tag{4.4}$$
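The three-part definition of the path can be sketched in a few lines. The code below is an illustration only: the function name, the block lengths and the stage-game payoffs are assumptions made for the example, with `blocks` listing each action pair together with its multiplicity $d_i$ in the cycle and `last_pair` playing the role of $(a_3, b_4)$.

```python
def build_path(n, blocks, last_pair):
    """Return the path (omega_1, ..., omega_n) as a list, given the cycle and the final pair."""
    xi = [pair for pair, count in blocks for _ in range(count)]   # the cycle of length d
    d = len(xi)
    if n < d:
        raise ValueError("the construction needs n >= d")
    q = n // d
    head = xi[(q + 1) * d - n:]       # tail part of the cycle, length n - q*d (may be empty)
    middle = xi * (q - 1)             # q - 1 full repetitions of the cycle
    tail = xi[:-1] + [last_pair]      # last d stages: the cycle with its final pair replaced
    return head + middle + tail

# Assumed example with d1, d2, d3 = 1, 2, 2 and player 2's payoffs k nondecreasing along the cycle.
k = {("T", "L"): 0, ("T", "R"): 2, ("B", "L"): 1, ("B", "R"): 0}
blocks = [(("T", "L"), 1), (("B", "L"), 2), (("T", "R"), 2)]
omega = build_path(13, blocks, ("T", "R"))   # here R is a best response to T, so b4 = b3
print(omega)
```

Putting the cycle's tail at the start, rather than a truncated cycle at the end, is what makes every suffix of the path begin in the high-payoff part of the cycle for player 2, which is the content of (4.4).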


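Define a pair of pure strategies $(\tilde{s}, \tilde{t})$ so that (i) they follow the path $\omega$ as long as the other player does so, (ii) if player 2 deviates from $\omega$, then $\tilde{s}$ takes a pure action $\tilde{a} \in \arg\min_{a \in A} [\max_{b \in B} k(a, b)]$ at every stage afterward, and (iii) if player 1 deviates from $\omega$ for the first time at stage $p$, then $\tilde{t}$ starts playing a pure strategy $\hat{t}$ such that for every $s \in S(m(n))$ and $l = 1, 2, \ldots$,

$$h_l(s, \hat{t}) \le h_* + \frac{\|h\| \log_2 |S(m(n))|}{l}.$$

Theorem 3.1 ensures the existence of such a strategy. The strategy $\tilde{s}$ is implementable by an automaton of size $d + 1$: $d$ states for playing the cycle phase of $\omega$ and one for the punishment.³ So, if $n$ is large enough so that $m(n) > d$, then $\tilde{s} \in S(m(n))$.

³ Although $\omega_n \ne \xi_d$, player 1's action in $\omega_n$, $a_3$, is the same as his action in $\xi_d$.

Since the play induced by $(\tilde{s}, \tilde{t})$ is $\omega$, we have

$$\left\| (h_n(\tilde{s}, \tilde{t}), k_n(\tilde{s}, \tilde{t})) - (\bar{x}, \bar{y}) \right\| \le \frac{2dK}{n}.$$

It follows from (4.1), using the triangle inequality, that $(h_n(\tilde{s}, \tilde{t}), k_n(\tilde{s}, \tilde{t}))$ is within $\varepsilon$ of $(x, y)$ for sufficiently large $n$.

Next we will show that no unilateral deviation from $(\tilde{s}, \tilde{t})$ leads to a strict improvement of the payoff. We start with player 2. Take $t \in T$. Assume that the strategy $t$ deviates from the play $\omega$ at stage $p$. If $p \le n - d_3$, then the inequalities (4.2) and (4.4) imply that

$$n\left(k_n(\tilde{s}, \tilde{t}) - k_n(\tilde{s}, t)\right) \ge -K + (n - p)(\bar{y} - k^*) \ge -K + d_3(\bar{y} - k^*) > 0,$$

while if $n - d_3 < p \le n$, recalling the choice of $b_4$,

$$\begin{aligned}
n\left(k_n(\tilde{s}, \tilde{t}) - k_n(\tilde{s}, t)\right) &\ge (n - p) k(a_3, b_3) + k(a_3, b_4) - \left(k(a_3, b_4) + (n - p) k^*\right) \\
&= (n - p)\left(k(a_3, b_3) - k^*\right) \ge 0.
\end{aligned}$$

Thus we conclude that player 2 cannot benefit from a deviation from $\omega$ at any stage.

Let us turn to player 1. Take $s \in S(m(n))$ and suppose that $(s, \tilde{t})$ resulted in player 1's deviation from the path $\omega$. The fact that $s \in S(m(n))$ implies that such a deviation must occur in the first $m(n)$ repetitions of the cycle $\omega_1, \ldots, \omega_d$, and hence in the first $m(n) d$ stages of the repeated game. Thus assume that the deviation occurred at stage $p \le m(n) d$. Let $(\omega'_1, \ldots, \omega'_p)$ be the play induced by $(s, \tilde{t})$ up to stage $p$ and set $s' = (s | \omega'_1, \ldots, \omega'_p)$. Then by the construction of $\tilde{t}$, we have, recalling that the payoffs are assumed to be nonnegative,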
Thus we conclude that player 2 cannot bene®t from a deviation from o at any stage. Let us turn to player 1. Take s A S…m…n†† and suppose that …s; t~† resulted in player 1's deviation from the path o. The fact that s A S…m…n†† implies that such deviation must occur in the ®rst m…n† repetitions of the cycle o1 ; . . . ; od , and hence in the ®rst m…n†d stages of the repeated game. Thus assume that the deviation occurred at stage p U m…n†d. Let …o10 ; . . . ; op0 † be the play induced by …s; t~† up to stage p and set s 0 ˆ …sjo10 ; . . . ; op0 †. Then by the construction of t~, we have, recalling that the payo¨s are assumed to be nonnegative, 3 Although on 0 xd , player 1's action in on , a3 , is the same as his action in xd .


Fig. 2. $\tilde{F} = \{(x, k^*) \mid h_0 \le x \le h_1\}$, $h_* \le h_0 < h_1$

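$$h_n(s, \tilde{t}) \le \frac{pK}{n} + \left(1 - \frac{p}{n}\right) h_{n-p}(s', \hat{t}) \le \frac{pK}{n} + h_* + \frac{\|h\| \log_2 |S(m(n))|}{n},$$

which is at most $h_* + \varepsilon$ if, e.g.,

$$n \ge \frac{2}{\varepsilon} \max\{ m(n) dK,\ \|h\| \log_2 |S(m(n))| \}. \tag{4.5}$$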

Our assumption on the order of magnitude of $m(n)$ guarantees that (4.5) holds for all sufficiently large $n$. Since $x > h_* + 2\varepsilon$ and $h_n(\tilde{s}, \tilde{t})$ is within $\varepsilon$ of $x$, we have $h_n(s, \tilde{t}) < h_n(\tilde{s}, \tilde{t})$. We have thus shown that $(\tilde{s}, \tilde{t})$ is an equilibrium of $G^n(m(n))$ with a payoff vector within $\varepsilon$ of our target payoff vector $(x, y)$, provided that $n$ is large enough.

Next assume $\tilde{F} = \{(x, k^*) \mid h_0 \le x \le h_1\}$ where $h_* \le h_0 \le h_1$ and $h_* < h_1$. See Figure 2 for an example. In this case there are two action pairs $(a, b)$ and $(a', b')$ such that $k(a, b) = k^*$, $h(a, b) \le h_0$ and $(h(a', b'), k(a', b')) = (h_1, k^*)$. Take $(x, k^*) \in \tilde{F}$ with $x > h_* + 2\varepsilon$ where $\varepsilon > 0$ is sufficiently small. Let $d$ be a sufficiently large positive integer and let $\xi = (\xi_1, \ldots, \xi_d) \in (A \times B)^d$ be such that (i) $\xi_j = (a, b)$ or $(a', b')$, and (ii) $\left| (1/d) \sum_{j=1}^{d} h(\xi_j) - x \right| < \varepsilon$. Let $n$ be a sufficiently large positive integer and define the path $\omega = (\omega_1, \ldots, \omega_n) \in (A \times B)^n$ by $(\omega_1, \ldots, \omega_{n-qd}) = ((a', b'), \ldots, (a', b'))$ and $(\omega_{n-qd+1}, \ldots, \omega_n) = $ "$\xi$ repeated $q$ times", where $q = [n/d]$. Let $\tilde{s}$ be player 1's pure strategy that takes the action $\omega_l^1$ (player 1's component of $\omega_l$) at stage $l$ regardless of the past history. Let $\tilde{t}$ be player 2's pure strategy that follows the path $\omega$ as long as player 1 does so and, if player 1 deviates from $\omega$ at stage $l$ for the first time, immediately reverts to $\hat{t}$ against $S(m(n))$ as in the previous case. Since player 2 receives $k^*$ at every stage when $(\tilde{s}, \tilde{t})$ is played and it is the highest payoff for her in the stage game, she has no incentive to deviate from $\omega$ at any stage. An argument similar to the first case shows that player 1 cannot benefit from a deviation from $\omega$, provided that $n$ is sufficiently large.


Thus $(\tilde{s}, \tilde{t})$ is an equilibrium of $G^n(m(n))$ with payoff within $\varepsilon$ of $(x, k^*)$ for sufficiently large $n$. This completes the proof. Q.E.D.

Claims 4.1, 4.2 and the fact $\underline{\mathrm{Lim}}_{n \to \infty} E^n(m(n)) \subset \overline{\mathrm{Lim}}_{n \to \infty} E^n(m(n))$ establish Theorem 4.1.

5. The $\lambda$-discounted game $G_\lambda(m(\lambda))$

In this section we study a modified version of the $\lambda$-discounted game, $G_\lambda(m(\lambda)) = (S(m(\lambda)), T, h_\lambda, k_\lambda)$. We consider the bound on the complexity of player 1's strategies to be a function of the discount factor $\lambda$ such that $m(\lambda) \to \infty$ as $\lambda \to 1$. The next lemma is an analog of Lemma 4.1.

Lemma 5.1. Suppose that $(1 - \lambda) m(\lambda) \log m(\lambda) \to 0$ as $\lambda \to 1$. Then for every $\varepsilon > 0$, there is $\lambda_0$ such that for each $\lambda \in [\lambda_0, 1)$ there is $t \in T$ such that

$$h_\lambda(s, t) \le h_* + \varepsilon \quad \text{for all } s \in S(m(\lambda)).$$

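Corollary 5.1. Let $G$ be a two-person zero-sum game and $V_\lambda(m(\lambda))$ be the value of $G_\lambda(m(\lambda))$. If $(1 - \lambda) m(\lambda) \log m(\lambda) \to 0$ as $\lambda \to 1$, then $V_\lambda(m(\lambda)) \to h_*$ as $\lambda \to 1$.

Let us denote by $E_\lambda(m(\lambda))$ the set of (Nash) equilibrium payoff vectors of $G_\lambda(m(\lambda))$. The next theorem is an analogue of Theorem 4.1.

Theorem 5.1. If $(1 - \lambda) m(\lambda) \log m(\lambda) \to 0$ as $\lambda \to 1$ and if there is $(x, y) \in \tilde{F}$ with $x > h_*$ or $y > k^*$, then $E_\lambda(m(\lambda)) \to \tilde{F}$ as $\lambda \to 1$.

Proof: Define the sets $\overline{\mathrm{Lim}}_{\lambda \to 1} E_\lambda(m(\lambda))$ and $\underline{\mathrm{Lim}}_{\lambda \to 1} E_\lambda(m(\lambda))$ similarly to $\overline{\mathrm{Lim}}_{n \to \infty} E^n(m(n))$ and $\underline{\mathrm{Lim}}_{n \to \infty} E^n(m(n))$. An argument similar to the proof of Claim 4.1 together with Lemma 5.1 shows that $\overline{\mathrm{Lim}}_{\lambda \to 1} E_\lambda(m(\lambda)) \subset \tilde{F}$. Below we will show that $\tilde{F} \subset \underline{\mathrm{Lim}}_{\lambda \to 1} E_\lambda(m(\lambda))$.

First, assume that there is a point $(x, y)$ in $\tilde{F}$ such that $x > h_*$ and $y > k^*$. As in the proof of Claim 4.2, fix $\delta > 0$ with $\tilde{F}_\delta \ne \emptyset$ and take $(x, y) \in \tilde{F}_\delta$. Let $\varepsilon > 0$ be sufficiently small so that $x > h_* + 4\varepsilon$ and $y > k^* + 4\varepsilon$. Let $d$ be a sufficiently large positive integer and let $\xi = (\xi_1, \ldots, \xi_d) \in (A \times B)^d$ be a finite sequence of action pairs such that

$$\left\| \frac{1}{d} \sum_{j=1}^{d} (h(\xi_j), k(\xi_j)) - (x, y) \right\| < \frac{\varepsilon}{2}. \tag{5.1}$$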
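The target in (5.1) is a cycle average, whereas payoffs in $G_\lambda$ are discounted sums, so the argument needs the two to be close when $\lambda$ is near 1, whatever stage the sum starts from. The sketch below checks this numerically for one player's payoffs along a periodic play. The function names and the cycle of payoffs are assumptions made for the illustration, and the closed form uses only the geometric series over whole periods.

```python
def discounted_tail(cycle, lam, p):
    """(1 - lam) * sum over l >= p of lam^(l - p) * c_l, where c_l = cycle[(l - 1) % d]."""
    d = len(cycle)
    # Discounted payoff of one full period starting at stage p ...
    one_period = sum(lam ** j * cycle[(p - 1 + j) % d] for j in range(d))
    # ... repeated forever: the periods contribute a geometric factor 1 / (1 - lam^d).
    return (1 - lam) * one_period / (1 - lam ** d)

def worst_gap(cycle, lam):
    """Largest distance, over starting stages p = 1, ..., d, from the cycle average."""
    average = sum(cycle) / len(cycle)
    return max(abs(discounted_tail(cycle, lam, p) - average)
               for p in range(1, len(cycle) + 1))

cycle = [0, 2, 2, 1, 1]          # assumed stage payoffs along one cycle, average 1.2
for lam in (0.5, 0.9, 0.99, 0.9999):
    print(lam, worst_gap(cycle, lam))
```

Because the play is periodic, only the $d$ starting phases matter, which is why the convergence is uniform in the starting stage.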

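Define a play $\omega = (\omega_1, \omega_2, \ldots)$ by $\omega_\ell = \xi_j$ if $\ell \equiv j \pmod d$. That is, $\omega$ consists of repetitions of the finite cycle $\xi$. For each positive integer $p$ let

$$\bigl(x_p(\lambda), y_p(\lambda)\bigr) = (1-\lambda) \sum_{\ell=p}^{\infty} \lambda^{\ell-p} \bigl(h(\omega_\ell), k(\omega_\ell)\bigr),$$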


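and set $(x(\lambda), y(\lambda)) = (x_1(\lambda), y_1(\lambda))$. As $(\omega_\ell)_{\ell=1}^{\infty}$ is $d$-periodic, $(x_p(\lambda), y_p(\lambda))$ converges to $(1/d) \sum_{j=1}^{d} (h(\xi_j), k(\xi_j))$ as $\lambda \to 1$ for each $p$. This convergence is uniform in $p$, since $(x_p(\lambda), y_p(\lambda))$ depends on $p$ only through its residue modulo $d$. So we can take $\lambda$ sufficiently close to 1 so that for every $p = 1, 2, \ldots$,

$$\left\| \bigl(x_p(\lambda), y_p(\lambda)\bigr) - \frac{1}{d} \sum_{j=1}^{d} \bigl(h(\xi_j), k(\xi_j)\bigr) \right\| < \frac{\varepsilon}{2}. \tag{5.2}$$

Now we describe the equilibrium strategies $\tilde\sigma \in \Sigma$ and $\tilde\tau \in T$. Player 1's strategy $\tilde\sigma$ follows the play $\omega$ as long as player 2 does so. If player 2 deviates from $\omega$ at some stage, then from the next stage on $\tilde\sigma$ takes a pure action $\tilde a \in \arg\min_{a \in A} \max_{b \in B} k(a, b)$. Player 2's strategy $\tilde\tau$ also follows $\omega$ as long as player 1 does so. If player 1 deviates from $\omega$ at some stage, then at the next stage $\tilde\tau$ starts playing the pure strategy $\hat\tau$ constructed in the proof of Theorem 3.1 against player 1's strategy set $\Sigma(m(\lambda))$.

The strategy $\tilde\sigma$ is implementable by an automaton of size at most $d + 1$. So for $\lambda$ sufficiently close to 1 so that $m(\lambda) > d$, we have $\tilde\sigma \in \Sigma(m(\lambda))$. Note that $(h_\lambda(\tilde\sigma, \tilde\tau), k_\lambda(\tilde\sigma, \tilde\tau)) = (x(\lambda), y(\lambda))$. Thus, by (5.1) and (5.2), the strategy pair $(\tilde\sigma, \tilde\tau)$ yields a payoff vector within $\varepsilon$ of $(x, y)$.

Take $\sigma \in \Sigma(m(\lambda))$ and let $(\omega'_\ell)_{\ell=1}^{\infty}$ be the play induced by $(\sigma, \tilde\tau)$. Then, either $\omega_\ell = \omega'_\ell$ for all $\ell$ or there is the smallest $p$ such that $\omega_p \ne \omega'_p$. (Note that both $(\omega_\ell)$ and $(\omega'_\ell)$ are deterministic plays.) In the latter case, Lemma 5.1 implies that

$$h_\lambda(\sigma, \tilde\tau) < (1-\lambda) \left( \sum_{\ell=1}^{p-1} \lambda^{\ell-1} h(\omega_\ell) + \lambda^{p-1} h(\omega'_p) + \frac{\lambda^{p}}{1-\lambda} (h^* + \varepsilon) \right).$$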

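It follows from (5.1), (5.2), and the assumption $x > h^* + 4\varepsilon$ that, for $\lambda$ sufficiently close to 1, $(h^* + \varepsilon) - x_{p+1}(\lambda) < -2\varepsilon$. Since $\lambda/(1-\lambda) \to \infty$ as $\lambda \to 1$, we have $K + (\lambda/(1-\lambda))\bigl((h^* + \varepsilon) - x_{p+1}(\lambda)\bigr) < -\varepsilon$ for $\lambda$ sufficiently close to 1. Writing $x(\lambda) = (1-\lambda) \sum_{\ell=1}^{p} \lambda^{\ell-1} h(\omega_\ell) + \lambda^{p} x_{p+1}(\lambda)$, we hence obtain

$$h_\lambda(\sigma, \tilde\tau) - h_\lambda(\tilde\sigma, \tilde\tau) = h_\lambda(\sigma, \tilde\tau) - x(\lambda) \le (1-\lambda) \lambda^{p-1} \left( K + \frac{\lambda}{1-\lambda} \bigl((h^* + \varepsilon) - x_{p+1}(\lambda)\bigr) \right) \le (1-\lambda) \lambda^{p-1} (-\varepsilon) < 0.$$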
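The two ingredients used so far, the uniform convergence behind (5.2) and the claim that $\tilde\sigma$ needs at most $d + 1$ states, can be checked on small examples. The sketch below is an illustration under stated assumptions, not the authors' construction: the stage payoffs, the cycle, the punishing action, and all function names are hypothetical. It computes $x_p(\lambda)$ for a $d$-periodic play in closed form and builds a trigger automaton whose states $0, \ldots, d-1$ track the cycle and whose state $d$ punishes forever.

```python
from typing import Callable, List, Sequence, Tuple

ActionPair = Tuple[str, str]  # (player 1's action a, player 2's action b)


def tail_payoff(cycle_payoffs: Sequence[float], lam: float, p: int) -> float:
    """x_p(lam) = (1 - lam) * sum_{l >= p} lam**(l - p) * h(omega_l) for a d-periodic play.

    The infinite sum is geometric in lam**d, so it equals the discounted sum
    over one cycle starting at stage p, divided by 1 - lam**d.
    """
    d = len(cycle_payoffs)
    # Stage l (1-based) uses cycle entry (l - 1) % d, matching omega_l = xi_j for l = j (mod d).
    one_cycle = sum(lam ** i * cycle_payoffs[(p - 1 + i) % d] for i in range(d))
    return (1.0 - lam) * one_cycle / (1.0 - lam ** d)


def max_gap_to_cycle_average(cycle_payoffs: Sequence[float], lam: float) -> float:
    """Max over p of |x_p(lam) - cycle average|; only p = 1..d matter by periodicity."""
    d = len(cycle_payoffs)
    average = sum(cycle_payoffs) / d
    return max(abs(tail_payoff(cycle_payoffs, lam, p) - average) for p in range(1, d + 1))


def make_trigger_automaton(
    cycle: Sequence[ActionPair], punish_action: str
) -> Tuple[int, Callable[[int], str], Callable[[int, str], int]]:
    """Automaton with d + 1 states: states 0..d-1 follow the cycle, state d punishes forever."""
    d = len(cycle)

    def output(state: int) -> str:
        return cycle[state][0] if state < d else punish_action

    def transition(state: int, opponent_action: str) -> int:
        if state < d and opponent_action == cycle[state][1]:
            return (state + 1) % d  # opponent conformed: advance along the cycle
        return d  # deviation observed (or already punishing): absorbing state

    return d + 1, output, transition


def run_automaton(
    cycle: Sequence[ActionPair], punish_action: str, opponent_actions: Sequence[str]
) -> List[str]:
    """Player 1's actions when the automaton faces a given sequence of player 2's actions."""
    _, output, transition = make_trigger_automaton(cycle, punish_action)
    state, actions = 0, []
    for b in opponent_actions:
        actions.append(output(state))
        state = transition(state, b)
    return actions


if __name__ == "__main__":
    payoffs = [1.0, 0.0, 0.0]  # hypothetical stage payoffs h(xi_1), h(xi_2), h(xi_3)
    for lam in (0.9, 0.99, 0.999):
        print(lam, max_gap_to_cycle_average(payoffs, lam))
    cycle = [("T", "L"), ("B", "R")]  # hypothetical cycle xi with d = 2
    print(run_automaton(cycle, "T", ["L", "L", "R", "R"]))
```

The punishing state is absorbing, so a single extra state suffices no matter when the deviation occurs; this is why the state count is $d + 1$ rather than growing with the stage of the deviation.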

If $\lambda$ is sufficiently close to 1 so that $(1-\lambda)\, d K \le \lambda^{d} \varepsilon$, then player 2 would have no incentive to deviate from $\omega$ at any stage. Indeed, if player 2 deviated from $\omega$ for the first time in the $p$-th cycle, then the gain within the $p$-th cycle from the deviation is at most $(1-\lambda) \lambda^{(p-1)d}\, d K$ while the loss she incurs from the punishment is at least $\lambda^{pd} \varepsilon$. Thus we have shown that, for all $\lambda$ sufficiently close to 1, $(\tilde\sigma, \tilde\tau)$ is an equilibrium of $G_\lambda(m(\lambda))$ that yields a payoff vector within $\varepsilon$ of $(x, y)$.

Now suppose that $\tilde F = \{(h^*, y) \mid k_0 \le y \le k_1\}$ where $k^* \le k_0 \le k_1$ and $k^* < k_1$. Then there are action pairs $(a, b)$ and $(a', b')$ such that $h(a, b) = h^*$,


$k(a, b) \le k_0$ and $(h(a', b'), k(a', b')) = (h^*, k_1)$. For a given payoff vector $(x, y)$ in $\tilde F$ with $y > k^*$, define $\xi = (\xi_1, \ldots, \xi_d)$ as in (5.1) except that $\xi_j = (a, b)$ or $(a', b')$. Let $\omega$ be the cyclic play with the cycle $\xi$. Define a strategy pair $(\tilde\sigma, \tilde\tau)$ as follows: $\tilde\sigma$ is the same as above, and $\tilde\tau$ follows $\omega$ regardless of the history. Note that player 1 receives $h^*$ at every stage on the play $\omega$ and the assumption on $\tilde F$ implies that this is the highest payoff for him in the stage game. Thus player 1 cannot benefit by deviating from $\omega$. The same argument as above shows that player 2 has no incentive to deviate from $\omega$ provided that $\lambda$ is sufficiently close to 1.

The proof for the case $\tilde F = \{(x, k^*) \mid h_0 \le x \le h_1\}$ with $h^* \le h_0 \le h_1$ and $h^* < h_1$ is similar and we omit it. Q.E.D.

For the following $2 \times 2$ game

$$\begin{array}{c|cc} & L & R \\ \hline T & 0, 1 & 1, 0 \\ B & 1, 0 & 0, 1 \end{array}$$

$(h^*, k^*) = (0, 1)$ and hence $\tilde F = \{(0, 1)\}$. It is clear, however, that player 1 must receive a strictly positive payoff in any equilibrium of the $\lambda$-discounted game. Thus one cannot dispense with the condition on $\tilde F$ in the statement of Theorem 5.1.

6. Concluding remarks

In the proof of Claim 4.2, we argued that player 1's deviation from the equilibrium path, if any, must occur in a very early stage of the repeated game and so there are enough stages after the deviation so that player 2's punishment is effective. For this we only needed the condition on the order of magnitude of the complexity bound, $m(n) = o(n)$, which is weaker than our original condition $m(n) \log m(n) = o(n)$. The latter condition is needed to determine the individually rational levels, or rather, we know the individually rational levels only under this condition on $m(n)$. Suppose that we obtained a result that, for a particular sequence $(m(n))_{n=1}^{\infty}$ with $m(n) = o(n)$,

$$\lim_{n\to\infty} V^n(m(n)) = \mathrm{Val}(G) = \min_{b \in \Delta(B)} \max_{a \in A} h(a, b)$$

for every two-person zero-sum game $G = (A, B, h)$. (See Neyman (1997), Conjectures 1 and 2.) Then essentially the same proof as in Theorem 4.1 shows that, for such a sequence $(m(n))_{n=1}^{\infty}$,

$$E^n(m(n)) \to \left\{ (x, y) \in F \;\middle|\; x \ge \min_{b \in \Delta(B)} \max_{a \in A} h(a, b),\ y \ge \min_{a \in \Delta(A)} \max_{b \in B} k(a, b) \right\}$$

as $n \to \infty$, provided that there is a feasible payoff vector in which player 1 receives strictly more than $\min_{b \in \Delta(B)} \max_{a \in A} h(a, b)$. A similar argument holds for the discounted games.


References

Ben-Porath E (1993) Repeated games with finite automata. Journal of Economic Theory 59:17–32
Kalai E, Stanford W (1988) Finite rationality and interpersonal complexity in repeated games. Econometrica 56:397–410
Lehrer E (1988) Repeated games with stationary bounded recall strategies. Journal of Economic Theory 46:130–144
Lehrer E (1994) Finitely many players with bounded recall in infinitely repeated games. Games and Economic Behavior 7:390–405
Neyman A (1985) Bounded complexity justifies cooperation in the finitely repeated prisoner's dilemma. Economics Letters 19:227–229
Neyman A (1997) Cooperation, repetition, and automata. In Cooperation: Game-Theoretic Approaches, ed. by S. Hart and A. Mas-Colell, vol. 155 of NATO ASI Series F, pp. 233–255. Springer Verlag
Neyman A (1999) Finitely repeated games with finite automata. Mathematics of Operations Research 23:513–552
Neyman A, Okada D (1999) Strategic entropy and complexity in repeated games. Games and Economic Behavior 29:191–223
Neyman A, Okada D (2000) Repeated games with bounded entropy. Games and Economic Behavior 30:228–247
Papadimitriou CH, Yannakakis M (1994) On complexity as bounded rationality: Extended abstract. In STOC 94, pp. 726–733, Montreal, Quebec, Canada
Zemel E (1989) Small talk and cooperation: A note on bounded rationality. Journal of Economic Theory 49:1–9