Games Computers Play: Simulating Characteristic Function Game Playing Agents with Classifier Systems

Garett Dworman
Operations and Information Management Systems
The Wharton School, University of Pennsylvania
Philadelphia, PA 19104
[email protected]

ABSTRACT *

Many game theorists are turning to evolutionary simulations to model the behavior of boundedly rational agents. This new methodology allows researchers to observe purely adaptive behaviors in games, to observe differences in behavior due to changes in the games' parameters, to discover equilibria in games that are too complex to calculate analytically, and to discover new strategies for playing the games. In this paper, I extend this methodology to a more complex class of games than had previously been attempted. I create a coevolutionary environment in which three agents, represented by classifier systems, play a characteristic function game. Although the agents have no computational capabilities, they learn reasonably intelligent behavior.

INTRODUCTION

Cooperative game theory attempts to model how groups should and do act in cooperative situations. This consists of determining what equilibria exist in a cooperative game, and determining which equilibrium the agents in the game will arrive at. Traditional game theory assumes that agents are perfectly rational; agents have access to all the information they need plus the time and ability to process that information. Unfortunately, people are not that rational. Therefore, the following question arises: can agents of limited rationality arrive at normative behavior? Current research shows that in simple games the answer is yes. This paper shows that in a more complex game the answer is still yes.

This paper extends a new methodology currently being used in empirical game theory. This methodology simulates boundedly rational, adaptive, game-playing agents using evolutionary models. With the methodology we can isolate the effects of specific changes in the game (Marimon, McGrattan and Sargent, 1990; Miller, 1989) or in the adaptive reasoning process (Miller, 1989; Ho, 1991). The methodology also demonstrates strategies that arise from adaptive behavior. It has even discovered good strategies that are too complex for people to find (Axelrod, 1987; Miller, 1989; Matwin, Szapiro, and Haigh, 1991). Finally, the methodology provides a technique for equilibrium analysis. This includes confirming proposed equilibria, choosing among multiple equilibria, and even discovering equilibria in analytically intractable games (Marimon, McGrattan and Sargent, 1990).

* This paper is a condensed version of a longer working paper. The full version is available from the author on request.

Researchers have successfully applied the methodology to several games, such as the Repeated Prisoner's Dilemma, that involve a small number of discrete strategies. However, more complex games have not been attempted, so we do not know whether the methodology's results carry over to games that require more than a few discrete strategies. This paper demonstrates the application of the methodology to characteristic function games. The results show that even in this more complex game, adaptive agents of limited rationality can attain normative behavior.

THE METHODOLOGY

Researchers do not use the methodology to test a specific cognitive model, but rather assume a cognitive model as a basis for analyzing behavior in games. This cognitive model provides an alternative to expected utility maximization, the traditional model for analyzing games. The assumed cognitive model is built on three concepts: 1) bounded rationality, 2) rule-based reasoning, and 3) adaptive behavior. All three concepts have had substantial support as plausible cognitive theories.

Traditional models of game theory assume rational agents who have access to all the information they need plus the time and ability to process that information. While this assumption may be useful for normative models of games, it is inappropriate for descriptive models; people have limited information, limited time to process information, and limited ability to process information. Many scholars feel that future development of economics and game theory requires modeling bounded rationality (Simon, 1982; Aumann, 1989; Kreps, 1990; Selten, 1991). Simon (1978) nicely summarizes the need for bounded rationality:

Complexity is deep in the nature of things, and discovering tolerable approximation procedures and heuristics that permit huge spaces to be searched very selectively lies at the heart of intelligence, whether human or artificial. A theory of rationality that does not give an account of problem solving in the face of complexity is sadly incomplete. It is worse than incomplete; it can be seriously misleading by providing "solutions" to economic questions that are without operational significance (p. 12).

Simon calls on scholars to study process rationality – the effectiveness of decision procedures in producing good choices using limited cognitive abilities – as well as substantive rationality – the appropriateness of the choices taken. One paradigm of process rationality is that people use simple heuristics or rules of thumb to come to a decision. The use of rules as a model of the human decision process has a long history. Rules, in the form of a production system, were basic to the General Problem Solver of Newell and Simon, and expert systems are based on rules.

John Anderson (1983) used rules as a psychological model of human memory, and Holland, Holyoak, Nisbett and Thagard (1986) consider rules to be the "basic epistemic building block" of their theory of induction.

Traditional game theory, based upon substantive rationality, analyzes each agent's possible choices to determine equilibria. In contrast, when process rationality is assumed, people's actual choices are products of their decision processes. An equilibrium occurs when previous moves no longer lead to changes in each agent's decision process. Therefore, equilibria arise from an evolutionary process. Many researchers have used evolutionary equilibria as a method for modeling bounded rationality (see Aumann, 1989; Selten, 1983 and 1991; Friedman, 1991). For an agent whose decision process is modeled as a rule-based system, evolutionary behavior is the result of adaptation of the agent's rules.

Research Using the Methodology

Axelrod (1987) uses genetic algorithms to evolve strategies for the repeated prisoner's dilemma game. The genetic algorithm evolved strategies against eight human-derived strategies. It generated a strategy that did as well as the best human strategy, Tit-For-Tat, in a round-robin tournament, and even beat Tit-For-Tat in a grudge match.

Miller (1989) generalizes Axelrod's work by coevolving automaton-represented strategies with a genetic algorithm. He uses this platform to study how communication noise between agents affects outcomes. The results show that communication noise decreases the amount of cooperation between agents. Ho (1991) uses Miller's method to study the effects on outcomes when costs of information processing are introduced. He finds that information processing costs can greatly affect the level of cooperation. He also finds that the strategies that develop in costly environments differ from those that develop in costless environments. Holland and Miller (1991) describe the methodology of using classifier systems to study adaptive economic agents.

Matwin, Szapiro, and Haigh (1991) use a simple classifier representation with a genetic algorithm to create a negotiation support system (NSS). The NSS simulates the negotiations that a negotiator faces and suggests possible negotiation strategies. The NSS is able to discover negotiation strategies that are better than human subjects' strategies because it can consider complex strategies that the human subjects cannot.

Marimon, McGrattan and Sargent (1990) use a classifier representation with a genetic algorithm to study outcomes in an economic environment in which agents exchange and consume various goods. They found that in a coevolutionary environment of three different agent types, the agents learned a coordinated behavior that conforms to Nash-Markov equilibria. The authors also use the model to determine in which environments each of two different possible equilibria occurs. Finally, Marimon et al. use the model to suggest a possible equilibrium in a game that is analytically complex. The equilibrium discovered by the automated agents seems plausible to the authors; it does not violate any properties the authors intuitively feel an equilibrium in this complex game should have.

CHARACTERISTIC FUNCTION GAMES

This paper reports an attempt to apply the simulation methodology to characteristic function games. Characteristic function games model a bargaining situation in which agents must form coalitions. Different coalitions may be worth different amounts, and the agents must negotiate over which coalition will form and how the value of that coalition will be split. The game played in this research consists of the following coalitions:

Players 1 and 2 – 18 points
Players 1 and 3 – 24 points
Players 2 and 3 – 30 points

One possible outcome of this game consists of Players 1 and 2 forming a coalition in which Player 1 receives 13 points and Player 2 receives 5 points. In this case Player 3 would receive 0 points.

Many theories exist to model equilibria in cooperative games (McKelvey and Ordeshook, 1987; Rubinstein, 1982). Albers (1975) provides an equilibrium theory for characteristic function games. In addition to the problems with the rationality assumption discussed previously, characteristic function games are difficult to analyze because the rules by which the games are played can change the equilibrium outcome. For this reason, there has been a trend in recent game theory to study cooperative games, including characteristic function games, empirically. McKelvey and Ordeshook (1987) discuss laboratory work on cooperative games represented spatially. Albers and Laing (1991) set up experiments in which students played spatially represented games over a computer network. The simulation methodology is a natural extension of this empirical work. Using the simulation methodology, evolved behavior may indicate which solution theory has better descriptive power.

Characteristic function games are more complex than the games modeled in the literature reviewed above. In each of those games, the number of states an agent needed to react to was very small. In the prisoner's dilemma the state space is defect or cooperate. The environment studied by Marimon et al. has more choices, but not many: an agent must decide whether to trade its current good, and then decide whether to consume the good it ends up with. In contrast, in a characteristic function game an agent must first decide to reject or accept another agent's proposal and then, if rejecting, must construct a proposal itself. When deciding to accept or reject, the agent considers who proposed and how much was offered. When considering how much was offered, the agent looks at ranges of possible values (e.g., a rule may be "Accept, if agent 1 offered me at least 15."). Constructing a proposal involves selecting a coalition and a distribution of the proposed coalition's value (e.g., "Offer to player 3 that I get 15 and it gets 9").
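As a concrete illustration of this game, the short Python sketch below encodes the three coalition values and checks the example outcome just described. It is a minimal sketch for exposition only; the structure and names are mine, not the paper's implementation.

# Minimal sketch (illustrative only): the three-player characteristic function
# game described above, with a feasibility check for a proposed outcome.
COALITION_VALUES = {
    frozenset({1, 2}): 18,
    frozenset({1, 3}): 24,
    frozenset({2, 3}): 30,
}

def is_feasible(coalition, payoffs):
    """True if the coalition's payoffs sum to its value and every excluded
    player receives nothing."""
    members = frozenset(coalition)
    if members not in COALITION_VALUES:
        return False
    excluded_ok = all(payoffs[p] == 0 for p in (1, 2, 3) if p not in members)
    return sum(payoffs[p] for p in members) == COALITION_VALUES[members] and excluded_ok

# The example outcome from the text: players 1 and 2 split 18 as 13/5, player 3 gets 0.
print(is_feasible({1, 2}, {1: 13, 2: 5, 3: 0}))  # True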

IMPLEMENTATION

Representation of the Agents

The simulation uses three separate agents. Each agent is represented by two classifier systems: one to initiate offers, and a second to respond to offers made by other players. The game is played as follows. When the game begins, an agent is randomly chosen to make the first offer. This agent's initiator classifier system constructs a proposal to another agent. The second agent's responder classifier system then accepts or rejects the offer. If the second agent accepts the offer, the game ends. Otherwise, the second agent's initiator classifier system is invoked to construct another proposal. The game continues until one agent accepts another agent's proposal or until a prespecified limit is reached.
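The negotiation protocol just described can be summarized in a short loop. This is a sketch under my own simplifying assumptions: the propose and respond callables stand in for the initiator and responder classifier systems, and MAX_ROUNDS is a placeholder for the prespecified limit, whose value the condensed paper does not report.

import random

MAX_ROUNDS = 100  # placeholder for the prespecified limit; not the paper's actual setting

def play_game(agents, propose, respond):
    """Sketch of the protocol: a randomly chosen agent proposes; the addressed
    agent accepts (ending the game) or rejects and takes over as proposer."""
    proposer = random.choice(agents)
    for _ in range(MAX_ROUNDS):
        responder, offer = propose(proposer)          # initiator classifier system
        if respond(responder, proposer, offer):       # responder classifier system
            return proposer, responder, offer         # a coalition has formed
        proposer = responder                          # the rejecting agent proposes next
    return None                                       # no agreement within the limit

In the simulation itself these two calls are handled by each agent's initiator and responder classifier systems, whose representation is described next.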

The conditions for both types of classifier system are identical; Figure 1 shows the representation of the classifiers' conditions. The condition allows a classifier to specify the agent who initiated a proposal, the amount the initiator gets from the proposal, the responder to the proposal, and the amount that responder gets from the proposal. Both amounts are specified in two fields, which allows the classifiers to condition on ranges of amounts.

Figure 1: Representation of the classifiers' condition.
[ Initiator | Lower Bound | Upper Bound | Responder | Lower Bound | Upper Bound ]
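To make the six-field condition concrete, the sketch below shows how such a condition might be matched against a pending proposal. The field names, the "#" don't-care symbol, and the matching routine are my paraphrase of the description above, not code from the paper.

DONT_CARE = "#"

def condition_matches(cond, initiator, initiator_amount, responder, responder_amount):
    """cond = (initiator, init_lower, init_upper, responder, resp_lower, resp_upper).
    A '#' in a player field matches any player; a '#' in a bound field leaves
    that side of the amount range unconstrained."""
    c_init, init_lo, init_hi, c_resp, resp_lo, resp_hi = cond

    def player_ok(field, player):
        return field == DONT_CARE or field == player

    def amount_ok(lo, hi, amount):
        return (lo == DONT_CARE or amount >= lo) and (hi == DONT_CARE or amount <= hi)

    return (player_ok(c_init, initiator) and player_ok(c_resp, responder)
            and amount_ok(init_lo, init_hi, initiator_amount)
            and amount_ok(resp_lo, resp_hi, responder_amount))

# Hypothetical condition: "player 1 proposed to me (player 2), offering me at least 10."
cond = (1, DONT_CARE, DONT_CARE, 2, 10, DONT_CARE)
print(condition_matches(cond, initiator=1, initiator_amount=8, responder=2, responder_amount=10))  # True
print(condition_matches(cond, initiator=1, initiator_amount=9, responder=2, responder_amount=9))   # False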

The two classifier systems specify very different actions and, therefore, have very different action representations. The responder classifiers have no explicit action. Instead, if a responder classifier matches the environment, it is implicitly stating that the agent should accept the current offer; if the classifier does not match the environment, it is implicitly stating that the agent should reject the proposed offer. Initiator classifiers are somewhat more complicated, because they must generate a proposal. This involves choosing a player to make the offer to and choosing an amount to offer that player. Figure 2 outlines the structure of an initiator classifier's action. The first field names the responder. The following three fields together determine the amount of the proposal: first, the reference point and offset fields are added together to specify an amount; then the perspective field determines whether this amount is being offered to the responder or is a value that the agent is requesting for itself.

Figure 2: Representation of the action of an initiator classifier.
[ Responder | Reference Point | Offset | Perspective ]

This three-field representation of a proposal amount is more cognitive than a one-number representation. With a one-number representation a rule may only specify "offer y to agent x." The three-field representation can also specify such rules as "request 10 points less than the coalition value from agent x," or "offer agent x 5 points less than the even split." Figures 3 and 4 give examples of initiator and responder classifiers. Each field may have an explicit value or a don't-care. An explicit value is an integer that matches against the environment: in an initiator or responder field this may be [1-3], indicating a specific player; in an amount field this may be [0-30], specifying either a lower or upper bound on the amount of the proposal.

Figure 3: Some possible initiator classifiers.
No matter what I was offered, offer 10 to player 2:  condition # # # # # #  action 2 10 0 0
If player 1 offered me less than 8, then request 8 from player 2:  condition 1 # # # # 8  action 2 8 0 1

Figure 4: Some possible responder classifiers.
Accept anything:  # # # # # #
Accept only if player 3 offers me between 5 and 25:  3 # # # 5 25
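Returning to the initiator side, the following sketch decodes an action of the form shown in Figure 2 into a concrete split of a coalition's value. The perspective convention (0 = the amount is offered to the responder, 1 = the amount is requested for oneself) and the treatment of the reference point as a plain number follow my reading of the examples in Figure 3; the paper also allows the reference point to stand for quantities such as the coalition value or the even split, which this sketch omits.

COALITION_VALUES = {frozenset({1, 2}): 18, frozenset({1, 3}): 24, frozenset({2, 3}): 30}

def decode_initiator_action(me, action):
    """action = (responder, reference_point, offset, perspective).
    reference_point + offset gives an amount; perspective says whether that amount
    is what the responder is offered (0) or what I am requesting for myself (1).
    The 0/1 coding is an assumption for illustration."""
    responder, reference, offset, perspective = action
    amount = reference + offset
    value = COALITION_VALUES[frozenset({me, responder})]
    if perspective == 1:                       # I request `amount`; the responder gets the rest
        my_share, their_share = amount, value - amount
    else:                                      # I offer `amount` to the responder
        my_share, their_share = value - amount, amount
    return responder, my_share, their_share

# First Figure 3 example, fired by agent 1: "offer 10 to player 2" in the 18-point coalition.
print(decode_initiator_action(1, (2, 10, 0, 0)))  # (2, 8, 10)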

Reward Structure [1]

Unfortunately, the traditional Bucket Brigade algorithm used by classifier systems does not apply very well to the characteristic function game. Consider the following scenario: Player 1 makes a ridiculous offer to Player 2. Player 2 rejects this offer and tries to make a deal with Player 3. After haggling with Player 3 for 100 rounds, Player 2 finally tries to make a deal with Player 1 again. Should Player 1's current responder classifier pay the initiator classifier that made the ridiculous offer 100 rounds ago? This does not seem correct. In this research I modified the Bucket Brigade algorithm to supplement the weak causal connections between a classifier and the consequences of its actions. The connections were supplemented by making the classifiers responsible for the risks of their actions. I did this by requiring bids to be some percentage of the amount offered to the agent (for responders) or proposed by the agent (for initiators). That is to say, an initiator that asks for 18 should bid a proportion of that 18, while an initiator that asks for 10 should bid a proportion of that 10. Likewise, a responder given a choice of accepting or rejecting 12 should bid some proportion of that 12. With this mechanism, the classifiers that take greater risks are favored in the auction but incur a greater cost. The bidding equations were:

Bid = (NormalizedStrength ^ StrengthWeighting) × (Specificity ^ SpecificityWeighting) × Offer

EffectiveBid = Bid + noise

In these equations, NormalizedStrength is the classifier's strength on a scale of 0 (weakest classifier) to 1 (strongest classifier), and StrengthWeighting determines the power to which the normalized strength is raised. Specificity is determined in the normal way; however, a minimum specificity of 0.1 is used so that very general rules have a chance of firing. SpecificityWeighting determines the power to which the specificity is raised. Offer is the amount proposed by the agent for an initiator, and the amount offered to the agent for a responder.

[1] In this section I describe work performed with Eric Schoenberg (Dworman and Schoenberg, 1993). The modified Bucket Brigade algorithm described above is actually the second of two reward structures that I have used. The first reward structure was purely a risk-based system with no semblance to a Bucket Brigade. A longer version of this paper, available from the author, describes and compares both reward structures.
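Restated as code, the bidding rule looks roughly as follows. The weighting constants and the form of the noise term are placeholders of my own; the condensed paper does not report its parameter settings.

import random

STRENGTH_WEIGHT = 1.0     # hypothetical value for StrengthWeighting
SPECIFICITY_WEIGHT = 1.0  # hypothetical value for SpecificityWeighting
MIN_SPECIFICITY = 0.1     # floor from the text, so very general rules can still fire
NOISE_SCALE = 0.05        # placeholder magnitude for the auction noise

def effective_bid(normalized_strength, specificity, offer):
    """Bid = strength^w1 * specificity^w2 * offer, plus noise. `offer` is what the
    classifier puts at risk: the amount the initiator proposes for itself, or the
    amount the responder has been offered."""
    specificity = max(specificity, MIN_SPECIFICITY)
    bid = (normalized_strength ** STRENGTH_WEIGHT) * (specificity ** SPECIFICITY_WEIGHT) * offer
    return bid + random.gauss(0.0, NOISE_SCALE)

# A classifier asking for 18 stakes (and bids) more than one asking for 10.
print(effective_bid(0.8, 0.5, 18), effective_bid(0.8, 0.5, 10))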

GAME THEORETIC ANALYSIS [2]

[2] Much of this discussion comes from my work with Eric Schoenberg.

The game used in this paper lends itself to the following quota solution: Q1 = {6, 12, 18}. If all players behave according to this solution, then Player 1 receives 6 points from any coalition in which it participates, Player 2 receives 12 points, and Player 3 receives 18 points. This solution is consistent with Albers' theory of stable demand vectors (Albers, 1975) and is akin to a price equilibrium.

However, the above solution is only one of many Nash equilibria in the game. Furthermore, this solution only looks at an agent's performance in individual games. One may also consider the long-term equilibria of repeated games. An agent that receives its normative value once every 10 games and is otherwise excluded cannot be said to be doing very well. If all the agents participate in equally many games, then each agent would receive its normative value in 2/3 of the games. Thus, a long-term normative solution to the game is Q2 = {4, 8, 12}. Many long-term behaviors can satisfy this solution, so there are many possible behaviors that could be termed normative. One feature of the simulation methodology is that we can observe which equilibrium the agents choose.
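The quota property is easy to verify by arithmetic: each two-player coalition's value is split exactly by its members' quotas (6 + 12 = 18, 6 + 18 = 24, 12 + 18 = 30). The small check below is my own illustration, not part of the original analysis.

COALITION_VALUES = {frozenset({1, 2}): 18, frozenset({1, 3}): 24, frozenset({2, 3}): 30}
Q1 = {1: 6, 2: 12, 3: 18}  # short-term quota solution from the text

def is_quota_solution(quota, values):
    """Every coalition's value is exactly exhausted by its members' quotas."""
    return all(sum(quota[p] for p in coalition) == v for coalition, v in values.items())

print(is_quota_solution(Q1, COALITION_VALUES))   # True
print({p: 2 * q // 3 for p, q in Q1.items()})    # long-term values Q2 = {1: 4, 2: 8, 3: 12}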

RESULTS

I examine each run with the following criteria: How close to short-term normative are the average coalition values (ACV) for each player? How close to long-term normative are the average payoff values (APV) for each player? Does it appear that an agent could alter its strategy to attain better results (do the classifier rules reflect Nash equilibrium strategies)? ACV measures an agent's performance in only those games in which the agent was a coalition member; APV measures performance over all games. The last criterion is judged by studying the strategies used by the agents. In the interest of space, I describe only some of the runs here. Exhibit 1 displays the ACV and APV values for one run described below.

The first experiment fixed Agents 1 and 2 to play at normative behavior; their rules were preset to offer and accept only normative coalitions, and the genetic algorithm was turned off for their populations. Therefore, only Agent 3 could evolve new rules. Very quickly, Agent 3 converged on perfect normative behavior. However, Agent 3's responder rules accepted anything above 2. In this safe environment, where the other players only ever offered 18 – the normative amount – to Agent 3, that agent did not need to learn better response rules. Note that the two preset players, while normative, were not playing a best-response strategy to Agent 3.
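The two measures can be stated precisely with a small helper. This is my own formalization of the definitions above; the game records and the use of a positive payoff as a proxy for coalition membership are assumptions for illustration.

def acv_and_apv(player, games):
    """games: one dict per completed game, mapping player -> payoff.
    ACV averages the player's payoff over games in which it was a coalition
    member (proxied here by a positive payoff); APV averages over all games."""
    payoffs = [g.get(player, 0) for g in games]
    in_coalition = [p for p in payoffs if p > 0]
    acv = sum(in_coalition) / len(in_coalition) if in_coalition else 0.0
    apv = sum(payoffs) / len(payoffs) if payoffs else 0.0
    return acv, apv

# Hypothetical record: player 3 receives its normative 18 in two of three games.
print(acv_and_apv(3, [{2: 12, 3: 18}, {1: 6, 3: 18}, {1: 6, 2: 12}]))  # (18.0, 12.0)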

A second set of runs was performed with only Agent 2 preset and held to normative behavior. In these experiments mostly normative behavior resulted, but some interesting anomalies arose. For example, in one of these experiments, after several thousand games of almost perfectly normative behavior, Agent 1 began to receive higher coalition values while Agent 3's coalition values decreased. This continued until the behavior stabilized at Agent 1 and Agent 3 both taking 14.

Looking at the rules used by the two non-preset players explained this behavior. For the few thousand games in which the players were playing normatively, Agent 3's response strategy was to accept 10. Therefore, Agent 3 was receiving normative coalition values only because the other players were offering normative coalition values. However, Agent 1 later developed a classifier that requested an {Agent 1 = 10, Agent 3 = 14} split, and later still another classifier that requested an {Agent 1 = 14, Agent 3 = 10} split. In this way Agent 1 learned to take advantage of Agent 3's poor response strategy.
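For concreteness, the rule Agent 1 eventually learned can be written in the initiator-action format sketched earlier; this encoding is my reconstruction for illustration, not output from the actual run.

# Agent 1's exploiting rule in the (responder, reference_point, offset, perspective)
# format: request 14 for itself from player 3, leaving player 3 the remaining
# 10 points of the 24-point coalition. Perspective 1 = amount requested for oneself.
exploit_rule_action = (3, 14, 0, 1)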

Finally, experiments were run which allowed full coevolution of the three players. Exhibits 1 and 2 show the ACV and APV values for the last 4000 games of one of these runs. In this run both short-term and long-term normative behaviors were achieved. Not all runs demonstrated behavior this close to solution Q1. In one run Agent 2 accepted as little as 7. However, in doing so Agent 2 ended every game in a coalition, thereby nearly achieving its long-term normative amount in solution Q2.

CONCLUSION

The players in these simulations are not capable of any computation, yet they were able to arrive at fairly normative behavior. These results were attained for a game that is far more complex than those attempted before. Therefore, the following claim can be made with even more confidence than before: adaptive agents of very limited rationality can learn normative behavior in a coevolutionary environment.

The results of this research also demonstrate a powerful use of the simulation methodology. The quota solution to the characteristic function game used in this paper – {6, 12, 18} – is only one of many Nash equilibria. Albers' theory of stable demand vectors suggests which of those Nash equilibria a group of players would select, and this paper's results support Albers' theory for this game. However, Albers' theory is one of many, and, for many games, the different theories disagree about which equilibria agents playing the game are likely to choose. The methodology can be used to simulate those games and observe which theory, if any, is correct.

ACKNOWLEDGMENTS

I thank Professors Jim Laing, Jerry Lohse and Steve Kimbrough for guiding and assisting me in this research. I also thank Chris Schull and Shawn Casey for getting me the resources I needed and helping me get those resources to work!

REFERENCES

Albers, Wulf (1975), "Zwei Lösungskonzepte für kooperative Mehrpersonenspiele, die auf Anspruchsniveaus der Spieler basieren," OR-Verfahren XXI, pp. 1-13.

Albers, Wulf and James D. Laing (1991), "Prominence, Competition, Learning, and the Generation of Offers in Computer-Aided Experimental Spatial Games," in Reinhard Selten (Ed.), Game Equilibrium Models III, pp. 141-185.

Anderson, John R. (1983), The Architecture of Cognition, Cambridge, MA: Harvard University Press.

Aumann, R.J. (July 1989), "Perspectives on Bounded Rationality," Stanford GSB Working Paper.

Axelrod, R. (1984), The Evolution of Cooperation, New York: Basic Books.

Axelrod, R. (1987), "The Evolution of Strategies in the Iterated Prisoner's Dilemma," in Lawrence Davis (Ed.), Genetic Algorithms and Simulated Annealing, Los Altos, CA: Morgan Kaufmann Publishers.

Dworman, Garett O., and Eric Schoenberg (1993), "Games Computers Play (Part 2): Applying a Bucket Brigade to a Classifier System Implementation of Characteristic Function Games," working paper, Decision Sciences Department, The Wharton School.

Friedman, D. (1991), "Evolutionary Games in Economics," Econometrica.

Ho, Teck-Hua (September 1991), "Finite Automata Play Repeated Prisoner's Dilemma with Information Processing Costs," working paper, Decision Sciences Department, The Wharton School.

Holland, John and John H. Miller (1991), "Artificially Adaptive Agents in Economic Theory," American Economic Review, v81(2), pp. 365-370.

Holland, John, K.J. Holyoak, R.E. Nisbett and P.R. Thagard (1986), Induction: Processes of Inference, Learning, and Discovery, Cambridge, MA: MIT Press.

Kieras, David (1987), "Cognitive Modeling," in Stuart C. Shapiro and David Eckroth (Eds.), Encyclopedia of Artificial Intelligence, volume 1, pp. 111-115.

Kreps, David M. (1990), Game Theory and Economic Modeling, Oxford: Clarendon Press, chapter 6.

Laing, James D. (1991), "A Noncooperative Model of Bargaining in Simple Spatial Games," in Reinhard Selten (Ed.), Game Equilibrium Models III, pp. 80-117.

Marimon, Ramon, Ellen McGrattan and Thomas J. Sargent (1990), "Money as a Medium of Exchange in an Economy with Artificially Intelligent Agents," Journal of Economic Dynamics and Control, v14, pp. 329-373.

Matwin, Stan, Tom Szapiro, and Karen Haigh (January/February 1991), "Genetic Algorithms Approach to a Negotiation Support System," IEEE Transactions on Systems, Man, and Cybernetics, v21(1), pp. 102-114.

McKelvey, R.D. and Peter C. Ordeshook (1987), "A Decade of Experimental Research on Spatial Models of Elections and Committees," Social Sciences Working Paper 657, Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA.

McKelvey, R.D., Peter C. Ordeshook and M. Winer (1978), "The Competitive Solution for N-Person Games Without Side Payments," American Political Science Review, v72, pp. 599-615.

Miller, John H. (July 1989), "The Coevolution of Automata in the Repeated Prisoner's Dilemma," working paper, Santa Fe Institute.

Rubinstein, A. (January 1982), "Perfect Equilibrium in a Bargaining Model," Econometrica, v50(1), pp. 97-109.

Selten, Reinhard (1981), "A Noncooperative Model of Characteristic-Function Bargaining," in R.J. Aumann et al. (Eds.), Essays in Game Theory and Mathematical Economics in Honor of Oskar Morgenstern, Mannheim: Bibliographisches Institut, pp. 131-151.

Selten, Reinhard (1983), "Evolutionary Stability in Extensive Two-Person Games," Mathematical Social Sciences, v5, pp. 269-363.

Selten, Reinhard (1991), "Evolution, Learning and Economic Behavior," Games and Economic Behavior, v3(1), pp. 3-24.

Simon, Herbert A. (May 1978), "Rationality as Process and Product of Thought," Richard T. Ely Lecture, American Economic Review, v68(2).

Simon, Herbert A. (1982), Models of Bounded Rationality, Cambridge, MA: The MIT Press.