
Harmony Search Based Algorithm for Complete Information Equilibrium in Infinite Game

Riccardo Alberti and Atulya K. Nagar

Abstract In this paper we discuss and analyse a novel algorithm for the computation of approximated Nash equilibrium points in the class of infinite games with complete information. In particular, we compare an established algorithm based on the principles of simulated annealing (SA) with our implementation, which is a hybrid combination of SA and the basic Harmony Search algorithm. We study the method in the class of positional games, a subset of the class of continuous games that models the allocation of economic goods in the presence of externalities. We show that our hybrid method converges to an equilibrium point faster than the plain SA algorithm, though the accuracy of the solutions is slightly lower.

Keywords Computational game theory · Harmony search · Simulated annealing · Nash equilibrium · Best response · Regret function

1 Introduction

In the most general terms, game theory is a mathematical apparatus designed to analyse the strategic interactions of rational players. The basic assumption is that decision-makers pursue individual objectives taking into consideration the expectations of other participants. In this work we focus our attention on strategic non-cooperative continuous games with a finite number of players. A game is

R. Alberti (✉) · A. K. Nagar
Department of Mathematics and Computer Science, Liverpool Hope University, Liverpool L16 9JD, UK
e-mail: [email protected]

A. K. Nagar
e-mail: [email protected]

M. Pant et al. (eds.), Proceedings of the Third International Conference on Soft Computing for Problem Solving, Advances in Intelligent Systems and Computing 258, DOI: 10.1007/978-81-322-1771-8_44, Springer India 2014


defined by the tuple $G = (R, u, I)$, where $R = \prod_{i \in I} R_i$ is a compact set (every $R_i$ is compact) representing the set of joint pure strategies, $u$ is the set of utility functions ($u_i$, $\forall i \in I$), which we take to be continuous and concave, and $I$ is the finite set of players. A solution of the game is a profile $r^* = (r_1^*, \ldots, r_I^*)$ such that, $\forall i \in I$ and $\forall r_i \in R_i$, $u_i(r_i^*, r_{-i}^*) \ge u_i(r_i, r_{-i}^*)$. As usual, we use the notation $r_{-i}$ to indicate the strategies of all players except player $i$. This is the definition of a Nash equilibrium. In words, a Nash equilibrium is a point in the joint strategy set where no player can increase her payoff by unilaterally deviating from the equilibrium strategy while the others do not deviate. In the original proof of existence, given in [1], Nash employs the concept of best response (BR). BR is a correspondence that returns the set of maximisers of $i$'s utility function when the other players play some joint strategy $r_{-i}$. A Nash equilibrium can therefore be defined as a fixed point of the Cartesian product, over the set of players $I$, of the BR correspondences. Finally, another concept that we use extensively in this work is that of the regret function. The regret function measures the maximum benefit any agent can gain by unilateral deviation. Formally, it is defined as $e(r) = \max_{i \in I} \max_{a \in R_i} [u_i(a, r_{-i}) - u_i(r)]$. In this light, a Nash equilibrium is a point that minimises the regret function, where the regret takes the value 0.

Computing a Nash equilibrium is a problem in the complexity class PPAD. Although many tools that compute solutions exist, few approximation tools are available for general classes of infinite games. In this work we study a variation of the method proposed in [2], replacing part of the proposed simulated annealing based algorithm with a Harmony Search algorithm. In the experimental part of this work we compute approximate solutions to games in a class of continuous games that describes an economic model in the presence of externalities. We show that our implementation, though it may lose slightly in the accuracy of the approximation, gains in convergence speed. The remainder of this paper is structured as follows. Section 2 reviews the method proposed in [2] and presents a simplified version of the algorithm to be applied to continuous games of complete information. In Sects. 3 and 4 we present our implementation and discuss the results of the simulations. Finally, in Sect. 5 we draw the conclusions of our work and propose future directions of research.
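As an illustration of the regret function defined in this section, the following minimal sketch evaluates $e(r)$ for a small two-player game by scanning each player's strategy interval on a grid. It is not part of the method in [2]: the grid scan, the names `regret`, `toy_utility` and `TOY_BOUNDS`, and the toy payoff itself are assumptions made only for this example.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: utility(i, profile) returns player i's payoff.
Utility = Callable[[int, Sequence[float]], float]
Bounds = List[Tuple[float, float]]


def regret(utility: Utility, profile: Sequence[float], bounds: Bounds,
           grid_points: int = 201) -> float:
    """Grid approximation of e(r) = max_i max_a [u_i(a, r_-i) - u_i(r)]."""
    worst = 0.0  # keeping the current strategy always yields a gain of 0
    for i, (lo, hi) in enumerate(bounds):
        base = utility(i, profile)
        for k in range(grid_points):
            a = lo + (hi - lo) * k / (grid_points - 1)
            deviation = list(profile)
            deviation[i] = a  # player i deviates, the others stay put
            worst = max(worst, utility(i, deviation) - base)
    return worst


def toy_utility(i: int, profile: Sequence[float]) -> float:
    """Toy concave payoff (assumption): player i wants half the opponent's strategy."""
    opponent = profile[1 - i]
    return -(profile[i] - 0.5 * opponent) ** 2


TOY_BOUNDS: Bounds = [(0.0, 1.0), (0.0, 1.0)]

print(regret(toy_utility, [0.0, 0.0], TOY_BOUNDS))  # equilibrium of the toy game
print(regret(toy_utility, [1.0, 1.0], TOY_BOUNDS))  # both players can still gain
```

In the toy game the unique equilibrium is the profile (0, 0), so the first call returns a regret of zero while the second returns a strictly positive value.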

2 Simulated Annealing

Simulated Annealing (SA) is a technique that has attracted significant attention as suitable for optimization problems of large scale, especially ones where a desired global extremum is hidden among many local extrema [3]. It mimics the physical process of annealing of metals whereby metals cool and anneal. At high temperatures, the molecules of a liquid move freely with respect to one another. If the liquid is cooled slowly, thermal mobility is lost. The atoms are often able to line themselves up and form a pure crystal that is completely ordered. This crystal is the state of minimum energy for this system. For slowly cooled systems, nature is


able to find this minimum energy state. The general setting of the method is summarised as follows:

– a real valued function $F$ defined over a finite set $D$ (let us assume that $F$ has many global minima);
– for each $d \in D$, a set $N(d) \subset D$ containing all the neighbours of $d$;
– a function $T : \mathbb{N} \to (0, \infty)$, the cooling schedule, which is coupled with the cooling rate $T_r$;
– $d_0 \in D$ as a starting point.

The algorithm evolves as a discrete-time inhomogeneous Markov chain using a particular update rule. The acceptance/update condition is borrowed from another physical observation: even at low temperature there is a chance of a system being in a high energy state, so the system moves, with some probability, to a higher energy state in order to leave a local energy minimum in favour of finding a global one. The update condition was defined by Metropolis (see [4]) and gives the probability of accepting a new energy state as $p = \exp[-(E_2 - E_1)/kT]$, where $k$ is a natural constant (Boltzmann's constant), $E_2$ is the new energy value and $E_1$ is the current one.

In [2] a globally convergent SA based algorithm for finding an approximated Nash equilibrium is described. The method consists of two routines: one for the computation of the approximated best response of each player and the other for the minimisation of the joint regret function. A simplified version for continuous games with complete information is summarised here.

Approximated best response (SABR). For each player this routine is described by four steps:

1. consider an initial point $d_{k,i} \in D_i$;
2. choose a neighbour point $d_{k+1,i} \in N(d_{k,i})$ using a truncated normal distribution within $i$'s strategy boundaries;
3. evaluate the utility function of the current player at $d_{k+1,i}$ and $d_{k,i}$;
4. update the response using the Metropolis Acceptance (MA) rule.

The parameters used by the SABR routine are given in Table 1.

Approximated equilibrium calculator (SAEC + SABR). For each tick of the cooling schedule this routine performs the following actions:

1. consider an initial point $d_k \in D$;
2. choose a neighbour point $d_{k+1} \in N(d_k)$ using a truncated normal distribution within the joint strategy boundaries;
3. calculate the maximum regret function as $e = \max_i [u_i(\mathrm{SABR}(d_{k,-i}), d_{k,-i}) - u_i(d_k)]$;
4. update the regret counter using the Metropolis Acceptance (MA) rule.

The parameters used in the routine are summarised in Table 2.
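The four SABR steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [2]: the geometric cooling (temperature divided by the rate at every step), the proposal width `sigma`, the number of steps, the choice to return the best point seen, and the names `sabr`, `truncated_normal` and `toy_payoff` are all hypothetical. The starting temperature 1000 and rate 1.05 are the values listed for T and Tr in Table 1.

```python
import math
import random


def truncated_normal(mean, sigma, lo, hi, rng):
    """Rejection-sample a normal draw restricted to [lo, hi]."""
    while True:
        x = rng.gauss(mean, sigma)
        if lo <= x <= hi:
            return x


def sabr(u_i, others, lo, hi, t0=1000.0, rate=1.05, steps=1000, seed=0):
    """Simulated-annealing sketch of a best response for one player.

    u_i(a, others) is the player's payoff for strategy a against `others`.
    """
    rng = random.Random(seed)
    sigma = 0.1 * (hi - lo)            # proposal width (assumption)
    current = rng.uniform(lo, hi)      # step 1: random initial point
    current_val = u_i(current, others)
    best, best_val = current, current_val
    t = t0
    for _ in range(steps):
        # step 2: neighbour drawn from a truncated normal around the current point
        cand = truncated_normal(current, sigma, lo, hi, rng)
        # step 3: evaluate the utility at the candidate
        cand_val = u_i(cand, others)
        # step 4: Metropolis acceptance for a maximisation problem
        delta = cand_val - current_val
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_val = cand, cand_val
            if current_val > best_val:
                best, best_val = current, current_val
        t /= rate                      # geometric cooling (assumption)
    return best


def toy_payoff(a, others):
    """Toy concave payoff (assumption): the best reply is half the opponent's strategy."""
    return -(a - 0.5 * others[0]) ** 2


print(sabr(toy_payoff, [1.0], 0.0, 1.0))  # should land near 0.5
```

Early on, almost every candidate is accepted because the temperature is large; as the temperature drops the routine behaves like a hill climber around the best reply.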



Table 1 Simulated annealing parameters for SABR

Parameter    Value
$F$          $u_i$
$D$          $R_i$
$T$          1000
$T_r$        1.05
MA           $\exp[(u_i(d_{k+1,i}, d_{k,-i}) - u_i(d_k))/t]$
$d_{0,i}$    random from $D$

Table 2 Simulated annealing parameters for SAEC

Parameter    Value
$F$          $\mathrm{SABR}_i$
$D$          $R$
$T$          1000
$T_r$        1.05
MA           $\exp((e(d_k) - e(d_{k+1}))/t)$
$d_0$        random from $D$
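The outer SAEC loop can be sketched in the same spirit. The block below is an illustration under assumptions, not the code of [2]: a simple grid scan (`grid_best_response`) stands in for the best-response routine so that the example is self-contained, the cooling is taken to be geometric, and the number of ticks, the proposal width and the toy game are hypothetical. The acceptance test follows the MA entry of Table 2, $\exp((e(d_k) - e(d_{k+1}))/t)$.

```python
import math
import random


def truncated_normal(mean, sigma, lo, hi, rng):
    """Rejection-sample a normal draw restricted to [lo, hi]."""
    while True:
        x = rng.gauss(mean, sigma)
        if lo <= x <= hi:
            return x


def grid_best_response(utility, i, profile, lo, hi, points=101):
    """Stand-in for SABR (assumption): scan player i's interval on a grid."""
    def payoff(a):
        trial = list(profile)
        trial[i] = a
        return utility(i, trial)
    return max((lo + (hi - lo) * k / (points - 1) for k in range(points)), key=payoff)


def max_regret(utility, profile, bounds, best_response):
    """e = max_i [u_i(BR_i(d_-i), d_-i) - u_i(d)], floored at 0."""
    worst = 0.0
    for i, (lo, hi) in enumerate(bounds):
        deviated = list(profile)
        deviated[i] = best_response(utility, i, profile, lo, hi)
        worst = max(worst, utility(i, deviated) - utility(i, profile))
    return worst


def saec(utility, bounds, best_response, t0=1000.0, rate=1.05, ticks=1000, seed=0):
    rng = random.Random(seed)
    current = [rng.uniform(lo, hi) for lo, hi in bounds]   # step 1
    cur_e = max_regret(utility, current, bounds, best_response)
    best, best_e = list(current), cur_e
    t = t0
    for _ in range(ticks):
        # step 2: joint neighbour, one truncated normal draw per coordinate
        cand = [truncated_normal(x, 0.1 * (hi - lo), lo, hi, rng)
                for x, (lo, hi) in zip(current, bounds)]
        # step 3: maximum regret at the candidate profile
        cand_e = max_regret(utility, cand, bounds, best_response)
        # step 4: Metropolis acceptance on the regret (lower is better)
        diff = cur_e - cand_e
        if diff >= 0 or rng.random() < math.exp(diff / t):
            current, cur_e = cand, cand_e
            if cur_e < best_e:
                best, best_e = list(current), cur_e
        t /= rate                                          # geometric cooling (assumption)
    return best, best_e


def toy_utility(i, profile):
    """Toy two-player concave game (assumption) with equilibrium at (0, 0)."""
    return -(profile[i] - 0.5 * profile[1 - i]) ** 2


TOY_BOUNDS = [(0.0, 1.0), (0.0, 1.0)]
profile, e = saec(toy_utility, TOY_BOUNDS, grid_best_response)
print(profile, e)
```

Passing the best-response routine as an argument keeps the outer loop unchanged when a different inner routine is plugged in.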

In Sect. 3 we introduce our contribution, which is based on the application of the Harmony Search algorithm.

3 Harmony Search

The HS algorithm was inspired by the improvisation process of musicians. When musicians compose a harmony, they usually try various possible combinations of the musical pitches stored in their memory [5]. The search for a perfect state of harmony is analogous to the procedure for finding optimal solutions to engineering problems. HS has been reported to outperform various optimization methods in many optimization problems. One of the key success factors of the algorithm is the employment of a novel stochastic derivative, which can be used even for discrete variables. Instead of a traditional calculus-based gradient, the algorithm utilizes the musicians' experience as a derivative in searching for an optimal solution. The Harmony Search (HS) algorithm combines features of other heuristic optimization methods. It preserves the history of past vectors, similarly to Tabu Search (TS), and is able to vary the adaptation rate, as in Simulated Annealing (SA). Furthermore, HS manages several vectors simultaneously, similarly to the Genetic Algorithm (GA). However, the major difference between GA and HS is that HS makes a new vector from all existing vectors and can independently consider each component variable in a vector, while GA utilizes only two of the existing vectors and keeps the structure of the gene [6]. The general parameters of the algorithm are:

Table 3 Values of the parameters for HSBR

Parameter    Value
HMS          30
HMCR         0.9
PAR          0.3
MI           1000
FW           $(d_u - d_l)/1000$

– HMS, harmony memory size: the number of solution vectors simultaneously handled by the algorithm;
– HMCR, harmony memory considering rate: the rate ($0 \le \mathrm{HMCR} \le 1$) at which HS picks one value randomly from the musicians' memory;
– PAR, pitch adjusting rate: the rate ($0 \le \mathrm{PAR} \le 1$) at which HS tweaks the value originally picked from memory;
– MI, maximum improvisation: the number of iterations;
– FW: the maximum amount of change in a pitch adjustment, chosen arbitrarily.

In this work we use this algorithm to design an alternative approximated best response (HSBR), and we incorporate our routine in the SAEC described above. In words, given a joint strategy of $i$'s opponents, the routine determines the best response just as a musician determines his best contribution to the harmony produced by the rest of the orchestra.

Approximated best response (HSBR). For each player this routine is described by five steps:

– consider an initial point $d_{k,i} \in D_i$;
– initialise HM with random values drawn from a uniform distribution within $i$'s strategy boundaries (i.e. $d_u$ for the upper bound and $d_l$ for the lower bound);
– if a strategy from HM is selected (HMCR), adjust its pitch (PAR) using a uniform distribution $U(0, 1)$; otherwise generate a new strategy;
– update HM by replacing the worst strategy with the generated strategy;
– find the best strategy in HM.

The parameters used in the routine are summarised in Table 3. In our implementation we use a fixed set of values for the parameters. As anticipated, the routine just described is integrated within the SAEC routine described in Sect. 2, replacing the original SABR routine. This constitutes a hybrid HS + SA search algorithm for the calculation of approximated Nash equilibria in continuous games with complete information. The modification changes the SAEC sequence as follows: step 3 of SAEC is replaced by

Calculate the maximum regret function as $e = \max_i [u_i(\mathrm{HSBR}(d_{k,-i}), d_{k,-i}) - u_i(d_k)]$.

In Sect. 4 we compare the two methods.
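The HSBR steps can be sketched for a single player with a one-dimensional strategy as follows. This is a minimal illustration, not the authors' code. The default values of HMS, HMCR, PAR and MI are those listed in Table 3; reading FW as $(d_u - d_l)/1000$, applying the pitch adjustment as a random sign times FW times a $U(0, 1)$ draw, and replacing the worst stored strategy only when the new one is better are assumptions, as are the names `hsbr` and `toy_payoff`.

```python
import random


def hsbr(u_i, others, d_l, d_u, hms=30, hmcr=0.9, par=0.3, mi=1000, seed=0):
    """Harmony-search sketch of a best response for one player.

    u_i(a, others) is the player's payoff for strategy a against `others`.
    """
    rng = random.Random(seed)
    fw = (d_u - d_l) / 1000.0          # maximum pitch change (assumed reading of Table 3)
    # initialise the harmony memory uniformly within the strategy bounds
    memory = [rng.uniform(d_l, d_u) for _ in range(hms)]
    scores = [u_i(x, others) for x in memory]
    for _ in range(mi):
        if rng.random() < hmcr:
            # memory consideration: reuse a stored strategy
            x = rng.choice(memory)
            if rng.random() < par:
                # pitch adjustment: move by at most fw in a random direction
                step = fw * rng.random()
                x += step if rng.random() < 0.5 else -step
                x = min(d_u, max(d_l, x))
        else:
            # otherwise improvise a completely new strategy
            x = rng.uniform(d_l, d_u)
        fx = u_i(x, others)
        # replace the worst stored strategy if the new one is better (assumption)
        worst = min(range(hms), key=scores.__getitem__)
        if fx > scores[worst]:
            memory[worst], scores[worst] = x, fx
    best = max(range(hms), key=scores.__getitem__)
    return memory[best]


def toy_payoff(a, others):
    """Toy concave payoff (assumption): the best reply is half the opponent's strategy."""
    return -(a - 0.5 * others[0]) ** 2


print(hsbr(toy_payoff, [1.0], 0.0, 1.0))  # should land near 0.5
```

In the hybrid scheme this function would take the place of the annealing-based best response inside step 3 of SAEC; the outer annealing loop stays as it is.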



4 Simulation and Results

Our simulation considers a set of continuous games derived from the study of an economic model of competition in the presence of externalities. This model generates a class of games called positional non-cooperative games. In this model every player possesses a utility function defined as

$$d_i(r, p) = \frac{q_i^p(r)}{\sum_{j \in N} q_j^p(r)} \sum_{j \in N} q_j(r) \qquad (1)$$

where, as usual, $r \in R$ is a point in the joint strategy set, the $q_i$ are convex functions that indicate the level of absolute consumption of a particular good, and $N$ is the number of players. Finally, $p$ is the positionality index, which indicates the social-status signalling capability of the good at stake [7]. We do not discuss the details of the economic model here. For the scope of this paper it is sufficient to know that the $d_i$ are continuous and quasi-concave functions, and hence every game defined as $G = (R, d(r, p), I)$ has a Nash equilibrium in pure strategies.

We proceed by generating 30 games with random values for the $q_i$ and $p$, and calculating an approximate Nash equilibrium of each game. In particular, we focus our attention on three performance indices:

1. the value of the maximum regret function $e$ at equilibrium;
2. the absolute number of iterations required by the algorithm to converge to an equilibrium point (convergence speed);
3. the relative convergence speed (RCS), defined as the ratio between the number of iterations required to converge to an equilibrium point and the total number of iterations.

In Fig. 1 we report the comparison between the maximum regret function calculated with SABR and with HSBR. This measure provides an insight into the accuracy of the approximated solution: the closer it is to 0, the closer the solution is to the real equilibrium point. As we can see, the simulated annealing based algorithm has the lowest maximum regret function. The mean (-0.0157) of the maximum regret function calculated by the harmony search based best response is still close to 0, but the suboptimality is evident from Fig. 1.

Though the approximation provided by our algorithm is inferior to that of the SA based algorithm, our implementation converges faster to the equilibrium point, as one can appreciate from Fig. 2. In particular, in 20 cases out of 30 (about 67 %) our algorithm converges faster than the implementation described in [2]. Finally, if we consider the number of steps required to converge to an equilibrium relative to the total number of computation steps, our implementation still outperforms the simulated annealing version. In Fig. 3 the performance against this statistic is presented. In this case our implementation finds the equilibrium point faster than the SA algorithm in only 50 % of the cases, fewer than in the previous analysis.


Fig. 1 Maximum regret function

Fig. 2 Convergence speed



Fig. 3 Relative convergence speed

In Sect. 5 we draw the conclusions of this work and provide some future research directions.

5 Conclusions

In this work we have exploited the analogy between the behaviour of a rational agent who finds the best response to her opponents' strategies and that of a jazz musician who, through a process of improvisation and improvement, tries to find the perfect harmony to be in tune with the overall harmony being played. We have shown that the HSBR converges faster to an equilibrium than the corresponding routine based on simulated annealing, at the cost of losing a small fraction of accuracy in the solution. Future work will be directed at using dynamic tuning of the HS parameters in order to reduce the inaccuracy while maintaining the convergence speed. Another research direction is to apply HS to the main routine (SAEC) in order to build a complete approximated Nash equilibrium calculator based on the harmony search approach.



References

1. Nash, J.F.: Equilibrium points in n-person games. PNAS 36(1), 48–49 (1950)
2. Vorobeychik, Y., Wellman, M.P.: Stochastic search methods for Nash equilibrium approximation in simulation-based games. In: Proceedings of 7th International Conference on Autonomous Agents and Multiagent Systems, pp. 1055–1062 (2008)
3. Bertsimas, D., Tsitsiklis, J.: Simulated annealing. Stat. Sci. 8(1), 10–15 (1993)
4. Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Simulated annealing methods (Chap. 10). In: Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge (1988)
5. Woo Geem, Z.: State-of-the-Art in the structure of harmony search algorithm (Chap. 1). Recent Advances in Harmony Search Algorithm. Springer, Heidelberg (2010)
6. Tangpattanakul, P., Meesomboon, A., Artrit, P.: Optimal trajectory of robot manipulator using harmony search algorithms (Chap. 3). Recent Advances in Harmony Search Algorithm. Springer, Heidelberg (2010)
7. Ball, S., Eckel, C.C., Grossman, P.J., Zame, W.: Veblen effects in a theory of conspicuous consumption. Quart. J. Econ. 116(1), 161–188 (2001)