Learn While You Earn: Two Approaches to Learning Auction Parameters in Take-it-or-leave-it Auctions

Archie C. Chapman, Alex Rogers and Nicholas R. Jennings
{ac05r,acr,nrj}@ecs.soton.ac.uk

Electronics and Computer Science, University of Southampton, Southampton, SO17 1BJ, UK.

ABSTRACT

Much of the research in auction theory analyses situations where the auctioneer is assumed to know, with complete certainty, the distribution of valuations held by the participants. However, this is unrealistic. Thus, we analyse cases in which the auctioneer is uncertain about the valuation distributions; specifically, we consider a repeated auction setting in which the auctioneer can learn these distributions. Using take-it-or-leave-it auctions (Sandholm and Gilpin, 2006) as an exemplar auction format, we consider two auction design criteria. Firstly, an auctioneer could maximise expected revenue each time the auction is held. Secondly, an auctioneer could maximise the information gained in earlier auctions (as measured by the Kullback-Leibler divergence between its posterior and prior belief distributions) in order to develop good estimates of the unknowns, which may later be exploited to improve the total revenue earned in the long run. Simulation results comparing the two approaches indicate that setting offers in take-it-or-leave-it auctions to maximise revenue does not significantly detract from learning performance, whereas optimising offers for information gain substantially reduces the auctioneer's expected revenue while not producing significantly better parameter estimates.

Categories and Subject Descriptors J.4 [Social and Behavioural Sciences]: Economics

General Terms Design, Economics

Keywords Take-it-or-leave-it auctions, optimal auction design, Bayesian experimental design

1. INTRODUCTION

The use of agent mediated electronic commerce has grown rapidly over recent years and represents a vast potential market [5]. This popularity has prompted much research into agent mediated auctions and, specifically, the development of autonomous software agents that are capable of fulfilling the role of auctioneer or bidder on behalf of their owner. However, much of this work makes strong assumptions about the type of information available to the auction designer. In particular, much of the research assumes that, although individual bidders' valuations are private, the auctioneer knows the distribution from which they are drawn (e.g. [10, 2, 14]). In this paper we relax this somewhat unrealistic assumption to analyse cases where the auctioneer is uncertain about the distribution of bidders' valuations, but we consider a repeated auction setting so the auctioneer has an opportunity to learn the distribution.

The subject of our investigation is take-it-or-leave-it auctions, an allocation mechanism that generates close to optimal expected utility for the auctioneer, with intuitive rules and simple strategies for participants [14]. We investigate the optimal design of one particular variant of this protocol, a single-offer take-it-or-leave-it auction with symmetric bidders, which we will usually refer to as a TLA. Under this protocol, each buyer is made an offer, a price for the single item, which they can accept or reject. If the buyer accepts the offer, the auction closes and the item is allocated to them. If they reject the offer they leave the auction and the next buyer is offered the item, possibly at a different, higher or lower, price. The auction continues until the item is sold or all potential buyers reject the offers made to them. Consequently, buyers have a dominant strategy to act truthfully and accept an offer if it is less than their valuation. This protocol is used as a canonical setting in which we can examine the fundamentals of our problem, without being distracted by the complications of other auction protocols.

In this situation, the auctioneer faces two complementary challenges. The first is to determine the actual values of the offers to be made to the bidders in each auction. To this end, we derive a procedure for determining offers that will maximise the expected revenue of an auction given uncertainty regarding the value of the parameters that describe the distribution of bidders' valuations. This is achieved by using reasonable priors over parameter values to represent this uncertainty, integrating over these priors to find the expected revenue of the auction, and then selecting the set of offers that maximise expected revenue. The second challenge is determining appropriate priors over the values of these parameters for use in optimising the auction design. To do this, we consider a setting in which the auctioneer holds repeated auctions for identical items with a large pool of bidders (as commonly occurs in online auctions such as eBay), where the auctioneer has an opportunity to learn these values. An online Bayesian inference algorithm is then used to generate sequentially better estimates of the unknown parameters, using the observed behaviour of bidders in previous auctions. These refined estimates are then used as priors when setting offers for the next auction. The auctions are designed to maximise revenue, so by following the procedure described above the auctioneer "learns while it earns".

However, learning from the outcomes of auctions designed to maximise expected revenue does not necessarily imply efficient, or even rapid, learning. In particular, we expect that if an auctioneer were to have more accurate estimates of the unknown parameter values earlier in a sequence of auctions, the average total revenue of subsequent auctions could be improved. This may, in turn, increase the total revenue of the complete sequence of auctions. To test this hypothesis, we develop a procedure for determining offers that will maximise the information elicited by running the auction. We apply the Bayesian d-optimality criterion, which uses the expected Kullback-Leibler (KL) divergence between all possible posterior distributions and the prior distribution of the unknowns as an experimental design utility function, just as one would use expected revenue [1]. In the context of Bayesian inference, KL divergence is the natural measure of information gain, so using it should improve the learning performance of our Bayesian inference algorithm, in terms of its posterior approximation of the true value of the unknown parameters.

Against this background, the work described in this paper extends the state of the art in four important ways:

1. We extend the model of TLAs to consider uncertainty in the values of the parameters of the bidders' valuation distribution, and derive an expression for the expected revenue of a TLA given uncertainty in these parameters. This is used as a design criterion for an auctioneer interested in maximising expected revenue.

2. We demonstrate how an auctioneer can automatically update its estimate of the unknown parameters using an online Bayesian machine learning algorithm.

3. We derive an expression for the expected KL divergence between the posterior distribution generated by the outcome of a TLA and the prior distribution of the unknown parameters describing the bidders' valuations. This is used as a design criterion for an auctioneer interested in maximising the expected gain in information from the auction.

4. We apply each design criterion to the problem of learning unknown parameter values from the closing price of TLAs, and compare the learning performance and revenue generation of each. Specifically, we consider a uniform distribution with uncertainty about the value of the upper limit of its support, and an exponential distribution with uncertainty about its expected value.

Our results show that using offer levels that maximise expected information gain produces only marginally better learning performance than using offer levels that maximise expected revenue, but at a large cost in terms of foregone revenue. The implication of this result is that, although not specifically targeted at maximising the information gained from an auction, offer levels that maximise expected revenue do indeed provide sufficient information for the Bayesian machine learning algorithm to learn effectively.

The paper progresses as follows. The next section reviews the related literature. Section 3 describes the extended TLA model that considers uncertainty in the parameters of the bidders' valuation distribution. In section 4 we present the first auction design criterion, which sets offer levels to maximise the expected revenue of the auction given uncertain parameter values. Section 5 details the online Bayesian machine learning algorithm used by our auctioneer. In section 6 we present our second design criterion, which sets offer levels to maximise the expected information gained from an auction. In section 7, information gain and revenue optimising offers are generated for a TLA given an ignorant prior over the unknown parameter values. Using them in the initial round of a series of auctions, we compare how our Bayesian inference algorithm learns from offers generated by each approach. The final section summarises the contributions of the paper and discusses future work.

2. RELATED WORK

Sandholm and Gilpin examine sequences of take-it-or-leave-it offers that produce near optimal outcomes without full valuation revelation [14]. This protocol is simple to implement and has straightforward, intuitive rules. They derive equilibrium strategy profiles for bidders in various forms of the take-it-or-leave-it auction protocol, and show that in the single-offer variant, bidders have a dominant strategy to act truthfully. They develop an algorithm that determines the offers an auctioneer should implement to maximise revenue. Their simulation results indicate that, for large numbers of bidders, the economic performance of TLAs, in terms of the seller's expected utility, is close to that of Myerson's expected utility maximising auction [10]. However, this result is predicated on the assumption that the parameters governing the buyers' valuation distributions are known to the auctioneer; a violation of the Wilson doctrine, which states that economic institutions should be free of strict distributional assumptions [16]. Our extension of their model, examining uncertainty in parameter values, relaxes this assumption, and therefore lessens the violation of the doctrine, while also making the model more realistic.

Other recent work has considered the problem of learning the distribution of auction participants' valuations. Rogers et al. extend a model of an English auction with discrete bid levels to consider uncertainty in valuation distribution parameter values [12]. They derive a method to increase revenue in such a setting by randomly selecting parameter values according to the prior and implementing bid levels as if they were the true parameter values, rather than integrating over the prior as we do in this work. They then estimate unknown parameter values using Bayesian machine learning. Conitzer and Garera apply three learning techniques (a bandit algorithm, a gradient ascent method and Bayesian learning) to repeated principal-agent problems in which the principal does not know the distribution of agents' valuations [4]. They find that, under the assumption that the principal knows the type of distribution that the agents' valuations follow (the same assumption we make), the Bayesian learning algorithm converges faster than the other two approaches. Pardoe et al. address the problem of learning the distribution of auction participants' valuations using sample averaging [11]. Specifically, they consider n English auctions in sequence, where the auctioneer's aim is to set reserve prices in each auction in order to maximise the total revenue obtained from all auctions. The average revenue for each choice of reserve price is recorded and mapped to actions via a Boltzmann action choice function. They compare the revenue generated by this approach to learning auction parameters with that generated by a Bayesian learning approach (similar to the method used in this work), and find that neither approach dominates the other in all the settings considered. Finally, Jiang and Leyton-Brown investigate the problem of estimating bidders' valuation distribution from the perspective of a buyer deciding whether to bid in an English auction, and use the EM algorithm to counter the bias induced by 'hidden bids', or missing data, when estimating the distribution [6].

Now, all of the work discussed above investigates designing auctions and bidding strategies that aim to maximise revenue. However, the experimental design literature considers alternative criteria. Specifically, Lindley was the first to suggest using the expected Shannon information of the joint posterior of unknown parameters as an experimental design criterion where the aim is to improve the experimenter's knowledge about the world [7]. Bernardo then showed that using the expected KL divergence between a prior and the possible posterior distributions generated by the outcome of an experiment, as a measure of the expected information about the unknown parameters gained by running the experiment, is an instance of the general principle of maximising expected utility [1]. Expected KL divergence is the natural measure of information gain to apply when using Bayesian inference, and its use is known as Bayesian d-optimality [3]. Although d-optimality has been widely applied to the design of experiments in many fields, including spatial sampling for geographic data [15] and computational models of physical processes [9], it has not yet been applied to the design of auctions for maximising information gain.

3. THE AUCTION MODEL

In this paper we consider the model of a TLA introduced by Sandholm and Gilpin [14]. In particular, we investigate the single-offer variant as it is the most readily applicable within a real world scenario, it avoids the problem of having to decide how many offers to make to each potential buyer, and it induces potential buyers to act truthfully. We consider a series of such auctions as a model of a situation where the auctioneer can learn the values of the parameters of the bidders' valuation distributions.

Under the single-offer TLA protocol, an auctioneer has an indivisible good, which it values at v_0, that it can allocate to any one of a set of n risk neutral buyers. The auctioneer approaches each potential buyer in sequence and proposes one price for the item. Let o = o_1, . . . , o_n be the sequence of n offers. If buyer i accepts the offer, the item is allocated to that buyer at price o_{w=i}, the auction finishes, and by the standard quasi-linear utility assumption, the seller gains utility U = o_{w=i} - v_0. If the buyer rejects the offer they leave the auction and the next buyer is offered the item. The auction continues until the item is sold or all bidders reject the offer made to them, at which point o_{w=0} = v_0 and the auctioneer's gain in utility from holding the auction is zero.

We analyse symmetric TLAs, where each buyer has a valuation v_i for the good that is independently drawn from a common distribution with cumulative distribution function F(v). Making this restriction removes the problem of ordering offers, allowing us to focus solely on the offer levels implemented by the auctioneer. The bidders' valuation distribution is itself characterised by a vector of parameters θ which are, in the full information case, known to the auctioneer. In the extended model considered here, this assumption is relaxed to analyse the effects of uncertainty over the parameters.
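To make the protocol concrete, the following sketch simulates one run of the single-offer TLA protocol given a sequence of offers and independently drawn valuations. It is an illustration only: the function names, the choice of Python, and the example parameter values are our own assumptions, not part of the model specification.

    import random

    def run_tla(offers, valuations, v0=0.0):
        # Simulate one single-offer TLA: each buyer in turn is offered a price,
        # and a truthful buyer accepts any offer below its private valuation.
        # Returns (closing_index, seller_utility); closing_index = 0 means the item is retained.
        for i, (o_i, v_i) in enumerate(zip(offers, valuations), start=1):
            if v_i > o_i:
                return i, o_i - v0      # auction closes at price o_i
        return 0, 0.0                   # all offers rejected; no gain for the auctioneer

    # Example: n = 8 bidders with valuations drawn i.i.d. from a uniform distribution on [0, b]
    b_true = 1.3
    valuations = [random.uniform(0.0, b_true) for _ in range(8)]
    offers = sorted([random.uniform(0.0, b_true) for _ in range(8)], reverse=True)  # offers decrease
    closing_index, utility = run_tla(offers, valuations)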

3.1 The Full Information Case

In the single-offer TLA protocol, the decision regarding whether to accept or reject an offer depends only on a buyer's own valuation. As such, each buyer's dominant strategy is to act truthfully and accept any offer that is less than its valuation. Consequently, the probability of the auction closing on a particular offer is simply the probability that the current bidder i has a valuation v_i greater than o_i, multiplied by the probability that all previous offers were rejected. Thus, the probability of closing on offer i, P(o_{w=i}), given valuation distribution F(v), and given that all i - 1 previous offers have been rejected, is:

    P(o_{w=i}) = [1 - F(o_i)] \prod_{j=1}^{i-1} F(o_j),   1 \le i \le n.   (1)

Then, the probability that the item is not allocated and is retained by the auctioneer is the probability that every bidder has a valuation lower than the offer it is made:

    P(o_{w=0}) = \prod_{i=1}^{n} F(o_i).   (2)

So, under Sandholm and Gilpin's model (recalling that o_0 = v_0), the expected utility of an auctioneer setting offer levels at the beginning of a round to maximise revenue in a TLA is:

    E[U(o)] = \sum_{i=0}^{n} P(o_{w=i}) o_i.   (3)

    π ← v_0
    for i = n:1
        o_i ← argmax_o [1 - F(o)] o + F(o) π
        π ← [1 - F(o_i)] o_i + F(o_i) π

Figure 1: Pseudo-code for Sandholm and Gilpin's algorithm for calculating the optimal offers in the single-offer TLA.

Sandholm and Gilpin construct the algorithm shown in figure 1 to generate offers that maximise this expression. Working backwards from the nth offer, note that the expected value of an offer is determined only by its own value and those of the offer levels after it. Consequently, the auctioneer can implement a simple algorithm to solve for the optimal offers. To begin, the auctioneer sets a virtual reserve price, π, equal to its own valuation (i.e. π = v_0). It then sets the last offer level, o_n, to maximise its expected revenue given π, recalculates the virtual reserve price using this offer level, and computes the next highest offer, o_{n-1}. By backward induction, as each offer is optimal at each step of the auction, the algorithm produces optimal offer levels, and offers naturally decrease over time. Examples of offer levels generated by this method are given in figure 3, and are discussed in section 4.1.
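As an aside, the backward induction of figure 1 is straightforward to sketch in code. The following is our own illustration for a known valuation distribution; the grid search over offer values (rather than an exact one-dimensional maximisation) and the helper names are assumptions made here for brevity.

    def optimal_offers_full_info(n, F, v0=0.0, grid=None):
        # Backward induction for the single-offer TLA with a known valuation distribution F.
        # Returns offers o_1..o_n in the order they are made (highest first).
        if grid is None:
            grid = [k / 1000.0 for k in range(1, 2001)]   # crude search grid (assumption)
        pi = v0                                           # virtual reserve price
        offers_backwards = []
        for _ in range(n):
            # each offer trades off immediate acceptance against the continuation value pi
            o = max(grid, key=lambda o: (1.0 - F(o)) * o + F(o) * pi)
            pi = (1.0 - F(o)) * o + F(o) * pi
            offers_backwards.append(o)
        return list(reversed(offers_backwards))

    # Example: uniform valuations on [0, b] with b known to be 1.3
    b = 1.3
    offers = optimal_offers_full_info(n=8, F=lambda v: min(max(v / b, 0.0), 1.0))

Because the continuation value π rises at every step of the backward pass, the offers it produces decrease over the course of the auction, as noted above.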

3.2 The Incomplete Information Case

The full information model assumes the auctioneer has perfect knowledge of the distribution of bidders' valuations (i.e. knows the values of the parameters of the bidders' valuation distribution with complete certainty). This is clearly unrealistic. Thus, the model we consider is the more general case where the vector of parameters, θ, describing the bidders' valuation distribution, F(v, θ), is not known to the auctioneer. In the case of uncertainty in the parameter values, we represent the auctioneer's initial uncertainty about θ with a prior P(θ). This represents an initial assumption as to which values of θ are most likely to occur.

We illustrate this general approach by considering two example valuation distributions. Firstly, the bidders' valuations may be drawn from a uniform distribution, where the unknown parameter, b, is the upper limit of the support of the distribution:

    F(v) = \frac{v - a}{b - a}.

The auctioneer's initial uncertainty about b may be represented by a uniform prior with support [\underline{b}, \overline{b}].¹ Secondly, the bidders' valuations may be drawn from an exponential distribution, where the unknown parameter is the value of the mean of the distribution, 1/α:

    F(v) = 1 - e^{-\alpha v}.

Again, the auctioneer's initial uncertainty about the value of α may also be represented by a uniform prior, α ∈ [\underline{\alpha}, \overline{\alpha}]. We will illustrate how these specific forms of the valuation distribution affect the offer levels implemented to maximise revenue and information gain in section 7.1. These two examples have been chosen because they frequently appear in the literature (e.g. [13, 12, 4, 11]), and uncertainty in valuations drawn from these distributions is easily represented by uncertainty in a single parameter value. However, the general approach may be used to learn distributions with multiple parameters, or may be used in conjunction with Bayesian model selection techniques [8].

¹ Our use of uniform priors is reasonable because, in economic scenarios, there are almost always bounds on the likely valuations bidders will hold for an item. This information is captured in the range of the prior.

4. EXPECTED REVENUE CRITERION

Uncertainty in the value of θ requires a new method to maximise the expected revenue of the auction. First, a change of notation is necessary to explicitly consider uncertainty in the parameters. Thus, given a valuation distribution F(v, θ) and a set of offers o, let the probability of closing on offer o_i, after i - 1 other offers have been rejected, or of not closing, o_0, be written as:

    P(o_{w=i} | \theta, o) = \begin{cases} [1 - F(o_i, \theta)] \prod_{j=1}^{i-1} F(o_j, \theta), & 1 \le i \le n, \\ \prod_{j=1}^{n} F(o_j, \theta), & i = 0. \end{cases}   (4)

In order to maximise expected revenue, the auctioneer can integrate out uncertainty in the parameter values, using P(θ) to moderate the value of the expected revenue for each value of θ. Thus, the expected utility of an auctioneer interested in maximising revenue by setting offers, o, is:

    E[U(o)] = \int P(\theta) \sum_{i=0}^{n} P(o_{w=i} | \theta, o) \, o_i \, d\theta.   (5)

Note that in practice we perform these calculations numerically by considering a discrete approximation of P(θ) from \underline{\theta} to \overline{\theta}. In section 4.1 we describe how this criterion is implemented, provide an algorithm to maximise it, and compare the offers generated by this criterion to those generated by Sandholm and Gilpin's method (from section 3.1).

    d ← ∞
    while d > stopping condition
        for i = n:1
            o'_i ← argmax_{o_i} E[U(o)]
        d ← 0
        for i = 1:n
            d ← max(d, |o'_i - o_i|)
            o_i ← o'_i

Figure 2: Pseudo-code algorithm for calculating solutions for optimal offer levels in the extended model.

4.1 Maximum Expected Revenue Algorithm

In order to find the optimal offer levels, we must find the values o that maximise equation (5). However, as we were unable to find an analytical solution to this expression, we use a numerical algorithm. When there is no uncertainty in the parameter value, the value of each revenue maximising offer is dependent only on the offers that follow it, so the problem may be solved by the algorithm shown in figure 1. However, this is not the case when uncertainty exists. Thus, an optimisation algorithm based on Jacobi iteration is used. Specifically, while fixing all other offers, we find the value of o_i that maximises equation (5). This can be found using any well established one-dimensional method (such as golden section search). We sequentially update all o_i and iterate the process until the offer levels converge to the necessary accuracy. We present this numerical algorithm in pseudo-code in figure 2, and note that E[U(o)] represents the revenue expression shown in equation (5). Also note its relationship to Sandholm and Gilpin's algorithm: the inner loop of our algorithm is analogous to the algorithm presented in figure 1. Whilst our purpose is not to prove the convergence properties of this iterative algorithm, in our experiments it was found to converge reliably and rapidly, with the only constraint on starting conditions being that the offers are in the correct (i.e. descending) order.

We compare the offers generated by Sandholm and Gilpin's full information algorithm to those generated by our method for the case of a uniform prior distribution over the parameters of both the uniform and exponential valuation distributions. That is, we compare offers calculated in the full information case at discrete values of b and α over the range 0.2 to 2.0 with those calculated by our method using a uniform prior with support [0.2, 2.0]. Results are shown in figure 3. Note that, by averaging over θ, the density pattern of offer levels for an exponential distribution is markedly changed. In the full information case, the offer levels are closer together toward the top of the range, whereas in the incomplete information case, by virtue of integrating over P(θ), they are more closely spaced at the bottom of the range.

Figure 3: Maximum expected revenue offers, assuming full information, or incomplete information with an uninformative prior. (Panels: offers for the uniform valuation distribution plotted against b, and for the exponential valuation distribution plotted against α, comparing full information offers at discrete parameter values with the incomplete information offers.)
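For concreteness, a minimal sketch of this procedure follows, assuming a discretised prior over θ and a simple grid search in place of golden section search; the function and variable names are ours, not from the paper.

    def expected_revenue(offers, theta_grid, prior, F, v0=0.0):
        # Equation (5) with a discrete approximation of P(theta).
        # F(v, theta) is the bidders' valuation distribution for a given theta;
        # prior is a list of weights over theta_grid that sums to one.
        total = 0.0
        for theta, p_theta in zip(theta_grid, prior):
            survive = 1.0                                  # probability that all earlier offers were rejected
            for o in offers:
                total += p_theta * survive * (1.0 - F(o, theta)) * o
                survive *= F(o, theta)
            total += p_theta * survive * v0                # i = 0 term: item retained at value v0
        return total

    def maximise_expected_revenue(n, theta_grid, prior, F, offer_grid, tol=1e-4):
        # Jacobi-style iteration of figure 2: repeatedly re-optimise each offer with the others fixed.
        lo, hi = min(offer_grid), max(offer_grid)
        offers = [hi - (hi - lo) * i / (n + 1) for i in range(1, n + 1)]   # descending start (assumption)
        d = float("inf")
        while d > tol:
            new_offers = list(offers)
            for i in range(n - 1, -1, -1):
                new_offers[i] = max(offer_grid, key=lambda o: expected_revenue(
                    offers[:i] + [o] + offers[i + 1:], theta_grid, prior, F))
            d = max(abs(a - b) for a, b in zip(new_offers, offers))
            offers = new_offers
        return offers

With theta_grid spanning [0.2, 2.0], a uniform prior, and an offer_grid covering the same range, this should approximate the incomplete information offers discussed above, with accuracy governed by the grid resolutions.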

5. ESTIMATING AUCTION PARAMETERS

In section 3.2 of this paper we extended the model of TLAs to consider uncertainty in the parameters that describe the bidders' valuation distribution. Now, under the commonly made assumption that the participants in each auction are drawn from a large pool of potential bidders whose valuations are described by the fixed distribution F(v) (e.g. [12, 4, 11]), the auctioneer can use the information contained in the closing price to refine its estimate of the unknown auction parameters. In order to compare our results to Sandholm and Gilpin's, we only consider the case where the auctioneer learns at the end of each auction.² Given this, we use Bayesian inference to estimate parameter values via the expressions for the probability of closing on a particular offer given in equation (4). Bayesian inference is particularly suitable in this context because, while it generally converges to the same result as maximum-likelihood statistical methods, rather than providing a single parameter estimate at each iteration, it provides a full distribution that describes the auctioneer's belief over the entire range of possible parameter values. The shape of this distribution indicates the confidence that the auctioneer should have in its current estimate [8]. In addition, Bayesian inference is often computationally simpler than maximum likelihood methods, since it does not require us to maximise a function over several dimensions.

² However, it is conceivable that the auctioneer may update its beliefs over the unknown parameters after each offer is accepted or rejected.

By this approach, the auctioneer updates its joint distribution over the unknown parameters θ using Bayes' theorem:

    P(\theta | o_{w=i}, o) = \frac{P(o_{w=i} | \theta, o) \, P(\theta)}{P(o_{w=i} | o)}.   (6)

In this expression, P(θ) is the auctioneer's prior belief (i.e. a distribution containing any information about bidders' valuations that is known before any observations have been made), typically initialised as a uniform distribution. P(o_{w=i} | θ, o) is the probability of the auction closing on offer o_{w=i} given a value of θ and offers o. P(o_{w=i} | o) is a normalisation factor which ensures that P(θ | o_{w=i}, o) sums to one over the range of possible values of θ. This describes the case where the auctioneer has made an observation of a single auction. In general, if T such auctions have been observed, the auctioneer can use all of this evidence to improve its estimate. Thus, if the offer levels used in auction t ∈ {1, . . . , T} are o^t = o^t_1, . . . , o^t_n, all the levels used in T auctions were O = {o^1, . . . , o^T}, and the sequence of observed closing prices was o_w = o^1_{w=i}, . . . , o^T_{w=i}, we have:

    P(\theta | o_w, O) = \frac{\prod_{t=1}^{T} P(o^t_{w=i} | \theta, o^t) \, P(\theta)}{\int \prod_{t=1}^{T} P(o^t_{w=i} | \theta, o^t) \, P(\theta) \, d\theta}.   (7)

Again, in this expression the denominator is a normalising factor that ensures that P(θ | o_w, O) sums to one over the range of possible values of θ.

The auctioneer adopts the following procedure to learn the unknown parameter values. Using its prior belief, it selects offers for the first auction using equation (5). After observing the closing price of this auction, it uses the expression in equation (7) to update its belief over the parameters θ. This belief distribution is then used as P(θ) to calculate the offers for the next auction. The process of refining the estimate of θ and implementing new offer levels is repeated for subsequent auctions. In the next section we examine how refining its estimate of P(θ | o_w, O) affects the offer levels implemented by the auctioneer.
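A minimal sketch of this update over a discretised θ grid follows; the helper names and the discretisation are our own assumptions. Note that, because equation (7) factorises over auctions, applying the single-auction update after every auction, with the previous posterior as the new prior, yields the same cumulative posterior.

    def closing_probability(i, offers, theta, F):
        # P(o_{w=i} | theta, o) from equation (4); i = 0 means the item was not sold.
        if i == 0:
            prob = 1.0
            for o in offers:
                prob *= F(o, theta)
            return prob
        prob = 1.0 - F(offers[i - 1], theta)      # bidder i accepts offer o_i
        for o in offers[:i - 1]:                  # all earlier bidders rejected their offers
            prob *= F(o, theta)
        return prob

    def bayes_update(prior, theta_grid, offers, closing_index, F):
        # One observation of a TLA outcome: equation (6) on a discrete theta grid.
        posterior = [p * closing_probability(closing_index, offers, theta, F)
                     for p, theta in zip(prior, theta_grid)]
        z = sum(posterior)                        # discrete analogue of the normalising denominator
        return [p / z for p in posterior]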

5.1 Comparisons to the Full Information Case

As the auctioneer makes more observations, its estimate of the unknown parameters converges. Specifically, looking at how the offer levels change as the estimate of the unknowns converges reveals the relationship between our model (with learning) and that of Sandholm and Gilpin. As the estimate of P(θ) converges, the offers generated by maximising equation (5) approach the offers implemented by Sandholm and Gilpin's algorithm for known parameter values. In particular, figure 4 shows that the offers generated by our criterion converge to the offers generated by Sandholm and Gilpin's algorithm as the posterior estimate converges to the true parameter value. In these simulations, the true value of the upper limit of the support of the uniform distribution is b = 1.3, and the inverse of the mean of the exponential distribution is α = 1.3. Note that after 20 repetitions there is still significant uncertainty in the beliefs; consequently, the offers are close to, but not identical to, those generated by Sandholm and Gilpin's algorithm.

6. EXPECTED INFORMATION GAIN CRITERION

In section 5, the auctioneer learns θ by observing the behaviour of bidders in auctions designed to maximise revenue and, by this process, improves the average revenue generated by the auction. However, if the auctioneer had better estimates of θ earlier in the sequence of auctions, it would be able to implement better offers and earn greater revenue in subsequent auctions. To this end, in this section we address the problem of setting offer levels specifically so that the auctioneer can learn in the most effective way possible. By doing so, we can then answer the associated question of whether more rapid learning in the earlier stages of a series of auctions will increase the revenue earned over the longer term.

The gain in information about the parameters θ obtained by selecting an experimental (auction) design o = {o_1, . . . , o_n}, observing an outcome o_{w=i}, and then using it to update the estimate of θ, may be measured by the KL divergence between the prior and posterior distributions of θ:

    D_{KL}[P(\theta | o_{w=i}, o) \,||\, P(\theta)] = \int P(\theta | o_{w=i}, o) \log \left( \frac{P(\theta | o_{w=i}, o)}{P(\theta)} \right) d\theta.

However, as the outcome has not yet been observed, the expected amount of information gained by holding an auction is the average KL divergence over all possible outcomes in o, as each outcome will result in a different posterior estimate. Thus, the expected utility of an auctioneer interested in maximising the information it gains about the bidders' valuation distribution is:

    E[U(o)] = \sum_{i=0}^{n} P(o_{w=i} | o) \int P(\theta | o_{w=i}, o) \log \left( \frac{P(\theta | o_{w=i}, o)}{P(\theta)} \right) d\theta.   (8)

We now describe how we implement this criterion.

Figure 4: Offers converge to those generated by Sandholm and Gilpin's method (dots) as the parameter estimate converges. (Panels show, for the uniform and exponential valuation distributions, the posterior probabilities P(b | o_w, O) and P(α | o_w, O) after 2, 5, 10 and 20 auctions, plotted against the upper limit of support b and the inverse mean valuation α respectively, together with the offer levels over 20 repeated auctions under full and incomplete information.)

6.1 Maximum Expected KL Divergence Algorithm

Equation (8) is in a clumsy form that requires computing all potential posterior distributions generated by the different outcomes of the auction. By applying Bayes' theorem twice, the following expression can be implemented in our maximum expected KL divergence algorithm:

    E[U(o)] = \sum_{i=0}^{n} P(o_{w=i} | o) \int P(\theta | o_{w=i}, o) \log \left( \frac{P(o_{w=i} | \theta, o) \, P(\theta)}{P(o_{w=i} | o) \, P(\theta)} \right) d\theta
            = \sum_{i=0}^{n} \int P(o_{w=i} | \theta, o) \, P(\theta) \log \left( \frac{P(o_{w=i} | \theta, o) \, P(\theta)}{P(o_{w=i} | o) \, P(\theta)} \right) d\theta
            = \sum_{i=0}^{n} \int P(o_{w=i} | \theta, o) \, P(\theta) \log \left( \frac{P(o_{w=i} | \theta, o)}{P(o_{w=i} | o)} \right) d\theta.   (9)

One convenient aspect of equation (9) is that it may also be maximised using the Jacobi iteration algorithm used to maximise the expected revenue criterion (figure 2, discussed in section 4.1). In this case, E[U(o)] represents the expected KL divergence given by equation (9).
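To illustrate, equation (9) can be evaluated on the same discretised θ grid and handed to the Jacobi-style optimisation above in place of the expected revenue; this sketch reuses the closing_probability helper from the section 5 sketch, and all names are our own.

    import math

    def expected_kl_gain(offers, theta_grid, prior, F):
        # Equation (9) on a discrete theta grid: expected KL divergence between posterior and prior.
        total = 0.0
        for i in range(len(offers) + 1):          # i = 0 is the 'item not sold' outcome
            likelihoods = [closing_probability(i, offers, theta, F) for theta in theta_grid]
            marginal = sum(l * p for l, p in zip(likelihoods, prior))   # P(o_{w=i} | o)
            if marginal == 0.0:
                continue
            for l, p in zip(likelihoods, prior):
                if l > 0.0 and p > 0.0:
                    total += l * p * math.log(l / marginal)
        return total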

7. COMPARING THE CRITERIA

In this section we compare how setting offers using either the expected revenue or the expected information gain criterion affects the performance of the Bayesian machine learning algorithm and the revenue generated by the auction. As before, we consider a uniform distribution with uncertainty about the upper boundary of the support, and an exponential distribution with uncertainty about the value of the mean of the distribution. We first compare offers set to maximise expected information gain and expected revenue, generated from a (relatively uninformative) uniform prior over unknown parameter values. We then implement these offers in a simulated TLA, and compare how our Bayesian inference algorithm learns and how much revenue is generated. The results are then used to explore the viability of foregoing revenue at the start of a series of auctions in order to learn faster, and so earn greater revenue in later stages as a result of better estimates of the unknowns.
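As a rough outline only, a comparison of this kind can be organised as a simple simulation loop that ties together the earlier sketches (run_tla, bayes_update and an offer-setting criterion such as maximise_expected_revenue); the scaffolding below, including the argument names, is entirely our own assumption and is not the experimental code used for the results reported here.

    def simulate_sequence(set_offers, n_auctions, n_bidders, theta_true,
                          theta_grid, prior, F, sample_valuation, offer_grid):
        # Run a sequence of TLAs, learning theta from each closing price (illustrative only).
        revenues, belief = [], list(prior)
        for _ in range(n_auctions):
            offers = set_offers(n_bidders, theta_grid, belief, F, offer_grid)   # either criterion
            valuations = [sample_valuation(theta_true) for _ in range(n_bidders)]
            closing_index, revenue = run_tla(offers, valuations)
            revenues.append(revenue)
            belief = bayes_update(belief, theta_grid, offers, closing_index, F)
        return revenues, belief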

7.1 The Offer Levels

We first compare the offers that maximise each design criterion. Figure 5 shows the differences between the offers generated by each approach, for four, eight and twelve bidders. Note that in all cases, the lowest offer (effectively the reserve price) of the revenue maximising auction is lower than the corresponding information gain offer, as the auctioneer does not earn any revenue if the auction closes without the item being allocated, but does learn something about the bidders' valuations. Hence, an auctioneer interested in revenue maximisation needs to implement a low reserve price (final offer) to guarantee some utility from holding the auction, whereas an auctioneer interested in maximising the information yielded by the auction learns from any outcome. For both distributions, the presence of additional buyers acts to expand the range of the revenue maximising offers by much more than in the information gain maximising case, particularly with an exponential valuation distribution. Intuitively, an auctioneer interested in maximising revenue should use additional offers to increase the value of its first offer, since even if acceptance of that offer is a rare event, a high revenue is generated when it occurs. For the information gain case, there is no overall benefit from making such a rare observation.

Figure 5: Optimal offers for DKL (solid) and revenue (dotted) maximising criteria, for 4, 8 and 12 potential buyers. (Panels: bid levels for the uniform and exponential valuation distributions, grouped by n = 4, 8 and 12.)

7.2 Learning Performance

We now compare the expected KL divergence of TLAs with either information gain maximising or revenue maximising offers. To this end, assuming uniform priors with support [0.2, 2.0], we generate offers using each criterion and then calculate the expected KL divergence between the prior and posterior after running one auction. We do this for true parameter values over the same range as the prior and show the results in figure 6. In particular, note that the information gain maximising levels do not produce a superior expected KL divergence across the entire range. This is not surprising, as the information gain method maximises the expected KL divergence averaged over the prior on θ, rather than the divergence expected at any particular true parameter value; but what it does show is that setting offers to maximise the expected revenue of the TLA produces a comparable amount of additional information to the maximum expected information gain offers. Furthermore, results from 100,000 repetitions of the auction indicated no statistically significant difference between the expectation or the maximum likelihood estimate of the unknown parameter generated from offers set using either criterion. The consequence is that setting offers to maximise information gain rather than revenue provides only a small improvement in the accuracy of the posterior estimate produced by the learning algorithm.

Figure 6: Expected DKL for DKL (solid) and revenue (dotted) maximising offers. (Panels: expected KL divergence for the uniform valuation distribution plotted against b, and for the exponential valuation distribution plotted against α.)

7.3 Revenue Generation

What is of real concern to the auctioneer is whether or not the information gain maximising offers produce a refined posterior estimate that can be used to generate greater revenue in subsequent auctions. We address this question by looking at the effects of learning from either approach on the revenue generated if the auction were run a second time. In all of the following, simulations consider the case of n = 8 bidders. In the first auction, offers are generated from an ignorant prior to either maximise expected revenue or expected information gain, as in the previous section. In the second auction, both sets of offers maximise expected revenue; however, one set is generated using the posterior of the information gain maximising auction, while the other is generated using the posterior of the revenue maximising auction. The rationale for this experiment is that it is the limiting case: in the first auction the auctioneer has the most to learn, while the second auction presents the greatest potential to exploit the additional information to earn additional revenue.

Average revenue - uniform valuation distribution
  Offers maximising     1st auction    2nd auction
  DKL, revenue          0.5804         0.6846
  Revenue, revenue      0.6126         0.6846

Average revenue - exponential valuation distribution
  Offers maximising     1st auction    2nd auction
  DKL, revenue          1.393          1.723
  Revenue, revenue      1.472          1.720

Figure 7: Revenue comparison: first auction revenue generated from DKL and revenue maximising offers; second auction revenue generated from revenue maximising offers using the refined prior from the outcome of the first auction based on DKL or revenue maximising offers. Average values for each are shown in the tables above.

Figure 7 shows that, in the first auction, the revenue maximising offers generate significantly more revenue than those set to maximise information gain. The average expected revenue for the uniform distribution is 0.6126 for the revenue maximising offers and 0.5804 for the information gain maximising offers. For the exponential distribution, the values are 1.472 and 1.393, respectively. Thus, in setting the offers to maximise expected information gain, the auctioneer has to forgo a significant amount of revenue, and this is particularly so in the case of the exponential valuation distribution. To justify this action, subsequent auctions must be able to make up the difference. However, figure 7 shows that the revenue generated by the second auction is only slightly affected by differences in the parameter estimates used (i.e. at the fourth significant figure). That is, the refinement to the prior produced by using information gain maximising offers in the first auction does not generate enough additional revenue in the second auction to compensate, relative to using revenue maximising offer levels in both auctions. Furthermore, even if this extremely small benefit were maintained in subsequent auctions, unless a very long horizon is used (i.e. > 40 repetitions, without discounting), the future benefits of more rapid learning will not overcome the revenue foregone in the very first auction. On the other hand, the results presented above indicate that setting offers in TLAs to maximise revenue does not significantly detract from learning performance.

8. CONCLUSIONS

In this paper we extended an existing model of TLAs to consider uncertainty in the value of the parameters describing the bidders' valuation distribution. We derived two criteria: one that maximises the expected revenue of the auction given this parameter uncertainty, and one that maximises the expected information gained about the unknown parameter values by holding the auction. We used these criteria to test whether, by adopting a strategy of learning the unknown parameters more quickly, an auctioneer could increase the long-term revenue generated by the entire series of auctions. Our results show that a TLA optimised to earn revenue reveals close to the same amount of information as a TLA designed specifically for that purpose, and so allows the auctioneer to learn almost as rapidly as if it had optimised the auction specifically to learn. However, the benefits of learning slightly more, earlier in a series of auctions, do not manifest themselves in significantly more revenue. As such, the revenue forgone by an auctioneer who implements offer levels to maximise the expected information gain of a TLA is not recouped by any additional revenue in future auctions.

Future work on this topic involves, firstly, showing that these results can be generalised to other auction formats (e.g. discrete bid auctions). Secondly, the criteria may be used in scenarios characterised by more uncertainty, for example valuation distributions with more than one parameter, asymmetric bidders, or situations where the auctioneer has to decide between a selection of statistical models. These situations involve more for the auctioneer to learn and are therefore a stronger test of the results presented here.

9. REFERENCES

[1] J. M. Bernardo. Expected information as expected utility. The Annals of Statistics, 7:686-690, 1979.
[2] J. Bredin and D. C. Parkes. Models for truthful online double auctions. In Proceedings of the 21st Annual Conference on Uncertainty in Artificial Intelligence (UAI-05), pages 50-59, Arlington, Virginia, 2005. AUAI Press.
[3] K. Chaloner and I. Verdinelli. Bayesian experimental design: A review. Statistical Science, 10:273-304, 1995.
[4] V. Conitzer and N. Garera. Online learning algorithms for online principal-agent problems (and selling goods online). In Proceedings of the 23rd International Conference on Machine Learning (ICML '06), 2006.
[5] M. He, N. R. Jennings, and H.-F. Leung. On agent-mediated electronic commerce. IEEE Transactions on Knowledge and Data Engineering, 15:985-1003, 2003.
[6] A. X. Jiang and K. Leyton-Brown. Bidding agents for online auctions with hidden bids. Machine Learning, 67(1-2):117-143, 2007.
[7] D. V. Lindley. On a measure of the information provided by an experiment. Annals of Mathematical Statistics, 27:986-1005, 1956.
[8] D. J. C. MacKay. Information Theory, Inference and Learning Algorithms. Cambridge University Press, 2003.
[9] T. Mitchell and M. Morris. Bayesian design and analysis of computer experiments: two examples. Statistica Sinica, 2:359-379, 1992.
[10] R. Myerson. Optimal auction design. Mathematics of Operations Research, 6(1):58-73, 1981.
[11] D. Pardoe, P. Stone, M. Saar-Tsechansky, and K. Tomak. Adaptive mechanism design: A metalearning approach. In Proceedings of the Eighth International Conference on Electronic Commerce (ICEC '06), 2006.
[12] A. Rogers, E. David, J. Schiff, S. Kraus, and N. R. Jennings. Learning environmental parameters for the design of optimal English auctions with discrete bid levels. In H. La Poutré, N. Sadeh, and S. Janson, editors, Agent-Mediated Electronic Commerce. Designing Trading Agents and Mechanisms: AAMAS 2005 Workshop, AMEC 2005, pages 1-15, Utrecht, The Netherlands, July 2005. Springer.
[13] M. H. Rothkopf and R. Harstad. On the role of discrete bid levels in oral auctions. European Journal of Operational Research, 74:572-581, 1994.
[14] T. Sandholm and A. Gilpin. Sequences of take-it-or-leave-it offers: Near-optimal auctions without full valuation revelation. In Proceedings of AAMAS '06, Hakodate, Hokkaido, Japan, May 2006.
[15] M. C. Shewry and H. P. Wynn. Maximum entropy sampling. Journal of Applied Statistics, 14:165-170, 1987.
[16] R. Wilson. Incentive efficiency of double auctions. Econometrica, 53:1101-1115, 1985.