Fast gradient descent method for mean-CVaR optimization

Garud Iyengar



Alfred Ka Chun Ma



February 27, 2009

Abstract We propose an iterative gradient descent procedure for computing approximate solutions for the scenario-based mean-CVaR portfolio selection problem. This procedure is based on an algorithm proposed by Nesterov [13] for solving non-smooth convex optimization problems. Our procedure does not require any linear programming solver and in many cases the iterative steps can be solved in closed form. We show that this method is significantly superior to the linear programming approach as the number of scenarios becomes large.

1 Introduction

The goal of portfolio selection is to distribute a fixed amount of capital over a given set of investment opportunities to maximize “return” while managing the “risk”. Although the benefits of diversification were well known, the first mathematical model for portfolio selection was proposed by Markowitz [10]. In the Markowitz model, the “return” of a portfolio is given by the expected return of the portfolio and the “risk” of the portfolio is measured by the variance of the return of the portfolio. The variance is a good measure of risk only if the returns are symmetric. The returns on equity, at least for short time horizons, can be approximated by a Normal random variable; consequently, the variance is an adequate measure of the risk in an equity portfolio. However, when the distribution of the returns of the underlying assets is not symmetric, variance is not an adequate risk measure.

Recently, Conditional Value-at-Risk (CVaR) [15] has been proposed as a risk measure for asset classes that have asymmetric return distributions. CVaR has many nice properties: it is a coherent risk measure [4]; Rockafellar and Uryasev [14] show that the CVaR of a portfolio can be computed from scenarios by solving a linear program (LP); using LP duality, CVaR upper bound constraints can be formulated as linear constraints; and empirical studies suggest that the mean-CVaR approach, where the portfolio return is given by its expected return and the portfolio risk is given by the CVaR of the portfolio, is more appropriate than the mean-variance approach if the risk-return relation is nonlinear [1].

From the results in Rockafellar and Uryasev [14], it follows that the mean-CVaR portfolio selection problem reduces to an LP. However, the resulting LP is very ill-conditioned, and solving such an LP, particularly when the number of scenarios is large, is very difficult in practice [2]. We adapt a gradient descent method proposed by Nesterov [13] to solve the mean-CVaR optimization problem. The method we propose does not require solving an LP and can therefore potentially handle a very large number of scenarios. In addition, the method is easy to implement. These features imply that a portfolio manager can use our method without installing any third-party LP solver. We also show how to incorporate analysts’ views into the mean-CVaR portfolio selection problem [5, 6].

∗ Department of Industrial Engineering and Operations Research, Columbia University, New York, NY 10027. Email: [email protected]
† Department of Industrial Engineering and Operations Research, Columbia University, New York, NY 10027. Email: [email protected]

2 Mean-CVaR optimization

Suppose there are $n$ assets in the market. Let $R \in \mathbb{R}^n$ denote the random returns on the $n$ assets. Let $w \in \mathbb{R}^n$ denote the portfolio of the investor, i.e., $\mathbf{1}^T w = \sum_{i=1}^n w_i = 1$. The $\mathrm{CVaR}_{1-\beta}(-Rw)$ at the probability $\beta \in (0, 1)$ of the portfolio $w$ is defined as

$$\mathrm{CVaR}_{1-\beta}(-Rw) = E_P\big[-Rw \mid Rw \le F_{Rw}^{-1}(\beta)\big],$$

where $-Rw$ denotes the loss on the portfolio $w$, and $F_{Rw}$ denotes the cumulative distribution function (CDF) of the random variable $Rw$. Thus, the CVaR is the conditional expectation of the loss over the lowest $\beta$-quantile of the random portfolio return. The mean-CVaR portfolio selection problem we consider is as follows:

$$\min_{w \in W} \ \mathrm{CVaR}_{1-\beta}(-Rw), \tag{1}$$

where the set $W$ is the set of all feasible portfolios $w$. For example, by setting

$$W = \big\{ w : E_P[R]^T w = r,\ \mathbf{1}^T w = 1 \big\},$$

where $E_P[R]$ denotes the expected returns on the assets, one recovers the canonical mean-CVaR portfolio selection problem where the goal is to select the minimum CVaR portfolio that has a target return $r$. Rockafellar and Uryasev [14, 15] show that

$$\mathrm{CVaR}_{1-\beta}(-Rw) = \min_{\tau} \Big\{ \tau + \frac{1}{\beta} E_P(-Rw - \tau)^+ \Big\}, \tag{2}$$

where $1 - \beta$ is the confidence level and the function $(x)^+ = \max(x, 0)$. It is typically very hard to explicitly characterize the distribution of the returns $R$, and therefore, in practice, $E_P(-Rw - \tau)^+$ is approximated by using return vectors $R$ generated by some scenario generator [8]. Let $\{R_i : i = 1, \ldots, N\}$ denote $N$ scenarios and let $p_i$, $i = 1, \ldots, N$, denote the probability of the $i$-th scenario. Then the expectation in (2) can be approximated as follows:

$$E_P(-Rw - \tau)^+ \approx \sum_{i=1}^N p_i \,(-R_i^T w - \tau)^+.$$
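To make the scenario approximation of (2) concrete, the following is a minimal sketch in plain Python. The function names `portfolio_losses` and `scenario_cvar` are illustrative and do not come from the paper. The sketch relies on the fact that the objective in (2) is convex and piecewise linear in $\tau$ with kinks at the scenario losses, so it is enough to evaluate it at those points.

```python
def portfolio_losses(scenarios, w):
    """Loss -R_i^T w of portfolio w in each scenario (rows of `scenarios`)."""
    return [-sum(r_ij * w_j for r_ij, w_j in zip(row, w)) for row in scenarios]


def scenario_cvar(losses, probs, beta):
    """Scenario-based CVaR_{1-beta}: min over tau of
    tau + (1/beta) * sum_i p_i * max(loss_i - tau, 0)."""
    def objective(tau):
        tail = sum(p * max(loss - tau, 0.0) for loss, p in zip(losses, probs))
        return tau + tail / beta

    # The objective is convex and piecewise linear in tau, with kinks only at
    # the scenario losses, so a minimizer is attained at one of them.
    return min(objective(tau) for tau in losses)


if __name__ == "__main__":
    # Four equally likely losses; with beta = 0.5 the CVaR is the average of
    # the two largest losses.
    print(scenario_cvar([1.0, 2.0, 3.0, 4.0], [0.25] * 4, 0.5))
```

This brute-force evaluation costs $O(N^2)$ operations and is meant only for small illustrations; sorting the losses first would reduce the cost.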

By introducing new variables $a_i \ge (-R_i^T w - \tau)^+$, $i = 1, \ldots, N$, the optimization problem (1) can be reformulated as the linear program (LP)

$$\begin{array}{ll} \min & \tau + \dfrac{1}{\beta} \displaystyle\sum_{i=1}^N p_i a_i \\ \text{s.t.} & a_i \ge -R_i^T w - \tau, \quad i = 1, \ldots, N, \\ & Aw = b, \\ & a \ge 0, \end{array} \tag{3}$$

for $W$ of the form $W = \{ w : Aw = b \}$. The LP (3) is large: it has $O(N)$ constraints and is often very ill-conditioned [2]. Thus, solving the LP (3) becomes very hard as the number of samples $N$ becomes large. See Section 4 for further evidence of the numerical instability of the LP formulation.

Our solution method for the optimization problem (1) is based on the following variational characterization of CVaR [4, 16, 9]:

$$\mathrm{CVaR}_{1-\beta}(-Rw) = \max_{Q \in \mathcal{Q}} E_Q(-Rw), \tag{4}$$

where $Q$ denotes a probability measure on the returns $R$ and the set of measures

$$\mathcal{Q} = \Big\{ Q : 0 \le \frac{\partial Q}{\partial P} \le \frac{1}{\beta} \Big\}.$$

Thus, the mean-CVaR portfolio selection problem (1) can be formulated as the following min-max problem:

$$\min_{w \in W} \max_{Q \in \mathcal{Q}} E_Q(-Rw). \tag{5}$$

This formulation can be thought of as a game played by nature and the portfolio manager. It is then natural to consider iterative methods to solve the mean-CVaR portfolio selection problem. When the distribution $P$ is approximated by $N$ scenarios, the set of measures $\mathcal{Q}$ is given by

$$Q_N = \Big\{ q \in \mathbb{R}^N : \mathbf{1}^T q = 1,\ 0 \le q \le \frac{1}{\beta}\, p \Big\}, \tag{6}$$

where $p = (p_1, \ldots, p_N)^T$ and the inequalities are interpreted component-wise. From now on, we let $R^T = [R_1, \ldots, R_N] \in \mathbb{R}^{n \times N}$ denote the matrix whose $i$-th column is the vector of asset returns in the $i$-th scenario, $i = 1, \ldots, N$. Thus, the scenario-based mean-CVaR problem reduces to the saddle-point problem

$$\min_{w \in W} \max_{q \in Q_N} \big\{ -q^T R w \big\}. \tag{7}$$
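For a fixed portfolio, the inner maximization over $Q_N$ in (7) has a simple greedy solution: nature places the maximum allowed mass $p_i/\beta$ on the scenarios with the largest losses until the total mass reaches one. The sketch below illustrates this; the function names are hypothetical and the code is an illustration of the dual characterization, not the procedure proposed in the paper.

```python
def worst_case_measure(losses, probs, beta):
    """Maximize sum_i q_i * loss_i subject to sum_i q_i = 1 and
    0 <= q_i <= p_i / beta, by filling the largest losses first."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    q = [0.0] * len(losses)
    remaining = 1.0
    for i in order:
        q[i] = min(probs[i] / beta, remaining)
        remaining -= q[i]
        if remaining <= 0.0:
            break
    return q


def worst_case_expected_loss(losses, probs, beta):
    """Value of the inner maximization, i.e. the scenario-based CVaR."""
    q = worst_case_measure(losses, probs, beta)
    return sum(q_i * loss for q_i, loss in zip(q, losses))


if __name__ == "__main__":
    # With four equally likely losses and beta = 0.5, all mass goes to the
    # two largest losses, each receiving p_i / beta = 0.5.
    print(worst_case_measure([1.0, 2.0, 3.0, 4.0], [0.25] * 4, 0.5))
```

The resulting value agrees with the minimization over $\tau$ in (2) on the same data, which is the scenario version of the duality used in the text.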

3 An iterative algorithm

We solve the minimax problem (7) using a gradient-based procedure proposed by Nesterov [13]. This procedure requires that the admissible set of portfolios $W$ be bounded. In practice, there is always a margin requirement on the short positions in the portfolio. Such a margin requirement can be modeled as follows:

$$(1 + M) \sum_i (-w_i)^+ \le \sum_i w_i^+, \tag{8}$$

for some $M > 0$. Since the portfolio weights sum to one, we have

$$1 = \sum_{j=1}^n w_j^+ - \sum_{j=1}^n (-w_j)^+ \ge (1 + M) \sum_{j=1}^n (-w_j)^+ - \sum_{j=1}^n (-w_j)^+ = M \sum_{j=1}^n (-w_j)^+. \tag{9}$$

Therefore, we have

$$\|w\|_1 = \sum_{j=1}^n \big( w_j^+ + (-w_j)^+ \big) = 1 + 2 \sum_{j=1}^n (-w_j)^+ \le 1 + \frac{2}{M}. \tag{10}$$

In order to keep the portfolios $w$ bounded, we will impose constraints of the form $\|w\|_1 \le 1 + 2/M$ or $\|w\|_2 \le \|w\|_1 \le 1 + 2/M$.
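The bound (10) can be checked numerically on small examples. The sketch below uses hypothetical helper names and simply evaluates the margin requirement (8) and the $\ell_1$-norm for a given weight vector.

```python
def long_and_short(w):
    """Return (sum of positive parts, sum of negative parts) of the weights."""
    long_side = sum(max(x, 0.0) for x in w)
    short_side = sum(max(-x, 0.0) for x in w)
    return long_side, short_side


def satisfies_margin(w, M, tol=1e-12):
    """Check the margin requirement (1 + M) * short <= long."""
    long_side, short_side = long_and_short(w)
    return (1.0 + M) * short_side <= long_side + tol


def l1_norm(w):
    return sum(abs(x) for x in w)


if __name__ == "__main__":
    M = 0.5
    w = [3.0, -2.0]  # weights sum to one, short position equals 1 / M
    print(satisfies_margin(w, M), l1_norm(w), 1.0 + 2.0 / M)
```

In this example the short position equals $1/M$, so the margin requirement holds with equality and $\|w\|_1$ attains the bound $1 + 2/M = 5$.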

A naive approach to solving the minimax problem (7) would involve generating iterates $\{(w^{(k)}, q^{(k)})\}$, where $w^{(k)}$ is the best response to nature’s move $q^{(k-1)}$, i.e.,

$$w^{(k)} = \operatorname*{argmin}_{w \in W} \big\{ -(q^{(k-1)})^T R w \big\},$$

and $q^{(k)}$ is the best response to the investor’s move $w^{(k-1)}$, i.e.,

$$q^{(k)} = \operatorname*{argmax}_{q \in Q_N} \big\{ -q^T R w^{(k-1)} \big\}.$$

Although the objective $-q^T R w$ is bilinear, the best-response value $\max_{q \in Q_N} \{ -q^T R w \}$ is not a smooth function of $w$; consequently, this iterative scheme converges very slowly. Nesterov [13] devised a procedure that is able to escape this convergence bottleneck. The Nesterov procedure consists of two steps. The first step is “smoothing” the optimization in $q$. Let $w^{(k)}$ denote the $k$-th iterate. Then the smoothed best response of nature is given by

$$q^{(k)} = \operatorname*{argmax}_{q \in Q_N} \big\{ -q^T R w^{(k)} - \mu\, d_2(q) \big\}, \tag{11}$$

where $\mu > 0$ and $d_2(q)$ is any strongly convex function. We choose

$$d_2(q) = \sum_{i=1}^N \big[ q_i \log q_i + (p_i/\beta - q_i) \log(p_i/\beta - q_i) \big]. \tag{12}$$

In Appendix A, we show that $d_2(q)$ is strongly convex with parameter $\sigma_2 = \frac{1}{1-\beta}$ with respect to the $\ell_1$-norm.

The Lagrangian function $L$ for the optimization in $q$ is given by

$$L(q) = -q^T R w^{(k)} - \mu\, d_2(q) - \alpha(\mathbf{1}^T q - 1) + \sum_{i=1}^N \mu_i q_i - \sum_{i=1}^N \nu_i (q_i - p_i/\beta).$$

Setting $\nabla_q L = 0$, we have that $q^{(k)}$ must satisfy

$$-R_i^T w^{(k)} - \mu \ln\Big( \frac{q_i^{(k)}}{p_i/\beta - q_i^{(k)}} \Big) - \alpha + \mu_i - \nu_i = 0,$$

i.e.,

$$\frac{q_i^{(k)}}{p_i/\beta - q_i^{(k)}} = \exp\Big( \frac{-R_i^T w^{(k)} - \alpha + \mu_i - \nu_i}{\mu} \Big).$$

Thus, it follows that for all values of $(\alpha, \mu, \nu)$, we have that $0 < q^{(k)} < p/\beta$. Therefore, complementary slackness implies that $\mu_i = \nu_i = 0$, and

$$q_i^{(k)} = \frac{\beta^{-1} p_i}{1 + e^{\frac{1}{\mu}(R_i^T w^{(k)} + \alpha)}}, \qquad i = 1, \ldots, N, \tag{13}$$

where $\alpha$ is the solution of the equation

$$\sum_i \frac{\beta^{-1} p_i}{1 + e^{\frac{1}{\mu}(R_i^T w^{(k)} + \alpha)}} = 1. \tag{14}$$

The second step in the Nesterov procedure is to compute the update $w^{(k)}$ using a convex combination of two updates $z^{(k)}$ and $y^{(k)}$ defined as follows:

$$y^{(k)} = \operatorname*{argmin}_{y \in W} \Big\{ -(q^{(k-1)})^T R y + \frac{\Omega}{2\mu\sigma_2} \|y - w^{(k-1)}\|_2^2 \Big\}, \tag{15}$$

$$z^{(k)} = \operatorname*{argmin}_{z \in W} \Big\{ -\sum_{t=0}^{k-1} \Big( \frac{t+1}{2} \Big) (q^{(t)})^T R z + \frac{\Omega}{2\mu\sigma_2} \|z\|_2^2 \Big\}, \tag{16}$$

$$w^{(k)} = \Big( \frac{2}{k+3} \Big) z^{(k)} + \Big( \frac{k+1}{k+3} \Big) y^{(k)}, \tag{17}$$

where

$$\Omega = \max_{\|q\|_1 \le 1} \ \max_{\|w\|_2 \le 1} \big( q^T R w \big)^2 = \max_i \|R_i\|_2^2$$

and $\sigma_2$ is the convexity parameter for the strongly convex function $d_2(q)$. The iterate $y^{(k)}$ is a modified best response where one penalizes large movements from the last response $w^{(k-1)}$. The iterate $z^{(k)}$ in (16) considers all the previous responses $\{ q^{(t)} : t = 0, \ldots, k-1 \}$ to compute the response. The weight on $y^{(k)}$ increases as the iteration count $k$ increases.

When the set $W$ is described by linear equalities, i.e., $W = \{ w : Aw = b \}$, we add the additional constraint $\|w\|_2 \le 1 + 2/M$, and in this case it is easy to show that (15) and (16) can be solved in closed form. When the set $W$ is described by linear inequality constraints, we impose the constraint $\|w\|_1 \le 1 + 2/M$. Then (15) and (16) are quadratic programs that can, in practice, be solved very efficiently using active set methods. Note that each quadratic program encountered in the course of our proposed iterative procedure has $n$ variables and $O(m)$ constraints, where $m$ denotes the number of components in $b$.

Nesterov [13] proves that after $K$ steps the output $(\bar{w}, \bar{q})$ of the algorithm displayed in Figure 1 satisfies

$$\max_{q \in Q_N} \big\{ -q^T R \bar{w} \big\} - \min_{w \in W} \big\{ -\bar{q}^T R w \big\} < \delta_K \triangleq \Big( \frac{D_1 D_2 \Omega}{\sigma_2} \Big)^{1/2} \cdot \frac{1}{K}, \tag{18}$$

i.e., after $K$ iterations the algorithm produces a pair $(\bar{w}, \bar{q})$ of $\delta_K$-optimal policies for the investor and nature, respectively. One can, therefore, terminate the algorithm once one is satisfied with the quality of the portfolio. Moreover, choosing $K \ge \big( \frac{D_1 D_2 \Omega}{\sigma_2} \big)^{1/2} \cdot \frac{1}{\varepsilon}$ ensures that the output of the algorithm is $\varepsilon$-optimal. In our numerical calculations we found that terminating the algorithm using the gap in (18) is much quicker than using the upper bound.

Nesterov Procedure
  $D_1 \leftarrow \frac{1}{2} \big( 1 + \frac{2}{M} \big)^2$, $D_2 \leftarrow \frac{1}{\beta} \big( -\beta \ln(\beta) - (1-\beta) \ln(1-\beta) \big)$, $\sigma_2 \leftarrow \frac{1}{1-\beta}$
  $\Omega \leftarrow \max_i \|R_i\|_2^2$, $K \leftarrow \frac{1}{\varepsilon} \sqrt{\frac{\Omega D_1 D_2}{\sigma_2}}$, $\mu \leftarrow \frac{\varepsilon}{2 D_2}$, $w^{(0)} \leftarrow \frac{1}{n} \mathbf{1}$
  for $k \leftarrow 0$ to $K$ do
    $q^{(k)} \leftarrow \operatorname{argmax}_{q \in Q_N} \big\{ -q^T R w^{(k)} - \mu\, d_2(q) \big\}$
    $y^{(k+1)} \leftarrow \operatorname{argmin}_{y \in W} \big\{ -(q^{(k)})^T R y + \frac{\Omega}{2\mu\sigma_2} \|y - w^{(k)}\|_2^2 \big\}$
    $z^{(k+1)} \leftarrow \operatorname{argmin}_{z \in W} \big\{ -\sum_{t=0}^{k} \big( \frac{t+1}{2} \big) (q^{(t)})^T R z + \frac{\Omega}{2\mu\sigma_2} \|z\|_2^2 \big\}$
    $w^{(k+1)} \leftarrow \big( \frac{2}{k+3} \big) z^{(k+1)} + \big( \frac{k+1}{k+3} \big) y^{(k+1)}$
  return $\bar{w} = y^{(K)}$, $\bar{q} = \sum_{k=0}^{K} \frac{2(k+1)}{(K+1)(K+2)}\, q^{(k)}$

Figure 1: Nesterov Procedure

The main features of this algorithm are as follows.

(a) The modified best responses $y^{(k)}$ and $z^{(k)}$ of the investor are computed by solving a separable quadratic optimization problem that is similar to the mean-variance portfolio selection with uncorrelated assets. This implies that the technology for mean-variance optimization can be used to solve the mean-CVaR problem.

(b) The iterates $(\bar{w}, \bar{q})$ are at least $\delta_K$-optimal, and often the quality of the solution is significantly superior to that implied by the bound. Thus, one can terminate the algorithm at any stage where one obtains a solution of sufficient quality.

(c) In Section 4, we show that this algorithm converges to a reasonably accurate solution with the error $\varepsilon = 10^{-3}$ very quickly even when the number of scenarios $N = 10^6$. Since the scenario-based mean-CVaR problem is itself an approximation to the original problem, solving the scenario-based CVaR very accurately does not serve any purpose.
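The constants that drive the procedure in Figure 1 depend only on the scenario matrix, $\beta$, the margin parameter $M$, and the target accuracy $\varepsilon$. The sketch below computes them as a plain-Python illustration. The function name and the dictionary keys are hypothetical, $\sigma_2$ and $D_2$ are the values derived in Appendix A, and $K$ is rounded up to an integer, which is an assumption since the figure does not state a rounding rule.

```python
import math


def nesterov_parameters(scenarios, beta, M, eps):
    """Constants used by the procedure in Figure 1.

    scenarios: list of scenario return vectors R_i (rows of R).
    """
    D1 = 0.5 * (1.0 + 2.0 / M) ** 2
    D2 = -(beta * math.log(beta) + (1.0 - beta) * math.log(1.0 - beta)) / beta
    sigma2 = 1.0 / (1.0 - beta)
    omega = max(sum(x * x for x in row) for row in scenarios)
    mu = eps / (2.0 * D2)
    # Worst-case iteration count implied by the bound (18), rounded up.
    K = math.ceil(math.sqrt(omega * D1 * D2 / sigma2) / eps)
    return {"D1": D1, "D2": D2, "sigma2": sigma2, "omega": omega, "mu": mu, "K": K}


if __name__ == "__main__":
    params = nesterov_parameters([[0.03, 0.04], [0.0, 0.01]], 0.5, 2.0, 1e-3)
    for name, value in params.items():
        print(name, value)
```

Because $K$ grows like $1/\varepsilon$ and $\mu$ shrinks like $\varepsilon$, halving the tolerance doubles the worst-case iteration count, which is consistent with the remark that the duality gap is the more practical stopping rule.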


4 Numerical results

We tested our procedure on the example in [12]. Our asset universe consisted of Treasury bonds with 2, 5, 10, and 30 years to maturity. As in the example in [12], we approximated the returns on the assets by a Delta-Gamma approximation using the yields on bonds with 6 months, 2 years, 5 years, 10 years, 20 years, and 30 years to maturity as the risk factors. We simulated $N$ scenarios for the risk factors and then used the Delta-Gamma approximation to compute $N$ return scenarios. We refer the reader to [11] for a detailed discussion of the simulation procedure.

In Table 1, we display the optimal solution to the LP formulation (3) of the mean-CVaR problem with $\beta = 0.05$ and $N = 10^6$. We use MOSEK [3] to solve these LPs. Table 2 shows the optimal portfolio computed by our proposed algorithm with the error tolerance $\varepsilon = 10^{-3}$. The portfolios produced by our algorithm and the LP formulation (3) are quite different, although the CVaR values are close. These results only imply that the LP approach and our proposed iterative approach are consistent, i.e., both approaches are able to solve the mean-CVaR problem; these results are not able to differentiate between the two approaches.

The most important results of this section are in Tables 3 and 4. In Table 3 we display the CPU time for solving the LP formulation using ILOG CPLEX [7] and MOSEK, and the CPU time for computing an $\varepsilon = 0.001$ optimal solution using our algorithm, as a function of the number of scenarios $N$. It is clear that the industry-leading LP solver CPLEX performs very poorly on this problem: at $N = 10^6$ it requires 44439.79 seconds, compared with 50.62 seconds for MOSEK. MOSEK performs much better, but the run times for this commercial solver are an order of magnitude higher than those of our MATLAB-based code. Table 4 displays the run times and the number of iterations required by our algorithm as a function of the accuracy $\varepsilon$. The performance of our algorithm degrades very quickly as $\varepsilon$ decreases.

Therefore, this algorithm is only suited for applications where one wants to compute a reasonably accurate solution very quickly. An example of such an application is high-frequency trading. The data in high-frequency trading is typically very noisy; therefore, it is pointless to compute a very accurate solution. Note that the LP approach does not allow any flexibility in setting the accuracy level.

Next, we show how to use analysts’ “views” to bias the sample probability mass function $p$. We restrict ourselves to “views” of the form $\nu^T R \sim g$, where $\nu \in \mathbb{R}^n$ is a vector that determines the particular linear combination of the return vector $R$, and $g$ is a probability density on $\mathbb{R}$. We convert this view on the distribution of the random return $R$ to a view on the distribution of the $N$ sample returns $R_i$, $i = 1, \ldots, N$, by defining a view probability vector

$$\bar{p}_i = \frac{g(\nu^T R_i)}{\sum_{k=1}^N g(\nu^T R_k)}, \qquad i = 1, \ldots, N.$$

Suppose we have $m$ different “views”, i.e., there are $m$ different view probability vectors $\bar{p}^{(j)}$, $j = 1, \ldots, m$. We combine these vectors into a single sample probability vector $p$ as follows:

$$p = \sum_{j=1}^m u_j \bar{p}^{(j)} + u_0\, p^{(0)}, \tag{19}$$

where $p^{(0)} = \frac{1}{N} \mathbf{1}$ denotes the empirical measure, and $u_j$ denotes the confidence weight on view $\bar{p}^{(j)}$. Since $p$ is a probability vector, we require that $\sum_{j=0}^m u_j = 1$. Next, we solve the mean-CVaR problem with scenario probability vector $p$. Our algorithm also works with other techniques for combining views; see, for example, [5, 6, 12]. For our numerical experiments, we set $m = 2$. The two views were chosen to be

ν

(1)

ν (2)

/

= 0 / = 0

−1 −0.5

0 0 1

1 −0.5

0T

0

0

g (1) = unif[0, 0.001],

, 0

0T

,

g (2) = unif[0, 0.0005].

The weight vector was set such that u0 = 0.9, u1 = u2 = 0.05, i.e., we assumed that we had 90% confidence in the empirical distribution and 5% confidence in each of the two views. Table 5 shows the optimal portfolio computed using the LP formulation (3). As in the previous case, the LP was solved using MOSEK. Table 6 shows the results computed using our algorithm with ε = 0.001.
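The construction of the view probability vectors and their combination in (19) can be sketched as follows. This is a plain-Python illustration with hypothetical function names and made-up toy data; the view vector and interval in the example are not the ones used in the experiments.

```python
def uniform_density(lo, hi):
    """Density of the uniform distribution on [lo, hi]."""
    def g(x):
        return 1.0 / (hi - lo) if lo <= x <= hi else 0.0
    return g


def view_probabilities(scenarios, nu, density):
    """View probability vector: g(nu^T R_i) normalized over all scenarios."""
    weights = [density(sum(a * b for a, b in zip(nu, row))) for row in scenarios]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no scenario is consistent with the view")
    return [x / total for x in weights]


def combine_views(view_vectors, view_weights, empirical_weight):
    """Mix the view vectors with the empirical measure (1/N) * 1 as in (19)."""
    if abs(empirical_weight + sum(view_weights) - 1.0) > 1e-12:
        raise ValueError("confidence weights must sum to one")
    n_scen = len(view_vectors[0])
    p = [empirical_weight / n_scen] * n_scen
    for u, vec in zip(view_weights, view_vectors):
        p = [p_i + u * v_i for p_i, v_i in zip(p, vec)]
    return p


if __name__ == "__main__":
    # Toy data: four scenarios for two assets and one view on their sum.
    scenarios = [[0.01, 0.00], [0.00, 0.02], [-0.01, 0.00], [0.00, 0.005]]
    view = view_probabilities(scenarios, [1.0, 1.0], uniform_density(0.0, 0.015))
    print(combine_views([view], [0.1], 0.9))
```

With a uniform density, a view simply redistributes its confidence weight equally over the scenarios that fall inside the stated interval.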

5 Conclusion

In this paper, we propose an efficient algorithm for solving the mean-CVaR portfolio selection problem without using an LP solver. As shown in the numerical experiments, the algorithm is a useful alternative to the LP approach when one wants a very fast solver that guarantees an accuracy of $\varepsilon \approx 10^{-3}$. This technique can also be extended to solve many other types of portfolio selection problems.

bond / target return (r)     0.0020     0.0035     0.0045     0.0050
2y                           0.1856    −0.7919    −1.4413    −1.7684
5y                           0.9591     2.1602     2.9548     3.3541
10y                         −0.1857    −0.4601    −0.6379    −0.7244
30y                          0.0409     0.0917     0.1244     0.1387
CVaR                         0.00994    0.01597    0.02003    0.02206

Table 1: Optimal portfolio and CVaR for the mean-CVaR problem solved by the LP approach.

bond / target return (r)     0.0020     0.0035     0.0045     0.0050
2y                           0.4739     0.2455     0.0932     0.0171
5y                           0.3282     0.2484     0.1953     0.1687
10y                          0.1888     0.2512     0.2928     0.3136
30y                          0.0090     0.2548     0.4187     0.5006
CVaR                         0.0101     0.0171     0.0218     0.0244
Error                        0.0001     0.0011     0.0017     0.0024

Table 2: Optimal portfolio and CVaR for the mean-CVaR problem solved by our algorithm, and the absolute error compared with the LP approach.

N           CPLEX       MOSEK     Our algorithm (Iterations)
10000       1.42        0.76      0.34 (1)
50000       39.25       3.00      0.56 (1)
100000      155.71      4.41      1.06 (1)
500000      6633.09     25.08     2.76 (1)
1000000     44439.79    50.62     5.54 (1)

Table 3: CPU time in seconds for both methods, and the number of iterations required by our algorithm.

ε         η        CPU time    Iterations    CVaR      Error
0.001     0.01     4.06        1             0.0244    0.0023
0.0005    0.005    4.77        1             0.0244    0.0023
0.0002    0.002    3331.4      703           0.0240    0.0019
0.0001    0.001    9457.8      2192          0.0230    0.00094

Table 4: CPU time and iteration counts for our algorithm.

References

[1] V. Agarwal and N.Y. Naik. Risks and portfolio decisions involving hedge funds. Review of Financial Studies, 17(1):63–98, Spring 2004.
[2] S. Alexander, T.F. Coleman, and Y. Li. Minimizing CVaR and VaR for a portfolio of derivatives. Journal of Banking and Finance, 30(2):583–605, February 2006.
[3] E. D. Andersen and K. D. Andersen. The MOSEK optimization toolbox for MATLAB manual Version 4.0. http://www.mosek.com/products/4 0/tools/help/index.html, 2006.
[4] P. Artzner, F. Delbaen, J.M. Eber, and D. Heath. Coherent measures of risk. Mathematical Finance, 9(3):203–228, July 1999.
[5] F. Black and R. Litterman. Asset allocation: combining investor views with market equilibrium. Goldman Sachs Fixed Income Research, 1990.
[6] F. Black and R. Litterman. Asset allocation: combining investor views with market expectations. Journal of Fixed Income, 1(1):7–18, September 1991.
[7] ILOG. ILOG CPLEX 11.1. http://www.ilog.com/products/cplex/, 2008.
[8] Y.K. Koskosidis and A.M. Duarte Jr. A scenario-based approach to active asset allocation. Journal of Portfolio Management, 23:74–85, Winter 1997.
[9] H. Lüthi and J. Doege. Convex risk measures for portfolio optimization and concepts of flexibility. Mathematical Programming, 104(2):541–559, November 2005.
[10] H.M. Markowitz. Portfolio selection. Journal of Finance, 7(1):77–91, March 1952.
[11] A. Meucci. Risk and asset allocation. Springer, 2005.
[12] A. Meucci. Beyond Black-Litterman: Views on non-normal markets. Risk Magazine, 19:87–92, 2006.
[13] Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127–152, May 2005.
[14] R.T. Rockafellar and S. Uryasev. Optimization of conditional value-at-risk. Journal of Risk, 2(3):21–41, 2000.
[15] R.T. Rockafellar and S. Uryasev. Conditional value-at-risk for general loss distributions. Journal of Banking and Finance, 26(7):1443–1471, July 2002.
[16] R.T. Rockafellar, S. Uryasev, and M. Zabarankin. Deviation measures in risk analysis and optimization. Technical report, Department of Industrial and System Engineering, University of Florida, 2002.

r           0.0020               0.0035               0.0045               0.0050
Bond        New       Change     New       Change     New       Change     New       Change
2y          0.0328   −0.1528    −1.0998   −0.3079    −1.8520   −0.4107    −2.2297   −0.4613
5y          1.1395    0.1804     2.5328    0.3726     3.4553    0.5005     3.9186    0.5645
10y        −0.1923   −0.0066    −0.4862   −0.0261    −0.6785   −0.0406    −0.7745   −0.0501
30y         0.0199   −0.021      0.0533   −0.0384     0.0753   −0.0491     0.0855   −0.0532
CVaR        0.0110    0.00106    0.0180    0.00203    0.0228    0.00277    0.0252    0.00314

Table 5: Optimal portfolio and CVaR for the mean-CVaR problem with weights on views u0 = 0.9, u1 = u2 = 0.05, solved by the LP approach.

r           0.0020               0.0035               0.0045               0.0050
Bond        New       Change     New       Change     New       Change     New       Change
2y          0.4457   −0.0282     0.1860   −0.0595     0.0128   −0.0804    −0.0737   −0.0901
5y          0.3175   −0.0107     0.2280   −0.0204     0.1683   −0.027      0.1384   −0.0303
10y         0.1960    0.0072     0.2677    0.0162     0.3155    0.0227     0.3394    0.0258
30y         0.0408    0.0318     0.3184    0.0636     0.5034    0.0847     0.5959    0.0953
CVaR        0.0113    0.0012     0.0196    0.0025     0.0254    0.0036     0.0283    0.0039
Error       0.0003               0.0016               0.0026               0.0031

Table 6: Optimal portfolio and CVaR for the mean-CVaR problem with weights on views u0 = 0.9, u1 = u2 = 0.05, solved by our algorithm.

Appendix

A Details of the parameters in the Nesterov algorithm

The Hessian $\nabla^2 d_2(q)$ of the smoothing function $d_2(q) = \sum_{i=1}^N \big[ q_i \log q_i + (\beta^{-1} p_i - q_i) \log(\beta^{-1} p_i - q_i) \big]$ is given by

$$\nabla^2 d_2(q) = \mathrm{diag}\big( [q_1^{-1}, \ldots, q_N^{-1}] \big) + \mathrm{diag}\big( [(\beta^{-1} p_1 - q_1)^{-1}, \ldots, (\beta^{-1} p_N - q_N)^{-1}] \big).$$

Therefore,

$$h^T \nabla^2 d_2(q)\, h = \sum_{i=1}^N \frac{h_i^2}{q_i} + \sum_{i=1}^N \frac{h_i^2}{\beta^{-1} p_i - q_i}$$

$$= \Big( \sum_{i=1}^N q_i \Big) \cdot \Big( \sum_{i=1}^N \frac{h_i^2}{q_i} \Big) + \Big( \frac{\sum_{i=1}^N (\beta^{-1} p_i - q_i)}{\beta^{-1} - 1} \Big) \cdot \Big( \sum_{i=1}^N \frac{h_i^2}{\beta^{-1} p_i - q_i} \Big) \tag{20}$$

$$\ge \Big( \sum_{i=1}^N \frac{|h_i|}{\sqrt{q_i}} \cdot \sqrt{q_i} \Big)^2 + \frac{1}{\beta^{-1} - 1} \Big( \sum_{i=1}^N \frac{|h_i|}{\sqrt{\beta^{-1} p_i - q_i}} \cdot \sqrt{\beta^{-1} p_i - q_i} \Big)^2 \tag{21}$$

$$= \frac{1}{1 - \beta} \|h\|_1^2,$$

where (20) follows from the fact that $\sum_i q_i = 1$ and $\sum_i (\beta^{-1} p_i - q_i) = \beta^{-1} - 1$, and (21) follows from the Cauchy-Schwarz inequality.
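The strong-convexity bound can be sanity-checked numerically by sampling feasible points $q$ and directions $h$ and comparing the Hessian quadratic form with $\frac{1}{1-\beta}\|h\|_1^2$. The sketch below is such a check, not a proof. All names are hypothetical, and the sampling helper assumes mild perturbations of $p$ so that $q$ stays strictly inside the box $0 < q_i < p_i/\beta$.

```python
import random


def hessian_form(h, q, p, beta):
    """h^T (Hessian of d2 at q) h, using the diagonal Hessian of d2."""
    return sum(hi * hi / qi + hi * hi / (pi / beta - qi)
               for hi, qi, pi in zip(h, q, p))


def l1_lower_bound(h, beta):
    """(1 / (1 - beta)) * ||h||_1^2, the claimed lower bound."""
    return sum(abs(hi) for hi in h) ** 2 / (1.0 - beta)


def random_feasible_q(p, beta, rng, spread=0.3):
    """Perturb p and renormalize so that sum(q) = 1 (illustrative helper)."""
    raw = [pi * (1.0 + rng.uniform(-spread, spread)) for pi in p]
    total = sum(raw)
    q = [x / total for x in raw]
    assert all(0.0 < qi < pi / beta for qi, pi in zip(q, p))
    return q


def worst_ratio(p, beta, trials=1000, seed=0):
    """Smallest observed ratio of the quadratic form to the lower bound."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        q = random_feasible_q(p, beta, rng)
        h = [rng.uniform(-1.0, 1.0) for _ in p]
        ratio = hessian_form(h, q, p, beta) / l1_lower_bound(h, beta)
        worst = min(worst, ratio)
    return worst


if __name__ == "__main__":
    print(worst_ratio([0.2] * 5, 0.5))
```

The bound is tight: taking $q = p$ and $h = p$ makes both Cauchy-Schwarz steps hold with equality, so the ratio equals one in that case.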

By setting $w^{(k)} = 0$ in (11), it follows that $q^{\min} = \operatorname{argmin}_{q \in Q_N} \{ d_2(q) \}$ satisfies

$$q_i^{\min} = \frac{\beta^{-1} p_i}{1 + e^{\alpha/\mu}}, \qquad i = 1, \ldots, N,$$

where $\alpha$ is chosen to ensure that $\mathbf{1}^T q^{\min} = 1$. Therefore, it follows that $q^{\min} = p$, and

$$\min_{q \in Q_N} d_2(q) = \sum_i p_i \log p_i + \sum_i p_i (\beta^{-1} - 1) \big( \log p_i + \log(\beta^{-1} - 1) \big).$$

Since $d_2(q)$ is a convex function, $\max_{q \in Q_N} d_2(q)$ occurs at extreme points of the polytope $Q_N$. The extreme points of the polytope $Q_N$ are of the form

$$q_i = \begin{cases} \beta^{-1} p_i, & i \in \{ \pi(1), \ldots, \pi(k) \}, \\ 0, & i \in \{ \pi(k+2), \ldots, \pi(N) \}, \end{cases}$$

where $\pi$ is a permutation of the set $\{1, \ldots, N\}$ and $q_{\pi(k+1)} \in [0, \beta^{-1} p_{\pi(k+1)}]$ is chosen to ensure that $\sum_{i=1}^N q_i = 1$. The value

$$d_2(q) = \sum_{i \ne \pi(k+1)} \beta^{-1} p_i \ln(\beta^{-1} p_i) + q_{\pi(k+1)} \ln(q_{\pi(k+1)}) + (\beta^{-1} p_{\pi(k+1)} - q_{\pi(k+1)}) \ln(\beta^{-1} p_{\pi(k+1)} - q_{\pi(k+1)})$$

$$\le \sum_i \beta^{-1} p_i \ln(\beta^{-1} p_i),$$

where the last inequality follows from the convexity of $x \mapsto x \ln x + (c - x) \ln(c - x)$ on $[0, c]$, which gives

$$q_{\pi(k+1)} \ln(q_{\pi(k+1)}) + (\beta^{-1} p_{\pi(k+1)} - q_{\pi(k+1)}) \ln(\beta^{-1} p_{\pi(k+1)} - q_{\pi(k+1)}) \le (\beta^{-1} p_{\pi(k+1)}) \ln(\beta^{-1} p_{\pi(k+1)}).$$

Thus,

$$D_2 = \max_{q \in Q_N} d_2(q) - \min_{q \in Q_N} d_2(q) \le \sum_i \beta^{-1} p_i \log(\beta^{-1} p_i) - \sum_i p_i \log p_i - \sum_i p_i (\beta^{-1} - 1) \big( \log p_i + \log(\beta^{-1} - 1) \big)$$

$$= -\beta^{-1} \big( \beta \log \beta + (1 - \beta) \log(1 - \beta) \big). \tag{22}$$
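The closed form in (22) does not depend on the scenario probabilities $p$, which can be confirmed numerically. The following sketch evaluates the upper bound on the maximum of $d_2$, the minimum $d_2(p)$, and the closed form, and compares them. The function names are hypothetical and the convention $0 \log 0 = 0$ is assumed.

```python
import math


def xlogx(x):
    """x * log(x) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0.0 else x * math.log(x)


def d2(q, p, beta):
    """Smoothing function d2(q) from (12)."""
    return sum(xlogx(qi) + xlogx(pi / beta - qi) for qi, pi in zip(q, p))


def d2_upper_bound(p, beta):
    """Upper bound on max d2 over Q_N: sum_i (p_i/beta) * log(p_i/beta)."""
    return sum(xlogx(pi / beta) for pi in p)


def D2_closed_form(beta):
    """-(1/beta) * (beta*log(beta) + (1-beta)*log(1-beta)) as in (22)."""
    return -(beta * math.log(beta) + (1.0 - beta) * math.log(1.0 - beta)) / beta


if __name__ == "__main__":
    p, beta = [0.1, 0.2, 0.3, 0.4], 0.25
    print(d2_upper_bound(p, beta) - d2(p, p, beta), D2_closed_form(beta))
```

When $1/\beta$ is an integer and the scenarios are equally likely, the extreme points of $Q_N$ have no fractional coordinate, so the upper bound is attained and (22) holds with equality.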