4th Reading June 20, 2013 15:13 WSPC/S0217-5959

APJOR

1340002.tex

Asia-Pacific Journal of Operational Research Vol. 30, No. 3 (2013) 1340002 (25 pages) © World Scientific Publishing Co. & Operational Research Society of Singapore DOI: 10.1142/S0217595913400022

Asia Pac. J. Oper. Res. 2013.30. Downloaded from www.worldscientific.com by 177.43.75.154 on 06/13/14. For personal use only.

A SMOOTHING PENALIZED SAMPLE AVERAGE APPROXIMATION METHOD FOR STOCHASTIC PROGRAMS WITH SECOND-ORDER STOCHASTIC DOMINANCE CONSTRAINTS

HAILIN SUN∗
Department of Mathematics, Harbin Institute of Technology, Harbin, 150001, China
[email protected]

HUIFU XU
School of Engineering and Mathematical Sciences, City University of London, London EC1V 0HB, UK
[email protected]

YONG WANG
Department of Mathematics, Harbin Institute of Technology, Harbin, 150001, China
[email protected]

Received 30 December 2011
Revised 18 May 2012
Published 26 June 2013

In this paper, we propose a smoothing penalized sample average approximation (SAA) method for solving a stochastic minimization problem with second-order dominance constraints. The basic idea is to use a sample average to approximate the expected values of the underlying random functions and then reformulate the discretized problem as an ordinary nonlinear programming problem with a finite number of constraints. An exact penalty function method is proposed to deal with the latter, and an elementary smoothing technique is used to tackle the nonsmoothness of the plus function and of the exact penalty function. We investigate the convergence of the optimal value obtained from solving the smoothed penalized sample average approximation problem as the sample size increases, and show that, with probability approaching one at an exponential rate as the sample size grows, this optimal value converges to its true counterpart. Some preliminary numerical results are reported.

Keywords: Second-order dominance; exact penalization; sample average approximation; portfolio optimization.

∗The work of the first author was supported by the Specialized Research Fund of the Doctoral Program of Higher Education of China (Grant No. 20103207110002).


H. Sun, H. Xu & Y. Wang

1. Introduction

Stochastic dominance is a fundamental concept in decision theory and economics. A random outcome a(ω) is said to dominate another random outcome b(ω) in the second order, written a(ω) ⪰₂ b(ω), if E[v(a(ω))] ≥ E[v(b(ω))] for every concave nondecreasing function v(·) for which the expected values are finite; see the monograph of Müller and Scarsini (1991) for recent discussions of the concept. In their pioneering work, Dentcheva and Ruszczyński (2003) introduced a stochastic programming model with second-order stochastic dominance constraints:

    min_z  E_P[F(z, ξ(ω))]
    s.t.   G(z, ξ(ω)) ⪰₂ Y(ξ(ω)),                                          (1)
           z ∈ Z,

where Z is a closed convex subset of R^n, F, G : R^n × Ξ → R are continuous functions, ξ : Ω → Ξ is a vector of random variables defined on a probability space (Ω, F, P) with support set Ξ ⊂ R^q, and E_P[·] denotes the expected value with respect to the distribution of ξ(ω). A simple economic interpretation of the model can be given as follows. Let G(z, ξ(ω)) be a profit function which depends on the decision vector z and a random variable ξ(ω), let F = −G, and let Y(ξ(ω)) be a benchmark profit. Then (1) can be viewed as an expected profit maximization problem subject to the constraint that the profit dominates the benchmark profit in the second order.

Mathematically, we denote the cumulative distribution function of a random variable X by F₁(X; η) := P(X ≤ η). If F₁(G(z, ξ(ω)); η) ≤ F₁(Y(ξ(ω)); η) for all η ∈ R, then G(z, ξ(ω)) dominates Y(ξ) in the first order, denoted by G(z, ξ(ω)) ⪰₁ Y(ξ). Let

    F₂(G(z, ξ(ω)); η) := ∫_{−∞}^{η} F₁(G(z, ξ(ω)); t) dt.

If

    F₂(G(z, ξ(ω)); η) ≤ F₂(Y(ξ(ω)); η),  ∀ η ∈ R,

then G(z, ξ(ω)) dominates Y(ξ) in the second order, that is, G(z, ξ(ω)) ⪰₂ Y(ξ(ω)). It is easy to observe that first-order stochastic dominance implies second-order stochastic dominance. It is well known that F₂(X; η) can be rewritten as

    F₂(X; η) = E[(η − X)₊],


where (x)₊ = max(0, x). Consequently, the second-order dominance constraint in (1) can be reformulated as

    E[(η − G(z, ξ(ω)))₊] ≤ E[(η − Y(ξ(ω)))₊],  ∀ η ∈ R,

see for instance Dentcheva and Ruszczyński (2004). Ogryczak and Ruszczyński (1999) investigated the relationship between second-order stochastic dominance and mean-risk models. In a more recent development (Dentcheva and Ruszczyński, 2006), the second-order dominance constraint was shown to be equivalent to conditional value-at-risk constraints through Fenchel conjugate duality. Using the reformulation of the second-order dominance constraints, Dentcheva and Ruszczyński (2003) reformulated (1) as:

    min_z  E_P[F(z, ξ(ω))]
    s.t.   E_P[(η − G(z, ξ(ω)))₊] ≤ E_P[(η − Y(ξ(ω)))₊],  ∀ η ∈ R,        (2)
           z ∈ Z.

To ease notation, we will use ξ to denote either the random vector ξ(ω) or a deterministic vector, depending on the context. It is well known that (2) does not satisfy Slater's constraint qualification, a condition that is often needed for deriving first-order optimality conditions of the problem and for developing a numerically stable method for solving it. Subsequently, a so-called relaxed form of (2) is proposed:

    min_z  E_P[F(z, ξ)]
    s.t.   E_P[(η − G(z, ξ))₊] ≤ E_P[(η − Y(ξ))₊],  ∀ η ∈ [a, b],         (3)
           z ∈ Z,

where [a, b] is a closed interval in R. In the case when Y(ξ(ω)) has a bounded distribution, a ≤ min_{ω∈Ω} Y(ξ(ω)) and b ≥ max_{ω∈Ω} Y(ξ(ω)), problems (2) and (3) are equivalent. Over the past few years, Dentcheva and Ruszczyński have developed a comprehensive theory of optimality and duality for both (1) and (2); see Dentcheva and Ruszczyński (2003, 2004, 2006). Moreover, they proposed the following equivalent formulation to deal with the nonsmoothness arising from the plus function:

    min_z  E_P[F(z, ξ)]
    s.t.   G(z, ξ) + S(η, ω) ≥ η,               for a.a. (η, ω) ∈ [a, b] × Ω,
           E_P[S(η, ω)] ≤ E_P[(η − Y(ξ))₊],     for all η ∈ [a, b],        (4)
           S(η, ω) ≥ 0,                         for a.a. (η, ω) ∈ [a, b] × Ω,
           z ∈ Z,

where the abbreviation "a.a." is understood as "almost all" with respect to the product of the Lebesgue measure on [a, b] and the probability measure P on Ω.
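The shortfall characterization F₂(X; η) = E[(η − X)₊] used throughout can be made concrete with a short sketch. The two-point outcomes below are illustrative choices, not from the paper; b is a mean-preserving spread of a, so a should dominate b in the second order but not conversely:

```python
def f2(sample, eta):
    # Sample-average estimate of F2(X; eta) = E[(eta - X)_+] for an
    # equally weighted discrete outcome given by `sample`.
    return sum(max(0.0, eta - x) for x in sample) / len(sample)

def dominates_2(sample_a, sample_b, grid):
    # a >=_2 b on the grid: F2(a; eta) <= F2(b; eta) for every tested eta.
    return all(f2(sample_a, eta) <= f2(sample_b, eta) for eta in grid)

a = [2.0, 4.0]                        # mean 3, small spread
b = [1.0, 5.0]                        # mean 3, larger spread
grid = [0.5 * i for i in range(13)]   # eta in {0.0, 0.5, ..., 6.0}
print(dominates_2(a, b, grid))        # True: a second-order dominates b
print(dominates_2(b, a, grid))        # False: fails e.g. at eta = 2
```

For continuous distributions the grid check is only a discretization of the "for all η" constraint, which is exactly the issue the relaxation (3) and the later SAA reduction address.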


Our concern in this paper is with numerical methods for solving (3). Specifically, we apply the well-known sample average approximation (SAA) method to deal with the expected values and reformulate the resulting semi-infinite SAA problem as an ordinary nonlinear programming problem. An exact penalty function method is subsequently proposed to solve the latter. In doing so, we incorporate some smoothing techniques to tackle the nonsmoothness resulting from the plus function and the exact penalization.

Let us make a few comments on the relationship between this work and those in the literature. Hu et al. (2012) introduced the concept of multidimensional polyhedral linear second-order stochastic dominance constraints and proposed a sample average cutting-surface algorithm for solving the problem; that is, they dealt with the expected values of the underlying random functions by sample average approximation and then solved the SAA problem by a cutting plane method. The latter is an iterative scheme which requires solving a maximization problem with a DC objective function through a branch and bound method. Under some moderate conditions, they showed an exponential rate of convergence of the statistical estimators of the optimal value and optimal solution of the SAA problem as the sample size increases. Our work differs from Hu et al. (2012) in that we propose a smoothing SAA scheme and then reformulate it as an ordinary nonlinear convex program which is easy to solve. In other words, for a fixed sample, our SAA problem is very easy to solve. We also derive exponential convergence for our smoothed SAA scheme as the sample size increases and the smoothing parameter is driven to zero. Rudolf and Ruszczyński (2008) and Fábián et al. (2011) proposed cutting plane methods for solving a stochastic program with second-order stochastic dominance constraints. A crucial element of the method is the observation that when F and G are linear w.r.t. x and the probability space Ω is finite, the constraint function in the second-order dominance constraint is the convex envelope of finitely many linear functions. Subsequently, an iterative scheme which exploits the fundamental idea of the classical cutting plane method is proposed, where at each iterate "cutting plane" constraints are constructed and added. This also effectively tackles the nonsmoothness issue caused by the plus function. While the method displays strong numerical performance, it relies heavily on the discreteness of the probability space as well as on the linearity of F and G. It is unclear whether the method can be extended to the nonlinear and/or continuous distribution case.

The rest of the paper is organized as follows. In Sec. 2, we consider the optimality conditions of the true problem (3). In Sec. 3, we propose an SAA scheme to deal with the expected values and show that, with probability approaching one at an exponential rate, the optimal value obtained from solving the SAA problem (10) converges to the optimal value of its true counterpart. In Sec. 4, we reformulate the semi-infinite SAA problem into an ordinary nonlinear programming problem and then apply a smoothing exact penalty function method to solve the latter. Finally, in Sec. 5, we apply the proposed numerical method to an academic example and a portfolio optimization problem and report some preliminary numerical results.


2. Preliminaries

Throughout this paper, we use the following notation. x^T y denotes the scalar product of two vectors x and y, and ‖·‖ denotes the Euclidean norm of a vector; we use the same notation for the maximum norm on a space of continuous functions and for the induced norm of a linear operator. d(x, D) := inf_{x′∈D} ‖x − x′‖ denotes the distance from a point x to a set D. For two sets D₁ and D₂, D(D₁, D₂) := sup_{x∈D₁} d(x, D₂) denotes the deviation of D₁ from D₂. For a real-valued function h(x), we use ∇h(x) to denote the gradient of h at x; if h(x) is vector valued, the same notation refers to the classical Jacobian of h at x.

We need some basics of measure theory in this paper. Let C([a, b]) denote the space of continuous functions defined on [a, b] equipped with the maximum norm. By the Riesz representation theorem, the space dual to C([a, b]), denoted by C*([a, b]), is the space of regular countably additive measures on [a, b] having finite variation; see Bonnans and Shapiro (2000, Example 2.63), Dentcheva and Ruszczyński (2003) and the references therein. Let C₊*([a, b]) be the subset of C*([a, b]) of positive measures, and let ‖µ‖ denote the induced norm of the map ∫_a^b · µ(dη) : C([a, b]) → R. Then for µ ∈ C₊*([a, b]),

    ‖µ‖ = ∫_a^b µ(dη) = µ([a, b]),

which is the total variation of µ on [a, b]; see Gugat (1999, Sec. 3) and Bonnans and Shapiro (2000, Example 2.63).

Let v : R^n → R^m be a locally Lipschitz continuous function. Recall that the Clarke generalized derivative of v at a point x in direction d is defined as

    v°(x, d) := lim sup_{y→x, t↓0} [v(y + td) − v(y)] / t.

v is said to be Clarke regular at x if the usual one-sided directional derivative, denoted by v′(x, d), exists for all d ∈ R^n and v°(x, d) = v′(x, d). The Clarke generalized gradient (also known as the Clarke subdifferential) is defined as

    ∂v(x) := {ζ : ζ^T d ≤ v°(x, d), ∀ d ∈ R^n}.

See Clarke (1983, Chap. 2) for details of the concepts above.

For simplicity of notation, let p(x) := max(0, x) and rewrite (η − G(z, ξ))₊ and (η − Y(ξ))₊ as p(η − G(z, ξ)) and p(η − Y(ξ)), respectively. Let

    f(z) := E_P[F(z, ξ)],
    H(z, η, ξ) := p(η − G(z, ξ)) − p(η − Y(ξ)),
    h(z, η) := E_P[H(z, η, ξ)].

For µ ∈ C₊*([a, b]), let

    g(z) := ∫_a^b h(z, η) µ(dη).


The proposition below summarizes the main properties of the functions f, g, H and h that are needed in this paper under some moderate conditions.

Assumption 2.1. There exists a positive integrable function κ(ξ) such that

    max(‖∇_z F(z, ξ)‖, ‖∇_z G(z, ξ)‖) ≤ κ(ξ).


Proposition 2.1. Let Assumption 2.1 hold. Assume that there exist z₀ ∈ Z and η₀ ∈ [a, b] such that E_P[F(z₀, ξ)] < ∞, E_P[p(η₀ − G(z₀, ξ))] < ∞ and E_P[p(η₀ − Y(ξ))] < ∞. Then

(i) f(z) and h(z, η) are well defined for any z ∈ Z and η ∈ [a, b];
(ii) both f(z) and h(z, η) are globally Lipschitz continuous w.r.t. z with Lipschitz modulus E[κ(ξ)], and h(z, η) is globally Lipschitz continuous w.r.t. η;
(iii) g(z) is well defined for every z ∈ Z and it is Lipschitz continuous and directionally differentiable, i.e.,

    g′(z; u) = ∫_a^b h′_z(z, η; u) µ(dη),  ∀ u ∈ R^n,

where

    h′_z(z, η; u) = E_P[−∇G(z, ξ)^T u],           if (η − G(z, ξ)) > 0,
                    max{E_P[−∇G(z, ξ)^T u], 0},   if (η − G(z, ξ)) = 0,     (5)
                    0,                            if (η − G(z, ξ)) < 0.

Proof. Verification of the assertions in Parts (i) and (ii) is elementary; we omit the details.

Part (iii). The well-definedness of g(z) follows from Part (i). In what follows, we show the Lipschitzness and directional differentiability. Let M := ∫_a^b µ(dη) < ∞. By a simple calculation, we have

    |g(z′) − g(z″)| ≤ ∫_a^b E_P[|p(η − G(z′, ξ)) − p(η − G(z″, ξ))|] µ(dη)
                    ≤ ∫_a^b E_P[κ(ξ)] ‖z′ − z″‖ µ(dη)
                    = M E_P[κ(ξ)] ‖z′ − z″‖.

This shows the global Lipschitzness of g. Moreover, since for almost every η ∈ [a, b], h(·, η) is directionally differentiable and its Lipschitz modulus E_P[κ(ξ)] is independent of η, by Qi et al. (2005, Proposition 1), g is directionally differentiable and

    g′(z; u) = ∫_a^b h′(z, η; u) µ(dη),  ∀ u ∈ R^n.

A simple calculation of the directional derivative of h yields (5). We omit the details.


2.1. Optimality conditions

Optimality conditions for optimization problems with second-order dominance constraints have been well documented; see for instance Dentcheva and Ruszczyński (2003, 2004, 2007). Here we review those relevant to our true problem (3).

Assumption 2.2. Problem (3) satisfies the uniform dominance condition, that is, there exists a point z₀ ∈ Z such that

    sup_{η∈[a,b]} h(z₀, η) < 0.

This condition is also known as Slater's constraint qualification. Let µ ∈ C₊*([a, b]) and define the Lagrange function of problem (3):

    L(z, µ) := E_P[F(z, ξ)] + ∫_a^b h(z, η) µ(dη).

The following optimality conditions were established by Dentcheva and Ruszczyński (2003, Theorem 4.2) in the context of stochastic programs with second-order dominance constraints. They can also be derived from a discussion in Bonnans and Shapiro (2000, p. 499, Theorem 5.107), in that (3) is essentially a semi-infinite programming problem.

Theorem 2.1 (Optimality condition). Consider the relaxed stochastic dominance problem (3). Assume F is convex, G is concave and (3) satisfies Assumption 2.2. If z* is an optimal solution of the problem, then there exists a measure µ ∈ C₊*([a, b]) such that

    z* ∈ arg min_{z∈Z} L(z, µ),
    h(z*, η) ≤ 0,  ∀ η ∈ [a, b],
    ∫_a^b h(z*, η) µ(dη) = 0,                                              (6)
    z* ∈ Z.

Moreover, the set of all measures µ satisfying (6) is nonempty, convex and bounded, and is the same for any optimal solution of the problem.

It is possible to characterize the optimality conditions (6) in terms of derivatives of the underlying functions, in which case they are known as first-order necessary conditions or Karush–Kuhn–Tucker (KKT) conditions. Indeed, this has been done in terms of subdifferentials of the underlying functions for both convex and nonconvex optimization problems with second-order dominance constraints; see Dentcheva and Ruszczyński (2007) for details, and Bonnans and Shapiro (2000, Theorem 5.111) and Gugat (1999) in the general context of semi-infinite programming. Here we


present the first-order optimality conditions in terms of directional derivatives by virtue of Theorem 2.1 and Proposition 2.1.

Theorem 2.2 (First-order necessary condition). Let Assumption 2.1 hold. Under the setting and conditions of Theorem 2.1, there exists µ ∈ C₊*([a, b]) such that

    L′(z*, µ; u) := ∇E_P[F(z*, ξ)]^T u + ∫_a^b h′_z(z*, η; u) µ(dη) ≥ 0,  ∀ u ∈ T_Z(z*),
    h(z*, η) ≤ 0,  ∀ η ∈ [a, b],
    ∫_a^b h(z*, η) µ(dη) = 0,                                              (7)
    z* ∈ Z,

where T_Z(z*) denotes the Bouligand tangent cone to Z at z*, i.e.,

    T_Z(z*) := {u ∈ R^n : d(z* + tu, Z) = o(t), t ≥ 0}.                    (8)

We call a tuple (z, µ(·)) which satisfies the conditions above a KKT pair of problem (3), z a stationary point and µ(·) the corresponding Lagrange multiplier.

3. Sample Average Approximation

Consider problem (3). If the random vector ξ follows a finite discrete distribution and we know the distribution, then we can easily reformulate (3) as a deterministic nonlinear programming problem. In general, however, it is difficult to obtain a closed form of the expected values of the underlying random functions, and this motivates us to consider applying the well-known sample average approximation to the problem. Let ξ¹, . . . , ξᴺ be an independent and identically distributed (i.i.d.) sample of ξ. The sample average approximation of (3) can be written as:

    min_z  f_N(z) := (1/N) Σ_{i=1}^N F(z, ξ^i)
    s.t.   h_N(z, η) := (1/N) Σ_{i=1}^N [p(η − G(z, ξ^i)) − p(η − Y(ξ^i))] ≤ 0,  ∀ η ∈ [a, b],   (9)
           z ∈ Z.

Assume that we are able to obtain an optimal solution, denoted by z^N, from solving (9). Our main focus here is to show that under some appropriate conditions z^N converges to its true counterpart as N increases.

To simplify the notation and save some technical details in the convergence analysis, we write the sample average approximation as a specific probability


measure:

    P_N := (1/N) Σ_{k=1}^N 1_{ξ^k}(ω),

where for k = 1, . . . , N,

    1_{ξ^k}(ω) := 1, if ξ(ω) = ξ^k,
                  0, if ξ(ω) ≠ ξ^k.

In the literature of stochastic programming, P_N is known as the empirical probability measure. Using this notation, we can rewrite (9) as

    min_z  f_N(z) := E_{P_N}[F(z, ξ)]
    s.t.   h_N(z, η) := E_{P_N}[p(η − G(z, ξ)) − p(η − Y(ξ))] ≤ 0,  ∀ η ∈ [a, b],    (10)
           z ∈ Z.

We call (9) and (10) the SAA problems and (3) the true problem. SAA is well known in stochastic programming under various names such as sample average approximation, the Monte Carlo method, sample path optimization, stochastic counterpart, etc.; see Robinson (1996), Homem-De-Mello (2008), Shapiro and Xu (2008), Xu (2010) and the references therein.

Before moving on to the convergence analysis, we introduce the following notation. Let P(Ω) denote the set of all Borel probability measures. Define the set of functions

    G := {g(·) = H(z, η, ·) : z ∈ Z, η ∈ [a, b]} ∪ {g(·) = F(z, ·) : z ∈ Z}.

The distance between P_N and P is defined as

    D(P_N, P) := sup_{g∈G} |E_{P_N}[g] − E_P[g]|.

This type of distance was introduced by Römisch (2003, Sec. 2.2) for the stability analysis of stochastic programming and is called a pseudometric. It is well known that D is non-negative, symmetric and satisfies the triangle inequality; see Römisch (2003, Sec. 2.1).

Throughout this section, we use the following notation. For Q = P, P_N:

    F(Q) := {z ∈ Z : E_Q[H(z, η, ξ)] ≤ 0, ∀ η ∈ [a, b]},
    ϑ(Q) := inf{E_Q[F(z, ξ)] : z ∈ F(Q)},
    S_opt(Q) := {z ∈ F(Q) : ϑ(Q) = E_Q[F(z, ξ)]},
    P_G(Ω) := {Q ∈ P(Ω) : −∞ < inf_{g∈G} E_Q[g(ξ)] and sup_{g∈G} E_Q[g(ξ)] < ∞}.
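Under the empirical measure, E_{P_N}[·] is just a sample average, so h_N(z, η) = E_{P_N}[H(z, η, ξ)] and the pseudometric D can be evaluated directly on toy data. The instance below (scalar z, G(z, ξ) = zξ, benchmark Y(ξ) = ξ, ξ uniform on four points) is purely illustrative, and D is restricted to the sub-family {H(z, η, ·)} at one fixed z:

```python
import random

def p(x):                                 # plus function p(x) = max(0, x)
    return max(0.0, x)

def H(z, eta, xi):                        # H(z, eta, xi) = p(eta - G) - p(eta - Y)
    return p(eta - z * xi) - p(eta - xi)  # toy G(z, xi) = z*xi, Y(xi) = xi

def h(z, eta, support):                   # h(z, eta) = E_P[H], P uniform on support
    return sum(H(z, eta, xi) for xi in support) / len(support)

def h_N(z, eta, sample):                  # h_N(z, eta) = E_PN[H], empirical measure
    return sum(H(z, eta, xi) for xi in sample) / len(sample)

support = [0.5, 1.0, 1.5, 2.0]
rng = random.Random(3)
sample = [rng.choice(support) for _ in range(5000)]   # i.i.d. draws defining P_N
grid = [0.1 * i for i in range(21)]                   # eta in [0, 2]
# pseudometric gap over the restricted family, at z = 0.8
gap = max(abs(h_N(0.8, eta, sample) - h(0.8, eta, support)) for eta in grid)
print(gap < 0.05)                         # the gap is already small at N = 5000
```

Proposition 3.1 below says that, over the full family G and uniformly in z and η, this gap vanishes w.p.1 as N → ∞.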


It is easy to observe that under some mild conditions, P, P_N ∈ P_G(Ω) and D(P, P_N) < ∞.

Proposition 3.1. Assume: (a) the conditions of Proposition 2.1 hold; (b) Z is a compact set. Then w.p.1

    lim_{N→∞} D(P_N, P) = 0.


Proof. By Proposition 2.1, f(z), f_N(z), h_N(z, η) and h(z, η) are well defined for z ∈ Z and η ∈ [a, b]; f(z), f_N(z), h_N(z, η) and h(z, η) are globally Lipschitz continuous w.r.t. z with Lipschitz modulus κ̂ := max{E_P[κ(ξ)], E_{P_N}[κ(ξ)]}; and h(z, η) and h_N(z, η) are globally Lipschitz continuous w.r.t. η. By the definition of D and G, we have

    D(P_N, P) = sup_{g∈G} |E_{P_N}[g] − E_P[g]|
              ≤ sup_{z∈Z, η∈[a,b]} |h_N(z, η) − h(z, η)| + sup_{z∈Z} |f_N(z) − f(z)|.    (11)

We estimate the two terms on the right-hand side of (11). By the uniform law of large numbers (see e.g. Ruszczyński and Shapiro, 2003, Lemma A1),

    lim_{N→∞} sup_{z∈Z, η∈[a,b]} |(1/N) Σ_{k=1}^N H(z, η, ξ^k) − E_P[H(z, η, ξ)]| = 0    (12)

and

    lim_{N→∞} sup_{z∈Z} |(1/N) Σ_{k=1}^N F(z, ξ^k) − E_P[F(z, ξ)]| = 0.                  (13)

Combining (12) and (13), we have

    lim_{N→∞} D(P_N, P) = 0.

The proof is complete.

Proposition 3.2. Assume: (a) the conditions of Proposition 3.1 hold; (b) the moment generating function E_P[e^{κ(ξ)t}] of the random variable κ(ξ) is finite valued for t close to 0, where κ(ξ) is defined in Assumption 2.1; (c) for every z ∈ Z and η ∈ [a, b], the moment generating functions

    E_P[e^{(F(z,ξ)−E_P[F(z,ξ)])t}],  E_P[e^{(p(η−G(z,ξ))−E_P[p(η−G(z,ξ))])t}]  and  E_P[e^{(p(η−Y(ξ))−E_P[p(η−Y(ξ))])t}]

are finite valued for t close to 0. Then for any small positive number α > 0, there exist positive constants C(α) and β(α) independent of N such that

    Prob(D(P_N, P) ≤ α) ≥ 1 − C(α)e^{−β(α)N}                                            (14)

for N sufficiently large.


Proof. We estimate Prob(D(P_N, P) ≥ α). Note that

    D(P_N, P) ≤ sup_{z∈Z, η∈[a,b]} |h_N(z, η) − h(z, η)| + sup_{z∈Z} |f_N(z) − f(z)|.

Therefore,

    Prob(D(P_N, P) ≥ α) ≤ Prob(sup_{z∈Z} |f_N(z) − f(z)| ≥ α/2)
                          + Prob(sup_{z∈Z, η∈[a,b]} |h_N(z, η) − h(z, η)| ≥ α/2).        (15)

It suffices to estimate the two terms on the right-hand side of (15). By virtue of Shapiro and Xu (2008, Theorem 5.1) and conditions (a), (b) and (c), there exist positive constants C₁(α), C₂(α), β₁(α) and β₂(α) such that

    Prob(sup_{z∈Z} |f_N(z) − f(z)| ≥ α/2) ≤ C₁(α)e^{−Nβ₁(α)}                            (16)

and

    Prob(sup_{z∈Z, η∈[a,b]} |h_N(z, η) − h(z, η)| ≥ α/2) ≤ C₂(α)e^{−Nβ₂(α)}.            (17)

Combining (16) and (17), we obtain (14) with C(α) := Σ_{j=1}^2 C_j(α) and β(α) := min_{j=1,2} β_j(α).
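The exponential bound (14) is a large-deviation (Hoeffding-type) phenomenon. The toy experiment below contains nothing problem-specific — it simply tracks, for the sample mean of uniform draws, how often the deviation exceeds a fixed α as N grows — but it illustrates the kind of decay the proposition asserts:

```python
import random

def tail_freq(N, alpha, reps=2000, seed=7):
    # Fraction of replications with |sample mean - true mean| >= alpha,
    # for N i.i.d. uniform(0, 1) draws (true mean 0.5).
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        m = sum(rng.random() for _ in range(N)) / N
        if abs(m - 0.5) >= alpha:
            hits += 1
    return hits / reps

freqs = [tail_freq(N, alpha=0.1) for N in (10, 40, 160)]
print(freqs[0] > freqs[1] > freqs[2])   # large deviations become rapidly rarer
```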

With Proposition 3.2, we are ready to discuss the approximation of the SAA problem (10) to the true problem (3) in terms of optimal values as N → ∞.

Theorem 3.1. Let Assumption 2.2 hold and let G(z, ξ) be concave w.r.t. z for all ξ. Suppose that the conditions of Proposition 3.2 are satisfied. Then

(i) there exist positive numbers N* and L* such that the optimal value of problem (10), denoted by ϑ(P_N), satisfies

    |ϑ(P_N) − ϑ(P)| ≤ L* D(P_N, P)

for N ≥ N*;
(ii) for any small positive number α, there exist positive constants C(α), β(α) independent of N such that

    Prob{|ϑ(P_N) − ϑ(P)| ≤ α} ≥ 1 − C(α)e^{−Nβ(α)}

for N ≥ N*.

Proof. We use Theorem 2.7 from Liu and Xu (2013) to prove part (i). To this end, we verify the conditions in the theorem. Under Assumption 2.2, there exist a positive number δ̄ and a point z₀ ∈ Z such that

    max_{η∈[a,b]} E_P[H(z₀, η, ξ)] ≤ −δ̄.


On the other hand, it follows from Proposition 3.1 that for any ε > 0, there exists N* > 0 such that D(P_N, P) ≤ ε for all N ≥ N*, i.e., P_N belongs to the ε-neighborhood of P. By Liu and Xu (2013, Theorem 2.7), Part (i) holds.

Combining Part (i) and Proposition 3.2, we have

    Prob{|ϑ(P_N) − ϑ(P)| ≤ α} ≥ Prob{D(P_N, P) ≤ α/L*} ≥ 1 − C(α)e^{−β(α)N}

for N ≥ N*. The proof is complete.


Note that the exponential convergence of the optimal values is slightly different from Theorem 3.10 of Liu and Xu (2013): here we obtain the convergence results without using the exact penalization.

4. Penalization Scheme

From a numerical point of view, the SAA problem (10) is still difficult to solve as it consists of an infinite number of constraints. Assume that Y(ξ) is bounded and its support set is contained in [a, b]. By Dentcheva and Ruszczyński (2003, Proposition 3.2), we can reformulate

    E_{P_N}[(η − G(z, ξ))₊] ≤ E_{P_N}[(η − Y(ξ))₊],  ∀ η ∈ [a, b]

into

    E_{P_N}[(y_i − G(z, ξ))₊] ≤ E_{P_N}[(y_i − Y(ξ))₊],  i = 1, . . . , N,

where y_i = Y(ξ^i) and y_i ∈ [a, b], i = 1, . . . , N. Consequently, we can reformulate the SAA problem (10) as follows:

    min_z  f_N(z) := E_{P_N}[F(z, ξ)]
    s.t.   h^i_N(z) := E_{P_N}[H(z, y_i, ξ)] ≤ 0,  i = 1, . . . , N,                     (18)
           z ∈ Z,

or equivalently

    min_z  f_N(z) := (1/N) Σ_{k=1}^N F(z, ξ^k)
    s.t.   h^i_N(z) := (1/N) Σ_{k=1}^N (p(y_i − G(z, ξ^k)) − p(y_i − Y(ξ^k))) ≤ 0,      (19)
           i = 1, . . . , N,  z ∈ Z.

4.1. Exact penalization

For a fixed sample, problem (19) is an ordinary nonlinear programming problem with a finite number of constraints. We can apply any existing NLP code to solve it.
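The reduction from the semi-infinite constraint in (10) to the N constraints in (19) is easy to sketch on a toy instance (scalar z, G(z, ξ) = zξ, Y(ξ) = ξ — illustrative choices only): feasibility only needs to be checked at the benchmark realizations y_i = Y(ξ^i):

```python
def p(x):
    return max(0.0, x)

def h_i_N(z, xis, i):
    # h^i_N(z) in (19) for the toy instance G(z, xi) = z*xi, Y(xi) = xi;
    # the constraint is evaluated at eta = y_i = Y(xi^i) = xis[i].
    yi = xis[i]
    return sum(p(yi - z * xi) - p(yi - xi) for xi in xis) / len(xis)

def feasible(z, xis, tol=1e-12):
    # z satisfies the discretized dominance constraints of (19)
    return all(h_i_N(z, xis, i) <= tol for i in range(len(xis)))

xis = [0.5, 1.0, 1.5, 2.0]
print(feasible(1.0, xis))   # True: z = 1 reproduces the benchmark exactly
print(feasible(0.5, xis))   # False: halving the profit violates dominance
```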


However, the number of constraints depends on the sample size N, and this may make the problem difficult to solve by well-known NLP methods such as the active set method as the sample size N becomes large. This motivates us to consider an exact penalty function scheme for (19) as follows:

    min_z  ψ_N(z, ρ_N) := E_{P_N}[F(z, ξ)] + ρ_N Σ_{i=1}^N p(h^i_N(z))
    s.t.   z ∈ Z,                                                                       (20)


where ρ_N is a positive constant depending on N. In what follows, we deal with two technical issues: (a) the existence of a finite penalty parameter ρ_N, and (b) the relationship between (19) and (20). To this end, we need the following assumption.

Assumption 4.1. There exist a vector u ∈ R^n and a positive constant δ such that

    h°(z, η; u) ≤ −δ,  ∀ η ∈ [a, b], z ∈ Z,                                             (21)

where h°(z, η; u) denotes the Clarke generalized directional derivative of h(z, η) at point z in direction u for a given η ∈ [a, b].

We call Assumption 4.1 the strong extended Mangasarian–Fromovitz constraint qualification (MFCQ). When δ = 0, the condition is known as the extended MFCQ; see Bonnans and Shapiro (2000, p. 510) in the context of semi-infinite programming. It is well known that when the underlying functions are nonconvex, the boundedness of the Lagrange multipliers can be derived under the extended MFCQ; see Gugat (1999, Theorem 1) and Bonnans and Shapiro (2000, Theorem 5.111). Define the Lagrange function of (19):

    L_N(z, λ) := E_{P_N}[F(z, ξ)] + Σ_{i=1}^N λ^N_i h^i_N(z).

Proposition 4.1. Let z^N be an optimal solution of problem (19) and let Assumption 4.1 hold. Then w.p.1 problem (19) satisfies the strong extended MFCQ for N sufficiently large, and there exist non-negative numbers λ^N_i, i = 1, . . . , N, such that

    L′_N(z^N, λ^N; u) := ∇E_{P_N}[F(z^N, ξ)]^T u + Σ_{i=1}^N λ^N_i (h^i_N)′(z^N; u) ≥ 0,  ∀ u ∈ T_Z(z^N),
    h^i_N(z^N) ≤ 0,  ∀ i = 1, . . . , N,
    Σ_{i=1}^N λ^N_i h^i_N(z^N) = 0,                                                      (22)
    z^N ∈ Z,

where T_Z(z^N) denotes the Bouligand tangent cone to Z at z^N and λ^N_i, i = 1, . . . , N, are known as the Lagrange multipliers corresponding to z^N.


Proof. Let δ and u be given as in Assumption 4.1. It suffices to show that

    sup_{η∈[a,b], z∈Z} h°_N(z, η; u) ≤ −δ/2                                             (23)

w.p.1 for N sufficiently large, i.e., that problem (19) satisfies the strong extended MFCQ. Let

    ∂̄_z h(z, η) := ∪_{z′∈z+B} ∂_z h(z′, η)

and

    h̄°(z, η; u) := sup_{ζ∈∂̄_z h(z,η)} ζ^T u.

Under Assumption 4.1, it is easy to verify that h̄°(z, η; u) ≤ −δ. By Sun and Xu (2013, Lemma 2.1),

    h°_N(z, η; u) − h̄°(z, η; u) ≤ δ/2

w.p.1 for N sufficiently large, which yields

    sup_{η∈[a,b], z∈Z} h°_N(z, η; u) ≤ sup_{η∈[a,b], z∈Z} h̄°(z, η; u) + δ/2 ≤ −δ/2

w.p.1. It is easy to observe that the strong extended MFCQ implies the extended MFCQ. By Bonnans and Shapiro (2000, Theorem 5.111), there exist non-negative numbers λ^N_i, i = 1, . . . , N, such that (22) holds and the set of Lagrange multipliers is bounded.

Using Proposition 4.1, we can show the existence of a finite penalty parameter and establish a relationship between (19) and (20) under some appropriate conditions. Note that the proof of Proposition 4.1 is similar to that of Sun and Xu (2013, Theorem 2.4); the difference is that Sun and Xu (2013, Theorem 2.4) is proved for the case when the problem has an infinite number of constraints, while Proposition 4.1 is proved for a problem with a finite number of constraints. In this sense, Proposition 4.1 is covered by Sun and Xu (2013, Theorem 2.4).

Theorem 4.1. Let Assumption 4.1 hold. Let ẑ be a stationary point of (19) which satisfies (22), and let λ^N := {λ^N_i, i = 1, . . . , N} be a corresponding Lagrange multiplier. If

    ρ_N ≥ max{λ^N_i, i = 1, . . . , N},                                                 (24)

then ẑ is a stationary point of the penalized minimization problem (20). In the case when the underlying functions are convex, the stationary point is a global optimal solution to both the SAA problem (19) and the penalized minimization problem (20).
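The exact-penalty effect asserted by Theorem 4.1 can be observed by grid search on a deliberately tiny instance (scalar z, Z = [0, 2], G(z, ξ) = zξ, Y(ξ) = ξ, F = −G, with ξ ∈ {−1, 2} equally likely — all hypothetical choices): the profit term alone pushes z upward, but for a large ρ_N the minimizer of ψ_N is pulled back to the feasible point z = 1:

```python
def p(x):
    return max(0.0, x)

def psi_N(z, rho, xis):
    # Penalized objective (20) for the toy instance F = -G, G(z, xi) = z*xi,
    # Y(xi) = xi, with the finitely many constraints of (19).
    N = len(xis)
    f = sum(-z * xi for xi in xis) / N          # f_N(z)
    pen = 0.0
    for i in range(N):                          # sum_i p(h^i_N(z))
        yi = xis[i]
        h_i = sum(p(yi - z * xi) - p(yi - xi) for xi in xis) / N
        pen += p(h_i)
    return f + rho * pen

xis = [-1.0, 2.0]
grid = [i / 100.0 for i in range(201)]          # Z = [0, 2] discretized
z_unpen = min(grid, key=lambda z: psi_N(z, rho=0.0, xis=xis))
z_pen = min(grid, key=lambda z: psi_N(z, rho=100.0, xis=xis))
print(z_unpen)   # 2.0: without the penalty the profit term dominates
print(z_pen)     # 1.0: with rho = 100 the minimizer is the feasible point
```

In this instance z = 1 is the only feasible point (scaling up violates the constraint at the low benchmark realization, scaling down at the high one), so a sufficiently large ρ makes the penalized and constrained minimizers coincide, as the theorem predicts for the convex case.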


Proof. Let u ∈ T_Z(ẑ). Then

    ψ′_N(ẑ, ρ_N; u) = ∇E_{P_N}[F(ẑ, ξ)]^T u + ρ_N Σ_{i=1}^N (h^i_{N+})′(ẑ; u),

where

    (h^i_{N+})′(ẑ; u) := (h^i_N)′(ẑ; u),           if h^i_N(ẑ) > 0,
                         max{(h^i_N)′(ẑ; u), 0},   if h^i_N(ẑ) ≤ 0.


On the other hand, under Assumption 4.1, it follows from Proposition 4.1 that problem (19) satisfies the strong extended MFCQ, which implies that max{λ^N_i, i = 1, . . . , N} < ∞ w.p.1 for N sufficiently large. Moreover, L′_N(ẑ, λ^N; u) ≥ 0, i.e.,

    ∇E_{P_N}[F(ẑ, ξ)]^T u + Σ_{i=1}^N λ^N_i (h^i_N)′(ẑ; u) ≥ 0,  ∀ u ∈ T_Z(ẑ).

Consequently,

    ψ′_N(ẑ, ρ_N; u) ≥ − Σ_{i=1}^N λ^N_i (h^i_N)′(ẑ; u) + ρ_N Σ_{i=1}^N (h^i_{N+})′(ẑ; u) ≥ 0

≥0 w.p.1 for N sufficiently large. The last inequality is due to (24) and the nonnegativeness of the integrand. The rest is straightforward. Remark 4.1. Note that the Lagrange multipliers {λiN , i = 1, . . . , N } can be thought as a positive measure µN defined on [a, b], where for all subset A ⊂ [a, b], µN (A) :=

N 

λi 1yi (A),

i=1

 1yi (A) :=

1, if yi ∈ A, 0, if yi ∈ / A.

Under moderate conditions, it follows from Sun and Xu (2013, Theorem 3.1) that \(\mu_N\) converges weakly to \(\mu\), where \(\mu\) is defined in Theorem 2.1. It is unclear whether or not Theorem 4.1 can be extended to the case when \(\xi\) has a continuous distribution. The reason is that in the continuous case,
\[
\psi'(\hat z, \rho; u) = \nabla E_P[F(\hat z,\xi)]^T u + \rho \int_a^b (h_+)'(\hat z,\eta; u)\, d\eta,
\]
where
\[
(h_+)'(\hat z,\eta; u) := \begin{cases} h'(\hat z,\eta; u), & \text{if } h(\hat z,\eta) > 0,\\[2pt] \max\{h'(\hat z,\eta; u), 0\}, & \text{if } h(\hat z,\eta) \le 0, \end{cases}
\]


and
\[
\nabla E_P[F(\hat z,\xi)]^T u + \int_a^b h'(\hat z,\eta; u)\, \mu(d\eta) \ge 0, \quad u \in T_Z(\hat z).
\]
It is unclear whether there exists \(\rho > 0\) such that
\[
-\int_a^b h'(\hat z,\eta; u)\, \mu(d\eta) + \rho \int_a^b (h_+)'(\hat z,\eta; u)\, d\eta \ge 0,
\]


a condition usually needed for deriving the equivalence of the optimal values of the original problem (3) and its penalized problem.

4.2. A smoothing approach

Like any exact penalty function method in the literature of nonlinear programming, a disadvantage of the penalty formulation (20) is that it introduces nonsmoothness through the penalty term \(p(h(z,\eta))\). Our idea here is to smooth the penalized minimization problem (20) by approximating \(p(x)\) with \(\hat p(x,\epsilon)\). Let \(\epsilon \in \mathbb{R}_+\) be a smoothing parameter and, for every \(\epsilon > 0\),
\[
\hat p(x,\epsilon) := \begin{cases} x, & \text{if } x > \epsilon,\\[2pt] \dfrac{1}{4\epsilon}(x^2 + 2\epsilon x + \epsilon^2), & \text{if } -\epsilon \le x \le \epsilon,\\[2pt] 0, & \text{if } x < -\epsilon, \end{cases} \tag{25}
\]
and for \(\epsilon = 0\), \(\hat p(x,0) := p(x)\), where \(p(x) = \max(0,x)\). It is easy to verify that \(\lim_{\epsilon \downarrow 0} \hat p(x,\epsilon) = p(x)\), which implies that \(\hat p(x,\epsilon)\) is continuous in \(\epsilon\) at \(\epsilon = 0\) for every \(x\); the continuity of \(\hat p\) in \(\epsilon\) on \((0,\infty)\) is obvious. Moreover, \(\hat p(x,\cdot)\) is globally Lipschitz continuous in \(\epsilon\) with modulus \(\frac{1}{4}\) uniformly in \(x\), and \(\hat p(\cdot,\epsilon)\) is globally Lipschitz continuous with modulus 1 uniformly in \(\epsilon\). We summarize this in the lemma below as we will need the property in the follow-up discussions. The proof is elementary; see Xu and Zhang (2009, Example 3.1).

Lemma 4.1. Let \(\hat p\) be defined as in (25). Then \(\hat p\) is globally Lipschitz continuous w.r.t. \(x\) and \(\epsilon\), i.e.,
\[
|\hat p(a,\epsilon_1) - \hat p(b,\epsilon_2)| \le |a - b| + \frac{1}{4}|\epsilon_1 - \epsilon_2|, \quad \forall\, a, b \in \mathbb{R},\ \epsilon_1, \epsilon_2 \in \mathbb{R}_+.
\]
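The smoothing function (25) and the error bound it implies are easy to check numerically. The sketch below is a Python stand-in (the paper's experiments use Matlab); `p_hat` implements (25), and the assertions verify that the approximation error \(\sup_x |\hat p(x,\epsilon) - p(x)|\) equals \(\epsilon/4\), which is the bound used later in the convergence estimate (30).

```python
import numpy as np

def p_plus(x):
    """The plus function p(x) = max(0, x)."""
    return np.maximum(0.0, x)

def p_hat(x, eps):
    """Smoothed plus function from (25): quadratic (x + eps)^2/(4 eps) on [-eps, eps]."""
    if eps == 0.0:
        return p_plus(x)
    return np.where(x > eps, x,
                    np.where(x < -eps, 0.0, (x + eps) ** 2 / (4.0 * eps)))

# The approximation error is largest at x = 0, where it equals eps/4.
xs = np.linspace(-2.0, 2.0, 10001)
for eps in (1.0, 0.1, 0.01):
    gap = np.max(np.abs(p_hat(xs, eps) - p_plus(xs)))
    assert abs(gap - eps / 4.0) < 1e-10
```

The \(\epsilon/4\) gap is exactly the per-term discrepancy absorbed into the estimate (30) below.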

Note that \(\hat p(x,\epsilon)\) is not necessarily differentiable at \((x,0)\). In the convergence analysis later on, we will consider the directional derivative of \(\hat p(x,\epsilon)\) at \(\epsilon = 0\) for any \(x \in \mathbb{R}\). It is easy to obtain that
\[
\hat p'(x,\epsilon; d) \in [0,1], \tag{26}
\]
where \(\|d\| \le 1\), which means that \(\hat p\) satisfies the gradient consistency (Ralph and Xu, 2005) at \(x\).
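For \(\epsilon > 0\) the \(x\)-derivative of (25) is available in closed form and stays in \([0,1]\), in line with (26); a quick numerical check (Python sketch, hypothetical helper name):

```python
import numpy as np

def p_hat_dx(x, eps):
    """d/dx of the smoothing function (25) for eps > 0: 0, (x+eps)/(2 eps), or 1."""
    return np.where(x > eps, 1.0,
                    np.where(x < -eps, 0.0, (x + eps) / (2.0 * eps)))

xs = np.linspace(-3.0, 3.0, 1001)
g = p_hat_dx(xs, 0.5)
assert np.all((0.0 <= g) & (g <= 1.0))   # derivative bounded, consistent with (26)
```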


Using the smoothing function for the plus function, we may consider the following smoothed penalty problem:
\[
\begin{aligned}
\min_z\ & \tilde\psi_N(z,\rho_N,\epsilon_N) := E_{P_N}[F(z,\xi)] + \rho_N \sum_{i=1}^N \hat p\big(E_{P_N}[\hat p(y_i - G(z,\xi), \epsilon_N)] - E_{P_N}[p(y_i - Y(\xi))],\ \epsilon_N\big)\\
\text{s.t.}\ & z \in Z,
\end{aligned} \tag{27}
\]


where \(\epsilon_N \downarrow 0\) as \(N \to \infty\).

Theorem 4.2. Let Assumptions 2.2 and 4.1 hold. Suppose that the conditions of Proposition 3.2 are satisfied, \(F(z,\xi)\) is convex and \(G(z,\xi)\) is concave for almost every \(\xi \in \Xi\). Denote the optimal value of problem (27) by \(\hat\vartheta(P_N,\epsilon_N)\). Then for any small positive number \(\alpha\), there exist positive constants \(N^*\), \(C(\alpha)\) and \(\beta(\alpha)\) such that
\[
\mathrm{Prob}\{|\hat\vartheta(P_N,\epsilon_N) - \vartheta(P)| \le \alpha\} \ge 1 - C(\alpha)e^{-N\beta(\alpha)}
\]
for \(N \ge N^*\).

Proof. We estimate \(\mathrm{Prob}\{|\hat\vartheta(P_N,\epsilon_N) - \vartheta(P)| \le \alpha\}\). Note that
\[
|\hat\vartheta(P_N,\epsilon_N) - \vartheta(P)| \le |\hat\vartheta(P_N,\epsilon_N) - \vartheta(P_N)| + |\vartheta(P_N) - \vartheta(P)|. \tag{28}
\]
By Theorem 3.1, there exists a positive number \(N^*\) such that
\[
\mathrm{Prob}\Big\{|\vartheta(P_N) - \vartheta(P)| \le \frac{\alpha}{2}\Big\} \ge 1 - C(\alpha)e^{-N\beta(\alpha)} \tag{29}
\]
for \(N \ge N^*\). In what follows, we estimate \(|\hat\vartheta(P_N,\epsilon_N) - \vartheta(P_N)|\). By Theorem 4.1 and Lemma 4.1, there exists \(\rho_N\) such that the optimal value of problem (20) is equal to \(\vartheta(P_N)\). Consequently,
\[
\begin{aligned}
|\hat\vartheta(P_N,\epsilon_N) - \vartheta(P_N)| &\le \sup_{z\in Z} |\tilde\psi_N(z,\rho_N,\epsilon_N) - \psi_N(z,\rho_N)|\\
&\le \sup_{z\in Z} \rho_N \sum_{i=1}^N \big|\hat p\big(E_{P_N}[\hat p(y_i - G(z,\xi),\epsilon_N)] - E_{P_N}[p(y_i - Y(\xi))],\ \epsilon_N\big)\\
&\qquad\qquad - p\big(E_{P_N}[p(y_i - G(z,\xi))] - E_{P_N}[p(y_i - Y(\xi))]\big)\big|\\
&\le \sup_{z\in Z} \rho_N \sum_{i=1}^N \Big(\big|E_{P_N}[\hat p(y_i - G(z,\xi),\epsilon_N) - p(y_i - G(z,\xi))]\big| + \frac{\epsilon_N}{4}\Big)\\
&\le \frac{1}{2} N \rho_N \epsilon_N. \tag{30}
\end{aligned}
\]


Let \(\epsilon_N\) be such that \(\frac{1}{2} N \rho_N \epsilon_N \le \frac{\alpha}{2}\). Then
\[
|\hat\vartheta(P_N,\epsilon_N) - \vartheta(P_N)| \le \frac{\alpha}{2}. \tag{31}
\]
Combining (28), (29) and (31), we have
\[
\mathrm{Prob}\{|\hat\vartheta(P_N,\epsilon_N) - \vartheta(P)| \le \alpha\} \ge \mathrm{Prob}\Big\{|\vartheta(P_N) - \vartheta(P)| \le \frac{\alpha}{2}\Big\} \ge 1 - C(\alpha)e^{-N\beta(\alpha)} \tag{32}
\]


for \(N \ge N^*\). The proof is complete. □

Theorem 4.2 guarantees that the optimal value of problem (27) converges to its true counterpart. In what follows, we propose a smoothing penalized method for problem (19).

Algorithm 4.1 (Smoothing penalty method for problem (19)).

Step 1. Let \(\alpha\) be a positive number. Let \(\rho_0\) be an initial estimate of the penalty parameter and \(\epsilon_0\) an initial smoothing parameter such that \(N\rho_0\epsilon_0 \le \alpha\). Set \(t := 0\).

Step 2. For \(\rho_N := \rho_t\) and \(\epsilon_N := \epsilon_t\), solve problem (27). Let \(x_t\) denote the solution obtained from solving the problem.

Step 3. If \(\max_{i\in\{1,\dots,N\}} h_N^i(x_t) \le \alpha + \epsilon_t/2\), stop; otherwise, set \(x_{t+1} := x_t\), \(\rho_{t+1} := 10\rho_t\), \(\epsilon_{t+1} := \epsilon_t/10\) and \(t := t+1\), and go to Step 2.

In Algorithm 4.1, we use a simple rule to update the penalty parameter while keeping \(N\rho_t\epsilon_t \le \alpha\), which guarantees that the gap between the optimal values of the penalized problem (20) and the smoothed penalized problem (27) is at most \(\alpha/2\); see (30). We start with an estimate of the penalty parameter and solve the smoothed penalized problem (27). We then check the feasibility of the obtained solution: if it is infeasible, we increase the penalty parameter, decrease the smoothing parameter and repeat the process; otherwise the solution is accepted as an optimal solution. This procedure is similar to the Simple Penalty Function Method in the optimization literature; see for instance Sun and Yuan (2006, Algorithm 10.2.3). Algorithm 4.1 terminates in a finite number of iterations because the exact penalty parameters for problem (20) are finite; see Theorem 4.1.

Before concluding this section, we make a few comments on the smoothed penalized problem (27). First, it is an ordinary smooth nonlinear minimization problem with simple constraints, so any NLP code can be applied to solve it. In our numerical tests, we use fmincon in Matlab. Second, if \(F\) is convex and \(G\) is concave w.r.t. \(z\), then problem (27) is convex.
In such a case, we may apply the well-known cutting plane method (Kelley, 1960) or a level function method (Lemarechal et al., 1995; Xu, 2001) to solve (27). Meskarian et al. (2012) recently proposed to solve (3) by a level function method: they penalize the second-order dominance constraint into the objective and then apply level function methods


to solve the penalized problem. Note that the random variables in their model have finite distributions, which corresponds to our sample average approximated problem. The main difference is that the objective function in (27) is continuously differentiable, whereas the objective function in the penalized problem of Meskarian et al. (2012) is nonsmooth, so a subgradient needs to be calculated at each iterate in the implementation of the level function methods.

5. Applications


In this section, we apply the proposed smoothed SAA scheme to a couple of problems and carry out some numerical tests. The first one is an academic example constructed to test the convergence of the SAA method.

Example 5.1. Consider the following second-order dominance constrained optimization problem:
\[
\begin{aligned}
\min_z\ & E[-G(z,\xi)]\\
\text{s.t.}\ & E[(\eta - G(z,\xi))_+] \le E[(\eta - Y(\xi))_+], \quad \forall\, \eta \in \mathbb{R},\\
& z \in \mathbb{R}_+,
\end{aligned} \tag{33}
\]

where \(G(z,\xi) := z\xi - \frac{1}{2}z^2\), \(Y(\xi) = G(1,\xi)\) and \(\xi\) is a random variable uniformly distributed over the interval \([2,3]\). Observe first that \(E[-G(z,\xi)] = \frac{1}{2}z^2 - \frac{5}{2}z\), whose global minimizer is \(z^* = 2.5\). Let us now look at the feasible set of the problem. It is rather complicated to work it out precisely; here we only need to locate the optimal solution of problem (33). It is easy to verify that \(G(z,\xi)\) dominates \(G(1,\xi)\) in first order for \(z \in [1,3]\), because the cumulative distribution function of the former is smaller than that of the latter. This implies that \(G(z,\xi)\) dominates \(G(1,\xi)\) in second order for \(z \in [1,3]\), and hence the feasible set of problem (33) contains \([1,3]\). We claim that \(z^* = 2.5\) is the optimal solution, since it is the global minimizer of the objective function and it lies in the feasible region.

We carried out a number of numerical experiments on this problem with the smoothed penalized SAA approach (27) in Matlab 7.2 on a PC with Windows XP, where the SAA problem is solved by the Matlab optimization solver fmincon. We set \(\alpha := 0.1\), the penalty parameter \(\rho_0 := 10\) and the smoothing parameter \(\epsilon_0 := \alpha/(N\rho_0)\). We perform a comparative analysis with respect to 20 sample sizes ranging from 50 to 2750. The numerical results are depicted in Fig. 1, which shows how the optimal value obtained from solving the SAA problem changes as the sample size increases. For each sample size, 20 independent tests are carried out, each of which solves the SAA problem and yields an approximate solution. In Fig. 1, we use a vertical interval to indicate the range of the 20 approximate optimal values. From the figure, we find that the solution is very stable: even when the sample size is as small as 50, the numerical value is close to the true one. We can also observe a trend of convergence of the range of the approximate optimal values as the sample size increases.

Fig. 1. Convergence of the optimal value of the SAA problem as the sample size increases (optimal value vs. sample size, 50 to 2750).
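Example 5.1 can be reproduced in a few lines. The sketch below (Python; the paper uses Matlab's fmincon) assembles the smoothed penalized SAA objective (27) for this example with the same parameter choices \(\rho_0 = 10\), \(\epsilon_0 = \alpha/(N\rho_0)\); since \(z\) is one-dimensional, a fine grid stands in for the NLP solver.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_hat(x, eps):
    """Smoothed plus function (25)."""
    return np.where(x > eps, x,
                    np.where(x < -eps, 0.0, (x + eps) ** 2 / (4.0 * eps)))

N = 200
xi = rng.uniform(2.0, 3.0, N)                 # samples of xi ~ U[2, 3]
G = lambda z: z * xi - 0.5 * z ** 2           # G(z, xi^k), k = 1, ..., N
Y = G(1.0)                                    # benchmark Y(xi) = G(1, xi)
EY = np.mean(np.maximum(Y[:, None] - Y[None, :], 0.0), axis=1)   # E_PN[(y_j - Y)_+]

rho = 10.0                                    # rho_0 = 10 as in the experiments
eps = 0.1 / (N * rho)                         # eps_0 = alpha/(N rho_0), alpha = 0.1

def objective(z):
    """Smoothed penalized SAA objective (27) for Example 5.1."""
    h = np.mean(p_hat(Y[:, None] - G(z)[None, :], eps), axis=1) - EY
    return -np.mean(G(z)) + rho * np.sum(p_hat(h, eps))

zs = np.linspace(0.0, 4.0, 801)
z_best = zs[np.argmin([objective(z) for z in zs])]
# The true optimal solution of Example 5.1 is z* = 2.5 (optimal value -3.125);
# z_best should be close to it for moderate N.
```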

The second example is a portfolio optimization problem.

Example 5.2. Portfolio optimization is a well-known mathematical approach in finance (Roman and Mitra, 2009). We apply the smoothing penalty scheme (27) to a portfolio optimization problem investigated by Fábián et al. (2011) and Xu and Zhang (2009). Let \(R_1, R_2, \dots, R_n\) denote the random return rates of assets \(1, 2, \dots, n\), and assume that \(E[|R_j|] < \infty\) for all \(j = 1, 2, \dots, n\). Denoting by \(z_1, z_2, \dots, z_n\) the fractions of the initial capital invested in assets \(1, 2, \dots, n\), we can easily derive the total return
\[
R(z,\xi) = R_1 z_1 + R_2 z_2 + \cdots + R_n z_n. \tag{34}
\]

Our aim is to investigate the optimal investment of a fixed capital in the \(n\) assets in order to obtain desirable characteristics of the total return. The main difficulty in formulating a meaningful portfolio optimization problem is the definition of the preference structure among feasible portfolios. There are several ways to incorporate risk-aversion into the model to address practical needs; see the excellent review by Roman and Mitra (2009). The stochastic dominance approach (Dentcheva and Ruszczyński, 2006), which introduces a comparison with a benchmark return rate into the optimization problem, has received increasing attention over the past few years. Let \(R(z,\xi)\) be defined as in (34), associated with the decision vector \(z \in Z \subset \mathbb{R}^n\). Here \(z\) is interpreted as a portfolio, \(Z\), the set of available portfolios, is a compact and convex subset of \(\mathbb{R}^n\), and \(\xi : \Omega \to \mathbb{R}^d\) is a random vector, with probability density function \(\rho(\xi)\), representing the market uncertainties that affect the return rates. We assume that a benchmark random return rate \(Y(\xi)\) is given. Our intention is to require the return rate of the new portfolio,


\(R(z,\xi)\), to be preferable to \(Y(\xi)\). The following second-order dominance constrained portfolio optimization problem is formulated to address this:
\[
\begin{aligned}
\min_{z \in Z}\ & -E[R(z,\xi)]\\
\text{s.t.}\ & R(z,\xi) \succeq_{(2)} Y(\xi).
\end{aligned} \tag{35}
\]
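For equally weighted samples, the constraint \(R(z,\xi) \succeq_{(2)} Y(\xi)\) in (35) amounts to finitely many inequalities \(E[(y_j - R)_+] \le E[(y_j - Y)_+]\) taken at the benchmark realizations \(y_j\), as in the SAA reformulation. A small Python helper (hypothetical name `dominates_ssd`) for this empirical check:

```python
import numpy as np

def dominates_ssd(r, y, tol=1e-12):
    """Empirical test of second-order stochastic dominance r >=_(2) y.

    r, y: equally weighted samples of the two return distributions.
    Checks E[(y_j - r)_+] <= E[(y_j - y)_+] at every benchmark realization y_j.
    """
    r, y = np.asarray(r, float), np.asarray(y, float)
    lhs = np.mean(np.maximum(y[:, None] - r[None, :], 0.0), axis=1)
    rhs = np.mean(np.maximum(y[:, None] - y[None, :], 0.0), axis=1)
    return bool(np.all(lhs <= rhs + tol))

# A shifted-up sample dominates the original; the reverse fails.
y = np.array([0.9, 1.0, 1.1, 1.2])
assert dominates_ssd(y + 0.05, y) and not dominates_ssd(y - 0.05, y)
```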

Applying the smoothing penalty SAA scheme (27) to (35), we have
\[
\begin{aligned}
\min_z\ & \tilde\psi_N(z,\rho,\epsilon_N) := -\frac{1}{N}\sum_{k=1}^N R(z,\xi^k) + \rho \sum_{j=1}^N \hat p\Bigg(\frac{1}{N}\sum_{k=1}^N \hat p\big(Y(\xi^j) - R(z,\xi^k),\ \epsilon_N\big) - \frac{1}{N}\sum_{k=1}^N p\big(Y(\xi^j) - Y(\xi^k)\big),\ \epsilon_N\Bigg)\\
\text{s.t.}\ & z \in Z,
\end{aligned} \tag{36}
\]

where \(\epsilon_N \downarrow 0\) as \(N \to \infty\). We have carried out numerical tests on (36) with the Hang Seng FTSE/Xinhua China 25 Index ETF (Table 1), a trust fund built by HSBC Provident Fund Trustee (Hong Kong) Limited, as our benchmark. The samples are based on historical data downloaded from the stock market, consisting of daily closing prices of the 25 stocks over a one-year period from 31 July 2008 to 31 July 2009,^a yielding 226 historical samples of monthly returns; thus the problem size is \(n = 25\) and the sample size is \(N = 226\). The feasible set of (36) is specified as
\[
Z := \Big\{z \in \mathbb{R}^n : \sum_{i=1}^n z_i \le 1,\ z_i \ge 0,\ \text{for } i = 1, 2, \dots, n\Big\},
\]

and \(R(z,\xi^k) = R_1^k z_1 + R_2^k z_2 + \cdots + R_n^k z_n\), where \(R_i^k = p_i^{k+\Delta}/p_i^k\) for \(i = 1, 2, \dots, n\), \(p_i^k\) denotes the closing price of stock \(i\) on day \(k\) in the real market, and \(\Delta = 19\) specifies a one-month trading time gap. We set \(Y(\xi) := R(\bar z, \xi)\), with the benchmark decision vector \(\bar z\) given by the index weights listed in Table 1. Our tests are carried out in Matlab 7.2 on a PC with Windows XP, where (36) is solved by the Matlab optimization solver fmincon. We set \(\alpha := 0.1\), the initial penalty parameter \(\rho_0 := 10\) and the initial smoothing parameter \(\epsilon_0 := \alpha/(N\rho_0)\). We obtain the approximate optimal value
\[
\frac{1}{N}\sum_{k=1}^N \sum_{i=1}^n R_i^k z_i = 1.3082
\]

^a See http://www.hangseng.com/etf and http://www.hkex.com.hk; prices adjusted for stock splitting.
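The data preparation behind Example 5.2 is a one-liner once the closing prices are in an array: \(R_i^k = p_i^{k+\Delta}/p_i^k\) with \(\Delta = 19\). The Python sketch below uses synthetic random-walk prices as a stand-in for the downloaded Hang Seng data (not reproduced here); 245 daily closes yield the paper's sample size \(N = 226\), and equal weights stand in for the Table 1 index weights \(\bar z\).

```python
import numpy as np

rng = np.random.default_rng(1)
n_days, n_assets, delta = 245, 25, 19     # 245 daily closes, Delta = 19 (one month)

# Synthetic stand-in for the closing prices p_i^k (geometric random walk).
log_ret = rng.normal(0.0005, 0.02, (n_days, n_assets))
prices = 100.0 * np.exp(np.cumsum(log_ret, axis=0))

# R[k, i] = p_i^{k+delta} / p_i^k: overlapping monthly returns.
R = prices[delta:] / prices[:-delta]
assert R.shape == (n_days - delta, n_assets)   # N = 226 samples, n = 25 assets

# Benchmark return Y(xi^k) = R(z_bar, xi^k); equal weights as a placeholder
# for the index weights z_bar of Table 1.
z_bar = np.full(n_assets, 1.0 / n_assets)
Y = R @ z_bar
```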


Table 1. The portfolio of the FTSE/Xinhua China 25 index until July 31st, 2009.

Code        Name of Stock                                   Weight (%)
HKG:0941    China Mobile Ltd.                               9.77
HKG:0939    China Construction Bank Corporation             8.88
HKG:2628    China Life Insurance Company Limited            8.11
HKG:1398    Industrial and Commercial Bank of China         6.81
HKG:3398    Bank of China Limited                           5.91
HKG:2388    BOC Hong Kong                                   4.27
HKG:2318    Ping An Insurance (Grp) Co. of China Ltd.       4.24
HKG:1088    China Shenhua Energy Company Limited            4.22
HKG:0875    PetroChina Company Limited                      4.06
HKG:0386    China Petroleum & Chemical Corporation          4.02
HKG:3328    Bank of Communications Co., Ltd.                3.98
HKG:3968    China Merchants Bank Co., Ltd.                  3.96
HKG:0883    CNOOC Limited                                   3.91
HKG:0762    China Unicom (Hong Kong) Limited                3.83
HKG:0728    China Telecom Corporation Limited               3.75
HKG:0998    China CITIC Bank Corporation Limited            3.24
HKG:1800    China Communications Construction Co., Ltd.     2.89
HKG:1898    China Coal Energy Company Limited               2.88
HKG:2600    Aluminum Corporation of China Limited           2.30
HKG:0390    China Railway Group Limited                     1.90
HKG:2899    Zijin Mining Group Co., Ltd.                    1.87
HKG:1919    China COSCO Holdings Company Limited            1.87
HKG:0902    Huaneng Power International, Inc.               1.21
HKG:0991    Datang Inter. Power Generational Co., Ltd.      1.09
HKG:0753    China Power New Energy Devlpmnt Co., Ltd.       1.07

k=1

≤ 4.1970e−004, which means that the benchmark decision is not SSD (Stochastic Second-Order Dominance) efficient in comparison with the computed approximate optimal solution. We denote the computed portfolio as P ∗ . Table 2 displays the in-sample performance of the two portfolios considered. The portfolio P ∗ behaves better than the benchmark in five of the seven measures used: higher mean, high Median, higher skewness, higher minimum and higher maximum but with higher variance, higher kurtosis. In Fig. 2, we depict the cumulative distribution function for the return of P ∗ and Hang Seng FTSE/Xinhua China 25 index ETF by matlab function cdf plot. It is obvious that the cumulative distribution function of P ∗ takes a “smaller” value for low outcomes, which means that at lower levels, its outcomes are distributed

1340002-22

4th Reading June 20, 2013 15:13 WSPC/S0217-5959

APJOR

1340002.tex

Smoothing Penalized SAA Method for Stochastic Programs Table 2. Statistics for the return distribution of P ∗ and Hang Seng FTSE/Xinhua China 25 index ETF.

Mean Median St. Deviation Skewness Kurtosis Excess Minimum Maximum

P∗

Benchmark

1.0382 1.0390 0.1841 0.4137 4.0010 0.5155 1.7625

1.0097 1.0308 0.1470 −0.2709 3.1709 0.5411 1.4564

Empirical CDF

0.9 0.8 0.7 Probability

Asia Pac. J. Oper. Res. 2013.30. Downloaded from www.worldscientific.com by 177.43.75.154 on 06/13/14. For personal use only.

1

0.6 0.5 0.4 0.3 P* index

0.2 0.1 0 0.4

0.6

0.8

1

1.2 Return

1.4

1.6

1.8

2

Fig. 2. Cumulative distribution function for returns of P ∗ and Hang Seng FTSE/Xinhua China 25 index ETF.

further to the right than those of the index. This is expected since P ∗ dominates the index with respect to SSD. Acknowledgments The authors are grateful to the two anonymous referees for their insightful comments which have significantly helped improve the quality of the paper. References Bonnans, JF and A Shapiro (2000). Perturbation Analysis of Optimization Problems, Springer Series in Operations Research, Springer-Verlag.

1340002-23

4th Reading June 20, 2013 15:13 WSPC/S0217-5959

APJOR

Asia Pac. J. Oper. Res. 2013.30. Downloaded from www.worldscientific.com by 177.43.75.154 on 06/13/14. For personal use only.

H. Sun, H. Xu & Y. Wang

Clarke, FH (1983). Optimization and Nonsmooth Analysis. New York: Wiley. Dentcheva, D and A Ruszczy´ nski (2003). Optimization with stochastic dominance constraints. SIAM Journal on Optimization, 14, 548–566. Dentcheva, D and A Ruszczy´ nski (2004). Optimality and duality theory for stochastic optimization with nonlinear dominance constraints. Mathematical Programming, 99, 329–350. Dentcheva, D and A Ruszczy´ nski (2006). Portfolio optimization with stochastic dominance constraints. Journal of Banking and Finance, 30, 433–451. Dentcheva, D and A Ruszczy´ nski (2007). Composite semi-infinite optimization. Control and Cybernetics, 36, 1–14. F´ abi´ an, C, G Mitra and D Roman (2011). Processing second-order stochastic dominance models using cutting-plane representations. Mathematical Programming, 130, 33–57. Gugat, M (1999). A parametric review on the Mangasarian-Fromovitz constraint qualification. Mathematical Programming, 85, 643–653. Homen-De-Mello T (2008). On rates of convergence for stochastic optimization problems under non-independent and identically distributed sampling. SIAM Journal on Optimization, 19, 524–551. Hu, J, T Homen-De-Mello and S Mehrotra. Sample average approximation of stochastic dominance constrained programs. Accepted in Mathematical Programming, Ser. A (2012), 133, 171–201. Kelley, JE, Jr. (1960). The cutting-plane method for solving convex programs. SIAM Journal on Applied Mathematics, 8, 703–712. Lemarechal, C, A Nemirovskii and Y Nesterov (1995). New variants of bundle methods. Mathematical Programming, 69, 111–147. Liu, Y and H Xu (2013). Stability and Senstivity Analysis of Stochastic Programs with Second Order Dominance Constrains, Preprint, School of Mathematics, University of Southampton, to appear in Mathematical Programming Series A. Meskarian, R, H Xu and J Fliege (2012). 
Numerical methods for stochastic programs with second order dominance constraints with applications to portfolio optimization, European Journal of Operational Research, 216, 376–385. M¨ uller, A and M Scarsini (eds.) (1991). Stochastic Orders and Decision Under Risk. Hayward, CA: Institute of Mathematical Statistics. Ogryczak, W and A Ruszczy´ nski (1999). From stochastic dominance to mean-risk models: Semideviations as risk measures. European Journal of Operational Research, 116, 33–50. Qi, L, A Shapiro and C Ling (2005). Differentiability and semismoothness properties of integral functions and their applications. Mathematical Programming, 102, 223–248. Ralph, D and H Xu (2005). Implicit smoothing and its application to optimization with piecewise smooth equality constraints. Journal of Optimization Theory and Applications, 124, 673–699. Robinson, SM (1996). Analysis of sample-path optimization. Mathematics of Operations Research, 21, 513–528. Roman, D and G Mitra (2009). Portfolio selection models: A review and new directions. Wilmott Journal, 1, 69–85. R¨ omisch, W (2003). Stability of stochastic programming problems, In Stochastic Programming, A Ruszczynski and A Shapiro (eds.), Handbooks in Operations Research and Management Science, Vol. 10, pp. 483–554. Amsterdam: Elsevier.

1340002-24

1340002.tex

4th Reading June 20, 2013 15:13 WSPC/S0217-5959

APJOR

1340002.tex

Asia Pac. J. Oper. Res. 2013.30. Downloaded from www.worldscientific.com by 177.43.75.154 on 06/13/14. For personal use only.

Smoothing Penalized SAA Method for Stochastic Programs

Rudolf, G and A Ruszczy´ nski (2008). Optimization problems with second order stochastic dominance constraints: Duality, compact formulations, and cut generation methods. SIAM Journal on Optimization, 19, 1326–1343. Rusczy´ nski, A and A Shapiro (2003). Stochastic programming models, in Stochastic Programming, A Rusczy´ nski and A Shapiro (eds.), Handbooks in OR & MS, Vol. 10, pp. 1–64. Amsterdam: North-Holland Publishing Company. Shapiro, A and H Xu (2008). Stochastic mathematical programs with equilibrium constraints, modeling and sample average approximation. Optimization, 57, 395–418. Sun, H and H Xu (2013). Convergence analysis of stationary points in sample average approximation of stochastic programs with second order stochastic dominance constraints. To appear in Mathematical Programming Ser. B. Sun, W and Y Yuan (2006). Optimization Theory and Methods. New York: Springer. Xu, H (2001). Level function method for quasiconvex programming. Journal of Optimization Theory and Applications, 108, 407–437. Xu, H (2010). Uniform exponential convergence of sample average random functions under general sampling with applications in stochastic programming. Journal of Mathematical Analysis and Applications, 368, 692–710. Xu, H and D Zhang (2009). Smooth sample average approximation of stationary points in nonsmooth stochastic optimization and applications. Mathematical Programming, 119, 371–401.

Hailin Sun received his B.Sc. degree from Jilin University, China. He is a PhD student in Department of Mathematics, Harbin Institute of Technology, China. His research interests include nonsmooth stochastic optimization and distributional robust optimization. Huifu Xu received the B.Sc. degree in mathematics from Nanjing University, Nanjing, China, in 1986 and the PhD degree in optimization from Ballarat University, Victoria, Australia, in 1999. He is a Professor in the School of Engineering and Mathematical Sciences, City University of London, U.K. His recent research interests include stochastic optimization and equilibrium problems and robust analysis for these problems. Yong Wang received the B.Sc. degree, M.Sc. degree and PhD degree in Harbin Institute of Technology, China, in 1982, 1988 and 2005 respectively. He is a Professor in Department of Mathematics, Harbin Institute of Technology, China. His recent research interests include stochastic process, stochastic optimization and control.

1340002-25