STRATEGIC MEASURES IN OPTIMAL CONTROL PROBLEMS FOR STOCHASTIC SEQUENCES 1

X. Mao
Dept. of Statistics and Modelling Science, Livingstone Tower, University of Strathclyde, 26 Richmond Street, Glasgow G1 1XH, UK

A. Piunovskiy
Inst. of Physics and Technology, Prechistenka 13/7, 119034 Moscow, Russia

ABSTRACT

Controlled discrete-time stochastic processes are studied using the convex-analytic approach. Several new properties of the spaces of strategic measures are established, and particular Markov models are considered. A meaningful example is presented.

1. INTRODUCTION. In the study of controlled discrete-time stochastic processes the problem usually consists in constructing a control strategy that attains the extremum of some criterion. Very often the criterion is the expectation of a real-valued functional of the trajectories; the integration is carried out with respect to the "strategic measure", which is completely determined by the control strategy (the initial distribution being fixed).

1 The work was supported by a grant of the Royal Society.


One of the main questions arising in the investigation of control problems is the following: to which class of strategies can one restrict attention in order to solve the problem? As a rule, one can prove the sufficiency of the class of nonrandomized strategies (selectors), and in some more particular cases the sufficiency of smaller classes (Markov selectors for Markov models, stationary selectors for homogeneous models with infinite horizon, and so on). These questions were discussed in [1, 2] and in many other works. A very good picture of traditional control theory can be obtained from the monograph [3]; the basic instrument there is the dynamic programming approach. During the last ten years another approach has been developed, based on abstract convex programming. The optimal control problem is treated as a convex programming problem in the space of strategic measures (or of occupation measures for particular Markov models). The advantage of this approach is that it allows the investigation of control problems under functional constraints and other problems of vector optimization. The convex programming approach is presented in [4] for models with a denumerable state space and in [5] for general Borel models; a similar approach is used in [6, 7], see also other works of the same authors. The first stage of such an investigation is the study of the space of strategic measures, and the present article is devoted to this question. The main result is formulated as Theorem 4: the compactness of the space of strategic measures is proved under more general conditions than in the works cited above. Section 3 also collects other, already known, properties of this space; in Section 4 the use of these properties is demonstrated for the study of optimal control problems, including problems with functional constraints. It is shown that when looking for a solution one can restrict oneself to finite mixtures of nonrandomized strategies.

Sections 5 and 6 are devoted to specific Markov models: with a finite horizon and with a discount factor. Occupation measures are defined which play the same role there as the general strategic measures do in the general models; the most important properties of the spaces of occupation measures are formulated, and the analysis of the corresponding optimal control problems is carried out. In Section 7 the simplest meaningful example is presented; it illustrates the theoretical results and is of interest in its own right.

A few words about notation: I{·} is the indicator function, δ_{i,j} is the Kronecker symbol, Γ^Y is a measurable subset of the Borel space (Y, B(Y)), and if ν is a measure on the Borel space (Y, B(Y)) then integrals are denoted by ∫_Y f(y) ν(dy) or ∫_Y f(y) dν.

2. BASIC DEFINITIONS. Let (X, B(X)) and (A, B(A)) be the Borel spaces of states and actions, respectively. The initial distribution of the states P_0(dx_0) is assumed to be given. Put H_t = X × (A × X)^t, t = 0, 1, 2, ... . The elements h_t = (x_0, a_1, x_1, ..., a_t, x_t) ∈ H_t are the trajectories accessible to observation at the instant t. The space G_t = X × (A × X)^{t−1} × A has the analogous meaning. The natural σ-algebras B(H_t), B(G_t) of the direct products are introduced in the spaces H_t, G_t; the case t = ∞, (H_∞, B(H_∞)), is not an exception. All the spaces H_t, G_t, H_∞ are Borel spaces; they are equipped with the Tychonoff topology of component-wise convergence. We assume that a transition probability (i.e. a measurable stochastic kernel) p(dx_t|g_t) is given which defines the distribution of the state x_t once the trajectory g_t ∈ G_t has been realized (t = 1, 2, ...).

Definition 1. A control strategy π = {π_t}_{t=1}^∞ is a sequence of measurable stochastic kernels π_t(da_t|h_{t−1}) on A. If for any t = 1, 2, ... there exists a measurable mapping ϕ(t, h_{t−1}) : H_{t−1} → A such that

π_t(Γ|h_{t−1}) = I{Γ ∋ ϕ(t, h_{t−1})} for any Γ ∈ B(A), then the strategy is denoted by the symbol ϕ and is called a selector.

Assume that the basic probability space (Ω, F) coincides with the Borel space (H_∞, B(H_∞)). Then x_t (t = 0, 1, 2, ...) and a_t (t = 1, 2, ...) are the random elements defined by the projections

ω = h_∞ = (x_0, a_1, x_1, ...) −→ x_t;   ω −→ a_t.     (1)

We shall also need the projections

ω −→ h_t = (x_0, a_1, x_1, ..., a_t, x_t);   ω −→ g_t = (x_0, a_1, x_1, ..., a_t),     (2)

which define the random trajectories. The preimages of the σ-algebras B(H_t) and B(G_t) under the projections (2) are denoted by F_t and G_t (σ-algebras in Ω). The random elements (1) are adapted to the non-decreasing family of σ-algebras {F_t}_{t≥0} ⊆ F. In accordance with the Ionescu-Tulcea theorem, for each control strategy π there exists a unique probability measure P^π on (Ω, F) such that

a) P^π{x_0 ∈ Γ^X} = P_0(Γ^X);

b) P^π{g_t ∈ Γ^G, x_t ∈ Γ^X} = ∫_{Γ^G} p(Γ^X|g_t) P_t^{G,π}(dg_t);

c) P^π{h_{t−1} ∈ Γ^H, a_t ∈ Γ^A} = ∫_{Γ^H} π_t(Γ^A|h_{t−1}) P_{t−1}^{H,π}(dh_{t−1})

at each t = 1, 2, ..., Γ^X ∈ B(X), Γ^A ∈ B(A), Γ^G ∈ B(G_t), Γ^H ∈ B(H_{t−1}). Here and below P_t^{H,π} and P_t^{G,π} are the images of P^π under the projections (2).

Remark 1. On the left-hand sides of a), b), and c), x_t, a_t, g_t, and h_{t−1} are random elements (see (1) and (2)); on the right-hand sides, g_t and h_{t−1} are the variables of integration. We hope that in what follows this will not lead to any misunderstanding.
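To make the construction of P^π concrete, the following short Python sketch samples trajectories h_T = (x_0, a_1, x_1, ..., a_T, x_T) by iterating exactly the recursion a)–c): draw x_0 from P_0, then alternately draw a_t from π_t(·|h_{t−1}) and x_t from p(·|g_t). The specific kernels used here (a two-point state space and a strategy that randomizes uniformly) are illustrative assumptions, not part of the model above.

```python
import random

# Illustrative data: X = {0, 1}, A = {0, 1} (an assumption made only for the sketch).
P0 = {0: 0.5, 1: 0.5}                      # initial distribution P_0(dx_0)

def p(g_t):
    """Transition kernel p(.|g_t): distribution of x_t given g_t = (x_0, a_1, ..., a_t)."""
    x_prev, a_last = g_t[-2], g_t[-1]
    q = 0.7 if a_last == 1 else 0.3        # illustrative numbers
    return {x_prev: 1.0 - q, 1 - x_prev: q}

def pi(t, h_prev):
    """Strategy pi_t(.|h_{t-1}); here it ignores the history (an assumption)."""
    return {0: 0.5, 1: 0.5}

def draw(dist):
    return random.choices(list(dist), weights=list(dist.values()))[0]

def sample_trajectory(T):
    """One draw of h_T under P^pi, following the Ionescu-Tulcea recursion a)-c)."""
    h = [draw(P0)]                          # x_0 ~ P_0
    for t in range(1, T + 1):
        h.append(draw(pi(t, tuple(h))))     # a_t ~ pi_t(.|h_{t-1})
        h.append(draw(p(tuple(h))))         # x_t ~ p(.|g_t)
    return tuple(h)

print(sample_trajectory(5))
```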

Let us denote by D = {P^π} ⊆ P(Ω) the set of all strategic measures. Here and below P(Ω) is the set of all probability measures on the Borel space (Ω, B(Ω)); the weak topology is fixed in P(Ω), and in this case (P(Ω), B(P(Ω))) is a Borel space. D^ϕ ⊆ D is the set of all strategic measures generated by selectors. The constructions presented are traditional for the theory of controlled stochastic sequences [1]–[5].

In the general case the performance criterion (i.e. the loss functional) is a measurable real functional R(P^π) : D → [−∞, +∞]. The optimal control problem consists in the minimization of R(·) on a given subset D_1 ⊆ D:

R(P) −→ inf_{P ∈ D_1};     (3)

a strategy π* with P^{π*} ∈ D_1 is called optimal if R(P^{π*}) = inf_{P ∈ D_1} R(P).

The loss functional is called integral if

R(P) = ∫_Ω R(ω) P(dω),     (4)

where R(·) : Ω → [−∞, +∞] is a given measurable function. Here and below, by definition,

∫_Ω R(ω) P(dω) = ∫_Ω R^+(ω) P(dω) + ∫_Ω R^−(ω) P(dω),

R^+(ω) = max{0, R(ω)},  R^−(ω) = min{0, R(ω)},  "+∞" + "−∞" = "+∞".

Problem (3) will be considered as a mathematical programming problem on the linear topological space E(Ω) of all bounded signed measures on Ω, equipped with the weak topology σ(E(Ω), C(Ω)). As usual, C(Ω) is the linear space of all continuous bounded functions on Ω, and ⟨c, e⟩ is the natural coupling

⟨c, e⟩ = ∫_Ω c(ω) e(dω),   c ∈ C(Ω), e ∈ E(Ω).

In this connection the space D plays an important role; its properties are studied in the next section.
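Continuing the simulation sketch above, an integral loss functional (4) can be estimated by averaging R over independent trajectories. The truncation to a finite horizon, the particular kernel and the particular loss below are assumptions made only for illustration.

```python
import random

# Monte Carlo estimate of R(P) = integral of R(omega) dP for an illustrative model:
# X = A = {0, 1}, uniformly randomizing strategy, simple kernel (all assumptions).
random.seed(0)

def simulate(T):
    """Return (x_0, a_1, x_1, ..., a_T, x_T) under the illustrative strategy."""
    x = random.choice([0, 1])               # x_0 ~ P_0 = Uniform{0, 1}
    traj = [x]
    for _ in range(T):
        a = random.choice([0, 1])           # a_t ~ pi_t = Uniform{0, 1}
        q = 0.7 if a == 1 else 0.3
        x = 1 - x if random.random() < q else x   # x_t ~ p(.|x_{t-1}, a_t)
        traj += [a, x]
    return traj

def loss(traj):
    """An illustrative bounded loss R(omega): number of visits to state 0 (truncated)."""
    return sum(1 for x in traj[0::2] if x == 0)

n, T = 10_000, 20
estimate = sum(loss(simulate(T)) for _ in range(n)) / n
print(f"Monte Carlo estimate of R(P^pi): {estimate:.3f}")
```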

3. PROPERTIES OF THE STRATEGIC MEASURES SPACE D.

Theorem 1 [1]. The space D is measurable and convex. A measure P ∈ P(Ω) is strategic if and only if the projection of P on H_0 = X coincides with P_0(·) and for each t = 1, 2, ... and every c ∈ C(H_t) the following equality holds:

∫_Ω P(dω) c(h_{t−1}, a_t, x_t) = ∫_Ω P(dω) [ ∫_X c(h_{t−1}, a_t, y) p(dy|h_{t−1}, a_t) ].

(Here x_t(ω), a_t(ω), and h_t(ω) are defined by the projections (1) and (2).)

Definition 2. A finite mixture of strategies π^1, π^2, ..., π^M (with weights λ_m ≥ 0, ∑_{m=1}^M λ_m = 1) is a strategy π for which P^π = ∑_{m=1}^M λ_m P^{π^m}. In this case the measure P^π will also be called a finite mixture, for short.

Theorem 2 [5]. If P ∈ D is an extreme point then P ∈ D^ϕ. If the transition probabilities are continuous then the set D is closed in P(Ω).

Theorem 3 [6]. For each point P ∈ D there exists a probability measure µ ∈ P(D^ϕ) such that P = ∫_{D^ϕ} Q µ(dQ).

The main theoretical result of the present work is the proof of the following theorem.

Theorem 4. Assume that the transition probabilities are continuous, the space X is topologically complete, and A is compact. Then D is a metrizable compactum.

Proof. The space D is metrizable unconditionally: this follows from Theorem 1 (note that P(Ω) is a Borel space [2]). Let us show that D is a compactum.

First we establish that for each t = 1, 2, ... the sets {P_t^G, P ∈ D} and {P_t^H, P ∈ D} are compact. Clearly {P_0, P ∈ D} is compact (this set contains the single point P_0). Suppose the statement holds for some t. Since the space H_t is complete and separable, the compactness of the set {P_t^H, P ∈ D} implies its tightness by the Prokhorov theorem [8]: for every ε > 0 there is a compact K_ε^{H_t} ⊆ H_t such that P_t^H(K_ε^{H_t}) ≥ 1 − ε for all P ∈ D. For each ε > 0 put K_ε^{G_{t+1}} = K_ε^{H_t} × A, a compact subset of G_{t+1}. Clearly P_{t+1}^G(K_ε^{G_{t+1}}) ≥ 1 − ε for all P ∈ D, so the set {P_{t+1}^G, P ∈ D} is precompact; according to Theorem 2 it is closed, hence {P_{t+1}^G, P ∈ D} is a compactum. The mapping K_{ε/2}^{G_{t+1}} → P(X) defined by the formula P̃_{t+1}(·) = p(·|g_{t+1}) is continuous by the condition of the theorem. Therefore {P̃_{t+1}, g_{t+1} ∈ K_{ε/2}^{G_{t+1}}} is a compactum and, by the Prokhorov theorem, there exists a compactum K_{ε/2}^X ⊆ X such that P̃_{t+1}(K_{ε/2}^X) = p(K_{ε/2}^X|g_{t+1}) ≥ 1 − ε/2 for all g_{t+1} ∈ K_{ε/2}^{G_{t+1}}. Obviously, for each measure P_{t+1}^H (P ∈ D) the relations

P_{t+1}^H(K_{ε/2}^{G_{t+1}} × K_{ε/2}^X) = ∫_{K_{ε/2}^{G_{t+1}}} p(K_{ε/2}^X|g_{t+1}) P_{t+1}^G(dg_{t+1}) ≥ ∫_{K_{ε/2}^{G_{t+1}}} (1 − ε/2) P_{t+1}^G(dg_{t+1}) ≥ (1 − ε/2)(1 − ε/2) > 1 − ε

are valid, that is, the set {P_{t+1}^H, P ∈ D} is tight and hence precompact. According to Theorem 2 it is closed. Therefore the set {P_{t+1}^H, P ∈ D} is compact, and the induction step is proved.

Now take an arbitrary sequence of measures ^nP̃ from D, and let {^mP}_{m=1}^∞ be a subsequence such that for every t = 1, 2, ... the marginals ^mP_t^H converge weakly as m → ∞. According to the Ionescu-Tulcea theorem there exists a unique strategic measure P ∈ D whose images under the projections (2), that is P_t^H and P_t^G, coincide with the limits lim_{m→∞} ^mP_t^H = P_t^H, lim_{m→∞} ^mP_t^G = P_t^G at every t (here Theorem 2 was used too). Let us show that lim_{m→∞} ^mP = P. Fix an arbitrary open set O ⊆ H_∞. By the definition of the product topology,

O = O_{H_0} × (A × X)^∞ ∪ { ⋃_{t≥1} [ O_{G_t} × X × (A × X)^∞ ∪ O_{H_t} × (A × X)^∞ ] },

where O_{H_t} (t ≥ 0) and O_{G_t} (t ≥ 1) are some open sets in H_t and in G_t. Denote by

O_T = O_{H_0} × (A × X)^∞ ∪ { ⋃_{t=1}^T [ O_{G_t} × X × (A × X)^∞ ∪ O_{H_t} × (A × X)^∞ ] }

the corresponding open set in H_∞, T ≥ 0. Then O = ⋃_{T≥0} O_T. The set

O_T^H = O_{H_0} × (A × X)^T ∪ { ⋃_{t=1}^T [ O_{G_t} × X × (A × X)^{T−t} ∪ O_{H_t} × (A × X)^{T−t} ] }

is open in H_T. Clearly, for all m ≥ 1 and T ≥ 0, ^mP(O) ≥ ^mP(O_T) = ^mP_T^H(O_T^H), hence

lim inf_{m→∞} ^mP(O) ≥ lim inf_{m→∞} ^mP(O_T) = lim inf_{m→∞} ^mP_T^H(O_T^H) ≥ P_T^H(O_T^H) = P(O_T).

Therefore lim inf_{m→∞} ^mP(O) ≥ lim_{T→∞} P(O_T) = P(O), and consequently lim_{m→∞} ^mP = P. So D is a compactum and the proof is complete.

Corollary 1. Under the conditions of Theorem 4 the set of all finite mixtures of selectors is everywhere dense in D; to put it differently, the closed convex hull of the set D^ϕ coincides with D. The proof follows directly from the Krein-Milman theorem [9] and Theorem 2.

4. SOLVABILITY OF THE OPTIMAL CONTROL PROBLEM AND SUFFICIENT CLASSES OF STRATEGIES. We shall assume that all the conditions of Theorem 4 are fulfilled.

Suppose that D_1 = D and the functional R(·) is bounded and continuous. According to Corollary 1, for each point P ∈ D and for each ε > 0 there exists a finite mixture of selectors P̃ = ∑_{m=1}^M λ_m P^{ϕ_m} such that R(P̃) ≤ R(P) + ε. Therefore, in this case one can restrict oneself to finite mixtures of selectors when looking for a solution of problem (3). Clearly, in order to have an exact solution of problem (3) it is sufficient to require the lower semicontinuity of the functional R(·) and the compactness of D_1.

Lemma 1 [10]. If R(·) is an integral functional then for each P ∈ D and each K > −∞ there exists a selector ϕ such that

R(P^ϕ) ≤ R(P) if R(P) > −∞;   R(P^ϕ) ≤ K if R(P) = −∞.

Corollary 2. If R(·) is an integral functional with a function R(ω) bounded below and lower semicontinuous, then there exists a selector ϕ* providing a solution of problem (3). In order to prove this statement it is sufficient to note that the functional R(·) is then bounded below and lower semicontinuous.

Let us consider the case when

D_1 = {P ∈ D : S_1(P) ≤ 0, S_2(P) ≤ 0, ..., S_N(P) ≤ 0},     (5)

where S_1(·), S_2(·), ..., S_N(·) are additional loss functionals. In this connection problem (3) is called the problem with functional constraints.

Theorem 5. Assume that the functionals S_n(P), n = 1, 2, ..., N, are bounded below and lower semicontinuous, the functional R(·) is integral with the function R(ω) bounded below and lower semicontinuous, and the set D_1 is not empty. Then there exists a solution P* of problem (3) such that P* is an extreme point of D_1.

Proof. Clearly D_1 is a compactum and problem (3) has a solution P̃. In accordance with the Choquet theorem [9], each point P ∈ D_1 is the barycenter of some probability measure µ_P concentrated on the set E of all extreme points of D_1, and

R(P) = ∫_E R(Q) µ_P(dQ).

Put D̃ = {Q ∈ E : R(Q) ≤ R(P̃)}. Then µ_{P̃}(D̃) > 0, and it remains only to select an arbitrary point P* ∈ D̃.

Theorem 6. Assume that all the functionals R(·), S_n(·), n = 1, 2, ..., N, are integral, the functions S_n(ω), n = 1, 2, ..., N, are continuous and bounded, and one of the following conditions is satisfied:
(a) D_1 ≠ ∅ and the function R(ω) is continuous and bounded;
(b) the function R(ω) is bounded below and lower semicontinuous, and the Slater condition is fulfilled: for some point P̂ ∈ D the strict inequalities S_n(P̂) < 0, n = 1, 2, ..., N, and R(P̂) < +∞ hold.
Then there exists a solution of problem (3),(5) in the form of a mixture of (N + 1) selectors.

Proof. (a) In accordance with Theorem 5 there exists a solution P̂* of problem (3),(5) such that the point (R(P̂*), S_1(P̂*), S_2(P̂*), ..., S_N(P̂*)) belongs to the boundary of the convex compactum

V = {(R(P), S_1(P), S_2(P), ..., S_N(P)), P ∈ D} ⊂ R^{N+1}

and by the Caratheodory theorem can be represented as a convex combination of (N + 1) extreme points of V. But every extreme point of V corresponds to some extreme point of D [9], since the mapping (R(·), S_1(·), S_2(·), ..., S_N(·)) : D → R^{N+1} is affine and continuous. It remains only to use Theorem 2.
(b) The proof in this case can be found in [5].

5. MARKOV MODELS WITH INTEGRAL FUNCTIONALS. A model is called Markov if the transition probability is of the form p_t(dx_t|x_{t−1}, a_t). Assume that the function R(ω) in expression (4) is of the form

R(ω) = ∑_{t=1}^T r_t(x_{t−1}, a_t)     (6)

and consider a Markov model.

Definition 3. Two strategies π^1 and π^2 are called equivalent if the equality

∫_Ω [ ∑_{t=1}^T r_t(x_{t−1}, a_t) ] P^{π^1}(dω) = ∫_Ω [ ∑_{t=1}^T r_t(x_{t−1}, a_t) ] P^{π^2}(dω)

holds for any non-negative measurable function r_t(x, a).

Definition 4. The occupation measure for the strategy π is the probability measure ν^π(·) on the set {1, 2, ..., T} × X × A defined in the following way:

ν^π({t_1, t_2, ..., t_I} × Γ^X × Γ^A) = (1/T) ∫_Ω [ ∑_{i=1}^I I{x_{t_i−1} ∈ Γ^X, a_{t_i} ∈ Γ^A} ] P^π(dω).     (7)

The space of all occupation measures is denoted by D^o.

It is obvious that the strategies π^1 and π^2 are equivalent if and only if the corresponding occupation measures coincide. Besides, a measure ν on the set {1, 2, ..., T} × X × A is an occupation measure if and only if the measure ν({1} × Γ^X × A) on X coincides with (1/T) P_0(Γ^X) and the equality

∫_{{1,2,...,T}×X×A} I{t = θ + 1} f(x) dν = ∫_{{1,2,...,T}×X×A} I{t = θ} [ ∫_X f(y) p_θ(dy|x, a) ] dν

is valid for any measurable bounded function f(x) and any θ = 1, 2, ..., T − 1.

Suppose that if two strategies π^1 and π^2 are equivalent then either P^{π^1} ∈ D_1 and P^{π^2} ∈ D_1, or P^{π^1} ∉ D_1 and P^{π^2} ∉ D_1. For instance, this condition is satisfied if D_1 is of the form (5) with the functionals S_n(·), n = 1, 2, ..., N, defined by expressions similar to (4) and (6).

Remark 2. All the functions r_t(·), t = 1, 2, ..., T, in formula (6) are assumed to be bounded above or below; this also pertains to the functions s_{nt}(·), n = 1, 2, ..., N.

Definition 5. A strategy is called Markov if it is defined by stochastic kernels of the type π_t^m(da_t|x_{t−1}); a selector is called Markov if it has the form ϕ^m(t, x_{t−1}). D^{oϕ} is the set of all occupation measures corresponding to Markov selectors.

There exists a Markov strategy in each class of equivalent strategies: it is sufficient to represent an occupation measure ν in the form

ν({t} × dx × da) = (1/T) P̃_{t−1}(dx) π_t^m(da|x)     (8)

and to verify that ν = ν^{π^m}.
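The representation (8) suggests a direct computation for finite X and A: propagate the state marginals P̃_{t−1} forward with the kernels p_t, accumulate ν^π according to (7)–(8), and read a Markov strategy back off ν. The following Python sketch does this for an illustrative two-state, two-action model; the particular kernels, strategy and horizon are assumptions made only for the example.

```python
import numpy as np

# Illustrative finite Markov model (assumed data): X = {0,1}, A = {0,1}, horizon T.
T = 4
P0 = np.array([0.5, 0.5])                        # initial distribution on X
# p_t[x, a, y] = p_t(y | x, a); taken time-homogeneous here for brevity.
p_t = np.array([[[0.7, 0.3], [0.3, 0.7]],
                [[0.6, 0.4], [0.2, 0.8]]])
p = [p_t for _ in range(T)]
# A Markov strategy pi[t][x, a] = pi^m_t(a | x); here it randomizes 50/50.
pi = [np.full((2, 2), 0.5) for _ in range(T)]

# Forward recursion: marginals P~_{t-1}(x) and occupation measure nu({t} x dx x da).
nu = np.zeros((T, 2, 2))
marg = P0.copy()
for t in range(T):
    nu[t] = marg[:, None] * pi[t] / T            # formula (8): (1/T) P~_{t-1}(dx) pi_t(da|x)
    # next state marginal: sum over x, a of P~_{t-1}(x) pi_t(a|x) p_t(y|x, a)
    marg = np.einsum('x,xa,xay->y', marg, pi[t], p[t])

assert abs(nu.sum() - 1.0) < 1e-12               # nu is a probability measure

# Recover a Markov strategy from nu via (8): pi_t(a|x) = nu(t,x,a) / sum_a nu(t,x,a).
pi_back = nu / nu.sum(axis=2, keepdims=True)
print(np.allclose(pi_back, pi))                  # True for this example
```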

Under the assumptions made, one can investigate the problem

∑_{t=1}^T ∫_{X×A} r_t(x, a) ν({t} × dx × da) −→ inf_{ν ∈ D_1^o}     (9)

instead of problem (3). Here D_1^o = {ν ∈ D^o : P^{π^m} ∈ D_1}, P^{π^m} being the strategic measure for the Markov strategy π^m from the representation (8). If ν* is a solution of problem (9) then the optimal strategy π^{m*} can be built by formula (8). Naturally, with this approach we do not go beyond the class of Markov strategies.

Obviously formula (7) defines a continuous affine mapping from D onto D^o. Therefore the theorems of Section 3 imply the following properties of the space D^o (more detailed reasonings can be found in [5]):
– the set D^o is convex;
– if ν ∈ D^o is an extreme point then there exists a Markov selector ϕ^m such that ν = ν^{ϕ^m};
– if the transition probabilities p_t(dy|x, a) are continuous then the set D^o is closed in P({1, 2, ..., T} × X × A); if in addition X is topologically complete and A is compact then D^o is a compactum and coincides with the closed convex hull of the set D^{oϕ}.

Hence the corresponding results of Section 4 hold for problem (9) if D is replaced by D^o and D^ϕ is replaced by D^{oϕ}. For instance, let us present the statements corresponding to Theorems 5 and 6. So problem (9) is considered where the set D_1 is of the form (5), all the functionals S_n(·), n = 1, 2, ..., N, being of the type (4),(6):

S_n(ω) = ∑_{t=1}^T s_{nt}(x_{t−1}, a_t).

In other words, one should minimize the expression

∑_{t=1}^T ∫_{X×A} r_t(x, a) ν({t} × dx × da) −→ inf_{ν ∈ D^o}     (10)

under the constraints

∑_{t=1}^T ∫_{X×A} s_{nt}(x, a) ν({t} × dx × da) ≤ 0,   n = 1, 2, ..., N.     (11)

Theorem 7. Assume that the transition probabilities p_t(dy|x, a) are continuous, the space X is topologically complete, A is compact, and all the functions r_t(x, a), s_{nt}(x, a), n = 1, 2, ..., N, are lower semicontinuous and bounded below. If D_1^o ≠ ∅ (that is, the class of admissible occupation measures meeting inequalities (11) is not empty) then there exists a solution of problem (10),(11) which is an extreme point of D_1^o and corresponds to some Markov strategy. If the functions s_{nt}(x, a), n = 1, 2, ..., N, are continuous and bounded and one of the following conditions is satisfied:
(a) D_1^o ≠ ∅ and the functions r_t(x, a) are continuous and bounded;
(b) the Slater condition is satisfied: for some point ν ∈ D^o expression (10) is finite and all the inequalities (11) are strict;
then there exists a solution of problem (10),(11) in the form of a mixture of (N + 1) Markov selectors.

6. DISCOUNTED MARKOV MODELS. A Markov model is called homogeneous if the transition probability p(dx_t|x_{t−1}, a_t) does not depend on time. Assume that the function R(ω) in expression (4) is of the form

R(ω) = ∑_{t=1}^∞ β^{t−1} r(x_{t−1}, a_t)     (12)

and consider a homogeneous model. The parameter β ∈ (0, 1) is called the discount factor.

Definition 6. Two strategies π^1 and π^2 are called equivalent if the equality

∫_Ω [ ∑_{t=1}^∞ β^{t−1} r(x_{t−1}, a_t) ] P^{π^1}(dω) = ∫_Ω [ ∑_{t=1}^∞ β^{t−1} r(x_{t−1}, a_t) ] P^{π^2}(dω)

holds for any non-negative measurable function r(x, a).

Definition 7. The discounted occupation measure for the strategy π is the probability measure ν_d^π(·) on the set X × A defined in the following way:

ν_d^π(Γ^X × Γ^A) = (1 − β) ∫_Ω [ ∑_{t=1}^∞ β^{t−1} I{x_{t−1} ∈ Γ^X, a_t ∈ Γ^A} ] P^π(dω).     (13)

The space of all discounted occupation measures is denoted by the symbol D^{od} = {ν_d^π}.

It is obvious that the strategies π^1 and π^2 are equivalent if and only if the corresponding discounted occupation measures coincide. Besides, a measure ν on the set X × A is a discounted occupation measure if and only if the equality

∫_{X×A} f(x) dν = (1 − β) ∫_X f(y) P_0(dy) + β ∫_{X×A} [ ∫_X f(y) p(dy|x, a) ] dν

is valid for any measurable bounded function f(x). The detailed proofs can be found in [5].

Suppose that if two strategies π^1 and π^2 are equivalent then either P^{π^1} ∈ D_1 and P^{π^2} ∈ D_1, or P^{π^1} ∉ D_1 and P^{π^2} ∉ D_1. For instance, this condition is satisfied if D_1 is of the form (5) with the functionals S_n(·), n = 1, 2, ..., N, defined by expressions similar to (4) and (12). (The discount factor β is the same for all the functionals.)

Remark 3. The function r(·) in formula (12) is assumed to be bounded above or below; this also pertains to the functions s_n(·), n = 1, 2, ..., N.

Definition 8. A strategy is called stationary if it is defined by a stochastic kernel of the type π^s(da_t|x_{t−1}), the same at each t = 1, 2, ...; a selector is called stationary if it has the form ϕ^s(x_{t−1}). D^{odϕ} is the set of all discounted occupation measures corresponding to stationary selectors.
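For finite X and A the characterization above becomes a small linear system: the state marginal m(x) = ν_d({x} × A) of a stationary strategy satisfies m = (1 − β)P_0 + β Q^T m, where Q(x, y) = ∑_a π^s(a|x) p(y|x, a), and then ν_d(x, a) = m(x) π^s(a|x). A minimal Python sketch, with all numerical data assumed only for illustration:

```python
import numpy as np

# Illustrative data (assumptions): X = {0,1}, A = {0,1}, discount beta.
beta = 0.9
P0 = np.array([0.5, 0.5])                       # initial distribution
p = np.array([[[0.7, 0.3], [0.3, 0.7]],         # p[x, a, y] = p(y | x, a)
              [[0.6, 0.4], [0.2, 0.8]]])
pi = np.array([[0.5, 0.5],                      # pi[x, a] = pi^s(a | x)
               [1.0, 0.0]])

# State transition matrix under pi^s: Q[x, y] = sum_a pi(a|x) p(y|x, a).
Q = np.einsum('xa,xay->xy', pi, p)

# Solve the characterization for the state marginal m = nu_d(. x A):
#   m = (1 - beta) * P0 + beta * Q^T m.
m = np.linalg.solve(np.eye(2) - beta * Q.T, (1 - beta) * P0)

nu_d = m[:, None] * pi                          # nu_d(x, a) = m(x) pi^s(a|x)
print(nu_d, nu_d.sum())                         # sums to 1
```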

There exists a stationary strategy in each class of equivalent strategies: it is sufficient to represent a discounted occupation measure ν_d in the form

ν_d(dx × da) = P̃(dx) π^s(da|x)     (14)

and to verify that ν_d = ν_d^{π^s}. Under the assumptions made, one can investigate the problem

∫_{X×A} r(x, a) ν(dx × da) −→ inf_{ν ∈ D_1^{od}}     (15)

instead of problem (3). Here D_1^{od} = {ν ∈ D^{od} : P^{π^s} ∈ D_1}, P^{π^s} being the strategic measure for the stationary strategy π^s from the representation (14). If ν_d^* is a solution of problem (15) then the optimal strategy π^{s*} can be built by formula (14). Naturally, with this approach we do not go beyond the class of stationary strategies.

Obviously formula (13) defines a continuous affine mapping from D onto D^{od}. (Note that the function F(ω) = ∑_{t=1}^∞ β^{t−1} f(x_{t−1}, a_t) is continuous and bounded if the function f(x, a) is continuous and bounded [5].) Therefore the theorems of Section 3 imply the following properties of the space D^{od} (more detailed reasonings can be found in [5]):
– the set D^{od} is convex;
– if ν ∈ D^{od} is an extreme point then there exists a stationary selector ϕ^s such that ν = ν_d^{ϕ^s};
– if the transition probability p(dy|x, a) is continuous then the set D^{od} is closed in P(X × A); if in addition X is topologically complete and A is compact then D^{od} is a compactum and coincides with the closed convex hull of the set D^{odϕ}.

Hence the corresponding results of Section 4 hold for problem (15) if D is replaced by D^{od} and D^ϕ is replaced by D^{odϕ}. For instance, let us present the statements corresponding to Theorems 5 and 6.

So problem (15) is considered where the set D_1 is of the form (5), all the functionals S_n(·), n = 1, 2, ..., N, being of the type (4),(12):

S_n(ω) = ∑_{t=1}^∞ β^{t−1} s_n(x_{t−1}, a_t).

In other words, one should minimize the expression

∫_{X×A} r(x, a) ν(dx × da) −→ inf_{ν ∈ D^{od}}     (16)

under the constraints

∫_{X×A} s_n(x, a) ν(dx × da) ≤ 0,   n = 1, 2, ..., N.     (17)

for production). In actual practice there is a delay between the apportionment of the resources for the publicity and the change in the requests flow. But as a first approximation one can consider the Markov model without delay. Suppose that the controlled stochastic process xt with the values in the state space X = {0, 1} describes providing some firm with orders for production: xt−1 = 0 (xt−1 = 1) means that there are (no) requests in the time interval (t − 1, t]. The action at ∈ A = {0, 1} in the interval (t − 1, t] consists in the apportionment of the resources for the publicity; as a consequence the probability of the arrival of a request before the moment t is equal to λ(at ) ∈ [0, 1]; λ(0) < λ(1). If xt−1 = 1 then all the new requests are lost. Assume that µ ∈ (0, 1) is the known probability of the completion of all the present orders in the interval (t − 1, t] provided that xt−1 = 1. The graphical display is presented in fig. 1.

λ(a) 1 − λ(a)'$  x=0 

yX X &%µ

'$ 1 − µ XX z

x=1

   &%

Figure 1: The transition network of the system.

Lastly, the initial probability distribution is assumed to be given: P_0(0) = p, P_0(1) = 1 − p. Let us consider the discounted model with a given discount factor β ∈ (0, 1) and the transition probability

p(y|x, a) = λ(a) if x = 0, y = 1;
p(y|x, a) = 1 − λ(a) if x = 0, y = 0;
p(y|x, a) = µ if x = 1, y = 0;
p(y|x, a) = 1 − µ if x = 1, y = 1.

First of all we investigate the set D^{od} of all the discounted occupation measures. Since the set X × A contains four points, D^{od} is a subset of the space R^4. In accordance with Section 6, D^{od} coincides with the closed convex hull of the set D^{odϕ} (that is, the set of all discounted occupation measures corresponding to stationary selectors). Obviously there are only four stationary selectors in the model:

ϕ_{00}(x) ≡ 0;  ϕ_{01}(x) = x;  ϕ_{10}(x) = 1 − x;  ϕ_{11}(x) ≡ 1.

Therefore it is sufficient to determine the discounted occupation measures ν_d^{ϕ_{00}}, ν_d^{ϕ_{01}}, ν_d^{ϕ_{10}}, and ν_d^{ϕ_{11}}. Elementary calculations lead to the following values:

ν_d^{ϕ_{00}}(0, 0) = ν_d^{ϕ_{01}}(0, 0) = [βµ + p(1 − β)] / [1 − β + βµ + βλ(0)] = 1 − ν_d^{ϕ_{00}}(1, 0) = 1 − ν_d^{ϕ_{01}}(1, 1);

ν_d^{ϕ_{10}}(0, 1) = ν_d^{ϕ_{11}}(0, 1) = [βµ + p(1 − β)] / [1 − β + βµ + βλ(1)] = 1 − ν_d^{ϕ_{10}}(1, 0) = 1 − ν_d^{ϕ_{11}}(1, 1).

All the other components ν_d^{ϕ_{00}}(0, 1) = ν_d^{ϕ_{00}}(1, 1) = ν_d^{ϕ_{01}}(0, 1) = ... = ν_d^{ϕ_{11}}(0, 0) = ν_d^{ϕ_{11}}(1, 0) = 0 are zero. The first three coordinates ν_d(0, 0), ν_d(0, 1), ν_d(1, 0) of the four-dimensional vectors ν_d are presented in Fig. 2; these three-dimensional vectors form the planar convex quadrilateral indicated by the double line. The fourth coordinate ν_d(1, 1) is determined from the normalization condition.
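These closed forms are easy to check numerically: for each of the four selectors, solve the two-state linear system from Section 6 and compare with the formulas above. The parameter values in the sketch below are illustrative assumptions.

```python
import numpy as np

# Illustrative parameters (assumptions): lambda(0), lambda(1), mu, beta, p.
lam = {0: 0.2, 1: 0.6}
mu, beta, p0 = 0.5, 0.9, 0.4

def p_kernel(x, a):
    """Transition row p(. | x, a) of the publicity model."""
    return np.array([1 - lam[a], lam[a]]) if x == 0 else np.array([mu, 1 - mu])

def nu_d(selector):
    """Discounted occupation measure of a stationary selector phi: X -> A."""
    Q = np.vstack([p_kernel(x, selector(x)) for x in (0, 1)])   # Q[x, y]
    m = np.linalg.solve(np.eye(2) - beta * Q.T, (1 - beta) * np.array([p0, 1 - p0]))
    nu = np.zeros((2, 2))
    for x in (0, 1):
        nu[x, selector(x)] = m[x]
    return nu

phi00, phi01, phi10, phi11 = (lambda x: 0), (lambda x: x), (lambda x: 1 - x), (lambda x: 1)
closed_form_0 = (beta * mu + p0 * (1 - beta)) / (1 - beta + beta * mu + beta * lam[0])
closed_form_1 = (beta * mu + p0 * (1 - beta)) / (1 - beta + beta * mu + beta * lam[1])
print(np.isclose(nu_d(phi00)[0, 0], closed_form_0),   # nu_d^{phi00}(0,0)
      np.isclose(nu_d(phi10)[0, 1], closed_form_1))   # nu_d^{phi10}(0,1)
```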

Figure 2: Three-dimensional projections of the sets D^{odϕ} and D^{od}.

Figure 3: Three-dimensional projections of the sets D^{od} and D_1^{od}.

So the vertices of the quadrilateral form the image of the set D^{odϕ}, and the quadrilateral itself is the image of the set D^{od}.

Let us now consider optimal control problems for the model constructed. Suppose that the absence of requests in the interval (t − 1, t] implies the loss c, that is,

r(x, a) = δ_{x,0} · c.

The solution of the problem

∑_{X×A} r(x, a) ν(x, a) −→ inf_{ν ∈ D^{od}}     (18)

is trivial: one can take the selectors ϕ_{10} and ϕ_{11}, as well as any mixture of them. Suppose that the publicity expenses in the interval (t − 1, t] equal a_t:

s̃(x, a) = a.

The solution of the problem

∑_{X×A} s̃(x, a) ν(x, a) −→ inf_{ν ∈ D^{od}}     (19)

is also obvious: the selector ϕ_{00} is the optimal strategy. Clearly the multicriteria control problem (18),(19) is inconsistent. Therefore, let us choose a constant d and solve problem (18) under the constraint

∑_{X×A} s̃(x, a) ν(x, a) ≤ d.     (20)

After renaming s(x, a) = a − d we obtain the standard problem (16),(17) with N = 1, which in this case is an ordinary convex programming problem and can be solved with the help of the Lagrange multipliers method. Let us present the answer. Let

(1 − β) S(P^{ϕ_{00}}) = ∑_{X×A} s(x, a) ν_d^{ϕ_{00}}(x, a) = −d;

(1 − β) S(P^{ϕ_{10}}) = ∑_{X×A} s(x, a) ν_d^{ϕ_{10}}(x, a) = [βµ + p(1 − β)] / [1 − β + βµ + βλ(1)] − d

be the constraint functionals (the publicity expenses shifted by d) under the strategies ϕ_{00} and ϕ_{10}. Clearly, if d < 0 then there exist no admissible strategies meeting inequality (20). On the other hand, if d > [βµ + p(1 − β)] / [1 − β + βµ + βλ(1)] then constraint (20) is inessential: the solution ϕ_{10} of problem (18) satisfies constraint (20) automatically. Therefore, assume that the constant d satisfies the inequalities

0 ≤ d ≤ [βµ + p(1 − β)] / [1 − β + βµ + βλ(1)].

Then the nonempty set of admissible plans D_1^{od} is a part of the quadrilateral presented in Fig. 2; it is shown in Fig. 3. The dot-and-dash straight line there is the level curve S(P^π) = 0, that is, ∑_{X×A} s̃(x, a) ν(x, a) = d. The extreme point ν* of the set D_1^{od} shown in Fig. 3 provides a solution to problem (18),(20):

ν* = [(1 − γ*)/2] ν_d^{ϕ_{00}} + [(1 + γ*)/2] ν_d^{ϕ_{10}},   γ* = [S(P^{ϕ_{00}}) + S(P^{ϕ_{10}})] / [S(P^{ϕ_{00}}) − S(P^{ϕ_{10}})] ∈ [−1, 1].     (21)
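A quick numerical sanity check of (21), using the same illustrative parameters as above and a constraint level d chosen inside the admissible interval (all values are assumptions):

```python
import numpy as np

# Illustrative parameters (assumptions), consistent with the earlier sketch.
lam0, lam1, mu, beta, p0, c, d = 0.2, 0.6, 0.5, 0.9, 0.4, 1.0, 0.15

V0 = (beta * mu + p0 * (1 - beta)) / (1 - beta + beta * mu + beta * lam0)  # nu^{phi00}(0,0)
V1 = (beta * mu + p0 * (1 - beta)) / (1 - beta + beta * mu + beta * lam1)  # nu^{phi10}(0,1)

# Constraint functionals (times 1 - beta): S00 = -d, S10 = V1 - d.
S00, S10 = -d, V1 - d
gamma = (S00 + S10) / (S00 - S10)                 # gamma* from formula (21), in [-1, 1]
w00, w10 = (1 - gamma) / 2, (1 + gamma) / 2       # mixture weights

# Mixture of the two occupation measures; entries ordered (0,0), (0,1), (1,0), (1,1).
nu00 = np.array([V0, 0.0, 1 - V0, 0.0])
nu10 = np.array([0.0, V1, 1 - V1, 0.0])
nu_star = w00 * nu00 + w10 * nu10

spend = nu_star[1] + nu_star[3]                   # sum over (x, a) of a * nu*(x, a)
loss = c * (nu_star[0] + nu_star[1])              # sum over (x, a) of delta_{x,0} c * nu*(x, a)
print(f"gamma* = {gamma:.4f}, publicity spend = {spend:.4f} (= d), loss = {loss:.4f}")
```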

We have obtained a solution in the form of a mixture of two stationary selectors. In accordance with Section 6, ν* = ν_d^{π^s}, where the stationary strategy π^s can be calculated by the following formulae:

π^s(0|1) = 1;   π^s(0|0) = (1 − η*)/2;   π^s(1|0) = (1 + η*)/2;

here η* is the solution of the equation

a(η) (βµ + p(1 − β)) / (1 − β + βµ + βλ(η)) = d,

where the functions a(η) and λ(η) are defined by the expressions

a(η) = (1 + η)/2;   λ(η) = [(1 − η)/2] λ(0) + [(1 + η)/2] λ(1).

Remark 4. One can consider the following generalization of the problem investigated: A = [0, 1], and λ(a) : [0, 1] → [0, 1] is a given nondecreasing function. If the function λ(·) is convex then the solution of problem (18),(20) is retained with no modifications up to formulae (21); the answer in the form of a stationary strategy depends on the specific function λ(·). If the function λ(·) is concave then problem (18),(20) needs a special investigation. The particular example λ(a) = √a was considered in [5] (the version with average losses).

ACKNOWLEDGEMENTS. This research was partially supported by the Royal Society. The research of the second author was supported by the Fund of Fundamental Research of Russia, grant 95-01-00191.

References

[1] E.B. Dynkin and A.A. Yushkevich. Controlled Markov Processes and Their Applications. Springer-Verlag, N.Y.-Berlin, 1979.

[2] D.P. Bertsekas and S.E. Shreve. Stochastic Optimal Control. Academic Press, N.Y.-San Francisco-London, 1978.

[3] V.N. Afanasiev, V.B. Kolmanovskii, and V.R. Nosov. Mathematical Theory of Control Systems Design. Kluwer Academic Publishers, Dordrecht-Boston-London, 1995.

[4] V.S. Borkar. Topics in Controlled Markov Chains. Longman Scientific and Technical, England, 240, 1991.

[5] A.B. Piunovskiy. Optimal Control of Random Sequences in Problems with Constraints. Kluwer Academic Publishers, Dordrecht-Boston-London, 1997.

[6] E.A. Feinberg. On measurability and representation of strategic measures in Markov decision processes. In: Statistics, Probability and Game Theory: Papers in Honor of David Blackwell (ed. T. Ferguson), IMS Lecture Notes - Monograph Series, 30, (1996), 29-43.

[7] E. Altman. Constrained Markov decision processes with total cost criteria: occupation measures and primal LP. Math. Methods of Oper. Res., 43, (1996), 45-72.

[8] P. Billingsley. Convergence of Probability Measures. J. Wiley and Sons, N.Y.-London-Sydney-Toronto, 1968.

[9] P.-A. Meyer. Probability and Potentials. Blaisdell, Waltham, Massachusetts-Toronto-London, 1966.

[10] E.A. Feinberg. Controlled Markov processes with arbitrary numerical criteria. Theory Probab. Appl., 27, (1982), N.3, 486-503.