J Syst Sci Complex (2017) 30: 782–802

Numerical Solution to Optimal Feedback Control by Dynamic Programming Approach: A Local Approximation Algorithm

GUO Bao-Zhu · WU Tao-Tao

DOI: 10.1007/s11424-017-5149-1
Received: 16 June 2015 / Revised: 9 October 2016
© The Editorial Office of JSSC & Springer-Verlag Berlin Heidelberg 2017

Abstract  This paper considers optimal feedback control for a general continuous-time finite-dimensional deterministic system with a finite-horizon cost functional. A practically feasible algorithm for computing the numerical solution of the optimal feedback control by the dynamic programming approach is developed. The highlights of this algorithm are: a) it is based on a convergent constructive algorithm for the optimal feedback control law, proposed by the authors previously, which approximates the viscosity solution through a time-space discretization scheme developed by the dynamic programming method; b) the computational complexity is significantly reduced, since only values of the viscosity solution on some local cones around the optimal trajectory are calculated. Two numerical experiments are presented to illustrate the effectiveness and speed of the algorithm.

Keywords  Curse of dimensionality, Hamilton-Jacobi-Bellman equation, optimal feedback control, upwind finite difference, viscosity solutions.

1  Introduction

Seeking the optimal feedback control law for an optimal control problem is a major issue in optimal control theory. However, for most optimal control problems, finding the analytic optimal control is formidable, so numerical solution is the only practically feasible way to obtain the optimal control law. Theoretically, the celebrated Pontryagin maximum principle[1] and the Bellman dynamic programming method are two effective methods for solving many optimal control problems. The necessary condition provided by the maximum principle results in solving a two-point boundary-value problem, which gives rise to the powerful multiple shooting method[2]; however, the multiple shooting method may run into the difficulty of the "initial guess"[3], and the optimal control obtained is usually not in feedback form. In contrast to the maximum principle, which deals with only one extremal problem, the dynamic programming method deals with a family of extremals. Once the Hamilton-Jacobi-Bellman (HJB) equation satisfied by the value function is established, the optimal feedback control law can be found in terms of the solution of this first-order nonlinear partial differential equation[4]. Unfortunately, the HJB equation may have no classical solution no matter how smooth its coefficients are, a fact well known since Pontryagin's time. To overcome this difficulty, Crandall and Lions[5] introduced the viscosity solution for Hamilton-Jacobi (HJ) equations in the 1980s. Two approximations of the solution of HJ equations were provided in [6]. Furthermore, [7] proposed an efficient algorithm for HJ equations in high dimension. For other numerical methods for solving HJ equations, we refer to the ENO/WENO methods[8, 9], the discontinuous Galerkin method[10, 11], the viability kernel schemes[12], and the upwind finite difference discretizations (such as the Fast Marching Methods[13], the Fast Sweeping Methods[14], and Ordered Upwind Methods[15]) for isotropic stationary problems. For more research on the numerical approximation of optimal feedback control via solutions of HJB equations, we refer to Appendix A of [4] and [16, 17]. Under the weak notion of the viscosity solution, the existence and uniqueness of the solution to the HJB equation is guaranteed. For finite-dimensional problems, the standard methods for computing the viscosity solution of the HJB equation are finite element methods and the Markov chain approximation method[18]. We refer to [19–22] for studies of viscosity solutions for infinite-dimensional optimal control problems.

(GUO Bao-Zhu: Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China. Email: [email protected]. WU Tao-Tao: Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China. Email: [email protected]. This paper was recommended for publication by Editor CHEN Jie.)
Different from the above two approaches, a new class of methods, the so-called max-plus-based methods[23], was explored for computing the viscosity solution of the HJB equation; convergence rates and error bounds are investigated in [24, 25]. However, as indicated in [24, 25], "although max-plus basis expansion and max-plus finite element methods can provide substantial computational-speed advantages, they still generally suffer from the curse of dimensionality". A curse-of-dimensionality-free algorithm was developed in [23] for the case where the Hamiltonian takes the form of a (pointwise) maximum of linear/quadratic forms.

In this paper, we aim at developing a practically feasible algorithm to find the numerical solution of optimal feedback control for finite-dimensional control systems with finite-horizon cost functional based on the dynamic programming approach. The algorithm is motivated by a convergent constructive discrete scheme for the optimal feedback control law that we developed in [26], through an approximation of the viscosity solution for which convergence is available, and by the local approximation method for the viscosity solution based on the upwind finite difference method[6, 27, 28]. The highlight of this algorithm is that we do not need to compute the values of the value function at every point of the (n + 1)-dimensional time-space domain. In this sense, the computational complexity is significantly reduced. Some efforts on the dimensionality problem have already been made in [20, 29, 30], but no convergence was proven for those discrete schemes.

Let us first recall the problem. For a given T > 0, we consider the following nonlinear


finite-dimensional control system:

$$\begin{cases} y'(t) = f(y(t), t, u(t)), & t \in (0, T], \\ y(0) = z, \end{cases} \eqno(1)$$

where $u(\cdot) \in \Delta \triangleq L^\infty([0,T]; U)$ is the control, $U \subset R^m$ is a compact subset, $z \in R^n$ is the initial value, and $y(t) \in R^n$ is the state at time $t$. The cost functional $J(\cdot)$ is defined by

$$J(u(\cdot)) = \int_0^T L(y(t), t, u(t))\,dt + \psi(y(T)), \eqno(2)$$

where $L$ defines the running cost and $\psi$ the final cost. The optimal control problem is to find $u^*(\cdot) \in \Delta$ such that

$$J(u^*(\cdot)) = \inf_{u(\cdot) \in \Delta} J(u(\cdot)). \eqno(3)$$

Instead of considering only the single optimal control problem above, by the dynamic programming approach we consider a family of nonlinear finite-dimensional control systems, each starting from $(x, s) \in R^n \times [0, T)$:

$$\begin{cases} y'(t) = f(y(t), t, u(t)), & t \in (s, T], \\ y(s) = x, \end{cases} \eqno(4)$$

where $u(\cdot) \in \Delta_s \triangleq L^\infty([s,T]; U)$, $U \subset R^m$ is a compact subset, $x \in R^n$ is the initial value, and $y(t) \in R^n$ is the state at time $t$. The objective is to find $u^*(\cdot) \in \Delta_s$ that minimizes the cost functional

$$J_{x,s}(u(\cdot)) = \int_s^T L(y(t), t, u(t))\,dt + \psi(y(T)). \eqno(5)$$

The function $w(x, s)$ defined by

$$w(x,s) = J_{x,s}(u^*(\cdot)) = \inf_{u(\cdot)\in\Delta_s} J_{x,s}(u(\cdot)) \eqno(6)$$

is called the value function of the optimal control problem (4)–(6). The dynamic programming principle claims that if $w \in C^1(R^n \times [0,T])$, then it satisfies the following HJB equation:

$$\begin{cases} -\dfrac{\partial w}{\partial s}(x,s) - \inf_{u\in U}\big\{\nabla_x w(x,s)\cdot f(x,s,u) + L(x,s,u)\big\} = 0, & (x,s)\in R^n\times[0,T), \\ w(x,T) = \psi(x), & x\in R^n. \end{cases} \eqno(7)$$

The following Theorem 1.1, which comes from [26, 31], gives insight into the HJB equation.

Theorem 1.1  For Systems (4)–(6), assume that

$$\begin{cases} \|f(x,t,u) - f(y,s,u)\| \le L_f(\|x-y\| + |t-s|), & \forall\, x,y\in R^n,\ t,s\in[0,T],\ u\in U, \\ \|f(x,t,u)\| \le M_f,\quad f(x,t,\cdot)\in C(U), & \forall\, x\in R^n,\ t\in[0,T],\ u\in U, \\ |L(x,t,u) - L(y,s,u)| \le L_L(\|x-y\| + |t-s|), & \forall\, x,y\in R^n,\ t,s\in[0,T],\ u\in U, \\ |L(x,t,u)| \le M_L,\quad L(x,t,\cdot)\in C(U), & \forall\, x\in R^n,\ t\in[0,T],\ u\in U, \\ |\psi(x) - \psi(y)| \le L_\psi \|x-y\|, & \forall\, x,y\in R^n, \\ |\psi(x)| \le M_\psi, & \forall\, x\in R^n, \end{cases} \eqno(8)$$

where $L_f, M_f, L_L, M_L, L_\psi, M_\psi$ are positive constants. Then the function $w(x,s)$ defined by (6) belongs to $BUC(R^n \times [0,T])$, the space of bounded uniformly continuous functions over $R^n \times [0,T]$, and is the unique viscosity solution of the HJB equation (7). Moreover, there exists a constant $L_w > 0$ such that for any $(x,t), (y,s) \in R^n \times [0,T]$,

$$|w(x,t) - w(y,s)| \le L_w(\|x-y\| + |t-s|).$$

It should be pointed out that our algorithm is developed only for fixed-horizon problems without state constraints, and it is assumed that both the viscosity solution and the optimal feedback control exist and are unique. The investigation of state constraints or boundary conditions will be the subject of forthcoming study.

We proceed as follows. In Section 2, the optimal feedback control law developed in [26] and adopted in our algorithm is revisited. A local approximation method for computing the viscosity solution along the optimal trajectory, based on the upwind finite difference method, is presented in Section 3. Section 4 is devoted to the development of a new feasible algorithm for the numerical solution of the optimal feedback control. In Section 5, two numerical experiments, including a two-dimensional problem, are presented to illustrate the effectiveness and speed of the algorithm.

2  Approximation to Optimal Feedback Control: Revisit

In this section, we revisit the construction of the optimal feedback control law, which comes from [26]. Note that this section serves only as motivation for our algorithm.

For each given large positive integer $N$, subdivide $[0,T]$ into $N$ equal sub-intervals. Let $s_j = jh$, $j = 0,1,\cdots,N$, $h = T/N < 1$, and $S = \{s_j \mid j = 0,1,\cdots,N\}$. In this paper, our algorithm adopts the minimal-energy optimal feedback control law given by

$$u_k^*(s) = u_{k,j}^* \triangleq a(y^j, s_j), \quad s \in [s_j, s_{j+1}),\ j = 0,1,\cdots,N-1, \eqno(1)$$

where $y^j \approx y(s_j)$ denotes the solution of the following time-discretized system with control $u_k^*$ determined by (1):

$$\begin{cases} y^{j+1} = y^j + h f(y^j, s_j, u_k^*(s_j)), & j = 0,1,\cdots,N-1, \\ y^0 = z, \end{cases} \eqno(2)$$


and $a(x, s_j) \in A(x, s_j)$ is any element of the nonempty set

$$A(x, s_j) = \Big\{ v \in U \ \Big|\ P^k(x, s_j, v) = \min_{u \in U} P^k(x, s_j, u),\ v \text{ has minimum norm} \Big\}. \eqno(3)$$

In (3), $P^k$ is defined as

$$P^k(x, s_j, u) \triangleq h L(x, s_j, u) + w_h^k\big(x + h f(x, s_j, u),\, s_{j+1}\big), \eqno(4)$$

and $w_h^k$ is a known approximation of the viscosity solution. Notice that the minimal-norm choice for the optimal control is just one of the possible policies described in Appendix A of [4].

In our earlier work[26], we proposed a convergent time-space discretization scheme for the HJB equation on $\overline{\Omega} \times [0,T]$, whose solution is denoted by $w_h^k$, and proved that the optimal feedback control law $u_k^*$ defined by (1) is a minimizing sequence of the optimal feedback control of the original control system (1)–(3) under the assumptions of Theorem 1.1 and the standard assumption (see [4])

$$x + h f(x, s, u) \in \overline{\Omega}, \quad \forall\, (x, s, u) \in \overline{\Omega} \times [0, T] \times U, \eqno(5)$$

which holds for all sufficiently small $h$ on a closed bounded polyhedron $\overline{\Omega} \subset R^n$. Furthermore, a successful numerical experiment in [26] suggests that the algorithm therein is applicable to some systems for which the conditions of Theorem 1.1 are not necessarily satisfied.

Although a convergence result for the approximate viscosity solution $w_h^k$ is available in [26], one needs to compute the values of the value function at all points of a particular domain, which may lead to the well-known "curse of dimensionality" problem. Therefore, reducing the computational complexity as much as possible is the main objective of this paper. In the next section, we shall see that, when calculating the optimal feedback control, one does not need the values of the viscosity solution on the whole time-space domain, but only on some local cones around the optimal trajectory.
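To make the construction above concrete, the following Python sketch implements the feedback law (1), the minimum-norm selection (3)–(4), and the Euler rollout (2) for a scalar system. This is an illustration only: `f`, `L`, `w_next` (the approximate viscosity solution at the next time level), and the control grid `U_grid` are placeholders to be supplied by the user, and the minimization over U is done by brute force on a grid.

```python
import numpy as np

def feedback_control(x, s_j, h, f, L, w_next, U_grid):
    # P^k(x, s_j, u) = h L(x, s_j, u) + w(x + h f(x, s_j, u), s_{j+1})  -- cf. (4)
    costs = np.array([h * L(x, s_j, u) + w_next(x + h * f(x, s_j, u))
                      for u in U_grid])
    # All (near-)minimizers, then the one of minimum norm -- cf. (3)
    minimizers = [u for u, c in zip(U_grid, costs) if c <= costs.min() + 1e-12]
    return min(minimizers, key=abs)   # scalar control; use a norm for vector u

def rollout(z, T, N, f, L, w_funcs, U_grid):
    # Euler rollout (2): y^{j+1} = y^j + h f(y^j, s_j, u^j)
    h, y = T / N, z
    traj, controls = [z], []
    for j in range(N):
        u = feedback_control(y, j * h, h, f, L, w_funcs(j + 1), U_grid)
        y = y + h * f(y, j * h, u)
        controls.append(u)
        traj.append(y)
    return np.array(controls), np.array(traj)
```

When `w_funcs` returns the exact value function of a problem with known solution (such as Example 5.1 below), the rollout reproduces the analytic optimal control up to the time-step and control-grid resolution.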

3  Local Approximation of Viscosity Solution

In this section, we develop a practically useful algorithm for finding the numerical solution of the optimal feedback control. It does not need assumptions (5) and (8); instead, we assume that both the viscosity solution and the optimal feedback control exist and are unique. In our algorithm, two steps are designed to reduce the computational complexity as much as possible. The first step is to restrict the calculation to a local cone, denoted by $R_q$, around the optimal trajectory; the second step is that $R_q$ is not rebuilt unless necessary.

3.1  Build a Local Cone

Note that the optimal feedback control is constructed in terms of the viscosity solution. Once the q-th optimal state is known, in order to calculate the q-th optimal feedback control by (1),


one only needs the values of the viscosity solution on a local cone around the q-th optimal state. This idea of local approximation of the viscosity solution is the first step in reducing the computational complexity.

Now we introduce the local approximation method. The idea is based on the upwind finite difference method[6, 27, 28]. The difference is that the cones on which we numerically solve the HJB equation are chosen in a particular way: we only consider cones around the optimal trajectory. Precisely, in our algorithm we carry out the calculation of the viscosity solution and the construction of the optimal feedback control-state pairs simultaneously, and in this process we do not always need to rebuild a new local cone at each optimal state. This important observation can be considered the second step in reducing the computational complexity.

Let us describe the process of local approximation of the viscosity solution. Note that $y^0 = z \in R^n$ is given. Once $y^q = (y_1^q, y_2^q, \cdots, y_n^q)$ ($0 \le q \le N-1$) is known, we can build a local cone

$$R_q \triangleq \bigcup_{j=q}^{N} \big( D_j \times [s_j, s_{j+1}) \big) \eqno(1)$$

with

$$D_j \triangleq \prod_{p=1}^{n} \big[ y_p^q - (j-q)\Delta x_p,\ y_p^q + (j-q)\Delta x_p \big], \quad j = q, q+1, \cdots, N, \eqno(2)$$

on which we consider the following HJB equation:

$$\begin{cases} -w_s(x,s) - \inf_{u\in U}\Big\{ \displaystyle\sum_{p=1}^{n} w_{x_p}(x,s)\, f_p(x,s,u) + L(x,s,u) \Big\} = 0, & (x,s)\in R_q, \\ w(x,T) = \psi(x), & x \in D_N, \end{cases} \eqno(3)$$

where $\Delta x_p$ is a small mesh size along the $x_p$ axis. Notice that $D_q \subset D_{q+1} \subset \cdots \subset D_N$, and $D_N$ is the largest spatial domain on which we need to consider the spatial variable in $R_q$. We remark that the local cone $R_q$, determined only by the vertex $(y^q, s_q)$ and the mesh sizes $\Delta x_p$, is specially designed to reduce the computational complexity as much as possible.

For numerical computation, we need to discretize the continuous cone $R_q$. Let $i = (i_1, i_2, \cdots, i_n)$, $i_{p+} = (i_1, \cdots, i_{p-1}, i_p + 1, i_{p+1}, \cdots, i_n)$ and $i_{p-} = (i_1, \cdots, i_{p-1}, i_p - 1, i_{p+1}, \cdots, i_n)$. For the time-space discretization, assume that $s_j = jh$ for $j = 0,1,\cdots,N$ with $h = T/N$, and

$$x_{p, i_p} = y_p^q + [\,i_p - (N-q)\,]\Delta x_p, \quad p = 1,2,\cdots,n,\ i_p = 0,1,\cdots,2(N-q). \eqno(4)$$


We approximate (3) by the following upwind finite difference scheme:

$$\begin{cases} \dfrac{w_i^{j+1} - w_i^j}{h} + \displaystyle\sum_{p=1}^{n} \left[ \dfrac{1 + \mathrm{sign}\big(f_{p,i}^{j+1}\big)}{2}\, f_{p,i}^{j+1}\, \dfrac{w_{i_{p+}}^{j+1} - w_i^{j+1}}{\Delta x_p} + \dfrac{1 - \mathrm{sign}\big(f_{p,i}^{j+1}\big)}{2}\, f_{p,i}^{j+1}\, \dfrac{w_i^{j+1} - w_{i_{p-}}^{j+1}}{\Delta x_p} \right] + L_i^{j+1} = 0, \\[2mm] u_i^{j+1} \in \arg\inf_{u\in U} \Bigg\{ \displaystyle\sum_{p=1}^{n} \left[ \dfrac{1 + \mathrm{sign}\big(f_p(x_i, s_{j+1}, u)\big)}{2}\, f_p(x_i, s_{j+1}, u)\, \dfrac{w_{i_{p+}}^{j+1} - w_i^{j+1}}{\Delta x_p} \right. \\[1mm] \qquad\qquad \left. + \dfrac{1 - \mathrm{sign}\big(f_p(x_i, s_{j+1}, u)\big)}{2}\, f_p(x_i, s_{j+1}, u)\, \dfrac{w_i^{j+1} - w_{i_{p-}}^{j+1}}{\Delta x_p} \right] + L(x_i, s_{j+1}, u) \Bigg\}, \\[2mm] w_i^N = \psi(x_i) \end{cases} \eqno(5)$$

for $i_p = N - (j-q), N - (j-q) + 1, \cdots, N, \cdots, N + (j-q)$, $p = 1,2,\cdots,n$, and $j = N-1, N-2, \cdots, q$, where $\mathrm{sign}(a)$ denotes the sign of $a \in R$ and

$$f_{p,i}^{j+1} = f_p\big(x_i, s_{j+1}, u_i^{j+1}\big), \quad L_i^{j+1} = L\big(x_i, s_{j+1}, u_i^{j+1}\big), \quad x_i = (x_{i_1}, x_{i_2}, \cdots, x_{i_n}).$$

Note that (5) is actually semi-Lagrangian in nature; in fact, it is simply a variant of Falcone's semi-Lagrangian scheme[32]. By (5), one can obtain the values of the approximate viscosity solution on the hyper-pyramidal region $H_q$, the set of knot points of $R_q$, defined by

$$H_q \triangleq \bigcup_{j=q}^{N} \big( \widetilde{D}_j \times \{s_j\} \big), \eqno(6)$$

with

$$\widetilde{D}_j \triangleq \{ (x_{1,i_1}, x_{2,i_2}, \cdots, x_{n,i_n}) \mid i_p = 0, 1, \cdots, 2(N-q),\ p = 1, 2, \cdots, n \} \subset D_j, \eqno(7)$$

where the $\{x_{p,i_p}\}$ are defined by (4). By piecewise linear interpolation, we can extend the values of the approximate viscosity solution on $H_q$ to the continuous cone $R_q$, and use them to calculate the q-th optimal feedback control. In fact, as long as $\Delta x_p$ is suitably chosen, for example, when

$$\Delta x_p \ge h \max_{u \in U} \big\{ \| f(y^q, s_q, u) \| \big\}, \quad p = 1, 2, \cdots, n, \eqno(8)$$

then all the next possible states stay in the current local cone, that is, $y^{q+1}(u) = y^q + h f(y^q, s_q, u) \in D_{q+1}$ for all $u \in U$, which enables us to obtain the q-th optimal feedback control $u^q$ via (1)–(4). It is seen from the above that the key step in calculating the q-th optimal feedback control $u^q$ from the known q-th optimal state $y^q$ is to build a local cone $R_q$ on which we calculate the viscosity solution of the HJB equation (instead of on the whole time-space domain); this is the first step toward significantly reducing the computational complexity.
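For n = 1, one backward-in-time sweep of the upwind scheme (5) can be sketched in Python as follows. The functions `f`, `L` and the control grid `U_grid` are placeholders, and the minimization over U is done by brute force on a grid (an assumption of this sketch, not part of the scheme itself).

```python
import numpy as np

def upwind_step(w_next, x, s_next, h, dx, f, L, U_grid):
    """One explicit backward step of the 1-D upwind scheme (5):
    given w^{j+1} on the grid x, return w^j on the interior points."""
    w = np.empty(len(x) - 2)
    for k in range(1, len(x) - 1):
        # one-sided differences, selected by the sign of f (upwinding)
        fwd = (w_next[k + 1] - w_next[k]) / dx   # used where f >= 0
        bwd = (w_next[k] - w_next[k - 1]) / dx   # used where f < 0
        def hamiltonian(u):
            fu = f(x[k], s_next, u)
            return fu * (fwd if fu >= 0 else bwd) + L(x[k], s_next, u)
        # w^j_i = w^{j+1}_i + h * inf_u { f * Dw + L }
        w[k - 1] = w_next[k] + h * min(hamiltonian(u) for u in U_grid)
    return w
```

On a region where the exact value function is linear in x (for instance x > 1 in Example 5.1 below), one such step reproduces the exact solution up to the control-grid resolution.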


We remark that the localization idea is not new; from a practical point of view, a similar idea has been used in [33]. To end this subsection, we indicate the stability of the explicit difference scheme (5) on $H_q \subset R_q$. Sufficient stability conditions for (5) were given in [28]. By a simple calculation, one has

$$h \le \frac{1}{\sum_{p=1}^{n} \frac{1}{\Delta x_p} \|f_p\|_\infty}, \eqno(9)$$

where $\|f_p\|_\infty \triangleq \max_{(x,s,u) \in R_q \times U} \{|f_p(x,s,u)|\}$. If $\Delta x = \Delta x_p$, $p = 1,2,\cdots,n$, then (9) becomes

$$\Delta x \ge \sum_{p=1}^{n} \|f_p\|_\infty \cdot h. \eqno(10)$$

In particular, in the case $n = 1$, it is indicated in [28] that (10) can be relaxed to

$$\Delta x \ge \frac{\|f\|_\infty \cdot h}{2}. \eqno(11)$$
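Conditions (9)–(10) translate directly into a pair of one-line helpers (hypothetical names, written here only to fix the arithmetic):

```python
def max_stable_h(f_sup_norms, dxs):
    """Largest time step allowed by the sufficient condition (9):
    h <= 1 / sum_p ( ||f_p||_inf / dx_p )."""
    return 1.0 / sum(fp / dx for fp, dx in zip(f_sup_norms, dxs))

def min_stable_dx(f_sup_norms, h):
    """Smallest uniform mesh size meeting (10): dx >= h * sum_p ||f_p||_inf."""
    return h * sum(f_sup_norms)
```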

3.2  Rebuild a Local Cone if Necessary

Our aim is to reuse the data obtained on the local cone $R_q$ as much as possible. When $f$ is bounded, the stability condition (9) guarantees (8) and also implies $y^{q+i}(u) \in D_{q+i}$ for all $i > 0$. But when $f$ is unbounded, rebuilding $R_q$ becomes necessary, since the state driven by $f$ will run out of the cone $R_q$, owing to the fact that $f$ is truncated technically in the construction of $R_q$ (see Step 2 of Section 4). To reduce the computational burden, the idea is to rebuild $R_q$ only when necessary.

Let us explain the second step of reducing the computational complexity in the above process of calculating the local approximate viscosity solution and constructing the optimal feedback control simultaneously. Notice that the (q+1)-th optimal state $y^{q+1}$ is obtained from the q-th optimal feedback control $u^q$ through (2). Once this new optimal state $y^{q+1}$ is produced, one would usually rebuild a new local cone $R_{q+1}$ with vertex $(y^{q+1}, s_{q+1})$ in the same way, and simply repeat the above process to calculate the (q+1)-th optimal feedback control. This is natural and feasible. However, it is not always necessary. Whether or not to rebuild a new local cone $R_{q+1}$ at the vertex $(y^{q+1}, s_{q+1})$ for calculating the (q+1)-th optimal feedback control depends on the following observation: if all the next possible states lie in the current local cone $R_q$, i.e., $y^{q+2}(u) = y^{q+1} + h f(y^{q+1}, s_{q+1}, u) \in D_{q+2}$ for all $u \in U$, then just set $R_{q+1} = R_q$. This case happens from time to time; when it does, one can use the known data on the current local cone $R_q$ to calculate the (q+1)-th optimal feedback control without recalculating the viscosity solution at step q+1. Otherwise, one needs to rebuild a new local cone $R_{q+1}$. This observation enables us to exploit the information on each local cone as much as possible once it has been built.

More importantly, once a new local cone must indeed be rebuilt, the data obtained before it are no longer needed and can be discarded. This is why, through the two steps above, the computational complexity is reduced significantly.


Now we describe the process concretely. Since the initial state $y^0$ is given, by the construction of $R_0$, all the next possible states satisfy $y^1(u) = y^0 + h f(y^0, s_0, u) \in D_1$ for all admissible $u \in U$. So we can obtain the 0-th optimal feedback control $u^0$ by the first step of reducing the computational complexity described above, and then the optimal state $y^1$, through (1) and (2), respectively. Suppose that we have produced the q-th optimal feedback control $u^q$ in the cone $R_q$; then we can get the optimal state $y^{q+1}$ through (1) and (2). If $y^{q+2}(u) = y^{q+1} + h f(y^{q+1}, s_{q+1}, u) \in D_{q+2}$ for all admissible $u \in U$, then we just set $R_{q+1} = R_q$. By the same principle, we can find the maximum $q_0 \ge 0$ such that $R_{q+q_0} = R_{q+q_0-1} = \cdots = R_q$, together with all the optimal feedback controls $(u^q, u^{q+1}, \cdots, u^{q+q_0})$ and optimal states $(y^{q+1}, y^{q+2}, \cdots, y^{q+q_0+1})$ through (1) and (2), while there exists at least one $u \in U$ such that $y^{q+q_0+2}(u) = y^{q+q_0+1} + h f(y^{q+q_0+1}, s_{q+q_0+1}, u) \notin D_{q+q_0+2}$. We then build a new local cone $R_{q+q_0+1}$ and continue the process until we find all the optimal feedback control-state pairs $\{(u^q, y^q)\}_{q=0}^N$.
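The reuse test of this subsection — keep the current cone as long as every next possible state stays inside the appropriate cell — can be sketched as follows (`f` and the control grid are placeholders; `level` stands for j − q, so the containing cell is the hyper-rectangle (2) of half-width (level + 1)·Δx per axis about the cone vertex):

```python
import numpy as np

def next_states_in_cone(y, s, h, f, U_grid, vertex, level, dx):
    """Return True iff y + h*f(y, s, u) lies in D_{level+1} for every u
    in U_grid; in that case the cone can be reused (R_{q+1} = R_q)."""
    half_width = (level + 1) * dx          # half-width of D_{level+1} per axis
    for u in U_grid:
        nxt = np.asarray(y) + h * np.asarray(f(y, s, u))
        if np.any(np.abs(nxt - vertex) > half_width):
            return False                    # a state escapes: rebuild the cone
    return True
```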

4  Algorithm

In this section, we present an algorithm that summarizes the procedure for optimal feedback control approximation developed in the previous sections.

Step 1  Initial settings. For a large integer $N$, let $\Delta t = h = T/N$, $s_j = jh$, $y^0 = z$, $q = 0$.

Step 2  Build a new local cone $R_q$. Notice that $R_q$ and its discretization $H_q$ are determined completely by $y^q$ and $\Delta x_p$ through (1)–(2) and (6)–(7). Since $y^q$ is known ($y^0 = z$ is initially given), we only need to choose $\Delta x_p$. Set $\Delta x = \Delta x_p = h M_q$, $p = 1,2,\cdots,n$, where $M_q$ should be chosen to satisfy the sufficient stability condition (10) (or (11) when $n = 1$):

$$\Delta x = h M_q \ge \sum_{p=1}^{n} \|f_p\|_\infty \cdot h,$$

where $\|f_p\|_\infty = \sup_{(x,s,u) \in R_q \times U} |f_p(x,s,u)|$. Suppose that $x_{p,i_p} = y_p^q + [i_p - (N-q)]\Delta x_p$, $i_p = 0,1,\cdots,2(N-q)$, are the $2(N-q)+1$ equal partition points of the interval $[y_p^q - (N-q)\Delta x_p,\ y_p^q + (N-q)\Delta x_p]$ on the $x_p$ axis, $p = 1,2,\cdots,n$.

Notice that since $h$ is given, the computation cone $R_q$ can be determined by an iterative process; we give one as follows. Let $M_q(0) = \sum_{p=1}^{n} \|f_p\|_{\sup,0}$ with $\|f_p\|_{\sup,0} = \sup_{(s,u) \in [s_q,T] \times U} |f_p(y^q, s, u)|$. Setting $\Delta x(0) = \Delta x_p(0) = h M_q(0)$, we get a local cone $R_q(0)$ via (1)–(2). Then let $M_q(1) = \sum_{p=1}^{n} \|f_p\|_{\sup,1}$ with $\|f_p\|_{\sup,1} = \sup_{(x,s,u) \in R_q(0) \times U} |f_p(x,s,u)|$, and set $\Delta x(1) = \Delta x_p(1) = h M_q(1)$. If $M_q(1) = M_q(0)$ (or equivalently $\|f_p\|_{\sup,1} = \|f_p\|_{\sup,0}$), then stop; otherwise, repeat the above process. Notice that this procedure may not converge if $f$ becomes unbounded as $\|x\| \to \infty$, in which case the stability conditions would not be valid. However, since we are only interested in the viscosity solution in a neighborhood cone of the optimal state, we can simply truncate $f$ when $x$ is beyond the local cone. For example, when $n = 1$, we could replace $f$ by

$$\widetilde{f}(x,s,u) = \begin{cases} f(-a, s, u), & x \le -a,\ (s,u) \in [0,T] \times U, \\ f(x, s, u), & x \in [-a, b],\ (s,u) \in [0,T] \times U, \\ f(b, s, u), & x \ge b,\ (s,u) \in [0,T] \times U. \end{cases} \eqno(1)$$
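The iterative determination of M_q in Step 2 is a fixed-point loop, sketched below. Here `bound_on_cone(radius)` is a hypothetical user-supplied routine returning the bound Σ_p ‖f_p‖_sup over the cone of the given spatial radius about y^q; the loop structure, not the bound itself, is the point of the sketch.

```python
def choose_Mq(bound_on_cone, h, N, q, max_iter=100):
    """Fixed-point iteration of Step 2: M(0), M(1), ... until the bound
    over the induced cone stops changing, so that dx = h*M_q meets (10).
    May fail to converge when f is unbounded; then f must be truncated."""
    M = bound_on_cone(0.0)             # bound along the trajectory point only
    for _ in range(max_iter):
        radius = (N - q) * h * M       # largest cell D_N has half-width (N-q)*dx
        M_new = bound_on_cone(radius)
        if M_new == M:                 # fixed point reached
            return M
        M = M_new
    return M                           # no fixed point found within max_iter
```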

Step 3  Calculate the viscosity solution $w$ on $H_q$.

For $j = N-1$ down to $q$, do:
  for $i_p = 0$ to $2(N-q)$, $p = 1,2,\cdots,n$, do

$$\begin{cases} w_i^N = \psi(x_i), \quad i = (i_1, i_2, \cdots, i_n), \\[1mm] u_i^{j+1} \in \arg\inf_{u \in U} \Bigg\{ \displaystyle\sum_{p=1}^{n} \bigg[ \dfrac{1 + \mathrm{sign}(f_p(x_i, s_{j+1}, u))}{2}\, f_p(x_i, s_{j+1}, u)\, \dfrac{w_{i_{p+}}^{j+1} - w_i^{j+1}}{\Delta x_p} \\[1mm] \qquad\qquad + \dfrac{1 - \mathrm{sign}(f_p(x_i, s_{j+1}, u))}{2}\, f_p(x_i, s_{j+1}, u)\, \dfrac{w_i^{j+1} - w_{i_{p-}}^{j+1}}{\Delta x_p} \bigg] + L(x_i, s_{j+1}, u) \Bigg\}, \\[1mm] f_{p,i}^{j+1} = f_p(x_i, s_{j+1}, u_i^{j+1}), \quad L_i^{j+1} = L(x_i, s_{j+1}, u_i^{j+1}), \quad p = 1,2,\cdots,n, \\[1mm] w_i^j = \Bigg( 1 - \displaystyle\sum_{p=1}^{n} \frac{h}{\Delta x_p} \big| f_{p,i}^{j+1} \big| \Bigg) w_i^{j+1} + \displaystyle\sum_{p=1}^{n} \frac{1 + \mathrm{sign}\big(f_{p,i}^{j+1}\big)}{2} \frac{h}{\Delta x_p} f_{p,i}^{j+1} w_{i_{p+}}^{j+1} - \displaystyle\sum_{p=1}^{n} \frac{1 - \mathrm{sign}\big(f_{p,i}^{j+1}\big)}{2} \frac{h}{\Delta x_p} f_{p,i}^{j+1} w_{i_{p-}}^{j+1} + h L_i^{j+1}, \end{cases}$$

  end do $i_p$, $p = 1,2,\cdots,n$;
end do $j$.

Step 4  Calculate the optimal feedback control-state pair.

This step is implemented only when all the next possible states lie in the current local cone, i.e., $y^{q+1}(u) = y^q + h f(y^q, s_q, u) \in D_{q+1}$ for all $u \in U$. We remark that this always holds for a newly built local cone $R_q$, owing to the choices of $M_q$ and $\Delta x$.

To calculate the q-th optimal feedback control via (1), one needs the values of the viscosity solution on the set of points $\{(y^{q+1}(u), s_{q+1}) \mid u \in U\}$. One can easily obtain these values by linear interpolation of the values on $H_q$ obtained in the previous step, and then the q-th optimal feedback control by (1). To do this, we first find the hyper-cube in which $y^{q+1}(u)$ lies, using a searching technique. Denote the $2^n$ vertices of this hyper-cube by $v_l$, $l = 1,2,\cdots,2^n$, and consider the convex linear combination of $y^{q+1}(u)$ by these vertices:

$$y^{q+1}(u) = \sum_{l=1}^{2^n} \lambda^l(u)\, v_l \quad \text{with } \lambda^l(u) \ge 0 \text{ and } \sum_{l=1}^{2^n} \lambda^l(u) = 1.$$


Discretize the control domain $U$ as a finite set $U_d = \{u^l,\ l = 1,2,\cdots\}$ at a given control-space mesh size $d > 0$, and calculate

$$W\big(y^{q+1}(u), s_{q+1}\big) = \sum_{l=1}^{2^n} \lambda^l(u)\, w(v_l, s_{q+1}),$$

$$P^k(y^q, s_q, u) = h L(y^q, s_q, u) + W\big(y^{q+1}(u), s_{q+1}\big) \quad \text{for all } u \in U_d.$$

With these values of the viscosity solution, we can then construct the optimal feedback control. Set

$$P_{\min}^k = \min_{u \in U_d} \big\{ P^k(y^q, s_q, u) \big\}, \quad A = \big\{ u \in U_d \mid P^k(y^q, s_q, u) = P_{\min}^k,\ u \text{ has minimum norm} \big\}.$$

Choose any element $a$ of $A$ to define the q-th optimal feedback control $u^q = u(s_q) = a$, which drives the control system to the (q+1)-th optimal state $y^{q+1} = y(s_{q+1}) = y^q + h f(y^q, s_q, u^q)$. Let $q = q + 1$. If $q = N$, set $u^q = u^{q-1}$ and end the procedure; otherwise, go to Step 5.

Step 5  Rebuild a new local cone, or not?

This depends on whether all the next possible states lie in the current local cone. If $y^{q+1}(u) \in D_{q+1}$ for all $u \in U_d$, then set $R_q = R_{q-1}$ and go to Step 4 to continue the calculation of the next optimal feedback control and state; otherwise, go to Step 2 to rebuild a new local cone starting at $(y^q, s_q)$, on which we repeat the above calculation process. Continuing the iteration over the index $q$ yields all the optimal feedback control-state pairs $\{(u^q, y^q)\}_{q=0}^N$.
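The convex combination in Step 4 is simply multilinear interpolation over the 2^n vertices of the containing cell. A sketch on a uniform grid (names hypothetical; the searching technique here is just index arithmetic):

```python
import numpy as np
from itertools import product

def interpolate_w(point, origin, dx, w_grid):
    """Multilinear interpolation of w at `point` from grid values:
    locate the containing hyper-cube, then form the convex combination
    sum_l lambda^l w(v_l) over its 2^n vertices, as in Step 4."""
    rel = (np.asarray(point, dtype=float) - origin) / dx
    base = np.floor(rel).astype(int)     # lower corner of containing cell
    frac = rel - base                    # barycentric coordinates in the cell
    val = 0.0
    for corner in product((0, 1), repeat=len(point)):
        weight = np.prod([f if c else 1.0 - f for c, f in zip(corner, frac)])
        val += weight * w_grid[tuple(base + np.asarray(corner))]
    return val
```

Since the weights are nonnegative and sum to one, the interpolant is exact for functions that are affine on the cell, which is the accuracy the piecewise linear extension of Section 3.1 provides.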

5  Numerical Experiments

To validate the effectiveness and speed of the algorithm presented in Section 4, two numerical experiments were carried out on the same computer, with Visual C++ and Matlab as the programming languages.

Example 5.1

$$\begin{cases} y'(t) = 2(1 - y(t))\,u(t), & t \in (0,1], \\ y(0) = z, \\ J(u(\cdot)) = \displaystyle\int_0^1 |1 - y(t)|\,(1+t)^2 u(t)^2\,dt + 2|1 - y(1)|, \\ \displaystyle\min_{u(\cdot) \in \Delta^*} J(u(\cdot)), \end{cases} \eqno(1)$$

where $\Delta^* \triangleq L^\infty([0,1]; U)$, $U \triangleq [0,1]$, and $z \in R$ is a given initial value. This 1-d optimal control problem was also considered in [26, 34]. The corresponding HJB equation of (1) is

$$\begin{cases} -w_s(x,s) - \inf_{u \in U}\big\{ \nabla_x w(x,s) \cdot 2(1-x)u + |1-x|(1+s)^2 u^2 \big\} = 0, & (x,s) \in R \times [0,1), \\ w(x,1) = 2|1-x|, & x \in R, \end{cases} \eqno(2)$$


which admits a unique viscosity solution by the uniqueness theorem (see [35]). It is easy to check that the function $w \in C(R \times [0,1])$ defined by

$$w(x,s) = |1 - x|(1 + s), \quad (x,s) \in R \times [0,1] \eqno(3)$$

is a viscosity solution of (2). Notice that when $z = 1$, the control problem (1) becomes trivial: $y(u(\cdot), \cdot) \equiv 1$ on $[0,1]$ and $J(u(\cdot)) = 0$ for any control function $u(\cdot) \in \Delta^*$. We therefore only consider the case $z \ne 1$. It is easy to check that the unique optimal feedback control $u^*(\cdot)$ and the corresponding optimal trajectory $y^*(\cdot)$ of system (1) are given analytically by

$$u^*(z,t) = u^*(y^*(z,t), t) = \frac{1}{1+t}, \quad t \in [0,1], \eqno(4)$$

and

$$y^*(z,t) = \frac{t^2 + 2t + z}{(1+t)^2}, \quad t \in [0,1]. \eqno(5)$$
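The closed forms (3)–(5) can be checked numerically: y* should satisfy the dynamics under u*, and the cost of u* starting from (z, 0) should equal w(z, 0) = |1 − z|. A short sanity check (the grid size is an arbitrary choice of this sketch):

```python
import numpy as np

# Check (4)-(5) against the dynamics y' = 2(1 - y)u of Example 5.1,
# and the optimality value J(u*) = w(z, 0) = |1 - z|, here for z = 10.
z = 10.0
t = np.linspace(0.0, 1.0, 1001)
y = (t**2 + 2 * t + z) / (1 + t)**2          # optimal trajectory (5)
u = 1.0 / (1 + t)                            # optimal feedback control (4)

residual = np.gradient(y, t) - 2 * (1 - y) * u   # should vanish on the interior

integrand = np.abs(1 - y) * (1 + t)**2 * u**2    # running cost along (y*, u*)
dt = t[1] - t[0]
cost = np.sum((integrand[1:] + integrand[:-1]) / 2) * dt + 2 * abs(1 - y[-1])
# cost should be close to w(z, 0) = |1 - z| = 9 for z = 10
```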

To apply our algorithm to this example, we should truncate $f$ and determine a suitable $M_q$, and hence $R_q$, by the iteration process, in order to ensure that the sufficient stability condition (11) is satisfied on $R_q$; this is because $f$ is unbounded, as stated in Step 2 of Section 4. One such choice is $M_q = N^{1/2} \sup_{u\in U} \|f(y^q, s_q, u)\|$, truncating $f$ with $a = b = M_q$ in (1), where $N = 40$. Applying our algorithm as stated in Section 4, we obtain numerically the optimal feedback control-state pairs for the control problem (1). The results are plotted in Figures 1–2 for two different initial values ($z = 10$ and $z = 0.99$).

[Figure 1  Numerical and analytical solutions of the optimal feedback control and trajectory functions with z = 10: (a) optimal feedback control u versus t; (b) optimal trajectory y versus t.]

[Figure 2  Numerical and analytical solutions of the optimal feedback control and trajectory functions with z = 0.99: (a) optimal feedback control u versus t; (b) optimal trajectory y versus t.]

Table 1  The computed errors in the maximum norm between the numerical and analytical solutions for u* and y*

  Error       control u*    trajectory y*
  z = 10      3.56e-2       4.35e-2
  z = 0.99    1.11e-2       1.31e-4

Figures 1–2 display the numerical and analytical solutions of the optimal feedback control and trajectory functions (4) and (5) with the initial states $z = 10$ and $z = 0.99$, respectively. In Table 1, we list the computed errors in the maximum norm $\|e(\cdot)\| \triangleq \max_{t\in[0,T]} |e(t)|$ between the numerical and analytical solutions of $u^*$ and $y^*$. For this 1-d problem, the calculation costs only about one second, whereas in [26] the calculation time for the case $z = 0.99$ is about 78 seconds. More importantly, the numerical solutions are very close to the analytical ones, which illustrates the effectiveness and speed of the algorithm.

Our next example is a 2-d optimal control problem coming from the well-known time-optimal control problem:

$$\begin{cases} \dot{x}_1(t) = x_2(t), \\ \dot{x}_2(t) = u(t), \\ x_1(0) = 4, \quad x_2(0) = -1, \quad x_1(t_f) = x_2(t_f) = 0, \\ J(u(\cdot)) = \displaystyle\int_0^{t_f} dt = t_f, \\ \displaystyle\min_{u(\cdot) \in \Delta^*} J(u(\cdot)), \end{cases} \eqno(6)$$

where $\Delta^* \triangleq L^\infty([0, t_f]; U)$ and $U \triangleq [-1, 1]$. This problem has a unique solution (see [36, p.16])

$$t_f^* = 3\sqrt{2} - 1 = 3.242\cdots, \qquad u^*(t) = \begin{cases} -1, & t \in [0, t_1), \quad t_1 = \dfrac{3\sqrt{2} - 2}{2} = 1.121\cdots, \\ 1, & t \in [t_1, t_f^*], \end{cases} \eqno(7)$$

which is a bang-bang control. It is well known that the optimal feedback control of (6) can be found analytically:

$$u^*(x_1, x_2) = \begin{cases} 1, & \text{when } x_1 + \tfrac{1}{2} x_2 |x_2| < 0, \text{ or } x_1 + \tfrac{1}{2} x_2 |x_2| = 0,\ x_2 < 0, \\ -1, & \text{when } x_1 + \tfrac{1}{2} x_2 |x_2| > 0, \text{ or } x_1 + \tfrac{1}{2} x_2 |x_2| = 0,\ x_2 > 0, \end{cases} \eqno(8)$$

or, in simple form,

$$u^*(x_1, x_2) = -\mathrm{sign}\Big( x_1 + \frac{1}{2} x_2 |x_2| \Big).$$

Let $T = t_f^* = 3\sqrt{2} - 1$ be fixed. We construct the 2-d optimal control problem of Example 5.2 as follows.

Example 5.2  Consider the optimal control problem:

$$\begin{cases} \dot{x}_1(t) = x_2(t), \\ \dot{x}_2(t) = u(t), \\ x_1(0) = 4, \quad x_2(0) = -1, \\ J(u(\cdot)) = -\displaystyle\int_0^T |u(t)|^2\,dt - \frac{1}{1 + x_1^2(T) + x_2^2(T)}, \\ \displaystyle\min_{u(\cdot) \in \Delta^*} J(u(\cdot)), \end{cases} \eqno(9)$$

where $\Delta^* \triangleq L^\infty([0, T]; U)$ and $U \triangleq [-1, 1]$.


The optimal trajectory of control problem (9) can be found analytically as

\[
x_1^*(t) =
\begin{cases}
-\dfrac{1}{2}t^2 - t + 4, & t \in [0, t_1),\\[2mm]
\dfrac{1}{2}\bigl[t - (3\sqrt{2} - 1)\bigr]^2, & t \in [t_1, T],
\end{cases}
\qquad
x_2^*(t) =
\begin{cases}
-t - 1, & t \in [0, t_1),\\[1mm]
t - (3\sqrt{2} - 1), & t \in [t_1, T],
\end{cases}
\tag{10}
\]

and the optimal feedback control given by (8) now becomes

\[
u^*(t) = u^*(x_1^*(t), x_2^*(t)) = -\operatorname{sign}\Bigl(x_1^*(t) + \frac{1}{2}x_2^*(t)|x_2^*(t)|\Bigr) =
\begin{cases}
-1, & t \in [0, t_1),\\[1mm]
1, & t \in [t_1, T],
\end{cases}
\tag{11}
\]

where t_1 ≜ \frac{3\sqrt{2}}{2} − 1. Obviously, for any admissible control u ≠ u∗, we have

\[
J(u(\cdot)) > -(T + 1) = J(u^*(\cdot)),
\]

where u∗ is defined by (11). So the bang-bang control u∗ is the unique solution to the optimal control problem (9). Notice that f is also unbounded in this example. Similarly to Example 5.1, we can technically find one such suitable choice, namely Mq = 2.2 sup_{u∈U} f(y_q, s_q, u), with N = 159. Applying the algorithm, we obtain numerically the optimal feedback control-state pairs of control problem (9). The results are plotted in Figure 3.
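As a quick sanity check (again, not part of the paper's algorithm), the closed forms (10) can be verified numerically: both arcs match at the switching time t_1, the trajectory reaches the origin at t = T, and the cost of u∗ is exactly −(T + 1). The function names below are illustrative.

```python
import math

T  = 3 * math.sqrt(2) - 1      # fixed horizon t_f^*
t1 = 3 * math.sqrt(2) / 2 - 1  # switching time of the bang-bang control

def x1_star(t):
    # first arc: u = -1; second arc: u = +1, see (10)
    return -0.5 * t * t - t + 4 if t < t1 else 0.5 * (t - T) ** 2

def x2_star(t):
    return -t - 1 if t < t1 else t - T

eps = 1e-9
# both components are continuous at the switching time t1
assert abs(x1_star(t1 - eps) - x1_star(t1)) < 1e-6
assert abs(x2_star(t1 - eps) - x2_star(t1)) < 1e-6
# the trajectory reaches the origin at t = T
assert abs(x1_star(T)) < 1e-12 and abs(x2_star(T)) < 1e-12
# |u*| = 1 a.e. and the terminal term equals 1, so J(u*) = -(T + 1)
J_star = -T - 1 / (1 + x1_star(T) ** 2 + x2_star(T) ** 2)
assert abs(J_star - (-(T + 1))) < 1e-12
```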

[Figure 3  Numerical and analytical solutions of optimal feedback control (a) and trajectory (b) when N = 159]

Figure 3 displays the numerical and analytical solutions of the optimal feedback control and trajectory functions (11) and (10). In Table 2, we list the computed errors in the maximum norm between the numerical and analytical solutions of u∗ and (x∗1, x∗2). For this 2-d control problem, the calculation costs only about one minute. It is seen again that the numerical solutions are very close to the analytical ones, which shows the effectiveness and speed of the algorithm.

Table 2  The computed errors in the maximum norm between the numerical and analytical solutions for u∗ and (x∗1, x∗2) when N = 159

Error                 control u∗    trajectory x∗1    trajectory x∗2
Measured on [0, T]    0             1.17e-2           6.96e-4

It should be pointed out that the analytical optimal feedback control is discontinuous: it jumps at t_1. In the above simulation, N = 159 was chosen so that the jump point t_1 is nearly a time node of the discretization scheme, and with this choice all the errors are very small. However, this is only a technical matter of the simulation; for a different choice of N, the result is still satisfactory. In that case, the jump point t_1 may lie inside a time sub-interval I ≜ [s_k, s_{k+1}] for some k. We may then divide I into two parts, I_1 ≜ [s_k, t_1) and I_2 ≜ [t_1, s_{k+1}]. In the simulation, the errors between the numerical and analytical solutions for the optimal trajectory (x∗1, x∗2) are always small. For the optimal feedback control u∗, however, the error is as large as the jump height 2 when measured in the maximum norm over the whole time interval [0, T]. This large error is reasonable due to the jump: the numerical optimal control is constant on the sub-interval I, either −1 or 1, while the analytical one is −1 on I_1 and 1 on I_2. Therefore, the error of 2 occurs only on a very small time interval I_3, which is either I_1 or I_2, and the error of the optimal control is small on [0, T]\I_3. To illustrate this situation, we use N = 140 for the simulation and choose technically Mq = 2.05 sup_{u∈U} f(y_q, s_q, u). In this case, t_1 ≈ 1.1213 ∈ I ≜ [s_48, s_49] = [1.1118, 1.1349], with I_1 ≜ [s_48, t_1) = [1.1118, 1.1213) and I_2 ≜ [t_1, s_49] = [1.1213, 1.1349]. The error of 2 occurs only on I_1. The simulation results and errors are shown in Figure 4 and Table 3, respectively.

Table 3  The computed errors in the maximum norm between the numerical and analytical solutions for u∗ and (x∗1, x∗2) when N = 140

Error                                   control u∗    trajectory x∗1    trajectory x∗2
On [0, T] = [0, 3.2426]                 2             2.88e-2           1.91e-2
On I_1 = [s_48, t_1) = [1.1118, 1.1213)  2             -                 -
On [0, T]\I_1                           0             -                 -
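The jump-interval effect described above can be reproduced with a toy calculation. This is illustrative only: the convention that the numerical control switches at the grid node s_k (so that it agrees with u∗ on I_2 and differs from it on I_1) is an assumption made to match the error pattern reported in Table 3.

```python
import math

T  = 3 * math.sqrt(2) - 1      # horizon, about 3.2426
t1 = 3 * math.sqrt(2) / 2 - 1  # analytical jump point, about 1.1213
N  = 140
h  = T / N                     # time step of the discretization
k  = math.floor(t1 / h)        # t1 lies in I = [s_k, s_{k+1}], here k = 48

def u_exact(t):
    return -1.0 if t < t1 else 1.0

def u_num(t):
    # assumed grid convention: the numerical control can only switch at
    # a node, here at s_k, so it is +1 on all of I (matching u* on I_2)
    return -1.0 if t < k * h else 1.0

ts = [i * T / 20000 for i in range(20001)]
err_full = max(abs(u_num(t) - u_exact(t)) for t in ts)
err_off  = max(abs(u_num(t) - u_exact(t)) for t in ts
               if not (k * h <= t < t1))
```

On a dense time grid, `err_full` equals the jump height 2 (attained only on I_1), while `err_off`, the error measured away from I_1, is 0, exactly as in Table 3.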

[Figure 4  Numerical and analytical solutions of optimal feedback control (a) and trajectory (b) when N = 140]

6  Concluding Remarks and Future Works

Motivated by the convergence results for the approximation of optimal feedback control, we design in this paper a simple but fast algorithm that can easily be applied in practice to obtain the numerical solution of the optimal feedback control for fixed-horizon problems without state constraints, provided that both the viscosity solution and the optimal feedback control exist and are unique. The central idea is to calculate the viscosity solution of the associated HJB equation only on some local cones around the optimal trajectory instead of on the whole time-space domain. The computed data are reused as much as possible and are discarded once a new local cone is needed. The effectiveness and speed of the algorithm are illustrated by two numerical experiments, including one 2-d problem. It is worthwhile to investigate the convergence property of this algorithm and its relationship with existing simple algorithms such as the one in [30], for which no convergence result is available at present; the algorithm of [30] was later shown in [37] not to be in feedback form but only, formally, to resemble an open-loop feedback control[38].


References

[1] Sussmann H J and Willems J C, 300 years of optimal control: From the brachystochrone to the maximum principle, IEEE Control Systems Magazine, 1997, 17: 32–44.
[2] Stoer J and Bulirsch R, Introduction to Numerical Analysis, 2nd ed., Springer-Verlag, New York, 1993.
[3] Bryson Jr A E, Optimal Control — 1950 to 1985, IEEE Control Systems Magazine, 1996, 13: 26–33.
[4] Bardi M and Dolcetta I C, Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations, Birkhäuser, Boston, 1997.
[5] Crandall M G and Lions P L, Viscosity solutions of Hamilton-Jacobi equations, Trans. Amer. Math. Soc., 1983, 277: 1–42.
[6] Crandall M G and Lions P L, Two approximations of solutions of Hamilton-Jacobi equations, Math. Comp., 1984, 43: 1–19.
[7] Carlini E, Falcone M, and Ferretti R, An efficient algorithm for Hamilton-Jacobi equations in high dimension, Comput. Vis. Sci., 2004, 7: 15–29.
[8] Osher S and Shu C W, High order essentially non-oscillatory schemes for Hamilton-Jacobi equations, SIAM J. Numer. Anal., 1991, 28: 907–922.
[9] Zhang Y T and Shu C W, Third and fourth order weighted ENO schemes for Hamilton-Jacobi equations on 2D unstructured meshes, Hyperbolic Problems: Theory, Numerics, Applications, Eds. by Hou T Y and Tadmor E, Springer-Verlag, Berlin, 2003, 941–950.
[10] Hu C and Shu C W, A discontinuous Galerkin finite element method for Hamilton-Jacobi equations, SIAM J. Sci. Comput., 1999, 21: 666–690.
[11] Cheng Y and Shu C W, A discontinuous Galerkin finite element method for directly solving the Hamilton-Jacobi equations, J. Comput. Phys., 2007, 223: 398–415.
[12] Aubin J P and Frankowska H, The viability kernel algorithm for computing value functions of infinite horizon optimal control problems, J. Math. Anal. Appl., 1996, 201: 555–576.
[13] Carlini E, Cristiani E, and Forcadel N, A non-monotone fast marching scheme for a Hamilton-Jacobi equation modelling dislocation dynamics, Numerical Mathematics and Advanced Applications, Springer-Verlag, Berlin, 2006, 723–731.
[14] Tsai Y, Cheng L T, Osher S, et al., Fast sweeping algorithms for a class of Hamilton-Jacobi equations, SIAM J. Numer. Anal., 2003, 41: 673–694.
[15] Sethian J A and Vladimirsky A, Ordered upwind methods for static Hamilton-Jacobi equations, Proc. Natl. Acad. Sci. USA, 2001, 98: 11069–11074.
[16] Dupuis P and Szpiro A, Convergence of the optimal feedback policies in a numerical method for a class of deterministic optimal control problems, SIAM J. Control Optim., 2001, 40: 393–420.
[17] Falcone M, Some remarks on the synthesis of feedback controls via numerical methods, Optimal Control and Partial Differential Equations, Eds. by Menaldi J L, Rofman E, and Sulem A, IOS Press, 2001, 456–465.
[18] Kushner H J and Dupuis P G, Numerical Methods for Stochastic Control Problems in Continuous Time, Springer-Verlag, Berlin, 1992.
[19] Barron E N, Application of viscosity solutions of infinite-dimensional Hamilton-Jacobi-Bellman equations to some problems in distributed optimal control, J. Optim. Theory Appl., 1990, 64: 245–268.
[20] Guo B Z and Sun B, Numerical solution to the optimal birth feedback control of a population dynamics: Viscosity solution approach, Optim. Control Appl. Meth., 2005, 26: 229–254.
[21] Kocan M and Soravia P, A viscosity approach to infinite-dimensional Hamilton-Jacobi equations arising in optimal control with state constraints, SIAM J. Control Optim., 1998, 36: 1348–1357.
[22] Yong J M and Zhou X Y, Stochastic Controls: Hamiltonian Systems and HJB Equations, Springer, New York, 1999.
[23] McEneaney W M, Max-Plus Methods for Nonlinear Control and Estimation, Birkhäuser, Boston, 2006.
[24] McEneaney W M, Convergence rate for a curse-of-dimensionality-free method for Hamilton-Jacobi-Bellman PDEs represented as maxima of quadratic forms, SIAM J. Control Optim., 2009, 48: 2651–2685.
[25] McEneaney W M and Kluberg L J, Convergence rate for a curse-of-dimensionality-free method for a class of HJB PDEs, SIAM J. Control Optim., 2009, 48: 3052–3079.
[26] Guo B Z and Wu T T, Approximation of optimal feedback control: A dynamic programming approach, J. Global Optim., 2010, 46: 395–422.
[27] Peyret R and Taylor T D, Computational Methods for Fluid Flow, Springer-Verlag, New York, 1983.
[28] Wang S, Gao F, and Teo K L, An upwind finite-difference method for the approximation of viscosity solutions to Hamilton-Jacobi-Bellman equations, IMA J. Math. Control Inform., 2000, 17: 167–178.
[29] Guo B Z and Sun B, Numerical solution to the optimal feedback control of continuous casting process, J. Global Optim., 2007, 39: 171–195.
[30] Guo B Z and Sun B, A new algorithm for finding numerical solutions of optimal feedback control, IMA J. Math. Control Inform., 2009, 26: 95–104.
[31] Crandall M G, Ishii H, and Lions P L, User's guide to viscosity solutions of second order partial differential equations, Bull. Amer. Math. Soc., 1992, 27: 1–67.
[32] Falcone M and Ferretti R, Discrete time high-order schemes for viscosity solutions of Hamilton-Jacobi-Bellman equations, Numer. Math., 1994, 67: 315–344.
[33] Drake D, Xin M, and Balakrishnan S N, A new nonlinear control technique for ascent phase of reusable launch vehicles, AIAA J. Guidance, Control, and Dynamics, 2004, 27: 938–948.
[34] Wu T T and Guo B Z, A neighborhood approximation algorithm for the numerical solution of optimal feedback control, Proc. 4th Int. Conf. Optimization and Control with Applications, June 6–11, Harbin and Wudalianchi, China, 2009, 515–525.
[35] Yong J M, Dynamic Programming Principle and Hamilton-Jacobi-Bellman Equations, Shanghai Scientific and Technical Publishers, Shanghai, 1992 (in Chinese).
[36] Knowles G, An Introduction to Applied Optimal Control, Academic Press, New York, 1981.
[37] Guo B Z and Sun B, Numerical solution of the optimal control for two types of drug therapies of HIV/AIDS, Optim. Eng., 2014, 15: 119–136.
[38] Westphal L C, Handbook of Control Systems Engineering, Kluwer Academic Publishers, Boston, 2001.
Math., 1994, 67: 315–344. [33] Drake D, Xin M, and Balakrishnan S N, A new nonlinear control technique for ascent phase of reusable launch vehicles, AIAA J. Guidance, Control, and Dynamics, 2004, 27: 938–948. [34] Wu T T and Guo B Z, A neighborhood approximation algorithm for the numerical solution of optimal feedback control, Proc. 4th Int. Conf. Optimization and Control with Applications, June 6–11, Harbin and Wudalianchi, China, 2009, 515–525. [35] Yong J M, Dynamic Programming Principle and Hamilton-Jacobi-Bellman Equations, Shanghai Scientific and Technical Publishers, Shanghai, 1992 (in Chinese). [36] Knowles G, An Introduction to Applied Optimal Control, Academic Press, New York, 1981. [37] Guo B Z and Sun B, Numerical solution of the optimal control for two types of drug therapies of HIV/AIDS, Optim. Eng., 2014, 15: 119–136. [38] Westphal L C, Handbook of Control Systems Engineering, Kluwer Academic Publishers, Boston, 2001.