Optimal Control and Simple Continuous Cash Balance Problem

Ziyu Guo

May 2016

1 Introduction to Optimal Control

Optimal control theory aims to solve optimization problems. It is an extension of the classical calculus of variations. Control theory deals with control systems, that is, systems with inputs. A control system is generally presented as a differential equation,

ẋ = f(x, u, t)

where x(t) is the state variable of the system, an n-column vector (x1, x2, ..., xn)^T. The control variable of the system is u(t), an m-column vector. Time t is the independent variable. The control u(t) has to be an admissible control, which means it satisfies two requirements: it is a measurable function, and its values belong to a set U. An admissible control u(t) does not have to be continuous; it can be piecewise-continuous or piecewise-differentiable. The set U is a closed and bounded control region in the space of the variables u1, u2, ..., um; it is the set of admissible control parameters, such as temperature, brake force, or fuel rate.

The control system could also be time-invariant, in which case the system is autonomous. If x(·) and u(·) form an optimal pair for an autonomous control system, then x(· + T) and u(· + T) form an optimal pair as well. Moreover, there exists a special type of control called a bang-bang control. A bang-bang control only takes the two extreme values of the restricted control region, and it drives the system between different states simply by switching between those extremes. Typically, if the Hamiltonian function constructed from Pontryagin's maximum principle is linear in one control variable, then that control variable is bang-bang.
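As a minimal numerical sketch (not from the original text), the following Python snippet simulates a one-dimensional control system ẋ = f(x, u, t) under a simple bang-bang control that switches between the two extremes of U = [−1, 1]; the dynamics f and the switching rule are illustrative assumptions only.

```python
import numpy as np

def f(x, u, t):
    """Illustrative dynamics x' = f(x, u, t); any Lipschitz f would do."""
    return -0.5 * x + u

def bang_bang(x):
    """Bang-bang control on U = [-1, 1]: switch between the two extremes."""
    return 1.0 if x < 0 else -1.0

def simulate(x0, t0=0.0, t1=10.0, dt=1e-3):
    """Forward-Euler integration of the controlled system."""
    ts = np.arange(t0, t1, dt)
    xs = np.empty_like(ts)
    x = x0
    for i, t in enumerate(ts):
        xs[i] = x
        u = bang_bang(x)          # admissible, piecewise-constant control
        x = x + dt * f(x, u, t)
    return ts, xs

if __name__ == "__main__":
    ts, xs = simulate(x0=2.0)
    print("final state:", xs[-1])
```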

Consider a control system,

ẋ = f(x, u, t)

with u(t) ∈ U, initial condition x(t0) = x0, and a possible end condition x(t1) = x1. The problem posed for such a system in optimal control theory is to find the optimal pair u∗(t) and x∗(t) that maximizes the objective function

J = h(t1, x(t1)) + ∫_{t0}^{t1} f0(x(t), u(t), t) dt.

This is called the problem of Bolza, where h(t1, x(t1)) is the terminal (end-condition) term and ∫_{t0}^{t1} f0(x(t), u(t), t) dt is the running cost. If h = 0, it is called the problem of Lagrange. If f0 = 0, it is called the problem of Mayer, which is the form of the simple continuous cash balance problem that I will discuss later. Though these three problems appear to have different forms, they can in fact be transformed into one another; a sketch of the Lagrange-to-Mayer transformation follows.
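To make the transformation concrete, here is a minimal sketch (with illustrative dynamics and cost, not taken from the text) that converts a Lagrange problem into Mayer form by augmenting the state with x0, where ẋ0 = f0(x, u, t) and x0(t0) = 0, so that J = x0(t1).

```python
import numpy as np

def f(x, u, t):
    """Original dynamics (illustrative)."""
    return -x + u

def f0(x, u, t):
    """Running-cost integrand (illustrative)."""
    return x**2 + u**2

def augmented_rhs(z, u, t):
    """State augmented with the accumulated cost: z = (x, x0)."""
    x, x0 = z
    return np.array([f(x, u, t), f0(x, u, t)])

def mayer_cost(u_func, x_init=1.0, t0=0.0, t1=5.0, dt=1e-3):
    """Integrate the augmented system; the Lagrange cost is just x0(t1)."""
    z = np.array([x_init, 0.0])
    for t in np.arange(t0, t1, dt):
        z = z + dt * augmented_rhs(z, u_func(z[0], t), t)
    return z[1]          # x0(t1) = integral of f0 along the trajectory

if __name__ == "__main__":
    print("J =", mayer_cost(lambda x, t: -0.5 * x))
```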


2 Pontryagin's Maximum Principle and the Proof of the Principle

2.1 Introduction to Pontryagin's Maximum Principle

In the 1950s, while trying to solve a concrete fifth-order system of ordinary differential equations with three control parameters, related to the optimal operation of an aircraft, Lev Semenovich Pontryagin arrived at the formulation of general time-optimal problems. Because two of the control parameters entered the equations linearly and were bounded, the problem could not be solved by classical methods such as the Euler-Lagrange equation. He soon realized that a general framework was needed, which led to the birth of Pontryagin's maximum principle. We will discuss the formulation of this principle in the rest of this section. Given the fundamental system

ẋi = fi(x, u),    i = 1, 2, ..., n,    t0 ≤ t ≤ t1,

we define

x0 = ∫_{t0}^{t1} f0(x1, x2, ..., xn, u) dt,

and the right-hand side of this equation is the objective function for the problem of Lagrange. We now introduce an auxiliary (adjoint) function p(t), a function of t subject to the adjoint differential equations

ṗi = − Σ_{α=0}^{n} pα (∂fα/∂xi),    i = 0, 1, 2, ..., n.


For any initial condition p(t0) = (p0(t0), p1(t0), p2(t0), ..., pn(t0)), p is defined on the time interval on which u(t) and x(t) are defined. The function p is continuous and has continuous derivatives with respect to t, except possibly at the points where u(t) is discontinuous. In particular,

dp0/dt = − Σ_{α=0}^{n} pα (∂fα/∂x0) = 0,

since none of the fα depend on x0, so p0 is constant. Originally, we need to optimize the objective function given as an integral; however, by turning the integral into the additional state variable x0, we can write the objective simply as x0(t1) − x0(t0), which is easier to handle than the integral. Now we construct a Hamiltonian function H from the adjoint function:

H(p, x, u) = Σ_{i=0}^{n} pi fi(x, u) = p · f(x, u).

So the differential equations for ẋ and ṗ can be rewritten as

dx/dt = ∂H/∂p    (1)

dp/dt = −∂H/∂x    (2)

because ∂H/∂pi is just fi(x, u) and −Σ_{α=0}^{n} pα (∂fα/∂xi) is just −∂H/∂xi. These new forms of ẋ and ṗ are called Pontryagin's equations. If we choose any admissible control u(t) on [t0, t1] and set the initial state to x(t0) = x0, we can find the trajectory that satisfies the new form of ẋ, equation (1), and then solve the adjoint system for p(t). For fixed x(t) and p(t), the function H changes only with the control variable u ∈ U. If we introduce another function M(p, x), the least upper bound of the values of H with respect to u, then M(p, x) is the maximum value of H whenever this upper bound is attained on U.


Then, the theorem is as follows:

Theorem 2.1. Let u(t), t0 ≤ t ≤ t1, be an admissible control with corresponding trajectory x(t). In order that u(t) and x(t) be optimal, it is necessary that there exist a nonzero continuous vector function p(t) = (p0(t), p1(t), ..., pn(t)) corresponding to u(t) and x(t) such that:

1. for every t in [t0, t1], the function H(p(t), x(t), u) of the variable u ∈ U attains its maximum at the point u = u∗(t), the optimal control:

H(p(t), x(t), u∗(t)) = M(p(t), x(t));

2. at the terminal time t1 the relations

p0(t1) ≤ 0,    M(p(t1), x(t1)) = 0    (3)

are satisfied. Furthermore, it turns out that if p(t), x(t), and u(t) satisfy the system (1)-(2) and condition 1, then the functions p0(t) and M(p(t), x(t)) are constant in time. Thus, (3) may be verified at any time t, t0 ≤ t ≤ t1, and not just at t1.
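As a small numerical illustration of condition 1 (an assumed example, not taken from the text), the sketch below approximates M(p, x) = max over u ∈ U of H(p, x, u) on a grid of controls for a Hamiltonian that is linear in a scalar control on U = [−1, 1]; because H is linear in u, the maximum lands on an endpoint of U, matching the bang-bang remark in Section 1.

```python
import numpy as np

def H(p, x, u):
    """Illustrative Hamiltonian, linear in the scalar control u."""
    return p[0] * (x**2) + p[1] * (x + u)   # p . f with f = (x^2, x + u)

def M(p, x, u_grid):
    """Least upper bound of H over the control set, approximated on a grid."""
    values = np.array([H(p, x, u) for u in u_grid])
    i = int(np.argmax(values))
    return values[i], u_grid[i]

if __name__ == "__main__":
    u_grid = np.linspace(-1.0, 1.0, 201)    # U = [-1, 1]
    m, u_star = M(p=np.array([-1.0, 2.0]), x=0.5, u_grid=u_grid)
    print("M(p, x) =", m, "attained at u* =", u_star)   # u* is an endpoint of U
```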

2.2 The Proof of Pontryagin's Maximum Principle

Pontryagin's maximum principle provides the solution to the optimization problem for a general dynamical system by introducing a Hamiltonian function H. Roughly speaking, for a trajectory to be optimal, the function H has to be maximal with respect to u(t) almost everywhere in the time interval [t0, t1] (minimal, if the objective is being minimized, as in the derivation below). I will sketch the proof of the principle next.

Consider a general continuous-time, autonomous problem with dynamical system ẋ = g(x, u) and objective function

J = ∫_{t0}^{t1} h(x(t), u(t)) dt.    (4)

We denote the optimal value of J by f(x, T), the value of J over the interval from t0 to t1 (of length T) obtained by using the optimal control starting from the initial condition x(t0) = x, and we denote the optimal control by u(x, T), the optimal value of u at t0. The time interval can be separated into two parts, (t0, t0 + ∆) and (t0 + ∆, t1), with ∆ infinitesimal. The contribution of the first part to the objective function can be written as

∫_{t0}^{t0+∆} h(x, u) dt ≈ h(x, u)∆,

and the increment δx of x during the first part is δx ≈ g(x, u)∆. The contribution of the second part is

∫_{t0+∆}^{t1} h(x, u) dt = f(x + δx, T − ∆).

If we assume that f(x, T) is differentiable, the second part can be approximated by a first-order Taylor expansion,

∫_{t0+∆}^{t1} h(x, u) dt ≈ f(x, T) + (∂f/∂x) g(x, u)∆ − (∂f/∂T)∆.

The sum of the two parts gives

∫_{t0}^{t1} h(x, u) dt ≈ h(x, u)∆ + f(x, T) + (∂f/∂x) g(x, u)∆ − (∂f/∂T)∆.

Since, under the optimal control, ∫_{t0}^{t1} h(x, u) dt = f(x, T), minimizing the right-hand side over u, canceling f(x, T) on both sides, and dividing by ∆ gives

∂f/∂T = min_u { h(x, u) + Σ_{j=1}^{n} (∂f/∂xj) gj(x, u) }.

This is a functional differential equation for the optimal value f of J. The u(t) found by solving this differential equation is the optimal control. Moreover, for a non-autonomous system, the derivation of the functional differential equation is almost the same as for the time-invariant system, except that a third argument, the ending time t1, appears in addition to x and T. So f(x, T) and u(x, T) in the previous discussion become f(x, T, t1), the value of J over an interval of length T ending at t1 with initial state x(t1 − T) = x, and u(x, T, t1), the optimal control for the process ending at t1 with the same initial state x(t1 − T) = x. The general functional differential equation is now

∂f/∂T = min_u { h(x, u, t1 − T) + Σ_{j=1}^{n} (∂f/∂xj) gj(x, u, t1 − T) }.    (5)

Through this derivation, the problem has been transformed from minimizing the objective function (4) directly into solving the general functional differential equation (5). Setting t = t1 − T, we can rewrite equation (5) as

∂f/∂t + min_u { h(x, u, t) + Σ_{j=1}^{n} (∂f/∂xj) gj(x, u, t) } = 0,

which is known as the Hamilton-Jacobi-Bellman equation. A small numerical sketch of solving it by dynamic programming follows.
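As a rough numerical illustration (my own example, not part of the text), the Hamilton-Jacobi-Bellman equation can be approximated by backward dynamic programming on a grid: discretize time and state, and at each step take the minimum over a grid of controls. The dynamics g and cost h below are assumptions chosen only to keep the sketch short.

```python
import numpy as np

# Assumed scalar example: dynamics g(x, u) = u, running cost h(x, u) = x^2 + u^2.
def g(x, u):
    return u

def h(x, u):
    return x**2 + u**2

def solve_hjb(x_grid, u_grid, t1=1.0, n_steps=100):
    """Backward dynamic programming approximation of the value function f(x, t)."""
    dt = t1 / n_steps
    f = np.zeros_like(x_grid)                 # terminal condition f(x, t1) = 0
    for _ in range(n_steps):                  # march backward in time
        f_new = np.empty_like(f)
        for i, x in enumerate(x_grid):
            # min over controls of: running cost + value at the next state
            candidates = [
                h(x, u) * dt + np.interp(x + g(x, u) * dt, x_grid, f)
                for u in u_grid
            ]
            f_new[i] = min(candidates)
        f = f_new
    return f                                   # approximate f(x, t0)

if __name__ == "__main__":
    x_grid = np.linspace(-2.0, 2.0, 81)
    u_grid = np.linspace(-2.0, 2.0, 41)
    value = solve_hjb(x_grid, u_grid)
    print("approximate optimal value from x = 1:", np.interp(1.0, x_grid, value))
```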


To illustrate Pontryagin's maximum principle, we now introduce the adjoint variable p, which is a function of time only, with

pj = ∂f/∂xj.

Based on Pontryagin's maximum principle, we need to construct a Hamiltonian function H that represents the quantity minimized in the functional differential equation:

H(x, u, t) = h(x, u, t) + Σ_{j=1}^{n} (∂f/∂xj) gj(x, u, t).

Then we get

∂f/∂t = −H.

Also,

ṗj = −∂H/∂xj = d/dt (∂f/∂xj),

and

ẋj = ∂H/∂pj.

The Hamiltonian function constructed here is just the expression appearing in the general functional differential equation, as is easy to see from their components. The two differential equations for ẋ and ṗ obtained by differentiating H are the same as equations (1) and (2) that I have already mentioned in Section 2.1. Though the Hamiltonian here looks different from the one in Section 2.1, they are actually the same; the part

Σ_{j=1}^{n} (∂f/∂xj) gj(x, u, t)

can be rewritten, using pj = ∂f/∂xj, as

Σ_{j=1}^{n} pj gj(x, u, t),

which is just another form of the p · f(x, u) term of Section 2.1 (with gj in place of fj). Therefore, for the optimization problem we need to find the optimal control u∗(t) that achieves the extreme value of the objective function (4). By the derivation of the general functional differential equation (5), we can find u∗(t) for (4) by simply finding it for (5). Finally, we constructed the Hamiltonian function from (5), and the two have the same form. The problem thus becomes finding u∗(t) from the Hamiltonian function, that is, finding the control that minimizes H at every point of the time interval, and this line of argument proves Pontryagin's maximum principle.

3 Controllability of Linear Dynamical Systems

3.1 Introduction to linear dynamical systems

Controllability is one of the most important properties in modern optimal control theory. The state-space approach to analyzing control systems was the focus of control theory in the late 1950s, but it was R. Kalman and his colleagues who made great progress on questions about the existence of optimal controls for dynamical systems under certain conditions. Inevitably, for many dynamical systems the available controls do not influence the complete state; they only affect part of the system. Roughly speaking, controllability concerns whether there exists a set of admissible controls that can steer the dynamical system from a given initial state x(0) to a final state, here the origin S = {0}, within finite time; in what follows the finite time period is T = t − 0 = t.


Consider a linear autonomous dynamical control system

ẋ(t) = Ax(t) + Bu(t),    x(0) = x0,

where t > 0; the ODE of the system is linear in both the state variable x(t) and the control variable u(t). We define the set of control parameters

U = [−1, 1]^m = {a ∈ R^m : |ai| ≤ 1, i = 1, ..., m}.

The unique solution of the nonhomogeneous system

ẋ(t) = Ax(t) + f(t),    x(0) = x0,

is

x(t) = e^{tA} x0 + e^{tA} ∫_0^t e^{−sA} f(s) ds,

and this kind of expression is known as the variation of parameters formula. According to this formula, the solution of the ODE of the control system is

x(t) = e^{tA} x0 + e^{tA} ∫_0^t e^{−sA} B u(s) ds.
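As a sanity check (an assumed example, not from the text), the variation of parameters formula can be verified numerically: evaluate e^{tA} x0 + e^{tA} ∫_0^t e^{−sA} B u(s) ds with matrix exponentials and a quadrature rule, and compare against a direct numerical integration of ẋ = Ax + Bu.

```python
import numpy as np
from scipy.linalg import expm

# Assumed small example system.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
x0 = np.array([1.0, 0.0])
u = lambda s: np.array([np.sin(s)])          # an admissible control

def variation_of_parameters(t, n=2000):
    """x(t) = e^{tA} x0 + e^{tA} * integral_0^t e^{-sA} B u(s) ds (midpoint rule)."""
    ds = t / n
    integral = np.zeros(2)
    for k in range(n):
        s = (k + 0.5) * ds
        integral += ds * (expm(-s * A) @ B @ u(s))
    return expm(t * A) @ (x0 + integral)

def euler_solution(t, n=20000):
    """Direct forward-Euler integration of x' = Ax + Bu for comparison."""
    dt = t / n
    x = x0.copy()
    for k in range(n):
        s = k * dt
        x = x + dt * (A @ x + B @ u(s))
    return x

if __name__ == "__main__":
    t = 2.0
    print("variation of parameters:", variation_of_parameters(t))
    print("direct integration:     ", euler_solution(t))
```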

We define the reachable set for time t to be C(t), which is a convex and symmetric set: the set of initial points x0 for which there exists a control such that x(t) = 0. For an arbitrary initial point, we can conclude that x0 ∈ C(t) if and only if there exists an admissible control u(·) taking values in U such that

0 = e^{tA} x0 + e^{tA} ∫_0^t e^{−sA} B u(s) ds;

which means that, for the target final state (the origin) to be reachable, the initial point has to be

x0 = − ∫_0^t e^{−sA} B u(s) ds.
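As a hedged illustration (the matrices and control below are assumptions, not from the text), this formula can be evaluated numerically: given an admissible control u(·), the integral −∫_0^t e^{−sA} B u(s) ds produces the initial point that this control steers to the origin at time t, which can then be checked by integrating the system forward.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [0.0, 0.0]])       # assumed double-integrator dynamics
B = np.array([[0.0], [1.0]])

def steered_initial_point(u, t, n=4000):
    """x0 = -integral_0^t e^{-sA} B u(s) ds, approximated by a midpoint rule."""
    ds = t / n
    x0 = np.zeros(A.shape[0])
    for k in range(n):
        s = (k + 0.5) * ds
        x0 -= ds * (expm(-s * A) @ B @ np.atleast_1d(u(s)))
    return x0

def forward_state(x0, u, t, n=20000):
    """Forward-Euler check that x0 really is steered to the origin by u."""
    dt = t / n
    x = x0.copy()
    for k in range(n):
        x = x + dt * (A @ x + B @ np.atleast_1d(u(k * dt)))
    return x

if __name__ == "__main__":
    t = 2.0
    u = lambda s: 1.0 if s < t / 2 else -1.0   # bang-bang control in U = [-1, 1]
    x0 = steered_initial_point(u, t)
    print("steered initial point:", x0)
    print("state at time t (should be near 0):", forward_state(x0, u, t))
```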

4 Simple Continuous Cash Balance Problem

The simple continuous cash balance problem aims to solve a continuous cash balance model, manipulating a company's cash balance so that it meets the company's demand for cash. The optimal control solution for this problem is constructed as an application of the problem of Mayer, also called the Mayer form of Pontryagin's maximum principle. The control variable of the cash balance problem is the rate of sale of securities for cash (or, when negative, the rate of purchase). The notation of the problem is as follows:

x(t) = the cash balance at time t.
y(t) = the security balance at time t.
d(t) = the demand for cash at time t; positive d(t) represents cash outflow.
u(t) = the rate of sale of securities for cash at time t; negative u(t) represents the purchase of a portfolio of securities. The control is bounded by two positive numbers M1 and M2 (the maximum sales and purchase rates), so −M2 ≤ u(t) ≤ M1.
r1(t) = the interest rate on the cash balance x(t).
r2(t) = the interest rate on the securities y(t).
α = the broker's commission per dollar of securities bought or sold.
(All units are in dollars except interest rates.)


Though in reality securities cannot be sold or bought instantaneously, we assume in the continuous model that they are traded immediately, without any time lag. In this case, the state equations for this dynamic system are:

ẋ = f(x, u) = r1(t) x(t) − d(t) + u(t) − α|u|,    x(0) = x0,

ẏ = g(y, u) = r2(t) y(t) − u(t),    y(0) = y0.

Here x0 and y0 are the initial cash and security balances. The system is restricted by the bounds on u(t) and by the objective function. The change of the cash balance at time t, ẋ, is the cash inflow, which includes the interest earned, r1(t)x(t), and the cash received from the sale of securities, u(t), minus the cash outflow, which consists of the cash demanded by the other activities of the company, d(t), and the broker's commission on securities traded, α|u|. Similarly, the change of the security balance, ẏ, is the interest earned, r2(t)y(t), minus the sales of securities, u(t). The objective function is in Mayer form:

max x(T) + y(T).
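As a minimal simulation sketch (the interest rates, demand profile, and policy below are assumptions for illustration, not data from the text), one can integrate the two state equations for a given admissible control u(t) and read off the Mayer objective x(T) + y(T).

```python
import numpy as np

# Assumed problem data (illustrative only).
r1 = lambda t: 0.01                  # interest rate on cash
r2 = lambda t: 0.05                  # interest rate on securities
d = lambda t: 10.0 * np.sin(t)       # demand for cash
alpha, M1, M2 = 0.005, 50.0, 50.0    # commission rate and control bounds

def simulate(u, x0=100.0, y0=500.0, T=1.0, n=10000):
    """Euler integration of the cash/security balances under a control policy u."""
    dt = T / n
    x, y = x0, y0
    for k in range(n):
        t = k * dt
        ut = np.clip(u(t, x, y), -M2, M1)    # keep the control admissible
        x += dt * (r1(t) * x - d(t) + ut - alpha * abs(ut))
        y += dt * (r2(t) * y - ut)
    return x, y

if __name__ == "__main__":
    # A naive policy: sell securities at the maximum rate whenever cash goes negative.
    policy = lambda t, x, y: M1 if x < 0.0 else 0.0
    xT, yT = simulate(policy)
    print("objective x(T) + y(T) =", xT + yT)
```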

Based on Pontryagin's maximum principle, we can solve this problem by formulating the Hamiltonian function and finding its maximum value. Using the adjoint variables e1 and e2, the Hamiltonian is

H = e1 ẋ + e2 ẏ
  = e1 (r1(t)x(t) − d(t) + u(t) − α|u|) + e2 (r2(t)y(t) − u(t))
  = e1 (r1(t)x(t) − d(t)) + e2 r2(t)y(t) + e1 (u(t) − α|u|) − e2 u(t),

with

ė1 = −e1 (∂f/∂x),    ė2 = −e2 (∂g/∂y).

We collect the control-dependent portion of H into the function

U(t) = e1 (u(t) − α|u(t)|) − e2 u(t).

And we can derive the adjoint equations as

ė1 = −∂H/∂x = −e1 r1(t),    e1(T) = 1,

ė2 = −∂H/∂y = −e2 r2(t),    e2(T) = 1.

So the solutions will be

e1(t) = exp( ∫_t^T r1(s) ds ),

e2(t) = exp( ∫_t^T r2(s) ds ).
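A short sketch (with assumed interest-rate functions, not from the text) of evaluating these adjoint solutions numerically; e1(t) and e2(t) are the future values at time T of one dollar held in cash or in securities from time t on.

```python
import numpy as np

# Assumed time-varying interest rates.
r1 = lambda s: 0.01 + 0.005 * np.sin(s)
r2 = lambda s: 0.05

def adjoint(r, t, T, n=1000):
    """e(t) = exp( integral_t^T r(s) ds ), via a midpoint rule."""
    ds = (T - t) / n
    total = sum(r(t + (k + 0.5) * ds) for k in range(n)) * ds
    return np.exp(total)

if __name__ == "__main__":
    t, T = 0.0, 1.0
    e1, e2 = adjoint(r1, t, T), adjoint(r2, t, T)
    print("e1(t) =", e1, " e2(t) =", e2)   # future value of $1 in cash vs. securities
```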

These solutions for the adjoint variables represent the future value of one dollar held in cash or in securities over the period from t to T. The value e1 is the future value of holding one dollar in the cash account, while e2 is the future value of holding one dollar in securities. According to Pontryagin's maximum principle, now that we have formulated the Hamiltonian function, the problem has become finding the maximum of the Hamiltonian rather than maximizing the original Mayer-form objective directly. In fact, we only need to maximize U(t) by choosing u(t) properly, since the rest of the Hamiltonian is determined by the state and adjoint variables and is unaffected by the control u(t). Let us set

u(t) = u1(t) − u2(t),    u1(t) ≥ 0,  u2(t) ≥ 0,  u1(t)u2(t) = 0.

In this case |u(t)| = u1(t) + u2(t), and one of u1(t) and u2(t) must be zero, so U(t) can be rewritten as

U(t) = u1(t)[(1 − α)e1(t) − e2(t)] − u2(t)[(1 + α)e1(t) − e2(t)].

Since U(t) is linear in u1(t) and u2(t), as we discussed in the first section, the optimal strategy for this problem is bang-bang: the optimal value of u(t) can only be one of the end-point values −M2 or M1 (or zero, when neither trade is worthwhile). When u2(t) = 0 and u(t) = u1(t) > 0, u1(t) has to be M1; in the same way, when u1(t) = 0 and u(t) = −u2(t) < 0, u2(t) must be M2. Therefore the value of u1(t) is either zero or M1, and the value of u2(t) is either M2 or zero, accordingly. To maximize U(t), we make the first part, u1(t)[(1 − α)e1(t) − e2(t)], and the second part, −u2(t)[(1 + α)e1(t) − e2(t)], each as large as possible. For the first part: if (1 − α)e1(t) > e2(t), we maximize it by taking u1(t) = M1; if (1 − α)e1(t) = e2(t), the first part is zero regardless of u1(t), so its value is undetermined; if (1 − α)e1(t) < e2(t), u1(t) has to be zero. Likewise, for the second part: if (1 + α)e1(t) > e2(t), u2(t) has to be zero; if (1 + α)e1(t) < e2(t), u2(t) has to be M2 to maximize this part; and u2(t) is still undetermined if (1 + α)e1(t) = e2(t). So u1(t) and u2(t) should be as follows:


u1(t) = M1              if (1 − α)e1(t) > e2(t),
u1(t) = undetermined    if (1 − α)e1(t) = e2(t),
u1(t) = 0               if (1 − α)e1(t) < e2(t);

and

u2(t) = 0               if (1 + α)e1(t) > e2(t),
u2(t) = undetermined    if (1 + α)e1(t) = e2(t),
u2(t) = M2              if (1 + α)e1(t) < e2(t).

Here u1(t) and u2(t) represent the sale and the purchase of securities, respectively. The optimal strategy, according to u1(t), is that the company should sell if the future value of a dollar in cash, even after paying the broker's commission, is larger than that of a dollar in securities, and should not sell in the opposite situation. If the future value of a dollar in cash less the commission equals that of a dollar in securities, the optimal strategy is undetermined. Likewise, based on u2(t), the optimal strategy is to buy if the future value of holding one dollar in cash plus the broker's commission is smaller than that of holding a dollar in securities, and not to buy otherwise; it is again undetermined if the two future values are equal. Consequently, if the optimal strategy calls for selling, it cannot also call for buying: if (1 − α)e1(t) ≥ e2(t), then (1 + α)e1(t) > e2(t); that is, if u1(t) > 0, then u2(t) = 0. Similarly, when (1 + α)e1(t) ≤ e2(t), we have (1 − α)e1(t) < e2(t); that is, if u2(t) > 0, then u1(t) = 0. So u1(t)u2(t) = 0 always holds for the optimal policy. A short sketch of this decision rule in code follows.
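To summarize the policy computationally, here is a minimal sketch (with assumed rate functions and parameters, not from the text) that evaluates e1(t) and e2(t) and returns the bang-bang decision for u(t); the undetermined boundary cases are reported as None.

```python
import numpy as np

# Assumed problem data.
r1 = lambda s: 0.01
r2 = lambda s: 0.05
alpha, M1, M2, T = 0.005, 50.0, 50.0, 1.0

def adjoint(r, t, n=1000):
    """e(t) = exp( integral_t^T r(s) ds ), midpoint rule."""
    ds = (T - t) / n
    return np.exp(sum(r(t + (k + 0.5) * ds) for k in range(n)) * ds)

def optimal_u(t):
    """Bang-bang cash balance policy derived from the sign tests above."""
    e1, e2 = adjoint(r1, t), adjoint(r2, t)
    if (1 - alpha) * e1 > e2:
        return M1        # sell securities at the maximum rate
    if (1 + alpha) * e1 < e2:
        return -M2       # buy securities at the maximum rate
    if (1 - alpha) * e1 < e2 < (1 + alpha) * e1:
        return 0.0       # neither trade is worthwhile
    return None          # boundary case: the control is undetermined

if __name__ == "__main__":
    print("u*(0) =", optimal_u(0.0))
```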
