Dynamics and Control, 10, 107–116, 2000

© 2000 Kluwer Academic Publishers. Manufactured in The Netherlands.

Hybrid Solution Method for Dynamic Programming Equations for MDOF Stochastic Systems

A. BRATUS', Department of System Analysis, Moscow State University, Moscow 119899, Vorobyevy Gory, Russia
M. DIMENTBERG ([email protected]), Mechanical Engineering Department, Worcester Polytechnic Institute, Worcester, MA 01609, USA
D. IOURTCHENKO, Mechanical Engineering Department, Worcester Polytechnic Institute, Worcester, MA 01609, USA
M. NOORI, Mechanical Engineering Department, Worcester Polytechnic Institute, Worcester, MA 01609, USA

Editor: M. J. Corless

Received September 21, 1999; Accepted September 29, 1999

Abstract. An optimal control problem is considered for a multi-degree-of-freedom (MDOF) system excited by a white-noise random force. The problem is to minimize the expected response energy by a given time instant T by applying a vector control force with given bounds on the magnitudes of its components. This problem is governed by the Hamilton-Jacobi-Bellman (HJB) partial differential equation. This equation has been studied previously [1] for the case of a single-degree-of-freedom system by developing a "hybrid" solution. Specifically, an exact analytical solution was obtained within a certain outer domain of the phase plane, which provides the necessary boundary conditions for a numerical solution within a velocity-bounded inner domain, thereby alleviating the problem of numerical analysis over an unbounded domain. This hybrid approach is extended here to MDOF systems using the common transformation to modal coordinates. The multidimensional HJB equation is solved explicitly for the corresponding outer domain, thereby reducing the problem to a set of numerical solutions within bounded inner domains. Thus, the problem of bounded optimal control is solved completely as long as the necessary modal control forces can be implemented in the actuators.
If, however, the control forces can be applied to the original generalized coordinates only, the resulting optimal control law may become unfeasible. The reason is the nonlinearity of the maximization operation for the modal control forces, which may lead to violation of some constraints after the inverse transformation to the original coordinates. A semioptimal control law is illustrated for this case, based on projecting boundary points of the domain of admissible transformed control forces onto the boundaries of the domain of the original control forces. The case of a single control force is also considered, and a similar solution to the HJB equation is derived.

Keywords: optimal control, random excitation, HJB equation, hybrid solution

1. Introduction

Procedures for the synthesis of optimal control laws for dynamic systems are well established for cases where no bounds or restrictions are imposed on the control forces. However, as mentioned, e.g., in [2], the resulting optimal laws may prove to be "unfeasible", requiring forces too large to be implemented by the available actuators. The bounds on


the magnitudes of the control forces may significantly modify the optimal control laws, and they make the optimization problem nonlinear even for linear controlled systems.

Problems of optimal bounded feedback control for systems with random white-noise disturbances may be studied by the Dynamic Programming approach, which leads to the Hamilton-Jacobi-Bellman (HJB) equation [3-5]. The solution to this equation provides both the optimal control law and the values of the relevant functional. The basic difficulty with this approach lies in the necessity of generating a numerical solution to this multidimensional PDE over an unbounded domain. This is the basic reason for the rather limited number of specific solutions available. This difficulty was overcome by a hybrid approach, as suggested in a previous paper [1], where a controlled single-degree-of-freedom (SDOF) system had been considered with the goal of minimizing the expected response energy functional by a given time instant. An exact explicit analytical solution to the HJB equation was obtained within a certain outer domain, thereby reducing the necessary numerical study to a finite inner domain. The boundary conditions for the numerical solution are readily obtainable from the analytical solution. The results have shown the simple "dry-friction" control law to be optimal within the outer domain, whereas switching lines were generated numerically within the inner domain. The efficiency of the dry-friction control, as extended to the whole phase plane, was evaluated in terms of the calculated expected response energy. As should be expected, this law may be regarded as a suboptimal one for the case of a weak control, where the bound on the available magnitude of the control force is small. As the bound increases, this simplified control law becomes much less efficient compared with the perfectly optimal one.

In this paper the hybrid approach is extended to MDOF systems using the common transformation to modal coordinates.
The multidimensional HJB equation is solved explicitly for the corresponding outer domain, thereby reducing the problem to a set of numerical solutions within bounded inner domains. Thus, the problem of bounded optimal control is solved completely, as long as the necessary modal control forces can be implemented in the available actuators. If, however, the control forces can be applied to the original generalized coordinates only, the resulting optimal control law may become unfeasible. The reason is the nonlinearity of the maximization operation for the modal control forces, which may lead to violation of some constraints after the transformation back to the original coordinates. A semioptimal control law is illustrated for this case, based on projecting boundary points of the domain of admissible transformed control forces onto the boundaries of the domain of the original control forces. The case of a single control force is also considered, and a similar solution to the HJB equation is derived.

2. Formulation

Consider a randomly excited controlled system with n degrees of freedom, governed by the following matrix equation of motion

    M Ẍ + K X = U(t) + B(t) ζ(t)    (1)


Here X(t) and U(t) are n-dimensional column vectors of displacements and control forces, respectively, with components x_i(t) and u_i(t), i = 1, ..., n; M and K are symmetric positive definite n × n mass and stiffness matrices, respectively; ζ(t) is a vector of independent Gaussian white noises with unit intensities; and B(t) = ‖b_ij(t)‖, i, j = 1, 2, ..., n. (For brevity, the original control forces, as well as the transformed control forces, i.e., the components of the vectors U and V respectively, are written here as functions of time only, whereas for a system with feedback control they may depend on all of the system's state variables.) The following bounds are assumed to be imposed on the possible magnitudes of the control forces:

    U(t) ∈ S_R,  where  S_R = {u_i(t): |u_i| ≤ R_i, i = 1, ..., n},  R_i > 0    (2)

The total response energy at time instant t is W = (1/2)[(M Ẋ, Ẋ) + (K X, X)], where parentheses denote the dot product of vectors. The mass and stiffness matrices can be diagonalized by a nonsingular transformation with the n × n matrix A = ‖a_ij‖, i, j = 1, ..., n, such that

    AᵀM A = I,  AᵀK A = Ω²    (3)

where I is the identity matrix, Ω² is diagonal with elements Ω_i², i = 1, ..., n, and the superscript "T" denotes matrix transposition [6]. Furthermore, A may be represented as a product of an orthogonal matrix Q and a certain diagonal matrix. Introducing now modal coordinates Y(t) = A⁻¹X(t), the equation of motion (1), in view of relations (3), is reduced to

    Ÿ + Ω²Y = AᵀU(t) + AᵀB(t)ζ(t)    (4)
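The diagonalizing transformation (3) is convenient to compute numerically. The sketch below (an illustration, not from the paper; all names and values are ours) reduces the generalized eigenproblem for the pair M, K to a standard symmetric one via M^(-1/2), so that AᵀMA = I and AᵀKA = diag(Ω_i²):

```python
import numpy as np

def modal_matrix(M, K):
    """Return (A, Omega) such that A.T @ M @ A = I and A.T @ K @ A = diag(Omega**2)."""
    # Spectral decomposition of the symmetric mass matrix gives M^{-1/2}.
    m_eig, Q = np.linalg.eigh(M)
    M_inv_sqrt = Q @ np.diag(m_eig ** -0.5) @ Q.T
    # Standard symmetric eigenproblem for M^{-1/2} K M^{-1/2}.
    lam, Phi = np.linalg.eigh(M_inv_sqrt @ K @ M_inv_sqrt)
    return M_inv_sqrt @ Phi, np.sqrt(lam)

# Two-mass chain of the form (16), with illustrative m1 = 2, m2 = 1, k1 = 3, k2 = 1.
M = np.array([[2.0, 0.0], [0.0, 1.0]])
K = np.array([[4.0, -1.0], [-1.0, 1.0]])
A, Omega = modal_matrix(M, K)
```

The columns of A are the mass-normalized mode shapes, and Omega holds the natural frequencies Ω_i.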

The expression for the energy is now transformed to

    W = (1/2)[(Ẏ, Ẏ) + (Ω²Y, Y)] = (1/2) Σ_{i=1}^n [ẏ_i²(t) + Ω_i² y_i²(t)]    (5)

Consider the column vector V(t) of transformed control forces, with components

    v_i(t) = Σ_{j=1}^n a_ji u_j(t)    (6)

In view of the inequalities (2), this vector belongs to the set

    S_ρ = {v_i(t): |v_i(t)| ≤ ρ_i, i = 1, 2, ..., n},  ρ_i = Σ_{j=1}^n |a_ji| R_j    (7)
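For a concrete modal matrix A and actuator bounds R, the transformed bounds (7) amount to a single matrix-absolute-value product; a minimal sketch with made-up numbers:

```python
import numpy as np

# Hypothetical 2-DOF data: A is the modal matrix, R the per-actuator bounds of (2).
A = np.array([[0.6, -0.8],
              [0.8,  0.6]])
R = np.array([2.0, 1.0])

# Since v = A.T @ u, relation (7) gives rho_i = sum_j |a_ji| R_j, i.e. |A|.T @ R.
rho = np.abs(A).T @ R
```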

The transformed equation (4) may be rewritten in state-space (scalar) form as

    ẏ_i = p_i,  ṗ_i = −Ω_i² y_i + v_i + ξ_i(t),  i = 1, 2, ..., n

    ξ_i(t) = Σ_{k=1}^n σ_ki(t) ζ_k(t),  σ_ki(t) = Σ_{j=1}^n a_ji b_jk(t)    (8)


Denote by H(Y, P, T − t) the minimal expected value of the response energy (5) of the system (4) at the time instant T, where Y and P are column n-vectors with components y_i and p_i respectively. The expectation is a conditional one, the condition being the initial state Y, P of the system at time instant t. Introducing the "backward" time τ = T − t, the HJB equation for H can be written as [3-5]

    ∂H/∂τ = Σ_{i=1}^n [p_i(∂H/∂y_i) − Ω_i² y_i(∂H/∂p_i) − ρ_i|∂H/∂p_i|] + (1/2) Tr(σσᵀH_PP)    (9)

the initial condition being

    H(Y, P, 0) = (1/2) Σ_{i=1}^n (Ω_i² y_i² + p_i²)    (10)

Here σ and H_PP are the n × n matrices of σ_ij and ∂²H/(∂p_i ∂p_j) respectively.

Comment. The terms with ρ_i in the HJB equation (9) appear from minimizing over v_i the original quantities v_i(∂H/∂p_i), which in view of the inequalities (7) yields

    v_i = −ρ_i sgn(∂H/∂p_i),  sgn z = +1 for z > 0,  sgn z = −1 for z < 0    (11)

An important modification of the HJB equation (9) is appropriate for the case of the simple "dry-friction" control, namely

    v_i = −ρ_i sgn p_i,  v_i(∂H/∂p_i) = −ρ_i(∂H/∂p_i) sgn p_i    (12)
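As a rough illustration (not from the paper), the dry-friction law (12) for a single mode of (8) can be exercised with an Euler-Maruyama simulation; all parameter values below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
Om, rho, sigma = 1.0, 1.0, 1.0      # illustrative modal frequency, bound, noise intensity
dt, n_steps, n_paths = 0.01, 500, 400

def mean_energy(controlled):
    """Monte Carlo estimate of E[W(T)] per (5), with or without dry-friction control."""
    y = np.full(n_paths, 1.0)
    p = np.zeros(n_paths)
    for _ in range(n_steps):
        v = -rho * np.sign(p) if controlled else 0.0   # law (12)
        dW = rng.normal(0.0, np.sqrt(dt), n_paths)
        y, p = y + p * dt, p + (-Om**2 * y + v) * dt + sigma * dW
    return np.mean(0.5 * (p**2 + Om**2 * y**2))

E_free, E_ctrl = mean_energy(False), mean_energy(True)
```

The controlled mean energy comes out well below the uncontrolled one, as the bounded force continually extracts energy at rate ρ·E|p|.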

This law is often called suboptimal for reasons outlined in the following section.

3. Solution: Optimal Control in Transformed Variables

Statement 1. The following function

    H(Y, P, τ) = (1/2) Σ_{i=1}^n {[p_i − (ρ_i z_i/Ω_i) sin Ω_i τ]² + [Ω_i y_i + (ρ_i z_i/Ω_i)(1 − cos Ω_i τ)]²}
                 + (1/2) Σ_{k=1}^n Σ_{i=1}^n ∫_0^τ σ_ik²(T − s) ds,

    z_i = sgn[(p_i sgn p_i − (ρ_i/Ω_i) sin Ω_i τ) sgn p_i],  i = 1, 2, ..., n    (13)

provides a solution to the problem (9), (10) within the domain

    D = ∪_{i=1}^n D_i;  D_i = {Y, P, τ: |p_i| > (ρ_i/Ω_i)|sin Ω_i τ|},  i = 1, 2, ..., n

(note that within D_i the definition of z_i reduces to z_i = sgn p_i).

Indeed, substituting the expression (13) for H into the PDE (9) yields the equation

    Σ_{i=1}^n (ρ_i²/Ω_i) sin Ω_i τ = Σ_{i=1}^n ρ_i [p_i z_i − |p_i − (ρ_i z_i/Ω_i) sin Ω_i τ|]    (14)


which is certainly satisfied if it holds simultaneously, term by term, for every i. Thus, the proof is reduced to the single-degree-of-freedom case (n = 1), which has been presented in [1].
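The per-term identity behind (14) can be spot-checked numerically for a single mode. The sketch below (illustrative values, our own check) samples points inside D_i, where z_i reduces to sgn p_i:

```python
import numpy as np

rng = np.random.default_rng(1)
Om, rho = 1.3, 0.7                      # illustrative modal frequency and bound

for _ in range(1000):
    tau = rng.uniform(0.0, 10.0)
    bound = (rho / Om) * abs(np.sin(Om * tau))
    # Sample p strictly outside the inner strip, i.e. inside D_i.
    p = np.sign(rng.standard_normal()) * (bound + rng.uniform(0.01, 5.0))
    z = np.sign(p)
    lhs = (rho**2 / Om) * np.sin(Om * tau)
    rhs = rho * (p * z - abs(p - (rho * z / Om) * np.sin(Om * tau)))
    assert np.isclose(lhs, rhs)         # the i-th term of (14)
```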

COROLLARY. The optimal control law within the domain D is

    v_i(τ)* = −ρ_i for (Y, P, τ) ∈ D_i⁺,  D_i⁺ = {Y, P, τ: p_i > (ρ_i/Ω_i)|sin Ω_i τ|}
    v_i(τ)* = +ρ_i for (Y, P, τ) ∈ D_i⁻,  D_i⁻ = {Y, P, τ: p_i < −(ρ_i/Ω_i)|sin Ω_i τ|}    (15)

which is clearly seen to be the same as the dry-friction law (12) for the i-th DOF. Thus, the two control laws differ only within the inner domain, and this difference diminishes as ρ_i decreases. Hence the name "suboptimal", which corresponds to the case of a "weak" control, i.e., one with a small maximal possible magnitude of the control force.

To generate the optimal control law within the whole state space one must know the solution to the problem (9), (10) within the inner domains D̄_i = {Y, P, τ: |p_i| ≤ (ρ_i/Ω_i)|sin Ω_i τ|}, i = 1, 2, ..., n. Such a solution has been found numerically in the previous work [1], using the analytical solution to obtain the necessary boundary conditions for the inner domain; in particular, the switching lines for the control force were found. Applying this hybrid solution to each pair of state variables y_i, p_i, i = 1, 2, ..., n, one can find the corresponding switching surfaces in the whole space Y, P, τ. Thus, the problem of optimal control for the system (8) is solved in terms of the control forces v_i(τ), as introduced by relations (6). Furthermore, calculations of the response energy functional H have been made in [1] both for the basic HJB equation (9) and for its counterpart for the dry-friction control case (12). These results may also be applied to each DOF to decide whether it is worthwhile to implement the perfectly optimal control law, or whether the much simpler dry-friction law is sufficient for the given application.

4. Semioptimal Control in Original Variables

The above analysis provides a complete solution to the problem of optimal control, as long as the required control forces for the transformed, or modal, coordinates (the vector V(t)) can indeed be implemented. In certain applications this may not be the case, and one should instead generate the vector U(t) of control forces applied to the original coordinates. Then one should resolve, for every point of the state space and every time instant, the set of linear algebraic equations (6) in terms of u_j(t), where the v_i(T − t) are the optimal control forces governed by relations (11). The natural question then arises: will the resulting original control forces, i.e., the components of the vector U, satisfy the original bounds (2)? Unfortunately, the answer is negative in general, the reason being the nonlinear operation of maximization, which "sneaks in" between the direct modal transformation Y = A⁻¹X and its inverse. To illustrate this effect and outline the proposed procedure for generating a reasonable control strategy, consider first the simple case of two DOFs (n = 2). Let a primary mass be attached to a rigid base via a primary elastic spring, and a secondary mass be attached to the primary one via a secondary spring. Then, using subscripts 1 and 2 respectively for the primary


and secondary masses/stiffnesses, as well as for their displacements and corresponding control forces, the matrix equation of motion (1) may be written with

    M = | m_1   0  |        K = | k_1 + k_2   −k_2 |
        |  0   m_2 |,           |   −k_2       k_2 |    (16)

The matrix A, which transforms the system to the form (4), may be written as

    A = | −m_1^(−1/2) sin α   m_1^(−1/2) cos α |
        |  m_2^(−1/2) cos α   m_2^(−1/2) sin α |,
    α = tan⁻¹[(k_1 + √(4k_2² + k_1²))/k_2]    (17)

This results in the transformed equations (8) with

    Ω_i² = λ_i/m_i,  i = 1, 2,   λ_{1,2} = (1/2)(2k_2 + k_1 ± √(4k_2² + k_1²))    (18)

These explicit expressions for the elements of the matrix A in terms of the original masses and stiffnesses should be used in the equations (6) and (7) for the transformed control forces and their bounds, respectively. It can be seen that only four pairs of optimal values of the transformed control forces are possible, namely v_1* = ±ρ_1, v_2* = ±ρ_2, with different combinations of positive and negative signs within various domains of the state space and time (the optimal values are denoted here by star superscripts, which will also be used for the optimized original control forces). Resolving relations (6) in terms of the original control forces for each of the above combinations of signs (with coefficients as defined by expressions (17)) yields

Cases A, D: v_1* = ±ρ_1, v_2* = ±ρ_2 ⇒
    u_1* = ±R_1(|cos α| cos α − |sin α| sin α) ± R_2(|sin α| cos α − |cos α| sin α),
    u_2* = ±R_1(|sin α| cos α + |cos α| sin α) ± R_2(|cos α| cos α + |sin α| sin α)

Case B: v_1* = −ρ_1, v_2* = ρ_2 ⇒
    u_1* = R_1(|sin α| sin α + |cos α| cos α) + R_2(|cos α| sin α + |sin α| cos α),
    u_2* = R_1(|cos α| sin α − |sin α| cos α) + R_2(|sin α| sin α − |cos α| cos α)

Case C: v_1* = ρ_1, v_2* = −ρ_2 ⇒
    u_1* = −R_1(|sin α| sin α + |cos α| cos α) − R_2(|cos α| sin α + |sin α| cos α),
    u_2* = R_1(|sin α| cos α − |cos α| sin α) + R_2(|cos α| cos α − |sin α| sin α)    (18)

Case C: v1∗ = ρ1 , v2∗ = −ρ2 ⇒. u ∗1 = −R1 (| sin α| sin α + | cos α| cos α) − R2 (| cos α| sin α + | sin α| cos α) u ∗2 = R1 (| sin α| cos α − | cos α| sin α) + R2 (| cos α| cos α − | sin α| sin α) The resulting four pairs of values u ∗1 , u ∗2 define four points on the u 1 , u 2 plane. We may denote them as A, B, C and D, using the same notation as for the corresponding cases as defined by the expressions (18) (with the upper (plus) sign being used for the case A in the double-sign expression, and lower (minus)—for the case D).


Figure 1a.

Statement 2. The quadrangle ABDC is a rectangle circumscribed around the rectangle |u_1| ≤ R_1, |u_2| ≤ R_2 (see Figure 1a).

To prove that, say, the vectors AB and CD are collinear, one can directly compare their angles by calculating, from expressions (18), the relevant differences in coordinates, first between points A and B and then between points C and D. Similarly, it can be shown that, say, the vectors AB and BD are orthogonal, whereas AC and BD are collinear.

Consider now, for definiteness, the case 0 ≤ α ≤ π/2. The equation of the straight line AB is then found to be

    (u_1 − R_1 cos 2α)/[R_1(1 − cos 2α) + R_2 sin 2α] = (u_2 − R_1 sin 2α − R_2)/[−R_2(1 + cos 2α) − R_1 sin 2α]

and it is clearly satisfied by the pair u_1 = R_1, u_2 = R_2. It can be shown similarly that (R_1, −R_2) ∈ BD, (−R_1, −R_2) ∈ CD, (−R_1, R_2) ∈ CA. Figure 1b illustrates the positions of the rectangle ABDC for several values of α and R_1 = 2, R_2 = 1.
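Statement 2 can be spot-checked numerically. The corner coordinates below follow from evaluating the case expressions (18) for 0 < α < π/2, where the absolute values drop; the specific numbers are illustrative:

```python
import numpy as np

alpha, R1, R2 = np.deg2rad(15.0), 2.0, 1.0
s2, c2 = np.sin(2 * alpha), np.cos(2 * alpha)

# Corners from the Cases A-D expressions (18), simplified for 0 < alpha < pi/2.
A_pt = np.array([R1 * c2, R1 * s2 + R2])        # case A (upper signs)
B_pt = np.array([R1 + R2 * s2, -R2 * c2])       # case B
C_pt = -B_pt                                    # case C
D_pt = -A_pt                                    # case D (lower signs)

# Rectangle check: adjacent sides AB, BD orthogonal; opposite sides AB, CD equal.
AB, BD = B_pt - A_pt, D_pt - B_pt
assert np.isclose(AB @ BD, 0.0)
assert np.allclose(B_pt - A_pt, D_pt - C_pt)

# Circumscribed check: the corner (R1, R2) lies on the line through A and B.
w = np.array([R1, R2]) - A_pt
assert np.isclose(AB[0] * w[1] - AB[1] * w[0], 0.0)
```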


Figure 1b.

This example clearly shows that any pair of possible values of the optimal transformed control forces v_1*, v_2* corresponds to a pair of original control forces u_1*, u_2* such that one of the conditions (2) is always satisfied whereas the other one is always violated. For example, 0 < u_1 < R_1 = 2, u_2 > R_2 = 1 for the highest corner point of the dashed rectangle in Figure 1b (α = 15°). A reasonable way to handle this problem is to project the vertices of the rectangle ABDC onto the nearest sides of the rectangle |u_1| ≤ R_1, |u_2| ≤ R_2. These projections are denoted in Figure 1a by the same letters with primes. The resulting control laws may be called "semioptimal" in the literal sense: one of the forces is kept at its optimal value, whereas the other is reduced in magnitude to comply with the relevant bound (2), and this reduction is made along the shortest route in the plane of control forces.


This simple example illustrates the approach suggested for a general MDOF system with arbitrary n. Let u_j*, j = 1, 2, ..., n, be the solution to the equation set (6) for the corresponding domain of Y, P, τ. Denote by J⁺ and J⁻ the lists of those indices j for which u_j* > R_j and u_j* < −R_j, respectively. The "semioptimal" control law suggested for this domain (denoted by overbars) is then

    ū_j* = +R_j for j ∈ J⁺;  ū_j* = −R_j for j ∈ J⁻;  ū_j* = u_j* for j ∉ J⁺ ∪ J⁻    (19)
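In code, the law (19) is a componentwise saturation of u* to the box defined by (2); a minimal sketch with made-up numbers:

```python
import numpy as np

R = np.array([2.0, 1.0, 1.5])          # bounds R_j from (2)
u_star = np.array([2.7, -0.4, -1.9])   # hypothetical solution of (6) in some domain

# (19): saturate only the violated components, keep the rest at their optimal values.
u_semi = np.clip(u_star, -R, R)
```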

5. Control with a Single Actuator

We shall now waive the original assumption that the number of available control forces is exactly n, i.e., the same as the number of DOFs. If the former is smaller than the latter, the system (1) may not be completely controllable [7]. However, this property may not be necessary for some applications. In this section we consider the case where only a single control force is available. The control vector then has only one nonzero component, i.e., Uᵀ(t) = (0, ..., u_k(t), ..., 0), and correspondingly there is only a single constraint (2): |u_k(t)| = |u(t)| ≤ R. The transformed control forces are now defined as v_i(t) = a_ki u(t), i = 1, 2, ..., n. Being dependent on the single original control force u(t), they are certainly not independent. Furthermore, they satisfy the obvious n constraints |v_i(t)| ≤ R|a_ki|. Denoting |a_ki| = a_i, i = 1, 2, ..., n, we may write the HJB equation as

    ∂H/∂τ = Σ_{i=1}^n [p_i(∂H/∂y_i) − Ω_i² y_i(∂H/∂p_i)] − R|Σ_{i=1}^n a_i(∂H/∂p_i)|
            + (1/2) Σ_{k=1}^n Σ_{i=1}^n Σ_{j=1}^n σ_ik σ_jk ∂²H/(∂p_i ∂p_j)    (20)

Statement 3. The following function

    H(Y, P, τ) = (1/2) Σ_{i=1}^n {[p_i − z(Ra_i/Ω_i) sin Ω_i τ]² + [Ω_i y_i + z(Ra_i/Ω_i)(1 − cos Ω_i τ)]²}
                 + (1/2) Σ_{k=1}^n Σ_{j=1}^n ∫_0^τ σ_jk²(T − s) ds,   z = sgn(Σ_{i=1}^n a_i p_i)    (21)

provides a solution to the initial-value problem (20) within the domain

    |Σ_{i=1}^n a_i p_i| > R √n (Σ_{i=1}^n a_i⁴/Ω_i²)^(1/2) = R_0    (22)

Proof: After substituting the expression (21), the equation (20) is reduced to

    Σ_{i=1}^n (a_i² R²/Ω_i) sin Ω_i τ = Rz Σ_{i=1}^n a_i p_i − R|Σ_{i=1}^n a_i [p_i − z(Ra_i/Ω_i) sin Ω_i τ]|    (23)
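The switching quantity z and the radius R_0 of (21)-(22) are straightforward to evaluate; a sketch with illustrative numbers:

```python
import numpy as np

a = np.array([0.5, 0.8])     # a_i = |a_ki|, entries of the k-th row of A
Om = np.array([1.0, 2.0])    # modal frequencies Omega_i
R = 1.5                      # bound on the single control force
p = np.array([2.0, -0.5])    # modal momenta

n = a.size
R0 = R * np.sqrt(n) * np.sqrt(np.sum(a**4 / Om**2))   # right-hand side of (22)
s = a @ p                    # switching quantity; z = sgn(s) as in (21)
in_outer_domain = abs(s) > R0                         # membership in the domain (22)
```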


Assume first that the argument of the quantity z is positive, so that z = +1. Then the absolute value sign can be dropped on the LHS of the inequality (22). On the other hand, in view of the Cauchy inequality,

    Σ_{i=1}^n (a_i²/Ω_i) sin Ω_i τ ≤ (Σ_{i=1}^n a_i⁴/Ω_i²)^(1/2) (Σ_{i=1}^n sin² Ω_i τ)^(1/2) ≤ √n (Σ_{i=1}^n a_i⁴/Ω_i²)^(1/2)

and the equation (23) is satisfied, as long as the expression under the absolute value sign in its RHS is positive. It can be shown similarly that the latter expression is negative if z = −1, and thus the equation (23) is satisfied once again.

The inequality (22) defines in R^(2n) a domain bounded by the hyperplanes Σ_{i=1}^n a_i p_i = ±R_0. It can be illustrated in a simple manner for the two-DOF example of Section 4. In this case Uᵀ = (0, u(t)), |u(t)| ≤ R and |v_1| ≤ R m_1^(−1/2)|cos α|, |v_2| ≤ R m_2^(−1/2)|sin α|. The analytical solution (21) to the HJB equation (20), with a_1 = m_1^(−1/2)|cos α|, a_2 = m_2^(−1/2)|sin α|, is valid within the domain

    |p_1 |cos α|/√m_1 + p_2 |sin α|/√m_2| > R √2 [cos⁴α/(m_1 λ_1) + sin⁴α/(m_2 λ_2)]^(1/2) = R_0

where formula (18) has been used. The last inequality defines the exterior of a strip of width 2R_0, inclined at the angle β = tan⁻¹(√(m_2/m_1)|cot α|) to the axis p_2. The inclination angle is small if m_2 ≪ m_1.

Acknowledgments

This research was sponsored by the NSF, Grant CMS-9610363. This support is most highly appreciated.

References

1. Bratus', A., Dimentberg, M., and Iourtchenko, D., "Optimal bounded response control for a second-order system under a white-noise excitation," Journal of Vibration and Control, to appear.
2. Boyd, S. and Barratt, C., Linear Controller Design: Limits of Performance, Prentice Hall, 1991.
3. Bensoussan, A., Perturbation Methods in Optimal Control, John Wiley, 1988.
4. Fleming, W. and Rishel, R., Deterministic and Stochastic Optimal Control, Springer-Verlag, 1975.
5. Friedman, A., Stochastic Differential Equations and Applications, Academic Press, 1975.
6. Strang, G., Linear Algebra and Its Applications, Academic Press, 1976.
7. Bryson, A. and Ho, Y.-C., Applied Optimal Control: Optimization, Estimation and Control, John Wiley, 1975.