NEURAL NETWORKS APPLIED TO OPTIMAL FLIGHT CONTROL

Tomas McKelvey
Department of Electrical Engineering, Linköping University, S-581 83 Linköping, Sweden
Phone: +46 13 282461, E-mail: [email protected]

Abstract.

This paper presents a method for developing control laws for nonlinear systems based on an optimal control formulation. Due to the nonlinearities of the system, no analytical solution exists. The method proposed here uses the 'black box' structure of a neural network to model a feedback control law. The network is trained with the back-propagation learning method, using examples of optimal control produced with a differential dynamic programming technique. Two different optimal control problems from flight control are studied. The produced control laws are simulated and the results analyzed. Neural networks show promise for application to optimal control problems with nonlinear systems.

Keywords.

neural nets, optimal control, flight control

1 Introduction

The use of neural networks (NN) for different applications in control has recently emerged; see (Special issue on neural networks in control systems, 1990). The most interesting feature of a neural network is its capability to build an internal model of a nonlinear function, given input/output samples of the unknown function. The NN can be seen as a multivariable nonlinear function, parametrized with a set of parameters θ. Different parameters θ give different functions. The key idea with NN is to fit the NN function to a set of known correct data by adjusting the parameters θ. In the next section this is further explained.

In control of a fighter aircraft (AC) a wide range of optimal control problems arise. An interesting class of problems is control of the AC over a relatively long time period, where the optimal criteria are given by different tactical goals. This type of control has typically been performed by the pilot of the aircraft. The state equations describing the dynamics of the AC are nonlinear. A commonly used method to deal with nonlinear systems is to linearize the state equations around a stable state. The optimal control problems studied in this paper, however, usually involve a wide time horizon of 50-200 seconds, during which the state of the AC changes significantly; using only a linear model would thus severely degrade the solution. The obvious solution is instead to use the nonlinear model in the optimization. A drawback with a nonlinear model, however, is the problem of finding an analytical feedback control. Unless the problem is very simple, only numerical solutions exist. The solutions obtained are also of an open-loop type, i.e. the solution depends only on time and the initial state. When applied in reality this type of open-loop control is sensitive to disturbances; the goal is instead to find a state-feedback control, using the current state to produce the control signal, which would give a closed-loop control system.

The approach used in this paper to find a feedback controller is also used in (McKelvey, 1991). The method proposed uses a neural network to model the unknown feedback control. The network is trained with examples of optimal control derived with a numerical method.

(This work was partly sponsored by SAAB-SCANIA AB, Saab Aircraft Division, Linköping, Sweden.)

2 Neural Network as a Multipurpose Nonlinear Function


Figure 1: Feed-forward neural network architecture.

A Neural Network is composed of many similar units called neurons. These units are usually organized in layers forming an input layer, one or more hidden layers and an output layer. The input layer is not a real layer in the sense that it does not include any parameters. Each unit in a layer is fully connected to all units in the previous layer, forming a set of cascaded layers (see Fig. 1). Each input to a unit is scaled with a weight. The weighted inputs are added together with a threshold. The output of a unit is a nonlinear transformation of the sum. This type of network architecture is called a feed-forward neural network.

Consider a NN with n inputs, two hidden layers and m outputs. The NN can then be seen as a function g: R^n × R^p → R^m. The output Y ∈ R^m of the network is then

Y = g(X; θ)   (1)

where X ∈ R^n is the input vector and θ ∈ R^p is the parameter vector. Let us define

σ(V) = [tanh v_1, ..., tanh v_n]^T   (2)

where V is a vector with n components and tanh is the scalar hyperbolic tangent function:

tanh z = (e^z - e^{-z}) / (e^z + e^{-z})   (3)

Using (2) the NN can be described with the simple expression

Y = g(X; θ) = W_3 σ(W_2 σ(W_1 X + α_1) + α_2) + α_3   (4)

where the matrices W_i contain the connecting weights between the units and the vectors α_i contain the thresholds. Together they constitute the parameter vector θ. The dimensions of the matrices are defined by the number of inputs, the number of outputs and the number of units in the two hidden layers.

Theoretical results show that a NN with one hidden layer can approximate any continuous function on a compact set arbitrarily well, given an unlimited number of hidden units (Cybenko, 1989). Empirical work has indicated that a NN with two hidden layers gives better performance than a NN with one hidden layer using the same number of units. The choice of network complexity, i.e. the number of units in the hidden layers, depends on the amount of nonlinearity in the problem. Experimentation determined the number of hidden units in the networks used in this paper.

When the structure of the network is given, i.e. the number of units and layers, the network parameters θ have to be adjusted so as to give the best fit of the function g(X; θ) to the given data Z. This is referred to as training or learning in the NN literature. The data Z contains N samples of the unknown function f, given as input vectors X_i and output vectors Y_i, the index i indicating the different samples. This gives the following relation between the unknown function f and the data vectors:

Y_i = f(X_i),  i = 1, ..., N

The output of the network depends on both the input vector X and the parameters θ. The network error ε for a given sample of Z and parameters θ can be written

ε(θ; Z_i) = Y_i - g(X_i; θ)   (5)

where

Z_i = [Y_i^T, X_i^T]^T   (6)

Z = [Z_1, Z_2, ..., Z_N]^T   (7)

To measure the network performance, a quadratic loss-function V(θ; Z) is introduced:

V(θ; Z) = Σ_{i=1}^{N} v(θ; Z_i)   (8)

v(θ; Z_i) = ε(θ; Z_i)^T ε(θ; Z_i)   (9)

The loss-function V(θ; Z) is minimized using the well-known back-propagation learning method (Rumelhart and McClelland, 1986). In back-propagation learning the parameters are changed in the direction of the negative gradient of the quadratic error v(θ; Z_i) for each sample in the data Z as follows:

θ_{i+1} = θ_i - η_i ∇v^T(θ_i; Z_k)   (10)

where

k = 1 + (i mod N)   (11)

The step size η_i is gradually decreased during the learning to obtain a stable minimum. The learning process continues until the loss-function V(θ; Z) reaches a minimum. The initial vector θ_0 is initialized randomly with values in a limited range.
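The two-hidden-layer network (4) and the sample-wise gradient update (10)-(11) can be sketched in NumPy as follows. The toy target function, the initial-weight range and the step-size schedule are illustrative assumptions, not values from the paper; only the 10 + 5 hidden-unit layout matches the networks used later in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def init(n_in, n_h1, n_h2, n_out):
    # Random initial parameter vector theta_0 in a limited range.
    u = lambda *s: rng.uniform(-0.5, 0.5, s)
    return {"W1": u(n_h1, n_in), "a1": u(n_h1),
            "W2": u(n_h2, n_h1), "a2": u(n_h2),
            "W3": u(n_out, n_h2), "a3": u(n_out)}

def forward(p, x):
    # Eq. (4): Y = W3 sigma(W2 sigma(W1 X + a1) + a2) + a3
    h1 = np.tanh(p["W1"] @ x + p["a1"])
    h2 = np.tanh(p["W2"] @ h1 + p["a2"])
    return p["W3"] @ h2 + p["a3"], h1, h2

def sgd_step(p, x, t, eta):
    # One back-propagation update, eq. (10), on v = eps^T eps.
    y, h1, h2 = forward(p, x)
    eps = t - y                           # eq. (5)
    d3 = -2.0 * eps                       # grad of v w.r.t. the output y
    d2 = (p["W3"].T @ d3) * (1 - h2**2)   # tanh' = 1 - tanh^2
    d1 = (p["W2"].T @ d2) * (1 - h1**2)
    p["W3"] -= eta * np.outer(d3, h2); p["a3"] -= eta * d3
    p["W2"] -= eta * np.outer(d2, h1); p["a2"] -= eta * d2
    p["W1"] -= eta * np.outer(d1, x);  p["a1"] -= eta * d1

def loss(p, X, T):
    # Quadratic loss-function V(theta; Z), eqs. (8)-(9).
    V = 0.0
    for x, t in zip(X, T):
        e = t - forward(p, x)[0]
        V += float(e @ e)
    return V

# Toy data Z: samples Y_i = f(X_i) of a smooth nonlinear function.
X = [np.array([x]) for x in np.linspace(-1.0, 1.0, 20)]
T = [np.array([x[0] ** 2]) for x in X]

p = init(1, 10, 5, 1)          # 10 and 5 hidden units, as in the paper
V0 = loss(p, X, T)
N = len(X)
for i in range(20000):
    k = i % N                          # cyclic sample order, eq. (11)
    eta = 0.05 / (1.0 + i / 2000.0)    # step size gradually decreased
    sgd_step(p, X[k], T[k], eta)
V1 = loss(p, X, T)
```

The update follows each sample's error gradient rather than the full batch gradient of (8), exactly as in the cyclic rule (10)-(11).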

3 Aircraft Model

The aircraft model is based on a modern fighter aircraft. Using mass point dynamics with the states velocity v, climb angle γ, horizontal position x, altitude z and the aircraft's mass m, the state equations are written as:

dv/dt = (Thr(z, M) - D(z, M, n))/m - g_0 sin γ   (12)

dγ/dt = (g_0/v)(n - cos γ)   (13)

dx/dt = v cos γ   (14)

dz/dt = v sin γ   (15)

dm/dt = -Sfc(z, M) Thr(z, M)   (16)

with the constraint

|n| ≤ n_max   (17)

where M is the Mach number, Thr is the thrust produced by the engine, D is the drag composed of induced drag and zero-lift drag, g_0 is the acceleration due to gravity, Sfc is the specific fuel consumption and n is the load factor on the wing, which is used as the independent control variable. Thr, D and Sfc are functions based on a realistic model of a generic fighter aircraft. The model can be summarized in the following standard form

dx/dt = f(x, u, t),  x(t_0) = x_0   (18)

where x is the state vector, u is the control vector and t ∈ [t_0, t_f] is time.

Figure 2: Simulated trajectories in the v-z plane from Example 1. (Optimal trajectory: dotted line, neural network: solid line)
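A forward-Euler rollout of the point-mass dynamics (12)-(16) can be sketched as below. The paper's Thr, D and Sfc come from tabulated data for a generic fighter aircraft that is not reproduced here, so the exponential-atmosphere thrust model, the drag polar, the constant Sfc, the wing area, the aircraft mass and the constant speed of sound are all placeholder assumptions.

```python
import math

G0 = 9.81          # m/s^2
A_SOUND = 300.0    # m/s, crude constant speed of sound (assumption)

# Placeholder engine/aerodynamic models -- illustrative only.
def thrust(z, M):            # Thr(z, M), decaying with altitude
    return 120e3 * math.exp(-z / 9000.0)

def drag(z, M, n, v, m):     # D(z, M, n): zero-lift + induced drag
    q = 0.5 * 1.225 * math.exp(-z / 9000.0) * v * v   # dynamic pressure
    S = 30.0                 # wing area, m^2 (assumption)
    cd0, k = 0.02, 0.1
    return q * S * cd0 + k * (n * m * G0) ** 2 / (q * S)

def sfc(z, M):               # Sfc(z, M), kg of fuel per N*s (assumption)
    return 2e-5

def step(state, n, dt):
    """One forward-Euler step of the state equations (12)-(16)."""
    v, gamma, x, z, m = state
    M = v / A_SOUND
    dv = (thrust(z, M) - drag(z, M, n, v, m)) / m - G0 * math.sin(gamma)
    dgamma = (G0 / v) * (n - math.cos(gamma))
    dx = v * math.cos(gamma)
    dz = v * math.sin(gamma)
    dm = -sfc(z, M) * thrust(z, M)
    return (v + dt * dv, gamma + dt * dgamma, x + dt * dx,
            z + dt * dz, m + dt * dm)

# Level start at z = 1.5 km, v = 240 m/s, like Example 1's first case.
state = (240.0, 0.0, 0.0, 1500.0, 12000.0)
for _ in range(100):          # 10 s of flight with load factor n = 1.2
    state = step(state, 1.2, 0.1)
```

With n > cos γ the climb angle grows, so a short rollout should climb, accelerate and burn fuel; a fixed-step Euler scheme is enough for a sketch, though the DDP solver in the paper would use a more careful integration.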

4 Optimal Control Problems

Figure 3: Load factor. (Optimal control: dotted line, neural network: solid line)

The problem formulations used in this section are the same as in (Jarmark, 1991). In the two problems studied, the loss-function and the stop criterion can be described as:

V = F(x(t_f), t_f)   (19)

t_f = arg_t (ψ(x(t), t) = 0)   (20)

where the function V is to be minimized by choosing the optimal control u*(t), t ∈ [t_0, t_f]. The problem can be solved numerically with a differential dynamic programming technique (DDP) (Jarmark, 1991). The solution obtained is given in an open-loop format, i.e., the control at any time t depends only on the initial state x_0 and the time t.

To obtain a feedback control solving the optimal problem, a neural network is used to approximate the unknown feedback control function. The data Z used to train the neural network is composed of a wide range of correct solutions covering the state space of the problem. Using the DDP-technique, optimal trajectories can be obtained from different initial positions. Putting together the state vector x along all the optimal trajectories with the control variable n forms our training set Z. The network structure is given by the number of states, used as inputs; the desired control as the network's output; and an appropriate number of units in the hidden layers.

4.1 Example 1: Optimal Energy Climb

The first optimal control example is the classical energy climb (in flight mechanics). The problem can be stated as:

V = -E_f   (21)

t_f = given   (22)

E_f = z(t_f) + v(t_f)^2 / (2 g_0)   (23)

where E_f is the scaled sum of the aircraft's potential and kinetic energy at the final time t_f. The task of the feedback controller is to produce the optimal control given the time to go, t_f - t, and the state variables velocity v, altitude z and climb angle γ. The horizontal position is not included since it is not coupled in the state equations and is not part of the optimization. The mass m of the AC is also neglected, since the weight change is less than 5% during the optimization time. The final time t_f is allowed to be in the range from 20 to 200 seconds to make the problem more realistic. This gives us the outer architecture for the neural network, with 4 inputs and one output, the load factor n. Empirically, 10 units and 5 units were included in the first and second hidden layer, respectively.

To produce the learning set Z, the DDP-method in (Jarmark, 1991) was used to generate optimal trajectories with different initial states x(t_0) and final times t_f. The initial states were chosen in order to distribute the optimal trajectories over the entire state space. The final times were likewise distributed over the time range t_f ∈ [20, 200] seconds. The learning data Z was composed of 40 different trajectories, giving a total of 760 input/output pairs. The learning with the back-propagation algorithm converged to a minimum using a total of 50000 iterations.

Figure 2 shows the phase plane from a simulation in which the neural network is used as a feedback controller. The trajectory produced by the neural controller (solid) is shown along with the optimal trajectory (dotted). Figure 3 shows the corresponding load factor n. The initial position was z(t_0) = 1.5 km, v(t_0) = 240 m/s, γ(t_0) = 0 and the final time t_f = 190 seconds. The optimal energy is E^o = 18.97 km and the energy obtained with the neural network controller is 18.96 km, a difference of only ΔE = -0.01 km.
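As a small worked check of the energy measure (23): the initial state of the simulation above (z = 1.5 km, v = 240 m/s) has a scaled energy of about 4.44 km, which the optimal climb raises to E^o = 18.97 km by the final time.

```python
G0 = 9.81  # m/s^2

def scaled_energy(z_m, v_ms):
    """E = z + v^2 / (2 g0), eq. (23), in metres."""
    return z_m + v_ms ** 2 / (2.0 * G0)

# Initial state of the first simulation: z = 1.5 km, v = 240 m/s.
E0 = scaled_energy(1500.0, 240.0)
print(round(E0 / 1000.0, 2))   # initial scaled energy in km -> 4.44
```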
Similarly excellent results were obtained using other final times and different initial states x_0; the results from a set of simulations are shown in Table 1. The initial states used in the simulations are different from the states used to produce the optimal trajectories in the training data Z.
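The construction of the training set Z described above (state vectors paired with the optimal load factor along each DDP trajectory) can be sketched as follows; the trajectory container layout and all names here are hypothetical, not from the paper.

```python
import numpy as np

# Each optimal trajectory from the DDP solver is assumed to be a list of
# (state_vector, optimal_load_factor) samples along the path.
def build_training_set(trajectories):
    """Stack states X_i and controls Y_i = n_i from all trajectories
    into the arrays used to fit the feedback law."""
    X, Y = [], []
    for traj in trajectories:
        for state, n in traj:
            X.append(np.asarray(state, dtype=float))
            Y.append(np.array([n], dtype=float))
    return np.stack(X), np.stack(Y)

# Two toy 4-state trajectories (time-to-go, v, z, gamma) with controls.
trajs = [
    [((190.0, 240.0, 1.5, 0.0), 2.1), ((180.0, 250.0, 1.8, 0.05), 1.9)],
    [((150.0, 110.0, 2.0, 0.0), 2.5)],
]
X, Y = build_training_set(trajs)
print(X.shape, Y.shape)   # -> (3, 4) (3, 1)
```

The real learning set in the paper was assembled this way from 40 trajectories, giving 760 such input/output pairs.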

z(t_0) km   v(t_0) m/s   t_f s   E^o km   ΔE km
1.5         240          190     18.97    -0.01
2           110          150     16.61    -0.01
3           300           70     14.32    -0.03
4           180          110     15.70    -0.04
5           250          140     17.82    -0.03
6           200           20     10.08    -0.00
7           200          150     18.26    -0.04
8           250          200     20.57    -0.02
10          225          120     18.41    -0.02
10          300           50     16.94    -0.06

Table 1: Simulation results from Example 1 using initial states not included in the learning set Z

4.2 Example 2: Reach a Launch Envelope in Minimum Time

This problem formulation models a mission for a fighter aircraft. The mission is to reach an envelope and launch a missile towards an approaching target. The target is assumed to fly straight at a constant velocity and altitude. The function to minimize is simply t_f, which gives the following:

V = t_f   (24)

t_f = arg_t (ψ(x(t), z(t), t) = 0)   (25)

ψ(x(t), z(t), t) = k_1 (x_t(t_0) - v_t t - x(t))^2 - z(t) + k_2   (26)

x_t(t_0) - v_t t - x(t) > 0   (27)

where x is the horizontal position, z is the altitude, and x_t(t_0) and v_t are the target's initial position and velocity, respectively. k_1 and k_2 are constants, and ψ models the launch envelope, i.e. a curve in the x-z plane. The launch envelope thus moves together with the target towards the AC. Here the task of the control law is to produce the time-optimal control given the state of the aircraft and the distance to the target. The inputs to the feedback controller are: the distance between the aggressor and the target x_t(t) - x(t), the altitude z(t), the climb angle γ(t) and the velocity v(t). The output is again the load factor n. The neural network is thus composed of 4 input units and one output unit. The number of hidden units chosen, and the method used to obtain the learning set Z and to find the best parameter vector θ, were the same as in the previous problem. The results from a simulation using the neural network as a feedback controller are shown in Figs. 4-6. In Fig. 6 the phase plot of z and x is shown together with the launch envelope. The tick marks indicate the AC's position every 10 seconds. The initial state used is not part of the learning set Z. The performance is quite good: the aircraft controlled by the neural network reaches the envelope only 0.3 seconds later than the optimally controlled aircraft. Simulations with other initial states and target positions give similar results.
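The controller's input vector for this example can be formed directly from the quantities listed above; a trivial sketch, in which the function and argument names are illustrative, not from the paper:

```python
# Hypothetical helper: assemble the controller input for Example 2 from
# the current aircraft state and the target's horizontal position.
def controller_input(x, z, gamma, v, x_target):
    # Inputs listed in the paper: distance to target, altitude,
    # climb angle and velocity.
    return [x_target - x, z, gamma, v]

# AC at x = 5 km, z = 1.5 km, level flight at 240 m/s; target at 40 km.
features = controller_input(x=5.0, z=1.5, gamma=0.0, v=240.0, x_target=40.0)
print(features)   # -> [35.0, 1.5, 0.0, 240.0]
```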

Figure 4: Simulated trajectories in the v-z plane, Example 2. (Optimal trajectory: dotted line, neural network: solid line)

Figure 6: Simulated trajectories in the x-z plane. (Optimal trajectory: dotted line, neural network: solid line, launch envelope: dashed line)

Figure 5: Load factor. (Optimal control: dotted line, neural network: solid line)

5 Conclusions

In this paper neural networks have been used to develop nonlinear control laws solving optimal control problems. Due to the nonlinear properties of the problems, no analytical solution exists. The method described uses a differential dynamic programming technique to obtain a set of optimal trajectories, which are used in the learning of the network. Two problems with different optimization criteria were examined. Simulations using the neural network as the feedback controller for the two problems give evidence that the proposed method is one way of solving this class of problems. Only a small network, with 15 neurons partitioned over the two hidden layers, 10 in the first and 5 in the second, was needed to give good results for both problems.

6 Acknowledgements

Part of this work was completed at SAAB-SCANIA AB, Saab Aircraft Division, Linköping, Sweden, as a master's thesis at the Department of Automatic Control, Lund, Sweden; see (McKelvey, 1991). I would also like to thank my supervisors Bernt Jarmark and Soren Wickstrom, both active at SAAB, and Prof. K.J. Åström, Lund, for their ideas and support.

7 References

Cybenko, G. (1989). "Approximation by Superpositions of a Sigmoidal Function". Mathematics of Control, Signals, and Systems, 2:303-314.

Jarmark, B. (1991). Various optimal climb profiles. In AFM Conference, New Orleans, pages 124-130.

McKelvey, T. (1991). Neural networks applied to optimal flight trajectories. Master's thesis, Lund Institute of Technology, Sweden.

Rumelhart, D. and McClelland, J. (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MIT Press, Cambridge, MA.

Special issue on neural networks in control systems (1990). IEEE Control Systems Magazine, 10(3).