Proceedings of the 40th IEEE Conference on Decision and Control, Orlando, Florida USA, December 2001

ThM02-3

Modelling of Nonlinear Systems from Input-Output Data for State Space Realization

D. C. Foley and N. Sadegh
Georgia Institute of Technology
George W. Woodruff School of Mechanical Engineering
Atlanta, GA 30332

Abstract

In this paper, we examine data driven modelling procedures for creating a discrete-time input-output map that can be transformed into an observable state space form. We first present previous results of a model form that guarantees the existence of an observable state space realization, as well as the state equations that can be implemented using that form. We then examine the feasibility of NARMA models, feedforward neural networks, and nodal link perceptron networks with local basis functions in creating the model. Simulation results are shown for these model types, as well as a linear model for comparison.

1 Introduction

Many real world nonlinear systems are difficult to characterize by first principle modelling. Analytically, the nonlinearities are often too complex to quantify, and thus these systems are frequently modelled as input-output (i/o) maps of sampled data from the system [2, 4]. This can be done in a variety of ways, some of the more popular being Nonlinear Autoregressive Moving Average (NARMA) models and feedforward neural networks. But no matter how the model is obtained, if it was not derived from the physical laws of the system it is most likely not realizable in state space form, and therefore more difficult to analyze and control.

There has been much work done on the subject of creating state space realizations from input-output models, for both continuous [19, 20] and discrete [5, 7, 10, 14, 18, 17, 8] systems. In the discrete domain, Leontaritis and Billings [10] derive a recursive i/o map similar to the one we will examine here, via Nerode realization. References [14, 15] examine necessary and sufficient conditions for a Volterra series, and Jakubczyk [5] studied the existence of analytic reversible realizations. Of particular interest here are the papers by Sadegh [17], Kotta et al. [7] and their joint work [8]. These papers present the necessary and sufficient conditions for an observable state space realization for a general class of nonlinear input-output maps, and [17, 8] present explicit algorithms to perform the transformation, when possible. Specifically, we will examine here the use of the method of paper [8].

In light of these tools, it is desirable to have a variety of modelling options available that satisfy the necessary and sufficient conditions for transformation. This paper will discuss the single-input-single-output (SISO) modelling issues associated with satisfying these conditions and present some new techniques for creating a viable model. Two works similar to what will be presented here, [3] and [6], introduce a neural network based on a similar class of nonlinear input-output maps, with the second presenting training algorithms and simulation results. In this paper, we will first introduce a class of models that satisfy the realizability conditions and show the resultant state equations. We will then present four modelling options that can accommodate the conditions (NARMA including linear modelling, feedforward neural networks, and nodal link perceptron networks with local basis functions) and examine the usability of each. Simulation results will be shown for a theoretical mass-spring-damper system.

2 A Class of Observable State Space Realizable Models

Consider an input-output model of the system created from sampled data to be described by:


y[k] = g(y[k-m], ..., y[k-1], u[k-m], ..., u[k-1])    (1)

where u[k] ∈ R and y[k] ∈ R represent the process input and output, respectively, and g is a smooth function. We should also note the assumption of [8] that there exists a constant equilibrium point (ū, ȳ) such that ȳ = g(ȳ, ..., ȳ, ū, ..., ū), and without loss of generality we will assume ū = ȳ = 0. Now consider a state space model of the form

x[k+1] = f(x[k], u[k])
y[k] = h(x[k])    (2)

where x[k] ∈ R^n is the state vector and f(·,·) and h(·) are smooth (C∞) functions. The input-output model (1) has a state space realization given by (2) if g satisfies a set of necessary and sufficient conditions. To show these conditions and formulate a concise definition of the states, we introduce a block form of the i/o map. If we define blocks of inputs and outputs by

u(k) = (u(k), ..., u(k+m-1))
y(k) = (y(k), ..., y(k+m-1))    (3)

and evaluate y(k), y(k+1), ..., y(k+m-1) recursively in terms of y(k-m), u(k-m) and u(k), we obtain a block i/o map y(k) = G(y(k-m), u(k-m), u(k)) where

G = (g_1, g_2, ..., g_m)^T    (4)

with g_1(y, u, v) = g(y_1, ..., y_m, u_1, ..., u_m) and

g_i(y, u, v) = g(y_i, ..., y_m, g_1(y, u, v), ..., g_{i-1}(y, u, v), u_i, ..., u_m, v_1, ..., v_{i-1}),  i = 2, ..., m,

where y_1 = y(k-m), y_2 = y(k-m+1), etc.

It can be shown that the necessary and sufficient conditions are satisfied if ∂G/∂y (y, u, v) is nonsingular, and [∂G/∂y (y, u, v)]^{-1} G(y, u, v) is independent of the third variable v on a neighborhood of the origin; the proof is given in [17].

These conditions are always satisfied if the following form for g is upheld:

g(y_1, ..., y_m, u_1, ..., u_m) = Σ_{i=0}^{m-2} g̃_{m-i}(y_{i+1}, y_{i+2}, u_{i+1}) + g̃_1(y_m, u_m)    (5)

where the g̃_i's are smooth maps. The form is shown graphically for the m = 3 case in Figure 1. This reduced coupling form of the model still leaves a great deal of freedom in choosing model type, but the separated functions will alter standard training methods.

Figure 1: Reduced Coupling Model, m = 3.

The states for a model with this reduced coupling form can then be chosen simply as

x_1(k) = y(k)
x_2(k) = y_1(k+1)
...
x_m(k) = y_{m-1}(k+m-1)    (6)

where y_j(k+m) = Σ_{i=0}^{m-j-1} g̃_{m-i}(y(k+i), y(k+i+1), u(k+i)). It should also be noted that for the case when fewer u values than y values are used in formulating the model, the coupling restrictions are relaxed, and that case is covered in [8] as well.

The state update equations are also straightforward, found by simply incrementing the state equations:

x_1^+ = x_2 + g̃_1(x_1, u)
x_2^+ = x_3 + g̃_2(x_1, x_1^+, u)
...
x_{m-1}^+ = x_m + g̃_{m-1}(x_1, x_1^+, u)
x_m^+ = g̃_m(x_1, x_1^+, u)    (7)

where x_i^+(k) = x_i(k+1).

The state model requires no inverse maps or approximations, allowing a simple translation from input-output model to state model. Because of this simple extended form, the dynamic model error and state equation error should be within computational tolerances of each other.
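To make the translation from the fitted sub-models to the state equations concrete, the sketch below advances the realization (7) by one sample. It is a minimal illustration, not code from the paper: the function name step, the list-of-callables interface g_tilde, and the assumption m ≥ 2 are ours; g_tilde[0] plays the role of g̃_1 (two arguments) and g_tilde[i] the role of g̃_{i+1} (three arguments).

```python
def step(x, u, g_tilde):
    """Advance the state realization (7) by one sample.

    x        : current state [x_1, ..., x_m]  (m >= 2)
    u        : scalar input u(k)
    g_tilde  : list of m callables; g_tilde[0](x1, u) plays g~_1,
               g_tilde[i](x1, x1_plus, u) plays g~_{i+1} for i >= 1
    returns  : (x_next, y) where y = x_1(k) is the current output
    """
    m = len(x)
    x_next = [0.0] * m
    # x_1^+ = x_2 + g~_1(x_1, u)
    x_next[0] = x[1] + g_tilde[0](x[0], u)
    # x_j^+ = x_{j+1} + g~_j(x_1, x_1^+, u), j = 2, ..., m-1
    for j in range(1, m - 1):
        x_next[j] = x[j + 1] + g_tilde[j](x[0], x_next[0], u)
    # x_m^+ = g~_m(x_1, x_1^+, u)
    x_next[m - 1] = g_tilde[m - 1](x[0], x_next[0], u)
    return x_next, x[0]  # output map: y = x_1
```

Note how x_1^+ is computed first and then reused by the remaining updates; this is what keeps the realization causal even though each g̃_i depends on two consecutive outputs.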

3 Modelling Options

Since it is solely the couplings of the model that are restricted, nearly any nonlinear modelling technique can be applied to the sub-models g̃_i. We will discuss here three popular options: polynomial models (NARMA) including linear modelling, back propagation neural networks, and nodal link perceptron networks (NLPN) with local basis functions.

3.1 Polynomial Modelling

Polynomial models are attractive because of their simplicity in form and training. The reduced coupling restriction amounts to no more than the removal of a few terms, and they can be trained via least squares in one step. A downfall of these models is their potential for explosive error growth as the model is run dynamically, using its own feedback as the input at the next time step. The modelling procedure is discussed in detail in [10, 11].
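To be explicit about what "run dynamically" means here, the following sketch (ours, not from the paper) rolls a fitted i/o model forward using its own past predictions in place of measured outputs; model is any callable implementing g in (1), and the names are illustrative.

```python
def free_run(model, y_seed, u_seq, m):
    """Dynamic (free-run) simulation of a fitted i/o model y(k) = g(...).

    model  : callable taking (past_y, past_u), each a list of m values
             ordered oldest to newest, and returning y(k)
    y_seed : the first m measured outputs used to start the recursion
    u_seq  : full input sequence u(0), u(1), ...
    """
    y = list(y_seed)
    for k in range(m, len(u_seq)):
        y.append(model(y[k - m:k], u_seq[k - m:k]))  # feedback of predictions
    return y
```

Any error in y(k) is fed back into later predictions, which is the mechanism behind the explosive error growth mentioned above.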

A generic discrete polynomial model can be expressed as

y(k) = Σ_{j=1}^{n} Σ_{i=1}^{m} c_{1,(i,j)} y_{k-i}^j + Σ_{j=1}^{n} Σ_{i=1}^{m} c_{2,(i,j)} u_{k-i}^j + Σ_{i=1}^{m-1} c_{3,i} y_{k-i} y_{k-i+1} + ... + Σ_{i=1}^{m-1} c_{5,i} u_{k-i} u_{k-i+1} + Σ_{j=1}^{m} Σ_{i=1}^{m} c_{6,(i,j)} y_{k-i} u_{k-j} + ...    (8)

where the c's are trainable coefficients, m is still the order of our system, and the power term n and the number of cross product terms are left to the designer. The system is linear in the parameters no matter how many cross product terms are selected, so even a very high order complex model can be trained very easily. The reduced coupling restriction of equation (5) can be implemented simply by removing some of the cross product terms, such that y(k-i) is coupled only with y(k-i+1) and u(k-i).

One of the problems involved in training a model based on sampled data from the system becomes evident when dealing with complex polynomials with a high n and/or a high number of cross product terms. If the natural frequency of the system is high, and thus the sampling rate is fast, it is often true that y(k) and y(k+1) are very close together. When these terms are raised to a high power they can be numerically indistinguishable, causing singularity issues in the least squares training. If a simple model does not give adequate results, it may not be possible to increase the number of terms to improve accuracy.
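The one-step least squares fit can be written down directly. The sketch below is our own illustration rather than the paper's code: it assembles a regressor matrix for a reduced coupling polynomial of order m and solves for the coefficients with numpy, and the exact set of retained terms is a design choice, as noted above.

```python
import numpy as np

def fit_reduced_poly(y, u, m=2, n_pow=2):
    """Least squares fit of a reduced-coupling polynomial model (cf. (8))."""
    rows, targets = [], []
    for k in range(m, len(y)):
        feats = []
        # pure powers of past outputs and inputs
        for j in range(1, n_pow + 1):
            feats += [y[k - i] ** j for i in range(1, m + 1)]
            feats += [u[k - i] ** j for i in range(1, m + 1)]
        # reduced coupling: y(k-i) paired only with y(k-i+1) and u(k-i)
        for i in range(2, m + 1):
            feats.append(y[k - i] * y[k - i + 1])
            feats.append(u[k - i] * u[k - i + 1])
        for i in range(1, m + 1):
            feats.append(y[k - i] * u[k - i])
        rows.append(feats)
        targets.append(y[k])
    Phi = np.asarray(rows)
    c, *_ = np.linalg.lstsq(Phi, np.asarray(targets), rcond=None)
    return c
```

Because the model is linear in the coefficients, richer models only enlarge the regressor matrix; the numerical caveat above shows up as an ill-conditioned regressor matrix when neighbouring samples are nearly identical.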

Linear modelling can be thought of as a special case of the polynomial model, where there are no cross products and n = 1. Obviously, a linear model would alleviate the need for complex analysis tools and controllers and would always be attempted first. We will use the linear model as a basis of comparison in the simulation results.

3.2 Back Propagation Neural Network Modelling

The universal approximation capability of feedforward neural networks makes them a good option for nonlinear modelling. Of course, training can be slow and they perform poorly outside the bounds of the training set. Network size can also be an issue for complex systems. Neural network modelling for nonlinear dynamic systems is discussed in [12, 13].

Because of the saturation properties of most of the transfer functions used in feedforward neural networks, it is well known that they may perform poorly outside their training range. When a full range of data is unavailable, a polynomial model may be a better choice than a neural network. If a full range of data is available and the nonlinearity of the model is high (when a linear model is clearly insufficient), a neural network can often outperform a polynomial due to its unbounded complexity.

Training the neural network model in light of the reduced coupling restriction is more complex than in the polynomial case. The reduced coupling neural network model can be thought of as a linear combination of a set of neural networks. Figure 2 shows an m = 3 case, with one hidden layer in each sub-network. The R_i values are the number of nodes chosen to be in each hidden layer. The formulation of the model is straightforward, but unfortunately, the form is not conducive to prepackaged training algorithms, such as Matlab's neural network toolbox.

Figure 2: Reduced Coupling Neural Network Model, m = 3.

Generally, training a feedforward neural network is achieved using an iterative algorithm such as Levenberg-Marquardt. The weights and biases are updated based on a combination of Newton's method and gradient descent, with the error criterion traditionally defined by sum or mean squared error. The principle remains exactly the same for the reduced coupling model, but the training must be performed in m steps (one for each of the sub-networks) at each iteration. The update equations can be described by

W_i(k+1) = W_i(k) - (J_i^T J_i + μI)^{-1} J_i^T e    (9)

where W_i is a combination of the W's and b's (of Figure 2) of each subnetwork i, J_i is the Jacobian of the error criterion with respect to the parameters of the i-th subnetwork, μ is the gradient descent weighting, and e is the error between the target and current network output. Although this is a standard algorithm, the computational expense can be high since a Jacobian must be computed m times during each iteration.
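A compact sketch of update (9) applied one sub-network at a time is given below. It is our own illustration under simplifying assumptions (a finite-difference Jacobian and a flat parameter vector per sub-network), not the training code used for the results in this paper.

```python
import numpy as np

def lm_step(w, residual_fn, mu=1e-2, eps=1e-6):
    """One update W <- W - (J^T J + mu I)^{-1} J^T e for a single sub-network.

    w           : flat parameter vector (weights and biases) of sub-network i
    residual_fn : maps w to the error vector e between targets and outputs
    """
    e = residual_fn(w)
    J = np.empty((e.size, w.size))
    for j in range(w.size):  # finite-difference Jacobian, one column at a time
        wp = w.copy()
        wp[j] += eps
        J[:, j] = (residual_fn(wp) - e) / eps
    return w - np.linalg.solve(J.T @ J + mu * np.eye(w.size), J.T @ e)

def train_reduced_coupling(weights, residual_fns, mu=1e-2, iters=100):
    """Cycle the update over the m sub-networks: one Jacobian per sub-network
    per iteration, as described for the reduced coupling model."""
    for _ in range(iters):
        for i, rfn in enumerate(residual_fns):
            weights[i] = lm_step(weights[i], rfn, mu)
    return weights
```

The m-fold Jacobian cost noted above is visible here: every outer iteration rebuilds one Jacobian per sub-network.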

3.3 Nodal Link Perceptron Networks (NLPN)

The NLPN networks to be discussed here employ local basis functions, and can be trained simply with least squares, as described in [16]. The architecture has a resemblance to cerebellar model articulation controller (CMAC) networks [1, 9], where nonlinear functions are approximated by local multilinear splines. We assume our input space A ⊂ R^m is shaped such that A = [α_1, β_1] × ... × [α_m, β_m], and each interval [α_i, β_i] is divided into subintervals.
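Although the details of the NLPN construction are given in [16], the flavour of local basis functions on a grid of subintervals, trained by least squares, can be illustrated in one dimension. The code below is a generic hat-basis example of ours, not the NLPN formulation itself.

```python
import numpy as np

def tent_basis(x, knots):
    """Piecewise-linear (hat) basis functions on a 1-D grid of subintervals.

    knots : increasing grid alpha = a_0 < a_1 < ... < a_N = beta; basis j
            peaks at knots[j] and is nonzero only on its two neighbouring
            subintervals (values are clamped outside [knots[0], knots[-1]]).
    """
    phi = np.zeros(len(knots))
    for j in range(len(knots)):
        unit = np.zeros(len(knots))
        unit[j] = 1.0
        phi[j] = np.interp(x, knots, unit)  # hat function centred at knots[j]
    return phi

def fit_local_basis(xs, targets, knots):
    """Least squares fit of weights for a local-basis expansion f(x) = w . phi(x)."""
    Phi = np.array([tent_basis(x, knots) for x in xs])
    w, *_ = np.linalg.lstsq(Phi, np.asarray(targets), rcond=None)
    return w
```

In the multivariable case the input space A is a box in R^m, and multilinear splines of this kind are formed as products of such one-dimensional functions, one per coordinate.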