IDENTIFICATION OF FULLY PARAMETERIZED LINEAR AND NONLINEAR STATE-SPACE SYSTEMS BY PROJECTED GRADIENT SEARCH

Vincent Verdult ∗  Niek Bergboer ∗∗  Michel Verhaegen ∗



∗ Delft University of Technology, Faculty of Information Technology and Systems, Control Systems Engineering, P.O. Box 5031, NL-2600 GA Delft, The Netherlands
∗∗ Maastricht University, Department of Computer Science, St Jacobstraat 6, Maastricht, The Netherlands

Abstract: A nonlinear optimization-based identification procedure for fully parameterized multivariable state-space models is presented. The method can be used to identify linear time-invariant, linear parameter-varying, composite local linear, bilinear, Hammerstein and Wiener systems. The nonuniqueness of the full parameterization is dealt with by a projected gradient search to solve the nonlinear optimization problem. Both white and nonwhite measurement noise at the output can be dealt with in a maximum likelihood setting. It is proposed to use subspace identification methods to initialize the nonlinear optimization problem. A computationally efficient and numerically reliable implementation of the procedure is discussed in detail.

Keywords: System identification, state-space methods, nonlinear models, nonlinear programming, numerical algorithms, maximum likelihood, bilinear systems.

1. INTRODUCTION

Efficient and reliable identification algorithms for multivariable linear and nonlinear systems are of considerable interest to both academic research groups and industry to solve large-scale practical problems. Although a multitude of identification methods for multivariable linear time-invariant (LTI) systems exist, the choices are limited when dealing with multivariable nonlinear systems. To deal with multivariable systems, state-space models offer considerable advantages over input-output descriptions. State-space models for LTI systems can be identified using the classical maximum-likelihood methods or using subspace methods (Ljung, 1999). In the maximum-likelihood methods the system is parameterized and the optimal values of the parameters are obtained by minimizing a cost function. In general, the cost function is a nonconvex nonlinear function that has to be minimized by an iterative procedure. Hence, maximum-likelihood methods require a reasonable initial guess of the parameters. Subspace identification methods (Van Overschee and De Moor,

1996; Verhaegen, 1994) can be used to provide such an initial guess. The subspace methods are noniterative, but not optimal in a maximum likelihood sense. The drawbacks of maximum likelihood and subspace methods can be overcome by combining the two. For LTI systems the combination of subspace and optimization-based identification methods has been proposed by several authors (for example Ljung, 1999). A delicate issue in maximum-likelihood methods is the choice of system parameterization. Using canonical forms like the observer canonical form can lead to bad numerical properties. To avoid the choice of parameterization, McKelvey and Helmersson (1997) proposed to use a fully parameterized state-space model and deal with the nonuniqueness of this model using a special gradient projection method for minimizing the maximum likelihood cost function. This projection can be regarded as choosing a local parameterization of the system at each iteration step of the numerical minimization of the maximum-likelihood cost function. Therefore, this method is also referred to as the local parameterization

method or the data driven local coordinates (DDLC) method. Currently, the topological and geometrical properties of this approach are investigated by Ribarits and Deistler (2002) (see also Deistler and Ribarits, 2001). A similar method has been proposed by Lee and Poolla (1999) for the identification of linear parameter-varying (LPV) systems.

In this paper we show that the gradient projection method of McKelvey and Helmersson (1997) and Lee and Poolla (1999) can also be used for the identification of the following multivariable nonlinear state-space systems: bilinear systems, composite local linear systems, Hammerstein systems and Wiener systems. For most of these systems subspace methods are available to generate the initial starting point, and thus a practical identification method is obtained by combining a maximum-likelihood and a subspace method. The paper also discusses a computationally efficient and numerically reliable implementation of the identification procedures.

The paper is organized as follows: Section 2 describes the types of linear and nonlinear models that we consider. Section 3 describes the output error identification problem, in which the output is only disturbed by white measurement noise. A projected gradient algorithm is presented to minimize the output error cost function. As mentioned above, a reasonable initial starting point for the projected gradient algorithm can be obtained using subspace identification; this is discussed in Section 4. Two possibilities to deal with nonwhite output disturbances are discussed in Section 5. A computationally efficient and numerically reliable implementation of the proposed identification procedure is discussed in Section 6. Due to lack of space, we do not present any examples of the proposed approach. The reader can find bilinear, LPV and composite local linear examples in Verdult (2002), Verdult et al. (2001) and Verdult et al. (2002).

2. IDENTIFICATION OF LINEAR AND NONLINEAR STATE-SPACE SYSTEMS

The identification framework that is presented is suitable for a number of different model structures. In all models x(k) ∈ R^n is the state sequence, u(k) ∈ R^m the input, and y(k) ∈ R^ℓ the output. The first model structure to be considered is the linear parameter-varying system

\[
x(k+1) = A \begin{bmatrix} x(k) \\ p(k) \otimes x(k) \end{bmatrix} + B \begin{bmatrix} u(k) \\ p(k) \otimes u(k) \end{bmatrix},
\qquad
y(k) = C \begin{bmatrix} x(k) \\ p(k) \otimes x(k) \end{bmatrix} + D \begin{bmatrix} u(k) \\ p(k) \otimes u(k) \end{bmatrix},
\]

where the signal p(k) ∈ R^s is the time-varying parameter, which is assumed to be known, ⊗ is the Kronecker matrix product, and A = [A_0, A_1, . . . , A_s] with A_i ∈ R^{n×n}, i = 0, 1, . . . , s. The matrices B, C and D can be partitioned similarly, with B_i ∈ R^{n×m}, C_i ∈ R^{ℓ×n}, and D_i ∈ R^{ℓ×m}.
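To make the partitioned matrices and the Kronecker terms concrete, here is a minimal simulation sketch of the LPV model above; the function name lpv_simulate and the NumPy realization are ours, not part of the original paper.

```python
import numpy as np

def lpv_simulate(A, B, C, D, u, p, x0):
    """Simulate the LPV model above.
    A: n x (s+1)n, B: n x (s+1)m, C: l x (s+1)n, D: l x (s+1)m,
    u: N x m input, p: N x s known time-varying parameter, x0: initial state."""
    x, y = x0.copy(), []
    for k in range(u.shape[0]):
        zx = np.concatenate([x, np.kron(p[k], x)])        # [x(k); p(k) kron x(k)]
        zu = np.concatenate([u[k], np.kron(p[k], u[k])])  # [u(k); p(k) kron u(k)]
        y.append(C @ zx + D @ zu)
        x = A @ zx + B @ zu
    return np.asarray(y)
```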

This LPV model structure also accommodates other model structures as follows:
• For linear time-invariant systems, we have s = 0, and thus the dependency on p(k) disappears.
• For bilinear systems, we have s = m, p(k) = u(k), and B_i = 0, C_i = 0 and D_i = 0 for i = 1, 2, . . . , m.

Another model structure that can be dealt with is a composite local linear system (Murray-Smith and Johansen, 1997), in which the state equation consists of a weighted combination of LTI models:

\[
x(k+1) = \sum_{i=1}^{s} f_i\bigl(\phi(k)\bigr)\bigl(A_i x(k) + B_i u(k) + O_i\bigr),
\qquad
y(k) = C_0 x(k) + D_0 u(k),
\]

where the O_i represent offsets on the state. The weights f_i(φ(k)) are taken as radial basis functions, and φ(k) is the scheduling vector, which is assumed to be a known function of the input u(k) and the output y(k). The scheduling vector represents the operating point of the system. The weights determine which combination of local models is active, based on the operating point of the system. Note that by taking the parameters p(k) in the LPV structure equal to f_i(φ(k)), the composite local model structure can be regarded as a special case of the LPV system.

In addition to these model structures, the identification framework presented in this paper can handle nonlinear Hammerstein and Wiener systems. In Hammerstein systems, the input first passes through a smooth nonlinear function before entering an LTI system. We assume the following special structure for the Hammerstein system:

\[
x(k+1) = A_0 x(k) + B f\bigl(u(k)\bigr),
\qquad
y(k) = C_0 x(k) + D f\bigl(u(k)\bigr),
\]

with f : R^m → R^{m(s+1)} a fixed and smooth nonlinear function. For example, the elements of f can be polynomial functions of u(k); B and D then contain the weights to combine these functions. In Wiener systems, the output of an LTI system passes through a smooth nonlinear function. We consider Wiener systems with the following structure:

\[
x(k+1) = A_0 x(k) + B_0 u(k),
\qquad
z(k) = C_0 x(k) + D_0 u(k),
\qquad
y(k) = W f\bigl(z(k)\bigr),
\]

with f : R^ℓ → R^ℓ, and W the corresponding weights. Other structures of the nonlinear parts in the Hammerstein and Wiener systems can be used, but they are not discussed here. It is also possible to deal with a Hammerstein-Wiener combination, where the LTI part is between two static nonlinearities. Although we will only discuss static nonlinearities in combination with LTI systems, we can also handle static nonlinearities in combination with LPV systems.
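The Hammerstein and Wiener structures reuse the same LTI simulation with a static map on the input or output. Below is a minimal sketch, with helper names of our choosing and a polynomial basis as an assumed example of f:

```python
import numpy as np

def lti_simulate(A0, B0, C0, D0, U, x0):
    """z(k) = C0 x(k) + D0 u(k) with x(k+1) = A0 x(k) + B0 u(k)."""
    x, Z = x0.copy(), []
    for u in U:
        Z.append(C0 @ x + D0 @ u)
        x = A0 @ x + B0 @ u
    return np.asarray(Z)

# Hammerstein: map the input through a fixed basis f before the LTI part.
# Example basis: f(u) = [u; u^2], i.e. m -> m(s+1) with s = 1 (assumed for illustration).
f_in = lambda u: np.concatenate([u, u**2])

def hammerstein_simulate(A0, B, C0, D, U, x0):
    return lti_simulate(A0, B, C0, D, np.array([f_in(u) for u in U]), x0)

# Wiener: map the LTI output z through f and combine with the weights W.
def wiener_simulate(A0, B0, C0, D0, W, f_out, U, x0):
    Z = lti_simulate(A0, B0, C0, D0, U, x0)
    return np.asarray([W @ f_out(z) for z in Z])
```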

3. OUTPUT ERROR IDENTIFICATION

This section introduces the output error identification problem. It is assumed that the output of the system is disturbed by white measurement noise. More general disturbances are dealt with in Section 5.

3.1 System Parameterization

In this paper we adopt a full parameterization of the system matrices, which is given by

\[
\theta = P \,\mathrm{vec}\begin{bmatrix} A & B \\ C & D \end{bmatrix}, \tag{1}
\]

where P is a selection matrix that discards the entries that are zero by definition (as in the bilinear model, for example). Note that the LTI parts of the Hammerstein and Wiener systems can also be described in this way. In the composite local linear system, the radial basis functions f_i need to be determined as well. These functions are parameterized by their means and variances, collected in the parameter vector ρ. For the Wiener systems the weights W also need to be determined; these weights are stored in the parameter vector ρ = vec(W).

3.2 The Cost Function

To identify one of the models of Section 2 we search for a set of parameters θ and ρ such that the output of the model ŷ(k; θ, ρ) approximates the output y(k) of the real system sufficiently accurately. To achieve this goal, the output error is minimized with respect to the parameters θ and ρ. The output-error cost function is given by

\[
V_N(\theta, \rho) := \sum_{k=1}^{N} \|y(k) - \hat{y}(k; \theta, \rho)\|_2^2 = E_N^T(\theta, \rho)\, E_N(\theta, \rho), \tag{2}
\]

where E_N(θ, ρ) = [e(1)^T e(2)^T e(3)^T · · · e(N)^T]^T, e(k) = y(k) − ŷ(k; θ, ρ), and N is the total number of measurements available. Minimization of (2) is a nonlinear, nonconvex optimization problem because of the nonlinear dependence of ŷ(k; θ, ρ) on θ and ρ. In the next section we present an algorithm to numerically search for a solution to this optimization problem. The cost function for the Wiener system can be modified such that the weights W are eliminated from the problem using the principle of separable least squares (Golub and Pereyra, 1973); this was pointed out by Bruls et al. (1999).

3.3 Projected Gradient Search

The input-output behaviors of the state-space systems of Section 2 do not change under a nonsingular linear similarity transformation T ∈ R^{n×n} of the state: x_T(k) = T^{-1} x(k). Since we use a full system parameterization, the minimization of V_N(θ, ρ) does not

have a unique solution: there exist different parameter values θ that yield the same input-output behavior of the model and hence the same value of the cost function. Below, we present an iterative projected gradient search method that deals with this nonuniqueness by restricting the parameter update at each iteration to directions in which the cost function changes. Such a method has been previously described by Lee and Poolla (1999) for LPV models, by McKelvey and Helmersson (1997) for LTI models, and by Verdult et al. (2002) for bilinear models.

The nonuniqueness due to the similarity transformation can be characterized by the similarity map, which is defined as

\[
S(\theta, T) := \begin{bmatrix} T^{-1} & 0 \\ 0 & I_\ell \end{bmatrix}
\begin{bmatrix} A(\theta) & B(\theta) \\ C(\theta) & D(\theta) \end{bmatrix}
\begin{bmatrix} I_{s+1} \otimes T & 0 \\ 0 & I_{m(s+1)} \end{bmatrix}.
\]

Lee and Poolla (1999) have shown that, by linearizing the similarity map, the directions in which the cost function changes are given by the left null space of the matrix

\[
M(\theta) := \sum_{i=1}^{s+1} \left( \begin{bmatrix} \Pi_i^T \\ 0_{m(s+1) \times n} \end{bmatrix} \otimes \begin{bmatrix} A(\theta)\Pi_i^T \\ C(\theta)\Pi_i^T \end{bmatrix} \right)
- \begin{bmatrix} A(\theta)^T \\ B(\theta)^T \end{bmatrix} \otimes \begin{bmatrix} I_n \\ 0_{\ell \times n} \end{bmatrix},
\]

where \(\Pi_i := [\,0_{n \times (i-1)n} \;\; I_n \;\; 0_{n \times (s+1-i)n}\,]\). The matrix M(θ) has full column rank if the similarity map S(θ, T) for a fixed θ is locally one-to-one around I_n. For LTI, Wiener and Hammerstein systems, the similarity map is one-to-one if the pair (A, B) is controllable or if the pair (A, C) is observable; conditions for LPV systems have been described by Lee and Poolla (1999), and conditions for bilinear systems and composite local linear models by Verdult (2002).

The matrix M(θ) does not take into account that certain blocks of the matrices A, B, C, and D can be zero by definition (as is the case for the LTI, bilinear, composite local linear, Hammerstein and Wiener models). Since these zero blocks do not change the degrees of freedom of the similarity transformation, they can be taken into account by simply discarding the corresponding rows of the matrix M(θ). In other words, the left null space of the matrix P M(θ) is determined, where P is the selection matrix of equation (1). Other constraints on the parameters θ, of the form Γθ = θ_0 with Γ and θ_0 given, can be taken into account by incorporating them in the derivation of the matrix M(θ), as discussed by Avdeenko (2002).

The left null space of a matrix is usually obtained using a singular value decomposition. However, since P M(θ) has full column rank, we can use a QR factorization, which is computationally faster:

\[
P M(\theta) = \begin{bmatrix} Q_1(\theta) & Q_2(\theta) \end{bmatrix} \begin{bmatrix} R_1(\theta) \\ 0 \end{bmatrix}. \tag{3}
\]
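As a concrete illustration of (3) for the simplest case, the sketch below forms M(θ) for an LTI model (s = 0, so Π_1 = I_n) and computes Q_2 with a full QR factorization; the function names are ours, and P is taken as the identity.

```python
import numpy as np

def m_matrix_lti(A, B, C):
    """M(theta) for the LTI case (s = 0, Pi_1 = I_n): its columns span the
    tangent space of the similarity-transformation orbit at T = I_n."""
    n, m = B.shape
    l = C.shape[0]
    t1 = np.kron(np.vstack([np.eye(n), np.zeros((m, n))]), np.vstack([A, C]))
    t2 = np.kron(np.vstack([A.T, B.T]), np.vstack([np.eye(n), np.zeros((l, n))]))
    return t1 - t2

def local_basis(A, B, C):
    """Orthonormal basis Q2 of the left null space of M(theta), cf. eq. (3).
    Assumes M has full column rank n^2 (minimal system)."""
    M = m_matrix_lti(A, B, C)
    Q, _ = np.linalg.qr(M, mode="complete")
    return Q[:, M.shape[1]:]    # trailing columns of the full Q span the left null space
```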

The columns of the matrix Q_2(θ) form an orthonormal basis for the left null space of P M(θ). A gradient search will be performed in this resulting local parameter subspace. We propose to use a dedicated trust-region implementation of the Levenberg-Marquardt algorithm (Moré, 1978). For this algorithm the Jacobian of the error vector E_N(θ, ρ) is needed; it is given by

\[
\Psi_N(\theta, \rho) := \begin{bmatrix} \Psi_{N,\theta}(\theta, \rho) & \Psi_{N,\rho}(\theta, \rho) \end{bmatrix}
= \begin{bmatrix} \dfrac{\partial E_N(\theta, \rho)}{\partial \theta^T} & \dfrac{\partial E_N(\theta, \rho)}{\partial \rho^T} \end{bmatrix}.
\]

After integrating the local gradient search into this algorithm, subsequent iterations update the parameters as

\[
\begin{bmatrix} \theta^{(i+1)} \\ \rho^{(i+1)} \end{bmatrix}
= \begin{bmatrix} \theta^{(i)} \\ \rho^{(i)} \end{bmatrix} + d(\theta^{(i)}, \rho^{(i)}, \lambda^{(i)}),
\]

in which d(θ^{(i)}, ρ^{(i)}, λ^{(i)}) is given by

\[
d(\theta, \rho, \lambda) = -\bar{Q}_2(\theta)\, \Phi(\theta, \rho, \lambda)^{-1}\, \bar{Q}_2(\theta)^T \Psi_N(\theta, \rho)^T E_N(\theta, \rho), \tag{4}
\]

with

\[
\Phi(\theta, \rho, \lambda) := \bar{Q}_2(\theta)^T \Psi_N(\theta, \rho)^T \Psi_N(\theta, \rho)\, \bar{Q}_2(\theta) + \lambda I,
\qquad
\bar{Q}_2(\theta) := \begin{bmatrix} Q_2(\theta) & 0 \\ 0 & I_q \end{bmatrix},
\]

and q the number of parameters in ρ. The Levenberg-Marquardt regularization parameter λ^{(i)} is determined in each iteration and depends on the linearity of the cost function in the vicinity of the point (θ^{(i)}, ρ^{(i)}). The regularization in the Levenberg-Marquardt algorithm takes care of nonuniqueness in the nonlinear system representations that is not due to the similarity transformation.

Besides Q_2(θ), the vector E_N(θ, ρ) and its Jacobian Ψ_N(θ, ρ) are needed to compute (4). According to (2), E_N(θ, ρ) follows from the output of the model ŷ(k; θ^{(i)}, ρ^{(i)}), which can be obtained by simulating the state-space system. The Jacobian can also be obtained by simulating a dynamic state-space model. This is discussed in detail in Section 6.2.
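For concreteness, here is a minimal dense-algebra sketch of the update (4), assuming E_N, Ψ_N and Q_2 have already been formed; function and variable names are ours, and Section 6 describes the numerically preferable QR-based computation.

```python
import numpy as np

def projected_lm_step(Psi, E, Q2, lam):
    """One projected Levenberg-Marquardt step, eq. (4):
    d = -Q2b (Q2b^T Psi^T Psi Q2b + lam I)^{-1} Q2b^T Psi^T E,
    with Q2b = blkdiag(Q2, I_q); the rho-block is left unconstrained."""
    q = Psi.shape[1] - Q2.shape[0]          # number of rho-parameters
    Q2b = np.zeros((Psi.shape[1], Q2.shape[1] + q))
    Q2b[:Q2.shape[0], :Q2.shape[1]] = Q2
    Q2b[Q2.shape[0]:, Q2.shape[1]:] = np.eye(q)
    J = Psi @ Q2b                           # Jacobian restricted to directions that change the cost
    Phi = J.T @ J + lam * np.eye(J.shape[1])
    return -Q2b @ np.linalg.solve(Phi, J.T @ E)
```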

4. INITIALIZATION

As mentioned before, the initial starting point of the projected gradient search has a large influence on the final result. A good initial starting point for most of the model structures of Section 2 can be obtained using subspace identification. Subspace identification methods have been described for LTI (Van Overschee and De Moor, 1996; Verhaegen, 1994), LPV (Verdult and Verhaegen, 2002; Verdult, 2002), bilinear (Favoreel et al., 1999; Favoreel, 1999; Verdult and Verhaegen, 2001; Chen and Maciejowski, 2000; Verdult, 2002), Wiener (Westwick and Verhaegen, 1996), and Hammerstein systems (Verhaegen and Westwick, 1996).

The parameters ρ = vec(W) of the Wiener system can be initialized from an estimate of the signal z(k) provided by subspace identification.

For the local linear model structure no subspace identification methods are available. A natural way to initialize the model is to estimate a global LTI state-space model using subspace identification and to take all the local models equal to this linear model. The initial weighting functions are distributed uniformly over the operating range (Verdult et al., 2001). Using linear models for initialization has been proposed and motivated by Sjöberg (1997). More sophisticated ways of initializing these local linear model structures, including subspace identification, are a topic for further research.

5. DEALING WITH COLORED NOISE

In Section 3 the output measurement noise was assumed to be a white-noise sequence. If the output measurement noise is nonwhite, the optimization of the cost function (2) leads to estimates of the parameters θ and ρ that are not of minimum variance in a maximum likelihood sense. This problem can be overcome by modifying the cost function. Minimum variance estimates can be obtained by minimizing the weighted cost function

\[
V_N(\theta, \rho) = E_N^T \Sigma_v^{-1} E_N = (\Sigma_v^{-1/2} E_N)^T (\Sigma_v^{-1/2} E_N),
\]

in which the weighting matrix Σ_v is based upon a model of the measurement noise v(k). If the weighting matrix Σ_v is taken equal to the covariance matrix of the residual vector E_N that results from the measurement noise, maximum-likelihood estimates of the system matrices can be obtained.

The computation of the huge-dimensional inverse covariance matrix Σ_v^{-1} can be done in a computationally efficient way by modeling the noise v(k) by a multivariable AR model and exploiting the Gohberg-Heinig explicit inverse of a Toeplitz matrix. This was recently pointed out by David and Bastin (2001). Bergboer et al. (2002) described a computationally efficient implementation of this method.

Another way to obtain minimum variance estimates is by minimizing the prediction error instead of the output error. For LTI systems this is well known and the Kalman filter can be used for deriving the predictor. For the other systems, especially the nonlinear ones, deriving the predictor is far from trivial. The LPV and bilinear systems would require a time-varying Kalman gain (Fnaiech and Ljung, 1987).

6. EFFICIENT IMPLEMENTATION

It is well known that computing the search direction using (4) directly is inefficient and inaccurate. The QR factorization can be used to implement the parameter update rule (4) in a numerically reliable way, in which only the R factor needs to be computed, and which is such that only one factorization is needed for several values

of the regularization parameter λ. Furthermore, for huge data lengths, it is possible with this QR factorization to process the data in batches when computing the residuals E_N(θ, ρ) and the gradients Ψ_N(θ, ρ). In this way the identification method can deal with very long data records, because only a small batch of the data needs to be stored in memory at any time. Details of these steps have been presented by Bergboer et al. (2002).
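The batch idea amounts to a recursive QR factorization: keep only the triangular factor of the rows processed so far, stack each new batch of Jacobian and residual rows underneath it, and re-triangularize. The sketch below is a generic version of this scheme, not the authors' exact implementation.

```python
import numpy as np

def accumulate_r(batches):
    """Reduce row-batches of the stacked matrix [Jacobian | residual] to one
    triangular factor R; the full data matrix never resides in memory at once."""
    R = None
    for batch in batches:                    # each batch: rows x (p_local + 1)
        stacked = batch if R is None else np.vstack([R, batch])
        R = np.linalg.qr(stacked, mode="r")  # only the R factor is kept
    return R
```

From the final R factor, the step (4) can then be formed for several values of λ without touching the data again.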

6.1 Obtaining the Local Parameter Subspace

The determination of the left null space of the matrix M(θ) accounts for much of the computation time of the method. It can be computed efficiently using the QR factorization (3) based on Householder rotations. Given the Householder vectors, the matrix Q_2 can be calculated directly, without calculating Q_1, by applying all Householder rotations to [0 I]^T. Efficient numerical implementations exist for this (Golub and Van Loan, 1996, p. 211).

Recently, McKelvey (2002) proposed an alternative way to obtain the local parameter subspace for LTI systems which results in considerable computational savings. It is based on using the impulse response of the LTI system, and results in performing a QR factorization on a matrix of which the number of rows grows linearly with n instead of quadratically as in M(θ). This method can also be used for Hammerstein and Wiener systems. Based on the results of Isidori (1973) an extension to bilinear systems is possible, but not useful, because it results in a matrix of which the number of rows grows exponentially with n.

6.2 Computing the Projected Gradient

Memory usage and computation time can be reduced by computing the product Ψ_{N,θ} Q_2 directly, rather than first calculating Ψ_{N,θ} and then multiplying by Q_2. Since the number of columns in M(θ) is n², the number of columns in Ψ_{N,θ} Q_2 is n² less than the number of columns in Ψ_{N,θ}. For systems having a large order this difference can be substantial, especially for LTI systems, since the number of columns in Ψ_{N,θ} Q_2 will be proportional to n whereas the number of columns in Ψ_{N,θ} is proportional to n².

We start with the LPV model structure. The columns of Ψ_{N,θ} Q_2 are the directional derivatives of E_N in the directions specified by the columns of Q_2:

\[
\Omega_N := \Psi_{N,\theta} Q_2 = \frac{\partial E_N}{\partial \theta^T} Q_2.
\]

The kth element of the jth column of Ω_N can be written as

\[
\Omega_N(k, j) = \sum_{i=1}^{p} \frac{\partial \hat{y}(k; \theta)}{\partial \theta_i} Q_2(i, j), \tag{5}
\]

where p is the number of parameters in θ. As differentiation is a linear operation, this sum can be obtained from

\[
\Omega_N(k, j) = C(\theta) \begin{bmatrix} X_j(k; \theta) \\ p(k) \otimes X_j(k; \theta) \end{bmatrix}
+ \left( \sum_{i=1}^{p} \frac{\partial C(\theta)}{\partial \theta_i} Q_2(i, j) \right) \begin{bmatrix} \hat{x}(k; \theta) \\ p(k) \otimes \hat{x}(k; \theta) \end{bmatrix}
+ \left( \sum_{i=1}^{p} \frac{\partial D(\theta)}{\partial \theta_i} Q_2(i, j) \right) \begin{bmatrix} u(k) \\ p(k) \otimes u(k) \end{bmatrix},
\]

where the state sequence

\[
X_j(k; \theta) := \sum_{i=1}^{p} \frac{\partial \hat{x}(k; \theta)}{\partial \theta_i} Q_2(i, j)
\]

follows by simulating the dynamic equation

\[
X_j(k+1; \theta) = A(\theta) \begin{bmatrix} X_j(k; \theta) \\ p(k) \otimes X_j(k; \theta) \end{bmatrix}
+ \left( \sum_{i=1}^{p} \frac{\partial A(\theta)}{\partial \theta_i} Q_2(i, j) \right) \begin{bmatrix} \hat{x}(k; \theta) \\ p(k) \otimes \hat{x}(k; \theta) \end{bmatrix}
+ \left( \sum_{i=1}^{p} \frac{\partial B(\theta)}{\partial \theta_i} Q_2(i, j) \right) \begin{bmatrix} u(k) \\ p(k) \otimes u(k) \end{bmatrix}.
\]

The weighted sums of the derivatives of the various system matrices can be obtained from a column of Q_2 as follows:

\[
\mathrm{vec}\begin{bmatrix}
\sum_{i=1}^{p} \frac{\partial A}{\partial \theta_i} Q_2(i, j) & \sum_{i=1}^{p} \frac{\partial B}{\partial \theta_i} Q_2(i, j) \\
\sum_{i=1}^{p} \frac{\partial C}{\partial \theta_i} Q_2(i, j) & \sum_{i=1}^{p} \frac{\partial D}{\partial \theta_i} Q_2(i, j)
\end{bmatrix}
= \sum_{i=1}^{p} \frac{\partial}{\partial \theta_i}\,\mathrm{vec}\begin{bmatrix} A & B \\ C & D \end{bmatrix} Q_2(i, j)
= \sum_{i=1}^{p} \frac{\partial \theta}{\partial \theta_i} Q_2(i, j)
= \sum_{i=1}^{p} e_i\, Q_2(i, j)
= Q_2(:, j),
\]

in which e_i denotes a vector which contains zeros, except for the ith component, which equals one. For LTI, bilinear, and local linear models some of the entries in the system matrices are zero by definition, and the weighted sum is obtained as

\[
\sum_{i=1}^{p} P\, \frac{\partial}{\partial \theta_i}\,\mathrm{vec}\begin{bmatrix} A & B \\ C & D \end{bmatrix} Q_2(i, j) = Q_2(:, j).
\]
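In code, one column of Ω_N for the LPV case therefore costs one extra simulation. A minimal sketch follows (names ours), assuming dA, dB, dC, dD hold the weighted derivative sums Σ_i (∂·/∂θ_i) Q_2(i, j) recovered from Q_2(:, j) as above, and that the model state x̂(k) has been stored during the model simulation:

```python
import numpy as np

def omega_column(A, B, C, D, dA, dB, dC, dD, u, p, xhat):
    """Directional derivative of the model output along one column of Q2:
    simulates X(k+1) = A [X; p kron X] + dA [xh; p kron xh] + dB [u; p kron u]
    and returns Omega(k) = C [X; p kron X] + dC [xh; ...] + dD [u; ...]."""
    X = np.zeros(A.shape[0])   # initial state assumed independent of theta
    col = []
    for k in range(u.shape[0]):
        zX = np.concatenate([X, np.kron(p[k], X)])
        zx = np.concatenate([xhat[k], np.kron(p[k], xhat[k])])
        zu = np.concatenate([u[k], np.kron(p[k], u[k])])
        col.append(C @ zX + dC @ zx + dD @ zu)
        X = A @ zX + dA @ zx + dB @ zu
    return np.asarray(col)
```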

For Hammerstein systems, the input signal in the procedure outlined above should be replaced by f(u(k)). For Wiener systems, the chain rule for differentiation must be applied to the output equation. For composite local linear and Wiener systems, Ψ_{N,ρ}, the part of the Jacobian related to the parameters ρ, must also be computed. It is not difficult to show that for composite local linear models Ψ_{N,ρ} can also be obtained by simulating a dynamic equation (Verdult et al., 2001). For Wiener systems, the system's states do not depend on ρ and thus Ψ_{N,ρ} can be obtained from a static relation.

Successful identification requires that the Jacobian Ψ_N(θ, ρ) is bounded. Thus, the dynamic equations for the Jacobian computation must be stable. It

is easy to verify for LTI, Wiener, Hammerstein, and bilinear systems that the dynamics governing the Jacobian computations are stable if the model corresponding to θ is stable. Since the optimization method aims at minimizing the output error, it is very unlikely that the parameters θ describing the model are modified towards instability of the model. For LPV and local linear models the dynamics governing the Jacobian computations are stable if the model corresponding to θ is stable and, in addition, the parameter p(k) or the scheduling vector φ(k) does not depend on the output or state of the model (Verdult et al., 2001; Verdult, 2002).

7. CONCLUSIONS

Multivariable linear and nonlinear state-space systems can be identified by numerically solving a nonlinear optimization problem in which the system is fully parameterized. The nonuniqueness of the state-space representation is taken into account by solving the optimization problem using a projected gradient search that restricts the update of the parameters at each iteration to directions that change the input-output behavior. This paper shows that such a method can be used for LTI, LPV, bilinear, composite local linear, Hammerstein and Wiener systems. Colored output noise can be taken into account in a maximum likelihood procedure by estimating the covariance matrix of the residuals and using the inverse of this matrix as a weighting in the cost function, or alternatively by minimizing the prediction error if a predictor can be derived. It is pointed out that the optimization-based method can be initialized by a model obtained from subspace identification.

REFERENCES

Avdeenko, T. (2002). On structural identifiability of system parameters of linear models. In: Preprints of the 15th IFAC World Congress. Barcelona, Spain.
Bergboer, N., V. Verdult and M. Verhaegen (2002). An efficient implementation of maximum likelihood identification of LTI state-space models by local gradient search. In: Proceedings of the 41st IEEE Conference on Decision and Control. Las Vegas, Nevada.
Bruls, J., C. T. Chou, B. Haverkamp and M. Verhaegen (1999). Linear and non-linear system identification using separable least-squares. European Journal of Control 5(1), 116–128.
Chen, H. and J. Maciejowski (2000). An improved subspace identification method for bilinear systems. In: Proceedings of the 39th IEEE Conference on Decision and Control. Sydney, Australia.
David, B. and G. Bastin (2001). An estimator of the inverse covariance matrix and its application to ML parameter estimation in dynamical systems. Automatica 37(1), 99–106.
Deistler, M. and T. Ribarits (2001). Parametrizations of linear systems by data driven local coordinates. In: Proceedings of the 40th IEEE Conference on Decision and Control. Orlando, Florida. pp. 4754–4759.
Favoreel, W. (1999). Subspace Methods for Identification and Control of Linear and Bilinear Systems. PhD thesis. Faculty of Engineering, K.U. Leuven. Leuven, Belgium.
Favoreel, W., B. De Moor and P. Van Overschee (1999). Subspace identification of bilinear systems subject to white inputs. IEEE Transactions on Automatic Control 44(6), 1157–1165.

Fnaiech, F. and L. Ljung (1987). Recursive identification of bilinear systems. International Journal of Control 45(2), 453–470.
Golub, G. H. and C. F. Van Loan (1996). Matrix Computations. Third ed. The Johns Hopkins University Press. Baltimore, Maryland.
Golub, G. H. and V. Pereyra (1973). The differentiation of pseudo-inverses and nonlinear least squares problems whose variables separate. SIAM Journal on Numerical Analysis 10(2), 413–432.
Isidori, A. (1973). Direct construction of minimal bilinear realizations from nonlinear input-output maps. IEEE Transactions on Automatic Control 18(6), 626–631.
Lee, L. H. and K. Poolla (1999). Identification of linear parameter-varying systems using nonlinear programming. Journal of Dynamic Systems, Measurement and Control 121(1), 71–78.
Ljung, L. (1999). System Identification: Theory for the User. Second ed. Prentice-Hall. Upper Saddle River, New Jersey.
McKelvey, T. (2002). A new minimal local parametrization for multivariable linear systems. In: Preprints of the 15th IFAC World Congress. Barcelona, Spain.
McKelvey, T. and A. Helmersson (1997). System identification using an over-parametrized model class: Improving the optimization algorithm. In: Proceedings of the 36th IEEE Conference on Decision and Control. San Diego, California. pp. 2984–2989.
Moré, J. J. (1978). The Levenberg-Marquardt algorithm: Implementation and theory. In: Numerical Analysis (G. A. Watson, Ed.). Vol. 630 of Lecture Notes in Mathematics. pp. 106–116. Springer Verlag. Berlin.
Murray-Smith, R. and T. A. Johansen (1997). Multiple Model Approaches to Modelling and Control. Taylor and Francis. London.
Ribarits, T. and M. Deistler (2002). Data driven local coordinates: Some new topological and geometrical results. In: Preprints of the 15th IFAC World Congress. Barcelona, Spain.
Sjöberg, J. (1997). On estimation of nonlinear black-box models: How to obtain a good initialization. In: Proceedings of the 1997 IEEE Workshop Neural Networks for Signal Processing VII. Amelia Island Plantation, Florida. pp. 72–81.
Van Overschee, P. and B. De Moor (1996). Subspace Identification for Linear Systems: Theory, Implementation, Applications. Kluwer Academic Publishers. Dordrecht, The Netherlands.
Verdult, V. (2002). Nonlinear System Identification: A State-Space Approach. PhD thesis. University of Twente, Faculty of Applied Physics. Enschede, The Netherlands.
Verdult, V. and M. Verhaegen (2001). Identification of multivariable bilinear state space systems based on subspace techniques and separable least squares optimization. International Journal of Control 74(18), 1824–1836.
Verdult, V. and M. Verhaegen (2002). Subspace identification of multivariable linear parameter-varying systems. Automatica 38(5), 805–814.
Verdult, V., L. Ljung and M. Verhaegen (2001). Identification of composite local linear state-space models using a projected gradient search. Technical report. Accepted for publication in International Journal of Control.
Verdult, V., N. Bergboer and M. Verhaegen (2002). Maximum likelihood identification of multivariable bilinear state-space systems by projected gradient search. In: Proceedings of the 41st IEEE Conference on Decision and Control. Las Vegas, Nevada.
Verhaegen, M. (1994). Identification of the deterministic part of MIMO state space models given in innovations form from input-output data. Automatica 30(1), 61–74.
Verhaegen, M. and D. Westwick (1996). Identifying MIMO Hammerstein systems in the context of subspace model identification methods. International Journal of Control 63(2), 331–349.
Westwick, D. and M. Verhaegen (1996). Identifying MIMO Wiener systems using subspace model identification methods. Signal Processing 52(2), 235–258.