International Journal of Systems Science

ISSN: 0020-7721 (Print) 1464-5319 (Online) Journal homepage: http://www.tandfonline.com/loi/tsys20

To cite this article: Yang Li, Hua-Liang Wei, Stephen A. Billings & P.G. Sarrigiannis (2016) Identification of nonlinear time-varying systems using an online sliding-window and common model structure selection (CMSS) approach with applications to EEG, International Journal of Systems Science, 47:11, 2671-2681, DOI: 10.1080/00207721.2015.1014448

To link to this article: http://dx.doi.org/10.1080/00207721.2015.1014448

Published online: 13 Mar 2015.


International Journal of Systems Science, 2016 Vol. 47, No. 11, 2671–2681, http://dx.doi.org/10.1080/00207721.2015.1014448

Identification of nonlinear time-varying systems using an online sliding-window and common model structure selection (CMSS) approach with applications to EEG

Yang Li^a, Hua-Liang Wei^b, Stephen A. Billings^b,∗ and P.G. Sarrigiannis^c

^a Department of Automation Science and Electrical Engineering, Beihang University, Beijing, China; ^b Department of Automatic Control and Systems Engineering, The University of Sheffield, Sheffield, UK; ^c Department of Clinical Neurophysiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK


(Received 9 March 2014; accepted 29 January 2015)

The identification of nonlinear time-varying systems using linear-in-the-parameter models is investigated. An efficient common model structure selection (CMSS) algorithm is proposed to select a common model structure, with application to EEG data modelling. The time-varying parameters of the identified common-structured model are then estimated using a sliding-window recursive least squares (SWRLS) approach. The new method can effectively detect, adaptively track, and rapidly capture the transient variations of nonstationary signals, and can also produce robust models with better generalisation properties. Two examples, including an application to EEG data, are presented to demonstrate the effectiveness and applicability of the new approach.

Keywords: CMSS algorithm; EEG; nonlinear time-varying system identification; parameter estimation; sliding window; SWRLS approach; time-varying common-structured (TVCS) model

∗Corresponding author. Email: [email protected]

© 2015 Taylor & Francis

1. Introduction

Many processes in engineering systems and the biomedical field exhibit both time-varying and nonlinear behaviours (Chen, Gong, & Hong, 2013), and the modelling and identification of nonlinear time-varying systems is usually a challenging task. The identification of dynamical nonlinear systems based on mathematical models is vital in many fields, and considerable attention has recently been devoted to the identification of time-varying systems. In many practical cases, the system parameters are unknown and time varying (Li, Wei, & Billings, 2011). A classical approach for estimating and tracking the temporal variation of a nonlinear system is to employ adaptive algorithms such as recursive least squares (RLS), least mean squares (LMS) and the Kalman filter (Chen, Zhao, Zhu, & Principe, 2013; Ljung & Söderström, 1983). It should be noted that the RLS and LMS algorithms for the estimation of time-varying systems work well for signals that exhibit slow transients, but are not well suited to fast transients, as the convergence speed of the algorithms is not high enough (Li, Wei, & Billings, 2011; Zou, Wang, & Chon, 2003). The RLS algorithms are among the most popular algorithms in adaptive signal processing, as they provide better tracking performance than the LMS algorithms (Jiang & Zhang, 2004). To improve the tracking capability of the RLS algorithms for estimating parameters in nonstationary systems, the sliding-window recursive least squares (SWRLS) algorithm was developed for the estimation of nonlinear system parameters (Djigan, 2006; Jiang & Zhang, 2004). A detailed discussion and derivation of the performance of SWRLS identification and related adaptive control schemes can be found in Choi and Bien (1989) and Nishiyama (2014).

It is well known that a large class of nonlinear time-varying systems can be modelled and identified by linear-in-the-parameter models (Billings & Wei, 2007; Wei & Billings, 2002). A popular choice is to approximate the time-varying parameters of the nonlinear system by a set of basis functions, so that the nonstationary modelling problem reduces to time-invariant parameter estimation. For example, Schilling, Carroll, and Al-Ajlouni (2001) presented estimation of the states using artificial neural networks and introduced parameter estimation methods based on a radial basis function (RBF) neural predictor, and Legendre and Walsh basis functions have been employed for tracking smooth and abrupt changes of nonstationary signals, respectively (Zou, Wang, & Chon, 2003). However, these approaches are mainly based on specific model structures with a priori knowledge of the nonstationary system, and may not be suitable for the many time-varying systems whose structure is unknown (Chen, Gong, & Hong, 2013). Although different approaches have also been investigated in Chen and Billings (1985), Chen and Billings


(1989), Choi and Bien (1989), Ding, Liu, and Chu (2013), Kingravi, Chowdhary, Vela, and Johnson (2012), Miranian and Abdollahzade (2013), Nordsjo and Zetterberg (2001), Schilling et al. (2001) and Sjöberg et al. (1995) for parameter estimation of nonlinear systems, only partial and quite weak results have been obtained in terms of time-varying function approximation and nonstationary parameter estimation.

The main contribution of this paper is the introduction of a new time-varying common-structured (TVCS) modelling scheme as a solution to the identification problem of nonlinear time-varying systems, where the selection of the common model structure is the critical step of the modelling procedure. A new efficient CMSS algorithm is investigated to select a common model structure using an online sliding-window approach. Once the common-structured model has been determined, the relevant time-varying parameters of the common-structured model can be estimated using a SWRLS algorithm. The study of common-structured model identification is particularly useful for engineering system design and control, where only a fixed common model structure is involved but with time-varying parameters.

The advantages of the proposed approach are twofold. First, even without a priori knowledge of the nonstationary system, a TVCS model can produce a less biased, or preferably unbiased, robust model with better generalisation properties than the 'hold-out' or 'split-sample' data-partitioning identification methods, where the available observed data are conventionally partitioned into two parts: the training data used for model identification and the test data used for model performance validation. Generally, the hold-out data-partitioning approach is very convenient, and the associated model identification procedure is easy to implement.
However, the model obtained from such a once-partitioned single training data-set may lack the robustness and generalisation needed to represent future unseen data, because the performance of the identified model may depend heavily on how the data partition is made (Hawkins, Basak, & Mills, 2003; Leonard & Roy, 2006). This problem is exacerbated when the system is time-varying. To overcome this drawback of the hold-out data-partitioning approach, a new common model structure selection (CMSS) algorithm is proposed in this paper to produce less biased, or preferably unbiased, robust models. In particular, when the sliding-window size is equal to the length of the available observed data, the new CMSS approach reduces to the hold-out data-partitioning method. Second, the proposed method can be used to adaptively track and rapidly capture transient variations of the time-varying parameters, and can also be applied to study the behaviour of the underlying dynamical system. Two examples, one based on simulation data and the other using a real EEG signal, are

given to show the effectiveness and applicability of the new TVCS modelling method using an online sliding-window approach. Individual models could be fitted to each window; however, in many applications, such as fault detection and biomedical engineering, the underlying system characteristics are better revealed by one common model over a series of windows. This more challenging problem is addressed in this paper in order to achieve a parsimonious identified TVCS model. The TVCS model discussed in the present study differs from the traditional multi-input multi-output model structure (Komninakis, Fragouli, Sayed, & Wesel, 2002), where each subsystem model need not share the same common model structure and which often involves one single data-set.

2. The time-varying linear-in-the-parameter regression model

The identification of a nonlinear dynamical system is based on the observed input–output data {u(t), y(t)}_{t=1}^{N}, where u(t) and y(t) are the observations of the system input and output, respectively (Chen & Billings, 1985). This study considers a class of discrete stochastic nonlinear systems that can be represented by the following nonlinear autoregressive with eXogenous inputs (NARX) structure (Chen & Billings, 1989; Chen, Wang, & Harris, 2008; Wei & Billings, 2002; Wei & Billings, 2009):

y(t) = f{y(t − 1), ..., y(t − n_y), u(t − 1), ..., u(t − n_u); θ} + e(t),   (1)

where u(t) and y(t) are the system input and output variables, respectively, n_u and n_y are the maximum input and output lags, respectively, f(·) is the unknown system mapping, and the observation noise e(t) is an uncorrelated zero-mean noise sequence, provided that the function f(·) gives a sufficient description of the system. X(t) = [y(t − 1), ..., y(t − n_y), u(t − 1), ..., u(t − n_u)]^T denotes the system 'input' vector with known dimension d = n_y + n_u, and θ is an unknown parameter vector. The NARX model (1) is a special case of the polynomial nonlinear autoregressive moving average with eXogenous inputs (NARMAX) model, which takes the form (Billings, 2013; Billings & Wei, 2005; Billings & Wei, 2007; Chen, Cowan, & Grant, 1991)

y(t) = f{y(t − 1), ..., y(t − n_y), u(t − 1), ..., u(t − n_u), e(t − 1), ..., e(t − n_e); θ} + e(t).   (2)

The NARMAX model in Equation (2) was developed and discussed in detail in Billings (2013) and Chen and Billings (1989). The nonlinear mapping f(·) can be constructed using a class of local or global basis functions including RBF,

neural networks, multi-wavelets and different types of polynomials (Billings & Wei, 2007; Chen & Billings, 1985; Chen & Billings, 1989; Chen et al., 2008; Chng, Chen, & Mulgrew, 1996; Li, Wei, & Billings, 2011; Peng, Ozaki, Haggan-Ozaki, & Toyoda, 2003; Wei & Billings, 2009; Zou et al., 2003). The polynomial representation of a nonlinear time-varying NARX model is

y(t) = θ_0(t) + Σ_{i1=1}^{d} θ_{i1}(t) x_{i1}(t) + Σ_{i1=1}^{d} Σ_{i2=i1}^{d} θ_{i1,i2}(t) x_{i1}(t) x_{i2}(t) + ··· + Σ_{i1=1}^{d} ··· Σ_{id=i(d−1)}^{d} θ_{i1,...,id}(t) x_{i1}(t) ··· x_{id}(t) + e(t),   (3)

where θ_0(t) and θ_{i1,...,im}(t), m = 1, 2, ..., d, are time-varying parameters and

x_k(t) = { y(t − k),          1 ≤ k ≤ n_y,
         { u(t − (k − n_y)),  n_y + 1 ≤ k ≤ d.   (4)

The degree of a multivariate polynomial is defined as the highest order amongst its terms. If the number of regressors is d and the maximum polynomial degree is λ, the number of polynomial terms is n_θ(t) = (λ + d)!/(λ! d!). For large lags n_y and n_u, the regression model in Equation (1) often involves a large number of candidate model terms, even if the nonlinear degree λ is not very high. For instance, if d is 10 and λ is 3, then n_θ(t) is 286. Modelling experience has shown that an initial candidate model with a large number of candidate model terms can often be drastically reduced by including in the final model only the selected significant model terms. The main motivation of this study is to select significant common-structured model terms to form a parsimonious common model structure which generalises well over a series of sliding windows; the underlying system characteristics can then be deduced from one common model (Aguirre & Billings, 1995; Billings & Wei, 2007).

A general form of the time-varying linear-in-the-parameter regression model is given as follows (Wei & Billings, 2009):

y(t) = Σ_{m=1}^{M} θ_m(t) φ_m(t) + e(t) = ϕ^T(t) Θ(t) + e(t),   (5)

where M is the total number of candidate regressors, φ_m(t) = φ_m(X(t)) (m = 1, 2, ..., M) are nonlinear functions, and θ_m(t) (m = 1, 2, ..., M) are the model time-varying parameters. ϕ(t) = [φ_1(X(t)), ..., φ_M(X(t))]^T and Θ(t) are the associated regressor and parameter vectors, respectively. Note that in most cases the initial full regression in Equation (5) might be highly redundant. Some of the regressors or model terms can be removed from the initial regression equation without any effect on the predictive capability of the model, and this elimination of redundant regressors usually improves the model performance (Aguirre & Billings, 1995; Billings, 2013; He, Wei, & Billings, 2013). For most nonlinear dynamical system identification problems, only a relatively small number of model terms are required in the regression model. Thus, an efficient model term selection algorithm is highly desirable to detect and select the most significant regressors (Billings, 2013; Billings & Wei, 2007; Chen et al., 1991).
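As a quick check on the combinatorics above, the candidate term count n_θ(t) = (λ + d)!/(λ! d!) is simply the binomial coefficient C(λ + d, d). A short sketch (the helper name is ours, not from the paper):

```python
from math import comb

def n_candidate_terms(d: int, degree: int) -> int:
    """Number of terms (constant included) in a polynomial of maximum
    degree `degree` built from d regressors: (degree + d)! / (degree! d!)."""
    return comb(degree + d, d)

print(n_candidate_terms(10, 3))  # the text's example: d = 10, lambda = 3 -> 286
print(n_candidate_terms(5, 2))   # the NAR model of Section 4.1 -> 21 terms
```

Even modest lags therefore produce hundreds of candidate terms, which is why an efficient term selection step matters.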

3. TVCS model identification The CMSS algorithm is a critical step in TVCS identification. Once the common-structured model has been identified, relevant model parameters for each window data-set can then be estimated by a SWRLS algorithm, and the transient properties of the model parameters on the associated data-set can, thus, be deduced. The identification procedure for TVCS models contains the following steps.

3.1. Data acquisition

For an original N-sample observational input–output data-set D_N = {u(t), y(t)}_{t=1}^{N}, K + 1 data-sets can be obtained using an online sliding window of size W with 50% overlap, where K + 1 = ⌈N/(W/2)⌉ − 1 and ⌈x⌉ denotes the upper integer part (ceiling) of x. How to choose a suitable window size W for time series segmentation is discussed in detail in Fu (2011) and Keogh, Chu, Hart, and Pazzani (2004).
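The segmentation rule above can be sketched as follows (the function name and index convention are illustrative, not from the paper):

```python
import math

def sliding_windows(n_samples: int, window: int):
    """Index ranges of 50%-overlapping windows of length `window` over
    [0, n_samples); their count is K + 1 = ceil(n_samples / (window / 2)) - 1."""
    step = window // 2
    k_plus_1 = math.ceil(n_samples / step) - 1
    return [(i * step, min(i * step + window, n_samples)) for i in range(k_plus_1)]

print(len(sliding_windows(900, 200)))   # Example 1 below: 8 data-sets
print(len(sliding_windows(3000, 600)))  # Example 2 below: 9 data-sets
```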

3.2. The common model structure selection (CMSS) algorithm

Assume that a total of K + 1 data-sets, obtained by the online sliding window from the same system, are available, where the first K are training data-sets and the last is used as a test data-set. Also assume that a common model structure of Equation (5) can best fit all the training data-sets. Denote the observed input–output sequences for the kth data-set by D_{N_k} = {u_k(t), y_k(t)}_{t=1}^{N_k} for k = 1, 2, ..., K + 1. The kth 'input' vector is then X_k(t) = [x_{k,1}(t), ..., x_{k,d}(t)]^T = [y_k(t − 1), ..., y_k(t − n_y), u_k(t − 1), ..., u_k(t − n_u)]^T. Assuming that all K training data-sets can be represented by a common model structure with different parameters, the initial candidate multiple regression model can be formulated as (Wei & Billings, 2009)

y_k(t) = Σ_{m=1}^{M} θ_{k,m} φ_m(X_k(t)) + e_k(t),   (6)


where the parameters θ_{k,m} in Equation (6) are time-independent constants; Equation (6) will be called the time-invariant common-structured model. If the parameters θ_{k,m} are time-dependent, the TVCS model is represented by

y_k(t) = Σ_{m=1}^{M} θ_{k,m}(t) φ_{k,m}(X_k(t)) + e_k(t),   (7)

where k = 1, 2, ..., K, m = 1, 2, ..., M, and t = 1, 2, ..., N_k. Equation (7) can be expressed in the compact matrix form


Υ_k = Φ_k Θ_k + Ξ_k,   (8)

where Υ_k = [y_k(1), ..., y_k(N_k)]^T, Θ_k = [θ_{k,1}(t), ..., θ_{k,M}(t)]^T, Ξ_k = [e_k(1), ..., e_k(N_k)]^T and Φ_k = [ϕ_{k,1}, ..., ϕ_{k,M}] with ϕ_{k,m} = [φ_{k,m}(1), ..., φ_{k,m}(N_k)]^T, for k = 1, 2, ..., K and m = 1, 2, ..., M.

A new CMSS algorithm will be developed to select a common-structured sparse model from the multiple regressions in Equations (6) and (7). Let I = {1, 2, ..., M}, and denote by D = {φ_m : m ∈ I} the dictionary of candidate model terms. For the kth window data-set, the dictionary D can be used to form a dual dictionary Φ_k = {ϕ_{k,m} : m ∈ I}; note that the mth candidate basis vector ϕ_{k,m} is formed by the mth candidate model term φ_m ∈ D. Thus, the CMSS problem is equivalent to finding a subset {φ_{p1}, φ_{p2}, ..., φ_{pn}} ⊂ D (normally n ≤ M) such that Υ_k (k = 1, 2, ..., K) can be approximated by a linear combination of the regression terms {ϕ_{k,p1}, ..., ϕ_{k,pn}} ⊂ Φ_k:

Υ_k = θ_{k,1}(t) ϕ_{k,p1} + ··· + θ_{k,n}(t) ϕ_{k,pn} + Ξ_k.   (9)

The CMSS algorithm then selects significant model terms in a forward stepwise manner. The first significant common model term is selected as the p1-th element, φ_{p1} ∈ D, by maximising the sum of the error reduction ratio (ERR) values over all K data-sets from I. Thus, the first significant basis vector for the kth regression model is α_{k,1} = ϕ_{k,p1}, and the associated orthogonal basis vector can be chosen as q_{k,1} = ϕ_{k,p1}. In general, the mth significant model term φ_{pm} is chosen as follows. Assume that at the (m − 1)th step, (m − 1) significant model terms {φ_{p1}, ..., φ_{p(m−1)}} have been selected by maximising the average ERR (AERR) over all K data-sets from I, which guarantees that these terms explain the highest percentage of the output variation across all K data-sets, compared with any other candidate model term φ ∈ D. The AERR criterion thus provides a way to select significant vectors one by one. Once the first (m − 1) basis vectors {α_{k,1}, ..., α_{k,m−1}} and the associated orthogonal vectors {q_{k,1}, q_{k,2}, ..., q_{k,m−1}} have been determined, the mth vector α_{k,m} = ϕ_{k,pm} and the associated orthogonal vector q_{k,m} can be selected in the same step-by-step manner. Further details of the CMSS procedure can be found in Wei, Lang, and Billings (2008).

To determine the proper common model size, the generalised cross-validation (GCV) criterion (Billings, Wei, & Balikhin, 2007) can be adopted to terminate the CMSS procedure. Specifically, for the l-term model, the GCV of a single regression model is defined as

GCV(l) = (N/(N − μl))² mse(l),

where mse is the mean-squared error, μ = max{1, ρN} and 0 ≤ ρ ≤ 0.01. The average GCV (AGCV) is formulated as

AGCV(l) = (1/K) Σ_{k=1}^{K} GCV[k](l),   (10)

where GCV[k](l) is the value of the GCV criterion associated with the kth data-set. If the AGCV reaches its minimum at l = n, the CMSS procedure is terminated, yielding an n-term model. Note that CMSS is only used to select the most significant model terms iteratively, step by step, based on multiple data-sets; AGCV is used to monitor the overall model length. A simple alternative to AGCV is the Akaike information criterion (Akaike, 1974), which can also be used to determine how many model terms should be included in the final model.
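A minimal sketch of the forward-selection idea above, with classical Gram–Schmidt orthogonalisation and the ERR averaged over the K data-sets. This is an illustration of the AERR criterion under our own naming, not the authors' implementation:

```python
import numpy as np

def cmss_select(Phis, ys, n_terms):
    """Forward selection of a common model structure over K data-sets.
    Phis: list of K (N_k x M) regressor matrices built from one shared
    dictionary of candidate terms; ys: list of K output vectors.
    At each step the candidate maximising the average error reduction
    ratio (AERR) across all data-sets is selected, and the remaining
    candidates are orthogonalised against it."""
    K, M = len(Phis), Phis[0].shape[1]
    Ws = [np.array(Phi, dtype=float) for Phi in Phis]  # working copies
    selected, aerrs = [], []
    for _ in range(n_terms):
        best_m, best_aerr = None, -1.0
        for m in range(M):
            if m in selected:
                continue
            # ERR of candidate m on data-set k: share of output energy
            # explained by its (already orthogonalised) mth column.
            aerr = 0.0
            for W, y in zip(Ws, ys):
                q = W[:, m]
                qq = q @ q
                if qq > 1e-12:
                    aerr += (q @ y) ** 2 / (qq * (y @ y))
            aerr /= K
            if aerr > best_aerr:
                best_m, best_aerr = m, aerr
        selected.append(best_m)
        aerrs.append(best_aerr)
        # Remove the selected direction from all remaining candidates.
        for W in Ws:
            q = W[:, best_m].copy()
            qq = q @ q
            if qq > 1e-12:
                W -= np.outer(q, (W.T @ q) / qq)
    return selected, aerrs
```

A search over the model size l, monitored by AGCV(l) as in Equation (10), would then decide where to stop.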

3.3. Model parameter estimation

The parameters of the common-structured model (7) can be calculated from

A_{k,n} = Q_{k,n} R_{k,n},   (11)

where A_{k,n} = [ϕ_{k,p1}, ..., ϕ_{k,pn}], Q_{k,n} is an N_k × n matrix with orthogonal columns q_{k,1}, ..., q_{k,n}, and R_{k,n} is an n × n unit upper triangular matrix whose entries are calculated during the orthogonalisation procedure. For a TVCS model of Equation (7), the unknown time-dependent parameters can easily be calculated by a SWRLS algorithm (Choi & Bien, 1989; Nishiyama, 2014) for each data window of the (K + 1) data-sets. The transient variation properties of the observational data can thus be deduced from the transient parameter values for the associated data-set.
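The window-wise estimation can be sketched as a rank-one RLS update for the sample entering the window plus a rank-one downdate for the sample leaving it. This is a simplified illustration (no forgetting factor, class and variable names ours), not the cited derivations:

```python
import numpy as np
from collections import deque

class SlidingWindowRLS:
    """Least-squares estimate of theta over the last `window` samples,
    maintained recursively: add the newest (phi, y) pair, then remove the
    oldest one once the window is full."""
    def __init__(self, n_params, window, delta=1e6):
        self.theta = np.zeros(n_params)
        self.P = delta * np.eye(n_params)  # inverse of the regularised normal matrix
        self.window = window
        self.buf = deque()

    def _rank_one(self, phi, y, sign):
        # sign = +1.0 adds the pair to the normal equations, -1.0 removes it
        # (Sherman-Morrison update/downdate of P).
        Pphi = self.P @ phi
        denom = 1.0 + sign * (phi @ Pphi)
        gain = Pphi / denom
        self.theta = self.theta + sign * gain * (y - phi @ self.theta)
        self.P = self.P - sign * np.outer(gain, Pphi)

    def update(self, phi, y):
        phi = np.asarray(phi, dtype=float)
        self.buf.append((phi, y))
        self._rank_one(phi, y, +1.0)
        if len(self.buf) > self.window:
            old_phi, old_y = self.buf.popleft()
            self._rank_one(old_phi, old_y, -1.0)
        return self.theta
```

On data generated by fixed parameters, the estimate converges to those parameters; for the TVCS models above, phi(t) would hold the selected common terms and θ(t) is read off window by window.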

4. Case study

Two examples are provided to illustrate the performance of the proposed TVCS model identification procedure. The data used in the first example are simulated from a known nonstationary model describing a severely nonstationary process; the objective here is to illustrate the capability of the novel TVCS approach to adaptively track and rapidly capture the transient variation of the time-varying parameters. The second example involves a practical modelling problem with real EEG data.


4.1. Example 1: simulation data

Prior to applying the proposed TVCS modelling approach to real EEG data, an EEG-like nonstationary signal was considered. The following signal was simulated:

y(t) = |t| sin(2πf_δ t),                                  t ∈ [0, 1),
       |t| sin(2πf_δ t + ϕ) + (|t|/2) sin(2πf_θ t + ϕ),   t ∈ [1, 2),
       |t| sin(2πf_θ t − ϕ),                              t ∈ [2, 3),
       |t| sin(2πf_α t + ϕ),                              t ∈ [3, 4),
       (3|t|/4) sin(2πf_α t + ϕ) + |t| sin(2πf_β1 t + ϕ), t ∈ [4, 5),
       |t| sin(2πf_β1 t + ϕ),                             t ∈ [5, 6),
       |t| sin(2πf_β2 t + ϕ),                             t ∈ [6, 7),
       (|t|/4) sin(2πf_β2 t − ϕ) + |t| sin(2πf_γ t − ϕ),  t ∈ [7, 8),
       |t| sin(2πf_γ t + ϕ),                              t ∈ [8, 9],
       0,                                                 otherwise,   (12)
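A sketch of how such a test signal could be generated. The band frequencies and phase follow the text, while the fractional amplitude factors (1/2, 3/4, 1/4) are our reading of the garbled original, and the noise level matches the stated variance of 0.04:

```python
import numpy as np

def eeg_like_signal(fs=100, seed=0):
    """EEG-like nonstationary test signal in the spirit of Equation (12):
    piecewise sums of amplitude-modulated sinusoids in the classic EEG
    bands, plus Gaussian noise with variance 0.04."""
    f_delta, f_theta, f_alpha = 4.0, 8.0, 12.0
    f_beta1, f_beta2, f_gamma = 15.0, 21.0, 48.0
    phase = 3.0 * np.pi / 8.0
    # Each entry: (start, end, [(amplitude factor, frequency, phase), ...]).
    segments = [
        (0, 1, [(1.0, f_delta, 0.0)]),
        (1, 2, [(1.0, f_delta, phase), (0.5, f_theta, phase)]),
        (2, 3, [(1.0, f_theta, -phase)]),
        (3, 4, [(1.0, f_alpha, phase)]),
        (4, 5, [(0.75, f_alpha, phase), (1.0, f_beta1, phase)]),
        (5, 6, [(1.0, f_beta1, phase)]),
        (6, 7, [(1.0, f_beta2, phase)]),
        (7, 8, [(0.25, f_beta2, -phase), (1.0, f_gamma, -phase)]),
        (8, 9, [(1.0, f_gamma, phase)]),
    ]
    t = np.arange(9 * fs) / fs          # sampling interval 0.01 -> 900 samples
    y = np.zeros_like(t)
    for t0, t1, comps in segments:
        mask = (t >= t0) & (t < t1)
        for a, f, p in comps:
            y[mask] += a * np.abs(t[mask]) * np.sin(2 * np.pi * f * t[mask] + p)
    noise = np.random.default_rng(seed).normal(0.0, 0.2, size=t.size)
    return t, y + noise
```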

where the frequency components f_δ = 4, f_θ = 8, f_α = 12 and f_γ = 48 Hz emulate the Delta, Theta, Alpha and Gamma bands of EEG recordings, and f_β1 = 15 and f_β2 = 21 Hz simulate the Beta band of the EEG signal. The sinusoids in Equation (12) have an initial phase ϕ = 3π/8. The signal was sampled with a sampling interval of 0.01, giving a total of 900 observations. An additive Gaussian white noise sequence, with zero mean and variance 0.04, was then added to the 900 data points. The objective is to identify a TVCS model; the transient dynamical properties of the analysed signal can then be deduced from the time-varying parameters. Denote the system output sequence by {y(t)}_{t=1}^{N}, with N = 900.

The sliding-window algorithm is attractive because of its intuitiveness, its simplicity and particularly the fact that it is an online algorithm, which makes it especially popular in the medical community, since patient monitoring is inherently an online task (Keogh et al., 2004; Vullings, Verhaegen, & Verbruggen, 1997). Titov and McDonald (2008) used a sliding window with overlaps to build a multi-grain model, as it was computationally less expensive and obtained good results. To produce a less biased, or preferably unbiased, robust model with better generalisation properties, and to further improve the capability of tracking and capturing transient variations of time-varying systems with abruptly changing parameters, a TVCS model approach employing an online sliding-window algorithm is proposed in this study.

Figure 1. AGCV versus model size for common model structure selection over the output signals.

A suitable window length W for the identification of a CMSS model is decided by the normalised root-mean-squared error (RMSE) defined in Equation (14) below. The identification performance of the algorithm depends on the length of the sliding window: for time-invariant systems, in general, the longer the window, the higher the estimation accuracy. However, for a time-varying system with abrupt parameter changes, to rapidly track and capture the transient variations of the abruptly changed parameters, the window length should be chosen so that the RMSE value is very small. Here, from the properties of the simulated signal and a number of simulation experiments with different sliding-window sizes, the window length was chosen as W = 200, with 50% overlap, which gives good identification results and a much smaller RMSE value. Applying the sliding-window size W = 200 to the artificial EEG time-varying signal of Equation (12) yields K + 1 = ⌈N/(W/2)⌉ − 1 = 8 data-sets. The first seven training data-sets are used for the common-structured model identification, and the eighth data-set is used to test the performance of the identified model. The predictor vector for all the common-structured models is chosen to be X(t) = [x_1(t), x_2(t), ..., x_5(t)]^T, where x_k(t) = y(t − k) for k = 1, 2, ..., 5. The initial common structure for all seven training data-sets is chosen to be the NAR model

y(t) = θ_0 + Σ_{i=1}^{5} θ_i x_i(t) + Σ_{i=1}^{5} Σ_{j=i}^{5} θ_{i,j} x_i(t) x_j(t) + e(t).   (13)

This candidate model involves a total of 21 candidate model terms. Based on this candidate common model structure, the novel CMSS algorithm was applied to the first seven training data-sets. The AGCV criterion, shown in Figure 1, suggests that a common model structure with six model terms is preferred. The six selected common model terms,

Table 1. Identification results for the simulation data with the CMSS algorithm for the NAR model representation.

                        Parameters for test data-sets
Step  Model term   Data 01   Data 02   Data 03   Data 04   Data 05   Data 06   Data 07   AERR (%)
1     y(t−3)       −0.0936   0.1223    −0.2970   −0.4126   −0.4773   −0.5545   −0.8738   45.9962
2     y(t−1)       1.1444    1.3349    1.5732    1.0466    0.4353    0.6967    −0.8617   38.3098
3     y(t−4)       −0.4808   −0.1992   0.2144    0.2663    −0.1697   0.3607    −0.1393   1.3839
4     y(t−2)       −0.0818   −0.6228   −0.8507   −0.7316   −0.4164   −1.0391   −0.1472   4.6122
5     y(t−5)       0.1849    −0.0381   −0.0183   −0.1523   −0.1009   −0.7545   0.5286    0.8536
6     y²(t−3)      0.0125    0.0215    0.0248    0.0009    0.0015    0.0054    0.0007    1.4080

Note: RMSE = 0.4493.


Figure 2. A comparison of the recovered signal from the identified TVCS model (15) and the original observations. Solid (blue) line indicates the observations and the dashed line indicates the signal recovered from the TVCS model (15). The sampling index interval [10, 200] above corresponds to [7.1, 9] (time scale).

Figure 3. The time-varying coefficient estimates for the test data-set of the NAR identified common-structured model in Equation (15), using the SWRLS algorithm with a forgetting factor of 0.995.

ranked in order of significance, are shown in Table 1. Now consider the performance of the identified model, whose parameters are determined by Equation (11); the estimated parameters are given in Table 1. The eighth, test, data-set is used to assess the performance of the identified model. Figure 2 presents a comparison between the signal recovered from the identified common-structured model and the original measurements. To assess the identified models, the normalised RMSE is defined as follows:

RMSE = √( (1/N_{K+1}) Σ_{t=1}^{N_{K+1}} ‖ŷ(t) − y(t)‖² / ‖y(t)‖² ),   (14)

where N_{K+1} is the sliding-window length of the (K + 1)th (test) data-set and ŷ(t) is the predicted value from the identified model. The RMSE criterion of Equation (14) can also be used to select a proper sliding-window size
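The criterion normalises each squared prediction error by the squared observation; a small sketch of our reading of Equation (14) (which assumes no observation is exactly zero):

```python
import numpy as np

def normalised_rmse(y_hat, y):
    """Normalised RMSE: square root of the mean of
    |y_hat(t) - y(t)|^2 / |y(t)|^2 over the test window."""
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.mean((y_hat - y) ** 2 / y ** 2)))

print(normalised_rmse([1.0, 2.0], [2.0, 4.0]))  # 0.5: each term contributes (1/2)^2
```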

W provided that the RMSE value is very small. The value for RMSE, for the identified models, over the test window data-set, was calculated as 0.4493. Clearly, the identified model provides a very good presentation for the test data-set. The TVCS model was, thus, represented by y (t) =

5 

θi (t) y (t − i) + θ6 (t) y 2 (t − 3) + e (t),

i=1

(15) where the parameter θ (t) depends on the data-sets from the sliding window. The parameters can be directly estimated using the SWRLS algorithm. Figure 3 shows the estimated values for θ (t) for the test data-set given in Figure 2 using the SWRLS algorithm with a forgetting factor of 0.995. Note that the sampling index interval [10, 200] in Figure 2 corresponds to [7.1, 9] (time scale). The estimates of timevarying coefficients in Figure 3 can give more transient information; for example, there is one clear abrupt change

International Journal of Systems Science 30

4.2. Example 2: modelling EEG data 4.2.1. EEG data-sets

20

EEG series provides an illustrative analysis to highlight key features of the methodology. Scalp EEG series are synchronous discharges from cerebral neurons detected by electrodes attached to the scalp. An XLTEK 32 channel headbox (Excel-Tech Ltd) with the international 10–20 electrode placement system was applied to record EEG data in the Sheffield Teaching Hospitals NHS Foundation Trust, Royal Hallamshire Hospital, United Kingdom. Thirty-two parallel EEG series were recorded in parallel from 32 electrodes located on epileptic seizure patient’s scalp using the same-32 channel amplifier system using bipolar montages reference channels. The sampling frequency of the device was 500 Hz. Dynamical properties of brain electrical activity from different extracranial and intracranial recording regions have been discussed in Andrzejak et al. (2001) and Li, Wei, Billings, and Liao (2012). The time-frequency decomposition method aided by the time-varying autoregressive model for EEG series to extract and estimate latent EEG components in various key frequency bands was also investigated in Li, Wei, Billings, and Sarrigiannis (2011) and West, Prado, and Krystal (1999). The central objective of the present paper for the EEG signals is to propose an empirical and data-based TVCS modelling scheme to track and capture the transient variations of EEG signals from model identification that can produce an accurate but simple description of the dynamical relationships between different recording regions during brain activity. This is a complicated and challenging black box system where the true model structure is unknown, and thus, needs to be identified from the available experimental data. As an example, symmetrical two bipolar channels (F3, located over the left superior frontal area of the brain and F4, located over the same area on the right) of EEG recorded from a patient with absence seizure epileptic discharge are investigated. 
Channel F3 was treated as the input, denoted by u (t) , and Channel F4 was treated as the output, denoted by y (t); note that Channel F3 is the signal input and Channel F4 is the signal output, the main reason is that the phase of Channel F4 is related to the phase of Channel F3. The objective is to investigate, from the available Channel F3 and F4 recordings, if an identified TVCS model is suitable to describe the dynamical characteristics and adaptively track and capture the transient variations of time-varying parameters using the proposed approach. The input–output EEG signals of N = 3000 data points pairs of one seizure, which are for a sort of epileptic seizure activity of a patient, with a sampling rate of 500 Hz, recording during 6 s, were analysed in this example. This analysis represents the first application of a TVCS modelling to epileptic seizure EEG data and is intended to determine feasibility and identify its potential for tracking transient variations over time in seizure EEG data.

Downloaded by [Beihang University] at 21:54 01 May 2016

Figure 4. The simulation training data block output for the fourth window block data-set. The sample index interval [10, 200] corresponds to the time interval [3.1, 5] s.

Figure 5. The estimated time-varying coefficients θ_1(t) to θ_6(t) of the identified common-structured NAR model in Equation (15) for the training data-set given in Figure 4, using the SWRLS algorithm with a forgetting factor of 0.995.

of the estimated coefficients at a sample index of about 100, which implies a jump of the transient frequency component from fβ2 to fγ and shows that the original signal given in Figure 2 undergoes transient changes. Furthermore, the proposed method can also track and detect the variation of each training data block dynamically; for instance, Figure 5 shows a rapid change of the coefficient estimates at a sample index of about 100, which clearly implies a transient change of the component frequency from fβ1 to fβ2 and indicates that the corresponding original training data block given in Figure 4 changed at about that sample index. These results demonstrate that the CMSS algorithm is effective.
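The jump detection described above (a rapid change of the coefficient estimates near sample index 100) can be automated by thresholding the first differences of an estimated coefficient trajectory. The helper name change_points and the threshold choice below are illustrative, not from the paper:

```python
import numpy as np

def change_points(theta_traj, thresh):
    """Flag sample indices where an estimated coefficient trajectory
    jumps abruptly, by thresholding the absolute first difference.
    `theta_traj` is a 1-D array of estimates theta_i(t); `thresh` is a
    user-chosen jump threshold (an assumption, not from the paper)."""
    d = np.abs(np.diff(theta_traj))
    # +1 so the returned index points at the sample *after* the jump
    return np.flatnonzero(d > thresh) + 1
```

Applied to a trajectory that steps from 0 to 1 at sample 100, the routine returns index 100, mirroring the behaviour read off Figure 5.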


Y. Li et al.

4.2.2. TVCS model identification
Similar to the previous simulation example, the objective is to identify a TVCS model which can be used to analyse and detect the transient variation properties of the EEG signals and to dynamically track and capture their variations. Simulation results showed that a sliding-window size of W = 600 data points gives good identification results; the parameter K + 1 was therefore equal to 9. The first eight data-sets were treated as training data-sets for the model identification, and the ninth data-set was then used to test the performance of the identified model. Denote the system input and output sequence by D_N = {u(t), y(t)}_{t=1}^{N} with N = 3000 data pairs. The predictor vector for all the common-structured models was chosen to be X(t) = [x_1(t), ..., x_{10}(t)]^T, where x_k(t) = y(t − k) for k = 1, 2, ..., 5 and x_k(t) = u(t − k + 5) for k = 6, 7, ..., 10. The initial candidate common model structure for all eight training data-sets was chosen to be the NARX model below:

y(t) = θ_0 + Σ_{i=1}^{10} θ_i x_i(t) + Σ_{i=1}^{10} Σ_{j=i}^{10} θ_{i,j} x_i(t) x_j(t) + e(t),    (16)

This candidate model involves a total of 66 candidate model terms. Based on this candidate common model structure, the new CMSS algorithm was applied to the eight training data-sets to identify a TVCS model. The AGCV index, shown in Figure 6, suggests that a common model structure with eight model terms is preferred. The eight selected common model terms, ranked in order of significance, are shown in Table 2. The common model structure for the eight training data-sets was identified as

y(t) = θ_0 + Σ_{i=1}^{4} θ_i y(t − i) + θ_5 u(t − 1) + θ_6 y(t − 5) u(t − 1) + θ_7 y(t − 5) u(t − 5) + e(t),    (17)

Figure 6. The AGCV versus model size for the CMSS-produced model for the input-output EEG signal modelling problem.

The corresponding coefficient estimates of Equation (17) for each training window block data-set are given in Table 2. Figure 7 presents a comparison between the model-predicted output and the associated original measurements; the normalised RMSE with respect to the test data-set was calculated to be 0.2755. Clearly, the TVCS model provides an excellent representation of the test data-set. The common-structured model in Equation (17) is then employed to form a TVCS model,

y(t) = θ_0(t) + Σ_{i=1}^{4} θ_i(t) y(t − i) + θ_5(t) u(t − 1) + θ_6(t) y(t − 5) u(t − 1) + θ_7(t) y(t − 5) u(t − 5) + e(t),    (18)

where θ_i(t) (i = 0, 1, ..., 7) are now time-dependent parameters, which can then be estimated by using the SWRLS

Table 2. Identification results for the EEG data with the CMSS algorithm for the NARX model representation (parameters estimated for the eight training data-sets).

Step  Model term     Data 01   Data 02   Data 03   Data 04   Data 05   Data 06   Data 07   Data 08   AERR (%)
1     y(t−1)          1.9406    1.8059    1.8513    1.7979    1.4649    1.2945    1.3323    1.7458   97.1551
2     y(t−2)         −1.3804   −1.2920   −1.3627   −1.2635   −0.7640   −0.4772   −0.5121   −1.1687    1.0140
3     y(t−3)          0.6155    0.6372    0.6665    0.5496    0.3049    0.2344    0.2097    0.5779    0.0688
4     y(t−4)         −0.2304   −0.2151   −0.2299   −0.1315   −0.0542   −0.0761   −0.0250   −0.1549    0.0381
5     y(t−5)u(t−1)   −0.0001   −0.0001   −0.0001   −0.0002   −0.0003   −0.0003   −0.0005   −0.0003    0.0149
6     Const          −2.9959   −6.7214   −9.5099   −5.8357   −5.3187   −2.9343    1.6203   −0.3837    0.0138
7     u(t−1)          0.0283    0.0344    0.0433    0.0291    0.0288    0.0079   −0.0164   −0.0049    0.0371
8     y(t−5)u(t−5)    0.0001    0.0001    0.0001    0.0002    0.0004    0.0004    0.0005    0.0004    0.0351

Note: RMSE = 0.2755.
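For illustration, a one-step-ahead predictor of Equation (17) can be evaluated directly from the tabulated coefficients. The sketch below uses the Data 01 column of Table 2; the mapping of table rows to model terms follows the row order printed in the table and should be checked against the original typeset version, and the helper name predict is illustrative:

```python
# Coefficients for the first training window (Data 01, Table 2);
# the row-to-term mapping is a reconstruction from the flattened table.
COEF = {"y1": 1.9406, "y2": -1.3804, "y3": 0.6155, "y4": -0.2304,
        "y5u1": -0.0001, "const": -2.9959, "u1": 0.0283, "y5u5": 0.0001}

def predict(y, u, t, c=COEF):
    """One-step-ahead prediction of Equation (17) at time t, using the
    lagged outputs y(t-1)..y(t-5) and inputs u(t-1), u(t-5)."""
    return (c["const"]
            + c["y1"] * y[t - 1] + c["y2"] * y[t - 2]
            + c["y3"] * y[t - 3] + c["y4"] * y[t - 4]
            + c["u1"] * u[t - 1]
            + c["y5u1"] * y[t - 5] * u[t - 1]
            + c["y5u5"] * y[t - 5] * u[t - 5])
```

With all lagged signals zero the prediction reduces to the constant term, which is a quick sanity check on the term mapping.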

International Journal of Systems Science

Note that polynomial models may be intrinsically unstable in some cases (Ozaki, Sosa, & Haggan, 1999) if a full model is used directly in a naive manner. The proposed approach selects only the most significant polynomial terms, which helps to avoid such instability problems. This is why term selection is so important in nonlinear system identification.

Figure 7. A comparison of the recovered signal from the identified TVCS model (18) and the original observations over the test data-set. The solid (blue) line indicates the observations and the dashed line the signal recovered from the TVCS model; the vertical axis is the EEG amplitude in μV and the horizontal axis the sample index.
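The normalised RMSE value of 0.2755 quoted for the test data-set can be computed with a short routine. The paper does not spell out its normalisation convention, so normalising the RMSE by the standard deviation of the measurements is an assumption in this sketch:

```python
import numpy as np

def normalised_rmse(y, y_hat):
    """Normalised RMSE between measurements y and model-predicted
    output y_hat; normalised here by the standard deviation of y
    (the normalisation convention is an assumption, not stated in
    the paper)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.sqrt(np.mean((y - y_hat) ** 2)) / np.std(y)
```

A perfect prediction gives 0, and the value grows as the prediction error approaches the scale of the signal itself.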

Figure 8. The time-varying coefficient estimates θ_0(t) to θ_7(t) of the nonlinear common-structured model in Equation (18) for the ninth EEG test data-set, using the SWRLS algorithm with a forgetting factor of 0.98.

algorithm. The associated parameter estimates are shown in Figure 8. The parameter estimates of the time-varying model obtained here, in combination with an extension of the concept of generalised frequency response functions (Billings & Jones, 1990; He et al., 2013; Jones & Billings, 1989), can be used to form nonlinear parametric time-frequency formulas, which can in turn generate nonlinear time-frequency properties useful for feature extraction from EEG data. A detailed discussion of the nonlinear parametric time-frequency method will be given in a separate future study, owing to the nature and complexity of such an algorithm.
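A minimal sketch of the forgetting-factor recursive least squares update that underlies the SWRLS estimates shown in Figures 5 and 8 is given below. The sliding-window step (discarding samples older than W) is omitted, so this is a simplified illustration of the tracking mechanism rather than the paper's exact SWRLS algorithm; the function name and initialisation constant are assumptions:

```python
import numpy as np

def rls_forgetting(Phi, y, lam=0.98, delta=100.0):
    """Recursive least squares with forgetting factor lam, tracking the
    time-varying coefficients theta_i(t) of a fixed model structure.
    Phi is the (N x p) regressor matrix, y the output sequence.
    Returns the (N x p) trajectory of parameter estimates."""
    N, p = Phi.shape
    theta = np.zeros(p)
    P = delta * np.eye(p)                      # large initial covariance
    history = np.empty((N, p))
    for t in range(N):
        phi = Phi[t]
        k = P @ phi / (lam + phi @ P @ phi)    # gain vector
        theta = theta + k * (y[t] - phi @ theta)
        P = (P - np.outer(k, phi @ P)) / lam   # covariance update
        history[t] = theta
    return history
```

With λ < 1, past data are exponentially down-weighted, which is precisely the forgetting operation on the information matrix that gives the estimator its tracking ability.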

5. Conclusions

The application of the new CMSS approach involves two critical steps: model structure selection and model parameter estimation. When the CMSS algorithm is applied to model structure selection, a multiple regression search procedure is performed over a number of partitioned data-sets using an online sliding-window approach. At first sight, the implementation of a multiple search appears very complex, but the introduction of the new multiple orthogonal regression search algorithm provides an attractive solution to this problem. It should be noted that the computational complexity of the CMSS algorithm depends on the number of block data-sets K, where K depends on the sampled data length N and the sliding-window size W. The choice of the sliding-window size W depends on the properties of the observational data.

The true model structure of the underlying system will in many cases be unknown, with only the input and output observations available, but the algorithms derived in this study show that a common model structure can be deduced from the available observations. In the two examples, polynomial models were employed to form the common-structured models. Note, however, that the CMSS approach can also be applied to any other parametric or nonparametric modelling problem where the initial full model can be written in a linear-in-the-parameters form.

Once a common model structure has been obtained, an online SWRLS algorithm is applied to estimate the time-varying model parameters. Other online methods, for example a sliding-window Kalman filtering algorithm, can also be employed to estimate the unknown time-varying parameters. While the tracking ability of RLS is achieved by performing a forgetting operation on the information matrix, the tracking capability of Kalman filtering is obtained by adding a nonnegative definite matrix to the covariance matrix.
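The relationship between K, N and W noted above can be made concrete. For the EEG example, N = 3000 and W = 600 yield K + 1 = 9 windows if consecutive windows overlap by half a window; the 50% overlap (step = W/2) is an assumption here, inferred from the quoted numbers rather than stated explicitly:

```python
def num_windows(N, W, step=None):
    """Number of sliding windows K+1 obtainable from N samples with
    window size W.  A 50% overlap (step = W//2) reproduces the K+1 = 9
    windows quoted for N = 3000, W = 600; the step size is an
    assumption, not stated in the paper."""
    if step is None:
        step = W // 2
    return (N - W) // step + 1
```

Non-overlapping windows (step = W) would instead give 5 blocks for the same N and W, which is why the overlap assumption matters when reproducing the partitioning.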
RLS was employed in the present study mainly because of its simple computation and good convergence properties, but this does not exclude the introduction of a Kalman filtering algorithm in future comparative studies. The TVCS model can be applied to analyse and reveal the transient properties of nonstationary signals, and to dynamically track and capture the transient variations of nonstationary EEG signals. The main focus of the present study at this stage is nonlinear time-varying parametric modelling, which forms the basis of some



important developments for further medical applications, including EEG data modelling, feature extraction and classification analysis. For example, the time-varying parameters estimated from the TVCS model not only provide transient local information about the EEG signals, but can also be applied in nonlinear time-dependent parametric spectral analysis in the frequency domain to extract further features from the EEG signals, enabling further applications in EEG data analysis.

Acknowledgements
The authors would like to thank the editor Robert Smith and three anonymous reviewers for their valuable comments and constructive suggestions in improving the presentation of the paper.


Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This work was supported in part by the National Natural Science Foundation of China [61403016]; the Engineering and Physical Sciences Research Council, United Kingdom; the European Research Council; the Specialized Research Fund for the Doctoral Program of Higher Education [20131102120008]; the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry; and the Fundamental Research Funds for the Central Universities.

References
Aguirre, L.A., & Billings, S.A. (1995). Dynamical effects of overparametrization in nonlinear models. Physica D: Nonlinear Phenomena, 80, 26–40.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723.
Andrzejak, R.G., Lehnertz, K., Mormann, F., Rieke, C., David, P., & Elger, C.E. (2001). Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state. Physical Review E, 64, 061907.
Billings, S.A. (2013). Nonlinear system identification: NARMAX methods in the time, frequency, and spatio-temporal domains. Chichester: Wiley & Sons.
Billings, S.A., & Jones, J.C.P. (1990). Mapping non-linear integro-differential equations into the frequency domain. International Journal of Control, 52, 863–879.
Billings, S.A., & Wei, H.L. (2005). The wavelet-NARMAX representation: A hybrid model structure combining polynomial models with multiresolution wavelet decompositions. International Journal of Systems Science, 36, 137–152.
Billings, S.A., & Wei, H.L. (2007). Sparse model identification using a forward orthogonal regression algorithm aided by mutual information. IEEE Transactions on Neural Networks, 18, 306–310.
Billings, S.A., Wei, H.L., & Balikhin, M.A. (2007). Generalized multiscale radial basis function networks. Neural Networks, 20, 1081–1094.
Chen, B.D., Zhao, S.L., Zhu, P.P., & Principe, J.C. (2013). Quantized kernel recursive least squares algorithm. IEEE Transactions on Neural Networks, 24, 1484–1491.
Chen, H., Gong, Y., & Hong, X. (2013). Online modeling with tunable RBF network. IEEE Transactions on Cybernetics, 43, 935–947.
Chen, S., & Billings, S.A. (1985). Input-output parametric models for non-linear systems, part I: Deterministic non-linear systems. International Journal of Control, 41, 303–328.
Chen, S., & Billings, S.A. (1989). Representations of non-linear systems: The NARMAX model. International Journal of Control, 49, 1013–1032.
Chen, S., Cowan, C.F.N., & Grant, P.M. (1991). Orthogonal least squares learning algorithm for radial basis function networks. IEEE Transactions on Neural Networks, 2, 302–309.
Chen, S., Wang, X.X., & Harris, C.J. (2008). NARX-based nonlinear system identification using orthogonal least squares basis hunting. IEEE Transactions on Control Systems Technology, 16, 78–84.
Chng, E.S., Chen, S., & Mulgrew, B. (1996). Gradient radial basis function networks for nonlinear and nonstationary time series prediction. IEEE Transactions on Neural Networks, 7, 190–194.
Choi, B.Y., & Bien, Z. (1989). Sliding-windowed weighted recursive least-squares method for parameter estimation. Electronics Letters, 25, 1381–1382.
Ding, F., Liu, X.G., & Chu, J. (2013). Gradient-based and least-squares-based iterative algorithms for Hammerstein systems using the hierarchical identification principle. IET Control Theory and Applications, 7, 176–184.
Djigan, V.I. (2006). Multichannel parallelizable sliding window RLS and fast RLS algorithms with linear constraints. Signal Processing, 86, 776–791.
Fu, T.C. (2011). A review on time series data mining. Engineering Applications of Artificial Intelligence, 24, 164–181.
Hawkins, D.M., Basak, S.C., & Mills, D. (2003). Assessing model fit by cross-validation. Journal of Chemical Information and Computer Sciences, 43, 579–586.
He, F., Wei, H.L., & Billings, S.A. (2013). Identification and frequency domain analysis of non-stationary and nonlinear systems using time-varying NARMAX models. International Journal of Systems Science. doi:10.1080/00207721.2013.860202
Jiang, J., & Zhang, Y. (2004). A revisit to block and recursive least squares for parameter estimation. Computers & Electrical Engineering, 30, 403–416.
Jones, J.C.P., & Billings, S.A. (1989). Recursive algorithm for computing the frequency response of a class of non-linear difference equation models. International Journal of Control, 50, 1925–1940.
Keogh, E., Chu, S., Hart, D., & Pazzani, M. (Eds.). (2004). Data mining in time series databases: Segmenting time series - a survey and novel approach. Singapore: World Scientific.
Kingravi, H.A., Chowdhary, G., Vela, P.A., & Johnson, E.N. (2012). Reproducing kernel Hilbert space approach for the online update of radial bases in neuro-adaptive control. IEEE Transactions on Neural Networks and Learning Systems, 23, 1130–1141.
Komninakis, C., Fragouli, C., Sayed, A.H., & Wesel, R.D. (2002). Multi-input multi-output fading channel tracking and equalization using Kalman estimation. IEEE Transactions on Signal Processing, 50, 1065–1076.
Leonard, J.T., & Roy, K. (2006). On selection of training and test sets for the development of predictive QSAR models. QSAR & Combinatorial Science, 25, 235–251.
Li, Y., Wei, H.L., & Billings, S.A. (2011). Identification of time-varying systems using multi-wavelet basis functions. IEEE Transactions on Control Systems Technology, 19, 656–663.
Li, Y., Wei, H.L., Billings, S.A., & Liao, X.F. (2012). Time-varying linear and nonlinear parametric model for Granger causality analysis. Physical Review E, 85, 049908.
Li, Y., Wei, H.L., Billings, S.A., & Sarrigiannis, P.G. (2011). Time-varying model identification for time-frequency feature extraction from EEG data. Journal of Neuroscience Methods, 196, 151–158.
Ljung, L., & Söderström, T. (1983). Theory and practice of recursive identification. Cambridge: MIT Press.
Miranian, A., & Abdollahzade, M. (2013). Developing a local least-squares support vector machines-based neuro-fuzzy model for nonlinear and chaotic time series prediction. IEEE Transactions on Neural Networks and Learning Systems, 24, 207–218.
Nishiyama, K. (2014). Time-varying AR spectral estimation using an indefinite matrix-based sliding window fast linear prediction. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E97A, 547–556.
Nordsjo, A.E., & Zetterberg, L.H. (2001). Identification of certain time-varying nonlinear Wiener and Hammerstein systems. IEEE Transactions on Signal Processing, 49, 577–592.
Ozaki, T., Sosa, P.A.V., & Haggan, O.V. (1999). Reconstructing the nonlinear dynamics of epilepsy data using nonlinear time series analysis. Journal of Signal Processing, 3, 153–162.
Peng, H., Ozaki, T., Haggan-Ozaki, V., & Toyoda, Y. (2003). A parameter optimization method for radial basis function type models. IEEE Transactions on Neural Networks, 14, 432–438.
Schilling, R.J., Carroll, J.J., & Al-Ajlouni, A.F. (2001). Approximation of nonlinear systems with radial basis function neural networks. IEEE Transactions on Neural Networks, 12, 1–15.
Sjöberg, J., Zhang, Q., Ljung, L., Benveniste, A., Delyon, B., Glorennec, P.-Y., ... Juditsky, A. (1995). Nonlinear black-box modeling in system identification: A unified overview. Automatica, 31, 1691–1724.
Titov, I., & McDonald, R. (2008). Modeling online reviews with multi-grain topic models. In Proceedings of the 17th international conference on World Wide Web. New York, NY: ACM.
Vullings, H., Verhaegen, M., & Verbruggen, H.B. (1997). ECG segmentation using time-warping. In Advances in Intelligent Data Analysis: Reasoning about Data. Berlin: Springer.
Wei, H.L., & Billings, S.A. (2002). Identification of time-varying systems using multiresolution wavelet models. International Journal of Systems Science, 33, 1217–1228.
Wei, H.L., & Billings, S.A. (2009). Improved model identification for non-linear systems using a RSMM approach. International Journal of Control, 82, 27–42.
Wei, H.L., Lang, Z.Q., & Billings, S.A. (2008). Constructing an overall dynamical model for a system with changing design parameter properties. International Journal of Modelling, Identification and Control, 5, 93–104.
West, M., Prado, R., & Krystal, A.D. (1999). Evaluation and comparison of EEG traces: Latent structure in nonstationary time series. Journal of the American Statistical Association, 94, 1083–1095.
Zou, R., Wang, H.L., & Chon, K.H. (2003). A robust time varying identification algorithm using basis functions. Annals of Biomedical Engineering, 31, 840–853.