Chaos Theory, Optimal Embedding and Evolutionary Algorithms

Vladan Babovic
Maarten Keijzer
Magnus Stefansson

January 19, 2001

Abstract

Constructing models from time series with non-trivial dynamics is a difficult problem. The classical approach is to build a model from first principles and use it to forecast on the basis of the initial conditions. Unfortunately, this is not always possible. For example, in fluid dynamics a perfect model in the form of the Navier-Stokes equations exists, but initial conditions are difficult to obtain. In other cases, a good model may not exist. In either case, alternative approaches should be examined. In this contribution a method inspired by chaos theory for building non-linear models from data – Local Linear Models (LLMs) – is discussed and described.

This work was in part funded by the Danish Technical Research Council (STVF) under the Talent Project N 9800463 entitled "Data to Knowledge – D2K". More information on the project can be obtained through http://www.d2k.dk

Contents

1 Introduction
2 Background
  2.1 Linear Models
    2.1.1 Moving average (MA) models
    2.1.2 Autoregressive (AR) models
    2.1.3 Autoregressive – Moving average (ARMA) models
  2.2 Embedding Theorem
3 Time Series Characterisation
  3.1 Fourier Analysis and Autocorrelation Function
  3.2 Information-theoretic Characterisation
    3.2.1 Average Mutual Information
    3.2.2 Choice of Optimal Time Delays
  3.3 Embedding Dimension – False Nearest Neighbours
  3.4 Invariants of Motion
    3.4.1 Lyapunov exponents
    3.4.2 Fractal Dimension
  3.5 Local Properties of the Dynamics
    3.5.1 Local Dynamical Dimension
4 Prediction
  4.1 Local Modelling
    4.1.1 Local Linear Models
    4.1.2 The Embedding Recipe
  4.2 Evolutionary Embedding
    4.2.1 Evolutionary Algorithms
    4.2.2 Evolutionary Embedding – The Idea
    4.2.3 Results
5 Conclusions

Chapter 1

Introduction

Many of the important aspects of analysing dynamical systems are addressed through the study of observable variables of the system as a function of time. However, it may be argued that not enough attention is given to the representation of the data obtained from field or laboratory experiments. As a result, observations which appear random under a time series representation are in most cases discarded as noise. Recent developments in non-linear dynamics have demonstrated that irregular or random behaviour in natural systems may arise from purely deterministic dynamics with unstable trajectories. Even though some observations might appear random, underneath their random behaviour may lie an order or pattern. Such non-linear dynamical systems, which are also highly sensitive to initial conditions, are popularly known as chaotic systems.

Chaotic systems comprise a class of signals that lies between predictable periodic or quasi-periodic signals and totally irregular stochastic signals which are completely unpredictable. The dynamical source of chaotic signals is not properly representable by one-dimensional observations in the time domain; instead, the dynamics takes place in a phase space of vectors of larger dimension. Such a phase space is, however, viewed projected down onto the time axis of observed variables, for example as a time series of water levels. The phase space along which the chaotic dynamics evolve can be reconstructed from time series of observations by utilising a method known as embedding. Major advances have been made not only in the construction of the geometry of the phase space but also in the study of the dynamics in the phase space. One of the many applications that follow from the reconstruction of the dynamics on the phase space is time series prediction.

This paper explores two different techniques based on the principles of chaos theory for the reconstruction of the chaotic dynamics on the phase space. In the first instance, a more-or-less standard 'recipe' (3) for embedding of time series into phase space is described and applied. In the second instance, an approach based on evolutionary algorithms was developed and subsequently applied to find parameters for optimal embedding. Finally, local linear modelling (11) based on these two different sets of embedding parameters has been used to establish a model of the evolving dynamics.


Chapter 2

Background

Measurements of a physical or a biological system result in a time series $x(t) = \{x(t_0 + n\tau_s)\}$, sampled at intervals $\tau_s$ and initiated at $t_0$. The purpose of forecasting is to predict the state of a system $\mathbf{x}(t) \equiv \{x_t, x_{t-1}, x_{t-2}, x_{t-3}, \ldots, x_{t-d}\}$ at a time-horizon $T$ in the future, $x(t + T)$.

2.1 Linear Models

One standard way of generating models capable of forecasting is to use globally linear models, such as moving average (MA), autoregressive (AR) or autoregressive moving average (ARMA) models. Application of these models is classic (almost old-fashioned); however, for the purposes of completeness, a quick overview of these globally linear paradigms is presented in the sequel.

2.1.1 Moving average (MA) models

A moving average model is a classical convolution filter: the new series $x(t)$ is generated by a linear filter with coefficients $\{b_0, b_1, \ldots, b_N\}$ modifying an external input series $e_t \equiv \{e_0, e_1, e_2, \ldots, e_N\}$:

$$x(t) = \sum_{n=1}^{N} b(n)\, e(t - n\tau_s) \tag{2.1}$$

Such a model is referred to in the statistical community as an $N$th-order moving average model, MA(N), whereas in the engineering community it is referred to as a finite impulse response (FIR) filter, since its output is guaranteed to converge to zero $N$ time steps after the input (or the forcing term) becomes zero. The behaviour of an MA(N) model is usually characterised in terms of the discrete Fourier transform of its impulse response (see (15)):

$$X(f) = \sum_{n=0}^{N} b_n e^{-i 2\pi n f} \tag{2.2}$$

The power spectrum is provided by convention as

$$|X(f)|^2 = \Big| \sum_{n=0}^{N} b_n e^{-i 2\pi n f} \Big|^2 \tag{2.3}$$

Autocorrelation coefficients of the output are defined in terms of the mean of the observed output, $\mu = \langle x_t \rangle$, and the variance $\sigma^2 = \langle (x_t - \mu)^2 \rangle$:

$$\rho_\tau \equiv \frac{1}{\sigma^2} \langle (x_t - \mu)(x_{t-\tau} - \mu) \rangle \tag{2.4}$$

where the angular brackets $\langle \cdot \rangle$ denote expected values.

2.1.2 Autoregressive (AR) models

The $M$th-order autoregressive model, AR(M), is conventionally defined as follows:

$$x(t) = \sum_{n=1}^{M} a(n)\, x(t - n\tau_s) + e(t) = \hat{x}(t) + e(t) \tag{2.5}$$

Depending on the purpose, $e(t)$ can either represent a controlled input to the system or noise. The coefficients $a(m)$ of the AR model are found by solving the so-called Yule-Walker equations

$$\rho_\tau = \sum_{m=1}^{M} a_m\, \rho_{\tau - m}, \qquad \tau > 0 \tag{2.6}$$

The Yule-Walker set of linear equations (2.6) allows the autocorrelation coefficients of a time series to be expressed in terms of the AR coefficients that generated it. The Yule-Walker equations also allow the coefficients of an AR(M) model to be established from the observed correlations of the observed signal $x(t)$. The power spectrum of the AR(M) impulse response is given as:

$$|X(f)|^2 = \frac{1}{\big| 1 - \sum_{m=1}^{M} a_m e^{i 2\pi m f} \big|^2} \tag{2.7}$$
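As a concrete illustration, the following minimal sketch estimates AR(M) coefficients from data by forming sample autocorrelations and solving the Yule-Walker system (2.6); it assumes a NumPy environment, and the function name is illustrative rather than taken from the report.

```python
import numpy as np

def fit_ar_yule_walker(x, M):
    """Estimate AR(M) coefficients a(1..M) by solving the Yule-Walker
    system rho_tau = sum_m a_m * rho_{tau-m} for tau = 1..M."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    var = np.dot(x, x) / len(x)
    # Sample autocorrelation coefficients rho_0 .. rho_M.
    rho = np.array([np.dot(x[: len(x) - k], x[k:]) / (len(x) * var)
                    for k in range(M + 1)])
    # Toeplitz system R a = r with R[i, j] = rho_|i-j|.
    R = np.array([[rho[abs(i - j)] for j in range(M)] for i in range(M)])
    return np.linalg.solve(R, rho[1:])
```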

2.1.3 Autoregressive – Moving average (ARMA) models

The next natural step in complexity is to introduce both the AR and the MA components into the model; this provides what is commonly referred to as an ARMA(M, N) model:

$$x(t) = \sum_{m=1}^{M} a(m)\, x(t - m) + \sum_{n=1}^{N} b(n)\, e(t - n) \tag{2.8}$$

Models of the ARMA kind have dominated the field of time series analysis and forecasting for more than half a century, as they transform the signal into a small number of coefficients plus residual white noise. However, the appealing simplicity of linear models can be entirely misleading even when simple non-linearities occur. It has been shown (see, for example, (15)) that the power spectra of globally linear models as discussed above and the related autocorrelation coefficients contain the same information about a system driven by uncorrelated white noise. Thus, if and only if the power spectrum fully characterises the relevant features of a time series will a linear model (AR, MA or ARMA) be the appropriate description. At the same time, it is widely known that two time series can have very similar broadband spectra yet be generated by systems with very different properties, such as a linear system that is driven stochastically by external noise and a deterministic (noise-free) non-linear system with a small number of degrees of freedom. Therefore, in order to forecast non-linear systems (let alone systems that exhibit chaotic behaviour), a different class of models is called upon.

2.2 Embedding Theorem

Yule's (28) original idea for forecasting was that future predictions can be generated by using the immediately preceding values. An ARMA model (2.8) can then be rewritten as follows:

$$x_t = \mathbf{a} \cdot \mathbf{x}_{t-1} + \mathbf{b} \cdot \mathbf{e}_t \tag{2.9}$$

where $\mathbf{b} = \{b_0, b_1, \ldots, b_N\}$, $\mathbf{e} = \{e_0, e_1, \ldots, e_N\}$, $\mathbf{a} = \{a_0, a_1, \ldots, a_M\}$ and $\mathbf{x} = \{x_0, x_1, \ldots, x_M\}$ denote lag vectors, also referred to as tapped delay lines. There is a profound connection between such time-lagged vectors and the underlying dynamics. This connection was first proposed as the so-called Time-Delay Embedding Theorem by Packard; Takens published the first formal demonstration of such a connection, which was later strengthened by Sauer (24). The Time-Delay Embedding Theorem can be stated as follows:

Given a dynamical system with a $d$-dimensional solution space and an evolving solution $\mathbf{h}(t)$, let $x$ be some observation $x(\mathbf{h}(t))$. Let us also define the lag vector (with dimension $d$ and common time lag $T$) $\mathbf{x}(t) \equiv (x_t, x_{t-T}, x_{t-2T}, x_{t-3T}, \ldots, x_{t-(d-1)T})$. Then, under very general conditions, the space of vectors $\mathbf{x}(t)$ generated by the dynamics contains all of the information of the space of solution vectors $\mathbf{h}(t)$. The mapping between them is smooth and invertible. This property is referred to as diffeomorphism, and this kind of mapping is referred to as an embedding. Thus, the study of the time series $x(t)$ is also the study of the solutions of the


underlying dynamical system $\mathbf{h}(t)$ via a particular coordinate system given by the observable $x$.

Thus, delay vectors of a sufficient length are not just representations of the states of a sequence of superimposable linear systems; it turns out that delay vectors can recover the full geometrical structure of the underlying non-linear system. These results address the general problem of inferring the behaviour of the intrinsic degrees of freedom when a function of the state of the system is measured.

The state of a dynamical system at any time can be specified by a state-space vector whose coordinates are the independent degrees of freedom of the system. Generally, the number of first-order differential equations describing the system determines the number of independent components in $\mathbf{h}(t)$. The embedding theorem establishes that, when there is only a single measured quantity from a dynamical system, it is possible to reconstruct a state space that is equivalent to the original (but unknown) state space composed of all the dynamical variables.

The embedding theorem (25) states that if the system produces orbits in the original state space that lie on a geometric object of dimension $d_A$ (which need not be an integer), then the object can be unambiguously seen, without any spurious intersections of the orbit, in another space of integer dimension $d > 2 d_A$ comprised of coordinates that are arbitrary transformations of the original state-space coordinates. The absence of intersections in the second space implies that the orbit is resolved without ambiguity when $d$ is large enough. Overlaps of the orbit may occur in lower dimensions, and the ambiguity at the intersections destroys the possibility of predicting the evolution of the system.

In a dissipative system, the geometric object to which orbits converge in time is referred to as the system attractor. If $d_A$ is not an integer, it is referred to as a strange attractor (after Ruelle, (23)) and the system is chaotic. It is on the attractor, and in the motion of system orbits on it, not in the projection of those motions onto the observation axis of the measurements, that prediction, classification, control and other signal-processing tasks can be carried out without ambiguity. That is, it is necessary to go to a higher-dimensional space to do signal processing when the signal comes from a non-linear system that may exhibit irregular, chaotic motion. Almost any set of $d$ coordinates is equivalent by the embedding theorem; each set is a different way of unfolding the attractor from its projection onto the observations.

Formally, an autonomous system producing orbits $\mathbf{h}(t)$ through the dynamics is:

$$\frac{d\mathbf{h}(t)}{dt} = \mathbf{F}(\mathbf{h}(t)) \tag{2.10}$$

and the output is $x(t) = f(\mathbf{h}(t))$. Here $\mathbf{h}$ is an $n$-dimensional vector, and $x(t)$ is typically a one-dimensional output signal. With mild restrictions on the choice of functions $\mathbf{F}(\mathbf{h})$ and $f(\mathbf{h})$, any independent set of quantities related to $x(t)$ can serve as the coordinates of a state space for the system. Time derivatives of $x(t)$ are a natural choice for the set of independent coordinates. However, when the signal is sampled in discrete time, the derivatives act as high-pass filters

$$\dot{x}(t) = \frac{x(t_0 + (t+1)\tau_s) - x(t_0 + t\,\tau_s)}{\tau_s} \tag{2.11}$$

and therefore emphasise errors and noise in the measurements. Equation (2.11) suggests an alternative set of coordinates for the state space. The signal $x(t)$ and its time delays are the ingredients in the approximations to the time derivatives of $x(t)$; the time-delayed values of $x(t)$ represent the new information that enters the approximation of each derivative. Using the observed signal and its time delays avoids the emphasis on errors and noise associated with high-pass-filter approximations of the time derivatives, and requires no computations on the observations themselves. This set of coordinates is realised by forming the vectors

$$\mathbf{x}(t) = \{x(t),\ x(t - \tau_1),\ x(t - \tau_2),\ \ldots,\ x(t - \tau_{d_E - 1})\} \tag{2.12}$$

where a $d_E$-dimensional space is constructed and the components of the vector $\mathbf{x}(t)$ are separated by times $\{\tau_i\}$. One can further simplify the situation by adopting the scheme $\tau_1 = T$, $\tau_2 = 2T$, $\tau_3 = 3T$, and so on (after (3)). Nevertheless, to perform optimal embedding, the time series needs to be properly characterised, and the most appropriate time delay $T$, as well as the number of components $d_E$ in the vector $\mathbf{x}(t)$, need to be identified. The appropriate methods of characterisation of time series are the subject of the next chapter.
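To make the construction in (2.12) concrete, the following is a minimal sketch of uniform-lag delay embedding, assuming a NumPy environment (the function name is illustrative, not from the report); each row of the result is one reconstructed phase-space point, and later sketches in this report reuse this helper.

```python
import numpy as np

def delay_embed(x, d_E, T):
    """Stack delay vectors (x(t), x(t-T), ..., x(t-(d_E-1)T)) as rows,
    one row per admissible time t = (d_E-1)*T, ..., len(x)-1."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (d_E - 1) * T          # number of reconstructed points
    return np.column_stack([x[(d_E - 1 - j) * T : (d_E - 1 - j) * T + n]
                            for j in range(d_E)])
```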

Chapter 3

Time Series Characterisation

3.1 Fourier Analysis and Autocorrelation Function

The Fourier power spectrum provides a convenient tool for identifying regularities in observed signals. A broadband and continuous power spectrum without any dominant frequency indicates that the signal has possibly originated from a chaotic system. When a signal $x(t)$ can be represented as a superposition of sine waves with different amplitudes, its characteristics can be adequately described by Fourier coefficients of amplitude and phase. In such cases, linear models (as described in section 2.1) and Fourier-based methods for extracting information are appropriate and powerful. However, it is important to understand that the Fourier power spectrum is not itself a true invariant of the system's behaviour, because new frequencies may appear to be introduced with non-linear changes of the coordinate system. It is also important to understand that multi-harmonic power spectra do not necessarily indicate that the system is chaotic; such power spectra provide only an indication that the system might be of chaotic origin. Systems with a large number of degrees of freedom can generate similar power spectra.

An autocorrelation of a periodic signal produces a periodic function. For a chaotic or a random signal the autocorrelation function will approach zero rapidly. This too is a good indication that the system may be either chaotic or random in nature. Both Figure 3.1 and Figure 3.2 indicate that the time series of errors of a deterministic model of the Venice lagoon may indeed be characterised as chaotic. Therefore, alternative invariants of motion need to be examined in order to confirm the hypothesis of the chaotic nature of the flows.

Figure 3.1: Fourier Power Spectrum

Figure 3.2: Autocorrelation

3.2 Information-theoretic Characterisation

The upshot of the discussion in section 2 is that, for non-linear systems, characterisation of time series on the basis of the classical statistical paradigm is not sufficient. In order to expand the explanatory apparatus, the standard information theory originated by Khinchin (21) and Shannon (26) is used. A very brief discussion of information theory is presented in the sequel.

3.2.1 Average Mutual Information

The experimental process consists of performing measurements to obtain information. Consider an experiment $A$ with possible outcomes $A_1, A_2, A_3, \ldots, A_n$. If the respective probabilities are $p(A_1), p(A_2), p(A_3), \ldots, p(A_n)$, the uncertainty of the outcome can be assessed. If, for example, all $p(A_i)$ are zero except one, there is no uncertainty in the outcome and there is no point in performing the experiment, as no information can be gained by performing it. Such a system is a deterministic one. If, on the other hand, all outcomes are equi-probable, the uncertainty of the outcome is at its maximum and the information gained by carrying out the experiment is also maximal. Thus, if one carries out an experiment whose possible outcomes are described by a given scheme $A = \{[A_1, p(A_1)], [A_2, p(A_2)], [A_3, p(A_3)], \ldots, [A_n, p(A_n)]\}$, then in doing so one obtains information, and the uncertainty of the outcome is eliminated. It can therefore be said that the information received by an observer is equal in magnitude to the uncertainty which existed before the experiment: the larger the uncertainty, the larger the amount of information obtained by removing it. Following this closely, the information obtained by a measurement of the outcome of a finite scheme $A$ can be expressed through the corresponding entropy $H(A)$ (see again (21)):

$$H(A) = -\sum_{i=1}^{n} p(A_i) \log_2 p(A_i) \tag{3.1}$$

This describes the average information gained by performing the experiment and obtaining a measurement. The definition (3.1) can be generalised to continuous variables, i.e.

$$H(A) = -\int_{-\infty}^{+\infty} p(x) \log_2 p(x)\, dx \tag{3.2}$$

In order to determine higher-order relationships, it is necessary to introduce higher-order measures. If measurements are collected from two schemes $A \equiv \{[A_1, p(A_1)], [A_2, p(A_2)], \ldots, [A_n, p(A_n)]\}$ and $B \equiv \{[B_1, p(B_1)], [B_2, p(B_2)], \ldots, [B_n, p(B_n)]\}$, the mutual information $I(A; B)$ is a measure of how much can be said about the one given the other.


$$I(A;B) = \sum_{j=1}^{m} \sum_{i=1}^{n} p(A_i, B_j) \log_2 \frac{p(A_i, B_j)}{p(A_i)\, p(B_j)} \tag{3.3}$$

$$I(A;B) = \sum_{j=1}^{m} \sum_{i=1}^{n} p(A_i, B_j) \log_2 p(A_i, B_j) - \sum_{j=1}^{m} \sum_{i=1}^{n} p(A_i, B_j) \log_2 p(A_i) - \sum_{j=1}^{m} \sum_{i=1}^{n} p(A_i, B_j) \log_2 p(B_j) \tag{3.4}$$

The sum of $p(A_i, B_j)$ over the $A_i$ simply leaves $p(B_j)$, since $\sum_i p(A_i) = 1.0$. Therefore, equation (3.4) can be simplified into:

$$I(A;B) = \sum_{j=1}^{m} \sum_{i=1}^{n} p(A_i, B_j) \log_2 p(A_i, B_j) - \sum_{i=1}^{n} p(A_i) \log_2 p(A_i) - \sum_{j=1}^{m} p(B_j) \log_2 p(B_j) \tag{3.5}$$

$$I(A;B) = H(A) + H(B) - H(A,B) \tag{3.6}$$

Here $H(A,B)$ refers to the information obtained by considering $A$ and $B$ together:

$$H(A,B) = H_B(A) + H(B) \tag{3.7}$$

in which $H_B(A)$ denotes conditional entropy, the entropy of $A$ given $B$. If $A$ and $B$ are independent, the terms $H_B(A)$ and $H(A)$ become equal, reducing $H(A,B)$ to $H(A) + H(B)$ and finally implying that the mutual information between $A$ and $B$ amounts to zero: $I(A;B) = 0$. It should also be noted that $I(A;B) \geq 0\ \forall A, B$, so that there are no negative values as in the case of the more familiar autocorrelation function.

Within the context of time series analysis, one may refer to $A$ as, say, an observation $x(t)$ at time $t$ and $B$ as an observation $x(t - \tau)$ at some other time $t - \tau$. Expression (3.6) can be trivially extended to $I(A; B; C; \ldots)$ as a function of the related delays $\tau_A, \tau_B, \tau_C, \ldots, \tau_{d_E}$. Some form of correlation function between measurements $x(t)$ and $x(t + T)$ is needed to describe their non-linear dependence on each other. Because of the way the signal conveys information to the observer, a natural correlation function is the mutual information, as described above and originating with Shannon (26). The question is then one of selecting time delays $\tau_A, \tau_B, \ldots, \tau_{d_E}$ that provide the delayed coordinates $\mathbf{x}(t) \equiv \{x_t, x_{t-\tau_A}, x_{t-\tau_B}, x_{t-\tau_C}, \ldots, x_{t-\tau_{d_E}}\}$ and a mapping from the one-dimensional time domain to the $d_E$-dimensional phase space that yields the maximum information gain over the experimental space.

function is mutual information, as described above and originating by Shannon (26). The question of selecting time delays A ; B ; : : : ; dE that provide the delayed co-ordinates x(t)  fxt ; xt A ; xt B ; xt C ; : : : xt dE 1 ; xt dE g and mapping from one dimensional time domain to dE dimensional phase space providing the maximum information gain over the experimental space. 3.2.2

Choice of Optimal Time Delays

The choice of optimal time delays fi g corresponds to optimal embedding. Such a choice should be made such that each component of the vector is providing a new information about the signal source at a given time. The dynamical di erence between the components is achieved by the evolution of the signal source over a time fi g. These times must be large enough so that all dynamical degrees of freedom coupled to the variables x(t) have the opportunity to in uence the value of x(t). At the same time i must be small enough so that inherent instabilities of nonlinear systems do not contaminate the measurements at a later time, i+T . The embedding theorem provides assurance that, were there an in nite amount of in nitely accurate data, any i would work, and these concerns would vanish. However, since the amount of data is usually nite and of a nite precision, here are practical concerns of a serious nature that need to be addressed. The fundamental issue is that there must be a balance between values of i that are too small (where each component of the vector does not add signi cant new information about the dynamics), and values of i that are too large. Large values of i create uncorrelated elements in x(t) because of instabilities in the nonlinear system that may manifest over time. Thus, the components of x(t) will become independent and would convey no knowledge about the system dynamics. When creating the embedding vector T it is recommended to use the value of time delay T at which I (T ) goes through the rst minimum (13; 12). With such a choice, the embedding vector becomes x(s ) = [x(s ); x(s T ); x(s 2T ); : : : ; x(s (d 1)T )] where

 T = some common time lag  s = time sampling interval  d = dimension of the phase space Such a choice of the rst minimum of I (T ) is a prescription, not a rigorous xed choice. In general, the prescription provided by the average mutual information often appears to yield a shorter time delay than the rst zero crossing of the autocorrelation function. The danger in using the longer time delays is that the components of the vector may become independent of each other and subsequent calculations may not be valid. 14
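The following minimal sketch implements the first-minimum prescription with a simple histogram estimate of $I(T)$ in bits; it assumes NumPy, and the bin count and function names are illustrative choices rather than prescriptions from the report.

```python
import numpy as np

def average_mutual_information(x, lag, bins=32):
    """Histogram estimate of I(T) between x(t) and x(t+lag), in bits."""
    x = np.asarray(x, dtype=float)
    a, b = x[:-lag], x[lag:]
    p_ab, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab /= p_ab.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal of x(t)
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal of x(t+lag)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log2(p_ab[mask] / (p_a @ p_b)[mask])))

def first_minimum_lag(x, max_lag=100):
    """First lag at which I(T) attains a local minimum (the prescription)."""
    ami = [average_mutual_information(x, T) for T in range(1, max_lag + 1)]
    for i in range(1, len(ami) - 1):
        if ami[i] < ami[i - 1] and ami[i] < ami[i + 1]:
            return i + 1                     # lags are 1-based
    return int(np.argmin(ami)) + 1           # fall back to the global minimum
```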

Figure 3.3: Average Mutual Information

3.3 Embedding Dimension – False Nearest Neighbours

The embedding theorem provides a sufficient dimension in which the orbits of the system no longer cross each other and the temporal sequence of points is resolved without ambiguity. The data may not require this large a dimension, and, following a different line of reasoning, another method to choose the number of coordinates may be adopted (20). For any point on the attractor $\mathbf{x}(t)$, one can ask whether its nearest neighbour in a state space of dimension $d$ is there for dynamical reasons, or whether it is instead projected into the neighbourhood because the embedding dimension is not appropriate. That same neighbour is then examined in embedding dimension $d+1$ by simply adding another coordinate to $\mathbf{x}(t)$ using the time-delay construction. If the nearest neighbour is a true neighbour, it will remain a neighbour in this larger space. If that nearest neighbour is there because embedding dimension $d$ is too small, it will move away from $\mathbf{x}(t)$ as the dimension increases. When the number of false nearest neighbours drops to zero, the attractor has been unambiguously unfolded, because the crossings of the orbits have been eliminated. The dimension in which false nearest neighbours disappear is the dimension necessary for viewing the data. This minimum embedding dimension $d_E$ is less than or equal to the sufficient dimension specified by the embedding theorem. The global embedding dimension denotes the dimension of the (in this case non-linear) dynamics beyond which apparent neighbours stop being unprojected by the addition of further components to $\mathbf{x}(t)$, i.e. there is no further unfolding of the attractor.

Suppose, for example, that we have a vector in the phase space constructed as $\mathbf{x}(s) = [x(s), x(s+T), x(s+2T), \ldots, x(s+(d-1)T)]$, and let its nearest neighbour be $\mathbf{x}'(s) = [x'(s), x'(s+T), x'(s+2T), \ldots, x'(s+(d-1)T)]$. If the distance between the points becomes large in dimension $d+1$ compared to that in dimension $d$, then we have a false neighbour. The square of the Euclidean distance between the nearest neighbouring points in dimension $d$ is:

$$R_d^2(s) = \sum_{m=1}^{d} \left[ x(s + (m-1)T) - x'(s + (m-1)T) \right]^2 \tag{3.8}$$

$$R_{d+1}^2(s) = \sum_{m=1}^{d+1} \left[ x(s + (m-1)T) - x'(s + (m-1)T) \right]^2 \tag{3.9}$$

$$\phantom{R_{d+1}^2(s)} = R_d^2(s) + \left| x(s + dT) - x'(s + dT) \right|^2 \tag{3.10}$$

The distance between the points, when seen in dimension $d+1$, relative to the distance in dimension $d$, is:

$$\sqrt{\frac{R_{d+1}^2(s) - R_d^2(s)}{R_d^2(s)}} = \frac{|x(s + dT) - x'(s + dT)|}{R_d(s)} \tag{3.11}$$

where, again, if this quantity is larger than some threshold value, we have a false neighbour. From experiments, a value of 15 gives a good approximation to the threshold. Another criterion compares $|x(s + dT) - x'(s + dT)|$ to $R_A$, where $R_A$ is the nominal radius of the attractor, taken as the RMS deviation about the mean.

False nearest neighbours are the result of using too small an embedding dimension. An ambiguity results where the trajectory crosses itself; at such a crossing, and in the absence of any other information, one cannot tell which path the trajectory should follow. As the embedding dimension increases from $d = 1$ to $d = 2, \ldots$, the percentage of false nearest neighbours should go to zero at the dimension where the attractor is globally unfolded (20). This is the minimum global embedding dimension $d_E$. A residual of false nearest neighbours is caused by contamination of the data by a high-dimensional signal that is conveniently called "noise". Noisy data have non-zero false nearest neighbours because the attractor is blurred. In these cases, as the embedding dimension $d$ increases, the noisy components start dominating. There is therefore a strong motivation not to over-embed signals polluted by noise.
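A brute-force numerical sketch of the false-nearest-neighbour test of equation (3.11), assuming NumPy (the names and the O(n²) neighbour search are illustrative simplifications):

```python
import numpy as np

def false_nearest_fraction(x, d, T, threshold=15.0):
    """Fraction of nearest neighbours in dimension d that become false
    when a (d+1)-th delay coordinate is added (criterion of eq. (3.11))."""
    x = np.asarray(x, dtype=float)
    n = len(x) - d * T                    # points that survive in d+1 dims
    vecs = np.column_stack([x[j * T : j * T + n] for j in range(d)])
    false = 0
    for i in range(n):
        dist = np.linalg.norm(vecs - vecs[i], axis=1)
        dist[i] = np.inf                  # exclude the point itself
        j = int(np.argmin(dist))          # nearest neighbour in dimension d
        extra = abs(x[i + d * T] - x[j + d * T])
        if dist[j] > 0 and extra / dist[j] > threshold:
            false += 1
    return false / n
```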


Figure 3.4: False Nearest Neighbours

3.4 Invariants of Motion

3.4.1 Lyapunov exponents

Non-linear systems that exhibit irregular behaviour may be chaotic, which implies that the system is unstable on its attractor and within the basin of attraction for that attractor. The orbits of such systems are said to be "sensitive to initial conditions", which in turn implies that nearby trajectories diverge at an exponential rate. This exponential rate, and hence the predictability of the evolution of the system, is described by the largest Lyapunov exponents of the system.

The Lyapunov exponents determine the stability of the system. They are indicators of the growth of perturbations as a function of the number of steps $L$ advanced along the attractor from an initial perturbation. Lyapunov exponents indicate, on average, how well a prediction could be made of the evolution of the system $L$ steps ahead of the present location. Since chaotic systems are extremely sensitive to initial conditions, Lyapunov exponents provide a good test for chaos.

Global Lyapunov exponents are computed by studying the separation of two points, $a_0$ and $b_0$, on two trajectories after $n$ iterations. That is:

$$\lambda = \lim_{n \to \infty}\ \lim_{|a_0 - b_0| \to 0}\ \frac{1}{n} \log \left| \frac{a_n - b_n}{a_0 - b_0} \right| \tag{3.12}$$

Given two initial conditions for a chaotic system, $a_0$ and $b_0$, which are close to each other, the values obtained in the successive iterations for $a$ and $b$ will differ by an exponentially increasing amount: if two points are initially separated by a small distance $\varepsilon$, then on average their separation will grow as $\varepsilon e^{\lambda n}$.

Predictability is quantified by examining the behaviour of a small perturbation $\varepsilon(0)$ within an orbit of a system $\mathbf{x}(t)$. Assuming a dynamical rule $\mathbf{x}(t+1) = \mathbf{F}(\mathbf{x}(t))$, the Jacobian matrix of the dynamics is defined as

$$[D\mathbf{F}(\mathbf{x})]_{ab} = \frac{\partial F_a(\mathbf{x})}{\partial x_b}, \qquad a, b = 1, 2, 3, \ldots, d \tag{3.13}$$

and $D\mathbf{F}^t(\mathbf{x})$ is the composition of $t$ Jacobians calculated at the iterates of a starting point $\mathbf{x}$. The dynamical rule can then be given as

$$\mathbf{x}(t+1) + \varepsilon(t+1) = \mathbf{F}(\mathbf{x}(t) + \varepsilon(t)) = \mathbf{F}(\mathbf{x}(t)) + D\mathbf{F}(\mathbf{x}(t)) \cdot \varepsilon(t) + O(\varepsilon(t)^2) \tag{3.14}$$

or, in the limit as $\varepsilon(t) \to 0$,

$$\varepsilon(t+1) = D\mathbf{F}(\mathbf{x}(t)) \cdot \varepsilon(t) = D\mathbf{F}(\mathbf{x}(t)) \cdot D\mathbf{F}(\mathbf{x}(t-1)) \cdots D\mathbf{F}(\mathbf{x}(0)) \cdot \varepsilon(0) = D\mathbf{F}^t(\mathbf{x}(0)) \cdot \varepsilon(0) \tag{3.16}$$

The stability of the system under the small perturbation $\varepsilon(0)$ is determined by the eigenvalues of the matrix $D\mathbf{F}^t(\mathbf{x})$ after $L$ samples along the orbit, following the perturbation. Oseledec (22) analysed this eigenvalue problem. The matrix

$$OSL(L, \mathbf{x}) = \left[ D\mathbf{F}^L(\mathbf{x}) \cdot D\mathbf{F}^L(\mathbf{x})^T \right]^{1/2L} \tag{3.19}$$

is well defined because the product of $D\mathbf{F}^L$ and its transpose is symmetric and positive semi-definite. As $L$ becomes large ($L \to \infty$), the eigenvalues of this matrix have been shown to exist for almost all $\mathbf{x}$ along an orbit and to be independent of $\mathbf{x}$. The eigenvalues are unaltered under smooth transformations of the coordinate system, which in turn implies that the dynamical system can be classified by these eigenvalues. The eigenvalues take the values $e^{\lambda_1}, e^{\lambda_2}, \ldots, e^{\lambda_d}$ in a $d$-dimensional system, where by convention $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$ are the global Lyapunov exponents (4).

The $\lambda_i$ are evaluated by numerically determining $D\mathbf{F}(\mathbf{x})$ locally in the state space from the local predictive maps; the linear term in these maps gives $D\mathbf{F}(\mathbf{x})$. The composition of the Jacobians can be diagonalised by a recursive decomposition. The values of the Lyapunov exponents $L$ samples forward (or backward) in time after a perturbation of $\mathbf{x}$ can be determined from (3.19). These are the local Lyapunov exponents $\lambda_a(L, \mathbf{x})$ (4; 1; 17), and their average over many initial conditions $\mathbf{x}$ is denoted $\bar{\lambda}_a(L)$. The $\bar{\lambda}_a(L)$ converge to the global exponents $\lambda_i$ as a power of $L$. It is useful to display $\bar{\lambda}_a(L)$ as a function of $L$ to identify the zero exponent, if it exists, as well as to characterise the general trend of all exponents.
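As a numerical aside, the defining limit (3.12) can be checked on a map with a known exponent. The sketch below (assuming NumPy; parameter values illustrative) estimates the largest exponent of the logistic map at $r = 4$, for which $\lambda_1 = 1$ bit per iteration:

```python
import numpy as np

def largest_lyapunov_logistic(r=4.0, n=10000, eps=1e-9):
    """Estimate the largest Lyapunov exponent (bits/iteration) of the
    logistic map x -> r*x*(1-x) by following two nearby trajectories and
    renormalising their separation after every step (cf. eq. (3.12))."""
    a, b = 0.4, 0.4 + eps
    total = 0.0
    for _ in range(n):
        a, b = r * a * (1 - a), r * b * (1 - b)
        d = abs(a - b)
        total += np.log2(max(d, 1e-300) / eps)   # local stretching rate
        b = a + (eps if b >= a else -eps)        # renormalise the separation
    return total / n
```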

Positive Lyapunov exponents indicate that neighbouring points are diverging; this implies the onset of chaos. A zero exponent indicates that neighbouring points remain at the same distance, and that the system can be modelled as a set of linear differential equations. Negative Lyapunov exponents indicate that neighbouring points are converging and the system is dissipative. The largest Lyapunov exponent can also be used to establish a window of predictability of a system in the time domain:

$$Predictability = \frac{\tau_s}{\lambda_1} \tag{3.20}$$

where

- $\tau_s$ = time sampling interval
- $\lambda_1$ = largest Lyapunov exponent

Lyapunov exponents are a measure of information loss per unit time, expressed in bits. They are average measures and do not address the distribution of information throughout the phase space. If any of the exponents $\lambda_i$ is positive, then the orbit is unstable. Dissipation within the dynamics requires $\sum_{i=1}^{d} \lambda_i < 0$, so that volume in the state space shrinks as time progresses. This implies that the manifestation of the instability remains a compact geometric object within the state space. If any of the $\lambda_i$ are greater than zero, then the dynamical system is chaotic by definition. If one of the $\lambda_i$ is zero, then the dynamical system producing the signal can be described by a set of ordinary differential equations. If the sum of all exponents is greater than zero, the results are meaningless; this can occur when the signal is non-stationary, when the trajectories are not coherent over some reasonable period of time, or when the signal migrates through the state space instead of remaining in a bounded region. A positive sum of all exponents may also result when the system is stochastic.

The Lyapunov exponents are perhaps the most important quantifying measure of chaotic motion. Of all the classifying quantities, they are the only ones that are truly sensitive to changes in dynamics, which often occur without a change in the dimension. Figure 3.5 shows Lyapunov exponents for the present case study. Positive exponents suggest that the trajectories are diverging exponentially, which clearly indicates that the system is chaotic. Negative exponents suggest that a dissipative mechanism exists within the system. The presence of four exponents also indicates that there are four degrees of freedom, which in turn implies that the behaviour of the system could be described by four differential equations. The sum of all four Lyapunov exponents is in this case negative, and the attractor is indeed a compact geometrical structure.

3.4.2 Fractal Dimension

The attractors associated with chaotic dynamics have a fractional dimension, in contrast to regular, or integrable, systems, which have an integer dimension.

Figure 3.5: Lyapunov Exponents

The fractal dimension of a dynamical system is related to the way in which the points of an embedded time series are distributed in a $d$-dimensional space, where $d$ is the global embedding dimension. The phase space of a chaotic system stretches and folds and re-maps into the original space. This re-mapping leaves gaps in the phase space, which in turn causes the orbits to fill a subspace of the phase space that is not of integral dimensionality. This non-integral dimension of the subspace points to the existence of a strange attractor. The dimensions of simple attractors are integers; e.g., for a fixed point the dimension is zero and for limit cycles it is 1.

Using the full spectrum of Lyapunov exponents, a fractal dimension of the attractor (sometimes also referred to as the Lyapunov dimension) can be defined (18). This is the dimension of a ball in state space that neither grows nor shrinks as the dynamical system evolves. A line segment in the space grows as $e^{\lambda_1 t}$, while a full $N$-dimensional volume shrinks as $e^{(\lambda_1 + \lambda_2 + \cdots + \lambda_N)t}$. If $K$ is the largest integer for which $\sum_{k=1}^{K} \lambda_k > 0$, then the Lyapunov dimension is

$$D_L = K + \frac{\sum_{k=1}^{K} \lambda_k}{|\lambda_{K+1}|} \tag{3.21}$$

There are many fractal dimensions, and an enormous effort has been expended in producing strategies to establish them (16; 27). However, fractal dimensions have not proved to be very useful, since their estimation critically depends on accurate measurements of the distance between near neighbours at the smallest scales of the attractor, and the attractor is inevitably blurred at these scales by noise contamination and quantising errors (14).

Figure 3.6: Correlation Integral

An additional way of estimating the attractor dimension is by computing the correlation dimension (16), which is described in the sequel. The correlation integral $C(r)$ estimates the average number of data points within a radius $r$ of the data point $\mathbf{x}(t)$. As $r$ becomes small ($r \to 0$), the correlation integral behaves as $C(r) \to r^d$. The exponent $d$, defined as the correlation dimension of the attractor, can be obtained from the slope of $\log C(r)$ versus $\log r$ (as illustrated in Figure 3.6). The stability of the motions on the attractor is governed by the Lyapunov exponents, which tell us how small changes in the orbit will grow or shrink in time. The correlation dimensions at different radii $r$ can be estimated from Figure 3.6. The fractal dimension estimated from the slopes of the curves is in the range 1.1 - 12.0. This indicates the existence of a non-integer, fractal dimension for the attractor and is a rather clear indicator of the presence of chaotic dynamics.

It follows from the discussion above that we cannot conclude with high certainty that the time series of water levels in the Venice lagoon exhibits low-dimensional chaotic behaviour. There are elements indicating chaotic behaviour (such as the existence of a non-integer correlation dimension, and positive as well as negative Lyapunov exponents), but it is not yet clear to what extent noise is present in the data. It is also not clear how this in turn affects forecast skill (this will be further discussed in section 4.1.1, Local Linear Models) and the embedding of the time series in phase space.
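A minimal sketch of the correlation-integral estimate, assuming NumPy and embedded vectors such as those produced by the delay_embed helper of chapter 2 (names illustrative); the correlation dimension is read off as the slope of $\log C(r)$ against $\log r$:

```python
import numpy as np

def correlation_integral(points, r):
    """C(r): fraction of distinct point pairs closer than r
    (Grassberger-Procaccia estimate on embedded vectors)."""
    n = len(points)
    close = 0
    for i in range(n - 1):
        d = np.linalg.norm(points[i + 1:] - points[i], axis=1)
        close += int(np.sum(d < r))
    return 2.0 * close / (n * (n - 1))

def correlation_dimension(points, radii):
    """Slope of log C(r) vs log r; the radii should lie inside the
    attractor's scaling region so that C(r) > 0 for each of them."""
    logC = np.log([correlation_integral(points, r) for r in radii])
    return np.polyfit(np.log(radii), logC, 1)[0]
```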

3.5 Local Properties of the Dynamics

The dynamics associated with dynamical systems develop locally on the attractor through some discrete-time map or through differential equations. The number of active degrees of freedom $d_L$ governing the local dynamics is less than or equal to the global embedding dimension $d_E$. Since the dynamical system has only $d_L$ degrees of freedom active locally, this information can be used to classify the dynamics and to construct a framework for predictive models and control strategies.

3.5.1 Local Dynamical Dimension

The embedding theorem is concerned only with the global properties of the dynamics. The following method for identifying the local dynamics goes beyond those safe confines and addresses the important issue of predictability. While computing distances between points on the attractor in dimension $d_E$, the quality of a predictive model is examined in $d_L = 1, 2, \ldots$, until the number of coordinates required to evolve the observations locally in the reconstructed space is established. The model-making is straightforward. For a neighbourhood of state-space vectors around a point $\mathbf{x}(t)$ on the attractor, its $N_B$ neighbours $\mathbf{y}^{(r)}(n)$, $r = 1, 2, 3, \ldots, N_B$, are found. A local model that takes these neighbours of $\mathbf{x}(t)$ to their respective points $\mathbf{y}(r, n+1)$ at the next sample is:

$$\mathbf{y}(r, n+1) = \sum_{m=1}^{N} \phi_m(\mathbf{y}^{(r)}(n))\, c(n, m) \tag{3.22}$$

where the $\phi_m(\mathbf{x})$ are a selected set of basis functions used to interpolate among the data points $\mathbf{y}^{(r)}(n)$, and the $c(n, m)$ are local coefficients determining the local map from time $n$ to time $n+1$. The coefficients $c(n, m)$ are determined using a least-squares criterion:

$$\sum_{r=1}^{N_B} \Big| \mathbf{y}(r, n+1) - \sum_{m=1}^{N} \phi_m(\mathbf{y}^{(r)}(n))\, c(n, m) \Big|^2 \tag{3.23}$$

The predictions are made using the known basis functions and the local coefficients. The issue is how these predictions behave as a function of both the local dimension $d_L$ of the vectors in the model and the number of neighbours $N_B$ used in determining the coefficients. The neighbours are chosen in $d_E$, but the predictions depend on the local dimension. When a measure of the quality of the predictions becomes independent of $d_L$ and of $N_B$, the local dimension of the dynamics has been determined. The quality of the predictions is quantified by determining how many trajectories associated with the $N_B$ neighbours remain within some fraction of the attractor size for a number of samples forward in time (the forecast horizon), which is usually set to $T$ or $T/2$ (2). A "bad" prediction is one where the trajectory diverges from the original neighbourhood by more than this fraction before it reaches the prediction point.

It is important to determine the local dynamics of the signal source in order to be able to predict and model the future of a dynamical system. Because $d_L \leq d_E$, the determination of an embedding space in which the attractor can be unfolded does not provide the information needed for prediction and modelling.


Chapter 4

Prediction

Characterising non-linear systems by the state-space analysis of observed data provides a framework for performing important tasks. One of these is making predictions on the basis of observed sets of data from the past (29). Using a single, scalar measurement from a non-linear dynamical system, one can reconstruct its attractor using the method described earlier. If every neighbourhood within the attractor has been visited sufficiently often, then the evolution of points from one state-space neighbourhood to another can be mapped. Knowledge of a local numerical map that moves the trajectories from neighbourhood to neighbourhood provides a convenient procedure: trace any newly observed point $\mathbf{x}(t)$ into the neighbourhood nearest to it and then project the trajectory forward to predict $\mathbf{x}(t+1)$. This process may be repeated recursively to forecast to an arbitrary time horizon $T$, as sketched below.

From the perspective of time series prediction and modelling, the existence of low-dimensional chaos suggests that the construction of non-linear deterministic models for time series should be considered. If a series is indeed chaotic and low-dimensional, the good news is that it will be possible to obtain precise short-term predictions with such deterministic models. The bad news is that it will be impossible to obtain good long-term predictions with any model, since the uncertainty increases exponentially in time. This divergence of nearby trajectories is a hallmark of chaotic behaviour. Therefore, the power of prediction is limited by the growth of errors, estimated by the largest Lyapunov exponent $\lambda_1$, which in a time $\approx 1/\lambda_1$ destroys the possibility of further prediction.
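A self-contained sketch of this recursive procedure in its simplest zeroth-order form, assuming NumPy (the averaging of neighbour successors is an illustrative choice; the report's actual models are described in section 4.1):

```python
import numpy as np

def neighbourhood_forecast(x, d, T, steps, k=10):
    """Recursive zeroth-order forecast: map the current delay vector to
    the average next value of its k nearest historical neighbours, append
    the prediction to the series, and repeat."""
    series = np.asarray(x, dtype=float)
    for _ in range(steps):
        n = len(series) - (d - 1) * T
        vecs = np.column_stack([series[(d - 1 - j) * T : (d - 1 - j) * T + n]
                                for j in range(d)])
        dist = np.linalg.norm(vecs[:-1] - vecs[-1], axis=1)
        idx = np.argsort(dist)[:k]                # k nearest past states
        successors = series[(d - 1) * T + 1 + idx]  # their observed futures
        series = np.append(series, successors.mean())
    return series[len(x):]
```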

4.1 Local Modelling

In order to build local maps that describe the evolution of neighbourhoods, one starts with an observed point $\mathbf{x}(t)$ on the attractor. In a neighbourhood with $N_B$ neighbours $\mathbf{x}^{(r)}(n)$, $r = 1, 2, 3, \ldots, N_B$, the unknown vector field $\mathbf{F}(\mathbf{x})$ that evolves points on the attractor, $\mathbf{x}(n+1) = \mathbf{F}(\mathbf{x}(n))$, can be expanded as

$$\mathbf{F}(\mathbf{x}(n)) = \sum_{m=1}^{M} c(m)\, \phi_m(\mathbf{x}(n)) \tag{4.1}$$

in terms of $M$ basis functions $\phi_m(\mathbf{x})$. One can make many choices of basis (kernel) functions, such as radial basis functions, sigmoidal functions associated with neural networks, or first- or higher-order polynomials. In the sequel, we describe an approach based on local linear models.

4.1.1 Local Linear Models

Each time a system experiences similar conditions, both internal to the system and exerted externally on it, we expect the system to exhibit a similar response. Forecasting exploits this principle by using the observed behaviour of a system to predict its behaviour when similar conditions recur; clearly, a tapped delay line represents a history (evolution) of a system. Even if the equations describing a system are unknown, we can nevertheless use forecasting to learn about the system. This represents a rather effective way of approximating the evolution of a dynamical system by means of local approximation, using only the most similar trajectories from the past to make predictions of the future. For example, to predict the state of the system $\mathbf{x}(t) \equiv \{x_t, x_{t-1}, x_{t-2}, x_{t-3}, \ldots, x_{t-d}\}$ at a time-horizon $T$ in the future, $\mathbf{x}(t+T)$, first the $k$ most similar occurrences of $\mathbf{x}(t)$ in the past records are found. These occurrences are effectively the $k$ nearest neighbours of the point $\mathbf{x}(t)$ in a $d$-dimensional embedding space. Then, an interpolation of zeroth order (averaging) or first order (linear regression) is performed, taking into account all $k$ neighbours of $\mathbf{x}(t)$. Although the local linear modelling technique makes use of a linear model for each separate prediction, the resulting overall model can be highly non-linear, as each of these linear approximations is made locally for each neighbourhood; a first-order sketch follows.
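A minimal sketch of the first-order (local linear regression) step, assuming NumPy; `vectors` holds historical delay vectors (e.g. from the delay_embed helper of chapter 2) and `targets` the value observed $T$ steps after each of them (names illustrative):

```python
import numpy as np

def local_linear_forecast(vectors, targets, query, k=10):
    """Predict the value T steps after `query` from the k nearest past
    delay vectors and a local linear regression on their known futures."""
    dist = np.linalg.norm(vectors - query, axis=1)
    idx = np.argsort(dist)[:k]                 # k nearest neighbours
    # Fit target ~ [1, vector] over the neighbourhood (least squares).
    A = np.column_stack([np.ones(k), vectors[idx]])
    coef, *_ = np.linalg.lstsq(A, targets[idx], rcond=None)
    return float(coef @ np.concatenate(([1.0], query)))
```

Replacing the regression with a plain average of `targets[idx]` gives the zeroth-order variant mentioned above.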

4.1.2 The Embedding Recipe

The quality of the LLM approximation and of the resulting forecast depends on many factors, such as the dimensionality of the embedding space ($d_E$), the size of the local neighbourhood ($k$) and, perhaps most importantly, the choice of embedding time delays ($\tau_1, \tau_2, \tau_3, \ldots, \tau_{d_E}$). The chaos literature provides a number of recipes for making choices of $k$, $d_E$ and the delay times $\tau_i$, $i = 1, 2, \ldots, d$. These prescriptions have been repeatedly mentioned previously in the text and are typically based on a number of simplifying assumptions, such as the creation of the embedding time-delay vector as integer multiples of a certain elementary embedding time $T$, i.e. $\tau_1 = T$, $\tau_2 = 2T$, $\tau_3 = 3T$, etc. Such recipes provide robust but in principle sub-optimal choices of embedding parameters, thus resulting in sub-optimal embedding properties as well as sub-optimal forecast skill. The recipe, composed from the earlier sketches, reads as follows.
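A sketch composing the earlier helpers into the standard prescription, first AMI minimum for $T$ and vanishing false nearest neighbours for $d_E$ (the 1% cutoff and the dimension cap are illustrative assumptions):

```python
def embedding_recipe(x, max_lag=100, max_dim=12, fnn_cutoff=0.01):
    """Standard 'recipe': T from the first minimum of the average mutual
    information, d_E from the dimension where the false-nearest-neighbour
    fraction first drops below `fnn_cutoff`. Reuses first_minimum_lag,
    false_nearest_fraction and delay_embed from the earlier sketches."""
    T = first_minimum_lag(x, max_lag)
    for d in range(1, max_dim + 1):
        if false_nearest_fraction(x, d, T) < fnn_cutoff:
            return delay_embed(x, d, T), d, T
    return delay_embed(x, max_dim, T), max_dim, T   # dimension cap reached
```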

4.2 Evolutionary Embedding

The optimal embedding is, at the most generic level, a $d_E + 2$-dimensional search problem. For a large embedding dimension $d_E$, the optimal choice of embedding parameters is consequently a very difficult one. In the sequel, an investigation using evolutionary algorithms for the selection of the optimal embedding parameters is described.

4.2.1 Evolutionary Algorithms

Evolutionary algorithms (EAs) are engines simulating grossly simplified processes occurring in nature, implemented in an artificial medium such as a computer. The fundamental idea is that of emulating the Darwinian theory of evolution. According to Darwin, evolution is best depicted as the process of adaptation of species to their environment through "natural selection". Perceived in this way, all species inhabiting our planet are actually results of this process of adaptation. Evolutionary algorithms effectively provide an alternative approach to problem solving, one in which solutions to the problem are evolved rather than the problems being solved directly.

The family of evolutionary algorithms today is divided into four main streams: Evolution Strategies, Evolutionary Programming, Genetic Algorithms and Genetic Programming. Although different and intended for different purposes, all EAs share a common conceptual base (schematised in Figure 4.1). In principle, an initial population of individuals is created in a computer and allowed to evolve using the principles of inheritance (so that offspring resemble parents), variability (the process of offspring creation is not perfect; some mutations occur) and selection (more fit individuals are allowed to reproduce more often and less fit ones less often, so that their "genealogical" trees disappear in time).

One of the main advantages of EAs is their domain independence: EAs can evolve almost anything, given an appropriate representation of the evolving structures. Similarly to processes observed in nature, one should distinguish between an evolving entity's genotype and its phenotype. The genotype is basically a code to be executed (such as the code in a DNA strand), whereas the phenotype represents the result of the execution of this code (such as any living being). Although the information exchange between evolving entities (parents) occurs at the level of genotypes, it is the phenotypes in which one is really interested. The phenotype is actually an interpretation of a genotype in a problem domain, and this interpretation can take the form of any feasible mapping. For example, for optimisation and constraint-satisfaction purposes, genotypes are typically interpreted as independent variables of a function to be optimised. Along these lines, one can employ a mapping in which genotypes are interpreted as roughness coefficients in a free-surface pipe flow model, with the genetic algorithm (GA) directed towards the minimisation of the discrepancies between model output and measured water level and discharge values; the resulting GA represents an automatic calibration model of hydrodynamic systems (10). Several other applications of GAs, which make use of various kinds of

genotype-phenotype mappings and with a specific emphasis on water resources, are described in, for example, (5; 6; 7; 8; 9; 19).

Figure 4.1: Schematic illustration of an evolutionary algorithm. The population is initialised (usually randomly). From this population, the most fit entities are selected to be altered by genetic operators, exemplified by crossover (corresponding to sexual reproduction) and mutation. Selection is performed based on certain fitness criteria, under which the more 'fit' are selected more often. Crossover simply combines two genotypes by exchanging sub-strings around randomly selected points; in the illustration, parental genotypes are indicated as either all 1s or all 0s for the sake of clarity. Mutation simply flips a randomly selected bit.

4.2.2 Evolutionary Embedding – The Idea

The ultimate purpose of time series analysis and characterisation is prediction. In this context, the establishment of optimal embedding parameters should be carried out in such a way as to provide the best possible forecast skill. The bottom line for the performance of the embedding and the subsequent local linear modelling is therefore the resulting forecast skill: good forecast skill implies good embedding properties. It is consequently prudent to investigate a scheme in which the embedding parameters (such as the embedding dimension $d_E$ and the related embedding vector $\mathbf{x}(t)$), as well as the parameters of the local linear models (such as the neighbourhood size $k$), are the objects of the search procedure.

Following this line of thought, a steady-state GA has been implemented in which the evolving individuals represent the embedding vector $\mathbf{x}(t)$ as well as the number of nearest neighbours to be used for fitting the local models. For all the runs, the population size was set to 200. The selection mechanism was based on tournament selection, with a tournament size of 8. The fitness was based on the associated accuracy of the resulting LLMs. Obviously, in such a setting no assumptions (such as lags being integer multiples of a constant elementary delay $T$) were made about the reconstruction of the embedding vector. A sketch of such a setup follows.
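A minimal sketch of how such individuals and the selection step might look; the encoding, bounds and function names are illustrative assumptions, and the fitness would be the forecast error of an LLM built with the individual's lags and $k$ (e.g. via local_linear_forecast above):

```python
import random

def random_individual(max_lags=8, max_delay=30, max_k=30):
    """Hypothetical genotype: a set of (negative) embedding lags plus a
    neighbourhood size k -- the search space described in the text."""
    d = random.randint(2, max_lags)
    lags = sorted(random.sample(range(1, max_delay + 1), d - 1))
    return {"lags": [0] + [-lag for lag in lags], "k": random.randint(2, max_k)}

def tournament_select(population, fitness, size=8):
    """Tournament selection: the entrant with the lowest LLM forecast
    error (i.e. the fittest) wins."""
    entrants = random.sample(population, size)
    return min(entrants, key=fitness)

population = [random_individual() for _ in range(200)]  # size as in the report
```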


Table 4.1: Root mean square error for local linear models based on the embedding recipe and on evolutionary embedding.

Forecast Horizon [hours] | Embedding Recipe | Evolutionary Embedding | Improvement [%]
 1 | 0.0075 | 0.0056 | 25.12
 4 | 0.0181 | 0.0136 | 24.80
 8 | 0.0212 | 0.0168 | 20.65
12 | 0.0201 | 0.0144 | 27.72

Table 4.2: Mean absolute error (MAE) and correlation coefficient (r) for local linear models based on the embedding recipe and on evolutionary embedding.

Forecast Horizon [hours] | Recipe r | Recipe MAE | Evolutionary r | Evolutionary MAE
 1 | 0.963 | 0.075 | 0.983 | 0.054
 4 | 0.786 | 0.194 | 0.848 | 0.132
 8 | 0.726 | 0.232 | 0.721 | 0.171
12 | 0.669 | 0.216 | 0.824 | 0.145

4.2.3 Results

The configurations of the resulting embedding vectors for several forecast lead times are presented in Table 4.3. The corresponding forecast skill, and the improvement over the LLMs constructed on the basis of the embedding prescription, are given in Tables 4.1 and 4.2, as well as in Figure 4.2. There is a consistent improvement in all statistical measures of accuracy, namely the correlation coefficient (r), the root mean square error (RMS) and the mean absolute error (MAE), amounting to 20%-35%. This represents quite a considerable gain over the embedding based on the prescription. Figures 4.3, 4.4, 4.5 and 4.6 show the observed time series as well as the ones generated based on the prescription and on evolutionary embedding. These figures demonstrate that the main improvement lies in a better approximation of the extreme values. Further analysis of the intercomparison of the two approaches to the construction of LLMs indicates the presence of a more pronounced phase error for the prescription.

Table 4.3: Embedding characteristics for various lead times.

Horizon [hours]    |   1 |   4 |   8 |  12
Nearest Neighbours |  24 |  12 |   9 |  17
Degrees of Freedom |   4 |   8 |   7 |   6
lag[0]             |   0 |   0 |   0 |   0
lag[1]             |  -2 |  -5 |  -6 |  -8
lag[2]             | -19 |  -8 |  -8 | -10
lag[3]             | -20 | -11 | -11 | -12
lag[4]             |   - | -16 | -16 | -15
lag[5]             |   - | -20 | -18 | -22
lag[6]             |   - | -24 | -23 |   -
lag[7]             |   - | -28 |   - |   -

Figure 4.2: Evolution of forecast error as a function of forecast horizon for embedding parameters based on the recipe and on the evolutionary algorithm


[Figure: 'Venice Aqua Alta Forecast'; water level [meters] versus time [hours], 0-160 h; series: Observed, Recipe, Evolutionary Embedding]

Figure 4.3: Time series of observed water levels, as well as the water levels calculated based on evolutionary embedding and on the recipe. Forecast horizon: 1 hour.


[Figure: 'Venice Aqua Alta Forecast'; water level [m] versus time [hours], 0-150 h; series: Evolutionary Embedding, Recipe, Observed]

Figure 4.4: Time series of observed water levels, as well as the water levels calculated based on evolutionary embedding and on the recipe. Forecast horizon: 4 hours.


[Figure: 'Venice Aqua Alta Forecast'; water level [m] versus time [hours], 0-150 h; series: Observed, Recipe, Evolutionary Embedding]

Figure 4.5: Time series of observed water levels, as well as the water levels calculated based on evolutionary embedding and on the recipe. Forecast horizon: 8 hours.


[Figure: 'Venice Aqua Alta Forecast'; water level [m] versus time [hours], 0-150 h; series: Observed, Recipe, Evolutionary Embedding]

Figure 4.6: Time series of observed water levels, as well as the water levels calculated based on evolutionary embedding and on the recipe. Forecast horizon: 12 hours.


Figure 4.7: User interface of a genetic algorithm implementation under DHI's Mike Zero framework

Chapter 5

Conclusions

This technical report introduced the idea of evolutionary embedding and contrasted its performance against a more traditional approach to optimal embedding. On the basis of the demonstrated performance, the resulting improvements in the forecast skill are quite considerable. It is therefore believed that such a combined use of evolutionary algorithms and local linear models will find more widespread use within forecasting problems.


List of Figures

3.1 Fourier Power Spectrum
3.2 Autocorrelation
3.3 Average Mutual Information
3.4 False Nearest Neighbours
3.5 Lyapunov Exponents
3.6 Correlation Integral
4.1 Schematic illustration of an evolutionary algorithm
4.2 Evolution of forecast error as a function of forecast horizon for embedding parameters based on the recipe and on the evolutionary algorithm
4.3 Time series of observed and calculated water levels; forecast horizon 1 hour
4.4 Time series of observed and calculated water levels; forecast horizon 4 hours
4.5 Time series of observed and calculated water levels; forecast horizon 8 hours
4.6 Time series of observed and calculated water levels; forecast horizon 12 hours
4.7 User interface of a genetic algorithm implementation under DHI's Mike Zero framework

Bibliography

[1] H. D. I. Abarbanel. Chaotic signals and physical systems. IEEE Int. Conf. ASSP, 4:113–116, 1992.
[2] H. D. I. Abarbanel. Nonlinearity and chaos at work. Nature, 364:672–673, 1993.
[3] H. D. I. Abarbanel. Analysis of Observed Chaotic Data. Springer-Verlag, New York Berlin Heidelberg, 1996.
[4] H. D. I. Abarbanel, R. Brown, and M. B. Kennel. Lyapunov exponents in chaotic systems: their importance and their evaluation using observed data. Int. J. Mod. Phys. B, 5:1347–1375, 1991.
[5] Vladan Babovic. Emergence, Evolution, Intelligence: Hydroinformatics. Balkema, Rotterdam, 1996.
[6] Vladan Babovic and Michael B. Abbott. The evolution of equations from hydraulic data: Part I – theory. Journal of Hydraulic Research, 35(3):1–14, 1997.
[7] Vladan Babovic and Michael B. Abbott. The evolution of equations from hydraulic data: Part II – applications. Journal of Hydraulic Research, 35(3):15–34, 1997.
[8] Vladan Babovic and Maarten Keijzer. Genetic programming as a model induction engine. Journal of Hydroinformatics, 2(1):35–60, 2000.
[9] Vladan Babovic, Maarten Keijzer, and Rahman Mahbub. Analysis and prediction of chaotic time series. D2K Technical Report 0399-2, Danish Hydraulic Institute, http://www.d2k.dk, 1999.
[10] Vladan Babovic, Lars Christian Larsen, and Z. Wu. Calibrating hydrodynamic models by means of simulated evolution. In Adri Verwey, Anthony W. Minns, Vladan Babovic, and Cedo Maksimovic, editors, Proceedings of the First International Conference on Hydroinformatics, pages 193–200. Balkema, Rotterdam, 1994.
[11] J. D. Farmer and J. J. Sidorowich. Predicting chaotic time series. Phys. Rev. Lett., 59:845–848, 1987.
[12] A. M. Fraser. Information and entropy in strange attractors. IEEE Trans. Information Theory, 35(2):245–262, 1989.
[13] A. M. Fraser and H. L. Swinney. Independent coordinates for strange attractors from mutual information. Phys. Rev. A, 33(2):1134–1140, 1986.
[14] T. W. Frison, H. D. I. Abarbanel, M. D. Earle, J. R. Schultz, and W. D. Scherer. Chaos and predictability in ocean water levels. J. Geophys. Res. Oceans, 104(C4):7935–7951, 1999.
[15] N. A. Gershenfeld and A. S. Weigend. The future of time series: Learning and understanding. In A. S. Weigend and N. A. Gershenfeld, editors, Time Series Prediction: Forecasting the Future and Understanding the Past, pages 1–70. Addison-Wesley, Reading, MA, 1993.
[16] P. Grassberger. Generalized dimensions of strange attractors. Phys. Lett. A, 97:227–230, 1983.
[17] P. Grassberger, R. Badii, and A. Politi. Scaling laws for invariant measures on hyperbolic and non-hyperbolic attractors. J. Stat. Phys., 51:135–178, 1988.
[18] J. L. Kaplan and J. A. Yorke. Chaotic behavior of multidimensional difference equations. In H. O. Walter and H.-O. Peitgen, editors, Functional Differential Equations and Approximation of Fixed Points, volume 730 of Lecture Notes in Mathematics, pages 204–227. Springer-Verlag, Berlin, 1979.
[19] Maarten Keijzer and Vladan Babovic. Error correction of a deterministic model in Venice lagoon by local linear models. In Proceedings of the "Modelli complessi e metodi computazionali intensivi per la stima e la previsione" conference. Università Ca' Foscari, Venice, September 1999.
[20] M. B. Kennel, R. Brown, and H. D. I. Abarbanel. Determining embedding dimension for phase-space reconstruction using a geometrical construction. Phys. Rev. A, 45:3403–3411, 1992.
[21] A. I. Khinchin. Mathematical Foundations of Statistical Mechanics. Dover Publications, New York, 1949.
[22] V. I. Oseledec. A multiplicative ergodic theorem. Lyapunov characteristic numbers for dynamical systems. Trans. Mosc. Math. Soc., 19:197–231, 1968.
[23] D. Ruelle and F. Takens. On the nature of turbulence. Comm. Math. Phys., 20:167–192, 1971.
[24] T. Sauer and J. A. Yorke. Rigorous verification of trajectories for the computer simulation of dynamical systems. Nonlinearity, 4:961–979, 1991.
[25] T. Sauer, J. A. Yorke, and M. Casdagli. Embedology. J. Stat. Phys., 65:579–616, 1991.
[26] C. E. Shannon and W. Weaver. The Mathematical Theory of Communication. University of Illinois Press, Urbana, IL, 1949.
[27] J. Theiler. Estimating fractal dimensions. J. Opt. Soc. Am. A, 7:1055–1073, 1990.
[28] G. U. Yule. On a method of investigating periodicities in disturbed series with special reference to Wolfer's sunspot numbers. Philos. Trans. Roy. Soc. London A, 226:267–298, 1927.
[29] J. M. Zaldivar, E. Gutierrez, I. M. Galvan, F. Strozzi, and A. Tomasin. Forecasting high waters at Venice lagoon using chaotic time series analysis and nonlinear neural networks. Journal of Hydroinformatics, 2(1):61–87, 2000.
