On Multivariate, Multiple Nonlinear Regression Models with Stationary Correlated Errors Gy. Terdik, T. Subba Rao & S. Rao Jammalamadaka First version: 15 August 2006

Research Report No. 11, 2006, Probability and Statistics Group School of Mathematics, The University of Manchester

On multivariate, multiple nonlinear regression models with stationary correlated errors. Gy. Terdik(1), T. Subba Rao(2) and S. Rao Jammalamadaka(3). (1) Department of Information Technology, Faculty of Informatics, University of Debrecen, 4010 Debrecen, Pf. 12, Hungary. (2) School of Mathematics, University of Manchester, PO Box 88, Manchester M60 1QD, UK. (3) Department of Statistics and Applied Probability, University of California, Santa Barbara, CA 93106, USA.

ABSTRACT. In this paper we consider the statistical analysis of multivariate multiple nonlinear regression models with correlated errors, using finite Fourier transforms. Consistency and asymptotic normality of the weighted least squares estimates are established under various conditions on the regressor variables. These conditions involve different types of scalings, and the scaling factors are obtained explicitly for various types of nonlinear regression models, including an interesting model which requires the estimation of unknown frequencies. The estimation of frequencies is a classical problem occurring in areas such as signal processing, environmental time series, astronomy and many other physical sciences. We illustrate our methodology using two real data sets taken from geophysics and environmental sciences. The geophysical data set concerns polar motion (now widely known as "Chandler's wobble"), where one has to estimate the drift parameters, the offset parameters and the two periodicities associated with elliptical motion. The data were first analyzed by Arato, Kolmogorov and Sinai, who treated them as a bivariate time series satisfying a finite order time series model, and estimated the periodicities using the coefficients of the fitted models. Our analysis shows that the two dominant periods are 12 hours and 410 days. The second example we consider is the minimum/maximum monthly temperatures observed at the Antarctic Peninsula (Faraday/Vernadsky station). It is now widely believed that over the past 50 years there has been a steady warming in this region; if confirmed, such warming has serious consequences for ecology and marine life, as it can result in the melting of ice shelves and glaciers. Our object here is to estimate the trend (if any) in the data, using the nonlinear regression methodology developed here.

1 Introduction

One of the classical problems in statistical analysis is to find a suitable relationship between a response variable Y_t and a set of p regressor variables x_1, x_2, ..., x_p under suitable assumptions on the random errors. The usual assumption is that the errors are independent, identically distributed random variables; this was later generalized to the case when the errors are correlated. In many situations the response function is nonlinear in both the regressor variables and the parameters. The asymptotic properties of the estimators of the parameters are now well known (see, for example, Jennrich, 1969). The results were later extended to nonlinear multiple regression with correlated errors by Hannan (1971), for stationary errors admitting a linear representation. The methods used by Hannan are frequency domain oriented and depend on the properties of finite Fourier transforms. The methodology rests on certain assumptions on the regressor variables, now known as Grenander and Rosenblatt's conditions (see Grenander and Rosenblatt, 1957). These conditions depend on the nature of the nonlinearity of the parameters in the regressor variables; the central limit theorems associated with the estimates, and the scaling factors required, also depend on these functions. If the nonlinear parameters to be estimated are frequencies, then even though the model looks linear, one has to impose a different set of conditions from the usual ones. A brief discussion of this important aspect was given by Hannan (1971). Robinson (1972) extended the results of Hannan to the multivariate nonlinear regression situation, where the regression matrix is not of full rank and the parameters satisfy some constraints. The methods and asymptotic theory of Robinson do not cover the situation when the parameters are frequencies, and the variance-covariance matrices of the estimated parameters are not explicit enough for computational purposes. In this paper our objective is to consider linear models, nonlinear models, and mixtures of both. We state suitable scaling factors in each case for establishing asymptotic properties of the estimates. We refer to a recent paper by Subba Rao and Terdik (2006) [SRT06] for more details. A multivariate convolution model somewhat similar to our model was considered by Shumway, Kim and Blandford (1999) [SKB99]. We propose a frequency domain approach which we believe is extremely useful in situations where the marginal distributions of the elements of the random vector differ, as in the case of the minimum and maximum temperatures considered in this paper. The minimum and maximum temperatures have extreme value distributions with different sets of parameters (see Hughes, Subba Rao and Subba Rao, 2006, [HSRSR06], for a detailed study). The minimum contrast estimates ([Guy95], [Hey97]) of the unknown parameters are constructed in the frequency domain.
The mixed model containing the linear regression and linear combinations of the nonlinear regressions is also considered in detail. To illustrate our methods we consider the analysis of the time series of polar motion (in geophysics this is widely described as "Chandler's wobble"). One of the first papers dealing with the statistical analysis of this data was by Arato, Kolmogorov and Sinai (1962), who estimated the parameters of the polar motion, such as offset (trend), drift and periodicities. Using high resolution GPS data, it is shown in this paper that besides the well-known Chandler period of 410 days, a secondary period of 12 hours is also present. The motion of the pole (as a function of time) can be described by two polar coordinates which are strongly correlated; this motion is very similar to that of a rotating spinning top. The second example we consider is the minimum and maximum monthly temperatures (and ozone levels) observed at a station on the Antarctic Peninsula; this data set illustrates our methodology when the regression model is linear (here we assume the periodicities are known a priori). We refer to a recent paper by Hughes, Subba Rao and Subba Rao (2006) [HSRSR06] for a detailed analysis of this data. In the following we briefly review the methodology; for more details we refer to Subba Rao and Terdik (2006) [SRT06].

1.1 The Statistical Model and assumptions

Consider a d-dimensional observational vector Y_t, the random disturbances Z_t and the function X_t(\vartheta_0) satisfying the usual model

\[ Y_t = X_t(\vartheta_0) + Z_t. \tag{1} \]

The function X_t(\vartheta) can be a nonlinear function of both the regressor variables and the p-dimensional parameter vector \vartheta \in \mathbb{R}^p, while Z_t is a d-dimensional stationary correlated time series. Here \vartheta_0 is the true parameter to be estimated. The set \Theta of admissible parameters \vartheta is defined by a number of possibly nonlinear equations; see [Rob72] for more details. We shall assume that the set \Theta is chosen suitably in each case. For convenience we consider X_t(\vartheta) as a regressor vector, although in particular cases we may have to separate the regressors and the parameters. The regressor variables may depend on the parameters nonlinearly. One specific model we have in mind for X_t(\vartheta) can be written in the form

\[ X_t(\vartheta) = B_1 X_{1,t} + B_2 X_{2,t}(\gamma). \]
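As a concrete illustration of model (1) with the mixed specification above, the following minimal sketch simulates a bivariate series Y_t = B_1 X_{1,t} + B_2 X_{2,t}(γ) + Z_t. All numerical values (coefficients, frequencies) are illustrative assumptions, and a simple vector AR(1) stands in for the general stationary error process Z_t of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 500, 2

# Linear regressors X_{1,t} = [1, t/T]^T and harmonic regressors X_{2,t}(gamma);
# gamma holds two hypothetical frequencies (not values from the paper).
t = np.arange(1, T + 1)
gamma = np.array([0.11, 0.27])
X1 = np.vstack([np.ones(T), t / T])                               # 2 x T
X2 = np.vstack([np.cos(2 * np.pi * g * t) for g in gamma]
               + [np.sin(2 * np.pi * g * t) for g in gamma])      # 4 x T

B1 = np.array([[1.0, 0.5], [0.3, -0.2]])    # d x 2, hypothetical true values
B2 = rng.normal(size=(d, 4)) * 0.5          # d x 4, hypothetical true values

# Stationary correlated errors Z_t: a vector AR(1) used only for illustration.
Z = np.zeros((d, T))
for s in range(1, T):
    Z[:, s] = 0.6 * Z[:, s - 1] + rng.normal(size=d)

Y = B1 @ X1 + B2 @ X2 + Z   # the model Y_t = X_t(theta_0) + Z_t
print(Y.shape)
```

The sketch only generates data from the model; estimation of the parameters is discussed in the later sections.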


This can be considered as a mixed model since it is both linear and nonlinear in the parameters at the same time. The parameter vector \vartheta contains both B_1 and B_2 and also the vector \gamma; the admissible set is the union of three subsets. There is no restriction on the entries of the matrix B_1 of size d \times p_1. However, the matrix B_2 and the vector \gamma may have to satisfy some identifiability conditions. The parameter \gamma is identified unless some particular entries of B_2 annihilate an entry, say \gamma_k, from the model. If \gamma is a set of frequencies to be estimated, we may have to impose some constraints so that they lie within a compact set. We assume that Z_t is a stationary linear process with the moving average representation

\[ Z_t = \sum_{k=-\infty}^{\infty} A_k W_{t-k}, \quad \text{where} \quad \sum_{k=-\infty}^{\infty} \operatorname{Tr}\left( A_k C_W A_k^\top \right) < \infty. \]

Here we assume that W_t is a sequence of independent, identically distributed random vectors (i.i.d. vectors) and C_W = \operatorname{Var}(W_t) is non-singular. We also assume that Z_t has an (element by element) piecewise continuous spectral density matrix S_Z(\omega), as in [Han70], [Bri01]. The model is feedback free, i.e. Z_t does not depend on X_t.
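The moving average representation and its summability condition can be checked numerically for a concrete choice of coefficients. The sketch below uses hypothetical, geometrically decaying matrices A_k (not from the paper), truncates the doubly infinite sum at a one-sided finite range, and verifies that the trace condition is satisfied.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K = 2, 200   # dimension and truncation point of the MA(inf) sum

# Hypothetical geometrically decaying coefficient matrices A_k; the decay
# guarantees sum_k Tr(A_k C_W A_k^T) < infinity.
M = np.array([[1.0, 0.2], [0.1, 0.8]])
A = [0.7 ** k * M for k in range(K)]
CW = np.eye(d)                      # Var(W_t), non-singular

# Trace condition: a geometric series with ratio 0.7^2 = 0.49.
trace_sum = sum(np.trace(Ak @ CW @ Ak.T) for Ak in A)
print(trace_sum)

# Simulate Z_t = sum_k A_k W_{t-k}, truncating the sum at K terms.
T = 300
W = rng.normal(size=(d, T + K))
Z = np.zeros((d, T))
for s in range(T):
    for k in range(K):
        Z[:, s] += A[k] @ W[:, s + K - k]
```

Because Tr(M C_W M^T) = 1.69 here, the truncated trace sum is essentially 1.69/(1 - 0.49), illustrating the finiteness required for Z_t to be well defined.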

1.2 The conditions on regressors and regression spectrum

Consider the regressor function X_t(\vartheta), which is a function of t and may depend nonlinearly on parameters \vartheta belonging to the compact set \Theta. It is widely known that Grenander's conditions (see [Gre54], [GR57]) on the regressor X_t(\vartheta) are sufficient, and in some situations ([Wu81]) also necessary, to establish the consistency of the least squares (LS) estimators. They are as follows. Let

\[ \| X_{k,t}(\vartheta) \|_T^2 = \sum_{t=1}^{T} X_{k,t}^2(\vartheta) \]

denote the squared Euclidean norm of the kth regressor of the vector X_t(\vartheta).

Condition 1 (G1) For all k = 1, 2, \ldots, d,

\[ \lim_{T \to \infty} \| X_{k,t}(\vartheta) \|_T^2 = \infty. \]

Condition 2 (G2) For all k = 1, 2, \ldots, d,

\[ \lim_{T \to \infty} \frac{X_{k,T+1}^2(\vartheta)}{\| X_{k,t}(\vartheta) \|_T^2} = 0. \]
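Conditions (G1) and (G2) can be verified numerically for the two kinds of regressors that recur in this paper, a linear trend and a harmonic. The sample size and frequency below are arbitrary choices for the check.

```python
import numpy as np

def norm_sq(x):
    """Squared Euclidean norm ||X_k||_T^2 = sum_{t=1}^T X_{k,t}^2."""
    return float(np.sum(x ** 2))

t = np.arange(1, 50001)

# Two typical regressors: a linear trend and a harmonic (frequency 0.1).
trend = t.astype(float)
harmonic = np.cos(2 * np.pi * 0.1 * t)

for x in (trend, harmonic):
    total = norm_sq(x)                    # grows without bound      (G1)
    last = x[-1] ** 2 / norm_sq(x[:-1])   # last-term ratio tends to 0 (G2)
    print(total, last)
```

For the trend the ratio in (G2) decays like 3/T, and for the harmonic like 2/T, so both regressors satisfy Grenander's conditions.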

Without any loss of generality we can assume that the regressor X_t(\vartheta) is scaled: \| X_{k,t}(\vartheta) \|_T^2 \simeq T for all k; see Definition 15 and the note therein. Define the following matrices. For each integer h \in [0, T), let

\[ \widehat{C}_{X,T}(h; \vartheta_1, \vartheta_2) = \frac{1}{T} \sum_{t=1}^{T-h} X_{t+h}(\vartheta_1) X_t^\top(\vartheta_2), \tag{2} \]

\[ \widehat{C}_{X,T}(-h; \vartheta_1, \vartheta_2) = \widehat{C}_{X,T}^\top(h; \vartheta_2, \vartheta_1). \]

Whenever \vartheta_1 = \vartheta_2 = \vartheta, throughout our paper we use the shorter notation \widehat{C}_{X,T}(h; \vartheta_1, \vartheta_2) |_{\vartheta_1 = \vartheta_2 = \vartheta} = \widehat{C}_{X,T}(h; \vartheta). The next condition we impose essentially means that the regressor X_t(\vartheta) is changing 'slowly' in the following sense: for each integer h, \| X_{k,t}(\vartheta) \|_{T+h}^2 \simeq T.

Condition 3 (G3) For each integer h,

\[ \lim_{T \to \infty} \widehat{C}_{X,T}(h; \vartheta) = C_X(h; \vartheta). \]
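The convergence in (G3) is easy to observe numerically for slowly changing regressors. The sketch below evaluates the sample matrix of (2) for the pre-scaled polynomial regressors [1, t/T] (so that the norms are of order T) and compares it with the Hilbert-type limit matrix obtained from the corresponding Riemann sums; the lag and sample size are arbitrary.

```python
import numpy as np

def C_hat(X, h, T):
    """C^_{X,T}(h) = (1/T) sum_{t=1}^{T-h} X_{t+h} X_t^T, as in (2)."""
    return (X[:, h:T] @ X[:, :T - h].T) / T

T, h = 100000, 5
t = np.arange(1, T + 1)
# Polynomial regressors pre-scaled so that each ||X_k||_T^2 is of order T.
X = np.vstack([np.ones(T), t / T])

C = C_hat(X, h, T)
# For these slowly changing regressors the limit C_X(h) is independent of h:
# the Riemann sums give the 2x2 matrix [[1, 1/2], [1/2, 1/3]].
expected = np.array([[1.0, 0.5], [0.5, 1.0 / 3.0]])
print(np.max(np.abs(C - expected)))
```

The discrepancy is of order h/T, consistent with the 'slowly changing' requirement preceding (G3).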


Condition 4 (G4) C_X(0; \vartheta) is non-singular.

One can use Bochner's theorem for the limit C_X, so that

\[ C_X(h; \vartheta) = \int_{-1/2}^{1/2} \exp(i 2\pi \lambda h) \, dF(\lambda; \vartheta), \]

where F is the spectral distribution matrix function of the regressors (from now on abbreviated SDFR), whose entries are of bounded variation. The SDFR can be obtained as the limit of the vector of periodogram ordinates (see [Bri01] for details). We now introduce the discrete Fourier transform

\[ d_{X,T}(\omega; \vartheta) = \sum_{t=0}^{T-1} X_t(\vartheta) z^{-t}, \qquad z = \exp(2\pi i \omega). \]

(-1/2 \le \omega \le 1/2 as well), and D_T = \operatorname{diag}(T_1, T_2, \ldots, T_d). In this case the SDFR F is concentrated at zero, with values

\[ dF_{j,k}(0) = \frac{\sqrt{(2k-1)(2j-1)}}{k+j-1}, \]

so the asymptotic variance of \operatorname{Vec} \widehat{B} is

\[ D_T^{-1} \, dF^{-1}(0) \, D_T^{-1} \otimes S_Z(0) \tag{13} \]

(see [GR57], p. 247, for the scalar-valued case).
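The jumps dF_{j,k}(0) quoted above can be verified numerically for polynomial regressors x_j(t) = t^{j-1}: the normalized cross products converge to sqrt((2j-1)(2k-1))/(j+k-1). The sketch below checks this for the first few polynomial degrees at a large, arbitrarily chosen sample size.

```python
import numpy as np

T = 200000
t = np.arange(1, T + 1, dtype=float)

def dF_jk(j, k):
    """Normalized cross product of the polynomial regressors t^{j-1}, t^{k-1}."""
    xj, xk = t ** (j - 1), t ** (k - 1)
    return np.sum(xj * xk) / (np.linalg.norm(xj) * np.linalg.norm(xk))

for j in range(1, 4):
    for k in range(1, 4):
        limit = np.sqrt((2 * j - 1) * (2 * k - 1)) / (j + k - 1)
        print(j, k, dF_jk(j, k), limit)
```

The agreement follows because sum_t t^{m} is asymptotically T^{m+1}/(m+1); the error is of order 1/T.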

4.2 A mixed model involving parameters both linearly and nonlinearly

An interesting model which we come across very often is one where the regression function is linear and also contains a nonlinear part, of the type Y_t = X_t(\vartheta_0) + Z_t, where

\[ X_t(\vartheta) = B_1 X_{1,t} + B_2 X_{2,t}(\gamma) = [B_1, B_2] \begin{bmatrix} X_{1,t} \\ X_{2,t}(\gamma) \end{bmatrix} = B X_{3,t}(\gamma). \tag{14} \]

Here the unknown parameter is \vartheta = \operatorname{Vec}(\operatorname{Vec} B_1, \operatorname{Vec} B_2, \gamma), where B_1 is d \times p, B_2 is d \times q, \gamma is r \times 1, X_{1,t} is of dimension p, X_{2,t}(\gamma) is of dimension q, B = [B_1, B_2] and X_{3,t}(\gamma) = [X_{1,t}^\top, X_{2,t}^\top(\gamma)]^\top.

First we consider the problem of estimation, by minimizing the objective function

\[ Q_T(B, \gamma) = \frac{1}{T} \sum_{k=-T_1}^{T_1} \Big[ \operatorname{Tr}\big( I_{Y,T}(\omega_k) \Psi(\omega_k) \big) + \operatorname{Tr}\big( B I_{X_3,T}(\omega_k; \gamma) B^\top \Psi(\omega_k) \big) - \operatorname{Tr}\big( I_{Y,X_3,T}(\omega_k; \gamma) B^\top \Psi(\omega_k) \big) - \operatorname{Tr}\big( B I_{X_3,Y,T}(\omega_k; \gamma) \Psi(\omega_k) \big) \Big], \tag{15} \]

where \Psi(\omega_k) denotes the weight matrix.

Now, differentiate with respect to B = [B_1, B_2] and \gamma. We can apply the linear methods for B in terms of X_{3,t}(\gamma). Suppose that \widehat{B} = [\widehat{B}_1, \widehat{B}_2] and \widehat{\gamma} satisfy the system of equations

\[ \frac{\partial Q_T(B, \gamma)}{\partial B} = 0, \qquad \operatorname{Vec} \frac{\partial Q_T(\gamma)}{\partial \gamma^\top} = 0. \]


The estimation of the linear parameters B_1 and B_2 can be carried out as in linear regression when the parameter \gamma is fixed; this leads to a recursive procedure. When we first set \gamma = \widetilde{\gamma} (chosen), the normal equations result in

\[ \operatorname{Vec} \widehat{B} = \Big[ \sum_{k=-T_1}^{T_1} I_{X_3,T}^\top(\omega_k; \widetilde{\gamma}) \otimes \Psi(\omega_k) \Big]^{-1} \sum_{k=-T_1}^{T_1} \operatorname{Vec}\big( \Psi(\omega_k) I_{Y,X_3,T}(\omega_k; \widetilde{\gamma}) \big). \]

Now, to obtain the estimates for \gamma, we keep B = \widehat{B} fixed and then minimize (15), i.e. find the solution to the equation

\[ \sum_{k=-T_1}^{T_1} \Big[ \frac{\partial B I_{X_3,T}(\omega_k; \gamma) B^\top}{\partial \gamma^\top} - \frac{\partial I_{Y,X_3,T}(\omega_k; \gamma) B^\top}{\partial \gamma^\top} - \frac{\partial B I_{X_3,Y,T}(\omega_k; \gamma)}{\partial \gamma^\top} \Big]^\top \operatorname{Vec} \Psi(\omega_k) \Big|_{\gamma = \widehat{\gamma}} = 0. \]
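The recursion above alternates a linear step (solve for B with γ fixed) and a nonlinear step (minimize over γ with B fixed). A minimal time-domain analogue of this profile procedure, not the paper's frequency-domain objective, can be sketched for a single unknown frequency; the true frequency, amplitudes, grid, and white-noise errors below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1024
t = np.arange(T)
lam0 = 0.123                     # hypothetical true frequency
B_true = np.array([[2.0, 1.5]])  # hypothetical coefficients of [cos, sin]

def design(lam):
    return np.vstack([np.cos(2 * np.pi * lam * t), np.sin(2 * np.pi * lam * t)])

Z = rng.normal(scale=0.5, size=T)          # errors (white noise for simplicity)
y = (B_true @ design(lam0)).ravel() + Z

# Step 1: for a fixed frequency lam, B solves a linear least-squares
# problem -- the analogue of the normal-equations step above.
def B_hat(lam):
    X = design(lam)
    return np.linalg.solve(X @ X.T, X @ y)

# Step 2: profile out B and search the concentrated objective over lam;
# alternating these two steps gives the recursive procedure of the text.
def rss(lam):
    X = design(lam)
    return np.sum((y - B_hat(lam) @ X) ** 2)

grid = np.linspace(0.05, 0.45, 4001)
lam_hat = grid[np.argmin([rss(l) for l in grid])]
print(lam_hat)
```

Because the least squares frequency estimator converges faster than the usual root-T rate, the grid search recovers the frequency essentially to grid resolution here.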

The primary scaling of X_{3,t}(\gamma) = [X_{1,t}^\top, X_{2,t}^\top(\gamma)]^\top is given by D_T = \operatorname{diag}(D_{X_1,T}, D_{X_2,T}), where D_{X_1,T} = \operatorname{diag}(D_{X_1,k}(T), k = 1, 2, \ldots, p) and D_{X_2,T} = \operatorname{diag}(D_{X_2,k}(T), k = 1, 2, \ldots, q). The secondary scaling of the regressors is D_{1,T} = \operatorname{diag}(U_{dp+dq}, D_{3,T}). Let us denote the limit of the variance-covariance matrix of the derivatives

\[ \Big[ \big[ \partial Q_T(B_1, B_2, \gamma) / \partial \operatorname{Vec} B_1 \big]^\top, \; \big[ \partial Q_T(B_1, B_2, \gamma) / \partial \operatorname{Vec} B_2 \big]^\top, \; \big[ \partial Q_T(B_1, B_2, \gamma) / \partial \gamma^\top \big]^\top \Big]^\top \]

by

\[ \Sigma = 2 \begin{bmatrix} \Sigma_{11} & \Sigma_{12} & \Sigma_1 D_{3,T} \\ \Sigma_{21} & \Sigma_{22} & \Sigma_2 D_{3,T} \\ D_{3,T} \Sigma_1^\top & D_{3,T} \Sigma_2^\top & D_{3,T} \Sigma_3 D_{3,T} \end{bmatrix}, \]

where the blocks of \Sigma already contain the scaling D_T of the regressors (this includes the case when \Psi = S_Z^{-1}). The part that is linear in the parameters results in

\[ \Sigma_{11} = \int_{-1/2}^{1/2} D_{X_1,T} \, dF_{11}^\top(\omega) \, D_{X_1,T} \otimes S_Z^{-1}(\omega), \]

\[ \Sigma_{12} = \int_{-1/2}^{1/2} D_{X_2,T} \, dF_{12}^\top(\omega; \gamma_0) \, D_{X_1,T} \otimes S_Z^{-1}(\omega), \]

\[ \Sigma_{22} = \int_{-1/2}^{1/2} D_{X_2,T} \, dF_{22}^\top(\omega; \gamma_0) \, D_{X_2,T} \otimes S_Z^{-1}(\omega). \]

In the mixed model context we get

\[ \Sigma_1 = \int_{-1/2}^{1/2} D_{X_1,T} \otimes S_Z^{-1}(\omega) B_{2,0} D_{X_2,T} \, d \, \frac{\partial F_{1,2}(\omega; \gamma_0)}{\partial \gamma^\top}, \]

\[ \Sigma_2 = \int_{-1/2}^{1/2} D_{X_2,T} \otimes S_Z^{-1}(\omega) B_{2,0} D_{X_2,T} \, d \, \frac{\partial F_{2,2}(\omega; \gamma_0)}{\partial \gamma^\top}. \]

The following nonlinear block matrix comes from the general result (10):

\[ \Sigma_3 = 2 \int_{-1/2}^{1/2} \Big( U_r \otimes \operatorname{Vec}^\top \big[ D_{X_2,T} B_{2,0}^\top S_Z^{-1}(\omega) B_{2,0} D_{X_2,T} \big] \Big) \, d \, \frac{\partial^2 F_{2,2}(\omega; \gamma_1, \gamma_2)}{\partial \gamma_2^\top \, \partial \gamma_1} \Big|_{\gamma_1 = \gamma_2 = \gamma_0}. \]


Finally, the variance matrix of the estimates \operatorname{Vec}(\operatorname{Vec} \widehat{B}_1, \operatorname{Vec} \widehat{B}_2, \widehat{\gamma}) is

\[ \operatorname{Var}\big[ \operatorname{Vec}(\operatorname{Vec} \widehat{B}_1, \operatorname{Vec} \widehat{B}_2, \widehat{\gamma}) \big] \simeq \begin{bmatrix} \Sigma_{11} & \Sigma_{12} & \Sigma_1 D_{3,T} \\ \Sigma_{21} & \Sigma_{22} & \Sigma_2 D_{3,T} \\ D_{3,T} \Sigma_1^\top & D_{3,T} \Sigma_2^\top & D_{3,T} \Sigma_3 D_{3,T} \end{bmatrix}^{-1}. \]

4.3 Some Special Examples which are of Interest

A model with linear trend and harmonic components: a worked example. Here we consider a special case of the mixed model considered above, and we briefly give the detailed calculations to illustrate the methodology. This model is later used to demonstrate our methods on real data (here d = 2). Let Y_t = X_t(\vartheta_0) + Z_t, where

\[ X_t(\vartheta) = B \begin{bmatrix} 1 \\ t \end{bmatrix} + A \begin{bmatrix} \cos(2\pi \lambda_1 t) \\ \sin(2\pi \lambda_1 t) \\ \cos(2\pi \lambda_2 t) \\ \sin(2\pi \lambda_2 t) \end{bmatrix}. \tag{16} \]

The parameter is \vartheta^\top = ([\operatorname{Vec} B]^\top, [\operatorname{Vec} A]^\top, [\lambda_1, \lambda_2]), with \lambda_1 \ne \lambda_2 and \lambda_i \ne 0, \pm 1/2. It is readily seen that the coefficient matrix B of the linear regression is

\[ B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}. \]

We see later that we can estimate B independently of A, given by

\[ A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \end{bmatrix}. \]
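When the two frequencies are treated as known (in practice they are estimated, as described above), [B, A] is estimated by ordinary least squares. The sketch below simulates model (16) for d = 2 and recovers the coefficients; all frequencies, coefficients, and the white-noise error level are illustrative assumptions, not values from the data examples.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 2000
t = np.arange(1, T + 1)
lam = (0.08, 0.21)    # two distinct hypothetical frequencies

# Design: linear trend block [1, t/T] and two harmonic pairs, as in (16).
X = np.vstack([np.ones(T), t / T,
               np.cos(2 * np.pi * lam[0] * t), np.sin(2 * np.pi * lam[0] * t),
               np.cos(2 * np.pi * lam[1] * t), np.sin(2 * np.pi * lam[1] * t)])

B_true = np.array([[1.0, 2.0], [-0.5, 1.0]])          # 2x2 trend coefficients
A_true = np.array([[1.0, 0.5, -0.3, 0.8],
                   [0.2, -0.7, 0.6, 0.4]])            # 2x4 harmonic coefficients
C_true = np.hstack([B_true, A_true])                  # [B, A]

Y = C_true @ X + rng.normal(scale=0.3, size=(2, T))   # bivariate observations

# With the frequencies fixed, [B, A] is the multivariate OLS estimate.
C_est = Y @ X.T @ np.linalg.inv(X @ X.T)
print(np.max(np.abs(C_est - C_true)))
```

The trend and harmonic blocks are asymptotically orthogonal, which is why B can be estimated independently of A, as stated in the text.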

The primary scaling for X_{1,t} is D_{X_1,T} = \operatorname{diag}(T^{1/2}, T^{3/2}/\sqrt{3}), and for X_{2,t}(\lambda) it is D_{X_2,T} = (T/2)^{1/2} U_4, since X_{1,t} = [1, t]^\top and X_{2,t}(\lambda) = [\cos(2\pi t \lambda_1), \sin(2\pi t \lambda_1), \cos(2\pi t \lambda_2), \sin(2\pi t \lambda_2)]^\top. The secondary scaling for the linear part X_{1,t}, as we have already seen, is U_2, and the secondary scaling for the nonlinear part is U_4; the scaled partial derivatives corresponding to \lambda give D_{3,T} = (2\pi T/\sqrt{3}) U_2, since the primary scaling (T/2)^{1/2} has already been applied. Therefore the scaling matrix D_T of the regressors [X_{1,t}^\top, X_{2,t}^\top(\lambda)]^\top is D_T = \operatorname{diag}(D_{X_1,T}, D_{X_2,T}), and D_{1,T} = \operatorname{diag}(U_{12}, D_{3,T}). The asymptotic variance is therefore given by

\[ D_{1,T}^{-1} \, J^{-1}\big( D_T S_Z^{-1} D_T; F \big) \, D_{1,T}^{-1}. \]
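The primary scalings quoted above follow from the growth rates of the squared norms, which can be confirmed numerically (the sample size and frequency are arbitrary choices for the check).

```python
import numpy as np

T = 100000
t = np.arange(1, T + 1, dtype=float)

# Squared-norm growth rates behind the primary scalings:
#   ||1||_T^2    = T         -> scaling T^{1/2}
#   ||t||_T^2   ~= T^3 / 3   -> scaling T^{3/2} / sqrt(3)
#   ||cos||_T^2 ~= T / 2     -> scaling (T/2)^{1/2}   (lambda != 0, 1/2)
lam = 0.17
n_const = np.sum(np.ones(T) ** 2)
n_trend = np.sum(t ** 2)
n_cos = np.sum(np.cos(2 * np.pi * lam * t) ** 2)
print(n_const / T, n_trend / (T ** 3 / 3), n_cos / (T / 2))
```

All three ratios are close to 1, with relative errors of order 1/T.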

In general, the proper scaling for the term X_{k,t} X_{m,t+h} in \widehat{C}_{X,T} is 1/(\| X_{k,t} \|_T \| X_{m,t} \|_T). Here it can be changed into an equivalent function of T, so instead of (2) we have

\[ \widehat{C}_{X,T}(h; \vartheta_1, \vartheta_2) = D_T^{-1} \sum_{t=1}^{T-h} X_{t+h}(\vartheta_1) X_t^\top(\vartheta_2) \, D_T^{-1}. \]

Let us partition the second derivative of the SDFR according to the parameters, using the obvious notation

\[ \frac{\partial^2 F(\omega; \gamma_1, \gamma_2)}{\partial \gamma_2 \, \partial \gamma_1^\top} = \begin{bmatrix} F_{11} & F_{12} & F_1 \\ F_{21} & F_{22} & F_2 \\ F_1^\top & F_2^\top & F \end{bmatrix}. \]

Assume \lambda_1 \ne \lambda_2 and \lambda_i \ne 0, \pm 1/2.


1. The regression spectrum of the linear part is

\[ dF_{11}(\omega) = \delta_\omega^0 \begin{bmatrix} 1 & \sqrt{3}/2 \\ \sqrt{3}/2 & 1 \end{bmatrix}, \]

where \delta_\omega^0 denotes the Kronecker delta. Hence the block \Sigma_{11} reduces to

\[ \Sigma_{11} = D_{X_1,T} \, dF_{11}(0) \, D_{X_1,T} \otimes S_Z^{-1}(0). \]

2. It is seen here that the mixed structure of the model has no effect on these parameters: F_{12}(\omega; \gamma_0) = 0 and F_1(\omega; \gamma_0) = 0, hence \Sigma_{12} = 0 and \Sigma_1 = 0.

3. The block F_{22}(\omega; \gamma_0) corresponds to the coefficient A. Let

\[ H_{1h}(\lambda) = \begin{bmatrix} \cos(2\pi \lambda h) & -\sin(2\pi \lambda h) \\ \sin(2\pi \lambda h) & \cos(2\pi \lambda h) \end{bmatrix}. \]
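The role of the rotation matrix H_{1h}(λ) as the limit of the lagged cross-products of a single harmonic pair can be checked numerically; the frequency, lag, and sample size below are arbitrary choices for the check.

```python
import numpy as np

def H(lam, h):
    """Rotation matrix H_{1h}(lam) appearing in the limit of C^_{X_2,T}."""
    a = 2 * np.pi * lam * h
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

lam, h, T = 0.21, 3, 50000
t = np.arange(1, T + 1)
X2 = np.vstack([np.cos(2 * np.pi * lam * t), np.sin(2 * np.pi * lam * t)])

# Empirical lagged cross-product matrix of the harmonic pair.
C = (X2[:, h:] @ X2[:, :-h].T) / T
print(np.max(np.abs(C - 0.5 * H(lam, h))))
```

The product-to-sum identities for cosine and sine show that the limit is exactly (1/2) H_{1h}(λ), with the error of order h/T.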

Notice that

\[ \widehat{C}_{X_2,T}(h; \lambda, \mu) = D_{X_2,T}^{-1} \sum_{t=1}^{T-h} X_{2,t+h}(\lambda) X_{2,t}^\top(\mu) \, D_{X_2,T}^{-1} \longrightarrow \begin{bmatrix} H_{1h}(\lambda_1) \, \delta_{\lambda_1}^{\mu_1} & H_{1h}(\lambda_1) \, \delta_{\lambda_1}^{\mu_2} \\ H_{1h}(\lambda_2) \, \delta_{\lambda_2}^{\mu_1} & H_{1h}(\lambda_2) \, \delta_{\lambda_2}^{\mu_2} \end{bmatrix}, \]

where \delta denotes the Kronecker delta. Define the step functions

\[ g^c(\omega) = \begin{cases} 0, & \omega < -\lambda, \\ 1/2, & -\lambda \le \omega < \lambda, \\ 1, & \lambda \le \omega, \end{cases} \qquad g^s(\omega) = \begin{cases} 0, & \omega < -\lambda, \\ i/2, & -\lambda \le \omega < \lambda, \\ 0, & \lambda \le \omega, \end{cases} \]

and

\[ G_1(\omega) = \begin{bmatrix} g^c(\omega) & -g^s(\omega) \\ g^s(\omega) & g^c(\omega) \end{bmatrix}. \]

Now we have

\[ \lim_{T \to \infty} \widehat{C}_{X_2,T}(h; \lambda, \mu) = \int_{-1/2}^{1/2} \exp(i 2\pi \omega h) \, dF_{22}(\omega; \lambda, \mu), \]

where

\[ F_{22}(\omega; \lambda, \mu) = \begin{bmatrix} G_1(\omega; \lambda_1) \, \delta_{\lambda_1}^{\mu_1} & G_1(\omega; \lambda_1) \, \delta_{\lambda_1}^{\mu_2} \\ G_1(\omega; \lambda_2) \, \delta_{\lambda_2}^{\mu_1} & G_1(\omega; \lambda_2) \, \delta_{\lambda_2}^{\mu_2} \end{bmatrix}. \]

The scaled version of the block is

\[ \Sigma_{22} = 2 \int_{-1/2}^{1/2} \big( D_{X_2,T} \, dF_{22}^\top(\omega; \gamma_0) \]