Nonlinear Regression Asymptotics

A. Ronald Gallant
University of North Carolina

July 1992. Revised September 1992. Corrections November 1992.

Handout for Econ 275. © 1992 by A. Ronald Gallant.
ABSTRACT

This is a terse summary of the asymptotic theory of least mean distance estimation and testing for nonlinear statistical models, together with illustrative applications to least squares and maximum likelihood estimation. It is less general than Gallant (1987, Chapter 3) because the true parameter may not drift, estimates may not depend on a preliminary estimate of a nuisance parameter, and the model must be correctly specified.
1 Estimation Theory

1.1 Setup

Structural model: $q(y_t, x_t, \theta^o) = e_t$  (must be computable)
Reduced form: $y_t = Y(e_t, x_t, \theta^o)$  (need only exist)
Distance function: $s(y, x, \theta)$  (small when $\theta$ fits $y$ well)
$e_t$ independent, each with distribution $P(e)$; $\theta^o$ in $\Theta$, which is a closed and bounded subset of $\Re^p$.

Order notation. Let $\{X_n\}_{n=1}^{\infty}$ be a sequence of random variables. Writing $X_n = o_s(n^\alpha)$ means $P(\lim_{n\to\infty} X_n/n^\alpha = 0) = 1$; the exponent $\alpha$ may be positive, negative, or zero. Writing $X_n = o_p(n^\alpha)$ means that given $\epsilon > 0$ and $\delta > 0$ there is an $N$ such that $P(|X_n/n^\alpha| > \epsilon) < \delta$ for all $n > N$. Writing $X_n = O_p(n^\alpha)$ means that given $\delta > 0$ there is a bound $B$ and an $N$ such that $P(|X_n/n^\alpha| > B) < \delta$ for all $n > N$. The exponent may be positive, negative, or zero. If $X_n$ converges in distribution then $X_n = O_p(1)$. The obvious algebra holds: $o(n^\alpha)o(n^\beta) = o(n^{\alpha+\beta})$; $o(n^\alpha)O(n^\beta) = o(n^{\alpha+\beta})$; $o(n^\alpha)o_p(n^\beta) = o_p(n^{\alpha+\beta})$; $o_s(n^\alpha)o_p(n^\beta) = o_p(n^{\alpha+\beta})$; $o(n^\alpha)o_s(n^\beta) = o_s(n^{\alpha+\beta})$; $o_s(n^\alpha)O_p(n^\beta) = o_p(n^{\alpha+\beta})$; etc.
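A small numerical illustration of the order-in-probability notation (a sketch, not part of the handout; the distribution, sample sizes, and replication count are my choices): for $X_t$ iid $N(0,1)$ the sample mean satisfies $\bar X_n = O_p(n^{-1/2})$, so $\sqrt{n}\,\bar X_n$ stays bounded in probability as $n$ grows.

```python
import numpy as np

# For X_t iid N(0,1), sqrt(n)*Xbar_n is exactly N(0,1) at every n, so its
# empirical 99th percentile is stable in n: Xbar_n = O_p(n^(-1/2)).
rng = np.random.default_rng(0)

def scaled_mean_quantile(n, reps=1000):
    """99th percentile of |sqrt(n) * Xbar_n| over many replications."""
    draws = rng.standard_normal((reps, n))
    return np.quantile(np.abs(np.sqrt(n) * draws.mean(axis=1)), 0.99)

q_small = scaled_mean_quantile(100)
q_large = scaled_mean_quantile(2500)
```

Both quantiles hover near the $N(0,1)$ value $2.58$ rather than growing with $n$, which is the operational content of $\bar X_n = O_p(n^{-1/2})$.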
2.3 Distribution of the Constrained Estimator

2.3.1 Lagrangian

$\mathcal{L}(\theta, \lambda) = s_n(\theta) + \lambda' h(\theta)$

2.3.2 First order conditions

$0 = (\partial/\partial\theta')s_n(\tilde\theta) + \tilde\lambda'(\partial/\partial\theta')h(\tilde\theta)$
$0 = h(\tilde\theta)$

2.3.3 Taylor's expansions

Note that for any $\bar\theta_n$ on the line segment joining $\tilde\theta_n$ to $\theta^o$ we have $(\partial^2/\partial\theta\partial\theta')s_n(\bar\theta_n) = (\partial^2/\partial\theta\partial\theta')s_n(\theta^o) + o_s(1)$ and $(\partial/\partial\theta')h(\bar\theta_n) = (\partial/\partial\theta')h(\theta^o) + o_s(1)$. Write $H = (\partial/\partial\theta')h(\theta^o)$, $J$ as above, $\bar H = (\partial/\partial\theta')h(\bar\theta_n)$, $\bar J = (\partial^2/\partial\theta\partial\theta')s_n(\bar\theta_n)$, $\tilde H = (\partial/\partial\theta')h(\tilde\theta_n)$, $\tilde J = (\partial^2/\partial\theta\partial\theta')s_n(\tilde\theta_n)$, etc. By Taylor's theorem

$0 = \bar H\sqrt{n}(\tilde\theta - \theta^o)$
$\sqrt{n}(\partial/\partial\theta)s_n(\tilde\theta_n) = \sqrt{n}(\partial/\partial\theta)s_n(\theta^o) + \bar J\sqrt{n}(\tilde\theta - \theta^o)$.

Thus

$[\bar H J^{-1}\tilde H']^{-1}\bar H J^{-1}\sqrt{n}(\partial/\partial\theta)s_n(\theta^o)$
$\quad = [\bar H J^{-1}\tilde H']^{-1}\bar H J^{-1}\sqrt{n}(\partial/\partial\theta)s_n(\tilde\theta_n) - [\bar H J^{-1}\tilde H']^{-1}\bar H J^{-1}\bar J\sqrt{n}(\tilde\theta_n - \theta^o)$
$\quad = -[\bar H J^{-1}\tilde H']^{-1}\bar H J^{-1}\tilde H'\sqrt{n}\tilde\lambda - 0 + o_p(1)$
$\quad = -\sqrt{n}\tilde\lambda + o_p(1)$,

using $(\partial/\partial\theta)s_n(\tilde\theta_n) = -\tilde H'\tilde\lambda$ from the first order conditions and $\bar H J^{-1}\bar J\sqrt{n}(\tilde\theta_n - \theta^o) = [\bar H + o_s(1)]\sqrt{n}(\tilde\theta_n - \theta^o) = 0 + o_p(1)$ from the Taylor expansion of $h$.

Since $[\bar H J^{-1}\tilde H']^{-1}$, $\bar H J^{-1}$, and $\sqrt{n}(\partial/\partial\theta)s_n(\theta^o)$ are each $O_p(1)$, we have that $\sqrt{n}\tilde\lambda = O_p(1)$.
2.3.4 Key equations

$H'(HJ^{-1}H')^{-1}HJ^{-1}\sqrt{n}(\partial/\partial\theta)s_n(\theta^o)$
$\quad = [\bar H'(\bar H J^{-1}\tilde H')^{-1}\bar H J^{-1} + o_s(1)]\sqrt{n}(\partial/\partial\theta)s_n(\theta^o)$
$\quad = \bar H'(\bar H J^{-1}\tilde H')^{-1}\bar H J^{-1}\sqrt{n}(\partial/\partial\theta)s_n(\theta^o) + o_p(1)$
$\quad = \bar H'(\bar H J^{-1}\tilde H')^{-1}\bar H J^{-1}[\sqrt{n}(\partial/\partial\theta)s_n(\tilde\theta_n) - \bar J\sqrt{n}(\tilde\theta_n - \theta^o)] + o_p(1)$
$\quad = \bar H'(\bar H J^{-1}\tilde H')^{-1}\bar H J^{-1}\sqrt{n}(\partial/\partial\theta)s_n(\tilde\theta_n) - \bar H'(\bar H J^{-1}\tilde H')^{-1}\bar H\sqrt{n}(\tilde\theta_n - \theta^o) + o_p(1)$
$\quad = \bar H'(\bar H J^{-1}\tilde H')^{-1}\bar H J^{-1}\sqrt{n}(\partial/\partial\theta)s_n(\tilde\theta_n) - 0 + o_p(1)$
$\quad = -\bar H'(\bar H J^{-1}\tilde H')^{-1}\bar H J^{-1}\tilde H'\sqrt{n}\tilde\lambda + o_p(1)$
$\quad = -[\tilde H' + o_s(1)][I + o_s(1)]\sqrt{n}\tilde\lambda + o_p(1)$
$\quad = -\tilde H'\sqrt{n}\tilde\lambda + o_p(1)$
$\quad = \sqrt{n}(\partial/\partial\theta)s_n(\tilde\theta_n) + o_p(1)$,

using the Taylor expansions of Subsection 2.3.3, $\bar H\sqrt{n}(\tilde\theta_n - \theta^o) = 0$, the first order conditions, and $\sqrt{n}\tilde\lambda = O_p(1)$.

2.3.5 Main result

Joining the first line of Subsection 2.3.4 to the last, we have

$H'(HJ^{-1}H')^{-1}HJ^{-1}\sqrt{n}(\partial/\partial\theta)s_n(\theta^o) = \sqrt{n}(\partial/\partial\theta)s_n(\tilde\theta_n) + o_p(1)$.
2.4 Continuity theorem

If $X_n \stackrel{L}{\to} X$ and $g(x)$ is continuous, then $g(X_n) \stackrel{L}{\to} g(X)$. For example, if $X_n \stackrel{L}{\to} X$ and $A_n \to A$ where $X$ is $N(0, \Sigma)$, $A$ is symmetric, and $A\Sigma$ is idempotent, then $X_n' A_n X_n$ converges in distribution to a chi-square with $\mathrm{rank}(A\Sigma)$ degrees of freedom.
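A simulation check of the chi-square example (a sketch under assumed values, not from the handout: I take $\Sigma = I_3$ and a rank-2 projection for $A$, so $X'AX$ should be chi-square with 2 degrees of freedom, hence mean 2 and variance 4):

```python
import numpy as np

rng = np.random.default_rng(1)

# A symmetric idempotent matrix of rank 2: the projection onto the
# column space of B, A = B(B'B)^{-1}B'.
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
A = B @ np.linalg.solve(B.T @ B, B.T)
rank_A = int(round(np.trace(A)))        # for a projection, rank = trace

# With X ~ N(0, I_3), the quadratic form X'AX is chi-square(rank_A).
X = rng.standard_normal((200_000, 3))
quad = np.einsum('ij,jk,ik->i', X, A, X)   # row-wise x'Ax
mean_quad, var_quad = quad.mean(), quad.var()
```

The simulated mean and variance come out close to 2 and 4, as the continuity theorem predicts.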
2.5 Wald test

By Taylor's theorem

$\sqrt{n}[h(\hat\theta_n) - h(\theta^o)] = \bar H\sqrt{n}(\hat\theta_n - \theta^o)$.

Because $h(\theta^o) = 0$,

$\sqrt{n}\,h(\hat\theta_n) = \bar H\sqrt{n}(\hat\theta_n - \theta^o)$.

Because $\sqrt{n}(\hat\theta_n - \theta^o) \stackrel{L}{\to} N(0, V)$ and $\bar H = H + o_s(1)$,

$\sqrt{n}\,h(\hat\theta_n) \stackrel{L}{\to} N_q(0, HVH')$.

Thus

$W = n\,h(\hat\theta_n)'(\hat H\hat V\hat H')^{-1}h(\hat\theta_n)$

converges in distribution to a chi-square with $q$ degrees of freedom by Subsection 2.4.
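A worked numerical sketch of the Wald statistic (the model, sample size, and the nonlinear restriction $h(\theta) = \theta_1\theta_2 - 1$ are my choices, not the handout's; a linear regression is used only so the unconstrained estimate has a closed form):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data from y_t = theta1 + theta2*x_t + e_t with
# theta1*theta2 = 1, so the restriction h(theta) = 0 is true.
n = 400
theta_true = np.array([2.0, 0.5])
x = rng.uniform(0.0, 10.0, n)
y = theta_true[0] + theta_true[1] * x + rng.standard_normal(n)

# Least-squares estimate and an estimate of V/n.
X = np.column_stack([np.ones(n), x])
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ theta_hat
sigma2_hat = resid @ resid / n
cov_hat = sigma2_hat * np.linalg.inv(X.T @ X)      # estimates V/n

h_hat = theta_hat[0] * theta_hat[1] - 1.0          # h(theta_hat)
H_hat = np.array([theta_hat[1], theta_hat[0]])     # (d/dtheta') h at theta_hat

# W = n h'(H V H')^{-1} h; since cov_hat estimates V/n, the n cancels.
W = h_hat**2 / (H_hat @ cov_hat @ H_hat)
```

Under the (true) restriction, $W$ behaves like a chi-square draw with $q = 1$ degree of freedom, so it should rarely exceed the usual critical values.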
2.6 Likelihood ratio test

By Taylor's theorem

$(\partial/\partial\theta)s_n(\tilde\theta_n) = (\partial/\partial\theta)s_n(\hat\theta_n) + \bar J(\tilde\theta_n - \hat\theta_n) = 0 + \bar J(\tilde\theta_n - \hat\theta_n)$.

$L = 2n[s_n(\tilde\theta_n) - s_n(\hat\theta_n)]$
$\quad = 2n(\partial/\partial\theta')s_n(\hat\theta_n)(\tilde\theta_n - \hat\theta_n) + \sqrt{n}(\tilde\theta_n - \hat\theta_n)'\bar J\sqrt{n}(\tilde\theta_n - \hat\theta_n)$
$\quad = 0 + \sqrt{n}(\partial/\partial\theta')s_n(\tilde\theta_n)\bar J^{-1}\bar J\bar J^{-1}\sqrt{n}(\partial/\partial\theta)s_n(\tilde\theta_n)$
$\quad = \sqrt{n}(\partial/\partial\theta')s_n(\tilde\theta_n)[J^{-1} + o_s(1)]\sqrt{n}(\partial/\partial\theta)s_n(\tilde\theta_n)$
$\quad = \sqrt{n}(\partial/\partial\theta')s_n(\tilde\theta_n)J^{-1}\sqrt{n}(\partial/\partial\theta)s_n(\tilde\theta_n) + o_p(1)$,

where the third line substitutes $\tilde\theta_n - \hat\theta_n = \bar J^{-1}(\partial/\partial\theta)s_n(\tilde\theta_n)$ from the expansion above. Substituting

$\sqrt{n}(\partial/\partial\theta)s_n(\tilde\theta_n) = H'(HJ^{-1}H')^{-1}HJ^{-1}\sqrt{n}(\partial/\partial\theta)s_n(\theta^o) + o_p(1)$

from Subsection 2.3.5, and noting that this equation implies $\sqrt{n}(\partial/\partial\theta)s_n(\tilde\theta_n) = O_p(1)$, we have

$L = n(\partial/\partial\theta')s_n(\theta^o)J^{-1}H'(HJ^{-1}H')^{-1}HJ^{-1}(\partial/\partial\theta)s_n(\theta^o) + o_p(1)$.

Recall $\sqrt{n}(\partial/\partial\theta)s_n(\theta^o) \stackrel{L}{\to} N(0, \mathcal{I})$. If $HVH' = HJ^{-1}H'$ then $J^{-1}H'(HJ^{-1}H')^{-1}HJ^{-1}\mathcal{I}$ is idempotent and

$L = 2n[s_n(\tilde\theta_n) - s_n(\hat\theta_n)]$

converges in distribution to a chi-square with $q$ degrees of freedom by Subsection 2.4.
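A numerical sketch of $L = 2n[s_n(\tilde\theta_n) - s_n(\hat\theta_n)]$ (the model, restriction, and the assumption $\mathrm{Var}(e) = 1$ known are my choices; that assumption makes $V = J^{-1}$, so the condition $HVH' = HJ^{-1}H'$ holds without rescaling):

```python
import numpy as np

rng = np.random.default_rng(3)

# y_t = theta1 + theta2*x_t + e_t, Var(e) = 1 known; distance
# s_n(theta) = (1/2n) * sum (y_t - theta1 - theta2*x_t)^2.
n = 500
x = rng.uniform(-1.0, 1.0, n)
y = 1.0 + 0.5 * x + rng.standard_normal(n)   # H0: theta2 = 0.5 is true

X = np.column_stack([np.ones(n), x])
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)        # unconstrained
s_hat = 0.5 * np.mean((y - X @ theta_hat) ** 2)

# Constrained by h(theta) = theta2 - 0.5 = 0: fit the intercept only.
theta1_tilde = np.mean(y - 0.5 * x)
s_tilde = 0.5 * np.mean((y - theta1_tilde - 0.5 * x) ** 2)

L = 2 * n * (s_tilde - s_hat)   # likelihood ratio statistic, q = 1
```

$L$ is nonnegative by construction (the constrained minimum cannot beat the unconstrained one) and behaves like a chi-square with 1 degree of freedom under the restriction.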
2.7 Lagrange multiplier test

Using

$\sqrt{n}(\partial/\partial\theta)s_n(\tilde\theta_n) = H'(HJ^{-1}H')^{-1}HJ^{-1}\sqrt{n}(\partial/\partial\theta)s_n(\theta^o) + o_p(1)$

from Subsection 2.3.5, we have that

$R = n(\partial/\partial\theta')s_n(\tilde\theta_n)\tilde J^{-1}\tilde H'(\tilde H\tilde V\tilde H')^{-1}\tilde H\tilde J^{-1}(\partial/\partial\theta)s_n(\tilde\theta_n)$
$\quad = n(\partial/\partial\theta')s_n(\theta^o)J^{-1}H'(HVH')^{-1}HJ^{-1}(\partial/\partial\theta)s_n(\theta^o) + o_p(1)$.

Recall that $\sqrt{n}(\partial/\partial\theta)s_n(\theta^o) \stackrel{L}{\to} N(0, \mathcal{I})$. Because $J^{-1}H'(HVH')^{-1}HJ^{-1}\mathcal{I}$ is idempotent,

$R = n(\partial/\partial\theta')s_n(\tilde\theta_n)\tilde J^{-1}\tilde H'(\tilde H\tilde V\tilde H')^{-1}\tilde H\tilde J^{-1}(\partial/\partial\theta)s_n(\tilde\theta_n)$

converges in distribution to a chi-square with $q$ degrees of freedom by Subsection 2.4.
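A companion sketch for $R$, in the same simulated setup as the likelihood ratio example (again my choices, with $\mathrm{Var}(e) = 1$ known so that $V = J^{-1}$); note that only the constrained estimate is needed:

```python
import numpy as np

rng = np.random.default_rng(4)

n = 500
x = rng.uniform(-1.0, 1.0, n)
y = 1.0 + 0.5 * x + rng.standard_normal(n)   # H0: theta2 = 0.5 is true
X = np.column_stack([np.ones(n), x])

# Constrained estimate under h(theta) = theta2 - 0.5 = 0.
theta_tilde = np.array([np.mean(y - 0.5 * x), 0.5])

score = -(X.T @ (y - X @ theta_tilde)) / n   # (d/dtheta) s_n at theta_tilde
J_t = X.T @ X / n                            # (d2/dtheta dtheta') s_n
V_t = np.linalg.inv(J_t)                     # V = J^{-1} here
H = np.array([[0.0, 1.0]])                   # (d/dtheta') h

HJis = np.linalg.solve(J_t, H.T).T @ score          # H J^{-1} score
R = n * float(HJis @ np.linalg.inv(H @ V_t @ H.T) @ HJis)
```

As with $W$ and $L$, under the restriction $R$ behaves like a chi-square draw with $q = 1$ degree of freedom.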
3 Applications

3.1 Least squares

$y_t = f(x_t, \theta^o) + e_t$, $E(e_t) = 0$, $\mathrm{Var}(e_t) = \sigma^2$, $\lambda = \theta$

Structural model: $e_t = q(y_t, x_t, \theta^o) = y_t - f(x_t, \theta^o)$
Reduced form: $y_t = Y(e_t, x_t, \theta^o) = f(x_t, \theta^o) + e_t$
Distance function: $s(y, x, \theta) = [y - f(x, \theta)]^2$

$(\partial/\partial\theta)s(y, x, \theta) = -2[y - f(x, \theta)](\partial/\partial\theta)f(x, \theta)$
$(\partial^2/\partial\theta\partial\theta')s(y, x, \theta) = 2[(\partial/\partial\theta)f(x, \theta)][(\partial/\partial\theta)f(x, \theta)]' - 2[y - f(x, \theta)](\partial^2/\partial\theta\partial\theta')f(x, \theta)$

$s_n(\theta) = (1/n)\sum_{t=1}^n [y_t - f(x_t, \theta)]^2$
$s_n^o(\theta) = \sigma^2 + (1/n)\sum_{t=1}^n [f(x_t, \theta^o) - f(x_t, \theta)]^2$
$s^*(\theta) = \sigma^2 + \int_X [f(x, \theta^o) - f(x, \theta)]^2\,d\mu(x)$

$\mu\{x : f(x, \theta) \ne f(x, \theta^o)\} > 0$ for $\theta \ne \theta^o$ $\Rightarrow$ $\theta^o = \operatorname{argmin} s_n^o(\theta)$ and $\theta^o = \operatorname{argmin} s^*(\theta)$

$\mathcal{I} = \int_X 4\sigma^2[(\partial/\partial\theta)f(x, \theta)][(\partial/\partial\theta)f(x, \theta)]'\,d\mu(x)\big|_{\theta=\theta^o}$
$\mathcal{I} = \lim_{n\to\infty}(1/n)\,4\sigma^2\sum_{t=1}^n [(\partial/\partial\theta)f(x_t, \theta)][(\partial/\partial\theta)f(x_t, \theta)]'\big|_{\theta=\hat\theta_n}$
$\mathcal{I} = \lim_{n\to\infty}(1/n)\sum_{t=1}^n 4[y_t - f(x_t, \hat\theta_n)]^2[(\partial/\partial\theta)f(x_t, \hat\theta_n)][(\partial/\partial\theta)f(x_t, \hat\theta_n)]'$
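A numerical sketch of this subsection for a made-up scalar model $f(x, \theta) = \exp(\theta x)$ (the model, true values, and sample size are my choices): Gauss-Newton for the least-squares estimate, then the sample versions of $J = 2Q$, of $\mathcal{I}$ from the residual-based formula above, and of $V = J^{-1}\mathcal{I}J^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(5)

n = 300
theta_o, sigma = 0.7, 0.2
x = rng.uniform(0.0, 1.0, n)
y = np.exp(theta_o * x) + sigma * rng.standard_normal(n)

f  = lambda th: np.exp(th * x)
df = lambda th: x * np.exp(th * x)        # (d/dtheta) f

theta_hat = 0.5
for _ in range(50):                        # Gauss-Newton iterations
    g = df(theta_hat)
    theta_hat += g @ (y - f(theta_hat)) / (g @ g)

r = y - f(theta_hat)                       # residuals
g = df(theta_hat)
# J_hat drops the term -2r*(d2f), whose sample average is near zero.
J_hat = 2.0 * (g @ g) / n                  # (1/n) sum 2 (df)(df)'
I_hat = 4.0 * ((r * g) @ (r * g)) / n      # (1/n) sum 4 r^2 (df)(df)'
V_hat = I_hat / J_hat**2                   # J^{-1} I J^{-1}, scalar since p = 1
se = np.sqrt(V_hat / n)                    # standard error of theta_hat
```

Since $V = \sigma^2 Q^{-1}$ here, `V_hat` should be close to $\sigma^2$ divided by the average squared gradient, and `se` gives the usual large-sample standard error.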
3.2 Maximum likelihood

$y_t = f(x_t, \theta^o) + \sigma^o e_t$, $E(e_t) = 0$, $\mathrm{Var}(e_t) = 1$, $\lambda = (\theta', \sigma^2)'$

Structural model: $e_t = q(y_t, x_t, \lambda^o) = [y_t - f(x_t, \theta^o)]/\sigma^o$
Reduced form: $y_t = Y(e_t, x_t, \lambda^o) = f(x_t, \theta^o) + \sigma^o e_t$
Distance function: $s(y, x, \lambda) = (1/2)\{\log\sigma^2 + \sigma^{-2}[y - f(x, \theta)]^2\}$

$(\partial/\partial\theta)s(y, x, \lambda) = -\sigma^{-2}[y - f(x, \theta)](\partial/\partial\theta)f(x, \theta)$
$(\partial^2/\partial\theta\partial\theta')s(y, x, \lambda) = \sigma^{-2}[(\partial/\partial\theta)f(x, \theta)][(\partial/\partial\theta)f(x, \theta)]' - \sigma^{-2}[y - f(x, \theta)](\partial^2/\partial\theta\partial\theta')f(x, \theta)$
$(\partial/\partial\sigma^2)s(y, x, \lambda) = (1/2)\{\sigma^{-2} - \sigma^{-4}[y - f(x, \theta)]^2\}$
$(\partial^2/\partial(\sigma^2)^2)s(y, x, \lambda) = (1/2)\{-\sigma^{-4} + 2\sigma^{-6}[y - f(x, \theta)]^2\}$
$(\partial^2/\partial\sigma^2\partial\theta)s(y, x, \lambda) = \sigma^{-4}[y - f(x, \theta)](\partial/\partial\theta)f(x, \theta)$

$s_n(\lambda) = (1/n)\sum_{t=1}^n (1/2)\{\log\sigma^2 + \sigma^{-2}[y_t - f(x_t, \theta)]^2\}$
$s_n^o(\lambda) = (1/2)\{\log\sigma^2 + (\sigma^o/\sigma)^2 + (1/n)\sum_{t=1}^n \sigma^{-2}[f(x_t, \theta^o) - f(x_t, \theta)]^2\}$
$s^*(\lambda) = (1/2)\{\log\sigma^2 + (\sigma^o/\sigma)^2 + \int_X \sigma^{-2}[f(x, \theta^o) - f(x, \theta)]^2\,d\mu(x)\}$

$\mu\{x : f(x, \theta) \ne f(x, \theta^o)\} > 0$ for $\theta \ne \theta^o$ $\Rightarrow$ $(\theta^o, \sigma^o) = \operatorname{argmin} s_n^o(\lambda)$ and $(\theta^o, \sigma^o) = \operatorname{argmin} s^*(\lambda)$
$\mathcal{I} = \begin{pmatrix} (\sigma^o)^{-2}Q & (1/2)(\sigma^o)^{-3}E(e^3)\,q \\ (1/2)(\sigma^o)^{-3}E(e^3)\,q' & (1/4)(\sigma^o)^{-4}\mathrm{Var}(e^2) \end{pmatrix}$

$J = \begin{pmatrix} (\sigma^o)^{-2}Q & 0 \\ 0 & (1/2)(\sigma^o)^{-4} \end{pmatrix}$

$q = \int_X (\partial/\partial\theta)f(x, \theta)\,d\mu(x)\big|_{\theta=\theta^o}$
$Q = \int_X [(\partial/\partial\theta)f(x, \theta)][(\partial/\partial\theta)f(x, \theta)]'\,d\mu(x)\big|_{\theta=\theta^o}$

$V = J^{-1}\mathcal{I}J^{-1} = \begin{pmatrix} (\sigma^o)^2 Q^{-1} & (\sigma^o)^3 E(e^3)\,Q^{-1}q \\ (\sigma^o)^3 E(e^3)\,q'Q^{-1} & (\sigma^o)^4\mathrm{Var}(e^2) \end{pmatrix}$
4 References

Burguete, Jose F., A. Ronald Gallant, and Geraldo Souza (1982), "On Unification of the Asymptotic Theory of Nonlinear Econometric Models," Econometric Reviews 1, 151-190.

Gallant, A. Ronald (1987), Nonlinear Statistical Models, New York: Wiley.