Proceedings of IICMA 2009 Topic, pp. xx—xx.

Binary Response Nonparametric Regression Model and Its Application in University Graduation

Jerry Dwi Trijoyo Purnomo, Statistics Department, Institut Teknologi Sepuluh Nopember. Email: [email protected]

Suhartono, Statistics Department, Institut Teknologi Sepuluh Nopember. Email: [email protected]

ABSTRACT. In the linear logistic regression model, the expectation of a binary response variable is modelled through the logit, ln(p(x)/(1-p(x))) = x'a. The implied linearity is often violated when the explanatory variables act through smooth functions, so alternative forms are sought. The logit model for the expectation of a binary response variable then becomes ln(p(x)/(1-p(x))) = Σ_{i=1}^{k} φ_i(x_i), where the φ_i are general smooth functions of the explanatory variables. Estimation is achieved using local maximum likelihood. The technique is illustrated on a university graduation problem (master degree), with status of grade point average (GPA) as the response variable, and TOEFL pre-test and "Tes Potensi Akademik" (TPA) scores as explanatory variables.

Keywords: logistic regression, binary response, smooth function, local maximum likelihood

1. INTRODUCTION
One of the important methods in statistics is that of regressing a binary response variable on a set of explanatory variables. This has special application in medical diagnosis, education, and risk analysis. The example used in this paper is university graduation (master degree) at Institut Teknologi Sepuluh Nopember (ITS) Surabaya. The response variable y is coded 1 if the GPA is greater than 3.50 (scale 4.00), and 0 otherwise. We have a sample of master degree graduates' GPAs, which we use to model the probability of the binary response as a function of the explanatory variables. Specifically, we wish to estimate p(x) = P(y=1|x) = 1 - P(y=0|x) for any vector x. A standard approach to the problem is the linear logistic regression model (Hastie, 1987):

p(x) = exp(x'a) / (1 + exp(x'a))

or

ln( p(x) / (1 - p(x)) ) = logit p(x) = x'a                                (1)

2000 Mathematics Subject Classification:


In words this says that the log-odds of the model are linear in the predictor variables. The unknown parameter vector a can be found using maximum likelihood (Hosmer and Lemeshow, 1989). Another approach is to assume that the predictor variables are jointly normal with the same covariance matrix in each group (y=1 or y=0) but with different mean vectors. For this model the log-odds are once again linear in x, and the parameters are functions of the parameters of the normal distribution. This is known as Fisher's linear discriminant function (Lachenbruch, 1975). The logit form in (1) guarantees that the estimated probabilities are positive and lie in the interval [0,1]. It is also the form of the natural parameter of the Binomial distribution in the Exponential Family (Hastie, 1987). An often unjustified and misleading assumption is that logit p(x) is linear in x. The effect of a predictor may be felt only over a portion of its range. Sometimes a linear effect is adequate in terms of predictive ability; however, a linear term may also be inappropriate and lead to a wrong interpretation. In order to generalize (1), we propose the model:

logit p(x) = Σ_{i=1}^{k} φ_i(x_i)                                         (2)

where φ_i(x_i) is an unspecified nonparametric smooth function of x_i, and k is the dimension of x. The estimation is performed using the local likelihood technique introduced by Tibshirani (1982) in the context of censored data and the proportional hazards model.

2. METHODS
2.1 The Linear Logistic Model
Consider the linear case in which logit p(x) = x'a. The log-likelihood for n independent observations (y1, x1), ..., (yn, xn) is

L(a) = Σ_{i=1}^{n} [ y_i ln p_i + (1 - y_i) ln(1 - p_i) ]
     = Σ_{i=1}^{n} [ y_i x_i'a - ln(1 + e^{x_i'a}) ]                      (3)

where p_i = p(x_i). Let X be the n×p matrix of predictor variables, y the n-vector of responses, and p the n-vector of model probabilities with ith element p_i. The maximum likelihood estimate â maximizes (3), and the score equation is

X'(y - p̂) = 0                                                            (4)

where

p̂_i = e^{x_i'â} / (1 + e^{x_i'â}).

The expected information matrix is given by

I(a) = X'VX

where V is a diagonal matrix with ith entry p_i(1 - p_i). The Newton-Raphson iterative procedure can be used to solve the non-linear system (4), with the estimate at the (t+1)st iteration

â(t+1) = â(t) + I^{-1}(â(t)) X'(y - p̂(t))                                (5)
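The iteration (3)-(5) can be sketched as follows. This is a minimal illustration on simulated data; the function name and the toy data are ours, not the paper's.

```python
import numpy as np

def fit_logistic_newton(X, y, max_iter=25, tol=1e-8):
    """Maximise the log-likelihood (3) by the Newton-Raphson update (5):
    a(t+1) = a(t) + I(a(t))^{-1} X'(y - p(t))."""
    a = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ a))       # model probabilities p_i
        score = X.T @ (y - p)                  # score function, eq. (4)
        V = p * (1.0 - p)                      # diagonal entries of V
        info = X.T @ (X * V[:, None])          # expected information X'VX
        step = np.linalg.solve(info, score)
        a = a + step
        if np.max(np.abs(step)) < tol:
            break
    return a

# Toy illustration on simulated data (not the paper's dataset).
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])           # intercept plus one predictor
true_a = np.array([-0.5, 1.5])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_a))).astype(float)
a_hat = fit_logistic_newton(X, y)
```

Since the log-likelihood (3) is concave, the iteration converges quickly from a zero starting value in regular cases.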


2.2 The Nonlinear Model With One Predictor
Consider the model logit p(x) = φ(x), where x is a scalar predictor variable. Let the sample points x1, x2, ..., xn be sorted in ascending order. We wish the estimate at each point xi to exhibit the local behavior of the response, so we consider only those points within a certain neighborhood of xi and base estimation on them. The neighborhood is defined in terms of a span, which is the proportion of the sample included. Usually we take half the span to the left and half to the right of xi; at the end points we have to consider asymmetric neighborhoods. The local likelihood for span s (s ∈ [0,1]) at point i is given by

L(a(i), i, s) = Σ_{j=l(i,s)}^{r(i,s)} [ y_j a_0(i) + y_j x_j a_1(i) - ln(1 + e^{a_0(i) + a_1(i) x_j}) ]      (6)

where

l(i, s) = max(1, i - ns/2),  r(i, s) = min(n, i + ns/2).

Let â(i) maximize (6) and define

φ̂(x_i) = â_0(i) + â_1(i) x_i                                             (7)

The estimate of φ(x_i) is affected only by the ns/2 nearest neighbors to the left and ns/2 to the right, and thus exhibits local properties of the data. As we move to estimate φ(x_{i+1}), point l(i,s) leaves the likelihood and point r(i,s)+1 enters it, so the likelihood does not change much. As a consequence, â(i+1) is not much different from â(i), and hence φ̂(x_{i+1}) is not much different from φ̂(x_i). This results in a smooth estimated curve φ̂(·). As s increases toward 1, φ̂ gets smoother, and in the limit it is the usual straight line (Hastie, 1987). Each local likelihood is maximized using the above iterative procedure, which can be time consuming. However, â(i) is an excellent starting value for the (i+1)st local likelihood, and convergence is usually achieved in 1 or 2 iterations (Hastie, 1987).
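The scheme of (6)-(7), including the warm start across neighborhoods, can be sketched as below. The function name, the numerical safeguards (clipping of the linear predictor and of the weights), and the toy data are our assumptions, not the paper's.

```python
import numpy as np

def local_logit_smooth(x, y, span=0.5, max_iter=20):
    """Local likelihood estimate of phi as in eqs. (6)-(7): at each sorted
    x_i, fit a local linear logistic model to the ~n*span nearest points
    and evaluate the local fit at x_i."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n = len(xs)
    half = max(1, int(n * span / 2))
    phi = np.empty(n)
    a = np.zeros(2)                            # warm start: a(i) seeds a(i+1)
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        Xl = np.column_stack([np.ones(hi - lo), xs[lo:hi]])
        yl = ys[lo:hi]
        for _ in range(max_iter):              # Newton-Raphson as in eq. (5)
            p = 1.0 / (1.0 + np.exp(-np.clip(Xl @ a, -30, 30)))
            V = np.clip(p * (1.0 - p), 1e-6, None)
            step = np.linalg.solve(Xl.T @ (Xl * V[:, None]), Xl.T @ (yl - p))
            a = a + step
            if np.max(np.abs(step)) < 1e-8:
                break
        phi[i] = a[0] + a[1] * xs[i]           # eq. (7)
    return xs, phi

# Toy illustration: the true logit is linear, so phi-hat should rise with x.
rng = np.random.default_rng(42)
x = rng.normal(size=400)
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-x))).astype(float)
xs, phi = local_logit_smooth(x, y, span=0.5)
```

With the warm start, the inner Newton loop usually stops after very few iterations, as noted above.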

2.3 The Non-Linear Model With More Than One Predictor
The procedure here is related to the backfitting algorithm applied to additive models in Friedman and Stuetzle (1982), and adapted by Tibshirani (1982) for local likelihood estimation in the Cox model. Suppose we are given φ_1(·), ..., φ_{p-1}(·) and let

φ^{(p)}(x_j) = Σ_{i=1}^{p-1} φ_i(x_{ji})

where x_j' = (x_{j1}, x_{j2}, ..., x_{jp}). We need to estimate φ_p(x_{jp}). The local likelihood is:


L(a(i), i, s) = Σ_{j=l(i,s)}^{r(i,s)} [ y_j φ^{(p)}(x_j) + y_j a_0(i) + y_j x_{jp} a_1(i) - ln(1 + e^{φ^{(p)}(x_j) + a_0(i) + a_1(i) x_{jp}}) ]

where logit(p̂_j) = φ^{(p)}(x_j) + â_0(i) + â_1(i) x_{jp}. The local information is defined similarly. Thus φ̂_p(·) can be found using the Newton-Raphson procedure as before.
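A sketch of one such backfitting cycle follows: the fixed smooths enter through an offset in the logit, and the two coordinates are updated in turn. The function names, the numerical safeguards, and the simulated additive data are our assumptions, not the paper's.

```python
import numpy as np

def local_logit_offset(xp, y, offset, span=0.7, max_iter=20):
    """One backfitting step of Section 2.3: estimate phi_p by local linear
    logistic fits in x_p, holding the other smooths fixed inside the
    offset phi^{(p)}(x_j), which enters the logit unchanged."""
    n = len(xp)
    order = np.argsort(xp)
    xs, ys, off = xp[order], y[order], offset[order]
    half = max(1, int(n * span / 2))
    phi = np.empty(n)
    a = np.zeros(2)
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        Xl = np.column_stack([np.ones(hi - lo), xs[lo:hi]])
        for _ in range(max_iter):
            eta = np.clip(off[lo:hi] + Xl @ a, -30, 30)
            p = 1.0 / (1.0 + np.exp(-eta))
            V = np.clip(p * (1.0 - p), 1e-6, None)
            step = np.linalg.solve(Xl.T @ (Xl * V[:, None]),
                                   Xl.T @ (ys[lo:hi] - p))
            a = a + step
            if np.max(np.abs(step)) < 1e-8:
                break
        phi[i] = a[0] + a[1] * xs[i]
    out = np.empty(n)
    out[order] = phi                           # restore the original order
    return out

def backfit_two(x1, x2, y, n_cycles=5):
    """Cycle between the two coordinates until the smooths stabilise."""
    phi1 = np.zeros(len(y))
    phi2 = np.zeros(len(y))
    for _ in range(n_cycles):
        phi1 = local_logit_offset(x1, y, offset=phi2)
        phi2 = local_logit_offset(x2, y, offset=phi1)
    return phi1, phi2

# Toy illustration with an additive true logit (not the paper's data).
rng = np.random.default_rng(7)
n = 300
x1, x2 = rng.normal(size=n), rng.normal(size=n)
true_logit = 1.5 * x1 - 1.0 * x2
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)
phi1, phi2 = backfit_two(x1, x2, y)
```

Only the overall sum φ_1 + φ_2 is identified; a constant can drift between the two smooths, so in practice each smooth is usually centered after every cycle.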

2.4 Spline in Nonparametric Regression
There are various approaches to obtaining the estimator f. If the regression curve is assumed to be smooth, in the sense of continuous and differentiable, the estimate of f is obtained using the Penalized Least Squares (PLS) approach, a criterion that balances goodness-of-fit against smoothness. In general, a spline function of degree k-1 with knots S_1, S_2, ..., S_h is any function that can be presented in the form (Heckman, 1986):

S(t) = Σ_{i=1}^{k-1} α_i t^i + Σ_{j=1}^{h} δ_j (t - S_j)_+^{k-1}          (8)

where

(t - S_j)_+^{k-1} = (t - S_j)^{k-1}   if t ≥ S_j,
                  = 0                 if t < S_j,

the α_i and δ_j are real constants, and S_1, S_2, ..., S_h are knots.
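The basis underlying (8) can be built column by column, as in the following sketch; the function name and the example values are ours.

```python
import numpy as np

def truncated_power_basis(t, knots, degree):
    """Columns t, t^2, ..., t^degree followed by (t - S_j)_+^degree for
    each knot S_j, i.e. the truncated power basis of eq. (8)."""
    cols = [t ** i for i in range(1, degree + 1)]
    cols += [np.where(t >= s, (t - s) ** degree, 0.0) for s in knots]
    return np.column_stack(cols)

# Quadratic spline basis (degree 2) with a single knot at 0.5.
t = np.linspace(0.0, 1.0, 5)
B = truncated_power_basis(t, knots=[0.5], degree=2)
```

Each truncated column is zero below its knot and a shifted power above it, which is what makes the spline piecewise polynomial with continuous lower-order derivatives at the knots.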

2.5 Choice of Smoothing Parameter (λ)
The smoothing parameter controls the balance between fidelity of the curve to the data and smoothness of the curve (Eubank, 1988). A very small value of λ yields a very rough fit, while a very large value yields a very smooth function (Wahba, 1990; Eubank, 1988). Several methods for selecting the smoothing parameter have been developed, among them Cross Validation (CV) and Generalized Cross Validation (GCV): Craven and Wahba (1979), Wahba (1985), Li (1986), Kohn et al. (1991), Andrews (1991), Shao (1993), Venter and Snyman (1995), Kauermann and Opsomer (2001), Carew, Wahba, Xie, Nordheim, and Meyerand (2003), Ruppert, Wand, and Carroll (2003). In the spline regression model, the GCV criterion is defined as:

GCV(λ) = MSE(λ) / ( n^{-1} tr[I - A(λ)] )^2

       = [ n^{-1} Σ_{j=1}^{n} (y_j - f̂_λ(t_j))^2 ] / ( n^{-1} tr(I - A(λ)) )^2

       = n^{-1} ||(I - A(λ)) y||^2 / ( n^{-1} tr(I - A(λ)) )^2

3. MAIN RESULTS
An observation was made in 2008 by Fathurahman at Institut Teknologi Sepuluh Nopember (ITS) concerning master degree graduates of ITS. There are 213 graduate students and three variables:

y  = 1 if GPA < 3.5 (scale 4.00), 0 if GPA ≥ 3.5 (scale 4.00)
x1 = TOEFL pre-test score
x2 = TPA score

Descriptive statistics of the ITS graduate students are given in Table 3.1, and a summary of the parameter estimation of the logistic regression is shown in Table 3.2.

Table 3.1. Descriptive statistics of graduate students
Variable               Mean     St. Dev.   Minimum   Maximum
GPA (y)                3.43     0.26       2.86      3.94
TOEFL pre-test (x1)    422.27   56.52      310       570
TPA (x2)               45.8     10.2       15        68.9

Table 3.2. Parameter estimation of logistic regression
Variable    a        Standard Error   Sig.
x1          0.012    0.003            0.001
x2          0.081    0.023            0.000
Constant    -9.842   1.566            0.000

The linear logistic regression model from this observation is:

p(x) = exp(-9.842 + 0.012 x1 + 0.081 x2) / (1 + exp(-9.842 + 0.012 x1 + 0.081 x2))

or

ln( p(x) / (1 - p(x)) ) = logit p(x) = -9.842 + 0.012 x1 + 0.081 x2

The observation made by Fathurahman (2008) for master degree graduates of ITS shows that the relationships between GPA, TOEFL pre-test score, and TPA score have no clear pattern: some graduate students with high GPA have low TOEFL and TPA scores, and vice versa. This is also reflected in the relatively small R^2 = 31%, which indicates that a nonparametric model may be more appropriate. The logistic regression model with quadratic spline terms then becomes:

logit p(x) = 1.126 - 0.00595 x1 + 0.0000097 x1^2 - 0.0000161 (x1 - 470)_+^2
             - 0.0184 x2 + 0.000359 x2^2 - 0.000928 (x2 - 56)_+^2

with R^2 = 98%.


Figure 3.1. Correspondence between GPA and TOEFL pre-test score


Figure 3.2. Correspondence between GPA and TPA score

4. CONCLUDING REMARKS

The presence of the peculiar relationships in the data above implies that the linear logistic regression method is less suitable in practice. The peculiar relationships between the response variable and the explanatory variables are due to the presence of smooth functions of the explanatory variables. The nonparametric spline approach is a solution that can be used to overcome this condition, since it gives better results than linear logistic regression.


REFERENCES

1. D.W.K. Andrews, Asymptotic Optimality of Generalized C_L, Cross-Validation, and Generalized Cross-Validation in Regression with Heteroskedastic Errors, Journal of Econometrics, 47, 359-377, 1991.
2. J.D. Carew, G. Wahba, X. Xie, E.V. Nordheim, and M.E. Meyerand, Optimal Spline Smoothing of fMRI Time Series by Generalized Cross-Validation, NeuroImage, 18, 950-961, 2003.
3. P. Craven and G. Wahba, Smoothing Noisy Data with Spline Functions: Estimating the Correct Degree of Smoothing by the Method of Generalized Cross-Validation, Numer. Math., 31, 377-403, 1979.
4. R.L. Eubank, Spline Smoothing and Nonparametric Regression, Marcel Dekker, New York, 1988.
5. J.H. Friedman and W. Stuetzle, Smoothing of Scatterplots, Dept. of Statistics Tech. Rept. Orion 3, Stanford University, 1982.
6. T.J. Hastie, Nonparametric Logistic and Proportional Odds Regression, Applied Statistics, 36, 260-276, 1987.
7. D.W. Hosmer and S. Lemeshow, Applied Logistic Regression, John Wiley, New York, 1989.
8. G. Kauermann and J.D. Opsomer, A Fast Method for Implementing Generalized Cross-Validation in Multidimensional Nonparametric Regression, Paper 247, 2001.
9. R. Kohn et al., The Performance of Cross-Validation and Maximum Likelihood Estimators of Spline Smoothing Parameters, Journal of the American Statistical Association, 86, 1042-1050, 1991.
10. P.A. Lachenbruch, Discriminant Analysis, Hafner Press, New York, 1975.
11. K.C. Li, Asymptotic Optimality of C_L and Generalized Cross-Validation in Ridge Regression with Application to Spline Smoothing, Ann. Statist., 14, 1101-1112, 1986.
12. D. Ruppert, M.P. Wand, and R.J. Carroll, Semiparametric Regression, Cambridge University Press, New York, 2003.
13. R. Tibshirani, Nonparametric Estimation of Relative Risk, submitted to the Journal of the American Statistical Association, 1982.
14. J.H. Venter and J.L.J. Snyman, A Note on the Generalized Cross-Validation Criterion in Linear Model Selection, Biometrika, 82, 215-219, 1995.
15. G. Wahba, A Comparison of GCV and GML for Choosing the Smoothing Parameter in the Generalized Spline Smoothing Problem, The Annals of Statistics, 13, 1378-1402, 1985.
16. G. Wahba, Spline Models for Observational Data, SIAM, Philadelphia, 1990.
