
Applications of Generalized Linear Regression Models in Preschool Language Research¹

Zhen Zhang, Huirong Zhu, Cong Lu

[zhangz19, zhuhuiro, lucong]@msu.edu
Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA

Abstract

In this technical report we apply generalized linear models to study the relationship between children's pass/fail outcome on the MCC test and selected criterion measures in preschool language research. We also carry out a polytomous data analysis to identify informative measurements that significantly discriminate between groups of children defined by their performance on both the MCC test and the standardized PLS-3 test.

Keywords: Categorical Data Analysis, Logistic Regression, Multinomial Logit Regression, Classification, Model Comparison and Validation

1 Introduction

Ongoing research in preschool communication evaluates preschool children's performance on a minimal competence core (MCC) of consonants used by African American preschoolers, introduced by I. Stockman (2006) [2]², with consonants sampled from the conversational speech of 120 Head Start students in a northern (Lansing, Michigan) and a southern (Baton Rouge, Louisiana) location. Background, including detailed explanations of the measurements and of how the data were collected, can be found in [2] [3] [4]; readers are encouraged to consult these articles for further insight into language research on preschool children. The main interest lies in evaluating and predicting children's performance on the MCC test. This test is more general than the Preschool Language Scale (PLS-3) test in the sense that it focuses on natural speech and can be carried out for general language in countries where the standardized PLS-3 is not available.

A set of typical and atypical measurements was collected along with the pass/fail outcome on the MCC test, including basic demographic information and performance on relevant tests such as PLS-3, the overall Percentage of Consonants Correct-Revised (PCC-R) and the Number of Different Words (NDW). Suspected clinical delay, or clinical status, was also determined for each child by a triangulation of the child's parents, preschool teachers and screeners who gave a brief test to check for possible speech problems. The children in the two cohorts (LA and MI) were sampled from families below the poverty line and had no disability such as hearing loss, blindness, social pathology or physical defects, so as to exclude environmental risks.

The researchers aim to investigate whether the measurements correlate differently with the pass/fail outcome of the MCC grammar test, and to characterize the relationship between performance and the measurements with significant effects. A further aim is to compare performance on the MCC test with performance on the standardized PLS test with a cutoff score of 79 (pass if > 79). Although the two tests are expected to agree, disagreement may be of greater interest, and it is worth examining which measurements explain it across the four possible pass/fail groups. These two objectives call for both dichotomous and polytomous analyses, and we use generalized linear models [10] [8] [9] for this study. We mainly use the statistical software R, with relevant functions described in [7] [11] [6] [5] [1].

¹ Technical report of a course project for STT864: Applied Statistical Methods, MSU, Spring 2010, taught by Prof. Maiti.
² Ida J. Stockman is a professor in the College of Communication Arts & Science at Michigan State University, East Lansing, U.S.A. (e-mail: [email protected])

2 Data Description

The data set we analyse contains 119 children from the two cohorts (LA and MI). The measurements were preliminarily screened to remove collinearity. For instance, the PLS-3 test yields three scores (Auditory Comprehension, Expressive Communication and Total Language), of which we use only the total score. The scatter plot matrix is shown in Figure 1: the lower triangle plots each pair of variables, the diagonal shows a histogram of each variable, and the upper triangle gives the correlation coefficient of each pair. It was produced with the R add-on module PCCAT [12]. The matrix also indicates a high correlation between "words" and "utterance" (Average Total MS Dialectal Alterations, Only Words and Only Utterance respectively), so we keep "words" and exclude "utterance", since the latter has higher correlations with the other measurements. This leaves p = 11 measurements in the initially proposed regression model for the pass/fail outcome of the MCC test, which is shown in the bottom row of Figure 1. Note that for the second part, where we study the effects of the measurements on the four groups defined by the pass/fail outcomes of both the MCC test and the PLS total score (cutoff: 79), the data set has since been updated and may contain different values.
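The collinearity screening described above can be illustrated with a minimal Python sketch. It is not the authors' procedure (the report used R and the PCCAT module): the helper names, the 0.8 threshold and the toy values are all assumptions made for illustration, and the numbers are not the study data.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def high_correlation_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair of columns with |r| >= threshold."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged

# Made-up toy values (hypothetical), NOT the study data.
toy = {
    "words": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "utterance": [1.1, 1.9, 3.2, 3.9, 5.1, 6.2],
    "NDW_100": [50.0, 62.0, 48.0, 70.0, 55.0, 60.0],
}

flagged = high_correlation_pairs(toy)
for a, b, r in flagged:
    print(f"{a} vs {b}: r = {r:.3f}")
```

A flagged pair is a candidate for dropping one of its members; which member to drop remains a judgment call, as in the choice between "words" and "utterance" above.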

[Scatter plot matrix. Panel labels: Location, Examiner, Gender, PLS_tot, Density, MS_Percent, words, utterance, AGE_LS, PCC_RV, clin_status, NDW_100, pass.]

Figure 1: Scatter plot matrix of all variables

3 Logistic Regression

We are interested in which variables in our data influence children's performance on the MCC grammar test. We use the binary variable PF (pass/fail) to measure performance on this test and treat the other variables as factors that may influence the result. Since PF is binary, it is natural to use logistic regression to identify which factors contribute to performance on the MCC grammar test.

First, we select the predictor variables. The scatter plot matrix above shows that words and utterance are highly correlated, with correlation coefficient 0.9. Since utterance also has higher correlations with the other variables, we do not include utterance in the model. Considering the background information and the last row of the scatter plot matrix, it is reasonable to take PLS_tot, Density, MS_Percent, words, AGE_LS, PCC_RV, clin_status, NDW_100 and Proportion as candidate predictors. With p denoting the probability of passing the MCC test, the initial model is

\[
\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1\,\mathrm{PLS\_tot} + \beta_2\,\mathrm{Density} + \dots + \beta_8\,\mathrm{NDW\_100} + \beta_9\,\mathrm{Proportion} \qquad (1)
\]

When this model is fitted in R, most coefficients have large p-values. We therefore set a threshold of p < 0.1 and used backward selection to retain only the predictors with p-values below this threshold. After backward selection, clin_status, NDW_100 and PLS_tot remained. Since the logit model is the most popular model for binary response data, we fitted the logit model

\[
\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1\,\mathrm{clin\_status} + \beta_2\,\mathrm{NDW\_100} + \beta_3\,\mathrm{PLS\_tot} \qquad (2)
\]

The parameter estimates obtained in R are given in Table 1.

               Estimate     Std. Error   t value   Pr(>|t|)
(Intercept)    -17.26902    4.95249      -3.487    0.000489
clin_status      2.79025    0.80002       3.488    0.000487
NDW_100          0.11803    0.03587       3.291    0.000999
PLS_tot          0.07192    0.04241       1.696    0.089939

AIC: 54.552

Table 1: Summary of the logit model
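As a consistency check on Table 1, each test statistic is the estimate divided by its standard error. The sketch below recomputes these ratios and the two-sided p-values in Python. Using a standard normal reference distribution is an assumption made here (the table header labels the statistic as a t value); the function and variable names are illustrative only.

```python
import math

def two_sided_normal_p(z):
    """Two-sided p-value of a test statistic under a standard normal reference."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2.0))

# (estimate, standard error) pairs copied from Table 1.
table1 = {
    "(Intercept)": (-17.26902, 4.95249),
    "clin_status": (2.79025, 0.80002),
    "NDW_100": (0.11803, 0.03587),
    "PLS_tot": (0.07192, 0.04241),
}

wald = {}
for name, (estimate, std_error) in table1.items():
    z = estimate / std_error
    wald[name] = (z, two_sided_normal_p(z))
    print(f"{name:12s} statistic = {z:7.3f}   p = {wald[name][1]:.6f}")
```

Under this assumption the recomputed values agree with the tabulated ones to within rounding.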

Another popular model for a binary response is the probit model. For comparison, we also fitted a probit model with the same selected predictors. The parameter estimates obtained in R are given in Table 2.

               Estimate     Std. Error   t value   Pr(>|t|)
(Intercept)    -9.47228     2.64717      -3.578    0.000346
clin_status     1.49758     0.42670       3.510    0.0004497
NDW_100         0.06424     0.01905       3.373    0.000744
PLS_tot         0.04057     0.02369       1.712    0.086808

AIC: 54.305

Table 2: Summary of the probit model
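Logit and probit coefficients live on different scales, so they are easier to compare through their ratios and through the probabilities they imply. The following Python sketch does both with the estimates from Tables 1 and 2. The example child (clin_status = 1, NDW_100 = 80, PLS_tot = 90) is hypothetical and chosen only for illustration; the helper names are likewise assumptions.

```python
import math

def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def normal_cdf(eta):
    """Standard normal CDF, the inverse of the probit link."""
    return 0.5 * math.erfc(-eta / math.sqrt(2.0))

# Estimates copied from Table 1 (logit) and Table 2 (probit).
logit_coef = {"(Intercept)": -17.26902, "clin_status": 2.79025,
              "NDW_100": 0.11803, "PLS_tot": 0.07192}
probit_coef = {"(Intercept)": -9.47228, "clin_status": 1.49758,
               "NDW_100": 0.06424, "PLS_tot": 0.04057}

# Ratio of logit to probit estimates, term by term.
ratios = {name: logit_coef[name] / probit_coef[name] for name in logit_coef}

def linear_predictor(coef, clin_status, ndw_100, pls_tot):
    """Linear predictor beta0 + beta1*clin_status + beta2*NDW_100 + beta3*PLS_tot."""
    return (coef["(Intercept)"] + coef["clin_status"] * clin_status
            + coef["NDW_100"] * ndw_100 + coef["PLS_tot"] * pls_tot)

# Hypothetical child, not taken from the data set.
child = {"clin_status": 1, "ndw_100": 80.0, "pls_tot": 90.0}
p_logit = inv_logit(linear_predictor(logit_coef, **child))
p_probit = normal_cdf(linear_predictor(probit_coef, **child))

for name, ratio in ratios.items():
    print(f"{name:12s} logit/probit ratio = {ratio:.3f}")
print(f"pass probability: logit = {p_logit:.3f}, probit = {p_probit:.3f}")
```

The ratios are roughly constant across terms, which is the usual pattern when the two links describe the same fit on different scales.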

The diagnostic plots for the logit and probit models are shown below.

[Figure: diagnostic plots for the logit-link and probit-link models. Panels for each link: Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage (with Cook's distance contours). Labelled observations include 22, 36, 46, 49 and 86.]

Comparing the estimated coefficients in Table 1 and Table 2, all variables in both models are significant at the 0.1 level. The AIC is slightly lower for the probit model (54.305 versus 54.552), and the diagnostic plots of the two models show no large differences. The coefficient of NDW_100 changes from 0.118 in the logit model to 0.064 in the probit model. This reflects the different scales of the two link functions rather than a loss of significance (the test statistics are 3.291 and 3.373), but it makes the effect of a unit change in NDW_100 look smaller and harder to explain on the probit scale. Most importantly, the logit model is easier to interpret, since its results can be expressed as odds ratios. Hence we choose the logit model as our final model:

\[
\log\left(\frac{p}{1-p}\right) = -17.269 + 2.790\,\mathrm{clin\_status} + 0.118\,\mathrm{NDW\_100} + 0.0719\,\mathrm{PLS\_tot} \qquad (3)
\]

We plotted the 3D surface of the pass probability against the two continuous measurements, separately for clinical status OK and not OK. From these S-shaped curves and image plots we can read off the relationship between passing or failing the MCC test and the PLS total score and NDW. With clinical status fixed, increasing the PLS total score or NDW increases the probability of passing the MCC test. Comparing the S-shaped curves for clinical status OK and not OK, we conclude that a child whose clinical status is OK is more likely to pass the MCC test than a child whose clinical status is not OK, at the same PLS total score and NDW.
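The odds-ratio reading of the fitted logit model and the probability surface described above can be sketched in Python as follows. The coefficients are copied from Table 1. The coding of clinical status (1 = OK, 0 = not OK) and the 20 to 140 grid are assumptions made for illustration, and the function names are hypothetical.

```python
import math

# Coefficients of the fitted logit model, copied from Table 1.
INTERCEPT = -17.26902
COEF = {"clin_status": 2.79025, "NDW_100": 0.11803, "PLS_tot": 0.07192}

# Odds ratio for a one-unit increase in each predictor, others held fixed.
odds_ratios = {name: math.exp(beta) for name, beta in COEF.items()}

def pass_probability(clin_status, ndw_100, pls_tot):
    """Predicted probability of passing the MCC test under the logit model."""
    eta = (INTERCEPT + COEF["clin_status"] * clin_status
           + COEF["NDW_100"] * ndw_100 + COEF["PLS_tot"] * pls_tot)
    return 1.0 / (1.0 + math.exp(-eta))

# Assumed grid for both continuous measurements: 20, 40, ..., 140.
grid = list(range(20, 141, 20))

# surface[status][i][j]: probability at NDW_100 = grid[i], PLS_tot = grid[j].
# Assumed coding: status 1 = clinical status OK, 0 = not OK.
surface = {
    status: [[pass_probability(status, ndw, pls) for pls in grid] for ndw in grid]
    for status in (0, 1)
}

for name, value in odds_ratios.items():
    print(f"odds ratio per unit of {name}: {value:.3f}")
for status in (0, 1):
    mid = surface[status][len(grid) // 2]
    print(f"status {status}, NDW_100 = {grid[len(grid) // 2]}:",
          " ".join(f"{p:.3f}" for p in mid))
```

Each row of `surface` traces one S-shaped curve in PLS_tot, and the two status values give the two surfaces that the text compares.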

[Image plot of predicted pass probability (colour scale 0.2 to 0.9). x-axis: PLS_tot, y-axis: NDW_100, both from 20 to 140.]

Figure 2: Predicted probability of passing the MCC test versus PLS total score and NDW, with clinical status not OK

4 Multinomial Logit Regression

For the second part of this study, the researchers are concerned with children's performance on both the language sample protocol test (pass/fail) and the standardized test (PLS). We therefore consider four groups obtained by combining the MCC (grammar) outcome and the PLS total score:

• Group 1: passed both tests
• Group 2: failed both tests
• Group 3: passed the language sample protocol but failed the standardized test (PLS) with score < 79
• Group 4: passed the standardized test but failed the language sample protocol

Since we are looking for a subset of predictors that discriminates between the groups, rather than a linear combination of all predictors, we use a multinomial logit model with the group label as the response to find predictors that are significant in discriminating the groups. Since we have p = 12 predictors (note: we dropped the two PLS components, expression and auditory, and use the total score only; we also dropped sample size and proportion of used sample and use effective sample size only, to remove collinearity), an exhaustive search is feasible. The multinomial logit model is well suited to our goal because it is readily interpretable and gives further insight into the effects of the selected independent variables. Goodness-of-fit measures for the models also serve our goal. More explicitly,

\[
\log\frac{\pi_j}{\pi_J} = z_i^T \beta_j, \qquad j = 1, 2, \dots, J-1 \qquad (4)
\]

where we extract a subset $\{z_{i1}, \dots, z_{ip^*}\}$ of the p measurements to be included in the regression model, with $p^* < p$ and $n \gg p^*$, to reduce parameter redundancy. The fitted p-values, deviance and AIC are used to evaluate the model choice. Furthermore, from the fitted model we can predict the probability that the i-th subject belongs to the j-th group using

\[
\pi_{ji} = \frac{\exp(z_i^T \beta_j)}{\sum_{k=1}^{J} \exp(z_i^T \beta_k)} \qquad (5)
\]
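Equations (4) and (5) can be illustrated with a short Python sketch that turns linear predictors into category probabilities, with the last category as baseline. The function name and the toy coefficient values are hypothetical and serve only to show the computation.

```python
import math

def category_probabilities(z, betas):
    """Category probabilities as in equation (5).

    z     : covariate vector for one subject (leading 1.0 for the intercept)
    betas : J-1 coefficient vectors; the J-th (baseline) vector is taken as zero
    """
    etas = [sum(b * v for b, v in zip(beta, z)) for beta in betas]
    etas.append(0.0)  # baseline category: z^T beta_J = 0
    shift = max(etas)  # subtracting the maximum avoids overflow in exp
    weights = [math.exp(eta - shift) for eta in etas]
    total = sum(weights)
    return [w / total for w in weights]

# Toy values (hypothetical): one covariate plus intercept, J = 4 categories.
z = [1.0, 0.5]
betas = [[0.2, 1.0], [-0.3, 0.4], [0.0, -1.0]]

probs = category_probabilities(z, betas)
print([round(p, 4) for p in probs])

# Equation (4): the log odds against the baseline recover the linear predictors.
log_odds = [math.log(p / probs[-1]) for p in probs[:-1]]
print([round(v, 4) for v in log_odds])
```

Subtracting the largest linear predictor before exponentiating leaves the probabilities unchanged, because the common factor cancels between numerator and denominator.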

for j = 1, ..., J and i = 1, ..., n, with $\beta_J = 0$ for the baseline category. We assign subject i to group $I = \arg\max_j \pi_{ji}$. Since the true membership of each subject is known, we can obtain the so-called training error for each candidate model c, c = 1, 2, ..., C.

To be explicit, let y be an indicator variable for the groups, with y = i for Group i, i = 1, 2, 3, 4. We choose a subset of Location, Examiner, Gender, Density, AGE_LS, PCC_RV, clin_status and NDW_100 as explanatory variables. Selecting variables by the AIC criterion with a stepwise (backward) method, four variables are retained, and we use them as explanatory variables in the multinomial logit model: Density, PCC_RV, clin_status and NDW_100.
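The assignment rule and the training error mentioned above can be sketched in Python as follows. The fitted probabilities and true labels are made-up toy values, not results from the study, and the function names are hypothetical.

```python
def assign_group(probabilities):
    """Return the 1-based index of the group with the largest fitted probability."""
    best = max(range(len(probabilities)), key=lambda j: probabilities[j])
    return best + 1

def training_error(prob_rows, true_groups):
    """Share of subjects whose assigned group differs from the true group."""
    predicted = [assign_group(row) for row in prob_rows]
    wrong = sum(1 for p, t in zip(predicted, true_groups) if p != t)
    return wrong / len(true_groups)

# Toy fitted probabilities for four subjects and four groups (hypothetical).
fitted = [
    [0.70, 0.10, 0.15, 0.05],
    [0.20, 0.60, 0.10, 0.10],
    [0.30, 0.10, 0.25, 0.35],
    [0.55, 0.05, 0.30, 0.10],
]
truth = [1, 2, 3, 1]

predicted = [assign_group(row) for row in fitted]
error = training_error(fitted, truth)
print(predicted, error)
```

Because the same data are used for fitting and for evaluation, this error tends to be optimistic, so it is best read as a relative measure when comparing candidate models.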

Coefficients:
Group   (Intercept)   Density      PCC_RV        clin_status   NDW_100
2       11.587884     1.5498427    -0.08063940   -3.117875     -0.06837895
3        8.842589     1.8996259    -0.09459082   -1.721030     -0.04296029
4       17.003993     0.6112406    -0.15816086   -4.534902     -0.04780244

Std. Errors:
Group   (Intercept)   Density      PCC_RV        clin_status   NDW_100
2        4.668844     0.7876404     0.05216377    1.015980      0.02624799
3        4.548248     0.7004889     0.05134895    1.083893      0.02458696
4        4.809364     0.8747212     0.05362086    1.283781      0.02598835

Table 3: Summary of the multinomial logit model

To interpret the fit, take clinical status as an example. The coefficients in Table 3 are relative to Group 1. With the other variables held fixed, a child whose clinical status is OK (value = 1) is more likely to be in Group 1 than in Group 2: the estimated odds of being in Group 1 rather than Group 2 are exp(3.1179) = 22.60 times the estimated odds for a child whose clinical status is not OK.
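The same calculation can be repeated for the other groups. The Python sketch below exponentiates the negated clin_status coefficients from Table 3 to obtain the odds ratio of Group 1 against each of Groups 2, 3 and 4. It assumes, as the interpretation above implies, that the tabulated coefficients are log odds relative to Group 1; the variable names are illustrative.

```python
import math

# clin_status coefficients for Groups 2, 3 and 4, copied from Table 3.
# Assumption: each is the change in log odds of that group versus Group 1.
clin_status_coef = {2: -3.117875, 3: -1.721030, 4: -4.534902}

# Odds ratio of Group 1 versus Group g when clinical status goes from
# not OK to OK, other variables held fixed: exp(-coefficient).
odds_ratio_group1_vs = {g: math.exp(-beta) for g, beta in clin_status_coef.items()}

for g, ratio in odds_ratio_group1_vs.items():
    print(f"Group 1 vs Group {g}: odds ratio = {ratio:.2f}")
```

These are point estimates only; the standard errors in Table 3 indicate how much uncertainty surrounds each of them.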

5 Summary

In this report we fitted generalized linear models (logistic regression and a multinomial logit model) to the data set to study the relationships between the measurements and children's performance on the MCC test, along with the effects of the measurements on the partition into combined test groups. The models fit well according to the statistical criteria obtained and the diagnostic plots. We conclude that the predictors included in our models contribute significantly to performance, which agrees with empirical knowledge of the topic.

References

[1] A. Agresti. Categorical Data Analysis. John Wiley & Sons, Inc., 2nd edition, 2002.
[2] I. J. Stockman. Evidence for a minimal competence core of consonant sounds in the speech of African American children: A preliminary study. Journal of Clinical Linguistics and Phonetics, 20:723–749, 2006.
[3] I. J. Stockman. Toward validation of a minimal competence phonetic core for African American children. Journal of Speech, Language, and Hearing Research, October 2008.
[4] I. J. Stockman, L. Karasinski, and B. Guillory. The use of conversational repairs by African American preschoolers. Language, Speech, and Hearing Services in Schools, October 2008.
[5] J. C. Pinheiro and D. M. Bates. Mixed-Effects Models in S and S-PLUS. Springer, 2000.
[6] J. J. Faraway. Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. Chapman & Hall/CRC, Taylor & Francis Group, 2006.
[7] J. M. Chambers and T. J. Hastie. Statistical Models in S. AT&T Bell Laboratories, 1992.
[8] J. W. Hardin and J. M. Hilbe. Generalized Linear Models and Extensions. Stata Press, StataCorp LP, 2nd edition, 2007.
[9] L. Fahrmeir and G. Tutz. Multivariate Statistical Modeling Based on Generalized Linear Models. Springer, 2nd edition.
[10] P. McCullagh and J. A. Nelder. Generalized Linear Models. Chapman & Hall/CRC, 2nd edition, 1989.
[11] W. N. Venables and B. D. Ripley. Modern Applied Statistics with S-PLUS. Springer, 3rd edition.
[12] Z. Zhang. User Manual for PCCAT: for State of Michigan, Department of Environmental Quality.