Homework 4 Solutions

ACM/ESE 118, Fall 2008
Date: 11/13/2008

(1) Montgomery 4.2. Note: we are concerned only with the regression model fitting y to x2, x7, and x8, as in problem 3.1.
(a) We can construct a normal probability plot of the residuals with the following R code:

> football.lm = lm(y ~ x2 + x7 + x8)
> football.lm.resid = resid(football.lm)
> qqnorm(football.lm.resid)
> qqline(football.lm.resid)


Figure 1. Normal probability plot for problem 1(a)

The plot is shown in Figure 1. The residuals lie nearly, but not exactly, on a straight line, indicating that the errors may not be i.i.d. normal (compare Figure 4.2(a) of Montgomery). Note that if you plotted probability on the y-axis, the probability values should not be equally spaced, as that would not be a normal probability plot.
(b) The plot is shown in Figure 2. There does not appear to be any strong pattern in the residuals, and they all appear to lie randomly within a horizontal band, so this plot supports the assumption that the errors are independently and identically distributed.
(c) The plots are shown in Figure 3. The plots of residuals versus x2 and residuals versus x8 both exhibit a weak double-bow pattern. The plot of residuals versus x7 exhibits a strong funnel pattern. All of these patterns imply non-constant variance, but they do not imply that the relationship between the response and the regressors is non-linear.
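The plots for parts (b) and (c) could be produced with commands along the following lines (a sketch, not part of the original code listing; it assumes the problem 3.1 data are available in the workspace):

> plot(resid(football.lm) ~ fitted(football.lm), main = "Residual plot",
+      xlab = "predicted response", ylab = "residuals")
> plot(resid(football.lm) ~ x2, ylab = "residuals")
> plot(resid(football.lm) ~ x7, ylab = "residuals")
> plot(resid(football.lm) ~ x8, ylab = "residuals")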


Figure 2. Plot of residuals versus predicted values for problem 1(b)


Figure 3. Plots of residuals versus each regressor for problem 1(c)

xlab = "residuals of x2 regressed against x7,x8", ylab = "residuals of y regressed against x7,x8") The other regression plots can be coded similarly. The plots are shown in figure 4. We see that the plots exhibit similar behaviors to those in part (d), especially the partial regression for x7 which still shows a definite cone shape. In general, partial regression plots are especially useful for demonstrating the marginal usefullness of a specific regressor, i.e. if a regressor is highly collinear with the other regressors, the partial regression plot can help to show that it is not useful. (e) We can use the R commands ”rstandard” and ”rstudent” to compute the studentized residuals and the R-student residuals, respectively. These residuals, and their differei . ence, are plotted in figure 5. Studentized residuals are defined as ri = √ M SRes (1−hii ) p As you can see they have been ”normalized” by the factor of 1/ (1 − hii ). On account of this factor, they have constant variance regardless of the location of xi and they are


Figure 4. Partial regression plots for problem 1(d)

(e) We can use the R commands rstandard and rstudent to compute the studentized residuals and the R-student residuals, respectively. These residuals, and their difference, are plotted in Figure 5. Studentized residuals are defined as

r_i = e_i / \sqrt{MS_{Res} (1 - h_{ii})}.

As you can see, they have been "normalized" by the factor 1/\sqrt{1 - h_{ii}}. On account of this factor, they have constant variance regardless of the location of x_i, and they are better at determining which points will be highly influential than non-normalized residuals. The R-student residuals are defined as

t_i = e_i / \sqrt{S_{(i)}^2 (1 - h_{ii})},  where  S_{(i)}^2 = \frac{(n - p) MS_{Res} - e_i^2 / (1 - h_{ii})}{n - p - 1}.


The R-student residuals are "externally scaled," meaning that the estimate of σ² used to standardize the i-th residual is computed without using the i-th data point. Because of this, R-student residuals are especially good for detecting outliers (in general, better than studentized residuals). As you can see from the plot of the difference between the two types of residuals, the extreme values are slightly larger for the R-student residuals. In either case, none of these values is too large (greater than about 3), so no observation stands out as a clear outlier.
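These plots could be generated with commands like the following (a sketch; the plotting details are assumptions):

> r.standard = rstandard(football.lm)
> r.student = rstudent(football.lm)
> plot(r.standard ~ fitted(football.lm),
+      xlab = "fitted values", ylab = "studentized residuals")
> plot(r.student ~ fitted(football.lm),
+      xlab = "fitted values", ylab = "R-student residuals")
> plot((r.student - r.standard) ~ fitted(football.lm),
+      xlab = "fitted values", ylab = "R-student Res - studentized Res")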


Figure 5. Studentized residuals, R-student residuals, and their difference for problem 1(e)

(2) Montgomery 5.6. One caveat to take care of: note that for these models, y is a singular function of x (if we ignore the error terms). That is, the magnitude of y blows up as x approaches 0, −β0/β1, or β0/β1, respectively. If there are data points with x-values close to the singularity on both sides of it, then it should be obvious where the singularity is. For example, in the top left of Figure 6, there are several data points where x is of small magnitude and y is of large magnitude with the same sign as x, so clearly there is a singularity at 0 and model (a) would be a good choice.


However, a model with a singularity in the middle of the data is unlikely to be meaningful in practice, so we will focus on cases where there are data points on only one side of the singularity.

[Figure 6 panels: top left, y = 2 + 1/x + e; top right, y = 1 + 1/x + e; bottom left, 1/y = 2 + x + e; bottom right, y = x/(.25 + .25*x) + e]

Figure 6. Example scatterplots for each of the models

For a one-line summary of the main difference between the models: in (a) the data have the y-axis as an asymptote, in (b) the data have the x-axis as an asymptote, and in (c) neither the x-axis nor the y-axis is an asymptote.
(a) For this model, one should see data points with a large deviation |y_i − ȳ| when x is of relatively small magnitude. (If there are no points of relatively small magnitude, that is, if min|x_i| / max|x_i| is close to 1, then this model is not too different from the constant model and probably shouldn't be used.) The mean of y varies slowly when |x| is relatively large, and it does not necessarily approach 0. The variance of y should appear independent of x. A typical instance of data following this model is shown in the top right of Figure 6.
(b) For this model, both the deviation from the sample mean |y_i − ȳ| and the variance of y should be large as x approaches some unknown value −β0/β1 (not necessarily 0, and not necessarily in the range of x). Farther from the singularity (typically when |x| is large), both the mean and variance of y approach 0. See the bottom left of Figure 6.
(c) If the range of x includes 0, the mean of y should be 0 there. There is a large deviation from the sample mean when x is near some unknown value β0/β1 (not necessarily 0 or in the range of x). Away from the singularity, the mean of y asymptotically approaches the value −β1. However, the variance of y does not go to 0 or depend on x in any way (although this is hard to tell near the singularity). See the bottom right of Figure 6.
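As an illustration, scatterplots like those in Figure 6 might be simulated as follows (a sketch; the coefficient and noise values are assumptions, not taken from the text):

> set.seed(1)
> x = runif(100, 0.2, 5)
> y = 1 + 1/x + rnorm(100, sd = 0.3)   # model (a) with beta0 = 1, beta1 = 1
> plot(y ~ x)   # y blows up as x approaches the singularity at 0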


(3) Montgomery 5.8. (a) In order to perform a thorough residual analysis, we will construct and examine a normal probability plot, a plot of residuals versus predicted response, a plot of residuals versus each regressor, added-variable plots for each regressor, and the adjusted R² statistic. The normal probability plot lies very close to a straight line, and the plot of residuals versus predicted response does not appear to show any strong pattern in the residuals (see Figure 7). Thus they do not provide any strong evidence against the constant-variance, independent, normal assumptions about the noise.
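A sketch of R commands for these diagnostics (the model object name is an assumption, as is the availability of y and x1 through x4 in the workspace):

> model.a = lm(y ~ x1 + x2 + x3 + x4)
> qqnorm(resid(model.a)); qqline(resid(model.a))
> plot(resid(model.a) ~ fitted(model.a),
+      xlab = "predicted response", ylab = "residuals")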


Figure 7. Testing the noise: A normal probability plot and a plot of residuals versus predicted response for problem 3(a)

In order to test the validity of our regressors, we construct added-variable plots and plots of residuals versus each regressor (see Figure 8). Judging by the random, trendless pattern of the added-variable plot for x1, it appears that x1 does not belong in the model. To test this theory, we examine the p-value of the t-statistic for the regressor x1, which is 0.17 and thus not small enough to reject the hypothesis that x1 does not belong in the model. Secondly, judging by the parabolic, non-linear shape in the added-variable plot for x4 and in the plot of residuals against x4, we see that we should probably include higher-order terms in the variable x4. In part (b) we will add x4^2 as a regressor. Finally, note that in the original model, the R² value is 0.69.
(b) From the analysis in part (a), we regress y onto x2, x3, x4, and x4^2. We now go through the same steps as in part (a) to analyze the residuals. In this second model, the points on the normal probability plot seem to suggest that the noise is "fat-tailed" (see Figure 9); it could be that problems with the model in part (a) disguised this from us. The plot of residuals versus predicted response once again does not seem to violate the independence and constant-variance assumptions on the noise. Judging by the added-variable plots and the plots of the residuals versus each regressor, it appears that each regressor belongs and that the non-linearity has been accounted for by including x4^2.
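A sketch of how this second model might be fit (the name x5 for the squared term is an assumption suggested by the summary output below):

> x5 = x4^2
> model.b = lm(y ~ x2 + x3 + x4 + x5)
> summary(model.b)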


Figure 8. Testing the regressors: Plots of the residuals versus each regressor and added variable plots for problem 3(a)



Figure 9. Testing the noise: A normal probability plot and a plot of residuals versus predicted response for problem 3(b)

The summary output of our new model is:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 15.68918    3.96047   3.961 0.000210 ***
x2           0.19244    0.01409  13.661  < 2e-16 ***
x3          39.09141    9.05912   4.315 6.42e-05 ***
x4         -47.92013    9.54919  -5.018 5.43e-06 ***
x5          45.19666    7.99087   5.656 5.23e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.081 on 57 degrees of freedom
Multiple R-squared: 0.7955, Adjusted R-squared: 0.7811
F-statistic: 55.42 on 4 and 57 DF, p-value: < 2.2e-16

Note that the p-values are quite small, indicating that each term is useful. (In particular, including both x4 and x4^2 is useful.) The residuals for this model are shown in Figure 10; they look reasonable. Finally, the new R² value is a very respectable 0.80, a significant improvement achieved without increasing the number of regressors relative to the original model.

(4) Montgomery 6.15. To begin, we examine the leverage and Cook's distance for the observations, shown in Figure 11. The leverage is found by taking the diagonal of the hat matrix; it represents how much a point could potentially influence the fit, based on its x-values. Cook's distance is a measure of how much the coefficients change overall when a point is removed. Observations 2 and 4 stand out as having particularly large leverage (greater than 2p/n) and Cook's distance (greater than 1). Points 8 and 9 have large leverage but are more in line with the fitted model, so their Cook's distance is not as large. We use R's influence.measures command (sketched below) to compute the DFBETAS and DFFITS for the most influential observations.
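A sketch of these computations in R (the model object name is an assumption):

> model4 = lm(y ~ x1 + x2 + x3 + x4)
> lev = hatvalues(model4)              # diagonal of the hat matrix
> p = length(coef(model4)); n = length(lev)
> plot(lev, type = "h", ylab = "Leverage")
> abline(h = 2 * p / n)                # reference level 2p/n
> plot(model4, which = 4)              # Cook's distance plot
> summary(influence.measures(model4))  # flags potentially influential points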


Figure 10. Testing the regressors: Plots of the residuals versus each regressor and added variable plots for problem 3(b)


Potentially influential observations of lm(formula = y ~ x1 + x2 + x3 + x4):

   dfb.1_ dfb.x1 dfb.x2 dfb.x3 dfb.x4 dffit   cov.r  cook.d hat
2   1.67_*  0.21 -3.77_*  0.05 -1.07_* -4.67_* 0.04_* 1.98_* 0.47
4  -0.59    0.20 -0.57   -0.94  2.34_*  2.50_* 0.91   1.04_* 0.56
8   0.41   -0.60  1.46_* -0.54 -0.29    1.57_* 0.62   0.41   0.34
9  -0.13   -0.09 -0.11    1.05_* -0.51  1.15   1.38   0.26   0.41
10  0.19   -0.28  0.07   -0.19  0.02   -0.33   1.98_* 0.02   0.38

We note that observation 2 has a large-magnitude DFBETA for x2, which makes sense since its value of x2 is so high while its value of y is so low. Looking at the Q-Q plot shown on the left of Figure 13, we see that the residuals do not follow a normal distribution very well, partially due to observations 2 and 4, which appear as outliers. Even with those points removed, however, the Q-Q plot does not follow a straight line very well.
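The refit without the two influential points might be done as follows (a sketch; the subset indices follow the discussion above):

> model4.sub = lm(y ~ x1 + x2 + x3 + x4, subset = -c(2, 4))
> qqnorm(resid(model4.sub), main = "QQ-plot with points 2,4 removed")
> qqline(resid(model4.sub))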


Figure 11. Leverage (left) and Cook’s distance (right) for each observation. 2p/n level marked on leverage plot



Figure 12. Residuals (left) and standardized residuals (right) vs. fitted values


Figure 13. QQ plots for full data set (left) and with influential points 2 and 4 removed (right)