Bootstrap Methods and their Application

Anthony Davison

© 2006. A short course based on the book 'Bootstrap Methods and their Application' by A. C. Davison and D. V. Hinkley, Cambridge University Press, 1997.

Anthony Davison: Bootstrap Methods and their Application,

Summary

◮ Bootstrap: simulation methods for frequentist inference. Useful when
  • standard assumptions invalid (n small, data not normal, . . . );
  • standard problem has a non-standard twist;
  • complex problem has no (reliable) theory;
  • or (almost) anywhere else.
◮ Aim to describe
  • basic ideas;
  • confidence intervals;
  • tests;
  • some approaches for regression.


Motivation

Motivation

Basic notions

Confidence intervals

Several samples

Variance estimation

Tests

Regression


Motivation

AIDS data

◮ UK AIDS diagnoses 1988–1992. Reporting delay up to several years!
◮ Problem: predict state of epidemic at end 1992, with realistic statement of uncertainty.
◮ Simple model: number of reports in row j and column k Poisson, mean
    µjk = exp(αj + βk).
◮ Unreported diagnoses in period j Poisson, mean
    Σ_{k unobs} µjk = exp(αj) Σ_{k unobs} exp(βk).
◮ Estimate total unreported diagnoses from period j by replacing αj and βk by MLEs.
  • How reliable are these predictions?
  • How sensible is the Poisson model?


Motivation

Table: UK AIDS diagnoses by diagnosis period and reporting-delay interval (quarters), with total reports to the end of 1992.

Year Quarter   0†    1   2   3   4   5   6  ···  ≥14  Total
1988    1      31   80  16   9   3   2   8  ···    6    174
        2      26   99  27   9   8  11   3  ···    3    211
        3      31   95  35  13  18   4   6  ···    3    224
        4      36   77  20  26  11   3   8  ···    2    205
1989    1      32   92  32  10  12  19  12  ···    2    224
        2      15   92  14  27  22  21  12  ···    1    219
        3      34  104  29  31  18   8   6  ···         253
        4      38  101  34  18   9  15   6  ···         233
1990    1      31  124  47  24  11  15   8  ···         281
        2      32  132  36  10   9   7   6  ···         245
        3      49  107  51  17  15   8   9  ···         260
        4      44  153  41  16  11   6   5  ···         285
1991    1      41  137  29  33   7  11   6  ···         271
        2      56  124  39  14  12   7  10  ···         263
        3      53  175  35  17  13  11      ···         306
        4      63  135  24  23  12          ···         258
1992    1      71  161  48  25                          310
        2      95  178  39                              318
        3      76  181                                  273
        4      67                                       133

Motivation

AIDS data

◮ Data (+), fits of simple model (solid), complex model (dots)
◮ Variance formulae could be derived: painful! but useful?
◮ Effects of overdispersion, complex model, . . . ?

[Figure: quarterly AIDS diagnoses against time, 1984–1992, with fits of the simple and complex models.]

Motivation

Goal

Find reliable assessment of uncertainty when
◮ estimator complex
◮ data complex
◮ sample size small
◮ model non-standard

Basic notions

Motivation

Basic notions

Confidence intervals

Several samples

Variance estimation

Tests

Regression


Basic notions

Handedness data

Table: Data from a study of handedness; hand is an integer measure of handedness, and dnan a genetic measure. Data due to Dr Gordon Claridge, University of Oxford.

      dnan hand        dnan hand        dnan hand        dnan hand
  1     13    1    11    28    1    21    29    2    31    31    1
  2     18    1    12    28    2    22    29    1    32    31    2
  3     20    3    13    28    1    23    29    1    33    33    6
  4     21    1    14    28    4    24    30    1    34    33    1
  5     21    1    15    28    1    25    30    1    35    34    1
  6     24    1    16    28    1    26    30    2    36    41    4
  7     24    1    17    29    1    27    30    1    37    44    8
  8     27    1    18    29    1    28    31    1
  9     28    1    19    29    1    29    31    1
 10     28    2    20    29    2    30    31    1

Basic notions

Handedness data

Figure: Scatter plot of handedness data (hand against dnan). The numbers show the multiplicities of the observations.

Basic notions

Handedness data

◮ Is there dependence between dnan and hand for these n = 37 individuals?
◮ Sample product-moment correlation coefficient is θ̂ = 0.509.
◮ Standard confidence interval (based on bivariate normal population) gives 95% CI (0.221, 0.715).
◮ Data not bivariate normal!
◮ What is the status of the interval? Can we do better?


Basic notions

Frequentist inference

◮ Estimator θ̂ for unknown parameter θ.
◮ Statistical model: data y1, . . . , yn ~ F (iid), F unknown.
◮ Handedness data:
  • y = (dnan, hand)
  • F puts probability mass on a subset of R²
  • θ̂ is the correlation coefficient
◮ Key issue: what is the variability of θ̂ when samples are repeatedly taken from F?
◮ Imagine F known: could answer the question by
  • analytical (mathematical) calculation
  • simulation


Basic notions

Simulation with F known

◮ For r = 1, . . . , R:
  • generate random sample y1*, . . . , yn* ~ F (iid);
  • compute θ̂r* using y1*, . . . , yn*.
◮ Output after R iterations: θ̂1*, θ̂2*, . . . , θ̂R*
◮ Use θ̂1*, θ̂2*, . . . , θ̂R* to estimate the sampling distribution of θ̂ (histogram, density estimate, . . . )
◮ In practice R is finite, so some error remains
◮ If R → ∞, then get perfect match to theoretical calculation (if available): Monte Carlo error disappears completely
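A minimal sketch of this algorithm in Python (numpy assumed), taking the "known" F to be a bivariate normal with the parameters quoted later for the fitted handedness model:

```python
import numpy as np

rng = np.random.default_rng(1)

# "Known" F: bivariate normal with the parameters quoted later for the
# fitted handedness model (mu1=28.5, mu2=1.7, sigma1=5.4, sigma2=1.5, rho=0.509).
mu = [28.5, 1.7]
rho, s1, s2 = 0.509, 5.4, 1.5
cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]

n, R = 37, 1000
theta_star = np.empty(R)
for r in range(R):
    y = rng.multivariate_normal(mu, cov, size=n)          # y1*, ..., yn* ~ F
    theta_star[r] = np.corrcoef(y[:, 0], y[:, 1])[0, 1]   # theta_r*

# theta_1*, ..., theta_R* approximate the sampling distribution of theta-hat
```

The histogram or density estimate of `theta_star` is then the simulation approximation to the sampling distribution.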


Basic notions

Handedness data: Fitted bivariate normal model

Figure: Contours of bivariate normal distribution fitted to handedness data; parameter estimates are µ̂1 = 28.5, µ̂2 = 1.7, σ̂1 = 5.4, σ̂2 = 1.5, ρ̂ = 0.509. The data are also shown.

Basic notions

Handedness data: Parametric bootstrap samples

Figure: Left: original data (correlation 0.509), with jittered vertical values. Centre and right: two samples generated from the fitted bivariate normal distribution (correlations 0.753 and 0.533).

Basic notions

Handedness data: Correlation coefficient

Figure: Bootstrap distributions with R = 10000. Left: simulation from fitted bivariate normal distribution. Right: simulation from the data by bootstrap resampling. The lines show the theoretical probability density function of the correlation coefficient under sampling from a fitted bivariate normal distribution.

Basic notions

F unknown

◮ Replace unknown F by estimate F̂ obtained
  • parametrically, e.g. by maximum likelihood or robust fit of distribution F(y) = F(y; ψ) (normal, exponential, bivariate normal, . . . )
  • nonparametrically, using the empirical distribution function (EDF) of the original data y1, . . . , yn, which puts mass 1/n on each of the yj
◮ Algorithm: for r = 1, . . . , R:
  • generate random sample y1*, . . . , yn* ~ F̂ (iid);
  • compute θ̂r* using y1*, . . . , yn*.
◮ Output after R iterations: θ̂1*, θ̂2*, . . . , θ̂R*

Basic notions

Nonparametric bootstrap

◮ Bootstrap (re)sample y1*, . . . , yn* ~ F̂ (iid), where F̂ is the EDF of y1, . . . , yn
  • Repetitions will occur!
◮ Compute bootstrap θ̂* using y1*, . . . , yn*.
◮ For handedness data: take n = 37 pairs y* = (dnan, hand)* with equal probabilities 1/37, with replacement, from the original pairs (dnan, hand)
◮ Repeat this R times, to get θ̂1*, . . . , θ̂R*
◮ See picture
◮ Results quite different from parametric simulation: why?


Basic notions

Handedness data: Bootstrap samples

Figure: Left: original data (correlation 0.509), with jittered vertical values. Centre and right: two bootstrap samples (correlations 0.733 and 0.491), with jittered vertical values.

Basic notions

Using the θ̂*

◮ Bootstrap replicates θ̂r* used to estimate properties of θ̂.
◮ Write θ = θ(F) to emphasize dependence on F.
◮ Bias of θ̂ as estimator of θ is
    β(F) = E(θ̂ | y1, . . . , yn ~ F) − θ(F),
  estimated by replacing the unknown F by the known estimate F̂:
    β(F̂) = E(θ̂ | y1, . . . , yn ~ F̂) − θ(F̂)
◮ Replace theoretical expectation E(·) by empirical average:
    β(F̂) ≈ b = θ̄* − θ̂ = R⁻¹ Σ_{r=1}^R θ̂r* − θ̂

Basic notions

◮ Estimate variance ν(F) = var(θ̂ | F) by
    v = (R − 1)⁻¹ Σ_{r=1}^R (θ̂r* − θ̄*)²
◮ Estimate quantiles of θ̂ by taking empirical quantiles of θ̂1*, . . . , θ̂R*
◮ For handedness data, the 10,000 replicates shown earlier give
    b = −0.046,   v = 0.043 = 0.205²
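A sketch of these estimates for the handedness data, resampling the 37 pairs with replacement (numpy assumed; the exact values of b and v vary a little with the random seed):

```python
import numpy as np

rng = np.random.default_rng(3)

# The 37 (dnan, hand) pairs from the handedness table.
dnan = np.array([13, 18, 20, 21, 21, 24, 24, 27, 28, 28, 28, 28, 28, 28, 28,
                 28, 29, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 31, 31, 31,
                 31, 31, 33, 33, 34, 41, 44])
hand = np.array([1, 1, 3, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 4, 1, 1, 1, 1, 1, 2,
                 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 6, 1, 1, 4, 8])
n = dnan.size
theta_hat = np.corrcoef(dnan, hand)[0, 1]   # 0.509

R = 4000
theta_star = np.empty(R)
for r in range(R):
    idx = rng.integers(0, n, size=n)        # resample pairs with replacement
    d, h = dnan[idx], hand[idx]
    while h.std() == 0:                     # guard the rare degenerate resample
        idx = rng.integers(0, n, size=n)
        d, h = dnan[idx], hand[idx]
    theta_star[r] = np.corrcoef(d, h)[0, 1]

b = theta_star.mean() - theta_hat           # bias estimate (slide: -0.046)
v = theta_star.var(ddof=1)                  # variance estimate (slide: 0.043)
```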

Basic notions

Handedness data

Figure: Summaries of the θ̂*. Left: histogram, with vertical line showing θ̂. Right: normal Q–Q plot of θ̂*.

Basic notions

How many bootstraps?

◮ Must estimate moments and quantiles of θ̂ and derived quantities. Nowadays often feasible to take R ≥ 5000
◮ Need R ≥ 100 to estimate bias, variance, etc.
◮ Need R ≫ 100, prefer R ≥ 1000, to estimate quantiles needed for 95% confidence intervals

[Figure: normal Q–Q plots of Z* for R = 199 (left) and R = 999 (right).]

Basic notions

Key points

◮ Estimator is an algorithm
  • applied to original data y1, . . . , yn gives original θ̂
  • applied to simulated data y1*, . . . , yn* gives θ̂*
  • θ̂ can be of (almost) any complexity
  • for more sophisticated ideas (later) to work, θ̂ must often be a smooth function of the data
◮ Sample is used to estimate F
  • F̂ ≈ F: heroic assumption
◮ Simulation replaces theoretical calculation
  • removes need for mathematical skill
  • does not remove need for thought
  • check code very carefully: garbage in, garbage out!
◮ Two sources of error
  • statistical (F̂ ≠ F): reduce by thought
  • simulation (R ≠ ∞): reduce by taking R large (enough)

Confidence intervals

Motivation

Basic notions

Confidence intervals

Several samples

Variance estimation

Tests

Regression


Confidence intervals

Normal confidence intervals

◮ If θ̂ approximately normal, then θ̂ ~ N(θ + β, ν), where θ̂ has bias β = β(F) and variance ν = ν(F)
◮ If β, ν known, the (1 − 2α) confidence interval for θ would be (D1)
    θ̂ − β ± z_α ν^{1/2},
  where Φ(z_α) = α.
◮ Replace β, ν by estimates:
    β(F) ≈ β(F̂) = b = θ̄* − θ̂,
    ν(F) ≈ ν(F̂) = v = (R − 1)⁻¹ Σ_r (θ̂r* − θ̄*)²,
  giving (1 − 2α) interval θ̂ − b ± z_α v^{1/2}.
◮ Handedness data: R = 10,000, b = −0.046, v = 0.205², α = 0.025, z_α = −1.96, so the 95% CI is (0.147, 0.963)
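Plugging the quoted summaries into the interval formula, as a sketch (b and v would come from the R = 10,000 replicates):

```python
import math

theta_hat = 0.509
b, v = -0.046, 0.043        # bootstrap bias and variance estimates
z_alpha = -1.959964         # Phi(z_alpha) = alpha = 0.025

lo = theta_hat - b + z_alpha * math.sqrt(v)   # theta-hat - b +/- z_alpha v^{1/2}
hi = theta_hat - b - z_alpha * math.sqrt(v)
# (lo, hi) is close to the slide's (0.147, 0.963); small differences
# reflect rounding of b and v.
```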

Confidence intervals

Normal confidence intervals

◮ Normal approximation reliable? Transformation needed?
◮ Here are plots for ψ̂ = ½ log{(1 + θ̂)/(1 − θ̂)}:

[Figure: histogram and normal Q–Q plot of the transformed correlation coefficient.]

Confidence intervals

Normal confidence intervals

◮ Correlation coefficient: try Fisher's z transformation,
    ψ̂ = ψ(θ̂) = ½ log{(1 + θ̂)/(1 − θ̂)},
  for which compute
    b_ψ = R⁻¹ Σ_{r=1}^R ψ̂r* − ψ̂,
    v_ψ = (R − 1)⁻¹ Σ_{r=1}^R (ψ̂r* − ψ̄*)².
◮ (1 − 2α) confidence interval for θ is
    ( ψ⁻¹{ψ̂ − b_ψ − z_{1−α} v_ψ^{1/2}},  ψ⁻¹{ψ̂ − b_ψ − z_α v_ψ^{1/2}} )
◮ For handedness data, get (0.074, 0.804)
◮ But how do we choose a transformation in general?
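A sketch of the transformed interval (numpy assumed): work on Fisher's z scale with arctanh, then map back with tanh, which also keeps the endpoints inside (−1, 1). The replicates here are simulated only to make the example self-contained:

```python
import numpy as np

def transformed_normal_ci(theta_hat, theta_star, alpha=0.025):
    """Normal-approximation CI computed on Fisher's z scale."""
    psi_hat = np.arctanh(theta_hat)        # psi = (1/2) log{(1+t)/(1-t)}
    psi_star = np.arctanh(theta_star)
    b_psi = psi_star.mean() - psi_hat
    v_psi = psi_star.var(ddof=1)
    z = -1.959964                          # z_alpha for alpha = 0.025
    lo = np.tanh(psi_hat - b_psi + z * np.sqrt(v_psi))
    hi = np.tanh(psi_hat - b_psi - z * np.sqrt(v_psi))
    return lo, hi

# Illustration with simulated replicates standing in for the theta_r*.
rng = np.random.default_rng(5)
theta_star = np.tanh(rng.normal(np.arctanh(0.5), 0.25, size=2000))
lo, hi = transformed_normal_ci(0.5, theta_star)
```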

Confidence intervals

Pivots

◮ Hope properties of θ̂1*, . . . , θ̂R* mimic effect of sampling from original model.
◮ Amounts to faith in the 'substitution principle': may replace unknown F with known F̂. False in general, but often more nearly true for pivots.
◮ A pivot is a combination of data and parameter whose distribution is independent of the underlying model.
◮ Canonical example: Y1, . . . , Yn ~ N(µ, σ²) (iid). Then
    Z = (Ȳ − µ)/(S²/n)^{1/2} ~ t_{n−1},
  for all µ, σ²: so independent of the underlying distribution, provided this is normal.
◮ Exact pivot generally unavailable in nonparametric case.

Confidence intervals

Studentized statistic

◮ Idea: generalize Student t statistic to bootstrap setting
◮ Requires variance estimate V for θ̂ computed from y1, . . . , yn
◮ Analogue of Student t statistic:
    Z = (θ̂ − θ)/V^{1/2}
◮ If the quantiles z_α of Z were known, then
    Pr(z_α ≤ Z ≤ z_{1−α}) = Pr(z_α ≤ (θ̂ − θ)/V^{1/2} ≤ z_{1−α}) = 1 − 2α
  (z_α no longer denotes a normal quantile!) implies that
    Pr(θ̂ − V^{1/2} z_{1−α} ≤ θ ≤ θ̂ − V^{1/2} z_α) = 1 − 2α,
  so the (1 − 2α) confidence interval is (θ̂ − V^{1/2} z_{1−α}, θ̂ − V^{1/2} z_α)

Confidence intervals

◮ Bootstrap sample gives (θ̂*, V*) and hence
    Z* = (θ̂* − θ̂)/V*^{1/2}
◮ R bootstrap copies of (θ̂, V):
    (θ̂1*, V1*), (θ̂2*, V2*), . . . , (θ̂R*, VR*),
  and corresponding
    z1* = (θ̂1* − θ̂)/V1*^{1/2},  z2* = (θ̂2* − θ̂)/V2*^{1/2},  . . . ,  zR* = (θ̂R* − θ̂)/VR*^{1/2}
◮ Use z1*, . . . , zR* to estimate the distribution of Z: for example, order statistics z(1)* < · · · < z(R)* used to estimate quantiles
◮ Get (1 − 2α) confidence interval
    (θ̂ − V^{1/2} z*((1−α)(R+1)),  θ̂ − V^{1/2} z*(α(R+1)))
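A sketch for a sample mean, where V = s²/n is available in closed form (for statistics without a variance formula, V* might itself need the variance-estimation methods covered later):

```python
import numpy as np

rng = np.random.default_rng(6)

y = rng.exponential(scale=2.0, size=40)   # a skewed sample, n = 40
n = y.size
theta_hat = y.mean()
V = y.var(ddof=1) / n                     # variance estimate for the mean

R, alpha = 1999, 0.025
z_star = np.empty(R)
for r in range(R):
    ys = rng.choice(y, size=n, replace=True)
    V_star = ys.var(ddof=1) / n
    z_star[r] = (ys.mean() - theta_hat) / np.sqrt(V_star)

z_sorted = np.sort(z_star)
lo = theta_hat - np.sqrt(V) * z_sorted[int((1 - alpha) * (R + 1)) - 1]
hi = theta_hat - np.sqrt(V) * z_sorted[int(alpha * (R + 1)) - 1]
```

Note the interval endpoints use the *upper* z* quantile for the lower limit and vice versa, exactly as in the formula above.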

Confidence intervals

Why Studentize?

◮ Studentize, so Z → N(0, 1) in distribution as n → ∞. Edgeworth series:
    Pr(Z ≤ z | F) = Φ(z) + n^{−1/2} a(z)φ(z) + O(n⁻¹),
  with a(·) an even quadratic polynomial.
◮ Corresponding expansion for Z* is
    Pr(Z* ≤ z | F̂) = Φ(z) + n^{−1/2} â(z)φ(z) + O_p(n⁻¹).
◮ Typically â(z) = a(z) + O_p(n^{−1/2}), so
    Pr(Z* ≤ z | F̂) − Pr(Z ≤ z | F) = O_p(n⁻¹).

Confidence intervals

◮ If we don't studentize, Z = θ̂ − θ → N(0, ν) in distribution. Then
    Pr(Z ≤ z | F) = Φ(z/ν^{1/2}) + n^{−1/2} a′(z/ν^{1/2}) φ(z/ν^{1/2}) + O(n⁻¹)
  and
    Pr(Z* ≤ z | F̂) = Φ(z/ν̂^{1/2}) + n^{−1/2} â′(z/ν̂^{1/2}) φ(z/ν̂^{1/2}) + O_p(n⁻¹).
  Typically ν̂ = ν + O_p(n^{−1/2}), giving
    Pr(Z* ≤ z | F̂) − Pr(Z ≤ z | F) = O_p(n^{−1/2}).
◮ Thus use of studentized Z reduces error from O_p(n^{−1/2}) to O_p(n⁻¹): better than using large-sample asymptotics, for which the error is usually O(n^{−1/2}).

Confidence intervals

Other confidence intervals

◮ Problem for studentized intervals: must obtain V, and the intervals are not scale-invariant
◮ Simpler approaches:
  • Basic bootstrap interval: treat θ̂ − θ as pivot, get
      (θ̂ − (θ̂*((R+1)(1−α)) − θ̂),  θ̂ − (θ̂*((R+1)α) − θ̂)).
  • Percentile interval: use empirical quantiles of θ̂1*, . . . , θ̂R*:
      (θ̂*((R+1)α),  θ̂*((R+1)(1−α))).
◮ Improved percentile intervals (BCa, ABC, . . . )
  • Replace percentile interval with
      (θ̂*((R+1)α′),  θ̂*((R+1)(1−α′′))),
    where α′, α′′ are chosen to improve properties.
  • Scale-invariant.
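A sketch of the basic and percentile intervals from a set of replicates (numpy assumed; the replicates are simulated only to make the example self-contained):

```python
import numpy as np

rng = np.random.default_rng(7)

theta_hat = 0.5
R, alpha = 1999, 0.025
theta_star = np.sort(rng.normal(theta_hat - 0.04, 0.2, size=R))

k_lo = int((R + 1) * alpha) - 1          # order statistic (R+1)alpha
k_hi = int((R + 1) * (1 - alpha)) - 1    # order statistic (R+1)(1-alpha)

percentile = (theta_star[k_lo], theta_star[k_hi])
basic = (2 * theta_hat - theta_star[k_hi],   # theta-hat - (theta*_(hi) - theta-hat)
         2 * theta_hat - theta_star[k_lo])
```

The basic interval is the percentile interval reflected about θ̂, which is why the two can differ markedly for skewed bootstrap distributions.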

Confidence intervals

Handedness data

◮ 95% confidence intervals for correlation coefficient θ, R = 10,000:

    Normal                                     0.147    0.963
    Percentile                                −0.047    0.758
    Basic                                      0.262    1.043
    BCa (α′ = 0.0485, α′′ = 0.0085)            0.053    0.792
    Student                                    0.030    1.206
    Basic (transformed)                        0.131    0.824
    Student (transformed)                     −0.277    0.868

◮ Transformation is essential here!

Confidence intervals

General comparison

◮ Bootstrap confidence intervals usually too short, which leads to under-coverage
◮ Normal and basic intervals depend on scale.
◮ Percentile interval often too short but is scale-invariant.
◮ Studentized intervals give best coverage overall, but
  • depend on scale, can be sensitive to V
  • length can be very variable
  • best on transformed scale, where V is approximately constant
◮ Improved percentile intervals have the same error in principle as studentized intervals, but are often shorter, so lower coverage

Confidence intervals

Caution

◮ Edgeworth theory OK for smooth statistics; beware rough statistics: must check output.
◮ Bootstrap of median theoretically OK, but very sensitive to sample values in practice.
◮ Role for smoothing?

[Figure: normal Q–Q plot of T* − t for medians, showing a discrete, stepped distribution.]

Confidence intervals

Key points

◮ Numerous procedures available for 'automatic' construction of confidence intervals
◮ Computer does the work
◮ Need R ≥ 1000 in most cases
◮ Generally such intervals are a bit too short
◮ Must examine output to check whether assumptions (e.g. smoothness of statistic) are satisfied
◮ May need variance estimate V (see later)

Several samples

Motivation

Basic notions

Confidence intervals

Several samples

Variance estimation

Tests

Regression


Several samples

Gravity data

Table: Measurements of the acceleration due to gravity, g, given as deviations from 980,000 × 10⁻³ cm s⁻², in units of cm s⁻² × 10⁻³.

Series:   1     2     3     4     5     6     7     8
         76    87   105    95    76    78    82    84
         82    95    83    90    76    78    79    86
         83    98    76    76    78    78    81    85
         54   100    75    76    79    86    79    82
         35   109    51    87    72    87    77    77
         46   109    76    79    68    81    79    76
         87   100    93    77    75    73    79    77
         68    81    75    71    78    67    78    80
               75    62                75    79    83
               68                      82    82    81
               67                      83    76    78
                                             73    78
                                             64    78

Several samples

Gravity data

Figure: Gravity series boxplots, showing a reduction in variance, a shift in location, and possible outliers.

Several samples

Gravity data

◮ Eight series of measurements of gravitational acceleration g made May 1934 – July 1935 in Washington DC
◮ Data are deviations from 9.8 m/s² in units of 10⁻³ cm/s²
◮ Goal: estimate g and provide a confidence interval
◮ Weighted combination of series averages and its variance estimate:
    θ̂ = Σ_{i=1}^8 (ȳi ni/si²) / Σ_{i=1}^8 (ni/si²),   V = (Σ_{i=1}^8 ni/si²)⁻¹,
  giving
    θ̂ = 78.54,   V = 0.59²
  and 95% confidence interval θ̂ ± 1.96 V^{1/2} = (77.5, 79.8)
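A sketch of the weighted combination (numpy assumed; the series are as reconstructed in the table above, so treat the exact figures as an assumption):

```python
import numpy as np

# Eight gravity series: deviations in units of 10^-3 cm/s^2.
series = [
    [76, 82, 83, 54, 35, 46, 87, 68],
    [87, 95, 98, 100, 109, 109, 100, 81, 75, 68, 67],
    [105, 83, 76, 75, 51, 76, 93, 75, 62],
    [95, 90, 76, 76, 87, 79, 77, 71],
    [76, 76, 78, 79, 72, 68, 75, 78],
    [78, 78, 78, 86, 87, 81, 73, 67, 75, 82, 83],
    [82, 79, 81, 79, 77, 79, 79, 78, 79, 82, 76, 73, 64],
    [84, 86, 85, 82, 77, 76, 77, 80, 83, 81, 78, 78, 78],
]

w = np.array([len(s) / np.var(s, ddof=1) for s in series])   # n_i / s_i^2
ybar = np.array([np.mean(s) for s in series])

theta_hat = np.sum(w * ybar) / np.sum(w)   # close to the quoted 78.54
V = 1.0 / np.sum(w)                        # sqrt(V) close to the quoted 0.59
```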

Several samples

Gravity data: Bootstrap

◮ Apply stratified (re)sampling to the series, taking each series as a separate stratum. Compute θ̂*, V* for the simulated data
◮ Confidence interval based on
    Z* = (θ̂* − θ̂)/V*^{1/2},
  whose distribution is approximated by the simulations
    z1* = (θ̂1* − θ̂)/V1*^{1/2},  . . . ,  zR* = (θ̂R* − θ̂)/VR*^{1/2},
  giving
    (θ̂ − V^{1/2} z*((R+1)(1−α)),  θ̂ − V^{1/2} z*((R+1)α))
◮ For 95% limits set α = 0.025, so with R = 999 use z*(25), z*(975), giving interval (77.1, 80.3).
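A sketch of stratified resampling with the studentized pivot (numpy assumed; synthetic strata stand in for the eight gravity series):

```python
import numpy as np

rng = np.random.default_rng(9)

# Synthetic strata standing in for the series.
strata = [rng.normal(loc, 5.0, size=m)
          for loc, m in [(76, 8), (90, 11), (78, 9), (80, 13)]]

def weighted_estimate(series):
    w = np.array([len(s) / s.var(ddof=1) for s in series])   # n_i / s_i^2
    ybar = np.array([s.mean() for s in series])
    return float((w * ybar).sum() / w.sum()), float(1.0 / w.sum())

theta_hat, V = weighted_estimate(strata)

R = 999
z_star = np.empty(R)
for r in range(R):
    # Resample within each stratum separately, preserving stratum sizes.
    resampled = [rng.choice(s, size=len(s), replace=True) for s in strata]
    t_star, V_star = weighted_estimate(resampled)
    z_star[r] = (t_star - theta_hat) / np.sqrt(V_star)

z_sorted = np.sort(z_star)
lo = theta_hat - np.sqrt(V) * z_sorted[974]   # z*_(975)
hi = theta_hat - np.sqrt(V) * z_sorted[24]    # z*_(25)
```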

Several samples

Figure: Summary plots for 999 nonparametric bootstrap simulations. Top: normal probability plots of t* and z* = (t* − t)/v*^{1/2}. The line on the top left has intercept t and slope v^{1/2}; the line on the top right has intercept zero and unit slope. Bottom: the smallest t*r also has the smallest v*, leading to an outlying value of z*.

Several samples

Key points

◮ For several independent samples, implement the bootstrap by stratified sampling independently from each
◮ Same basic ideas apply for confidence intervals

Variance estimation

Motivation

Basic notions

Confidence intervals

Several samples

Variance estimation

Tests

Regression


Variance estimation

Variance estimation

◮ Variance estimate V needed for certain types of confidence interval (esp. studentized bootstrap)
◮ Ways to compute this:
  • double bootstrap
  • delta method
  • nonparametric delta method
  • jackknife

Variance estimation

Double bootstrap

◮ Bootstrap sample y1*, . . . , yn* and corresponding estimate θ̂*
◮ Take Q second-level bootstrap samples y1**, . . . , yn** from y1*, . . . , yn*, giving corresponding bootstrap estimates θ̂1**, . . . , θ̂Q**
◮ Compute variance estimate V as the sample variance of θ̂1**, . . . , θ̂Q**
◮ Requires a total of R(Q + 1) resamples, so could be expensive
◮ Often reasonable to take Q ≈ 50 for variance estimation, so need O(50 × 1000) resamples: nowadays not infeasible
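A sketch of the double bootstrap (numpy assumed), using the median as an illustrative statistic with no simple variance formula:

```python
import numpy as np

rng = np.random.default_rng(10)

y = rng.exponential(2.0, size=30)
n = y.size

def theta(sample):
    return np.median(sample)     # illustrative statistic

# For each first-level resample, Q second-level resamples give a
# variance estimate V* to go with theta*.
R, Q = 200, 50
theta_star = np.empty(R)
V_star = np.empty(R)
for r in range(R):
    ys = rng.choice(y, size=n, replace=True)             # first level
    theta_star[r] = theta(ys)
    second = np.array([theta(rng.choice(ys, size=n, replace=True))
                       for _ in range(Q)])               # second level
    V_star[r] = second.var(ddof=1)                       # variance of theta**
```

The pairs (theta_star[r], V_star[r]) are exactly what the studentized interval of the previous section needs.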

Variance estimation

Delta method

◮ Computation of variance formulae for functions of averages and other estimators
◮ Suppose ψ̂ = g(θ̂) estimates ψ = g(θ), and θ̂ ~ N(θ, σ²/n) approximately
◮ Then, provided g′(θ) ≠ 0, have (D2)
    E(ψ̂) = g(θ) + O(n⁻¹),
    var(ψ̂) = σ² g′(θ)²/n + O(n^{−3/2})
◮ Then var(ψ̂) ≈ σ̂² g′(θ̂)²/n = V
◮ Example (D3): θ̂ = Ȳ, ψ̂ = log θ̂
◮ Variance stabilisation (D4): if var(θ̂) ≈ S(θ)²/n, find transformation h such that var{h(θ̂)} ≈ constant
◮ Extends to multivariate estimators, and to ψ̂ = g(θ̂1, . . . , θ̂d)
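A sketch of example (D3): for ψ̂ = log Ȳ we have g′(θ) = 1/θ, so V = s²/(n ȳ²); a bootstrap variance estimate for the same quantity should agree closely (numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(11)

y = rng.gamma(shape=4.0, scale=2.0, size=100)
n, ybar = y.size, y.mean()

# Delta method: var(log ybar) ~ sigma^2 g'(theta)^2 / n with g'(t) = 1/t.
V_delta = y.var(ddof=1) / (n * ybar**2)

# Bootstrap check of the same variance.
R = 4000
psi_star = np.array([np.log(rng.choice(y, size=n, replace=True).mean())
                     for _ in range(R)])
V_boot = psi_star.var(ddof=1)
```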


Variance estimation

Nonparametric delta method

◮ Write parameter θ = t(F) as a functional of the distribution F
◮ General approximation:
    V ≈ V_L = n⁻² Σ_{j=1}^n L(Yj; F)²
◮ L(y; F) is the influence function value of θ for an observation at y when the distribution is F:
    L(y; F) = lim_{ε→0} [t{(1 − ε)F + εHy} − t(F)]/ε,
  where Hy puts unit mass at y. Close link to robustness.
◮ Empirical versions of L(y; F) and V_L are
    lj = L(yj; F̂),   v_L = n⁻² Σ lj²,
  usually obtained by analytic or numerical differentiation.

Variance estimation

Computation of lj

◮ Write θ̂ in weighted form, differentiate with respect to ε
◮ Sample average:
    θ̂ = ȳ = n⁻¹ Σ yj = Σ wj yj,   wj ≡ 1/n.
  Change weights:
    wj ↦ ε + (1 − ε)/n,   wi ↦ (1 − ε)/n, i ≠ j,
  so (D5)
    ȳ ↦ ȳε = ε yj + (1 − ε) ȳ = ε(yj − ȳ) + ȳ,
  giving lj = yj − ȳ and v_L = n⁻² Σ (yj − ȳ)² = n⁻¹ (n − 1)/n · s²
◮ Interpretation: lj is the standardized change in ȳ when we increase the mass on yj
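The weighted-form recipe can be checked numerically: perturb each weight by a small ε and difference (numpy assumed). For the sample average this reproduces lj = yj − ȳ up to O(ε):

```python
import numpy as np

rng = np.random.default_rng(12)

y = rng.normal(10.0, 3.0, size=25)
n = y.size

def weighted_mean(w, y):
    return np.sum(w * y) / np.sum(w)

eps = 1.0 / (100 * n)
w0 = np.full(n, 1.0 / n)
l = np.empty(n)
for j in range(n):
    w = w0.copy()
    w[j] += eps                   # put a little extra mass on y_j
    l[j] = (weighted_mean(w, y) - weighted_mean(w0, y)) / eps

v_L = np.sum(l**2) / n**2        # nonparametric delta variance estimate
```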

Variance estimation

Nonparametric delta method: Ratio

◮ Population F(u, x) with y = (u, x) and
    θ = t(F) = ∫ x dF(u, x) / ∫ u dF(u, x);
  sample version is
    θ̂ = t(F̂) = ∫ x dF̂(u, x) / ∫ u dF̂(u, x) = x̄/ū
◮ Then, using the chain rule of differentiation (D6),
    lj = (xj − θ̂ uj)/ū,
  giving
    v_L = n⁻² Σ {(xj − θ̂ uj)/ū}²

Variance estimation

Handedness data: Correlation coefficient

◮ Correlation coefficient may be written as a function of averages, e.g. x̄ū = n⁻¹ Σ xj uj:
    θ̂ = (x̄ū − x̄ ū) / {(x̄² − x̄²)(ū² − ū²)}^{1/2},
  where overbars denote sample averages; from this the empirical influence values lj can be derived
◮ In this example (and for others involving only averages), the nonparametric delta method is equivalent to the delta method
◮ Get v_L = 0.029, for comparison with v = 0.043 obtained by bootstrapping
◮ v_L typically underestimates var(θ̂), as here!

Variance estimation

Delta methods: Comments

◮ Can be applied to many complex statistics
◮ Delta method variances often underestimate true variances: v_L < var(θ̂)
◮ Can be applied automatically (numerical differentiation) if the algorithm for θ̂ is written in weighted form, e.g.
    x̄w = Σ wj xj,   wj ≡ 1/n for x̄,
  then vary the weights successively for j = 1, . . . , n, setting wj ↦ wj + ε with the other weights unchanged, for ε = 1/(100n), and use the definition as a derivative

Variance estimation

Jackknife

◮ Approximation to empirical influence values given by
    lj ≈ ljack,j = (n − 1)(θ̂ − θ̂−j),
  where θ̂−j is the value of θ̂ computed from the sample y1, . . . , yj−1, yj+1, . . . , yn
◮ Jackknife bias and variance estimates are
    bjack = −n⁻¹ Σ ljack,j,   vjack = {n(n − 1)}⁻¹ (Σ ljack,j² − n bjack²)
◮ Requires n + 1 calculations of θ̂
◮ Corresponds to numerical differentiation of θ̂, with ε = −1/(n − 1)
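A sketch of the jackknife (numpy assumed); for the sample mean, ljack,j = yj − ȳ and vjack reduces exactly to s²/n:

```python
import numpy as np

rng = np.random.default_rng(13)

y = rng.normal(size=30)
n = y.size

def theta(sample):
    return sample.mean()          # the mean makes the check below exact

theta_hat = theta(y)
# Leave-one-out recomputations: n + 1 evaluations of theta in total.
l_jack = np.array([(n - 1) * (theta_hat - theta(np.delete(y, j)))
                   for j in range(n)])

b_jack = -l_jack.sum() / n
v_jack = (np.sum(l_jack**2) - n * b_jack**2) / (n * (n - 1))
```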

Variance estimation

Key points

◮ Several methods available for estimation of variances
◮ Needed for some types of confidence interval
◮ Most general method is the double bootstrap: can be expensive
◮ Delta methods rely on a linear expansion, and can be applied numerically or analytically
◮ Jackknife gives an approximation to the delta method, and can fail for rough statistics

Tests

Motivation

Basic notions

Confidence intervals

Several samples

Variance estimation

Tests

Regression


Tests

Ingredients

◮ Ingredients for testing problems:
  • data y1, . . . , yn;
  • model M0 to be tested;
  • test statistic t = t(y1, . . . , yn), with large values giving evidence against M0, and observed value tobs
◮ P-value pobs = Pr(T ≥ tobs | M0) measures evidence against M0; small pobs indicates evidence against M0.
◮ Difficulties:
  • pobs may depend upon 'nuisance' parameters, those of M0;
  • pobs often hard to calculate.

Tests

Examples

◮ Balsam-fir seedlings in 5 × 5 quadrats: Poisson sample?

    0 1 2 3 4 3 4 2 2 1
    0 2 0 2 4 2 3 3 4 2
    1 1 1 1 4 1 5 2 2 3
    4 1 2 5 2 0 3 2 1 1
    3 1 4 3 1 0 0 2 7 0

◮ Two-way layout: row–column independence?

    1 2 2 1 1 0 1
    2 0 0 2 3 0 0
    0 1 1 1 2 7 3
    1 1 2 0 0 0 1
    0 1 1 1 1 0 0

Tests

Estimation of pobs

◮ Estimate pobs by simulation from the fitted null hypothesis model M̂0.
◮ Algorithm: for r = 1, . . . , R:
  • simulate data set y1*, . . . , yn* from M̂0;
  • calculate test statistic tr* from y1*, . . . , yn*.
◮ Calculate simulation estimate
    p̂ = (1 + #{tr* ≥ tobs}) / (1 + R)
  of p̂obs = Pr(T ≥ tobs | M̂0).
◮ Simulation and statistical errors:
    p̂ ≈ p̂obs ≈ pobs
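A sketch of this algorithm for the balsam-fir counts shown earlier, with M̂0 the Poisson model with mean λ̂ = ȳ and the dispersion statistic used later as T (numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(14)

# Balsam-fir quadrat counts from the earlier table.
y = np.array([0, 1, 2, 3, 4, 3, 4, 2, 2, 1,
              0, 2, 0, 2, 4, 2, 3, 3, 4, 2,
              1, 1, 1, 1, 4, 1, 5, 2, 2, 3,
              4, 1, 2, 5, 2, 0, 3, 2, 1, 1,
              3, 1, 4, 3, 1, 0, 0, 2, 7, 0])
n, lam_hat = y.size, y.mean()

def dispersion(x):
    return np.sum((x - x.mean())**2) / x.mean()

t_obs = dispersion(y)            # 55.15, as quoted later for these data

R = 999
t_star = np.array([dispersion(rng.poisson(lam_hat, size=n)) for _ in range(R)])
p_hat = (1 + np.sum(t_star >= t_obs)) / (1 + R)
```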

Tests

Handedness data: Test of independence

◮ Are dnan and hand positively associated?
◮ Take T = θ̂ (correlation coefficient), with tobs = 0.509; this is large in case of positive association (one-sided test)
◮ Null hypothesis of independence: F(u, x) = F1(u)F2(x)
◮ Take bootstrap samples independently from F̂1 ≡ (dnan1, . . . , dnann) and from F̂2 ≡ (hand1, . . . , handn), then put them together to get bootstrap data (dnan1*, hand1*), . . . , (dnann*, handn*)
◮ With R = 9,999 get 18 values of θ̂* ≥ θ̂, so
    p̂ = (1 + 18)/(1 + 9999) = 0.0019:
  hand and dnan seem to be positively associated
◮ To test positive or negative association (two-sided test), take T = |θ̂|: gives p̂ = 0.004
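A sketch of this test (numpy assumed; dnan and hand are the 37 pairs from the table). Under the null, the two coordinates are resampled independently:

```python
import numpy as np

rng = np.random.default_rng(15)

dnan = np.array([13, 18, 20, 21, 21, 24, 24, 27, 28, 28, 28, 28, 28, 28, 28,
                 28, 29, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 31, 31, 31,
                 31, 31, 33, 33, 34, 41, 44])
hand = np.array([1, 1, 3, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 4, 1, 1, 1, 1, 1, 2,
                 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 6, 1, 1, 4, 8])
n = dnan.size
t_obs = np.corrcoef(dnan, hand)[0, 1]            # 0.509

R = 1999
t_star = np.empty(R)
for r in range(R):
    d = rng.choice(dnan, size=n, replace=True)   # from F1-hat
    h = rng.choice(hand, size=n, replace=True)   # from F2-hat, independently
    while h.std() == 0:                          # guard degenerate resamples
        h = rng.choice(hand, size=n, replace=True)
    t_star[r] = np.corrcoef(d, h)[0, 1]

p_hat = (1 + np.sum(t_star >= t_obs)) / (1 + R)  # one-sided p-value
```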

Tests

c0 Handedness data: Bootstrap from M

1

2

2

1

2

15

20

0.5

3

1 2

1

1102 3 4 3

25

30 dnan

1

35

40

0.0

1

hand 4 5

6

2

Probability density 1.0 1.5 2.0

7

2.5

8

1

45

Anthony Davison: Bootstrap Methods and their Application,

−0.6

−0.2 0.2 Test statistic

0.6

63

Tests

Choice of R

◮ Take R big enough to get a small standard error for p̂, typically ≥ 100, using binomial calculation:
    var(p̂) = var[(1 + #{tr* ≥ tobs})/(1 + R)] ≈ R pobs(1 − pobs)/R² = pobs(1 − pobs)/R,
  so if pobs ≈ 0.05, need R ≥ 1900 for 10% relative error (D7)
◮ Can choose R sequentially: e.g. if p̂ = 0.06 and R = 99, can augment R enough to diminish the standard error.
◮ Taking R too small lowers the power of the test.

Tests

Duality with confidence interval

◮ Often unclear how to impose the null hypothesis on the sampling scheme
◮ General approach based on duality between confidence interval I_{1−α} = (θ_α, ∞) and test of null hypothesis θ = θ0
◮ Reject null hypothesis at level α in favour of alternative θ > θ0 if θ0 < θ_α
◮ Handedness data: θ0 = 0 ∉ I_{0.95}, but θ0 = 0 ∈ I_{0.99}, so estimated significance level 0.01 < p̂ < 0.05: weaker evidence than before
◮ Extends to tests of θ = θ0 against other alternatives:
  • if θ0 ∉ I^{1−α} = (−∞, θ^α), have evidence that θ < θ0
  • if θ0 ∉ I_{1−2α} = (θ_α, θ^α), have evidence that θ ≠ θ0

Tests

Pivot tests ◮ ◮

◮ ◮

◮

◮

Equivalent to use of confidence intervals Idea: use (approximate) pivot such as Z = (θb − θ)/V 1/2 as statistic to test θ = θ0 Observed value of pivot is zobs = (θb − θ0 )/V 1/2 Significance level is ! θb − θ = Pr(Z ≥ zobs | M0 ) ≥ zobs | M0 Pr V 1/2 = Pr(Z ≥ zobs | F ) . = Pr(Z ≥ zobs | Fb)

Compare observed zobs with simulated distribution of ∗1/2 , without needing to construct null b Z ∗ = (θb∗ − θ)/V c0 hypothesis model M Use of (approximate) pivot is essential for success


Tests

Example: Handedness data

◮ Test zero correlation (θ0 = 0), not independence; θ̂ = 0.509, V = 0.170²:
      z_obs = (θ̂ − θ0)/V^{1/2} = (0.509 − 0)/0.170 = 2.99
◮ Observed significance level is
      p̂ = (1 + #{z*_r ≥ z_obs})/(1 + R) = (1 + 215)/(1 + 9999) = 0.0216

[Figure: probability density of the simulated test statistic values z*, with the observed value z_obs = 2.99 in the upper tail.]

Tests

Exact tests

◮ Problem: bootstrap estimate is
      p̂_obs = Pr(T ≥ t_obs | M̂0) ≠ Pr(T ≥ t_obs | M0) = p_obs,
  so estimate the wrong thing
◮ In some cases can eliminate parameters from null hypothesis distribution by conditioning on sufficient statistic
◮ Then simulate from conditional distribution
◮ More generally, can use Metropolis–Hastings algorithm to simulate from conditional distribution (below)


Tests

Example: Fir data

◮ Data Y1, ..., Yn iid ~ Pois(λ), with λ unknown
◮ Poisson model has E(Y) = var(Y) = λ: base test of overdispersion on
      T = Σj (Yj − Ȳ)²/Ȳ ≈ χ²_{n−1};
  observed value is t_obs = 55.15
◮ Unconditional significance level: Pr(T ≥ t_obs | M̂0, λ)
◮ Condition on value w of sufficient statistic W = Σj Yj:
      p_obs = Pr(T ≥ t_obs | M̂0, W = w),
  independent of λ, owing to sufficiency of W
◮ Exact test: simulate from multinomial distribution of Y1, ..., Yn given W = Σj Yj = 107.
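The conditional simulation can be sketched in plain Python. The fir quadrat counts themselves (n = 50, W = 107) are not reproduced on the slide, so the counts below are illustrative only:

```python
import random

def dispersion(counts):
    """Dispersion statistic T = sum_j (y_j - ybar)^2 / ybar."""
    ybar = sum(counts) / len(counts)
    return sum((c - ybar) ** 2 for c in counts) / ybar

def conditional_pvalue(counts, R=999, seed=7):
    """Exact-test sketch: given W = sum(counts), simulate tables from the
    multinomial distribution of Y_1, ..., Y_n given W, which is free of lambda."""
    rng = random.Random(seed)
    n, w, tobs = len(counts), sum(counts), dispersion(counts)
    exceed = 0
    for _ in range(R):
        cells = [0] * n
        for _ in range(w):          # drop the W events into n equiprobable cells
            cells[rng.randrange(n)] += 1
        if dispersion(cells) >= tobs:
            exceed += 1
    return (1 + exceed) / (1 + R)

# Illustrative counts (not the fir data); these are nearly equidispersed,
# so the conditional p-value comes out large
y = [2, 3, 1, 4, 2, 2, 3, 1, 2, 3, 2, 4, 1, 2, 3]
p = conditional_pvalue(y)
```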


Tests

Example: Fir data

Figure: Simulation results for dispersion test. Left panel: R = 999 values of the dispersion statistic t* obtained under multinomial sampling; the data value is t_obs = 55.15 and p̂ = 0.25. Right panel: chi-squared plot of ordered values of t*; dotted line shows the χ²_49 approximation to the null conditional distribution.

Tests

Handedness data: Permutation test

◮ Are dnan and hand related?
◮ Take T = θ̂ (correlation coefficient) again
◮ Impose null hypothesis of independence: F(u, x) = F1(u)F2(x), but condition so that marginal distributions F̂1 and F̂2 are held fixed under resampling plan — permutation test
◮ Take resamples of form (dnan_1, hand_{1*}), ..., (dnan_n, hand_{n*}), where (1*, ..., n*) is random permutation of (1, ..., n)
◮ Doing this with R = 9,999 gives one- and two-sided significance probabilities of 0.002, 0.003
◮ Typically values of p̂ very similar to those for corresponding bootstrap test
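The permutation scheme is easy to sketch in plain Python (function names and the toy data are ours):

```python
import random

def corr(x, y):
    """Product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_pvalue(x, y, R=999, seed=3):
    """Permutation test of independence: hold x fixed and permute y, so both
    marginal EDFs are preserved exactly under the null."""
    rng = random.Random(seed)
    tobs = corr(x, y)
    ystar = list(y)
    exceed = 0
    for _ in range(R):
        rng.shuffle(ystar)
        if corr(x, ystar) >= tobs:
            exceed += 1
    return (1 + exceed) / (1 + R)

# Toy data with a strong monotone relationship, so the one-sided p-value is tiny
x = list(range(1, 11))
y = [2 * v + 1 for v in x]
p = permutation_pvalue(x, y)
```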


Tests

Handedness data: Permutation resample

[Figure: scatter plot of a permutation resample (hand* against dnan), and probability density of the permutation distribution of the test statistic.]

Tests

Contingency table

1 2 0 1 0
2 0 1 1 1
2 0 1 2 1
1 2 1 0 1
1 3 2 0 1
0 0 7 0 0
1 0 3 1 0

◮ Are row and column classifications independent: Pr(row i, column j) = Pr(row i) × Pr(column j)?

◮ Standard test statistic for independence is
      T = Σ_{i,j} (y_ij − ŷ_ij)²/ŷ_ij,   ŷ_ij = y_{i·} y_{·j} / y_{··}
◮ Get Pr(χ²_24 ≥ 38.53) = 0.048, but is T ~ χ²_24?


Tests

Exact tests: Contingency table

◮ For exact test, need to simulate distribution of T conditional on sufficient statistics — row and column totals
◮ Algorithm (D8) for conditional simulation:
  1. choose two rows j1 < j2 and two columns k1 < k2 at random
  2. generate new values from hypergeometric distribution of y_{j1 k1} conditional on margins of 2 × 2 table
         y_{j1 k1}  y_{j1 k2}
         y_{j2 k1}  y_{j2 k2}
  3. compute test statistic T* every I = 100 iterations, say
◮ Compare observed value t_obs = 38.53 with simulated T* — get p̂ ≈ 0.08
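A plain-Python sketch of the margin-preserving sampler, using `math.comb` for the hypergeometric weights; the table is as printed on the slide, and the sampler keeps every row and column total fixed:

```python
import math
import random

# Contingency table as printed on the slide
TABLE = [
    [1, 2, 0, 1, 0],
    [2, 0, 1, 1, 1],
    [2, 0, 1, 2, 1],
    [1, 2, 1, 0, 1],
    [1, 3, 2, 0, 1],
    [0, 0, 7, 0, 0],
    [1, 0, 3, 1, 0],
]

def pearson_stat(table):
    """T = sum_ij (y_ij - yhat_ij)^2 / yhat_ij, yhat_ij = y_i. * y_.j / y_.."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    tot = sum(rows)
    return sum((table[i][j] - rows[i] * cols[j] / tot) ** 2
               / (rows[i] * cols[j] / tot)
               for i in range(len(rows)) for j in range(len(cols)))

def hypergeom_step(table, rng):
    """One step of algorithm (D8): pick two rows and two columns, then redraw
    y_{j1 k1} from its hypergeometric distribution given the 2x2 margins."""
    j1, j2 = rng.sample(range(len(table)), 2)
    k1, k2 = rng.sample(range(len(table[0])), 2)
    a, b = table[j1][k1], table[j1][k2]
    c, d = table[j2][k1], table[j2][k2]
    r1, r2, c1 = a + b, c + d, a + c
    lo, hi = max(0, c1 - r2), min(r1, c1)
    support = list(range(lo, hi + 1))
    w = [math.comb(r1, t) * math.comb(r2, c1 - t) for t in support]
    t = rng.choices(support, weights=w)[0]
    table[j1][k1], table[j1][k2] = t, r1 - t
    table[j2][k1], table[j2][k2] = c1 - t, r2 - c1 + t

def conditional_pvalue(table, R=199, spacing=100, seed=9):
    """Compare t_obs with T* recorded every `spacing` iterations."""
    rng = random.Random(seed)
    tobs = pearson_stat(table)
    cur = [row[:] for row in table]
    exceed = 0
    for _ in range(R):
        for _ in range(spacing):
            hypergeom_step(cur, rng)
        if pearson_stat(cur) >= tobs:
            exceed += 1
    return (1 + exceed) / (1 + R)

p = conditional_pvalue(TABLE)
```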


Tests

Key points

◮ Tests can be performed using resampling/simulation
◮ Must take account of null hypothesis, by
  • modifying sampling scheme to satisfy null hypothesis
  • inverting confidence interval (pivot test)

◮

Can use Monte Carlo simulation to get approximations to exact tests — simulate from null distribution of data, conditional on observed value of sufficient statistic

◮

Sometimes obtain permutation tests — very similar to bootstrap tests


Regression

Motivation

Basic notions

Confidence intervals

Several samples

Variance estimation

Tests

Regression


Regression

Linear regression

◮ Independent data (x1, y1), ..., (xn, yn) with
      yj = xj^T β + εj,   εj ~ (0, σ²)
◮ Least squares estimates β̂, leverages hj, residuals
      ej = (yj − xj^T β̂)/(1 − hj)^{1/2} ≈ (0, σ²)
◮ Design matrix X is experimental ancillary — should be held fixed if possible, as
      var(β̂) = σ²(X^T X)^{−1}
  if model y = Xβ + ε correct


Regression

Linear regression: Resampling schemes ◮

Two main resampling schemes

◮

Model-based resampling: y*_j = xj^T β̂ + ε*_j,   ε*_j ~ EDF(e1 − ē, ..., en − ē)

• Fixes design but not robust to model failure • Assumes εj sampled from population ◮

Case resampling: (xj , yj )∗ ∼ EDF{(x1 , y1 ), . . . , (xn , yn )} • Varying design X but robust • Assumes (xj , yj ) sampled from population • Usually design variation no problem; can prove awkward in designed experiments and when design sensitive.
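The two schemes can be sketched in plain Python for straight-line regression (the function names and toy data are ours, for illustration only):

```python
import random

def fit(xy):
    """Least squares intercept and slope for straight-line regression."""
    n = len(xy)
    mx = sum(x for x, _ in xy) / n
    my = sum(y for _, y in xy) / n
    b1 = (sum((x - mx) * (y - my) for x, y in xy)
          / sum((x - mx) ** 2 for x, _ in xy))
    return my - b1 * mx, b1

def case_resample_slopes(xy, R=499, seed=11):
    """Case resampling: draw (x, y) pairs with replacement, refit each time."""
    rng = random.Random(seed)
    return [fit([rng.choice(xy) for _ in xy])[1] for _ in range(R)]

def model_resample_slopes(xy, R=499, seed=11):
    """Model-based resampling: keep the design fixed, resample centred residuals."""
    rng = random.Random(seed)
    b0, b1 = fit(xy)
    resid = [y - (b0 + b1 * x) for x, y in xy]
    rbar = sum(resid) / len(resid)
    centred = [e - rbar for e in resid]
    slopes = []
    for _ in range(R):
        xystar = [(x, b0 + b1 * x + rng.choice(centred)) for x, _ in xy]
        slopes.append(fit(xystar)[1])
    return slopes

# Toy data, roughly y = 2x + noise
xy = [(0, 0.1), (1, 2.0), (2, 3.9), (3, 6.2), (4, 8.0), (5, 9.8), (6, 12.1), (7, 14.0)]
case_slopes = case_resample_slopes(xy)
model_slopes = model_resample_slopes(xy)
```

Note how the model-based version keeps every x fixed, while case resampling lets the design vary from resample to resample.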


Regression

Cement data

Table: Cement data: y is the heat (calories per gram of cement) evolved while samples of cement set. The covariates are percentages by weight of four constituents: tricalcium aluminate x1, tricalcium silicate x2, tetracalcium alumino ferrite x3 and dicalcium silicate x4.

      x1   x2   x3   x4       y
 1     7   26    6   60    78.5
 2     1   29   15   52    74.3
 3    11   56    8   20   104.3
 4    11   31    8   47    87.6
 5     7   52    6   33    95.9
 6    11   55    9   22   109.2
 7     3   71   17    6   102.7
 8     1   31   22   44    72.5
 9     2   54   18   22    93.1
10    21   47    4   26   115.9
11     1   40   23   34    83.8
12    11   66    9   12   113.3
13    10   68    8   12   109.4

Regression

Cement data

◮ Fit linear model y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4 + ε and apply case resampling
◮ Covariates compositional: x1 + ··· + x4 ≈ 100%, so X almost collinear — smallest eigenvalue of X^T X is l5 = 0.0012
◮ Plot of β̂*_1 against smallest eigenvalue of X*^T X* reveals that var*(β̂*_1) strongly variable
◮ Relevant subset for case resampling — post-stratification of output based on l*_5?


Regression

Cement data

[Figure: bootstrap estimates β̂1* (left) and β̂2* (right) plotted against the smallest eigenvalue of X*^T X*, on a log scale.]

Regression

Cement data

Table: Standard errors of linear regression coefficients for cement data. Theoretical and error resampling assume homoscedasticity. Resampling results use R = 999 samples, but last two rows are based only on those samples with the middle 500 and the largest 800 values of ℓ∗1 .

                                     β̂0     β̂1     β̂2
Normal theory                       70.1   0.74   0.72
Model-based resampling, R = 999     66.3   0.70   0.69
Case resampling, all R = 999       108.5   1.13   1.12
Case resampling, largest 800        67.3   0.77   0.69

Regression

Survival data

dose x        117.5   235.0   470.0   705.0   940.0    1410
survival % y  44.000  16.000   4.000   0.500   0.110   0.700
              55.000  13.000   1.960   0.320   0.015   0.006
                               6.120           0.019

◮ Data on survival % for rats at different doses
◮ Linear model: log(survival) = β0 + β1 dose

[Figure: survival % (left) and log survival % (right) plotted against dose.]

Regression

Survival data

◮ Case resampling
◮ Replication of outlier: none (0), once (1), two or more (•)
◮ Model-based sampling including residual would lead to change in intercept but not slope

[Figure: bootstrap estimated slope plotted against bootstrap sum of squares, with points coded by how often the outlier appears in the resample.]

Regression

Generalized linear model

◮ Response may be binomial, Poisson, gamma, normal, ...:
      yj ~ mean µj, variance φV(µj),
  where g(µj) = xj^T β is linear predictor; g(·) is link function.
◮ MLE β̂, fitted values µ̂j, Pearson residuals
      r_Pj = (yj − µ̂j)/{V(µ̂j)(1 − hj)}^{1/2} ≈ (0, φ).
◮ Bootstrapped responses
      y*_j = µ̂j + V(µ̂j)^{1/2} ε*_j,
  where ε*_j ~ EDF(r_P1 − r̄_P, ..., r_Pn − r̄_P). However
  • possible that y*_j ∉ {0, 1, 2, ...}
  • r_Pj not exchangeable, so may need stratified resampling

Regression

AIDS data

◮ Log-linear model: number of reports in row j and column k follows Poisson distribution with mean µjk = exp(αj + βk)
◮ Log link function g(µjk) = log µjk = αj + βk and variance function var(Yjk) = φ × V(µjk) = 1 × µjk
◮ Pearson residuals:
      rjk = (Yjk − µ̂jk)/{µ̂jk(1 − hjk)}^{1/2}
◮ Model-based simulation:
      Y*_jk = µ̂jk + µ̂jk^{1/2} ε*_jk


Regression

Diagnosis period        Reporting-delay interval (quarters):                 Total reports to
Year  Quarter     0†    1    2    3    4    5    6  ···  ≥14                 end of 1992
1988     1        31   80   16    9    3    2    6  ···    6                      174
1988     2        26   99   27    9    8   11    8  ···    3                      211
1988     3        31   95   35   13   18    4    3  ···    3                      224
1988     4        36   77   20   26   11    3    6  ···    2                      205
1989     1        32   92   32   10   12   19    8  ···    2                      224
1989     2        15   92   14   27   22   21   12  ···    1                      219
1989     3        34  104   29   31   18    8   12  ···                           253
1989     4        38  101   34   18    9   15    6  ···                           233
1990     1        31  124   47   24   11   15    6  ···                           281
1990     2        32  132   36   10    9    7    8  ···                           245
1990     3        49  107   51   17   15    8    6  ···                           260
1990     4        44  153   41   16   11    6    9  ···                           285
1991     1        41  137   29   33    7   11    5  ···                           271
1991     2        56  124   39   14   12    7    6  ···                           263
1991     3        53  175   35   17   13   11   10  ···                           306
1991     4        63  135   24   23   12                                          258
1992     1        71  161   48   25                                               310
1992     2        95  178   39                                                    318
1992     3        76  181                                                         273
1992     4        67                                                              133

Regression

AIDS data

◮ Poisson two-way model deviance 716.5 on 413 df — indicates strong overdispersion: φ > 1, so Poisson model implausible
◮ Residuals highly inhomogeneous — exchangeability doubtful

[Figure: observed and fitted diagnoses over 1984–1992 (left), and Pearson residuals rP plotted against skewness (right).]

Regression

AIDS data: Prediction intervals

◮ To estimate prediction error:
  • simulate complete table y*_jk;
  • estimate parameters from incomplete y*_jk;
  • get estimated row totals and ‘truth’
        µ̂*_{+,j} = e^{α̂*_j} Σ_{k unobs} e^{β̂*_k},   y*_{+,j} = Σ_{k unobs} y*_jk;
  • prediction error
        (y*_{+,j} − µ̂*_{+,j}) / µ̂*_{+,j}^{1/2},
    studentized so more nearly pivotal.
◮ Form prediction intervals from R replicates.


Regression

AIDS data: Resampling plans

◮ Resampling schemes:
  • parametric simulation, fitted Poisson model
  • parametric simulation, fitted negative binomial model
  • nonparametric resampling of rP
  • stratified nonparametric resampling of rP
◮ Stratification based on skewness of residuals, equivalent to stratifying original data by values of fitted means

◮ Take strata for which µ̂jk < 1, 1 ≤ µ̂jk < 2, µ̂jk ≥ 2

Regression

AIDS data: Results

◮ Deviance/df ratios for the sampling schemes, R = 999.
◮ Poisson variation inadequate.
◮ 95% prediction limits.

[Figure: histograms of deviance/df ratios under the four resampling schemes (poisson, negative binomial, nonparametric, stratified nonparametric), and 95% prediction limits for quarterly diagnoses 1989–1992.]

Regression

AIDS data: Semiparametric model

◮ More realistic: generalized additive model
      µjk = exp{α(j) + βk},
  where α(j) is locally-fitted smooth.
◮ Same resampling plans as before
◮ 95% intervals now generally narrower and shifted upwards

[Figure: fitted curve for diagnoses over 1984–1992, and 95% prediction limits for 1989–1992.]

Regression

Key points

◮ Key assumption: independence of cases
◮ Two main resampling schemes for regression settings:
  • Model-based
  • Case resampling

◮

Intermediate schemes possible

◮

Can help to reduce dependence on assumptions needed for regression model

◮

These two basic approaches also used for more complex settings (time series, . . .), where data are dependent


Regression

Summary

◮ Bootstrap: simulation methods for frequentist inference.
◮ Useful when
  • standard assumptions invalid (n small, data not normal, ...);
  • standard problem has non-standard twist;
  • complex problem has no (reliable) theory;
  • or (almost) anywhere else.
◮ Have described
  • basic ideas
  • confidence intervals
  • tests
  • some approaches for regression


Regression

Books ◮

Chernick (1999) Bootstrap Methods: A Practitioner’s Guide. Wiley

◮

Davison and Hinkley (1997) Bootstrap Methods and their Application. Cambridge University Press

◮

Efron and Tibshirani (1993) An Introduction to the Bootstrap. Chapman & Hall

◮

Hall (1992) The Bootstrap and Edgeworth Expansion. Springer

◮

Lunneborg (2000) Data Analysis by Resampling: Concepts and Applications. Duxbury Press

◮

Manly (1997) Randomisation, Bootstrap and Monte Carlo Methods in Biology. Chapman & Hall

◮

Shao and Tu (1995) The Jackknife and Bootstrap. Springer



Motivation

AIDS data

◮ Data (+), fits of simple model (solid), complex model (dots)
◮ Variance formulae could be derived — painful! but useful?
◮ Effects of overdispersion, complex model, ...?

[Figure: quarterly AIDS diagnoses plotted against time, 1984–1992, with fitted curves.]

Motivation

Goal

Find reliable assessment of uncertainty when ◮

estimator complex

◮

data complex

◮

sample size small

◮

model non-standard


Basic notions

Motivation

Basic notions

Confidence intervals

Several samples

Variance estimation

Tests

Regression


Basic notions

Handedness data Table: Data from a study of handedness; hand is an integer measure of handedness, and dnan a genetic measure. Data due to Dr Gordon Claridge, University of Oxford.

      dnan  hand        dnan  hand        dnan  hand        dnan  hand
 1     13    1     11    28    1     21    29    2     31    31    1
 2     18    1     12    28    2     22    29    1     32    31    2
 3     20    3     13    28    1     23    29    1     33    33    6
 4     21    1     14    28    4     24    30    1     34    33    1
 5     21    1     15    28    1     25    30    1     35    34    1
 6     24    1     16    28    1     26    30    2     36    41    4
 7     24    1     17    29    1     27    30    1     37    44    8
 8     27    1     18    29    1     28    31    1
 9     28    1     19    29    1     29    31    1
10     28    2     20    29    2     30    31    1

Basic notions

Handedness data Figure: Scatter plot of handedness data. The numbers show the multiplicities of the observations.


Basic notions

Handedness data

◮ Is there dependence between dnan and hand for these n = 37 individuals?
◮ Sample product-moment correlation coefficient is θ̂ = 0.509.
◮ Standard confidence interval (based on bivariate normal population) gives 95% CI (0.221, 0.715).
◮ Data not bivariate normal!
◮ What is the status of the interval? Can we do better?


Basic notions

Frequentist inference

◮ Estimator θ̂ for unknown parameter θ.
◮ Statistical model: data y1, ..., yn iid ~ F, unknown
◮ Handedness data
  • y = (dnan, hand)
  • F puts probability mass on subset of R²
  • θ̂ is correlation coefficient
◮ Key issue: what is variability of θ̂ when samples are repeatedly taken from F?
◮ Imagine F known — could answer question by
  • analytical (mathematical) calculation
  • simulation


Basic notions

Simulation with F known

◮ For r = 1, ..., R:
  • generate random sample y*_1, ..., y*_n iid ~ F;
  • compute θ̂*_r using y*_1, ..., y*_n.
◮ Output after R iterations: θ̂*_1, θ̂*_2, ..., θ̂*_R
◮ Use θ̂*_1, θ̂*_2, ..., θ̂*_R to estimate sampling distribution of θ̂ (histogram, density estimate, ...)
◮ In practice R is finite, so some error remains
◮ If R → ∞, then get perfect match to theoretical calculation (if available): Monte Carlo error disappears completely


Basic notions

Handedness data: Fitted bivariate normal model

Figure: Contours of bivariate normal distribution fitted to handedness data; parameter estimates are µ̂1 = 28.5, µ̂2 = 1.7, σ̂1 = 5.4, σ̂2 = 1.5, ρ̂ = 0.509. The data are also shown.

Basic notions

Handedness data: Parametric bootstrap samples

Figure: Left: original data (correlation 0.509), with jittered vertical values. Centre and right: two samples generated from the fitted bivariate normal distribution, with correlations 0.753 and 0.533.

Basic notions

Handedness data: Correlation coefficient

Figure: Bootstrap distributions with R = 10000. Left: simulation from fitted bivariate normal distribution. Right: simulation from the data by bootstrap resampling. The lines show the theoretical probability density function of the correlation coefficient under sampling from a fitted bivariate normal distribution.

Basic notions

F unknown

◮ Replace unknown F by estimate F̂ obtained
  • parametrically — e.g. maximum likelihood or robust fit of distribution F(y) = F(y; ψ) (normal, exponential, bivariate normal, ...)
  • nonparametrically — using empirical distribution function (EDF) of original data y1, ..., yn, which puts mass 1/n on each of the yj
◮ Algorithm: For r = 1, ..., R:
  • generate random sample y*_1, ..., y*_n iid ~ F̂;
  • compute θ̂*_r using y*_1, ..., y*_n.
◮ Output after R iterations: θ̂*_1, θ̂*_2, ..., θ̂*_R

Basic notions

Nonparametric bootstrap ◮

iid Bootstrap (re)sample y1∗ , . . . , yn∗ ∼ Fb, where Fb is EDF of y1 , . . . , yn

• Repetitions will occur!

◮ ◮

◮ ◮ ◮

Compute bootstrap θb∗ using y1∗ , . . . , yn∗ .

For handedness data take n = 37 pairs y ∗ = (dnan, hand)∗ with equal probabilities 1/37 and replacement from original pairs (dnan, hand) Repeat this R times, to get θb∗ , . . . , θb∗ 1

R

See picture

Results quite different from parametric simulation — why?


Basic notions

Handedness data: Bootstrap samples

Figure: Left: original data (correlation 0.509), with jittered vertical values. Centre and right: two bootstrap samples, with correlations 0.733 and 0.491, with jittered vertical values.

Basic notions

Using the θ̂*

◮ Bootstrap replicates θ̂*_r used to estimate properties of θ̂.
◮ Write θ = θ(F) to emphasize dependence on F
◮ Bias of θ̂ as estimator of θ is
      β(F) = E(θ̂ | y1, ..., yn iid ~ F) − θ(F),
  estimated by replacing unknown F by known estimate F̂:
      β(F̂) = E(θ̂ | y1, ..., yn iid ~ F̂) − θ(F̂)
◮ Replace theoretical expectation E(·) by empirical average:
      β(F̂) ≈ b = θ̄* − θ̂ = R⁻¹ Σ_{r=1}^R θ̂*_r − θ̂

Basic notions

◮ Estimate variance ν(F) = var(θ̂ | F) by
      v = (R − 1)⁻¹ Σ_{r=1}^R (θ̂*_r − θ̄*)²
◮ Estimate quantiles of θ̂ by taking empirical quantiles of θ̂*_1, ..., θ̂*_R
◮ For handedness data, 10,000 replicates shown earlier give
      b = −0.046,   v = 0.043 = 0.205²
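These bias and variance estimates can be reproduced from the handedness pairs tabulated earlier (plain-Python sketch; we use R = 1000 rather than the slides' 10,000, so the numbers match only approximately):

```python
import random

# Handedness data (dnan, hand) for the n = 37 individuals
dnan = [13, 18, 20, 21, 21, 24, 24, 27, 28, 28, 28, 28, 28, 28, 28, 28, 29, 29,
        29, 29, 29, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 33, 33, 34, 41, 44]
hand = [1, 1, 3, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 4, 1, 1, 1, 1,
        1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 6, 1, 1, 4, 8]

def corr(pairs):
    """Product-moment correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5

pairs = list(zip(dnan, hand))
theta_hat = corr(pairs)                    # 0.509 on the slides

rng = random.Random(1)
R = 1000
thetastars = [corr([rng.choice(pairs) for _ in pairs]) for _ in range(R)]

tbar = sum(thetastars) / R
b = tbar - theta_hat                                        # bias estimate
v = sum((t - tbar) ** 2 for t in thetastars) / (R - 1)      # variance estimate
```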


Basic notions

Handedness data

Figure: Summaries of the θ̂*. Left: histogram, with vertical line showing θ̂. Right: normal Q–Q plot of θ̂*.

Basic notions

How many bootstraps?

◮ Must estimate moments and quantiles of θ̂ and derived quantities. Nowadays often feasible to take R ≥ 5000
◮ Need R ≥ 100 to estimate bias, variance, etc.
◮ Need R ≫ 100, prefer R ≥ 1000, to estimate quantiles needed for 95% confidence intervals

[Figure: normal Q–Q plots of Z* for R = 199 and R = 999.]

Basic notions

Key points

◮ Estimator is algorithm
  • applied to original data y1, ..., yn gives original θ̂
  • applied to simulated data y*_1, ..., y*_n gives θ̂*
  • θ̂ can be of (almost) any complexity
  • for more sophisticated ideas (later) to work, θ̂ must often be smooth function of data
◮ Sample is used to estimate F
  • F̂ ≈ F — heroic assumption
◮ Simulation replaces theoretical calculation
  • removes need for mathematical skill
  • does not remove need for thought
  • check code very carefully — garbage in, garbage out!
◮ Two sources of error
  • statistical (F̂ ≠ F) — reduce by thought
  • simulation (R ≠ ∞) — reduce by taking R large (enough)


Confidence intervals

Motivation

Basic notions

Confidence intervals

Several samples

Variance estimation

Tests

Regression


Confidence intervals

Normal confidence intervals

◮ If θ̂ approximately normal, then θ̂ ≈ N(θ + β, ν), where θ̂ has bias β = β(F) and variance ν = ν(F)
◮ If β, ν known, (1 − 2α) confidence interval for θ would be (D1)
      θ̂ − β ± z_α ν^{1/2},   where Φ(z_α) = α.
◮ Replace β, ν by estimates:
      β(F) ≈ β(F̂) ≈ b = θ̄* − θ̂,
      ν(F) ≈ ν(F̂) ≈ v = (R − 1)⁻¹ Σ_r (θ̂*_r − θ̄*)²,
  giving (1 − 2α) interval θ̂ − b ± z_α v^{1/2}
◮ Handedness data: R = 10,000, b = −0.046, v = 0.205², α = 0.025, z_α = −1.96, so 95% CI is (0.147, 0.963)
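The handedness interval follows directly from these numbers (plain-Python sketch; small differences from the slide's (0.147, 0.963) are rounding in the quoted b and v):

```python
def normal_interval(theta_hat, b, v, z=1.96):
    """Normal-approximation interval theta_hat - b ± z * v**0.5."""
    centre = theta_hat - b
    half = z * v ** 0.5
    return centre - half, centre + half

# Slide values: theta_hat = 0.509, b = -0.046, v = 0.205^2
lo, hi = normal_interval(0.509, -0.046, 0.205 ** 2)
```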


Confidence intervals

Normal confidence intervals

◮ Normal approximation reliable? Transformation needed?
◮ Here are plots for ψ̂ = ½ log{(1 + θ̂)/(1 − θ̂)}:

[Figure: histogram and normal Q–Q plot of the transformed correlation coefficient.]

Confidence intervals

Normal confidence intervals

◮ Correlation coefficient: try Fisher's z transformation:
      ψ̂ = ψ(θ̂) = ½ log{(1 + θ̂)/(1 − θ̂)},
  for which compute
      b_ψ = R⁻¹ Σ_{r=1}^R (ψ̂*_r − ψ̂),   v_ψ = (R − 1)⁻¹ Σ_{r=1}^R (ψ̂*_r − ψ̄*)²
◮ (1 − 2α) confidence interval for θ is
      ( ψ⁻¹{ψ̂ − b_ψ − z_{1−α} v_ψ^{1/2}},  ψ⁻¹{ψ̂ − b_ψ − z_α v_ψ^{1/2}} )
◮ For handedness data, get (0.074, 0.804)
◮ But how do we choose a transformation in general?
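A sketch of the transformed interval, noting that ψ = artanh and ψ⁻¹ = tanh; the replicates below are synthetic, for illustration only (real ψ̂*_r would come from bootstrap resamples):

```python
import math
import random

def fisher_interval(theta_hat, psistars, z=1.96):
    """Normal interval computed on Fisher's z scale psi = artanh(theta) and
    mapped back with tanh, so the endpoints always lie in (-1, 1)."""
    psi_hat = math.atanh(theta_hat)
    pbar = sum(psistars) / len(psistars)
    b_psi = pbar - psi_hat                 # bias estimate on the psi scale
    v_psi = sum((p - pbar) ** 2 for p in psistars) / (len(psistars) - 1)
    lo = math.tanh(psi_hat - b_psi - z * v_psi ** 0.5)
    hi = math.tanh(psi_hat - b_psi + z * v_psi ** 0.5)
    return lo, hi

# Synthetic replicates centred near artanh(0.509), for illustration
rng = random.Random(2)
psistars = [rng.gauss(math.atanh(0.509), 0.2) for _ in range(500)]
lo, hi = fisher_interval(0.509, psistars)
```

Unlike intervals formed on the raw scale, the back-transformed endpoints cannot fall outside (−1, 1).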


Confidence intervals

Pivots

◮ Hope properties of θ̂*_1, ..., θ̂*_R mimic effect of sampling from original model.
◮ Amounts to faith in ‘substitution principle’: may replace unknown F with known F̂ — false in general, but often more nearly true for pivots.
◮ Pivot is combination of data and parameter whose distribution is independent of underlying model.
◮ Canonical example: Y1, ..., Yn iid ~ N(µ, σ²). Then
      Z = (Ȳ − µ)/(S²/n)^{1/2} ~ t_{n−1},   for all µ, σ²
  — so independent of the underlying distribution, provided this is normal
◮ Exact pivot generally unavailable in nonparametric case.


Confidence intervals

Studentized statistic ◮ ◮ ◮

Idea: generalize Student t statistic to bootstrap setting Requires variance V for θb computed from y1 , . . . , yn Analogue of Student t statistic: θb − θ V 1/2 If the quantiles zα of Z known, then Z=

◮

θb − θ Pr (zα ≤ Z ≤ z1−α ) = Pr zα ≤ 1/2 ≤ z1−α V

!

= 1 − 2α

(zα no longer denotes a normal quantile!) implies that Pr θb − V 1/2 z1−α ≤ θ ≤ θb − V 1/2 zα = 1 − 2α

so (1 − 2α) confidence interval is (θb − V 1/2 z1−α , θb − V 1/2 zα )


Confidence intervals

◮ Bootstrap sample gives (θ̂*, V*) and hence
      Z* = (θ̂* − θ̂)/V*^{1/2}
◮ R bootstrap copies of (θ̂, V):
      (θ̂*_1, V*_1), (θ̂*_2, V*_2), ..., (θ̂*_R, V*_R)
  and corresponding
      z*_1 = (θ̂*_1 − θ̂)/V*_1^{1/2},  z*_2 = (θ̂*_2 − θ̂)/V*_2^{1/2},  ...,  z*_R = (θ̂*_R − θ̂)/V*_R^{1/2}
◮ Use z*_1, ..., z*_R to estimate distribution of Z — for example, order statistics z*_(1) < ··· < z*_(R) used to estimate quantiles
◮ Get (1 − 2α) confidence interval
      (θ̂ − V^{1/2} z*_{((1−α)(R+1))},  θ̂ − V^{1/2} z*_{(α(R+1))})
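The studentized interval can be sketched for a sample mean (an assumption for concreteness, with V = s²/n; the data are illustrative):

```python
import random
import statistics

def studentized_ci(y, alpha=0.025, R=999, seed=4):
    """Studentized bootstrap interval for a mean:
    (theta_hat - V^(1/2) z*_((1-alpha)(R+1)), theta_hat - V^(1/2) z*_(alpha(R+1)))."""
    rng = random.Random(seed)
    n = len(y)
    theta_hat = statistics.fmean(y)
    V = statistics.variance(y) / n
    zs = []
    for _ in range(R):
        ys = [rng.choice(y) for _ in range(n)]
        ts = statistics.fmean(ys)
        Vs = statistics.variance(ys) / n
        zs.append((ts - theta_hat) / Vs ** 0.5)
    zs.sort()
    hi_idx = int(round((1 - alpha) * (R + 1))) - 1   # order statistic (1-alpha)(R+1)
    lo_idx = int(round(alpha * (R + 1))) - 1         # order statistic alpha(R+1)
    return theta_hat - V ** 0.5 * zs[hi_idx], theta_hat - V ** 0.5 * zs[lo_idx]

# Illustrative sample
y = [9.6, 10.2, 11.1, 8.7, 10.5, 9.9, 10.8, 9.2, 10.1, 10.4,
     9.5, 11.3, 8.9, 10.0, 9.7, 10.6, 9.8, 10.3, 9.4, 10.9]
lo, hi = studentized_ci(y)
```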


Confidence intervals

Why Studentize?

◮ Studentize, so Z → N(0, 1) in distribution as n → ∞. Edgeworth series:
      Pr(Z ≤ z | F) = Φ(z) + n^{−1/2} a(z)φ(z) + O(n⁻¹);
  a(·) even quadratic polynomial.
◮ Corresponding expansion for Z* is
      Pr(Z* ≤ z | F̂) = Φ(z) + n^{−1/2} â(z)φ(z) + O_p(n⁻¹).
◮ Typically â(z) = a(z) + O_p(n^{−1/2}), so
      Pr(Z* ≤ z | F̂) − Pr(Z ≤ z | F) = O_p(n⁻¹).


Confidence intervals

◮ If don't studentize, Z = (θ̂ − θ) → N(0, ν) in distribution. Then
      Pr(Z ≤ z | F) = Φ(z/ν^{1/2}) + n^{−1/2} a′(z/ν^{1/2}) φ(z/ν^{1/2}) + O(n⁻¹)
  and
      Pr(Z* ≤ z | F̂) = Φ(z/ν̂^{1/2}) + n^{−1/2} â′(z/ν̂^{1/2}) φ(z/ν̂^{1/2}) + O_p(n⁻¹).
  Typically ν̂ = ν + O_p(n^{−1/2}), giving
      Pr(Z* ≤ z | F̂) − Pr(Z ≤ z | F) = O_p(n^{−1/2}).
◮ Thus use of Studentized Z reduces error from O_p(n^{−1/2}) to O_p(n⁻¹) — better than using large-sample asymptotics, for which error is usually O_p(n^{−1/2}).


Confidence intervals

Other confidence intervals

◮ Problem for studentized intervals: must obtain V; intervals not scale-invariant
◮ Simpler approaches:
  • Basic bootstrap interval: treat θ̂ − θ as pivot, get
        (θ̂ − (θ̂*_{((R+1)(1−α))} − θ̂),  θ̂ − (θ̂*_{((R+1)α)} − θ̂)).
  • Percentile interval: use empirical quantiles of θ̂*_1, ..., θ̂*_R:
        (θ̂*_{((R+1)α)},  θ̂*_{((R+1)(1−α))}).
◮ Improved percentile intervals (BCa, ABC, ...)
  • Replace percentile interval with
        (θ̂*_{((R+1)α′)},  θ̂*_{((R+1)(1−α″))}),
    where α′, α″ chosen to improve properties.
  • Scale-invariant.
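The basic and percentile constructions can be sketched together (plain Python; the replicates below are synthetic, for illustration):

```python
import random

def basic_and_percentile(theta_hat, thetastars, alpha=0.025):
    """Basic interval (theta_hat - theta as pivot) and percentile interval
    from the ordered replicates theta*_(1) <= ... <= theta*_(R)."""
    ts = sorted(thetastars)
    R = len(ts)
    lo_q = ts[int(round(alpha * (R + 1))) - 1]          # theta*_((R+1)alpha)
    hi_q = ts[int(round((1 - alpha) * (R + 1))) - 1]    # theta*_((R+1)(1-alpha))
    basic = (2 * theta_hat - hi_q, 2 * theta_hat - lo_q)
    percentile = (lo_q, hi_q)
    return basic, percentile

# Synthetic replicates centred at 0.5
rng = random.Random(6)
reps = [rng.gauss(0.5, 0.1) for _ in range(999)]
basic, percentile = basic_and_percentile(0.5, reps)
```

The two intervals always have the same length; the basic interval reflects the quantiles about θ̂, which is why they differ when the replicates are skewed.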


Confidence intervals

Handedness data

◮ 95% confidence intervals for correlation coefficient θ, R = 10,000:

      Normal                             0.147   0.963
      Percentile                        −0.047   0.758
      Basic                              0.262   1.043
      BCa (α′ = 0.0485, α″ = 0.0085)     0.053   0.792
      Student                            0.030   1.206
      Basic (transformed)                0.131   0.824
      Student (transformed)             −0.277   0.868

◮ Transformation is essential here!


Confidence intervals

General comparison

◮ Bootstrap confidence intervals usually too short — leads to under-coverage
◮ Normal and basic intervals depend on scale.
◮ Percentile interval often too short but is scale-invariant.
◮ Studentized intervals give best coverage overall, but
  • depend on scale, can be sensitive to V
  • length can be very variable
  • best on transformed scale, where V is approximately constant
◮ Improved percentile intervals have same error in principle as studentized intervals, but often shorter — so lower coverage

Anthony Davison: Bootstrap Methods and their Application,

36

Confidence intervals

Caution
◮ Edgeworth theory OK for smooth statistics — beware rough statistics: must check output.
◮ Bootstrap of the median is theoretically OK, but very sensitive to sample values in practice.
◮ Role for smoothing?

[Figure: normal Q–Q plot of T* − t for medians against quantiles of the standard normal.]

Confidence intervals

Key points

◮

Numerous procedures available for ‘automatic’ construction of confidence intervals

◮

Computer does the work

◮

Need R ≥ 1000 in most cases

◮

Generally such intervals are a bit too short

◮

Must examine output to check if assumptions (e.g. smoothness of statistic) satisfied

◮

May need variance estimate V — see later


Several samples

Motivation

Basic notions

Confidence intervals

Several samples

Variance estimation

Tests

Regression


Several samples

Gravity data

Table: Measurements of the acceleration due to gravity, g, given as deviations from 980 000 × 10^{-3} cm/s², in units of 10^{-3} cm/s².

  Series 1:  76  82  83  54  35  46  87  68
  Series 2:  87  95  98  100  109  109  100  81  75  68  67
  Series 3:  105  83  76  75  51  76  93  75  62
  Series 4:  95  90  76  76  87  79  77  71
  Series 5:  76  76  78  79  72  68  75  78
  Series 6:  78  78  78  86  87  81  73  67  75  82  83
  Series 7:  82  79  81  79  77  79  79  78  79  82  76  73  64
  Series 8:  84  86  85  82  77  76  77  80  83  81  78  78  78

Several samples

Gravity data

Figure: Gravity series boxplots, showing a reduction in variance, a shift in location, and possible outliers.

Several samples

Gravity data ◮

Eight series of measurements of gravitational acceleration g made May 1934 – July 1935 in Washington DC

◮

Data are deviations from 9.8 m/s² in units of 10^{-3} cm/s²

◮

Goal: Estimate g and provide confidence interval

◮ Weighted combination of series averages and its variance estimate:
θ̂ = Σ_{i=1}^8 (ȳ_i n_i/s_i²) / Σ_{i=1}^8 (n_i/s_i²),  V = (Σ_{i=1}^8 n_i/s_i²)^{-1},
giving
θ̂ = 78.54,  V = 0.59²,
and 95% confidence interval θ̂ ± 1.96 V^{1/2} = (77.5, 79.8)


Several samples

Gravity data: Bootstrap
◮ Apply stratified (re)sampling to the series, taking each series as a separate stratum; compute θ̂*, V* for the simulated data.
◮ Confidence interval based on
Z* = (θ̂* − θ̂)/V*^{1/2},
whose distribution is approximated by the simulated values
z*_1 = (θ̂*_1 − θ̂)/V_1*^{1/2}, …, z*_R = (θ̂*_R − θ̂)/V_R*^{1/2},
giving
(θ̂ − V^{1/2} z*_{((R+1)(1−α))}, θ̂ − V^{1/2} z*_{((R+1)α)}).
◮ For 95% limits set α = 0.025, so with R = 999 use z*_{(25)}, z*_{(975)}, giving interval (77.1, 80.3).


Several samples

Figure: Summary plots for 999 nonparametric bootstrap simulations. Top: normal probability plots of t* and z* = (t* − t)/v*^{1/2}. The line on the top left has intercept t and slope v^{1/2}; the line on the top right has intercept zero and unit slope. Bottom: the smallest t*_r also has the smallest v*, leading to an outlying value of z*.

Several samples

Key points

◮

For several independent samples, implement bootstrap by stratified sampling independently from each

◮

Same basic ideas apply for confidence intervals


Variance estimation

Motivation

Basic notions

Confidence intervals

Several samples

Variance estimation

Tests

Regression


Variance estimation

Variance estimation

◮

◮

Variance estimate V needed for certain types of confidence interval (esp. studentized bootstrap) Ways to compute this: • • • •

double bootstrap delta method nonparametric delta method jackknife


Variance estimation

Double bootstrap

◮ Bootstrap sample y*_1, …, y*_n and corresponding estimate θ̂*.
◮ Take Q second-level bootstrap samples y**_1, …, y**_n from y*_1, …, y*_n, giving bootstrap estimates θ̂**_1, …, θ̂**_Q.
◮ Compute the variance estimate V as the sample variance of θ̂**_1, …, θ̂**_Q.
◮ Requires R(Q + 1) resamples in total, so can be expensive.
◮ Often reasonable to take Q = 50 for variance estimation, so need O(50 × 1000) resamples — nowadays not infeasible.


Variance estimation

Delta method
◮ Computation of variance formulae for functions of averages and other estimators.
◮ Suppose ψ̂ = g(θ̂) estimates ψ = g(θ), and θ̂ is approximately N(θ, σ²/n).
◮ Then provided g′(θ) ≠ 0 (D2),
E(ψ̂) = g(θ) + O(n^{-1}),  var(ψ̂) = σ² g′(θ)²/n + O(n^{-3/2}).
◮ Then var(ψ̂) ≈ σ̂² g′(θ̂)²/n = V.
◮ Example (D3): θ̂ = Ȳ, ψ̂ = log θ̂.
◮ Variance stabilisation (D4): if var(θ̂) = S(θ)²/n, find a transformation h such that var{h(θ̂)} is approximately constant.
◮ Extends to multivariate estimators, and to ψ̂ = g(θ̂_1, …, θ̂_d).



Variance estimation

Nonparametric delta method
◮ Write the parameter θ = t(F) as a functional of the distribution F.
◮ General approximation:
V = V_L = (1/n²) Σ_{j=1}^n L(Y_j; F)².
◮ L(y; F) is the influence function value of θ for an observation at y when the distribution is F:
L(y; F) = lim_{ε→0} [t{(1 − ε)F + εH_y} − t(F)]/ε,
where H_y puts unit mass at y. Close link to robustness.
◮ Empirical versions of L(y; F) and V_L are
l_j = L(y_j; F̂),  v_L = n^{-2} Σ l_j²,
usually obtained by analytic or numerical differentiation.


Variance estimation

Computation of lj
◮ Write θ̂ in weighted form and differentiate with respect to ε.
◮ Sample average:
θ̂ = ȳ = (1/n) Σ y_j = Σ w_j y_j,  w_j ≡ 1/n.
◮ Change the weights:
w_j ↦ ε + (1 − ε)/n,  w_i ↦ (1 − ε)/n, i ≠ j,
so (D5)
ȳ ↦ ȳ_ε = ε y_j + (1 − ε)ȳ = ε(y_j − ȳ) + ȳ,
giving l_j = y_j − ȳ and v_L = (1/n²) Σ (y_j − ȳ)² = (n − 1)s²/n².
◮ Interpretation: l_j is the standardized change in ȳ when the mass on y_j is increased.

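The weighted-form recipe can be checked numerically: for the sample average the influence values should come out as l_j = y_j − ȳ exactly. A small Python sketch (illustrative; the data are made up):

```python
def influence_numeric(y, stat_w, eps=1e-6):
    """Empirical influence values l_j by numerically differentiating the
    weighted form of the statistic at the uniform weights w_j = 1/n."""
    n = len(y)
    base = [1.0 / n] * n
    t0 = stat_w(y, base)
    l = []
    for j in range(n):
        w = [(1 - eps) / n] * n
        w[j] = eps + (1 - eps) / n        # extra mass eps on y_j
        l.append((stat_w(y, w) - t0) / eps)
    return l

def wmean(y, w):
    """Sample average in weighted form, theta = sum_j w_j y_j."""
    return sum(wi * yi for wi, yi in zip(w, y))

y = [1.0, 4.0, 2.0, 7.0, 6.0]
l = influence_numeric(y, wmean)
v_L = sum(lj * lj for lj in l) / len(y) ** 2
```

The same `influence_numeric` applies to any statistic written in weighted form, which is the point of the slide.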

Variance estimation

Nonparametric delta method: Ratio
◮ Population F(u, x) with y = (u, x) and
θ = t(F) = ∫ x dF(u, x) / ∫ u dF(u, x);
the sample version is
θ̂ = t(F̂) = ∫ x dF̂(u, x) / ∫ u dF̂(u, x) = x̄/ū.
◮ Then, using the chain rule of differentiation (D6),
l_j = (x_j − θ̂ u_j)/ū,
giving
v_L = (1/n²) Σ_j {(x_j − θ̂ u_j)/ū}².

Variance estimation

Handedness data: Correlation coefficient
◮ The correlation coefficient may be written as a function of averages such as \overline{xu} = n^{-1} Σ x_j u_j:
θ̂ = (\overline{xu} − x̄ū) / {(\overline{x²} − x̄²)(\overline{u²} − ū²)}^{1/2},
from which the empirical influence values l_j can be derived.
◮ In this example (and for others involving only averages), the nonparametric delta method is equivalent to the delta method.
◮ Get v_L = 0.029, for comparison with v = 0.043 obtained by bootstrapping.
◮ v_L typically underestimates var(θ̂) — as here!


Variance estimation

Delta methods: Comments
◮ Can be applied to many complex statistics.
◮ Delta method variances often underestimate the true variances: v_L < var(θ̂).
◮ Can be applied automatically (numerical differentiation) if the algorithm for θ̂ is written in weighted form, e.g.
x̄_w = Σ w_j x_j,  w_j ≡ 1/n for x̄,
varying the weights successively for j = 1, …, n: set w_j ↦ w_j + ε with the weights rescaled so that Σ w_i = 1, for ε = 1/(100n), and use the definition of the derivative.


Variance estimation

Jackknife
◮ Approximation to the empirical influence values:
l_j ≈ l_{jack,j} = (n − 1)(θ̂ − θ̂_{−j}),
where θ̂_{−j} is the value of θ̂ computed from the sample y_1, …, y_{j−1}, y_{j+1}, …, y_n.
◮ Jackknife bias and variance estimates:
b_jack = −(1/n) Σ l_{jack,j},  v_jack = {Σ l_{jack,j}² − n b_jack²} / {n(n − 1)}.
◮ Requires n + 1 calculations of θ̂.
◮ Corresponds to numerical differentiation of θ̂ with ε = −1/(n − 1).

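The jackknife formulae are short enough to spell out; for the sample mean they reproduce the exact bias (zero) and variance s²/n. An illustrative Python sketch (the data are made up):

```python
def jackknife(y, stat):
    """Jackknife bias and variance estimates via the leave-one-out
    influence approximations l_jack,j = (n - 1)(theta - theta_{-j})."""
    n = len(y)
    theta = stat(y)
    l = [(n - 1) * (theta - stat(y[:j] + y[j + 1:])) for j in range(n)]
    b = -sum(l) / n
    v = (sum(lj * lj for lj in l) - n * b * b) / (n * (n - 1))
    return b, v

y = [2.0, 5.0, 1.0, 8.0, 4.0, 6.0]
mean = lambda s: sum(s) / len(s)
b_jack, v_jack = jackknife(y, mean)
```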

Variance estimation

Key points

◮

Several methods available for estimation of variances

◮

Needed for some types of confidence interval

◮

Most general method is double bootstrap: can be expensive

◮

Delta methods rely on linear expansion, can be applied numerically or analytically

◮

Jackknife gives approximation to delta method, can fail for rough statistics


Tests

Motivation

Basic notions

Confidence intervals

Several samples

Variance estimation

Tests

Regression


Tests

Ingredients

◮

Ingredients for testing problems: • data y1 , . . . , yn ; • model M0 to be tested; • test statistic t = t(y1 , . . . , yn ), with large values giving evidence against M0 , and observed value tobs

◮ P-value p_obs = Pr(T ≥ t_obs | M0) measures evidence against M0 — the smaller p_obs, the stronger the evidence.
◮ Difficulties:
• p_obs may depend upon 'nuisance' parameters, those of M0;
• p_obs is often hard to calculate.


Tests

Examples
◮ Balsam-fir seedlings in 50 quadrats — a Poisson sample?

  0 1 2 3 4 3 4 2 2 1
  0 2 0 2 4 2 3 3 4 2
  1 1 1 1 4 1 5 2 2 3
  4 1 2 5 2 0 3 2 1 1
  3 1 4 3 1 0 0 2 7 0

◮ Two-way layout: row–column independence?

  1 2 2 1 1 0 1
  2 0 0 2 3 0 0
  0 1 1 1 2 7 3
  1 1 2 0 0 0 1
  0 1 1 1 1 0 0

Tests

Estimation of pobs
◮ Estimate p_obs by simulation from the fitted null hypothesis model M̂_0.
◮ Algorithm: for r = 1, …, R:
• simulate data set y*_1, …, y*_n from M̂_0;
• calculate the test statistic t*_r from y*_1, …, y*_n.
◮ Calculate the simulation estimate
p̂ = (1 + #{t*_r ≥ t_obs}) / (1 + R)
of p̂_obs = Pr(T ≥ t_obs | M̂_0).
◮ Simulation and statistical errors: p̂ ≈ p̂_obs ≈ p_obs.


Tests

Handedness data: Test of independence
◮ Are dnan and hand positively associated?
◮ Take T = θ̂ (correlation coefficient), with t_obs = 0.509; this is large in the case of positive association (one-sided test).
◮ Null hypothesis of independence: F(u, x) = F1(u)F2(x).
◮ Take bootstrap samples independently from F̂1 ≡ (dnan_1, …, dnan_n) and from F̂2 ≡ (hand_1, …, hand_n), then put them together to get bootstrap data (dnan*_1, hand*_1), …, (dnan*_n, hand*_n).
◮ With R = 9,999 get 18 values of θ̂* ≥ θ̂, so
p̂ = (1 + 18)/(1 + 9999) = 0.0019:
hand and dnan seem to be positively associated.
◮ To test positive or negative association (two-sided test), take T = |θ̂|: gives p̂ = 0.004.


Tests

Handedness data: Bootstrap from M̂_0

[Figure: left, scatterplot of one bootstrap sample of (dnan, hand) pairs generated under independence; right, null distribution of the test statistic.]

Tests

Choice of R
◮ Take R big enough to get a small standard error for p̂, typically ≥ 100, using a binomial calculation:
var(p̂) = var[(1 + #{t*_r ≥ t_obs})/(1 + R)] ≈ R p_obs(1 − p_obs)/R² = p_obs(1 − p_obs)/R,
so if p_obs ≈ 0.05 we need R ≥ 1900 for 10% relative error (D7).
◮ Can choose R sequentially: e.g. if p̂ = 0.06 and R = 99, can augment R enough to diminish the standard error.
◮ Taking R too small lowers the power of the test.


Tests

Duality with confidence interval ◮

Often unclear how to impose null hypothesis on sampling scheme

◮

General approach based on duality between confidence interval I1−α = (θα , ∞) and test of null hypothesis θ = θ0

◮

Reject null hypothesis at level α in favour of alternative θ > θ0 , if θ0 < θα

◮

◮ Handedness data: θ0 = 0 ∉ I_{0.95}, but θ0 = 0 ∈ I_{0.99}, so estimated significance level 0.01 < p̂ < 0.05: weaker evidence than before.
◮ Extends to tests of θ = θ0 against other alternatives:
• if θ0 ∉ I^{1−α} = (−∞, θ̄_α), have evidence that θ < θ0;
• if θ0 ∉ I_{1−2α} = (θ_α, θ̄_α), have evidence that θ ≠ θ0.


Tests

Pivot tests
◮ Equivalent to use of confidence intervals.
◮ Idea: use an (approximate) pivot such as Z = (θ̂ − θ)/V^{1/2} as the statistic to test θ = θ0.
◮ Observed value of the pivot is z_obs = (θ̂ − θ0)/V^{1/2}.
◮ Significance level is
Pr{(θ̂ − θ)/V^{1/2} ≥ z_obs | M0} = Pr(Z ≥ z_obs | F) ≈ Pr(Z ≥ z_obs | F̂).
◮ Compare the observed z_obs with the simulated distribution of Z* = (θ̂* − θ̂)/V*^{1/2}, without needing to construct a null hypothesis model M̂_0.
◮ Use of an (approximate) pivot is essential for success.


Tests

Example: Handedness data
◮ Test zero correlation (θ0 = 0), not independence; θ̂ = 0.509, V = 0.170²:
z_obs = (θ̂ − θ0)/V^{1/2} = (0.509 − 0)/0.170 = 2.99.
◮ Observed significance level is
p̂ = (1 + #{z*_r ≥ z_obs})/(1 + R) = (1 + 215)/(1 + 9999) = 0.0216.

[Figure: density of the simulated pivot values z*, with the observed value z_obs in the upper tail.]

Tests

Exact tests
◮ Problem: the bootstrap estimate is
p̂_obs = Pr(T ≥ t_obs | M̂_0) ≠ Pr(T ≥ t_obs | M_0) = p_obs,
so it estimates the wrong thing.
◮

In some cases can eliminate parameters from null hypothesis distribution by conditioning on sufficient statistic

◮

Then simulate from conditional distribution

◮

More generally, can use Metropolis–Hastings algorithm to simulate from conditional distribution (below)


Tests

Example: Fir data
◮ Data Y_1, …, Y_n iid ∼ Pois(λ), with λ unknown.
◮ Poisson model has E(Y) = var(Y) = λ: base a test of overdispersion on
T = Σ (Y_j − Ȳ)²/Ȳ ≈ χ²_{n−1};
the observed value is t_obs = 55.15.
◮ Unconditional significance level: Pr(T ≥ t_obs | M̂_0, λ).
◮ Condition on the value w of the sufficient statistic W = Σ Y_j:
p_obs = Pr(T ≥ t_obs | M̂_0, W = w),
independent of λ, owing to the sufficiency of W.
◮ Exact test: simulate from the multinomial distribution of Y_1, …, Y_n given W = Σ Y_j = 107.


Tests

Example: Fir data

Figure: Simulation results for the dispersion test. Left panel: R = 999 values of the dispersion statistic t* obtained under multinomial sampling; the data value is t_obs = 55.15 and p̂ = 0.25. Right panel: chi-squared plot of the ordered values of t*; the dotted line shows the χ²_49 approximation to the null conditional distribution.

Tests

Handedness data: Permutation test
◮ Are dnan and hand related? Take T = θ̂ (correlation coefficient) again.
◮ Impose the null hypothesis of independence, F(u, x) = F1(u)F2(x), but condition so that the marginal distributions F̂1 and F̂2 are held fixed under the resampling plan — a permutation test.
◮ Take resamples of the form
(dnan_1, hand_{1*}), …, (dnan_n, hand_{n*}),
where (1*, …, n*) is a random permutation of (1, …, n).
◮ Doing this with R = 9,999 gives one- and two-sided significance probabilities of 0.002, 0.003.
◮ Typically values of p̂ are very similar to those for the corresponding bootstrap test.


Tests

Handedness data: Permutation resample

[Figure: left, scatterplot of one permutation resample of (dnan, hand); right, permutation distribution of the test statistic.]

Tests

Contingency table

  1 2 2 1 1 0 1
  2 0 0 2 3 0 0
  0 1 1 1 2 7 3
  1 1 2 0 0 0 1
  0 1 1 1 1 0 0

◮ Are the row and column classifications independent:
Pr(row i, column j) = Pr(row i) × Pr(column j)?
◮ Standard test statistic for independence is
T = Σ_{i,j} (y_ij − ŷ_ij)²/ŷ_ij,  ŷ_ij = y_{i·} y_{·j} / y_{··}.
◮ Get Pr(χ²_{24} ≥ 38.53) = 0.048, but is T ∼ χ²_{24}?


Tests

Exact tests: Contingency table ◮

For exact test, need to simulate distribution of T conditional on sufficient statistics — row and column totals

◮ Algorithm (D8) for conditional simulation:
1. choose two rows j1 < j2 and two columns k1 < k2 at random;
2. generate new values from the hypergeometric distribution of y_{j1k1} conditional on the margins of the 2 × 2 table
   y_{j1k1}  y_{j1k2}
   y_{j2k1}  y_{j2k2}
3. compute the test statistic T* every I = 100 iterations, say.
◮ Compare the observed value t_obs = 38.53 with the simulated T* — get p̂ ≈ 0.08.


Tests

Key points

◮ ◮

Tests can be performed using resampling/simulation Must take account of null hypothesis, by • modifying sampling scheme to satisfy null hypothesis • inverting confidence interval (pivot test)

◮

Can use Monte Carlo simulation to get approximations to exact tests — simulate from null distribution of data, conditional on observed value of sufficient statistic

◮

Sometimes obtain permutation tests — very similar to bootstrap tests


Regression

Motivation

Basic notions

Confidence intervals

Several samples

Variance estimation

Tests

Regression


Regression

Linear regression
◮ Independent data (x1, y1), …, (xn, yn) with
y_j = x_j^T β + ε_j,  ε_j ∼ (0, σ²).
◮ Least squares estimates β̂, leverages h_j, residuals
e_j = (y_j − x_j^T β̂)/(1 − h_j)^{1/2} ∼ (0, σ²), approximately.
◮ Design matrix X is an experimental ancillary — should be held fixed if possible, since
var(β̂) = σ²(X^T X)^{-1}
if the model y = Xβ + ε is correct.


Regression

Linear regression: Resampling schemes
◮ Two main resampling schemes.
◮ Model-based resampling:
y*_j = x_j^T β̂ + ε*_j,  ε*_j ∼ EDF(e_1 − ē, …, e_n − ē).
• Fixes the design, but not robust to model failure.
• Assumes the ε_j are sampled from a population.
◮ Case resampling:
(x_j, y_j)* ∼ EDF{(x_1, y_1), …, (x_n, y_n)}.
• Varies the design X, but robust.
• Assumes the (x_j, y_j) are sampled from a population.
• Usually design variation is no problem; can prove awkward in designed experiments and when the design is sensitive.


Regression

Cement data

Table: Cement data: y is the heat (calories per gram of cement) evolved while samples of cement set. The covariates are percentages by weight of four constituents: tricalcium aluminate x1, tricalcium silicate x2, tetracalcium alumino-ferrite x3 and dicalcium silicate x4.

        x1   x2   x3   x4       y
   1     7   26    6   60    78.5
   2     1   29   15   52    74.3
   3    11   56    8   20   104.3
   4    11   31    8   47    87.6
   5     7   52    6   33    95.9
   6    11   55    9   22   109.2
   7     3   71   17    6   102.7
   8     1   31   22   44    72.5
   9     2   54   18   22    93.1
  10    21   47    4   26   115.9
  11     1   40   23   34    83.8
  12    11   66    9   12   113.3
  13    10   68    8   12   109.4

Regression

Cement data
◮ Fit the linear model y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ε and apply case resampling.
◮ Covariates are compositional: x1 + · · · + x4 ≈ 100%, so X is almost collinear — the smallest eigenvalue of X^T X is l_5 = 0.0012.
◮ Plot of β̂*_1 against the smallest eigenvalue of X*^T X* reveals that var*(β̂*_1) is strongly variable.
◮ Relevant subset for case resampling — post-stratification of the output based on l*_5?


Regression

Cement data

[Figure: bootstrap estimates β̂*_1 and β̂*_2 plotted against the smallest eigenvalue of X*^T X*, showing much greater variability when the resampled design is nearly collinear.]

Regression

Cement data

Table: Standard errors of linear regression coefficients for the cement data. Normal theory and model-based resampling assume homoscedasticity. Resampling results use R = 999 samples; the last row is based only on those samples with the largest 800 values of l*_5.

                                        β̂0     β̂1     β̂2
  Normal theory                         70.1   0.74   0.72
  Model-based resampling, R = 999       66.3   0.70   0.69
  Case resampling, all R = 999         108.5   1.13   1.12
  Case resampling, largest 800          67.3   0.77   0.69

Regression

Survival data

  dose x    survival % y
  117.5     44.000  55.000
  235.0     16.000  13.000
  470.0      4.000   1.960   6.120
  705.0      0.500   0.320
  940.0      0.110   0.015   0.019
  1410       0.700   0.006

◮ Data on survival % for rats at different doses.
◮ Linear model: log(survival) = β0 + β1 dose.

[Figure: left, survival % against dose; right, log survival % against dose, roughly linear apart from the observation 0.700 at dose 1410.]

Regression

Survival data
◮ Case resampling.
◮ Replication of the outlier: none (0), once (1), two or more (•).
◮ Model-based sampling including the residual would lead to a change in intercept but not slope.

[Figure: estimated slope against residual sum of squares for the case resamples, coded by how often the outlier is replicated.]

Regression

Generalized linear model
◮ Response may be binomial, Poisson, gamma, normal, …:
y_j ∼ mean µ_j, variance φV(µ_j),
where g(µ_j) = x_j^T β is the linear predictor and g(·) is the link function.
◮ MLE β̂, fitted values µ̂_j, Pearson residuals
r_{Pj} = (y_j − µ̂_j)/{V(µ̂_j)(1 − h_j)}^{1/2} ∼ (0, φ), approximately.
◮ Bootstrapped responses:
y*_j = µ̂_j + V(µ̂_j)^{1/2} ε*_j,
where ε*_j ∼ EDF(r_{P1} − r̄_P, …, r_{Pn} − r̄_P). However:
• possibly y*_j ∉ {0, 1, 2, …};
• the r_{Pj} are not exchangeable, so stratified resampling may be needed.

where ε∗j ∼ EDF(rP 1 − rP , . . . , rP n − rP ). However • possible that yj∗ 6∈ {0, 1, 2, . . . , } • rP j not exchangeable, so may need stratified resampling Anthony Davison: Bootstrap Methods and their Application,

85

Regression

AIDS data
◮ Log-linear model: the number of reports in row j and column k follows a Poisson distribution with mean µ_jk = exp(α_j + β_k).
◮ Log link function g(µ_jk) = log µ_jk = α_j + β_k and variance function var(Y_jk) = φ V(µ_jk) = 1 × µ_jk.
◮ Pearson residuals:
r_jk = (Y_jk − µ̂_jk)/{µ̂_jk(1 − h_jk)}^{1/2}.
◮ Model-based simulation:
Y*_jk = µ̂_jk + µ̂_jk^{1/2} ε*_jk.


Regression

                         Reporting-delay interval (quarters)                 Total reports
  Year  Quarter    0†    1    2    3    4    5    6  ···  ≥14                to end of 1992
  1988  1          31   80   16    9    3    2    8  ···    6                     174
        2          26   99   27    9    8   11    3  ···    3                     211
        3          31   95   35   13   18    4    6  ···    3                     224
        4          36   77   20   26   11    3    8  ···    2                     205
  1989  1          32   92   32   10   12   19   12  ···    2                     224
        2          15   92   14   27   22   21   12  ···    1                     219
        3          34  104   29   31   18    8    6  ···                          253
        4          38  101   34   18    9   15    6  ···                          233
  1990  1          31  124   47   24   11   15    8  ···                          281
        2          32  132   36   10    9    7    6  ···                          245
        3          49  107   51   17   15    8    9  ···                          260
        4          44  153   41   16   11    6    5  ···                          285
  1991  1          41  137   29   33    7   11    6  ···                          271
        2          56  124   39   14   12    7   10  ···                          263
        3          53  175   35   17   13   11                                    306
        4          63  135   24   23   12                                         258
  1992  1          71  161   48   25                                              310
        2          95  178   39                                                   318
        3          76  181                                                        273
        4          67                                                             133

Regression

AIDS data
◮ Poisson two-way model deviance is 716.5 on 413 df — indicates strong overdispersion: φ > 1, so the Poisson model is implausible.
◮ Residuals highly inhomogeneous — exchangeability doubtful.

[Figure: left, quarterly diagnoses with fitted values, 1984–1992; right, Pearson residuals rP plotted against skewness, showing the inhomogeneity.]

Regression

AIDS data: Prediction intervals
◮ To estimate prediction error:
• simulate a complete table y*_jk;
• estimate parameters from the incomplete y*_jk;
• get estimated row totals and 'truth'
µ̂*_{+,j} = e^{α̂*_j} Σ_{k unobs} e^{β̂*_k},  y*_{+,j} = Σ_{k unobs} y*_jk;
• prediction error
(y*_{+,j} − µ̂*_{+,j}) / µ̂*_{+,j}^{1/2},
studentized so more nearly pivotal.
◮ Form prediction intervals from the R replicates.


Regression

AIDS data: Resampling plans
◮ Resampling schemes:
• parametric simulation, fitted Poisson model;
• parametric simulation, fitted negative binomial model;
• nonparametric resampling of rP;
• stratified nonparametric resampling of rP.
◮ Stratification based on the skewness of the residuals, equivalent to stratifying the original data by the values of the fitted means.
◮ Take strata for which µ̂_jk < 1, 1 ≤ µ̂_jk < 2, µ̂_jk ≥ 2.

Regression

AIDS data: Results
◮ Deviance/df ratios for the sampling schemes, R = 999.
◮ Poisson variation inadequate.
◮ 95% prediction limits.

[Figure: histograms of deviance/df under the four schemes (Poisson, negative binomial, nonparametric, stratified nonparametric), and quarterly diagnoses for 1989–1992 with 95% prediction limits.]

Regression

AIDS data: Semiparametric model
◮ More realistic: generalized additive model
µ_jk = exp{α(j) + β_k},
where α(j) is a locally-fitted smooth.
◮ Same resampling plans as before.
◮ 95% intervals now generally narrower and shifted upwards.

[Figure: quarterly diagnoses 1984–1992, and 95% prediction limits for 1989–1992 under the semiparametric model.]

Regression

Key points

◮ ◮

Key assumption: independence of cases Two main resampling schemes for regression settings: • Model-based • Case resampling

◮

Intermediate schemes possible

◮

Can help to reduce dependence on assumptions needed for regression model

◮

These two basic approaches also used for more complex settings (time series, . . .), where data are dependent


Regression

Summary ◮ ◮

Bootstrap: simulation methods for frequentist inference. Useful when • standard assumptions invalid (n small, data not normal, . . .); • standard problem has non-standard twist; • complex problem has no (reliable) theory; • or (almost) anywhere else.

◮

Have described • • • •

basic ideas confidence intervals tests some approaches for regression


Regression

Books ◮

Chernick (1999) Bootstrap Methods: A Practicioner’s Guide. Wiley

◮

Davison and Hinkley (1997) Bootstrap Methods and their Application. Cambridge University Press

◮

Efron and Tibshirani (1993) An Introduction to the Bootstrap. Chapman & Hall

◮

Hall (1992) The Bootstrap and Edgeworth Expansion. Springer

◮

Lunneborg (2000) Data Analysis by Resampling: Concepts and Applications. Duxbury Press

◮

Manly (1997) Randomisation, Bootstrap and Monte Carlo Methods in Biology. Chapman & Hall

◮

Shao and Tu (1995) The Jackknife and Bootstrap. Springer
