estimator complex. â» data complex. â» sample size small. â» model non-standard. Anthony Davison: Bootstrap Methods and their Application,. 7 ...
Bootstrap Methods and their Application Anthony Davison c
2006 A short course based on the book ‘Bootstrap Methods and their Application’, by A. C. Davison and D. V. Hinkley c
Cambridge University Press, 1997
Anthony Davison: Bootstrap Methods and their Application,
1
Summary ◮ ◮
Bootstrap: simulation methods for frequentist inference. Useful when • standard assumptions invalid (n small, data not normal, . . .); • standard problem has non-standard twist; • complex problem has no (reliable) theory; • or (almost) anywhere else.
◮
Aim to describe • • • •
basic ideas; confidence intervals; tests; some approaches for regression
Anthony Davison: Bootstrap Methods and their Application,
2
Motivation
Motivation
Basic notions
Confidence intervals
Several samples
Variance estimation
Tests
Regression
Anthony Davison: Bootstrap Methods and their Application,
3
Motivation
AIDS data ◮ ◮ ◮
◮
◮
UK AIDS diagnoses 1988–1992. Reporting delay up to several years! Problem: predict state of epidemic at end 1992, with realistic statement of uncertainty. Simple model: number of reports in row j and column k Poisson, mean µjk = exp(αj + βk ). Unreported diagnoses in period j Poisson, mean X X µjk = exp(αj ) exp(βk ). k unobs
◮
k unobs
Estimate total unreported diagnoses from period j by replacing αj and βk by MLEs. • How reliable are these predictions? • How sensible is the Poisson model?
Anthony Davison: Bootstrap Methods and their Application,
4
Motivation
Diagnosis period Year 1988
1989
1990
1991
1992
Quarter 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
Reporting-delay interval (quarters): 0† 31 26 31 36 32 15 34 38 31 32 49 44 41 56 53 63 71 95 76 67
1 80 99 95 77 92 92 104 101 124 132 107 153 137 124 175 135 161 178 181
2 16 27 35 20 32 14 29 34 47 36 51 41 29 39 35 24 48 39
3 9 9 13 26 10 27 31 18 24 10 17 16 33 14 17 23 25
4 3 8 18 11 12 22 18 9 11 9 15 11 7 12 13 12
5 2 11 4 3 19 21 8 15 15 7 8 6 11 7 11
Anthony Davison: Bootstrap Methods and their Application,
6 8 3 6 8 12 12 6 6 8 6 9 5 6 10
··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ···
≥14 6 3 3 2 2 1
Total reports to end of 1992 174 211 224 205 224 219 253 233 281 245 260 285 271 263 306 258 310 318 273 133
5
Motivation
AIDS data ◮
◮
300
+ ++ + ++ + + + + + + + + + + + + ++ + ++ + +++
200 0
100
Diagnoses
400
500
◮
Data (+), fits of simple model (solid), complex model (dots) Variance formulae could be derived — painful! but useful? Effects of overdispersion, complex model, . . .?
+ ++ + ++ + ++++ 1984
1986
1988
1990
1992
Time
Anthony Davison: Bootstrap Methods and their Application,
6
Motivation
Goal
Find reliable assessment of uncertainty when ◮
estimator complex
◮
data complex
◮
sample size small
◮
model non-standard
Anthony Davison: Bootstrap Methods and their Application,
7
Basic notions
Motivation
Basic notions
Confidence intervals
Several samples
Variance estimation
Tests
Regression
Anthony Davison: Bootstrap Methods and their Application,
8
Basic notions
Handedness data Table: Data from a study of handedness; hand is an integer measure of handedness, and dnan a genetic measure. Data due to Dr Gordon Claridge, University of Oxford.
1 2 3 4 5 6 7 8 9 10
dnan 13 18 20 21 21 24 24 27 28 28
hand 1 1 3 1 1 1 1 1 1 2
11 12 13 14 15 16 17 18 19 20
dnan 28 28 28 28 28 28 29 29 29 29
hand 1 2 1 4 1 1 1 1 1 2
21 22 23 24 25 26 27 28 29 30
dnan 29 29 29 30 30 30 30 31 31 31
Anthony Davison: Bootstrap Methods and their Application,
hand 2 1 1 1 1 2 1 1 1 1
31 32 33 34 35 36 37
dnan 31 31 33 33 34 41 44
hand 1 2 6 1 1 4 8
9
Basic notions
Handedness data Figure: Scatter plot of handedness data. The numbers show the multiplicities of the observations.
7
8
1
hand
5
6
1
4
1
3
1
2211
2 1
1
1
1 15
2 20
15534
2 25
11
30 dnan
Anthony Davison: Bootstrap Methods and their Application,
35
40
45
10
Basic notions
Handedness data
◮
◮ ◮
Is there dependence between dnan and hand for these n = 37 individuals? Sample product-moment correlation coefficient is θb = 0.509. Standard confidence interval (based on bivariate normal population) gives 95% CI (0.221, 0.715).
◮
Data not bivariate normal!
◮
What is the status of the interval? Can we do better?
Anthony Davison: Bootstrap Methods and their Application,
11
Basic notions
Frequentist inference ◮ ◮ ◮
Estimator θb for unknown parameter θ. iid
Statistical model: data y1 , . . . , yn ∼ F , unknown Handedness data • • •
◮
◮
y = (dnan, hand) F puts probability mass on subset of R2 θb is correlation coefficient
Key issue: what is variability of θb when samples are repeatedly taken from F ? Imagine F known — could answer question by • analytical (mathematical) calculation • simulation
Anthony Davison: Bootstrap Methods and their Application,
12
Basic notions
Simulation with F known ◮
For r = 1, . . . , R: iid
◮
• generate random sample y1∗ , . . . , yn∗ ∼ F ; • compute θbr using y1∗ , . . . , yn∗ ;
Output after R iterations:
∗ θb1∗ , θb2∗ , . . . , θbR
◮
∗ to estimate sampling distribution of θ b Use θb1∗ , θb2∗ , . . . , θbR (histogram, density estimate, . . .)
◮
In practice R is finite, so some error remains
◮
If R → ∞, then get perfect match to theoretical calculation (if available): Monte Carlo error disappears completely
Anthony Davison: Bootstrap Methods and their Application,
13
Basic notions
Handedness data: Fitted bivariate normal model Figure: Contours of bivariate normal distribution fitted to handedness data; parameter estimates are µ b1 = 28.5, µ b2 = 1.7, σ b1 = 5.4, σ b2 = 1.5, ρb = 0.509. The data are also shown. 10 0.020 1
8
0.015 1
hand
6
1
4
0.010
1
1 0.005
2211
2 1
1
2
15534 11
2
0
0.000 10
15
20
25
30
35
40
45
dnan
Anthony Davison: Bootstrap Methods and their Application,
14
Basic notions
Handedness data: Parametric bootstrap samples
10 8
10 8
8
10
Figure: Left: original data, with jittered vertical values. Centre and right: two samples generated from the fitted bivariate normal distribution.
0
10
20
30 dnan
40
50
hand 0
2
4
hand 4 2 0
0
2
4
hand
6
Correlation 0.533
6
Correlation 0.753
6
Correlation 0.509
0
10
20
30 dnan
Anthony Davison: Bootstrap Methods and their Application,
40
50
0
10
20
30
40
50
dnan
15
Basic notions
Handedness data: Correlation coefficient Figure: Bootstrap distributions with R = 10000. Left: simulation from fitted bivariate normal distribution. Right: simulation from the data by bootstrap resampling. The lines show the theoretical probability density function of the correlation coefficient under sampling from a fitted bivariate
3.5 3.0 Probability density 1.0 1.5 2.0 2.5 0.5 0.0
0.0
0.5
Probability density 1.0 1.5 2.0 2.5
3.0
3.5
normal distribution.
−0.5
0.0 0.5 Correlation coefficient
1.0
−0.5
Anthony Davison: Bootstrap Methods and their Application,
0.0 0.5 Correlation coefficient
1.0
16
Basic notions
F unknown ◮
Replace unknown F by estimate Fb obtained
• parametrically — e.g. maximum likelihood or robust fit of distribution F (y) = F (y; ψ) (normal, exponential, bivariate normal, . . .) • nonparametrically — using empirical distribution function (EDF) of original data y1 , . . . , yn , which puts mass 1/n on each of the yj
◮
◮
Algorithm: For r = 1, . . . , R: iid • generate random sample y1∗ , . . . , yn∗ ∼ Fb ; ∗ ∗ • compute θbr using y1 , . . . , yn ;
Output after R iterations:
∗ θb1∗ , θb2∗ , . . . , θbR Anthony Davison: Bootstrap Methods and their Application,
17
Basic notions
Nonparametric bootstrap ◮
iid Bootstrap (re)sample y1∗ , . . . , yn∗ ∼ Fb, where Fb is EDF of y1 , . . . , yn
• Repetitions will occur!
◮ ◮
◮ ◮ ◮
Compute bootstrap θb∗ using y1∗ , . . . , yn∗ .
For handedness data take n = 37 pairs y ∗ = (dnan, hand)∗ with equal probabilities 1/37 and replacement from original pairs (dnan, hand) Repeat this R times, to get θb∗ , . . . , θb∗ 1
R
See picture
Results quite different from parametric simulation — why?
Anthony Davison: Bootstrap Methods and their Application,
18
Basic notions
Handedness data: Bootstrap samples
10
15
20
25 30 dnan
35
40
45
6 1
2
3
hand 4 5
6 hand 4 5 3 2 1
1
2
3
hand 4 5
6
7
Correlation 0.491
7
Correlation 0.733
7
Correlation 0.509
8
8
8
Figure: Left: original data, with jittered vertical values. Centre and right: two bootstrap samples, with jittered vertical values.
10
15
20
25 30 dnan
35
Anthony Davison: Bootstrap Methods and their Application,
40
45
10
15
20
25 30 dnan
35
40
45
19
Basic notions
Using the θb∗ ◮
◮ ◮
b Bootstrap replicates θbr∗ used to estimate properties of θ. Write θ = θ(F ) to emphasize dependence on F Bias of θb as estimator of θ is iid β(F ) = E(θb | y1 , . . . , yn ∼ F ) − θ(F )
estimated by replacing unknown F by known estimate Fb: ◮
iid β(Fb) = E(θb | y1 , . . . , yn ∼ Fb ) − θ(Fb )
Replace theoretical expectation E() by empirical average: β(Fb) ≈ b = θb∗ − θb = R−1
Anthony Davison: Bootstrap Methods and their Application,
R X r=1
θbr∗ − θb 20
Basic notions
◮
Estimate variance ν(F ) = var(θb | F ) by
1 X b∗ b∗ 2 v= θ −θ R − 1 r=1 r R
◮
◮
Estimate quantiles of θb by taking empirical quantiles of ∗ θb1∗ , . . . , θbR
For handedness data, 10,000 replicates shown earlier give b = −0.046,
v = 0.043 = 0.2052
Anthony Davison: Bootstrap Methods and their Application,
21
Basic notions
Handedness data b Figure: Summaries of the θb∗ . Left: histogram, with vertical line showing θ. Right: normal Q–Q plot of θb∗ .
0.0
−0.5
0.5
t* 0.0
Density 1.0 1.5
0.5
2.0
2.5
Histogram of t
−0.5
0.0 t*
0.5
1.0
−4 −2 0 2 4 Quantiles of Standard Normal
Anthony Davison: Bootstrap Methods and their Application,
22
Basic notions
How many bootstraps? ◮
◮ ◮
Must estimate moments and quantiles of θb and derived quantities. Nowadays often feasible to take R ≥ 5000 Need R ≥ 100 to estimate bias, variance, etc. Need R ≫ 100, prefer R ≥ 1000 to estimate quantiles needed for 95% confidence intervals
2 Z* 0 −2 −4
−4
−2
Z* 0
2
4
R=999
4
R=199
−4
−2 0 2 Theoretical Quantiles
4
Anthony Davison: Bootstrap Methods and their Application,
−4
−2 0 2 Theoretical Quantiles
4
23
Basic notions
Key points ◮
Estimator is algorithm • • • •
◮
◮
applied to original data y1 , . . . , yn gives original θb applied to simulated data y1∗ , . . . , yn∗ gives θb∗ θb can be of (almost) any complexity for more sophisticated ideas (later) to work, θb must often be smooth function of data
Sample is used to estimate F
• Fb ≈ F — heroic assumption
Simulation replaces theoretical calculation • removes need for mathematical skill • does not remove need for thought • check code very carefully — garbage in, garbage out!
◮
Two sources of error • statistical (Fb 6= F ) — reduce by thought • simulation (R 6= ∞) — reduce by taking R large (enough)
Anthony Davison: Bootstrap Methods and their Application,
24
Confidence intervals
Motivation
Basic notions
Confidence intervals
Several samples
Variance estimation
Tests
Regression
Anthony Davison: Bootstrap Methods and their Application,
25
Confidence intervals
Normal confidence intervals ◮
◮
◮
. If θb approximately normal, then θb ∼ N (θ + β, ν), where θb has bias β = β(F ) and variance ν = ν(F ) If β, ν known, (1 − 2α) confidence interval for θ would be (D1) θb − β ± zα ν 1/2 ,
where Φ(zα ) = α. Replace β, ν by estimates: . . β(F ) = β(Fb) = b = θb∗ − θb X . . (θbr∗ − θb∗ )2 , ν(F ) = ν(Fb) = v = (R − 1)−1 r
◮
giving (1 − 2α) interval θb − b ± zα v 1/2 . Handedness data: R = 10, 000, b = −0.046, v = 0.2052 , α = 0.025, zα = −1.96, so 95% CI is (0.147, 0.963)
Anthony Davison: Bootstrap Methods and their Application,
26
Confidence intervals
Normal confidence intervals Normal approximation reliable? Transformation needed? b b − θ)}: Here are plots for ψb = 12 log{(1 + θ)/(1
0.0
0.5
Density 1.0
1.5
◮
Transformed correlation coefficient −0.5 0.0 0.5 1.0 1.5
◮
−0.5 0.0 0.5 1.0 1.5 Transformed correlation coefficient
−4
Anthony Davison: Bootstrap Methods and their Application,
−2 0 2 Quantiles of Standard Normal
4
27
Confidence intervals
Normal confidence intervals ◮
Correlation coefficient: try Fisher’s z transformation: b = ψb = ψ(θ)
for which compute bψ = R−1
R X r=1
◮
◮ ◮
1 2
b b log{(1 + θ)/(1 − θ)} 1 X b∗ b∗ 2 , ψ −ψ R − 1 r=1 r R
b ψbr∗ − ψ,
vψ =
(1 − 2α) confidence interval for θ is o n o n 1/2 1/2 , ψ −1 ψb − bψ − zα vψ ψ −1 ψb − bψ − z1−α vψ For handedness data, get (0.074, 0.804) But how do we choose a transformation in general?
Anthony Davison: Bootstrap Methods and their Application,
28
Confidence intervals
Pivots ◮
◮
◮
◮
∗ mimic effect of sampling from Hope properties of θb1∗ , . . . , θbR original model. Amounts to faith in ‘substitution principle’: may replace unknown F with known Fb — false in general, but often more nearly true for pivots. Pivot is combination of data and parameter whose distribution is independent of underlying model. iid
Canonical example: Y1 , . . . , Yn ∼ N (µ, σ 2 ). Then Z=
◮
Y −µ (S 2 /n)1/2
∼
tn−1 ,
for all µ, σ 2 — so independent of the underlying distribution, provided this is normal Exact pivot generally unavailable in nonparametric case.
Anthony Davison: Bootstrap Methods and their Application,
29
Confidence intervals
Studentized statistic ◮ ◮ ◮
Idea: generalize Student t statistic to bootstrap setting Requires variance V for θb computed from y1 , . . . , yn Analogue of Student t statistic: θb − θ V 1/2 If the quantiles zα of Z known, then Z=
◮
θb − θ Pr (zα ≤ Z ≤ z1−α ) = Pr zα ≤ 1/2 ≤ z1−α V
!
= 1 − 2α
(zα no longer denotes a normal quantile!) implies that Pr θb − V 1/2 z1−α ≤ θ ≤ θb − V 1/2 zα = 1 − 2α
so (1 − 2α) confidence interval is (θb − V 1/2 z1−α , θb − V 1/2 zα )
Anthony Davison: Bootstrap Methods and their Application,
30
Confidence intervals ◮
Bootstrap sample gives (θb∗ , V ∗ ) and hence Z∗ =
◮
b V ): R bootstrap copies of (θ, (θb1∗ , V1∗ ),
and corresponding z1∗ =
◮
◮
θb1∗ − θb
, ∗1/2
V1 ∗ ∗ z1 , . . . , zR
θb∗ − θb V ∗1/2
(θb2∗ , V2∗ ),
z2∗ =
θb2∗ − θb
, ∗1/2
V2
...,
...,
∗ (θbR , VR∗ ) ∗ zR =
∗ −θ b θbR . ∗1/2 VR
Use to estimate distribution of Z — for example, ∗ < · · · < z∗ order statistics z(1) (R) used to estimate quantiles Get (1 − 2α) confidence interval ∗ θb − V 1/2 z((1−α)(R+1)) ,
Anthony Davison: Bootstrap Methods and their Application,
∗ θb − V 1/2 z(α(R+1))
31
Confidence intervals
Why Studentize? ◮
D
Studentize, so Z −→ N (0, 1) as n → ∞. Edgeworth series: Pr(Z ≤ z | F ) = Φ(z) + n−1/2 a(z)φ(z) + O(n−1 ); a(·) even quadratic polynomial.
◮
Corresponding expansion for Z ∗ is Pr(Z ∗ ≤ z | Fb) = Φ(z) + n−1/2 b a(z)φ(z) + Op (n−1 ).
Typically b a(z) = a(z) + Op (n−1/2 ), so
Pr(Z ∗ ≤ z | Fb) − Pr(Z ≤ z | F ) = Op (n−1 ).
Anthony Davison: Bootstrap Methods and their Application,
32
Confidence intervals
◮
D
If don’t studentize, Z = (θb − θ) −→ N (0, ν). Then z z z Pr(Z ≤ z | F ) = Φ 1/2 +n−1/2 a′ 1/2 φ 1/2 +O(n−1 ) ν ν ν and
Pr(Z ∗ ≤ z | Fb) = Φ
z z z −1/2 ′ +n b a φ 1/2 +Op (n−1 ). νb1/2 νb1/2 νb
Typically νb = ν + Op (n−1/2 ), giving
◮
Pr(Z ∗ ≤ z | Fb ) − Pr(Z ≤ z | F ) = Op (n−1/2 ).
Thus use of Studentized Z reduces error from Op (n−1/2 ) to Op (n−1 ) — better than using large-sample asymptotics, for which error is usually Op (n−1/2 ).
Anthony Davison: Bootstrap Methods and their Application,
33
Confidence intervals
Other confidence intervals ◮
◮
Problem for studentized intervals: must obtain V , intervals not scale-invariant Simpler approaches: • Basic bootstrap interval: treat θb − θ as pivot, get ∗ b θb − (θb((R+1)(1−α)) − θ),
∗ b θb − (θb((R+1)α) − θ).
∗ • Percentile interval: use empirical quantiles of θb1∗ , . . . , θbR :
◮
∗ , θb((R+1)α)
∗ . θb((R+1)(1−α))
Improved percentile intervals (BCa , ABC, . . .) • Replace percentile interval with ∗ θb((R+1)α ′) ,
∗ θb((R+1)(1−α ′′ )) ,
where α′ , α′′ chosen to improve properties. • Scale-invariant. Anthony Davison: Bootstrap Methods and their Application,
34
Confidence intervals
Handedness data ◮
95% confidence intervals for correlation coefficient θ, R = 10, 000: Normal 0.147 0.963 Percentile −0.047 0.758 Basic 0.262 1.043 ′ ′′ BCa (α = 0.0485, α = 0.0085) 0.053 0.792 Student 0.030 1.206 Basic (transformed) Student (transformed)
◮
0.131 0.824 −0.277 0.868
Transformation is essential here!
Anthony Davison: Bootstrap Methods and their Application,
35
Confidence intervals
General comparison ◮
Bootstrap confidence intervals usually too short — leads to under-coverage
◮
Normal and basic intervals depend on scale.
◮
Percentile interval often too short but is scale-invariant. Studentized intervals give best coverage overall, but
◮
• depend on scale, can be sensitive to V • length can be very variable • best on transformed scale, where V is approximately constant ◮
Improved percentile intervals have same error in principle as studentized intervals, but often shorter — so lower coverage
Anthony Davison: Bootstrap Methods and their Application,
36
Confidence intervals
Caution ◮
◮
.. .
10
◮
Edgeworth theory OK for smooth statistics — beware rough statistics: must check output. Bootstrap of median theoretically OK, but very sensitive to sample values in practice. Role for smoothing?
0 -10
-5
T*-t for medians
5
. .. ... . ........ ......... ............. . ............................. .......... ................ ............ .............. ........ . ......... . ...... . . -2
0
2
Quantiles of Standard Normal
Anthony Davison: Bootstrap Methods and their Application,
37
Confidence intervals
Key points
◮
Numerous procedures available for ‘automatic’ construction of confidence intervals
◮
Computer does the work
◮
Need R ≥ 1000 in most cases
◮
Generally such intervals are a bit too short
◮
Must examine output to check if assumptions (e.g. smoothness of statistic) satisfied
◮
May need variance estimate V — see later
Anthony Davison: Bootstrap Methods and their Application,
38
Several samples
Motivation
Basic notions
Confidence intervals
Several samples
Variance estimation
Tests
Regression
Anthony Davison: Bootstrap Methods and their Application,
39
Several samples
Gravity data Table: Measurements of the acceleration due to gravity, g, given as deviations from 980,000 ×10−3 cms−2 , in units of cms−2 × 10−3 .
1 76 82 83 54 35 46 87 68
2 87 95 98 100 109 109 100 81 75 68 67
3 105 83 76 75 51 76 93 75 62
Series 4 5 95 76 90 76 76 78 76 79 87 72 79 68 77 75 71 78
Anthony Davison: Bootstrap Methods and their Application,
6 78 78 78 86 87 81 73 67 75 82 83
7 82 79 81 79 77 79 79 78 79 82 76 73 64
8 84 86 85 82 77 76 77 80 83 81 78 78 78
40
Several samples
Gravity data
40
60
g
80
100
Figure: Gravity series boxplots, showing a reduction in variance, a shift in location, and possible outliers.
1
2
3
4
5
6
7
8
series
Anthony Davison: Bootstrap Methods and their Application,
41
Several samples
Gravity data ◮
Eight series of measurements of gravitational acceleration g made May 1934 – July 1935 in Washington DC
◮
Data are deviations from 9.8 m/s2 in units of 10−3 cm/s2
◮
Goal: Estimate g and provide confidence interval
◮
Weighted combination of series averages and its variance estimate !−1 P8 8 2 X y × n /s i i i=1 i , ni /s2i , V = θb = P 8 2 n /s i i=1 i i=1 giving
θb = 78.54,
V = 0.592
and 95% confidence interval of θb ± 1.96V 1/2 = (77.5, 79.8)
Anthony Davison: Bootstrap Methods and their Application,
42
Several samples
Gravity data: Bootstrap ◮
◮
Apply stratified (re)sampling to series, taking each series as a separate stratum. Compute θb∗ , V ∗ for simulated data Confidence interval based on θb∗ − θb Z ∗ = ∗1/2 , V whose distribution is approximated by simulations z1∗ = giving
◮
∗ −θ b θbR θb1∗ − θb ∗ , . . . , z = , R 1/2 1/2 V1 VR
∗ ∗ (θb − V 1/2 z((R+1)(1−α)) , θb − V 1/2 z((R+1)α) )
For 95% limits set α = 0.025, so with R = 999 use ∗ , z∗ z(25) (975) , giving interval (77.1, 80.3).
Anthony Davison: Bootstrap Methods and their Application,
43
Several samples
..
5 0 -5
z*
.
0
2
-2
0
2
Quantiles of Standard Normal
... . .. .. .. ................... . . . .................................................................. . . .. .. . .... . .. .. ... ... .. . .. ....................................................................................... . . .............. ....... . . . ................................................................. ... . . . . . . . . . . . . . .. . ................................................................ .... . .. .. ...... ..... ..... ........... . . . ..... . .. ........................ . . . . . . ............................. ........ . . . ... . . ........... ..... .................. .. .. .. . . . . .. . . ..... ...... .... . . . . .. . . . . . .. . . . . . . .
.. . ... ..................... .... .. ................... ........................................ . .......................... .................... . ......................................................... .. . . . . . . . . .............................................. . . ...................................... . . . ................................. .... . . . . ... ... .. .................................... . .. . . . ...... ..... .... . . . . . ... ... ..... . . . . .
77
78
79
80
81
sqrt(v*)
Quantiles of Standard Normal
0.1 0.2 0.3 0.4 0.5 0.6 0.7
sqrt(v*)
.
.. . .... ............ ........... ..................... ......... ... ...... ........ ............. .......... .... ....... .... ............. .... . . . . . . . . ... ........
. -2
0.1 0.2 0.3 0.4 0.5 0.6 0.7
.
-10
.. .. .... .... ..... ..... . . . . . .... ......... ............ ........ ........... ......... ...... .... ......... . . . . . . .... ..... .... ...
-15
79 77
78
t*
80
81
Figure: Summary plots for 999 nonparametric bootstrap simulations. Top: normal probability plots of t∗ and z ∗ = (t∗ − t)/v ∗1/2 . Line on the top left has intercept t and slope v 1/2 , line on the top right has intercept zero and unit slope. Bottom: the smallest t∗r also has the smallest v ∗ , leading to an outlying value of z ∗ .
. -15
-10
t*
Anthony Davison: Bootstrap Methods and their Application,
-5
0
5
z*
44
Several samples
Key points
◮
For several independent samples, implement bootstrap by stratified sampling independently from each
◮
Same basic ideas apply for confidence intervals
Anthony Davison: Bootstrap Methods and their Application,
45
Variance estimation
Motivation
Basic notions
Confidence intervals
Several samples
Variance estimation
Tests
Regression
Anthony Davison: Bootstrap Methods and their Application,
46
Variance estimation
Variance estimation
◮
◮
Variance estimate V needed for certain types of confidence interval (esp. studentized bootstrap) Ways to compute this: • • • •
double bootstrap delta method nonparametric delta method jackknife
Anthony Davison: Bootstrap Methods and their Application,
47
Variance estimation
Double bootstrap
◮ ◮
◮
◮ ◮
Bootstrap sample y1∗ , . . . , yn∗ and corresponding estimate θb∗ Take Q second-level bootstrap samples y1∗∗ , . . . , yn∗∗ from y1∗ , . . . , yn∗ , giving corresponding bootstrap estimates ∗∗ , θb1∗∗ , . . . , θbQ
Compute variance estimate V as sample variance of ∗∗ θb1∗∗ , . . . , θbQ
Requires total R(Q + 1) resamples, so could be expensive . Often reasonable to take Q = 50 for variance estimation, so need O(50 × 1000) resamples — nowadays not infeasible
Anthony Davison: Bootstrap Methods and their Application,
48
Variance estimation
Delta method ◮
◮ ◮
◮ ◮ ◮
◮
Computation of variance formulae for functions of averages and other estimators b estimates ψ = g(θ), and θb ∼. N (θ, σ 2 /n) Suppose ψb = g(θ) Then provided g′ (θ) 6= 0, have (D2) b = g(θ) + O(n−1 ) E(ψ) b = σ 2 g′ (θ)2 /n + O(n−3/2 ) var(ψ)
. 2 ′ b2 b = Then var(ψ) σ b g (θ) /n = V Example (D3): θb = Y , ψb = log θb . b = Variance stabilisation (D4): if var(θ) S(θ)2 /n, find . b =constant transformation h such that var{h(θ)} Extends to multivariate estimators, and to ψb = g(θb1 , . . . , θbd )
Anthony Davison: Bootstrap Methods and their Application,
49
Variance estimation
Anthony Davison: Bootstrap Methods and their Application,
50
Variance estimation
Nonparametric delta method ◮ ◮
Write parameter θ = t(F ) as functional of distribution F General approximation: n 1 X . L(Yj ; F )2 . V = VL = 2 n j=1
◮
◮
L(y; F ) is influence function value for θ for observation at y when distribution is F : t {(1 − ε)F + εHy } − t(F ) , L(y; F ) = lim ε→0 ε where Hy puts unit mass at y. Close link to robustness. Empirical versions of L(y; F ) and VL are X lj2 , lj = L(yj ; Fb), vL = n−2 usually obtained by analytic/numerical differentation.
Anthony Davison: Bootstrap Methods and their Application,
51
Variance estimation
Computation of lj ◮ ◮
Write θb in weighted form, differentiate with respect to ε Sample average: X 1X yj = wj yj θb = y = n wj ≡1/n Change weights:
wj 7→ ε + (1 − ε) n1 ,
wi 7→ (1 − ε) n1 ,
i 6= j
so (D5)
◮
y 7→ y ε = εyj + (1 − ε)y = ε(yj − y) + y, P −1 2 giving lj = yj − y and vL = n12 (yj − y)2 = n−1 n n s Interpretation: lj is standardized change in y when increase mass on yj
Anthony Davison: Bootstrap Methods and their Application,
52
Variance estimation
Nonparametric delta method: Ratio ◮
Population F (u, x) with y = (u, x) and Z Z θ = t(F ) = x dF (u, x)/ u dF (u, x), sample version is
◮
θb = t(Fb) =
Z
x dFb(u, x)/
Z
u dFb(u, x) = x/u
Then using chain rule of differentiation (D6),
giving
b j )/u, lj = (xj − θu
1 X vL = 2 n
b j xj − θu u
Anthony Davison: Bootstrap Methods and their Application,
!2 53
Variance estimation
Handedness data: Correlation coefficient ◮
Correlation coefficient P may be written as a function of −1 averages xu = n xj uj etc.: θb = n
xu − x u
(x2
− x2 )(u2
−
o1/2 ,
u2 )
from which empirical influence values lj can be derived ◮
In this example (and for others involving only averages), nonparametric delta method is equivalent to delta method
◮
Get vL = 0.029
◮
for comparison with v = 0.043 obtained by bootstrapping. b — as here! vL typically underestimates var(θ)
Anthony Davison: Bootstrap Methods and their Application,
54
Variance estimation
Delta methods: Comments ◮
Can be applied to many complex statistics
◮
Delta method variances often underestimate true variances:
◮
b vL < var(θ)
Can be applied automatically (numerical differentation) if algorithm for θb written in weighted form, e.g. X xw = wj xj , wj ≡ 1/n for x
and vary weights successively for j = 1, . . . , n, setting X wj = wi + ε, i 6= j, wi = 1
for ε = 1/(100n) and using the definition as derivative Anthony Davison: Bootstrap Methods and their Application,
55
Variance estimation
Jackknife ◮
Approximation to empirical influence values given by lj ≈ ljack,j = (n − 1)(θb − θb−j ),
where θb−j is value of θb computed from sample y1 , . . . , yj−1 ,
◮
◮ ◮
yj+1 , . . . , yn
Jackknife bias and variance estimates are X 1 1X 2 − nb2jack ljack,j , vjack = ljack,j bjack = − n n(n − 1)
Requires n + 1 calculations of θb b with Corresponds to numerical differentiation of θ, ε = −1/(n − 1)
Anthony Davison: Bootstrap Methods and their Application,
56
Variance estimation
Key points
◮
Several methods available for estimation of variances
◮
Needed for some types of confidence interval
◮
Most general method is double bootstrap: can be expensive
◮
Delta methods rely on linear expansion, can be applied numerically or analytically
◮
Jackknife gives approximation to delta method, can fail for rough statistics
Anthony Davison: Bootstrap Methods and their Application,
57
Tests
Motivation
Basic notions
Confidence intervals
Several samples
Variance estimation
Tests
Regression
Anthony Davison: Bootstrap Methods and their Application,
58
Tests
Ingredients
◮
Ingredients for testing problems: • data y1 , . . . , yn ; • model M0 to be tested; • test statistic t = t(y1 , . . . , yn ), with large values giving evidence against M0 , and observed value tobs
◮
◮
P-value, pobs = Pr(T ≥ tobs | M0 ) measures evidence against M0 — small pobs indicates evidence against M0 . Difficulties: • pobs may depend upon ‘nuisance’ parameters, those of M0 ; • pobs often hard to calculate.
Anthony Davison: Bootstrap Methods and their Application,
59
Tests
Examples ◮
Balsam-fir seedlings in 5 × 5 quadrats — Poisson sample? 0 0 1 4 3
◮
1 2 1 1 1
2 0 1 2 4
3 2 1 5 3
4 4 4 2 1
3 2 1 0 0
4 3 5 3 0
2 3 2 2 2
2 4 2 1 7
1 2 3 1 0
Two-way layout: row-column independence? 1 2 0 1 0
2 0 1 1 1
2 0 1 2 1
1 2 1 0 1
1 3 2 0 1
Anthony Davison: Bootstrap Methods and their Application,
0 0 7 0 0
1 0 3 1 0 60
Tests
Estimation of pobs ◮
◮
◮
Estimate pobs by simulation from fitted null hypothesis c0 . model M Algorithm: for r = 1, . . . , R: c0 ; • simulate data set y1∗ , . . . , yn∗ from M • calculate test statistic t∗r from y1∗ , . . . , yn∗ .
Calculate simulation estimate
of ◮
pb =
1 + #{t∗r ≥ tobs } 1+R
c0 ). pbobs = Pr(T ≥ tobs | M
Simulation and statstical errors:
pb ≈ pbobs ≈ pobs
Anthony Davison: Bootstrap Methods and their Application,
61
Tests
Handedness data: Test of independence ◮ ◮
◮ ◮
◮
◮
Are dnan and hand positively associated? Take T = θb (correlation coefficient), with tobs = 0.509; this is large in case of positive association (one-sided test) Null hypothesis of independence: F (u, x) = F1 (u)F2 (x) Take bootstrap samples independently from Fb1 ≡ (dnan1 , . . . , dnann ) and from Fb2 ≡ (hand1 , . . . , handn ), then put them together to get bootstrap data (dnan∗1 , hand∗1 ), . . . , (dnan∗n , hand∗n ). b so With R = 9, 999 get 18 values of θb∗ ≥ θ, 1 + 18 pb = = 0.0019 : 1 + 9999 hand and dnan seem to be positively associated To test positive or negative association (two-sided test), b gives pb = 0.004. take T = |θ|:
Anthony Davison: Bootstrap Methods and their Application,
62
Tests
c0 Handedness data: Bootstrap from M
1
2
2
1
2
15
20
0.5
3
1 2
1
1102 3 4 3
25
30 dnan
1
35
40
0.0
1
hand 4 5
6
2
Probability density 1.0 1.5 2.0
7
2.5
8
1
45
Anthony Davison: Bootstrap Methods and their Application,
−0.6
−0.2 0.2 Test statistic
0.6
63
Tests
Choice of R ◮
◮
◮
Take R big enough to get small standard error for pb, typically ≥ 100, using binomial calculation: 1 + #{t∗r ≥ tobs } var(b p) = var 1+R pobs (1 − pobs ) 1 . Rpobs (1 − pobs ) = = 2 R R . so if pobs = 0.05 need R ≥ 1900 for 10% relative error (D7) . Can choose R sequentially: e.g. if pb = 0.06 and R = 99, can augment R enough to diminish standard error. Taking R too small lowers power of test.
Anthony Davison: Bootstrap Methods and their Application,
64
Tests
Duality with confidence interval ◮
Often unclear how to impose null hypothesis on sampling scheme
◮
General approach based on duality between confidence interval I1−α = (θα , ∞) and test of null hypothesis θ = θ0
◮
Reject null hypothesis at level α in favour of alternative θ > θ0 , if θ0 < θα
◮
Handedness data: θ0 = 0 6∈ I0.95 , but θ0 = 0 ∈ I0.99 , so estimated significance level 0.01 < pb < 0.05: weaker evidence than before Extends to tests of θ = θ0 against other alternatives:
◮
• if θ0 ∈ 6 I 1−α = (−∞, θα ), have evidence that θ < θ0 • if θ0 ∈ 6 I1−2α = (θα , θα ), have evidence that θ 6= θ0
Anthony Davison: Bootstrap Methods and their Application,
65
Tests
Pivot tests ◮ ◮
◮ ◮
◮
◮
Equivalent to use of confidence intervals Idea: use (approximate) pivot such as Z = (θb − θ)/V 1/2 as statistic to test θ = θ0 Observed value of pivot is zobs = (θb − θ0 )/V 1/2 Significance level is ! θb − θ = Pr(Z ≥ zobs | M0 ) ≥ zobs | M0 Pr V 1/2 = Pr(Z ≥ zobs | F ) . = Pr(Z ≥ zobs | Fb)
Compare observed zobs with simulated distribution of ∗1/2 , without needing to construct null b Z ∗ = (θb∗ − θ)/V c0 hypothesis model M Use of (approximate) pivot is essential for success
Anthony Davison: Bootstrap Methods and their Application,
66
Tests
Example: Handedness data ◮
◮
Test zero correlation (θ0 = 0), not independence; θb = 0.509, V = 0.1702 : θb − θ0 0.509 − 0 zobs = 1/2 = = 2.99 0.170 V Observed significance level is 1 + #{zr∗ ≥ zobs } 1 + 215 = = 0.0216 1+R 1 + 9999
0.0
Probability density 0.1 0.2 0.3
pb =
−6
−4
−2 0 2 Test statistic
4
Anthony Davison: Bootstrap Methods and their Application,
6
67
Tests
Exact tests ◮
Problem: bootstrap estimate is c0 ) 6= Pr(T ≥ t | M0 ) = pobs , pbobs = Pr(T ≥ tobs | M
so estimate the wrong thing ◮
In some cases can eliminate parameters from null hypothesis distribution by conditioning on sufficient statistic
◮
Then simulate from conditional distribution
◮
More generally, can use Metropolis–Hastings algorithm to simulate from conditional distribution (below)
Anthony Davison: Bootstrap Methods and their Application,
68
Tests
Example: Fir data ◮ ◮
◮
◮
◮
iid
Data Y1 , . . . , Yn ∼ Pois(λ), with λ unknown Poisson model has E(Y ) = var(Y ) = λ: base test of overdispersion on X . T = (Yj − Y )2 /Y ∼ χ2n−1 ;
observed value is tobs = 55.15 Unconditional significance level:
c0 , λ) Pr(T ≥ tobs | M
Condition on value w of sufficient statistic W = c0 , W = w), pobs = Pr(T ≥ tobs | M
P
Yj :
independent of λ, owing to sufficiency of W Exact test: simulate from P multinomial distribution of Y1 , . . . , Yn given W = Yj = 107.
Anthony Davison: Bootstrap Methods and their Application,
69
Tests
Example: Fir data
0.04
Figure: Simulation results for dispersion test. Left panel: R = 999 values of the dispersion statistic t∗ obtained under multinomial sampling: the data value is tobs = 55.15 and pb = 0.25. Right panel: chi-squared plot of ordered values of t∗ , dotted line shows χ249 approximation to null conditional distribution.
80 60 40 20
0.0
0.01
0.02
0.03
dispersion statistic t*
.
20
40
60
.
... . .... ...... ........ . . . ..... ...... ...... ..... ....... . . . . . ...... ...... ..... ...... ...... . . . . .. ..... ... . ..
80
t*
Anthony Davison: Bootstrap Methods and their Application,
30
40
50
60
70
80
chi-squared quantiles
70
Tests
Handedness data: Permutation test ◮ ◮ ◮
◮
Are dnan and hand related? Take T = θb (correlation coefficient) again Impose null hypothesis of independence: F (u, x) = F1 (u)F2 (x), but condition so that marginal distributions Fb1 and Fb2 are held fixed under resampling plan — permutation test Take resamples of form (dnan1 , hand1∗ ), . . . , (dnann , handn∗ )
◮
◮
where (1∗ , . . . , n∗ ) is random permutation of (1, . . . , n) Doing this with R = 9, 999 gives one- and two-sided significance probabilities of 0.002, 0.003 Typically values of pb very similar to those for corresponding bootstrap test
Anthony Davison: Bootstrap Methods and their Application,
71
Tests
1
0.0
2
0.5
3
hand 4 5
6
7
Probability density 1.0 1.5 2.0 2.5
8
3.0
Handedness data: Permutation resample
15
20
25
30 dnan
35
40
45
Anthony Davison: Bootstrap Methods and their Application,
−0.6
−0.2 0.2 Test statistic
0.6
72
Tests
Contingency table 1 2 0 1 0 ◮
2 0 1 1 1
2 0 1 2 1
1 2 1 0 1
1 3 2 0 1
0 0 7 0 0
1 0 3 1 0
Are row and column classifications independent: Pr(row i, column j) = Pr(row i) × Pr(column j)?
◮
Standard test statistic for independence is T =
X (yij − ybij )2 , ybij i,j
◮
ybij =
yi· y·j y··
.
Get Pr(χ224 ≥ 38.53) = 0.048, but is T ∼ χ224 ?
Anthony Davison: Bootstrap Methods and their Application,
73
Tests
Exact tests: Contingency table ◮
For exact test, need to simulate distribution of T conditional on sufficient statistics — row and column totals
◮
Algorithm (D8) for conditional simulation: 1. choose two rows j1 < j2 and two columns k1 < k2 at random 2. generate new values from hypergeometric distribution of yj1 k1 conditional on margins of 2 × 2 table y j 1 k1 y j 1 k2 y j 2 k1 y j 2 k2 3. compute test statistic T ∗ every I = 100 iterations, say
◮
Compare observed value tobs = 38.53 with simulated T ∗ — . get pb = 0.08
Anthony Davison: Bootstrap Methods and their Application,
74
Tests
Key points
◮ ◮
Tests can be performed using resampling/simulation Must take account of null hypothesis, by • modifying sampling scheme to satisfy null hypothesis • inverting confidence interval (pivot test)
◮
Can use Monte Carlo simulation to get approximations to exact tests — simulate from null distribution of data, conditional on observed value of sufficient statistic
◮
Sometimes obtain permutation tests — very similar to bootstrap tests
Anthony Davison: Bootstrap Methods and their Application,
75
Regression
Motivation
Basic notions
Confidence intervals
Several samples
Variance estimation
Tests
Regression
Anthony Davison: Bootstrap Methods and their Application,
76
Regression
Linear regression ◮
Independent data (x1 , y1 ), . . ., (xn , yn ) with yj = xTj β + εj ,
◮
b leverages hj , residuals Least squares estimates β, ej =
◮
εj ∼ (0, σ 2 )
yj − xTj βb
(1 − hj
)1/2
.
∼
(0, σ 2 )
Design matrix X is experimental ancillary — should be held fixed if possible, as b = σ 2 (X T X)−1 var(β)
if model y = Xβ + ε correct
Anthony Davison: Bootstrap Methods and their Application,
77
Regression
Linear regression: Resampling schemes ◮
Two main resampling schemes
◮
Model-based resampling: yj∗ = xTj βb + ε∗j ,
ε∗j ∼ EDF(e1 − e, . . . , en − e)
• Fixes design but not robust to model failure • Assumes εj sampled from population ◮
Case resampling: (xj , yj )∗ ∼ EDF{(x1 , y1 ), . . . , (xn , yn )} • Varying design X but robust • Assumes (xj , yj ) sampled from population • Usually design variation no problem; can prove awkward in designed experiments and when design sensitive.
Anthony Davison: Bootstrap Methods and their Application,
78
Regression
Cement data Table: Cement data: y is the heat (calories per gram of cement) evolved while samples of cement set. The covariates are percentages by weight of four constituents, tricalciumaluminate x1 , tricalcium silicate x2 , tetracalcium alumino ferrite x3 and dicalcium silicate x4 .
1 2 3 4 5 6 7 8 9 10 11 12 13
x1 7 1 11 11 7 11 3 1 2 21 1 11 10
x2 26 29 56 31 52 55 71 31 54 47 40 66 68
x3 6 15 8 8 6 9 17 22 18 4 23 9 8
x4 60 52 20 47 33 22 6 44 22 26 34 12 12
Anthony Davison: Bootstrap Methods and their Application,
y 78.5 74.3 104.3 87.6 95.9 109.2 102.7 72.5 93.1 115.9 83.8 113.3 109.4
79
Regression
Cement data ◮
Fit linear model y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4 + ε and apply case resampling
◮
◮
. Covariates compositional: x1 + · · · + x4 = 100% so X almost collinear — smallest eigenvalue of X T X is l5 = 0.0012 Plot of βb1∗ against smallest eigenvalue of X ∗T X ∗ reveals that var∗ (βb∗ ) strongly variable 1
◮
Relevant subset for case resampling — post-stratification of output based on l5∗ ?
Anthony Davison: Bootstrap Methods and their Application,
80
Regression
10
10
Cement data
. .
5 beta2hat*
-5
.
.
5 10
-10
-10
.
. .
1
. . . . .. ........ .. . .............................. . . .. .................................. ........ ................ .. . .. ... ... ................................................................................ .. . . ... ..................... . ... . . .......... ............... .. .. .. ... .. . .
. . . 0
5 0 -5
beta1hat*
. .. . . . ..... ........... . . . . .. ..... ......................................................... .... . . ...... . .. .. .............................................................. . . .. ... .. ................................................................. . . .. . .... ................ . . ... ..... . . . .. . . .
50
500
1
smallest eigenvalue
Anthony Davison: Bootstrap Methods and their Application,
. 5 10
50
500
smallest eigenvalue
81
Regression
Cement data
Table: Standard errors of linear regression coefficients for cement data. Theoretical and error resampling assume homoscedasticity. Resampling results use R = 999 samples, but last two rows are based only on those samples with the middle 500 and the largest 800 values of ℓ∗1 .
Normal theory Model-based resampling, R = 999 Case resampling, all R = 999 Case resampling, largest 800
Anthony Davison: Bootstrap Methods and their Application,
βb0 70.1 66.3 108.5 67.3
βb1 0.74 0.70 1.13 0.77
βb2 0.72 0.69 1.12 0.69
82
Regression
Survival data dose x survival % y
◮ ◮
117.5 44.000 55.000
235.0 16.000 13.000
470.0 4.000 1.960 6.120
705.0 0.500 0.320
940.0 0.110 0.015 0.019
1410 0.700 0.006
Data on survival % for rats at different doses Linear model: log(survival) = β0 + β1 dose •• ••
• • • 200
• 600
• 1000
0
•
• • -2
• •
• • •
•
-4
20
30
log survival %
2
•
0
10
survival %
40
50
4
•
••
• 1400
• 200
dose
Anthony Davison: Bootstrap Methods and their Application,
600
1000
1400
dose
83
Regression
Survival data ◮ ◮
-0.004 -0.008 -0.012
estimated slope
0.0
◮
Case resampling Replication of outlier: none (0), once (1), two or more (•). Model-based sampling including residual would lead to change in intercept but not slope.
•
•
•
• •• • • •
• 1 • • •• • •• 1 ••• • 1 •••••• ••• ••• 111 1 11 1 ••• • •••••••• ••••• 11 1111 1 • 1 1 1 1 1 1 1 1 1 11 11111111 1111111 1 11 1 1111 11111 1 1 1 1 1 1 1 1 1 1 0 1 1 11 1 1 0 0 0 01000 00 0 0 0 0 0 0 00000 0 0 0 0 000 0 0 0 00 0 000 0 0 0000 00 0 00 000 0000 00 0 0
0.5
1.0
1.5
2.0
2.5
3.0
sum of squares Anthony Davison: Bootstrap Methods and their Application,
84
Regression
Generalized linear model ◮
Response may be binomial, Poisson, gamma, normal, . . . yj
◮
mean µj , variance φV (µj ),
where g(µj ) = xTj β is linear predictor; g(·) is link function. b fitted values µ MLE β, bj , Pearson residuals rP j =
◮
∼
yj − µ bj
{V (b µj )(1 − hj )}
.
1/2
∼
(0, φ).
Bootstrapped responses
yj∗ = µ bj + V (b µj )1/2 ε∗j
where ε∗j ∼ EDF(rP 1 − rP , . . . , rP n − rP ). However • possible that yj∗ 6∈ {0, 1, 2, . . . , } • rP j not exchangeable, so may need stratified resampling Anthony Davison: Bootstrap Methods and their Application,
85
Regression
AIDS data ◮
Log-linear model: number of reports in row j and column k follows Poisson distribution with mean µjk = exp(αj + βk )
◮
Log link function g(µjk ) = log µjk = αj + βk and variance function var(Yjk ) = φ × V (µjk ) = 1 × µjk
◮
Pearson residuals: rjk =
◮
Yjk − µ bjk {b µjk (1 − hjk )}1/2
Model-based simulation:
1/2
∗ =µ bjk + µ bjk ε∗jk Yjk
Anthony Davison: Bootstrap Methods and their Application,
86
Regression
Diagnosis period Year 1988
1989
1990
1991
1992
Quarter 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
Reporting-delay interval (quarters): 0† 31 26 31 36 32 15 34 38 31 32 49 44 41 56 53 63 71 95 76 67
1 80 99 95 77 92 92 104 101 124 132 107 153 137 124 175 135 161 178 181
2 16 27 35 20 32 14 29 34 47 36 51 41 29 39 35 24 48 39
3 9 9 13 26 10 27 31 18 24 10 17 16 33 14 17 23 25
4 3 8 18 11 12 22 18 9 11 9 15 11 7 12 13 12
5 2 11 4 3 19 21 8 15 15 7 8 6 11 7 11
Anthony Davison: Bootstrap Methods and their Application,
6 8 3 6 8 12 12 6 6 8 6 9 5 6 10
··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ···
≥14 6 3 3 2 2 1
Total reports to end of 1992 174 211 224 205 224 219 253 233 281 245 260 285 271 263 306 258 310 318 273 133
87
Regression
AIDS data
0
rP
2
4
1984 1986 1988 1990 1992
-2
+ ++ + + + + ++ ++ + + + + +++ ++ ++ + ++++ + + + +++ +++++
-4
300 200 0
100
Diagnoses
400
6
500
◮
Poisson two-way model deviance 716.5 on 413 df — indicates strong overdispersion: φ > 1, so Poisson model implausible Residuals highly inhomogeneous — exchangeability doubtful
-6
◮
0
1
2
3
4
5
skewness Anthony Davison: Bootstrap Methods and their Application,
88
Regression
AIDS data: Prediction intervals ◮
To estimate prediction error: ∗ • simulate complete table yjk ; ∗ • estimate parameters from incomplete yjk • get estimated row totals and ‘truth’ X b∗ X ∗ ∗ ∗ µ b∗+,j = eαbj = eβk , y+,j yjk . k unobs
• Prediction error
k unobs
∗ y+,j −µ b∗+,j ∗1/2
µ b+,j
studentized so more nearly pivotal. ◮
Form prediction intervals from R replicates.
Anthony Davison: Bootstrap Methods and their Application,
89
Regression
AIDS data: Resampling plans ◮
Resampling schemes: • • • •
parametric simulation, fitted Poisson model parametric simulation, fitted negative binomial model nonparametric resampling of rP stratified nonparametric resampling of rP
◮
Stratification based on skewness of residuals, equivalent to stratifying original data by values of fitted means
◮
Take strata for which µ bjk < 1,
1≤µ bjk < 2,
Anthony Davison: Bootstrap Methods and their Application,
µ bjk ≥ 2 90
Regression
AIDS data: Results ◮
Deviance/df ratios for the sampling schemes, R = 999.
◮
Poisson variation inadequate.
◮
95% prediction limits.
600
5 4 3
1.0
1.5
2.0
2.5
0.0
0.5
1.0
1.5
2.0
2.5
deviance/df
nonparametric
stratified nonparametric
300
5 4
200
3
+ + ++ + +
1
2
+
+ ++ + + + ++ + +
1989 1990 1991 1992
0
0
1
2
3
4
5
6
deviance/df
Diagnoses
0.5
6
0.0
400
500
2 1 0
0
1
2
3
4
5
6
negative binomial
6
poisson
0.0
0.5
1.0
1.5
deviance/df
2.0
2.5
0.0
0.5
1.0
1.5
2.0
2.5
deviance/df
Anthony Davison: Bootstrap Methods and their Application,
91
Regression
AIDS data: Semiparametric model ◮
More realistic: generalized additive model µjk = exp {α(j) + βk } ,
600 500 Diagnoses
400 300
+ ++ + + + ++ ++ + + +++++ + +++ + +++++ + + + +++ + ++++
+ +
200
0
Diagnoses
◮
where α(j) is locally-fitted smooth. Same resampling plans as before 95% intervals now generally narrower and shifted upwards 100 200 300 400 500 600
◮
+ + ++ + +
1984 1986 1988 1990 1992
Anthony Davison: Bootstrap Methods and their Application,
+ ++ + + + ++ +
1989 1990 1991 1992
92
Regression
Key points
◮ ◮
Key assumption: independence of cases Two main resampling schemes for regression settings: • Model-based • Case resampling
◮
Intermediate schemes possible
◮
Can help to reduce dependence on assumptions needed for regression model
◮
These two basic approaches also used for more complex settings (time series, . . .), where data are dependent
Anthony Davison: Bootstrap Methods and their Application,
93
Regression
Summary ◮ ◮
Bootstrap: simulation methods for frequentist inference. Useful when • standard assumptions invalid (n small, data not normal, . . .); • standard problem has non-standard twist; • complex problem has no (reliable) theory; • or (almost) anywhere else.
◮
Have described • • • •
basic ideas confidence intervals tests some approaches for regression
Anthony Davison: Bootstrap Methods and their Application,
94
Regression
Books ◮
Chernick (1999) Bootstrap Methods: A Practicioner’s Guide. Wiley
◮
Davison and Hinkley (1997) Bootstrap Methods and their Application. Cambridge University Press
◮
Efron and Tibshirani (1993) An Introduction to the Bootstrap. Chapman & Hall
◮
Hall (1992) The Bootstrap and Edgeworth Expansion. Springer
◮
Lunneborg (2000) Data Analysis by Resampling: Concepts and Applications. Duxbury Press
◮
Manly (1997) Randomisation, Bootstrap and Monte Carlo Methods in Biology. Chapman & Hall
◮
Shao and Tu (1995) The Jackknife and Bootstrap. Springer
Anthony Davison: Bootstrap Methods and their Application,
95