Implementation of Saddlepoint Approximations to Bootstrap Distributions

Angelo J. Canty

Anthony C. Davison

Department of Statistics, University of Oxford, 1 South Parks Road, Oxford OX1 3TG, United Kingdom

Abstract

In many situations, saddlepoint approximations can replace Monte Carlo simulation for finding the bootstrap distribution of a statistic. We explain how bootstrap and permutation distributions can be expressed as conditional distributions and how methods for linear programming and for fitting generalized linear models can be used to find the saddlepoint approximations. If the statistic of interest cannot be expressed in terms of a single estimating equation, then an approximation to the marginal distribution of the statistic is required. This situation arises commonly in finding the bootstrap distribution of a studentized statistic. We discuss two proposed approximations and examine their implementation and performance. The results are illustrated using an example from statistical process control.

1 Introduction

One use of the nonparametric bootstrap is to approximate the distribution of a statistic $T$ when sampling from a given dataset $X = (X_1, \ldots, X_n)^t$, where $X^t$ denotes the transpose of $X$. An accurate approximation in the tails requires a large number of bootstrap replicates, $T^*$, of $T$. An alternative is to use saddlepoint methods, which replace the Monte Carlo replication with analytical calculations. The advantage of the saddlepoint approximation is that it is highly accurate very far into the tails of the distribution. The major disadvantage lies in programming and implementing it efficiently. In this paper we show that the saddlepoint approximation to the resampling distribution of $T$ can often be found quite easily using readily available code for solving linear programming problems coupled with methods for fitting generalized linear models. When this is possible, great efficiency gains can be made. For some statistics, particularly studentized statistics, saddlepoint methods cannot be used directly, and so a more sophisticated approximation is needed. We shall look at two possible methods for dealing with such statistics and discuss the difficulties in implementation that arise.

This work was supported by the UK Engineering and Physical Sciences Research Council through a grant and an Advanced Research Fellowship to the second author.

2 Saddlepoint Approximations

In this section we look at saddlepoint approximations as they are applied in the nonparametric resampling context. For a more general review of saddlepoint methods see Jensen (1995) and the references therein.

Suppose that the bootstrap statistic $T^*$ can be expressed as $\sum_j a_j f_j^*$, where $f_j^*$ is the bootstrap frequency of the observed $x_j$ and the $a_j$ are constant $d \times 1$ vectors. Since a bootstrap sample is generated by resampling $n$ observations with replacement from the observed data $x$, the $f_j^*$, $j = 1, \ldots, n$, have a joint multinomial distribution with parameters $n$ and $p = (p_1, \ldots, p_n)$, where $p_j$ is the resampling probability of $x_j$ in a bootstrap sample. For the usual nonparametric bootstrap $p_j = n^{-1}$, $j = 1, \ldots, n$. Thus $T^*$ has cumulant generating function

$$K(\xi) = n \log \left\{ \sum_{j=1}^{n} p_j \exp(\xi^t a_j) \right\}.$$
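As a numerical check on the multinomial structure described above, the following minimal Python sketch compares the closed form $n \log \sum_j p_j \exp(\xi a_j)$, for scalar $a_j$, with the cumulant generating function obtained by enumerating all $n^n$ equally likely resamples of a toy dataset. The data values, the function names and the symbol `xi` are assumptions made for illustration only; they are not taken from the paper.

```python
import itertools
import math


def cgf_formula(xi, a, p):
    """Closed form n * log(sum_j p_j * exp(xi * a_j)) for scalar a_j."""
    n = len(a)
    return n * math.log(sum(pj * math.exp(xi * aj) for pj, aj in zip(p, a)))


def cgf_by_enumeration(xi, a):
    """Exact CGF of T* = sum_j a_j f_j* under uniform resampling,
    found by enumerating all n**n equally likely resamples."""
    n = len(a)
    total = 0.0
    for idx in itertools.product(range(n), repeat=n):
        # Summing a over the drawn indices equals sum_j a_j * f_j*.
        t_star = sum(a[i] for i in idx)
        total += math.exp(xi * t_star)
    return math.log(total / n ** n)


# Hypothetical toy data: with a_j = x_j / n, T* is the bootstrap sample mean.
x = [1.2, -0.4, 3.1, 0.7]
n = len(x)
a = [xj / n for xj in x]
p = [1.0 / n] * n  # usual nonparametric bootstrap probabilities

for xi in (-1.0, 0.5, 2.0):
    print(xi, cgf_formula(xi, a, p), cgf_by_enumeration(xi, a))
```

The two printed columns should agree up to rounding error. Enumeration is feasible only for very small $n$, which is one reason an analytical form of the cumulant generating function is useful.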
