Electronic Journal of Statistics ISSN: 1935-7524

On the asymptotic relative efficiencies between rank-based tests

Yvik Swan†, Thomas Verdebout†† and Nadir Maaroufi∗

Laboratoire EQUIPPE, Université Lille Nord de France††
UR en Mathématiques, Université du Luxembourg
Université Internationale de Rabat, pôle E.L.I.T.∗

Abstract: We obtain a general formula providing bounds on the Asymptotic Relative Efficiency between different rank-based tests. We compute these bounds for the Wilcoxon, van der Waerden, Cauchy and Spearman rank-based tests for a location parameter.

Keywords and phrases: Asymptotic Relative Efficiency, rank-based tests.

1. Introduction

For any given estimation (or testing) problem, there typically exist several reasonable procedures which can be envisaged. A natural enquiry is then whether any procedure is to be favored over any other. Quoting Serfling (2011), a “natural and time honored approach” consists in “comparing the sample sizes at which two competing estimators” (or tests) “meet a given standard of performance”. Among such discrimination measures, asymptotic relative efficiency (ARE) is perhaps the most widely used in practice.

We denote by $\mathrm{ARE}_f(t_1/t_2)$ the asymptotic relative efficiency between the statistical procedures $t_1$ and $t_2$ when the true underlying distribution is $f$. There exist many ways of defining this quantity. For instance, if $t_1$ and $t_2$ are two estimators such that $t_1 \approx N(\theta, V_1(f)/n)$ and $t_2 \approx N(\theta, V_2(f)/n)$ for large samples, then a natural way of comparing them is to consider the ratio of limit variances $\mathrm{ARE}_f(t_1/t_2) = V_1(f)/V_2(f)$. Similarly, when comparing two testing procedures $\phi_1$ and $\phi_2$ for a given null hypothesis $H_0$, Pitman (1938) suggests discriminating between these two tests via the ratio $\mathrm{ARE}_f(\phi_1/\phi_2) = \lim_{n\to\infty} n_1^{(n)}/n_2^{(n)}$, with $n_1^{(n)}$ and $n_2^{(n)}$ the sample sizes required for tests $\phi_1$ and $\phi_2$ to reach the same power. Finally the ARE can also be computed as the ratio of the slopes of the two procedures (see e.g. van der Vaart and Wellner (2000)). In many instances the three definitions are equivalent (see e.g. van Eeden (1963)) and much is known on the properties of these ratios. Aside from the above so-called Pitman ARE, one can also compare testing procedures through other means, the two most basic being the so-called Bahadur and Hodges-Lehmann AREs. Again there are instances in which these definitions coincide, see e.g. Wieand (1976); Groeneboom and Oosterhoff (1981). ARE computations are, in general, non-trivial, and we refer the reader to the classical monographs Hájek and Šidák (1967); Serfling (1980); Nikitin (1995); van der Vaart (1998) or to the more recent encyclopedia articles Nikitin (2011) and Serfling (2011) for a complete overview of their applications, computations and interpretation.

It is of general interest to obtain bounds on the AREs between statistical procedures and to study conditions on $f$ under which these bounds hold. Consider for instance the problem of comparing the Wilcoxon test $\phi_{\mathrm{Wil}}$ with the optimal Gaussian (Student) test $\phi_N$. One of the first and most striking lessons brought by ARE computations in this context is due to Pitman (1938), who proved that, even for Gaussian samples, the ARE is as high as $3/\pi \approx 0.955$. Moreover Hodges and Lehmann (1956) showed that for the two-sample location problem, when $f$ admits finite second moments, we have $0 \le \mathrm{ARE}_f(\phi_N/\phi_{\mathrm{Wil}}) \le 1.157$, both bounds being “tight”, in the sense that there exist densities $f$ attaining both the upper and lower bound. Along the same lines Chernoff and Savage (1958) proved that $0 \le \mathrm{ARE}_f(\phi_N/\phi_{\mathrm{vdW}}) \le 1$, with $\phi_{\mathrm{vdW}}$ the van der Waerden rank-based test. They also showed that the upper bound is attained in the Gaussian case only. One of the striking consequences of these results is that they entail inadmissibility (in the Pitman sense) of the standard Gaussian procedures under even mild violations of the Gaussian hypothesis.


Because the traditional Gaussian procedures are not robust with respect, for instance, to a misspecification of the actual density, it seems natural to also study the relationships between two different rank-based tests. In this frame of mind Hodges and Lehmann (1961) obtained the elegant bounds

$$0 \le \mathrm{ARE}_f(\phi_{\mathrm{Wil}}/\phi_{\mathrm{vdW}}) \le 6/\pi, \tag{1.1}$$

with $\phi_{\mathrm{Wil}}$ the test based on Wilcoxon’s scores and $\phi_{\mathrm{vdW}}$ the test based on van der Waerden’s scores; all the values in this range are attained by specific densities which they exhibit. Other efforts of the same nature can be found in Yu (1971); Wieand (1976); Sinha and Wieand (1977), in which the above-mentioned statistics are also compared with the Cramér-von Mises or the Kolmogorov-Smirnov statistics. We will return to these results in the course of the subsequent text.

In this paper we consider the Pitman AREs between different rank-based tests for location within the framework of locally asymptotically normal (LAN) experiments for the location parameter in the general linear model. The LAN framework is particularly amenable to these computations since then standard techniques (see, e.g., Hájek and Šidák, 1967; van der Vaart, 1998) yield, at least in principle, the AREs between any two sufficiently well-behaved tests through

$$\mathrm{ARE}_f(J_1/J_2) = \frac{\int_0^1 J_2^2(u)\,du}{\int_0^1 J_1^2(u)\,du} \left( \frac{\int_{-\infty}^{\infty} J_1(F(x))\, f'(x)\,dx}{\int_{-\infty}^{\infty} J_2(F(x))\, f'(x)\,dx} \right)^2. \tag{1.2}$$

This ARE formula is not, however, very well suited for analysis and in this paper we will show that the LAN framework suffices to rewrite (1.2) (without any further assumption on the underlying density $f$) under the more agreeable form

$$\mathrm{ARE}_f(J_1/J_2) = \frac{E(J_2^2(U))}{E(J_1^2(U))} \left( \frac{E\,(J_1 \circ F)'(X)}{E\,(J_2 \circ F)'(X)} \right)^2, \tag{1.3}$$

where we let $X \sim f$ and $U \sim \mathrm{Unif}(0,1)$. As will be shown, the above formula allows for obtaining “tight” bounds on the AREs for several important pairs of score functions; it also allows a neat interpretation in terms of tail orderings, as already explored in, e.g., Van Zwet (1964); Wieand (1976); Loh (1984) and Caperaa (1988).
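As a numerical sanity check on the agreement between (1.2) and (1.3), the following Python sketch evaluates both expressions for one illustrative configuration that is an assumption of this example and not taken from the text: the logistic density $f$, with $J_1(u) = 2u - 1$ (Wilcoxon) and $J_2(u) = \Phi^{-1}(u)$ (van der Waerden). All integrals are written in the coordinate $u = F(x)$, and the helper names (`midpoint`, `are_12`, `are_13`) are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard Gaussian, used for the van der Waerden score


def midpoint(func, n=20000):
    """Midpoint rule on (0, 1); the endpoints 0 and 1 are never evaluated."""
    h = 1.0 / n
    return h * sum(func((i + 0.5) * h) for i in range(n))


# Assumption: f is the logistic density, so that with u = F(x)
#   f(F^{-1}(u)) = u (1 - u)   and   f'(F^{-1}(u)) / f(F^{-1}(u)) = 1 - 2u.
def f_at_quantile(u):
    return u * (1.0 - u)


def log_derivative_at_quantile(u):
    return 1.0 - 2.0 * u


# Each score is stored as the pair (J, J').
WILCOXON = (lambda u: 2.0 * u - 1.0, lambda u: 2.0)
VDW = (PHI.inv_cdf, lambda u: 1.0 / PHI.pdf(PHI.inv_cdf(u)))


def second_moment(score):
    return midpoint(lambda u: score[0](u) ** 2)


def are_12(s1, s2):
    """Formula (1.2): integrals of J(F(x)) f'(x), after substituting u = F(x)."""
    k1 = midpoint(lambda u: s1[0](u) * log_derivative_at_quantile(u))
    k2 = midpoint(lambda u: s2[0](u) * log_derivative_at_quantile(u))
    return second_moment(s2) / second_moment(s1) * (k1 / k2) ** 2


def are_13(s1, s2):
    """Formula (1.3): E (J o F)'(X) is the integral of J'(u) f(F^{-1}(u)) on (0, 1)."""
    e1 = midpoint(lambda u: s1[1](u) * f_at_quantile(u))
    e2 = midpoint(lambda u: s2[1](u) * f_at_quantile(u))
    return second_moment(s2) / second_moment(s1) * (e1 / e2) ** 2


via_12 = are_12(WILCOXON, VDW)
via_13 = are_13(WILCOXON, VDW)
# For the logistic density both values should be close to pi / 3.
print(via_12, via_13, math.pi / 3.0)
```

For this particular pair the two routes agree up to quadrature error; the second route only needs the derivative of the score and the density evaluated at quantiles, which is what makes (1.3) convenient for bounding.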

2. Framework, assumptions and formula for the asymptotic efficiency

The asymptotic behavior of rank-based tests is, in general, hard to determine. Uniform local asymptotic normality (ULAN) of a sequence of models nevertheless makes the study of this behavior a tractable problem. When a sequence of location models is ULAN, it is indeed known that the asymptotic distribution under the cumulative distribution function (cdf) $F$ (with corresponding density (pdf) $f$) of a rank test based on the score generating function $J$ (satisfying some regularity conditions) depends on the information quantities $K(J, f)$ and $K(J)$ with, putting $\varphi_f := f'/f$ ($f'$ stands for the almost everywhere derivative of $f$),

$$K(J, f) := \int_0^1 J(u)\, \varphi_f(F^{-1}(u))\,du \quad \text{and} \quad K(J) := \int_0^1 J^2(u)\,du.$$

More specifically, the ULAN property for $f$ allows one to compute the Asymptotic Efficiency of a rank-based test $\phi_J$ based on the score generating function $J$ (denoted $\mathrm{AE}_f(\phi_J)$) in terms of these information quantities; in the general linear model, when testing for the slope, we get

$$\mathrm{AE}_f(\phi_J) = \frac{K^2(J, f)}{K(J)}. \tag{2.4}$$

From here, comparing $\phi_J$ with another rank-based test $\phi_{\tilde{J}}$ based on the score generating function $\tilde{J}$ in terms of Pitman asymptotic relative efficiency can be done simply by taking the ratio of the expressions provided in (2.4); this yields the formula (1.2) provided in the Introduction.
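To make (2.4) concrete, the sketch below approximates $K(J, f)$ and $K(J)$ by a midpoint rule. The Gaussian density is an illustrative assumption of this example, chosen because then $\varphi_f(F^{-1}(u)) = -\Phi^{-1}(u)$; the function names are hypothetical. For the Wilcoxon score the computed efficiency agrees numerically with the value $3/\pi$ attributed to Pitman (1938) in the Introduction.

```python
import math
from statistics import NormalDist

PHI = NormalDist()


def midpoint(func, n=20000):
    """Midpoint rule on (0, 1); the endpoints are never evaluated."""
    h = 1.0 / n
    return h * sum(func((i + 0.5) * h) for i in range(n))


def gaussian_phi_f(u):
    # Assumption: f is the standard Gaussian density, so (f'/f)(x) = -x
    # and therefore phi_f(F^{-1}(u)) = -Phi^{-1}(u).
    return -PHI.inv_cdf(u)


def asymptotic_efficiency(score, phi_f_at_quantile):
    """AE_f(phi_J) = K(J, f)^2 / K(J), as in (2.4)."""
    k_jf = midpoint(lambda u: score(u) * phi_f_at_quantile(u))
    k_j = midpoint(lambda u: score(u) ** 2)
    return k_jf ** 2 / k_j


ae_wilcoxon = asymptotic_efficiency(lambda u: 2.0 * u - 1.0, gaussian_phi_f)
ae_vdw = asymptotic_efficiency(PHI.inv_cdf, gaussian_phi_f)
print(ae_wilcoxon, 3.0 / math.pi)  # Wilcoxon score under Gaussian f
print(ae_vdw)                      # van der Waerden score under Gaussian f
```

Taking the ratio `ae_wilcoxon / ae_vdw` gives the corresponding Pitman ARE at the Gaussian model, in line with the ratio construction described above.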

A necessary condition on $f$ for LAN to hold is that $f^{1/2}$ be differentiable in quadratic mean (see for instance van der Vaart, 1998, Theorem 7.2). As shown in Hallin and Paindaveine (2002), this quadratic mean differentiability condition is strictly equivalent to the condition that $f^{1/2} \in W^{1,2}(\mathbb{R})$, the Sobolev space of order 1 on $L^2(\mathbb{R})$. These considerations explain why we will, from here onwards, work with densities satisfying the following assumption.

Assumption A. The square root $f^{1/2}$ of the density $f$ is in $W^{1,2}(\mathbb{R})$ and admits the weak $L^2$ derivative $(f^{1/2})'$.

It is easy to verify that Assumption A directly entails that the Fisher information for location is finite, i.e. that $I_f := \int_0^1 \varphi_f^2(F^{-1}(u))\,du < \infty$. We stress the fact that, through Assumption A, we do not require moment conditions on $f$, and even linear models with heavy-tailed errors can be shown to be quadratic mean differentiable, irrespective of the tail index (see Hallin et al., 2011, Lemma 2.1).

As for the score function J we will need the following condition.

Assumption B. The score generating function $J : [0, 1] \to \mathbb{R}$ is the difference between two monotone increasing square integrable functions.

Assumption B is sufficient to ensure that the corresponding rank-based statistic possesses a “Hájek-type” asymptotic representation, with asymptotic power given in (2.4) (see, e.g., Hallin and Paindaveine (2002)). Note that if $g$ is the pdf of a distribution which satisfies Assumption A then the corresponding locally optimal score function is $J_g(x) = \varphi_g(G^{-1}(x))$; this score function does not necessarily satisfy Assumption B, although there are many important instances (log-concave densities, for instance) where it does.

For the sake of illustration take $g(x) = e^{-\sqrt{2}|x|}/\sqrt{2}$, the Laplace (double exponential) distribution standardized to have mean 0 and variance 1. This distribution is log-concave and satisfies Assumption A. The corresponding score function $\varphi_g(G^{-1}(x))$ is equivalent to $J(x) = \mathrm{sign}(x - 1/2)$, the Laplace score, which satisfies Assumption B and is used to construct the sign test. The asymptotic efficiency of this test is easily computed using (2.4): since $K(J) = 1$ and $K(J, f) = -2 f(x_{1/2})$, we get

$$\mathrm{AE}_f(\phi_{\mathrm{Lap}}) = 4 f(x_{1/2})^2$$

with $x_{1/2}$ denoting the median of the distribution $f$. It is straightforward to bound this expression in terms of the class of distributions $f$ under study. As a second example, taking $g = \phi$, the standard Gaussian density, yields $J(x) = \Phi^{-1}(x)$, the van der Waerden score function (with $\Phi$ the Gaussian cdf). This score function is used to construct the van der Waerden test; the latter is (asymptotically) equivalent to the Fisher-Yates or normal scores test. Again using (2.4), and noting that $K(J) = 1$, we get

$$\mathrm{AE}_f(\phi_{\mathrm{vdW}}) = \left( \int_{-\infty}^{\infty} \Phi^{-1}(F(x))\, f'(x)\,dx \right)^2.$$

This expression is hard to quantify and bound in terms of $f$. It is tempting to apply integration by parts to obtain a more agreeable expression. As we propose to show in this paper, the resulting formula is indeed more amenable to comparisons. Moreover we show how the LAN framework with Assumptions A and B above suffices to justify this manipulation.

We first state a result from analysis which, though perhaps standard, we have not found in the literature. We therefore provide a proof.

Lemma 2.1. Let $f$ be a density with cdf $F$ such that Assumption A holds. Let $J$ be a score generating function satisfying Assumption B. Then

$$\lim_{|x| \to \infty} J(F(x))\, f(x) = 0.$$
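The limit in Lemma 2.1 can be illustrated numerically. The sketch below assumes, purely for illustration, the standard Cauchy density for $f$ (heavy tails, but finite Fisher information for location) and the van der Waerden score $J = \Phi^{-1}$ (monotone and square integrable); neither choice is prescribed by the lemma, and the function names are hypothetical. It evaluates $J(F(x)) f(x)$ along the left tail, the right tail being identical in absolute value by symmetry.

```python
import math
from statistics import NormalDist

PHI = NormalDist()


def cauchy_left_tail(x):
    """F(-x) for the standard Cauchy cdf and x > 0, written without cancellation."""
    return math.atan(1.0 / x) / math.pi


def cauchy_pdf(x):
    return 1.0 / (math.pi * (1.0 + x * x))


def product_at(x):
    """J(F(-x)) f(-x) with J the van der Waerden score and f the Cauchy density."""
    return PHI.inv_cdf(cauchy_left_tail(x)) * cauchy_pdf(x)


points = [10.0 ** k for k in range(1, 7)]
values = [product_at(x) for x in points]
for x, v in zip(points, values):
    print(f"x = -{x:.0e}: J(F(x)) f(x) = {v:.3e}")
```

The score itself diverges in the tails, but only logarithmically in $|x|$ here, so the quadratic decay of the density dominates and the product shrinks toward zero.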

Proof. The result follows once it is shown that $J(F(x)) f(x) \in W^{1,1}(\mathbb{R})$ (see e.g. Brezis, 2010, Corollary 8.9). Therefore, we show in the sequel that $J(F(x)) f(x)$ belongs to $L^1(\mathbb{R})$ and that the weak derivative $(J(F(x)) f(x))'$ also belongs to $L^1(\mathbb{R})$. First, note that the substitution $u = F(x)$ and the Cauchy-Schwarz inequality directly entail that

$$\int_{\mathbb{R}} |J(F(x)) f(x)|\,dx = \int_0^1 |J(u)|\,du \le \|J\|_{L^2(0,1)},$$

which is finite in view of Assumption B. As a direct consequence, $J(F(x)) f(x)$ is in $L^1(\mathbb{R})$. Therefore, it remains to show that the weak derivative

$$J'(F(x))\, f^2(x) + J(F(x))\, f'(x) \tag{2.5}$$

of $J(F(x)) f(x)$ is in $L^1(\mathbb{R})$. For the second term in (2.5), using the substitution $u = F(x)$ and the Cauchy-Schwarz inequality, we obtain that

$$\int_{\mathbb{R}} |J(F(x))\, f'(x)|\,dx = \int_0^1 \left| J(u)\, \frac{f'(F^{-1}(u))}{f(F^{-1}(u))} \right| du \le \|J\|_{L^2(0,1)}\, I_f^{1/2},$$

which is clearly finite in view of both Assumptions A and B. Therefore, it remains to show that the first term in (2.5) belongs to $L^1(\mathbb{R})$ to complete the proof. First, note that the substitution $u = F(x)$ again yields

$$\int_{\mathbb{R}} |J'(F(x))|\, f^2(x)\,dx = \int_0^1 |J'(u)|\, f(F^{-1}(u))\,du.$$

It is easy to verify that if Assumption A holds (that is, if $f^{1/2} \in W^{1,2}(\mathbb{R})$), then $f \circ F^{-1}$ belongs to $W_0^{1,2}(0, 1) := \{u \in W^{1,2}(0, 1) \mid \lim_{x \to 0} u(x) = \lim_{x \to 1} u(x) = 0\}$. Now, it is well known that $C_0^\infty(]0, 1[)$, the set of all $C^\infty$ and compactly supported functions on $(0, 1)$, is dense in $W_0^{1,2}(0, 1)$ (see Theorem 8.2 in Brezis (2010)). Consider a sequence of functions $\psi_n \in C_0^\infty(]0, 1[)$, $n \in \mathbb{N}$. Noting that for any $\psi_n \in C_0^\infty(]0, 1[)$, $\lim_{x \to 0} J(x) \psi_n(x) = \lim_{x \to 1} J(x) \psi_n(x) = 0$, by integrating by parts and using the Cauchy-Schwarz inequality, we obtain that for all $n$,

$$\int_0^1 |J'(u)|\, \psi_n(u)\,du \le \|J\|_{L^2(0,1)}\, \|\psi_n(u)\|_{L^2(0,1)},$$

which is finite in view of Assumption B. The result follows by using the fact that $C_0^\infty(]0, 1[)$ is dense in $W_0^{1,2}(0, 1)$.

A direct consequence of Lemma 2.1 is the following result which will be the starting point for our analysis.

Proposition 2.1. Let $f$ be a density with cdf $F$ satisfying Assumption A and $J$ a score generating function satisfying Assumption B. Then the Asymptotic Efficiency (AE) of a rank-based test based on $J$ when the underlying density is $f$ is given by

$$\mathrm{AE}_f(J) = \left( \int_{-\infty}^{\infty} J'(F(x))\, f^2(x)\,dx \right)^2 \Big/ \int_0^1 J^2(u)\,du. \tag{2.6}$$

Taking ratios of the expressions provided by Proposition 2.1 yields (1.3) for any combination of targets and scores which satisfy the requirements.
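The integration by parts behind Proposition 2.1 can be checked numerically on a smooth example. The sketch below assumes the standard Gaussian density for $f$ and the score $J(u) = \sin(2\pi u)$; both are illustrative choices and the helper names are hypothetical. It compares $\int J(F(x)) f'(x)\,dx$ with $-\int J'(F(x)) f^2(x)\,dx$, which should coincide because the boundary term $J(F(x)) f(x)$ vanishes at infinity.

```python
import math
from statistics import NormalDist

PHI = NormalDist()


def midpoint_on_line(func, lo=-10.0, hi=10.0, n=4000):
    """Midpoint rule on [lo, hi]; the Gaussian tails beyond +-10 are negligible."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))


def score(u):             # J(u) = sin(2 pi u)
    return math.sin(2.0 * math.pi * u)


def score_derivative(u):  # J'(u) = 2 pi cos(2 pi u)
    return 2.0 * math.pi * math.cos(2.0 * math.pi * u)


def f(x):                 # assumption: standard Gaussian density
    return PHI.pdf(x)


def f_prime(x):           # derivative of the Gaussian density
    return -x * PHI.pdf(x)


# Before integration by parts: integral of J(F(x)) f'(x) dx.
before = midpoint_on_line(lambda x: score(PHI.cdf(x)) * f_prime(x))
# After integration by parts: integral of J'(F(x)) f(x)^2 dx.
after = midpoint_on_line(lambda x: score_derivative(PHI.cdf(x)) * f(x) ** 2)

second_moment = 0.5  # integral of sin^2(2 pi u) over (0, 1)
ae = after ** 2 / second_moment  # AE_f(J) in the form (2.6)
print(before, -after)  # equal up to quadrature error
print(ae)
```

Because only the square of the integral enters the efficiency, the sign change produced by the integration by parts has no effect on the final value.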


3. Bounding AREs

In this section, we use Proposition 2.1 to obtain bounds on the asymptotic relative efficiency between different rank-based tests.

3.1. A general formula

Wilcoxon’s two-sample and one-sample tests for location are based on the so-called Wilcoxon score generating function given by the linear function $J_{\mathrm{Wil}}(x) = 2x - 1$. Using together the fact that any pdf $f$ which satisfies Assumption A is such that $\int_{-\infty}^{\infty} f^2(x)\,dx$ is finite and the fact that $E J_{\mathrm{Wil}}(U)^2 = 1/3$, we directly obtain, applying (2.6), that

$$\mathrm{AE}_f(J_{\mathrm{Wil}}) = 12 \left( \int_{-\infty}^{\infty} f^2(x)\,dx \right)^2$$

for all sufficiently well-behaved densities $f$. (This formula is, of course, not new and has already been obtained by many authors before us.) The general form of (2.6) then suggests that the Wilcoxon score is an appropriate basis for comparison and, letting $\kappa_1(J, f) = \sup_x |J'(F(x))|^2 / E J^2(U)$ and $\kappa_2(J, f) = \inf_x |J'(F(x))|^2 / E J^2(U)$, we obtain

$$\frac{\kappa_2(J, f)}{12} \le \mathrm{ARE}_f(J/J_{\mathrm{Wil}}) = \frac{\mathrm{AE}_f(\phi_J)}{\mathrm{AE}_f(\phi_{J_{\mathrm{Wil}}})} \le \frac{\kappa_1(J, f)}{12}. \tag{3.7}$$

While necessarily coarse, the following examples show that the bounds (3.7) are nevertheless (perhaps surprisingly) tight.

Take $J_{\mathrm{vdW}}(x) = \Phi^{-1}(x)$, the van der Waerden (normal) scores already mentioned above. Then $E(J_{\mathrm{vdW}}^2(U)) = 1$ and $\sqrt{2\pi} \le (J_{\mathrm{vdW}})'(F(x)) \le \infty$ for all $x$ and all cdf $F$. Then, from (3.7), we deduce

$$\pi/6 \le \mathrm{ARE}_f(J_{\mathrm{vdW}}/J_{\mathrm{Wil}}) \le \infty; \tag{3.8}$$

this is (1.1) from Hodges and Lehmann (1961). The bounds in (3.8) are “tight”, in the sense that they are the best attainable without making further regularity assumptions on the class of densities under consideration.


Next take $J_{\mathrm{Cau}}(x) = \sin(2\pi x)$, the score function associated to the Cauchy distribution. Then $E(J_{\mathrm{Cau}}^2(U)) = 1/2$ and $0 \le |(J_{\mathrm{Cau}})'(F(x))| \le 2\pi$ for all $x$ and all cdf $F$. Then, from (3.7), we get

$$0 \le \mathrm{ARE}_f(J_{\mathrm{Cau}}/J_{\mathrm{Wil}}) \le 2\pi^2/3. \tag{3.9}$$

We will exhibit densities which show that the bounds in (3.9) are tight, in the sense that they are the best attainable without making further regularity assumptions on the class of densities under consideration.
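The constants behind (3.9) can be reproduced with a few lines of Python. The sketch below approximates the supremum and infimum of $|J'(u)|^2 / E J^2(U)$ over a grid of $u \in (0,1)$ for the Cauchy score, then checks the resulting bounds against the actual ARE for one density. The logistic density is an illustrative assumption of this example (for which $f(F^{-1}(u)) = u(1-u)$), and the names `kappa_sup`, `kappa_inf` and `ae` are hypothetical.

```python
import math


def grid(n=20000):
    """Midpoints of n equal cells of (0, 1)."""
    return [(i + 0.5) / n for i in range(n)]


def cauchy_score_derivative(u):
    return 2.0 * math.pi * math.cos(2.0 * math.pi * u)


U = grid()
second_moment = sum(math.sin(2.0 * math.pi * u) ** 2 for u in U) / len(U)  # E J^2(U)

# Grid approximations of sup and inf of |J'(u)|^2 / E J^2(U); for a continuous
# cdf with range (0, 1) these are the two constants entering (3.7).
ratios = [cauchy_score_derivative(u) ** 2 / second_moment for u in U]
kappa_sup, kappa_inf = max(ratios), min(ratios)


def ae(score_derivative, score_second_moment):
    """AE_f(J) from (2.6) for the logistic density, in the coordinate u = F(x)."""
    mean_derivative = sum(score_derivative(u) * u * (1.0 - u) for u in U) / len(U)
    return mean_derivative ** 2 / score_second_moment


are_cauchy_wilcoxon = ae(cauchy_score_derivative, second_moment) / ae(lambda u: 2.0, 1.0 / 3.0)

print(kappa_inf / 12.0, are_cauchy_wilcoxon, kappa_sup / 12.0)
print(2.0 * math.pi ** 2 / 3.0)  # upper bound in (3.9)
```

For the logistic density the ARE sits well inside the interval, which is consistent with the bounds being attained only along sequences of rather extreme densities.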

Finally choose $J_{\mathrm{Gum}}(x) = 1 + \ln(1 - x)$, the score function associated to the Gumbel distribution. Then $E(J_{\mathrm{Gum}}^2(U)) = 1$ and $-\infty \le (J_{\mathrm{Gum}})'(F(x)) \le -1$ for all $x$ and all cdf $F$. Then, from (3.7), we get

$$0 \le \mathrm{ARE}_f(J_{\mathrm{Wil}}/J_{\mathrm{Gum}}) \le 12. \tag{3.10}$$

Again we will prove tightness of these bounds.

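When comparing two tests, one can always use the chain rule $\mathrm{ARE}_f(J_1/J_2) = \mathrm{ARE}_f(J_1/J_3)\, \mathrm{ARE}_f(J_3/J_2)$. As a consequence, working from (3.7) it is always possible to compare any pair of scores. For example, multiplying the upper bounds in (3.9) and (1.1), we readily get

$$0 \le \mathrm{ARE}_f(J_{\mathrm{Cau}}/J_{\mathrm{vdW}}) \le \frac{2\pi^2}{3} \cdot \frac{6}{\pi} = 4\pi \tag{3.11}$$

for comparing Cauchy and van der Waerden scores.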
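The chain rule can be exercised numerically as follows. The sketch assumes the standard Gaussian density for $f$, an illustrative choice, and computes each asymptotic efficiency from the form given in Proposition 2.1 in the coordinate $u = F(x)$; the names `ae`, `direct` and `chained` are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist()
N = 20000
U = [(i + 0.5) / N for i in range(N)]  # midpoints of (0, 1)


def gaussian_f_at_quantile(u):
    # Assumption: f is the standard Gaussian density, so f(F^{-1}(u)) = phi(Phi^{-1}(u)).
    return PHI.pdf(PHI.inv_cdf(u))


F_Q = [gaussian_f_at_quantile(u) for u in U]


def ae(score, score_derivative):
    """AE_f(J) from (2.6), with both integrals taken over u = F(x) in (0, 1)."""
    mean_derivative = sum(score_derivative(u) * q for u, q in zip(U, F_Q)) / N
    second_moment = sum(score(u) ** 2 for u in U) / N
    return mean_derivative ** 2 / second_moment


ae_wil = ae(lambda u: 2.0 * u - 1.0, lambda u: 2.0)
ae_vdw = ae(PHI.inv_cdf, lambda u: 1.0 / PHI.pdf(PHI.inv_cdf(u)))
ae_cau = ae(lambda u: math.sin(2.0 * math.pi * u),
            lambda u: 2.0 * math.pi * math.cos(2.0 * math.pi * u))

direct = ae_cau / ae_vdw
chained = (ae_cau / ae_wil) * (ae_wil / ae_vdw)
print(direct, chained, 4.0 * math.pi)  # the last value is the bound in (3.11)
```

Since every ARE here is a ratio of efficiencies computed under the same density, the intermediate Wilcoxon term cancels exactly; passing through Wilcoxon is useful for bounds, not for the value itself.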


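3.2. Proving tightness

We follow a manipulation inspired by Hodges and Lehmann (1961). Take $g$ a symmetric pdf satisfying Assumption A such that $J_g(x) = \varphi_g(G^{-1}(x))$ satisfies Assumption B. Suppose furthermore that $g$ is sufficiently regular to ensure validity of all subsequent manipulations (this is the case for all targets considered above) and let $l_1 = \lim_{\epsilon \to 0} g(\epsilon)/\varphi_g'(\epsilon)$ and $l_2 = \lim_{\epsilon \to \infty} g(\epsilon)/\varphi_g'(\epsilon)$. Define (for $a, \epsilon > 0$) the transformed distribution

$$F_{a,\epsilon}(x) = G(x)\, I_{\{0 \le x \le \epsilon\}} + G(y)\, I_{\{x > \epsilon\}},$$

where $y = a(x - \epsilon) + \epsilon$ and $F_{a,\epsilon}$ is defined by symmetry for $x \le 0$. The associated density is

$$f_{a,\epsilon}(x) = g(x)\, I_{\{0 \le x \le \epsilon\}} + a\, g(y)\, I_{\{x > \epsilon\}}.$$

Using Proposition 2.1 we easily obtain

$$\mathrm{ARE}_{a,\epsilon}(J_{\mathrm{Wil}}/J_g) = 12\, I_g \left( \frac{\int_0^\epsilon g^2(x)\,dx + a \int_\epsilon^\infty g^2(x)\,dx}{\int_0^\epsilon \varphi_g'(x) g(x)\,dx + a \int_\epsilon^\infty \varphi_g'(x) g(x)\,dx} \right)^2,$$

with $I_g = E(\varphi_g(X)^2)$ the (location) Fisher information for $g$. Taking $a \to 0$ leaves the ratio $\int_0^\epsilon g^2 / \int_0^\epsilon \varphi_g' g$, which converges to $l_1$ as $\epsilon \to 0$, so that $\mathrm{ARE}_{a,\epsilon}(J_{\mathrm{Wil}}/J_g) \to 12 I_g (l_1)^2$. Likewise, taking $a \to \infty$ leaves the ratio $\int_\epsilon^\infty g^2 / \int_\epsilon^\infty \varphi_g' g$, which converges to $l_2$ as $\epsilon \to \infty$, so that $\mathrm{ARE}_{a,\epsilon}(J_{\mathrm{Wil}}/J_g) \to 12 I_g (l_2)^2$. This observation yields a number of interesting findings.

Take $g = \phi$, the Gaussian density; then $J_g(x) = \Phi^{-1}(x)$, $|l_1| = 1/\sqrt{2\pi}$ and $l_2 = 0$. Consequently $0 \le \mathrm{ARE}_f(J_{\mathrm{Wil}}/J_{\mathrm{vdW}}) \le 6/\pi$ with all intermediate values being attained, by continuity.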


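Taking $g_\nu$ to be the density of a Student-t random variable with $\nu$ degrees of freedom we get $I_{g_\nu} = (\nu + 1)/(\nu + 3)$, $g_\nu(0) = \Gamma((\nu + 1)/2)/(\sqrt{\nu\pi}\, \Gamma(\nu/2))$ and $\varphi_{g_\nu}'(0) = -(\nu + 1)/\nu$, hence $|l_1| = \nu\, \Gamma((\nu + 1)/2)/((\nu + 1) \sqrt{\nu\pi}\, \Gamma(\nu/2))$, and $l_2 = 0$ for $\nu > 1$. The limit attached to $l_1$ is therefore

$$\mathrm{ARE}_f(J_{\mathrm{Wil}}/J_{g_\nu}) = \frac{12\, \nu}{(\nu + 1)(\nu + 3)} \cdot \frac{\Gamma^2((\nu + 1)/2)}{\pi\, \Gamma^2(\nu/2)},$$

which tends to $6/\pi$ as $\nu \to \infty$, in agreement with the Gaussian case, while the limit attached to $l_2$ is $\mathrm{ARE}_f(J_{\mathrm{Wil}}/J_{g_\nu}) = 0$.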
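For the Gaussian case of the construction in this subsection, the displayed expression for $\mathrm{ARE}_{a,\epsilon}(J_{\mathrm{Wil}}/J_g)$ can be evaluated in closed form, since $\varphi_g' = -1$, $I_g = 1$ and the integrals of $g$ and $g^2$ reduce to error functions. The sketch below is a minimal illustration under these assumptions; the function name `are_wilcoxon_vs_vdw` and the particular values of $a$ and $\epsilon$ are hypothetical choices.

```python
import math


def are_wilcoxon_vs_vdw(a, eps):
    """ARE_{a,eps}(J_Wil / J_g) for g the standard Gaussian density.

    For this g one has phi_g'(x) = -1 and I_g = 1. The minus sign in the
    denominator is dropped because the ratio is squared.
    """
    c = 1.0 / (4.0 * math.sqrt(math.pi))
    g2_inner = c * math.erf(eps)                     # integral of g^2 over [0, eps]
    g2_outer = c * math.erfc(eps)                    # integral of g^2 over [eps, inf)
    g_inner = 0.5 * math.erf(eps / math.sqrt(2.0))   # integral of g over [0, eps]
    g_outer = 0.5 * math.erfc(eps / math.sqrt(2.0))  # integral of g over [eps, inf)
    ratio = (g2_inner + a * g2_outer) / (g_inner + a * g_outer)
    return 12.0 * ratio ** 2


unchanged = are_wilcoxon_vs_vdw(1.0, 1.0)      # a = 1 leaves g untouched
near_upper = are_wilcoxon_vs_vdw(1e-8, 1e-3)   # a and eps both small
near_lower = are_wilcoxon_vs_vdw(1e12, 5.0)    # a and eps both large

print(unchanged, 3.0 / math.pi)
print(near_upper, 6.0 / math.pi)
print(near_lower)
```

With $a = 1$ the transformation is the identity and the value is the Gaussian efficiency $3/\pi$; the other two parameter choices land close to the two ends of the interval $[0, 6/\pi]$.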

4. ARE bounds and stochastic ordering


In this section, we would like to find a link between stochastic ordering and the computation of our AREs. From Loh (1984), we know that there exists a link between the ARE of the Wilcoxon test with respect to the van der Waerden test and stochastic ordering. Let F and G denote two cdfs with median zero. In the sequel, we write as in Loh (1984) F 0. It is well known from Loh (1984) that if F and G are strongly unimodal with densities f and g (or “IFR”) then, we have that if F