
Austral. J. Statist., 26 (2), 1984, 142-150

GOODNESS-OF-FIT TESTS FOR FISHER'S DISTRIBUTION ON THE SPHERE*

N. I. FISHER AND D. J. BEST
CSIRO Division of Mathematics & Statistics, Lindfield, NSW 2070, Australia

Summary

A set of three goodness-of-fit procedures is proposed to investigate the adequacy of fit of Fisher's distribution on the sphere as a model for a given sample of spherical data. The procedures are all based on standard tests using the empirical distribution function.

Key words: Directional data; Kolmogorov-Smirnov tests.

1. Introduction

Data which take the form of samples of unit three-dimensional vectors, or equivalently, samples of points on the unit sphere, arise in a variety of scientific disciplines such as Structural Geology, Palaeomagnetism, and Astrophysics. The only statistical model in common use for unimodal data of this sort is Fisher's distribution on the sphere, which has probability density

$$
\frac{\kappa}{4\pi\sinh\kappa}\,\sin\theta\,\exp\{\kappa(\sin\theta\sin\alpha\cos(\phi-\beta)+\cos\theta\cos\alpha)\},
\qquad 0\le\theta,\alpha\le\pi,\quad 0\le\phi,\beta<2\pi,\quad \kappa>0.
$$

In this expression, $(\alpha,\beta)$ represents the mean direction of the distribution, and $\kappa$ a measure of the concentration about this mean direction; we denote this distribution by $F((\alpha,\beta),\kappa)$.

Despite its widespread use, little attention has been paid to determining whether this model is in fact suitable for any given sample of data. However, some recent papers (Lewis & Fisher, 1982; Fisher, Lewis & Willcox, 1981) have proposed, respectively, some graphical methods of assessing the goodness-of-fit of the model, and some formal tests for discordancy. In this paper, we present three formal goodness-of-fit procedures based on the informal methods of Lewis & Fisher. These three procedures are designed to detect specific types of departure from the Fisherian model. In Section 2, we give the definitions of the test statistics. Section 3 contains the appropriate distribution theory, Section 4 examines the power of the procedures, and Section 5 gives some examples of their application.

* Manuscript received May 19, 1982; revised September 16, 1983.

2. Definitions of the Test Statistics

Let $(\Theta,\Phi)$ be a random vector with the $F((\alpha,\beta),\kappa)$ distribution. Lewis & Fisher (1982) introduced three graphical procedures for informal inference about a sample of spherical data, which utilize the following properties of this distribution:

(i) If $(\Theta,\Phi)$ is rotated to $(\Theta',\Phi')$ with mean direction $(0,0)$ using the transformation

$$
\begin{pmatrix}\sin\theta'\cos\phi'\\ \sin\theta'\sin\phi'\\ \cos\theta'\end{pmatrix}
=
\begin{pmatrix}
\cos\alpha\cos\beta & \cos\alpha\sin\beta & -\sin\alpha\\
-\sin\beta & \cos\beta & 0\\
\sin\alpha\cos\beta & \sin\alpha\sin\beta & \cos\alpha
\end{pmatrix}
\begin{pmatrix}\sin\theta\cos\phi\\ \sin\theta\sin\phi\\ \cos\theta\end{pmatrix},
$$

then $\Theta'$ and $\Phi'$ are independent, with $\Phi'$ being uniformly distributed $U(0,2\pi)$ and $\Theta'$ having density proportional to $\sin\theta'\,e^{\kappa\cos\theta'}$. Thus $C' = 1-\cos\Theta'$ has the truncated exponential density $\kappa e^{-\kappa c'}/(1-e^{-2\kappa})$, $0\le c'\le 2$, and so can be regarded as being exponentially distributed $E(\kappa)$ if $\kappa\ge 3$.

(ii) If $(\Theta,\Phi)$ is rotated to $(\Theta'',\Phi'')$ by the transformation

$$
\begin{pmatrix}\sin\theta''\cos\phi''\\ \sin\theta''\sin\phi''\\ \cos\theta''\end{pmatrix}
=
\begin{pmatrix}
\cdots & \cdots & \cdots\\
\cdots & \cdots & \cdots\\
\cos\alpha\cos\beta & \cos\alpha\sin\beta & -\sin\alpha
\end{pmatrix}
\begin{pmatrix}\sin\theta\cos\phi\\ \sin\theta\sin\phi\\ \cos\theta\end{pmatrix},
$$

and $\Phi''$ is re-defined to have the range $[-\pi,\pi)$ instead of $[0,2\pi)$, then (Lewis & Fisher, 1982) $\Phi''\sqrt{\sin\Theta''}$ is approximately normally distributed $N(0,1/\kappa)$, this approximation being adequate for $\kappa\ge 3$.

The probability plotting procedures are based on these derived variables $C'$, $\Phi'$ and $\Phi''\sqrt{\sin\Theta''}$. Given a sample $(\theta_1,\phi_1),\ldots,(\theta_n,\phi_n)$ purporting to be drawn from an $F((\alpha,\beta),\kappa)$ distribution, let $(\hat\alpha,\hat\beta)$ be the estimates of $(\alpha,\beta)$ calculated from

$$
R\sin\hat\alpha\cos\hat\beta=\sum l_i,\qquad R\sin\hat\alpha\sin\hat\beta=\sum m_i,\qquad R\cos\hat\alpha=\sum n_i,
$$

where $l_i=\sin\theta_i\cos\phi_i$, $m_i=\sin\theta_i\sin\phi_i$, $n_i=\cos\theta_i$, and $R^2=(\sum l_i)^2+(\sum m_i)^2+(\sum n_i)^2$. Apply the rotation defined in (i), using $(\hat\alpha,\hat\beta)$ instead of $(\alpha,\beta)$, to each $(\theta_i,\phi_i)$ to obtain $(\theta'_i,\phi'_i)$, $1\le i\le n$. The set of values $S_E=\{c'_i=1-\cos\theta'_i,\ 1\le i\le n\}$ can then be tested for exponentiality, and the set $S_U=\{\phi'_i,\ 1\le i\le n\}$ for uniformity. Next, apply the rotation defined in (ii) using $\alpha=\hat\alpha$, $\beta=\hat\beta$, and test $S_N=\{z_i=\phi''_i\sqrt{\sin\theta''_i},\ 1\le i\le n\}$ for normality.

Procedures based on the set $S_E$ provide a general check on the underlying co-latitude ($\Theta$) distribution but, in addition, are useful for detecting the particular sort of departure from the hypothesized model caused by contamination, that is, data from a Fisher distribution with a different mean direction. A physical mechanism by which this problem can arise in palaeomagnetic work is described in Fisher, Lewis & Willcox (1981). Procedures based on $S_U$ are designed to check the assumption of rotational symmetry of the distribution about its mean direction; and those based on $S_N$ to detect correlation between the co-latitude and longitude ($\Phi$), although they will also be sensitive to the presence of outlying values in the sample.

The three test statistics investigated in some detail were

(i) Kolmogorov-Smirnov statistic $D_n$, to test $c'_1,\ldots,c'_n$ for exponentiality when the scale parameter has to be estimated.
(ii) Kuiper statistic $V_n$, to test $\phi'_1,\ldots,\phi'_n$ for uniformity.
(iii) Kolmogorov-Smirnov statistic $D_n$, to test $z_1,\ldots,z_n$ for normality, when the mean and variance have to be estimated.
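The construction of the three sets can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names are hypothetical, and because the full matrix for rotation (ii) is not reproduced here, the sketch assumes that rotation (ii) is a relabelling of the axes of rotation (i) that places the estimated mean direction at co-latitude $\pi/2$ and longitude 0.

```python
import numpy as np

def direction_cosines(theta, phi):
    """Rows (l_i, m_i, n_i) for co-latitudes theta and longitudes phi (radians)."""
    theta, phi = np.asarray(theta, float), np.asarray(phi, float)
    return np.column_stack((np.sin(theta) * np.cos(phi),
                            np.sin(theta) * np.sin(phi),
                            np.cos(theta)))

def mean_direction(theta, phi):
    """Estimates (alpha_hat, beta_hat) and the resultant length R."""
    s = direction_cosines(theta, phi).sum(axis=0)
    R = float(np.linalg.norm(s))
    alpha_hat = float(np.arccos(np.clip(s[2] / R, -1.0, 1.0)))
    beta_hat = float(np.arctan2(s[1], s[0]) % (2.0 * np.pi))
    return alpha_hat, beta_hat, R

def rotation_to_pole(alpha, beta):
    """Matrix of rotation (i): maps the direction (alpha, beta) to the pole."""
    ca, sa, cb, sb = np.cos(alpha), np.sin(alpha), np.cos(beta), np.sin(beta)
    return np.array([[ca * cb, ca * sb, -sa],
                     [-sb, cb, 0.0],
                     [sa * cb, sa * sb, ca]])

def derived_sets(theta, phi):
    """Return arrays (c', phi', z) corresponding to S_E, S_U and S_N."""
    alpha_hat, beta_hat, _ = mean_direction(theta, phi)
    A = rotation_to_pole(alpha_hat, beta_hat)
    u, v, w = (direction_cosines(theta, phi) @ A.T).T   # rotated coordinates
    c = 1.0 - np.clip(w, -1.0, 1.0)            # c'_i = 1 - cos(theta'_i)
    phi1 = np.arctan2(v, u) % (2.0 * np.pi)    # phi'_i, reduced modulo 2*pi
    # ASSUMPTION: rotation (ii) puts the mean direction at (pi/2, 0), so that
    # cos(theta'') is the first rotated coordinate and phi'' lies near zero.
    theta2 = np.arccos(np.clip(u, -1.0, 1.0))
    phi2 = np.arctan2(-v, w)                   # phi'' in (-pi, pi]
    z = phi2 * np.sqrt(np.sin(theta2))         # z_i = phi''_i * sqrt(sin(theta''_i))
    return c, phi1, z
```

A useful sanity check on any implementation is that the values $c'_i$ sum to $n - R$, since the projections of the sample onto its own mean direction sum to the resultant length.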

In general, if $x_1,\ldots,x_n$ is a sample of data for which the distribution function of the underlying population is postulated to be $F(x)$, the computational forms for $D_n$ and $V_n$ are:

$$
D_n=\max(D_n^+,D_n^-),\qquad V_n=D_n^+ + D_n^-,
$$

where

$$
D_n^+=\max_{1\le i\le n}\{i/n-F(x_{(i)})\},\qquad D_n^-=\max_{1\le i\le n}\{F(x_{(i)})-(i-1)/n\},
$$

and $x_{(1)}\le\cdots\le x_{(n)}$ are the ordered values of $x_1,\ldots,x_n$. For the special cases above, we have:

(i) $x_i=c'_i$, $F(x)=1-\exp(-\hat\kappa x)$, $\hat\kappa=(n-1)/\sum c'_i$
(ii) $x_i=\phi'_i$, $F(x)=x/2\pi$
(iii) $x_i=z_i$, $F(x)=(2\pi\tau^2)^{-1/2}\int_{-\infty}^{x}\exp(-t^2/2\tau^2)\,dt$, $\tau^2=\sum x_i^2/n$

[Note: $\hat\kappa=(n-1)/\sum c'_i=(n-1)/\sum\{1-\cos\theta'_i\}=(n-1)/(n-R)$, the usual approximation to the maximum likelihood estimate of $\kappa$.]
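The computational forms above translate directly into code. The following is a minimal sketch, with hypothetical function names, that evaluates $D_n^+$, $D_n^-$, $D_n$ and $V_n$ for an arbitrary hypothesised distribution function and then specialises to cases (i) to (iii).

```python
import numpy as np
from math import erf, sqrt

def edf_statistics(x, cdf):
    """Return (D_plus, D_minus, D_n, V_n) for sample x and hypothesised CDF."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    F = cdf(x)
    d_plus = float(np.max(i / n - F))
    d_minus = float(np.max(F - (i - 1) / n))
    return d_plus, d_minus, max(d_plus, d_minus), d_plus + d_minus

def d_exponential(c):
    """Case (i): D_n for c'_i with kappa_hat = (n - 1) / sum(c'_i)."""
    c = np.asarray(c, dtype=float)
    kappa_hat = (c.size - 1) / c.sum()
    return edf_statistics(c, lambda x: 1.0 - np.exp(-kappa_hat * x))[2]

def v_uniform(phi):
    """Case (ii): Kuiper's V_n for phi'_i against F(x) = x / (2*pi)."""
    return edf_statistics(phi, lambda x: x / (2.0 * np.pi))[3]

def d_normal(z):
    """Case (iii): D_n for z_i against N(0, tau^2) with tau^2 = sum(z_i^2) / n."""
    z = np.asarray(z, dtype=float)
    tau = sqrt(float(np.mean(z ** 2)))
    normal_cdf = np.vectorize(lambda t: 0.5 * (1.0 + erf(t / (tau * sqrt(2.0)))))
    return edf_statistics(z, normal_cdf)[2]
```

For the three values 0.1, 0.4, 0.7 tested against the uniform distribution on (0, 1), this gives $D_n^+ = 0.3$, $D_n^- = 0.1$, and hence $D_n = 0.3$ and $V_n = 0.4$.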

3. Distributions of the Statistics

Monte Carlo experiments were performed to study the distribution of the three test statistics introduced above. Random vectors $(\Theta,\Phi)$ from an $F((0,0),\kappa)$ distribution were generated, as in Fisher, Lewis & Willcox (1981), by taking

$$
\Theta=2\sin^{-1}\left[-\{\ln(U(1-\lambda)+\lambda)\}/2\kappa\right]^{1/2}\quad\text{and}\quad\Phi=2\pi V,
$$

where $U$, $V$ are independent uniform (0, 1) variables and $\lambda=\exp(-2\kappa)$. Pseudo-random values for $U$ and $V$ were obtained by using the RANF generator on a CDC 7600.
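The generating formula can be sketched as follows. The function name is hypothetical, and NumPy's default generator stands in for RANF, so the stream of pseudo-random numbers will of course differ from the one used in the study.

```python
import numpy as np

def simulate_fisher_pole(n, kappa, rng=None):
    """Draw n pairs (theta, phi) from F((0, 0), kappa) by inversion."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.random(n)
    v = rng.random(n)
    lam = np.exp(-2.0 * kappa)
    # Argument of the inverse sine; clipped to [0, 1] to guard against
    # rounding and against lam underflowing to zero for very large kappa.
    arg = np.clip(-np.log(u * (1.0 - lam) + lam) / (2.0 * kappa), 0.0, 1.0)
    theta = 2.0 * np.arcsin(np.sqrt(arg))
    phi = 2.0 * np.pi * v
    return theta, phi
```

Since $1-\cos\Theta = 2\sin^2(\Theta/2)$, the simulated values of $1-\cos\Theta$ follow the truncated exponential density of Section 2, so their average should be close to $1/\kappa$ when $\kappa$ is not small.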

(i) Stephens (1974) provides the modified form $M_E(D_n)=(D_n-0.2/n)(\sqrt{n}+0.26+0.5/\sqrt{n})$ for the Kolmogorov-Smirnov goodness-of-fit test for exponentiality when a parameter has to be estimated. Initially, for each $(n,\kappa)$ pair ($n$ = 10, 15, 20, 50, 100; $\kappa$ = 3, 5, 10, 500), 1000 pseudo-random $F((0,0),\kappa)$ samples of size $n$ were simulated, the corresponding 1000 values of $M_E(D_n)$ calculated, and the upper 10%, 5% and 1% points compared with Stephens' tabulated values. These being in good agreement, 6,000 pseudo-random $F((0,0),\kappa)$ samples for each of the pairs $(n,\kappa)$ ($n$ = 10, 15, 30; $\kappa$ = 5, 10, 500) were obtained to check for any dependence of $M_E(D_n)$ on $\kappa$; and 4,000 such samples for each $(n,\kappa)$ ($n$ = 40, 50; $\kappa$ = 10) obtained to check any dependence on $n$. In neither case was any dependence manifested. Table 1(a) gives typical results and shows satisfactory agreement with Stephens' values.

TABLE 1

(a) Estimated percentage points of $M_E(D_n)$ for $\kappa$ = 10, $\alpha$ = {0.01, 0.05, 0.1} and $n$ = {10, 15, 30, 40, 50}

n\α         0.1      0.05     0.01
10          0.969    1.074    1.264
15          0.989    1.092    1.312
30          0.990    1.094    1.306
40          0.978    1.074    1.300
50          0.991    1.103    1.274
Stephens    0.990    1.094    1.308

(b) Estimated percentage points of $M_N(D_n)$ for $\kappa$ = 10, $\alpha$ = {0.01, 0.05, 0.1} and $n$ = {10, 15, 30, 40, 50}

n\α         0.1      0.05     0.01
10          0.825    0.896    1.020
15          0.819    0.890    1.023
30          0.807    0.882    1.004
40          0.816    0.890    1.039
50          0.813    0.888    1.011
Stephens    0.819    0.895    1.035
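The modified forms are simple rescalings of the raw statistics, so a test reduces to one multiplication and one comparison. The sketch below uses the constants and upper percentage points quoted in items (i) to (iii) of this section and in the Stephens rows of Table 1; the function and dictionary names are hypothetical.

```python
from math import sqrt

def modified_exponential(d_n, n):
    """M_E(D_n) = (D_n - 0.2/n)(sqrt(n) + 0.26 + 0.5/sqrt(n))."""
    return (d_n - 0.2 / n) * (sqrt(n) + 0.26 + 0.5 / sqrt(n))

def modified_kuiper(v_n, n):
    """M(V_n) = V_n (sqrt(n) - 0.467 + 1.623/sqrt(n)), with the fitted a and b."""
    return v_n * (sqrt(n) - 0.467 + 1.623 / sqrt(n))

def modified_normal(d_n, n):
    """M_N(D_n) = D_n (sqrt(n) - 0.01 + 0.85/sqrt(n))."""
    return d_n * (sqrt(n) - 0.01 + 0.85 / sqrt(n))

# Upper percentage points, keyed by nominal level.
UPPER_POINTS = {
    "M_E": {0.10: 0.990, 0.05: 1.094, 0.01: 1.308},
    "M_V": {0.10: 1.138, 0.05: 1.207, 0.01: 1.347},
    "M_N": {0.10: 0.819, 0.05: 0.895, 0.01: 1.035},
}

def rejects(value, statistic, level):
    """True if the modified statistic exceeds its upper percentage point."""
    return value > UPPER_POINTS[statistic][level]
```

For instance, `rejects(modified_kuiper(v_n, n), "M_V", 0.05)` would carry out the uniformity test at the 5% level for a computed value `v_n`.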



(In the initial study based on 1000 simulations and 20 $(n,\kappa)$ combinations, Stephens' modified form of the Cramér-von Mises statistic $W^2$ was also studied, with similar results.)

(ii) The same simulation program as in (i) was carried out using Kuiper's test statistic $V_n$. Stephens provides a modified version of $V_n$ of the form $V_n(\sqrt{n}+a+b/\sqrt{n})$ for testing uniformity. It was found necessary to obtain estimates of $a$ and $b$ by fitting, rather than using Stephens' values. The percentiles of the statistic $M(V_n)=V_n(\sqrt{n}-0.467+1.623/\sqrt{n})$ so obtained exhibited some (monotone) dependence on $\kappa$; however, the test should be adequate as a guide to goodness-of-fit (see end of this section). Some percentiles for $M(V_n)$ are 10%: 1.138, 5%: 1.207, 1%: 1.347.

(iii) Again, the simulation program outlined in (i) was used, with the Stephens form $M_N(D_n)=D_n(\sqrt{n}-0.01+0.85/\sqrt{n})$ for the Kolmogorov-Smirnov statistic when $F$ is hypothesized to be normal but the parameters have to be estimated. (Note that only one parameter, namely $\kappa$, has to be estimated here.) The simulated percentiles agreed satisfactorily with those given by Stephens. Table 1(b) gives some typical results.

As a final check, 10,000 simulations of the statistics were made for each of the nine $(n,\kappa)$-combinations ($n$ = 10, 15, 30; $\kappa$ = 5, 10, 500) and the actual numbers of rejections using the tabulated 10%, 5% and 1% points counted. Some typical results are given in Table 2. Occasionally, the difference between estimated and nominal rejection levels is greater than twice the estimated standard error of the estimated level (i.e. $2\times[\hat p(1-\hat p)/10{,}000]^{1/2}$, where $\hat p$ is the estimated level). However, the tests still seem adequate for applied statistical analysis.

TABLE 2
Simulated percentage rejections for nominal percentage rejections of 10%, 5% and 1% when $n$ = {15, 30} and $\kappa$ = {10, 500}

κ      n     %     M_E(D_n)   M(V_n)    M_N(D_n)
10     15    10    9.96       9.60      9.85
             5     4.82       4.84      4.52
             1     0.86       1.10      0.88
10     30    10    9.41       10.28     9.15
             5     4.59       5.18      4.50
             1     0.91       1.06      0.91
500    15    10    9.87       10.05     10.80
             5     5.00       5.15      5.16
             1     0.86       0.99      1.19
500    30    10    9.35       11.29     9.99
             5     4.49       5.94      5.03
             1     0.78       1.13      0.99

It is worth pointing out that, since the simulated null distributions of $M_E(D_n)$ and $M_N(D_n)$ calculated from $S_E$ and $S_N$ respectively are in good agreement with those tabulated by Stephens for the exponential and normal distributions, the sets $S_E$ and $S_N$ may be adequately modelled by these distributions; hence, any of the goodness-of-fit tests provided by Stephens for testing exponentiality or normality, when the parameter values are unknown, can be used.

4. Power Study

The behaviour of each of the recommended statistics under departures from the null hypothesis was examined by simulation. In contrast with the estimation of critical values outlined in Section 3, where large numbers of simulations were required, it was sufficient to use a few hundred simulations to obtain points on the power curves. The following alternative models were used:

(i) Check on co-latitude distribution. The alternative model used was that of a Fisherian sample with 10% contamination by outliers. The outliers were simulated from a Fisher distribution whose mean direction was located at the upper $100\beta\%$ quantile of the basic distribution of $\Theta'$, $\beta$ = 0.01, 0.001 and 0.0001. The size of the test was 0.10. Power curves of $D_n$ for $n$ = 10, 30, 50 are given in Figure 1, each point on each curve being based on 400 simulations. As expected, there were no differences between the results for $\kappa$ = 10 and $\kappa$ = 50. Similar simulations for the modified form of the Cramér-von Mises statistic $W^2$ indicated that this test is inferior to $D_n$ for this hypothesis.

(ii) Check on uniformity. The alternative model was obtained by simulating the direction cosines $(l_i, m_i)$ from a bivariate normal with zero means, variances $1/\kappa$ and correlations $\rho$ = 0.001, 0.5, 0.8; thus the increasing value of $\rho$ corresponds to increasing departure from uniformity of $\Phi'$. The design of the simulation experiment was as described in (i). Power curves for $V_n$ are shown in Figure 2.

(iii) Check on normality. The alternative model was obtained by generating correlated uniform variates $U$, $V$ and then transforming them as described at the beginning of Section 3; for independent uniform (0, 1) variates $U$, $U_1$, $V$ was defined to be equal to $U$ with probability $p$, else $U_1$, for $p$ = 0.2, 0.5 and 0.8. Thus $\Theta$ and $\Phi$ had the correct marginal distributions but were not independent. The design of the simulation experiment was as described in (i). Power curves for $D_n$ are shown in Figure 3. Power curves for the modified form of the Cramér-von Mises statistic $W^2$ were essentially the same.

It is clear from these figures that each of the tests is sensitive to the type of departure from the null hypothesis that it was designed to detect.

Fig. 1. Power curves for co-latitude test. Samples of size $n$ from $0.9F((0,0),50)+0.1F((\theta_0,0),50)$, where $\theta_0$ is the upper $100\beta\%$ quantile of the $F((0,0),50)$ co-latitude distribution. [Axes: power against $\log(\beta)$.]

Fig. 2. Power curves for uniformity test. Samples of size $n$ from bivariate normal distribution with zero means, variances 1/50, and correlation $\rho$. [Axes: power against $\rho$; curves for $n$ = 10, 30, 50.]

Fig. 3. Power curves for normality test. Samples of size $n$ from correlated $F((0,0),50)$ distribution, where marginal probability integral transformed variates are the same with probability $p$. [Axes: power against $p$.]
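The design of the contamination experiment in (i) can be sketched as a short simulation. This is an illustrative reading of the description, not the original program: it assumes that each observation is independently an outlier with probability 0.1 (the mixture form given in the caption to Fig. 1), uses the 10% point 0.990 from Table 1(a), and all function names are hypothetical.

```python
import numpy as np

def fisher_pole_xyz(n, kappa, rng):
    """Unit vectors from F((0, 0), kappa), via the inversion of Section 3."""
    lam = np.exp(-2.0 * kappa)
    c = np.clip(-np.log(rng.random(n) * (1.0 - lam) + lam) / kappa, 0.0, 2.0)
    cos_t = 1.0 - c                                   # c = 1 - cos(theta)
    sin_t = np.sqrt(np.clip(1.0 - cos_t ** 2, 0.0, 1.0))
    phi = 2.0 * np.pi * rng.random(n)
    return np.column_stack((sin_t * np.cos(phi), sin_t * np.sin(phi), cos_t))

def colatitude_quantile(beta, kappa):
    """Co-latitude exceeded with probability beta under F((0, 0), kappa)."""
    lam = np.exp(-2.0 * kappa)
    return np.arccos(1.0 + np.log(beta * (1.0 - lam) + lam) / kappa)

def contaminated_sample(n, kappa, beta, eps, rng):
    """Mixture (1 - eps) F((0,0), kappa) + eps F((theta_0, 0), kappa)."""
    xyz = fisher_pole_xyz(n, kappa, rng)
    t0 = colatitude_quantile(beta, kappa)
    # Rotation about the y-axis carrying the pole to (theta_0, 0).
    rot = np.array([[np.cos(t0), 0.0, np.sin(t0)],
                    [0.0, 1.0, 0.0],
                    [-np.sin(t0), 0.0, np.cos(t0)]])
    outlier = rng.random(n) < eps
    xyz[outlier] = xyz[outlier] @ rot.T
    return xyz

def modified_ks_exponential(xyz):
    """M_E(D_n) computed from c'_i = 1 - (x_i . estimated mean direction)."""
    n = len(xyz)
    s = xyz.sum(axis=0)
    c = np.sort(1.0 - xyz @ (s / np.linalg.norm(s)))
    F = 1.0 - np.exp(-(n - 1) / c.sum() * c)
    i = np.arange(1, n + 1)
    d_n = max(np.max(i / n - F), np.max(F - (i - 1) / n))
    return (d_n - 0.2 / n) * (np.sqrt(n) + 0.26 + 0.5 / np.sqrt(n))

def estimated_power(n, kappa, beta, eps=0.1, n_sim=400, crit=0.990, seed=0):
    """Proportion of simulated samples rejected at the nominal 10% level."""
    rng = np.random.default_rng(seed)
    hits = sum(
        modified_ks_exponential(contaminated_sample(n, kappa, beta, eps, rng)) > crit
        for _ in range(n_sim)
    )
    return hits / n_sim
```

Setting `eps=0.0` gives an estimate of the size of the test, which should be near 0.10, while smaller values of `beta` move the outliers further from the main body of the sample and should raise the estimated power.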

5. Examples

Lewis & Fisher (1982) analysed two data sets using probability plotting methods. The first set (Embleton's data, $n$ = 26) yields the following results for the tests (using the generic symbol $M$ for the compact forms of the statistics):

(i) $D_n = 0.1107$, $M_E = 0.5619$