NEAREST NEIGHBOURHOOD DESIGNS

Rajender Parsad and Cini Verghese
I.A.S.R.I., Library Avenue, New Delhi-110 012

1. Introduction
This talk is mainly based on the review paper "Nearest Neighbour Designs for Comparative Experiments: Optimality and Analysis" by G.K. Shukla and P.S. Gill, which appeared in the proceedings of the Symposium on Optimization, Designs of Experiments and Graph Theory (1986).

It is well known that the theory of designs is based on three principles, viz., replication, randomization and local control. Replication provides an estimate of the experimental error, which acts as the basic unit of measurement for assessing the significance of observed treatment differences. Randomization is intended to eliminate bias by removing systematic differences in the characteristics of the experimental plots. Local control refers to the balancing, blocking and grouping of the experimental units so as to reduce the magnitude of the experimental error, thereby making the experimental design efficient.

The traditional methods of local control, such as blocks, Latin squares and lattice squares, were developed in the 1920s and 1930s. They have been the subject of extensive mathematical study and are well understood by now. They serve the purpose well where fertility variations are smooth and well known, so that the experimental area can be divided into homogeneous blocks or into rows and columns. However, these methods are often applied uncritically, in a cookbook fashion, under conditions where crop yield and fertility variations are complicated and often change from one season to another. In such cases a bad choice of blocks may actually increase the error variance instead of decreasing it. Recently, there has been considerable interest in the use of alternative methods of local control called spatial or nearest neighbour (abbreviated NN) methods.
Surprisingly, these methods have so far evoked little interest among Indian statisticians, even though in agricultural field trials plots occurring close together are well known to be more similar than plots lying far apart. The remainder of the article is divided into two sections: Section 2 introduces the currently available NN methods, and Section 3 deals with the more sophisticated problem of the optimality of experimental designs.

2. Nearest Neighbour Methodology

2.1 PAPADAKIS Covariance Method (The Genesis)
The method was first suggested by Papadakis (1937) and further discussed by Bartlett (1938) and Papadakis (1940), but it was thereafter neglected until its recent resurgence. It relies on the assumption that the yield from a plot is closely related to the yields from its immediate neighbours, owing to the inherent positive correlation between the fertility of neighbouring plots. The basic method is as follows. Let yi denote the yield from the ith plot and ȳi the mean yield of all the plots that receive the treatment applied to the ith plot.
1. Let ei = yi − ȳi be the residual from the ith plot.
2. Compute a covariate for each interior plot by taking the mean of the residuals of its adjacent plots. For an edge plot, the covariate is the residual of the adjacent interior plot.
3. Analyze the data by the method of analysis of covariance, using this covariate.

Bartlett (1938) examined the Papadakis method and suggested the use of blocks along with the NN adjustment. He also observed that the expected contribution of the regression term to the sum of squares in the analysis of covariance is about twice the true error variance when the regression coefficient is nearly zero; consequently, Bartlett suggested assigning two degrees of freedom to the covariate instead of the usual one. Papadakis (1940) suggested the use of two regression coefficients, one for interior plots and one for edge plots. He also gave an iterative method of calculating the residuals, the use of second-NN residuals as an additional covariate, and an extension of the method to two-dimensional data. Uniformity trial data were used to check the properties of the Papadakis method. Papadakis continued the development of his method and suggested many changes (Papadakis, 1954, 1970, 1984). The Papadakis method can be applied to data from any design and assumes the least about the fertility pattern of the field. The method has always been open to the logical objection that the covariates are calculated using treatment effects derived from the very data being analyzed; yet it has been the feeling of theoreticians as well as practitioners that the method has potential for increasing the precision of comparative experiments.
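The three steps above can be sketched in a few lines for a one-dimensional row of plots (an illustrative sketch assuming numpy; the function name and its interface are my own, not from the paper):

```python
import numpy as np

def papadakis_covariate(y, treatments):
    """Steps 1-2 of the Papadakis method for plots along a single line.

    y          : plot yields, in field order
    treatments : treatment label of each plot, in the same order
    Returns the residuals e_i and the neighbour covariates.
    """
    y = np.asarray(y, dtype=float)
    treatments = np.asarray(treatments)
    # Step 1: residual = yield minus the mean yield of its treatment
    means = {t: y[treatments == t].mean() for t in np.unique(treatments)}
    e = y - np.array([means[t] for t in treatments])
    # Step 2: covariate = mean residual of the adjacent plots;
    # an edge plot takes the residual of its single interior neighbour
    x = np.empty_like(e)
    x[1:-1] = (e[:-2] + e[2:]) / 2.0
    x[0], x[-1] = e[1], e[-2]
    return e, x
```

Step 3 then regresses y on the treatment classification plus the covariate x in an ordinary analysis of covariance.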
Pearce and Moore (1976), Pearce (1978, 1980) and Kempton and Howes (1981) reported substantial gains (as high as 50% in some cases) in the efficiency of treatment comparisons when the Papadakis method was used. The method is suitable for variety trials where one-dimensional blocks are very long, so that distant plots have little in common.

2.2 Correlation Methods (The Reincarnation of the Papadakis Method)
Another technique of analyzing experimental data, closely related to the Papadakis method, is based on modelling the correlation among observations through a generalized least squares (GLS) analysis. In situations where the correlation structure is known or can be postulated adequately, it may be advantageous to use this information at the design and analysis stages of the experiment. Williams (1952) considered the case where the observations are assumed to follow first-order and second-order autoregressive (AR) processes and the experimental plots are arranged in time or along a line. Atkinson (1969) investigated the connection of the Papadakis procedure with the analysis based on first-order AR errors. He showed that the Papadakis estimator is the first approximation to the maximum likelihood (ML) estimator, and that the variances of the two estimators are nearly the same. The theoretical results were also
confirmed through simulation. Draper and Faraggi (1985) also established the relationship between the two estimators. In the light of recent developments in spatial statistics (e.g., Whittle (1954), Besag (1974)), Bartlett (1978) re-examined the Papadakis method theoretically for both one-dimensional and two-dimensional layouts, making use of various forms of correlation models for the observations. He also suggested iterating the Papadakis estimator, using the residuals from one stage to obtain the estimator at the next. The adjusted error sum of squares was shown to be a conservative measure of the accuracy of treatment estimation. For a large number of treatments, the additional advantage of using blocking along with the NN adjustment was shown to be nonsignificant. Numerical examples from field experiments and simulation studies were given for illustration, and an appreciable increase in the efficiency of treatment estimation was reported. Bartlett's paper stimulated a lively and useful discussion of the issue, as is evident from the series of papers that followed it. In the following sections we discuss further developments of these Papadakis-type methods.

2.3 WILKINSON's Methods
Wilkinson, Eckert, Hancock and Mayo (1983) described the results of extensive Monte Carlo randomization studies of the Papadakis method on uniformity trial data. The results showed that while the non-iterated Papadakis analysis is reasonably valid under randomization, the iterated estimator is more efficient but leads to an upward bias in the treatment F-ratio. The most serious defect of the method, however, was found to be its inherent inefficiency when trend effects are appreciable. They proposed a smooth trend plus independent error model

yi = τi + ξi + ηi     (2.3.1)

where τi represents the effect of the treatment applied to the ith plot, ξi is the smooth trend component on plot i with var(ξi) = σξ², and the ηi are independent local errors with var(ηi) = ση².
The smoothness-of-trend assumption here is that the residual trend components after local linear detrending, ξi′ = ξi − ξ̄Ni, are small relative to the standard deviation of the local errors; here ξ̄Ni denotes the average of the trend components of the plots neighbouring plot i. Based upon this model, they suggested an analysis comprising two phases, termed the intra-N and inter-N analyses, analogues of the intra-block analysis and the recovery of inter-block information in the classical fixed-block methodology. The validity of the new method under randomization (for example, the approximate unbiasedness of the treatment F-ratio) was demonstrated empirically through Monte Carlo studies. The net efficiency of the method was shown to be comparable with the efficiency factors of lattice and lattice square designs with similar replication. A concept of partial nearest neighbour balance for large experiments was also formulated. Street and Street (1985)
considered the construction of nearest neighbour designs which satisfy the balance conditions of Wilkinson et al. (1983). The new methodology has been reported to be in use in field trials in Australia, New Zealand and Canada (Wilkinson, 1984). An important point made in the discussion of Wilkinson et al. (1983) by Patterson, Besag and others was that this method yields estimates very similar to those obtained by generalized least squares estimation assuming correlated observations.

2.4 Least Squares Smoothing Method
Green, Jennison and Seheult (1983, 1985) proposed a purely data-analytic approach called least squares smoothing (LSS), in which explicit use of the smoothness assumption is made. The model is the same as that proposed by Wilkinson et al. (1983) and can be written in matrix notation as

y = Xτ + ξ + η     (2.4.1)

where y and τ are respectively the vectors of observations and treatment effects; X is the design matrix; the components of ξ are spatially smooth trend effects; and η is the random error vector. Smoothness of trend in one dimension means that, in some sense, the second differences ξi−1 − 2ξi + ξi+1 should be small. A least squares approach leads to estimates of τ and ξ which minimize the penalty function

φ ξ′Δ′Δξ + (y − Xτ − ξ)′(y − Xτ − ξ)     (2.4.2)

subject to the side condition 1′τ = 0, so that the overall mean is included in ξ. Here Δξ is the vector of second differences of ξ, and φ is a tuning constant which can be varied to control the degree of smoothness in the estimate of ξ. It can be seen that the estimate of τ is the same as that obtained by generalized least squares on Δy, assuming E(Δy) = ΔXτ and Var(Δy) = σ²(φ⁻¹I + ΔΔ′). The choice of the tuning constant φ, the estimation of the error variance and some generalizations of LSS were discussed by Green et al. (1985). One advantage of the LSS method is that it allows an examination of the fitted ξ and the residuals η, which may be helpful in detecting outliers in the data and patterns in ξ. Green et al. (1985) illustrated the LSS method using data from variety trials and an experiment on mildew control, and suggested the use of cross-validation both to choose the degree of smoothing required and to select an appropriate smoother. Some further NN methods are also available, for example the exponential variance, linear variance and errors-in-variables models, the bifurcation of treatment, etc.

2.5 Some Comments on NN Methods
Recently there has been a rapid growth of NN methods, none of which has yet achieved acceptance for routine use in field experimentation. Perhaps it is too early to expect such a radical change, because the well-established classical methods cannot be discarded
unless some new method is clearly superior, on average, to the best of the classical methods. In 1984 the British Region of the Biometric Society held a one-day workshop to compare the methods currently available, in the hope of making practical recommendations for the guidance of applied statisticians (Kempton, 1984). The NN methods are often recommended for increasing the efficiency of treatment estimation, but in the absence of a proper randomization theory, assessment of the precision of the estimates poses a formidable problem. The main advantage of a randomized design and analysis is its simplicity: the analysis can be justified by permutation tests, thus making the inference model-free.

3. Experimental Designs for Correlation Models
In Section 2 we discussed the currently available nearest neighbour analysis methods. While some of them can be viewed as alternatives to the traditional methods of local control, others can be used in conjunction with the traditional methods. It is also important to note that almost all the NN methods can be embraced by a generalized least squares analysis assuming some correlation pattern for the random errors. Atkinson (1969), Draper and Faraggi (1985) and Martin (1982) showed the equivalence of the Papadakis and correlation methods. Patterson, Besag and others, while discussing Wilkinson et al. (1983), pointed out that Wilkinson's procedure yields treatment estimates very similar to those obtained through a GLS analysis. Similarly, least squares smoothing (though not its generalized version) and the first differences method of Besag and Kempton (1986) can be shown to be equivalent to GLS analysis. There exists a wide spectrum of models which can be used to incorporate correlation among neighbouring observations. Although the estimates of treatment contrasts are reasonably robust to the assumed correlation structure, the variance of an estimated treatment contrast may depend heavily on it.
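The least squares viewpoint can be made concrete for LSS: for a given φ, minimising the criterion (2.4.2) is a single stacked least squares problem. The following sketch assumes numpy; the function `lss_fit`, its interface, and the device of taking the minimum-norm solution and then recentring τ to satisfy 1′τ = 0 are my own illustrative choices, not the authors' algorithm:

```python
import numpy as np

def lss_fit(y, treatments, phi):
    """Least squares smoothing sketch: minimise
       phi * ||D xi||^2 + ||y - X tau - xi||^2   with mean(tau) = 0,
    where D is the second-difference operator along the row of plots."""
    y = np.asarray(y, dtype=float)
    n = y.size
    labels, idx = np.unique(treatments, return_inverse=True)
    t = labels.size
    X = np.zeros((n, t))                      # treatment indicator matrix
    X[np.arange(n), idx] = 1.0
    D = np.zeros((n - 2, n))                  # second differences: xi_{i-1} - 2 xi_i + xi_{i+1}
    for i in range(n - 2):
        D[i, i:i + 3] = (1.0, -2.0, 1.0)
    # stacked system: [X  I; 0  sqrt(phi) D] (tau; xi) ~ (y; 0)
    A = np.block([[X, np.eye(n)],
                  [np.zeros((n - 2, t)), np.sqrt(phi) * D]])
    b = np.concatenate([y, np.zeros(n - 2)])
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)   # min-norm solution of the
    tau, xi = theta[:t], theta[t:]                  # rank-deficient system
    shift = tau.mean()                              # impose the side condition 1' tau = 0
    return labels, tau - shift, xi + shift
```

In practice φ would be chosen by cross-validation, as Green et al. (1985) suggest.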
In the following we discuss some of the models which have been used in NN methodology.

Autoregressive models: In the one-dimensional case, let εi denote the error component of the observation from the ith plot. The simplest model, frequently used in time series analysis, is the first-order autoregressive (AR(1)) or Markov model

εi = ρ εi−1 + ηi     (3.1)

where ρ (|ρ| < 1) is the autoregressive coefficient and the ηi are independent errors with variance ση².
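Under model (3.1) the stationary errors have correlation ρ^|i−j| between plots i and j, and this correlation matrix can be plugged directly into a GLS analysis. A small sketch (numpy assumed; the alternating two-treatment layout and the yields are made up for illustration): with positive neighbour correlation, the contrast between treatments on adjacent plots is estimated more precisely than the independent-errors analysis suggests.

```python
import numpy as np

def ar1_corr(n, rho):
    # correlation of a stationary AR(1) process: corr(eps_i, eps_j) = rho^|i-j|
    i = np.arange(n)
    return rho ** np.abs(i[:, None] - i[None, :])

def gls(y, X, V):
    # generalised least squares: beta = (X'V^-1 X)^-1 X'V^-1 y,
    # with covariance proportional to (X'V^-1 X)^-1
    Vi = np.linalg.inv(V)
    A = X.T @ Vi @ X
    beta = np.linalg.solve(A, X.T @ Vi @ y)
    return beta, np.linalg.inv(A)

n = 8
X = np.zeros((n, 2))            # two treatments alternating along a line of 8 plots
X[0::2, 0] = 1.0
X[1::2, 1] = 1.0
y = np.array([5.1, 6.9, 5.4, 7.2, 5.0, 7.1, 5.3, 6.8])   # made-up yields
c = np.array([1.0, -1.0])                                 # treatment contrast

beta_iid, cov_iid = gls(y, X, np.eye(n))        # independence assumption
beta_ar1, cov_ar1 = gls(y, X, ar1_corr(n, 0.5)) # AR(1) errors, rho = 0.5
var_iid = c @ cov_iid @ c
var_ar1 = c @ cov_ar1 @ c
```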