
Chapter 99

Contextualized Geographically Weighted Principal Components Analysis for Investigating Baseline Soils Data on the North Wyke Farm Platform

P. Harris, N.J.K. Howden, S. Peukert, V. Noacco, K. Ramezani, E. Tuominen, B. Eludoyin, R. Brazier, A. Shepherd, B. Griffith, R. Orr, and P. Murray

Abstract The UK’s North Wyke Farm Platform (NWFP) for sustainable grassland farming is set up as a large agriculture modelling system of 15 hydrologically-isolated catchments, where in each catchment, water chemistry, precipitation and soil moisture data are continuously monitored. These spatio-temporal data are then interrogated with respect to climatic timings and changes in crop, livestock and farm management, across the NWFP. Complementary data sets are also found via spatial field surveys, remote sensing and greenhouse gas studies. This study focuses on one such field survey, consisting of soils data at 495 sites. We spatially explore these data using a geographically weighted principal components analysis, where we provide a novel adaptation of the technique to deal with the distinctly partitioned nature of the data, which are collected across 20 fields, spread over the 15 catchments.

Keywords Grasslands • Livestock production • Non-stationarity • Local models

P. Harris (*) • A. Shepherd • B. Griffith • R. Orr • P. Murray
North Wyke, Rothamsted Research, Okehampton, Devon EX20 2SB, UK
e-mail: [email protected]

N.J.K. Howden • V. Noacco • K. Ramezani • E. Tuominen
Faculty of Engineering, University of Bristol, Bristol BS8 1TR, UK

S. Peukert • B. Eludoyin • R. Brazier
Department of Geography, University of Exeter, Exeter, Devon EX4 4RJ, UK

© Capital Publishing Company 2016
N.J. Raju (ed.), Geostatistical and Geospatial Approaches for the Characterization of Natural Resources in the Environment, DOI 10.1007/978-3-319-18663-4_99

99.1 INTRODUCTION

The North Wyke Farm Platform (NWFP) at Rothamsted Research in the South West of England is a large, farm-scale experiment for collaborative research, training and knowledge exchange in agro-environmental sciences, with the aim of addressing agricultural productivity and ecosystem responses to different management practices. The 70 ha NWFP site captures the data necessary to develop a better understanding of the dynamic processes and underlying mechanisms that can be used to model how agricultural grassland systems respond to different management inputs. Here, via beef and sheep production, the underlying principle is to manage each of three farmlets (each consisting of five hydrologically-isolated catchments) in three contrasting ways: (i) improvement through use of mineral fertilizers; (ii) improvement through use of legumes; and (iii) improvement through innovation. The connectivity between the timing and intensity of different management operations and the transport of nutrients and potential pollutants from the farm is evaluated using sensor technology (providing numerous catchment-specific, temporal data sets) coupled with traditional field studies.

For this study, we focus on the latter, with the statistical analysis of a 2012 soils survey (for the following seven variables: Bulk Density, Total Carbon, Total Nitrogen, Soil Organic Matter, pH, Isotope 13 for Carbon and Isotope 15 for Nitrogen), covering all 15 NWFP catchments. Height and slope are also included, taking the study data to nine variables in total. The design of the NWFP precedes this survey: catchments were allocated to each farmlet based on (a) historical farm practices; (b) expert knowledge of the physical properties of the North Wyke site; and (c) a need for a certain spatial connectivity between the five catchments of each farmlet. The soils data is viewed as baseline data before different management inputs were set in motion in early 2013. The soils data can be analyzed in a number of ways; for this study, we apply a geographically weighted principal components analysis (GWPCA) [1, 2].
A GWPCA can: (1) provide insights into how the dimensionality and structure of the soil variables vary across space; (2) identify local data anomalies; and (3) direct future soil sampling campaigns. For this study, we focus on how GWPCA can be used as a local dimension reduction technique, and we introduce an adaptation of GWPCA to deal with the partitioned nature of the data.

99.2 METHODOLOGY

In a GWPCA, a series of localized PCAs is computed and the local component outputs are mapped, permitting a local identification of any change in the structure of the multivariate data. The choice of kernel weighting function and the type and size of its bandwidth are all crucial. Fixed bandwidths (constant distance) suit data sets that are sampled over a fairly regular grid, whilst adaptive bandwidths (constant local sample size) suit irregular sample configurations. With this in mind, we calibrate our GWPCAs with a bi-square kernel using adaptive bandwidths, whose sizes are found (automatically and objectively) via cross-validation. We find the GWPCA results only at the sample sites, although variances and loadings could also have been found at un-observed sites. As in any GWPCA, we report the results of the global PCA, so that we can gauge the extent of spatial heterogeneity.

In addition to the application of a standard GWPCA, we demonstrate an adapted GWPCA that attempts to deal with the partitioned nature of the soils data, which is collected across 20 distinct fields spread over the 15 catchments. Field partitions include roads, hedges, fences, and the man-made French drains needed to hydrologically isolate the catchments. Thus the soils data is not expected to be fully continuous across the NWFP site (and in parts, may reflect historical farm practices). Instead, continuity is only really expected within each of the 20 fields. Observe here that it is not viable to apply separate, local PCAs within each field, as some fields are highly under-sampled with as few as six observations (whilst some fields are relatively rich in information with over 100 observations). Also observe that a degree of continuity is still likely across all 15 catchments, and this study-scale process would not be accounted for if a piecemeal PCA approach was followed.

To adapt GWPCA, we follow an approach similar in spirit to that used in the contextualized GW regression [3], where for our study, the weighting is a combination of the usual geographical weighting, together with a second geographical weighting that groups data by the field that they are located in. Thus the usual bi-square kernel is used for the first weighting, with:

$$ w_{ij} = \left(1 - \left(d_{ij}/r_1\right)^2\right)^2 \ \text{if } d_{ij} \le r_1 \ \text{and} \ w_{ij} = 0 \ \text{otherwise}; \tag{99.1} $$

where the bandwidth is the geographic distance $r_1$; $d_{ij}$ is the geographic distance between spatial locations of the $i$th and $j$th rows in the data matrix; and $w_{ij}$ is the geographic weight attached to an observation point indexed by $j$, for a calibration point indexed by $i$. This first weighting is multiplied by a second weighting, again using a bi-square kernel, with:

$$ w_{kl} = \left(1 - \left(d_{kl}/r_2\right)^2\right)^2 \ \text{if } d_{kl} \le r_2 \ \text{and} \ w_{kl} = 0 \ \text{otherwise}; \tag{99.2} $$

where the bandwidth is the geographic distance $r_2$; $d_{kl}$ is the geographic distance between the centroids of the $k$th and $l$th fields; and $w_{kl}$ is the geographic weight attached to an observation point that is field indexed by $l$, for a calibration point that is field indexed by $k$. This second weighting function entails that data within each field are given the same second weighting, regardless of their location in that field (i.e. there are only ever 20 distinct second weights). Only the bandwidth $r_1$ is found optimally, as $r_2$ is user-specified so that all observations are given a (non-zero) second weighting. The combined weighting function is such that, for a GWPCA calibration point near to a boundary, a near-by observation that is within its field will be given a larger weight (i.e. more importance) than a near-by observation that is at the same distance from the GWPCA calibration point, but located in a different field. This contextualized GWPCA should reflect the expected discontinuities in the soils data.
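As a minimal sketch of the combined weighting described above, the product of the two bi-square kernels can be computed as follows. The function names (`bisquare`, `contextual_weights`), the data layout and the toy coordinates are all assumptions made for illustration and are not taken from the study. For simplicity, `r1` is passed in as a fixed distance, whereas the study finds an adaptive `r1` by cross-validation.

```python
import math


def bisquare(d, r):
    """Bi-square kernel: (1 - (d/r)^2)^2 if d <= r, else 0 (Eqs. 99.1 and 99.2)."""
    if r <= 0:
        raise ValueError("bandwidth must be positive")
    return (1.0 - (d / r) ** 2) ** 2 if d <= r else 0.0


def contextual_weights(sites, field_of, centroids, i, r1, r2):
    """Combined weights w_ij * w_kl for calibration site i (hypothetical helper).

    sites     : list of (x, y) site coordinates
    field_of  : field label of each site, in the same order as sites
    centroids : dict mapping field label -> (x, y) field centroid
    """
    xi, yi = sites[i]
    cxk, cyk = centroids[field_of[i]]
    weights = []
    for (xj, yj), field_l in zip(sites, field_of):
        # First weighting: distance between the two sites (Eq. 99.1).
        w_ij = bisquare(math.hypot(xj - xi, yj - yi), r1)
        # Second weighting: distance between the two field centroids (Eq. 99.2).
        cxl, cyl = centroids[field_l]
        w_kl = bisquare(math.hypot(cxl - cxk, cyl - cyk), r2)
        weights.append(w_ij * w_kl)
    return weights


# Toy layout (illustrative only): calibration site 0 lies on the boundary
# between fields "A" and "B"; sites 1 and 2 are equally far from it.
sites = [(0.0, 0.0), (-1.0, 0.0), (1.0, 0.0)]
field_of = ["A", "A", "B"]
centroids = {"A": (-5.0, 0.0), "B": (5.0, 0.0)}
weights = contextual_weights(sites, field_of, centroids, i=0, r1=4.0, r2=20.0)
print(weights)
```

In this toy layout the same-field neighbour (site 1) receives a larger combined weight than the equally distant neighbour in the other field (site 2). Setting `r2` larger than the greatest centroid-to-centroid distance keeps every second weight non-zero.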

99.3 RESULTS

For our PCA and GWPCA fits, the same globally standardized data is used (see [2] for some consequences of this). The PCA results (Fig. 99.1a) reveal that the first four components collectively account for 76.7% of the variation in the data. Proceeding with the same number of retained components for our GWPCA calibrations is therefore natural, as it directly corresponds to that of a reasonable PCA specification. An optimal bandwidth for GWPCA using the single, standard weighting function is found to be 69.3% (i.e. the nearest 343 observations are weighted), with a minimum cross-validation score = 1.41. An optimal bandwidth for the contextualized GWPCA, using the combined weighting function, is much tighter at 28.9%, with a minimum cross-validation score = 1.15. Figs. 99.1b-d present the percentage of total variance (PTV) maps (again, with four retained components) for both GWPCAs, together with a second, standard GWPCA using the smaller bandwidth of the contextualized GWPCA. This third GWPCA acts as a control to gauge the effects of using different bandwidths between our standard and contextualized GWPCAs. It is clear that both standard GWPCAs are representative of some smoothly-varying continuous multivariate process, where the GWPCA with the larger bandwidth tends to over-smooth, whilst the GWPCA with the smaller bandwidth tends to under-smooth. Neither reflects the expected discontinuities in the soils data at the field boundaries or strongly different variances between fields. Conversely, the contextualized GWPCA behaves as expected, with clear discontinuities present at the field boundaries. Given that the contextualized GWPCA provides the smallest cross-validation score, we tentatively assume that its PTV map provides the truest representation of changes in local data dimensionality in the soils data.

Fig. 99.1 (a) PCA cumulative PTV plot; (b) standard GWPCA PTV map (69.3% bandwidth); (c) contextualized GWPCA PTV map (28.9% bandwidth); and (d) standard GWPCA PTV map (28.9% bandwidth)
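For readers who want the PTV written out, one common formulation is sketched below. The chapter does not state these formulas explicitly, so the notation is an assumption: $\Sigma_i$ is the locally weighted covariance matrix at calibration point $i$, $\bar{x}_i$ is the locally weighted mean, $\lambda_{i1} \ge \dots \ge \lambda_{im}$ are the eigenvalues of $\Sigma_i$, $m = 9$ is the number of variables and $q = 4$ is the number of retained components. For the contextualized GWPCA, $w_{ij}$ would be replaced by the product $w_{ij} w_{kl}$ of the two weightings.

```latex
\bar{x}_i = \frac{\sum_j w_{ij}\, x_j}{\sum_j w_{ij}}, \qquad
\Sigma_i = \frac{\sum_j w_{ij}\, (x_j - \bar{x}_i)(x_j - \bar{x}_i)^{\top}}{\sum_j w_{ij}}

\mathrm{PTV}_i(q) = 100 \times \frac{\sum_{p=1}^{q} \lambda_{ip}}{\sum_{p=1}^{m} \lambda_{ip}}
```

Under this formulation the normalizing constant $\sum_j w_{ij}$ cancels in the ratio, so the PTV depends only on the relative sizes of the local eigenvalues.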

99.4 CONCLUDING REMARKS

For this study, we have outlined an adapted form of GWPCA to deal with the partitioned nature of the NWFP baseline soils data. Only changes in local data dimensionality were reported, but future work will report changes in the local relationships of the nine variables. The use of contextualized GWPCA to detect local outliers and direct future sampling campaigns will also be considered.

REFERENCES

1. Harris, P., Brunsdon, C. and Charlton, M.: Geographically Weighted Principal Components Analysis. Int. J. Geogr. Inf. Sci., 25(10), 1717–1736 (2011)
2. Harris, P., Clarke, A., Juggins, S., Brunsdon, C. and Charlton, M.: Enhancements to a geographically weighted principal components analysis in the context of an application to an environmental data set. Geogr. Anal., in press (2014)
3. Harris, R., Dong, G. and Zhang, W.: Using Contextualized Geographically Weighted Regression to Model the Spatial Heterogeneity of Land Prices in Beijing, China. Transactions in GIS, 17(6), 901–919 (2013)