High resolution mapping of soil organic carbon stocks ...

Science of the Total Environment 630 (2018) 367–378

Contents lists available at ScienceDirect

Science of the Total Environment journal homepage: www.elsevier.com/locate/scitotenv

High resolution mapping of soil organic carbon stocks using remote sensing variables in the semi-arid rangelands of eastern Australia

Bin Wang a,⁎, Cathy Waters b, Susan Orgill a, Jonathan Gray c, Annette Cowie d, Anthony Clark b, De Li Liu a,e

a NSW Department of Primary Industries, Wagga Wagga Agricultural Institute, NSW 2650, Australia
b NSW Department of Primary Industries, Orange Agricultural Institute, NSW 2800, Australia
c Science Division, NSW Office of Environment and Heritage, PO Box 644, Parramatta, NSW 2124, Australia
d NSW Department of Primary Industries, Trevenna Rd, Armidale, NSW 2351, Australia
e Climate Change Research Centre and ARC Centre of Excellence for Climate System Science, University of New South Wales, Sydney, NSW 2052, Australia

HIGHLIGHTS

• Remote sensing covariates improved the estimation of SOC stocks.
• Prediction accuracy of tree-based models was superior to support vector machine.
• Digital soil mapping for SOC was practical and cost-effective in semi-arid rangelands.
• Fractional cover data influenced SOC stock at the soil surface.

ARTICLE INFO

Article history:
Received 18 December 2017
Received in revised form 29 January 2018
Accepted 17 February 2018
Available online xxxx

Editor: Ouyang Wei

Keywords:
Soil organic carbon stocks
Seasonal fractional cover
Remote sensing
Machine learning
Digital soil mapping

ABSTRACT

Efficient and effective modelling methods to assess soil organic carbon (SOC) stock are central in understanding the global carbon cycle and informing related land management decisions. However, mapping SOC stocks in semi-arid rangelands is challenging due to the lack of data and poor spatial coverage. The use of remote sensing data to provide an indirect measurement of SOC to inform digital soil mapping has the potential to provide more reliable and cost-effective estimates of SOC compared with field-based, direct measurement. Despite this potential, the role of remote sensing data in improving the knowledge of soil information in semi-arid rangelands has not been fully explored. This study investigated, for the first time, the use of high spatial resolution satellite data (seasonal fractional cover data; SFC) together with elevation, lithology, climatic data and observed soil data to map the spatial distribution of SOC at two soil depths (0–5 cm and 0–30 cm) in semi-arid rangelands of eastern Australia. Overall, model performance statistics showed that random forest (RF) and boosted regression trees (BRT) models performed better than support vector machine (SVM). Without SFC covariates, the models obtained moderate results, with R² of 0.32 for SOC stock at 0–5 cm and 0.44 at 0–30 cm, and RMSE of 3.51 Mg C ha⁻¹ at 0–5 cm and 9.16 Mg C ha⁻¹ at 0–30 cm. In contrast, by including SFC, the model accuracy for predicting SOC stock improved by 7.4–12.7% at 0–5 cm, and by 2.8–5.9% at 0–30 cm, highlighting the importance of including SFC to enhance the performance of the three modelling techniques. Furthermore, our models produced a more accurate and higher resolution digital SOC stock map compared with other available mapping products for the region. The data and high-resolution maps from this study can be used for future soil carbon assessment and monitoring.

© 2018 Elsevier B.V. All rights reserved.

Abbreviations: SOC, soil organic carbon; SFC, seasonal fractional cover data; RF, random forest; BRT, boosted regression trees; SVM, support vector machine; RMSE, root-mean-square error; C, carbon; DSM, digital soil mapping; DEM, digital elevation model; SLGA, Soil and Landscape Grid of Australia; OOB, out-of-bag; NSW, New South Wales; R², regression coefficient of determination; MAE, mean absolute error; LCCC, Lin's Concordance Correlation Coefficient; RI, the relative improvement; OEH, Office of Environment and Heritage.

⁎ Corresponding author. E-mail address: [email protected] (B. Wang).

https://doi.org/10.1016/j.scitotenv.2018.02.204
0048-9697/© 2018 Elsevier B.V. All rights reserved.

1. Introduction

Globally, rangelands account for approximately half of the world's land mass and play a key role in the mitigation of climate change. The extensive areas occupied by rangelands can potentially store huge amounts of carbon both in biomass and soil organic matter (Bikila et al., 2016). Australian rangelands extend across low rainfall environments, accounting for approximately 81% of national land area (http://www.environment.gov.au/land/rangelands). It is estimated that Australia's rangeland soils store between 34 and 48 Gt of carbon, representing a sequestration potential of 78 Mt C per year (Keating et al., 2009). Soil organic carbon (SOC) is also recognized as the most important indicator of soil fertility and plays a vital role in a range of soil processes (Schillaci et al., 2017a).

While the accurate assessment of SOC stock is essential to enhance this resource, quantifying and mapping SOC stocks in the rangelands is challenging due to low levels of SOC and the inherently patchy spatial and temporal patterns of vegetation and soil resources (Waters et al., 2015). Using direct measurement (field survey including soil sampling and laboratory analyses) to determine SOC stocks is both time consuming and costly (Bartholomeus et al., 2011), and prohibitive at large scales (regional, national or global). Digital soil mapping (DSM) techniques are a useful tool to reduce sampling and analytical costs while still obtaining reliable results (Jeong et al., 2017). DSM is the procedure of creating spatial soil information based on mathematical or statistical relationships between field soil observations and related environmental covariates or factors (e.g. climate, vegetation, relief, parent material and time) to understand spatial and temporal variation in soil type and other soil properties in the form of rasters of prediction (Camera et al., 2017; Jeong et al., 2017; Lagacherie et al., 2006; Malone et al., 2016; Minasny and McBratney, 2016).
In DSM, these environmental variables can be retrieved from available digital elevation models (DEMs), readily accessible remote sensing data and other data sources (such as climate data). The past few decades have seen the growth of DSM as a sub-discipline of soil science, experiencing a continuous expansion mainly due to its increased efficiency (Kempen et al., 2012) and accuracy (Lorenzetti et al., 2015) compared to conventional field soil mapping techniques. With continual growth in computational capacities, the explosion of ‘Big Data’ accompanying the development of data-mining algorithms and geographic information systems, and the increased availability of spatial data (DEM and satellite imagery) (Minasny and McBratney, 2016), DSM is likely to play an increasingly important role in the future monitoring of changes in soil properties and characteristics.

Recently, DSM has been successfully applied to map SOC stocks under a range of environments (Bonfatti et al., 2016; Gray et al., 2015; Ottoy et al., 2017; Schillaci et al., 2017a; Wang et al., 2017; Were et al., 2015; Yang et al., 2016). These advances in DSM of SOC mainly result from the development of machine learning techniques and the availability of high-quality covariates. The success of machine learning in DSM is related to several advantages over traditional soil survey. These advantages have been summarized as: 1) DSM is easy to update because prediction models can be stored and rerun when new data become available; 2) different models of spatial variation can be chosen due to the availability of computing power to process large data sets; and 3) the proper use of data mining tools and progress in geographic information systems result in predictions with quantified uncertainty (Kempen et al., 2012; Minasny and McBratney, 2016).
In Australia, a recent project has produced the Soil and Landscape Grid of Australia (SLGA) (http://www.clw.csiro.au/aclep/soilandlandscapegrid/index.html) (Grundy et al., 2015; Viscarra Rossel et al., 2015), which is based on recent digital soil mapping methods and integrates historical soil information and novel spatial modelling to generate nationwide digital maps of soil attributes including SOC (3 arc sec, approximately 90 m, resolution) (Grundy et al., 2015; Viscarra Rossel et al., 2014). However, the SLGA's accuracy varies between soil depths and soil textures across Australia. For example, the accuracy of SLGA products was higher in clay soil (R² = 0.53) than in silt soil (R² = 0.46) at 0–5 cm. In addition, the range in time since the surveyed soil data were collected may result in poor estimates of the current status of attributes that are dynamic and responsive to land management practices, such as SOC (Grundy et al., 2015). Similarly, DSMs (100 m resolution) have been produced for key soil properties over New South Wales in eastern Australia (OEH, 2017), derived through quantitative modelling techniques (mainly multiple linear regressions) that are based on relationships between soil attributes and different environmental variables. These existing map products were produced at a national or state level, so they may not provide reliable information on SOC stocks down to the local or farm levels. This information is fundamental to monitor changes in the SOC stocks as a consequence of land management and is not available for the semi-arid rangelands of eastern Australia. Accurate predictions of SOC stocks at smaller spatial scales are central in assessing the carbon sink capacity of soils, temporal changes due to seasonal conditions as well as the influence of management (Wang et al., 2017).

Remote sensing data have gained attention in the past few decades as a promising secondary data source for improving DSM due to their high accessibility, resolution and availability at a range of scales. Forkuor et al. (2017) summarized the advantages of soil data sources derived from remote sensing as follows: they (1) contain extractable soil information, e.g.
spectral reflectance, (2) have large spatial coverage and therefore permit mapping of inaccessible areas, (3) produce consistent and comprehensive data both in time and space and (4) provide possibilities of supplementing or at least reducing traditional labour-intensive soil sampling in soil surveys. Based on these advantages, numerous studies have explored the use of remote sensing data with varying spatial, temporal and spectral characteristics in digital soil mapping (Forkuor et al., 2017; Rudiyanto et al., 2016; Schillaci et al., 2017a,b; Wang et al., 2017; Yang et al., 2015). For example, Schillaci et al. (2017a) found that the integration of remote sensing with other environmental predictors increased the predictive ability compared to models built without remote sensing covariates. Previous studies in the semi-arid rangelands have shown clear relationships between ground cover (perennial grass and litter cover) and SOC stock (Orgill et al., 2017b; Waters et al., 2015, 2016). These relationships indicate that suitable satellite-derived covariates such as seasonal fractional cover (SFC) data may be useful in the estimation of SOC stocks in these semi-arid environments. However, the efficacy of SFC in improving prediction of SOC in semi-arid rangelands has not been tested. The aim of the present study was to determine a reliable method for mapping the SOC stocks in the semi-arid rangelands of eastern Australia through different machine learning techniques using a set of environmental covariates obtained from remote sensing, and specifically to investigate whether inclusion of SFC improves prediction of SOC.
We compared the influence of two groups of predictor variables on machine learning model performance in the study area: (1) 12 covariates (referred to as data set 1) including parent material, relief, climate and radiometric variables that represented a large suite of potentially useful covariates; and (2) the covariates in data set 1 plus 16 additional covariates (seasonal fractional ground cover; mean values of Band 1 (bare ground fraction), Band 2 (green vegetation fraction), Band 3 (non-green vegetation fraction) and Band 4 (model fitting error) during 1988–2015). Three statistical methods which have been widely used for DSM in previous studies, random forest (RF), boosted regression trees (BRT) and support vector machine (SVM), were explored to determine the most suitable method to produce high-resolution SOC stock maps in the study area. We compared our digital maps and alternative available mapping products with direct, field-based soil observations in the study area and assessed the accuracy of each DSM product.

2. Materials and methods

2.1. Study area

The study area encompassed three Bioregions (Mulga Lands, Cobar Peneplain, Darling Riverine Plains) located in the Western Region of New South Wales (NSW), Australia, which have a combined area of 233,877 km² (Fig. 1) (http://www.environment.gov.au/land/nrs/science/ibra/australias-bioregions-maps). It lies between the latitudes of 28.54° and 34.32° South and the longitudes of 141.64° and 150.81° East. The climate is semi-arid (http://koeppen-geiger.vu-wien.ac.at/present.htm), with average annual rainfall and temperature of 379 mm and 18.9 °C, respectively. The terrain is primarily flat with elevation mostly below 500 m above mean sea level. The vegetation includes open woodlands and grasslands with scattered trees and patches of shrubs.

2.2. Observed soil data

The soil observations (n = 705) which were used to calibrate and validate the models were collected by NSW Department of Primary Industries and NSW Office of Environment and Heritage during 2008–2016. The sampling and analytical methods are outlined in Wang et al. (2018). Briefly, each data point represents a soil core (to at least 0.30 m). In most cases, soil cores were collected using a hydraulically driven core sampler with a 40 mm inner core diameter to a depth of at least 0.30 m and divided into four soil depths: 0–5 cm, 5–10 cm, 10–20 cm and 20–30 cm. Soil OC was determined on approximately 2 g of finely ground <2 mm soil analysed on a LECO combustion furnace (LECO 1995) (Rayment and Lyons, 2011; Method 6B2b). Bulk density (BD) was determined on subsamples dried at 105 °C as described by Dane and Topp (2002). Results were reported as SOC g (100 g)⁻¹ and BD g cm⁻³ on an oven-dry basis. For each soil core, SOC stock for each soil layer was calculated according to Eq. (1) using SOC concentration and BD. SOC stocks from the four depths were summed to obtain the SOC stock to 30 cm soil depth.

$$ \text{SOC stock}\ (\text{Mg C ha}^{-1}) = C \times BD \times D \times \left(1 - \frac{\text{gravel}\,[\%]}{100}\right) \tag{1} $$

where C is the concentration of soil carbon (g C (100 g)⁻¹ sieved soil); BD is bulk density of the whole soil (g cm⁻³); D is the thickness of the corresponding soil depth layer (cm); gravel [%] is the percentage of gravel in the soil sample. To test whether SFC predictors were more sensitive to surface soil, SOC stocks at 0–5 cm and 0–30 cm were used in our study.

2.3. Environmental predictors

Environmental variables were selected to represent the key soil-forming factors of climate, parent material, relief and age based on the scorpan model (McBratney et al., 2003). These environmental covariates, as outlined below, included average annual rainfall, average annual temperature, lithology, gamma radiometric data, clay components, weathering index, DEM and seasonal fractional cover. The original spatial maps/layers from various sources were resampled into raster format with the same 30 m resolution using the nearest neighbour resampling method, and all layers were re-projected to a common coordinate reference system for further analyses. The pixel values of the raster layers (environmental predictors) that corresponded to the coordinates of sampling points were extracted and compiled in a database to build the model.
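As a worked illustration of Eq. (1) from Section 2.2, the following minimal Python sketch computes the SOC stock of each layer and sums the four layers to 30 cm. The `Layer` class, the function names and the example core values are hypothetical and are not taken from the study's data. With C in g C (100 g)⁻¹, BD in g cm⁻³ and D in cm, the product is already in Mg C ha⁻¹, so no further conversion factor is applied.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """One depth layer of a soil core (hypothetical container)."""
    c_pct: float         # SOC concentration, g C per 100 g sieved soil
    bd: float            # bulk density of the whole soil, g cm^-3
    thickness_cm: float  # layer thickness D, cm
    gravel_pct: float    # gravel content, %


def soc_stock(layer: Layer) -> float:
    """SOC stock of one layer in Mg C ha^-1, following Eq. (1)."""
    return (layer.c_pct * layer.bd * layer.thickness_cm
            * (1.0 - layer.gravel_pct / 100.0))


def soc_stock_profile(layers) -> float:
    """Sum the layer stocks to obtain the stock for the whole profile."""
    return sum(soc_stock(layer) for layer in layers)


# Illustrative values only (0-5, 5-10, 10-20 and 20-30 cm layers).
core = [
    Layer(1.0, 1.40, 5.0, 0.0),
    Layer(0.8, 1.45, 5.0, 0.0),
    Layer(0.5, 1.50, 10.0, 5.0),
    Layer(0.4, 1.50, 10.0, 10.0),
]
surface_stock = soc_stock(core[0])       # 0-5 cm stock
profile_stock = soc_stock_profile(core)  # 0-30 cm stock
```

Keeping the per-layer function separate from the summation mirrors the two response variables used in the study: the 0–5 cm stock is the first layer alone, and the 0–30 cm stock is the sum over all four layers.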

2.3.1. Climate

Climate data were obtained from the SILO (Scientific Information for Land Owners) website (PPD, http://www.longpaddock.qld.gov.au/silo/ppd/index.php) (Jeffrey et al., 2001). We calculated 56-year (1961–2016) mean annual temperature and rainfall for 8022 meteorological stations and interpolated them to a 1 km grid across Australia.

2.3.2. Lithology

Silica index was used to indicate lithology due to its relationship to the character and composition of parent material. Gray et al. (2016) classified parent material into a number of classes based on a silica index to aid landscape interpretation, i.e. Mafic (45–52% silica), Intermediate lower (52–60% silica), Intermediate upper (60–65% silica), Siliceous lower (65–70% silica), Siliceous mid (70–77% silica), Siliceous upper (77–85% silica), Siliceous extreme (>85% silica) and five other mainly non-silicate classes. These relate to the inherent soil properties (e.g. nutrient and water holding capacity) that influence SOC accumulation.

Filtered radiometric data were obtained from the Australian Geoscience Data Cube (http://www.datacube.org.au/). The radiometrics (gamma-ray spectrometric) method is a geophysical process applied to estimate concentrations of the radioelements potassium (K), uranium (U) and thorium (Th). Gamma radiometrics provide useful information for SOC mapping as a proxy of water retention and clay content, and thus the concentrations of K, U and Th were selected as factors in the SOC study (Grinand et al., 2017). Detailed information about the surveys conducted by Geoscience Australia and the State and Northern Territory Geological Surveys over most of Australia in the past 40 years can be found through the Geophysical Archive Data Delivery System (http://www.ga.gov.au/scientific-topics/disciplines/geophysics/radiometrics). The clay components, including relative proportions of kaolin, illite, smectite and the smectite/kaolin (S/K) ratio, were derived from modelling with visible near-infrared spectroscopy (Viscarra Rossel, 2011).

Fig. 1. The study area showing three Bioregions (MUL: Mulga Lands, COP: Cobar Peneplain, DRP: Darling Riverine Plains) and direct, field-based sampling points (green dots) located in the rangelands of eastern Australia. The map legend shows elevation from 30 m (low) to 540 m (high). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

2.3.3. Relief and weathering covariates

The rangelands in western NSW have a low-relief surface, and our previous study (Wang et al., 2018) showed that topographic indices other than elevation (e.g. aspect, slope and topographic wetness index) had marginal effects on SOC stocks. Therefore, to reduce the computational burden, only elevation from a DEM with a resolution of 30 m was included in the present study. Soils are largely the weathering product of the parent material (Grimm et al., 2008). Weathering index was included as an index to represent the degree of weathering of parent material (Gray et al., 2016). This index was developed using regression models based on airborne gamma-ray spectrometry imagery and the Shuttle Radar Topography Mission elevation data (Wilford, 2012).

2.3.4. Biota (seasonal fractional cover)

Seasonal fractional cover data for 1988–2015 were downloaded from AusCover (http://data.auscover.org.au/xwiki/bin/view/Product+pages/Landsat+Seasonal+Fractional+Cover) with a 30 m resolution, including four bands (Band 1, bare ground fraction (bare ground, rock, disturbed); Band 2, green vegetation; Band 3, non-green vegetation (litter, dead leaf and branches); Band 4, model fitting error). These four bands are calculated using models linked to an extensive field sampling program using >1500 sites across a broad range of vegetation, soil and climatic types (Muir et al., 2011). Previous studies found that a long time series of a vegetation index (e.g. Enhanced Vegetation Index) significantly enhanced prediction of spatially varying SOC pools (Wilson et al., 2017); therefore, for subsequent analyses we calculated the simple arithmetic mean of the entire 28-year time series for each band and each season. The predictor variables that could be related to SOC stocks are listed in Table 2.
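The long-term averaging of the seasonal fractional cover series described above can be sketched for a single pixel as follows. This is a minimal illustration, not the study's processing chain: the season and band labels, the record layout `(year, season, band_values)` and the demo values are assumptions made for the example.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical labels; the source only names four bands and four seasons.
SEASONS = ("summer", "autumn", "winter", "spring")
BANDS = ("bare", "green", "non_green", "fit_error")


def seasonal_band_means(records):
    """Average each band over all years, separately for each season.

    `records` is an iterable of (year, season, band_values) for one pixel,
    where band_values holds the four band values in BANDS order.
    Returns a dict keyed by (band, season): 4 bands x 4 seasons = 16 values.
    """
    grouped = defaultdict(list)
    for _year, season, values in records:
        for band, value in zip(BANDS, values):
            grouped[(band, season)].append(value)
    return {key: fmean(vals) for key, vals in grouped.items()}


# Illustrative series for 1988-2015 (28 years); values are made up.
demo_records = [
    (year, season, (40.0, 30.0 if year % 2 == 0 else 50.0, 25.0, 5.0))
    for year in range(1988, 2016)
    for season in SEASONS
]
demo_means = seasonal_band_means(demo_records)
```

Each pixel therefore contributes 16 covariates (four bands for each of four seasons), which matches the 16 additional covariates of the second predictor set.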

2.4. Modelling techniques

Three supervised machine learning methods, RF, BRT and SVM, were selected for SOC prediction. The first two models allow estimating the relative importance of the predictor variables based on how much worse the prediction would be if the data for that predictor were permuted randomly (Prasad et al., 2006). Each of these methods is able to model complex nonlinear relationships between SOC stocks and environmental variables. They have shown good performance for the prediction of SOC stock in various climatic areas (Rudiyanto et al., 2016; Schillaci et al., 2017a; Somarathna et al., 2016; Sreenivas et al., 2016; Yang et al., 2016). In this study, input covariates were selected based on known relationships between soil and environmental factors. Each type of machine learning model has specific and different required parameters (referred to as tuning parameters) to control how the relationship between input predictors and response is defined. These parameters must be optimised to generate the best “fit” possible between covariates and outcomes (Brungard et al., 2015).

2.4.1. Random forest

Random forest (RF) is a tree-based ensemble learning method that works by building a set of regression trees and averaging results (Breiman, 2001). Within the training procedure, the RF algorithm produces multiple trees. Each regression tree in the forest is independently constructed based on a unique bootstrap sample (sample with replacement) from the original training data set. All trees are grown to maximum size without pruning (Grimm et al., 2008). The bootstrap sampling method makes RF less sensitive to over-fitting compared to decision trees (Heung et al., 2014). The RF method uses either categorical (i.e., classification) or continuous (i.e., regression) response variables, and either categorical or continuous predictor variables. Unlike most common methods based on machine learning, RF only needs two parameters to generate a prediction model: (i) the number of regression trees to grow in the forest (ntree), and (ii) the number of randomly selected evidential features at each node (mtry). RF uses the observations that are not used to construct the trees as the out-of-bag (OOB) sample (on average about one third of the data). This sample can be used for validation purposes by comparing it to the model outputs and calculating the corresponding relative error (OOB error). An additional feature of RF is the capacity to rank the relative importance of the variables in the prediction. Importance of a variable is based on the regression prediction error of the OOB sample. It is computed from the change in prediction error when each input variable is permuted, and is expressed as the mean decrease in accuracy (Heung et al., 2014). For error estimation, the OOB samples were predicted by their respective trees, and the predictions were aggregated to compute the mean square error (MSE_OOB). MSE_OOB was calculated using Eq. (2):

$$ MSE_{OOB} = \frac{1}{n} \sum_{i=1}^{n} \left( O_i - \bar{P}_{iOOB} \right)^2 \tag{2} $$

where n is the number of observations, O_i is the observed value and P̄_iOOB is the average of all OOB predictions across all trees. In this study, in order to optimise the two parameters ntree and mtry, a number of experiments were conducted using different combinations of ntree and mtry. The range of ntree was set between 100 and 900 at intervals of 200, and the number of selected evidential features (mtry) between 1 and 15 at intervals of 1 (Rodriguez-Galiano et al., 2015). We split the training data into three groups for cross-validation, considering the computational resources available. Three replicates of the three-fold cross-validation were conducted. The final model (optimal model) was selected when the prediction error was lowest. For this study, we fitted an RF model via the ‘randomForest’ package of R software (https://cran.r-project.org/web/packages/randomForest/randomForest.pdf).
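The RF tuning grid and the out-of-bag error of Eq. (2) can be sketched in a few lines of Python. This is an illustration only, since the study itself used the R package ‘randomForest’; the function names `rf_tuning_grid` and `mse_oob` are hypothetical, and the OOB predictions are assumed to have been averaged across trees already.

```python
from itertools import product


def rf_tuning_grid():
    """All (ntree, mtry) combinations described for RF in Section 2.4.1."""
    ntree_values = range(100, 901, 200)  # 100, 300, 500, 700, 900
    mtry_values = range(1, 16)           # 1, 2, ..., 15
    return list(product(ntree_values, mtry_values))


def mse_oob(observed, oob_predicted):
    """Mean square error between observations and averaged OOB predictions."""
    if len(observed) != len(oob_predicted):
        raise ValueError("observed and oob_predicted must have equal length")
    n = len(observed)
    return sum((o - p) ** 2 for o, p in zip(observed, oob_predicted)) / n


grid = rf_tuning_grid()
# Illustrative numbers only: one of three predictions is off by 2 units.
example_error = mse_oob([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Under these settings the grid contains 5 × 15 = 75 candidate parameter pairs, each of which would be scored by repeated three-fold cross-validation before the pair with the lowest prediction error is retained.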

2.4.2. Boosted regression trees

Boosted regression trees (BRT) provide greater predictive performance than single regression trees by aggregating several regression trees (Sindayihebura et al., 2017). The boosting algorithm is also referred to as a forward, stage-wise procedure in which a subset of the data is randomly selected to iteratively fit new tree models to minimize the loss function (Elith et al., 2008). A stochastic gradient boosting procedure was introduced in this process, which can improve model predictive power and minimize the risk of over-fitting through numerical optimization and regularization (Wang et al., 2017). The relative importance of variables was assessed from the number of times a variable was selected for splitting and the squared improvement resulting from each split (Ottoy et al., 2017). For this study, we fitted a BRT model via the ‘gbm’ package of R software (https://cran.r-project.org/web/packages/gbm/gbm.pdf). Fitting a BRT model requires specification of the following three meta-parameters: (1) learning rate (LR) or shrinkage parameter, determining the contribution of each tree to the growing model, (2) tree complexity (TC), controlling the order of interactions that can be fitted, and (3) number of trees (NT) required for optimal prediction, which depends on LR and TC. Similar to the RF model, in order to optimise these three parameters, a number of experiments were conducted using several combinations of NT, TC and LR. The values of TC were 1, 3, 6, 9, 12 and 15; NT varied between 100 and 2000 at intervals of 200; and the values of LR ranged between 0.001 and 0.008 at intervals of 0.001. These combinations generated an optimal BRT model through three repetitions of three-fold cross-validation by performing a grid search using the train function of the ‘caret’ package (Kuhn, 2008).

Table 1
Descriptive statistics of soil organic carbon (SOC) stock (Mg C/ha) (n = 705).

Depth (cm)   Min    Max      Mean    Median   CV (%)
0–5          0.33   47.96    7.32    6.68     58.37
5–10         0.58   20.24    5.57    5.19     47.07
10–20        1.49   39.42    7.96    7.10     53.08
20–30        0.36   41.77    6.77    5.68     62.64
0–30         5.08   101.90   27.61   25.94    44.52

2.4.3. Support vector machine

Support vector machine (SVM) analysis is another popular supervised machine learning tool for classification and regression, proposed by Cortes and Vapnik (1995). There are many successful applications of SVM in image segmentation, object detection, image classification, handwriting recognition, text and hypertext categorization, and applications in the biological and other sciences. SVM uses hyperplanes to divide all of the data into different classes optimally. It has been reported to have a better learning capability and smaller prediction errors than many other methods (Maynard and Levi, 2017; Were et al., 2015). In general, there are four types of kernel functions commonly seen in SVM. As the relationship between SOC stocks and predictor variables in our case is non-linear, we selected the radial basis function (RBF) as the kernel for SVM in the following experiments.
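For readers unfamiliar with the RBF kernel, the sketch below shows one common parametrisation, k(x, y) = exp(−sigma · ‖x − y‖²). This form is an assumption: the source does not state which parametrisation was used, and in this form a larger sigma gives a narrower, more non-linear kernel. The function name `rbf_kernel` and the example vectors are hypothetical.

```python
import math


def rbf_kernel(x, y, sigma):
    """RBF kernel exp(-sigma * ||x - y||^2) for two equal-length vectors.

    Assumes sigma multiplies the squared Euclidean distance; other
    conventions (for example dividing by 2 * width**2) also exist.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sigma * sq_dist)


# Two hypothetical, already-scaled covariate vectors.
a = (1.0, 0.0)
b = (0.0, 0.0)
k_same = rbf_kernel(a, a, 0.05)     # identical points give similarity 1
k_wide = rbf_kernel(a, b, 0.005)    # small sigma: similarity decays slowly
k_narrow = rbf_kernel(a, b, 0.05)   # larger sigma: similarity decays faster
```

The kernel value acts as a similarity score between two sites in covariate space, which is how a linear separating hyperplane in the transformed space can represent a non-linear relationship in the original covariates.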
SVM needs two parameters to be tuned: the penalty (cost), which controls the trade-off between margin and training errors, and the kernel width (sigma), which controls the degree of nonlinearity of the model (Naghibi et al., 2017). The regression problem can then be solved by a standard quadratic programming form to obtain the optimal solution. Similarly, in order to optimise the two parameters cost and sigma, a number of experiments were conducted using different combinations of cost and sigma. The values of cost were 4, 8, 16, 32 and 64, and sigma was set between 0.005 and 0.05 at intervals of 0.005. In the current study, three replicates of the three-fold cross-validation were used to select the optimal parameters of SVM. For this study, we fitted an SVM model via the ‘e1071’ package of R software (https://cran.r-project.org/web/packages/e1071/e1071.pdf).

2.5. Model performance assessment

We used the three machine learning methods (RF, BRT and SVM) to build SOC stock models with and without SFC variables. The first three models only included climate, parent material, relief and age covariates (i.e. data set 1, Table 2). The second three models included all predictors (data set 1 plus SFC covariates), allowing an evaluation of the contribution of SFC in this DSM context. To compare model performance, we computed the following validation indices (Yang et al., 2015): (1) regression coefficient of determination (R²), (2) the root mean squared error (RMSE), (3) the mean absolute error (MAE), (4) Lin's Concordance Correlation Coefficient (LCCC) (Lin, 1989) and (5) the relative improvement (RI) of RMSE, which are defined as:

$$ R^2 = \left( \frac{\sum_{i=1}^{n} (O_i - \bar{O})(P_i - \bar{P})}{\sqrt{\sum_{i=1}^{n} (O_i - \bar{O})^2}\, \sqrt{\sum_{i=1}^{n} (P_i - \bar{P})^2}} \right)^2 \tag{6} $$

$$ MAE = \frac{1}{n} \sum_{i=1}^{n} \left| P_i - O_i \right| \tag{7} $$

$$ RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (P_i - O_i)^2} \tag{8} $$

$$ LCCC = \frac{2 r \sigma_o \sigma_p}{\sigma_o^2 + \sigma_p^2 + (\bar{P} - \bar{O})^2} \tag{9} $$

$$ RI = \frac{RMSE_2 - RMSE_1}{RMSE_1} \times 100\% \tag{10} $$
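A minimal Python sketch of the validation indices in Eqs. (6)–(10) is given below, with `obs` standing for the observed values O_i and `pred` for the predicted values P_i. The function names are hypothetical, and two details are assumptions rather than statements from the source: variances are computed with divisor n, and the sign of RI follows Eq. (10) as printed, so a lower RMSE_2 gives a negative value.

```python
import math
from statistics import fmean


def validation_metrics(obs, pred):
    """Return R2, MAE, RMSE and LCCC for paired observations/predictions."""
    if len(obs) != len(pred) or len(obs) < 2:
        raise ValueError("obs and pred must be equal-length, with n >= 2")
    n = len(obs)
    o_bar, p_bar = fmean(obs), fmean(pred)
    s_oo = sum((o - o_bar) ** 2 for o in obs)
    s_pp = sum((p - p_bar) ** 2 for p in pred)
    s_op = sum((o - o_bar) * (p - p_bar) for o, p in zip(obs, pred))
    r = s_op / math.sqrt(s_oo * s_pp)       # Pearson correlation
    var_o, var_p = s_oo / n, s_pp / n       # divisor n is an assumption
    mae = sum(abs(p - o) for o, p in zip(obs, pred)) / n
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / n)
    lccc = (2.0 * r * math.sqrt(var_o) * math.sqrt(var_p)
            / (var_o + var_p + (p_bar - o_bar) ** 2))
    return {"r2": r ** 2, "mae": mae, "rmse": rmse, "lccc": lccc}


def relative_improvement(rmse1, rmse2):
    """RI of RMSE in percent, with the sign convention of Eq. (10)."""
    return (rmse2 - rmse1) / rmse1 * 100.0


# Illustrative data: predictions with a constant bias of +1.
demo = validation_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

The biased example illustrates why LCCC is reported alongside R²: a constant offset leaves R² at 1 but lowers LCCC, because the concordance coefficient also penalises departure from the 1:1 line.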

where Pi and Oi are the predicted and observed SOC stocks; n is the number of samples; P and O are the means for the predicted and observed SOC stocks; σ2o and σ2p are the variances of predicted and observed values and r is the Pearson correlation coefficient between the predicted and observed values. Model with the highest LCCC and R2 and lowest RMSE and MAE is determined to be the most accurate model. The model calibration and validation were repeated using predictors with/without seasonal fractional cover data. Soil samples were randomly divided into two datasets. The first dataset (75%) was used to

Table 2
Environmental variables used in the prediction of SOC stock in the study area.

| Types | Variables | Definition | Resolution |
|---|---|---|---|
| Climate | Rainfall | Mean annual rainfall | 1 km |
| | Temperature | Mean annual temperature | 1 km |
| Relief and weathering | Elevation | The height of a location above the Earth's sea level | 30 m |
| | Weathering index | The index to represent the degree of weathering of parent materials | 100 m |
| Lithology | Silica index | An index representing the lithological character of the parent material | 100 m |
| | Illite | The relative proportions of illite | 100 m |
| | Kaolin | The relative proportions of kaolin | 100 m |
| | Smectite | The relative proportions of smectite | 100 m |
| | Smectite/kaolin ratio (SK) | The ratio of smectite to kaolin (S/K) | 100 m |
| | Potassium (RadK) | Concentrations of the radioelements potassium | 100 m |
| | Uranium (RadU) | Concentrations of the radioelements uranium | 100 m |
| | Thorium (RadTh) | Concentrations of the radioelements thorium | 100 m |
| Biota | Band 1 (four seasons) | Bare ground fraction (bare ground, rock, disturbed) | 30 m |
| | Band 2 (four seasons) | Green vegetation fraction | 30 m |
| | Band 3 (four seasons) | Non-green vegetation fraction (litter, dead leaf and branches) | 30 m |
| | Band 4 (four seasons) | Model fitting error | 30 m |


B. Wang et al. / Science of the Total Environment 630 (2018) 367–378

Table 3
The coefficient of variation (CV) of the predictive quality of random forest (RF), boosted regression trees (BRT) and support vector machine (SVM) for SOC stock using different predictors with 100 runs. The CVs of the three modelling techniques without seasonal fractional cover data are shown in brackets. The coefficient of determination (R2), Lin's Concordance Correlation Coefficient (LCCC), root mean squared error (RMSE), and mean absolute error (MAE) are used to evaluate accuracy.

| Model | R2 (0–5 cm) | LCCC (0–5 cm) | RMSE (0–5 cm) | MAE (0–5 cm) | R2 (0–30 cm) | LCCC (0–30 cm) | RMSE (0–30 cm) | MAE (0–30 cm) |
|---|---|---|---|---|---|---|---|---|
| RF | 22.0% (29.9%) | 14.7% (21.1%) | 20.4% (19.9%) | 8.2% (7.2%) | 11.9% (12.6%) | 6.8% (7.8%) | 9.8% (9.4%) | 6.3% (6.2%) |
| BRT | 22.0% (29.2%) | 15.9% (20.8%) | 20.8% (20.6%) | 8.3% (7.6%) | 11.3% (12.6%) | 7.2% (8.0%) | 10.7% (10.5%) | 6.4% (6.5%) |
| SVM | 23.0% (36.0%) | 17.1% (27.7%) | 22.0% (20.5%) | 9.4% (9.0%) | 13.1% (17.3%) | 8.2% (11.0%) | 10.6% (10.8%) | 7.0% (7.1%) |

tune the parameters of the models by three-fold cross validation with three repetitions. The second dataset (25%) was used to estimate model performance. This procedure was repeated 100 times using a sampling with replacement method to obtain 100 random subsamples of the data, each with its own calibration and validation dataset. At the end, 100 tuned models and their model evaluation criteria were returned for each method for SOC stocks. For each raster cell, the 100 model runs were used to determine the mean and the standard deviation. Fig. 2 provides a schematic representation of the model selection procedure for SOC stock mapping in the entire study area.

3. Results and discussion

3.1. Descriptive statistics of SOC

Table 1 shows the descriptive statistics of SOC stock at each depth interval including the 0–30 cm interval. The SOC stock at 0–5 cm ranged from 0.33 to 47.96 Mg C ha−1, with a mean and median of 7.32 and 6.68 Mg C ha−1, respectively. The coefficient of variation was 58.4%, representing high variability of SOC stocks across the entire dataset. By contrast, the range of the SOC stock was 5.08–101.90 Mg C ha−1 for the 0–30 cm soil depth, with a mean and median of 27.61 and 25.94 Mg C ha−1, respectively. The coefficient of variation was lower for the 0–30 cm soil depth (44.5%) than for the 0–5 cm soil layer.

3.2. The relationships between SOC stocks and the predictors

Linear correlations between SOC and the quantitative predictors are shown in Fig. 3. SOC stock at the soil surface (0–5 cm) was positively correlated with elevation (r = 0.29) and mean annual rainfall (r = 0.10) but negatively correlated with mean annual temperature (r = −0.33). Of more interest for our study was that correlations between SOC and SFC covariates were all significant. For example, SOC had a slightly higher correlation with Band 2, especially in spring (r = 0.53) and winter (r = 0.50), than with the other three bands. Similar linear correlations between SOC stock and environmental covariates were found for the 0–30 cm soil depth (Fig. 3).

Fig. 2. Schematic overview of assessing the model performances and generation of SOC maps for different soil depth layers (0–5 cm and 0–30 cm) for the rangelands in eastern Australia. RF, random forest; BRT, boosted regression trees; SVM, support vector machine; R2, regression coefficients of determination; MAE, mean absolute error; LCCC, Lin's Concordance Correlation Coefficient; RMSE, root-mean-square error; SD, standard deviation.
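The calibration and validation loop summarised in Fig. 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, a least-squares line stands in for RF, BRT and SVM, each run draws a fresh random 75/25 split, and the three-fold cross-validation tuning step is omitted. The names `fit_line` and `repeated_holdout` are hypothetical.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Least-squares line; only a stand-in for RF, BRT or SVM in this sketch."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(pred, obs):
    """Root mean squared error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def repeated_holdout(xs, ys, n_runs=100, train_frac=0.75, seed=0):
    """Repeat a random calibration/validation split and score every run."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    scores = []
    for _ in range(n_runs):
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))
        train, test = idx[:cut], idx[cut:]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        pred = [slope * xs[i] + intercept for i in test]
        scores.append(rmse(pred, [ys[i] for i in test]))
    return scores


# Synthetic data only: one hypothetical covariate and a noisy response.
data_rng = random.Random(42)
xs = [data_rng.uniform(0, 100) for _ in range(705)]
ys = [10 + 0.2 * x + data_rng.gauss(0, 5) for x in xs]

scores = repeated_holdout(xs, ys)
mean_rmse = statistics.fmean(scores)
cv_percent = 100 * statistics.stdev(scores) / mean_rmse
print(f"mean RMSE = {mean_rmse:.2f}, CV across runs = {cv_percent:.1f}%")
```

Keeping one score per run, rather than a single pooled score, is what makes it possible to report the spread of each index across runs, as Table 3 does.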

3.3. Model comparison and evaluation

RF, BRT and SVM were fitted to SOC stock at 0–5 cm and 0–30 cm soil depths. To evaluate the modelling uncertainty, boxplots were used to visualise the distribution of the RMSE, R2, MAE and LCCC from the 100 runs for SOC stock at the two soil depths for the different model algorithms (Fig. 4). These four indices varied moderately across the 100 runs (Table 3), suggesting that the three models were moderately stable in their predictive ability. The modelling uncertainty can be attributed to the variability in observed SOC stock values; in addition, uncertainty may be introduced by low predictor precision and modelling errors (Goidts et al., 2009; Yang et al., 2016). Our results showed that the BRT model outperformed the RF and SVM models when SFC covariates were excluded, offering the highest mean values of R2 and LCCC and the lowest mean values of MAE and RMSE over the 100 runs, although the differences in the four evaluation indicators among the three machine learning techniques were small. This is in agreement with the results of Yang et al. (2016), who reported that BRT performed better than the RF model in predicting SOC stocks on the Tibetan Plateau, China. Meanwhile, in our study the mean R2 values suggested that the BRT model could explain approximately 44% of the variation in SOC stock at 0–30 cm depth. However, the BRT model showed relatively low prediction performance at 0–5 cm depth (R2 = 0.32), which may be due to the effects on the surface 0–5 cm soil layer of other environmental variables that were not directly included in our study (e.g. soil erosion).
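The evaluation indices compared above can be computed as in the following sketch, which follows Eqs. (8) to (10). Several points are assumptions rather than details taken from the paper: R2 is computed here as the squared Pearson correlation (its defining equation is not part of this section), population standard deviations are used in the LCCC, and all numbers are made up for illustration. Which of RMSE1 and RMSE2 refers to the model with SFC is not stated in this section, so the arguments keep the generic names of Eq. (10).

```python
import math
import statistics


def pearson_r(pred, obs):
    """Pearson correlation coefficient between predictions and observations."""
    mp, mo = statistics.fmean(pred), statistics.fmean(obs)
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    ss_p = sum((p - mp) ** 2 for p in pred)
    ss_o = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(ss_p * ss_o)


def rmse(pred, obs):
    """Eq. (8): root mean squared error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def mae(pred, obs):
    """Mean absolute error."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def lccc(pred, obs):
    """Eq. (9): Lin's concordance correlation coefficient."""
    r = pearson_r(pred, obs)
    sd_p, sd_o = statistics.pstdev(pred), statistics.pstdev(obs)
    bias = statistics.fmean(pred) - statistics.fmean(obs)
    return 2 * r * sd_o * sd_p / (sd_o ** 2 + sd_p ** 2 + bias ** 2)


def relative_improvement(rmse_1, rmse_2):
    """Eq. (10): percentage change of RMSE2 relative to RMSE1."""
    return (rmse_2 - rmse_1) / rmse_1 * 100.0


# Hypothetical values: predictions are perfectly correlated but biased by +1.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [2.0, 3.0, 4.0, 5.0]
r_squared = pearson_r(pred, obs) ** 2

print(f"R2 = {r_squared:.3f}, LCCC = {lccc(pred, obs):.3f}")
print(f"RMSE = {rmse(pred, obs):.3f}, MAE = {mae(pred, obs):.3f}")
print(f"RI = {relative_improvement(4.0, 4.5):.1f}%")
```

The toy data show why the LCCC is reported alongside R2: a constant bias leaves the correlation untouched but lowers the concordance, because the squared difference of the means enters the denominator of Eq. (9).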


Incorporation of SFC covariates improved SOC stock predictions in the three models, which is consistent with the significant correlations between SFC and SOC (Fig. 3). This was expected, as adding predictors (more information) generally improves the model (Yang et al., 2015). For example, the R2 of the RF model increased from 0.29 to 0.43 at 0–5 cm and from 0.42 to 0.48 at 0–30 cm. Similar increases were found with the BRT and SVM models. Comparing the RMSE of the three models with and without SFC, the relative improvement in RMSE was 7.4–12.7% for 0–5 cm and 2.8–5.9% for 0–30 cm soil depths, highlighting the role of SFC covariates in predicting SOC stocks in the surface 0–5 cm soil layer. This was unsurprising due to the greater concentration of SOC in the surface 5 cm of soil compared with deeper (>5 cm) soil layers, the vulnerability of surface soil to erosion and the sensitivity of this layer to changes in soil moisture, temperature and organic matter (biomass) inputs, which are key drivers of SOC (Orgill et al., 2017a). Although the environmental conditions, sampling strategies, quantity and quality of the auxiliary data used, and validation methods of our study differed from previous studies, the regression quality indices of the modelling techniques (the mean value of R2 among the three models was 0.43 for 0–5 cm and 0.48 for 0–30 cm) were comparable with other regional and fine-scale studies. For example, Román-Sánchez et al. (2018) also used RF but achieved a much lower explained variance of 18% in Sierra Morena, Spain. Wang et al. (2017) developed a BRT model that explained 39–65% of the variance in north eastern China. In a study in the central highlands of Madagascar, Razakamanarivo et al. (2011) found that a BRT model explained 61–68% of the total spatial SOC variability. Gray et al. (2015) used Cubist linear piecewise decision tree and multiple linear regression models to examine the key soil-forming factors controlling SOC stocks in eastern Australia. In this latter study, independent validation of the SOC stock predictions showed LCCC values up to 0.68, which were similar to the LCCC values of validation points ranging between 0.54 and 0.74 (Fig. 4) found in our study. Generally, tree-based ensemble methods are reported to provide more accurate SOC stock predictions compared to SVM models (Forkuor et al., 2017; Ließ et al., 2016). However, Were et al. (2015) found SVM was a more accurate model for predicting the spatial distribution of SOC stock compared to RF in the Eastern Mau Forest Reserve, Kenya. Forkuor et al. (2017) reported RF as having better prediction accuracy compared to BRT in mapping soil properties in west Africa, while Yang et al. (2016) found the latter superior to the former in soil carbon storage prediction at the north-eastern edge of the Tibetan Plateau in China. In our study, RF and BRT outperformed SVM in a semi-arid rangeland environment. Taken together, these results suggest that no single machine learning algorithm serves best for every landscape, and therefore models should be calibrated and assessed in order to identify the most appropriate one for the situation (Forkuor et al., 2017).

Fig. 3. Pearson correlation coefficients for the relation between SOC stock (0–5 cm and 0–30 cm) and 28 predictor variables used in this study. Correlations with p-value > 0.05 are considered non-significant; in this case the correlation coefficient values are blank (white).

3.4. Relative importance of environmental variables

The relative importance of each environmental variable in the RF and BRT models was assessed by normalizing the variable importance values of each model to sum to 100% (Fig. 5). The results revealed different dominating environmental features between the two models. In the BRT model, SOC was mainly explained by Band 1 (29% relative importance, summed over the four seasonal values), followed by Band 2 (28%) and Band 3 (13%) at 0–5 cm. This showed that different bands of SFC imagery were the most influential factors in predicting SOC. These results were expected because SFC variables were correlated with the spatial patterns of topsoil C (Fig. 3). Specifically, importance measures of the predictor variables investigated by BRT indicated that bare ground fraction in spring (Band 1_spring) was the most important variable in explaining the spatial distribution of SOC stock, followed by green vegetation fraction in winter (Band 2_winter) and Band 2_spring.
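The two bookkeeping steps described above, scaling importance values to sum to 100% and adding up the four seasonal layers of a band, can be sketched as follows. The raw scores are hypothetical and the function names are illustrative; only the variable naming pattern (for example `Band1_spring`) follows the labels used in Fig. 5.

```python
def normalise_importance(raw):
    """Scale raw importance scores so that they sum to 100%."""
    total = sum(raw.values())
    return {name: 100.0 * value / total for name, value in raw.items()}


def sum_by_band(importance):
    """Add up seasonal layers, e.g. Band1_spring + Band1_winter -> Band1."""
    grouped = {}
    for name, value in importance.items():
        key = name.split("_")[0]
        grouped[key] = grouped.get(key, 0.0) + value
    return grouped


# Hypothetical raw importance scores (not values from the paper).
raw_scores = {
    "Band1_spring": 6.0,
    "Band1_winter": 2.0,
    "Band2_spring": 4.0,
    "Rainfall": 5.0,
    "DEM": 3.0,
}

relative = normalise_importance(raw_scores)
by_band = sum_by_band(relative)
for name, value in sorted(by_band.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.1f}%")
```

Normalising first and grouping second keeps the grouped totals on the same 100% scale as the individual variables, so a band total can be compared directly with a single climate variable.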

Fig. 4. Results of model evaluation criteria for prediction of soil carbon stocks (Mg C/ha) using random forest (RF), boosted regression tree (BRT) and support vector machine (SVM) with 100 runs for the different input environmental variables (NO_SFC: 12 predictor variables without seasonal fractional cover data sets (SFC); WITH_SFC: 28 predictors including 16 SFC variables). The coefficient of determination (R2), Lin's Concordance Correlation Coefficient (LCCC), root mean squared error (RMSE), and mean absolute error (MAE) are used to evaluate accuracy. The black lines within the box indicate the medians with 100 runs while crosshairs indicate means. Box boundaries indicate the 25th and 75th percentiles, whiskers below and above the box indicate the 10th and 90th percentiles. A good model will have R2 and LCCC close to 1 and RMSE and MAE of almost 0.


The other four mineralogy predictors, namely lithology, smectite, illite and kaolin, can be considered less important, as the relative importance of those variables was lower than that of the previous variables. These findings indicate the potential application of remote sensing techniques to mapping SOC distribution in the surface soil layer in semi-arid rangeland environments. In the RF model, however, the importance values of the individual predictors were similar (each about 5%) in the surface 0–5 cm soil. For example, rainfall explained 5.0% of SOC variation, followed by Band 3 in winter (4.3%), Band 2 in autumn (4.3%) and Band 2 in summer (4.0%). The SFC covariates were important factors in determining SOC in the BRT and RF models at 0–5 cm. This result demonstrates that the proportions of bare ground and vegetation in different seasons derived from Landsat images are practical indicators for representing surface (0–5 cm) soil organic carbon in the semi-arid rangelands. While vegetation cover is a key driver of SOC in this study, in contrast to other studies (Taghizadeh-Mehrjardi et al., 2016; Yang et al., 2016), both the role of vegetation in supplying organic matter and the loss of SOC through erosion of bare soil can influence SOC stocks. For example, the normalized difference vegetation index (NDVI) is the most commonly used band ratio in ecological environment research and it has been widely applied in studies of SOC. Page et al. (2013) found that NDVI was one of the most important variables affecting SOC stocks across Queensland grain-cropping regions. However, there are some limitations of only using vegetation indices in semi-arid and arid environments because the effects of exposed soil, standing dead vegetation and litter on the spectral response can be substantial (Maynard and Levi, 2017), especially when bare soil cover is >20% (Sankey and Weber, 2009). In our study, the spatial pattern of SOC in the surface 0–5 cm layer was not only highly related to vegetation, but also to the fraction of bare ground in rangelands (Fig. 3). The study area was characterised by large bare soil fractions; Band 1 ranged between 22.5% and 32.9% over four seasons. Therefore, the correlation between SOC and bare ground may relate to the loss of SOC via soil erosion as much as, or more than, to the absence of vegetation limiting the OM input. We conclude that satellite-derived covariates such as fractional cover (bare ground fraction) are useful in models for the estimation of SOC stocks in the semi-arid rangelands of eastern Australia.

It is noteworthy that the relative influence of each factor varied between 0–5 and 0–30 cm, with the SFC covariates becoming less dominant and climate variables becoming more dominant with the larger depth interval (i.e. 0–30 cm). Rainfall was identified as the most important variable affecting the spatial distribution of SOC stocks for the RF and BRT models at 0–30 cm. Similar to the findings of this study, Davy and Koen (2013) found rainfall to be the most important variable influencing SOC stock in eastern Australia. This is not surprising due to the relationship between climate variables (precipitation and temperature) and soil moisture, a key driver of plant growth and net primary productivity, and therefore SOC dynamics. This is supported by other studies in eastern Australia, which identified temperature and rainfall as important factors controlling SOC (Allen et al., 2013; Gray et al., 2015; Hobley et al., 2015). The contributions of the remaining predictors to the models were more or less the same; that is, their exclusion only marginally influenced model performance at 0–30 cm depth.

Fig. 5. Patterns in the importance of each predictor variable used in random forest (RF) and boosted regression tree (BRT) models to predict SOC stocks at 0–5 cm and 0–30 cm. Each variable was scaled to sum to 100%.


Fig. 6. Mean and standard deviation (SD) of SOC stock at 0–5 cm and 0–30 cm predicted from 100 runs of the random forest (RF) model.
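The per-cell summary behind mean and SD maps of this kind can be sketched as below: every raster cell has one prediction per model run, and the two maps store the mean and the standard deviation of those predictions. This is an illustration only. The tiny 2 × 2 grids are made-up numbers, three runs stand in for the 100 runs, and the use of the sample standard deviation is an assumption, since the paper does not say which form was used.

```python
import statistics


def cell_mean_sd(runs):
    """Return (mean_map, sd_map) for a list of equally sized 2-D grids."""
    n_rows, n_cols = len(runs[0]), len(runs[0][0])
    mean_map = [[0.0] * n_cols for _ in range(n_rows)]
    sd_map = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            values = [run[r][c] for run in runs]
            mean_map[r][c] = statistics.fmean(values)
            sd_map[r][c] = statistics.stdev(values)  # sample SD (assumption)
    return mean_map, sd_map


# Hypothetical predicted SOC stocks (Mg C/ha) from three model runs.
runs = [
    [[20.0, 30.0], [25.0, 40.0]],
    [[22.0, 30.0], [27.0, 44.0]],
    [[24.0, 30.0], [29.0, 48.0]],
]

mean_map, sd_map = cell_mean_sd(runs)
print(mean_map)
print(sd_map)
```

Cells where the runs disagree get a larger SD, which is how an SD map flags locations where the predictions are less certain.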

3.5. Spatial prediction

With SFC predictors added, the RF model produced good results for SOC, as noted above (Fig. 4). Therefore, the RF model was selected as the best algorithm based on the evaluation criteria to predict the spatial distribution of SOC across the study area. Instead of only one model, a total of 100 RF models were applied to the study area, as variation was observed among model runs (Table 3). Fig. 6 displays the spatial patterns of SOC stock for two soil depths (0–5 cm and 0–30 cm) predicted by RF models. Broadly, there was an increasing gradient of SOC stock from west to east influenced by the rainfall gradient (see rainfall distribution in Liu et al. (2014)), with the highest stock in the eastern and south-eastern parts. Temperature and elevation, which were correlated factors, were also responsible for great variations, with low stocks in low altitude, warmer regions and higher stocks in high altitude, cooler parts of the study area. Most of the eastern parts of rangelands in the study area had >40 Mg C ha−1 SOC in the top 30 cm whereas the average stock in the western parts was <30 Mg C ha−1. Similar spatial patterns can be found in the 0–5 cm soil layer. In addition, higher standard deviation of SOC stock was found in the northern and southern parts of the study area. Seasonal rainfall patterns and soil type differed across the study area. The north-eastern extent of the study area tends to have a more warm-season dominant rainfall pattern with a high proportion of high-clay soils (Vertisols) compared to south-eastern areas with winter-dominated rainfall patterns and dominated by coarser texture, sandy soils (Arenosols). Plants growing in these regions will respond differently to rainfall, due to different moisture characteristics. The higher soil fertility and higher rainfall favour higher net primary productivity and may explain the higher SOC stocks in the north-eastern part compared with the western parts of the study area typified by less rainfall and higher temperatures.

We compared our digital mapping of SOC stock with maps that were produced at national and state scale, namely the Soil and Landscape Grid of Australia soil attributes (90 m) map (referred to as SLGA) that can be downloaded from the CSIRO Data Access Portal (DAP) (http://www.clw.csiro.au/aclep/soilandlandscapegrid/GetData-DAP.html) (Viscarra Rossel et al., 2014) and Office of Environment and Heritage (OEH) digital soil mapping (100 m) over NSW (OEH, 2017), which can be downloaded from the OEH Data Portal (http://data.environment.nsw.gov.au/dataset/digital-soil-maps-for-key-soilproperties-over-nsw). Viscarra Rossel et al. (2014) presented the Australian SOC stock map for the soil depth layer of 0–30 cm, which was produced with kriging of residuals. The OEH map was a product of a linear regression model with a log-link function. The SOC stock at 0–30 cm predicted by RF in this study and by the existing SLGA and OEH maps was plotted against the observed SOC stock (Fig. 7). The regression equation was calculated for the 100 validation data sets. Overall, the R2 of RF was 0.47, SLGA R2 was 0.24 and OEH R2 was 0.001, which showed that our RF model predictions had stronger agreement with the observed values than the other two products. The intercept of RF was 43% and 54% lower than SLGA and OEH, respectively. Furthermore, the slope of regression for the RF model was not significantly different from 1 (p = 0.45) and was much higher than OEH. Compared with our model, both the SLGA and OEH maps underestimated observed SOC stock to a greater extent at a number of sites in the study area. This is likely to be due to differences in data sources, scale of prediction, types of predictor variables and modelling techniques used. This suggests that these spatial maps may be unsuitable for predicting SOC stock in soils with low SOC stock in rangelands.

Fig. 7. Observed (validation datasets) vs. predicted values for SOC stock from Office of Environment and Heritage (OEH) and Soil and Landscape Grid of Australia (SLGA) digital map (0–30 cm, Mg C/ha) together with our random forest (RF) model using seasonal fractional cover based on mean values of 100 runs. Fitted lines shown in the plot: RF model, y = 0.48x + 14.27 (R² = 0.47); SLGA, y = 0.17x + 24.98 (R² = 0.24); OEH, y = 0.01x + 31.00 (R² = 0.001).

Inevitably, there is some uncertainty involved in our study. The limited accuracy may be related to variation in environmental conditions between the training and validation data, lower correlation found between SOC and some selected predictors, and random errors in measures of the training or validation samples (Bonfatti et al., 2016). Poor coverage of samples in north-western parts of the study area may result in high SOC uncertainty in these areas. It is noteworthy that auxiliary spatial data retrieved from different sources differ in quality (Were et al., 2015). For example, SFC data are limited by the high level of cloud cover in some areas, which leads to data gaps in the resulting SFC image.
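The map comparison in Fig. 7 rests on regressing predicted on observed SOC stock and comparing the fitted lines. The sketch below shows that calculation and reproduces the reported intercept differences from the three intercepts given in Fig. 7. The function names and the four data points used to exercise the regression are hypothetical; they are not data from the study.

```python
import statistics


def ols_line(x, y):
    """Return slope, intercept and R2 of the least-squares line y = a*x + b."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept, sxy ** 2 / (sxx * syy)


def percent_lower(value, reference):
    """How much lower `value` is than `reference`, in percent."""
    return 100.0 * (reference - value) / reference


# Hypothetical observed and predicted stocks lying exactly on one line.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [19.0, 24.0, 29.0, 34.0]
slope, intercept, r2 = ols_line(observed, predicted)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R2 = {r2:.2f}")

# Intercepts of the fitted lines reported in Fig. 7.
rf_b, slga_b, oeh_b = 14.27, 24.98, 31.00
print(f"RF vs SLGA: {percent_lower(rf_b, slga_b):.1f}% lower")  # about 42.9
print(f"RF vs OEH: {percent_lower(rf_b, oeh_b):.1f}% lower")    # about 54.0
```

A slope near 1 and an intercept near 0 would indicate predictions that track the observations across their whole range, which is why both are reported next to R2.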
Some previous studies highlight that the spatial distribution of SOC is subject to various factors operating at different levels of scale (Forkuor et al., 2017; Miller et al., 2015; Wilson et al., 2017). Therefore, multi- or hyper-scale remote sensing data which consider different spatial scales might further improve prediction (Maynard and Levi, 2017; Miller et al., 2015). In addition, soil erosion and deposition were not included in the models for lack of suitable data. Thus, incorporating the missing environmental data and multiple scale predictors, and modelling the stochastic component by analysing the spatial structure of residuals with geostatistical techniques (e.g. kriging), are some of the ways to minimize spatial prediction errors in the future (Were et al., 2015).

4. Conclusion

In this study we produced an accurate and high resolution DSM of SOC stocks using remote sensing data. This map can be used to benchmark and monitor changes in SOC stock at regional scales in the semi-arid rangelands of eastern Australia. Our study used, for the first time, multidecadal (1988–2015) seasonal fractional cover data as potential remote sensing predictors to predict SOC stock with different machine learning techniques. Our findings demonstrated that SFC covariates were important factors in predicting surface SOC stock with reasonable accuracy in the semi-arid rangelands, and that including SFC variables in machine learning techniques improves model accuracy by about 2.8–12.7%. The model performance of predicting SOC was better at 0–30 cm than at 0–5 cm. The accuracies obtained in this study are promising for future local scale digital soil mapping efforts in data-poor regions such as the rangelands, particularly considering the increasing availability of free high resolution remote sensing data. The use of remote sensing products can reduce field-based soil sampling efforts and therefore may be more cost-effective.

This study also reveals that the performance of models in mapping SOC can vary considerably depending on the type of model, the extent at which the environmental covariates were selected and the depth of mapping. It is also evident that the relative influence of each factor varied at the two soil depths (0–5 and 0–30 cm), with the SFC covariates less dominant and climate variables more dominant in the 0–30 cm compared with the 0–5 cm soil layer. In addition, improving model performance by incorporating multi- or hyper-scale remote sensing data may warrant further investigation. Nevertheless, the high-resolution maps and the methodology developed from this study should be of use for ongoing soil carbon assessment in rangeland regions of Australia and beyond.

Acknowledgements Funding for this research was provided by NSW Treasury and NSW Department of Primary Industries (RDE852-2). The authors would like to thank Dr. Xihua Yang for his helpful comments on an early draft of the manuscript. Thanks to Mr. Mark Young at OEH for providing additional observed soil data and Mr. Ian McGowen and Dr. Marja Simpson for their downloading topographic data and land fractional cover data. References Allen, D.E., Pringle, M.J., Bray, S., Hall, T.J., O'Reagain, P.O., Phelps, D., Cobon, D.H., Bloesch, P.M., Dalal, R.C., 2013. What determines soil organic carbon stocks in the grazing lands of north-eastern Australia? Soil Res. 51 (8), 695–706. Bartholomeus, H., Kooistra, L., Stevens, A., van Leeuwen, M., van Wesemael, B., Ben-Dor, E., Tychon, B., 2011. Soil organic carbon mapping of partially vegetated agricultural fields with imaging spectroscopy. Int. J. Appl. Earth Obs. Geoinf. 13 (1), 81–88. Bikila, N.G., Tessema, Z.K., Abule, E.G., 2016. Carbon sequestration potentials of semi-arid rangelands under traditional management practices in Borana, Southern Ethiopia. Agric. Ecosyst. Environ. 223, 108–114. Bonfatti, B.R., Hartemink, A.E., Giasson, E., Tornquist, C.G., Adhikari, K., 2016. Digital mapping of soil carbon in a viticultural region of Southern Brazil. Geoderma 261, 204–221. Breiman, L., 2001. Random forests. Mach. Learn. 45 (1), 5–32. Brungard, C.W., Boettinger, J.L., Duniway, M.C., Wills, S.A., Edwards, T.C., 2015. Machine learning for predicting soil classes in three semi-arid landscapes. Geoderma 239, 68–83. Camera, C., Zomeni, Z., Noller, J.S., Zissimos, A.M., Christoforou, I.C., Bruggeman, A., 2017. A high resolution map of soil types and physical properties for Cyprus: a digital soil mapping optimization. Geoderma 285, 35–49. Cortes, C., Vapnik, V., 1995. Support-vector networks. Mach. Learn. 20 (3), 273–297. Methods of soil analysis: part 4 physical methods. In: Dane, J.H., Topp, C.G. (Eds.), Agronomy No. 9. 
Soil Science Society of America, Madison, WI, United States of America. Davy, M.C., Koen, T.B., 2013. Variations in soil organic carbon for two soil types and six land uses in the Murray Catchment, New South Wales, Australia. Soil Res. 51 (8), 631–644. Elith, J., Leathwick, J.R., Hastie, T., 2008. A working guide to boosted regression trees. J. Anim. Ecol. 77 (4), 802–813. Forkuor, G., Hounkpatin, O.K.L., Welp, G., Thiel, M., 2017. High resolution mapping of soil properties using remote sensing variables in South-Western Burkina Faso: a comparison of machine learning and multiple linear regression models. PLoS One 12 (1), e0170478. Goidts, E., Van Wesemael, B., Crucifix, M., 2009. Magnitude and sources of uncertainties in soil organic carbon (SOC) stock assessments at various scales. Eur. J. Soil Sci. 60 (5), 723–739. Gray, J.M., Bishop, T.F.A., Wilson, B.R., 2015. Factors controlling soil organic carbon stocks with depth in Eastern Australia. Soil Sci. Soc. Am. J. 79 (6), 1741–1751. Gray, J.M., Bishop, T.F.A., Wilford, J.R., 2016. Lithology and soil relationships for soil modelling and mapping. Catena 147, 429–440. Grimm, R., Behrens, T., Märker, M., Elsenbeer, H., 2008. Soil organic carbon concentrations and stocks on Barro Colorado Island — digital soil mapping using random forests analysis. Geoderma 146 (1), 102–113. Grinand, C., Maire, G.L., Vieilledent, G., Razakamanarivo, H., Razafimbelo, T., Bernoux, M., 2017. Estimating temporal changes in soil carbon stocks at ecoregional scale in Madagascar using remote-sensing. Int. J. Appl. Earth Obs. Geoinf. 54, 1–14. Grundy, M.J., Rossel, R.A.V., Searle, R.D., Wilson, P.L., Chen, C., Gregory, L.J., 2015. Soil and landscape grid of Australia. Soil Res. 53 (8), 835–844. Heung, B., Bulmer, C.E., Schmidt, M.G., 2014. Predictive soil parent material mapping at a regional-scale: a Random Forest approach. Geoderma 214–215, 141–154. Hobley, E., Wilson, B., Wilkie, A., Gray, J., Koen, T., 2015. 
Drivers of soil organic carbon storage and vertical distribution in Eastern Australia. Plant Soil 390 (1), 111–127. Jeffrey, S.J., Carter, J.O., Moodie, K.B., Beswick, A.R., 2001. Using spatial interpolation to construct a comprehensive archive of Australian climate data. Environ. Model Softw. 16 (4), 309–330. Jeong, G., Oeverdieck, H., Park, S.J., Huwe, B., Ließ, M., 2017. Spatial soil nutrients prediction using three supervised learning methods for assessment of land potentials in complex terrain. Catena 154, 73–84. Keating, B., Grundy, M., Battaglia, M., Eady, S., 2009. An Analysis of Greenhouse Gas Mitigation and Carbon Biosequestration Opportunities From Rural Land Use. CSIRO, St Lucia, QLD https://doi.org/10.4225/08/58615c9dd6942 (changeme:822). Kempen, B., Brus, D.J., Stoorvogel, J.J., Heuvelink, G.B.M., de Vries, F., 2012. Efficiency comparison of conventional and digital soil mapping for updating soil maps. Soil Sci. Soc. Am. J. 76 (6), 2097–2115. Kuhn, M., 2008. Building Predictive Models in R Using the Caret Package. vol. 28(5) p. 26. Lagacherie, P., McBratney, A., Voltz, M., 2006. Digital Soil Mapping: An Introductory Perspective. vol. 31. Elsevier.

378

B. Wang et al. / Science of the Total Environment 630 (2018) 367–378

Ließ, M., Schmidt, J., Glaser, B., 2016. Improving the spatial prediction of soil organic carbon stocks in a complex tropical mountain landscape by methodological specifications in machine learning approaches. PLoS One 11 (4), e0153673. Lin, L.I.K., 1989. A concordance correlation coefficient to evaluate reproducibility. Biometrics 45 (1), 255–268. Liu, D.L., Anwar, M.R., O'Leary, G., Conyers, M.K., 2014. Managing wheat stubble as an effective approach to sequester soil carbon in a semi-arid environment: spatial modelling. Geoderma 214, 50–61. Lorenzetti, R., Barbetti, R., Fantappiè, M., L'Abate, G., Costantini, E.A.C., 2015. Comparing data mining and deterministic pedology to assess the frequency of WRB reference soil groups in the legend of small scale maps. Geoderma 237, 237–245. Malone, B.P., Jha, S.K., Minasny, B., McBratney, A.B., 2016. Comparing regression-based digital soil mapping and multiple-point geostatistics for the spatial extrapolation of soil data. Geoderma 262, 243–253. Maynard, J.J., Levi, M.R., 2017. Hyper-temporal remote sensing for digital soil mapping: characterizing soil-vegetation response to climatic variability. Geoderma 285, 94–109. McBratney, A.B., Mendonça Santos, M.L., Minasny, B., 2003. On digital soil mapping. Geoderma 117 (1), 3–52. Miller, B.A., Koszinski, S., Wehrhan, M., Sommer, M., 2015. Impact of multi-scale predictor selection for modeling soil properties. Geoderma 239–240, 97–106. Minasny, B., McBratney, A.B., 2016. Digital soil mapping: a brief history and some lessons. Geoderma 264, 301–311. Muir, J., Schmidt, M., Tindall, D., Trevithick, R., Scarth, P., Stewart, J., 2011. Guidelines for field measurement of fractional ground cover: a technical handbook supporting the Australian collaborative land use and management program. Tech. 
Rep.Queensland Department of Environment and Resource Management for the Australian Bureau of Agricultural and Resource Economics and Sciences, Canberra Naghibi, S.A., Ahmadi, K., Daneshi, A., 2017. Application of support vector machine, random forest, and genetic algorithm optimized random forest models in groundwater potential mapping. Water Resour. Manag. 31 (9), 2761–2775. OEH, 2017. Digital soil mapping of key soil properties over NSW. Technical Report. NSW Office of Environment and Heritage, Sydney. Orgill, S.E., Condon, J.R., Conyers, M.K., Morris, S.G., Murphy, B.W., Greene, R.S.B., 2017a. Parent material and climate affect soil organic carbon fractions under pastures in south-eastern Australia. Soil Res. 55 (8), 799–808. Orgill, S.E., Waters, C.M., Melville, G., Toole, I., Alemseged, Y., Smith, W., 2017b. Sensitivity of soil organic carbon to grazing management in the semi-arid rangelands of southeastern Australia. Rangel. J. 39 (2), 153–167. Ottoy, S., De Vos, B., Sindayihebura, A., Hermy, M., Van Orshoven, J., 2017. Assessing soil organic carbon stocks under current and potential forest cover using digital soil mapping and spatial generalisation. Ecol. Indic. 77, 139–150. Page, K.L., Dalal, R.C., Pringle, M.J., Bell, M., Dang, Y.P., Radford, B., Bailey, K., 2013. Organic carbon stocks in cropping soils of Queensland, Australia, as affected by tillage management, climate, and soil characteristics. Soil Res. 51 (8), 596–607. Prasad, A.M., Iverson, L.R., Liaw, A., 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9, 181–199. Rayment, G.E., Lyons, D., 2011. Soil Chemical Methods - Australasia. CSIRO Publishing, Collingwood, Victoria Australia. Razakamanarivo, R.H., Grinand, C., Razafindrakoto, M.A., Bernoux, M., Albrecht, A., 2011. Mapping organic carbon stocks in eucalyptus plantations of the central highlands of Madagascar: a multiple regression approach. Geoderma 162 (3–4), 335–346. 
Rodriguez-Galiano, V., Sanchez-Castillo, M., Chica-Olmo, M., Chica-Rivas, M., 2015. Machine learning predictive models for mineral prospectivity: an evaluation of neural networks, random forest, regression trees and support vector machines. Ore Geol. Rev. 71, 804–818.
Román-Sánchez, A., Vanwalleghem, T., Peña, A., Laguna, A., Giráldez, J.V., 2018. Controls on soil carbon storage from topography and vegetation in a rocky, semi-arid landscapes. Geoderma 311, 159–166.
Rudiyanto, Minasny, B., Setiawan, B.I., Arif, C., Saptomo, S.K., Chadirin, Y., 2016. Digital mapping for cost-effective and accurate prediction of the depth and carbon stocks in Indonesian peatlands. Geoderma 272, 20–31.

Sankey, T.T., Weber, K., 2009. Rangeland assessments using remote sensing: is NDVI useful? In: Weber, K.T., Davis, K. (Eds.), Final Report: Comparing Effects of Management Practices on Rangeland Health With Geospatial Technologies (NNX06AE47G), pp. 113–122 (168 pp.).
Schillaci, C., Acutis, M., Lombardo, L., Lipani, A., Fantappiè, M., Märker, M., Saia, S., 2017a. Spatio-temporal topsoil organic carbon mapping of a semi-arid Mediterranean region: the role of land use, soil texture, topographic indices and the influence of remote sensing data to modelling. Sci. Total Environ. 601–602, 821–832.
Schillaci, C., Lombardo, L., Saia, S., Fantappiè, M., Märker, M., Acutis, M., 2017b. Modelling the topsoil carbon stock of agricultural lands with the Stochastic Gradient Treeboost in a semi-arid Mediterranean region. Geoderma 286, 35–45.
Sindayihebura, A., Ottoy, S., Dondeyne, S., Van Meirvenne, M., Van Orshoven, J., 2017. Comparing digital soil mapping techniques for organic carbon and clay content: case study in Burundi's central plateaus. Catena 156, 161–175.
Somarathna, P.D.S.N., Malone, B.P., Minasny, B., 2016. Mapping soil organic carbon content over New South Wales, Australia using local regression kriging. Geoderma Reg. 7 (1), 38–48.
Sreenivas, K., Dadhwal, V.K., Kumar, S., Harsha, G.S., Mitran, T., Sujatha, G., Suresh, G.J.R., Fyzee, M.A., Ravisankar, T., 2016. Digital mapping of soil organic and inorganic carbon status in India. Geoderma 269, 160–173.
Taghizadeh-Mehrjardi, R., Nabiollahi, K., Kerry, R., 2016. Digital mapping of soil organic carbon at multiple depths using different data mining techniques in Baneh region, Iran. Geoderma 266, 98–110.
Viscarra Rossel, R.A., 2011. Fine-resolution multiscale mapping of clay minerals in Australian soils measured with near infrared spectra. J. Geophys. Res. Earth Surf. 116 (F4) (n/a–n/a).
Viscarra Rossel, R.A., Webster, R., Bui, E.N., Baldock, J.A., 2014. Baseline map of organic carbon in Australian soil to support national carbon accounting and monitoring under climate change. Glob. Chang. Biol. 20 (9), 2953–2970.
Viscarra Rossel, R.A., Chen, C., Grundy, M.J., Searle, R., Clifford, D., Campbell, P.H., 2015. The Australian three-dimensional soil grid: Australia's contribution to the GlobalSoilMap project. Soil Res. 53 (8), 845–864.
Wang, S., Zhuang, Q., Wang, Q., Jin, X., Han, C., 2017. Mapping stocks of soil organic carbon and soil total nitrogen in Liaoning Province of China. Geoderma 305, 250–263.
Wang, B., Waters, C., Orgill, S., Cowie, A., Clark, A., Li Liu, D., Simpson, M., McGowen, I., Sides, T., 2018. Estimating soil organic carbon stocks using different modelling techniques in the semi-arid rangelands of eastern Australia. Ecol. Indic. 88, 425–438.
Waters, C.M., Melville, G.J., Orgill, S.E., Alemseged, Y., 2015. The relationship between soil organic carbon and soil surface characteristics in the semi-arid rangelands of southern Australia. Rangel. J. 37 (3), 297–307.
Waters, C.M., Orgill, S.E., Melville, G.J., Toole, I.D., Smith, W.J., 2016. Management of grazing intensity in the semi-arid rangelands of southern Australia: effects on soil and biodiversity. Land Degrad. Dev. 28, 1363–1375.
Were, K., Bui, D.T., Dick, Ø.B., Singh, B.R., 2015. A comparative assessment of support vector regression, artificial neural networks, and random forests for predicting and mapping soil organic carbon stocks across an Afromontane landscape. Ecol. Indic. 52, 394–403.
Wilford, J., 2012. A weathering intensity index for the Australian continent using airborne gamma-ray spectrometry and digital terrain analysis. Geoderma 183 (Suppl. C), 124–142.
Wilson, C.H., Caughlin, T.T., Rifai, S.W., Boughton, E.H., Mack, M.C., Flory, S.L., 2017. Multidecadal time series of remotely sensed vegetation improves prediction of soil carbon in a subtropical grassland. Ecol. Appl. 27 (5), 1646–1656.
Yang, R., Rossiter, D.G., Liu, F., Lu, Y., Yang, F., Yang, F., Zhao, Y., Li, D., Zhang, G., 2015. Predictive mapping of topsoil organic carbon in an alpine environment aided by Landsat TM. PLoS One 10 (10), e0139042.
Yang, R.-M., Zhang, G.-L., Liu, F., Lu, Y.-Y., Yang, F., Yang, F., Yang, M., Zhao, Y.-G., Li, D.-C., 2016. Comparison of boosted regression tree and random forest models for mapping topsoil organic carbon concentration in an alpine ecosystem. Ecol. Indic. 60, 870–878.