Science of the Total Environment 642 (2018) 1032–1049


Science of the Total Environment journal homepage: www.elsevier.com/locate/scitotenv

A comparison study of DRASTIC methods with various objective methods for groundwater vulnerability assessment

Khabat Khosravi a, Majid Sartaj b, Frank T.-C. Tsai c, Vijay P. Singh d, Nerantzis Kazakis e, Assefa M. Melesse f, Indra Prakash g, Dieu Tien Bui h, Binh Thai Pham i,j,⁎

a Faculty of Natural Resources, Sari Agricultural Science and Natural Resources University, Sari, Iran
b Civil Engineering Department, University of Ottawa, Ottawa, Ontario K1N6N5, Canada
c Department of Civil and Environmental Engineering, Louisiana State University, Baton Rouge, LA 70803, USA
d Department of Biological and Agricultural Engineering & Zachry Department of Civil Engineering, Texas A & M University, USA
e School of Geology, Aristotle University of Thessaloniki, Greece
f Department of Earth and Environment, AHC-5-390, Florida International University, USA
g Department of Science & Technology, Bhaskarcharya Institute for Space Applications and Geo-Informatics (BISAG), Gandhinagar, India
h Geographic Information System Group, Department of Business and IT, University of South-Eastern Norway, Gullbringvegen 36, N-3800 Bø i Telemark, Norway
i Geographic Information Science Research Group, Ton Duc Thang University, Ho Chi Minh City, Viet Nam
j Faculty of Environment and Labour Safety, Ton Duc Thang University, Ho Chi Minh City, Viet Nam

HIGHLIGHTS

• Bivariate, machine learning, and DRASTIC methods were compared for groundwater vulnerability assessment.
• The predictive power of WOE is the highest, whereas that of DRASTIC is the lowest.
• Eight extra factors were investigated for groundwater vulnerability assessment.
• The most important factors were identified using IGR and SE.

ARTICLE INFO

Article history: Received 5 March 2018; Received in revised form 10 June 2018; Accepted 11 June 2018; Available online xxxx
Editor: Ouyang Wei
Keywords: Groundwater vulnerability; DRASTIC; Weights-of-Evidence; Shannon Entropy; Logistic Model Tree; Bootstrap Aggregating

ABSTRACT

Groundwater vulnerability assessment is a measure of potential groundwater contamination for areas of interest. The main objective of this study is to modify the original DRASTIC model using four objective methods, Weights-of-Evidence (WOE), Shannon Entropy (SE), Logistic Model Tree (LMT), and Bootstrap Aggregating (BA), to create a map of groundwater vulnerability for the Sari-Behshahr plain, Iran. The study also investigated the impact of adding eight factors (distance to fault, fault density, distance to river, river density, land-use, soil order, geological time scale, and altitude) to improve groundwater vulnerability assessment. A total of 109 nitrate concentration data points were used for modeling and validation purposes. The efficacy of the four methods was evaluated quantitatively using the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC). The AUC value for the original DRASTIC model without any modification of weights and rates was 0.50. Modification of weights and rates resulted in better performance, with AUC values of 0.64, 0.65, 0.75, and 0.81 for the BA, SE, LMT, and WOE methods, respectively. This indicates that WOE performs best in assessing groundwater vulnerability for the DRASTIC model with 7 factors. The results also show further improvement in the predictability of the WOE model when the 8 additional factors are introduced to DRASTIC, as the AUC value increased to 0.91. The most effective contributing factor for groundwater vulnerability in the study area is the net recharge. The least effective factors are the impact of vadose zone and hydraulic conductivity.

© 2018 Elsevier B.V. All rights reserved.

⁎ Corresponding author at: Ton Duc Thang University, Ho Chi Minh City, Viet Nam. E-mail address: [email protected] (B.T. Pham).

https://doi.org/10.1016/j.scitotenv.2018.06.130
0048-9697/© 2018 Elsevier B.V. All rights reserved.

K. Khosravi et al. / Science of the Total Environment 642 (2018) 1032–1049

1. Introduction

Groundwater is one of the most important sources of drinking water because it is less exposed to pollution from the surface. In arid and semi-arid regions such as Iran, groundwater is considered the sole safe water source for domestic, industrial, and agricultural activities. However, groundwater quantity and quality are being threatened by increasing demands due to population growth and agricultural/industrial activities on the one hand, and by increasing pollution from discharge of wastewater and application of chemical fertilizers on the other hand (Gardner and Vogel, 2005). Nitrate contamination of aquifers has become a significant problem as a result of land application of fertilizers in many agricultural areas. The United States Environmental Protection Agency uses nitrate concentration in groundwater as an indicator for: a) groundwater quality deterioration, b) specific vulnerability mapping, and c) identification of susceptible locations (Shrestha et al., 2016). Assessment of aquifer vulnerability or pollution potential is an important step in implementing any plan for managing groundwater resources and protecting these water sources against pollution (Hashimoto et al., 1982).

DRASTIC is one of the most widely used methods for vulnerability assessment of groundwater resources (Fijani et al., 2013; Kim and Hamm, 1999; Panagopoulos et al., 2006; Rahman, 2008; Sadeghfam et al., 2016). It is an overlay-index method developed by Aller et al. (1987b). DRASTIC uses seven hydrogeological factors which control the movement of contaminants in an aquifer (Wang et al., 2012). Despite its popularity, the original DRASTIC model has frequently been used without any validation against field measurements such as nitrate concentration in groundwater. In addition, the weights of the factors are assigned based on pre-set values originally proposed by Aller et al. (1987b), and the sub-factors or rates are selected based on experts' judgment, which introduces human subjectivity, error, and uncertainty. Thus, many researchers have proposed modified versions of DRASTIC to address these issues.

There are two general approaches for modification of the DRASTIC method: changing the factors of the original DRASTIC, such as subtraction of factors (Evans and Myers, 1990) or adding extra factors such as land-use and irrigation type (Secunda et al., 1998), or modifying the weight and rate scores based on field measurement data and coupling vulnerability and hazard maps. Recently, the Analytical Hierarchy Process (AHP) (Kang et al., 2017; Neshat et al., 2014), the Adaptive Neuro-Fuzzy Inference System (ANFIS) (Fijani et al., 2013), Fuzzy Logic (Asadi et al., 2017; Dixon, 2005; Nadiri et al., 2017a; Sahoo et al., 2016), the supervised committee fuzzy logic (Nadiri et al., 2017b), the single parameter sensitivity analysis (Sahoo et al., 2016), the Dempster–Shafer Theory of evidence (DST) (Al-Abadi, 2017; Neshat and Pradhan, 2015a), the frequency ratio (Neshat and Pradhan, 2015b), and the weight of evidence (WOE) (Abbasi et al., 2013) have been used to improve the results of DRASTIC.

Sahoo et al. (2016) used entropy information (E-DRASTIC), fuzzy pattern recognition (F-DRASTIC), and single parameter Sensitivity Analysis (SA-DRASTIC) to modify the weights of DRASTIC parameters for aquifer vulnerability assessment and to compare the performance of the subjective (DRASTIC and SA-DRASTIC) and the objective (E-DRASTIC and F-DRASTIC) weighting-based methods. They concluded that objective methods were effective in assessing vulnerability of the study area. Nadiri et al. (2017b) used a Supervised Intelligent Committee Machines (SICM) model to modify DRASTIC parameters. The artificial intelligence models were trained using measured nitrate concentration. The modified model showed high correlation with the observed nitrate concentrations (r = 0.94–0.98).

Although some of the above models, such as Artificial Neural Network (ANN), Fuzzy Logic (FL) and ANFIS, have been applied with some degree of improvement, in some cases they suffer from low prediction accuracy and weakness in determining the best weights for membership functions (Bui et al., 2016a,b). In addition, there is a need to try other methods such as decision tree-based algorithms (for example, Logistic Model Tree), which have not yet been used for groundwater vulnerability assessment. Decision tree-based algorithms have been shown to perform better in terms of prediction power than other models that have a hidden layer in their structure (such as ANN, FL, and ANFIS) (Kisi et al., 2012).

Kazakis and Voudouris (2015) modified DRASTIC by replacing soil media and vadose zone with aquifer thickness, nitrogen loss from soil, and hydraulic resistance, and developed DRASTIC-PA and DRASTIC-PAN for groundwater vulnerability and risk assessment. The generated vulnerability maps and risk zones correlated well with nitrate pollution. Asadi et al. (2017) modified DRASTIC by adding an extra factor of land use to assess groundwater vulnerability to nitrate in their study area. The Spearman rank correlation factor for the modified model was 0.61, compared to 0.52 for the original DRASTIC model. They also concluded that recharge and land use were the most significant parameters for vulnerability assessment. In addition to land-use, there are other hydrogeological factors, such as geological faults or streams, that can affect the transport of contaminants and could be evaluated for inclusion to modify the original DRASTIC model.

The objective of this study is two-fold. The first one is to incorporate four objective methods, Weights-of-Evidence (WOE), Shannon Entropy (SE), Logistic Model Tree (LMT) and Bootstrap Aggregating (BA), to modify the weights and rate scores in order to overcome the subjectivity in DRASTIC and to improve groundwater vulnerability assessment.
The second one is to modify the DRASTIC (mod-DRASTIC) method using the best performing objective model from the previous step, adding extra factors that have the potential to affect the transport of contaminants such as nitrate into groundwater, and to assess the performance of the modified model. The extra factors that were considered include distance to faults, fault density, distance from rivers, river density, land-use, soil order, geological time scale, and altitude. Nitrate is one of the predominant contaminants associated with agricultural activities in the study area. It has high solubility and mobility and thus can easily reach groundwater. Therefore, measured nitrate concentrations from monitoring wells in the study area were used for modification of the parameters and to check the correlation between pollution and vulnerability index. Performance of the above modified models was then compared to the original DRASTIC model.

The main contributions of this study are: (1) testing new objective methods for determination of DRASTIC weights and rating scores; (2) comparing bivariate statistical objective methods to machine learning methods used for modification of the DRASTIC model; (3) evaluating a modified DRASTIC model by introducing new parameters; and (4) investigating a quantitative validation method for nitrate correlation and model performance.

2. Study area

The Sari-Behshahr plain in northern Iran was selected as the study area because a large amount of geohydrological data was available for the assessment of hydrogeological characteristics and nitrate concentration in the aquifer for various existing and proposed DRASTIC model studies. This plain is located in the east part of Mazandaran province, Iran, extends over an area of 1643 km², and lies between longitudes 52°55′ and 53°56′ E and latitudes 35°56′ and 36°39′ N (Fig. 1). The altitude of the plain varies from −29 m to 615 m above mean sea level. Two unconfined aquifers, Sari-Neka and Behshahr-Bandargaz, underlie the Sari-Behshahr plain. The slope of the area varies from 0° to 55°. About 87% of the area has low relief with slopes lower than 2%. According to the De Martonne classification system, the Sari-Behshahr plain has a humid climate with mean annual precipitation of 760 mm (Mazandaran weather bureau report). In recent years, intense agricultural activities and a lack of aquifer protection zones in the area have deteriorated groundwater quality. The plain is covered by various soils with different compositions of clay, silt and sand. The unconfined aquifers are located in the Quaternary formation. Most of the study area is covered by a mixture of agricultural land and fallow land. There are 18,343 wells, 17 qanats, and 21 springs. The combined groundwater discharge from the plain area is about 162 million m³ per year. According to the Food and Agriculture Organization (FAO) of the United Nations method, the amount of infiltration from rainfall is about 71.6 million m³ per year in the study area, out of which about 42 million m³ per year percolates and reaches the aquifers.

Fig. 1. Location map of the study area Sari-Behshahr plain, Iran showing locations of sampling wells and nitrate distribution.

3. Data and methodology

The original DRASTIC model requires seven hydrogeological parameters: depth to groundwater (D), net recharge (R), aquifer media (A), soil media (S), topography (T), impact of the vadose zone (I), and hydraulic conductivity (C). To modify the model, eight additional factors were investigated in this study. These factors include distance to faults (DF), fault density (FD), distance from rivers (DR), river density (RD), land-use (LU), soil order (SO), geological time scale (GTS), and altitude (AL). The complete data-modeling process is shown in Fig. 2.
All the required data layers were prepared in the ArcGIS environment. The major steps of modeling are summarized below:

(1) Preparing the data of the seven original DRASTIC factors and the eight new factors.
(2) Preparing a groundwater vulnerability assessment map using the original DRASTIC method.
(3) Dividing nitrate concentration data into two groups: I. locations with nitrate concentration <50 mg/l were considered unpolluted, and II. locations with nitrate concentration higher than 50 mg/l were considered polluted (Fig. 1).
(4) Dividing the nitrate concentration data set into training (70%) and testing (30%) data sets.
(5) Modifying weights and rates of the original DRASTIC model using the four objective methods (LMT, BA, WOE and SE) and evaluating performance of the models for groundwater vulnerability assessment.
(6) Determining the best model for modification of DRASTIC weight and rate scores based on performance of the objective models.
(7) Modifying DRASTIC (mod-DRASTIC) by using the best objective model and adding extra factors (using 15 factors).
(8) Performing groundwater vulnerability assessment and comparing the results.
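Steps (3) and (4) can be illustrated with a minimal sketch. It assumes that a concentration of exactly 50 mg/l counts as unpolluted (the text leaves this boundary open) and that the 70/30 split is drawn separately within each class, as Section 3.5 suggests. The function names, the random seed, and the sample concentrations are hypothetical and are not taken from the study.

```python
import random

NITRATE_THRESHOLD_MG_L = 50.0  # threshold stated in step (3)


def label_wells(nitrate_mg_l):
    """Return 1 (polluted) where nitrate exceeds the threshold, else 0.

    Assumption: exactly 50 mg/l is treated as unpolluted.
    """
    return [1 if c > NITRATE_THRESHOLD_MG_L else 0 for c in nitrate_mg_l]


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices into training and testing sets, class by class."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(train_fraction * len(idx))
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return sorted(train), sorted(test)


# Hypothetical nitrate concentrations (mg/l), for illustration only.
nitrate = [12.0, 75.5, 48.9, 51.2, 30.1, 90.0, 66.3, 5.4, 58.8, 49.9,
           72.1, 20.0, 55.0, 44.4, 81.7, 38.2, 62.5, 10.3, 53.6, 27.7]
labels = label_wells(nitrate)
train_idx, test_idx = stratified_split(labels)
```

Splitting within each class keeps the polluted/unpolluted proportion the same in both subsets, which matters when the validation metric is computed from a small testing set.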

3.1. Preparation of DRASTIC model data

3.1.1. Depth to groundwater (D)

Depth to groundwater represents the vertical distance between the ground surface and the water table. The shorter the distance from ground surface to water table, the higher the groundwater vulnerability. Water level data of 25 boreholes were used to interpolate depth to groundwater by the Inverse Distance Weighted (IDW) method. Then, the depth was divided into 7 intervals: 1–3, 3–6, 6–9, 9–11, 11–14, 14–18 and >18 m, based on expert opinion and literature review (Neshat et al., 2014; Neshat and Pradhan, 2015a). The resulting map is shown in Fig. 3a.

3.1.2. Net recharge (R)

Net recharge can affect the transport of contaminants vertically to the water table and their horizontal spread in the aquifer (Aller et al., 1987a). Net recharge is affected by precipitation, ground slope and soil permeability (Ouedraogo et al., 2016; Piscopo et al., 2001). The net recharge value can be determined using the Piscopo method (Piscopo et al., 2001):

$$\text{Net recharge value} = \text{Slope} + \text{Rainfall} + \text{Soil permeability} \tag{1}$$
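As a rough illustration of Eq. (1), the sketch below sums the slope, rainfall, and soil permeability ratings transcribed from Table 1. Two points are assumptions rather than statements from the paper: a value lying exactly on a class boundary is assigned to the lower class, and slope is taken in the same units as the slope classes of Table 1. The function names are hypothetical.

```python
import bisect

# Class bounds and ratings transcribed from Table 1.
SLOPE_BOUNDS = [2, 6, 12, 18]      # classes: 0-2, 2-6, 6-12, 12-18, >18
SLOPE_RATINGS = [5, 4, 3, 2, 1]
RAIN_BOUNDS = [500, 600, 700]      # classes (mm): <500, 500-600, 600-700, >700
RAIN_RATINGS = [1, 2, 3, 4]
PERMEABILITY_RATINGS = {
    "very low": 1, "low": 2, "moderate": 3, "high": 4, "very high": 5,
}


def rate(value, bounds, ratings):
    """Look up the rating of the class that contains `value`.

    Assumption: a value equal to a bound falls into the lower class.
    """
    return ratings[bisect.bisect_left(bounds, value)]


def net_recharge_value(slope, rainfall_mm, permeability):
    """Eq. (1): sum of the three ratings."""
    return (rate(slope, SLOPE_BOUNDS, SLOPE_RATINGS)
            + rate(rainfall_mm, RAIN_BOUNDS, RAIN_RATINGS)
            + PERMEABILITY_RATINGS[permeability])


# Hypothetical pixel: gentle slope, 760 mm rainfall, highly permeable soil.
example = net_recharge_value(slope=1.5, rainfall_mm=760, permeability="high")
```

In a GIS workflow the same lookup would be applied per raster cell; the scalar version is shown only to make the arithmetic of Eq. (1) explicit.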


Fig. 2. Flowchart of methodology adopted for the development of model.

The Digital Elevation Model (DEM) of the study area was used for preparing a slope map. Rainfall data for an 11-year period (2005 to 2016) from 10 rain gauges and the IDW method were used to construct a rainfall map. The rainfall depth was divided into 4 classes: <500, 500–600, 600–700 and >700 mm. An iso-soil permeability map was obtained from the Mazandaran Water Regional Authority (MWRA) and was divided into 5 classes using ArcGIS 10.2. Finally, the net recharge was produced using Eq. (1) and was divided into four classes: 9–12, 12–14, 14–16 and 16–23 cm/year (shown in Fig. 3b). The factors, classes, and ratings are listed in Table 1.

3.1.3. Aquifer media (A)

The larger the grain size, the higher the permeability, the lower the attenuation capacity, and thus the greater the vulnerability potential (Anwar et al., 2002). The aquifer media map (Fig. 3c) was constructed using well log data at 32 piezometers, which included seven lithological classes: clay, lime, clay-sand-silt, sand and clay, sand, clay-silt/sand-gravel and pebble and gravel/sand.

3.1.4. Soil media (S)

Soil texture properties affect the amount of infiltration from the ground surface. This layer was prepared using a soil map from the Iranian Soil and Water Research Institution. A map of soil media (Fig. 3d) was constructed with four classes: clay and very heavy soil (clay and silt), heavy soil (clay, silt and a little sand), moderate soil (clay, silt and sand), and sandy soil.

3.1.5. Topography (ground slope) (T)

Topography also affects infiltration at the ground surface. A gentle slope produces less runoff and more retention of water, resulting in more infiltration and thus a higher potential of contamination. Using the DEM, a ground slope map (Fig. 3e) was constructed, which was divided into five classes: <2%, 2–6%, 6–12%, 12–18%, and >18%, based on Aller et al. (1987a).
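The depth-to-groundwater and rainfall layers described in Sections 3.1.1 and 3.1.2 both rely on IDW interpolation of point observations. The following is a minimal sketch of the basic IDW estimator. The power of 2 is a common default and an assumption here, since the paper does not report the exponent or search settings used in ArcGIS. The function name and the sample coordinates are hypothetical.

```python
import math


def idw(points, x, y, power=2.0):
    """Inverse-distance-weighted estimate at location (x, y).

    points: list of (xi, yi, value) tuples.
    power:  distance exponent (assumed 2.0; not reported in the paper).
    """
    numerator = 0.0
    denominator = 0.0
    for xi, yi, value in points:
        distance = math.hypot(x - xi, y - yi)
        if distance == 0.0:
            return value  # the query location coincides with a sample point
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Hypothetical boreholes: (x, y, depth to groundwater in m).
boreholes = [(0.0, 0.0, 4.0), (10.0, 0.0, 8.0), (0.0, 10.0, 12.0)]
midpoint = idw(boreholes, 5.0, 0.0)
```

Because the weights are positive and normalized, the estimate always lies between the smallest and largest observed values, so IDW cannot extrapolate beyond the measured range.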

3.1.6. Impact of vadose zone (I)

The impact of the vadose zone on groundwater contamination potential depends on the permeability and attenuation characteristics of the sediments. The sediment data were extracted from boring well logs provided by the Mazandaran Regional Water Authority and classified into five classes: clay, clay-silt, silt-clay-sand, clay-gravel, and gravel-sand (Fig. 3f).

3.1.7. Hydraulic conductivity (C)

Hydraulic conductivity of an aquifer describes the ability of the aquifer media (soil and rock) to transmit water through pore spaces or fractures, and plays an important role in pollutant migration velocity and dispersion. Hydraulic conductivity (K, m/day) of the Sari-Behshahr plain was calculated from the equation below:

$$K = T / b \tag{2}$$

where T is the transmissivity of the aquifer (m²/day), and b is the aquifer thickness (m). Both the transmissivity and aquifer thickness maps were acquired from the Mazandaran Water Regional Authority. The hydraulic conductivity map (Fig. 3g) was prepared and classified into five classes: 0–5, 5–10, 10–15, 15–25 and >25 m/day.
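Eq. (2) and the five-class scheme can be sketched as follows. With T in m²/day and b in m, K comes out in m/day, which matches the class limits given above. The treatment of values lying exactly on a class boundary (assigned to the lower class) is an assumption, and the function names and input values are hypothetical.

```python
import bisect

K_BOUNDS = [5, 10, 15, 25]                          # class limits in m/day
K_CLASSES = ["0-5", "5-10", "10-15", "15-25", ">25"]


def hydraulic_conductivity(transmissivity_m2_day, thickness_m):
    """Eq. (2): K = T / b, returned in m/day."""
    if thickness_m <= 0:
        raise ValueError("aquifer thickness must be positive")
    return transmissivity_m2_day / thickness_m


def k_class(k_m_day):
    """Return the class label for a hydraulic conductivity value.

    Assumption: a value equal to a class limit falls into the lower class.
    """
    return K_CLASSES[bisect.bisect_left(K_BOUNDS, k_m_day)]


# Hypothetical cell: T = 600 m2/day, b = 40 m.
k = hydraulic_conductivity(600.0, 40.0)
label = k_class(k)
```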


3.2. Preparation of mod-DRASTIC model data

Eight factors, in addition to the 7 factors of the original DRASTIC model, were considered to modify the model.

3.2.1. Distance to fault and fault density

Faults affect the continuity of rocks and soil masses (Ayalew and Yamagishi, 2005) and may act as a conduit for seepage. A fault trace map of Mazandaran province was available and was used to generate a map of distance to fault (Fig. 3h) and a map of fault density (Fig. 3i). The classification of distance to fault and fault density, as well as distance from river, river density and altitude, was implemented based on the frequency analysis of nitrate measurements. This technique has been applied for classification of factors in flood susceptibility modeling (Khosravi et al., 2018) and landslide susceptibility assessment (Pham et al., 2017f).

3.2.2. Distance from river and river density

Pollutants may infiltrate from polluted rivers to unconfined aquifers. Thus, distance from river and river density could be two important factors in areas with agricultural activities and subsequent draining of fertilizers by surface streams. A map of distance from rivers was created with four classes: 0–500, 500–1000, 1000–2000 and >2000 m (Fig. 3j). Also, a map of river density was constructed with five classes: 0–0.19, 0.19–0.26, 0.26–0.33, 0.33–0.4 and 0.4–0.7 km/km² (Fig. 3k).

3.2.3. Land-use

Infiltration of contaminants to groundwater depends on the type of land use. In this study, a land-use map with eleven classes was created, comprising airport, dense forest, good rangeland, mixture of agriculture and garden, mixture of agriculture and fallow, dry farming, moderate forest, poor rangeland, urban, reservoir, and wetland (Fig. 3l).

3.2.4. Soil order

Soil scientists have developed a soil taxonomy system in which the most general level of classification in the United States system is the soil order, which includes 12 classes (https://www.soils.org/discover-soils/soil-basics/soil-types). This taxonomy is based on one or two dominant physical, chemical, or biological properties of the soil. This study adopted a soil order map with 5 classes: coastal sand, salt flat, Alfisols (moderately weathered), Inceptisols (slightly developed and young), and Mollisols (deep and fertile) (Fig. 3m).

3.2.5. Geologic time scale

Geology is a critical factor in the spatio-temporal variation of drainage basin hydrology (Miller, 1990). Lithological and structural variations affect the permeability of rocks and soils (Chapi et al., 2017; Pham et al., 2017b). The geological time scale relates geological strata to time of occurrence, which provides additional information beyond

Fig. 3. Maps of groundwater vulnerability conditioning factors: (a) depth to groundwater, (b) net recharge, (c) aquifer media, (d) soil texture, (e) topography, (f) impact of vadose zone, (g) hydraulic conductivity, (h) distance to fault, (i) fault density, (j) distance from river, (k) river density, (l) land use, (m) soil order, (n) geological time scale, and (o) altitude.


rock type. A map of geological time scale for the study area, based on data from the Geological Survey of Iran, was produced with six stratigraphic classes: Jurassic, Jurassic-Cretaceous, Pliocene, Pre-Cambrian, Quaternary, and Triassic-Jurassic (Fig. 3n).

3.2.6. Altitude

This study considers altitude as a factor because groundwater is more likely to occur in lowlands than in highlands (Naghibi et al., 2018). A map of altitude was produced based on 5 classes: −29–0, 0–50, 50–100, 100–200 and >200 m (Fig. 3o).

3.3. Selection of model factors

3.3.1. Multi-collinearity diagnosis

Dependency among input factors in this study was evaluated by two commonly used measures of multi-collinearity, Tolerance and Variance Inflation Factor (VIF). Tolerance < 0.1 and VIF > 5 indicate strong multi-collinearity (O'Brien, 2007).

3.3.2. Information Gain Ratio

The performance quality of a spatial model depends on two important factors, selection of the model and quality of the input data (Pradhan, 2013). Input factors do not have an equal effect on model output (i.e. predictability). Factors with null or low predictability (which decrease the prediction ability of a model) should be eliminated (Bui et al., 2016b). This study adopts the Information Gain Ratio (IGR) (Bui et al., 2016b; Quinlan, 1993), which is used to calculate the predictive capability of factors in data mining (Witten et al., 2016). Higher IGR values indicate higher predictive capability of factors.

3.4. DRASTIC method for groundwater vulnerability assessment

The DRASTIC method is a standard system for evaluating groundwater pollution potential (Babiker et al., 2005; Rahman, 2008; Sahoo et al., 2016). Each factor is classified and assigned a rating score and a weight according to the probability of pollution to the aquifer (LeGrand, 1964). The rating score ranges from 1 to 10 and the weight from 1 to 5 (Aller et al., 1987b). The higher the values, the greater the pollution risk. The DRASTIC index is calculated as follows:

$$\text{DRASTIC Index} = D_r D_w + R_r R_w + A_r A_w + S_r S_w + T_r T_w + I_r I_w + C_r C_w \tag{3}$$

where the capital letter denotes the corresponding conditioning factor


and the subscripts “r” and “w” represent the rating and the weight, respectively. Selection of a rating score and a weight for a factor is always questionable in the DRASTIC model. As mentioned before, the rating scores are based on expert opinions, which represent a source of uncertainty.

3.5. Objective methods

The objective methods used in the present study included two bivariate statistical methods (WOE and SE) and two machine learning methods (LMT and BA). Bivariate statistical models can be used as a simple tool to evaluate the performance of spatial prediction models (Khosravi et al., 2016a,b). The advantages of using the bivariate statistical models are that (1) implementation is simple, (2) the models produce reasonable accuracy, and (3) the models are able to determine factors or combinations of factors to be used in the assessment (Van Westen et al., 2003). The first step in applying these methods is the preparation of a nitrate inventory for the bivariate methods (WOE and SE) and the construction of the training and testing datasets for the machine learning methods (LMT and BA), to establish the spatial relationship between nitrate concentrations and factors.

Nitrate concentration data from 218 wells in 2015 were divided into two groups of <50 ppm and >50 ppm, as illustrated in Fig. 1. To validate the predictability of the models, both nitrate datasets were further divided into two sub-datasets for training and validation purposes. 70% of the data points were randomly selected for training the models. The remaining 30% were used for model validation. It should be noted that groundwater vulnerability assessment using LMT and BA is treated as a pattern classification with two classes, nitrate risk and non-nitrate risk, in which the nitrate risk data and the non-nitrate risk data were indexed with a binary value of 1 or 0, respectively (Bui et al., 2016b). The probability that a pixel belongs to the nitrate risk class was used as the groundwater vulnerability index.
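As a baseline for the objective methods, the original index of Eq. (3) reduces to a weighted sum per pixel. The sketch below uses the factor weights commonly cited from Aller et al. for the generic DRASTIC model (D = 5, R = 4, A = 3, S = 2, T = 1, I = 5, C = 3). This section of the paper does not list them, so they should be read as an assumption. The function name and the example ratings are hypothetical.

```python
# Weights commonly cited for the generic DRASTIC model (assumption: the
# paper states only that weights range from 1 to 5 and does not list them).
DRASTIC_WEIGHTS = {"D": 5, "R": 4, "A": 3, "S": 2, "T": 1, "I": 5, "C": 3}


def drastic_index(ratings, weights=DRASTIC_WEIGHTS):
    """Eq. (3): sum over the seven factors of rating times weight."""
    missing = set(weights) - set(ratings)
    if missing:
        raise KeyError(f"missing ratings for factors: {sorted(missing)}")
    for factor in weights:
        if not 1 <= ratings[factor] <= 10:
            raise ValueError(f"rating for {factor} must lie between 1 and 10")
    return sum(ratings[factor] * weights[factor] for factor in weights)


# Hypothetical ratings for a single pixel.
pixel = {"D": 9, "R": 6, "A": 8, "S": 5, "T": 10, "I": 4, "C": 2}
index = drastic_index(pixel)
```

With these weights the index ranges from 23 to 230. The objective methods described next keep the same additive structure but replace the fixed ratings and weights with values derived from the nitrate data.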
3.5.1. Weights-of-Evidence

The Weights-of-Evidence (WOE) method is a bivariate statistical model based on Bayes' theorem. The method has been applied for landslide susceptibility (Chen et al., 2016), flood susceptibility (Khosravi et al., 2016b), gully erosion susceptibility (Rahmati et al., 2016), and groundwater potential mapping (Lee et al., 2012). The WOE method is described in detail in Bonham-Carter (2014) and Pradhan et al. (2010).

Consider $P(B \mid D)$ to be the posterior probability of the potential nitrate risk predictive factor ($B$) given the nitrate risk data ($D$), and $P(B \mid \bar{D})$ to be the posterior probability of the potential nitrate risk predictive factor ($B$) given the non-nitrate risk data ($\bar{D}$). Similarly, consider $P(\bar{B} \mid D)$ to be the posterior probability of the potential non-nitrate risk predictive factor ($\bar{B}$) given the nitrate risk data ($D$), and $P(\bar{B} \mid \bar{D})$ to be the posterior probability of the potential non-nitrate risk predictive factor ($\bar{B}$) given the non-nitrate risk data ($\bar{D}$). The weights, $W^{+}$ and $W^{-}$, for the potential nitrate risk predictive factor and the potential non-nitrate risk predictive factor, respectively, are the natural logarithms of the posterior probability ratios (Bonham-Carter, 2014):

$$W^{+} = \ln \frac{P(B \mid D)}{P(B \mid \bar{D})} \tag{4}$$

$$W^{-} = \ln \frac{P(\bar{B} \mid D)}{P(\bar{B} \mid \bar{D})} \tag{5}$$

The weight contrast is $C = W^{+} - W^{-}$. The standard deviation of the weight contrast is

$$\sigma(C) = \sqrt{\sigma^{2}(W^{+}) + \sigma^{2}(W^{-})} \tag{6}$$

where $\sigma^{2}(W^{+})$ and $\sigma^{2}(W^{-})$ are the variances of $W^{+}$ and $W^{-}$, respectively. The final weight is calculated as the ratio of the weight contrast to its standard deviation, i.e., $W = C / \sigma(C)$.

3.5.2. Shannon Entropy (SE)

SE explains the uncertainty of a system, defined as the mean of differences between proportions of unit groups and the total system. Shannon modified the Boltzmann model and developed the theory of information (Pourghasemi et al., 2012). The set of equations below calculates a weight for conditioning of the incorporated information,

K. Khosravi et al. / Science of the Total Environment 642 (2018) 1032–1049 Table 1 Net recharge rates assigned to the study area using Piscopo approach (Piscopo et al., 2001). No

Factors

Class

Rating

1

Slope

0–2 2–6 6–12 12–18 N18 b500 500–600 600–700 N700 Very low Low Moderate High Very high 9–12 12–14 14–16 16–23

5 4 3 2 1 1 2 3 4 1 2 3 4 5 1 6 9 10

2

Rainfall (mm)

3

Soil permeability

4

Net recharge (cm/year)

Mj X

E

ð8Þ

i¼1

where Hj is the entropy value for the factor j. H j max ¼ log2 M j

ð9Þ

where Hj,max is the maximum entropy value for the factor j.   I j ¼ H j; max −H j =H j; max

ð10Þ

where Ij is the information for the factor j, whose value is between 0 and 1. PM j V j ¼ Ij 

FRij Mj

i¼1

f nL ðxÞ

ð12Þ

It should be noted that all model parameters should be identified to obtain the highest performance. In the current study, LMT model was built using the number of boosting iterations of −1, minimum number of instances of 15 for splitting at nodes and fast regression and error in probabilities to achieve the best performance.

ð7Þ

Eij  log2ij

L X n¼1

where FR represents the frequency ratio (i is number of classes in factor j and Mj is total number of classes for each factor) and Eij is the probability density for class i in the factor j. Hj ¼ −

classes of 1,…,L by means of weak learners n is used in the LogitBoost method as below:

F L ðxÞ ¼

where Vj is indicative of the value of the parameter from the total figure, which is determined based on the following equation: FRij Eij ¼ PM j i¼1 FRij

1039

3.5.4. Bootstrap Aggregating (bagging meta-algorithm) Bootstrap Aggregating or bagging meta-algorithm (BA) which is a bootstrap ensemble method first introduced by Breiman et al. (1984) and used to enhance the classification and accuracy of the machine learning models. Bootstrapping is a procedure for producing random samples with substitution for predicting sample statistics. This model is one of the averaging approaches and can be ensembled with other algorithms, especially decision tree, to improve results. Bagging model reduces the variance of prediction, producing additional data from the original data for training. The present model was trained and constructed on every bootstrap sample and the final model which is an aggregated model of the all samples was constructed (Bui et al., 2016a) (Fig. 5). In this study, the Bagging model was constructed using random umber of seed 1 which was utilized to split the data, the number of Iterations 30, to perform and the number of execution slots of 1 that was used to construct the ensemble and the percentage of bag size, 100 was the percentage of training set size which was applied to achieved the highest prediction power. 3.6. Generation of groundwater vulnerability maps Groundwater vulnerability maps were constructed by calculating and classifying groundwater vulnerability indexes (GVI). In the first step, the training dataset was used for training the models. Next, the total study area was divided into pixels, in which each pixel was classified as a polluted or unpolluted class using a set of all sampling pixels. Then, each pixel was assigned a unique value which indicated the probability of pollution occurrences. Classification of GVI was done using the quantile method which is one of the most popular classification methods (Osaragi, 2002; Tehrany et al., 2015). 
However, other classification methods, such as Natural Break, Standard Deviation, Equal Interval, Manual and Geometrical Interval, can be applied depending on the nature of the data and the objective of the study (Tehrany et al., 2015). Where the indexes are skewed, the quantile or natural break method can be applied (Akgun, 2012). Finally,

(11)

Vj represents the weight for the factor j.

3.5.3. Logistic Model Tree (LMT)

LMT is a classification method that combines C4.5 decision tree learning (Quinlan, 1996) with linear logistic regression (Bui et al., 2016a). In this model, splitting is carried out using information gain, and the logistic regression functions at the nodes are fitted using the LogitBoost algorithm (Landwehr et al., 2005). Finally, the constructed tree is pruned by the classification and regression tree (CART) algorithm to prevent over-fitting (Breiman et al., 1984). A node is divided into two child nodes according to a threshold value, with the right and left branches holding the attribute values greater and less than the threshold, respectively (Fig. 4) (Nachiappan et al., 2016). A group of functions (FL) to forecast the

Fig. 4. Structure of LMT model.


K. Khosravi et al. / Science of the Total Environment 642 (2018) 1032–1049

Fig. 5. Concept of the bagging model.
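The bagging concept of Fig. 5 can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the configuration used in the study: the base learner is a hypothetical one-feature threshold rule rather than a decision tree, and the data are invented. Only the seed (1), the number of iterations (30) and the bag size (100%) follow the values reported in Section 3.5.4.

```python
import random


def fit_stump(sample):
    """Fit a one-feature threshold rule: predict 1 when x >= threshold.
    (Hypothetical base learner standing in for a decision tree.)"""
    best_threshold, best_errors = None, None
    for threshold, _ in sample:
        errors = sum((x >= threshold) != bool(y) for x, y in sample)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def fit_bagging(data, n_iterations=30, bag_size_percent=100, seed=1):
    """Train one base learner per bootstrap sample."""
    rng = random.Random(seed)
    bag_size = max(1, len(data) * bag_size_percent // 100)
    stumps = []
    for _ in range(n_iterations):
        # Bootstrap sample: draw with replacement from the training data.
        bag = [rng.choice(data) for _ in range(bag_size)]
        stumps.append(fit_stump(bag))
    return stumps


def predict_probability(stumps, x):
    """Aggregate by averaging the votes of all base learners."""
    return sum(x >= t for t in stumps) / len(stumps)


# Hypothetical (feature, label) pairs; label 1 = polluted, 0 = unpolluted.
training = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
            (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
ensemble = fit_bagging(training)
```

Averaging the votes of learners trained on different bootstrap samples is what reduces the variance of the prediction relative to a single learner.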

these indexes were classified into 5 vulnerability classes: very low, low, moderate, high and very high.

3.7. Model performance assessment using statistical evaluation measures

The performance of models should be evaluated in both the training and testing phases (Pham et al., 2017d,e, 2018a,b). Three widely used statistical evaluation measures, namely the kappa index, Root Mean Square Error (RMSE), and Mean Absolute Error (MAE), were used in both phases to assess the performance of the machine learning models. Kappa was calculated from the equations below:

$$\text{Kappa} = \frac{P_{obs} - P_{exp}}{1 - P_{exp}} \qquad (13)$$

$$P_{obs} = TP + TN \qquad (14)$$

$$P_{exp} = (TP + FN)(TP + FP) + (FP + TN)(FN + TN) \qquad (15)$$

where Pobs is the ratio of observed agreement and Pexp is the expected or estimated agreement. True Positive (TP) is the number of nitrate point values correctly estimated as nitrate, False Positive (FP) is the number of non-nitrate point values classified as nitrate, True Negative (TN) is the number of non-nitrate point values correctly estimated as non-nitrate, and False Negative (FN) is the number of nitrate point values classified as non-nitrate. RMSE and MAE were applied to assess model error; the smaller the RMSE and MAE, the better the model (Bui et al., 2016a; Pham and Prakash, 2017b, 2018c):

$$RMSE = \sqrt{\frac{\sum_{i=1}^{N} (O_i - P_i)^2}{N}} \qquad (16)$$

$$MAE = \frac{\sum_{i=1}^{N} |O_i - P_i|}{N} \qquad (17)$$

where Oi and Pi are the observation (target) and prediction (output) values in the training or testing dataset, and N is the total number of samples in that dataset.

Table 2 DRASTIC rating and weight values.

| No | Factor (weight) | Class | Rate |
|---|---|---|---|
| 1 | D (W:5) | 1–3 | 10 |
| | | 3–6 | 8 |
| | | 6–9 | 6 |
| | | 9–11 | 5 |
| | | 11–14 | 3 |
| | | 14–18 | 2 |
| | | >18 | 1 |
| 2 | R (W:4) | 9–12 | 1 |
| | | 12–14 | 6 |
| | | 14–16 | 9 |
| | | 16–23 | 10 |
| 3 | A (W:3) | Clay | 1 |
| | | Lime | 2 |
| | | Clay-sand-silt | 3 |
| | | Sand & clay | 5 |
| | | Sand | 6 |
| | | Clay & silt & sand and gravel | 7 |
| | | Pebble & gravel & sand | 8 |
| 4 | S (W:2) | Clay and very heavy soil | 1 |
| | | Heavy soil | 2 |
| | | Moderate soil | 3 |
| | | Sandy soil | 6 |
| 5 | T (W:1) | <2 | 10 |
| | | 2–6 | 9 |
| | | 6–12 | 5 |
| | | 12–18 | 3 |
| | | >18 | 1 |
| 6 | I (W:5) | Clay | 2 |
| | | Clay-silt | 3 |
| | | Silt-clay and sand | 4 |
| | | Clay-gravel | 7 |
| | | Gravel and sand | 9 |
| 7 | C (W:3) | 0–5 | 1 |
| | | 5–10 | 3 |
| | | 10–15 | 5 |
| | | 15–25 | 7 |
| | | >25 | 8 |

3.8. Comparison and validation of the maps

According to the literature review, researchers have only used the correlation between nitrate data and the DRASTIC map to calculate the accuracy of the resulting map. In the current study, however, the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) was used to evaluate and compare the prediction capability of the models (Pham et al., 2017a,c; Pham and Prakash, 2017a). The ROC is a graphical plot created by plotting the true positive rate (sensitivity) against the false positive rate (1 − specificity). The advantage of the ROC over the correlation method is that it shows the prediction ability of the model and also quantifies map accuracy. In the present study, the ROC curve was prepared by considering nitrate concentration values and groundwater vulnerability indices. The AUC value varies from 0.5 to 1, and higher AUC values indicate better model prediction capability. The testing dataset, which was not used for model building, was used to evaluate the performance and capability of the models. The AUC range was classified as 0.9–1, excellent; 0.8–0.9, very good; 0.7–0.8, good; 0.6–0.7, average; and 0.5–0.6, poor (Yesilnacar, 2005).

Fig. 6. Groundwater vulnerability maps using different models: (a) DRASTIC, (b) WOE (7 factors), (c) SE, (d) LMT, (e) Bagging, and (f) WOE (15 factors).
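The evaluation measures of Sections 3.7 and 3.8 can be sketched as follows. This is an illustrative sketch under stated assumptions: the agreement terms of Eqs. (14) and (15) are normalized by N and N² so that kappa is dimensionless, the AUC is computed with the rank-based (pairwise comparison) formulation rather than by integrating a plotted curve, the class boundaries are treated as lower-bound inclusive, and the labels and scores are invented.

```python
import math


def kappa(tp, tn, fp, fn):
    """Kappa index from confusion-matrix counts, Eqs. (13)-(15).
    Assumption: counts are converted to proportions (division by N, N^2)."""
    n = tp + tn + fp + fn
    p_obs = (tp + tn) / n
    p_exp = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


def rmse(observed, predicted):
    """Root Mean Square Error, Eq. (16)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean Absolute Error, Eq. (17)."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def auc(labels, scores):
    """Probability that a random polluted point outscores a random
    unpolluted one (ties count one half); equals the area under the ROC."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def auc_class(value):
    """AUC classes quoted in Section 3.8 (Yesilnacar, 2005)."""
    for lower, label in [(0.9, "excellent"), (0.8, "very good"),
                         (0.7, "good"), (0.6, "average")]:
        if value >= lower:
            return label
    return "poor"


# Hypothetical testing set: 1 = nitrate (polluted), 0 = non-nitrate.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
predicted = [1 if s >= 0.5 else 0 for s in scores]
tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
tn = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 0)
fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
```

Kappa depends on the 0.5 cut-off used to turn scores into classes, whereas the AUC uses the ranking of the scores only, which is one reason the two measures can rank models differently.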

4. Results and discussions

4.1. Multi-collinearity diagnosis analysis

The results showed that the values of VIF and tolerance were far from the critical values; thus there was no multi-collinearity among the fifteen factors (Table A1 - Appendix). The minimum tolerance and maximum VIF were observed for fault density and were 0.21 and 4.7, respectively. This indicates that there is no problematic collinearity among these 15 factors; the factors do not overlap, so none of them needed to be eliminated.
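The link between the two collinearity statistics can be checked with a short calculation. This is a sketch under assumptions: it uses the standard definitions tolerance = 1 − R² and VIF = 1/tolerance, where R² comes from regressing one factor on all the others, and the critical values (tolerance below 0.1, VIF above 10) are a common rule of thumb, not values stated in the text.

```python
def tolerance_and_vif(r_squared):
    """Tolerance and VIF of a factor, given the R^2 obtained when that
    factor is regressed on all the other factors."""
    tolerance = 1.0 - r_squared
    return tolerance, 1.0 / tolerance


def is_collinear(tolerance, vif, min_tolerance=0.1, max_vif=10.0):
    """Rule-of-thumb check; both thresholds are assumptions."""
    return tolerance < min_tolerance or vif > max_vif


# An R^2 of 0.79 (hypothetical) gives the tolerance reported for fault
# density (0.21) and a VIF close to the reported 4.7.
tol, vif = tolerance_and_vif(0.79)
```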


4.2. IGR analysis to assess predictive capability of factors

The IGR technique with 10-fold cross-validation was applied to identify the prediction capability of all conditioning factors. According to the results of the IGR technique, the most effective factor for groundwater pollution occurrence in the Sari-Behshahr plain was net recharge (IGR = 0.33), followed by fault density (IGR = 0.27), soil media (IGR = 0.26), altitude (IGR = 0.17), depth to groundwater (IGR = 0.14), slope percent (IGR = 0.13), distance to fault (IGR = 0.12), river density (IGR = 0.08), and soil order (IGR = 0.06). As per the IGR analysis, the least effective factors were impact of vadose zone, hydraulic conductivity, distance from river, aquifer media, geology, and landuse (IGR = 0). However, these factors were retained in the modeling, because removing them reduced the AUC value slightly, from 0.86 to 0.84, in the training phase.

4.3. Groundwater vulnerability assessment using original DRASTIC model

The DRASTIC index was calculated by considering the factor rates and weights presented in Table 2 and using Eq. (3). The DRASTIC index was classified into 5 classes, based on the quantile classification scheme (Chapi et al., 2017; Tehrany et al., 2015), to develop the groundwater vulnerability map (Fig. 6a).

4.4. Groundwater vulnerability assessment using bivariate models

These methods include the WOE and SE models. Their results for groundwater vulnerability are briefly described below.

4.4.1. WOE model

Results indicate that, in the case of the WOE model for 7 factors (Table 3), the depth to groundwater class of 14–18 m had the highest impact (3.24) on groundwater vulnerability, followed by depths of 6–9 m (1.11) and 9–11 m (0.91), and the depth of 1–3 m had the most negative impact on groundwater pollution. Net recharge of >14 cm/year had the most effect on groundwater pollution occurrence. For aquifer media, the clay, sand and gravel class and the clay class had the highest (5.44) and lowest (−2.16) influences on groundwater pollution occurrence, respectively. The moderate soil had the highest influence (4.37), and the three other soil types had negative values, showing no impact on groundwater pollution occurrence. Results indicated that a slope of 2–6% had the highest influence (3.07) on groundwater pollution probability. For the impact of vadose zone parameter, the clay material had the highest value (1.67), followed by clay and silt (0.77), clay and gravel (0.66), and gravel and sand (0.36); on the contrary, the silt, clay and sand class had no impact on pollution occurrence. The lowest hydraulic conductivity (0–5) had the highest impact (1.74) on groundwater pollution probability, followed by >25 (1.19) and

Table 3 Spatial correlation between nitrate and seven DRASTIC factors using WOE (7 factors) model.

| Factors | Class | No. of nitrate | Percentage of nitrate (%) | No. of pixels in domain | Percentage of domain (%) | W+ | W− | Sc | C | C/Sc |
|---|---|---|---|---|---|---|---|---|---|---|
| Depth to groundwater (m) | 1–3 | 16 | 21.05 | 533,413 | 29.23 | −0.328 | 0.109 | 0.281 | −0.437 | −1.55 |
| | 3–6 | 13 | 17.11 | 381,733 | 20.92 | −0.201 | 0.047 | 0.305 | −0.248 | −0.81 |
| | 6–9 | 20 | 26.32 | 385,019 | 21.10 | 0.221 | −0.068 | 0.261 | 0.290 | 1.11 |
| | 9–11 | 8 | 10.53 | 141,181 | 7.74 | 0.308 | −0.031 | 0.374 | 0.339 | 0.91 |
| | 11–14 | 5 | 6.58 | 156,184 | 8.56 | −0.263 | 0.021 | 0.463 | −0.284 | −0.61 |
| | 14–18 | 13 | 17.11 | 130,416 | 7.15 | 0.873 | −0.113 | 0.305 | 0.986 | 3.24 |
| | >18 | 1 | 1.32 | 97,114 | 5.32 | −1.397 | 0.041 | 1.007 | −1.439 | −1.43 |
| Net recharge (cm/year) | 9–12 | 3 | 3.95 | 16,076 | 23.97 | −1.805 | 0.234 | 0.589 | −2.039 | −3.46 |
| | 12–14 | 14 | 18.42 | 14,242 | 21.24 | −0.142 | 0.035 | 0.296 | −0.177 | −0.60 |
| | 14–16 | 24 | 31.58 | 13,985 | 20.85 | 0.416 | −0.146 | 0.247 | 0.561 | 2.27 |
| | 16–23 | 35 | 46.05 | 22,763 | 33.94 | 0.306 | −0.203 | 0.230 | 0.508 | 2.21 |
| Aquifer media | Clay | 1 | 1.32 | 191,865 | 10.53 | −2.079 | 0.098 | 1.007 | −2.177 | −2.16 |
| | Lime | 6 | 7.89 | 139,100 | 7.63 | 0.034 | −0.003 | 0.425 | 0.037 | 0.09 |
| | Clay/sand and silt | 10 | 13.16 | 104,572 | 5.74 | 0.830 | −0.082 | 0.339 | 0.912 | 2.69 |
| | Sand and clay | 0 | 0.00 | 479,523 | 26.31 | None | 0.305 | None | None | None |
| | Sand | 11 | 14.47 | 133,617 | 7.33 | 0.680 | −0.080 | 0.326 | 0.761 | 2.33 |
| | Clay, sand and gravel | 8 | 10.53 | 27,666 | 1.52 | 1.937 | −0.096 | 0.374 | 2.033 | 5.44 |
| | Pebble and gravel | 40 | 52.63 | 746,494 | 40.95 | 0.251 | −0.220 | 0.230 | 0.471 | 2.05 |
| Soil media | Clay, and very heavy soil | 6 | 7.89 | 251,748 | 13.80 | −0.559 | 0.066 | 0.425 | −0.625 | −1.47 |
| | Heavy soil | 17 | 22.37 | 683,081 | 37.45 | −0.515 | 0.216 | 0.275 | −0.732 | −2.66 |
| | Moderate soil | 42 | 55.26 | 566,758 | 31.08 | 0.576 | −0.432 | 0.231 | 1.008 | 4.37 |
| | Sandy soil | 11 | 14.47 | 322,201 | 17.67 | −0.199 | 0.038 | 0.326 | −0.237 | −0.73 |
| Slope (percent) | 0–2 | 58 | 76.32 | 1,504,869 | 82.46 | −0.077 | 0.300 | 0.270 | −0.377 | −1.40 |
| | 2–6 | 14 | 18.42 | 152,102 | 8.33 | 0.793 | −0.117 | 0.296 | 0.910 | 3.07 |
| | 6–12 | 3 | 3.95 | 66,721 | 3.66 | 0.077 | −0.003 | 0.589 | 0.080 | 0.14 |
| | 12–18 | 1 | 1.32 | 39,728 | 2.18 | −0.503 | 0.009 | 1.007 | −0.512 | −0.51 |
| | >18 | 0 | 0.00 | 61,640 | 3.38 | None | 0.034 | None | None | None |
| Impact of vadose zone | Clay | 7 | 9.21 | 90,810 | 4.98 | 0.616 | −0.046 | 0.397 | 0.661 | 1.67 |
| | Clay and silt | 5 | 6.58 | 85,963 | 4.71 | 0.334 | −0.020 | 0.463 | 0.354 | 0.77 |
| | Silt, clay and sand | 35 | 46.05 | 1,033,702 | 56.64 | −0.207 | 0.218 | 0.230 | −0.425 | −1.85 |
| | Clay and gravel | 19 | 25.00 | 398,963 | 21.86 | 0.134 | −0.041 | 0.265 | 0.175 | 0.66 |
| | Gravel and sand | 10 | 13.16 | 215,622 | 11.81 | 0.108 | −0.015 | 0.339 | 0.123 | 0.36 |
| Hydraulic conductivity (m/day) | 0–5 | 9 | 11.84 | 123,074 | 6.74 | 0.563 | −0.056 | 0.355 | 0.619 | 1.74 |
| | 5–10 | 28 | 36.84 | 763,844 | 41.85 | −0.128 | 0.083 | 0.238 | −0.210 | −0.88 |
| | 10–15 | 30 | 39.47 | 784,888 | 43.01 | −0.086 | 0.060 | 0.235 | −0.146 | −0.62 |
| | 15–25 | 7 | 9.21 | 132,487 | 7.26 | 0.238 | −0.021 | 0.397 | 0.259 | 0.65 |
| | >25 | 2 | 2.63 | 20,767 | 1.14 | 0.838 | −0.015 | 0.717 | 0.854 | 1.19 |
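The columns of Table 3 can be reproduced from the point and pixel counts with the sketch below. The formulas are assumptions inferred from the tabulated values, not quoted from the text: W+ and W− are taken as log-ratios of the class share of nitrate points to the class share of domain pixels, and Sc is approximated from the nitrate point counts only (the pixel-count terms are negligible here). They reproduce the first row of Table 3 to rounding.

```python
import math


def weights_of_evidence(n_class_points, n_total_points,
                        n_class_pixels, n_total_pixels):
    """Return (W+, W-, C, Sc, C/Sc) for one factor class, or None when
    the class holds none or all of the nitrate points (log undefined)."""
    if n_class_points == 0 or n_class_points == n_total_points:
        return None
    share_points = n_class_points / n_total_points   # class share of nitrate points
    share_domain = n_class_pixels / n_total_pixels   # class share of the study area
    w_plus = math.log(share_points / share_domain)
    w_minus = math.log((1 - share_points) / (1 - share_domain))
    contrast = w_plus - w_minus
    # Approximate standard deviation of the contrast (assumption).
    s_c = math.sqrt(1 / n_class_points
                    + 1 / (n_total_points - n_class_points))
    return w_plus, w_minus, contrast, s_c, contrast / s_c


# Depth to groundwater, class 1-3 m: 16 of 76 nitrate points and
# 533,413 of 1,825,060 pixels (the sum of the seven depth classes).
row = weights_of_evidence(16, 76, 533_413, 1_825_060)
```

A positive C/Sc marks a class in which nitrate points are over-represented relative to its area; classes with no nitrate points return None, matching the "None" entries in the table.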


Table 4 Spatial relationship between nitrate and DRASTIC (7 factors) using SE method.

| Factors | Class | No. of nitrate | Percentage of nitrate (%) | No. of pixels in domain | Percentage of domain (%) | FR | Eij | Hj | Ij | Vj |
|---|---|---|---|---|---|---|---|---|---|---|
| Depth to groundwater (m) | 1–3 | 16 | 21.05 | 533,413 | 29.23 | 0.72 | 0.10 | 2.57 | 0.09 | 0.09 |
| | 3–6 | 13 | 17.11 | 381,733 | 20.92 | 0.82 | 0.11 | | | |
| | 6–9 | 20 | 26.32 | 385,019 | 21.10 | 1.25 | 0.17 | | | |
| | 9–11 | 8 | 10.53 | 141,181 | 7.74 | 1.36 | 0.18 | | | |
| | 11–14 | 5 | 6.58 | 156,184 | 8.56 | 0.77 | 0.10 | | | |
| | 14–18 | 13 | 17.11 | 130,416 | 7.15 | 2.39 | 0.32 | | | |
| | >18 | 1 | 1.32 | 97,114 | 5.32 | 0.25 | 0.03 | | | |
| Net recharge (cm/year) | 9–12 | 3 | 3.95 | 16,076 | 23.97 | 0.16 | 0.04 | 1.73 | 0.132617 | 0.13 |
| | 12–14 | 14 | 18.42 | 14,242 | 21.24 | 0.87 | 0.22 | | | |
| | 14–16 | 24 | 31.58 | 13,985 | 20.85 | 1.51 | 0.39 | | | |
| | 16–23 | 35 | 46.05 | 22,763 | 33.94 | 1.36 | 0.35 | | | |
| Aquifer media | Clay | 1 | 1.32 | 191,865 | 10.53 | 0.13 | 0.01 | 2.00 | 0.29 | 0.56 |
| | Lime | 6 | 7.89 | 139,100 | 7.63 | 1.03 | 0.08 | | | |
| | Clay/sand and silt | 10 | 13.16 | 104,572 | 5.74 | 2.29 | 0.17 | | | |
| | Sand and clay | 0 | 0.00 | 479,523 | 26.31 | 0.00 | 0.00 | | | |
| | Sand | 11 | 14.47 | 133,617 | 7.33 | 1.97 | 0.14 | | | |
| | Clay, sand and gravel | 8 | 10.53 | 27,666 | 1.52 | 6.94 | 0.51 | | | |
| | Pebble and gravel | 40 | 52.63 | 746,494 | 40.95 | 1.29 | 0.09 | | | |
| Soil media | Clay, and very heavy soil | 6 | 7.89 | 251,748 | 13.80 | 0.57 | 0.15 | 1.82 | 0.087963 | 0.08 |
| | Heavy soil | 17 | 22.37 | 683,081 | 37.45 | 0.60 | 0.16 | | | |
| | Moderate soil | 42 | 55.26 | 566,758 | 31.08 | 1.78 | 0.47 | | | |
| | Sandy soil | 11 | 14.47 | 322,201 | 17.67 | 0.82 | 0.22 | | | |
| Slope (percent) | 0–2 | 58 | 76.32 | 1,504,869 | 82.46 | 0.93 | 0.19 | 1.83 | 0.210978 | 0.20 |
| | 2–6 | 14 | 18.42 | 152,102 | 8.33 | 2.21 | 0.46 | | | |
| | 6–12 | 3 | 3.95 | 66,721 | 3.66 | 1.08 | 0.22 | | | |
| | 12–18 | 1 | 1.32 | 39,728 | 2.18 | 0.60 | 0.13 | | | |
| | >18 | 0 | 0.00 | 61,640 | 3.38 | 0.00 | 0.00 | | | |
| Impact of vadose zone | Clay | 7 | 9.21 | 90,810 | 4.98 | 1.85 | 0.29 | 2.27 | 0.022937 | 0.03 |
| | Clay and silt | 5 | 6.58 | 85,963 | 4.71 | 1.40 | 0.22 | | | |
| | Silt, clay and sand | 35 | 46.05 | 1,033,702 | 56.64 | 0.81 | 0.13 | | | |
| | Clay and gravel | 19 | 25.00 | 398,963 | 21.86 | 1.14 | 0.18 | | | |
| | Gravel and sand | 10 | 13.16 | 215,622 | 11.81 | 1.11 | 0.18 | | | |
| Hydraulic conductivity (m/day) | 0–5 | 9 | 11.84 | 123,074 | 6.74 | 1.76 | 0.25 | 2.22 | 0.043622 | 0.06 |
| | 5–10 | 28 | 36.84 | 763,844 | 41.85 | 0.88 | 0.12 | | | |
| | 10–15 | 30 | 39.47 | 784,888 | 43.01 | 0.92 | 0.13 | | | |
| | 15–25 | 7 | 9.21 | 132,487 | 7.26 | 1.27 | 0.18 | | | |
| | >25 | 2 | 2.63 | 20,767 | 1.14 | 2.31 | 0.32 | | | |
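The Eij, Hj and Ij columns of Table 4 can be reproduced from the frequency ratios with the sketch below. The entropy formulas are assumptions: the usual Shannon entropy with base-2 logarithms and the information coefficient Ij = (Hjmax − Hj)/Hjmax, with Hjmax the log of the number of classes. These choices reproduce the depth to groundwater values of Table 4 to rounding.

```python
import math


def shannon_entropy(frequency_ratios):
    """Return (E_ij list, H_j, I_j) for the classes of one factor."""
    total = sum(frequency_ratios)
    # Probability density of each class: its share of the summed FR.
    e = [fr / total for fr in frequency_ratios]
    # Entropy of the factor; empty classes (E_ij = 0) contribute nothing.
    h = -sum(p * math.log2(p) for p in e if p > 0)
    h_max = math.log2(len(frequency_ratios))
    # Information coefficient: 0 when all classes behave alike.
    i = (h_max - h) / h_max
    return e, h, i


# FR column of Table 4 for the seven depth to groundwater classes.
depth_fr = [0.72, 0.82, 1.25, 1.36, 0.77, 2.39, 0.25]
e_ij, h_j, i_j = shannon_entropy(depth_fr)
```

A factor whose classes all have similar frequency ratios has an entropy close to its maximum and therefore a small Ij, which is why such factors receive low weights in the SE model.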

15–25 (0.65), and on the contrary the two other classes of 5–10 and 10–15 had no influence. The groundwater vulnerability indexes from the WOE model were calculated from Eq. (18) and finally were divided into 5 classes, namely, very low, low, moderate, high and very high, for vulnerability assessment (Fig. 6b).

$$GVI_{WOE} = D_{WOE} + R_{WOE} + A_{WOE} + S_{WOE} + T_{WOE} + I_{WOE} + C_{WOE} \qquad (18)$$

4.4.2. SE model

The difference between Shannon Entropy and the other bivariate model used in the present study was that Shannon Entropy benefitted from both rating and weighting. Results of entropy indicated that the weight values for D, R, A, S, T, I, and C were 0.09, 0.13, 0.56, 0.08, 0.2, 0.03, and 0.06, respectively (Table 4). Overall, the SE model results revealed that the most and the least important factors for groundwater pollution probability in the Sari-Behshahr plain were aquifer media and impact of vadose zone, respectively. Finally, the groundwater vulnerability indexes for SE were calculated from Eq. (19), and the groundwater vulnerability map (Fig. 6c) was prepared by dividing these indexes into 5 classes using the quantile classification method.

$$GVI_{SE} = 0.09 \times D_{FR} + 0.13 \times R_{FR} + 0.56 \times A_{FR} + 0.08 \times S_{FR} + 0.2 \times T_{FR} + 0.03 \times I_{FR} + 0.06 \times C_{FR} \qquad (19)$$

4.5. Groundwater vulnerability assessment using machine learning methods

Results of the training and testing phases for the LMT and Bagging models applied to the DRASTIC model with 7 factors indicate that both models have higher Kappa and lower RMSE and MAE values than the original DRASTIC (Table 5). However, Kappa is lower for the LMT model than for the Bagging model in both the training and the testing phases. The corresponding groundwater vulnerability maps for the LMT and Bagging methods are presented in Fig. 6d and e.

Table 5 Model performance using the training dataset and the testing dataset.

| No | Model evaluation criteria | LMT, training dataset | LMT, testing dataset | Bagging, training dataset | Bagging, testing dataset |
|---|---|---|---|---|---|
| 1 | Kappa | 0.88 | 0.38 | 0.98 | 0.40 |
| 2 | RMSE | 0.2 | 0.52 | 0.17 | 0.41 |
| 3 | MAE | 0.1 | 0.31 | 0.13 | 0.29 |

4.6. Groundwater vulnerability assessment using WOE and mod-DRASTIC (15 factors)

Since the WOE model performed better than the SE, LMT and Bagging models for the modification of weights and rates of the original DRASTIC model with 7 factors, it was also used for the mod-DRASTIC model with 15 factors. Analysis of the WOE model results (Table 6) in this case indicates that the probability of groundwater pollution due to nitrate contamination increases with increasing river density. Inceptisol soils had the most influence on groundwater


Table 6 Spatial correlation between nitrate concentrations and 15 factors for mod-DRASTIC using WOE model.

| Factors | Class | No. of nitrate | Percentage of nitrate (%) | No. of pixels in domain | Percentage of domain (%) | W+ | W− | Sc | C | C/Sc |
|---|---|---|---|---|---|---|---|---|---|---|
| River density (km/km2) | 0–0.19 | 8 | 10.53 | 357,865 | 19.59 | −0.621 | 0.107 | 0.374 | −0.728 | −1.95 |
| | 0.19–0.26 | 9 | 11.84 | 375,982 | 20.58 | −0.553 | 0.104 | 0.355 | −0.657 | −1.85 |
| | 0.26–0.33 | 12 | 15.79 | 369,803 | 20.24 | −0.248 | 0.054 | 0.315 | −0.303 | −0.96 |
| | 0.33–0.4 | 24 | 31.58 | 366,755 | 20.07 | 0.453 | −0.155 | 0.247 | 0.609 | 2.47 |
| | 0.4–0.7 | 23 | 30.26 | 356,811 | 19.53 | 0.438 | −0.143 | 0.250 | 0.581 | 2.33 |
| Soil order | Coastal sand | 0 | 0.00 | 16,815 | 0.93 | None | 0.009 | None | None | None |
| | Salt flat | 0 | 0.00 | 75,737 | 4.19 | None | 0.043 | None | None | None |
| | Alfisol | 5 | 6.58 | 243,163 | 13.44 | −0.715 | 0.076 | 0.463 | −0.791 | −1.71 |
| | Inceptisol | 35 | 46.05 | 552,446 | 30.54 | 0.411 | −0.253 | 0.230 | 0.663 | 2.88 |
| | Mollisol | 36 | 47.37 | 920,704 | 50.90 | −0.072 | 0.069 | 0.230 | −0.141 | −0.62 |
| Distance to fault (m) | 0–500 | 14 | 18.42 | 242,963 | 13.31 | 0.325 | −0.061 | 0.296 | 0.386 | 1.30 |
| | 500–1000 | 14 | 18.42 | 220,393 | 12.08 | 0.422 | −0.075 | 0.296 | 0.497 | 1.68 |
| | 1000–2000 | 18 | 23.68 | 310,702 | 17.02 | 0.330 | −0.084 | 0.270 | 0.414 | 1.43 |
| | >2000 | 30 | 39.47 | 1,051,002 | 57.59 | −0.378 | 0.356 | 0.235 | −0.733 | −3.12 |
| Distance to river (m) | 0–500 | 29 | 38.16 | 584,307 | 32.03 | 0.175 | −0.094 | 0.236 | 0.270 | 1.14 |
| | 500–1000 | 16 | 21.05 | 447,017 | 24.50 | −0.152 | 0.045 | 0.281 | −0.197 | −0.70 |
| | 1000–2000 | 19 | 25.00 | 489,489 | 26.83 | −0.071 | 0.025 | 0.265 | −0.095 | −0.36 |
| | >2000 | 12 | 15.79 | 303,472 | 16.64 | −0.052 | 0.010 | 0.315 | −0.062 | −0.20 |
| Fault density (km/km2) | 0–0.1 | 10 | 13.16 | 749,435 | 41.06 | −1.138 | 0.403 | 0.339 | −1.541 | −4.54 |
| | 0.1–0.17 | 15 | 19.74 | 270,254 | 14.81 | 0.287 | −0.049 | 0.288 | 0.336 | 1.17 |
| | 0.17–0.24 | 18 | 23.68 | 266,203 | 14.59 | 0.485 | −0.102 | 0.270 | 0.587 | 2.18 |
| | 0.24–0.36 | 12 | 15.79 | 265,956 | 14.57 | 0.080 | −0.004 | 0.315 | 0.084 | 0.27 |
| | 0.36–0.54 | 21 | 27.63 | 273,212 | 14.97 | 0.613 | −0.151 | 0.257 | 0.764 | 2.98 |
| Geologic time scale | Jurassic | 0 | 0.00 | 12,490 | 0.68 | None | 0.015 | None | None | None |
| | Jurassic-Cretaceous | 0 | 0.00 | 25,821 | 1.41 | None | 0.023 | None | None | None |
| | Pliocene | 1 | 1.32 | 66,539 | 3.65 | −1.020 | 0.033 | 1.007 | −1.052 | −1.05 |
| | Precambrian | 0 | 0.00 | 5549 | 0.30 | None | 0.011 | None | None | None |
| | Quaternary | 75 | 98.68 | 1,713,578 | 93.89 | 0.049 | 1.379 | 1.007 | 1.428 | 1.42 |
| | Triassic-Jurassic | 0 | 0.00 | 2490 | 0.11 | None | 0.01 | None | None | None |
| Altitude | −29–0 | 30 | 39.47 | 1,080,006 | 59.23 | −0.406 | 0.415 | 0.235 | −0.821 | −3.50 |
| | 0–50 | 37 | 48.68 | 451,739 | 24.78 | 0.676 | −0.372 | 0.230 | 1.047 | 4.56 |
| | 50–100 | 4 | 5.26 | 99,227 | 5.44 | −0.033 | 0.010 | 0.514 | −0.044 | −0.09 |
| | 100–200 | 3 | 3.95 | 118,328 | 6.49 | −0.497 | 0.035 | 0.589 | −0.533 | −0.90 |
| | >200 | 2 | 2.63 | 74,062 | 4.06 | −0.434 | 0.023 | 0.717 | −0.457 | −0.64 |
| Landuse | Airport | 0 | 0.00 | 720 | 0.04 | None | −0.009 | None | None | None |
| | Dense forest | 1 | 1.32 | 96,713 | 5.40 | −1.411 | 0.033 | 1.007 | −1.444 | −1.43 |
| | Good rangeland | 0 | 0.00 | 980 | 0.05 | None | −0.009 | None | None | None |
| | Mixture of agriculture and garden | 0 | 0.00 | 801 | 0.04 | None | −0.009 | None | None | None |
| | Mixture of agriculture and fallow | 62 | 81.58 | 1,400,830 | 78.15 | 0.043 | −0.212 | 0.296 | 0.255 | 0.86 |
| | Dry farming | 1 | 1.32 | 47,735 | 2.66 | −0.705 | 0.004 | 1.007 | −0.709 | −0.70 |
| | Moderate forest | 4 | 5.26 | 85,635 | 4.78 | 0.097 | −0.015 | 0.514 | 0.111 | 0.22 |
| | Poor rangeland | 8 | 10.53 | 46,158 | 2.57 | 1.408 | −0.094 | 0.374 | 1.503 | 4.02 |
| | Urban | 0 | 0.00 | 90,220 | 5.03 | None | 0.042 | None | None | None |
| | Reservoir | 0 | 0.00 | 5645 | 0.31 | None | −0.006 | None | None | None |
| | Wetland | 0 | 0.00 | 17,143 | 0.96 | None | 0.000 | None | None | None |

pollution (2.88) among the soil orders, and Quaternary deposits had more influence than the other geological units present in the study area. A distance to faults of >2000 m did not have any effect on groundwater vulnerability, whereas faults at a distance of 500–1000 m had a high influence. Faults in areas with a density of >0.36 km/km2 had the most important effect on groundwater vulnerability in the area. Near rivers, up to a distance of 500 m, the probability of groundwater pollution is higher (1.14). The probability of groundwater contamination is higher at lower altitudes, between −29–0 and 0–50 m. As far as land use is concerned, poor rangeland has the highest value, followed by the mixture of agriculture and fallow, and dense forest. Based on the fifteen factors, the WOE map was constructed using Eq. (20) (Fig. 6f).

$$GVI_{WOE} = D_{WOE} + R_{WOE} + A_{WOE} + S_{WOE} + T_{WOE} + I_{WOE} + C_{WOE} + RD_{WOE} + SO_{WOE} + DF_{WOE} + DR_{WOE} + FD_{WOE} + GTS_{WOE} + AL_{WOE} + LU_{WOE} \qquad (20)$$

4.7. Evaluation and comparison of model results

The ROC curve method was used to evaluate and compare the results obtained by the different methods for the testing dataset (Fig. 7). Results showed that, for the DRASTIC model with seven factors, the WOE model had the highest AUC value (0.81) and the lowest standard error (0.056). The other models, in decreasing order of accuracy, were LMT (AUC = 0.76), SE (AUC = 0.65), and Bagging (AUC = 0.63). The AUC value of the original DRASTIC model was 0.51. Including additional factors in mod-DRASTIC (15 factors) and using WOE to adjust the weights and rates significantly increased the AUC, to a value of 0.91. This shows the superiority of this last method for vulnerability assessment in the study area.

4.8. Discussion

The original DRASTIC model has been used by most researchers for groundwater vulnerability assessment. Some researchers have pointed out weaknesses in this model. The main sources of uncertainty in the DRASTIC model are: (1) expert-opinion-based rating of the classes of each parameter; (2) expert-opinion-based or fixed weight of each factor; (3) it only considers the transport of contaminants from the surface through the unsaturated zone to groundwater and does not capture contamination that results from human activities via direct contact with groundwater in wells, so the GVI is lower than the actual values; and (4) it uses only seven conditioning factors (Hamutoko et al., 2016). Nadiri et al. (2017b) stated that the DRASTIC model can be considered as one of the


Fig. 7. Evaluation of models' performance.

heuristic approaches, and thus it is amenable to calibration and validation; also, the vulnerability index values calculated by the model cannot be controlled. Another way to improve the performance of the DRASTIC model is to change the factors by adding new ones or eliminating some of the original factors. In this study, results showed that the performance of the DRASTIC model could be improved by modifying the weights and rates according to the study area characteristics (by field measurements) using bivariate statistical and machine learning models, as well as by adding extra factors to the model, to increase the precision and accuracy of the predicted groundwater vulnerability maps. The rates in the DRASTIC model are determined according to the table of Aller et al. (1987b), but the final rates are selected based on expert judgment. It is generally considered that an aquifer with a shallower groundwater table is more likely to be contaminated, depending on the nature and location of the pollutant on the ground, as contaminants are expected to reach the water table in a short travel time. But this is not always the case, as other factors, such as the type of aquifer, the nature of the vadose zone, the presence of faults/fractures in the ground, the source, and the surrounding ground conditions, may also have a significant impact on groundwater pollution potential. This is the case in the present study, where the WOE model results indicate that the depth to groundwater of 14–18 m (instead of 1–3 m) had the highest impact on groundwater vulnerability. This is in accordance with the results of Neshat and Pradhan (2015a), who showed that, in the Dempster–Shafer Theory (DST) model, a depth to groundwater of more than 30 m exhibited the highest weight. With more net recharge, a higher amount of contamination can be transported to the aquifer. Therefore, the rates for the WOE model were adopted in accordance with the above considerations.
For aquifer media with a higher coarse fraction, permeability would be higher, and consequently pollution potential is expected to be higher as well (Anwar et al., 2002). According to the results of the WOE model, the aquifer material consisting of clay, sand and gravel had a stronger impact on groundwater pollution risk than pebble and gravel, which is contrary to this hypothesis. For the soil media, the presence of fine-grained material decreases the intrinsic permeability and hinders the transport of contamination. Results of the WOE model revealed that moderate soil, instead of sandy soil, had the higher rate. In general, where the ground slope is gentler, retention and collection of water on the surface is greater, and thus infiltration in such areas would be higher. Results showed that the slope of 2–6% had

a higher rate than the slope of 0–2%. The impact of vadose zone behaved like the aquifer and soil media factors. According to the results of the WOE model, clay has a higher rate (1.67) than gravel and sand (0.36). The higher the hydraulic conductivity, the higher the ability of the aquifer formation to transport water, and hence the higher the expected impact on groundwater vulnerability. Results showed, however, that the lowest hydraulic conductivity (0–5 m/day) had the highest rate (1.74), followed by >25 m/day (1.19). In the case of river density and distance to river, there was a good relationship between them and the polluted areas, due to high drainage density and gentle slope. Faults are big cracks; thus, the closer the distance to a fault, the higher the probability of groundwater contamination, and there was a reasonable relationship between nitrate data and both fault density and distance to fault. The poor rangeland and dense forest had the highest and lowest impact on groundwater vulnerability assessment, respectively. For poor rangelands, this can be due to animal dung, as most of the people in the villages are farmers; for dense forest, as there are no anthropogenic changes, the impact on GVA is the lowest. Inceptisol soils (https://www.britannica.com/science/Inceptisol), mostly formed from colluvial and alluvial materials, have high permeability and thus a high potential for pollution. Similarly, the Quaternary formation has high permeability and also has a high influence on groundwater vulnerability. Due to topography, accumulation of contamination at lower altitudes (<50 m) is more likely. It can be stated that some of the seven original factors of the DRASTIC model did not have a significant impact on groundwater vulnerability assessment, and that extra factors such as fault density and altitude were more effective in controlling the transport of contaminants and the pollution potential. Therefore, the results of the current study reject the hypothesis that the seven factors, their rating and their weighting must be constant for the modeling of groundwater vulnerability.
Sahoo et al. (2016) indicated that objective methods were the most effective approaches for determining the weights of input parameters, as these methods assign weights to parameters based on the relative importance of each parameter for the final vulnerability maps. The present analysis revealed that all of the objective methods had a reasonable accuracy and performed better than the subjective methods. For the bivariate models, it can be stated that rating is more important than weighting, as the WOE model considered only the rating. Results of the current study are in agreement with those of Sahoo et al. (2016), who stated that not all of the considered factors of the DRASTIC model had an impact on the results, so some can be removed from the modeling process. According to the results of RMSE, MAE and Kappa, the Bagging model had better prediction power than LMT in both the training and testing phases, but according to AUC, the LMT model had higher accuracy than the Bagging model. As the AUC is based on TP, TN, FP, and FN, it is more accurate than RMSE and MAE for model evaluation and comparison (Termeh et al., 2018). Thus, AUC was used to evaluate the models and compare their prediction powers; the WOE model considering all 15 factors had the highest prediction capability. The factors must be selected according to the characteristics of the case study, and the input data should not be limited to the seven parameters of DRASTIC. The advantage of the data-driven WOE model is that it uses the Bayesian method with a log-linear model rather than the ordinary probability expression. This leads to easier interpretation of the results.
Also, the probability of prior and posterior logits is applied. This method can be used when sufficient data are available for predicting the relative importance of themes by statistical means (Bonham-Carter, 2014). The benefit of the LMT model is that the constant values at the leaf nodes are replaced by a regression plane (Witten and Frank, 2005), and building the logistic models is computationally more efficient (Landwehr et al., 2005).


Fig. 8. Location of some big cities in final groundwater vulnerability zone.

Kazakis and Voudouris (2015) modified DRASTIC by replacing some quantitative factors. They revealed that the maps thus obtained provided a higher correlation with nitrate pollution. Results of the present study are in line with those of Kazakis and Voudouris (2015), who showed that the seven parameters of the DRASTIC model should be changed according to the characteristics of the case study.

5. Conclusions

Groundwater vulnerability assessment has become a professional measure for the sustainable management of groundwater resources; thus there is a need to develop new methods that increase the accuracy of such assessments. In this study, original and modified DRASTIC methods were evaluated and compared with objective models, namely WOE, SE, Bagging, and LMT. The efficacy of these methods was evaluated quantitatively using the AUC and other statistical indexes. The IGR and multi-collinearity diagnosis were used for the selection of model factors. Results of this study show that:

(1) The most effective contributing factor for groundwater pollution is net recharge, followed by fault density, soil media and altitude, and the least effective factors are impact of vadose zone and hydraulic conductivity.
(2) The highest prediction power was obtained by the WOE method for modifying weights and rates. According to the WOE model, rating is more important than weighting for groundwater vulnerability modeling.
(3) By including more factors in the DRASTIC model, a better representation of groundwater vulnerability was achieved.
(4) The weighting in the DRASTIC model is not constant and must be set according to the conditions of the study area.

6. Drawbacks and recommendations

The main drawbacks of this study can be summarized as: (1) the large scale of the input data, (2) the low density of boreholes used to construct the aquifer and vadose zone maps, (3) the limited data for preparing the soil media map, and (4) the absence of a higher density of well-distributed nitrate samples within the case study area.

Fig. 8 shows that the four big and important cities of Sari, Neka, Behshahr and Galugah, with a combined population of about 570,000, are located in areas with high vulnerability. Pollution may be due to activities such as discharge from households, small industries, and farms in or close to the villages (in the north of Iran, population density is high, and cities and villages are close to each other). Thus it is recommended to prevent the further development of manufacturing and industrial areas without proper treatment, which could pollute groundwater. It is also recommended that the water quality of the aquifers, especially in areas with high vulnerability, be investigated and monitored, and that the areas suitable for drinking water and agriculture be identified, in order to implement proper protection and management plans.

Appendix A

Table A1
Multi-collinearity diagnosis test for all conditioning factors (collinearity statistics: tolerance and VIF).

| No | Factors | Tolerance | VIF |
|----|---------|-----------|-----|
| 1 | Depth to groundwater | 0.403 | 2.47 |
| 2 | Net recharge | 0.22 | 4.4 |
| 3 | Aquifer media | 0.75 | 1.32 |
| 4 | Soil media | 0.27 | 3.59 |
| 5 | Topography | 0.38 | 2.60 |
| 6 | Impact of vadose zone | 0.37 | 2.70 |
| 7 | Hydraulic conductivity | 0.36 | 2.77 |
| 8 | Distance to fault | 0.30 | 3.26 |
| 9 | Fault density | 0.21 | 4.7 |
| 10 | Distance from river | 0.79 | 1.21 |
| 11 | River density | 0.77 | 1.29 |
| 12 | Land-use | 0.72 | 1.3 |
| 13 | Geology | 0.65 | 1.53 |
| 14 | Altitude | 0.23 | 4.32 |
| 15 | Soil order | 0.40 | 2.47 |
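For readers unfamiliar with the statistics in Table A1, the following is a minimal sketch of how tolerance and VIF are conventionally defined: the tolerance of factor j is 1 − R_j², where R_j² comes from regressing factor j on all other factors, and VIF is its reciprocal. The function name `collinearity_stats` and the synthetic data are illustrative assumptions; this is not the authors' workflow or data.

```python
import numpy as np

def collinearity_stats(X):
    """Return (tolerance, vif) for each column of X (n_samples x n_factors).

    Hypothetical helper: tolerance_j = 1 - R_j^2, where R_j^2 is obtained by
    regressing column j on all other columns (with an intercept); VIF = 1 / tolerance.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    tolerance = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])      # intercept + remaining factors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # ordinary least squares fit
        resid = y - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        tolerance[j] = ss_res / ss_tot                 # equals 1 - R_j^2
    return tolerance, 1.0 / tolerance

# Synthetic stand-in for conditioning factors (assumed data, not from the study):
# factor c is built to be correlated with factor a; factor b is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = 0.8 * a + 0.6 * rng.normal(size=500)
X = np.column_stack([a, b, c])

tolerance, vif = collinearity_stats(X)
for name, t, v in zip("abc", tolerance, vif):
    print(f"factor {name}: tolerance={t:.2f}, VIF={v:.2f}")
```

In this sketch the correlated pair (a, c) receives a lower tolerance and a higher VIF than the independent factor b, which is the pattern a multi-collinearity diagnosis is designed to expose.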

References

Abbasi, S., Mohammadi, K., Kholghi, M., Howard, K., 2013. Aquifer vulnerability assessments using DRASTIC, weights of evidence and the analytic element method. Hydrol. Sci. J. 58, 186–197.
Akgun, A., 2012. A comparison of landslide susceptibility maps produced by logistic regression, multi-criteria decision, and likelihood ratio methods: a case study at İzmir, Turkey. Landslides 9, 93–106.

Al-Abadi, A.M., 2017. The application of Dempster–Shafer theory of evidence for assessing groundwater vulnerability at Galal Badra basin, Wasit governorate, east of Iraq. Appl. Water Sci. 7, 1725–1740.
Aller, L., Bennett, T., Lehr, J., Petty, R., Hackett, G., 1987a. DRASTIC: A Standardized System for Evaluating Ground Water Pollution Potential Using Hydrogeologic Settings. Ada, Okla., Robert S. Kerr Environmental Research Laboratory. EPA/600/2-87-035, Volumes 1 and 2.
Aller, L., Lehr, J.H., Petty, R., Bennett, T., 1987b. DRASTIC: A Standardized System to Evaluate Groundwater Pollution Potential Using Hydrogeologic Settings. National Water Well Association, Worthington, Ohio, United States of America.
Anwar, M., Prem, C., Rao, V., 2002. Evaluation of groundwater potential of Musi River catchment using DRASTIC index model. Hydrology and watershed management. Proceedings of the International Conference, pp. 18–20.
Asadi, P., Hosseini, S.M., Ataie-Ashtiani, B., Simmons, C.T., 2017. Fuzzy vulnerability mapping of urban groundwater systems to nitrate contamination. Environ. Model. Softw. 96, 146–157.
Ayalew, L., Yamagishi, H., 2005. The application of GIS-based logistic regression for landslide susceptibility mapping in the Kakuda-Yahiko Mountains, Central Japan. Geomorphology 65, 15–31.
Babiker, I.S., Mohamed, M.A., Hiyama, T., Kato, K., 2005. A GIS-based DRASTIC model for assessing aquifer vulnerability in Kakamigahara Heights, Gifu Prefecture, central Japan. Sci. Total Environ. 345, 127–140.
Bonham-Carter, G.F., 2014. Geographic Information Systems for Geoscientists: Modelling With GIS. vol. 13. Elsevier.
Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A., 1984. Classification and Regression Trees. CRC Press.
Bui, D.T., Pradhan, B., Nampak, H., Bui, Q.-T., Tran, Q.-A., Nguyen, Q.-P., 2016a. Hybrid artificial intelligence approach based on neural fuzzy inference model and metaheuristic optimization for flood susceptibility modeling in a high-frequency tropical cyclone area using GIS. J. Hydrol. 540, 317–330.
Bui, D.T., Tuan, T.A., Klempe, H., Pradhan, B., Revhaug, I., 2016b. Spatial prediction models for shallow landslide hazards: a comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree. Landslides 13, 361–378.
Chapi, K., Singh, V.P., Shirzadi, A., Shahabi, H., Bui, D.T., Pham, B.T., et al., 2017. A novel hybrid artificial intelligence approach for flood susceptibility assessment. Environ. Model. Softw. 95, 229–245.
Chen, W., Ding, X., Zhao, R., Shi, S., 2016. Application of frequency ratio and weights of evidence models in landslide susceptibility mapping for the Shangzhou District of Shangluo City, China. Environ. Earth Sci. 75, 64.
Dixon, B., 2005. Applicability of neuro-fuzzy techniques in predicting ground-water vulnerability: a GIS-based sensitivity analysis. J. Hydrol. 309, 17–38.
Evans, B.M., Myers, W.L., 1990. A GIS-based approach to evaluating regional groundwater pollution potential with DRASTIC. J. Soil Water Conserv. 45, 242–245.
Fijani, E., Nadiri, A.A., Moghaddam, A.A., Tsai, F.T.-C., Dixon, B., 2013. Optimization of DRASTIC method by supervised committee machine artificial intelligence to assess groundwater vulnerability for Maragheh–Bonab plain aquifer, Iran. J. Hydrol. 503, 89–100.
Gardner, K.K., Vogel, R.M., 2005. Predicting ground water nitrate concentration from land use. Groundwater 43, 343–352.
Hamutoko, J., Wanke, H., Voigt, H., 2016. Estimation of groundwater vulnerability to pollution based on DRASTIC in the Niipele sub-basin of the Cuvelai Etosha Basin, Namibia. Phys. Chem. Earth Parts A/B/C 93, 46–54.
Hashimoto, T., Stedinger, J.R., Loucks, D.P., 1982. Reliability, resiliency, and vulnerability criteria for water resource system performance evaluation. Water Resour. Res. 18, 14–20.
Kang, J., Zhao, L., Li, R., Mo, H., Li, Y., 2017. Groundwater vulnerability assessment based on modified DRASTIC model: a case study in Changli County, China. Geocarto Int. 32, 749–758.
Kazakis, N., Voudouris, K.S., 2015. Groundwater vulnerability and pollution risk assessment of porous aquifers to nitrate: modifying the DRASTIC method using quantitative parameters. J. Hydrol. 525, 13–25.
Khosravi, K., Nohani, E., Maroufinia, E., Pourghasemi, H.R., 2016a. A GIS-based flood susceptibility assessment and its mapping in Iran: a comparison between frequency ratio and weights-of-evidence bivariate statistical models with multi-criteria decision-making technique. Nat. Hazards 83, 947–987.
Khosravi, K., Pourghasemi, H.R., Chapi, K., Bahri, M., 2016b. Flash flood susceptibility analysis and its mapping using different bivariate models in Iran: a comparison between Shannon's entropy, statistical index, and weighting factor models. Environ. Monit. Assess. 188, 656.
Khosravi, K., Pham, B.T., Chapi, K., Shirzadi, A., Shahabi, H., Revhaug, I., et al., 2018. A comparative assessment of decision trees algorithms for flash flood susceptibility modeling at Haraz watershed, northern Iran. Sci. Total Environ. 627, 744–755.
Kim, Y.J., Hamm, S.-Y., 1999. Assessment of the potential for groundwater contamination using the DRASTIC/EGIS technique, Cheongju area, South Korea. Hydrogeol. J. 7, 227–235.
Kisi, O., Dailr, A.H., Cimen, M., Shiri, J., 2012. Suspended sediment modeling using genetic programming and soft computing techniques. J. Hydrol. 450, 48–58.
Landwehr, N., Hall, M., Frank, E., 2005. Logistic model trees. Mach. Learn. 59, 161–205.
Lee, S., Kim, Y.-S., Oh, H.-J., 2012. Application of a weights-of-evidence method and GIS to regional groundwater productivity potential mapping. J. Environ. Manag. 96, 91–105.
LeGrand, H.E., 1964. System for evaluation of contamination potential of some waste disposal sites. J. Am. Water Works Assoc. 56, 959–974.
Miller, J.R., 1990. Morphometric assessment of lithologic controls on drainage basin evolution in the Crawford Upland, South-Central Indiana. Am. J. Sci. 290, 569–599.
Nachiappan, M.R., Sugumaran, V., Elangovan, M., 2016. Performance of logistic model tree classifier using statistical features for fault diagnosis of single point cutting tool. Indian J. Sci. Technol. 9.

Nadiri, A.A., Gharekhani, M., Khatibi, R., Moghaddam, A.A., 2017a. Assessment of groundwater vulnerability using supervised committee to combine fuzzy logic models. Environ. Sci. Pollut. Res. 24, 8562–8577.
Nadiri, A.A., Gharekhani, M., Khatibi, R., Sadeghfam, S., Moghaddam, A.A., 2017b. Groundwater vulnerability indices conditioned by supervised intelligence committee machine (SICM). Sci. Total Environ. 574, 691–706.
Naghibi, S.A., Pourghasemi, H.R., Abbaspour, K., 2018. A comparison between ten advanced and soft computing models for groundwater qanat potential assessment in Iran using R and GIS. Theor. Appl. Climatol. 131, 967–984.
Neshat, A., Pradhan, B., 2015a. An integrated DRASTIC model using frequency ratio and two new hybrid methods for groundwater vulnerability assessment. Nat. Hazards 76, 543–563.
Neshat, A., Pradhan, B., 2015b. Risk assessment of groundwater pollution with a new methodological framework: application of Dempster–Shafer theory and GIS. Nat. Hazards 78, 1565–1585.
Neshat, A., Pradhan, B., Dadras, M., 2014. Groundwater vulnerability assessment using an improved DRASTIC method in GIS. Resour. Conserv. Recycl. 86, 74–86.
O'Brien, R.M., 2007. A caution regarding rules of thumb for variance inflation factors. Qual. Quant. 41, 673–690.
Osaragi, T., 2002. Classification Methods for Spatial Data Representation.
Ouedraogo, I., Defourny, P., Vanclooster, M., 2016. Mapping the groundwater vulnerability for pollution at the pan African scale. Sci. Total Environ. 544, 939–953.
Panagopoulos, G., Antonakos, A., Lambrakis, N., 2006. Optimization of the DRASTIC method for groundwater vulnerability assessment via the use of simple statistical methods and GIS. Hydrogeol. J. 14, 894–911.
Pham, B.T., Prakash, I., 2017a. Evaluation and comparison of LogitBoost ensemble, Fisher's linear discriminant analysis, logistic regression, and support vector machines methods for landslide susceptibility mapping. Geocarto Int. 1–32.
Pham, B.T., Prakash, I., 2017b. A novel hybrid model of Bagging-based Naïve Bayes Trees for landslide susceptibility assessment. Bull. Eng. Geol. Environ. 1–15.
Pham, B.T., Bui, D.T., Prakash, I., 2017a. Landslide susceptibility assessment using bagging ensemble based alternating decision trees, logistic regression and J48 decision trees methods: a comparative study. Geotech. Geol. Eng. 1–15.
Pham, B.T., Khosravi, K., Prakash, I., 2017b. Application and comparison of decision tree-based machine learning methods in landside susceptibility assessment at Pauri Garhwal Area, Uttarakhand, India. Environ. Process. 4, 711–730.
Pham, B.T., Khosravi, K., Prakash, I., 2017c. Application and comparison of decision tree-based machine learning methods in landside susceptibility assessment at Pauri Garhwal Area, Uttarakhand, India. Environ. Process. 1–20.
Pham, B.T., Prakash, I., Bui, D.T., 2017d. Spatial prediction of landslides using hybrid machine learning approach based on Random Subspace and Classification and Regression Trees. Geomorphology 1–15.
Pham, B.T., Shirzadi, A., Bui, D.T., Prakash, I., Dholakia, M., 2017e. A hybrid machine learning ensemble approach based on a Radial Basis Function neural network and Rotation Forest for landslide susceptibility modeling: a case study in the Himalayan area, India. Int. J. Sediment Res.
Pham, B.T., Prakash, I., Bui, D.T., 2017f. Spatial prediction of landslides using hybrid machine learning approach based on Random Subspace and Classification and Regression Trees. Geomorphology 1–15.
Pham, B.T., Jaafari, A., Prakash, I., Bui, D.T., 2018a. A novel hybrid intelligent model of support vector machines and the MultiBoost ensemble for landslide susceptibility modeling. Bull. Eng. Geol. Environ. 1–22.
Pham, B.T., Son, L.H., Hoang, T.-A., Nguyen, D.-M., Tien, Bui D., 2018b. Prediction of shear strength of soft soil using machine learning methods. Catena 166, 181–191.
Pham, B.T., Tien Bui, D., Prakash, I., 2018c. Bagging based Support Vector Machines for spatial prediction of landslides. Environ. Earth Sci. 77, 146.
Piscopo, G., Pleasure, P., Sinclair, P., 2001. Groundwater Vulnerability Map Explanatory Notes, Lachlan Catchment, Centre of Natural Resources. New South Wales (NSW) Department of Land and Water Conservation, p. 14.
Pourghasemi, H.R., Mohammady, M., Pradhan, B., 2012. Landslide susceptibility mapping using index of entropy and conditional probability models in GIS: Safarood Basin, Iran. Catena 97, 71–84.
Pradhan, B., 2013. A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS. Comput. Geosci. 51, 350–365.
Pradhan, B., Oh, H.-J., Buchroithner, M., 2010. Weights-of-evidence model applied to landslide susceptibility mapping in a tropical hilly area. Geomat. Nat. Haz. Risk 1, 199–223.
Quinlan, J.R., 1993. C4.5: Programs for Machine Learning. vol. 38. Morgan Kaufmann, p. 48.
Quinlan, J.R., 1996. Bagging, Boosting, and C4.5. vol. 1. AAAI/IAAI, pp. 725–730.
Rahman, A., 2008. A GIS based DRASTIC model for assessing groundwater vulnerability in shallow aquifer in Aligarh, India. Appl. Geogr. 28, 32–53.
Rahmati, O., Haghizadeh, A., Pourghasemi, H.R., Noormohamadi, F., 2016. Gully erosion susceptibility mapping: the role of GIS-based bivariate statistical models and their comparison. Nat. Hazards 82, 1231–1258.
Sadeghfam, S., Hassanzadeh, Y., Nadiri, A.A., Zarghami, M., 2016. Localization of groundwater vulnerability assessment using catastrophe theory. Water Resour. Manag. 30, 4585–4601.
Sahoo, M., Sahoo, S., Dhar, A., Pradhan, B., 2016. Effectiveness evaluation of objective and subjective weighting methods for aquifer vulnerability assessment in urban context. J. Hydrol. 541, 1303–1315.
Secunda, S., Collin, M., Melloul, A.J., 1998. Groundwater vulnerability assessment using a composite model combining DRASTIC with extensive agricultural land use in Israel's Sharon region. J. Environ. Manag. 54, 39–57.
Shrestha, S., Semkuyu, D.J., Pandey, V.P., 2016. Assessment of groundwater vulnerability and risk to pollution in Kathmandu Valley, Nepal. Sci. Total Environ. 556, 23–35.

Tehrany, M.S., Pradhan, B., Mansor, S., Ahmad, N., 2015. Flood susceptibility assessment using GIS-based support vector machine model with different kernel types. Catena 125, 91–101.
Termeh, S.V.R., Kornejady, A., Pourghasemi, H.R., Keesstra, S., 2018. Flood susceptibility mapping using novel ensembles of adaptive neuro fuzzy inference system and metaheuristic algorithms. Sci. Total Environ. 615, 438–451.
Van Westen, C., Rengers, N., Soeters, R., 2003. Use of geomorphological information in indirect landslide susceptibility assessment. Nat. Hazards 30, 399–419.
Wang, J., He, J., Chen, H., 2012. Assessment of groundwater contamination risk using hazard quantification, a modified DRASTIC model and groundwater value, Beijing Plain, China. Sci. Total Environ. 432, 216–226.

Witten, I.H., Frank, E., 2005. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann.
Witten, I.H., Frank, E., Hall, M.A., Pal, C.J., 2016. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann.
Yesilnacar, E.K., 2005. The Application of Computational Intelligence to Landslide Susceptibility Mapping in Turkey. University of Melbourne, Department, p. 200.