Groundwater Depth Prediction Using Data-Driven Models with ... - MDPI

sustainability Article

Groundwater Depth Prediction Using Data-Driven Models with the Assistance of Gamma Test

Jiyang Tian 1, Chuanzhe Li 1,*, Jia Liu 1,2, Fuliang Yu 1, Shuanghu Cheng 3, Nana Zhao 4 and Wan Zurina Wan Jaafar 5

1 State Key Laboratory of Simulation and Regulation of Water Cycle in River Basin, China Institute of Water Resources and Hydropower Research, Beijing 100038, China; [email protected] (J.T.); [email protected] (J.L.); [email protected] (F.Y.)
2 State Key Laboratory of Hydrology-Water Resource and Hydraulic Engineering, Hohai University, Nanjing 210098, China
3 Bureau of Water Resources Survey of Hebei, Shijiazhuang 050031, China; [email protected]
4 Institute of Wetland Research, Chinese Academy of Forestry, Beijing 100091, China; [email protected]
5 Department of Civil Engineering, Faculty of Engineering, University of Malaya, Kuala Lumpur 50603, Malaysia; [email protected]
* Correspondence: [email protected]; Tel.: +86-10-6878-1656

Academic Editor: Vincenzo Torretta Received: 26 August 2016; Accepted: 18 October 2016; Published: 25 October 2016

Abstract: Prediction of the groundwater dynamics via models can help better manage the groundwater resources and guarantee their sustainable use. Three types of data-driven models are built for groundwater depth prediction in the plain of Shijiazhuang, the capital of Hebei Province in North China. The data-driven models include the Power Function Model (PFM), Back-Propagation Artificial Neural Network (BPANN) and Support Vector Machines (SVM) with two kernel functions of linear kernel function (LKF) and radial basis function (RBF). Five classes of factors (including 12 indices) are considered as potential model input variables. The Gamma Test (GT) is adopted in this study to help identify the relative importance of the input indices and tackle the tricky issue of the optimal input combinations for the data-driven models. The established models are evaluated in both fitting and testing procedures based on the root mean squared error (RMSE) and Nash-Sutcliffe efficiency (E) for different input combination schemes. The results show that SVM (RBF) performs the best. It is interesting to find that the natural factors (i.e., precipitation and evaporation) are less relevant to the groundwater depth variations. The methods used in this study have much significance for groundwater depth prediction in areas lacking hydrogeological data.

Keywords: groundwater dynamics prediction; data-driven models; Gamma Test; power function model; back-propagation artificial neural network; support vector machine

1. Introduction

Groundwater is an important part of water resources for domestic, agricultural, industrial, and environmental uses due to its generally good quality, widespread availability, good water stability and the fact that it is not easily polluted [1]. Groundwater resources are especially important in arid and semi-arid regions, where they can offset the water shortage caused by the uneven distribution of surface water in time and space. However, the expansion of arable land, urban expansion and population growth have caused a dramatic increase in water consumption, which has led to a decline of the groundwater level in many countries all over the world [2,3].

Sustainability 2016, 8, 1076; doi:10.3390/su8111076

www.mdpi.com/journal/sustainability


With the rapid economic development in China in the past several decades, the demand for water has increased at a fast speed. This has led to an excessive use of water resources, especially groundwater. At present, there are many groundwater overdraft areas in China, among which Hebei Province in the North China Plain is one of the most serious overdraft areas, where the groundwater level has decreased dramatically [4,5]. Much attention has been paid by both researchers and policy makers in this area. There have been studies focusing on both the reasons for and the negative effects of the continuous decline of the groundwater level in Hebei Province and the North China Plain [6,7].

Groundwater dynamic forecasting provides important information for groundwater management. Physical descriptive models and data-driven models are two classes of dynamic prediction models [8]. Detailed hydrogeological data in space and time, which are usually obtained by a large number of field experiments, are required to construct the physical models in practical application. So, a data-driven model is a good choice when data are limited and the groundwater system is complex. Power Function Model (PFM), Back-Propagation Artificial Neural Network (BPANN) and Support Vector Machines (SVM) are popular data-driven models used for forecasting the groundwater level [9–11]. Among these methods, PFM is a simple non-linear regression (NLR) model, while the other two models are more complex. Some studies indicate that the BPANN model and SVM model perform better, especially the SVM model [12], but the prediction abilities of these two models are not always good for all cases due to the complexities of hydrogeological conditions and the groundwater flow in different regions [13]. There are many causes for the decrease of the groundwater level, including natural factors, anthropic factors, biological factors and economic factors, which can all be treated as the inputs of the data-driven models.
To construct a reliable data-driven model for groundwater prediction, some challenging questions need to be answered beforehand. For example, which inputs are relevant to the groundwater level and which are irrelevant? To what extent do the inputs determine the output from a smooth model? How many data points are required to make a prediction with the best possible accuracy? So far, these questions have not been addressed adequately in groundwater level prediction using data-driven models. However, it is possible that significant progress can be made to tackle these questions, because the Gamma Test (GT) has been presented as an efficient algorithm. By estimating the variance of the noise in the raw data, GT can directly estimate the best mean squared error for a given selection of input by a smooth model on unseen data, before the model construction. In this way, it can help find the best input data combination and identify the sufficient number of data points used for model training. A formal proof of the GT can be found in Evans and Jones [14]. Then, a series of studies have indicated that GT can greatly reduce the model construction work and can potentially be used in the area of hydrological prediction and water management. Moghaddamnia and colleagues used GT to select the input data for Artificial Neural Networks (ANN) and the adaptive neuro-fuzzy inference system for evaporation estimation [15]. Noori and colleagues assessed the SVM model performance for monthly stream flow prediction with the assistance of GT [16]. Han and colleagues explored the model structure for index flood regionalization using GT [17]. However, with respect to groundwater prediction, little work has been done using this efficient tool. In this study, GT is used to rank the relative importance of the model inputs and find the best input combinations before the data-driven models are calibrated. 
Three data-driven models, i.e., PFM, BPANN and SVM, are used to predict the variations of the groundwater depth and the performances of the three established models are further compared. Twelve indices, including natural, anthropic, biological, economic and social factors which may influence the groundwater depth, are considered as the input of the data-driven models. The study is carried out in the plain of Shijiazhuang, the capital city of Hebei province in North China. During the past three decades, the groundwater level in this region has declined by 30 m, which has not only restricted the development of the economy but has also caused serious environmental problems [18]. It is hoped that the study can provide a novel and complementary methodology for the prediction of groundwater level dynamics in the study area.

2. Data and Methodology

2.1. Study Area

Shijiazhuang plain (from lat 37°33′ N to lat 38°42′ N and from long 113°18′ E to long 114°41′ E) is located in central and eastern Hebei province and belongs to the Bohai Sea Economic Zone. The total area of Shijiazhuang plain is 8157 km². The average annual precipitation is about 490 mm, and the distribution is quite uneven in space and time. Over 70% of the precipitation occurs in the summer months from June to August. The average air temperature ranges from −0.8 °C in the winter seasons to 25.9 °C in the summer seasons. The plain comprises Shijiazhuang city and 13 counties, including Xingtang, Luquan, Yuanshi, Gaoyi, Xinle, Zhengding, Luancheng, Zhaoxian, Wuji, Gaocheng, Jinzhou, Xinji and Shenze. As a semi-arid region, water resources are critical to economic development and groundwater is the main source of water supply. The location of the study area and the monitoring wells and meteorological stations located in this area are shown in Figure 1.

Two aquifers (the Holocene (Q4) and Upper Pleistocene (Q3)) in Shijiazhuang plain are composed of quaternary unconsolidated sediments, including sandy loam, sand and gravel-cobble. The west of Xingtang, Luquan and Yuanshi belongs to Q4, and other parts of the study area belong to Q3. The burial depth of the floor is 7–20 m for Q4 and 40–100 m for Q3. The specific field is 20–30 m³/(h·m) for Q4 and about 35 m³/(h·m) for Q3. Both of the aquifers contain a groundwater mineralization level lower than 0.5 g/L. All the 14 wells are used to monitor the shallow groundwater.

Figure 1. The location of the study area and the distributions of monitoring wells and meteorological stations.

2.2. Input Indices and Data Source

Natural, anthropic, biological, economic and social factors which are closely related to the decline of the groundwater level are chosen as the key inputs of the data-driven models. These are 12 indices, including precipitation, evaporation, groundwater exploitation, industrial groundwater use, agricultural irrigation groundwater use, crop yield, population, urbanization rate, primary industry output, secondary industry output, tertiary industry output and gross domestic product (GDP). Precipitation and evaporation are the main natural factors which influence the groundwater recharge and loss. Groundwater exploitation is the quantity of groundwater abstracted and used, and it is the main anthropic factor and the major reason for the groundwater decline in the study area in recent decades [19], while industrial and agricultural irrigation groundwater uses are two major causes of groundwater exploitation. Crop yield is the main biological factor which reflects the crop water consumption, especially the groundwater consumption in the study area [20]. GDP is a comprehensive index indicating the regional economic strength and the pressure on groundwater use, while primary industry output, secondary industry output and tertiary industry output can reflect different aspects of economic development. Primary industry includes agriculture, forestry, animal husbandry and fishery. Secondary industry includes mining, manufacturing and construction. Tertiary industry refers to all other economic activities not included in the primary or secondary industry. Population and urbanization rate are two important factors for social development. The growth of population and the increase of urbanization rate can also raise the use of water, especially groundwater in this region. Table 1 shows the serial number of the 12 indices and their mutual relations are illustrated by Figure 2.

Groundwater depth is the only index which is used to show the change of groundwater resources in this study. The higher the groundwater depth, the more scarce the groundwater resources. As shown in Figure 1, 14 monitoring wells covering the whole study area are chosen to provide observational data on the annual groundwater depth from 1984 to 2013. The continuous decline of the groundwater level (i.e., the increase of groundwater depth) can be easily observed over the 30-year period (Figure 3), and the seasonal variation of groundwater depth is obvious. The lowest groundwater level appears in summer, while the highest groundwater level can be found in winter. The groundwater depth is highly related to groundwater exploitation in the different seasons of a year. In order to evaluate the groundwater of the whole study area, the average groundwater depth of the 14 monitoring wells in each year is used by the models.
Observed precipitation and evaporation at 13 meteorological stations are also obtained for the same period from the National Meteorological Information Center of China Meteorological Administration (available at http://www.nmic.gov.cn/). The average precipitation and evaporation from the 13 meteorological stations in each year are used as the model inputs. Thirty-year data of the other 10 input indices, including annual groundwater exploitation, industrial groundwater use, agricultural irrigation groundwater use, crop yield, population, urbanization rate, primary industry output, secondary industry output, tertiary industry output and GDP for the whole study area are also obtained for the period from 1984 to 2013. For each of the 10 input indices, the annual value is the sum of the corresponding values of Shijiazhuang city and 13 counties. All the data of the 10 input indices come from the Shijiazhuang Statistical Yearbook [21].

Table 1. Serial number of the 12 input indices.

| Number | Indices | Number | Indices |
|---|---|---|---|
| 1 | Precipitation (mm) | 7 | GDP (million USD) |
| 2 | Evaporation (mm) | 8 | Primary industry output (million USD) |
| 3 | Groundwater exploitation (million m³) | 9 | Secondary industry output (million USD) |
| 4 | Industrial groundwater use (million m³) | 10 | Tertiary industry output (million USD) |
| 5 | Irrigation groundwater use (million m³) | 11 | Population (million) |
| 6 | Crop yield (million ton) | 12 | Urbanization rate (%) |


Figure 2. Mutual relations of the 12 input indices. The blue box shows the five key factors, which are closely related to groundwater level changes. The green, red, cyan, grey and yellow boxes respectively represent natural, anthropic, economic, biological and social indices. The connecting lines with two-way arrows mean interaction effects between the indices and the groundwater level. The connecting lines with single arrows mean one-way supplement or consumption of the groundwater, and the lines without arrow mean inclusiveness.

[Plot: annual precipitation and multi-year average precipitation (Precipitation/mm) and groundwater depth (Groundwater depth/m) versus year.]

Figure 3. Precipitation and groundwater depth from 1984 to 2013 in Shijiazhuang plain.

2.3. Methodology

2.3.1. Gamma Test

The GT estimates the mean square error (MSE) that can be achieved when modelling the unseen data using any continuous nonlinear model. A formal proof of GT is given by Evans and Jones [14]. The basic idea is quite distinct from the earlier attempts with nonlinear analysis. Suppose we have a set of data observations in the form:

$$\{(x_i, y_i),\ 1 \le i \le M\} \tag{1}$$

where $x_i \in \mathbb{R}^m$ are input vectors confined to some closed bounded set $C \subset \mathbb{R}^m$ and $y_i \in \mathbb{R}$ are the corresponding outputs. The system of GT can be expressed in the following form:

$$y = f(x_1, \cdots, x_m) + r \tag{2}$$


where $f$ is a smooth function and $r$ is a random variable representing noise. Generally, the mean of the distribution of $r$ is assumed to be 0 and the variance of the noise, $\mathrm{Var}(r)$, is bounded. The Gamma statistic $\Gamma$ is the main parameter; it estimates the variance of the noise in the model output. For each vector $x_i$ ($1 \le i \le M$), $x_{N[i,k]}$ denotes its $k$th nearest neighbor ($1 \le k \le p$). The distance statistic of the input data is calculated as:

$$\delta_M(k) = \frac{1}{M}\sum_{i=1}^{M}\left|x_{N[i,k]} - x_i\right|^2 \quad (1 \le k \le p) \tag{3}$$

where $|\cdot|$ denotes Euclidean distance, and the corresponding Gamma function of the output values is:

$$\gamma_M(k) = \frac{1}{2M}\sum_{i=1}^{M}\left|y_{N[i,k]} - y_i\right|^2 \quad (1 \le k \le p) \tag{4}$$

where $y_{N[i,k]}$ is the $y$-value corresponding to the $k$th nearest neighbor of $x_i$ in Equation (3). To compute $\Gamma$, a univariate least-squares regression line is fitted to the $p$ points $(\delta_M(k), \gamma_M(k))$:

$$\gamma = A\delta + \Gamma \tag{5}$$

The value of $\Gamma$ is the intercept of Equation (5). Usually, the data tested by GT are normalized to a range of 0–1. In particular, as $\delta$ approaches 0, the value of $\Gamma$ can be computed from $\gamma$:

$$\gamma_M(k) \to \mathrm{Var}(r) \ \text{in probability as} \ \delta_M(k) \to 0 \tag{6}$$

A more detailed description of the calculation principles can be found in Evans and Jones [14]. The $V_{ratio}$ is another term, which returns a scale-invariant noise estimate:

$$V_{ratio} = \frac{\Gamma}{\sigma^2(y)} \tag{7}$$

where $\sigma^2(y)$ is the variance of the output $y$. According to the definition of $V_{ratio}$, a value of $V_{ratio}$ close to 0 indicates a high degree of predictability of the given output $y$. In addition, the estimate of the noise variance on the given output is more credible if the standard error (SE) value is close to 0, and the complexity of the smooth function can be measured by the value of the gradient (the slope $A$ in Equation (5)).

2.3.2. Power Function Model (PFM)

The PFM is a kind of non-linear regression model, which is often used for prediction by establishing the relationship between the forecast factors and the forecast objects. The general equation is as follows:

$$y = p_0\, x_1^{p_1} x_2^{p_2} \cdots x_M^{p_N} \tag{8}$$

where $y$ is the forecast object, $p_k$ ($k = 0, \ldots, N$) are the parameters, generally estimated by least squares, and $x_k$ ($k = 1, \ldots, M$) are the explanatory variables (forecast factors).

2.3.3. Back-Propagation Artificial Neural Network (BPANN)

The ANN is a nonlinear arithmetic system with densely interconnected processing elements or neurons. Three kinds of layers are contained in the mathematical structure, including input, hidden, and output layers. Each layer has its own nodes and activation functions. The back-propagation algorithm can effectively train the network and shorten the learning time, and it needs little information about the complex mechanism and process which would otherwise have to be explicitly described in mathematical form. Consider the $j$th neuron in the present layer, which receives the inputs $x_1, x_2, \ldots, x_n$ from the $n$ neurons in the previous layer. The connection weights between the $j$th neuron and these $n$ neurons are $w_{1j}, w_{2j}, \ldots, w_{nj}$, respectively. The mathematical expression is as follows:

$$y_j = f\left(\sum_{i=1}^{n} w_{ij} x_i + b_j\right) \tag{9}$$

where $y_j$ is the output of the neuron, $b_j$ is the threshold of the neuron, and $f$ is the transfer function.

In this study, one hidden layer is used to build the ANN model, which is trained by the back-propagation algorithm. A log-sigmoid function and a linear function are used in the hidden layer and the output layer, respectively. This kind of configuration is the most commonly used for ANNs, and it can improve extrapolation ability. The input layer, hidden layer and output layer have five nodes, 30 nodes and one node, respectively (Figure 4). A momentum term is also added in the weight updating process to avoid the results being trapped in a local minimum. The number of hidden layers and nodes is discussed in Section 3.5.

Figure 4. The architecture of the Back-Propagation Artificial Neural Network (BPANN) model with 12 input layers, 30 hidden layers and one output layer.
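To make the neuron expression in Equation (9) concrete, the following is a minimal sketch of a forward pass through a network with one log-sigmoid hidden layer and one linear output node. It is an illustration only: the function names, the toy weights and the 2-3-1 layout are assumptions chosen for brevity, not the configuration or the MATLAB implementation used in this study, and the back-propagation training step with momentum is not shown.

```python
import math

def log_sigmoid(z):
    """Log-sigmoid transfer function, as used here for the hidden layer."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, transfer):
    """Equation (9): y_j = f(sum_i w_ij * x_i + b_j)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return transfer(z)

def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer (log-sigmoid) followed by a single linear output node."""
    hidden = [neuron_output(x, w, b, log_sigmoid)
              for w, b in zip(hidden_weights, hidden_biases)]
    return neuron_output(hidden, output_weights, output_bias, lambda z: z)

# Hypothetical toy network: 2 normalized inputs, 3 hidden nodes, 1 output.
x = [0.2, 0.7]
hidden_weights = [[0.5, -0.3], [0.1, 0.8], [-0.6, 0.4]]
hidden_biases = [0.0, 0.1, -0.2]
output_weights = [0.3, -0.5, 0.9]
output_bias = 0.05
y = forward(x, hidden_weights, hidden_biases, output_weights, output_bias)
```

Training would adjust the weights and thresholds so that `forward` reproduces the observed groundwater depths; only the evaluation of an already parameterized network is sketched here.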

2.3.4. Support Vector Machines (SVMs)

The SVM model is based on the Vapnik-Chervonenkis (VC) dimension and the structural risk minimization principle [22]. The SVM provides a new approach to solving nonlinear and high-dimensional problems with a small sample set. The basic idea of SVM is to search for the nonlinear relationship between input and output vectors through a nonlinear transformation of the input vector into a high-dimensional feature space. Given a set of $N$ samples $\{x_k, y_k\}_{k=1}^{N}$ ($x_k$ is the input vector, $y_k$ is the corresponding output value), the regression function of SVM can be expressed as:

$$y = f(x) = w \cdot \varphi(x) + b \tag{10}$$

where $w$ is a weight vector, $\varphi$ is a nonlinear transfer function that transforms the nonlinear relationship between input and output vectors into a linear one, and $b$ is a bias. Vapnik introduced the convex quadratic optimization problem to ensure that the extreme solution is optimal [22], and an ε-insensitive loss function is added to Equation (10):

$$\min\ \phi(w) = \frac{1}{2}\|w\|^2 + C\sum_{k=1}^{N}\left(\xi_k + \xi_k^*\right)$$

$$\text{subject to}\quad \begin{cases} y_k - w^T\varphi(x_k) - b \le \varepsilon + \xi_k \\ w^T\varphi(x_k) + b - y_k \le \varepsilon + \xi_k^* \\ \xi_k,\ \xi_k^* \ge 0 \end{cases} \qquad k = 1, 2, \cdots, N \tag{11}$$

where $\xi$ and $\xi^*$ are slack variables that penalize training errors by the loss function over the error tolerance $\varepsilon$, and $C$ denotes the degree of penalization for the sample outside the error tolerance $\varepsilon$. The input vectors are called support vectors. The dual Lagrangian form is:

$$W(\alpha, \alpha^*) = -\frac{1}{2}\sum_{i,j=1}^{N}\left(\alpha_i - \alpha_i^*\right)\left(\alpha_j - \alpha_j^*\right)K(x_i, x_j) + \sum_{i=1}^{N} y_i\left(\alpha_i - \alpha_i^*\right) - \varepsilon\sum_{i=1}^{N}\left(\alpha_i + \alpha_i^*\right) \tag{12}$$

with the constraints,

$$\sum_{i=1}^{N}\left(\alpha_i - \alpha_i^*\right) = 0, \qquad 0 \le \alpha_i,\ \alpha_i^* \le C \qquad (i = 1, 2, \cdots, N) \tag{13}$$

where $\alpha$ and $\alpha^*$ are Lagrange multipliers, and the optimal desired weight vector of the regression hyperplane can be expressed with the kernel function:

$$w = \sum_{i=1}^{N}\left(\alpha_i - \alpha_i^*\right)K(x, x_i) \tag{14}$$

where $K(x, x_i)$ is the kernel function. Equation (10) can be expressed as:

$$y = f(x) = w \cdot \varphi(x) + b = \sum_{i=1}^{N}\left(\alpha_i - \alpha_i^*\right)K(x, x_i) + b \tag{15}$$

Linear kernel function (LKF), polynomial kernel function (PKF) and radial basis function (RBF) are commonly used as kernel functions. Among these three kernel functions, PKF has many more hyper-parameters than LKF and RBF, so LKF and RBF are chosen in this study [23,24]. The parameters of RBF are discussed in Section 3.5. LKF, PKF and RBF are described by Equations (16)–(18), respectively:

$$K(x, x_i) = x \cdot x_i \tag{16}$$

$$K(x, x_i) = \left(1 + x \cdot x_i\right)^d \quad (d = 1, 2, \cdots, n) \tag{17}$$

$$K(x, x_i) = \exp\left(-\gamma\|x - x_i\|\right) \tag{18}$$

2.3.5. Implementation and Assessment of Three Models

In this study, we define $y$ as the groundwater depth, which is an index of the groundwater dynamic variations and the forecast object, and $x_k$ as the input factors which may influence the groundwater in the models of PFM, BPANN and SVM. PFM is constructed using the Statistical Product and Service Solutions (SPSS) software, version 19.0. BPANN and SVM are both implemented with MATLAB toolboxes. Twenty groups of data from 1984 to 2003 are used to build and train the models, while 10 groups of data from 2004 to 2013 are used for testing. All the data are normalized to a range of 0–1 to remove the influence of differing dimensions. The root mean squared error (RMSE) and the Nash-Sutcliffe efficiency, represented by E, are used to assess the accuracy of the three built models based on the observed values and the simulated values:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2} \tag{19}$$

$$E = 1 - \sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2 \Big/ \sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2 \tag{20}$$

where $y_i$ and $\hat{y}_i$ are the observed and simulated values of groundwater depth, respectively, $\bar{y}$ is the mean observed value, i = 1, 2, ..., N, and N is the number of data groups. The perfect scores of RMSE and E are 0 and 1, respectively.

3. Results

3.1. Correlation Test for 12 Inputs

Pearson correlation coefficients (ρ) are used to analyze the correlations among the 12 inputs. A value of 1 means a perfect positive correlation, while a value of −1 means a perfect negative correlation. The correlation coefficients of the 12 inputs are shown in Table 2. Most correlation coefficients are lower than 0.5 in absolute value, so most inputs are only weakly correlated. However, the correlation coefficient between GDP and secondary industry output reaches 0.799, which indicates that secondary industry output is the main means to improve the economy in Shijiazhuang plain. Tertiary industry output also has a high correlation with population, which means that tertiary industry is greatly influenced by population in the study area. Though these two pairs of inputs have relatively high correlation coefficients, overall, the correlations among the 12 inputs have little effect on the feature selection results.

Table 2. Correlation test for 12 inputs: correlation coefficient (ρ) between each pair of input numbers (lower triangle of the symmetric matrix).

| Input number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | | | | | | | | | | | |
| 2 | −0.199 | 1 | | | | | | | | | | |
| 3 | −0.360 | 0.376 | 1 | | | | | | | | | |
| 4 | −0.225 | −0.176 | −0.115 | 1 | | | | | | | | |
| 5 | −0.355 | 0.346 | 0.352 | 0.190 | 1 | | | | | | | |
| 6 | 0.122 | 0.357 | 0.406 | −0.234 | 0.175 | 1 | | | | | | |
| 7 | 0.131 | 0.211 | 0.168 | −0.410 | −0.175 | 0.309 | 1 | | | | | |
| 8 | 0.139 | 0.315 | 0.250 | −0.392 | −0.105 | 0.324 | 0.432 | 1 | | | | |
| 9 | 0.135 | 0.217 | 0.167 | −0.387 | −0.178 | 0.309 | 0.799 * | 0.282 | 1 | | | |
| 10 | 0.131 | 0.189 | 0.150 | −0.288 | −0.189 | 0.388 | 0.399 | 0.376 | 0.398 | 1 | | |
| 11 | 0.150 | 0.313 | 0.323 | −0.381 | −0.049 | 0.385 | 0.430 | 0.371 | 0.328 | 0.720 * | 1 | |
| 12 | 0.154 | 0.146 | 0.238 | −0.324 | −0.160 | 0.217 | 0.455 | 0.347 | 0.354 | 0.351 | 0.440 | 1 |

Note: * means significant at 0.05 level.
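A pairwise correlation table such as Table 2 can be produced along the following lines. This is only a minimal sketch in plain Python: the function names `pearson` and `correlation_matrix` and the three short series are hypothetical placeholders, not the study's data or software.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Map every (name_a, name_b) pair to its Pearson coefficient."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Hypothetical placeholder series (not values from the study).
columns = {
    "input_a": [1.0, 2.0, 3.0, 4.0],
    "input_b": [2.0, 4.0, 6.0, 8.0],   # rises linearly with input_a
    "input_c": [8.0, 6.0, 4.0, 2.0],   # falls linearly with input_a
}
rho = correlation_matrix(columns)
```

Here `rho[("input_a", "input_b")]` is a perfect positive correlation and `rho[("input_a", "input_c")]` a perfect negative one, which are the two limiting cases described in the text.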

3.2. Relative Importance of the Model Inputs

Given only the input and output data, the GT estimates the least model error that any smooth model can achieve. In this study, the Gamma value Γ is used as the main criterion for distinguishing the importance of model inputs, and the other three statistics produced by the GT (i.e., gradient, standard error (SE) of Γ and V ratio) are used as a reference. This is because some technical aspects of the GT, such as how best to use these three statistics, still cannot be explained in detail. Basically, the least |Γ| represents the best model input combination. The gradient indicates the model complexity: the lower the gradient, the simpler the model that should be fitted. The SE of Γ shows the reliability of the Gamma value: the higher it is, the less reliable the Gamma value. The V ratio illustrates the predictability of the output from the effective inputs [15].
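The statistics named above can be illustrated with a minimal sketch of the standard Gamma Test procedure: near-neighbour statistics are computed in input space and a straight line is fitted to them, whose intercept is Γ and whose slope is the gradient. The function name `gamma_test`, the neighbour count `p = 10` and the synthetic data are assumptions made for illustration; the SE of Γ is omitted, and this is not the software used in the study.

```python
import random
from statistics import fmean, pvariance

def gamma_test(X, y, p=10):
    """Estimate (Gamma, gradient, V ratio) from inputs X and outputs y.

    X is a list of equal-length input vectors, y the matching outputs,
    and p the number of near neighbours (an assumed default).
    """
    m = len(X)
    delta = [0.0] * p  # mean squared input distance to the k-th neighbour
    gamma = [0.0] * p  # half the mean squared output difference to it
    for i in range(m):
        # Squared distances from point i to every other point, nearest first.
        neighbours = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(m) if j != i
        )[:p]
        for k, (dist2, j) in enumerate(neighbours):
            delta[k] += dist2 / m
            gamma[k] += (y[j] - y[i]) ** 2 / (2 * m)
    # Least-squares line gamma = Gamma + gradient * delta.
    d_mean, g_mean = fmean(delta), fmean(gamma)
    gradient = (sum((d - d_mean) * (g - g_mean) for d, g in zip(delta, gamma))
                / sum((d - d_mean) ** 2 for d in delta))
    big_gamma = g_mean - gradient * d_mean
    return big_gamma, gradient, big_gamma / pvariance(y)

# Synthetic check (not the study's data): a smooth function of two inputs,
# with and without added noise of variance 0.2 ** 2 = 0.04.
random.seed(0)
X = [(random.random(), random.random()) for _ in range(400)]
clean = [a + b ** 2 for a, b in X]
noisy = [c + random.gauss(0.0, 0.2) for c in clean]
gamma_noisy, grad_noisy, v_noisy = gamma_test(X, noisy)
gamma_clean, grad_clean, v_clean = gamma_test(X, clean)
```

On the synthetic data, Γ should land near the noise variance for `noisy` and near zero for `clean`, which is the sense in which Γ estimates the error floor of any smooth model. The V ratio is Γ divided by the variance of the output, so values near 0 indicate a highly predictable output.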


A mathematical model (e.g., PFM, BPANN and SVM in this study) may be built in high quality if the three statistics (gradient, SE of Γ and V ratio) show low values for the best input data [17]. Table 3 shows the GT results for schemes with different combinations of the model input indices listed in Table 1. Scheme 1 involves all 12 input indices, and each of the other schemes has 11 indices with the remaining one masked. It is interesting to find that only Scheme 5 (without index 4, i.e., industrial groundwater use) and Scheme 13 (without index 12, i.e., urbanization rate) have larger |Γ| values than Scheme 1. The other schemes all have smaller |Γ| values, which indicates that they will perform better than Scheme 1 in constructing a reliable model. It can be seen from Table 3 that the best combination of indices is in Scheme 3, which has the least |Γ| (0.00170), and the worst combination is in Scheme 13, which has the highest |Γ| (0.01372). Ranking the schemes from best to worst gives 3 > 2 > 11 > 12 > 7 > 9 > 8 > 6 > 10 > 4 > 1 > 5 > 13, and the index masked in each scheme then gives a ranking of the importance of the model input indices, from least to most important: 2 < 1 < 10 < 11 < 6 < 8 < 7 < 5 < 9 < 3 < 4 < 12. This indicates that natural factors (precipitation and evaporation) have less effect on the groundwater level, while anthropic, biological, economic and social factors have greater influence. Urbanization rate and industrial groundwater use are the two most important model input indices, which implies that the process of modernization and industrial development in Shijiazhuang plain is the main cause of groundwater recession.

Table 3. GT results for schemes with different combinations of model input indices.

| Scheme ID | Combination of Indices | Masked Index | Gamma (Γ) | Gradient | Standard Error | V ratio |
|---|---|---|---|---|---|---|
| 1 | 1,2,3,4,5,6,7,8,9,10,11,12 | None | −0.00922 | 0.02841 | 0.01161 | −0.03687 |
| 2 | 2,3,4,5,6,7,8,9,10,11,12 | 1 | −0.00212 | 0.02529 | 0.00420 | −0.00848 |
| 3 | 1,3,4,5,6,7,8,9,10,11,12 | 2 | −0.00170 | 0.02776 | 0.00612 | −0.00679 |
| 4 | 1,2,4,5,6,7,8,9,10,11,12 | 3 | −0.00836 | 0.03004 | 0.00767 | −0.03343 |
| 5 | 1,2,3,5,6,7,8,9,10,11,12 | 4 | −0.01020 | 0.03265 | 0.00566 | −0.04082 |
| 6 | 1,2,3,4,6,7,8,9,10,11,12 | 5 | −0.00732 | 0.03007 | 0.00711 | −0.02927 |
| 7 | 1,2,3,4,5,7,8,9,10,11,12 | 6 | −0.00559 | 0.02845 | 0.00606 | −0.02235 |
| 8 | 1,2,3,4,5,6,8,9,10,11,12 | 7 | −0.00714 | 0.02955 | 0.00510 | −0.02857 |
| 9 | 1,2,3,4,5,6,7,9,10,11,12 | 8 | −0.00580 | 0.02850 | 0.00613 | −0.02320 |
| 10 | 1,2,3,4,5,6,7,8,10,11,12 | 9 | −0.00745 | 0.02944 | 0.00993 | −0.02981 |
| 11 | 1,2,3,4,5,6,7,8,9,11,12 | 10 | −0.00316 | 0.02759 | 0.00739 | −0.01264 |
| 12 | 1,2,3,4,5,6,7,8,9,10,12 | 11 | −0.00351 | 0.02842 | 0.00810 | −0.01405 |
| 13 | 1,2,3,4,5,6,7,8,9,10,11 | 12 | −0.01372 | 0.03514 | 0.00919 | −0.05487 |
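The two rankings quoted above follow mechanically from the Γ column of Table 3. As a small worked check (the variable names are illustrative only), sorting the schemes by |Γ| and reading off the masked index of each gives both orderings:

```python
# (masked index, Gamma) for each scheme, transcribed from Table 3.
table3 = {
    1: (None, -0.00922), 2: (1, -0.00212), 3: (2, -0.00170),
    4: (3, -0.00836), 5: (4, -0.01020), 6: (5, -0.00732),
    7: (6, -0.00559), 8: (7, -0.00714), 9: (8, -0.00580),
    10: (9, -0.00745), 11: (10, -0.00316), 12: (11, -0.00351),
    13: (12, -0.01372),
}

# Smallest |Gamma| first: the best input combination comes first.
schemes_best_to_worst = sorted(table3, key=lambda s: abs(table3[s][1]))

# Masking an unimportant index helps the most, so the masked indices of the
# ranked schemes run from least to most important (Scheme 1 masks nothing).
importance_low_to_high = [
    table3[s][0] for s in schemes_best_to_worst if table3[s][0] is not None
]
```

Schemes 5 and 13 fall after Scheme 1 in `schemes_best_to_worst`, which is why their masked indices (4 and 12) end up as the two most important inputs.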

3.3. Model Input Selection Based on Gamma Test

In order to find the best combination of model inputs, backward and forward selection search methods are coupled with the GT [25]. The backward procedure starts with all indices and removes one index per round, while the forward procedure starts with the minimum number of indices and adds one index per round. In forward selection, the index added is the one that works best with the indices kept from the previous round (i.e., the one resulting in the least |Γ|); in backward selection, the index removed is the one whose absence leads to the best result. The procedure is iterated until one index is left for backward selection or until all indices are included for forward selection. Tables 4 and 5 show the model input selection results obtained with backward and forward selection coupled with the GT, respectively. It can be found from the two tables that Scheme 19 has the smallest |Γ| (0.00006) and relatively few indices. Theoretically, this makes Scheme 19 the best model input combination, and it is used to construct the three models in this study. However, the GT still needs further experimental validation. In order to make our conclusions more reliable, all the schemes in Tables 4 and 5 are calculated as part of the sensitivity analysis of the input choice in Section 3.7.
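The two search procedures described above can be sketched as greedy loops around a scoring function. In the study the score would be |Γ| from the GT; here `score` is a hypothetical callable supplied by the caller, and the toy score below is invented purely to make the sketch runnable.

```python
def backward_selection(indices, score):
    """Start from all indices; each round drop the one whose absence scores best."""
    current = tuple(sorted(indices))
    path = [(current, None, score(current))]
    while len(current) > 1:
        # Try removing each index in turn and keep the lowest score
        # (ties go to the smaller index number).
        best_score, removed = min(
            (score(tuple(i for i in current if i != r)), r) for r in current
        )
        current = tuple(i for i in current if i != removed)
        path.append((current, removed, best_score))
    return path

def forward_selection(indices, score, start):
    """Start from `start`; each round add the index that scores best with the rest."""
    current = tuple(sorted(start))
    remaining = sorted(set(indices) - set(current))
    path = [(current, None, score(current))]
    while remaining:
        best_score, added = min(
            (score(tuple(sorted(current + (a,)))), a) for a in remaining
        )
        current = tuple(sorted(current + (added,)))
        remaining.remove(added)
        path.append((current, added, best_score))
    return path

def toy_score(subset):
    # Hypothetical stand-in for |Gamma|: indices 2 and 4 are "useful";
    # a missing useful index costs 1, an extra index costs 0.1.
    useful = {2, 4}
    return len(useful - set(subset)) + 0.1 * len(set(subset) - useful)

backward_path = backward_selection([1, 2, 3, 4], toy_score)
forward_path = forward_selection([1, 2, 3, 4], toy_score, start=(4,))
best_backward = min(backward_path, key=lambda step: step[2])[0]
best_forward = min(forward_path, key=lambda step: step[2])[0]
```

Each path lists one scheme per round, in the same spirit as Tables 4 and 5, and the final choice is the scheme with the smallest score along the path rather than the last one visited.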


Table 4. Backward selection results with the assistance of GT.

| Scheme ID | Combination of Indices | Index Removed | Gamma (Γ) | Gradient | Standard Error | V ratio |
|---|---|---|---|---|---|---|
| 1 | 1,2,3,4,5,6,7,8,9,10,11,12 | None | −0.00922 | 0.02841 | 0.01161 | −0.03687 |
| 3 | 1,3,4,5,6,7,8,9,10,11,12 | 2 | −0.00170 | 0.02776 | 0.00612 | −0.00679 |
| 14 | 1,3,4,5,6,7,8,9,10,12 | 11 | 0.00103 | 0.02878 | 0.00555 | 0.00413 |
| 15 | 3,4,5,6,7,8,9,10,12 | 1 | 0.00031 | 0.03579 | 0.00625 | 0.00124 |
| 16 | 3,4,5,6,7,9,10,12 | 8 | −0.00014 | 0.03999 | 0.00916 | −0.00054 |
| 17 | 3,4,5,6,9,10,12 | 7 | 0.00021 | 0.04415 | 0.00776 | 0.00085 |
| 18 | 3,4,5,6,9,12 | 10 | 0.00091 | 0.05043 | 0.00736 | 0.00366 |
| 19 | 3,4,6,9,12 | 5 | −0.00006 | 0.06856 | 0.00886 | −0.00023 |
| 20 | 3,4,6,12 | 9 | 0.00160 | 0.08593 | 0.00660 | 0.00640 |
| 21 | 3,4,12 | 6 | 0.01719 | 0.08515 | 0.01186 | 0.06877 |
| 22 | 4,12 | 3 | 0.02153 | 0.15409 | 0.01040 | 0.08612 |
| 23 | 12 | 4 | 0.00546 | 0.71495 | 0.00361 | 0.02185 |

Table 5. Forward selection results with the assistance of GT.

| Scheme ID | Combination of Indices | Index Added | Gamma (Γ) | Gradient | Standard Error | V ratio |
|---|---|---|---|---|---|---|
| 23 | 12 | None | 0.00546 | 0.71495 | 0.00361 | 0.02185 |
| 24 | 10,12 | 10 | 0.00285 | 0.19576 | 0.00491 | 0.01142 |
| 25 | 7,10,12 | 7 | 0.00149 | 0.13163 | 0.00483 | 0.00597 |
| 26 | 6,7,10,12 | 6 | 0.00645 | 0.08496 | 0.00294 | 0.02579 |
| 27 | 3,6,7,10,12 | 3 | −0.00069 | 0.07638 | 0.00797 | −0.00277 |
| 28 | 3,4,6,7,10,12 | 4 | 0.00199 | 0.05354 | 0.00541 | 0.00796 |
| 29 | 3,4,5,6,7,10,12 | 5 | 0.00134 | 0.04241 | 0.00717 | 0.00539 |
| 16 | 3,4,5,6,7,9,10,12 | 9 | −0.00014 | 0.03999 | 0.00916 | −0.00054 |
| 15 | 3,4,5,6,7,8,9,10,12 | 8 | 0.00031 | 0.03579 | 0.00625 | 0.00124 |
| 14 | 1,3,4,5,6,7,8,9,10,12 | 1 | 0.00103 | 0.02878 | 0.00555 | 0.00414 |
| 3 | 1,3,4,5,6,7,8,9,10,11,12 | 11 | −0.00170 | 0.02776 | 0.00612 | −0.00679 |
| 1 | 1,2,3,4,5,6,7,8,9,10,11,12 | 2 | −0.00922 | 0.02841 | 0.01161 | −0.03687 |

3.4. Influence of the Parameters in BPANN and SVM (RBF)

Possible data overfitting is a disadvantage of BPANN and SVM (RBF). In order to avoid this problem and make the models more accurate, we carefully choose the numbers of hidden layers and nodes for BPANN and the two parameters (C and γ) for SVM (RBF) by means of the trial-and-error method. The parameter C is a positive trade-off parameter that determines the degree of empirical error, and γ is the main parameter of the RBF kernel function. Numerous trials were conducted. Taking Scheme 19 as an example, the optimum numbers of hidden layers and nodes per layer are 1 and 30 for BPANN, respectively, while the optimal values of C and γ in this study are chosen to be 1.0 and 0.3 for SVM (RBF), respectively. All these optimal parameters are chosen based on the values of RMSE and E. The values of RMSE and E obtained with different numbers of hidden layers and nodes can be seen in Figures 5 and 6. The influences of C and γ on the testing results are shown in Figures 7 and 8.

Figure 5. Influence of number of hidden layers on the BPANN model with 30 nodes. (a) RMSE; (b) E. [Plots of RMSE and E against the number of hidden layers, with fitting and testing curves and the optimal model marked.]

Figure 6. Influence of number of nodes in hidden layer on the BPANN model with one hidden layer. (a) RMSE; (b) E. [Plots of RMSE and E against the number of nodes in the hidden layer, with fitting and testing curves and the underfitting region, overfitting region and optimal model marked.]

Figure 7. Influence of parameter γ on the SVM (RBF) with C = 1.0. (a) RMSE; (b) E. [Plots of RMSE and E against γ, with fitting and testing curves and the underfitting region, overfitting region and optimal model marked.]

Figure 8. Influence of parameter C on the SVM (RBF) with γ = 0.3. (a) RMSE; (b) E. [Plots of RMSE and E against C, with fitting and testing curves and the underfitting region, overfitting region and optimal model marked.]
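The trial-and-error choice of C and γ described in this section amounts to scanning candidate pairs and keeping the one with the best score. The following is a minimal sketch under stated assumptions: `evaluate` is a hypothetical callable that would train an SVM (RBF) for one (C, γ) pair and return its testing RMSE, and `toy_rmse` is an invented stand-in surface, not a result from the study.

```python
from itertools import product

def grid_search(c_values, gamma_values, evaluate):
    """Return (best_rmse, C, gamma) over all candidate pairs.

    evaluate(C, gamma) is assumed to train a model with those parameters
    and return its testing RMSE (lower is better).
    """
    return min((evaluate(c, g), c, g) for c, g in product(c_values, gamma_values))

def toy_rmse(c, g):
    # Invented stand-in for a real train-and-test run: a smooth bowl whose
    # minimum is placed at C = 1.0, gamma = 0.3 for demonstration only.
    return (c - 1.0) ** 2 + (g - 0.3) ** 2

c_candidates = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
gamma_candidates = [0.1, 0.3, 0.5, 1.0, 2.0, 3.0]
best_rmse, best_c, best_gamma = grid_search(c_candidates, gamma_candidates, toy_rmse)
```

Recording the fitting error alongside the testing error for each pair would additionally show the overfitting pattern marked in Figures 7 and 8, where the fitting error keeps improving while the testing error deteriorates.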

3.5. Fitting Results and Model Errors

For Scheme 19, the PFM calibrated on the 20 groups of data from 1984 to 2003 can be expressed as follows:

$$\hat{y} = 0.0015\, x_3^{0.2961}\, x_4^{0.2428}\, x_6^{0.1144}\, x_9^{0.0009}\, x_{12}^{0.4832} \tag{21}$$

where $x_3$ is groundwater exploitation, $x_4$ is industrial groundwater use, $x_6$ is crop yield, $x_9$ is secondary industry output, $x_{12}$ is urbanization rate, and $\hat{y}$ is the fitted groundwater depth.

The other two models, BPANN and SVM, are also trained on the dataset from 1984 to 2003. The procedure is carried out several times for both models and the optimal results are adopted. The RMSE and E of the three models for fitting results are shown in Table 6.

Table 6. Fitting results and errors of PFM, BPANN and SVM models for Scheme 19.

| Scheme ID | PFM RMSE | PFM E | BPANN RMSE | BPANN E | SVM (LKF) RMSE | SVM (LKF) E | SVM (RBF) RMSE | SVM (RBF) E |
|---|---|---|---|---|---|---|---|---|
| 19 | 1.0638 | 0.9157 | 0.8411 | 0.9473 | 1.4345 | 0.8467 | 0.4265 | 0.9864 |

The RMSEs of PFM and SVM (LKF) are both higher than 1.0, and the value of SVM (LKF) is close to 1.5. The RMSE is lower than 0.5 for the SVM (RBF) model, while the value of the BPANN model is higher than 0.5 and lower than those of PFM and SVM (LKF). So, the fitted values of SVM (RBF) are the closest to the observed values. The highest value of E is near 1.0 for SVM (RBF), and the lowest value of E is lower than 0.85 for SVM (LKF). This indicates that SVM (RBF) has the highest reliability among the four models. Based on the values of RMSE and E, it can be easily observed that the SVM (RBF) model fits the training samples best in the sample learning process, while the SVM (LKF) model performs the worst. Good fitting results do not mean good prediction ability for the data-driven models. The performance of the models needs to be tested with further datasets. The observed and fitted values (Scheme 19) of groundwater depth are shown in Figure 9.

Figure 9. The observed and fitted values (Scheme 19) of groundwater depth from 1984 to 2003. [Plot of groundwater depth (m) against year, with curves for the observation, PFM, BPANN, SVM (LKF) and SVM (RBF).]

3.6. Testing Results and Model Errors

The three types of data-driven models are tested based on the 10 groups of data from 2004 to 2013 for three schemes. The RMSE and E of the three models for testing results are shown in Table 7. For Scheme 19, only the RMSE of SVM (RBF) is lower than 2.0, and the RMSE of SVM (LKF) is higher than 3.0. The BPANN model has a lower RMSE than the PFM model. The value of E is higher than 0 only for SVM (RBF); it is negative for the other models and lowest, below −2.0, for SVM (LKF). It can be noticed that the BPANN and SVM (RBF) models perform better than PFM and SVM (LKF) as a whole, with a relatively lower RMSE and better E, but stability is poor for the BPANN model. The SVM (RBF) model performs best in the testing process and shows generalization ability. The observed and tested values (Scheme 19) of groundwater depth are shown in Figure 10.

Table 7. Testing results and errors of PFM, BPANN and SVM models for Scheme 19.

| Scheme ID | PFM RMSE | PFM E | BPANN RMSE | BPANN E | SVM (LKF) RMSE | SVM (LKF) E | SVM (RBF) RMSE | SVM (RBF) E |
|---|---|---|---|---|---|---|---|---|
| 19 | 2.3916 | −0.5409 | 2.2597 | −0.3755 | 3.4417 | −2.1910 | 1.4612 | 0.4248 |
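The RMSE and E columns of Table 7 follow Equations (19) and (20). A minimal sketch of both measures is given below; the function names and the short series are illustrative assumptions, not the study's data. It also shows why E can be negative: E equals 0 when the simulation is no better than the mean of the observations, and drops below 0 when it is worse.

```python
import math

def rmse(observed, simulated):
    """Root mean squared error, Equation (19)."""
    n = len(observed)
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(observed, simulated)) / n)

def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency E, Equation (20)."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((s - o) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - residual / spread

# Illustrative series (not values from the study).
obs = [1.0, 2.0, 3.0, 4.0]
perfect = list(obs)               # reproduces the observations exactly
mean_only = [2.5, 2.5, 2.5, 2.5]  # always predicts the observed mean
biased = [3.0, 4.0, 5.0, 6.0]     # constant offset of 2 from the observations
```

For these series, `perfect` gives RMSE 0 and E 1, `mean_only` gives E 0, and `biased` gives RMSE 2 with a negative E, the same situation as the negative testing values of PFM, BPANN and SVM (LKF) in Table 7.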

Figure 10. The observed and tested values (Scheme 19) of groundwater depth from 2004 to 2013. [Plot of groundwater depth (m) against year, with curves for the observation, PFM, BPANN, SVM (LKF) and SVM (RBF).]

3.7. Sensitivity Analysis of the Input Choice

In order to make the input choice conclusion more reliable, the same methods are used to calculate the values of RMSE and E for the other schemes in Tables 4 and 5, and the results are compared with Scheme 19. The optimal models for BPANN and SVM (RBF) are also chosen by testing the influence of the parameters in the models. The fitting and testing results and errors of the PFM, BPANN and SVM models are shown in Tables 8 and 9. Figures 11 and 12 show the changes of RMSE and E with |Γ|. They indicate that Scheme 19 is the best model input combination, with lower RMSE and higher E for the three models, and it also has relatively few indices. Overall, with the increase of |Γ|, the performances of the three models become worse, so GT can be used to confirm the best combination of model inputs. It should be mentioned that Schemes 21, 22, 23, 24 and 26, which have much fewer indices, have much higher RMSE and lower E. So, the schemes with much fewer indices but smaller values of |Γ| should be tested carefully. Additionally, it can also be seen that the SVM (RBF) performs best in most schemes.

Table 8. Fitting results and errors of PFM, BPANN and SVM models.

| Scheme ID | Gamma (Γ) | PFM RMSE | PFM E | BPANN RMSE | BPANN E | SVM (LKF) RMSE | SVM (LKF) E | SVM (RBF) RMSE | SVM (RBF) E |
|---|---|---|---|---|---|---|---|---|---|
| 19 | −0.00006 | 1.0638 | 0.9157 | 0.8411 | 0.9473 | 1.4345 | 0.8467 | 0.4265 | 0.9864 |
| 16 | −0.00014 | 1.1770 | 0.8968 | 1.6441 | 0.7986 | 1.4995 | 0.8351 | 0.4868 | 0.9823 |
| 17 | 0.00021 | 1.1626 | 0.8993 | 0.9923 | 0.9266 | 1.4874 | 0.8325 | 0.4279 | 0.9864 |
| 15 | 0.00031 | 1.2342 | 0.8801 | 1.7692 | 0.9359 | 1.5033 | 0.8302 | 0.4427 | 0.9821 |
| 27 | −0.00069 | 1.3094 | 0.7908 | 1.7903 | 0.9137 | 1.5232 | 0.7365 | 0.4915 | 0.9122 |
| 18 | 0.00091 | 1.2942 | 0.7882 | 1.8233 | 0.8244 | 1.6044 | 0.7086 | 0.5111 | 0.9035 |
| 14 | 0.00103 | 1.5336 | 0.6627 | 1.9917 | 0.6823 | 1.7908 | 0.6109 | 0.7293 | 0.8146 |
| 29 | 0.00134 | 1.6092 | 0.5634 | 1.9737 | 0.5605 | 1.7834 | 0.5701 | 0.7184 | 0.8003 |
| 25 | 0.00149 | 1.8864 | 0.4101 | 2.1431 | 0.4023 | 2.0937 | 0.3952 | 1.6436 | 0.7822 |
| 20 | 0.00160 | 1.6903 | 0.4629 | 2.2098 | 0.3857 | 2.8806 | 0.3328 | 1.7305 | 0.7909 |
| 3 | −0.00170 | 1.6805 | 0.4307 | 2.3105 | 0.3632 | 2.2308 | 0.3586 | 0.9236 | 0.8013 |
| 28 | 0.00199 | 2.0937 | 0.2956 | 2.4208 | 0.2924 | 2.6011 | 0.2393 | 0.9937 | 0.8028 |
| 24 | 0.00285 | 14.9721 | −13.2698 | 6.0912 | −4.3223 | 12.2426 | −13.9142 | 4.9251 | −2.9634 |
| 23 | 0.00546 | 14.1693 | −13.9601 | 12.9411 | −9.3762 | 13.9031 | −12.4881 | 10.4623 | −8.3092 |
| 26 | 0.00645 | 13.8233 | −12.3302 | 6.0265 | −4.9808 | 11.7868 | −10.6984 | 4.8521 | −2.8341 |
| 1 | −0.00922 | 4.2475 | −2.9804 | 2.2928 | −2.4636 | 6.2099 | −4.0928 | 2.2947 | −0.3525 |
| 21 | 0.01719 | 11.9236 | −9.5725 | 8.0231 | −5.9467 | 12.9325 | −11.3207 | 6.6943 | −3.9408 |
| 22 | 0.02153 | 14.9261 | −10.7363 | 12.1099 | −9.3307 | 15.9222 | −13.2903 | 9.2362 | −7.8460 |

Table 9. Testing results and errors of PFM, BPANN and SVM models.

| Scheme ID | Gamma (Γ) | PFM RMSE | PFM E | BPANN RMSE | BPANN E | SVM (LKF) RMSE | SVM (LKF) E | SVM (RBF) RMSE | SVM (RBF) E |
|---|---|---|---|---|---|---|---|---|---|
| 19 | −0.00006 | 2.3916 | −0.5409 | 2.2597 | −0.3755 | 3.4417 | −2.191 | 1.4612 | 0.4248 |
| 16 | −0.00014 | 3.6501 | −2.5891 | 2.5698 | −0.7791 | 3.4791 | −2.2608 | 1.4359 | 0.4446 |
| 17 | 0.00021 | 3.1715 | −1.7097 | 2.2831 | −0.4042 | 3.4125 | −2.1371 | 1.5822 | 0.3256 |
| 15 | 0.00031 | 3.7253 | −2.5912 | 2.4672 | −0.7923 | 3.7982 | −2.6952 | 1.5901 | 0.3083 |
| 27 | −0.00069 | 3.7526 | −2.6108 | 2.5094 | −0.8155 | 3.8023 | −2.7325 | 1.6433 | 0.2596 |
| 18 | 0.00091 | 3.7802 | −2.7941 | 2.6455 | −0.9406 | 3.9246 | −2.8244 | 1.7024 | 0.2319 |
| 14 | 0.00103 | 3.8204 | −2.7992 | 2.5123 | −0.8299 | 3.4819 | −2.4953 | 1.7293 | 0.2297 |
| 29 | 0.00134 | 3.8246 | −2.8456 | 2.5297 | −0.8482 | 4.1284 | −3.2349 | 1.9039 | 0.2006 |
| 25 | 0.00149 | 8.0644 | −10.936 | 7.6436 | −7.0354 | 9.423 | −10.843 | 4.6255 | −3.4251 |
| 20 | 0.00160 | 9.5036 | −9.4213 | 7.9928 | −8.3409 | 8.1567 | −9.6242 | 3.9288 | −2.8603 |
| 3 | −0.00170 | 4.8355 | −3.2194 | 2.4093 | −0.7021 | 3.6938 | −3.0365 | 2.0934 | 0.1353 |
| 28 | 0.00199 | 5.1032 | −4.9412 | 4.2356 | −3.9236 | 5.9435 | −6.0256 | 3.5647 | −2.0425 |
| 24 | 0.00285 | 30.8291 | −70.3527 | 24.821 | −50.3423 | 35.9801 | −102.8313 | 22.9564 | −41.9327 |
| 23 | 0.00546 | 34.9115 | −98.5744 | 30.5092 | −88.3257 | 33.4883 | −94.2952 | 33.4219 | −94.0942 |
| 26 | 0.00645 | 7.9212 | −7.2205 | 6.9023 | −6.8211 | 8.9346 | −9.9438 | 5.3196 | −5.9443 |
| 1 | −0.00922 | 4.2245 | −3.9231 | 2.5646 | −0.7924 | 3.7992 | −2.5871 | 2.0293 | 0.1212 |
| 21 | 0.01719 | 20.8431 | −40.5496 | 15.4925 | −24.9825 | 23.5794 | −46.9324 | 10.9577 | −13.9548 |
| 22 | 0.02153 | 31.2469 | −79.0362 | 20.9421 | −33.4924 | 29.91207 | −68.3629 | 17.9822 | −26.9293 |

Figure 11. The different values of RMSE and E with the change of |Γ| in fitting results. (a) RMSE; (b) E.
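The two criteria plotted in Figure 11 can be illustrated with a short sketch. It assumes the standard definitions, RMSE as the square root of the mean squared residual and the Nash-Sutcliffe efficiency as E = 1 − SSE/SST; the function names and the sample depths are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency E = 1 - SSE / SST.

    E = 1 is a perfect fit, E = 0 is no better than predicting the
    observed mean, and E < 0 is worse than predicting the observed mean.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("observed values are constant; E is undefined")
    return 1.0 - sse / sst


if __name__ == "__main__":
    # Hypothetical groundwater depths (m), for illustration only.
    depths = [10.0, 12.0, 14.0, 16.0]
    close_fit = [10.5, 11.5, 14.5, 15.5]
    poor_fit = [20.0, 20.0, 20.0, 20.0]
    print(rmse(depths, close_fit), nash_sutcliffe(depths, close_fit))
    print(rmse(depths, poor_fit), nash_sutcliffe(depths, poor_fit))
```

Because E is unbounded below, a model whose errors are much larger than the spread of the observations yields a strongly negative E, which is why the E axes in the figures extend far below zero.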

Figure 12. The different values of RMSE and E with the change of |Γ| in testing results. (a) RMSE; (b) E.
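The |Γ| on the horizontal axes of Figures 11 and 12 comes from the Gamma Test. As a rough illustration of how such a statistic can be estimated, the sketch below follows the usual near-neighbour formulation: for each of the first p_max nearest neighbours, the mean squared input distance δ(p) and half the mean squared output difference γ(p) are computed, and the intercept of the regression line γ = Γ + Aδ is taken as Γ. The function name, the choice p_max = 10 and the synthetic data are assumptions made for illustration; they are not the settings used in this study.

```python
import heapq
import math
import random


def gamma_test(inputs, outputs, p_max=10):
    """Return (Gamma, slope) from a near-neighbour Gamma Test.

    inputs  -- list of equal-length tuples (one tuple per sample)
    outputs -- list of floats, one per sample
    p_max   -- number of nearest neighbours used (assumed default)
    """
    m = len(inputs)
    if m != len(outputs) or m <= p_max:
        raise ValueError("need matching lengths and more samples than p_max")
    delta_sum = [0.0] * p_max
    gamma_sum = [0.0] * p_max
    for i in range(m):
        # Squared Euclidean distance from sample i to every other sample.
        candidates = (
            (sum((a - b) ** 2 for a, b in zip(inputs[i], inputs[j])), j)
            for j in range(m) if j != i
        )
        nearest = heapq.nsmallest(p_max, candidates)  # sorted ascending
        for p, (dist2, j) in enumerate(nearest):
            delta_sum[p] += dist2
            gamma_sum[p] += 0.5 * (outputs[j] - outputs[i]) ** 2
    delta = [s / m for s in delta_sum]
    gamma = [s / m for s in gamma_sum]
    # Ordinary least-squares line gamma = Gamma + slope * delta.
    mean_d = sum(delta) / p_max
    mean_g = sum(gamma) / p_max
    sxx = sum((d - mean_d) ** 2 for d in delta)
    if sxx == 0:
        raise ValueError("degenerate neighbour distances")
    slope = sum((d - mean_d) * (g - mean_g) for d, g in zip(delta, gamma)) / sxx
    return mean_g - slope * mean_d, slope


if __name__ == "__main__":
    # Synthetic example: smooth function plus noise of variance 0.04.
    rng = random.Random(1)
    xs = [(rng.random(),) for _ in range(200)]
    ys = [math.sin(2 * math.pi * x[0]) + rng.gauss(0, 0.2) for x in xs]
    print(gamma_test(xs, ys))
```

The intercept estimates the variance of the part of the output that a smooth model of the chosen inputs cannot explain, so an input combination with a smaller |Γ| is the more promising one. Because Γ is a regression intercept, it can come out slightly negative for finite samples, which is consistent with the figures being plotted against |Γ|.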

4. Discussion and Conclusions

Considering the limited data and complicated geological conditions in the plain of Shijiazhuang, physical models are not a good choice, and three types of data-driven models are used in this study to predict the groundwater depth in Shijiazhuang plain. Five kinds of factors (natural, anthropic, biological, economic and social) comprising 12 indices which may influence the groundwater depth are considered, i.e., precipitation, evaporation, groundwater exploitation, industrial groundwater use, agricultural irrigation groundwater use, crop yield, population, urbanization rate, primary industry output, secondary industry output, tertiary industry output and GDP. An efficient tool named GT is adopted in this study to identify the relative importance of the indices and further determine the best model input combination.

The results of GT indicate that natural factors (precipitation and evaporation) have less effect on the groundwater level, while anthropic, biological, economic and social factors are the main reasons for the groundwater level decline, especially the anthropic factors, which reflect the groundwater exploitation. Evaporation is found to be the least relevant index among the 12 indices, while urbanization rate and industrial groundwater use are shown to have the most significant effect on groundwater variations. This indicates that the relationship between evaporation and groundwater becomes weaker with the drop in groundwater level. It also reveals that modernization and industrial development in Shijiazhuang plain are the main reasons for groundwater recession.

The RMSE and E are used to assess the accuracy of the data-driven models. The results show that the BPANN and SVM (RBF) models perform better than PFM and SVM (LKF) as a whole in all three schemes, and that SVM (RBF) has the best generalization ability, performing best in both model fitting and model testing. Additionally, all the schemes in Tables 4 and 5 are used to analyze the sensitivity of the input choice. The results show that GT can be used to confirm the best combination of model inputs, but schemes with far fewer indices and smaller values of |Γ| should be tested carefully.

The groundwater level has decreased dramatically and is continuing to do so in Shijiazhuang plain.
Countermeasures need to be taken to reduce groundwater usage and guarantee the sustainable utilization of groundwater, by improving the water use efficiency, promoting comprehensive water-saving measures, adjusting the industrial structure and finding alternative water sources, such as water recycling and the South-to-North Water Diversion Project. The results of this study may also provide a methodology and a reference for the prediction of groundwater resources in data-limited areas, especially for the study region and North China in general. However, monthly prediction of the groundwater depth has not been considered in this study due to limited data observations. The advantage of GT can be better exploited when dealing with larger datasets using data-driven models in groundwater prediction.

Acknowledgments: This study was supported by the National Natural Science Foundation of China (Grant No. 51409270), partition prediction of groundwater system of Hebei Province, the foundation of China Institute of Water Resources and Hydropower Research (1232), the International Science and Technology Cooperation Program of China (Grant No. 2013DFG70990), and the Open Research Fund Program of State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering (2014490611).

Author Contributions: All the authors have contributed to the conception and development of this manuscript. Jiyang Tian carried out the analysis and wrote the paper. Chuanzhe Li, Jia Liu, Fuliang Yu and Wan Zurina Wan Jaafar conceived and designed the framework. Shuanghu Cheng and Nana Zhao provided assistance in calculations and figure productions.

Conflicts of Interest: The authors declare no conflict of interest.

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).