
water Article

A Comparative Study of Groundwater Level Forecasting Using Data-Driven Models Based on Ensemble Empirical Mode Decomposition

Yicheng Gong 1,2,*, Zhongjing Wang 1,2,3,*, Guoyin Xu 1 and Zixiong Zhang 1

1 Department of Hydraulic Engineering, Tsinghua University, Beijing 100084, China; [email protected] (G.X.); [email protected] (Z.Z.)
2 State Key Laboratory of Hydro-Science and Engineering, Tsinghua University, Beijing 100084, China
3 State Key Lab of Plateau Ecology and Agriculture, Qinghai University, Xining 810016, China
* Correspondence: [email protected] (Y.G.); [email protected] (Z.W.); Tel.: +86-010-6278-2021 (Z.W.)

Received: 15 March 2018; Accepted: 29 May 2018; Published: 4 June 2018

Abstract: The reliable and accurate prediction of groundwater levels is important for improving water-use efficiency in the development and management of water resources. Three nonlinear time-series intelligence hybrid models were proposed to predict groundwater level fluctuations by combining ensemble empirical mode decomposition (EEMD) with three data-driven models: artificial neural networks (ANN), support vector machines (SVM) and adaptive neuro fuzzy inference systems (ANFIS). The prediction capability of the EEMD-ANN, EEMD-SVM and EEMD-ANFIS hybrid models was investigated using monthly groundwater level time series collected from two observation wells near Lake Okeechobee in Florida. Five statistical parameters, namely the correlation coefficient (R), normalized mean square error (NMSE), root mean square error (RMSE), Nash–Sutcliffe efficiency coefficient (NS) and Akaike information criteria (AIC), were used to assess the performance of the hybrid models, and the results were compared with those from the standalone ANN, SVM and ANFIS models. The three hybrid models proved to be applicable to forecasting groundwater level fluctuations. The values of the statistical parameters indicated that the EEMD-ANFIS and EEMD-SVM models achieved better prediction results than the EEMD-ANN model. Meanwhile, the three models coupled with EEMD were found to have better prediction results than the models that were not. The findings from this study indicate that the proposed nonlinear time-series intelligence hybrid models could improve the prediction capability in forecasting groundwater level fluctuations, and serve as useful guidelines for the management of sustainable water resources.

Keywords: groundwater level; ensemble empirical mode decomposition; artificial neural network; support vector machine; adaptive neuro fuzzy inference system

Water 2018, 10, 730; doi:10.3390/w10060730
www.mdpi.com/journal/water

1. Introduction

Groundwater is an increasingly important water resource for irrigation and for domestic and industrial activities in many countries. More reliable and accurate estimation of groundwater levels can help prevent groundwater overexploitation and improve water-use efficiency for water resource management. However, in some regions, groundwater has been pumped out much faster than it can be replenished, which eventually lowers the groundwater level. In addition, groundwater level time series are highly non-linear and non-stationary in nature, and prediction depends on many complex environmental factors, such as groundwater aquifers, precipitation, etc. Groundwater aquifers are intrinsically heterogeneous systems that are affected by complex hydrogeological conditions with groundwater-surface water interactions at various temporal-spatial scales [1,2]. Therefore, it is essential to develop more effective models for groundwater level prediction.

Many groundwater modeling approaches and data-driven models have been implemented to forecast groundwater levels [2–4]. A groundwater numerical model is first developed from a conceptual model, which often includes only the main and fundamental principles but neglects the complexities of groundwater systems [1]. It is hard to establish groundwater flow equations and set the values of hydrogeological parameters for conceptual models. It is also difficult to obtain long time series data for groundwater numerical modeling, so challenges and uncertainties remain in the modeling process [1,5].

Recently, data-driven models such as artificial neural networks (ANN), support vector machines (SVM), adaptive neuro-fuzzy inference systems (ANFIS) and genetic programming (GP), and time series methods such as autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) have proved efficient in forecasting hydrologic time series (e.g., groundwater level, water demand and inflow) [5–18]. Yoon et al. [5] developed ANN and SVM models to forecast groundwater level fluctuations in a coastal aquifer. According to the validation results, the performance of the SVM was similar to or even better than that of the ANN model. Shirmohammadi et al. [10] applied time series, system identification and ANFIS models to predict groundwater levels. The results showed that ANFIS is superior to time series and system identification. Moosavi [11] proposed a model to forecast groundwater levels for different prediction periods.
The results showed that the wavelet–ANFIS model predicted more accurately than the ANN, ANFIS and wavelet–ANN models. Shiri and Kişi [12] investigated the ability of adaptive neuro-fuzzy inference systems (ANFIS) and genetic programming (GP) to predict water table depth fluctuations. The results showed that the ANFIS and GP models can be applied successfully in groundwater depth prediction. Fallah-Mehdipour et al. [6] investigated the capability of ANFIS and GP to forecast and simulate groundwater levels in the Karaj plain of Iran. Their results indicated that GP is an effective method for predicting groundwater levels. Valipour et al. [13] compared autoregressive moving average (ARMA) with autoregressive integrated moving average (ARIMA) models in forecasting the inflow of the Dez dam reservoir. The ARIMA model proved to have better predictive ability than the ARMA model. Xu et al. [14] built instance-based weighting and support vector regression data-driven models to reduce the predictive error of groundwater models. The results of two real-world case studies showed that data-driven models can be applied effectively to reduce the root mean square error of the groundwater models. Asefa et al. [15] used support vector machines (SVMs) for monitoring network design. The results showed that SVMs can select the best configurations of well networks by reproducing the behavior of Monte Carlo flow.

Ensemble empirical mode decomposition (EEMD) is a scale-adaptive method proposed by Wu and Huang [19], which improved on empirical mode decomposition (EMD) to avoid the drawback of mode mixing [20]. EEMD can effectively decompose a nonstationary and nonlinear signal sequence into intrinsic mode function (IMF) components and a residual component.
In the hydrological and environmental research field, EEMD has been successfully applied to predict nonlinear problems such as runoff [21–23], wind speed [24], wave height [25], particulate matter 2.5 (PM2.5) [26], streamflow [27,28], vegetation dynamics [29], etc. Wang et al. [21] proposed an EEMD-ARIMA model for the forecasting of annual runoff time series. Quantitative standard statistical performance values showed that EEMD-ARIMA is superior to ARIMA for the forecasting of annual runoff. Liu et al. [24] applied the fast ensemble empirical mode decomposition–multilayer perceptron (FEEMD-MLP), fast ensemble empirical mode decomposition–adaptive neuro fuzzy inference system (FEEMD-ANFIS), wavelet packet–multilayer perceptron (WP-MLP) and WP-ANFIS models to forecast wind speed. The results suggested that all of the proposed hybrid algorithms are suitable for wind speed prediction. Duan et al. [25] developed the improved empirical mode decomposition–support vector regression (EMD-SVR) model for the short-term prediction of wave height and found that the EMD-SVR model performs better than the wavelet-decomposition-based SVR (WD-SVR) model. Ausati and Amanollahi [26] used ensemble empirical mode decomposition–general regression neural network (EEMD-GRNN), principal component regression (PCR), adaptive neuro-fuzzy inference system (ANFIS) and multiple linear regression (MLR) models to predict concentrations of PM2.5 in the city of Sanandaj. It was concluded that the EEMD-GRNN hybrid model had higher accuracy than the linear model in forecasting the PM2.5 concentration. Karthikeyan and Nagesh [28] tested the predictability of monthly streamflow time series using wavelet- and EMD-based ARMA models. The results showed that the wavelet-based ARMA method outperformed the EMD-based ARMA method. Hawinkel et al. [29] used EEMD to decompose the time series of the normalized difference vegetation index (NDVI) and extracted climate-driven interannual vegetation dynamics. The results indicated that EEMD was a feasible method for the assessment of climate-driven interannual vegetation dynamics.

ANN, SVM and ANFIS data-driven models and EEMD have been applied in previous hydrology studies. All of these models were used alone or in combination to predict nonlinear and nonstationary time series. In this study, the ANN, SVM and ANFIS nonlinear time-series models coupled with EEMD were applied to groundwater level prediction, to test their validity and accuracy. The main advantage of the EEMD-ANN, EEMD-SVM and EEMD-ANFIS hybrid models is that exogenous factors affecting the groundwater level do not need to be considered in the prediction. To provide a baseline for the forecasting performance of the three hybrid models, single ANN, SVM and ANFIS models were applied to predict the groundwater level by considering exogenous factors, such as precipitation, temperature and surface water level, etc.
The prediction results obtained from models with and without coupling were compared based on standard statistical analysis.

This paper is organized as follows: Section 2 briefly describes the basic theory of EEMD, ANN, SVM and ANFIS, and presents the framework of the EEMD-ANN, EEMD-SVM and EEMD-ANFIS hybrid models; Section 3 introduces the study area, available data and performance criteria; Section 4 explains the groundwater level decomposition by the EEMD for modeling and application; Section 5 analyzes the prediction results of the proposed EEMD-ANN, EEMD-SVM and EEMD-ANFIS hybrid models; Section 6 concludes the study.

2. Methods

2.1. Ensemble Empirical Mode Decomposition

Ensemble empirical mode decomposition (EEMD) is a new empirical mode decomposition (EMD)-based algorithm, which is a fully data-adaptive method to analyze nonlinear and nonstationary signals [19,20]. EMD decomposes the original signal into a number of intrinsic mode functions (IMFs), each of which must meet two conditions: (1) the function is symmetric, and the mean value of its upper and lower envelopes is zero; (2) the number of extrema and the number of zero crossings are equal or differ at most by one. However, EMD still has some drawbacks, such as mode mixing. EEMD was proposed as a white-noise-assisted method based on EMD to solve the problem of mode mixing [19,30].

If $x(t)$ is the signal or time series to be decomposed, the EMD algorithm can be briefly summarized as follows:

Step 1. Identify all local maxima and minima points of the time series $x(t)$;
Step 2. Interpolate between the local maxima of $x(t)$ to form the upper envelope $e_{\max}(t)$ and between the local minima to form the lower envelope $e_{\min}(t)$;
Step 3. Compute the mean envelope $m(t)$ between the upper envelope $e_{\max}(t)$ and the lower envelope $e_{\min}(t)$;

$$m(t) = \frac{e_{\max}(t) + e_{\min}(t)}{2} \tag{1}$$

Step 4. Calculate the IMF candidate;

$$h(t+1) = x(t) - m(t) \tag{2}$$

Step 5. Determine whether or not $h(t+1)$ satisfies the two conditions of an IMF. If yes, save $h(t+1)$, calculate the residue $r(t+1) = x(t) - \sum_{i=1}^{t+1} h(i)$, set $t = t + 1$, and treat $r(t)$ as input data in step 2. If no, treat $h(t+1)$ as input data in step 2 until it satisfies the two conditions of an IMF.
Step 6. Continue until the final residue meets some predefined stopping criteria.

At the end of the sifting process, the final residual term $r_n(t)$ has fewer than two local extrema, and the original signal or time series can be decomposed into a set of IMFs and a residue as follows:

$$x(t) = \sum_{i=1}^{n} h_i(t) + r_n(t) \tag{3}$$

where $n$ is the number of IMFs, $r_n(t)$ is the final residue and the IMFs $h_i(t)$ are nearly orthogonal to each other.

EEMD adds a finite amount of Gaussian white noise to the series before applying EMD, so that mode mixing is mostly eliminated. Based on the EMD method, the EEMD can be briefly summarized as follows [19,31]:

Step 1. Initialize the ensemble number and the amplitude of the added white noise.
Step 2. Add random white noise to produce the noise-added data.
Step 3. Identify the local maxima and minima and obtain the upper and lower envelopes.
Step 4. Compute the mean of the upper and lower envelopes.
Step 5. Decompose the data with added random white noise into IMFs.
Step 6. Repeat step 3 to step 5 until the stopping criteria are met. After the sifting process, the IMFs and the residue are obtained.

2.2. Artificial Neural Network

An artificial neural network (ANN) is a black box tool that has certain performance characteristics that resemble the biological neural networks in the human brain [32]. Feed-forward neural networks (FFNN) are the most commonly used artificial neural networks, and have been broadly employed for modeling and forecasting in hydrogeology [33,34]. The architecture of the FFNN model consists of an input layer, hidden layer, output layer and artificial neurons (Figure 1). In a feed-forward network, the neurons in each layer are connected to those in the next layer. Therefore, the output of a node in a layer depends only on the input it receives from the previous layer, the associated weights, and the type of transfer function. The present research used the three-layer FFNN model, which was trained with a Bayesian regularization (BR) algorithm. The tan-sigmoid transfer function was selected following previous studies [35]. The mathematical expression is as follows:

$$y_j = f\left(\sum_{i=1}^{N} w_{ji} x_i + b_j\right) \tag{4}$$

where $x_i$ is the input vector, $y_j$ is the output, $b_j$ is the bias, $w_{ji}$ is a weight connecting $x_i$ and $y_j$, $N$ is the number of nodes, and $f$ is the activation function.
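As a rough illustration of Equation (4), the sketch below evaluates one layer of neurons in plain Python and chains two layers into a three-layer feed-forward pass. The function names, the linear output node and the example weights are assumptions made for illustration only; the Bayesian regularization training used in the study is not shown.

```python
import math


def tansig(a):
    """Tan-sigmoid transfer function, 2 / (1 + exp(-2a)) - 1, which equals tanh(a)."""
    return math.tanh(a)


def layer_forward(x, weights, biases, f=tansig):
    """Equation (4): y_j = f(sum_i w_ji * x_i + b_j), evaluated for every node j."""
    return [f(sum(w * xi for w, xi in zip(w_j, x)) + b_j)
            for w_j, b_j in zip(weights, biases)]


def ffnn_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Input layer -> tan-sigmoid hidden layer -> linear output layer (assumed)."""
    hidden = layer_forward(x, w_hidden, b_hidden)
    return layer_forward(hidden, w_out, b_out, f=lambda a: a)


# Hypothetical weights: one input (last month's normalized level), two hidden nodes.
w_hidden, b_hidden = [[0.8], [-0.5]], [0.1, 0.2]
w_out, b_out = [[0.6, -0.4]], [0.05]
prediction = ffnn_forward([0.42], w_hidden, b_hidden, w_out, b_out)
print(prediction)
```

In a trained network the weights and biases would come from the training algorithm rather than being set by hand as they are here.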

2.3. Support Vector Machine

A support vector machine (SVM) is a novel machine learning technique based on statistical learning theory [36,37]. In a regression SVM model, fitting a regression hyperplane with an ε-insensitive loss function leads to a convex dual optimization problem, whose solution can be obtained with an optimization algorithm. The deterministic function of the SVM can be expressed as follows:

$$f(x) = w \cdot \varphi(x) + b \tag{5}$$

where $w$ is a weight vector, and $b$ is a bias. $x$ is mapped to a high-dimensional feature space by the nonlinear transfer function $\varphi$. In this study, the sequential minimal optimization (SMO) algorithm was used to solve the dual optimization problem of SVM [38,39]. With SMO, the solution can be acquired directly from an analytical solution without invoking a quadratic optimizer. The library for support vector machines (LIBSVM) was applied to predict the non-linear time series in the calibration and validation stages [40]. Figure 2 shows the schematic diagram of the SVM model.

Figure 1. Schematic diagram of an ANN. [Diagram: input vector x = (x1, x2, x3, ..., xn); input layer; hidden layer; output layer; output f(x).]

Figure 2. Schematic diagram of a SVM. [Diagram: input vector x; support vectors; kernel functions K(x, xi); output layer; output f(x).]
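To make Section 2.3 more concrete, the sketch below shows the ε-insensitive loss and the kernel expansion through which Equation (5) is normally evaluated, $f(x) = \sum_i \beta_i K(x, x_i) + b$, where the $x_i$ are the support vectors. This is only an illustration: the radial basis kernel, the parameter values and the function names are assumptions, and the code is not LIBSVM.

```python
import math


def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Zero inside the eps-tube around the target, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


def rbf_kernel(x, z, gamma=1.0):
    """Radial basis kernel K(x, z) = exp(-gamma * ||x - z||^2) (assumed kernel)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svr_predict(x, support_vectors, dual_coefs, b, gamma=1.0):
    """Kernel form of Equation (5): f(x) = sum_i coef_i * K(x, x_i) + b."""
    return sum(c * rbf_kernel(x, sv, gamma)
               for c, sv in zip(dual_coefs, support_vectors)) + b


# Hypothetical fitted model with two support vectors on a one-dimensional input.
support_vectors = [[0.2], [0.7]]
dual_coefs = [0.5, -0.3]
print(svr_predict([0.4], support_vectors, dual_coefs, b=0.1))
```

The dual coefficients and the bias would be produced by the SMO solver; they are fixed by hand here so that the prediction step can be read in isolation.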

2.4. Adaptive Neuro Fuzzy Inference System

An adaptive neuro fuzzy inference system (ANFIS) as a hybrid algorithm is an adaptive neural network based on a fuzzy inference system [41]. ANFIS is capable of approximating any real continuous function in a compact set to any degree of accuracy. This study used the first-order Sugeno fuzzy model of ANFIS [41,42]. To simplify, it was assumed that the fuzzy inference system has two inputs $x$ and $y$ and one output $f$. For the first-order Sugeno inference system, the typical rules can be expressed as:

Rule 1: If $x$ is $A_1$ and $y$ is $B_1$; then $f_1 = p_1 x + q_1 y + r_1$ (6)

Rule 2: If $x$ is $A_2$ and $y$ is $B_2$; then $f_2 = p_2 x + q_2 y + r_2$ (7)

where $x$ and $y$ are the crisp inputs to the node $i$, $A_i$ and $B_i$ are the linguistic labels (low, medium, high, etc.) characterized by the convenient membership functions. $p_i$, $q_i$, and $r_i$ ($i = 1, 2$) are the parameters in the then part (consequent part) of the first-order Sugeno fuzzy model. The Sugeno fuzzy structure of ANFIS comprised a five-layer network (Figure 3). More comprehensive information about ANFIS can be found in the literature [11,43–45].

Figure 3. Schematic diagram of ANFIS. [Diagram: Layer 1, membership functions A1, A2 (input x) and B1, B2 (input y); Layer 2, product nodes Π giving w1 and w2; Layer 3, normalization nodes N; Layer 4, weighted rule outputs; Layer 5, output f.]
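The two-rule system of Equations (6) and (7) can be traced through the five layers of Figure 3 with a few lines of Python. The sketch assumes Gaussian membership functions and hand-picked parameters, neither of which is specified in the text; in ANFIS both the membership and the consequent parameters are learned from data.

```python
import math


def gauss_mf(v, center, sigma):
    """Gaussian membership grade of v (assumed membership function shape)."""
    return math.exp(-0.5 * ((v - center) / sigma) ** 2)


def sugeno_two_rules(x, y, mf_a, mf_b, consequents):
    """First-order Sugeno inference with two rules, following Equations (6) and (7).

    mf_a, mf_b: [(center, sigma), (center, sigma)] for A1, A2 and B1, B2.
    consequents: [(p1, q1, r1), (p2, q2, r2)].
    """
    # Layers 1 and 2: membership grades and their product (rule firing strengths).
    w = [gauss_mf(x, *mf_a[i]) * gauss_mf(y, *mf_b[i]) for i in range(2)]
    # Layer 3: normalized firing strengths.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: rule outputs f_i = p_i * x + q_i * y + r_i.
    f = [p * x + q * y + r for p, q, r in consequents]
    # Layer 5: overall output as the weighted sum of the rule outputs.
    return sum(wb * fi for wb, fi in zip(w_bar, f))


# Hypothetical parameters: "low" and "high" labels on inputs scaled to [0, 1].
mf_a = [(0.0, 0.4), (1.0, 0.4)]
mf_b = [(0.0, 0.4), (1.0, 0.4)]
consequents = [(0.5, 0.2, 0.1), (0.9, 0.1, 0.0)]
print(sugeno_two_rules(0.3, 0.6, mf_a, mf_b, consequents))
```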

2.5. The Hybrid EEMD-ANN, EEMD-SVM and EEMD-ANFIS Forecasting Models

The goal of this study was to improve the forecasting accuracy of groundwater levels by coupling the EEMD and data-driven models. A schematic diagram of the hybrid EEMD-ANN, EEMD-SVM and EEMD-ANFIS models is illustrated in Figure 4. As shown in Figure 4, the process of running the three hybrid models consists of three main steps:

(1) Firstly, use the EEMD technique to decompose the original groundwater level fluctuation time series $x(t)$ ($t = 1, 2, \cdots, n$) into IMF components $c_i(t)$ ($i = 1, 2, \cdots, m$) and one residual component $r_m(t)$.
(2) Secondly, the three data-driven models ANN, SVM and ANFIS are developed to make the corresponding prediction for each extracted IMF component and for the residual component, respectively.
(3) Finally, all of the prediction results from the ANN, SVM and ANFIS models are combined to obtain the new output values, which are the final forecasting result for the groundwater level prediction.

Figure 4. Schematic diagram of EEMD-ANN, EEMD-SVM and EEMD-ANFIS. [Diagram: groundwater level time series (input); EEMD; components IMF1, IMF2, ..., IMFm-1, IMFm and rm; one model per component (ANN1, ..., ANNm+1; SVM1, ..., SVMm+1; ANFIS1, ..., ANFISm+1); predicted values.]
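The decompose, predict and recombine structure of the three steps can be sketched as follows. This is a deliberately simplified stand-in and not the authors' implementation: a single sifting pass (Equations (1) and (2)) with piecewise-linear envelopes replaces full EEMD, so there is no noise ensemble and no IMF test, and a lag-1 least-squares predictor replaces the ANN, SVM and ANFIS models. All function names and the synthetic series are hypothetical.

```python
import math


def local_extrema(x):
    """Indices of interior local maxima and minima of the sequence x."""
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] >= x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] <= x[i + 1]]
    return maxima, minima


def linear_envelope(knots, x):
    """Piecewise-linear curve through (i, x[i]) for i in knots, held flat at both ends.

    Assumption: EMD implementations usually use cubic splines; linear
    interpolation keeps this sketch dependency-free.
    """
    env, k = [], 0
    for t in range(len(x)):
        if t <= knots[0]:
            env.append(x[knots[0]])
        elif t >= knots[-1]:
            env.append(x[knots[-1]])
        else:
            while knots[k + 1] < t:
                k += 1
            a, b = knots[k], knots[k + 1]
            w = (t - a) / (b - a)
            env.append((1 - w) * x[a] + w * x[b])
    return env


def sift_once(x):
    """One sifting pass: mean envelope m (Eq. 1) and candidate h = x - m (Eq. 2)."""
    maxima, minima = local_extrema(x)
    if not maxima or not minima:
        raise ValueError("series has too few extrema to sift")
    upper, lower = linear_envelope(maxima, x), linear_envelope(minima, x)
    m = [(u + l) / 2 for u, l in zip(upper, lower)]
    h = [xi - mi for xi, mi in zip(x, m)]
    return h, m


def fit_lag1(series):
    """Least-squares fit of s[t] ~ a * s[t-1] + b (stand-in for ANN/SVM/ANFIS)."""
    prev, nxt = series[:-1], series[1:]
    mp, mn = sum(prev) / len(prev), sum(nxt) / len(nxt)
    var = sum((p - mp) ** 2 for p in prev)
    a = sum((p - mp) * (q - mn) for p, q in zip(prev, nxt)) / var if var else 0.0
    return a, mn - a * mp


def hybrid_forecast(x):
    """Decompose, predict each component one step ahead, then sum the predictions."""
    h, m = sift_once(x)                    # step (1): decomposition
    forecast = 0.0
    for component in (h, m):               # step (2): one model per component
        a, b = fit_lag1(component)
        forecast += a * component[-1] + b  # step (3): sum the component forecasts
    return forecast


# Synthetic monthly series (hypothetical data): slow trend plus an annual cycle.
levels = [20 + 0.01 * t + 2 * math.sin(2 * math.pi * t / 12) for t in range(120)]
h, m = sift_once(levels)
print(hybrid_forecast(levels))
```

Because the two components sum back to the original series by construction, the final forecast is simply the sum of the per-component forecasts, which is the same recombination rule as in step (3).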

3. Study Area and Available Data

3.1. Study Site and Data Preprocessing

The study area was located in Florida, United States. The annual mean temperature in the study area was 23.36 °C and the average annual precipitation was 980.61 mm over the past 16 years. All the observed data (including the monthly mean precipitation, monthly mean temperature, monthly maximum temperature, monthly minimum temperature, monthly mean lake level and monthly mean groundwater level) were collected to forecast the future groundwater fluctuations at wells M1255 and STL185. Of the data obtained over the past 16 years (from 1997 to 2012), the first 14 years were used for training and the last two years were used for validation. Meanwhile, the IMF components and one residual component of the groundwater level time series were used to predict future groundwater fluctuations based on the EEMD. The results of the groundwater level prediction using the EEMD were analyzed and compared with the results obtained without the EEMD. The locations of the observation wells in the study area are shown in Figure 5. Figure 6a–d illustrates the monthly mean precipitation, monthly mean temperature, monthly mean lake level, and monthly maximum temperature. Figure 7a,b illustrates the monthly groundwater levels at wells M1255 and STL185.

Figure 5. Location map of the United States and observation wells (Source: Imagery © Data SIO, NOAA, U.S. Navy, NGA, GEBCO, Image Landsat/Copernicus).

Figure 6. Plots of (a) precipitation (b) mean temperature (c) lake level (d) max. temperature. [Panels: (a) Precipitation (mm); (b) Mean Temperature (℃); (c) Lake Level (ft); (d) Max Temperature (℃); time axis 1998 to 2012.]

Figure 7. Plots of (a) groundwater level of M1255 (b) groundwater level of STL185. [Panels: Groundwater Level (ft); time axis 1998 to 2012.]

The partial partial autocorrelation autocorrelation function function (PACF), (PACF),as asone oneof ofthe the statisticalmethods, methods,was wasfrequently frequently ToThe eliminate the dimensional differences, the time-series datastatistical was normalized using the following used to to select select suitable models models [10]. [10]. The The PACF PACF of of well well site site M1255 M1255 from from lag-0 lag-0 to to lag-18 lag-18 isis shown shown in in used equation before suitable the training stage. Figure 8. Figure 8 suggests a significant correlation up to the lag-1 month for this time series within Figure 8. Figure 8 suggests a significant correlation to the lag-1 month for this time series within x −up xmin xi = theconfidence confidenceinterval. interval.The Thepartial partialautocorrelation autocorrelation coefficients indicatedthat thatthere therewas wasaaone-month one-month(8) the xmaxcoefficients − xmin indicated time lag lagfor for the themonthly monthlygroundwater groundwaterlevel leveltime timeseries seriesas asthe theinput inputvector vectorof of the the ANN, ANN, SVM SVM and and time where xi is the normalized data, x is the time-series data, xmin is the minimum value and xmax is the ANFIS models. ANFIS models. maximum value. The partial autocorrelation function (PACF), as one of the statistical methods, was frequently used to select suitable models [10]. The PACF of well site M1255 from lag-0 to lag-18 is shown in Figure 8. Figure 8 suggests a significant correlation up to the lag-1 month for this time series within the confidence interval. The partial autocorrelation coefficients indicated that there was a one-month time lag for the monthly groundwater level time series as the input vector of the ANN, SVM and ANFIS models.

Figure 8. The partial autocorrelation function (PACF) of the groundwater level series at site M1255 (x-axis: lag; y-axis: partial autocorrelations).
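The lag selection described above can be illustrated with a small PACF routine. The sketch below uses the Durbin-Levinson recursion on sample autocorrelations and flags lags outside an approximate 95% band of ±1.96/√N. Both choices are assumptions: the text does not say which estimator or confidence interval was used. The series is a synthetic AR(1) process, not the M1255 record.

```python
import math
import random


def autocorrelation(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def pacf(series, max_lag):
    """Partial autocorrelations for lags 0..max_lag (Durbin-Levinson)."""
    r = autocorrelation(series, max_lag)
    result = [1.0]
    phi_prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                   for j in range(k - 1)]
        phi_new.append(phi_kk)
        phi_prev = phi_new
        result.append(phi_kk)
    return result


# Synthetic AR(1) series standing in for a monthly level record.
random.seed(0)
x = [0.0]
for _ in range(1999):
    x.append(0.7 * x[-1] + random.gauss(0.0, 1.0))

p = pacf(x, 18)
bound = 1.96 / math.sqrt(len(x))  # approximate 95% band (assumption)
significant = [k for k in range(1, 19) if abs(p[k]) > bound]
```

For a series dominated by a one-month dependence, lag 1 stands well outside the band while higher lags mostly fall inside it, which is the pattern the text reads from Figure 8.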

3.2. Performance Criteria

Five statistical performance evaluation parameters were used to estimate the performance of the models proposed in this study: the correlation coefficient (R), normalized mean square error (NMSE), root mean squared error (RMSE), Nash–Sutcliffe efficiency coefficient (NS) and Akaike information criteria (AIC).

Correlation coefficient (R):

$$R = \frac{\sum_{i=1}^{N} (O_i - \bar{O})(P_i - \bar{P})}{\sqrt{\sum_{i=1}^{N} (O_i - \bar{O})^2 \, \sum_{i=1}^{N} (P_i - \bar{P})^2}} \tag{9}$$

Normalized mean square error (NMSE):

$$\mathrm{NMSE} = \frac{n-1}{n} \cdot \frac{\sum_{i=1}^{N} (O_i - P_i)^2}{\sum_{i=1}^{N} (O_i - \bar{O})^2} \tag{10}$$

Root mean squared error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} (O_i - P_i)^2}{N}} \tag{11}$$

Nash–Sutcliffe efficiency coefficient (NS):

$$\mathrm{NS} = 1 - \frac{\sum_{i=1}^{N} (O_i - P_i)^2}{\sum_{i=1}^{N} (O_i - \bar{O})^2} \tag{12}$$

Akaike information criteria (AIC):

$$\mathrm{AIC}(k) = N \cdot \ln\left(\frac{1}{N} \sum_{i=1}^{N} (O_i - P_i)^2\right) + 2k \tag{13}$$

where $O_i$ is the observed groundwater level value, $P_i$ is the predicted groundwater level value, $\bar{O}$ is the average of the observed groundwater level value, $\bar{P}$ is the average of the predicted groundwater level value, $N$ is the number of input samples, and $k$ is the number of free parameters used in those models. In this paper, $k$ is taken as 0. The Akaike information criteria (AIC) was employed to select the best time series model [46]. The best fit between the predicted value and observed value would have R = 1, NMSE = 0, RMSE = 0, NS = 1 and AIC = −∞, respectively.
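Equations (9) to (13) translate directly into code. The following is a minimal sketch, assuming that the $n$ in Equation (10) equals the sample count $N$ (the text does not define $n$ separately) and using made-up observed and predicted values. The function name `evaluate` is hypothetical.

```python
import math


def evaluate(observed, predicted, k=0):
    """Return R, NMSE, RMSE, NS and AIC as defined in Equations (9)-(13)."""
    n = len(observed)
    o_mean = sum(observed) / n
    p_mean = sum(predicted) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_obs = sum((o - o_mean) ** 2 for o in observed)
    ss_pred = sum((p - p_mean) ** 2 for p in predicted)
    cov = sum((o - o_mean) * (p - p_mean)
              for o, p in zip(observed, predicted))
    mse = ss_res / n
    return {
        # R is undefined (division by zero) if either series is constant.
        "R": cov / math.sqrt(ss_obs * ss_pred),
        # Assumption: n in Equation (10) is the same as N.
        "NMSE": (n - 1) / n * ss_res / ss_obs,
        "RMSE": math.sqrt(mse),
        "NS": 1.0 - ss_res / ss_obs,
        # A perfect fit gives ln(0), i.e. AIC = -infinity.
        "AIC": n * math.log(mse) + 2 * k if mse > 0 else float("-inf"),
    }


# Made-up values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
scores = evaluate(observed, predicted)
perfect = evaluate(observed, observed)
```

With $k = 0$, Equation (13) reduces to $N \ln(\mathrm{RMSE}^2)$, so AIC and RMSE rank models identically for a fixed sample size.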

4. Decomposing, Modeling and Application

The time series of the original groundwater levels can be decomposed into several independent IMFs and one residue using the EEMD technique. The results are shown in Figures 9 and 10. The two time series of the groundwater levels were each resolved into six independent IMFs and one residue component. The six IMFs are displayed in order from the highest to the lowest frequency. The extracted IMF components and the residual component were used as input variables to forecast the groundwater level. EEMD, as a new noise-assisted data analysis method, overcame the scale separation problem, avoided the mode mixing problem and conserved the physical uniqueness of the decomposition [19]. Therefore, the decomposition could be helpful to transform non-linear and non-stationary time series into independent IMF components from high to low frequency and could be useful to improve the forecast accuracy.
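One plausible way to turn the decomposed components into model inputs is sketched below. It assumes that each sample pairs the seven component values from the previous month with the current groundwater level, in line with the one-month lag suggested by the PACF; the text does not spell out this arrangement. The components are synthetic stand-ins rather than EEMD outputs, and `build_samples` is a hypothetical helper.

```python
import math


def build_samples(components, target, lag=1):
    """Pair component values at month t - lag with the level at month t."""
    n = len(target)
    inputs = [[comp[t - lag] for comp in components] for t in range(lag, n)]
    outputs = [target[t] for t in range(lag, n)]
    return inputs, outputs


# Stand-in components: six sinusoids ordered from high to low frequency plus
# a slow linear trend. These are NOT EEMD outputs; they only mimic the layout
# of six IMFs and one residue.
n_months = 192  # Jan-1997 to Dec-2012, the span shown in Figures 9 and 10
periods = [3, 6, 12, 24, 48, 96]
imfs = [[math.sin(2 * math.pi * t / p) for t in range(n_months)]
        for p in periods]
residue = [24.0 + 0.005 * t for t in range(n_months)]
components = imfs + [residue]

# In this toy setup the components sum exactly to the level series; a real
# EEMD reconstruction may leave a small remainder from the added noise.
level = [sum(values) for values in zip(*components)]

X, y = build_samples(components, level, lag=1)
```

Each row of `X` would then be passed to the ANN, SVM or ANFIS regressor in place of the exogenous inputs.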

Figure 9. Decomposition of groundwater level time-series at the M1255 observation well (panels: IMF1 to IMF6 and residue; time axis: Jan-1997 to Dec-2012).

Figure 10. Decomposition of groundwater level time-series at the STL185 observation well (panels: IMF1 to IMF6 and residue; time axis: Jan-1997 to Dec-2012).

The selection of input variables depended on the availability of the inputs and historically observed records. Several combinations of the exogenous factors were applied in the ANN, SVM and ANFIS models for groundwater level prediction. The input variables in the ANN model were the monthly time-series data of the precipitation, temperature (maximum, mean and minimum), and groundwater level. ANN+L denotes the ANN model with the lake level added to the input variables. EEMD-ANN denotes the model whose input variables were the IMF components and one residual component. The input variables of the SVM and ANFIS models were similar to those of the ANN models (see Tables 1–3).

5. Results and Discussion

5.1. EEMD-ANN and ANN Models

To investigate the effects of the input structure and input data on the performance of the ANN model, three input structures were considered, as shown in Table 1. The results showed that the input variables of the EEMD-ANN at the two sites had the minimum RMSE and AIC values in the validation. Meanwhile, the input variables of EEMD-ANN had the maximum R value in the validation. At site M1255, the minimum RMSE and AIC of EEMD-ANN were 0.329 and −53.350, respectively; at site STL185, the minimum RMSE and AIC of EEMD-ANN were 0.646 and −20.979, respectively. The maximum R of EEMD-ANN in the validation was 0.926 at site M1255 and 0.891 at site STL185. Table 1 suggested that the R values of the four input variables of ANN+L at the two sites were better than those of the three input variables of ANN in the validation stage. The R value of ANN+L at site M1255 was 0.754. The R value for ANN+L applied at site STL185 was 0.839. Figure 11 shows the observed and predicted groundwater levels using ANN, ANN+L and EEMD-ANN in the validation stage.
Table 1. Result of modeling from the ANN model at site M1255 and STL185.

| Site | Model | Training R | Training NMSE | Training RMSE | Training NS | Training AIC | Validation R | Validation NMSE | Validation RMSE | Validation NS | Validation AIC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| M1255 | EEMD-ANN | 0.932 | 0.131 | 0.234 | 0.869 | −488.040 | 0.926 | 0.317 | 0.329 | 0.669 | −53.350 |
| M1255 | ANN | 0.808 | 0.345 | 0.380 | 0.653 | −325.009 | 0.737 | 0.442 | 0.389 | 0.538 | −45.377 |
| M1255 | ANN+L | 0.816 | 0.333 | 0.374 | 0.665 | −330.668 | 0.754 | 0.675 | 0.480 | 0.296 | −35.246 |
| STL185 | EEMD-ANN | 0.977 | 0.044 | 0.281 | 0.955 | −426.784 | 0.891 | 0.208 | 0.646 | 0.783 | −20.979 |
| STL185 | ANN | 0.869 | 0.244 | 0.659 | 0.754 | −140.343 | 0.813 | 0.353 | 0.842 | 0.632 | −8.239 |
| STL185 | ANN+L | 0.872 | 0.239 | 0.651 | 0.760 | −144.026 | 0.839 | 0.298 | 0.774 | 0.689 | −12.323 |

Figure 11. Observed and predicted groundwater level using ANN, ANN+L and EEMD-ANN: (a) M1255; (b) STL185 (x-axis: time in months; y-axis: groundwater level in ft).
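As a consistency check on Table 1, the validation AIC values can be recomputed from the reported RMSE values, since Equation (13) with $k = 0$ gives AIC = N · ln(RMSE²). The sketch below assumes a validation length of N = 24 months, read from the 0 to 24 month time axis of Figure 11 rather than stated in this section. Small differences are expected because the tabulated RMSE values are rounded to three decimals.

```python
import math

N_VALIDATION = 24  # assumed from the time axis of Figure 11

# (site, model): (validation RMSE, validation AIC) as reported in Table 1
table1_validation = {
    ("M1255", "EEMD-ANN"): (0.329, -53.350),
    ("M1255", "ANN"): (0.389, -45.377),
    ("M1255", "ANN+L"): (0.480, -35.246),
    ("STL185", "EEMD-ANN"): (0.646, -20.979),
    ("STL185", "ANN"): (0.842, -8.239),
    ("STL185", "ANN+L"): (0.774, -12.323),
}


def aic_from_rmse(rmse, n, k=0):
    """Equation (13) rewritten in terms of RMSE: N * ln(RMSE^2) + 2k."""
    return n * math.log(rmse ** 2) + 2 * k


differences = {
    key: aic_from_rmse(rmse, N_VALIDATION) - reported_aic
    for key, (rmse, reported_aic) in table1_validation.items()
}
```

Under this assumption the recomputed and reported values agree to within rounding, which supports the reading of Equation (13) used here.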

5.2. EEMD-SVM and SVM Models

The same input structures were introduced to the SVM model, as shown in Table 2. The results showed that the input variables of EEMD-SVM at site M1255 and the three input variables of SVM at site STL185 had the minimum RMSE and AIC values in the validation. The minimum RMSE and AIC of EEMD-SVM at site M1255 were 0.315 and −55.373, respectively. The minimum RMSE and AIC of SVM at site STL185 were 0.755 and −13.512, respectively. Table 2 suggests that the maximum R value of EEMD-SVM was 0.885 at site M1255 and 0.860 at site STL185. The R values of the four input variables of SVM+L at the two sites were higher than or equal to those of the three input variables of SVM in the validation stage. The R values of SVM+L at site M1255 were 0.758 and 0.730 in the training and validation stage, respectively. The R values of SVM+L at site STL185 were 0.858 and 0.845 in the training and validation stage, respectively. Figure 12 shows the observed and predicted groundwater levels using SVM, SVM+L and EEMD-SVM in the validation stage.

Table 2. Results of modeling from the SVM model at site M1255 and STL185.

| Site | Model | Training R | Training NMSE | Training RMSE | Training NS | Training AIC | Validation R | Validation NMSE | Validation RMSE | Validation NS | Validation AIC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| M1255 | EEMD-SVM | 0.948 | 0.102 | 0.206 | 0.898 | −530.463 | 0.885 | 0.292 | 0.315 | 0.696 | −55.373 |
| M1255 | SVM | 0.786 | 0.450 | 0.434 | 0.548 | −280.434 | 0.7089 | 0.508 | 0.416 | 0.470 | −42.067 |
| M1255 | SVM+L | 0.758 | 0.424 | 0.421 | 0.574 | −290.376 | 0.730 | 1.149 | 0.626 | −0.199 | −22.467 |
| STL185 | EEMD-SVM | 0.972 | 0.073 | 0.360 | 0.927 | −343.467 | 0.860 | 0.507 | 1.009 | 0.471 | 0.447 |
| STL185 | SVM | 0.931 | 0.134 | 0.488 | 0.865 | −241.064 | 0.845 | 0.283 | 0.755 | 0.704 | −13.512 |
| STL185 | SVM+L | 0.858 | 0.273 | 0.696 | 0.725 | −121.615 | 0.845 | 0.284 | 0.756 | 0.703 | −13.441 |

Figure 12. Observed and predicted groundwater levels using SVM, SVM+L and EEMD-SVM: (a) M1255; (b) STL185 (x-axis: time in months; y-axis: groundwater level in ft).

5.3. EEMD-ANFIS and ANFIS Models

In the ANFIS model, the grid partition algorithm was used to generate a fuzzy inference system (FIS) structure from the training data [47]. The same input structures were introduced to the ANFIS model, and the results are shown in Table 3. Table 3 suggested that the input variables of EEMD-ANFIS at the two sites had the minimum RMSE and AIC values in the validation. The minimum RMSE and AIC of EEMD-ANFIS at site M1255 were 0.360 and −49.100, respectively. The minimum RMSE and AIC of EEMD-ANFIS at site STL185 were 0.606 and −24.035, respectively. The results showed that the maximum R value of EEMD-ANFIS in the validation was 0.926 at site M1255 and 0.909 at site STL185. The R values of the four input variables of ANFIS+L at the two sites were greater than those of the three input variables of ANFIS in the validation stage. The R value of ANFIS+L at site M1255 was 0.799 in the validation. The R value of ANFIS+L at site STL185 was 0.910 in the validation. Figure 13 shows the observed and predicted groundwater levels using ANFIS, ANFIS+L and EEMD-ANFIS in the validation stage.

Table 3. Result of modeling from the ANFIS model at site M1255 and STL185.

| Site | Model | Training R | Training NMSE | Training RMSE | Training NS | Training AIC | Validation R | Validation NMSE | Validation RMSE | Validation NS | Validation AIC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| M1255 | EEMD-ANFIS | 0.968 | 0.063 | 0.162 | 0.937 | −611.840 | 0.926 | 0.379 | 0.360 | 0.605 | −49.100 |
| M1255 | ANFIS | 0.918 | 0.157 | 0.257 | 0.842 | −457.099 | 0.785 | 0.496 | 0.412 | 0.482 | −42.602 |
| M1255 | ANFIS+L | 0.971 | 0.058 | 0.156 | 0.942 | −624.287 | 0.799 | 0.443 | 0.389 | 0.538 | −45.341 |
| STL185 | EEMD-ANFIS | 0.982 | 0.035 | 0.249 | 0.965 | −466.512 | 0.909 | 0.183 | 0.606 | 0.809 | −24.035 |
| STL185 | ANFIS | 0.940 | 0.117 | 0.455 | 0.883 | −264.408 | 0.855 | 0.318 | 0.799 | 0.668 | −10.756 |
| STL185 | ANFIS+L | 0.980 | 0.039 | 0.264 | 0.961 | −447.574 | 0.910 | 0.262 | 0.726 | 0.726 | −15.360 |

Figure 13. Observed and predicted groundwater levels using ANFIS, ANFIS+L and EEMD-ANFIS: (a) M1255; (b) STL185 (x-axis: time in months; y-axis: groundwater level in ft).

5.4. Comparison of EEMD-ANN, EEMD-SVM and EEMD-ANFIS

Tables 1–3 show that the best validation statistics were obtained when the IMF components and one residual component were considered as the input variables. Also, based on the results of the statistical analysis, it was found that taking the lake level into account for the input variables led to better prediction in terms of the accuracy. Since the two well sites for data collection were close to the lake, groundwater-lake interaction could affect the groundwater level fluctuations. The results showed that the lake level should be considered as an input variable when exogenous factors were used to forecast the groundwater level for those well sites.

The analysis results from the model training and validation stages for the two well sites are listed in Tables 1–3. At the training stage, the RMSE values in the EEMD-ANN, EEMD-SVM and EEMD-ANFIS models for well M1255 were 0.234, 0.206 and 0.162 respectively; the RMSE values in these models for well STL185 were 0.281, 0.360 and 0.249 respectively. The RMSE value of the EEMD-ANFIS model was smaller than those of the other two models in the training stage, which implied that the prediction ability of the EEMD-ANFIS model was better than that of the other two models for the given data. In the validation stage, the RMSE values for the EEMD-ANN, EEMD-SVM and EEMD-ANFIS models for well M1255 were 0.329, 0.315 and 0.360 respectively; the RMSE values of these models for well STL185 were 0.646, 1.009 and 0.606 respectively. For well M1255, the prediction result based on EEMD-ANFIS was close to that obtained from the other two models; for well STL185, the prediction result of EEMD-ANFIS was more accurate than that of the other two models.

Generally, the modeling result is regarded as a perfect estimation when the NS criterion is equal to 1. If the NS criterion is higher than 0.8, the model can be recognized as effective and accurate [48]. The NS values for the EEMD-ANN, EEMD-SVM and EEMD-ANFIS models in the training stage were all greater than 0.8. This indicated that the results from all these hybrid models are acceptable for forecasting groundwater levels. In the validation stage, the NS value for the EEMD-SVM model at site M1255 was greater than those for the EEMD-ANN and EEMD-ANFIS models. The NS value for the EEMD-ANFIS model at site STL185 was higher than those for the EEMD-ANN and EEMD-SVM models in the validation stage. This indicated that EEMD-ANFIS and EEMD-SVM had an overall better estimation quality in comparison with the EEMD-ANN model (see Tables 1–3).

When comparing the RMSE and R values of the EEMD-ANN, EEMD-SVM and EEMD-ANFIS models in the validation, the RMSE value of the EEMD-SVM model at site M1255 was less than that of both the EEMD-ANN and the EEMD-ANFIS model. The RMSE value of the EEMD-ANFIS model at site STL185 was lower than that of both the EEMD-ANN and the EEMD-SVM model (0.606 against 0.646 and 1.009). For the two well sites, EEMD-ANFIS had higher R values compared to the other models. The R and NS values of the EEMD-ANFIS and the EEMD-SVM models in the validation stage were greater than those of the EEMD-ANN model. Therefore, the EEMD-ANFIS and EEMD-SVM models could be good data-driven models in the validation stage (see Tables 1–3).

Figures 14 and 15 illustrate the coefficient of determination (R²) values, corresponding to the predicted values in the scatter plots at the M1255 and STL185 observation wells, using the EEMD-ANN, EEMD-SVM, EEMD-ANFIS, ANN, SVM, ANFIS, ANN+L, SVM+L and ANFIS+L models. The scatter plots revealed the relationships between the predicted and observed groundwater levels for the two observation wells. It can be clearly seen from the scatter plots that the EEMD-ANFIS model forecast the groundwater levels with less scatter for the two observation wells. Figures 14 and 15 show that EEMD-ANFIS had the best fit line compared to the other models. Figure 16a,b show the forecast groundwater levels versus observed groundwater levels using all of the models in the training stage and the validation stage.
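The validation-stage comparison of the three hybrid models can be reproduced from the RMSE and NS values reported in Tables 1–3. The sketch below simply picks the model with the lowest RMSE and the highest NS at each site; the data structure and the helper name `best_model` are illustrative choices, not part of the study.

```python
# Validation-stage RMSE and NS of the hybrid models, copied from Tables 1-3.
validation = {
    "M1255": {
        "EEMD-ANN": {"RMSE": 0.329, "NS": 0.669},
        "EEMD-SVM": {"RMSE": 0.315, "NS": 0.696},
        "EEMD-ANFIS": {"RMSE": 0.360, "NS": 0.605},
    },
    "STL185": {
        "EEMD-ANN": {"RMSE": 0.646, "NS": 0.783},
        "EEMD-SVM": {"RMSE": 1.009, "NS": 0.471},
        "EEMD-ANFIS": {"RMSE": 0.606, "NS": 0.809},
    },
}


def best_model(site_scores, metric, higher_is_better):
    """Return the model name with the best value of the given metric."""
    pick = max if higher_is_better else min
    return pick(site_scores, key=lambda name: site_scores[name][metric])


best = {
    site: {
        "RMSE": best_model(scores, "RMSE", higher_is_better=False),
        "NS": best_model(scores, "NS", higher_is_better=True),
    }
    for site, scores in validation.items()
}
```

Both criteria select EEMD-SVM at M1255 and EEMD-ANFIS at STL185, matching the discussion above.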

Figure 14. Observed and predicted groundwater levels at the M1255 observation well using (a) EEMD-ANN; (b) ANN; (c) ANN+L; (d) EEMD-SVM; (e) SVM; (f) SVM+L; (g) EEMD-ANFIS; (h) ANFIS; (i) ANFIS+L. (x is observed groundwater level, y is predicted groundwater level; both axes in ft.)

Figure 15. Observed and predicted groundwater levels at the STL185 observation well using (a) EEMD-ANN; (b) ANN; (c) ANN+L; (d) EEMD-SVM; (e) SVM; (f) SVM+L; (g) EEMD-ANFIS; (h) ANFIS; (i) ANFIS+L. (x is observed groundwater level, y is predicted groundwater level; both axes in ft.)
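Each scatter-plot panel reports a fitted line y = ax + b and an R² value. A minimal sketch of how such a line and its coefficient of determination can be obtained by ordinary least squares follows. The sample points are made up, and it is an assumption that the panels were produced with this kind of fit.

```python
def fit_line(x, y):
    """Least-squares line y = slope * x + intercept and its R^2."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # For a straight-line fit, R^2 equals the squared correlation coefficient.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Made-up observed and predicted levels in feet, for illustration only.
observed_ft = [24.0, 24.5, 25.0, 25.5, 26.0]
predicted_ft = [24.2, 24.4, 25.1, 25.3, 26.1]
slope, intercept, r_squared = fit_line(observed_ft, predicted_ft)

# Sanity check on points lying exactly on y = 2x + 1.
exact = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

A slope near 1, an intercept near 0 and a high R² together indicate predictions that track the observations closely; a high R² alone does not rule out a systematic bias.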

According to the analysis, the R² values at site M1255 indicated that EEMD-ANFIS performed better than the other models, although the RMSE for EEMD-SVM was lower than that for EEMD-ANFIS. The R² values of the three hybrid models (i.e., those coupled with EEMD) were higher than those of the other models. Therefore, both the EEMD-ANFIS and EEMD-SVM models can be considered good data-driven models at site M1255. Figure 15 shows that the R² value of EEMD-ANFIS at site STL185 was nearly equal to that of ANFIS+L and slightly higher than that of EEMD-ANN. The RMSE for EEMD-ANFIS was lower than that for EEMD-ANN and ANFIS+L. Thus, the EEMD-ANFIS model can be considered the best estimation model at site STL185. At both observation sites, the forecasts obtained from the three hybrid models were of better quality than those from the models not coupled with EEMD.

Box plots are an important tool for checking whether the data-driven models can forecast these variations and for examining the corresponding prediction errors. They depict the quartile values of the groundwater level prediction error. Figures 17 and 18 show that the median of the training errors is close to zero, which indicates good performance of the data-driven models in the training stage. Figures 17 and 18 also compare the errors of the EEMD-ANN, EEMD-SVM, EEMD-ANFIS, ANN, SVM, ANFIS, ANN+L, SVM+L and ANFIS+L models in the training period and the validation period. The box plots indicated that EEMD-ANFIS was the most accurate model in the training stage. In the validation stage, the result of EEMD-ANFIS for well M1255 was close to those of EEMD-SVM and EEMD-ANN, whereas for well STL185, EEMD-ANFIS predicted the groundwater level more accurately than EEMD-SVM and EEMD-ANN.

Overall, the performance of the EEMD-ANFIS, EEMD-SVM and EEMD-ANN models was superior to that of the other models. The results of the ANFIS+L, SVM+L and ANN+L models were better than those of the ANFIS, SVM and ANN models in terms of R. This indicates that lake level fluctuation is an important input variable for predicting the groundwater level in the near-lake area. Comparing the results of the EEMD-ANFIS, EEMD-ANN and EEMD-SVM models, each of the three had its own advantages and disadvantages. The selection of the prediction model should balance the statistical parameters (e.g., R and RMSE) in both the training period and the validation period. The prediction results of EEMD-ANFIS were close to those of the EEMD-SVM and EEMD-ANN models at site M1255, and slightly more accurate than those of the EEMD-SVM and EEMD-ANN models at site STL185. The results of the application suggested that the EEMD-ANFIS, EEMD-SVM and EEMD-ANN models were feasible and effective. Also, EEMD can be used to improve the accuracy of predicting nonlinear and nonstationary time series. The results of this study are consistent with those reported in [22,26,29].
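The model comparison above relies on statistics such as R, RMSE and NS evaluated separately for the training and validation periods. The following is a minimal sketch of how such statistics can be computed, assuming the standard textbook definitions (which may differ in detail from the formulas used in this study). The function name `forecast_metrics` and the sample values are hypothetical and are not data from the study.

```python
import math


def forecast_metrics(observed, predicted):
    """Return R, RMSE and NS for paired observed/predicted values.

    Assumed (standard) definitions:
      R    = Pearson correlation coefficient
      RMSE = sqrt(mean squared error)
      NS   = 1 - SSE / sum of squared deviations of the observations
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return {
        "R": cov / math.sqrt(ss_o * ss_p),
        "RMSE": math.sqrt(sse / n),
        "NS": 1.0 - sse / ss_o,
    }


# Hypothetical groundwater levels (ft) for two periods.
periods = {
    "training": ([24.0, 24.5, 25.0, 25.5], [24.1, 24.4, 25.1, 25.4]),
    "validation": ([23.0, 24.0, 25.0, 26.0], [23.4, 24.1, 24.8, 25.5]),
}
for name, (obs, pred) in periods.items():
    print(name, forecast_metrics(obs, pred))
```

Evaluating the same function on both periods makes it easy to see when a model fits the training data well but degrades in validation, which is the balance the text recommends checking.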

Figure 16. Model predictions versus observed data at (a) M1255 and (b) STL185. (Horizontal axis: time in months, 0 to 192, divided into training and validation periods; series: Observed, ANN, SVM, ANFIS, ANN+L, SVM+L, ANFIS+L, EEMD-ANN, EEMD-SVM, EEMD-ANFIS.)

Figure 17. Box plots of the prediction error (ft) at site M1255 for the ANN, SVM, ANFIS, ANN+L, SVM+L, ANFIS+L, EEMD-ANN, EEMD-SVM and EEMD-ANFIS models: (a) training (Tr); (b) validation (Va).

Figure 18. Box plots of the prediction error (ft) at site STL185: (a) training (Tr); (b) validation (Va).
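The quantities drawn in box plots such as those in Figures 17 and 18 can be computed directly from the prediction errors. The following is a minimal sketch, assuming the error is defined as predicted minus observed (the sign convention is an assumption) and using the inclusive quartile method of Python's `statistics` module, which may differ slightly from the quartile rule of the plotting software used in the study. The function name `error_box_stats` and the sample values are hypothetical.

```python
import statistics


def error_box_stats(observed, predicted):
    """Five-number summary of prediction errors (predicted minus observed)."""
    errors = [p - o for o, p in zip(observed, predicted)]
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    return {
        "min": min(errors),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(errors),
    }


# Hypothetical groundwater levels in ft, not values from the study.
observed = [24.0, 24.5, 25.0, 25.5, 26.0]
predicted = [24.2, 24.4, 25.0, 25.3, 26.1]
stats = error_box_stats(observed, predicted)
print(stats)
```

A median near zero with a narrow interquartile range (q3 minus q1) corresponds to the behaviour the text describes as good performance.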


6. Conclusions

The reliable and accurate estimation of groundwater level fluctuation is essential for managing water resources and improving water-use efficiency. In this study, the prediction capability of the ANN, SVM and ANFIS models based on ensemble empirical mode decomposition (EEMD) was investigated using monthly groundwater level data collected at the M1255 and STL185 observation wells. The statistical parameters R, NMSE, RMSE, NS and AIC were used to assess the performance of the EEMD-ANN, EEMD-SVM and EEMD-ANFIS models. The results from these models were analyzed and compared with the results from the ANN, SVM and ANFIS models.

The values of the statistical parameters indicated that the EEMD-ANN, EEMD-SVM and EEMD-ANFIS models achieved better prediction results than the ANN, SVM and ANFIS models. The average R value of the three hybrid models was higher than that of the ANN, SVM and ANFIS models, and their average RMSE value was lower. The results of this study suggested that EEMD can effectively enhance prediction accuracy and could significantly improve the performance of the ANN, SVM and ANFIS data-driven models in groundwater level forecasting.

The three proposed hybrid models based on EEMD had several clear advantages: (a) it was convenient and effective to combine EEMD with the ANN, SVM and ANFIS models to forecast nonstationary and nonlinear groundwater level fluctuations; (b) the hybrid models required only the groundwater level time series, so exogenous factors affecting the groundwater level in the research area did not need to be considered; and (c) the prediction results of the EEMD-ANN, EEMD-SVM and EEMD-ANFIS models were more accurate when the groundwater level time series was decomposed.
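The hybrid idea summarized above can be illustrated with a decompose, forecast and recombine pattern. The following is only a sketch of that pattern under loudly stated assumptions: a centered moving average stands in for EEMD (it is not EEMD), a least-squares AR(1) fit stands in for the ANN, SVM or ANFIS model, and each component is assumed to be forecast separately with the forecasts then summed, which is a common arrangement for such hybrids but is not confirmed here as the study's exact configuration. All function names and sample values are hypothetical.

```python
def moving_average(series, window):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out


def decompose(series, window=5):
    """Stand-in for EEMD: a residual and a smooth trend that sum to the series."""
    trend = moving_average(series, window)
    residual = [x - t for x, t in zip(series, trend)]
    return [residual, trend]


def ar1_forecast(component):
    """One-step-ahead forecast from a least-squares fit x[t] = a * x[t-1] + b."""
    if len(component) < 2:
        return component[-1]
    x_prev, x_next = component[:-1], component[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    var = sum((p - mean_prev) ** 2 for p in x_prev)
    if var == 0.0:
        return component[-1]  # constant component: fall back to persistence
    a = sum((p - mean_prev) * (q - mean_next)
            for p, q in zip(x_prev, x_next)) / var
    b = mean_next - a * mean_prev
    return a * component[-1] + b


def hybrid_forecast(series, window=5):
    """Forecast each component separately, then sum the component forecasts."""
    return sum(ar1_forecast(c) for c in decompose(series, window))


# Hypothetical monthly levels (ft): a slow rise plus a repeating 4-month pattern.
levels = [22.0 + 0.05 * i + 0.3 * (i % 4) for i in range(48)]
print(hybrid_forecast(levels))
```

Only the level series itself is needed as input, which mirrors advantage (b); swapping in a real EEMD routine and a nonlinear learner would change the two stand-in functions but not the overall structure.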
Therefore, this study supported the validity and applicability of the EEMD-ANN, EEMD-SVM and EEMD-ANFIS models in the prediction of groundwater levels. The results of this research should be beneficial for sustainable water resource management.

Author Contributions: Y.G. wrote the manuscript; Z.W., G.X. and Z.Z. revised and proofread the manuscript.

Acknowledgments: This study was supported by the National Key Research and Development Program (No. 2016YFC0402900, 2017YFC0403501), Key Research and Development Program of Qinghai Province (No. 2017-SF-116), National Natural Science Foundation of China (No. 41671020) and China Postdoctoral Science Foundation (No. 2017M610909). The authors gratefully acknowledge the helpful and insightful comments of the editor and anonymous reviewers.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Chang, F.J.; Chang, L.C.; Huang, C.W.; Kao, I.F. Prediction of monthly regional groundwater levels through hybrid soft-computing techniques. J. Hydrol. 2016, 541, 965–976.
2. Chang, J.; Wang, G.; Mao, T. Simulation and prediction of suprapermafrost groundwater level variation in response to climate change using a neural network model. J. Hydrol. 2015, 529, 1211–1220.
3. Emamgholizadeh, S.; Moslemi, K.; Karami, G. Prediction the groundwater level of Bastam Plain (Iran) by artificial neural network (ANN) and adaptive neuro-fuzzy inference system (ANFIS). Water Resour. Manag. 2014, 28, 5433–5446.
4. Yoon, H.; Hyun, Y.; Ha, K.; Lee, K.-K.; Kim, G.-B. A method to improve the stability and accuracy of ANN- and SVM-based time series models for long-term groundwater level predictions. Comput. Geosci. 2016, 90, 144–155.
5. Yoon, H.; Jun, S.C.; Hyun, Y.; Bae, G.O.; Lee, K.K. A comparative study of artificial neural networks and support vector machines for predicting groundwater levels in a coastal aquifer. J. Hydrol. 2011, 396, 128–138.
6. Fallah-Mehdipour, E.; Bozorg Haddad, O.; Mariño, M.A. Prediction and simulation of monthly groundwater levels by genetic programming. J. Hydro-Environ. Res. 2013, 7, 253–260.

7. Nourani, V.; Alami, M.T.; Vousoughi, F.D. Wavelet-entropy data pre-processing approach for ANN-based groundwater level modeling. J. Hydrol. 2015, 524, 255–269.
8. Suryanarayana, C.; Sudheer, C.; Mahammood, V.; Panigrahi, B.K. An integrated wavelet-support vector machine for groundwater level prediction in Visakhapatnam, India. Neurocomputing 2014, 145, 324–335.
9. Jiang, Q.; Tang, Y. A general approximate method for the groundwater response problem caused by water level variation. J. Hydrol. 2015, 529, 398–409.
10. Shirmohammadi, B.; Vafakhah, M.; Moosavi, V.; Moghaddamnia, A. Application of several data-driven techniques for predicting groundwater level. Water Resour. Manag. 2013, 27, 419–432.
11. Moosavi, V.; Vafakhah, M.; Shirmohammadi, B.; Behnia, N. A wavelet-ANFIS hybrid model for groundwater level forecasting for different prediction periods. Water Resour. Manag. 2013, 27, 1301–1321.
12. Shiri, J.; Kişi, Ö. Comparison of genetic programming with neuro-fuzzy systems for predicting short-term water table depth fluctuations. Comput. Geosci. 2011, 37, 1692–1701.
13. Valipour, M.; Banihabib, M.E.; Behbahani, S.M.R. Comparison of the ARMA, ARIMA, and the autoregressive artificial neural network models in forecasting the monthly inflow of Dez dam reservoir. J. Hydrol. 2013, 476, 433–441.
14. Xu, T.; Valocchi, A.J.; Choi, J.; Amir, E. Use of machine learning methods to reduce predictive error of groundwater models. Ground Water 2014, 52, 448–460.
15. Asefa, T.; Kemblowski, M.; Urroz, G.; Mckee, M. Support vector machines (SVMs) for monitoring network design. Ground Water 2005, 43, 413–422.
16. White, S.; Robinson, J.; Cordell, D.; Jha, M.; Milne, G. Urban Water Demand Forecasting and Demand Management: Research Needs Review and Recommendations; Water Services Association of Australia: Melbourne, Australia, 2003.
17. Wang, Y.-H.; Yeh, C.H.; Young, H.W.V.; Hu, K.; Lo, M.T. On the computational complexity of the empirical mode decomposition algorithm. Phys. A Stat. Mech. Appl. 2014, 400, 159–167.
18. Ghalehkhondabi, I.; Ardjmand, E.; Young, W.A., II; Weckman, G.R. Water demand forecasting: Review of soft computing methods. Environ. Monit. Assess. 2017, 189, 313.
19. Wu, Z.; Huang, N.E. Ensemble empirical mode decomposition: A noise-assisted data analysis method. Adv. Adapt. Data Anal. 2009, 1, 1–41.
20. Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.C.; Shih, H.H.; Zheng, Q.; Yen, N.C.; Chi, C.T.; Liu, H.H. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. Math. Phys. Eng. Sci. 1998, 454, 903–995.
21. Wang, W.C.; Chau, K.W.; Xu, D.M.; Chen, X.Y. Improving forecasting accuracy of annual runoff time series using ARIMA based on EEMD decomposition. Water Resour. Manag. 2015, 29, 2655–2675.
22. Wang, W.C.; Chau, K.W.; Qiu, L.; Chen, Y.B. Improving forecasting accuracy of medium and long-term runoff using artificial neural network based on EEMD decomposition. Environ. Res. 2015, 139, 46–54.
23. Wang, W.-C.; Xu, D.-M.; Chau, K.-W.; Chen, S. Improved annual rainfall-runoff forecasting using PSO–SVM model based on EEMD. J. Hydroinform. 2013, 15, 1377–1390.
24. Liu, H.; Tian, H.-Q.; Li, Y.-F. Comparison of new hybrid FEEMD-MLP, FEEMD-ANFIS, wavelet packet-MLP and wavelet packet-ANFIS for wind speed predictions. Energy Convers. Manag. 2015, 89, 1–11.
25. Duan, W.Y.; Han, Y.; Huang, L.M.; Zhao, B.B.; Wang, M.H. A hybrid EMD-SVR model for the short-term prediction of significant wave height. Ocean Eng. 2016, 124, 54–73.
26. Ausati, S.; Amanollahi, J. Assessing the accuracy of ANFIS, EEMD-GRNN, PCR, and MLR models in predicting PM 2.5. Atmos. Environ. 2016, 142, 465–474.
27. Napolitano, G.; Serinaldi, F.; See, L. Impact of EMD decomposition and random initialisation of weights in ANN hindcasting of daily stream flow series: An empirical examination. J. Hydrol. 2011, 406, 199–214.
28. Karthikeyan, L.; Nagesh Kumar, D. Predictability of nonstationary time series using wavelet and EMD based ARMA models. J. Hydrol. 2013, 502, 103–119.
29. Hawinkel, P.; Swinnen, E.; Lhermitte, S.; Verbist, B.; Van Orshoven, J.; Muys, B. A time series processing tool to extract climate-driven interannual vegetation dynamics using ensemble empirical mode decomposition (EEMD). Remote Sens. Environ. 2015, 169, 375–389.

30. Humeau-Heurtier, A.; Abraham, P.; Mahe, G. Analysis of laser speckle contrast images variability using a novel empirical mode decomposition: Comparison of results with laser Doppler flowmetry signals variability. IEEE Trans. Med. Imaging 2015, 34, 618–627.
31. Colominas, M.A.; Schlotthauer, G.; Torres, M.E. An unconstrained optimization approach to empirical mode decomposition. Digit. Signal Process. 2015, 40, 164–175.
32. Haykin, S. Neural Networks: A Comprehensive Foundation; Prentice Hall: Upper Saddle River, NJ, USA, 1998.
33. ASCE. Artificial neural networks in hydrology. I: Preliminary concepts. J. Hydrol. Eng. 2000, 5, 115–123.
34. ASCE. Artificial neural networks in hydrology. II: Hydrologic applications. J. Hydrol. Eng. 2000, 5, 124–137.
35. Gong, Y.; Zhang, Y.; Lan, S.; Wang, H. A comparative study of artificial neural networks, support vector machines and adaptive neuro fuzzy inference system for forecasting groundwater levels near Lake Okeechobee, Florida. Water Resour. Manag. 2016, 30, 375–391.
36. Vapnik, V.N. Statistical Learning Theory; Wiley: New York, NY, USA, 1998.
37. Vapnik, V.N. The Nature of Statistical Learning Theory; Springer: New York, NY, USA, 1995.
38. Platt, J.C. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods; MIT Press: Cambridge, MA, USA, 1999.
39. Scholkopf, B.; Smola, A.J. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond; MIT Press: Cambridge, MA, USA, 2001.
40. Chang, C.C.; Lin, C.J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2011, 2, 1–27.
41. Jang, J.S.R. ANFIS: Adaptive network based fuzzy inference system. IEEE Trans. Syst. Man Cybern. 1993, 23, 665–685.
42. Jang, J.S.R.; Sun, C.T.; Mizutani, E. Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence; Prentice Hall: Upper Saddle River, NJ, USA, 1996.
43. Valizadeh, N.; El-Shafie, A. Forecasting the level of reservoirs using multiple input fuzzification in ANFIS. Water Resour. Manag. 2013, 27, 3319–3331.
44. Akrami, S.A.; El-Shafie, A.; Jaafar, O. Improving rainfall forecasting efficiency using modified adaptive neuro-fuzzy inference system (MANFIS). Water Resour. Manag. 2013, 27, 3507–3523.
45. Nourani, V.; Komasi, M. A geomorphology-based ANFIS model for multi-station modeling of rainfall–runoff process. J. Hydrol. 2013, 490, 41–55.
46. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, AC-19, 716–723.
47. The Math Works; Matlab; Version 9.0.0; The Math Works Inc.: Natick, MA, USA. Available online: http://www.mathworks.com (accessed on 3 March 2016).
48. Shu, C.; Ouarda, T.B.M.J. Regional flood frequency analysis at ungauged sites using the adaptive neuro-fuzzy inference system. J. Hydrol. 2008, 349, 31–43.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).