A Comparative Study of Artificial Neural Networks ... - Springer Link

3 downloads 3955 Views 858KB Size Report
Nov 19, 2015 - near Lake Okeechobee, Florida. Yicheng ... College of Architecture and Civil Engineering, Beijing University of Technology, Beijing 100124,.
Water Resour Manage (2016) 30:375–391 DOI 10.1007/s11269-015-1167-8

A Comparative Study of Artificial Neural Networks, Support Vector Machines and Adaptive Neuro Fuzzy Inference System for Forecasting Groundwater Levels near Lake Okeechobee, Florida Yicheng Gong 1,2 & Yongxiang Zhang 1 & Shuangshuang Lan 1,3 & Huan Wang 4

Received: 14 April 2015 / Accepted: 19 October 2015 / Published online: 19 November 2015 # Springer Science+Business Media Dordrecht 2015

Abstract Accurate and reliable prediction of groundwater level is essential for water resource development and management. This study was carried out to test the validity of three nonlinear time-series intelligence models, artificial neural networks (ANN), support vector machines (SVM) and adaptive neuro fuzzy inference system (ANFIS) in the prediction of the groundwater level when taking the interaction between surface water and groundwater into consideration. These three models were developed and applied for two wells near Lake Okeechobee in Florida, United States. 10 years data-sets including hydrological parameters such as precipitation (P), temperature (T), past groundwater level (G) and lake level (L) were used as input data to forecast groundwater level. Five quantitative standard statistical performance evaluation measures, correlation coefficient (R), normalized mean square error (NMSE), root mean squared error (RMSE), Nash-Sutcliffe efficiency coefficient (NS) and Akaike information criteria (AIC), were employed to evaluate the performances of these models. The conclusions achieved from this research would be beneficial to the water resources management, it proved the necessity and effect of considering the surface water-groundwater

* Yicheng Gong [email protected] * Yongxiang Zhang [email protected] 1

Key Laboratory of Beijing for Water Quality Science and Water Environment Recovery Engineering, College of Architecture and Civil Engineering, Beijing University of Technology, Beijing 100124, China

2

School of Earth Sciences, Ohio State University, Columbus, OH 43210, USA

3

Key Laboratory of Subsurface Hydrology and Ecological Effect in Arid Region of Ministry of Education, Xi’an 710054, China

4

Department of Water Resource, China Institute of Water Resource and Hydropower Research, Beijing 100038, China

376

Y. Gong et al.

interaction in the prediction of groundwater level. These three models were proved applicable to the prediction of groundwater level one, two and three months ahead for the area that is close to the surface water, for example, the lake area. The models using P, T, G and L achieved better prediction result than that using P, T and G only. At the same time, results from ANFIS and SVM models were more accurate than that from ANN model. Keywords Groundwater level . Artificial neural network . Support vector machine . Adaptive neuro fuzzy inference system

1 Introduction Groundwater is an important water resource for domestic, agricultural, and industrial activities in many countries. More accurate prediction of groundwater level could help avoid overexploiting groundwater and assist water resource management. However, the prediction of groundwater is very complex and highly nonlinear in nature as it depends on many complex factors such as precipitation, temperature, etc. Therefore, it is imperative to develop effective models to precisely predict groundwater level (Verma and Singh 2013). Many different models such as numerical groundwater models, nonlinear empirical models and data driven models have been used to predict groundwater level (Emamgholizadeh et al. 2014; Sun and Xu, 2011; Sun et al., 2006). Numerical groundwater models is required and important in establishing the governing equation in the subsurface, assigning physical properties of the domain and model parameters and calibrating the model simulation. It is difficult to analyze the geologic structure and set the geological parameters. Also, it is hard to obtain sufficient data from long time series for numerical model development (Yoon et al. 2011). Recently, new data driven models such as artificial neural network (ANN), support vector machines (SVM) and adaptive neuro-fuzzy inference system (ANFIS) have been proved efficient in predicting for complex hydrologic system (Emamgholizadeh et al. 2014; Güldal and Tongal 2010; Sreekanth et al. 2010). In hydrologic research field, ANN model has been well developed and applied to predication of nonlinear problems such as precipitation (Nastos et al. 2014), sediment load (Afan et al. 2015), river flow (He et al. 2014), chlorophyll-a levels (Cho et al. 2014), regional index flood (Latt et al. 2014), etc. Nastos et al. (2014) used ANN model to evaluate the potential of daily extreme precipitation at Athens, Greece. The results of proved that the ANN model is successful and feasible for the assessment of daily extreme precipitation. Afan et al. (2015) used two different ANN algorithms, the feed forward neural network (FFNN) and radial basis function (RBF) to estimate the daily sediment load from Rantau Panjang station on Johor River. The results indicated that the FFNN model has superior performance than the RBF model. He et al. (2014) developed ANN, ANFIS and SVM models for forecasting river flow with complicated topography in the semiarid mountain regions. The results suggested that three models can be successfully applied to estimate river flow with complicated topography and SVM model performed better than ANN and ANFIS models. Cho et al. (2014) used ANN model to predict chlorophyll-a levels in reservoir formed by damming a river. It was concluded that ANN trained with the time series data can provide information regarding the principal factors affecting algal bloom at Lake Juam and successfully predict the Chl-a concentration in reservoir. Latt et al. (2014) developed ANN model to estimate regional index flood. The result showed that ANN model can capture the nonlinear relationships between the index floods and the catchment and is superior to the conventional regression method.

A comparative study of artificial neural networks, support vector

377

SVM model is also as efficient as ANN in modeling nonlinear systems. SVM model has been used by many researchers to solve hydrology, hydrogeology problems. For example, Tabari et al. (2012) used SVM, ANFIS, regression and climate based models to estimate reference evapotranspiration using limited climatic data in a semi-arid highland environment. The results achieved with the SVM and ANFIS models for evapotranspiration estimation are outperform than those obtained using the regression. Yoon et al. (2011) applied SVM and ANN models to predict groundwater levels in a coastal aquifer. The result showed that prediction ability of SVM model is similar to or even better than those of the ANN model in prediction stage. Noori et al. (2011) used PCA, Gamma test, and forward selection techniques to evaluate support vector machine performance for predicting monthly stream flow. The result of PCA-SVM is superior to that of GT-SVM and PCA-ANN models for forecasting monthly stream flow. Çimen and Kisi (2009) used ANN model and SVM model to predict water level of Lake Van in Turkey and found that the SVM based model performs better than the ANN. Ch et al. (2013) constructed SVM model with quantum behaved particles warm optimization in predicting monthly stream flow and found that SVM-QPSO is far better technique for forecasting monthly streamflow. Meanwhile, SVM model can provide a high degree of accuracy and reliability. Wen et al. (2015) used SVM model to evaluate daily reference evapotranspiration using limited climatic data. The results showed that the performance of SVM method is the best among ANN and three empirical models including Priestley-Taylor, Hargreaves, and Ritchie models. ANFIS model with high abilities can be suitable for modeling non-linear dynamic hydrological systems. ANFIS models have been developed and applied to predict diverse water resources variables for an effective water management. Talebizadeh and Moridnejad (2011) compared ANFIS model with ANN model in forecasting lake level fluctuations. ANFIS model was proved superior to ANN model in terms of efficiency. Hipni et al. (2013) used ANN and ANFIS for forecasting of daily dam water levels. Quantitative standard statistical performance values proved that SVM is a superior model to ANFIS for predicting dam water levels. Awan and Bae (2014) used ANFIS to predict long-term dam inflow and got the conclusion that the prediction for all selected dams using the ANFIS model with categorical rainfall forecast was better than the ANFIS model with only preceding month’s dam inflow and weather data. Goyal et al. (2014) improved the accuracy of daily pan evaporation estimation in subtropical climates using machine learning models including ANFIS, LS-SVR, Fuzzy Logic, and ANN. The results showed that theses machine learning models outperform the traditional HGS and SS empirical methods. ANN, SVM and ANFIS data driven models were applied in formal hydrology studies. The interaction between groundwater and surface water is an important factor and would apparently affect the groundwater level fluctuation in the research area that is close to surface water, such as lake and river (Ala-aho et al. 2015; Martinez et al. 2015; O’Connor and Moffett 2015). In this study, ANN, SVM and ANFIS nonlinear time-series intelligence models were developed and used to predict groundwater level fluctuations from research area near lake, which is mainly aimed at proving the necessity and effect of considering groundwater and surface water interaction in groundwater level prediction. At the same time, the validity of the three developed models in these conditions were tested. The data used include monthly precipitation (P), temperature (maximum, mean and minimum) (T), past groundwater level (G) as well as lake level (L) data. Meanwhile, the result of groundwater level prediction taking lake level fluctuation into account is compared with that not considering the fluctuation. The predictions from ANN, SVM and ANFIS models are compared with the observed values and evaluated based on quantitative standard statistical analysis.

378

Y. Gong et al.

2 Methods 2.1 Artificial Neural Network Artificial neural network (ANN) is a simplified model of biological neuron system consisting of a massive parallel distributed information processing system that has certain performance characteristics resembling biological neural networks of the human brain (Haykin 1999; Samarasinghe 2006). The architecture of ANN neural network, which is also called a multilayer perceptron network (MLPN), consists of an input layer, one or more hidden layer, an output layer and a layer comprising one or more artificial neurons. Feed-forward back propagation neural networks (FFNN) is one of the simplest artificial neural networks and has been successfully used for modeling and predicting in the earth sciences (ASCE 2000a, b). The mathematical expression of the MLPN feed-forward process is described as follows: ! N X ð1Þ wji xi þ b j yj ¼ f i¼1

where xi is the ith nodal value in the previous layer, yj is the jth nodal value in the present layer, bi is the bias of the jth node in the present layer, wji is a weight connecting xi and yj, Nis the number of nodes in the previous layer, and f is the activation function in the present layer. Since a neural network problem is solved according to the selected training algorithm, algorithm that provides the best fit to the data is required in prediction. Different training algorithms involved in this study were Levenberg-Marquardt (LM), Bayesian regularization (BR), Scaled conjugate gradient (SCG), Gradient descent with momentum and adaptive learning rate (GDX), etc. (ASCE 2000a, b).

2.2 Support Vector Machine Support vector machine (SVM) is a relatively new machine-learning approach in data-driven research fields based on statistical learning theory (Vapnik 1995; Vapnik 1998). The process of an SVM estimator (f) in regression can be expressed as follows: f ðxÞ ¼ w⋅ϕðxÞ þ b

ð2Þ

wherewi is a weight vector, and b is a bias. ϕ denotes a nonlinear transfer function that maps the input vectors into a high-dimensional feature space in which theoretically a simple linear regression can cope with the complex nonlinear regression of the input space. Vapnik (1995) introduced the convex dual optimization problem with an ε-insensitivity loss function to obtain the solution. Many algorithms have been suggested for solving the dual optimization problem of the SVM. An overview of these algorithms is found in (Shevade et al. 2000; Scholkopf and Smola 2002). In the present research, the sequential minimal optimization (SMO) algorithm, introduced by (Platt 1999; Scholkopf and Smola 2002), has been employed to solve this dual optimization problem. The main advantage of the SMO is that an analytical solution of a subset can be obtained directly without invoking a quadratic optimizer. The model parameters of the SVM are trained by SMO. The calibration and prediction were performed using the programming codes of the Library for Support Vector Machines (LIBSVM) (Chang and Lin 2011). Figure 1 shows the SVM models schematic representation.

A comparative study of artificial neural networks, support vector Support Vectors

Input Vector

x

379

Kernel Funcon

x1

K ( x, x1 )

x1

K ( x, x1 )

x1

K ( x, x1 )

x1

K ( x, x1 )

Output Layer

f (x ) Output

Fig. 1 Schematic diagram of SVM

2.3 Adaptive Neuro Fuzzy Inference System Adaptive neuro fuzzy inference system (ANFIS), first introduced by (Jang 1993; Jang et al. 1997), is a combination of an adaptive neural network and a fuzzy inference system. ANFIS is a universal approximation methodology and is capable of approximating any real continuous function on a compact set to any degree of accuracy. ANFIS used in this study is the first-order Sugeno fuzzy model (Jang 1993). The Sugeno’s fuzzy structure of ANFIS model is consist of five layers and is given in Fig. 2. The more comprehensive presentation of ANFIS for modeling nonlinear phenomena can be found in the literature (e.g., evaporation modeling, (Moghaddamnia et al. 2009); groundwater level prediction, (Shiri and Kişi 2011); rainfall-runoff modeling, (Vernieuwe et al. 2005)).

3 Study Area and Available Data 3.1 Study Site and Data Preprocessing The study site is located on the northeast shore of Lake Okeechobee, Florida, United States. The local average annual temperature and precipitation in the study area are 23.3 °C and 828.3 mm over past 10 years, respectively. The precipitation, temperature and lake level are considered as exogenous factors affecting the groundwater level in this area. Monthly precipitation and temperature Layer1

Layer2

Layer4

Layer3

x y

A1

x

A2

Layer5



w1

N

w1

w1 f1

f

B1

Y

B2

Fig. 2 Schematic diagram of ANFIS



w2

N

w2 f 2

w2

x y

380

Y. Gong et al.

data were obtained from National Oceanic and Atmospheric Administration (NOAA). Monthly lake level data of Lake Okeechobee was collected from South Florida Water Management District. Monthly groundwater data was obtained from United States Geological Survey (USGS). In this study, monthly time-series data of the precipitation, temperature (maximum, mean and minimum), lake level and groundwater level are used to forecast future groundwater fluctuations on well M1048 and STL313. The locations of the observed wells in the study are shown in Fig. 3. From the observed data during 12 years (from 1998 to 2009), the first 10 years data are used for training and the second 2 years data are used for validation. Figure 4a-d illustrates the monthly time-series data of precipitation, mean temperature, lake level and groundwater level at well M1048. Time-series data is supposed to be normalized by Eq. (3), so the variables in the training data are scaled to a limit between 0 and 1. Y ¼

X −X min X max −X min

ð3Þ

whereY is the normalized data, Xis the time-series data, Xmin is the minimum value of the time-series data and Xmax is maximum values of time-series data. Precipitation (P), temperature (T) (maximum, mean and minimum), lake level (L) and groundwater level (G) are considered as input variables of ANN, SVM and ANFIS models. Statistical methods such as the autocorrelation function (ACF) and the partial autocorrelation function (PACF) are generally employed for selecting appropriate data-driven models (Lin et al. 2006). The ACF and PACF of well number M1048 from lag-0 to lag-13 are presented in Fig. 5. Figure 5 suggests a significant correlation up to lag-2 month for this time series at 95 % confidence level interval. The partial autocorrelation coefficients indicate incorporating monthly groundwater level data up to 2 month lag in input vector to ANN, SVM and ANFIS models.

3.2 Performance Criteria Five statistical parameters are used to evaluate the effectiveness of the ANN, SVM and ANFIS models in this study. Correlation coefficient (R) is defined as the degree of correlation between

Fig. 3 Location map of United States and observation wells

A comparative study of artificial neural networks, support vector

(a)

381

(b) 45

700

40

600

35

Precipitation (mm)

500 30 400

25

300

20 15

200 10 100

5

0

0

Jan-98

Jan-00 Jan-02

Dec-09

Jan-08

Jan-06

Jan-04

(c)

Jan-98

Jan-04

Jan-06

Jan-08

Dec-09

Jan-00 Jan-02

Jan-04

Jan-06

Jan-08

Dec-09

(d) 15

30

25

20

Well Level (ft)

20

Lake Level (ft)

Jan-00 Jan-02

15

10

25

30

35

5

0

40

Jan-98

Jan-00 Jan-02

Dec-09

Jan-08

Jan-06

Jan-04

Jan-98

Fig. 4 Plots of a Precipitation b Mean Temperature c Lake Level d Well Level

the predicted and observed values:

  XN    O − O P − P i i i¼1 R ¼ rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi    XN   2X N  2 Oi − O Pi − P i¼1 i¼1

ð4Þ

Normalized Mean Square Error (NMSE) can be calculated as follows: XN ðOi −Pi Þ2 n−1 i¼1 NMSE ¼ 2 n XN  Oi −O i¼1 Partial Autocorrelation Function

Autocorrelation Function

1

1

Partial Autocorrelations

Autocorrelation

95% Confidence band 0.5

0

-0.5

ð5Þ

95% Confidence band 0.5

0

-0.5 1

2

3

4

5

6

7 Lag

8

9

10

11

12

13

1

2

3

4

5

6

7 Lag

8

9

10

11

12

13

Fig. 5 The autocorrelation function (ACF) and the partial autocorrelation function (PACF) of groundwater level series at site M1048

382

Y. Gong et al.

Root Mean Squared Error (RMSE) can be calculated as follows: vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi uX N u ðOi −Pi Þ t i¼1 RMSE ¼ N

ð6Þ

Nash-Sutcliffe efficiency coefficient (NS) can be calculated as follows: XN

2

ðOi −Pi Þ NS ¼ 1− X i¼1 2 N Oi −O i¼1

ð7Þ

The Akaike Information Criteria (AIC) can be calculated as follows:  X 2 N 1 AIC ðk Þ ¼ N ⋅In ð O −P Þ þ 2k i i i¼1 N

ð8Þ

where Oi is observed value, Pi is predicted value,Ois the average of the observed value,Pis the average of the predicted value, N is the total number of values, and k is the number of free parameters used in models. The Akaike Information Criteria (AIC) were used for selecting the best time series mode (Akaike 1974). The best fit between predicted value and observed value would have R = 1, NMSE = 0, RMSE = 0, NS = 1, AIC = 0, respectively.

4 Results and Discussion 4.1 ANN Model Selecting of training algorithm and the number of hidden nodes which affect the performance of ANN model are important. Levenberg-Marquardt (LM), Bayesian regularization (BR), Scaled conjugate gradient (SCG), Gradient descent with momentum and adaptive learning rate (GDX) are used to train the time-series data and the best algorithm can be selected according to the R and RMSE. The results presented in Table 1 show that BR is better than other algorithms, which is used in this study for groundwater level forecasting. The optimal Table 1 R and RMSE of different training algorithms of 10 neurons at site M1048 for lag time 1 Training Algorithm

Training R

P+T+G+L

P+T+G

Validation RMSE

R

RMSE

BR

0.949

0.575

0.918

1.163

GDX

0.926

0.687

0.895

1.132

LM

0.926

0.688

0.894

1.316

SCG

0.950

0.574

0.880

1.415

BR

0.949

0.570

0.867

1.296

GDX

0.938

0.634

0.855

1.318

LM SCG

0.963 0.954

0.495 0.550

0.741 0.791

1.950 1.789

A comparative study of artificial neural networks, support vector

383

number of neurons in the hidden layer is identified using the trial and error procedure by varying the number of hidden neurons from 5 to 50. The one with minimum RMSE is selected as the optimal network. The effect of changing the number of hidden neurons on the RMSE of the time series data is shown in Fig. 6. For investigation of the effects of input structure and input data on the performance of FFNN model, two input structures and three lag times are considered, as is shown in Table 2 and Table 3. The results show that the 1 lead time of four input variables has the minimum RMSE and AIC values in the validation. The minimum RMSE and AIC at site M1048 are 1.163 and 7.246, respectively. The minimum RMSE and AIC at site STL313 are 0.761 and −13.091, respectively.

4.2 SVM Model The same input structures and lag times are introduced to SVM model, the results are shown in Table 4 and Table 5. The results show that the 1 lead time of four input variables at site M1048 and 3 lead time of four input variables at site STL313 have the minimum RMSE and AIC values in the validation. At site M1048, the minimum RMSE and AIC are 1.087 and 4.019, respectively; at site STL313, the minimum RMSE and AIC are 0.641 and −21.330, respectively.

4.3 ANFIS Model In the ANFIS model, fuzzy inference system (FIS) structure is generated from training data using very popular grid partition algorithm (Matlab 2013). The same input structures and lag times introduced to ANFIS model, as is shown in Table 6 and Table 7. Table 6 and Table 7 suggest that 1 lead time of four input variables at two sites has the minimum RMSE and AIC values in the validation. The minimum RMSE and AIC of site M1048 are 1.112 and 5.081, respectively. The minimum RMSE and AIC of site STL313 are 0.590 and −25.286, respectively.

4.4 Comparison ANN, SVM and ANFIS As can be seen from Table 1 to Table 7, statistical parameter values of training and validation are better when the lake level is considered as input variable. Moreover, the groundwater level Fig. 6 Effect of changing the number of nodes in the hidden layer on the RMSE of the FNN

0.80

RMSE

0.75

0.70

0.65

0.60

0.55 0

10

20

30

40

Number of nodes in the hidden layer

50

384

Y. Gong et al.

Table 2 Result of modeling from ANN model with different lag time at site M1048 Lead Time

Training R

Validation

NMSE RMSE NS

AIC

R

NMSE RMSE NS

AIC

P + T + G + L LT1

0.949 0.099

0.576

0.900 −132.587 0.918 0.479

1.163

LT2

0.911 0.171

0.755

0.828 −67.495

0.899 0.636

1.340

0.336 14.065

LT3

0.914 0.166

0.745

0.832 −70.618

0.768 0.679

1.385

0.291 15.646

LT1

0.950 0.097

0.570

0.902 −134.794 0.867 0.595

1.296

0.379 12.458

LT2

0.862 0.260

0.932

0.737 −16.886

0.896 0.594

1.295

0.380 12.416

LT3

0.727 0.469

1.250

0.527 53.621

0.772 0.658

1.363

0.314 14.867

P+T+G

0.500 7.246

fluctuations in these sites are affected by groundwater-lake interaction because of these sites are close to the lake. These results indicate that the influence of fluctuating lake level should be considered for groundwater level predication for the sites that are close to the lake. In other words, interaction of groundwater and surface water (lake, reservoir, river, etc.) should not be ignored for predicting groundwater level. The analysis about the model calibration and validation for two sites is listed in Table 2 to Table 7. At the training stage, the mean RMSE value in ANN, SVM and ANFIS models for well M1048 are 0.804, 0.873 and 0.411 respectively; the mean of RMSE in these models for well STL313 are 0.558, 0.509 and 0.239 respectively. The mean RMSE values of the ANFIS model are smaller than those of other models in the training stage, which implies that the calibration capability of the ANFIS model is better than that of other models for the given data. In validation stage, the mean RMSE value in ANN, SVM and ANFIS models for well M1048 are 1.229, 1.102 and 1.166 respectively; the mean of RMSE in these models for well STL313 are 0.779, 0.710 and 0.618 respectively. The prediction results of ANFIS was close to that of SVM model for well M1048; the prediction results of ANFIS was more accurate than that of SVM model for well STL313. If the NS criterion in a model is equal to 1, then this model can be claimed to produce a perfect estimation. Normally, a model can be considered as accurate if the NS criterion is greater than 0.8 (Shu and Ouarda 2008). Most of the NS values for the ANN, ANFIS and SVM models in training stage are over 0.8, which indicates that all these models achieved acceptable results. The NS values for the ANFIS model predicting of the groundwater value are higher than those for the SVM and ANN models, which also indicates that the overall Table 3 Result of modeling from ANN model with different lag time at site STL313 Lead Time

Training R

Validation

NMSE RMSE NS

AIC

R

NMSE RMSE NS

AIC

P + T + G + L LT1

0.940 0.118

0.376

0.881 −234.827 0.789 0.495

0.761

0.483 −13.091

LT2

0.889 0.209

0.500

0.789 −166.518 0.756 0.723

0.919

0.246 −4.035

P+T+G

LT3

0.800 0.368

0.664

0.629 −98.416

0.801 0.707

0.910

0.262 −4.549

LT1

0.894 0.210

0.501

0.788 −165.812 0.777 0.544

0.798

0.433 −10.859

LT2

0.811 0.350

0.647

0.647 −104.668 0.788 0.516

0.777

0.462 −12.118

LT3

0.798 0.364

0.659

0.633 −99.979

0.781

0.456 −11.877

0.758 0.521

A comparative study of artificial neural networks, support vector

385

Table 4 Result of modeling from SVM model with different lag time at site M1048 Lead Time

Training R

Validation

NMSE RMSE NS

AIC

R

NMSE RMSE NS

AIC

P + T + G + L LT1

0.926 0.146

0.698

0.853 −86.147 0.921 0.418

1.087

0.563 4.019

LT2

0.865 0.258

0.928

0.740 −18.031 0.880 0.532

1.227

0.444 9.802

LT3

0.857 0.264

0.938

0.734 −15.243 0.811 0.610

1.312

0.364 13.045

LT1

0.911 0.173

0.759

0.826 −66.143 0.929 0.442

1.117

0.539 5.314

LT2

0.867 0.246

0.906

0.752 −23.674 0.831 0.898

1.593

0.063 22.345

LT3

0.832 0.306

1.010

0.692 2.386

1.541

0.123 20.748

P+T+G

0.746 0.840

quality of estimation of the ANFIS model is better than the SVM and ANN models. (See Table2 to Table 7). Compared with the ANN, ANFIS and SVM models perform from the RMSE and R values in the validation stage, RMSE of the SVM model is a bit better than both the ANN and the ANFIS model, and R values of the ANFIS is a bit higher than other models. Obviously, RMSE value of ANFIS model is a bit higher than that of SVM model, the NS value and R values of SVM model in validation stage perform a bit less than that of ANFIS model. Therefore, SVM model is also a good data-driven model in validation stage. (See Table 2 to Table 7). Figures 7 and 8 displays the comparison between the observed and predicted groundwater level at M1048 and STL313 observation wells using the best ANN, SVM and ANFIS models for 1-month-ahead forecast. The correlation coefficient (R2) values for validation period for all the models are also shown in scatter plots for two wells in Figs. 7 and 8 corresponding to each model predicted values. From the scatter plots drawn between observed and predicted groundwater levels for two observation wells, it can be seen that ANFIS model predicts the groundwater levels with less scatter and all points are close to the straight line when compared to ANN and SVM models. Figure 9a-b shows the observed groundwater levels versus predicted groundwater levels using three models in training stage and validation stage. Figure 7 shows that R2 of ANFIS performs a bit higher than that of SVM model, while the error of the local point for SVM performs a bit less than that for ANFIS. Thus, both ANFIS and SVM models can be considered as good data-driven models at site M1048. Figure 8 shows that R2 and the error of the local point of ANFIS perform higher than that of SVM and ANN models. The plots can also indicate that predictions of the groundwater levels in SVM Table 5 Result of modeling from SVM model with different lag time at site STL313 Lead Time

Training R

Validation

NMSE RMSE NS

AIC

R

NMSE RMSE NS

AIC

P + T + G + L LT1

0.905 0.182

0.466

0.816 −183.024 0.797 0.420

0.701

0.562 −17.048

LT2

0.925 0.145

0.416

0.854 −210.491 0.831 0.606

0.842

0.368 −8.265

LT3

0.937 0.129

0.392

0.870 −224.736 0.804 0.351

0.641

0.633 −21.330

LT1

0.885 0.217

0.510

0.781 −161.802 0.805 0.443

0.720

0.537 −15.755

P+T+G

LT2

0.826 0.318

0.616

0.679 −116.117 0.771 0.464

0.737

0.515 −14.639

LT3

0.800 0.360

0.656

0.637 −101.277 0.746 0.493

0.760

0.485 −13.198

386

Y. Gong et al.

Table 6 Result of modeling from ANFIS model with different lag time at site M1048 Lead Time

Training R

Validation

NMSE RMSE NS

AIC

R

NMSE RMSE NS

AIC

P + T + G + L LT1

0.996 0.007

0.156

0.993 −445.760 0.931 0.437

1.112

0.544 5.081

LT2

0.991 0.018

0.243

0.982 −339.304 0.851 0.763

1.468

0.204 18.443

LT3

0.990 0.020

0.260

0.980 −323.147 0.731 0.855

1.555

0.107 21.179

LT1

0.972 0.055

0.427

0.945 −204.235 0.867 0.527

1.220

0.450 9.555

P+T+G

LT2

0.936 0.124

0.642

0.875 −106.250 0.865 0.843

1.543

0.120 20.830

LT3

0.915 0.163

0.737

0.836 −73.172

1.516

0.151 19.965

0.773 0.813

model are superior to that of ANN model. So, ANFIS model can be considered as the best estimation models at site STL313. Box plot is used to check whether the models are able to predict these variations and corresponding prediction errors. Figure 10 shows that most of prediction errors are concentrated around zero, which supports the efficiency of the optimum architecture obtained. Figure 10 also displays a comparison of errors between the results obtained by the ANN, SVM and ANFIS models in the training period and validation period. The results indicate that out of all the models employed in current study, ANFIS is the most accurate model in training stage. In validation stage, the prediction on groundwater level for well M1048 showed there is slight difference between the result acquired from ANFIS and SVM; while in the prediction for well STL313, ANFIS performed more accurate than SVM. These results are consistent with that acquired by (He et al. 2014; Shirmohammadi et al. 2013) results. Overall, the results of the ANFIS and SVM models are superior to the ANN in the near-lake groundwater level prediction. Comparing the results of ANFIS and SVM models, SVM and ANFIS models have their own advantages and disadvantages. Concretely, RMSE value of ANFIS model was higher that of SVM model, the NS and R values of SVM model in validation stage was a little lower that of ANFIS model. The selection of the best applicable model should balance the benefits of the statistical parameters according to the performance criteria in both training stage and validation stage. The prediction results of ANFIS are close to that of SVM model for well M1048; the prediction results of ANFIS are more accurate than that of SVM model for well STL313. Table 7 Result of modeling from ANFIS model with different lag time at site STL313 Lead Time

Training R

Validation

NMSE RMSE NS

AIC

R

NMSE RMSE NS

AIC

P + T + G + L LT1

0.996 0.007

0.093

0.993 −570.162 0.868 0.298

0.590

0.689 −25.286

LT2

0.990 0.020

0.156

0.979 −445.677 0.788 0.552

0.803

0.424 −10.505

P+T+G

LT3

0.996 0.008

0.100

0.992 −552.532 0.831 0.509

0.772

0.469 −12.436

LT1

0.973 0.052

0.249

0.948 −333.367 0.864 0.358

0.647

0.627 −20.903

LT2

0.925 0.143

0.414

0.855 −211.583 0.809 0.578

0.822

0.397 −9.396

LT3

0.923 0.148

0.421

0.850 −207.593 0.794 0.456

0.731

0.524 −15.057

A comparative study of artificial neural networks, support vector

387

(a) 20

Observed ANN forecasted

24

32

ANN forecasted (ft)

Groundwater Level (ft)

22

26 28 30 32

y=1.0959x-1.8525 R2=0.8427

30

28

26

34

24 24

36 0

2

4

6

8

10

12

14

16

18

20

22

24

26

28

30

32

30

32

30

32

Observed (ft)

Time (Months)

(b) 20

Observed SVM forecasted

24

32

SVM forecasted (ft)

Groundwater Level (ft)

22

26 28 30 32

y=0.8276x+5.7291 R2=0.8482

30

28

26

34 24 24

36 0

2

4

6

8

10

12

14

16

18

20

22

24

26

28

Observed (ft)

Time (Months)

(c) 20

Observed ANFIS forecasted

24

32

ANFIS forecasted (ft)

Groundwater Level (ft)

22

26 28 30 32

y=1.1348x-2.9872 R2=0.8673

30

28

26

34 36 0

2

4

6

8

10

12

14

16

Time (Months)

18

20

22

24

24 24

26

28

Observed (ft)

Fig. 7 Comparison of observed and predicted groundwater level at M1048 observation well using a best ANN model, b best SVM model, c best ANFIS model for 1-month-ahead forecast

5 Conclusions The accurate and reliable estimation of groundwater level is one of the most important issues in the management of water resources. In this study, monthly groundwater levels data at the M1048 and STL313 observation wells are used to investigate the prediction capability of ANN, SVM and ANFIS models. Multivariate time series analysis is performed when various hydrological variables are included, such as P, T (Max, Mean and Min), G and L. Five standard

388

Y. Gong et al.

(a) 18

Observed ANN forecasted

28

22

ANN forecasted (ft)

Groundwater Level (ft)

20

24 26 28

y=0.7472x+6.9467 R2=0.6225

27

26

25

30 24 32 23 23

34 0

2

4

6

8

10

12

14

16

18

20

22

24

24

25

26

27

28

27

28

27

28

Observed (ft)

Time (Months)

(b) 18

Observed SVM forecasted

22

28

SVM forecasted (ft)

Groundwater Level (ft)

20

24 26 28

y=0.5506x+11.9843 R2=0.6352

27

26

25

30 24 32 23 23

34 0

2

4

6

8

10

12

14

16

18

20

22

24

24

25

26

Observed (ft)

Time (Months)

(c) 18

Observed ANFIS forecasted

22

28

ANFIS forecasted (ft)

Groundwater Level (ft)

20

24 26 28 30

y=0.7635x+6.4363 R2=0.7534

27

26

25

24 32 34 0

2

4

6

8

10

12

14

16

Time (Months)

18

20

22

24

23 23

24

25

26

Observed (ft)

Fig. 8 Comparison of observed and predicted groundwater level at STL313 observation well using a best ANN model, b best SVM model, c best ANFIS model for 1-month-ahead forecast

statistical parameters, R, NMSE, RMSE, NS and AIC, are used for evaluating the performance of the three models above. The result achieved from the model using P, T, G and L were compared with that from the model using P, T and G to test the effect of considering the interaction between groundwater and surface water. Standard statistical parameters showed that the models using P, T, G and L achieved better prediction result than that using P, T and G only: the average R value is bigger and the average RMSE is smaller in the models which involve P, T, G and L. The groundwater prediction result

A comparative study of artificial neural networks, support vector

389

(a) 22 Observed ANN SVM ANFIS

23

Groundwater Level (ft)

24 25 26 27 28 29 30 31 32

Validation

Training

33 34 0

12

24

36

48

60

72

84

96

108

120

132

144

Time (Months)

(b) 20 Observed ANN SVM ANFIS

21

Groundwater Level (ft)

22 23 24 25 26 27 28 29 30

Validation

Training

31 32 0

12

24

36

48

60

72

84

96

108

120

132

144

Time (Months) Fig. 9 Best models predictions versus observed data a M1048 and b STL313

(a)

(b) 2

Prediction Error (ft)

Prediction Error (ft)

2

0

-2

0

-2

ANN(Tr)

ANN(Va)

SVM(Tr)

SVM(Va) ANFIS(Tr) ANFIS(Va)

ANN(Tr)

ANN(Va)

SVM(Tr)

SVM(Va) ANFIS(Tr) ANFIS(Va)

Fig. 10 Box plots of Prediction Error a M1048 and b STL313 (Tr Training; Va Validation)

390

Y. Gong et al.

when considering lake level fluctuation is superior to the result that is not considering the lake level fluctuation. Therefore, the result of groundwater level prediction demonstrates that lake level variations as the main driving force of groundwater discharge or recharge considering the groundwater-lake interaction should not be ignored. Also, the result shows that the three models could properly predict groundwater level one, two or three months ahead. Near-lake groundwater level predictions from ANFIS and SVM models are more accurate than that from ANN model. This study proves the validity and applicability of the ANN, SVM and ANFIS models in the prediction of groundwater level when considering the lake level fluctuation. The result achieved from this study could work as useful guidelines for selecting appropriate modeling techniques when the data sets for groundwater level prediction is collected from near lake area. The necessity and effect of considering the interaction between groundwater and surface water in groundwater level prediction has been proved in this study, which could be beneficial to water resource management that it could help to achieve more valid and accurate prediction results. Acknowledgments This study was supported by National Key Technology Research and Development Program (2011BAC12B00), Basic Research Foundation of Beijing University of Technology and China Scholarship Council. The authors gratefully acknowledge the helpful and insightful comments of the editor and anonymous reviewers.

References Afan HA, El-Shafie A, Yaseen ZM, Hameed MM, Wan Mohtar WHM, Hussain A (2015) ANN based sediment prediction model utilizing different input scenarios. Water Resour Manag 29:1231–1245 Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control AC-19:716–723 Ala-aho P, Rossi PM, Isokangas E, Kløve B (2015) Fully integrated surface–subsurface flow modelling of groundwater–lake interaction in an esker aquifer: model verification with stable isotopes and airborne thermal imaging. J Hydrol 522:391–406 ASCE (2000a) Artificial neural networks in hydrology. I: Preliminary concepts. J Hydrol Eng 5:115–123 ASCE (2000b) Artificial neural networks in hydrology. II: Hydrologic Applications Journal of Hydrologic Engineering 5:124–137 Awan JA, Bae D-H (2014) Improving ANFIS based model for long-term dam inflow prediction by incorporating monthly rainfall forecasts. Water Resour Manag 28:1185–1199 Ch S, Anand N, Panigrahi BK, Mathur S (2013) Streamflow forecasting by SVM with quantum behaved particle swarm optimization. Neurocomputing 101:18–23 Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2:1– 27 Cho S et al. (2014) Factors affecting algal blooms in a man-made lake and prediction using an artificial neural network. Measurement 53:224–233 Çimen M, Kisi O (2009) Comparison of two different data-driven techniques in modeling lake level fluctuations in Turkey. J Hydrol 378:253–262 Emamgholizadeh S, Moslemi K, Karami G (2014) Prediction the groundwater level of bastam plain (Iran) by artificial neural network (ANN) and adaptive neuro-fuzzy inference system (ANFIS). Water Resour Manag 28:5433–5446 Goyal MK, Bharti B, Quilty J, Adamowski J, Pandey A (2014) Modeling of daily pan evaporation in sub tropical climates using ANN, LS-SVR, fuzzy logic, and ANFIS. Expert Syst Appl 41:5267–5276 Güldal V, Tongal H (2010) Comparison of recurrent neural network, adaptive neuro-fuzzy inference system and stochastic models in eğirdir lake level forecasting. Water Resour Manag 24:105–128 Haykin S (1999) Neural networks: a comprehensive foundation. Prentice Hall He Z, Wen X, Liu H, Du J (2014) A comparative study of artificial neural network, adaptive neuro fuzzy inference system and support vector machine for forecasting river flow in the semiarid mountain region. J Hydrol 509:379–386

A comparative study of artificial neural networks, support vector

391

Hipni A, El-shafie A, Najah A, Karim OA, Hussain A, Mukhlisin M (2013) Daily forecasting of dam water levels: comparing a support vector machine (SVM) model with adaptive neuro fuzzy inference system (ANFIS). Water Resour Manag 27:3803–3823 Jang JSR (1993) ANFIS : adaptive Ne twork based fuzzy inference system. IEEE Trans Syst, Man, Cybern 23: 665–685 Jang JSR, Sun CT, Mizutani E (1997) Neuro-fuzzy and soft computing: a computational approach to learning and machine intelligence. Prentice Hall Latt ZZ, Wittenberg H, Urban B (2014) Clustering hydrological homogeneous regions and neural network based index flood estimation for ungauged catchments: an example of the chindwin river in Myanmar. Water Resour Manag 29:913–928 Lin J-Y, Cheng C-T, Chau K-W (2006) Using support vector machines for long-term discharge prediction. Hydrol Sci J 51:599–612 Martinez JL, Raiber M, Cox ME (2015) Assessment of groundwater-surface water interaction using long-term hydrochemical data and isotope hydrology: headwaters of the condamine river, southeast Queensland, Australia. Sci Total Environ 536:499–516 Matlab (2013) version 8.1.0 Natick, Massachusetts: The Math Works Inc. (http://www.mathworks.com) Moghaddamnia A, Ghafari Gousheh M, Piri J, Amin S, Han D (2009) Evaporation estimation using artificial neural networks and adaptive neuro-fuzzy inference system techniques. Adv Water Resour 32:88–97 Nastos PT, Paliatsos AG, Koukouletsos KV, Larissi IK, Moustris KP (2014) Artificial neural networks modeling for forecasting the maximum daily total precipitation at athens, Greece. Atmos Res 144:141–150 Noori R, Karbassi AR, Moghaddamnia A, Han D, Zokaei-Ashtiani MH, Farokhnia A, Gousheh MG (2011) Assessment of input variables determination on the SVM model performance using PCA, gamma test, and forward selection techniques for monthly stream flow prediction. J Hydrol 401:177–189 O’Connor MT, Moffett KB (2015) Groundwater dynamics and surface water–groundwater interactions in a prograding delta island, Louisiana, USA. J Hydrol 524:15–29 Platt JC (1999) Fast training of support vector machines using sequential minimal optimization. In: Christopher JCB, Alexander JS (eds) Bernhard S, lkopf. MIT Press, Advances in kernel methods, pp. 185–208 Samarasinghe S (2006) Neural networks for applied sciences and engineering. Auerbach Publications Scholkopf B, Smola AJ (2002) Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press Shevade SK, Keerthi SS, Bhattacharyya C, Murthy KRK (2000) Improvements to the SMO algorithm for SVM regression. IEEE Trans Neural Netw 11:1188–1193 Shiri J, Kişi Ö (2011) Comparison of genetic programming with neuro-fuzzy systems for predicting short-term water table depth fluctuations. Comput Geosci 37:1692–1701 Shirmohammadi B, Vafakhah M, Moosavi V, Moghaddamnia A (2013) Application of several data-driven techniques for predicting groundwater level. Water Resour Manag 27:419–432 Shu C, Ouarda TBMJ (2008) Regional flood frequency analysis at ungauged sites using the adaptive neuro-fuzzy inference system. J Hydrol 349:31–43 Sreekanth PD, Sreedevi PD, Ahmed S, Geethanjali N (2010) Comparison of FFNN and ANFIS models for estimating groundwater level. Environ Earth Sci 62:1301–1310 Sun S, Xu X (2011) Variational inference for infinite mixtures of gaussian processes with applications to traffic flow prediction. IEEE Trans Intell Transp Syst 12:466–475 Sun S, Zhang C, Yu G (2006) A bayesian network approach to traffic flow forecasting. IEEE Trans Intell Transp Syst 7:124–132 Tabari H, Kisi O, Ezani A, Hosseinzadeh Talaee P (2012) SVM, ANFIS, regression and climate based models for reference evapotranspiration modeling using limited climatic data in a semi-arid highland environment. J Hydrol 444-445:78–89 Talebizadeh M, Moridnejad A (2011) Uncertainty analysis for the forecast of lake level fluctuations using ensembles of ANN and ANFIS models. Expert Syst Appl 38:4126–4135 Vapnik VN (1995) The nature of statistical learning theory. Springer-Verlag New York, Inc. Vapnik VN (1998) Statistical learning theory. Wiley Verma AK, Singh TN (2013) Prediction of water quality from simple field parameters. Environ Earth Sci 69: 821–829 Vernieuwe H, Georgieva O, De Baets B, Pauwels VRN, Verhoest NEC, De Troch FP (2005) Comparison of datadriven takagi–sugeno models of rainfall–discharge dynamics. J Hydrol 302:173–186 Wen X, Si J, He Z, Wu J, Shao H, Yu H (2015) Support-vector-machine-based models for modeling daily reference evapotranspiration with limited climatic data in extreme arid regions. Water Resour Manag 29: 3195–3209 Yoon H, Jun SC, Hyun Y, Bae GO, Lee KK (2011) A comparative study of artificial neural networks and support vector machines for predicting groundwater levels in a coastal aquifer. J Hydrol 396:128–138