Available online at www.sciencedirect.com

ScienceDirect Procedia Engineering 154 (2016) 1217 – 1224

12th International Conference on Hydroinformatics, HIC 2016

The Application of the Artificial Neural Network Ensemble Model for Simulating Streamflow

Do-Hun Lee*, Doo-Sun Kang

Civil Engineering Department, Kyung Hee University, 1732 Deogyeong-daero, Giheung-gu, Yongin-si, Gyeonggi-do 446-701, Republic of Korea

Abstract

This study applies an ensemble method with artificial neural network (ANN) models for simulating daily discharge. The study area is the Bocheong-cheon watershed, located in the central part of South Korea. The ANN outputs were generated by randomly sampling the initial weights, the number of hidden-layer nodes, and the training data. Correlation analysis and principal component analysis are used to identify the input vectors for the ANN model. The factors affecting the performance and the uncertainty of the ANN ensemble model are investigated by considering the length of the training data, the type of activation function, the number of hidden nodes, and the training method. The results indicate that the performance and uncertainty of the ANN ensemble model are affected by various elements of ANN model development. The ensemble method applied in this case study provides satisfactory results for stream flow modelling and is useful for assessing the uncertainty of an ANN model. However, the ensemble method is computationally expensive and might be problematic for applications that need rapid simulation.

© 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of the organizing committee of HIC 2016.

Keywords: Neural networks; Ensemble; Uncertainty; Stream flow modelling

1. Introduction

Over the last decade, ANN models have been increasingly used and are considered a very useful tool in solving

* Corresponding author. Tel.: +82-31-201-2546; fax: +82-31-202-8854. E-mail address: [email protected]

1877-7058 © 2016 The Authors. Published by Elsevier Ltd.

doi:10.1016/j.proeng.2016.07.434


hydrology and water resource-related problems [1,2,3,4]. Since the ANN model is not based on underlying physical concepts and relies on the available data, the procedures involved in building an ANN model are not clearly established [3]. Furthermore, the effects of the elements of ANN model development on the uncertainty of the ANN model have not been fully investigated. Instead of investigating the response of a single model, an ensemble method can be used to improve the model performance and to quantify the uncertainty of a model. An ensemble of ANN models can be generated by varying the initial weights, the ANN topology or architecture, the training algorithm, and the training data [5,6]. Baker and Ellison [5] discuss the concept of ensembles and the bias/variance trade-off of ensemble prediction in environmental modelling; ensembles reduce model variance without reducing model bias. It has been shown that ensemble ANN models are more robust than a single ANN model and improve its generalization ability [7,8,9]. The ANN model contains large degrees of freedom and a number of uncertainty sources affecting the outputs. The uncertainties arising from the input vector and the network structure of the ANN were investigated by Asefa [10] using a GLUE-based ANN approach. Other studies consider the uncertainties in ANN outputs arising from variation in initial parameters and training data [11,12,13]. The focus of this study is to investigate the applicability of the ensemble ANN model for modelling stream flow and to identify the effects of the length of training data, the structure of the hidden layer, and the training method on the performance and the uncertainty of the ensemble ANN model.

2. Methods and material

2.1. Study area and data

The study area is the Bocheong-cheon watershed, located in the central part of South Korea. The watershed area is about 350 km² and the average slope is approximately 24%.
The elevation of the watershed ranges from 130 m to 800 m. The main land use types are forest (62%) and agricultural land (27%). This study uses daily rainfall data from eight rainfall stations within the watershed and daily discharge data measured at the Gidae gauging station (Fig. 1). These data can be obtained from the Water Management Information System (www.wamis.go.kr). Table 1 shows the statistical moments of the daily rainfall and discharge data for two different time periods. The data for the period 1987-1990 are used for calibration and the data for the period 1995-1996 are used for validating the models.

Fig. 1. Bocheong-cheon watershed with eight rainfall stations and six sub-watersheds.

Table 1. Statistical moments of daily rainfall and measured daily discharge.

Period      Average of daily   Standard deviation of   Average of daily   Standard deviation of
            rainfall (mm)      daily rainfall (mm)     discharge (mm)     daily discharge (mm)
1987-1990   3.69               10.45                   2.73               9.20
1995-1996   2.98               9.97                    2.24               6.11


2.2. ANN model

The ANN is a very flexible mathematical structure that can identify complex, nonlinear relationships between inputs and outputs. The ANN architecture used in this study is a feed-forward, multi-layer perceptron (MLP), in which the information is propagated in one direction from the input layer to the output layer. The MLP network is the most popular network architecture applied to the prediction of water resource variables in river systems [3]. This study applied an MLP network with an input layer, one hidden layer, and an output layer. Each layer consists of a number of nodes. The number of nodes in the input layer is determined by the number of input features, and the number of nodes in the output layer is determined by the number of output features. The number of nodes in the hidden layer, however, is not known a priori and can be determined by minimizing the error of the ANN model. The activation functions of the hidden and output layers need to be specified for ANN modelling. The activation function of the hidden layer transforms each node's weighted input into its output. The logistic function is a bounded, monotonic function and is one of the most widely used activation functions due to the simplicity of its derivative [1]. Yonaba et al. [14] compared three different activation functions for neural network multistep-ahead stream flow forecasting and suggested the tangent sigmoid function as the most pertinent. In this study, the logistic and tangent sigmoid functions are applied for comparing the performance and uncertainty of the ANN ensemble results. The logistic function is given as

f(z) = 1 / (1 + e^(-z))        (1)

The tangent sigmoid function can be expressed as

f(z) = 2 / (1 + e^(-2z)) - 1        (2)
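As a quick sanity check, the two activation functions can be sketched in Python (a NumPy sketch, not part of the original study); note that the tangent sigmoid of equation (2) is algebraically identical to the hyperbolic tangent:

```python
import numpy as np

def logistic(z):
    # Eq. (1): output bounded between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

def tangent_sigmoid(z):
    # Eq. (2): output bounded between -1 and 1
    return 2.0 / (1.0 + np.exp(-2.0 * z)) - 1.0

z = np.linspace(-4.0, 4.0, 9)
print(logistic(z))          # values in (0, 1)
print(tangent_sigmoid(z))   # values in (-1, 1)

# The tangent sigmoid is exactly tanh(z), and the two functions are
# related by tanh(z) = 2 * logistic(2z) - 1
assert np.allclose(tangent_sigmoid(z), np.tanh(z))
assert np.allclose(tangent_sigmoid(z), 2.0 * logistic(2.0 * z) - 1.0)
```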

The logistic function is bounded between 0 and 1, while the tangent sigmoid function is bounded between -1 and 1. The linear activation function is applied to the output layer in the present study because it has an advantage in extrapolation beyond the range of the training data [15]. Detailed explanations of the ANN concept, the ANN model development procedure, and ANN applications in hydrology and water resources can be found in ASCE [1,2], Maier and Dandy [15], and Maier et al. [3].

2.3. ANN training

For the given inputs and outputs, the weights and biases can be determined through network training. The training of the ANN in this study was performed with the Levenberg-Marquardt back-propagation algorithm. The back-propagation algorithm consists of two steps. The first step is a forward pass, in which the input is passed forward through the network to reach the output layer. After the errors are computed at the output layer, the errors are propagated back toward the input layer and the weights are modified. Further details of the back-propagation algorithm can be found in ASCE [1]. ANN training often suffers from the over-fitting problem, which results in poor performance on validation data not seen in the training stage. The early stopping method and the Bayesian regularization method are two approaches used to prevent the over-fitting of an ANN. The early stopping method is computationally efficient; the calibration data set is divided into a training data set and a testing data set, which are used concurrently in the training process such that the ANN training stops when the error on the testing data set reaches a minimum. Hence, the early stopping method reduces the training time, since the training process finishes before the error on the training data set is minimized.
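The early stopping scheme can be illustrated with a toy example. This is a sketch only: it uses plain gradient descent on a one-hidden-layer network rather than the Levenberg-Marquardt algorithm used in the paper, and synthetic data in place of the rainfall-runoff records:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic calibration data standing in for the PC inputs and daily discharge
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.sin(2.0 * X) + 0.1 * rng.standard_normal((200, 1))

# Random 80/20 split into training and testing sets
idx = rng.permutation(len(X))
tr, te = idx[:160], idx[160:]

# One hidden layer with logistic activation and a linear output layer
H = 8
W1 = 0.5 * rng.standard_normal((1, H)); b1 = np.zeros(H)
W2 = 0.5 * rng.standard_normal((H, 1)); b2 = np.zeros(1)

def forward(X, W1, b1, W2, b2):
    a = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))   # hidden layer (logistic)
    return a, a @ W2 + b2                      # linear output layer

best, best_err, patience, lr = None, np.inf, 0, 0.3
for epoch in range(3000):
    # Forward pass on the training set
    a, out = forward(X[tr], W1, b1, W2, b2)
    err = out - y[tr]
    # Backward pass: propagate the errors toward the input layer
    gW2 = a.T @ err / len(tr); gb2 = err.mean(0)
    da = (err @ W2.T) * a * (1.0 - a)
    gW1 = X[tr].T @ da / len(tr); gb1 = da.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

    # Early stopping: monitor the error on the testing set and stop once
    # it has passed its minimum
    te_err = np.mean((forward(X[te], W1, b1, W2, b2)[1] - y[te]) ** 2)
    if te_err < best_err:
        best_err = te_err
        best = (W1.copy(), b1.copy(), W2.copy(), b2.copy())
        patience = 0
    else:
        patience += 1
        if patience >= 100:
            break

W1, b1, W2, b2 = best   # restore the weights at the testing-error minimum
```

Restoring the weights from the iteration with the minimum testing error is what keeps the network from over-fitting the training set.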
The Bayesian regularization method [16,17] is intended to improve the generalization ability of the ANN using the modified cost function given as

1220

Do-Hun Lee and Doo-Sun Kang / Procedia Engineering 154 (2016) 1217 – 1224

cost function = α·MSE + β·MSW        (3)

where MSE is the sum of squares of the network errors, MSW is the sum of squares of the network weights and biases, and α and β are the regularization parameters. The inclusion of the penalty term MSW penalizes networks with large weights and biases so that the weights of the network are kept small. The Bayesian regularization method needs only the training data set, unlike the early stopping method, which needs both a training data set and a testing data set. Examples of the application of the Bayesian regularization method can be found in [7,14] for stream flow modelling and in [8] for pedotransfer function modelling.

2.4. Input vectors for ANN

An important step in ANN model development is the determination of proper input vectors. Maier et al. [3] reviewed the taxonomy and various methods for determining the input significance and input independence of the ANN model. In this study, the analytical model-free approach, which is based on correlation analysis, is used to assess the significance of relationships between potential inputs and outputs. Although the correlation provides a statistical measure of significance, it only considers the linear dependence between variables. Sudheer et al. [18] outlined a procedure for selecting an appropriate input vector based on correlation analysis in a rainfall-runoff ANN model. The cross-correlation between rainfall and the measured discharge at the outlet of the watershed and the autocorrelation of the measured discharge in the Bocheong-cheon watershed are statistically significant up to two days. These results suggest that the potential input vectors for an ANN model are the lagged rainfall and discharge with time lags of up to two days. The ANN model structure can be represented as

y(t) = f(p(t), p(t-1), p(t-2), y(t-1), y(t-2))        (4)

where y(t) is the predicted daily discharge at the outlet of the watershed, p(t), p(t-1), and p(t-2) are antecedent daily rainfalls with time lags from zero to two days, and y(t-1) and y(t-2) are the daily discharges measured at the outlet of the watershed with time lags of one and two days.

Determining the input independence of the ANN model is another important step in model development. In this study, input independence is based on principal component analysis (PCA). PCA eliminates redundant correlated inputs, which might cause over-fitting and additional local minima [3], and has been shown to effectively minimize the data redundancy of input variables in data-driven models. The principal components are linear combinations of the variables such that the components are uncorrelated and the total variance remains the same; a set of principal components includes all information in the original set of variables. Table 2 shows the eigenvalues associated with the five input variables defined in the ANN model (4). According to Table 2, the first four PCs explain 95.88% of the total variance in the original input variables. Based on the variance ratios of the eigenvalues, the four PCs are selected as input vectors for the ANN model. The four PCs can be expressed in terms of the original variables and the eigenvectors as

PC_i(t) = e_i1 p(t) + e_i2 p(t-1) + e_i3 p(t-2) + e_i4 y(t-1) + e_i5 y(t-2),   i = 1, 2, 3, 4        (5)

where PC_i indicates the i-th principal component and e_i1, e_i2, e_i3, e_i4, and e_i5 are the eigenvector components.

Table 2. Eigenvalues of PCA for input variables of ANN model.

PCs   Eigenvalue   Variance ratio (%)   Cumulative variance ratio (%)
PC1   2.83         56.59                56.59
PC2   1.09         21.87                78.46
PC3   0.52         10.47                88.93
PC4   0.35         6.95                 95.88
PC5   0.21         4.12                 100.00

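The PCA-based input screening can be sketched as follows. This is illustrative only: synthetic rainfall and discharge series stand in for the Bocheong-cheon records, and a 95% cumulative-variance cut-off is used to mirror the four-PC choice of Table 2:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins for daily rainfall p(t) and discharge y(t)
n = 1000
p = rng.gamma(0.3, 10.0, size=n)
y = np.convolve(p, [0.3, 0.4, 0.2], mode="same") + 0.1 * rng.standard_normal(n)

# Candidate inputs of Eq. (4): p(t), p(t-1), p(t-2), y(t-1), y(t-2)
t = np.arange(2, n)
Xraw = np.column_stack([p[t], p[t - 1], p[t - 2], y[t - 1], y[t - 2]])

# Standardize the inputs and eigendecompose their correlation matrix
Z = (Xraw - Xraw.mean(axis=0)) / Xraw.std(axis=0)
R = np.corrcoef(Z, rowvar=False)
eigval, eigvec = np.linalg.eigh(R)
order = np.argsort(eigval)[::-1]            # sort by decreasing eigenvalue
eigval, eigvec = eigval[order], eigvec[:, order]

ratio = eigval / eigval.sum()               # variance ratio of each PC
cum = np.cumsum(ratio)
k = int(np.searchsorted(cum, 0.95) + 1)     # PCs covering >= 95% of variance

# Eq. (5): the selected PCs as linear combinations of the inputs
PCs = Z @ eigvec[:, :k]
print(k, np.round(100.0 * cum, 2))
```

The projected columns are mutually uncorrelated by construction, which is the property that removes the input redundancy discussed above.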


2.5. Creation of ensemble members

In this study the ensemble members are created by randomly sampling the initial weights, the number of hidden-layer nodes, and the calibration data. In all cases 200 ANN models are trained and used for the evaluation of the ensemble output and the uncertainty of the ANN models. There is no theoretical guideline for the number of ANN models that must be trained in order to provide a reliable ensemble output. Anctil and Lauzon [7] and Yonaba et al. [14] used 50 ANN models to obtain the ensemble output, while Srivastav et al. [11] trained 300 models. We investigated the performance of the ensemble output for 50 to 1000 networks and found that 200 networks are appropriate for this study. Based on the Kolmogorov theorem, Asefa [10] allowed the maximum number of hidden nodes to be three times the number of input nodes for implementing the GLUE-based ANN. With reference to Asefa [10], the number of hidden nodes is randomly sampled between 0.5 times and 3 times the number of input nodes. For ANN training with the early stopping method, the calibration data set for the period 1987-1990 needs to be divided into training and testing data sets. The length of the testing data is fixed at 20% of the length of the calibration data. Random sampling from the calibration data set keeps the training data set and the testing data set independent of each other.

2.6. Evaluation measures of performance and uncertainty

The HydroTest web resource developed by Dawson et al. [19] provides a wide range of evaluation metrics, and the present study adopted the coefficient of efficiency (CE) and the persistence index (PI) for the evaluation of the ensemble results. A CE value of one represents a perfect model, and a negative value indicates that the model performs worse than simply using the mean of the measured discharge.
The PI metric is similar to the CE metric except that its benchmark is the measured discharge shifted by one time step, whereas the CE metric uses the average of the measured discharge. The PI score of a perfect model is one, and negative values of PI are possible.

Different uncertainty measures can be used to assess the uncertainty of model outputs. In this study, the p-factor and d-factor applied by Abbaspour et al. [20,21] are employed along with the S-factor proposed by Xiong et al. [22]. The p-factor is defined as the percentage of the measured data bracketed by the lower and upper limits of the prediction bounds. The p-factor represents the goodness of the prediction bounds, and a perfect model exhibits a p-factor of 100% when all measured data are enveloped by the prediction bounds. The p-factor is the same uncertainty measure as the containing ratio reported by Xiong et al. [22] and the percentage of coverage reported by Zhang et al. [23] and Kasiviswanathan et al. [24]. The d-factor is the average width of the prediction interval [22,23,24] divided by the standard deviation of the measured variable. The d-factor represents the degree of uncertainty in the simulated outputs and is close to zero when the prediction interval is minimal; the uncertainty of the outputs becomes larger as the d-factor increases. The S-factor assesses the average degree of asymmetry of the prediction bounds with respect to the measured variable; it becomes zero when perfect symmetry of the prediction interval with respect to the measured variable is achieved. Further discussion of the S-factor, including three possible scenarios for the location of the measured discharges, can be found in Xiong et al. [22].
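Under the definitions above, the CE, PI, p-factor, and d-factor can be written down compactly (a NumPy sketch with a synthetic ensemble; the S-factor is omitted here because its exact formula is given in Xiong et al. [22]):

```python
import numpy as np

def ce(sim, obs):
    # Coefficient of efficiency: 1 for a perfect model; negative when the
    # model is worse than predicting the mean of the observations
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def pi(sim, obs):
    # Persistence index: the benchmark is the observation shifted one step
    return 1.0 - np.sum((obs[1:] - sim[1:]) ** 2) / np.sum((obs[1:] - obs[:-1]) ** 2)

def p_d_factors(ensemble, obs):
    # ensemble: (n_members, n_time) array of simulated discharges.
    # 95% prediction bounds from the 2.5th and 97.5th percentiles
    lo = np.percentile(ensemble, 2.5, axis=0)
    hi = np.percentile(ensemble, 97.5, axis=0)
    p_factor = np.mean((obs >= lo) & (obs <= hi))   # fraction bracketed
    d_factor = np.mean(hi - lo) / obs.std()         # normalized band width
    return p_factor, d_factor

# Toy demonstration: 200 members scattered around a known signal
rng = np.random.default_rng(3)
obs = np.sin(np.linspace(0.0, 10.0, 300)) + 2.0
ensemble = obs[None, :] + 0.1 * rng.standard_normal((200, 300))
median_sim = np.median(ensemble, axis=0)

print(ce(median_sim, obs), pi(median_sim, obs))
print(p_d_factors(ensemble, obs))
```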
The lower and upper limits of the prediction bounds in this study are calculated as the 2.5th and 97.5th percentiles of the cumulative distribution of every simulated point, respectively; the confidence level employed in this study thus amounts to 95%. Although the confidence level can be chosen arbitrarily, the width of the prediction interval is related to the confidence level, and a larger prediction interval is associated with a larger confidence level.

3. Results

The ensemble performance for the validation period is evaluated by the CE and PI measures, which are calculated using the median of the 200 ANN outputs and the observed discharge. The uncertainty of the 200 ANN outputs for the validation period is evaluated by the p-factor, d-factor, and S-factor. There are several methods for combining the


ensemble members, and the mean of the ensemble members is widely used for evaluating the ensemble performance because of its simplicity [5]. However, the median may be a better measure than the mean when the outputs are not normally distributed [5]. Figs. 2 and 3 show the effects of the length of the training data on the performance and the uncertainty of the ensemble model. Both the early stopping (ES) method and the Bayesian regularization (BR) method are compared in the figures, where the logistic activation function in the hidden layer is used for the simulation. The results show that the CE and PI scores improve as the length of the training data increases. The p-factor decreases slightly as the length of the training data increases, and the decrease is more evident for the BR method than for the ES method. The contrasting effects between the p-factor and the CE/PI scores might be due to the narrowing of the lower and upper limits of the prediction bounds as the length of the training data increases. The d-factor of the ES method is greatly reduced as the length of the training data increases, while the d-factor of the BR method decreases only slightly. The S-factor for the ES method decreases slightly, and the S-factor for the BR method increases slightly, as the length of the training data increases. The results suggest that the performance and the uncertainty of the ANN ensemble model depend on the length of the training data as well as on the training method.
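The mean-versus-median point can be seen directly: for skewed member outputs, the ensemble mean is pulled by the tail, while the median tracks the bulk of the members (a toy illustration with lognormal outputs, not the study's discharges):

```python
import numpy as np

rng = np.random.default_rng(4)
# 2000 skewed (lognormal) member outputs for a single time step: the few
# very large members pull the mean upward, while the median stays near
# the bulk of the distribution
members = rng.lognormal(mean=0.0, sigma=1.0, size=2000)
print(np.mean(members), np.median(members))   # mean exceeds median
```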

Fig. 2. Performance of the ensemble model for the variation in the length of training data.

Fig. 3. Uncertainty of the ensemble model for the variation in the length of training data.

Table 3 shows the effects of the activation functions on the performance and the uncertainty of the ensemble model. The LogES MLP applied the logistic function of equation (1) with the ES method, and the TanES MLP applied the tangent sigmoid function of equation (2) with the ES method. Similarly, the LogBR and TanBR MLPs used the logistic and tangent sigmoid activation functions with the BR method. As shown in Table 3, the scores of the five measures for the logistic function and the tangent sigmoid function are nearly the same except for the d-factor, which shows a larger difference than the other measures. The result indicates that the effect of the activation function on the performance and uncertainty of the ensemble model might not be important.

Table 3. Performance and uncertainty of the ensemble model for different activation functions.

MLP     CE     PI     p-factor   d-factor   S-factor
LogES   0.83   0.77   0.99       1.14       0.25
TanES   0.83   0.78   0.99       1.22       0.27
LogBR   0.79   0.73   0.92       0.73       0.25
TanBR   0.79   0.73   0.93       0.78       0.24


Figs. 4 and 5 show the effects of the number of hidden nodes on the performance and the uncertainty of the ensemble model. The best CE and PI scores are attained with 8 nodes for the ES method and with 12 nodes for the BR method. The CE and PI scores of the ES method are strongly affected by the number of hidden nodes, while the effect of the number of hidden nodes on the CE and PI scores of the BR method is not significant. The best p-factor is attained with 12 nodes for both the ES and BR methods, and the p-factor scores do not change when there are more than 4 hidden nodes. The best d-factor is attained with 2 nodes for the ES method and with 4 nodes for the BR method. The d-factor scores of the ES method are strongly affected by the number of hidden nodes, while the impact of the number of hidden nodes on the d-factor scores is relatively small for the BR method. The main reason for the smaller d-factor score of the BR method is the inclusion of the penalty term in the cost function, which keeps the weights of the network small. It should be noted that for the ES method, the p-factor score improves as the number of hidden nodes increases while the d-factor score deteriorates. The best S-factor score is attained with 2 nodes for the ES method and with 12 nodes for the BR method. The S-factor scores of both the ES and BR methods are steady with respect to the number of hidden nodes, and the impact of the number of hidden nodes on the S-factor might be negligible.

Fig. 4. Performance of the ensemble model for the variation in the number of hidden nodes.

Fig. 5. Uncertainty of the ensemble model for the variation in the number of hidden nodes.

4. Conclusions

The application of the ensemble method with the ANN model suggests that the performance of the ensemble model is satisfactory and that the performance and uncertainty of the ANN ensemble model are affected by a number of elements of ANN model development. The performance of the ensemble model can be improved, and the degree of uncertainty of the ensemble model reduced, by increasing the length of the training data. The impact of different activation functions on the performance and uncertainty of the ensemble model appeared to be negligible. The performance of the ensemble model based on the ES method is better than that based on the BR method, but the uncertainty of the ensemble model based on the ES method is greater than that based on the BR method. The performance and uncertainty of the ANN ensemble model based on the ES method are highly influenced by the number of hidden nodes, and it may be very difficult to determine the ANN structure that optimizes the different performance and uncertainty measures simultaneously. For the ensemble model based on the BR method, it is recommended to use the maximum number of hidden nodes satisfying the Kolmogorov theorem, because the performance and uncertainty scores improve with a larger number of hidden nodes. The ensemble method requires a series of repeated simulations,


which might be problematic in practice where the simulation time is important. Further testing of the ensemble ANN model employed in this study is needed for diverse watersheds with longer validation data.

References

[1] ASCE Task Committee on Application of Artificial Neural Networks in Hydrology, Artificial neural networks in hydrology I: preliminary concepts, Journal of Hydrologic Engineering. 5 (2000a) 115-123.
[2] ASCE Task Committee on Application of Artificial Neural Networks in Hydrology, Artificial neural networks in hydrology II: hydrologic applications, Journal of Hydrologic Engineering. 5 (2000b) 124-137.
[3] H.R. Maier, A. Jain, G.C. Dandy, K.P. Sudheer, Methods used for the development of neural networks for the prediction of water resource variables in river systems: current status and future directions, Environmental Modelling & Software. 25 (2010) 891-909.
[4] R.J. Abrahart, F. Anctil, P. Coulibaly, C.W. Dawson, N.J. Mount, L.M. See, A.Y. Shamseldin, D.P. Solomatine, E. Toth, R.L. Wilby, Two decades of anarchy? Emerging themes and outstanding challenges for neural network river forecasting, Progress in Physical Geography. 36 (2012) 480-513.
[5] L. Baker, D. Ellison, The wisdom of crowds - ensembles and modules in environmental modelling, Geoderma. 147 (2008) 1-7.
[6] A.J.C. Sharkey, On combining artificial neural nets, Connection Science. 8 (1996) 299-314.
[7] F. Anctil, N. Lauzon, Generalisation for neural networks through data sampling and training procedures, with applications to streamflow predictions, Hydrology and Earth System Sciences. 8 (2004) 940-958.
[8] L. Baker, D. Ellison, Optimisation of pedotransfer functions using an artificial neural network ensemble method, Geoderma. 144 (2008) 212-224.
[9] C. Shu, D.H. Burn, Artificial neural network ensembles and their application in pooled flood frequency analysis, Water Resources Research. 40 (2004) W09301, doi:10.1029/2003WR002816.
[10] T. Asefa, Ensemble streamflow forecast: a GLUE-based neural network approach, Journal of the American Water Resources Association. 45 (2009) 1155-1163.
[11] R.K. Srivastav, K.P. Sudheer, I. Chaubey, A simplified approach to quantifying predictive and parametric uncertainty in artificial neural network hydrologic models, Water Resources Research. 43 (2007) W10407, doi:10.1029/2006WR005352.
[12] M. Talebizadeh, S. Morid, S.A. Ayyoubzadeh, M. Ghasemzadeh, Uncertainty analysis in sediment load modeling using ANN and SWAT model, Water Resources Management. 24 (2010) 1747-1761.
[13] M.K. Tiwari, C. Chatterjee, Uncertainty assessment and ensemble flood forecasting using bootstrap based artificial neural networks (BANNs), Journal of Hydrology. 382 (2010) 20-33.
[14] H. Yonaba, F. Anctil, V. Fortin, Comparing sigmoid transfer functions for neural network multistep ahead streamflow forecasting, Journal of Hydrologic Engineering. 15 (2010) 275-283.
[15] H.R. Maier, G.C. Dandy, Neural networks for the prediction and forecasting of water resources variables: a review of modelling issues and applications, Environmental Modelling & Software. 15 (2000) 101-124.
[16] D.J.C. MacKay, Bayesian interpolation, Neural Computation. 4 (1992) 415-447.
[17] F.D. Foresee, M.T. Hagan, Gauss-Newton approximation to Bayesian regularization, Proceedings of the 1997 International Joint Conference on Neural Networks. (1997) 1930-1935.
[18] K.P. Sudheer, A.K. Gosain, K.S. Ramasastri, A data-driven algorithm for constructing artificial neural network rainfall-runoff models, Hydrological Processes. 16 (2002) 1325-1330.
[19] C.W. Dawson, R.J. Abrahart, L.M. See, HydroTest: a web-based toolbox of evaluation metrics for the standardised assessment of hydrological forecasts, Environmental Modelling & Software. 22 (2007) 1034-1052.
[20] K.C. Abbaspour, A. Johnson, M.Th. van Genuchten, Estimating uncertain flow and transport parameters using a sequential uncertainty fitting procedure, Vadose Zone Journal. 3 (2004) 1340-1352.
[21] K.C. Abbaspour, J. Yang, I. Maximov, R. Siber, K. Bogner, J. Mieleitner, J. Zobrist, R. Srinivasan, Modelling hydrology and water quality in the pre-alpine/alpine Thur watershed using SWAT, Journal of Hydrology. 333 (2007) 413-430.
[22] L. Xiong, M. Wan, X. Wei, K.M. O'Connor, Indices for assessing the prediction bounds of hydrological models and application by generalised likelihood uncertainty estimation, Hydrological Sciences Journal. 54 (2009) 852-871.
[23] X. Zhang, F. Liang, R. Srinivasan, M. Van Liew, Estimating uncertainty of streamflow simulation using Bayesian neural networks, Water Resources Research. 45 (2009) W02403, doi:10.1029/2008WR00730.
[24] K.S. Kasiviswanathan, R. Cibin, K.P. Sudheer, I. Chaubey, Constructing prediction interval for artificial neural network rainfall runoff models based on ensemble simulations, Journal of Hydrology. 499 (2013) 275-288.