Comparison of three updating models for real time forecasting: a case study of flood forecasting at the middle reaches of the Huai River in East China Kailei Liu, Cheng Yao, Ji Chen, Zhijia Li, Qiaoling Li & Leqiang Sun

Stochastic Environmental Research and Risk Assessment ISSN 1436-3240 Stoch Environ Res Risk Assess DOI 10.1007/s00477-016-1267-x


ORIGINAL PAPER

Comparison of three updating models for real time forecasting: a case study of flood forecasting at the middle reaches of the Huai River in East China Kailei Liu1,2 • Cheng Yao1 • Ji Chen2 • Zhijia Li1 • Qiaoling Li1 • Leqiang Sun3

© Springer-Verlag Berlin Heidelberg 2016

Abstract This study explores the performances of three real-time updating models in improving flood forecasting accuracy. The first model is the K-nearest neighbor (KNN) algorithm. The KNN algorithm estimates forecast errors based upon the most similar samples of errors rather than on the most recent ones. The two other updating models are the Kalman filter (KF) and a combined model incorporating both the KF and KNN procedures. To compare the performances of these three models, this study uses the middle reaches of the Huai River in East China for a case study. Using 13 flood events occurring from 2003 to 2010 as examples, one hydraulic routing model is applied for flood simulation. Subsequently, the three updating models are utilized with lead times of 1- to 8-h for updating the outputs of the hydraulic model. Comparison of the updated results from the three updating models reveals that all three updating models improve the performance of the hydraulic model for flood forecasting. Among them, the KNN model performs more robustly for forecasts with a longer lead time than the other two updating models. Statistical results show that the KNN model is capable of providing excellent forecasts with an 8-h lead time in both the calibration and validation periods.

Corresponding author: Ji Chen, [email protected]

1 College of Hydrology and Water Resources, Hohai University, Xikang Road, Nanjing 210098, Jiangsu Province, China

2 Department of Civil Engineering, The University of Hong Kong, Pokfulam, Hong Kong, China

3 Civil Engineering Department, University of Ottawa, Ottawa, ON K1N 6N5, Canada

Keywords Hydraulic routing model · Real-time updating model · K-nearest neighbor · Parameter calibration · Kalman filter · The combined model · Flood forecasting

1 Introduction

Operational flood forecasting aims to accurately forecast the discharge or water stage. However, the forecasting skill of current mathematical models is generally limited. This may be due, for example, to unavoidable uncertainties and errors associated with rainfall data (Islam and Sivakumar 2002). Hydraulic routing models may also introduce errors (Sivakumar et al. 2001), in part because the performance of a hydraulic routing model is normally affected by errors in the stage-discharge rating curve, which is a frequently used downstream boundary condition (Wilson et al. 2002; Shen et al. 2015). In addition, human activities and hydraulic engineering works (e.g., dams and levees) can influence the flow state of a river, and improper treatment of them in a model may lead to poor results in operational flood forecasting (Todini 1999; Gupta et al. 2003; Sivakumar 2003). Hence, a large amount of research on improving flood forecasting technology has been carried out over the past decades (e.g., Georgakakos and Smith 1990; Todini 2005; Komma et al. 2008; Kan et al. 2015; Shen et al. 2015). It has been acknowledged that forecast updating models may be of considerable importance in the improvement of forecast accuracy (Moore et al. 2005; Tong et al. 2014). Updating models can be applied to correct a model output in the forecast period to account for inaccuracy of the input data, state variables, model parameters and/or output variables of the hydraulic routing models. Updating procedures are largely driven by operational real-time observations, and are carried out in a recursive manner (Butts et al. 2002; Madsen and Skotner 2005). The most widely used procedure in operational forecasting is the updating of output variables (i.e., error prediction) (Refsgaard 1997; Sivakumar et al. 2002), and the errors are considered to be predictable and related to the previous and subsequent errors of water stage or discharge (Box et al. 2011). An error prediction model has the advantage of simplicity and does not require presupposition of the source of errors (Butts et al. 2002). The most commonly used model is the standard linear autoregressive (AR) model, which depends on the assumption that the forecast errors are strongly related to the most recent errors (Box et al. 2011). However, updating theories (e.g., the AR model) based on the regression of errors are not suitable for evaluating forecast errors of non-stationary processes, especially for peak flow forecasting or for flood forecasting that is significantly affected by flood management activities.

In this study, three updating models are used for improving the flood forecasting accuracy of a hydraulic routing model. The first model is the nonparametric K-nearest neighbor (KNN) algorithm, which has been widely used for machine learning, pattern recognition, statistics and weather forecasting since the 1950s (Blackith 1958). Karlsson and Yakowitz (1987) introduced the KNN model to the field of hydrology in order to test its application in rainfall-runoff forecasting, and it was found that its performance was superior to that of the autoregressive moving average (ARMA) model when dealing with large datasets. The attractive theoretical properties of the KNN model include that it does not require regression equations or the calculation of correlation coefficients, and that it is easily accommodated into highly nonlinear dynamic models (e.g., the one-dimensional hydraulic model).
The two other updating models used in this study are the Kalman filter (KF) and a combined model, which is developed by incorporating both the KF and KNN procedures. The KF was introduced by Husain (1985) to adjust model variables and parameters of a flood forecasting system, and its value in improving the accuracy of flood forecasting has been reported by many studies (e.g., Refsgaard 1997; Young 2002; Shen et al. 2015). The KF focuses on updating state variables or parameters of dynamical systems (Hunt et al. 2007), for example, the Manning’s roughness n, which represents the resistance to river flow due to channel roughness caused by bed forms, bend effects, and eddy losses (Anderson and Burt 1985). In addition, variables of the Saint–Venant equations and the modeled, or observed, discharge and water stage are assembled as a state vector. These updating procedures, unlike the output updating procedures that correct the estimated water stage or discharge at one cross section only, are able to update the entire state of a routing model simultaneously. Building on Kalman filtering theory, researchers have also developed an extended Kalman filter (EKF) and an ensemble Kalman filter (EnKF). However, although these two have received much attention in recent years (Anderson and Anderson 1999; Miyoshi and Kunii 2012; Wu et al. 2013, 2014), Verlaan (1998) and Moore et al. (2005) found that the EKF and the EnKF require large amounts of computing power while performing no better than the KF. Additionally, the hydraulic model can be transcribed into linear functions and is therefore easily coupled with the KF; thus the KF, rather than the EKF or EnKF, is adopted in this study. The KF procedure is used with the state vector designed by Ge et al. (2005), and the Sage–Husa model is used to evaluate characteristics of the process noise in the KF procedure. The third model used in this study is the combined model, which is developed by integrating the KNN and KF models. This study adopts a combined approach, which is based on the combination of the KF procedure and an error-forecasting model developed by Madsen and Skotner (2005). In the third model, the KF procedure utilizes the most recent observations for adjusting the hydrographs at all the cross sections of the study reach, and then the KNN model is implemented to use the KF-updated forecasts to estimate the errors and to correct the forecasts in the forecast period. The paper is organized as follows. First, the hydraulic routing model and the three real-time updating models, i.e., the KNN, the KF and the combined model, are introduced. Then, these updating models are used to adjust the simulated discharge, which is obtained by running the hydraulic routing model, over the middle reaches of the Huai River in East China.

2 Models

This section introduces the hydraulic routing model and the three updating models. Figure 1 shows a schematic diagram of these models, with emphasis on the comparison of, and relationships among, the three updating models.

Fig. 1 Schematic diagram of the relationships of the hydraulic routing model (noted as Hydraulic model in the figure) with the three updating models

2.1 The hydraulic routing model

The Saint–Venant equations describe one-dimensional water movement in a river channel. With the Preissmann weighted four-point implicit scheme, the equations are transcribed as follows (Anderson and Burt 1985):

$$-Q_{i-1}^{j} + Q_{i}^{j} + C_{i}^{j} Z_{i-1}^{j} + C_{i}^{j} Z_{i}^{j} = D_{i}^{j} \qquad (1)$$

$$E_{i}^{j} Q_{i-1}^{j} + G_{i}^{j} Q_{i}^{j} - F_{i}^{j} Z_{i-1}^{j} + F_{i}^{j} Z_{i}^{j} = \phi_{i}^{j} \qquad (2)$$

where $i$ is the cross section number; $j$ is the time step; and $Q_{i}^{j}$ and $Z_{i}^{j}$ are the discharge and water stage of section $i$ at time $j$,
respectively. $C_{i}^{j}$, $D_{i}^{j}$, $E_{i}^{j}$, $F_{i}^{j}$, $G_{i}^{j}$ and $\phi_{i}^{j}$ are the state variables corresponding to $i$ and $j$, and their values can be calculated according to the model state. The river reach is divided into N cross sections, and 2N − 2 equations are then established. Given the initial discharge and water stage at all cross sections, the variables in the Saint–Venant equations can be obtained through a recursive calculation.

2.2 The KNN based real-time updating model

The KNN model derives from the algorithm of KNN nonparametric regression, which has been successfully employed in various forecasting applications (e.g., economics and weather forecasting) (Altman 1992; Foley et al. 2012; Butler and Kazakov 2010). The KNN updating procedure utilizes the k most representative samples from the historical datasets rather than the most recent ones. These samples may include forecast errors that are close to the error in the present forecast. Moreover, the KNN requires neither solutions to linear or nonlinear regression equations nor a normality assumption for the residuals, and it avoids the complicated computations associated with some updating models.

The KNN updating model chooses k nearest neighbors from the large-sample database archived before real-time forecasting begins. For the KNN model, the streamflow processes corresponding to the k nearest neighbors are dynamically similar to the present hydro-physical process for updating flood forecasting. Then, the input–output mapping between feature vectors and the corresponding forecast errors in the k neighbors is used to reflect the potential relationship of the error derivation at the time of forecasting. Finally, the KNN estimates the errors at the forecast time and adds them to the corresponding forecasted discharges from the hydraulic routing model to obtain the updated flood forecasts.

Figure 2 shows the details of the KNN model for real-time updating and forecasting of streamflow. The error at the forecast time is evaluated on the basis of the statistical characteristics of prior modeled errors. The following gives the four steps of using the KNN model:

(1) Collection of forecast errors: An error database with prior knowledge and adaptation rules is necessary for the KNN model. The k most appropriate data will be used for estimating the real-time forecast errors.

(2) Vectorization of forecast errors: In the KNN model, the relationship among the forecast errors is expressed by the equation below:

$$p_{i+ext} = f(p_{i-s}, \ldots, p_{i-2}, p_{i-1}) \qquad (3)$$

where $p_i$ is the forecast error at time $i$, $ext$ is the length of the forecast period, $(p_{i-s}, \ldots, p_{i-2}, p_{i-1})$ is the error vector consisting of the $s$ previous forecast errors that are considered to influence the value of $p_{i+ext}$, and $s$ is the dimension of the vector. The KNN is a nonparametric procedure as it concerns the mapping relationships among the elements on both sides of Eq. (3), instead of the exact expression of the function (Altman 1992).

(3) Identification of the k nearest neighbors: The collected error vectors are compared with $(p_{i-s}, \ldots, p_{i-2}, p_{i-1})$ and the Euclidean distances between them are calculated. The Euclidean distance between vector $(p_{i-s}, \ldots, p_{i-2}, p_{i-1})$ and vector $(p_{j-s}, \ldots, p_{j-2}, p_{j-1})$ (j = 1, 2, 3, …, k) is expressed as $L_j$. Then, all the error vectors are sorted by the value of $L_j$ from smallest to largest to identify the k error vectors that are the most similar to that at the forecast time (t + ext).

(4) Estimation of the forecast errors: The Inverse Distance Weighted model (Eq. 4) is adopted in the KNN model to determine the weight of each historical forecast error identified in step 3, and the weighted average of the k forecast errors is taken as the error at the forecast time.

$$p_{i+ext} = \sum_{j=1}^{k} \frac{p_j}{L_j} \Bigg/ \sum_{j=1}^{k} \frac{1}{L_j} \qquad (4)$$

Fig. 2 Schematic diagram of the KNN updating model for real time flood forecasting (t1, t2, and t3 are the time of the newest observation, forecast and updated-forecast, respectively). Flow line (1) shows updated forecast error at time t1 compared to the historical data of forecast errors; resample the historical forecast errors to build error vectors; flow line (2) indicates the k error vectors that are most similar to the present one; flow line (3) displays the forecast error at time instant t2; and flow line (4) presents the updated discharge at t2 through combining the forecast error et2 and the forecast from the hydraulic routing model
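The four steps of the KNN procedure can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in the study: the function names `build_error_vectors` and `knn_error_forecast`, the small constant `eps` that guards against a zero distance, the synthetic error series, and the sign convention (error taken as observed minus forecast, so that adding it corrects the forecast) are all introduced here for illustration.

```python
import numpy as np


def build_error_vectors(errors, s, ext):
    """Pair each vector (p[i-s], ..., p[i-1]) with the later error p[i+ext], as in Eq. (3)."""
    errors = np.asarray(errors, dtype=float)
    idx = range(s, len(errors) - ext)
    features = np.array([errors[i - s:i] for i in idx])
    targets = np.array([errors[i + ext] for i in idx])
    return features, targets


def knn_error_forecast(features, targets, current, k, eps=1e-9):
    """Inverse-distance-weighted mean of the targets of the k nearest vectors, as in Eq. (4)."""
    current = np.asarray(current, dtype=float)
    dist = np.linalg.norm(features - current, axis=1)  # Euclidean distances L_j
    nearest = np.argsort(dist)[:k]                     # indices of the k smallest L_j
    weights = 1.0 / (dist[nearest] + eps)              # eps is an assumption, avoids 1/0
    return float(np.sum(weights * targets[nearest]) / np.sum(weights))


# Hypothetical archived errors (m3/s); s = 2 and ext = 1 h, with k = 3 chosen for brevity
history = [12.0, 15.0, 11.0, 9.0, 14.0, 16.0, 10.0, 8.0, 13.0, 15.5]
features, targets = build_error_vectors(history, s=2, ext=1)
estimated_error = knn_error_forecast(features, targets, current=[14.5, 15.8], k=3)
updated_forecast = 5000.0 + estimated_error  # hypothetical model forecast plus estimated error
```

Because the weights are proportional to 1/L_j, a historical vector that almost coincides with the current one dominates the estimate, which is the intended behavior of the inverse distance weighting.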

2.3 The Kalman Filter (KF) model

A large amount of research has been carried out on the application of the KF model to real-time updating for flood forecasting (Husain 1985). Those studies are generally classified into two categories. One focuses on updating the outputs in the forecast period (Werner et al. 2005), and the other focuses on updating parameters or state variables of the hydraulic routing model (Crissman et al. 1993; Ge 2002). This study uses the KF model to update several state variables of the hydraulic routing model. In applying the KF model to the hydraulic routing model, several key issues need to be resolved in advance (Verlaan 1998; Wang and Bai 2008), and these issues include formulation of the KF model and adaptive estimation of the error covariance matrices. The following gives the details of resolving these issues.

2.3.1 Coupling the KF procedure with the one-dimensional hydraulic routing model

The KF model can be coupled with the hydraulic routing model, and the state vector can be represented as follows:

State vector:

$$x_t = [G_1, D_1, \phi_1, D_2, \phi_2, \ldots, D_{N-1}, \phi_{N-1}, G_N]^T \qquad (5)$$

Assuming there are r (≤ N) stations where observed values of discharge and water stage are available, the observation vector is represented in the form of a (2r)-dimensional vector.

Observation vector:

$$y_t = [Q_1, Z_1, Q_2, Z_2, Q_3, \ldots, Z_{r-1}, Q_r, Z_r]^T \qquad (6)$$

where $G_1 = Q_1$ and $G_N$ are constant coefficients of the stage-discharge rating curve (expressed in the form of a two-variable linear equation); $D_i$, $\phi_i$ (i = 1 to N − 1) are variables of the hydraulic routing model. The hydraulic routing model is easily rewritten in the form of the state function and the measurement function as follows:

State function:

$$x_{t+1} = I x_t + \omega_t \qquad (7)$$

Measurement function:

$$y_t = H x_t + \nu_t \qquad (8)$$

where $I$ is the N-dimensional unit matrix, $H$ is the (2r × N)-dimensional measurement matrix, and $\omega_t$ and $\nu_t$ are the vectors of the system process noise and measurement noise, respectively. The observed data of water stage and discharge at all of the r cross sections at time t − ext (ext is the length of the forecast period) are used to generate the observation vector at time t. Detailed formulae of the KF for coupling with the one-dimensional hydraulic routing model can be found in the studies of Crissman et al. (1993) and Ge et al. (2005).
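As a minimal illustration of the state and measurement functions in Eqs. (7) and (8), one predict-and-update cycle of a linear Kalman filter with an identity state transition can be written as follows. The function name `kf_step`, the argument layout, the optional noise means (defaulting to zero), and the toy scalar values are assumptions made for this sketch; it does not reproduce the detailed formulation of Crissman et al. (1993) or Ge et al. (2005).

```python
import numpy as np


def kf_step(x, P, y, H, Q, R, q=None, r=None):
    """One Kalman filter cycle for x_{t+1} = I x_t + w_t and y_t = H x_t + v_t."""
    n = x.shape[0]
    q = np.zeros(n) if q is None else q            # mean of process noise (assumed 0 by default)
    r = np.zeros(y.shape[0]) if r is None else r   # mean of measurement noise (assumed 0 by default)

    # Prediction: the transition matrix is the identity, so only the noise terms enter
    x_pred = x + q
    P_pred = P + Q

    # Update with the newest observations
    innovation = y - (H @ x_pred + r)
    S = H @ P_pred @ H.T + R                        # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)             # Kalman gain
    x_new = x_pred + K @ innovation
    P_new = (np.eye(n) - K @ H) @ P_pred
    return x_new, P_new


# Toy scalar case with hypothetical values: prior 0 with variance 1, observation 1 with variance 1
x_new, P_new = kf_step(
    x=np.array([0.0]), P=np.array([[1.0]]), y=np.array([1.0]),
    H=np.array([[1.0]]), Q=np.array([[0.0]]), R=np.array([[1.0]]),
)
```

In the toy case the prior and the observation are equally uncertain, so the updated state lies halfway between them and the variance is halved.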

2.3.2 Evaluation of characteristics of parameters in the KF model

The process error covariance matrix Q and measurement error covariance matrix R (together with the process error vector q and measurement error vector r) are key parameters for the KF model. Their values influence model stability and capability in real-time forecasting.

$$r = E\{\nu_t\}, \quad R\,\delta_{t,s} = E\{\nu_t \nu_s^T\}, \quad q = E\{\omega_t\}, \quad Q\,\delta_{t,s} = E\{\omega_t \omega_s^T\}, \quad \delta_{t,s} = \begin{cases} 1, & t = s \\ 0, & t \neq s \end{cases} \qquad (9)$$

where $\delta$ is the Kronecker symbol. The performance of the KF model may be limited by inaccurate estimation of the noise covariance matrices. The KF procedure requires Q and R; however, determining the formulations and values of these matrices is challenging. Sage and Husa (1969) proposed an adaptive filter to estimate the covariance matrices for a wide spectrum of dynamic systems. Zhang (1998) found that the Sage–Husa adaptive filtering model cannot estimate Q and R simultaneously. This study analyzed the performance of Sage–Husa adaptive filtering for evaluating Q or R, and found that the filtering performs more stably in evaluating the vector q and matrix Q than r and R; further, it should be noted that when the time series is relatively long, the effect of the adaptive filtering may be impaired.

2.4 The combined model

Traditional error prediction models focus on the performance of the forecasts rather than on analyzing the sources of errors. The updating models mainly deal with the errors from the observations and state variables. In this study, the combined model is developed by coupling the KNN model with the KF procedure (see Fig. 1). The KF model utilizes real-time information of the input variables to update state variables and thereby improve the forecasts from the hydraulic routing model. The KNN model then deals with the errors between the KF-updated outputs and the observations, based on the information acquired from prior practical forecasting. The combined model (noted as KK) may be superior to the other two ‘‘single’’ updating models in the following two respects (Madsen and Skotner 2005; Chen and Wu 2012). Firstly, the combined model takes into account real-time measurements from all the observation stations along the river channel. Secondly, the model avoids the need for manual adjustment to change the updating models in case the real-time operation of the updating model cannot work properly.
However, the combined model is more complex than either of the ‘‘single’’ models; uncertainty inherent in input variables, boundary and initial conditions, and model structures may deteriorate the model performance.

3 Study background

Fig. 3 Study area and the river reaches of the Huai River in East China

3.1 Study area

The study area is the middle reaches of the Huai River between the Wujiadu and Xiaoliuxiang stations (see Fig. 3)
in East China. The Huai River is located between the Yangtze River and the Yellow River, and its mainstem is 1078 km long. The study channel is 152 km long and is divided into 76 cross sections with an average interval of 2 km. As depicted in Fig. 3, two hydrological observation stations (the Wujiadu and Xiaoliuxiang stations) and one water stage observation station (the Linhuaiguan station) are located along the river channel. There are two lateral inflows, at Mohekou and Wuhe (see Fig. 3). The inflow process at the upstream Wujiadu station and the stage-discharge rating curve of the downstream Xiaoliuxiang station are chosen as the boundary conditions for running the hydraulic routing model. In order to protect the densely populated areas and property in the flood plain of the Huai River basin, five floodwater detention and diversion regions have been
developed along the river dikes, or banks, to store the floodwater. However, owing to rapid socioeconomic development over the past several decades, about 284,000 people now live in the flood detention and diversion regions. Thus, accurate flood forecasting is critical for flood management (Chaleeraktrakoon and Chinsomboon 2015). Fortunately, hydraulic routing models have the capability to simulate a wide spectrum of waterway characteristics (Linsley et al. 1982; Fread 1985; Todini 2005; WMO 2011), and are valid in areas where backwater effects are significant. However, due to their significant simplification, hydraulic routing models cannot provide stable and accurate forecasts, especially for peak flow forecasting. Therefore, updating models are necessary to enhance performance in flood forecasting. However, the forecasting lead time of various updating models is generally restricted by three factors: (1) the lead time of the inflow process generated by the rainfall-runoff model of the upstream sub-basin, (2) the lag of the flood wave from the upstream to the downstream extremity, and (3) the operational requirement for flood alteration activities in the study area (Sivakumar and Wallender 2005). As the travel time of the flood wave in the study reach is about 26 h and flood alteration should be made 8 h ahead in the study area (Xin and Cheng 1998), the lead time of the three updating models ranges from 1 to 8 h. Specifically, this study explores flood forecasting with lead times of 1, 2, 3, 4, 6, and 8 h.

Table 1 Information of the thirteen flood events used for calibration and validation of the models

| Period | Flood event | Start/end time | Peak flow discharge (m³/s) |
|---|---|---|---|
| Calibration | 2003062320 | 20:00 Jun 23/20:00 Aug 10, 2003 | 8580 |
| Calibration | 2003083000 | 00:00 Aug 30/08:00 Oct 5, 2003 | 5360 |
| Calibration | 2004070820 | 20:00 Jul 8/20:00 Aug 14, 2004 | 3950 |
| Calibration | 2004082620 | 20:00 Aug 26/14:00 Sep 9, 2004 | 3450 |
| Calibration | 2005070804 | 04:00 Jul 8/14:00 Jul 28, 2005 | 6350 |
| Calibration | 2005082020 | 20:00 Aug 20/20:00 Sep 13, 2005 | 7180 |
| Calibration | 2006062708 | 08:00 Jun 27/08:00 Aug 26, 2006 | 4410 |
| Validation | 2007022508 | 08:00 Feb 25/14:00 Apr 8, 2007 | 1750 |
| Validation | 2007062508 | 08:00 Jun 25/08:00 Aug 24, 2007 | 7950 |
| Validation | 2008041620 | 20:00 Apr 16/20:00 May 1, 2008 | 2040 |
| Validation | 2008071708 | 08:00 Jul 17/02:00 Sep 6, 2008 | 4550 |
| Validation | 2009082914 | 14:00 Aug 29/08:00 Sep 13, 2009 | 2640 |
| Validation | 2010081408 | 08:00 Aug 14/08:00 Oct 19, 2010 | 5080 |

3.2 Research data

All three real-time updating models are used individually to improve the accuracy of the simulated discharge from the hydraulic routing model for the middle reaches of the Huai River. Due to data availability, this study uses thirteen flood events that occurred during the period 2003–2010, with hourly observations of discharge and water stage. Seven flood events during the period 2003–2006 are used for calibration, while the remaining six events during the period 2007–2010 are used for validation. Table 1 lists the information of these flood events, including the start time, end time and peak flow discharge. The hydraulic model and the three updating models are run at a time step of 1 h. All three updating models are applied with the predetermined lead times given above. To test the applicability of the three updating models in real time flood forecasting, forecasts of discharge computed by the hydraulic routing model at t + ext are corrected by the three updating models, and then the updated
forecast variable is treated as the final forecast for further analysis of the performance of the updating models.

3.3 Criteria for evaluating model performance

To evaluate the accuracy of the final forecasts, this study adopts NSE (Nash–Sutcliffe efficiency) and ARPE (Absolute value of Relative Peak Error) as criteria:

$$NSE = 1 - \frac{\sum_{i=1}^{n} (q_{s,i} - q_{o,i})^2}{\sum_{i=1}^{n} (q_{o,i} - \bar{q}_o)^2} \qquad (10)$$

where $i$ is the time step, $n$ is the length of the flood event, $q_{o,i}$ is the observed discharge at time $i$, $q_{s,i}$ is the forecast discharge at time $i$, and $\bar{q}_o$ is the mean value of observed discharge. When the computed and the observed hydrographs match perfectly, the NSE is equal to 1. The average value of the efficiency criterion NSE is noted as $\overline{NSE}$. The equation for computing ARPE is given as follows:

$$ARPE = \left| \frac{q_{s,p} - q_{o,p}}{q_{o,p}} \right| \qquad (11)$$

where $q_{s,p}$ is the forecast of the flood peak and $q_{o,p}$ is the observed flood peak. If the peak flow is perfectly forecasted, the ARPE is 0.
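Both criteria are straightforward to compute. The sketch below is one possible implementation of Eqs. (10) and (11); the function names `nse` and `arpe` and the short discharge series are hypothetical and serve only as an illustration.

```python
import numpy as np


def nse(forecast, observed):
    """Nash-Sutcliffe efficiency, Eq. (10): 1 minus error variance over observed variance."""
    forecast = np.asarray(forecast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    residual = np.sum((forecast - observed) ** 2)
    spread = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / spread


def arpe(forecast_peak, observed_peak):
    """Absolute value of relative peak error, Eq. (11), returned as a fraction."""
    return abs((forecast_peak - observed_peak) / observed_peak)


# Hypothetical hourly discharges (m3/s)
observed = [1000.0, 1500.0, 2500.0, 2000.0, 1200.0]
forecast = [1100.0, 1400.0, 2300.0, 2100.0, 1250.0]
event_nse = nse(forecast, observed)
event_arpe = arpe(max(forecast), max(observed))  # multiply by 100 to express in percent
```

A forecast that simply repeats the mean observed discharge gives an NSE of 0, which is a convenient reference point when reading the NSE values reported for the models.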

4 Results and discussion

Forecasts of discharge at the outlet of the study area, Xiaoliuxiang station (see Fig. 3), are analyzed. Section 4.1 verifies the capability of the hydraulic routing model in flood forecasting. Then, Sect. 4.2 presents the performance evaluation of the three updating models.


4.1 Performance of the hydraulic routing model

Table 2 lists the NSE and ARPE of the forecast results of the hydraulic routing model. From this table, it can be observed that the values of NSE are larger than 0.80 for 5 of the 7 flood events in the calibration period and for 4 of the 6 flood events in the validation period. Additionally, most of the ARPE values for the flood events in both the calibration and the validation periods are smaller than 10 %. The statistical results given in Table 2 reveal that the discharge forecasts from the hydraulic routing model are close to the observations; the hydraulic routing model is capable of computing both the flood hydrograph and the peak flow. From Table 2, it can be seen that the flood event of 2007062508 has an anomalous ARPE value of 14.53 %, the largest among all of the flood events. This indicates that the hydraulic routing model does not perform well in the peak flow simulation of that flood event. Checking the historical records revealed a possible reason for this poor performance. The flood event covers the period of 25 June to 24 August 2007 (see Table 1). Before the observed flood peak occurred on 20 July 2007, in order to reduce the magnitude of the flood peak in the mainstem, there were some engineering activities in the river reaches during the period 11–19 July 2007. However, the location, timing and volume of the diverted floodwater (i.e., human activities) were not included in the hydraulic routing model; consequently, the forecast flood peak from the hydraulic routing model is significantly different from the observed one in that flood event. Overall, from Table 2, it can be observed that even though the hydraulic routing model can provide reasonable simulation of flood processes and peak flows, there is room for further improvement of flood forecasting accuracy.

Table 2 Forecast accuracy of the hydraulic routing model

| Period | Flood event | NSE | ARPE (%) |
|---|---|---|---|
| Calibration | 2003062320 | 0.96 | 1.48 |
| Calibration | 2003083000 | 0.50 | 3.52 |
| Calibration | 2004070820 | 0.95 | 9.67 |
| Calibration | 2004082620 | 0.92 | 6.96 |
| Calibration | 2005070804 | 0.77 | 0.69 |
| Calibration | 2005082020 | 0.90 | 5.72 |
| Calibration | 2006062708 | 0.91 | 4.78 |
| Validation | 2007022508 | 0.73 | 7.21 |
| Validation | 2007062508 | 0.98 | 14.53 |
| Validation | 2008041620 | 0.75 | 9.23 |
| Validation | 2008071708 | 0.96 | 3.97 |
| Validation | 2009082914 | 0.96 | 6.22 |
| Validation | 2010081408 | 0.99 | 3.50 |

4.2 Performance of the three updating models

This study adopts the Ordinary Least Squares (OLS) algorithm to calibrate the parameters of the KNN procedure, aiming to fit the time series of errors in the forecasts of the hydraulic routing model. The vector dimension s is set to 2, and k varies with the lead time (see Table 3). Once the parameters of the KNN procedure and the system/measurement noise of the KF procedure are determined, the three updating models can be applied to, or coupled with, the hydraulic routing model. The KNN model provides estimates of forecast errors in order to correct the forecasts. The KF model adjusts variables in the hydraulic routing model, which is likely to lead to better forecasts. The KK model uses the KF procedure to perform adaptive adjustment of the variables in the hydraulic routing model and the KNN procedure to deal with error persistence between the KF-updated discharge and the observed discharge. Tables 4 and 5 and Figs. 4, 5, and 6 show the statistics of NSE of the updated forecasts of discharge at the Xiaoliuxiang station (see Fig. 3). Further, Figs. 5 and 6 present the updated forecasts at the flood peak for the three updating models. The following gives a detailed explanation of these tables and figures.

Table 3 Values of the parameter k in the KNN procedure for different lead times

| Lead time (h) | k |
|---|---|
| 1 | 10 |
| 2 | 10 |
| 3 | 11 |
| 4 | 12 |
| 6 | 13 |
| 8 | 14 |

4.2.1 Comparison of the forecasted and observed flood hydrographs The efficiency criterion NSE is used as the primary model efficiency index to evaluate the accuracy of the updated flood hydrographs (Yao et al. 2012). Table 4 lists the NSE of the updated results for each of the updating models with different lead times. The values of NSE in Table 4 show that any of the three updating models provide higher accuracy, and this indicates that the updated forecasts are superior to the original forecasts from the hydraulic routing model. This suggests that all of the three updating models are valuable in realtime applications. As the lead time gets longer, NSE for each updating model trends to get smaller and their

Author's personal copy Stoch Environ Res Risk Assess Table 4 Values of NSE for updated forecasts from the three different updating modelsa, the left and right sides of the slash showing the average NSE for the events in the calibration and validation periods (see Table 1), respectively Model

Lead time (h) 1

2

3

4

6

8

KNN

0.999/0.998

0.998/0.996

0.998/0.995

0.997/0.992

0.995/0.987

0.993/0.980

KF

0.998/0.997

0.998/0.996

0.997/0.993

0.995/0.990

0.993/0.981

0.990/0.970

KK

0.998/0.998

0.998/0.998

0.998/0.997

0.997/0.995

0.995/0.990

0.992/0.983

a

Average NSE of non-updated forecasts that are generated directly by the one-dimensional hydraulic routing model is 0.843 and 0.895 for the calibration and validation periods (see Table 2), respectively

Table 5 NSE of the ‘best’ and ‘worst’ performances of the real-time updating models for the thirteen flood events (each entry gives the model, with its NSE in parentheses)

Flood event   The best performance                     The worst performance
              Lead 1       Lead 4       Lead 8       Lead 1       Lead 4       Lead 8
2003062320    KNN (0.998)  KNN (0.996)  KNN (0.993)  KF (0.995)   KF (0.995)   KF (0.990)
2003083000    KK (0.999)   KK (0.998)   KNN (0.995)  KNN (0.999)  KF (0.997)   KF (0.993)
2004070820    KK (0.999)   KK (0.999)   KNN (0.996)  KNN (0.999)  KF (0.998)   KF (0.994)
2004082620    KNN (0.996)  KK (0.992)   KNN (0.983)  KF (0.994)   KF (0.988)   KF (0.974)
2005070804    KK (0.999)   KNN (0.997)  KNN (0.993)  KNN (0.999)  KF (0.995)   KF (0.988)
2005082020    KK (0.999)   KK (0.999)   KNN (0.998)  KF (0.999)   KF (0.999)   KF (0.997)
2006062708    KK (0.999)   KK (0.999)   KK (0.997)   KF (0.999)   KF (0.997)   KF (0.991)
2007022508    KK (0.999)   KK (0.998)   KK (0.993)   KF (0.999)   KF (0.994)   KF (0.981)
2007062508    KNN (0.999)  KNN (0.998)  KNN (0.996)  KNN (0.999)  KF (0.998)   KF (0.995)
2008041620    KK (0.997)   KK (0.985)   KK (0.933)   KF (0.993)   KF (0.959)   KF (0.874)
2008071708    KK (0.999)   KNN (0.999)  KNN (0.998)  KNN (0.999)  KF (0.999)   KF (0.997)
2009082914    KNN (0.994)  KK (0.992)   KNN (0.985)  KF (0.994)   KF (0.989)   KF (0.977)
2010081408    KK (0.999)   KNN (0.999)  KNN (0.999)  KNN (0.999)  KF (0.999)   KF (0.997)

accuracy declines gradually. Across all lead times, the NSE values obtained from the KNN model are no lower than 0.993 and 0.980 for the calibration and validation periods, respectively. The performance of the KK model is comparable to that of the KNN model, with NSE values no lower than 0.992 and 0.983 for the calibration and validation periods,

Fig. 4 100 % stacked area chart of the three updating models as the a ‘best’ and b ‘worst’ performing models for different lead time forecasts (evaluated by the NSE)

respectively. The KF updating model also produces accurate forecasts, with NSE values no lower than 0.990 for the events in the calibration period and 0.970 for the events in the validation period. Further, Table 5 details the performance of the three models by listing the best and the worst performing updating model for each of the 13 flood events.
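As an illustration of how the entries of Table 5 can be summarized, the sketch below counts how often each updating model is listed as the best or the worst performer at the 8-h lead time and converts the counts into shares of the 13 events. The labels are transcribed from Table 5; the function name `model_frequency` and the variable names are hypothetical and are not taken from the study.

```python
from collections import Counter

MODELS = ("KNN", "KF", "KK")

# Best and worst model per flood event at the 8-h lead time,
# transcribed from Table 5 (one label per event, 13 events).
best_lead8 = ["KNN", "KNN", "KNN", "KNN", "KNN", "KNN", "KK",
              "KK", "KNN", "KK", "KNN", "KNN", "KNN"]
worst_lead8 = ["KF"] * 13


def model_frequency(labels):
    """Return, for each model, the fraction of events in which it is listed."""
    counts = Counter(labels)
    total = len(labels)
    return {model: counts.get(model, 0) / total for model in MODELS}


freq_best = model_frequency(best_lead8)
freq_worst = model_frequency(worst_lead8)

for model in MODELS:
    print(f"{model}: best in {freq_best[model]:.1%} of events, "
          f"worst in {freq_worst[model]:.1%}")
```

Repeating the same count for the other lead times would give one column of shares per lead time, which is the kind of information a 100 % stacked chart displays.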


From Table 5, it can be observed that for the forecasts with a lead time of 8 h, KNN produces the best forecasts in most of the flood events (10 out of 13 events), whereas KF does not outperform either of the other two models in any of the 13 events. It is worth noting that even the ‘worst’ forecasts from any of the three updating models are still superior to those provided by the hydraulic routing model (see Table 2). All three updating models provide improved forecasts for lead times of 1–8 h.

Figure 4a, b show the 100 % stacked area charts of the frequencies with which each updating model is the best and the worst performer for the 13 flood events. The frequency is computed by counting, for each lead time, the number of events in both the calibration and validation periods in which a model is the best (or worst) performer, and then dividing that number by 13. From Fig. 4, it can be observed that, although the KK updating model yields more satisfactory forecasts with 1- or 2-h lead times as evaluated by NSE, its performance declines rapidly as the lead time gets longer. For lead times longer than 6 h, the performance of the KK model is not better than that of the KNN model. By comparison, the KNN updating model gives better forecasts at lead times of 6 and 8 h. Figure 4b also indicates that, although the KF model is an efficient tool for real-time updating (as shown in Table 4), its performance is not superior to those of the KNN and the KK models.

4.2.2 Performance evaluation on peak flow forecast

Fig. 5 ARPE values for the final forecast generated by a the KNN model, b the KF model and c the KK model


Figure 5a–c present the RPE for the KNN, the KF and the KK models, respectively. The top and bottom lines of the box plots represent the maximum and minimum values, respectively. The top, middle, and bottom of each box stand for the 75th percentile, the median, and the 25th percentile, respectively, and the empty square marks the mean value. In Fig. 5, the RPE values for both the KF and the KK models are distributed over a marginally wider range than those of the KNN model, especially for the 4-, 6- and 8-h lead-time forecasts. The RPE of each of the three updating models shows a significant increasing tendency as the lead time increases from 1 to 8 h, indicating a growing divergence between the observed and the simulated streamflow hydrographs near peak flow. Additionally, positive and negative values of RPE are distributed fairly evenly around RPE = 0 for both the KNN and the KK models, while positive values are more prevalent than negative values for the KF model. The relatively poor performance of the KF and the KK models is mostly due to the uncertainty involved in estimating the system noise or the observation noise (see Eqs. 7 and 8).
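For readers who wish to reproduce the two evaluation measures used in this section, a minimal sketch follows. The NSE is written in its standard Nash-Sutcliffe form. The RPE is assumed here to be the signed difference between the simulated and observed peak discharges divided by the observed peak, expressed in percent, with ARPE as its absolute value; this sign convention, the function names and the short discharge series are illustrative assumptions rather than details taken from the study.

```python
import numpy as np


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus residual variance over observed variance."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def rpe(observed, simulated):
    """Relative peak error in percent (assumed sign: simulated minus observed)."""
    obs_peak = float(np.max(observed))
    sim_peak = float(np.max(simulated))
    return 100.0 * (sim_peak - obs_peak) / obs_peak


def arpe(observed, simulated):
    """Absolute relative peak error in percent."""
    return abs(rpe(observed, simulated))


# Hypothetical discharge series (m3/s), for illustration only.
obs = [100.0, 300.0, 500.0, 400.0, 200.0]
sim = [110.0, 290.0, 490.0, 410.0, 210.0]

print(f"NSE  = {nse(obs, sim):.3f}")
print(f"RPE  = {rpe(obs, sim):.1f} %")
print(f"ARPE = {arpe(obs, sim):.1f} %")
```

Under this convention a negative RPE means that the forecast underestimates the observed peak, so a box plot centred on zero indicates no systematic over- or underestimation.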


Fig. 6 Forecast accuracy of the KNN, the KF and the KK models in computing the peak flow at 8:00 on Sep 11, 2010 (with lead times ranging from 1 to 8 h); the x-axis shows the number of hours since 8:00 on Aug 14, 2010


4.2.3 Representation of flood processes

To further examine the performances of these three updating models, this subsection presents their peak flow forecasts for flood event 2010081408 as a case study. The event period is from 14 August to 19 October 2010 (see Table 1); the peak flow period is from 5:00 to 12:00 on September 11, 2010. The peak discharge of 5080 m³/s occurred at 8:00 on September 11, 2010, which is 673 h after 8:00 on August 14, 2010 (see Fig. 6). It is worth noting that Fig. 6 does not include the simulated peak flow from the hydraulic routing model, since it differs greatly from both the observations and the forecasts of the three updating models. From Fig. 6, it can be observed that the peak discharge time and the flood hydrograph from these updating models are close to the observations when the forecasting lead time is 1 or 2 h. Further, the KK and the KNN models provide more accurate forecasts than the KF model when the lead time is 2 h or longer.

Besides the test on peak flow forecasts, the efficiency of the three updating models is further explored by summarizing their computational cost in forecasting the peak flow at 8:00 on Sep 11, 2010. On a Core i5-4200M 2.50 GHz processor, the KK model requires more computational time (15–31 s) than the KNN (2–3 s) and KF (9–24 s) models (Table 6). However, all three updating models finish the simulation within 31 s, so all of them are efficient enough for real-time flood forecasting.

This study reveals that the KNN and the KK models perform more effectively than the KF procedure in terms of NSE and ARPE. The KK model, which couples the KF model with the KNN model and performs the best in forecasts with lead times of 1–4 h, is suitable for flood forecasting with a short lead time. However, the KK model is not the best choice when a longer lead time is required. The reasons may be twofold: (1) imperfection of the KF procedure in the hybrid model; and (2) the parameters s and k in the KK model, which are inherited from the KNN procedure, may not be suitable for estimating the errors in KF-updated forecasts. The KNN

Table 6 Computational time (s) for KNN, KF and KK in forecasting the peak flow at 8:00 on Sep 11, 2010

Updating model   Lead time (h)
                 1    2    3    4    6    8
KNN              3    3    3    3    2    2
KF               9    10   13   15   19   24
KK               16   15   21   23   26   31


model, which predicts the forecast error at the forecast time from the k most similar periods of historical flood processes, is shown to be more robust and reliable for real-time flood forecasts. Although the KNN real-time updating model is a simple error prediction model, it is distinct from other time-series models. For example, traditional time-series models (e.g. ARMA) treat forecast errors as following gradual processes, and consider the error at the forecast time to derive from a source of uncertainty similar to that of the most recent errors. However, these traditional models cannot account for the uncertainties of flood management activities (WMO 1992; Hossain et al. 2015) or for changes of water volume in the main channel. These uncertainties and changes cannot be neglected when the models are used to forecast large flood events; normally, traditional time-series models are prone to degrade peak flow forecasts (Sivakumar and Wallender 2005). The KNN model, on the other hand, is capable of dealing with both gradual and non-gradual processes. It predicts forecast errors on the basis of the k most similar chains of errors, and avoids over-reliance on the most recent ones, which may mislead error derivation at the forecast time (Chen and Wu 2012). Consequently, the KNN model is capable of providing a more reliable estimation of errors than the traditional time-series models, especially for processes near the peak flow. It also avoids rewriting the hydraulic routing model in state-space form and estimating the system/observation errors.

It can be concluded that the KK model is the most effective among the three updating models when the lead time is not greater than 4 h, while the performance of the KNN model is more stable and efficient than that of both the KF and the KK models when the lead time is longer than 4 h.
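The idea of predicting the error from the k most similar chains of past errors, rather than from the most recent errors alone, can be sketched as follows. This is a minimal illustration under stated assumptions and not the formulation used in the study: the error is taken as observed minus simulated discharge, similarity is measured by Euclidean distance between chains of length s, the k neighbours are averaged with equal weights, and the function name `knn_error_forecast` and the synthetic error series are hypothetical.

```python
import numpy as np


def knn_error_forecast(errors, s, k, lead):
    """Predict the error `lead` steps ahead from the k most similar error chains.

    errors: past forecast errors (assumed observed minus simulated), oldest first
    s:      length of the error chain used for matching
    k:      number of nearest chains to average
    lead:   forecast lead time, in time steps
    """
    errors = np.asarray(errors, dtype=float)
    n = len(errors)
    current = errors[n - s:]  # most recent chain of s errors

    # A candidate chain ends at index j and needs a known error at j + lead.
    ends = np.arange(s - 1, n - lead)
    if len(ends) < k:
        raise ValueError("not enough historical chains for the chosen s, k and lead")

    chains = np.stack([errors[j - s + 1: j + 1] for j in ends])
    distances = np.linalg.norm(chains - current, axis=1)
    nearest = np.argsort(distances)[:k]

    # Equal-weight average of the errors that followed the k nearest chains.
    return errors[ends[nearest] + lead].mean()


# Synthetic, periodic error series used only to exercise the function.
t = np.arange(200)
errors = np.sin(2 * np.pi * t / 20)

pred = knn_error_forecast(errors, s=4, k=3, lead=2)
raw_forecast = 5000.0                    # hypothetical routing-model output (m3/s)
updated_forecast = raw_forecast + pred   # add predicted error back to the forecast
print(pred, updated_forecast)
```

Because the neighbours are selected by similarity and not by recency, a rapid rise in error that resembles an earlier flood can be matched even when the most recent errors alone would suggest a gradual change.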

5 Conclusions

The hydraulic routing model is often used to forecast floods. However, in most situations, the forecasting lead time and forecast accuracy provided by the hydraulic routing model are insufficient for meeting practical demands. As a result, real-time updating models are used to improve the accuracy of routing model outputs. In this paper, the KNN model, the KF model and the KK model (the combined model) are used for real-time updating of flood forecasts. This study has applied a hydraulic routing model and the three updating models for real-time forecasting of 13 flood events that occurred in the middle reaches of the Huai River in East China. Statistical results reveal that all three updating models are well calibrated and capable of producing more reliable flood forecasts with lead times ranging from 1


to 8 h than the hydraulic routing model. Among them, the KK model is the most effective for flood forecasts with lead times of 1–4 h. The KNN model provides the best forecasts of the flood hydrograph or peak flow discharge when the lead time is 6 h or longer, and is highly recommended for coupling with the hydraulic routing model in real-time applications.

Acknowledgments This work was supported by the National Natural Science Foundation of China (Grant Nos. 41130639, 51179045, 41101017, 41201028), and the Research and Innovation Program for College Graduates of Jiangsu Province (CXZZ13_0246).

References

Altman NS (1992) An introduction to kernel and nearest-neighbor nonparametric regression. Am Stat 46(3):175–185
Anderson JL, Anderson SL (1999) A Monte Carlo implementation of the non-linear filtering problem to produce ensemble assimilations and forecasts. Mon Weather Rev 127:2741–2758
Anderson MG, Burt TP (1985) Hydrological forecasting, vol 372. Wiley, Chichester
Blackith RE (1958) Nearest-neighbour distance measurements for the estimation of animal populations. Ecology 39(1):147–150
Box GE, Jenkins GM, Reinsel GC (2011) Time series analysis: forecasting and control, vol 734. Wiley, New York
Butler M, Kazakov D (2010) Modeling the behavior of the stock market with an artificial immune system. In: IEEE Congress on Evolutionary Computation, pp 1–8
Butts MB, Hoest-Madsen J, Refsgaard JC (2002) Hydrologic forecasting. Encycl Phys Sci Technol 2002:547–566
Chaleeraktrakoon C, Chinsomboon Y (2015) Dynamic rule curves for flood control of a multipurpose dam. J Hydro-Environ Res 9(1):133–144
Chen J, Wu Y (2012) Advancing representation of hydrologic processes in the Soil and Water Assessment Tool (SWAT) through integration of the TOPographic MODEL (TOPMODEL) features. J Hydrol 420–421:319–328
Crissman RD, Chiu CL, Yu W, Mizumura K, Corbu I (1993) Uncertainties in flow modeling and forecasting for Niagara River. J Hydraul Eng 119(11):1231–1250
Foley AM, Leahy PG, Marvuglia A, McKeogh EJ (2012) Current models and advances in forecasting of wind power generation. Renew Energy 37(1):1–8
Fread DL (1985) Channel routing. In: Anderson MG, Burt TD (eds) Hydrological forecasting. Wiley, London, pp 437–503
Ge SX (2002) Modern flood forecasting techniques. Water Conservancy Publishing Firm, Beijing, pp 95–125 (in Chinese)
Ge SX, Cheng HY, Li YR (2005) Real-time updating of hydraulic model by using Kalman filter. J Hydraul Eng 36(6):687–693 (in Chinese)
Georgakakos KP, Smith GF (1990) On improved hydrologic forecasting - results from a WMO real-time forecasting experiment. J Hydrol 114(1–2):17–45
Gupta S, Javed A, Datt D (2003) Economics of flood protection in India. Nat Hazards 28(1):199–210
Hossain F, Arnold J, Beighley E, Brown C, Burian S, Chen J, Madadgar S, Mitra A, Niyogi D, Pielke R, Tidwell V, Wegner D (2015) Local-to-regional landscape drivers of extreme weather and climate: implications for water infrastructure resilience. J Hydrol Eng 20(7):02515002
Hunt BR, Kostelich J, Szunyogh I (2007) Efficient data assimilation for spatiotemporal chaos: a local ensemble transform Kalman filter. Physica D 230:112–126
Husain T (1985) Kalman filter estimation model in flood forecasting. Adv Water Resour 8(1):15–21
Islam MN, Sivakumar B (2002) Characterization and prediction of runoff dynamics: a nonlinear dynamical view. Adv Water Resour 25(2):179–190
Kan G, Yao C, Li Q, Li Z, Yu Z, Liu Z, Ding L, He X, Ke L (2015) Improving event-based rainfall-runoff simulation using an ensemble artificial neural network based hybrid data-driven model. Stoch Env Res Risk Assess 29:1345–1370
Karlsson M, Yakowitz S (1987) Nearest-neighbor models for nonparametric rainfall-runoff forecasting. Water Resour Res 23(7):1300–1308
Komma J, Blöschl G, Reszler C (2008) Soil moisture updating by Ensemble Kalman Filtering in real-time flood forecasting. J Hydrol 357(3):228–242
Linsley RK, Kohler MA, Paulhus JL (1982) Hydrology for engineers. McGraw-Hill College, New York, pp 286–310
Madsen H, Skotner C (2005) Adaptive state updating in real-time river flow forecasting - a combined filtering and error forecasting procedure. J Hydrol 308(1):302–312
Miyoshi T, Kunii M (2012) The local ensemble transform Kalman filter with the weather research and forecasting model: experiments with real observations. Pure Appl Geophys 169:321–333
Moore RJ, Bell VA, Jones DA (2005) Forecasting for flood warning. CR Geosci 337(1):203–217
Refsgaard JC (1997) Validation and intercomparison of different updating procedures for real-time forecasting. Hydrol Res 28(2):65–84
Sage AP, Husa GW (1969) Adaptive filtering with unknown prior statistics. In: Proceedings of joint automatic control conference, pp 760–769
Shen JC, Chang CH, Wu SJ, Hsu CT, Lien HC (2015) Real-time correction of water stage forecast using combination of forecasted errors by time series models and Kalman filter method. Stoch Env Res Risk Assess 19:1903–1920
Sivakumar B (2003) Forecasting monthly streamflow dynamics in the western United States: a nonlinear dynamical approach. Environ Modell Softw 18(8):721–728
Sivakumar B, Wallender WW (2005) Predictability of river flow and sediment transport in the Mississippi River basin: a nonlinear deterministic approach. Earth Surf Proc Land 30:665–677
Sivakumar B, Sorooshian S, Gupta HV, Gao X (2001) A chaotic approach to rainfall disaggregation. Water Resour Res 37(1):61–72
Sivakumar B, Jayawardena AW, Fernando TMGH (2002) River flow forecasting: use of phase-space reconstruction and artificial neural networks approaches. J Hydrol 265(1–4):225–245
Todini E (1999) An operational decision support system for flood risk mapping, forecasting and management. Urban Water J 1(2):131–143
Todini E (2005) Rainfall-runoff models for real-time forecasting. In: Anderson MG (ed) Encyclopedia of hydrological sciences. Wiley, London, pp 1869–1896
Tong J, Hu BX, Huang H, Guo L, Yang J (2014) Application of a data assimilation method via an ensemble Kalman filter to reactive urea hydrolysis transport modeling. Stoch Env Res Risk Assess 28(3):729–741
Verlaan M (1998) Efficient Kalman filtering algorithms for hydrodynamic models. Delft University of Technology, Delft, pp 40–60
Wang CH, Bai YL (2008) Algorithm for real time correction of stream flow concentration based on Kalman filter. J Hydrol Eng 13(5):290–296
Werner M, Reggiani P, De RAD, Bates P, Sprokkereef E (2005) Flood forecasting and warning at the river basin and at the European scale. Nat Hazards 36(1–2):25–42
Wilson CAME, Bates PD, Hervouet JM (2002) Comparison of turbulence models for stage-discharge rating curve prediction in reach-scale compound channel flows using two-dimensional finite element methods. J Hydrol 257(1):42–58
WMO (1992) Simulated real-time intercomparison of hydrological models, Operational Hydrology Report No. 38. World Meteorological Organization, Geneva, pp 33–40
WMO (2011) Manual on flood forecasting and warning. World Meteorological Organization, Geneva, pp 20–40
Wu GC, Zheng XG, Wang LQ, Zhang SP, Liang X, Li Y (2013) A new structure of error covariance matrices and their adaptive estimation in EnKF assimilation. Q J R Meteorol Soc 139:795–804
Wu GC, Zheng XG, Wang LQ, Liang X, Zhang SP, Zhang XZ (2014) Improving the ensemble transform Kalman filter using a second-order Taylor approximation of the nonlinear observation operator. Nonlinear Proc Geophys 21:955–970
Xin JB, Cheng XG (1998) Analysis on peak flow travel time in the No 98.6 flood event in the Huai river. J Manag Huai River 9:7–8 (in Chinese)
Yao C, Li Z, Yu Z, Zhang K (2012) A priori parameter estimates for a distributed, grid-based Xinanjiang model using geographically based information. J Hydrol 468:47–62
Young PC (2002) Advances in real-time flood forecasting. Philos Trans R Soc Lond Ser A 360(1796):1433–1450
Zhang CY (1998) Approach to adaptive filtering algorithm. Acta Aeronautica et Astronautica Sinica 19(7):96–99