Application of artificial neural networks in regional flood frequency ...

Stoch Environ Res Risk Assess (2014) 28:541–554 DOI 10.1007/s00477-013-0771-5

ORIGINAL PAPER

Application of artificial neural networks in regional flood frequency analysis: a case study for Australia

K. Aziz • A. Rahman • G. Fang • S. Shrestha

Published online: 27 July 2013 Springer-Verlag Berlin Heidelberg 2013

Abstract Regional flood frequency analysis (RFFA) is widely used in practice to estimate flood quantiles in ungauged catchments. Most commonly adopted RFFA methods such as quantile regression technique (QRT) assume a log-linear relationship between the dependent and a set of predictor variables. As non-linear models and universal approximators, artificial neural networks (ANN) have been widely adopted in rainfall runoff modeling and hydrologic forecasting, but there have been relatively few studies involving the application of ANN to RFFA for estimating flood quantiles in ungauged catchments. This paper thus focuses on the development and testing of an ANN-based RFFA model using an extensive Australian database consisting of 452 gauged catchments. Based on an independent testing, it has been found that ANN-based RFFA model with only two predictor variables can provide flood quantile estimates that are more accurate than the traditional QRT. Seven different regions have been compared with the ANN-based RFFA model and it has been shown that when the data from all the eastern Australian states are combined together to form a single region, the ANN presents the best performing RFFA model. This indicates that a relatively larger dataset is better suited for successful training and testing of the ANN-based RFFA models.

Keywords Artificial neural networks (ANN) • Regional flood frequency analysis • Design floods • Ungauged catchments • Regionalisation • Quantile regression technique

K. Aziz • A. Rahman (corresponding author) • G. Fang • S. Shrestha
School of Computing, Engineering and Mathematics, University of Western Sydney, Kingswood, NSW, Australia
e-mail: [email protected]

1 Introduction

Regional flood frequency analysis (RFFA) is the generic name given to techniques which utilize streamflow data from gauged catchments in a region to estimate design floods for poorly gauged or ungauged catchments. The use of RFFA enables the "transfer" of flood characteristics information from gauged to ungauged catchments (Blöschl and Sivapalan 1997; Pallard et al. 2009). The most commonly adopted RFFA methods have been described in Cunnane (1988) and Hosking and Wallis (1997). RFFA essentially consists of two principal steps: (a) formation of regions and (b) development of prediction equations. Regions have traditionally been formed based on geographic, political, administrative or physiographic boundaries (e.g. NERC 1975; I.E. Aust. 1987). Regions have also been formed in catchment characteristics data space using multivariate statistical techniques (e.g. Acreman and Sinclair 1986; Nathan and McMahon 1990; Rao and Srinivas 2008; Guse et al. 2010). Regions can also be formed using a region-of-influence approach where a certain number of catchments based on proximity in geographic or catchment attributes space are pooled together based on some objective function to form an optimum region (e.g. Burn 1990; Zrinji and Burn 1994; Kjeldsen and Jones 2009; Haddad and Rahman 2012a). For developing the regional flood prediction equations, the commonly used techniques include the rational method, index flood method and quantile regression technique (QRT). The rational method has widely been adopted in estimating design floods for small ungauged catchments (e.g. Mulvany 1851; I.E. Aust. 1987; Jiapeng et al. 2003; Pegram and Parak 2004; Rahman et al. 2011). The index flood method, which has been widely adopted in many countries, relies on the identification of homogeneous regions


(Dalrymple 1960; Hosking and Wallis 1993; Bates et al. 1998; Rahman et al. 1999; Kjeldsen and Jones 2010; Ishak et al. 2011). The QRT, proposed by the United States Geological Survey (USGS), has been applied by many researchers using either an ordinary least squares (OLS) or generalized least squares (GLS) regression technique (e.g. Benson 1962; Thomas and Benson 1970; Tasker 1980; Stedinger and Tasker 1985; Tasker et al. 1986; Madsen et al. 1997; Pandey and Nguyen 1999; Bayazit and Onoz 2004; Rahman 2005; Griffis and Stedinger 2007; Ouarda et al. 2008; Kjeldsen and Jones 2009; Haddad and Rahman 2011, 2012b; Haddad et al. 2012; Zaman et al. 2012). Because of the vast area of study and the diversity of climatic conditions and site characteristics, different researchers have emphasized different aspects of RFFA. In the seventies and early eighties much effort was devoted to developing efficient at-site flood frequency analysis techniques; however, in the late eighties the focus shifted towards developing and enhancing RFFA methods (e.g. Greis and Wood 1983; Potter 1987; Kirby and Moss 1987; NRC 1988; WMO 1989; Bobee et al. 1993). Most of the above RFFA methods assume a linear relationship between flood statistics and predictor variables, generally in the log domain, while developing the regional prediction equations. However, most hydrologic processes are nonlinear and exhibit a high degree of spatial and temporal variability, and hence a simple log transformation cannot guarantee linearity in modeling. Among non-linear modeling techniques, artificial neural networks (ANN) are regarded as universal approximators and have been widely applied to a variety of complex nonlinear problems in finance, medicine and engineering. ANN have been applied to a wide range of hydrological problems, such as rainfall runoff modeling and hydrologic forecasting, but there have been relatively few studies involving the application of ANN to RFFA (e.g. Daniell 1991; Muttiah et al. 1997; Shu and Burn 2004; Kothyari 2004; Dawson et al. 2006; Shu and Ouarda 2007, 2008). This paper seeks to fill this gap by focusing on the development and testing of an ANN-based RFFA method using the most extensive and comprehensive database that has become available in Australia as a part of the on-going revision of Australian Rainfall and Runoff, the national guide to flood estimation (I.E. Aust. 1987, 2001). The aims of this study are thus threefold: (1) to test the applicability of the ANN to the RFFA problem using an extensive data set; (2) to compare the performances of the ANN-based RFFA models with alternative regions formed based on state boundaries and hydro-climatology; and (3) to compare the performance of the ANN-based RFFA model with a traditional approach such as QRT using independent testing. An overview of the ANN is presented first, followed by a description of the study area and data. The adopted


methodology is presented next, followed by the results and conclusions of the study.

2 Overview of ANN and their applications to RFFA

2.1 Application of ANN to RFFA

The development of ANN started in the 1940s (McCulloch and Pitts 1943), inspired by a desire to understand the human brain and emulate its functioning. The development of ANN-based techniques experienced a renaissance in the 1980s due to the efforts of Hopfield (1982) in iterative auto-associable neural networks. A tremendous growth in interest in this computational mechanism has occurred since Rumelhart et al. (1986) discovered a mathematically rigorous theoretical framework for ANN, i.e., the back-propagation algorithm. Application of ANN in hydrological modeling was inspired by the work on forecasting mapping (predictors) for chaotic dynamic systems (Farmer and Sidorowich 1987). Since the early nineties, ANN have been applied to many hydrological problems such as rainfall-runoff modeling, streamflow forecasting, groundwater modeling, water quality, rainfall forecasting, hydrologic time series modeling and reservoir operations (e.g. Govindaraju 2000; Luk et al. 2001; Dawson and Wilby 2001; Zhang and Govindaraju 2003; Abrahart et al. 2004, 2007; Chokmani et al. 2008; Wu et al. 2008; Turan and Yurdusev 2009; Besaw et al. 2010; Gao et al. 2010; Huo et al. 2012). The Task Committee on Application of ANN in Hydrology by ASCE (2000) stated that ANN should be classified as empirical models, which treat hydrologic systems (such as a watershed) as a black box and attempt to find a relationship between historical inputs (e.g. rainfall and streamflow) and outputs (e.g. catchment runoff measured at a stream gauge). The application of ANN requires careful consideration, as highlighted by Maier and Dandy (2000), who reviewed 43 scientific papers dealing with the use of ANN models for the prediction and forecasting of water resources variables.
They mentioned that issues in relation to the optimal division of the available data, data pre-processing and the choice of appropriate model inputs were seldom considered. In addition, the process of choosing appropriate stopping criteria and optimizing network geometry and internal network parameters were generally described poorly or carried out inadequately. There have been relatively few applications of ANN to RFFA to estimate flood quantiles in ungauged catchments. For example, Dawson et al. (2006) applied ANN to develop a model for index flood using data from 850 UK catchments and found that ANN provided more accurate flood quantile estimates than the QRT. They pointed out


that ANN are heavily data dependent and cannot explicitly account for physical processes, reducing confidence in model predictions. Muttiah et al. (1997) developed an ANN-based RFFA method for 2-year peak flood prediction using a large data set from the US. They found that the ANN-based quantile estimates were comparable to the QRT ones. Kothyari (2004) applied an ANN-based RFFA method to 97 catchments in India to develop a prediction equation for the mean annual flood. He compared two scenarios: scenario 1 with 12 neurons in the hidden layer and scenario 2 with only 1 neuron in the hidden layer. He found that scenario 2 provided the best results with minimum error and best model goodness-of-fit values. He pointed out that an ANN-based RFFA model having more complex architecture than the one used in scenario 1 did not produce any better results. Daniell (1991) applied ANN to 14 catchments in the Australian Capital Territory to develop a RFFA model; however, due to the limited data set the method could not produce any meaningful prediction. Dastorani and Wright (2001) applied an ANN-based RFFA method to predict the index flood for a small number of catchments in the UK. In summary, ANN have been widely adopted in rainfall and streamflow forecasting problems; however, there have been only limited studies on the application of ANN to RFFA problems. Most of the previous studies focused on the estimation of the mean annual flood; however, ANN application to higher average recurrence interval (ARI) floods (such as 50 and 100 years ARIs) has not been explored. Hence this study investigates the applicability of the ANN-based RFFA method in estimating flood quantiles from 2 to 100 years ARIs using a large data set from Australia.

2.2 Working structure of ANN

The ANN method adopted in this study is based on the structure of the multi-layer perceptron (MLP), which has been widely used in hydrological modelling (Maier and Dandy 2000; Shamseldin 1997).
The MLP structure consists of a network of interconnected neurons linked by connection pathways as shown in Fig. 1. In this study, the


adopted ANN model has three layers of neurons or nodes: an input layer, a hidden layer and an output layer. Each neuron has a number of inputs (from outside the network or the previous layer) and a number of outputs (leading to the subsequent layer or out of the network). In Fig. 1 the circles represent neurons and lines represent the connections where each input is multiplied by a connection parameter known as weight and combined (usually with certain bias) to produce a single value. This value is then operated by a transfer function. Therefore, the overall output value of a neuron can be expressed as below:

$$y_j = f(X \cdot W_j - b_j) \tag{1}$$

where the inputs in the first layer form an input vector:

$$X = (x_1, \ldots, x_i, \ldots, x_n) \tag{2}$$

The sequence of weights leading to the node forms a weight vector:

$$W_j = (w_{1j}, \ldots, w_{ij}, \ldots, w_{nj}) \tag{3}$$

where j = 1, 2, …, n and m = number of neurons, w_ij = connection weight from the ith node in the preceding layer to the present node. The output of node j, y_j, is obtained by computing the value of function f with respect to the inner product of vector X and W_j minus b_j, where b_j is the threshold value, also called the bias, associated with this node. In ANNs parlance, the bias b_j of the node must be exceeded before it can be activated. Here, the logistic sigmoid is used as an activation function. The sigmoid function is a bounded, monotonic, non-decreasing function that provides a graded and nonlinear response. The popularity of the sigmoid function is partially attributed to the simplicity of its derivative that is used during the training process. This function enables a network to map any nonlinear process. The hyperbolic tangent sigmoid function adopted in this study is expressed by:

$$f(x) = \frac{1 - e^{-x}}{1 + e^{-x}} \tag{4}$$

3 Study area and data description

Fig. 1 Configuration of a feedforward three-layered ANN
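The node computation of Eqs. 1 to 4 and the layered structure of Fig. 1 can be illustrated with a short sketch. It assumes that the transfer function has the form (1 − e^(−x))/(1 + e^(−x)), which equals tanh(x/2); the function names, weights, biases and inputs are hypothetical and chosen only for illustration, and are not values from the study.

```python
import math


def tanh_sigmoid(x: float) -> float:
    """Assumed transfer function (1 - e^-x) / (1 + e^-x), written as tanh(x / 2)."""
    return math.tanh(x / 2.0)


def neuron_output(inputs, weights, bias):
    """One node: transfer function of (inner product of inputs and weights) minus bias."""
    net = sum(x * w for x, w in zip(inputs, weights)) - bias
    return tanh_sigmoid(net)


def layer_output(inputs, weight_rows, biases):
    """Outputs of all nodes in a layer; each row of weight_rows feeds one node."""
    return [neuron_output(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical example: two scaled inputs, two hidden nodes, one output node.
x = [0.4, 0.7]
hidden = layer_output(x, [[0.5, -0.3], [0.1, 0.8]], [0.1, -0.2])
y = neuron_output(hidden, [1.2, -0.7], 0.05)
print(hidden, y)
```

Because the assumed transfer function is bounded by −1 and 1, every node output stays within that interval regardless of the weights chosen.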

This study needs two basic types of data: (i) streamflow data (the annual maximum flood series); and (ii) climatic and catchment characteristics data (which affect the flood generation process and are potentially useful in RFFA modeling). Eastern Australia was selected as the study area for this research as it has the highest density of streamflow gauging stations with good quality data in Australia. In selecting the study catchments, the following criteria were adopted (Ishak et al. 2013) to ensure that the study


catchments are minimally affected by the land use change and artificial storage, and have a relatively long period of streamflow record.


3.1 Data quality

Most flood frequency analysis assumes that the available streamflow data is error free; however, at some stations this assumption may be grossly violated. Stations graded as 'poor quality' or with specific comments by the gauging authority regarding quality of the data were assessed in greater detail, and stations deemed 'low quality' were excluded from the study.

3.2 Record length

A scarcity of stream gauging stations with long records requires a compromise between maximizing the number of stations (which provides a greater spatial coverage), and maximising the record length (which enhances the accuracy of trend analysis). Hence, selection of a cut-off record length is important, which affects the total number of stations available for the study. For this investigation, the stations having a minimum record length of 25 years were included.

3.3 Catchment size

Priority was given to small/medium sized catchments with an upper limit of 1,000 km²; however an upper limit of 1,900 km² was adopted for Tasmania due to the limited availability of the number of small/medium catchments.

3.4 Land use

Care was taken to eliminate catchments where significant anthropogenic influences are likely to have occurred, particularly related to deforestation, change of agricultural practice, and urbanisation. Each selected catchment was examined by going through the topographic and aerial photographic maps, and catchments that had undergone major land use changes over the period of streamflow records were excluded.

3.5 Regulation

Since major regulation undermines the assessment of climatically-forced hydrological changes, gauging stations with significant regulation or diversion were excluded. Streams with minor regulation, such as small farm dams and small storage diversion weirs, were included because this type of regulation is unlikely to have a significant effect on AM floods.

3.6 Gap filling

In preparing the streamflow data, gaps in the annual maximum flood series data were filled by two methods: (i) comparison of the monthly instantaneous maximum data with monthly maximum mean daily data at the same station for years with data gaps; if a missing month of instantaneous maximum flow corresponded to a month of very low maximum mean daily flow, then that was taken to indicate that the annual maximum did not occur during that missing month; and (ii) regression of annual maximum mean daily flow series against the annual instantaneous maximum series of the same station.

3.7 Rating curve extrapolation

The stations having annual maximum flood data associated with a high degree of rating curve extrapolation were identified by using a 'rating ratio'. The annual maximum flood series data point for each year (estimated flow) was divided by the maximum gauged flow for that station to define the rating ratio. Any station with a 'very high rating ratio' was excluded from the database as explained in Haddad et al. (2010) and Haddad and Rahman (2012a).

Based on the above criteria, a total of 452 stations were finally selected for this study. The locations of the selected stations are shown in Fig. 2, which include 96 stations from New South Wales (NSW), 131 from Victoria (VIC), 172 from Queensland (QLD) and 53 from Tasmania (TAS) in Australia. It should be noted here that although there is a good coverage of stations along the eastern coast of Australia (Fig. 2) there is a lack of stations in the western part of NSW, VIC and QLD due to inadequate data availability. The catchment sizes of the selected 452 stations range from 1.3 to 1,900 km² with the median value of 256 km². For the stations of NSW, VIC and QLD, the upper limit of catchment size was 1,000 km²; however, for Tasmania there were four catchments in the range of 1,000–1,900 km². Overall, there are about 12 % catchments in the range of 1–50 km², about 11 % in the range of 50–100 km², 53 % in the range of 100–500 km², 23 % in the range of 500–1,000 km² and 1 % greater than 1,000 km². The annual maximum flood record lengths of the selected stations range from 25 to 75 years (mean 33 years). The Grubbs and Beck (1972) method was adopted in detecting high and low outliers (at the 10 % level of significance) in the annual maximum flood series data. The detected low outliers were treated as censored flows in flood frequency analysis. Only a few stations had a high outlier, which was not removed from the data set as no data error was detected for these high flows. The annual maximum flood series data in the selected stations did not show


Fig. 2 Location of study catchments (blue and red colour represent training and test catchments, respectively). (Color figure online)

any trend (at the 5 % level of significance) based on the Mann–Kendall test (Kendall 1970). In estimating the flood quantiles for each of the selected stations, the log-Pearson III (LP3) distribution was fitted to the annual maximum flood series using the Monte Carlo Bayesian method as implemented in FLIKE software (Kuczera 1999). According to ARR, LP3 is the recommended distribution for at-site flood frequency analysis in Australia (I.E. Aust. 1987), and hence it was adopted. In previous applications (e.g. Haddad et al. 2012), it has been found that the LP3 distribution provides an adequate fit to the Australian annual maximum flood data. In selecting the candidate predictor variables, the variables adopted by similar previous RFFA studies were first examined (see Table 1). It was found that all the mentioned previous studies adopted catchment area and mean annual rainfall as predictor variables and hence these were included as candidate predictor variables in our study. Design rainfall intensity and evaporation were adopted by three previous Australian studies, and hence these were included in this study. Main stream slope was adopted by all but one study and hence it was included. To use the design rainfall intensity, one needs the duration of rainfall and the ARI; in this study, six different combinations of durations and ARIs were adopted. Hence, this study included a total of 10 predictor variables; however, six of them represent design rainfall intensity of different durations and ARIs. The correlations of these 10 variables are plotted in Fig. 3, which shows that the six different rainfall intensities are highly correlated. This indicates that the use of only one design rainfall intensity is desirable in the final prediction equation, as highly correlated variables add little extra information to the model. In summary, the predictor variables adopted in this study are: catchment area expressed in km² (A); design

Table 1 Catchment characteristics predictor variables used in some previous studies

| Authors | Country | Predictor variables adopted |
|---|---|---|
| Flavell (2012) | Australia | Catchment area, mean annual rainfall, mainstream slope, main-channel length, and 12 and 24 h statistical rainfall totals |
| Griffis and Stedinger (2007) | USA | Catchment area, mean annual rainfall, runoff measured, mainstream slope, main-channel length, forest cover, and storage measured as the percent of catchment area |
| Haddad and Rahman (2012a) | Australia | Catchment area, design rainfall intensity, mean annual rainfall, mean annual evapotranspiration, stream density, mainstream slope, stream length, and forest cover |
| Muttiah et al. (1997) | USA | Catchment area, mean annual rainfall, and mean basin elevation |
| Rahman (2005) | Australia | Catchment area, design rainfall intensity, mean annual rainfall, mean annual rain days, mean annual Class A pan evaporation, mainstream slope, river bed elevation at the gauging station, maximum elevation difference in the basin, stream density, forest cover, and fraction quaternary sediment area |
| Shu and Ouarda (2008) | Canada | Catchment area, mean annual rainfall, mainstream slope, fraction of the basin area covered with lakes and annual mean degree-days |
| Riad et al. (2004) | Morocco | Catchment area and mean annual rainfall |



Fig. 3 Plot representing bi-variate correlations of the candidate 10 independent variables

rainfall intensity values in mm/h (I_tc,ARI, where ARI = 2, 5, 10, 20, 50 and 100 years and tc = time of concentration in hours, estimated from tc = 0.76A^0.38); mean annual rainfall expressed in mm/y (R); mean annual areal evapotranspiration expressed in mm/y (E); and main stream slope expressed in m/km (S). These predictor variables capture the flood generation and attenuation process very well. For example, catchment area determines the flood magnitude, i.e. the larger the catchment area for a given rainfall, the greater the flood magnitude. The rainfall intensity or mean annual rainfall represents the main input to the rainfall-runoff process, i.e. the higher the rainfall, the greater the flood for a given catchment area. Furthermore, the higher the slope, the greater the flood flow velocity, i.e. the greater the flood peak. The evapo-transpiration is the loss component of the rainfall-runoff process. We


did not include any soil characteristics in our study as this is not readily available for larger catchments in Australia and furthermore, it is not easy to obtain a catchment representative value of this variable. The main stream slope adopted here excludes the extremes of slope found at either end of the mainstream. It is the ratio of the difference in elevation of the stream bed at 85 and 10 % of its length from the basin outlet, and 75 % of the mainstream length. The slope was determined from 1:100,000 topographic maps using an opisometer to measure the stream length. The basic design rainfall intensities (I) data for the selected catchments were obtained from ARR (I.E. Aust. 1987, 2001). The mean annual areal potential evapo-transpiration (E) data was obtained from the Evaporation Data CD published by the Australian Bureau of Meteorology (BOM). Similarly, the data for



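Table 2 Exploratory data analysis of the ten explanatory predictor variables

| Variables | Range | Median | Mean | SD |
|---|---|---|---|---|
| Catchment area (A) (km²) | 1.3–1,900 | 255.5 | 329.4 | 277.3 |
| Mean annual areal evapo-transpiration (E) (mm/year) | 410.1–1,543.3 | 998.5 | 977.8 | 188.9 |
| Mean annual rainfall (R) (mm) | 416–4,348 | 1,005.6 | 1,185.8 | 603.5 |
| Main stream slope (S) (m/km) | 0–197.7 | 7.7 | 11.3 | 16.8 |
| Design rainfall intensity, 2 years ARI and time of concentration of tc hours (I_tc_2) (mm/h) | 2.9–43.1 | 8.9 | 10.9 | 6.1 |
| Design rainfall intensity, 5 years ARI (I_tc_5) (mm/h) | 3.6–54.5 | 11.4 | 13.9 | 8.0 |
| Design rainfall intensity, 10 years ARI (I_tc_10) (mm/h) | 4.0–235.8 | 12.9 | 16.3 | 13.7 |
| Design rainfall intensity, 20 years ARI (I_tc_20) (mm/h) | 4.6–70.1 | 15.0 | 18.3 | 10.5 |
| Design rainfall intensity, 50 years ARI (I_tc_50) (mm/h) | 5.4–757 | 17.7 | 23.4 | 36.7 |
| Design rainfall intensity, 100 years ARI (I_tc_100) (mm/h) | 6.0–91 | 20.1 | 24.5 | 14.0 |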

mean annual rainfall (R) was extracted from the BOM CD of mean annual rainfall. The summary of the selected predictor variables is provided in Table 2.
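Two of the derived quantities described in this section lend themselves to a small worked sketch: the time of concentration, tc = 0.76A^0.38 (A in km², tc in hours), and the main stream slope, taken as the elevation difference between the points at 85 % and 10 % of the mainstream length from the outlet divided by 75 % of that length. The function names and the sample elevations and stream length below are hypothetical and serve only to show the arithmetic; they are not taken from the study database.

```python
def time_of_concentration(area_km2: float) -> float:
    """Time of concentration in hours from catchment area in km2: tc = 0.76 * A**0.38."""
    return 0.76 * area_km2 ** 0.38


def main_stream_slope(elev_85_m: float, elev_10_m: float, stream_length_km: float) -> float:
    """Slope in m/km between the 85 % and 10 % points of the mainstream length."""
    return (elev_85_m - elev_10_m) / (0.75 * stream_length_km)


# The median catchment area reported for the study catchments is about 256 km2.
tc_median = time_of_concentration(256.0)
print(f"tc for A = 256 km2: {tc_median:.2f} h")

# Hypothetical stream: 250 m and 100 m bed elevations, 20 km mainstream length.
print(f"slope: {main_stream_slope(250.0, 100.0, 20.0):.1f} m/km")
```

For the hypothetical stream, the 150 m elevation difference over 15 km (75 % of 20 km) gives a slope of 10 m/km.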

4 Method

The analyses presented in this paper were carried out in three major steps:

• Finding the best set of predictor variables for estimation of flood quantiles;
• Determination of optimum region by comparing alternative regions formed based on state boundary and climate regime; and
• Independent testing of the ANN-based RFFA models and comparison with the traditional QRT method.

The study considers 6 flood quantiles being 2, 5, 10, 20, 50 and 100 years ARIs. Prediction equations were developed for each of these ARIs. For both the ANN and QRT based regional flood model development, at the beginning of the study, the n available study catchments (which is 452 in this study) were divided randomly into (i) training dataset and (ii) testing dataset. The training dataset contained 80 % of the n study catchments, i.e. 0.80 * n = 0.80 * 452 = 362 catchments. The testing dataset contained 20 % of the n study catchments i.e. 0.20 * n = 90 catchments. The testing dataset was not used in the model training/development. It should be noted that other divisions such as 70 % for training and 30 % for testing could have been adopted (Kashani et al. 2008), but this is unlikely to affect the outcomes of this study.
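The random 80/20 division described above can be sketched as follows. The paper does not state how the randomization was carried out, so the shuffling approach, the seed and the function name are assumptions made purely for illustration; station identifiers are represented by the integers 0 to 451.

```python
import random


def split_catchments(station_ids, train_fraction=0.80, seed=0):
    """Randomly divide station identifiers into training and testing subsets."""
    ids = list(station_ids)
    rng = random.Random(seed)  # hypothetical seed, used only for repeatability
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


train_ids, test_ids = split_catchments(range(452))
print(len(train_ids), len(test_ids))  # 362 90
```

Rounding 0.80 × 452 = 361.6 to the nearest integer reproduces the 362 training and 90 testing catchments quoted in the text.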

In the ANN modeling, the Levenberg-Marquardt method was used as the training algorithm to minimize the mean squared error (MSE). The purpose of training an ANN with a set of input and output data is to adjust the weights in the ANN to minimize the MSE between the desired outputs and the ANN outputs. The testing data set was selected randomly to produce a reasonable sample of different catchment types and sizes. A feedforward ANN consisting of three layers (input, hidden and output layers) was used with the training algorithm known as 'backpropagation of error'. Three hidden-layered neural networks were selected with 7, 3 and 1 neurons to each of these three layers. Two inputs (A, Itc_ARI) were used in one input layer and one output layer with one output (Qpred). The transfer function used for the hidden layers and the output layer was the hyperbolic tangent sigmoid function (Eq. 4). Transfer functions calculate a layer's output from its net input. A maximum of 20,000 training iterations was adopted. Each predictor and predictand was standardized to the range of (0.05, 0.95), such that extreme flood events which exceeded the range of the training data set could be modelled between the boundaries (0, 1) during testing. A learning rate of 0.05 was used together with a momentum constant of 0.95. MATLAB was used to perform the ANN training. To select the best performing model, different combinations of hidden layers, algorithm, and number of neurons were compared in terms of the MSE value. In order to obtain the best ANN-based model, the MSE values between the observed and predicted flood quantiles were calculated and the training was undertaken to minimise this error. To avoid over-training during the training of the ANN model, the MSE values were also calculated for the testing



data set. If the testing MSE was increasing while the training MSE was still decreasing, the training of the ANN was terminated. This ensured the training quality of the ANN and avoided over-fitting.

4.1 Selection of important variables in the ANN models

In this study, a separate regression equation was developed for each of the six flood quantiles, as this is the usual practice in regression-based RFFA, e.g. see Thomas and Benson (1970), Stedinger and Tasker (1985), Pandey and Nguyen (1999), Griffis and Stedinger (2007) and Haddad et al. (2012). In a few previous RFFA studies, canonical correlation analysis was adopted where the correlation among the flood quantiles themselves was accounted for (e.g. Bates et al. 1998; Cavadias et al. 2001); but these models did not provide any better performance than the usual regression models where the correlation among the dependent variables is disregarded. Five catchment characteristics predictor variables as explained in Sect. 3 were used as candidates in this study. For these five variables, there could be 31 different models (one for every non-empty subset of the five variables). However, not all of these models are necessarily useful, since some combinations of variables would only result in weaker RFFA models. For example, catchment area has been found to be the most important predictor variable in almost all the previous RFFA studies as shown in Table 1. The second most important predictor variable has been reported to be design rainfall intensity (e.g. Javelle et al. 2002; Jingyi and Hall 2004). Hence, the combination of these two predictor variables is likely to result in a more significant prediction equation than that delivered by any other two variables. In fact, previous Australian RFFA studies have found that these two predictor variables generate the best RFFA prediction equation (e.g. Haddad and Rahman 2012a). In this study, eight different models are considered as shown in Table 3, which contain catchment

area and design rainfall intensity and combinations of the other three predictor variables. This approach, however, makes an assumption that there is no other combination of predictor variables (from these five variables) that would deliver a better model than any one of these eight models. This assumption is not unreasonable.

4.2 Formation of regions

There could be a number of alternatives in forming the regions/sub-regions from the selected 452 catchments. For example, all the catchments may be assumed to have formed one region. Alternatively, regions may be based on state boundaries. In this study, a number of alternative regions were considered to identify the optimum region for ANN-based RFFA modeling in eastern Australia. Geographic and climate conditions were considered while forming these alternative regions. Initially, each of the states of VIC, NSW, QLD and TAS were treated as a separate region following the approach as adopted in ARR (I.E. Aust. 1987). At the second stage, data from all these states were combined to form one region. Finally the data set was divided into two sub-regions consisting of summer dominated and winter dominated rainfall areas. These regions are listed in Table 4. For each of these regions, same methodology was adopted in model training and validation as mentioned in Sect. 4 i.e. 80 % for training and 20 % for validation and the same ANN structure. For each region, the ANN model was built and used to predict 2, 5, 10, 20, 50 and 100 years ARI flood quantiles.

4.3 Quantile regression technique (QRT)

In QRT, flood quantiles (Q_T) are regressed against catchment characteristics (predictor variables) (X) using the power form equation (Thomas and Benson 1970; Stedinger and Tasker 1985; Haddad and Rahman 2012a):

$$Q_T = \beta_0 X_1^{\beta_1} X_2^{\beta_2} \ldots \tag{5}$$

where regression coefficients βs are generally obtained by using an OLS or GLS regression. In developing the QRT,

Table 3 Candidate models and catchment characteristic predictor variables used in this study

| Model ID | Predictor variables |
|---|---|
| 1 | A, Itc_ARI |
| 2 | A, Itc_ARI, S |
| 3 | A, Itc_ARI, E |
| 4 | A, Itc_ARI, R |
| 5 | A, Itc_ARI, S, E |
| 6 | A, Itc_ARI, R, E |
| 7 | A, Itc_ARI, R, S |
| 8 | A, Itc_ARI, R, S, E |

Description of variables (details in Sect. 3): A catchment area; Itc_ARI design rainfall intensity; S main stream slope; E evapotranspiration; R mean annual rainfall

Table 4 Description of candidate regions

| Region label | Description of region | No. of stations | Abbreviated region name |
|---|---|---|---|
| 1 | New South Wales | 96 | NSW |
| 2 | Victoria | 131 | VIC |
| 3 | Queensland | 172 | QLD |
| 4 | Tasmania | 53 | TAS |
| 5 | Combined data set | 452 | Combined |
| 6 | Winter dominated rainfall region | 249 | WDRR |
| 7 | Summer dominated rainfall region | 203 | SDRR |


549

both the dependent and independent variables are generally log-transformed to linearize Eq. 5. In this study, an OLS regression was adopted to develop a prediction equation for each of the six flood quantiles using two predictor variables (A, Itc_ARI). The data sets for building and independent testing of the QRT model were the same as with the ANN models. The MINITAB 14 software was used to develop the QRT models. 4.4 Assessment of model performance Following evaluation statistics were used for model assessment: •

Coefficient of efficiency (CE): n P ðQObs QPred Þ2 i¼1 CE ¼ 1 P n Þ2 ðQPred Q

ð6Þ

5.2 Selection of the optimum region

i¼1

•

Ratio between predicted and observed flood quantiles:

Qpredicted Ratioðr Þ ¼ Qobserved

ð7Þ

•

Relative error (RE): Qpred Qobs REð%Þ ¼ Abs 100 Qobs

values based on the 90 test catchments. As shown in Table 5, Model 1 provides median r values closer to 1 followed by Model 2 as compared to other models. Similarly, Models 1 and 2 outperform other models in terms of median RE values except for Model 4, where Model 4 produces better results as compared to Model 2. Moreover, Model 1 shows much smaller values of median RE for Q2, Q5 and Q50, a similar median RE values for Q10 and Q100 and a higher median RE value for Q20. Furthermore, when comparing different models for CE values, it was found that Models 1, 2, 3 and 4 outperform the remaining four models. Model 1 exhibits more consistency and better CE values for different quantiles when compared with closely performing Model 2. These results demonstrate that overall Model 1 outperforms Model 2 and other models and hence Model 1 with two predictor variables (A, Itc_ARI) is adopted in this study.

ð8Þ

where Qpred is the flood quantile estimate from the ANN or QRT-based model, Qobs is the at-site flood frequency estimate obtained from LP3 distribution using a Bayesian the parameter fitting procedure (Kuczera 1999) and Qis mean of the observed flood quantiles. The CE, median RE and median ratio values were used to measure the relative accuracy of a model. The CE ranges from -? in the worst case to ?1 for a perfect model (Shamseldin 1997). A ratio closer to 1 indicates a perfect match between the observed and predicted value and a smaller median RE is desirable for a good model.

5 Results 5.1 Selection of predictor variables A total of eight alternative combinations of catchment characteristics predictor variables as shown in Table 3 were examined. An ANN-based RFFA model was developed for each of the combination of predictor variables based on the randomly selected 80 % of the catchments (i.e. 362 catchments) and it was then tested on the remaining 20 % of the catchments (i.e. 90 catchments). These models were assessed on the basis of ratio (r) (Eq. 7), median RE and CE
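As a minimal sketch of the split-and-assess procedure described above, the following Python code draws a random 80/20 split of catchment identifiers and evaluates the three statistics of Eqs. 6–8 on a set of observed and predicted quantiles. The function names, the fixed random seed and the four-value numeric example are illustrative assumptions and are not taken from the study. The function `coefficient_of_efficiency` follows Eq. 6 as printed; the more common Nash–Sutcliffe form uses the observed values in the denominator, which would be a one-line change.

```python
import random
import statistics


def split_catchments(ids, train_fraction=0.8, seed=0):
    """Randomly split catchment IDs into training and test sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def coefficient_of_efficiency(q_obs, q_pred):
    """CE following Eq. 6 as printed (predicted values in the denominator)."""
    q_bar = statistics.fmean(q_obs)  # mean of the observed flood quantiles
    numerator = sum((o - p) ** 2 for o, p in zip(q_obs, q_pred))
    denominator = sum((p - q_bar) ** 2 for p in q_pred)
    return 1.0 - numerator / denominator


def median_ratio(q_obs, q_pred):
    """Median of the ratio Qpred/Qobs over the test catchments (Eq. 7)."""
    return statistics.median(p / o for o, p in zip(q_obs, q_pred))


def median_relative_error(q_obs, q_pred):
    """Median of the absolute relative error in percent (Eq. 8)."""
    return statistics.median(
        abs((p - o) / o) * 100.0 for o, p in zip(q_obs, q_pred)
    )


if __name__ == "__main__":
    # 452 catchments give 362 training and 90 test catchments.
    train_ids, test_ids = split_catchments(range(452))
    print(len(train_ids), len(test_ids))

    # Hypothetical observed and predicted quantiles, for illustration only.
    q_obs = [100.0, 250.0, 80.0, 400.0]
    q_pred = [110.0, 200.0, 80.0, 500.0]
    print(coefficient_of_efficiency(q_obs, q_pred))
    print(median_ratio(q_obs, q_pred))
    print(median_relative_error(q_obs, q_pred))
```

Taking the median of the ratio and of the RE over the test catchments, rather than the mean, keeps a few catchments with very large errors from dominating the summary.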

Table 5 Comparison of eight different models using 90 independent test catchments

| Statistic | Quantile | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|---|---|---|---|---|---|---|---|---|---|
| CE | Q2 | 0.73 | 0.68 | 0.78 | 0.76 | 0.37 | 0.21 | 0.40 | 0.70 |
| CE | Q5 | 0.61 | 0.52 | 0.65 | 0.65 | 0.79 | 0.28 | 0.71 | 0.70 |
| CE | Q10 | 0.63 | 0.78 | 0.72 | 0.56 | 0.46 | 0.40 | 0.39 | 0.67 |
| CE | Q20 | 0.71 | 0.72 | 0.68 | 0.68 | 0.57 | 0.67 | 0.27 | -0.19 |
| CE | Q50 | 0.68 | 0.68 | 0.59 | 0.62 | -0.33 | 0.36 | 0.54 | 0.14 |
| CE | Q100 | 0.52 | 0.57 | 0.44 | 0.55 | 0.37 | 0.33 | 0.45 | 0.37 |
| CE | Average | 0.66 | 0.66 | 0.64 | 0.63 | 0.37 | 0.38 | 0.46 | 0.40 |
| CE | Median | 0.64 | 0.68 | 0.67 | 0.64 | 0.42 | 0.35 | 0.43 | 0.52 |
| Ratio (median) | Q2 | 1.04 | 1.09 | 0.94 | 1.26 | 1.37 | 1.19 | 1.36 | 1.20 |
| Ratio (median) | Q5 | 0.99 | 1.03 | 1.02 | 1.13 | 1.13 | 1.09 | 1.08 | 1.24 |
| Ratio (median) | Q10 | 1.02 | 1.31 | 1.31 | 1.07 | 1.34 | 1.41 | 1.21 | 1.06 |
| Ratio (median) | Q20 | 1.04 | 1.09 | 1.06 | 1.01 | 1.26 | 1.07 | 1.19 | 0.94 |
| Ratio (median) | Q50 | 1.14 | 1.06 | 1.41 | 1.16 | 1.17 | 1.69 | 1.22 | 1.11 |
| Ratio (median) | Q100 | 1.10 | 1.09 | 1.14 | 1.30 | 1.37 | 1.03 | 0.96 | 1.03 |
| Ratio (median) | Average | 1.06 | 1.11 | 1.15 | 1.16 | 1.27 | 1.25 | 1.17 | 1.10 |
| Ratio (median) | Median | 1.04 | 1.09 | 1.14 | 1.16 | 1.27 | 1.19 | 1.19 | 1.10 |
| RE (%) (median) | Q2 | 37.56 | 49.93 | 44.22 | 46.98 | 55.75 | 40.28 | 61.36 | 44.05 |
| RE (%) (median) | Q5 | 40.39 | 50.01 | 39.60 | 44.25 | 49.56 | 57.66 | 38.28 | 46.78 |
| RE (%) (median) | Q10 | 44.63 | 43.98 | 55.26 | 39.35 | 49.87 | 55.01 | 55.20 | 44.68 |
| RE (%) (median) | Q20 | 35.62 | 30.65 | 49.42 | 40.69 | 47.48 | 51.66 | 46.90 | 52.95 |
| RE (%) (median) | Q50 | 39.09 | 44.00 | 55.01 | 41.10 | 69.61 | 78.77 | 46.66 | 66.80 |
| RE (%) (median) | Q100 | 44.53 | 44.13 | 51.18 | 60.08 | 55.75 | 53.11 | 53.72 | 49.20 |
| RE (%) (median) | Median | 39.74 | 44.07 | 50.30 | 42.68 | 52.81 | 54.06 | 50.31 | 47.99 |
| RE (%) (median) | Average | 40.30 | 43.78 | 49.12 | 45.41 | 54.67 | 56.08 | 50.35 | 50.74 |

Table 6 summarizes the median ratio (r) values for the seven candidate regions. For the NSW candidate region, the median ratio for Q10 is very small (0.17), which indicates a significant under-estimation by the ANN-based model. Also, for this region, Q50 shows remarkable over-estimation with a median ratio of 1.82. For the VIC candidate region, all the median ratios seem to be reasonable with a range of 0.86–1.49. For the QLD region, both Q50 and Q100 show an excellent median ratio close to 1.00, and the median ratios over all ARIs (a range of 0.98–1.48) appear to be reasonable. For the TAS region, Q50 shows notable overestimation with a median ratio value of 2.46. For the summer dominated rainfall region (SDRR) and winter dominated rainfall region (WDRR), results are better than for the individual states except for Q50 for the WDRR, which shows a notable overestimation with a median ratio of 2.02. It seems that when the region size increases, the median ratio values are more consistent over different ARIs. When all the data sets are combined together, the median ratio values show remarkable improvement with a range of 0.99–1.14, which appears to be satisfactory. There are smaller differences in the median ratio values across various ARIs for the combined data set as compared to other regions, as illustrated in Fig. 4.

Table 6 Median Qpred/Qobs ratio values for seven ANN-based candidate regions and QRT (columns NSW to Combined are ANN-based models; the last column is the QRT model for the combined data set)

| Flood quantile | NSW | VIC | QLD | TAS | SDRR | WDRR | Combined | QRT (Combined) |
|---|---|---|---|---|---|---|---|---|
| Q2 | 1.38 | 1.06 | 1.28 | 1.08 | 1.14 | 1.25 | 1.04 | 1.15 |
| Q5 | 0.84 | 1.13 | 1.48 | 1.56 | 1.21 | 1.06 | 0.99 | 1.06 |
| Q10 | 0.17 | 0.86 | 1.11 | 1.65 | 1.38 | 1.26 | 1.02 | 1.35 |
| Q20 | 1.53 | 1.49 | 1.11 | 0.74 | 0.84 | 1.28 | 1.04 | 1.13 |
| Q50 | 1.82 | 1.17 | 0.98 | 2.46 | 1.32 | 2.02 | 1.14 | 1.19 |
| Q100 | 1.22 | 1.24 | 1.00 | 1.05 | 1.21 | 1.33 | 1.10 | 1.28 |

Fig. 4 Median ratio values for different ARIs for candidate regions

In terms of median RE values (Table 7), for NSW, Q10 and Q50 show very high median RE values, which are 91 and 82 %, respectively. The best results are found for Q20 and Q100 with median RE values close to 50 %. For the VIC region, median RE values for Q50 and Q100 are in the range of 66–78 %, which appear to be quite high. For the QLD region, median RE values are in the range of 37–58 %, which seems to be consistent across various ARIs and represents the best result among all the states. For the TAS region, Q50 has a very high median RE value (146 %); for the other ARIs, results are quite reasonable. It seems that there is a sharp decrease in median RE value from Q50 to Q100, which is unexpected. This indicates that for a very small data set (the TAS region has only 53 stations), the ANN-based RFFA model provides inconsistent results across various ARIs. For SDRR and WDRR, the median RE values are in the range of 29–57 and 43–102 %, respectively. Here all the median RE values are in the reasonable range except for Q50 for the WDRR. When all the data are combined, the median RE values are consistent across all the ARIs (in the range of 35–44 %). There are smaller differences in the median RE values across various ARIs for the combined data set as compared to other regions, as illustrated in Fig. 5. These results clearly show that the combined data set provides the smallest median RE values among all the seven candidate regions, which is also consistent with the ratio values discussed before.

Table 7 Median relative error values (%) for seven ANN-based candidate regions and QRT (columns NSW to Combined are ANN-based models; the last column is the QRT model for the combined data set)

| Flood quantile | NSW | VIC | QLD | TAS | SDRR | WDRR | Combined | QRT (Combined) |
|---|---|---|---|---|---|---|---|---|
| Q2 | 48.21 | 78.05 | 42.42 | 65.77 | 52.40 | 48.50 | 37.56 | 65.38 |
| Q5 | 51.94 | 40.89 | 50.24 | 55.52 | 29.87 | 53.03 | 40.39 | 45.35 |
| Q10 | 91.52 | 39.75 | 37.67 | 64.61 | 52.79 | 43.88 | 44.63 | 57.62 |
| Q20 | 53.17 | 55.58 | 37.67 | 38.19 | 43.12 | 52.75 | 35.62 | 42.64 |
| Q50 | 82.08 | 73.75 | 57.90 | 146.47 | 57.66 | 102.13 | 39.09 | 48.71 |
| Q100 | 50.00 | 66.88 | 58.45 | 15.28 | 54.85 | 67.72 | 44.53 | 51.72 |

Fig. 5 Median relative error values (%) for different ARIs for candidate regions

Table 8 summarizes the results based on CE values for different candidate regions. Regions based on state boundaries show very small CE values except for TAS for Q2. Similarly, the SDRR and WDRR regions perform poorly, with CE values ranging from -2.14 to 0.48. Overall, in terms of CE values, the best performance with the ANN-based model can be seen with the combined data set. In this case, the best results are obtained for smaller ARIs, with a CE value of 0.73 for Q2 followed by 0.71 for Q20. The smallest value (0.52) is found for Q100. It is thus found that the biggest region/dataset, i.e. the combined data set, provides the best performing ANN-based RFFA model; with a bigger dataset, the regional flood patterns are easier to model, which increases the overall model predictability.

Table 8 Coefficient of efficiency (CE) values for seven ANN-based candidate regions and QRT (columns NSW to Combined are ANN-based models; the last column is the QRT model for the combined data set)

| Flood quantile | NSW | VIC | QLD | TAS | SDRR | WDRR | Combined | QRT (Combined) |
|---|---|---|---|---|---|---|---|---|
| Q2 | 0.38 | -0.01 | 0.67 | 0.84 | -2.14 | 0.30 | 0.73 | 0.35 |
| Q5 | 0.42 | 0.56 | 0.19 | 0.40 | 0.33 | 0.34 | 0.61 | 0.37 |
| Q10 | -0.30 | 0.12 | 0.35 | 0.51 | 0.04 | 0.48 | 0.63 | 0.30 |
| Q20 | -4.92 | 0.26 | 0.22 | 0.48 | 0.23 | 0.24 | 0.71 | 0.37 |
| Q50 | 0.40 | 0.26 | -0.76 | -4.26 | 0.28 | 0.14 | 0.68 | -8.42 |
| Q100 | -0.06 | 0.10 | 0.02 | 0.67 | 0.29 | 0.47 | 0.52 | 0.38 |

The best performing ANN model (i.e. Model 1 and combined data set) was compared with the QRT. Here the same dataset was used for building and testing the ANN and QRT models. Based on the median ratio values shown in Table 6, the ANN-based RFFA model with the combined data set shows a median ratio closer to 1.00 than the QRT models for all the six ARIs. Similarly, as shown in Table 7, the ANN-based RFFA model with combined data shows smaller median RE values than the QRT models for all the ARIs. Furthermore, in Table 8, the ANN-based RFFA models outperform the QRT models with respect to CE values. These results demonstrate that ANN-based RFFA models outperform the QRT models considering all the three evaluation statistics. It should be noted here that the median RE values for the best ANN-based RFFA model developed here range from 35 to 44 % (with a few cases where RE > 100 %), which is typical of Australian regional flood estimation methods (e.g. see Haddad et al. 2012; Haddad and Rahman 2012a). To enhance the accuracy of regional flood estimation methods in Australia, a larger data set with longer streamflow record lengths would be needed, as Australia is characterized by a highly variable hydrology/flood regime. It is expected that the availability of such a larger data set in future would enhance the accuracy of regional flood estimation methods in Australia.
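The QRT benchmark in this comparison is the power-form model of Eq. 5 with two predictors (A, Itc_ARI), fitted by OLS after log-transforming both sides. A minimal sketch of such a fit is given below, using only the Python standard library. The study itself used MINITAB 14, so this is not the authors' implementation; the function names, the choice of base-10 logarithms and the synthetic data in the usage example are assumptions made purely for illustration.

```python
import math


def solve_linear_system(a, b):
    """Solve a small dense system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_qrt(area, intensity, q_obs):
    """OLS fit of log10(Q) = log10(b0) + b1*log10(A) + b2*log10(I).

    Returns (b0, b1, b2) of the power form Q = b0 * A**b1 * I**b2.
    """
    rows = [[1.0, math.log10(a), math.log10(i)]
            for a, i in zip(area, intensity)]
    y = [math.log10(q) for q in q_obs]
    # Normal equations (X^T X) beta = X^T y for the three coefficients.
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(3)]
           for j in range(3)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(3)]
    log_b0, b1, b2 = solve_linear_system(xtx, xty)
    return 10.0 ** log_b0, b1, b2


def predict_qrt(coeffs, area, intensity):
    """Evaluate the fitted power-form equation for one catchment."""
    b0, b1, b2 = coeffs
    return b0 * area ** b1 * intensity ** b2


if __name__ == "__main__":
    # Synthetic catchments generated from a known power law (not real data).
    area = [10.0, 50.0, 120.0, 300.0, 800.0]
    intensity = [20.0, 35.0, 15.0, 40.0, 25.0]
    q_obs = [2.0 * a ** 0.7 * i ** 1.3 for a, i in zip(area, intensity)]
    coeffs = fit_qrt(area, intensity, q_obs)
    print(coeffs)
    print(predict_qrt(coeffs, 100.0, 30.0))
```

In practice one such equation would be fitted per flood quantile (Q2 to Q100) on the training catchments and then evaluated on the test catchments with the statistics of Eqs. 6–8.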

6 Conclusions

The paper examines the application of the ANN-based RFFA method using data from 452 gauged catchments in eastern Australia. Based on independent testing, it has been found that the ANN-based RFFA model with only two predictor variables (catchment area and design rainfall intensity) outperforms other models with a greater number of predictor variables. This model would be easier to apply in practice as these two predictor variables are readily obtainable. Seven different regions have been compared with the ANN-based RFFA models and it has been found that when the data from all the eastern Australian states are combined together to form a single region, the ANN presents the best performing model with median RE values in the range of 35–44 %, median ratio of predicted and observed flood quantiles in the range of 0.99–1.14 and CE values in the range of 0.52–0.73. It has also been found that ANN-based RFFA models are not very successful when the data set consists of a smaller number of stations, where the performances of the ANN-based models show notable inconsistency across various ARIs. This result indicates that a relatively larger data set is better suited for successful training and testing of the ANN-based RFFA models. It has also been found that ANN-based RFFA models outperform the traditional QRT for eastern Australia.

Acknowledgments The authors would like to acknowledge the financial support of Geoscience Australia and Engineers Australia, and the various government and private organizations in Australia that provided the data for the project: Department of Sustainability and Environment (VIC), Australian Bureau of Meteorology, Department of Natural Resources and Water (QLD), Department of Water and Energy (NSW) and ENTURA (TAS). The authors would also like to thank two anonymous reviewers and the Associate Editor for very useful comments which have helped to improve the paper notably.

References

Abrahart RJ, Kneale PE, See L (eds) (2004) Neural networks for hydrological modelling. Taylor & Francis, London
Abrahart RJ, Heppenstall AJ, See LM (2007) Timing error correction procedure applied to neural network rainfall-runoff modelling. Hydrol Sci J 52(3):414–431
Acreman MC, Sinclair CD (1986) Classification of drainage basins according to their physical characteristics and application for flood frequency analysis in Scotland. J Hydrol 84(3):365–380
ASCE Task Committee (2000) Artificial neural networks in hydrology-I: Preliminary concepts. J Hydrol Eng 5(2):115–123
Bates BC, Rahman A, Mein RG, Weinmann PE (1998) Climatic and physical factors that influence the homogeneity of regional floods in south-eastern Australia. Water Resour Res 34(12):3369–3382
Bayazit M, Onoz B (2004) Sampling variances of regional flood quantiles affected by inter-site correlation. J Hydrol 291:42–51
Benson MA (1962) Evolution of methods for evaluating the occurrence of floods. U.S. Geological Surveying Water Supply Paper, 1580-A, p 30
Besaw L, Rizzo DM, Bierman PR, Hackett WR (2010) Advances in ungauged streamflow prediction using artificial neural networks. J Hydrol 386(1–4):27–37
Blöschl G, Sivapalan M (1997) Process controls on regional flood frequency: coefficient of variation and basin scale. Water Resour Res 33:2967–2980
Bobee B, Cavadias G, Ashkar F, Bernier J, Rasmussen P (1993) Towards a systematic approach to comparison of distributions used in flood frequency analysis. J Hydrol 142:121–136
Burn DH (1990) Evaluation of regional flood frequency analysis with a region of influence approach. Water Resour Res 26(10):2257–2265
Cavadias GS, Ouarda TBMJ, Bobee B, Girard C (2001) A canonical correlation approach to the determination of homogeneous regions for regional flood estimation of ungauged basins. Hydrol Sci J 46(4):499–512
Chokmani K, Ouarda TBMJ, Hamilton S, Ghedira MH, Gingras H (2008) Comparison of ice-affected streamflow estimates computed using artificial neural networks and multiple regression techniques. J Hydrol 349:383–396
Cunnane C (1988) Methods and merits of regional flood frequency analysis. J Hydrol 100:269–290
Dalrymple T (1960) Flood frequency analyses. U.S. Geological Survey Water Supply Paper 1543-A, pp 11–51
Daniell TM (1991) Neural networks: applications in hydrology and water resources engineering. International Hydrology & Water Resources Symposium. Perth, Australia, 2–4 Oct 1991
Dastorani MT, Wright NG (2001) Application of artificial neural networks for ungauged catchment flood prediction. Floodplain Management Association Conference, San Diego, CA
Dawson CW, Wilby RL (2001) Hydrological modelling using artificial neural networks. Prog Phys Geogr 25(1):80–108
Dawson CW, Abrahart RJ, Shamseldin AY, Wilby RL (2006) Flood estimation at ungauged sites using artificial neural networks. J Hydrol 319:391–409
Farmer JD, Sidorowich J (1987) Predicting chaotic time series. Phys Rev Lett 59(8):845–848
Flavell D (2012) Design flood estimation in Western Australia. Aust J Water Resour 16(1):1–20
Gao C, Gemmer M, Zeng X, Liu B, Su B, Wen Y (2010) Projected streamflow in the Huaihe River Basin (2010–2100) using artificial neural network. Stoch Environ Res Risk Assess 24:685–697
Govindaraju RS (2000) Artificial neural networks in hydrology II. Hydrological applications. J Hydrol Eng 5(2):124–137
Greis NP, Wood EF (1983) Regional flood frequency estimation and network design. Water Resour Res 17:1167–1174
Griffis VW, Stedinger JR (2007) The use of GLS regression in regional hydrologic analyses. J Hydrol 344:82–95
Grubbs FE, Beck G (1972) Extension of sample sizes and percentage points for significance tests of outlying observations. Technometrics 14:847–854
Guse B, Thieken AH, Castellarin A, Merz B (2010) Deriving probabilistic regional envelope curves with two pooling methods. J Hydrol 380(1–2):14–26

Haddad K, Rahman A (2011) Regional flood estimation in New South Wales Australia using generalised least squares quantile regression. J Hydrol Eng 16(11):920–925. doi:10.1061/(ASCE)HE.1943-5584.0000395
Haddad K, Rahman A (2012a) Regional flood frequency analysis in eastern Australia: Bayesian GLS regression-based methods within fixed region and ROI framework: quantile regression vs. parameter regression technique. J Hydrol. doi:10.1016/j.jhydrol.2012.02.012
Haddad K, Rahman A (2012b) Regional flood frequency analysis in eastern Australia: Bayesian GLS regression-based methods within fixed region and ROI framework: quantile regression vs parameter regression technique. J Hydrol 20:142–161
Haddad K, Rahman A, Weinmann PE, Kuczera G, Ball JE (2010) Streamflow data preparation for regional flood frequency analysis: lessons from south-east Australia. Aust J Water Resour 14(1):17–32
Haddad K, Rahman A, Stedinger JR (2012) Regional flood frequency analysis using Bayesian generalized least squares: a comparison between quantile and parameter regression techniques. Hydrol Process 26:1008–1021
Hopfield J (1982) Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci USA 79(8):2554–2558
Hosking JRM, Wallis JR (1993) Some statistics useful in regional frequency analysis. Water Resour Res 29(2):271–281
Hosking JRM, Wallis JR (1997) Regional frequency analysis: an approach based on L-moments. Cambridge University Press, New York
Huo Z, Feng S, Kang S, Huang G, Wang F, Guo P (2012) Integrated neural networks for monthly river flow estimation in arid inland basin of Northwest China. J Hydrol 420–421:159–170
Institution of Engineers Australia (I.E. Aust.) (1987/2001) Australian rainfall and runoff: a guide to flood estimation. In: Pilgrim DH (ed), vol 1. I. E. Aust, Canberra
Ishak E, Haddad K, Zaman M, Rahman A (2011) Scaling property of regional floods in New South Wales Australia. Nat Hazards 58:1155–1167. doi:10.1007/s11069-011-9719-6
Ishak E, Rahman A, Westra S, Sharma A, Kuczera G (2013) Evaluating the non-stationarity of Australian annual maximum floods. J Hydrol 494:134–145
Javelle P, Ouarda TBMJ, Lang M, Bobee B, Galea G, Gresillon JM (2002) Development of regional flood-duration-frequency curves based on the index flood method. J Hydrol 258:249–259
Jiapeng H, Zhongmin L, Zhongbo Y (2003) A modified rational formula for flood design in small basins. J Am Water Resour Assoc 39(5):1017–1025
Jingyi Z, Hall MJ (2004) Regional flood frequency analysis for the Gan-Ming River basin in China. J Hydrol 296:98–117
Kashani MH, Montaseri M, Yaghin MAL (2008) Flood estimation at ungauged sites using a new hybrid model. J Appl Sci 8:1744–1749
Kendall MG (1970) Rank correlation methods, 2nd edn. Hafner, New York
Kirby W, Moss M (1987) Summary of flood frequency analysis in the United States. J Hydrol 96:5–14
Kjeldsen TR, Jones D (2009) An exploratory analysis of error components in hydrological regression modelling. Water Resour Res 45:W02407. doi:10.1029/2007WR006283
Kjeldsen TR, Jones DA (2010) Predicting the index flood in ungauged UK catchments: on the link between data-transfer and spatial model error structure. J Hydrol 387(1–2):1–9. doi:10.1016/j.jhydrol.2010.03.024
Kothyari UC (2004) Estimation of mean annual flood from ungauged catchments using artificial neural networks. In: Proceedings of the British hydrological society international conference on hydrology: science and practice for the 21st century, vol. 1
Kuczera G (1999) Comprehensive at-site flood frequency analysis using Monte Carlo Bayesian inference. Water Resour Res 35(5):1551–1557
Luk KC, Ball JE, Sharma A (2001) An application of artificial neural networks for rainfall forecasting. Math Comput Model 33:683–693
Madsen H, Pearson CP, Rosbjerg D (1997) Comparison of annual maximum series and partial duration series for modeling extreme hydrological events: 2. Regional modeling. Water Resour Res 33(4):771–781
Maier HR, Dandy GC (2000) Neural networks for the prediction and forecasting of water resources variables: a review of modelling issues and applications. Environ Model Softw 15(1):101–123
McCulloch WS, Pitts W (1943) A logic calculus of the ideas immanent in nervous activity. Bull Math Biophys 5:115–133
Mulvany TJ (1851) On the use of self-registering rain and flood gauges. Inst Civ Eng Trans (Ireland) 4(2):1–8
Muttiah RS, Srinivasan R, Allen PM (1997) Prediction of two year peak stream discharges using neural networks. J Am Water Resour Assoc 33(3):625–630
Nathan RJ, McMahon TA (1990) Identification of homogeneous regions for the purpose of regionalisation. J Hydrol 121:217–238
National Research Council (NRC) (1988) Estimating probabilities of extreme floods: methods and recommended research. National Academy Press, Washington, DC, p 141
NERC (1975) Flood studies report, vol. 5. Natural Environment Research Centre (NERC), London
Ouarda TBMJ, Bâ KM, Diaz-Delgado C, Cârsteanu C, Chokmani K, Gingras H, Quentin E, Trujillo E, Bobée B (2008) Intercomparison of regional flood frequency estimation methods at ungauged sites for a Mexican case study. J Hydrol 348:40–58
Pallard B, Castellarin A, Montanari A (2009) A look at the links between drainage density and flood statistics. Hydrol Earth Syst Sci (HESS) 13:1019–1029
Pandey GR, Nguyen VTV (1999) A comparative study of regression based methods in regional flood frequency analysis. J Hydrol 225:92–101
Pegram GGS, Parak M (2004) A review of the regional maximum flood and rational formula using geomorphological information and observed floods. Water S Afr 30(3):377–392
Potter KW (1987) Research on flood frequency analysis, 1983–1986. Rev Geophys 25(2):113–118
Rahman A (2005) A quantile regression technique to estimate design floods for ungauged catchments in south-east Australia. Aust J Water Resour 9(1):81–89
Rahman A, Bates BC, Mein RG, Weinmann PE (1999) Regional flood frequency analysis for ungauged basins in south-eastern Australia. Aust J Water Resour 3(2):199–207
Rahman A, Haddad K, Zaman M, Kuczera G, Weinmann PE (2011) Design flood estimation in ungauged catchments: a comparison between the probabilistic rational method and quantile regression technique for NSW. Aust J Water Resour 14(2):127–137
Rao AR, Srinivas VV (2008) Regionalization of watersheds: an approach based on cluster analysis. Springer, Berlin, ISBN: 14020685142008
Riad S, Mania J, Bouchaou L, Najjar Y (2004) Rainfall-runoff model using an artificial neural network approach. Math Comput Model 40(7–8):839–846
Rumelhart DE, Hinton GE, Williams RJ (1986) Learning internal representations by error propagation. In: Rumelhart DE, McClelland JL, PDP Research Group (eds) Parallel distributed processing. Explorations in the microstructure of cognition, vol. 1. MIT Press, Cambridge, MA, pp 318–362

Shamseldin AY (1997) Application of a neural network technique to rainfall-runoff modelling. J Hydrol 199:272–294
Shu C, Burn DH (2004) Artificial neural network ensembles and their application in pooled flood frequency analysis. Water Resour Res 40(9):W09301. doi:10.1029/2003WR002816
Shu C, Ouarda TBMJ (2007) Flood frequency analysis at ungauged sites using artificial neural networks in canonical correlation analysis physiographic space. Water Resour Res 43:W07438. doi:10.1029/2006WR005142
Shu C, Ouarda TBMJ (2008) Regional flood frequency analysis at ungauged sites using the adaptive neuro-fuzzy inference system. J Hydrol 349:31–43
Stedinger JR, Tasker GD (1985) Regional hydrologic analysis: 1. Ordinary, weighted and generalized least squares compared. Water Resour Res 21:1421–1432
Tasker GD (1980) Hydrologic regression with weighted least squares. Water Resour Res 16(6):1107–1113
Tasker GD, Eychaner JH, Stedinger JR (1986) Application of generalised least squares in regional hydrologic regression analysis. US Geological Survey Water Supply Paper 2310, pp 107–115
Thomas DM, Benson MA (1970) Generalization of streamflow characteristics from drainage-basin characteristics. U.S. Geological Survey Water Supply Paper 1975, US Governmental Printing Office
Turan ME, Yurdusev MA (2009) River flow estimation from upstream flow records by artificial intelligence methods. J Hydrol 369:71–77
World Meteorological Organization (WMO) (1989) Statistical distributions for flood frequency analysis. Operational Hydrology Report, 33
Wu J, Li N, Yang H, Li C (2008) Risk evaluation of heavy snow disasters using BP artificial neural network: the case of Xilingol in Inner Mongolia. Stoch Environ Res Risk Assess 22:719–725
Zaman M, Rahman A, Haddad K (2012) Regional flood frequency analysis in arid regions: a case study for Australia. J Hydrol 475:74–83
Zhang B, Govindaraju RS (2003) Geomorphology-based artificial neural networks for estimation of direct runoff over watersheds. J Hydrol 273(1):18–34
Zrinji Z, Burn DH (1994) Flood frequency analysis for ungauged sites using a region of influence approach. J Hydrol 153(1–4):1–21