IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 46, NO. 7, JULY 2008


Combining Artificial Neural Network Models, Geostatistics, and Passive Microwave Data for Snow Water Equivalent Retrieval and Mapping Noël Dacruz Evora, Dominique Tapsoba, and Danielle De Sève

Abstract—A new modeling framework combining neural-network-based models, passive microwave data, and geostatistics is proposed for snow water equivalent (SWE) retrieval and mapping. Brightness temperature data from the seven-channel Special Sensor Microwave/Imager and the interpolated minimum temperature are the inputs of a multilayer feedforward (MFF) neural network. A Kriging with an External Drift algorithm is applied to ground-based SWE data to produce gridded SWE data that are used as the target of the neural network. An optimal division of the sample of available pixels is achieved by a self-organizing feature map. Prediction error is used for model selection and is assessed by bootstrap. It is shown that a committee machine containing neural networks with different architectures can provide consistent SWE retrievals. This modeling framework is applied for SWE retrieval and mapping over the La Grande River basin in northeastern Quebec (Canada). The results are very promising for operational purposes, particularly for SWE mapping during periods with no ground measurements and for operational streamflow forecasting.

Index Terms—Bootstrap, Kriging with an External Drift (KED), multilayer perceptrons (MLPs), passive microwave, self-organizing feature map (SOFM), snow water equivalent (SWE), Special Sensor Microwave/Imager (SSM/I).

I. INTRODUCTION

Snow cover parameters [extent, depth, density, and resulting snow water equivalent (SWE)] are very important state variables of hydrological, meteorological, and climatological models in areas of the world dominated by snow. Snow cover is one of the landscape features with the greatest variability in extent. During the annual cycle, an average of 7%–40% of the land in the Northern Hemisphere is covered by snow on a monthly basis [1]. In Canada, snow cover appears and disappears over the course of a six-month period. Snow is an important source of water supply because the melting of the snow that has accumulated during winter generates the floods occurring in spring. Information on SWE is thus very relevant for hydrological forecasting and monitoring, and for climatological applications as well, because SWE has an impact on the global water and energy balance.

Manuscript received March 1, 2007; revised December 23, 2007. This work was supported by Hydro-Québec Production, one of the four divisions of Hydro-Québec. The authors are with Hydro-Québec Research Institute, Varennes, QC J3X 1S1, Canada (e-mail: [email protected]; [email protected]; deseve. [email protected]). Digital Object Identifier 10.1109/TGRS.2008.916632

A. Objectives of Study

Our goal is to use a neural network model to retrieve SWE by using passive microwave data [Special Sensor Microwave/Imager (SSM/I)] as basic predictor variables. Reliable estimates of SWE are very important for Hydro-Québec since snowmelt provides the water that fills the reservoirs and is used thereafter for hydropower generation. For operational run-off and streamflow forecasts, the maximum SWE observed prior to the onset of the spring snowmelt is critical information. This information is used to assess the a priori potential for large run-off and floods. It can also be used to update the snow-related state variables of our hydrological models, which forecast water inflows into the reservoirs during spring.

Artificial neural networks (ANNs) may perform better than linear regression models because of their ability to approximate nonlinear functions. Nonlinearity is a major property that makes ANNs very appealing, particularly if the functional relation between the predictors (SSM/I data) and the predictand (SWE) is nonlinear. Another important benefit of ANN models is that they are data-driven models providing an input–output mapping through a supervised learning process that gradually modifies the synaptic weights between the neurons to reduce the difference between the desired response and the response of the network [2]. ANNs are now recognized as nonparametric statistical models rather than "magical" black-box models. They have the advantage of capturing high-dimensional effects, which is the case when many predictor variables affect the predictand [3].

B. Background

Passive microwave signals have the ability to penetrate dry snow and clouds, providing information about the spatial and temporal variations of snow cover over large areas of boreal regions at different frequencies and at horizontal and vertical polarizations. However, using passive microwave satellite data to map areal SWE is very risky when there is liquid water inside the snowpack or when grain size varies [4], [5]. Carroll et al. [5] pointed out four important factors restricting the use of SSM/I data for SWE mapping: the coarse resolution of SSM/I-derived snow cover products, the presence of forested, heavily vegetated, or mountainous environments, the presence of small amounts of liquid water on the snow surface, and the existence of precipitating clouds in the atmosphere. Grain size can significantly alter the performance of passive microwave SWE

0196-2892/$25.00 © 2008 IEEE



algorithms because of its influence on the scattering of emitted energy. Microwave brightness temperature can decrease with an increase in snow grain size even if the other parameters of the snowpack remain constant [6]. Large depth hoar grains induce a high level of scattering, reducing the snowpack emissivity [7]. A reliable SWE estimate from passive microwave data is therefore conditioned on the use of appropriate screening criteria. These criteria remove the footprints affected by wet snow, large water bodies, and the presence of forest and depth hoar [1].

Many inversion algorithms that retrieve SWE using passive microwave brightness temperature (TB) data or indices (typically TB differences among SSM/I channels) have been proposed over the last 20 years. These algorithms are mainly multiple regression models and have performed rather well in retrieving SWE. Usually, better performance was obtained over areas with the following physical description: nonforested and noncomplex terrain, dry or nonmelting snow, and no depth hoar. The common retrieval algorithms make use of the 19- and 37-GHz horizontal polarization channels [8]–[10]. However, Goodison et al. [11] and Hallikainen [12] developed SWE retrieval algorithms using vertical polarization at the same frequencies. Aschbacher [13] proposed the spectral polarization difference (SPD) algorithm. The SPD is based on the TB of the 19-GHz channel at horizontal and vertical polarizations and the 37-GHz channel at vertical polarization. Hallikainen and Jolma [14] used Nimbus-7 SMMR data and found that the best performing algorithms were based on the brightness temperature differences between 37 GHz and either 18 or 10.7 GHz, all in vertical polarization. Tait [15] used the 19-, 37-, and 85-GHz frequencies at both horizontal and vertical polarizations and obtained better results when difference functions (19–37 GHz; 19–85 GHz) were used as independent variables.
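As an illustration of the general form of these regression-type retrievals (not any specific published algorithm), the sketch below maps the 19H − 37H brightness temperature difference to SWE; the coefficients are hypothetical placeholders for values that real algorithms calibrate against ground measurements for a specific region.

```python
import numpy as np

# Hypothetical coefficients: real algorithms calibrate these regionally.
A_MM_PER_K = 4.8   # slope, mm of SWE per kelvin of TB difference (illustrative)
B_MM = 0.0         # offset (illustrative)

def swe_from_tb_difference(tb_19h, tb_37h, a=A_MM_PER_K, b=B_MM):
    """Estimate SWE (mm) from the 19H - 37H brightness temperature
    difference, the common form of regression-based retrievals."""
    gradient = np.asarray(tb_19h, dtype=float) - np.asarray(tb_37h, dtype=float)
    return np.clip(a * gradient + b, 0.0, None)  # SWE cannot be negative

# Dry snow scatters more at 37 GHz, so TB(19H) > TB(37H) over snow.
print(swe_from_tb_difference(245.0, 220.0))  # 25 K difference -> 120.0 mm
```

Such a linear index is exactly what breaks down beyond the SWE threshold discussed below, which motivates the nonlinear neural network approach of this paper.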
Singh and Gan [1] reviewed existing algorithms for retrieving SWE from SSM/I data. Meaningful new algorithms were also proposed in their study by including physiographic and atmospheric data as predictors. Goïta et al. [16] developed SWE retrieval algorithms that consider the percentage of forest and/or water bodies inside the footprints.

Another important aspect of the relationship between ground-measured SWE and brightness temperature difference functions is the existence of a threshold value. The algorithms work well for low SWE values but perform poorly with deeper snow [6], [17], [18]. De Sève et al. [17] showed that beyond this threshold value (around 200 mm of SWE for the boreal region of Quebec), there is a change of slope in the relationship between SWE and the normalized brightness temperature difference between the 19- and 37-GHz channels.

Very few studies have used neural network models for SWE retrieval from SSM/I data. Tedesco et al. [18] developed and tested an inversion technique for the retrieval of SWE and dry snow depth based on ANNs. The four input predictors were the 19- and 37-GHz vertical and horizontal brightness temperatures.

The layout of this paper is as follows. In Section II, the study area and data are described. In Section III, the modeling framework is presented. Section IV describes the method of Kriging with an External Drift (KED), which was used for SWE observation mapping in order to provide the information on a grid at the same resolution as the SSM/I data. In Section V, the methodology of neural network modeling for the retrieval of SWE from SSM/I data is described. The neural-network-based model is used in 2003 for the retrieval and mapping of SWE over the La Grande River basin (northern Quebec, Canada). In Section VI, the prediction error is estimated and used to select the best models, and their performance in 2003 is assessed. Finally, Section VII is dedicated to the discussion and conclusions of this paper.

Fig. 1. Map of the province of Quebec (Canada) outlining the study area (red box).

II. STUDY AREA AND DATA

A. Study Area

The study area is the La Grande River basin, located in a taiga environment in northern Quebec, Canada (Fig. 1). La Grande River rises in the highlands of north-central Quebec and flows westward over a distance of approximately 900 km, draining into James Bay. This basin has numerous lakes and rivers and covers an area of approximately 180 000 km². Hydro-Québec operates the La Grande River water-resource system, which has great hydroelectric power generation potential (16 GW of installed capacity overall). Vegetation over this basin consists principally of open forests of black spruce. The soil is constituted of till in the eastern part of the basin and is covered by lichen. Elevations range from 50 m in the southeastern part of the basin to almost 1000 m above mean sea level in the northwestern areas. The climate is subarctic, and the air temperature during winter usually falls below −30 °C. Snow covers the ground from November to May. This basin was selected mainly because it is a low-forested (taiga environment) and nonmountainous area. In fact, forests affect the microwave radiation received by the satellite because they are responsible



for the attenuation of the ground microwave signal while contributing to the brightness temperature [15], [16]. Mountains with significant relief can also alter the microwave signal [19].

B. Data

Brightness temperature data used in this paper are provided by the seven-channel SSM/I aboard the Defense Meteorological Satellite Program F-11 and F-13 spacecraft. The passive microwave radiometer operates at four frequencies (19, 22, 37, and 85 GHz), at both horizontal and vertical polarizations, except at 22 GHz, which operates only at vertical polarization. The footprint ranges from 69 × 43 km at 19 GHz to 15 × 13 km at 85 GHz. These SSM/I data are in descending mode and have the same spatial resolution of 25 × 25 km. The SSM/I EASE-Grid brightness temperature data were acquired free of charge from the National Snow and Ice Data Center Web site. In order to have the SSM/I EASE-Grid data in the same projection as that of the NOAA/AVHRR ground occupation mosaic, the brightness temperature data were reprojected using a Lambert Conformal Conic projection. This projection was implemented in a code written for the PCI software. More specifically, this projection is based on a central meridian line (68° N, 90° W), a reference latitude (53° N), and a reference origin (63° N, 90° W). The ground occupation mosaic of the study area was produced by the Canada Centre for Remote Sensing. An automated SSM/I data acquisition procedure has been developed and implemented at Hydro-Québec, and the data are archived in a database.

A very important step toward the reliable retrieval of SWE from passive microwave remote sensing data is to use screening criteria to remove all the footprints affected by liquid precipitation [20] and wet snow [21]. This step is also very helpful in achieving a good training of the neural-network-based model used to retrieve SWE from SSM/I data. No depth hoar screening criteria have been used formally.

SWE observations are derived from snow depth and density measured manually at snow stations (Fig. 2). These SWE observations are typically sparse both spatially and temporally, mainly because of the large areas to be covered, the large gaps in the spatial and temporal coverage of the recording stations, and the harsh winter weather conditions of boreal regions. Minimum temperatures (TMIN) are obtained from Environment Canada weather stations.

Fig. 2. Snow station network.

III. MODELING FRAMEWORK

A new modeling framework is proposed to retrieve SWE over large, low-forested, and nonmountainous areas of boreal regions. It combines artificial-intelligence-based models, namely ANNs, with passive microwave data and geostatistics. Our modeling framework involves the following two steps.

1) Mapping Point-Based SWE Observations: This is a very important and crucial step of our modeling framework. To take full advantage of remote sensing data, which are spatially distributed, and not be restricted to information available only at a few stations over large areas, it was important to find a way to go from point-based to area-based information and to provide reliable interpolated values at each pixel over a basin. A KED algorithm is applied to point-based SWE data to produce gridded SWE data that are used as the target of the neural network model. KED also provides a grid of the estimation variance, which measures the estimation accuracy of the geostatistical model.

2) Training of an MFF Neural Network: This second step involves the training of an MFF neural network to retrieve SWE from SSM/I data. Optimal division of the data is obtained through the use of a self-organizing feature map (SOFM) that clusters the input and output data into representative subsets. The training methodology involves the selection of the best pixels. Two approaches to constituting the sample of pixels before clustering them with an SOFM are available: 1) use all the available pixels or 2) select only those pixels that meet a precise criterion. We have used the second approach to select the best pixels for the neural network training (see Section V).

IV. MAPPING SWE OBSERVATIONS USING KED

Since detailed information about geostatistical procedures can be found in the scientific literature [22]–[26], only a brief description of the geostatistical methods used in this paper is provided.
The first step in the geostatistical analysis is to calculate a sample variogram using the following equation:

γ(h) = (1/2n) Σ_{i=1}^{n} [Z(x_i + h) − Z(x_i)]²    (1)

where x_i and x_i + h are sampling locations separated by a distance vector h, and Z(x_i) and Z(x_i + h) are measured values of the variable Z at the corresponding locations. Z is assumed to come from a stationary random process.

The sample variogram is fitted with a model, and the adequacy of the chosen model is tested by cross-validation. The cross-validation phase consists in removing, in turn, one data point and reestimating it (by Kriging) from its neighbors using
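As a minimal numerical illustration of (1), the sketch below computes an isotropic sample variogram by averaging squared increments over point pairs binned by separation distance. The synthetic data, lag bins, and tolerance are arbitrary choices for the example.

```python
import numpy as np

def sample_variogram(coords, values, lags, tol):
    """Sample variogram per (1): for each lag h, average
    0.5 * [Z(x_i) - Z(x_j)]^2 over pairs whose separation distance
    is within tol of h (isotropic case: only |h| is used)."""
    coords = np.asarray(coords, float)
    values = np.asarray(values, float)
    # pairwise separation distances and half squared increments
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq = 0.5 * (values[:, None] - values[None, :]) ** 2
    iu = np.triu_indices(len(values), k=1)      # count each pair once
    d, sq = d[iu], sq[iu]
    gamma = []
    for h in lags:
        mask = np.abs(d - h) <= tol
        gamma.append(sq[mask].mean() if mask.any() else np.nan)
    return np.array(gamma)

# Synthetic spatially correlated field: smooth trend plus small noise.
rng = np.random.default_rng(0)
pts = rng.uniform(0, 100, size=(200, 2))
z = np.sin(pts[:, 0] / 20.0) + 0.1 * rng.standard_normal(200)
print(sample_variogram(pts, z, lags=[5, 20, 40], tol=5.0))
```

For a spatially correlated field such as this one, the variogram rises with lag distance, which is the structure the fitted model must capture before Kriging.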



the model previously fitted. The cross-validation statistics used to select the variogram model are the mean error between measured and Kriging-estimated values, the correlation coefficient between measured and estimated values, and the Kriging variance [27]. After a proper variogram model is chosen, Kriging is applied to estimate the values at unsampled points using data from the sampled area. A Kriging estimator is expressed as follows:

Z*(x) = Σ_{i=1}^{n} λ_i Z(x_i)    (2)

where Z*(x) is the estimated value of Z at location x, λ_i is the weight of the observation at location x_i, and n is the number of observations within the neighborhood. The weights are determined by the variogram model, which takes into account the spatial variability of the random field.

KED combines measurements with additional information. It merges two sources of information: a primary variable that is accurate and precise but only available at a limited number of locations, and a secondary variable that covers the full domain on a fine grid but is less accurate. This method originated in petroleum and gas exploration, where a few accurately measured (expensive) borehole data needed to be combined with many fairly imprecise (but easily obtainable) seismic data in order to map the top of a reservoir [28], [29]. More recently, the method has been applied in hydrogeology to map the log of transmissivity using the log of specific capacity as an external drift variable [30]. Another application in the same field consists in mapping piezometric measurements for a given year by combining the few available data from that year with a large amount of data recorded in a former year [31]. The possibility of including several shape functions as external drift has been tested by Renard and Nai-Hsien [32].

In this paper, SWE measurements are considered as the primary variable and elevation as the secondary variable. Elevation is the single most important variable describing large-scale snow variability [33]. Elevation is derived from a 1-km digital elevation model (DEM) over the river basin. Collocated co-Kriging [34] can be used as an alternative, but KED requires a less demanding variogram analysis. Furthermore, comparison studies have shown that KED interpolation performs better than collocated co-Kriging [35], [36]. KED is a form of data fusion using a regression-based interpolation method. The secondary information is treated as a covariate or explanatory variable and, as such, partly explains the variation in the spatially correlated observations. KED has been successfully applied for SWE mapping over the Gatineau River basin using sparsely sampled data and elevation [37]. Since detailed information about KED can be found in [38], only a very concise presentation of the technique is given here.

KED represents the Kriging estimate Z*(x) as the sum of a trend component m(x) and a residual R(x), where the trend component m(x) is a linear function of the external drift variable y(x):

m(x) = a_0 + a_1 y(x).    (3)

The external drift variable y(x) was expressed by elevation combined with geographical coordinates. A high correlation between SWE and the DEM justifies the integration of the DEM as an external drift. At the SWE measurement locations, the elevation values were extracted from the DEM and cross-plotted against the SWE values. The correlation coefficient between SWE and DEM for all the dates investigated varies from 0.7 to 0.9. In the case of a combination of several variables, the external drift variable is expressed as a first-order polynomial:

y(x) = b_0 + b_1 y_1(x) + b_2 y_2(x) + · · · + b_t y_t(x).    (4)
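A rough numerical illustration of the trend-plus-residual decomposition (3): the sketch below fits the linear drift by least squares and kriges the residuals under an assumed exponential covariance (often called regression kriging). True KED solves a single extended kriging system with unbiasedness constraints, so this is only an approximation in its spirit; the covariance model, its parameters, and the synthetic elevation–SWE relation are all illustrative.

```python
import numpy as np

def exp_cov(d, sill=1.0, rng_=30.0):
    """Exponential covariance model (sill and range are illustrative)."""
    return sill * np.exp(-d / rng_)

def regression_kriging(coords, z, drift, coords0, drift0):
    """Drift-plus-residual interpolation in the spirit of (3):
    z(x) ~ a0 + a1*y(x) + R(x), with R(x) kriged (simple kriging here)."""
    coords, z, drift = map(np.asarray, (coords, z, drift))
    # 1) fit the linear trend m(x) = a0 + a1*y(x) by least squares
    A = np.column_stack([np.ones_like(drift), drift])
    a, *_ = np.linalg.lstsq(A, z, rcond=None)
    resid = z - A @ a
    # 2) simple kriging of the residuals, weights as in (2)
    d_obs = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    C = exp_cov(d_obs) + 1e-8 * np.eye(len(z))     # tiny nugget for stability
    d0 = np.linalg.norm(np.asarray(coords0)[:, None] - coords[None, :], axis=-1)
    lam = np.linalg.solve(C, exp_cov(d0).T)        # kriging weights
    r0 = lam.T @ resid
    # 3) add the trend back at the target locations
    return a[0] + a[1] * np.asarray(drift0) + r0

# Synthetic example: SWE increases with elevation (the external drift).
rng = np.random.default_rng(1)
xy = rng.uniform(0, 100, size=(50, 2))
elev = 50 + 5 * xy[:, 0]                        # elevation correlated with x
swe = 0.4 * elev + rng.standard_normal(50)      # SWE driven by elevation
est = regression_kriging(xy, swe, elev, [[50.0, 50.0]], [300.0])
print(est)  # close to 0.4 * 300 = 120
```

The drift term reproduces the elevation signal between stations, while the kriged residual pulls the surface back toward the observed SWE values near stations, which is exactly the qualitative behavior of the KED maps discussed below.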

This equation results from the multiple regression analysis conducted for SWE versus elevation; the b_i's are the regression coefficients, and the y_i(x)'s represent the variables used in the regression analysis. The geostatistical analysis is conducted on the residuals, i.e., the differences between the observed SWE values and the values calculated from the regression equation. Knowing the elevation at every point and modeling the relationship between elevation and SWE make it possible to improve the mapping of SWE between monitoring sites.

In order to assess the impact of the drift, ordinary Kriging (OK) and KED maps are shown in Fig. 3(a) and (b), respectively. OK is a basic Kriging method that does not take elevation into account. Since Kriging takes into account the spatial continuity of the observed data, the resulting map respects the specific spatial behavior of SWE. When SWE is linked to elevation, this additional indirect information is taken into account so as to produce a map that reflects this link. In a qualitative sense, the OK SWE fields look very smooth, with spatial variations conditioned on the values at the monitoring sites. The KED fields show a realistic spatial structure that follows more closely the structure of the elevation field while honoring the SWE values suggested by the observation sites.

Because of the probabilistic scope of geostatistics, it is possible to quantify the uncertainty associated with an interpolated value by using the Kriging variance. This variance, or its square root, the Kriging standard deviation s(x), represents the possible scattering of the real yet unknown SWE around the value obtained by Kriging. The smaller the scatter, the closer, on average, the interpolated value will be to reality, and the more accurate the map. Large values of the Kriging variance usually indicate undersampled areas on the interpolated map. The maps in Fig. 4(a) and (b) display the Kriging standard deviation for SWE based on Kriging from only the measured SWE values and on KED, respectively. It is noteworthy that the use of an external drift such as elevation clearly improves the map, particularly in areas with very few monitoring sites and where extrapolation is involved. These results are consistent with a previous study by Tapsoba et al. [37] demonstrating that the KED method improved SWE estimation over the Gatineau basin. All the geostatistical analyses in this paper were performed with the ISATIS software package [39].


Fig. 3. SWE mapping at 1-km resolution over the La Grande River basin obtained (a) from Kriging only the measured SWE values and (b) from KED that includes the elevation as auxiliary information. Values are in centimeters.


Fig. 4. Mapping at 1-km resolution of the Kriging standard deviation for SWE over La Grande River basin obtained (a) from Kriging only the measured SWE values and (b) from KED that includes the elevation as auxiliary information. Values are in centimeters.

V. NEURAL NETWORK MODELING FOR THE RETRIEVAL OF SWE FROM SSM/I DATA

A. Neural Network Architecture

The standard multilayer perceptron (MLP) is used for the retrieval of SWE from SSM/I data. The MLP is a generalization of the single-layer perceptron introduced by Rosenblatt [40], who proposed the perceptron as the first model for learning with a teacher. The MLP is one of the most popular ANNs because it efficiently solves various input–output mapping problems. The MLP architecture is very simple: an input layer, one or more hidden layers, and an output layer. Each layer is formed by a group of neurons or processing units, and each of these processing units sums its inputs and adds a constant bias to form a total input. This total input, usually named the activation potential, is the only argument of an activation or transfer function, which limits the amplitude of the neuron output. The output signals of each layer are used as the inputs of the following layer, except for the output layer, which provides the response of the network. This output layer has one neuron if the network response is a scalar, or several neurons if the response is a vector. The MLP network is usually said to be fully connected, each neuron of a layer being connected to every neuron of the following layer. Moreover, this type of network is also called a multilayer feedforward network in the

Fig. 5. MFF neural network architecture used to retrieve SWE from SSM/I data and illustrated here with one hidden layer. Some connection links are missing to facilitate the legibility of the figure.

sense that communication links convey the signal from an input layer to an output layer, but not vice versa. The MLP is trained by using the popular error back-propagation algorithm [41]. An MFF neural network having one or two hidden layers is used for SWE mapping. Eight inputs are used by the network to retrieve SWE (Fig. 5). The brightness temperature data from each SSM/I channel at horizontal and vertical polarizations, except for the 22-GHz channel, which operates at vertical polarization only, are used as the basic neural network inputs. The eighth and last input of the neural network is the interpolated TMIN by KED at a resolution of 25 × 25 km. TMIN is



used because its measurement time coincides approximately with the time of the satellite passes over the basin. Because remote sensing data are spatially distributed, the neural network uses as its target gridded SWE estimates with a spatial resolution of 25 × 25 km obtained with the KED algorithm.

Assuming that an MFF neural network has only one hidden layer (layer j) with m neurons, the scalar output Ẑ*(x) of this network at location x is given by

Ẑ*(x) = f_0[b_0 + W_2 · f_j(b_j + W_1 · p_x)]    (5)

where f_0(·) and f_j(·) are the activation functions of the output layer and hidden layer j, respectively, b_0 is the scalar bias of the output layer, b_j is a vector (m × 1) of biases for layer j, p_x is the input vector of size (r × 1) at location x described in the previous paragraph, W_1 is a matrix (m × r) of weights between the input layer and hidden layer j, and W_2 is a vector (1 × m) of weights between hidden layer j and the output layer. An r−m−1 feedforward neural network has r inputs, m neurons in the hidden layer, and one output. This neural network has m(r + 2) + 1 parameters.

In the case where an MFF neural network has two hidden layers with m and l neurons in layer j and layer k, respectively, the scalar output Ẑ*(x) of this network at location x is given by

Ẑ*(x) = f_0[b_0 + W_3 · f_k(b_k + W_2 · f_j(b_j + W_1 · p_x))]    (6)

where f_0(·), f_k(·), and f_j(·) are the activation functions of the output layer and hidden layers k and j, respectively, b_0 is the scalar bias of the output layer, b_j is a vector (m × 1) of biases for layer j, b_k is a vector (l × 1) of biases for layer k, p_x is the input vector of size (r × 1) at location x, W_1 is a matrix (m × r) of weights between the input layer and hidden layer j, W_2 is a matrix (l × m) of weights between hidden layers j and k, and W_3 is a vector (1 × l) of weights between hidden layer k and the output layer. In this paper, all the activation functions are log-sigmoid functions. An r−m−l−1 feedforward neural network has r inputs, m neurons in the first hidden layer, l neurons in the second hidden layer, and one output. This neural network has m(r + l + 1) + 2l + 1 parameters.

B. Training Methodology

Our training methodology involves the following three steps.

1) Best Pixel Selection: The selection of the best pixels available for the training of the MFF neural network is based on a criterion defined according to the characteristics of the target. This criterion can be compared to a noise-to-signal ratio, denoted from now on by r_KED:

r_KED(x) = s(x)/Z*(x)    (7)
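The forward passes (5) and (6), and the parameter counts quoted for the r−m−1 and r−m−l−1 topologies, can be verified with a short sketch (log-sigmoid activations throughout, random weights purely for illustration):

```python
import numpy as np

def logsig(a):
    """Log-sigmoid activation, used for all layers in the paper."""
    return 1.0 / (1.0 + np.exp(-a))

def mff_forward(px, weights, biases):
    """Forward pass per (5)-(6): each layer computes f(b + W . input);
    the last (W, b) pair is the output layer."""
    out = np.asarray(px, float)
    for W, b in zip(weights, biases):
        out = logsig(b + W @ out)
    return out

def n_params(r, hidden):
    """Total weights and biases of an r-h1-...-1 feedforward network."""
    sizes = [r] + list(hidden) + [1]
    return sum(sizes[i] * sizes[i + 1] + sizes[i + 1]
               for i in range(len(sizes) - 1))

r, m, l = 8, 6, 4
# r-m-1 network: m(r + 2) + 1 parameters, as stated in the text
assert n_params(r, [m]) == m * (r + 2) + 1
# r-m-l-1 network: m(r + l + 1) + 2l + 1 parameters
assert n_params(r, [m, l]) == m * (r + l + 1) + 2 * l + 1

rng = np.random.default_rng(2)
W1, b1 = rng.standard_normal((m, r)), rng.standard_normal(m)
W2, b2 = rng.standard_normal((1, m)), rng.standard_normal(1)
z = mff_forward(rng.standard_normal(r), [W1, W2], [b1, b2])
print(z.shape, 0.0 < z[0] < 1.0)  # scalar output bounded by the log-sigmoid
```

Because the output activation is also a log-sigmoid, the network output lies in (0, 1), which is consistent with the scaling of the target data to [0, 1] described in Section VI.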

where s(x) is the standard deviation of the estimation error and Z*(x) is the SWE value interpolated by KED at location x (see Section IV).

Fig. 6 shows the values of r_KED for an SWE map obtained by KED for February 6, 2000. There is clearly considerable uncertainty attached to the SWE of the few pixels showing an r_KED value over 50%. One pixel has a particularly high r_KED value, over 100%, which means that, for this pixel, the standard deviation of the estimation error is greater than the SWE obtained by KED. Only the pixels showing a relatively low r_KED value, which provide relatively accurate SWE observations, are chosen to make up the sample of pixels available for the training of the MFF neural network.

Fig. 6. Ratio between the standard deviation of the error of estimation and the value of SWE interpolated by KED for February 6, 2000.

2) Optimal Division of the Training Pixels: The best pixel sample is divided into representative subsets by using an SOFM. One of the key aspects when training a neural network is the way the available data are divided into training, validation, and test subsets. With the conventional data division method, the sets are divided on an arbitrary basis. This method does not consider the statistical properties of each subset, and usually, the chronology of the data is not altered. In this paper, optimal division of the input data is obtained through the use of an SOFM that clusters the input and output data into representative subsets. This technique has recently been proposed for the optimal division of data for neural network models [42].

The idea behind the development of SOFMs came from the fact that visual or tactile sensory inputs, for example, are mapped onto various areas of the cerebral cortex in a topologically ordered manner [2]. SOFMs perform a specific type of clustering of input vectors according to how they are grouped in the input space. SOFMs build artificial topographic maps through a self-organizing type of learning that is competitive, or unsupervised, in which the neighbors of the winning neuron learn to recognize neighboring sections of the input space. Initially, the neurons are arranged on a 1-D or 2-D array or lattice according to a topology function that ensures that each neuron has a set of neighbors. The k-dimensional input vectors are then represented on a 1-D or, usually, 2-D map preserving the distribution and the topology of the k-dimensional input vectors.
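The SOFM-based clustering described above can be sketched as follows. This minimal rectangular-grid SOM with a linearly decaying learning rate and Gaussian neighborhood is an illustrative stand-in for the paper's 15 × 15 hexagonal map, and the per-cluster sampling at the end is a simplified version of the division into training, validation, and test sets.

```python
import numpy as np

def train_som(X, grid=(5, 5), iters=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal 2-D SOFM: units on a rectangular grid compete for each
    input; the winner and its grid neighbors move toward the input."""
    rng = np.random.default_rng(seed)
    gy, gx = np.mgrid[0:grid[0], 0:grid[1]]
    pos = np.column_stack([gy.ravel(), gx.ravel()]).astype(float)
    W = rng.uniform(X.min(0), X.max(0), size=(len(pos), X.shape[1]))
    for t in range(iters):
        x = X[rng.integers(len(X))]
        bmu = np.argmin(((W - x) ** 2).sum(1))   # best matching unit
        frac = t / iters
        lr = lr0 * (1 - frac)                    # decaying learning rate
        sigma = sigma0 * (1 - frac) + 0.5        # shrinking neighborhood
        h = np.exp(-((pos - pos[bmu]) ** 2).sum(1) / (2 * sigma ** 2))
        W += lr * h[:, None] * (x - W)           # neighborhood update
    return W

def assign_clusters(X, W):
    return np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(-1), axis=1)

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(300, 9))   # scaled inputs + SWE target, as in the paper
W = train_som(X, grid=(5, 5))
clusters = assign_clusters(X, W)
# Sample a few records per cluster into training/validation/test sets
train, val, test = [], [], []
for c in np.unique(clusters):
    idx = np.flatnonzero(clusters == c)
    train.extend(idx[:2])              # e.g. extreme-SWE records go to training
    val.extend(idx[2:3])
    test.extend(idx[3:4])
print(len(np.unique(clusters)), len(train), len(val), len(test))
```

Sampling every cluster guarantees that each subset spans the same regions of the input space, which is the statistical-representativeness property the conventional arbitrary split lacks.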
3) Creating the Training, Validation, and Test Sets: These three data sets are created by sampling each representative cluster. Once the clusters are formed, four data records from each cluster are sampled. For each cluster, the records showing the minimum and maximum values of SWE are placed in the training data set. To constitute the validation and test sets, pixel selection is made so as to attenuate the influence of the spatial correlation between neighboring pixels. If pixels are not chosen properly, the result may be an artificial increase of the ANN performance. We have considered a mask of 25 km around selected pixels at each stage of the sampling procedure. That means that once a pixel has been sampled, none of the surrounding pixels (eight at the maximum) can be selected. If a cluster contains only one record, then this record is placed in the training set. The validation set is used to stop the training of the network if the performance on this set does not improve or remains the same for a maximum of five epochs. The test vector has no influence on the training but is used to check that the network generalizes well. An additional check is also made by using another set, called the extra set, that gathers all the pixels not in the training, validation, and test sets.

Fig. 7. Profile of a cluster obtained by an SOFM in the transformed domain (range [0, 1]).

VI. APPLICATION OF THE NEURAL-NETWORK-BASED MODEL FOR THE RETRIEVAL OF SWE FROM SSM/I DATA

A. Neural Network Training

Fifteen images from 1993 to 2002, for both the SSM/I and the KED SWE data (interpolated values along with the standard deviation of the estimation error), have been used to train the MFF neural network with one or two hidden layers. To avoid overfitting, there is a maximum of ten neurons in the hidden layer of a one-hidden-layer neural network; a maximum of eight and four neurons, respectively, in the hidden layers of a two-hidden-layer network if the first hidden layer has more neurons than the second; and a maximum of six and eight neurons, respectively, if the first hidden layer has fewer neurons than the second. In all, 21 possible neural network topologies were considered and trained with pixel subsets (training, validation, and test) derived through the sampling of the best pixel set obtained by using r_KED values of 15% and 20%, respectively. At the 15% level for r_KED, the numbers of pixels in the training, validation, and test sets are 403, 164, and 130, respectively.
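The exclusion mask used when drawing validation and test pixels can be sketched as a greedy minimum-distance sampler. The grid size and sample count are hypothetical, and the 36-km threshold is one way to exclude all eight neighbors of a 25-km pixel (the diagonal neighbors lie about 35.4 km away); the paper states the mask radius as 25 km, so the exact threshold is an assumption here.

```python
import numpy as np

def sample_with_exclusion(coords, n_wanted, min_dist=36.0, seed=0):
    """Randomly pick pixels so that no two selected pixels lie within
    min_dist (km) of each other, attenuating the spatial correlation
    between members of the validation and test sets."""
    rng = np.random.default_rng(seed)
    coords = np.asarray(coords, float)
    order = rng.permutation(len(coords))
    chosen = []
    for i in order:
        if len(chosen) == n_wanted:
            break
        if all(np.linalg.norm(coords[i] - coords[j]) > min_dist
               for j in chosen):
            chosen.append(i)
    return np.array(chosen)

# Hypothetical 20 x 20 grid of 25-km pixels (coordinates in km)
gy, gx = np.mgrid[0:20, 0:20]
coords = np.column_stack([gy.ravel(), gx.ravel()]) * 25.0
idx = sample_with_exclusion(coords, n_wanted=30)
d = np.linalg.norm(coords[idx][:, None] - coords[idx][None, :], axis=-1)
print(len(idx), (d[d > 0] > 36.0).all())  # 30 selected, all pairs > 36 km apart
```

Without such a mask, adjacent 25-km pixels (which share most of their snowpack conditions) could land in both the training and test sets and inflate the apparent skill of the network.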
For an rKED value of 20%, there are 417 pixels in the training set, 172 pixels in the validation set, and 147 pixels in the test set. Input and output data are scaled in the range [0, 1]. Pixel input and output data clustering is done by a 2-D SOFM of 225 neurons (15 × 15) arranged in a hexagonal topology. The SOFM clustering is evaluated through graphical inspection of the profile of each cluster (Fig. 7) and by simulation of the clustering to evaluate the number of effective clusters. The number of effective clusters should be as large as possible (at most 225 clusters for a 2-D 15 × 15 hexagonal topology). An example of statistical parameters for the training, validation, and test sets is shown in Table I. Two hypothesis tests were used to evaluate the statistical difference in the mean (two-sample t test) and variance (two-sample F test) between the training, validation, and test sets for all the input variables. For rKED values of 15% and 20%, the null hypotheses are accepted for all input variables, except the F-test null hypothesis at the 0.05 level for the SWE validation and test sets. This is due to the way the training set is constituted. Indeed, for each cluster and when available, the two records showing the minimum and maximum values of SWE are placed in the training set, which is not the case for the validation and test sets.

The Levenberg–Marquardt algorithm is used to train the MFF neural network [43]. The weights and biases are initialized randomly. For each topology and rKED value, 30 neural networks were trained. Based on the study in [44], we are 99% confident that the best of 30 random starts will result in one of the best 14.2% values of sum-of-squared errors. For all cases considered, no more than 50 epochs were necessary to reach the stopping criterion discussed in Section V.

B. Estimating the Prediction Error

Prediction error is a measure of the ability of a model to predict the correct output, given future observations used as predictors. It is used for model selection, the best model being the one with the lowest prediction error. In our case, even though a neural network is trained with the best pixels, whose rKED values are less than or equal to a specific value of this ratio named r∗KED (20% or 15% in our case), the trained model is then applied to pixels showing any value of rKED (≤ or > r∗KED). It is therefore important to estimate the prediction error of each trained neural network.
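As a quick check on the multistart argument quoted above from [44]: with n independent random initializations, the probability that at least one run falls in the best fraction f of achievable sum-of-squared-error values is 1 − (1 − f)^n, and solving for f at 99% confidence with n = 30 reproduces the 14.2% figure.

```python
n, confidence = 30, 0.99
# P(at least one of n restarts lands in the best fraction f) = 1 - (1 - f)**n.
# Setting this probability equal to `confidence` and solving for f:
f = 1.0 - (1.0 - confidence) ** (1.0 / n)
print(round(100 * f, 1))  # → 14.2
```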
TABLE I STATISTICAL PARAMETERS OF THE TRAINING, VALIDATION, AND TEST SETS FOR EACH OF THE EIGHT INPUT VARIABLES AND FOR SWE (CLUSTERING PERFORMED BY AN SOFM). THE rKED VALUE IS 15%

We have used the bootstrap technique, a statistical inference method that provides the sampling distribution of an estimator by resampling with replacement from the original sample [45]. Efron and Tibshirani [45] proposed the .632 estimate of the prediction error. In their method, bootstrap samples are first generated, and a model is fitted to each of them. The .632 estimate of the prediction error combines the apparent error and the average error obtained from bootstrap data sets not containing the point being predicted. The apparent error is assessed with the same data that were used to fit the model. Details on this method are given in [45, Sec. 17.7], and an application to the statistical validation of neural network models under conditions of sparse data is given in [46].

In this paper, we have designed a simple bootstrap procedure that exploits the major advantage of our training methodology. Indeed, only a few pixels of an image are selected by the sampling procedure to constitute the training, validation, and test sets for the MFF neural network training. Therefore, for each image and for a specific r∗KED value, the extra set gathers all the pixels not in the training, validation, and test sets showing rKED values less than or equal to r∗KED. We then add to the extra set all the other pixels of this image whose rKED values are greater than r∗KED to get a final sample of pixels. Because two different r∗KED values were considered (15% and 20%, respectively), two final samples of pixels are constituted. Then, to compare the performance of the MFF neural networks, the pixels common to the two final samples are retained to constitute the sample of pixels that will be bootstrapped.

Fig. 8 shows the four-step bootstrap experiment, considering that two hypothetical images were used to train an MFF neural network. The input data of each pixel of each of the N bootstrapped samples (N ≥ 5000; the number of pixels in each sample being the length of the sample of common pixels) are given as inputs to each trained neural network, and N values of the prediction error in terms of root-mean-squared error (rmse), mean absolute error (MAE), and bias are assessed by comparing, for each bootstrapped sample, the series of SWE computed by the neural network with the corresponding observed SWE. A probability density function, whose shape is close to a normal distribution, is fitted to these N values of rmse, MAE, and bias. The mode of the distribution is the value at which the probability density function attains its maximum, and the sample mean can also be used as an estimate of the mode. The mode of the prediction error for the best neural network models in terms of rmse, MAE, and bias is given with the limits of the 95% confidence interval (Table II).
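The bootstrap loop itself is straightforward. A minimal numpy sketch follows; the function and variable names are ours, and the toy SWE sample is synthetic — in the paper, the inputs would be the neural network retrievals and the KED-interpolated SWE over the sample of common pixels.

```python
import numpy as np

def bootstrap_prediction_error(y_obs, y_pred, n_boot=5000, seed=0):
    """Resample the common-pixel sample with replacement and recompute
    rmse, MAE, and bias for each bootstrapped sample; the returned arrays
    approximate the sampling distributions from which the mode and the
    95% confidence limits are read."""
    rng = np.random.default_rng(seed)
    n = len(y_obs)
    idx = rng.integers(0, n, size=(n_boot, n))   # n_boot resampled index sets
    err = y_pred[idx] - y_obs[idx]               # (n_boot, n) error matrix
    rmse = np.sqrt((err ** 2).mean(axis=1))
    mae = np.abs(err).mean(axis=1)
    bias = err.mean(axis=1)
    return rmse, mae, bias

# Synthetic example (SWE in cm): a biased, noisy retrieval of a known field.
rng = np.random.default_rng(1)
obs = rng.uniform(5, 40, size=2853)
pred = obs + rng.normal(0.5, 3.0, size=2853)
rmse, mae, bias = bootstrap_prediction_error(obs, pred, n_boot=2000)
lo, hi = np.percentile(bias, [2.5, 97.5])        # 95% confidence limits
```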

Fig. 8. Bootstrap procedure used for prediction error estimation.

Usually, there is some ambiguity in determining which model structure is the best at performing a modeling task. Equifinality of models means that there is no unique optimal model structure but rather a set of possible models that are ranked according to some likelihood measure [47]. The possible equifinality of models is well recognized in hydrologic modeling. In weather forecasting, the uncertainty of short-term weather forecasts is conveyed by issuing a number of scenarios, named ensemble weather forecasts, produced by a numerical weather prediction (NWP) model using a multimodel strategy. The NWP model is parameterized in different ways to reflect the uncertainty in some of the atmospheric processes involved in the weather modeling [48]. The multimodel strategy is usually combined with the perturbation of the initial state of the atmosphere to fully represent the uncertainty of short-term weather forecasts. Because we have no a priori belief about which neural network architecture is the best one, we have selected the best five neural network models from Table II based on the bias. Table III gives the prediction error of these five selected models. Constituting a committee of networks can lead to significant improvements over the predictions of the best individual network, and a reduction in error is usually obtained. This reduction comes from a reduced variance when averaging over many solutions. The single models constituting the committee should be the ones with relatively small bias, and the averaging process can subsequently reduce the extra variance [49]. However, in this paper, only the predictions from the individual models are investigated. With a view to constituting a committee of networks, the five models selected for SWE retrieval are the least biased eight-predictor models, identified in Table III: 8-6-8-1 (model M1), 8-6-1 (model M2), 8-4-8-1 (model M3), 8-4-4-1 (model M4), and 8-6-6-1 (model M5). All these models were trained at the 20% level for rKED.
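Although only individual models are investigated here, the variance-reduction effect of committee averaging described above can be illustrated with synthetic member outputs. The five arrays below stand in for the outputs of M1–M5; the error level and sample size are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(0)
truth = rng.uniform(5, 40, size=1000)            # stand-in "true" SWE (cm)
# Five low-bias members with (here, fully) independent errors.
members = [truth + rng.normal(0.0, 3.0, size=1000) for _ in range(5)]
committee = np.mean(members, axis=0)             # committee = average output

def rmse(y):
    return float(np.sqrt(np.mean((y - truth) ** 2)))

# Averaging uncorrelated errors shrinks the variance term, so the
# committee rmse falls below that of the best single member.
print(rmse(committee), min(rmse(m) for m in members))
```

With M fully independent members of equal error variance, the committee error standard deviation drops by a factor of sqrt(M); correlated member errors reduce the gain, which is why members with small bias and diverse architectures are preferred.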

C. Comparison With Tedesco's Neural Network Architecture and Sensitivity Analysis

A four-input (19 H, 19 V, 37 H, and 37 V) feedforward neural network has been used previously by Tedesco et al. [18]. The same training methodology described in Section V has been applied to train this neural network. The rmse, MAE, and bias of the best five models using only these four predictors are given in Table IV. Comparison of Tables III and IV shows that it was valuable to use the four other inputs as predictor variables: prediction errors are much greater for neural networks using only 19 H, 19 V, 37 H, and 37 V as inputs.

A sensitivity analysis was also performed to assess the importance of the remaining four inputs. The statistics of the prediction error in terms of rmse, MAE, and bias are used to characterize the relevance of adding inputs to the four basic inputs used by Tedesco et al. [18]. Fig. 9 shows that adding TMIN to the set of four basic inputs (19 H, 19 V, 37 H, and 37 V) was very helpful in decreasing the prediction error relative to a model using only these four basic inputs.
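The logic of this input sensitivity analysis can be sketched with a toy experiment. Everything below is synthetic and ours: a simple least-squares fit stands in for the MFF network, and the fifth synthetic predictor stands in for TMIN; the point is only that refitting with and without a genuinely informative input and comparing prediction errors quantifies its contribution.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
X4 = rng.normal(size=(n, 4))          # stand-ins for 19 H, 19 V, 37 H, 37 V
tmin = rng.normal(size=(n, 1))        # stand-in for the extra predictor
# Target depends on all five predictors plus noise.
y = X4 @ np.array([1.0, -0.5, 0.8, 0.3]) + 2.0 * tmin[:, 0] + rng.normal(0, 0.5, n)

def fit_rmse(X):
    """Least-squares fit with intercept; rmse of the in-sample residuals."""
    A = np.c_[X, np.ones(n)]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(np.sqrt(np.mean(resid ** 2)))

# rmse drops sharply once the informative fifth predictor is added.
print(fit_rmse(X4), fit_rmse(np.c_[X4, tmin]))
```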

D. Analyzing the Performance of the MFF Neural Network

The five selected MFF neural networks have been used for SWE retrieval and mapping over the La Grande River basin in northeastern Quebec during winter 2002–2003. Winter 2002–2003 was a very cold one; however, during December 2002, temperatures were above normal almost everywhere in Quebec. The shift to cold and sunny winter weather occurred in mid-January. Most regions of Quebec experienced below-average precipitation during winter 2002–2003. The SSM/I grid for each channel is an average over a few days before and/or after the date of interest. Averaging the SSM/I grid over a few days attenuates the day-to-day variability of the SSM/I data. It is also required because the snow observations used to produce the KED maps are temporally sparse. Fig. 10 shows the average areal SWE over the La Grande River basin obtained by retrieving SWE from SSM/I data with the five neural network models. These values are compared to the average areal SWE from KED. Two hundred conditional simulations on the KED map have been generated, and the minimum and maximum values of the average areal SWE from these simulations are also shown in Fig. 10. These



TABLE II PREDICTION ERROR OF THE BEST NEURAL NETWORK MODELS IN TERMS OF RMSE, MAE, AND BIAS ALONG WITH THE 95% CONFIDENCE LIMIT BOUNDS. THE LINES IN BOLD IDENTIFY THE BEST FIVE MODELS IN TERMS OF BIAS. THE NUMBER OF BOOTSTRAP SAMPLES N = 7500, AND EACH SAMPLE HAS 2853 PIXELS

conditional simulations give an idea of the variability of the KED SWE, which can be considered as an average map resulting from the simulations. Tables V–VIII give the rmse, the MAE, the bias, and the Nash–Sutcliffe model efficiency (NSE) coefficient of the best five models in 2003. The NSE coefficient has been used instead of the correlation coefficient because it provides a better measure of goodness of fit. NSE values are clearly below zero until mid-February, indicating that the five models are poor at reproducing the observed SWE, mainly in the southeastern part of the basin. NSE values are almost equal to zero at the end of February and higher than zero in mid-March, meaning that the models then perform better than a model whose prediction is the observed average areal SWE. In general, the average areal SWEs from the five neural network models overestimate the KED SWE in January and February. This overestimation is caused by the southeastern part of the basin, where the values of SWE are much lower than those of the northwestern part. For example, in mid-January 2003, model M5 has the worst performance. If the SWE values of the southeastern part of the basin are not taken into account (SWE ≤ 12 cm), the rmse, MAE, and bias are 3.28 cm (compared to 7.08 cm all over the



TABLE III PREDICTION ERROR OF THE BEST FIVE NEURAL NETWORK MODELS IN TERMS OF RMSE, MAE, AND BIAS ALONG WITH THE 95% CONFIDENCE LIMIT BOUNDS. THESE NEURAL NETWORKS USE EIGHT INPUTS (19 H, 19 V, 22 V, 37 H, 37 V, 85 H, 85 V, AND TMIN). THE NUMBER OF BOOTSTRAP SAMPLES N = 7500, AND EACH SAMPLE HAS 2853 PIXELS

TABLE IV PREDICTION ERROR OF THE BEST FIVE NEURAL NETWORK MODELS IN TERMS OF RMSE, MAE, AND BIAS ALONG WITH THE 95% CONFIDENCE LIMIT BOUNDS. THESE NEURAL NETWORKS USE FOUR INPUTS (19 H, 19 V, 37 H, AND 37 V). THE NUMBER OF BOOTSTRAP SAMPLES N = 7500, AND EACH SAMPLE HAS 2693 PIXELS

Fig. 9. Variation of the prediction error according to the inputs used by the neural networks. The four inputs are 19 H, 19 V, 37 H, and 37 V. + Tmin means adding TMIN to the four inputs.

Fig. 10. Average areal SWE over La Grande River basin in 2003 calculated from the SWE retrieved by the five neural network models and the SWE interpolated by KED. Minimum and maximum values from 200 conditional simulations on the KED map are also plotted.

basin), 2.57 cm (compared to 5.86 cm all over the basin), and −0.50 cm (compared to 4.30 cm all over the basin), respectively. At the end of the winter, the average areal SWEs calculated from models M1 to M5 are in the range of the average areal KED SWE obtained from the 200 conditional simulations.

Fig. 11 shows the comparison of the SWE maps obtained by KED and the five selected models in 2003. It clearly shows the overestimation of SWE by the five models in the southeastern



TABLE V RMSE (IN CENTIMETERS) OF THE BEST FIVE MODELS IN 2003

TABLE VI MAE (IN CENTIMETERS) OF THE BEST FIVE MODELS IN 2003

TABLE VII BIAS (IN CENTIMETERS) OF THE BEST FIVE MODELS IN 2003

TABLE VIII NSE COEFFICIENT OF THE BEST FIVE MODELS IN 2003

part of the basin from mid-January 2003 to mid-February 2003. Only model M2 was able to reproduce low SWE values on some pixels of the southeastern part of the basin in January 2003. Each of the five models performs consistently throughout the winter. The models also perform consistently when compared to each other. At the end of the winter (mid-March 2003), the maps from KED and from models M1 to M5 are very similar overall, with higher SWE values in the northeastern part of the basin and lower SWE values in the southeastern part. This is very satisfactory because the maximum SWE observed prior to the onset of the spring snowmelt is critical information for operational runoff and streamflow forecasts.

VII. CONCLUSION

A new modeling framework for SWE retrieval and mapping has been presented in this paper. It combines ANNs, passive

microwave data, and geostatistics. This technique is suitable for real-time or near-real-time applications and is helpful during periods when there are no ground measurements. The neural network training methodology has two advantages: it facilitates the evaluation of training performance and the estimation of the prediction error of each trained neural network. Eight possible predictors were used as inputs to the neural network: the brightness temperatures provided by the seven-channel SSM/I and the interpolated TMIN. By using the prediction error as a measure of the ability of a model to predict the correct output, given new observations of the predictor variables, we have demonstrated that these eight input variables provide the least biased predictions of SWE. SWE retrieval and mapping was performed in 2003 by using a committee of networks containing five MFF neural networks having different architectures. The results were very consistent, although showing some overestimation in January and mid-February when compared to the ground truth provided by KED. At the end of the winter, the comparison between the KED and neural network mapping revealed a good agreement between the two maps. This is excellent for operational streamflow forecasting since the SWE retrieved by the neural network could be used to update the snow-related state variable of our hydrological model.

Fig. 11. Comparison of the SWE mapping over the La Grande River basin obtained from KED and the five selected neural network models in 2003. The + sign in the KED map indicates the location of the SWE measurement stations. The bigger the + sign, the greater the value of SWE.

At this point, it is important to remind the reader that the objective of this paper was to demonstrate the usefulness of our methodology, not to find an "optimal" network topology. Future work will be performed to select a network with parameters that have been better optimized than those considered in this paper. Additional evaluation of various network topologies could provide clues about the complexity of the networks necessary to give reliable SWE retrievals using passive microwave data. Ways in which this paper might be extended include the following: 1) networks with layers with odd numbers of neurons and 2) network topologies derived by limiting the number of weights and biases instead of limiting the number of

neurons in each hidden layer. Table IX provides some insight into the stability and efficiency of the networks. It shows that none of the additional neural nets have bias values lower than those of the nets in Table III. However, all of them have lower rmse and MAE than those in Table III. Our long-term goal is to improve the accuracy of the neural network SWE retrieval. First, by using more recent images, it will be possible to train the neural networks with samples containing more data or to update the training with new data sets, thereby improving the generalization capability of the networks. Another perspective is to improve the accuracy of the KED SWE mapping by taking into account additional variables as external drifts, such as basin terrain characteristics (slope, rugosity, etc.), as well as the errors associated with the SWE measurements. Finally, we are planning to use brightness temperature differences as input predictor variables of the MFF neural network instead of raw brightness temperatures. Additional screening criteria, such as the presence of depth hoar, will improve the fit between observed and retrieved SWEs. To get better resolution



TABLE IX PREDICTION ERROR OF ADDITIONAL NEURAL NETWORK MODELS IN TERMS OF RMSE, MAE, AND BIAS ALONG WITH THE 95% CONFIDENCE LIMIT BOUNDS. THESE NEURAL NETWORKS USE EIGHT INPUTS (19 H, 19 V, 22 V, 37 H, 37 V, 85 H, 85 V, AND TMIN). THE NUMBER OF BOOTSTRAP SAMPLES N = 7500, AND EACH SAMPLE HAS 2853 PIXELS

mappings, using AMSR-E data instead of SSM/I data would be appropriate.

As far as the neural network modeling for SWE retrieval is concerned, a first research perspective is to develop a method that will produce an SWE map by combining the outputs from the members of a committee of networks. A second goal is to provide an uncertainty map that serves as a reliability measure for the neural network SWE map. Finally, the fusion of the KED and MFF maps will be investigated to supply an optimal map that reduces the uncertainty of the KED map.

ACKNOWLEDGMENT

The authors would like to thank the National Snow and Ice Data Center for the SSM/I data. The authors would also like to thank the individuals within IREQ and the reviewers for their thorough reading of the manuscript and their constructive comments.

REFERENCES

[1] P. R. Singh and T. Y. Gan, "Retrieval of snow water equivalent using passive microwave brightness temperature data," Remote Sens. Environ., vol. 74, no. 2, pp. 275–286, Nov. 2000.
[2] S. Haykin, Neural Networks: A Comprehensive Foundation. Upper Saddle River, NJ: Prentice-Hall.
[3] H. K. H. Lee, Bayesian Nonparametrics Via Neural Networks, ser. ASA-SIAM Series on Statistics and Applied Probability. Philadelphia, PA: SIAM, ch. 2, pp. 21–22.
[4] C. Mätzler, "Passive microwave signatures of landscapes in winter," Meteorol. Atmos. Phys., vol. 54, no. 1–4, pp. 241–260, Mar. 1994.
[5] S. S. Carroll, T. R. Carroll, and R. W. Poston, "Spatial modeling and prediction of snow-water equivalent using ground-based, airborne, and satellite snow data," J. Geophys. Res., vol. 104, no. D16, pp. 19623–19629, Aug. 1999.
[6] R. L. Armstrong, A. Chang, A. Rango, and E. Josberger, "Snow depths and grain-size relationships with relevance for passive microwave studies," Ann. Glaciol., vol. 17, pp. 171–176, 1993.
[7] J. L. Foster, D. K. Hall, and A. T. C. Chang, "An overview of passive microwave snow research and results," Rev. Geophys. Space Phys., vol. 22, no. 2, pp. 195–208, 1984.
[8] A. T. C. Chang, J. L. Foster, and D. K. Hall, "Effects of forest on the snow parameters derived from microwave measurements during the BOREAS winter field campaign," Hydrol. Process., vol. 10, no. 12, pp. 1565–1574, Dec. 1996.
[9] T. Y. Gan, "Passive microwave snow research at the Canadian High Arctic," Can. J. Remote Sens., vol. 22, no. 1, pp. 36–44, Mar. 1996.
[10] J. L. Foster, A. T. C. Chang, and D. K. Hall, "Comparison of snow mass estimates from a prototype passive microwave snow algorithm, a revised algorithm and snow depth climatology," Remote Sens. Environ., vol. 62, no. 2, pp. 132–142, Nov. 1997.

[11] B. E. Goodison, I. Rubinstein, F. W. Thirkettle, and E. J. Langham, "Determination of snow water equivalent on the Canadian prairies using microwave radiometry," IAHS Publ., no. 155, pp. 163–173, 1986.
[12] M. T. Hallikainen, "Microwave radiometry of snow," Adv. Space Res., vol. 9, no. 1, pp. 267–275, 1989.
[13] J. Aschbacher, "Land surface studies and atmospheric effects by satellite microwave radiometry," Ph.D. dissertation, Univ. Innsbruck, Innsbruck, Austria, 1989.
[14] M. T. Hallikainen and P. A. Jolma, "Comparison of algorithms for retrieval of snow water equivalent from Nimbus-7 SMMR data in Finland," IEEE Trans. Geosci. Remote Sens., vol. 30, no. 1, pp. 124–131, Jan. 1992.
[15] A. B. Tait, "Estimation of snow water equivalent using passive microwave radiation data," Remote Sens. Environ., vol. 64, no. 3, pp. 286–291, Jun. 1998.
[16] K. Goïta, A. E. Walker, and B. E. Goodison, "Algorithm development for the estimation of snow water equivalent in the boreal forest using passive microwave data," Int. J. Remote Sens., vol. 24, no. 5, pp. 1097–1102, Mar. 2003.
[17] D. De Sève, M. Bernier, J.-P. Fortin, and A. Walker, "Preliminary analysis of snow microwave radiometry using the SSM/I passive-microwave data: The case of La Grande River watershed (Quebec)," Ann. Glaciol., vol. 25, pp. 353–361, 1997.
[18] M. Tedesco, J. Pulliainen, M. Takala, M. Hallikainen, and P. Pampaloni, "Artificial neural network-based techniques for the retrieval of SWE and snow depth from SSM/I data," Remote Sens. Environ., vol. 90, no. 1, pp. 76–85, Mar. 2004.
[19] J. L. Foster, C. Sun, J. P. Walker, R. Kelly, A. Chang, J. Dong, and H. Powell, "Quantifying the uncertainty in passive microwave snow water equivalent observations," Remote Sens. Environ., vol. 94, no. 2, pp. 187–203, Jan. 2005.
[20] N. C. Grody and A. N. Basist, "Global identification of snowcover using SSM/I measurements," IEEE Trans. Geosci. Remote Sens., vol. 34, no. 1, pp. 237–249, Jan. 1996.
[21] A. E. Walker and B. E. Goodison, "Discrimination of a wet snowcover using passive microwave satellite data," Ann. Glaciol., vol. 17, pp. 307–311, 1993.
[22] G. Matheron, "Principles of geostatistics," Econ. Geol., vol. 58, no. 8, pp. 1246–1268, Dec. 1963.
[23] A. G. Journel and C. J. Huijbregts, Mining Geostatistics. San Diego, CA: Academic.
[24] E. H. Isaaks and R. M. Srivastava, An Introduction to Applied Geostatistics. New York: Oxford Univ. Press.
[25] N. A. C. Cressie, Statistics for Spatial Data. New York: Wiley.
[26] J. P. Chilès and P. Delfiner, Geostatistics: Modeling Spatial Uncertainty. New York: Wiley.
[27] R. Zhang, D. E. Myers, and A. W. Warrick, "Estimation of the spatial distribution of soil chemicals using pseudo-cross-variograms," Soil Sci. Soc. Amer. J., vol. 56, pp. 1444–1452, 1992.
[28] J. P. Delhomme, "Spatial variability and uncertainty in groundwater flow parameters: A geostatistical approach," Water Resour. Res., vol. 15, no. 2, pp. 269–280, Apr. 1979.
[29] A. Galli and G. Meunier, "Study of a gas reservoir using the external drift method," in Geostatistical Case Studies, G. Matheron and M. Armstrong, Eds. Dordrecht, The Netherlands: Reidel, 1987, pp. 105–119.


[30] S. Ahmed and G. de Marsily, "Comparison of geostatistical methods for estimating transmissivity using data on transmissivity and specific capacity," Water Resour. Res., vol. 23, no. 9, pp. 1717–1737, Sep. 1987.
[31] J. P. Chilès, "Application du krigeage avec dérive externe à l'implantation d'un réseau de surveillance piézométrique," Sciences de la Terre—Inf. Géol., vol. 30, pp. 131–147, 1991.
[32] D. Renard and M. Nai-Hsien, "Utilisation des dérives externes multiples," Sciences de la Terre—Inf. Géol., vol. 28, pp. 281–301, 1988.
[33] M. Reza Ghanbarpour, B. Saghafian, M. M. Saravi, and K. C. Abbaspour, "Evaluation of spatial and temporal variability of snow cover in a large mountainous basin in Iran," Nord. Hydrol., vol. 38, no. 1, pp. 45–58, 2007.
[34] C. V. Deutsch and A. G. Journel, GSLIB: Geostatistical Software Library and User's Guide. London, U.K.: Oxford Univ. Press, 1998.
[35] E. Pardo-Igúzquiza, "Comparison of geostatistical methods for estimating the areal average climatological rainfall mean using data on precipitation and topography," Int. J. Climatol., vol. 18, no. 9, pp. 1031–1047, Jul. 1998.
[36] P. Goovaerts, "Geostatistical approaches for incorporating elevation into the spatial interpolation of rainfall," J. Hydrol., vol. 228, no. 1/2, pp. 113–129, Feb. 2000.
[37] D. Tapsoba, V. Fortin, F. Anctil, and M. Haché, "Apport de la technique du krigeage avec dérive externe pour une cartographie raisonnée de l'équivalent en eau de la neige : Application aux bassins de la rivière Gatineau," Can. J. Civ. Eng., vol. 32, no. 1, pp. 289–297, Feb. 2005.
[38] H. Wackernagel, Multivariate Geostatistics: An Introduction With Applications. New York: Springer-Verlag.
[39] ISATIS Software Manual, Release 6.06, Geovariances & Ecole des Mines de Paris, Paris, France, 2006.
[40] F. Rosenblatt, "The perceptron: A probabilistic model for information storage and organization in the brain," Psychol. Rev., vol. 65, no. 6, pp. 386–408, 1958.
[41] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, D. E. Rumelhart and J. L. McClelland, Eds. Cambridge, MA: MIT Press, 1986, ch. 8.
[42] G. J. Bowden, H. R. Maier, and G. C. Dandy, "Optimal division of data for neural network models in water resources applications," Water Resour. Res., vol. 38, no. 2, pp. 2.1–2.11, 2002, Art. 1010.
[43] Neural Network Toolbox User's Guide, The MathWorks, Inc., Natick, MA.
[44] M. S. Iyer and R. R. Rhinehart, "A method to determine the required number of neural-network training repetitions," IEEE Trans. Neural Netw., vol. 10, no. 2, pp. 427–432, Mar. 1999.
[45] B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap. Boca Raton, FL: CRC Press.
[46] J. M. Twomey and A. E. Smith, "Bias and variance of validation methods for function approximation neural networks under conditions of sparse data," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 28, no. 3, pp. 417–430, Aug. 1998.
[47] K. Beven and A. Binley, "The future of distributed models: Model calibration and uncertainty prediction," Hydrol. Process., vol. 6, no. 3, pp. 279–298, Jul.–Sep. 1992.
[48] R. Buizza, M. Miller, and T. N. Palmer, "Stochastic representation of model uncertainties in the ECMWF ensemble prediction system," Q. J. R. Meteorol. Soc., vol. 125, no. 560, pp. 2887–2908, Oct. 1999.
[49] C. M. Bishop, Neural Networks for Pattern Recognition. New York: Oxford Univ. Press.


Noël Dacruz Evora was born to Cape Verdean parents in Dakar, Sénégal. He received the degree in civil engineering from the Ecole Polytechnique de Thiès, Thiès, Sénégal, in 1987, the complementary degree of civil engineer in hydrology and the M.Sc. degree from the Faculté Polytechnique de Mons, Mons, Belgium, in 1990, and the Ph.D. degree in civil engineering (stochastic hydrology) from the Ecole Polytechnique de Montréal, Montréal, QC, Canada, in 1997. Since December 2000, he has been with the Hydro-Québec Research Institute, Varennes, QC, where he is currently a Research Scientist in hydrology and hydrometeorology. His research interests are in surface-water hydrology and primarily cover methods for improving water resources management and environmental risk assessment (floods and droughts). His specific research areas include hydrologic modeling, hydrometeorology, neural networks, and genetic programming applications to water resources. Dr. Evora is a member of the American Geophysical Union.

Dominique Tapsoba was born in Ouagadougou, Dagnoën, Burkina Faso. He received the B.Sc. degree in fundamental and applied geology from the University of Ouagadougou, Ouagadougou, and the M.Sc. degree in hydrology and water resources management and the Ph.D. degree in hydrology from the University of Paris XI, Paris, France. In 1997, he was a Postdoctoral Fellow with the Industrial Chair in Statistical Hydrology (INRS–ETE, Université du Québec, Québec City, QC, Canada). From 1999 to 2001, he was a Visiting Expert with the Food and Agriculture Organization of the United Nations, Rome, Italy. He is currently a Research Scientist in statistical hydrology with the Hydro-Québec Research Institute, Varennes, QC, Canada, Hydro-Québec being a major hydropower producer in North America. He is also a Lecturer with the Université du Québec à Montréal, Montréal, and is involved as a Research Scientist in the Ouranos consortium on regional climatology and adaptation to climate change. His research activity mainly concerns the geostatistical characterization of climatic variables as well as hydrologic analysis and design in the context of climate change.

Danielle De Sève, photograph and biography not available at the time of publication.