IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 2, NO. 4, DECEMBER 2009

Operational Delivery of Hydro-Meteorological Monitoring and Modeling Over the Australian Continent

Edward A. King, Matthew J. Paget, Peter R. Briggs, Cathy M. Trudinger, and Michael R. Raupach

Abstract—The Australian Water Availability Project (AWAP) is a system that operationally delivers weekly estimates of soil moisture stores and water fluxes at continental scale over Australia. The highly modularized system implements a miniature spatial data infrastructure by exploiting a simple data format standard and metadata scheme to enable the flexible ingestion of a variety of input data types, including gridded meteorological fields, land surface parameterizations and, optionally, remote sensing data. The use of these standards, together with a client-server architecture and portable coding, enables the system to function across multiple interchangeable computers, leading to a robust system with a high degree of redundancy. Through a well-defined interface, the framework supports the development and testing of multiple models. Thorough version control of models and data, together with log-file capture, also allows automated operational runs in the same environment as that in which models are built and tested. The system includes a web portal (http://www.csiro.au/awap) that provides a variety of ways for data users to dynamically explore and examine output (which currently includes over a century of data for the Australian continent at monthly intervals, in addition to weekly near-real-time products) in summary or extended forms.

Index Terms—Environmental factors, hydrology, modeling, water resources.

I. INTRODUCTION

One of the major impacts of climate change is likely to be a change in patterns of rainfall and evaporation and, consequently, a disturbance of the hydrological cycle, e.g., [1]. Under such altered regimes, ongoing measurement and monitoring of the terrestrial water balance is essential, both to assess and manage water resource availability and to understand biosphere interaction and response processes. Moreover, as global population growth continues, the human demand on terrestrial water resources will increase, requiring more precise and timely water information to ensure sustainable use. Because Australia is an arid and hot continent, these pressures are becoming evident there sooner than elsewhere [2], [3], and land managers and government agencies are seeking new and efficient ways of conducting water resource assessments.
Manuscript received March 31, 2009; revised July 15, 2009. First published September 11, 2009; current version published January 20, 2010. This work was supported in part by the Australian National Heritage Trust fund via the Commonwealth Bureau of Rural Science. E. A. King, M. J. Paget, P. R. Briggs, and M. R. Raupach are with the CSIRO Marine and Atmospheric Research, Canberra, ACT, 2601, Australia (e-mail: [email protected]). C. M. Trudinger is with the CSIRO Marine and Atmospheric Research, Aspendale, Vic, 3195, Australia. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/JSTARS.2009.2031331

In this paper, we describe the operational system and software environment supporting the Australian Water Availability Project (AWAP). This project monitors the state and trend of the terrestrial water balance of the Australian continent at a spatial resolution of 5 km and at approximately weekly timescales [4]–[6]. The AWAP system was developed by CSIRO Marine and Atmospheric Research in partnership with the Australian Bureau of Meteorology and Bureau of Rural Sciences.
The AWAP system simulates the water balance on a regular grid using a physical model of the relevant water-energy processes. The model is forced by daily meteorological fields and parameterized by characterizations of soil and surface properties, and so tracks the changes in the water balance resulting from atmospheric forcing (rainfall) and terrestrial responses (transpiration, soil evaporation, runoff, drainage). These meteorological fields are essential to the running of the model and are, therefore, mandatory inputs. In addition to the forcing meteorology, numerous other optional data inputs are available, including large-scale satellite measurements of land surface temperature, soil moisture and vegetation cover, and sparser in-situ observations of stream flow and water and energy fluxes. These data can be used, either directly or indirectly, to constrain the model state variables by means of model-data fusion or data assimilation techniques (e.g., [7] and [8]). From the outset, the whole of the AWAP system was designed to enable these methods to be developed and used operationally.
Despite the capacity to ingest a wide range of data types, the primary focus of the AWAP system is to monitor the terrestrial water balance in the broad, without intrinsic regard for detailed topography, surface flow and storage; that is, the model grid is spatially uniform and specific hydrological features are not explicitly represented. In this respect AWAP differs markedly from other operational hydrological monitoring systems, such as Delft FEWS [9], [10], which uses a well-defined interface, underpinned by a database, to couple modules expressing data handling and hydrological modeling operations, such as rainfall-runoff models, in an interactive environment. These differences arise for two main reasons: first, the AWAP system is attempting to simulate the slowly changing terms in the water balance rather than providing forecasts at an hourly time scale, and second, the goal is to provide monitoring at continental spatial scales over regions for which relatively little detailed hydrological information is currently available. AWAP, therefore, implements a somewhat rigid structure that permits computation en masse, whereas systems such as FEWS and other localized modeling environments provide a framework that allows
the model structure itself to be adapted to the specific hydrology. Both systems achieve data-model interoperability but by different means; in AWAP it is the gridded data interface, while in FEWS it is the schema of the internal database. It is still possible for AWAP to take advantage of nongridded data at finer spatial and temporal scales, but this requires purpose-built converters to represent the observations in the context of the model state variables and grid. Extra model logic would also be required to correctly address processes that operate at a sub-daily time scale.
Several requirements were crucial in shaping the design of the operational system. First, the capacity to utilize a wide variety of data, coupled with a range of spatial resolutions and scales, means that the data demands of the continental monitoring system are both immense and varied; a key design goal was to ensure that these needs could be met into the future as available data types and spatial resolutions increased. Second, to maintain the assessment operationally on an ongoing basis in near real-time, timeliness and quality control are also important. Third, in addition to providing regular updates to a contemporary monitoring program, it is important to be able to run simulations of the water balance over long historical periods, both to understand the changes in terrestrial water stores and to compare the simulations with observations as a means of evaluating the veracity of the model. Fourth, for efficiency in testing and detailed evaluation, it was desirable that the system could be run not only for the whole continent but also for small areas, such as individual catchments or water measurement jurisdictions. Finally, it was recognized that an operational framework which delivered on these objectives would be an ideal vehicle for testing and developing increasingly sophisticated physical models. To this end, care was taken to separate the model itself from the operational framework, so that models would be easily interchangeable.
In recent years, much progress on data interoperability technology has been made, particularly in the areas of metadata standards (e.g., ISO 19115 [11]), data formats (e.g., GeoTIFF [12], HDF [13], netCDF [14], [15]), data access protocols (OGC WCS [16], [17], OPeNDAP [18], THREDDS [19]), and data discovery (catalogs and spatial data infrastructures [20], [21]). While all these developments are useful, they are not in and of themselves the essential element. The underlying principles that are key to the success of an extensive data management and processing system are a sufficiently general data architecture, consistently applied throughout, for interoperability and reusability, and modularity for reliability and scalability. Provided these two characteristics are present, the various technologies can assist greatly, especially when it comes to developing powerful interfaces for accessing complex data types and large amounts of data efficiently. The AWAP framework is an articulation of these principles.
The overarching focus of this paper is on the design and implementation of the operational system rather than the model. Nevertheless, the characteristics of the model are summarized in Section II from the point of view of its input and output requirements, as these determine the design of the operational system. The high-level design of the system, in the context of the multiple goals just described, is discussed in Section III.
Key elements of the system implementation and relevant aspects of the
system operations, including data delivery and visualization, are addressed in two subsequent sections. The final section provides an assessment of how successfully the operational system met the objectives and evaluates the role of the various design principles in achieving them.

II. MODEL FRAMEWORK

Although they are the engine of the AWAP project, the internal working details and underlying physics of the model and the data assimilation techniques [4]–[6], [22] are not germane to the operational system. The aim of this section is only to provide enough background on the model requirements to make their impact on the overall design clear. To date, two models have been used in the AWAP system: Waterdyn and CABLE. Waterdyn [4], [5] was created initially as a standalone model to develop the water and energy modeling science required for AWAP, and has evolved into the operational model. CABLE is the land surface scheme for a larger climate and earth system simulator currently under development [23]. Both models are capable of data assimilation; i.e., they can optionally utilize additional observations (such as satellite-derived land surface temperatures or vegetation cover) and corresponding observation models to adjust the state variables (e.g., [7]).
To anchor the following discussion, the Waterdyn model is used as an example. Its major components are illustrated in Fig. 1, together with the information flow pathways, both in simple forced mode and with data assimilation turned on. At its simplest, the model uses daily meteorological forcing data (rainfall, solar radiation and temperature maxima and minima) together with an initial state to estimate, for each pixel in the grid, soil moisture, as well as all water fluxes contributing to changes in soil moisture (rainfall, transpiration, soil evaporation, surface runoff and deep drainage). These variables are the output fields at the end of each daily time step. The soil moisture stores are propagated as initial conditions for the next time step. A simple diagnostic check is performed on the soil moistures to ensure that they are neither negative nor greater than unity, either of which would indicate a model failure or possible data stream corruption. In data assimilation mode the model outputs are constrained by additional observations and modified products are produced, which are also propagated to the next time step.
The model runs over a specified time period (a week in near-real-time operational mode and over 100 years for historic runs), with propagation of information from one daily time step to the next being handled internally by the model except at the start of a run. In near-real-time operational mode, this means that the final state of one week's run is required as the initial condition for the next run. From the point of view of the operational system, this is a very simple process to manage, there being only two complications. First, were it not for the feedback of some of the outputs for use as priors in the next run, this system would essentially be a linear pipeline from start to finish. The presence of the loop means that the surrounding framework needs to capture some of the outputs and label them correctly in order for the process to continue.


Fig. 1. Schematic representation of the modeling environment. With data assimilation, the pathways represented by the dotted arrows are replaced by the dashed arrows. Without data assimilation, the feedback of soil moisture products for use as inputs to the next model step is subject to a simple physical range check, whereas data assimilation explicitly adjusts the model outputs to be consistent with observations.

The second important point is the distinction that the forcing data is required at every time step, whereas the assimilation data is optional. The operational system needs to organize both input data streams independently and apply different existence checks.
For performance reasons, models are implemented in high-level compiled languages such as, in the case of Waterdyn, Fortran95. The models themselves are typically relatively short pieces of code that are embedded, usually by means of a subroutine call, in infrastructure code. This code reads and arranges the input data (forcing and/or assimilation), together with any initialization data and static parameter fields, and manages the output of the results. The Fortran95 infrastructure code developed in the AWAP project is a flexible framework that accepts and writes spatial data in a single binary format optimized for fast access and presents it to the model code. The locations and names of the data files that are read and written, together with any model mode control flags, are set by means of a "key-value" format ASCII file that is read on initialization. The operational system is able to control the model behavior only by means of this ASCII file. Changing the model requires only including the appropriate subroutine call and adjusting the ASCII configuration file. The supporting infrastructure thus enables multiple models to present a relatively static interface to the operational system. This is an important design feature, as will be described in Section III.
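
To make the mechanism concrete, a control file of this kind might look as follows. This is an illustrative sketch only; the key names, file names, paths and comment convention are invented here and are not the actual Waterdyn keys.

  # Hypothetical key-value control file read by the model infrastructure code
  RunStart      = 20090208
  RunEnd        = 20090214
  InitialState  = out/run26j/20090201_FinalState.flt
  ForcingDir    = data/meteo/2009/
  ParameterDir  = data/params/
  OutputDir     = out/run26j/20090208/
  AssimMode     = off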

Fig. 2. Representation of the model from the point of view of the operational framework. The key tasks are preparing the inputs, setting up the control parameters and gathering the results. The inputs include the initial model state, spatial parameters (e.g., soil properties), forcing data, and, optionally, observations for data assimilation.

III. SYSTEM DESIGN

It is clear from the requirements for the AWAP operational framework described above that the system has the potential for considerable complexity. The number of possible input data streams is essentially arbitrary; some are mandatory (meteorological forcing) and some are optional (data assimilation inputs). The system needs to support a number of different processing streams (operational, experimental) and a range of different models. In addition, it needs to be capable of efficiently processing regions of different sizes (areas) and spatial resolutions, in a variety of different run durations.
Abstraction and encapsulation are key techniques for controlling complexity in software systems (e.g., [24]). An abstraction summarizes the essential properties of an object that distinguish it from other objects, whilst encapsulation hides the internal workings of an object. Together they are used to compartmentalize the elements of a system into independent modules with well-defined interfaces. This permits individual modules to be designed, tested and revised independently of one another [25] so that the solution to the problem can be broken down into tractable parts. This approach is used throughout the coding of the AWAP operational system, a good example being the model abstraction already discussed (Fig. 2).
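
As a sketch of how the operational framework can drive the model through this abstraction, the following Perl fragment writes a control file for each weekly run, invokes a model executable, and feeds the final state of one run back as the initial state of the next. The executable name, file names and keys are hypothetical assumptions for illustration; the fragment shows the pattern rather than reproducing the AWAP code.

  # Illustrative sketch of the model abstraction seen from the framework:
  # write a key-value control file, run the model executable, and feed the
  # final state of one weekly run back as the initial state of the next.
  use strict;
  use warnings;

  my $model_exe  = './waterdyn';                        # hypothetical executable
  my @weeks      = ('20090201', '20090208', '20090215');
  my $init_state = 'state/20090125_FinalState.flt';     # prior from an earlier run

  foreach my $start (@weeks) {
      my $control = "run/${start}_control.txt";
      open my $fh, '>', $control or die "Cannot write $control: $!";
      print $fh "RunStart      = $start\n";
      print $fh "InitialState  = $init_state\n";
      print $fh "ForcingDir    = data/meteo/\n";
      print $fh "OutputDir     = out/$start/\n";
      print $fh "AssimMode     = off\n";
      close $fh;

      # Run the model and fail fast if it reports an error.
      system($model_exe, $control) == 0
          or die "Model run for week $start failed: $?";

      # The final state of this run becomes the prior for the next run.
      $init_state = "out/$start/${start}_FinalState.flt";
  }

Because the framework touches the model only through the control file and the output archive, swapping Waterdyn for another model leaves logic of this kind unchanged.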


In addition to the development of the program code, there are two other areas in which abstraction has had an important impact on the design. The first has been the adoption of spatial data and metadata standards. Spatial data ingested by the system have disparate origins, including point meteorological measurements interpolated onto grids (and, therefore, rasterized), satellite imagery from multiple sensors, compilations of surface properties, and maps of model state and water fluxes produced by the system itself at the end of each time step. Each of these data sets potentially has a different format and geographic projection, together with its own set of related metadata. Modifying the model code to accommodate each data stream would complicate the model framework code and rapidly lead to a system that required a great deal of code to be compiled and available even when testing small features of the model. An alternative approach, which breaks the nexus between data diversity and the model, is to require all data to conform to a common format and geographic projection, with a common set of metadata, and then to develop for the model a single interface to that format. This permits all data-stream-specific processing (re-projection, temporal compositing, calibration, etc.) to be removed from the core of the system and located in independent modules that are maintained separately. Adding data streams then amounts only to developing a suitable module for presenting the data in the standardized format. This approach is made easier in this system because the point meteorology has already been converted to a raster format similar to the other spatial input fields [26], [27]. A further benefit is that tools developed to view or manipulate these data (e.g., to produce sub-sets, find statistics, or diagnose problems) not only share a common set of interface code with the model framework, but can be used with any data set.
The ability of the standardized format to represent all the anticipated types of data input is crucial to this implementation. What is surprising is the simplicity of the format standard that was able to achieve the necessary level of data interoperability. The requirements that were imposed on the input data were as follows.
• Geographic projection with common grid spacing.
• Single-channel 32-bit IEEE floating-point grid format with an ASCII header, supported as a simple raster input/output standard by the ArcGIS software suite [28].
• All filenames to begin with a date and, optionally, time of day in the format YYYYMMDD[HHMMSS], followed by a separator ('_') and a word from a controlled vocabulary of possible data types (which implied certain other characteristics of the data, such as measurement units).
• Multichannel data to be aggregated in an archive (zip) file with a name that conforms to the above scheme, with a suitable umbrella type name (e.g., a zip file containing thermal data from the AVHRR satellite instrument could have the name AVHRR-IR and contain three files with key words UTC, BT11 and BT12 for the time of day, and 11 and 12 micron brightness temperature observations, respectively).
The data were then grouped by year, either on a local file system, or on an ftp server. To locate the available data, the model framework then needs only a root URL (file or ftp), a data type name and a date-time range.
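
The following Perl fragment sketches how a data module can discover files under this naming convention; the directory layout, type name and file extensions are hypothetical, and the fragment assumes a local file system rather than an ftp archive.

  # Illustrative sketch: discover files of a given type within a date range
  # using only the YYYYMMDD[HHMMSS]_TYPE naming convention described above.
  use strict;
  use warnings;

  sub files_in_range {
      my ($dir, $type, $from, $to) = @_;    # $from, $to as YYYYMMDD strings
      opendir my $dh, $dir or die "Cannot open $dir: $!";
      my @hits;
      foreach my $name (sort readdir $dh) {
          # Match e.g. 20090215_SolarExp.flt or 20090215120000_AVHRR-IR.zip
          next unless $name =~ /^(\d{8})(\d{6})?_\Q$type\E\./;
          push @hits, "$dir/$name" if $1 ge $from and $1 le $to;
      }
      closedir $dh;
      return @hits;
  }

  my @files = files_in_range('archive/2009', 'SolarExp', '20090201', '20090228');
  print "$_\n" for @files;

Because the date and type are exposed directly in the file name, the date-range test reduces to a simple string comparison and no separate metadata store needs to be consulted.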

It should be noted that under this schema, most metadata in the system is effectively stored in data file names. The use of file names as a repository for metadata is convenient in a system that processes large quantities of spatio-temporal data, since the processes that access and create data files can directly read the required metadata without recourse to any more complex interface. The data catalogues effectively expose the metadata simply through their indexes. To realize the benefits of such a system, it is implicit that a controlled vocabulary and format is used to express the metadata, for otherwise machine processing becomes problematic. The method is also very attractive from the point of view of interactive diagnosis of a data system since standard operating system tools can be used to examine and, if necessary, modify its state. There are, however, practical limits to the extent to which such a scheme can be implemented, especially as the range of related metadata grows. Nevertheless, even for systems with extensive metadata, the practice of exposing key elements in the data file names remains useful for the reasons discussed. The second important abstraction in the design is related to the output data series. From the outset, it was recognized that the system would be used both for operational data production and development and testing, and with different models, data sources, regions, time ranges and time steps. To accommodate this diversity, each output data stream is organized independently, with its own archive of intermediate results, output data and log files and diagnostics, but arranged with a consistent data structure and file naming system. Although the primary goal was to ensure a clear separation between the results of different model runs, a significant additional advantage was that tools for analysis and display worked equally well across all output data streams, leading to a substantial efficiencies, since there was no additional coding and display cost associated with extra data series (either test or operational). In object-oriented programming parlance, the output data streams are objects that, together with the analysis and display methods, form a class. The support for the ftp protocol in the data format standard has far-reaching implications because it allows the data provision function to be distributed across multiple systems, and in doing so, forces an abstraction (and, hence, modularization) of data access beyond that of simply opening local files. This actually makes the model framework itself much simpler since, paradoxically, it needs only to be concerned with opening local files that have been retrieved by the network access module, which can itself perform certain common operations such as sub-setting on-demand so that the data is always presented to the model in the correct format. Note also that the particular choice of the ftp protocol is unimportant, it was simply convenient because some existing data sets were available via that mechanism. Other network data access protocols, such as the OGC Web Coverage Service [16], OPeNDAP [18], or http access to a file system could equally have been used, and because this functionality is external to the model framework, easily can be. Finally, it is worth noting that because all data comes via local files it is possible to prepare a static data set and efficiently re-use it when attempting to diagnose problems with the model itself. 
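
Where the archive is remote, the network access module can retrieve the required files into a local cache before the model framework opens them. The sketch below, using the standard Net::FTP module, shows one minimal way this might be done; the host, login, directory names and caching policy are assumptions for this example, not the AWAP configuration.

  # Illustrative sketch: retrieve remote files into a local cache so that
  # the model framework only ever opens local files.
  use strict;
  use warnings;
  use Net::FTP;
  use File::Basename;

  sub fetch_to_cache {
      my ($host, $remote_dir, $cache_dir, @names) = @_;
      my $ftp = Net::FTP->new($host, Passive => 1)
          or die "Cannot connect to $host: $@";
      $ftp->login('anonymous', 'awap@example') or die 'Login failed: ' . $ftp->message;
      $ftp->cwd($remote_dir)                   or die 'cwd failed: '   . $ftp->message;
      $ftp->binary;
      my @local;
      foreach my $name (@names) {
          my $local = "$cache_dir/" . basename($name);
          # Fetch only files not already present in the cache.
          unless (-e $local) {
              $ftp->get($name, $local) or die "get $name failed: " . $ftp->message;
          }
          push @local, $local;
      }
      $ftp->quit;
      return @local;    # the model framework is handed local paths only
  }
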
The separation of data provision from data use, together with the simple metadata description, confers other advantages. First, the
different data streams can be prepared independently, and so it is possible to provide access to subsets of large network-accessible data archives without having commensurate local storage; moreover, the uniformity of the interface means that it is straightforward to switch between different data streams to test the impact of different data on results. Second, the flexibility of the network interface makes it simple to switch rapidly between data sources, so an operational system requiring high availability can easily take advantage of replicated redundant data servers. Last, the simple metadata protocol means that each data provision stream is effectively an autonomous, self-populating catalogue which can be explored by the data access module to discover available data. This means that data preparation can take place independently and asynchronously from the running of the model. For the near-real-time operational system this means that, at least for the optional data streams (i.e., not the forcing meteorology), the regular running of the model is decoupled from data production, so that short interruptions or delays in the supply of some nonessential data (e.g., due to failure to acquire satellite imagery or network failure between supplier agencies and the operational system) do not prevent the production of time-critical model outputs.
The eventual system resulting from these considerations is shown schematically in Fig. 3. It broadly consists of four layers or sub-systems. The external layer (A in Fig. 3) includes a variety of data providers who make their data available in a variety of formats over the Internet. The second layer (B) is made up of several local servers that independently fetch data and perform specific processing before reformatting, assigning metadata and placing the results in an ftp-accessible archive. These last three steps are the crucial ones that make the data interoperable, discoverable and accessible, effectively implementing a local spatial data infrastructure. The model-run apparatus (C) is the part of the system where the model control logic comes into play, arranging the input data, model parameters (e.g., soil and vegetation properties) and model initial state, and then running the model. The output fields are archived for each step according to the particular processing run. The model final state is included in the archive and can then be used to initialize the next sequential run. The final component is the output dissemination and visualization layer (D). This is built upon the uniformly organized output archives and provides a web-based browse and display feature as well as operational (ftp) and experimental (OPeNDAP) data access.

IV. IMPLEMENTATION

The implementation of the AWAP operational framework is essentially a systems integration task, in which a flow of data needed to be coordinated from a number of repositories, through a sequence of modules, and output to another archive. The only existing component at the time of commencement was the Fortran95 model framework described in Section II. With very little modification, this component was able to be used directly in the operational system by recreating the data environment that it was built to expect in the development phase. This had the advantage that model implementations could be moved very rapidly from test and development to operations, since no changes to the Fortran95 code were required to adapt to the operational environment.

Fig. 3. Completed AWAP operational framework. The four major elements A, B, C, and D are described in the text.

The PERL scripting language [29] was chosen to build the operational framework because it is a powerful systems integration language providing access to operating system functionality with strong file handling and text processing features. Virtually all the modules shown in Fig. 3 were implemented either exclusively in PERL, or by using PERL to call preexisting binary applications to manipulate the data. All use of operating system features was abstracted into a PERL library so that changing operating systems would require changing only the library code, not the modules. This library was tested in the current operational Microsoft Windows environment and in a Linux environment.
A method for defining the system configuration was implemented in PERL and used throughout the operational framework. This permitted the choice of actual data sources, output streams, models, parameter sets, initial conditions and code versions to be set in a text file rather than hard coded. The benefits of this approach are that the system is very portable, since data locations can easily be changed, and that the configuration of any operational data stream can be maintained consistently simply by preserving and re-using the configuration file with a sequence of dates.
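
A minimal reader for such a configuration file might look as follows; the keys, values and file name are invented for this sketch and do not reproduce the AWAP configuration schema.

  # Illustrative sketch of a text-based configuration reader of the kind
  # described above.
  use strict;
  use warnings;

  sub read_config {
      my ($path) = @_;
      my %cfg;
      open my $fh, '<', $path or die "Cannot read $path: $!";
      while (my $line = <$fh>) {
          chomp $line;
          next if $line =~ /^\s*(#|$)/;    # skip comments and blank lines
          my ($key, $value) = $line =~ /^\s*(\S+)\s*=\s*(.*?)\s*$/
              or die "Malformed configuration line: $line";
          $cfg{$key} = $value;
      }
      close $fh;
      return \%cfg;
  }

  # e.g. awap_weekly.cfg (hypothetical) might contain:
  #   ForcingSource = ftp://met-server.example/grids
  #   OutputStream  = weekly_operational
  #   Model         = Waterdyn
  my $cfg = read_config('awap_weekly.cfg');
  print "Running model $cfg->{Model} for stream $cfg->{OutputStream}\n";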

A small number of coding standards, particularly relating to module interface implementations, together with code re-use through libraries, were adopted to ensure that modules would interoperate predictably and so that developers could readily work on each other's code. A particular problem in a system of this size that relies on multiple co-operating modules is that an error or otherwise anomalous condition arising in one module can cascade through subsequent processing stages before raising a fatal error. This can make it extremely difficult to diagnose faults, since their causes can lie in modules antecedent to those exhibiting the fault itself. By rigorously and consistently testing the results of all intermediate steps within all modules, and generating a fatal error at the earliest detection of an anomalous condition, this "disguised-error" problem was effectively eliminated; this discipline was a key feature enabling the rapid development of the whole system. The tests involved fall into two categories: the first being standard software checks to ensure that basic operations (such as opening files and reading data) completed correctly, and the second being verification that the data volumes (numbers, sizes and types of files) are as expected, i.e., completeness checks.

V. OPERATIONS

The AWAP system was designed to automatically produce weekly estimates of soil moisture and other elements of the continental water balance for use in ongoing agricultural policy applications, and is an operational system subject to real timeliness constraints. This imposes expectations of reliability on the system itself and traceability on the outputs that it produces. It also extends the domain of the system into the realm of data delivery and access. This section examines these features and their impact on the implementation.
Several aspects of the AWAP framework relating to reliability and robustness have already been discussed, including the capacity to switch between alternate data sources, to discover and use only those observational data that are available, and to detect anomalous conditions early before producing erroneous results. One respect in which the AWAP system is vulnerable to external factors is the dependence on the meteorological forcing data; if that is unavailable, then the model cannot be run. During testing, it became apparent that the supply of gridded solar radiation fields, generated daily from geostationary satellite observations, was subject to intermittent error, in the form of missing pixels and delayed whole days. This was not surprising, as the gridded solar radiation product is itself experimental, having been developed concurrently by the Bureau of Meteorology to support this project [26], [27]. To prevent these intermittent errors from disrupting data delivery, since the solar radiation is one of the mandatory model forcing inputs, an input data "repair" step was implemented in which missing pixels are automatically detected and replaced by surrogate data. The derivation of these surrogates is necessarily a statistical process involving some form of spatio-temporal averaging or correlation. Since the extent of the data gaps is unknown a priori, it is difficult to choose a fixed "averaging size" that can be applied uniformly, so the codification of any automated averaging process is potentially challenging. Moreover, in this particular situation the scale size of the intrinsic variations in the incident solar radiation field, while itself variable, can often be small relative to the temporal sampling and comparable to or
smaller than the spatial sampling, so it is unlikely that a localized averaging scheme would confer any advantage over a simple nearest neighbor replication. In the face of these difficulties, a monthly “climatology” was constructed for every location (pixel) by computing a value for each month using values drawn from that location for all days within that month over a period of a decade. This captures both the seasonal and broad spatial variation well, and since approximately 300 values are averaged for every location, is smoothly varying in both space and time. Besides being computationally efficient, replacing missing data with values drawn from a static climatology has the additional advantage that the origin of the replacement data is traceable, and it is, therefore, easier to replicate the processing at a later date should reprocessing be required for diagnostic or development purposes. Since the number of missing pixels is nearly always small, the effect on the overall results is negligible but processing continuity is maintained. It should be noted that the method is effective in this particular circumstance but may not be applicable to other meteorological fields with larger characteristic variational scales relative to the spatio-temporal sampling. Whole missing days present a more serious problem than small spatial gaps; in almost all cases, the data for a given day are simply delayed and become available some days later, although the subsequent days are available in the interim. Here the approach taken is to automatically create a new temporary output stream and replace the missing day by the climatology so that operational data production can continue. When the missing day subsequently becomes available, the temporary output stream ceases and production reverts to the original stream using the new data in place of the climatology. The ease with which this solution is implemented is due to the modularization of the output streams (described earlier) which makes starting a new temporary output stream a straightforward task. Traceability of production was designed as an integral feature, regardless of whether running in experimental or operational mode. All modules produce log messages and statistics at key internal stages as well as when they complete, and these are automatically captured, time stamped and recorded as part of the output data stream. The system configuration file is also included in the output data collection so that all module (and model) versioning is captured with the data sources and types. Through these facilities it is possible to establish exactly how the system was configured when any given data was produced, and also to trace when any changes in the operational configuration took place. The data generated by the AWAP system with the Waterdyn model are spatial estimates of soil moisture and other related meteorological and hydrological variables (fifteen in total). As already described, these are stored in a separate file-system archive for each type of model series and are labeled with the start and end dates for each modeling interval. Since one of the objectives of the system is to monitor the water balance with respect to long-term averages, the estimates are converted to percentile ranks (i.e., anomalies) relative to a multiyear climatology for each variable. In the operational system these thirty fields (fifteen fluxes and stores, and their fifteen anomalies) are further converted to a portable self-describing format (netCDF)

Fig. 4. Public web interface (http://www.csiro.au/awap) permits easy exploration and comparison of the variables at continental scale; here, one of the weekly output streams is shown with five of the output fields (columns 1–5, maximum temperature, rainfall, upper and lower level soil moisture stores, and local discharge) and the corresponding anomalies (columns 6–10) as percentile ranks for seven weeks in February and March 2009.

and exported to an ftp server for immediate external client access. One of the benefits of using netCDF format for data output is that, since it is widely used in the climate research community, the OPeNDAP Hyrax [30] and THREDDS data servers can be used to provide experimental access to the results via the OPeNDAP and WCS protocols. A high priority was placed on delivery and visualization of outputs, since the ultimate use for all these data is to contribute to integrated monitoring and understanding of the dynamics of landscape systems (including responses to climate variability and change). Imagery for each field and its anomaly is generated to support visualization via the web. A public web site (http:// www.csiro.au/awap) was built to enable time series of maps of different variables to be displayed, either independently, or multiply for comparison, and over different time ranges. To achieve a compact and interactive display, these maps are initially shown at reduced spatial resolution (Fig. 4) but the corresponding full resolution data are also available. This has proven to be an important tool for gaining an appreciation of the trends reflected in the data, especially for the century-long historical runs, and also for rapidly identifying regional anomalies in particular fields. The uniformity in structure and organization of the output data series (Section III.) means that the web code that interactively displays the data for the operational data series can equally be used to display other series, such as those resulting from test and development runs. Display customization for each series is achieved by means of a single text file located in the root of each data series which provides specific labelling information and translations for variable names. The file-system is effectively being used as a database, with the data organized and named so that the web code is able to automatically discover new data

as they are produced. As an aid to visualization and outreach, experiments are underway to describe the outputs of the operational series via KML files [31]. By overlaying these data with features familiar to land users and managers, such as roads and rivers, and enabling the animation of the time series, the ease of interpretation is greatly improved.

VI. DISCUSSION

The AWAP operational system has been running with the Waterdyn model since March 2007, producing weekly and monthly data both for autonomous web delivery (http://www.csiro.au/awap) and for use in a national agricultural monitoring system [32] which provides assessments of drought and other natural resources and stresses. Over this period the only interruptions have been caused by breaks in the supply of the externally produced gridded meteorological fields, and in all cases these episodes have been detected automatically and repaired quickly and easily, so that no corrupt data have been exported. Simultaneously, the same system has been run to produce monthly and annual series of historical soil moisture fields. The system has also provided a continental-scale development and testing environment for another model (CABLE), demonstrating the successful creation of a unified development and operational environment.
In addition to model development, the unique historical data sets from AWAP are also having an impact in the research sphere. They are being used as input to projects seeking to develop detailed national water balance models for water assessment and accounting. They are of interest in climate change impact modeling because they directly couple meteorological measurements with hydrologically relevant outcomes.

The historical series also enable investigation of the impact of climate indicators such as ENSO and the Indian Ocean Dipole on factors directly affecting land condition.
At present the AWAP system is producing data continentally on a 5-km grid. Many of the observation data sets, particularly the satellite remote sensing fields, are available at 1-km resolution, so it should be possible to produce output at that resolution. The 25-fold increase in data volume, and the accompanying computational burden, would, however, exceed the capacity of the existing processing hardware. The obvious solution is to divide the processing task spatially and distribute the load across multiple processors. The current system is amenable to exactly such an approach, since the spatial sub-setting could easily be added to the data access interface, and the data stores/catalogs are themselves replicable and so can scale to share the load. The other pathway is to modify the model itself to run in a tightly coupled parallel processing environment (e.g., using the message passing interface [33]), such as a cluster or other multiprocessor system. In this case, since such systems generally run the Linux operating system, the portability features built into the operational framework will be of enormous benefit.
The AWAP operational framework demonstrates the importance of data (and metadata) standards in enabling modularity and interoperability. Since data objects are the primary messages that flow across interfaces between modules, it follows that standardizing their format and description permits modules to be developed independently. This in turn allows the system to be developed and tested piece by piece, and so, via abstraction, its aggregate behavior will be predictable. The inclusion of a network protocol as part of the data interface standard means that the various modules, and their processing load, can be distributed across multiple processors, and can easily be replicated to provide the redundancy required for an operational system with high-availability requirements.
Last, the AWAP operational framework can provide a test bed for modern spatial data interoperability technologies, such as OGC Web Coverage Services and Web Mapping Services, OPeNDAP and THREDDS. These are straightforward to implement in the existing environment, precisely because the design of the AWAP system already enforces the uniformity of data model which they are designed to help implement. This is true both for the input data streams and for the dissemination of the output data and imagery. Moreover, the encapsulation of the data discovery logic within the network access module means that only that module would need to be modified to use one of the more sophisticated catalog and data discovery systems currently under development in the wider geospatial community. Taking a wider view, as an operational system relying on a diverse range of spatial data from a variety of sources and suppliers, both in-house and external, the AWAP system is proving to be a useful model for analysing the requirements of the next generation of water accounting and assessment systems that are the subject of active research and development [34].

ACKNOWLEDGMENT

The authors would like to thank Dr. L. J. Renzullo, Dr. P. J. Turner, and two anonymous referees for helpful comments on the original manuscript.

REFERENCES

[1] B. C. Bates, Z. W. Kundzewicz, S. Wu, and J. P. E. Palutikof, “Climate change and water,” Technical Paper of the Intergovernmental Panel on Climate Change, IPCC Secretariat, Geneva, Switzerland, 2008, 210 pp.
[2] K. Hennessy, B. Fitzharris, B. C. Bates, N. Harvey, S. M. Howden, L. Hughes, J. Salinger, and R. Warrick, “Australia and New Zealand. Climate change 2007: Impacts, adaptation and vulnerability,” in Contribution of Working Group II to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change, M. Parry, O. Canziani, J. Palutikof, P. van der Linden, and C. Hanson, Eds. Cambridge, U.K.: Cambridge Univ. Press, 2007, pp. 507–540.
[3] B. Timbal and D. A. Jones, “Future projections of winter rainfall in southeast Australia using a statistical downscaling technique,” Clim. Change, vol. 86, pp. 165–187, DOI 10.1007/s10584-007-9279-7.
[4] M. R. Raupach, P. R. Briggs, E. A. King, M. J. Paget, and C. M. Trudinger, “Australian water availability project (AWAP),” CSIRO Marine and Atmospheric Research Component: Final Report for Phase 2, CSIRO Marine and Atmospheric Research, Canberra, Australia, 2005, p. 38.
[5] M. R. Raupach, P. R. Briggs, V. Haverd, E. A. King, M. J. Paget, and C. M. Trudinger, “Australian water availability project (AWAP),” CSIRO Marine and Atmospheric Research Component: Final Report for Phase 3, CSIRO Marine and Atmospheric Research, Canberra, Australia, 2007, p. 67.
[6] M. R. Raupach, C. M. Trudinger, P. R. Briggs, V. Haverd, E. A. King, and M. J. Paget, “108 years of Australian water balance,” Hydrol. Earth Syst. Sci., 2009.
[7] M. R. Raupach, P. J. Rayner, D. J. Barrett, R. S. DeFries, M. Heimann, D. S. Ojima, S. Quegan, and C. C. Schmullius, “Model-data synthesis in terrestrial carbon observation: Methods, data requirements and data uncertainty specifications,” Global Change Biol., vol. 11, 2005.
[8] T. Vischell, G. G. S. Pegram, S. Sinclair, W. Wagner, and A. Bartsch, “Comparison of soil moisture fields estimated by catchment modelling and remote sensing: A case study in South Africa,” Hydrol. Earth Syst. Sci., vol. 12, pp. 751–767, 2008.
[9] M. G. F. Werner, M. van Dijk, and J. Schellekens, “DELFT-FEWS: An open shell flood forecasting system,” in Proc. 6th Int. Conf. Hydroinformatics, Liong, Phoon, and Babovic, Eds., 2004, pp. 1205–1212, ISBN 981-238-787-0.
[10] Delft Flood Early Warning System, Deltares [Online]. Available: http://www.wldelft.nl/soft/fews/int/index.html
[11] W. Kresse and K. Fadaie, ISO Standards for Geographic Information. Berlin, Germany: Springer-Verlag, 2004.
[12] N. Ritter and M. Ruth, “GeoTIFF format specification revision 1.0,” Jet Propulsion Lab. Cartographic Application Group, Pasadena, CA, 1995.
[13] Hierarchical Data Format Group [Online]. Available: http://www.hdfgroup.org/
[14] R. K. Rew and G. P. Davis, “The unidata netCDF: Software for scientific data access,” in Proc. 6th Int. Conf. Interactive Information and Processing Systems for Meteorology, Oceanography, and Hydrology, Anaheim, CA, Feb. 1990, pp. 33–40.
[15] netCDF: Network Common Data Form [Online]. Available: http://www.unidata.ucar.edu/software/netcdf/
[16] Open Geospatial Consortium Web Coverage Service [Online]. Available: http://www.opengeospatial.org/standards/wcs
[17] L. Di, W. Yang, D. Meixia, D. Deng, and K. McDonald, “Interoperable access of remote sensing data through NWGISS,” in Proc. IEEE Int. Geoscience and Remote Sensing Symp., 2002, vol. 1, pp. 255–257.
[18] P. Cornillon, J. Gallagher, and T. Sgouros, “OPeNDAP: Accessing data in a distributed, heterogeneous environment,” Data Sci. J., vol. 2, pp. 164–174, 2003.
[19] B. Domenico, J. Caron, E. Davis, R. Kambic, and S. Nativi, “Thematic real-time environmental distributed data services (THREDDS): Incorporating interactive analysis tools into NSDL,” J. Digit. Inf., vol. 2, no. 4, pp. 2002–05, 2005.
[20] A. Rajabifard and I. P. Williamson, “Spatial data infrastructures: Concept, SDI hierarchy and future directions,” presented at the Geomatics, Australia [Online]. Available: http://www.sli.unimelb.edu.au/research/publications/IPW/4_01Raj_Iran.pdf
[21] A. Doyle and C. Reed, “Introduction to OGC web services,” White Paper, Open Geospatial Consortium, 2001 [Online]. Available: http://portal.opengeospatial.org/files/index.php?artifact_id=14973

[22] C. M. Trudinger, M. R. Raupach, P. R. Briggs, V. Haverd, E. A. King, and M. J. Paget, “Model data assimilation for parameter estimation in continental-scale hydrological monitoring,” Water Resour. Res., 2009, to be published.
[23] E. A. Kowalczyk, Y. P. Wang, R. M. Law, H. L. Davies, J. L. McGregor, and G. Abramowitz, “The CSIRO atmosphere biosphere land exchange (CABLE) model for use in climate models and as an offline model,” CSIRO Marine and Atmospheric Research Paper 013, Melbourne, Australia, 2006.
[24] G. Booch, R. Maksimchuk, M. Engle, B. Young, J. Conallen, and K. Houston, Object-Oriented Analysis and Design With Applications, 3rd ed. Reading, MA: Addison-Wesley, 2007.
[25] K. H. Britton and D. L. Parnas, A-7E Software Module Guide, NRL Memorandum Rep. 4702, 1981.
[26] I. F. Grant, D. Jones, W. Wang, R. Fawcett, and D. Barratt, “Meteorological and remotely sensed datasets for hydrological modelling: A contribution to the Australian water availability project,” in Proc. CAHMDA-III Workshop, Melbourne, Australia, Jan. 2008, pp. 9–11 [Online]. Available: http://www.cahmda3.info/ab_files/CAHMDA3_Grant.pdf
[27] D. Jones, W. Wang, and R. Fawcett, Climate Data for the Australian Water Availability Project, Final Milestone Rep., Bureau of Meteorology, 2007.
[28] ESRI ArcView Software [Online]. Available: http://www.esri.com/software/arcview/
[29] L. Wall, T. Christiansen, and R. Schwartz, Programming Perl, 2nd ed. Sebastopol, CA: O’Reilly, 1996.
[30] Hyrax Data Server [Online]. Available: http://docs.opendap.org/index.php/Hyrax
[31] Open Geospatial Consortium Keyhole Markup Language [Online]. Available: http://www.opengeospatial.org/standards/kml/

[32] National Agricultural Monitoring System, Bureau of Rural Sciences [Online]. Available: http://www.nams.gov.au
[33] W. Gropp, E. Lusk, N. Doss, and A. Skjellum, “A high-performance, portable implementation of the MPI message passing interface standard,” Parallel Comput., vol. 22, no. 6, pp. 789–828, Sep. 1996.
[34] R. G. O’Hagen and A. Parashar, “Use case analysis of AWAP,” CSIRO Land and Water, Mar. 27, 2008, 41 pp.

Edward A. King, photograph and biography not available at the time of publication.

Matthew J. Paget, photograph and biography not available at the time of publication.

Peter R. Briggs, photograph and biography not available at the time of publication.

Cathy M. Trudinger, photograph and biography not available at the time of publication.

Michael R. Raupach, photograph and biography not available at the time of publication.