Ocean Dynamics

Manuscript Draft

Manuscript Number:

Full Title:

Application of data assimilation for improving forecast of water levels and residual currents in Singapore regional waters

Article Type:

Original Papers

Keywords:

data assimilation, ensemble Kalman filter, sea level anomaly, non-tidal currents, Singapore regional waters, Malacca Strait

Corresponding Author:

Rama Rao Karri, Ph.D. National University of Singapore SINGAPORE

Corresponding Author Secondary Information:

Corresponding Author's Institution:

National University of Singapore

Corresponding Author's Secondary Institution:

First Author:

Rama Rao Karri, Ph.D.

First Author Secondary Information:

Order of Authors:

Rama Rao Karri, Ph.D.; Abhijit Badwe, Ph.D.; Xuan Wang, M.Sc.; Ghada El Serafy, Ph.D.; Julius Sumihar, Ph.D.; Vladan Babovic, Ph.D.; Herman Gerritsen, Ph.D.

Order of Authors Secondary Information:

Abstract:

Hydrodynamic models are commonly used for predicting water levels and currents in the deep ocean, ocean margins and shelf seas. Their accuracy is typically limited by factors such as the complexity of the coastal geometry and bathymetry and the uncertainty in the flow forcing (deep ocean tide, winds, pressure). In Southeast Asian waters, with their strongly hydrodynamic characteristics, the lack of detailed marine observations (bathymetry, tides) for model validation is an additional factor limiting flow representation. This paper applies ensemble Kalman filter (EnKF) based data assimilation with the overall objective of improving the deterministic model forecast. The efficacy of the EnKF is demonstrated via a twin experiment performed on the Singapore regional model. The results show that data assimilation can improve forecasts significantly, even for a model of this level of detail.


Application of data assimilation for improving forecast of water levels and residual currents in Singapore regional waters

Rama Rao KARRI1*, Abhijit BADWE1, Xuan WANG1, Ghada EL SERAFY2, Julius SUMIHAR2, Vladan BABOVIC1, and Herman GERRITSEN2

1 Singapore-Delft Water Alliance, National University of Singapore, 117577, Singapore

2 Deltares, P.O. Box 177, 2600 MH Delft, The Netherlands

* Corresponding author: [email protected]

1. Introduction

Modelling of water levels and currents in the Malacca Strait and Singapore regional waters is challenging because of the large number of small islands and the strongly nonlinear tidal interactions of both diurnal and semi-diurnal nature. The complexity is further increased by significant local bathymetry and geometry variations around Singapore Island and by meteorological effects on different scales: monsoons, shorter-scale storms and highly intense storm squalls. The flows in this region experience the effects of nonlinear dynamical interactions between three large water bodies: the South China Sea to the east, the Andaman Sea and Indian Ocean to the west, and the Java Sea to the south. The complex shallow water hydrodynamics resulting from the multiple ocean currents moving into and out of this region, combined with sharply varying short-term meteorological effects, leads to very high variability of the water levels and currents along the Singapore coast. Since the Malacca Strait is a major navigation corridor, understanding and accurately modelling such complex phenomena and their impact on navigation and the risk of collisions is of great economic and environmental importance.

To obtain a better understanding of the processes and to reproduce the driving mechanisms of these large-scale driven flows in Singapore waters, the Singapore Regional Model (SRM) was developed by Kernkamp and Zijl (2004). This model covers the region 95°E to 109°E and 6°S to 10°N, stretching from the Andaman Sea to the South China and Java Seas. This depth-integrated model simulates barotropic flows driven by tides and meteorological forcing. The contribution of tidal interactions, seasonal monsoons and shorter time scale weather features to residual (non-tidal) currents, or current anomalies (CAs), and their associated sea level anomalies (SLAs) is investigated in the framework of the Must Have Box study (Gerritsen et al. 2009). Understanding the source of CAs and SLAs is of paramount importance in the modelling. A detailed study by Rao et al. (2010) has shown that South China Sea winds act as a triggering mechanism for the SLAs around Singapore Island. Depth-integrated hydrodynamic modelling may quantify the various phenomena that contribute to the currents and water levels in the Singapore region but may not replicate these anomalies and their (spatial and temporal) cross-correlation with the winds. Ooi et al. (2009) used a selectively refined grid variant of the SRM to show that it is possible to further improve the original SRM's tidal flow prediction along Singapore. The critical analysis by Kurniawan et al. (2011) systematically improved the tidal flow representation further, removing existing bias and significantly enhancing the representation accuracy.
To improve the model further for day-to-day forecasting, the application of a data assimilation (DA) scheme is proposed. Such methods integrate recent model outputs with all observations available so far to obtain an optimal solution up to the present time epoch. The discrepancy between the model output and the observations may arise from inaccuracy in the model parameters, numerical approximations and uncertainty in the prescribed forcing terms. This discrepancy can be minimized by various error correction techniques, including those inspired by chaos theory (Sannasiraj et al. 2004; Sun et al. 2010), local model approximation (Fuhrman 2001; Babovic et al. 2005; Sannasiraj et al. 2005), artificial neural networks (Zhang et al. 2006; Moeini et al. 2012; Sun et al. 2012) and genetic programming (Gaur and Deo 2008; Ghorbani et al. 2010; Rao and Babovic 2010). These algorithms try to correct errors in the sample covariance between observations and model outputs. To address the uncertainties within the model itself, updating the state variables would be more appropriate for highly nonlinear systems. The updated (corrected) state is then used as the new initial condition for the model forecast.

Kalman filter (KF) based data assimilation schemes, which are optimal updating procedures, have been found capable and efficient in handling high-dimensional systems such as ocean dynamics and weather forecasting (Yen et al. 1996; Carme et al. 2001; Sorensen and Madsen 2004; Wei and Malanotte-Rizzoli 2010). The classical Kalman filter (Kalman 1960) was developed for linear systems. Since then, variants of the Kalman filter have evolved for different degrees of nonlinearity (Madsen and Canizares 1999; Chen et al. 2009). For Southeast Asian waters, the efficacy of the linear Kalman filter and the extended Kalman filter (EKF), in which the nonlinear function is linearized around the current estimate, is limited by the highly nonlinear nature of the flow dynamics. It is also limited by the implementation burden and the dimensionality of the large-scale model at hand. In view of this, and because of their simplicity of implementation, ensemble-based variants of the Kalman filter have been used extensively in ocean data assimilation. Of these, the ensemble Kalman filter (EnKF) and the reduced rank square root (RRSQRT) Kalman filter have been widely implemented (Verlaan and Heemink 1997; Heemink et al. 2001; Tippett et al. 2003; Evensen 2007; Chui and Chen 2009; Ponsar and Luyten 2009).

This paper focuses on the implementation of an EnKF for improving the forecasting of water levels and currents in the Singapore region. In Section 2, the applied numerical model is described. In Section 3, a twin experiment setup is described that provides a realistic test case for assessing the EnKF in correcting the solution back toward the original solution. Section 4 provides a brief introduction to the data assimilation environment OpenDA and its EnKF module.
The efficacy of the EnKF for estimating water levels and currents at both observation stations and unobserved locations in both hindcast and forecast simulations is verified in Sections 5 and 6. Based on correlation analysis, an optimal network of monitoring stations is derived in Section 7, thus reducing the redundancy in the system and thereby decreasing the computation time for generating operational forecasts for the Singapore region. Finally, a summary of the study and conclusions derived from the results are presented in Section 8.

2. Numerical model description

2.1. SRM and SRMC; model domain

The numerical model used in the present study is the Singapore Regional Model - Coarse (SRMC). The SRMC is a computationally efficient version of the original SRM developed by Kernkamp and Zijl (2004) and was created initially to reduce the computational time while maintaining response characteristics similar to those of the SRM. Kurniawan et al. (2010, 2011) systematically analysed the sensitivity of the tidal representation in the coastal waters around Singapore to various SRMC modelling parameters, leading to much reduced model bias and enhanced tidal representation. Kurniawan et al. (2010) showed that the SRMC has largely the same response characteristics and can suitably replace the SRM for process studies and data assimilation to estimate the water levels and currents in the Singapore Straits.

The SRM / SRMC model application provides water level and current information, both tidal and non-tidal, in the Malacca and Singapore Straits. The application is based on the Delft3D-FLOW modelling system. The system solves the barotropic Navier-Stokes equations for an incompressible fluid, under the shallow water and Boussinesq assumptions, to simulate the free surface flow for the whole water body between the Andaman Sea to the west, the South China Sea to the east and a small part of the Java Sea to the south. Prescribed tides at the western and eastern boundaries represent the influence of the Indian and Pacific Oceans, respectively, while the more complex tidal interaction is resolved in the interior model domain. The extent and grid of the SRMC, along with the spatial distribution of the water level observation stations considered in the present study, are shown in Fig. 1. The coordinates of the sixteen coastal tidal observation (monitoring) stations are given in Table 1.


Fig. 1 Map showing the SRMC grid, bathymetry, open boundaries (dark blue line), boundary support points (black dots), 16 tidal observation stations (red dots) and 9 unobserved locations (blue dots) in the domain of study

Table 1 Coordinates of the 16 monitoring stations

Station          Latitude (deg min N)    Longitude (deg min E)
LANGKAWI         6° 19'                  99° 43'
PENANG           5° 25'                  100° 15'
LUMUT            4° 12'                  100° 38'
KELANG           3° 02'                  101° 26'
KELING           2° 13'                  102° 09'
KUKUP            1° 20'                  103° 27'
RAFFLES          1° 09'                  103° 44'
TANJONG PAGAR    1° 16'                  103° 51'
HORSBURGH        1° 19'                  104° 24'
SEDILI           1° 56'                  104° 07'
TIOMAN           2° 48'                  104° 08'
KUANTAN          3° 59'                  103° 26'
CENDERING        5° 16'                  103° 11'
GETING           6° 14'                  102° 06'
MEDAN            3° 40'                  98° 38'
PONTIANAK        0° 03'                  109° 15'

2.2. Grid and boundary support points in the SRMC

The SRMC has a total of 4,200 boundary-fitted orthogonal curvilinear grid cells, with a spatial resolution ranging from ~75 km at the open sea boundaries to approximately 1 km around Singapore. The SRMC bathymetry is based on Admiralty charts, giving a model depth of about 2,000 m in the Andaman Sea and a maximum depth of approximately 40-50 m in the Singapore Straits. The tidal range varies from about 2.8 m west of Singapore to about 1.5 m east of Singapore. Along the open sea boundaries, best estimates of the local amplitudes and phases of eight tidal constituents (Kurniawan et al. 2011) are prescribed, and the tidal water levels at the open boundaries are calculated from the following relationship:

$$H(t) = H_0 + \sum_{k=1}^{K} F_k H_k \cos\left(\omega_k t + (V_0 + u)_k - G_k\right) \qquad (1)$$

in which:

H(t)        Tidal water level at time t
H0          Mean water level over a certain period
K           Number of prescribed tidal constituents
k           Index of a tidal constituent
Hk          Local tidal amplitude of tidal constituent k
Fk          Nodal amplitude factor
ωk          Angular velocity
(V0+u)k     Astronomical phase at Greenwich
uk          Nodal phase factor
Gk          Local tidal phase of tidal constituent k
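The harmonic relationship defined by the symbols above can be illustrated with a minimal Python sketch. It assumes the standard cosine form of the harmonic tide, with all phases in degrees and time in hours; the function name, the dictionary keys and the amplitude and phase values are hypothetical and are not taken from the SRMC boundary data. Only the M2 and K1 angular speeds are standard values.

```python
import math


def tidal_water_level(t_hours, mean_level, constituents):
    """Sum harmonic constituents: H0 + sum_k F_k * H_k * cos(w_k*t + (V0+u)_k - G_k)."""
    level = mean_level
    for c in constituents:
        # Phase argument in degrees, converted to radians for math.cos.
        phase_deg = (c["speed_deg_per_hour"] * t_hours
                     + c["v0_plus_u_deg"] - c["phase_deg"])
        level += c["nodal_factor"] * c["amplitude_m"] * math.cos(math.radians(phase_deg))
    return level


# Illustrative (made-up) amplitudes and phases for two of the eight constituents.
example_constituents = [
    {"name": "M2", "amplitude_m": 0.80, "phase_deg": 30.0,
     "speed_deg_per_hour": 28.9841042, "nodal_factor": 1.0, "v0_plus_u_deg": 0.0},
    {"name": "K1", "amplitude_m": 0.30, "phase_deg": 120.0,
     "speed_deg_per_hour": 15.0410686, "nodal_factor": 1.0, "v0_plus_u_deg": 0.0},
]

# Hourly boundary water levels for one day.
hourly_levels = [tidal_water_level(t, 0.0, example_constituents) for t in range(24)]
```

In a real application, the nodal factors and astronomical arguments would come from a tidal analysis package rather than being set to 1 and 0 as in this sketch.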

Fig. 1 shows the location of the open boundaries of the SRM through its boundary support points (black circles), where tidal and mean sea level forcing are prescribed and adjusted. The eight main tidal constituents Q1, O1, P1, K1, N2, M2, S2 and K2 are prescribed at these three open sea boundaries, while direct tide generating forces are included in the interior domain. The best estimates of amplitudes and phases at these open boundaries come from tidal analysis of time series of available local water level data, further enhanced as described in Kurniawan et al. (2011).

2.3. Modelling period: focus on occurrence of sea level anomalies

While the tidal model representation has improved considerably through the analysis of Kurniawan et al. (2010, 2011), water level and current anomalies are less well represented. They therefore form a practically relevant case for analysing the capability of the Kalman filter. From the data analyses in Babovic (2007), Calkoen et al. (2009) and Rao and Babovic (2009), three strong water level increase events in early 2004 were identified, which are non-tidal in nature with a maximum water level of ~50 cm. Fig. 2 shows the observed values at Cendering, Tioman and Tanjong Pagar. These events are generated by storms over the South China Sea, with the local initial water level rises moving southwards to Singapore waters. Hence, they are represented in the model through the eastern boundary condition rather than through local SRMC wind forcing over the Singapore model domain. First estimates of these boundary condition adjustments are obtained from local results of the South China Sea eXtension (SCSX) model, which covers the whole South China Sea and does apply wind and pressure effects (see Fig. 3).

The data assimilation test aims to investigate whether data assimilation can estimate the adjustments to the boundary conditions needed to predict these local non-tidal anomalies and thereby estimate/forecast the water levels and residual currents. Given the considerable computational time required to carry out data assimilation runs over the full year 2004, the runs and the subsequent analyses in the present study were carried out for a one-month period, from 1st January 2004 00:00 to 31st January 2004 23:00, which covers the first SLA event.

Fig. 2 Variation of non-tidal sea level signals at three stations (Cendering, Tioman and Tanjong Pagar) for the period January through April 2004


Fig. 3 SCSX model grid and the SRMC model grid that is nested in the SCSX domain

3. Set up of the twin experiment

The twin experiment is designed to assess the capability of the Kalman filter to correctly estimate the flow forcing along the eastern model boundary needed to generate the non-tidal water levels at the interior stations. The schematic diagram for the implementation of the twin experiment is shown in Fig. 4. The deterministic model simulation with these tide + surge boundary conditions at the boundary segments generates time series of water levels that are treated as "reference" run data. The "true" water levels, or the "truth", are generated by running the SRMC with perturbed boundary conditions, i.e. the water level time series at each boundary segment i of the three open sea boundaries is perturbed by adding spatially uncorrelated coloured noise:

$$\tilde{H}_i(t) = H_i(t) + w_i(t), \quad i = 1, 2, 3 \qquad (2)$$

where H_i(t) is the reference (tide + surge) water level at boundary segment i, \tilde{H}_i(t) is its perturbed counterpart, and w_i is a coloured noise sequence generated by passing a white noise sequence with a standard deviation of 5 cm through an AR(1) process with a coefficient of 0.93.

Fig. 4 Schematic diagram of SRMC twin experiment setup

Three mutually uncorrelated white noise sequences are used to generate the coloured noise sequences w1, w2 and w3, which are therefore uncorrelated with each other. The "synthetic" observations used for data assimilation are generated by adding white noise with zero mean and a standard deviation of 5 cm to the true water levels at each station. These white noise sequences are also spatially uncorrelated between stations. Perturbing the boundary conditions introduces a considerable mismatch between the reference (model) outputs and the truth, while the observation noise mimics a realistic scenario in which the observations are uncertain. The efficacy of the EnKF in reducing this mismatch by estimating the original boundary conditions is assessed at every observation station where the synthetic observations (perturbed true water levels) are used as "measurements" for assimilation.
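The noise generation used in the twin experiment can be sketched in a few lines of Python. This is a minimal illustration, assuming that the 5 cm standard deviation applies to the white noise driving the AR(1) process and that the noise is sampled hourly. The function names and seeds are hypothetical, and the sketch does not reproduce the OpenDA noise model implementation.

```python
import random


def coloured_noise(n_steps, coefficient=0.93, white_std=0.05, seed=None):
    """AR(1) sequence w[n] = coefficient * w[n-1] + e[n], with e[n] ~ N(0, white_std)."""
    rng = random.Random(seed)
    w = 0.0
    series = []
    for _ in range(n_steps):
        w = coefficient * w + rng.gauss(0.0, white_std)
        series.append(w)
    return series


def synthetic_observations(true_levels, obs_std=0.05, seed=None):
    """Add zero-mean white noise (in metres) to a series of true water levels."""
    rng = random.Random(seed)
    return [level + rng.gauss(0.0, obs_std) for level in true_levels]


# One independent coloured noise sequence per open boundary (hourly, 31 days).
n_hours = 31 * 24
w1, w2, w3 = (coloured_noise(n_hours, seed=s) for s in (1, 2, 3))
```

Using a separate random generator per boundary and per station keeps the sequences independent of each other, which is the property the text relies on when it calls the noise spatially uncorrelated.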

4. OpenDA: data assimilation environment

OpenDA is a generic open source data assimilation environment (El Serafy et al. 2010) for application to a choice of physical process models and hydrodynamic unsteady flow models. It uses a set of interfaces that describe the interaction between models, observations and data assimilation algorithms. The structure of the modelling environment, sequence manager, representation of uncertainties, interfacing of process models and user prerequisites are described in El Serafy et al. (2007) and Weerts et al. (2010). This toolbox allows experimentation with data assimilation and calibration methods without the need for extensive programming. It supports the assimilation of available measurements to carry out sensitivity analysis and simultaneous optimization of model parameters (Kurniawan et al. 2011). The environment also features filtering techniques such as the ensemble Kalman filter (EnKF), the reduced rank square root filter (RRSQRTF), the ensemble square root filter (EnSQRTF) and particle filters. These functionalities have been successfully applied in areas such as data assimilation of currents and salinity profiles (El Serafy et al. 2007), flood forecasting (Weerts et al. 2010) and, more recently, data assimilation for accurate forecasting of SLAs and residual currents (Babovic et al. 2011; Karri et al. 2011). These diverse applications demonstrate the efficacy of OpenDA as a generic toolbox for data assimilation. In the present study, the EnKF-based data assimilation scheme available in OpenDA is used to improve the prediction of water levels and currents in the Singapore regional waters.
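For readers unfamiliar with the EnKF, the analysis (update) step can be sketched with NumPy as below. This is a generic textbook form of the stochastic EnKF with perturbed observations, not the OpenDA implementation or its API. The function name, the linear observation operator and the three-element toy state are assumptions made for illustration only.

```python
import numpy as np


def enkf_analysis(ensemble, observations, obs_operator, obs_std, rng):
    """One stochastic EnKF analysis step.

    ensemble:     (n_state, n_members) forecast states, one column per member
    observations: (n_obs,) measured values
    obs_operator: (n_obs, n_state) linear map from state to observed quantities
    obs_std:      standard deviation of the uncorrelated observation noise
    """
    n_members = ensemble.shape[1]
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    predicted = obs_operator @ ensemble
    predicted_anomalies = obs_operator @ anomalies

    # Sample covariances estimated from the ensemble spread.
    cov_state_obs = anomalies @ predicted_anomalies.T / (n_members - 1)
    cov_obs_obs = predicted_anomalies @ predicted_anomalies.T / (n_members - 1)
    obs_cov = obs_std**2 * np.eye(observations.size)

    # Kalman gain K = P_xy (P_yy + R)^-1; the innovation covariance is symmetric.
    gain = np.linalg.solve(cov_obs_obs + obs_cov, cov_state_obs.T).T

    # Each member assimilates its own noisy copy of the observations.
    perturbed_obs = observations[:, None] + rng.normal(0.0, obs_std, size=predicted.shape)
    return ensemble + gain @ (perturbed_obs - predicted)


# Toy example: 3 state values, only the first one observed, 32 members.
rng = np.random.default_rng(0)
truth = np.array([0.30, 0.25, 0.20])
prior = truth[:, None] + 0.2 + rng.normal(0.0, 0.15, size=(3, 32))  # biased forecast
H = np.array([[1.0, 0.0, 0.0]])
y = H @ truth
posterior = enkf_analysis(prior, y, H, obs_std=0.05, rng=rng)
```

In a model-based application such as the one described here, the forecast ensemble would come from propagating each member through the hydrodynamic model with its own boundary noise realisation, and the observation operator would pick out the model water levels at the station locations.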

5. Calibration of the ensemble Kalman filter (hindcasting)

For the ensemble Kalman filter based data assimilation, the twin experiment simulations use a one-hour time step for the period 2nd January 2004 00:00 to 31st January 2004 23:00, during which one significant positive sea level event is recorded. This simulation produces hourly time series at all grid points to generate the truth and reference data. The synthetic observations generated for the 16 observation stations (shown in Fig. 1) are used by the data assimilation to correct the model state. Several assimilation runs were carried out with different ensemble sizes; with an ensemble size of 32, the predictions were acceptable and were produced in a reasonable computation time. A larger number of ensemble members resulted in slightly improved predictions, but at the cost of a higher computation time. Hence, as a trade-off between accuracy and computation time, the present work is limited to 32 ensemble members. The settings used for the implementation of the EnKF for the hindcast runs are shown in Table 2.

Table 2 Parameter settings implemented in the EnKF simulation

Parameter                                                 Value
Ensemble size                                             32
ARMA noise model coefficient                              0.93
Standard deviation of uncertainties at open boundaries    0.05 m
Standard deviation of measurement noise                   0.05 m

For comparing the results from the various cases, two conventional statistics have been used:

1. Root Mean Square Error (RMSE)
2. Percentage improvement in estimates

The water level and current estimates yielded by data assimilation are compared with the truth and with those yielded by the free run of the model (the reference). At any station or location i, the RMSE for the "reference" (without data assimilation) and the "EnKF" (with data assimilation) is computed as

$$\mathrm{RMSE}_{s,i} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left( X_i(n) - \hat{X}_{s,i}(n) \right)^2} \qquad (3)$$

where X_i is the true state (water level or current) at a location, \hat{X}_{s,i} is the corresponding estimate, subscript s represents the reference model (r) or Kalman filter estimate (f), and N is the number of data points. In order to quantify the improvement gained by the Kalman filter correction, we also define a percentage improvement as

$$\%\,\mathrm{Improvement}_i = \frac{\mathrm{RMSE}_{r,i} - \mathrm{RMSE}_{f,i}}{\mathrm{RMSE}_{r,i}} \times 100 \qquad (4)$$
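The two statistics can be computed as in the following Python sketch. It assumes the usual definition of RMSE and takes the percentage improvement to be the relative RMSE reduction of the filter estimate with respect to the reference run; the function names are hypothetical.

```python
import math


def rmse(truth, estimate):
    """Root mean square error between a true series and an estimated series."""
    if len(truth) != len(estimate):
        raise ValueError("series must have the same length")
    squared = sum((x - e) ** 2 for x, e in zip(truth, estimate))
    return math.sqrt(squared / len(truth))


def percentage_improvement(rmse_reference, rmse_filter):
    """Relative RMSE reduction (in percent) of the filter over the reference run."""
    return 100.0 * (rmse_reference - rmse_filter) / rmse_reference


# Average errors quoted in Section 5.2: about 0.20 m without and 0.06 m with assimilation.
improvement = percentage_improvement(0.20, 0.06)  # roughly 70 percent
```

A negative value of the percentage improvement indicates that assimilation made the estimate worse than the reference run at that location.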

The current in the model state is defined in terms of the u and v components. The results here, however, are presented in terms of the magnitude and direction of the currents because, operationally, these are the parameters of interest. The current magnitude (u_m) and direction (θ) are obtained as

$$u_m = \sqrt{u^2 + v^2} \qquad (5)$$

$$\theta = \tan^{-1}\left(\frac{v}{u}\right) \qquad (6)$$
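A short Python sketch of the conversion from velocity components to magnitude and direction is given below. It assumes the mathematical convention in which the angle is measured counter-clockwise from the positive u axis; a nautical convention (clockwise from north) would need a different formula. The function names are hypothetical.

```python
import math


def current_magnitude(u, v):
    """Speed of the current from its u and v components."""
    return math.hypot(u, v)


def current_direction_deg(u, v):
    """Direction in degrees, counter-clockwise from the positive u axis.

    atan2 is used instead of atan(v / u) so that the quadrant is resolved
    and u = 0 does not cause a division by zero.
    """
    return math.degrees(math.atan2(v, u))


speed = current_magnitude(0.3, 0.4)          # components in m/s
direction = current_direction_deg(0.3, 0.4)
```

The same two functions can be applied to the truth, the reference run and the assimilated run before computing error statistics on magnitude and direction.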

To assess and validate the capability of the EnKF in estimating the correct water levels, the following sections compare the assimilated water levels and the deterministic model results (reference run) with the truth and present the corresponding RMSEs and percentage improvement plots. In this analysis, the first 240 data points (10 days) are neglected to compensate for the initialization effect (spin-up) in the model simulation.

5.1. Monitoring networks

The accuracy of the water level estimates can vary depending on the availability of relevant water level data at different locations. If the monitoring (observation) stations are very close together or strongly correlated, the available data introduce redundancy in the system. In real-time operational mode, this redundancy can increase the computational cost. Hence, there is a trade-off between the number of observation stations, accuracy and computational cost. To assess the efficacy of the EnKF and the quality of the estimates based on the information available from the observation stations, three monitoring networks are considered:

1) Network A, consisting of the 3 Singapore stations Raffles, Horsburgh and Tanjong Pagar. This network is defined to study the quality of water level estimates that can be expected at all 16 observation stations when only limited local water level information is available. In real time, the water level time series at these stations can be obtained from the Maritime and Port Authority of Singapore (MPA).

2) Network B, consisting of the 3 stations in network A and the 11 Malaysian stations Geting, Cendering, Kuantan, Sedili, Kukup, Keling, Kelang, Lumut, Penang, Langkawi and Tioman, which are located around the Malacca Strait and South China Sea. Altogether, network B consists of 14 stations for which historic measurements of tidal water levels are available from the University of Hawaii Sea Level Center (UHSLC) (the 11 stations) and MPA (the 3 Singapore stations).

3) Network C, consisting of network B plus two Indonesian stations, Medan and Pontianak. This network represents the main sea level monitoring stations in the model domain, located in Singapore, Malaysia and Indonesia. The historical tidal water level data for the Indonesian stations are available from the International Hydrographic Organization (IHO).
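The three networks can be written down as a small configuration, for example to select which stations feed the filter in a given run. The Python sketch below uses the station names from Table 1; the variable and function names are hypothetical and do not correspond to any OpenDA configuration format.

```python
# Network A: the three Singapore stations.
NETWORK_A = ("RAFFLES", "HORSBURGH", "TANJONG PAGAR")

# The eleven Malaysian stations added in network B.
MALAYSIAN_STATIONS = (
    "GETING", "CENDERING", "KUANTAN", "SEDILI", "KUKUP", "KELING",
    "KELANG", "LUMUT", "PENANG", "LANGKAWI", "TIOMAN",
)
NETWORK_B = NETWORK_A + MALAYSIAN_STATIONS

# Network C adds the two Indonesian stations.
NETWORK_C = NETWORK_B + ("MEDAN", "PONTIANAK")


def is_assimilated(station, network):
    """True if observations from this station are used in the filter update."""
    return station in network


# Stations that are evaluated but not assimilated when only network A is used.
validation_only = [s for s in NETWORK_C if not is_assimilated(s, NETWORK_A)]
```

Keeping the evaluation set fixed at all 16 stations while varying the assimilated subset mirrors the way the three networks are compared in the text.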
In addition to the above, the performance of the data assimilation scheme in estimating the water levels is also assessed at locations where no observations are available and which are of regional importance for navigation. These locations of interest (LOIs) lie in the marine navigation corridor around Singapore. Their positions in the SRMC domain are shown in Fig. 1, marked as blue dots in the deep waters.

The twin experiment hindcast runs were carried out with the EnKF as the data assimilation scheme for all three monitoring networks, and its performance with respect to the estimation of water levels is shown in Fig. 5. For all three monitoring networks, the EnKF hindcast runs gave similar improvement in estimating the water levels at the three Singapore stations (Raffles, Horsburgh and Tanjong Pagar). Network A led to some improvement at the stations east of Singapore, but the improvement dropped quickly west of Singapore, resulting in a worsening of the solution at Keling and all stations further west. For Networks B and C, the improvement was over 50% for all stations included.

Fig. 5 Percentage improvement in water level estimates based on EnKF hindcast for networks A, B and C

5.2. Enhancement of water level prediction at observation stations

The results presented in subsection 5.1 show that the combination of observation stations used in network C results in better estimates of water level at all the monitoring stations. Further analysis in this paper is therefore limited to network C. The assimilated water levels and the deterministic model results (reference run) are compared with the truth, and the corresponding RMSEs at the observation stations are shown in Fig. 6. Correcting the water level estimates using the EnKF clearly brings them very close to the true water levels. On average, there is an error of approximately 20 cm between the deterministic model output and the true water levels, which after data assimilation is reduced to approximately 6 cm for all the stations except the westernmost ones. On average, the error reduction is about 70%. For the stations located very close to the Andaman Sea, there is roughly 40% deviation between the true water levels and the assimilated water levels, while there is 20-30% deviation (except at Pontianak) at the other observation stations considered in the domain of study. The higher improvement at the stations located east of Singapore (in the South China Sea) is due to the fact that these are closer to the eastern boundary, where the EnKF makes its major adjustments.

Fig. 6 EnKF hindcast results with network C: RMSE of water levels at the observation stations without and with application of EnKF (bars); the red line presents the percentage improvement (RMSE reduction) at each observation station

5.3. Variability of the estimated water levels

In Section 5.2 it was shown that the RMSE of the estimated water levels is significantly reduced with data assimilation. Scatter plots of the estimated water levels against the true water levels (with and without data assimilation) for the widely distributed stations Langkawi, Raffles, Geting and Pontianak are shown in Fig. 7. The deterministic model predictions clearly exhibit a larger error with respect to the true water levels than those corrected by data assimilation.

Fig. 7 Scatter plots of the estimated water level (m) vs the true water level (m), without and with assimilation, at stations Langkawi, Raffles, Geting and Pontianak for EnKF hindcast and network C

5.4. Enhancement of water level prediction at unobserved locations

The previous sections confirmed that better estimates of water levels are achieved at the observed stations. To verify the performance of the EnKF in predicting the water levels at unobserved locations, nine locations of interest (LOI) were selected in the study domain, as shown in Fig. 1. These LOIs were chosen to lie in the marine corridors near Singapore where real-time satellite track data are not available and local buoys cannot be installed for practical reasons. Reliable estimates of water levels and currents at these locations may have direct practical relevance for vessel guidance. The twin experiment provides synthetic observations at these locations, which are used for local EnKF evaluation in the same way as the results at the monitoring locations. The key difference is that these locations are not used as observations in the EnKF updates. The RMSE values for the corresponding scenarios with and without data assimilation are presented in Fig. 8. The RMSE values of the corrected water levels are reduced drastically, with roughly 60-80% improvement over the deterministic model predictions. This shows the extent of the improvement in prediction that can be achieved based on the spatial distribution properties of the Kalman gain (which is computed from the available data) over the model domain.

Fig. 8 EnKF hindcast results with network C: RMSE of water levels at the 9 LOI stations without and with application of EnKF (bars); the red line presents the percentage improvement (RMSE reduction) at each unobserved location
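The way the Kalman gain spreads a correction from an observed station to unobserved locations can be illustrated with a toy ensemble. The sketch below updates only the ensemble mean for a single scalar observation; it is a minimal illustration under stated assumptions (a three-element state, invented numbers, hypothetical function names), not the OpenDA implementation used in the study. A full EnKF would also update every ensemble member.

```python
def mean_and_gain(ensemble, obs_index, obs_error_var):
    """Ensemble mean and Kalman gain for one scalar observation.

    ensemble: list of members, each a list of state values (e.g. water levels).
    The gain is K[i] = cov(x_i, x_obs) / (var(x_obs) + obs_error_var).
    """
    n_members = len(ensemble)
    n_state = len(ensemble[0])
    mean = [sum(m[i] for m in ensemble) / n_members for i in range(n_state)]
    # Sample covariance of every state element with the observed element.
    cov_with_obs = [
        sum((m[i] - mean[i]) * (m[obs_index] - mean[obs_index]) for m in ensemble)
        / (n_members - 1)
        for i in range(n_state)
    ]
    denom = cov_with_obs[obs_index] + obs_error_var
    return mean, [c / denom for c in cov_with_obs]


def update_mean(ensemble, obs_index, obs_value, obs_error_var):
    """Analysis mean: forecast mean plus gain times innovation."""
    mean, gain = mean_and_gain(ensemble, obs_index, obs_error_var)
    innovation = obs_value - mean[obs_index]
    return [m + k * innovation for m, k in zip(mean, gain)]


# Toy ensemble (invented numbers): element 0 is the observed station,
# element 1 is an unobserved location whose spread is fully correlated
# with element 0, element 2 is uncorrelated with element 0.
ensemble = [
    [0.9, 1.8, 0.5],
    [1.0, 2.0, 0.7],
    [1.1, 2.2, 0.5],
    [1.0, 2.0, 0.3],
]
# obs_error_var=0.0 (a perfect observation) keeps the arithmetic easy to follow.
analysis = update_mean(ensemble, obs_index=0, obs_value=1.2, obs_error_var=0.0)
print([round(x, 3) for x in analysis])  # approximately [1.2, 2.4, 0.5]
```

In this toy case the correlated unobserved element is corrected together with the observed one, while the uncorrelated element is left unchanged. This mirrors the text: the improvement at unobserved locations depends on how the ensemble-derived gain is distributed over the domain.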

5.5. Spatial characterization of the estimated water level over the SRM domain

The previous sections assessed the efficacy of the EnKF in correcting water levels at individual observation stations and in estimating the water levels at the unobserved locations (locations of interest). In this section, we assess the improvement achieved over the whole SRM domain. The spatial distribution of the RMSE at every grid point in the SRM domain is shown in Fig. 9 for the scenarios without and with assimilation. There is a significant improvement in the assimilated water levels over the whole domain, except for the zone near the Andaman Sea open boundary. This clearly indicates that the data assimilation procedure improves the water level estimates in the Malacca Strait and the South China Sea. On the other hand, no considerable improvement in water levels is observed near the Andaman Sea (and the southernmost part of the domain). This can be attributed to the complex effects in the shallow region with varying bathymetry, where there are no water level observations that can be used to correct for these effects, and also to the uncertainties from the open boundaries, which lead to a larger uncertainty along the coast than in the deeper areas. The dynamics of this area seem to be dominated by local phenomena, which prevents the Kalman filter from having an impact over a larger area.

[Figure: two map panels, "RMSE (m) of Water level without DA" and "RMSE (m) of Water level with DA"; x-axis: longitude (deg), 95° E to 110° E; y-axis: latitude (deg), 6° S to 10° N; colour scale 0 to 0.3]

Fig. 9 Spatial plot of RMSE of water level: without and with data assimilation

5.6. Improved estimates of water levels in the Singapore Straits

The overall improvement of the water level representation realised by the EnKF is demonstrated in Fig. 10. The plots show the quality of the estimates without and with assimilation at every grid point around Singapore Island. The water levels predicted by the deterministic model deviate by ~20 cm from the true water levels, whereas the water level estimates with data assimilation deviate by about 5 cm. Given the highly nonlinear tidal interactions caused by the smaller islands around Singapore, this minor deviation is inevitable and practically acceptable.

[Figure: two map panels, "RMSE (m) of Water level without DA" and "RMSE (m) of Water level with DA"; x-axis: longitude (deg); y-axis: latitude (deg); colour scale 0 to 0.3]

Fig. 10 Spatial plot of RMSE of water level in the Singapore Straits without and with data assimilation

5.7. Spatial variation of residual currents

To schedule harbor facilities, docking and sailing times in the Singapore Straits accurately, an accurate estimate of the residual currents is very important in addition to the water levels. These residual currents are non-tidal and can dominate the tidal flow conditions, influencing the driving forces and the large-scale phenomena in Singapore waters. Because current measurements are not available in this region, accurate estimates are highly desirable. Although only the modelled water levels are used as control variables, the EnKF updates the whole model state, i.e., water levels and currents. Through the model dynamics the currents are therefore also corrected automatically, along with the water levels. The improvement in the non-tidal current estimates achieved by data assimilation is shown in terms of RMSE in Figures 11 and 12 for magnitude and direction respectively. Again, there is a significant improvement in the region influenced by the South China Sea. In the Singapore Straits in particular, there is considerable improvement in the prediction of the non-tidal currents for both magnitude and direction. As observed for the water levels, there is little to no improvement in the estimates of currents near the Andaman Sea.
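The magnitude and direction errors discussed here can be derived from the two velocity components. The sketch below shows one way to do this; the direction convention (degrees counter-clockwise from the +u axis) and the wrapping of angular differences into [-180, 180) are assumptions made for illustration, since the text does not state how the direction RMSE was computed.

```python
import math


def speed_and_direction(u, v):
    """Magnitude (m/s) and direction (degrees in [0, 360), counter-clockwise from +u)."""
    return math.hypot(u, v), math.degrees(math.atan2(v, u)) % 360.0


def angular_difference(a_deg, b_deg):
    """Smallest signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0


def direction_rmse(est_deg, true_deg):
    """RMSE of direction that treats 350 and 10 degrees as 20 degrees apart."""
    diffs = [angular_difference(a, b) for a, b in zip(est_deg, true_deg)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


speed, direction = speed_and_direction(3.0, 4.0)
print(speed)                                        # 5.0
print(direction_rmse([350.0, 10.0], [10.0, 350.0]))  # 20.0
```

Without the wrapping step, estimates on either side of the 0/360 degree boundary would produce errors of several hundred degrees, so some form of wrapping is usually needed when a direction error is reported.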

[Figure: two map panels, "RMSE (m/s) of currents (magnitude) without DA" and "RMSE (m/s) of currents (magnitude) with DA"; x-axis: longitude (deg), 95° E to 110° E; y-axis: latitude (deg), 6° S to 10° N; colour scale 0 to 0.2]

Fig. 11 Spatial plot of RMSE (m/s) of magnitude of non-tidal currents without and with data assimilation over the whole SRM domain

[Figure: two map panels, "RMSE (degrees) of currents (direction) without DA" and "RMSE (degrees) of currents (direction) with DA"; x-axis: longitude (deg), 95° E to 110° E; y-axis: latitude (deg), 6° S to 10° N; colour scale 0 to 120]

Fig. 12 Spatial plot of RMSE (degrees) of direction of non-tidal currents without and with data assimilation over the SRM domain

5.8. Quality of water level estimates during the occurrence of SLA

To establish the capability of the EnKF in estimating the peak water levels and residual currents during an SLA event, the improvement in the estimated water levels over the whole domain is assessed for the instant when the first positive SLA event (~26th January 2004 10:00) is recorded. The absolute non-tidal water level error is shown in Fig. 13 for the whole SRM domain. These results clearly show that during this SLA event the nonlinear tidal characteristics are captured, which results in improved estimates of water levels at the observation stations as well as at other locations in the domain.

[Figure: two map panels (without and with data assimilation); x-axis: longitude (deg), 95° E to 110° E; y-axis: latitude (deg), 6° S to 12° N; colour scale 0.05 to 0.55]

Fig. 13 Spatial plot of absolute error in water level without and with data assimilation during the period of occurrence of SLA (26th January 2004 10:00)

6. Improved forecasting through Kalman filtering / Data Assimilation

In the preceding sections, the efficacy of the EnKF based data assimilation scheme in improving the deterministic model predictions was demonstrated via twin experiment results from hindcast simulations. Next, we assess the quality of the model forecasts for a finite horizon. In the present study, the corrected states obtained during the hindcast runs were stored at 1 hr intervals. The model state corrected via data assimilation up to the present time is used as the initial state for a deterministic model simulation 24 hours into the future. Thus, new forecasts were generated at 1 hour intervals in a moving horizon fashion, where the length of the forecast horizon was 24 hrs. In the OpenDA framework, this was carried out by storing the EnKF corrected states in restart files, from which the forecast simulations are initiated. The forecast simulations covered the period from 12th January 2004 00:00 to 30th January 2004 23:00. Thus, the last forecasted value is available for 31st January 2004 23:00 and the total number of forecast cycles is 24 cycles/day x 19 days = 456 cycles. These 456 forecast cycles are then used to compute the RMSE as a function of forecast horizon 0, 1, 2, …, 24 hours. The RMSE value corresponding to the ith forecast horizon at a particular station is given as

$$\mathrm{RMSE}_i = \sqrt{\frac{1}{456}\sum_{k=1}^{456}\left(WL^{i}_{True,k} - WL^{i}_{k}\right)^{2}} \qquad (7)$$

where $WL_{True}$ is the true water level and $WL_k$ is the corresponding water level in the kth forecast cycle. The forecast RMSE for the reference model is computed in an analogous fashion.
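The RMSE per forecast horizon in Eq. (7) can be sketched as follows. The data layout (one list of horizon values per forecast cycle) and the function name are assumptions for illustration; the toy input uses two cycles and three horizons rather than the 456 cycles and 25 horizons of the study.

```python
import math


def rmse_by_horizon(forecasts, truth):
    """RMSE for each forecast horizon, averaged over forecast cycles.

    forecasts[k][i]: forecast water level of cycle k at horizon i.
    truth[k][i]:     true water level at the same valid time.
    """
    n_cycles = len(forecasts)
    n_horizons = len(forecasts[0])
    result = []
    for i in range(n_horizons):
        squared = sum((truth[k][i] - forecasts[k][i]) ** 2 for k in range(n_cycles))
        result.append(math.sqrt(squared / n_cycles))
    return result


# Number of cycles stated in the text: hourly forecasts over 19 days.
n_cycles_in_study = 24 * 19  # 456

# Toy input (invented numbers): 2 cycles, 3 horizons.
forecasts = [[0.0, 0.1, 0.3], [0.0, -0.1, 0.5]]
truth = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.1]]
print([round(r, 4) for r in rmse_by_horizon(forecasts, truth)])  # [0.0, 0.1, 0.3536]
```

The same function applied to the forecasts of the reference model gives the curve against which the EnKF-initialised forecasts are compared.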

6.1. Forecasts at the observation stations for a finite prediction horizon

The forecast quality in terms of RMSE for forecast horizons up to 24 hrs is analyzed at all monitoring and LOI stations in the study domain. Figures 14 - 16 show the forecast results for the three stations Keling, Raffles and Pontianak, which are well distributed across the domain. Clearly, the forecasts from the EnKF corrected states are better than those from the deterministic model up to a forecast horizon of 12 hrs. This implies that the data assimilation improves the forecast of water levels up to 12 hrs into the future, after which the effect of the initial conditions dies out and the forecast starts to match that of the deterministic model. An exact match does not occur, as there will be unspecified (secondary) origins of errors for which the EnKF also tries to correct. Further, over a short horizon of up to 6 hrs, there is a significant improvement in the forecast quality, as is evident from the correspondingly low RMSE values. A mixed trend is observed when forecasting the magnitude and direction of currents. The forecasts of current direction are better up to 24 hrs, whereas the quality of the current magnitude predictions first improves, but beyond 6 hrs it starts to fluctuate around the quality of the deterministic results. This variation can be attributed to the fact that current magnitudes have a cyclic behaviour with a period half that of the water levels. For the Keling station shown in Figure 14, there is no change in the direction of the currents for either the deterministic or the assimilated model runs; this is because the location of this station in the grid is aligned with the land boundary, so the velocity component in the normal direction (v) is zero.

[Figure: three panels, Water level (RMSE, m), Current (Magnitude) (RMSE, m/s) and Current (Direction) (RMSE, degrees), against Forecast Horizon (h) from 1 to 24; series: Model without assimilation, EnKF - Forecast]

Fig. 14 RMSE of water levels and currents (magnitude and direction) at Keling station for a 1, 2, ..., 24 hrs forecast horizon

[Figure: three panels, Water level (RMSE, m), Current (Magnitude) (RMSE, m/s) and Current (Direction) (RMSE, degrees), against Forecast Horizon (h) from 1 to 24; series: Model without assimilation, EnKF - Forecast]

Fig. 15 RMSE of water levels and currents (magnitude and direction) at Raffles station for a 1, 2, ..., 24 hrs forecast horizon

[Figure: three panels, Water level (RMSE, m), Current (Magnitude) (RMSE, m/s) and Current (Direction) (RMSE, degrees), against Forecast Horizon (h) from 1 to 24; series: Model without assimilation, EnKF - Forecast]

Fig. 16 RMSE of water levels and currents (magnitude and direction) at Pontianak station for a 1, 2, ..., 24 hrs forecast horizon

6.2. Forecasts at the unobserved locations for a finite prediction horizon

The efficacy of the EnKF in improving the forecasts of water levels and currents at two locations, LOI-2 and LOI-6, is demonstrated via forecast error plots in Figures 17 and 18. The forecast quality of the estimates (water levels and currents) is far better than that of the deterministic model predictions for a forecast window up to 6 hrs. For a longer prediction horizon of 24 hrs, the quality of the forecasts varies between locations, but the forecasts of current direction remain superior to the predictions from the deterministic model. At the stations in Singapore waters, the improvements in currents generally persist over a longer forecast horizon than those in water levels, which is related to the fact that in Singapore waters the currents have a predominantly diurnal nature while the water levels are predominantly semidiurnal (van Maren and Gerritsen 2012).

[Figure: three panels for LOI-2, Water Level (RMSE, m), Current (Magnitude) (RMSE, m/s) and Current (Direction) (RMSE, degrees), against Forecast Horizon (h) from 1 to 24; series: Model without assimilation, EnKF - Forecast]

Fig. 17 Plot showing the RMSE of water level, current (magnitude) and current (direction) at unobserved location LOI-2 for a 1, 2, ..., 24 hrs forecast horizon

[Figure: three panels for LOI-6, Water Level (RMSE, m), Current (Magnitude) (RMSE, m/s) and Current (Direction) (RMSE, degrees), against Forecast Horizon (h) from 1 to 24; series: Model without assimilation, EnKF - Forecast]

Fig. 18 Plot showing the RMSE of water level, current (magnitude) and current (direction) at unobserved location LOI-6 for a 1, 2, ..., 24 hrs forecast horizon

6.3. Comparison of the quality of estimates for hindcast and forecast runs

In the previous sections, the benefits of using the EnKF as a data assimilation technique for hindcast and for forecast were studied individually. We now present the water level estimates on the same time scale. To compare the quality of the estimates for a specific period, the hindcast runs are first carried out for a time horizon of 3 days starting from 27th January 2004 00:00. For the 24 hrs forecast run, the corrected state at 29th January 2004 00:00 is stored as a restart file. Using this corrected state as the initial condition, the deterministic model is run forward over a time horizon of 24 hrs. The effectiveness of the EnKF (hindcast and forecast) is shown at the Keling and Pontianak stations by comparing its estimates against the true water levels and against those predicted by the SRM without assimilation. The time series of water level estimates from the different schemes are compared in Figures 19 - 20. The estimates of water levels in the hindcast run almost coincide with the true water levels, which indicates the superiority of the EnKF (hindcast) over the deterministic model predictions (without assimilation). For the forecast runs, the quality of the estimates is high and generally in good agreement with the hindcast runs. For a prediction window of 6 hrs, the estimates from the EnKF forecast almost match the estimates obtained in the EnKF hindcast runs. As the forecast horizon is extended beyond 6 hrs up to 12 hrs, the water level estimates are still close to the true water levels and better than the water levels predicted by the deterministic model. For longer forecast horizons (up to 24 hrs), the quality of the estimates deteriorates and becomes similar to that of the deterministic model predictions as the improved initial conditions are washed out. This behaviour occurs essentially for all stations and all forecasting periods. This observation thus explains the quality of the water level estimates shown in Figures 14 - 16 and the trends of the forecasts for varying prediction windows.

[Figure: time series at Keling; y-axis: Water Level (m); x-axis: Time horizon, -48 hrs to +24 hrs; series: Truth, Deterministic Model, Hindcast (EnKF), Forecast (EnKF)]

Fig. 19 Efficacy of EnKF at Keling station – hindcast vs forecast; truth and deterministic results also shown

[Figure: time series at Pontianak; y-axis: Water Level (m); x-axis: Time horizon, -48 hrs to +24 hrs; series: Truth, Deterministic Model, Hindcast (EnKF), Forecast (EnKF)]

Fig. 20 Efficacy of EnKF at Pontianak station – hindcast vs forecast; truth and deterministic results also shown

7. Network optimization based on correlation analysis

In the previous hindcast runs, all the observation stations were used for assimilation. In reality, there may be occasions when the information available at some of these observation stations is limited. In such situations, the missing information may degrade the quality of the forecasts. Besides this, the information available at different observation stations may be correlated. Reducing the number of observation stations used for data assimilation not only reduces the redundancy in the system, but also reduces the dimension of the Kalman gain matrix and consequently the computation time. Along these lines, we try to define the optimal network by examining the correlation between the water levels at the different monitoring (observation) stations and its influence on the quality of the forecasts.

7.1. Cross correlation of the water levels at the observation stations

The cross-correlation between the time series of water levels at the monitoring stations across the study domain is shown in Fig. 21. This correlation plot with no time lag indicates that the water levels at some stations are strongly correlated with each other. To verify the quality of the estimates obtained with a reduced set of monitoring stations, different configurations (based on the correlation analysis) are considered.

[Figure: "Correlation Matrix" for the stations Langkawi, Penang, Lumut, Kelang, Keling, Kukup, Raffles, Tg-Pagar, Horsburgh, Sedili, Tioman, Kuantan, Cendering, Geting, Medan and Pontianak on both axes; colour scale 0.1 to 1]

Fig. 21 Cross correlation matrix of water level at each station with no time lag
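A zero-lag correlation matrix of the kind shown in Fig. 21 can be computed directly from the station time series. The sketch below uses the Pearson correlation coefficient, which is an assumption about the measure used; the station labels "A", "B" and "C" and the synthetic sinusoidal series are hypothetical and stand in for the actual water level records.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series (no time lag)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(series_by_station):
    """Station names and the matrix of pairwise zero-lag correlations."""
    names = list(series_by_station)
    matrix = [
        [pearson(series_by_station[a], series_by_station[b]) for b in names]
        for a in names
    ]
    return names, matrix


# Synthetic hourly series over one 24 h cycle (hypothetical stations).
t = [2.0 * math.pi * k / 24 for k in range(24)]
series = {
    "A": [math.sin(v) for v in t],
    "B": [2.0 * math.sin(v) + 0.1 for v in t],  # same phase as A: correlation near 1
    "C": [math.cos(v) for v in t],              # quarter-cycle shift: correlation near 0
}
names, corr = correlation_matrix(series)
```

Stations whose signals are in phase appear as strongly correlated pairs in such a matrix, which is the basis for treating some of them as redundant in the configurations considered next.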

7.2. Configurations based on the correlation

Based on the degree of cross correlation, three different cases were identified:

Case (i): nine stations (Penang, Lumut, Keling, Raffles, Horsburgh, Tioman, Cendering, Geting and Pontianak) are considered as observation stations.

Case (ii): ten stations (Medan, Langkawi, Lumut, Keling, Raffles, Horsburgh, Tioman, Cendering, Geting and Pontianak) are considered as observation stations.

Case (iii): eleven stations (Medan, Langkawi, Penang, Lumut, Keling, Raffles, Horsburgh, Tioman, Cendering, Geting and Pontianak) are considered as observation stations.

The remaining stations in each case are considered as validation stations, where the water levels and residual currents are estimated. The benefit of using different combinations of observation stations based on the correlation analysis is shown in Figures 22 - 24, where the hindcast results for the respective cases (designated as network '2') are compared with the hindcast results of network '1', in which all sixteen stations are considered as observation stations. The percentage improvement at the validation stations for the different networks is circled in Figures 22 - 24. For case (iii), with eleven stations as observation stations, the hindcast results are close to the hindcast results of network '1'. Moreover, with network '2' for case (iii), the stations close to the Andaman Sea and the South China Sea show a slight improvement in the predicted water levels over network '1', whereas at the other stations the improvements in predicted water levels are about 4% less than for network '1'. Hence, of these three configurations, case (iii) can be used as the most suitable monitoring network for estimating the water levels and currents in the Singapore regional waters.
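The text does not state how the three cases were derived from the correlation matrix. Purely as an illustration of how a correlation matrix can be turned into a reduced network, the sketch below applies a greedy rule: a station is kept unless it is correlated above a chosen threshold with a station already kept. The rule, the threshold, the station labels and the matrix values are all assumptions and are not the procedure or data of the study.

```python
def prune_by_correlation(names, corr, threshold):
    """Greedy selection: keep a station only if its absolute correlation with
    every station kept so far is below `threshold`."""
    kept = []
    for i in range(len(names)):
        if all(abs(corr[i][j]) < threshold for j in kept):
            kept.append(i)
    return [names[i] for i in kept]


# Hypothetical three-station example (invented correlations).
names = ["S1", "S2", "S3"]
corr = [
    [1.00, 0.98, 0.40],
    [0.98, 1.00, 0.50],
    [0.40, 0.50, 1.00],
]
print(prune_by_correlation(names, corr, threshold=0.95))  # ['S1', 'S3']
```

The result of such a rule depends on the order in which stations are visited and on the threshold, so any candidate network would still need to be checked against validation stations, as is done for cases (i) to (iii).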

[Figure: % Improvement (0 to 100) at the stations Medan, Langkawi, Penang, Lumut, Kelang, Keling, Kukup, Raffles, Tg. Pagar, Horsburgh, Sedili, Tioman, Kuantan, Cendering, Geting and Pontianak; series: Network 1, Network 2 (Case i)]

Fig. 22 Percentage improvement in predicted water levels by implementing EnKF over the free model run for case (i); % improvement at the validation stations is marked with dotted circles

[Figure: % Improvement (0 to 100) at the stations Medan, Langkawi, Penang, Lumut, Kelang, Keling, Kukup, Raffles, Tg. Pagar, Horsburgh, Sedili, Tioman, Kuantan, Cendering, Geting and Pontianak; series: Network 1, Network 2 (Case ii)]

Fig. 23 Percentage improvement in predicted water levels by implementing EnKF over the free model run for case (ii); % improvement at the validation stations is marked with dotted circles

[Figure: % Improvement (0 to 100) at the stations Medan, Langkawi, Penang, Lumut, Kelang, Keling, Kukup, Raffles, Tg. Pagar, Horsburgh, Sedili, Tioman, Kuantan, Cendering, Geting and Pontianak; series: Network 1, Network 2 (Case iii)]

Fig. 24 Percentage improvement in predicted water levels by implementing EnKF over the free model run for case (iii); % improvement at the validation stations is marked with dotted circles

8. Summary and conclusions

In this study a systematic analysis is performed of the ensemble Kalman filter as a data assimilation scheme for improving the accuracy of modelling and predicting the water levels and residual currents in the Singapore regional waters by means of the Singapore Regional Model (SRM). An SRM twin experiment was designed in which non-tidal (surge) effects from the South China Sea were represented in the SRM open boundary conditions. This reflects the situation that the major surges are not generated internally but propagate in from the South China Sea. The setup attempts to reproduce the major surge period of January 2004 and hence provides a realistic test bed for the assessment of the EnKF as a data assimilation scheme for this large scale application. The efficacy of the EnKF is assessed through hindcast simulations that predict the water levels at the observed stations as well as at the unobserved locations. These stations are widely spread over the area of interest, and the variation in outcome clearly reflects the sensitivity of the domain. In the hindcast runs, the EnKF yields accurate estimates of water levels at both the observation stations and the unobserved locations. On average, the improvement over the deterministic model prediction in estimating the water levels at all the stations considered is almost 70%. At the stations close to the Andaman Sea boundary, the effects of the uncertainties are significant because the varying bathymetry gives rise to local processes, which results in a more local and smaller improvement compared to the stations close to the South China Sea and the Java Sea. A similar improvement is observed spatially over the whole domain, and there is significant improvement at the stations influenced by the EnKF updates of the boundary conditions at the South China Sea. The improvement is largest for the stations located in the Singapore Strait.

Using only the water level observations available at the observation stations, the EnKF is able to generate accurate estimates of currents as well, since it updates the full model state. The improvement in the quality of the water level and current forecasts is very significant up to 12 hrs; beyond that the improvement is mixed. An optimal network of monitoring stations is designed based on the correlation analysis to reduce the redundancy in the system and thereby decrease the computation time for generating daily operational forecasts for the Singapore region. From this study it can be concluded that the EnKF is a very efficient data assimilation technique for improving the description of the nonlinear dynamics in this area and thus for enhancing the capability of the hydrodynamic model to forecast water levels and residual currents accurately. The forecast quality of water levels is improved very significantly for a forecast horizon of 12 hrs. This short term forecast horizon is practically reasonable because this region is highly dominated by mixed (diurnal and semidiurnal) tidal constituents. In view of the non-availability of current measurements in this region, accurate estimates of the residual currents in the domain of interest are highly beneficial. The estimates of currents can be further improved by incorporating the wind data from WRF, which is available in this region every 6 hrs; the forecasts of these residual currents may thereby yield better estimates for a longer forecast horizon. The optimal network of monitoring stations (case iii) can be used as the most suitable monitoring network for estimating the water levels and currents in the Singapore regional waters.

Acknowledgements

The authors gratefully acknowledge the support and contributions of the Singapore-Delft Water Alliance (SDWA) and Deltares' strategic research funding. The research presented in this work was carried out as part of SDWA's "Must-Have Box" research program (R-303001-003-272). The authors also thank MPA and UHSLC for providing the maritime data for analysis. The authors wish to thank Martin Verlaan, Erwin Loots, Arjen Markus and Stef Hummel for the support in using the OpenDA software. The authors also wish to thank Alamsyah Kurniawan, Seng Keat Ooi, Piyamarn Sisomphon, Pavlo Zemskyy and Serene Tay for providing the necessary help in nesting and generating the model.

33

References 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65

Babovic V (2007) Data-model Integration - An Approach to assimilation of Sea Level anomaly Data into an Oceanographic Model. In: Navarro PG, Playan E (eds) Num Modelling of Hydrodynamics for Water Resources, Zaragoza, SPAIN, 2007. pp 67-75 Babovic V, Karri RR, Wang X, Ooi SK, Badwe A (2011) Efficient Data Assimilation for Accurate Forecasting of Sea-Level Anomalies and Residual Currents using the Singapore Regional Model. Geophys Res Abstr 13. doi:EGU2011-8492 Babovic V, Sannasiraj SA, Chan ES (2005) Error correction of a predictive ocean wave model using local model approximation. J Marine Syst 53 (1-4):1-17 Calkoen C, Wensink H, Twigt D, Sisomphon P, Mynett A (2009) Sea level anomalies from satellite altimetry - retrieval and validation. In: Proc of the 8th Int conf on Hydroinform, Chile, 2009. p 8. doi:188a164 Carme S, Dinh-Tuan P, Verron J (2001) Improving the singular evolutive extended Kalman filter for strongly nonlinear models for use in ocean data assimilation. Inverse Prob 17 (5):1535-1559 Chen C, Malanotte-Rizzoli P, Wei J, Beardsley RC, Lai Z, Xue P, Lyu S, Xu Q, Qi J, Cowles GW (2009) Application and comparison of Kalman filters for coastal ocean problems: An experiment with FVCOM. J Geophys Res 114 (C5):C05011 Chui CK, Chen G (2009) Kalman Filtering: With Real-Time Applications. 4th edn. SpringerVerlag, New York Inc El Serafy G, Gerritsen H, Hummel S, Weerts A, Mynett A, Tanaka M (2007) Application of data assimilation in portable operational forecasting systems—the DATools assimilation environment. Ocean Dynam 57 (4):485-499 El Serafy G, Verlaan M, Hummel S, Weerts A, Dhondia J (2010) OpenDA Open Source Generic Data Assimilation Environment and its Application in Process Models. Geophys Res Abstr 12. doi:EGU2010-9346-2 Evensen G (2007) Data Assimilation-The ensemble Kalman filter. Springer-Verlag Berlin Heidelberg, Fuhrman DR (2001) Data Assimilation and Error Prediction Using Local Models. IHE, Delft

Gaur S, Deo MC (2008) Real-time wave forecasting using genetic programming. Ocean Eng 35 (11-12):1166-1172

Gerritsen H, Twigt D, Mynett A, Calkoen C, Babovic V (2009) MH Box - analysis and prediction of Sea Level Anomalies and associated currents in Singapore and Malacca straits. In: Proc of 8th Int Conf on Hydroinform, Chile, p 10. doi:188a163

Ghorbani MA, Khatibi R, Aytek A, Makarynskyy O, Shiri J (2010) Sea water level forecasting using genetic programming and comparing the performance with Artificial Neural Networks. Comput Geosci 36 (5):620-627

Heemink AW, Verlaan M, Segers AJ (2001) Variance Reduced Ensemble Kalman Filtering. Mon Weather Rev 129 (7):1718-1728

Kalman RE (1960) A new approach to linear filtering and prediction problems. J Basic Eng 82 (1):35-45

Karri RR, Wang X, Ooi SK, Babovic V, Gerritsen H (2011) Improving predictions of water levels and currents for Singapore regional waters through Data Assimilation using OpenDA. In: Proc of 34th IAHR Biennial Congress, Brisbane, Australia, Engineers Australia, pp 4521-4528

Kernkamp H, Zijl F (2004) Further Hydraulic Model Studies for Pulau Ubin & Pulau Tekong Reclamation Scheme. Interim Report on Hydrodynamic Modelling - Model Set-up and Calibration. Delft Hydraulics Report Z3437, Delft, The Netherlands

Kurniawan A, Ooi SK, Gerritsen H, Twigt D (2010) Calibrating the Regional Tidal Prediction of the Singapore Regional Model using OpenDA. In: Tao J, Chen Q, Liong SY (eds) Proc of 9th Int Conf on Hydroinform, Tianjin, China, Chemical Industry Press, pp 1406-1413

Kurniawan A, Ooi SK, Hummel S, Gerritsen H (2011) Sensitivity analysis of the tidal representation in Singapore Regional Waters in a data assimilation environment. Ocean Dynam 61 (8):1121-1136

Madsen H, Canizares R (1999) Comparison of extended and ensemble Kalman filters for data assimilation in coastal area modelling. Int J Numer Methods Fluids 31 (6):961-981

Moeini M, Etemad-Shahidi A, Chegini V, Rahmani I (2012) Wave data assimilation using a hybrid approach in the Persian Gulf. Ocean Dynam 62 (5):785-797

Ooi SK, Zemskyy P, Sisomphon P, Gerritsen H, Twigt D (2009) The effect of grid resolution and weather forcing on hydrodynamic modelling of South East Asian waters. In: Proc of 33rd IAHR Congress, Vancouver, Canada, pp 3712-3719

Ponsar S, Luyten P (2009) Data assimilation with the EnKF in a 1-D numerical model of a North Sea station. Ocean Dynam 59 (6):983-996

Rao R, Babovic V (2009) Wavelet transformation based data assimilation for improved ocean hydrodynamic modelling - Singapore regional model case study. In: Proc of 8th Int Conf on Hydroinform, Chile, p 10. doi:188a98

Rao R, Babovic V (2010) Genetic Programming based Sea Level Anomaly forecasting models as Data Assimilation tools. In: Proc of 17th IAHR-APD Congress, Auckland, New Zealand, 21-24 Feb 2010

Rao R, Gerritsen H, Van den Boogaard H, Babovic V (2010) South China Sea winds as a triggering mechanism for sea level anomalies around the Singapore coast. In: Tao J, Chen Q, Liong SY (eds) Proc of 9th Int Conf on Hydroinform, Tianjin, China, Chemical Industry Press, pp 1266-1273

Sannasiraj SA, Babovic V, Chan ES (2005) Local model approximation in the real time wave forecasting. Coastal Eng 52 (3):221-236

Sannasiraj SA, Zhang H, Babovic V, Chan ES (2004) Enhancing tidal prediction accuracy in a deterministic model using chaos theory. Adv Water Resour 27 (7):761-772

Sorensen JVT, Madsen H (2004) Efficient Kalman filter techniques for the assimilation of tide gauge data in three-dimensional modeling of the North Sea and Baltic Sea system. J Geophys Res 109 (C3):C03017

Sun Y, Babovic V, Chan E (2012) Artificial neural networks as routine for error correction with an application in Singapore regional model. Ocean Dynam 62 (5):661-669

Sun YB, Babovic V, Chan ES (2010) Multi-step-ahead model error prediction using time-delay neural networks combined with chaos theory. J Hydrol 395 (1-2):109-116

Tippett MK, Anderson JL, Bishop CH, Hamill TM, Whitaker JS (2003) Ensemble Square Root Filters. Mon Weather Rev 131 (7):1485-1490


van Maren DS, Gerritsen H (2012) Residual flow and tidal asymmetry in the Singapore Strait, with implications for resuspension and residual transport of sediment. J Geophys Res 117 (C4):C04021

Verlaan M, Heemink A (1997) Tidal flow forecasting using reduced rank square root filters. Stoch Hydrol Hydraul 11 (5):349-368

Weerts AH, El Serafy GY, Hummel S, Dhondia J, Gerritsen H (2010) Application of generic data assimilation tools (DATools) for flood forecasting purposes. Comput Geosci 36 (4):453-463

Wei J, Malanotte-Rizzoli P (2010) Validation and application of an ensemble Kalman filter in the Selat Pauh of Singapore. Ocean Dynam 60 (2):395-401

Yen P-H, Jan C-D, Lee Y-P, Lee H-F (1996) Application of Kalman Filter to Short-Term Tide Level Prediction. J Waterw Port C 122 (5):226-231

Zhang ZX, Li CW, Qi YQ, Li YS (2006) Incorporation of artificial neural networks and data assimilation techniques into a third-generation wind-wave model for wave forecasting. J Hydroinform 8 (1):65-76
