A Case Study of Wave Forecast Over South China Sea Using SWAN Model and Ensemble Kalman Filter Method

The marine and atmospheric model’s accuracy mainly depends on the quality of the initial and boundary conditions, numeric, and physics of the model, and on the variation of the phenomena (e.g., chaos of atmosphere) [1-3]. In the wave prediction problem, the wave field is mainly calculated based on the wave–action balance equations (e.g., the WAVEWATCH III, simulating waves nearshore SWAN, wave model WAM) with the given initial and boundary conditions. There are two types of boundary conditions, including surface and open boundaries. The surface boundary is a field of meteorological factors (wind and pressure fields) extracted from meteorological forecasting models. For the open boundaries, the wave direction, period, wave height or wave spectrum from models or observations can be used at a limited area, such as the South China Sea East Sea of Vietnam (SCS). There are two ways to determine the initial condition for the wave model: (i) The initial condition is the state of the calm sea surface until the actual state of the sea surface is reached to continue calculating the forecast (for example, in the SCS, this process usually takes 48 to 72 h); and (ii) using the analysis wave field as the initial condition. This initial field can be either the observational data or the wave field from the previous forecast (the hot start running).


Introduction
The marine and atmospheric model's accuracy mainly depends on the quality of the initial and boundary conditions, numeric, and physics of the model, and on the variation of the phenomena (e.g., chaos of atmosphere) [1][2][3]. In the wave prediction problem, the wave field is mainly calculated based on the wave-action balance equations (e.g., the WAVEWATCH III, simulating waves nearshore -SWAN, wave model -WAM) with the given initial and boundary conditions. There are two types of boundary conditions, including surface and open boundaries. The surface boundary is a field of meteorological factors (wind and pressure fields) extracted from meteorological forecasting models. For the open boundaries, the wave direction, period, wave height or wave spectrum from models or observations can be used at a limited area, such as the South China Sea -East Sea of Vietnam (SCS). There are two ways to determine the initial condition for the wave model: (i) The initial condition is the state of the calm sea surface until the actual state of the sea surface is reached to continue calculating the forecast (for example, in the SCS, this process usually takes 48 to 72 h); and (ii) using the analysis wave field as the initial condition. This initial field can be either the observational data or the wave field from the previous forecast (the hot start running).
It is found that the errors of the initial conditions of models can be alleviated using data assimilation (DA) methods, from empirical methods (successive correction method, nudging) to statistical methods (optimal interpolation and variational (VAR) methods) and advanced methods (Kalman Filter-KF, Ensemble Kalman Filter-EnKF or Hybrid VAR/EnKF) [4][5][6][7][8][9][10][11]. The variational method uses Bayesian theory to find the optimal analysis state for a given/priori state (or the background or shortrange forecast), deviation/innovation of the model, and limitednumber-observations and the prior/predefined information Oceanography & Fisheries Open access Journal of the background and observations (Gaussian distribution of background and observation covariance errors). Due to the limit of the priori error statistics from a prescribed source, as it relies heavily on the system, the ensemble-based DA methods (e.g., EnKF) retrieve it from an ensemble so that the use of linearized models becomes unnecessary [12].
Several DA applications for oceanographical problems can be listed here. While a successive correction method or statistical interpolation (SI) has been widely used in atmospheric and wave models by various meteorological centers worldwide, Emmanoui examined ways to improve this method to assimilate altimeter satellite data (from the European Space Agency ESA-ENVISAT) in the WAM model, and the result for the impact on wave forecasting in the Indian Ocean is much more considerable than in isolated areas [13]. More advances in using variational methods have been applied in Hasselmann [8] to the inversion problem with synthetic aperture radar (SAR) data by extending the cost function with a cutoff error term.
During wave forecast (large-scale wave model), the assimilation procedure was applied every time new observational data were available. This method was also revised in Yang-Ming [6] and has been extended more for the wave spectra from pitchand-roll buoy data (a one-dimensional wave spectrum compared to a full two-dimensional wave spectrum of inverted SAR spectrum in the original Hasselmann work). More variational-based DA applications can be found with the validations of individual numerical adjoint procedures in Orzech [10] for a nearshore wave model and can be applied for the public domain of the SWAN model.
The main issue in forcing problems is that the wind sea/ wave quickly loses memory of its initial state and therefore needs forecast range assimilation types along, such as the KF approach [14]. Yoon [11] developed a wave DA scheme based on KF combined with a nonlinear wave model. A number of experiments had been carried out using the synthetic data with noise for one-dimensionally propagating irregular waves, and the results showed that the estimated free surface elevation using the present DA scheme matched well the numerical solution of the nonlinear wave model in the absence of noise. The developed DA scheme can clearly improve the stability of the wave prediction system in comparison with that based on a purely numerical DA scheme. Evensen [15] proposed EnKF in which the background covariance error matrix can be estimated by the spread information of ensemble forecasting systems, for example, Serpoushan [16] using the Open Data Assimilation (OpenDA) toolbox with an EnKF option to improve the wave simulation for the SWAN model with wave data (significant wave height-SWH and peak wave period) recorded at five stations in the Persian Gulf.
Further developing the EnKF method, Lei [9] used an ensemble-based approach (ensemble optimal interpolation or extension of ensemble Kalman smoother-EnKS, more detail in Evensen [17]) to assimilate altimeter SWH with a number of different observations from satellites (Jason-1 and 2 and ENVISAT) and buoys in the Wave Watch III model for the SCS. Results showed that this ensemble-based assimilation scheme provided as many positive outcomes as those of other previous schemes; therefore, it helps to alleviate the computational costs in practice.
The SCS and the focus area of this study is located over the west of the Pacific Ocean and surrounded by the island of Taiwan and the Philippines islands in the east; the Indonesian islands (Borneo, Sumatra) and the Malaysian Peninsula in the south and southeast; and the Indochina peninsula in the west and Southern China mainland in the north. With an area of about 3,400,000 km 2 , the average depth is ~1 km, and the maximum depth is ~5 km [18]. Many studies have been carried out for wave simulations, forecasts and using the observation data from satellites to enhance the quality of initial conditions over the SCS [18][19][20]. The errors of the wave height forecast can be amplified, especially during a tropical cyclone or strong northern winds over the SCS [21].
By using the wave observation of an oil platform over the SCS, this study investigated the capabilities of DA for wave forecast. This research applied the EnKF method, which already integrated in the OpenDA toolbox for the SWAN model. The remainder of the paper is organized as follows. The next section describes an overview of EnKF, the OpenDA toolbox, the SWAN model, the data, and the experimental design. The main results are discussed in Section 3, and conclusions are given in the final section.

EnKF Algorithm and the OpenDA Toolbox
Hong and Kalnay [22] summarized the DA process as the finding of the best combination of the model forecast f x (also called as background fields) and the observation 0 y to generate the improved initial condition or analysis state a x as: where H is the forward observation operator (converting/ interpolating model variables to observation space) If we define that f P and are the error covariance matrices for the forecast and observation and H is the linear perturbation of H then the weighting matrix K is given by Kalman gain formula: According to Evensen [15], if we have forecasts or an ensemble forecast with size and denoted as ( ) f e K x k=1,2,… and the ensemble mean f e x − then the forecast error matrix can be estimated by: In this study, we used the pre-build EnKF procedures of OpenDA package for a wave forecast mode (described in the next section). Developed by Verlaan et al. [23] of the Delft University of Technology in the Netherlands, OpenDA is a generic toolbox to provide many typical efficient DA schemes (e.g., ensembleor variational-based approaches) which is independently compatible with the models, including the SWAN model in this study. Following OpenDA's document, certain interfaces are defined in OpenDA to communicate to outside models which need to go through assimilation procedures [23].
There are some typical ways to generate the ensemble forecast, including using different model parameters or different models with the same forcing or one model but with different forcing (by adding noise to a given forcing). In this study with the OpenDA toolbox, the wave forecast is mainly dependent on wind forcing, and the adding noise to the given wind fields will be used to generate different wind boundaries. The wind effect in source/ sink parts in wave action equations (denoted as γ ) is updated to surface boundaries depending on the time correlation scale, which can be described as the following equation for time step i th : with a given standard deviation for wind fields. In addition to the model error generated by the wind forcing noise, an observation error can also be generated the same way by adding a Gaussian noise to observation when it is updated. Another aspect of assimilation is the practical computations of the covariance matrix. OpenDA uses the horizontal correlation scale () in deriving the i row and j column (denote as ij) covariance element ( , cov i j )of the covariance matrix as in the following equation: ii jj δ is the distance between the ii and jj points (in spherical space). Based on the main parameters, including ensemble size, time correlation scale, horizontal correlation scale, and standard deviation of noise, a few EnFK experiments is setup and listed in Subsection 2.4 of this section.

SWAN Model Description
The study uses the SWAN model for wave forecasts version 41.10. This is a third-generation spectral wave model to simulate and forecast waves generated by surface gravity waves. SWAN integrates with both finite difference and finite element schemes in all dimensions (time, geographic space, and spectral space) for the action balance equation (Booij [24]). The model can be downloaded via the following link: https//www.swanmodel. sourceforge.net/.

Data Bathymetry Data
Depth and shoreline data for SCS are Earth Topography-ETOPO1, which was collected from NOAA [25]. Illustrated in Figure 1, the study domain is the entire SCS stretching from 1.5° to 25° N (from 150,000 to 2,800,000 in UTM 48, projection zone 105 th ) and 99° to 121° E (−150,000 to 2,200,000 in UTM 48, projection zone 105 th ). The study domain used unstructured grids with 10km of gridline in the nearshore areas and 50km of gridline for the offshore areas. The total number of grid cells are 12,858, which were generated by ADCIRC software version 10.0 [26].

Initial Conditions, Boundary Conditions, and Driving Wind Data
For the wave boundary, this study used three open boundaries for waves, including:(i) the Taiwan Strait (marked as 1 in Figure  1) in the north, connected to the Pacific Ocean; (ii) the Bashi Strait (marked as 2 in Figure 1) in the east, connected to the Pacific Ocean; and (iii) the Strait of Malacca (marked as 3 in Figure 1) in the south, connected to the Indian Ocean. The initial conditions were used from previous running (the hot-start setup) and the wind data were taken from wind forecast fields of the global CFS model of the National Center for Environmental Forecasting (NCEP) [27,28], with a horizontal resolution of 0.2 × 0.2 degrees and updated every 1 h, and were converted into UTM 48 projections [29].

Case Study and Observational Data
The period from 16 September 2013 to 20 September 2013 was selected in this research to study the performances of EnKF with single point observation. There was a tropical depression (TD) over the study area for this period (18W TD according to Joint Typhoon Warning Center-JTWC [30]). Formed in SCS, this TD had moved to the west and landed to the middle coastal of Vietnam. Wave observational data for the assimilation experiments were collected from an oil platform named MSP1 of VietSovPetro Company [31], with a temporal resolution of 1h. This station is located offshore of Vung Tau City, which is about 150km southeast of the coast and marked as the red point in Figure 1.
In Table 1, the details of wave observation from MSP1 from 16 September 2013 to 20 September 2013 are shown. In Table 1, the re-analysis of the ECMWF data is also provided for reference but only available for every 6h.

Experimental Designs
As mentioned in Subsection 2.1 of this section, by use of the OpenDA toolbox, the experiments were easily designed based on the different selections of the main factors affecting the DA process, including: (i) the horizontal correlation scale (from Exp01 to Exp05; (ii) the time correlation scale (Exp03, Exp08, Exp09); (iii) the number of ensemble members (Exp03, Exp10); and the (iv) standard deviation of noise for the wind boundary updating processes (Exp03, Exp06, Exp07). Details of all ensemble experiments are presented in Table 2. For all EnKF experiments, the standard deviation of observational noise was set to 0.1m and updated every 1 h. In the EnKF experiment, the observation was updated from 00z 16 September 2013 to 23z 18 September 2013 and the forecast time is from 00z 19 September 2013 to 23z 20 September 2013.  The non-using DA was denoted as the NoEnKF experiment. To reduce the uncertainties of wind driving data for SWAN in the NoEnKF experiment, the study still generated the ensemble forecast for this case, with 20 members and using a 1.5 m/s standard deviation for wind noise. The time scale is 24h. Validation was carried out for the ensemble mean forecast of the EnKF experiment, including: (i) a comparison with the observation data from oil platform MSP1 hourly; and (ii) a comparison with

Oceanography & Fisheries Open access Journal
re-analysis ECMWF data every 6h at computing grid points. The validation of the experiments was based on common verification indexes, including BIAS and Root mean square error (RMSE). If we defined obs(i) as the observed value and fcst(i) as the forecast at time i, and N as the size of the sample: BIAS is determined by the following formula:

Oceanography & Fisheries Open access Journal
The assimilation results and the forecast from the EnKF experiment show the large positive bias of the model for the period from 00z 17 September 2013 to 00z 19 September 2013 (assimilated period), and the EnKF updated these model error information (positive biases) to the initial fields, which can help to clearly reduce the model error for a 24h forecast range (00z 19 September 2013 to 00z 20 September 2013). For the period from 12z 19 September 2013 to 00z 20 September 2013, the model performed quite closely to the observation, but after this period, the sea state had changed by reducing the wave height, and all of the experiments could not catch this result. In fact, this is the typical result while applying the KF algorithm when the phenomena change quite suddenly, for example, when the air temperature changes by a cold surge or the onset of the rainy season [32,33].
Even though there is no improvement for the forecast of the second day, Figure 2b shows the larger ensemble spread (estimated as the standard deviation of ensemble members to its ensemble mean) of most EnKF experiments (~0.2m to 0.4m), which can provide high proportions of ensemble members closer to observations than the NoEnKF forecast. Related to the ensemble spread, Figure 2b shows the stable spreads of NoEnKF around 0.2-0.3m and the lowest spreads in Exp06 relating small wind noise of 0.5m/s. The Exp10 with 40 members can provide the largest spreads, meaning the lowest errors of this experiment, compared to others.  Figure 3 shows that over the South of the Paracel Islands (~8N-12N), all experiments provided an overestimated forecast, with positive BIAS ranging from 0.4-0.8 and a better forecast relating to the North of the Paracel Islands (~12N-20N), with BIAS around ±0.2. In NoEnKF, the bias error is 0.6-0.9 for area 8N-12N (Figure 3a). When using EnkF, this error reduced to 0.3-0.5 for the experiments Exp04, Exp05, and Exp06 (Figure 3eg) and 0.5-0.6 for Exp10 (Figure 3k). The same validations for the RMSE score are in Figure 4. Excluding Exp04, Exp05, Exp06, and Exp10, the rest of the EnKF experiments have quite similar BIAS and RMSE values to NoEnKF. The area with lower BIAS and RMSE scores is also associated with the change of the initial conditions when using DA and will be discussed in the next subsections.

Effects of Parameters on Initial Conditions and Forecast
To evaluate the impact of the number of ensembles to SWH, we studied the calculation of Exp03, the Exp10 options with 20 and 40 ensembles, and kept the horizontal correlation scale fixed to 200km, the wind noise to1.5m/s, and the time correlation scale to 24h (Figure 5c,j for SWH and Figure 6c,j for wave period). The effects of the ensemble size can be seen from the increments of Exp03 and Exp10. The distributions of the increments are quite similar but differ in value. For example, for SWH, there are the same negative increments, but the difference value for Exp03 is about −0.5 to −0.7 whereas for Exp10, it is about −0.3 to −0.5. Another aspect is that when the number of ensembles increases, the change of SWH between ensembles increases, which can generate larger closing to observation members in Exp10 and can somehow reduce the error of Exp10 compared to that of Exp03 (Figure 7d,k).
When changing the time correlation scale in updating the boundary from 12h to 36h in Exp08, Exp03, and Exp09 (in Figure  6h,c,i) with the same horizontal correlation scale, ensemble size, and wind noises, these illustrations show that the temporal correlation scale is proportional to its impact on the ensemble mean SWH in terms of increment magnitude around the observation station (the H s_EnKF -H s_NoEnKF decreases from −0.5m to −1.0m when the time correlation scale changes from 12h to 36h). By contrast, the time correlation scale decreases, and the influence in magnitude and area around MSP1 area is also reduced. The forecast results are similar for the 24h forecast range of Exp08, Exp03, and Exp09 (Figure 7i,d,j), whereas for the 48h forecast range, the 36h time correlation scale experiment provided better results (reducing by 10-15% for the ensemble mean forecast error).
Significant effects were present on distribution and the magnitude of increments of the ensemble mean SWH (Figure 5ae) and wave period (Figure 6a-e) when increasing the horizontal correlation scales from 50km to 5000km (Exp01 to Exp05). In option Exp01 with the small horizontal correlation scale, the ensemble means SWH almost changed around the MSP1 point with −0.4m and −0.8s for the wave period. With horizontal scales from 100-5000km, the increment for mean SWH is about −0.8, but having total distributions, the Exp04 and Exp05 spread the increments over the whole computed domain. In Exp01 and Exp02, when the waves propagate from offshore across the MSP1 point, the ensemble mean SWH decreases dramatically, like the case when the waves propagate through the obstacles, which is inconsistent with actual wave propagation. In the larger horizontal correlation scale, Exp03 to Exp05, the ensemble mean SWH contours are more consistent with the actual ocean wave propagation but still ensure the effectiveness of DA.
Clear effects can be seen on the distributions of increments for initial analysis and on the spread of the ensemble system for the different standard deviation of generating wind noise from 0.5m/s to 2.5m/s in Exp06, Exp03, and Exp07, respectively (for the ensemble mean SWH, in Figure 5f,c,g, and for the wave period, in Figure 6f,c,g). The smaller value of wind noise leads to more stable (or less varied) SWH of ensemble members in each option. By contrast, as the wind noise increases, the SWH of ensemble members in each option increases, especially in the forecast period. For the 48h forecast range, Exp03 ( Figure 7d) and Exp07 (Figure 7h) with larger ensemble spreads can generate more ensemble members closer to observation than in Exp06 ( Figure  7g).

Oceanography & Fisheries Open access Journal Conclusion
This study used the EnKF assimilation method (integrated in the OpenDA system) to investigate the effects of observation (collected from an oil platform) on improving wave forecast with the SWAN model for the SCS for the period from 16 September 2013 to 20 September 2013. A set of numbers of assimilation experiments was established based on the changes of key parameters (ensemble size, standard deviation for wind noise fields, time correlation scales in boundaries updating processes, and horizontal correlation scale in covariance derivation) in EnKF of OpenDA. The wave height assimilation experiments showed the effects on the whole computing domain and for other model variables (e.g., the direction and period of wave).
The larger the horizontal correlation scale was, the more accurate SWH was in both the DA period and the forecasted period. However, the difference between the three options, whether the horizontal correlation scales are 200km, 500km, or 5000km, is insignificant. With a horizontal correlation scale of less than 200km, SWH distribution around the observation station fails to simulate the actual wave propagation. Thus, the horizontal correlation scale of 200km is enough to ensure the calculation results. During the DA period (00z 16 September 2013 to 23z 18 September 2013), the greater the wind noise, the more accurate the results, but the contrary is true in the forecasted period (00z 19 September 2013 to 23z 20 September 2013), meaning that the greater the wind noise, the more inaccurate the forecasted results. Thus, a wind noise of 1.5m/s is sufficient to ensure the results of DA and forecast.
During the DA period, the larger the time correlation scale was, the greater the error of SWH between simulation and observation; however, these errors do not differ significantly among options. By contrast, in the forecasted period, the larger the time correlation scales were, the more accurate the forecast. Thus, a time correlation scale of 36h ensures both the results of DA and forecast. It is noticeable in this study that the larger ensemble size is proportional with more accurate results during both the DA period and the forecasted period. However, a larger number of ensemble sizes require longer calculation time in practice.
Compared to both the observational station of the oil platform and the re-analysis of ECMWF data over the computing domain, the assimilation results and the forecasts from the EnKF experiments show the large positive bias of the model for the period from 00z 17 September 2013 to 00z 19 September 2013 (assimilated period), and the EnKF had updated these model errors (positive biases) to the initial fields, which can help to clearly reduce the model bias for the 24h forecast range (00z 19 September 2013 to 00z 20 September 2013).
For the period from 12z 19 September 2013 to 00z 20 September 2013, the simulations resembled quite closely the observations, but for the 48h forecast range, when the sea state had changed quickly (reducing the wave height), it led to all of the assimilation experiments not being able to have a better adjustment compared to NoEnKF. The EnKF experiments showed a clear lower BIAS and RMSE score relating to the area around the assimilated observation position. Another aspect showing the non-linear behaviors in tuning parameters of EnKF is in Exp06. The Exp06 experiment had the smallest wind noise (leading small ensemble spreads) but can still reduce the BIAS and RMSE compared to other EnKF experiments and NoEnKF. More case studies and a combination of various observations in assimilation will be carried out to produce more comprehensive skill verifications in upcoming studies.