An Assimilation Simulation Approach for the Suspended Sediment Concentration in Inland Lakes Using a Hybrid Perturbation Generation Method

Suspended sediments, one of the most important factors affecting the water environments of inland lakes, are closely related to the migration and interaction of various pollutants. Existing studies show that the suspended sediment concentration can be accurately predicted based on assimilation methods coupled with hydrodynamic models. However, in the hydrological assimilation simulation process, the existing perturbation generation methods consider the perturbation error to follow a random Gaussian distribution, which does not consider the spatial variation characteristics of errors. Thus, in this paper, we proposed a new method that generates a hybrid perturbation field for the assimilation simulation instead of using random error. The proposed approach was validated through assimilation simulations of the suspended sediment concentration of Taihu Lake, China, and five assimilation experiments were conducted. The proposed method was compared with the existing methods for generating the perturbation field. After three days and 72 time steps of assimilation simulation based on the hybrid perturbation field, the proposed assimilation method provided results that were more consistent with the buoy-measured data. These findings demonstrate that the proposed method for generating a hybrid perturbation field has a higher simulation accuracy than other methods and is therefore effective and provides a new idea for the assimilation simulation of suspended sediment concentrations in inland lakes.


Introduction
Due to increasing anthropogenic pressure, the expansion of urban areas and the unreasonable exploitation of water resources, the deterioration of inland lake water environments is becoming increasingly serious (Friese et al. 2010;Wu et al. 2012;Wei et al. 2019). The main influencing factors of inland lake water environments include suspended sediments, plankton and dissolved organic matter. As one of the most important factors affecting the water environment, suspended sediments are carriers of various pollutants and are closely related to the migration and interaction migration and interaction of those pollutants (Wu et al. 2012;Wei et al. 2019;Zhang et al. 2019). Because of the obvious effects of wind and waves on sediments, sediments are easily resuspended, providing living environments for algae and promoting their growth. As a result, resuspension leads to frequent outbreaks of algal blooms. Especially when the concentration of suspended sediment is excessively high, the attenuation of light will cause the death of aquatic plants, resulting in considerable ecological damage and water pollution (Wu et al. 2012;Zhang et al. 2019). Therefore, the simulation and prediction of suspended sediment concentrations are key issues in research on the water environments of inland lakes.
At present, hydrodynamic modelling is one of the main methods for predicting suspended sediment concentrations. Based on coupled hydrodynamic characteristics, hydrodynamic models comprehensively consider the interactions and genetic mechanisms of multiple factors, such as lake bathymetry, wind speed, water depth, sedimentation rate and resuspension, to describe the spatial-temporal distribution characteristics and dynamic evolution of suspended sediments in multiple dimensions (Daneshfaraz and Kaya 2008;Pang et al. 2008;Kouakou et al. 2013;Bright et al. 2020;Politano et al. 2020;Song et al. 2020). However, due to the complexity of hydrodynamic conditions and evolution mechanisms, it is difficult to accurately forecast the concentration of suspended sediments over the long term. In addition, uncertainties in the data and parameters generate initial, measurement and model errors and affect the simulation accuracy.
To overcome the model simulation uncertainty and to improve the model accuracy, the ensemble Kalman Filter (EnKF) data assimilation method can provide an effective solution for the modelling of suspended sediment concentrations (Evensen 2003;Vrugt et al. 2006;Clark et al. 2008;Shokri et al. 2019;Kim et al. 2020). The EnKF, which is based on Bayesian theory, introduces the concept of ensemble forecasting, for which the Monte Carlo method is used, followed by a subsequent filtering matrix calculation. This approach overcomes the defect that other sequential assimilation methods cannot be applied to complex nonlinear systems with a non-Gaussian error distribution (Burgers 1998;Bishop et al. 2001). Ensemble forecasting uses initial fields within a certain error range to represent the initial state distribution and then obtains the analysis field through model prediction. Then, the analysis field is used as the initial field of the next simulation period, and the simulation is carried out iteratively. Ensemble forecasting includes an initial perturbation, a model perturbation and a boundary condition perturbation. The initial perturbation is the first step of the assimilation simulation, and its accuracy will directly affect the quality of the numerical simulation result (Burgers 1998;Bishop et al. 2001).
At present, the methods for generating the initial perturbation in hydrodynamic modelling include the Monte Carlo method, lagged average forecast (LAF) method, breeding of growing modes (BGM) method and ensemble transform Kalman filter (ETKF) method. The Monte Carlo method is based on the concept of ensemble prediction and considers that the main source of the observation error is random error (Leith 1974;Hollingsworth 1980). As a part of the EnKF, a random perturbation of the Gaussian distribution is superimposed on the initial observation field to carry out the model calculation. Then, the standard Kalman filter method based on the prediction update process is implemented. However, due to the control of macro laws and the influence of accidental factors in the geographical model simulation, the spatial variability of errors is neither purely random nor purely structural but is characterized by local randomness and an overall structure. Therefore, the Monte Carlo method does not consider the spatial variation characteristics of errors, so it is difficult to generate a reasonable error field.
For the LAF method, the initial perturbation is superimposed on the initial field of the model for ensemble forecasting, and then the average value of the simulation at the same time is taken as the initial value field with an appropriate time lag (Hoffman and Kalnay 2010). In this ensemble forecast, the influence of the spatial variability of errors is not considered in the generation of the initial perturbation field.
In contrast, the BGM method is based on the existing model and generates the error field from the repeated prediction of the model (Toth and Kalnay 1997;Chen et al. 2018). First, a random perturbation is added to the initial analysis field, and the prediction model is calculated both with and without a perturbation; the perturbed prediction and control prediction are used to generate the difference field. Then, the modified difference field is used as the perturbation field for the next prediction. The above mentioned process can generate a reasonable initial perturbation field after many model predictions. In general, this method is better than the Monte Carlo method. However, the error is regarded as being randomly distributed in the model prediction process.
For the method used to generate the ETKF initial perturbation field, the transformation matrix is calculated from the analysis error covariance containing the observation information, and the size and distribution of the perturbation are adjusted by the transformation matrix (Bishop et al. 2001;Majumdar 2015). Compared with the Monte Carlo method and BGM method, this method can improve the generation of the error field by introducing the concept of the transformation matrix and expansion factor. When generating the initial perturbation field, the model error and observation error obey random and static distributions, and the influence of the spatial structure of the model error is not considered.
The above analysis shows that the EnKF assimilation method coupled with a hydrodynamic model can effectively improve the accuracy of predicting the suspended sediment concentration in inland lakes. Moreover, in an assimilation simulation, the existing initial perturbation generation methods consider the perturbation error to follow a random Gaussian distribution. However, in dynamic predictions, the spatial distribution of the simulation error often contains structural information. Therefore, in this paper, a new method to generate the perturbation field for assimilation simulation was proposed considering the spatial structure characteristics of the simulated suspended sediment concentration error. The proposed approach was validated through its application to the assimilation simulation of suspended sediment in Taihu Lake, China.

Study Area
The study area, Taihu Lake, is a typical inland shallow lake in China's East Asia monsoon region having a water surface area of approximately 2,338 km 2 and a mean depth of 1.9 m ( Fig. 1). The lake bed, which has accumulated a high concentration of nutrients over a long time, is covered by mud composed of silt and clay with grain sizes ranging between 2 and 20 µm. The release of nutrients from the flat lake bed occurs due to sediment resuspension on windy days, resulting in the frequent occurrence of harmful algal blooms.

Data Sources
Nineteen buoys located in Taihu Lake ( Fig. 1) measured real-time data during the simulation period at a one-hour interval. Each buoy recorded turbidity and pollutant data, whereas only 14 buoys acquired meteorological data (wind speed, wind direction). The concentration of total suspended sediments (C TSM ) can be obtained according to the empirical relationship between turbidity (T) and C TSM (following Eq. 1) (Huang et al. 2015): Water enters Taihu Lake from the west and generally flows out to the east. According to the runoff model and measured hydrological data, eighteen main rivers and tributaries (including inflows and outflows) were selected for the simulation.

Basic Principle
We believe the hydrodynamic simulation of suspended sediment is spatially variable, and that the resulting error is composed of both structural error and random error. It is thus possible to utilize the structural error when generating the perturbation field for the assimilation simulation rather than to treat all errors as random errors. On this basis, we propose a new method to generate the perturbation field and realize the assimilation simulation. The design of this method is mainly divided into three steps. The first step is to construct a hydrodynamic model of suspended sediment. The second step is to generate the structural perturbation, which is combined with a correlation analysis of the time series data and a dynamic model simulation. The third step is to construct and simulate the assimilation coupling simulation model based on a hybrid perturbation.

Hydrodynamic Model
The three-dimensional shallow water equations are composed of the continuity equation, the momentum equation, and the free surface equation (Chen et al. 2003;Zhang et al. 2019): where x, y, and z are the east, north, and vertical axes in the Cartesian coordinate system, respectively; u, v, and w are the velocity components in the x, y, and z directions, respectively; t denotes time; η is the water level; f is the Coriolis parameter; ρ is the density; P is the pressure; K m is the vertical eddy viscosity coefficient; F u and F v represent the horizontal momentum; and h is the static water depth denoting the vertical distance between the static water level and bottom of the lake with the positive direction downwards. (

Suspended Sediment Model
The total suspended sediment (TSM) model uses a concentration-based approach subject to the following evolution equation (Chen et al. 2003;Zhang et al. 2019): where C is the suspended sediment concentration, A h is the horizontal eddy viscosity, K h is the vertical eddy viscosity, and ω c is the settling velocity of the sediment. The model consider the transport of both suspended load and bedload and ignores condensation and dissolution.
At the surface, a no-flux boundary condition is used for the sediment concentration: The bottom boundary condition for the sediment is where D (kg m -2 h -1 ) denotes the deposition flux and E (kg m -2 h -1 ) denotes the erosion flux, also called resuspension: where Q is the erosive flux, P b is the bottom porosity, F b is the bottom fraction, b is the bottom shear stress, and c is the critical shear stress for erosion.

Description of the EnKF Method
Let X t be an array of the true values, let X b be an array of the predicted values, let X a be an array of analysis values indicating the assimilation results, and let Y be an array of observational values. In a forecast model, the value of X t at time step i can be predicted by where M presents the nonlinear model operator and ε is the model error. In the Kalman filter forecast model system, the analysis values X a are calculated by where K is the Kalman gain matrix and H is an observation operator. The EnKF is constructed by running a forecast model driven by a set of initial conditions and then estimating the error covariance relative to the ensemble mean to determine the ensemble analysis values for the next time step forecast. Let k denote the kth ensemble and m the total number of ensemble members; then, The analysis values at time step i for the kth ensemble model run are calculated by The expression of gain matrix K is as follows: where B is the forecast error covariance matrix and R is the observational error covariance matrix. With a sufficiently large number of ensembles, the ensemble analysis error covariance matrix can be updated with the following relationship: To implement the EnKF, we need to create an ensemble of observational data constructed with perturbations relative to the real values:

Modified Perturbation Field Generation Method
The structural error is difficult to obtain accurately, but we postulate that the structural error is closely related to the observation error at adjacent times; that is, the spatial structure of the error at different times exhibits correlation and hysteresis. At time step i , the suspended sediment concentrations at different buoys at the same time are selected to construct the data set H i . Correlation analysis is then carried out on the data sets at different time steps. The time lag of the time series data is determined according to the correlation coefficient.
Let m be the time lag and R i o be the observation error of time step i; then, The structural error at this time step can be calculated by the average observation error within the time lag: To maintain the spatial structure characteristics of errors, the error matrix is standardized: where R i mean is the mean error at time step i and S i R is the standard deviation of the error at time step i.
Let S p be the variance of a given perturbation field of the specified value. Then, the structural perturbation R i m can be calculated by At time step i, the structural and random perturbations are superimposed to generate the perturbation fields and are then substituted into the model for the next calculation step.

Data Assimilation Experiments
Data assimilation experiments were performed on the basis of EnKF assimilation coupled with a dynamic model. This section describes the setup of the experiments for evaluating the proposed method.

Description of the Assimilation Simulation Process
The assimilation coupled simulation is illustrated in Fig. 2, which includes five parts: data organization, parameter calibration, TSM model calculation, assimilation calculation and analysis of the results.
Data preparation includes the data processing required by the coupling simulation, and the model parameter calibration is used to find the optimal model parameters. In the TSM model calculation, the initial background field is generated first, and the simulation results at time step i are taken as the background field of the i + 1 time step. In the assimilation simulation, at the i − 1 time step, the proposed perturbation method is used to generate the perturbation field. Then, the i-step model prediction field is used as the background field. The assimilation analysis field is generated by combining the observation data, which are imported into the model calculation module as the background field of the i + 1 time step for the TSM model calculation. In all parts, the model calibration is the precondition for the assimilation coupled simulation and is further described below.

TSM Model Calibration
The parameters of the TSM model were calibrated by using the buoy turbidity and meteorological data from November 9 to 11, 2015. Combining these data with the findings of existing research (Pang et al. 2008;Zhang et al. 2019) and through a verification of the suspended sediment concentrations and flow field, the final parameter calibration results  Table 1. During the model calibration, the simulated flow field is consistent with the existing results. The concentrations of suspended sediments simulated by the TSM model were compared with the data from all 19 buoys; the average root mean square error is 22.36 mg/l, and the average relative error is approximately 19.38%. The accuracy of the model is better than that of the existing research on the water of inland lakes. Hence, the calibration results show that the TSM model can be used for assimilation simulation.

Experimental Design
In the assimilation simulation experiments, the turbidity data measured at the buoy stations in Taihu Lake and meteorological data were used. The concentration of suspended sediment (C TSM ) is calculated by Eq. 1. Since the data release interval of buoy data is 1 h, this study simulated 72 h of data from November 12 to 14, 2015, with a total of 72 time steps. Taking 1 h as the time interval, we analysed the correlation coefficient of C TSM at all 19 buoys at different time steps. In this paper, we selected a time-step lag of 3 to participate in the perturbation of the perturbation field. In this case, the maximum correlation coefficient is 0.99, and the minimum correlation coefficient is 0.76. Then, we designed the following five experiments.
Experiment 1: TSM model simulation experiment. In this experiment, combined with the parameters calibrated in Table 1, C TSM is simulated and predicted. Experiment 2: Standard EnKF assimilation simulation experiment (S-EnKF). In this experiment, the Monte Carlo method is used to generate the initial perturbation coupled with the TSM model for data assimilation. Experiment 3: EnKF assimilation simulation experiment with an ETKF initial perturbation (ET-EnKF). In this experiment, the Monte Carlo method is used to generate the random perturbation field at the initial time, the prediction perturbation is updated to analyse the perturbation by the ETKF, and the ensemble prediction of the next step is carried out. Experiment 4: EnKF simulation experiment with a modified perturbation (M-EnKF). In this experiment, the perturbation field is modified by the method proposed in this paper. The hybrid perturbation field represents structural disturbance and random errors and will be added in each time step for the assimilation simulation. Experiment 5: EnKF simulation experiment with a modified ETKF perturbation (M-ET-EnKF). The hybrid perturbation generation method is the same as in experi- Among the above experiments, Experiment 1 provides a dynamic model for the assimilation simulation. In Experiments 2 and 3, only random error was considered to generate the perturbation field; in Experiments 4 and 5, the improved method was used to generate the hybrid perturbation field. In this way, two different assimilation methods can be used to verify the effectiveness of the proposed method.

Results
To examine the applicability of the assimilation simulation method based on a hybrid perturbation, match-up points were used to test the proposed method. We used the root mean square error (RMSE) and mean absolute error (MAE) to evaluate the simulation results. Table 2 lists the RMSE and MAE of the computed errors of the five assimilation simulation experiments, and Fig. 3 shows histograms of the RMSE and MAE. For the TSM model, S-EnKF and ET-EnKF model, the average RMSE values are 22.14,9.56,and 9.55,respectively,and the average MAE values are 17.97,7.45,and 7.45. The experimental results show that the assimilation simulation based on random perturbations is better than the TSM-model simulation at each buoy. Moreover, for the assimilation simulation based on a hybrid perturbation, the accuracy is further improved, the average RMSE values for the M-EnKF model and M-ET-EnKF model are 8.70 and 8.68, respectively, and the average MAE values are 6.70 and 6.69. We also calculated and compared the mean relative error (MRE) of each model, which was reduced from 24.80% to 9.67%, and thus, the conclusion is the same as above.
In the TSM model simulation experiment, two of the 19 buoy stations (No. 9 and No. 10) exhibit relatively large errors. The calculated values at these two buoy stations are lower than the measured values. More suspended sediments were released, perhaps due to the shallow water depth and the frequent human activities in the nearshore area; however, this phenomenon cannot be accurately reflected by the TSM model. Nevertheless, the accuracy is greatly improved by the assimilation simulation experiment and is further improved by the proposed method.
The C TSM values calculated from the buoy-measured turbidity data are used to validate and compare the model results shown in Fig. 4, which means that the results produced using the assimilation simulation based on a hybrid perturbation are more consistent with the measurements than those based on a random perturbation. The reason is that the influence of the structural error is considered in the generation of the hybrid perturbation field, and thus, the proposed perturbation generation method is more suitable for the assimilation simulation than the random perturbation generation method.

Determination of the Variance of the Perturbation Field
The basic purpose of generating an ensemble perturbation is to generate uncorrelated initial perturbation values, which can more accurately represent the analysis error covariance. Therefore, the error structure and amplitude of the perturbation are the key problems in the study of initial perturbations in ensemble forecasting. The error structure reflects the spatial distribution characteristics of the analysis error, while the perturbation amplitude indicates that the analysis error should be equivalent to the corresponding prediction error and directly affects the dispersion of the prediction set.
Due to the existence of uncertainties, this paper uses the control variable method. Keeping the other factors unchanged, each factor was assimilated five times for 24 h, and finally, the optimal value was obtained based on the average RMSE of the results. The final observation error variance is 4.5, and the assimilation model error variance is 4.0 (Fig. 5). In this case, the RMSE of the assimilation simulation results is the minimum.
We consider that the assimilation model error variance represents the ratio of the structural error to the perturbation field and finally generates the perturbation field at each time step. However, due to the complexity of the error disturbance, whether a better structural error variance generation method exists remains to be studied.

Impact of the Ensemble Size on the Assimilation Simulation Results
The ensemble size in the EnKF method will affect the assimilation simulation results, so we must first determine the total number of ensemble members selected in the forecast model run. In this paper, the Monte Carlo method is used to generate the initial perturbation, and the EnKF method is used to compare and analyse the influence of the ensemble size on the forecast. We set the number of ensemble members from 10 to 100 with an interval of 10. The control variable method was used in the same way to calibrate the ensemble size. Keeping the other factors unchanged, when the number of ensemble members is equal to 40, the assimilation simulation results are better than the results from the other methods (Fig. 6). Obviously, the proposed method needs to be tried many times and needs to be improved in the future.

Conclusion
In this paper, a novel method to generate the perturbation field for assimilation simulation was proposed. The proposed method mainly verifies the idea that the combined effects of structural error should be considered in the generation of the perturbation field. On this basis, we designed five assimilation experiments to simulate the concentration of suspended sediments in Taihu Lake, China. After three days and 72 time steps of assimilation simulation based on the hybrid perturbation generation method, we found that the proposed assimilation method provided results that were more consistent with the buoy-measured data. The RMSE results show that the assimilation method based on the hybrid perturbation field generation method has a higher simulation accuracy than other methods. This study shows that the proposed method is effective and provides a new idea for the assimilation simulation of suspended sediment concentrations in inland lakes.

Consent to Participate
This paper mainly studies the natural environment of inland lakes, and all analyses were based on previously published studies; thus, consent to participate was not required for this research.

Consent to Publish
The manuscript has been approved by all authors for publication.

Competing interests
The authors declare that they have no competing interests.