Intercomparison of the impact of INSAT-3D atmospheric motion vectors in 3DVAR and hybrid ensemble-3DVAR data assimilation systems during Indian summer monsoon

The impact of observations in a data assimilation (DA) system may depend on various factors, one of which is the specification of the background error covariance matrix. The present study compares the impact of INSAT-3D atmospheric motion vector (AMV) observations in the traditional three-dimensional variational (3DVAR) DA system and the hybrid ensemble transform Kalman filter (ETKF)-3DVAR DA system (HYBRID) available in the Weather Research and Forecasting (WRF) modeling system. The objective of the study is to understand how the impact of INSAT-3D AMV observations differs when they are assimilated using the 3DVAR and HYBRID DA systems. The DA experiments are conducted over a ~4-week period of the Indian summer monsoon in July 2016. Four sets of experiments are performed, with and without INSAT-3D AMV, in both DA systems. The domain-wide verification with respect to radiosonde observations reveals that forecasts in the HYBRID experiments are, in general, more accurate than those in the 3DVAR experiments. The geographical distribution depicts positive impacts of INSAT-3D AMV observations across the domain in both the 3DVAR and HYBRID DA systems. The AMV observations show a larger relative impact in HYBRID than in 3DVAR. The relative improvement in HYBRID with AMV DA compared to 3DVAR is 77% for wind and 71% for tropospheric temperature. The skill scores for quantitative evaluation of the precipitation forecast indicate a modest improvement in rainfall for the HYBRID run, and incorporating the AMV observations does not considerably enhance the skill of the 24-h and 48-h rainfall forecasts.


Introduction
The importance of early warnings from numerical weather prediction (NWP) models has increased greatly in the last few decades, as they play an important role in mitigating the damage caused by natural disasters such as floods, thunderstorms, heavy rains, and tropical cyclones. NWP is an initial value problem, and its ability to represent the future state of the atmosphere depends primarily on the initial conditions of the atmosphere. The initial conditions for limited area models are mostly derived from coarser-resolution global forecast models, which lack information about region-specific local conditions. Data assimilation (DA) is a scientific method for obtaining an accurate initial state by combining the background forecast and the observations (Daley 1991). Some of the widely used DA techniques, such as the three- and four-dimensional variational methods (3DVAR and 4DVAR), estimate the true state of the system by minimizing a cost function (e.g., Courtier et al. 1998; Wu et al. 2002; Bouttier and Kelly 2001). At the same time, DA methods such as the ensemble Kalman filter (EnKF) estimate the optimal weights to obtain accurate initial conditions (Kalnay 2003). To preserve the accuracy of the state of the system, the distance between the background state and the observations is scaled by the background error covariance (BEC) and the observational error covariance. In the traditional variational approach, the BEC is generated using climatological data and is assumed to be static and isotropic, while advanced DA methods such as EnKF use an ensemble of model forecasts to compute a BEC that evolves with time. However, implementing EnKF in an operational forecast system is computationally intensive, as its accuracy depends upon the ensemble size (Houtekamer et al. 2005). In addition, the BEC is prone to sampling error due to the limited ensemble size.
Recently, a sophisticated DA system has been developed by incorporating the flow-dependent BEC information provided by the ensemble members into the 3DVAR DA system; it is popularly known as the hybrid ensemble-3DVAR DA system (hereafter HYBRID; Wang et al. 2007). The improved performance of HYBRID over 3DVAR has been documented by many authors (e.g., Hamill and Snyder 2000; Wang et al. 2008; Prasad et al. 2016; Kutty and Wang 2015; Kutty et al. 2018).
Atmospheric motion vectors (AMVs) are satellite-derived wind observations obtained by continuously tracking regions of clouds or water vapor in satellite images. AMVs provide wind information with good areal coverage, particularly over data-sparse oceanic regions. Several studies have shown the benefit of assimilating AMVs for improving weather forecasts over the tropics (e.g., Velden et al. 1992; Leslie et al. 1998; Soden et al. 2001; Rani and Gupta 2014; Mounika et al. 2018). A study by Kaur et al. (2015) during the Indian summer monsoon (ISM) reported a positive impact of Kalpana-1 AMV assimilation over the tropical region in the 3DVAR DA system. Deb et al. (2016) demonstrate the reduction of track forecast errors of the cyclonic storm NANAUK over the Arabian Sea when INSAT-3D AMV observations are assimilated. Kumar et al. (2017) have reported a positive impact of INSAT-3D AMV observations in the 3DVAR DA system during ISM.
The impact of observations may vary depending on many factors in the DA system, such as data quality control, preprocessing, and the specification of the BEC. Previous studies have shown that, in the presence of the flow-evolving BEC in the HYBRID DA system, observations are assimilated more effectively than in traditional 3DVAR DA systems; hence, it is expected that the impact of INSAT-3D AMV observations may vary between DA systems. Recent studies have established the forecast improvements due to AMV observations in the HYBRID DA system (e.g., Sawada et al. 2019; Zhang et al. 2018). The present study attempts to quantify the impact of assimilating INSAT-3D AMV with the advanced HYBRID DA system over a ~4-week period of July 2016 using a limited area model. The specific objective of the study is to understand how different or similar the impact of INSAT-3D AMV observations is when they are assimilated by 3DVAR as compared to the HYBRID DA system. This paper is organized as follows. Section 2 gives a detailed description of the DA systems, the model, and the experimental design. Section 3 describes the major results, and Section 4 concludes the paper.

Data assimilation systems
In this study, experiments are conducted using two DA systems, namely, 3DVAR and hybrid ensemble transform Kalman filter (ETKF)-3DVAR, available in the Weather Research and Forecasting Data Assimilation (WRFDA) system. The cost function minimized in HYBRID carries an additional term that incorporates the ensemble covariance in the 3DVAR framework (Wang et al. 2008), as shown in Eq. (1):

$$J(x'_1, a) = \beta_1 \frac{1}{2} {x'_1}^{T} B^{-1} x'_1 + \beta_2 \frac{1}{2} a^{T} A^{-1} a + \frac{1}{2} \left(y^{o\prime} - \mathbf{H} x'\right)^{T} R^{-1} \left(y^{o\prime} - \mathbf{H} x'\right) \qquad (1)$$

where $x'$ is the analysis increment that has the contribution of both the static and ensemble error covariances available in HYBRID, $B$ is the static BEC matrix, $x'_1$ is the model space increment that has the contribution of the static BEC of 3DVAR, and $\beta_1$ is used to assign a weight to the static BEC.
Here, $a$ consists of the spatially varying extended control variables (Lorenc 2003), and its variations are controlled by the localization matrix $A$. The term $\beta_2 \frac{1}{2} a^{T} A^{-1} a$ incorporates the flow-dependent ensemble BEC information, and $\beta_2$ is used to assign a weight to the ensemble covariance. The present study assigns equal weight (50%) to the static and flow-dependent ensemble BEC in the HYBRID cost function. The static BEC matrix used in this study is generated using the National Meteorological Center (NMC; Parrish and Derber 1992) method from 1 month of WRF-generated model forecasts using the CV5 option. In Eq. (1), $y^{o\prime} = y^{o} - H(x^{b})$ is the innovation vector, where $y^{o}$ is the observation, $x^{b}$ is the background forecast, and $H$ is the nonlinear observation operator. Here, $\mathbf{H}$ is the linearized observation operator, and $R$ is the observation error covariance matrix.
The forecast ensemble perturbations are updated by ETKF using a transformation matrix $T$ constructed from $\Gamma$ and $C$, where $\Gamma$ represents the eigenvalues and $C$ represents the eigenvectors obtained by the singular value decomposition of $(\mathbf{H}X^{b})^{T} R^{-1} \mathbf{H}X^{b}$. The under-sampling problem associated with ETKF due to the small ensemble size is dealt with by adding inflation factors $\Pi$ and $\rho$ that increase the analysis error variance (Wang et al. 2008). Since covariance localization has not been applied to the ETKF formulation, large inflation factors are used to correct the systematic underestimation of the error variance. The initial perturbations for the ensembles are obtained as random draws from the static BEC of 3DVAR. Further details on the configurations of the DA systems can be found in Gogoi et al. (2020).
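The ETKF perturbation update described above can be illustrated with a minimal NumPy sketch. The exact forms of the transformation and inflation equations are not reproduced in the text, so the form used here, T = C(ρΓ + I)^(-1/2) C^T followed by a scalar inflation, is an assumption based on the commonly published ETKF formulation. The function names and arguments are hypothetical and do not correspond to the WRFDA implementation.

```python
import numpy as np

def etkf_transform(HXb, R_inv, rho=1.0):
    """Assumed ETKF transform T = C (rho*Gamma + I)^(-1/2) C^T.

    HXb   : (n_obs, K) ensemble perturbations mapped to observation space,
            already scaled by 1/sqrt(K-1).
    R_inv : (n_obs, n_obs) inverse observation error covariance.
    rho   : scalar factor applied to the eigenvalues (assumed form).
    """
    S = HXb.T @ R_inv @ HXb              # (K, K), symmetric positive semi-definite
    gamma, C = np.linalg.eigh(S)         # eigenvalues Gamma, eigenvectors C
    gamma = np.clip(gamma, 0.0, None)    # remove tiny negative round-off values
    # Dividing column j of C by sqrt(rho*gamma_j + 1) gives C (rho*Gamma + I)^(-1/2)
    return (C / np.sqrt(rho * gamma + 1.0)) @ C.T

def etkf_update(Xb, H, R, rho=1.0, inflation=1.0):
    """Return analysis perturbations Xa = inflation * Xb T (scalar inflation assumed)."""
    T = etkf_transform(H @ Xb, np.linalg.inv(R), rho)
    return inflation * (Xb @ T)

# Synthetic example: 6 state variables, 4 members, 3 observed variables.
rng = np.random.default_rng(0)
Xb = rng.standard_normal((6, 4))
Xb -= Xb.mean(axis=1, keepdims=True)     # remove the ensemble mean
Xb /= np.sqrt(Xb.shape[1] - 1)           # so that Pb = Xb Xb^T
H = np.eye(3, 6)                         # observe the first three state variables
R = 0.5 * np.eye(3)
Xa = etkf_update(Xb, H, R)
```

With rho = 1 and no inflation, the analysis covariance implied by these perturbations equals the Kalman filter analysis covariance within the ensemble subspace. With a small ensemble this spread tends to be too low, which is why inflation is applied.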

Model description and configurations
The model simulations are performed using the Advanced Research Weather Research and Forecasting (ARW-WRF) model, version 3.8.1 (Skamarock et al. 2008). The model is fully compressible and non-hydrostatic, and it utilizes sophisticated parameterization schemes to represent unresolved atmospheric processes. The configuration of the model domain and the various schemes used in this study are shown in Table 1. Figure 1 shows the simulation domain used in this study, which covers the monsoon-prevailing region in and around the Indian subcontinent. The initial and boundary conditions for the WRF simulations are obtained from the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) data.

INSAT-3D AMV and Global Telecommunication System data
INSAT-3D is an Indian geostationary meteorological satellite with an imager and a sounder onboard. The imager has one visible channel and five infrared (IR) channels, namely, shortwave IR (SWIR), mid-wave IR (MIR), water vapor (WV), and two split thermal IR (TIR-1, TIR-2) channels. The channels have different ground resolutions. The visible and SWIR channels are at 1-km resolution, the MIR and TIR channels are at 4-km ground resolution, and the WV channel is at a comparatively coarser resolution of 8 km. The INSAT-3D sounder has 18 IR channels and one visible channel. Of the 18 IR channels, six are in the SWIR, five are in the MIR, and seven are in the longwave infrared (LWIR). The ground resolution of all sounder channels is 10 km. Three consecutive INSAT-3D images at 30-min intervals are used to determine the AMVs, a process that consists of the following steps: (1) image registration, thresholding, and filtering; (2) feature/tracer selection and tracking; (3) quality control; and (4) height assignment (Sankhala et al. 2020). This study uses low-level AMVs retrieved from the MIR and visible channels, extending from 600 to 950 hPa, and upper-level AMVs retrieved from the WV channel, ranging from 100 to 500 hPa. A recent study shows that INSAT-3D AMVs are useful in understanding the intraseasonal variability of ISM (Sankhala et al. 2019). AMV data for this study are obtained from https://www.mosdac.gov.in/. Conventional in situ observations and satellite-derived wind observations available from the Global Telecommunication System (GTS) are also assimilated.
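The layer selection described above (low-level AMVs from the MIR and visible channels between 600 and 950 hPa, upper-level AMVs from the WV channel between 100 and 500 hPa) can be written as a simple filter. This is only an illustrative sketch: the channel labels ("MIR", "VIS", "WV") and the function name are hypothetical and do not reflect the actual INSAT-3D product format.

```python
def amv_layer(pressure_hpa, channel):
    """Classify one AMV as 'low', 'upper', or None (not used).

    pressure_hpa : assigned height of the vector in hPa.
    channel      : hypothetical channel label, one of "MIR", "VIS", "WV".
    """
    low_level_channels = {"MIR", "VIS"}
    # Low-level winds: MIR/visible channels, 600-950 hPa (as stated in the text)
    if channel in low_level_channels and 600.0 <= pressure_hpa <= 950.0:
        return "low"
    # Upper-level winds: WV channel, 100-500 hPa (as stated in the text)
    if channel == "WV" and 100.0 <= pressure_hpa <= 500.0:
        return "upper"
    return None

# Synthetic records: (pressure in hPa, channel)
records = [(850.0, "VIS"), (300.0, "WV"), (550.0, "WV"), (700.0, "MIR")]
selected = [(p, c, amv_layer(p, c)) for p, c in records if amv_layer(p, c)]
```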

Experimental design and validation
An Observation System Experiment (OSE) is conducted to understand the impact of INSAT-3D AMV in the 3DVAR and HYBRID DA systems, and the details of the experiments are provided in Table 2. Four experiments are conducted. The DA system is continuously cycled at 12-h intervals for a ~4-week period from 0000 UTC 1 July 2016 to 0000 UTC 30 July 2016, and a 48-h free forecast is launched from each 0000 and 1200 UTC DA analysis during the month of July. To avoid spin-up issues, the ensembles are initialized 24 h prior to the first analysis time, i.e., at 0000 UTC 30 June 2016, by adding 50 random perturbations from WRF 3DVAR. The 50-member ensemble system is then updated using ETKF after 24 h of model integration. The ensemble mean is used to initialize both the 3DVAR and HYBRID experiments at 0000 UTC 1 July 2016. The observation errors are taken from NCEP statistics, and the error statistics for INSAT-3D AMV are adapted from Kumar et al. (2017). India Meteorological Department (IMD) gridded rainfall data at ~25-km resolution (Pai et al. 2014) are used to validate the rainfall forecast. Further, the European Centre for Medium-Range Weather Forecasts Reanalysis (ERA)-Interim and Advanced Scatterometer (ASCAT) data are used for verification of atmospheric field variables. Prior to validation, the model simulations from all the experiments are brought to the same grid resolution as the ASCAT and ERA-Interim data using bilinear interpolation. The verification metrics used in this study are the root mean square error (RMSE), the mean error (ME, or Bias), and the improvement parameter (η), which is computed relative to the 3DVAR experiment (Eq. (4)).
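The verification metrics named above can be sketched in a few lines of NumPy. RMSE and ME follow their standard definitions. The definition of the improvement parameter is not reproduced in the text, so the form used here, η = 100 × (RMSE_3DVAR − RMSE_exp) / RMSE_3DVAR, is an assumption chosen so that positive values indicate improvement over 3DVAR. All function names are hypothetical.

```python
import numpy as np

def rmse(forecast, reference):
    """Root mean square error of a forecast against a reference field."""
    diff = np.asarray(forecast, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mean_error(forecast, reference):
    """Mean error (bias): positive when the forecast is too high on average."""
    diff = np.asarray(forecast, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.mean(diff))

def improvement(rmse_control, rmse_experiment):
    """Percentage improvement relative to the control run.

    ASSUMED form: eta = 100 * (RMSE_control - RMSE_experiment) / RMSE_control.
    """
    return 100.0 * (rmse_control - rmse_experiment) / rmse_control

# Synthetic numbers for illustration only (not results from the study).
reference = [1.0, 2.0, 5.0]
control_forecast = [1.0, 2.0, 1.0]
experiment_forecast = [1.0, 2.0, 3.0]
eta = improvement(rmse(control_forecast, reference),
                  rmse(experiment_forecast, reference))
```

In this synthetic case the experiment halves the control RMSE, so the assumed η evaluates to 50%.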
Positive η (%) values depict improvement in 3DVAR_AMV, HYBRID, and HYBRID_AMV in comparison to 3DVAR. The rainfall forecast is further validated using the Bias score (BS) and the equitable threat score (ETS) (Hamill and Juras 2006).
Figure 2 shows the profiles of the root mean square fit of the analysis to the observations for zonal and meridional winds, temperature, and water vapor mixing ratio, validated against radiosonde observations. The values are obtained by averaging over 60 DA cycles and over the domain shown in Fig. 1. It can be seen that the HYBRID analysis without AMV observations fits more closely to the observations for zonal and meridional winds, while the 3DVAR experiments show a slightly better fit to the observations for temperature and water vapor mixing ratio at almost all levels. The analyses from the 3DVAR and 3DVAR_AMV experiments depict a similar fit to the observations for all the variables. It is to be noted that the radiosonde observations used for validating the analysis are also assimilated into the analysis. Hence, the results may not represent the error in the analysis. However, they are indicative of how close the analysis is to the observations. In general, the fit of the analysis to the observations is closely tied to the configurations of the 3DVAR and HYBRID algorithms and their BEC settings. Previous studies such as Wang et al. (2008) show that for smaller background error variances or larger correlation length scales, the analysis may not fit well with the observations. In addition, Zhang et al. (2011) have shown that a closer fit to observations does not necessarily lead to a better forecast. To further assess the impact of observations on the model variables, the vertical profile of the analysis increment at the radiosonde locations is evaluated (Fig. 3). The varying effect of the two DA techniques is quite visible in the zonal wind analysis increment. Both HYBRID experiments show a stronger upper-level zonal wind increment than the 3DVAR experiments.
No significant impact of AMV is observed in the HYBRID experiments. However, for meridional wind, the influence of AMV data is seen in 3DVAR from 400 to 800 hPa. The temperature field shows a negative temperature increment for all the experiments from 1000 to 500 hPa. However, the magnitude of the increment is higher in both HYBRID experiments. In the upper levels above 500 hPa, HYBRID_AMV shows a larger positive increment compared to the other experiments.

Analysis and forecast profile verification
The root mean square error (RMSE) is computed with respect to radiosonde observations for 24-h forecasts initialized from the 0000 UTC analyses of July 2016, and is shown in Fig. 4. The results indicate that both HYBRID experiments are more accurate than the 3DVAR experiments. The wind forecast shows improvement mostly near the upper troposphere, while the improvement in the temperature forecast in the HYBRID experiments is substantially larger than in the 3DVAR experiments. For wind forecasts, the impact of AMV is more evident in 3DVAR than in HYBRID. That said, the impact of AMV observations is not evident from the domain-averaged vertical profiles of field variables in Fig. 4. The forecast validation is done with respect to radiosonde observations, which are absent over the oceans. Therefore, the actual impact of AMV observations may not be reflected when verified against a limited number of radiosonde profiles.

Spatial forecast verification
To further assess the impact of the observations as well as the DA system on the model forecast, the zone of significant forecast difference between 3DVAR and the rest of the experiments is evaluated for model variables including wind at the 850-hPa level, tropospheric temperature (TT) averaged over the 200- to 700-hPa layer, and relative humidity (RH) at the 850-hPa level (figure not shown). The spatial representation of the forecast difference shows an overall larger zone of significant difference in the HYBRID experiments than in the 3DVAR_AMV simulations. A greater impact on the TT field is observed over the Bay of Bengal (BoB) and the southern oceanic region. Figure 5a, e, and i depict the monthly averaged wind at the 850-hPa level, TT, and RH at the 850-hPa level from ERA-Interim reanalysis data, respectively, and Fig. 5b-d, f-h, and j-l illustrate the spatial distribution of the improvement parameter (η) for the respective variables for July 2016. A positive (negative) η value represents percentage improvement (degradation) in the model forecast compared to the 3DVAR experiment. Both HYBRID experiments depict higher positive η values, which is indicative of improvements due to incorporating ensemble covariance in the 3DVAR framework for the wind and TT variables. Further, the experiments that assimilate INSAT-3D AMV observations show substantial relative forecast improvements, spatially, in both the 3DVAR and HYBRID DA systems. The largest positive impact is seen in the HYBRID_AMV experiment, with 77% positive η values for wind. A dipole-like pattern with positive and negative impact is observed over the western Arabian Sea (AS) due to the presence of the low-level jet (LLJ) in Fig. 5b-d. The LLJ, also known as the Findlater jet (Findlater 1978), is an important lower-level circulation feature of the Indian summer monsoon, which carries extensive moisture from the Indian Ocean, producing rainfall over the Indian subcontinent. Therefore, an improvement in the forecast of the LLJ is expected to improve the rainfall forecast over the Indian landmass.
Apart from wind, a substantial improvement of 70% positive η values for the TT variable is observed in the HYBRID_AMV experiment (Fig. 5h). The improvement percentage for RH is not very significant compared to wind and TT. However, a marginal improvement due to AMV DA is observed in both 3DVAR and HYBRID. Figure 6 shows the time series of area-averaged RMSE of zonal and meridional wind over the AS for both 24-h and 48-h forecasts. The verification is confined to the AS to understand the impact of the experiments on the LLJ. It is evident from Fig. 6a to c that the assimilation of INSAT-3D AMV observations has significantly reduced the 24-h and 48-h forecast errors for zonal winds in the HYBRID_AMV experiment as compared to the other experiments, and that the impact of assimilation increases with time, which is indicative of the cumulative impact of assimilation. Though the HYBRID experiment does not significantly improve the 24-h forecasts, the reduction in forecast errors is clearly evident in the 48-h forecasts. In the 3DVAR DA system, the positive impact of INSAT-3D AMV observations is evident during the later DA cycling hours.
Fig. 8 Bias of monthly averaged (July) 24-h forecasted rainfall (mm/day) with respect to IMD gridded rainfall for (a) 3DVAR, (b) 3DVAR_AMV, (c) HYBRID, and (d) HYBRID_AMV
To further explore the impact of observations closer to the surface, the 10-m surface wind forecast is evaluated with respect to ASCAT wind observations over the AS region. Figure 7 shows the time series of monthly averaged RMSE of the 24-h and 48-h near-surface zonal and meridional wind forecasts over the AS. Similar to the results obtained in the ERA-Interim validation, the assimilation of AMV observations has, in general, produced substantial improvements in the wind forecasts in both the 3DVAR and HYBRID DA systems. However, the 3DVAR_AMV experiment depicts a larger reduction in forecast errors than the other experiments for zonal wind. The improvement in the meridional wind component is more pronounced in the HYBRID_AMV run than in the other experiments for both 24-h and 48-h forecasts. Figure 8 shows the mean error (Bias) of the model-simulated 24-h rainfall forecast with respect to IMD gridded rainfall. The HYBRID experiments show lower Bias in precipitation than the 3DVAR run. It can be seen that the wet bias over Central India (CI) and the dry bias to the south of CI are considerably reduced in the HYBRID experiments. The AMV experiments do not depict any significant change in Bias when compared to their corresponding control DA experiments. Further, the difference between the 3DVAR and 3DVAR_AMV forecasted rainfall shows no zone of significant difference. The difference of 3DVAR from the HYBRID and HYBRID_AMV experiments shows a considerable difference zone over the same region where HYBRID has reduced the wet bias and dry bias compared to 3DVAR. HYBRID_AMV also shows a significant difference over southern CI and the northeastern region of India for both 24-h and 48-h forecasts (figure not shown). To quantitatively evaluate the precipitation forecasts, skill scores such as the ETS and Bias score are calculated for the various experiments.
It is to be noted that the skill scores are calculated for two phases of the experiments, phase 1 (2-16 July 2016) and phase 2 (17-31 July 2016), as shown in Fig. 9. The skill of the 24-h precipitation forecast for the HYBRID experiments is found to be higher than that of the 3DVAR experiments in phase 2 towards higher rainfall thresholds, which is evident from the ETS values (Fig. 9c). The HYBRID_AMV experiment shows improved skill for the precipitation forecast at higher rainfall thresholds as compared to the 3DVAR_AMV experiment in phase 2. The Bias score indicates that all the experiments in phase 1 and phase 2 overestimate rainfall, and the overestimation is more pronounced in the 3DVAR experiments in phase 2. Similarly, the 48-h rainfall forecast results do not show substantial improvement in either HYBRID experiment in phase 1 (Fig. 10a). However, in phase 2, the skill scores indicate modest improvements in the rainfall forecast at moderate to high rainfall thresholds for the HYBRID experiment as compared to 3DVAR (Fig. 10c). Furthermore, the assimilation of AMV observations does not significantly improve the precipitation forecast in either the 3DVAR or HYBRID experiments.
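For reference, the two categorical skill scores discussed above can be computed from a 2 × 2 contingency table at a given rainfall threshold. The sketch below uses the standard definitions of the Bias score and the ETS. The function names, the use of a "greater than or equal to" threshold test, and the sample values are assumptions for illustration and are not taken from the study.

```python
import numpy as np

def contingency(forecast, observed, threshold):
    """Count hits, false alarms, and misses for rainfall >= threshold."""
    f = np.asarray(forecast, dtype=float) >= threshold
    o = np.asarray(observed, dtype=float) >= threshold
    hits = int(np.sum(f & o))
    false_alarms = int(np.sum(f & ~o))
    misses = int(np.sum(~f & o))
    return hits, false_alarms, misses, int(f.size)

def bias_score(hits, false_alarms, misses):
    """Forecast event frequency over observed event frequency (>1: overforecast).

    Undefined when no event is observed (hits + misses == 0).
    """
    return (hits + false_alarms) / (hits + misses)

def equitable_threat_score(hits, false_alarms, misses, total):
    """Threat score corrected for the hits expected by random chance."""
    hits_random = (hits + misses) * (hits + false_alarms) / total
    return (hits - hits_random) / (hits + false_alarms + misses - hits_random)

# Synthetic rainfall values in mm/day at six grid points (illustration only).
forecast = [0.0, 5.0, 12.0, 20.0, 1.0, 30.0]
observed = [0.0, 12.0, 15.0, 2.0, 0.0, 25.0]
h, fa, m, n = contingency(forecast, observed, threshold=10.0)
bs = bias_score(h, fa, m)
ets = equitable_threat_score(h, fa, m, n)
```

A Bias score above 1 corresponds to the overestimation of rainfall noted above, while an ETS closer to 1 indicates better placement of rainfall at that threshold.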

Conclusion
In this study, the impact of the assimilation of INSAT-3D AMV in two DA systems for short-range forecasts during the Indian summer monsoon season is evaluated. The DA systems used in this study are 3DVAR and hybrid ETKF-3DVAR, available in the WRF modeling system. The DA cycling experiments are performed for a ~4-week period of July 2016, and a 48-h model forecast is generated from each analysis.
The results indicate that 3DVAR analysis fits more closely with the observations than HYBRID analysis. The domain-wide verification with respect to radiosonde observations reveals that forecasts in the HYBRID experiments are, in general, more accurate than those in the 3DVAR experiments. The wind forecasts show more improvement near the upper troposphere for the HYBRID run, with a neutral impact of INSAT-3D AMV observations. In comparison with the forecasts from the HYBRID analysis, the impact of INSAT-3D AMV observations is more pronounced in the 3DVAR DA system for wind forecasts over land. The geographical distribution depicts positive impacts of INSAT-3D AMV observations across the whole domain in both the 3DVAR and HYBRID DA systems. The AMV observations show a larger relative impact in HYBRID than in 3DVAR, and the relative improvement in comparison to 3DVAR is 77% for wind and 71% for tropospheric temperature. The time evolution of forecast errors in the zonal wind over the Arabian Sea, with respect to ERA-Interim analysis, indicates a larger growth rate in the 3DVAR experiment than in the HYBRID experiment, while the assimilation of AMV observations considerably reduces forecast errors in both DA systems. The HYBRID_AMV experiments show improvement in the meridional component of near-surface winds when validated against ASCAT observations. The HYBRID run reduces the Bias in the precipitation forecast, especially when AMV observations are incorporated. The skill scores for quantitative evaluation of the precipitation forecast indicate a modest improvement in rainfall for the HYBRID run, and incorporating the AMV observations does not considerably enhance the skill of the 24-h and 48-h rainfall forecasts.
Fig. 10 (a) ETS and (b) Bias scores valid from 2 July 2016 to 16 July 2016 (first phase) and (c) ETS and (d) Bias scores valid from 17 July 2016 to 31 July 2016 (second phase) for different rainfall thresholds computed over the Indian land mass, averaged over the 48-h forecasts
The present study attempts to quantify the impact of INSAT-3D AMV observations in the 3DVAR and the hybrid ETKF-3DVAR DA systems. The HYBRID DA system incorporates a flow-dependent ensemble BEC that generates an optimal analysis through increments that are consistent with the background flow and that respond adaptively to changes in the observing system. Therefore, it is expected that the impact of an observing system may vary depending on the DA system used. Indeed, the results from the study indicate that the impact of INSAT-3D AMV observations differs between the 3DVAR and HYBRID DA systems. Furthermore, the new observing system adds more value in advanced DA systems such as HYBRID than in the traditional 3DVAR approach. However, the ensemble system needs to be properly configured for the DA system to perform optimally. In addition, the study has not assimilated satellite radiance observations. The impact of INSAT-3D AMV may differ considerably in both DA systems when satellite radiances are incorporated. Future studies in this direction are warranted.