## 3.1 Study area and data collection

Located at a low latitude and close to the sea, Hong Kong has a subtropical monsoon climate shaped by monsoon winds and oceanic circulation. Its water vapor content is high and fluctuates strongly, which makes the region well suited to meteorological studies. Moreover, CORS stations equipped with meteorological instruments are widely distributed across the region, and each station's data stream provides a complete historical record that is convenient to access, forming a suitable database for tomography. In addition, a radiosonde station in Hong Kong provides reliable reference values for validating the tomography results. Therefore, the Hong Kong region was chosen as the tomography test area, with a longitudinal range of 113.85°E–114.40°E and a latitudinal range of 22.20°N–22.60°N.

The GNSS data were provided by the Hong Kong Satellite Positioning Reference Station Network (SatRef), which gives access to the GNSS observations of 19 CORS stations. The average inter-station distance is approximately 10 km, and the stations are marked with blue triangles in Fig. 4. The receivers are of type TRIMBLE NETR9 or Leica GR50, and observation files are available at three sampling intervals (1 s, 5 s, and 30 s). In this experiment, observation data for 2022 with a sampling interval of 30 s were used.

The King's Park radiosonde (RS) station (station no. 45004) is located approximately at the center of the CORS network, as marked with a red circle in Fig. 4. The RS station provides high-precision water vapor density data at 0:00 and 12:00 UTC daily, which serve as the reference values for assessing the tomographic inversion results in this experiment.

HRES is a high-precision forecast dataset with high spatiotemporal resolution provided by the ECMWF. It combines observations, prior information, and short-term forecast models, with a temporal resolution of 6 hours and a delay of 5–12 hours (https://www.ecmwf.int/en/forecasts/datasets/set-i). Since HRES data are currently unavailable for the study area, the ERA5 dataset, whose accuracy is comparable to that of HRES, was used as the model training and prediction data for the simulation experiments in this paper. Compared with GNSS-derived ZTDs, the two datasets yield similar global standard deviations: 1.54 cm for HRES and 1.69 cm for ERA5 (Yu et al. 2021). ERA5 is the fifth-generation global high-resolution atmospheric reanalysis of the ECMWF. It is produced from data assimilation systems and model forecasts and provides hourly estimates of a wide range of meteorological parameters on a 0.25°×0.25° horizontal grid (interpolated 0.125°×0.125° grid data are also available). The data are organized vertically into 37 pressure levels and are released with a delay of 5 days; the dataset can be downloaded free of charge from https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5.

In this study, ERA5 data from January 1, 2018, to December 31, 2022, were used: the first four years (2018–2021) to train the Informer-WV model and the final year (2022) for prediction, which yields real-time water vapor density values that serve as the initial values for tomography. Horizontally, 20 grid points adjacent to the study area were considered. Vertically, meteorological parameters were obtained for 21 pressure levels from 1000 hPa (closest to the surface) to 250 hPa (corresponding to a geopotential height of approximately 11 km).
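The data split described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the level list assumes the standard ERA5 pressure levels between 1000 and 250 hPa (the text gives only their count and range), and the helper name `hourly_epochs` is hypothetical.

```python
from datetime import datetime

# Assumption: the 21 levels are the standard ERA5 pressure levels (hPa)
# between 1000 and 250 hPa; the paper states only the count and the range.
LEVELS_HPA = [1000, 975, 950, 925, 900, 875, 850, 825, 800, 775, 750,
              700, 650, 600, 550, 500, 450, 400, 350, 300, 250]


def hourly_epochs(start: datetime, end: datetime) -> int:
    """Number of hourly ERA5 epochs in the half-open interval [start, end)."""
    return int((end - start).total_seconds() // 3600)


TRAIN_START = datetime(2018, 1, 1)  # first training epoch
TEST_START = datetime(2022, 1, 1)   # first prediction epoch
TEST_END = datetime(2023, 1, 1)     # exclusive end of the prediction year

n_train = hourly_epochs(TRAIN_START, TEST_START)  # 2018-2021, incl. leap year 2020
n_test = hourly_epochs(TEST_START, TEST_END)      # 2022

print(f"{len(LEVELS_HPA)} levels, {n_train} training epochs, {n_test} test epochs")
```

Splitting strictly by calendar year keeps every prediction epoch later than every training epoch, which is the usual precaution against temporal leakage in time-series forecasting.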

## 3.2 Analysis of the Informer-WV model prediction results

Using the time-series prediction model trained for the water vapor density at each pressure level (Section 2.2), the six meteorological parameters \(H\), \(T\), \(q\), \(e\), \(Nw\), and \(\rho\) at each grid point of the study region in 2022 were used as inputs to predict each parameter over the following 24 hours. The prediction results were then evaluated against the actual ERA5 data, which were taken as the true values.

First, the overall prediction accuracy over all pressure levels is evaluated separately for each parameter, and the statistics are listed in Table 2. The MAE and RMSE depend on the unit and magnitude of each meteorological parameter and are therefore not directly comparable across parameters, but all of them are relatively small. The average coefficients of determination (\(R^{2}\)) of all parameters are at least 0.93, indicating that the predictions explain at least 93% of the variance in the true values. The predictions are therefore accurate enough to be used as initial values for solving the tomography observation equations.

Table 2

Accuracy statistics of the prediction results (units, *H*: m, *T*: K, *q*: g/kg, *e*: hPa, *Nw*: dimensionless, *ρ*: g/m³; MAE and RMSE are in the unit of the respective parameter)

| Predicted parameter | MAE | RMSE | \({R}^{2}\) |
|---|---|---|---|
| \(H\) | 5.27 | 6.68 | 0.96 |
| \(T\) | 0.48 | 0.62 | 0.97 |
| \(q\) | 0.47 | 0.61 | 0.94 |
| \(e\) | 0.58 | 0.77 | 0.94 |
| \(Nw\) | 2.83 | 3.72 | 0.93 |
| \(\rho\) | 0.44 | 0.58 | 0.93 |
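For reference, the three statistics reported in Table 2 can be computed as in the following sketch. It uses the standard definitions of MAE, RMSE, and the coefficient of determination; the function names and the toy arrays are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true)))


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Toy values (hypothetical, for illustration only)
truth = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.0, 2.5, 4.0]
print(mae(truth, pred), rmse(truth, pred), r2(truth, pred))
```

Because MAE and RMSE carry the unit of the evaluated parameter, only \(R^{2}\) can be compared directly across the rows of the table.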

Since the prediction aims to provide the real-time water vapor density as the initial tomography value, only the water vapor density is analyzed in the remainder of this section. First, the deviations of the water vapor density predicted by the Informer-WV model are analyzed in time and space. The three-dimensional results are shown in Fig. 5, with time on the *x*-axis, pressure on the *y*-axis, and deviation on the *z*-axis. There is no significant trend in the deviations along the time dimension. Along the pressure dimension, the deviation is close to 0 at pressures below 500 hPa (higher altitudes), whereas at pressures above 500 hPa, it first increases with increasing pressure (i.e., decreasing altitude) and then fluctuates widely. This pattern is strongly correlated with the spatial and temporal distribution of water vapor itself. For instance, the seasonal variation in atmospheric water vapor content in the Hong Kong region, which is relatively high in summer and autumn (June to November) and low in winter and spring (December to May), is reflected in the deviations. Additionally, the water vapor density is low at high altitudes and decreases with increasing altitude, so the absolute deviation remains small there even when the water vapor density varies strongly in relative terms. Finally, during extreme weather events (e.g., heavy rainfall and typhoons), the local water vapor density changes drastically, causing considerable deviations between the predicted and actual values.

After analyzing the spatial and temporal distributions of the water vapor density deviations, precision statistics were calculated to assess the predictions more rigorously along the vertical direction and the temporal dimension; the results are shown in Fig. 6 and Fig. 7, respectively. The vertical accuracy is used to determine the top-layer height of the tomography grid proposed in this paper, and the period with the worst prediction accuracy is used to verify the effectiveness of the tomography method under the least favorable conditions.

Figure 6 shows the MAE, RMSE, and \(R^{2}\) of the predicted water vapor density at each pressure level. The coefficient of determination reaches its minimum, close to 0.6, at the 250-hPa level (approximately 11 km). The water vapor density and its variance are minimal at this level, so even a very small prediction error results in a low \(R^{2}\) value; the RMSE confirms that the prediction error at this level is in fact very small. For the remaining levels, \(R^{2}\) is at least approximately 0.8 and in some cases close to 1, indicating that the predictions explain more than 80% of the variance in the true values. The model therefore performs well in predicting the water vapor density, and the predicted values can be used as initial values for solving the tomography observation equations.
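The interplay between a small RMSE and a low \(R^{2}\) at the uppermost level can be illustrated with a short synthetic example. All numbers below are made up for illustration and are not taken from Fig. 6; the point is only that \(R^{2}\) normalizes the squared error by the variance of the true values, so a level with very little variability is penalized even when its absolute error is tiny.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return float(np.sqrt(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Synthetic water vapor densities in g/m^3 (hypothetical values)
low_true = np.array([10.0, 14.0, 18.0, 22.0])           # moist near-surface level
low_pred = low_true + np.array([1.0, -1.0, 1.0, -1.0])  # errors of 1 g/m^3

high_true = np.array([0.10, 0.14, 0.18, 0.22])              # dry upper level
high_pred = high_true + np.array([0.03, -0.03, 0.03, -0.03])  # errors of 0.03 g/m^3

rmse_low, r2_low = rmse(low_true, low_pred), r2(low_true, low_pred)
rmse_high, r2_high = rmse(high_true, high_pred), r2(high_true, high_pred)

print(f"near surface: RMSE={rmse_low:.2f} g/m^3, R2={r2_low:.2f}")
print(f"upper level:  RMSE={rmse_high:.2f} g/m^3, R2={r2_high:.2f}")
```

In this toy case the upper level has a much smaller RMSE than the near-surface level but a clearly lower \(R^{2}\), which is why both statistics are read together when judging the per-level accuracy.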

The MAE and RMSE of the predicted water vapor density follow generally consistent trends with respect to pressure. From 250 to 600 hPa, both errors increase markedly with increasing pressure; from 600 to 1000 hPa, the RMSE and MAE fluctuate around 1.1 and 0.8 g/m³, respectively. Based on the RMSE and MAE at each pressure level, the height corresponding to 550 hPa (approximately 5100 m) is adopted as the top height of the reduced tomography grid, since the prediction accuracy is higher in the 250 to 550 hPa region. The tomographic inversion is thereby concentrated on the 600 to 1000 hPa region, where the prediction accuracy is lower and most of the water vapor is concentrated, so that more accurate 3D water vapor density results can be obtained there. Considering the vertical division strategy of the tomography grid, \({H}_{2}\) in Section 2.3 is set to 5.2 km, i.e., the top of the tomography grid lies at 5.2 km.

Figure 7 compares the accuracy of the water vapor density predictions by day of year (DOY) in 2022. The predicted data at 0:00 and 12:00 UTC were extracted to match the temporal resolution of the daily radiosonde data, the MAE and RMSE of the predicted values against the true values were calculated at each epoch, and the results were plotted as line graphs. The accuracy of the Informer-WV predictions exhibits no apparent periodic pattern over time, and the RMSE and MAE follow similar trends, indicating that the prediction accuracy remains relatively stable. The accuracy is satisfactory during most of the experimental period, with both the MAE and RMSE below 1 g/m³. The average RMSE and MAE over the selected epochs throughout the year are 0.80 and 0.57 g/m³, respectively, which is an improvement over the accuracies reported in previous studies. The RMSE peaks at 0:00 on DOY 100 (April 10, 2022). The MAE is also high on that day, but its peak occurs at 0:00 on DOY 351 (December 17, 2022). According to the meteorological and weather warning records of the Hong Kong Observatory, extreme weather occurred and weather warnings were issued in Hong Kong on both dates. Therefore, the poor predictive performance of the model during these two periods can be attributed to extreme weather.
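The mapping between DOY and calendar date, and the selection of the two daily radiosonde epochs, can be sketched with the Python standard library. The helper names `doy_to_date` and `sounding_epochs` are hypothetical and serve only to illustrate the bookkeeping.

```python
from datetime import datetime, timedelta


def doy_to_date(year: int, doy: int) -> datetime:
    """Convert a day of year (1-based) to a datetime at 0:00 UTC."""
    return datetime(year, 1, 1) + timedelta(days=doy - 1)


def sounding_epochs(year: int, doy: int) -> list:
    """Return the two daily radiosonde epochs (0:00 and 12:00 UTC) of a DOY."""
    midnight = doy_to_date(year, doy)
    return [midnight, midnight + timedelta(hours=12)]


# The two peak-error days named in the text
rmse_peak_day = doy_to_date(2022, 100).date().isoformat()
mae_peak_day = doy_to_date(2022, 351).date().isoformat()
print(rmse_peak_day, mae_peak_day)

# All evaluation epochs of 2022 (365 days, two epochs per day)
all_epochs = [e for doy in range(1, 366) for e in sounding_epochs(2022, doy)]
print(len(all_epochs))
```

With hourly predictions, the same two hours per day can be picked by filtering on `epoch.hour in (0, 12)` before computing the per-epoch MAE and RMSE.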

## 3.3 Strategies for tomography voxel division

For the horizontal voxel division, the Hong Kong region was divided into a 10 × 10 grid in this experiment, with each cell spanning 0.055° in longitude and 0.04° in latitude, as shown by the gray grid lines in Fig. 4.
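The horizontal division can be reproduced from the study-area bounds given in Section 3.1. The sketch below builds the cell edges and maps a coordinate to its grid column; the function name `voxel_column` and the index convention (west to east, south to north, zero-based) are assumptions made for illustration, and the example coordinate is only an approximate position near the center of the area.

```python
import numpy as np

# Study-area bounds from Section 3.1
LON_MIN, LON_MAX = 113.85, 114.40
LAT_MIN, LAT_MAX = 22.20, 22.60
N_LON = N_LAT = 10  # 10 x 10 horizontal cells

lon_edges = np.linspace(LON_MIN, LON_MAX, N_LON + 1)  # 0.055 deg spacing
lat_edges = np.linspace(LAT_MIN, LAT_MAX, N_LAT + 1)  # 0.04 deg spacing


def voxel_column(lon: float, lat: float):
    """Return the zero-based (i_lon, i_lat) cell index, or None if outside the grid."""
    if not (LON_MIN <= lon <= LON_MAX and LAT_MIN <= lat <= LAT_MAX):
        return None
    # side="right" puts a point lying on an edge into the cell to its east/north;
    # the min() keeps points on the outer boundary inside the last cell.
    i = min(int(np.searchsorted(lon_edges, lon, side="right")) - 1, N_LON - 1)
    j = min(int(np.searchsorted(lat_edges, lat, side="right")) - 1, N_LAT - 1)
    return i, j


# Approximate, illustrative coordinate near the center of the study area
print(voxel_column(114.17, 22.31))
```

The cell sizes follow directly from the bounds: (114.40 − 113.85)/10 = 0.055° in longitude and (22.60 − 22.20)/10 = 0.04° in latitude.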

The vertical division of the tomography grid is shown in Fig. 8. The traditional approach takes 10 km as the top of the grid and divides the 0–10 km range into 13 nonuniform layers with thicknesses of 0.3, 0.4, 0.5, 0.6, 0.6, 0.6, 0.7, 0.7, 0.8, 0.9, 1.2, 1.3, and 1.4 km from bottom to top. Based on the vertical precision of the predicted water vapor density shown in Fig. 6, 5.2 km is adopted in this paper as the top height of the new tomography grid. The 0–5.2 km range is therefore designated as the tomography region and is divided in the same way as above, which corresponds to the lowest nine layers of the traditional scheme, while the water vapor density of the voxels in the 5.2–10 km range is given by the model-predicted values.
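The layer bookkeeping can be checked with a few lines of arithmetic. The sketch below accumulates the layer thicknesses listed above, locates the 5.2 km boundary, and counts the voxels on each side of it, assuming the 10 × 10 horizontal grid of this section; the variable names are illustrative.

```python
from itertools import accumulate

# Layer thicknesses in km, bottom to top (from the text)
THICKNESS_KM = [0.3, 0.4, 0.5, 0.6, 0.6, 0.6, 0.7,
                0.7, 0.8, 0.9, 1.2, 1.3, 1.4]
TOMO_TOP_KM = 5.2       # top of the reduced tomography grid
N_HORIZONTAL = 10 * 10  # horizontal cells per layer

# Height of the top of each layer; rounding removes floating-point noise
layer_tops_km = [round(h, 1) for h in accumulate(THICKNESS_KM)]

n_layers_total = len(THICKNESS_KM)
n_layers_tomo = layer_tops_km.index(TOMO_TOP_KM) + 1  # layers solved by tomography
n_layers_model = n_layers_total - n_layers_tomo       # layers filled by prediction

print(layer_tops_km)
print(f"tomography layers: {n_layers_tomo}, model-filled layers: {n_layers_model}")
print(f"unknown voxels: {N_HORIZONTAL * n_layers_total} (traditional) "
      f"vs {N_HORIZONTAL * n_layers_tomo} (reduced)")
```

Because 5.2 km coincides exactly with a layer boundary of the traditional scheme, the reduced grid needs no re-layering: the upper layers are simply removed from the set of unknowns.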