PCNN model for imputing OMI tropospheric NO2 column images
Images captured by satellite instruments, such as the OMI, often contain missing values due to factors like cloud fraction and sensor failures. To address this issue and obtain complete images by imputing missing pixels in OMI TCDNO2, we designed the PCNN model (Lops et al., 2023). The PCNN model consists of a U-net deep convolutional neural network (Ronneberger et al., 2015) that incorporates partial convolution layers (Liu et al., 2018) in addition to regular convolution layers (Krizhevsky et al., 2017).
Convolutional neural networks (CNNs) process data by convolving learned filters over the input at each layer and across multiple image channels, assigning weights and biases to different features within the data and differentiating between them. The partial convolution padding process in the PCNN model gradually reduces the influence of the missing data mask during each encoding phase. Partial convolution considers only valid pixels, and its normalization is adjusted according to the fraction of missing data. To handle different mask features for each of the three channels of the dataset (representing the temporal dimension), we replaced the conventional 2D convolution layer with depthwise convolutions (Chollet, 2017) within the partial convolution process during the encoding phase of the model. The PCNN model was trained using TCDNO2 simulated by the Community Multiscale Air Quality (CMAQ) modeling system version 5.2, incorporating OMI missing data masks. Once trained, the model performs spatiotemporal imputation of the original OMI datasets, resulting in a complete OMI TCDNO2 field that can be used by the DNN model for proxy surface NO2 estimations. For more detailed information about the PCNN model used in this study, refer to Lops et al. (2023).
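To make the partial convolution step concrete, the following is a minimal NumPy sketch of the operation as formulated by Liu et al. (2018): only valid pixels contribute to each window, the result is rescaled by the ratio of window size to valid-pixel count, and the mask is updated so that any window containing at least one valid pixel becomes valid. It is not the PCNN implementation of Lops et al. (2023). The function names, the stride of 1, the absence of padding, and the per-channel loop standing in for a depthwise layer are all assumptions made for illustration.

```python
import numpy as np


def partial_conv2d(image, mask, kernel, bias=0.0):
    """Single-channel partial convolution (stride 1, no padding).

    image : 2D array, may contain NaN where data are missing
    mask  : 2D array, 1 for valid pixels and 0 for missing pixels
    kernel: 2D array of filter weights
    Returns the filtered image and the updated mask.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    new_mask = np.zeros((out_h, out_w))
    window_size = kh * kw
    for i in range(out_h):
        for j in range(out_w):
            m = mask[i:i + kh, j:j + kw]
            n_valid = m.sum()
            if n_valid > 0:
                # Zero out missing pixels so NaNs never enter the sum.
                x = np.where(m > 0, image[i:i + kh, j:j + kw], 0.0)
                # Rescale by window_size / n_valid to compensate for gaps.
                out[i, j] = (kernel * x).sum() * (window_size / n_valid) + bias
                new_mask[i, j] = 1.0
            # Windows with no valid pixel stay 0 in both output and mask.
    return out, new_mask


def depthwise_partial_conv2d(images, masks, kernels):
    """Apply partial_conv2d independently to each channel (depthwise).

    images, masks: arrays of shape (channels, height, width)
    kernels      : array of shape (channels, kh, kw), one filter per channel
    """
    results = [partial_conv2d(x, m, k) for x, m, k in zip(images, masks, kernels)]
    outs, new_masks = zip(*results)
    return np.stack(outs), np.stack(new_masks)
```

Because each channel keeps its own mask and filter in this sketch, a gap in one time step does not erase valid pixels in the others, which is the motivation the text gives for the depthwise variant.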
DNN model for estimating daily proxy surface NO2 map
We employed the DNN model to estimate daily surface concentrations of NO2 over the CONUS in the summer of 2017 at a spatial resolution of 10×10 km. The DNN model consists of input and output layers connecting to four hidden layers, each containing 70 neurons (Ghahremanloo et al., 2023b).
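The layer layout described above (an input layer, four hidden layers of 70 neurons each, and an output layer) can be sketched as a plain NumPy forward pass. This is only an illustration of the network shape. The ReLU activation, the He-style random initialization, the single output neuron, and the example input width of 12 are assumptions that the text does not specify; the configuration actually used is described in Ghahremanloo et al. (2023b).

```python
import numpy as np


def init_dnn(n_inputs, hidden=(70, 70, 70, 70), n_outputs=1, seed=0):
    """Create (weights, bias) pairs for a fully connected network.

    The hidden widths follow the text; initialization is an assumption.
    """
    rng = np.random.default_rng(seed)
    sizes = (n_inputs, *hidden, n_outputs)
    return [
        (rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def forward(params, x):
    """Forward pass: ReLU on hidden layers (assumed), linear output."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)
    w, b = params[-1]
    return h @ w + b


# Hypothetical usage: 5 grid cells, 12 predictor variables each.
params = init_dnn(n_inputs=12)
predictions = forward(params, np.zeros((5, 12)))
```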
In our analysis, we utilized several predictor variables, including the OMI TCDNO2 imputed by the PCNN model, as described earlier. We also incorporated weight-averaged NO2 concentrations (10×10 km), generated using the inverse distance weighted averaging approach (Di et al., 2016) with data from the Environmental Protection Agency (EPA) Air Quality System (AQS). Other predictor variables included the Enhanced Vegetation Index (EVI) (5×5 km) from the Moderate Resolution Imaging Spectroradiometer (MODIS); various meteorological factors (0.125°×0.125°), such as air temperature, surface pressure, specific humidity, U/V components of wind speed, and downward longwave radiation flux, from the North American Land Data Assimilation System (NLDAS); surface layer height (0.5°×0.625°) from the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2); modeled surface NO2 concentrations (12 km) from the CMAQ simulation using a priori NOx emissions; population density (8 km) from the NASA Socioeconomic Data and Applications Center (Center for International Earth Science Information Network, 2018); road density (8 km, in km per km2) from the Global Roads Inventory Project (Meijer et al., 2018); and percentage of urban space (8 km) from the 2016 National Land Cover Database (NLCD). We matched all predictor variables to the closest grid cells of the weight-averaged NO2 layer. To assess multicollinearity among the predictor variables and to determine feature importance, we employed the variance inflation factor (VIF) approach (Ghahremanloo et al., 2021b; Kock et al., 2012; Kline, 2015) and the SHapley Additive exPlanations (SHAP) feature importance (Ghahremanloo et al., 2021b), respectively.
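Two of the steps in this paragraph lend themselves to a short worked example: the distance-weighted averaging of station NO2 onto a grid, and the VIF check for multicollinearity. The sketch below shows a generic form of each in NumPy. It is not the procedure of Di et al. (2016) or the authors' code. The function names, the planar Euclidean distance, and the distance exponent of 2 are assumptions. The VIF follows its textbook definition, 1 / (1 - R²), where R² comes from regressing each predictor on all the others.

```python
import numpy as np


def idw_average(station_xy, station_values, grid_xy, power=2.0, eps=1e-12):
    """Inverse-distance-weighted average of station values at grid points.

    station_xy    : (n_stations, 2) coordinates
    station_values: (n_stations,) observed values
    grid_xy       : (n_grid, 2) target coordinates
    power         : distance exponent (2 is an assumed, common choice)
    """
    diff = grid_xy[:, None, :] - station_xy[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    # eps prevents division by zero when a grid point sits on a station.
    weights = 1.0 / np.maximum(dist, eps) ** power
    return (weights * station_values).sum(axis=1) / weights.sum(axis=1)


def variance_inflation_factors(X):
    """VIF for each column of X (n_samples, n_predictors).

    Each column is regressed on the remaining columns plus an intercept
    by ordinary least squares; VIF = 1 / (1 - R^2).
    """
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        residual = y - others @ coef
        r_squared = 1.0 - residual.var() / y.var()
        vifs[j] = 1.0 / (1.0 - r_squared)
    return vifs
```

A VIF near 1 indicates that a predictor is nearly uncorrelated with the others, while large values flag redundancy. The cutoff used to drop a predictor is a modeling choice that this sketch does not set.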
WRF-CMAQ modeling setup
We updated NOx emissions using the Weather Research and Forecasting (WRF)-CMAQ simulation to gain insights into the subsequent impacts on air chemistry. CMAQ version 5.2, developed and released by the US Environmental Protection Agency (EPA), was utilized to simulate air quality variables, including surface NO2 concentration, and to capture the relationship between NOx emissions and NO2 concentration. Meteorological input data were generated by WRF version 4.0, with the initial and boundary conditions obtained from the National Centers for Environmental Prediction (NCEP) North American Regional Reanalysis (NARR). The model simulations were conducted over the CONUS during the summer of 2017, using a horizontal grid spacing of 12 km. The emission data employed as a priori for the NOx emission inversion comprised the US EPA NEI 2017 (Eyth et al., 2016; Eyth and Vukovich, 2016) for anthropogenic emissions, biogenic emissions computed using the Biogenic Emission Inventory System (BEIS) version 3.61 incorporated within the SMOKE modeling system, biomass burning emissions based on the Fire Inventory from National Center for Atmospheric Research (FINN) version 1.5 (Wiedinmyer et al., 2014, 2011, 2006), and lightning emissions produced by the in-line lightning NO (LNO) production module in CMAQ (Allen et al., 2012; Kang et al., 2019) with data from the National Lightning Detection Network (NLDN). Considering the significant contribution of background sources in the free troposphere to tropospheric NOx (Silvern et al., 2019), we implemented dynamic lateral boundary conditions by incorporating additional hemispheric modeling (HCMAQ) with seasonal scaling based on multiple satellite products (e.g., OMI/Aura vertical O3 profile, tropospheric NO2 and HCHO, and MOPITT CO). The dynamic lateral boundary conditions, together with lightning NOx emissions, reduced the underestimation of NO2 in the upper troposphere by 23.24% (Jung et al., 2022a).
Further details regarding the modeling setup and input data preparation can be found in the earlier study by Jung et al. (2022a).
Iterative finite difference mass balance (IFDMB) method
The mass balance approach is utilized to estimate the posterior emissions based on the relationship between NOx emissions and NO2 concentrations near the surface. Additionally, a scaling factor β, introduced by Lamsal et al. (2011), is incorporated to reflect the sensitivity of changes in NO2 concentrations to changes in NOx emissions, and it helps reduce errors associated with the chemical process of NOx (e.g., NOx oxidation by OH and N2O5 hydrolysis). In this study, we updated NOx emissions using the IFDMB method, following the approach described by Cooper et al. (2017), which iteratively adjusts the NOx emissions until the normalized mean difference between the new emissions and the emissions from the previous iteration becomes less than 1%.
$${E}_{t}={E}_{a}\left(1+\frac{1}{\beta }\frac{{{\Omega }}_{o}-{{\Omega }}_{a}}{{{\Omega }}_{a}}\right)$$
where \({E}_{t}\) and \({E}_{a}\) represent the a posteriori and a priori emissions, respectively. \({\Omega}_{o}\) represents the DL-estimated complete surface NO2 map, and \({\Omega}_{a}\) represents the simulated NO2 concentrations. \(\beta\) is the scaling factor, which accounts for the sensitivity of NO2 concentrations to changes in NOx emissions and is calculated as \((\Delta \Omega /\Omega )/(\Delta E/E)\).
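The update rule and its stopping criterion can be illustrated with a small NumPy sketch. It is a toy version of the loop, not the WRF-CMAQ workflow. The `forward_model` callable stands in for a full CMAQ run and is hypothetical, as is the power-law model used in the example. Estimating β from a 10% emission perturbation and computing the normalized mean difference as the summed absolute change divided by the summed previous emissions are both assumptions. The sketch also assumes strictly positive simulated concentrations.

```python
import numpy as np


def ifdmb(e_prior, omega_obs, forward_model, perturbation=0.1,
          tol=0.01, max_iter=20):
    """Iterative finite difference mass balance on a per-cell basis.

    e_prior      : a priori emissions E_a for each grid cell
    omega_obs    : observation-based NO2 (Omega_o) for each grid cell
    forward_model: callable mapping emissions to simulated NO2 (Omega_a);
                   a hypothetical stand-in for a chemical transport model
    Returns the updated emissions and the number of iterations used.
    """
    e = np.asarray(e_prior, dtype=float).copy()
    for iteration in range(1, max_iter + 1):
        omega = forward_model(e)
        # Finite-difference beta = (dOmega / Omega) / (dE / E).
        omega_pert = forward_model(e * (1.0 + perturbation))
        beta = ((omega_pert - omega) / omega) / perturbation
        # E_t = E_a * (1 + (1 / beta) * (Omega_o - Omega_a) / Omega_a)
        e_new = e * (1.0 + (omega_obs - omega) / (beta * omega))
        # Normalized mean difference between successive emission fields.
        nmd = np.abs(e_new - e).sum() / e.sum()
        e = e_new
        if nmd < tol:
            break
    return e, iteration


# Hypothetical example: a nonlinear toy response instead of CMAQ.
def toy_model(emissions):
    return 3.0 * emissions ** 0.8


true_emissions = np.array([1.0, 2.0, 4.0])
observed = toy_model(true_emissions)
posterior, n_iter = ifdmb(np.array([2.0, 2.0, 2.0]), observed, toy_model)
```

In this toy setting the loop recovers the emissions that generated `observed` within a few iterations. In the real workflow, every evaluation of the forward model is a separate model simulation, so the number of iterations directly determines the computational cost.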
OMI tropospheric NO2 column
The OMI instrument on the NASA Earth Observing System (EOS) Aura satellite was launched in July 2004. Since its launch, OMI has provided continuous daily measurements of several air quality components (such as NO2, sulfur dioxide (SO2), bromine monoxide (BrO), HCHO, and aerosols) at a relatively high spatial resolution (up to 13 × 24 km2 at nadir) (Levelt et al., 2006). In this study, the tropospheric NO2 columns used are the OMI operational retrieval products (Level 2, version 3) distributed by the NASA Goddard Earth Sciences Data and Information Service Center (GES DISC) (Bucsela et al., 2013; Choi et al., 2020; Krotkov et al., 2017; Lamsal et al., 2014). However, as with other satellite measurements, a significant amount of data is missing due to factors such as cloud cover, cloud shadow, surface contamination, and erroneous surface reflectance caused by clouds and aerosols (Choi et al., 2016). Furthermore, the row anomaly present in the OMI data since 2008 has significantly reduced the number of available data points and the spatial coverage, impeding further application of the data (Torres et al., 2018).