Processing methodology of global anthropogenic emissions for air quality modeling

The Global Emissions Initiative (GEIA) stores and offers global emission inventory datasets developed over the last 30 years. One of the most recently updated global datasets covering anthropogenic emissions is provided by the Copernicus Atmosphere Monitoring Service (CAMS). This study applied the netCDF Operators (NCO) software to preprocess the anthropogenic sources included in the CAMS datasets and convert those files into input for the Sparse Matrix Operator Kernel Emissions (SMOKE) model for future air quality modeling. Six steps were applied to obtain the required file format. The case of the central coast of Chile was analyzed to compare the global database with the official reports for the on-road transport sector. Some differences were found in the most populated locations of the domain of analysis, while the rest of the zones registered similar values. The methodology presented in this report could be applied in any other region of the planet for air quality modeling studies. Global datasets like CAMS are useful for hemispheric analysis and can also provide estimates at the mesoscale. They represent an opportunity for locations without official reports or with outdated data.


Introduction
An accurate and complete emission inventory is crucial to achieving a reliable air quality simulation [1]. Some countries have established datasets with this information at a high level of detail, which improves studies supporting better environmental policy at local, regional and national levels. Unfortunately, many regions have an unclear or undefined emission inventory, which prevents air quality modeling in those locations.
In the last 30 years, various global emission inventory datasets have been developed for different sources and specific periods of analysis. Today, the Global Emissions Initiative (GEIA) stores and offers those datasets. They represent an opportunity for locations without official reports or with outdated data. More information about the organization's mission and goals can be found on its website (http://www.geiacenter.org). One of the most recently updated global datasets covering anthropogenic emissions is provided by the Copernicus Atmosphere Monitoring Service (CAMS), developed by the European Centre for Medium-Range Weather Forecasts (ECMWF) on behalf of the European Commission.
The CAMS dataset is a compiled emission inventory covering the years 2000-2020 for many atmospheric compounds. The anthropogenic sources include 12 sectors, and the spatial resolution is 0.1°. The methodology used to estimate the emission inventory is described in a published report [2].
The CAMS dataset files are hosted on the GEIA website. They are provided in the Network Common Data Form (NetCDF) format and are available upon user request. Instructions for downloading the desired files are sent by email to the registered user.
To date, there are no published reports on the use of CAMS datasets as input to air quality models. One of the main causes could be the lack of information about their processing. The purpose of this study is to present a methodology for processing CAMS datasets and converting those files into input for the Sparse Matrix Operator Kernel Emissions (SMOKE) model [3]. SMOKE is the emission inventory preprocessor for air quality models such as CMAQ [4] and CAMx [5]. The steps reported here could be applied in future air quality modeling studies in regions where the emission inventory is undefined or has not been developed at all.

Methodology
The CAMS-GLOB-ANT datasets contain monthly emissions for anthropogenic sources. In this study, the on-road transport (tro) sector was analyzed. All files were processed using the netCDF Operators (NCO) software. NCO is designed for the analysis and modification of gridded data stored in NetCDF format [6]. This software was released in 2008, and many research studies have used it since then. NCO provides different types of commands for reading, writing, interpolating and averaging [7].
In this study, the first step was the extraction of the emissions for one month, as shown in Table 1. If the user downloads more than one month in a single file, the month counter starts at zero. For example, in this study, the total emissions for 2018 were downloaded, so the counter for the August emissions was seven. The counter for any other month follows its position in the ordered list of months. If the user downloads a file containing only one month of data, this step is not needed.
Table 1 Main steps for processing the CAMS dataset to obtain the format required as input to SMOKE (columns: step, command, example).
After the extraction of the desired monthly data, the file format must be modified. The second step deletes the attribute named "time" from the file extracted in step 1. Next, the variable "time" is also deleted in step 3. SMOKE does not read gridded emission inventory data with that attribute and variable [3], which is why they must be removed in steps 2 and 3. Step 4 changes the file format to NetCDF type 3, which is a SMOKE requirement for this gridded emission data. Steps 2-4 are also mentioned in the methodology reported by Pino-Cortés et al. [8].
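As an illustration of steps 1-4, the following Python sketch assembles one plausible set of NCO command lines without executing them. It is not a reproduction of the commands in Table 1. The use of `ncks -d` for the extraction, `ncwa -a` followed by `ncks -x -v` to remove "time" (treating "time" as a dimension in step 2), and `ncks -3` for the format change are assumptions made for this example, as are the file names and the dimension name `time`.

```python
import shlex

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def month_index(month):
    """Zero-based position of a month in a 12-month file (Jan -> 0)."""
    return MONTHS.index(month)


def build_steps(src, month, stem="step"):
    """Return NCO argument lists for steps 1-4 (nothing is executed here)."""
    idx = month_index(month)
    s1, s2, s3, s4 = (f"{stem}{n}.nc" for n in range(1, 5))
    return [
        # Step 1: keep one time record; an integer bound is read as an index.
        ["ncks", "-O", "-d", f"time,{idx}", src, s1],
        # Step 2 (assumed): average over the single record to drop the
        # "time" dimension.
        ["ncwa", "-O", "-a", "time", s1, s2],
        # Step 3 (assumed): exclude the remaining "time" variable.
        ["ncks", "-O", "-C", "-x", "-v", "time", s2, s3],
        # Step 4: rewrite the file in netCDF-3 classic format.
        ["ncks", "-O", "-3", s3, s4],
    ]


if __name__ == "__main__":
    # Hypothetical input file name holding the 12 months of 2018.
    for cmd in build_steps("cams_glob_ant_tro_2018.nc", "Aug"):
        print(shlex.join(cmd))
```

Printing the commands rather than running them lets the user check the month counter (seven for August) and the file chain before touching any data.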
Finally, steps 5 and 6 modify the "longitude" coordinate. The original CAMS-GLOB-ANT file has the longitude ordered from 180° to -180°, as shown in Fig. 1a. However, the longitude must be rearranged to run from 0° to 360°, with 0° as the first column. Figure 1b shows the result of step 6. The output file from step 6 meets all the requirements for input into the SMOKE model. All steps listed in Table 1 can be repeated for every sector included in the CAMS-GLOB-ANT datasets.
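The logic of the longitude rearrangement can be sketched on a toy grid in plain Python. This is only an illustration of the idea, not the NCO commands used in the study: the function name, the ascending cell-centre longitudes and the 90° toy grid are assumptions chosen for this example.

```python
def rotate_longitudes(lons, rows):
    """Reorder columns so longitudes run 0..360 instead of -180..180.

    lons: ascending cell-centre longitudes in [-180, 180) (assumed order).
    rows: list of rows, each holding one value per longitude.
    """
    # Step 5 analogue: find the first non-negative longitude and move
    # the eastern half of the grid in front of the western half.
    split = next(i for i, lon in enumerate(lons) if lon >= 0)
    order = list(range(split, len(lons))) + list(range(split))
    # Step 6 analogue: relabel negative longitudes by adding 360.
    new_lons = [lons[i] + 360 if lons[i] < 0 else lons[i] for i in order]
    new_rows = [[row[i] for i in order] for row in rows]
    return new_lons, new_rows


if __name__ == "__main__":
    lons = [-135.0, -45.0, 45.0, 135.0]  # toy 90-degree grid
    rows = [["a", "b", "c", "d"]]        # one latitude row of labels
    print(rotate_longitudes(lons, rows))
```

The data columns and the longitude labels are reordered together, so each emission value stays attached to the same physical location after the shift.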

Case Study Results And Discussions
The monthly processed files described in the previous section were input into SMOKE as gridded data.
The spatial distribution of the emissions processed in SMOKE is shown in Fig. 2. The letters mark the locations of the communes in the domain of analysis. The grid cells located over the ocean are explained by the resolution of the CAMS dataset files. Nevertheless, the emissions processed in SMOKE can be considered an acceptable preliminary estimate for future air quality modeling.
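To give a sense of why a 0.1° grid can place coastal emissions in cells over the ocean, the short sketch below estimates the size of one grid cell. It assumes a spherical Earth with a mean radius of 6371 km and a latitude of about 33° S for the central coast of Chile; both values are approximations introduced for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def cell_size_km(lat_deg, res_deg=0.1):
    """Approximate north-south and east-west extent of one grid cell."""
    km_per_deg = math.pi * EARTH_RADIUS_KM / 180.0
    north_south = res_deg * km_per_deg
    # Meridians converge towards the poles, so the east-west extent
    # shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


if __name__ == "__main__":
    ns, ew = cell_size_km(-33.0)  # roughly the latitude of the study area
    print(f"{ns:.1f} km x {ew:.1f} km")
```

Under these assumptions a cell measures roughly 11 km by 9 km, which is large enough for a single cell to cover both a coastal city and part of the adjacent sea.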
One of the main uncertainties in emission inventory simulation is the temporal profile of the emissions. The SMOKE output files provide the monthly fraction of transport emissions, as shown in Fig. 3. In this case, three different profiles were obtained for different communes in the zone of analysis.
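A monthly fraction of this kind can be illustrated with a few lines of Python. The sketch assumes that the fraction is the monthly emission divided by the annual total; the numbers are hypothetical placeholders in arbitrary units and are not values from CAMS or SMOKE.

```python
def monthly_fractions(monthly_totals):
    """Normalise monthly emissions so that the fractions sum to 1."""
    total = sum(monthly_totals)
    if total <= 0:
        raise ValueError("annual total must be positive")
    return [value / total for value in monthly_totals]


if __name__ == "__main__":
    # Hypothetical monthly emissions, January to December.
    totals = [6, 6, 8, 9, 9, 9, 7, 12, 9, 9, 8, 8]
    fractions = monthly_fractions(totals)
    # Zero-based index of the month with the largest share.
    peak = max(range(12), key=fractions.__getitem__)
    print(peak, round(fractions[peak], 2))
```

Comparing such profiles between communes, or against an official inventory, only requires that both series are normalised in the same way.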
The highest emissions occurred during August 2018 for all communes. In contrast, the lowest fractions were observed in January, February and July, which coincide with the holiday season in Chile. The temporal profile obtained using the CAMS datasets is reasonable except for Viña del Mar and Valparaíso. These communes receive many tourists during the holidays every year, which affects the transport sector and increases traffic on urban streets. In January and February, their monthly fractions should therefore be higher than during the rest of the year. The information shown in Fig. 3 could help future studies of emission inventories for this anthropogenic source in Chile. The CAMS dataset records were compared to the official report from the PRTR in Chile [10]. As shown in Table 2, some differences were found in the most populated locations of the domain of analysis, while the rest of the zones registered similar values.

Summary
The methodology presented in this report could be applied in any other region of the planet. This study applied NCO commands available for preprocessing the CAMS dataset files. As a result, the file format required to input gridded data into the SMOKE model is obtained. The emissions and temporal profiles recorded in the CAMS datasets should be compared to official reports for the transport sector. More realistic and accurate emission inventories should benefit the research community by enabling more air quality modeling studies in zones where this information is scarce or undefined. Global datasets like CAMS are useful for hemispheric analysis and can also provide estimates at the mesoscale.

Declarations
Funding
This work was supported by the supercomputing infrastructure at NLHPC (ECM-02) and the Project 039.461/2020 DI EMERGENTE PUCV 2020.

Competing interests
The author declares that he has no competing interests.

Availability of data and materials
All data generated or analyzed during this study are included in this published article.

Code availability
The SMOKE code can be downloaded at www.cmascenter.org. Maps used in the spatial plots were created using Google Earth Pro and Panoply. Panoply is available at www.giss.nasa.gov/tools/panoply.

Figure 1 CO emissions of CAMS datasets. a) original format. b) processed using NCO in this study. Note: The designations employed and the presentation of the material on this map do not imply the expression of any opinion whatsoever on the part of Research Square concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. This map has been provided by the authors.