Robust association between short-term ambient PM2.5 exposure and COVID prevalence in India

Novel coronavirus (COVID) outbreak is the deadliest pandemic in our lifetime. The COVID prevalence risk may be enhanced due to comorbidity from other health risk factors like air pollution. However, such evidence is still lacking in India. Using daily conrmed cases, ambient PM 2.5 (ne particulate matter) exposure and meteorological parameters from 28 major states of India between March 14-June 9, 2020, in a generalized additive model, we estimate the association between short-term PM 2.5 exposure and daily COVID conrmed cases. We nd that a 10 mg m -3 increase in ambient PM 2.5 exposure (with a lag of 0-14 days) is signicantly associated with an increased COVID incidence [relative risk (RR) of 1.135 (95% uncertainty interval: 1.091-1.180)] after adjusting for the meteorological factors. A non-linear association between PM 2.5 (lag 0-14) and COVID infection predicts an RR of 4.482 (3.357-5.983) for exposure at 60 mg m -3 relative to 25 mg m -3 . Our results indicate a signicant positive association between ambient PM 2.5 exposure and COVID prevalence in India. As India is easing lockdown measures, higher outdoor air pollution may have implications on COVID transmission, information which can be helpful for general public and policymakers alike.


Introduction
Novel coronavirus (COVID) outbreak in Wuhan, China in December 2019 and its subsequent spread across the world has been declared as an international public health emergency by the World Health Organization (WHO). The scale of the COVID-19 impact both on morbidity and mortality is unprecedented in our lifetime and is expected to have a momentous socio-economic impact on the global population. The disease has been spreading in almost every country through infected individuals coming in direct or indirect contacts with others 1 . To stop the spread of the disease further, either partial or complete lockdown with strict vigilance of social distancing has been adopted by the affected countries.
Early studies 2,3 have pointed out that patients with any comorbidity are more vulnerable to COVID.
Particularly the non-communicable diseases such as hypertension, chronic obstructive pulmonary disease, cerebrovascular disease and diabetes are identi ed as major risk factors for COVID patients 4 . Besides, a high level of air pollution exposure has been identi ed as an important risk factor for COVID patients 5,6 . In a study in China 7 , a positive association has been observed between COVID cases and major criteria pollutants (PM 2.5 , PM 10 , NO 2 ,O 3 and CO) except SO 2 . A similar positive association between COVID cases and PM 2.5 exposure was obtained from Italy 8 . While these studies examined the association between COVID cases and short-term exposure to air pollution, chronic exposure to air pollution is also being considered as a risk of COVID outbreak 9 .
In response to COVID pandemic, India was one of the few countries to implement a nationwide lockdown (from March 25 to April 14, 2020) at an early stage and announced the extension of the lockdown till May 30, 2020, with partial relaxation after April 21, 2020. The rst COVID case was diagnosed in India on January 30, 2020, and it took 59 days to reach 1,000 cases. In our analysis period the cases steadily rose from 81 cases as of March 14 to 267,261 cases as of June 9, 136,176 people recovered (50% recovery rate) and 7798 people died. The extent of the success for the early implementation of lockdown in reducing the spread of the disease in India can only be known in future. However, India has a large burden of non-communicable diseases 10 and is one of the most polluted nations in the world 11 . In 2017, 99% of the Indian population was exposed to ambient PM 2.5 level higher than the WHO air quality guideline 12 . This suggests that the Indian population is likely to be vulnerable to COVID and hence, it is important to examine if the positive association between air pollution and COVID cases found in the recent literature stands true for India too.
In this study, we examine the association of short-term exposure to ambient PM 2.5 (lag 0-7, 0-14 and 0-21 days) with COVID prevalence in India (Fig. 1) using a generalized additive model (GAM) (see Methods). Due to the paucity in ground-based measurements in India 13 , we use satellite-PM 2.5 data that is calibrated and evaluated against reference-grade monitors (Fig. 2). We analyze COVID prevalence data in place of death because the total number of COVID-19 deaths (fortunately) is not large enough for statistical analysis. We perform additional six sensitivity tests to con rm the robustness of our results-(1) analysis with exclusion of Maharashtra where one-third of the COVID cases in India is registered; (2) inclusion of the total number of testing in the model; (3) altering the period of analysis; (4) exploring the non-linearity in the association; (5) analysis at the district level and (6) use of a mixed-effect quasi-Poisson model.

Results
Association between short-term PM 2.5 exposure and con rmed COVID cases. In Table 1, we present summary statistics for daily con rmed COVID case count, ambient PM 2.5 exposure and meteorological data. The 28 states in our study included over 2,67,000 COVID cases during our analysis period. The mean PM 2.5 for our sample is 60 μg m -3 and the average daily number of con rmed cases is 108.5 (with a standard deviation of 311.6, this large standard deviation points towards the disease heterogeneity at the state level). The average daily temperature and humidity were 302.3 K and 0.0124 kg/kg respectively. We plot the moving average lag effect of short-term ambient PM 2.5 exposure with lag 0-7, lag 0-14, lag 0-21 in Fig. 3. We nd that a 10 μg m -3 increase in ambient PM 2.5 exposure is associated with an RR of The sensitivity of the association to various cases for robustness check. Our rst sensitivity analysis involves dropping the state of Maharashtra from our analysis (Fig. 4a). We nd that a 10 μg m -3 increase in ambient PM 2.5 exposure is associated with an RR of 1.013 (0.982-1.045), 1.048 (1.002-1.096) and 1.014 (0.962-1.069) for lag 0-7, lag 0-14 and lag 0-21 respectively. The result implies that the relationship between COVID con rmed cases and short-term exposure to ambient PM 2.5 is still present after the exclusion of the state of Maharashtra. In our next sensitivity analysis, we introduce explicit control for the number of tests conducted ( Analysis using district-level data and meta-analysis reveals similar results as before (Fig. 6). Our estimates from these analyses are similar in magnitude, however, these sensitivity analyses show that estimates for lag 0-21 no longer remains signi cant. This implies that our results are robust to the alternate unit of analysis (district) and alternate choice of estimation model (meta-analysis with the state as the random variable) as well.

Discussion And Conclusions
Using a GAM, we nd that short-term exposure to air pollution as represented by ambient PM 2.5 is positively associated with daily con rmed COVID cases in 28 states of India. Our results are found to be robust to multiple sensitivity checks. For a country like India with an average PM 2.5 amongst the highest in the world, our analysis shows that short-term exposure to PM 2.5 can be an additional risk factor in COVID infection and transmission.
Previous literature has established associations between exposure to air pollution and spread of measles 14,15 and spread of respiratory diseases 16,17 . Speci cally, in case of COVID, studies from China 7 and Italy 8,9 have shown air pollution to be an important factor in disease transmission and mortality. Our study provides evidence from India which is currently witnessing a rapid rise in COVID cases and has a signi cant proportion of its population residing in areas with poorest air quality in the world.
Our study has important policy implications. Air quality has improved signi cantly in India during lockdown due to cease of economic activity nationwide 18 . However, since India is currently easing the lockdown, it is expected to increase the pollution level. Pollution mitigation strategies should be adopted by the public and prescribed by policymakers to prevent further spread of the infection, especially in the polluted place like the Indo-Gangetic Plain in Northern India where 24-hr ambient PM 2.5 remains higher than the national standard most of the time. Such areas should receive special attention as pollution might exacerbate the spread of COVID infection.
We note that our study should be interpreted keeping in mind the following points. First, the unit of analysis is a big geographical unit that is a state. The daily data on COVID infection is released consistently only at this level by the government; hence we conduct our analysis at the state level. Since the association remains signi cant at a district level, we feel the results to hold good at a ner spatial scale. Future studies can explore pollution-infection association at a city level as more data becomes available. Second, due to the absence of a ground-based monitoring network in India that has adequate regional coverage, we rely on satellite data for PM 2.5 only. Although most other pollutants are found to be highly correlated with PM 2.5 7 , precise single pollutant models for other pollutants like SO 2 , NO 2 , PM 10 , CO and O 3 are not considered in this study due to paucity of representative stations in most of the states.
Third, due to the unavailability of other relevant confounding factors like daily mobility or migration data, and age and gender of patients, we could not perform detailed heterogeneity analysis for various subgroups. Lastly, these estimates are not causal, but only highlight the association of air pollution with COVID con rmed cases similar to studies following similar empirical design 3,7 . Our robust association calls for focused cohort studies to establish causality and understand the biological mechanism.
We conclude that a signi cant positive association between short-term exposure to ambient PM 2.5 and COVID infection in India suggests that air pollution can be an additional risk factor in disease transmission and therefore, the air pollution mitigation programs need to be implemented with more e ciency.
Methods COVID data. We use a daily number of COVID con rmed cases for states (and union territories, UTs) of India from a crowdsourced platform (covid19india.org), which collates data from o cial state bulletins among other sources. The data on this platform is updated on a real-time basis with information from the o cial state press bulletins, o cial (Chief Minister, Health Minister) handles, Press Information Bureau, Press Trust of India, ANI reports. We use data on con rmed COVID cases from March 14 to June 9, 2020.
The total number of con rmed COVID cases in India as of June 9 th was 267,261 of which 28 major states (and UTs) had more than 100 cases and accounted for 99.9% of all cases (Fig. 1). We include data from Ladakh (a new UT) in the erstwhile state of Jammu and Kashmir in our analysis. We also used data on the number of tests conducted from the same source (i.e. covid19india.org).
Ambient PM 2.5 exposure data. Inadequate in-situ data of ambient PM 2.5 in India 13 propels us to utilize satellite-based ambient PM 2.5 exposure. We derive surface PM 2.5 from satellite aerosol optical depth (AOD) product following our previous works 19,20 . For this study, we analyze AOD retrieved by Moderate Resolution Imaging Spectroradiometer (MODIS) using MAIAC (Multi-Angle Implementation of Atmospheric Correction) algorithm at 1 km 1 km resolution. We convert daily MAIAC AOD to PM 2.5 using a dynamic scaling factor derived from MERRA-2 (Modern-Era Retrospective analysis for Research and Applications, version 2) reanalysis data and calibrate the satellite-PM 2.5 using the coincident in-situ measurements of PM 2.5 by a ground-based network that is maintained and quality-controlled by the Central Pollution Control Board (CPCB). We have used a percentile-based regression technique to develop region-speci c correction factors and tuned the satellite-PM 2.5 closer to the ground-based measurements 20 . Since the satellite-PM 2.5 represents satellite overpass time (10 am to 2 pm), we have applied a diurnal scaling factor from MERRA-2 reanalysis data to estimate 24-hr PM 2.5 from the retrieved PM 2.5 . Comparison of 24-hr satellite-derived calibrated PM 2.5 with CPCB measurements (Fig. 2) reveals an excellent correlation (R = 0.98) and a reasonable root mean square error (24.2 g/m 3 )attesting to the quality of the satellite data product given the large spatial data gaps in ground-based observations.
Meteorological data. We analyze the temperature and speci c humidity data from ERA5 reanalysis. ERA5 is an updated and improved version of the ERA-Interim datasets 21 . We use the hourly data at the lowest level averaged over the diurnal scale in our analysis. We note that the surface pressure can be lower than 1000 hPa in a high-altitude grid for which the ERA5 reanalysis extrapolates the values from the level above in the post-processing.
Modeling framework. The ambient PM 2.5 exposure-COVID relationship has been examined by GAM, which uses daily data on the count of con rmed COVID cases, 24-hr average PM 2.5 exposure and weather data on humidity and temperature for the 28 major states of India. We estimate daily state-level PM 2.5 and meteorological parameters by averaging all the grids lying within the state boundary. For the grids that partially belong to a particular state, the area-weighted average is considered in Arc-GIS using the statelevel shape les. The population-weighted daily exposure estimates allow us to minimize exposure misclassi cation as the urban areas (where the COVID cases are higher) have a larger population (and hence larger weight) than the rural areas in a state/union territory (UT).
The dependent variable-daily number of con rmed COVID cases is a count variable; hence, we use GAM with a quasi-Poisson family for over-dispersed data 14,22,23 . The median incubation period for COVID cases in China has been estimated to be 5.1 days, with 97.5% of cases developing symptoms within 11.5 days 24,25 . Thus, in this study, we model short-term exposure to ambient PM 2.5 (lag 0-7, lag 0-14, lag 0-21) as a moving-average 7,22 . Our basic GAM is de ned as follows: In this model, gyit is a log link function of the expectation yit ≡EYit, where Yit is the series of the count of daily con rmed COVID cases in a state. t denotes the day of observation and α is the intercept. PMil represents the linear term for l+1 day moving average of PM 2.5 in state i. Our main coe cient of interest is β which captures the association between short-term exposure to PM 2.5 and COVID con rmed cases.
The delayed effect of meteorological factors-temperature and humidity are also controlled for the same period.
To capture the possible non-linear relationship between the meteorological factors and COVID incidence, we use a natural cubic spline (ns ()) function with three degrees of freedom. The choice of degrees of freedom is based on previous studies 3,7,26,27 . yi,t-1 denotes previous day count of daily con rmed COVID cases in state i, to account for serial correlation in our data 7,26,27 . We also introduce state xed effects, statei, to account for unobserved time-invariant characteristics of states like population density, availability of health infrastructure and total population. Lastly, time xed effects, dayt, are added to the model to capture unobserved factors at the day level, they account for shocks like national lockdown, lifting of movement restrictions 7 . These xed effects (state and time) help us in exploiting the temporal and spatial variation in our data.
Sensitivity analysis. We employ six additional sensitivity analysis to check for robustness of our results. First, we exclude the state of Maharashtra which accounts for the largest contribution (one-third) to the total cases in India (as of June 9, 2020) from our analysis to check whether the results hold. Second, we include total tests conducted in each state in our analysis to check whether our results are sensitive to this alternate speci cation. This variable (total tests conducted) captures how states are ramping up their efforts to detect COVID infections which in turn affects the total con rmed case count in each state. The data on total cases con rmed in each state is not available for every day and hence the inclusion of this variable in our model has reduced our analysis sample considerably. Third, we test for the robustness of our results by altering the choice of the analysis period. We do this by restricting our study only to the post lockdown period, that is from March 25 to June 9, 2020. Fourth, we test whether our ndings are sensitive to an alternate non-linear speci cation for the exposure variable. We replace the linear PM 2.5 variable with a quadratic B-spline 23 . The quadratic B-spline for the moving average PM 2.5 is de ned by 3 knots placed at 20 μg m -3 , 60 μg m -3 and 100 μg m -3 . All the spline basis variables are centered at 25 μg m -3 (which is the WHO air quality guideline for 24-hour PM 2.5 ).
Our next sensitivity analysis is to test the impact of exposure misclassi cation (if any, due to the use of state-level population-weighted PM 2.5 exposure) on our results. We could not carry out city-level analysis as the COVID cases are not reported systematically at city-level. Instead, we used the second-order geographical and administrative unit (i.e. district) as the unit of analysis. On April 30, 2020 Government of India through a circular (available at https://static.mygov.in/rest/s3fspublic/mygov_158831498053877021.pdf) categorized all districts of India (total number of districts in India is 733) into the red zone (hotspots), orange zone (incidence of cases, doubling rate is relatively lower than red zones) and green zone (no con rmed cases in the last 21 days). We note that even the district-level data on COVID con rmed cases is only available from April 21, 2020, and the data is not available for each day for every district (the reason for mainly relying on the state-level analysis). Out of 130 districts categorized as red zone districts, we conducted our analysis on 118 districts which have consistent data.
Our nal robustness check comprises of conducting an alternate analysis which used meta-analysis following a two-stage process. In the rst stage, state-speci c PM 2.5 -COVIDrelationshipsare analyzed using a time series Poisson model which accounted for overdispersion in the data. The rst stage model is similar to equation 1 described above, it used a linear control for exposure (moving average), a natural cubic spline with df = 2 for humidity, df = 3 for temperature, df = 4 for time and is also controlled for the previous day-case count to account for serial correlation. In the second stage, these individual state associations are combined using a random-effects meta-analysis to arrive at a national level estimate.
The random-effects meta-analysis model was tted using maximum likelihood estimation.
The analysis has been carried out in R (version 3.5.1) using glm command in the mgcv package. The meta-analysis was conducted using mvmeta command. The estimates of ambient PM 2.5 exposure is expressed as relative risk (RR) with RR > 1 representing an increased risk due to unit increases in PM 2.5 exposure by 10 μg m -3 and corresponding 95% con dence intervals have also been provided.

Declarations
Data availability. All data will be made available on request from the rst author. Cumulative con rmed COVID cases in states of India with more than 100 cases (grey shade) as of June 9, 2020. Larger circle implies higher COVID cases.