Transmission dynamics, heterogeneity, and controllability of SARS-CoV-2 in the rural area

Background: Few studies have examined the transmission dynamics and heterogeneity of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in rural areas or clarified rural-urban differences. Moreover, the effectiveness of non-pharmaceutical interventions (NPIs) relative to that of vaccination in rural areas is uncertain. Methods: We addressed this knowledge gap using an improved statistical stochastic method based on the Galton-Watson branching process that considers both symptomatic and asymptomatic cases. Data were collected from the epidemiological records of 1136 SARS-CoV-2 infections in the rural outbreak in Hebei, China, between 2 January and 20 February 2021. Results: The estimated average reproductive number R and dispersion parameter k (k < 1 indicating strong heterogeneity) in the rural area were 0.55 (95% confidence interval [CI]: 0.45-0.68) and 0.14 (95% CI: 0.10-0.20), respectively. Although age group and contact-type distributions differed significantly between urban and rural areas, R and k did not. Further, simulation results based on pre-control parameters (R = 0.81, k = 0.27) showed that in the vaccination scenario (80% efficacy and 55% coverage), the cumulative secondary infections would be reduced by more than half; however, NPIs were more effective than vaccinating 65% of the population. The presence of asymptomatic infections might affect the estimation of R but showed no significant effect on the estimation of transmission heterogeneity. Conclusion:


INTRODUCTION
The dynamics of an outbreak depend on the average reproductive number (R) and individual heterogeneity in transmission. Although there is ample research on R, studies regarding heterogeneity are limited. Heterogeneity reflects the variation in the number of secondary infections caused by each case and can be estimated by describing the distribution of secondary cases as a negative binomial distribution with dispersion parameter k, where k < 1 suggests that transmission is over-dispersed [1]. Diseases with high heterogeneity show infrequent but explosive epidemics; for example, in 2003, many settings experienced no epidemic despite unprotected exposure to severe acute respiratory syndrome (SARS) cases [2,3], whereas a few cities suffered explosive outbreaks of SARS [4,5]. Thus, understanding the role of transmission heterogeneity in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) dynamics is important for outbreak control.
However, few studies have explored the impact of asymptomatic infection on disease dynamics, especially on individual heterogeneity in transmission. This is likely due to the lack of a valid statistical model and to fundamental epidemiological questions that remain poorly understood, such as the proportion of asymptomatic cases [6]. Nishiura et al. estimated that the asymptomatic ratio of coronavirus disease (COVID-19) was 41.6% (5 out of 12 confirmed cases) among 565 Japanese individuals evacuated from Wuhan, China [7]. A recent review [8] of 41 studies showed that the pooled percentage of asymptomatic infection was 15.6% (95% confidence interval [CI]: 10.1-23.0%).
Moreover, most studies focus solely on urban areas and ignore rural regions. Rural regions tend to have higher levels of poverty [9] and fewer job opportunities [10] than urban areas. Furthermore, rural areas broadly lack access to healthcare [11], tend to have older and less healthy populations [12,13], and lack awareness of timely medical treatment [14]. These gaps may lead to discrepancies in transmission dynamics between urban and rural areas, warranting correspondingly tailored control policies.
The second-wave outbreak in Hebei, China, mainly occurred in rural regions and was swiftly subdued. After the first case was confirmed on 2 January 2021, the government implemented city-wide nucleic acid tests (NAT) in the two most severely affected cities to detect symptomatic and asymptomatic infections. Concurrently, highly detailed epidemiological information on individuals and their close contacts was collected by the Health Commission of Hebei province.
In this study, we provide an evidence-based picture of the SARS-CoV-2 outbreak in rural areas, where the government should pay more attention to older adults, children, and community contacts when conducting prevention and control measures. Additionally, we extended a statistical model so that it can be applied to other regions and can provide more comprehensive results by considering both symptomatic and asymptomatic cases. To the best of our knowledge, this is the first direct comparison of the effects of non-pharmaceutical interventions (NPIs) under actual conditions with those of vaccination.

Data collection
We collected detailed data on 942 confirmed SARS-CoV-2 infections in Hebei province, China, from 2 January to 20 February 2021, which were available on the website of the Health Commission of Hebei province [15]. In addition, 194 asymptomatic infections were recorded, accounting for 17% (194/1136) of the total infections. No new infections have been confirmed in Hebei province since 14 February, indicating that the outbreak has been controlled. Primary and secondary SARS-CoV-2 infections were identified through: (i) active screening of incoming passengers in Hebei province, especially those who had travelled to areas defined by the Chinese government [16] as medium- or high-risk, to capture travel-associated symptomatic and asymptomatic infections; (ii) passive surveillance in hospitals and outpatient clinics, involving testing of individuals suspected of having COVID-19, to capture symptomatic cases; (iii) contact tracing of all confirmed infections identified by the above screening, followed by systematic monitoring of their close contacts, to capture symptomatic and asymptomatic infections; and (iv) city-wide NAT to capture symptomatic and asymptomatic infections. From 6 January to 22 January 2021, Shijiazhuang City, the most severely affected area in Hebei province, carried out three rounds of population-wide NAT, with the total exceeding 30 million person-times.
The collected data for each confirmed case included age, sex, prefecture, date of symptom onset, date of diagnosis of asymptomatic infection, date of confirmation, potential exposures, and contact history. Based on contact tracing, a SARS-CoV-2 cluster was defined as a group of ≥2 confirmed SARS-CoV-2 cases or asymptomatic infections with an epidemiological link, i.e., occurring through the same contact type (e.g., home, social, community, or other). According to the extent of resolution of the reconstructed infection cluster, i.e., the number of primary cases and the chain size (total number of cases in the transmission chain), three chain types were further identified (Simple/Ordinary/Complex). We considered sporadic cases as isolated simple transmission chains (detailed definitions are provided in Supplement, Section 1). Because contact tracing of asymptomatic infections was unavailable, we imputed them into the whole transmission chain according to the rates of secondary cases (imputation mechanisms are provided in Supplement, Section 2).
We additionally collected detailed epidemiological records for the outbreak of 135 cases confirmed in Tianjin [17] from 21 January 2020 to 26 February 2020. We selected Tianjin, a municipality in China, as a representative urban outbreak and compared the transmission dynamics, demographics, and serial intervals between urban and rural areas. Rural and urban areas in China were defined according to the relevant regulations [18].

Statistical analysis

Inference about transmission characteristics
To estimate R and k simultaneously, we deployed a statistical stochastic method based on the Galton-Watson branching process to simulate the entire transmission. Assuming that the offspring distribution follows a negative binomial distribution with mean R and dispersion parameter k (lower k indicates higher heterogeneity), we estimated the two parameters using maximum likelihood estimation (MLE) [19,20]. Additionally, by considering asymptomatic cases, we extended this method to fit a wider population that includes both symptomatic and asymptomatic cases. To guarantee the robustness of the estimation results, we adopted two approaches to infer the corresponding CIs of R and k, namely, the likelihood ratio test (LRT) and the bias-corrected and accelerated (BCa) bootstrap methods. The model details and parameter settings are provided in Sections 3 and 4 of the Supplement.
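As a rough illustration of the negative binomial offspring model, the sketch below estimates R and k by maximum likelihood from a list of per-case secondary-case counts and derives an LRT-based interval for k. It is a simplification and not the branching-process likelihood described in the Supplement: the function names, the coarse grid search over k, and the toy counts are all assumptions made for illustration.

```python
import math

# Candidate values of k for a coarse grid search (an assumption, not from the paper).
K_GRID = [i / 100 for i in range(1, 501)]

def nb_loglik(counts, R, k):
    """Log-likelihood of counts under a negative binomial with mean R and dispersion k."""
    return sum(
        math.lgamma(x + k) - math.lgamma(k) - math.lgamma(x + 1)
        + k * math.log(k / (k + R)) + x * math.log(R / (k + R))
        for x in counts
    )

def fit_nb(counts, k_grid=K_GRID):
    """Return (R_hat, k_hat, maximised log-likelihood)."""
    if not any(counts):
        raise ValueError("need at least one case with secondary infections")
    # For any fixed k, the MLE of the mean is the sample mean, so only k is searched.
    R_hat = sum(counts) / len(counts)
    k_hat = max(k_grid, key=lambda k: nb_loglik(counts, R_hat, k))
    return R_hat, k_hat, nb_loglik(counts, R_hat, k_hat)

def lrt_interval_k(counts, k_grid=K_GRID, crit=3.841):
    """Range of grid values of k not rejected by an LRT at the 5% level (1 df)."""
    R_hat, _, ll_max = fit_nb(counts, k_grid)
    kept = [k for k in k_grid
            if 2 * (ll_max - nb_loglik(counts, R_hat, k)) <= crit]
    return min(kept), max(kept)

# Hypothetical secondary-case counts for 100 cases (NOT the study data).
toy_counts = [0] * 80 + [1] * 10 + [2] * 5 + [5] * 3 + [10] * 2
R_hat, k_hat, _ = fit_nb(toy_counts)
k_low, k_high = lrt_interval_k(toy_counts)
print(f"R = {R_hat:.2f}, k = {k_hat:.2f} (LRT interval {k_low:.2f}-{k_high:.2f})")
```

The grid search keeps the example dependency-free; a numerical optimiser would normally replace it, and a bootstrap interval could be obtained by refitting on resampled counts.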

Estimation of serial interval
To compare the serial intervals of SARS-CoV-2 in rural and urban areas, we fitted a parametric Weibull model to the serial intervals retrieved from 8 and 12 infector-infectee pairs in Hebei and Tianjin, respectively. Statistically significant differences were assessed using the LRT.
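The comparison described above can be sketched as follows, under stated assumptions: each group's serial intervals are fitted with a two-parameter Weibull distribution by maximum likelihood, and a likelihood ratio test compares separate fits against a single pooled fit (2 degrees of freedom). The exact form of the test used in the study is not specified here, so this is one plausible construction. The function names are hypothetical, the intervals below are invented for illustration, and the fit assumes strictly positive intervals that are not all identical.

```python
import math

def fit_weibull(xs):
    """Weibull MLE for positive data; returns (shape, scale, log-likelihood)."""
    n = len(xs)
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / n

    def score(c):
        # Profile score for the shape parameter; it is increasing in c
        # and its root is the MLE of the shape.
        xc = [x ** c for x in xs]
        return sum(w * l for w, l in zip(xc, logs)) / sum(xc) - 1.0 / c - mean_log

    lo, hi = 0.01, 50.0
    for _ in range(200):  # plain bisection on the score
        mid = (lo + hi) / 2
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    shape = (lo + hi) / 2
    scale = (sum(x ** shape for x in xs) / n) ** (1 / shape)
    ll = sum(math.log(shape / scale) + (shape - 1) * math.log(x / scale)
             - (x / scale) ** shape for x in xs)
    return shape, scale, ll

def weibull_lrt(a, b):
    """LRT of 'separate Weibulls' vs 'one pooled Weibull'; returns (statistic, p-value)."""
    ll_separate = fit_weibull(a)[2] + fit_weibull(b)[2]
    ll_pooled = fit_weibull(list(a) + list(b))[2]
    stat = max(0.0, 2 * (ll_separate - ll_pooled))
    # Chi-square survival function with 2 df has the closed form exp(-x / 2).
    return stat, math.exp(-stat / 2)

# Invented serial intervals in days (NOT the study data): 8 rural and 12 urban pairs.
rural = [2, 3, 4, 5, 6, 6, 7, 9]
urban = [2, 3, 4, 5, 5, 6, 6, 7, 7, 8, 9, 11]
stat, p_value = weibull_lrt(rural, urban)
print(f"LRT statistic = {stat:.3f}, p = {p_value:.3f}")
```

Observed serial intervals can be zero or negative, in which case a shifted or different distribution would be needed; that case is not handled in this sketch.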

Assessment of different interventions
We analyzed the effectiveness of two types of interventions on SARS-CoV-2 transmission: vaccination and NPIs, including city-wide NATs, isolation, and mask wearing. For the NPIs, we divided the entire outbreak period into three segments according to the timing of two rounds of city-wide NAT (9 January and 14 January 2021), and estimated R and k for each time period. We used the parameters before the first NAT intervention as the baseline and the parameters for the second period (between 9 January and 14 January 2021) as the effect of NPIs. To clarify the impact of the vaccine and compare it with that of NPIs, we assumed that the population had been vaccinated before the outbreak (80% efficacy) and simulated the cumulative secondary infections, considering a range of coverage rates (20%, 55%, 65%, and 75%).
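A back-of-the-envelope version of this comparison is sketched below. It is not the simulation used in the study. It assumes that vaccination scales the reproductive number uniformly to R(1 - efficacy × coverage), and it uses the fact that, in a subcritical branching process, the expected total number of offspring infections per primary case over all generations is R + R² + ... = R / (1 - R), which does not depend on k. The function names are hypothetical; the values R = 0.81 (baseline), R = 0.33 (NPI period), and 656 primary cases are those reported in the Results.

```python
def effective_R(R, efficacy=0.0, coverage=0.0):
    """Reproductive number after vaccination, assuming uniform scaling (an assumption)."""
    return R * (1.0 - efficacy * coverage)

def expected_offspring_per_primary(R):
    """Expected offspring infections over all generations per primary case (requires R < 1)."""
    if not 0 <= R < 1:
        raise ValueError("geometric-series formula needs 0 <= R < 1")
    return R / (1.0 - R)

PRIMARY_CASES = 656   # number of primary cases used in the reported simulations
R_BASELINE = 0.81     # estimate before the first city-wide NAT
R_NPI = 0.33          # estimate between the first and second NAT
EFFICACY = 0.80

results = {
    "baseline": PRIMARY_CASES * expected_offspring_per_primary(R_BASELINE),
    "NPIs": PRIMARY_CASES * expected_offspring_per_primary(R_NPI),
}
for pct in (20, 55, 65, 75):
    R_vacc = effective_R(R_BASELINE, EFFICACY, pct / 100)
    results[f"vaccination {pct}%"] = PRIMARY_CASES * expected_offspring_per_primary(R_vacc)

for name, total in results.items():
    reduction = 1 - total / results["baseline"]
    print(f"{name:>16}: {total:8.1f} expected offspring infections "
          f"({reduction:.0%} reduction)")
```

Under these simplifying assumptions, 55% coverage cuts the expected total by more than half, and the NPI-period parameter gives a lower expected total than 65% coverage, which agrees in direction with the reported findings. A stochastic simulation over a fixed time window, drawing offspring counts from the negative binomial distribution, would be needed to reproduce the actual figures and their variability.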

Characterizing SARS-CoV-2 transmission chains and heterogeneity in rural areas
Of the 942 confirmed SARS-CoV-2 infections, excluding two with missing information, 387 (41.1%) occurred in males and 553 (58.8%) in females. We observed 0-4 generations of transmission, with the largest cluster involving 44 SARS-CoV-2-infected individuals. Exposures were grouped into four categories according to contact type: household, community, social, and primary case (definitions are provided in Section 5 of the Supplement). Apart from primary cases, household contacts accounted for the highest proportion of transmission, followed by community contacts and social contacts. Figure 1 shows the reconstructed transmission chain.

Comparison of SARS-CoV-2 transmission characteristics between the urban and rural areas
There were no significant differences in R and k between the two areas (p > 0.05, Table 3). The R values were 0.74 and 0.55 and the k values were 0.35 and 0.14 for urban and rural areas, respectively. However, there were significant differences in age, sex, and contact-type distributions between the two areas. The proportions of older adults (≥ 65 years old) and children (≤ 20 years old) in the rural area were higher than those in the urban area (16.1% vs. 14.8% and 16.4% vs. 3.7%, respectively). Although more than 50% of transmissions in both urban and rural areas were caused by household contacts (61.1% and 51.4%, respectively), community contacts also accounted for a large proportion (46.5%) in rural areas. The median serial interval in the rural area was shorter than that in the urban area (5.5 days vs. 6.0 days), although the difference was not significant.

Assessment of NPIs and vaccination in SARS-CoV-2 transmission
For NPIs, until the first NAT, the average R was 0.81 (95% CI: 0.65-1.02) and the dispersion parameter k was 0.27 (95% CI: 0.14-0.56; Table 4). After the first NAT and before the second NAT, R decreased significantly to 0.33 (95% CI: 0.22-0.50) and remained around this level after the second NAT, while k decreased at first but then increased slightly to 0.17 (95% CI: 0.10-0.31), without significant changes. We concluded that the first round of NAT played the most significant role in curbing the spread of SARS-CoV-2. We conducted simulation studies with 656 primary cases and estimated the cumulative offspring infections from 1 January to 31 March. In Figure 2

Sensitivity analyses
In addition, the robustness of the model was verified. First, we examined the role of asymptomatic infections in the epidemic dynamics. As the assumed proportion of asymptomatic infections increased, the estimates of R increased steadily, from 0.51 to 1.95, whereas the estimates of k fluctuated around 0.14. This indicated that with a higher assumed proportion of asymptomatic cases, based on a parallel dataset, the estimated R was larger, indicating a more severe potential spread of SARS-CoV-2. In contrast, the proportion of asymptomatic infections caused no significant change in the heterogeneity of disease transmission. Moreover, we evaluated the performance of our model based on the idea of 10-fold cross-validation [21]. The data were divided into 10 equal parts, and nine of them were randomly selected for parameter estimation. The mean values (standard deviations) of R and k were 0.55 (0.02) and 0.14 (0.01), respectively, supporting the good performance of our model when generalized to an independent dataset (Table 5).
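The resampling idea behind the 10-fold check can be sketched as below. This is an illustration under assumptions rather than the procedure used in the study: the data are shuffled into ten equal folds, each fold is left out once, and R and k are re-estimated on the remaining nine with a simple negative binomial grid-search fit to per-case secondary-case counts. The function names, the random seed, and the toy counts are hypothetical.

```python
import math
import random
import statistics

def fit_nb(counts):
    """Grid-search MLE of (R, k) for negative binomial counts (illustrative only)."""
    R = sum(counts) / len(counts)  # MLE of the mean for any fixed k

    def loglik(k):
        return sum(math.lgamma(x + k) - math.lgamma(k) - math.lgamma(x + 1)
                   + k * math.log(k / (k + R)) + x * math.log(R / (k + R))
                   for x in counts)

    k = max((i / 100 for i in range(1, 501)), key=loglik)
    return R, k

def ten_fold_estimates(counts, n_folds=10, seed=0):
    """Refit on nine of ten random folds, leaving each fold out once."""
    data = list(counts)
    random.Random(seed).shuffle(data)
    folds = [data[i::n_folds] for i in range(n_folds)]
    fits = []
    for held_out in range(n_folds):
        train = [x for j, fold in enumerate(folds) if j != held_out for x in fold]
        fits.append(fit_nb(train))
    return fits

# Hypothetical secondary-case counts for 100 cases (NOT the study data).
toy_counts = [0] * 80 + [1] * 10 + [2] * 5 + [5] * 3 + [10] * 2
fits = ten_fold_estimates(toy_counts)
R_values = [R for R, _ in fits]
k_values = [k for _, k in fits]
print(f"R: mean {statistics.mean(R_values):.2f}, SD {statistics.stdev(R_values):.2f}")
print(f"k: mean {statistics.mean(k_values):.2f}, SD {statistics.stdev(k_values):.2f}")
```

Small standard deviations across the ten refits suggest that the estimates are not driven by a particular subset of the data.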

DISCUSSION
We characterized the transmissibility of SARS-CoV-2 in rural areas in the presence of asymptomatic infections based on detailed epidemiological records from Hebei, China, and further compared the effectiveness of vaccination with that of NPIs. SARS-CoV-2 transmission in both urban and rural areas showed strong heterogeneity. Moreover, household contact was the most important mode of transmission in both the city and the countryside, but community contact also played an important role in countryside transmission. We also found that in the vaccination scenario (80% efficacy and 55% coverage), the cumulative secondary infections would be reduced by more than half; however, NPIs were more effective than vaccinating 65% of the population. The presence of asymptomatic infections might affect the estimation of R but showed no significant effect on the estimation of transmission heterogeneity.
The estimated dispersion parameter k was 0.14 (95% CI: 0.10-0.20) in the rural area, indicating strong transmission heterogeneity. This result is consistent with that of another study: Lau et al. [22] reported that a rural area (Dougherty) in Georgia, USA, had strong transmission heterogeneity (k = 0.43; 95% CI: 0.39-0.47). Although there was no significant urban-rural difference in k under a 5% type I error (p = 0.09), k in the rural area was lower than that in the urban area, in concordance with the results of Lau et al. [22]. Transmission heterogeneity results from many factors, including pathogen virulence, control measures, and activity density. The results of whole-genome sequencing and phylogenetic analysis revealed that the strains in Tianjin and Hebei both belonged to the European branch of the L-lineage [23,24]. However, large-scale NAT, which was not carried out in Tianjin, can quickly and comprehensively screen for asymptomatic and mildly symptomatic infections, thereby effectively shortening the infectious period. Additionally, the rural outbreak in Hebei coincided with China's Spring Festival and several weddings, thus increasing the probability of large gatherings. Therefore, the government should pay sufficient attention to rural areas instead of focusing solely on urban areas.
SARS-CoV-2 transmission in rural areas had transmission dynamics similar to those in urban areas but differed in terms of age group and contact-type distributions. Our results indicated that rural areas had a larger proportion of older adults and children with COVID-19 than did urban areas. This may be because rural areas had older populations, on average, and more people with underlying health conditions than suburban and urban communities [12]. Additionally, older adults are more likely to be hospitalized and have severe COVID-19, with higher mortality rates [25]. Household transmission played an important role in both outbreaks, which corroborates previous studies [26,27]. However, in rural areas, community contacts also led to a large proportion of infections. Similar to the situation in other rural areas [28], most community contacts in Hebei resulted from wedding receptions. Therefore, the government must develop prevention and control measures in rural areas, mainly focusing on older adults and children and restricting large gatherings that pose a high risk for infectious disease transmission [25].
We found that NPIs led to a larger reduction in infections than vaccination (80% efficacy and 65% coverage). This may explain why the outbreak was swiftly controlled in Hebei. A recent study also verified that NPIs are cost-effective approaches to curb the spread of SARS-CoV-2 [29]. However, the effect of NPIs is closely related to the timing and quality of implementation; hence, similar strategies might have different effects in different cities [30]. Therefore, countries with strong governance can prioritize NPIs until vaccines are widely available. Nonetheless, the durability of responses after vaccination is uncertain [31]; therefore, vaccination is more suitable for countries with economic strength but weak governance.
Our study has several limitations. First, contact tracing of asymptomatic infections was not available, which may give rise to biased reconstructions of transmission chains. However, detailed records of asymptomatic infections are difficult to collect because they require intensive prospective clinical sampling and screening, which hinders many studies. We imputed this information by assuming that it was missing at random, and our sensitivity analyses support the accuracy of our model to a certain extent, but more detailed epidemic data would be helpful. Second, we explicitly fixed the proportion of asymptomatic infections when estimating the model parameters. Although we conducted sensitivity analyses, improved methodology and more accurate estimation of the asymptomatic proportion are directions for future research. Lastly, the influence of asymptomatic infections on transmission dynamics needs to be further explored and evidenced.

Figure 1. SARS-CoV-2 transmission chains. Reconstructed transmission chains of 942 SARS-CoV-2-infected individuals in the rural outbreak in Hebei province. Each node in the network represents a patient infected with SARS-CoV-2, and each link represents an infector-infectee relationship. The color of the node denotes the reported contact type of the infected individual. The size of the node corresponds to the number of secondary cases.

X.Z., Y.Z., and Y.L. conceived the study and led the overall scientific questions. Y.L., T.H., and X.G. carried out data analysis and modelling studies. Y.L. collected and processed the data. Y.L. and T.H. wrote the paper. Y.L., T.H., and X.G. revised the paper.


Figure 2 Comparison

Table 1. Characteristics of the three types of transmission chains for SARS-CoV-2

(Table 1). Considering asymptomatic infections, the average R of the outbreak in the rural area was 0.55 (Table 2); this represents a low transmission risk. The 95% CIs of R estimated by the likelihood ratio test (LRT) and bias-corrected and accelerated bootstrap (BCa bootstrap) methods were 0.45-0.68 and 0.44-0.69, respectively. There was no significant difference between the results obtained by the two methods, indicating the robustness of our results. The dispersion parameter k was 0.14, and the 95% CIs estimated by LRT and BCa bootstrap were 0.10-0.20 and 0.10-0.19, respectively, indicating considerable heterogeneity of SARS-CoV-2 transmission in the rural area.

Table 3. Comparison of SARS-CoV-2 transmission between urban and rural areas. Student's t-test was used to compare the differences in age groups. The χ² test was used to compare differences in sex and contact type. LRT was used to compare the differences in the serial interval and transmission dynamics. The estimation of R and k is based on the imputed dataset with a total of 1136 infections. IQR, interquartile range.

Table 5. Sensitivity analyses. The impact of asymptomatic infections and cross-