Homogeneous and heterogeneous propagation of COVID-19 from super-spreading to super-isolation

We investigated daily COVID-19 cases and death in the 337 lower tier local authority regions in England and Wales to better understand how the disease propagated over a 10-month period. Population density scaling models revealed residual variance and skewness to be sensitive indicators of the dynamics of propagation. Lockdowns and schools reopening triggered increased variance indicative of outbreaks with local impact and country scale heterogeneity. University reopening and December holidays triggered reduced variance indicative of country scale homogenisation which reached a minimum after New Year. Homogeneous propagation was associated with better correspondence with normally distributed residuals while heterogeneous propagation was more consistent with skewed models. Skewness varied from strongly negative to strongly positive revealing an unappreciated feature of community propagation. Hot spots and super-spreading events are well understood descriptors of regional disease dynamics that would be expected to be associated with positively skewed distributions. Positively skewed behaviour was observed; however, negative skewness indicative of “cold-spots” and “super-isolation” dominated for approximately 4 months during the period of study. In contrast, death metrics showed near constant behaviour in scaling, variance, and skewness metrics over the full period with rural regions preferentially affected, an observation consistent with regional age demographics in England and Wales.


Introduction:
SARS-CoV-2 spread rapidly from a cluster of cases in China in December 2019 to a global pandemic on 13 March 2020. The number of confirmed cases of COVID-19 continues to grow worldwide with over 112 million cases and nearly 2.5 million deaths. SARS-CoV-2 is thought to spread by direct contact, fomites, and aerosols from both symptomatic and asymptomatic people [1][2][3][4]. During the first year of the pandemic, distancing measures and meeting size restrictions have been widely deployed to slow the spread of the disease by reducing the number and duration of interactions capable of causing infection. At scale, population density could be a proxy for these interactions. For example, someone living in a region of high population density is expected to have a greater number of interactions compared with someone who lives in a rural setting [5].
The effects of population size on COVID-19 dynamics have been investigated previously including aspects of population density effects [6][7][8][9]. Investigations of population density effects have been limited to a relatively small number of time points aggregated over a period of time, usually a month or year [10][11][12][13][14][15][16]. Daily granularity of data is not easily accessible; however, the COVID-19 pandemic has provided a unique and evolving data set with daily updates. These data have been influential in informing government interventions, policy decisions, and public perceptions allowing data driven informed decisions [17].
These daily data at relatively high regional granularity provide an opportunity to document the daily evolution of scaling metrics and residual variance over an extended period. Here, we investigated how daily COVID-19 cases and death in the Lower Tier Local Authorities (LTLAs) of England and Wales scale with population density. These were examined to better understand how infectious disease metrics progress over time at country scale.

Theory:
Scaling Models. Urban scaling [18] uses population to predict a range of urban indicators. A variety of mathematical forms have been applied, with power laws being widely used:
$$Y = Y_0 P^{\beta} \qquad (1)$$
Here, $Y$ is the indicator, $P$ is the population, $Y_0$ is a pre-factor and $\beta$ is the scaling exponent. An estimate of the parameter $\beta$ can be obtained by applying the least squares method to the logarithmic version of equation 1 (i.e. $\log Y = \log Y_0 + \beta \log P$).
When combining rural and urban regions, density metrics provide better models [12][13] than population. This can be described by similar power-law functions of the form
$$Y_D = Y_0 P_D^{\beta}$$
where $Y_D$ and $P_D$ are the indicator density and the population density. Similarly to population scaling, when $\beta < 1$ the scaling is sub-linear, when $\beta = 1$ the scaling is linear and when $\beta > 1$ the scaling is super-linear. When interpreting density scaling results, sub-linear scaling indicates acceleration in rural (low-density) regions and super-linear scaling indicates acceleration in urban (high-density) areas. The log transformed data are usually fitted to the logarithmic form, $\log Y_D = \log Y_0 + \beta \log P_D$, to obtain the parameters.
The residuals, $\varepsilon_i$, from the fits to the models defined in equations 2-4 using the method of least squares provide density scale adjusted metrics (DSAMs). A negative DSAM indicates a region that is below the expectation for its population density, and a positive DSAM indicates a region that is above expectation.
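The fitting step described above can be sketched in a few lines. This is a minimal illustration only, not the authors' R script: the function name `fit_power_law`, the choice of base-10 logarithms, and the five example regions are all assumptions made for the sketch, and the numbers are invented rather than taken from the LTLA data.

```python
import math

def fit_power_law(pop_density, indicator_density):
    """Least-squares fit of log10(Y_D) = log10(Y_0) + beta * log10(P_D).

    Returns (log10_y0, beta, residuals). The residuals play the role of
    density scale adjusted metrics (DSAMs). Every input value is assumed
    to be strictly positive; regions with zero counts would need separate
    handling before logarithms are taken.
    """
    x = [math.log10(p) for p in pop_density]
    y = [math.log10(v) for v in indicator_density]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx                      # scaling exponent
    log10_y0 = y_bar - beta * x_bar       # log of the pre-factor
    residuals = [yi - (log10_y0 + beta * xi) for xi, yi in zip(x, y)]
    return log10_y0, beta, residuals

# Invented example: population density and case density for five regions.
density = [25.0, 120.0, 800.0, 2500.0, 9000.0]
cases = [0.004, 0.03, 0.2, 1.1, 3.0]

log10_y0, beta, dsam = fit_power_law(density, cases)
print(f"exponent beta = {beta:.3f}")
for d, r in zip(density, dsam):
    position = "above" if r > 0 else "below"
    print(f"density {d:>7.1f}: DSAM {r:+.3f} ({position} expectation)")
```

Repeating the fit for each day gives one exponent and one set of residuals per day, which is the form of data needed to follow scaling, variance, and skewness through time.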

Residual and Case Density Models.
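The distribution of residuals obtained from the England and Wales data was modelled using normal and generalised logistic (GL) distributions. The latter has the form
$$f(x; \theta, \sigma, \alpha) = \frac{\alpha}{\sigma} \, \frac{e^{-(x-\theta)/\sigma}}{\left(1 + e^{-(x-\theta)/\sigma}\right)^{\alpha+1}}$$
where $\theta$, $\sigma$ and $\alpha$ are the location, scale and shape parameters respectively such that $\sigma > 0$, $\alpha > 0$ and $-\infty < \theta < +\infty$. The first moment of the GL is $E(X) = \theta + \sigma\left(\psi(\alpha) - \psi(1)\right)$, where $\psi(1) \cong -0.57721$ and $\psi$ is the digamma function. The variance (second central moment) of the GL distribution is $\mathrm{Var}(X) = \sigma^2\left(\frac{\pi^2}{6} + \psi'(\alpha)\right)$.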
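The GL moments can be evaluated numerically as a check on the formulas. The sketch below assumes the type I generalised logistic parameterisation, with location `theta`, scale `sigma` and shape `alpha` (names chosen here for illustration), and uses small hand-written digamma and trigamma helpers based on the standard recurrence and asymptotic series so that no external library is required.

```python
import math

def digamma(x):
    """psi(x) for x > 0: shift x upward by recurrence, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x          # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series

def trigamma(x):
    """psi'(x) for x > 0, using the same shift-then-series approach."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)    # psi'(x) = psi'(x + 1) + 1/x^2
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv * inv2 * (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 / 42.0))
    return result + inv + 0.5 * inv2 + series

def gl_pdf(x, theta, sigma, alpha):
    """Density of the (assumed) type I generalised logistic distribution."""
    z = math.exp(-(x - theta) / sigma)
    return (alpha / sigma) * z / (1.0 + z) ** (alpha + 1.0)

def gl_mean(theta, sigma, alpha):
    return theta + sigma * (digamma(alpha) - digamma(1.0))

def gl_variance(theta, sigma, alpha):
    return sigma ** 2 * (math.pi ** 2 / 6.0 + trigamma(alpha))

# alpha = 1 recovers the symmetric logistic distribution (mean = theta).
for shape in (0.5, 1.0, 2.0):
    print(shape, round(gl_mean(0.0, 1.0, shape), 4), round(gl_variance(0.0, 1.0, shape), 4))
```

With `alpha` below 1 the mean falls below the location parameter, and with `alpha` above 1 it rises above it, which is how a single family can describe both negatively and positively skewed residuals.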

Results and Discussion:
Overview of regions, cases, and number of observations. England [...] showing negative skew (Fig. 2(a)) while at others they were positively skewed (Fig. 2(b)). The availability of testing varied widely over the 10 months, which may be a confounder in some presentations; however, the daily scaling metrics, variance, and skewness will reflect the processes in place on the day and were not obviously aligned with testing or the number of observations. All daily per capita case histograms can be found in Figure S1 in the supplementary material. [...] (Figure S6). The persistence of regional positions relative to the scaling laws indicates that the notion of local "surges" of COVID-19 may be less important than persistent local features after the first few weeks of the pandemic. DSAMs are more useful metrics that could be used to assist local interventions.
In the scaling plots (Fig. 3(a-d)), variability in residual variance was clear by inspection. For example, toward the end of the December holiday period (31/12/2020; Fig. 3(d)) the data were closer to the power law than in September (30/9/2020; Fig. 3(c)). The low variance periods represent a more homogeneous presentation of cases across the regions while the higher variance periods were indicative of more heterogeneous regional cases. All daily scaling plots and corresponding geomaps can be found in Figures [...]

Daily Exponent, Variance, and Skewness for Cases.
The LTLA data were examined to assess the trajectory of scaling behaviour, residual variance, and skewness over 10 months of the pandemic for cases (Fig. 4). The scaling exponents (Fig. 4(a)) for cases rose quickly, reaching a peak near the beginning of the first lockdown (announced on 23/03/2020) in England and Wales, and declined gradually until restrictions were slowly eased toward the end of May and early June. The peak in cases during the first three months coincides with super-linear scaling; however, super-linear scaling was not universal and the preference for propagation in rural vs. urban regions reversed three times during the period of study: early March, late April, and the end of July. Although periods of rapid growth seem to coincide with acceleration in high population density regions, the long-term behaviour of the pandemic makes clear that there is no universal rule and that population density is not a simple proxy for infectious interactions. Residual variance (Fig. 4(b)) changed by more than a factor of 4 during the 10-month period and presented a contrast to the scaling parameters. Variance remained relatively constant [...] Homogenisation also occurred following the release of the second lockdown (03/12/2020) through to the end of the period of study. Notably, only the abrupt release of the second lockdown is associated with an obvious "surge" in cases.
This includes the major holidays of Christmas and New Year's. Neither caused a "surge." They continued the propagation of the disease in a way that was consistent before and after these key dates. The general country scale homogenisation between the LTLA regions drove residual variance to the lowest levels seen over the 10-month period.
Skewness provides a further contrast to case counts, scaling exponents, and variance. We used the scaling law residuals to create a time series of skewness metrics (Fig. 4(c)). Similar behaviour was seen in the per capita case distributions (Fig. 2), with characteristics changing over the course of the 10-month period. When cases follow a distribution with a strong positive skew, the long positive tail of the skewed distribution is indicative of propagation with hot spots and potential super-spreading incidents. Conversely, when the residuals are negatively skewed, this indicates a distribution better characterised by a long tail of "cold spots" or super-isolated regions.
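The daily variance and skewness metrics can be illustrated with a short sketch. It assumes that the residuals for one day are available as a plain list, uses the simple moment-based skewness coefficient (other estimators exist and the one used in the study is not specified here), and introduces a hypothetical threshold of 0.5 purely to label the examples. The residual values are invented.

```python
def residual_variance_and_skewness(residuals):
    """Return (sample variance, moment-based skewness) of one day's residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n   # second central moment
    m3 = sum((r - mean) ** 3 for r in residuals) / n   # third central moment
    variance = m2 * n / (n - 1)                         # unbiased sample variance
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return variance, skewness

def describe_tail(skewness, threshold=0.5):
    """Label the dominant tail; the threshold is a hypothetical choice."""
    if skewness > threshold:
        return "long positive tail (hot spots)"
    if skewness < -threshold:
        return "long negative tail (cold spots)"
    return "approximately symmetric"

# Invented residuals for two days: one with an isolated low region,
# one with an isolated high region.
daily_residuals = {
    "day A": [-2.0, -0.1, 0.0, 0.1, 0.2, 0.1, 0.0],
    "day B": [2.0, 0.1, 0.0, -0.1, -0.2, -0.1, 0.0],
}

for day, residuals in daily_residuals.items():
    variance, skewness = residual_variance_and_skewness(residuals)
    print(f"{day}: variance {variance:.3f}, skewness {skewness:+.2f}, {describe_tail(skewness)}")
```

In this toy case the two days have identical variance but opposite skewness, which is why the two metrics are tracked separately.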

Daily Exponent, Variance, and Skewness for Deaths.
In contrast, daily exponents, variance, and skewness for COVID-19 death [...] proportion of elderly people [11]. The scaling exponents for death throughout are consistent with those seen for scaling of people 60 and above in England and Wales. This is overwhelmingly the demographic most likely to die from COVID-19. Although there is some noise in the differences, the contrast between cases and death is again clear. During the initial periods of the lockdowns (March, November and January) propagation of cases was associated with a GL distribution and negative skew, whilst during less restrictive time frames (August, September, October) propagation was associated with a normal distribution.
Modelling and simulation of propagation [40] using network science and a gamma distribution have been attempted using varying parameter values to represent different proportions of "super-spreaders." This analysis indicated that the initial trajectory of exposed and infected people in a population accelerates quickly in networks where there are a high proportion of super-spreaders. Our analysis makes clear a gamma distribution cannot accommodate the range of shapes required over the full period of a pandemic. A gamma distribution cannot be negatively skewed and the structure of cases in the pandemic has periods of negative skewing.
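The sign constraint can be made explicit with the standard skewness formulas. The notation is assumed for illustration: $k$ is the gamma shape parameter and $\alpha$ is the shape parameter of a type I generalised logistic distribution, taken here as the form of the GL model.

```latex
\gamma_1^{\mathrm{gamma}} = \frac{2}{\sqrt{k}} > 0 \quad \text{for all } k > 0,
\qquad
\gamma_1^{\mathrm{GL}} = \frac{\psi''(\alpha) - \psi''(1)}{\left(\dfrac{\pi^2}{6} + \psi'(\alpha)\right)^{3/2}}
```

Because $\psi''$ is an increasing function on the positive reals, the GL skewness under this parameterisation is negative for $\alpha < 1$, zero for $\alpha = 1$, and positive for $\alpha > 1$, whereas no choice of $k$ can make the gamma skewness negative.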
"Super-spreading" is almost certainly important but understanding the converse concepts of "super-isolation" and cold-spots which can better concepts for defining the features of a distribution need to be appreciated. It is also useful to better understand the characteristics and features of regions where a disease is not spreading or is consistently below expectation. The contrasting behaviour of deaths is also of interest. Normally distributed residuals were the overwhelming feature of the 10 month period.

Conclusions
This study has established that both regional per capita measures and scaling law residuals exhibit both positive and negative skewing. Positively skewed distributions have been used to model pandemic behaviour [40]. This is important to indicate super-spreading and hot-spots, but insufficient to characterise the full sweep of a pandemic.
Similarly, scaling law parameters are often thought to be constant or very slowly changing features of a process. In the case of COVID-19 cases, scaling parameters evolved over relatively short periods of time. For cases, the scaling law exponents reached a peak at the beginning of the first lockdown and gradually declined for approximately three months.
Preferential propagation of COVID-19 cases switched between rural and urban regions multiple times. COVID-19 mortality gave a more consistent picture of low population density regions preferentially and consistently affected with linear scaling only being approached at the beginning and end of the 10-month period.
Variance is a key descriptor of the distribution of regional cases. Lockdowns produce heterogeneity (higher variance) across regions while reducing cases. The re-opening of schools drove heterogeneity during a period of case growth indicative of locally important outbreaks.
Country scale mixing such as occurred with the opening of universities and holiday periods promotes homogenisation (low variance). All key statistical metrics from regional death data were remarkably different from cases in the time period. This is consistent with regional age demographics in England and Wales. From a policy point of view these observations and patterns are particularly important, as they provide insight and expected indicative effects following implementation of health policies.
Within this framework it is important to note that England and Wales had continuous community spread of SARS-CoV-2. Excepting the very early period in March, there has been nothing that could be called a "surge." Within the 10-month period, the rise and fall of cases and deaths have been gradual, as has the evolution of scaling metrics, variance structures, and distribution shapes.

Data Availability
All data generated or analysed during this study are included in this published article (and its supplementary information files). This data was compiled from a range of publicly available sources as noted in the manuscript. These are provided as the following files: UK_regions.xlsx, UK_daily_cases.xlsx, UK_daily_death.xlsx and UK_daily_total.xlsx.

Code Availability
We have also provided an R script as supplementary information. This has been provided as code_version2.R.