Data
Confirmed cases: the confirmed cases database of the District Health Secretary of Bogotá (SDS) stores individual-level information on key dates: symptom onset, admission to general hospitalisation and to intensive care units (ICU), discharge from hospitalisation services, and death. It also records the condition of each patient, the severity of the infection, and demographic details such as age and sex[8]. This database is maintained and updated with the daily report of cases confirmed by PCR and antigen tests and with the patient-status information provided by the National Epidemiology Surveillance System (SIVIGILA).
To compute stays in general hospital beds, we created end dates using the following hierarchy of available dates: ICU entrance, discharge, and death. Similarly, we calculated stays in ICU, creating the end dates from discharge and death dates following the same hierarchy. To validate the estimated end dates, we recreated daily occupancy curves from the confirmed cases report and compared them with the official occupancy reports for both services available on the open data websites Datos Abiertos Bogotá[9] and Saludata Bogotá[10].
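A minimal sketch of this end-date hierarchy (function names and the handling of missing dates are our assumptions, not the paper's code):

```python
from datetime import date

def end_of_general_stay(icu_entry, discharge, death):
    """Return the end date of a general-ward stay, taking the first
    available date in the hierarchy: ICU entrance > discharge > death.
    Arguments are `date` or None; names are illustrative."""
    for d in (icu_entry, discharge, death):
        if d is not None:
            return d
    return None  # no end date yet: the stay is right-censored

def end_of_icu_stay(discharge, death):
    """Same idea for ICU stays: discharge takes precedence over death."""
    for d in (discharge, death):
        if d is not None:
            return d
    return None
```

Occupancy curves can then be recreated by counting, for each day, the patients whose stay interval covers that day.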
Genomic data
the genomic surveillance data were published by the Global Initiative on Sharing All Influenza Data (GISAID)[11],[12]; they contain the genomic sequences of SARS-CoV-2 processed by laboratories across the country. We classified the viral lineages using the nomenclature presented in the literature[13] and counted the sequences grouped by epidemiological week, starting from the 12th week of 2021, which is the earliest date available.
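The grouping step can be sketched as follows (ISO weeks are used here as a stand-in for epidemiological weeks, which may differ by a day depending on the convention; the record layout is our assumption):

```python
import datetime as dt
from collections import Counter

def sequences_per_epiweek(records):
    """Count sequences per (ISO year, ISO week, lineage).
    `records` holds (collection_date, lineage) pairs."""
    counts = Counter()
    for collected, lineage in records:
        year, week, _ = collected.isocalendar()  # ISO calendar week
        counts[(year, week, lineage)] += 1
    return counts
```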
Statistical methods
Start and end dates of waves: we computed the start and end dates of each wave using the first-derivative criterion (sign changes of the first derivative) applied to the daily series of new cases. For that purpose, we differentiated the series and located its roots using simple linear interpolation. To avoid spurious roots generated by the typical oscillations of the series, we smoothed both the series and its derivative using Gaussian smoothing with a kernel width of 10 days.
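The smoothing and root-finding steps can be sketched in pure Python (mapping the paper's 10-day kernel width onto the Gaussian scale parameter is our assumption):

```python
import math

def gaussian_smooth(series, sigma):
    """Smooth a daily series with a normalised discrete Gaussian kernel,
    clamping indices at the edges of the series."""
    radius = int(3 * sigma)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(kernel)
    kernel = [w / total for w in kernel]
    n = len(series)
    out = []
    for i in range(n):
        acc = 0.0
        for k in range(-radius, radius + 1):
            j = min(max(i + k, 0), n - 1)  # clamp at the boundaries
            acc += kernel[k + radius] * series[j]
        out.append(acc)
    return out

def sign_change_roots(series):
    """Locate zero crossings of a (differenced) series by linear
    interpolation between consecutive points."""
    roots = []
    for i in range(len(series) - 1):
        a, b = series[i], series[i + 1]
        if a == 0:
            roots.append(float(i))
        elif a * b < 0:
            roots.append(i + a / (a - b))  # linear interpolation
    return roots
```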
Reproduction number
we estimated the time-varying instantaneous reproduction number using the epidemiological R package EpiEstim[14]. We used the daily new cases from the confirmed cases database, grouped by onset date and distinguishing imported from local cases. We assumed an incubation period of 5 days and a serial interval of 6.48 days with a standard deviation of 3.83 days[15],[16]. Additionally, we estimated the reporting delay of the database as the 90th percentile of the distribution of differences between reporting and onset dates of cases. We ran these estimations for the total confirmed population in Bogotá and for adults aged 60 years or older, to assess the possible effects of changes in the targeting of massive testing strategies in Bogotá.
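The delay estimate can be illustrated as follows (the nearest-rank quantile convention is our assumption; the text only specifies the 90th percentile):

```python
import math
from datetime import date

def reporting_delay(report_dates, onset_dates, q=0.9):
    """Estimate the reporting delay as the q-th quantile (nearest rank)
    of report-minus-onset differences, in days."""
    delays = sorted((r - o).days for r, o in zip(report_dates, onset_dates))
    idx = max(math.ceil(q * len(delays)) - 1, 0)
    return delays[idx]
```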
Transmissibility advantage
we evaluated the transmissibility advantage using a multinomial logistic regression with time as the single explanatory variable, with linear predictor given by
$$f(v,t)=\alpha +{\beta }_{v,0}t$$
where \(\alpha\) is the intercept of the model and \({\beta }_{v,0}\) is the variant-specific coefficient of the time covariate, computed with respect to a reference (or pivot) variant. For simplicity, we chose Alpha as the pivot variant, since it is the earliest variant observed in our data.
In general terms, the coefficients \({\beta }_{v,0}\) can be used to calculate the transmissibility advantage of a variant \(v\) with respect to the pivot variant (in our case, Alpha) by means of the following relation[17]:
$${T}_{v,0}=\exp\left(\frac{{\beta }_{v,0}}{7}{g}_{0}\right)$$
where \({g}_{0}=4.5\) days (3.7–5.4) is the generation time of Alpha[18], and the coefficient is divided by seven to rescale it from a weekly to a daily time scale. Thus, we can compute the transmissibility advantage between any two variants \(w\) and \(v\) from the transmissibilities \({T}_{w,0}\) and \({T}_{v,0}\) as follows:
$${T}_{w,v}= \frac{{T}_{w,0}}{{T}_{v,0}}$$
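The two relations above combine into a short sketch (function names are ours; the value of g_0 is the central estimate quoted in the text):

```python
import math

G0 = 4.5  # generation time of Alpha in days (central estimate from the text)

def transmissibility_vs_pivot(beta_weekly, g0=G0):
    """T_{v,0} = exp((beta_{v,0} / 7) * g0), with beta the weekly
    multinomial-regression coefficient rescaled to a daily time scale."""
    return math.exp(beta_weekly / 7.0 * g0)

def pairwise_advantage(beta_w, beta_v, g0=G0):
    """Advantage of variant w over variant v: T_{w,0} / T_{v,0}."""
    return transmissibility_vs_pivot(beta_w, g0) / transmissibility_vs_pivot(beta_v, g0)
```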
Severe outcomes
we computed the Hospitalisation Case Ratio (HCR), ICU Case Ratio (ICU-CR), Case Fatality Ratio (CFR), Hospitalisation Fatality Ratio (HFR) and ICU Fatality Ratio (ICU-FR), disaggregated by sub-populations \((i,g)\), where \(i\) is the wave number and \(g\in \{0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80+\}\) the age group. For the case ratios we used:
$$XC{R}_{i,g} =\frac{{X}_{i,g}}{{C}_{i,g}}$$
where \({X}_{i,g}\in \{{H}_{i,g} ,IC{U}_{i,g}\}\) is either the cumulative number of hospitalised patients (\({H}_{i,g}\)) or the cumulative number of ICU patients (\(IC{U}_{i,g}\)) in sub-population \((i,g)\), and \({C}_{i,g}\) is the cumulative number of cases in that sub-population. The fatality ratios, in turn, were calculated as:
$$XF{R}_{i,g} =\frac{{D|X}_{i,g}}{{X}_{i,g}}$$
In this case \({X}_{i,g}\in \{{C}_{i,g} , {H}_{i,g} ,IC{U}_{i,g}\}\) and \(D|{X}_{i,g}\) is the cumulative number of deaths among the population \({X}_{i,g}\)[19].
We also computed the percentages of Hospitalisation, ICU admission and Deaths per age group and wave:
$$Y{\%}_{i, g} = 100\times \frac{{Y}_{i,g}}{{Y}_{i}}$$
where \({Y}_{i,g}\in \{{H}_{i,g}, IC{U}_{i,g}, {D}_{i,g}\}\) is the number of events for each outcome per wave and age group, and \({Y}_{i}\in \{{H}_{i}, IC{U}_{i}, {D}_{i}\}\) is the total number of events for each outcome per wave. In all cases we estimated 95% confidence intervals using binomial proportions.
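As an illustration of a binomial-proportion interval, here is a Wilson score interval in pure Python (the specific interval method is our assumption; the text only states that binomial proportions were used):

```python
import math

Z95 = 1.959963984540054  # 97.5th percentile of the standard normal

def ratio_with_ci(successes, total, z=Z95):
    """Point estimate and 95% Wilson score interval for a binomial
    proportion (applicable to HCR, CFR, and the other ratios above)."""
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return p, centre - half, centre + half
```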
Probability distributions of epidemiological delays
we fitted probability density functions to the observed distributions of delays from onset to death, to general hospitalisation, and to ICU entrance, and of the lengths of stay in general hospital and ICU beds. We used a Bayesian hierarchical model[20] to fit the parameters of each distribution. Accordingly, we assumed that the set of parameters of the \(j\)-th wave was normally distributed as follows:
$${q}_{i,j} \sim N({q}_{i, Bog},{\sigma }_{i})$$
where \(i=1, 2, 3, \dots, n\) runs over the \(n\) parameters of a given PDF, \(j=1, 2, 3, 4\) indexes the wave, \({q}_{i, Bog}\) is the value of the \(i\)-th parameter of the PDF estimated for Bogotá, and \({\sigma }_{i} \sim {N}^{+}(0,1)\) is the standard deviation, assumed to follow a truncated (half-) normal distribution. For simplicity, we also assumed truncated normal distributions \({N}^{+}(0,1)\) as prior probabilities for the parameters at the district level. All parameters were estimated with 95% intervals.
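A forward simulation of this hierarchical prior may clarify its structure (illustrative only; the actual posterior is fitted with HMC in Stan):

```python
import random

def sample_wave_parameters(q_bog, n_waves=4, seed=0):
    """Forward-simulate the hierarchical prior: for each PDF parameter i,
    draw sigma_i ~ N+(0,1) (half-normal via abs of a normal draw) and
    then wave-level parameters q_{i,j} ~ N(q_{i,Bog}, sigma_i)."""
    rng = random.Random(seed)
    out = []
    for q in q_bog:
        sigma = abs(rng.gauss(0.0, 1.0))  # truncated (half-) normal draw
        out.append([rng.gauss(q, sigma) for _ in range(n_waves)])
    return out
```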
We ran all the estimations of posterior samples using the Hamiltonian Monte Carlo (HMC) algorithm implemented in Stan, with four chains of 2000 iterations each (1000 for warm-up and 1000 for sampling). To select the best-fitting model for each epidemiological distribution, we compared the models pairwise using the Bayes Factor (BF), as follows:
$${B}_{ij}=\frac{z(y\mid {M}_{i})}{z(y\mid {M}_{j})}$$
Here \(z(y\mid {M}_{i})\) is the evidence of model \({M}_{i}\), computed using the Laplace approximation corrected with Thermodynamic Integration[21],[22].
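In practice the evidences are best handled on the log scale to avoid numerical underflow; a minimal sketch (function name is ours):

```python
import math

def bayes_factor(log_evidence_i, log_evidence_j):
    """B_ij = z(y|M_i) / z(y|M_j), computed from log evidences
    (e.g. from thermodynamic integration) for numerical stability."""
    return math.exp(log_evidence_i - log_evidence_j)
```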
We fitted the multinomial regressions and the Bayesian hierarchical models using the same Hamiltonian Monte Carlo configuration in Stan: four chains of 2000 iterations each (1000 for warm-up and 1000 for sampling).