Weekly Bayesian modelling strategy to predict deaths by COVID-19: a model and case study for the state of Santa Catarina, Brazil

Background: The novel coronavirus pandemic has affected Brazil's Santa Catarina State (SC) severely. At the time of writing (24 March 2021), over 764,000 cases and over 9,800 deaths by COVID-19 had been confirmed, hospitals were fully occupied, and local news reported at least 397 people on the waiting list for an ICU bed. Despite initial state-wide measures at the outbreak of the pandemic, the state government passed most responsibilities down to municipal governments, leaving them to plan whether and when to apply Non-Pharmaceutical Interventions (NPIs). In an attempt to better inform local policy making, we applied an existing Bayesian algorithm to model the spread of the pandemic in the seven geographic macro-regions of the state. However, as we found that the model was too reactive to changes in data trends, here we propose changes to extend the model and improve its forecasting capabilities.
Methods: Our four proposed variations of the original method access data on daily reported infections and take under-reporting of cases into account more explicitly. Two of the proposed versions also attempt to model the delay in test reporting. We simulated weekly forecasting of deaths over the period from 31/05/2020 until 31/01/2021. The first week of data was used as a cold start for the algorithm, after which weekly calibrations of the model were able to converge in fewer iterations. Google Mobility data were used as covariates to the model, as well as to estimate the susceptible population at each simulated run.
Findings: The changes made the model significantly less reactive and quicker to adapt to scenarios after a peak in deaths is observed. Assuming that cases are under-reported greatly benefited the stability of the model, and modelling retroactively-added data (due to the "hot" nature of the data used) had a negligible impact on performance.
Interpretation: Although not as reliable as death statistics, case statistics, when modelled in conjunction with an overestimate parameter, provide a good alternative for improving model forecasts, especially in long-range predictions and after the peak of an infection wave.


Introduction
The World Health Organization (WHO) declared COVID-19 a global pandemic in mid-March 2020, prompting countries to take action to reduce the spread of the virus in view of the serious respiratory problems it causes, which require specialized care in Intensive Care Units (ICUs) 1,2 . Little was known about this new strain of coronavirus, which threatened to overwhelm health systems, had forced several countries into lockdown, and had already been spreading rapidly across Brazil 3 . With no drug treatments or vaccines in sight at that time, and in the face of a lack of national measures to prevent the spread of the disease 4,5 , governors and mayors had to decide independently on the implementation of non-pharmaceutical interventions (NPIs) 6 . In the midst of this scenario, and despite the harsh inherent challenges of epidemic modelling, mathematical models offered a timely approach to help understand the regional dynamics of contagion of the disease and to predict how this health crisis could unfold in the weeks and months that followed 7-16 .
A prominent mathematical model was introduced by the MRC Centre for Global Infectious Disease Analysis group at Imperial College London in March 2020 17 . The source code and a technical description of the equations used in this Bayesian inferential model were put online in April 2020 18 , and the study was peer reviewed and published in June 2020 19 . This model sought, above all, to estimate the impact and effectiveness of the NPI measures being taken by European countries. Nonetheless, the model produces other results of interest, such as an estimated number of people infected by SARS-CoV-2 and the variation of the reproduction number (R_t) up until the current date.
In this paper, we present four variations of the Flaxman et al model (here called the base model) to forecast deaths by COVID-19 and to overcome limitations that we observed after running the model weekly for the 7 macro-regions within Santa Catarina, a southern state in Brazil. More specifically, we show that the original model was not able to predict major changes in the trend of mortality data. This was most noticeable in weeks when the number of new reported infections showed a rapid increase or decrease. Since the base model did not have access to these data, it could not anticipate changes in the trend of infections. The covariates used in the model were also identified as inadequate, since they limited the effective reproduction number R_t to change only at predetermined time points where an NPI measure came into effect 20 . Despite these limitations, the model has been used elsewhere to estimate the impact of NPI measures in two Brazilian states 21 .
Other obstacles, not related to the algorithm itself but which we also consider, are the under-reporting of infected cases 22 and delays in test reporting 23 . The data suggest an average delay of 5 days from RT-PCR test collection until the result is available in the official database of the state of Santa Catarina. As a consequence, at any given date it is almost guaranteed that the most recent week of data is incomplete and almost uninformative.
Our proposed alterations to the original method aim at overcoming the limitations mentioned above, allowing this mathematical model to be used more effectively for forecasting. Additional equations and algorithmic strategies allow the model to use reported cases to estimate deaths by COVID-19.

Results
We simulated the base model and the four variations of our proposed model (base-ron, base-rnn, base-ror, base-rnr) described in the Materials and Methods section and summarised in Table 1. All models add the newly reported infections to the equations, but they differ in whether we explicitly overestimate infections in the equations and whether an estimated percentage of cases and deaths is added retroactively to the data before running the model, to account for delay in test reporting.
Each model produced a forecast for the 7 macro-regions of the state of Santa Catarina in weekly snapshots, starting from 31/05/2020, when daily snapshots of data became available in the official database system of the state, until 31/01/2021. At every simulated week, the models were fed with data exactly as they were available at that date and produced a forecast of the number of deaths for the following 30 days. The first week is used as a warm-up (to calibrate the method and allow us to run a lighter variant); from the second week onward, the priors estimated in the previous week are provided to each corresponding model to help speed up convergence. More details about the proposed model, its variations, and our test workflow can be found in the Materials and Methods section.
On average, both the original and the alternative models had a similar prediction accuracy over the first 7-10 days of forecast, but our proposed models outperformed the base model in the medium term. As shown in Figure 1, which illustrates residual errors aggregated over the entire state of SC, the error margin of the predictions made by the base model grew wider over time, whereas our models maintained a more stable error throughout the forecasting period. Table S1 also provides a numerical comparison of these errors for windows of 7 and 30 days, and Table S2 presents the results for each of the seven geographic regions that compose the state of Santa Catarina.
The gap between the baseline model and the proposed methods over time is most noticeable at particular points in time, as indicated by the RMSE plots for the 7-day and 30-day forecasting windows of the models in Figures 2 and 3, respectively. The base model showed the largest short-term error in the middle of August 2020 and at the end of November 2020. On the 30-day window, the baseline algorithm clearly makes the worst predictions, particularly during August 2020 and at the beginning of 2021 (Figure 3). Comparing against the curve of deaths in SC, shown in Figure 4, we observe that these higher errors occurred on dates during or immediately after the peaks in the daily number of deaths (highlighted dates). Diagnostic graphs produced by the model for these dates confirm that base was unable to reflect major changes in the trend of death data: the model predicted that the number of infections was growing even though the data on new reported infections already displayed a downward trend (Figures S1a-S1c). One can contrast these diagnostic plots with the ones obtained by the base-ron model for the same dates in Figures S2a-S2c, where this misdirection in the predicted death trend is not present.
Our methods outperformed the baseline in almost all runs, but the best algorithms were the ones that attempt to overestimate the number of infected individuals from the number of reported cases: base-ron and base-ror. These methods yielded the most stable predictions over both forecasting periods examined. An interesting outcome is that we model the overestimate as a Gaussian distribution of the form overestimate_m ∼ N(11.5, 2.0), but in most cases this variable was optimised to a much smaller value, for example overestimate_m ≤ 5 for the "Sul" and "Grande Oeste" geographical macro-regions (Figure S6). As it happens, the MCMC inference algorithm automatically converges to a value that makes the most sense for the optimisation problem.
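As an illustrative sketch of what the overestimate_m ∼ N(11.5, 2.0) prior implies (the prior is from the text above; the sampling code, seed, and case counts are ours, not the model's implementation), one can draw from the prior and scale a day's reported cases:

```python
import numpy as np

rng = np.random.default_rng(42)

# Prior over the per-region under-reporting factor, as described in the text.
overestimate = rng.normal(loc=11.5, scale=2.0, size=100_000)

# Implied "true" infections for an illustrative day with 120 reported cases.
reported_cases = 120
implied_infections = reported_cases * overestimate
```

Under this prior the implied infections centre around 11.5 times the reported count, but nothing forces the posterior to stay there: as noted above, the MCMC inference is free to shrink the factor wherever the data demand it.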
On the other hand, base-rnn and base-rnr exhibited larger errors in predictions for certain periods, for example the middle of July 2020, late August 2020, and the end of December 2020 (Figure 2). Adding an estimate of the retroactive data did not seem to impact the predictive accuracy, since this characteristic was present both in one of the best-performing models (base-ror) and in one of the less predictive ones, base-rnr. Other assumptions embedded in each model variation can be found in the Materials and Methods section and in Table 1. Another way to visualise these results is by comparing the graph of cumulative deaths with the predictions made by base and by one of the best performing models, base-ron (Figure 5). Scenarios 01 and 03 show the models' 95% confidence intervals.

Discussion
There is evidence in the literature that COVID-19 models are inefficient for long-range forecasting 24 . In this paper we show, however, that one can use "hot" data (i.e., the most recent data, which are constantly being updated and might still be incomplete) and update a model on a weekly basis to achieve higher accuracy, and that such "hot" data come with their own set of issues, such as unreliability and delays in updating. Knowledge of SARS-CoV-2 interactions in the human body, as well as the social contagion dynamics of the disease, has not been elucidated entirely. Much is still unknown, and epidemic modelling has to address real-time uncertainties and changes as closely as possible. A model which does not take into account the fast pace of data production is bound to underperform when compared to a model based on up-to-date information. Our proposed models overcome some limitations of the original algorithm in Flaxman et al 18 , mostly by accessing data on daily reported infections and accounting for under-reporting in a more explicit way. We also tested variations of the algorithm that account for the delay between test collection and notification, but these did not prove as useful in predicting the death curve in the state of Santa Catarina. These alternative models, however, are not without their failings. While predictions have improved and some assumptions of the model could be confirmed by inspection of the data (for example, the onset-to-death delay seems to follow a gamma distribution, as the original model assumed), there are many assumptions in the original model that have not been thoroughly validated 25 (e.g., is R_0 really equal to 3.28?). Another issue is that, from the point of view of the optimisation algorithm, the estimated R_t values and the estimated number of people infected daily are interchangeable.
The algorithm could reach two opposing configurations that are equally valid and optimal: one where the reproduction rate is low but there is a large pool of infected people in the population, and another separate solution in which the number of infected people is small but R t is larger.
Also, even though adding infection data into the model has given it more adaptability, the same data could make the model more fragile in the future. If the infection dynamics change (e.g. because of new, more transmissible strains of the virus), the model might be biased to readjust its fitting of the historical data to compensate. This could be counteracted by more reliable epidemiological data from other sources (e.g., tracking the prevalence of new virus strains, information about age, or records of entry into and exit from ICUs) or by including even more granular mobility data, at the expense of citizens' privacy, all of which are generally more expensive or unfeasible to obtain.
New data, or data of higher resolution, can alleviate some challenges in epidemic modelling, but, importantly, new observations can also help revise the assumptions of existing models in search of a more accurate description of real-world cases 24,26 . Our proposed model is one step in that direction of scientific inquiry. We show that a model can become more accurate, and therefore more useful for forecasting and decision making, by adding one more data source and a new assumption about the under-reporting of cases. We intend to review the other assumptions built into the original model and to continue to investigate how well the changes we have introduced hold up in the face of new epidemic waves, new variants of the virus, and new political measures that affect the dynamics of contagion.

Epidemiological Data
Anonymized data on every confirmed case of COVID-19 in SC, along with date of onset of first symptoms, date of PCR test collection, date of death, and municipality of residence, were downloaded from the state government big data platform Plataforma BoaVista 27 . The estimated population in each of the seven regions of Santa Catarina was obtained from the Brazilian Institute of Geography and Statistics (IBGE) 28 , and mobility data for every city were downloaded from Google Mobility 29 . The state of Santa Catarina has over 7 million inhabitants in 297 municipalities, organised into seven distinct geographical macro-regions, distributed across roughly 95,000 square kilometers of land area. The state government imposed suspensions of many economic activities after the first deaths were confirmed in SC in March 2020, but ended up relaxing social distancing measures, eventually leading to a decree in June 2020 after which municipal governments became responsible for most decisions regarding NPI measures. At the time of writing (24 March 2021), over 764,000 cases and over 9,800 deaths by COVID-19 had been confirmed, and hospitals were fully occupied, with local news reporting at least 397 people on the waiting list for an ICU bed 30 .

Bayesian Modelling
The baseline model (base) is almost identical to the Markov chain Monte Carlo (MCMC) model made available by Flaxman et al 18 , with the exception of the covariates used. Instead of manually curated non-pharmaceutical intervention measures, we used Google Mobility data as covariates to the model. This strategy counteracts an implicit confirmation bias of the model 20 and has been used by the original group in subsequent work 31 . There were also practical reasons to amend the covariates: in contrast to the European countries that the original model targets, the state of Santa Catarina did not impose strict state-wide measures consistently during this period, with municipal governments being independently responsible for social distancing measures 6 , rendering changes in legislation hard to track. Our Bayesian models rely on the same distributions as the base model, so whenever possible we use the same symbols to denote the same distributions. We refer the reader to the original papers for a full description of the symbols 18,19 .
We now describe the equations and modifications of our proposed models, which are also summarised in Table 1. In mathematical terms, we observe daily deaths D_{t,m} and daily cases C_{t,m} for days t ∈ {1, . . . , n} and regions m ∈ {1, . . . , M}. These values might be augmented in some models (e.g. base-ror and base-rnr) to D*_{t,m} and C*_{t,m}, as explained later in this section. The objective function of the original model, and part of the objective of our models, is given by

D_{t,m} ∼ NegativeBinomial(d_{t,m}, d_{t,m} + d_{t,m}^2 / ψ),   (1)

where d_{t,m} represents the number of modelled deaths, following

d_{t,m} = xfr_m · Σ_{τ=0}^{t−1} c_{τ,m} π_{t−τ},   (2)

where xfr is either the region-specific case-fatality rate cfr (in models base-rnn and base-rnr) or the infection-fatality rate ifr ≈ 0.0076 estimated for Brazil in 31 (in models base, base-ron, and base-ror), and π is the infection-to-death distribution of the original model. All models have an uncertainty factor added in the same way as in the original model.
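The convolution defining the modelled deaths d_{t,m} can be sketched for a single region as follows (a minimal illustration only; the infection series, delay distribution, and helper name are ours, not the paper's code):

```python
import numpy as np

def expected_deaths(infections, pi, xfr):
    """d_t for one region: a fatality rate times the convolution of past
    infections with the delay-to-death distribution pi (pi[0] is unused,
    since a death cannot occur on the day of infection)."""
    n = len(infections)
    d = np.zeros(n)
    for t in range(n):
        for tau in range(t):           # sum over all earlier days tau < t
            d[t] += infections[tau] * pi[t - tau]
    return xfr * d

# Illustrative check: 1000 infections on day 0, deaths split over days 1-2,
# scaled by the ifr ~ 0.0076 used for Brazil.
d = expected_deaths([1000, 0, 0], pi=[0.0, 0.5, 0.5], xfr=0.0076)
```

The double loop makes the dependency structure explicit; a production implementation would vectorise this as a discrete convolution.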
A key component in all variations of the proposed algorithm is that, instead of relying solely on death-based data to model the pandemic (largely considered to be the most reliable source), we also use the reported infections as a way to make the model less reactive. In some variations, we suppose that the number of reported infections C_{t,m} might be under-reported, so we add a normally distributed variable overestimate_m ∼ N(11.5, 2.0) to model under-reporting explicitly, following estimates that the number of COVID-19 cases in Brazil was about 11 times higher than what is officially reported 22 . So, instead of optimising only the original objective function in Equation 1, we also rely on Equation 3 below to calibrate our model:

C_{t,m} ∼ NegativeBinomial(c_{t,m}, c_{t,m} + c_{t,m}^2 / ψ_2),  with  p_{t,m} = overestimate_m · c_{t,m},   (3)

where p_{t,m} estimates the actual number of people infected at time t considering the overestimate. c_{t,m} represents the number of reported cases, and it depends on the same assumptions about susceptible individuals S, reproduction rate R, and generation distribution g as the original model:

c_{t,m} = S_{t,m} R_{t,m} Σ_{τ=0}^{t−1} c_{τ,m} g_{t−τ}.   (4)

Our models have K = 7 covariates, which correspond to the 6 Google Mobility indicators and the percentage of the population of a region that is susceptible to infection, S_{t,m}. Since Google Mobility data are generally unavailable for future dates, when simulating the weekly runs we assumed that they remain constant from one week before the snapshot date onward. The reproduction rate R_{t,m} is assumed to vary with the covariates: the percentage of susceptible population S_{t,m} (the seventh covariate) was modelled with an impact measure α_pop similar to that of the mobility data. In both the base and our proposed models we use a state-wide impact measure α_k as well as a per-region impact measure α_{k,m}.
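The renewal process driving the modelled case series (each day's cases generated by earlier cases, weighted by the generation distribution g and scaled by susceptibility S and reproduction rate R) can be sketched for one region as follows; the function name and toy inputs are ours, for illustration only:

```python
import numpy as np

def renewal_cases(c0, S, R, g, n):
    """c_t = S_t * R_t * sum_{tau < t} c_tau * g_{t - tau}:
    the discrete renewal process for one region."""
    c = np.zeros(n)
    c[0] = c0                          # seed infections on day 0
    for t in range(1, n):
        c[t] = S[t] * R[t] * sum(c[tau] * g[t - tau] for tau in range(t))
    return c

# Sanity check: with a fully susceptible population and R = 1, each case
# replaces itself exactly one generation later, so the series stays flat.
c = renewal_cases(c0=10.0,
                  S=np.ones(5), R=np.ones(5),
                  g=[0.0, 1.0, 0.0, 0.0, 0.0], n=5)
```

Raising R above 1 in this sketch produces geometric growth, and depleting S damps it, mirroring the qualitative behaviour described in the text.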
To simulate the baseline algorithm consistently, our simulations with the base model did not include S_{t,m} as a covariate, but we used the same way of weighting the covariates, with both a state-wide α_k and a per-region α_{k,m}.

Figure 6. Distribution of cumulative retroactively-added cases on the last date.
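The way the covariates modulate R_{t,m} through the state-wide and per-region impact measures can be sketched as below. We assume the logistic link of the base model, under which R_t equals R_0 when all impact measures are zero; the array shapes and names here are ours, for illustration:

```python
import numpy as np

def reproduction_number(R0, X, alpha_shared, alpha_region):
    """R_t = R0 * 2*sigmoid(-sum_k (alpha_k + alpha_{k,m}) * X_{t,k}):
    covariates scale R0 through a logistic link, so zero covariate
    effect leaves R_t = R0 and positive effects shrink it."""
    eta = -X @ (alpha_shared + alpha_region)   # X has shape (n_days, K)
    return R0 * 2.0 / (1.0 + np.exp(-eta))

# With all impact measures at zero, R_t stays at R0 for every day.
R_flat = reproduction_number(3.28, np.ones((4, 7)), np.zeros(7), np.zeros(7))

# Positive impact measures (e.g. reduced mobility) pull R_t below R0.
R_down = reproduction_number(3.28, np.ones((4, 7)), np.full(7, 0.5), np.zeros(7))
```

The factor 2 keeps the link symmetric around R0, so the covariates can also push R_t above R0 when their combined effect is negative.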
Some of our models (base-ror and base-rnr) also try to take into account the delay between PCR test collection and test result notification. We examined how this reporting delay affects the data in the state of Santa Catarina; the results can be seen in Figure 6 for cases and in Figure 7 for deaths, broken down by region of the state. Notice from these plots that consolidated infection counts can be between 2.5 and 7.5 times higher than their initially reported values, meaning that from 60% to more than 85% of cases are reported with 5 days of delay. Deaths follow a similar but less volatile pattern, rarely exceeding values 2 times higher than their initial reports, yet possibly having 50% of the data reported after this period. While this confirms that deaths are the most reliable source of information, it also exposes an issue in using such "hot" data for modelling: they can often be incomplete and suggest a false decrease in the number of cases and deaths.

Figure 7. Distribution of cumulative retroactively-added deaths on the last date.
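The arithmetic linking the multipliers to the late-reported share (e.g. a 2.5x consolidation multiplier meaning 60% of cases arrived late) can be made explicit in a small sketch; the helper name and the example counts are ours:

```python
import numpy as np

def retro_multiplier(initial_counts, final_counts):
    """How many times higher the consolidated counts are relative to the
    counts first reported for the same dates."""
    return np.asarray(final_counts, float) / np.asarray(initial_counts, float)

def late_share(multiplier):
    """Fraction of the consolidated counts that arrived after the
    initial report: 1 - 1/multiplier."""
    return 1.0 - 1.0 / np.asarray(multiplier, float)

# 40 cases initially reported, 100 after consolidation: a 2.5x multiplier,
# i.e. 60% of the final count was reported late.
m = retro_multiplier([40], [100])
share = late_share(m)
```

Applying the same two functions to the 7.5x extreme gives a late share of about 87%, matching the "more than 85%" range read off Figure 6.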
In the aforementioned models, we increase the number of cases and deaths of the past week by a percentage delta before running the model, to try to compensate for this delay; the distribution of these deltas can be seen in Figures S3 and S4. This gives us

C*_{t,m} = C_{t,m} · (1 + δ^C_{t,m})   (5)

and

D*_{t,m} = D_{t,m} · (1 + δ^D_{t,m}).   (6)

Another minor modification concerns the onset-to-death distribution, which the original model assumed to follow Gamma(17.8, 0.45). We kept the gamma family, but estimated the mean and deviation from the data at each snapshot. For reference, this value was close to Gamma(20.67, 0.76) in the latest simulated weeks, and the distribution of onset-to-death in Santa Catarina can be seen in Figure S5.
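The augmentation of the most recent counts by a percentage delta can be sketched as follows (the window length, delta value, and function name are illustrative choices of ours, not the fitted values from Figures S3-S4):

```python
import numpy as np

def augment_recent(counts, delta, window=7):
    """Inflate the most recent `window` daily counts by a percentage delta,
    yielding the augmented series C* (or D*) used by the models that
    compensate for reporting delay."""
    out = np.asarray(counts, dtype=float).copy()
    out[-window:] *= 1.0 + delta       # older days are left untouched
    return out

# Two weeks of a flat series of 10 daily cases; the last 7 days are
# inflated by an illustrative 50% delta.
aug = augment_recent([10.0] * 14, delta=0.5, window=7)
```

In practice the delta would differ per region and per day of recency, following the empirical delay distributions; a single scalar is used here only to keep the sketch minimal.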

Test Workflow and Validation
Our main goal here is to test which combination of inputs and equations most accurately forecasts the curve of deaths by COVID-19 for the seven geographic macro-regions within Santa Catarina. With this in mind, we planned the test to span the period from 31/05/2020 to 31/01/2021, running the model as if it were run weekly using only the data available at each point in time. The results were then aggregated to compose the overall prediction for the entire state.
The models produce an average prediction, which we compare to the "ground-truth" number of deaths using the Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) metrics (results in Tables S1-S2 and Figures 2 and 3). Because of the test notification delays discussed in the previous section and shown in Figures S3-S4, we selected the data snapshot from 07/03/2021, one month after the last simulated run, to represent the "ground truth". We also wanted to estimate the error of the model given its confidence intervals; to measure this, we calculated the RMSE and MAE of the upper and lower confidence bounds and took their average. These metrics, RMSE_conf and MAE_conf, provide a measure of how distant the borders of the confidence interval are from the true values (shown in Tables S1-S2).
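The metrics above can be sketched directly (function names are ours; the interval-based metric simply averages the errors of the two borders, as described):

```python
import numpy as np

def rmse(pred, truth):
    pred, truth = np.asarray(pred, float), np.asarray(truth, float)
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

def mae(pred, truth):
    pred, truth = np.asarray(pred, float), np.asarray(truth, float)
    return float(np.mean(np.abs(pred - truth)))

def rmse_conf(lower, upper, truth):
    """Average of the RMSE of the two confidence-interval borders:
    how far the interval's edges sit from the ground truth."""
    return 0.5 * (rmse(lower, truth) + rmse(upper, truth))
```

A symmetric interval that brackets the truth by one death on each side, for example lower = [1, 1] and upper = [3, 3] around truth = [2, 2], yields an RMSE_conf of exactly 1.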
Another important part of our testing procedure is that, by using the priors estimated in the previous week as the starting point for the models, the Bayesian inference converged much faster, and we were able to set much lighter hyper-parameters, with fewer iterations than if we had built the model from the ground up every week. The weekly procedure is shown in Algorithm 1. The models were run for many iterations on the first simulated week, after which we switched to a lighter configuration while loading the priors from the previous week (Table 2). These hyper-parameters were chosen in a preliminary phase of testing, analysing the trade-off between reliability and execution time. This testing procedure took more than 56 hours on a personal computer to load, model, and save the data for every model. The sequential nature of the tests, designed to mirror the real-world use of the models, was one of the main reasons why the tests take so long. While this means that we only have a single run of each model for each week, we believe the sample size produced is more than enough to identify which models are better.
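The warm-start loop of the weekly procedure can be sketched as follows; run_model and load_priors are hypothetical stand-ins for the project's actual model calls, injected here as parameters so the control flow is testable in isolation:

```python
def weekly_test(dates, run_model, load_priors, heavy, light):
    """First snapshot gets a full (heavy) run; every later week is a
    lighter run warm-started from the previous week's posterior."""
    last_date = None
    for current_date in dates:
        if last_date is None:
            run_model(current_date, heavy, priors=None)      # cold start
        else:
            priors = load_priors(last_date)                  # warm start
            run_model(current_date, light, priors=priors)
        last_date = current_date
```

Passing the model-running and prior-loading functions as arguments keeps the weekly scheduling logic independent of the inference machinery.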

Algorithm 1 Test Workflow
1: procedure TEST
2:     last-date ← NULL
3:     for current-date in dates do
4:         if last-date is NULL then
5:             run-and-save-model(current-date, initial-hyperparameters)
6:         else
7:             load-priors(last-date)
8:             run-and-save-model(current-date, lighter-hyperparameters)
9:         last-date ← current-date

Table 2. List of hyperparameters used in the tests, with the differences shown between the version used in the first week of modelling (as per the baseline models 18 ) and all other weeks (loaded from the previous week).