Modeling the COVID-19 Dynamics and Explaining the Portuguese CaseA Contribution for Fiting Differential Equation Solutions to Epidemiological Data


 Compartmental epidemiological models are, by far, the most popular in the study of dynamics related with infectious diseases. It is, therefore, not surprising that they are frequently used to study the current COVID-19 pandemic. Taking advantage of the real-time availability of COVID-19 related data, we perform a compartmental model fitting analysis of the portuguese case, using an online open-access platform with the integrated capability of solving systems of differential equations. This analysis enabled the data-driven validation of the used model and was the basis for robust projections of different future scenarios, namely, increasing the detected infected population, reopening schools at different moments, allowing Easter celebrations to take place and population vaccination. The method presented in this work can easily be used to perform the non-trivial task of simultaneously fitting differential equation solutions to different epidemiological data sets, regardless of the model or country that might be considered in the analysis.


Introduction
Back on January 2020, when the World Health Organization (WHO) first mentioned a cluster of pneumonia cases in Wuhan, no one expected that by the 11th of March,  would have spread across the globe and would be declared a pandemic [1]. Infectious diseases have accompanied mankind since the beginning of its existence and are known to have profoundly influenced the fate of entire nations upon their evolution to local epidemics or to great pandemics [2]. COVID-19, caused by SARS-CoV-2 (Severe Acute Respiratory Syndrome CoronaVirus 2), is, however, the first pandemic to ever reach every country in the world within this era of global information. Furthermore, this happened despite what might be considered the largest lock-down in history, applied by many countries.
The well-known works of W. O. Kermack and A. G. McKendrick [3], published in 1927, on the compartmental SIR model, have paved the way for epidemiologists to mathematically describe the dynamics of infectious diseases and became increasingly popular among the scientific community, particularly for the analysis of the current pandemic [4][5][6][7][8][9][10][11][12]. These compartmental SIRmodels are composed of systems of differential equations having no analytical solutions. Probably, this is the main reason why only a very limited number of scientific works actually provide the fit of the SIR model to the COVID-19 related data [6,7,11] and why others prefer to fit the data using approximated analytical expressions, such as the logistic function [13,14].
The number of daily new cases, deceased and recovered related to COVID-19 is made available to the general public in close to real time [15,16], which makes it possible to test models by actually fitting the existing the existing data. Here we propose a model based on the comprehensive PSEIRD(S) model by Beira et al. [11] to fit the data sets related to the infected, deceased and hospitalized cases reported for Portugal, a country that had a severe post-Christmas outbreak. As fitting the solutions of differential equations is a such a non-trivial process and requires an extensive overhead work to implement dedicated software, an online open-access platform was developed. This platform was used to test the modified PSEIRD(S) model and may be used to test other models/assumptions. It is important to note that, as it was observed for the original PSEIRD(S) model when used to analyze the data for 11 countries, the method here presented will eventually work for any country and may be used in the validation of any model.   1 Compartment diagram of the alternative SEIRD(s) model, modified in order to accommodate protected individuals, P, to discriminate between detected, I d , and nondetected, I , infectious cases, and to include the process of vaccination. The flows between compartments are normalized and mathematically related to the model parameters, which are listed and briefly described in the box on the right hand side

Model
The model considers that when the virus first reaches a country the initial number of susceptible individuals is equal to the entire population of that country. As people become aware of the presence of the virus they tend to protect themselves and, as a consequence, become a part of the P compartment. The rate of this transition was considered to be proportional to the number of detected active infected and to a factor α that we decided to call the protection factor. This flow happens as a consequence of individuals following sanitary and social distancing measures, either imposed by the government or self-imposed by each individual as a result of risk perception. There is also an involuntary flux from S to P corresponding to the part of the population that is physically or geographically isolated from the virus.
Multiple outbreaks can be explained by considering flows from P to S with rates k P S , which may differ over time and for specific outbreaks. These flows are a consequence of governments and/or individuals relaxing some or all of the sanitary and social distancing measures.
From the susceptible compartment, individuals evolve, upon contacting with an infectious individual from compartment I , to an exposed state, E , where they are infected but cannot yet infectious. This transition happens at a rate k, which is the product of the number of daily contacts and the probability of a contact generating an infection. An exposed individual evolves to I with a characteristic time, τ I , related with the incubation period of the virus. From I a fraction d is detected and transitions into I d with a characteristic time τ Id and the remaining individuals recover and go into R with the characteristic recovery time τ R . From compartment I d , a fraction l of I d dies and becomes part of compartment D while fraction p1¡lq recovers with characteristic time τ R . Finally, vaccination and loss of immunity can also be accounted for by knowing the daily vaccinated, v, and applying the characteristic reinfection rate, k RP , in the transition from R to P, respectively.
The main difference of this modified PSEIRD(S) model with respect to the original PSEIRD(S) [11] is the elimination of the infection pathway by which a fraction of the infectious individuals would not infect anyone else. Additionally, it was further assumed that the deceased came exclusively from the detected infected compartment, I d , as opposed to what was considered for the original PSEIRD(S) model. These modifications do not disregard the importance of human action in the dynamics of COVID-19, as testing (parameters d and τ Id ), risk-perception (α) and sanitary/social distancing measures (P compartment) are still considered in the modified PSEIRD(S) model and vaccination was even added to this new version.
The mathematical description of the seven compartments composing this model requires the following set of differential equations: In order to fit the data for the cumulative number of detected cases, N T d , only the flow that goes into I d (see eq. 5), needs to be considered. The daily detected cases are obtained from the derivative of the total detected cases The same can be done for the daily deceased, as described by equation 10.
The rate of change in the total number of infections (detected and nondetected) considers the positive flow into compartment I (see eq. 4), It is also possible to know how the number of hospitalized cases and the number of patients in intensive care units (ICU) will evolve. Both these groups are a fraction of the infected detected population, present a specific rate of hospitalization and leave this state with a characteristic time. The differential equations that characterize the flow into and from these compartments are k h and k ICU are parameters that include the percentage of active infected cases that evolve into hospitalized or ICU, respectively, and the rate at which these transitions happen. τ R h and τ R ICU are the characteristic times required for a patient to move out of the hospitalized and ICU states, respectively.

Data and Model Fitting
In order to perform the present study, the data sets corresponding to daily infected, total deceased, hospitalized and ICU patients in Portugal were obtained from the portuguese Directorate-General of Health website [17]. Additionally, the sets corresponding the total detected cases and daily deaths were also analyzed even though they are calculated from the daily infected and total deceased data sets, respectively. The simultaneous fit of instantaneous and cumulative data proved to contribute to stabilizing the minimization process. The model fits were performed using an open-access user-friendly online platform fitteia R [18] (fitteia.org) that uses the non-linear least-squares minimization method with a global minimum target, provided by the numerical routine MINUIT from the CERN library [19].
As the compartmental models require the resolution of differential equations, the Runge-Kutta method integrated in fitteia R was used in the minimization process. After setting the values for each population compartment and parameter at initial time, t 0 , a Runge-Kutta iteration with a time resolution fixed to 0.01 days was applied. Next, the solutions for each equation were compared with the existing data in a loop until a global least-squares minimum in the model parameters space was obtained.

Fixed Parameters and General Assumptions
In view of the fact that the available data sets are not enough to make an independent estimation of all model fitting parameters, assumptions had to be made for the parameters presented in Tab. 1. Although the first COVID-19 cases in Portugal were detected on March 2, 2020, (day 62) those patients were known to have developed symptoms as early as February 26 [20]. Therefore, t 0 was set to 56, which is the day of year 2020 corresponding to February 26.
To accommodate for the initial growth of COVID-19 detected cases, a value of I 0 equal to one was insufficient. I 0 was, therefore, set to three, the smallest value that could explain the observed data. This indicates that on February 26 there were three infectious individuals in Portugal. The need for more than one patient zero may be related to the dissemination of the disease across different regions of the Portuguese territory.
The initial value for the susceptible population, S 0 was set to the total population of Portugal, as explained in subsection 2.1.
The characteristic evolution time from exposed to infectious, τ I , was set to 3.5 days. The reason for this lies in the fact that individuals become infectious about two days before becoming symptomatic. Since the incubation period for this virus is 5 days in more than 90% of the cases and outliers usually fall over this 5 day period, 3.5 seems to be a reasonable assumption [21].
The detection ratio, d, and the characteristic detection time, τ Id are variables that are interconnected. It is possible to obtain the same result if both parameters are multiplied or divided by the same value. Here we considered a combination of d=0.25 (25% of cases are detected) and τ Id =6.25 to reflect the portuguese case. As it can be observed in Fig. 2, the post-Christmas outbreak was associated with a 10% increase in the percentage of positive PCR tests. As a consequence, we reduced the fraction of detection to 0.15 for the time range associated with increased positivity (between day 358 and 399). The fraction of people that die was calculated by dividing the most recent total number of deaths by the sum of the total number of deaths and recovered.
Portugal was one of the countries that did not provide systematic data for the recovered. This compartment is the one that is most subject to reporting delays and, if the reporting delays are not systematic, the data present several discontinuities that are difficult to incorporate in the model. We chose to attribute 14 days to this characteristic recovery time, based on the what was observed for other countries [11].

Model Fits
The results obtained from fitting the infectious, dead and hospitalized data until February 24 are presented in Fig. 3 and the optimized parameters are summarized in Tab. 2. For all plots, the horizontal axis presents the elapsed time from January 1, 2020.
As it can be observed in Fig. 3, the simultaneous fit of the modified PSEIRD(S) model to the provides very good results, which may be extended in order to obtain the projections, assuming, of course, that none of the parameters change. If confinement conditions are maintained, it is is possible to observe that the number of detected daily infections decreases to zero by May 15 (day 500) . As it can be observed, the modified PSEIRD(S) model explains complex disease dynamics, such as multiple outbreaks. These outbreaks are described as a flow of people from the P to the S compartment, whose magnitude increases for larger outbreaks, as seen from the different k P S values, presented in Tab.

2.
The change observed for the k value in the beginning of September, can be assigned to the establishment of strict measures related with extending the use of masks in outdoor public places, in order to prepare a secure reopening of services, such as schools.
For the hospitalized cases, it was not possible to make a fit considering a unique value of k h or k ICU . The values were different for different outbreaks but can be estimated by the initial increase observed for the different waves of infection. The different values of k h and k ICU are consistent with the fact that the rate at which patients are admitted to hospitals is largely dependent on the age stratification related with a specific outbreak (for example, an outbreak where the elderly are the most affected should be related with a larger value of k h and k ICU ) and/or to the season at which it happens (winter season should increase the infected percentage and/or the rate at which patients are hospitalized). Regarding the characteristic recovery time from a hospitalization state, it was possible to make the fits assuming a single value for the different outbreaks except for the most recent one. This outbreak was the most severe in Portugal so far and lead many hospitals to exceed their capacity for receiving COVID-19 patients. Under such conditions, patients tend to leave the hospitalized state sooner because it became increasingly difficult to keep them alive.
The characteristic death time also presents different values for different outbreaks. Naturally, when the number of hospitalized increases to values close to the limits of the health care system (less than 2000 ICU beds in Portugal [22]), mortality tends to happen at a faster rate. At the beginning of the epidemic in Portugal the characteristic death time may have had a smaller value because the knowledge of the virus and of effective therapeutics was probably insufficient at that time.

Remaining Compartments and Model Simulations
Starting from the assumptions presented in Sect. 2.3 and the model fitting parameters presented in Tab. 2 it was possible to generate the time evolution of the compartments for which there is no available data. It is important to note that these time-series are a necessary step to solve and fit the solutions of the set of differentials equations (eq. 1-7), as it is not possible to determine one compartment without considering the remaining. The time evolution of the P, S , R, E , I and I d compartments, as well as NT d , is presented in Fig.  4.
Among all the compartments, the infected non-isolated, I (Fig. 4 c)), is crucial for the description of recurrent outbursts, as these individuals, unaware of their infectious state, silently spread the disease. In fact, the dramatic outcome caused by allowing people to move freely on Christmas can be explained by the following simple reasoning based on this compartment. According to the obtained results, on Christmas eve there were about 58000 individuals in compartment I . Considering that all of these spent Christmas with four other relatives, that means at least 232000 non-isolated infected cases by new year's eve. This value might even be underestimated, as it does not account for the subsequent gatherings that might have taken place during this season festivities (e.g. multiple meetings with parents and in-laws, parties for exchanging gifts, etc). In view of this, it becomes clear that the number of daily cases is not enough to characterize the state of the pandemic. A closer analysis of the curves presented in Figs. 3 and 4 shows that, although the detected daily new cases drops to zero by May 15 (day 500), the number of infected non-isolated is around 20 on the same date. This number is more than sufficient to generate new outbreaks upon careless reopening.
The model fitting software -fitteia R -also provides a straightforward way of making simulations exploring the effects of varying any model parameter in anticipation of confinement/reopening policy changes. Therefore, simulations were made in order to understand how testing can be used to decrease the present confinement duration. For instance, the level of testing is here related with d and τ Id , so we have simulated different epidemic scenarios after March 1 (day 426). The results for these simulations are presented in Fig. 5. In Tab. 3 are presented the dates and N I d values corresponding to different I -related milestones: 50 and 1 cases per 100k inhabitants. As it is economically unsustainable to maintain a state of "endless" (e.g. until I goes to zero) confinement, it is important to evaluate the secure reopening of schools and of other services,especially in view of the dramatic decrease in detected infections and deaths that is currently being observed. As a reference we considered a value of 50/100k silent spreaders, as this was the value obtained for August 31 2020, prior to the first reopening of schools.
Looking at Fig.5 and Tab. 3, it becomes clear that the larger the value of d, the sooner the 50/100k plateau is reached. However, upon tripling d, it is only possible to reach the 50/100k plateau 5 days sooner, which is a relatively small gain compared to the investment that would have to be made to target such a high detection fraction. Massive testing is indeed an alternative to lock-down as pointed out in the original PSEIRD(S) work [11], but testing cannot be made in an arbitrarily slow manner. In fact, the best projected scenario corresponds to doubling the value of d while decreasing the characteristic detection time, τ Id , to two days. It is also important to stress that, as the target plateau gets  lower (e.g. 1/100k), the gains that may happen as a result of changing d or τ Id become more significant.
In order to test the model sensitivity to the onset of possible reopening strategies, such as reopening schools or allowing Easter celebrations to take place, the P-S flow associated with the largest increase of cases while schools were functioning (k P S =1.6) was applied either from March 1 or from April 1. Furthermore, schools reopening was paired with Easter celebrations by assuming the same P-S flow obtained for Christmas (k P S =27.8). Two cases were considered for Easter: either the P-S leakage lasts for 25 days, as was observed for Christmas, or it lasts 50 days. The results of these simulations are presented in Fig. 6.
The model is clearly sensitive to the onset of school activities, especially when an Easter related leakage is considered. If schools are reopened in April, there is a smaller tendency for uncontrollable infections growth, as naturally expected. This fact, although intuitively trivial, is hardly ever expressed by actual numbers, which demonstrates the advantage of using a differential equations solver/fitter and compartmental models such as PSEIRD(S) to project different scenarios.
It is important to stress that the magnitude of the leakage or its duration can have unpredictable values, depending on the efficacy of applied measures and on the extent of compliance. In case of doubling the time range within which the Easter leakage is active, the resulting epidemic wave reaches a much higher amplitude, as can be seen in Fig 6 (violet lines).
If we now go back to the simulation for the P compartment presented in Fig.7 it is clear that, even taking the Christmas leakage into account, the amount of individuals that are still in this compartment is much greater than the number of people that were infected with COVID-19 in Portugal so far. This fact makes it possible to project future epidemic scenarios that are much worse than the one observed immediately after Christmas. The way to dramatically reduce this risk is to vaccinate as many individuals as possible to deplete the P compartment of people and reduce its capacity to feed the susceptible compartment.
In Fig. 7 we present a few scenarios to explore the effect of different vaccination rates starting from the worst scenarios presented in Fig. 6 (violet curves). In the simulations presented in Fig. 7 it was assumed that vaccination started from March 1 with different numbers of effective daily vaccinated. In view of the fact that some vaccines require two doses to fully immunize an individual, in order to achieve the effective daily vaccinated number considered for the simulations in Fig. 6, the number of vaccine doses administered per day has to be adapted according to the type of vaccine.
As it can be observed in Fig. 6, if the number of effective daily vaccinated is 30000 (60000 vaccine doses for two-dose vaccines), then it is possible to decrease the number of infectious cases to zero before the end of 2021. It is also possible to account for vaccination related immunity loss. For example, assuming that immunity lasts five months, five months from March 1, the number of individuals that will have lost immunity is equal to the number of daily vaccinated. In Fig. 8, it is possible to see the simulations based again on the worst scenario presented in Fig. 6 and where the effects of different daily vaccinated paired with a five months lasting immunity are described. As immunity loss is considered, the daily vaccinated needed to eliminate COVID-19 infectious cases by the end of 2021 increases from 30000 to 38000.

Conclusion
In this work we have presented the analysis of the COVID-19 epidemic data for Portugal using the modified PSEIRD(S) model and an open-access online platform -fitteia R at fitteia.org -that, among other functionalities, enables the fitting of differential equation solutions to different sets of epidemiological data. The simultaneous analysis of the infected, deceased and hospitalized data sets, along with considering both daily and cumulative data, facilitated the process of finding the set of parameters corresponding to the global leastsquares minimum. Furthermore, considering the data from the beginning of the epidemic in Portugal has put into evidence similarities, differences and variability of model parameter for the successive outbursts.
The modified PSEIRD(S) model includes some aspects of social behavior and is, therefore, particularly interesting for projecting different scenarios by changing parameters that can be translated into specific actions, such as increasing the fraction of detected cases, decreasing the characteristic detection time and analyzing different numbers of daily vaccinated individuals. The attempt to predict any future evolution of the pandemic is prone to failures, as human and social behavior are essentially unpredictable. The different scenarios presented in this work illustrate how the time series of the different population compartments will evolve assuming that the general response to reopening will follow the same trends observed so far. Naturally, much worse scenarios could be evaluated, but we expect that this work will contribute to put into evidence how to avoid past mistakes.
Regarding the model and the data analysis methods followed, we expect to have made clear that incorporating new compartments, parameters and assumptions (ex: different variants of the virus or age stratification) will be within reach with a relatively small effort, taking into account the technical possibilities the open-access platform at fitteia.org available for general use.
We trust that both the method and the analytical possibilities offered by fitteia R could provide a positive contribution for the analysis of the current pandemic in other countries and of other epidemics. This tool provides a userfriendly way of fitting compartmental models to epidemiological data and, therefore, enables the user to make more robust projections starting from the description of past observations. In fact, if a model does not explain the details of past, there is a good chance that it will be appropriate to explore the near future.

Funding
This study was funded by FCT -Fundação para a Ciência e a Tecnologia.