Generalised SEIR modelling of the COVID-19 pandemic course: data quality issues and structural analysis

Successive generalisations of the basic SEIR model have been proposed to accommodate the different needs of the organisations handling the SARS-CoV-2 epidemic. These generalisations have not been able until today to represent the potential of the epidemic to overwhelm hospital capacity until today. This work builds on previous generalisations, including a new compartment for hospital occupancy that allows accounting for the infected patients that need for specialised medical attention. Consequently, a deeper understanding of the hospitalisations rate and probability as well as of the recovery rates for hospitalised and non-hospitalised individuals is achieved, offering new information and predictions of crucial importance for the planning of the health systems and global epidemic response. Additionally, a new methodology to calibrate epidemic ﬂows between compartments is proposed. We conclude that the two-step calibration procedure is able to recalibrate non-error-free data and showed crucial to reconstruct the series in a speciﬁc situation characterised by signiﬁcant errors over the ofﬁcial recovery cases. The performed modelling also allowed to understand how effective the several interventions (lockdown or other mobility restriction measures) were, offering insight for helping public authorities to set the timing and intensity of the measures in order to avoid the implosion of the health systems.


Introduction
In December 2019, a new coronavirus named severe acute respiratory syndromecoronavirus-2 (SARS-CoV-2), causing severe acute respiratory disease emerged in the region of Wuhan, China [1,2]. SARS-CoV-2 is an acute respiratory infectious disease that spreads through the respiratory tract by droplets, respiratory secretions, and person-to-person contact [3,4].
At the time of the writing of this introduction, a "second wave" of new COVID-19 infections is striking some European countries, while in South America and North America a first wave is spreading at an alarming rate. Its transmissibility is high (reproductive number, R 0 , estimated to be between 2 and 3 [26,6,7]), indicating that a large proportion of the world population can be infected. As a significant number of patients have severe symptoms and need dedicated medical care (around one in every five people who are infected), the potential to overwhelm our health care systems and intensive care units is a real threat [13]. In Portugal, after an incidence spike in early April (cf. Figure 1 (a)), the lockdown measures had been easing in early May, resulting in an increase of the daily incidence of new cases (cf. Figure 1 (a)). Although the Portuguese national health system limits (dashed horizontal lines in Figure 1 (b) and (c)) have never been reached, the current course of the epidemic raises critical management issues, in particular, because the more significant growth of the second wave is threatening to overcome hospitalisation capacity in the short term.
Much work has been done so far on the COVID-19 outbreak course using the so-called compartmental models [8,9,10]. The origin of compartmental models dates back to the early 20th century with the seminal work of Kermack and McKendrick (in 1927) [11]. Compartmental models simplify the mathematical modelling of infectious diseases. The population is assigned to compartments and may progress between compartments. Compartmental order usually shows the flow patterns between the compartments; for example, SIS means susceptible, infectious, and then susceptible again [12]. The models are most often run with ordinary differential equations (which are deterministic), but can also be used with a stochastic framework, as it is the case here [12].
Compartmental models allow predicting how a disease spreads out through the total number infected, the duration of an epidemic, and infection reproductive number (R 0 ), among other important epidemic course parameters. Moreover, such models can show how different public health interventions (e.g. vaccination, limited social contacts, lockdown) may affect the outcome of the epidemic. The basic compartmental model is the SIR model, and other more complex variations are derivatives of the basic SIR model [11]. The SIR model comprises three compartments: S for the number of susceptible, I for the number of infectious, and R for the number of removed (recovered, deceased, or immune) indi-   in Portugal viduals at a particular time. This model assumes incidence grows exponentially, which is not in agreement with the observed epidemic course, as the measures adopted by public health measures at different moments, as well as human behaviours tend to flatten the parabolic incidence curve. Therefore, its predictive value for infectious diseases that are transmitted from humans to humans, without any change on the constant transmission rate is of limited usefulness.
To represent that the number of susceptible, infectious and removed individuals may vary over time (even if the total population size remains constant) a time index, t, might be added: S t , I t and R t . For a specific disease within a particular population, these functions may be worked out to predict possible outbreaks and bring them under control.
For infections with the characteristics of SARS-CoV-2, there is a significant incubation period during which individuals have been infected but are not yet infectious themselves [13]. During this period the individual is in compartment E (for exposed), resulting in the so-called SEIR model. The SEIR model's basic version is still short for our current needs, especially regarding the burden put in the health system. Indeed, the basic version of SEIR model is unable to distinguish between deceased and recovered people, as well as, provide any helpful information on the daily burden put on a health system by individuals in need of medical care. Therefore successive generalisations of the basic SEIR model have been proposed to accommodate either the different needs of the different organisations handling epidemic-related problems or the need to account for public health data not specifically used in the basic SEIR model. This work seeks to pursue these needs and extend such generalisations. For public health officers, the number of confirmed cases is the part of the iceberg that is visible as many more people are infected but still not infectious because the virus is still in the replication phase (incubation period). Indeed, these people would come up as confirmed cases a few days ahead. Knowing in advance how the number of confirmed cases might evolve is of crucial importance for planning reasons.
Moreover, it is of unquestionable value for health system officers to know the share of confirmed COVID-19 confirmed cases who are going to need for specialised medical care and how many of them will recover or die. Additionally, for researchers with no access to micro-data files (due to unavailability or lack of data quality) it is impossible to know some of the flows between some compartments of the SEIR model's generalisations. This work also addresses this problem, establishing a methodology to calibrate epidemic flows between compartments, that is relevant to recalibrate non-error-free data. To accomplish these goals, we build on the work of [14] and decided to propose several extensions and restrictions, including the consideration of a new compartment representing hospital occupancy.
This work is organised as follows: section 2.1 describes the classic SEIR model summarily and its proposed generalisation to meet our aims. It also de- Figure 2: Basic SEIR model scribes our proposals for accommodating some of the characteristics already mentioned as well as others suggested by the empirical analysis of the Portuguese epidemic course; section 2.2 describes the used data and the major data processing tasks prior to modelling using a Bayesian framework and, finaly, section 2.3 presents the hierarchial Bayesian framework used for parameters estimation. Sections 3 and 4 respectively present and discuss, respectively, the achieved results for the case of Portuguese epidemic course and the main implications for managing the course of the outbreak.

Data and model 2.1 The SEIQRHD model
Assuming the vital dynamics (birth, deaths and migration) may be neglected (which is not a hard assumption given the short period of the pandemic) the basic SEIR model is defined as in Figure 2.
The boxes represent the compartments (or states) and arrows represent transitions from one compartment to another. S t is the number of people susceptible at time t, E t is the number of people exposed to the infection at time t (people that become infected but not infective), I t is the number of infective people infected at time t and R t : number of people removed (recovered or deceased) at time t. The parameter β denotes the transmission rate (the expected amount of people an infected person infects per day and is the result of the contact rate -the number of people an average person enters into contact with each day -and the probability that a contact provokes the transmission of the disease). It is multiplied by the ratio S/N to avoid counting contacts between two people who cannot infect each other (e.g., because one of them has already recovered and immune, or because both are infective [19]). The parameter δ is the disease incubation rate (the inverse of the incubation period in days), and γ is the removal rate (the inverse of the removal period in days). When some of these three parameters are considered fixed, they do not, in general, fit the observed course of the current epidemic. For example, the social distancing measures or even the lockdown pull the transmission rate down by reducing the daily contacts one person regularly has; or the improvement and gained experience of the health care system pull the removal rate up by lowering the recovery period or the death rate. Making these two parameters, β t and γ t , functions of time, with user-defined functional forms, is the solution to describe the course of the disease spread appropriately. As already mentioned, for epidemics that last for a short period (some months, for instance) it does not worth to account for demographics dynamics (births, migration, deaths by other causes) as the population can be considered fixed. Nonetheless, for the cases whenever the demographics may impact other transitions might also be considered also to account for those dynamics.
Based on the SEIR model, many measures of the epidemic development measures might be assessed [16,17,18,19,20]. We refer to [15] as an excellent review of the recent work done in this area. The SEIR model was also used to compare the effects of the Hubei province's lockdown on the transmission dynamics in Wuhan and Beijing [27].
As the SEIR model can generate interpretable results, wave derivatives are being developed [15] (a) to simulate the processes of transmission from infection source; (b) to assess the transmission risk and prediction of patient number based into two subpopulations of infected people, those the quarantined and unquarantined; (c) to simulate the incubation period and the period before recovery.
Many of these works consider different generalisations of the basic SEIR model as those four compartments are often insufficient to describe a much more complex reality. First, considering only one compartment to account for removed people, R, is not enough to describe the evolution of patients that recover from the infection and the share that deceases. Secondly, the compartment I does not match the observed amount of the infected people in many situations as many infected people are asymptomatic or only experience mild symptoms (and they are not accounted for by the figures released out by the health authorities) or it does not account for the isolation (quarantine) usually imposed by health authorities making these people no longer infectious. Indeed, the daily number of "confirmed infections" corresponds to people who are isolated (quarantined) to avoid new infection chains. As a solution, many authors [14,15] have been proposing generalisations of this basic SEIR model by splitting the removed compartment, R into two compartments, one to account for the infected people that genuinely recover from the disease, R, and a second one to account for the mortality induced by the infection, D. Moreover, to precisely match the observed number of confirmed cases, one additional compartment, Q, has been considered immediately after the I compartment [14]. Building on the work of [15] we propose an additional compartment to account for hospital occupancy, H. Figure 3 (a) reproduces the generalisation of the SEIR model used in this work. Figure 3 (b) describes schematically the temporal evolution between compartments E, I and Q. Schematic representation of temporal evolution between compartments E, I and Q the so-called "active cases", A. This models includes other not yet described parameters. Further ahead we elaborate on the potential functional forms to describe their temporal evolution, and we will use this opportunity to define them.
The following differential equations describe this is generalisation of the SEIR model (SEIQRHD): The discussion on whether the parameters are fixed is done further ahead; therefore, the subscript t in the parameters is omitted for the time being. Some of the model coefficients might be considered fixed (δ , κ, α), but others (β , η, h, γ and µ) vary in time according to some specific functional form.
The parameter δ is the inverse of the average incubation period and governs the lag between having undergone an infectious contact and showing symptoms. It is generally considered fixed because it depends on the the infectious agent's characteristics to a great extent, The parameter κ is the inverse of the average period taken to isolate a symptomatic person, which usually happens after the person has been tested positive. Therefore, the quarantined compartment matches the "active confirmed cases" in many official data sources in most of the developed countries and the quarantined individuals are excluded from the infectious compartment (I) because they are supposed to be isolated. The time required to isolate a person with symptoms depends mainly on the protocol followed by the health authorities which with some minor adjustments is constant. As explained further ahead, we consider this parameter fixed after an initial period of adaptation, and use an empirical estimate, jointly with a literature-based value of δ to estimated the flows between compartment S and E and E and I. Within the hierarchical Bayesian model they are even considered fixed.
As the lockdown and social distancing measures impact on the number of individuals a person comes in contact with on a daily basis, the parameter β should not be considered constant.
The parameters γ H and γ Q are the recovery rate for hospitalised and quarantined people. They correspond to the inverse of the average time required for an active case (quarantined/hospitalised) to be classified as recovered. They provide precious information about how fast the people may recover from the disease (in days, in general). We believe that both γ Q and γ H are related to how a health system can improve its capability to treat people over time (e.g., with the introduction of a new therapeutic strategies). However, γ Q is much more dependent on recovery confirmation protocols and therefore, much more susceptible to increase over time.
The outer flows from compartment Q depend on γ Q , already discussed, η and h. The rate at which quarantined individuals are hospitalised, η (η − 1 is the average period spent in compartment Q), mainly depends on the illness severity and health system capacity to accept ill individuals, whilst the share of people in need of specialised medical care, h mainly depends on the population health conditions, as well as the mix of the quarantine compartments regarding demographic characteristics as gender and age. Whilst the former might be considered fixed to some extent, the latter, due to the mentioned reasons cannot be. Indeed, recent empirical analysis shows that after the epidemic spike in late April 2020, the share of COVID-19 patients admitted to a hospital decreased from almost 100%, in the early days of the epidemic, to a tiny constant value, in late July. This phenomenon might be explained by the uncertainty and lack of knowledge of the disease course in the early days of the epidemic. However, as time goes by, the medical practitioners learn and become more selective about the patients to be admitted to a hospital. Moreover, as the prevalence curve increases, the health systems tend to avoid the intolerable burden and become more demanding on the hospitalisation requirements. Therefore, there is enough evidence to consider η a constant parameter and h t a time-varying parameter.
The parameter µ is the fatality rate and provides information on the proportion of hospitalised individuals who, unfortunately, die. It depends either on the resilience of the patients and the severity of the disease. As this variation is not time-related we consider it to be fixed. It is worth mentioning here that any transition from the quarantined Q to the death D compartments is not considered at all, as any ill and quarantined individual that gets worse is, at some point in time, admitted to a hospital and eventually recovers or dies. The parameter α denotes the transition rate between H and D. Its inverse corresponds to the time a hospitalised individual takes to die. We anticipate these parameters to be highly variable as, in the ultimate analysis, it depends on the individuals themselves, more than the quality of the health care provided, but we do not see any reason to consider it a time-varying parameter.
Many infected people experience mild or no symptoms at all and are not accounted for (ascertained). Therefore, the Infection Fatality Rate (IFR), i.e. the number of deaths as a proportion of all persons infected with the SARS-CoV-2 novel coronavirus is expected to be much lower than the Case Fatality Rate (CFR), the proportion among confirmed cases. Some works have reported values for the ascertainment rate between 0.4% and 14% in Wuhan [22,23,24,25,26], 28.4% in Italy [27] and just 0.23% in Iran [28]. Unfortunately, the available data does not allow the estimation of this ascertained rate [20], that account for an increased number of people in each compartment than the actual figures.
For modelling and empirical data analysis performed hereafter one uses a discrete-time approximation to the stochastic continuous-time SEIQRHD model defined in equation (1). Consider a time interval (t,t + h), where h represents the length between the time points at which measurements are taken, here h = 1 day. Let dSE t denote the daily number of susceptible individuals who become infected, dSI t the number of cases by date of symptom onset, dIQ t the daily number of confirmed cases, dQR t the daily number of quarantined cases who recover, dHR t the daily number of hospitalised cases who recover, dQH t the daily number of quarantined cases admitted to a hospital for more specialised care and dHD t the daily number of cases who decease. Given initial conditions S 0 = s 0 , E 0 = e 0 , I 0 = i 0 , Q t = q 0 R 0 = 0, H t = 0 and D 0 = 0, and the population size N, the discretised stochastic SEIQRHD model is specified by: This set of equations, jointly with the initial conditions, define the SEIQRHD model used in this work.

Data and empirical analysis
We collected the epidemic data from the Direção-Geral de Saúde (Directorate-General of Health, DGS), the Portuguese health authority. Using the SEIQRHD model described in the previous section we propose to disclose either the course of the Portuguese SARS-CoV-2 epidemic curves at the country level on a daily basis as long as the analysis of the information provided by model parameters. The results of the work based on a simplified version of the model can be found on https://insights.cotec.pt/ on the mosaic Modelos epidemiológicos.
Since the beginning of the pandemic, DGS releases daily data on its course (https://covid19.min-saude.pt/ponto-de-situacao-atual-em-portugal/). The key released variables are number of confirmed cases, number of deaths (nationwide and by health region), number of recovered cases, number of hospitalisations including intensive care units (nationwide), as well as the characterisation of the positive cases by origin, gender and age group. The figures released by DGS are used for modelling purposes as explained below. By September 11th, 2020, the course of the epidemic is as Figures 1 and 4 describe.
The epidemic started on March 2nd, 2020 with the first two cases. Some unexpected spikes of daily recovered cases can be observed between May 14th and   24th 2020. This information corresponds to the release of recovered cases than have recovering during the previous weeks but have not been accounted for before as the recover notifications issued by the primary health care system were not included in the released figures. Moreover, it is globally clear that the released data about recoveries does not reflect the actual daily number of cases as some of these cases tend to be accumulated and only released later on. Therefore, we decided to proceed to a calibration on the number of recovered cases. This calibration impacts the hospital occupancy (H t ) and quarantined individuals (Q t ) and the two flows of recovered cases coming into the compartment R t . Only the series of daily new quarantined (dIQ t ), and the daily states of the active (A t = Q t +H t ), recovered (R t ), hospitalised (H t ), and death (D t ) cases are released by DGS, but the fact that the number of recoveries is released with some delay, also leads to inaccuracies over the stock of quarantined cases and consequently on the active cases. To account for the mentioned dependencies we decided to calibrate the unknown flows (dSE t , dEI t , dQH t , dQR t , dHR t ), using the available data: (a) the daily number of new confirmed cases, dIQ t , (b) the daily number of new deaths, dHD t , and its daily stock D t , (c) the daily number of new recovered cases, dHR t + dQR t and is stock R t (d) the daily number of hospitalisations, H t and (e) the daily number of quarantined cases The calibration methodology is a two-step procedure. In the first step we calibrate the series of exposed (dSE t ) and infected (dEI t ) based on the incubation period of COVID-19 and the time from symptom onset to laboratory confirmation (based on the analysis of COVID-19 micro-data file provided by DGS). Table 1 summarises the distribution parameters used to calibrate the daily incidences of dSE t and dEI t as described in Figures 5 and 6. The median incubation period of SARS-Cov-2 has been estimated at approximately 5-6 days, with a range between 1-14 days [29,30,31]. We used the value provided by [31] for the gamma distribution. We assume that the mean time from symptom onset to diagnosis confirmation in Portugal decreased from 5.3 days (with a standard deviation of

(a)
Density  4.8 days) in the early period of the epidemic (early March) to 4.7 days (standard deviation of 6.5 days) in early April (based on empirical analysis pf the DGS provided microdata). To reflect the above, we modelled the time between case reporting and symptom onset with a Gamma distribution with a shape/scale parameter of 4.7/2.5 on March 2, decreasing linearly to 2.6/4.3 on April 2 and after ( Figure 5 (a)). Similarly the time between case exposure (infection with SARS-CoV-2) and diagnosis confirmation (based on empirical analysis DGS provided microdata) was modelled with a Gamma distribution with shape/scale parameter of 4.7/2.5 on March 2, decreasing linearly to 2.6/4.3 on April 2 and after ( Figure  6 (a)).
The calibrated series dEI t and dSE t are shown is Figure 5 (b) and Figure 6 (b). Before moving on to the second calibration step, last teh 15 days in the dataset were trimmed to adjust for in case ascertainment delays.
In the second step, as the available data provide no information on the daily  flows from the Q compartment to the H and R compartments (dQH t , dQR t ), and from the H compartment to the R compartment, (dHR t ), we decided to use the discrete-time approximation to the stochastic continuous-time SEIQRHD model defined in equation (1) with the following assumptions: is an indicator function taking the value 1, if the argument H t − H t−1 + dHD t + dHR t ≥ 0 is true, and zero otherwise, γ 0 Q > 0, ρ γ Q > 0 and 0 < w < 1.
The first assumption originates from the fact, already mentioned, that the recovery rate increases with time reflecting the fact that the health system can improve its capability to treat people over time. The second assumption originates from the empirical evidence that about 80% of COVID-19 patients recover from the disease without needing special treatment, and for the majority -especially for children and young adults -illness due to COVID-19 is generally minor. However, for some people it can cause serious illness (20%) -difficulty in breathing requiring hospital care -which is particularly true for people who are aged over 60 years and people who have underlying medical conditions such as diabetes, heart disease, respiratory disease or hypertension. Therefore, assuming that ω% of the total outgoing flows from Q t go to H t we have: The third assumption relies on the fact that, having calibrated the values of dHR t and using the observed values of H t , dQH is calibratedt by difference, provided that results in a non-negative flow.
The flows described in the assumptions are only used for t ≥ 13, and t ≥ 12, because the number of hospitalised individuals, H t , is equal to the observed cumulative number of quarantined cases in the first 12 days of the epidemic, h t = ∑ t i=1 dIQ i , t = 1, ..., 12 and we a assume no recoveries in the first 11 days of the epidemic.
The "hat" in the above notation denotes calibrated figures (either on the first or the second procedure steps) whilst the figure with no "hat" are currently observed.
The calibrated value dQH (1) t (7th step) is submitted to a second verification in the 10th step of the algorithm to avoid flows into compartment H that are higher than actually observed. Indeed, if the calibrated valueĤ t is higher than the observed value H t we reduce the inflow into H t , dQH (1) t , by the differenceĤ t −H t , provided this reduction produces a non-negative value, dQH (2) t . In evolutionary computation, differential evolution (DE) [38] is a method that optimises a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. Such methods are commonly known as metaheuristics as they make few or no assumptions about the problem being optimised and can search very large spaces of candidate solutions. DE is used for multidimensional real-valued functions but does not use the gradient of the problem being optimised, which means DE does not require the optimisation problem to be differentiable, as is required by classic optimisation methods such as gradient descent and quasi-Newton methods. Therefore, DE can also be used on optimisation problems that are not even continuous, are noisy, change over time, etc.. DE optimises a problem by maintaining a population of candidate solutions and creating new candidate solutions by combining existing ones according to its simple formulae, and then keeping whichever candidate solution has the best score on the optimisation problem at hand. Using a DE algorithm (provided by R package DEoptimR) we minimise the loss function: The first and the second rows of the loss function (4) ensure that the calibrated number of quarantined and recovered individuals on the correction (day 84th) matches the observed and reliable figure as closely as possible . The third and the fourth rows ensure the calibrated number of quarantined and recovered individuals match the known figures at the last observation as closely as possible . The second last row ensures the full-length time series of hospital occupancy is calibrated as closely as possible and finally the last row ensures the accumulated number of recovered individuals originated on the quarantined compartment, R (Q) t = ∑ t i=1 dQR i , is equal to the number of recovered individuals recorded by the Portuguese primary health care system, but released at once on May 24th, 2020 (9652 individuals). This procedure allows the calibrations of the flows dQH t , dQR t and dHR t , conditional to the flows dSE t and dEI t calibrated in the first place, as described above.
The upper and lower limits of the hypersquare where the best solution is searched for are: h , τ (u) ) = (1/20, 0.2, 0.9, 0.2) and • (γ   Figure 7. The calibrated flows dQR t , dHR t and dR = dQR t + dHR t are represented in Figure 7 (dR) along with the observed total flow dR (the deviation between the observed and calibrated dR on 2020-09-27, the last day under analysis, is -4.22 %).
As expected, the accumulated number recovered individuals on May 25th (day 85) is interpolated, back-casting the excess of recovered cases released on May 24th (day 84), accordingly to the interdependence model dynamics. As such, the curve of calibrated quarantined individuals reproduces almost exactly its course after May 24th and flattens before this day, reproducing the calibrated recovered flows (dQR t ). This last flow, jointly with the flow of recovered individuals from the hospitalised compartment, dHR t are calibrated resulting in reliable courses both for accumulated recovered individuals and daily recovered individuals.

Bayesian hierarchical modelling
Using the discrete-time approximation to the stochastic continuous-time SEIQRHD model 2 defined above we set up a Bayesian hierarchical model where the incidence variables are assumed stochastic. The transitions of individuals from the one compartment to the next one of the SEIQRHD model are considered stochastic movements between the corresponding population compartments. In each period an individual either stays in or moves on to the next compartment. In reliability analysis life-time is usually considered to follow an exponential distribution. By analogy, here the time length that an individual spends in a compartment is exponentially distributed with some compartment-specific rate λ t . Therefore, the probability of extending the stay by a further period of length h is exp(−λ t h) and the probability of leaving is therefore 1 − exp(−λ t h). The summation over the individual Bernoulli trials assuming they are independent and identical for all members of a compartment would result in binomial distributions [32]. Due to the scale we are working with, we found it useful to take advantage of the approximation of the binomial to the Poisson distribution. Therefore incidences are assumed to follow the distributions: where the transition probabilities are given by: assuming the time unit h is one day. Conditional on all information up to time t, the Poisson random variables dSE t , dEI t , dIQ t , dQH t , dQR t , dHR t and dHD t are independent. The model further assumes that the population size N remains constant and that individuals mix homogeneously. In order to account for the effects of control measures such as lockdown and social distancing, we assume that the transmission parameter β is constant up to the time point when the control measures are introduced and after that decays exponentially at a constant rate. During the SARS-CoV-2 epidemic in Portugal four distinct periods can be distinguished. The first one, between the beginning of the epidemic and the declaration of the state of emergency, the first hard lockdown control measures (on March 18th 2020) where the transmission parameter is assumed to be constant. The second one, between March 18th, 2020, and the declaration of the situation of calamity when the lockdown measures started easing (May 4th, 2020). A third period from May 4th onwards and until August 18th, corresponding to the traditional Portuguese holidays time and the beginning of the school year and finally, from August 18th onwards. This is formulated as: where τ 0 , τ 1 and τ 2 correspond to those three intervention dates (March 18th, May 4th, and August), β 0 is the initial transmission rate, and ρ β 0 , ρ β 1 , ρ β 2 > 0 are the rates at which β 0 , β 1 and β 2 decay on τ 0 ≤ t < τ 1 , τ 1 ≤ t < τ 2 and t ≥ τ 2 , respectively. The basic reproduction number R 0 is defined as the average number of secondary cases generated by a primary case over his/her infectious period when introduced into a large population of susceptible individuals [33]. The constant R 0 thus measures the initial growth rate of the epidemic and for the model above 34]. Furthermore, [35] define the time dependent effective reproduc-tive number R 0 (t) = β t κ S t N as the number of secondary cases per infectious case at time t. Because S t ≈ N, it follows that R 0 (t) = β t κ is a function proportional to the time-varying transmission rate in (7). The time point at which R 0 (t) assumes values smaller than 1 indicates when control measures have become effective in controlling the epidemic. The parameter κ governs the passage from the infective to the quarantined compartment, as explained before. An infected individual is not automatically quarantined, because the authorities are often unable to test enough people while keeping pace with the spread. This measure is especially difficult because many people do not develop symptoms at all, but can transmit the infection to others. So, we believe that κ also contains some information about the percentage of the detected infective people. A study by [35] proposed the introduction of an additional parameter to understand this issue better. Despite all these considerations, we used values estimated from a microdata file provided by DGS in the first step of data calibration. Therefore, we decided to move on a parsimonious parameterisation for κ, considering it fixed and equal to 4.73 days (see Table 1). Similarly, the incubation period, δ −1 is considered fixed and equal to 6.5 days (see Table 1). Accordingly to what we have discussed before, we assume η is constant, and h t follows the model: The parameters γ Q and γ H govern the passage from the Q and H compartments to the R compartment. As already mentioned, we assume γ Q is much more dependent on protocols for recovery confirmation and therefore much more susceptible to increase over time than γ H . Therefore, γ Q follows an exponential trend, whilst γ H is considered fixed. The assumption is that the recovery rate, γ Q (t) converges towards an asymptotic value: The value of γ 0 Q is the final asymptotic value of the recovery rate. It depends on ρ γ which represents how fast the health system learns to respond to the disease and the protocol on recovery confirmation evolves. Finally, the parameter µ, which governs the passage from the H compartment to the D compartment, corresponds to the share of the hospitalised people who dies due to the disease. Again, we believe µ is related to how a health system can improve its capability to treat people over time. Nevertheless, for parsimonious reasons and given the residual hospitalised fatality rate of COVID-19 we decided to assume µ is fixed. Evoking the aforementioned reasons, the parameter α is also considered fixed, but highly variable between individuals.
Let dSE = {dSE t ,t = 1, 2, ..., T } be the daily observed (calibrated) counts of susceptible individuals who become infected. We define similarly the other vectors: dEI,dIQ,dQH,dQR,dHR and dHD. Because the series are conditionally independent the likelihood of the data is: where p(·|·) stands for the Poisson transition probabilities specified in (5) conditioned on θ and on all the information up to time t. Given the hierarchical representation presented above and using the same notation, one can evaluate the posterior distribution of all of the parameters, given the observed counts: Π(θ |dSE, dEI, dIQ, dQH, dQR, dHR, dHD).
One cannot evaluate this posterior distribution analytically and must resort to numeric simulation methods. We use the special case of MCMC known as Gibbs sampling [36] and implement the algortithm using the R package JAGS (all code used in this paper can be obtained from authors upon request).

Results
After a burn-in period of 10,000 iterations by which we believe convergence has been achieved, a sample of size 250 taken every 500 iterations (to avoid serial correlation, especially for parameters where identification problem may arise, such as η and h and µ and α) of the chain was used to obtain marginal posterior distributions for all model parameters.
Convergence of the Markov chain was assessed using a series of runs for four different chains with different starting values and also inspecting the autocorrelation function (cf. Figures 14, 15, 16, 17, 18 in Appendix A). In all cases the Markov chain appeared to have converged after the burn-in period.
The model fit is assessed by plotting the data against the estimated expected values of the compartments. The fit of the model is evident from Figure 8 for the period corresponding to the used data. Table 2 presents the main results for the model parameters.
The mean of the basic reproduction number (R 0 ) is 1.35 with a 95% credible  interval of (1.32 -1.38) and varies between 1.31 and 1.39. The distribution of the basic reproduction number is depicted in Figure 9 (a). The temporal evolution of the effective reproduction number is represented in Figure 9 (b). The imposition of the lockdown and social distancing measures were effective in pulling the reproduction number down to a value below the epidemic control of one. Nevertheless, easing those measures implied a significant increase on the infection transmission in early May. Still, the level that was reached kept the disease's spread under control and additional restrictions made it possible to bring it to levels close to the threshold, although the end of the epidemic cannot be anticipated yet.
The posterior mean of transmission rate before first intervention (β 0 ) is 0.27 with a 95% credible interval of (0.265 -0.276). These values are consistent with the values published in the literature (see, for example, [14]). By the time the second intervention takes place, the value of β (t) is only 0.13 (0.125 -0.134). This explains the benefits of imposing lockdown measures to bring the transmission rate down from 0.27 to 0.13. As already mentioned, on May 4th the lockdown measures started to be eased and the transmission rate raised to 0.239 (0.235 -0.243) starting a new downward path. Nevertheless, the end of the Portuguese traditional holidays period and (second half of August) and the beginning of the academic year push the transmission rate up again from 0.23 to 0.298, due to the natural increase of daily contacts. The distribution of ρ β 0 , ρ β 1 , and ρ β 2 are Gaussian-like with posterior means 0.016 (0.015 -0.017), 0.00035 (0.00018 -0.00053), and 0.00927 (0.00809 -0.0103). The rate of decay of β 1 is very minimal. Indeed, the value of β (τ 2 − 1) (the day before of the third intervention) is 0.23 indicating the transmission rate after the second intervention (0.24) has almost no decay at all. Actually, although the current mobility levels (cf. Google Mobility Reports) are below the pre-pandemic recorded figures, from early May onwards they started to catch up and are currently close to the pre-pandemic levels from early May onwards, potentially inducing a large number of daily infections. The posterior distribution of the transmission rates, β 0 , β 1 and β 2 are represented in Figure 10 (a), (b) and (c). The temporal evolution of the transmission rate is represented in Figure 10 (d).
The posterior mean of the hospitalisation rate (η) is between 0.443 (0.247 -0.813) corresponding to a mean period of 1.2 -4.0 days between case confirmation and hospital admittance. The posterior mean of the initial hospitalisation probability h 0 is 5.24% (3.14% -8.11%). Confirming what is currently known about the severity of the disease this figure indicates that only a small proportion of the quarantined individuals need specialised medical care. The posterior distributions of the hospitalisation rate, η, and initial hospitalisation fraction, h 0 are represented in Figure 11 (a) and (b). The temporal evolution of the hospitalisation fraction is represented in Figure 11 (c).
Regarding recovery, we observe a mean asymptotic recovery rate of 0.016 (0.0158 -0.0163). This rate translates in a mean asymptotic recovery period of 62.3 (61.5 -63.2) days. Although it seems to be larger than expected, this value is affected by the speed of the reporting of recovered individuals. Indeed, it is consensual between epidemiologists in Portugal that recovered individuals figures released by DGS have been permanently below the expert expectations, given recovery time reported in the literature (around 14 days, on average) and admitted by clinical practitioners. Indeed, by the time we of finishing this manuscript DGS has already released a batch of additional 13529 recovered individuals (referring to the previous period with unspecified allocation) that would result in a reduction of the recovery time consistent with the usual clinical mean recovery period. The posterior distribution of the asymptotic recovery rate is depicted in Figure 12 (a). The temporal evolution of the recovery rate is represented in Figure 12 (c).
On the other hand, the posterior mean of the recovery rate of hospitalised individuals is 0.121 (0.108 -0.146) which corresponds to a period of 8.3 (6.834 -9.297) days in hospital before recovery confirmation. That is a perfectly reasonable period despite its dependence on the health system promptness. The poste- rior distribution of the mean recovery rate of hospitalised individuals is depicted in Figure 12 (b). Finally, the posterior mean of the hospitalised case fatality rate (HCFR) (µ) is 19.93% (10.79% -32.49%). We must remark that this HCFR concerns only the hospitalised individuals and should not be compared with the case fatality rate (CFR) only concerns the recorded number of infections. Indeed, in our model the case fatality rate is given by the ratio of D t (accumulated number of deaths) to C t (accumulated number of infected individuals). The course of the CFR is represented in Figure 13. It follows a convergent path to a maximum value around 3.992% which is in line with the values reported by the literature, for countries that have adopted a lockdown policy for a significant period [40,41]. The posterior mean of the death rate, α, is 0.102 (0.056 -0.179) which corresponds to a period of 9.8 (5.581 -17.882) days in hospital before death. As anticipated this is highly dispersed value denoting a significant heterogeneity of the length of stay in hospital among individuals.

Discussion
This work adopts a generalised SEIR model, the so-called SEIQRHD model, to offer a quantitative overview of the complex structural analysis and prediction of the SARS-CoV-2 epidemic. The work was developed during the first wave of course of the epidemic. By the time we completed the manuscript, a new wave of increased infections was striking all over Europe.
The proposed SEIQRHD model is, to the best of our knowledge, is the first considering a specific compartment to follows the hospitalised individuals who might put an intolerable burden on the national health system, especially now that a second wave is starting and no one can anticipates the course of the epidemic when the Winter arrives This model offers to health authorities the possibility to predict of the number of individuals who are going to require hospitalisation, thus allowing for better planning of resources. Moreover, the set of model parameters (some of them ordinary in SEIR model specification), either time-fixed or time-varying, provide useful insights on the health authorities reaction to the daily challenges. Namely, it is possible to monitor (a) the time between case symptom onset and subsequently isolation, (b) the time spent in isolation, (c) the hospitalisation rate and the probability of hospitalisation and its evolution over time which provides information on the near future burden to the health system among other important insights as discussed in the section devoted to the analysis of results. It is also possible to distinguish between recovery rates and duration for hospitalised and non-hospitalised individuals, allowing a much deeper understanding on the recovery evolution and the efficiency of hospital care.
An additional future step over of this line of modelling is the separation between hospitalised noncritical patients from the ones in intensive care, which may of course offer authorities with additional insight for planning purposes.
Other researchers might use the methodology developed here to either calibrate non-error free or missing epidemic data. Indeed, as [39] points out, frequently the official databases are not exempt from measurement errors and/or missing data that must be accounted for and fixed before any further analysis. Estimating missing flows or the calibrated version of observed series can be done using the rationale behind the two-step calibration procedure proposed in this paper. For Portuguese data the calibration procedure has shown to be crucial to cope with errors on the official recovery cases, with impact over virtually all the transition rates between compartments and particularly the ones related with hospital occupancy: hospitalisation rate and recovery rate from hospitalised patients.
Both the calibration methodology and the Bayesian model were adapted to the Portuguese case and corresponding available data. However, we believe the data released in Portugal, with some minor differences, are widely available in most of the developed countries. Therefore, the methodologies developed here might be successfully replicated in many other countries.
The numbers retrieved from the Portuguese data show that Portugal has been dealing with the epidemic promptly and the measures taken to flatten the incidence curve have proved to produce the desirable results. In particular, the performed modeling allowed to understand how the contact rate has evolved over time and more importantly, how effective several interventions were (lockdown or other mobility restriction measures). For example, the first Portuguese intervention allowed the transmission rate to reduce from 0.27 to 0.13, but the second intervention was much less effective only allowing a reduction from 0.24 to 0.23. It is important to notice that the first intervention corresponded to a full lockdown, while the second one corresponded to setting limited mobility restrictions. It also allows us to understand the price to pay when restrictions are removed or eased. In fact, we have observed a rapid increase of the transmission rate from 0.13 to 0.24 right before the second intervention and from 0.23 to 0.30 before the third one.
These results may help public authorities to set the timing and intensity of the measures in order to achieve the planned results and particularly to avoid the implosion of the health system.