A Hawkes process model for the propagation of COVID-19: Simple analytical results

We present a model for the COVID-19 epidemic that offers analytical expressions for the newly registered and latent cases. This model is based on an epidemic branching process with latency that is greatly simplified when the bare memory kernel is given by an exponential function as observed in this pandemic. We expose the futility of the concept of “bending the curve” of the epidemic as long as the number of latent cases is not depleted. Our model offers the possibility of laying out different scenarios for the evolution of the epidemic in different countries based on the most recent observations and in terms of only two constants obtained from clinical trials.

incorporate census and mobility data [2,9]. These models lay out different scenarios depending on the degree of compliance of social distancing measures. But in either case, the exact data being used as well as the assumptions and algorithms are not always available to the public or straightforward to understand, let al one reproduce [6].
In this work we develop a simple epidemiologic model for the propagation of the current COVID-19 pandemic that is amenable to analytical solutions easy to implement. Our model evidences that bending the epidemiological curve is not as relevant as depleting the number of latent cases (contagious, but not registered yet), while providing simple expressions for this quantity from publicly available data. Finally, we lay out three different scenarios for the evolution of the epidemic in the USA. Other studies that lay out scenarios for the evolution of the pandemic have also used publicly available data [10].
The model. - Figure 1 shows the time series of the registered weekly new cases (red dots) for four different countries that have handled the epidemic very differently along with the number of latent cases (solid blue traces) calculated with our model. To construct it, we make use of the fact that in those countries in which the peak of a first outbreak has been reached, the relaxation of this dynamics is well described by a decaying exponential (insets of figs. 1(a) and (b)). Epidemic processes for which the Fig. 1: Weekly registered new cases (dots) and calculated latent ones (eq. (3)), along with the corresponding reproduction numbers Ri = βi +γ (eq. (4)) for (a) South Korea, (b) Italy, (c) USA and (d) Mexico. See main text for details. Insets show the weekly new cases in log-linear scale. The solid black line in (a) depicts an exponentially decaying activity (eq. (5) with β = 0) setting γ = 0.5, as calculated from clinical studies (see the appendix "Methods"). This apparent agreement suggests contagion events were successfully brought to a halt in S. Korea during that relaxation period. In the inset of fig. 1(b), the solid line is the best fit to eq. (5) (β = 0.194 ± .004) assuming γ = 0.5 as before.
analytic form of the relaxation dynamics is known can be conveniently modeled by the so-called self-excited Hawkes conditional Poisson processes [11].
These processes are composed of two main ingredients: the first one is the "branching ratio" that accounts for contagion. The second one is a latency function -also known as the bare memory kernel-that incorporates the fact that there is a statistical lag between contagion and the specific action whose aggregated dynamics is recorded. We propose that the weekly new observed or "registered" COVID-19 cases at time t -henceforth called the "activity" and denoted by λ t -obeys the dynamics from a discrete-time Hawkes process which in general takes the form where ϕ t is the latency function, β i is the number of people infected of first generation (i.e., directly infected by individual i at time t i ) that will become registered at any future time, and the term S t represents the exogenous sources of new cases. In the case at hand, these sources are the number of people already infected arriving in some new city [9]. When the latency function is given by a decaying exponential with constant (1/τ ) and β i = β for all i, the solution of eq. (1) takes the form where the constant θ is a function of β, as we show below in more detail. Equation (2) reveals a key feature characterizing Hawkes processes: when the latency function is given by either a power law -as in the case of many human activities [12][13][14] and earthquakes [15]-or by a decaying exponential [16] as in the present case, the activity during relaxation is slowed down, but it remains described by the same function. In other words, contagion renormalizes the function that describes the activity in the absence of exogenous sources. Thus, if the form of the bare memory kernel is known, the relaxation dynamics can give information about the magnitude of the contagion taking place during the relaxation period. In general, the activity derived from Hawkes processes is non-Markovian [17], but in the special case when the bare kernel is given by a decaying exponential -as is the case of the COVID-19 pandemicthe activity at time t depends only on the activity at time t − 1. This feature makes our model amenable to simple analytic solutions that help shed light into the relevant mechanisms at play [16]. In a nutshell, the key to the simplicity of the model and its solution rests in the fact that for an exponential bare memory kernel, the contribution of both contagion and latency at time t can be expressed as a fraction of the previous activity at time t − 1 [16]. This property is not shared by processes involving power-law bare memory kernels, where use of the Laplace transform is needed to solve a time-continuous version of eq. (1) [13][14][15]. The discrete time-step process of eq. (1) applied to the case at hand can be understood more clearly with the schematic diagram shown in fig. 2. We consider an initial number λ −1 of "patients-zero" infectious at time t = −1 that are exogenously introduced to the population. At time t = 0, these individuals will follow one of three paths: they will either 1) be registered with probability α, 2) not be registered but continue being contagious with probability γ, or 3) recover with probability ρ = 1−α+γ. The λ −1 patients-zero will also infect a number β 0 λ −1 of new individuals at time t = 0. In SIR-type models, the factor β is proportional to the so-called "susceptible" portion of the Hawkes model with an exponential bare memory kernel. A normalized initial population of "patients-zero" on week −1 follows three possible outcomes on the following week given by the fractions: ρ (recovered), γ (still infectious but not registered) and α (registered and isolated). A new generation of infectious individuals is given by βt times the population that originated it at time t. The sum of elements containing an α (elements inside circles) at a given time gives the total activity (eq. (3)), while the latent cases (eq. (5)) are the sum of the remaining terms after the recovered ones (elements with a ρ) have been subtracted. A tree is formed by replicating the four possible trajectories for each element that is still infectious at subsequent times. The dashed black arrows ending at the grey square depict the trajectory followed by the portion of the patients-zero that were asymptomatic and recovered at time t = 2. Note these asymptomatic individuals infected others before recovering.
population that gets depleted as the epidemic advances. In the form displayed in fig. 2, the model we propose here is valid only for the initial propagation of the pandemic since an effectively infinite source of susceptible individuals is considered, but it could be modified to incorporate a finite-sized population.
A second important assumption we make is that individuals that have been registered will either self-isolate or be hospitalized, but in either case, they will no longer be a source of new infections. In contrast, those that have not been registered yet can continue infecting others until they finally either recover or fall ill after which they will become registered. This contemplates the fact that asymptomatic individuals have been shown to be an important source of new COVID-19 infections [18] (see, for example, the dashed path in fig. 2). Our model can be considered as a simplified version of other more complex epidemiological models that take into account social mobility to be able to relate the reproduction number with confinement measures [3]. The solution to eq. (1) for the registered new cases in the absence of exogenous sources is The activities at times t and t + 1 yield the relation: From this last equation we can establish the condition for the contagion to either grow (β t + γ > 1) or die out exponentially (β t + γ < 1). Thus, we associate (β t + γ) with the effective reproduction number R t at time t. While the collection of reproduction numbers {(β 0 + γ), (β 1 + γ), . . . , (β t + γ)} merely constitutes a parametrization of the activity in terms of exponential changes, from it we can construct the time series of the latent cases. From fig. 2, these latent cases are given by Alternatively, we could have simply postulated eq. (5) for the number of latent cases as a starting point and propose that at every time period, a fraction α of the previous latent cases is registered. However, the current exposition helps put our model in the general framework of the important class of Hawkes processes, and, in doing so, underscore the role of contagion in the renormalization of the latency function.
Knowing λ L,t allows us to calculate the number of cases that are still to be registered in the extreme optimistic scenario where one could assure no more contagion events will take place. This is readily calculated as where A is the fraction of asymptomatic individuals and is related to ρ (see the appendix "Methods"). This scenario is a particular case of β i = β for all i's. In this case, the activity λ t can be written as where θ ≡ τ Ln(βe 1/τ +1) and 1/τ ≡ −Ln(γ). The form of this equation shows explicitly how contagion renormalizes the activity from a decaying exponential with constant (1/τ ) in the absence of contagion (β = 0), to one with constant (1/τ )(1 − θ) otherwise. Equations (3)-(6) can be used to lay out different scenarios, as we shall show next.
Results and discussion. -To apply our model to real data, we aggregate the registered, publicly available daily new cases of all the regions (or states) of a given country and during a whole week into the observed activity. This means that the step size of our model is equal to one week, thus smoothing the daily fluctuations observed in most countries. We are aware that in general the epidemic tends to evolve differently in each region of a given country since states may follow particular social distancing laws. Therefore, in its present formulation, ours can be considered as a mean-field model that also ignores the topology of the social network and any mobility factors [19]. According to clinical studies, the average time from contagion to attending a hospital is ≈ 2 weeks [20] from which we obtain γ = 0.5, or, in terms of the decay constant, (1/τ ) ≡ − Ln(γ) = 0.693 (see the appendix "Methods"). The inset of fig. 1(a) shows that an exponentially decaying activity with this same constant is in apparent agreement with the dynamics observed for South Korea from week #6 to week #10 (as counted from the week ending on February 21st, 2020), suggesting the intense cluster tracing, mass testing, and the imposition of quarantines on people travelling from abroad brought the number of new infections close to zero in this country during that period [21,22]. Similarly, we calculate ρ = 0.1 after assuming a conservative estimate of 20% of asymptomatic cases [23] (see the appendix "Methods"). Other relevant studies that also take into account asymptomatic cases can be found in refs. [24,25]. Figure 1 shows the latent cases vs. t using eq. (4) and the values for γ and ρ calculated above for four countries that have responded in very different ways to the pandemic. This figure shows how South Korea ( fig. 1(a)) managed to stop a first wave of contagion by week #6 (April 3rd, 2020), a fact reflected in the reproduction number being closer to γ. Consequently, the number of latent cases also decreased sharply reaching almost the same number of registered cases by week #8 (mid-April, 2020). By then, our model shows there were practically no unregistered cases propagating the infection. In the case of Italy ( fig. 1(b)), where it took longer for the government to impose confinement measures, the number of latent cases reached almost 140000 by week #4 (March 20th, 2020), which amounts to close to 60% of the total 230000 new cases this country has registered since then. It then took almost 10 weeks since the peak of activity was reached for these latent cases to decay, despite a strict lockdown being enforced around week #4. Using eq. (5), we estimate the reproduction number in Italy in the period between weeks #9 and #15 to be (β + γ) = 0.694 ± 0.004 (see inset of fig. 1(b)). After reaching this stage, relaxing some social distance measures necessary to reopen their respective economies has been done with some confidence.
Having just a few latent cases is why a second wave of infections has been successfully contained in South Korea (from week #14, see inset of fig. 1(a)). In contrast, for both the USA and Mexico, the number of latent cases as of July 23rd is enormous. Even if a strict nationwide lockdown were to be imposed and assuming the conservative estimate of 20% for the asymptomatic cases, we can still expect at least ≈ 0.115 × 10 6 × (1 − 0.2) ≈ 92000 new cases in Mexico and ≈ 0.5 × 10 6 × (1 − 0.2) ≈ 400000 new cases in the USA. A more realistic scenario for the USA in the case of a lockdown can be laid out by assuming a similar reproduction number as the one calculated in the case of Italy after it underwent lockdown.
Using eq. (6), the total sum of the activity after a peak λ max during an infinite number of weeks is λ max (β + γ)/(1 − β − γ). After plugging the reproduction number for Italy during relaxation ((β + γ) = 0.694) and λ max ≈ 480000, see last point in fig. 1(c) in the last expression, the USA can still expect about 1.1 million new cases in the course of the following months. In a much grimmer (but not at all unrealistic) scenario, if the USA go back to the slow decay in activity they experienced after reaching the initial peak on week #7 (mid-April, (β + γ) ≈ 0.95, see top panel of fig. 1(c)), then we can expect 9.12 million new COVID-19 cases. The number of cases to be expected in the worst-and best-case scenarios explored here (see fig. 3) differ by a factor of over 20, evidencing the enormous difference that distinct approaches to re-opening the economy and returning to some sort of normality can make [26]. This cannot be understated, specially considering that the death toll is proportional to the number of cases. In this respect, we can estimate the number of new deaths from COVID-19 that may occur in the USA in the following months. The current mortality rate (for registered patients) is likely to be much smaller than its present time-average of 7% due to some of the following factors: 1) earlier infection identification; 2) better equipped hospitals; 3) younger average infected population (down from 65 to 35 years old); 4) and better knowledge about how to treat critically ill patients [27]. The current mortality rate can be roughly estimated to be about 1.36% by assuming a one-week lag between new cases and new deaths (6426 new deaths on week #22 from 472842 cases on week #21). This would imply 124000 new COVID-19 deaths in the following months in the worstcase scenario presented above for the USA, bringing the total casualties to about 270000 if no new treatment is developed that can reduce further the mortality rate.
The analysis we have presented illustrates that bending the epidemic curve and even having a downwards trend is meaningless in terms of public safety unless such trend lasts long enough as to allow for most of the latent cases to either recover or become symptomatic. Furthermore, given that the dynamics of eqs. (3) and (5) is Markovian, 68005-p4 Fig. 3: Evolution of the number of weekly new cases in the USA after week #22 (July 24th) using eq. (6) assuming the relaxation follows 1) the same decay constant from the previous relaxation in the USA (γ + β) = 0.95 ("worst-case scenario"), 2) the trend in Italy, (γ +β) = 0.694, and 3) the trend in South Korea (γ + β) = 0.5 ("best-case scenario"). In the best-and worst-case scenarios, our model estimates, respectively, 400000 and 9 million new cases from July 23rd on. a sudden change in confinement practices is bound to have strong effects regardless of any apparent trend. It is worth pointing out that trends reflect social behavior and not any kind of inertia or "weakening" of the pathogen itself.
As stated in the introduction, our model is a special case of SIR-type models and is therefore not devoid of the same issues related to the sensitivity to the choice of parameters present in this class of models. However, we emphasize that our model constitutes a reparameterization of the weekly activity in terms of exponential changes. Consequently, it is particularly simple to implement and allows for the formulation of predictions by assuming previous reproduction numbers that can be calculated easily. Furthermore, it evidences the Markovian nature of this epidemic process.
The model we have proposed in this work is only valid for the initial stages of the epidemic. To account for the finite size of a given population, the β k factors could be made proportional to the product of the susceptible and infected populations as is done in SIR-type models. Also, more precise scenarios than the ones we have laid out in this work could be obtained by atomizing our mean field model to the state or county level if, for example, knowledge of whether people are following social distancing measures is available. * * * We thank Raul Esquivel and Gabriel Caballero for reviewing the manuscript and acknowledge funding from DGAPA-PAPIIT project IT101820. The authors declare that there is no conflict of interest regarding the publication of this article.

Data collection.
(Raw data can be found, see the Supplementary Material: Italy weekly data.txt, Mexico weekly data.txt, S Korea weekly data.txt, Scenarios for the US weekly new cases Fig3.txt, USA weekly data.txt.) Daily new cases and new deaths were obtained from the website https:// www.worldometers.info/coronavirus/. Daily counts were aggregated weekly starting from February 15th 2020, so that, for example, week #0 comprises the daily data from February 15th to February 21st 2020.
In constructing our model, we have made the important assumption that the number of registered cases consists only of symptomatic individuals. But it is worth pointing out that this is not necessarily the case. For instance, during the early stages of the pandemic in a given country, the number of registered cases likely consisted of the number of individuals that having developed symptoms were be admitted into a hospital. As the number of tests performed increased, it is almost certain that the registered cases included also asymptomatic individuals. The validity of our results hinges on the assumption of a somewhat stable percentage of positive tests. In the USA, this percentage stabilized to around 20% from mid-February to the end of April (see https://covidtracking.com/data/us-daily), the period over which the reproduction number was calculated to predict the dynamics of future relaxations for this country. For our prediction to be valid, this percentage of positive tests must remain somewhat constant in the following months. This important caveat needs be kept in mind when considering the validity of our results.
Estimation of γ and ρ from clinical trials. First, consider the evolution of the patients-zero in the absence of contagion. The probability of being registered after a number t of weeks has elapsed since this population was introduced, is given by λ(t|β k = 0) = αγ t = (1 − γ − ρ)γ t . From this equation, the percentage of asymptomatic cases γ). Then, if the possibility of having asymptomatic cases is precluded (as in the case of clinical tests with infected patients), the naturally normalized bare memory kernel is ϕ t ≡ λ(t|β k = 0, ρ = 0), or ϕ t = (1 − γ)γ t . Then, the average time elapsed from contagion to being registered is then T ≡ ∞ t=0 ϕ t = γ/ (1 − γ). Independent clinical studies on isolated infected patients have found that T = 13 days 2 weeks [20], which contemplates the fact that, on average, it takes people 2 days after developing dyspnea to go to a hospital. Since in our model t = 0 is the first week after exposure (see fig. 2), this period corresponds to T 1, yielding γ = 0.5 and (1/τ ) ≡ −Ln(γ) = 0.6931 Regarding the rate of asymptomatic cases, this is a topic of current debate [28,29]. But investigations on isolated groups of people over long periods of time have found A 20% to be a conservative estimate [23]. We use this value in the present study, but other scenarios could be easily explored by modifying it. Along with the estimation for γ made above, this implies ρ = 0.1. Using clinical data to fix γ and ρ as we have just done implies that the only free parameter of the model is the number of patients-zero, λ −1 . The simplicity of our model carries with it another important condition that must be consistent with independent clinical results for COVID-19. Given that the γ and ρ branches are symmetric in the diagram of fig. 2, the latency function for new registered cases is identical to the latency function for recovered ones. Consequently, the average time from infection to recovery (T ρ ) must equal T . A recent study [15] has found that the probability of being infectious at time t is given by a gamma distribution function with an average close to 2 weeks. While other studies have found that the period during which virus shedding lasts is longer than this value, its magnitude is not enough for contagion to continue [30]. Once again, this average of two weeks corresponds to T ρ = 1 in our model, which equals T , as required. Note that we are requiring these average values to coincide and not the full probability functions themselves, which would be a much stronger condition. The only distribution function we do require to equal the observed data is the activity λ t during the relaxation period with constant β (eq. (5), insets of figs. 1(a) and (b)). Here we have made the simplifying assumption that symptomatic and asymptomatic clinical features are similar, even though some differences have been found for the characteristics of viral shedding in these populations [31]. We emphasize the fact that the condition T = T ρ is a consequence of the simplicity of our model that may not be satisfied by other infectious diseases.
Exponential form of the activity. The binomial expression for the activity of eq. (5) can be recast in exponential form as follows. Starting from eq. (5), the term involving the time evolution is: