Modelling of the Hungarian spread of COVID-19 and control strategies with risk-based approach



Abstract
Background
Novel Coronavirus Disease (COVID-19), caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), threatens humanity in terms of health and economy, as it spreads extremely fast and causes massive epidemics all over the world. In the absence of a vaccine, social isolation and hygienic measures are the only way to curb the virus.

Methods
In our study, the Hungarian spread of COVID-19 is modelled by applying a modified SEIR (Susceptible, Exposed, Infected, Recovered) compartment model, which takes into account disease transmission not only from infected but also from latent individuals (Exposed compartment). The differences between the modified model and the traditional SEIR model have been evaluated. The disease-spreading scenarios simulate the effect of the different levels of intervention (social distancing and hygienic measures) implemented in Hungary. The modelling also considers population and mobility data, which are essential in the case of infectious disease spreading. For controlling the disease in the long term, a network-based analysis is provided, based on the concept of the epidemic threshold and the identification of super-spreader population groups.

Results
According to the sensitivity analysis of the modified SEIR model, disease transmission by latent individuals has the greatest effect on the number of infections. Based on the results, the applied interventions have a great impact on disease spreading and are effective in controlling the COVID-19 epidemic.
According to the results of the network-based study, the proportion of people to be sampled for effective disease control is a function of the identified people with a high number of contacts in social networks, who act as super-spreaders.

Conclusion
Applying network-based random, selective and targeted sampling, testing and isolation of affected individuals would yield significantly different sample sizes, highlighting the importance of super-spreaders. Network analysis (like all computational science methods) needs a large amount of good-quality data, and the spread of these methods could be supported by easy-to-use tools. We wanted to raise awareness of this issue as well.

Keywords: COVID-19, epidemiological modelling, control strategies, risk-based testing, super-spreader identification, network analysis.

Background
Genetic mutations of microorganisms are inevitable; however, due to the climate crisis and other crucial drivers of change, emerging infectious diseases are increasingly likely to threaten humanity in the near future. Mankind is currently in a great crisis in terms of health and economy because of an epidemic caused by the mutation of an originally animal-related coronavirus (1)(2). By adapting to humans, the virus named Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) (3) caused the pandemic infectious disease COVID-19 (4), which affects millions of lives. As of today, it has appeared almost all over the world; it spreads extremely fast and causes massive epidemics.
COVID-19 is transmitted by inhalation of, or contact with, infected droplets (5,6). It must be noted that the aerosol and surface stability of SARS-CoV-2 is higher than that of SARS-CoV-1; therefore, indirect transmission routes are more significant than previously thought (7,8).
In the absence of a vaccine, social isolation and hygienic measures are the only way to curb the virus.
However, these interventions have proved effective only when applied very strictly, because of the very high virulence of SARS-CoV-2 (9,10). The level of isolation needed (e.g. school closures, banned events and gatherings) is so high that it is unsustainable in the long run.
Modelling is a key option in epidemiology when limited data are available regarding an infectious disease, as is the case with emerging diseases such as COVID-19. Scenario analysis is a tool by which information can be gained to support decision making regarding mitigation strategies and risk management; nevertheless, it has limitations which must be taken into account.

average degree is not enough in itself to describe the network topology. Diseases can spread much faster in such networks due to the presence of super-spreader hubs. These networks are also known for their specific robustness: they withstand untargeted attacks without falling apart, but they are vulnerable to attacks targeting the large hubs of the network. This phenomenon could also be used for planning targeted interventions.
The aim of this study is to present epidemiological modelling based on the SEIR compartment model, which is generally accepted for modelling the spread of COVID-19, but with a modification that takes into account the transmission route of latent, yet infectious, people. Network analysis-based intervention strategies are also discussed in the paper, since the profound implications of network theory are not widely known in the public health community.

Materials and Methods
Structure of the epidemiological model

Modified SEIR model for COVID-19
For COVID-19, we built a SEIR (Susceptible, Exposed, Infected, Recovered) compartment model as the basis of the modelling, but applied some modifications. In the original SEIR model, the Exposed compartment is not infectious. Our modified SEIR model accounts for the infectiousness of compartment 'E', as shown by previous studies (19-23), and also accounts for a route of transition from the Exposed compartment directly to the Recovered compartment, i.e. recovery without symptoms or with mild symptoms. For better differentiation from the original, the SEIR model with the applied modifications will be referred to as the 'modified SEIR model'.

The parameter values in the baseline scenario (Scenario 1) with no interventions are shown in Table 1.

can be specified in percentages of the total global population. Initial epidemic locations (cities) and the number of individuals in the given compartment can be defined for an arbitrary number of initial epidemic seeds.

staying at home and reducing physical contacts as much as possible. Events with large numbers of people were banned (the limit was 100 people for indoor and 500 for outdoor events). According to our assumption, which is supported by the data of Google Analytics (42)

the first round until the 11th of April, and in the second round it was extended for an indefinite period. The goal of the restrictions was generally to limit contact between people who do not live in the same household; therefore, residences should only be left to satisfy basic needs (e.g. work, grocery, pharmacy, health services). The previously specified restrictions regarding the institutions remained unchanged.
According to our assumption and the further references (42,43), this resulted in a further 25% (altogether 75%) reduction in the level of social interactions compared to the normal lifestyle, which means a reduction of the β and βL values to 0.3 and 0.1, respectively.
Scenarios 1 to 3 attempt to model the real-life course of disease spreading under the intervention measures applied over time in Hungary. It should be noted, however, that the list of interventions is not complete; only those that are crucial for scenario building are mentioned.
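The compartment structure described above can be sketched as a system of ordinary differential equations. The following minimal Python sketch rests on our own assumptions and is not the GLEAMviz implementation: transmission is frequency-dependent, (β·I + βL·E)·S/N; exposed individuals progress to I at a rate called `epsilon` here (a hypothetical name, since the text does not define this symbol) and recover directly at rate µ; infected individuals recover at rate γ. All parameter values are illustrative, not those of Table 1. Interventions are represented by scaling β and βL by the same factor, mirroring how the scenarios lower both rates together (a 75% reduction multiplies both by 0.25).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Params:
    beta: float     # transmission rate from Infected (I) individuals, per day
    beta_l: float   # transmission rate from latent/Exposed (E) individuals, per day
    epsilon: float  # hypothetical name for the E -> I progression rate
    mu: float       # E -> R rate (recovery without or with mild symptoms)
    gamma: float    # I -> R recovery rate


def derivatives(state, p, n):
    """Right-hand side of the sketched modified SEIR equations."""
    s, e, i, r = state
    new_exposures = (p.beta * i + p.beta_l * e) * s / n  # both E and I transmit
    return (
        -new_exposures,
        new_exposures - (p.epsilon + p.mu) * e,
        p.epsilon * e - p.gamma * i,
        p.mu * e + p.gamma * i,
    )


def simulate(p, n=1_000_000, e0=10.0, i0=5.0, days=365, dt=0.1):
    """Fixed-step RK4 integration; returns one (S, E, I, R) tuple per day."""
    state = (n - e0 - i0, e0, i0, 0.0)
    trajectory = [state]
    for _ in range(days):
        for _ in range(round(1 / dt)):
            k1 = derivatives(state, p, n)
            k2 = derivatives(tuple(x + dt / 2 * k for x, k in zip(state, k1)), p, n)
            k3 = derivatives(tuple(x + dt / 2 * k for x, k in zip(state, k2)), p, n)
            k4 = derivatives(tuple(x + dt * k for x, k in zip(state, k3)), p, n)
            state = tuple(
                x + dt / 6 * (a + 2 * b + 2 * c + d)
                for x, a, b, c, d in zip(state, k1, k2, k3, k4)
            )
        trajectory.append(state)
    return trajectory


def peak(trajectory):
    """Maximum size of the Infected compartment and the day on which it occurs."""
    day = max(range(len(trajectory)), key=lambda t: trajectory[t][2])
    return trajectory[day][2], day


def with_contact_reduction(p, reduction):
    """Scale both transmission rates by (1 - reduction), e.g. 0.75 for a 75% cut."""
    return replace(p, beta=p.beta * (1 - reduction), beta_l=p.beta_l * (1 - reduction))


if __name__ == "__main__":
    baseline = Params(beta=0.5, beta_l=0.25, epsilon=0.2, mu=0.1, gamma=0.2)  # illustrative
    for label, params in [("no intervention", baseline),
                          ("75% contact reduction", with_contact_reduction(baseline, 0.75))]:
        size, day = peak(simulate(params))
        print(f"{label}: peak I = {size:,.0f} on day {day}")
```

A fixed-step RK4 integrator keeps the sketch free of third-party dependencies; a library solver such as `scipy.integrate.solve_ivp` would serve equally well.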

Practical implementation in GLEAMviz (Settings)
Unless otherwise indicated, the settings are applied to all four Scenarios.
Table 2.
- Initial geographic location of the epidemic: officially, 2 infected people were registered on the 4th of March 2020 in Hungary (48), in Budapest, but in order to obtain more realistic simulations regarding the coverage of the region, 5 infected individuals were initially set for Budapest, and 3 latent individuals each for Debrecen and Lake Balaton, based on population data. These were the only locations available for Hungary in the GLEAMviz software. As it is possible to set numerous initial locations of the epidemic, the numbers of individuals with registered COVID-19 infections (number of active patients) were collected for the 4th of March 2020. Only cities can be set in the software, and data about registered infections were available at different levels of aggregation.

In order to evaluate the differences between the modified SEIR model and the traditional ('basic') SEIR model, a simulation was run in which parameters βL and µ (the transition rates that contribute to the infectiousness of exposed individuals) were disregarded. Other parameters and settings were the same as in Scenario 1.
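The basic-versus-modified comparison can be reproduced qualitatively with a small sketch: setting βL = 0 and µ = 0 turns the sketched modified equations into a classic SEIR model. The equations are our own assumption (frequency-dependent transmission, E→I at a rate given the hypothetical name `epsilon`), and all parameter values are illustrative. With these values both variants share R0 = 2.5 under the sketch's own equations (βL/(ε+µ) + ε/(ε+µ)·β/γ versus β/γ), so the comparison isolates the effect of moving part of the transmission into the latent period.

```python
def peak_and_total(beta, beta_l, epsilon, mu, gamma, n=1_000_000, e0=10.0, i0=5.0,
                   days=365, dt=0.1):
    """Integrate the sketched SEIR variant with RK4 and return
    (peak of I, day of the peak, cumulative number of people who entered I)."""

    def rhs(y):
        s, e, i, r, entered_i = y
        new_exposures = (beta * i + beta_l * e) * s / n
        return (-new_exposures,
                new_exposures - (epsilon + mu) * e,
                epsilon * e - gamma * i,
                mu * e + gamma * i,
                epsilon * e)  # running count of E -> I transitions

    y = (n - e0 - i0, e0, i0, 0.0, i0)
    best_i, best_day = i0, 0
    for day in range(1, days + 1):
        for _ in range(round(1 / dt)):
            k1 = rhs(y)
            k2 = rhs(tuple(a + dt / 2 * b for a, b in zip(y, k1)))
            k3 = rhs(tuple(a + dt / 2 * b for a, b in zip(y, k2)))
            k4 = rhs(tuple(a + dt * b for a, b in zip(y, k3)))
            y = tuple(a + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
                      for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4))
        if y[2] > best_i:
            best_i, best_day = y[2], day
    return best_i, best_day, y[4]


shared = dict(beta=0.5, epsilon=0.2, gamma=0.2)         # illustrative values only
modified = peak_and_total(beta_l=0.25, mu=0.1, **shared)
basic = peak_and_total(beta_l=0.0, mu=0.0, **shared)    # beta_L and mu disregarded
for name, (height, day, total) in [("modified", modified), ("basic", basic)]:
    print(f"{name:8s}: peak I = {height:,.0f} on day {day}, total entering I = {total:,.0f}")
```

In this sketch the latent route shortens the time between successive infections, so the modified variant peaks earlier, while fewer people pass through the I compartment because some exposed individuals recover directly at rate µ. This mirrors only the direction of the differences reported for the GLEAMviz runs, not their magnitude.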

Sensitivity analysis of modified SEIR model
The sensitivity analysis was performed to reveal the parameter(s) to which the model is most sensitive. Five parameters of our modified SEIR model were investigated: β, βL, γ, , μ. The parameters and settings of Scenario 1 were used for the evaluation. The maximum daily number of individuals in the Infected compartment and the day on which this maximum occurs were selected as endpoints. Only one parameter was changed at a time. Changes in both directions were evaluated, using lower (×½) and higher (×2) values compared to the baseline value (Table 1) of the examined parameter.
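The one-at-a-time design described above (each parameter halved and doubled; endpoints are the peak of the Infected compartment and its timing) can be sketched as follows. The model equations, the name `epsilon` for the parameter whose symbol is missing from the text, and all baseline values are assumptions for illustration; they are not the values of Table 1.

```python
def peak_infected(p, n=1_000_000, e0=10.0, i0=5.0, days=365, dt=0.1):
    """Peak size of I and the day it occurs for the sketched modified SEIR model."""

    def rhs(y):
        s, e, i = y  # R does not feed back into the dynamics, so it is omitted
        force = (p["beta"] * i + p["beta_l"] * e) * s / n
        return (-force,
                force - (p["epsilon"] + p["mu"]) * e,
                p["epsilon"] * e - p["gamma"] * i)

    y = (n - e0 - i0, e0, i0)
    best, best_day = i0, 0
    for day in range(1, days + 1):
        for _ in range(round(1 / dt)):  # classic RK4 steps
            k1 = rhs(y)
            k2 = rhs(tuple(a + dt / 2 * b for a, b in zip(y, k1)))
            k3 = rhs(tuple(a + dt / 2 * b for a, b in zip(y, k2)))
            k4 = rhs(tuple(a + dt * b for a, b in zip(y, k3)))
            y = tuple(a + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
                      for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4))
        if y[2] > best:
            best, best_day = y[2], day
    return best, best_day


# Illustrative baseline; "epsilon" is a hypothetical name for the fifth parameter.
BASELINE = {"beta": 0.5, "beta_l": 0.25, "epsilon": 0.2, "mu": 0.1, "gamma": 0.2}


def one_at_a_time(baseline, factors=(0.5, 2.0)):
    """Vary one parameter at a time; return (relative change in peak, shift in peak day)."""
    base_peak, base_day = peak_infected(baseline)
    results = {}
    for name in baseline:
        for factor in factors:
            varied = dict(baseline, **{name: baseline[name] * factor})
            size, day = peak_infected(varied)
            results[(name, factor)] = (size / base_peak - 1.0, day - base_day)
    return results


results = one_at_a_time(BASELINE)
for (name, factor), (rel, shift) in sorted(results.items(), key=lambda kv: -abs(kv[1][0])):
    print(f"{name:7s} x{factor}: peak {rel:+.0%}, peak day {shift:+d}")
```

Sorting the entries by the absolute change in peak size gives the ordering that a tornado plot visualises.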

Network analysis-based intervention strategies
The basic assumption of conventional compartment models, namely that anyone can contact anyone and everyone has the same number of contacts, does not hold in real contact networks. Real networks are sparse: many nodes with a small number of contacts are connected to each other through a few large hubs with many contacts, which could also be identified as super-spreaders.
The first applications of network science to disease modelling established a new scientific field called network epidemics (58). When modelling disease spreading, the network characteristics yield many important differences compared to conventional compartment models.
First of all, diseases spread much faster in scale-free networks due to the presence of hubs. In network epidemics the concept of the epidemic threshold (λc) is used: pathogens can only spread if the spreading rate λ exceeds λc. The spreading rate can be defined as:

$$\lambda = \frac{\beta^*}{\mu} \qquad (2)$$

where β* is the likelihood that the disease will be transmitted from one infected person to a susceptible one in a unit of time, and µ is the recovery rate. The conventional R0 can be defined as:

$$R_0 = \frac{\beta^* \langle k \rangle}{\mu} \qquad (3)$$

where ⟨k⟩ is the average degree. In the case of scale-free networks, the epidemic threshold is:

$$\lambda_c = \frac{\langle k \rangle}{\langle k^2 \rangle} \qquad (4)$$

where ⟨k²⟩ is the second moment of the degree distribution, which is also used in the calculation of the variance, σ² = ⟨k²⟩ − ⟨k⟩² (25).
Another important consequence of network topology is the difference in planning intervention strategies. If we try to remove nodes from the network, either by immunization (which cannot yet be performed in the case of COVID-19) or by identifying latent and infected people through sampling and testing and then removing them with quarantine measures, we may have to use options other than the conventional ones as well.
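The spreading rate and the epidemic threshold can be evaluated directly from a degree sequence. The sketch below assumes their standard forms from network epidemiology, λ = β*/µ and λc = ⟨k⟩/⟨k²⟩, and compares a homogeneous contact pattern with a hypothetical one of the same mean degree that contains a few hubs; the sequences and the β*, µ values are invented for illustration.

```python
def degree_moments(degrees):
    """First and second moments <k> and <k^2> of a degree sequence."""
    n = len(degrees)
    return sum(degrees) / n, sum(k * k for k in degrees) / n


def epidemic_threshold(degrees):
    """lambda_c = <k>/<k^2>; the pathogen can spread only if lambda > lambda_c."""
    k1, k2 = degree_moments(degrees)
    return k1 / k2


def spreading_rate(beta_star, mu):
    """lambda = beta*/mu: per-contact transmission per unit time over the recovery rate."""
    return beta_star / mu


homogeneous = [4] * 100             # everyone has exactly 4 contacts
with_hubs = [2] * 96 + [52] * 4     # hypothetical: same mean degree (4), a few hubs
lam = spreading_rate(beta_star=0.05, mu=0.25)  # hypothetical values, lambda = 0.2

for name, seq in [("homogeneous", homogeneous), ("with hubs", with_hubs)]:
    k1, k2 = degree_moments(seq)
    lam_c = epidemic_threshold(seq)
    verdict = "spreads" if lam > lam_c else "dies out"
    print(f"{name}: <k>={k1:.1f}, variance={k2 - k1 ** 2:.1f}, lambda_c={lam_c:.4f} -> {verdict}")
```

With the same mean degree, the hub-containing sequence has a much larger second moment and therefore a much lower threshold: a pathogen with λ = 0.2 stays below the threshold of the homogeneous network but exceeds that of the heterogeneous one.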
To assess the effect of identifying and removing latent and infected people through sampling, testing and subsequent quarantine measures, we use the concept of critical immunization, which is well established in network epidemiology. The critical immunization gc is the proportion of nodes that must be removed from the network to stop the spreading of the disease.

Random sampling
In the case of random sampling and testing (and the subsequent removal of SARS-CoV-2 positive people from the contact network), the critical immunization gc can be defined as:

$$g_c = 1 - \frac{\mu}{\beta^*}\,\frac{\langle k \rangle}{\langle k^2 \rangle} \qquad (5)$$

Besides random sampling, there are other options as well, stemming from the characteristic of scale-free networks first described as the error tolerance of networks (65). If we remove nodes from a network in a targeted manner, the spreading on the network could be quickly slowed down or stopped. As demonstrated above, the rapid spread of diseases is caused by the large variance in the degrees of the nodes in these networks (⟨k²⟩ ≫ ⟨k⟩), which comes from the presence of hubs. If we want to decrease the variance (thus increasing λc), we have to block hubs from interacting.
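Assuming the random-sampling threshold takes the standard form from network epidemiology, g_c = 1 − (µ/β*)·⟨k⟩/⟨k²⟩ = 1 − λc/λ, the share of the population that random sampling and isolation must remove can be computed as below. The degree moments are hypothetical: a homogeneous network (⟨k⟩ = 4, ⟨k²⟩ = 16) and a hub-containing one with the same mean (⟨k⟩ = 4, ⟨k²⟩ = 112); the spreading rate is also an illustrative value.

```python
def critical_immunization(spreading_rate, k1, k2):
    """Fraction of randomly chosen nodes to remove so that the disease cannot spread.

    g_c = 1 - lambda_c / lambda with lambda_c = <k>/<k^2>; returns 0 when the
    pathogen is already below the epidemic threshold."""
    lambda_c = k1 / k2
    return max(0.0, 1.0 - lambda_c / spreading_rate)


spreading_rate = 0.5  # hypothetical lambda = beta* / mu
print("homogeneous:", critical_immunization(spreading_rate, 4, 16))
print("with hubs:  ", critical_immunization(spreading_rate, 4, 112))
```

Heterogeneity pushes g_c towards 1, which is why random sampling alone becomes impractical when hubs are present; the `max(0.0, ...)` guard simply reports that no removal is needed below the threshold.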

Selective sampling
If we do not know the exact mapping of the contact network, we could rely on the 'friendship paradox' (66) and the immunization strategy based on it proposed by Cohen et al. (67). The friendship paradox states that, on average, the neighbours of a node have higher degrees than the node itself. The average degree of a node's neighbour is not equal to ⟨k⟩ but is a different number, which also depends largely on ⟨k²⟩ (25). The origin of this phenomenon is that a random node is more likely to be connected to a hub than to a small-degree node, because hubs have more connections than other nodes. Thus, by immunizing (or isolating) the contacts of randomly selected individuals, we target the hubs without knowing exactly which individuals are the hubs.
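The friendship paradox has a simple exact form: following a uniformly random link end reaches a node whose expected degree is ⟨k²⟩/⟨k⟩, which is never smaller than ⟨k⟩. The hypothetical toy network below (one person who knows five others, two of whom also know each other) illustrates the effect.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected graph as a dict: node -> set of neighbours."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def mean_degree(adj):
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def mean_neighbour_degree(adj):
    """Mean degree found at the end of a uniformly random link end (= <k^2>/<k>)."""
    reached = sum(len(adj[v]) for u in adj for v in adj[u])  # node v is counted k_v times
    link_ends = sum(len(nbrs) for nbrs in adj.values())
    return reached / link_ends


# Hypothetical toy network: person 0 knows everyone; persons 1 and 2 also know each other.
graph = build_adjacency([(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2)])
degrees = [len(nbrs) for nbrs in graph.values()]
k1 = sum(degrees) / len(degrees)
k2 = sum(k * k for k in degrees) / len(degrees)
print("mean degree:", mean_degree(graph))
print("mean neighbour degree:", mean_neighbour_degree(graph), "= <k^2>/<k> =", k2 / k1)
```

Most people in this toy network have one or two contacts, yet the person reached through a random link has three contacts on average, because the hub sits at the end of five of the twelve link ends.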

Targeted sampling
If we knew the whole contact network, we could target the most prominent hubs. We do not know the exact mapping of the Hungarian social network, but we can make the following assumptions:
- The network is scale-free, where the probability p_k that a node has exactly k links is $p_k = \frac{k^{-\gamma}}{\zeta(\gamma)}$, where γ is the degree exponent and ζ(γ) is the Riemann zeta function (using the discrete formalism (25));
- 2 < γ < 3, as in most real-life networks.
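Under the scale-free assumption, the effect of the degree exponent on the threshold can be explored numerically. The sketch truncates the degree range at a hypothetical maximum degree k_max and renormalises over that finite range, so the finite sum stands in for ζ(γ); k_min = 1 is also an assumption. For 2 < γ < 3 the mean degree settles as k_max grows while ⟨k²⟩ keeps growing, so λc = ⟨k⟩/⟨k²⟩ drifts towards zero.

```python
def truncated_power_law(gamma, k_max, k_min=1):
    """p_k proportional to k^-gamma for k_min <= k <= k_max, renormalised on that range.

    With k_min = 1 the normalising sum approaches zeta(gamma) as k_max grows."""
    weights = {k: k ** -gamma for k in range(k_min, k_max + 1)}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def moments(pk):
    """<k> and <k^2> of a degree distribution given as {k: p_k}."""
    k1 = sum(k * p for k, p in pk.items())
    k2 = sum(k * k * p for k, p in pk.items())
    return k1, k2


gamma = 2.5  # inside the 2 < gamma < 3 range assumed in the text
for k_max in (100, 1_000, 10_000, 100_000):
    k1, k2 = moments(truncated_power_law(gamma, k_max))
    print(f"k_max={k_max:>7,}: <k>={k1:.3f}  <k^2>={k2:9.1f}  lambda_c={k1 / k2:.4f}")
```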

With targeted sampling we try to remove all nodes whose degree is larger than kt. From an epidemiological viewpoint this is the same as removing the high-degree nodes from the network together with their links. With this intervention the network changes, and λc increases (25)
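Removing every node with degree above kt can be approximated by truncating the degree distribution at kt and recomputing λ'c = ⟨k⟩/⟨k²⟩; this ignores the links that the remaining nodes lose to the removed hubs, so it is only an approximation. The sketch searches for the largest kt whose truncated distribution still keeps λ'c above a given spreading rate and reports the share of nodes removed. The pure power law with k_min = 1 and the cap k_max = 100,000 are hypothetical choices; λ = 0.067 is the spreading rate reported in the Results.

```python
from itertools import accumulate


def target_degree(gamma, spreading_rate, k_min=1, k_max=100_000):
    """Largest cut-off k_t such that removing every node with degree > k_t raises
    lambda'_c = <k>/<k^2> above the spreading rate.

    Approximation: the remaining degree distribution is the power law truncated at k_t
    (links lost by the remaining nodes are ignored). Returns (k_t, lambda'_c, share removed),
    or None if no cut-off works."""
    ks = range(k_min, k_max + 1)
    weights = [k ** -gamma for k in ks]                                 # unnormalised p_k
    first = list(accumulate(k * w for k, w in zip(ks, weights)))       # partial sums of k p_k
    second = list(accumulate(k * k * w for k, w in zip(ks, weights)))  # partial sums of k^2 p_k
    cumulative = list(accumulate(weights))                             # partial sums of p_k
    best = None
    for idx in range(len(ks)):
        # lambda'_c only decreases as k_t grows, so stop at the first failure
        if first[idx] / second[idx] > spreading_rate:
            best = idx
        else:
            break
    if best is None:
        return None
    share_removed = 1.0 - cumulative[best] / cumulative[-1]
    return ks[best], first[best] / second[best], share_removed


SPREADING_RATE = 0.067  # lambda reported in the Results
for gamma in (2.2, 2.5, 2.8):
    result = target_degree(gamma, SPREADING_RATE)
    if result is None:
        print(f"gamma={gamma}: no cut-off found")
    else:
        k_t, lam_c, removed = result
        print(f"gamma={gamma}: k_t={k_t}, lambda'_c={lam_c:.4f}, share removed={removed:.2e}")
```

Normalisation cancels in both ratios, so the unnormalised weights suffice. The removed share depends strongly on the hypothetical k_min and k_max, so the numbers illustrate the mechanism rather than the Hungarian network.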

The proportions of maximally affected latent and infected people (daily and overall) can be seen in Table 5. According to the modelling, about 1,000,000; 190,000 and 3,500 people will be infected on the day when the epidemic reaches its peak in Scenarios 1, 2 and 3, respectively. Overall, about 5,400,000; 3,300,000 and 110,000 people will go through the infection with moderate or serious symptoms in Scenarios 1, 2 and 3, respectively. Note that in the case of Scenario 3, the maximum was not reached by the end of the simulation (365 days, which was the limit of the software).

Regarding exposed (latent) people, who will only have mild symptoms or no symptoms at all (according to our assumption), about 1,700,000; 280,000 and 4,800 people will be affected on the day of the epidemic peak in Scenarios 1, 2 and 3, respectively. Overall, about 8,700,000; 5,300,000 and 170,000 people will go through the infection with mild symptoms or as asymptomatic cases in Scenarios 1, 2 and 3, respectively.

(Table 6). It can be seen that the peak of the curve is shifted in the basic SEIR model (from about 50 days after the start date of the simulation to the ~90th day) (Fig. 4).

2 times resulted in a ~66% increase in the maximum number of daily infections (Fig. 5A), and decreasing its value to half of the original resulted in a peak 24 days earlier in the curve of the number of daily infections (Fig. 5B). However, the tornado plot also indicates the importance of other parameters, such as µ and β.

Results of network analysis
To address the inability of conventional compartment models to capture the non-homogeneous nature of the population, network-based analyses were also performed, which yielded the following results.
Based on the Copenhagen Networks Study (60) data, we found an average degree ⟨k⟩ = 46 and a second moment of the degree ⟨k²⟩ = 2847, and based on Equation (4)

λ = 0.067, which means that, to stop the epidemic, we need to change the network in such a way that λc is increased above this value.
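The reported Copenhagen Networks Study moments can be plugged into the threshold directly. The arithmetic below assumes that Equation (4) has the standard form λc = ⟨k⟩/⟨k²⟩; the neighbour-degree line additionally assumes an uncorrelated network, for which the mean degree of a neighbour is ⟨k²⟩/⟨k⟩.

```python
K1, K2 = 46, 2847       # <k> and <k^2> reported for the Copenhagen Networks Study data
SPREADING_RATE = 0.067  # lambda reported in the Results

lambda_c = K1 / K2                 # assumed standard form of Equation (4)
variance = K2 - K1 ** 2            # sigma^2 = <k^2> - <k>^2
std_dev = variance ** 0.5
neighbour_degree = K2 / K1         # friendship paradox, uncorrelated-network formula

print(f"lambda_c = {lambda_c:.4f} (lambda = {SPREADING_RATE})")
print(f"degree variance = {variance}, standard deviation = {std_dev:.1f}")
print(f"mean degree of a neighbour = {neighbour_degree:.1f} vs <k> = {K1}")
print("lambda > lambda_c: the epidemic can spread" if SPREADING_RATE > lambda_c
      else "lambda <= lambda_c: below threshold")
```

Under these assumptions λc ≈ 0.016, roughly a quarter of the spreading rate, so the network would have to be changed substantially before λc exceeds λ.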

Random sampling
In the case of random sampling and testing (and the subsequent removal of SARS-CoV-2 positive people from the contact network), based on Equation (5) we found the critical immunization to be gc = 0.761. This means that 76.1% of the population would have to be removed from the network, which implies very strict sampling and quarantine measures, with testing of a very large proportion of the population.
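The random-sampling figure can be checked with the same moments, assuming Equation (5) is g_c = 1 − λc/λ with λc = ⟨k⟩/⟨k²⟩. With λ = 0.067 as printed, this gives g_c ≈ 0.759, slightly below the reported 0.761; a spreading rate of about 0.0676 reproduces 0.761, so the small gap is plausibly due to λ being rounded in the text.

```python
K1, K2 = 46, 2847          # reported Copenhagen Networks Study degree moments
LAMBDA_C = K1 / K2         # assumed form of Equation (4)


def critical_immunization(spreading_rate, lambda_c=LAMBDA_C):
    """Random-removal threshold g_c = 1 - lambda_c / lambda (assumed form of Equation (5))."""
    return max(0.0, 1.0 - lambda_c / spreading_rate)


for lam in (0.067, 0.0676):
    print(f"lambda = {lam}: g_c = {critical_immunization(lam):.3f}")
```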

Selective sampling
Using the 'friendship paradox', the procedure proposed by Cohen et al. (67) consists of three steps:
1. Randomly choosing a fraction of nodes; this is layer 0.
2. Randomly selecting a link for each node in layer 0. The nodes to which these links connect form layer 1.
3. Immunizing (in our case, sampling and testing) the layer 1 individuals.
This sampling strategy does not require information on the structure of the network. According to Cohen et al. (67), gc is systematically below 0.3. This means that by selecting and removing a randomly chosen neighbour of 30% of the population, the spreading of the disease could be stopped. By selecting contacts referred to simultaneously by several people, this strategy could be further enhanced.
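Steps 1 and 2 of the procedure can be sketched on a synthetic network. The sketch grows a preferential-attachment (Barabási-Albert style) graph, which is only a hypothetical stand-in for a real contact network, selects layer 1 from a layer-0 fraction of 0.3, and compares the degrees of the layer-1 individuals with those of an equally large random sample. Network size, attachment parameter and random seed are arbitrary choices.

```python
import random


def barabasi_albert(n, m, rng):
    """Preferential-attachment graph as {node: set of neighbours}."""
    adj = {i: set() for i in range(n)}
    for u in range(m + 1):                  # start from a complete graph on m + 1 nodes
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
    link_ends = [u for u in range(m + 1) for _ in range(m)]  # node listed once per link end
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:             # pick m distinct nodes proportionally to degree
            targets.add(rng.choice(link_ends))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            link_ends.extend((new, t))
    return adj


def acquaintance_sample(adj, fraction, rng):
    """Layer 0: random nodes; layer 1: one random neighbour of each layer-0 node."""
    layer0 = rng.sample(sorted(adj), k=int(fraction * len(adj)))
    return {rng.choice(sorted(adj[u])) for u in layer0 if adj[u]}


def mean_degree(adj, nodes):
    nodes = list(nodes)
    return sum(len(adj[v]) for v in nodes) / len(nodes)


rng = random.Random(42)
graph = barabasi_albert(2_000, 2, rng)
layer1 = acquaintance_sample(graph, 0.3, rng)
random_nodes = rng.sample(sorted(graph), k=len(layer1))
print(f"layer 1: {len(layer1)} people, mean degree {mean_degree(graph, layer1):.1f}")
print(f"random sample: {len(random_nodes)} people, mean degree {mean_degree(graph, random_nodes):.1f}")
```

Because hubs are reached from many directions, layer 1 is enriched in high-degree nodes even though no degree information was used to select it.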

Targeted sampling
With targeted sampling we try to remove all nodes whose degree is larger than kt. Based on Equation (6), we could obtain target degrees (kt) with the new epidemic thresholds (λ'c), which are presented on

The main advantage of epidemiological modelling with the GLEAMviz software is that, besides the characteristic parameters of the infectious disease, other essential factors are taken into account when simulating disease spreading, namely population density and the mobility of the population. The irregular network structure that affects the local spread of infectious disease between neighbouring subpopulations is captured in the GLEAMviz datasets, as is the difference between high-traffic and low-traffic airports, which has a significant impact on disease spreading around the globe.
Simulations are evaluated for the Hungarian region, but with GLEAMviz it is possible to set the initial number of infected individuals in different cities on the day when the simulation starts; thereby, the impact of the global presence of infected people is also taken into account. Note that the registered number of infected people in different countries/cities carries bias, as the protocols for registering COVID-19 positive cases and the number of tests performed differ greatly from country to country. Nonetheless, these data and this option provide additional adjustable settings that contribute to more realistic simulations.

Evaluation of modified SEIR model - sensitivity analysis
All models have their limitations, and this applies particularly to the modelling of emerging diseases such as COVID-19. As limited data are available regarding SARS-CoV-2 and the spread of COVID-19, the parameters of the compartment model can only be estimated. Our modified SEIR model was built using data available in the scientific literature regarding the compartment model parameters, and its novelty is that the route of infection transmission from the Exposed compartment (latent individuals) is built in. Compared to the traditionally applied SEIR model, in which latent individuals cannot transmit the infection to susceptible people, it results in an earlier and higher peak in the epidemic curve of infected individuals, which means a faster epidemic course with more infected people in a shorter period of time, posing a greater burden on the healthcare system. However, the total number of infections is lower in the modified model than in the basic SEIR model. According to this result, the observed characteristics of COVID-19 epidemics, namely the so-called 'exponential growth' of disease spread, can be explained by disease transmission by latent people in the case of SARS-CoV-2. Sensitivity analysis of the modified SEIR model also reveals the importance of disease transmission by latent individuals: it has the greatest impact on the simulated maximum number of daily infections and on the timing of the epidemic peak.

Evaluation of network analysis results
When taking the social contact network topology into account during intervention planning, we have to be aware of the specific challenges and opportunities that these network characteristics pose. Most counterintuitive network phenomena are caused by the presence of hubs, i.e. nodes with a large number of contacts, which could also be identified as super-spreaders. One consequence is that diseases spread much faster on real networks than conventional compartment models would predict. This also means that R0 used as a benchmark value might be misleading: even if R0 < 1, there might still be nodes in the network whose degree is higher than the average degree, maintaining and spreading the disease.
The usual intervention strategies mentioned in this paper are still useful and effective, but they do not target the super-spreader hubs. When planning sampling strategies, random sampling would imply very strict sampling and quarantine measures, with testing of a very large proportion of the population (>75%). If the 'friendship paradox' (i.e. on average, the neighbours of a node have a higher degree than the node itself) were used to plan selective sampling, a significantly lower number of samples (~30%) would suffice. And if we could find the super-spreader hubs in a network during targeted sampling, sampling, testing and isolating <1% of the population would be enough.
Of course, natural contact networks are not ideal scale-free networks, and we do not know the exact super-spreaders either. Nevertheless, according to the recent analysis of the Hungarian Centre for Economic and Regional Studies Institute of Economics on the occupations most affected by the COVID-19 outbreak (69)

veterinarians, public transport workers, postmen and waste removal staff would fall into this high-risk group.
These people must be tested much more often, as they play a key role in SARS-CoV-2 transmission. Within this group, healthcare workers could be identified as key actors from a network perspective, implying an even stricter sampling and testing regime for them. If identified positive super-spreaders were isolated, this would have a large effect on the contact network itself, cutting off all the links of the large hubs and thus slowing down the spreading considerably.
Our approach is generally applicable to the prevention and early detection of epidemics caused by other microorganisms as well, thereby protecting human health and preventing economic crises caused by emerging and re-emerging diseases.

Proposal for data collection and data sharing
Our study also points to the well-known fact that models are only as good as their input data. When modelling and planning intervention strategies, fit-for-purpose and timely data are essential. Data from a specific population, such as college students, cannot be used for precise modelling of disease spreading in other societal groups. Unfortunately, there is a lack of social contact data with sufficient granularity, and advances in network science could not be fully exploited in real-life situations. This is not to say that collecting and sharing contact network data for public health purposes should override data protection and privacy aspects, but, as Oliver et al. (73) pointed out, there are available data sources, such as mobile phone data, which could be extremely useful for such purposes. These datasets, if made available in a careful and transparent manner that takes data protection issues into account, could also be used in other important public health domains, such as foodborne disease outbreak investigation.
In our paper we had two important objectives regarding the network analysis approach to epidemiological modelling:
- Showing the profound implications of network theory for epidemiological modelling and risk management, since the network epidemiological approach is not widely known in the public health community. Most analyses and intervention strategies take into account neither the inhomogeneity of the connections nor the network-based background of the spreading of the virus. Even when there is general knowledge of the role of super-spreaders, the quantification of this role is not well known.
- Network analysis (like all computational science methods) needs a large amount of good-quality data, and the spread of these methods could be supported by easy-to-use tools. We wanted to raise awareness of this issue as well.
These two objectives are closely connected, since the lack of fit-for-purpose data and of access to computational methods prevents the public health community from applying a network-based approach in the decision-making process. However, the public health community should work towards solving data- and tool-related issues, for example by using proxy data in the short term (e.g. mobile phone data, tracing applications, etc.) or through planned exercises in network data collection in the long run.

Declaration
Ethics approval and consent to participate
Not applicable.

Consent for publication
The authors consent to publication.

Availability of data and materials
All data analysed and generated during this study are included and referenced in this published article. The KNIME workflow is available at https://univet.

contributed to manuscript revision, read and approved the version to be submitted

Epidemic thresholds (λ'c) for different degree exponents (γ) and target degrees (kt), calculated to increase the epidemic threshold above the spreading rate of COVID-19 with targeted sampling. The spreading rate (λ) of the virus is set to 0.067 (red line).