Temporal Contact Graph Reveals The Evolving Epidemic Situation of COVID-19

Digital contact tracing has recently been advocated by China and many other countries as part of their digital prevention measures against COVID-19. Controversies have been raised about its effectiveness in practice, as it remains open how it can be fully utilized to control COVID-19. In this article, we show that an abundance of information can be extracted from digital contact tracing for COVID-19 prevention and control. Specifically, we construct a temporal contact graph that quantifies the daily contacts between infectious and susceptible individuals by exploiting a large volume of location-related data contributed by 10,527,737 smartphone users in Wuhan, China. The temporal contact graph reveals five time-varying indicators that can accurately capture actual contact trends at the population level, demonstrating that travel restrictions (e.g., city lockdown) in Wuhan played an important role in containing COVID-19. We reveal a strong correlation between the contact level and the epidemic size, and estimate several significant epidemiological parameters (e.g., serial interval). We also show that user participation rate exerts a higher influence on situation evaluation than user upload rate does. At the individual level, however, the temporal contact graph plays a limited role, since the behavioral distinction between the infected and uninfected contacted individuals is not substantial. These results shed light on the effectiveness of digital contact tracing against COVID-19, providing guidelines for governments to implement interventions using information technology.

tems of many countries with large case counts and threatening to infect an extremely large population, but it is still too early to tell its disappearance [2]. Currently, many countries (e.g., the U.S.A., the U.K., Australia, etc.) have been cooperating to prevent and control such an unprecedented disease in a variety of ways [3-5].

As is known, contact tracing is one of the most effective ways to find high-risk individuals who may be infected, but it is costly in effort, time, and money. Recently, digital contact tracing using information technology has been widely advocated to replace traditional labor-intensive contact surveys [5-7]. The main idea is to exploit Bluetooth/positioning sensors on smartphones to discover nearby devices held by users and identify contacts with infectious individuals [8, 9]. On the one hand, about 28 countries, such as China, Switzerland, Spain, the United Kingdom, Australia, Singapore and Germany, have implemented various measures using information technology (e.g., launching digital contact tracing apps) [10-14]. On the other hand, however, recent works have revealed that digital contact tracing contributes little to containing outbreaks, principally because of low participation rates and low engagement of participants [15, 16]. As many controversial issues of digital contact tracing have been raised, it is urgent to review empirical evidence for the effectiveness of this measure against a spreading pandemic from different aspects [17-19].

In this article, we take a pioneering and in-depth investigation into this issue, and show that an abundance of information can be extracted from digital contact tracing for COVID-19 prevention and control. We construct a temporal contact graph (Fig. 1a and 1d) that quantifies the daily contacts between infectious and susceptible individuals by exploiting a large volume of location-related data contributed by 10,527,737 smartphone users in Wuhan, China. We demonstrate that such a temporal contact graph has many applications, e.g., to estimate the individual-level contact trend, analyze the dynamic contact behavior (Fig. 1b), identify the potential infected contacted individuals (Fig. 1c), estimate the possible number of confirmed cases in the near future (e.g., cases in the next week) (Fig. 1e), and assist the decision-making of control measures. This is different from previous studies, which focused on integrating mathematical models and available statistical data of confirmed cases to characterize the transmission of epidemic diseases [20-36], or which utilized individual mobility traces (with the information of confirmed cases) to simulate the spreading process [7, 37]. It opens up a new perspective for understanding the spread of COVID-19 from the aspect of digital contact tracing.

Since contact tracing measures are essentially based on crowdsourcing [38, 39], their performance relies heavily on the involvement of voluntary smartphone users. Due to potential privacy leakage and the cost incurred during the crowdsourcing process, voluntary users are reluctant to participate and contribute their personal data at a fine-grained scale [38, 39]. It is, therefore, challenging to fully utilize sparse and noisy crowdsourced contact data from voluntary users to capture the intrinsic transmission characteristics of COVID-19.
In this article, we introduce five time-varying indicators that are validated to accurately capture actual contact trends at the individual and population levels in Wuhan, providing data-driven evidence that the travel restrictions in Wuhan significantly reduced the chance of susceptible individuals having contacts with the infectious and thus played an important role in containing COVID-19. We reveal a strong correlation (Pearson coefficient 0.77) between the number of daily symptomatic cases and daily total contacts with a 12-day delay, and estimate several significant epidemiological parameters such as the serial interval. We study the effect of user involvement on the effectiveness of digital contact tracing measures, finding that user participation rate exerts a higher influence on situation evaluation than user upload rate does. We also show that the distinction in contact behaviors between the infected and uninfected contacted individuals is not prominent: it is more substantial than the distinction by sex but less substantial than the distinction by age. With an infection risk evaluation framework that we design, the area under the receiver operating characteristic curve reaches 0.57, indicating that contact behavior plays only a limited role in identifying high-risk contacted individuals. It is therefore not highly effective to narrow down the search for high-risk contacted individuals for quarantine by the distinction of contact behaviors. The empirical results can offer a promising way to evaluate and predict the evolving epidemic situation of COVID-19, and provide guidelines for governments to implement digital contact tracing measures.

Figure 1: Temporal contact graph and schematics of its potential applications. An individual has one of four statuses: susceptible, contacted, infectious and confirmed. The status 'susceptible' turns to 'contacted' when an individual has had at least one contact with infectious individuals. A contacted individual may be infected or stay healthy. The status 'infectious' changes to 'confirmed' when confirmation is made. A confirmed case is quarantined for treatment in China and is no longer infectious to others. a. Daily contact graph. b. The analysis of contact behaviors shows the distributions of contact counts for infected and uninfected contacted individuals. c. The personal risk evaluation based on contact behaviors. d. Contact history and status of individuals. A node denotes an individual and different colors indicate different statuses. A dashed line denotes the status evolution of a single individual over time, and a solid curve between two individuals denotes a contact. e. The correlation between normalized daily total contacts and daily confirmed cases.
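The delayed correlation between daily total contacts and daily symptomatic cases can be illustrated with a minimal sketch. It assumes two equal-length daily series and a plain Pearson coefficient computed after shifting the case series by a fixed number of days. The function names and the toy numbers are hypothetical and are not taken from the study's data or code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_pearson(contacts, cases, lag_days):
    """Correlate contacts on day t with cases on day t + lag_days."""
    if lag_days < 0 or lag_days > len(contacts) - 2:
        raise ValueError("lag_days out of range")
    # Drop the last lag_days of contacts and the first lag_days of cases
    # so that the two slices are aligned with the requested delay.
    return pearson(contacts[: len(contacts) - lag_days], cases[lag_days:])


# Toy series (invented for illustration): cases echo contacts two days later.
toy_contacts = [5, 9, 14, 20, 12, 7, 4, 2, 1, 1]
toy_cases = [0, 0] + [2 * c for c in toy_contacts[:-2]]

# Scan candidate delays and keep the one with the highest correlation.
best_lag = max(range(5), key=lambda lag: lagged_pearson(toy_contacts, toy_cases, lag))
```

Scanning a range of delays and keeping the one with the highest coefficient is one simple way to look for a delay such as the 12 days reported here. Whether the study selected its delay this way is not stated in the text.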

Characteristics of informative indicators. We leverage a large location-related data set contributed by 10,527,737 smartphone users in Wuhan, China. Each item in the data set includes a geohash-encoded meshed area, a timestamp and an anonymized identity. We build a contact model in which a contact between two individuals is said to occur when they are reported within a certain spatial area and temporal interval (see the Method section for more details).
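The contact rule can be sketched in a few lines. This is a minimal illustration, not the study's implementation. It assumes each data item is a `(geohash, timestamp_seconds, user_id)` tuple and treats every non-infectious user as susceptible. The names `find_contacts`, `count_infectious_susceptible` and `max_gap_seconds` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def find_contacts(records, max_gap_seconds):
    """Return unordered user pairs that reported the same geohash
    within max_gap_seconds of each other."""
    by_cell = defaultdict(list)
    for geohash, timestamp, user in records:
        by_cell[geohash].append((timestamp, user))

    contacts = set()
    for reports in by_cell.values():
        reports.sort()  # chronological order inside one mesh cell
        # Quadratic in the number of reports per cell; fine for a sketch.
        for (t1, u1), (t2, u2) in combinations(reports, 2):
            if u1 != u2 and t2 - t1 <= max_gap_seconds:
                contacts.add((min(u1, u2), max(u1, u2)))
    return contacts


def count_infectious_susceptible(contacts, infectious):
    """Count contacts that link exactly one infectious user to a
    non-infectious one (simplification: non-infectious = susceptible)."""
    return sum((a in infectious) != (b in infectious) for a, b in contacts)


# Toy records (invented): geohash cell, timestamp in seconds, anonymized id.
toy_records = [
    ("wt3m", 0, "u1"),
    ("wt3m", 600, "u2"),
    ("wt3m", 7200, "u3"),
    ("wt3q", 100, "u1"),
    ("wt3q", 200, "u4"),
]
toy_contacts = find_contacts(toy_records, max_gap_seconds=900)
toy_total = count_infectious_susceptible(toy_contacts, infectious={"u1"})
```

Grouping by geohash first keeps the pairwise comparison local to one mesh cell, so users in different cells are never compared.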

By collaborating with the local authority, we obtain the information on whether and when each anony-

The five indicators at the beginning of 2020 are shown along with a series of implemented measures (Fig. 2a). The daily total contacts between infectious and susceptible individuals, C(t), can reflect the potential transmission. We find that C(t) first increased dramatically from 4 to 20 January 2020, due to the fast-increasing number of infectious individuals, and then dropped after 20 January. As we know, the Chinese authority announced the outbreak of COVID-19 and confirmed its infection among people on 20 January, which explains the decline of C(t). C(t) then decreased sharply around 23 January, when the lockdown was implemented in Wuhan, and tended to zero around 28 February.

From a macroscopic view, N(t) and S(t) describe population-level contact trends in Wuhan.

Notice that N(t) had a minor rebound after 26 January 2020, possibly because the number of confirmed cases quickly increased after 23 January while people in Wuhan could still move within the city (their mobility increased as the Chinese New Year approached).

From the perspective of the infectious, the distribution of daily contacts also follows a power-law distribution (Fig. 2b), and has a prominent long tail, especially when the exponent is small before 23 January. The long tails indicate that there were some super-active cases who had contact with hundreds of susceptible individuals. Identifying and quarantining them helps mitigate the fast transmission. From 15 to 26 January, specifically, the power exponent increased first and then decreased, having the same evolving pattern as K(t). Therefore, C(t), N(t), S(t),

The curves of daily number of contacts C(t) (in blue) and daily symptomatic cases (in red)

Furthermore, we also explore several significant epidemiological parameters including the contacting period, incubation period, and serial interval (Fig. 3e). Specifically, the contacting period is the interval from the first possible contact to the last possible contact, which is estimated to be 2.3 days (95% CI, 0.4 to 6.7 days) (Fig. 3f). The incubation period is the interval from the last possible contact to symptom onset, which is estimated to be 7.3 days (95% CI, 1.2 to 14.1 days) (Fig. 3g). The serial interval is the interval from the symptom onset of A to the symptom onset of B, who is infected by A; it is a proxy for the generation period from the infection of A to the infection of B. Notice that the serial interval can be negative because of asymptomatic transmissions, and it is estimated to be 2.5 days (95% CI, -9.2 to 13.9 days) (Fig. 3h).
These estimates are in accordance with most existing surveys [42-49], demonstrating the effectiveness of revealing the epidemic situation at the population level by digital contact tracing.

of contact tracing (e.g., estimating K(t) and L(t) and daily confirmed cases) is affected by user involvement, raising the question of whether contact tracing measures can really work in practice. We study this issue by taking into account two types of user involvement: user participation rate (the proportion of users in the whole population) and data upload rate (their data reporting frequency per day). To simulate user involvement, we randomly choose α% of users as the voluntary users (participation rate) or α% of the data items that each participating user uploads per day (upload rate), and evaluate the corresponding performance loss.

We conduct extensive explorations by varying the value of α, and repeat the Monte Carlo experiment ten times at each involvement level to make our results more credible. At a specific α, we plot the time series with error bars of K(t), L(t) and total contacts C(t) for both scenarios (user participation rate and user upload rate), ranging from 1 January 2020 to 28 February 2020.
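The two involvement scenarios can be sketched as two subsampling routines plus a small Monte Carlo loop. This is a minimal illustration under stated assumptions: records are `(geohash, timestamp_seconds, user_id)` tuples, α is given as a fraction in [0, 1], and a day is a block of 86,400 seconds. All function names are hypothetical, and `metric` stands for whichever indicator is being evaluated.

```python
import random
from collections import defaultdict
from math import sqrt


def sample_participation(records, alpha, rng):
    """Keep every record of a random alpha fraction of users."""
    users = sorted({user for _, _, user in records})
    kept_users = set(rng.sample(users, round(alpha * len(users))))
    return [rec for rec in records if rec[2] in kept_users]


def sample_upload(records, alpha, rng):
    """Keep a random alpha fraction of each user's records per day."""
    per_user_day = defaultdict(list)
    for rec in records:
        _, timestamp, user = rec
        per_user_day[(user, timestamp // 86400)].append(rec)
    kept = []
    for items in per_user_day.values():
        kept.extend(rng.sample(items, round(alpha * len(items))))
    return kept


def monte_carlo(records, alpha, sampler, metric, runs=10, seed=0):
    """Repeat the subsampling and return (mean, sample std) of the metric.
    Requires runs >= 2 because of the sample standard deviation."""
    rng = random.Random(seed)
    values = [metric(sampler(records, alpha, rng)) for _ in range(runs)]
    mean = sum(values) / runs
    variance = sum((v - mean) ** 2 for v in values) / (runs - 1)
    return mean, sqrt(variance)


# Toy data (invented): 10 users, 4 reports each, all on the same day.
toy_records = [("wt3m", 3600 * h, f"u{i}") for i in range(10) for h in range(4)]
mean_items, std_items = monte_carlo(toy_records, 0.5, sample_upload, metric=len)
```

The standard deviation across runs is what would be drawn as an error bar at each α. Passing a function that computes daily total contacts as `metric` would give the corresponding curve for C(t).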

It is shown that, as α decreases, the corresponding time series decrease with a similar trend (Fig. 4a-4f). This is expected, as a reduction in either user participation rate or user upload rate decreases the chance of having contacts among users. To see whether the reduction influences the capture of the evolving trends, we calculate the Pearson correlations between the time series under α% and under full (100%) participation rate/data upload rate (Fig. 4g and 4h).

We make the following observations. 1) Decreasing the user upload rate or participation rate results in lower values of K(t), L(t) and C(t). 2) User participation rate and data upload rate have minor effects on the evaluation of the evolving pattern of C(t), whose error bars are not as pronounced as those of the other two variables. This indicates that C(t) is more robust than K(t) and L(t) when user involvement changes. 3) K(t) is more sensitive to the change of user involvement α than L(t). This is because the number of susceptible individuals is much larger than that of the infectious. 4) User participation rate exerts a higher influence on the three indicators than user upload rate does, according to Fig. 4g and Fig. 4h. Therefore, we should encourage more user participation to obtain better performance in practice. Considering users' privacy and cost concerns, it would be a good strategy to allow voluntary smartphone users to have a relatively low data upload rate. 5) In the participation rate analysis, when the participation rate is reduced to 10%, the correlation coefficient drops significantly according to Fig. 4g. This can be attributed to the overall power-law distribution of the network, which has an obvious long-tail effect: only when the participation rate is low enough are some key nodes deleted, thereby affecting the trend of the entire network.
We note that the performance of individual-level infection risk evaluation will be impacted when user participation rate or upload rate drops, since we may then miss many contacts with infectious cases and make an incorrect evaluation.

(Fig. 5a).

The infected contacted individuals follow a power-law distribution with ⟨k⟩ = 3.95 and γ = 1.33, and the uninfected contacted individuals follow a power-law distribution with ⟨k⟩ = 2.89 and γ = 1.79 (Fig. 5b). We count the number of days when contacted individuals had contacts with

are not prominent, which is more substantial than the sex character but less substantial than the age characteristics. To perform a sensitivity analysis for the temporal and spatial granularities, we vary

Health Emergencies Regulations of China. All raw data were stored in specialized data servers with limited access by LBS providers. This article only utilizes the temporal contact graph that is derived from the raw data.

We propose a contact model based on the crowdsourced dataset: a contact between two smartphone users is said to occur when they report the identical geohash within a given time interval. As mentioned above, the geohash can be projected into a mesh area of a certain size (e.g., 15 × 29 m²). This means that a contact is recorded when the distance between two smartphone users is within 18 meters on average. Such a definition is similar to that adopted by most contact tracing apps, which exploit Bluetooth or GPS to decide a contact when two users are within a short distance. As smartphone users report data at a very low and irregular frequency, the contributed data are typically sparse. We would miss many contacts if we only counted those where two smartphone users report identical information simultaneously. Considering the data sparsity, we define a contact as occurring when two users upload the same geohash within a time interval T.
We vary T from fifteen minutes to two hours and perform the sensitivity analyses (see Supplementary Fig. 7

we employ the Bayesian formula

$$P(z_j = 1 \mid B_j, F_j) = \frac{P(B_j, F_j \mid z_j = 1) \cdot P(z_j = 1)}{P(B_j, F_j)}. \tag{2}$$

The term $P(B_j, F_j \mid z_j = 1)$ is called the likelihood, indicating the distributions of behaviors and

Since we have found that the probabilities for various contacts follow power-law distributions, i.e.,

where coefficient $c^{(u)}$ is the normalizing constant, satisfying

$$c^{(u)} = \frac{1}{\int_{k=1}^{\infty} k^{-\gamma^{(u)}} \, dk} = \gamma^{(u)} - 1, \quad \gamma^{(u)} > 1.$$
We next compute the values of c and γ by maximum likelihood estimation [53]. Supposing we have N infected samples $b_1, b_2, \cdots, b_N$, we obtain the log-likelihood function

$$l(\gamma) = \ln P(b_1, b_2, \cdots, b_N \mid \gamma) = \ln \prod_{j=1}^{N} (\gamma - 1) \cdot b_j^{-\gamma} = (-\gamma) \cdot \sum_{j=1}^{N} \ln b_j + N \cdot \ln(\gamma - 1). \tag{6}$$

Then, setting $\frac{\partial l(\gamma)}{\partial \gamma} = 0$, we can obtain

As P(F

Specifically, supposing we have M infected samples $f_1, f_2, \cdots, f_M$, the multinomial distribution Q(k) is estimated by

Notice that there is a difference between the behaviors of the infected contacted individuals and the uninfected contacted individuals. We thus denote the estimations from the infected samples by γ

where ρ can be obtained from the proportion of the infectious among the population.
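The estimation and the Bayesian update can be illustrated with a simplified sketch. Setting the derivative of the log-likelihood in Eq. (6) to zero gives $-\sum_j \ln b_j + N/(\gamma - 1) = 0$, i.e. $\hat{\gamma} = 1 + N / \sum_j \ln b_j$. The code below uses this closed form and then applies Bayes' rule with a single contact-count feature b. This is an assumption made for brevity: the framework described here combines several behaviors (B_j, F_j), including a multinomial term. The function names are hypothetical, and the γ values in the example are the ones reported for Fig. 5b, used only as illustrative inputs.

```python
from math import log


def fit_power_law_exponent(samples):
    """Closed-form MLE of gamma for p(b) = (gamma - 1) * b**(-gamma), b >= 1."""
    log_sum = sum(log(b) for b in samples)
    if log_sum <= 0:
        raise ValueError("samples must be >= 1 and not all equal to 1")
    return 1.0 + len(samples) / log_sum


def power_law_pdf(b, gamma):
    """Continuous power-law density on [1, infinity); requires gamma > 1."""
    return (gamma - 1.0) * b ** (-gamma)


def infection_posterior(b, gamma_infected, gamma_uninfected, rho):
    """P(infected | b) by Bayes' rule with prior rho.
    Simplification: one behavior feature b instead of (B_j, F_j)."""
    like_infected = power_law_pdf(b, gamma_infected)
    like_uninfected = power_law_pdf(b, gamma_uninfected)
    evidence = rho * like_infected + (1.0 - rho) * like_uninfected
    return rho * like_infected / evidence


# Illustrative inputs: exponents from Fig. 5b, an invented prior rho.
risk_many_contacts = infection_posterior(10, 1.33, 1.79, rho=0.05)
risk_one_contact = infection_posterior(1, 1.33, 1.79, rho=0.05)
```

Because the infected group has the smaller exponent and therefore the heavier tail, a large contact count raises the posterior above the prior and a single contact lowers it.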

Data availability The temporal contact graph and other key statistical information used in all the analyses will be made available upon publication. The daily symptomatic cases are taken from Ref. [40].

Competing interests The authors declare no competing interests.