Estimation of the serial interval of COVID-19 in Ireland using contact tracing data



Abstract
Background
The serial interval is the period of time between the onset of symptoms in an infector and an infectee and is an important parameter which can impact on the estimation of the reproduction number. Whilst several parameters influencing infection transmission are expected to be consistent across populations, the serial interval can vary across and within populations over time. Therefore, local estimates are preferable for use in epidemiological models developed at a regional level. We used data collected as part of the national contact tracing process in Ireland to estimate the serial interval of SARS-CoV-2 infection in the Irish population, and to estimate the proportion of transmission events that occurred prior to the onset of symptoms.

Results
After data cleaning, the final dataset consisted of 471 infected close contacts from 471 primary cases.
The mean serial interval was 4.0 days (95% confidence interval 3.75, 4.31), whilst the 25th, 50th and 75th percentiles were 2, 4 and 6 days, respectively. We found that intervals were shorter when the primary or secondary case was in the older age cohort (older than 64 years). Simulating from an incubation period distribution from the international literature, we estimated that 67% of transmission events had a greater than 50% probability of occurring prior to the onset of symptoms in the infector.

Conclusions
Whilst our analysis was based on a large sample size, data were collected for the primary purpose of interrupting transmission chains. Similar to other studies estimating the serial interval, our analysis is restricted to transmission pairs where the infector is known with some degree of certainty. Such pairs may represent more intense contacts with infected individuals than might occur in the overall population.
It is therefore possible that our analysis is biased towards shorter serial intervals than those in the overall population.

Background
The basic reproduction number, R0, is an indicator of the transmissibility of an infectious agent, defined as the expected number of new infections generated by a single infected individual over the course of its infectious period in an otherwise susceptible population [1]. With the implementation of tiered control strategies, the number of secondary cases from an infected individual is expected to vary over time. The estimation of a time-varying reproduction number (Rt) allows the efficiency of infection transmission to be tracked over time [2], providing some measure of the efficacy of the control measures that have been introduced [3], as well as facilitating short- and longer-term predictions of case numbers.
One of the key parameters involved in estimating Rt from case data is the generation time (also called the generation interval), defined as the duration of time between two successive linked infection events; in other words, the time between the point of infection of the infector and the point of infection of the infectee [4]. Since the precise time of infection is generally unobserved and therefore difficult to ascertain, the serial interval is often used as an approximation of the generation time [5]. In contrast to the generation time, the serial interval is the duration of time between the onset of symptoms in the infector and the onset of symptoms in the infectee.
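The relationship between the two intervals follows directly from these definitions. Writing, for illustration only, $t^{\mathrm{inf}}$ for the time of infection and $\mathrm{IP}$ for the incubation period of each individual (this notation is introduced here and is not used elsewhere in the paper), the serial interval (SI) and generation time (GT) satisfy:

```latex
\begin{aligned}
\mathrm{SI} &= \left(t^{\mathrm{inf}}_{\mathrm{infectee}} + \mathrm{IP}_{\mathrm{infectee}}\right)
             - \left(t^{\mathrm{inf}}_{\mathrm{infector}} + \mathrm{IP}_{\mathrm{infector}}\right) \\
            &= \mathrm{GT} + \mathrm{IP}_{\mathrm{infectee}} - \mathrm{IP}_{\mathrm{infector}},
  \qquad \mathrm{GT} = t^{\mathrm{inf}}_{\mathrm{infectee}} - t^{\mathrm{inf}}_{\mathrm{infector}}, \\
\mathrm{SI} - \mathrm{IP}_{\mathrm{infectee}} &= \mathrm{GT} - \mathrm{IP}_{\mathrm{infector}} .
\end{aligned}
```

If the two incubation periods are independent and identically distributed (and independent of GT), the serial interval has the same mean as the generation time but a larger variance, and it is negative whenever the infector's incubation period exceeds GT plus the infectee's incubation period. The last line gives the time of transmission relative to the infector's symptom onset; a negative value corresponds to pre-symptomatic transmission.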
The serial interval is determined by a number of important factors: the contact patterns between infectious and susceptible individuals; the latent period and the duration of infectiousness; and the incubation period. The incubation period of COVID-19 is likely to be similar across populations and within populations over time [6]. Similarly, there does not appear to be any evidence to suggest that latent periods and duration of infectiousness are likely to vary across different countries [7]. In contrast, contact patterns are likely to be relatively specific to a particular population, therefore local estimates (for example at a national level) of the serial interval are preferable in the estimation of Rt [5,8]. In addition, within a given population, due to factors influencing human behaviour including public health advice, regulations restricting work and social events, and movement restrictions, the contact patterns of individuals are likely to change over time [9], resulting in temporal changes in the serial interval that can impact on the accuracy of the Rt estimate [8].
Similar to other countries, a national contact tracing service was established in Ireland to reduce transmission of infection. The primary goals were to instruct cases to self-isolate, to identify and provide advice to their close contacts to interrupt onward transmission, and to enable targeted testing of these close contacts. In addition, information was gathered on the likely source of infection following discussion with the infected individual. This included reported interactions with confirmed cases and clusters of cases who reported attending the same events and venues. Whilst these data were collected for a specific purpose, that is, reducing onward transmission, secondary analysis of these data could be used to inform national controls.
The aims of the current study were: 1. To use contact tracing data to estimate the serial interval of SARS-CoV-2 in Ireland and, 2. To use the resulting serial interval distribution to estimate, using simulation, the proportion of transmission events that occurred prior to the onset of symptoms in Ireland.

Materials And Methods
Description of the data
Data captured from the contact tracing process are described in detail elsewhere [9]. Briefly, details on cases and their contacts were collected during two phone calls and held in two separate datasets (Dataset 1 and Dataset 2). The first call informed cases of their positive test status (or confirmed the test result if the individual had already been informed by their GP), collected the date of symptom onset (if any), and categorised the likely source of infection for that individual (close contact with a confirmed case, healthcare setting, or community transmission if the source was unknown). The second call collected details of the contacts of each confirmed case.
These data, based on data entries from contact tracing call centres, were collected by the Health Service Executive (HSE) under the Medical Officer of Health legislation, collected by the Central Statistics Office (CSO) in compliance with the Statistics Act 1993, pseudonymised, and stored in a centralised database (the CSO C19 Data Research Hub). The CSO C19 Data Research Hub is a secure data repository from which personally identifiable data cannot be exported.
These data were accessed through the CSO data hub by the first author for the purpose of this analysis. Access was granted under Section 20(b) of the Statistics Act, 1993, for the purpose of using data collected during the pandemic to aid in the national response. The study was approved by the National Research Ethics Committee (20-NREC-COV-099). The requirement for informed consent was waived by review board Health Research Declaration Committee (20-025-AF1/COV). All methods were carried out in accordance with relevant guidelines and regulations.

Data linking/management
Contacts were linked to the primary case by joining Datasets 1 and 2 on the basis of the reference ID of the primary case. Next, contacts who subsequently became cases were identified by searching for the reference ID of the contact within the case database. To remove 'mirror image' transmission events (that is, situations where a single transmission event is identified twice in the dataset, for example, one entry in which person A is the primary case and person B is the secondary case, and a second entry in which person B is the primary case and person A is the secondary case), we assigned a 'pair id' to each transmission event and retained only unique transmission events within the dataset.
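The linking and de-duplication step can be sketched as follows. This is a minimal Python illustration, not the authors' R code: the function name `link_transmission_pairs` and the tuple layout of the input rows are hypothetical, and we assume the 'pair id' behaves like an unordered pair, so that A to B and its mirror image B to A receive the same id and only the first one seen is kept.

```python
def link_transmission_pairs(contact_rows, case_ids):
    """Return unique (primary_ref, contact_ref) transmission pairs.

    contact_rows: (primary_ref, contact_ref) tuples from the contacts dataset,
                  already joined to the primary case on its reference ID.
    case_ids:     set of reference IDs found in the case database.
    """
    seen_pair_ids = set()
    pairs = []
    for primary_ref, contact_ref in contact_rows:
        # Keep only contacts who subsequently became confirmed cases.
        if contact_ref not in case_ids:
            continue
        # An unordered pair id makes A->B and its mirror image B->A collide.
        pair_id = frozenset((primary_ref, contact_ref))
        if pair_id in seen_pair_ids:
            continue
        seen_pair_ids.add(pair_id)
        pairs.append((primary_ref, contact_ref))
    return pairs


rows = [("A", "B"), ("B", "A"), ("A", "C"), ("A", "D")]
print(link_transmission_pairs(rows, case_ids={"A", "B", "C"}))
# [('A', 'B'), ('A', 'C')]
```

Here ("B", "A") is dropped as the mirror image of ("A", "B"), and ("A", "D") is dropped because D never appears in the case database.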

Study design
In order to restrict the analysis to definite infector-infectee pairs, data filtering processes were undertaken. Figure S1 and Figure S2 (Supplementary Material) outline potential errors that may arise, leading to incorrect specification of the infector for each infectee. To avoid the misidentification of an intermediate case among close co-contacts (Figure S1, Supplementary Material; secondary or tertiary case), we restricted our analysis to primary cases who infected only a restricted number of individuals. In the first instance, we restricted our analysis to transmission pairs where the primary case infected only one other individual, and we conducted a sensitivity analysis including primary cases who infected three or fewer individuals. In the sensitivity analysis, where a primary case contributed more than one serial interval, we randomly sampled one of these for analysis in order to account for the non-independence between multiple serial intervals from a single case.
To avoid the possibility that both the observed primary and secondary cases acquired the infection from a single unidentified infector (Figure S2, Supplementary Material; secondary case or common source), we restricted the analysis to cases where the most likely source of infection reported for the primary case was community transmission (that is, no source identified), whilst the source of infection reported for the secondary case was contact with a confirmed case. To reduce the potential for recall bias, we only used transmission pairs where contact tracing took place within 1 week of the onset of symptoms for both the primary case and secondary case. Finally, data were right-censored to a point 30 days prior to the end of record collection to avoid bias from omitting longer serial intervals for which the date of symptom onset for any secondary cases had not yet been collected.
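The study-design restrictions above can be expressed as a single filtering pass. The following Python sketch is illustrative only: the record keys (`primary_source`, `primary_traced`, and so on), the category labels `"community"` and `"close_contact"`, and both function names are hypothetical. We also assume the right-censoring rule means dropping pairs whose primary case had symptom onset within the final 30 days of record collection; the paper does not spell out the exact implementation.

```python
import random
from collections import Counter, defaultdict
from datetime import date, timedelta


def apply_design_filters(pairs, end_of_records, max_secondaries=1,
                         max_recall_days=7, censor_days=30):
    """Keep pairs meeting the study-design restrictions (hypothetical keys).

    Each pair is a dict with primary_id, primary_source, secondary_source and
    datetime.date values primary_onset, secondary_onset, primary_traced,
    secondary_traced.
    """
    # Number of linked secondary cases per primary case.
    n_secondary = Counter(p["primary_id"] for p in pairs)
    cutoff = end_of_records - timedelta(days=censor_days)
    kept = []
    for p in pairs:
        # Few secondaries per primary limits missed intermediate cases.
        if n_secondary[p["primary_id"]] > max_secondaries:
            continue
        # Primary infected in the community; secondary via a confirmed case.
        if p["primary_source"] != "community" or p["secondary_source"] != "close_contact":
            continue
        # Tracing within a week of onset for both cases limits recall bias.
        if (p["primary_traced"] - p["primary_onset"]).days > max_recall_days:
            continue
        if (p["secondary_traced"] - p["secondary_onset"]).days > max_recall_days:
            continue
        # Assumed censoring rule: drop primaries with onset near the data end.
        if p["primary_onset"] > cutoff:
            continue
        kept.append(p)
    return kept


def one_pair_per_primary(pairs, rng):
    """Sensitivity analysis: randomly keep one serial interval per primary case."""
    by_primary = defaultdict(list)
    for p in pairs:
        by_primary[p["primary_id"]].append(p)
    return [rng.choice(group) for group in by_primary.values()]


# Illustrative usage with a made-up record (not study data).
example = {"primary_id": "P1", "primary_source": "community",
           "secondary_source": "close_contact",
           "primary_onset": date(2020, 6, 1), "secondary_onset": date(2020, 6, 5),
           "primary_traced": date(2020, 6, 3), "secondary_traced": date(2020, 6, 7)}
relaxed = apply_design_filters([example], end_of_records=date(2020, 12, 13),
                               max_secondaries=3)
sampled = one_pair_per_primary(relaxed, random.Random(1))
```

The main analysis corresponds to the default `max_secondaries=1`; the sensitivity analysis uses `max_secondaries=3` followed by `one_pair_per_primary` with a seeded random generator so that the sampled pairs are reproducible.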

Data analysis
The difference in time between the onset of symptoms in linked cases was calculated. Intervals greater than 28 days were removed from the dataset. This duration was chosen to correspond to the maximum possible duration of the serial interval, given a maximum post-symptom-onset infectious period of 13.4 days [7] and the 97.5th percentile of the incubation period [6]. Similarly, serial intervals less than -10 days, the minimum serial interval reported by Du et al. [10], were also removed from the dataset. The range, median, mean and interquartile range were summarised for the overall dataset, by age cohort and region (Dublin, Rest of Country) of the primary case, and by age cohort of the infected contact.
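The interval calculation, the exclusion window and the summary statistics can be sketched in a few lines of Python. The function names are hypothetical, and we assume onset dates are held as `datetime.date` values; `statistics.quantiles` with `method="inclusive"` uses linear interpolation between order statistics, the same rule as R's default (type 7) `quantile`.

```python
import statistics
from datetime import date


def serial_intervals(onset_pairs, lower=-10, upper=28):
    """Onset-to-onset intervals in days, keeping only lower <= SI <= upper."""
    intervals = []
    for primary_onset, secondary_onset in onset_pairs:
        # Negative when the infectee developed symptoms before the infector.
        si = (secondary_onset - primary_onset).days
        if lower <= si <= upper:
            intervals.append(si)
    return intervals


def summarise(intervals):
    """Range, mean, median and interquartile range of a list of serial intervals."""
    q1, _, q3 = statistics.quantiles(intervals, n=4, method="inclusive")
    return {
        "n": len(intervals),
        "min": min(intervals),
        "max": max(intervals),
        "mean": statistics.mean(intervals),
        "median": statistics.median(intervals),
        "q1": q1,
        "q3": q3,
    }


onsets = [(date(2020, 6, 1), date(2020, 6, 5)),    # SI = 4, kept
          (date(2020, 6, 1), date(2020, 7, 1)),    # SI = 30, removed (> 28)
          (date(2020, 6, 3), date(2020, 6, 1))]    # SI = -2, kept
print(summarise(serial_intervals(onsets)))
```

The same `summarise` function can be applied to subsets of pairs grouped by age cohort or region to produce the breakdowns described above.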
A range of statistical distributions were fitted to the data: Weibull, gamma, normal and lognormal. Because serial intervals can be negative, a constant (k = 10 days) was added to each serial interval before fitting the distributions. The value of k was chosen to be sufficiently large that its addition to each serial interval would yield a positive integer. Previous work has shown that a considerable proportion of serial intervals are negative, with a range of -10 to 20 days [10]. The best-fitting distribution, as determined by the AIC, was used as the final estimate of the serial interval distribution.
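The shift-then-fit step with AIC comparison can be illustrated for the two candidates whose maximum-likelihood estimates have closed forms. This Python sketch is not the fitdistrplus procedure used in the paper: the function names are hypothetical, and the Weibull and gamma fits are omitted because they require numerical optimisation of the likelihood. Each model has two parameters, so AIC = 4 - 2 log L.

```python
import math


def fit_normal(x):
    """Maximum-likelihood normal fit; returns parameters and AIC (2 parameters)."""
    n = len(x)
    mu = sum(x) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)  # MLE divides by n
    loglik = sum(-math.log(sigma * math.sqrt(2 * math.pi))
                 - (v - mu) ** 2 / (2 * sigma ** 2) for v in x)
    return {"mean": mu, "sd": sigma, "aic": 2 * 2 - 2 * loglik}


def fit_lognormal(x):
    """Maximum-likelihood lognormal fit (requires x > 0); returns parameters and AIC."""
    logs = [math.log(v) for v in x]
    n = len(x)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((lv - mu) ** 2 for lv in logs) / n)
    # The density includes the 1/x Jacobian term, hence log(v * sigma * sqrt(2*pi)).
    loglik = sum(-math.log(v * sigma * math.sqrt(2 * math.pi))
                 - (lv - mu) ** 2 / (2 * sigma ** 2) for v, lv in zip(x, logs))
    return {"meanlog": mu, "sdlog": sigma, "aic": 2 * 2 - 2 * loglik}


def fit_shifted(intervals, k=10):
    """Shift serial intervals by k, fit both models, and pick the lowest AIC."""
    shifted = [si + k for si in intervals]
    if min(shifted) <= 0:
        raise ValueError("k is too small: every shifted interval must be positive")
    fits = {"normal": fit_normal(shifted), "lognormal": fit_lognormal(shifted)}
    best = min(fits, key=lambda name: fits[name]["aic"])
    return best, fits


best, fits = fit_shifted([4, 2, 6, 3, 5, -1, 8, 4, 1, 7])  # made-up intervals
print(best, fits[best])
```

The explicit check in `fit_shifted` guards against a serial interval of exactly -k, which would shift to zero and fall outside the support of the lognormal, gamma and Weibull distributions.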
We simulated from the resulting distribution to estimate the proportion of transmission events that were likely to have occurred prior to the onset of symptoms in the infector. First, 100,000 random samples were drawn from the serial interval distribution, and k was subtracted from each sample. For each serial interval sample, we simulated 10,000 incubation periods from a lognormal distribution with μ = 1.63 and σ = 0.50 [6] and calculated the probability of pre-symptomatic transmission as the proportion of simulations in which the serial interval minus the incubation period was less than zero (Supplementary Material Figure S3). The proportion of serial intervals with a probability of pre-symptomatic transmission greater than 0.5 was used as the estimate of the proportion of pre-symptomatic transmission in the population.
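The two-level simulation can be sketched with Python's standard `random` module. This is a scaled-down illustration rather than the authors' code: the function names are hypothetical, far fewer draws are used than the 100,000 serial intervals and 10,000 incubation periods described above, and the serial interval draws assume that the fitted lognormal parameters 2.61 and 0.22 are the mean and standard deviation on the log scale of the shifted intervals. The incubation period uses `random.lognormvariate(mu, sigma)`, whose parameters are likewise on the log scale.

```python
import random


def prob_presymptomatic(si, rng, mu=1.63, sigma=0.50, n_incubation=10_000):
    """Share of simulated incubation periods for which SI - incubation < 0,
    i.e. transmission occurred before the infector's symptom onset."""
    n_neg = sum(1 for _ in range(n_incubation)
                if si - rng.lognormvariate(mu, sigma) < 0)
    return n_neg / n_incubation


def presymptomatic_share(si_samples, rng, **kwargs):
    """Proportion of serial intervals whose pre-symptomatic probability exceeds 0.5."""
    hits = sum(1 for si in si_samples if prob_presymptomatic(si, rng, **kwargs) > 0.5)
    return hits / len(si_samples)


rng = random.Random(2020)
k = 10
# Draws from the shifted lognormal model (assumed meanlog 2.61, sdlog 0.22), minus k.
si_samples = [rng.lognormvariate(2.61, 0.22) - k for _ in range(500)]
share = presymptomatic_share(si_samples, rng, n_incubation=1_000)
print(f"{share:.2f}")
```

Because the probability exceeds 0.5 exactly when the serial interval is shorter than the median incubation period, exp(1.63) ≈ 5.1 days, the inner Monte Carlo loop can be cross-checked against that closed-form threshold.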
All data manipulation was conducted in R version 3.3.1 [11]. Parametric distributions were fitted using the 'fitdistrplus' package in R [12].

Descriptive statistics
Following initial data import and selection of records with both a primary case and a contact identifier, there were 308,305 close contacts recorded from 110,408 unique primary cases. Data cleaning steps are detailed in Supplementary Material Table S1. The final dataset for analysis included 471 infected close contacts from 471 primary cases, from 11th April to 13th December 2020. Figure 1 shows the distribution of serial intervals. The mean interval was 4.03 days (95% confidence interval 3.75, 4.31), the median interval was 4 days, the 25th and 75th percentiles were 2 and 6 days respectively, and 4% of serial intervals were less than zero. Table 1 shows the breakdown of the serial interval by region and age cohort of the primary case, and by age cohort of the infected contact. When evaluated by age, the serial interval was shortest when the primary case or the secondary case was in the older age cohort (>64 years). Relaxing the restriction on the number of secondary cases from primary cases who infected one other individual to those who infected three or fewer other individuals, and sampling one transmission event per infector, resulted in a dataset of 1,502 transmission pairs. The mean serial interval was 4.11 days (95% confidence interval 3.95, 4.28), and the 25th, 50th and 75th percentiles were 2, 4 and 6 days respectively.

Fitted distributions
The parameters of the different distributions fitted to the shifted serial intervals are shown in Table 2. Based on the lowest AIC, the shifted distribution of serial intervals (k = 10) was best approximated by either a lognormal distribution with parameters 2.61 and 0.22 (AIC = 2373) or a gamma distribution with shape = 21.13 (Table 3). Figure 2 shows the probability density function (pdf) of each fitted distribution. Table 3 compares the percentiles of the fitted distributions and the raw data.

Estimation of pre-symptomatic transmission
By simulating from the fitted lognormal serial interval distribution and from an incubation period distribution taken from a previous meta-analysis of the international literature on SARS-CoV-2 [6], we found that 67% of draws from our serial interval distribution had a greater than 0.50 probability of pre-symptomatic transmission. Using the gamma distribution to simulate the serial intervals, 66% of draws from our serial interval distribution had a greater than 0.50 probability of pre-symptomatic transmission.
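To compare a fitted shifted lognormal with the raw percentiles, the fitted distribution can be mapped back to the day scale by subtracting k. The Python sketch below assumes, since the text does not label them, that 2.61 and 0.22 are the mean and standard deviation on the log scale; the function name is hypothetical. It uses the standard results that a lognormal has mean exp(μ + σ²/2) and p-th quantile exp(μ + σ z_p), where z_p is the standard normal quantile, and that subtracting a constant shifts both by k.

```python
import math
from statistics import NormalDist


def shifted_lognormal_summary(meanlog, sdlog, k=10.0, percentiles=(25, 50, 75)):
    """Mean and percentiles, in days, of X - k where X ~ lognormal(meanlog, sdlog)."""
    mean = math.exp(meanlog + sdlog ** 2 / 2) - k
    std_normal = NormalDist()
    pct = {p: math.exp(meanlog + sdlog * std_normal.inv_cdf(p / 100)) - k
           for p in percentiles}
    return mean, pct


mean, pct = shifted_lognormal_summary(2.61, 0.22)
print(round(mean, 2), {p: round(v, 2) for p, v in pct.items()})
```

Under this reading of the parameters, the implied mean is about 3.9 days, close to the empirical mean of 4.03 days.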

Discussion
Based on the national contact tracing data, we estimated the mean serial interval of SARS-CoV-2 in Ireland to be 4.03 days (95% confidence interval 3.75, 4.31). Four percent of our serial intervals were less than zero. These local estimates of the serial interval are important for more accurately estimating Rt in Ireland.
The estimate from the current study is within the lower range of earlier international estimates of the serial interval (3.0 to 7.6 days), as summarised in [5]. This lower estimate for Ireland may reflect the timing of the earlier international estimates, many of which were made during the early stages of the pandemic. Since then, there have been significant international and national efforts to advise those with clinical signs to isolate from susceptible individuals. With greater awareness and efforts to isolate at the onset of symptoms, pre-symptomatic transmission would be expected to account for a greater proportion of all transmission events. In support of this, simulating from the fitted distributions, we estimate that 67% of transmission events in our population had a greater than 50% probability of occurring prior to the onset of symptoms in the primary case. In addition, national public health interventions are likely to have affected this value. It has previously been shown that national interventions can be expected to reduce the serial interval, as within-household transmission becomes relatively more important when strict national interventions reduce virus spread elsewhere [8]. Another possible explanation is a higher frequency of contacts in the Irish population. However, Ireland has a lower population density than most other countries for which the serial interval has been calculated, suggesting that this hypothesis is less likely.
We found that serial intervals were shorter when either the primary case or the secondary case was older than 64 years of age. To our knowledge, this finding has not been reported before. One potential explanation could be that the incubation period in elderly patients is shorter, thereby reducing the serial interval when these individuals are the secondary case. However, this hypothesis is not supported by the literature on the incubation period, in which there is some evidence to suggest that the incubation period could be longer in these individuals [13]. Similarly, the decrease in serial interval observed when the primary case was an elderly patient could potentially be explained by a reduced latent period (the time from infection to the onset of viral shedding) in this cohort. A reduced latent period would mean that elderly patients could have an earlier onset of infectiousness. Alternatively, elderly patients could be more infectious in the pre-symptomatic phase. Interestingly, a recent US study of residents of a skilled nursing home reported high levels of viral shedding in nursing home residents in the pre-symptomatic stage of infection [14].
Perhaps more likely, however, is that transmission to these patients is even more biased towards pre-symptomatic transmission than in the rest of the population. For example, when clinical signs occur in middle-aged (40-64 year-old) patients, some mixing within that household (which might include a cohabitant of the same age cohort and/or children) is likely to be inevitable, irrespective of the within-household isolation efforts adopted. In contrast, greater efforts to restrict any contact with individuals in the older age cohort may be expected, compared with contacts within younger age groups, due to concern about the higher risk of clinical severity in these older age groups. Consequently, there is less 'opportunity' for post-symptom-onset transmission to older age groups, such that a greater proportion of transmissions occur pre-symptomatically. In addition, it should be noted that contacts tend to occur with greater frequency within the same age cohort [9]. Therefore, an observation of a decreased serial interval according to the age of the secondary case could be confounded by an effect of the age of the primary case, and vice versa.

Limitations
Previous studies investigating the serial interval of SARS-CoV-2 have largely been based on reports from Asia during the early stages of the pandemic. In those studies, serial intervals were frequently estimated from detailed descriptions of specific outbreaks, tracing forwards and backwards from a smaller number of primary cases in order to trace the particular outbreak in detail [5]. In our study, contact tracing data were collected by trained individuals for the primary purpose of identifying close contacts of infected cases who were at high risk of onward transmission, not for the purpose of epidemiological parameter estimation. The data were therefore largely collected as part of a forward-tracing effort. Consequently, there is a risk of mis-specifying the infector within our dataset, as illustrated in the Supplementary Material. To address this, we took additional steps to restrict the analysis to a subgroup of transmission events where there was greater certainty over the identity of the infector. One such step was restricting the analysis to events where the primary case was deemed to have acquired infection via community transmission (where the source of the infection could not be ascertained), whilst the secondary case was deemed to have been infected through close contact with a confirmed case. However, the effectiveness of this approach depends on the reliability of the contact tracer's conclusion regarding the source of infection. Further diagnostics, possibly including viral genome sequencing, could be used to identify the infector more accurately; however, these data were not available in our study.
The date of symptom onset is prone to recall bias; therefore, we restricted our analysis to transmission pairs where case details were collected within one week of the onset of symptoms for both the primary and secondary cases.
The date of symptom onset was collected during the contact tracing process. Whilst the date of symptom onset was collected at multiple time points, it is possible that longer serial intervals may have been missed if patients were in the pre-symptomatic phase during the process of testing and contact tracing. This could have biased our estimate downwards.
Finally, since all reports of serial interval (and other parameters such as incubation period) are based on transmission events where there is a higher degree of certainty about the identity of the infector, it is likely that these reports are based on more intense contacts. This limitation is also present in our study and may have resulted in some bias in our final estimate.

Conclusions
Based on contact tracing data, we estimate the mean serial interval of SARS-CoV-2 in the Republic of Ireland at 4.03 days. Simulating from the fitted serial interval distribution and from an incubation period distribution based on the international literature, we estimate that 67% of transmission events had a greater than 50% probability of occurring prior to the onset of symptoms of the infector.

List Of Abbreviations
Rt: Time-varying reproduction number; HSE: Health Service Executive; CSO: Central Statistics Office; AIC: Akaike Information Criterion.

Declarations
Ethics approval and consent to participate All methods were carried out in accordance with relevant guidelines and regulations. The study was approved by the National Research Ethics Committee (20-NREC-COV-099). The requirement for informed consent was waived by review board Health Research Declaration Committee (20-025-AF1/COV).

Consent for publication
Not applicable

Data availability statement
The data that support the findings of this study are available from the Irish Central Statistics Office but restrictions apply to the availability of these data, which were accessed by the primary author for the purpose of these analyses and under Section 20(b) of the Statistics Act, 1993 for the purpose of using data collected during the COVID-19 pandemic to aid in the national response.

Competing interests
The authors declare that they have no competing interests.

Funding
No specific funding was received for this study.