Healthcare-associated COVID-19 across 3 pandemic waves: patient characterisation and validation of clinical definitions using genome sequencing

Background: Worldwide, healthcare-associated SARS-CoV-2 infections are a major problem: they are associated with increased morbidity, mortality, and hospitalization costs. In-depth studies across the pandemic are crucial to understand and prevent transmission in hospital settings. The principal aims of this study were to characterise patients and validate ECDC definitions of healthcare-associated COVID-19 infections.
Methods: We set up a retrospective observational study spanning the first three waves of the COVID-19 pandemic in a Belgian university hospital. It describes the characteristics of admitted COVID-19 patients with either healthcare- or community-associated infections. We performed a cluster analysis of the healthcare-associated infections through epidemiological and viral genome analyses, in order to validate the ECDC definitions of healthcare-associated COVID-19 infections.


Introduction
The end of 2019 saw the emergence of a novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), causing coronavirus disease 2019 (COVID-19). A third distinct epidemic wave was ending in Belgium by July 2021 1,2 . Because of the intrinsic viral properties of SARS-CoV-2 and increasing pressure on healthcare facilities since the beginning of the pandemic, healthcare-associated COVID-19 infections (HAIs) have been of major concern.
During the first wave of the pandemic, approximately 10 to 15% of hospitalized COVID-19 cases were HAIs 3,4 . Front-line healthcare workers also have a 3.4-fold higher risk of acquiring COVID-19 than the general population 5 . A hospital-wide screening in the UK showed that 49% of healthcare workers who tested positive for SARS-CoV-2 were asymptomatic and 40% had mild symptoms, which contributes to the risk of in-hospital SARS-CoV-2 transmission 6 . Furthermore, even vaccinated individuals can become infected and transmit the virus to others.
However, HAIs in the first wave might not be representative of later phases of the pandemic. Indeed, initial HAIs may have been partly attributable to incorrect isolation procedures, indiscriminate use of shared healthcare equipment, movements of infected personnel, and insufficient knowledge and awareness of viral transmission properties 7 . Knowledge of the virus' characteristics then improved dramatically. The rate of HAI seemed to decrease, thanks to adequate responses from infection control teams, and shortages of personal protective equipment were no longer an issue. Indeed, vaccines and nonpharmaceutical interventions have had dramatic effects on viral transmission 8 .
In the meantime, however, SARS-CoV-2 has evolved since early 2020. Some variants of concern (VOCs) are now associated with more severe infections, vaccine escape, and increased transmissibility 9 . Indeed, transmissibility of the B.1.1.7 SARS-CoV-2 strain (the Alpha variant) is estimated to be 1.56 times higher than that of the previously dominant wild-type variants 10 3,13 . Fewer have focused on HAIs, despite the risks associated with HAIs including a higher burden of comorbidities, such as malignancies and renal impairment 14 . To the best of our knowledge, no thorough investigation of HAIs across multiple waves of the pandemic has been conducted.
Another challenge is the correct identification of HAI. Definitions of HAIs have been proposed by several national and international healthcare organizations. In the current manuscript we will use the guidelines provided by the European Centre for Disease Prevention and Control (ECDC) (Table 1) 15 . Genomic sequencing can also provide valuable information to support the epidemiological ECDC HAI definition 16 . Combining the epidemiological and genome sequencing approaches could even prove very useful in root cause analysis and outbreak investigations 17 .
We therefore decided to describe our HAIs in terms of patients' demographics and clinical data; we then combined ECDC definitions with genomic cluster analyses, to provide an in-depth examination of the HAIs at our tertiary care centre, across the pandemic.

Timing and Setting
Our study was conducted from week 10 of 2020 until week 22 of 2021, at our academic hospital 'Universitair Ziekenhuis Brussel', a 721-bed Belgian tertiary care centre. The hospital has a maximum capacity of 132 "low care COVID-19 beds" and 36 "intensive care COVID-19 beds" during epidemic peaks.
The study was done using the hospital-wide severe acute respiratory syndrome surveillance database (SARI Registry) and was approved by our hospital's ethical committee (EC-2021-176).

Definitions
We used the ECDC case source definitions of HAI, as described in Table 1. Part B. We performed whole genome sequencing (WGS) and cluster analyses with all available samples (including non-hospitalised healthcare workers). Our primary objective here was to describe the different clusters on the basis of the genetic analyses of the viruses; our second objective was to compare the ECDC source definitions with the sequencing analyses, as a validation tool for the ECDC definitions.

Inclusion of patients and healthcare workers
All included subjects (Figure 1) had a registered polymerase chain reaction (PCR)-confirmed COVID-19 infection.
1. CAI (used as 'controls' for the case-control study)
We extracted demographic, clinical, and laboratory data from a random sample of all hospitalized CAIs (when hospitalized for >24 hours) from our SARI Registry in an anonymized manner.

2. HAI (whether indeterminate, probable or definite)
Hospitalised patients with HAI were used as 'cases' for the case-control study. Their demographic, clinical, and laboratory data were also extracted from our SARI Registry in an anonymized manner.

3. Healthcare workers employed by our hospital and diagnosed with COVID-19
These subjects included medical and non-medical staff employed in our hospital. The included HAI healthcare workers were not hospitalized. Their data, albeit less detailed, were carefully recorded by the infection control department.

Laboratory: inclusion of samples for genetic sequencing and cluster analyses
Nasopharyngeal samples from HAIs and healthcare workers were systematically stored at -80°C. Samples with a sufficiently high viral load (Ct value ≤ 25) and sufficient remaining sample volume were included in the WGS analysis.
We adapted a SARS-CoV-2 WGS protocol 18 . Amplicon libraries were sequenced using MinION flow cells (Oxford Nanopore Technologies, Oxford, UK). Genomes were assembled with reference-based assembly and an in-house bioinformatic pipeline with a 300× minimum coverage cut-off for any region of the genome. Consensus fasta sequences were generated using the tools from the ARTIC network 19 . A custom scheme using primers of 1200 base pairs was used 20 . Lineages were assigned using the command-line version (3.1.5) of pangolin 21 . Gene sequences were uploaded onto the Global Initiative on Sharing All Influenza Data (GISAID)'s open access EpiCoV platform (accession numbers in appendix 1) 22 .
WGS data were processed with the SARS-CoV-2 plug-in of BioNumerics v.7.6.3 (Applied Maths, Biomérieux, Sint-Martens-Latem, Belgium). The subsequences of the Wuhan-Hu-1 (NC 04551219) reference genome were used as reference sequences for a BLAST search 23 . After extraction, these subsequences were screened for single nucleotide polymorphisms (SNPs). Seven entries with an incomplete SNP character set were excluded from further analysis. Next, a similarity matrix was calculated based on the 86 remaining SNP experiments and minimal spanning trees (MSTs) were constructed. SNP distances were represented in the trees. Forty-eight reference sequences of the VOCs circulating at that time (B.1.1.7, P.1, B.1.351) were downloaded from the NCBI website and added to the MSTs. Clusters were defined as genomes differing by two SNPs or fewer and are marked with a contour. The National Health Service of the United Kingdom defines a nosocomial cluster of COVID-19 as the occurrence of two or more cases of COVID-19 infection in a single setting (e.g. a single ward), where at least one case has become symptomatic or been detected on screening at least eight days after hospital admission 24 .
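The cluster rule described above (genomes differing by two SNPs or fewer) can be illustrated with a minimal sketch. This is not the BioNumerics workflow used in the study. It assumes aligned SNP profiles of equal length, counts differing positions, and merges genomes by single linkage; the linkage rule is an assumption, since the text does not state how chains of near-identical genomes are grouped. The function names and toy profiles are hypothetical.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count positions at which two aligned SNP profiles differ."""
    if len(a) != len(b):
        raise ValueError("SNP profiles must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cluster_genomes(profiles: dict, max_snps: int = 2) -> list:
    """Group genomes connected by pairwise distances <= max_snps.

    Assumption: single linkage, i.e. A and C share a cluster if both
    are within max_snps of some B, even when A and C are further apart.
    """
    parent = {name: name for name in profiles}

    def find(name):
        # Follow parent links to the root, compressing the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if snp_distance(profiles[a], profiles[b]) <= max_snps:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), set()).add(name)
    # A genome with no close neighbour is not a cluster.
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical toy profiles (one character per SNP position).
profiles = {
    "g1": "ACGTACGT",
    "g2": "ACGTACGA",
    "g3": "ACGTACTA",
    "g4": "TTTTACGT",
}
print([sorted(c) for c in cluster_genomes(profiles)])
```

With these toy profiles, g1, g2 and g3 form one cluster and g4 stays unclustered. A stricter rule (every pair within two SNPs) would need complete-linkage grouping instead.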
The hospital wards were anonymized. The numbers correspond to the floor on which a ward is found.
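The NHS nosocomial cluster definition quoted above can also be written as a simple rule over ward-level case lists. The sketch below is illustrative only: the `Case` record, the ward labels and the dates are hypothetical, and the day-counting convention (difference in calendar days between admission and detection) is an assumption that the text does not specify.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Case:
    ward: str          # anonymized ward label (hypothetical)
    admission: date    # hospital admission date
    detection: date    # symptom onset or positive screening date


def nosocomial_cluster_wards(cases, min_cases=2, min_days=8):
    """Return wards with >= min_cases cases, at least one of which was
    detected >= min_days after admission (assumed day-count convention)."""
    by_ward = {}
    for case in cases:
        by_ward.setdefault(case.ward, []).append(case)

    flagged = []
    for ward, group in by_ward.items():
        has_late_case = any(
            (c.detection - c.admission).days >= min_days for c in group
        )
        if len(group) >= min_cases and has_late_case:
            flagged.append(ward)
    return sorted(flagged)


# Hypothetical example data.
cases = [
    Case("5A", date(2021, 1, 1), date(2021, 1, 11)),  # 10 days after admission
    Case("5A", date(2021, 1, 8), date(2021, 1, 10)),  # 2 days after admission
    Case("3B", date(2021, 1, 1), date(2021, 1, 3)),
    Case("3B", date(2021, 1, 2), date(2021, 1, 4)),
    Case("7C", date(2021, 1, 1), date(2021, 1, 13)),  # single case only
]
print(nosocomial_cluster_wards(cases))
```

In this example only ward 5A meets both conditions. The quoted definition gives no time window between cases, so none is applied here.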

Epidemiological and statistical analyses
For Part A, we described cases over time in epicurves by week of diagnosis, using STATA scripts and Excel software. All HAIs (patients and healthcare workers) and CAIs were included in the representations.
Amongst symptomatic HAI patients, 54.8% were male and 45.2% were female, and the median age was 76.0 years. Compared to CAI patients, HAI patients were significantly older (median age 76.0 vs. 64.0 years) (P < 0.0001) and smoked more (P = 0.0164); HAI patients had a lower BMI (HAI: 24.5 kg/m² vs. CAI: 26.9 kg/m²) (P = 0.0025) and were frailer (P < 0.0001). The following comorbidities were more frequent in HAI than in CAI patients: anaemia (P = 0.0066), cancer (P < 0.0001), heart disease (P = 0.0001), liver disease (P = 0.0443) and renal disease (P = 0.0483). Others did not differ significantly, in particular asthma (P = 0.074) and diabetes mellitus (P = 0.4917).
Thrombocyte values were significantly higher in HAI patients (232.0 x 10^6/µL) compared to CAI patients (182.0 x 10^6/µL) (P = 0.0011), but all were in the physiological range. D-dimer values were above the physiological cut-off of 500 ng/mL in both patient groups and significantly higher in HAI patients (1293.0 ng/mL) compared to CAI patients (785.0 ng/mL) (P = 0.0013). Absolute leucocyte numbers (P = 0.3334) and neutrophil / lymphocyte ratios (P = 0.5008) did not differ significantly between groups. Although both acute phase proteins were above physiological thresholds, ferritin levels did not differ significantly between HAI and CAI patients (P = 0.1786), whereas C-reactive protein (CRP) was significantly higher in CAI patients (73.9 mg/L) compared to HAI patients (32.6 mg/L) (P < 0.0001).
Multiple logistic regression was performed. When demographic and laboratory parameters were found to be significantly different between HAI and CAI patients in the univariate analysis (Table 2), we looked at whether those factors were associated with each other or with the outcome measure (being an 'HAI' or a 'CAI').
Symptoms and comorbidities were not included in the analysis because of their strong relatedness.
We report our findings in Table 3. The odds ratios reflect the effect of each parameter on the odds that a patient has a HAI. A patient's frailty index, CRP, and thrombocyte levels at COVID-19 diagnosis appear to be significant predictors of a HAI.
Of the healthcare worker-related HAIs, 84% could be attributed to a HAI genome cluster.
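As a reminder of how odds ratios such as those in Table 3 relate to a logistic regression fit, the sketch below converts a coefficient and its standard error into an odds ratio with a Wald 95% confidence interval. The numeric inputs are hypothetical and are not taken from the study; the function name is illustrative.

```python
import math


def odds_ratio(beta: float, se: float, z: float = 1.96):
    """Convert a logistic regression coefficient into an odds ratio.

    Returns (OR, lower, upper), where the bounds are the Wald
    confidence interval exp(beta -/+ z * se).
    """
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


# Hypothetical coefficient for a one-unit increase in a predictor.
or_value, lower, upper = odds_ratio(beta=math.log(2), se=0.2)
print(f"OR = {or_value:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An odds ratio above 1 means the odds of being a HAI rather than a CAI increase with the parameter, and below 1 that they decrease. An interval that excludes 1 corresponds to a significant association at the chosen level.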
In Figure 3, we included hospital wards as different metadata in a similar epicurve and MST. Clustering of genomes has made it possible to detect outbreaks spanning different wards and locate these clusters at different timepoints across the full length of the pandemic. Please note that not all cases depicted in the epicurve (Figure 3A) are included in the genome analysis of panel B. Also of note, details of the genomes are described in appendix one. A comparison of epidemiological and genome data on HAI is provided in appendix two. Figure 4 gives an overview of the genomes we sequenced. Panel A depicts the incidence of the different SARS-CoV-2 lineages across the pandemic. During the first and second waves, HAIs could be attributed to viral genomes closely related to the reference Wuhan-Hu-1 strain (B.1, B.1.1, B.1.160, B.1.1.44, B.1.177

Burden of HAI across three COVID-19 waves
In our study of COVID-19 HAI, we report numbers of HAI that are comparable to other centres 3,4 . Interestingly, our study spans the first three waves of the COVID-19 pandemic in Belgium; the number and percentage of HAI in our centre were stable across the first two waves and increased towards the third wave, despite more control measures being in place by that time. This may reflect more infectious variants as well as more systematic screening of all hospital admissions. Indeed, the percentage of HAI patients who were symptomatic was lowest during the third wave, probably reflecting vaccination rollout, earlier diagnoses and more exhaustive testing across the hospital.

Characteristics of HAI vs CAI
We then went on to describe symptomatic HAI and compared them to CAI. Again, there may have been some selection bias reflected in our results; for example, patients with a HAI are expected to be in poorer health before infection, since they were already hospitalised.
In the univariate analysis, some factors seemed to support this selection of sicker patients: our HAI patients were older, had higher frailty scores, more pronounced smoking habits and more comorbidities.
Higher percentages of malignancies and kidney disease, and older age, were also observed in a British study on HAI versus CAI COVID-19 14 . With increased age, frailty, comorbidities, and initial reason for hospitalization in mind, it might not come as a surprise that HAI patients had a longer hospital stay from the time of COVID-19 diagnosis. Furthermore, more HAI patients were admitted to the ICU during their stay, compared to CAI. Finally, mortality was also significantly higher in HAI than in CAI patients. Again, these differences are probably due to the selected population, rather than a causal link with HAI per se. Other studies had similar findings, which confirm the extent of the problem of HAIs in hospitalized patients 7 .
Apparent differences in the presence or absence of symptoms between HAI and CAI patients might be due to the timing of COVID-19 diagnoses. Indeed, HAI might be diagnosed at an earlier stage of the disease because of timely detection and laboratory testing, especially in later waves of COVID-19, so certain symptoms might not yet have been present in some HAI cases. The same may apply to laboratory data, in particular the significantly lower CRP levels in HAI patients, which may be indicative of an earlier stage of COVID-19 13 . Other confounders for some laboratory findings, e.g. D-dimers, are the underlying pathologies of hospitalized patients.
In the multivariate analysis, frailty (+), thrombocytes (+) and CRP (-) were significantly associated with HAI. These may become important parameters to take into account when trying to decide whether a patient has acquired his/her infection in hospital. Of note, BMI was not significantly associated with having a HAI rather than a CAI. We must stress, however, that we did not assess associations with severe outcomes, so our results do not allow any conclusion on the link between BMI and severe COVID-19 disease amongst HAI patients.

Cluster analysis
In part B of the Results section, we report correlations between clinical criteria for the diagnosis of a HAI and genetic sequencing data. Regarding bias and limitations, we were only able to sequence a proportion of the HAI samples.
Despite this, our sequencing analyses allowed us to validate the ECDC definitions for HAI. Sequencing can therefore complement descriptive analyses to describe clusters and seems to be a valuable tool to better understand COVID-19 transmission within hospitals. Indeed, due to the remaining uncertainties around the incubation period, pre-symptomatic transmission and asymptomatic infections, a definite international consensus on the definition of a HAI has yet to be reached.
We described 12 clusters involving HAI. Some of the clusters we described were large, stressing once again the high infectiousness of this agent. In some, different wards and floors were affected by the same cluster (cluster 10): movement of (undetected) infected patients and healthcare workers across wards might have caused transmission. Indeed, looking at Figure 2, many clusters include at least one healthcare worker, suggesting that healthcare workers were involved in nosocomial transmission. A narrative review by Abbas and co-workers highlights the important

Conclusion
This study is an in-depth analysis of HAI in a university hospital in Brussels, Belgium, across all past waves of the COVID-19 pandemic (at the time of submission).
It appears that HAI patients tend to be older, frailer, and have more comorbidities. We conducted a genomic cluster analysis of our HAIs, and were able to validate the ECDC clinical criteria to identify HAIs.
Even if no infection control system will prevent HAIs entirely, we therefore suggest combining the ECDC HAI criteria with a timely and automated tracking system with genomic cluster analysis in real-time. Such a system coupled with a hospital-wide alert system, on a background of repeated screening and surveillance, could prevent or limit such large outbreaks in the future. This will become increasingly important with yet other variants emerging.
With those elements in hand, we should urge local and national policy makers to invest more in infection control and HAI surveillance.

Declarations
Contributors TD, LS, DP, SA, and IW designed the study and wrote the study protocol. TD, RB, OS, BC, and RV aided in sample selection and genomic sequencing. TD, LS, and JM did the clinical data extraction. TD, LS, JP, TS, JB, HM, FC, and IW performed data analysis and visualization. TD drafted the first version of the manuscript; LS worked on the second version. All authors had full access to all data, aided in finalizing the text and shared final responsibility for the submission for publication.

Declaration of interests
No funding was sought for this study. The authors declare no competing interests.

Data sharing
Data of patients included in this manuscript are considered sensitive and will not be shared. The study methods and statistical analyses are described in detail in the methods section. Virus genomic data are shared on GISAID.