Our study characterized the global diversity of genomic surveillance strategies and sequencing availability, the sequencing coverage of SARS-CoV-2 cases, the extent to which variant sequences are publicly available, and the current epidemic situation of SARS-CoV-2 variants. We found that genomic surveillance strategies were globally heterogeneous, with limited or no routine surveillance in many countries in the African and Eastern Mediterranean Regions. Our analysis of publicly deposited SARS-CoV-2 sequences implied that sequencing coverage is low in most countries, with a low proportion of VOC sequences shared to public repositories. The pervasive spread of the Alpha and Delta variants further highlights the threat of SARS-CoV-2 mutations despite the availability of vaccines in many countries.
The diversity of SARS-CoV-2 genomic surveillance between countries is associated with country-specific priorities (e.g., surveillance objectives, targeted monitoring, or event-/risk-based sequencing) and available resources. ECDC recommends population-based and/or targeted sampling strategies (e.g., imported cases, cluster cases, and potential vaccine escapers) for genomic surveillance, which could provide a more representative estimate of the relative prevalence of variants. Notably, several countries, many of which are classified as low- or lower-middle-income countries by the World Bank, lack genomic surveillance data, likely due to limitations in infrastructure capacity and resources. However, even some countries classified as high-income have been slow and inconsistent in adopting genomics-based surveillance22. Establishing reference laboratories and networks to provide sequencing services for countries without established sequencing capacity may improve the detection and tracking of emerging variants worldwide.
The detection of most variants relies on full-length or partial genomic sequencing, but sequences only become available to the global community when laboratories have established sequencing capacity, are willing to share, and are legally allowed to upload them. Discrepancies in sharing were observed in each region, which confirmed that some countries are sequencing but not uploading. However, we also observed a sharing extent exceeding 100% in some countries, likely due to delays in the official reporting of sequencing results or to incomplete official reporting systems. Timely sharing of sequences makes it possible to adequately contextualize local data when investigating introductions and examining transmission routes, and to look for sites of repeated mutation that can guide laboratory work on characterizing the effects of those mutations on therapeutics and vaccine efficacy. The reasons why some countries did not share might be related to distrust of public repositories with regard to data security.
We found relatively low completeness of demographic and clinical characteristics in the metadata accompanying uploaded sequences. Our analysis revealed that high-income countries frequently did not share demographic information. A possible reason is that these countries may have more restrictive data privacy laws that prevent or discourage the release of this information. Coupling genomic data with these additional data can maximize the utility of genomic sequences for rapid scientific discovery during this pandemic, and the combined data are valuable for in-depth epidemiological analyses that characterize risk factors, clinical severity, and other public health risks of variants23–25. Therefore, it is vital to optimize the sharing of information through a secure and trusted channel that protects patient anonymity and complies with local regulations26. In addition, decreasing the lag between sample collection and deposition of these sequences27, including the timely sharing and standardizing of metadata18,23,28, may facilitate the design and development of treatment and prevention strategies by policy-makers29.
An important role of genomic surveillance is to investigate the spread and dynamics of SARS-CoV-2 variants. Amidst the emergence of different variants, the current dominance of the Delta variant suggests that it may possess higher fitness than other variants, which might be associated with a combination of higher viral load and shorter incubation period and serial interval30–32. The decrease in real-world vaccine effectiveness against the Beta variant11 and the increasing number of breakthrough cases with the Delta variant33 underline the importance of determining local or regional patterns of variant spread, as well as the need to develop new or modified vaccines to achieve adequate protection34.
Our results should be interpreted in view of several limitations. First, the lack of data from some countries limited our global mapping. Data completeness and quality could be affected by key steps in surveillance or reporting, including differences in diagnostic criteria, under-reporting, delayed reporting, and reporting methods. Inconsistent diagnostic criteria for variants might cause sampling bias, especially when PCR assays are used to detect the Alpha variant, owing to their non-specificity35. We did an extensive search to collect multi-source data and, when choosing among aggregated data, prioritized sequencing results over PCR-screening results. Second, the analysis of global and national spread could be biased, as data from public repositories or aggregated datasets are not always representative of the variants circulating in a region, especially in regions with relatively limited sequencing capacity or where sequencing focuses on outbreak investigations. Therefore, the global prevalence of variants may be biased due to uneven sequencing across regions. Indeed, it is difficult to obtain a truly representative and random sample, and understanding these biases will become important24. Lastly, detailed demographic, epidemiological and clinical information about variant cases could not be accessed, which limited further epidemiological analysis of variant spread.
In conclusion, our study provides a landscape of genomic surveillance, the global coverage of sequencing and the public availability of sequences, as well as evidence for the rapid spread of SARS-CoV-2 variants. Our findings suggest that global SARS-CoV-2 genomic surveillance strategies and capacity are diverse and may be limited in some regions, especially in the context of the global spread and dominance of variants of concern. Gaps remain in sequencing availability and magnitude. International efforts are therefore needed to address genomic bottlenecks, and more work is needed to define the ideal sampling schemes for different purposes and to share these data in public repositories to allow for further rapid scientific discovery.