As the causative agent of coronavirus disease 2019 (COVID-19), SARS-CoV-2 is characterized by a high mutation rate, leading to the rapid emergence and global dissemination of new viral variants [8]. Consequently, ongoing real-time genomic surveillance is essential to track the spread of the disease and monitor viral evolution. In regions where sequencing capacity is limited and demand is high, pooled sequencing offers a cost-effective way to obtain substantial genomic data, thus enhancing our understanding of the virus’s evolutionary dynamics.
The bioinformatics analysis tool Freyja was utilized to estimate the relative frequencies of SARS-CoV-2 variants in pooled samples, employing a statistical model that incorporates a predefined set of genomic polymorphisms specific to these variants [7]. This approach has demonstrated efficacy in monitoring sewage samples across several countries [9, 10]. In this study, the software, based on Freyja, effectively identified the correct variant proportions in the majority of simulated samples. Nonetheless, there are notable considerations regarding the implementation of pooled sequencing in genomic surveillance that warrant further attention.
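To make the idea of estimating variant proportions from a predefined set of polymorphisms concrete, the following is a minimal sketch, not Freyja's actual solver. It assumes a toy barcode table (the lineage names, sites, and frequencies are all hypothetical) and swaps in ordinary least squares solved by projected gradient descent, with abundances constrained to be non-negative and to sum to one.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t  # keep the threshold from the last index that passes
    return [max(x - theta, 0.0) for x in v]


def demix(barcodes, observed_freqs, steps=2000, lr=0.05):
    """Estimate lineage abundances from per-site mutation frequencies.

    barcodes: dict mapping lineage name -> list of 0/1 flags, one per site
              (1 means the lineage carries the mutation at that site).
    observed_freqs: list of observed mutation frequencies, one per site.
    """
    lineages = list(barcodes)
    n_sites = len(observed_freqs)
    x = [1.0 / len(lineages)] * len(lineages)  # start from a uniform mixture
    for _ in range(steps):
        # Frequencies predicted by the current mixture at every site.
        predicted = [
            sum(x[k] * barcodes[name][s] for k, name in enumerate(lineages))
            for s in range(n_sites)
        ]
        residual = [p - o for p, o in zip(predicted, observed_freqs)]
        # Gradient of 0.5 * ||A x - b||^2 with respect to each abundance.
        grad = [
            sum(barcodes[name][s] * residual[s] for s in range(n_sites))
            for name in lineages
        ]
        x = project_to_simplex([xi - lr * g for xi, g in zip(x, grad)])
    return dict(zip(lineages, x))


# Hypothetical barcodes over five sites (illustrative values only).
toy_barcodes = {
    "lineage_A": [1, 1, 0, 0, 0],
    "lineage_B": [0, 1, 1, 0, 1],
    "lineage_C": [0, 0, 0, 1, 1],
}
# Site frequencies produced by a 60% / 30% / 10% mixture of A, B, and C.
observed = [0.6, 0.9, 0.3, 0.1, 0.4]

estimate = demix(toy_barcodes, observed)
for name, abundance in estimate.items():
    print(f"{name}: {abundance:.3f}")
```

In this noise-free toy case each lineage has at least one private site, so the mixture is uniquely recoverable. With real data, sequencing noise and lineages that share most of their mutations make the problem harder.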
Not all simulated samples were accurately recovered, primarily because some original samples lacked definitive mutations or had lower extraction quality. For instance, the simulated samples harboring B.1.1.48 (assessed as low-quality by Nextclade) and BA.2.76 (with less than 96% coverage) led to 36.8% unidentified cases among top X results and 80% among complete results. Therefore, ensuring the quality of original samples before pooling is essential, as poor-quality samples can compromise identification accuracy. Additionally, samples with lower viral loads could be overlooked in the final pooled real-time reverse transcription polymerase chain reaction (RT-qPCR) result [11], emphasizing the importance of pooling samples with similar nucleic acid concentrations.
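The effect of unequal viral loads can be illustrated with a back-of-the-envelope calculation. The sketch below assumes equal-volume pooling and near-perfect amplification efficiency, so that each additional Ct cycle corresponds to roughly half as much template. The Ct values and sample names are hypothetical and are not taken from this study.

```python
def expected_pool_fractions(ct_values, efficiency=2.0):
    """Approximate each sample's share of viral template in an equal-volume pool.

    ct_values: dict mapping sample name -> RT-qPCR Ct value.
    efficiency: fold change in template per cycle (2.0 assumes ideal doubling).
    """
    min_ct = min(ct_values.values())
    # Relative template amount, scaled so the strongest sample has weight 1.
    weights = {
        name: efficiency ** -(ct - min_ct) for name, ct in ct_values.items()
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical Ct values for three samples pooled in equal volumes.
fractions = expected_pool_fractions(
    {"sample_1": 20.0, "sample_2": 25.0, "sample_3": 30.0}
)
for name, share in fractions.items():
    print(f"{name}: {share:.4%}")
```

Under these assumptions the sample with Ct 30 contributes about 0.1% of the template, while the sample with Ct 20 contributes about 97%. This is consistent with the concern that low-load samples can fall below the detection threshold of the pooled analysis.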
The software ranked lineages by estimated abundance, revealing differences between the top X results and the complete results. When various pooling scenarios were simulated, statistical discrepancies were noted among pooling strategies, particularly in the top X results. The highest consistency between the simulated samples and original samples was observed in the recombinants group, while the mix group showed the lowest consistency. These findings suggest that the complexity of the pooled samples significantly impacts the accuracy of identification. The models of pooling tests in RT-qPCR [12, 13] and wastewater-based epidemiological monitoring [14] serve as valuable references for the practical implementation of pooled sequencing for SARS-CoV-2.
While the overall results demonstrated improved identification compared to the top X approach, there were instances of inaccurate genotyping due to the presence of low-abundance mutation mixtures. Freyja incorporates a bootstrap technique to calculate standard errors for the predicted variant compositions. However, determining a cutoff value for the genomic composition results that balances sensitivity and specificity remains challenging.
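The interplay between bootstrap standard errors and an abundance cutoff can be sketched generically. The example below is not Freyja's implementation: it resamples reads at a single hypothetical site rather than re-estimating a full variant composition, and the read counts, the 1% cutoff, and the two-standard-error decision rule are all illustrative assumptions.

```python
import random
import statistics


def bootstrap_se(reads, n_boot=1000, seed=0):
    """Bootstrap standard error of the fraction of mutation-carrying reads.

    reads: list of 0/1 flags, 1 if the read carries the mutation of interest.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(reads)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(reads, k=n)  # sample reads with replacement
        estimates.append(sum(resample) / n)
    return statistics.stdev(estimates)


# Hypothetical site: 30 of 1000 reads carry the mutation (3% abundance).
reads = [1] * 30 + [0] * 970
point_estimate = sum(reads) / len(reads)
standard_error = bootstrap_se(reads)

# Illustrative rule: call the lineage present only if the estimate stays
# above the cutoff after subtracting two standard errors.
cutoff = 0.01
detected = point_estimate - 2 * standard_error > cutoff

print(f"estimate={point_estimate:.3f}, SE={standard_error:.4f}, detected={detected}")
```

Raising the cutoff reduces false calls from low-abundance noise but also discards genuine minor lineages, which is the sensitivity and specificity trade-off described above.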
The study began with sporadic sequencing of a pooled sample from routine RT-qPCR testing, which successfully recovered the genotypes present in the individual samples. Because pooling numerous samples into diverse groups was impractical, a simulation study was conducted to assess the feasibility of pooled sequencing. An additional limitation is that pooled sequencing can determine only the composition and abundance of viral lineages, so individual identification is still required when new variants emerge, akin to pooling tests in RT-qPCR [15].
In summary, this study used simulated mixed samples to assess the feasibility of pooled sequencing analyzed with the Freyja tool. The findings demonstrated successful recovery of the genomic composition of the original samples in the majority of cases. Therefore, pooled sequencing is a promising tool that can enhance genomic surveillance efforts against COVID-19 in a cost-effective manner.