Technical framework for wastewater-based epidemiology of SARS-CoV-2 based on relative quantification via qPCR


The global pandemic of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has severely disrupted normal production, daily life, and social and economic activity. Wastewater-based epidemiology (WBE) is expected to become a powerful tool for monitoring the dissemination of SARS-CoV-2 at the community level and has attracted the attention of researchers worldwide. However, there is not yet a standard protocol to guide its implementation. In this paper, we propose a technical framework of relative quantification for determining the virus abundance in wastewater and estimating the infection rate in the corresponding communities. By using human-specific biomarkers as internal references, the framework is expected to achieve horizontal and vertical comparability. A comprehensive theoretical framework for relative quantification of viruses by qPCR is provided and discussed in detail. Critical factors affecting the virus detectability in wastewater and the estimation of infection rate include the virus concentration method, lag period, per capita virus shedding amount, sewage generation rate, temperature-related decay kinetics of the virus and biomarkers in wastewater, and hydraulic retention time (HRT). Theoretical simulation shows that the main factors affecting the detectability of the virus in sewage are the per capita virus shedding amount and the sewage generation rate, whereas the decay of SARS-CoV-2 in sewage is a relatively slow process with limited impact on detection. Under the ideal condition of high per capita virus shedding and low sewage generation rate, WBE can give early warning of a single infected person among 400,000 people.

Reverse transcription quantitative PCR (RT-qPCR) is commonly used to detect the RNA of SARS-CoV-2 in sewage. Most studies adopt absolute quantification, which requires a virus gene standard (normally a plasmid carrying SARS-CoV-2 genes) to construct a standard curve [22]. Absolute quantification gives the actual concentration of virus RNA in sewage, which can be correlated with local clinical observations [17,22,31]. Other studies, however, reported only cycle threshold (CT) values [16] or even just qualitative results [19]. Such data provide limited information for monitoring the actual infection rate in the community. Various factors affect the accuracy of community infection rates estimated by absolute quantification or hinder lateral comparison between regions. For example, in a combined sewer system, dilution by rainfall affects the concentration of virions in sewage.
In addition, the individual sewage generation rate is in the range of 50-500 L/person/day, which fluctuates with seasons and affects the concentration of virus in the sewage.
Furthermore, to estimate the infection rate in a community, it is necessary to accurately measure the community population, i.e. to normalize the data by population [32]. This is a challenging task in areas with high population mobility. Several studies have tried to quantify population-related biomarkers, such as human ribonuclease P [14], creatinine, urea, benzotriazole and caffeine [15,17], to achieve this goal. Surprisingly, no study so far has adopted relative quantification using a human-specific biomarker as internal reference.
In this paper, we try to provide a better alternative for monitoring SARS-CoV-2 in sewage by adopting relative quantification with qPCR. Based on this concept, a self-contained technical framework is proposed that incorporates a human-specific fecal biomarker as the internal reference during quantification. Because the chosen biomarkers are nucleic acids, they can be quantified with the same processes and platforms as the viral nucleic acid. Promising biomarkers that can serve as internal references in relative quantification with qPCR are recommended. The critical stages and key points involved in the WBE of SARS-CoV-2 are evaluated. The decay kinetics of biomarkers in sewage, one of the important factors affecting detection sensitivity, is studied in a mimic mode by means of qPCR.

Relative quantification via qPCR
Two popular operation modes of qPCR are available for the quantification of RNA or DNA: SYBR Green and TaqMan. In this study, we prefer the TaqMan mode because of its improved specificity. During PCR amplification, the fluorescence signal increment at the Nth amplification cycle ($\Delta R_N$) can be expressed as:

$$\Delta R_N = f \, X_0 \, E^{N} \tag{1}$$

where $f$ is the luminescent intensity of a unit fluorescence molecule; $X_0$ is the initial copy number of the gene to be amplified; and $E$ is the apparent amplification coefficient, which is 2 under ideal conditions.
By selecting the characteristic genes of population-related biomarkers (such as characteristic genes of human feces-associated bacteria or bacteriophages [33]) as reference genes, relative quantification of the target virus genes can be realized. The same threshold of fluorescence signal ($\Delta R_T$) is set for both reference and target genes:

$$\Delta R_T = f_s \, X_{0,s} \, E_s^{C_{T,s}} = f_t \, X_{0,t} \, E_t^{C_{T,t}} \tag{2}$$

When the same fluorescent reporter molecule is used (i.e. $f_s = f_t$), the relative abundance of the target virus gene (Ra) can be expressed as:

$$R_a = \frac{X_{0,t}}{X_{0,s}} = \frac{E_s^{C_{T,s}}}{E_t^{C_{T,t}}} \tag{3}$$

where CT is the threshold cycle number; the subscript s indicates the reference gene; and the subscript t refers to the target virus gene.
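As a minimal sketch of this relative quantification, the following Python function computes Ra from the two CT values. It assumes the relation Ra = E_s^CT,s / E_t^CT,t, which follows from applying the same fluorescence threshold and reporter to both genes. The function and argument names are illustrative only, and the default amplification coefficients of 2 correspond to the ideal case.

```python
def relative_abundance(ct_ref, ct_target, e_ref=2.0, e_target=2.0):
    """Relative abundance of the target gene versus the reference gene.

    ct_ref, ct_target: threshold cycles of the reference and target genes.
    e_ref, e_target: apparent amplification coefficients (2 when ideal).
    Assumes the same fluorescence threshold and reporter for both assays.
    """
    if e_ref <= 1.0 or e_target <= 1.0:
        raise ValueError("amplification coefficients must exceed 1")
    return (e_ref ** ct_ref) / (e_target ** ct_target)


if __name__ == "__main__":
    # Hypothetical CT values, chosen only for illustration.
    print(relative_abundance(ct_ref=20.0, ct_target=30.0))
```

With ideal amplification the expression reduces to 2^(CT,s - CT,t), so a target gene that crosses the threshold ten cycles later than the reference is about a thousandfold less abundant.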
For the population in a given sewage catchment, the relative abundance of the virus gene in the sewage is proportional to the infection rate (Ir) in the catchment, as shown in Eq. (4):

$$R_a = I_r \cdot \frac{S_{r,t}\,(1-\eta_t)\,r_t}{S_{r,s}\,(1-\eta_s)\,r_s} \tag{4}$$

where the infection rate $I_r = P_I/P_T$ is the ratio of the number of infected persons (PI) to the total population (PT) in the sewage catchment; Sr is the personal shedding amount of biomarkers (copies/person/day); η is the decay rate of biomarkers in the sewage, i.e. the fraction lost in the sewer, which depends on temperature and hydraulic retention time (HRT); and r is the recovery rate of biomarkers. As before, the subscript s indicates the reference gene and the subscript t refers to the target virus gene. The relative abundance of the target virus gene in the feces of an infected person (Ra,0) can then be expressed as:

$$R_{a,0} = \frac{S_{r,t}}{S_{r,s}} \tag{5}$$

The infection rate can then be calculated with the following equation:

$$I_r = \frac{R_a}{R_{a,0}} \cdot \frac{(1-\eta_s)\,r_s}{(1-\eta_t)\,r_t} \tag{6}$$

In silico performance evaluation of primer-probe sets
SARS-CoV-2 is an enveloped positive-sense single-stranded RNA virus with a relatively high propensity for mutation [34,35]. The inclusivity of the primer-probe sets currently used in RT-qPCR therefore needs to be evaluated. Over 5,000 SARS-CoV-2 genome sequences were downloaded from GISAID's EpiCoV™ database in early April. After filtering out sequences of poor quality, the remaining 3986 sequences were used for in silico analysis of the present primer-probe sets. The genome sequence of bat coronavirus RaTG13, which shares 96.2% identity with the genome of SARS-CoV-2, was also included [1]. All the downloaded genome sequences were clustered with CD-HIT-EST using a sequence identity cut-off of 1.0, so that genomes with identical sequences were grouped into one cluster [36]. Representative genome sequences (n=2257) of all clusters were aligned with the multiple sequence alignment program MAFFT [37]. The mutation rate of each site in the target binding region of every primer and probe was recorded. Primer-probe sets with lower mutation rates in the targeting region are considered more inclusive.
The exclusivity of the primer-probe sets was also determined by assessing their ability to distinguish SARS-CoV-2 from RaTG13. The primer pairs with high inclusivity and high exclusivity were further evaluated via Primer-BLAST on the website of the National Center for Biotechnology Information (NCBI).
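The per-site mutation rate in a primer or probe binding region can be illustrated with a short Python sketch. It is not the pipeline used in this work (which relied on CD-HIT-EST and MAFFT) and only assumes that an alignment is already available as equal-length strings. The function name, the 0-based half-open region and the choice to skip gaps and ambiguous bases are all assumptions made for illustration.

```python
def site_mutation_rates(aligned_seqs, start, end, reference=None):
    """Fraction of sequences that differ from the reference at each site.

    aligned_seqs: equal-length aligned sequences (strings).
    start, end: 0-based, half-open alignment columns of the binding region.
    reference: reference sequence; defaults to the first aligned sequence.
    Gaps and ambiguous bases (anything other than A/C/G/T) are skipped.
    """
    ref = (reference or aligned_seqs[0]).upper()
    rates = []
    for pos in range(start, end):
        informative = 0
        mismatches = 0
        for seq in aligned_seqs:
            base = seq[pos].upper()
            if base in "ACGT":
                informative += 1
                if base != ref[pos]:
                    mismatches += 1
        rates.append(mismatches / informative if informative else 0.0)
    return rates


if __name__ == "__main__":
    # Toy alignment, not real SARS-CoV-2 data.
    toy = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGNAC"]
    print(site_mutation_rates(toy, 0, 6))
```

A primer-probe set whose binding sites all show rates close to zero would count as more inclusive in the sense used above.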

Virus detectability in sewage samples
The concentration of biomarkers corresponding to the reference gene (population-related bacteria or bacteriophages) is relatively high in the sewage.
Therefore, they are not expected to fall below the detection limit. The detection limit of relative quantification is the same as that of absolute quantification and is determined by the concentration of virus RNA in the sewage.
In this work, the sensitivity of the whole method is expressed as the size of a Population containing a Single Infected Person (PSIP) for which the SARS-CoV-2 RNA shed by that person is still detectable in sewage samples collected from the catchment of the same population. The higher the value, the more sensitive the detection. The SARS-CoV-2 load to sewage was estimated from its recently reported excretion rate (Er) in human stool (copies/g feces), assuming a fecal load (Mf) in the range of 100-400 g feces/day/person. The PSIP can then be expressed as:

$$PSIP = \frac{E_r \, M_f}{1000 \cdot LOD \cdot V_s} \tag{7}$$

where LOD is the limit of detection, which is set to 1 copy/mL sewage [38]; Vs is the per capita sewage generation rate (50-500 L/person/day); and the factor 1000 converts Vs from L to mL.
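The PSIP calculation can be sketched in a few lines of Python. The sketch assumes the form PSIP = Er·Mf / (1000·LOD·Vs), where the factor 1000 converts the sewage generation rate from L to mL so that it matches the unit of the LOD. The function and parameter names are illustrative.

```python
def psip(excretion_rate, fecal_load, sewage_rate, lod=1.0):
    """Population size in which one infected person is still detectable.

    excretion_rate: virus RNA in stool, copies per g feces (Er).
    fecal_load: g feces per person per day (Mf).
    sewage_rate: L sewage per person per day (Vs).
    lod: limit of detection, copies per mL sewage.
    """
    copies_per_day = excretion_rate * fecal_load      # shed by one person
    copies_needed_per_person = lod * sewage_rate * 1000.0  # L -> mL
    return copies_per_day / copies_needed_per_person


if __name__ == "__main__":
    # Upper and lower ends of the parameter ranges discussed in the text.
    print(psip(1e8, 400.0, 100.0))
    print(psip(1e3, 100.0, 400.0))
```

A value below 1 means that the sewage of the infected person alone would already be diluted below the detection limit.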

Decay kinetics of biomarker in Sewage
It is worth noting that the decay kinetics of biomarkers in wastewater is an important factor to be considered in both relative and absolute quantification. The biomarkers (reference microbe or virus) decay exponentially with time in sewage of a given composition [27,39]:

$$n = n_0\, e^{-kt} = n_0 \left(\tfrac{1}{2}\right)^{t/t_{1/2}} \tag{8}$$

where n0 is the initial amount of biomarkers discharged into the sewage system; n is the amount after a hydraulic retention time (HRT) of t; k is the first-order decay rate constant; and t1/2 is the half-life (i.e. T50, the time for half decay).
The decay rate η can be expressed as:

$$\eta = 1 - \frac{n}{n_0} = 1 - \left(\tfrac{1}{2}\right)^{t/t_{1/2}} \tag{9}$$

According to Eq. (8):

$$t_{1/2} = \frac{\ln 2}{k} \tag{10}$$

$$T_{90} = \frac{\ln 10}{k} \tag{11}$$

$$T_{99} = \frac{2\ln 10}{k} \tag{12}$$

where T90 is the time required for 1 log10 decay, and T99 the time for 2 log10 decay.
The decay of biomarkers in sewage is closely related to temperature, which is expressed with the Arrhenius equation:

$$\ln k = -\frac{E_a}{RT} + C \tag{13}$$

where Ea is the activation energy of the decay reaction (J/mol); R is the ideal gas constant (8.314 J/mol/K); T is the absolute temperature (K); and C is a constant related to the standard state setting. Therefore, the decay rate constant and half-life at different sewage temperatures can be calculated using Eqs. (14) and (15):

$$\ln\frac{k_2}{k_1} = \frac{E_a}{R}\left(\frac{1}{T_1}-\frac{1}{T_2}\right) \tag{14}$$

$$t_{1/2,2} = t_{1/2,1}\,\exp\!\left[\frac{E_a}{R}\left(\frac{1}{T_2}-\frac{1}{T_1}\right)\right] \tag{15}$$

where k1 is the decay rate constant at temperature T1; k2 is the decay rate constant at temperature T2; t1/2,1 is the half-life at temperature T1; and t1/2,2 is the half-life at temperature T2.
The decay kinetics of the virus in the sewage also affects the detection sensitivity. According to the linear correlation between ln k and 1/T [Eq. (13)], the decay activation energy (Ea) of SARS-CoV-2 RNA in sewage can be deduced from the reported decay rate constants at different temperatures [39]. The decay rate constant at each temperature is calculated by introducing Ea into Eq. (14). The impact of temperature and HRT on detection sensitivity is evaluated.
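The Arrhenius treatment can be sketched in Python as a straight-line fit of ln k against 1/T, followed by extrapolation of k to another temperature. The rate constants in the example are hypothetical placeholders, not the values reported in [39], and the function names are illustrative. Half-lives are obtained from the usual first-order relation t1/2 = ln 2 / k.

```python
import math

R_GAS = 8.314  # ideal gas constant, J/mol/K


def fit_activation_energy(temps_k, rate_constants):
    """Least-squares fit of ln k = -Ea/(R*T) + C; returns (Ea, C)."""
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(k) for k in rate_constants]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope equals -Ea/R
    intercept = y_mean - slope * x_mean    # intercept equals C
    return -slope * R_GAS, intercept


def rate_constant_at(k1, t1_k, t2_k, ea):
    """Decay rate constant at t2_k given k1 at t1_k and Ea (J/mol)."""
    return k1 * math.exp(ea / R_GAS * (1.0 / t1_k - 1.0 / t2_k))


def half_life(k):
    """Half-life of a first-order decay with rate constant k."""
    return math.log(2.0) / k


if __name__ == "__main__":
    # Hypothetical rate constants (per day) at 4, 15, 25 and 37 degrees C.
    temps = [277.15, 288.15, 298.15, 310.15]
    ks = [0.08, 0.11, 0.18, 0.29]
    ea, c = fit_activation_energy(temps, ks)
    k_20 = rate_constant_at(ks[0], temps[0], 293.15, ea)
    print(ea, c, k_20, half_life(k_20))
```

Fitting all available temperatures at once, instead of using a single pair of points, keeps the estimate of Ea less sensitive to noise in any one measurement.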

Critical stages and key points
The critical stages and key points involved in the wastewater-based epidemiology of SARS-CoV-2 are summarized in Table 1. Many factors need to be considered in both relative and absolute quantification, but relative quantification offers several advantages. The ultimate purpose of determining the virus content in sewage is to estimate the infection rate at the population level. The results of absolute quantification are sensitive to fluctuations in the sewage generation rate (Vs) and to the dilution effect of rainfall events. Because the reference and target biomarkers are diluted to the same extent when the sewage generation rate increases or rainfall occurs, relative quantification is much less affected. In our previous work, qPCR was successfully used to quantify the prevalence of enteropathogenic E. coli (EPEC) in sewage. The absolute quantification data fluctuated over two orders of magnitude during one year, whereas the relative quantification data changed by no more than one order of magnitude [40]. The relative quantitative data more accurately reflected the changes of EPEC in the sewage. In relative quantification, it is not necessary to accurately record the volume of sample used for virus recovery and nucleic acid extraction, nor to accurately determine the volume of the final nucleic acid extract. In addition, no target gene standard is needed to construct a standard curve. Even so, to ensure the reproducibility of results and the standardization of methods, essential information such as the sample processing procedure, qPCR protocol and data analysis method should be provided when reporting, as recommended in previously published guidance [41].
One of the major challenges in applying WBE to the community monitoring of COVID-19 is the establishment of standardized methods and procedures. Among them, the recovery of virus from sewage samples is the first and most important step. As an enveloped virus, however, SARS-CoV-2 differs in nature from nonenveloped viruses, which affects its partitioning behavior in sewage. Therefore, the concentration method should be adjusted accordingly [42]. According to our knowledge and reported studies, the virus adsorbed on sewage particles cannot be neglected and may even account for the major portion of SARS-CoV-2 in the sewage [17,43]. By far the highest virus content ever detected in sewage-related samples was in primary sewage sludge, with a value as high as 4.6×10^8 RNA copies/L [14].
Simultaneous recovery of virions from both liquid and solid phases also gave higher detection sensitivity (higher PSIP values), as shown in Table 2 [19,44]. However, we do not recommend directly using the virus RNA abundance in the primary sewage [...] to person, which also fluctuates throughout the infection period [8,10]. To take these factors into account, the range of the infection rate should be estimated with, for example, a Monte Carlo model [18]. In addition, the decay kinetics and recovery efficiency of viruses and reference biomarkers from sewage also need to be considered, as elaborated in the following sections. Even under relatively loose experimental operation standards, the relative abundance data obtained by different laboratories can be compared horizontally. In addition, the relative abundance data obtained from sewage catchments of different population sizes can be used directly to compare the severity of epidemics among communities. The results can be used to guide regional policy adjustment, whether toward reopening the economy or continuing lockdown. For a specific community within a sewage catchment, the parameters (ηs, ηt, rs and rt) in Eq. (6) can be considered constants when a fixed sampling mode, virus concentration method, nucleic acid extraction method and standardized qPCR relative quantification operation are used. The equation can then be simplified to

$$I_r = K \cdot R_a$$

where K is a constant that lumps these parameters together. If accurate clinical diagnosis data are available, the constant K can be accurately estimated by correlating the actual infection rates with the relative abundances of virus RNA in sewage. With sufficient basic data, WBE can even be combined with machine learning to further improve the prediction accuracy of the community infection rate.
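One simple way to estimate the constant K from paired clinical and sewage data is a least-squares fit of a line through the origin, sketched below in Python. The through-origin fit is an assumption consistent with the proportional relation between infection rate and relative abundance, not a procedure prescribed in this work. The data in the example are hypothetical and the function names are illustrative.

```python
def fit_k(relative_abundances, infection_rates):
    """Least-squares slope through the origin for Ir = K * Ra."""
    num = sum(ra * ir for ra, ir in zip(relative_abundances, infection_rates))
    den = sum(ra * ra for ra in relative_abundances)
    if den == 0.0:
        raise ValueError("relative abundances must not all be zero")
    return num / den


def estimate_infection_rate(relative_abundance, k):
    """Infection rate predicted from a measured relative abundance."""
    return k * relative_abundance


if __name__ == "__main__":
    # Hypothetical paired observations from one catchment.
    ra_obs = [1.0e-4, 2.0e-4, 4.0e-4]
    ir_obs = [2.1e-5, 3.9e-5, 8.0e-5]
    k = fit_k(ra_obs, ir_obs)
    print(k, estimate_infection_rate(3.0e-4, k))
```

Because K absorbs the decay and recovery terms, it is only valid for the catchment, sampling mode and laboratory protocol under which it was calibrated.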
Besides the above key points, which may affect the performance of the quantification, biosafety-related factors also need to be considered. Researchers in Italy and Germany have shown that the infectivity of SARS-CoV-2 in sewage is limited [15,17]. However, the possibility of fecal-oral transmission cannot be completely ruled out.
Furthermore, it seems that SARS-CoV-2 can be effectively removed in the sewage treatment process. Researchers from China, Spain, India and Italy have detected the RNA of SARS-CoV-2 in the influent of sewage treatment plants, but not in the effluent [15,16,20,45]. That is to say, the sampling location may affect the detectability of the virus. Personal protection (N95 mask, gloves, goggles, etc.) during sampling and pasteurization of samples (60 °C, 30-90 min) before virus concentration are essential for biosafety reasons.

Performance of primer-probe sets
A total of 18 currently used primer-probe sets were evaluated in this work (Supplementary Table S1). Their positions in the SARS-CoV-2 genome are illustrated in Supplementary Figure S1. The mutation rate of SARS-CoV-2 genome at the probe/primer targeting regions is shown as a heatmap in Figure 2A. The original data are available as online supplementary datasets (Supplementary Dataset S1 and S2).
Four primer-probe sets with high inclusivity (low mutation rate at their targeting regions, marked in green in Figure 2A) and high exclusivity (more mismatches with the sequence of RaTG13, Figure 2B) are recommended (Table 3). The sequences of some primers were amended according to the sequence alignment and/or Primer-BLAST analysis, and the primers were renamed accordingly. Although we carefully evaluated these primer-probe sets, some mutant virus sequences in the database, especially those uploaded after April, still did not perfectly match the primer/probe sequences. Therefore, we suggest that two or three primer-probe sets be used simultaneously in practice to ensure the accuracy of detection results.

Reference biomarkers
To realize the relative quantification of SARS-CoV-2, a population-related reference biomarker needs to be chosen. There are numerous assays available for the quantitative assessment of human fecal pollution [46], which can meet this demand.
Those qPCR-based microbial source tracking (MST) methods targeting mitochondrial DNA, rRNA and functional genes involved in microorganism-human interaction are favorable candidates. In Table 4, we recommend some promising qPCR assays that have been proved to be effective for the detection of human-associated MST markers.
Genes from three main sources can be used as reference genes for relative quantification: genes from human hosts (mitochondria) [47], genes from human-associated bacteria (Bacteroidales) [48] and bacteriophages (crAssphage) [33]. As shown in Table 4, the MST method targeting the reference gene should have sufficiently high sensitivity (true-positive rate) and specificity (true-negative rate). Furthermore, the reference gene must be abundant enough in feces or waste to ensure that it can still be detected in the sewage even at a high dilution level.
The validity of reference biomarkers (reference genes) can be verified by comparing the abundance of different reference biomarkers. The premise of this self-consistent validation is that all reference biomarkers are population related. Dilution will not affect the relative abundance between different reference biomarkers. Therefore, this relative abundance should not fluctuate dramatically over time.
The result can also support the feasibility of relative quantification of SARS-CoV-2. In addition, a reasonably selected reference gene is itself a good internal control for the whole method, which can exclude, to a certain extent, the influence of concentration efficiency, PCR inhibitors and other factors.

Factors affecting the detectability
First, the personal virus shedding amount and the per capita sewage production directly affect the virus concentration in the sewage, and thus the detectability. Virus shedding varies from person to person. Even for the same infected person, virus shedding changes over the course of the infection. Personal sewage production fluctuates with the seasons. According to previous reports, the load of SARS-CoV-2 in feces is in the range of 10^3-10^8 RNA copies/g [8,12,24]. We assume a fecal load (Mf) in the range of 100-400 g feces/person/day and a sewage generation rate in the range of 100-400 L/person/day [27]. The detection sensitivity (PSIP) calculated with Eq. (7) is as high as 4×10^5 (Figure 3), which means that a single infected person is detectable among a population of 4×10^5 under ideal conditions: the highest load of virus in feces (10^8 RNA copies/g), the highest fecal load (400 g feces/day/person) and the lowest sewage generation rate (100 L/person/day). With decreasing personal virus shedding and increasing sewage generation rate, the detection sensitivity gradually decreases to PSIP = 0.25, which means that with extremely low virus shedding (1×10^5 RNA copies/day/person) and high sewage production (400 L/person/day), even sewage from a single infected person alone may test negative. In practice, however, such low virus shedding is likely to arise in one of two situations: either the person has just been infected and the virus has not yet replicated in large quantities, or the patient has recovered from the disease and virus shedding has fallen to a very low level. These two types of infection do limited harm to the community, so even if they are not detected, they will not have a great impact on disease surveillance in the community.
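The spread of PSIP between these two extremes can be explored with a small Monte Carlo sketch in Python. The sampling distributions are assumptions made for illustration only: the fecal virus load is drawn log-uniformly over 10^3-10^8 copies/g, and the fecal load and sewage generation rate are drawn uniformly over 100-400. The PSIP expression used is Er·Mf / (1000·LOD·Vs), with the factor 1000 converting L to mL.

```python
import random


def simulate_psip(n_draws=10000, seed=0, lod=1.0):
    """Draw PSIP values under assumed input distributions."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        er = 10.0 ** rng.uniform(3.0, 8.0)   # copies/g feces, log-uniform
        mf = rng.uniform(100.0, 400.0)       # g feces/person/day
        vs = rng.uniform(100.0, 400.0)       # L sewage/person/day
        draws.append(er * mf / (1000.0 * lod * vs))
    return draws


def percentile(values, q):
    """Simple empirical percentile for q in [0, 1]."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]


if __name__ == "__main__":
    samples = simulate_psip()
    for q in (0.05, 0.5, 0.95):
        print(q, percentile(samples, q))
```

Replacing the assumed distributions with measured shedding profiles would turn the same loop into a catchment-specific sensitivity estimate.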
On the other hand, the detectability is also affected by the decay kinetics of the virus and the HRT. The decay of the virus may be due to biological and chemical activities in sewage [49]. Pasteurization can slow the decay of virus in sewage by eliminating bacterial extracellular enzyme activity and protozoan or metazoan predation [42]. Recently, Ahmed et al. reported the decay kinetics of SARS-CoV-2 in pasteurized and unpasteurized sewage [39]. We extracted the decay kinetics data of SARS-CoV-2 RNA in unpasteurized sewage from Table 3 [...] also be discharged into the sewage system, such as sputum. The virus content in sputum is as high as 2.35×10^9 RNA copies/mL [8]. Therefore, the shedding amount of virus into the sewage from infected persons should be higher than what we have estimated. A previous simulation study showed that the detection of one infected case among a population of 2,000,000 is theoretically feasible [27]. However, according to our estimation, a population on the order of 100,000 is a more realistic scale for detecting a single infected person. An experimental study also showed that sewage samples started to give positive RT-qPCR signals for virus genes when the observed COVID-19 prevalence was around or even below one case in 100,000 people [22]. According to our calculations, under optimal conditions, one infected person can be detected in a population of 400,000 by means of WBE. For Chengdu, a city with 16 million residents, a reasonable layout of 40 to 160 sampling points would be sufficient for epidemic surveillance of SARS-CoV-2 across the entire city.
Detection on such a small scale (40-160 samples) could even be completed routinely in our laboratory every day, enabling real-time monitoring of the epidemic situation for the entire city.

Mimic study on the decay kinetics of biomarkers
The decay kinetics of biomarkers (target or reference genes) in sewage can be studied via microcosm experiments in the laboratory [39]. Freshly collected sewage samples are spiked with a certain number of biomarkers (bacteria, bacteriophages, or inactivated SARS-CoV-2 virions) and incubated at a set temperature. qPCR is again used to track the changes of biomarkers in the sewage. [...] Pseudomonas phage ϕ6 [42]. According to the measured temperature of the sewage samples, introducing the corresponding t1/2 and the estimated HRT (t) into Eqs. (9) and (6) will give a reasonable estimate of the real infection rate in the sewage catchment from the experimentally determined relative abundance of SARS-CoV-2. In addition, if a bacteriophage is chosen as the reference biomarker, its decay behavior in sewage may be similar to that of SARS-CoV-2 because of their similar properties [50].
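The analysis of such a microcosm time series can be sketched in Python as a linear regression of ln(copies) against time, from which the first-order rate constant, the characteristic decay times and the fraction remaining after a given HRT follow. The sketch assumes first-order decay, and the data in the example are synthetic. All function names are illustrative.

```python
import math


def fit_decay_constant(times_h, copies):
    """Fit ln(copies) = ln(n0) - k*t by least squares; returns (k, n0)."""
    ys = [math.log(c) for c in copies]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(ys) / n
    stt = sum((t - t_mean) ** 2 for t in times_h)
    sty = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, ys))
    slope = sty / stt
    return -slope, math.exp(y_mean - slope * t_mean)


def decay_times(k):
    """Half-life, T90 and T99 of a first-order decay, in the units of 1/k."""
    return {
        "t_half": math.log(2.0) / k,
        "T90": math.log(10.0) / k,
        "T99": 2.0 * math.log(10.0) / k,
    }


def remaining_fraction(k, hrt_h):
    """Fraction of biomarker left after a hydraulic retention time hrt_h."""
    return math.exp(-k * hrt_h)


if __name__ == "__main__":
    # Synthetic qPCR readings (copies per reaction) over 72 h.
    times = [0.0, 12.0, 24.0, 48.0, 72.0]
    readings = [1.0e6 * math.exp(-0.05 * t) for t in times]
    k, n0 = fit_decay_constant(times, readings)
    print(k, n0, decay_times(k), remaining_fraction(k, 6.0))
```

Running the same fit for the reference biomarker and for the virus gives the two decay terms needed to correct the measured relative abundance for losses in the sewer.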

Summary of the technical framework
The technical framework of WBE for SARS-CoV-2 is illustrated in Figure 4.

Conclusion
 By incorporating human-specific fecal biomarkers (genes) as internal reference of qPCR, a technical framework of WBE based on relative quantification has been established for monitoring the dissemination of SARS-CoV-2.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information
Supplementary Table S1. Information of RT-qPCR primer-probe sets analyzed in this work.
Acknowledgement table for downloaded genome sequences from the GISAID EpiCoV database.
Table 1. Critical stages involved in the WBE of SARS-CoV-2 and key points to be considered. Some of these important factors have been summarized in a previous review [51].
Reporting the results: type of reported values (relative abundance of viral genes, CT values or gene copies/mL wastewater); estimating the community epidemic burden (infection rate); uncertainty of quantification; validation and efficiency of the method; estimating the daily distributed lag of the detection [14].
*: To realize the relative quantification of the target virus (gene), a reference biomarker (gene) whose content is closely related to population size needs to be chosen.
The experimental data of the decay kinetics of SARS-CoV-2 RNA are available in reference [39].