How to Anticipate Public Health Indicators Using Wastewater SARS-CoV-2 Concentration During the COVID-19 Pandemic. Madrid Region Case Study

This study analyzes the presence and dynamics of SARS-CoV-2 in the sewage sanitation systems of the Madrid region. The statistical results and data generated are presented daily via a platform as a tool for the early detection of SARS-CoV-2 and its spread, based on the wastewater-based epidemiology (WBE) approach. SARS-CoV-2 concentration is measured at a total of 289 sampling points that collect the discharges of more than six and a half million inhabitants. The project was developed by Canal de Isabel II, the water utility company of Madrid. The research evaluates the correlation between SARS-CoV-2 concentration in wastewater and the following public health indicators: incidence rate, reported active cases and COVID-19 hospitalization data (regular hospitalization and ICU admission cases). SARS-CoV-2 presence and dynamics in wastewater show a strong connection with both 14-day incidence rates with active infection and reported COVID-19 hospitalizations. A lag varying from 3 to 8 days between wastewater presence and hospitalizations is explained by the fact that the virus is found in the feces of patients before symptom onset. The resulting data are available for consultation on the company’s website (under the name VIGÍA project) as well as on the regional government’s websites. The results have already been useful to anticipate the second and third COVID-19 waves in Madrid. Information is shared daily with health authorities for consultation and decision-making. The results are available as an aggregation for the entire region and for each sewershed.


Introduction
The aim of this study is to analyze the presence and dynamics of SARS-CoV-2 in the sewage sanitation system of the Madrid region as a tool for the early detection of SARS-CoV-2 spread, following a wastewater-based epidemiology approach, with the goal of supporting government decision-making. It is named the VIGÍA project.
Canal de Isabel II is the company responsible for water infrastructure management and maintenance in the Madrid region, which has more than six and a half million inhabitants in 179 municipalities.
Canal de Isabel II started monitoring the spread of SARS-CoV-2 in wastewater from Madrid at the end of March 2020, when there were approximately 10,000 daily confirmed cases and a total of 850 deaths had already occurred in Spain [3]. The spread of the virus has been monitored for over a year (April 2020 to today) for the full population of the region. The methodology was tested and upgraded during the first months of the project by means of pilot tests. Today the system is well established and delivers results in both numerical and graphical formats. This manuscript presents this methodology and the correspondence found between SARS-CoV-2 concentration in wastewater and different public health indicators.

Materials And Methods
Due to the novelty of the case study, Canal de Isabel II ran several pilot tests to determine adequate onsite sampling points on the sewer network. The aim is to monitor the full population of the Madrid region in terms of SARS-CoV-2 presence.

Sample locations' selection criteria
It was concluded that at least 289 onsite sampling points were required to collect sewage from more than six and a half million inhabitants [4]. The selection of sampling locations followed these rules:
A single sampling point could collect up to 25,000 equivalent inhabitants.
No sampling point could be located further than 3.5 km from the discharging population center.
No sampling point could be located further than 2.5 km from the upstream sampling location.
A sampling location coincides with a manhole either along the sewer network or at the inlet of the wastewater treatment plant. Ease of access for the person taking the sample was also a determining criterion, so the water level is usually accessible from above ground and the sampler does not need to enter the sewer pipe.
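The three selection rules above can be expressed as a simple check. The following is a minimal sketch, not the tool used in the project: the record type `CandidateLocation`, its field names and the function `violated_rules` are hypothetical, and only the three thresholds come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

# Thresholds taken from the selection rules stated in the text.
MAX_EQUIVALENT_INHABITANTS = 25_000
MAX_KM_TO_POPULATION_CENTER = 3.5
MAX_KM_TO_UPSTREAM_LOCATION = 2.5


@dataclass
class CandidateLocation:
    """Hypothetical record for a candidate manhole (field names are assumptions)."""
    name: str
    equivalent_inhabitants: int
    km_to_population_center: float
    km_to_upstream_location: Optional[float] = None  # None: no upstream location


def violated_rules(candidate: CandidateLocation) -> List[str]:
    """Return the names of the selection rules that the candidate breaks."""
    violations = []
    if candidate.equivalent_inhabitants > MAX_EQUIVALENT_INHABITANTS:
        violations.append("population")
    if candidate.km_to_population_center > MAX_KM_TO_POPULATION_CENTER:
        violations.append("distance_to_population_center")
    upstream = candidate.km_to_upstream_location
    if upstream is not None and upstream > MAX_KM_TO_UPSTREAM_LOCATION:
        violations.append("distance_to_upstream_location")
    return violations
```

An empty list means the candidate satisfies all three rules; accessibility of the manhole, which the text also mentions, is a field judgment and is not modeled here.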

Madrid region: the full sewershed to sample
The Madrid region consists of 179 municipalities with a total of 6,779,888 registered inhabitants [4]. Wastewater is treated in 157 WWTPs, each with its own sewershed [5], served by 16,605 km of sewer pipes. A WWTP is the outlet of a sewershed. The municipality of Madrid alone has eight sewersheds. According to Figure 1, from upstream to downstream, every WWTP discharges untreated sewage into the downstream sewer network that leads into the next WWTP, confirming that the sewer network of Madrid follows a tree scheme.
For the current study, a total of 289 subsewersheds were defined (from now on sample locations or SL), whose 289 outlets are manholes (Figure 2, left). Polygons represent the population centers discharging into a sample location. There can be one or more sample locations per municipality depending on population density, with 51 in the city of Madrid alone (Figure 2, right). The higher the population density, the greater the number of sample locations. Because of the tree scheme of the sewer network, most downstream sample locations collect wastewater coming from upstream sample locations.

Sampling campaign: fieldwork and laboratory analysis
The large number of samples and their onsite storage requirements, the urgency and impact of the study, and accessibility to the sewer network posed a logistical challenge when defining the sampling campaigns. Three laboratories are currently responsible for sampling SARS-CoV-2 presence: one belongs to Canal de Isabel II, while the other two are outsourced. Onsite, each laboratory relies on a team of one to three field workers.
The sampling strategy must follow a weekly frequency to minimize the variability that results show during the day; furthermore, samples should be taken at a fixed time at every point to better capture the dynamics from week to week [6,7].
For this reason, to capture a reliable, fixed picture of the virus presence, the sampling frequency was set to a weekly basis. Samples are analyzed according to the following ground rules:
Samples from the same sewershed will always be analyzed in the same laboratory.
Wastewater will always be collected on the same weekday and hour. This is subject to case-by-case exceptions due to uncontrolled discharges, weather conditions or infrastructure operations that might require a resampling process.
• The onsite sampling process: a route plan
A route to collect samples is planned daily, always in the same order and at the same hour of the day. At every sample location, 1 liter or half a liter of wastewater is collected, depending on the laboratory. Water temperature and conductivity are measured onsite, and then the bottle is refrigerated. When the collection along a route is complete, the samples are sent to the laboratory.
For each grab sample, the laboratory analyzes the remaining physicochemical parameters (chemical oxygen demand (COD), chloride levels and electrical conductivity) as well as the SARS-CoV-2 concentration by means of qPCR. The results are received within 24 to 48 hours, and the process to evaluate their reliability then starts. Resampling is requested when a sample is not reliable enough; however, this does not mean the current sample must be ruled out, since the resample may also validate the first sample. The onsite sampling process for each of the 289 assigned points follows the routine detailed in Table 1, which indicates the weekly process followed by a laboratory when analyzing any set of samples to give a reliable result.
Table 1. Sampling week routine (including resampling requests for some locations from the initial sampling set)
The flow chart in Figure 3 defines the decisions made after the initial sampling. It describes the guidelines for accepting a value as valid and for entering it into the VIGÍA application as a correct, resampled or extrapolated result (the latter derived from previous samples).
Sometimes it is necessary to resample some locations to confirm the results. For this reason, it is common to analyze from 320 to 340 samples every week instead of the 289 initially expected. The 30 to 50 additional samples are due to resampling requirements; they are collected on a separate route during regular hours, as soon as the request has arrived.
After a year of experience, several upgrades were applied to the study. One concerns 13 polygons with low population density and therefore low representativeness: these sample locations are currently sampled every two weeks, while another 17, belonging to high-density areas or selected because of their particular interest, are sampled twice a week.
Overall, from 10 to 20% of locations are sampled twice a week, with two to three days in between. More than 17,000 sample analyses have been performed to date. Each laboratory receives 20 to 30 sewage bottles; in total, approximately 70 samples are sent daily, distributed among the laboratories as shown in Table 2.

Statistical analysis and data validation
The results in terms of SARS-CoV-2 concentration and physicochemical parameters were subjected to an in-depth analysis against historical and current data before inclusion in the study.
While analysts wait for the counter-analysis, the SARS-CoV-2 concentration is provisionally extrapolated [8] from previous results to allow the overall analysis of the whole region.
The Madrid sewerage system is mainly combined, which means that wastewater (both domestic and industrial) flows through the same sewers as stormwater [9]. It is still unclear how SARS-CoV-2 virus from infected people interacts with these effluents, but in anticipation of unusual dilution that could affect virus detectability, physicochemical parameters were also monitored for outlier detection [10,11].
The virus presence is evaluated at a local level, which may range from part of a municipality to a full municipality. In general, the results are expressed per municipality, which sometimes requires combining or aggregating data from several locations. Only in the last stage of the analysis are the results from all sample locations aggregated at the regional level to provide virus trends in (gc/L)/100,000 inhabitant units.
• Analysis at the sample location level (Madrid city or municipalities)
The virus dynamics were evaluated in terms of trend variation; therefore, the magnitude of the virus concentration was not relevant to the study.
Given a new sample's result, the virus concentration is compared with the extreme ranges obtained from one year of historical data. Extreme trends [12,13] are computed from the percentage change of SARS-CoV-2 concentration. After the whole period of study, it can be determined that every sample location has characteristic physicochemical parameters, so whenever a sample falls outside the extreme virus concentrations, its physicochemical parameters are also compared with their extreme values. If the sample results are outliers, resampling is requested. The more relevant the sampling point, the faster the resampling is carried out.
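The comparison of a new result with historical extreme trends can be sketched as follows. This is an illustration under stated assumptions, not the project's procedure: the text does not say how the extreme range is derived, so here it is assumed to be the minimum and maximum sample-to-sample percentage change seen in the location's history. The function names are hypothetical.

```python
from typing import List, Tuple


def percent_change(previous: float, current: float) -> float:
    """Percentage change from the previous concentration to the current one."""
    if previous <= 0:
        raise ValueError("previous concentration must be positive")
    return 100.0 * (current - previous) / previous


def historical_change_range(history: List[float]) -> Tuple[float, float]:
    """Assumed extreme range: min and max change between consecutive samples."""
    changes = [percent_change(a, b) for a, b in zip(history, history[1:])]
    return min(changes), max(changes)


def is_extreme_trend(history: List[float], new_value: float) -> bool:
    """True when the change to new_value falls outside the historical range."""
    low, high = historical_change_range(history)
    change = percent_change(history[-1], new_value)
    return change < low or change > high
```

A result flagged by `is_extreme_trend` would then be checked against the location's physicochemical parameters before a resample is requested, as described above.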
The most relevant physicochemical parameter for the detection of unusual wastewater composition seems to be COD (Figure 4). This parameter is the most sensitive in the rainfall scenario, in which the wastewater is more diluted than under the usual presence of pollutants from industrial effluents. Other physicochemical parameters analyzed are chloride and electrical conductivity.
All of this decision-making follows the flow chart in Figure 3.
When a sample has values outside the established minimum and maximum ranges of the physicochemical or virus concentration parameters, the location is resampled [14]. The aim is to corroborate or discard the initial PCR results within 2 to 3 days [15]. Figure 5 shows an example of outlier detection based on COD.
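The range check that triggers a resample can be sketched as below. The parameter names and the numeric limits in the usage example are hypothetical placeholders; the text states only that each location has established minimum and maximum values.

```python
from typing import Dict, List, Tuple


def out_of_range_parameters(
    sample: Dict[str, float],
    ranges: Dict[str, Tuple[float, float]],
) -> List[str]:
    """Return the parameters whose value lies outside the location's range.

    `ranges` maps a parameter name to its (minimum, maximum) for one location.
    Parameters missing from the sample are skipped rather than flagged.
    """
    flagged = []
    for name, (minimum, maximum) in ranges.items():
        if name not in sample:
            continue
        if not (minimum <= sample[name] <= maximum):
            flagged.append(name)
    return flagged


# Usage with made-up limits for a single, hypothetical location.
location_ranges = {"cod_mg_l": (200.0, 900.0), "sars_cov_2_gc_l": (1e3, 1e6)}
diluted = {"cod_mg_l": 120.0, "sars_cov_2_gc_l": 5e4}
needs_resample = bool(out_of_range_parameters(diluted, location_ranges))
```

Keeping one range table per location reflects the observation in the text that comparisons are only meaningful within the same manhole.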
Through this study, it has been found that very low SARS-CoV-2 concentrations are not necessarily associated with very dilute samples. Rather, dilute samples are often related to inconsistent SARS-CoV-2 concentrations. An example is the first flush or resuspension phenomena, in which low COD values occur together with high SARS-CoV-2 concentrations. Because of this, it was determined that each manhole has its own representative COD range, so comparisons are always made at the same location.
All historical data are represented homogeneously (Figure 6), where the x-axis (Equation 1) and y-axis (virus concentration) represent normalized values. The x-axis indicates the ratio of the sample result to the mean of the historical series, expressed as a percentage. Samples that, as in the case shown, have a COD value outside the established range have been discarded (Figure 6, red color).

The data have been rescaled to a standardized variable (Equation 1) to represent each sampled manhole:
$$ x = \frac{\text{sample result}}{\text{mean of the historical series}} \times 100 \qquad (1) $$
The result of calculating this ratio of the sample result to the mean of the historical series, expressed as a percentage, can be seen in Figure 6.
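As a worked illustration of the rescaling described in the text (the ratio of a sample result to the mean of the historical series at the same manhole, as a percentage), a minimal sketch follows. The function name and the numbers in the example are hypothetical.

```python
from statistics import mean
from typing import List


def percent_of_historical_mean(value: float, history: List[float]) -> float:
    """Rescale a sample result to a percentage of the manhole's historical mean."""
    historical_mean = mean(history)
    if historical_mean == 0:
        raise ValueError("historical mean must be non-zero")
    return 100.0 * value / historical_mean


# Made-up historical series for one manhole: the mean is 400, so 300 maps to 75%.
rescaled = percent_of_historical_mean(300, [200, 400, 600])
```

Because every manhole is divided by its own mean, locations with very different absolute levels can be drawn on the same axis.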
COD is not always the determining parameter. Field observations are often much more important, for example uncontrolled discharges, unusual colors of the wastewater or unexpectedly low flows. In these situations, the sample is discarded. A location may end up not being sampled during a whole week, either because of sampling failure or because of highly diluted wastewater. In those rare cases, the gap is filled in with the last valid data.
To this point, virus trends have been represented at the local level by means of a trend curve for each of the 289 locations. The interval between records ranges from 2 or 3 to 7 days. Local authorities used it as a complementary tool for decision-making on perimeter confinement during the lockdown.
• The analysis at the regional level (Madrid region)
To show the weekly SARS-CoV-2 presence in wastewater in the Madrid region, all sample points are aggregated into one curve. The results are normalized per 100,000 inhabitants to be aligned with the health department and local statistics.
The dataset for the different municipalities is aggregated and normalized according to the total population. The resulting aggregation was then compared to daily hospitalizations and incidence rates to validate the methodology at the beginning of the project and afterwards to provide useful weekly findings to the regional authorities. Since September 23rd, 2020, the incidence rate has not been publicly available, so the aggregated curve is currently compared to daily COVID-19 hospitalizations and intensive care unit admissions.
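The text does not give the aggregation formula, so the following is only one possible reading, offered as a sketch: concentrations are summed over locations and divided by the total population expressed in units of 100,000 inhabitants. The function name and input layout are hypothetical, and the sketch ignores the fact that downstream locations also receive wastewater from upstream ones.

```python
from typing import Iterable, Tuple


def regional_signal_per_100k(locations: Iterable[Tuple[float, int]]) -> float:
    """Aggregate (concentration_gc_per_l, population) pairs into one value.

    Assumed formula: sum of concentrations / (total population / 100,000),
    giving a figure in (gc/L) per 100,000 inhabitants.
    """
    total_concentration = 0.0
    total_population = 0
    for concentration, population in locations:
        total_concentration += concentration
        total_population += population
    if total_population <= 0:
        raise ValueError("total population must be positive")
    return total_concentration / (total_population / 100_000)


# Two made-up locations: 3e5 gc/L in total over 50,000 inhabitants.
example = regional_signal_per_100k([(2.0e5, 20_000), (1.0e5, 30_000)])
```

A population-weighted mean would be an equally plausible reading; whichever formula is used, it should be applied identically every week so that the trend remains comparable.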
The results of virus dynamics at the regional level are generated on a daily basis.
o Incidence rates
The study began by analyzing reported active incidence rates, which became an important source of information on the dynamics of the pandemic. These data were obtained from the local health authority. The correlation between SARS-CoV-2 concentration in sewage and 14-day active incidence rates (AIR) with active infection was analyzed. A person is counted as a case with active infection if they meet one of the following conditions [16]:
1. Persons with COVID-19 symptoms plus a confirmed positive active infection test.
2. Persons with a positive diagnostic test for active infection and a negative IgG antibody test or no symptoms.
This information provided a better fit and correlation for assessing the spread of infection when contrasted with the study developed by Canal de Isabel II. As shown in Figure 8, a strong correspondence was found between the 14-day incidence rate with active infection and the SARS-CoV-2 concentration (gc/L per 100,000 inh.). However, local authorities have not provided incidence rate data since September 23rd, 2020.
Since a subset of points was sampled each day of the week, the results were extended to a daily frequency with two considerations:
1. The signal was assumed to be constant from one sample to the next.
2. When the interval between two samples exceeded 7 days (due to resampling or any fieldwork difficulties), the previous results were extrapolated to fill in the missing information by applying the slope of the moving average series of the two previous results.
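The two considerations above can be sketched as a small gap-filling routine. Several details are assumptions made for illustration only: days are integer indices, the slope is taken directly between the two most recent results (a simplification of the "moving average series" mentioned in the text), the extrapolation starts from the last value once the 7-day window is exceeded, and negative values are clipped to zero. The function name is hypothetical.

```python
from typing import Dict


def extend_to_daily(
    samples: Dict[int, float], last_day: int, max_hold_days: int = 7
) -> Dict[int, float]:
    """Extend sparse results {day_index: concentration} to one value per day."""
    sample_days = sorted(samples)
    daily: Dict[int, float] = {}
    for i, day in enumerate(sample_days):
        value = samples[day]
        # Fill up to the day before the next sample (or up to last_day).
        is_last = i + 1 == len(sample_days)
        end = last_day + 1 if is_last else sample_days[i + 1]
        # Slope between the two most recent results (assumed simplification).
        if i > 0:
            prev_day = sample_days[i - 1]
            slope = (value - samples[prev_day]) / (day - prev_day)
        else:
            slope = 0.0
        for d in range(day, end):
            overdue = d - day - max_hold_days
            if overdue <= 0:
                daily[d] = value  # rule 1: hold the signal constant
            else:
                daily[d] = max(0.0, value + slope * overdue)  # rule 2
    return daily
```

With weekly samples the second rule never triggers; it only matters when a location misses its usual slot.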
Data can be consulted by sampling location, municipality or by the whole region. Currently, Canal de Isabel II is also providing data on a daily basis showing SARS-CoV-2 trends. This is developed thanks to the use of historical counter samples.

Results
Sewage from a total of 289 sample locations has been characterized in terms of SARS-CoV-2 concentration, which helps to anticipate confirmed hospitalizations by 3 to 11 days. The results showed a strong correlation between the presence of SARS-CoV-2 in wastewater and different epidemiological indicators. In this section, a comparative analysis with reported cases and hospitalizations is presented.
When pooling the SARS-CoV-2 concentration observed at the sampling points corresponding to the Madrid capital and the rest of the region, the second and third waves developed differently. Madrid capital experienced similar peak concentrations for both waves, while the shape of the second wave showed a much smoother downward trend. For the rest of the region, the third wave almost doubled the peak of the second wave, as shown in Figure 7.
A significant association was found between the concentration of SARS-CoV-2 in sewage and 14-day incidence rates with active infection (Figure 8). However, local authorities have not provided public incidence rate data since September 2020 [2].
Hospitalization and ICU admissions data: correlation and prediction
To assess the capacity of wastewater as an early warning indicator, the aggregated SARS-CoV-2 concentration was also compared with reported new COVID-19 hospitalizations and ICU admissions. The daily series of hospitalizations has a strong weekly seasonality, as shown in Figure 9, so a 7-day moving average is preferred as a more robust indicator. The offset that can be seen in the graph between the presence of SARS-CoV-2 in sewage and hospitalizations arises because the virus is already present in feces some days before symptom onset. According to the prediction, the lead time could range from 3 to 11 days at the locations studied [6,17]. Additionally, the predictive value of sewage as an indicator of SARS-CoV-2 may differ depending on several factors, such as location, size, population associated with the catchment, sampling strategies, temperature, and others.
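The comparison between the wastewater signal and smoothed hospitalizations at different lags can be sketched as follows. This is a minimal illustration, not the analysis code behind the reported results: it assumes two aligned daily series of equal length, uses the Pearson coefficient as the measure of linear correlation, and scans lags from 0 to 11 days (the upper bound quoted in the text). The function names are hypothetical.

```python
from math import sqrt
from typing import List, Tuple


def moving_average(values: List[float], window: int = 7) -> List[float]:
    """Trailing moving average; the first window-1 days are dropped."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length series with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (sqrt(var_x) * sqrt(var_y))


def best_lag(
    wastewater: List[float], hospitalizations: List[float], max_lag: int = 11
) -> Tuple[int, float]:
    """Lag (in days) at which wastewater[t] best matches hospitalizations[t + lag]."""
    best = (0, float("-inf"))
    for lag in range(max_lag + 1):
        shifted = hospitalizations[lag:]
        n = min(len(wastewater), len(shifted))
        r = pearson(wastewater[:n], shifted[:n])
        if r > best[1]:
            best = (lag, r)
    return best
```

In practice the hospitalization series would first be passed through `moving_average`, and the wastewater series trimmed by the same number of leading days so that both remain aligned before calling `best_lag`.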
However, different lags for the second and third waves were tested, resulting in six- and five-day lags giving the best linear correlation between both series for the second and third waves, respectively, as shown in Table 3 and Figure 11.
As has been shown, the Madrid region has had an early warning tool for the current COVID-19 pandemic for over a year (April 2020 to today). It is based on the wastewater-based epidemiology approach, with more than 17,000 sample analyses, approximately 325 per week. The region of Madrid is tested daily in terms of SARS-CoV-2 concentration, and reports are released on a weekly basis. Therefore, data on each of the 289 sampling locations are available once or twice a week, depending on the location's importance to the study. To obtain reliable results, the sampling point selection process set the following rules: a single sampling point could collect up to 25,000 equivalent inhabitants, no sampling point could be located further than 3.5 km from the discharging population center, and no sampling point could be located further than 2.5 km from the upstream sampling location.
Given a location, sampling must occur at the same hour and on the same weekday, except for resamples, which occur one or two days after the latest sample. The laboratory result is always validated against physicochemical parameters to detect compositions that may be unusual. A strong connection was found between SARS-CoV-2 presence and dynamics in wastewater and both the 14-day incidence rates with active infection and reported COVID-19 hospitalizations.
Canal de Isabel II shares the resulting information with health authorities daily. The results can be presented as an aggregation for the entire region, per municipality or per sampled population center. This project has become the baseline for developing a permanent epidemiological surveillance system that will be based on 87 of the 289 sample locations. This is possible thanks to the tree scheme of the sewer network. The current full set of locations provides geographically positioned virus presence. Once the pandemic ends, it is considered that it will be sufficient to know whether the virus is present at the primary locations. Whenever presence is detected, the monitoring system will start a more detailed surveillance from the upstream locations. All 289 locations were active at some point.
Canal de Isabel II is currently training personnel to take on laboratory testing [19] at full capacity instead of outsourcing part of it, as happens today. A study on how SARS-CoV-2 decays in raw wastewater is also under development. Canal de Isabel II has numerical hydraulic models of the sewer network, one per municipality, in which the existing network is evaluated in terms of hydraulic capacity for short- to long-term development. These models are currently used to test a set of new theoretical pollutants assumed to represent SARS-CoV-2, with the aim of defining the virus decay rate based on historical data at the sample locations.
Finally, there is an ongoing pilot test to determine if composite samples could be more representative than grab samples. Technicians are assessing the PCR relationships between both sample types collected with automatic refrigerated samplers.
Molecular detection and concentration of SARS-CoV-2 fragments in wastewater show a strong connection with both 14-day incidence rates with active infection and COVID-19 hospitalizations reported in the following 3 to 8 days. Therefore, monitoring this parameter would be useful to develop health strategies and optimize human resources where they may be most needed.

Figure 3. Guidelines for decision-making throughout the process
Figure 4. Observed distribution of physicochemical parameters (EC, COD, chloride)
Figure 5. Outlier detection based on COD
Figure 8. New COVID-19 14-day incidence rates with active infection per 100,000 inh. in the Madrid region
Figure 9. Hospitalization data for the Madrid region