For the first time, we report an amplicon-based high throughput sequencing (HTS) approach that incorporates the DADA2 algorithm, enabling pneumococcal strain resolution and longitudinal characterization of carriage dynamics based on SNPs in the novel plyNCR marker. Our method is less labor intensive and more accurate than conventional methods, as it allows for direct detection of major and minor (low abundance) pneumococcal strain variants and therefore, circumvents the need of picking and analyzing individual pneumococcal colonies to fully grasp the extend of diversity. Also, our method does not require prior culture enrichment which may disrupt the original bacterial composition as certain variants may outcompete others.
Our results further demonstrate the plyNCR's potential as a top-performing pneumococcal marker with an in-silico specificity among only S. pneumoniae strains as well as a high molecular (PCR) performance (100% PPV and 100% specificity) matching that of the gold standard (lytA qPCR). The modest rates achieved in sensitivity (94.2%) and NPV (97.8%) appear to stem from the use of a conserved amount of template DNA (10 µl), which in light of the assay's previously described detection limit (1 pg) at a higher DNA amount (31.4 µl) , may explain the reduced sensitivity and NPV. Nevertheless, compared to previous marker discoveries like the SPN9802 [27, 28] and SP2020 , the plyNCR shows more promise, given an absence of misidentified strains (false positives) during molecular assessment, as well as its ability to easily discriminate pneumococci from closely related VGS, which may harbor plyNCR homologues at a size difference of ~200 bp.
To our knowledge, this is the first carriage study that provides adequate resolution of "unique" nasopharyngeal pneumococcal strains in a longitudinal infant sampling without a prior culture step. Only two other studies have attempted to characterize S. pneumoniae at the strain-level using infant longitudinal samples [3, 4]. However, both studies employed a metagenomic approach, which requires: (i) a general enrichment process for Streptococcus sp., (ii) a high coverage that is mostly outbalanced by host DNA, (iii) a short-read sequencing library requiring assembly, and (iv) a more computationally intensive (and mostly database-dependent) approach , all of which may be cost-limiting for large scale studies. On the other hand, our amplicon sequencing strategy utilizes a variable, species-specific marker that can be: (i) amplified directly from clinical samples, (ii) sequenced as long reads on a 3rd generation platform (PacBio) with an accuracy of ≥99.8%, and (iii) easily resolved as true genetic strain variants down to a single nucleotide difference via the DADA2 pipeline.
The parallel use of cPCR enabled serotype identification directly from NS samples and allowed serotypes to be primarily linked to most strain (plyNCR ASV) profiles. However, co-colonizing serotypes were unidentified in 45% of NS samples, demonstrating a significant limitation of this method for multiple serotype detection, as previously shown [2, 31]. The microarray, which is currently considered the best method for detecting multiple serotypes , notably overcomes these limitations, as it is able to detect all known serotypes, including NTs and other bacterial species as well as their relative abundances, though at a higher cost and with a culture-amplification step, which is not required for cPCR.
By itself, the plyNCR does not offer any serotype information for resolved strains in carriage and likewise, serotyping does not provide any genomic background details for pneumococci in carriage. The advantage of the plyNCR for carriage studies is therefore complemented by serotyping, as it confirms the identity of a particular strain (plyNCR ASV) and allows instances of new acquisitions (with an identical serotype, but a different strain) and re-acquisitions to be easily identified. Thus, plyNCR combined with serotyping offers a more detailed insight into carriage dynamics including duration.
Based on our findings, the median age to first acquisition (147 days) was much higher in our study compared to reports from higher carriage countries such as, South Africa, Malawi, Indonesia and the Gambia [24, 26, 32, 33]. Given higher rates of PCV uptake in Switzerland , the longer time to acquisition in our birth cohort may therefore reflect reduced chances for pneumococcal transmission. Presence of two or more siblings in the home was the only risk factor associated with the first pneumococcal acquisition, supporting previous reports of older siblings being the transmission sources for infants [35, 36]. We additionally observed a trend of longer carriage durations for the first acquired pneumococcal episode compared to subsequent episodes, which is similar to a recent study in Indonesia . Longer duration of carriage has been shown to be positively correlated with more prevalent serotypes and negatively associated with age , which may partially explain our findings, as 66% of first acquisitions were the more prevalent non-PCV13 serotypes.
Contrary to a previous study, which demonstrated stabilized co-colonization rates during the pre-vaccine (2004-2005) and PCV7 era (2006-2009) in Switzerland , our current findings indicate that the rate of co-colonization is ~2x higher since PCV13 implementation (2010-2014) in the country. Interestingly, our estimated rate of 17.8% is not far off when compared to studies conducted during similar periods in the Netherlands (22%) , Portugal (25%)  and Nepal (20.2%) . Although, it remains unclear whether overall co-colonization rates are affected by PCV selection pressure or simply unmasking following the use of more sensitive detection measures, the lower rates observed in the previous Swiss study may imply an underestimation bias resulting from sampling acute otitis media (AOM) and pneumonia patients, who represent populations with compromised nasopharyngeal flora , compared to healthy (asymptomatic) young children, who are considered the major reservoir and drivers of pneumococcal transmission [36, 42]. At the same time, methodological differences may have also contributed to differing estimations.
The most common pneumococcal co-colonization dynamic observed in this birth cohort was a stable co-colonization, with varied relative strain abundances, indicating continued inter-strain competition. This is contrary to a recent study in Indonesian infants , where displacement of the primary colonizer by the secondary serotype was more common. As there is no national PCV program in Indonesia, a disparity in the serotype distributions of both countries may explain the observed differences in co-colonization dynamics. This is mostly evidenced by reported high carriage episodes of serotype 6B in Indonesia , which has all but disappeared from Swiss carriage isolates . As the Indonesian study did not include any information on the serotypes involved in co-colonization, no further comparisons can be made between both studies. However, the identification of serotype 15B/C as a frequent initial/primary co-colonizer [39, 40], is consistent with the observations in this study.
Our study has a number of limitations. First, the NS samples used here were collected by the parents up to a decade ago and as a result, we cannot rule out the possibility of a substandard collection as well as potential sample degradation from previous frequent use of the samples in earlier studies. Second, it is possible that the unavoidable addition of the PacBio Universal sequencing overhangs to the plyNCR primers may have reduced the amplification sensitivity, especially in the presence of low template DNA. This is mostly supported by successful PCR amplifications (without overhangs) in 50% of the 14 NS samples that failed to amplify with the use of overhang primers (results not shown). This trade-off in specificity due to the overhangs, suggests that additional PCR optimization may be required for studies seeking to apply PacBio SMRT amplicon sequencing to their research. Third, there was no plyNCR ASV output for two samples, despite successful plyNCR amplification, suggesting a preferential sequencing bias for homologues, which were observed when the DADA2 length filter was lifted (results not shown). We are yet to estimate the frequency of this event in future studies. Fourth, sequencing had to be repeated for one amplicon, but no technical reason was found for its failure during troubleshooting. Therefore, the cost of PacBio amplicon sequencing could be exacerbated by random repetitions as well as the required sequencing depth for variant detection during multiplexing. However, this limitation is applicable to all HTS approaches. Lastly, a major limitation of this study was the absence of a culture-based serotyping method like the Quellung or microarray for cPCR verification. This was not feasible as the nasal swabs used in this study were not intended for the investigation of viable bacterial cultures. Thus, our use of the cPCR was therefore unavoidable, though we argue that despite its limitations, cPCR still provided reliable serotyping results due to our study's close sampling time frame (biweekly swabs), where previous and/or ensuing timepoints with identical serotypes and plyNCR ASVs provided additional support and justification for serotype attribution, even in 84% of co-colonizing samples.
Despite these limitations, our amplicon-based sequencing approach was successful in resolving strain profiles of S. pneumoniae in the nasopharynx and distinguishing them from those of the viridans group streptococci. In addition, our infant cohort represents an intensively sampled study population, preventing underestimation bias in carriage estimates and allowing us to closely track changing carriage and co-colonization dynamics over a 12-month period.