Application of VirCapSeq-VERT and BacCapSeq In the Diagnosis of Presumed and Definitive Neuroinfectious Diseases

Background: Unbiased high-throughput sequencing (HTS) has enabled new insights into the diversity of agents implicated in central nervous system (CNS) infections. The addition of positive selection capture methods to HTS has enhanced sensitivity while reducing sequencing costs and the complexity of bioinformatic analysis. Here we report the use of virus capture-based sequencing for vertebrate viruses (VirCapSeq-VERT) and bacterial capture sequencing (BacCapSeq) in investigating CNS infections. Design/Methods: Thirty-four samples were categorized into three groups: (1) patients with definitive CNS infection by routine testing; (2) patients meeting the clinical Brighton Criteria (BC) for meningoencephalitis; (3) patients with presumptive infectious etiology highest on the differential. RNA extracts from cerebrospinal fluid (CSF) were used for VirCapSeq-VERT and DNA extracts were used for BacCapSeq analysis. Results: Among 8 samples from known CNS infections in group 1, VirCapSeq-VERT and BacCapSeq confirmed 3 expected diagnoses (37.5%), were negative in 2 (25%), yielded an alternative result in 1 (12.5%), and were negative, as expected, for the pathogen in the remaining 2. The confirmed cases identified HHV-6, HSV-2, and VZV, while the negative samples included JCV and HSV-2. In groups 2 and 3, 11/26 samples (42%) were positive for at least one pathogen; however, 27% of the total samples (7/26) were positive for commensal organisms. No microbial nucleic acids were detected in negative control samples. Conclusions: HTS showed limited promise for pathogen identification in presumed CNS infectious diseases in our small sample. Before larger-scale prospective studies are conducted to assess the clinical value of this novel technique, clinicians should understand the benefits and limitations of this modality.


Introduction
High-throughput sequencing (HTS) is a promising approach to the diagnosis of neurological infections, as up to 60% of meningoencephalitis cases remain undiagnosed after routine clinical testing. 1,2,3 Because it can test for a multitude of pathogens in a single assay without the specific targeted primers used in consensus polymerase chain reaction (PCR) tests, HTS offers a broad-spectrum, multi-target diagnostic method. 4 HTS has the potential to identify infectious agents that are beyond a clinician's initial presumptive differential diagnosis, particularly if the organism is novel, is a reemerging zoonotic pathogen, or is rarely attributed to CNS disease. This can be particularly useful in immunocompromised patients, who are prone to CNS infections. However, HTS also has several limitations in clinical practice, including high cost, risk of environmental contamination, background from host genomic sequences, and limited availability across clinical-resource settings. 5 In this study, we applied HTS to cerebrospinal fluid (CSF) samples from patients across diagnostic categories, including those thought to have neurological infections with unidentified pathogens and those with definitive CNS infections diagnosed by routine hospital-based diagnostic tests. Specifically, we utilized two capture-based enrichment techniques for HTS: (1) for viral pathogen identification, VirCapSeq-VERT, a technology with approximately 2 million biotinylated probes covering 207 viral taxa known to infect vertebrates 6 ; and (2) for bacterial pathogen identification, BacCapSeq, which contains 4.2 million biotinylated probes covering 307 bacterial species, including all known pathogens infecting humans. 7

Methods
A retrospective chart review was conducted on a cohort of patients admitted with presumed or definitive neuroinfectious or neuroinflammatory conditions who underwent lumbar puncture and had their CSF samples bio-banked at New York Presbyterian-Columbia University Medical Center/Children's Hospital of New York (NYP-CUMC/CHONY) between 2017 and 2019. Bio-banked CSF specimens, which had been stored at -80 °C soon after collection, were then analyzed.
Patients were selected based on chart review of the electronic medical record (EMR). The protocol was approved by the CUIMC institutional review board, and appropriate consent was obtained from all included patients. Thirty-four samples were selected in three categories: (1) patients with definitive CNS infection by standard hospital-based clinical testing; (2) patients without a definitive microbiological diagnosis via standard testing who met the Brighton Criteria (BC) for meningitis or encephalitis 8 ; (3) patients who neither received a definitive microbiological diagnosis nor met BC but had infection highest on the differential of the primary clinical team. CSF samples were de-identified and transferred to the laboratory, maintaining the cold chain, for VirCapSeq-VERT and BacCapSeq analyses. RNA extracts from CSF were used for VirCapSeq-VERT and DNA extracts were used for BacCapSeq analysis. HTS analysis was completed as part of usual clinical care during the admission or shortly after discharge. Laboratory teams running HTS were blinded to all clinical data.
Nucleic acids were extracted from 300 µL of CSF from each of the 34 patients using the AllPrep DNA/RNA Mini Kit (Qiagen, Hilden, Germany), with DNA and RNA each eluted in 50 µL. Negative controls included 2 CSF samples from healthy controls without any known infectious disease and 3 buffer extraction controls (no-template controls) for background read filtration in further analyses. Individual VirCapSeq-VERT libraries were prepared using the Hyper Prep Kit (KAPA Biosystems, Boston, MA, USA) and unique barcodes. RNA extract was DNase treated (DNase I, Ambion, Life Technologies), and first-strand cDNA was generated using SuperScript III and random hexamer primers for reverse transcription (Invitrogen, Life Technologies). The cDNA was treated with RNase H before second-strand cDNA synthesis, which used random primer extension and Klenow enzyme (New England Biolabs, Ipswich, MA, USA). The resulting double-stranded cDNA preparations were enzymatically sheared (KAPA Biosystems, Roche) into fragments with an average size of 200 bp for HTS. These fragments were purified using Agencourt AxyPrep AMPure Beads 9 , and libraries were prepared using the HyperPlus Library Prep Kit following the manufacturer's protocol. The final libraries were quantified using the Agilent TapeStation System. The libraries were then pooled and hybridized with the VirCapSeq-VERT probe set prior to a final PCR and sequencing on the Illumina HiSeq 4000 system.

Library preparation and capture sequencing with BacCapSeq
Individual BacCapSeq libraries were prepared from the extracted DNA, also using the Hyper Prep Kit (KAPA Biosystems, Boston, MA, USA). Extracted DNA was fragmented by enzymatic shearing to generate segments with an average size of 200 bp, followed by the same steps as for VirCapSeq-VERT sequencing. The libraries were pooled and hybridized with the BacCapSeq probe sets, followed by a final PCR and sequencing on the Illumina HiSeq 4000 system.

Data Analytics and Bioinformatics Pipeline
VirCapSeq-VERT and BacCapSeq sequencing on the Illumina platform generated 100 bp single-end reads, which were demultiplexed into individual samples using bcl2fastq software (v1.8.4) for further analysis. Illumina adapter sequences were trimmed from the demultiplexed FASTQ files using cutadapt (v1.8.3) 10 ; reads were then quality filtered and end trimmed with PRINSEQ software (v0.20.3) 11 and cleaned of human host sequences using the Bowtie 2 mapper (v2.2.9) 12 . Quality-filtered, host-subtracted reads were assembled using the MIRA assembler (v4.0). Assembled contiguous sequences and unique singletons were subjected to homology searches against the entire GenBank nucleotide database using MegaBLAST.
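The first three read-processing steps described above (adapter trimming, quality filtering, host subtraction) can be sketched as a list of command lines. This is only an illustrative sketch, not the study's actual pipeline: the file layout, the function name `build_preprocessing_commands`, the quality thresholds of 20, and the example adapter and index names are all assumptions, since the text does not report the parameters that were used.

```python
def build_preprocessing_commands(sample, adapter, host_index, workdir="work"):
    """Return argv lists for adapter trimming, quality filtering and host subtraction.

    All paths and thresholds are hypothetical; the paper does not state them.
    """
    raw = f"{workdir}/{sample}.fastq"
    trimmed = f"{workdir}/{sample}.trimmed.fastq"
    qc_prefix = f"{workdir}/{sample}.qc"          # PRINSEQ appends ".fastq" to this prefix
    nonhost = f"{workdir}/{sample}.nonhost.fastq"

    return [
        # 1. Remove the 3' adapter sequence from single-end reads.
        ["cutadapt", "-a", adapter, "-o", trimmed, raw],
        # 2. Drop low-quality reads and trim low-quality 3' ends
        #    (threshold of 20 is an assumed value).
        ["prinseq-lite.pl", "-fastq", trimmed,
         "-min_qual_mean", "20", "-trim_qual_right", "20",
         "-out_good", qc_prefix, "-out_bad", "null"],
        # 3. Align to the human reference and keep only reads that do NOT align.
        ["bowtie2", "-x", host_index, "-U", f"{qc_prefix}.fastq",
         "--un", nonhost, "-S", "/dev/null"],
    ]


if __name__ == "__main__":
    # Example values are assumptions chosen for illustration only.
    for argv in build_preprocessing_commands("S01", "AGATCGGAAGAGC", "GRCh38"):
        print(" ".join(argv))
```

Each list could be passed to `subprocess.run(argv, check=True)` on a machine where the tools are installed. Keeping the commands as data makes it easy to log exactly what was run for each sample.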
All contiguous sequences and unique singletons from the assembly that returned no hit with MegaBLAST were subjected to NCBI BLASTx. Based on the MegaBLAST and BLASTx results, viral and bacterial reference sequences matching the Illumina reads and contigs were downloaded from NCBI and used for mapping to recover partial or complete genomes and to determine the genome consensus sequence and breadth of coverage. The FASTQ reads were then imported into Geneious 10.0.9 (https://www.geneious.com), and the reads were trimmed to remove low-quality sequences. The map-to-reference tool was used to assemble the reads against the reference genomes from GenBank.

Table 1) In one of these cases, Pseudomonas was identified in addition to the originally diagnosed Human betaherpesvirus 6B (HHV-6B). All of these cases involved immunocompromised participants with active use of immunosuppressive therapy (steroids, CAR-T therapy). In one of the definitive cases of varicella zoster virus (VZV), HTS identified an alternative pathogen (GBV-C/human pegivirus) and failed to align sequences to VZV. In the other definitive VZV case, diagnosed by positive VZV antibodies, HTS accurately identified VZV sequences in CSF and additionally found torque teno virus. These two cases occurred in immunocompromised participants with histories of bone marrow transplantation and HIV, respectively. The remaining four definitively diagnosed samples (4/8; 50%) were negative. One of these participants had originally been diagnosed with Cryptococcus neoformans by antigen test; therefore, the negative result by VirCapSeq-VERT and BacCapSeq was as expected. Another sample was initially positive for Borrelia burgdorferi on serological testing; the negative sequencing result was also expected, given that the serological result was likely obtained later in infection, when the nucleic acid had been degraded.
The remaining two samples had original diagnoses of JC virus and HSV-2 by standard clinical testing; however, VirCapSeq-VERT/BacCapSeq yielded no sequences for these agents. A majority of the patients in this cohort were immunocompromised (6/8, 75%), and 4/6 (67%) of them had a pathogen detected on VirCapSeq-VERT/BacCapSeq. In 3 of these 6 samples (50%), sequencing confirmed the original diagnosis based on gold standard testing.

Groups 2 and 3: Suspected CNS Infections
Of the samples from patients with suspected CNS infection, 6 met the Brighton Criteria classification for meningitis/encephalitis and were designated as group 2, and 20 that did not meet the Brighton Criteria were designated as group 3. Four of six (67%) group 2 samples were positive for a pathogen, while 7/20 (35%) group 3 samples were positive. Overall, 11/26 (42%) of the samples were positive for at least one pathogen by VirCapSeq-VERT/BacCapSeq analysis. Of the positive samples, 7/11 (64%) had commensal organisms (Bacteroides, Herbaspirillum, Faecalibacterium, etc.) as at least one of the organisms detected on HTS.
Within group 3, 6 samples (30%) returned one bacterial species, while 1 returned multiple (>2) bacterial species. Two samples (10%) were positive for viral pathogens. One of these originated from an HIV-positive patient and returned positive for HIV on HTS testing; the second returned hepatitis E virus. Fifteen samples (57.7%) were negative for any pathogen on HTS testing. The most commonly identified organisms in these cohorts were Bacteroides (3/11), Herbaspirillum (2/11), and Faecalibacterium (2/11) (Table 3).
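As a quick arithmetic check, the detection rates for groups 2 and 3 can be recomputed from the counts reported above. The sketch below uses only those counts; the helper name `percent` and the dictionary layout are illustrative choices, not part of the study's analysis.

```python
def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal place."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return round(100 * numerator / denominator, 1)


# (positive samples, total samples) per group, as reported in the text.
positives = {"group 2": (4, 6), "group 3": (7, 20)}

rates = {name: percent(pos, total) for name, (pos, total) in positives.items()}

# Pool both groups to obtain the overall positive and negative rates.
combined_pos = sum(pos for pos, _ in positives.values())
combined_total = sum(total for _, total in positives.values())
rates["groups 2 and 3"] = percent(combined_pos, combined_total)
negative_rate = percent(combined_total - combined_pos, combined_total)

for name, value in rates.items():
    print(f"{name}: {value}% positive")
print(f"groups 2 and 3: {negative_rate}% negative")
```

With these counts the positive rates work out to 66.7% for group 2, 35.0% for group 3, and 42.3% overall, with 57.7% of the pooled samples negative.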

Discussion
While some evidence has indicated the promise of HTS for efficient pathogen detection in suspected CNS infections, the practical clinical application of the technique remains understudied. The potential of HTS in microbial pathogen detection is underscored by advantages unique to the platform, including "hypothesis-free" testing that does not rely on patient details or microbial knowledge, identification of varied strains of target pathogens, and the ability to detect non-culturable or slow-growing organisms that may be missed by traditional modalities. However, one of the most challenging aspects of HTS as a clinical tool is the interpretation of data. HTS produces a large amount of complex data, requiring high-quality bioinformatics pipelines. Because of the untargeted nature of HTS, background interference must be filtered out, including highly repetitive nucleotide sequences, poor-quality and redundant sequences, and human host data. 13 About 98% of CSF sequencing data maps to the human genome and must be removed. 14 Additionally, patient samples must be handled with care to prevent cross-sample and environmental contamination. 15,16 VirCapSeq-VERT and BacCapSeq have advantages over traditional HTS methods: (1) increasing sensitivity by enriching viral and bacterial reads 10- to 1,000-fold; (2) reducing host reads in the final output through target enrichment for viruses and bacteria; and (3) reducing the required sequencing depth many-fold, along with the cost per sample. Within our study, HTS detected pathogens in a considerable portion of our suspected CNS infection group (42%). However, most of these detections were of commensal organisms and potential contaminants that have been described previously. 15,16
While it may appear that patients without a definitive microbiological diagnosis who met the Brighton Criteria for meningoencephalitis (group 2) had a higher rate of pathogen detection, many of the pathogen reads require deeper analysis, contextualization, and interpretation. For example, we identified Herbaspirillum in the CSF of 2 patients based on the number of reads. Despite the high read counts, the literature indicates that Herbaspirillum is typically a benign bacterium found in soil that has also been shown to be present in molecular biology grade water and in laboratory reagents used in PCR and DNA extraction kits. 17 While the sensitivity of this technique enables scientists to detect rare organisms against a high human host background, environmental and reagent contamination is detected just as readily and may be misinterpreted. When the nucleic acid input is low, which is typically the case when sequencing CSF, contaminants tend to be overamplified, leading to high read counts for non-causative organisms. 18 Although there are advanced molecular techniques that can address this hurdle, it is unlikely that these methods would be used in a clinical setting. 13 Just as traditional clinical test results require clinical context, potential pathogens identified through HTS are best interpreted in clinical context. In a previous study evaluating the use of HTS in clinical sequencing, a 'clinical microbial sequencing board' of experts in the field (neurology, ID, HTS, lab specialists) was established to discuss the data produced through HTS in the context of the clinical features of the case. 19 In our patient sample, we selected patients with clinical suspicion for CNS infection, and thus were able to interpret the potential significance of the HTS reads in the context of the clinical picture.
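The reasoning above, that high read counts alone do not distinguish a pathogen from a reagent contaminant, can be illustrated with a small filtering sketch. This is not the method used in this study: the function name `flag_likely_contaminants`, the 10-fold threshold, the pseudo-count of 1, and all read counts in the example are invented for illustration. The idea is simply to compare each genus against the no-template controls and against a list of known reagent contaminants before treating it as a candidate for clinical review.

```python
def flag_likely_contaminants(sample_reads, control_reads, known_contaminants,
                             min_fold=10.0):
    """Label each genus in a sample as 'candidate' or 'likely contaminant'.

    A genus is flagged when it appears on the known-contaminant list or when
    its read count is less than `min_fold` times the count seen in the
    negative controls. Controls with zero reads are given a pseudo-count of 1,
    so very low read counts are also flagged. All thresholds are assumptions.
    """
    labels = {}
    for genus, reads in sample_reads.items():
        background = max(control_reads.get(genus, 0), 1)
        if genus in known_contaminants or reads / background < min_fold:
            labels[genus] = "likely contaminant"
        else:
            labels[genus] = "candidate"
    return labels


if __name__ == "__main__":
    # Hypothetical read counts, invented purely for illustration.
    sample = {"Herbaspirillum": 5200, "Varicellovirus": 340, "Bacteroides": 40}
    controls = {"Herbaspirillum": 900, "Bacteroides": 12}
    known = {"Herbaspirillum"}
    for genus, label in flag_likely_contaminants(sample, controls, known).items():
        print(f"{genus}: {label}")
```

In this toy example Herbaspirillum is flagged despite its high read count because it is on the contaminant list, Bacteroides is flagged because it is barely above the control background, and only the viral genus remains a candidate. A filter like this can only prioritize results; the final interpretation still depends on the clinical picture.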
This study had several limitations. The primary limitation was the variation among samples in the time between sample collection and the start of sequencing. There was no standardized time point at which HTS was performed, and in several cases HTS was performed as a final measure in the search for an etiologic diagnosis. Given that pathogens may be undetectable in CSF in later stages of illness, and that CNS symptoms may appear only in the post-acute period of infection, when genetic detection is not always possible, the utility of HTS as an early primary diagnostic tool is not fully captured in our study. This is especially illustrated by the samples in Group 1 that were diagnosed by serological tests (e.g., Lyme disease). Given that these antibody tests may yield positive results in later phases of disease, when nucleic acid is degraded, sequencing may be less useful for diagnoses that rely on serologies. This was a retrospective study in which CSF was obtained via lumbar puncture and frozen before being sent to the laboratory. Nucleic acid degradation resulting from prior freeze-thaw steps may produce shorter, sparser sequencing reads, which makes it difficult to identify the overlapping regions that allow de novo assembly into longer contiguous sequences and consequently lowers the sensitivity of HTS. 20,21 In this study, we utilized VirCapSeq-VERT, a technology with probes targeting viral taxa known to infect vertebrates, as well as BacCapSeq, which contains probes covering bacterial species including all known pathogens infecting humans. 6,7 As a result, we were unable to detect potential fungal pathogens. Additionally, all of the sequences in this study underwent homology searches against the entire GenBank nucleotide database using MegaBLAST followed by NCBI BLASTx, which is a widely used approach. 21 Other studies in non-clinical settings have utilized limited-target pipelines that can account for novel pathogen detection and aid in the interpretation of sequencing data.
Our study analyzed the use of these techniques in a very small cohort of patients (34), which could introduce error and variability into our results and significantly limits their generalizability. Lastly, the platforms utilized in this analysis were first described in 2015 and 2018. Additional technologies have since been developed and investigated; however, the efficacy of these more recent techniques is not reflected in our study.
While there are many potential applications of HTS, there are caveats that may hinder its ability to aid in the diagnosis of CNS infections in clinical settings. These include cost, institutional capacity, false-positive reads resulting from contamination, timing of sample collection, sample degradation, and a lack of clear criteria for diagnosis. HTS methods and workflows must be standardized. The establishment of a validated and universal bioinformatics pipeline would also aid in the clinical application of HTS. An a priori database of common reagent contaminants, drawn from the published literature as well as from those seen in specific laboratory environments, should be established to further aid in the interpretation of HTS results. Overall, HTS showed limited promise for pathogen identification in presumed CNS infectious diseases in our small sample. Larger-scale prospective studies should be conducted to fully assess the clinical value of this novel technique and to support its potential incorporation into clinical guidelines.