Genomic Features and Comparative Genomic Analysis of Streptococcus sp. v1. nov., Isolated from an Endophthalmitis Patient

Endophthalmitis is an acute inflammatory intraocular condition that can cause permanent vision loss. The treatment strategy and visual outcome partly depend on identification of the causative pathogens. In this study, metagenomic sequencing was conducted to investigate the microbial composition and antibiotic resistance genes (ARGs) in the vitreous (intraocular body fluid) of an endophthalmitis patient whose disease progressed rapidly and was accompanied by severe pain. Metagenomic sequencing data revealed a low-diversity vitreous microbiome dominated by Streptococcus. This strain harbors ARGs mainly against beta-lactams, macrolide-lincosamide-streptogramin, and multiple drugs. Additionally, a metagenome-assembled genome of Streptococcus sp. v1. nov. was recovered. Tetra Correlation Search (TCS) analysis showed that the closest relative of Streptococcus sp. v1. nov. was Streptococcus mitis SK321. Pan/core genome analysis of Streptococcus sp. v1. nov. and the top 25 TCS hit strains revealed that most unique genes of Streptococcus sp. v1. nov. were linked to the ATP-binding cassette transport system, which could indicate unique virulence and pathogenic potential. In addition, a total of 7 virulence factors were identified, most of which were classified as “offensive virulence factors”. The high pathogenicity of Streptococcus sp. v1. nov. could be a reason for the patient’s rapid disease progression. This study is the first to identify a highly virulent ocular pathogen using metagenomic sequencing and bioinformatics analysis, and it provides an important reference for revealing the composition and genome characteristics of pathogens in endophthalmitis patients in the future.


Introduction
Endophthalmitis is a severe sight-threatening intraocular infection following eye trauma or surgery [1]. The pathogens may be introduced from the patient's own microbiota, or through contaminated solutions or instruments used during ophthalmic surgery [2,3]. In the majority of cases, pathogens are detected by culture of vitreous and/or aqueous humor. However, traditional culture methods have limitations in identifying hard-to-culture and slow-growing microorganisms. Culture-negative endophthalmitis is not uncommon, with an incidence between 20 and 30% [4]. Therefore, the spectrum of microorganisms involved in endophthalmitis is not fully described or characterized. A recent metagenomic sequencing study of the vitreous of patients with endophthalmitis after cataract surgery or intravitreal injection showed that the vitreous contained a variety of microorganisms [5]. In addition, a rare case of Streptococcus suis endophthalmitis was identified in vitreous samples.
Meiqin Zheng and Yutong Kang have contributed equally to this work.
Due to the low pathogen detection rate of traditional culture methods and polymerase chain reaction (PCR) [7,8], differentiating infectious endophthalmitis from sterile endophthalmitis clinically is difficult. Deshmukh et al. detected microorganisms in 30/34 (88%) patient cases using next-generation sequencing (NGS), whereas all 30 controls were negative for bacteria and fungi [9]. For sterile endophthalmitis, steroid treatment may yield better results than antibiotics. Metagenomic sequencing can identify the entire microbial community, including archaea, bacteria, fungi, and viruses, and is expected to be the most important tool for improving the diagnosis and treatment of infectious diseases [10-12]. 16S rDNA sequencing has also been used to characterize the bacterial composition of exogenous endophthalmitis [13,14]. According to recent reports, rapid nanopore targeted sequencing has also been used to detect pathogens in patients with endophthalmitis [15-17]. In addition, advances in sequencing throughput and computing technology make it possible to recover culture-independent genomes from metagenomes [18].
Here, metagenomic sequencing was conducted on a vitreous sample from a post-traumatic endophthalmitis patient to reveal the composition of its pathogens and ARGs. We recovered a high-quality draft genome of "Streptococcus sp. v1. nov." by metagenome assembly and genome binning, and performed subsequent in silico determination of its taxonomic affiliation, phylogenetic relationships, and potential virulence factors. In the future, metagenomic sequencing and bioinformatics analysis could identify ocular pathogens and antibiotic resistance more quickly, which would contribute to precise clinical treatment and reduce blindness. High-quality genomes of unculturable or difficult-to-culture eye pathogens can also be obtained through metagenomic binning.

Sample Collection, DNA Extraction, and Metagenomic Sequencing
The vitreous (intraocular body fluid) sample was obtained from a patient with endophthalmitis attending the Optometry and Eye Hospital Affiliated to Wenzhou Medical University. The patient was a 62-year-old man who presented with trauma to the right eye caused by an iron splinter. On examination, there was a Y-shaped, full-thickness corneal perforation wound. He had received emergency debridement and suturing, anterior chamber flushing, magnet aspiration of the eye globe, vitreous cavity injection (vancomycin 1 mg/0.1 ml), systemic anti-inflammatory and antibiotic therapy (intravenous cefazolin and hydroprednisone), and topical antibiotic therapy (tobramycin and dexamethasone combination eye drops, levofloxacin ophthalmic solution, gatifloxacin ophthalmic gel, and pranoprofen eye drops). However, the effect was unsatisfactory. Microscopic examination of smears revealed gram-positive cocci in the vitreous aspirate and vitreous humor. Because of rapidly progressing inflammation, severe pain, and poor visual acuity, the patient decided to undergo surgery again, and the right eye was enucleated.
Genomic DNA of the vitreous sample was extracted using the MicroElute Genomic DNA Kit (Omega, Guangzhou, China). Library construction and whole-metagenome shotgun sequencing were performed using a 2 × 150 bp paired-end protocol on the HiSeq X10 platform (Novogene, Beijing, China).
A circular genome map was drawn using the CGView server (http://stothard.afns.ualberta.ca/cgview_server/). Repetitive sequences were identified using the RepeatMasker and RepeatModeler pipeline [32]. Digital DNA-DNA hybridization (DDH) values were calculated using the Genome-to-Genome Distance Calculator (GGDC) (https://ggdc.dsmz.de/). Average nucleotide identity (ANI) was determined using FastANI software [33]. Core-pan-genome analysis was performed using the Bacterial Pan Genome Analysis (BPGA) pipeline [34]. We then used Evolview to visualize a phylogenetic tree of the core gene sequences, which was constructed with the unweighted pair-group method with arithmetic means (UPGMA) [35]. We used BPGA software to access the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and the Clusters of Orthologous Groups (COG) database.
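To make the UPGMA step concrete, the following is a minimal sketch of average-linkage clustering on a small distance matrix. The four labels and their pairwise distances are hypothetical illustration values, not distances from this study, and the function name `upgma` is introduced here only for illustration; the tree in this work was built with BPGA and visualized with Evolview.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster labels by UPGMA and return the topology as nested tuples.

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    """
    clusters = {name: 1 for name in names}  # cluster label -> number of leaves
    d = dict(dist)                          # working copy of the distances
    while len(clusters) > 1:
        # Pick the pair of current clusters with the smallest distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # The distance from the merged cluster to each remaining cluster is the
        # size-weighted mean of the two old distances (arithmetic mean over leaves).
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Hypothetical distances between four made-up genomes (illustration only).
toy_names = ["A", "B", "C", "D"]
toy_dist = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}
toy_tree = upgma(toy_names, toy_dist)
print(toy_tree)
```

With these toy distances the sketch joins A and B first, then adds C, and finally D. Branch lengths are omitted to keep the example short.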
For variation analysis, we used Streptococcus mitis SK321 as the reference strain. Single nucleotide polymorphisms (SNPs) and small insertions and deletions (InDels) were called using the Genome Analysis Toolkit (GATK) software suite [36]. Low-confidence variants were filtered using the VariantFiltration tool in GATK and the varFilter option in Samtools [37]. Structural variation (SV) was detected using BreakDancer [38], and CNVnator was applied to identify copy number variation (CNV) [39].

NCBI Accession Numbers
The whole-genome sequence data of Streptococcus sp. v1. nov. have been deposited in NCBI under accession number PRJNA658170.

Microbial Community Revealed by Metagenomic Sequencing
Metagenomic shotgun sequencing of the endophthalmitis sample revealed its microbial community. After quality control, adapter removal, and trimming of low-quality reads, 1.56 million non-human reads were obtained using Bowtie2 for subsequent analysis. After taxonomic annotation, only bacteria were identified. The bacterial sequences were classified into 2 phyla, 2 genera, and 4 species. The microbial community was dominated by Streptococcus_mitis_oralis_pneumoniae (97.98%), followed by Pseudomonas_unclassified (0.96%), Helicobacter pylori (0.72%), and Neisseria_unclassified (0.33%) (Fig. 1a). Streptococcus is a common pathogen causing severe infectious endophthalmitis [40]. The virulence of the pathogen is one of the factors affecting the prognosis of endophthalmitis [41]. Compared with coagulase-negative Staphylococcus endophthalmitis, streptococcal endophthalmitis has a poorer visual prognosis [4,42].

Abundance and Categories of ARG Types and Subtypes
In total, 6 ARG types comprising 14 ARG subtypes were identified in the endophthalmitis sample. Beta-lactam resistance genes were the dominant ARG type (4.41 × 10⁻¹ copies/16S rRNA gene), followed by ARGs against macrolide-lincosamide-streptogramin (2.19 × 10⁻¹ copies/16S rRNA gene) and multidrug resistance genes (1.79 × 10⁻¹ copies/16S rRNA gene) (Fig. 1b). The high abundance of multidrug resistance genes may be one of the reasons why systemic and topical antibiotic treatment was ineffective in this patient. The most abundant ARG subtypes included PBP-2X, multidrug_transporter, PBP-1A, and PBP-1B (additional Fig. 1). These results are valuable for guiding rational clinical medication.

Streptococcus Genome Recovered from Metagenomes
Metagenome assembly and genome binning with the MetaWRAP pipeline resulted in one near-complete genomic bin with 23 scaffolds. The genome size of the obtained bin is 2 Mbp, and it encodes 1974 genes with a mean gene length of 907 bp. The genome was estimated at 99.78% completeness and 0.846% contamination, with a low GC content of 39.7%. Only 0.76% of this genome was occupied by repeat elements. Interspersed repeats were the predominant type of repeat, accounting for 0.55% of the whole genome.
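As a quick consistency check on the reported bin statistics, the gene count and mean gene length can be combined into an approximate coding density. The sketch below uses only the numbers given in the text; because the genome size is reported as a rounded 2 Mbp and gene overlaps are ignored, the result is a rough estimate rather than a measured value. The function name `coding_density` is introduced here for illustration.

```python
# Values reported in the text. The genome size is given only as "2 Mbp",
# so the resulting density is approximate.
GENOME_SIZE_BP = 2_000_000
GENE_COUNT = 1974
MEAN_GENE_LENGTH_BP = 907


def coding_density(gene_count, mean_gene_length, genome_size):
    """Return the fraction of the genome covered by genes, ignoring overlaps."""
    return gene_count * mean_gene_length / genome_size


total_gene_bp = GENE_COUNT * MEAN_GENE_LENGTH_BP
density = coding_density(GENE_COUNT, MEAN_GENE_LENGTH_BP, GENOME_SIZE_BP)
print(f"Total gene length: {total_gene_bp} bp")
print(f"Approximate coding density: {density:.1%}")
```

Under these assumptions the genes span roughly 1.79 Mbp, or about 90% of the bin, which is in the range expected for a compact bacterial genome.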
The genomic bin has a single circular chromosome, and no plasmid was found. A BLAST search of the genomic sequences against the NT database showed that the bin was classified as Streptococcus but could not be identified at the species level. Metagenomic taxonomy results showed that Streptococcus_mitis_oralis_pneumoniae was the most dominant species, accounting for 97.98%. Because S. mitis, S. pneumoniae, S. pseudopneumoniae, and S. oralis are closely related species of viridans group streptococci, they are difficult to discriminate at the species level [43,44]. Therefore, genome comparisons of the Streptococcus MAG against related Streptococcus spp. were needed for species circumscription.

Comparative Genome Analysis
We used ANI and digital DDH for genome comparison analyses. ANI is regarded as the most relevant comparative parameter for determining bacterial species. A whole-genome pairwise ANI > 95% indicates the same species [33]. DDH is generally used to determine the genomic similarity among strains [45], and 70% similarity is considered the gold standard DDH threshold for species boundaries [46]. The TCS function of JSpeciesWS can rapidly compare selected genomes against a continuously updated reference database (ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank) [47]. TCS analysis showed that the draft genome of Streptococcus sp. v1. nov. was closest to Streptococcus mitis SK321, with a Z-score of 0.99859. To identify genetic differences between the two strains, we performed variant calling.
Comparing Streptococcus sp. v1. nov. with the reference Streptococcus mitis SK321, a total of 39,979 high-quality SNPs were identified, including 28,914 transitions (Ti) and 11,065 transversions (Tv), with a Ti/Tv ratio of 2.61. The A > G|T > C type (14,497) and the G > A|C > T type (14,417) accounted for the majority of all SNPs. The gain and loss of mutated genes are regarded as among the most important contributors to functional changes [48]. A total of 1588 InDels were identified, comprising 809 insertions and 779 deletions. Moreover, 29 SVs, comprising 17 deletions, 1 inversion, and 11 translocations, were uncovered in the Streptococcus sp. v1. nov. genome. For CNVs, 2 duplications and 43 deletions were identified (Fig. 2). Genomic variation is an important evolutionary driving force [49].
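The Ti/Tv ratio quoted above follows directly from the SNP counts. The sketch below shows how substitutions can be classified as transitions or transversions and then reproduces the ratio from the reported totals. The helper names `is_transition` and `titv_ratio` and the four-SNP toy list are illustrative assumptions, not part of the GATK workflow used in this study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """Return True for purine<->purine or pyrimidine<->pyrimidine changes."""
    pair = {ref.upper(), alt.upper()}
    if len(pair) != 2:
        raise ValueError("ref and alt must be two different bases")
    return pair <= PURINES or pair <= PYRIMIDINES


def titv_ratio(snps):
    """Compute Ti/Tv for an iterable of (ref, alt) single-base substitutions."""
    snps = list(snps)
    ti = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ti
    return float("inf") if tv == 0 else ti / tv


# Toy example with made-up SNPs: three transitions and one transversion.
toy_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]
toy_ratio = titv_ratio(toy_snps)

# Counts reported in the text.
reported_ti = 14_497 + 14_417   # A>G|T>C plus G>A|C>T
reported_tv = 11_065
reported_ratio = reported_ti / reported_tv
print(f"Ti = {reported_ti}, Tv = {reported_tv}, Ti/Tv = {reported_ratio:.2f}")
```

The two transition classes sum to the 28,914 transitions reported, and together with the transversions they account for all 39,979 SNPs.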

The Pan-Core Genome Analysis
To find characteristics that distinguish Streptococcus sp. v1. nov. from the species Streptococcus mitis, a comparative genome analysis of Streptococcus sp. v1. nov. and the top 25 TCS hit Streptococcus strains (additional Table 1) was performed with the BPGA pipeline. The core-pan plot showed that as the number of genomes increased, the number of core genes decreased gradually (Fig. 3a). A previous pangenome analysis revealed that the genomic diversity of Streptococcus intermedius also follows an "open" pangenome model [51]. With the addition of each new genome, the core genome decreased from 1915 to 824 genes, while the pan genome increased from 1915 to 5120 genes. The pangenome of all the strains had 718 core genes, 2591 accessory genes, and 1810 unique genes. Compared with these related Streptococcus strains, Streptococcus sp. v1. nov. had the most accessory genes. The dendrogram of core genes showed that Streptococcus sp. v1. nov. had the closest phylogenetic relationship with Streptococcus mitis SK321 (Fig. 3b).
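The core, accessory, and unique categories used above can be illustrated with a small presence/absence example. The sketch below is a simplified stand-in for what a pan-genome pipeline reports, not the BPGA implementation: the three genomes and their gene family labels are hypothetical, and the accumulation curve follows a single fixed genome order, whereas pipelines typically summarize many random orders.

```python
from collections import Counter


def partition_pangenome(genomes):
    """Split gene families into core, accessory, and unique sets.

    genomes: dict mapping a genome name to the set of gene families it carries.
    """
    counts = Counter(family for families in genomes.values() for family in families)
    n_genomes = len(genomes)
    core = {f for f, c in counts.items() if c == n_genomes}        # in every genome
    unique = {f for f, c in counts.items() if c == 1} - core       # in exactly one
    accessory = set(counts) - core - unique                        # everything else
    return core, accessory, unique


def accumulation_curve(genomes, order):
    """Return (pan size, core size) after adding each genome in the given order."""
    pan, core, curve = set(), None, []
    for name in order:
        families = genomes[name]
        pan |= families
        core = set(families) if core is None else core & families
        curve.append((len(pan), len(core)))
    return curve


# Hypothetical gene family content for three made-up genomes.
toy_genomes = {
    "genome_A": {"f1", "f2", "f3", "f4"},
    "genome_B": {"f1", "f2", "f5"},
    "genome_C": {"f1", "f3", "f6", "f7"},
}
core, accessory, unique = partition_pangenome(toy_genomes)
curve = accumulation_curve(toy_genomes, ["genome_A", "genome_B", "genome_C"])
print(sorted(core), sorted(accessory), sorted(unique))
print(curve)
```

In the toy data the pan-genome grows with every added genome while the core shrinks, which is the same qualitative pattern as the curves in Fig. 3a.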
Functional analysis of COGs in the 26 Streptococcus genomes revealed that the highest proportion of core genes was related to "metabolism", whereas the majority of unique gene families were associated with "information storage and processing". These observations are consistent with a previous study that reported the significant role of metabolism-related genes in core genomes [52]. Previous research suggests that the information storage and processing category is linked to intracellular survival [53]. The core genes were mainly enriched in "Translation, ribosomal structure and biogenesis", "General function prediction only", and "Amino acid transport and metabolism". In the categories of "Transcription", "Replication, recombination and repair", "Cell wall/membrane/envelope biogenesis", and "Defense mechanisms", the accessory and unique genes accounted for a greater proportion than core genes (Fig. 4a). The KEGG analysis revealed that genes associated with "metabolism" made up the largest proportion in the core, accessory, and unique genomes. Most of these genes were related to "Carbohydrate metabolism", "Membrane transport", "Overview", "Amino acid metabolism", "Replication and repair", and "Nucleotide metabolism" (Fig. 4b). These results are largely consistent with prior studies. For instance, previous research has also revealed that the core genome and accessory genome in bacterial genomes exhibit distinct characteristics in terms of gene function differentiation [54]. Moreover, the KEGG database has been widely used in the study and analysis of genome function. In this study, the combined analysis of the COG and KEGG databases provided strong support for further understanding the functional differentiation of the Streptococcus genome [55].
Fig. 2 Genome-wide landscape of genetic variation. Circles 1 and 2 (outer to inner) represent the sequence information and GC content curve of the reference genome, respectively. Circle 3 represents sequencing depth and coverage. Circle 4 represents gene coding regions (CDS) and non-coding RNA regions (rRNA, tRNA) in the reference genome (the outer circle represents the positive strand, and the inner circle represents the negative strand). Circles 5, 6, 7, and 8 represent the density of single nucleotide polymorphisms (SNPs), small insertions and deletions (InDels), copy number variation (CNV), and structural variation (SV), respectively
In our new strain Streptococcus sp. v1. nov., 86 genes were unique. KEGG pathway annotation of these unique genes indicated that most KOs were associated with the ABC transport system, including ATP-binding cassette, subfamily B, bacterial (K06147), putative ABC transport system permease protein (K02004), putative ABC transport system ATP-binding protein (K02003), raffinose/stachyose/melibiose transport system permease protein (K10119), ABC-2 type transport system ATP-binding protein (K01990), and raffinose/stachyose/melibiose transport system permease protein (K10118) (Table 2). Pathogens acquire essential nutrients from the host through select ABC transporters to adapt rapidly to changing host microenvironments, while mediating the effects of toxicity [56-58]. A total of 7 virulence factors were identified in the Streptococcus sp. v1. nov. genome using the virulence factor database (VFDB): Capsule, PavA, Hyaluronic acid capsule, CBPs, PfbA, Autolysin, and PsaA. Following the functional classification scheme in VFDB, 71% of the virulence factors are related to "offensive function", which contributes to successful infection of host cells and tissues through colonization and toxicity [59]. This may also explain, to a certain extent, why the patient's condition progressed rapidly. The identification of these virulence factors is important for understanding the pathogenesis of Streptococcus sp. v1. nov. and developing targeted strategies for preventing and treating infections caused by this bacterium.
The identification of virulence factors in bacterial genomes using the VFDB has been a useful tool for understanding the pathogenesis of bacterial infections and developing targeted interventions.For example, Zhou et al. identified 17 major virulence factors and capsule-associated genes in three Streptococcus oralis strains, highlighting the pathogenic potential of this bacterium and laying a foundation for the prevention and treatment of S. oralis infections [60].Sinha et al. employed a principal component analysis of virulence gene presence/absence in S. intermedius, revealing that serine-threonine kinases of the Sda group, adhesion protein LAP, and capsule polysaccharide biosynthesis protein Cps4E are associated with cerebral abscess and bronchopulmonary infections.Conversely, hepatic and abdominal abscesses are associated with the presence of fibronectin-binding protein Fbp54, and capsule polysaccharide biosynthesis proteins Cap8D and CpsB [51].

Conclusions
The metagenome refers to the sum of all microbial genomes in an environmental sample. Metagenomics studies the microbial populations in specific environmental samples, using next-generation high-throughput sequencing technology to explore microbial diversity, population structure, evolutionary relationships, functional potential, cooperative relationships, and relationships with the environment. It is not limited by the isolation and culture of microorganisms, and provides an effective means for studying microbial communities and ARG distribution.
This study describes the microbial community of the vitreous of an endophthalmitis patient in Wenzhou. The patient's disease progressed rapidly and was accompanied by severe pain. Metagenomic analysis revealed that the vitreous sample was dominated by Streptococcus. This strain harbors multiple types of ARGs, which partly explains why the patient's disease progressed rapidly. This study also uncovered the genomic characteristics and comparative genomics of Streptococcus sp. v1. nov. Genomics-based analysis of patient specimens can make the clinical diagnosis of infectious diseases faster and more accurate.

Fig. 1
Fig. 1 Metagenomic analysis of species classification and antibiotic resistance. a Sunburst plot showing the taxonomic composition at the species level, b radial bar chart showing the antibiotic resistance gene (ARG) composition
The core genome curve plateaued while the pan-genome curve grew continuously, indicating an open pan-genome and a conserved core genome. Donati et al. conducted a pan-genome analysis of Streptococcus pneumoniae strains closely related to Streptococcus sp. v1. nov. and observed an open pan-genome in Streptococcus pneumoniae as well [50].

Fig. 3
Fig. 3 Pan-genome analysis of Streptococcus sp. v1. nov. and the top 25 TCS hit Streptococcus strains. a Core-pan plot of Streptococcus sp. v1. nov. and the top 25 TCS hit Streptococcus strains. The green and yellow lines indicate the numbers of pan-genome and core-genome genes, respectively. b UPGMA clustering tree based on core gene sequences of Streptococcus sp. v1. nov. and the top 25 TCS hit Streptococcus strains (Color figure online)

Fig. 4
Fig. 4 Detailed COG and KEGG pathway classification in core, accessory, and unique genomes.a The COG distribution b The KEGG distribution

KYK [2015]34). The tenets of the Declaration of Helsinki were followed in this study.
Consent to Participate Written informed consent was obtained from the subject before sample collection.
Consent for Publication Not applicable.
Table 1 contains the ANI and digital DDH values between this Streptococcus MAG and 12 type strains of related species. The ANI and digital DDH values of the Streptococcus MAG against the 12 type strains were below 95% and 70%, respectively, which indicated that the Streptococcus MAG represents a new species. This new Streptococcus strain was named Streptococcus sp. v1. nov. Consistent with the metagenomic taxonomic classification results, both DDH and ANI values showed that Streptococcus sp. v1. nov. was clearly distinct from the 12 type Streptococcus strains, including Streptococcus gwangjuense ChDC B345, Streptococcus mitis NCTC 12261, Streptococcus pseudopneumoniae CCUG 49455, Streptococcus pneumoniae NCTC 7465, Streptococcus oralis subsp. dentisani CECT 7747, and the others.
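The species-boundary reasoning above can be written as a small decision rule. The sketch below applies the 95% ANI and 70% digital DDH thresholds cited in the text to a set of pairwise comparisons. The strain names and values are placeholders invented for illustration and are not the Table 1 values; treating a value exactly at a threshold as "same species" is also an assumption made here.

```python
ANI_THRESHOLD = 95.0   # percent; species boundary cited in the text
DDH_THRESHOLD = 70.0   # percent; species boundary cited in the text


def compare_to_type_strain(ani, ddh):
    """Classify one query-versus-type-strain comparison."""
    above_ani = ani >= ANI_THRESHOLD
    above_ddh = ddh >= DDH_THRESHOLD
    if above_ani and above_ddh:
        return "same species"
    if not above_ani and not above_ddh:
        return "different species"
    return "ambiguous"  # the two measures disagree


def is_candidate_new_species(comparisons):
    """Return True if every comparison falls below both thresholds."""
    return all(
        compare_to_type_strain(ani, ddh) == "different species"
        for ani, ddh in comparisons.values()
    )


# Placeholder (ANI, dDDH) pairs for made-up strains; NOT the Table 1 values.
example_comparisons = {
    "type_strain_1": (93.2, 52.0),
    "type_strain_2": (88.5, 35.1),
    "type_strain_3": (91.0, 44.7),
}
candidate = is_candidate_new_species(example_comparisons)
print(candidate)
```

A genome is flagged as a candidate new species only when it falls below both thresholds against every type strain compared, which mirrors the argument made for the Streptococcus MAG.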

Table 1
The digital DDH and ANI values between Streptococcus sp. v1. nov. and 12 type strains of related species