General features of the H. himalayensis genome
The nucleotide sequence of the genome of H. himalayensis strain 80(YS1)T was deposited in the NCBI database under accession No. CP014991. The final genome assembly of H. himalayensis 80(YS1)T contained 103 contigs, with a draft genome size of 1,829,936 bp and a genomic GC content of 39.89% (Table 1). General annotation was performed with the NCBI Prokaryotic Genome Annotation Pipeline (PGAP), which predicted 1,769 genes in total. Of these, 1,664 were predicted protein-coding genes: 1,290 (77.52%) could be assigned a function with a high level of confidence, and 374 (22.48%) were annotated as hypothetical proteins. The genome contained 39 tRNAs, a pair of 16S rRNAs, a pair of 23S rRNAs, a pair of 5S rRNAs, 3 ncRNAs, and one phage integrase; however, no plasmids or insertion sequence (IS) elements were found. Moreover, a genomic island was predicted by the online tool IslandViewer 4 and named HhiG1. The island contained genes encoding four restriction endonuclease S subunits, a restriction endonuclease R subunit, a type I restriction-modification system M subunit, a type II toxin-antitoxin system RelE/ParE family toxin, a type II toxin-antitoxin system RelB/DinJ family antitoxin, and a site-specific integrase, as well as some hypothetical proteins (see Supplementary Table 1, Additional File 1). HhiG1 comprised 52,677 bp, spanning positions 1,738,356 to 1,791,033 in the genome.
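The percentages and the island length reported above can be re-derived from the stated counts and coordinates. The following is a minimal sketch of that arithmetic only; the variable and function names are illustrative and are not part of the annotation pipeline.

```python
# Figures copied from the paragraph above; names are illustrative.
total_proteins = 1664
with_function = 1290
hypothetical = 374


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


pct_function = percent(with_function, total_proteins)
pct_hypothetical = percent(hypothetical, total_proteins)

# HhiG1 coordinates as reported in the text.
island_start = 1_738_356
island_end = 1_791_033
span_difference = island_end - island_start      # end minus start
span_inclusive = island_end - island_start + 1   # counting both end positions

print(pct_function, pct_hypothetical)
print(span_difference, span_inclusive)
```

The reported length of 52,677 bp equals the end-minus-start difference; if both coordinates are inclusive, the island would be one base longer. The text does not state which coordinate convention was used.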
Phylogenetic relationship and genomic collinearity features of H. himalayensis with other Helicobacter species
Seventeen whole-genome sequences of Helicobacter species, including H. himalayensis, and two outgroup whole-genome sequences (Campylobacter jejuni and Acetobacter pasteurianus) were selected to construct the phylogenetic tree. H. himalayensis formed a single clade adjacent to H. cinaedi, H. bilis, and H. hepaticus. These three Helicobacter spp. shared a node that was distant from H. pylori (Fig. 1a). Moreover, the phylogenetic relationship between H. himalayensis and the other sixteen selected Helicobacter species was also analyzed based on the core genome (Fig. 1b). The results were consistent with the phylogenetic tree based on the Helicobacter genome sequences, indicating that H. himalayensis is evolutionarily close to H. hepaticus and H. cinaedi but not to H. pylori. The general genomic features of H. himalayensis, H. cinaedi, H. hepaticus, and H. pylori are presented for comparison in Table 1. The H. himalayensis genome is larger than that of H. hepaticus (1,799,146 bp) and smaller than that of H. cinaedi (2,240,130 bp) [21]. The genomes of H. himalayensis, H. cinaedi, and H. hepaticus are all larger than that of H. pylori. These three species also have phages or phage-like elements in their genomes, whereas H. pylori has none. By contrast, H. pylori has IS elements (IS605 and IS606), whereas H. himalayensis, H. cinaedi, and H. hepaticus do not.
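The size comparison can be made concrete with the three genome sizes quoted in the text. This is a small sketch for illustration; H. pylori is omitted because its genome size is given only in Table 1, not in the text, and the dictionary and variable names are assumptions.

```python
# Genome sizes (bp) as quoted in the text; H. pylori is omitted because
# its size appears only in Table 1.
genome_size_bp = {
    "H. himalayensis": 1_829_936,
    "H. hepaticus": 1_799_146,
    "H. cinaedi": 2_240_130,
}

reference = "H. himalayensis"

# Signed difference of each other species relative to H. himalayensis.
differences = {
    species: size - genome_size_bp[reference]
    for species, size in genome_size_bp.items()
    if species != reference
}

# Species ordered from smallest to largest genome.
ranked = sorted(genome_size_bp, key=genome_size_bp.get)

print(differences)
print(ranked)
```

On these figures, H. himalayensis is 30,790 bp larger than H. hepaticus and 410,194 bp smaller than H. cinaedi.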
Figure 2 shows a comparison of the gene sequences of H. himalayensis with those of H. cinaedi, H. hepaticus, and H. pylori at the whole-genome scale. The results revealed a very high degree of genome collinearity between H. himalayensis and H. hepaticus (Fig. 2a), as well as between H. himalayensis and H. cinaedi (Fig. 2b). In contrast, the degree of genome collinearity between H. himalayensis and H. pylori was low (Fig. 2c).
Functional analysis of the H. himalayensis genome
Of the 1,664 predicted proteins, 1,184 were clearly assigned to a functional classification with supporting evidence and 202 were poorly characterized functionally according to the Clusters of Orthologous Groups (COG) database (Fig. 3). In addition, 105 proteins had no annotation in the COG database and 173 were not in the database. The numbers of genes assigned by COG functional classification to “information storage and processing”, “metabolism”, and “cellular processes and signaling” were 284, 546, and 426, respectively. Compared with H. cinaedi, H. hepaticus, H. bilis, and H. pylori, H. himalayensis has more genes in the “cell wall/membrane/envelope biogenesis” and “coenzyme transport and metabolism” sub-branches, but fewer genes in the “cell cycle control/cell division/chromosome partitioning”, “intracellular trafficking/secretion/vesicular transport”, “lipid transport and metabolism”, and “carbohydrate transport and metabolism” sub-branches (Table 2). The circular genome atlas of H. himalayensis is shown in Figure 4; it integrates the protein-coding genes, tRNA and rRNA genes, GC content, and GC skew.
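As a consistency check, the four COG status groups reported above should account for every predicted protein. The sketch below tallies them using the counts from the text; the dictionary keys are shorthand labels chosen here, not COG terminology.

```python
# Counts copied from the paragraph above; keys are shorthand labels.
total_proteins = 1664

cog_status = {
    "clearly assigned": 1184,
    "poorly characterized": 202,
    "no COG annotation": 105,
    "not in database": 173,
}

top_level = {
    "information storage and processing": 284,
    "metabolism": 546,
    "cellular processes and signaling": 426,
}

accounted = sum(cog_status.values())
top_level_total = sum(top_level.values())

# Share of all predicted proteins in each status group, in percent.
shares = {
    label: round(100 * count / total_proteins, 1)
    for label, count in cog_status.items()
}

print(accounted, top_level_total)
print(shares)
```

The four status groups sum to 1,664, matching the number of predicted proteins. The three top-level category counts sum to 1,256, which need not equal 1,184 if a protein can carry more than one COG category; the text does not say how such cases were counted.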
In addition, forty-two genes were predicted to match virulence factor genes from the genus Helicobacter in the VFDB, and 83.3% (35/42) of them were associated with flagellar motility and bacterial invasion (see Supplementary Table 2, Additional File 2). Other virulence factors included cytolethal distending toxin (cdtA, cdtB, cdtC), lipopolysaccharide Lewis antigens (futA), neutrophil-activating protein (napA), autoinducer-2 production protein (luxS), and catalase (katA). Unlike H. pylori, the H. himalayensis genome contains neither urease nor the cag pathogenicity island. However, the genome contains other predicted virulence factors associated with migration, invasion, colonization, and carcinogenesis, which might be pathogenic to the host.
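The virulence-factor counts can be cross-checked against the genes named in the text. This sketch assumes that each named gene corresponds to exactly one VFDB match, which the text does not state explicitly; the variable names are illustrative.

```python
# Counts and gene names copied from the paragraph above.
total_matches = 42
motility_invasion = 35

# Assumption: each named gene is a single VFDB match.
other_named = ["cdtA", "cdtB", "cdtC", "futA", "napA", "luxS", "katA"]

share_motility = round(100 * motility_invasion / total_matches, 1)
remaining = total_matches - motility_invasion

print(share_motility)
print(remaining, len(other_named))
```

Under that assumption, the seven named genes account for exactly the matches not associated with flagellar motility or invasion.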