Antimicrobial Resistance and Genetic Characteristics that Might be Related to Antibiotic Resistance of 112 Helicobacter Pylori Shanghai Isolates

Background Shanghai, east China has one of the world’s highest burdens of Helicobacter pylori infection. While multidrug regimens can effectively eradicate H. pylori, antibiotic-resistant (AR) H. pylori has been recognized by the WHO as ‘high priority’ for urgent need of new therapies. Moreover, the genetic characteristics of H. pylori AR in Shanghai is under-reported. The purpose of this study was to determine the resistance prevalence, re-substantiate resistance-conferring mutations, and investigate novel genetic elements might be related to H. pylori AR.


Introduction
Helicobacter pylori colonizes the gastric mucosa of 50-70% of the world's population and increases the risk of developing chronic gastritis, peptic ulcer, gastric cancer, and gastric mucosa-associated lymphoid tissue (MALT) lymphoma [1][2][3].Currently, the recommended H. pylori eradication regimen is triple or quadruple therapy, comprising a proton pump inhibitor and a gastric mucosal protective agent, accompanied by one or two antibiotics including clarithromycin (CLA), metronidazole (MTZ), levo oxacin (LEV), and amoxicillin (AMX) [4,5].However, H. pylori exhibits a diverse genomic structure with a high mutation rate and, under selective antibiotic pressure, manifests growing antibiotic resistance (AR) traits, especially to MTZ and CLA [6].The resistance rates of H. pylori to MTZ, CLA, and LEV in China have previously been measured as 40-70%, 20-50% and 20-50%, respectively [7].Increasing rates of resistance have also been reported in Europe and America [5,8], and AR H. pylori has been recognized by the World Health Organization (WHO) as "high priority" for urgently needed new therapies [9].Given the limited number of antibiotics that are appropriate for H. pylori eradication and the worldwide increase of antibiotic resistance, the need for new efforts to understand the comprehensive mechanisms underlying H. pylori resistance to existing antibiotics is signi cant.
The genomic diversity that is characteristic of H. pylori is crucial for the establishment of associations between genetic elements and antibiotic resistance.A well-con rmed example of the intrinsic resistance mechanism of H. pylori has been elucidated showing that the emergence of resistance is mainly due to mutations in antimicrobial target genes.In particular, the single nucleotide mutations in the target genes of CLA (23S rRNA), MTZ (rdxA, frxA and fdxB), and LEV (gyrA and gyrB) enable H. pylori to evade corresponding antibiotic activity by inhibiting bacterial protein synthesis, hampering intracellular reductionactivation of MTZ, and blocking bacterial DNA replication and transcription, respectively [10,11].However, antibiotic resistance in H. pylori is a complex and multifactorial problem.Several other possible mechanisms have been described, including highly expressed outer membrane proteins that form a permeability barrier, reducing antibiotic uptake [12], loss of porins with narrow pore channels causing decreased intracellular accumulation of antibiotics [13], increase in the e ux of the antibiotics through the e ux-pump transporters [14,15], the passive tolerance to antimicrobials by bio lm formation [16], as well as sporadic mutations with synergistic effects in the loci of some important protein synthesis factors such as translation initiation factor IF-2 and ribosomal protein L22 [17,18].Furthermore, many other whole genome associations with antibiotic resistance have been established in recent years.For instance, a signi cant association has been found between the virulence factor dupA1 genotype and the A2147G CLA resistance mutation [19] and a correlation between H. pyloriresistant strains and CagA-negative strains [20].Additionally, in Gram-negative bacteria, most studies have suggested an inverse relationship of the clustered regularly interspaced short palindromic repeats (CRISPRs) loci with antimicrobial resistance in Escherichia coli, Acinetobacter baumannii, and Klebsiella pneumoniae [21][22][23][24], although some inconsistent observations showed either a positive or negative correlation of CRISPR-Cas with antibiotic resistance genes in some clinically important species [25].However, although the H. pylori is indeed one of the few bacterial species constitutively competent for natural transformation and uptake extracellular DNA [26], little is known about the relationship between CRISPR repeats with antibiotic resistance and acquired resistance genes in H. pylori.
Recently, based on the tools and databases of antibiotic resistance (AR) studies, a resistome-wide association study (RWAS) has brought unprecedented insights into the global distribution of potential antibiotic-resistance determinants of many pathogenic bacteria [27], which facilitates the discovery of the molecular bases of the association between antibiotic resistance and genetic diversity.The aim of this study was to identify novel genetic elements associated with antibiotic resistance and map genomic heterogeneity among H. pylori clinical isolates in Shanghai, China.Here, we surveyed the resistance prevalence and present an antibiotic resistance-related genome characterization including speci c subsystem analysis at the locus level, and RWAS at variant level, for 112 clinically isolated H. pylori isolates from Shanghai.

Patients and Samples
A total of 112 H. pylori isolates (H.pylori-Shi) from 112 individual patients were successfully isolated from 437 gastric biopsy specimens collected during endoscopy in patients with gastric diseases who resided in Shanghai.Patients receiving treatment for H. pylori with any antibiotics, H 2 -receptor blockers, or proton pump inhibitors within 4 weeks before the study were excluded.All the biopsy samples were identi ed as H. pylori by morphology and immunohistochemistry.The 112 gastric biopsy samples positive for H. pylori included 19 in chronic super cial gastritis (CSG), 66 in chronic atrophic gastritis (CAG), 8 in gastric ulcer (GU), 18 in duodenal ulcer (DU), and 1 in gastric carcinoma (GC).The demographics and clinical characteristics (gastroscopy diagnosis) of the 112 patients are shown in Supplementary Table S1.This study was approved by the Ethics Committee for Human Studies of Fudan University Huadong Hospital, with written informed consent from all subjects (Ethics Approval Number: 2020K080).

H. pylori Isolation and Culture
During endoscopy, two biopsy samples were obtained from the antrum, 2-3 cm on the front of the pylorus of each patient.One sample was shipped immediately to a pathology center for H. pylori detection in Huadong Hospital.Another was stored at -80 ºC and kept on dry ice during transfer.To culture H. pylori, the samples were homogenized and inoculated onto commercial H. pylori selective plates (Nissui Pharmaceutical Co. Ltd., Tokyo, Japan), followed by incubating for 3-7 days at 35 ºC under microaerophilic conditions (10% O 2 , 5% CO 2 , and 85% N 2 ).
Small, purple colonies typical of H. pylori were subsequently subjected to Gram staining, the rapid urease test (RUT), antibiotic susceptibility testing, and genomic DNA extraction.

Antibiotic Susceptibility Testing
Antibiotic resistance of H. pylori to MTZ, CLA, LEV, and AMX was determined by E-test (Autobio, China) and an agar dilution method according to the protocols of the Clinical and Laboratory Standards Institute (Wayne, PA, USA).Brie y, after three subcultures from the time individual strains were isolated, the culture suspension turbidity of H. pylori was adjusted with saline to a McFarland opacity standard of 2.0, and the suspensions were inoculated onto a Mueller-Hinton II agar plate (Becton Dickinson, Sparks, MD, USA) supplemented with 10% horse blood.The four drug E-test strips were attached to the plate and incubated at 37 °C for 3-5 days under microaerophilic conditions.Agar dilution MIC tests were performed by serial two-fold dilutions of MTZ, CLA, LEV, and AMX with each concentration ranging from 0.0039-256 mg/L.All drugs were purchased from Wako Pure Chemical Industry (Osaka, Japan).The breakpoints used to determine H. pylori antibiotic resistance were MIC ≥ 1 µg/ml for CLA, MIC ≥ 8 µg/ml for MTZ, MIC ≥ 1 µg/ml for LEV, and MIC ≥ 2 µg/ml for AMX.H. pylori ATCC 43504 was used as the control strain.Based on the antibiotic susceptibility testing results, the H. pylori isolates could be divided into different antibiotic-resistance categories in two patterns: resistance (R)-susceptibility (S) pattern (MTZ-S and -R, CLA-S and -R, LEV-S and -R) and non-overlapping patterns (MTZ mono-R, CLA mono-R, LEV mono-R, MTZ and CLA dual-R, MTZ and LEV dual-R, CLA and LEV dual-R, multiple drug resistant (MDR) and susceptible to four antibiotics).
H. pylori Genomic DNA Extraction and Whole Genome Sequencing H. pylori isolates were inoculated on Columbia agar (OXOID, Thermo Fisher Scienti c Inc., Waltham, MA, USA) medium containing 8% sterile de brinated sheep blood under microaerophilic conditions at 35 °C for 3-5 days.After three subcultures from the time individual strains were isolated, the total genomic DNA of H. pylori was extracted by the cetyltrimethyl ammonium bromide (CTAB) method [28].All the procedures were performed following the manufacturer's instructions.
Genome sequencing was performed using the whole genome shotgun (WGS) strategy based on the Illumina MiSeq platform (Illumina, San Diego, CA, USA) by constructing paired-end DNA libraries.The DNA libraries (~400 base-pair insert size) for each isolate were prepared using the TruSeq DNA Sample Preparation Kit (Illumina, CA, USA).Brie y, genomic DNA was randomly uniformly fragmented in the range of 300-500 bases by sonication (Diagenode Bioruptor UCD-200).Illumina adapters were ligated to each fragment.Quality control of the libraries was identi ed by an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA) with a DNA 1000 chip according to the manufacturer's instructions.The libraries were sequenced on the Illumina MiSeq sequencer using the 600V3 Kit (Illumina) with sequencing mode of paired-end 2×251 bp and sequencing depth of more than 400× in all the isolates.The quality of the raw reads was checked using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc).To obtain high-quality reads, adapter contamination was removed by AdapterRemoval (version 2.1.7)[29], and all reads were ltered on SOAPec (version 2.0) [30] with a Kmer of 17 to trim the low-quality bases with a Phred quality score (Q value) of less than 20.The ltered clean reads were assembled by A5-miseq (version 20160825) [31] into contigs and scaffolds.The sequenced genomes of the 112 H. pylori-Shi isolates were submitted to the National Center for Biotechnology Information (NCBI).

H. pylori Protein Coding Genes, rRNA and CRISPRs Prediction
Predicting genome function elements included the prediction of coding genes (CDSs), non-coding RNA, and CRISPRs.Putative coding sequences were identi ed by Glimmer (version 3.0) [32] with the length of the open reading frame (ORF) no less than 110 bp.tRNA genes were identi ed by tRNAscan-SE (version 1.3.1)[33] and rRNA genes were identi ed by RNAmmer (version 1.2) [34].CRISPRs were identi ed by the CRISPR nder (http://crispr.i2bc.parissaclay.fr/Server/),and the direct repeats (DRs) and spacers in the whole genome were predicted by CRISPRs recognition tool [35].

H. pylori Functional Annotation of Predicted Protein Coding Genes
Functional annotations of all predicted protein coding genes were obtained using the Refseq database (NCBI-NR) (https://www.ncbi.nlm.nih.gov/refseq/) and SwissProt database (http://www.gpmaw.com/index.html).The Refseq database is a non-redundant gene and protein database with relatively high accuracy from the NCBI.In the SwissProt database, all sequence entries have been veri ed by experienced molecular biologists and protein chemists by consulting relevant literature.Each ORF was thus run through the two databases via blastall [36] with an E value less than 1e-6, at least 30% identity and 70% coverage.The functional descriptions of the best hits in the databases were assigned to the corresponding protein coding genes [37].
Functional classi cations by clusters of orthologous groups (COGs) of proteins were analyzed against the eggNOG (evolutionary genealogy of genes: Nonsupervised Orthologous Groups) database (version 5.0) (https://eggnogdb.embl.de/#/app/home)[38].The unique genes of the indicated categories were annotated using this database via blast with E value less than 1e-6 and at least 70% identity and 70% coverage.The database assigned the best hits to the corresponding protein coding genes, which were then classi ed into corresponding COG classes.The genes containing more than one domain of different categories were classi ed as multiple class genes, whereas those that did not have a suitable hit against the database were recognized as hypothetical proteins.

H. pylori Genome-Wide Gene Family Analysis
A data set of the predicted protein sequences with a minimum of 50 amino acids of all the 112 H. pylori isolates were constructed for further ortholog analysis.An all-VS-all blastp analysis was performed on the dataset with an E value of 1e-10 and 70% protein coverage cut-off applied for the alignments.The orthologous gene clusters of the sequences alignment results were identi ed using OrthoMCL (version 2.0.8)[39], with the I (In ation) value was set to 1.5.Finally, the unique genes of indicated strain population were collected and further analyzed.

H. pylori Genomic Subsystem Analysis
Virulence factor associated genes in the protein coding sequences of the 112 H. pyloriisolates were predicted by aligning amino acid sequences in the Virulence Factors of Pathogenic Bacteria (VFDB) database (http://www.mgc.ac.cn/VFs/) [40], which is a comprehensive database of pathogenic bacteria virulence factors containing literature-con rmed 532 virulence factors and 2599 virulence-related genes from 74 genera of pathogenic bacteria including H. pylori.Each protein sequence was thus run through the database via blastp v.2.6.0+ for virulence-associated genes with E-value less than 1e-5, greater than 60% identity, 70% coverage, and less than 10% gap length.To retrieve the given outer membrane protein (OMP) family genes and e ux pump system genes, the blastx v.2.5.0+ was used with E-value less than 1e-6 and greater than 90% identity.

Identi cation of Acquired Antimicrobial Resistance Genes
The command line version of ResFinder (version 4.1) with default parameters (90% identity and 60% coverage) was used to identify acquired antimicrobial resistance genes (ARGs) in which resistance is conferred by a complete gene [41].The ResFinder database is composed of 1,649 acquired ARGs from 15 antibiotic drug classes.It does not identify resistance conferred by point mutations.

H. pylori Resistome
The resistome of the 112 H. pylori isolates was obtained from databases and literature (Supplementary Table S6).First, the antibiotic-resistant (AR) genes in the genomes of the 112 H. pylori isolates were identi ed using the Comprehensive Antibiotic Research Database (http://arpcard.mcmaster.ca)via BLAST [42].An E value of 1e-6, 45% identity, and 70% protein coverage cut-off were applied for the alignments.The database includes genes involved in antibiotic susceptibility and/or resistance.To speci cally identify AR enzyme genes, the BLDB (β-lactamase database) (version 1.0.11)(http://bldb.eu/)[43], a newly updated manually curated database for AR enzymes (mainly β-lactamase), was used with an E value of 1e-6, 80% identity, and 70% protein coverage cut-off.Because there was no hit obtained in the database, no β-lactamase genes were included in the nal resistome.
To obtain a more comprehensive resistome, we compiled a literature-supported list of genes whose variation or presence or absence is associated with antibiotic resistance in H. pylori species, including antibiotic targets and other sporadically reported genes, which were either known to confer H. pylori antibiotic resistance or subsequently found to be associated with H. pylori resistance to clinically used antibiotics.

Mapping and Variant Detection
To identify genetic alterations resulting in single nucleotide polymorphisms (SNPs) and insertions or deletions (Indels), the sequencing reads of each isolate were mapped to the H. pylori 26695 reference genome (NC_000915.1)using the Burrows-Wheeler Alignment Tool (version 0.7.12-r1039) [44].The alignment les were subjected to local realignment and de-duplication using the Picard (version 1.107) (http://www.psc.edu/index.php/user-resources/software/picard).The accuracy of variant culling was improved using Genome Analysis Toolkit (GATK) [45].Variant SNPs and Indels were culled from each alignment le using GATK.Variant ltering was carried out by removing variants with low mapping quality (MQ) < 40, quality depth (QD) < 2, and haplotype score > 13.The variants were annotated using ANNOVAR (Wang et al., 2010) to determine non-synonymous SNPs (nsSNPs) and frameshift Indels (fsIndels).All variants identi ed in this study were manually inspected using the Integrative Genomics Viewer (version 2.3.86)[46].
Linkage Disequilibrium Among Variants Potentially Associated With Antibiotic Resistance or Susceptibility The linkage disequilibrium (LD) of SNPs/Indels potentially associated with MTZ, CLA, and LEV resistance and susceptibility (P < 0.05) was assessed by calculating r2 and D' values to identify co-occurring correlation strength between each two variant sites using plink (version 1.07) [47].The value of D' was between -1 and 1, and the D' value of 0 represented the two variants in a state of complete linkage equilibrium, and the D' value of 1 represented the two variants in a state of complete LD.

Crystal structures and Functional Domains of the Resistome-Encoding Proteins With Resistance-Associated Variations
The crystal structure of the proteins coded by the indicative genes within the H. pylori resistome were available from SWISS-MODEL (https://swissmodel.expasy.org/)[48].The structure model with sequence identity greater than 30%, GMQE (Global Model Quality Estimation) score greater than 0.6, and QMEAN Z-score between -4 to 0 was selected from the alignment results.The detailed model evaluation including the local quality estimate and comparing the model quality score with that of other experimental structures of similar size are shown in Supplementary Figure S3.The functional domain of the indicative genes within the H. pylori resistome was identi ed using Pfam 34.0 (http://pfam.xfam.org/)[49].All signi cantly matched entries to each gene sequence were retained and all insigni cantly matched entries were discarded.

Statistical Analysis
Differences between groups were compared using the Fisher's exact tests, chi-squared test, Pearson's chi-squared tests, or Student's t-test according to the data type and the number of comparisons.When the results of Pearson's chi-squared tests showed a statistical difference, the intra-group pairwise comparison was adjusted using the Bonferroni formula.The statistical analysis was performed via RStudio (R 4.0.2) or GraphPad prism (version 8.0).A Pvalue less than 0.05 was considered statistically signi cant.

Results
Antibiotic Susceptibility and General Genomic Features of the 112 H. Pylori-Shi Isolates A total of 112 H. pylori isolates from patients with digestive diseases were categorized by antibiotic susceptibility phenotypes.Patients' mean age was 52.2 ± 14.8 (mean ± SD) years, and 54.5% (61/112) were male (Table 1).The prevalence of resistance to MTZ was the highest (65.2%), with 33.9% MTZ monoresistant, followed by LEV (34.8%), with 4.5% LEV mono-resistant, CLA (16.1%) with 2.7% CLA mono-resistant, and no isolate was resistant to AMX.For dualresistant isolates, the majority (20.5%) were resistant both to MTZ and LEV.A total of eight isolates were MDR isolates that were simultaneously resistant to MTZ, CLA, and LEV, and 28 were susceptible to all four antibiotics (Table 1).Gender, age, and gastroscopy diagnosis were not signi cantly associated with resistance (P > 0.05).

Genome-wide Gene Family Analysis and Resistance-Unique Genes of H. pylori Isolates
We initially conducted gene family analysis to de ne the core genetic content and unique-gene pool of speci c antibiotic resistance.We obtained 2,439 genes from all the genomes, and among these genes, 2,194 genes were found shared between MTZ-S and -R, 2,101 genes were shared between CLA-S and -R, and 2,193 genes were shared between LEV-S and -R, accounting for 89.95%, 86.14% and 89.91% of the total genes, respectively, which constituted the core genetic content of the corresponding resistant and susceptible categories (Figure 1A-C).Moreover, we obtained 183, 9, and 23 unique genes in MTZ-, CLA-, and LEVresistant categories, respectively, containing 75, 5, and 13 newly found genes which were not included in neither NCBI-NR, SwissProt, nor COG database (Figure 1D-F, Supplementary Table S9).Among the remaining 108, 4 and 11 unique genes de ned as resistance-unique-gene pools of the three antibiotics resistance categories (Supplementary Table S3), only 31, 2, and 3 genes could be assigned to various COG functional classes, respectively (Figure 1D-F).

Distributions of the OMP Family Genes in Resistant and Susceptible Categories
To investigate whether the occurrence of genes of the OMP families was related to speci c antibiotic resistance, the presence and absence of genes predicted to encode OMPs in different categories were analyzed, including the currently identi ed ve paralogous gene families major OMP family including Hop subfamily and Hop-related (Hor) subfamily, Hof family, Hom family, iron-regulated OMPs, and e ux pump OMPs, ranging in size from 3 to 36 members, as well as 9 other putative OMPs not in paralogous families (Supplementary Table S4).We rst compared the average levels of OMP genes in the resistant and the susceptible categories irrespective of the number of alleles of the genes.There were no signi cant differences in the average number of total OMP genes, the major OMP family genes, Hof and Hom family genes, and e ux pump OMP genes between isolates resistant and susceptible to the three antibiotics, respectively (Figure 2A-F).We intriguingly found that the average frequency of the iron-regulated OMP genes (fecA and frpB) in the CLA-resistance category was signi cantly lower than that in CLA-susceptible category (P = 0.0103).Notably, the frequencies of these genes in the isolates varied greatly.Nearly all the isolates contained the majority of the Hop subfamily genes (not including hopE and hopN), the e ux pump OMP genes, horB, horG, horD, hofA belonging to the hor or hof subfamily, and the gene coding BamA and murein lipoprotein 20 (Lpp20) (Figure 2G).However, horA, horC and hofG genes were present in very few strains.Remarkably, difference analysis showed the proportion of isolates carrying hopE in both MTZ-and LEV-susceptible categories was signi cantly higher than that in corresponding resistant category (Figure 2G).Another gene with signi cantly lower frequency in resistant categories were hofF (LEV).

Distributions of the Virulence-Associated Genes in Resistant and Susceptible Categories
To investigate whether the occurrence of virulence factor genes H. pylori carried was related to speci c antibiotic resistance, the presence and absence of 108 predicted virulence-associated genes against the VFDB database in resistant and susceptible categories was analyzed, including 26 cag pathogenicity island (cagPAI) genes, 36 agellar genes, and 47 other virulence-associated genes (Supplemental Table S5).Similarly, we compared the average levels of virulenceassociated genes in the resistant and susceptible categories irrespective of the number of alleles of the genes.There were no signi cant differences in the number of total virulence-associated genes, cagPAI genes, and agellar genes between resistant and the susceptible categories of the three antibiotics, respectively (Figure 3A-C).The presence frequency of these genes in the resistant and susceptible categories were compared, and there was no signi cant difference in the distribution of most of the genes (Supplemental Table S5).Dramatically, the frequencies of cagY (HP0527) and p A (HP1274) were signi cantly higher (P = 0.0240) and lower (P = 0.0488) in MTZ-R category than that in MTZ-S, respectively (Figure 3D-E).

Distributions of the E ux Pump Genes in Resistant and Susceptible Categories
To investigate whether the occurrence of e ux pump genes was related to speci c antibiotic resistance, the presence and absence of 18 literature-based e ux pump genes were analyzed, including 12 RND family genes, 3 ABC family genes, and 3 MFS family genes (Figure 4).Among the 18 genes, 16 were present in almost all the 112 H. pylori isolates, and the remaining two genes, spaB (HP0600) and HP1181 were widely deleted in these isolates, detected in 25 and 31 isolates, respectively.Nevertheless, the proportion of isolates carrying spaB in the MTZ-R (P = 0.0002) and LEV-R (P = 0.0408) categories was obviously higher than that in the MTZ-S and LEV-S categories, respectively (Figure 4).

Correlation Analysis of CRISPR and Its Components With H. pylori Antibiotic Resistance
Given the natural function of the CRISPR-Cas system as an adaptive defense mechanism against foreign DNA and a controversial effect of the CRISPR-Cas system on the transfer of the acquired antibiotic resistance genes (ARGs) in bacterial genomes, we conducted predictions in the CRISPRs-Cas systems and acquired ARGs where resistance is conferred by a complete gene rather than mutations, seeking to explore the relationship among CRISPR-Cas, acquired ARGs, and antibiotic resistance in H. pylori.Unfortunately, the results suggested neither ARGs nor cas genes were found in 112 H. pylori isolates.However, a total of 153 CRISPR loci were predicted in 83 isolates, ranging in length from 65 to 496 bp.
Because of the different CRISPR contents of the isolates, the proportions of isolates with 0, 1, 2, 3, and 4 CRISPRs in resistant and susceptible categories of MTZ, CLA, and LEV were analyzed, respectively (Figure 5A-C).Notably, the proportion of the isolates with no CRISPR in the LEV-R category was signi cantly higher than that in the LEV-S category (P = 0.0075, Figure 5C).Because of the different spacer contents of the CRISPRs, the proportion of CRISPRs with 1, 2, 3, 4, and 6 spacers in resistant and susceptible categories was also analyzed (Figure 5D-F).Interestingly, the proportion of the CRISPRs with three spacers in the MTZ-R category was signi cantly higher than that in the MTZ-S category (P = 0.0331, Figure 5D), and the proportion of CRISPRs with two spacers in the LEV-R category was signi cantly lower than that in the LEV-S category (P = 0.0152, Figure 5F).Similarly, the proportion of the CRISPRs with two or three spacers in the resistance category of the other two antibiotics was also higher or lower than that of the corresponding susceptible category, respectively, although there was no signi cant difference (Figure 5D-F).In addition, detailed analysis of the DR and spacer sequences showed that a CRISPR combination containing three most common CRISPRs in H. pylori coexisted in seven isolates, which intriguingly were all MTZ mono-R isolates (Figure 5G).Of note, one of the DR in the CRISPR combination occurred in CRISPRs of 9 isolates, which included CRISPRs in seven MTZ mono-R isolates and 2 CRISPRs (Supplementary Figure S1) in two other MTZ-R isolates (a MTZ mono-R isolate and a MTZ and CLA dual-R isolate), causing the signi cantly higher proportion of the isolates with the DR in the MTZ-R category compared with that in the MTZ-S category (P = 0.0259, Figure 5G, Supplementary Table S10).
Phylogenetic Analysis and Relationship of Antibiotic Resistance Patterns and speci c genetic loci with strain relatedness in H. pylori Isolates.
When constructing the maximum likelihood tree of the 112 H. pylori-Shi genomes with respect to different resistance patterns and the pro le of the observed differently present subsystem genomic loci between susceptible and resistant categories for the corresponding antibiotic, no strain-relatedness was found to be associated to neither each single antibiotic phenotype nor the presence/absence of the differently present loci including genes coding OMPs (hopE, hofF and iron-regulated OMPs), virulence-associated genes (virB10/cagY and p A), e ux pump genes (spaB), and CRISPRs with 2 or 3 spacers or the indicated DR sequences (Figure 6).Alternatively, these strains had distinct susceptibility and indicated loci pro les regardless of genetic relatedness among strains.
Genome-Wide Genetic Variations and Prevalence of Variations in H. pylori Resistome We culled SNPs and Indels within the coding sequences and noncoding rRNAs and tRNAs from the genomes of all the H. pylori isolates when compared with H. pylori 26695 reference genome (Figure 7A), which showed high densities of the variations that could be as high as 136 Indel density and 75 SNP density corresponding to reference genome (Figure 7A).Among these genetic variations, a total of 388 non-synonymous SNPs (nsSNPs) and 1,718 frameshift Indels (fsIndels) were identi ed (Supplementary Table S8).
We obtained a list of 80 genes de ned as the H. pylori resistome, whose variation, presence, or absence were known to either confer H. pylori antibiotic resistance or subsequently found to be associated with H. pylori resistance to clinically used antibiotics not limited to antibiotics with known phenotypes in this study.To investigate the prevalence of the genetic variations in the H. pylori resistome, we rst assayed the variations in six most common resistance genes, including 23S rRNA, gyrA and gyrB genes, and rdxA, frxA, and fdxB genes, in which single-point mutations in speci c regions are known to be responsible for CLA, LEV, and MTZ resistance, respectively (Table 2-4).Of these SNPs, a subset was signi cantly associated with antibiotic resistance, including A2143G, A2142G and C148T in the 23S rRNA genegene, and N87T, N87I, and D91G, D91Y in gyrA, as well as I38V in fdxB.Additionally, the assay detected four Indels in 23S rRNA, three Indels in gyrA, and one in frxA.Unexpectedly, the prevalence of one of the Indels (762_763insT) in 23S rRNA was signi cantly higher in the CLA-S category (53.19%) compared with that in the CLA-R category (Table 2).Furthermore, we overviewed the distribution of the numbers of the SNPs and Indels present in the remaining genes of the H. pylori resistome of the 112 isolates along with the antibiotic resistant pro le of these genes (Supplementary Figure S2).Of the remaining 74 genes of the H. pylori resistome, 174 variations were obtained corresponding to 54 genes, involving 29 genes detected containing nsSNPs, 36 containing fsIndels, and 10 containing both nsSNPs and fsIndels (Supplementary Table S7).Additionally, the maximum number of Indels (ranging from 0 to 15) contained in these genes was up to three times the maximum number of SNPs (ranging from 0 to 3), suggesting to a certain extent that Indel mutations make greater contributions to the high genetic diversity of the H. pylori resistome than SNP mutations do (Supplementary Figure S2).In order to explore MTZ, CLA, and LEV resistance-related or resistance-induced compensation related potential genetic variants, we compared the prevalence of the resistome-wide nsSNPs and fsIndels between resistant and susceptible isolates.We observed 23 nsSNPs and 48 fsIndels corresponding to 31 genes that were signi cantly enriched in antibiotic-resistant or -susceptible isolates (P < 0.05) We found that four variations within the integrase/recombinase gene xerD showed a signi cant association with both MTZ and CLA resistance.Some of these variations had strong linkage disequilibrium (LD).Both nsSNPs (xerD:D353N** and xerD:S275N*) were linked together but in low LD with other two xerD fsIndels.Similarly, the two fsIndels (xerD:I40/120_121 delete GG and xerD:L39/116_117 insert G) were also linked together (Figure 7B).A deletion within the site-speci c DNA-methyltransferase gene hsdM (HP0850) (hsdM:P333/998_1002 delete AGCCG) was found to be signi cantly associated with both MTZ and LEV resistance, which also were in high LD with gyrA:N87I.Furthermore, we observed 16 variations within 15 genes simultaneously showing association with MTZ susceptibility and LEV resistance, and many of these genes code membrane-related proteins (hetA, msbA, hofC, oppA, tetA(p), and babB) or proteins involved in recombination (recG and ruvC) (Figure 7B).Moreover, several deletions within the ATP-dependent serine protease gene lon (especially the deletions at lon:T582, lon:K584, and lon:E588) showing a signi cant association with MTZ susceptibility were found with high LD with more than half of the other variations, including the resistanceconferring nsSNPs 23S rRNA:A2143G, 23S rRNA:A2142G, 23S rRNA:C148T, gyrA:N87T, and some of the remaining variants, most of which were associated with MTZ resistance or susceptibility (Figure 7B) (Supplementary Table S11).
Resistance-or Susceptibility-Associated Variations Occurred in Functionally Important Domains of Genes Within the H. pylori Resistome For the purpose of relating the resistance-or susceptibility-enriched variations to speci c protein domains, we further characterized these variations in the genes within the H. pylori resistome with corresponding protein structure data.Localization of the nsSNPs and fsIndels in the four proteins (Lon, BabB, XerD, and TrpS) with the largest number of phenotype-associated variations were summarized (Figure 8), and the functional domains of all the genes with the phenotype -associated variations within the H. pylori resistome were identi ed.Eight variations in Lon and three partial in TrpS closely located variations were identi ed although they were not in any domains (Figure 8A and D, Supplementary Figure S4).In BabB, two clusters of locationally close variation sites at L59, N60, G129, S130, and N135 were all in the sialic acid binding adhesin (SabA) N-terminal extracellular adhesion domain (Figure 8B, Supplementary Figure S4).
Among four variations in XerD, only one nsSNP S275N was located in the phage integrase family domain that enables the protein to covalently link to DNA.Of the remaining 27 genes, 14 contained variations totally or partially located in their functional domains, including three genes encoding e ux pump transporters with an S267P mutation in the extracellular solute-binding domain of dppA, an R390K mutation in the ABC transporter domain of hetA, and an L304F mutation in the ABC transporter transmembrane region of msbA; three genes required for lipopolysaccharide (LPS) biosynthesis with a V76A mutation in the mur ligase family catalytic domain of murC, a deletion at A350 and an insertion at I351 in the mur ligase family glutamate ligase domain of murE, and a deletion at P129 in the region with glycosyltransferase activity of rfaF; two genes related to protein translation with an S32A mutation in the elongation factor Tu GTP binding domain of lepA and a deletion at A297 in the tRNA synthetase domain of iles, as well as the metalloregulator gene fur with an insertion at S148 and a deletion at E149 in the ferric uptake regulator region; the OMP hofC with an I206V mutation and a deletion at M276 in the major functional domain; the stress-response protein coding gene grpE with H145N mutation in its main functional domain; the chromosomal replication initiator gene dnaA with A200T mutation in the whole functional region; and the restriction enzyme alleles hsdM_2 and hsdM_3 with a deletion at P333 in the domain the N-6 DNA methylase activity and a deletion at N49 in the N-terminal domain, respectively (Supplementary Figure S4).

Discussion
The emergence and rapid spread of AR H. pylori has become a major obstacle to eradicating the infection and a great threat to human health because of a sharp global decline in the e cacy of the recommended treatment that used to be 90% effective and has reached an unacceptable threshold (< 80%) compared to other bacteria [4,[50][51][52].The ability of H. pylori to develop resistance to antibiotics is usually genetically encoded, but only few of these genetic features have been associated with speci c antibiotic resistance.In this study, the rst-line antibiotic (MTZ, CLA, and LEV)-resistance associated genetic features of 112 H. pylori-Shi isolates were analyzed to explore possible links between speci c gene loci and variants with the resistance phenotype (Figure 9).
Initially we determined the antibiotic resistance rates of four antibiotics of the 112 H. pylori-Shi isolates.Our data demonstrated that both the mono-(65.2%and 34.8%) and dual-(20.5%)resistance of MTZ and LEV had the highest prevalence compared to other resistance categories, respectively, while the prevalence of resistance to CLA (16.1%) was slightly lower than the resistance rate range of 20-50% in China previously reported [7].Because of the quite low AMX resistance rate of 0% or < 5% worldwide [52], which was zero in our 112 H. pylori-Shi isolates, it was not possible to conduct further association analysis of the resistance mechanism of AMX.In addition, the H. pylori strains in this study were isolated from patients with four main gastric diseases including chronic super cial gastritis, chronic atrophic gastritis, gastric/duodenal ulcer, and gastric cancer, which enabled us to examine potential differences in resistance rates in different speci c gastrointestinal pathologies (except gastric cancer because of insu cient sample number).The negative results implied that the intragastric conditions of patients with different H. pylori-linked gastric disorders may have no signi cant effect on the antibiotic resistance of the pathogen colonizing the gastric mucosa.
Because different phenotypes of H. pylori are considered to be associated with unique sets of genetic features [53,54], we conducted the genome-wide gene family analysis to compare resistant and susceptible categories to obtain unique-gene pools and their functional enrichment, which is rarely investigated in H. pylori.Putatively affected by differences in the strain number of each resistant and susceptible category, the content of the unique-gene pools of the three antibiotics varied greatly.Nevertheless, nearly or more than half of them were newly found genes in H. pylori with resistance to MTZ (75/183, 40.98%),CLA (5/9, 55.56%), and LEV (13/23, 56.52%), respectively, indicating the extensive intraspecies diversity and genetic heterogeneity of H. pylori [55].
In our analysis, despite a great deal of porin hopE labelled loss in nearly half of the 112 isolates, the gene displayed signi cant enrichment in susceptible categories of both MTZ and LEV compared with the resistant population.Similar situations have been reported in a large number of publications showing the loss of porins in clinical isolates [56].It is assumed that bacterial outer membrane porins in uence the in ux of small-molecule antimicrobials into bacterial cells, and there have been many reports of antibiotic resistance acquired through loss or functional change of porins in a large number of organisms [57].For example, in E. coli, the loss of porins OmpC and OmpF contributes to quinolone resistance [58,59].A. baumannii porins OmpA, CarO, Omp33-36, and OprD have been associated with resistance to cephalothin and carbapenem [60][61][62][63].The loss of OmpK35 and OmpK36 porins of K. pneumoniae has been related to resistance to cephalosporins and increased meropenem MIC [57].The Hop member HopE, one of the rst identi ed porins in H. pylori, is a major nonselective one that is the homolog to the E. coli OmpF porin [64,65].In our study, the signi cant loss of hopE in the MTZ-and LEV-resistant population may implicate the underlying effect of hopE loss on resistance development to the two antibiotics with small molecular weights rather than CLA with a large molecular weight.
Concerning the relationship of virulence factors and antibiotic resistance, our analysis suggested a generally irrelevant relationship between most of the predicted virulence-associated genes and the phenotypes at the "locus" level, including the extensively studied H. pylori virulence factors CagA and VacA.This observation is consistent with some reports showing no correlation between antibiotic resistance and the presence of either the cagA or vacA gene [20,66], but it contrasts with many other ndings that the cagA-negative strains and vacA-containing strains were associated with CLA resistance [67,68].These results remain inconclusive as they showed a variable pattern, most likely due to low sample numbers in individual studies or variations in geographical regions and evolution.Moreover, we rst reported the apparent differences of cagY and p A frequencies in MTZ-resistant and -susceptible populations of H. pylori.CagY is an ortholog of VirB10 and is an essential gene in the ComB type IV secretion system (T4SS) [69].Rather than a canonical type IV (pseudo-) pilus used by other Gram-negative bacteria, H. pylori employs the ComB system for initial DNA uptake during natural transformation [70].Thus, the signi cantly higher frequency of CagY in the MTZ resistance category suggests that the functional ComB system with intact CagY enables the entry of resistance-associated DNA via transformation, which promotes the development of MTZ resistance.Future studies are needed to identify and better understand the speci c mechanisms behind this observation.
Previous reports have demonstrated that the AcrAB-TolC e ux pump belonging to the RND family encoding hefABC, hefDEF, and hefGHI is the most important e ux system in H. pylori, mainly mediating resistance to several antibiotics such as MTZ and CLA in laboratory conditions [11,15].However, our analysis showed that the frequencies of the RND family e ux pump genes in resistant and susceptible categories were not signi cantly different.We also reported here that the ABC family member spaB with a severe loss in the 112 isolates showed a frequency difference in the resistant and susceptible categories of both MTZ and LEV, which is consistent with a previous report by Yang et al. from China that the expression level of spaB in multidrug-resistant strains were signi cantly higher than those in susceptible strains and that the sensitivity of spaB-knockout H. pylori to four antibiotics (LEV, CLA, chloramphenicol and rifampicin) was signi cantly higher than that of wild-type strains (reference not shown).Similar studies on spaB have not been found reported in the rest of the world.Whether the transporter is involved in H. pylori resistance remains to be determined.
Studies investigating the characteristics of CRISPR-Cas systems in relation to antibiotic resistance in clinical H. pylori isolates are lacking.These are signi cant gaps in knowledge because the CRISPR-Cas system is hypothesized to be a natural impediment to HGT in bacteria [71], and that HGT disseminates antibiotic resistance-associated genetic elements among competent pathogens in vivo [72], and it is essential to identify strategies to reduce the spread of these elements.Nevertheless, our analysis failed to predict any acquired ARGs where resistance is conferred by a complete gene.Given the high rate of the naturally competent transformation of H. pylori [73], it is plausible that transfer of single nucleotide mutations conferring antibiotic resistance rather than complete ARGs plays a prominent role in the horizontally acquired resistance of H. pylori, which has been supported by a series of reports [26,70,74].Furthermore, we found that LEV resistance and possession of complete CRISPR loci are inversely related.Similar associations showing that the CRISPR system was more likely to be found in antimicrobial-susceptible species such as E. coli and Enterococci have also been reported [71,75,76].Additionally, our correlation of a different number of spacers with resistance or susceptibility to LEV or MTZ, as well as the interesting nding of a CRISPR combination and a DR sequence exclusively occurring in MTZ-resistant strains, provided an initial foundation for potentially utilizing spacer content or DR sequence to rapidly determine the phenotype of H. pylori isolates.
Similar to other bacteria with small-size genomes, H. pylori responses to environmental pressure rely mainly on variations rather than a complex regulatory network [77].Next-generation sequencing is a powerful tool to compare whole genomes and analyze the variations found within antibiotic resistanceassociated genes.When compiling the H. pylori resistome based on databases, no β-lactamase genes were found in the 112 H. pylori isolates, which was compatible with previous reports that the ß-lactamase gene can be detected in AMX-resistant H. pylori but not in AMX-susceptible strains [78].Regarding the antibiotic target genes, not only the well-known mutations in gyrA (N87T/I and D91G/Y) for LEV resistance and mutations in 23S rRNA (A2143G and A2142G) for CLA resistance were detected but also several other single nucleotide mutations and Indels in relation to antibiotic resistance or susceptibility, which could be compensatory mutations to mitigate the deleterious effects of the speci c functions compromised by the resistance mutations [79].In the other resistomewide genetic variations associated with antibiotic resistance or susceptibility, we noticed an MTZ susceptibility-associated variations in fur.We have known that MTZ belonging to the nitroimidazoles class is a pro-drug that is activated by nitroreductases, and the reduction products can disrupt DNA.It was suggested that the resistance stemming from inactive nitroreductases can be eliminated by mutations in the fur protein [80].The LD analysis among these variations revealed several notable deletion mutations in lon that were found in high LD with more than half of the other variations including the resistanceconferring nsSNPs.Given that lon-encoded protease mediates the selective degradation of mutants and abnormal proteins [81, 82], we speculated that deletion mutations at speci c sites of lon indirectly participate in the resistance mechanisms of H. pylori by in uencing the effect of lon in degrading mutants or proteins coded by mutants potentially related to antibiotic resistance or susceptibility of H. pylori, which may explain the linked co-occurrence of them to a certain extent.The further structure and functional domain analysis of the genes containing the resistance-or susceptibility-associated variations within the H. pylori resistome better indicated their probable involvement in the regulation of antibiotic resistance e ux pumps [14,84,88] and proteins regulating a stress response [84, 85] were all associated with the bio lm formation of H. pylori.Thus, it may be the case that bio lm formation in uenced by these genetic mutants is an important mechanism in the antibiotic resistance of H. pylori, which need to be substantiated via extensive experiments.

Conclusion
Our study demonstrated H. pylori strains isolated from Shanghai exhibited multidrug antibiotic resistance, and correctly identi ed the well-known mutations associated with intrinsic resistance to MTZ, CLA and LEV.Furthermore, we found antibiotic resistance phenotypes were associated with a number of novel locus-and variant-level genetic elements.The underlying co-occurrence relationships among them were also revealed.These ndings provide a data basis for further resistance associated research.Moreover, thorough detection and characterization of antibiotic resistance in H. pylori strains is necessary to provide e cacious treatment regimens.Continued monitoring of antibiotic resistance using susceptibility assays and genotyping is warranted in Shanghai.

Declarations
Availability of data and materials The Whole Genome Shotgun project of the 112 H. pylori-Shi isolates has been deposited at DDBJ/ENA/GenBank (BioProject number PRJNA690879) under the accessions shown in supplementary table S2.Additional raw data is available in supplementary table S8-S11.

Figures Figure 1 common
Figures

Figure 2 Analysis
Figure 2Analysis of OMP family genes in resistant and susceptible categories.Violin plots showing the distribution of the OMP genes in resistant and susceptible categories.The average presence levels of the total 62 OMP genes (A), the major OMP family genes along with two subfamilies genes (B), Hof family genes (C), Hom family genes (D), iron-regulated OMPs genes (F) and e ux pump OMP genes (F) in resistant and susceptible categories were compared, irrespective of the number of alleles of the genes.Solid lines indicate median levels and dotted lines indicate quartiles levels.(G) Proportion of strains with each OMP family gene in resistant and susceptible categories.Only genes with greater than 90% coverage were labelled as present.The number of strains containing the gene were listed on the right.(A-G) * P < 0.05, *** P < 0.001.Where there were no statistical analysis results labeled, there were no signi cant difference.

Figure 3 Analysis
Figure 3 Analysis of virulence-associated genes in resistant and susceptible categories.Violin plots showing the distribution of the virulence-associated genes in resistant and susceptible categories.Comparisons of the average presence levels of the total virulence-associated genes (A), cagPAI genes (B) and agellar genes (C) between resistant and susceptible categories were conducted, irrespective of the number of alleles of the genes.Solid lines indicate median levels and dotted lines indicate quartiles levels.Where there were no statistical analysis results labeled, there were no signi cant difference.(D) Two virulenceassociated genes with signi cant different frequencies in MTZ-R and MTZ-S categories.* P < 0.05.

Figure 4 Analysis
Figure 4Analysis of e ux pump genes in resistant and susceptible categories.Proportion of strains with each e ux pump gene in resistant and susceptible categories.The number of strains containing the gene were listed on the right.* P < 0.05, *** P < 0.001.Where there were no statistical analysis results labeled, there were no signi cant difference.

Figure 6
Figure 6Phylogenetic analysis with respect to antibiotic resistance patterns and speci c genetic loci pro les in H. pylori isolates.Maximum likelihood phylogenetic tree based on whole genome sequences of the 112 H. pylori isolates, the resistance patterns, and the signi cantly different genomic loci observed above between susceptible and resistant categories for the corresponding antibiotic.The susceptible and resistant patterns are denoted by blue and red rectangles, respectively.The presence and absence of the speci c loci are denoted by green and grey rectangles, respectively.

Figure 7 Synoptic
Figure 7 Synoptic representation of the risistome-wide association study (RWAS) of genetic variations and linkage disequilibrium analysis.(A) The densities of total SNPs and Indels identi ed in the 112 H. pylori isolates against H. pylori 26695 genome are overviewed.The numbers next to the rulers represent the number of SNPs and Indels per unit length of H. pylori 26695 genome.(B) The proportions of the strains with the nsSNPs and fsIndels present in the genes within the H. pylori resistome in the resistant and susceptible categories of MTZ, CLA and LEV were analyzed, respectively.Only nsSNPs and fsIndels with signi cantly higher (red cells) or lower (blue cells) frequencies in the resistant categories of MTZ, CLA and LEV were displayed.* P < 0.05, ** P < 0.01, *** P < 0.001.Conversely, grey cells indicate the variations not signi cantly associated with resistance or susceptibility to the antibiotics.Loci, genes, and variations are reported in rows and antibiotics in columns.The nature of the variations is color-coded on the left.The right panel represents the extent of linkage disequilibrium between variations.

Figure 8
Figure 8 Localization of the nsSNPs and fsIndels in the four proteins with the largest number of resistance-associated variations within the H. pylori resistome.The available crystal structure of each protein was used to indicate the sites with amino acid changes.(A) Lon, (B) BabB, (C) XerD, and (D) TrpS.The color scale represents the corresponding antibiotics of the resistance or susceptibility association.

Table 1
Demographic information and antibiotic resistance rates.The overall sequence read length, contigs, scaffolds, ORFs, and guanine-cytosine (GC) content for the genomes of the 112 H. pylori isolates grouped by antibiotic resistance phenotypes are shown in Supplementary TableS2.On average, the read length of the 112 H. pylori isolates' genomes was 1.60 billion bp long and encoded a range of 1,511 to 1,624 genes per genome.A low average guanine-cytosine content, 38.7%, was identi ed, and the chromosome sizes ranged from 1.52 to 1.69 Mb.All sequenced genomes harbored 36 tRNA genes and an average of two rRNA genes.

Table 2
nsSNPs and fsIndels in genes conferring resistance to CLA.
[12]r example, in BabB, ve variations occurred in SabA Nterminal extracellular adhesion domain.This domain is conserved among SabA orthologs and BabA, and is known to function as a sugar-binding adhesion domain with conserved disul de bonds [83].Smiley et al previously reported that BabB transmembrane protein were up-regulated in CLA-resistant H. pylori[12].The other resistome-wide functional domains containing variations included regions important for e ux (dppA, hetA and msbA), LPS biosynthesis (murC, murE and rfaF) and stress response (grpE).It has been reported that adhesins (especially OMPs from the Hop and Hom family) [84-86], LPS [84, 86, 87],