Identification and characteristics of clinical H. pylori isolates
Twenty-seven clinical strains of H. pylori were successfully isolated from the gastric mucosa of Chinese patients with chronic gastritis, atrophic gastritis, gastric ulcer, and gastric cancer, and were identified by Gram staining and by urease, catalase, and oxidase tests. Sequencing of the 16S rDNA and cagA genes showed that all 27 isolates were cagA-positive (Additional file 1: Fig. S1). Subsequently, the cagA gene sequences amplified by PCR from 30 H. pylori strains, including 26 sequences originating from clinical isolates named GZ1-GZ26 and four sequences from Western strains NCTC11637, NCTC11639, 26695, and SS1, were submitted to the GenBank database with accession numbers KR154731-KR154758, GQ161098, GQ161099, KR154758, and KR154757. One strain (GZ15) was not submitted to the GenBank database because it had lost the C-terminal region of CagA.
Based on the characteristics of the EPIYA motifs of CagA, 20 of the 30 strains were classified as East Asian strains and the remaining 10 as Western strains, including six clinical isolates (Table 1). In addition, two East Asian strains (GZ17 and GZ18) were isolated from the same patient, and East Asian strain GZ11 and Western strain GZ10 were isolated from another patient. Strain GZ15 lacks the EPIYA-A, -B, and -C/D sites and could not be assigned to either group.
Sequence comparison of CagA showed that Western strains had more variation than East Asian strains in the EPIYA motifs, especially deletion of the EPIYA-C site (5/10 strains) and A→T variation at the EPIYA-B site (5/10 strains), whereas only 2 of the 21 East Asian strains had A→V variation at the EPIYA-D site and one strain showed P→S conversion at the EPIYA-B site (Table 2). More importantly, deletion or partial deletion of 13 amino acids in the region spanning residues 821-834 of CagA (numbered according to the 26695 sequence) was detected for the first time in all East Asian strains but not in Western strains (Fig. 1A). An eight-amino-acid difference between East Asian and Western strains was also detected between the EPIYA-C/D site and the CM motifs (Fig. 1B).
Functional domain diversity of CagA between East Asian and Western strains
Studies on the crystal structure of CagA revealed that CagA consists of an N-terminal ordered region (residues 1-829), comprising domains I, II, and III, and a C-terminal disordered segment (residues 830-1186) [11]. CagA contains several important functional domains and segments, including the CagA-phosphatidylserine (PS) interaction domain, EPIYA motifs, and CM motifs (Fig. 2A). The PS segment in domain II mediates the attachment of CagA to the cytoplasmic membrane post-translocation, while the N- and C-terminal binding sequences (NBS and CBS) are associated with CagA dimerization. EPIYA motifs (Glu-Pro-Ile-Tyr-Ala) have several important functions, such as CagA phosphorylation, binding of CagA to SHP-2, and activation of multiple intracellular signaling pathways. CM motifs, by binding to partitioning-defective 1b kinase (PAR1b, a serine-threonine kinase), participate in the maintenance of gastric epithelial cell polarity. In brief, these domains play a crucial role in H. pylori-induced gastric pathogenesis. Interestingly, we found significant sequence variability between East Asian and Western CagA in these important domains and their flanking regions (Fig. 2B).
Construction of a molecular phylogenetic tree of CagA
One hundred and fifty sequences of H. pylori CagA, including our own isolates, were randomly obtained from the GenBank database to construct a CagA-based molecular phylogenetic tree using MEGA software (Fig. 3). In the phylogenetic tree, the 150 strains were clustered into two large groups: an East Asian group characterized by EPIYA-ABD (80 strains) and a Western group characterized by EPIYA-ABC (70 strains). The Western group was further clustered into three subgroups, which were named the Western group, the East Asian type of the Western group, and the South America type of the Western group based on their geographical origin. The East Asian group primarily originated from China (20/80), Japan (17/80), Vietnam (29/80), and the Philippines (6/80). The Western group primarily originated from Colombia (24/70) and the Philippines (12/70). However, many strains originating from East Asian countries, including Japan, China, and Korea, were clustered into the East Asian type of the Western group, and two strains originating from Peru were clustered into the South America type of the Western group. Among 24 of our clinical isolates, 18 and six isolates were clustered into the East Asian and Western groups, respectively, suggesting that Western H. pylori strains are widespread in China. We further found that strains from the same country and region tend to cluster together, which may contribute to the varying gastric cancer incidence in different geographical regions. According to global estimates of incidence and mortality rates by the World Health Organization in 2018, gastric cancer incidence gradually decreases from East Asian regions, such as Korea, China, and Japan, to South-East Asian regions, such as Vietnam and the Philippines, and finally to South American regions, such as Colombia.
The sequences flanking the EPIYA-C/D sites affect tyrosine phosphorylation of CagA and are defined as the left and right CM domains, respectively. We found three different sets of sequences on both sides of the EPIYA-C/D sites. FLLKRHDKVDDLSKVG is a typical sequence located on both sides of EPIYA-C in the Western group and the East Asian type of the Western group; it represents the classical CM motif. SSLKRYAKVDDLSKVG is located on both sides of EPIYA-C in the South America type of the Western group, whose strains have been reported to be less virulent. The third set of sequences is found on both sides of EPIYA-D in the East Asian group. The KIASAGKGVGGFSGVG segment to the left of EPIYA-D replaces the classical CM motif found to the left of EPIYA-C, and the FPLRRSAAVNDLSKVG segment to the right of EPIYA-D is partly identical to the sequence to the right of EPIYA-C (Additional file 2: Fig. S2). In addition, consistent with the results from our isolates, Western CagA has more variation: conversion of EPIYA to EPIYT at the EPIYA-B site and duplication of EPIYA-C appear in 25 (36%) and 31 (44%) of the 70 Western strains, respectively, while the same change at the EPIYA-B site was observed in only 4 (5%) of the 80 East Asian strains. The variations in the EPIYA motifs of CagA from the 150 strains are listed in Table 2.
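The motif and flanking-sequence comparison described above can be illustrated with a minimal sketch. It assumes exact string matching against the four flanking sequences quoted in the text, which is a simplification because real CagA sequences vary within these segments; the function name `scan_caga` and the group labels it returns are hypothetical and do not represent the procedure used in this study.

```python
import re

# Flanking sequences quoted in the text (exact matching is an assumption).
CM_WESTERN = "FLLKRHDKVDDLSKVG"        # classical CM motif
CM_SOUTH_AMERICA = "SSLKRYAKVDDLSKVG"  # South America type of the Western group
EAST_ASIAN_LEFT = "KIASAGKGVGGFSGVG"   # left of EPIYA-D
EAST_ASIAN_RIGHT = "FPLRRSAAVNDLSKVG"  # right of EPIYA-D


def scan_caga(seq):
    """Return EPIYA/EPIYT positions (1-based) and the flanking set found."""
    seq = seq.upper()
    # EPIY[AT] captures both the canonical EPIYA and the EPIYT variant.
    motifs = [(m.start() + 1, m.group()) for m in re.finditer(r"EPIY[AT]", seq)]
    if EAST_ASIAN_LEFT in seq or EAST_ASIAN_RIGHT in seq:
        flank = "East Asian"
    elif CM_SOUTH_AMERICA in seq:
        flank = "South America type of Western"
    elif CM_WESTERN in seq:
        flank = "Western"
    else:
        flank = "unassigned"
    return {"motifs": motifs, "flank": flank}


# Toy sequence (not a real CagA fragment) used only to exercise the function.
toy = "MK" + "EPIYA" + "GG" + CM_WESTERN + "EPIYT"
result = scan_caga(toy)
```

A sequence carrying none of the four segments is reported as "unassigned", which mirrors how a strain lacking the relevant sites cannot be placed in either group.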
We also analyzed the composition of the 20 amino acids in East Asian and Western CagA from the 150 H. pylori strains and found that certain amino acids, such as Glu, Leu, Thr, Arg, and Trp, are significantly more abundant in East Asian strains. Conversely, Lys, Met, Gly, His, Pro, Val, and Cys are significantly more abundant in Western strains (Additional file 3: Fig. S3). In particular, 50/70 Western CagA sequences contain Cys, an important amino acid in protein dimerization, but only 12/80 East Asian CagA sequences contain Cys.
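As an illustration of the composition analysis, the following sketch computes per-sequence amino acid fractions and the share of sequences containing a given residue. The helper names are hypothetical, and the statistical comparison between groups is not shown because the text does not specify the test used.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def composition(seq):
    """Fraction of each standard amino acid in one protein sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {a: counts[a] / total for a in AMINO_ACIDS}


def fraction_with_residue(seqs, residue):
    """Share of sequences that contain the residue at least once."""
    seqs = list(seqs)
    if not seqs:
        raise ValueError("no sequences given")
    return sum(residue.upper() in s.upper() for s in seqs) / len(seqs)
```

Applied to the counts reported above, `fraction_with_residue(western_seqs, "C")` would correspond to 50/70 and the East Asian equivalent to 12/80, where `western_seqs` stands for a hypothetical list of the Western CagA sequences.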
iTRAQ-based quantitative proteomics of East Asian and Western H. pylori strains
CagA sequence polymorphisms between East Asian and Western H. pylori strains cannot completely explain their pathogenic differences. Therefore, we sought to further define the proteomic differences between the two strain groups. Six H. pylori strains, including three East Asian strains (GZ1, GZ3, and GZ7) with EPIYA-ABD motifs and three Western strains (NCTC11639, 26695, and GZ5) with EPIYA-ABC motifs, were used to conduct iTRAQ-based absolute quantitative proteomics. Proteomic analysis quantified a total of 2084 proteins and identified 108 differentially expressed proteins between Western and East Asian H. pylori strains according to the criteria of Bonferroni-corrected P < 0.01 and fold change ≥1.2 (up-regulation) or ≤0.8 (down-regulation) [22]. After exclusion of hypothetical, duplicate, and unidentified proteins, 70 differential proteins were mapped to the standard strain H. pylori 26695 in the UniProt database (https://www.uniprot.org/), of which 26 proteins were up-regulated and 44 proteins were down-regulated in the Western group compared to the East Asian group (Fig. 4A and 4B). Using hierarchical clustering analysis, we found that, among the differential proteins, CagA was highly expressed in the Western group. In contrast, the urease subunit alpha (UreA) and urease accessory protein (UreH), both of which are related to the colonization of H. pylori in the human stomach, were highly expressed in the East Asian group. We further observed that flagellin-associated proteins (flagellin FlaA, flagellar hook protein FlgE, and flagellar biosynthesis protein FlhA) and cell division proteins (FtsZ and FtsI) were abundant in the East Asian group (Fig. 4C). More details of the differential proteins are presented in Additional file 4: Table S1, and the identification of differential proteins by MS/MS is presented in Additional file 5: Table S2.
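The selection criteria stated above (Bonferroni-corrected P < 0.01, fold change ≥1.2 or ≤0.8) can be expressed as a short sketch. It assumes that the fold change is the Western/East Asian abundance ratio and that the Bonferroni correction multiplies each raw P value by the number of proteins tested; neither detail is stated in the text, and the function name is hypothetical.

```python
def classify_protein(fold_change, p_value, n_tests,
                     alpha=0.01, up=1.2, down=0.8):
    """Label one protein as 'up', 'down', or 'unchanged'.

    fold_change: assumed Western / East Asian abundance ratio.
    p_value:     raw (uncorrected) P value for that protein.
    n_tests:     number of proteins tested (assumed Bonferroni denominator).
    """
    p_adjusted = min(1.0, p_value * n_tests)  # Bonferroni correction
    if p_adjusted >= alpha:
        return "unchanged"
    if fold_change >= up:
        return "up"
    if fold_change <= down:
        return "down"
    return "unchanged"
```

With 2084 quantified proteins, a raw P value must fall below roughly 0.01/2084 ≈ 4.8 × 10^-6 to pass this threshold under the stated assumption.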
Because of the high heterogeneity among different H. pylori strains, the biological repeatability of the three strains in the same group was analyzed by principal component analysis (PCA), and correlation coefficients of normalized protein intensity between pairs of strains in the same group were measured. The results indicated good clustering within each group and clear separation between the two groups (Fig. 4D). The correlation coefficients ranged from 0.45 to 0.63 (Additional file 6: Fig. S4). The standard deviation and coefficient of variation of the abundance of the 2084 proteins and the 70 differential proteins were also calculated and are shown in Additional file 7: Table S3 and Additional file 8: Table S4, respectively. Finally, the mRNA expression levels of five differentially expressed proteins, UreA, FlaA, FlgE, CagA, and FlhA, were validated by RT-qPCR, and the validation results were consistent with the proteomic results (Fig. 4E).
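The repeatability measures mentioned above can be sketched as follows. The text does not state which correlation coefficient was used, so Pearson's r is assumed here, and the coefficient of variation is taken as the sample standard deviation divided by the mean; the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; coefficient of variation undefined")
    return stdev(values) / m


def pearson(x, y):
    """Pearson correlation between two equal-length intensity vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant vector has no defined correlation")
    return cov / (sx * sy)
```

In this setting, `x` and `y` would be the normalized intensities of the same proteins in two strains of one group, and the coefficient of variation would be computed per protein across the strains of a group.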
Functional annotation and protein interaction networks of differential proteins
To obtain functional information and interaction networks, the 70 differentially expressed proteins were annotated with gene ontology (GO) terms using DAVID 6.8. KEGG pathway enrichment and interaction network analysis were conducted using the KOBAS 3.0 and STRING online tools, respectively. The results showed that these differential proteins are mainly associated with biosynthetic processes, metabolism, translation, and gene expression (Fig. 5A), and are enriched in nine important pathways, five of which are significantly enriched (FDR-corrected P value < 0.05) (Fig. 5B and 5C). Protein-protein interaction analysis indicated that the highly expressed proteins in East Asian strains are clustered into two significant networks with UreA and FtsI as core nodes, while the highly expressed proteins in Western strains are clustered into an important network with GroEL and CagA as nodes (Fig. 5D).
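For readers unfamiliar with FDR correction, the threshold above can be illustrated with a minimal sketch of the Benjamini-Hochberg procedure. That this particular procedure was used is an assumption, since the text says only "FDR-corrected"; the function name and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # ascending by P value
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values for four pathways.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
significant = [q < 0.05 for q in adjusted]
```

A pathway would be counted as significantly enriched when its adjusted value falls below 0.05, matching the cutoff quoted in the text.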