CagA is an important oncoprotein that is translocated into gastric epithelial cells and subsequently phosphorylated at tyrosine residues within its EPIYA motifs. Phosphorylated CagA binds and activates the phosphatase SHP-2, causing actin cytoskeletal rearrangement and the so-called hummingbird phenotype. This disturbs normal cellular signal transduction and promotes abnormal proliferation of gastric epithelial cells. The interaction between CagA and SHP-2 suggests that CagA, as a key oncoprotein, plays an important role in the development of gastrointestinal diseases caused by H. pylori. We therefore used molecular epidemiological methods to study the diversity of the cagA 3’ variable region and to explore the molecular mechanisms by which H. pylori infection promotes the development of gastrointestinal diseases.
The tyrosine phosphorylation sites are located within the EPIYA repeat sequences at the C-terminus of CagA, and the number of EPIYA repeats directly affects both the binding of CagA to SHP-2 and its ability to induce morphological changes in gastric epithelial cells [20]. Variation in the EPIYA repeat sequences may therefore be an important reason for differences in virulence and clinical outcome among H. pylori strains. In our study, variation of the EPIYA motif was more frequent in EPIYA-B segments than in EPIYA-A, EPIYA-C and EPIYA-D segments, and EPIYT (74/1,587) was the most common variant. In EPIYA-C and EPIYA-D segments, the residues following the EPIYA motif are generally TIDD and TIDF, respectively, and these residues are an important structural determinant of SHP-2 binding. Our study confirms that an EPIYA motif belongs to segment C if it is followed by TIED or TIDE. In addition, it has been shown that an EPIYA motif is also classified as segment C if it is followed by TIEE, SIDD, TIDG, TIAE or TIAD, and as segment D if it is followed by TIDS [2].
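The flanking-residue rules above amount to a lookup on the four residues that follow each EPIYA motif. The sketch below restates only those rules; the function name `classify_cd_segments` and the rule table are hypothetical illustrations, not the classifier used in this study, and A- and B-type segments (whose flanks are not listed here) are simply skipped.

```python
# Hypothetical rule table restating the C/D flanking-residue rules in the text.
SEGMENT_BY_FLANK = {
    "TIDD": "C", "TIDF": "D",                 # canonical flanks
    "TIED": "C", "TIDE": "C",                 # C-type flanks confirmed in this study
    "TIEE": "C", "SIDD": "C", "TIDG": "C",    # additional flanks reported in [2]
    "TIAE": "C", "TIAD": "C", "TIDS": "D",
}


def classify_cd_segments(seq: str, motif: str = "EPIYA") -> list[tuple[int, str]]:
    """Return (0-based motif start, segment label) for every motif whose
    four downstream residues match a known C- or D-type flank."""
    hits = []
    start = seq.find(motif)
    while start != -1:
        # The four residues immediately after the motif decide the segment type
        flank = seq[start + len(motif): start + len(motif) + 4]
        label = SEGMENT_BY_FLANK.get(flank)
        if label is not None:
            hits.append((start, label))
        start = seq.find(motif, start + 1)
    return hits


# Toy fragment (illustrative only) joining a C-type and a D-type motif
print(classify_cd_segments("KKEPIYATIDDLGGPEPIYATIDFDE"))  # [(2, 'C'), (15, 'D')]
```

Passing a variant such as `motif="EPIYT"` applies the same flank rules to variant motifs, although whether those rules transfer to variants is an assumption.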
Based on the sequences flanking the EPIYA motifs, we defined several additional segments, including B’D, B’’D and D’, whose sequences differ somewhat from those of the B and D segments. For example, in the B’’D segment the sequence preceding EPIYA resembles that of the D segment, whereas in the B’D segment the sequence following EPIYA resembles that of the AD segment. The distribution of CagA EPIYA segments has been reported to vary markedly by geographic region. The EPIYA-A and EPIYA-B segments occur in almost all cagA-positive strains, whereas the EPIYA-C and EPIYA-D segments are characteristic of Western and East Asian CagA, respectively. In our study, 82.1% (413/503) of the CagA sequences were of the ABD subtype, whereas 4.0% (20/503) were of the ABC or ABCC subtype. Of the 22 Western-type CagA sequences, 77.3% (17/22) came from Neimenggu, Heilongjiang and Yunnan, which may reflect human migration or direct transmission. Previous studies reported no significant correlation between the CagA ABD subtype and the type of gastroduodenal disease [21, 22]. In contrast, we found a significant association between the ABD subtype and gastroduodenal disease (P < 0.01). East Asian CagA has been shown to be more pathogenic than Western CagA, which may explain why the incidence of gastric cancer is significantly higher in East Asian countries than in Western countries [23, 24].
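The East Asian/Western distinction and the proportions reported above can be reproduced with a few lines of arithmetic. This is a minimal sketch under a stated assumption: a pattern string such as "ABD" is labeled East Asian if it contains a D segment and no C segment, Western in the reverse case, and left unassigned otherwise. The helper names `geographic_type` and `percent` are hypothetical.

```python
def geographic_type(pattern: str) -> str:
    """Label a CagA EPIYA pattern such as 'ABD' or 'ABCC' by its C/D content.

    Assumption: 'C' marks Western-type and 'D' marks East Asian-type CagA;
    patterns containing both or neither are left unassigned.
    """
    has_c, has_d = "C" in pattern, "D" in pattern
    if has_d and not has_c:
        return "East Asian"
    if has_c and not has_d:
        return "Western"
    return "unassigned"


def percent(part: int, total: int) -> float:
    """Proportion expressed as a percentage, rounded to one decimal place."""
    return round(100 * part / total, 1)


for pattern in ("ABD", "ABC", "ABCC"):
    print(pattern, geographic_type(pattern))

# Counts as reported in the text
print(percent(413, 503))  # ABD among all CagA sequences -> 82.1
print(percent(20, 503))   # ABC or ABCC among all CagA sequences -> 4.0
print(percent(17, 22))    # Western CagA from the three provinces -> 77.3
```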
CagA can be phosphorylated by Src family kinases (SFKs) at the tyrosine residues of the EPIYA motifs [25]. The tyrosine-phosphorylated C and D segments specifically bind SHP-2, which plays an important role in the development of gastric cancer [26, 27]. The tyrosine-phosphorylated A and B segments can bind and activate C-terminal Src kinase (Csk), which inhibits SFKs and thereby provides negative feedback regulation [28, 29]. Inhibition of SFKs reduces the amount of phosphorylated CagA, which may partly explain how H. pylori can persist in the gastric epithelium for a long time without causing extensive gastric injury [29]. It has therefore been proposed that CagA with more A and B segments inhibits SFKs more effectively and thereby causes less cell damage [30, 31]. In the present study, we found 20 CagA sequence types containing different numbers of EPIYA-A or EPIYA-B segments, such as AAABD, ABDABD and BD. The number of EPIYA-A and EPIYA-B segments may contribute to differences in the type and severity of gastrointestinal disease, and the relationship between EPIYA segments and gastrointestinal disease needs further investigation.
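To compare sequence types such as AAABD, ABDABD and BD on the basis described above, each type can be reduced to two counts: A/B segments (Csk-binding, hence SFK-inhibiting) and C/D segments (SHP-2-binding). The sketch below does only that bookkeeping. The function `segment_counts` and the two-group split are illustrative simplifications, and the sketch says nothing about how strongly each segment actually binds.

```python
from collections import Counter


def segment_counts(pattern: str) -> dict[str, int]:
    """Count EPIYA segments in a CagA sequence type such as 'AAABD'.

    A/B segments are grouped as Csk-binding and C/D segments as
    SHP-2-binding; this grouping is a simplification for illustration.
    """
    counts = Counter(pattern)  # missing letters count as 0
    return {
        "csk_binding": counts["A"] + counts["B"],
        "shp2_binding": counts["C"] + counts["D"],
    }


for pattern in ("ABD", "AAABD", "ABDABD", "BD"):
    print(pattern, segment_counts(pattern))
```

AAABD and ABDABD carry the same number of A/B segments (four) but differ in their C/D content, so the two counts are kept separate rather than combined into a single ratio.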
Research has shown that the pathogenicity of CagA is determined by its SHP-2-binding ability, which is in turn related to the number of tyrosine phosphorylation sites [12]. Souza [32] reported that SH2 domains bind closely related phosphopeptide sequences, with the binding motif pY-(S/T/A/V/I)-X-(V/I/L)-X-(W/F). East Asian CagA (pY-A-T-I-D-F) carries F at the final position of this motif, whereas Western CagA (pY-A-T-I-D-D) carries D. Consistent with this, East Asian CagA binds SHP-2 more strongly than Western CagA, which can lead to more severe gastroduodenal disease. Higashi et al. [10] demonstrated that a single amino acid difference accounts for the difference in SHP-2-binding activity between East Asian and Western CagA proteins. Research on amino acid polymorphisms and their association with gastrointestinal diseases may therefore have important clinical value. In our study, we identified seven polymorphic amino acid sites in the sequences surrounding the EPIYA motifs: residues 893, 894, 900, 906, 909, 910 and 963. Deletion of amino acids 893 and 894 was significantly associated with GC. In most patients with CG, GU, DU and MALT, residues 893 and 894 were asparagine (Asn) and glutamic acid (Glu), respectively, whereas 36.8% (7/19) of the isolates from GC patients lacked these two amino acids. This deletion may alter the spatial conformation of CagA and affect its tyrosine phosphorylation and SHP-2 binding, thereby accelerating the development of gastrointestinal disease.
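The SH2-binding consensus quoted above can be written directly as a regular expression, which makes the single-residue difference between the East Asian and Western sequences easy to check. This is a minimal sketch: phosphotyrosine is represented by a plain "Y" (phosphorylation state is assumed, not checked), matching the consensus is only a proxy for binding, and the helper names are hypothetical.

```python
import re

# Consensus from the text: pY-(S/T/A/V/I)-X-(V/I/L)-X-(W/F).
# "Y" stands in for phosphotyrosine; phosphorylation is assumed, not checked.
SH2_MOTIF = re.compile(r"Y[STAVI].[VIL].[WF]")


def matches_sh2_consensus(peptide: str) -> bool:
    """True if the peptide, read from the tyrosine onward, fits the consensus."""
    return SH2_MOTIF.match(peptide) is not None


def sh2_sites(seq: str) -> list[int]:
    """0-based positions of tyrosines that start a consensus match."""
    return [m.start() for m in SH2_MOTIF.finditer(seq)]


print(matches_sh2_consensus("YATIDF"))  # East Asian pY-A-T-I-D-F -> True
print(matches_sh2_consensus("YATIDD"))  # Western pY-A-T-I-D-D -> False (D is not W/F)
print(sh2_sites("EPIYATIDFDE"))         # [3]
```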