Molecular pathological signatures of gastric cancer in Koreans revealed by label-free proteomics

Recent studies have suggested that gastric cancer (GC) may be a heterogeneous disease caused by various genetic defects in combination with environmental risk factors. In this study, a quantitative label-free proteomic analysis was performed to detect differentially expressed proteins and fusion proteins expressed only in GC tissues, and to identify the signal transduction pathways involved in GC tumorigenesis in Korean patients. We identified 72 up-regulated and 29 down-regulated proteins in at least 5 of 9 GC tissues compared with paired normal tissues.

To fully realize personalized medicine, additional genetic defects present in GCs must be elucidated. In this study, we employed quantitative and qualitative label-free proteomics to identify differentially expressed proteins (DEPs), as well as fusion proteins that are present exclusively in GC tissues and not in normal tissues. The identified DEPs and fusion proteins could serve as prognostic and predictive markers and as targets for developing precision medicine tools.

Protein extraction
For the label-free proteomic analysis, proteins were extracted from gastric cancer and adjacent normal control tissues with the T-PER Tissue protein extraction kit (Pierce Biotechnology, Rockford, IL). Briefly, 20 to 30 mg of tissue was ground with glass beads in 200 μL of the T-PER reagent containing protease inhibitors (Roche Diagnostics, Basel, Switzerland), followed by sonication using 6 bursts of 30 s. The protein extracts were subsequently centrifuged for 10 min at 15,000 g to obtain soluble fractions. All steps were performed on ice. Total proteins in the soluble fractions were quantified using the BCA assay. The supernatant was mixed with the loading buffer (Tris 40 mM pH 7.5, 2% SDS, 10% glycerol, 25 mM DTT), and the mixture was boiled for 5 min at 95°C. Then, 30 μg of protein per lane was loaded into a 12% SDS-PAGE gel. The gels were subsequently cut to separate each sample lane, and each lane was divided into 5 pieces. Finally, each piece was digested with trypsin gold, and the peptides were extracted and completely dried.
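Loading a fixed 30 μg per lane depends on the BCA-derived protein concentration. The sketch below shows one common way to obtain that concentration (a linear fit to protein standards, then inversion) and the extract volume that contains the target mass. The standard concentrations, absorbance readings, and 1:10 pre-assay dilution are hypothetical, and a linear standard curve is an assumption; the source does not state how its BCA readings were fitted.

```python
"""Sketch: protein concentration from a BCA standard curve and the extract
volume needed to load a fixed protein mass per SDS-PAGE lane.
All standards, absorbances, and the dilution factor are hypothetical."""


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_absorbance(a562, slope, intercept, dilution=1.0):
    """Invert the standard curve, then undo any dilution applied before the assay."""
    return (a562 - intercept) / slope * dilution


def volume_for_mass(target_ug, conc_ug_per_ul):
    """Volume of extract (uL) that contains target_ug of protein."""
    return target_ug / conc_ug_per_ul


# Hypothetical BSA standards (ug/uL) and blank-corrected A562 readings.
standards = [0.0, 0.25, 0.5, 1.0, 2.0]
a562 = [0.0, 0.1, 0.2, 0.4, 0.8]

slope, intercept = fit_line(standards, a562)
conc = concentration_from_absorbance(0.3, slope, intercept, dilution=10)  # sample assayed at 1:10
vol = volume_for_mass(30.0, conc)  # 30 ug per lane, as in the protocol
print(f"{conc:.2f} ug/uL -> load {vol:.1f} uL of extract")
```

In practice the curve may be fitted with a quadratic term if the standards bend at high concentration; the linear form is kept here for readability.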

Label-free quantitative proteomics
Tryptic peptides were analyzed in triplicate using a nanoAcquity UPLC system (Waters, Milford, MA) coupled to a Synapt G1 HDMS mass spectrometer (Waters). Peptides were enriched on a Symmetry C18 RP column (180 μm × 20 mm, particle size 5 μm) and separated on a BEH 130 C18 column (75 μm × 250 mm, particle size 1.7 μm; Waters). For each experiment, 2 μL of tryptic peptides was loaded onto the enrichment column with mobile phase A (water with 0.1% formic acid). A step gradient was employed at a flow rate of 280 nL/min, consisting of 5 to 45% mobile phase B (acetonitrile with 0.1% formic acid) over 55 min, followed by a sharp increase to 90% B within 10 min. The eluted peptides were analyzed in positive ionization mode using data-independent MSE acquisition. The MS/MS peaks of [Glu1]-fibrinopeptide (400 fmol/μL) were used to calibrate the time-of-flight analyzer over the range m/z 50 to 1990, and the doubly charged [Glu1]-fibrinopeptide ion (m/z 785.8426) was used for lock mass correction. During data acquisition, the capillary voltage was set to 3.2 kV and the source temperature to 100°C. The collision energy was set to 6 eV for the low-energy MS mode (intact peptide ions) and ramped from 15 to 40 eV for the elevated-energy mode (peptide product ions). The scan time was 1.0 s. LC-MSE raw data files were processed, and protein identification and relative quantification were performed using ProteinLynx Global Server (PLGS 2.5.1, Waters).
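Lock mass correction uses a reference ion of known m/z to compensate for drift in the TOF mass scale. The sketch below illustrates the idea with a single-point multiplicative correction and a ppm error calculation. This is an assumption for illustration: the exact correction performed internally by PLGS is not described in the text, and the observed lock-ion and peptide m/z values are hypothetical.

```python
"""Sketch of single-point lock-mass correction for TOF m/z values.

Assumption: a simple multiplicative correction derived from one reference
ion; the exact algorithm used internally by PLGS is not described here.
"""

# Doubly charged [Glu1]-fibrinopeptide ion used for lock mass correction (from the text).
LOCK_MASS_MZ = 785.8426


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Mass measurement error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def lock_mass_correct(mz_values, observed_lock_mz, theoretical_lock_mz=LOCK_MASS_MZ):
    """Rescale m/z values by the ratio mapping the observed lock ion onto its theoretical m/z."""
    factor = theoretical_lock_mz / observed_lock_mz
    return [mz * factor for mz in mz_values]


# Hypothetical readings: the lock ion drifted slightly high during a run.
observed_lock = 785.8505
measured = [500.2600, 912.4471, observed_lock]

print(f"lock-mass error: {ppm_error(observed_lock, LOCK_MASS_MZ):.2f} ppm")
corrected = lock_mass_correct(measured, observed_lock)
for before, after in zip(measured, corrected):
    print(f"{before:.4f} -> {after:.4f}")
```

A multiplicative factor keeps the relative (ppm) error constant across the mass range, which is a reasonable first-order model of TOF drift; instrument software may apply more elaborate corrections.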
The processing parameters included automatic tolerances for precursor and product ions; a minimum of 3 fragment ion matches per peptide, 7 fragment ion matches per protein, and 2 peptide matches per protein; a maximum false positive rate (FPR) of 4%; carbamidomethylation of cysteine (+57 Da) as a fixed modification; oxidation of methionine (+16 Da) as a variable modification; and one allowed missed cleavage. Proteins were identified with the ion accounting algorithm of PLGS by searching the Homo sapiens (Human) database (70,718 entries) obtained from UniProt (http://www.uniprot.org).
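The search settings above (trypsin specificity, one missed cleavage, fixed carbamidomethylation of Cys) determine which theoretical peptides are matched against the spectra. The sketch below generates such peptides and their neutral monoisotopic masses. It assumes the classic trypsin rule (cleave after K or R, not before P) and standard monoisotopic residue masses; it is an illustration of the digestion rules, not PLGS's search engine, and the toy sequence is hypothetical. Variable methionine oxidation is omitted for brevity.

```python
"""Sketch: in-silico tryptic digestion with missed cleavages and
monoisotopic peptide masses with fixed carbamidomethyl-Cys.

Assumptions: classic trypsin rule (cleave after K/R, not before P) and
standard monoisotopic residue masses; this is not PLGS's search engine.
"""

# Monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
CARBAMIDOMETHYL = 57.021464  # fixed modification on every Cys (+57 Da in the text)


def cleavage_bounds(seq: str) -> list[int]:
    """Start/end offsets of fully cleaved tryptic peptides."""
    sites = [i + 1 for i in range(len(seq) - 1)
             if seq[i] in "KR" and seq[i + 1] != "P"]
    return [0] + sites + [len(seq)]


def digest(seq: str, missed_cleavages: int = 1) -> list[str]:
    """All tryptic peptides containing up to `missed_cleavages` internal sites."""
    bounds = cleavage_bounds(seq)
    peptides = []
    for i in range(len(bounds) - 1):
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptides.append(seq[bounds[i]:bounds[j]])
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass with all cysteines carbamidomethylated."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return mass + CARBAMIDOMETHYL * peptide.count("C")


# Hypothetical toy sequence, not a real protein.
for pep in digest("AKPGRCDEKR"):
    print(f"{pep:<10} {peptide_mass(pep):.4f}")
```

Note that the K-P bond in the toy sequence is not cleaved, so "AKPGR" is a single fully tryptic peptide.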
The quantitative analysis was based on peptide ion peak intensities measured in low-collision-energy mode across each triplicate set and was performed using the Waters Expression module of PLGS 2.5.1. Datasets were normalized using the autonormalization function. All proteins were identified with >95% confidence, and identical peptides from the triplicate runs of each sample were clustered by mass precision and a retention time tolerance of <0.25 min using the clustering software included in PLGS 2.5.1. Only proteins identified in at least two of three technical replicates with a protein probability score greater than 80 were selected for qualitative and quantitative analysis.
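The reproducibility filter above can be sketched as a simple count across replicate runs. This assumes the score cut-off of 80 is applied in each replicate separately and a protein must pass it in at least 2 of 3 runs; the text does not say whether the score is per replicate or aggregated. Protein IDs and scores are made up.

```python
"""Sketch: keep proteins reproducibly identified across technical replicates.

Assumption: the score cut-off (>80) is applied per replicate, and a protein
must pass it in at least 2 of 3 replicates. Protein IDs and scores are made up.
"""
from collections import defaultdict


def reproducible_proteins(replicates, min_replicates=2, min_score=80.0):
    """Return sorted protein IDs passing the score cut-off in enough replicates.

    `replicates` is a list of dicts, one per LC-MS run, mapping protein ID -> score.
    """
    passes = defaultdict(int)
    for run in replicates:
        for protein, score in run.items():
            if score > min_score:
                passes[protein] += 1
    return sorted(p for p, n in passes.items() if n >= min_replicates)


runs = [
    {"protein_A": 95.0, "protein_B": 85.0, "protein_C": 70.0},
    {"protein_A": 92.0, "protein_C": 88.0},
    {"protein_B": 60.0, "protein_C": 90.0},
]
print(reproducible_proteins(runs))
```

Here protein_B passes the cut-off in only one run and is discarded, while protein_C is kept despite a low score in the first run.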
To identify fusion proteins in GCs, we used a fusion protein database from the Catalogue of Somatic Mutations in Cancer (COSMIC v77: http://cancer.sanger.ac.uk/cosmic). Regions of fusion-protein amino acid sequences that matched identified peptides were highlighted in color.
Twenty-six candidate fusion proteins were manually inspected to determine whether a peptide spanning the junction between the two partner proteins had been detected (Additional file 3).
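The manual check described above asks whether an identified peptide covers the fusion junction, with residues from both partners. A minimal sketch of that test is shown below. The `min_flank` parameter (minimum residues required on each side) is a hypothetical knob not mentioned in the source, and the toy sequences are invented; they are not the TPM4-ALK or hnRNPA2B1-FAM96A sequences.

```python
"""Sketch: does an identified peptide cover the junction of a fusion protein?

`junction` is the 0-based index of the first residue contributed by the 3'
partner. Sequences below are hypothetical, not TPM4-ALK or hnRNPA2B1-FAM96A.
"""


def spans_junction(fusion_seq: str, junction: int, peptide: str, min_flank: int = 1) -> bool:
    """True if any occurrence of `peptide` has >= min_flank residues on each side of the junction."""
    start = fusion_seq.find(peptide)
    while start != -1:
        end = start + len(peptide)  # exclusive end offset
        if start + min_flank <= junction <= end - min_flank:
            return True
        start = fusion_seq.find(peptide, start + 1)
    return False


# Toy fusion: 5' partner fragment followed by 3' partner fragment.
five_prime, three_prime = "MAEEL", "VTQRS"
fusion = five_prime + three_prime
junction = len(five_prime)
for pep in ["EELVTQR", "MAEEL", "VTQR"]:
    print(pep, spans_junction(fusion, junction, pep))
```

Only "EELVTQR" qualifies; the other two peptides map entirely to one partner and therefore cannot distinguish the fusion from the normal proteins.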

Prediction of three-dimensional structures of fusion proteins
Three-dimensional (3-D) homology models of two fusion proteins were generated at

Data availability
The raw data acquired during this study will be provided upon request.

Differentially expressed proteins in Korean GC patients
To identify DEPs in GC tissues, we performed label-free proteomics on 9 pairs of cancer and matched normal stomach tissues. We found that 72 and 29 proteins were commonly up- or down-regulated, respectively, in at least 5 of the 9 GC tissues compared with normal control tissues (Fig. 1, Additional file 2). To investigate the Gene Ontology categories of the DEPs, the up- and down-regulated proteins were submitted to the Panther database (www.pantherdb.org) and categorized by biological process. Among the 72 up-regulated proteins, 42, 34, 23, and 21 were assigned to the metabolic process, cellular process, localization, and cellular component organization or biogenesis categories, respectively. In contrast, among the 29 down-regulated proteins, 13, 11, 10, and 9 were assigned to the multicellular organismal process, metabolic process, developmental process, and cellular process categories, respectively (Fig. 1).
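The "at least 5 of 9 pairs" criterion can be expressed as a per-protein count of pairs exceeding a fold-change cut-off. The sketch below assumes per-pair tumor/normal abundance ratios are already available and uses a 2-fold cut-off; that threshold is an assumption, since the text does not state the ratio used to call a protein up- or down-regulated in a single pair. Protein IDs and ratios are made up.

```python
"""Sketch: call proteins up- or down-regulated in at least 5 of 9
tumor/normal pairs.

Assumptions: per-pair tumor/normal abundance ratios are already available,
and a 2-fold change is the cut-off; the source does not state the threshold.
Protein IDs and ratios are made up.
"""
import math


def classify_deps(ratios, min_pairs=5, fold=2.0):
    """Split proteins into up- and down-regulated lists.

    `ratios` maps protein ID -> list of tumor/normal ratios (None = not quantified).
    """
    cutoff = math.log2(fold)
    up, down = [], []
    for protein, per_pair in ratios.items():
        logs = [math.log2(r) for r in per_pair if r is not None]
        n_up = sum(1 for lr in logs if lr >= cutoff)
        n_down = sum(1 for lr in logs if lr <= -cutoff)
        # With 9 pairs and min_pairs=5, a protein cannot satisfy both conditions.
        if n_up >= min_pairs:
            up.append(protein)
        elif n_down >= min_pairs:
            down.append(protein)
    return sorted(up), sorted(down)


ratios = {
    "protein_A": [3.0, 2.5, 4.0, 2.0, 2.1, 1.0, 1.0, 0.9, 1.0],
    "protein_B": [0.4, 0.5, 0.3, 0.2, 0.45, 1.0, 1.0, 1.0, 1.0],
    "protein_C": [2.0, 2.0, 2.0, 2.0, None, 1.0, 1.0, 1.0, 1.0],
}
up, down = classify_deps(ratios)
print("up:", up, "down:", down)
```

Working on log2 ratios makes the up and down thresholds symmetric, so a 2-fold increase and a 2-fold decrease are treated alike.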
To further investigate the molecular and cellular etiology underlying the pathogenesis of

Significantly up-regulated major components of microtubules
Among the diverse Tubulins present in humans, 6 α-Tubulins [Tubulin α (TUBA)-1A, TUBA-interacted with HSP family A member 1 like, HSP family A member 2, HSP family A member 5, HSP family A member 6, HSP family A member 8, HSP family A member 9, HSP family D member 1, Hypoxia up-regulated 1, and HSP family B member 1 (HSPB1). With the exception of HSPB1, all HSPs were up-regulated in GCs.

Up-regulated proteins with protein folding and trafficking activities
Cluster VI (Fig. 2VI)

Up-regulated proteins involved in protein synthesis
Three proteins involved in protein synthesis, eukaryotic translation elongation factor (EEF)-2, EEF-1A1, and EEF-1A2, were up-regulated and interacted with components in other clusters. Among these proteins, EEF1A2 is involved in tumorigenesis in ovarian cancer [25] (Fig. 2B). In addition, Serpin peptidase inhibitor, clade A, member 1 (SERPINA1), which is involved in tumor progression in GCs [32] and colorectal cancer identified by mass spectrometry. Only 2 fusion proteins were confirmed to have a corresponding peptide spanning the two partner proteins: TPM4-ALK and hnRNPA2B1-FAM96A (Additional file 2).

Down-regulated proto-oncogenes and up-regulated peptidase inhibitors in GCs
To gain further insight into the possible roles of the 2 fusion proteins in tumorigenesis, their 3-D structures were predicted by homology modeling. The model of the TPM4-ALK fusion protein contains the ALK kinase domain and a TPM4-derived region that consists primarily of α-helix and may assemble into parallel dimeric coiled coils with normal TPM4 or with other TPM4-ALK molecules (Fig. 3a). The hnRNPA2B1-FAM96A fusion protein was modeled as a dimer, because FAM96A is known to form dimers [38]. The ability of these fusion proteins to dimerize suggests that their localization could differ from that of the normal proteins, which might alter cellular and molecular processes.

Increased size of bands recognized by monoclonal TPM4 antibodies in GC tissues
To confirm the presence of TPM4-ALK fusion proteins, we performed Western blot analysis using monoclonal antibodies specific to TPM3 or TPM4. Anti-TPM4 antibodies recognized higher-molecular-weight bands in GC tissues that were not detected in normal gastric tissues. In contrast, anti-TPM3 antibodies did not detect any additional bands: only one band was observed in GC tissues, with the same migration pattern as in normal gastric tissues (Fig. 4).

Discussion
The development of new therapeutic methods for GC has contributed to a remarkable decrease in mortality worldwide [13]. Nevertheless, GC remains one of the most common cancers in Korea and Asia [2]. Because GC is associated with heterogeneous genetic defects [1,3], the specific defects in each patient must be elucidated at the protein level to extend the efficacy of personalized medicine [39]. In this study, we employed label-free proteomic analysis to investigate DEPs and fusion proteins in Korean GC tissues.
The DEPs identified in this study could be divided into 6 highly associated clusters and 5 relatively linked groups. These 5 groups were manually curated based on their reported or predicted functions because direct interactions between them have not yet been determined (Fig. 2).
Interestingly, the 6 clusters identified in this study harbored targets of currently prescribed cancer drugs or their homologs, as well as proteins known to play pivotal roles in tumorigenesis and metastasis. For example, various β-Tubulins are known targets of paclitaxel and vinca alkaloids, which inhibit mitosis and angiogenesis in tumor cells [23], and these targets formed a cluster (Fig. 2II). In addition, the levels of heat shock molecular chaperones, which are increased in most cancers [40], were also significantly increased in the present work and formed a cluster (Fig. 2V).
Furthermore, ER- and GA-resident components that regulate protein folding, modification, and trafficking were also significantly increased and formed a cluster.
Among these proteins, RPN2, a component of the oligosaccharyltransferase complex that is involved in anti-cancer drug resistance in breast cancer [41] and non-small cell lung cancer [42], was significantly increased, suggesting that drug resistance and metastasis mechanisms may be conserved in GC. In addition, the significant increases in HSPA5, CALR, and CANX in GC (Fig. 2V and 2VI) suggest that the unfolded protein response (UPR) might be activated in GC, similar to the UPR activation observed in Helicobacter-induced GC [43]. Such activation could help cancer cells overcome hypoxia and low nutrient supply, two initial stress conditions that induce aggressiveness in a murine model of melanoma [44]. Interestingly, we also identified a cluster harboring significantly up-regulated glycolytic enzymes, such as PKM, PGK1, ENO1, ENO2, and GPI.
Six highly associated clusters were identified. An additional 5 groups were clustered based on their reported or predicted biological and molecular functions.