Integrative proteomic characterization of trace FFPE samples in early-stage gastrointestinal cancer

Background Surveillance and treatment of cancer at an early stage improve patient prognosis. However, the extremely small amount of tissue available at each stage has limited the characterization of early-stage cancer. We therefore performed comprehensive proteomic and phosphoproteomic profiling of trace FFPE samples from early-stage gastrointestinal cancer and explored potential biomarkers of early-stage gastrointestinal cancer. Methods A quantitative proteomic method based on liquid chromatography with tandem mass spectrometry (LC-MS/MS) was used to analyse the proteomic differences between trace samples of early-stage esophageal squamous cell carcinoma (EESCC) and early-stage duodenum adenocarcinoma cancer (EDAC). Results We identified ~6,000 proteins and > 10,000 phosphosites in single trace FFPE samples. Comparative analysis revealed diverse proteomic features of tumor tissues compared with paired normal tissues in EESCC and EDAC, and showed that the differences between EESCC and EDAC derived from their normal tissues of origin. The distinct separation of EESCC and EDAC highlighted cell cycle functions (RB1 T373, EGFR T693) in EESCC, and the positive impacts of apoptosis and metabolic processes (MTOR and MTOR S1261) in EDAC. Furthermore, we deconvoluted the immune infiltration of early-stage gastrointestinal cancer, detected higher immune cell signatures in EDAC, and identified the specific cytokines in EESCC and EDAC. We performed kinase-substrate relationship analysis, elucidated the specific kinase characteristics of EESCC and EDAC, and proposed therapeutic effects and corresponding drugs for EESCC and EDAC in the clinic. Conclusion We described the specific immune characteristics of early-stage gastrointestinal cancer and presented potential markers of EESCC (EGFR, PDGFRB, CDK4, WEE1) and EDAC (MTOR, MAP2K1, MAPK3).
This study represents a major stepping stone towards investigating the carcinogenesis mechanism of gastrointestinal cancer and provides a rich resource for therapeutic strategies in the clinic. Supplementary Information The online version contains supplementary material available at 10.1186/s12953-022-00188-0.


Introduction
Cancer remains a major health problem worldwide, causing nearly 10 million deaths every year [1,2].
Surgery is the predominant curative treatment in advanced stages (T2 to T4), but it is associated with poor quality of life (QOL) and a low five-year survival rate (< 30%) [3]. Early screening and diagnosis, endorsed by the World Health Organization (WHO), have become prominent, especially in gastrointestinal cancer. In addition, advances in endoscopic submucosal dissection (ESD) [4] have enabled the early detection of cancers (T1 stage), especially gastrointestinal cancers, with higher QOL and a significantly improved overall survival rate (> 90%) [5][6][7]. The major events of advanced-stage cancer appear to be identifiable as early as the early stage; thus, surveillance and treatment of early-stage cancer would improve patients' prognosis. However, the extremely small amount of tissue available at different stages has limited the characterization of early-stage cancer.
In pathology, formalin-fixed, paraffin-embedded (FFPE) biospecimens are the gold standard for archiving samples: they keep tissues stable and provide a valuable resource for clinical and biomarker research [8]. In addition, FFPE tissue biopsies show a high degree of proteome pattern similarity between histological region samples stored for 1 and 15 years, which facilitates tumor stratification [9]. During the last decade, great progress in mass spectrometry-based proteomics has made it possible to use biobanked FFPE samples to reveal the proteomic characteristics of many cancer types. Specifically, proteomic analysis of colorectal cancer (CRC) revealed decreased T cell infiltration and increased glycolysis [10]; Sai Ge et al. described a proteomic landscape of diffuse-type gastric cancer and showed overrepresentation of the immune response in the third subtype, which had the worst survival [11]. However, the proteomic profile of trace FFPE samples in early-stage cancer remains largely unknown.
Here, we collected FFPE samples of early-stage gastrointestinal cancer, including early-stage esophageal squamous cell carcinoma (EESCC) and early-stage duodenum adenocarcinoma cancer (EDAC). We present a comprehensive proteomic landscape of early-stage gastrointestinal cancer, with the identification of ~6,000 proteins and > 10,000 phosphosites in single trace FFPE samples. We describe the functional classification and proteomic characteristics of EESCC and EDAC. In addition, we elucidated the immune infiltration, cytokine types, and specific kinases of EESCC and EDAC, and proposed potential kinase-targeted clinical strategies for EESCC and EDAC, providing a useful resource for potential therapeutic approaches to gastrointestinal cancer.

Processing of formalin-fixed, paraffin-embedded (FFPE) specimens
In this study, all tissue samples from the corresponding substages were separately dissected from formalin-fixed, paraffin-embedded (FFPE) slides, which were prepared and provided by Zhongshan Hospital, Fudan University. The study was carried out in compliance with the ethical standards of Helsinki Declaration II and approved by the Institutional Review Board of Fudan University Zhongshan Hospital (B2019-200R). For clinical sample preparation, slides (10 μm thick) from FFPE blocks were macro-dissected, deparaffinized with xylene and washed with ethanol. All selected specimens were evaluated and confirmed by two or three experienced, board-certified gastrointestinal pathologists, and materials were aliquoted and stored at -80 ℃ until further processing.
Then, 13 μL of 10% formic acid was added to each tube, followed by vortexing for 3 min and centrifugation at 12,000 g for 5 min. The supernatant was collected into a new 1.5 mL tube containing 350 μL of extraction buffer (0.1% formic acid in 50% acetonitrile), vortexed for 3 min, and centrifuged at 12,000 g for 5 min. The supernatant was then transferred to a new tube and dried in a vacuum drier at 60 °C. After drying, the peptides were dissolved in 100 μL of 0.1% formic acid, vortexed for 3 min, and centrifuged at 12,000 g for 3 min. The supernatant was transferred to a new tube and then desalted. Before desalting, columns packed with two layers of 3M C8 disk were activated with the following liquids: 90 μL of 100% acetonitrile twice, 90 μL of 50% and then 80% acetonitrile once each, and then 90 μL of 50% acetonitrile once. After equilibration with 90 μL of 0.1% formic acid twice, the supernatant was loaded onto the column twice, and the column was washed with 90 μL of 0.1% formic acid twice. Lastly, 90 μL of elution buffer (0.1% formic acid in 50% acetonitrile) was added to the column twice for elution, and only the eluate was collected for MS. The collected eluate was dried in a vacuum drier at 60 °C.

Proteome analysis by LC-MS/MS
For the proteomic profiling of samples, peptides were analyzed on a Q Exactive HF-X Hybrid Quadrupole-Orbitrap Mass Spectrometer (Thermo Fisher Scientific, Rockford, IL, USA) coupled with a high-performance liquid chromatography system (EASY nLC 1200, Thermo Fisher). Dried peptide samples were re-dissolved in Solvent A (0.1% FA in water), loaded onto a 2-cm self-packed trap column (100-μm inner diameter, 3 μm ReproSil-Pur C18-AQ beads, Dr. Maisch GmbH) using Solvent A, and separated on a 150-μm-inner-diameter column with a length of 30 cm (1.9 μm ReproSil-Pur C18-AQ beads, Dr. Maisch GmbH) over a 150 min gradient (EESCC and EDAC). The eluted peptides were ionized under 2.0 kV and introduced into the mass spectrometer. MS was performed in data-dependent acquisition mode. For the MS1 full scan, ions with m/z ranging from 300 to 1,400 were acquired by the Orbitrap mass analyzer at a resolution of 120,000. The automatic gain control (AGC) target value was set to 3E6, and the maximal ion injection time was 80 ms. MS2 spectra were acquired in the ion trap mode at rapid speed. Precursor ions were selected and fragmented by higher-energy collision dissociation (HCD) with a normalized collision energy of 27%. Fragment ions were analyzed by the ion trap mass analyzer with the AGC target at 5E4. The maximal ion injection time of MS2 was 20 ms.
Peptides that triggered MS/MS scans were dynamically excluded from further MS/MS scans for 12 s. The maximum number of missed cleavages was set to 2. A mass tolerance of 20 ppm for precursor ions and 0.5 Da for product ions was allowed. The fixed modification was cysteine carbamidomethylation, and the variable modifications were N-acetylation and oxidation of methionine. For quality control of protein identification, a target-decoy strategy was applied to confirm that the false discovery rate (FDR) of both peptides and proteins was lower than 1%. Percolator was used to obtain the probability value (q-value), validating that the FDR (measured by the decoy hits) of every peptide-spectrum match (PSM) was lower than 1%. All peptides shorter than seven amino acids were then removed. The cutoff ion score for peptide identification was 20. All PSMs in all fractions were combined for protein quality control, which is a stringent quality control strategy. The q-values of both target and decoy peptide sequences were dynamically increased until the corresponding protein FDR was less than 1%, employing the parsimony principle. Finally, to reduce the false positive rate, proteins with at least one unique peptide were selected for further investigation.
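The target-decoy filtering described above can be illustrated with a minimal sketch. It assumes that each PSM has been reduced to a (score, is_decoy) pair and that FDR at a score threshold is estimated as decoy hits divided by target hits. The function names `q_values` and `accepted_targets` and the toy scores are hypothetical, and the sketch is not the Percolator algorithm itself.

```python
def q_values(psms):
    """Return (score, is_decoy, q_value) tuples sorted by descending score.

    psms: list of (score, is_decoy) pairs; a simplified stand-in for
    real search-engine output (assumption for illustration only).
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR when accepting everything down to this score.
        fdrs.append(decoys / max(targets, 1))
    # q-value: lowest FDR reachable at this score or any looser threshold.
    qs = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qs.append(running_min)
    qs.reverse()
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qs)]


def accepted_targets(psms, max_q=0.01):
    """Scores of target PSMs that pass the q-value cutoff (1% by default)."""
    return [score for score, is_decoy, q in q_values(psms)
            if not is_decoy and q <= max_q]


# Toy data: four target PSMs and one decoy PSM.
toy_psms = [(90, False), (80, False), (70, False), (60, True), (50, False)]
print(accepted_targets(toy_psms))
```

In this toy example the target PSM scoring 50 is rejected because it ranks below a decoy hit, which raises its estimated FDR above the 1% cutoff.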

Phosphopeptide enrichment and analysis
The phosphoproteome samples were prepared by Fe-NTA Phosphopeptide Enrichment Kit (Thermo, scan, the higher-energy collision dissociation fragmentation was performed at a normalized collision energy of 30%. The MS2 AGC target was set to 1E4 with a maximum injection time of 10 ms. Peptide mode was selected for monoisotopic precursor scan, and charge state screening was enabled to reject unassigned 1+, 7+, 8+, and >8+ ions with a dynamic exclusion time of 45 s to discriminate against previously analyzed ions between ± 10 ppm.

Principal component analysis (PCA) of trace FFPE samples
We performed PCA on all proteins/phosphoproteins identified in the 6 gastrointestinal cancer samples to illustrate the global proteomic difference between EESCC and EDAC. The PCA function of the scikit-learn Python package was used for unsupervised clustering analysis with the parameter 'n_components = 2' on the expression matrix of the global proteomic data. A colored ellipse represents the 95% confidence coverage for each group.
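As a rough illustration of the two-component projection, the sketch below computes PCA scores with the Python standard library only. It is a dependency-free stand-in, not the scikit-learn implementation; with scikit-learn installed, `PCA(n_components=2).fit_transform(X)` from `sklearn.decomposition` yields equivalent scores up to the sign of each component. The function name `pca_scores` and the toy matrix are hypothetical.

```python
import math


def pca_scores(matrix, n_components=2, n_iter=500):
    """Project samples (rows) onto their leading principal components.

    Uses the sample-by-sample Gram matrix, which stays small for a
    table with few samples and thousands of proteins.
    """
    n, p = len(matrix), len(matrix[0])
    # Center every feature (protein) across samples.
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in matrix]
    # Gram matrix of the centered samples; its eigenvectors give the scores.
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in centered]
            for ri in centered]
    components = []
    for _component in range(n_components):
        vec = [1.0 / (i + 1) for i in range(n)]  # fixed, non-constant start
        eigval = 0.0
        for _step in range(n_iter):  # power iteration
            nxt = [sum(g * v for g, v in zip(row, vec)) for row in gram]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:  # no variance left to explain
                vec, eigval = [0.0] * n, 0.0
                break
            vec, eigval = [x / norm for x in nxt], norm
        components.append([v * math.sqrt(eigval) for v in vec])
        # Deflate so the next pass finds the next component.
        gram = [[gram[i][j] - eigval * vec[i] * vec[j] for j in range(n)]
                for i in range(n)]
    # One row of scores per sample.
    return [list(sample) for sample in zip(*components)]


# Toy matrix: 4 samples x 2 features (values are illustrative only).
toy = [[0.0, 0.0], [4.0, 0.0], [0.0, 1.0], [4.0, 1.0]]
for sample_scores in pca_scores(toy):
    print([round(s, 3) for s in sample_scores])
```

Power iteration is chosen here only to keep the sketch self-contained; it converges slowly when two components explain nearly equal variance, so a library routine is preferable for real data.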

Biological pathways enrichment analysis
To investigate the dominant signaling pathways of the overlapping proteins and of each concentration gradient of trace samples, we used gene sets of molecular pathways in DAVID [12]. For this analysis, pathways from the GOBP/KEGG databases were considered. Results were considered statistically significant when the P value was less than 0.05 and the FDR q value was no more than 0.1.
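The two-threshold significance rule can be sketched as follows. This assumes that q values are obtained with the Benjamini-Hochberg procedure, which may differ from the FDR values that DAVID reports; the function names and the pathway P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_pathways(results, p_cut=0.05, q_cut=0.1):
    """Keep pathways with P < p_cut and q <= q_cut.

    results: dict mapping pathway name -> raw P value.
    """
    names = list(results)
    qvals = benjamini_hochberg([results[name] for name in names])
    return [name for name, q in zip(names, qvals)
            if results[name] < p_cut and q <= q_cut]


# Illustrative P values only; not results from this study.
toy_results = {"cell cycle": 0.001, "apoptosis": 0.04,
               "keratinization": 0.03, "other": 0.5}
print(significant_pathways(toy_results))
```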

Kinase Activity Prediction and Phosphopeptide Analysis
The phospho-proteome data of 6 samples were searched against the same database with MaxQuant.
Phosphorylation of S, T, or Y was set as a variable modification, up to three missed cleavages were allowed, and a minimum Andromeda score of 40 was required for spectral matches. The ratios of identified phosphorylation sites across all samples were used to estimate kinase activities with the Kinase-Substrate Enrichment Analysis (KSEA) algorithm [13]. Information on kinase-substrate relationships was obtained from publicly available databases, including PhosphoSite [14], Phospho.ELM [15] and PhosphoPOINT [16]. Information on substrate motifs was obtained either from the literature [17] or from an analysis of the KSEA dataset with Motif (sP) [18]. The kinase-substrate-motif network analysis was based on PhosphoSitePlus (PSP, https://www.phosphosite.org/homeAction) [19]. Statistical analysis was performed in R (version 3.5.1) with the Kruskal-Wallis test.
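To make the kinase activity estimate concrete, the sketch below computes a KSEA-style z-score for one kinase. It assumes the commonly used form z = (mean of substrate log2 ratios − mean of all log2 ratios) × √m / SD, where m is the number of quantified substrates and SD is the sample standard deviation over all sites. The exact variant used in this study is not stated, the function name is hypothetical, and the log2 ratios are toy values, not measurements.

```python
import math
from statistics import mean, stdev


def ksea_z_score(log2_ratios, substrate_sites):
    """KSEA-style z-score for one kinase.

    log2_ratios: dict mapping phosphosite -> log2 ratio between groups.
    substrate_sites: known substrate sites of the kinase.
    Returns None when the score cannot be computed.
    """
    observed = [log2_ratios[s] for s in substrate_sites if s in log2_ratios]
    all_values = list(log2_ratios.values())
    if not observed or len(all_values) < 2:
        return None
    spread = stdev(all_values)  # sample SD over every quantified site
    if spread == 0:
        return None
    shift = mean(observed) - mean(all_values)
    return shift * math.sqrt(len(observed)) / spread


# Toy log2 ratios (illustrative values only).
toy_ratios = {"RB1_T373": 1.0, "RBL2_S662": 1.0,
              "MAPK1_T185": -1.0, "MAPK3_T202": -1.0}
# RB1 T373 and RBL2 S662 are treated as CDK4 substrates, as in the text.
print(ksea_z_score(toy_ratios, ["RB1_T373", "RBL2_S662"]))
```

A positive score suggests that the kinase's substrates are more phosphorylated than the dataset average in the first group; a negative score suggests the opposite.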

Overview of proteomic landscape of trace FFPE samples in EESCC and EDAC
To characterize the proteomic landscape of trace FFPE samples, we collected proteomic and phosphoproteomic data from 3 EESCC patients and 3 EDAC patients who had not received prior chemotherapy or radiotherapy. A schematic of the experimental design is shown in Fig. 1a and Supplementary Fig. S1a. Proteomic analysis was based on a mass spectrometry (MS)-based label-free quantification strategy [11,20]. Protein abundance in all samples was first calculated by intensity-based absolute quantification (iBAQ) [21,22] and then normalized as a fraction of the total (FOT), allowing for comparisons between experiments.
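The FOT normalization can be sketched in a few lines, assuming that FOT is each protein's iBAQ value divided by the sum of all iBAQ values in the same sample. The optional `scale` factor, the function name, and the iBAQ values are hypothetical; some workflows multiply FOT by a constant for readability.

```python
def fraction_of_total(ibaq, scale=1.0):
    """Normalize one sample's iBAQ values to fractions of the total.

    ibaq: dict mapping protein -> iBAQ intensity for a single sample.
    scale: optional multiplier (assumption; 1.0 gives plain fractions).
    """
    total = sum(ibaq.values())
    if total <= 0:
        raise ValueError("sample has no positive iBAQ signal")
    return {protein: scale * value / total for protein, value in ibaq.items()}


# Illustrative intensities only; not measurements from this study.
toy_sample = {"EGFR": 2e6, "MTOR": 6e6, "RB1": 2e6}
print(fraction_of_total(toy_sample))
```

Because every sample sums to the same total after normalization, values become comparable between runs with different overall signal.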
Spearman's correlation coefficients were lower between EESCC and EDAC (mean = 0.62) than within the same cancer type (mean = 0.74 for EESCC and 0.78 for EDAC), indicating differences between the two gastrointestinal cancers (Fig. 2a). To explore the difference between EESCC and EDAC, we performed principal component analysis (PCA) at the protein and phosphoprotein levels. PCA separated the profiles of EESCC and EDAC at both the protein and the phosphoprotein level (Fig. 2b). The distinct separation suggested a fundamental difference between EESCC and EDAC.
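The sample-to-sample comparison can be sketched as a Spearman correlation, computed here as the Pearson correlation of average ranks. The function names and the abundance vectors are hypothetical, and the sketch assumes both samples are quantified over the same ordered protein list.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy abundance vectors for two samples over the same three proteins.
print(spearman([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]))
```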
The significance analysis of microarray (SAM) [25] was performed to investigate the characteristics of EESCC and EDAC at the protein level, which identified 791 differentially expressed proteins (DEPs) (Fig. 2d).

Immune-based features of EESCC and EDAC
Recent studies have established the connection between inflammation and tumorigenesis, and inflammation is considered an important risk factor for gastrointestinal cancer [30]. To gain insight into the immune infiltration of early-stage gastrointestinal cancer, we analyzed the proteomic profiles of EESCC and EDAC, and deconvoluted immune, stromal, and microenvironment cell signatures using xCell (https://xcell.ucsf.edu) [31]. As a result, we found that the cellular characteristics of dendritic cells (DCs) were dominant in EESCC, as evidenced by highly expressed biomarkers at the protein level, such as cluster of differentiation 14 (CD14), CD276, and CD36 (Fig. 3a). In addition, the cell markers of B cells (e.g., CD200 and CD38) and T cells (e.g., CD226 and CD81) were also overrepresented in EDAC (Fig. 3b). In EDAC, molecules of major histocompatibility complex class I and II (MHC-I/II) were highly expressed, including HLA-B, HLA-C, HLA-E, HLA-DQA1, HLA-DQB1, HLA-DRA, HLA-DRB1, etc. (Fig. 3c).

EDAC (n = 297) (Supplementary
Phosphorylation impacts multiple cellular processes, with site occupancy tightly regulated by the activity of kinases and phosphatases [39]. We then performed an integrative analysis of the differential kinase-substrate (site) pairs and proposed the functions of FDA-approved drugs in EESCC and EDAC. In EESCC, anti-EGFR with abemaciclib decreased the expression of EGFR (T693) and the downstream phosphorylation of GAB1 (S547), and anti-PDGFRB with imatinib inactivated the phosphorylation of PDGFRB (S712) and ABL (S210), which participate in the cell cycle. Additionally, the inhibitor of ribociclib to WEE1 down-regulated CDK1 at the protein and phosphoprotein levels, and anti-CDK4 with afatinib decreased the phosphorylation of RBL2 (S662) and RB1 (T373), resulting in the stability of the cell cycle checkpoint, which is the final safeguard of genomic fidelity (Fig. 4g). In EDAC, the inhibitor of trametinib to MAP2K1 was negatively associated with the phosphorylation of MAPK1 (T185) and MAPK3 (T202). Anti-MAPK3 with ulixertinib decreased the phosphorylation of AKT1 (S129), EIF4EBP1 (S35), and RPS6KA1 (S221), which down-regulated PI3K-AKT signaling and inhibited cell proliferation in EDAC (Fig. 4g). Collectively, we revealed EESCC-specific and EDAC-specific kinases, elucidated the functional kinase-substrate relationship network, and proposed potential clinical strategies for EESCC and EDAC, providing novel insight into gastrointestinal cancer in the clinic.

Discussion
Early screening and diagnosis provide better outcomes for patients and are applied to many cancers, especially gastrointestinal cancer. Great progress in mass spectrometry-based proteomics and the availability of FFPE samples make it possible to explore the molecular characteristics of cancers, including CRC [10], gastric cancer [11], breast cancer [40] and so on. However, trace early-stage cancer samples remain a challenge, and the proteomic profile of trace FFPE samples of early-stage cancer remains largely unknown.
In this study, we generated a comprehensive proteomic landscape of early-stage gastrointestinal cancer (EESCC and EDAC), identified ~6,000 proteins and > 10,000 phosphosites in single trace FFPE samples, and achieved high coverage at the protein and phosphoprotein levels, providing proteomic and phosphoproteomic datasets of early-stage cancer. The consistency and positive correlation between the proteome and phosphoproteome allowed us to further investigate the characteristics of early-stage gastrointestinal cancer.
The distinct separation of EESCC and EDAC indicated tumor heterogeneity and differences among cancer types in gastrointestinal cancer. We then performed SAM [25] analysis of EESCC and EDAC, and found that the primary functions of normal esophageal tissue were prominent in EESCC, such as keratinization (e.g., KRT2, KRT5, etc.) and epidermis development (e.g., CDH3, TCHH, etc.), at the protein and phosphoprotein levels. EGFR and RB1, which play key roles in the cell cycle, function in the metastasis and carcinogenesis of head and neck cancer and lung cancer [41,42]. In EESCC, we found high phosphorylation of EGFR T693 and RB1 T373, which showed a significantly positive correlation, indicating co-functions of EGFR and RB1 in ESCC carcinogenesis at the phosphoprotein level.
In EDAC, high expression of metabolic proteins (e.g., ACO2, ATP5B, PFKL, etc.) and MAPK signaling pathway proteins (e.g., MAPK1, MAPK3, ARAF, etc.) was detected, which was supported by the overrepresented phosphorylation of their corresponding phosphoproteins. Previous studies have demonstrated prominent mutations of MTOR and functions of mTOR signaling in small bowel cancer [29]. In this study, the overrepresentation of MTOR (S1261) at the protein and phosphoprotein levels in EDAC demonstrated the role of MTOR in the carcinogenesis of duodenum cancer.
Inflammasome signaling is an emerging pillar of innate immunity, and the inflammatory microenvironment promotes gastrointestinal cancer development and invasion [43,44]. Human protein kinases participate in the majority of biological processes, including cell metabolism, cell cycle, apoptosis, the immune system, etc. [33], and are ubiquitous in tumors such as lung cancer and breast cancer [45,46]. Kinases have become important therapeutic targets, and inhibitors that target kinases have been developed and are clinically active [47]. In this study, we depicted the Kinome Tree of EESCC and EDAC, and found that kinases of the CK1, CMGC, TK, and TKL groups were overrepresented in EESCC, and kinases of the AGC, CAMK, and STE groups were prevalent in EDAC at the protein and phosphoprotein levels. In addition, the kinase-substrate correlation network revealed the positive impacts of cell cycle, p53 signaling pathways, and DNA repair in EESCC, and the positive impacts of PI3K-AKT, mTOR, and MAPK signaling pathways in EDAC. We also proposed the potential functional mechanisms of FDA-approved drugs in EESCC and EDAC. For example, the inhibitor of trametinib to MAP2K1 decreased the phosphorylation of MAPK1 (T185) and MAPK3 (T202), and anti-MAPK3 with ulixertinib had negative impacts on the phosphorylation of AKT1 (S129), EIF4EBP1 (S35), and RPS6KA1 (S221), which down-regulated PI3K-AKT signaling and inhibited cell proliferation in EDAC.
However, our study has limitations. More samples are needed, and the trace sample amounts prevented us from collecting additional data such as genomic and transcriptomic data.

Conclusion
This study presented a comprehensive proteomic landscape of early-stage gastrointestinal cancer for the first time, with identification of ~6,000 proteins and > 10,000 phosphosites in single trace FFPE samples.
We revealed the functional classification of all identified proteins and phosphoproteins in EESCC and EDAC. We also showed the distinct separation between EESCC and EDAC, illustrating the impacts of cell cycle in EESCC, and of apoptosis and metabolic processes in EDAC, at the protein and phosphoprotein levels. In addition, we deconvoluted the immune infiltration of early-stage gastrointestinal cancer, found higher immune cell signatures in EDAC, and revealed the specific cytokines in EESCC and EDAC. Furthermore, we delineated the Kinome Tree of EESCC and EDAC, elucidated the specific kinases, and proposed potential clinical strategies for EESCC and EDAC, delivering novel insight into clinical therapeutic strategies for gastrointestinal cancer.

Authors' contributions
Lingling

Fig. 2 Proteomic characterization of EESCC and EDAC. a Spearman's correlation coefficients among 6 gastrointestinal cancer samples. b PCA analysis showing distinct separation between EESCC and EDAC at the protein (left) and phosphoprotein (right) levels. c Volcano analysis depicting the differentially expressed proteins of EESCC and EDAC. d Bar chart presenting the functional pathways in EESCC (top) and EDAC (bottom). e Proteins in functional pathways that were differentially expressed in EESCC and EDAC at the protein and phosphoprotein levels. f A brief summary of the differential proteins and functional pathways in EESCC (top) and EDAC (bottom). g Boxplot showing that MTOR was highly expressed in EDAC at the protein (left) and phosphoprotein (right) levels. h Pearson's correlation coefficients indicating a significantly positive association between the MTOR proteome and phosphoproteome (S1261).