An N-glycoproteomics Analysis Reveals Glycoprotein Alterations in Esophageal Squamous Cell Carcinoma

Yingzhen Gao Shanxi Medical University Liuyi Shen Shanxi Medical University Tianyue Dong Shanxi Medical University Xin Yang Shanxi Medical University Heyang Cui Peking University Shenzhen Hospital Yanlin Guo Shanxi Medical University Yanchun Ma Shanxi Medical University Pengzhou Kong Shanxi Medical University Xiaolong Cheng Shanxi Medical University Ling Zhang (  zhanglingty@sxmu.edu.cn ) Shanxi Medical University https://orcid.org/0000-0002-4100-0475 Yongping Cui Shanxi Medical University


Introduction
Esophageal cancer is one of the most common malignant tumors worldwide and about half of global esophageal cancer patients are in China, where the main histologic type is esophageal squamous cell carcinoma (ESCC). ESCC patients tend to have a very poor prognosis with the 5-year survival rate less than 30% [1]. Although the combined detection of some markers such as squamous cell carcinoma antigen (SCC), carcinoembryonic antigen (CEA) and cytokeratin-19-fragment (CYFRA21-1) is helpful for the diagnosis and prognosis prediction of ESCC patients [2], the tumor markers used in the early diagnosis of ESCC are not speci c and mature. Radical surgery is considered to be the best choice to treat ESCC patients. However, most ESCC patients are in the middle and late stage at initial diagnosis, and recurrence remains still the main reason for treatment failure. Although the application of some drugs such as trastuzumab and ramucirumab have entered the clinic, there is still a lack of speci c and effective therapeutic target in ESCC compared with other types of cancer [3]. Thus, there is a long way to go to further study the molecular mechanism of ESCC.
Aberrant glycosylation has been recognized as one of the hallmarks of cancer, which conferred a new perspective for cancer research [4]. As a prominent post-translational modi cation of proteins, glycosylation plays key roles in a variety of biological process such as cell transformation, differentiation, growth, invasion, and immune surveillance of tumors [5]. There are two main ways of glycosylation modi cation, N-glycosylation and mucin-type O-glycosylation occurring in cell endoplasmic reticulum and Golgi complex respectively [5]. Most tumor biomarkers currently used in clinical practice are glycoproteins (e.g. CA125 for ovarian cancer, AFP for liver cancer and CEA for colon cancer) or glycan-related such as CA19-9 for gastrointestinal and pancreatic cancer [6]. Fucosylated haptoglobin (Fut-Hpt) could be a biomarker for the detection of liver metastasis of colorectal or pancreatic cancer [7]. Moreover, antibodies of the fucosylated Le Y have also been used as potential drugs for immunotherapy in epithelial-derived tumors [8]. Thus, the study of the speci c glycosylation changes in cancer especially ESCC may provide a new way for clinical translation and application.
Abnormal glycosylation usually includes the changes of glycoprotein expression level caused by abnormal glycosylation modi cation and changes of glycan structure of glycoproteins determined by glycogene alteration. However, the complex characteristics of glycans and the limitation of research methodology made glycoproteomics and glycomics study severely lag behind. To date, there is no information about abnormal glycosylation signatures in ESCC. Here, we performed N-glycoproteomics analysis to investigate differentially expressed glycoproteins in ESCC compared with matched normal tissues. Because changes of glycoproteins may be a result of changes in protein concentration [9], we integrated proteomics data and N-glycoproteomics data to evaluate the effect of protein normalization on identi ed glycosylation changes. Moreover, we investigated dynamic pro ling changes of glycoprotein expression during lymph node metastasis progression of ESCC. This study might throw new light on the early diagnosis and molecular targeted treatment of ESCC.

Patient tissue samples
The ten pairs of tumor tissues and adjacent normal tissues at least 5 cm from the center of the tumor used for N-glycoproteomics and proteomics analysis in this study were obtained from patients diagnosed with ESCC who underwent surgical resection at the Shanxi Cancer Hospital (Taiyuan, China). There was no preoperative chemotherapy, radiotherapy or other therapies done on these patients. All the recruited patients gave their informed consent before they participated in our study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of the Shanxi Medical University (Approval No. 2017LL037). Details of clinicopathological features of cases are summarized in Table S1.

Protein extraction and trypsin digestion
After the samples were groud into cell powder in liquid nitrogen, four times the volume of lysis buffer (1% protease inhibitor, 3 µM TSA, 8 M urea, 50 mM NAM and 2 mM EDTA) was added and the samples were sonicated on ice using an ultrasonic processor. Once removing the remaining debris, the supernatant was collected and BCA kit was used to detect the protein concentration. Then the samples were reduced with 5 mM dithiothreitol for 30 min at 56°C, iodoacetamide was added to make the nal concentration 11 mM and samples were incubated at room temperature for 15 min in darkness. Finally, the urea concentration was diluted to less than 2 M. Trypsin was added at 1:50 trypsin-to-protein mass ratio overnight for the rst digestion and 1:100 trypsin-to-protein mass ratio for 4 hours for the second digestion. TMT Labeling and PTM enrichment for N-glycosylation The digested peptides were desalted using Strata X C18 SPE column (Phenomenex) and vacuum-dried. Then, the peptides were reconstituted in 0.5 M TEAB buffer and labeled with TMT kit according to the manufacturer's protocol. Brie y, the peptides were incubated with TMT reconstituted in acetonitrile for 2 hours at room temperature, then pooled, desalted and dried. The peptides were dissolved in 40 µL enrichment buffer containing 80% acetonitrile/1% tri uoroacetic acid. Then the supernatant was transferred to a hydrophilic (HILIC) micro-column. After centrifugation at 4000 g for 15 minutes, the HILIC micro-column was washed three times. Subsequently, glycopeptides were eluted using 10% acetonitrile and the eluate was collected. After being vacuum freeze-dried, samples were dissolved in 50 µl of ammonium bicarbonate buffer dissolved in 50 µl heavy oxygen water. Finally, 2 µl of PNGase was added and samples were digested overnight at 37 ° C.

HPLC Fractionation and LC-MS/MS analysis
The tryptic peptides were fractionated using high pH reverse-phase HPLC in Thermo Betasil C18 column (5 µm particles, 250 mm length, 10 mm ID). Firstly, the peptides were separated with a gradient of 8-32% acetonitrile (pH = 9.0) for more than 60 min into 60 fractions, followed by being dried by vacuum centrifuging. All of the above processes, LC-MS/MS and subsequent bioinformatics analyses were performed in Jingjie PTM Biolabs (Hangzhou in China). The peptides were dissolved in 0.1% formic acid (solvent A) and directly loaded into a reversed-phase analytical column (75 µm i.d, 15-cm length.). The gradient consisted of a gradual increase from 6-23% of solvent B (0.1% formic acid in 98% acetonitrile) for more than 26 min, 23-35% in 8 min and climbing to 80% in 3 min, followed by being held at 80% for the last 3 min. Above all were operated at a constant ow rate of 400 nL / min on EASY-nLC 1000 UPLC system. The peptides were submitted to NSI source and were analyzed by tandem mass spectrometry (MS/MS) in Q ExactiveTM Plus (Thermo). The m/z scan range was 300 -1600 for full scan, and Orbitrap was used to detect intact peptides at a resolution of 70,000. Then, peptides were selected and analyzed by MS/MS using NCE setting as 28 and Orbitrap was used to detect the fragments at a resolution of 17, 500. Automatic gain control (AGC) was set at 5E4. Fixed rst mass was set as 100 m/z.

Database Search.
Maxquant search engine (v1.5.2.8) was used to retrieve the mass spectrometer data. Parameter settings are as follow. The database is SwissProt Human (20130 sequences), and the reversed database is added to calculate the false positive rate (FDR) caused by random matching. The cleavage enzyme method was set as Trypsin/P allowing up to 4 missing cleavages. The minimum required peptide length was set to seven amino acids and the maximum modi cation number of the peptide was set to 5. In First search, the mass tolerance for precursor ions was set as 20 ppm. In Main search, the mass tolerance for precursor ions was set as 5 ppm. The mass tolerance for fragment ions was set as 0.02 Da. FDR was adjusted to < 1% and minimum score for modi ed peptides was set > 40.
Gene Ontology (GO) and KEGG pathway annotation.
The annotation of Gene Ontology (GO) of proteins comes from the UniProt-GOA database (www. http://www.ebi.ac.uk/GOA/). Firstly, the system converts protein ID to UniProt ID, then uses UniProt ID to match GO ID, and extracts corresponding information from UniProt-GOA database based on GO ID. If there is no according protein information in UniProt-GOA database, the InterProScan soft based on protein sequence, will be used to annotate protein's GO function. The proteins were then classi ed according to cellular component, biological process and molecular function. Kyoto Encyclopedia of Genes and Genomes (KEGG) database was used to annotate protein signaling pathway. Firstly, we used KEGG online service tools KAAS to annotate KEGG database description of proteins. Then KEGG mapper, KEGG online service tools was used to map the annotation result on the KEGG pathway database.
Wolfpsort, an updated version of PSORT/PSORT II, was applied to predict subcellular localization.
Motif Analysis.
Software MoMo (motif-x algorithm) was applied to analyze the motif characteristics of N-glycosylation sites. Among them, the peptide sequences of 10 amino acids upstream and downstream of all the Nglycosylation site were analyzed. And all the database protein sequences were used as background database parameter. When the number of peptides in a speci c sequence is more than 20 and the p value is less than 0.00000 1, it is considered that the characteristic sequence is a motif of modi ed peptide.
The Mfuzz package (version 2.36.0) based on the open-source statistical language R (version 3.6.3) was used to perform soft-clustering analysis. The fuzzy c-means algorithm, one of the most widely used softclustering method, provided by Mfuzz package, is more noise robust and a priori pre-ltering of genes/proteins can be avoided, so it avoided the exclusion of biologically relevant genes/proteins from the data analysis [10,11]. FCM clustering uses Euclidean distance as the distance metric, and demands two main parameters (c = number of clusters, m = fuzzi cation parameter). Different from k-means clustering, each element has a set of membership coe cients corresponding to the degree of being in a given cluster by FCM clustering.
For the quantitative information of N-glycosites, we removed the N-glycosites containing missing values in the raw data at rst, to make sure that each gene could be quanti ed in each sample. Calculated the quantitative ratio of N-glycosites in each sample pair, and the quantitative ratios were log2 transformed and then z-score normalized for each pair of sample, to make the mean was zero and standard deviation was one. The normalized-data makes sure that N-glycosites with similar temporal patterns are close in Euclidean space [12]. The transformed pro les were then clustered using the Mfuzz package. For this analysis, the optimal values of c and m were derived. Final clustering was done with the parameters c = 15 and m = 1.48. Then we screened out the groups related to the degree of LNM, set the appropriated membership coe cients to make the remaining genes in these groups had a similar trend of change (membership>=0.30). Figure 1 shows the work ow for the analysis of global N-glycoproteomics and proteomics of ESCC and matched normal tissues. The separately matched samples for potential biomarkers discovery were used and expected to re ect the level of different individuals. Twenty protein samples from 10 patients were assigned to two experiments with TMT10plex markers and digested with trypsin for N-glycoproteomics and proteomics analysis. A detailed description of the clinical characteristics of the analyzed samples is presented in Table S1.

Study design and global proteomics analysis of ESCC
In order to exclude the possibility that the changes of glycoproteins may be a result of changes in protein concentration, we rstly investigated global proteomic pro le in 10 cases with paired tumor tissue and adjacent normal tissues using TMT labeling-based quantitative proteomics. On one hand, quality control of global proteomic data indicated that the proteomics analysis system was robust ( Figure 2). On the other hand, we exclude the data of two samples (#8 and #9) because principal component analysis and Pearson's correlation coe cient analysis indicated that the extreme values of the two samples were probably caused by sample collection problems. Altogether, 6856 proteins were identi ed from these pairs of ESCC tissues, among which 6113 proteins were quanti ed. The mean value of 6113 protein quantitative ratios was 1.11, indicating expression levels of most proteins were unchanged. With the threshold of >1.5 fold change and p-value <0.05, it was found that 596 proteins were up-regulated while 495 proteins were down-regulated when comparing ESCC samples with their matched normal tissues ( Figure 3A). All these data were listed in Table S2. Proteins showing signi cant up/down-regulation in all paired comparisons were exhibited in detail in the heat-map of Figure 3B. In terms of cellular component, the up-regulated proteins were mainly localized in chromosome, nuclear lumen, spliceosomal complex and intracellular organelle lumen, while most of the down-regulated proteins were found in extracellular region, extracellular matrix and vesicle ( Figure 3C). Accordingly, KEGG-based enrichment analysis showed that the prominent pathways enriched in the increased proteins included spliceosome, DNA replication, RNA transport, transcriptional mis-regulation in cancer, cell cycle, base excision repair and mismatch repair. By contrast, protein expression in the cellular pathways including ECM-receptor, focal adhesion, metabolism (e.g. protein digestion and absorption; tyrosine, histidine, phenylalanine, and tryptophan metabolism; drug metabolism-cytochrome P450), interaction and proteoglycans was decreased ( Figure  3D). Because proteomic analysis of more ESCC samples will be reported in our another study, here, we focused on the N-glycoproteomics analysis and all the following N-glycoproteomics data were normalized by corresponding proteomics data.

Motif analysis of N-glycoproteome
As one type of classical post-transcriptional modi cation (PTM), glycosylation has been proven to be essential for regulating cellular functions. Although N-glycoproteomics study has been applied in many cancers, it remains unreported in ESCC. In this study, the quality control results of N-glycoproteomics showed that our optimized methods were highly con dent. Most identi ed peptides are distributed in 8-20 amino acids, which is consistent with the common pattern of trypsin digestion ( Figure 4A). The mass error of most spectrograms is within 5 ppm, which indicates that the accuracy of the mass spectrometer is reliable and the qualitative analysis of protein will not be affected by the large mass deviation ( Figure  4B). Principal component analysis and Pearson's correlation coe cient analysis demonstrated a good reproducibility and accuracy in LC-MS/MS analysis ( Figure 4C-D). In total, N-glycoproteomics data quality control indicated that the analysis system was robust.
To characterize the possible speci c sequence, a motif analysis was performed to indicate the likelihood of amino acid types being over-or under-represented at the positions surrounding the N-glycosylation sites. Seven signi cantly enriched motifs were found from all of the identi ed N-glycosylation sites. As shown in Figure 4E, motif logos according to the score ranking included N-X-T, N-C-S, R-XXXXXX-N-X-S, N-A-S, R-N-X-S, N-X-S, N-G-X (X represents a random amino acid residue; N, T, A, G, C, S, R represent asparagine, threonine, alanine, glycine, cystine, serine, arginine respectively). The amino acid frequencies in the sequences anking N-glycosylation sites were assessed by motif model to con rm whether there were position-speci c amino acids surrounding N-glycosylation sites ( Figure 4F). In addition to canonical N-linked glycopeptide sequon of N-X-T/S, we found that cysteine residue at +1 and +3, glycine residue at -1 and +1, arginine residue at -1, -2 and -7, alanine residue at +1 position were overrepresented surrounding N-glycosylation sites. Interestingly, despite N-X-C and N-X-V have been discovered to be atypical N-glycosylation sequons in recent years [13], cystine and valine were obviously depleted at +3 position of N-glycosylation site in our study.
Characteristics of the N-glycoproteins: identi cation and quantitative analysis On average in this study, a total of 1,839 N-glycosylation sites in 1021 glycoproteins were identi ed, of which 1588 sites in 901 glycoproteins were quantitative. Figure  In order to explore the abundance difference between the global N-glycoproteins and proteins, we compared datasets of the quanti ed N-glycoproteomics and the proteomics. A case-by-case review demonstrated that the numbers of quanti ed proteins are more than glycoproteins for individual ESCC. Moreover, the numbers of quanti ed proteins have no trend while the numbers of quanti ed glycoproteins increased with disease development and lymph node metastasis ( Figure 5B). Figure 5C shows the overlap in the protein groups identi ed in both N-glycoproteomics and proteomics. There are 542 overlapped proteins identi ed in both N-glycoproteomics and proteomics. In other words, only 60.2% (542/901) of all identi ed glycosylated proteins in N-glycoproteomics were also identi ed in the proteomics. In fact, relatively low abundance of some glycosylated proteins, such as surface receptors and secreted proteins, make it di cult to detect them in the proteomics. This is supported by the results that the intensity of deglycoside proteins only identi ed in the N-glycoproteomics data is obviously and generally lower than that of corresponding proteins also identi ed in the global proteomics data ( Figure  5D).
N-glycoproteomics in this study revealed that a total of 716 differentially expressed N-glycosylation sites (with a cut-off change of 1.5-fold and p-value < 0.05) in 441 proteins, in which 512 N-glycosylation sites in 326 glycoproteins were observed signi cantly upregulated and 204 N-glycosylation sites in 115 glycoproteins were observed signi cantly down-regulated ( Figure 5E). Differential abundances of PTM's peptides represent changes in PTM's status and corresponding protein expression affected by PTM. Thus, comprehensive PTM changes require normalization by total protein expression changes to exclude that differential PTMs are attributed to changes in total protein expression. So, we mainly focused on the glycoproteins quanti ed in both N-glycoproteomics and proteomics to investigate the effect of protein normalization on glycosylation change. We evaluated the quantitative N-glycoproteomic ratios of 1107 Nglycosites in 542 proteins which were normalized by corresponding protein quantity. There is a signi cant correlation between the N-glycosite ratios and protein ratios (Pearson r=0.84), demonstrating that changes of N-glycosylation modi cation of a protein are probably a re ection of protein abundance change. A total of 264 differentially expressed N-glycosylation sites in 180 proteins were identi ed, in which 243 N-glycosylation sites in 162 glycoproteins were observed signi cantly upregulated and 21 Nglycosylation sites in 18 glycoproteins were observed signi cantly down-regulated. Differentially expressed glycoproteins and glycosylation sites were listed in Table S3. Twenty-four glycosylation sites of 20 proteins were consistently upregulated in all samples and nineteen glycosylation sites of 17 proteins were consistently down-regulated in more than four samples were shown in Figure 5F. Among them, procollagen-lysine,2-oxoglutarate 5-dioxygenase 2 (PLOD2), collagen alpha-3(VI) chain (COL6A3), peptidyl-prolyl cis-trans isomerase (FKBP10) and Tenascin (TNC) have two or more differential glycosylation sites in all samples. The number of differentially expressed N-glycosylation sites that were identi ed in different individuals of both experiments was exhibited in Figure 5G.

Characteristics of the N-glycoproteins: functional analysis
Subcellular localization analysis revealed that the majority differentially expressed glycoproteins (fold change>1.5 and p value< 0.05), whether up-regulated or down-regulated, distributed in the extracellular region (53.6%), plasma membrane (22.4%) and endoplasmic reticulum (10.6%), while few glycoproteins were localized in cytoplasm (2.2%) and nucleus (3.4%). It is highly consistent with the property of glycoproteins, completely different from the protein localization in proteomics ( Figure 6A). Then we performed enrichment analysis of differentially expressed glycoproteins to identify signi cantly Gene Ontology terms and KEGG pathways and to investigate their functions. Most up-regulated glycoproteins were enriched in cell-substrate-related metabolic processes including collagen metabolic process, receptor metabolic process, protein activation cascade. For the KEGG pathway enrichment, some known glycosylation-affected cancer-associated pathways including ECM-receptor interaction, focal adhesion, phagosome and lysosome were enriched in up-regulated glycoproteins. Interestingly, some amino acid metabolism-related pathways including lysine degradation, tyrosine metabolism, phenylalanine metabolism, valine, leucine and isoleucine degradation were obviously glycosylated and the levels of according glycoproteins were signi cantly upregulated in ESCC ( Figure 6B). Procollagen-lysine, 2oxoglutarate 5-dioxygenase 2 (PLOD2), which catalyzes the hydroxylation of lysyl residues in collagenlike peptides, was identi ed with three N-glycosites (N63, N209 and N696). Other up-regulated glycoproteins related to amino acid metabolism included procollagen-lysine, 2-oxoglutarate 5dioxygenase 1 and 3 (PLOD1 and PLOD3), amine oxidase copper containing 3 (AOC3), interleukin 4 induced 1 (IL4I1), 3-hydroxy-3-methylglutaryl-CoA synthase 1(HMGCS1) and so on. In contrast, these upregulated pathways in N-glycoproteomics were found to be down-regulated in proteomics, indicating these pathways can be actually regulated by glycosylation modi cation in ESCC. Many down-regulated terms were related to peptidase regulator activity. The KEGG pathway enrichment result showed ECMreceptor interaction, focal adhesion and PI3K-Akt signaling pathway as negatively enriched ( Figure 6C). Differential glycosylated proteins and differential proteins in some amino acid metabolism-related pathways are marked in Figure 6D.

Stage progression related dynamic clusters analysis in ESCC
As the lymph node status is one of the main factors affecting the clinical stage and survival of ESCC patients, we were interested in the dynamic process of glycoprotein expression during ESCC development, especially in lymph node metastasis (LNM). To investigate the types of dynamic change, we performed dynamic clusters analysis on the basis of glycoprotein quantity. We captured a series of glycoprotein clusters whose fold change increased or decreased as cancer progresses with lymph node metastasis.
For glycoprotein clusters whose fold change increased or decreased as cancer progression, the fteen clusters were detected. Clusters #3, #7 and #9 represented a series of glycoproteins whose expression quantity showed an increasing or decreasing trend ( Figure 7A). The enrichment pathways and interaction networks of these glycoproteins were exhibited in Figure 7B and 7C. And the glycoproteins of Clusters #9 were enriched in HIF-1 signaling pathway (INSR, IGF1R, TFRC), RAP1 signaling pathway (INSR, IGF1R, ITGB3), and lysine degradation pathway (PLOD3 and COLGALT1), indicating that these pathways might play an important role in the LNM process in ESCC. Furthermore, the glycoproteins of Clusters #3 and #7 were enriched in Cell adhesion molecules (CAMs), complement and coagulation cascades, and leukocyte trans-endothelial migration. Decreased glycosylation levels of CAMs may promote tumor cells to detach from the primary tumor, then facilitate tumor invasion and metastasis; and for the leukocyte migration and complement system, low glycosylation levels may affect their immune surveillance and clearance capabilities for tumors.

Discussion
Aberrant protein glycosylation is well known to be associated with the occurrence and development of cancer including cell transformation, invasion and metastasis [14]. Glycans are usually attached to proteins on the cell membrane or in the extracellular matrix. N-linked glycans are the addition of an oligosaccharide chain to an asparagine (Asn) residue within an amino acid sequence Asn-X-Ser/Thr (X should not be proline), and O-linked glycans are the transfer to the oxygen atom of the serine(Ser), hydroxyl (Tyr) or threonine(Thr) residues [15]. Although great progresses have recently been made to identify the glycoproteins and glycosylation patterns as potential biomarkers for various types of cancers, this study is the rst unbiased analysis of N-glycoproteomics study in ESCC.
As expected, most of the changes in glycoproteins were accompanied by changes in according protein expression. After normalization by protein expression changes, a total of 901 glycoproteins were identi ed and quanti ed from ESCC tissues. According to the localization, the identi ed glycoproteins were mostly located in the subcellular compartments, consistent with the sites of the N-glycoprotein formation process. Regarding enriched related signaling pathways, the glycoproteins involved in the ECMreceptor interaction, focal adhesion (such as laminin, integrin and bronectin family) and lysosome (e.g. LAMP1, GNS, LGMN, MAN2B1) were largely enriched which was consistent with previous reports [9,16]. Interestingly, the proteins involved in amino acid metabolism were obviously upregulated and largely enriched. PLOD2, PLOD3 and PLOD1, which belong to lysyl hydroxylase and catalyze collagen lysine hydroxylation, were found to be up-regulated glycoproteins in all cases of ESCC. Although increased glycosylation of these proteins was only also observed in colorectal cancer [17], the detailed mechanism and impact of glycosylation in cancer remained elucidated. Glycosylation of HMGCS1 and AOC3 proteins has not been reported in cancer. HMGCS1 mediates the mevalonate pathway, ketogenesis [18] and is involved in cholesterol biosynthesis [19]. AOC3 catalyzes the conversion of primary amines into aldehydes [20]. In this study, AOC3 protein was up-regulated in the N-glycosylation level at Asn137 and down-regulated in its protein expression, indicating that an overall increase of N-glycosylation occupancy of AOC3 has an important impact on the biological function. All these results supported the well-known experience that glycosylation participated in cellular metabolism in the onset and progression of ESCC [21].
The majority of differential glycoproteins with multiple glycosylation sites were laminin, integrin, collagen, bronectin associated proteins. Among differential glycoproteins in all samples, lysosome-associated membrane glycoprotein 1 (LAMP1) has been reported to be a pro-invasive factor in cancer progression through abnormal localization on the plasma membrane of cancer cells [22]. On one hand, this could be a membrane repair mechanism through lysosome fusion and exocytosis [23]. On the other hand, LAMP1 localization on the plasma membrane provided the binding ability to E-selectin through sialyl-LeX residues, thereby promoting the cancer cells adhering to extracellular matrix [24]. LAMP1 over-expression could also in uence cancer development inside the lysosomal membrane through increasing lysosomal exocytosis and/or lysosomal size [25,26]. In this study, N-glycosylation level of LAMP1 was extremely upregulated but protein expression level was unchanged, indicating it is glycosylation modi cation but not expression change affects the biological function. Whether glycosylation of LAMP1 affects localization and how the six differential glycosylation sites of LAMP1 (Asn84, 103, 261,76, 62, 249) in this study work in ESCC needs further study. Transferrin receptors encoded by TFRC is a membrane glycoprotein, which can import iron by binding transferrin. TFRC displayed moderate to strong cytoplasmic expression in various cancer tissues and expression increased with increasing cancer stage [27]. TFRC knockdown decreased intracellular total iron, suppressed tumor growth and metastasis in human and mouse mammary adenocarcinomas [27]. Matrix remodeling associated 5 (MXRA5) was a novel extracellular protein that was also upregulated in several types of cancers [28]. MXRA5 was identi ed to be frequently mutated in non-small cell lung carcinoma and the high MXRA5 expression was correlated with tumor progression [29,30]. However, there are no previous studies that focus on the role of the N-glycosylation modi cation of this protein in cancer including ESCC.
Lymphatic metastasis (LNM) related dynamic analysis captured 15 glycoprotein clusters whose fold change increased or decreased as cancer progresses with LNM. And the glycoproteins that increased with LNM status were enriched in HIF-1 signaling pathway, RAP1 signaling pathway and lysine degradation pathway. As is well known, HIF-1 signaling is a classical oncogenic pathway associated with tumor growth, angiogenesis, metastasis, and mortality. Recent studies showed HIF-1 signaling was involved in lymphatic invasion through induction of platelet-derived growth factor B (PDGF-B) in breast cancer [31] and vascular endothelial growth factor C (VEGF-C) [32] and SP1 in ESCC [33]. INSR and IGF1R belong to Insulin/IGF System, which was known to affect the malignant behavior of cancer cells and was regulated by Glycans [34]. Increased INSR/IGF1R were related with LNM in cancers [35]. Inhibition of N-linked glycosylation impaired the receptors (INSR and IGF1R) glycosylation and reduced their abundance at the cell surface [36]. As an important cell adhesion molecular, ITGB3 expression has been reported to be associated with LNM in several types of cancers [37] and its N-glycosylation was required for the attachment of cells to ECM [38]. RAP1 pathway is also an important regulator of cell adhesions and junctions, cellular migration, and polarization [39]. RAP1 expression was reported to be correlated with LNM in ESCC and its knockdown decreased the migratory and invasive capacities via AKT signaling in ESCC cells [40]. The role of lysine degradation pathway in cancer was rarely reported. In colorectal cancer, thrombopoietin (TPO) was reported to promote self-renewal and metastasis of CD110+ tumor-initiating cells (TICs) to the liver by activating lysine degradation [41]. In our study, the two genes involved in lysine degradation, PLOD3 and COLGALT1, were involved in collagen glycosylation. PLOD3 was upregulated in some types of cancers and promoted tumor malignant progression [42][43][44][45][46]. ICOLGALT2 was also reported to be overexpressed in metastatic ovarian cancer and interacts with PLOD3[47].

Conclusion
Our study revealed a series of differentially expressed glycoproteins and signaling pathways in ESCC tissues compared with paired normal tissues. Especially we found several metastasis-related glycoproteins and signaling pathways, which might be potential therapeutic targets for the prevention and treatment of ESCC patients with LN metastasis.  Figure 1 Schematic representation of the N-glycoproteomics and proteomics approach for the discovery of differential expressed glycoproteins and proteins in ESCC tissues compared with matched normal tissue.      Supplementary Files