CPXM1: a novel biomarker for gastric cancer prognosis

Jun Du, Department of Gastrointestinal Surgery, The First Affiliated Hospital of Wannan Medical College
Mengxiang Zhu, Department of Gastrointestinal Surgery, The First Affiliated Hospital of Wannan Medical College
Wenwu Yan, Department of Gastrointestinal Surgery, The First Affiliated Hospital of Wannan Medical College
Changsheng Yao, Department of Gastrointestinal Surgery, The First Affiliated Hospital of Wannan Medical College
Qingyi Li, Department of Gastrointestinal Surgery, The First Affiliated Hospital of Wannan Medical College
Jinguo Wang (shwangjg@sina.com), Department of Gastrointestinal Surgery, The First Affiliated Hospital of Wannan Medical College


Introduction
Gastric cancer (GC) is one of the most common tumors of the digestive system and has been the fifth most common malignant cancer in the past decade. GC is the third leading cause of cancer deaths, and its mortality rate is increasing worldwide [1][2]. Although treatment methods (e.g., surgery, radiotherapy, chemotherapy) have improved, the prognosis of patients with advanced GC is still poor [3]. Therefore, it is important to identify effective biomarkers to diagnose GC and predict prognosis.
The carboxypeptidase X, M14 family member 1 (CPXM1) gene is located on chromosome 20p13 and is a member of the carboxypeptidase (CP) superfamily, a class of exopeptidases that specifically cleave and release free amino acids from the C-terminus of a peptide chain. Unlike most other members of the CP family, CPXM1 is inactive toward standard CP substrates, possibly because it lacks residues involved in substrate binding, such as Arg117 and Tyr248 in CPB [4]. Notably, CPXM1 contains a discoidin domain (DSD), which consists of 157 amino acids and can bind the GVMGFO motif of collagen III. CPXM1 is a secreted protein predicted to contain four N-glycosylation sites, and its secretion is inhibited by tunicamycin, an inhibitor of N-acetylglucosamine phosphotransferase [5]. During osteoclastogenesis and adipogenesis, the expression level of CPXM1 has been shown to increase transiently, suggesting that CPXM1 plays a key role in osteoclast and adipocyte differentiation [6][7].
At present, the biological function and molecular mechanism of CPXM1 in cancer have not been reported. Therefore, it is meaningful to study the relationship between CPXM1 and cancer. This study investigated the relationship between CPXM1 expression and the clinicopathological features and prognosis of GC patients by immunohistochemistry (IHC) analysis, and analyzed the expression, prognostic value, and related signaling pathways of CPXM1 in GC by bioinformatics analysis.

Materials And Methods
Public data extraction and processing
The gene expression quantification data and clinical data of 375 GC samples and 32 normal samples were downloaded from the GDC data portal (https://portal.gdc.cancer.gov/). We used Perl (https://www.perl.org/) scripts to decompress the downloaded compressed files in batches; these files contain the gene expression and clinical data of the GC samples. Then, all sample IDs and RNA-seq data were integrated into a matrix file. Next, according to the Ensembl database (http://asia.ensembl.org/index.html), each Ensembl ID was converted to its gene symbol. Finally, we added the gene attribute (protein coding or lincRNA) after the gene symbol for subsequent operations. R (https://www.r-project.org/) scripts and various packages were used to process the data and generate the figures. The difference in CPXM1 expression between normal gastric and GC tissues was verified in the Oncomine database (https://www.oncomine.org). The thresholds were set as follows: p-value of 0.001, fold change of 2, and gene rank of all. The correlation between CPXM1 expression and prognosis in GC patients was evaluated with the Kaplan-Meier plotter (http://kmplot.com).
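The ID conversion step can be illustrated with a minimal Python sketch. The study used Perl and R scripts that are not shown, so the function name `annotate_rows`, the toy IDs, the mapping dictionary, and the choice to drop unmapped IDs are all hypothetical; only the idea of replacing each Ensembl ID with its gene symbol and appending the gene attribute comes from the text.

```python
def annotate_rows(expression, id_to_gene):
    """Relabel expression rows keyed by Ensembl ID as 'symbol|attribute'.

    expression: dict mapping Ensembl ID -> list of per-sample values.
    id_to_gene: dict mapping unversioned Ensembl ID -> (symbol, attribute).
    IDs absent from the mapping are dropped (an assumption of this sketch).
    """
    rows = []
    for ensembl_id, values in expression.items():
        # IDs may carry a version suffix such as ".4"; strip it before lookup.
        base_id = ensembl_id.split(".")[0]
        if base_id in id_to_gene:
            symbol, attribute = id_to_gene[base_id]
            rows.append((f"{symbol}|{attribute}", values))
    return rows


# Toy data with made-up identifiers, for illustration only.
toy_expression = {
    "ENSG_TOY_0001.4": [5.0, 7.5],
    "ENSG_TOY_0002.1": [0.0, 1.2],
    "ENSG_TOY_0003.2": [3.3, 3.1],  # has no entry in the mapping below
}
toy_mapping = {
    "ENSG_TOY_0001": ("GENE_A", "protein_coding"),
    "ENSG_TOY_0002": ("GENE_B", "lincRNA"),
}

for label, values in annotate_rows(toy_expression, toy_mapping):
    print(label, values)
```

Keeping the attribute in the row label makes it straightforward to select only protein-coding genes in later steps.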

Patients and tissue samples
Informed consent was signed by all patients and their relatives, and the study was approved by the ethics committee of the First Affiliated Hospital, Yijishan Hospital of Wannan Medical College. The tissue samples comprised 96 GC tissues and 84 paired normal tissues. Patients who had received preoperative radiotherapy or chemotherapy were excluded, and all samples were processed anonymously in accordance with ethical and legal requirements.

IHC staining
The tissue microarray (TMA) was placed in an oven at 63°C for 1 h, and then placed in an automatic staining machine (LEICA) for dewaxing. The TMA was soaked twice in xylene solution for 15 min and passed sequentially through graded ethanol solutions (100%, 7 min; 100%, 7 min; 90%, 7 min; 80%, 7 min; 70%, 7 min). The TMA was taken out and washed three times with pure water for 3 min each, placed into boiling citrate antigen retrieval solution (82 ml 0.1 mol/L sodium citrate solution, 18 ml 0.1 mol/L citric acid solution, 900 ml pure water), heated in a pressure cooker for 5 min, and finally cooled to room temperature. The TMA was then placed into endogenous peroxidase blocking solution (38.4 ml anhydrous methanol, 12 ml 30% H2O2, 9.6 ml pure water) for 10 min, and then washed three times with phosphate-buffered saline (PBS) for 5 min each. The TMA was covered with CPXM1 primary antibody (1:200; catalog no. bs-8341R, BIOSS, China) and kept at 4°C overnight. The following day, the TMA was washed three times with PBS for 5 min each and incubated with the secondary antibody (DAKO) at room temperature for 30 min, followed by three washes with PBS for 5 min each. Diluted DAB (DAKO) solution was added for 5 min, and the TMA was washed with water for 15 min, followed by counterstaining with hematoxylin for 2 min. The TMA was then immersed in 0.25% hydrochloric acid alcohol for 2 s and washed with water for 2 min. Finally, the TMA was placed into the automatic staining machine for dehydration and sealed with paraffin wax.

IHC evaluation
Two pathologists independently evaluated the immunohistochemical results of all samples in a blinded manner. When their results conflicted, a third pathologist was invited to resolve the disagreement. Cytoplasmic staining was considered positive, consistent with the primary antibody's instructions. We scored the staining intensity (negative = 0, weakly positive = 1, moderately positive = 2, strongly positive = 3) and the percentage of stained cells (0-10% = 1, 11-50% = 2, 51-75% = 3, and 76-100% = 4). The two scores were multiplied to obtain the immunohistochemical score of each GC patient, and X-tile software was used to divide the GC patients into a low expression group (score ≤ 3) and a high expression group (score > 3).
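The scoring rule can be written out as a short Python sketch. The function names are hypothetical, and two points are assumptions rather than statements from the study: boundary percentages are assigned to the lower band (50% scores 2, 75% scores 3), and a higher composite score is taken to denote higher expression, so scores above the cutoff of 3 are labeled "high".

```python
def percentage_score(percent_stained):
    """Map the percentage of stained cells (0-100) to a score from 1 to 4."""
    if not 0 <= percent_stained <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent_stained <= 10:
        return 1
    if percent_stained <= 50:  # assumption: 50% falls in the lower band
        return 2
    if percent_stained <= 75:  # assumption: 75% falls in the lower band
        return 3
    return 4


def ihc_score(intensity, percent_stained):
    """Composite score = staining intensity (0-3) x percentage score (1-4)."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2, or 3")
    return intensity * percentage_score(percent_stained)


def expression_group(score, cutoff=3):
    """Label a composite score as 'high' (above cutoff) or 'low'."""
    return "high" if score > cutoff else "low"


# Moderate staining (2) in 60% of cells gives 2 x 3 = 6, i.e. high expression.
score = ihc_score(2, 60)
print(score, expression_group(score))
```

The composite score ranges from 0 to 12, so a cutoff of 3 separates weak or sparse staining from moderate-to-strong staining in a substantial fraction of cells.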
Gene set enrichment analysis (GSEA)
GSEA was performed using GSEA v3.0 (https://www.gsea-msigdb.org) and Java 8 (https://www.java.com) to identify gene sets associated with CPXM1. First, two files were prepared: (1) a .gct file containing the CPXM1 expression data of 375 GC patients, and (2) a .cls file that divides the samples into CPXM1 high expression and low expression groups. Then, the number of permutations was set to 1000 to test the correlation of CPXM1 with the phenotypes using the c2.cp.kegg.v6.2 gene set database. Ultimately, gene sets with a nominal (NOM) p-value < 0.05 and false discovery rate (FDR) < 0.05 were considered significantly enriched in the CPXM1 high expression group.
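As an illustration of the phenotype file, the following Python sketch builds the text of a categorical .cls file from a list of CPXM1 expression values. The text does not state how samples were split into high and low groups, so the median split used here is an assumption, as are the function name `build_cls` and the class names. The three-line layout (sample and class counts, class names, one label per sample) follows the categorical .cls format as we understand it from the GSEA documentation.

```python
from statistics import median


def build_cls(expression_values, high_name="CPXM1_high", low_name="CPXM1_low"):
    """Return the text of a categorical .cls file for a two-group comparison.

    Samples above the median are labeled high, the rest low (an assumed rule).
    """
    cutoff = median(expression_values)
    labels = [high_name if v > cutoff else low_name for v in expression_values]
    # Class names are listed in order of first appearance among the labels,
    # so that names on line 2 line up with the labels on line 3.
    class_names = list(dict.fromkeys(labels))
    header = f"{len(labels)} {len(class_names)} 1"
    lines = [header, "# " + " ".join(class_names), " ".join(labels)]
    return "\n".join(lines) + "\n"


# Toy expression values in sample order (not data from the study).
print(build_cls([1.0, 9.0, 3.0, 7.0]), end="")
```

The sample order in the .cls file must match the column order of the .gct file, which is why the labels are generated directly from the expression values in their original order.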

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis
To find the co-expressed genes of CPXM1, we first merged duplicate entries for the same gene in the processed matrix file by taking their average. Second, the expression data of all protein-coding genes in the matrix file were extracted, and each gene was tested for correlation with the expression of CPXM1. Finally, we screened out the co-expressed genes of CPXM1 using Pearson's correlation coefficient > 0.4 and p-value < 0.05 as thresholds. Then, we used R scripts and R packages to perform GO and KEGG enrichment analyses. GO terms and KEGG pathways with p-value < 0.05 and FDR q-value < 0.05 were considered significantly enriched for the co-expressed genes of CPXM1.
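The co-expression screen can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the R code used in the study: the function names and toy values are hypothetical, genes with constant expression are assigned a correlation of 0, and only the correlation threshold is applied. The p-value filter is omitted here; in practice it could be obtained from a library routine such as `scipy.stats.pearsonr`, which returns both the coefficient and its p-value.

```python
import math


def average_duplicates(rows):
    """Average rows that share a gene symbol. rows: list of (symbol, values)."""
    grouped = {}
    for symbol, values in rows:
        grouped.setdefault(symbol, []).append(values)
    return {
        symbol: [sum(column) / len(column) for column in zip(*value_lists)]
        for symbol, value_lists in grouped.items()
    }


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation undefined for constant input; treated as 0
    return sxy / math.sqrt(sxx * syy)


def coexpressed_genes(matrix, target="CPXM1", r_cutoff=0.4):
    """Return {gene: r} for genes whose correlation with target exceeds cutoff."""
    reference = matrix[target]
    hits = {}
    for gene, values in matrix.items():
        if gene != target:
            r = pearson_r(reference, values)
            if r > r_cutoff:
                hits[gene] = r
    return hits


# Toy matrix (not study data); GENE_A appears twice and is averaged first.
toy_rows = [
    ("CPXM1", [1, 2, 3, 4]),
    ("GENE_A", [2, 4, 6, 8]),
    ("GENE_A", [4, 6, 8, 10]),
    ("GENE_B", [4, 3, 2, 1]),
    ("GENE_C", [1, 1, 1, 1]),
]
print(coexpressed_genes(average_duplicates(toy_rows)))
```

Because the threshold is one-sided (r > 0.4), negatively correlated genes such as GENE_B in the toy data are not reported as co-expressed.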
Statistical analysis
SPSS 23.0, GraphPad Prism 8.0.2, and R 4.0.2 were used for statistical analysis. The chi-square test and Fisher's exact test were performed to assess the correlation between CPXM1 expression levels and clinicopathological features in the 90 GC patients. The Kaplan-Meier survival curve and log-rank test were used to compare survival between the CPXM1 high expression and low expression groups. We performed univariate Cox regression analysis of all clinicopathological parameters, and then integrated these parameters into a multivariate Cox regression analysis to determine independent predictors of survival in GC patients. p-values < 0.05 were considered statistically significant.
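For readers unfamiliar with the Kaplan-Meier method, the product-limit estimate can be sketched in a few lines of Python. The study computed its curves with statistical software, so this is only an illustration: the function name and the toy follow-up data are hypothetical, and the log-rank test and Cox regression are not shown.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each patient.
    events: 1 if death was observed at that time, 0 if the patient was censored.
    Returns a list of (time, survival probability) at each time with a death.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy follow-up data in months (not data from the study).
toy_times = [5, 8, 8, 12, 15]
toy_events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(toy_times, toy_events):
    print(f"t={t}: S(t)={s:.2f}")
```

Censored patients contribute to the number at risk until their last follow-up and then leave the risk set without lowering the survival estimate, which is what distinguishes this estimator from a simple proportion of survivors.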

Results
Relationship between CPXM1 mRNA expression, clinicopathological features, and survival of GC patients in public databases
We first verified that the expression level of CPXM1 mRNA was significantly increased in GC tissues in the Oncomine database (p < 0.001) (Figure 1A-C). Second, we screened the CPXM1 expression data of 375 GC samples and 32 normal samples downloaded from The Cancer Genome Atlas (TCGA) database, and confirmed that the expression level of CPXM1 mRNA in GC tissues was 4.87 times higher than that in normal tissues (p < 0.001) (Figure 1D). Comparison of paired GC tumors and normal tissues also showed that the expression level of CPXM1 mRNA was significantly upregulated in tumors (p < 0.001) (Figure 1E). Then, we analyzed the relationship between the expression level of CPXM1 mRNA and the clinicopathological features of these patients, and found that the tumor invasion grade was significantly correlated with CPXM1 expression (p < 0.001) (Figure 1F). Finally, we analyzed the relationship between the expression level of CPXM1 and the survival of GC patients through the Kaplan-Meier plotter database. The results showed that the prognosis of GC patients with high expression of CPXM1 was significantly worse (p < 0.001) (Figure 1G). These data indicate that CPXM1 plays an important role in the occurrence and progression of GC.
Relationship between CPXM1 expression in GC and normal tissues and clinicopathological features and survival in a TMA
To verify CPXM1 expression at the protein level, a TMA containing 96 GC tissues and 84 normal tissues was subjected to immunohistochemistry (Figure 2). These samples had complete clinicopathological features and survival follow-up information. In this study, the expression of CPXM1 was high in 46 cases of GC tissues and 34 cases of normal tissues. However, there was no significant difference in the expression of CPXM1 between GC and normal tissues (p = 0.824, Table 1). Subsequently, we analyzed the relationship between the expression of CPXM1 and clinicopathological features in GC and normal tissues separately (Table 2). The expression level of CPXM1 in GC tissues was significantly correlated with tumor size (p = 0.041) and lymph node metastasis (p = 0.014), but not with gender, age, pathological grade, tumor invasion, distant metastasis, or AJCC stage (all p > 0.05). In addition, the expression level of CPXM1 in normal tissues was not significantly correlated with any clinicopathological feature (all p > 0.05).
According to the Kaplan-Meier survival curve analysis, high expression of CPXM1 in GC tissues was significantly correlated with the prognosis of GC patients (p = 0.011, Figure 3A). However, high expression of CPXM1 in normal tissues was not significantly correlated with the prognosis of GC patients (p = 0.317, Figure 3B) (Table 3). These results suggest that CPXM1 could be a potential independent prognostic factor in GC patients.

GSEA analysis of CPXM1-related signaling pathways
We performed GSEA on the CPXM1 high expression and low expression data sets in TCGA to identify pathways that could be activated in GC. Using NOM p-value < 0.05 and FDR q-value < 0.05 as the thresholds, we listed the first 20 pathways related to the CPXM1 high expression data set; most of these pathways were related to cancer phenotypes (Table 4). Pathways associated with focal adhesion, cell adhesion molecules (CAMs), extracellular matrix (ECM) receptor interaction, cytokine-cytokine receptor interaction, leukocyte transendothelial migration, dilated cardiomyopathy, axon guidance, melanoma, and Hedgehog signaling were enriched in the CPXM1 high expression data set (Figure 4).

GO and KEGG analyses of CPXM1 co-expressed genes
To further elucidate the molecular mechanism of CPXM1 in GC, we screened CPXM1 co-expressed genes from the TCGA database and visualized their correlation (Figure 5A). We then performed GO and KEGG analyses on the CPXM1 co-expressed genes. GO analysis showed that the CPXM1 co-expressed genes were mainly enriched in ECM structural constituent, cell adhesion molecule binding, glycosaminoglycan binding, and integrin binding (Figure 5B). In addition, KEGG analysis showed that CPXM1 co-expressed genes were mainly enriched in signaling pathways related to focal adhesion, ECM-receptor interaction, proteoglycans in cancer, and osteoclast differentiation, and participated in some classic cancer-related signaling pathways, such as PI3K-Akt and Rap1 signaling (Figure 5C). A chord plot of the relationship between CPXM1 co-expressed genes and KEGG pathways is shown in Figure 5D. The CPXM1 co-expressed gene profiles in each KEGG pathway are displayed by hierarchical clustering (Figure 5E). Therefore, based on our bioinformatics analysis, we conclude that CPXM1 could activate a series of cancer-related signaling pathways through ECM interactions, which may lead to malignant phenotypes such as cancer adhesion and metastasis. These results are consistent with the CPXM1-related clinicopathological parameters described above.

Discussion
GC is one of the most common malignant tumors worldwide. Although the cure rate of early GC is very high, the 5-year survival rate is only 3.9% once metastasis occurs [8]. The molecular mechanisms of GC metastasis are complex, so it is particularly important to identify independent prognostic factors for GC. Based on multiple databases, we hypothesized that CPXM1 may play a key role in the development of GC; the molecular mechanisms of CPXM1 in cancer have not been reported at present. In this study, we therefore report for the first time that the expression of CPXM1 in GC is related to the clinicopathological characteristics and survival of GC patients, and we predict CPXM1-related signaling pathways through bioinformatics analysis, laying the foundation for subsequent experiments.
CPXM1 is a member of the CPE family, and a special member of the CP family along with CPXM2 and AEBP1 [5]. Overexpression of CPXM2 is closely related to the prognosis of GC patients, and may promote the proliferation and invasion of GC through the epithelial-mesenchymal transition [9]. AEBP1 could be a potential prognostic factor for GC; it can activate NF-κB signaling and lead to a series of malignant phenotypes of GC [10]. First, compared with other family members, CPXM1, CPXM2, and AEBP1 are highly homologous. Second, they seem to lack CP activity, which may be caused by the deletion of CP-related active sites or the substitution of other residues. Finally, they are unique in that their signal peptides are connected via a DSD [11].
The DSD is composed of approximately 150 amino acids. It is widely distributed in secretory proteins, intracellular proteins, and transmembrane proteins, and participates in a variety of biological functions. The DSD of AEBP1 may interact with collagen I to regulate the spreading and proliferation of fibroblasts and myofibroblasts [12]. In addition, specific sites in the DSDs of DDR1 and DDR2 can bind different collagen proteins and activate a series of downstream signaling pathways to promote the malignant phenotypes of various cancer cells [13]. Kim et al. found that the homology of the CPXM1 DSD with those of CPXM2 and AEBP1 was 58.9% and 53.2%, respectively, and with those of DDR1 and DDR2 was 34.7% and 34.2%, respectively. CPXM1, like DDR1, DDR2, and AEBP1, could bind the GVMGFO motif of collagen III [5]. Early studies found that CPXM1 is a secretory protein associated with N-glycosylation. Four glycosylation sites were predicted, located at N57, N210, N318, and N472. N-glycosylation is closely related to the secretion of CPXM1. When tunicamycin was used to inhibit N-glycosylation, the amount of CPXM1 secreted into the extracellular medium was shown to be significantly reduced [4][5].
Our immunohistochemical results showed that there was no difference in the expression of CPXM1 between GC and normal tissues. This may be due to increased secretion of CPXM1 caused by glycosylation in GC tissue; our preliminary experimental results show that, compared with normal gastric mucosa tissue, the glycosylation of CPXM1 in GC tissue is significantly increased (not shown). However, the expression level of CPXM1 in GC tissues was closely related to some clinicopathological features and to the prognosis of GC patients. Through univariate and multivariate analyses, CPXM1 was determined to be an independent prognostic factor for GC patients. The results of GSEA showed that high expression of CPXM1 was enriched for signaling pathways related to focal adhesion, cell adhesion molecules, and ECM-receptor interaction. To further verify the results, we performed GO and KEGG analyses and found that the co-expressed genes of CPXM1 were enriched in cell adhesion-related terms and signaling pathways, such as focal adhesion, the PI3K-Akt signaling pathway, integrin binding, and cell adhesion molecule binding. Cell-matrix adhesion plays an important role in cell movement, cell proliferation, cell differentiation, and cell survival. Focal adhesions are specialized structures in which actin filaments are anchored to transmembrane receptors of the integrin family through a multimolecular complex of plaque proteins [14]. These structures can activate downstream signaling pathways, leading to reorganization of the actin cytoskeleton, which is a key prerequisite for changing cell shape, movement, and gene expression [15][16]. The relationship between CPXM1 and the cell adhesion pathway needs to be further studied.
There are three important limitations in our research. First, we used a relatively small number of TMA samples and used retrospective studies to analyze GC tissues, which may lead to potential bias. Second, we only collected the overall survival time, so we could not analyze the relationship between CPXM1 expression and disease-free survival and relapse-free survival. Finally, the molecular mechanism of CPXM1 in GC is not clear due to a lack of experiments.
In conclusion, the overexpression of CPXM1 may indicate a poor prognosis of GC. In addition, our bioinformatics results showed that CPXM1 may regulate the cell adhesion-related signaling pathway of GC. However, further experiments are needed to study the precise molecular mechanism of CPXM1 in GC.

Figure 4. Enrichment plots of the CPXM1 high expression data set from GSEA. The CPXM1 high expression data set was enriched in focal adhesion, cell adhesion molecules, ECM-receptor interaction, cytokine-cytokine receptor interaction, leukocyte transendothelial migration, dilated cardiomyopathy, axon guidance, melanoma, and Hedgehog signaling pathway.