Gastric cancer data sets
We systematically searched published gastric cancer (GC) gene expression data sets, including those with complete clinical information and excluding those without survival information. In total, we gathered The Cancer Genome Atlas Stomach Adenocarcinoma (TCGA-STAD) cohort and 17 Gene Expression Omnibus (GEO) cohorts of samples from patients with GC (GSE54129, GSE65801, GSE35809, GSE51105, GSE13861, GSE27342, GSE29272, GSE63089, GSE19826, GSE79973, GSE13911, GSE51575, GSE118916, GSE122401, GSE130823, GSE15459, GSE66229), together with data from The Cancer Proteome Atlas (TCPA) database. The original data were downloaded from GEO (http://www.ncbi.nlm.nih.gov/geo/), TCGA (https://portal.gdc.cancer.gov/), and TCPA (https://www.tcpaportal.org/tcpa).
Tissue specimens
The tissues in this study were gastric adenocarcinoma specimens from 215 patients who underwent radical gastrectomy for gastric cancer at our center between January 2013 and December 2014. All patients were newly diagnosed and had not received chemotherapy or radiotherapy before surgery. Gastric adenocarcinoma was pathologically confirmed after surgery, and comprehensive clinicopathological information was available. The data were analyzed retrospectively. This study was approved by the Ethics Committee of Fujian Medical University Union Hospital, and written informed consent was obtained from every patient involved.
Single-sample gene set enrichment analysis (ssGSEA)
We obtained three GFR gene sets (KRAS_SIGNALING_UP, AKT_UP.V1_DN and MTOR_UP.V1_DN) from the C6 collection (oncogenic gene sets) of MSigDB (https://www.gsea-msigdb.org/). Using the R package "GSVA" (gene set variation analysis for microarray and RNA-seq data), we scored each sample in the TCGA cohort by ssGSEA (method = "ssgsea", ssgsea.norm = TRUE, verbose = TRUE).
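The rank-based idea behind ssGSEA can be illustrated with a minimal Python sketch. This is a simplified illustration and not the GSVA implementation: it computes a rank-weighted running-sum score for a single sample and omits the across-sample normalisation requested by ssgsea.norm = TRUE. The function name, the toy gene symbols and the weight exponent of 0.25 are assumptions made for this example.

```python
def ssgsea_like_score(expression, gene_set, alpha=0.25):
    """Rank-weighted running-sum enrichment score for one sample.

    expression: dict mapping gene symbol -> expression value.
    gene_set:   iterable of gene symbols (toy sets are used below).
    alpha:      rank weight exponent (0.25 is an assumed value).
    """
    # Order genes from highest to lowest expression (ties broken by name).
    ordered = sorted(expression, key=lambda g: (-expression[g], g))
    n = len(ordered)
    members = set(gene_set) & set(ordered)
    if not members or len(members) == n:
        raise ValueError("gene set must overlap the sample without covering it")

    # The most highly expressed gene gets rank n, the lowest gets rank 1.
    weights = {g: (n - i) ** alpha for i, g in enumerate(ordered)}
    total_in = sum(weights[g] for g in members)
    step_out = 1.0 / (n - len(members))

    p_in = p_out = score = 0.0
    for g in ordered:
        if g in members:
            p_in += weights[g] / total_in   # weighted step for set members
        else:
            p_out += step_out               # uniform step for all other genes
        # Accumulate the gap between the two running distributions.
        score += p_in - p_out
    return score


# Hypothetical expression values for a single sample.
sample = {"G1": 9.1, "G2": 8.4, "G3": 5.0, "G4": 4.2, "G5": 1.3, "G6": 0.7}
high_set_score = ssgsea_like_score(sample, ["G1", "G2"])
low_set_score = ssgsea_like_score(sample, ["G5", "G6"])
```

In this toy sample, a gene set made up of the most highly expressed genes receives a positive score and a set made up of the least expressed genes receives a negative score.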
Unsupervised clustering
K-means unsupervised clustering was used to classify the TCGA cohort into clusters based on the enrichment of GFR pathways. The clustering features were the ssGSEA scores of the three GFR gene sets, which were first converted to z scores to improve clustering accuracy. The final number of clusters was determined with the R package "NbClust". The TCGA cohort was thereby divided into three clusters, defined as Cluster A, Cluster B, and Cluster C.
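The two steps described here, z-score conversion followed by K-means, can be sketched in plain Python as follows. This is an illustrative sketch only: the number of clusters is fixed at three rather than selected with NbClust, the farthest-point initialisation is an assumption chosen to make the toy example deterministic, and the score matrix is invented for illustration.

```python
import statistics


def zscore_columns(rows):
    """Convert each column (gene set score) to z scores across samples."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with a deterministic farthest-point start (assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points, key=lambda p: min(squared_distance(p, c) for c in centroids)
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [statistics.mean(col) for col in zip(*members)]
    return labels, centroids


# Hypothetical ssGSEA scores: rows are samples, columns are three gene sets.
scores = [
    [0.90, 0.80, 0.85], [0.85, 0.90, 0.80], [0.95, 0.85, 0.90],
    [0.10, 0.15, 0.10], [0.15, 0.10, 0.20], [0.05, 0.20, 0.15],
    [0.90, 0.10, 0.50], [0.85, 0.15, 0.45], [0.95, 0.20, 0.55],
]
labels, centroids = kmeans(zscore_columns(scores), k=3)
```

Cluster labels returned by K-means are arbitrary integers; mapping them to names such as Cluster A, B and C is a separate, manual step.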
GSEA
We performed GSEA on the TCGA and GEO data sets (GSE54129, GSE65801, GSE35809, and GSE51105). First, we used the mean ± standard deviation (SD) of the CDK5RAP3 expression value as the cut-off points to divide each data set into three groups: high, moderate and low expression. Next, we compared the high and low expression groups to obtain differentially expressed genes. The R package "clusterProfiler" (v3.12.0) (https://guangchuangyu.github.io/software/clusterProfiler) was then applied to perform GSEA on these differentially expressed genes. All hallmark and oncogenic gene sets were obtained from MSigDB (https://www.gsea-msigdb.org/).
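The three-group split by mean ± SD can be sketched as follows. The sketch assumes that the high group lies above mean + SD and the low group below mean − SD, that values exactly on a cut-off fall in the moderate group, and that the sample SD is used; none of these details is stated in the text. The sample identifiers and expression values are hypothetical.

```python
import statistics


def split_by_mean_sd(expression):
    """Group samples by expression relative to mean - SD and mean + SD.

    expression: dict mapping sample ID -> expression value of one gene.
    """
    values = list(expression.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (assumption)

    groups = {"high": [], "moderate": [], "low": []}
    for sample_id, value in expression.items():
        if value > mean + sd:
            groups["high"].append(sample_id)
        elif value < mean - sd:
            groups["low"].append(sample_id)
        else:
            groups["moderate"].append(sample_id)
    return groups


# Hypothetical expression values for six samples.
toy_expression = {"S1": 2.0, "S2": 5.0, "S3": 5.5, "S4": 4.5, "S5": 5.0, "S6": 8.0}
groups = split_by_mean_sd(toy_expression)
```

Only the high and low groups would then enter the differential expression comparison; the moderate group is set aside.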
Immunohistochemistry
Formalin-fixed, paraffin-embedded tumour specimens containing sufficient tissue were cut into 4-μm serial sections and mounted on silane-coated glass slides for immunohistochemical analysis. The sections were dewaxed, rehydrated, subjected to antigen retrieval, blocked and then incubated with the appropriate antibodies. A rabbit anti-human CDK5RAP3 (ab24189; 1:200; Abcam) or UFM1 (ab109305; 1:200; Abcam) antibody was used as the primary antibody.
Immunohistochemical score
Two experienced pathologists independently assessed the IHC-stained tissue sections and scored them based on the staining intensity and the proportion of stained tumour cells. The proportion and intensity of CDK5RAP3-positive and UFM1-positive cells in randomly selected fields of view were evaluated to indicate the protein expression level. The staining intensity of CDK5RAP3 and UFM1 was scored as follows: no staining, 0; light yellow (mild staining), 1; yellowish brown (moderate staining), 2; brown (strong staining), 3. The proportion of stained tumour cells was scored as follows: 5 percent or fewer positive cells, 0; 6 to 25 percent, 1; 26 to 50 percent, 2; more than 50 percent, 3 (Figure S1). The final score for the expression of CDK5RAP3 and UFM1, ranging from 0 to 9, was obtained by multiplying the intensity score by the proportion score. Patients with a final score <4 were assigned to the low-expression group, and the remaining patients were assigned to the high-expression group.
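The scoring rule can be written compactly as a short Python sketch. The function names are hypothetical, and the handling of boundary values (for example non-integer percentages between categories, or exactly 50 percent) is an assumption of this sketch.

```python
def proportion_score(percent_positive):
    """Map the percentage of positive tumour cells to a score of 0-3."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percent_positive <= 5:
        return 0
    if percent_positive <= 25:
        return 1
    if percent_positive <= 50:   # exactly 50 percent scored as 2 (assumption)
        return 2
    return 3


def ihc_final_score(intensity, percent_positive):
    """Final score (0-9) = intensity score (0-3) x proportion score (0-3)."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity score must be 0, 1, 2 or 3")
    return intensity * proportion_score(percent_positive)


def expression_group(final_score):
    """A final score below 4 is low expression; otherwise high expression."""
    return "low" if final_score < 4 else "high"


# Hypothetical example: moderate staining in 30 percent of tumour cells.
example_score = ihc_final_score(2, 30)
example_group = expression_group(example_score)
```

Because the final score is a product of two values between 0 and 3, only 0, 1, 2, 3, 4, 6 and 9 can occur, so the cut-off of 4 separates scores of 0 to 3 from scores of 4, 6 and 9.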
Western blotting
The following antibodies were used for Western blotting: CDK5RAP3 (ab24189; 1:1000 dilution; Abcam, Cambridge, MA, USA), UFM1 (ab109305; 1:1000 dilution; Abcam, Cambridge, MA, USA), p-AKT (serine 473) (ab81283; 1:1000 dilution; Abcam, Cambridge, MA, USA) and GAPDH (#5174; 1:2000 dilution; Cell Signaling Technology).
Total RNA extraction and qPCR
Total RNA was extracted from gastric cancer and paracancerous tissues using the TRIzol kit (Invitrogen) according to the manufacturer’s instructions and reverse transcribed into cDNA using a reverse transcription system (Takara). The expression levels of GAPDH, CDK5RAP3 and UFM1 were measured by qPCR. The primer sequences were as follows:
CDK5RAP3 forward primer: 5′-GCTGGTGGACAGAAGGCACT-3′
CDK5RAP3 reverse primer: 5′-TGTCCTGGATGGCAGCATTGA-3′
UFM1 forward primer: 5′-GTCCCCAGCACACTAGAGGA-3′
UFM1 reverse primer: 5′-GGAAAAGAGCGGGAGAGAGT-3′
GAPDH forward primer: 5′-GAAGGTGAAGGTCGGAGT-3′
GAPDH reverse primer: 5′-GAAGATGGTGATGGGATTTC-3′
GAPDH was used as an internal reference, and the ΔΔCt method was used for analysis.
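As a worked illustration of the ΔΔCt calculation, the following Python sketch computes relative expression as 2^(−ΔΔCt). It assumes that paracancerous tissue serves as the calibrator and that amplification efficiency is close to 100 percent; the Ct values are hypothetical and the function name is introduced only for this example.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Relative expression of a target gene by the delta-delta Ct approach.

    The reference gene is GAPDH; the calibrator is assumed to be the
    matched paracancerous tissue.
    """
    # Normalise the target gene to the reference gene within each tissue.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    # Compare the tumour sample with the calibrator.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    # Assumes the amount of product doubles in every cycle.
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values: tumour (target 26.0, GAPDH 18.0) versus
# paracancerous tissue (target 24.0, GAPDH 18.0).
fold_change = relative_expression(26.0, 18.0, 24.0, 18.0)
```

With these values ΔΔCt is 2, so the target gene is expressed in the tumour at one quarter of its level in the calibrator tissue.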
Co-immunoprecipitation (Co-IP)
Protein was extracted from stably transfected cells (HGC-27) overexpressing UFM1, and the protein concentration was determined by the BCA method. A small amount of the protein solution was set aside, boiled with 2× SDS sample buffer and frozen at -20°C for Western blot analysis. Next, UFM1 antibody was added to the remaining protein solution at a ratio of 1 µg of antibody per 100 µg of protein and incubated overnight at 4°C with gentle shaking. Protein A/G agarose beads (20 µl) were then added, incubated at 4°C for 2–4 h and centrifuged at 3000×g for 3 min at 4°C. The supernatant was discarded, and the agarose beads were washed 5 times with 1 ml of lysis buffer. After the final removal of the supernatant, 20 µl of 2× SDS sample buffer was added to the pellet, followed by boiling in water for 5 min. Finally, the CDK5RAP3 antibody was used for Western blotting.
Follow-up
According to the institutional follow-up protocol, qualified doctors monitored all patients through outpatient visits, phone calls, emails, letters or home visits. Follow-up was performed every 3 months for the first 2 years and every 6 months for the next 3 years. After 5 years, patients were followed up annually until death. Most patients underwent physical examinations, laboratory tests, imaging examinations and annual gastroscopy. Overall survival time was defined as the time from surgery to the last follow-up or death. The follow-up rate of the whole group was 93.56%, and the median follow-up time was 57 months (range, 2–83 months).
Statistical analysis
All statistical analyses were performed using the Statistical Package for the Social Sciences (SPSS) version 23.0 for Windows (IBM, Chicago, IL, USA) or R software (version 3.6.2). Unless otherwise specified, the results are shown as percentages or means ± SD. As appropriate, the data were analysed by the chi-square test, Fisher's exact test or Student's t test. Survival was evaluated by the Kaplan-Meier method and compared with the log-rank test. The Cox proportional hazards model was used for univariate and multivariate prognostic analysis, and factors with P < 0.05 in univariate analysis were entered into the multivariate analysis. Pearson’s or Spearman’s correlation was used to estimate correlation coefficients. A P value of less than 0.05 was considered statistically significant. Additionally, the protein interaction network was constructed using GeneMANIA (http://www.genemania.org/). Receiver operating characteristic (ROC) curves and the area under the curve (AUC) were computed to assess discriminative ability.
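For readers unfamiliar with the Kaplan-Meier method, the product-limit estimate can be sketched in a few lines of Python. This is a didactic sketch and not the SPSS or R routine used for the analysis; the function name and the follow-up times are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each patient (for example, in months).
    events: 1 if death was observed at that time, 0 if censored.
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        n_at_risk -= leaving
    return curve


# Hypothetical follow-up data for six patients.
toy_times = [6, 12, 12, 20, 30, 45]
toy_events = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(toy_times, toy_events)
```

Censored patients contribute to the number at risk up to their last follow-up but never trigger a drop in the curve, which is why the estimate changes only at observed death times.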