CYSTM1: A Novel Biomarker for Hepatocellular Carcinoma Prognosis

Background: The expression and molecular mechanism of cysteine-rich transmembrane module containing 1 (CYSTM1) in human tumor cells remain unclear. The aim of this study was to determine whether CYSTM1 could serve as a prognostic biomarker for hepatocellular carcinoma (HCC). Methods: We first examined the relationship between CYSTM1 expression and HCC in several public databases. Second, Kaplan–Meier analysis and a Cox proportional hazards regression model were used to evaluate the relationship between CYSTM1 expression and the survival of HCC patients, whose data were downloaded from The Cancer Genome Atlas (TCGA) database. Finally, we used the CYSTM1 expression data in TCGA to predict CYSTM1-related signaling pathways through bioinformatics analysis. Results: The expression level of CYSTM1 in HCC tissues was significantly correlated with T stage (p = 0.039). In addition, Kaplan–Meier analysis showed that the expression of CYSTM1 was significantly associated with poor prognosis in patients with early-stage HCC (p = 0.003). Multivariate analysis indicated that CYSTM1 is a potential predictor of poor prognosis in HCC patients (p = 0.036). The results of bioinformatics analysis demonstrated that the CYSTM1 high-expression dataset was mainly enriched in neurodegeneration and oxidative phosphorylation pathways. Conclusion: CYSTM1 may play a key role in the occurrence and progression of HCC.


Introduction
HCC is a highly malignant tumor that readily metastasizes and recurs, and it poses a serious threat to human health. Its mortality rate ranks fourth among all cancers [1]. In recent years, a growing number of studies have focused on the molecular mechanism of HCC, but it remains unclear. Because HCC has an occult onset and progresses rapidly, and its early symptoms and clinical signs are not obvious, most patients are already at an advanced stage at the time of diagnosis. HCC patients therefore generally have a poor prognosis and a low survival rate [2]. Research on the molecular mechanisms relevant to the early diagnosis and treatment of HCC is of great significance for reducing HCC mortality.

CYSTM1 is a cysteine-rich transmembrane protein that is highly conserved across eukaryotes. It may play a role in stress resistance in eukaryotes, including humans [3]. The gene (C5orf32) is located on chromosome 5q31.3 and encodes a protein of 97 amino acids. However, the expression of CYSTM1 in human tumor tissues and its role in tumor development have not been reported. We therefore analyzed the expression patterns of CYSTM1 in HCC patients in this study. The potential mechanism of CYSTM1 in HCC was explored by bioinformatics analysis, which lays the foundation for follow-up experiments.

Materials And Methods
Public data extraction and processing
The gene expression quantification data and clinical data of 374 HCC samples and 50 normal samples were downloaded from the GDC data portal (https://portal.gdc.cancer.gov/). We used Perl (https://www.perl.org/) scripts to decompress the downloaded compressed files in batches; these files contain the gene expression and clinical data of the HCC samples. All sample IDs and RNA-seq data were then integrated into a matrix file. Next, the Ensembl IDs were converted to gene symbols according to the Ensembl database (http://asia.ensembl.org/index.html). Finally, we appended the gene attribute (protein coding or lincRNA) to each gene symbol for subsequent operations. R (https://www.r-project.org/) scripts and various packages were used to process the data and produce the figures. The difference in CYSTM1 expression between normal liver and HCC tissues was verified in the Oncomine database (https://www.oncomine.org), with the thresholds set as follows: p-value of 0.001, fold change of 1.5, and gene rank of all. The correlation between CYSTM1 expression and prognosis in HCC patients was evaluated with the Kaplan-Meier plotter (http://kmplot.com). The Gene Expression Profiling Interactive Analysis (GEPIA) database was used to analyze the mRNA expression of CYSTM1 in HCC samples (http://gepia.cancer-pku.cn/index.html). The Human Protein Atlas (HPA) database was used to compare the protein expression of CYSTM1 between normal and HCC tissues (www.proteinatlas.org).
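The matrix-building and annotation steps described above can be illustrated with a minimal Python sketch. It is not the authors' Perl or R code. The function name `build_matrix`, the `SYMBOL|biotype` label format, the toy Ensembl IDs, and the choice to fill missing values with 0.0 are all assumptions made for illustration. The only domain detail relied on is that TCGA Ensembl gene IDs carry a version suffix (for example `.5`) that is usually dropped before lookup.

```python
def build_matrix(samples, annotation):
    """Merge per-sample expression into one matrix with annotated gene labels.

    samples:    {sample_id: {ensembl_id: expression_value}}
    annotation: {unversioned_ensembl_id: (gene_symbol, biotype)}
    Returns (sample_ids, rows), where rows is a list of (label, values).
    """
    sample_ids = sorted(samples)
    rows = {}  # keyed by unversioned Ensembl ID, so duplicated symbols stay separate
    for col, sid in enumerate(sample_ids):
        for ensembl_id, value in samples[sid].items():
            base_id = ensembl_id.split(".")[0]  # drop the version suffix
            if base_id not in annotation:
                continue  # skip IDs with no known symbol
            symbol, biotype = annotation[base_id]
            label = f"{symbol}|{biotype}"  # hypothetical label format
            # Assumption: genes missing from a sample default to 0.0
            _, values = rows.setdefault(base_id, (label, [0.0] * len(sample_ids)))
            values[col] = value
    return sample_ids, list(rows.values())


# Toy input with made-up IDs and values
samples = {
    "S2": {"ENSG00000000001.5": 3.0, "ENSG00000000002.1": 1.0},
    "S1": {"ENSG00000000001.5": 2.0},
}
annotation = {
    "ENSG00000000001": ("GENE_A", "protein_coding"),
    "ENSG00000000002": ("GENE_B", "lincRNA"),
}
sample_ids, rows = build_matrix(samples, annotation)
```

Rows are keyed by Ensembl ID rather than by symbol, so two IDs that map to the same symbol remain as separate rows and can be averaged in a later step.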

Gene Set Enrichment Analysis (GSEA)
GSEA was performed using GSEA v3.0 (https://www.gsea-msigdb.org) and Java 8 (https://www.java.com) to identify gene sets associated with CYSTM1 expression. First, two files were prepared: (1) a .gct file containing the CYSTM1 expression data of 374 HCC patients, and (2) a .cls file that divides the samples into CYSTM1 high expression and low expression groups. Then, the number of permutations was set to 1000 to test the correlation of CYSTM1 with the phenotypes using the c2.cp.kegg.v6.2 gene set database. Finally, gene sets with a nominal (NOM) p-value < 0.05 and a false discovery rate (FDR) < 0.05 were considered significantly enriched among genes related to CYSTM1 high expression.
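As an illustration of the phenotype file, the following Python sketch writes the text of a categorical .cls file from a list of CYSTM1 expression values. It assumes a median split, with values equal to the median assigned to the low group; the tie rule, the function name `make_cls`, and the class names `high` and `low` are assumptions. The three-line layout (counts, class names, per-sample labels) follows the categorical .cls format used by GSEA.

```python
from statistics import median


def make_cls(expression):
    """Return the text of a categorical .cls file for a median split.

    expression: CYSTM1 values in the same sample order as the .gct columns.
    """
    cutoff = median(expression)
    # Assumption: samples exactly at the median go to the "low" group
    labels = ["high" if value > cutoff else "low" for value in expression]
    # Class names are listed in order of first appearance in the label line
    classes = list(dict.fromkeys(labels))
    header = f"{len(labels)} {len(classes)} 1"
    return "\n".join([header, "# " + " ".join(classes), " ".join(labels)]) + "\n"


# Toy values, not real TCGA data
cls_text = make_cls([5.0, 1.0, 3.0, 8.0])
```

The label order must match the sample order of the .gct file, since GSEA pairs the two files by position.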

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis
To find the genes co-expressed with CYSTM1, we first averaged the expression values of duplicated genes in the processed matrix file. Second, the expression data of all protein-coding genes in the matrix file were extracted, and each gene was tested for correlation with CYSTM1 expression. Finally, we selected the CYSTM1 co-expressed genes using a Pearson's correlation coefficient > 0.4 and a p-value < 0.05 as thresholds. We then used R scripts and R packages to perform GO and KEGG enrichment analyses. CYSTM1 co-expressed genes were considered significantly enriched in a GO term or KEGG pathway at p-value < 0.05 and FDR q-value < 0.05.

Statistical analysis
R 4.0.2 was used for statistical analysis. The Kaplan-Meier survival curve and the log-rank test were used to compare survival between the CYSTM1 high expression and low expression groups. We performed univariate Cox regression analysis of all clinicopathological parameters, and then entered these parameters into a multivariate Cox regression analysis to determine independent predictors of survival in HCC patients. p-values < 0.05 were considered statistically significant.
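The log-rank comparison of the high and low expression groups can be illustrated with a minimal Python sketch. The study itself used R 4.0.2, so this is only a stand-in written against the standard library. The function name `logrank` and the toy survival times are assumptions. The p-value uses the fact that, for a chi-square statistic x with one degree of freedom, the upper tail probability equals erfc(sqrt(x / 2)).

```python
import math


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value).

    times_*:  follow-up times; events_*: 1 if death was observed, 0 if censored.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Numbers at risk just before time t, per group
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        # Events occurring exactly at time t
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance of the group A event count
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Toy data: group A dies early, group B dies late
chi2, p = logrank([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
same_chi2, same_p = logrank([1, 2, 3], [1, 1, 1], [1, 2, 3], [1, 1, 1])
```

In practice a dedicated survival library would be preferable, because it also handles stratification and reports the Kaplan-Meier curves alongside the test.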

Results
Expression of CYSTM1 in HCC patients
According to the TCGA analysis, the expression of CYSTM1 mRNA differed significantly between normal tissues and HCC tissues (p < 0.001), and its expression in cancer tissues was 2.3 times higher than that in normal tissues (Fig. 1A). In the GEPIA database, the expression of CYSTM1 was also up-regulated in HCC tissues (p < 0.001) (Fig. 1B). Similarly, in the Oncomine database, the mRNA expression of CYSTM1 in HCC tissues was increased by 1.5 and 1.7 times, respectively, in two microarray datasets (Fig. 1C). Immunohistochemical staining in the HPA database showed that the protein expression of CYSTM1 was strongly positive in HCC tissues but negative in normal tissues. Moreover, CYSTM1 was mainly localized in the cell membrane and cytoplasm of HCC tissues (Fig. 1D).

Relationship between the expression of CYSTM1 and clinicopathological features
In this study, we downloaded the clinical data of 374 HCC patients from the TCGA database, including survival time, survival status, age, gender, histological grade, T stage, N stage, M stage and TNM stage. HCC samples with unknown survival time were removed, leaving 342 samples for subsequent analysis. We analyzed the relationship between CYSTM1 expression and the clinicopathological characteristics of HCC patients and found that CYSTM1 expression differed significantly by histological grade (p = 0.004), T stage (p = 0.046) and M stage (p = 0.048) (Fig. 2). According to the median value of CYSTM1 expression, HCC patients were divided into a low expression group (n = 170) and a high expression group (n = 172). CYSTM1 expression was significantly associated with T stage (p = 0.039) (Table 1). The survival and subgroup analyses are shown in Fig. 3. The results of these subgroup analyses indicated that the expression of CYSTM1 is more predictive of prognosis in early-stage HCC patients ≥ 60 years old. We then verified this finding in the Kaplan-Meier plotter database and obtained similar results (Fig. 4). Univariate analysis demonstrated that T stage (p < 0.001), AJCC stage (p < 0.001) and CYSTM1 expression (p = 0.005) were related to the overall survival of HCC patients (Fig. 5A). In addition, multivariate analysis showed that CYSTM1 expression is an independent prognostic factor for HCC patients (p = 0.036) (Fig. 5B).

GSEA Analysis Of CYSTM1-related Signaling Pathways
We performed GSEA on the CYSTM1 high and low expression datasets in TCGA to identify pathways that could take part in HCC. With a nominal p-value < 0.05 and a false discovery rate q-value < 0.05 as thresholds, we listed the top nine pathways related to the CYSTM1 high expression dataset (Table 2): Huntington's disease, Alzheimer's disease, oxidative phosphorylation, Parkinson's disease, proteasome, Vibrio cholerae infection, lysosome, SNARE interactions in vesicular transport and glutathione metabolism (Fig. 6).

To further elucidate the molecular mechanism of CYSTM1 in HCC, we screened the CYSTM1 co-expressed genes from the TCGA database and visualized their correlation (Fig. 7A). We then analyzed the CYSTM1 co-expressed genes by GO and KEGG analyses. GO analysis showed that the CYSTM1 co-expressed genes were mainly enriched in unfolded protein binding, NADH dehydrogenase activity, NADH dehydrogenase (ubiquinone) activity, NADH dehydrogenase (quinone) activity, ubiquitin binding, etc. (Fig. 7B). In addition, KEGG analysis showed that the CYSTM1 co-expressed genes were mainly enriched in signaling pathways related to Huntington disease, prion disease, proteasome, amyotrophic lateral sclerosis, Parkinson disease, etc. (Fig. 7C). A chord plot of the relationship between the CYSTM1 co-expressed genes and the GO terms is shown in Fig. 8A. The expression profiles of the CYSTM1 co-expressed genes in each GO term are displayed by hierarchical clustering (Fig. 8B).
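The co-expression screen behind these results (averaging duplicated genes, then keeping genes whose Pearson correlation with CYSTM1 exceeds 0.4) can be sketched in Python as follows. This is an illustrative stand-in for the authors' R scripts. The function names and toy values are assumptions, and the p-value filter is left out to keep the sketch within the standard library. In practice the p-value would come from the t statistic r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom.

```python
import math
from collections import defaultdict
from statistics import mean


def average_duplicates(rows):
    """rows: list of (gene_symbol, values). Average rows that share a symbol."""
    grouped = defaultdict(list)
    for symbol, values in rows:
        grouped[symbol].append(values)
    return {s: [mean(col) for col in zip(*vs)] for s, vs in grouped.items()}


def pearson(x, y):
    """Pearson correlation coefficient, or None if either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def coexpressed(matrix, target="CYSTM1", r_min=0.4):
    """Return {gene: r} for genes whose correlation with target exceeds r_min."""
    reference = matrix[target]
    hits = {}
    for gene, values in matrix.items():
        if gene == target:
            continue
        r = pearson(reference, values)
        if r is not None and r > r_min:
            hits[gene] = r
    return hits


# Toy matrix with made-up genes; "A" appears twice and is averaged first
rows = [
    ("CYSTM1", [1.0, 2.0, 3.0, 4.0]),
    ("A", [2.0, 4.0, 6.0, 8.0]),
    ("A", [4.0, 6.0, 8.0, 10.0]),
    ("B", [4.0, 3.0, 2.0, 1.0]),
    ("C", [1.0, 1.0, 1.0, 1.0]),
]
hits = coexpressed(average_duplicates(rows))
```

Only positively correlated genes pass the filter here, matching the stated threshold of a coefficient greater than 0.4 rather than an absolute value.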

Discussion
HCC is one of the most fatal cancers in the world, and its occurrence and development are a multi-gene, multi-step and multi-stage process [3,4]. With the development of molecular biology, more and more biomarkers have been found. However, the prognosis of HCC is still poor because effective biomarkers for predicting early-stage HCC are lacking [5,6]. It is therefore of great significance to find a specific and sensitive tumor marker to assist in the diagnosis and treatment of early-stage HCC. This study is the first to propose that CYSTM1 could be used as a prognostic biomarker for HCC, and it shows that the overexpression of CYSTM1 is significantly related to the clinicopathological characteristics and prognosis of HCC patients. In addition, the multivariate Cox proportional hazards regression model showed that CYSTM1 was an independent risk factor for the survival of HCC patients.
CYSTM1 was first proposed and characterized nearly 10 years ago, when it was shown to be a cysteine-rich transmembrane module [3]. It is noteworthy that CYSTM1 is also expressed in human tissues and located on the cell membrane. To date, CYSTM1 has been reported in only a few eukaryotes, and CYSTM family proteins have been shown to play an important role in resistance to drugs, metal ions and viruses [8,9]. These functions may be related to the cysteines, the acidic residues and the polar, disordered cytoplasmic head of CYSTM1, and these structures are highly conserved across species. The C-terminal transmembrane helix of CYSTM1 contains 5-6 cysteines, of which 3-4 consecutive cysteines constitute the cysteine patch. This patch may change the redox potential or the radical quenching capacity of the membrane, thus playing an antioxidant role [10]. CDT1, a cysteine-rich polypeptide of the CYSTM1 subfamily, prevents cadmium from entering cells when heterologously expressed in yeast [8]. Vallee and Margoshes were the first to report that metallothioneins (MTs) are cadmium-binding proteins in horse kidney cortex [11], and these proteins contain a high proportion of cysteines [12]. These results suggest that CYSTM1 may act as a metallothionein-like protein in the cell membrane.
Through KEGG enrichment analysis, we found that the CYSTM1 co-expressed genes are mainly concentrated in the "pathways of neurodegeneration - multiple diseases" category, especially Huntington's disease, which is consistent with the transcriptome analysis by Mastrokolias, who used next-generation sequencing to predict biomarkers of Huntington's disease [13]. Copper could increase the aggregation of poly-glutamine (polyQ) in vivo and in vitro, but MTs could protect Huntington model cells from the toxic effects of polyQ [14]. GO enrichment analysis further showed that the CYSTM1 co-expressed genes are mainly enriched in unfolded protein binding and NADH dehydrogenase activity. The former function is related to the results of the KEGG enrichment analysis, because neuronal cells are highly sensitive to unfolded protein. Long-term accumulation of unfolded protein causes endoplasmic reticulum stress, which may lead to cell apoptosis and necrosis if the stress persists [15]. The latter function may be related to resistance to cellular oxidative stress. The tumor microenvironment also contains a large amount of reactive oxygen species (ROS). These ROS can be produced in a variety of ways by tumor-related fibroblasts, inflammatory cells, vascular endothelial cells and the hypoxic internal environment [16]. At the same time, the effect of ROS on tumors is bidirectional. On the one hand, ROS can promote tumor growth and progression by stimulating tumor cell migration and invasion [17]. On the other hand, high levels of ROS can cause cell apoptosis or necrosis, which is detrimental to tumor progression []. However, MTs could protect DNA from damage by exchanging various toxic metal ions and oxygen free radicals [18]. Therefore, the up-regulation of CYSTM1 expression may protect tumor cells from apoptosis by resisting high ROS levels.
These findings still need to be verified by extensive experiments, which will be one of our follow-up research topics.

Conclusion
CYSTM1 is present on the cell membrane of human HCC cells. Moreover, overexpression of CYSTM1 may be a potential tumor biomarker for poor prognosis in HCC. The stress resistance function of CYSTM1 in eukaryotes, together with the results of our bioinformatics analysis, leads us to predict that oxidative stress resistance may be one of the most important functions of CYSTM1 in HCC. However, the biological function of CYSTM1 in human tumor cells needs to be explored further.