Systematic Expression Analysis of the Diagnostic and Prognostic Value of HEPACAM Family Member 2 in Colon Adenocarcinoma

Background: The diagnostic and prognostic value of the HEPACAM family member 2 (HEPACAM2) gene in patients with colon adenocarcinoma (COAD) has rarely been reported. This study therefore aimed to explore the diagnostic and prognostic value of HEPACAM2 in patients with COAD. Methods: First, we analyzed the differential expression and diagnostic value of HEPACAM2 across different databases. Second, we performed univariate and multivariate survival analyses of the prognostic value of HEPACAM2 in patients with COAD. Finally, we used joint-effects analysis and comprehensive prognosis analysis to investigate the prognostic value of HEPACAM2 and related genes. Results: Differential analyses of multiple databases showed that HEPACAM2 expression in COAD tumor tissue was significantly lower than in adjacent normal tissue. The diagnostic ROC curves indicated that HEPACAM2 had high diagnostic value in COAD. RT-qPCR verification in COAD tissue from the Guangxi cohort showed that HEPACAM2 expression in tumor tissue was significantly lower than in adjacent normal tissue (P < 0.001), with high diagnostic value in COAD (AUC = 0.892). The prognostic analysis showed that patients with low HEPACAM2 expression had poorer overall survival (OS) than patients with high HEPACAM2 expression (P = 0.038, HR (95% CI) = 0.635 (0.414-0.976) for high versus low expression). Joint-effects analysis and comprehensive prognosis analysis showed that high expression of HEPACAM2 combined with high expression of CLCA1, REP15, or B3GNT6 was associated with better OS in COAD. Conclusion: These results suggest that HEPACAM2 may be an independent diagnostic and prognostic biomarker for COAD.
60°C for 15 s.
The primer sequences were: HEPACAM2 Forward: 5’-TGCCACCCAATGCATCTCTGCT-3’, HEPACAM2 Reverse: 5’-TCTGCACCACTGGCTTTGTGAC-3’; β-actin Forward: 5’-CATGTACGTTGCTATCCAGGC-3’, β-actin Reverse: 5’-CTCCTTAATGTCACGCACGAT-3’. The final result was calculated using the 2^-ΔΔCq method. (28)
This study followed the Declaration of Helsinki and was approved by the Ethics Association of the First Affiliated Hospital of … [No.: 2019(KY-E-001)]; all patients signed an informed consent form.
The risk score model of HEPACAM2 and REP15 was as follows: (-0.453 × HEPACAM2 expression) + (-0.615 × REP15 expression). The risk score model of HEPACAM2 and B3GNT6 was as follows: (-0.453 × HEPACAM2 expression) + (-0.429 × B3GNT6 expression). All the models we constructed showed that the low-risk-score group had better survival and fewer deaths for COAD OS. The prognosis 1-, 3-, and 5-year …


Introduction
Colorectal cancer (CRC) is a malignant tumor that ranks second in cancer morbidity and third in mortality worldwide. (1) The morbidity and mortality of CRC are increasing globally, with approximately 1.4 million new CRC cases and 700,000 deaths in 2018. (2) Smoking, alcohol consumption, excess weight, consumption of red or processed meat, and low dietary fiber or calcium intake are considered risk factors for CRC. (3) Colonoscopy and clinicopathological examination are still the main methods of screening and diagnosis, but these procedures are often invasive and can impose pain and an economic burden on patients. (4) Serum carcinoembryonic antigen (CEA) has some prognostic value in CRC, but its low diagnostic value and poor performance in early prognosis remain the main limitations. (5,6) At present, the main treatments for CRC are still surgery, radiotherapy, and chemotherapy. When diagnosed with advanced metastatic disease, CRC patients traditionally have a poor prognosis, with a 5-year survival rate of 5%-8%. (7) We therefore urgently need novel, accurate, and practical oncology biomarkers to improve early diagnosis and prognostic assessment in patients with CRC. As an important regulatory molecule, mRNA is believed to play an important role in tumor cell survival and tumor progression, and comparing gene expression profiles between tumors and adjacent tissues plays an important role in the identification of biomarkers. (8,9) Colon adenocarcinoma (COAD) is the most common type of CRC, (10) so this study aimed to find potential mRNA biomarkers with diagnostic and prognostic value in COAD.
Hepatic and glial cell adhesion molecule (HEPACAM) was first identified in 2005 as a cell adhesion molecule belonging to the immunoglobulin superfamily. (11) The HEPACAM gene is often down-regulated or lost in various human cancer cells or tissues. (12) HEPACAM is reported to mediate the proliferation, differentiation, and migration of cancer cells. (13) However, there are few reports on the role of HEPACAM family member 2 (HEPACAM2), also a member of the immunoglobulin superfamily, in cancer regulation and management. HEPACAM2, also called MIKI and LOC253012, is located in chromosome region 7q21. (14) This gene encodes a protein related to the immunoglobulin superfamily and plays a role in mitosis. Poly(ADP-ribosyl)ation of the encoded protein can promote its translocation to the centrosome, thereby promoting centrosome maturation. (15) HEPACAM2 was reported to participate in regulating cell proliferation, and knockdown of this gene can lead to early-phase and metaphase arrest of mitosis, abnormal nuclear morphology, and apoptosis. (14,16) Notably, the expression of HEPACAM2 was downregulated in colorectal cancer. (17-19) Nevertheless, the clinical importance of HEPACAM2 in COAD remains unclear. This study is the first to investigate the diagnostic and prognostic value of the HEPACAM2 gene in COAD.

Public data collection
The HEPACAM2 mRNA expression dataset and the matching clinicopathological parameters were obtained from the public cancer database TCGA (https://tcga-data.nci.nih.gov/) (20). We also obtained the expression dataset of genes co-expressed with HEPACAM2 in COAD from the TCGA database.

Differential expression analysis and diagnostic ROC Curve Analysis based on TCGA cohort and public networks
First, we downloaded information on the HEPACAM2 gene in different cancers from the UALCAN (http://ualcan.path.uab.edu/index.html) (21) and Tumor IMmune Estimation Resource (TIMER, https://cistrome.shinyapps.io/timer/) (22) databases. To further explore the differential expression of HEPACAM2 in COAD, we also obtained HEPACAM2 expression data from the public databases Gene Expression Profiling Interactive Analysis (GEPIA; http://gepia.cancer-pku.cn/index.html) (23) and Metabolic gEne Rapid Visualizer (MERAV; http://merav.wi.mit.edu/) (24). Additionally, based on the TCGA database, we compared HEPACAM2 expression between COAD cancer tissues and adjacent normal tissues and analyzed the HEPACAM2 expression level in COAD tumor tissues across tumor TNM stages. Finally, also based on the TCGA database, we constructed a receiver operating characteristic (ROC) curve to analyze the diagnostic value of HEPACAM2 in distinguishing COAD tumor tissues from adjacent normal tissues.
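As a minimal sketch of what the diagnostic ROC analysis measures, the area under the curve can be computed as the fraction of (normal, tumor) sample pairs in which the tumor sample has the lower expression value, with ties counted as one half. The function name `roc_auc` and all expression values are hypothetical and chosen only for illustration; this is not the software pipeline used in the study.

```python
def roc_auc(normal_values, tumor_values):
    """AUC for the rule 'lower expression suggests tumor'.

    Equivalent to the Mann-Whitney pairwise formulation: the share of
    (normal, tumor) pairs in which the tumor value is lower, ties = 0.5.
    """
    concordant = 0.0
    for normal in normal_values:
        for tumor in tumor_values:
            if tumor < normal:
                concordant += 1.0
            elif tumor == normal:
                concordant += 0.5
    return concordant / (len(normal_values) * len(tumor_values))


# Hypothetical expression values, not taken from any cohort in this study.
normal_expression = [5.1, 4.8, 6.0, 5.5]
tumor_expression = [2.0, 3.1, 4.9, 1.7, 2.5]

auc = roc_auc(normal_expression, tumor_expression)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 would mean the gene separates tumor from normal tissue no better than chance, while values close to 1 indicate nearly complete separation.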
Mining and validation of the differential expression and diagnostic value of HEPACAM2

Mining cohort
The mRNA expression of HEPACAM2 in COAD or CRC was analyzed within the Oncomine dataset (https://www.oncomine.org/) (25), which is based on the Gene Expression Omnibus (GEO). The differential expression scatter plots and diagnostic ROC curves of HEPACAM2 were generated for two studies, Skrzypczak COAD (26) and Hong CRC (27).
COAD patients were divided into high- and low-expression groups according to the median HEPACAM2 expression level. Kaplan-Meier survival curves were used to evaluate the prognostic value of HEPACAM2 in patients with COAD. We then constructed different prognostic COX risk models according to the different clinical parameters, and used the HR risk curve and multivariate survival analysis to explore the prognostic value of HEPACAM2 in COAD. Finally, we constructed a HEPACAM2-related nomogram to estimate the prognosis risk in COAD.
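The median split and Kaplan-Meier estimate described above can be sketched as follows. This is an illustrative standard-library implementation, not the software used in the study; the function names, the convention that values strictly above the median count as "high", and all data values are assumptions.

```python
from statistics import median


def split_by_median(expression):
    """Label each sample 'high' or 'low' relative to the median.

    Assumption: values strictly above the median are 'high'.
    """
    cutoff = median(expression)
    return ["high" if value > cutoff else "low" for value in expression]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with a death.

    events: 1 = death observed, 0 = censored.
    """
    ordered = sorted(zip(times, events))
    at_risk = len(ordered)
    survival = 1.0
    curve = []
    index = 0
    while index < len(ordered):
        current_time = ordered[index][0]
        same_time = [e for t, e in ordered if t == current_time]
        deaths = sum(same_time)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((current_time, survival))
        # Everyone observed at this time leaves the risk set afterwards.
        at_risk -= len(same_time)
        index += len(same_time)
    return curve


# Hypothetical data for illustration only.
groups = split_by_median([2.1, 0.4, 3.3, 1.0, 0.9, 2.8])
curve = kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0])
print(groups)
print(curve)
```

In practice each expression group would get its own curve, and the two curves would be compared with a log-rank test.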

Collection of HEPACAM2 gene mutation and immune infiltration information of COAD
The mutation status and mutation frequency of HEPACAM2 in COAD were obtained from the cBio Cancer Genomics Portal (cBioPortal, https://www.cbioportal.org/) database. (29) Additionally, we explored the potential connection between HEPACAM2 and different infiltrating immune cells through the TIMER database.

Correlation analysis and correlation-genes survival analyses in COAD patients
We collected COAD gene sets associated with HEPACAM2 from three datasets, GEPIA, UALCAN, and LinkedOmics (http://www.linkedomics.org/) (30), and drew a Venn diagram to select the genes shared by all three. We then performed a correlation analysis of these intersection genes based on the TCGA cohort. Patients were also divided into two groups for each intersection gene according to its median expression value, and univariate and multivariate survival analyses were used to evaluate the prognostic value of these genes in COAD.
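The Venn-diagram selection step amounts to a set intersection across the three sources. In the sketch below, CLCA1, REP15, and B3GNT6 are the genes this paper reports as related to HEPACAM2; the `GENE_*` entries and the helper name `intersect_gene_sets` are hypothetical placeholders, not actual database output.

```python
def intersect_gene_sets(*gene_sets):
    """Return the genes present in every supplied set."""
    if not gene_sets:
        return set()
    shared = set(gene_sets[0])
    for genes in gene_sets[1:]:
        shared &= set(genes)
    return shared


# Placeholder lists: GENE_* names are hypothetical, not real query results.
gepia_genes = {"CLCA1", "REP15", "B3GNT6", "GENE_A", "GENE_B"}
ualcan_genes = {"CLCA1", "REP15", "B3GNT6", "GENE_B", "GENE_C"}
linkedomics_genes = {"CLCA1", "REP15", "B3GNT6", "GENE_A", "GENE_D"}

shared_genes = intersect_gene_sets(gepia_genes, ualcan_genes, linkedomics_genes)
print(sorted(shared_genes))
```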

Joint-effects analysis and comprehensive prognosis analysis
The joint-effects survival analysis investigated the combined effects of the correlated genes on the survival of COAD patients; survival curves and multivariate survival analyses were also used to evaluate their prognostic value in COAD. We also constructed prognostic nomograms based on the joint-effects survival results. Finally, we constructed prognostic risk score models based on the joint-effects survival analysis. The prognostic risk score was calculated as: $\text{Risk score} = \beta_1 \times \text{expression of gene}_1 + \beta_2 \times \text{expression of gene}_2 + \dots + \beta_n \times \text{expression of gene}_n$, where each $\beta_i$ is the coefficient of gene $i$ in the multivariate COX regression risk model. (31) COAD patients were divided into risk groups according to the median risk score. Prognostic time-dependent ROC curves were generated with R software to estimate the predictive accuracy in COAD patients.
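The risk score formula and the median-based grouping can be sketched as follows. The coefficients are the ones this paper reports for the HEPACAM2 and REP15 model (-0.453 and -0.615); the patient expression values, the function names, and the convention that scores strictly above the median form the high-risk group are assumptions made for illustration.

```python
from statistics import median

# Coefficients reported in this paper for the HEPACAM2 + REP15 model.
COEFFICIENTS = {"HEPACAM2": -0.453, "REP15": -0.615}


def risk_score(expression, coefficients):
    """Sum of (Cox coefficient x expression level) over the model genes."""
    return sum(beta * expression[gene] for gene, beta in coefficients.items())


def assign_risk_groups(scores):
    """Assumption: scores strictly above the median are 'high' risk."""
    cutoff = median(scores)
    return ["high" if score > cutoff else "low" for score in scores]


# Hypothetical expression values for four patients.
patients = [
    {"HEPACAM2": 2.0, "REP15": 1.0},
    {"HEPACAM2": 0.5, "REP15": 0.2},
    {"HEPACAM2": 1.0, "REP15": 3.0},
    {"HEPACAM2": 0.1, "REP15": 0.1},
]

scores = [risk_score(patient, COEFFICIENTS) for patient in patients]
risk_groups = assign_risk_groups(scores)
for score, group in zip(scores, risk_groups):
    print(f"{score:.4f} -> {group}")
```

Because both coefficients are negative, higher expression of either gene lowers the score, which matches the reported association between high expression and better OS.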

Statistical analysis
HEPACAM2 expression in COAD tumor tissues and adjacent normal tissues was compared with an unpaired Student's t-test in the TCGA cohort and with a paired t-test in the RT-qPCR validation cohort; gene expression levels are presented as mean ± standard deviation. The multivariate Cox risk model was fitted with different adjustments, namely, model 1: unadjusted; model 2: adjusted for TNM stage; model 3: adjusted for age, sex, and TNM stage. The results are presented as hazard ratio (HR), 95% confidence interval (CI), and P-value. P < 0.05 was considered statistically significant. All statistical calculations were performed with SPSS 25.0 (IBM, New York, NY, USA) and R software, version 4.0.3 (http://www.R-project.org/).
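For the paired comparison of tumor and adjacent normal tissue from the same patients, the test statistic is the mean of the within-pair differences divided by its standard error. The sketch below computes only the t statistic and degrees of freedom with the standard library; the P-value step is omitted because it needs a t-distribution function from a statistics package. The function name and the values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(tumor, normal):
    """Return (t statistic, degrees of freedom) for paired samples."""
    if len(tumor) != len(normal):
        raise ValueError("paired samples must have the same length")
    differences = [t - n for t, n in zip(tumor, normal)]
    n_pairs = len(differences)
    # Standard error of the mean difference (sample standard deviation).
    standard_error = stdev(differences) / sqrt(n_pairs)
    return mean(differences) / standard_error, n_pairs - 1


# Hypothetical relative expression values for three tissue pairs.
tumor_levels = [1.0, 2.0, 3.0]
normal_levels = [2.0, 4.0, 6.0]

t_value, degrees_of_freedom = paired_t_statistic(tumor_levels, normal_levels)
print(f"t = {t_value:.3f}, df = {degrees_of_freedom}")
```

A negative t value here reflects lower expression in tumor than in the matched normal tissue.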

Data resource processing
We downloaded clinical data for a total of 461 COAD patients and 456 gene expression levels (including 480 cancer tissue expression levels and 41 adjacent normal tissue expression levels) from the TCGA database. After integrating the two datasets and removing cases with no prognostic information, mismatched information, or repeated cancer tissue expression, we obtained 438 cases of COAD tumor tissue and 41 cases of adjacent normal tissue. The univariate survival analysis of clinical parameters is shown in Table 1. The results showed that TNM stage was correlated with the OS of COAD patients (Log-rank P < 0.001).

Differential expression analysis and diagnostic ROC Curve Analysis based on TCGA cohort and public networks
We downloaded the HEPACAM2 expression level in various cancer tissues and normal tissues from the UALCAN and TIMER databases. The results showed that HEPACAM2 expression in COAD tumor tissue was lower than in normal colon tissue (Fig. 1). We further downloaded the expression box plots of HEPACAM2 in COAD tissue and normal colon tissue from the GEPIA and MERAV databases, and the results were consistent with this finding (Figure S1A-1B). Meanwhile, HEPACAM2 expression did not differ significantly among COAD patients of different TNM stages (P > 0.05) (Figure S1C). Based on the TCGA database, we also investigated the differential expression of HEPACAM2 between COAD tumor tissues and adjacent normal tissues; HEPACAM2 expression was higher in adjacent normal tissues than in tumor tissues. We also found that HEPACAM2 was not differentially expressed across TNM stages (Fig. 2A). The diagnostic ROC curve showed that HEPACAM2 had high diagnostic value in patients with COAD (P < 0.001, Area Under Curve (AUC) = 0.940, 95%CI = 0.805-0.979) (Fig. 2B). Finally, we investigated the differential expression and diagnostic value of HEPACAM2 in COAD or CRC using Oncomine datasets based on the GEO cohort. HEPACAM2 was more highly expressed in normal tissue than in tumor tissue in both the Skrzypczak COAD and Hong CRC datasets, and it had high diagnostic value in Skrzypczak COAD (P < 0.001, AUC = 0.896, 95%CI = 0.812-0.980) and Hong CRC (P < 0.001, AUC = 0.976, 95%CI = 0.944-1.000) (Fig. 3).

Validation and analysis of HEPACAM2 in the diagnostic value of COAD based on the Guangxi cohort
We collected 30 pairs of COAD tumor tissues and adjacent normal tissues. RT-qPCR showed that HEPACAM2 expression in COAD tumor tissue (0.036942 ± 0.062463) was significantly lower than in adjacent normal colon tissue (0.167750 ± 0.179779) (P < 0.001). Meanwhile, the diagnostic ROC curve suggested that HEPACAM2 had high diagnostic value in patients with COAD (P < 0.001, AUC = 0.892, 95%CI = 0.805-0.979) (Fig. 2C-2E).

Univariate and multivariate survival analysis of HEPACAM2 gene in COAD
We performed a survival analysis of HEPACAM2 in patients with COAD using the median HEPACAM2 expression as the cutoff; patients with high HEPACAM2 expression had better survival than those with low expression (Fig. 4A). Based on the univariate survival results for the clinical parameters, we constructed two adjusted models and found that HEPACAM2 was related to the OS of patients with COAD in both, namely model 2, adjusted for TNM stage (P = 0.044, HR (95% CI) = 0.643 (0.419-0.988)), and model 3, adjusted for age, sex, and TNM stage (P = 0.038, HR (95% CI) = 0.635 (0.414-0.976)) (Table 2). Finally, the association between HEPACAM2 and the death risk of COAD patients is presented in Fig. 5A-5C. In short, the death risk of COAD decreased as HEPACAM2 expression increased. The HEPACAM2-related nomogram showed that HEPACAM2 made a certain contribution to COAD OS (Figure S2A).

HEPACAM2 gene mutation and immune infiltration information of COAD
We investigated the mutation status of HEPACAM2 in patients with COAD and found that genomic alterations occurred in COAD patients at a low frequency (Fig. 6A-6B). Additionally, we used the TIMER dataset to analyze possible correlations between HEPACAM2 and the infiltration of different immune cells in COAD. The results showed no significant positive association between HEPACAM2 and the different immune cells (Fig. 6C).
The nomograms of HEPACAM2 and CLCA1, HEPACAM2 and REP15, and HEPACAM2 and B3GNT6 showed that these combinations made a higher prognostic contribution to COAD OS than the nomogram based on HEPACAM2 alone (Figure S2B-S2D). Finally, the risk score model of HEPACAM2 and CLCA1 was constructed by the following formula: (-0. …
Our gene mutation analysis found that the mutation rate of HEPACAM2 in COAD was low. Additionally, immune cells infiltrating the tumor microenvironment are known to play an important role in tumor initiation and progression by directly contacting tumor cells to promote or suppress their growth. (35) However, HEPACAM2 did not show a significant positive relationship with infiltrating immune cells. We obtained the genes associated with HEPACAM2 through different data websites and found that CLCA1, REP15, and B3GNT6 were strongly related to HEPACAM2 in COAD. Based on the TCGA database, multivariate survival analysis found that CLCA1, REP15, and B3GNT6 were statistically significant for the survival of COAD patients. The combined survival analysis and comprehensive survival prognostic analysis showed that HEPACAM2 combined with CLCA1, with REP15, or with B3GNT6 could improve survival prediction in patients with COAD.
The CLCA1 gene is expressed in the intestinal epithelium. (36) This gene may act as a tumor suppressor and plays an important role in regulating differentiation and inhibiting proliferation in colon cell lines. (37) Previous studies reported that low expression of CLCA1 was associated with poorer OS in patients with CRC and was related to tumorigenesis, metastasis, and high chromosomal instability. (38,39) The protein encoded by the REP15 gene interacts with GTP-bound Rab15 and participates in the recycling of the transferrin receptor from the endocytic recycling compartment to the cell surface. (40) A study by Xu et al. identified REP15 as a novel prognostic gene for CRC and found that low expression of REP15 was associated with an unfavorable prognosis; they also found that high expression of REP15 was positively correlated with the p53 pathway and negatively correlated with glycerophospholipid metabolism and the hedgehog and insulin pathways. (41) The B3GNT6 gene encodes a glycosyltransferase that is capable of progressively adding carbohydrates to form the core 3 O-glycan structure and appears only in specific tissues, such as the colon. It has been reported that the core 3 structure in colon cancer tissue decreases as core 3 synthase activity decreases. (42) In addition, B3GNT6 was downregulated during CRC progression and in tumor metastasis when compared with normal tissues. (43,44) The expression of these prognostic genes appears to change during the occurrence and development of tumors. When combined with HEPACAM2, they can improve survival prediction for patients with COAD. These assumptions still need to be verified in future work.
This research still has some shortcomings. First, the clinical parameter information we obtained from public databases is incomplete. Second, the diagnostic and prognostic value of HEPACAM2 still needs to be explored further in vivo and in vitro. Finally, multi-center studies with larger samples are still needed to verify our findings.

Conclusions
This study explored the diagnostic and prognostic value of the HEPACAM2 gene in patients with COAD for the first time. The results from multiple data cohorts and the Guangxi cohort showed that HEPACAM2 had high diagnostic value in COAD, and multivariate COX survival analysis showed that patients with low HEPACAM2 expression had a poorer prognosis than those with high HEPACAM2 expression. HEPACAM2 may be an independent biomarker for diagnosis and prognosis in patients with COAD. However, these findings need to be verified in the future.

Declarations
Ethics approval and consent to participate

Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.