Identification of Key Claudin Genes Associated With Survival Prognosis and Diagnosis in Colon Cancer Through Integrated Bioinformatic Analysis

Background: Claudin genes are associated with various aberrant physiological and cellular signaling processes. However, the association of claudins with survival prognosis, signaling pathways, and diagnostic efficacy in colon cancer remains poorly characterized. Methods: We applied several bioinformatics methods to the claudins in the TCGA colon adenocarcinoma (COAD) cohort, including differential expression analysis, gene set enrichment analysis (GSEA), protein-protein interaction (PPI) network analysis, survival analysis, single-sample gene set enrichment analysis (ssGSEA), mutation analysis, and receiver operating characteristic (ROC) curve analysis. Results: CLDN2, CLDN1, CLDN14, CLDN16, CLDN18, CLDN9, CLDN12, and CLDN6 are upregulated in COAD, whereas CLDN8, CLDN23, CLDN5, CLDN11, CLDN7, and CLDN15 are downregulated. Several claudin genes are mutated and are associated with diagnostic efficacy in COAD. Conclusions: Claudin genes are associated with prognosis, immune regulation, signaling pathway regulation, and diagnosis. These findings may provide new molecular insight into the treatment of colon cancer.


Introduction
Claudins are a multi-gene family of transmembrane proteins that includes at least 27 members 1,2. Claudin proteins are components of the tight junctions that mediate cell-cell communication between the plasma membranes of two contacting cells 3. Claudins are related to various physiological features and functions, including paracellular ion pores, extracellular loops, ion permeability, cell polarity, regulatory pathways, and stabilization of epithelial integrity 3,4. Claudins are crucially related to human diseases, including ovarian, breast, pancreatic, and prostate cancers [5][6][7]. Deregulated claudin expression can modulate carcinogenesis 5,6. During progression through the metastatic cascade, claudins are functionally involved in controlling various steps 5. Claudins also regulate tumor-intrinsic roles through claudin-mediated functions of stromal cells that substantially influence the metastatic process of the primary tumor 5. Recently, claudins were reported to play major roles in epithelial-to-mesenchymal transition (EMT), the initiation of cancer stem cells, chemoresistance, and tumor recurrence 6. Various epithelial-derived tumors show altered claudin expression, and some claudins are associated with patient prognosis 5. Jian Li demonstrated that aberrant expression of claudins is linked with neoplastic transformation, indicating that claudins may serve as diagnostic and prognostic markers or as targets for cancer treatment 8. Various studies have demonstrated the association of claudins with colon cancer initiation, progression, metastasis, and prognosis 2, [9][10][11]. For example, Didi Zuo et al. revealed that claudin-1 is a substantial prognostic biomarker in colorectal cancer 9. Liguo Zhu et al. demonstrated that the claudin family participates in colitis-associated colorectal cancer 2. Qun Huo et al. revealed that the claudin-1 protein is a crucial factor associated with the tumorigenesis of colorectal cancer 12. Claudin-2 potentially transactivates EGFR to promote colon cancer 10. Decreased expression of claudin-4 substantially regulates the invasion and metastasis of colorectal cancer 11. These studies indicate that claudins are crucially associated with the invasion, metastasis, and prognosis of colon cancer.
This study presents a thorough bioinformatics investigation of how changes in claudin expression levels regulate the key cancerous phenotypes and immunity of colon adenocarcinoma. We also explore the association of claudins with the prognosis and diagnosis of colon adenocarcinoma. Moreover, we reveal that claudins are mutated and substantially associated with the activity of cancer-associated pathways in COAD.

Methods
This study was carried out in the computer laboratories at King Abdulaziz University during 2020-2021. The study was approved by the research ethics committee (HA-02-J003) at the Center of Excellence in Genomic Medicine Research (CEGMR). All data in this investigation were analyzed in accordance with CEGMR ethical requirements.

Identification of Differentially Expressed claudins
The R package "limma" was used to identify significant differentially expressed genes (DEGs) between COAD (n=287) and normal samples (n=41) 14. We identified DEGs using the thresholds |log2FC| > 0.50 and P-value ≤ 0.05.
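The thresholding step can be sketched in a few lines of Python. This is only an illustration of the filter, not the limma workflow itself: the function name `select_degs`, the input layout, and the toy values are hypothetical, and the sketch assumes that log2 fold changes and P-values have already been computed.

```python
from typing import Dict, List, Tuple


def select_degs(
    results: Dict[str, Tuple[float, float]],
    lfc_cutoff: float = 0.50,
    p_cutoff: float = 0.05,
) -> Dict[str, List[str]]:
    """Split genes into up- and downregulated sets.

    `results` maps a gene symbol to (log2 fold change, P-value).
    A gene is kept when |log2FC| > lfc_cutoff and P-value <= p_cutoff.
    """
    up: List[str] = []
    down: List[str] = []
    for gene, (lfc, p_value) in results.items():
        if abs(lfc) > lfc_cutoff and p_value <= p_cutoff:
            (up if lfc > 0 else down).append(gene)
    return {"up": sorted(up), "down": sorted(down)}


# Made-up values for illustration only; these are not results from this study.
toy_results = {
    "GENE_A": (1.8, 0.001),   # passes both thresholds, upregulated
    "GENE_B": (-0.9, 0.02),   # passes both thresholds, downregulated
    "GENE_C": (0.3, 0.001),   # fold change too small
    "GENE_D": (1.2, 0.20),    # not significant
}
calls = select_degs(toy_results)
print(calls)
```

The sign of the log2 fold change decides whether a retained gene is reported as upregulated or downregulated in tumor relative to normal samples.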

Gene-Set Enrichment Analysis
We performed gene-set enrichment analysis of the DEGs using GSEA 15. We submitted all claudins to the GSEA tool to identify enriched gene ontology (GO) terms and pathways. KEGG 16 and Reactome 17 pathways significantly associated with the claudins were identified. A P-value < 0.05 was considered significant when selecting GO terms and pathways.

Construction of Protein-Protein Interaction (PPI) Network of claudins
To better understand the relationships among all claudins, a PPI network was constructed using the STRING tool 18. To rank hub genes, we used the Cytoscape plug-in cytoHubba 19. Genes were ranked by their degree, that is, the number of interactions with neighboring genes. We set the minimum required interaction score to 0.40 when identifying the PPIs of claudins. We visualized the PPI networks using Cytoscape 3.6.1 20.
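Degree-based ranking of a thresholded network can be illustrated as follows. This is a minimal sketch and not the STRING or cytoHubba implementation: the edge tuple layout, the function name `rank_by_degree`, and the toy node names and scores are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Set, Tuple

# (protein A, protein B, combined interaction score in [0, 1]); hypothetical layout
Edge = Tuple[str, str, float]


def rank_by_degree(edges: Iterable[Edge], min_score: float = 0.40) -> List[Tuple[str, int]]:
    """Rank nodes by their number of distinct neighbors.

    Edges scoring below `min_score` and self-loops are discarded first.
    Ties in degree are broken alphabetically so the output is deterministic.
    """
    neighbors: Dict[str, Set[str]] = {}
    for a, b, score in edges:
        if score < min_score or a == b:
            continue
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    degree = {node: len(nbrs) for node, nbrs in neighbors.items()}
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


# Made-up network for illustration only.
toy_edges = [
    ("P1", "P2", 0.90),
    ("P1", "P3", 0.70),
    ("P2", "P3", 0.35),  # dropped: below the 0.40 cutoff
    ("P1", "P4", 0.41),
    ("P3", "P4", 0.50),
]
ranking = rank_by_degree(toy_edges)
print(ranking)
```

Nodes that lose all of their edges to the score cutoff do not appear in the ranking at all, which mirrors how low-confidence interactions vanish from a thresholded network view.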

Survival Analysis of claudins by using the GEPIA tool
We compared the overall survival (OS) and disease-free survival (DFS) of colon cancer patients. Kaplan-Meier survival curves were used to show the survival differences between the high-expression and low-expression groups, which were split at the median expression level (high-expression group > median > low-expression group). The survival significance of all differentially expressed claudins in the TCGA COAD cohort was analyzed using the GEPIA database 13. A Cox regression P-value < 0.05 was considered significant when comparing survival between the two groups.
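GEPIA performs these steps online. To make the median split and the Kaplan-Meier estimate concrete, the following is a minimal standard-library sketch. The function names and toy data are hypothetical, samples exactly at the median are assigned to the low group as an assumption of this sketch, and the log-rank and Cox tests are deliberately not implemented here.

```python
from statistics import median
from typing import List, Sequence, Tuple


def median_split(expression: Sequence[float]) -> List[str]:
    """Label each sample 'high' if its expression exceeds the cohort median, else 'low'."""
    cutoff = median(expression)
    return ["high" if value > cutoff else "low" for value in expression]


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if the event (death or recurrence) was observed at times[i]
    and 0 if the sample was censored at that time.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(order):
        t = times[order[i]]
        observed = 0
        leaving = 0
        # Collect every sample whose follow-up ends at time t.
        while i < len(order) and times[order[i]] == t:
            observed += events[order[i]]
            leaving += 1
            i += 1
        if observed > 0:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Made-up data for illustration only.
groups = median_split([2.0, 9.0, 4.0, 7.0])
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(groups)
print(curve)
```

In practice one curve would be estimated per expression group, and the two curves would then be compared with a significance test such as the log-rank test.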

ESTIMATE algorithm for quantifying immune score and stromal score
ESTIMATE is an algorithmic tool, implemented as an R package, for predicting tumor purity, Immune Score (reflecting the infiltration of immune cells), and Stromal Score (reflecting the infiltration of stromal cells) from the gene expression profiles of 141 immune genes and 141 stromal genes 21. The levels of infiltrating immune and stromal cells in tumor tissues were calculated from the gene expression matrix and are represented by the Immune Score and Stromal Score, respectively 21. We then calculated the correlations of key genes with immune scores and stromal scores. The thresholds were an absolute correlation coefficient |R| > 0.20 and a P-value < 0.001 (Spearman's correlation test).
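Spearman's correlation is the Pearson correlation of the ranks, which the following standard-library sketch makes explicit. The function names and toy values are hypothetical. The sketch does not compute a P-value, so `passes_threshold` assumes that one has been obtained separately, for example from a statistics package.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def passes_threshold(rho: float, p_value: float,
                     r_cutoff: float = 0.20, p_cutoff: float = 0.001) -> bool:
    """Apply the |R| > 0.20 and P < 0.001 filter described in the text."""
    return abs(rho) > r_cutoff and p_value < p_cutoff


# Made-up values for illustration only.
gene_expression = [1.0, 2.0, 3.0, 4.0, 5.0]
immune_score = [2.0, 1.0, 4.0, 3.0, 5.0]
rho = spearman(gene_expression, immune_score)
print(rho)
```

Taking the absolute value of rho in the filter lets the same cutoff capture both positive and negative correlations.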

Single sample gene set enrichment analysis (ssGSEA)
Single-sample gene-set enrichment analysis (ssGSEA), an extension of GSEA, was used to compute an enrichment score for each pairing of a tumor sample and a gene set 22. We collected marker gene sets for immune signatures, biological processes, and cancer-associated pathways, and used each gene set to quantify the corresponding ssGSEA score [23][24][25][26]. We computed ssGSEA scores for various immune stimulatory and inhibitory signatures, including B cells, CAFs, CD4 regulatory T cells, CD8 T cells, cytolytic activity, endothelial cells, immune checkpoint genes, M2 macrophages, macrophages, MDSCs, neutrophils, NK cells, pDC, T cell activation, T cell exhaustion, TAM, Tfh, Th17, TILs, and Tregs. We then computed ssGSEA scores for angiogenesis, apical junction, apoptosis, epithelial-mesenchymal transition (EMT), hypoxia, proliferation, and stemness. Moreover, we computed ssGSEA scores for cancer-associated pathways 27, including cell adhesion molecules (CAMs), ECM receptor interaction, ERBB signaling pathway, focal adhesion, gap junction, leukocyte transendothelial migration, MAPK signaling pathway, MTOR signaling, Notch signaling, pathways in cancer, TGF beta signaling, tight junction, VEGF signaling pathway, and Wnt signaling. All of the gene sets are listed in Supplementary Table S1.

Diagnostic efficacy evaluation of differentially expressed claudins in the COAD
To assess the diagnostic value of the prognostic genes, the receiver operating characteristic (ROC) curve was plotted and the area under the ROC curve (AUC) was calculated using the "pROC" R package 28 to evaluate how well each gene distinguishes COAD from normal samples. A larger AUC for an individual gene indicates better discrimination between tumor and normal samples, and a key gene with AUC > 0.5 in the COAD dataset was considered to have diagnostic efficacy 29.
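The AUC has a simple probabilistic reading: it is the probability that a randomly chosen tumor sample has a higher expression value than a randomly chosen normal sample, with ties counted as one half. The sketch below computes it by direct pair counting. It is not the pROC implementation; the function name `roc_auc` and the toy values are hypothetical.

```python
from typing import Sequence


def roc_auc(tumor_values: Sequence[float], normal_values: Sequence[float]) -> float:
    """AUC as the fraction of (tumor, normal) pairs in which the tumor value is higher.

    Tied pairs contribute 0.5. An AUC of 0.5 means no discrimination.
    """
    if not tumor_values or not normal_values:
        raise ValueError("both groups must contain at least one sample")
    wins = 0.0
    for t in tumor_values:
        for n in normal_values:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor_values) * len(normal_values))


# Made-up expression values for illustration only.
auc_up = roc_auc([3.0, 4.0, 5.0], [1.0, 2.0, 3.0])   # gene higher in tumor
auc_down = roc_auc([1.0, 2.0], [3.0, 4.0])           # gene lower in tumor
print(auc_up, auc_down)
```

With this fixed direction, a gene that is lower in tumors yields an AUC below 0.5, and 1 - AUC is then the relevant measure of discrimination. Under this reading, AUC > 0.5 only indicates better-than-chance separation, so values closer to 1 are needed for strong diagnostic performance.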

Statistical Analysis
We used R version 4.0.1 for all statistical analyses. For survival analysis, a log-rank test P < 0.05 was considered statistically significant. To investigate the correlation between ssGSEA scores and specific genes, Spearman's correlation was used (P < 0.001). We used the Pearson correlation test to identify the correlation between two genes (P ≤ 0.05). We used the R package ggplot2 to draw the heatmaps and correlation graphs.

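Identifying the differentially expressed claudin gene family members in COAD
We performed differential expression analysis of claudin gene family members in COAD relative to normal samples (Table 1). We found that the expression levels of CLDN2, CLDN1, CLDN14, CLDN16, CLDN18, CLDN9, CLDN12, and CLDN6 are elevated in COAD, whereas those of CLDN8, CLDN23, CLDN5, CLDN11, CLDN7, and CLDN15 are reduced.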

The claudin gene family is associated with functional enrichment and pathways
The enriched gene ontology (GO) terms and pathways were identified by using the GSEA tool

Claudin gene family members are involved in the PPI network and correlated with each other
Targeting protein-protein interactions is highly relevant to cancer, because the tumor-promoting function of several aberrantly expressed proteins in cancerous conditions directly depends on their capability to interact with a protein-binding regulatory partner 30. To investigate the PPIs of all claudins, we used the STRING tool. We revealed that the

Claudins are associated with poor survival prognosis in COAD
We investigated the survival significance of all significantly differentially expressed claudins

Claudins are associated with immune infiltration in COAD
We investigated the regulation of the tumor microenvironment by claudin genes. We found that the immune score is positively correlated with CLDN5, CLDN11, and CLDN18 and negatively correlated with CLDN9 (absolute Spearman correlation |R| > 0.20 and P < 0.001) (Figure 6A). In addition, the stromal score is positively correlated with CLDN5 and CLDN11 and negatively correlated with CLDN7 (absolute Spearman correlation |R| > 0.20 and P < 0.001) (Figure 6A). Tumor purity, another important parameter of the tumor microenvironment, is negatively

Claudin gene family members are associated with cancer-associated pathways
Since our analysis identified associations of claudins with immune infiltration and cancerous biological processes, we investigated the correlation of claudins with the activity of cancer-associated pathways, including Notch signaling, TGF beta signaling, and Wnt signaling (Figure 8). Altogether, these results indicate that the expression of CLDN5, CLDN11, CLDN2, CLDN7, and CLDN12 is associated with the regulation of key cancer-associated pathways in COAD.

Claudin gene family members are mutated in COAD
We investigated the genetic alterations of all differentially expressed claudins in the COAD. We (1.8%) and CLDN15 (1.4) (Figure 9).

Claudins exhibit diagnostic efficacy in colon cancer
We hypothesized that the differentially expressed claudin genes (CLDN2, CLDN1, CLDN14, CLDN16, CLDN18, CLDN9, CLDN12, CLDN6, CLDN8, CLDN23, CLDN5, CLDN11, CLDN7, and CLDN15) have diagnostic value in colon cancer. We used the TCGA COAD dataset to test this hypothesis, and the ROC curves of the expression levels of these genes showed diagnostic value for colon cancer cases (AUC > 0.5) (Figure 10).

Discussion
Altogether, these results indicate that deregulated claudins are associated with colon cancer pathogenesis.
Moreover, we found that claudins are associated with the enrichment of gene ontology terms and signaling pathways (Figure 2 and Figure 3). GO and pathway analysis revealed that the significant terms are mainly involved in immune regulation and cellular communication (Figure 2 and Figure 3). Ryan C Winger et al. revealed that claudins are associated with leukocyte transendothelial migration in a human model of the blood-brain barrier 40. [...] behavior and prevention, it is necessary to identify the correlation of claudins with cancer-associated pathways. We found that several deregulated claudins are associated with the activity of cancer-associated pathways (Figure 8). Claudins have been reported to be related to cancer-associated pathways in cancers. For example, hypermethylation of the CLDN11 promoter region in CRC cells contributed to the metastasis of these cells 33. Altogether, these results indicate that the expression of claudins is associated with the regulation of cancerous phenotypes and pathways in COAD. Furthermore, we found that claudins are mutated (Figure 9) and have strong diagnostic value for colon cancer patients (Figure 10). Recently, CLDN15 was reported to be a diagnostic marker for malignant pleural mesothelioma 48. CLDN1, a gene with diagnostic value, acts as a novel marker in CRC 35. CLDN7, with emerging clinical significance, is also a diagnostic marker in COAD 49. CLDN14, an upregulated prognostic gene, influences colorectal cancer progression by controlling the PI3K/AKT/mTOR pathway 50. Altogether, claudins are associated with diagnostic efficacy in COAD.

Conclusions
The identification of key claudin genes associated with prognosis, immune regulation, signaling regulation, and diagnosis may provide insight into new avenues for colon cancer treatment. In summary, the expression of claudins, especially CLDN5, CLDN11, and CLDN18, is substantially associated with immune regulation, cancer-associated pathways, cancerous phenotypes, and diagnostic efficacy in COAD. Experimental validation of these key claudins will be necessary before these findings can be translated into clinical practice for treating colon cancer. This comprehensive analysis can therefore serve as a basis for identifying the crucial oncogenic functions of claudins in colon cancer.