Systematic Expression Analysis of Zinc-finger Transcription Factor Family Reveals the Importance of Snai1 in Colorectal Carcinogenesis


Background: Epithelial-mesenchymal transition (EMT) has long been recognized as one of the most important processes involved in cancer cell metastasis. SNAI1, a member of the zinc-finger transcription factor family, has been identified as an EMT inducer, but its role in human colorectal cancer (CRC) remains largely unclear.
Methods: In the present study, we synthesized the expression data of the zinc-finger transcription factor family and investigated the prognostic value of SNAI1 in human CRC using a series of bioinformatic tools.
Results: SNAI1 was the only member of the zinc-finger transcription factor family that was frequently overexpressed in CRC. In addition, SNAI1 overexpression was significantly associated with advanced tumor stage, N classification and M classification in patients with CRC. Patients with high SNAI1 expression had a worse prognosis than those with low SNAI1 expression. Moreover, Cox regression analysis showed that SNAI1 was an independent risk factor for overall survival in CRC patients. KEGG pathway analysis showed that genes co-expressed with SNAI1 were enriched in Focal adhesion, the PI3K-Akt signaling pathway and ECM-receptor interaction. GO analysis revealed that these genes participate in extracellular matrix organization, cell adhesion and collagen catabolic processes. Finally, co-expression analysis revealed that SNAI1 expression was significantly correlated with biglycan (BGN), indicating that the two genes may cooperate in regulating EMT and thereby promote cancer cell metastasis.
Conclusions: Our results demonstrated that SNAI1 is overexpressed in CRC and acts as an independent molecular marker for the prognosis of CRC patients. SNAI1 is a potential biomarker for the diagnosis, targeted therapy and prognostic evaluation of CRC.


Introduction
Malignancies are expected to be the leading cause of death and the major obstacle to increasing life expectancy worldwide in the 21st century. There were approximately 18.1 million newly diagnosed cancer cases and 9.6 million cancer deaths worldwide in 2018 [1]. Colorectal cancer (CRC) is one of the most frequently diagnosed malignancies of the digestive system and remains the third most common cause of cancer-related death among both men and women [1] [2]. With comprehensive, surgery-based treatment, the overall 5-year relative survival rate for CRC patients is about 65%. For patients diagnosed at a localized stage, the 5-year relative survival rate is about 90%, but it declines to 12% for patients diagnosed at an advanced stage. Unfortunately, most patients are already at an advanced stage at the time of diagnosis [3]. Metastasis and recurrence are the leading causes of death in patients with CRC; approximately 60% of CRC patients have local or distant metastasis before receiving treatment, which leads to a significantly worse prognosis and higher mortality [4]. Available evidence suggests that, in addition to carcinogenesis itself, tumor metastasis is a major factor underlying the poor prognosis of patients with CRC. However, the molecular mechanisms underlying CRC carcinogenesis and metastasis remain largely elusive and warrant further research.
Epithelial-mesenchymal transition (EMT), a process by which tumor epithelial cells gradually lose their epithelial features and acquire mesenchymal features, has long been recognized as one of the most important steps in initiating cancer cell metastasis and is closely correlated with tumor invasiveness and metastatic ability [5]. Under physiological conditions, EMT is integral to embryogenesis, wound healing and stem cell behavior, whereas under pathological conditions it contributes to fibrosis and cancer progression [6]. Tumor cells undergoing EMT are characterized by loss of apical-basal polarity, reduced intercellular contacts and adhesion to the extracellular matrix, and enhanced invasive capacity [7]. These cells also show characteristic molecular changes, such as downregulation of epithelial markers (e.g., E-cadherin and β-catenin) and upregulation of mesenchymal markers (e.g., N-cadherin and vimentin) [7] [8]. Among these, downregulation of E-cadherin is recognized as a key hallmark of EMT [9]. E-cadherin, encoded by the tumor suppressor gene CDH1, is a transmembrane glycoprotein that is a major contributor to maintaining tight cell-cell contacts [10]. Loss of E-cadherin is thought to be a major driver of cancer progression, because the majority of solid malignancies are carcinomas that arise from epithelial tissue [11]. Hence, exploring the molecular mechanisms underlying E-cadherin dysregulation will improve our understanding of EMT and may provide a new theoretical basis for the clinical treatment of patients with malignant tumors.
The zinc-finger protein snail family transcriptional repressor 1 (SNAI1), a member of the zinc-finger transcription factor family, functions as a transcriptional repressor [12]. SNAI1 specifically binds to the promoter of the CDH1 gene through its zinc-finger domain and represses CDH1 transcription, leading to loss of E-cadherin expression and subsequently promoting cancer cell metastasis [13]. Additionally, SNAI1 has been reported to directly or indirectly inhibit the activity of p53, one of the most important tumor suppressors in carcinogenesis [14]. Previous studies have reported that high SNAI1 expression is significantly correlated with tumor metastasis, recurrence and worse prognosis in multiple types of malignancy, including CRC [15], lung cancer [16], breast cancer [17], and ovarian cancer [18]. However, the role of SNAI1 in the carcinogenesis and metastasis of CRC remains largely unclear and deserves more attention. In the present study, we systematically analyzed the expression level, prognostic value, biological function and mutations of SNAI1 to assess its importance in human CRC using a series of integrated bioinformatics approaches.

Pan-cancer analysis.
The Oncomine database (https://www.oncomine.org) is a web-based data-mining platform designed to collect, standardize, analyze, and deliver cancer transcriptome data to the biomedical research community [19]. We used Oncomine to compare the transcriptome data of the zinc-finger transcription factor family between multiple types of cancer tissue and the corresponding normal tissues, with cut-off thresholds of P < 1E-4 and fold change > 2. Subsequently, the mRNA expression levels of the zinc-finger transcription factor family in multiple types of cancer were analyzed using the UALCAN database (http://ualcan.path.uab.edu), an online resource for exploring gene expression variations and survival associations in 31 cancer types [20]. Finally, the prognostic values of the zinc-finger transcription factor family in patients with CRC were evaluated with Gene Expression Profiling Interactive Analysis (GEPIA, http://gepia.cancer-pku.cn), a web-based tool that delivers fast and customizable functionalities based on data from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) projects [21]. The threshold of statistical significance was set to a log-rank P < 0.05.
The Human Protein Atlas (HPA) database provides information on all human proteins in cells, tissues and organs by integrating various omics technologies, including antibody-based imaging, mass spectrometry-based proteomics, transcriptomics and systems biology. We used the immunohistochemistry data from the HPA database to compare SNAI1 protein expression between CRC tissues and normal tissues.
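GEPIA performs the survival comparison internally. As a hedged illustration only, the sketch below shows the kind of computation behind a log-rank P-value: patients are split into high- and low-expression groups at the median (an assumption; the cutoff used here is not stated in the text), and the log-rank statistic is built from observed and expected deaths in the high group. The function names and toy data are hypothetical; this is not GEPIA's code.

```python
import math
from statistics import median


def split_by_median(expression):
    """Label each sample 'high' if above the median expression, else 'low'."""
    cut = median(expression)
    return ["high" if value > cut else "low" for value in expression]


def logrank_test(times, events, groups):
    """Two-group log-rank test comparing the 'high' and 'low' groups.

    times  -- follow-up time for each patient
    events -- 1 if death was observed, 0 if censored
    groups -- 'high' / 'low' label for each patient
    Returns (chi_square, p_value) with 1 degree of freedom.
    """
    o_minus_e = 0.0  # observed minus expected deaths in the 'high' group
    variance = 0.0
    death_times = sorted({t for t, e in zip(times, events) if e})
    for t in death_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_high = sum(1 for i in at_risk if groups[i] == "high")
        deaths = [i for i in at_risk if times[i] == t and events[i]]
        d = len(deaths)
        d_high = sum(1 for i in deaths if groups[i] == "high")
        o_minus_e += d_high - d * n_high / n
        if n > 1:
            # Hypergeometric variance of the deaths falling in the 'high' group.
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi_square = o_minus_e ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical toy data (not from TCGA): 8 patients.
expression = [9.1, 8.7, 8.9, 9.5, 2.1, 2.5, 1.9, 2.2]
times = [5, 8, 6, 4, 30, 28, 35, 40]
events = [1, 1, 1, 1, 0, 1, 0, 0]
groups = split_by_median(expression)
chi_square, p_value = logrank_test(times, events, groups)
print(groups, round(chi_square, 3), round(p_value, 4))
```

In this toy example every death in the high-expression group occurs early, so the statistic is large and the P-value falls below the 0.05 threshold used in the text.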

Functional and KEGG pathway enrichment analysis.
To explore the biological function of SNAI1, the genes positively co-expressed with SNAI1 in human CRC were identified with the GEPIA and cBioPortal databases, respectively. Genes that appeared in the top 200 results of both GEPIA and cBioPortal were then identified as SNAI1 co-expressed genes using the intersect function of the Venn diagram tool (http://bioinformatics.psb.ugent.be/webtools/Venn/). Finally, Gene Ontology (GO) analysis, covering biological process (BP), cellular component (CC), and molecular function (MF), and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of the SNAI1 co-expressed genes were performed with the Database for Annotation, Visualization and Integrated Discovery (DAVID, https://david.ncifcrf.gov/), a web-based bioinformatics resource that provides functional interpretation of large gene lists derived from genomic studies [25] [26]. Only enrichment terms with P < 0.05 were considered statistically significant.
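The intersection step and the enrichment test can be sketched as follows. This is a minimal illustration under stated assumptions: `common_genes` and `enrichment_p` are hypothetical helpers, the gene lists are placeholders, and `enrichment_p` computes the plain one-sided hypergeometric (Fisher exact) P-value. DAVID itself reports the EASE score, a more conservative modification of the Fisher exact test, so its P-values will differ somewhat.

```python
from math import comb


def common_genes(ranked_a, ranked_b, top_n=200):
    """Genes found in the top_n entries of both ranked co-expression lists."""
    return set(ranked_a[:top_n]) & set(ranked_b[:top_n])


def enrichment_p(hits, list_size, term_size, background_size):
    """One-sided hypergeometric P(X >= hits).

    hits            -- genes in the input list annotated with the term
    list_size       -- size of the input gene list
    term_size       -- background genes annotated with the term
    background_size -- total annotated background genes
    """
    total = comb(background_size, list_size)
    upper = min(list_size, term_size)
    # Sum the probabilities of observing k = hits, hits + 1, ..., upper list genes in the term.
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Placeholder ranked lists standing in for the GEPIA and cBioPortal outputs.
gepia_top = ["G1", "G2", "G3", "G4"]
cbioportal_top = ["G3", "G1", "G5", "G6"]
print(sorted(common_genes(gepia_top, cbioportal_top, top_n=3)))
```

The hypergeometric form asks how surprising it is to see that many list genes in a term, given how common the term is in the background; smaller P-values indicate stronger over-representation.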
In the present study, a protein-protein interaction (PPI) network of SNAI1 co-expressed genes was generated with the Search Tool for the Retrieval of Interacting Genes database (STRING, http://string-db.org) [27] and visualized with Cytoscape software (v 3.7.2, http://www.cytoscape.org/) [28]. To extract core genes significantly co-expressed with SNAI1, the PPI network was then analyzed with the CytoHubba plugin in Cytoscape, and the top 10 genes were identified by each of the Degree, Closeness and Betweenness algorithms. Genes that appeared in all three lists were identified as core genes using the intersect function of the Venn diagram tool, and an expression heat map of the core genes was generated with the UCSC Cancer Genomics Browser (https://genome-cancer.ucsc.edu/) [29]. We used the GEPIA database to evaluate the prognostic values of the core co-expressed genes in human CRC and the correlations between SNAI1 and the core co-expressed genes. Finally, relationships between the expression levels of the core co-expressed genes and the clinical parameters of CRC patients were analyzed and visualized with R software. A P-value < 0.05 was considered statistically significant.
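The three CytoHubba rankings can be approximated with standard graph centralities. The sketch below is an assumption-laden illustration, not CytoHubba's implementation: it uses node degree, classic closeness (number of reachable nodes minus 1, divided by the sum of shortest-path distances) and Brandes' betweenness on an unweighted, undirected graph, then keeps the nodes ranked in the top k by all three measures. CytoHubba's own scoring formulas may differ, and the toy network uses placeholder node names rather than STRING interactions.

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets from (gene_a, gene_b) interaction pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree(graph):
    return {v: len(nbrs) for v, nbrs in graph.items()}


def closeness(graph):
    """(reachable nodes - 1) / sum of shortest-path distances, per node."""
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives unweighted shortest paths
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(graph, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: score / 2 for v, score in bc.items()}


def top_k(scores, k):
    # Ties broken alphabetically so the ranking is deterministic.
    return set(sorted(scores, key=lambda v: (-scores[v], v))[:k])


def core_genes(graph, k=10):
    """Nodes ranked in the top k by all three centrality measures."""
    return (top_k(degree(graph), k)
            & top_k(closeness(graph), k)
            & top_k(betweenness(graph), k))


# Hypothetical toy network (placeholder node names, not STRING output).
toy = build_graph([("HUB", "A"), ("HUB", "B"), ("HUB", "C"), ("HUB", "D"), ("A", "B")])
print(core_genes(toy, k=1))
```

Requiring agreement across three different centralities favors nodes that are both highly connected and positioned on many shortest paths, which is the rationale for intersecting the three top-10 lists.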

Statistical analysis
Most of the statistical analyses were performed with the bioinformatic platforms mentioned above, and the remaining analyses were performed in R software (v 3.6.2).
The results of the Oncomine analysis showed that the expression profiles of the zinc-finger transcription factor family were dysregulated in multiple cancer types (Fig. 1A). In addition, SNAI1 expression was markedly elevated in CRC tissues compared to normal tissues across multiple CRC datasets (Fig. 2). Subsequently, mRNA expression analysis with the UALCAN database further confirmed that SNAI1 was significantly overexpressed in CRC tissues compared with normal tissues (Fig. 1B). More importantly, survival analysis revealed that aberrant expression of SNAI1, but not of SNAI2, SNAI3 or ZEB1, was significantly associated with poor overall survival in patients with CRC (Fig. 1C).
3.2 Associations between SNAI1 expression and clinical parameters in patients with colorectal cancer.
Clinical parameters of CRC patients, including survival status, survival time, age, gender, tumor stage, and tumor, node and metastasis (TNM) classification, were obtained from the TCGA database.
We used the cBioPortal database to analyze the copy number variations (CNVs) and mutations of SNAI1 in human CRC. As shown in Fig. 5A&B, the genetic alteration frequency of SNAI1 was only 8%, consisting mostly of gene amplification, and the mutation frequency of SNAI1 in human CRC was very low (0.8%). We then used the COSMIC database to evaluate SNAI1 mutations in human CRC. As shown in Fig. 5C&D, the mutation types in SNAI1 include nonsense, missense, and synonymous mutations, with missense mutations being the major type (53.21%). The nucleotide alterations include C > T, G > A, C > A, C > G and G > C substitutions, with C > T accounting for the largest proportion (38.81%).
3.4 Protein expression of SNAI1 in cancer tissues compared to normal tissues.
We used immunohistochemistry data from the HPA database to further examine SNAI1 protein expression in human CRC. As shown in Fig. 6A-D, SNAI1 protein is mainly localized in the nucleus. More importantly, SNAI1 staining was considerably stronger in CRC tissues than in normal tissues, confirming our findings at the mRNA level.

KEGG and GO enrichment analysis revealing functional role of SNAI1 in colorectal carcinogenesis.
The top 200 co-expressed genes of SNAI1 in human CRC were obtained from the GEPIA and cBioPortal databases, respectively (Fig. 7A&B). A total of 84 common co-expressed genes were identified with the intersect function of the Venn diagram tool (Fig. 7C). We then used the DAVID online database to analyze the biological functions of these co-expressed genes. KEGG pathway enrichment analysis revealed that the co-expressed genes of SNAI1 were mainly enriched in Focal adhesion, the PI3K-Akt signaling pathway, ECM-receptor interaction, and MicroRNAs in cancer (Fig. 7D). GO analysis showed that, in the BP category, the co-expressed genes were mainly enriched in extracellular matrix organization, cell adhesion, collagen catabolic process, and collagen fibril organization. In the CC category, they were mainly enriched in extracellular space, extracellular exosome, extracellular region, and extracellular matrix. In the MF category, they were mainly enriched in protein binding, platelet-derived growth factor binding, and collagen binding. The top 10 GO terms by P-value in each category are shown in Fig. 7E.

BGN was positively co-expressed with SNAI1 in human colorectal cancer.
A PPI network of the co-expressed genes was constructed with the STRING online database and subsequently analyzed in Cytoscape (Fig. 8A). The top 10 genes in the PPI network were identified by each of the three CytoHubba algorithms, and 5 genes common to all three lists (COL1A1, MMP2, BGN, LOX, and PDGFRB) were identified as core genes (Fig. 8B&C). Hierarchical clustering of the core co-expressed genes with the UCSC Cancer Genomics Browser revealed that these core genes were differentially expressed in most CRC samples (Fig. 8D).
Finally, the prognostic values of these core genes and their correlations with SNAI1 were evaluated with the GEPIA database. Co-expression analysis demonstrated that all of these core genes were positively co-expressed with SNAI1 (Fig. 8E). However, only one of them (BGN) was significantly correlated with both poorer overall survival (HR = 2, P = 0.0017) and poorer disease-free survival (HR = 2, P = 0.0017) in patients with CRC (Fig. 8F). BGN was therefore identified as a hub gene positively co-expressed with SNAI1 in human CRC.
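The text does not state which correlation coefficient was used for the GEPIA co-expression analysis, and GEPIA offers more than one method. Assuming Spearman's rank correlation, a minimal sketch is shown below; the expression values are hypothetical and the helper names are illustrative only.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank-transformed values."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical log2-scale expression values for six tumors (not GEPIA output).
snai1 = [1.2, 2.5, 0.8, 3.1, 2.0, 1.5]
partner = [4.0, 6.1, 3.5, 7.2, 5.9, 4.4]
print(round(spearman(snai1, partner), 3))
```

A rank-based coefficient is robust to outlying expression values and to non-linear but monotonic relationships, which is one reason it is often preferred for expression data.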

Associations between BGN expression and clinical parameters in colorectal cancer patients.
Based on the clinical sample information obtained from the TCGA database, the relationships between BGN expression and the clinical parameters of CRC patients were analyzed and visualized with R software. As shown in Fig. 9A-H, BGN expression was significantly elevated in CRC tissues compared to normal tissues. In addition, BGN expression was significantly associated with clinical stage (P = 0.042), T classification (P = 0.007) and N classification (P = 0.001). However, no significant association was found between BGN expression and other clinical variables, including M classification (P = 0.28), gender (P = 0.443), and age (P = 0.092).
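The statistical tests behind these P-values are not specified. For a two-group comparison such as gender, one common choice is the Wilcoxon rank-sum (Mann-Whitney U) test; the sketch below assumes that test, uses a tie-corrected normal approximation without continuity correction, and runs on hypothetical values. Multi-level variables such as clinical stage would instead need a test such as Kruskal-Wallis, which is not shown.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected,
    no continuity correction). Returns (U for x, p_value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: subtract sum(t^3 - t) over groups of tied values.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u1, p_value


# Hypothetical BGN expression in two clinical groups (not TCGA data).
group_a = [2.1, 2.4, 1.9, 2.8, 2.2, 2.6]
group_b = [3.3, 2.9, 3.8, 3.1, 4.0, 3.5]
u, p = mann_whitney_u(group_a, group_b)
print(u, round(p, 4))
```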

Discussion
The development, invasion and metastasis of tumors are multistep, complex processes. Colorectal cancer is one of the most common malignant tumors of the digestive system, with more than 1.8 million newly diagnosed cases and 800,000 cancer-related deaths worldwide in 2018 [1]. In addition to tumorigenesis, tumor metastasis appears to be another important cause of cancer-related death in patients with CRC [30]. Although cancer treatment has made great progress in recent years, the molecular mechanisms underlying tumorigenesis and metastasis are complex and warrant further study. Exploring the mechanisms of tumor metastasis may therefore help to improve the clinical outcome of patients with CRC.
SNAI1 is an important transcriptional repressor that has been described as the main molecule regulating the EMT process in multiple cancer types [31][32] [33]. Blocking the Snail signaling pathways may partially or completely prevent tumorigenesis, invasion and metastasis. However, the clinical implications of SNAI1 expression as a biomarker in human CRC have not been well understood. We therefore used multiple online databases and R software in the present study to systematically analyze the role of SNAI1 in human CRC. Our study revealed that SNAI1 was expressed at significantly higher levels in CRC samples than in normal samples, and that SNAI1 overexpression was significantly correlated with advanced tumor stage, lymph node metastasis, and distant metastasis. In agreement with our results, previous studies have reported that the expression level of SNAI1 in colorectal cancer tissues and cell lines was significantly higher than that in normal controls [34]. High SNAI1 expression was significantly correlated with lymph node metastasis, and SNAI1 was an independent predictor of lymph node metastasis in human CRC [35]. Enforced expression of SNAI1 significantly enhanced the invasive and metastatic ability of CRC cells in vitro and in vivo [36]. Interestingly, studies have shown that SNAI1 expression can mediate chemoresistance and radioresistance in human CRC cells [37] [38]. Furthermore, a growing number of studies have reported that high SNAI1 expression is associated with poor overall survival in patients with CRC and that SNAI1 may serve as a biomarker of poor prognosis in human CRC [39] [40]. Consistent with those reports, our study found that patients with high SNAI1 expression had lower overall survival than those with low SNAI1 expression.
Additionally, univariate and multivariate Cox analyses revealed that overexpression of SNAI1 was a potential independent biomarker for poor prognosis in human CRC. Thus, therapeutic targeting of SNAI1 may be a novel strategy to improve outcomes for CRC patients.
To further explore the molecular mechanisms underlying the role of SNAI1 in human CRC, the co-expressed genes of SNAI1 were functionally classified with the DAVID database. Pathway enrichment analysis revealed that the co-expressed genes were mainly enriched in the Focal adhesion pathway, ECM-receptor interaction, and the PI3K-Akt signaling pathway. The Focal adhesion pathway has been reported to promote cancer metastasis by weakening cell-cell and cell-extracellular matrix adhesions [41]. Blocking the signaling cascades of the Focal adhesion pathway may be a novel strategy to enhance therapeutic efficacy in colorectal cancer [42]. The ECM is a major component of the tumor microenvironment and affects the biological behavior of tumor cells [43]. Components of the ECM may be potential therapeutic targets for reducing physical barriers to systemic treatment in patients with metastatic CRC (mCRC) who receive anti-VEGF therapy [44]. Activation of the PI3K-Akt signaling pathway has been reported to induce SNAI1-mediated EMT, thereby facilitating the metastasis of CRC cells in vitro and in vivo [45]. Blocking the PI3K-Akt signaling pathway with an AKT inhibitor can suppress SNAI1-induced migration and invasion of CRC cells [46]. Together with our results, these data suggest that activation of the PI3K-Akt signaling pathway is closely involved in SNAI1-mediated EMT, and that targeting the PI3K-Akt signaling pathway via SNAI1 inhibition may be a potential strategy for CRC therapy. A limitation of our study is that all data and results were derived from public databases and online bioinformatic tools; further experimental validation will be needed to verify the biological significance of SNAI1 expression in human CRC.
Co-expression analysis indicated that BGN expression was significantly correlated with SNAI1 expression. In line with our study, previous studies have shown that BGN overexpression is significantly correlated with advanced tumor stage and poor clinical outcome in multiple cancer entities, including CRC [47], gastric cancer [48], and prostate cancer [49]. An increasing number of studies have also reported that BGN promotes EMT, thereby enhancing the invasive and metastatic capabilities of cancer cells [50] [51]. In addition, BGN has been identified as a potential EMT biomarker that participates in the integrated TGFβ/Snail pathway in CRC cells [52]. These results suggest a potential link between SNAI1 and BGN expression during EMT, and targeting these two collaborating genes simultaneously may be an efficient approach for CRC therapy.

Conclusions
In summary, our study systematically analyzed the expression level, prognostic value, biological function and mutations of SNAI1 in human CRC. Our results indicate that SNAI1 and its co-expressed partner BGN are overexpressed in human CRC and may be jointly involved in the EMT process. We speculate that SNAI1 could be a potential biomarker for the early diagnosis, prognostic evaluation, and clinical treatment of CRC.

Availability of data and materials

The datasets generated and/or analyzed during the current study are available in The Cancer Genome Atlas (TCGA) repository (https://www.cancer.gov/tcga).

Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Funding
No funding was received.
Authors' contributions
JL, TH and QY conceived and designed the study. JL, TH, YW and XW performed the bioinformatics analysis and interpretation of the data. JL drafted the manuscript. QY agreed to be responsible for all aspects of the work to ensure that issues of accuracy or completeness of the study were properly investigated and addressed. All authors read and approved the final manuscript.

Heat map of SNAI1 mRNA expression in multiple CRC tissues vs. normal tissues.