Review and screening of key genes in esophageal squamous cell carcinoma

Esophageal squamous cell carcinoma (ESCC) is the most common type of human esophageal cancer, with high mortality due to late-stage diagnosis. Efforts have been made to figure out the genetic events underlying its carcinogenesis and progression, but the molecular mechanisms of these processes remain elusive. To identify candidate genes involved in ESCC, the literature on significantly mutated genes (SMGs) was extensively reviewed, and the gene expression profiles GSE161533, GSE20347 and GSE77861 were downloaded from the Gene Expression Omnibus (GEO) database. Following the identification of 230 differentially expressed genes (DEGs), hub genes were identified with the MCODE plug-in in Cytoscape. Fourteen hub genes were identified, which were enriched in cell cycle, DNA replication and the p53 signaling pathway. In summary, the genes highlighted in this study may provide potential targets for the treatment and diagnosis of ESCC and help us better understand its pathogenesis and progression from a genetic perspective.


Introduction
Esophageal cancer is the eighth most common cancer and the sixth leading cause of cancer-related mortality in the world 1,2. ESCC is the major histological type, accounting for about 90% of the 456,000 incident esophageal cancers each year 3. The 5-year survival rate for ESCC is about 18%, a number that reflects the limited approaches available for early diagnosis and treatment of ESCC 4. Thus, there is a great need to further figure out the molecular mechanisms and to develop better diagnostic and therapeutic methods for ESCC. The pathogenesis of ESCC is believed to be a multi-step process, and its genetic determinants remain elusive. Increasing evidence shows that gene mutation plays a key role in ESCC tumorigenesis and tumor progression. These genes include the upregulated genes ADAM29, AJUBA, CBX4/8, CCND1(BCL1/PRAD1), EGFR(ERBB1), ERBB2(HER-2), FAM135B, FGFR1, KMT2D(MLL2/MLL4/ALR), MMP14, MYC, NOTCH, NRF2(NFE2L2), PIK3CA, RB1, SOX2, TP53, XPO1, YAP1 and the downregulated genes CDKN2A, CREBBP/EP300, CUL3, FAT1, FBXW7, KMT2C(MLL3), PTEN, TET2, TGFBR2, ZFP36L2, ZNF750 5-11 (Table 1). Genes involved in the cell cycle, the Notch signaling pathway, epigenetic processes and the RTK/PI3K/AKT circuit are frequently altered 12. Cell cycle progression is altered mostly by TP53 mutation, CDKN2A deletion/mutation and CCND1 amplification 5. TP53 is the most frequently mutated of the significantly mutated genes (SMGs) in ESCC, with a mutation frequency reaching 93% 12. NOTCH plays a dual role as both a tumor suppressor pathway and an oncogenic pathway, for which further studies are warranted 13. Abudureheman et al. have shown that overexpression of KMT2D facilitates ESCC tumor progression and that it may exert an oncogenic role via activation of epithelial-to-mesenchymal transition (EMT) 14. In a large-sample study of ESCC in China, PIK3CA was significantly overexpressed in cancer tissue, and its overexpression was independently associated with a higher risk of local recurrence 15.
EGFR and FGFR1 were the most frequently amplified RTK/RAS-related genes in ESCC 9, and their inhibitors are under therapeutic evaluation 16,17. Notably, studies of the prognostic value of TP53 overexpression in ESCC have reported conflicting results 18-21, although its biological functions are not in doubt.
In general, significant gaps remain in understanding the exact mechanisms of carcinogenesis and in developing precision treatments for ESCC.
Gene chip, or gene expression profiling, is a high-throughput gene detection technique that can rapidly measure a large number of genes in the same sample at one time 22. The last two decades have seen a growing number of studies of genetic alterations in cancers using microarray technology and bioinformatics analysis, which have helped identify differentially expressed genes (DEGs) and related pathways in ESCC. However, the results are often limited or inconsistent because of tissue or sample heterogeneity across independent studies, or because they were derived from a single cohort. Integrated bioinformatics analysis combined with gene profiling may therefore help overcome this limitation. In this work, we downloaded three microarray datasets (GSE161533, GSE20347 and GSE77861) from the NCBI Gene Expression Omnibus database (NCBI-GEO; available online: https://www.ncbi.nlm.nih.gov/geo) and identified DEGs via GEO2R analysis. Subsequently, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses and protein-protein interaction (PPI) network analysis were conducted to help us understand the molecular mechanisms of carcinogenesis and progression of ESCC. In summary, 230 DEGs and 14 hub genes were found in this study, which may serve as potential biomarkers for individualized prevention, early diagnosis and precise treatment.

Identification of DEGs in ESCC
Using GEO2R with thresholds of adjusted P value < 0.01 and |log FC| > 1, DEGs were identified in each dataset (1504 in GSE161533, 1680 in GSE20347 and 972 in GSE77861). The intersection of the three DEG sets contains 230 genes, as shown in the Venn diagram (Fig.1), consisting of 144 upregulated genes and 86 downregulated genes between normal and ESCC tissues.
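The filtering and intersection step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each GEO2R result table has been read into a list of rows with the columns `Gene.symbol`, `adj.P.Val` and `logFC`, that positive `logFC` means higher expression in tumor tissue, and that a shared gene must change in the same direction in all datasets. The gene names and values in the toy data are made up.

```python
def select_degs(rows, adj_p_cutoff=0.01, lfc_cutoff=1.0):
    """Return (up, down) sets of gene symbols passing both thresholds."""
    up, down = set(), set()
    for row in rows:
        symbol = row.get("Gene.symbol", "")
        if not symbol:
            continue  # skip probes without a gene annotation
        adj_p = float(row["adj.P.Val"])
        lfc = float(row["logFC"])
        if adj_p < adj_p_cutoff and abs(lfc) > lfc_cutoff:
            # Assumption: positive logFC means upregulated in tumor tissue.
            (up if lfc > 0 else down).add(symbol)
    return up, down


def shared_degs(datasets, **cutoffs):
    """Intersect per-dataset DEG sets, separately for each direction."""
    ups, downs = [], []
    for rows in datasets:
        up, down = select_degs(rows, **cutoffs)
        ups.append(up)
        downs.append(down)
    return set.intersection(*ups), set.intersection(*downs)


# Toy rows standing in for three GEO2R result tables (hypothetical values).
toy_datasets = [
    [{"Gene.symbol": "GENE_A", "adj.P.Val": "0.001", "logFC": "2.1"},
     {"Gene.symbol": "GENE_B", "adj.P.Val": "0.002", "logFC": "-1.5"},
     {"Gene.symbol": "GENE_C", "adj.P.Val": "0.2", "logFC": "3.0"}],
    [{"Gene.symbol": "GENE_A", "adj.P.Val": "0.005", "logFC": "1.2"},
     {"Gene.symbol": "GENE_B", "adj.P.Val": "0.0001", "logFC": "-2.0"},
     {"Gene.symbol": "GENE_C", "adj.P.Val": "0.001", "logFC": "1.5"}],
    [{"Gene.symbol": "GENE_A", "adj.P.Val": "0.003", "logFC": "1.8"},
     {"Gene.symbol": "GENE_B", "adj.P.Val": "0.004", "logFC": "-1.1"},
     {"Gene.symbol": "GENE_D", "adj.P.Val": "0.001", "logFC": "2.5"}],
]

shared_up, shared_down = shared_degs(toy_datasets)
print(sorted(shared_up), sorted(shared_down))
```

In the toy data, GENE_C is dropped because it fails the adjusted P value cutoff in the first table, and GENE_D is dropped because it appears in only one table.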

GO and KEGG enrichment analysis of DEGs
To annotate the DEGs, GO and KEGG enrichment analyses were performed using DAVID, with P value < 0.05 considered significant. The GO analysis results (Fig.2) showed that changes in biological processes (BP) were mainly enriched in oxidation-reduction process, positive regulation of cell proliferation, cell-cell adhesion and inflammatory response. Changes in cellular components (CC) of the DEGs were significantly enriched in cytoplasm, extracellular exosome, cytosol and extracellular space. Changes in molecular function (MF) of the DEGs were enriched in calcium ion binding, protein homodimerization activity, cadherin binding involved in cell-cell adhesion and actin binding. KEGG analysis showed that the DEGs were mainly enriched in transcriptional misregulation in cancer and the p53 signaling pathway.
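The idea behind such an enrichment P value can be illustrated with a plain one-sided hypergeometric test. This is only a sketch of the general over-representation principle: DAVID itself reports a modified Fisher exact P value (the EASE score), which is more conservative. Apart from the 230 DEGs taken from the text, the background size and term counts below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k): chance of seeing at least k term genes among n DEGs.

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the DEG list, k: DEGs annotated to the term.
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical example: 230 DEGs, a background of 20000 genes,
# a term with 100 annotated genes, 8 of which are among the DEGs.
p_value = hypergeom_enrichment_p(k=8, n=230, K=100, N=20000)
print(f"enrichment P value: {p_value:.3g}")
print("significant at 0.05:", p_value < 0.05)
```

Because many terms are tested at once, a multiple-testing correction would normally be applied to such P values before interpretation.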

PPI network construction and hub genes identification
Functional interactions were predicted with the STRING online tool, and the PPI network of the DEGs was constructed in Cytoscape (Fig.3). Subsequently, 14 hub genes were identified with an MCODE score > 10 (Fig.4).
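To give a sense of how densely connected nodes can be picked out of a PPI network, the sketch below computes k-core numbers from an edge list. This is not the MCODE algorithm and does not reproduce the MCODE score threshold above; it is a simpler stand-in that illustrates the k-core notion MCODE's node scoring builds on. The edge list is a hypothetical toy network, not STRING output.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def core_numbers(graph):
    """Return {node: core number} by repeatedly peeling a minimum-degree node."""
    degree = {node: len(nbrs) for node, nbrs in graph.items()}
    remaining = set(graph)
    core = {}
    k = 0
    while remaining:
        node = min(remaining, key=lambda n: degree[n])
        k = max(k, degree[node])  # core level never decreases while peeling
        core[node] = k
        remaining.remove(node)
        for nbr in graph[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


# Toy network: four fully connected proteins plus one peripheral protein.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"),
             ("B", "C"), ("B", "D"), ("C", "D"), ("A", "E")]
cores = core_numbers(build_graph(toy_edges))
dense_nodes = sorted(n for n, c in cores.items() if c >= 3)
print(cores, dense_nodes)
```

In the toy network, the four mutually connected proteins form a 3-core, while the peripheral protein E has core number 1 and is excluded.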

Hub gene analysis
Among all hub genes, only ESPL1 is downregulated. A GO and KEGG analysis network of the hub genes was constructed using ClueGO (Fig.5). The results showed that the hub genes were mainly enriched in positive regulation of mitotic cell cycle phase transition, regulation of cytokinesis and DNA replication origin binding. Subsequently, we conducted an extensive literature search on the hub genes.
AURKA, which is significantly overexpressed in various cancers including ESCC 54 has been reported to  role in ESCC remains blank. RRM1 has been identified as an oncogene in lung cancer 81; its overexpression is involved in tumor progression 82, and it is emerging as a therapeutic target 83,84. A large-scale retrospective analysis with long-term follow-up 85 showed that TOP2A expression was not only associated with perineural invasion and poorer differentiation but could also be an independent prognostic factor. Additionally, as TOP2A is a specific marker for the use of chemotherapeutic drugs such as anthracycline, therapy targeting the TOP2A protein may be an appropriate approach to individualized treatment and to improving the prognosis of ESCC patients. Studies 86,87 using immunohistochemical analysis confirmed that UBE2C protein expression was upregulated in all ESCC cases but absent in the histologically normal tissue surrounding the tumor, pointing to its role as a diagnostic biomarker for ESCC.
In addition, high expression of UBE2C is a marker of poor prognosis in ESCC 87.