Screening of Potential Molecular Targets for Proximal, Mid, and Distal Esophageal Cancers


 Objective: This study aimed to screen the potential molecular targets for proximal, mid, and distal esophageal cancers.Methods: The clinical data, RNA-seq data, and survival data for TCGA esophageal cancer were retrieved from UCSC Xena database. The samples with clinical information on the esophageal tumor central location were selected. Differential analyses on different groups (mid vs. proximal, distal vs. proximal, and distal vs. mid) were performed. Following the differentially expressed genes (DEGs) in three groups were analyzed, the common genes were subjected to survival analysis, miRNA prediction, and drug-gene analysis. The specific genes were annotated for functional and module analyses.Results: Six common genes, namely KCNA1, CCDC196, UNC5CL, CYP3A5, CA10, and REG3A, were analyzed among three comparison groups. The expression of CA10 and REG3A was significantly correlated in the prognosis of patients with esophageal cancer. The distal vs. mid groups screened 766 specific DEGs, and the mid vs. proximal group screened 99 specific DEGs. Functional analysis showed that the genes of distal vs. mid groups, including EDN3, CGA, CCR9, and GABRA2, as well as genes in the mid vs. proximal groups, including GCGR, OXTR, and MCHR2, were significantly enriched in neural functions. In the distal vs. proximal groups, there were 314 specific DEGs, including CYP2C8 and OXGR1.Conclusion: Six common genes may serve as the potential molecular targets for the treatment of all proximal, mid, and distal esophageal cancers.


Introduction
Esophageal cancer is a severe malignancy, which has a fatal outcome in the vast majority of cases due to its extremely aggressive nature [1,2]. Currently, esophageal cancer is the eighth most common incident cancer worldwide, and the sixth leading cause of cancer-related death [3]. Moreover, its incidence is rapidly increasing [4]. Radiotherapy is a commonly used treatment modality in conjunction with chemotherapy as either a de nitive or pre-operative treatment for esophageal cancer [5,6]. Nevertheless, the optimal choice of therapeutic strategy must be individualized, based on the disease progression and comorbidities of patients [7].
The esophagus is a long tubular organ extending from the neck to the upper abdomen, surrounded by various structures [8,9]. Esophageal tumors tend to grow and migrate longitudinally along the esophageal wall. Therefore, esophageal cancer is divided into proximal, mid, and distal cancer depending on the tumor position. Generally, surgery plays an important role in achieving locoregional control in patients with mid and distal esophageal cancer [10]. Proximal esophageal cancer is located between the cricopharyngeal muscle and the sternal notch, accounting for less than 5% of all esophageal cancers [11], the management of which remains controversial. If the tumor invades an adjacent organ or has distant metastases, radiotherapy, chemotherapy, and chemoradiotherapy are the therapeutic options regardless of the tumor positions [10]. Recently, the emerging understanding of the molecular mechanisms underlying carcinogenesis provided screen critical molecular targets, which leaded to the development of drugs that target speci c molecular events linked to the progression of many cancers, including esophageal cancer [12]. We speculated that the screening of potential molecular targets in esophageal cancer of different position might guide the treatment of proximal, mid, and distal esophageal cancers.
In this study, we collected the RNA-seq data of esophageal cancer from a public database, and the samples included clinical data of esophageal tumor central location. With analysis of the differentially expressed genes (DEGs) between different tumor positions, we screened several key biomarkers or potential molecular targets for proximal, mid, and distal esophageal cancers.

Data retrieval and preprocessing
The clinical data, RNA-seq data, mutation data, and survival data from the Cancer Genome Atlas (TCGA) esophageal cancer were retrieved from the University Of California Santa Cruz, UCSC (UCSC) Xena database. The original gene expression value in TCGA was stored in the form of ENSGXXXXXXXXXXXX.X, which was converted into gene symbol by the R biomaRt package [13]. The data not matching the gene symbol were deleted. In the case that the same gene symbol corresponded to multiple ENSG numbers, the average value of multiple ENSGs was taken as the nal expression value of this gene symbol. The clinical expression matrix matched to a gene symbol was extracted according to the gene type, and only the mRNA was extracted for subsequent analysis. Additionally, since the mRNA expressed at low levels may interfere with the results of differential analysis, these mRNAs (that were expressed in at least 1/4th of the samples) were removed.

Clinical information processing
Normal or healthy samples were removed from the esophageal cancer clinical expression pro les, and only tumor samples were selected. We included these esophageal cancers who had never undergone chemotherapy or radiotherapy. The pathological type of esophageal cancer included in the present study was squamous cell carcinoma. Moreover, from the remaining tumor samples, the samples with clinical information of esophageal tumor central location were included, and the samples without clinical information were excluded. The remaining samples were included in the subsequent analysis, and the corresponding expression pro le and survival data of these samples were extracted from the total expression pro les and survival data for differential analysis.

Differential analysis
To investigate the differences in gene expression in esophageal cancer patients with different tumor central locations, edgeR package in R [14] was used to perform differential analyses on different groups (mid vs. proximal, distal vs. proximal and distal vs. mid). Differential analysis was performed based on count value, and the threshold of DEGs for screening was set as p-value < 0.05 and |log fold change (FC)| > 0.585. The three groups of DEGs obtained were analyzed through Venn diagrams (http://bioinformatics.psb.ugent.be/webtools/Venn/).

Analysis of common DEGs from three groups
The expression of common genes was validated in three groups using the Logcpm function in edgeR packet, and the results are shown in boxplots. To investigate the prognostic role of common genes in esophageal cancer, survival analysis was performed using these common genes. The survival analysis was based on the standardized data processed by Logcpm function in edgeR and the TCGA clinical survival data. Based on the standardized data, the optimal cutoff was calculated using the R survminer package [15]. The data values higher than the optimal cutoff were considered as high expression and those lower than the optimal cutoff were considered as low expression. Survival package of R [16] was used for survival analysis, and the gene with p < 0.05 was considered as a signi cant prognosis-related gene in esophageal cancer.
The R maftools package [17] was used to study the mutations of common genes in esophageal cancer. Statistical analysis was also carried out for common genes. The mutation les were obtained from TCGA Genomic Data Commons (GDC) preprocessed by Mutect.
Additionally, the drug-gene interaction database (DGIdb) [18] was used to analyze the drug-gene interaction pairs of common genes, and the interacting relations were visualized by Cytoscape.
Furthermore, to study the regulatory mechanism of miRNAs in those common genes, miRNA-target interaction was predicted by miRWalk 2.0 [19]. The databases used included miRWalk, Microt4, miRanda, mirbridge, miRDB, miRMap, miRNAMap, Pictar2, PITA, RNA22, RNAhybrid, and Targetscan. The interacting relations, in at least nine databases, were considered as positive relations, and the obtained relations were visualized using Cytoscape.

Analysis of speci c DEGs in three groups
The speci c DEGs in three groups were performed for gene ontology biological process (GO-BP) and pathway analyses using Metascape [20]. In addition to functional analysis, Metascape could also perform protein-protein interaction (PPI) analysis and MCODE [21] subnetwork mining for the input genes. MCODE can calculate the node information of nodes in the network, and the node with the highest score was a seed.

Differential analysis
Principal component analysis (PCA) of the original data in three groups showed that except for one sample, the other samples were relatively clustered; thus, the outlier was deleted for subsequent analysis. Logical analysis of three groups of differential genes The six common genes among the three comparison groups were analyzed. These genes are potassium voltage-gated channel subfamily A member 1 (KCNA1), coiled-coil domain containing 196 (CCDC196), unc-5 family C-terminal like (UNC5CL), cytochrome P450 family 3 subfamily A member 5 (CYP3A5), carbonic anhydrase 10 (CA10) and regenerating family member 3 alpha (REG3A) (Fig. 2A). The boxplots for six genes are shown in Fig. 2B. Additionally, there were 99 speci c genes in the mid vs. proximal groups, 314 speci c genes in the distal vs. proximal groups, and 766 speci c genes in the distal vs. mid groups.

Mutation analysis of six common genes in esophageal cancer
In terms of the overall genetic mutations in esophageal cancer, missense mutation was the main variant type in esophageal cancer (Fig. 3A). A missense mutation is a base substitution mutation that results in changes in the amino acid sequence of a polypeptide or the base sequence of functional RNA, most of which have harmful or lethal effects. Analysis of the mutation of six common genes showed that KCNA1, CA10, and REG3A had mutation data in TCGA esophageal cancer, and all the variant types were missense mutations (Fig. 3B).

Survival analysis of six common genes in esophageal cancer
Survival analysis showed that among the six common genes, CA10 and REG3A had a signi cant effect on the prognosis of patients with esophageal cancer (p < 0.05). Low expression of CA10 and high expression of REG3A had a good prognosis of esophageal cancer (Fig. 4).
According to the screening conditions described in the method section, three genes, namely UNC5CL, KCNA1, and CA10, had shown a strong miRNA-target regulatory relation (Fig. 5A). Additionally, three genes, namely CYP3A5, KCNA1, and CA10, were predicted to have drug-gene interactions. Nifedipine had interactions with both CYP3A5 and KCNA1. Guanidine hydrochloride and dalfampridine also had interaction with KCNA1 (Fig. 5B).
Function and pathway analyses of speci c genes A total of 99 speci c genes in the mid vs. proximal groups were enriched in GOs associated with axonemal dynein complex assembly, positive regulation of synaptic transmission, central nervous system neuron differentiation, and the pathway of neuroactive ligand-receptor interaction. The 314 speci c genes in the distal vs. proximal groups were enriched in GOs associated with digestion, regulation of neuron differentiation, negative regulation of the Wnt signaling pathway, and regulation of cell cycle checkpoint, as well as pathways of protein digestion and absorption, and sphingolipid metabolism. The 766 genes speci c to the distal vs. mid groups were enriched in GO related to corni cation, digestion, regulation of intestinal cholesterol absorption, neurotransmitter transport, and pathways of neuroactive ligand-receptor interaction, fat digestion and absorption, and SLC-mediated transmembrane transport (Fig. 6).

Discussion
The proximal, mid, and distal esophageal cancers have different biological characteristics. Therefore, surgical and radiation treatment options are usually different. In this study, we selected the DEGs among the three types of esophageal cancers to screen the targeted molecules for the treatment of esophageal cancers. After analysis, six genes were screened to be differentially expressed in all three groups.
Additionally, hundreds of DEGs speci c to three comparison groups were also analyzed. These genes may serve as potential therapeutic targets for esophageal cancers.
In this study, six genes were differentially expressed among the three groups. KCNA1 and CYP3A5 were regulated by nifedipine, which is the most common kind of calcium channel blocker and is used to treat high blood pressure and angina. Especially, calcium channel blockers were recently reported to be associated with cancer [22]. A previous study has reported that cancer cells that have DNA mismatch repair de ciency are resistant to many cytotoxic drugs. Calcium channel blockers may inhibit the pathways that cause drug-resistance [23]. Thus, we speculated that KCNA1 and CYP3A5 might be therapeutic targets of esophageal cancer.
CA10 and REG3A were signi cantly correlated in the prognosis of patients with esophageal cancer. The CA10 gene encodes a protein belonging to the carbonic anhydrase family of zinc metalloenzymes. Hypoxia and acidosis are prominent features of many tumors, leading to a different metabolism from normal or healthy cells. Two of the simplest metabolic products, protons, and bicarbonate, are produced by the catalytic activity of the carbonic anhydrase [24]. Carbonic anhydrase IX is considered as a molecular marker that may predict the prognosis of renal cancer [25]. REG3A is a member of the REG protein family, which has been demonstrated to be involved in the development of some digestive tumors, including gastric cancer, hepatocellular carcinoma, and colorectal cancer [26]. However, its role in esophageal cancer has not yet been reported.
For the other two genes UNC5CL and CCDC196, UNC5CL was found to be regulated by miR-337-3p and miR-512-5p. miR-337 has been reported to distinguish esophageal squamous cell carcinoma patients from healthy controls [27]. Interestingly, a recent study reported that rs10484761 loci, located in a region of ~ 200-kb on chromosome 6p21, near UNC5CL gene, was a novel susceptible locus for esophageal squamous cell carcinoma. CCDC196 belongs to coiled-coil domain-containing proteins that are directly associated with migratory, invasive, and metastatic phenotypes of cancer cells [28][29][30]. The role of CCDC196 in esophageal cancer is unknown. Taken together, given the signi cant differential expression of the KCNA1, CYP3A5, CA10, REG3A, UNC5CL, and CCDC196 among three tumor locations, we speculate that these genes might serve as the potential molecular targets for the treatment of all proximal, mid, and distal esophageal cancers. Additionally, CA10 and REG3A might also be potential prognostic factors of esophageal cancer.
In addition to common genes found in the three groups, there were also some genes speci c to each position of esophageal cancers. The distal vs. mid groups had the most speci c DEGs (766), which suggested that there may exist large differences between the two positions. Mid vs. proximal groups only screened 99 speci c DEGs. Interestingly, functional analysis showed that the seed genes of distal vs. mid groups (including EDN3, CGA, CCR9, and GABRA2), as well as genes in mid vs. proximal group (such as GCGR, OXTR, and MCHR2), were signi cantly enriched in neural functions, including neuroactive ligandreceptor interaction, transmission of nerve impulse, and neuronal system. Emerging evidence has suggested that nervous microenvironment plays a critical role in cancer progression [31][32][33]. Quail et al. [34] have reported that nervous microenvironment is an important factor in the early stage of invasion and metastasis of pancreatic cancers. Therefore, treatment targeted at neuro-microenvironment may provide a novel strategy to prevent the progression of some cancers.
Nevertheless, the role of the neuronal system in esophageal cancer has not been fully studied. We speculate that EDN3, CGA, CCR9, and GABRA2 might serve as potential targets during the treatment of distal and mid esophageal cancers. Additionally, GCGR, OXTR, and MCHR2 may be therapeutic targets of mid and proximal esophageal cancers.
In the distal vs. proximal groups, there were 314 speci c DEGs, including CYP2C8 and OXGR1. CYP2C8 encodes a member of the cytochrome P450 superfamily of enzymes, which play essential roles in the metabolism of endogenous and exogenous molecules [35]. The local expression of cytochrome P450s is essential for cancer management since these functionally associated enzymes might be involved in determining the anticancer drug sensitivity [36]. OXGR1 belongs to the G protein-coupled receptor superfamily, and it is frequently found to be hypermethylated in hepatocellular carcinoma [37]. A recent study revealed that overexpression of OXGR1 promoted prostate cancer development [38]. Thus, we speculated that CYP2C8 and OXGR1 might be used as biomarkers to distinguish between distal and proximal esophageal cancers.
Although the genes screened from our analyses are promising candidate genes, we have not validated these genes through in vivo or in vitro experiments. Therefore, we will further validate and strengthen our ndings by developing animal models and collecting clinical samples.
In conclusion, KCNA1, CYP3A5, CA10, REG3A, UNC5CL, and CCDC196 may serve as the potential molecular targets for the treatment of all proximal, mid, and distal esophageal cancers. CA10 and REG3A may be prognostic factors of esophageal cancer. EDN3, CGA, CCR9, and GABRA2 may serve as potential targets during the treatment of distal and mid esophageal cancers. GCGR, OXTR, and MCHR2 may be therapeutic targets of mid and proximal esophageal cancers. CYP2C8 and OXGR1 may be used as biomarkers to distinguish between distal and proximal esophageal cancers.

Declarations
Ethics approval and consent to participate Not Applicable.

Consent for publication
Not applicable.

Competing Interests
The authors declare that no con icts of interest exist. The principal component analysis (PCA) (A) and a volcano plot (B) for three comparison groups. In volcano plots, yellow represents upregulated genes, and blue represents downregulated genes.   The biological processes and pathways enriched by differentially expressed genes in three comparison groups.