Exploring the hub genes and mechanisms of Daphne altaica treating esophageal squamous cell carcinoma based on network pharmacology and bioinformatics analysis

Esophageal squamous cell carcinoma (ESCC), is a frequent digestive tract malignant carcinoma with a high fatality rate. Daphne altaica (D. altaica), a medicinal plant that is frequently employed in Kazakh traditional medicine, and which has traditionally been used to cure cancer and respiratory conditions, but research on the mechanism is lacking. Therefore, we examined and verified the hub genes and mechanism of D. altaica treating ESCC. Active compounds and targets of D. altaica were screened by databases such as TCMSP, and ESCC targets were screened by databases such as GeneCards and constructed the compound-target network and PPI network. Meantime, data sets between tissues and adjacent non-cancerous tissues from GEO database (GSE100942, GPL570) were analyzed to obtain DEGs using the limma package in R. Hub genes were validated using data from the Kaplan–Meier plotter database, TIMER2.0 and GEPIA2 databases. Finally, AutoDock software was used to predict the binding sites through molecular docking. In total, 830 compound targets were obtained from TCMSP and other databases. In addition, 17,710 disease targets were acquired based on GeneCards and other databases. In addition, we constructed the compound-target network and PPI network. Then, 127 DEGs were observed (82 up-regulated and 45 down-regulated genes). Hub genes were screened including TOP2A, NUF2, CDKN2A, BCHE, and NEK2, and had been validated with the help of several publicly available databases. Finally, molecular docking results showed more stable binding between five hub genes and active compounds. In the present study, five hub genes were screened and validated, and potential mechanisms of action were predicted, which could provide a theoretical understanding of the treatment of ESCC with D. altaica.


Introduction
Esophageal carcinoma (EC) is the seventh most aggressive type of malignancy and because of lacking biomarkers that can facilitate early detection, its treatment remains a challenge. Identifications of EC are two major histological forms, namely, Esophageal adenocarcinoma (EAC) and Esophageal squamous cell carcinoma (ESCC), and each of them shows differences in the incidence among populations that are geographically separated (Tungekar et al. 2018). ESCC tends to have a higher global proportion than EAC, and it accounts for 90% of all cases of Esophageal carcinoma worldwide. According to research, the ESCC incidence in the Kazakh population is much higher than in other minorities in Xinjiang, northwestern China (Zheng et al. 2010). There has been some progress in its possible etiological mechanisms in several studies, including behavioral and environmental risk factors and genetic alterations (Zheng et al. 2011). There are research reports, on the unique Sendaer Hailati and Ziruo Talihati have contributed equally to this work.
1 3 eating habits of the Kazakh population in northern Xinjiang, such as drinking hot milk tea, bacon, air-dried meat, fermented dairy products, and low intake of vegetables and fruits (Han 2015). In addition, Esophageal carcinoma is also closely related to genetic factors, which makes the incidence of Esophageal carcinoma in the Xinjiang Kazakh population cluster in families Chen et al. 2006). However, the most complex mechanisms by which the Kazakh population suffers abnormally remain largely unknown.
Daphne altaica (D. altaica), it was first mentioned as spicy, heated, and lethal in the 15th-century Kazakh classic work "Chipagar Bayan" (Chinese: "Medicine"). The rare plant is primarily found in the Altai Mountains, namely, in the Tacheng and Habahe region of Xinjiang's Junggar Basin, the Altai, Manrak, and Talbagatai Mountains in Kazakhstan, the Altai region of Russia, and northwest Mongolia (Kizaibek et al. 2011). The medicinal component of D. altaica is the stem bark, which the Kazakhs refer plant as "Wusuoyihe" or "Hasherjidek". It has been part of traditional Kazakh medicine for centuries and is used in the treatment of cancer and respiratory conditions (Kizaibek et al. 2020). It is mostly used to treat malignant tumors, such as stomach cancer and esophageal cancer, as well as colds, bronchitis, pharyngeal swelling and pain, rheumatoid arthritis, toothaches, and snake bites which are poisonous (Murat et al. 2016). Studies have shown that ethanol extracts from different parts of D. altaica can inhibit human esophageal cancer cells (Eca-109), human gastric cancer cell lines (AGS), cervical cancer cells (Hela), liver cancer cells (SMMC-7721) four types of cancer cells proliferation in vitro (Kizaibek et al. 2011). The potential role of D. altaica on human esophageal cancer cells Eca-109 was investigated at the cellular level by Kizaibek et al. (2011Kizaibek et al. ( , 2020. It was concluded that D. altaica may induce apoptosis, cause cell cycle arrest and upregulate intracellular PPARγ gene expression to accomplish these although the theoretical mechanism of action is still unclear.
Within the current study, we had examined and verified the hub genes and mechanism of the Kazakh traditional medicine D. altaica treating ESCC using network pharmacology in combination with bioinformatics Analysis and molecular docking, with the hope that this study will make a contribution to research on the therapies of ESCC.

Screening for D. altaica and ESCC targets
Through literature review learning about D. altaica active compounds, we used the TCMSP Platform (Traditional Chinese Medicine Systems Pharmacology Database and Analysis Platform) (https:// old. tcmsp-e. com/ tcmsp. php), BATMAN-TCM Database (Bioinformatics Analysis Tool of Molecular Mechanism of Traditional Chinese Medicine) (http:// bionet. ncpsb. org. cn/ batman-tcm/), PubChem Database (https:// pubch em. ncbi. nlm. nih. gov/), and SwissTarget-Prediction Database (http:// www. swiss targe tpred iction. ch/) to collect and predict targets of D. altaica active compounds. In clinical treatment, TCMs are often administered orally. ADME-related models such as oral bioavailability (OB) (Xu et al. 2012) and drug-likeness (DL) (Walters and Murcko 2002) mostly affect the absorption of drugs through the gastrointestinal tract. The further characterization of bioactive components was performed following standards, such as OB ≥ 20% and DL ≥ 0.18, and the related targets for each component were determined. In addition, BATMAN-TCM under the standards of Score ≥ 20. Our next step was to search for targets of disease using the search term "Esophageal squamous cell carcinoma (ESCC)" by The Comparative Toxicogenomics Database (CTD) (http:// ctdba se. org/), Therapeutic Target Database (TTD) (http:// db. idrbl ab. net/ ttd/), and GeneCards Database (https:// www. genec ards. org/) under the standards of Score ≥ 20.

Network construction
The compound-target network was constructed by Cytoscape 3.9.1 to import the active compounds and targets of the drug. In addition, we got the interaction of proteins using STRING (https:// cn. string-db. org/). The protein-protein interaction (PPI) network was then generated based on Cytoscape 3.9.1

Differentially expressed genes screening
From the GEO database, we obtained data on clinical samples of ESCC (https:// www. ncbi. nlm. nih. gov/ geo/) (series: GSE100942, GPL570). Clinical samples were collected from ESCC patients undergoing surgical tumor resection. For RNA extraction and hybridization on Affymetrix microarrays, paired adjacent non-tumor tissues were also collected from the proximal resection borders (> 5 cm from the ESCC sample). Differential analysis was performed using R Studio, and the cutoff for identifying DEG was set at |log2 (Fold Change)|> 2 and p value < 0.05.

Functional enrichment analysis
Gene ontology analysis comprises three parts: Biological Process (BP), Molecular Function (MF), and Cellular Component (CC) (The Gene Ontology Consortium 2017), which were used for the description of gene functions. The bioinformation and system function of the hub genes were identified using the Kyoto Encyclopedia of Genes and Gnomes enrichment analysis, which resulted in enrichment of biological signaling pathways (Ogata et al. 1999). We inputted the results of the analysis into R and used the R package Bioconductor ClusterProfiler to perform both of the above enrichment analysis by gene cluster ).

Screening hub genes by prognostic analysis
For prognostic analysis, the KM Plotter database (http:// kmplot. com/ analy sis/) was used to perform survival analysis and for the screening of hub genes, p < 0.05 was considered statistically relevant (Nagy et al. 2021).

Validation analysis of hub genes
The TIMER2.0 database (http:// timer. comp-genom ics. org/) was used to perform Pan-cancer analysis on hub genes. Using this database, the expression differences between tumour and surrounding normal tissue were investigated for hub gene from each TCGA tumor. The results were shown as box plots showing the levels of gene expression. The stars were annotations of the statistic significances calculated by the Wilcoxon signed rank test. In addition, for each type of cancer, it can be used to determine which genes are upregulated or downregulated in tumors as compared to normal tissue. GEPIA2 database (http:// gepia2. cancer-pku. cn/) Box Plot was used to examine the expression levels of hub genes. The differential analysis was performed on the chosen data sets (TCGA tumors vs. TCGA normal + GTEx normal). The method was one-way ANOVA, with disease status ("Tumor" or "Normal") as the variables used to calculate expression differences: Gene expression ~ Disease status. Data were initially log2(TPM + 1) converted for differential analysis, and log2FC was expressed as the median (Tumor)median (Normal). Genes were considered to be differentially expressed genes if |log 2FC|> 1 and p value < 0.01.

Molecular docking
Molecular docking was then used to determine whether core drug compounds discovered by network pharmacology were associated with hub genes. Docking was performed as described in the following: 1. On the basis of the results of the previous screening from Network Pharmacology and Bioinformatics analysis, we got five hub genes (TOP2A, NUF2, CDKN2A, BCHE, NEK2). The 3D structures of the target proteins were obtained using the Protein Database (PDB) (https:// www. rcsb. org/) (Berman et al. 2000) and were downloaded in PDB format. Next, the protein was configured in AutoDock4.2.6 to remove the water, and add hydrogen and the structure was saved as PDBQT format (Morris et al. 2009). 2. In the above screening outcomes, we got the two most active compounds of the drug (Apigenin and Luteolin).
Using the TCMSP and PubChem databases, we obtained the protein structures of the compounds (Kim et al. 2019). In the same process as in the previous step, the protein was modified in AutoDock4.2.6 to remove the water and add hydrogen, and the ligand files were saved in PDBQT format. 3. To perform molecular docking, we imported the above two structure files into AutoDock4.2.6. The minimum binding energy was calculated using the PDBQT for-

Introduction to the present study
The present study aimed to obtain the hub genes of the Kazakh drug D. altaica for the treatment of ESCC and to investigate their potential molecular mechanisms. The flow chart of this study is shown in Fig. 1. First, we obtained targets of D. altaica and ESCC using databases, such as TCMSP and GeneCards. Using the limma package in R, we analyzed 127 differentially expressed genes on the basis of the gene expression profiles of the ESCC and healthy patients in the GEO database. Then, 12 overlapping genes were obtained by cross-tabulation analysis as the core genes for treatment. Survival analysis were performed to screen for five hub genes, and hub gene expression was validated using publicly available databases. Finally, to further investigate the interactions of the hub genes with the compounds, molecular docking was carried out using AutoDock software.

D. altaica and ESCC targets collection
We obtained ethanol and chloroform extraction and analytically identified 21 compounds from the studies of Shi (2015) and Cong (2015), of which 13 were active compounds. Then, we obtained 165, 153, and 512 compound targets using the TCMSP, BATMAN-TCM, and Swis-sTargetPrediction three databases, respectively (Table 1).
In addition, we used the CTD, TTD, and GeneCards databases to obtain 15,912, 3, and 1795 disease targets, respectively.

Compound-target network and PPI network construction
Based on 13 active drug compounds and 830 targets, we constructed a compound-target network using Cytoscape 3.9.1 (Fig. 2a). This network consisted of one drug, 13 compounds, and 830 compound targets, and the compounds were sorted by Degree value. The higher the value of the degree, the more biological functions it is involved in, and the stronger its biological importance (Li et al. 2009) (Table 2). Meantime, we got 409 overlapping genes of Drug-Disease by the Venn diagram (Fig. 2b). And then, the STRING database was used to obtain the PPI information. After that, we put it into Cytoscape 3.9.1 to obtain the PPI network diagram (Fig. 2c) and set the protein size according to the Degree value (Table 3).

Acquisition of differentially expressed genes
We obtained RNA extraction and hybridization data from ESCC patients' cancer and paracancerous tissues through the GEO database and obtained DEGs of micro RNA data using R Studio. We obtained 127 DEGs, including 82 ups and 45 downs (Supplementary Table 1). A volcano plot (Fig. 3a), a heatmap (Fig. 3b), and a PCA plot (Fig. 3c, d) were generated to show the distribution of DEGs. A GSEA plot (Fig. 3e, f) showed that DEGs were closely related to DNA replication, Fanconi anemia pathway, Glycosaminoglycan biosynthesis-keratan sulfate, Homologous recombination, Mismatch repair, Cell cycle, Nucleocytoplasmic transport, Proteasome and Ribosome biogenesis in eukaryotes.

Functional enrichment analysis
GO analysis and KEGG pathway enrichment analysis were carried out with the R package Bioconductor ClusterProfiler to clarify the signaling pathways of the 12 overlapping genes (Supplementary Table 2). GO analysis results showed that the 12 overlapping genes in BP terms were mainly enriched in the positive regulation of cell death, cytokine-mediated signaling pathway, extracellular matrix disassembly, collagen metabolic process, and inflammatory cell apoptotic process (Fig. 4b). The CC terms included chromosome, endoplasmic reticulum lumen, extracellular matrix, kinetochore, spindle pole, centromeric region, and senescence-associated heterochromatin focus (Fig. 4c). In MF terms, they were enriched in enzyme binding, metallopeptidase activity, endopeptidase activity, metalloendopeptidase activity, histone deacetylase binding, serinetype endopeptidase activity, DNA topoisomerase activity (Fig. 4d). In addition, the result of KEGG analysis indicated that mainly enriched in the IL-17 signaling pathway, Pathways in cancer, Platinum drug resistance, Cell cycle, Viral carcinogenesis, MicroRNAs in cancer, and p53 signaling pathway (Fig. 4e, Table 4).

Validation analysis of hub genes
The next five hub genes were validated. First, we explored hub genes' expression in multiple tumor types for pan-cancer analysis using the TIMER2.0 database. The pan-cancer analysis results revealed that the hub genes NEK2, NUF2, TOP2A, and CDKN2A were up-regulated in most tumors with high significance (p value < 0.001), while BCHE was down-regulated in tumors including ESCC (Fig. 5b). Similarly, Box Plot results in the GEPIA2 database revealed that the mRNA levels of NEK2, NUF2, TOP2A and CDKN2A were up-regulated considerably in ESCC samples compared to the normal group (p value < 0.01), while the mRNA levels of BCHE were not significantly down-regulated (Fig. 5c).

Discussion
In 2020, Esophageal cancer ranked as the seventh most common cancer, accounting for one in eighteen all-cancer deaths worldwide (Sung et al. 2021). ESCC is the most commonly occurring histological sub-type of EC, a common digestive system cancer, which has been a worldwide concern in developing countries such as China or India due to its high morbidity and mortality (Alsop and Sharma 2016). Chemotherapeutic agents, including docetaxel and paclitaxel, for the treatment of ESCC, have limited therapeutic potential and numerous side effects in clinical practice (Nakajima and Kato 2013). Although several treatment options have been developed, improvements to these existing treatment modalities are required because of poor prognosis and low survival rates related to late diagnosis (Tustumi et al. 2016). Consequently, it is needed to identify new potential targets of ESCC by exploring the molecular mechanisms that lead to ESCC to improve clinical outcomes.
TCM is also widely used as a medical treatment for ESCC. Daphne altaica (D. altaica) is a medicinal herb used to treat many ailments, and is historically used to treat cancer and diseases of the respiratory system in traditional Kazakh medicine (Kizaibek et al. 2020). However, the complexity of the ingredients and targets limits the research into the mechanisms of Traditional Chinese Medicines (Liu et al. 2022a).
In the present study, we identified 13 active compounds in D. altaica with 830 targets. Based on the data analyzed, it was known that the compounds Apigenin and Luteolin had higher degree values. By analyzing the PPI network diagram we obtained that AKT1 and TP53 were the two proteins with high relevance to the treatment. The 127 DEGs of the ESCC were obtained by analyzing GEO data, and 12 overlapped genes were identified by target screening analysis with drugs, diseases, and DEGs as the core genes for treatment, including BCHE, CDC20, CDKN2A, CXCR2, IL6, MMP1, MMP3, MMP12, NEK2, NUF2, PTGS2, and TOP2A. Gene functional enrichment analysis revealed that they were mainly enriched in the IL-17 signaling pathway, Pathways in cancer, Cell cycle, MicroRNAs in cancer, and p53 signaling pathway. After survival analysis, we found that the levels of BCHE, CDKN2A, NEK2, NUF2, and TOP2A had a significant relationship with the prognosis, and these five genes were considered to be the hub genes in the D. altaica treatment of ESCC. Besides, molecular docking revealed that BCHE, CDKN2A, NEK2, NUF2, and TOP2A had good binding affinities with Apigenin and Luteolin.   (Manning and Toker 2017). AKT1 is an important regulator of the phosphoinositide 3-kinase (PI3K)/AKT signal transduction pathway, which regulates the proliferation and survivability of cells (Burgering and Coffer 1995). The study by Alwhaibi A et al. showed that AKT1 has a critical role in modulating tumor angiogenesis and cancer metastasis (Alwhaibi et al. 2019). Ou R et al. demonstrated that gain-of-function and loss-of-function tests indicated that circ-AKT1 promotes CC cell growth and migration (Ou et al. 2020). TP53, the p53 tumor suppressor protein, is an important transcription factor that acts as a failsafe in the cells' defence against tumors by inhibiting the proliferation or viability of cells under a variety of stresses (Kastenhuber and Lowe 2017;Vousden and Prives 2009;Macheret and Halazonetis 2015). From a clinical point of view, TP53 mutation has been associated with a worse prognosis in several types of cancer. However, this remains controversial (Robles and Harris 2010). The TP53 family of transcription factors, which includes TP53, TP63, and TP73, is involved in biologically and pathologically relevant development of cancer and neurons (Agostini et al. 2018). The study by Yao et al. found that TP53 is up-regulated in ESCC tumors and has a critical function in ESCC cell growth, survival and migratory processes. It could be an activator of the mTOR pathway and an inhibitor of TP53 dependent autophagy. As a result, it could shed new light on the possible value of TP53 as a biomarker for diagnosis and as well as a useful therapeutic target in ESCC (Yao et al. 2021).
Butyrylcholinesterase (BChE, EC 3.1.1.8), also described as plasma cholinesterase or pseudocholinesterase, is a serine hydrolase that is found in most of the tissue of mammals, the most abundant being in the plasma and liver (Johnson and Moore 2013;Lockridge 2015). Clinical studies have shown that its levels in the serum are decreased in conditions including liver damage, inflammation, infection, and malignant tumor (Lampón et al. 2012). In the study of Lampón et al., BCHE as a biomarker for EC may be a predictor of its prognosis. It may also have significant immunotherapy implications (Liu et al. 2022b).
The CDKN2A gene also known as p16INK4 is thought to help suppress tumors (Foulkes et al. 1997). The CDKN2A protein is critical for inhibiting cell cycle progression, and the down-regulation of p16 could lead to increased cell proliferation and may contribute to the pathogenesis of several cancers (Conceição et al. 2015). In a study by Chen et al., they analyzed CDKN2A levels in 33 tumors and found that CDKN2A may be a promising prognostic biomarker with potential molecular mechanisms to influence the survival outcomes of cancer patients (Chen et al. 2021).
Never in mitosis-related kinase 2 (NEK2) is a serine/ threonine kinase that facilitates the division of centrosomes and guarantees the accurate segregation of chromosomes in the G2/M phase of the cell cycle (Xia et al. 2015). The prognostic significance of increased NEK2 expression and its role in ESCC in proliferation, migration, and EMT was demonstrated in a study by Su et al. They concluded that NEK2 interacts with YAP1 and phosphorylates it at Thr-143, thereby attenuating its ubiquitylation in vitro and promoting its stability, thereby mediating the progression of ESCC (Su et al. 2022). Previous studies have indicated that NEK2 inhibitors efficiently inhibit cancer cell proliferation by causing arrest of the cellular cycle and apoptotic cell death, and markedly inhibition of tumor growth in-vivo (Xi et al. 2017;Fang et al. 2016).
NUF2, also named CDCA1, is a critical component of the Ndc80/NUF2 composite and is necessary for the establishment of a stable kinetochore-microtubule attachment and chromosome alignment throughout mitosis ). In the study by Zheng et al.,NUF2 was identified as one of the six novel markers of ESCC and could also be used as one of the potential putative markers for ESCC treatment (Zheng et al. 2021).
TOP2A is an isoform of Topoisomerase II (TOP2), a core protein necessary for DNA replication and cell division (Wang et al. 2022). TOP2A has been identified in numerous studies as a biomarker for prognosis and a promising target for therapy in bladder cancer, including bladder urothelial carcinoma (BLCA), lung adenocarcinoma, prostate cancer, colorectal cancer, and breast cancer (Zeng et al. 2019;Du et al. 2020;de Resende et al. 2013;Zhang et al. 2018; Badawy and Loay 2019). Yang et al. identified TOP2A as 1 of the top 10 candidate genes predicted to be implicated in the treatment of ESCC, and it was also found to be the top three hub genes targeted by the majority of miRNAs (Yang et al. 2019).
Among all polyphenols, the five most common plant polyphenols are kaempferol, quercetin, prunetin, apigenin and lignan (Abid et al. 2022). A study showed that a significant reduction in the risk and potential of ovarian cancer was observed due to regular intake of apigenin. After a course of medical trials, it was postulated that apigenin together with lignocaine is an effective chemotherapeutic agent . In a study by Rahmani et al., apigenin was   h TOP2A-Luteolin, affinity = − 2.53 kcal·mol −1 ; i NUF2-Apigenin, affinity = − 2.79 kcal·mol −1 ; j NUF2-Luteolin, affinity = − 2.08 kcal·mol −1 demonstrated to play an essential role in cancer inhibition through a multitude of molecular interactions and processes that modulate apoptotic mechanisms, abnormal cell signaling, and oncogenic protein networks to inhibit carcinogenesis (Rahmani et al. 2022). Apigenin causes apoptosis in esophageal cancer cells through disruption of membrane structure, according to Zhu et al. (2016). Chen et al. found Luteolin could cause G2/M arrest of the cellular cycle and mitochondrially mediated apoptotic cell death in EC cells, while in vivo studies also demonstrated that lignocaine could markedly inhibit tumor progression in a mouse model of ESCC (Chen et al. 2017). According to the above expressions, Apigenin and Luteolin have some therapeutic effects on ESCC as well as other cancers. The treatment targets and drugs revealed by this study are both promising and innovative for personalized treatment, although this study was merely a prediction that needed to be validated by further experiments in-vivo and in-vitro.

Conclusion
To conclude, we identified five hub genes and two active compounds for the D. altaica treatment of ESCC by network pharmacology and validated their effects by bioinformatics analysis and molecular docking. The five hub genes may act as potential novel targets for detecting, prognosticating, and targeting ESCC. These findings predicted new therapeutic targets for the Kazakh traditional drug D. altaica treatment of ESCC.