Integrated computational analysis revealed hub genes in lung adenocarcinoma

Lung (LUCA) is the leading cause of cancer-related morbidities and mortalities globally. Despite the recent advancements in lung cancer research, understanding of the molecular mechanism underlying LUCA tumorigenesis and prognosis remains suboptimal. This study aims to identify the candidate biomarkers and genes in lung In this study, gene expression proles of GSE30219, GSE33532, GSE32863 and GSE43458 were downloaded from GEO. The differentially expressed genes (DEGs) in LUAD tissue and normal lung tissue with a p-value < 0.05 and a |log fold change (FC)| >1.0 were identied by GEO2R. For functional enrichment analysis of these DEGs, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were performed with KOBAS and DAVID tools. Next, the candidate hub genes were ltered out with Cytoscape using CytoHubba plugin. These hub genes were validated by (the Cancer Genome Atlas) TCGA-based gene expression analysis, protein-protein network interaction (PPI) analysis, survival analysis. Moreover, the expression of these genes in cancer and normal tissue was assessed in the Human Protein Atlas (HPA) database. In addition, miRNA network of the hub genes was constructed. Finally, DGIdb database was used to check the of the genes.


Introduction
LUCA is the second leading cause of cancer-related mortality and second to prostate and breast cancer in terms of morbidities [1,2]. According to American Cancer Society statistics, 131,880 and 235,760 lung cancer incidences and deaths were reported in 2021 [3]. There are numerous risk factors associated with LUCA, such as tobacco consumption, genetic susceptibility, poor diet and occupation exposure to air pollution [4]. Based on histopathological appearance, LUCA is classi ed into two major subtypes; nonsmall cell lung cancer (NSCLC) and small cell lung cancer (SCLC). NSCLC accounts for 85% of the total lung cancer, which is further classi ed into three major histological subtypes; lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC) and large cell carcinoma (LCC) [5]. LUAD is the primary histological subtype of lung cancer in China [6]. Surgery is the rst-line treatment and effective option for LUAD [7]. Despite the advancements in surgical procedures, and targeted and immunotherapies, the diagnosis and prognosis remain dismal with the ve-year survival rate of 10-15% [8], which is due to its diagnosis at advance or metastatic stage [9].
Understanding the molecular mechanism underlying LUAD is critical to improve its diagnosis and prognosis by identifying new reliable biomarkers and therapeutic sites. Recently, integrated bioinformatics analysis of the gene expression microarray datasets revealed various hub genes in several cancers [10][11][12]. However, the latest biomarker-related studies are inadequate, and the DEGs ndings may be inconsistent because of the complicated gene regulatory mechanism underlying LUCA tumorigenesis. Moreover, drug resistance remains a bottleneck, often resulting in treatment failure and reduced overall survival (OS) and progression-free survival (PFS). Nevertheless, RNA-sequencing or microarray-based global gene expression pro ling has facilitated the researchers to identify genes that are crucial in LUCA pathology [13]. Using computational approaches, Mengwei et al., identi ed 249 DEGs including 113 upregulated and 136 downregulated genes [14], analysis of the genes with altered expression between the cancer and normal tissue may help in the development of effective biomarkers.
In this study, we performed a multistep analysis to identify candidate genes in LUAD through integrated bioinformatics analysis. To this end, the gene expression pro les in four different gene expression datasets (GSE30219, GSE33532, GSE32863, and GSE43458) containing 263 LUAD patients and 122 normal samples were downloaded from the GEO database, screened. The DEGs were subsequently analyzed using the integrated bioinformatics tools, including GO, KEGG, and PPI network construction.
From the PPI network, hub genes were identi ed and screened for the diagnostic and prognostic potentials. Moreover, the expression of the hub genes was validated in TCGA dataset and in lung cancer cell lines. Furthermore, the miRNA network of the hub genes was constructed. In addition, hub genes targeted by the small molecular drugs were also identi ed.

Materials Methods
Cell culture: Human lung cancer cell lines A549 and PC9, and normal epithelial cell line BEAS-2B were obtained from ATCC and cultured in DMEM (Biological Industries, USA), supplemented with 10% FBS (Biological Industries) and kanamycin. Cell culture was maintained in humidi ed chamber in 5% CO 2 at 37 0 C. GEO datasets collection and data processing: The NCBI-GEO database contains freely available the gene expression pro les. GEO datasets information of the lung cancer was obtained from the GEO database. We selected 4 lung cancer datasets including GSE30219, GSE33532, GSE32863 and GSE43458. Datasets meeting the inclusion criteria de ned as; (i) human as the organism of investigation (ii) LUAD samples, (iii) dataset containing tumor and normal samples, (iv) array-based expression pro ling, the dataset with normal lung tissue sample ≥15 were downloaded from the GEO database. GEO2R a web-based tool to use for the comparison of gene expression in two different groups of the same samples under the same experimental conditions (). We used GEO2R to analyze the selected LUAD datatests and the resultant data was downloaded in tables.
We screen the genes using cut-off criteria as |log2FC> |1 and adjusted p-value of <0.05 as statistically signi cant to identify the DEGs. The average of the expression level was considered for the genes with multiple probes. Locus-by-locus volcano plot was used to distribute the DEGs. Overlapping DEGs among all the four datasets were integrated and represented with the Venn diagram. Locus-by-locus volcano plot and Venn diagram was generated by gplot and Venn diagram packages in R program, respectively [15].
The heatmap of the top20 up-and downregulated genes were generated with Heml [16].
GO and KEGG pathway enrichment analyses DAVID can explore the biological process (BP), molecular function (MF), cellular component (CC). For the functional annotation and pathway enrichment analysis of the DEGs, GO, and KEGG was used through the DAVID database. For pathway enrichment analysis, we de ned a p-value of <0.05 as a cut-off criterion. The list of upregulated and downregulated DEGs were submitted to the DAVID tool, and the top 10 terms for each function were selected. Circos plot is a web-based visualization tool to identify and analyze genomes' similarities and differences [17]. In this study, the GO function of the top 20 upregulated and downregulated DEGs are represented as Circos plot.

Protein-protein interaction network construction and modules analysis
Protein interaction networks can decipher the possible molecular mechanism of genes involved in various cellular processing pathways. We performed the protein-protein Interaction (PPI) network construction by using STRING (version 11.0). A con dence score of >0.4 was set as a threshold. Cytoscape (version 3.7.2) was used for the visualization and analysis of the PPI network. In order to reveal the sub-network of the PPI network, Molecular Complex Detection (MCODE) plugin with default parameters including degree cut off >10, node cut off=0.4, k-core= 2 and max depth=100 was set as selection criteria.
Moreover, functional pathway enrichment analysis of the DEGs in the modules was performed using the KOBAS database with default parameters.

Identi cation of hub genes
Cytohubba can predict vital nodes and subnetworks in a given biological network using several topological analysis strategies (Chin et al., 2014). Cytohubba plugin of Cytoscape was used to identify the candidate hub genes among the overlapped DEGs identi ed based on protein network. Maximal clique centrality (MCC) was selected among the other computing methods to explore hub genes modules.

Survival analysis of the DEGs
We screened all the DEGs for their prognostic potential using the Kaplan-Meier plotter database, which is an online tool to assess the prognostic potential of cancer-associated genes (more than 54 thousand genes) based on their expression level in 21 cancer types including breast cancer (6234 samples), gastric cancer (1440 samples), ovarian (2190) and lung cancer (3452). LUAD patients were classi ed into highexpression and low-expression group based on the gene expression pattern. The analysis is described as survival curves according to the hazard ratio (HR), 95% con dence interval (95% CI), and log-rank Pvalues and survival prognosiscurves. Moreover, the survival analysis of the hub genes was validated in UCALAN, which contain the available TCGA patient survival data based on RNAseq-based gene expression pro ling. The diagnostic value of the hub genes was calculated using Wilson/Brown method with a 95% con dence interval.

Gene expression analysis
Gene expression pro ling interactive analysis (GEPIA), is based on the analysis of RNA sequencing expression data obtained from TCGA and GTEx projects containing 483 LUAD and 347 normal samples. In our study, we used this database, to analyse the expression of the DEGs in different LUAD stages. ANOVA was used to assess the statistical signi cance of the gene expression variation. A Pr (>F) <0.05 was considered statistically signi cant. In addition, we obtained the gene expression pro les of LUAD patients and normal from the TCGA database available on (https://portal.gdc.cancer.gov/). The RNAsequencing expression data from TCGA was ltered and the expression of the hub genes were assessed.

Immunohistochemistry staining
The Human Protein Atlas (HPA) aims at mapping and providing free access to all the human proteins expressed in cells, tissues and organs by the integration of data derived from various omics technologies, including antibody-based imaging, mass spectrometry-based proteomics, and transcriptomics and systems biology. In this study, we used HPA to examine the expression of the hub genes expressed in human LUAD and normal bronchial tissue.

Hub genes-drugs interaction network
Using Drug-Gene Interaction Database (DGIB), hub genes that were promisingly targeted by the hub genes the were selected and only the drugs approved by the Food and Drug Administration (FDA) were included. STITCH database contains the known and predicted interaction between proteins and chemicals. In order to construct the hub gene-drug interaction network, the web-based too named STITCH was applied.

Construction of miRNA regulatory network
Gene-Cloud Biotechnology Information (GCBI) is a versatile tool that predicts the expression level of genes in diseases and healthy individual, and possible interactions of genes with non-coding RNA (ncRNA) and transcription factors. miRTarbase contain more than 360000 miRNA-target interactions.
These interactions are validated by reporter assays, western blot, microarray and next-generation sequencing [18]. We employed GCBI and miRTarbase to predict the miRNAs interacting with hub genes.
RNA Extraction and quantitative real-time polymerase chain reaction (qRT-PCR): Total RNA was extracted from A549, PC9 and BEAS-2B cells using TRIzol reagent following the manufacturer instruction (Invitrogen, USA). 2 µg total RNA was converted into complementary DNA (cDNA) using RevertAid RT reverse transcription kit (Thermo Fisher Scienti c). The SYBR-Green supermix kit (TaKara) was used to perform real-time PCR experiment to quantify the gene expression. The thermal cycling procedure was 95°C for 10 min, followed by 40 cycles of 95°C for 10 s and 60°C for 30s. The relative expression levels of the hub genes were determined by 2-ΔΔCt method and normalized to housekeeping genes; GAPDH [19]. All the qPCR reactions were performed in triplicate. The sequence of the primers used in this study is mentioned in table S1.

Statistical analysis
Gene expression data of the related genes is presented as mean ± SD and analysed using the paired ttest. All the analysis was performed with Graphpad Prism (8.0). A p-value of <0.05 was considered as statistically signi cant. The diagnostic value of the hub genes in LUAD was assessed using ROC curves and AUC obtained by Wilson/Brown method with a 95% con dence interval.

Identi cation and screening of DEGs between LUAD tissue and normal tissue in each dataset
We employed a multistep strategy to explore candidate genes, which are important in LUAD pathogenesis by integrating bioinformatics tools ( gure 1). After the preliminary search, we identi ed 4 datasets; GSE30219, GSE33532, GSE32863 and GSE43458 based on the inclusion criteria. GSE32863 and GSE43458 datasets contained only LUAD patient gene expression pro les, whereas only LUAD pathological subtype patients in GSE30219 and GSE33532 were screened. A total of 263 LUAD and 123 normal lung tissue samples were obtained in this study. The information of the dataset used in this study is summarized in table 1.
After screening the datasets based on the cut-off criteria, a set of 9014 DEGs including 3182 and 5832 up-and down-regulated genes were identi ed between LUAD and normal lung tissue (table 1). There were including 1362 upregulated and 2394 downregulated in GSE33532, 1026 DEGs, including 319 upregulated and 707 downregulated in GSE43458, and 1576 DEGs, including 644 upregulated and 932 downregulated genes in GSE32863 dataset. The volcano plots representing the distribution of the DEGs in each dataset are shown in gure 2A-D. Next, we further identi ed the genes, which were overlapped among all the four datasets. A total of 73 and 259 overlapping upregulated and downregulated genes, respectively were identi ed among all the datasets, which are represented in Venn diagrams ( gure 2E and 2F, table 2). Moreover, among these commonly shared DEGs we identi ed the top 20 upregulated and downregulated genes and displayed on the heatmap ( gure 2G).
Functional pathway enrichment analysis GO and KEGG pathway enrichment analysis deciphers the function and relationship of the genes with various biological functions. To assess the functional roles of the 73 upregulated and 257 downregulated genes, DAVID database was used (table S2). Based on the p-value and cut-off criteria, the DEGs were classi ed into three functions; biological function, molecular function (MF) and cellular function (CF). The biological function (BF) category of the GO analysis revealed the enrichment of the upregulated genes in cellular and metabolic processes, cell division, tissue development and regulation of biological processes ( gure S1A). Regarding the CF, DEGs were enriched in extracellular region part, proteinaceous extracellular matrix, cell-cell junction. ( gure S1B). Moreover, the DEGs were enriched in protein and chromatin binding and catalytic activity in MF category of the GO analysis ( gure S1C). Circos plot showing GO enrichment of the 20 upregulated genes illustrates the role of these genes in collagen catabolic and metabolic processes, extracellular structure organization and tissue development ( gure 3A). Based on KEGG enrichment analysis, the upregulated genes signi cantly enriched in the cytoplasm and extracellular region ( gure 3B). Moreover, the downregulated genes were enriched in multicellular organismal processes and regulation of biological processes, metabolic process, immune system (BF) ( gure S1D) extracellular matrix, extracellular region part, (CF) ( gure S1E), and binding and catalytic activity (MF) ( gure S1F). Circos plot showing GO enrichment of the 20 downregulated genes shows the regulation of vasculature and system development, angiogenesis, cell motility and regulation of cell proliferation ( gure 3C). Based on KEGG enrichment analysis, the downregulated genes signi cantly enriched in multicellular organismal processes, developmental processes and extracellular region ( gure 3D).

Protein-protein interaction (PPI) network analysis
The establishment of PPI network interaction of these DEGs were constructed based on the information obtained from STRING database. A list of 330 DEGs were submitted to STRING database and a network was obtained, which consisted a total number of 935 edges and 292 nodes ( gure S2A). Signi cant cluster module identi cation via MCODE-plugin in Cytoscape revealed 4 modules containing 55 genes in the whole PPI network ( gure 4A, 4B, 4C and 4D). We calculated the top-ranked hub genes by using the MCC. The top 4 functional clusters of modules were selected (module 1, MCODE score=12; module 2, MCODE score=9; module 3, MCODE score=5.0; module 4, MCODE score=5). The number of edges, nodes and genes in 4 modules are listed in table 3. DAVID database was used to perform the KEGG pathway analysis of the genes in modules (table S3). The genes in module 1 were involved in cell cycle, cell division, p53 signaling pathways. The genes in module 2 were enriched in HIF-1, TNF and PI3-Akt signaling pathways. The genes in module 3 were involved in vascular smooth muscle contraction, protein digestion and absorption, relaxin signaling pathway and pathways in cancer. Moreover, the genes in module 4 were enriched in adherens junction, acute myeloid leukemia, PPAR signaling pathway and FoxO signaling pathway. Next, the cytohubba plugin in cytoscape was applied to identify the hub genes in the modules and obtained 10 hub genes (KIF20A, MELK, CCNB2, CDC20, TOP2A, PECAM1, CDH5, IL6, ZBTB16 and RNF144B) for LUAD. Among the 10 hub genes, 5 genes were upregulated and 5 genes were downregulated. STRING database was used to construct the protein interaction network of the hub genes with each other and with neighboring genes ( gure S2B and S2C). The interaction network of the hub genes comprised up of 10 nodes, 17 edges with a p-value of 3.57e-05. In addition, hub genes coexpression analysis revealed possible interaction of among the hub genes ( gure 4E).

Validation of the prognostic and diagnostic potentials of the hub genes
In order to corroborate the importance of the and hub genes in LUAD, the prognostic potential of the hub genes was investigated by using KM plotter database.

Validation of the expression of hub genes
In order to validate the expression of hub genes in TCGA, fragments per kilobase of transcript per million mapped reads (FPKM) gene expression pro les from the TCGA database available at (https://portal.gdc.cancer.gov/) were downloaded ltered and merged to produce one le, which contained the gene expression data for both LUAD and normal samples. The expression of the hub genes was assessed. The result showed that KIF20A, MELK, CCNB2, CDC20 and TOP2A genes were signi cantly upregulated in LUAD patients compared to the normal samples ( gure 6A). However, PECAM1, CDH5, LEPREL1, ZBTB16 and RNF144B were downregulated in LUAD patients ( gure 6B).
Next, the expression of hub genes was determined in two lung cancer cell lines; A549 and PC9, and normal bronchial epithelial cells BEAS-2B. We found that the expression of KIF20A, MELK, CCNB2, CDC20 genes was upregulated in lung cancer cell lines compared to the normal bronchial epithelium cell ( gure 6C), whereas, the expression of PECAM1, CDH5, LEPREL1 genes was reduced in the lung cancer cell lines compared to BEAS-2B ( gure 6D).

Veri cation of the expression of hub genes on protein level
In addition, the expression of the hub genes was assessed at protein level in the LUAD and normal lung tissue and using HPA database. The expression of KIF20A, MELK, CCNB2, CDC20 and TOP2A was highly expressed in LUAD tissue compared to the normal lung tissue ( gure 7A-7E). Contrarily, a reduced expression of PECAM1, CDH5, LEPREL1, ZBTB16 and RNF144B in LUAD as compared to the normal lung tissue ( gure 7F-7J). The clinical information of the immunohistochemistry (IHC) in human proteome atlas is available in table S4.

Hub gene-drug interaction network
Using DGIdb for exploring the drug-gene interaction, a list of 117 potential drugs for treating LUAD was identi ed (table 5). These drugs are all FDA-approved drugs. Among the hub genes, TOP2A was found to interact with 97/118 drugs, either as an inhibitor or in an unknown mechanism, followed by MELK, which interacted with 18 drugs ( Given the role of miRNA in regulating biological processes, we identi ed the miRNAs targets of the hub genes using mirRTarbase and GCBI databases. The miRNA network was constructed and visualized with Cytoscape, which consisted of 9 hub genes and 165 interacting miRNAs. The interaction of the miRNAs with their corresponding hub genes are represented in gure 9A. Based on their degree of closeness, the top 10 molecules in the interaction network were ltered. RNF144B, ZBTB16 and CDH5, LEPREL1 and TOP2A with a degree score of 30,30,27,25 and 16, respectively were the top 5 miRNA targeting genes. Moreover, hsa-miR-16-5p, hsa-miR-128-3p, hsa-let-7f-5p, hsa-let-7c-5p and hsa-let-7a-5p, were the top 5 interactive miRNAs with a score of 52, 47, 46, 45.95 and 45.96, respectively ( gure 9B). The miRNA network of the 9 hub genes constructed by using GCBI database is shown in gure S5.

Genetic alteration of the hub genes
Investigation of the mutation in genes associated with biological functions are important. Several mutations in these genes can lead to LUCA [20] We investigated the genetic alteration in the hub genes using Cbioportal and identi ed various alteration in 19% of patients or samples. The highest frequency of alteration was observed in MELK, CDC20, TOP2A, PECAM1 and CDH5 (7%, 3%, 2.6%, 2.6% and 1.3%), respectively. These alterations included missense mutation, truncating mutation, ampli cation, deep deletion and no alteration ( gure 11A). Among the alterations, ampli cation accounted for the highest percentage of the alterations followed by deep deletion and mutations ( gure 11B).

Discussion
Lung cancer contributes to the highest proportion of cancer-related deaths worldwide. LUAD is the most common histological type of lung cancer. Despite the remarkable improvements in surgical procedures and medical therapy made over the last few years, the overall survival of LUAD patients remains poorer. The cause of such high mortality is mainly due to diagnosis of LUAD patients in advanced stages, which is primarily due to the lack of effective diagnostic tools. The dysregulated expression of numerous genes has been reported in LUAD. Gene expression pro ling of the lung cancer patients vs normal individuals has facilitated the modern research to identify the genes that are important in disease diagnosis, prognosis, and therapy. Recently, advancements in bioinformatics tools, a growing amount microarray and high throughput sequencing data have conferred a simple and sophisticated platform to discover genetic alteration, identify DEGs and understand the molecular mechanism of tumor diagnosis and prognosis, and therapy [21].
In this study, using bioinformatics analysis of the four GEO datasets of lung cancer, we identi ed 332 DEGs including 73 up-and 259 down-regulated genes. The DEGs with elevated expression were enriched in cellular and metabolic processes, cell division, tissue development and regulation of biological processes. The downregulated DEGs were enriched in biological functions such as multicellular organismal processes and regulation of biological processes, metabolic process, immune system. Pathway enrichment analysis via KEGG and GO revealed the role of these genes in various biological pathways related to lung cancer including developmental process, immune system processing, cellular component organization, and regulation of biological processes, cellular and metabolic processes.
PPI network of a list of 330 DEGs was constructed using STRING database, which consisted a total number of 935 edges and 292 nodes and identi ed 4 modules containing 55 genes (table 3). The functional enrichment of modules genes revealed that these genes regulated cell cycle, cell division, p53 signaling pathways, vascular smooth muscle contraction, protein digestion and absorption, relaxin signaling pathway and pathways, adherens junction, acute myeloid leukemia, PPAR signaling pathway and FoxO signaling pathway. These ndings also highlighted important clue to exploring the molecular interaction involved in LUAD progression.
Top ranked hub genes were selected using Cytoscape. Next, the expression level of these hub genes in cancer and normal lung tissue was veri ed using GEPIA database. The expression results were in line with that of the GEO database. The differential expression of these hub genes in different LUAD stages were also assessed. Owing to the signi cant differential expression of CDC20, MELK and TOP2A in the early and advance LUAD patients ( gure 10), we postulate that this nding may answer why the increased expression of CDC20, MELK and TOP2A related to poorer PFS in early LUAD stage ( gure S6B and C). Whereas RNF144B was associated with better OS and PFS in advanced stages in LUAD patients ( gure 10 and gure S6I). In addition, these genes also possess high high diagnostic accuracy with AUC <0.9 ( gure S4). Therefore, we proposed whether CDC20, MELK, TOP2A and RNF144B might be considered as biomarker in the diagnosis and prognosis of LUAD. We further identi ed the expression of these hub genes based on the TCGA gene expression data of the LUAD patients and normal samples, which further corroborated the importance of these hub genes in LUAD. The processes of tumor invasion and metastasis are governed by complex molecular mechanism. In this study, the dysregulated expression of hub genes was obvious LUAD tissues compared with normal lung tissue ( gure 7), suggesting the possible role of these genes in tumorigenesis and progression. Despite many studies have described these hub genes related with diagnosis, prognosis and treatment of various cancers, the precise regulatory mechanism of LUAD tumorigenesis and progression remain elusive.
Compared with other studies, this study integrates various datasets with a larger sample size including 263 cancer and 122 normal lung tissues, respectively. Moreover, gene expression data of LUAD patient have been investigated in this study.
Kinesin-like family member 20A (KIF20A) plays a crucial role in organization of central spindle at anaphase and cytokinesis stages. It is an oncogene, which facilitate cancer cell proliferation and inhibiting apoptosis thus facilitating the malignant phenotype of LUAD [22] . The overexpression of KIF20A predicted the poor OS and PFS in LUAD patients in our study ( gure 5B and gure S6A). Moreover, the increased expression of KIF20A was found in lung cancer cell lines ( gure 6C and D), meaning the tumorigenic role of KIF20A in LUAD. Maternal Embryonic Leucine Zipper Kinase (MELK) is a cell cycledependent kinase protein that has been reported in various cancer including SCLC [23]. However, the precise tumorigenic mechanism of MELK needs further studies. In our study, we found that the increased expression of MELK was related with poorer OS and PFS ( gure 5C and gure S6B). In addition, we also found numerous potentially interacting miRNAs ( gure S5). Cyclin B2 (CCNB2) involved in G2/M transition, has been associated with tumor degree, size, stage and metastasis as well as poor prognosis in NSCLC [24]. However, the role of CCNB2 in LUAD tumorigenesis is not well-elucidated. Type 2 topoisomerase alpha (TOP2A) encodes a nuclear enzyme called DNA Topoisomerase II, which is involved in DNA replication. TOP2A has been reported in many cancers including lung cancer [25]. The downregulation of TOP2A reduced the proliferation, migration and invasion abilities in lung cancer cells [25]. T2a and HER-2 of TOP2A in epithelial ovarian carcinoma was rst reported by [26], whereas in this study, the alteration in TOP2A mainly included ampli cation, missense mutation and truncated mutations ( gure 11). The cell cycle division cycle protein 20 homolog (CDC20), a vital cell division regulator has been implicated in lung cancer [27]. The high expression of CDC20 have been documented to be associated with poor prognosis in multiple cancers [28][29][30]. In our study, the higher expression of CDC20 favored a poorer OS and FPS, suggesting that the diagnostic potentials of CDC20 ( gure 5E and gure S6E).
In addition to the upregulated hub genes, downregulated hub genes were also identi ed in the present study. Platelet endothelial cell adhesion molecule-1 (PECAM-1) a member of immunoglobulin superfamily, is expressed on the vascular endothelium and has been involved in the late progression of metastatic tumors. The microvessel density (MVD) of PECAM-1 was associated with tumor stages and a high PECAM-1 MVD had worse overall survival in either adenocarcinoma or EGFR mutation subgroups [31]. We found a reduced expression of PECAM-1 in the later LUAD stages compared to the initial stages and the reduced expression is related to poor PFS and OS ( gure 10, gure 5G and gure S6F). The expression of cadherin 5 (CDH5) also known as vascular endothelial cadherin (VE-cadherin) is an angiogenic factor. The aberrant expression of VE-cadherin has been found to correlated with lymph node metastasis in NSCLC, suggesting its role in tumor metastasis [32]. However, the biomarker potentials of CDH5 in LUAD is not documented. Indeed, we evidenced a differential expression of CDH5 in LUAD tissues as compared to normal lung tissue ( gure 6B) and the decrease expression was correlated with worse OS and PFS ( gure 5H and gure S6G). Leprecan-like 1 (LEPREL1), also known as P3H2 is abundantly found in basement membrane [33]. The lower expression of LEPREL1 has been reported in hepatocellular carcinoma (HCC) without any correlation with the clinical parameters of HCC [34]. P3H3, a member of the prolyl hydroxylase family exhibits tumor suppressor activity in lung cancer. P3H3 is silenced by the DNA methylation and the ectopic expression of P3H3 circumvented cell proliferation, colony formation, metastasis and invasion of lung cancer [35]. We found the decreased expression of LEPREL1 in A549 and PC9 cell lines as compared to the BEAS cell line and in cancer patients as compared to healthy person ( gure 6). Moreover, the lower expression of LEPREL predicted the poorer OS and PFS ( gure 5I and gure S6J). Further studies are needed to explore whether LEPREL1 is an independent factor in the diagnosis and/or development of LUAD. Zinc nger and BTB domaincontaining 16 (ZBTB16) a transcription factor, which regulate cell proliferation, differentiation, development of organ, stem cell maintenance and development of innate immunity, has been reported both as a tumor suppressor and oncogene in various cancer [36]. However, its role in lung cancer is not well-explained. We found the decreased expression of ZBTB16 in TCGA-dataset and in lung cancer cell lines and the decreased expression was also associated with poor OS and PFS of lung cancer patients ( gure 5K, gure 6 and gure S6H). RNF144B also known as PIR2, is an E3-ubiquitin ligase involved in the regulation of cell apoptosis and proliferation. It has been recognized as important biomarker in endometrial cancer [36,37]. We observed a reduced expression of RNF144B in LUAD tissue, TCGA dataset and in lung cancer cell lines ( gure 6). Moreover, the downregulation of RNF144B was correlated with poor OS of lung cancer patients ( gure 5J and gure S6I). These ndings suggest the biomarker potentials of PECAM1, CDH5, ZBTB16 and RNF144B in LUAD. Moreover, we further con rmed the differential expression of hub genes in LUAD tissue vs normal lung tissue using HPA ( gure 7), which further corroborated the importance of these hub genes in LUAD.
In this study, a list of 150 potential drugs for treating LUAD was identi ed. Among the hub genes, TOP2A and MELK was potentially targeted by 96 and 17 drugs, respectively. Most of these drugs are inhibitors in nature. Only a few of the TOP2A and MELK targeting drugs have been used for the treatment of lung cancer such as etoposide (topoisomerase II inhibiting anti-cancer drug) has been used for treating NSLC [38], dexrazoxane (a catalytic inhibitor of DNA topoisomerase II that lack double stranded DNA break activity) possess with protective function against anthracycline-induced cardiotoxicity and extravasation injuries, has been administered with cyclophosphamide, adriamycin and vincristine (Advanced small cell lung cancer treated with CAV [39], sunitinib (a tyrosine kinase inhibitor possessing broad antitumor e cacy) [40] and nintedanib (inhibitor of receptor tyrosine kinase and nonreceptor tyrosine kinase) along with docetaxel is administered for NSCLC patients after rst-line chemotherapy in European countries [41]. Further investigation and clinical trials are needed to explore the mechanism of these drugs that could be effective against lung cancer. Consequently, based on the ndings of this study and aforementioned studies, these genes are crucial in LUAD tumorigenesis and we hope that future studies may concentrate on investigating the detailed mechanism of these hub genes, especially PECAM-1, CDH5, ZBTB16, RNF144B, TOP2A and MELK.
hsa-miR-128-3p contributes to the development of chemoresistance and metastasis NSCLC patients by suppressing the expression of the inhibitors of Wnt/β-catenin and TGF-β pathways. Whereas the antagonism of hsa-miR-128-3p reverse metastasis and chemoresistance in NSCLC patients [43]. hsa-let-7f-5p hsa-let-7a-5p showed the lowest expression in NSCLC as compared to chronic obstructive pulmonary disease patients and healthy individuals [44]. However, the interaction of these miRNAs is based on the in-silico prediction and needs experimental validations. Therefore, in view of the vital role of these 10 hub genes based on the ndings of this study and the previously published studies described above, further research is needed to understand the precise molecular mechanisms in the tumorigenesis and prognosis of LUAD, especially exploring MELK, CCNB2, PECAM1, ZBTB16 and RNF144B.
Despite the interesting ndings, the are still several shortcomings in the present study. Firstly, the data is obtained from the public databases and datasets, not produced by the authors. Secondly, the gene expression analysis was performed in TCGA and GEPIA databases, and in cell lines but not validated in the lung cancer patients. Moreover, due to the lack of functional genetic studies, we couldn't investigate the impact of the hub gene-miRNA networks on the detection and treatment of LUAD. Nevertheless, the integrated bioinformatics approach used in this study provide more robust results compared to single dataset studies, and opens up new research avenues to investigate these hub genes in future.
In summary, we identi ed signi cant DEGs in lung cancer by integrated bioinformatics analysis.
Consequently, these DEGs were screened out and 10 hub genes were identi ed, which were further validated in TCGA, GEPIA and HPA databases, and lung cancer cell lines. Moreover, a list of drugs interacting with the hub-genes were identi ed. Finally, candidate miRNAs interacting with hub genes were identi ed. These hub genes may pave the way for the development of novel and reliable biomarkers in LUAD.