Identication of Hub Genes, Modules and Metabolic Pathways Associated With Lung Adenocarcinoma: A System Biology Approach

Lung cancer is the most common and fatal malignant tumour worldwide with a ve ‐ year overall survival rate of only 15%. Lung adenocarcinoma (LUAD) is a heterogeneous disease. The use of microarray datasets along with bioinformatics knowledge might help to clarify the expression prole of cancer, molecular markers for the initial screening of tumour and the underlying biological mechanisms. The present study is designed to identify differential expression genes and molecular mechanisms of LUAD compared to normal lung tissues using systems biology approaches. Methods Four GSE datasets (GSE75037, GSE63459, GSE32863 and GSE10072) were selected from the Gene Expression Omnibus (GEO) database. Data processing and meta-analysis were performed using the R statistical programming language, The differentially expressed genes (DEGs) associated with each stage were obtained. The common and unique DEGs between stages of LUAD and adjacent normal lung tissues were initiated by Venny tool. Common genes, including upregulated and downregulated genes, were then analyzed to a STRING database to obtain protein-protein interaction (PPI). STRING output was analyzed by MCODE and CytoHubba applications of Cytoscape to identify modules of co-expression and hub genes, respectively. Results The shared upregulated and downregulated DEGs among LUAD stages were 22 and 140 genes, respectively, when compared to normal lung tissues. Unique genes for each stage were also identied. The hub genes were PECAM1, TEK, CDH5, VWF and ANGPT1. Most of the top cluster genes were enriched for Gα(s) signalling events, GPCR ligand binding, class B/2 (Secretin family receptors), platelet activation, signalling and aggregation in the main three co-expression clusters. Most of the shared genes (16 genes) were enriched in the metabolic pathway of hemostasis. Meaningful signaling pathways for unique genes were found at each stage. Conclusions the present three co-expression clusters, metabolic pathways and biological processes identied underlying LUAD pathogenesis, development and progression at different stages. Unique upregulated and downregulated DEGs each stage identied FERMT1 SIX1 specic early-stage diagnostic biomarkers for stage IB and IIB. 5 hub genes were observed, including PECAM1, TEK, CDH5, vWF and ANGPT1 which might be crucial for the onset and progression of LUAD. the patients with LUAD 16, 17 . However, these studies examined only a limited number of patients and restricted genetic dysregulations or geographical variation 18 . Therefore, the application of computational systems biology approach might help to shed lighting on the molecular aspects of LUAD. Discovering the molecular mechanisms involved in LUAD can help to identify the pathogenesis of this cancer. On the other hand, nding specic LUAD biomarkers in any stage of the disease can help with the screening and early diagnosis. The present study is designed to identify differentially expressed genes and molecular mechanisms of LUAD compared to normal lung tissues using systems biology approaches. Lastly, gene expression patterns at each stage of LUAD, ranging from I to IV, pluscommon and denite dysregulated genes related to each stage were represented. gene for which no meaningful pathway was found. The signaling pathways associated with the unique genes in Stage IIA were Translocation of ZAP-70 to Immunological synapse (FDR = 0.04), Phosphorylation of CD3 and TCR zeta chains (FDR = 0.04) and PD-1 signaling (FDR = 0.04). In Stage IIB, there was only the SIX1 gene, which did not nd a signicant path based on the Reactome database, but from the KEGG database, the pathway of transcriptional misregulation in cancer (FDR = 0.01) was found. The only signicant signaling pathway from the unique genes in Stage IIIA was Listeria monocytogenes entry into host cells (FDR = 0.04). No signicant path was found in Stage IIIB. Finally, in Stage IV, there was only the signaling pathway of serotonin metabolism (FDR = 0.04). binding of Tie2 induces Tie2 autophosphorylation, which leads to vascular integration 46 . The overall effect of ANG-1, ANG-2 and TIE-2 on tumour include angiogenesis, inammation and vascular injection 47 . ANGPT1 maintains normal distribution of PECAM1 and junctional adhesion molecule A (JAMA) in tumour vessel walls 48 .


Introduction
Lung cancer remains the most prevalent and fatal malignant tumour on a global scale with a ve-year overall survival rate of only 15% 1 . Nearly 85% of lung cancer cases are categorized into non-small cell lung cancer (NSCLC) which includes adenocarcinoma and squamous cell carcinoma 2,3 . Lung adenocarcinoma (LUAD) is the most common histological NSCLC subtype with an estimated frequency of approximately 40 percent 4 . During the last decade, early-stage detection of LUAD using low-dose computed tomography (LDCT) and treatment modalities including surgical intervention followed by chemo/radiotherapy regimens have been used; notwithstanding, recurrence is a common event, ranging from 30 to 75 percent. In addition, more than 80 percent of relapses occur within 2 years after conventional treatments 5,6 . Surprisingly, signi cant improvements have been achieved in LUAD treatment as reported by molecular targeted therapy and immunotherapy such as the immune checkpoint inhibitors. However, the incidence rate of LUAD is showing an alarming trend within non-smokers at younger ages 7,8 . The insight that LUAD grows slower than other histological subtypes -as well as with higher probability of detection prior to -has sparked signi cant interest among scientists 4 .
LUAD is a heterogeneous disease 11 and the examination of microarray-based expression pro les could identify thousands of genes independently, while the use of network analysis based on system biology can de ne transcription networks and key genes involved in disease 12 . The use of microarray datasets along with bioinformatics knowledge might help to clarify the expression pro le of cancers and, consequently, different biological mechanisms can be obtained 13 . Several studies have shown that molecular markers can be used for the initial screening of tumours 14 . Candidate upregulated and downregulated genes selected by bioinformatics analysis can be assayed in biochemical analysis and subsequent steps 15 .
Growing evidence has shown that previous studies were conducted to determine the genetic changes in the patients with LUAD 16,17 . However, these studies examined only a limited number of patients and restricted genetic dysregulations or geographical variation 18 . Therefore, the application of computational systems biology approach might help to shed lighting on the molecular aspects of LUAD. Discovering the molecular mechanisms involved in LUAD can help to identify the pathogenesis of this cancer. On the other hand, nding speci c LUAD biomarkers in any stage of the disease can help with the screening and early diagnosis. The present study is designed to identify differentially expressed genes and molecular mechanisms of LUAD compared to normal lung tissues using systems biology approaches. Lastly, gene expression patterns at each stage of LUAD, ranging from I to IV, pluscommon and de nite dysregulated genes related to each stage were represented.

1) Eligibility criteria and dataset selection
The gene datasets were screened by two independent authors (R. R. and M. H.) and those deemed relevant were selected according to the inclusion and exclusion criteria. The datasets were identi ed according to the following inclusion criteria: differential gene expression analysis of primary LUAD compared to normal lung tissues without any prior treatment and data availability on the de nite (certain) stage of patients. The gene datasets with speci c categorization were excluded.
2) Processing of GEO dataset and identi cation of differentially expressed genes GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r/) is a web-based instrument on GEOquery and limma packages from the Bioconductor project, which can compare two or more groups of specimens to recognize differentially expressed genes (DEGs). In the current study, each dataset was assigned either LUAD or normal tissue and GEO2R was then applied to identify DEGs between them. The occurrence of false-positive results was corrected using adjusted p values (adj. p value (Benjamini-Hochberg procedure)). The adj. p value< 0.05 and log2FC> |1| were set as cut-off values.
3) Microarray data processing and integrative meta-analysis data processing and integration procedures were performed using the R statistical programming language. The aforementioned four datasets made use of different and very popular platforms (Illumina and Affymetrix). A sensitive step in the integration of heterogeneous data is normalization (2), Before data merging, each dataset was rst normalized using the Limma package. The Surrogate Variable Analysis (SVA) package was used for the removal of batch effects (non-biological differences) using the ComBat function (3). Batch effect removal was checked by Principal Component Analysis (PCA) and boxplot.
The outcome of the meta-analysis is a unit expression matrix, namely a combination of four datasets of this study based on common Entrez IDs. We extrapolated DEGs associated with each stage from the unit expression matrix. For the de nition of DEGs, log2 fold change ≥ |1.5| and adjusted p-values ≤ 0.001 were used. Given that the data were acquired from multiple microarray platforms, different pre-processing approaches were applied for each platform.

4) Co-expression network construction in LUAD and adjacent normal lung tissues
The co-expression network and hub genes at different tumour stages of LUAD and adjacent normal lung tissues are initially processed by determining the common DEGs by Venny tool (http://bioinformatics.psb.ugent.be/webtools/Venn/). The list of common DEGs was then subjected to a STRING database to obtain protein-protein interaction (PPI). The outcome of STRING was analyzed by MCODE and CytoHubba (using the Maximal Clique Centrality (MCC) function) plugin of Cytoscape to identify modules of co-expression and hub genes respectively. Lastly, top modules and hub genes were selected according to the highest score.

Results
Dataset selection, differential gene expression analysis and identi cation of DEGs Four gene expression microarray datasets, including GSE75037, GSE63459, GSE32863 and GSE10072 were selected for our study. The raw data were obtained from 219 normal lung adjacent tissue and 232 LUAD patients with a wide range of tumour stages, from IA and IB to IV ( Table 1). The GSE75037 dataset consisted of 83 LUAD patients versus 83 controls, while the dataset GSE63459 consisted of 33 LUAD patients versus 32 controls. GSE32863 and GSE10072 datasets consisted of 58 LUAD patients versus 58 controls, and 58 LUAD patients versus 49 controls, respectively. Before integrative meta-analysis, each dataset was analyzed and genes with log2 fold change ≥ |1| and adjusted p-value threshold of 0.05 were considered as signi cantly differentially expressed.

Venny Analysis
After independent analysis of signi cant upregulated and downregulated DEGs at each stage with Venny software, the shared upregulated and downregulated DEGs among different stages of LUAD cases compared to normal lung tissues were 22 and 140, respectively ( Table 2). The expression level of shared 162 genes among all stages of LUAD cases relative to normal lung tissues is shown in gure 1.
Unique genes for each stage were also identi ed, with speci c details reported in Table 3. Our ndings support the idea that the highest number of unique DEGs genes between is reported by LUAD patients with stage IV, including 84 upregulated and 55 downregulated genes. Detailed information on venny analysis can be found in Supplementary File 1.

PPI Network, Module and hub genes Analysis
At this stage, shared genes (162 common DEGs) were used to elaborate a protein network (protein-protein interaction) with the STRING database (https://string-db.org/). Subsequently, using the Molecular Complex Detection (MCODE) and CytoHubba applications, co-expression clusters and the highly connected hub genes were identi ed respectively (Fig 2). Table 4 shows the top three co-expression clusters. The hub genes included PECAM1, TEK, CDH5, VWF and ANGPT1. Gene enrichment analysis of the top three cluster genes was performed by ToppGene platform. Most of the top cluster genes were reported within metabolic pathways such as Gα (s) signalling events, GPCR ligand binding, class B/2 (Secretin family receptors), Platelet activation, signalling and aggregation (Table 5).

GO and pathway enrichment analysis
In addition to enrichment analysis, genes that were common to all stages were further analyzed using ToppGene tool. Most of the genes (16 genes in total) were found to be enriched in the metabolic pathway of Hemostasis and other superior metabolic pathways including Extracellular matrix organization, Platelet degranulation, Response to elevated platelet cytosolic Ca2+, Cell surface interactions at the vascular wall and G alpha (s) signaling events. Information of the aforementioned metabolic pathways are reported in Table 6.
Biological processes from gene ontology (GO) analysis were also examined with the ToppGene site. The top 25 biological processes can be seen in Table 7.
Most genes are involved in the regulation of cell population proliferation whereas cell adhesion, biological adhesion, and response to endogenous stimulus were the processes that involved the most genes. Details of biological processes are available in supplementary le 2.
Pathway enrichment analysis for unique genes As previously mentioned, Table 3

Discussion
Lung cancer is the most common malignant tumour, with the highest cancer-related mortality rate (22%) 19 and a ve-year overall survival rate of approximately 15% 1 . For these reasons, lung cancer has become of major interest in clinical research 20 . The two main hystotypes of lung cancer include nonsmall cell lung cancer (NSCLC) (85%) and small cell lung cancer (SCLC) (15%). NSCLC is further classi ed into three categories: adenocarcinoma, squamous cell carcinoma and large cell with adenocarcinoma the most common subtype of NSCLC 21 . Due to the poor prognosis of NSCLC, investigation of molecular mechanisms responsible for NSCLC development together with the discovery of biomarkers are of great importance for early detection 22 . One Promising strategy for the identi cation of main biomarkers is to investigate DEGs in a disease.
In the present study, four microarray datasets including GSE10072, GSE32863, GSE63459 and GSE75037 were analyzed. After merging the four mentioned datasets, DEGs at each LUAD stage, in comparison with the control (normal lung tissues), were identi ed from a single expression matrix. The shared upregulated and downregulated DEGs among all LUAD stages were 22 and 140 genes, respectively. Gene set enrichment analysis for shared genes showed that most of the genes (16 genes) were within the metabolic pathway of Hemostasis, the other superior metabolic pathways were: Extracellular matrix organization, Platelet degranulation, Response to elevated platelet cytosolic Ca2+, Cell surface interactions at the vascular wall and G alpha (s) signaling events. Tumor cells or tumor-related in ammatory cells can stimulate the homeostasis system through expression of precoagulation proteins, exposure to precoagulant lipids, release of in ammatory cytokines, and adhesion to host vascular cells. Coagulation activation and tumor progression are closely linked (Coagulation and cancer: biological and clinical aspects/Analysis of haemostasis biomarkers in patients with advanced stage lung cancer during hypofractionated radiotherapy treatment). Unique upregulated and down-regulated DEGs for each stage were also identi ed. Comparison of stage IB and IIB showed only one upregulated gene for each stage, which included FERMT1 and SIX1, respectively. FERMT1 and SIX1 can be examined as speci c biomarkers for stage IB and IIB.
Fermitin family member 1 (FERMT1, Kindlin-1) is a focal adhesion protein and an epithelial-speci c regulator of integrin functions 23 and reported to be expressed in 60% of lung cancers. Although overexpressed in the NSCLC epithelium, FERMT1 is correlated with the differentiation of lung cancer 24 . In lung cancer, Kindlin-1 downregulates Axin2, a component of the wnt signaling pathway, and upregulates Claudin-1 and -3 (tight junction molecules) 25 . SIX1 is a homeodomain transcription factor and a downstream target of Notch2 26 . In one study, Xia Y et al. reported that SIX1 increased NSCLC invasion and proliferation. Upregulation of SIX1 was observed at both mRNA and protein levels by Mimae T et al., making way for lung adenocarcinoma invasion 27 . These two biomarkers can be used to identify the preliminary stages of the disease.
Meaningful signaling pathways for unique upregulated and downregulated DEGs (table 3) were examined using the ToppGene database and the Reactome database at each stage. In stage IA for the CRLF1 gene, IL-6-type cytokine receptor ligand interactions and Interleukin-6 family signaling pathways, for the MFAP5 gene, the pathways of Molecules associated with elastic , Elastic bre formation and nally for the AKR1B10 gene, metabolic pathways of Retinoid metabolism and transport , Metabolism of fat-soluble vitamins and Visual phototransduction were signi cant. The signaling pathways associated with the unique genes in Stage IIA were Translocation of ZAP-70 to Immunological synapse, Phosphorylation of CD3 and TCR zeta chains and PD-1 signaling. In Stage IIB, there was only the SIX1 gene; the pathway of transcriptional misregulation in cancer was found from the KEGG database. The only signi cant signaling pathway from the unique genes in Stage IIIA was Listeria monocytogenes entry into host cells. Finally, in Stage IV, there was only the signaling pathway of serotonin metabolism.
Among the shared genes of all stages, a protein network was plotted, subsequently top 3 co-expression clusters and 5 hub genes were identi ed with the MCODE and CytoHubba plugin, respectively. Most of the top cluster genes were enriched in metabolic pathways such as Gα(s) signaling events, GPCR ligand binding, class B/2 (Secretin family receptors), Platelet activation, signaling and aggregation. The hub genes included PECAM1, TEK, CDH5, VWF and ANGPT1 of which the CDH5 and VWF genes were in the third co-expression cluster. These 5 hub genes were part of the downregulated DEGs for all stages.
PECAM-1 (platelet endothelial cell adhesion molecule-1) is a 130 kDa transmembrane glycoprotein also known as CD31 28 . This protein is a member of the immunoglobulin superfamily and is commonly expressed on the surface of platelets, monocytes, neutrophils, endothelial cells, and a large population of circulating T lymphocytes 28,29 . Functions of PECAM-1 include leukocyte migration, angiogenesis, integrin activation and the intercellular junction of endothelial cells 30 . Although phosphorylation of PECAM-1 is associated with aggregate-dependent growth and cell adhesion, its degradation reduces tumour cell size and the colony-forming ability 31 . PECAM1 is involved in LUAD tumourigenesis by regulating the expression of vascular endothelial growth factor. In fact, it can maintain and restore vascular integrity and is commonly used as a marker for angiogenesis 28,32 .
The TIE receptor family, which incorporates TIE1 and TIE2 and is likewise referred to as TEK, plays an essential function in vasculature formation 33 and is mainly expressed in endothelial cells 34 . In the study of Tao Peng et al., TEK has been reported to be downregulated in LUAD tissues and cell lines, indicating poor patient survival; additionally, overexpression of TEK in LUAD cell lines showed impaired proliferation and aggression abilities 35,36 . TEK was identi ed in 2018 as the gene involved in angiogenesis, cell growth, cytokine secretion and in ammatory response in the lung adenocarcinoma 36 .
CDH5 or VE-cadherin is a transmembrane protein that is responsible for cell-cell adhesion of Endothelial cells (ECs) 37 and promotes the integration of ECs 38 . In lung cancer, the expression of VEGF and CDH5 in circulating microvesicles is correlated with distant metastasis. Researchers have found that CDH5 expression in NSCLC tissues is related to node metastasis and poor prognosis 39 . CDH5 is important for angiogenesis in tumour and promotes tumour progression via the transforming growth factor -β signalling pathway 40 .
Von Willebrand factor (vWF) is a large multi-subunit glycoprotein that leads to platelet adhesion and accumulation to the sub-endothelium. vWF is observed in platelets, endothelial cells and megakaryocytes 41 . This molecule plays a pivotal role in homeostasis. Not only it helps platelets to be absorbed in damaged endothelium but also latest research displays its tumour angiogenesis, in ammation and metastasis potential. 42,43 . On the surface of tumour cells, several vWF receptors such as glycoprotein Ib, integrin αIIbβ3 and αVβ3 have been identi ed; however, a direct link between vWF and tumour cells has also been observed 44 . vWF is a highly speci c marker for vascular endothelial cells 45 . In many cancers, high levels of vWF have been reported in plasma and tumour microenvironment, indicating an advanced stage of the disease. vWF facilitates tumour cell growth by upregulating in ammatory signals, angiogenesis and vascular permeability 43 .
Angiopoietins belong to a growth factor family required for the formation of blood vessels, including ANG-1, which is the Tie2 Receptor ligand 46 . ANG-1 binding of Tie2 induces Tie2 autophosphorylation, which leads to vascular integration 46 . The overall effect of ANG-1, ANG-2 and TIE-2 on tumour include angiogenesis, in ammation and vascular injection 47 . ANGPT1 maintains normal distribution of PECAM1 and junctional adhesion molecule A (JAMA) in tumour vessel walls 48 .
The top 25 biological processes from gene ontology (GO) analysis showed that most genes were involved in the regulation of cell population proliferation; cell adhesion, biological adhesion, and response to endogenous stimulus were the processes in which most genes were subsequently involved.
The results of this study, obtained through bioinformatics analysis based on the system biology approach, provide details on potential diagnostic biomarkers, genes and signalling pathways involved in LUAD pathogenesis. In order to verify these results, further studies and molecular biological experiments are needed.
Conclusions 22 upregulated and 140 downregulated shared DEGs, based on the analysis of 4 microarray datasets, were identi ed among all stages of LUAD when compared to normal lung tissues. Unique upregulated and downregulated DEGs for each stage were also identi ed. FERMT1 and SIX1 can be used as a speci c biomarker for stage IB and IIB to diagnose early-stage disease. 5 hub genes were obtained including PECAM1, TEK, CDH5, vWF and ANGPT1 which might be crucial for the onset and progression of LUAD. Top co-expression clusters, metabolic pathways and biological processes were identi ed which might help to better understand the underlying mechanism in LUAD development and progression at different stages.
All methods were carried out in accordance with relevant guidelines and regulations.

Declarations
Ethics approval and consent to participate Not applicable.
Consent for publication I con rm that I understand BMC Bioinformatics is an open access journal that levies an article processing charge per articles accepted for publication. By submitting my article I agree to pay this charge in full if my article is accepted for publication.
No, I declare that the authors have no competing interests as de ned by BMC, or other interests that might be perceived to in uence the results and/or discussion reported in this paper.
The results/data/ gures in this manuscript have not been published elsewhere, nor are they under consideration (from you or one of your Contributing Authors) by another publisher.
I have read the BMC journal policies on author responsibilities and submit this manuscript in accordance with those policies.