Identification of key miRNA and hub genes in lung adenocarcinoma by bioinformatics and functional analysis

Background : lung adenocarcinoma is the main subtype of lung cancer and the most fatal malignant disease in the world. However, the pathogenesis of lung adenocarcinoma has not been fully elucidated. Methods : Three LUAD-associated datesets (GSE118370, GSE43767 and GSE74190) were downloaded from the Gene Expression Omnibus (GEO) datebase and the differentially expressed miRNAs (DEMs) and genes (DEGs) were screened by GEO2R. The prediction of target gene of differentially expressed miRNA were used miRWALK. Metascape was used to enrich the overlapped genes of DEGs and target genes. Then, the protein-protein interaction(PPI) and DEMs-DEGs regulatory network were created via String datebase and Cytoscape. Finally, overall survival analysis was established via the Kaplan–Meier curve and look for the possible prognostic biomarkers. Result : In this study, 433 differential genes were identified. There were 267 genes overlapped with the target gene of Dems, and eight hub genes (CDH1, CDH5, CAV1, MMP9, PECAM1, CD24, ENG, MME) were screened out. There were 85 different miRNAs in total, among which 16 miRNA target genes intersect with DEGs, 12 miRNAs with the highest interaction were screened out, and survival analysis of miRNA and hub genes was carried out. Conclusion : we found that miRNA-940, miRNA-125a-3p, miRNA-140-3p, miRNA-542-5p, CDH1, CDH5, CAV1, MMP9, PECAM1 may be related to the development of LUAD.


Cancer Biology
Introduction Lung carcinoma incidence and mortality ratio is the highest in all countries in the world. Lung carcinoma accounts for 11.6% of all cancers and lung carcinoma incidence rate is 18.4% [1]. In recent years, the proportion of lung adenocarcinoma in non-small cell lung cancer is increasing. Despite various treatments, the mortality rate is still very high. Therefore, in order to reduce the incidence rate and mortality of lung adenocarcinoma, we must identify the relevant targets and mechanisms of lung adenocarcinoma.
Gene chips are also called DNA microarrays, and the cDNA sequence representing a single gene is spotted on a glass slide at a high density [2]. At present, numerous of studies have confirmed the 3 reliability and rationality of gene chip results [3]. Gene chip has been widely used in tumor research because of its high throughput and large-scale detection of genes, which not only improves the research efficiency, but also provides the possibility for in-depth study of gene function [4]. Therefore, microarray analysis is an effective means to study the distribution of miRNAs or mRNAs in LUAD.
miRNA is a kind of small non coding RNA, which is composed of 19-23 nucleotides. It is an endogenous non coding single stranded RNA, widely exists in eukaryotes [5]. Research shows that miRNA plays an vital regulatory role in a large number of organs and biological processes. miRNA is also related to growth signal imbalance, cell energy imbalance, immune escape, angiogenesis, tissue invasion and metastasis [6]- [9]. It has been shown that miR-29 family can target Tet1 and TDG in lung adenocarcinoma, leading to tumor development [10]. miR-21 drives tumorigenesis through suppression of negative regulators of the Ras/MEK/ERK pathway and suppression of apoptosis [11]. miR-221 & 222 induces TRAIL resistance through the target genes PTEN and TIMP3, and enhances cell migration by activating the AKT pathway and metallopeptidase [12].In this study, two mRNA datasets and a set of miRNA datasets were selected from geo database to screen the mRNA and miRNA differentially expressed in matched LUAD and normal tissues. MiRwalk predicted the target genes of differentially expressed miRNA, selected the overlapped genes of target genes and DEGs, screened out miRNA with high correlation, and carried out survival analysis. In addition, metascape was used to enrich the overlapped genes. Finally, we also constructed PPI, Dems DEG networks and searched for hub genes to verify and analyze the survival of hub genes. Our aim is to distinguish the misaligned miRNAs, mRNAs and to find out their regulatory networks in the development of LUAD, and the ultimate goal is to offer more evidence for the therapeutic targets of LUAD.

Data sources
First, search for the keyword lung adenocarcinoma, in GEO (www.ncbi.nlm.nih.gov/geo) database.
Screening criteria: Organization: Home sapiens. Two gene datasets (GSE43767 and GSE118370) were retrieved from the GEO database. GSE43767 is based on GPL6480 platform, including 15 benign lung lesion tissues and 69 cancer tissues of LUAD patients GSE118370 is based on the GPL570 platform, 4 including 6 invasive lung adenocarcinoma tissues and 6 matched normal tissues; One miRNA dataset called GSE74190 is based on the GPL19622 platform, including 36 lung adenocarcinoma tissues and 26 normal lung tissues. The heat map and volcano map were made in R (3.6.2) software.

Screening differentially expressed genes/miRNAs DEGs and DEMs
Screening DEGs and DEMs and studying their interactions may reveal the underlying regulatory mechanisms of the disease. GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r/) was exploited to screen the differences between the two groups of lung adenocarcinoma tissue and normal tissue of differentially expressed genes and miRNAs(DEGs and DEMs). GEO2R is a web tool of GEO database.
Comparing two groups of samples under the same experimental conditions, we can analyze most of GEO series [13]. The screening conditions of GSE118370 and GSE43767 were Padj < 0.05 and | log2FC| >1. Then the common differential genes were determined by vene map.The screening conditions of GSE74190 were Padj < 0.05 and| log2FC |>1. We used R(3.6.2) software to create heat maps and volcano maps for these DEGs and DEMs.

Target gene prediction of DEMs
Target genes for DEMs were predicted in miRWalk [14] http://miRwalk.umm.uniheidelberg.de/about/ ,which is a comprehensive miRNA target gene database and integrates information from miRdb, targetscan, miRtarbase and other databases. Then, these target genes were overlapped with DEGs to screen out the final DEGs using the Venn diagram.

Functional enrichment
Metascape www.metascape.org/ [16] provides automatic meta-analysis tools. It also supports protein-protein interaction (PPI) analysis based on, interactive visualization of Gene Ontology (GO) network and generation of enrichment heat map. We enriched GO and KEGG in the overlapping part of DEG and target gene. The selected parameters are min overlap: 3, P value cutoff: 0.01, min enrichment: 15.

DEMs-DEGs network and PPI network
The PPI network map of overlapped genes was constructed on the STRING string-db.org/ [17]website. The PPI network and the DEMs-DEGs network were visualized via the software of 5 Cytoscape(3.7.2). The hub genes of overlapped genes were screened by the MCODE [16] plug-in of the application software.

Survival analysis and validation
Ualcan (http://ualcan.path.uab.edu/) [18] was used to verify the hub gene. Ualcan is a comprehensive, user-friendly and interactive network resource, which is used to analyze the data of cancer histology.

Target gene prediction of miRNA
The target genes of 85 miRNAs were predicted by miRwalk website, and then these target genes were overlapped with DEGs. It was found that there were 16 miRNAs associated with them, and 267 target genes overlapped with DEGs (Figure 1b). We selected the first 12 miRNAs (Table 1) with high correlation index and overlapped genes.

PPI and DEMs-DEGs visualization network
The PPI network of DEGs was obtained by the String website, and the visual network analysis of DEGs In order to better understand the role of miRNAs and regulatory genes, we constructed DEMs-DEGs regulatory networks. DEMs-DEGs regulatory network was composed of 323 nodes and 247 edges and it showed that miRNA-940 interacts with CDH5 and MME, miRNA-125a-3p interacts with CAV1, miRNA-140-3p interacts with ENG and CHD1, miRNA-542-5p interacted with CD24, MME, PECAM1 ( Figure 5).

Enrichment analysis
More than 200 DEGs were enriched by GO and KEGG functions using the Metascape website. The results showed that these DEGs were mainly enriched in the aspects of blood vessel morphogenesis

Visual verification and survival analysis of hub gene
We visually verified eight hub genes on the website (http://ualcan.path.uab.edu/index.html), and found that the expression of CDH1, CD24, MMP9 was up-regulated then the expression of CAV1, CDH5, ENG, MME, PECAM1 was down-regulated. This result was consistent with our research ( Figure   7). Patient overall survival analysis displayed that the expression of MME, CD24 and ENG did not affect the overall survival time of patients with LUAD. However, compared with patients with low expression, patients with increased CDH1 and MMP9 expression had a statistically significant decrease in overall survival time. Compared with patients with high expression of CDH5, CAV1 and PECAM1, the overall survival time of patients with decreased expression of CDH5, CAV1 and PECAM1 decreased, which was statistically significant (Figure 8).

Discuss
Our research showed that miRNA promoted the development of LUAD through its target gene axis and these DEMs and DEGs were all from the GEO database. We found that miR-940/CHD5, MMP9 axis, miR-125a-3p/CAV1 axis, miR-140-3p/CDH1 axis and miR-542-5p/PECAM1 axis may be important pathways to promote the occurrence and development of LUAD. These studies have not yet been discovered.
Up regulation of miR-940 expression can inhibit the proliferation, invasion and migration of tumor cells and play a role of tumor suppressor gene. But in gastric cancer [25], bladder cancer [26] and pancreatic cancer [27], the expression level of miR-940 was increased, which played a role of oncogene. This study shows that the expression of miRNA-940 in LUAD is down regulated, and the down regulated miRNA-940 often has a worse prognosis, and its target gene CDH5 is down regulated, and the down regulated LUAD patients also have a worse prognosis. Its other target gene, MMP9, is up-regulated, and patients with high expression have a worse prognosis. This suggests that miRNA-940 may promote the development of LUAD by targeting CDH5 and MMP9. miRNA-125a-3p is down regulated in many kinds of cancers, which can inhibit the proliferation and migration of cancer cells by targeting. Kim et al [28]Found that through miR-125a-3p-mediated upregulation of USF1, CHI3L1 could be down regulated to inhibit lung metastasis. In the study of nonsmall cell lung cancer (NSCLC), Hou et al [29] used qPCR to detect the expression of miR-125a-3p in NSCLC and adjacent normal lung tissues. It was found that miR-125a-3p was low expression in NSCLC, and the expression of miR-125a-3p was negatively correlated with lymph node metastasis, malignant tumor stage and tumor diameter, suggesting that miR-125a-3p could be used as a biomarker for 8 prognosis of NSCLC patients. The results of this study showed that miRNA-125a-3p was down regulated in LUAD, and CAV1, as the target gene of hub gene, was down-regulated, and miRNA-125a-3p and CAV1 often had worse prognosis in patients with low expression of LUAD, which indicated that miRNA-125a-3p might promote the development of LUAD by targeting the regulation of CAV1. This suggests that miRNA-125a-3p and CAV1 may also be potential biomarkers of LUAD. miR-140-3p is a newly discovered miRNA in recent years. It has been shown that its expression is decreased in many kinds of tumors and plays an anti-tumor role. For example, Dong et al [30] showed that over expression of miR-140-3p could inhibit the growth, migration and invasion of NSCLC cells, and induce apoptosis. Sun et al [31]showed that the expression of miR-140-3p decreased in breast cancer cells, and over expression of miR-140-3p inhibited the proliferation of breast cancer cells and promoted apoptosis. In this study, the expression of miRNA-140-3p was lower than that of normal lung tissue. Survival analysis showed that the low expression miRNA-140-3p had a worse prognosis than the high expression miRNA-140-3p. There are two target genes in hub gene. One is CDH1 with high expression and the other is ENG with low expression. The expression of CDH1 is up-regulated, the prognosis is often worse, but the expression of ENG is down-regulated, there is no significant difference in prognosis. This suggests that miRNA-140-3p may affect the occurrence of LUAD by regulating CHD1. miRNA-140-3p and its target gene CHD1 may also be the detection and treatment targets of LUAD.
It was found that miR-542-5p was down regulated in a variety of tumor tissues, and it was involved in the occurrence and development of malignant tumors as a tumor suppressor gene. Bray et al [32] found that miR-542-5p was significantly low expression in neuroblastoma, which played a role of tumor suppressor gene by suppressing the proliferation and invasion of cancer cells. Zhou et al [33] Found that miR-542-5p can regulate the expression of target gene EGFRv A, thus reducing the invasive ability of cancer cells. The present study showed that the expression of miRNA-542-5p was up-regulated. Survival analysis showed that the highly expressed miRNA-542-5p had a worse prognosis. There were three target genes in hub. The expression of CD24 was up-regulated and MME was down-regulated, however there were no correlation between CD24, MME expression and the 9 prognosis of patients with LUAD. However, the prognosis of the patients with decreased expression of PECAM1 was worse than that of the patients with high expression of PECAM1. This suggests that miRNA-542 may regulate the development of LUAD through its target gene PECAM1.

Conclusion
To sum up, we jointly researched the expression profiles of miRNA and mRNA in LUAD to find out more targets for the diagnosis and treatment of LUAD. CDH1, CDH5, CAV1, MMP9, PECAM1 miR-940 miR-125a-3p miR-140-3p miR-542-5p may serve as biomarkers and can be united for the diagnosis of LUAD. The lung adenocarcinoma data are publicly available and can be downloaded from NCBI GEO database(GSE118370, GSE43767, and GSE74190).  Heat map and volcano map of gene and miRNA expression profiles of LUAD and its adjacent tissues. (a) Differential gene and differential miRNA expression profile thermogram, red represents up-regulation and green represents down-regulation. (b) Volcanic map of differential genes and differential miRNA expression profiles, red for up-regulation, green for down-regulation, and black for failure to meet screening conditions.
16 Figure 4 (a) PPI network, red represents up-regulation gene, blue represents down-regulation gene.
(b)HUB gene of the PPI network,red represents up-regulation gene, blue represents downregulation gene.
17 Figure 5 DEMs-DEGs regulatory network, Red represents up-regulated genes or miRNA, and blue represents down-regulated genes or miRNA. The circle represents the HUB gene.
18 Expression of HUB gene in cancer and normal tissues from the UALCAN database.
20 Figure 8 Prognostic analysis of HUB gene in LUAD. Logrank P < 0.05 was considered statistically.