Prognostic values and prospective pathway signaling of miR-126 in non-small cell lung cancer: a study based on gene expression omnibus and bioinformatics analysis

Objective MiRNAs are considered crucial for the initiation and development of NSCLC, and many have been identified in NSCLC. However, the role of miR-126 in NSCLC has not been fully explained. Methods miR-126 expression in NSCLC was evaluated by analyzing public datasets in the Gene Expression Omnibus (GEO) database and reviewing previously published studies. Three mRNA datasets from GEO (GSE18842, GSE19804 and GSE101929) were used to identify differentially expressed genes (DEGs). We predicted the target genes of hsa-miR-126-5p using TargetScan and analyzed the overlap between the target genes of miR-126 and the DEGs in NSCLC. Subsequently, we performed Gene Ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses. We used STRING and Cytoscape to construct a protein-protein interaction (PPI) network and analyzed the influence of hub genes on the prognosis of NSCLC. Results The literature review identified a common pattern of miR-126 downregulation in NSCLC. A total of 187 DEGs were identified that were both NSCLC-related and miR-126-related. Many DEGs were extensively enriched in cell membrane, signal receptor binding, and biological regulation terms. Among the 10 main hub genes identified by PPI analysis, 4 (NCAPG, MELK, KIAA0101, TPX2) were clearly related to poor prognosis of NSCLC patients: when these genes were highly expressed, the survival rate of NSCLC patients was low. Furthermore, through network analysis we identified potential miR-126-related genes that may be involved in NSCLC, such as TPX2, HMMR, and ANLN. Conclusion This study suggests that miR-126 is important in the biological processes of NSCLC.


Introduction
Lung cancer ranks first among malignant tumors in global case numbers, and the vast majority of cases are non-small cell lung cancer (NSCLC). Because of its high degree of malignancy, there is currently no efficient treatment, and its fatality rate also ranks first in the world [1]. In recent years, the incidence and mortality of lung cancer in our country have gradually increased [2]. Because the pathogenesis of lung cancer remains unclear and its early symptoms are not typical, most lung cancer patients are not diagnosed by histopathology until the middle or late stages of the disease, missing the best chance for early surgery.
Even when patients are diagnosed early enough to undergo surgery and the tumor is completely removed, postoperative follow-up shows that the recurrence and metastasis rates are often as high as 1/3 [3]. In-depth study of driver genes in lung cancer has given us a better understanding of the molecular biology of NSCLC, and targeted therapy against driver genes has achieved remarkable results. Compared with traditional chemotherapy, targeted therapy has opened up a new therapeutic approach in lung cancer treatment with high efficiency, low toxicity, accuracy and minimal adverse reactions, and has thus improved the efficiency and safety of treatment for NSCLC patients [4,5]. In particular, the maturity of next-generation sequencing (NGS) technology has further promoted the development of NSCLC driver gene detection and targeted therapy [6].
The morbidity and mortality of NSCLC nevertheless remain high. If new targets related to the prognosis of NSCLC can be found and their mechanisms clarified, targeted treatment could greatly improve the clinical outcome of lung cancer and patients' quality of life, and would be valuable in monitoring cancer recurrence and guiding rehabilitation.
MicroRNAs (miRNAs) are highly conserved small non-coding RNAs. By pairing completely or incompletely with the 3'UTR, CDS or promoter region of an mRNA, a miRNA acts at the post-transcriptional stage to inhibit translation or directly degrade the target mRNA, thereby regulating gene expression [7]. As more miRNA families and their regulatory functions have been discovered, it has become clear that they play multiple roles in physiological and pathological processes such as cell differentiation, proliferation and apoptosis, and that they are an important factor in the initiation and development of various tumors [8,9]. A growing number of studies suggest that abnormal expression of miR-126-5p (miR-126) plays multiple roles in the biological processes of various tumors. Studies suggest that microRNA-126 reduces the viability of colorectal cancer cells by hindering mTOR-induced apoptosis and autophagy [10]. Through epigenetic regulation, miRNA can be used as a biomarker for the early diagnosis and treatment of malignant mesothelioma [11]. Kim et al [12] reported that ETV2/ER71 regulates the production of FLK1+ cells from mouse embryonic stem cells through the miR-126-MAPK signaling pathway. In addition, some reports suggest that miR-126 can mediate activation of the STAT3 signaling pathway, regulate the malignant biological behavior of NSCLC cells, and affect their proliferation, migration, cell cycle and susceptibility to apoptosis [13]. Previous studies have shown that the expression of miR-126 in NSCLC is markedly down-regulated and is important for NSCLC prognosis [14,15].
Hence, on the basis of published data on the expression of miR-126 in NSCLC, this study aims to confirm possible molecular targets and explain the role of miR-126 in NSCLC by combining miRNA expression data from GEO, a literature review, and bioinformatics analysis. The criteria for selecting studies in the literature review were: (1) studies of the expression of miR-126 in NSCLC were included; (2) reviews, non-clinical studies, case reports, meta-analyses, and meeting summaries were excluded; (3) studies without control groups were excluded.

Gene ontology enrichment and target prediction analysis
The gene expression profiles of GSE18842, GSE19804 and GSE101929 were obtained from the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo). The array data of GSE18842, GSE19804 and GSE101929 comprised 46, 60 and 32 NSCLC samples and 45, 60 and 34 ANLT samples, respectively. All data were based on the GPL570 platform: Affymetrix Human Genome U133 Plus 2.0 (Affymetrix; Thermo Fisher Scientific, Inc., Waltham, MA, USA).
The limma software package (version 3.6.3) in R / BioManager was used to identify differentially expressed genes (DEGs) between NSCLC and ANLT. By default, the adjusted P value (adj.P.Val) is computed with the Benjamini-Hochberg false discovery rate (FDR) method to correct for false positive results. P < 0.05 and |log2(FC)| > 1 were set as the cut-off criteria. Based on the platform annotation file downloaded from the database, the probe IDs in the matrix file were converted into gene symbols.
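The filtering step can be illustrated with a minimal sketch. The code below is not the limma workflow itself; it only shows, in plain Python, how Benjamini-Hochberg adjustment and the two cut-offs combine to label a gene as up- or down-regulated. The gene names and values are hypothetical, and strict inequalities are assumed for both thresholds.

```python
from typing import List, Tuple


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(rows: List[Tuple[str, float, float]],
                p_cutoff: float = 0.05,
                lfc_cutoff: float = 1.0) -> List[Tuple[str, str]]:
    """rows holds (gene, log2 fold change, raw p-value) per gene.

    Returns (gene, direction) for genes passing both cut-offs.
    """
    adjusted = benjamini_hochberg([p for _, _, p in rows])
    degs = []
    for (gene, lfc, _), adj_p in zip(rows, adjusted):
        if adj_p < p_cutoff and abs(lfc) > lfc_cutoff:
            degs.append((gene, "up" if lfc > 0 else "down"))
    return degs


# Hypothetical input, for illustration only.
example_rows = [
    ("GENE_A", 2.3, 0.01),
    ("GENE_B", -1.8, 0.04),
    ("GENE_C", 0.4, 0.03),
    ("GENE_D", 1.5, 0.50),
]
print(select_degs(example_rows))
```

In this toy input only one gene survives, because the adjustment raises the remaining p-values above 0.05 even where the raw values were below it.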
The target genes of hsa-miR-126-5p (TG_miR-126-5p) were predicted using the online tool TargetScan (http://www.targetscan.org/vert_72/). Subsequently, the overlap between the integrated NSCLC DEGs and the predicted TG_miR-126-5p was determined using bioinformatics software. The overlapping genes were analyzed through Gene Ontology (GO) and visualized with the BiNGO plug-in of Cytoscape software (version 3.7.2). The DAVID database was applied to analyze the enrichment of Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. FDR ≤ 0.05 was set as the criterion. STRING (version 11.0, https://string-db.org/) was used to build a protein-protein interaction (PPI) network of the genes, and the interactions were visualized using the CytoHubba plug-in of Cytoscape (version 3.7.2). A confidence score C ≥ 0.7 was set as the cut-off criterion. We then performed molecular complex detection (MCODE) and selected PPI network modules with degree cutoff = 2, node score cutoff = 0.2, k-core = 2, and max. depth = 100.
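The overlap step amounts to set intersection: genes differentially expressed in every mRNA dataset are intersected with the predicted targets. A minimal sketch follows; the function names and the toy gene lists are hypothetical and do not reproduce the study's data.

```python
from typing import Iterable, List, Set


def common_degs(deg_lists: List[Iterable[str]]) -> Set[str]:
    """Genes reported as differentially expressed in every dataset."""
    sets = [set(genes) for genes in deg_lists]
    return set.intersection(*sets) if sets else set()


def mirna_related_degs(deg_lists: List[Iterable[str]],
                       predicted_targets: Iterable[str]) -> Set[str]:
    """Common DEGs that are also predicted targets of the miRNA."""
    return common_degs(deg_lists) & set(predicted_targets)


# Hypothetical DEG lists for three datasets and a hypothetical target list.
dataset_1 = ["G1", "G2", "G3", "G4"]
dataset_2 = ["G2", "G3", "G4", "G5"]
dataset_3 = ["G3", "G4", "G6"]
targets = ["G4", "G6", "G7"]

print(sorted(common_degs([dataset_1, dataset_2, dataset_3])))
print(sorted(mirna_related_degs([dataset_1, dataset_2, dataset_3], targets)))
```

Intersecting across datasets first keeps only genes that replicate, which is stricter than pooling all DEGs before matching them to the target list.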

Survival analysis
The online tool Kaplan-Meier plotter (KM plotter, www.kmplot.com) can assess the effect of genes on survival in 21 cancer types, including the largest datasets: breast (n = 6,234), ovarian (n = 2,190), lung (n = 3,452), and gastric (n = 1,440) cancer. Based on the median expression level of each specific gene, we divided NSCLC patients into high-expression and low-expression groups and analyzed their survival by the Kaplan-Meier method. Survival analysis was conducted on the general population, the smoking population and the non-smoking population. Genes that showed statistical significance in the survival analysis of both the smoking and the non-smoking population were then included in this study. The hazard ratio (HR) and its 95% confidence interval were calculated and displayed. Dataset GSE102287 was used to verify the correlation between miR-126 and the DEGs; 62 samples in total were included in this analysis.
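The median split and the Kaplan-Meier estimate can be sketched in a few lines. This is an illustration of the general method, not of KM plotter's internal procedure; the patient identifiers and values are hypothetical, and assigning samples exactly at the median to the low group is an assumption.

```python
from statistics import median
from typing import Dict, List, Tuple


def median_split(expression: Dict[str, float]) -> Dict[str, str]:
    """Label each sample 'high' or 'low' relative to the median expression."""
    cut = median(expression.values())
    return {sample: ("high" if value > cut else "low")
            for sample, value in expression.items()}


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival estimate.

    events: 1 = death observed, 0 = censored.
    Returns (time, survival probability) at each time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical data, for illustration only.
groups = median_split({"P1": 2.0, "P2": 5.0, "P3": 7.0, "P4": 9.0})
curve = kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0])
print(groups)
print(curve)
```

Running the estimator separately on the high and low groups yields the two curves that a log-rank test or hazard ratio would then compare.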

Statistical analysis
Data are displayed as mean ± standard deviation (SD). Differences between two groups were assessed with the two independent samples t-test. We also analyzed the association between miR-126 level and NSCLC by standardized mean difference (SMD) analysis using Stata 15.0 statistical software. We used the Mantel-Haenszel formula (fixed effect model) or the DerSimonian-Laird formula (random effect model) to combine and analyze the different GEO datasets. When heterogeneity was significant (Q statistic P ≤ 0.05 or I² ≥ 50%), we adopted the random effect model, and otherwise the fixed effect model. Spearman rank correlation was used to analyze the correlation between DEG expression and miR-126 levels. P < 0.05 was considered statistically significant.
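The pooling logic can be sketched as follows. This is not the Stata procedure: Cohen's d is assumed as the SMD, inverse-variance weighting stands in for the fixed effect step, and the model switch uses only the I² ≥ 50% rule (the Q-test p-value is omitted to keep the example dependency-free). All input numbers are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cohen_d(mean_case: float, sd_case: float, n_case: int,
            mean_ctrl: float, sd_ctrl: float, n_ctrl: int) -> Tuple[float, float]:
    """Standardized mean difference (Cohen's d) and its approximate variance."""
    pooled_sd = math.sqrt(((n_case - 1) * sd_case ** 2 + (n_ctrl - 1) * sd_ctrl ** 2)
                          / (n_case + n_ctrl - 2))
    d = (mean_case - mean_ctrl) / pooled_sd
    variance = (n_case + n_ctrl) / (n_case * n_ctrl) + d ** 2 / (2 * (n_case + n_ctrl))
    return d, variance


def pool_effects(effects: List[float], variances: List[float],
                 i2_threshold: float = 50.0) -> Dict[str, object]:
    """Pool study effects; use DerSimonian-Laird weights when I^2 is high."""
    weights = [1.0 / v for v in variances]
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q and the I^2 heterogeneity statistic.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # DerSimonian-Laird estimate of the between-study variance tau^2.
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    model = "random" if i2 >= i2_threshold else "fixed"
    if model == "random":
        weights = [1.0 / (v + tau2) for v in variances]
    estimate = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return {"model": model, "estimate": estimate,
            "ci": (estimate - 1.96 * se, estimate + 1.96 * se),
            "i2": i2, "tau2": tau2}


# Hypothetical per-dataset effects and variances.
result = pool_effects([-1.0, -2.0], [0.1, 0.1])
print(result)
```

Adding tau² to every study variance flattens the weights, so under the random effect model small and large datasets contribute more equally and the confidence interval widens.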

miR-126 expression in NSCLC based on GEO
Based on the GEO datasets (Fig. 1), the expression of miR-126 was evaluated in a series of NSCLC and ANLT samples. This study collected 4 GEO datasets in total (GSE102286, GSE29248, GSE63805 and GSE27705). In GSE102286, GSE29248 and GSE27705, the expression level of miR-126 in NSCLC tissue was significantly lower than that in ANLT (P < 0.01), while in GSE63805 the expression level of miR-126 in NSCLC tissue showed no statistically significant difference from that in ANLT. The characteristics of the included GEO datasets are listed in Table 1 and Fig. 2. Across all included GEO datasets, the difference between NSCLC and the normal control group was statistically significant (SMD = -1.69, 95% CI: -2.97 to -0.41, p < 0.01). The forest plot is shown in Fig. 3.

Expression Profile of miR-126 in NSCLC and ANLT in Literature
Next, we investigated the expression of miR-126 in NSCLC based on previously published data. As shown in Fig. 4, we selected six studies from the literature that met the selection criteria [14-19]. All studies showed that the expression level of miR-126 in NSCLC tissues was significantly lower than that in the normal control group (Table 2). (Fig. 5, Fig. 6). Then, based on TargetScan, we predicted 5,591 TG_hsa-miR-126-5p, of which 187 were found among the 595 common DEGs. The data in GSE101929 showed 46 up-regulated and 141 down-regulated hsa-miR-126-5p-related genes in NSCLC tissues compared with the normal control group (Fig. 6). The top ten up-regulated and down-regulated hsa-miR-126-5p-related genes in dataset GSE101929 are shown in Table 3.

Functional analysis of miR-126-related DEGs in NSCLC
The BiNGO plug-in in Cytoscape was used for GO functional analysis, and DAVID was used for KEGG enrichment analysis. The analysis showed that many target genes were enriched in GO terms such as cell membrane, signal receptor binding, glycosaminoglycan binding, biological regulation, and response to stimulus (Fig. 7, Table 4). In addition, one KEGG pathway was over-represented among these potential target genes, namely the protein digestion and absorption pathway (Table 4).

PPI Network Construction and Modules Selection
The PPI network of miR-126-related DEGs was composed of 187 nodes and 281 edges, including 46 up-regulated genes and 141 down-regulated genes (Fig. 8). With degree as the ranking criterion, the top 10 genes were selected as hub genes, and these hub genes were closely connected to one another (Fig. 9A). A significant module, comprising 12 nodes and 64 edges, was obtained from the PPI network of miR-126-related DEGs using MCODE (Fig. 9B).
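Ranking nodes by degree is simple to express in code. The sketch below is not CytoHubba; it only counts, for each protein, the interactions that pass a confidence threshold and returns the best-connected nodes. The edge list is hypothetical, and breaking ties alphabetically is an assumption made for reproducibility.

```python
from collections import Counter
from typing import List, Tuple


def hub_genes(edges: List[Tuple[str, str, float]],
              min_score: float = 0.7,
              top_n: int = 10) -> List[Tuple[str, int]]:
    """Rank genes by degree using only edges with score >= min_score.

    edges holds (gene_a, gene_b, confidence score) per interaction.
    """
    degree = Counter()
    for gene_a, gene_b, score in edges:
        if score >= min_score:
            degree[gene_a] += 1
            degree[gene_b] += 1
    # Sort by degree (descending), then by name so ties are deterministic.
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical interaction list, for illustration only.
toy_edges = [
    ("A", "B", 0.90),
    ("A", "C", 0.80),
    ("A", "D", 0.95),
    ("B", "C", 0.75),
    ("C", "D", 0.40),  # below the confidence cut-off, ignored
]
print(hub_genes(toy_edges, top_n=2))
```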

Survival Analysis
The prognostic value of the top 10 hub genes (CCNB1, KIAA0101, BUB1B, TPX2, NUF2, NCAPG, MELK, KIF15, HMMR, ANLN) in the PPI network was evaluated on the website https://kmplot.com/analysis/. The overall survival of NSCLC patients was analyzed according to high and low expression of each hub gene. The results showed that all ten genes were related to the prognosis of NSCLC in the overall survival analysis of the general population. However, the results were inconsistent when we divided the population into smoking and non-smoking groups. Only genes that were statistically significant in both smokers and non-smokers were included in this study, and only four genes met this criterion. The results showed that high expression of NCAPG, MELK, KIAA0101 and TPX2 was associated with poor overall survival of NSCLC patients (Fig. 10). In the PPI network, the top 10 hub genes were negatively correlated with miR-126 to some extent, especially TPX2, HMMR and ANLN, which may play a role in NSCLC through interaction with miR-126 (Table 5).
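A negative correlation of this kind is measured with Spearman's rank correlation, which can be sketched as the Pearson correlation of the ranks. The implementation below handles ties with average ranks; the expression values are hypothetical and are not taken from GSE102287.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical miRNA and gene expression levels across five samples.
mirna_level = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_level = [10.0, 8.0, 6.0, 4.0, 2.0]
print(spearman(mirna_level, gene_level))
```

Because only ranks enter the calculation, the coefficient is insensitive to the scale of the two measurements, which suits comparisons between miRNA and mRNA array intensities.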

Discussion
In the current study, we confirmed the abnormal expression of miR-126 in NSCLC by comparing the miRNA expression profiles of NSCLC tissues and normal control lung tissues, based on data from GEO datasets and published studies. In addition, through GO analysis, KEGG analysis, the protein-protein interaction (PPI) network and the Kaplan-Meier plotter, we identified and analyzed new markers and potential targets of miR-126 involved in the regulation of key biological processes in NSCLC.
So far, there have been few studies on the characteristics of miR-126 in NSCLC, but different studies have shown that the expression of miR-126 in NSCLC patients is down-regulated. A 2015 study by Wang et al [20] showed that the level of miR-126 in the blood of patients with early-stage NSCLC was lower than that of healthy people. Zhu et al [21] and Shang et al [22] reached the same conclusion.
Interestingly, in addition to blood, the level of miR-126 in the sputum of lung cancer patients was also decreased compared with the healthy control group. These studies all suggested that miR-126 can be used as a marker for the early diagnosis of NSCLC. The expression of miR-126 in NSCLC patients was decreased not only in body fluids but also in NSCLC tissues compared with adjacent tissues [14,15], and this decrease is significantly associated with the prognosis of patients with NSCLC [16,17]. Among the included GEO datasets, three out of the four showed that miR-126 was significantly decreased in NSCLC, and a meta-analysis with the random effect model showed that the combined SMD was also statistically significant. Therefore, according to the current results, further study of the mechanism of miR-126 in NSCLC, its target genes and its potential therapeutic targets will be of great significance for the treatment and prognosis of NSCLC patients.
MiR-126 is one of the most important miRNAs in the regulation of NSCLC. A meta-analysis by Weng et al [23] suggested that high expression of miR-126 was a favorable factor for overall survival in patients with NSCLC. MiR-126-3p can inhibit the growth, migration and invasion of NSCLC by targeting CCR1 [24]. Lima et al [25] reported that expression of miR-126-5p inhibited the enzyme activity of MDH1 and mitochondrial respiration in NSCLC cells, leading to cell death. Up-regulation of the long non-coding RNA MINCR promoted the growth of NSCLC by negatively regulating the miR-126/SLC7A5 axis [26]. Guo et al [27] showed that the long non-coding RNA PRNCR1 regulates the proliferation, apoptosis, migration, and invasion of NSCLC cells through the PRNCR1/miR-126-5p/MTDH axis. In view of the current situation, it is necessary to further clarify the molecular mechanisms and clinical value associated with abnormal expression of miR-126 in NSCLC.
Through PPI network analysis, we found ten highly connected hub genes. Survival analysis with online tools showed that these ten hub genes were associated with the prognosis of NSCLC. However, considering the important role of smoking in lung cancer, we conducted separate prognostic analyses of the smoking and non-smoking populations. After this stratified analysis, some genes were no longer significant, and the expression of only four genes was significantly related to the prognosis of NSCLC regardless of smoking status. These four genes are NCAPG, MELK, KIAA0101 and TPX2. Their high expression was associated with poor prognosis in NSCLC patients, which is consistent with negative regulation by miR-126. NCAPG (non-SMC condensin I complex, subunit G) is a subunit of the condensin complex, which is responsible for the condensation and stabilization of chromosomes during mitosis and meiosis [28]. Kim et al [29] reported that NCAPG can be used as a candidate biomarker for renal cell carcinoma. Maternal embryonic leucine zipper kinase (MELK) belongs to the CAMK serine/threonine protein kinase superfamily and has maximum activity during mitosis. Zhang et al [30] showed that NSCLC patients with high MELK expression had a poor prognosis, which is consistent with the conclusion of this study. KIAA0101 is reported to act mainly in the liver, and there are few studies in other organs. Kato et al [31] reported that overexpression of KIAA0101 predicted a poor prognosis in patients with primary lung cancer, which is consistent with the result of this study. Schneider et al [32] reported that TPX2 is associated with poor prognosis in patients with NSCLC. It can be concluded that these genes are expected to become new prognostic markers for NSCLC.
In the current study, we found that new candidate target genes of miR-126, such as TPX2, HMMR and ANLN, are involved in the regulation of key biological processes of NSCLC. The HMMR gene has been shown to encode a multifunctional molecule. HMMR has a tumor-promoting role: its product, the hyaluronan-mediated motility receptor (RHAMM), has been shown to be a cancer promoter and may act as a tumor cell migration-promoting factor whose expression promotes the occurrence and development of tumors and is related to the pathological stage of tumors [33]. Some studies have shown that carbon ion irradiation can inhibit the expression of the anillin (ANLN) gene in human lung adenocarcinoma A549 cells, which is regulated by activation of the phosphatidylinositol-3-kinase (PI3K)/Akt signaling pathway associated with metastasis [34]. Our correlation analysis also showed that miR-126 was negatively correlated with TPX2, HMMR and ANLN in NSCLC, and the correlation was statistically significant, indicating that they play a very important role in the biological regulation of NSCLC.
In conclusion, the results of this study suggest that miR-126 plays an important role in NSCLC biology. However, additional in vivo and in vitro experiments are needed to study its pathogenesis and to clarify the role of the molecular network regulated by miR-126 in NSCLC.

Declarations
Ethics approval and consent to participate
Ethical approval was not needed because this is a bioinformatics analysis.

Consent for publication
Not applicable.

Availability of data and material
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Competing interests
All the authors declare that they have no conflict of interest.

Funding
Our work is supported by Natural Science Foundation of Jiangsu Province, Grant/Award

Authors' contributions
All authors contributed to the study design; all authors collected the data and performed the data analysis; all authors prepared the manuscript.

Expression of miR-126 in NSCLC and normal lung tissues in GEO datasets. NSCLC: non-small cell lung cancer; Normal: normal lung tissue; miR-126: hsa-miR-126-5p.

Venn plots of hsa-miR-126-5p-related differentially expressed genes from four datasets (GSE18842, GSE19804, GSE101929, and TG_miR-126-5p); the overlapping area corresponds to the commonly identified DEGs. DEGs: differentially expressed genes; TG_miR-126-5p: target genes of hsa-miR-126-5p.

Figure 7
Results of GO enrichment analysis of miR-126 target genes in NSCLC. The yellow circles represent functional enrichment; the larger the circle and the darker the color, the more genes are enriched in that term. The connecting lines represent the association between genes.

Figure 8
Protein-protein interaction network of hsa-miR-126-5p-related DEGs. The line between two circle nodes represents the interaction between the two proteins it links. DEGs, differentially expressed genes. Colored nodes: query proteins and first shell of interactors; white nodes: second shell of interactors; empty nodes: proteins of unknown 3D structure; filled nodes: some 3D structure is known or predicted.

Overall survival analysis of NCAPG, MELK, KIAA0101 and TPX2 expression in relation to the prognosis of NSCLC patients. The patients with NSCLC were divided into two groups (high vs. low) according to the median expression level.