DOI: https://doi.org/10.21203/rs.2.10912/v1
Lung cancer (LC) is a major cause of cancer-related death worldwide, with an annual incidence and mortality of about 2.1 million and 1.8 million, respectively [1, 2]. The most common histological subtype is lung adenocarcinoma (LUAD), which accounts for 82.6% of non-small cell lung cancer (NSCLC) [3] and 40%-50% of LC [4]. Moreover, over the last several years the prevalence of LUAD has been increasing globally, especially in women, young patients and non-smokers [3, 5], and the morbidity among young patients even reaches 57.5%-77.9% [3, 6-11]. The 5-year survival rate of LUAD is approximately 15% [12], yet it can reach 100% in patients with stage Ⅰ LUAD [13-15]. At present, the methods widely used for LC diagnosis, such as sputum exfoliative cytology, X-ray, and CT scanning, neither improve the early diagnostic rate nor distinguish LUAD from other tumor subtypes. Therefore, early and accurate diagnosis of LUAD is urgently needed and is crucial for giving patients the chance to receive proper treatment and for improving survival in clinical practice [16].
Circulating miRNAs as noninvasive and effective diagnostic markers for NSCLC have recently become a major research area [17-20], and whether they can serve as highly specific diagnostic biomarkers for LUAD has attracted increasing attention. miRNAs are the broadest class of gene regulatory molecules in biological processes. The expression profile of miRNAs reflects the developmental lineage and differentiation stage of tumors, is related to the clinicopathological features of tumor subtypes [21, 22], and can identify tumor histological subtypes based on origin, histology, invasiveness, and chemosensitivity [23-25]. In the past few years, several studies have verified that circulating miRNAs can be used for early diagnosis of LUAD with good diagnostic efficiency [26-30], and many researchers have investigated the role of miRNAs in the pathogenesis of LUAD [31-33]. However, miRNA profiles are inconsistent among studies because of differences in specimen type, ethnicity, pre-analytical preparation, and other influencing factors [34-37]. So far, miR-21, miR-155, miR-210, miR-126, miR-486, miR-182, and miR-17 are the most frequently reported circulating miRNAs with high diagnostic efficiency for NSCLC [18, 38-41]. Whether these circulating miRNAs can accurately identify LUAD and be applied in clinical practice remains inconclusive, and the specific biological processes and molecular regulatory mechanisms of miRNAs in LUAD are also undefined.
In the present study, we systematically summarized the published datasets in the GEO database that investigated circulating miRNAs for LUAD detection. The primary purpose was to analyze the diagnostic performance of circulating miR-21, miR-155, miR-210, miR-126, miR-486, miR-182, and miR-17 for LUAD. Second, we used bioinformatics analysis to explore the biological processes and molecular regulatory mechanisms in LUAD development of those miRNAs that reached statistical significance.
Selection of GEO dataset and data extraction
A systematic search was performed to identify studies assessing circulating miRNAs as diagnostic biomarkers for LUAD. We searched the GEO database for eligible datasets up to Apr 26, 2019. The search strategy was as follows: ((lung OR pulmonary) AND (cancer OR carcinoma OR tumour OR tumor OR malignanc* OR neoplas* OR nodule OR adenoma*)) AND (microRNA* OR miRNA* OR miR*). The search results were then filtered with: "Homo sapiens"[porgn] AND "gse"[Filter]. In the initial screening, titles were browsed and datasets were excluded if they were (1) not related to NSCLC, (2) non-human studies, or (3) not related to the topic. In the second screening, datasets were excluded if they (1) studied miRNAs in tissue, bronchial epithelium, or lung fibroblasts, or (2) lacked a healthy control group or relevant data. Two reviewers independently screened the included microarray datasets and extracted the expression levels of miR-21, miR-155, miR-210, miR-126, miR-486, miR-182, and miR-17 used for the diagnosis of LUAD. Disagreements between the two reviewers were resolved by consensus.
Statistical analysis
RevMan version 5.3 and Meta-DiSc version 1.4 were used to perform the statistical analyses in this study. In addition, the sensitivity, specificity, and AUC values of circulating miRNAs for LUAD were calculated with MedCalc version 18. The standardized mean difference (SMD) was used to assess the relationship between circulating miRNA expression levels and LUAD. A fixed-effect model was adopted to pool the SMD when heterogeneity was low (I²<50%, or P>0.1); otherwise, a random-effect model was used [42]. Where necessary, SD was estimated by the formula SD_estimated ≈ f × range [43]. P≤0.05 was considered statistically significant. Bar plots were generated with R x64 (version 3.5.3).
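The pooling rule described above can be illustrated with a minimal Python sketch. It is not the RevMan implementation: the function names, the Hedges-type small-sample correction, the variance approximation, and the DerSimonian-Laird estimate of between-study variance are assumptions chosen for illustration, and only the I² criterion is applied (the P>0.1 criterion from the Q test is omitted). The input numbers in the usage lines are hypothetical.

```python
import math

def hedges_g(mean_case, sd_case, n_case, mean_ctrl, sd_ctrl, n_ctrl):
    """Bias-corrected SMD and an approximate variance (illustrative formulas)."""
    pooled_sd = math.sqrt(((n_case - 1) * sd_case ** 2 + (n_ctrl - 1) * sd_ctrl ** 2)
                          / (n_case + n_ctrl - 2))
    d = (mean_case - mean_ctrl) / pooled_sd
    n_total = n_case + n_ctrl
    j = 1 - 3 / (4 * n_total - 9)  # small-sample correction factor
    var_d = n_total / (n_case * n_ctrl) + d ** 2 / (2 * n_total)
    return j * d, j ** 2 * var_d

def pool_smd(effects, variances, i2_threshold=50.0):
    """Inverse-variance pooling: fixed effect if I² < threshold, else random effects."""
    weights = [1 / v for v in variances]
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q and the I² statistic (in percent)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    if i2 < i2_threshold:
        model, estimate, se = "fixed", fixed, math.sqrt(1 / sum(weights))
    else:
        # DerSimonian-Laird between-study variance (an assumed estimator)
        c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
        tau2 = max(0.0, (q - df) / c)
        re_weights = [1 / (v + tau2) for v in variances]
        estimate = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
        se = math.sqrt(1 / sum(re_weights))
        model = "random"
    return {"model": model, "smd": estimate,
            "ci": (estimate - 1.96 * se, estimate + 1.96 * se), "i2": i2}

# Hypothetical per-dataset summaries: (mean, SD, n) for cases, then controls
studies = [(2.1, 1.0, 20, 1.0, 1.1, 18), (3.4, 0.9, 25, 0.8, 1.0, 22)]
per_study = [hedges_g(*s) for s in studies]
result = pool_smd([g for g, _ in per_study], [v for _, v in per_study])
print(result["model"], round(result["smd"], 2), round(result["i2"], 1))
```

Switching models on I² alone keeps the sketch within the standard library; adding the Q-test P value would require a chi-square distribution function.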
Target prediction analysis and bioinformatic analysis
We extracted the predicted target genes from 11 miRNA databases (miRWalk, Microt4, miRanda, mirbridge, miRDB, miRNAMap, Pictar2, PITA, RNA22, RNAhybrid, Targetscan), and genes predicted by at least 5 of the 11 databases were selected as target genes. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of the overlapping genes were performed with the online DAVID database (P<0.05 and FDR>1.5 was considered statistically significant). Furthermore, we used the online STRING database to construct a protein-protein interaction (PPI) network (combined score>0.9 was considered statistically significant) and used the MCODE plugin of Cytoscape to identify the hub genes.
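The "at least 5 of 11 databases" selection rule can be sketched as a simple vote count. The function name, the dictionary layout, and the toy gene lists below are hypothetical and stand in for the real database exports; only the threshold comes from the text.

```python
from collections import Counter

def consensus_targets(predictions, min_databases=5):
    """Return genes predicted by at least `min_databases` of the given databases.

    `predictions` maps a database name to an iterable of predicted gene symbols.
    """
    votes = Counter()
    for genes in predictions.values():
        votes.update(set(genes))  # count each gene at most once per database
    return sorted(gene for gene, n in votes.items() if n >= min_databases)

# Toy input with hypothetical gene lists, using a lower threshold for brevity
toy = {
    "miRWalk": ["ATG7", "CUL3", "THBS2"],
    "miRanda": ["ATG7", "CUL3"],
    "Targetscan": ["ATG7", "MGRN1"],
}
print(consensus_targets(toy, min_databases=2))
```

Converting each list to a set before counting prevents a gene that a database lists more than once from being over-counted.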
Results from the GEO database
Six microarray datasets (comprising 115 LUAD patients and 96 healthy controls) were finally eligible for this study (Figure 1), and the original expression levels of circulating miR-21, miR-155, miR-210, miR-126, miR-486, miR-182 and miR-17 are supplied in supplement material 1. Compared with healthy controls, the meta-analysis showed that the SMD of miR-210 in LUAD patients was statistically significant, with an SMD of 1.86 (95% CI (0.20, 3.52), P=0.03). The forest plot is presented in Figure 2. The SMDs of miR-21, miR-155, miR-126, miR-486, miR-182, and miR-17 were not statistically significant; the forest plots are shown in supplement material 2. Heterogeneity for circulating miR-210 was severe (I²=94%, P<0.00001, Figure 2). The pooled sensitivity, specificity and AUC values of miR-210 were 0.83, 0.60 and 0.78, respectively. The SROC curve is depicted in Figure 3.
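For readers unfamiliar with these accuracy measures, the following minimal sketch shows how sensitivity, specificity, and AUC can be computed for a single dataset from raw expression values. It does not reproduce the pooled SROC analysis performed with Meta-DiSc and MedCalc; the function names, the "higher value means LUAD" direction, and the cutoff are assumptions, and the expression values are hypothetical.

```python
def auc_rank(cases, controls):
    """AUC as the probability that a random case exceeds a random control.

    Ties count as 0.5 (equivalent to the Mann-Whitney U statistic divided by n1*n2).
    """
    wins = 0.0
    for case_value in cases:
        for control_value in controls:
            if case_value > control_value:
                wins += 1.0
            elif case_value == control_value:
                wins += 0.5
    return wins / (len(cases) * len(controls))

def sensitivity_specificity(cases, controls, cutoff):
    """Classify values >= cutoff as positive and return (sensitivity, specificity)."""
    true_positives = sum(value >= cutoff for value in cases)
    true_negatives = sum(value < cutoff for value in controls)
    return true_positives / len(cases), true_negatives / len(controls)

# Hypothetical miR-210 expression values for one dataset
luad = [3.0, 4.0, 5.0]
healthy = [1.0, 2.0, 3.0]
print(auc_rank(luad, healthy))
print(sensitivity_specificity(luad, healthy, cutoff=3.0))
```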
GO and KEGG analyses
A total of 480 overlapping genes were obtained; all predicted target genes from the 11 miRNA databases are listed in supplement material 3. GO enrichment analysis was used to predict the functional annotation of the overlapping genes (Figure 4). GO-BP included 24 significant terms, of which the top three were response to hypoxia, positive regulation of chemokine biosynthetic process, and leukocyte migration. GO-MF included 10 significant terms, of which the top three were protein complex binding, protein kinase activity, and transcription factor activity, sequence-specific DNA binding. GO-CC included 4 significant terms, of which the top three were extracellular matrix, integral component of plasma membrane, and focal adhesion. There were 21 statistically significant KEGG pathways; the main ones were pathways in cancer, Chagas disease (American trypanosomiasis), and the TNF signaling pathway (Figure 4).
PPI network analysis
The PPI network contained 480 nodes and 239 edges (p=0.0138); the protein network diagram is presented in Figure 5. The MCODE plugin of Cytoscape identified 3 significant modules and nine hub genes: FBXO, FBXL, MGRN1, ATG7, CUL3, RAB, ADAMTS, SEMA, THBS2 (Figure 6). The MCODE settings were as follows: Degree Cutoff=2, Node Score Cutoff=0.2, K-Core=2, Max. Depth=100.
Based on the six eligible microarray datasets, we compared the expression levels of circulating miR-21, miR-155, miR-210, miR-126, miR-486, miR-182, and miR-17 between LUAD patients and healthy controls, and found that miR-210 was the only circulating miRNA that reached statistical significance, with excellent diagnostic performance for LUAD (AUC value was 0.83). We then examined the oncogenic role of miR-210 in LUAD through bioinformatics analysis and identified 38 statistically significant GO terms and 21 KEGG pathways. Consistent with previous research, our results again indicated that hypoxia is an important feature of LUAD and that miR-210 is regarded as the most important hypoxamir associated with the tumorigenesis of LUAD. We also constructed a PPI network of the target genes and identified nine hub genes. Therefore, this study provides evidence for circulating miR-210 as a promising noninvasive biomarker for early LUAD detection, and provides a foundation for further research on circulating miR-210 in the pathogenesis of LUAD.
MiR-210 is one of the most widely studied miRNAs and has attracted great attention because it is associated with various biological processes and with the development of different human diseases [44]. A growing body of literature exploring the role of circulating miR-210 in LUAD has shown miR-210 to be a noninvasive biomarker for LUAD detection [45-47]. The study by Tamiya H revealed that the AUC value of exosomal miR-210 for the diagnosis of LUAD with pleural effusion was 0.81 [48]. In the study by He Y, circulating miR-210 combined with other miRNAs (miR-199a-3p, miR-148a-3p, miR-378d, miR-138-5p) had a diagnostic specificity of 90.2% in LUAD presenting with pulmonary nodules [49]. Shen et al also used a plasma miRNA panel (miR-210, miR-21, miR-486-5p, and miR-126) for the detection of LUAD, which yielded 92% sensitivity and 97% specificity [50]. More recently, one study demonstrated that serum miR-210 displayed considerable accuracy in discriminating LUAD patients from healthy controls, with an AUC value of 0.84 [46]. Consistent with these previous results, this study showed that the combined AUC value of circulating miR-210 for LUAD was 0.83. Circulating miR-210 testing therefore appears to be a promising method for diagnosing LUAD. It offers a novel approach to improving the management of LUAD, which is a major subtype of NSCLC, and to finding effective noninvasive markers for classifying NSCLC subtypes. Future validation of circulating miR-210 as a noninvasive, specific biomarker for LUAD should preferably be done in large-scale prospective studies.
Overall, we demonstrated that the diagnostic performance of circulating miR-210 for LUAD appears rather promising. However, there was severe heterogeneity among the studies reporting circulating miR-210 as a biomarker for LUAD in this meta-analysis (Figure 2). One source of heterogeneity is the variation in sample types and population characteristics among studies. miRNA profiles differ when they are obtained from different sample types and with different sample preparation methods [34, 51]. Although plasma, serum and whole blood are all blood-based samples, reports have described variation in circulating miRNA profiles caused by sample handling and preparation [52, 53], among which the blood fractionation protocol is the major factor of concern [51, 54, 55]. Population characteristics, such as age, smoking status, and ethnicity, have been demonstrated to be another potential source of miRNA level variability in LC patients [37, 56, 57]. Another important source of heterogeneity was the small sample size of the eligible datasets: three of the six datasets had small samples (GSE93300, GSE94536, GSE111803). Sample size is one of the critical elements affecting statistical power, and studies in large populations can help to reduce analytical bias and improve diagnostic performance. In addition, one dataset (GSE103149) provided only averaged normalized values without the original expression levels of circulating miRNAs, so its SD values were estimated by the estimation formula, which may be a further source of heterogeneity. Considering the limitations mentioned above, the continued development of new technologies and bioinformatics tools to reduce analytical bias is necessary.
The roles of miRNAs in the initiation and progression of LUAD involve complicated gene expression and signaling pathways, and researchers worldwide have focused on this theme in recent years [27, 31, 58, 59]. In this study, we also predicted the target genes of miR-210 in LUAD and then used bioinformatics analysis to explore the potential biological processes and molecular pathogenesis in which the target genes are involved. Nine central genes were identified (FBXO, FBXL, MGRN1, ATG7, CUL3, RAB, ADAMTS, SEMA, THBS2). GO annotation and KEGG pathway analysis revealed that response to hypoxia was the main biological process involved, extracellular matrix was the main cellular component, protein complex binding was the dominant molecular function, and pathways in cancer was the significant pathway containing the most genes. In solid tumors, oxygen concentration is reduced to varying degrees, and hypoxia is the most common feature of the tumor microenvironment. A current overview shows that miR-210 has been identified as a major miRNA induced under hypoxia and plays numerous crucial roles in the cellular response to hypoxia, such as in apoptosis [60], angiogenesis [61], cell cycle regulation [62], and mitochondrial metabolism [63]. Furthermore, several studies have suggested a direct connection between miR-210 and hypoxia; in particular, miR-210 carries a HIF-1α-binding site in its promoter [64-66]. While miR-210 has garnered interest as a prospective biomarker for LUAD detection, further work is required to confirm its detailed role in biological processes and molecular mechanisms.
Our study suggests that circulating miR-210 has great potential as a noninvasive diagnostic biomarker for LUAD. Compared with previous reviews, we focused on studies distinguishing LUAD patients from healthy controls based on the original expression levels of circulating miR-210, with the aim of improving the early diagnosis rate. We also analyzed the possible sources of heterogeneity among the eligible datasets. In particular, we confirmed that response to hypoxia may be the main biological process involving miR-210 in LUAD and predicted nine hub genes through bioinformatics analysis. However, the specific role of the hub genes is still not fully known. Large-sample prospective studies and molecular biology research are needed to validate the diagnostic efficacy of circulating miR-210 for LUAD detection and to identify the specific biological processes and target genes in LUAD.
FBXO: member F-box protein family, FBXL: member F-box and leucine rich repeat protein family, MGRN1: mahogunin ring finger 1, ATG7: autophagy related 7, CUL3: cullin 3, RAB: member RAS oncogene family, ADAMTS: member ADAM metallopeptidase with thrombospondin type 1 motif family, SEMA: semaphorin, THBS2: thrombospondin 2. SD: standard deviation, SEN: Sensitivity, SPE: Specificity, SROC: Summary receiver operating characteristic, AUC: The area under the SROC curve. MiRNA: MicroRNA.
Ethics approval and consent to participate: Not applicable.
Consent for publication: Not applicable.
Availability of data and material: The dataset supporting the conclusions of this article is included within the article and its additional file.
Competing interests: The authors declare no competing interest.
Funding: None.
Authors' contributions: HX and JRX designed this study. JRX, HX and ENJ collected the literature and conducted the analysis of pooled data. NR helped to draft the manuscript. JRX wrote the manuscript. All authors contributed to reviewing the manuscript. All authors read and approved the final manuscript.
Acknowledgements: Not applicable.