The construction and analysis of ceRNA network and patterns of immune infiltration in lung adenocarcinoma

Background Competitive Endogenous RNA (ceRNA) may be closely associated with tumor progression. However, studies on ceRNAs and immune cells in LUAD are scarce. Method The profiles of gene expression and clinical data of LUAD patients were extracted from the TCGA database. Bioinformatics methods were used to evaluate differentially-expressed genes (DEGs) and to form a ceRNA network. Preliminary verification of clinical specimens was utilized to detect the expressions of key biomarkers at the tissues. Cox and Lasso regressions were used to identify key genes, and prognosis prediction nomograms were formed. The mRNA levels of 9 genes in the risk score model in independent clinical LUAD samples were detected by qRT-PCR. The interconnection between the risk of cancer and immune cells was evaluated using the CIBERSORT algorithm, while the conformation of notable tumor-infiltrating immune cells (TIICs) in the LUAD tissues of the high and low risk groups was assessed using the RNA transcript subgroup in order to identify tissue types. Finally, co-expression study was used to examine the interconnection between the key genes in the ceRNA networks and the immune cells. Result A ceRNA network of 115 RNAs was established, and nine key genes were identified to construct a Cox proportional-hazard model and create a prognostic nomogram. This risk-assessment model might serve as an independent factor to forecast the prognosis of LUAD, and it was consistent with the preliminary verification of clinical specimens. Survival analysis of clinical samples further validated the potential value of high risk groups in predicting LUAD prognosis. Five immune cells were identified with significant differences in the LUAD tissues of the high and low risk groups. Besides, two pairs of biomarkers associated with the growth of LUAD were found, i.e., E2F7 and macrophage M1 (R = 0.419, p = 1.4e− 08) and DBF4 and macrophage M1 (R = 0.282, p < 2.2 e− 16). Conclusion This study identified several important ceRNAs, i.e. (E2F7 and BNF4) and TIICs (macrophage M1), which might be related to the development and prognosis of LUAD. The established risk-assessment model might be a potential tool in predicting LUAD of prognosis. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-021-08932-z.


Introduction
Lung cancer (carcinoma) is one of the most prevalent tumors with rapid progression ability, high metastatic potential, high morbidity, and fatality [1]. The World Health Organization reported in 2018 that incidences and death rates of lung cancer were 11.6 and 18.4% respectively. There are two main subtypes of lung carcinoma, namely, small-cell lung cancer (SCLC) and nonsmall-cell lung cancer (NSCLC). Pathologically, there are several subtypes of NSCLC, namely, lung adenocarcinoma (LUAD), squamous carcinoma (LUSD), and adenosquamous carcinoma (AC). Among these NSCLCs, LUAD is the most prevalent subtype in histology. Its rapid progression ability mainly due to the high metastatic nature of the tumor. This is the most common reason for treatment to fail [2]. LUAD is one of the most critical malignancies, which is often detected at the progressive stage with a poor clinical prognosis [3]. Numerous genetic and molecular biomarkers, such as proteincoding and non-coding genes are often utilized as diagnostic and remedial means for LUAD [4][5][6][7][8]. Among these biomarkers, the competitive endogenous RNAs (ceRNAs) and tumor-infiltrating immune cells (TIICs) are potentially the most important ones, as they could affect tumor development. However, studies on the ceR-NAs are scarce with poor clinical prognosis, and the factors related to the LUAD progression remain largely unknown.
The ceRNA network assumes that RNA transcripts containing microRNA (miRNA) response elements could seal miRNA from other targets obtaining the same miRNA response elements, thus, modulating its expression and life processes [9]. Earlier studies found that the ceRNA networks were involved in the development, metastasis, and prognosis of lung cancers, such as linc00665 and miR-98 [10] networks. Besides, the presence of TIICs, especially the leukocytes, supervise the immune system activities, may played a marjor role in the development, progression, and metastasis of neoplastic cells, including LUAD [11][12][13]. However, only a handful of investigations focused on the regulatory mechanism between the ceRNA network and TIICs, but not LUAD [14].
This study aimed to construct a ceRNA network risk assessment model that could be used for prognosis prediction and immune infiltration analysis of LUAD. A LUAD-related ceRNA network was established based on gene expression profiles to identify key LUAD genes, and the prognostic values of these genes were determined using a Cox proportional-hazard model. Also, this study estimated the proportion of TIICs in LUAD tissues in the high and low risk groups using the Cell Type Identification Estimation of Relative Subpopulation RNA Transcription (CIBERSORT) algorithm. Moreover, co-expression analysis was used to forecast the occurrence and growth of LUAD. The mechanism of LUAD development was also considered. The risk-assessment model and the established mechanism of regulation might provide us with a potential insight for predicting the clinical occurrence and development of LUAD.

Materials and methods
Source of data and analysis of disparate gene expression In this study, data were extracted from The Cancer Genome Atlas (TCGA; https://gdc-portal.nci.nih.gov/). These data comprised of the clinical information of LUAD patients and the profiles of long non-coding RNA (lncRNA), messenger RNA (mRNA), and microRNA (miRNA). Data were organised by different expression profiles, ID conversion, filtering, merging, correction, and clinical information. Demography and other information of patients, survival endpoints, and the histology and tumor stage of the LUAD were also acquired. Genes that showed no expression in LUAD (genes that did not show in tests and control groups) were seperated. Differentially expressed RNAs, including Differentially Expressed LncRNAs (DELs), Differentially Expressed miRNAs (DEMs) and Differentially Expressed mRNAs (DEGs) were determined utilizing the edgeR method. The minimum and maximum regulated genes, | log 2 Table 1 RT-qPCR primers fold change (log 2 FC) | > 1.0, and the false discovery rate (FDR) < 0.05 were considered as thresholds.

Patient selection
Patients who underwent surgical resection of primary LUAD at The Second Affiliated Hospital of Qiqihar Medical College, Qiqihar, Heilongjiang Province, China between Oct. 2019 and Aug. 2021 were selected for this study. Tumor and non-tumor specimens (located > 2 cm from the tumor margin) obtained from the enrolled patients were analyzed by RT-qPCR for nine genes. All patients did not receive chemotherapy or radiotherapy prior to surgery. Pathological diagnosis was made by 2 pathologists. The Second Affiliated Hospital of Qiqihar Medical College institutional review board approved the study protocol, and all patients provided written informed consent.
Construction of LUAD-associated ceRNA networks and identification of genes associated with prognosis Before the analysis of statistics, experimentally validated information on miRNA-mRNA interactions based on miRTarBase was downloaded from the experimental modules of the database Lncbase v.2, along with the lncRNA-miRNA interaction data. All interactions were validated by direct molecular mechanisms, such as luciferase reporter experiments and immunoprecipitation.
To simultaneously regulate lncRNAs and mRNAs, the hypergeometric testing and correlation analysis of miR-NAs was performed to obtain DEM-DEG pairs. Axes of DEM-DEG were obtained to build the ceRNA network    In order to test the modeling derived from different genes, the key genes were screened with Cox and Lasso regressions to create a predictive point chart. The Cox model encompassed all elements of the ceRNA network, and the Lasso regression was used to find the overfitting of the model. The values of all biomarkers were determined using the Cox proportional-hazard model. A histogram was created with the model to forecast the survivalbility of LUAD patients. Calibration curves and receiver operating characteristic curves (ROCs) were used to assess the precision and discriminatory power of the chart. Finally, each biomarker was validated using the Kaplan-Meier survival analysis method.
Immune risk assessment of key RNAs, CIBERSORT estimation, and the validation of initial clinical specimens Univariate and multivariate Cox regressions were used. Factors like age, sex, tumor stage, TNM, and risk score were performed to examine if the constructed immune risk model was unconstrained by clinicopathological parameters. The CIBERSORT algorithm (citation) was used to identify cell types by estimating the relative subset of RNA transcripts of 22 cell types to determine the relationship between risk and immune cells in LUAD. Cell types in samples with a CIBERSORT output of p < 0.05 were further identified by determining the relative subset of RNA transcripts. The Wilcoxon rank-sum test was employed to examine whether the proportion between high and low risk groups in LUAD tissue was different by identifying the immune-infiltrating cells. Expressions of the key genes in the tissues were verified through searching over the database of The Human Protein Atlas (https://www.proteinatlas.org/). The immunohistochemical results of key genes in ceRNA were compared.

RNA extraction and qRT-PCR
Total RNA was extracted from LUAD tissues with Trizol Reagent (Invitrogen, Carlsbad, CA, USA), which was reversely transcribed with RT reagent Kit gDNA Eraser (TaKaRa). And then, cDNA expression levels were detected by SYBR-Green (TaKaRa) and RT-qPCR analysis with GAPDH as internal reference. The primers were shown in Table 1. PCR amplification was carried out in a formula of three Wells. All experiments were repeated three times and genes' relative expression levels were studied with 2 -△△Ct .

Statistical analyses
The software R was used to perform all statistical analyses, such as preprocessCore package, edgeR package, rms package, and ggplot2 package. The survival analysis was performed using the Fisher's precision test for the high and low risk score models. Only a two-sided p value < 0.05 was considered statistically significant.

Result
Identifying genes with markedly different expression

Construction of the ceRNA network and identification of genes associated with prognosis
The ceRNA network was constructed in this study with 115 genes (seven lncRNAs, 15 miRNAs, and 93 mRNAs) in order to examine the pattern of differential gene regulation ( Fig. 2A). Table 2 shows the ceRNA network with a hypergeometric p -value < 0.05 as the cut-off value. In addtion, the results of gene survival analysis were screened for prognostic significance of ceRNA nodes with p < 0.05 as the cut-off value. As a result, this study demonstrated the survival curve of two genes, SNHG3 and CCT6A, respectively ( Fig. 2B-C). Both curves show that the expression levels of SNHG3 and CCT6A were associated with the overall survivalbility of patients with LUAD.

Risk assessment analysis of the prognostic value of key RNAs in ceRNA networks
In this study, 38 genes in the ceRNA network were analysed utilizing univariate Cox regression, and nine genes were screened and tested utilizing multivariate Cox regression. Nine genes (DBF4, CPS1, CDC14A, CCT6A, SLC16A1, E2F7, GPR37, SNHG3, Hsa-miR-326) in the ceRNA network were merged into the Cox proportionalrisk model to assess its prognostic value (Fig. 3A). A nomogram was established using the risk score of the prediction model and clinical variables to calculate the overall survival rate of LUAD patients at the first, third, and fifth years (Fig. 3B). After adopting multivariate cox analysis, Lasso regression was perfromed to assess the stability of the genetic model ( Fig. 3C and D). The timedependent ROC curves and calibration curves ( Fig. 3E and F) showed good accuracy for the area under the curve (AUC) as well as the calibration of the nomogram.
The AUC values for first-year, third-year, and fifth-year survivalbility were 0.718, 0.705, and 0.707, respectively.

Preliminary verification of clinical specimens
To verify the expression of key genes in tissues, we searched through The Human Protein Atlas database (https://www.proteinatlas.org/). Immunohistochemical results showed that CCT6A, CDC14A, DBF4, and SLC16A1 proteins were showing different attributes in differentiating healthy tissues and LUAD (Fig. 4).

Risk-scoring models as independent predictors for LUAD prognosis
The risk scores and survivalbility of LUAD patients are shown in Fig. 5A and B, respectively, and LUAD patients were separated into high and low risk groups in terms of risk scores. To investigate whether the constructed risk score model was independent of clinicopathological parameters, the univariate and multivariate Cox regression analysis for age, sex, stage, TNM, and risk score were performed. In univariate Cox model, pathologic T, N stage, pathologic stage, and high-risk score were related to poor survivalbility (Fig. 5C). And Fig. 5D shows that in multivariate Cox model, the risk score was the sole parameter that could independently predict the overall survivalbility. The Kaplan-Meier survival curve was constructed for the high and low risk groups with a statistically meaningful difference in the survivalbility between the two groups (p < 0.001), and The low-risk group had a superior OS than that the high-risk group (Fig. 5E). Based on the above findings, we further designed a validation cohort to confirm the potential value of riskscore model. levels in predicting LUAD prognosis. Twenty patients with LUAD were enrolled. LUAD patients were divided into two high-and low-risk groups based on the levels of risk score (median value), patient survival of low-and high-risk score groups was analyzed by Fisher's exact test at 1 year of return visit. As shown in the contingency table, high risk score group was correlated with poor outcome of LUAD patients (Table 3), which suggested the expression level of nine genes have a high prognostic value in patients with LUAD.
The composition of TIICs between the LUAD high-and low-risk scores As illustrated in Fig. 6, the histogram (Fig. 6A) and heat maps (Fig. 6B) were generated from the CIBERSORT algorithm for assessing the composition of significant TIICs in LUAD tissues of high and low risk groups. In both groups, the T-cell CD4, T-cell CD8, macrophage M0, and macrophage M2 were significantly demonstrated in LUAD tissues. The Wilcoxon rank-sum test showed that five immune cell components, i.e., B-cell naïve (p = 0.002), T-cell CD4 naïve (p = 0.028), T-cell gamma δ (p = 0.029), monocytes (p = 0.049), and macrophage M1 (p = 0.010), were significantly different in LUAD tissues between the high and low risk groups (Fig. 6B).

The co-expression of genes and immune cells for prognosis
In order to explore the role of key genes in the ceRNA network and tumor immune cells played during prognosis of LUAD, co-expression analysis of key genes in the ceRNA network and different TIICs acquired from LUAD samples from high and low risk groups were performed. The result shows that the E2F7 gene and macrophage M1 (the correlation coefficient, R = 0.42, p < 2.2e − 16 ), DBF4 and macrophage M1 (R = 0.28, p = 1.4e − 08 ) displayed a positive correlation (Fig. 7).

Discussion
In this study, a bioinformatics approach was used to predict lung cancer progression and metastasis based on RNA sequencing data extracted from the TCGA database, and 93 DE mRNAs, seven DE lncRNAs, and 15 DE miRNAs were found to be dysregulated in LUAD tissues. The relationship between differentially expressed RNAs in the ceRNA network and the clinical prognosis of LUAD patients showed that key genes could well-predict LUAD. Besides, the Cox proportional-risk model might serve as an unconstrained factor in predicting LUAD. Looking at the nomogram of the nine key genes (hsa-miR-326, DBF4, CPS1, CDC14A, CCT6A, SLC16A1,   1.4e − 08 ) were significantly associated with the attributes of co-expression. And their associations are crucial in predicting LUAD. In recent years, miRNA is one of the hot topics in tumor research, largely because aberrant expression of miRNA is related to tumorigenesis and progression of multiple tumors [15]. This study predicted that 15 DE miRNAs were linked to the development of LUAD. In particular, aberrant expression of hsa-miR-326 (miR-326) is involved in a variety of pathological processes, including endometrial cancer, gastric cancer, lung cancer, osteosarcoma, pulmonary fibrosis, and breast cancer. Thus, it is used as a biomarker for identifying cancer, treatment, and prognosis [16][17][18][19][20][21]. As a repressor of the Hedgehog signaling pathway, miR-326 controls the growth of cerebellar neuronal progenitors and cancer cells [22]. It is also found on patients with type I diabetics and leukemia [23][24][25]. Meanwhile, low miR-326 expression in gastric tumor is related to clinical stage, tumor depth, lymph node metastasis, and distant metastasis; it is a relatively poorly constrained prognostic factor for gastric tumor [26]. In glioblastoma tissues, miR-326 was downregulated and involved in tumorigenesis and progression of glioma, and small amount of miR-326 expression was correlated with clinicopathological factors and prognosis of glioma [27]. However, studies on hsa-miR-326 and the development of LUAD are generally scarce.
Immune infiltration of the tumor microenvironment is an important factor affecting the immune response and prognosis. Macrophages, which are crucial in the metastatic process, are major components of TIICs and often trigger local inflammation [28]. In this study, macrophages in tumor bodies were divided into M1 and M2 types. In the early stages of tumor development, macrophages either act as phagocytose individual tumor cells or act as antigen presenting cells (APCs) to trigger an immune response of CD8+ T cells. Eventually, when CD8+ T cells are unable to generate enough immune effect, tumor-associated macrophages (TAMs) would secretly stimulate tumor growth or angiogenesis [29]. However, macrophage infiltration in the tumor stroma is a negative prognostic factor for LUAD [30]. In NSCLC, TAMs would stimulate tumor metastasis via the TGF-inhibitor/ SOX9 axis [31]. Specifically, the M2 subtype stimulates the invasion of lung cancer cells, while the M1 subtype suppresses tumor formations [32]. As a result, macrophages are a key factor in the metastatic process of LUAD [33].
E2F7 is a member of the family of E2F transcription factors (E2Fs), which plays an important role in cell proliferation, differentiation, and apoptosis. There are eight genes in the E2F family, designated as E2F1 -E2F8 in the order of discovery. E2F7 abnormalities seem to have a crucial function in the growth of cancer cells. In breast-cancer patients treated with tamoxifen, the abundant apperence of E2F7 is associated with a high possibility of recurrence and poor prognosis. E2F7 usually competes with E2F1 to suppress miR-15a/16 clustering [34]. The acquisition of the E2F7 function counteracted the effects of miR-30a-5p on cell propagation and metastasis [35]. The abundant aperence of E2F7 shown within NSCLC tissues usually correlated with poor prognosis. It usually inhibits cell propagation, migration, invasion, development of cancer, EMT and AKT pathways in NSCL C cells by targeting miR-935 [36]. However, related studies are scarce. In summary, this study analysed DE mRNAs, lncRNAs, and miRNAs in LUAD cancer and para cancer using an integrative biological approach. A ceRNA network of lncRNA-miRNA-mRNA ceRNA was constructed, uncovering a potentially new regulatory mechanism. The relationship between DE mRNAs, lncRNAs, and miRNAs in the ceRNA network and a Cox proportional-risk model in ceRNAs was examined to predict LUAD prognosis. This riskassessment model could serve as an independent factor to predict LUAD prognosis. In LUAD tissues, five immune cell types with significant differences were identified using the CIBERSORT algorithm and coexpression analysis, revealing significant correlations between E2F7 and macrophage M1 (R = 0.42, p < 2.2e − 16 ) and DBF4 and macrophage M1 (R = 0.28, p = 1.4e − 08 ). These two pairs of co-expressed genes and their associated mechanisms would play an important role in predicting LUAD prognosis.