Differential analysis of genes
In the aggregate, 1995 differentially expressed lncRNAs, with corresponding clinical information were extracted from TCGA and GSE6791 databases mentioned above. Of these, 567 lncRNAs were highly expressed and 1428 lncRNAs were expressed at low levels in the CC samples (Table 1). A total of 64 lncRNAs were obtained from the intersection of the two databases (Fig. 1A), among which 10 lncRNAs were consistently up-regulated and 54 lncRNAs were consistently down-regulated in the two databases. A total of 1040 IRGs were identified from the InnateDB. The ssGSEA analysis was based on 201 IRGs obtained from the intersection of TCGA and InnateDB databases (Fig. 1B) to compute IS. Finally, a cohort of 28-IRLs was obtained based on the Pearson correlation coefficient analysis between the 64 lncRNAs and IS of 201 IRGs(P༜0.01).
Table 1
Differential mRNAs and lncRNAs
|
|
mRNA
|
lncRNA
|
TCGA
|
up
|
3305
|
472
|
down
|
4163
|
733
|
total
|
7468
|
1205
|
GSE6791
|
up
|
4680
|
95
|
down
|
3133
|
695
|
total
|
7813
|
790
|
Univariate Cox regression and Kaplan–Meier survival analysis
Univariate Cox regression and Kaplan–Meier survival analysis were performed for the cohort of 28 IRLs. Four IRLs (BZRAP1-AS1, EMX2OS, ZNF667-AS1, and CTC-429P9.1) with low expression were found to be in accordance with our expectation and were then included in the signature development. The results are shown in Table 2. The Kaplan–Meier survival analysis revealed that four IRLs were related to OS in CC significantly (p ༜0.05) (Fig. 2). The four IRLs were defined as protective factors due to their HR value ༜1, which showed that the high expression of the four IRLs was associated with lower OS.
Table 2
K-M analysis, Univariate Cox regression analysis and Pearson correlation analysis of 4-IRLs
|
K-M analysis
|
Univariate Cox regression analysis
|
Pearson correlation analysis
|
differential expression analysis
|
lncRNA
|
p.value
|
HR
|
lower.95
|
upper.95
|
p.value
|
r
|
p.adj.value
|
up_down
|
BZRAP1-AS1
|
0.0002
|
0.82
|
0.72
|
0.93
|
0.00
|
0.32
|
2.84E-08
|
down_lnc
|
EMX2OS
|
0.0010
|
0.91
|
0.82
|
1.00
|
0.05
|
-0.25
|
3.01E-05
|
down_lnc
|
ZNF667-AS1
|
0.0019
|
0.87
|
0.77
|
0.99
|
0.03
|
-0.17
|
0.007
|
down_lnc
|
CTC-429P9.1
|
0.0159
|
0.93
|
0.82
|
1.06
|
0.27
|
-0.24
|
5.15E-05
|
down_lnc
|
Construction of the prognostic risk model and validation using four IRLs
A total of 304 TCGA samples were regarded as the total-set so that there were both 152 samples in the training-set and the valid-set as described in the methods. The regression coefficient β was first generated from the training-set (βZNF667−AS1= -0.152565, βEMX2OS=-0.019887, βBZRAP1−AS1=-0.17831, βCTC−429P9.1 = 0.0096755, Table 3) to establish the risk score(RS) formula. Hence, the prognostic risk model for corresponding samples was established using the following formula:
Table 3
Ensemble_ID
|
lncRNA
|
β
|
ENSG00000166770.10
|
ZNF667-AS1
|
-0.152565
|
ENSG00000229847.8
|
EMX2OS
|
-0.019887
|
ENSG00000265148.5
|
BZRAP1-AS1
|
-0.17831
|
ENSG00000269427.1
|
CTC-429P9.1
|
0.0096755
|
risk score (RS) = exprBZRAP1−AS1*(-0.152565) + exprEMX2OS*(-0.019887) + expr BZRAP1−AS1*(-0.17831) + expr CTC−429P9.1*0.0096755
Exprgene indicated the expression value of corresponding IRL for each sample. An RS higher than the median was identified as a high-risk group while an RS lower than the medium was identified as a low-risk group. Using this approach, a risk model signature based on four IRLs was constructed, which was further validated in the total-set and valid-set using the same β to confirm the prediction potential of the 4-IRL signature. The Kaplan-Meier survival curves based on logRank test revealed that OS in the low-risk group was markedly longer than that in the other group in all three sets (P Training−set =0.0068, P Valid−set =0.02, P Total−set =0.0015, Fig. 3A, 3B, 3C). The one-year, two-year and three-year survival ROC curves predicted by the risk model indicted that the AUCs were larger than 0.65 (0.695, 0.66, 0.676, Fig. 3D),thus predicting that the risk score model could efficiently forecast over 65% of OS for CC patients. Therefore, the risk model signature based on four IRLs was accurate in predicting OS of CC patients.
Clinicopathological characteristics of the risk score model
A heat map was constructed by combining the expression values of four IRLs and their clinicopathological characteristics (Fig. 4A). The higher the RS, the lower the IS. RS and IS values showed a significant negative correlation based on the scatterplot of the correlation coefficient (r=-0.14, p = 0.01631, Fig. 4B). The scatterplot for the distribution of IS in risk groups showed that the IS of the high-risk group was significantly lower than that of the low-risk group (Fig. 4C). Furthermore, the high expression of the four IRLs could be seen with low RS, which suggested that the up-regulation of the four IRLs were associated with better prognosis. In contrast, low expression of the four IRLs had the opposite consequence. An RS contrast of neoplasm histologic grade showed that the RS in stages IIB-III-IV was significantly greater than that in stages I-II-IIA (Fig. 4C). This part of the result suggests that a high-risk score has an adverse effect on prognosis, which may be caused by a decrease in immune score.
Nomogram model construction and visualization
The univariate and multivariate Cox regression analyses of clinicopathological characteristics and the four-IRL signature for total TCGA dataset demonstrated that age, AJCC stage along with the four-IRL signature were all independent prognostic factors (P༜0.05,Table 4). Nonetheless, neoplasm histologic grade, radiation therapy, and a history of tobacco-smoking were not correlated with prognosis independently. To better predict prognosis at one-, three-, and five-year OS of CC patients, we constructed a new nomogram using variables such as age, AJCC stage, and the four-IRL signature (Fig. 5).
Table 4
Univariate and multivariate cox regression analyses of clinical parameters
|
Univariate
|
Multivariate
|
Variable
|
HR
|
p.value
|
HR
|
p.value
|
Age,years
|
|
|
|
|
>=60 vs < 60
|
1.86253
|
0.01696
|
3.689112
|
0.048533
|
Grade
|
|
|
|
|
III-IV vs I-II
|
0.889401
|
0.661005
|
|
|
T stage
|
|
|
|
|
T3-T4 vs T1-T2
|
3.64268
|
0.000074
|
3.869304
|
0.060251
|
N stage
|
|
|
|
|
N1 vs N0
|
2.694785
|
0.0046
|
2.319201
|
0.132728
|
M stage
|
|
|
|
|
M1 vs M0
|
3.64563
|
0.020534
|
2.24E-07
|
0.998132
|
AJCC stage
|
|
|
|
|
IIB-III-IV vs I-II-IIA
|
1.861189
|
0.009193
|
0.200615
|
0.057357
|
Radiotherapy
|
|
|
|
|
Yes vs no
|
1.091559
|
0.828872
|
|
|
tobacco_smoking_history
|
|
|
|
|
> 1 vs = 1
|
1.469726
|
0.12418
|
|
|
Four-lncRNA signature
|
|
|
|
|
High risk vs Low risk
|
2.137914
|
0.001933
|
3.047015
|
0.032286
|
Gene set enrichment analysis
GSEA analysis of the two risk groups was carried out to predict enrichment status disparities of molecular mechanism functions. The enrichment analysis showed that 118 biological functions were markedly enriched in the low-risk group, whereas only one biological function(GO_ATP_SYNTHESIS_COUPLED_ELECTRON_TRANSPORT)was enriched in the high-risk group. In the enrichment status enriched in the low-risk group, we sought out couple immune-related responses as shown in Fig. 6.
Evaluation of infiltrating immune cells
Previous studies have shown that infiltrating immune cells are closely related to the prognosis and treatment of malignant tumors[29]. From the GSEA analysis in our study, we discovered that the four-IRL signature was associated with many immune characteristics. Hence, the abundance of twenty-two infiltrating immune cells (Fig. 7A) were estimated which showed that the abundance of nine infiltrating immune cells (B cells naïve, B cells memory, T cells CD4 memory resting, NK cells resting, macrophages M1, dendritic cells activated, mast cells resting, mast cells activated and neutrophils) was significantly different (P༜0.05) between the risk groups (Fig. 7B). The abundance of four infiltrating immune cells (B cells naïve, T cells CD4 memory resting, macrophages M1 and mast cells resting) in the low-risk group were significantly higher than that in the high-risk group as shown by the Student's t-test(P༜0.05).
Construction of an IRL-associated ceRNA network
Four-IRLs,forty-five hub mRNAs༌and thirty-eight miRNAs were involved in ceRNA network (Fig. 8). A total of 46 lncRNA-miRNA pairs, 232 miRNA-mRNA pairs, and 55 lncRNA-mRNA pairs were identified. The downregulation of 12 mRNAs in the ceRNA network was significantly related to OS in CC. Among them, the downregulation of seven mRNAs (CXCL12, FREM1, IGF1, IRF4, NFATC2, NTN1, and STAT6) had an adverse effect on prognosis, which was in line with what was expectated (Fig. 9).