3.1. Filtering the Stem Cell-Related Genes
We obtained 31 stem cell division related-genes from Molecular Signatures Database and deleted 1 gene that is not in TCGA. Ultimately, a total of 30 associated genes were selected (FIG. 1). To select these survival-related gens, we used LASSO regression analysis. Therefore,16 genes (EV12B, FZD7, ING2, MLLT3, NCOA3, NOTCH1, PRDM15, SFRP2, SMYD5, TGFB2, THOC2, TIAL1, WNT7A, WWTR1, ZBTB16, and ZFP36L2) were identified (FIG. 2). And then, a further analysis of these genes was conducted in the following study.
3.2. Construction a 11-Gene Signature Panel
TAB. 1 Details of the 11 genes included in the signature
|
mRNA
|
Ensemble ID
|
Chromosome location
|
β(Cox)
|
HR(95%CI)
|
p
|
EVI2B
|
ENSG00000185862
|
Chr17:31,303,770-31,314,105
|
0.3195
|
1.38(1.1-1.72)
|
0.00446
|
ING2
|
ENSG00000168556
|
Chr4:183,505,058-183,512,429
|
0.6211
|
1.86(1.26-2.75)
|
0.00179
|
MLLT3
|
ENSG00000171843
|
Chr9:20,341,669-20,622,499
|
-0.6316
|
0.53(0.36-0.78)
|
0.00130
|
NCOA3
|
ENSG00000124151
|
Chr20:47,501,887-47,656,877
|
-1.0023
|
0.37(0.2-0.67)
|
0.00123
|
NOTCH1
|
ENSG00000148400
|
Chr9:136,494,433-136,546,048
|
0.2648
|
1.3(0.99-1.72)
|
0.06002
|
PRDM15
|
ENSG00000141956
|
Chr21:41,798,225-41,879,482
|
0.9214
|
2.51(1.21-5.23)
|
0.01366
|
SFRP2
|
ENSG00000145423
|
Chr4:153,780,591-153,789,083
|
-0.2461
|
0.78(0.73-0.84)
|
0.00000
|
SMYD5
|
ENSG00000135632
|
Chr2: 73,214,222-73,227,221
|
-0.6459
|
0.52(0.26-1.05)
|
0.06810
|
THOC2
|
ENSG00000125676
|
ChrX:123,600,561-123,733,056
|
1.4914
|
4.44(2.39-8.26)
|
0.00000
|
TIAL1
|
ENSG00000151923
|
Chr10:119,571,802-119,597,029
|
-1.0214
|
0.36(0.22-0.6)
|
0.00009
|
WNT7A
|
ENSG00000154764
|
Chr3: 13,816,258-13,880,071
|
0.1963
|
1.22(1.01-1.47)
|
0.03824
|
We performed multifactorial COX regression analysis on the 16 genes retained in the LOSS regression analysis, then we obtained 11 prognosis-related genes (EV12B, ING2, MLLT3, NCOA3, NOTCH1, PRDM15, SFRP2, SMYD5, THOC2, TIAL1 and WNT7A). Therefore, in this way, 11 genes were identified as the most valuable predictive genes. Subsequently, using the formulas involved above, we gain a risk-score system. Risk score =(0.3195×EV12B) + (0.6211×ING2) + (-0.6316×MLLT3) +(-1.0023×NCOA3) + (0.2648×NOTCH1) + (0.9214×PRDM15) +(-0.2461×SFRP2) + (-0.6459×SMYD5) + (1.4914×THOC2) +(-1.0214×TIAL1) + (0.1963×WNT7A) (TAB. 1). A risk score was calculated for each patient and based on its median, 253 patients were assigned to the high-risk group and 253 to the low-risk group. (FIG. 3A and 3B).
TAB. 2 The relation between risk score and clinical features
|
Variables
|
Total (n = 506)
|
High risk (n = 253)
|
Low risk (n = 253)
|
p
|
Age, Median (Q1, Q3)
|
41 (32.25, 53)
|
43 (33, 57)
|
39 (32, 50)
|
< 0.001
|
Gender, n(%)
|
|
|
|
0.788
|
Female
|
226 (45)
|
111 (44)
|
115 (45)
|
|
Male
|
280 (55)
|
142 (56)
|
138 (55)
|
|
Grade n (%)
|
|
|
|
< 0.001
|
2
|
245 (49)
|
84 (33)
|
161 (64)
|
|
3
|
260 (51)
|
168 (67)
|
92 (36)
|
|
New Event, n (%)
|
|
|
|
0.001
|
NO
|
231 (46)
|
97 (38)
|
134 (53)
|
|
YES
|
275 (54)
|
156 (62)
|
119 (47)
|
|
Cancer Status, n (%)
|
|
|
|
< 0.001
|
Tumor free
|
175 (39)
|
68 (30)
|
107 (50)
|
|
With tumor
|
271 (61)
|
162 (70)
|
109 (50)
|
|
3.3. Survival Analysis
Kaplan-Meier survival analysis showed that the OS rate was significantly lower in the high-risk group than in the low-risk group (FIG. 3C). We performed ROC curve analysis to assess the diagnostic role of the constructed model. The AUC was 0.827 (FIG. 3D), suggesting that this 11-gene signature played an important role and accuracy in the efficacy of this diagnosis.
To reveal the relationship between risk scores and clinical characteristics, we applied chi-square test (TAB. 2). It indicated that the Age, Cancer-Status, New-Event-Type and Grade are linked to the risk score of LGG patients.
3.4. Verification of the stem cell division related-genes Risk Model in the CCGA Cohort
In the following, we validated the accuracy of the prognostic model using the LGG data from the CCGA database. First, we gathered 657 LGG patients that had both gene expression and patient survival information from the CCGA. We then calculated the risk score using the same risk score calculation formula as in TCGA. Based on the median risk score, we divided the 657 LGG patients in the CCGA into the low-risk group(329) and the high-risk group(328) (FIG. 4A). Subsequently, the scatter plot demonstrated that the low-risk group had a longer survival time than the high-risk group (FIG. 4B). Additionally, The Kaplan-Meier method was utilized to examine the survival curves of the low-risk and high-risk patient groups, and there was a significant difference in survival rates between the two groups (p < 0.0001, FIG. 4C). This was consistent with our results in the TCGA-LGG dataset. Finally, we knew that the AUC for 5-year OS was 0.703 in the CCGA-LGG dataset (FIG. 4D). Therefore, it was obvious that the model we developed had relative predictive accuracy.
3.5. Genetic alteration and expression of those 11 genes
BioPortal (http://www.cbioportal.org) was used to analyze genetic alterations in the 11 genes that were screened in LGG. Clearly, all 511 patients had genetic alterations. And yet, there were 101 cases had alterations in all query genes, containing 50(9.78%) cases of mutation, 30(5.87%) cases of homozygously deleted(HOMDEL), 16(3.13%) cases of amplification, 3(0.59%) cases of fusion and 2 (0.39%)cases of multiple alterations(TAB. 3). In summary, this result showed that the most common type of genetic alteration was mutation. Conversely, the least common type of genetic alteration was multiple alterations. The heatmap of alterations in these 11 genes revealed that the change rates from high to low were 9.20% (NOTCH1),3.13% (MLLT3), 2.54% (TIAL1)), 2.35%(ING2), 1.76%(SFRP2), 1.57%(THOC2), 1.17%(EV12B), 0.59%(NCOA3), 0.59%(PRDM15), 0.98%(WNT7A) and 0.20%(SMYD5),respectively(FIG.5). In addition, the specific of the changes of the 11 genes were shown in TAB. 4.
Meanwhile, we analyzed the expression of these 11 genes in the two groups (FIG. 6A). The results showed that these 11 genes in the two groups have a remarkably differences in expression. The expression of EV12B, ING2, NCOA3, PRDM15, and THOC2 in high-risk group was higher than that in low-risk group, while the expression of TIAL1, SFRP2, MLLT3, and WN17A in high-risk group was lower than that in low-risk group (FIG. 6B).
TAB. 3 Details of the 11 genes in different alterations.
|
Alteration
|
Number of casea
|
Frequency
|
Mutation
|
50
|
9.78%
|
HOMDEL
|
30
|
5.87%
|
Amplification
|
16
|
3.13%
|
Fusion
|
3
|
0.59%
|
Multiple alterations
|
2
|
0.39%
|
Total
|
101
|
19.77%
|
.
TAB. 4 Different genetic alteration patterns of the 11 genes in LGG samples (n=511)
|
mRNAs
|
No Alterations
|
Genetic Alteration
|
Altered/Profiled(%)
|
Mutation
|
HOMDEL
|
Amplification
|
Fusion
|
EVI2B
|
505
|
2
|
3
|
1
|
0
|
1.17%
|
ING2
|
499
|
0
|
11
|
1
|
0
|
2.35%
|
MLLT3
|
495
|
3
|
11
|
0
|
2
|
3.13%
|
NCOA3
|
508
|
3
|
0
|
0
|
0
|
0.59%
|
NOTCH1
|
464
|
38
|
1
|
7
|
1
|
9.20%
|
PRDM15
|
508
|
2
|
0
|
1
|
0
|
0.59%
|
SFRP2
|
502
|
1
|
8
|
0
|
0
|
1.76%
|
SMYD5
|
510
|
1
|
0
|
0
|
0
|
0.20%
|
THOC2
|
503
|
4
|
0
|
4
|
0
|
1.57%
|
TIAL1
|
498
|
2
|
10
|
1
|
0
|
2.54%
|
WNT7A
|
506
|
2
|
0
|
3
|
0
|
0.98%
|
3.6. Independent Prognostic Value of the 11-Gene Signature
We conducted univariate and multivariate Cox regression analyses on the entire dataset just to assess whether the prognostic power of these 11 genetic traits was independent of other clinical variables,such as Age, Sex, Grade, New-Event, and Cancer-Status. The results indicated that the risk score was significantly correlated with OS of LGG patients. However, there was no significant distinction that met the requirements in the sex of LGG patient(p>0.05). Moreover, adjustment with other clinical parameters from the entire dataset, the 11-genetic signature remained as an independent prognostic indicator (P <0.001, HR = 1.061, 95% CI = 1.042-1.079). Furthermore, Age, grade, and cancer status of LGG patients are also independent prognostic factors, because they differ significantly both in univariate analysis and multivariate analysis (P <0.001, HR = 1.047, 95% CI = 1.030-1.064; P <0.001, HR = 2.600, 95% CI = 1.677-4.030; P<0.001, HR=6.169, 95% CI =2.519-15.105, respectively, TAB. 5和TAB. 6).
TAB. 5 Univariable Analyses for Each Clinical Feature
|
|
pvalue
|
Hazard ratio
|
|
Age
|
<0.001
|
1.058(1.043-1.073)
|
Sex
|
0.619
|
1.094(0.767-1.561)
|
Grade
|
<0.001
|
3.397(2.296-5.024)
|
New Event
|
<0.001
|
3.1651.837-5.454)
|
Cancer Status
|
<0.001
|
7.893(3.672-16.964)
|
riskScore
|
<0.001
|
1.090(1.075-1.106)
|
TAB. 6 Multivariable Analyses for each Clinical Feature
|
|
pvalue
|
Hazard ratio
|
|
Age
|
<0.001
|
1.047(1.030-1.064)
|
Sex
|
0.377
|
1.185(0.813-1.729)
|
Grade
|
<0.001
|
2.600(1.677-4.030)
|
New Event
|
0.917
|
1.042(0.480-2.261)
|
Cancer Status
|
<0.001
|
6.169(2.519-15.105)
|
Risk Score
|
<0.001
|
1.061(1.042-1.079)
|
3.7. Construction and identification of individualized evaluation nomogram model
To provide clinicians with a better method to quantitatively predict the prognosis of LGG patients, we constructed a nomogram model using independent risk factors from TCGA-LGG dataset, such as Age, Sex, Grade, New-Event, Cancer-Status and Risk Score. This nomogram was designed to estimate the 1-, 3- and 5-year OS of LGG patients (FIG. 7A). The nomogram revealed that the risk score was an essential influencing factor among the various clinical parameters. In addition, we constructed calibration curves, which showed that the nomogram matched well with the actual survival of LGG patients(FIG. 7B).
3.8. Immune Cell Infiltration Analysis
We used "CIBERSORT", "QUANTISEQ" and “EPIC” to research the relative proportions of immune cells. The results showed that there were 12 immune cells differed significantly in their degree of infiltration between the high and low risk groups in "CIBERSORT". Furthermore, we found the infiltration of T-cell CD4 memory quiescence, T-cell CD4 memory activation, T-cell regulation, T-cell CD4 memory quiescence, macrophage M1, macrophage M2 and neutrophils in the high-risk group was greater than in the low-risk group (FIG. 8A). Additionally, according to "QUANTISEQ", T cell regulatory, T cell CD4, Macrophage M1 and Macrophage M2 also displayed relatively greater infiltration in the high-risk group (FIG. 8B). In “EPIC”, both Endothelial cells and Macrophage exhibited a relatively greater degree of infiltration in the high-risk group (FIG. 8C).
In conclusion, this suggested that immune infiltration may play an influential role in the poor prognosis of LGG patients.
3.9. Drug Sensitivity Analysis
We investigated the relationships between the proportions of immune and stromal components and the expression of ARGPs. Therefore, we used the ESTIMATIONR package to classify the low- and high-risk groups by stromal score, immune score and ESTIMATE score. The results indicated that there were significant differences between the groups in stromal scores (p<0.0001), immune scores (p<0.0001) and ESTIMATE scores (p<0.0001) (FIG. 9B).
Subsequently, we examined whether the high- and low-risk groups were significantly correlated with the effect of immunotherapy. The TMB score, The TIDE score and The TME score linked to response to immune checkpoint inhibitors. (16). Both TMB scores (p less than 0.001) and TME scores (p less than 0.001) were significantly different by Wilcoxon's test. The high-risk group showed significantly higher estimated scores (FIG. 9A).
In addition, we calculated the IC50 of the commonly used chemotherapeutic drugs for LGG to determine the sensitivity to the drugs in the high and low-risk groups. As it shown, we found Bortrezomib (p<0.0001), Etoposide (p<0.0001), Tamoxifen (p<0.0001), and Temozolomide (p<0.0001) had lower IC50 values in the high-risk group than in the low-risk group, thus these drugs were more sensitive in the high-risk group(FIG. 9C).