Identification and validation of the signature for MSI status of RCC
GSE39582, the dataset with the largest number of RCC samples (57 MSI and 154 MSS), was used as the training data for extracting an REO-based signature. First, we identified 4769 MSI-related differentially expressed (DE) genes (Student's t-test, FDR < 0.01, Additional file 1: Table S1) between the 57 MSI and the 154 MSS RCC samples. Among all gene pairs formed by these DE genes, we identified 1,654,739 gene pairs whose specific REO pattern occurred more frequently in the MSI samples than in the MSS samples (Fisher’s exact test, FDR < 0.01). The larger the FD of an REO pattern, the better the pattern discriminates MSI from MSS samples. We then reduced the number of gene pairs to 1898 through a redundancy removal process that, among gene pairs sharing a common gene, kept only the pair with the largest FD value (see Methods, Figure 1a). From these gene pairs, we extracted the 10 gene pairs with an FD of at least 0.8. These 10 gene pairs were used as the signature for predicting the MSI status of RCC, denoted as 10-GPS (Table 1). An RCC sample was predicted as MSI if the REOs of at least seven gene pairs in the 10-GPS voted for MSI; otherwise, it was predicted as MSS. Under this classification rule, the F-score of the signature in the training data was 0.9727, with a sensitivity of 0.9649 and a specificity of 0.9805. The area under the receiver operating characteristic (ROC) curve (AUC) was 0.9838 (Figure 2a).
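The pair-selection steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: FD is assumed here to be the frequency of the REO pattern in MSI samples minus its frequency in MSS samples (the exact definition is given in Methods), redundancy removal is approximated by a greedy pass, and all function names, gene labels and values are hypothetical.

```python
def reo_frequency(samples, gene_a, gene_b):
    """Fraction of samples in which gene_a is expressed above gene_b."""
    return sum(s[gene_a] > s[gene_b] for s in samples) / len(samples)


def frequency_difference(msi, mss, gene_a, gene_b):
    """Assumed FD: frequency of the REO in MSI minus its frequency in MSS."""
    return reo_frequency(msi, gene_a, gene_b) - reo_frequency(mss, gene_a, gene_b)


def remove_redundancy(scored_pairs):
    """Greedy pass (an assumption): visit pairs by descending FD and keep a
    pair only if neither of its genes already belongs to a kept pair."""
    kept, used_genes = [], set()
    for pair, fd in sorted(scored_pairs.items(), key=lambda item: -item[1]):
        if used_genes.isdisjoint(pair):
            kept.append((pair, fd))
            used_genes.update(pair)
    return kept


# Hypothetical expression values for two genes in three MSI and three MSS samples.
msi = [{"G1": 9.0, "G2": 4.0}, {"G1": 8.0, "G2": 3.0}, {"G1": 9.5, "G2": 5.0}]
mss = [{"G1": 3.0, "G2": 8.0}, {"G1": 2.0, "G2": 9.0}, {"G1": 6.0, "G2": 5.0}]
fd_g1_g2 = frequency_difference(msi, mss, "G1", "G2")  # 3/3 - 1/3

# Hypothetical FD values for four candidate pairs.
scored = {("A", "B"): 0.95, ("A", "C"): 0.90, ("C", "D"): 0.85, ("E", "F"): 0.60}
non_redundant = remove_redundancy(scored)
signature = [(pair, fd) for pair, fd in non_redundant if fd >= 0.8]
```

In this toy input, the pair (A, C) is dropped because it shares gene A with the higher-FD pair (A, B), and (E, F) survives redundancy removal but fails the FD cutoff of 0.8.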
Table 1. The Composition of 10-GPS
| signature | gene1 | gene2 | signature | gene1 | gene2 |
| --- | --- | --- | --- | --- | --- |
| pair1 | HNRNPL | CDC16 | pair6 | STRN3 | TMEM192 |
| pair2 | MTA2 | VGF | pair7 | HPSE | BCAS3 |
| pair3 | CALR | SEC22B | pair8 | PRPF39 | ATF6 |
| pair4 | RASL11A | CAB39L | pair9 | CCRN4L | GRM8 |
| pair5 | LYG1 | DHRS12 | pair10 | AMFR | DUSP18 |
Notes:
An RCC sample was classified as MSI if the REOs (gene1 > gene2) of at least 7 of the gene pairs in the 10-GPS voted for MSI; otherwise, it was classified as MSS.
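The voting rule can be written as a short Python sketch. The gene pairs are those listed in Table 1; the function names and the toy expression profiles are hypothetical, and the input is assumed to be a mapping from gene symbol to the expression level measured in a single sample.

```python
# Gene pairs (gene1, gene2) of the 10-GPS, taken from Table 1.
TEN_GPS = [
    ("HNRNPL", "CDC16"), ("MTA2", "VGF"), ("CALR", "SEC22B"),
    ("RASL11A", "CAB39L"), ("LYG1", "DHRS12"), ("STRN3", "TMEM192"),
    ("HPSE", "BCAS3"), ("PRPF39", "ATF6"), ("CCRN4L", "GRM8"),
    ("AMFR", "DUSP18"),
]


def count_msi_votes(expression, pairs=TEN_GPS):
    """Number of pairs whose within-sample ordering is gene1 > gene2."""
    return sum(1 for g1, g2 in pairs if expression[g1] > expression[g2])


def predict_msi_status(expression, pairs=TEN_GPS, min_votes=7):
    """Return 'MSI' if at least min_votes pairs vote for MSI, else 'MSS'."""
    return "MSI" if count_msi_votes(expression, pairs) >= min_votes else "MSS"


def toy_profile(n_votes):
    """Hypothetical profile in which the first n_votes pairs have gene1 > gene2."""
    profile = {}
    for i, (g1, g2) in enumerate(TEN_GPS):
        profile[g1], profile[g2] = (2.0, 1.0) if i < n_votes else (1.0, 2.0)
    return profile


status_seven_votes = predict_msi_status(toy_profile(7))
status_six_votes = predict_msi_status(toy_profile(6))
```

Because the rule depends only on the ordering of two genes within one sample, it can be applied to a single profile without normalizing against other samples.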
We tested the 10-GPS in four independent cohorts of RCC samples (Figure 1b). The F-scores of the classification by 10-GPS were 1, 0.9630, 0.9412 and 0.8798, respectively (Table 2), and the AUCs were 1, 0.9923, 1 and 0.9244, respectively (Figure 2b, c, d and e).
Table 2. Performance of the 10-GPS in the RCC samples of the independent datasets
| dataset | pre-MSI^a (MSI:MSS)^b | pre-MSS^a (MSI:MSS)^b | sensitivity | specificity | F-score |
| --- | --- | --- | --- | --- | --- |
| GSE39084_R | 13(13:0) | 18(0:18) | 1 | 1 | 1 |
| GSE18088_R | 13(13:0) | 15(1:14) | 0.9286 | 1 | 0.9630 |
| GSE75317_R | 8(8:0) | 18(1:17) | 0.8889 | 1 | 0.9412 |
| TCGA_R | 15(9:6) | 38(1:37) | 0.900 | 0.8605 | 0.8798 |
| Total_RCCs | 49(43:6) | 89(3:86) | 0.9348 | 0.9348 | 0.9348 |
Notes:
a: MSI status predicted by 10-GPS; b: original MSI status; the suffix _R denotes the RCC samples of a dataset; Total_RCCs denotes all RCC samples of the four datasets combined.
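The metrics in Table 2 can be recomputed from the counts in the table. The sketch below assumes that the F-score is the harmonic mean of sensitivity and specificity; this is inferred from the reported numbers, which it reproduces, rather than stated in this section. The function name is hypothetical.

```python
def performance(pre_msi, pre_mss):
    """Compute sensitivity, specificity and F-score from Table 2 style counts.

    pre_msi: (original MSI, original MSS) counts among samples predicted MSI.
    pre_mss: (original MSI, original MSS) counts among samples predicted MSS.
    """
    true_pos, false_pos = pre_msi
    false_neg, true_neg = pre_mss
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    # Assumption: F-score = harmonic mean of sensitivity and specificity.
    f_score = 2 * sensitivity * specificity / (sensitivity + specificity)
    return sensitivity, specificity, f_score


# Counts copied from Table 2.
cohorts = {
    "GSE39084_R": ((13, 0), (0, 18)),
    "GSE18088_R": ((13, 0), (1, 14)),
    "GSE75317_R": ((8, 0), (1, 17)),
    "TCGA_R": ((9, 6), (1, 37)),
    "Total_RCCs": ((43, 6), (3, 86)),
}

for name, (pre_msi, pre_mss) in cohorts.items():
    sens, spec, f_score = performance(pre_msi, pre_mss)
    print(f"{name}: sensitivity={sens:.4f} specificity={spec:.4f} F-score={f_score:.4f}")
```

For example, the TCGA_R row gives a sensitivity of 9/10 and a specificity of 37/43, whose harmonic mean rounds to the tabulated 0.8798.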
Transcriptome assessment of the signature-disconfirmed RCC samples
In the training data, there were five signature-disconfirmed samples in total. We compared the gene expression patterns of these five samples with those of the 206 signature-confirmed samples through clustering analysis. First, we identified 5664 DE genes (Student's t-test, FDR < 0.01) between the 55 signature-confirmed MSI and the 151 signature-confirmed MSS samples. Second, using the expression levels of the 100 most significant DE genes, we divided the samples into two subgroups by complete linkage hierarchical clustering based on the Euclidean distance (Figure 3a). Both MSI samples reclassified as MSS by the 10-GPS clustered with the signature-confirmed MSS samples, and all three MSS samples reclassified as MSI clustered with the signature-confirmed MSI samples.
Similarly, in two of the four RCC validation datasets (GSE18088 and GSE75317), the one MSI sample reclassified as MSS by our signature in each dataset clustered with the corresponding signature-confirmed MSS samples (Figure 3b and 3c, respectively). These results provide transcriptional evidence that the predictions of the 10-GPS are correct.
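The clustering step can be illustrated with a small pure-Python sketch of agglomerative clustering with complete linkage and Euclidean distance, stopped at two clusters. It is a toy version under stated assumptions: the sample names and three-gene expression vectors are hypothetical (the analysis above used the top 100 DE genes), and a real analysis would more likely rely on an established clustering library.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def complete_linkage_clusters(vectors, n_clusters=2):
    """Merge the two closest clusters until n_clusters remain. The distance
    between clusters is the largest distance between any two of their members."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = max(euclidean(vectors[a], vectors[b])
                           for a in clusters[i] for b in clusters[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical expression vectors (three genes per sample).
profiles = {
    "MSI_confirmed_1": [8.0, 7.5, 2.0],
    "MSI_confirmed_2": [7.8, 7.9, 2.2],
    "MSS_confirmed_1": [3.0, 2.5, 6.0],
    "MSS_confirmed_2": [3.2, 2.8, 6.3],
    "MSI_reclassified_as_MSS": [3.1, 2.4, 5.8],
}
names = list(profiles)
clusters = complete_linkage_clusters([profiles[n] for n in names], n_clusters=2)
groups = [sorted(names[i] for i in cluster) for cluster in clusters]
```

In this toy input, the sample labelled as reclassified ends up in the same group as the two signature-confirmed MSS samples, which mirrors the kind of agreement reported in Figure 3.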
Genome assessment of the signature-disconfirmed RCC samples
It is known that BRAF V600E mutations and CpG island methylator phenotype (CIMP)-positive status frequently occur in MSI CRCs, whereas KRAS mutations (in codons 12 or 13) frequently occur in MSS CRCs [2, 31]. In the training data, of the three MSS samples that were reclassified as MSI by 10-GPS, two were KRAS wild-type, BRAF mutant and CIMP-positive, and one was KRAS wild-type with no BRAF or CIMP data available. Of the two MSI samples that were reclassified as MSS, one was BRAF wild-type and CIMP-negative (Table 3).
Table 3. The molecular characteristics of the five signature-disconfirmed RCC samples in the training dataset
| original_MSI.status | predicted_MSI.status | KRAS.status | BRAF.status | CIMP.status |
| --- | --- | --- | --- | --- |
| MSS | MSI | wild type | NA | NA |
| MSS | MSI | wild type | mutation | + |
| MSS | MSI | wild type | mutation | + |
| MSI | MSS | wild type | mutation | + |
| MSI | MSS | NA | wild type | - |
In the TCGA validation dataset of RCC, there were seven signature-disconfirmed samples. Because mutations in MMR genes can result in MSI [32], we examined the mutation status of the MMR genes in these samples. Mutation data were available for only five of them. Among them, two of the four MSS samples that were reclassified as MSI by 10-GPS were MSH6 mutant (Additional file 2: Table S1). These results suggest that the MSI status assigned to these samples by 10-GPS might be reliable.
Prognosis assessment of the signature-disconfirmed RCC samples
We also evaluated the reliability of the reclassifications by 10-GPS through survival analyses, based on the knowledge that patients with stage III MSI CRC treated with surgery only have better prognoses than those with MSS CRC [2], and that stage III MSS CRC patients treated with 5-Fu-based ACT after surgery have better outcomes than patients treated with surgery only [3-5]; the survival benefit of ACT was observed only in stage III patients [3, 4]. Among the 32 stage III RCC samples in the training data from patients treated with surgery only, one of the 19 MSS samples was reclassified as MSI and one of the 13 MSI samples was reclassified as MSS by the 10-GPS. Among the 54 stage III RCC samples (46 MSS and 8 MSI) from patients receiving 5-Fu-based ACT, the original MSI status of every sample was confirmed by the 10-GPS. The MSS patient reclassified as MSI had a longer RFS (130 months) than the MSI patient reclassified as MSS (31 months). Because of these two reclassified samples, the survival difference between patient groups defined by the MSI status predicted by 10-GPS was more significant than the difference between groups defined by the original MSI status (Additional file 3).
Identification and validation of the signature for MSI status of LCC
We applied the 10-GPS to LCC samples and to CRC samples without clear location information (Additional file 2: Table S2). The performance of the 10-GPS was reduced when it was used to predict the MSI status of LCC. Therefore, we developed a separate signature for the MSI status of LCC patients in the same way as for RCC. This yielded six gene pairs, which were used as the signature for predicting the MSI status of LCC, denoted as 6-GPS (Additional file 2: Table S3). An LCC sample was predicted as MSI if the REOs of at least four gene pairs in the 6-GPS voted for MSI; otherwise, it was predicted as MSS. Under this classification rule, the F-score of the signature in the LCC training data was 0.9983, with a sensitivity of 1 and a specificity of 0.9966. The 6-GPS was also well validated in four independent cohorts of LCC samples (Additional file 2: Table S4).