We were particularly interested in predicting local or distant recurrence-free survival (RFS) for patients with localized GIST and overall survival (OS) for patients with advanced GIST. The use of adjuvant therapy is known to be directly linked with patients’ baseline risk evaluation and their prognosis8. Therefore, we investigated the prognostic power of DL (the full DL workflow is shown in Fig. 1) in three subgroups of patients using different endpoints: RFS for patients with localized GIST without adjuvant therapy (N = 161), RFS for patients with localized GIST treated with adjuvant therapy (N = 66), and OS for patients with advanced GIST treated with imatinib (N = 119; 63 metastatic at diagnosis and 56 recurrent; Supplementary Fig. 1). All models were trained on the whole cohort and evaluated within these subgroups (Methods).
DL improved the prediction of RFS from images alone for patients with localized GIST without adjuvant therapy (C-index = 0.81, std = 0.04 for testing from CV in center 1) compared to the Miettinen classification (C-index = 0.76, std = 0.04). More interestingly, the multimodal DL model that combines H&E images, tumor location and tumor size (Deep Miettinen) further improved the prediction (C-index = 0.83, std = 0.04, Fig. 2a). We validated this Deep Miettinen model in center 2 and obtained a C-index of 0.72 (95% CI = [0.63, 0.80]) for localized untreated patients.
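For readers less familiar with the metric, the C-index reported above can be illustrated with a minimal sketch. It assumes Harrell's estimator (the text does not state which estimator was used), skips pairs with tied follow-up times for simplicity, and uses made-up toy values; the function name `concordance_index` is illustrative only.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs that are ranked correctly.

    times  : follow-up time per patient
    events : 1 if the event (e.g. recurrence) was observed, 0 if censored
    risks  : predicted risk score, higher meaning an earlier expected event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient a has the shorter follow-up.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the two times differ and the
        # earlier time corresponds to an observed event (not censoring).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # earlier event received the higher risk
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical, not from the study).
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.4, 0.1]
toy_c_index = concordance_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the values of 0.76 to 0.83 quoted in this section.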
Although the improvement is only moderate, the Deep Miettinen model was better than Miettinen at stratifying patients into high and low risk groups, as shown in the KM curves (log-rank test p-value = 5.8e-17 and 1.3e-10 for Deep Miettinen and Miettinen respectively for localized untreated patients, Fig. 2c). Furthermore, the Deep Miettinen model was capable of stratifying the classical Miettinen intermediate and high risk groups into subgroups with different prognoses (log-rank test p-value = 2.1e-03 for both intermediate and high risk groups, Fig. 2c). The same trend was observed in center 2 (log-rank test p-value = 2.9e-01 and 3.1e-01 for intermediate and high risk groups respectively, Supplementary Fig. 2a) using a dataset-adapted threshold for high and low risk groups (a cutoff that led to a number of high risk patients comparable to that defined by Miettinen).
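The dataset-adapted threshold described above can be sketched as follows. This is one plausible reading of the procedure, not the authors' implementation: the cutoff is taken as the score of the k-th highest-scoring patient, where k is the number of patients that Miettinen labels high risk. The function names and the toy scores are hypothetical.

```python
def adapted_threshold(scores, n_high_reference):
    """Cutoff such that n_high_reference patients score at or above it.

    With tied scores the resulting high risk group can be slightly larger,
    which is why the match is only 'comparable', not exact.
    """
    if not 0 < n_high_reference <= len(scores):
        raise ValueError("n_high_reference must be between 1 and len(scores)")
    ranked = sorted(scores, reverse=True)
    return ranked[n_high_reference - 1]


def stratify(scores, cutoff):
    """Label each patient 'high' or 'low' risk relative to the cutoff."""
    return ["high" if s >= cutoff else "low" for s in scores]


# Toy risk scores (hypothetical); assume the reference classification
# labels 2 of these 6 patients as high risk.
toy_scores = [0.1, 0.8, 0.35, 0.6, 0.2, 0.9]
toy_cutoff = adapted_threshold(toy_scores, n_high_reference=2)
toy_groups = stratify(toy_scores, toy_cutoff)
```

Matching the group sizes in this way keeps the comparison with Miettinen from being driven by a different proportion of patients labelled high risk.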
When examining the predictive tiles that were related to different risks, we noticed that mitoses, marked nuclear atypia, high cellular density, epithelioid cell component, necrosis and hemorrhage were associated with high risk while cytoplasmic vacuolization, low cellular density, collagenous stroma, mild nuclear atypia and spindle cell component were associated with low risk (Fig. 2b).
We observed inconclusive results in predicting outcome in patients treated with TKIs. For localized and treated GIST, we obtained C-index = 0.81, std = 0.1 for DL with image only, C-index = 0.68, std = 0.11 for Miettinen and C-index = 0.84, std = 0.1 for Deep Miettinen. However, these results were not validated in center 2 (C-index = 0.44 for DL with image only, C-index = 0.54 for Miettinen and C-index = 0.45 for Deep Miettinen). For advanced GIST receiving imatinib, the performance was poor in both center 1 and center 2 (C-index = 0.52, std = 0.1 for DL with image only, C-index = 0.46, std = 0.1 for Miettinen and C-index = 0.64, std = 0.15 for Deep Miettinen in center 1, and C-index = 0.64 for DL with image only, C-index = 0.49 for Miettinen and C-index = 0.62 for Deep Miettinen in center 2). These results suggest that the treatment effect is crucial to patients’ prognosis and that models built on baseline images alone are not capable of predicting the outcome.
Since the mutational profile is vital for both the treatment decision and the outcome of GIST patients, we next investigated the predictive power of DL on WSI for mutation classification. We first tested DL models for mutation classification at the gene level (KIT mutant, PDGFRA mutant or wild type). Patients without mutations or with mutations in genes other than KIT and PDGFRA were considered wild type (WT). We obtained a macro-AUC of 0.81 from CV in center 1 (Table 3, Fig. 3a). This model was validated in center 2 with a macro-AUC of 0.77 (Table 3, Fig. 2a). We obtained better performance for tumors from the stomach in both centers (Table 3, Fig. 3a). Given the importance of mutations at the exon level, we also built predictive models for PDGFRA exon18, KIT exon11, KIT exon9, other mutations in KIT and PDGFRA, and WT. As with the gene level mutation classification model, we obtained better performance for tumors from the stomach. Results are shown in Table 3 (Fig. 3a).
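The macro-AUC used for the three-class problem above can be illustrated with a short sketch. It assumes the common definition, the unweighted mean of one-vs-rest AUCs over the classes, which the text does not spell out; the helper names and the toy predictions are hypothetical.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for is_pos, s in zip(labels, scores) if is_pos]
    neg = [s for is_pos, s in zip(labels, scores) if not is_pos]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def macro_auc(y_true, probas, classes):
    """Unweighted mean of one-vs-rest AUCs, one per class."""
    aucs = []
    for k, cls in enumerate(classes):
        labels = [y == cls for y in y_true]   # this class vs all others
        scores = [row[k] for row in probas]   # predicted probability of this class
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)


# Toy predictions (hypothetical, not from the study).
toy_classes = ["KIT", "PDGFRA", "WT"]
toy_y_true = ["KIT", "KIT", "PDGFRA", "WT"]
toy_probas = [
    [0.7, 0.2, 0.1],
    [0.5, 0.1, 0.4],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
]
toy_macro_auc = macro_auc(toy_y_true, toy_probas, toy_classes)
```

Because every class contributes equally to the mean, rare classes such as PDGFRA mutant or wild type weigh as much as the frequent KIT mutant class.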
We next investigated whether DL could predict mutation types at the codon level, focusing on two particular types: the PDGFRA exon18 D842V mutation, which is resistant to imatinib, and KIT del-inc557/558 mutations, which have been reported to have a worse prognosis than other KIT exon11 mutations25–27. For the PDGFRA exon18 D842V mutation, we obtained an AUC = 0.87 (95% CI = [0.84, 0.90]) in testing from center 1 and AUC = 0.90 (95% CI = [0.84, 0.96]) in validation from center 2 for all samples. Comparable results were obtained for samples from the stomach only (AUC = 0.82, 95% CI = [0.78, 0.86] in center 1 and AUC = 0.87, 95% CI = [0.80, 0.95] in center 2, Fig. 3c). For the KIT del-inc557/558 mutations, we obtained an AUC = 0.69 (95% CI = [0.65, 0.72]) in center 1 and AUC = 0.76 (95% CI = [0.66, 0.86]) in center 2 for all samples, and an AUC = 0.78 (95% CI = [0.75, 0.83]) in center 1 and AUC = 0.74 (95% CI = [0.66, 0.82]) in center 2 for samples from the stomach only.
Of note, conventional tumor cell histological subtypes showed a weaker association with the mutation types than DL (AUC = 0.8, 95% CI = [0.76, 0.86] for PDGFRA exon18, AUC = 0.66, 95% CI = [0.60, 0.70] for KIT exon11, AUC = 0.56, 95% CI = [0.49, 0.63] for KIT exon9, AUC = 0.57, 95% CI = [0.47, 0.67] for other mutations and AUC = 0.47, 95% CI = [0.4, 0.54] for WT in center 1, where the tumor cell types were available, Supplementary Fig. 3). This result indicates that DL quantifies morphological features beyond conventional cell type classification. When reviewing the most predictive tiles for these different mutations, we observed that the PDGFRA exon18 D842V mutation was associated with epithelioid or mixed cell morphology, cytoplasmic vacuolization, myxoid stroma and lymphoid infiltrate, while KIT del-inc557/558 mutations were associated with mitotic activity and nuclear hyperchromasia. Interestingly, KIT exon9 was associated with lymphoid infiltrate (Fig. 3d), which to our knowledge has not been previously described in the literature31. Further investigation is currently being carried out to validate this finding.