Both the Cancer Genome Atlas Program (TCGA) 12 and the Proteogenomic Translational Research Centers High-Grade Serous Ovarian Carcinoma (PTRC-HGSOC) 9 datasets include patients with paired H&E WSI and proteomics data, along with documented responses to platinum chemotherapy. This allowed us to compare multi-modal histology-proteomics deep learning against its unimodal counterparts for predicting platinum response.
We first trained a clustering-constrained attention multiple instance learning (CLAM) 21 model to predict response to platinum therapy in ovarian cancer from WSI data alone. The models were trained on one of the available data sets (TCGA or PTRC-HGSOC; see Methods for a description of the data sets) and validated on the other. As Table 1 shows, the predictive power of this image-only approach was limited, regardless of whether primary or metastatic tumors were considered. Similarly, we trained a classical machine learning ensemble model based on the proteomics “Chowdhury Signature” 9 (methods 3.6.1) on one of the two data sets (TCGA-OV or PTRC-HGSOC) and evaluated its predictive power on the other (full split descriptions in Supplementary Fig. 3a). The proteomics-based predictor performed better than the WSI-based predictor (Table 1).
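The train-on-one-cohort, test-on-the-other protocol described above can be sketched as follows. This is a minimal illustration, not the CLAM or ensemble pipeline used in the study: the cohort names `cohort_A` and `cohort_B`, the toy feature values, and the centroid-based scorer are all hypothetical stand-ins, and the AUC is computed directly from its pairwise-ranking definition.

```python
from itertools import permutations


def roc_auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid_scorer(features, labels):
    """Hypothetical stand-in model: project onto the difference of the class means."""
    dim = len(features[0])

    def mean(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]

    pos_mean = mean([x for x, y in zip(features, labels) if y == 1])
    neg_mean = mean([x for x, y in zip(features, labels) if y == 0])
    weights = [p - n for p, n in zip(pos_mean, neg_mean)]
    return lambda x: sum(w * v for w, v in zip(weights, x))


def cross_cohort_auc(cohorts, fit):
    """Fit on each cohort in turn and report the AUC on every other cohort."""
    results = {}
    for train_name, test_name in permutations(cohorts, 2):
        train_x, train_y = cohorts[train_name]
        test_x, test_y = cohorts[test_name]
        scorer = fit(train_x, train_y)
        results[(train_name, test_name)] = roc_auc(test_y, [scorer(x) for x in test_x])
    return results


# Toy data: (features, labels) per cohort, with label 1 for one response class.
cohorts = {
    "cohort_A": ([[0.9, 0.2], [0.8, 0.1], [0.2, 0.7], [0.1, 0.9]], [1, 1, 0, 0]),
    "cohort_B": ([[0.7, 0.3], [0.4, 0.6], [0.6, 0.2], [0.3, 0.8]], [1, 0, 1, 0]),
}

for (train_name, test_name), auc in cross_cohort_auc(cohorts, fit_centroid_scorer).items():
    print(f"train={train_name} test={test_name} AUC={auc:.3f}")
```

Because the test cohort never contributes to fitting, the reported AUC reflects transfer across cohorts rather than within-cohort fit.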
Next, we asked whether we could derive a more accurate predictor by combining patient-paired H&E WSIs and proteomics data using multi-modal deep learning frameworks 18–20. Combining WSI and proteomic data significantly increased model performance (Table 1). Multi-modal benchmarks consistently outperformed both WSI-only and proteomics-only models in predicting resistance to platinum chemotherapy. This finding was robust across various training and testing setups (see methods section 3.1 and Supplementary Fig. 3a), which suggests that there exist multi-modal features, invariant across cohorts, that correlate well with tumor sensitivity to treatment. All test AUC results are computed solely on cohorts from held-out sites, which mitigates potential site-specific biases and tests whether the discovered features generalize 22. For the multi-modal model, Table 1 reports the best-performing PorpoiseMMF 18 model (Supplementary notes 3.5). Results for all benchmarked models can be found in the Supplementary material (Supplementary Tables 1–4).
The results for primary samples in Table 1 show that, overall, proteomics-only models slightly outperform WSI-only models. When training on PTRC-HGSOC 9 primary tumor samples and testing on TCGA samples, the multi-modal model achieves an AUC (Area Under the Curve) of 0.752, a significant 14% increase over the proteomics-only model, which has an AUC of 0.61 (t = 6.24, p = 0.00336). When training on TCGA tumor samples and testing on PTRC-HGSOC primary tumor samples, the multi-modal model achieves an AUC of 0.835, a 13.6% increase over the proteomics-only model, which has an AUC of 0.755 (t = 4.14, p = 0.0144). Finally, when training on the FHCRC + Mayo sub-cohorts (Fig. 2) of PTRC-HGSOC primary tumor samples and testing on UBC primary tumor samples, the multi-modal model achieves an AUC of 0.84, a 19.9% increase over the proteomics-only model, which has an AUC of 0.558 (t = 7.0, p = 0.00219).
For metastatic cases considered separately, Table 1 shows that, overall, proteomics-only models again slightly outperform WSI-only models, and that all test results are slightly lower than those of the models trained on primary tumors, as TCGA samples come only from primary tumor sites. When training on PTRC-HGSOC metastatic tumor samples and testing on TCGA samples, the multi-modal model achieves an AUC of 0.704, an 8.2% increase over the proteomics-only model, which has an AUC of 0.54 (t = 2.63, p = 0.058). When training on TCGA tumor samples and testing on PTRC-HGSOC metastatic tumor samples, the multi-modal model achieves an AUC of 0.698, a 13.7% increase over the proteomics-only model, which has an AUC of 0.566 (t = 2.58, p = 0.0614). Finally, when training on the FHCRC + Mayo sub-cohorts of PTRC-HGSOC metastatic tumor samples and testing on UBC metastatic tumor samples, the multi-modal model achieves an AUC of 0.798, a 16.5% increase over the proteomics-only model, which has an AUC of 0.665 (t = 3.33, p = 0.029).
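This section does not state which test produced the t statistics reported for the primary and metastatic comparisons. As a consistency check, every reported (t, p) pair matches a two-sided Student's t-test with 4 degrees of freedom, as would arise from, for example, a paired comparison over five runs or folds. That reading is an assumption rather than something the text confirms. The sketch below recomputes the p-values under it, using the closed-form t cumulative distribution function for 4 degrees of freedom.

```python
import math


def t_cdf_df4(t):
    """Closed-form CDF of Student's t distribution with 4 degrees of freedom."""
    u = 1.0 + t * t / 4.0
    return 0.5 + (3.0 / 8.0) * (t / math.sqrt(u)) * (1.0 - (t * t) / (12.0 * u))


def two_sided_p_df4(t):
    """Two-sided p-value for a t statistic, assuming 4 degrees of freedom."""
    return 2.0 * (1.0 - t_cdf_df4(abs(t)))


# (t statistic, p-value) pairs as reported in the text.
reported = [
    (6.24, 0.00336),  # primary: train PTRC-HGSOC, test TCGA
    (4.14, 0.0144),   # primary: train TCGA, test PTRC-HGSOC
    (7.0, 0.00219),   # primary: train FHCRC + Mayo, test UBC
    (2.63, 0.058),    # metastatic: train PTRC-HGSOC, test TCGA
    (2.58, 0.0614),   # metastatic: train TCGA, test PTRC-HGSOC
    (3.33, 0.029),    # metastatic: train FHCRC + Mayo, test UBC
]

for t_stat, p_reported in reported:
    p_value = two_sided_p_df4(t_stat)
    print(f"t = {t_stat:5.2f}  reported p = {p_reported:.5f}  recomputed p = {p_value:.5f}")
```

Under this assumption, the two TCGA-related metastatic comparisons (p = 0.058 and p = 0.0614) fall just above the conventional 0.05 threshold, while the remaining four fall below it.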
When modeling histopathology-proteomics data, the experimental setup involves selecting the model architecture, the proteomic input structure (methods 3.6), and the histopathology patch embeddings (methods 3.4). We evaluated how changes in each of these choices impacted the performance in predicting platinum response.
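The three design choices can be viewed as axes of a configuration grid. The sketch below enumerates that grid using the option names reported in this section; the dictionary keys are hypothetical, and the text does not state whether every combination was actually run.

```python
from itertools import product

# Option names are taken from the text; the dictionary keys are hypothetical.
design_axes = {
    "architecture": ["PorpoiseMMF", "MCAT", "SurvPath"],
    "proteomic_grouping": [
        "Chowdhury signature",
        "CPTAC signature",
        "RM pathways",
        "PRG pathways",
    ],
    "patch_embedding": ["DINO-OV", "UNI", "CTransPath"],
}

# One dictionary per combination of architecture, proteomic grouping, and patch embedding.
configurations = [
    dict(zip(design_axes, combination))
    for combination in product(*design_axes.values())
]

print(len(configurations))  # 3 * 4 * 3 = 36
print(configurations[0])
```

Varying one axis while holding the other two fixed isolates the contribution of that choice to the prediction of platinum response.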
Comparing multi-modal model architectures, we found that PorpoiseMMF 18 outperformed MCAT 20 and SurvPath 19 in overall model performance; the former is a late fusion model, whereas the latter two are early fusion models. In terms of proteomic groupings, performance ranked as Chowdhury signature (methods 3.6.1) > CPTAC signature (methods 3.6.2) > RM pathways (methods 3.6.4) > PRG pathways (methods 3.6.3). Comparing pretrained histopathology image patch embedding networks, our SSL-trained, ovarian cancer-specific DINO-OV (see Supplementary Table 5 for pre-training data and Supplementary section 3.8 for SSL details), UNI 23, and CTransPath 24 performed similarly across the tasks (Supplementary Tables 1–4).
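The distinction between late and early fusion can be illustrated with a toy example. This is a minimal sketch under strong simplifying assumptions and does not reproduce PorpoiseMMF, MCAT, or SurvPath: the function names are hypothetical, both modalities are represented as lists of equal-length token vectors, late fusion is shown as concatenation of independently pooled embeddings, and early fusion is shown as proteomic tokens attending over patch tokens before pooling.

```python
import math


def mean_pool(tokens):
    """Average a list of equal-length vectors into one vector."""
    return [sum(t[j] for t in tokens) / len(tokens) for j in range(len(tokens[0]))]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def late_fusion(patch_tokens, protein_tokens):
    """Summarise each modality on its own, then join the two summaries."""
    return mean_pool(patch_tokens) + mean_pool(protein_tokens)


def early_fusion(patch_tokens, protein_tokens):
    """Let each proteomic token attend over all patch tokens, then pool the results."""
    dim = len(patch_tokens[0])
    attended = []
    for query in protein_tokens:
        weights = softmax([dot(query, key) for key in patch_tokens])
        attended.append(
            [sum(w * key[j] for w, key in zip(weights, patch_tokens)) for j in range(dim)]
        )
    return mean_pool(attended)


# Toy inputs: two image patch tokens and two proteomic tokens, each of dimension 2.
patches = [[1.0, 0.0], [0.0, 1.0]]
proteins = [[2.0, 0.0], [0.0, 0.0]]

print(late_fusion(patches, proteins))
print(early_fusion(patches, proteins))
```

In the late variant, the two modalities interact only after each has been reduced to a single embedding; in the early variant, the proteomic tokens determine which image patches contribute before any pooling takes place.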
Although primary and metastatic cancers show many similarities, they differ substantially in treatment response and prognosis. We found that training the models separately for the primary and metastatic cohorts yields better performance. Based on these findings, we recommend this practice for similar studies.