In this study, we evaluated several machine learning models for early assessment of pathological response to neoadjuvant chemoradiation in patients treated for locally advanced rectal cancer. In this single-centre study population, we found that a merged model combining MR radiomics and clinical features performed better than either the clinical or the radiomics model alone.
Neoadjuvant chemoradiation followed by surgery is the standard of care in locally advanced rectal cancer (1). However, treatment strategy is shifting towards total neoadjuvant therapy (TNT), and most of these trials, although designed to improve distant control, also demonstrated improved pathological complete response rates (2). The literature suggests that patients who achieve a pathological complete response (pCR) have a better prognosis than those who do not (24). Patients who achieve pCR have better local control, better survival, and fewer distant failures (3), and they also show improved rectal preservation, leading to better quality of life. If such patients could be identified upfront, they could be offered organ-preservation strategies, especially in low-risk rectal cancers (25). Conversely, if patients whose imaging features portend poor response and early systemic dissemination could be identified, their treatment could be intensified before surgery (26).
To date, clinical evaluation is the most widely used tool for identifying pCR following nCRT. Certain clinical features, such as tumor stage, nodal stage, histology, and pretreatment carcinoembryonic antigen (CEA), have been reported to predict pCR in rectal cancer (4). The MERCURY trial demonstrated the importance of pretreatment high-resolution MRI in assessing local tumor extension, which has an impact on survival and other outcomes (27).
Pretreatment radiomics has shown promise for predicting outcomes and survival in rectal cancer (20, 21, 24). In this study, we explored radiomics from pretreatment T2W-MRI and post-chemotherapy planning CT, a combination that has received little prior evaluation. Although the T2W-MRI and planning CT models performed similarly in predicting pCR (AUC = 0.62), the T2W-MRI feature values differed between patients who went on to achieve pCR and those in the IR group, whereas the CT features showed no significant difference between the two groups. The merged model combining T2W radiomics and clinical features performed best, discriminating between patients with pCR and IR with an AUC of 0.80 in the training cohort and 0.72 in the validation cohort. It also outperformed the models built with either feature set alone (Fig. 3). However, the low predictive values of most models should be interpreted in light of the low prevalence of rectal cancer and the even lower rate of pCR in any given cohort. Single-institution data therefore struggle to capture the sources of variability that limit the generalizability of models to real-world data, such as differences across populations, image quality, and dataset size. Restaging MRI has shown promise in earlier studies for predicting complete response (28), and it would be worth evaluating whether adding radiomics-based prediction from these images further improves the accuracy of predicting complete response and of follow-up under a wait-and-watch strategy. Given the growing number of studies on deep learning-based oncological prognostication, and the reported ability of these models to handle data heterogeneity better than conventional machine learning (29), their utility is worth exploring in future studies.
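The dependence of predictive values on the pCR rate follows from Bayes' rule: with sensitivity Se, specificity Sp, and pCR rate p, PPV = Se·p / (Se·p + (1 − Sp)(1 − p)). For a fixed operating point, PPV therefore falls as pCR becomes rarer in the cohort, while NPV rises. The short Python sketch below illustrates this; the function name `predictive_values` and the sensitivity, specificity, and pCR rates it uses are hypothetical illustrations, not values from this study.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a binary classifier via Bayes' rule.

    `prevalence` is the fraction of truly positive patients in the cohort
    (here, the pCR rate), not the population prevalence of rectal cancer.
    """
    for name, value in (("sensitivity", sensitivity), ("specificity", specificity), ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    tp = sensitivity * prevalence                  # true positives per patient
    fp = (1.0 - specificity) * (1.0 - prevalence)  # false positives per patient
    fn = (1.0 - sensitivity) * prevalence          # false negatives per patient
    tn = specificity * (1.0 - prevalence)          # true negatives per patient
    ppv = tp / (tp + fp) if tp + fp > 0 else float("nan")
    npv = tn / (tn + fn) if tn + fn > 0 else float("nan")
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical operating point, NOT taken from this study.
    sens, spec = 0.80, 0.80
    for pcr_rate in (0.50, 0.30, 0.15):
        ppv, npv = predictive_values(sens, spec, pcr_rate)
        print(f"pCR rate {pcr_rate:.0%}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

With the same sensitivity and specificity, a model that looks convincing in a balanced cohort yields a much lower PPV when only a minority of patients achieve pCR.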
It would also be interesting to see whether a federated architecture (30), which gives access to more data without the data leaving the participating institutions, can improve the predictive performance of the models for future clinical use.
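To illustrate how a federated set-up keeps patient data local, the Python sketch below implements federated averaging (FedAvg), one widely used scheme and not necessarily the architecture described in (30). Each institution trains on its own patients and sends only model weights to a central server, which averages them in proportion to each site's sample size. The logistic-regression learner, the function names, and the two-site toy data are hypothetical choices made only for illustration.

```python
import math

Sample = tuple[list[float], int]  # (feature vector, binary label)


def local_update(weights: list[float], data: list[Sample], lr: float = 0.1, epochs: int = 5) -> list[float]:
    """Logistic-regression SGD on one site's data; weights[0] is the bias.

    Runs entirely at the site: only the updated weights leave the institution.
    """
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # d(log-loss)/dz
            w[0] -= lr * err
            for j, xj in enumerate(x):
                w[j + 1] -= lr * err * xj
    return w


def federated_average(site_weights: list[list[float]], site_sizes: list[int]) -> list[float]:
    """Server step: average site weights, weighted by each site's sample count."""
    total = sum(site_sizes)
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(len(site_weights[0]))
    ]


def fedavg(sites: list[list[Sample]], n_features: int, rounds: int = 10) -> list[float]:
    """Alternate local training at every site with server-side averaging."""
    global_w = [0.0] * (n_features + 1)
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in sites]
        global_w = federated_average(updates, [len(data) for data in sites])
    return global_w


def predict_proba(w: list[float], x: list[float]) -> float:
    """Predicted probability of the positive class under the shared model."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Toy, synthetic one-feature data for two hypothetical institutions.
    site_a = [([-2.0], 0), ([-1.0], 0), ([1.0], 1), ([2.0], 1)]
    site_b = [([-1.5], 0), ([0.5], 1), ([1.5], 1)]
    w = fedavg([site_a, site_b], n_features=1)
    print(w, predict_proba(w, [2.0]), predict_proba(w, [-2.0]))
```

Weighting by sample size keeps a small site from pulling the shared model as strongly as a large one; it does not by itself correct for between-site differences in scanners or protocols.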
Our study has a few limitations. This was a single-institution retrospective study. The models were trained and validated in a relatively small sample of patients. Image acquisition parameters varied for both CT and MRI. The dataset included both contrast-enhanced and non-contrast CTs: patients who underwent TNT received contrast-enhanced CT, whereas the others received non-contrast CT. We did not correct for this, because the impact of contrast agents on radiomics is not completely understood (31). Ideally, tumor extent would be defined on high-resolution T2W MRI (32); because not all patients had high-resolution MR images, we used the available T2W-MR images instead. The retrospective design and small sample size may have limited overall ML model performance. Large multicenter studies (17, 20, 33, 34) have shown that ML models perform better with larger training and validation cohorts. Multimodal studies such as (18) and the radiopathomics-based RAPIDS study (17) have also shown that multimodal prediction models outperform single-modality models, which is consistent with our results, as the merged T2W MRI-clinical model achieved the best AUC. The Ryan’s grading classification model did not exceed an AUC of 0.58 (Table 4). This may be attributable to the small number of patients in each grade, which limits optimal training and validation of the models. Furthermore, the model has not been validated on data from another institution, so we cannot comment on the generalizability of the results; future studies will address this limitation. To our knowledge, the correlation between pathological response and radiomics in rectal cancer has not previously been reported in Indian patients.
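Because most model comparisons in this section are expressed as AUC values, it may help to recall what the number measures: the probability that a randomly chosen patient with pCR receives a higher model score than a randomly chosen patient with IR, so that 0.5 corresponds to chance and 1.0 to perfect separation. Unlike PPV and NPV, it does not depend on the pCR rate of the cohort. A minimal Python sketch of this rank-based (Mann-Whitney) definition follows; the function name `auc_rank` and the toy scores are hypothetical and are not data from this study.

```python
def auc_rank(pos_scores: list[float], neg_scores: list[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win. This equals the Mann-Whitney U statistic
    divided by n_pos * n_neg. The pairwise loop is O(n_pos * n_neg),
    which is adequate for single-centre cohort sizes.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Toy model scores (hypothetical, not study data): pCR cases vs IR cases.
    pcr = [0.9, 0.7, 0.4]
    ir = [0.8, 0.3, 0.2, 0.1]
    print(f"AUC = {auc_rank(pcr, ir):.3f}")  # 10 of 12 pairs correctly ordered -> 0.833
```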
In summary, a model combining clinical parameters with baseline T2W-MRI radiomics predicted pCR better than individual prediction models for rectal cancer. The Random Forest model had high predictive value. Before clinical use with a wait-and-watch strategy, the proposed models will require validation in larger clinical studies.