From the Core to the Border of Locally Advanced Rectal Cancer: A Novel MRI-based Clinical-Radiomic Model Early Predicts Treatment Response


 Neoadjuvant chemo-radiotherapy (CRT) followed by total mesorectal excision (TME) represents the standard treatment for patients with locally advanced (³ T3 or N+) rectal cancer (LARC). Approximately 15% of patients with LARC shows a complete response after CRT. The use of pre-treatment MRI as predictive biomarker could help to increase the chance of organ preservation by tailoring the neoadjuvant treatment. We present a novel machine learning model combining pre-treatment MRI-based clinical and radiomic features for the early prediction of treatment response in LARC patients. MRI scans (3.0T, T2-weighted) of 72 patients with LARC were included. Two readers independently segmented each tumor. Radiomic features were extracted from both the “tumor core” (TC) and the “tumor border” (TB). Partial least square (PLS) regression was used as the multivariate, machine learning, algorithm of choice and leave-one-out nested cross-validation was used to optimize hyperparameters of the PLS. The MRI-Based “clinical-radiomic” machine learning model properly predicted the treatment response (AUC=0.793, p =5.6·10-5). Importantly, the prediction improved when combining MRI-based clinical features and radiomic features, the latter extracted from both TC and TB. Prospective validation studies in randomized clinical trials are warranted to better define the role of radiomics in the development of rectal cancer precision medicine.

In this context, Magnetic Resonance Imaging (MRI), thanks to its diagnostic and prognostic relevance, is the gold standard imaging approach for local staging and treatment response assessment 7 . At staging, MRI provides information on the site, the extension and the relation with surrounding organs, thus establishing landmarks for following treatment. Moreover, it reveals prognostic signs such as the mesorectal fascia involvement, the extramural vascular invasion and the distance to the anal sphincter complex 8 . During the treatment and, mostly, at the end of the CRT right before surgery, MRI plays a crucial role for the response assessment [9][10][11][12][13][14][15] . In fact, approximately 15% of patients with LARC shows a complete response to CRT (complete responders) and it has been suggested that surgery could be omitted in selected patients 3 . Moreover, an international multicenter registry study on watch and wait (W&W) strategy revealed a 5-year disease speci c survival of 94% 16,17 . In this context, the use of MR as predictive biomarker in oncology quickly became a hot topic in literature, with the primary aim to select the most appropriate treatment thus pursuing precision medicine 18 . Moreover, the development of organ preservation policies further encouraged the researchers to investigate new MRI-based biomarkers for treatment response 3 . This approach could be extremely bene cial for patients considered good responder, since the neoadjuvant treatment could be intensi ed to increase the chance of organ preservation. Moreover, for these patients, a longer interval between CRT and surgery (from 8 to 11 weeks) may increase the pathological complete response rates as well. On the other hand, in poor responders, the treatment could be tailored with an additional boost 19 . Methods extracting mineable data from clinical images, such as those based on automatic extraction of features from radiological images (radiomic features) and machine learning approaches, revealed promising results when combined to visual morphologic and clinical assessment [20][21][22][23][24][25][26][27][28] . Moreover, in other tumor types, prognostic information has been obtained not only from the tumor but also in the tumor surrounding tissue (TST) [29][30][31] . We hypothesized that a combination of MRI-based data, including clinical and computational ones, could be useful for treatment response prediction at an early stage.
We here present a novel machine learning model combining MRI-based clinical and radiomic features, the latter extracted from both the "tumor core" (TC) and the "tumor border" (TB) from baseline T2-weighted (T2w) images, for the early prediction of treatment response in patients with LARC.

Methods
Subjects. The study received formal approval from the Ethical Committee of the University G. D'Annunzio of Chieti-Pescara, Italy; informed consent was waived by the same ethics committee that approved the study (Comitato Etico per la Ricerca Biomedica delle Province di Chieti e Pescara e dell'Università degli Studi "G.d'Annunzio" di Chieti e Pescara). The study was conducted according to ethical principles laid down by the latest version of the Declaration of Helsinki. A total of 164 patients who underwent clinically indicated rectal MRI for primary staging between January 2011 and February 2019 at our institution were retrospectively included ( Figure 1). Inclusion criteria were 1) biopsy proved non-mucinous locally advanced rectal cancer, 2) MRI performed using a 3.0T scanner, 3) availability of nal clinical outcome (major pathological response or non-major pathological response), 4) long-course CRT. Ninety-two patients were excluded: 12 were mucinous cancers, 24 because the MRI exam was performed on a 1.5T scanner, 33 were transferred in other institutes and we had no information regarding the nal clinical outcome. Moreover, 18 patients were considered un t for long-course CRT because of their clinical fragility and 5 patients had severe susceptibility artifacts in the pelvis (hip replacement). The nal study population was composed of 72 patients.
MRI Protocol. All patients in this cohort underwent clinically indicated rectal MRI consisting of a standard T2w and a diffusion weighted imaging (DWI) acquisition performed using a state-of-the-art 3T MR scanner (Achieva, Philips Medical System, Best, the Netherlands) equipped with a phased array surface coil. Both T2w and DWI sequences were axially oriented perpendicular to the tumor major axis de ned on a sagittal scan. Patients without contraindications received 20 mg of scopolamine butylbromide (Buscopan, Boehringer Ingelheim, Ingelheim am Rhein, Germany) intravenously to reduce bowel motility 7 . Detailed information regarding the parameters of the MRI protocol are described in Table 1.
Imaging analysis. Whole-volume tumor manual segmentation representing the tumor core (TC) was performed on T2w images for each patient by two independent readers with different degrees of experience in abdominal radiology (a radiologist with 5 years of expertise in rectal MRI and a senior resident). All the segmentations were performed on T2w images of the staging MRI and were used as masks for following analysis. The software used for the segmentation was an open-source medical image computing platform, 3DSlicer Version 4.8 (www.3dslicer.org). To create the "border" segmentations (tumor border, TB), a "3dmask_tool" (AFNI) was used 32 . Firstly, both a 2mm dilatation (ROI "dilate") and a 2 mm erosion (ROI "erode") of the masks of each patient were obtained. Secondly, the two masks were subtracted (ROI "dilate" -ROI "erode") in order to obtain the "border" mask of 4 mm thickness. All the "border" masks were then checked by the two readers. If necessary, the segmentation was manually corrected in order to include only the outer border of the tumor and adjacent perivisceral fat. TC and TB were shown in Figure 2a and 2b.
Radiomic Features Extraction. The extraction of the radiomic features from the masked (TB and TC) T2w images was performed using PyRadiomics, a exible open-source platform capable of extracting a large panel of engineered features from medical images; this radiomic quanti cation platform enables the standardization of both feature de nitions and image processing 33 . The reproducibility assessment of the features extracted by the two readers from the segmentations of all patients was performed. To avoid data heterogeneity bias and to promote reproducibility, the T2w images and masks were resampled using 3 isotropic voxel dimensions (1x1x1 mm, 2x2x2 mm, 5x5x5 mm). For each segmentation and for each image resolution (1 mm, 2 mm and 5 mm) ten built-in lters (Original, wavelet, Laplacian of Gaussian (LoG), square, square root, logarithm, exponential, Gradient, LBP2D, LBP3D) were applied and seven feature classes ( rst order statistics, shape descriptors, glcm, glrlm, ngtdm, gldm, glszm) were calculated, which resulted in a total of 1688 radiomic features ( Figure 2c).

Machine Learning: Partial Least Square (PLS) Regression
Machine learning approaches (also de ned multivariate approaches) exploits data multidimensionality to extract useful information. The added value of such approaches is that they identify statistical dependences among variables that are not visible to standard univariate analysis.
However, many radiomic features tend to identify similar image characteristics, generally providing highly redundant information 34 . This means that, when trying to predict an output based on these features, this information redundancy, coupled with the large numerosity of features with respect to samples (e.g. subjects), corrupts the results making the prediction unstable to noise and prone to over tting and poor generalization 35 . To address this problem, in this work, two main approaches were implemented. The rst approach to dampen this effect was to reduce the number of used features by selecting only those that were highly repeatable between the two masks (delineated by the two radiologists) used (r>0.95).
The second approach was to implement a machine learning framework based on a linear regression analysis that employed a space dimension reduction procedure, namely the partial least square (PLS) regression 36,37 . The PLS was used to predict the treatment response in patients with LARC at an early stage from the set of clinical and radiomic features. PLS has been extensively proven to be highly effective in reducing over tting in the presence of collinearity. The underlying assumption of PLS is that the observed data is generated by a system or process which is driven by a small number of latent variables.
PLS allows the construction of regression equations reducing the predictors to a smaller set of uncorrelated components, i.e. a linear combination of the original predictors, and performs regression on these components 36,38 . The goal of PLS is to identify components that capture most of the information in the independent variables (e.g. linear combinations of all radiomic features) that is useful for predicting the dependent variable (e.g. treatment response), while reducing the dimensionality of the regression problem by using fewer components than the original number of independent variables. PLS can be conceived as a supervised learning version of the Principal Component Analysis (PCA) 39,40 . Of note, the learning process ( tting) of the PLS algorithm provides regression loadings that can be used to retrieve the weights (β-weights) linking the original independent variables with the dependent variable, depicting the importance and sign of the original variables in the prediction process. The PLS has one hyperparameter to be optimized, namely the number of uncorrelated components (linear combinations of the original independent variables) to be used in the regression.
In order to perform hyperparameter optimization and evaluate the generalizable performance of the procedure a possible approach that allows to minimize the loss of samples in the different sets is the nested cross-validation (nCV) 41 . In nCV, data are divided in folds and the model is trained on all data except one-fold in an iterative, nested manner. The hyperparameter optimization and performance assessment are performed on the remaining fold and averaged across iterations. If the number of folds equals the number of samples (one-fold per sample) the procedure is de ned leave-one-out nCV 42,43 . This approach is highly suited for medical applications where each sample represents one subject. In this work, a leave-one-out nCV was implemented to optimize the PLS number of components and to assess the PLS generalization performance.
The leave-one-out nCV PLS analysis was repeated multiple times considering standalone clinical features, standalone radiomic features (with an inter-radiologist repeatability of r>0.95) and combined clinical and radiomic features. MRI-based clinical and radiomic features were also analyzed multiple times considering TC only, TB only, as well as both TC and TB radiomic features (Figure 2d). The machine learning analyses were implemented in Matlab.
Reference Standard. The major pathologic response, assessed for 69 patients on surgical specimens, was considered to be Tumor Regression Grade (TRG) 1 or 2 scores according to Mandard's classi cation [44][45][46] . Alternatively, for 3 patients, a sustained complete clinical response with repeated negative MRI examinations and endoscopy with or without biopsy was considered surrogate for a major pathological response for patients enrolled in watch-and-wait protocols 47 .

Statistical Analysis
The classi cation performances were assessed through Receiver Operating Characteristic (ROC) analyses considering the inferred (out-of-training-sample) treatment response to therapy. Patients who responded to therapy (major pathological response) were attributed to the "negative" group, whereas patient showing a non-major pathological response were attributed to the "positive" group. The ROC analyses were also performed on random shu ed outcomes to simulate the null hypothesis and evaluate its con dence interval (repeated 10 6 times). The ROC analysis delivered an Area Under the Curve (AUC) which, using the random shu ed outcomes, could be transformed into a z-score for assessing its statistical signi cance. The Statistical Analysis was performed in Matlab.
Nine clinical features were available and all of them were considered in the machine learning analysis. Moreover, 1688 radiomic features were extracted for the three image resolutions employed (1 mm, 2 mm and 5 mm) and for TC and TB, leading to a total of 10128 (1688x3x2) features. 1405 of these features showed an inter-reader correlation of r > 0.95 and were used for further analysis.
The weights of the PLS (β-weights), when the machinery was trained on both clinical and radiomic features, are shown in Figure 4. Figure 4a reports the distributions of the β-weights for radiomic and clinical features, whereas gure 4b depicts the β-weights associated to the top 1% of features with the largest β-weights in magnitude, i.e. those most impacting the prediction. Importantly, for all except one (tumor location), of the top 1% of features, the weights were positive, that considering the value of "0" attributed to patients with major pathologic response and the value of "1" to the others during the multivariate regression, depicted a worse response to treatment with increasing feature value. Of note, the only important feature with negative weight was tumor location that, considering the labelling value attributed as a function of location (1=High, 2=Middle, 3=Low), delineated a better response to treatment for tumors in the low rectum. Finally, the impact of TC and TB on the results was assessed. When using the 9 clinical features and 790 radiomic features (with an interradiologist repeatability of r>0.95) extracted only from the TC, an AUC = 0.689 was obtained (z =2.60, p=9.3·10 -3 , Figure 5a). When using the 9 clinical features and 626 radiomic features (with an interradiologist repeatability of r>0.95) extracted only from the TB, an AUC =0.541 was obtained (z=0.56, p=0.57, Figure 5b). Indeed, a highly synergistic effects was obtained when combining TB and TC features, replicating the results previously found with an AUC =0.793 (z =4.00, p =5.6·10 -5 , Figure 5c).

Discussion
Our results demonstrated that an MRI-based machine learning "clinical-radiomic" approach was an accurate method to predict, at an early stage, the treatment response of patient with LARC.
Importantly, the addition of novel radiomic features to standard T2w-based clinical features was indeed useful in the improvement of the prognostic model. Moreover, the prediction further improved when both TC and TB radiomic features were included. These results con rm the promising role of baseline T2w-imaging for the prediction of clinical-outcome 21,22,48,49 . Recent studies have demonstrated the prognostic role of tumor morphology and the adjacent perirectal tissue assessed on T2w-imaging 10,50 .
Furthermore, the importance of the perirectal tissue status for the nal clinical outcome was also underlined by the signi cant reduction of the local recurrences with the introduction of TME thanks to the complete excision of microtumors around the cancer 51 . To the best of our knowledge our study, apart for the added value of radiomic features to standard MRI-based clinical metrics, is the rst demonstrating the synergic role between feature extracted from the tumor core and the tumor border in the early prediction of response to therapy. Since the tumor border naturally include a certain degree of mesorectal compartment, our results con rm that also perirectal tissues contain useful information for the prediction of treatment response, as recently showed by Shaish et al. 52 .
Other studies based on radiomics were recently proposed in literature. However, different radiomics approaches were used: for example, a delta-radiomics approach was used to analyze the variation of quantitative features over time. In this way, Jeon at al. analyzed the difference in radiomic features before and after CRT. The authors demonstrated that delta radiomics signatures were successful predictors of treatment response and independent prognostic factors 53 . In other studies, the predictivity of more extensive MRI protocols was explored 54,55 . For example, Cui et al. investigated the predictive role of a multiparametric radiomics-based approach including T2w, contrast-enhanced T1w and ADC images 54 . Differently from these approaches, our study was focused only on the primary staging and included only T2w images. In fact, our primary aim was to investigate biomarkers of treatment response at an early stage using the most widely used and recommended MRI imaging 7 . Focusing of the important features deemed as such by the machine learning algorithm, in our study the principal predictive features were mainly focused on tumor heterogeneity and clinical, visually assessed, characteristics. For example, tumors with high degree of internal heterogeneity, N+ and with a high clinical T staging were more likely associated to a poor treatment response. Furthermore, cancers located in low rectum were found to have a more favorable outcome thus con rming the crucial role of neoadjuvant chemo-radiation therapy in this category of patients 56 . Since most of the predictive features were extracted by 1-2 mm resolution voxels instead of 5 mm, our results support the recommendations of the last European Society of Gastrointestinal and Abdominal Imaging (ESGAR) consensus meeting 7 . In fact, according to these guidelines, the slice thickness of axial T2-weighted images should be equal or inferior to 3 mm 7 . Similarly to other studies, the accuracy of the prediction improved when a qualitative MRI assessment is added to a quantitative radiomic-based analysis 21,48 . In this way we are con dent that the development of other "omics" disciplines may further improve the overall accuracy of the treatment response prediction 57,58 .
Our study has some limitation. First of all, the sample size was limited and we analyzed a large number of predictive features. This was partially due to the strict inclusion criteria that we adopted. However, the PLS algorithm surely dampened this issue by obtaining a good classi cation performance. However, future studies on larger cohorts are warranted to con rm our ndings, and to further increase the classi cation outcome.
Secondly, ours is a retrospective single-center study. For this reason, further studies, possibly prospective and multicentric need to be warranted to de ne a potential standardization of our approach.
Third, there were some variations in the patient preparation and acquisition parameters of the MRI sequences used in this study. They re ect the daily practice which is subject to a certain degree of optimization or changes over time and this may have had some effect on the results. On the other hand, this fact may be even be viewed as a positive factor since the results may be more generalizable and less dependent on protocol variation between different centers. Fourth, we only focused our prediction on T2weighted imaging, without considering other MRI techniques, such as DWI and Dynamic Contrast Enhanced Imaging (DCE). However, since our objective was the prediction at the primary staging, the potential role of DWI and DCE at this time point is controversial. In fact, none of these techniques is routinely recommended in the current guidelines 7 .

Conclusion
A pre-treatment, MRI-based, "clinical-radiomic" machine-learning model accurately predict the treatment response in patients with locally advanced rectal cancer. The prediction improved when combining MRIbased clinical features and radiomic features, the latter extracted from both tumor core and the tumor border. Exploiting the method, patients with locally advanced rectal cancer could bene t from a tailored approach including conservative strategies. Prospective validation studies in randomized clinical trials are warranted to better de ne the role of radiomics in the development of rectal cancer precision medicine.
Declarations Ethical statement. This study was approved by the local ethics committee. The study used only preexisting medical data, therefore patient consent was waived.
Data Availability. The datasets generated during and/or analyzed during the current study are not publicly available due to the clinical and con dential nature of the material but can be made available from the corresponding author on reasonable request.  ***= assessed according to Mandard Tumor Regression Grade (TRG) system on surgical specimen after neoadjuvant treatment in 69/72 patients. In three patients, a sustained complete clinical response (with repeated negative MRI examinations and endoscopy with or without biopsy) was considered surrogate for a complete response; the follow-up (mean ± SD) was 47±11 months.