Predicting short-term survival after total resection in glioblastomas by machine learning-based radiomic analysis of preoperative MRI

Radiomics, in combination with artificial intelligence, has emerged as a powerful tool for the development of predictive models in neuro-oncology. Our study aims to answer a clinically relevant question: is there a radiomic profile that can identify glioblastoma (GBM) patients with short-term survival after complete tumor resection? In the present study, we evaluated the capability of the radiomic features of preoperative mpMRI and machine learning-based classification and regression analysis to predict short-term survival in GBM patients. Our model shows a classification accuracy of 80% and an iAUC of 0.76 to predict OS < 6 months in the validation cohort. We believe that these new tools will serve clinicians to understand the biological behaviour of individual GBMs, and we must take advantage of them.


Introduction
Glioblastoma (GBM) continues to be the most threatening primary brain neoplasm, with a median survival of approximately 15 months. 1 Currently, despite the standard treatment that includes maximum safe surgical resection followed by adjuvant chemoradiation therapy 2,3, its prognosis remains ominous, and our knowledge of this neoplasm is still limited.
Predicting a patient's survival is of vital importance for determining the ideal choice of treatment and management. Currently, several prognostic factors are commonly used to predict the prognosis of these patients, including age, sex, Karnofsky performance status (KPS), molecular profile, extent of resection, preoperative tumor volume, volume of nonenhancing tumor and degree of necrosis. 4 However, some of these features depend on radiologists' interpretation, which justifies the increasing need for an unbiased and quantitative radiological evaluation.
Magnetic resonance imaging (MRI) plays a fundamental role in neuro-oncology for diagnosing and assessing response to treatment and is being increasingly used as a noninvasive predictive tool. The term "radiomics" refers to the process of obtaining quantitative features based on the intensity, volume, shape and texture variations of radiological images and creating algorithms that find the association of these variables with the survival and outcome of the patients. 5 Through radiomics, converting medical images into high-dimensional data allows us to expose the underlying pathophysiology, especially intratumor heterogeneity. 6 This extraction process captures tumor characteristics undetectable to the human eye and gives added value to clinical visual perception. Radiomics incorporates several essential disciplines, including radiology for image interpretation, computer vision for extracting quantitative variables, and machine learning for classification and regression tasks. 7,8 Such integration has been demonstrated to exceed expert human abilities in multiple tasks, including diagnosis and outcome prediction.
Recognizing patients who would not benefit from standard treatment and identifying those who need a more aggressive approach at the time of diagnosis is essential for managing GBM through personalized medicine. 9 Several publications have sought, through the integration of radiomics and artificial intelligence, to establish survival prediction models in GBM based on preoperative MRI. [9][10][11][12][13][14][15] In the vast majority of studies, patients are classified according to their survival into two or three categories, depending on whether they exceed 10 or 15 months of survival. This approach aims to identify medium- and long-term survivors who could theoretically be candidates for aggressive therapies.
Furthermore, in most studies, the extent of resection is not used as a discriminatory factor, and biopsies and partial and subtotal resections are included.
This fact precludes the implementation of such predictive models in newly diagnosed GBM patients. Our study aims to use the radiological characteristics from structural preoperative multiparametric magnetic resonance imaging (mpMRI) to construct a predictive model of short-term survival in patients in whom total or near-total resection of the enhancing tumor has been performed followed by standard treatment.

Study Population
A retrospective collection of patients who underwent surgery with a diagnosis of GBM was carried out in two institutions between January 2019 and January 2020. In addition, a second cohort of patients was selected from available public databases: the BraTS (Multimodal Brain Tumor Segmentation) Challenge 2020 [16][17][18] and three other sources available through The Cancer Imaging Archive [Ivy Glioblastoma Atlas Project (Ivy GAP) 19 20 ]. The inclusion criteria were pathologically confirmed glioblastomas, availability of preoperative MRI with structural/conventional sequences [T2-weighted images (T2WI), fluid-attenuated inversion recovery (FLAIR), T1-weighted images (T1WI) and contrast-enhanced T1-weighted images (T1CE)] with adequate resolution, known survival status, and clinical information (age and type of surgical resection).
Only cases in which gross total resection (100% of the enhancing tumor volume) or near-total resection (> 95% of the enhancing tumor volume) was achieved were included. Patients were randomly allocated into training and testing data sets in a 70/30 proportion. The primary endpoint was overall survival (OS), which was defined as the number of days from the initial pathological diagnosis to death (censored = 1) or the last date that they were known to be alive (censored = 0). Public datasets do not have patient identifiers. Hence, no institutional review board approval was required for them. Nevertheless, the study was approved by the institutional review boards and ethics committees of the other two participating centers.
Additionally, all institutional patients provided written informed consent. The study was performed in accordance with the ethical standards as laid down in the 1964 Declaration of Helsinki and its later amendments.

Image data description and preprocessing
All BraTS scans were acquired with different clinical protocols and various scanners from multiple (n = 19) institutions. Details of the protocol acquisition of the scans from TCIA and institutional cases are shown in Supplementary Table 1.
Image preprocessing consisted of several steps. First, mpMRI scans were converted to Neuroimaging Informatics Technology Initiative (NIfTI) format. Then, the scans were placed in a common orientation ["LPS" (left-posterior-superior) in the radiological convention or "RAI" (right-anterior-inferior) in the neurological convention]. Later, the scans for every subject were registered to the SRI24 anatomical atlas space. 21 N4 bias correction 22 was applied as a temporary step to facilitate optimal registration but was not included at the end of the process since it might obliterate the MRI signal, particularly on the FLAIR modality. 23 The T1WI, T2WI and FLAIR scans were registered to the transformed T1CE scan, resulting in coregistered resampled volumes of 1x1x1 mm isotropic voxels. The brain was then extracted from all coregistered scans using a pretrained deep learning-based model. 24 Finally, intensity Z-score normalization was carried out. All preprocessing pipelines were generated using The Cancer Imaging Phenomics Toolkit (CaPTk). 25

Tumor segmentation and feature extraction
The method used to generate the segmentation labels of the different tumor regions is called GLISTRboost (Boosted GLioma Image SegmenTation and Registration), 26 which is defined as a hybrid generative-discriminative tumor segmentation method. This segmentation algorithm comprises a glioma growth model, a discriminative part based on a gradient boosting multiclass classification scheme and a Bayesian strategy. 15 Segmentation labels or volumes of interest (VOIs) were as follows: enhancing tumor (ET), non-enhancing tumor/necrosis (NET), and edema (ED). Segmentations were evaluated by two experts (S.C., S.G.) and corrected manually if necessary.
Using the extraction tool of CaPTk, a total of 15720 features were computed from the tumor subregions (i.e., ET, NET and ED) and the four mpMRI modalities following the Image Biomarker Standardisation Initiative (IBSI) 27 definitions. These extracted features included intensity features or first-order statistics, histogram-based features, and volumetric, morphologic and textural features, including those based on the gray level cooccurrence matrix (GLCM), gray level run-length matrix (GLRLM), neighborhood gray-tone difference matrix (NGTDM), gray level size-zone matrix (GLSZM) and lattice-based features. A detailed description of these features is shown in Supplementary Table 2.

Data processing and feature selection
After the extraction of the radiomic features, the data were preprocessed by data cleaning (removing the variables with more than 5% missing values) and imputation (using the average value method).
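The cleaning and imputation step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `clean_and_impute`, the dictionary layout and the use of NaN to mark missing values are assumptions made for illustration only.

```python
import math

def clean_and_impute(columns, max_missing=0.05):
    """Drop features with too many missing values and mean-impute the rest.

    columns: dict mapping a feature name to a list of floats, where NaN
    marks a missing value (hypothetical layout, not from the paper).
    """
    cleaned = {}
    for name, values in columns.items():
        if not values:
            continue
        observed = [v for v in values if not math.isnan(v)]
        missing_fraction = 1 - len(observed) / len(values)
        # Discard the feature if more than 5% of its values are missing.
        if missing_fraction > max_missing or not observed:
            continue
        mean = sum(observed) / len(observed)
        # Replace each missing value by the mean of the observed values.
        cleaned[name] = [mean if math.isnan(v) else v for v in values]
    return cleaned
```

A feature with 1 missing value out of 40 (2.5%) would be kept and imputed, whereas one with 3 missing values out of 40 (7.5%) would be removed.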
Afterward, 13265 features were z-score normalized based on the mean and standard deviation. Then, a feature selection process was necessary to reduce these high-dimensional imaging features and avoid overfitting. Feature selection helps to optimize the generalizability and reproducibility of the models subsequently built. A two-step selection method was used as follows. First, Spearman's correlation coefficient was calculated for each pair of radiomic features, and features with a Spearman's correlation coefficient > 0.95 with each other were discarded, retaining a single feature in each set. Second, only the variables with a significance level < 0.001 in their association with the OS in days were retained. Thus, the number of features was reduced to a set of 260.
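The first selection step (removal of highly correlated features) can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' code: the helper names are hypothetical, the greedy order in which features are retained is an assumption, and the use of the absolute value of the coefficient is also an assumption, since the text only states a threshold of 0.95.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equally long lists (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

def drop_redundant(features, threshold=0.95):
    """Greedily keep a feature only if it is not highly correlated
    (|rho| > threshold, an assumption) with any feature already kept."""
    kept = []
    for name in features:
        if all(abs(spearman(features[name], features[k])) <= threshold
               for k in kept):
            kept.append(name)
    return kept
```

With real data, the pairwise comparison over thousands of features would normally be vectorized; the nested loops here only make the logic explicit.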

Statistical Analysis and Machine Learning
Predicting OS was achieved by two different strategies. The first is a binary classification task between short and long survival. For this purpose and after feature reduction, several classification algorithms were assessed for patient stratification. As a previous step, machine learning (ML)-based filters were used: Gini index (GINI), fast correlation-based filter (FCBF) and information gain (InfoGain). Hence, the top 10 features were selected. Then, five different ML classifiers were trained: naive Bayes, k-nearest neighbors (kNN), random forest (RF), support vector machine (SVM) and a multilayer perceptron algorithm (neural network, NN). The target response for each model was the patient's OS grouped into two classes to distinguish patients who survived < 6 months (short-term survivors) from others. Then, the results were quantitatively validated on the testing data set. The performance of the ML classifiers was measured by the area under the receiver operating characteristic curve (AUC), accuracy, precision, F1 score and recall. All performance metrics were reported as the average value over classes.
The second statistical strategy was conducted using the random survival forest (RSF) approach from the R package "randomForestSRC" 28,29, an ensemble-tree method that adapts random forests to right-censored data and survival analysis. RSF does not rely on restrictive assumptions such as proportional hazards and automatically handles nonlinear effects and interactions of high-dimensional data. Features were ranked by positive importance using a variable hunting algorithm as a feature selector. Model hyperparameters were as follows: number of trees = 500, node size = 2, number of splits = 10 and log-rank as the splitting rule. We also evaluated the model's ability to generalize those predictions on the testing group. Training and testing predictions were performed using 5-fold cross-validation.
When the primary outcome is survival (time to event), RSF produces a cumulative hazard function (CHF) from each decision tree that is averaged in an ensemble out-of-bag CHF (OOB-CHF). The predicted ensemble mortality is the mean OOB-CHF estimated by the RSF model for each subject, and it was used to calculate each patient's estimated mortality risk. We used the results from the RSF model to build a mortality risk score and split the sample into high- and low-risk groups. The OOB-CHF cut-off values defining the risk groups were calculated through the "cutp" function of the "survMisc" package. 30 The log-rank test was used to compare the Kaplan-Meier survival curves between the patients in the high- and low-risk groups.
Finally, Cox proportional hazard regression models were fitted to the training data set using, as explanatory variables, the dichotomized risk score (high- and low-risk groups) from the RSF model, the patient's age, and a combination of both. Then, the models were validated in the testing dataset. The performance assessment of the survival models was performed by calculating the prediction errors using the integrated Brier score (IBS), defined as the average squared distance between the observed survival status and the predicted survival probability, computed with the "pec" package. 31 Additionally, the discriminatory capacity of the model was evaluated by calculating the concordance index (C-index), which refers to the probability that, for a pair of randomly selected samples, the sample with the highest risk prediction experiences an event before the sample with the smallest risk. Furthermore, the integrated area under the time-dependent ROC curve (iAUC) was calculated for all models using the "risksetROC" package. 32 The standard approach of ROC curve analysis considers event (death) status and predictor value for an individual as fixed over time. Because the status and explanatory variables change over time, we used the risksetROC package, which estimates the iAUC under the incident sensitivity and dynamic specificity definitions and produces accuracy measures for censored data under the proportional or nonproportional hazard assumption of the Cox regression estimator. 33 Following the objective of our study, we also calculated the iAUC at six months for all models.
Statistical and survival analyses were performed with R version 4.0.5 (R Foundation for Statistical Computing, Vienna, Austria). The differences in age, OS, and the proportion of right-censored cases and short-term (< 6 months) survival cases were assessed using Student's t-test, Mann-Whitney U test and two-proportions Z-test, respectively. For the binary classification model, we used Orange version 3.28.0 (University of Ljubljana, Slovenia). 34 The radiomics quality score (RQS) was calculated for this study according to the recommendations by Lambin et al. 35 A p value < .05 was considered to indicate a statistically significant difference. The image processing and statistical analysis workflow are shown in
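To make the classification target and the reported metrics concrete, the following sketch labels patients by OS and computes class-averaged metrics. It is only an illustration: the 183-day threshold for "6 months", the function names and the unweighted (macro) averaging are assumptions, since the text states only that metrics were averaged over classes, and the sketch ignores censoring for simplicity.

```python
SIX_MONTHS_DAYS = 183  # assumption: the threshold in days is not given in the text

def label_short_term(os_days):
    """Return 1 for short-term survivors (OS < 6 months) and 0 otherwise."""
    return [1 if d < SIX_MONTHS_DAYS else 0 for d in os_days]

def macro_metrics(y_true, y_pred):
    """Accuracy plus precision, recall and F1 averaged over classes
    (unweighted averaging is an assumption of this sketch)."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n_classes = len(classes)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1s) / n_classes,
    }
```

Because short-term survivors are the minority class, averaging over classes avoids a classifier looking good merely by predicting the majority class.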
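The risk-group split and the Kaplan-Meier estimate compared between groups can be sketched as below. This is a simplified illustration and not the R workflow used in the study: the function names are hypothetical, assigning scores strictly above the cutoff to the high-risk group is an assumption, and the event indicator follows the coding given earlier (1 = death, 0 = last known alive).

```python
def split_by_risk(scores, cutoff):
    """Assign 'high' to scores above the cutoff and 'low' otherwise
    (the handling of scores equal to the cutoff is an assumption)."""
    return ["high" if s > cutoff else "low" for s in scores]

def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct death time.

    times:  follow-up in days
    events: 1 if the patient died at that time, 0 if right-censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Product-limit update: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored cases leave the risk set after time t.
        at_risk -= leaving
    return curve
```

In practice, one curve would be computed per risk group and the two curves compared with the log-rank test, as described above.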
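The concordance index defined above can be illustrated with a short pure-Python sketch. It follows the usual pairwise definition for right-censored data and is not the implementation used in the study: the function name is hypothetical, and pairs with tied survival times are simply skipped, which is a simplification.

```python
def concordance_index(times, events, risks):
    """Fraction of comparable pairs in which the patient with the higher
    predicted risk dies first.

    A pair (i, j) is comparable when patient i died (events[i] == 1) and
    did so strictly before patient j's follow-up time. Tied risk
    predictions count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.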

Patient population
Two hundred and three patients were enrolled in this study. The mean age was 61.49 ± 11.76 years (range 27.81-86.65). The median OS was 407 days [interquartile range (IQR) = 351.5]. A total of 7.9% (16) of patients were right-censored cases, and 17.24% (35) registered an OS of less than six months. The patients were randomly assigned to a training data set of 143 patients and a testing data set of 60 patients. There were no significant differences in age, OS, or proportion of right-censored and short-term survival cases between the training and testing data sets (Table 1).
Table 3). Based on these characteristics and using the ML classifiers, patients were classified as short-term survivors (< 6 months) or not.
The optimal results were obtained by applying information gain as the feature selector. In the training cohort, AUC values ranged between 0.802 and 0.978, classification accuracy between 81.8% and 94.4%, and precision between 82.8% and 94.8% (Supplementary Table 4). In the testing cohort, the naive Bayes classifier obtained the best results, with an AUC of 0.769, a classification accuracy of 80%, and a precision of 81% (Table 2 and Figure 2).

Random Survival Forest to predict OS
In the radiomic model based on RSF, the variable-hunting algorithm selected 17 radiomic features to predict OS in the training dataset. Based on these characteristics, the mortality risk score was calculated using the OOB-CHF. The cutoff point used was 0.684. This cutoff point allowed patients to be stratified into low-risk and high-risk groups [HR = 2.19, (95% CI: 1.54-3.12), log-rank p < .001, C-Index = 0.61, IBS = 0.096]. In the testing dataset, patients were also stratified using the same cutoff point [HR = 2.16, (95% CI: 1.21-3.89), log-rank p = .008, C-Index = 0.61, IBS = 0.123]. The multivariate Cox regression models in which age was incorporated as an explanatory variable are shown in Table 3. The iAUC of the radiomic model was 0.591 in the training cohort and 0.568 in the testing cohort. By incorporating age as a variable in the model, the iAUC increased to 0.650 in the training cohort and 0.627 in the testing dataset.
After setting the survival time to 6 months, the predictive accuracy of the radiomic model improved to an iAUC of 0.712 in the training dataset and 0.761 in the testing dataset (Supplementary Figure S1).
The RQS was used to evaluate the methodological quality of our study. We obtained a score of 19/36 (53%). A detailed report of RQS items is shown in Supplementary Table 7.

Discussion
In the present study, we elaborated a prediction model of short-term survival with high predictive capacity using the radiomic features of structural preoperative multiparametric MRI of GBM patients.
We believe that the main strength of our study is the selection of patients who underwent total or near-total resection of the enhancing tumor. We considered this methodologic aspect due to the undeniable link between the extent of resection and survival in these patients. 36,37 In most previous studies, the extent of resection was not used as a selection criterion, so partial resections and biopsies were included in their series without any adjustment during the analysis phase. The exceptions are the studies by Bakas et al. 15 and Fathi et al. 38, in which the entire cohort of patients underwent complete resection and standard chemoradiotherapy treatment.
Another crucial point of our work is the objective of identifying short-term survival patients, in contrast to previously published studies where 10 and 12-15 months were used as cutoff points for defining short- and long-term survival, respectively. 10,11,13,[39][40][41][42][43][44] The only comparable reference we found is the work of Prasanna et al. 45, who classified patients into long-term (> 18 months) versus short-term (< 7 months) survivors based on peritumoral region radiomic features. The rationale of our approach lies in the desire to predict the survival of patients diagnosed with GBM by noninvasive methods and to identify those with very short survival. In these patients, the futility of our treatments would lead us to offer patients and their families the option of not taking aggressive measures or, on the contrary, to open new lines of research, since those cases would be poor responders to the standard therapies currently applied.
As another strength of our work, we can mention the use of open-source software. The CaPTk and Orange programs have a very intuitive yet robust user interface, thanks to which clinicians can access advanced image processing techniques and data mining tools. Thanks to these programs, we have performed complex tasks such as automatic tumor segmentation, image processing, radiomic feature extraction, and exploration of different ML-based algorithms.
Concerning statistical analysis, we used a dual approach. On the one hand, we used a binary classification system based on different ML-based algorithms. On the other hand, we used state-of-the-art survival analysis techniques such as random survival forest and time-dependent ROC curve analysis focused on short-term survival, which contribute to corroborating the stability of the models produced here.
We also highlight that the results of our predictive models have been achieved using only structural MRI. 15 These results could even be improved after the inclusion of studies based on diffusion and perfusion sequences. 46 However, basic MRI is available in most centers, and according to our results, the lack of special sequences is not a limitation in the search for useful radiological patterns in clinical practice.
An important aspect to discuss is the biological correspondence of the variables employed by the prediction models. There is notable variability concerning the radiomic characteristics used by previous studies, which is one of the most significant obstacles in reproducing and validating their results. In our study, most of the selected variables came from the T1CE sequence followed by FLAIR and T2WI, while the different tumor subregions (i.e., ET, NET and ED) were represented in the models in a balanced way. In our series, first-order features and morphological characteristics appeared to be important for OS prediction.
We are aware of the limitations of our work, such as the lack of clinical and molecular data that can be incorporated into predictive models. Even so, age as an explanatory variable was incorporated into our models due to its significant association with the OS of these patients, and its mere incorporation into the analysis improved the performance of the models. Despite the relatively small sample size, various statistical techniques have been applied to overcome the "curse of dimensionality". Taking into consideration that MRI studies come from numerous sources, the processing method for image standardization that we have chosen aims to be simple and reliable and has been used by several studies. 38,47,48 Unquestionably, the combination of texture analysis and artificial intelligence is starting to facilitate knowledge about the biological behaviour of GBMs through the study of their patient-dependent heterogeneity. However, the rapid development of big data tools and the tremendous complexity of advanced medical image analysis threaten to widen the gap between data experts and clinicians. It is therefore a paradox that radiomics, defined by Lambin et al. 35 as "the bridge between medical imaging and personalized medicine", is now out of reach of those who treat real patients every day. Our study arises from a real need and aims to find a solution to a clinically relevant problem: identifying GBM patients with short survival after complete resection. Although our results can be improved, we show that there are currently computer tools and public data sets available to everyone to develop reliable predictive models. Hence, our duty as clinicians is to become immersed in developing these models since our pragmatism can never be replaced even by the most complex algorithm.
Indeed, our results are encouraging, and the precision achieved is similar to that of the previous literature. However, this article represents an early stage of a promising future in which the ultimate link between image, diagnosis and prognosis could finally be decoded to provide instant, useful and precise information to individual patients based on their specific features. Multi-institutional studies 49 would allow the generalization of predictive models or even the adaptation of the mechanisms of data preprocessing, extraction, and analysis to the MRI from each center, since the standardization of acquisition protocols is not feasible. Finally, we believe that in this catastrophic disease, the quality of life of our patients should be our first consideration, and maximum exploitation of available neuroimaging techniques should be pursued to optimize management strategies, avoiding unnecessarily aggressive therapies in those patients who will not benefit from them.

Conclusion
In the present study, we evaluated the capability of the radiomic features of preoperative mpMRI and machine learning-based classification and regression analysis to predict short-term survival in GBM patients. Our model shows a classification accuracy of 80% and an iAUC of 0.76 to predict OS < 6 months in the validation cohort. We believe that these new tools will serve clinicians to understand the biological behaviour of individual GBMs, and we must take advantage of them.

Funding
No funding was received for this research.
Conflict of interest: All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers' bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements) or nonfinancial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript.