MRI radiomics to differentiate between low grade glioma and glioblastoma peritumoral region

The peritumoral region (PTR) of glioblastoma (GBM) appears as a T2W-hyperintensity and is composed of microscopic tumor and edema. Infiltrative low grade glioma (LGG) comprises tumor cells that seem similar to GBM PTR on MRI. The work here explored if a radiomics-based approach can distinguish between the two groups (tumor and edema versus tumor alone). Patients with GBM and LGG imaged using a 1.5 T MRI were included in the study. Image data from cases of GBM PTR, and LGG were manually segmented guided by T2W hyperintensity. A set of 91 first-order and texture features were determined from each of T1W-contrast, and T2W-FLAIR, diffusion-weighted imaging sequences. Applying filtration techniques, a total of 3822 features were obtained. Different feature reduction techniques were employed, and a subsequent model was constructed using four machine learning classifiers. Leave-one-out cross-validation was used to assess classifier performance. The analysis included 42 GBM and 36 LGG. The best performance was obtained using AdaBoost classifier using all the features with a sensitivity, specificity, accuracy, and area of curve (AUC) of 91%, 86%, 89%, and 0.96, respectively. Amongst the feature selection techniques, the recursive feature elimination technique had the best results, with an AUC ranging from 0.87 to 0.92. Evaluation with the F-test resulted in the most consistent feature selection with 3 T1W-contrast texture features chosen in over 90% of instances. Quantitative analysis of conventional MRI sequences can effectively demarcate GBM PTR from LGG, which is otherwise indistinguishable on visual estimation.


Introduction
Radiomics is an emerging field in medicine and oncology involving quantitative feature analysis of radiographic images [1,2]. Radiomics can provide insights into tumor biology as the imaging features can be correlated to the tumor microenvironment and genomic information, as obtained from histopathology and molecular studies, and thereby serve as a surrogate marker for treatment outcomes [3]. Gliomas are a heterogeneous group of tumors arising from glial cells and named after their resemblance to glial cell lineages such as astrocytoma and oligodendrogliomas and classified into four grades that characterize biological behavior [4]. For the work here, we refer to grade 2 infiltrative gliomas as low grade The results of the study were presented at the 62nd annual meeting of the American Society for Radiation Oncology (ASTRO) in October 2020.
Nauman Malik and Benjamin Geraghty contributed equally to the study (co-first authors).
Arjun Sahgal and Gregory J. Czarnota contributed equally to the study.
* Gregory J. Czarnota gregory.czarnota@sunnybrook.ca Extended author information available on the last page of the article glioma (LGG), which are typically identified on magnetic resonance imaging (MRI) as T2-W hyperintense lesions. On MRI, distinct anatomical compartments can be identified in glioblastoma (GBM), including tumor enhancement appearing as contrast-enhancing regions on T1-weighted sequences, with a central non-enhancing necrotic core and the adjacent peritumoral region (PTR) appearing as a T2-weighted hyperintense area. Although the structural composition is quite distinct, the radiological appearance of LGG is similar to the PTR of GBM. In LGG, the areas of T2-weighted hyperintensity represent tumor, whereas GBM PTR is an admixture of microscopic infiltrative tumor and vasogenic edema [5,6]. The spatial information regarding the presence of microscopic infiltrative disease within the PTR of GBM is indistinguishable from the edema using semantic MRI features. The use of computer vision and quantitative imaging analysis in the form of radiomics can unfold high-dimensional information in an image and can potentially outperform low-level visual feature interpretation. Since the structural composition of the LGG and the GBM PTR is different, an exploratory analysis was contemplated in the current study to investigate the ability to differentiate the two groups using a radiomics approach.
Our previous work demonstrated a radiomic-based approach using 270 features to categorize individual voxels in the PTR of GBM on a probabilistic scale to indicate infiltrative tumor or edema extrapolating feature characteristics from LGG and PTR of brain metastasis, respectively [7]. The hypothesis of the current study was the ability of radiomics to effectively distinguish between the T2W hyperintensity of LGG (tumor) from GBM PTR (tumor and edema). The presented work was undertaken to differentiate the entire region of GBM PTR from LGG, expanding the number of features (3822) using various filtration techniques. As opposed to 10 patients with GBM with post-operative MRI analyzed in the previous work, in this study, 42 patients with GBM were analyzed using preoperative index MRI. Also, in the present study, different feature selection techniques (all 270 features were used in the previous study for modeling) along with four machine learning classifiers (only support vector machine classifier in previous work) were investigated. The radiomics model can be explored in future studies to identify patches of microscopic disease within the PTR of GBM that can help reduce the clinical target volumes for radiation rather than treating the entire T2W signal changes.

Patient selection
This retrospective study was approved by the institutional research ethics review board, and the requirement of obtaining consent was waived. Adult patients (≥ 18 years of age) with IDH-wildtype GBM and LGG were identified from an institutional database from January 2014 to December 2018, with MRI data available from two identical scanners. For patients in the GBM cohort, preoperative MRI was used for analysis, usually undertaken within one week of surgery. The LGG cohort included grade 2 astrocytoma and oligodendroglioma. Patients with high-risk or atypical features on histopathological evaluation or higher grade features on radiology, like intervening contrast-enhancing regions, significant T2W heterogeneity, were excluded. Post-operative scans were included in the study for the LGG cohort, but patients with prior radiotherapy or chemotherapy were excluded. During segmentation, the surgical cavity and tract were manually excluded. Tumors with segments smaller than 1 cm on at least one dimension were excluded from the study.

MRI protocol
Imaging was acquired on a GE Signa HDxT 1.5 T MRI scanner. using an 8-channel head coil (General Electric Medical Systems, Waukesha, WI, USA) was used. Three sequences were included in the current study: Gadolinium (Gd) (Gadovist 01 mmol/kg, 10 ml maximum, bolus) enhanced 3D T1-weighted FSPGR (T1-CE); Gd-enhanced T2 fluidattenuated inversion-recovery PROPELLOR (T2-FLAIR); and diffusion-weighted imaging-derived apparent diffusion coefficient maps (ADC). A single-shot echo-planar imaging sequence with 3 diffusion directions and a b-value of 1000 s/mm 2 was used to obtain diffusion-weighted images, and ADC maps were reconstructed online using GE's Functool. The MRI acquisition parameters are shown in Supplementary Table S1. Figure 1 shows the schema used for the current study. The T2-FLAIR and ADC scans were first resampled to the corresponding T1-CE volume to match the field-of-view and resolution using the FMIRB Software Library (FSL) tool FLIRT. A pre-trained artificial neural network-based automated method was used for skull stripping using HD-BET. The extracted brain volumes from the T2-FLAIR scans were rigidly registered to the corresponding T1-CE volumes using FLIRT. Data handling and scripting were performed in Matlab R2018b (The Mathworks, Inc., Natick, MA, USA). The T1-CE, T2-FLAIR, and ADC volumes were combined into single workspaces for each patient using the software platform ITK-SNAP (http:// www. itksn ap. org) for manual segmentation [8].

Image preprocessing and segmentation
Segmentation was carried out manually initially by a radiation oncologist (AD), and all cases were individually reviewed by a neuroradiologist (PM) and a neuro-radiation oncologist (AS) in order to achieve final consensus. The segments in LGG included the tumor observed as T2-FLAIR hyperintensity, with the surgical cavity excluded for postoperative cases. In patients with GBM, the PTR was segmented to include the T2-FLAIR hyperintense region beyond the contrast-enhancing tumor core ( Supplementary  Fig. S1).

Normalization & feature extraction
Prior to feature extraction, z-score intensity normalization was performed for all images by centering all pixel intensities within each brain to zero and then dividing by the standard deviation. The normalized images were then multiplied by 100 and shifted by 300 to ensure that the ± 3σ majority of pixel intensities were non-negative. Fixed bin width (FBW) quantization was used to discretize pixel intensities within each segment [9,10]. The FBWs and corresponding bin counts (BC)s for the T1c images was 13 (BC median = 52; BC range = 16-87); for the T2f images was 20 (BC median = 27; BC range = 16-43); and for the ADC images was 7 (BC median = 52; BC range = 18-124).
Feature extraction was performed using PyRadiomics software V2.2.0 The feature set included the following: 18 first-order statistical features; 22 Gy level co-occurrence matrix (GLCM) features; 16 Gy level size zone (GLSZM) features; 16 Gy level run length matrix (GLRLM) features; 5 neighboring gray tone difference matrix (NGTDM) features; and 14 Gy level dependence matrix (GLDM) features. Additional features were extracted by pre-processing images, either wavelet or Laplacian of Gaussian (LoG). LoG features were extracted with kernel sizes of 1, 2, 3, 4, and 5 mm. All features were extracted from the segments in 3D. Pixel intensities outside the segments were set to zero prior to image filtration to reduce the effect of contamination. 91 features were derived from unfiltered images, 728 from wavelet filtered images and 455 from LoG filtered images, resulting in 1274 features per modality and a total of 3822 features. A detailed description of the features can be found on the Pyradiomics website (https:// pyrad iomics. readt hedocs. io/ en/ latest/ featu res. html).

Feature selection & classification
All model building steps were performed in python using scikit-learn V0.22.2 [11]. Three feature selection approaches were utilized. Two were filter-based methods: ANOVA F-Test and minimum redundancy maximum relevance (mRMR). Recursive feature elimination (RFE), a wrapper-based approach, was also tested. A linear support vector machine (SVM) classifier (regularization parameter C = 1) was used as the base learner, and at each iteration, the 5 least important features were eliminated from the total set until a predetermined number of features was returned. For each feature selection method, sets of the top 4 features were included. This threshold was chosen based on the commonly employed rule-of-thumb that models should be trained on datasets that have at least 10 times as many training samples per class as the number of features to mitigate overfitting [12]. In order to demonstrate the impact of feature selection on classification performance, a fourth strategy used no feature reduction i.e. models were trained on all available features.
In order to prevent data leakage that can be present when the same samples used for feature selection are again used for model validation, a leave-one-patient-out (LOPO) crossvalidation approach was utilized. This strategy precludes the identification of an "optimal" feature set, as different features can be selected as the training fold is permuted; however, it reduces the optimistic bias that can impact radiomics studies that make use of internal validation schemes. In lieu of identifying an "optimal" feature set, this approach can provide descriptive statistics of the selection frequency of different feature types and allow a degree of stability assessment with respect to feature selection. Features from the training data were scaled to zero mean and unit standard deviation at each LOPO iteration and the learned scaling parameters were applied to the features of the test patient.
Four machine learning classifiers were investigated in this study: support vector machine with a linear kernel (SVM); K-nearest neighbors (K-NN); linear discriminant analysis (LDA); and adaptive boosting using decision stumps as the base learner (AdaBoost). Hyperparameter selection was repeated at each LOPO cross-validation iteration through grid-search with nested fivefold cross-validation to maximize balanced accuracy across the nested cross-validation procedure. For SVM, the range of the regularization parameter C was 10 -4 to 10 5 in multiples of 10; for K-NN, the range for the number of neighbors was from 1 to 11 in steps of 1; and for AdaBoost, the range for the number of trees was from 50 to 450 in steps of 100 (Supplementary Table S2). All other tunable model hyperparameters were left as their default values assigned by scikit-learn (https:// scikit-learn. org/0. 22/). LOPO model performance was quantified by accuracy, sensitivity, specificity, and the area under the receiver operator characteristic curve (ROC AUC).

Classification
A heat map of the performance metrics for each combination of feature selection method and classifier is presented in Fig. 2, and the LOPO ROC curves are shown in Fig. 3. Among the three feature selection methods tested, RFE resulted in the highest ROC AUC for all four classifiers (0.868-0.916). The ANOVA F-test method resulted in the lowest accuracy scores for all classifiers (67.9-70.5). Since feature selection was performed independently at each LOPO iteration to prevent data leakage, the sets of selected features varied. With 4 selected features for each of the 74 LOPO iterations, the maximum number of unique features that could be selected was 296. Supplementary Figure S2 depicts a visual breakdown of the selected features by type across all LOPO iterations for each feature selection method. The majority of features selected were from T1-CE sequences (33-44%), followed by ADC (33-39%). The ANOVA F-test was the most stable method with respect to single patient permutation of the training set as only 9 unique features were selected, two of which were selected every time. In contrast, mRMR and RFE were less stable, with 62 and 46 unique features selected, respectively. Heat maps of the 5 most frequently selected features for each method along with their selection frequencies are shown in Fig. 4. Feature maps for 1 representative patient each from GBM and LGG for three selected features are presented in Supplementary Figure S3, showing significant differences on visual inspection.
In the cases of model development without feature selection, i.e. all 3822 features included, the accuracy of the SVM classifier matched the highest score when feature selection was applied (83.3%) while LDA (87.2%) and AdaBoost (88.5%) achieved their highest accuracies. Additionally, the highest ROC AUC for SVM, LDA, and AdaBoost was obtained when all features were used for training, with AUCs of 0.918, 0.915, and 0.958, respectively. In contrast, the K-NN methodology performed slightly worse in terms of both accuracy and ROC AUC as compared to the highest performance observed with feature selection.

Discussion
Radiomics has the potential to integrate clinical information and outcomes with radiology using computer analytics leading the way towards the development of novel biomarkers in oncology. Magnetic resonance imaging is considered the gold standard imaging modality for the management of central nervous system (CNS) malignancies, and it is particularly useful due to its high soft tissue contrast and the availability of different functional sequences, which can aid in tissue characterization [13]. Radiomic analysis has been widely undertaken in various CNS tumors, including gliomas, aiding in the differentiation between tumor histologies, grading, genetic profiling, and prognostication [14,15]. For patients with GBM, the PTR has been described as the area around contrast enhancement identified as hyperintense region on T2 weighted and T2-FLAIR MRI sequences [16]. The complex microenvironment in the PTR region contains edema, vascular alterations, neoplastic cells, astrocytes, oligodendrocytes, inflammatory cells, microglia, stromal cells, pericytes [17], as well as GBM stem cells [18]. As approximately 90 percent of recurrences are witnessed in the PTR region [19], understanding tumor cell heterogeneity and microinvasion of cells in this region have the potential to impact treatment. Prasanna and colleagues [20] predicted survival outcomes using the top ten performing radiomic features in the PTR on routine MRI sequences. Despite the radiographic similarities on conventional MRI sequences, in contrast to GBM PTR, the architecture of LGGs is quite different.
LGGs infiltrate surrounding brain tissue, but unlike GBM, they lack microvascular endothelial proliferation, necrosis and exhibit a lower Ki-67 proliferative index [21].
In this study, a radiomics approach using conventional MRI sequences distinguished GBM PTR from LGG with reasonable performance. The feature selection was done in this study using three approaches as there is a potential risk of overfitting as a total of 3822 features were extracted with 74 patients in the output group. The model was trained with features extracted from two groups as the PTR of the GBM (excluding the enhancing component of the tumor) and LGG. Although a chance of overfitting of classifiers exists given the relatively smaller sample size (n = 74), feature selection and cross-validation techniques provide confidence in mitigating a major risk of overestimation of performance indices. Achieving good results with AUCs over 0.70 across all the classifiers demonstrates the ability of the radiomics in differentiating the two cohorts. However, with an increase in sample size, the reliability and robustness of the models are expected to increase as well as the stability of the features can be ascertained in the future. Applying well-established classifiers with three feature selection algorithms to each of the sequences, T1-CE, T2-FLAIR, and ADC resulted in relatively high levels of discrimination. In terms of classification accuracy, performance ranged from 67.9% to 88.5% among the various models and feature selection approaches tested. Interestingly, the highest performing model was AdaBoost without using feature selection. The ROC AUC was also highest for SVM and LDA when all features were used for training as compared to models built using only 4 features. These models appear to handle high dimensional data as regularization and/or relative feature importance are implicit in their design. In contrast, the K-NN methodology performed worse when feature selection was not used prior to model training. K-NN strictly computes the distance of samples to its neighbors in the feature space and therefore assigns no relative importance to different features. When the number of features is large, the performance of local approaches such as K-NN is known to deteriorate, which is encompassed in the phenomenon referred to as the "curse of dimensionality." A large number of features and the limited number of samples likely contributed to spurious associations between samples, which resulted in good specificity (86.1%) but poor sensitivity (69.0%). These results indicate that the risk overfitting attributable to feature dimension relative to sample size is both model and data-dependent and can be observed using robust cross-validation. In the LGG cohort, we have included both oligodendroglioma and astrocytoma. Some studies have shown the ability of radiomic analysis to differentiate between histology or molecular phenotype in LGG [22]. We have not undertaken an analysis to differentiate between the two histologies due to the lesser number of patients with individual tumor types. It will be interesting to perform future studies to investigate the ability of radiomics to differentiate between low-grade oligodendroglioma, astrocytoma, and the GBM PTR (arising from astrocytic lineage) individually.
Distinguishing between LGG tumors and the PTR region of GBM has potential clinical applications. Previous work had demonstrated a radiomic model could demarcate brain metastasis peritumoral region (prototype for vasogenic edema) from LGG (prototype for infiltrative tumor) [7]. The model was extrapolated on the individual voxels within PTR of GBM to identify areas of microscopic tumor, which was shown to positively correlate with future areas of recurrence. In the present study, we had included a higher number of patients with GBM using preoperative MRI, a large number of features were generated, multiple feature selection methods, and four machine learning classifiers were used for model development. The current work can be expanded to independently identify microscopic disease in the GBM PTR using the LGG-like signature. Also, the radiomics model can be integrated with the previous work to improve the classification performances and increase the reliability of a combined model. Some studies have shown the potential improvement in clinical outcomes with surgical excision of the PTR region in GBM [23,24], which is currently studied in a Canadian phase 2 randomized trial (NCT04737577). The present study will be particularly important in the preoperative identification of infiltrative microscopic disease since the images in the GBM cohort had preoperative MRI as opposed to post-operative imaging in the previous work. Similarly, identifying areas of infiltrative disease within the GBM PTR region can help in individualized radiation planning. Generally, radiation volumes include large margins empirically to include the entire T2W hyperintense areas without spatial information on the microscopic disease, which can lead to increased toxicity. In a recent work, the use of small 5 mm margins beyond the gross tumor and cavity has shown survival outcomes similar to contemporary clinical trials [25]. The MRI-radiomics model will form an important basis in margin reduction in patients with GBM to be treated using an MRI-linear accelerator device (MR-LINAC) with GBM for a prospective phase 2 study of GBM (NCT04726397). Also, by identifying low grade disease within the PTR, this can allow differential dosing as low grade disease can be effectively treated with doses ranging from 45 to 54 Gy in 1.8-2.0 Gy per fraction, and the high grade components can be treated with 60 Gy in.1.8-2.0 Gy/day. This may reduce risks of radiation necrosis and help generating personalized clinical target volume delineation and dosing.
The work here has some limitations. This was a retrospective study with a limited sample size on one model of an MRI scanner. Future work can include the standardization of imaging features from different MRI scanners to validate the general applicability of such a model. In the LGG group, a proportion of images were obtained in the post-operative setting. Although the surgical tract and cavity were carefully excluded, changes including post-operative edema cannot be completely ruled out. The inclusion of probable areas of post-operative edema in the LGG group can potentially impact the features extracted since the rationale of inclusion of this group was to include tumor cells rather than edema. With the expansion of the study and availability of a higher number of patients, it will be important to limit the LGG cohort to pre-operative imaging to avoid such confounding factors. Further studies are needed to include a larger number of patients, incorporating other MRI sequences, and undertaking external validation approaches to establish a reliable and reproducible radiomics model.

Conclusion
A radiomics model using routine MRI sequences can discriminate between low grade gliomas and the glioblastoma peritumoral region with acceptable performance, despite their visual resemblance. The best results were obtained using an AdaBoost classifier with an AUC of 0.96 to differentiate between the two groups.