2.1 Patient enrolment
We retrospectively collected 353 diffuse glioma patients across three datasets: 216 from our institution (Center1) and 137 from the Cancer Genome Atlas (TCGA). The inclusion criteria were as follows: (1) patients over 18 years at initial diagnosis with pathologically confirmed primary diffuse glioma post-surgery or biopsy; (2) availability of molecular information that support the WHO-CNS5 classification; (3) postoperative survival exceeding 4 weeks, with survival data accessible; (4) preoperative MRI with all sequences necessary for this study; (5) exclusion of cases with image quality compromised by factors such as patient movement or significant artifacts; (6) exclusion of patients who had undergone radiotherapy or chemotherapy prior to the MRI; (7) exclusion of cases where the tumor was located in the brainstem and deemed inoperable. The detailed inclusion and exclusion data are depicted in Supplementary Fig. 1. We divided the patients into a training cohort composed of half of the Center1 and all TCGA cases, and a test cohort consisting of the remaining Center1 cases. Baseline characteristics of all datasets are presented in Table 1.
Table 1
Datasets used in this work within the training cohort and test cohort. Tumor lesions involving multiple brain regions have been counted multiple times. SD = Standard Deviation.
| Training cohort | | Test cohort |
Center1 (N = 108) | TCGA (N = 137) | | Center1 (N = 108) |
Age (year) | | | | |
Mean (SD) | 53.8 (13.8) | 54.4 (14.0) | | 51.2 (13.3) |
Median [Min, Max] | 55.5 [20.0, 84.0] | 55.0 [21.8, 85.7] | | 52.0 [25.0, 82.0] |
Sex | | | | |
Male | 65 (60.2%) | 71 (51.8%) | | 57 (52.8%) |
Female | 43 (39.8%) | 66 (48.2%) | | 51 (47.2%) |
Status | | | | |
Alive | 52 (48.1%) | 51 (37.2%) | | 61 (56.5%) |
Dead | 56 (51.9%) | 86 (62.8%) | | 47 (43.5%) |
Survival time (month) | | | |
Mean (SD) | 25.4 (20.7) | 21.5 (17.5) | | 28.6 (22.0) |
Median [Min, Max] | 19.1 [3.17, 97.8] | 16.9 [2.13, 95.6] | | 23.6 [2.13, 97.8] |
CNS5 class | | | | |
Astrocytoma | 19 (17.6%) | 16 (11.7%) | | 21 (19.4%) |
Oligodendroglioma | 20 (18.5%) | 38 (27.7%) | | 19 (17.6%) |
Glioblastoma | 69 (63.9%) | 83 60.6%) | | 68 (63.0%) |
IDH status | | | | |
Mutation | 39 (36.1%) | 54 (39.4%) | | 40 (37.0%) |
Wild-type | 69 (63.9%) | 83 (60.6%) | | 68 (63.0%) |
1p19q status | | | | |
Co-deletion | 22 (20.4%) | 50 (36.5%) | | 22 (20.4%) |
Non Co-deletion | 56 (51.9%) | 16 (11.7%) | | 61 (56.5%) |
Not available | 30 (27.8%) | 71 (51.8%) | | 25 (23.1%) |
Region | | | | |
Left (vs Right) | 54 (50.0%) | 74 (54.0%) | | 56 (51.9%) |
Midline crossing | 7(6.5%) | 6 (4.4%) | | 9 (8.3%) |
Frontal lobe | 52 (48.1%) | 55 (40.1%) | | 59 (54.6%) |
Temporal lobe | 39 (36.1%) | 64 (46.7%) | | 32 (29.6%) |
Parietal lobe | 33 (30.6%) | 38 (27.7%) | | 23 (21.3%) |
Occipital lobe | 15 (13.9%) | 26 (19.0%) | | 8 (7.4%) |
Insular lobe | 42 (38.9%) | 54 (39.4%) | | 43 (39.8%) |
Patients underwent comprehensive preoperative planning and tumor resection under neuro-navigation, with biannual telephone or outpatient follow-ups post-discharge. Baseline and survival data for TCGA dataset was acquired from the GlioVis website (http://gliovis.bioinfo.cnio.es/ ). All used MRI data used were anonymized, and the study was approved by the ethics committee of XXX (anonymized for review). Given the exponential growth in radiomics research, standardized imaging study methodologies are necessary. Our research complies with the CLEAR checklist[17], detailed in Supplementary Method 1.
2.2 MRI data acquisition and preprocessing
MRI data from the TCGA datasets can be accessed through the TCIA website (http://www.cancerimagingarchive.net/ ), including scans from 1.5T or 3.0T MRI scanners with axial sections. At our institution, preoperative imaging was done using 3.0T-MRI scanners (Siemens Magnetom Verio and Skyra) equipped with eight-channel head coils. Each patient underwent standard axial T1WI, T2WI, and FLAIR sequences, followed by contrast-enhanced, sagittal, thin-slice 3D-T1WI following an intravenous bolus of gadopentetate dimeglumine (Gd-DTPA) at 0.1 mmol/kg and 4 ml/s. We performed DWI acquisition before injecting the contrast agent using a spin-echo echo planar imaging (SE-EPI) sequence applying diffusion gradients in three orthogonal directions to reduce anisotropy effects in the DWI signal. The scanning parameters for the original sequences are detailed in Supplementary Table 1. All sequence files were initially converted from DICOM to NIFTI format using dcm2niix (https://github.com/rordenlab/dcm2niix ), followed by extraction of three-dimensional images at b-values of 0 s/mm² and 1000 s/mm² for the DWI data using the "SimpleITK" library in Python. ADC values for all voxels were calculated using a mono-exponential model [18]: ADC = ln[S(b)/S(0)]/b, where S(b) represents the signal intensity at a given b-value, and S(0) is the signal intensity at b = 0 s/mm², resulting in the generation of the ADC map. The DWI with a b-value of 1000 s/mm² was selected for subsequent processing.
After obtaining six 3D NIFTI files, we performed MRI preprocessing that included: (1) Rigid registration of all sequences with the average brain template atlas of MNI152 using Advanced Normalization Tools (ANTs, https://github.com/ANTsX/ANTs ), thereby adjusting spatial positioning and resampling the sequences to an MNI space (https://nist.mni.mcgill.ca/atlases/ ) with dimensions of 193×229×229 and voxel size of 1mm×1mm×1mm. (2) Each of the remaining five sequences underwent rigid registration to the T1CE sequence to ensure alignment within the image set of the same patient, designating T1CE as the reference for further non-linear registration processes. (3) Abandoning the strategy of individually registering each sequence to the standard brain atlas, we built upon the foundation of the rigid registrations by subjecting the T1CE sequence to further affine registration and the concluding SyN registration to the MNI152 template. During this process, ANTs generated intermediary files containing transformation parameters, which were then applied to the remaining five sequences, thus enhancing computational efficiency while maintaining alignment with the T1CE sequence. (4) All sequences underwent N4 bias field correction [19] to improve image quality. (5) The "White Stripe" R package [20] was utilized for normalization of image signal intensity. (6) The brain was extracted using the corresponding MNI152 brain mask, culminating in the acquisition of fully preprocessed images. A schematic representation of these steps is illustrated in Fig. 1A.
2.4 VOI segmentation and feature extraction
Utilizing Lin et al.'s open-source CKD-TransBTS framework [21], we performed fully automated glioma segmentation. This model synergistically harnesses the advantages of Transformers and CNNs, affording the capacity for precise localization of lesion boundaries in 3D imaging. It outperformed others in the BraTS 2021 challenge, achieved state-of-the-art (SOTA) performance. The model evaluates features on T1WI, T2WI, FLAIR, and T1CE sequences to generate volumetric region of interest (VOI), identifying three areas: enhancing tumor (ET), peritumoral edema (ED), and necrotic core (NCR).
Subsequently, the segmentation outputs were subjected to the review of two neuroradiologists with over a decade of professional experience. They imported the preprocessed sequences and the VOIs into 3DSlicer (https://www.slicer.org ), meticulously verifying the congruence of the ET, ED, and NCR regions with their radiological significance. Utilizing semi-automatic algorithms such as thresholding and flood-filling methods, they refined the segmentation and eliminated erroneously delineated areas. This process ensured the segmentation's precision while maintaining high efficiency. An illustrative example of the delineation results is presented in Fig. 1A.
As a significant number of non-GBM cases in the dataset did not show post-contrast enhancement or an ET area, we redefined these subregions into two clinically relevant areas: tumor core (TC), which includes ET and NCR, and whole tumor (WT), comprising both ED and TC. After this reclassification, we extracted features using the "PyRadiomics" library[22] in Python, employing default settings. The feature set incorporated 18 first-order features, 14 shape and size features, and 75 textural features. This was further expanded by 744 wavelet, 372 LBP, and 186 LoG filtered features, plus an additional 93 features each from exponential, gradient, logarithm, square, and square root filters for each TC and WT region on individual MRI sequences. Consequently, for each case, across all six sequences, we extracted a total of 22,488 features (1874 features for each TC and WT region multiplied by 6 sequences and by 2 regions).
2.5 Preliminary Screening of Radiomics Features
Given the generation of numerous features, feature engineering was critical in this study. Initially, we preprocessed the 1874 features extracted from each VOI subregion—which included both continuous and categorical radiomics data—by discarding features where a single value constituted over 75% of the data. This step reduced the total feature count from 22,488 to 20,818. Next, we applied the intraclass correlation coefficient (ICC) to further exclude features with instability across different radiologist interpretations, retaining 8,243 features at an ICC2k threshold of greater than 0.9. Since the fourth-order radiomics features, which were derived from various filters and parameters based on the first three orders (first-order, shape and size, textural features), resulted in a plethora of approximate features, we reduced redundancy within the TC and WT feature sets across all six sequences to mitigate multicollinearity in our analysis. This was achieved by identifying pairs of highly correlated features (Pearson correlation, r > 0.95, p < 0.05) and eliminating the one with the higher variance inflation factor (VIF) from each pair, culminating in the retention of 3,543 features. Figure 1B illustrates the distributions of the resulting feature set and the ICC filtering process.
2.6 Machine-learning-based prognostic prediction Feature selection and model construction
Within the training cohort, we established a robust feature selection and modeling process for prognosis using machine learning algorithms. Input features were divided into eight groups, each yielding a model and a final radiomics survival biomarker (RadSurv). The groups included: 510 features for T1WI, 584 for T2WI, 686 for T1CE, 591 for FLAIR, 565 for DWI, 606 for ADC, 2,372 for the combined first four sequences (Combo-4), and 3,543 for all six sequences combined (Combo-6). Subjects were randomly split between the training and validation sets in ratios of 3:7, 4:6, 5:5, 6:4, or 7:3 across 500 iterations, with each iteration following a specified set of steps.
First, within the training set, univariate Cox regression was employed for selecting prognostic features (likelihood ratio test, p < 0.05). Next, we calculated risk scores (risk score = Σ feature value × feature's Cox coefficient) to stratify the training set into high- and low-risk groups, using the "survminer" package's "surv-cutpoint" function to find the optimal cut-off point. If a significant difference between the groups was observed in the Kaplan-Meyer analysis (log-rank test, p < 0.05), these features were retained for the validation set; otherwise, the process iterated further. Notably, to enhance the diversity and efficiency of the prognostic feature selection methodology, following initial univariate Cox regression, we incorporated two additional independent processes: a Cox regression model with Least Absolute Shrinkage and Selection Operator (Cox-LASSO) and a Random Survival Forest Variable Hunting (RSFVH) algorithm. We refer to this basic approach, which did not include either of the two algorithms, as Cox-regression-based Feature Selection (Cox-FS). In the validation set, each radiomics features identified from the previous training set step was further evaluated for prognostic value across four tasks: (i) value in GBM, (ii) value in non-GBM, (iii) value in all glioma cases, and (iv) value concurrently in GBM and non-GBM. The evaluation criteria were consistent with those used in the training set. We tracked how often each feature passed these validations across all sampling ratios and iterations and finally generated a count table of prognostic efficacy, one for each feature selection method across eight scenarios. These steps are outlined in Fig. 1C.
We normalized the count tables to mitigate disparities in absolute values before using unsupervised clustering to distill prognostically potent feature clusters from the count data. Approximately two-thirds or more of the features are expected to be culled through this selection mechanism. The clustering results are depicted in Supplementary Fig. 2. The 95% threshold previously set for multicollinearity meant effective clustering without overly dispersing features, although the retained features may still harbor some multicollinearity, potentially affecting subsequent modeling. To mitigate this issue, a further feature selection protocol was introduced to isolate highly efficacious, independent variables. Starting at a threshold of 0.9 and decreasing in 0.05 intervals to a minimum of 0.1, we pruned pairs of redundant features by removing the one with the higher VIF at each decremental threshold, as illustrated in Fig. 1D. Ultimately, in the training cohort, a multifactorial survival analysis model was constructed. This was accomplished by employing stepwise regression to further reduce potential multicollinearity or by bypassing stepwise regression and proceeding directly to modeling. The scoring formula is as follows:
RadSurv = h0(t) × exp (β1X1 + β2X2 + ... + βnXn)
The formula includes X as radiomic feature values, β as the regression coefficients, and h0(t) as the baseline hazard. This concludes the exposition of the selection and final modeling methodologies for eight scenarios of feature groups using three parallel approaches: Cox-FS, Cox-LASSO, and RSFVH. The processes described herein were executed by our custom-built automated script.
2.7 Evaluation
In this study, we thoroughly assessed RadSurv's survival prediction performance across four key aspects. Firstly, models were trained for various scenarios using the methods previously described, and Harrell's concordance index (C-index) was calculated with the "lifelines" library in python to quantitatively evaluate the predictive power for glioma prognosis. The results were represented in Table 2, Fig. 2 and Supplementary Table 2. We focused on comparing three feature selection methods (Cox-FS, Cox-LASSO, RSFVH), the radiomics model's effectiveness across different tasks (all gliomas, GBM, non-GBM), and the performance between and combination of imaging sequences (Combo-4 and Combo-6). Secondly, we explored RadSurv's ability to stratify patients into risk categories, both in GBM and non-GBM subgroups, exemplified with an optimal Combo-6 model. Kaplan-Meyer curves were further plotted with the "survival" and "survminer" R packages to assess and demonstrate the risk stratification capacity at various cut-off values (Supplementary Method 2 described the method of finding the best cut-off values) for glioma (Fig. 3). Thirdly, to ascertain whether the laboriously trained and explored RadSurv was an independent prognostic factor for glioma, univariate and multivariate Cox regression analyses were performed using the "rms" R package, with results presented in Table 3. Lastly, we constructed nomograms and performed calibration curve validations (Fig. 4) using the "regplot" R package. These validations encompassed the all glioma cases and delved into GBM and non-GBM subgroups.
Table 2
The best C-index scores for structural, functional, and combined sequences with different feature selection methods for varied tasks on the training cohort. The three data framed in boxes represent the best results for the ALL, Non-GBM, and GBM groups, respectively. In the first six rows of table, data in bold italic indicate the top three performance scores within the current glioma grouping. In the last two rows, data bolded and underlined signify the highest scores in their grouping. Combo-4 = structural-sequence combinations; Combo-6 = all-sequence combinations.
| ALL | | non-GBM | | GBM |
| CoxFS | CoxLasso | RSFVH | | CoxFS | CoxLasso | RSFVH | | CoxFS | CoxLasso | RSFVH |
T1WI | 0.705 | 0.732 | 0.698 | | 0.835 | 0.893 | 0.826 | | 0.598 | 0.643 | 0.606 |
T2WI | 0.675 | 0.740 | 0.715 | | 0.793 | 0.777 | 0.744 | | 0.630 | 0.680 | 0.660 |
T1CE | 0.776 | 0.768 | 0.799 | | 0.744 | 0.860 | 0.760 | | 0.643 | 0.632 | 0.706 |
FLAIR | 0.738 | 0.726 | 0.719 | | 0.909 | 0.876 | 0.835 | | 0.664 | 0.635 | 0.630 |
DWI | 0.703 | 0.690 | 0.723 | | 0.851 | 0.769 | 0.769 | | 0.631 | 0.602 | 0.615 |
ADC | 0.716 | 0.703 | 0.701 | | 0.868 | 0.901 | 0.917 | | 0.634 | 0.600 | 0.605 |
Combo-4 | 0.772 | 0.764 | 0.772 | | 0.835 | 0.868 | 0.826 | | 0.689 | 0.649 | 0.651 |
Combo-6 | 0.753 | 0.755 | 0.787 | | 0.893 | 0.835 | 0.893 | | 0.633 | 0.644 | 0.683 |
Table 3
Cox regression analysis of prognostic factors in discovery cohort and validation cohort. Note. a Numerical variable, b Categorical variable, OL = Oligodendroglioma, GBM = Glioblastoma, AS = Astrocytoma, WT = wild-type, MT = mutant, Cod = co-deletion, NA = not available.
Variable | Training cohort | | Test cohort |
| Univariate analysis | | Multivariate analysis | | Univariate analysis | | Multivariate analysis |
| HR (95%CI) | p.value | | HR (95%CI) | p.value | | HR (95%CI) | p.value | | HR (95%CI) | p.value |
RadSurva | 1.30 (1.20–1.30) | < 0.001 | | 1.18 (1.12–1.25) | < 0.001 | | 1.05 (1.00-1.10) | 0.0017 | | 1.08 (1.00-1.17) | 0.043 |
Agea | 1.10 (1.10–1.10) | < 0.001 | | 1.03 (1.01–1.05) | 0.007 | | 1.10 (1.00-1.10) | < 0.001 | | 1.05 (1.02–1.08) | 0.002 |
Genderb | 0.94 (0.68–1.30) | 0.72 | | | | | 0.84 (0.47–1.50) | 0.55 | | | |
CNS5 classb | | | | | | | | | | | |
OL vs GBM | 0.14 (0.07–0.27) | < 0.001 | | 0.53 (0.16–1.78) | 0.301 | | 0.06 (0.01–0.44) | 0.0056 | | 0.03 (0.00-0.60) | 0.023 |
AS vs GBM | 0.20 (0.10–0.44) | < 0.001 | | NA | NA | | 0.28 (0.11–0.72) | 0.0086 | | NA | NA |
IDH statusb | | | | | | | | | | | |
WT vs MT | 9.08 (5.80–17.0) | < 0.001 | | 3.72 (1.53–9.06) | 0.004 | | 11.0 (4.50–28.0) | < 0.001 | | 4.72 (1.36–16.4) | 0.015 |
1p/19q statusb | | | | | | | | | | | |
Cod vs non-cod | 0.28 (0.18–0.45) | < 0.001 | | 1.59 (0.78–3.26) | 0.205 | | 0.25 (0.09–0.69) | 0.0078 | | 4.39 (1.15–16.8) | 0.031 |
NA vs non-cod | 3.50 (2.50-5.00) | < 0.001 | | 1.18 (0.76–1.84) | 0.466 | | 3.30 (1.70–6.10) | < 0.001 | | 1.41 (0.72–2.80) | 0.318 |
Locationb | | | | | | | | | | | |
Left vs right | 0.87 (0.62–1.20) | 0.4 | | | | | 0.91 (0.51–1.60) | 0.76 | | | |
Cross midline | 0.97 (0.45–2.10) | 0.94 | | | | | 0.81 (0.29–2.30) | 0.69 | | | |
Frontal lobe | 0.71 (0.50-1.00) | 0.051 | | | | | 0.62 (0.35–1.10) | 0.098 | | | |
Temporal lobe | 1.70 (1.20–2.30) | 0.0019 | | 1.61 (1.13–2.29) | 0.008 | | 2.30 (1.30–4.10) | 0.0055 | | 1.36 (0.62–2.96) | 0.444 |
Parietal lobe | 1.30 (0.90–1.80) | 0.17 | | | | | 2.60 (1.40-5.00) | 0.0027 | | 1.49 (0.63–3.56) | 0.366 |
Occipital lobe | 1.60 (1.10–2.40) | 0.019 | | 1.13 (0.74–1.73) | 0.567 | | 5.60 (2.40–13.0) | < 0.001 | | 1.10 (0.30–3.98) | 0.887 |
Insular | 1.30 (0.92–1.80) | 0.14 | | | | | 2.10 (1.20–3.70) | 0.012 | | 1.36 (0.61–3.03) | 0.453 |
2.9 Statistical Analysis and data availability
The MRI processing and statistical analyses were conducted using Python (v3.8) and R (v4.2.3). Manual rectifications of VOIs were performed with 3DSlicer (v5.2). The visualization of figures and tables was accomplished through R, Microsoft Office Excel, and Adobe Illustrator. The C-index ranges from 0 to 1, where values near 1 indicate better prediction; 0.5 means no predictive ability, and below 0.5 suggests inaccurate predictions. Log-rank test was used to assess significance between two groups in Kaplan-Meyer survival analyses, considering p-values < 0.05 statistically significant. Closer alignment of nomogram calibration points with the diagonal shows greater predictive accuracy. Our code is open-source and available at https://github.com/shadiendo/Radiomic_PrognosicPredict. Additionally, we have provided the survival data from our study and a web-based dynamic nomogram solution using the 'DynNom' R package for easy visualization and utilization. Data from our institution will be provided upon reasonable request to the corresponding author.