Patients
The institutional review board of our hospital approved this retrospective multicenter study and waived the requirement for written informed consent because the imaging data were anonymized. A total of 374 patients who underwent meningioma resection between January 1, 2016 and December 31, 2020 were initially collected from centers A and B. Pathology reports were reviewed to ensure that they met the 2021 WHO classification criteria for central nervous system tumors [1].
Patients meeting the following inclusion criteria were selected: (1) underwent MRI examination before surgery, (2) had a pathologically confirmed diagnosis of meningioma, and (3) received no treatment prior to MRI and surgery. Patients were excluded if they met any of the following criteria: (1) unclear pathological diagnosis or missing grade or Ki-67 index information, (2) severe MRI artifacts or image quality unsuitable for analysis, or (3) recurrence, multiple lesions, or lesions smaller than 1 cm × 1 cm × 1 cm. Clinical features, including age, sex, grade, and Ki-67 index, were recorded for each patient.
Based on the above criteria, a total of 216 patients (183 from center A and 33 from center B) were included in the study. All patients were randomly divided into training and test sets at a ratio of 7:3.
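The random 7:3 split can be reproduced with scikit-learn; the patient identifiers and random seed below are illustrative, not taken from the study.

```python
from sklearn.model_selection import train_test_split

# 216 patient indices (183 from center A + 33 from center B)
patient_ids = list(range(216))

# random 7:3 split; random_state is an arbitrary choice for reproducibility
train_ids, test_ids = train_test_split(patient_ids, test_size=0.3, random_state=42)

print(len(train_ids), len(test_ids))  # 151 151/65 split (test size rounded up)
```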
Image acquisition
In center A, all patients underwent standard MRI examinations on a 3.0-T Philips Achieva scanner. The detailed protocols were as follows: slice thickness, 5 mm; repetition time/echo time, 2000 ms/20 ms for T1-weighted imaging (T1WI) and 3000 ms/80 ms for T2-weighted imaging (T2WI); for contrast-enhanced T1WI, the repetition time was 18 ms and the echo time was 4.6 ms.
In center B, MRI was performed on a 1.5-T Siemens Avanto scanner. The detailed protocols were as follows: slice thickness, 5 mm; repetition time/echo time, 1800 ms/6.8 ms for T1WI and 3500 ms/120 ms for T2WI; for contrast-enhanced T1WI, the repetition time was 2000 ms and the echo time was 2.6 ms.
In addition, all contrast-enhanced MRI scans were acquired after intravenous injection of gadodiamide (0.1 mmol/kg) as the contrast agent. The dynamic contrast-enhanced MRI scan was performed within 250 seconds of contrast injection.
Image preprocessing and tumor segmentation
Raw MRI data in Digital Imaging and Communications in Medicine (DICOM) format were first converted to Neuroimaging Informatics Technology Initiative (NIfTI) format using ITK-SNAP (version 3.8.0, University of Pennsylvania, Philadelphia, USA; http://www.itksnap.org). Image preprocessing was performed before radiomic feature extraction, including bias-field correction with the nonparametric nonuniform intensity normalization (N4ITK) algorithm, intensity normalization at a scale of 100, resampling to 1 × 1 × 1 mm³ resolution, and gray-level normalization to the range of 0 to 255 (10, 11). ITK-SNAP was also used for image segmentation. Blinded to patient information, two radiologists with more than five years of experience in image reading manually delineated regions of interest (ROIs), comprising the largest cross-sectional area of the tumor (2D ROI) and the whole tumor (3D ROI), as close as possible to the tumor edge. A senior neuroradiologist with over ten years of experience in image reading reviewed and selected the ROIs used for radiomic feature extraction.
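The two intensity-normalization steps can be sketched with NumPy as below; this is an illustrative interpretation of "normalization at a scale of 100" (z-scoring followed by scaling), not the authors' exact code. N4 bias correction and 1 mm isotropic resampling would typically be done with SimpleITK and are omitted here.

```python
import numpy as np

def normalize_intensity(img, scale=100.0):
    """Z-score voxel intensities, then multiply by a fixed scale
    (a common reading of 'normalization at a scale of 100')."""
    img = img.astype(float)
    return (img - img.mean()) / img.std() * scale

def rescale_to_0_255(img):
    """Linearly map gray levels into the 0-255 range."""
    img = img.astype(float)
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo) * 255.0
```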
Clinical and radiological features collection and selection
Two radiologists with over five years of experience in image reading systematically analyzed eight radiological features: peritumoral edema, cerebrospinal fluid (CSF) space surrounding the tumor, capsular enhancement, heterogeneous enhancement, intratumoral necrosis, crossing of the falx or tentorium, dural tail, and surrounding invasion. Univariate and multivariate analyses were performed to identify clinical and radiological features significantly associated with the grade and Ki-67 groups. A P value less than 0.05 was considered statistically significant in both analyses. Figure 1 shows two case examples.
Radiomic features extraction
The PyRadiomics package (http://www.radiomics.io/pyradiomics.html) (12), a Python implementation compliant with the Image Biomarker Standardization Initiative (IBSI), was used to extract the radiomic features. These features included shape features, first-order features, and higher-order texture features from five matrices: the gray-level co-occurrence matrix (GLCM), gray-level run length matrix (GLRLM), gray-level size zone matrix (GLSZM), gray-level dependence matrix (GLDM), and neighborhood gray-tone difference matrix (NGTDM). Interobserver agreement was assessed by calculating intraclass correlation coefficients (ICCs) between the two radiologists, and only the radiomic features with high agreement (ICC ≥ 0.8) were retained for modeling. The radiomic features were standardized by removing the mean and scaling to unit variance.
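The ICC screening step can be sketched as follows. The paper does not state which ICC form was used; this sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single rater), a common choice for interobserver agreement of radiomic features.

```python
import numpy as np

def icc2_1(r1, r2):
    """ICC(2,1): two-way random effects, absolute agreement, single rater,
    computed from the standard ANOVA mean squares."""
    Y = np.column_stack([r1, r2]).astype(float)
    n, k = Y.shape
    grand = Y.mean()
    row_means = Y.mean(axis=1)          # one row per subject (patient)
    col_means = Y.mean(axis=0)          # one column per rater
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_err = ((Y - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def stable_features(feats_r1, feats_r2, threshold=0.8):
    """Indices of feature columns whose between-reader ICC >= threshold."""
    return [j for j in range(feats_r1.shape[1])
            if icc2_1(feats_r1[:, j], feats_r2[:, j]) >= threshold]
```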
Radiomic feature selection and development of radiomic models
As mentioned above, this study developed machine learning models for three tasks: predicting the WHO grade, the Ki-67 index, and the combination of grade and Ki-67 index of meningioma. Figure 2 describes the workflow of this study.
Because a large number of radiomic features were extracted, proper feature selection was required to avoid overfitting the machine learning algorithms. The t-test and the minimum redundancy maximum relevance (mRMR) method were applied to the training set to preliminarily select the relatively important features. Based on the total number of patients in the training and test sets, the top 20 features were retained.
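The univariate t-test screening can be sketched as below; the mRMR step is omitted because it typically relies on a dedicated package, and the data here are synthetic.

```python
import numpy as np
from scipy.stats import ttest_ind

def ttest_top_k(X, y, k=20):
    """Rank features by two-sample t-test p-value between the two classes
    and return the column indices of the k smallest p-values."""
    pvals = np.array([ttest_ind(X[y == 0, j], X[y == 1, j]).pvalue
                      for j in range(X.shape[1])])
    return np.argsort(pvals)[:k]
```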
Because of the low proportion of patients with high-grade and high-Ki-67 meningiomas, the synthetic minority oversampling technique (SMOTE) was employed after the preliminary feature selection to balance the classes for model training.
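SMOTE synthesizes minority-class samples by interpolating between a minority sample and one of its minority-class neighbors. In practice one would use `imblearn.over_sampling.SMOTE`; the NumPy sketch below only illustrates the idea, simplified to the single nearest neighbor rather than the usual k = 5.

```python
import numpy as np

def smote_oversample(X, y, minority_label, seed=None):
    """Simplified SMOTE: synthesize minority samples by linear interpolation
    toward the nearest minority neighbor until the classes are balanced."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    n_needed = int((y != minority_label).sum() - len(X_min))
    if n_needed <= 0:
        return X, y
    synth = []
    for _ in range(n_needed):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        d[i] = np.inf                      # exclude the sample itself
        j = d.argmin()                     # nearest minority neighbor
        lam = rng.random()                 # random point on the segment
        synth.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    X_new = np.vstack([X] + synth)
    y_new = np.concatenate([y, np.full(n_needed, minority_label)])
    return X_new, y_new
```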
After that, five feature selection methods were independently applied to select the final features for model building: least absolute shrinkage and selection operator (LASSO), mutual information (MI), recursive feature elimination (RFE), random forest (RF), and analysis of variance (ANOVA). For RFE, MI, and RF, 15 features were selected for model establishment (13). Eight supervised machine learning algorithms were then applied: support vector machine (SVM), logistic regression (LR), naive Bayes (NB), decision tree (DT), random forest (RF), extreme gradient boosting (XGBoost), light gradient boosting machine (LGBM), and k-nearest neighbors (KNN). All machine learning models used their default parameters.
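The eight classifiers with default settings can be assembled as below. XGBoost and LightGBM live in their own packages (`xgboost`, `lightgbm`), so only the six scikit-learn classifiers are instantiated here; the remaining two are indicated in comments.

```python
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

# Default-parameter classifiers. probability=True lets SVC output the
# class probabilities needed for AUC; max_iter is raised only to ensure
# convergence and does not change the model itself.
classifiers = {
    "SVM": SVC(probability=True),
    "LR": LogisticRegression(max_iter=1000),
    "NB": GaussianNB(),
    "DT": DecisionTreeClassifier(),
    "RF": RandomForestClassifier(),
    "KNN": KNeighborsClassifier(),
    # "XGBOOST": xgboost.XGBClassifier(),  # requires the xgboost package
    # "LGBM": lightgbm.LGBMClassifier(),   # requires the lightgbm package
}
```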
Combining three MRI sequences, two kinds of ROIs, five feature selection methods, and eight classifiers, 240 (3 × 2 × 5 × 8 = 240) models were built for each task. The name of each model combined four elements: sequence, ROI dimension, feature selection method, and machine learning algorithm. For example, T1CE-2D-LASSO-SVM denotes a support vector machine classifier trained on features selected by LASSO and extracted from the largest cross-section of the T1CE sequence. Two-times-repeated threefold cross-validation was carried out on the training set; the models' predictive performance and stability were evaluated by the cross-validation area under the curve (AUC) and the relative standard deviation (RSD) of the cross-validation AUCs, respectively. RSD was defined as RSD = (sd AUC / mean AUC) × 100%. A lower RSD indicates more stable model performance in cross-validation, and models with an RSD below 0.1 were considered stable. After comparing the cross-validation AUC and RSD, the best radiomic model for each task was selected.
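The 2-times-3-fold cross-validation and RSD computation can be sketched with scikit-learn. The data and the logistic regression classifier below are synthetic stand-ins, not the study's features or chosen model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# synthetic stand-in for the selected radiomic feature matrix
X, y = make_classification(n_samples=150, n_features=15, random_state=0)

# 2 repeats x 3 folds -> 6 cross-validation AUCs
cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=2, random_state=0)
aucs = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                       cv=cv, scoring="roc_auc")

# RSD = (sd AUC / mean AUC); the paper additionally scales this by 100%
rsd = aucs.std() / aucs.mean()
stable = rsd < 0.1
```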
The selected models were then trained and evaluated on the training and test sets, and their performance was reported as AUC, sensitivity, and specificity.
Development of clinical-radiological-radiomic models
The clinical-radiological-radiomic (CRR) models used the classifier of the best radiomic model and were trained by combining the significant clinical and radiological features with the radiomic features of the best radiomic model. The performance of the CRR models on the training and test sets was evaluated with AUC, sensitivity, and specificity. Additionally, the DeLong test was used to compare the performance of the CRR models with that of the radiomic models in the test set.
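Building a CRR model amounts to concatenating the significant clinical/radiological columns with the selected radiomic features and refitting the chosen classifier. The sketch below uses synthetic data and logistic regression purely for illustration; the DeLong comparison itself is not shown, as it requires a dedicated implementation not included in scipy or scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 100
X_radiomic = rng.normal(size=(n, 15))   # stand-in for selected radiomic features
X_clinical = rng.normal(size=(n, 3))    # stand-in for clinical/radiological features
y = (X_radiomic[:, 0] + X_clinical[:, 0] > 0).astype(int)

# CRR feature matrix: radiomic and clinical columns side by side
X_crr = np.hstack([X_radiomic, X_clinical])
model = LogisticRegression(max_iter=1000).fit(X_crr, y)
auc = roc_auc_score(y, model.predict_proba(X_crr)[:, 1])
```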
Statistical analysis
Image preprocessing, radiomic feature extraction, feature selection, model establishment, and model validation were performed in Python (version 3.9).
Categorical variables were presented as frequencies and percentages, whereas continuous variables were expressed as mean ± standard deviation or median (interquartile range, IQR) depending on whether they followed a normal distribution. The Mann-Whitney U test and the chi-squared test were used to assess differences between the training and test sets for continuous and categorical variables, respectively. In the univariate and multivariate analyses, the chi-squared test and logistic regression were employed to assess associations between classification and the clinical and imaging features; a p-value less than 0.05 was considered statistically significant. Statistical analysis was performed with IBM SPSS (version 24.0; IBM Corp., Armonk, New York).
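The between-set comparisons can equally be carried out in Python with scipy; the sample sizes and contingency counts below are synthetic placeholders.

```python
import numpy as np
from scipy.stats import mannwhitneyu, chi2_contingency

rng = np.random.default_rng(0)
# Mann-Whitney U test for a continuous variable (e.g., age),
# illustrated with synthetic training/test samples
age_train = rng.normal(55, 12, size=151)
age_test = rng.normal(55, 12, size=65)
u_stat, u_p = mannwhitneyu(age_train, age_test)

# Chi-squared test for a categorical variable (e.g., sex):
# rows = training/test set, columns = categories (counts are illustrative)
table = np.array([[80, 71],
                  [33, 32]])
chi2, chi_p, dof, expected = chi2_contingency(table)
```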