DOI: https://doi.org/10.21203/rs.2.21209/v1
Background: This study aimed to preoperatively predict the individual response of nonfunctioning pituitary adenoma (NFPA) to stereotactic radiotherapy using a radiomics approach.
Methods: A total of 93 patients (training set: n = 62; test set: n = 31) who underwent contrast-enhanced T1-weighted MRI (CE-T1) before stereotactic radiotherapy were enrolled. All patients received a second MRI scan 12 to 18 months later to assess sensitivity to radiotherapy. Shrinkage of, or no increase in, tumor volume was regarded as sensitivity to gamma knife radiotherapy. From the CE-T1 images, we extracted a total of 1208 quantitative imaging features. A support vector machine (SVM) combined with recursive feature elimination (RFE) and grid search was used to train a four-feature prediction model, which was verified by receiver operating characteristic (ROC) analysis on an independent test set. In addition, ROC curves for the individual features and a signature bar were constructed for prediction.
Results: The areas under the curve (AUCs) for the three cross-validation folds on the training set were 0.991, 0.843 and 0.889. The CE-T1 image features yielded AUCs of 0.914 and 0.897 in the training and test sets, respectively.
Conclusions: Using a radiomics method, the response of NFPA to stereotactic radiotherapy could be preliminarily predicted before the operation. The model performed well, suggesting that radiomics is a promising way to preoperatively predict sensitivity to radiotherapy in NFPA.
Pituitary adenoma (PA) is a frequently observed intracranial tumor, accounting for 15–20% of intracranial tumors, with an occurrence rate of 80–90 per 100,000 population[1–3]. At present, PA is mainly treated by surgical resection, such as the endoscopic endonasal approach, an invasive method that carries risks of hypopituitarism, CSF leakage, vision loss and other operative complications such as central nervous system infection[4–6]. Moreover, some patients are unable or unwilling to undergo surgery because of advanced age or high operative risk.
Novel radiotherapy techniques (e.g., stereotactic radiotherapy) have been reported to offer improved safety and effectiveness in the treatment of tumors[7–9]. Radiation therapy (RT), especially stereotactic radiotherapy, can achieve good local control in patients with nonfunctioning pituitary adenomas (NFPA) that cannot be fully resected[10, 11]. Nevertheless, radiotherapy inevitably carries side effects, such as nerve damage and increased difficulty of subsequent surgery[12].
Notably, there are currently no effective biomarkers for selecting the NFPA patients most suitable for radiotherapy, which is key to personalized medicine. There remains an unmet clinical need for biomarkers that can detect a patient’s sensitivity to radiotherapy noninvasively and with relative ease. To address this problem, we developed a predictive model to determine the sensitivity of NFPA to radiotherapy using a radiomics approach. Recent advances in imaging analysis enable noninvasive, three-dimensional, quantitative characterization of tumor tissue[13, 14]. This has great potential for guiding therapy because it provides a comprehensive view of the entire tumor, captures intra-tumor heterogeneity, and can be repeated without restriction over the course of the disease[15–17]. In a previous study, we built a radiomics model to discriminate NCAs from other types of nonfunctioning PA preoperatively[18].
Here, we explore the potential of radiomics, an emerging field of research that seeks to take full advantage of medical imaging, by extracting and analyzing a total of 1,208 features from MRI in 93 patients with NFPA before radiotherapy. We hypothesized that the extracted radiomic features could be used to construct a classification model capable of predicting and stratifying patients’ sensitivity to radiotherapy.
Patients and treatment
The institutional review board approved this retrospective analysis of data from MR images, and the requirement for informed consent was waived. Eligibility criteria were (a) treatment with Gamma Knife radiotherapy for NFPA from August 2009 through August 2012; (b) availability of MRI scans obtained 12–18 months after radiotherapy; and (c) an identical sequence protocol for MR scans, including CE-T1. Each patient was treated with the Gamma Knife (Elekta Instrument AB, Sweden) at a dose of 9–12 Gy in a single fraction.
Patients were split into training and test sets at a 2:1 ratio using computer-generated random numbers.
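A 2:1 random split of this kind can be sketched in a few lines of Python. The seed, the function name `split_patients`, and the use of integer patient indices are illustrative assumptions; the source does not describe how the random numbers were generated.

```python
import random

def split_patients(patient_ids, seed=0):
    """Shuffle patient IDs with a seeded generator and split them 2:1.

    The seed value is a hypothetical choice made only for reproducibility.
    """
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    n_test = len(shuffled) // 3          # one third goes to the test set
    return shuffled[n_test:], shuffled[:n_test]   # (training, test)

train_ids, test_ids = split_patients(range(93))
print(len(train_ids), len(test_ids))  # 62 31
```

With 93 patients this yields the 62/31 partition reported in the text.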
MR Image Acquisition and Segmentation
We obtained all MR images using a 3-T MRI system (Siemens) during routine clinical visits. Patients with poor MRI data quality due to motion artifacts or poor contrast injection were excluded. A neurosurgeon manually contoured the regions of interest on the CE-T1 images using ITK-SNAP[19] (www.itksnap.org), because lesions are more easily detected after contrast agent injection than on other scan types. A second neurosurgeon and a radiologist then reviewed the contours to ensure correct segmentation.
Radiomics features extraction
To reveal phenotypic differences between patients who were and were not sensitive to radiotherapy, we loaded the original MR image and the corresponding mask segmented by the neurosurgeon into radiomics feature extraction software implemented with the PyRadiomics package[20] (https://github.com/Radiomics/pyradiomics) on Python 3.6.4 (https://www.python.org).
Following prior work[13, 21, 22], we extracted a large panel of radiomic features quantifying five kinds of phenotypic characteristics on medical imaging: first-order statistics, shape descriptors, and three groups of texture features, namely gray level co-occurrence matrix (GLCM), gray level run length matrix (GLRLM), and gray level size zone matrix (GLSZM) features.
Feature Engineering
As the first step of feature engineering, Z-score standardization was applied to each radiomics feature in the training set and then to the test set using the parameters estimated from the training set. To screen out uninformative features, we assessed the association of each radiomics feature with sensitivity status using the Mann-Whitney U test and discarded features that did not differ significantly between the sensitive and insensitive groups (a P-value below 0.05 was considered statistically significant). To minimize redundancy, we also calculated the Pearson correlation coefficient between each pair of features; for pairs with a coefficient greater than 0.8, we retained only one of the two features. To further reduce the risk of overfitting, we adopted support vector machine (SVM)-based recursive feature elimination (RFE), with the number of selected features tuned automatically by cross-validation. As a result, only a small number of crucial features remained for constructing the classification model.
All of these steps were fitted on the training set only; the test set simply reused the resulting parameters and selected features.
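The four steps can be sketched with scikit-learn and SciPy as follows. This is an illustration on synthetic data, not the authors' code. The linear kernel, the ROC AUC scoring, the use of the absolute correlation, and the greedy rule for choosing which feature of a correlated pair to keep are all assumptions that the text does not specify.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.feature_selection import RFECV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (NOT the study data): 62 training cases, 30 features.
rng = np.random.default_rng(0)
y_train = np.array([0, 1] * 31)
X_train = rng.normal(size=(62, 30))
X_train[:, :3] += 1.5 * y_train[:, None]                    # three informative features
X_train[:, 3] = X_train[:, 0] + 0.01 * rng.normal(size=62)  # near-duplicate of feature 0
X_test = rng.normal(size=(31, 30))

# 1) Z-score standardization: fit on the training set, reuse its parameters on the test set.
scaler = StandardScaler().fit(X_train)
Z_train, Z_test = scaler.transform(X_train), scaler.transform(X_test)

# 2) Univariate filter: keep features whose Mann-Whitney U test gives P < 0.05.
significant = [
    j for j in range(Z_train.shape[1])
    if mannwhitneyu(Z_train[y_train == 1, j], Z_train[y_train == 0, j],
                    alternative="two-sided").pvalue < 0.05
]

# 3) Redundancy filter (assumed greedy rule): drop a feature if its absolute
#    Pearson correlation with an already kept feature exceeds 0.8.
corr = np.abs(np.corrcoef(Z_train[:, significant], rowvar=False))
kept_local = []
for j in range(len(significant)):
    if all(corr[j, k] <= 0.8 for k in kept_local):
        kept_local.append(j)
kept = [significant[j] for j in kept_local]

# 4) SVM-based RFE, with the number of features chosen by 3-fold cross-validation.
selector = RFECV(SVC(kernel="linear"), step=1, cv=3, scoring="roc_auc")
selector.fit(Z_train[:, kept], y_train)
selected = [kept[j] for j in np.flatnonzero(selector.support_)]

print("after U test:", len(significant),
      "after correlation filter:", len(kept),
      "after RFE:", len(selected))
```

Every fitted quantity (scaler parameters, filter decisions, RFE support) comes from the training data alone, which mirrors the statement that the test set only reuses the result.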
Model Construction and Radiomics Score Calculation
The classification model was a support vector machine (SVM) constructed on the training set from the selected radiomics features. The penalty parameter ‘C’ of the SVM was chosen automatically by 3-fold cross-validation. The receiver operating characteristic (ROC) curve was used to assess the performance of the proposed model, and the area under the curve (AUC) was calculated. An AUC of 0.5 indicates no discriminative ability, whereas an AUC of 1.0 indicates perfect discrimination.
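The two reference values of the AUC follow from its interpretation as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The small function below is an illustrative sketch of that definition, not the routine used in the study; the function name and the toy scores are hypothetical.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
print(auc([0.1, 0.2, 0.8, 0.9], labels))  # perfect separation -> 1.0
print(auc([0.5, 0.5, 0.5, 0.5], labels))  # uninformative scores -> 0.5
```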
To illustrate the model’s performance, we calculated a radiomics score for each case as a linear combination of the selected features weighted by their corresponding coefficients.
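A minimal sketch of how such a score relates to an SVM, assuming a linear kernel (the text does not name the kernel): the penalty parameter C is chosen by 3-fold cross-validated grid search, and the decision value of the fitted model is exactly a weighted sum of the features plus an intercept. The synthetic data and the candidate grid for C are hypothetical.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for four standardized features in 62 training cases.
rng = np.random.default_rng(0)
y = np.array([0, 1] * 31)
X = rng.normal(size=(62, 4)) + y[:, None] * np.array([1.0, 1.0, -1.0, -1.0])

# Choose C by 3-fold cross-validated AUC; the grid itself is an assumption.
search = GridSearchCV(SVC(kernel="linear"), {"C": np.logspace(-2, 2, 9)},
                      cv=3, scoring="roc_auc")
search.fit(X, y)
model = search.best_estimator_

# For a linear SVM, the decision value is a weighted sum plus an intercept,
# which is the form a radiomics score of this kind takes.
weights, intercept = model.coef_.ravel(), model.intercept_[0]
rad_scores = X @ weights + intercept

print("best C:", search.best_params_["C"])
print("weights:", np.round(weights, 3), "intercept:", round(float(intercept), 3))
```

The per-feature weights and the intercept read off the fitted model are what a published score formula would report.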
Statistical Analysis
Statistical analysis was conducted with Python (version 3.6.4, https://www.python.org). We used the scikit-learn package (https://scikit-learn.org) to implement the feature engineering procedures and to construct the SVM model. The Matplotlib package (https://matplotlib.org/) was used to plot the figures. A two-sided P-value below 0.05 was considered statistically significant.
Patients
Our study retrospectively included a total of 93 eligible patients. The sex distribution did not differ significantly between the two data sets (P = 0.460, χ² test). The mean age was 43.08 ± 13.58 and 44.19 ± 12.82 years in the two data sets, respectively, which also did not differ significantly (P = 0.352, Mann-Whitney U test).
Detailed distributions are given in Table 1; no statistically significant differences were found between the sensitive and insensitive patient groups.
Radiomics Feature Engineering
Of the 1208 radiomics features extracted from the MR images, which describe five kinds of phenotypic characteristics, 390 were retained after the Mann-Whitney U test. Using the Pearson correlation coefficients between features, we further kept 164 of them. Finally, the RFE method selected only 4 crucial features [Fig. 1].
These 4 features were ‘original_glszm_SmallAreaEmphasis’, ‘wavelet-LLL_ngtdm_Contrast’, ‘wavelet-LHL_firstorder_Mean’ and ‘wavelet-HHH_glszm_ZoneEntropy’. Table 2 explains their meaning.
For each of the 4 features, we constructed a univariate SVM model to analyze its discriminative ability; each feature showed acceptable performance on its own [Fig. 2].
Model Construction and Evaluation
We constructed an SVM model on the training set, based on the 4 selected features, to distinguish patients’ sensitivity to radiotherapy. The best C value (0.183) was determined by 3-fold cross-validation and used in the final model. The AUCs of the 3 folds were 0.991, 0.843 and 0.889, indicating the stability and usability of our model. The ROC curves in Fig. 3 illustrate the strong performance of the model in the training set (AUC, 0.914) and the test set (AUC, 0.897).
Further, we derived a radiomics score for each patient from the SVM model (Fig. 4). The formula of the radiomics score was:
Rad-score = 0.859 * original_glszm_SmallAreaEmphasis
+ 0.831 * wavelet-HHH_glszm_ZoneEntropy
- 0.563 * wavelet-LLL_ngtdm_Contrast
- 0.339 * wavelet-LHL_firstorder_Mean
- 0.362
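As a worked illustration, the formula can be evaluated directly. The coefficients and intercept are copied from the formula above; the assumption that the inputs are the Z-score standardized feature values, the function name `rad_score`, and the example patient are hypothetical.

```python
# Coefficients and intercept copied from the Rad-score formula in the text.
RAD_SCORE_WEIGHTS = {
    "original_glszm_SmallAreaEmphasis": 0.859,
    "wavelet-HHH_glszm_ZoneEntropy": 0.831,
    "wavelet-LLL_ngtdm_Contrast": -0.563,
    "wavelet-LHL_firstorder_Mean": -0.339,
}
RAD_SCORE_INTERCEPT = -0.362

def rad_score(features):
    """Weighted sum of the four features plus the intercept.

    `features` maps feature names to values, assumed here to be standardized.
    """
    weighted = sum(w * features[name] for name, w in RAD_SCORE_WEIGHTS.items())
    return RAD_SCORE_INTERCEPT + weighted

# Hypothetical patient with made-up standardized feature values.
example = {
    "original_glszm_SmallAreaEmphasis": 1.0,
    "wavelet-HHH_glszm_ZoneEntropy": 0.5,
    "wavelet-LLL_ngtdm_Contrast": -1.0,
    "wavelet-LHL_firstorder_Mean": 0.0,
}
print(round(rad_score(example), 4))  # 1.4755
```

A patient whose four standardized features all equal zero (the training-set mean) receives the intercept, -0.362, as the score.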
A boxplot (Fig. 5) depicts the radiomics scores in the training and test sets through their quartiles. The scores in the two data sets showed a statistically significant difference by the Mann-Whitney U test (P-value < 0.001).
NFPA patients are now primarily treated surgically, for example via the endoscopic endonasal approach. Nevertheless, considering the complications of surgery, predicting patients’ sensitivity to radiotherapy beforehand could allow NFPA patients who are likely to respond to undergo radiotherapy instead of surgery. Likewise, identifying patients who are not sensitive to radiotherapy could spare them the added difficulty that gamma knife radiotherapy causes for subsequent surgery. In recent RCT research, an imaging biomarker also showed a strong ability to identify patients who were more likely to gain additional failure-free survival from induction chemotherapy, which could influence clinical decisions in locoregionally advanced nasopharyngeal cancer[23].
Thus, we built a quantitative model based on radiomics features extracted from MRI to predict the sensitivity of NFPA to gamma knife radiotherapy. Radiomics effectively represents the underlying biological information of brain tumors in a quantitative manner and can be applied to diagnosis, prognosis and assessment of therapeutic response [24, 25]. To our knowledge, this study is the first attempt to assess the response to gamma knife radiotherapy in NFPA patients with a radiomics approach.
The feature selection approach used here has been reported as a feasible method[26–28]. We believe these results can help radiomics analyses identify features that are useful for modeling clinical outcomes. Indeed, the good performance achieved with such simple processing suggests that less is more.
Our study has some limitations. The diagnosis of NFPA was based on radiological and hormonal examination because the patients did not undergo surgery. Although NFPA is easily recognized on MRI, there was no pathological information to confirm the diagnosis.
In this study, we chose sagittal CE-T1 images to extract features and build the predictive model. Other routine sequences (e.g., diffusion-weighted imaging and T2WI) could provide additional information and might improve the predictive model. Indeed, the imaging sequences and modalities used for radiomics analyses vary across the published literature[29, 30]. Although additional sequences and modalities can provide more information, they may be inconvenient in clinical practice and take more time to acquire. Consequently, we used only contrast-enhanced T1WI, which is easy to acquire and routinely used in clinical practice to diagnose and evaluate brain tumors. Numerous publications also aim to develop novel imaging methods that improve diagnosis and serve as reliable and powerful tools for prognosis prediction (e.g., positron emission tomography imaging, dynamic contrast-enhanced MRI, and perfusion MRI[31–34]). Although these studies are important, the methods are generally costly and require substantial further development before they can be used clinically.
Furthermore, the dataset came from a single hospital. The limited number of patients also reflects the nature of the treatment. As new patient data become available, they should be added to the data samples of our model. In the future, a multicenter trial is needed to evaluate the performance of our model. The value of the features identified here also remains to be confirmed in subsequent analyses that use MRI to distinguish sensitivity to radiotherapy.
Additionally, our follow-up time was only 12–18 months, so the long-term prognosis remains unclear. Although we asked patients to return for follow-up on schedule, their actual follow-up times differed. Because radiomic features appear to change constantly[35], it is difficult to identify the appropriate time point at which to extract features for feature selection and model building, and models built on these features may be insufficiently powerful to account for variations over time.
The proposed predictive model also has limitations for clinical use, because current radiomics studies remain hard to implement clinically: they involve sophisticated computational steps and repeated human interaction. Lesion segmentation relies on the neurosurgeon’s best judgment and is likely to vary between observers. Unlike CT values, MR image intensities were not calibrated; as a result, identical tissue can appear different on different MRI systems.
In summary, we proposed a quantitative model to predict sensitivity to radiotherapy in patients with NFPA before the operation. Patients could benefit from such a predictive model through a more personalized treatment strategy beyond surgery, which suggests that analyzing medical images with a radiomics approach can assist clinicians.
Ethics approval and consent to participate
The need for approval was waived by the Institutional Review Board of Beijing Tiantan Hospital Affiliated to Capital Medical University.
Consent for publication
Not applicable.
Availability of data and materials
The datasets analysed during the current study are not publicly available due to the privacy policy of Beijing Tiantan Hospital but are available from the corresponding author on reasonable request.
Competing interests
The authors declare that there is no conflict of interest regarding the publication of this article.
Funding
This work was supported by the National Key R&D Program of China (Grant 2017YFC1308700, 2017YFA0205200, 2017YFC1309100), the Ministry of Science and Technology of China (Grant 2015AA020504), National Natural Science Foundation of China (Grant 81771489, 81971776, 91959130, 81771924, 81227901), the Beijing Natural Science Foundation (Grant L182061), the Beijing Municipal Science & Technology Commission (Grant Z171100000117002) and the Youth Innovation Promotion Association CAS (Grant 2017175).
Authors’ contributions
JJ, GS and CL contoured the regions of interest manually on the CE-T1 images. LM extracted the radiomics features and built the final classification model. JJ and LM contributed equally to this work and were the major contributors in writing the manuscript. SS performed all of the gamma knife procedures. YZ and JT are co-corresponding authors and reviewed the final version of the manuscript. All authors read and approved the final manuscript.
Acknowledgements
Not applicable.
Table 1. Statistical analysis of the clinical characteristics of the training and test data sets
Characteristics | Training Set (n = 62): Sensitive | Training Set: Insensitive | Training Set: P-value | Test Set (n = 31): Sensitive | Test Set: Insensitive | Test Set: P-value
--- | --- | --- | --- | --- | --- | ---
Sex, No. (%) | | | 0.562 | | | 0.727
Male | 16 (25.8) | 17 (27.4) | | 7 (22.6) | 12 (38.7) |
Female | 11 (17.7) | 18 (29.1) | | 6 (19.4) | 6 (19.3) |
Age, mean ± SD, years | 42.04 ± 13.02 | 43.11 ± 14.00 | 0.475 | 42.23 ± 10.84 | 45.61 ± 13.91 | 0.200
Distribution of sex and age in the training and test sets. The distribution of sex was assessed by the χ² test, while the difference in the patients’ ages was evaluated by the Mann-Whitney U test.
Table 2. The implications of the features.
Feature | Filter | Group | Description
--- | --- | --- | ---
original_glszm_SmallAreaEmphasis | Original | GLSZM-based features | SmallAreaEmphasis is a measure of the distribution of small size zones, with a greater value indicative of more smaller size zones and more fine textures.
wavelet-LLL_ngtdm_Contrast | Wavelet-LLL | NGTDM-based features | Contrast is a measure of the spatial intensity change. Contrast is high when both the dynamic range and the spatial change rate are high, i.e. an image with a large range of gray levels, with large changes between voxels and their neighborhood.
wavelet-LHL_firstorder_Mean | Wavelet-LHL | First order statistics | The average gray level intensity within the ROI. This tends to emphasize the regions with high intensity levels.
wavelet-HHH_glszm_ZoneEntropy | Wavelet-HHH | GLSZM-based features | ZoneEntropy measures the uncertainty/randomness in the distribution of zone sizes and gray levels. A higher value indicates more heterogeneity in the texture patterns.
Each feature name is split by underscores ‘_’: the first part is the image filter applied, the second part is the class to which the feature belongs, and the third part refers to the feature’s formula. The wavelet filters decompose the original image along each of the 3 directions (x, y, z) with either a low-pass (L) or a high-pass (H) function, yielding 8 filtered images (wavelet-LLL, -LLH, -LHL, -LHH, -HLL, -HLH, -HHL, -HHH).
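The naming convention can be illustrated with a short sketch that splits a feature name into its three parts and enumerates the eight wavelet filter names. The helper name `parse_feature_name` and the dictionary keys are hypothetical.

```python
from itertools import product

def parse_feature_name(name):
    """Split 'filter_class_feature' into its three underscore-separated parts."""
    image_filter, feature_class, feature = name.split("_", 2)
    return {"filter": image_filter, "class": feature_class, "feature": feature}

# One low-pass (L) or high-pass (H) choice per direction gives 2**3 = 8 filters.
wavelet_filters = ["wavelet-" + "".join(bands) for bands in product("LH", repeat=3)]

print(parse_feature_name("wavelet-LLL_ngtdm_Contrast"))
print(wavelet_filters)
```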