Building Reliable Radiomic Models Using Image Perturbation

doi:10.21203/rs.3.rs-1195202/v1

Research Article

This work is licensed under a CC BY 4.0 License

Radiomic model reliability is a prerequisite for clinical translation. At present, it is assessed using test-retest or external data, which are often scarce in practice. We therefore aimed to develop a novel image perturbation-based method (IPBM), the first of its kind, for building reliable radiomic models. We first developed a radiomic prognostic model for head-and-neck cancer patients on a training cohort (70%) and evaluated it on a testing cohort (30%) using the C-index. Subsequently, we applied the IPBM to the CT images of both cohorts to generate 60 additional samples for each (the Perturbed-Train and Perturbed-Test cohorts). Model reliability was assessed using the intra-class correlation coefficient (ICC), which quantified the consistency of the model’s predictions across the 60 samples in the Perturbed-Train and Perturbed-Test cohorts. In addition, we re-trained the radiomic model using only reliable radiomic features (ICC > 0.75) to validate the IPBM. The results showed moderate model reliability in the Perturbed-Train (ICC: 0.565, 95% CI: 0.518-0.615) and Perturbed-Test (ICC: 0.596, 95% CI: 0.527-0.670) cohorts. The re-trained model showed enhanced reliability in the Perturbed-Train (ICC: 0.782, 95% CI: 0.759-0.815) and Perturbed-Test (ICC: 0.825, 95% CI: 0.782-0.867) cohorts, indicating the validity of the IPBM. In conclusion, we demonstrated the capability of the IPBM for building reliable radiomic models, providing the community with a novel strategy for assessing model reliability prior to prospective evaluation.

Nuclear Medicine & Medical Imaging

Cancer Biology

Model reliability

Model robustness

Radiomics

Perturbation

Radiomics is a flourishing field in which machine learning is used to associate cancer imaging phenotypes with cancer genotypes or clinical outcomes for precision medicine ^1–3. Radiomics strives to characterize the differences in tumor phenotypes based on non-invasive medical images, such as computed tomography (CT), magnetic resonance imaging, and positron emission tomography. Furthermore, radiomics can be used to capture the heterogeneity of a tumor ⁴, associate heterogeneity with tumor characteristics for diagnosis ⁵ and treatment prognostication ⁶, and improve the overall decision-making during treatment ⁷.

Despite the potential of radiomics, the unknown reliability of reported radiomic features and signatures against the variability of image acquisition, reconstruction, and segmentation is one of the major challenges in translating radiomic models from bench to bedside ^8,9. Lafata et al. ¹⁰ reported the variability of a classification model for non-small-cell lung cancer histology with respect to free-breathing 3D-CT and phases of 4D-CT imaging. Beyond radiomic models, the variability of deep-learning models caused by variations in the analyzed images should also be considered. Blazis et al. ¹¹ reported the impact of CT reconstruction parameters on the performance of a lung nodule computer-aided diagnosis (CAD) system based on deep learning. They found that the performance of the CAD system improved as the iterative reconstruction level or the image quality increased. Both publications suggest that the impact of imaging variations on the reliability of radiomic models needs to be better understood.

To our knowledge, no study has compared the reliability of radiomic models with that of radiomic features against imaging variations. Multiple scans of the same patients obtained within a short interval are necessary to conduct a model reliability study, where the predicted outcomes from different scan sets could reflect the model variability. As obtaining such datasets is resource-intensive and increases the burden on the patient, they are only obtained for research purposes. To obtain multiple datasets, Zwanenburg et al. ¹² proposed perturbing the images and contours to simulate the acquisition of multiple image sets. They validated this method by comparing the feature robustness with that in two test-retest datasets.

Following this idea, we propose a reliability assessment method for radiomic models based on perturbations. In addition to traditional radiomic modeling methods, we simulated multiple internal validation datasets by adding plausible perturbations to the original images and segmentations. The perturbed data were then used to validate the reliability of the radiomic model against randomization. Reliability was quantified by the intraclass correlation coefficient (ICC), which describes the consistency of the model’s predictions for the same patient across all perturbations.

First, the optimal features and their characteristics for model building are reported. Second, the model’s performance on the original and perturbed datasets is evaluated. Third, the reliability of the radiomic model is computed.

The first step was to identify the features relevant to the outcome and remove redundant features. After filtering, 17 of 5486 features were selected. Then, backward recursive feature elimination based on a penalized Cox proportional hazard model was used to find the optimal feature set for model building. Figure 1 shows the changes in the training and validation C-indexes of a 10-times-repeated, three-fold cross-validation of the training dataset with respect to the number of features in the recursive feature elimination process. The feature set with the highest validation C-index, which contained six features, was identified as the optimal feature set and used for model building. The characteristics of these six selected features are tabulated in Table 1.

Table 1 The characteristics of selected features for model building. The univariate C-index, p-value, and ICC were tabulated. Feature names indicate the feature, the bin count (if applicable), and the image used to compute it.

Feature	C-index	p-value	ICC
log-sigma-6-0-mm-3D_gldm_LargeDependenceLowGrayLevelEmphasis_64_binCount	0.619	0.045	0.747
wavelet-HHL_glrlm_LongRunLowGrayLevelEmphasis_128_binCount	0.587	0.169	0.454
original_glszm_LargeAreaLowGrayLevelEmphasis_128_binCount	0.614	0.066	0.610
wavelet-LLL_glrlm_RunEntropy_128_binCount	0.608	0.064	0.900
wavelet-LHL_glszm_LowGrayLevelZoneEmphasis_64_binCount	0.572	0.091	0.491
wavelet-HLL_glszm_SmallAreaHighGrayLevelEmphasis_128_binCount	0.604	0.085	0.542

After identifying the six optimal features, the radiomic survival model was constructed and validated. The C-indexes of the survival radiomic model in the training and testing cohorts were 0.742 and 0.769, respectively. The averaged model performance C-indexes (standard deviation) over the perturbed training and testing cohorts were 0.686 (0.038) and 0.678 (0.065), respectively.

The model performance on the original and perturbed cohorts is visualized in Figure 2, which shows that the original training and testing C-indexes probably overestimate the model’s performance compared with the perturbed cohort evaluation. Furthermore, the model performance varies substantially across the perturbed cohorts, with C-indexes ranging from 0.609 to 0.758 in training and from 0.514 to 0.794 in testing.

After evaluating the model’s performance, model reliability was quantified using the ICC with a 95% confidence interval. The model reliability ICC was 0.565 (0.518–0.615) on the perturbed training set and 0.596 (0.527–0.670) on the perturbed testing set. According to convention ¹³, this model’s reliability is moderate (0.5 < ICC < 0.75), which is consistent with the substantial variations in model performance on the perturbed datasets shown in Figure 2.

An additional experiment was performed to validate the sensitivity of the reliability ICC, using the highly reliable features (ICC > 0.75) to repeat the radiomic modeling process. After prescreening the reliable features, 67% (3667 / 5486) of features were retained; these were reduced to four optimal features for model building after feature selection. The new model performance C-indexes for the original training and testing cohorts were 0.711 and 0.641, respectively, while the averaged perturbed training and testing C-indexes (standard deviation) were 0.640 (0.029) and 0.625 (0.042). The model reliability ICC values, with a 95% confidence interval, were 0.782 (0.749–0.815) and 0.825 (0.782–0.867) for the perturbed training and testing sets, respectively.

An additional experiment, starting with highly reliable features, led to a significant increase in the model reliability ICC values from moderate to good. This result demonstrated the sensitivity of our method to input reliability.

This study proposed a radiomic model reliability evaluation method using data perturbations. We demonstrated this method using a publicly available dataset and by building radiomic models to predict distant metastasis-free survival. To our knowledge, this is the first study to describe a method to assess the reliability of radiomics models based on image perturbation. Our method evaluates model reliability against randomization in a radiomic workflow using the perturbation method. This study may provide a new perspective on model assessment for the radiomic community. Our results showed that model performance can be overestimated, despite the decent model predictability achieved using an independent testing set. Moreover, simulated perturbation data can serve as an internal validation method for a model reliability assessment.

This study is also the first to assess radiomic model reliability. Currently, there is no radiomic model reliability assessment method, despite consensus on the importance of building reliable radiomic models within the community ¹⁴. This paradox may be due to several reasons. First, the reliability of a model covers a wide range of aspects, as radiomics is a multi-step process and uncertainties may be introduced in each step ^8,15. Therefore, it is challenging to characterize the stability of radiomic models. Second, limited medical resources, such as re-scanned images, prevent the internal validation of model reliability. If multiple scanned image sets obtained over a short time interval and inter-observer delineations of different scans were available, the model could be validated internally to account for random variations in parameters such as patient positioning and inter-observer delineation. Third, it is challenging to characterize a model’s reliability against controllable factors, such as different scanners and acquisition parameters, because such medical resources are inaccessible. These factors have been shown to affect radiomic feature reproducibility and, potentially, model reliability. To tackle some of these challenges, our study used the perturbation method to simulate perturbed datasets, thereby accounting for randomized factors in the radiomic workflow. For example, rotation and translation mimic variations in the patient’s positioning during the scans and resampling uncertainties, noise addition mimics fluctuations in the voxel values caused by statistical uncertainties, and contour randomization mimics inter-observer uncertainty in region-of-interest delineation. These simulated datasets play a crucial role in assessing radiomic model reliability.

This study also evaluated the robustness of the model against randomness. The majority of reliability studies in radiomics publications have focused on the reproducibility and robustness of controllable factors, such as the scanner brand ¹⁶, image acquisition parameters ¹⁷, reconstruction kernels ¹⁸, and preprocessing parameters ¹⁹. However, the effects of these controllable factors can be minimized with sufficiently transparent reporting. In contrast, random and natural variations persist in every radiomic study and are difficult to address by harmonization or standardization. Therefore, understanding the impact of randomness on radiomic features and models is crucial for establishing clinical radiomic applications.

Our results revealed the vulnerability of our radiomic model to randomness. In our results, the model performance evaluation using perturbed data showed lower training and testing C-indexes for the survival model and considerable variability in its distribution under perturbations. The lower training C-index for the perturbed data reveals that evaluating models using their original data results in overfitting to noise in the original data and over-estimation of the model’s learning. If a model is unable to achieve a similar performance using the same data with plausible randomization, it is unlikely that it could be translated to the clinic. Careful assessment of radiomic models’ reliability is therefore essential.

A potential solution to this issue is to evaluate the reliability of features under randomization and integrate this information into radiomic modeling. Despite plenty of discussion and studies of radiomic feature robustness and reliability under various circumstances, only two methods have been implemented in a few clinical studies. The first method uses a test-retest dataset and evaluates radiomic feature reliability using two consecutive scans in a short interval, followed by incorporating this reliability into the dataset. This method may reflect realistic feature reliability under test and retest settings. However, the acquisition of test-retest imaging is rarely conducted outside of a research context, and most medical imaging datasets therefore lack complementary test-retest image data. Although some studies have adopted the test-retest RIDER Lung dataset ²⁰ to assess feature robustness in an attempt to build reliable models, the generalizability of feature robustness from the RIDER Lung data to the dataset being studied has been criticized ²¹. The second method assesses feature robustness using inter-observer variability on the contours. The region of interest on the images is delineated multiple times by independent oncologists, and feature robustness is evaluated from the inter-observer consistency of feature values. This method is more practically accessible than test-retest images to assess feature robustness. However, this method also has limitations in terms of the insufficient identification of non-robust features and high medical personnel costs. The shortcomings of these two methods for assessing feature robustness limit their effectiveness for removing non-robust radiomic features during radiomic modeling, potentially resulting in radiomic models that are vulnerable to randomization. Therefore, simulated randomization of a dataset via the perturbation method may enable estimation of the impact of randomness on radiomic modeling.

Multiple perturbed datasets can be generated with perturbations, and their feature values can be determined. Feature robustness can be quantified using the ICC for each feature by considering its variability within a single subject and across the dataset. Then, removing the less reliable features can improve the reliability of radiomic models against randomizations. In contrast to test-retest and inter-observer variability, simulation methods may be more versatile for evaluating feature robustness with no additional clinical resource costs and could enable data-specific feature robustness evaluations. Moreover, perturbations can provide additional validation data to evaluate model reliability and safeguard it against randomization.

In addition to these contributions, some aspects of our approach could be explored to enhance the impact of this study. First, image and contour perturbation via simulation is a new method in radiomics, so comparisons between this and established methods (e.g., test-retest and inter-observer variability) could be studied further to identify their respective advantages and disadvantages. Second, our validation results showed a decline in the model’s predictive performance on the testing data when poorly and moderately reliable features were removed. A future study could investigate how to balance the model’s predictive performance with its reliability.

This study proposed a radiomic model reliability assessment method using perturbations. This method identifies unreliable models by comparing the model’s performance on the training dataset with the performance achieved on random perturbations of the training dataset. Using this approach could help the radiomics community to build more reliable models for future clinical applications.

Overview

The overview of the workflow used to demonstrate our model reliability assessment method is illustrated in Figure 3. First, we collected pre-treatment CT images and clinical outcomes from a publicly available head-and-neck cancer (HNC) dataset and randomly split the data into training (70%) and testing cohorts (30%), with similar outcome ratios between the two cohorts. Second, a radiomic survival model was built to assess distant metastasis-free survival. Third, internal validation datasets (Perturbed-Train and Perturbed-Test) with perturbations were simulated ¹². The simulated perturbation datasets were used to extract perturbed radiomic features and validate the survival model’s reliability against randomizations, as shown in Figure 3(b). Finally, the ICC was used to quantify the model’s reliability, reflecting its prediction consistency when using the perturbed data. The experiment was approved by the Department of Health Technology and Informatics, The Hong Kong Polytechnic University. The reporting of the radiomic survival model follows the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement ²².

Materials

The dataset, Head-Neck-PET-CT ¹⁴, was obtained from The Cancer Imaging Archive ²³. This dataset consists of 298 patients with head-and-neck squamous cell carcinoma (HNSCC) with a median follow-up of 43 months. The patients were treated at four different centers and received only radiation (n = 48, 16%) or chemo-radiation (n = 250, 84%) with curative intent. The patients’ characteristics and image reconstruction parameters are summarized in Supplementary Tables 1 and 2. Because this was a retrospective study of a publicly available dataset, informed consent was waived.

The region of interest for feature extraction was the primary gross tumor volume (GTV), which was the primary treatment target of radiation therapy. The GTV is the most reliable region for predictive feature extraction ²⁴ and has been used in several predictive radiomics studies of HNSCC ^1,25,26.

Distant metastasis-free survival, defined as the interval from the first day of treatment to the date of the event, was the clinical endpoint in this study to demonstrate the reliability assessment of the radiomic model ²⁷. Previous studies of binary classification models of HNC ^25,28 have achieved good prediction results but were limited because the time-to-event was neglected during model development.

Image Preprocessing and Radiomic Feature Extraction

The CT images and their GTV contours were preprocessed before feature extraction to maintain the features’ reproducibility and consistency ^29,30. First, the GTV contours were interpolated to a voxel-based segmentation mask. Second, an isotropic resampler (1 mm × 1 mm × 1 mm) was applied to the images and masks, with B-spline interpolation on the image and nearest-neighbor interpolation on the mask to enhance the reproducibility of the radiomic features ³¹. The preprocessing steps were implemented in Python v3.8 using the SimpleITK v1.2.4 ³² and OpenCV ³³ packages.

The radiomic features were then extracted using the Pyradiomics v2.2.0 ³⁴ package, which is Image Biomarker Standardization Initiative-compliant ^35,36. A total of 5,486 radiomic features were extracted from the GTV of each patient’s CT scan. Twelve images were included in the feature extraction, including one unfiltered image, three Laplacian-of-Gaussian filtered images (with sigma values of 1 mm, 3 mm, and 6 mm), and eight Coiflet1 wavelet filtered images (LLL, HLL, LHL, LLH, LHH, HLH, HHL, HHH). In addition to the 14 shape features from GTV segmentation, 18 first-order and 73 second-order features were extracted from the region of interest of each filtered image. A re-segmentation of the soft-tissue range (−150 to 180) ¹² and discretization, with fixed bin counts of 4, 8, 16, 32, 64, and 128, were specified for the texture feature extraction. The detailed feature extraction parameters can be found in Supplementary 3.
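
The reported total of 5,486 features can be checked against the extraction settings above. The short sketch below reproduces the arithmetic, assuming (this is our reading, not stated explicitly in the text) that the shape features are computed once, the first-order features once per image, and the second-order features once per image and bin count.

```python
# Feature-count check for the extraction settings described in the text.
N_SHAPE = 14                            # shape features from the GTV mask
N_FIRST_ORDER = 18                      # first-order features per image
N_SECOND_ORDER = 73                     # second-order features per image and bin count
BIN_COUNTS = (4, 8, 16, 32, 64, 128)    # fixed bin counts used for discretization
N_IMAGES = 1 + 3 + 8                    # unfiltered + LoG (3 sigmas) + wavelet (8 sub-bands)

# Assumption: only the second-order features are repeated for every bin count.
per_image = N_FIRST_ORDER + N_SECOND_ORDER * len(BIN_COUNTS)
total = N_SHAPE + N_IMAGES * per_image

print(per_image, total)  # 456 5486
```

Under this assumption the count matches the 5,486 features reported.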

Radiomic Modeling

Patients were randomly assigned to the training and testing cohorts (70/30 split) with stratification by distant metastasis status ^6,37. The data in the training cohort were used for feature selection and subsequent model training, while the data in the testing cohort were used to evaluate the model’s performance.
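
A stratified 70/30 split of this kind can be sketched in a few lines of standard-library Python. The function name `stratified_split`, the seed, and the toy cohort are hypothetical and only illustrate the idea; they are not the code used in this study.

```python
import random
from collections import defaultdict


def stratified_split(event_by_patient, train_fraction=0.7, seed=0):
    """Split patient IDs into train/test lists while preserving the event ratio."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for patient_id, event in event_by_patient.items():
        groups[event].append(patient_id)

    train, test = [], []
    for event in sorted(groups):
        ids = sorted(groups[event])
        rng.shuffle(ids)                            # random assignment within each stratum
        n_train = round(train_fraction * len(ids))
        train.extend(ids[:n_train])
        test.extend(ids[n_train:])
    return train, test


# Toy cohort (hypothetical): 100 patients, 20 of whom develop distant metastasis.
events = {f"P{i:03d}": int(i < 20) for i in range(100)}
train_ids, test_ids = stratified_split(events)
```

Splitting each outcome group separately keeps the proportion of events the same in both cohorts, which matters when events are rare.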

Feature Selection

A filter-based feature selection method was adopted in our analysis ³⁸. This process has two steps: feature–outcome relevance filtering and feature–feature redundancy filtering. Identifying the most relevant and less redundant features is a common practice in radiomics studies, regardless of the evaluation metric ³⁹.

Relevance filtering. Relevance filtering aims to identify the radiomic features that are correlated with the outcomes ²⁵. First, the outcome relevance of each feature was repeatedly evaluated by log-rank test p-values under downsample bootstrapping (imbalanced-learn 0.8.0 ⁴⁰) without replacement over 100 iterations on the training dataset. Downsampling can be used to capture useful information in an imbalanced dataset ⁴¹. Second, features with p-values less than 0.1 were selected in each iteration and ranked by their frequencies, with the top 10% of features with the highest frequencies selected.
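
The relevance-filtering loop can be sketched as follows. This is a minimal illustration, not the study code: `p_value_fn` is a hypothetical callable standing in for the log-rank test, the downsampling is written by hand rather than with imbalanced-learn, and we assume that "top 10%" refers to 10% of all candidate features.

```python
import random
from collections import Counter


def downsample(event_by_patient, rng):
    """Undersample the majority class, without replacement, to the minority size."""
    pos = [p for p, e in event_by_patient.items() if e == 1]
    neg = [p for p, e in event_by_patient.items() if e == 0]
    minority, majority = sorted((pos, neg), key=len)
    return minority + rng.sample(majority, len(minority))


def rank_by_selection_frequency(features, event_by_patient, p_value_fn,
                                n_iter=100, alpha=0.1, top_fraction=0.1, seed=0):
    """Count how often each feature has p < alpha across downsampled subsets."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_iter):
        subset = downsample(event_by_patient, rng)
        for name in features:
            # p_value_fn is a placeholder for the log-rank test on this subset.
            if p_value_fn(name, subset) < alpha:
                counts[name] += 1
    n_keep = max(1, round(top_fraction * len(features)))
    ranked = sorted(features, key=lambda f: (-counts[f], f))
    return ranked[:n_keep], counts


# Toy example (hypothetical): two features are always significant, the rest never.
toy_features = [f"f{i}" for i in range(20)]
toy_events = {f"P{i}": int(i < 5) for i in range(30)}


def toy_p_value(name, subset):
    return 0.01 if name in {"f3", "f7"} else 0.5


selected, frequencies = rank_by_selection_frequency(toy_features, toy_events, toy_p_value)
```

In a real analysis the placeholder would be replaced by a log-rank test comparing survival between patient groups defined by the feature value.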

Redundancy filtering. Redundancy filtering aims to remove features correlated with each other ⁴². First, the feature pairs with Pearson correlation coefficients higher than 0.6 were identified. Then, within each such pair, the feature with the higher mean correlation coefficient with the rest of the features was removed. The removal of these redundant features should improve the predictive ability of the classifiers ⁴³.
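
One way to read the redundancy filter is sketched below. It is an interpretation under stated assumptions rather than the authors’ implementation: correlations are taken in absolute value, the most strongly correlated pair is handled first, and from each pair the feature with the higher mean correlation with the other retained features is dropped. The function names and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def remove_redundant(table, threshold=0.6):
    """table maps feature name -> list of values per patient; returns kept names."""
    names = list(table)
    corr = {}
    for a, b in combinations(names, 2):
        corr[a, b] = corr[b, a] = abs(pearson(table[a], table[b]))  # assumption: |r|
    kept = list(names)

    def mean_corr(feature):
        return mean(corr[feature, other] for other in kept if other != feature)

    while len(kept) > 1:
        pairs = [(corr[a, b], a, b) for a, b in combinations(kept, 2)
                 if corr[a, b] > threshold]
        if not pairs:
            break
        _, a, b = max(pairs)                       # most correlated remaining pair
        kept.remove(a if mean_corr(a) >= mean_corr(b) else b)
    return kept


# Toy data (hypothetical): f1 and f2 are nearly collinear, f3 is unrelated.
toy_table = {
    "f1": [1, 2, 3, 4, 5, 6],
    "f2": [2, 4, 6, 8, 10, 13],
    "f3": [5, 1, 4, 2, 6, 3],
}
kept_features = remove_redundant(toy_table)
```

In the toy data, `f1` is dropped because it is slightly more correlated with `f3` than `f2` is, so the filter keeps one representative of the collinear pair plus the unrelated feature.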

Model Building

To build the survival model, the optimal features were identified using backward recursive feature elimination based on the penalized Cox proportional hazard model ⁴⁴. The feature set that maximized the validation concordance index (C-index) under repeated three-fold cross-validation in the training set was selected. After identifying the optimal features, a penalized Cox proportional hazard survival model was built for distant metastasis-free survival. The hyperparameter of the model was tuned with five-fold cross-validation to maximize the C-index of the survival model. Finally, the model’s performance was evaluated on the training and testing cohorts.
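
For readers unfamiliar with the C-index, the sketch below computes Harrell’s concordance index for right-censored data in plain Python. It is a simplified illustration (pairs with tied event times are skipped), not the implementation used in this study, and the toy values are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when patient i has an observed event and a
    shorter follow-up time than patient j. Tied times are ignored here.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue                      # censored patients cannot anchor a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0     # higher risk failed earlier
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5     # tied risk counts as half
    return concordant / comparable


# Toy example (hypothetical): four patients, two observed events.
toy_times = [5, 10, 15, 20]
toy_events = [1, 0, 1, 0]
c_perfect = concordance_index(toy_times, toy_events, [0.9, 0.4, 0.6, 0.1])
c_partial = concordance_index(toy_times, toy_events, [0.9, 0.4, 0.1, 0.6])
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and the toy `c_partial` case has one of four comparable pairs misordered.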

Reliability Assessment

This section describes the method to evaluate the model reliability using perturbations and the workflow shown in Figure 3(b). First, the internal validation datasets were simulated with the perturbations by adding plausible randomizations to the original images and segmentations. Second, the survival model was evaluated using both the perturbed training and testing data. Third, the model reliability against simulated randomization was quantified using the reliability index ICC.

Validation Data Simulation

The internal validation data sets were simulated using the perturbation method ^12,45. For each perturbation, both the image and mask were translated and rotated simultaneously by a random amount. This simulation aimed to mimic variations in the patient’s position during imaging. Then, a random Gaussian noise field was added to the image to mimic the noise level variations between different image acquisitions ⁴⁶. The detailed perturbation parameters are presented in Supplementary Table 4. Next, the GTV mask was also perturbed by a randomly generated deformable vector field, which aimed to simulate uncertainties in inter-observer delineations on the same target ⁴⁷. In total, 60 sets of perturbed images and contours were simulated, with the corresponding radiomic features extracted as the internal validation sets to evaluate the model reliability under randomization.
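
The sampling side of this procedure can be illustrated with a short sketch. All numeric ranges below are hypothetical placeholders (the actual values are given in Supplementary Table 4, which is not reproduced here), and the image is a tiny nested list rather than a CT volume; applying the rotation, translation, and contour deformation to real images would require an imaging library.

```python
import random


def sample_perturbation(rng, max_shift_mm=1.0, max_angle_deg=10.0, noise_sd=5.0):
    """Draw one random parameter set; all default ranges are hypothetical."""
    return {
        "shift_mm": tuple(rng.uniform(-max_shift_mm, max_shift_mm) for _ in range(3)),
        "angle_deg": rng.uniform(-max_angle_deg, max_angle_deg),
        "noise_sd": noise_sd,
        "contour_seed": rng.randrange(2 ** 31),   # seed for the random deformation field
    }


def add_gaussian_noise(image, sd, rng):
    """Add zero-mean Gaussian noise to every pixel of a 2D image (list of rows)."""
    return [[value + rng.gauss(0.0, sd) for value in row] for row in image]


rng = random.Random(42)
perturbations = [sample_perturbation(rng) for _ in range(60)]   # 60 perturbed sets

# Toy 3 x 3 "image" (hypothetical intensities) perturbed with the first noise level.
toy_image = [[0.0, 10.0, 20.0], [30.0, 40.0, 50.0], [60.0, 70.0, 80.0]]
noisy_image = add_gaussian_noise(toy_image, perturbations[0]["noise_sd"], rng)
```

Drawing each parameter set independently means that every perturbed dataset represents a different plausible combination of positioning, noise, and contouring variation.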

Model Validation

The model performance was validated and reported on the original and perturbed datasets using the C-index as the evaluation metric. Two observations may warrant attention. First, the model performance consistency between the original and perturbed datasets might be a qualitative indicator of model performance reliability against the simulated randomizations. Second, the model performance variance with perturbed datasets may reflect the model’s sensitivity to slight fluctuations. A qualitative assessment of model reliability could be performed by comparing the model performance on the original and perturbed data.

Model Reliability Quantification

In addition to the qualitative analysis of model reliability, a quantification metric, the ICC, was proposed to evaluate model reliability under randomization. The ICC is often used as a reliability index for inter-rater reliability analysis ⁴⁸, and several radiomic studies have used this measure to quantify feature reproducibility ^13,49,50.

The model reliability ICC reflects the extent to which the measurements can be replicated. We aimed to determine whether model predictions can be reproduced after adding plausible randomizations to the images and segmentations, both for the same patient and across the entire dataset. As each perturbed dataset was simulated randomly and the model was expected to yield an identical outcome, the one-way random-effects, absolute-agreement ICC, ICC(1, 1), was calculated to quantify the model’s reliability, with patients as the subjects and perturbations as the raters ⁴⁸. ICC values range between 0 and 1, with values closer to 1 representing higher reliability. Typically, ICC values less than 0.5, between 0.5 and 0.75, between 0.75 and 0.9, and greater than 0.9 indicate poor, moderate, good, and excellent reliability, respectively ⁴⁸.
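
For reference, ICC(1, 1) is commonly written as (MSB − MSW) / (MSB + (k − 1) · MSW), where MSB is the between-subject mean square, MSW the within-subject mean square, and k the number of raters (here, perturbations). The sketch below implements this textbook estimator in plain Python together with the interpretation bands quoted above. It is an illustration rather than the study code; the handling of values exactly on a band boundary is our assumption, and confidence intervals are not computed.

```python
def icc_1_1(scores):
    """One-way random-effects ICC(1,1).

    scores[i][j] is the model prediction for patient i under perturbation j.
    """
    n = len(scores)            # number of patients (subjects)
    k = len(scores[0])         # number of perturbations (raters)
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    # Between-subject and within-subject mean squares from one-way ANOVA.
    msb = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    msw = sum((x - m) ** 2 for row, m in zip(scores, row_means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


def reliability_label(icc):
    """Map an ICC value to the bands quoted in the text (boundary handling assumed)."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Toy predictions (hypothetical): three patients, two perturbations each.
toy_scores = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
toy_icc = icc_1_1(toy_scores)      # MSB = 8, MSW = 0.5, so ICC = 7.5 / 8.5
```

Applied to the values reported in this study, `reliability_label(0.565)` gives "moderate" and `reliability_label(0.825)` gives "good".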

Model Reliability Validation

To validate the calculation of model robustness, the same experiment was repeated with highly reliable features (ICC > 0.75). This validation aimed to verify the sensitivity of the ICC in response to changes in model input reliability. An increase in feature robustness was expected to increase the model ICC.
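
As a small illustration of the ICC > 0.75 prescreening, the sketch below applies the threshold to the six features listed in Table 1. In the study the threshold was applied to all 5,486 extracted features before feature selection; Table 1 is used here only because its ICC values are given in the text, and the function name is hypothetical.

```python
# Feature ICC values copied from Table 1.
TABLE_1_ICC = {
    "log-sigma-6-0-mm-3D_gldm_LargeDependenceLowGrayLevelEmphasis_64_binCount": 0.747,
    "wavelet-HHL_glrlm_LongRunLowGrayLevelEmphasis_128_binCount": 0.454,
    "original_glszm_LargeAreaLowGrayLevelEmphasis_128_binCount": 0.610,
    "wavelet-LLL_glrlm_RunEntropy_128_binCount": 0.900,
    "wavelet-LHL_glszm_LowGrayLevelZoneEmphasis_64_binCount": 0.491,
    "wavelet-HLL_glszm_SmallAreaHighGrayLevelEmphasis_128_binCount": 0.542,
}


def reliable_features(icc_by_feature, threshold=0.75):
    """Keep only the features whose ICC is strictly above the threshold."""
    return [name for name, icc in icc_by_feature.items() if icc > threshold]


reliable = reliable_features(TABLE_1_ICC)
print(reliable)  # ['wavelet-LLL_glrlm_RunEntropy_128_binCount']
```

Only one of the six features in the original model passes the threshold, which is consistent with the moderate reliability observed for that model.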

1. Aerts, H. J. W. L. et al. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nature Communications 5, 4006 (2014).
2. Lambin, P. et al. Radiomics: the bridge between medical imaging and personalized medicine. Nature Reviews Clinical Oncology 14, 749–762 (2017).
3. Gillies, R. J., Kinahan, P. E. & Hricak, H. Radiomics: Images Are More than Pictures, They Are Data. Radiology 278, 563–577 (2015).
4. Fan, M. et al. Radiomic analysis of imaging heterogeneity in tumours and the surrounding parenchyma based on unsupervised decomposition of DCE-MRI for predicting molecular subtypes of breast cancer. Eur Radiol 29, 4456–4467 (2019).
5. Bian, Y. et al. CT-Based Radiomics Score for Distinguishing Between Grade 1 and Grade 2 Nonfunctioning Pancreatic Neuroendocrine Tumors. American Journal of Roentgenology 215, 852–863 (2020).
6. Coroller, T. P. et al. CT-based radiomic signature predicts distant metastasis in lung adenocarcinoma. Radiother Oncol 114, 345–350 (2015).
7. Guerrisi, A. et al. Novel cancer therapies for advanced cutaneous melanoma: The added value of radiomics in the decision making process–A systematic review. Cancer Medicine 9, 1603–1612 (2020).
8. Zhao, B. Understanding Sources of Variation to Improve the Reproducibility of Radiomics. Frontiers in Oncology 11, 826 (2021).
9. Ibrahim, A. et al. Radiomics for precision medicine: Current challenges, future prospects, and the proposal of a new framework. Methods 188, 20–29 (2021).
10. Lafata, K. et al. Spatial-temporal variability of radiomic features and its effect on the classification of lung cancer histology. Phys. Med. Biol. 63, 225003 (2018).
11. Blazis, S. P., Dickerscheid, D. B. M., Linsen, P. V. M. & Martins Jarnalo, C. O. Effect of CT reconstruction settings on the performance of a deep learning based lung nodule CAD system. European Journal of Radiology 136, 109526 (2021).
12. Zwanenburg, A. et al. Assessing robustness of radiomic features by image perturbation. Scientific Reports 9, 614 (2019).
13. Suter, Y. et al. Radiomics for glioblastoma survival analysis in pre-operative MRI: exploring feature robustness, class boundaries, and machine learning techniques. Cancer Imaging 20, 55 (2020).
14. Vallières, M. et al. Data from Head-Neck-PET-CT. (2017) doi:10.7937/K9/TCIA.2017.8OJE5Q00.
15. Reiazi, R. et al. The impact of the variation of imaging parameters on the robustness of Computed Tomography radiomic features: A review. Computers in Biology and Medicine 133, 104400 (2021).
16. Orlhac, F. et al. How can we combat multicenter variability in MR radiomics? Validation of a correction procedure. Eur Radiol 31, 2272–2280 (2021).
17. Foy, J. J. et al. Harmonization of radiomic feature variability resulting from differences in CT image acquisition and reconstruction: assessment in a cadaveric liver. Phys. Med. Biol. 65, 205008 (2020).
18. Ligero, M. et al. Minimizing acquisition-related radiomics variability by image resampling and batch effect correction to allow for large-scale data analysis. Eur Radiol 31, 1460–1470 (2021).
19. Li, Y., Ammari, S., Balleyguier, C., Lassau, N. & Chouzenoux, E. Impact of Preprocessing and Harmonization Methods on the Removal of Scanner Effects in Brain MRI Radiomic Features. Cancers 13, 3000 (2021).
20. Zhao, B. et al. Evaluating Variability in Tumor Measurements from Same-day Repeat CT Scans of Patients with Non–Small Cell Lung Cancer. Radiology 252, 263–272 (2009).
21. van Timmeren, J. E. et al. Test–Retest Data for Radiomics Feature Stability Analysis: Generalizable or Study-Specific? Tomography 2, 361–365 (2016).
22. Collins, G. S., Reitsma, J. B., Altman, D. G. & Moons, K. G. M. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement. BMC Med 13, 1 (2015).
23. Clark, K. et al. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository. J Digit Imaging 26, 1045–1057 (2013).
24. FH, T., CYW, C. & EYW, C. Radiomics AI prediction for head and neck squamous cell carcinoma (HNSCC) prognosis and recurrence with target volume approach. BJR Open 3, 20200073 (2021).
25. Vallières, M. et al. Radiomics strategies for risk assessment of tumour failure in head-and-neck cancer. Sci Rep 7, 10117 (2017).
26. Bogowicz, M. et al. Perfusion CT radiomics as potential prognostic biomarker in head and neck squamous cell carcinoma. Acta Oncologica 58, 1514–1518 (2019).
27. Lombardo, E. et al. Distant metastasis time to event analysis with CNNs in independent head and neck cancer cohorts. Sci Rep 11, 6418 (2021).
28. Diamant, A., Chatterjee, A., Vallières, M., Shenouda, G. & Seuntjens, J. Deep learning in head & neck cancer outcome prediction. Sci Rep 9, 2764 (2019).
29. Moradmand, H., Aghamiri, S. M. R. & Ghaderi, R. Impact of image preprocessing methods on reproducibility of radiomic features in multimodal magnetic resonance imaging in glioblastoma. Journal of Applied Clinical Medical Physics 21, 179–190 (2020).
30. Fave, X. et al. Impact of image preprocessing on the volume dependence and prognostic potential of radiomics features in non-small cell lung cancer. Translational Cancer Research 5, 349–363 (2016).
31. Shafiq-ul-Hassan, M. et al. Intrinsic dependencies of CT radiomic features on voxel size and number of gray levels. Med Phys 44, 1050–1062 (2017).
32. Yaniv, Z., Lowekamp, B. C., Johnson, H. J. & Beare, R. SimpleITK Image-Analysis Notebooks: a Collaborative Environment for Education and Reproducible Research. J Digit Imaging 31, 290–303 (2018).
33. Bradski, G. The OpenCV Library. Dr. Dobb’s http://www.drdobbs.com/open-source/the-opencv-library/184404319.
34. van Griethuysen, J. J. M. et al. Computational Radiomics System to Decode the Radiographic Phenotype. Cancer Res 77, e104–e107 (2017).
35. Fornacon-Wood, I. et al. Reliability and prognostic value of radiomic features are highly dependent on choice of feature extraction platform. Eur Radiol 30, 6241–6250 (2020).
36. Zwanenburg, A. et al. The Image Biomarker Standardization Initiative: Standardized Quantitative Radiomics for High-Throughput Image-based Phenotyping. Radiology 295, 328–338 (2020).
37. Cai, J. et al. A Radiomics Model for Predicting the Response to Bevacizumab in Brain Necrosis after Radiotherapy. Clin Cancer Res 26, 5438–5447 (2020).
38. Yu, L. & Liu, H. Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution.
39. Parmar, C., Grossmann, P., Bussink, J., Lambin, P. & Aerts, H. J. W. L. Machine Learning methods for Quantitative Radiomic Biomarkers. Scientific Reports 5, 13087 (2015).
40. Lemaître, G., Nogueira, F. & Aridas, C. K. Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning. Journal of Machine Learning Research 18, 1–5 (2017).
41. Dirand, A.-S., Frouin, F. & Buvat, I. A downsampling strategy to assess the predictive value of radiomic features. Sci Rep 9, 17869 (2019).
42. Qiu, Q. et al. Reproducibility and non-redundancy of radiomic features extracted from arterial phase CT scans in hepatocellular carcinoma patients: impact of tumor segmentation variability. Quant Imaging Med Surg 9, 453–464 (2019).
43. Appice, A., Ceci, M., Rawles, S. & Flach, P. Redundant feature elimination for multi-class problems. in Twenty-first international conference on Machine learning - ICML ’04 5 (ACM Press, 2004). doi:10.1145/1015330.1015397.
44. Zhang, X. et al. Radiomics assessment of bladder cancer grade using texture features from diffusion-weighted imaging. J Magn Reson Imaging 46, 1281–1288 (2017).
45. Mottola, M. et al. Reproducibility of CT-based radiomic features against image resampling and perturbations for tumour and healthy kidney in renal cancer patients. Sci Rep 11, 11542 (2021).
46. Rizzo, S. et al. Radiomics: the facts and the challenges of image analysis. Eur Radiol Exp 2, 36 (2018).
47. Pavic, M. et al. Influence of inter-observer delineation variability on radiomics stability in different tumor sites. Acta Oncologica 57, 1070–1074 (2018).
48. Koo, T. K. & Li, M. Y. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J Chiropr Med 15, 155–163 (2016).
49. Lee, J. et al. Radiomics feature robustness as measured using an MRI phantom. Scientific Reports 11, 3973 (2021).
50. Park, S.-H. et al. Robustness of magnetic resonance radiomic features to pixel size resampling and interpolation in patients with cervical cancer. Cancer Imaging 21, 19 (2021).

No competing interests reported.

SupplementaryBuildingReliableRadiomicsModelsusingImagePerturbationAZ.docx

Editorial decision: Major revision
10 Feb, 2022
Reviews received at journal
08 Feb, 2022
Reviewers agreed at journal
25 Jan, 2022
Reviewers invited by journal
03 Jan, 2022
Editor assigned by journal
03 Jan, 2022
Editor invited by journal
03 Jan, 2022
Submission checks completed at journal
03 Jan, 2022
First submitted to journal
22 Dec, 2021
