Study population
This retrospective observational study included patients with bladder cancer who received immune checkpoint inhibitor (ICI) therapy and developed pneumonitis between July 2020 and June 2023. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013) and was approved by the Medical Ethics Committee. Because of the retrospective design, the requirement for informed consent was waived. Figure 1 illustrates the study protocol in detail.
All participants underwent chest CT and were selected consecutively from our electronic database. Patients were assigned to two groups based on their medical history and CT imaging findings (i.e., radiological reports). The inclusion criteria for each group were as follows:
Checkpoint inhibitor-related pneumonitis (CIP): This group comprised oncological patients who developed pneumonitis after receiving ICI therapy. Chest CT findings can vary and may include patterns resembling cryptogenic organizing pneumonia, acute interstitial pneumonia, or acute respiratory distress syndrome. The diagnosis of ICI-related pneumonitis is primarily clinical. It relies on the exclusion of other proven microbiological or pharmacological causes of pneumonia and on clinical improvement after drug discontinuation and/or medical treatment, notably corticosteroids. Confirmation is provided by resolution of the radiographic findings on follow-up chest CT [8, 14].
COVID-19: This group comprised consecutive patients who presented with symptoms such as fever (>37.5°C), dyspnea, cough, and fatigue, and who had confirmed COVID-19 pneumonia. The diagnosis was confirmed by a positive reverse transcription–polymerase chain reaction test on a nasopharyngeal or oropharyngeal swab together with chest CT findings positive for pneumonia. Data for this group were collected between July 2021 and June 2022 [15].
The exclusion criteria were poor-quality CT images unsuitable for analysis and the presence of other chest conditions on CT, such as tuberculosis, chronic obstructive pulmonary disease, or lung cancer.
CT acquisition
Imaging was performed on Siemens SOMATOM Definition Edge and GE LightSpeed VCT spiral CT scanners. Patients were scanned in the supine position at full inspiration using a standard-dose protocol. The scan range extended from the lung apex to the costophrenic angle, with a slice thickness of 5 mm, a tube voltage of 120 kV, and a tube current of 100 mA. Images were reconstructed at a 1.25-mm slice thickness using a medium-sharp algorithm.
Image segmentation
The original Digital Imaging and Communications in Medicine (DICOM) images were uploaded to the Radcloud platform (Huiying Medical Technology Co., Ltd) for preprocessing. This step normalized the data to reduce variability arising from different scanning techniques and to improve reproducibility [16]. Two radiologists with more than 10 years of experience in lung CT, who were blinded to the patients' clinical details, manually delineated the lesions on the CT images. All delineated regions of interest (ROIs) were then reviewed by a senior radiologist, who made the final decision whenever the two delineations of the pneumonitis borders differed by more than 5% [17]. The software then automatically generated the volume of interest (VOI) for each lesion. The radiomics workflow is depicted in Figure 1.
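The text does not specify how the 5% discrepancy between the two delineations was measured. The sketch below assumes one common choice: 1 minus the Dice similarity coefficient between the two readers' binary masks. A lesion is flagged for senior review when this value exceeds 0.05. The function names and the toy volume are hypothetical.

```python
import numpy as np


def dice_coefficient(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice similarity between two binary segmentation masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(a, b).sum() / total)


def needs_adjudication(mask_a, mask_b, tolerance: float = 0.05) -> bool:
    """Assumed rule: send to the senior radiologist when 1 - Dice exceeds the tolerance."""
    return (1.0 - dice_coefficient(mask_a, mask_b)) > tolerance


# Usage on a toy 3D volume (hypothetical data).
reader1 = np.zeros((4, 10, 10), dtype=bool)
reader1[1:3, 2:8, 2:8] = True      # 72 voxels
reader2 = reader1.copy()
reader2[1:3, 2:8, 7] = False       # reader 2 stops one column short: 60 voxels
print(round(dice_coefficient(reader1, reader2), 3), needs_adjudication(reader1, reader2))
```

Other overlap measures, such as a relative volume difference, would fit the same adjudication logic equally well.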
Feature extraction
After the VOI of each lesion had been delineated, a total of 1,688 quantitative imaging features were extracted from the CT images using the Radcloud platform (http://radcloud.cn/). These features were categorized into three groups. Group 1 (first-order statistics) included 126 descriptors that use common, basic metrics to quantify the distribution of voxel intensities within the VOI. Group 2 (shape- and size-based features) consisted of 14 three-dimensional features describing the shape and size of the region. Group 3 (texture features) encompassed 525 features, derived from grey-level texture matrices, that quantify regional heterogeneity. These texture features describe recurrent local patterns in the image and the rules governing their arrangement. They include 75 features computed from the Gray Level Co-occurrence Matrix (GLCM), the Gray Level Run Length Matrix (GLRLM), and the Gray Level Size Zone Matrix (GLSZM). In addition, the image was represented at multiple resolutions to analyze texture at a finer scale. This was done by applying 14 filters: exponential, logarithm, gradient, square, square root, LBP-2D, and eight wavelet decompositions (Wavelet-LHL, Wavelet-LHH, Wavelet-HLL, Wavelet-LLH, Wavelet-HLH, Wavelet-HHH, Wavelet-HHL, and Wavelet-LLL).
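To make the texture group more concrete, the following minimal NumPy sketch builds a grey-level co-occurrence matrix and computes its contrast, one of the GLCM descriptors. It is not Radcloud's implementation. The 2D single-offset matrix, the 8 equal-width grey levels, and the helper names are assumptions chosen for illustration.

```python
import numpy as np


def quantize(image: np.ndarray, n_levels: int = 8) -> np.ndarray:
    """Map intensities to integer grey levels 0..n_levels-1 with equal-width bins (assumed binning)."""
    lo, hi = float(image.min()), float(image.max())
    if hi == lo:
        return np.zeros(image.shape, dtype=int)
    scaled = (image - lo) / (hi - lo)  # rescale to 0..1
    return np.minimum((scaled * n_levels).astype(int), n_levels - 1)


def glcm_2d(levels: np.ndarray, offset=(0, 1), n_levels: int = 8) -> np.ndarray:
    """Symmetric, normalized co-occurrence matrix for a single pixel offset."""
    dr, dc = offset
    rows, cols = levels.shape
    matrix = np.zeros((n_levels, n_levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[levels[r, c], levels[r2, c2]] += 1
    matrix = matrix + matrix.T  # count each neighbour pair in both directions
    if matrix.sum() == 0:
        raise ValueError("image too small for this offset")
    return matrix / matrix.sum()


def glcm_contrast(p: np.ndarray) -> float:
    """Contrast = sum over (i, j) of (i - j)^2 * p(i, j); larger means more local heterogeneity."""
    i, j = np.indices(p.shape)
    return float(((i - j) ** 2 * p).sum())


# Usage: a homogeneous patch versus a maximally alternating one (hypothetical data).
smooth = np.full((6, 6), 100.0)
checker = (np.indices((6, 6)).sum(axis=0) % 2).astype(float)
print(glcm_contrast(glcm_2d(quantize(smooth))), glcm_contrast(glcm_2d(quantize(checker))))
```

The homogeneous patch gives a contrast of zero. The checkerboard, whose neighbours always sit at opposite extremes of the grey-level range, gives the largest possible value for 8 levels.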
To assess the reliability of manual segmentation between the two radiologists, CT scans from 10 randomly selected patients were segmented independently by both radiologists in a double-blind manner. The intra- and inter-observer consistency of each feature was measured with the intra-class correlation coefficient (ICC). Features with an ICC below 0.75 were considered poorly reproducible and were excluded from further analysis.
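The form of the ICC is not stated. The following sketch assumes ICC(2,1) (Shrout and Fleiss: two-way random effects, absolute agreement, single rater), computed from the ANOVA mean squares of a patients × readers matrix. Other forms, such as a consistency ICC, would give somewhat different values. The helper names and toy data are hypothetical, and only the inter-reader part of the filter is modelled.

```python
import numpy as np


def icc_2_1(ratings: np.ndarray) -> float:
    """ICC(2,1) for one feature; ratings has shape (n_patients, k_readers)."""
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand = y.mean()
    row_means = y.mean(axis=1)
    col_means = y.mean(axis=0)
    ms_rows = k * ((row_means - grand) ** 2).sum() / (n - 1)  # between-patient mean square
    ms_cols = n * ((col_means - grand) ** 2).sum() / (k - 1)  # between-reader mean square
    resid = y - row_means[:, None] - col_means[None, :] + grand
    ms_err = (resid ** 2).sum() / ((n - 1) * (k - 1))          # residual mean square
    return float((ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n))


def reproducible_features(reader1: np.ndarray, reader2: np.ndarray, threshold: float = 0.75) -> list:
    """Column indices whose inter-reader ICC is at least the threshold (ICC < 0.75 is discarded)."""
    keep = []
    for f in range(reader1.shape[1]):
        if icc_2_1(np.column_stack([reader1[:, f], reader2[:, f]])) >= threshold:
            keep.append(f)
    return keep


# Usage: feature 0 agrees perfectly between readers, feature 1 is reversed (hypothetical data).
r1 = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
r2 = np.array([[1.0, 40.0], [2.0, 30.0], [3.0, 20.0], [4.0, 10.0]])
print(reproducible_features(r1, r2))
```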
Feature selection
To prevent overfitting of the signature, feature dimensionality was reduced before the signature was constructed. Only radiomics features with inter- and intra-observer ICCs above 0.75 were considered. The most valuable features in the training set were then identified with the variance threshold method, the Select K Best method based on one-way analysis of variance (ANOVA), and the least absolute shrinkage and selection operator (LASSO) regression model. The variance threshold method used a threshold of 0.8, removing features whose variance was below this value. The Select K Best method, a univariate feature selection technique, used p-values to assess the relationship between each feature and the class label and retained all features with p-values below 0.05. The LASSO model used fivefold cross-validation and a maximum of 1,000 iterations. The selected features were then used to construct a radiomics signature, with parameter settings following those of previous studies [18, 19].
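The three selection steps map naturally onto scikit-learn's `VarianceThreshold`, `SelectKBest` with the ANOVA F-test `f_classif`, and `LassoCV`. Whether Radcloud uses these exact classes is not stated, so the sketch below is an approximation under assumptions. `SelectKBest` normally keeps a fixed number k of top-scoring features, so here it is run with `k="all"` and the p < 0.05 rule is applied to its `pvalues_`. Standardizing the features before LASSO and running the steps in this order are also assumptions. The function name and synthetic data are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler


def select_features(X_train, y_train, var_threshold=0.8, p_cutoff=0.05):
    """Variance filter -> ANOVA p-value filter -> LASSO.

    Returns the original column indices kept by LASSO and their coefficients.
    """
    # Step 1: drop features whose variance is below the threshold (depends on feature scale).
    vt = VarianceThreshold(threshold=var_threshold).fit(X_train)
    idx = np.flatnonzero(vt.get_support())
    # Step 2: one-way ANOVA F-test per feature; keep every feature with p < p_cutoff.
    kbest = SelectKBest(score_func=f_classif, k="all").fit(X_train[:, idx], y_train)
    idx = idx[kbest.pvalues_ < p_cutoff]
    # Step 3: LASSO on standardized features, penalty chosen by 5-fold CV, at most 1,000 iterations.
    X_std = StandardScaler().fit_transform(X_train[:, idx])
    lasso = LassoCV(cv=5, max_iter=1000).fit(X_std, y_train)
    nonzero = np.flatnonzero(lasso.coef_)
    return idx[nonzero], lasso.coef_[nonzero]


# Usage on synthetic data: columns 0 and 1 carry class signal, column 19 is nearly constant.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=120)
X = rng.normal(scale=2.0, size=(120, 20))
X[:, 0] += 3.0 * y
X[:, 1] -= 3.0 * y
X[:, 19] = rng.normal(scale=0.1, size=120)
selected, coefs = select_features(X, y)
print(selected, np.round(coefs, 3))
```

Because the variance threshold is applied to unscaled features, its effect depends on how the platform normalizes intensities before extraction.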
Classifier training
After the diagnoses had been established from clinical data and follow-up imaging, patients were randomly split into training and validation sets at a 7:3 ratio using a random seed of 958. Three classifiers were trained using fivefold cross-validation: k-nearest neighbor (KNN), support vector machine (SVM), and stochastic gradient descent (SGD). In this procedure, the training data are divided into five folds. Each fold is held out once for evaluation while the model is trained on the remaining four, and the algorithm's performance is estimated as the mean over the five rounds. The model that best distinguished CIP from COVID-19 was selected. Classifier performance was then evaluated using the area under the receiver operating characteristic (ROC) curve (AUC), sensitivity (recall), specificity, accuracy, and F1-score.
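The training scheme can be sketched with scikit-learn as follows. It uses a 7:3 random split with seed 958 and fivefold cross-validated AUC for KNN, SVM, and SGD classifiers, followed by validation-set metrics. The default hyperparameters, the standardization step inside each pipeline, the hinge loss for SGD, and the helper names are all assumptions, not settings reported for this study. The data are synthetic.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def validation_metrics(model, X_va, y_va):
    """AUC from continuous scores; the remaining metrics from the confusion matrix."""
    if hasattr(model, "decision_function"):
        scores = model.decision_function(X_va)
    else:
        scores = model.predict_proba(X_va)[:, 1]
    y_pred = model.predict(X_va)
    tn, fp, fn, tp = confusion_matrix(y_va, y_pred, labels=[0, 1]).ravel()
    return {
        "AUC": roc_auc_score(y_va, scores),
        "sensitivity": tp / (tp + fn),  # identical to recall
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_va),
        "F1": f1_score(y_va, y_pred),
    }


def compare_classifiers(X, y, seed=958):
    # 7:3 random split into training and validation sets.
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=seed)
    models = {
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "SVM": make_pipeline(StandardScaler(), SVC(random_state=seed)),
        "SGD": make_pipeline(StandardScaler(), SGDClassifier(random_state=seed)),
    }
    results = {}
    for name, model in models.items():
        # Fivefold CV: each fold is held out once while the other four are used for training.
        cv_auc = cross_val_score(model, X_tr, y_tr, cv=5, scoring="roc_auc").mean()
        model.fit(X_tr, y_tr)
        results[name] = {"cv_AUC": cv_auc, **validation_metrics(model, X_va, y_va)}
    return results


# Usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)
results = compare_classifiers(X, y)
for name, metrics in results.items():
    print(name, {k: round(float(v), 3) for k, v in metrics.items()})
```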
Development of a radiomics nomogram and performance assessment
After feature selection, a radiomics signature (Rad-score) was constructed as a linear combination of the selected features weighted by their LASSO-derived coefficients. A radiomics nomogram combining the Rad-score with clinical factors was then developed and compared with a model based on clinical factors alone. The aim was to determine whether adding radiomic information improved differentiation between the pneumonitis subtypes. Calibration plots were used to assess the calibration and goodness-of-fit of the nomogram. The discriminative performance of the clinical factors model and the radiomics nomogram was evaluated using the concordance index (C-index) in both the training and validation sets. Decision curve analysis (DCA) was performed in the training set to assess the net benefit across a range of threshold probabilities.
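The quantities in this step can be sketched in NumPy. The Rad-score is the intercept plus the sum of β_i·x_i over the LASSO-selected features. For a binary outcome, the C-index reduces to the proportion of positive-negative pairs that are ranked correctly, which equals the AUC. The DCA net benefit at threshold probability p_t is TP/n − FP/n · p_t/(1 − p_t), and the treat-all curve serves as the usual reference. Fitting the nomogram itself, assumed here to be a logistic model whose predicted probabilities feed the net-benefit function, is not shown. The function names and toy data are hypothetical.

```python
import numpy as np


def rad_score(X_selected, coefs, intercept=0.0):
    """Rad-score = intercept + sum_i beta_i * x_i over the LASSO-selected features."""
    return intercept + np.asarray(X_selected, dtype=float) @ np.asarray(coefs, dtype=float)


def c_index_binary(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly, ties counted as 0.5 (equals the AUC)."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    diff = scores[y_true][:, None] - scores[~y_true][None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


def net_benefit(y_true, y_prob, threshold):
    """DCA net benefit at threshold pt: TP/n - FP/n * pt / (1 - pt)."""
    y_true = np.asarray(y_true).astype(bool)
    treat = np.asarray(y_prob, dtype=float) >= threshold
    n = y_true.size
    tp = np.sum(treat & y_true)
    fp = np.sum(treat & ~y_true)
    return float(tp / n - (fp / n) * threshold / (1.0 - threshold))


def treat_all_net_benefit(y_true, threshold):
    """Reference strategy in which every patient is treated as positive."""
    prevalence = float(np.mean(y_true))
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Usage on toy predictions (hypothetical data).
y = np.array([1, 1, 0, 0])
p = np.array([0.9, 0.6, 0.4, 0.1])
print(c_index_binary(y, p), net_benefit(y, p, 0.5), treat_all_net_benefit(y, 0.5))
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and plotting it against the treat-all and treat-none (zero) references.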
Statistical analysis
Statistical analyses were performed using SPSS (version 25.0; IBM) and R (version 3.3.3; https://www.r-project.org). In univariate analysis, clinical factors were compared between groups using the chi-square test or Fisher's exact test for categorical variables and the Mann–Whitney U test for continuous variables, as appropriate. One-way ANOVA was used to compare each radiomics feature between CIP and COVID-19. The ROC curves of the two datasets were compared using the DeLong test. Model performance in the validation set was assessed using the thresholds determined in the training set. ROC curves were plotted with the “pROC” package, the nomogram was developed with the “rms” package, and DCA was performed with the “dca.R” script. A two-sided p-value of less than 0.05 was considered statistically significant.
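To illustrate the DeLong test, the sketch below computes the DeLong variance of an AUC from its structural components. It then compares two AUCs estimated on independent datasets with a two-sided z-test. The text does not say whether the paired or unpaired form was used. pROC's `roc.test` offers both, and the paired version, which additionally uses the covariance between the two AUCs, is not shown here. The function names and scores are hypothetical.

```python
import math

import numpy as np


def delong_auc_variance(pos_scores, neg_scores):
    """AUC and its DeLong variance from the scores of positive and negative cases."""
    x = np.asarray(pos_scores, dtype=float)
    y = np.asarray(neg_scores, dtype=float)
    m, n = x.size, y.size
    diff = x[:, None] - y[None, :]
    psi = (diff > 0) + 0.5 * (diff == 0)  # Mann-Whitney kernel, ties count 0.5
    v10 = psi.mean(axis=1)                # structural components of the positives
    v01 = psi.mean(axis=0)                # structural components of the negatives
    auc = v10.mean()
    var = v10.var(ddof=1) / m + v01.var(ddof=1) / n
    return float(auc), float(var)


def delong_unpaired_test(pos1, neg1, pos2, neg2):
    """Two-sided z-test for AUCs estimated on two independent datasets."""
    auc1, var1 = delong_auc_variance(pos1, neg1)
    auc2, var2 = delong_auc_variance(pos2, neg2)
    se = math.sqrt(var1 + var2)
    if se == 0:
        raise ValueError("zero variance: both AUCs are degenerate")
    z = (auc1 - auc2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return auc1, auc2, z, p


# Usage with hypothetical training and validation scores.
print(delong_unpaired_test([0.9, 0.8, 0.7, 0.4], [0.3, 0.5, 0.1, 0.2],
                           [0.8, 0.6, 0.3], [0.4, 0.2, 0.5]))
```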