This retrospective study was approved by the ethics committee of the hospital, and the requirement for informed consent was waived.
A total of 188 patients with pathologically confirmed AMC or type B1 and B2 thymomas were collected from the Department of Thoracic Oncology of our hospital between January 2010 and December 2018. The inclusion criteria were as follows: (1) complete routine unenhanced and enhanced chest CT images; (2) round, uniformly dense lesions without infiltration of surrounding tissues; and (3) all included AMCs had been misdiagnosed as thymomas and were surgically resected. Patients with incomplete CT images were excluded. Finally, 188 patients were included in the study (84 males and 104 females; mean age, 52 years; range, 19-70 years).
Baseline clinical features were obtained from the medical records, including age, sex, primary site (left or right), lesion size, unenhanced CT value, enhanced CT value and change in CT value (enhanced CT value minus unenhanced CT value).
Unenhanced and single-phase enhanced chest CT was performed on a Siemens Definition Flash 64-row CT scanner with the following parameters: tube voltage, 120 kV; tube current, 250 mAs; section collimation, 5 mm; field of view, 300 mm; matrix, 512×512; pixel size, 0.68×0.68 mm. The enhanced phase was acquired with a 38-second delay after the administration of 100 to 120 mL of 300 mg/mL iodinated contrast material (Ioversol Injection; Liebel-Flarsheim Canada Inc.) at an injection rate of 3 mL/s with a pump injector. All patients were scanned on the same machine with identical scanning parameters to ensure consistent image acquisition.
VOI Segmentation and Radiomics Feature Extraction
The chest CT images were retrieved from the Picture Archiving and Communication System (PACS) database. For both the unenhanced and enhanced CT images in the mediastinal window, a 3D volume of interest (VOI) was manually segmented using ITK-SNAP software (version 3.4.0, http://www.itksnap.org/) (Fig. 6). When multiple tumours were present, the tumour with the largest diameter was analysed.
The unenhanced and enhanced CT images of 60 randomly selected patients were used to assess agreement with the intraclass correlation coefficient (ICC). Segmentation was performed independently by two experienced radiologists. Intra-observer ICC was computed by comparing two rounds of feature extraction by reader A (10 years of experience in chest CT), and inter-observer ICC by comparing reader A with reader B (15 years of experience in chest CT). An ICC greater than 0.75 was considered to indicate good agreement, and the images of the remaining 128 patients were then segmented by reader A alone. Two feature sets were thus obtained: feature set-1, extracted by reader A from all 188 patients, and feature set-2, extracted by reader B from the 60 randomly selected patients. Feature set-1 was used for model training, and feature set-2 was used to test the robustness and reproducibility of the features in set-1.
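The text does not state which form of the ICC was used. Purely as an illustration, the Python sketch below computes the two-way random-effects, absolute-agreement, single-rater ICC(2,1) of Shrout and Fleiss for each feature measured on the same patients by two readers, then keeps the features whose ICC exceeds 0.75. The ICC(2,1) form and the helper names `icc_2_1` and `reproducible_features` are assumptions, not the authors' implementation.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings has shape (n_subjects, k_raters), e.g. one radiomics feature
    measured on the same 60 patients by reader A and reader B.
    Assumption: the source does not specify which ICC form was used.
    """
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    # Between-subject, between-rater and total sums of squares
    ss_rows = k * np.sum((ratings.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((ratings.mean(axis=0) - grand) ** 2)
    ss_total = np.sum((ratings - grand) ** 2)
    ss_error = ss_total - ss_rows - ss_cols
    # Corresponding mean squares
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def reproducible_features(set_a, set_b, threshold=0.75):
    """Indices (and ICCs) of feature columns whose ICC exceeds threshold.

    set_a, set_b: arrays of shape (n_patients, n_features) for the same
    patients, e.g. reader A and reader B features (hypothetical helper).
    """
    iccs = np.array([
        icc_2_1(np.column_stack([set_a[:, j], set_b[:, j]]))
        for j in range(set_a.shape[1])
    ])
    return np.flatnonzero(iccs > threshold), iccs
```

In this sketch, `set_a` would hold reader A's features restricted to the 60 double-segmented patients and `set_b` the corresponding feature set-2; the same function applies to reader A's two extraction rounds for the intra-observer ICC.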
Before feature extraction, the images were preprocessed by resampling to a voxel size of 1×1×1 mm³ and normalizing grey levels to a uniform range of 0-255. A total of 180 image features were extracted for each patient from the VOIs on the enhanced and unenhanced CT images using AK software (Artificial Intelligence Kit V3.0.0.R; GE Healthcare). The feature set included histogram features (n = 42), grey-level co-occurrence matrix (GLCM) features (n = 58), grey-level run-length matrix (RLM) features (n = 60), form factor features (n = 9) and grey-level size-zone matrix (GLSZM) features (n = 11). These features can characterize intratumour heterogeneity and may reflect underlying genotypes and protein structures [33,34].
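The exact resampling and normalization algorithms applied by AK software are not described here. As a rough Python sketch of the two preprocessing steps, the code below assumes linear interpolation via `scipy.ndimage.zoom` (`order=1`) and min-max rescaling of grey levels to 0-255; both choices and the helper names are assumptions.

```python
import numpy as np
from scipy import ndimage


def resample_to_isotropic(volume, spacing_mm, new_spacing_mm=(1.0, 1.0, 1.0), order=1):
    """Resample a 3D volume (z, y, x) from spacing_mm to new_spacing_mm.

    Assumption: linear interpolation (order=1); the source does not state
    which interpolator was used. spacing_mm would come from the DICOM header.
    """
    zoom_factors = np.asarray(spacing_mm, dtype=float) / np.asarray(new_spacing_mm, dtype=float)
    return ndimage.zoom(volume, zoom=zoom_factors, order=order)


def normalize_to_0_255(volume):
    """Linearly rescale grey levels to 0-255 (assumed min-max normalization)."""
    volume = np.asarray(volume, dtype=float)
    vmin, vmax = volume.min(), volume.max()
    if vmax == vmin:
        return np.zeros_like(volume)  # constant image: nothing to rescale
    return (volume - vmin) / (vmax - vmin) * 255.0
```

For the scans described above (0.68 mm in-plane, 5 mm sections), `spacing_mm` would be `(5.0, 0.68, 0.68)`. The VOI mask would normally be resampled with `order=0` (nearest neighbour) so that it stays binary.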
Feature selection and radiomics signature construction
To eliminate differences in the value scales of the extracted features, feature normalization was performed before feature selection: each feature was converted to a Z-score across all patients by subtracting its mean and dividing by its standard deviation.
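Written out, the Z-score normalization is as follows, where x_ij is the value of feature j for patient i and N is the number of patients. Whether the sample or population standard deviation was used is not stated; the sample form below is an assumption.

```latex
z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad
\bar{x}_j = \frac{1}{N}\sum_{i=1}^{N} x_{ij}, \qquad
s_j = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_{ij} - \bar{x}_j\right)^2}
```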
All patients were randomly divided into a training dataset (n = 130) and a test dataset (n = 58) at a ratio of 7:3. Feature selection and radiomics signature construction were performed in the training dataset in four steps. First, the ICC was used to retain robust and reproducible features, reducing the influence of manual segmentation differences between radiologists; according to a commonly used rule of thumb, an ICC greater than 0.75 indicated high agreement. Second, univariate logistic regression was used to select risk features with P < 0.05. Third, correlation analysis was performed between every pair of features, and when the correlation coefficient of a pair was greater than 0.9, one of the two features was excluded. Finally, the least absolute shrinkage and selection operator (LASSO) was applied to select the most useful features: the penalty parameter λ was tuned by 10-fold cross-validation, and the optimal λ was chosen according to the minimum criterion. LASSO is widely used in radiomics analyses that involve high-dimensional features but small samples of medical images.
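The four selection steps above can be sketched in Python with NumPy, SciPy and scikit-learn, run on the training set only. This is an illustration under stated assumptions, not the authors' R pipeline. The assumptions are as follows. The split is stratified by class. Univariate P values are Wald tests from a one-feature logistic model fitted by Newton-Raphson. Correlation is absolute Pearson r, and the later feature of each correlated pair is dropped. LASSO is replaced by scikit-learn's L1-penalised `LogisticRegressionCV`, whose `C` is an inverse penalty on a different scale from glmnet's λ, with the CV value of best mean log-loss standing in for the "minimum" criterion. All helper names and the synthetic data are hypothetical.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split


def univariate_logit_pvalue(x, y, n_iter=25):
    """Wald P value of the slope in a one-feature logistic regression,
    fitted by Newton-Raphson (assumes the classes are not perfectly separated)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = X.T @ (X * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hessian, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1 - p))[:, None]))
    return 2 * stats.norm.sf(abs(beta[1] / np.sqrt(cov[1, 1])))


def select_features(X_train, y_train, names, icc, seed=0):
    """Four-step selection; returns selected names, LASSO coefficients, intercept."""
    names = np.asarray(names)
    # Step 1: keep reproducible features (ICC > 0.75)
    X, names = X_train[:, icc > 0.75], names[icc > 0.75]
    # Step 2: keep features with univariate logistic P < 0.05
    pvals = np.array([univariate_logit_pvalue(X[:, j], y_train) for j in range(X.shape[1])])
    X, names = X[:, pvals < 0.05], names[pvals < 0.05]
    # Step 3 (assumed greedy rule): drop the later feature of any pair with |r| > 0.9
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= 0.9 for k in kept):
            kept.append(j)
    X, names = X[:, kept], names[kept]
    # Step 4: L1-penalised logistic regression; C (inverse penalty) tuned by
    # 10-fold CV on mean log-loss, as a stand-in for glmnet's lambda.min
    lasso = LogisticRegressionCV(Cs=50, cv=10, penalty="l1", solver="saga",
                                 scoring="neg_log_loss", max_iter=10000,
                                 random_state=seed)
    lasso.fit(X, y_train)
    coef = lasso.coef_.ravel()
    return names[coef != 0], coef[coef != 0], lasso.intercept_[0]


# Synthetic illustration only: 188 "patients", 8 features
rng = np.random.default_rng(42)
y = rng.integers(0, 2, 188)
f0 = 1.5 * y + rng.normal(size=188)
X_all = np.column_stack([
    f0,
    -1.0 * y + rng.normal(size=188),
    f0 + 0.05 * rng.normal(size=188),   # near-duplicate of f0 (|r| > 0.9)
    1.5 * y + rng.normal(size=188),     # informative but poorly reproducible
    rng.normal(size=(188, 4)),          # pure noise
])
names = ["f0", "f1", "f0_dup", "low_icc", "n1", "n2", "n3", "n4"]
icc = np.array([0.9, 0.9, 0.9, 0.5, 0.9, 0.9, 0.9, 0.9])
# 130 training / 58 test patients (stratification is an assumption)
X_tr, X_te, y_tr, y_te = train_test_split(X_all, y, test_size=58, stratify=y, random_state=42)
selected, coefs, intercept = select_features(X_tr, y_tr, names, icc)
```

Because all four steps use only `X_tr` and `y_tr`, the 58 test patients play no part in feature selection.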
The selected features were used to construct the radscore model. A radiomics signature (radscore) was then calculated for each patient via a linear combination of selected features that was weighted by their respective coefficients.
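In formula form, the radscore of patient i can be written as below. Here z_ij is the normalized value of the j-th selected feature, β_j is its coefficient and m is the number of selected features. The intercept β_0 from the fitted LASSO logistic model is an assumption, because the text mentions only the weighted feature terms.

```latex
\mathrm{radscore}_i = \beta_0 + \sum_{j=1}^{m} \beta_j \, z_{ij}
```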
Construction and validation of combined model
Univariate logistic regression was applied to 7 clinical features in the training dataset, namely sex, age, primary site, lesion size, unenhanced CT value, enhanced CT value and change in CT value, to select independent clinical predictors. Multivariable logistic regression combining these independent clinical risk factors with the radscore was then used to develop a combined model for differentiating AMC from type B1 and B2 thymomas. To detect multicollinearity among the variables in the combined model, collinearity was diagnosed with the variance inflation factor (VIF), with a VIF > 10 indicating severe collinearity.
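As a minimal NumPy sketch of the VIF check, each variable in the combined model is regressed on all the others by ordinary least squares, and VIF_j = 1 / (1 - R_j²). The helper name is hypothetical, and in practice the columns would be the radscore and the retained clinical predictors.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF for each column of X (n_patients, n_variables).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from an OLS regression of
    column j on the remaining columns plus an intercept (hypothetical helper).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs
```

Any variable whose VIF exceeds 10 would be flagged as severely collinear under the rule stated above.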
Discrimination and calibration were used to evaluate the performance of the clinical, radscore and combined models (unenhanced and enhanced CT) in the training and test datasets. Discrimination was assessed with the receiver operating characteristic (ROC) curve, the area under the curve (AUC), accuracy, sensitivity and specificity. To estimate the prediction error, the proposed enhanced-CT model was further tested with a 1000-iteration bootstrap analysis in both the training and test datasets. In each iteration, a random subset of 50% of the patients in the training or test dataset was selected and the corresponding AUC was calculated. Furthermore, a nomogram of the enhanced-CT combined model was constructed. The calibration curve was used to assess the agreement between the predicted and actual probabilities of AMC, and this agreement was quantified with the Hosmer-Lemeshow test, where P > 0.05 indicated good model fit.
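The repeated 50% subsampling of the AUC and the Hosmer-Lemeshow test can be sketched in Python with NumPy and SciPy as follows. Several details are assumptions because the text does not specify them. Subsets are drawn without replacement, and subsets lacking either class are skipped. The AUC is computed from the Mann-Whitney rank statistic. The Hosmer-Lemeshow groups are 10 equal-size groups of sorted predicted risk with groups - 2 degrees of freedom. R packages may form groups by quantile cut points instead. The function names are hypothetical.

```python
import numpy as np
from scipy import stats


def auc(y_true, scores):
    """AUC from the Mann-Whitney U statistic (tied scores get average ranks)."""
    y_true = np.asarray(y_true)
    ranks = stats.rankdata(scores)
    n_pos = np.sum(y_true == 1)
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def subsample_aucs(y_true, scores, n_iter=1000, frac=0.5, seed=0):
    """AUCs of n_iter random subsets holding frac of the patients.

    Assumption: subsets are drawn without replacement.
    """
    rng = np.random.default_rng(seed)
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    m = int(round(frac * len(y_true)))
    aucs = []
    while len(aucs) < n_iter:
        idx = rng.choice(len(y_true), size=m, replace=False)
        if 0 < y_true[idx].sum() < m:  # skip subsets lacking one of the classes
            aucs.append(auc(y_true[idx], scores[idx]))
    return np.array(aucs)


def hosmer_lemeshow(y_true, prob, groups=10):
    """Hosmer-Lemeshow statistic and P value.

    Assumption: equal-size groups of sorted predicted risk, df = groups - 2.
    """
    y_true = np.asarray(y_true, dtype=float)
    prob = np.asarray(prob, dtype=float)
    chi2 = 0.0
    for idx in np.array_split(np.argsort(prob), groups):
        observed, expected, n_g = y_true[idx].sum(), prob[idx].sum(), len(idx)
        chi2 += (observed - expected) ** 2 / (expected * (1 - expected / n_g))
    return chi2, stats.chi2.sf(chi2, groups - 2)
```

How the resulting 1000 AUCs were summarized (for example, by their mean and a percentile interval) is not stated in the text.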
All statistical analyses were performed with R software (version 3.0.1; http://www.R-project.org). Univariate analysis of clinical features was performed with the independent-samples t test or the Mann-Whitney U test for continuous variables and the chi-squared test for categorical variables (sex, primary site). All tests were two-sided, and statistical significance was set at P < 0.05. Multivariate logistic regression analysis was performed using the “stats” package, and nomograms were constructed using the “rms” package.
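The analyses were run in R. Purely as an illustration, the equivalent univariate tests are sketched below in Python with SciPy. The text does not state how the t test or the Mann-Whitney U test was chosen for each variable. The Shapiro-Wilk normality check and the equal-variance (Student) t test used here are therefore assumptions. The helper names are hypothetical.

```python
import numpy as np
from scipy import stats


def compare_continuous(x_amc, x_thymoma, alpha=0.05):
    """Return (test name, two-sided P) for a continuous clinical feature.

    Assumption: Student t test if both groups pass Shapiro-Wilk at alpha,
    otherwise the Mann-Whitney U test.
    """
    normal = (stats.shapiro(x_amc).pvalue > alpha
              and stats.shapiro(x_thymoma).pvalue > alpha)
    if normal:
        return "t test", stats.ttest_ind(x_amc, x_thymoma).pvalue
    return "Mann-Whitney U", stats.mannwhitneyu(
        x_amc, x_thymoma, alternative="two-sided").pvalue


def compare_categorical(table):
    """Two-sided chi-squared P for a contingency table, e.g. sex by diagnosis.

    For 2 x 2 tables SciPy applies Yates' continuity correction by default.
    """
    chi2, p, dof, expected = stats.chi2_contingency(np.asarray(table))
    return p
```

R's `chisq.test` also applies the continuity correction to 2 x 2 tables by default, so the two should agree on such tables.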