Patients and clinical characteristics
Shanxi Medical University Review Board approved this retrospective study. The entire cohort of this study was acquired from February 2016 to October 2018 records of the Institutional Picture Archiving and Communication System (PACS), which was used to identify the patients who had histologically confirmed ESCC (TNM stage: I-III) and underwent surgery after diagnosis at Shanxi Cancer Hospital. All patients underwent pretreatment CT scans from neck to abdomen, and signed their own informed consent. All methods were carried out in accordance with relevant guidelines and regulations.
To determine the patients that could be included, we developed the following criteria: 1) pathologically confirmed ESCC; 2) underwent surgery for ESCC; 3) standard contrast-enhanced CT was performed preoperatively; and 4) complete clinical and follow-up information was available. We randomly divided the patients into training and validation cohort by a ratio of about 3:1. We trained models in training cohort and validated them in validation cohort.
Clinical characteristics including age, gender, tumor location (upper, middle, lower), drinking history, smoking history, genetic alterations, and pathologic characteristics including depth of invasion, TNM stage and lymph node metastasis informations were collected from patient records. These clinicopathologic characteristics are presented in Table 1.
Follow up and clinical endpoint
All patients were followed up every 1-3 months during the first 2 years, every 6 months in year 2-5, and annually thereafter. To provide an efficient tool, which would allow earlier personalized treatment, we chose PFS as the endpoint . We defined PFS from the first day of treatment to the date of disease progression (locoregional recurrences or distant metastases), death from any cause, or the date of the last follow-up visit (censored). The minimum follow-up time to ascertain the PFS was 6 months.
CT acquisition and segmentation
All patients were performed the contrast-enhanced CT by using a 64-channel multi-detector CT scanner (LightSpeed VCT, GE Medical Systems, Milwaukee, Wis, USA). The acquisition parameters were as follows: 120 kV; 160 mA; 0.5-second rotation time; detector collimation: 64×0.625 mm; field of view: 350 mm×350 mm; and matrix: 512×512. After routine non-enhanced CT, contrast-enhanced CT was performed after a 25-second delay following intravenous administration of 85 mL of iodinated contrast material (Ultravist 370; Bayer Schering Pharma, Berlin, Germany) at a rate of 3.0 mL/s with a pump injector (Ulrich CT Plus 150, Ulrich Medical, Ulm, Germany). All images were reconstructed with a thick slice of 5.0 mm. For feature selection, we converted image format from DICOM to NII without applying any preprocessing.
Note that segmentation is required before the extraction of quantitative radiomics features, we performed three-dimensional manual segmentation by using 3D-Slicer software (https://www.slicer.org/), which is an open platform for medical image processing. The chief physician of Shanxi Cancer Hospital with more than five years’ experience in interpreting chest radiology outlined the tumor regions for each CT image layer, and the tumor segmentation was guided and verified by the specialist. The region of interest (ROI) covered the whole tumor mass and was delineated on each CT slice, and would be used in subsequent feature extraction.
Selection of radiomics feature and building of radiomics signature
We performed the calculation through our homemade Python scripts (Python3.6, https://www.python.org) for radiomics feature extraction based on the segmentation results. A total of 954 features were obtained by calling feature calculation in pyradiomics package (open-source python package; https://pyradiomics.readthedocs.io/en/latest/), which included the following 4 categories: 1) first-order statistics features; 2) size- and shape-based features; 3) texture features; and 4) wavelet features； and 5 typical matrixes: Gray-Level Co-occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), Gray Level Size Zone Matrix (GLSZM), Gray Level Dependence Matrix (GLDM) and Neigbouring Gray Tone Difference Matrix (NGTDM).
We built the radiomics signature with selected features in training cohort. To reduce over-fitting or any types of bias, we applied following 2 steps: First, the best features based on univariate statistical tests (2-sample t-test) between death and censoring groups in the primary cohort were selected and executed by using Matlab 2016b (Mathworks, Natick, USA). Second, we used our homemade R scripts to select features that were most significant by using the least absolute shrinkage and selection operator (LASSO) method, which would be a suitable methodology for the feature selection through regression of high-demensional data (R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL: http://www.R-project.org, 2016). The glmnet R-packages was applied for logistic regression (open-source R package; https://cran.r-project.org/web/packages/glmnet/index.html). Additionally, the accuracy of prediction model could be improved by regularizing the features through penalized estimation. We added the L1 penalty term to the normal linear model and the parameter lambda controls the complexity of regression. When the λ was large, it indicated that there was no effect on the estimated regression parameters; while as the λ getted smaller, most covariate coefficients were shrunk to zero. Then the remaining variables with non zero coefficients were selected by the λ that the 10-fold cross-validation error was the smallest  .
Finally, the radiomics signature was built by combining those variables in the primary cohort and validated in the validation cohort. The radiomics signature is a linear combination of selected features with respective weights, which would be calculated as a factor (Radiomics score, Rad-score) for the further prediction model. The assessment method of the logistic regression model is the receiver operating characteristic (ROC) curve and its area under the curve (AUC).
Prognostic validation of radiomics signature
We calculated Rad-score for each ESCC patient and grouped them according to the following 2 rules. 1) The patients were divided into high-risk and low-risk groups based on the median Rad-score. 2) Patients with median scores were placed in high-risk groups. The radiomics signature discriminative performance of the survival status was assessed according to the overall distribution of ESCC patients. And then, the potential association of radiomics signature and clinical feature with PFS was assessed in the training cohort and validated in the validation cohort. Kaplan-Meier survival analysis was used in these two cohorts (the survival R-package was used for Kaplan-Meier survival analyses; https://cran.r-project.org/web/packages/survival/index.html). Stratified analyses were implemented to determine the PFS in subgroups of high-risk and low-risk patients. Univariate Cox Proportional Hazards Models were performed to explore the C-index of the radiomics signature (the rms R-package was used for Cox proportional hazards regression; https://cran.r-project.org/web/packages/rms/).
Performance of TNM staging and clinical nomograms in the training cohort before and after addition of Rad-score
The nomogram with the predicting model was based on the multivariable logistic regression analysis. The following candidate factors: TNM stage (dummy variable: “0” for I, “1” for II, “2” for III), status of clinical features and Rad-scores were involved in a diagnostic model for preoperative prediction of ESCC. The nomogram is a graphical representation of this prediction model in the training cohort. The prognostic performance of TNM staging and clinical nomograms in the training cohort before and after the addition of Rad-score was quantitatively measured by using harrell’s concordance index (C-Index), which is commonly used to evaluate the discriminative power of prognostic models . The value of the C-index could range from 0.5, which indicated no discriminative ability, to 1.0, which indicated perfect ability to distinguish between the patients who sufferred disease progression or death and those who did not. Bootstrap analyses with 1,000 resamples were used to obtain a C-index with 95% confidence interval (CI)  that were corrected for potential overfitting. The calibration curves were drawn for assessing the agreement between the predicted probability of 3-year PFS and actual 3-year PFS .
Nomogram validation in validation cohort
The prognostic performance of TNM staging and clinical nomograms in the validation cohort before and after the addition of Rad-score was tested by the above method. Calibration curve and C-index were calculated through multivariable Cox proportional hazard regression analyses. The decision curve analysis (DCA) was introduced to evaluate the quantified net benefit of our prediction model in the validation cohort [43, 44].
Association of radiomics features with clinical data
A heat map analysis was used to evaluate the associations between clinical data and radiomics features (the gplots and pheatmap packages were used for heat maps).