Development and Validation of a Clinical-Radiomic Model for Preoperative Prediction of Lymph Node Metastasis and Overall Survival in Rectal Cancer: A Retrospective Study

Background: Preoperative lymph node (LN) metastasis status is essential to therapeutic strategies for rectal cancer (RC) patients without distant metastasis, and is one of the major predictive risk factors. Unfortunately, it is difficult to predict LN metastasis preoperatively. Thus, we established a clinical-radiomic model to assist in predicting LN metastasis and overall survival (OS) in RC patients preoperatively. Methods: The prediction model was established and validated in a primary cohort consisting of 150 patients with pathologically confirmed RC, with data gathered between January 2013 and December 2015. Radiomic features were extracted from portal venous phase (PVP) CT images. The LASSO regression method was used for feature selection. We established a clinical-radiomic model by incorporating the radiomic features and independent clinical risk factors. We then used this combined model to classify RC patients into high- and low-risk groups for 3-year OS analysis. Results: A total of 49 patients were pathologically proven to have LN metastasis, and 101 patients had no LN metastasis. The Rad-score consisted of 2 selected features. The combined clinical-radiomic model included preoperative clinical T stage, preoperative clinical N stage and the Rad-score. The combined model showed good discrimination, with an AUC of 0.847 in the training cohort and 0.782 in the validation cohort. Significant differences existed in OS between high- and low-risk groups in both the training (p=0.026) and validation (p=0.034) cohorts. Conclusion: This combined clinical-radiomic model could be conveniently used to facilitate the preoperative prediction of LN metastasis and prognosis in patients with RC.


Background
Rectal cancer (RC) is the third leading cause of cancer-related death worldwide. The incidence of RC is increasing, particularly in younger age groups (1). Moreover, local recurrence or metastatic disease can occur in up to 30% of patients within five years of surgery (2,3).
Lymph node (LN) metastasis is crucial for determining the therapeutic strategy and constitutes one of the major prognostic risk factors in patients having RC without distant metastasis (4,5). Magnetic resonance imaging (MRI) and multiple-detector computed tomography (MDCT) are the major methods for pretreatment assessment. Unfortunately, there is no consensus standard for predicting metastatic LNs (6)(7)(8). The maximum short-axis diameter of LNs was commonly used to identify malignant LNs in previous studies (6,9). However, there is significant size overlap between benign and malignant LNs (10), and over 50 percent of the LNs involved in RC are less than 5 mm (11).
Consequently, better tools for preoperative prediction of LN metastasis are still needed. Moreover, the prognosis of RC patients after surgery varies, making better risk stratification methods important.
Radiomics, which involves texture analysis, is a high-throughput approach allowing clinicians to extract large amounts of quantitative data from images, thus providing information that cannot be visually assessed but could be related to the tumor phenotype (12,13). The prognostic performance of radiomics based on CT images has been demonstrated in several cancers, including pancreatic cancer, lung cancer and esophageal cancer (14)(15)(16). Therefore, we aimed to develop and validate a combined clinical-radiomic model to preoperatively predict lymph node status for patients having RC without distant metastasis, and to associate the combined model with overall survival (OS).

Methods
This research was approved by our institution's medical ethics committee (NO.2019-1159). The need for informed consent was waived.

Patients
By searching electronic medical records, patients with histologically confirmed RC who underwent resection in our institution were recruited. Inclusion criteria: (1) patients were treated with resection; (2) preoperative contrast-enhanced CT was conducted within 1 month prior to surgical resection; (3) follow-up was performed for at least 3 years; (4) LN metastasis status was confirmed by pathology. Exclusion criteria: (1) lack of preoperative CT images; (2) patients with distant metastasis; (3) loss to follow-up; (4) poor CT image quality. Patients were assigned to a training and a validation cohort at a ratio of 7:3 by the date of scanning. The flow diagram of the patient inclusion process is presented in Figure 1.
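The 7:3 assignment by scan date can be illustrated with a minimal sketch. It assumes that the earliest-scanned patients form the training cohort, which the text implies but does not state explicitly; the patient IDs, dates, and the function name `split_by_scan_date` are hypothetical.

```python
from datetime import date, timedelta


def split_by_scan_date(patients, train_parts=7, total_parts=10):
    """Order patients by scan date and assign the earliest 7/10 to training.

    Assumption: earlier scans go to training, later scans to validation.
    """
    ordered = sorted(patients, key=lambda p: p["scan_date"])
    n_train = len(ordered) * train_parts // total_parts
    return ordered[:n_train], ordered[n_train:]


# Hypothetical records: ten patients scanned on different days in 2013.
day_offsets = [40, 3, 95, 12, 61, 7, 88, 25, 53, 70]
patients = [
    {"patient_id": f"P{i:02d}", "scan_date": date(2013, 1, 1) + timedelta(days=d)}
    for i, d in enumerate(day_offsets)
]

training, validation = split_by_scan_date(patients)
print(len(training), len(validation))
```

A chronological split of this kind keeps every validation scan later than every training scan, so the validation cohort behaves more like a temporally separate sample than a random split would.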

CT Examination and Imaging Analysis
Although the design of this study was retrospective, only one CT scanner was used: a 128-MDCT scanner (Somatom Definition AS+; Siemens Healthcare Sector, Forchheim, Germany). The scanning parameters were as follows: tube current, 200-210 mA; tube voltage, 120 kV; pitch, 1.0; rotation time, 0.5 seconds; and section thickness, 2.0 mm. Patients were injected intravenously with a nonionic contrast medium (Omnipaque 350; GE Healthcare, Chicago, IL) at a dose of 1.5 mL/kg and a rate of 3 mL/s. A three-phase scan was triggered when the aorta reached a threshold of 100 HU. However, only the portal venous phase (PVP) images were used to extract texture parameters.

Follow-up
The end point of this study was OS, which was defined as the time from the date of RC resection until death, or until the day the patient was last known to be relapse-free. All patients were followed up continuously for at least 3 years, until the patient died or was lost to follow-up. Postoperative recurrence was monitored according to standard NCCN guidelines, including clinical evaluation and CEA and CA19-9 testing every 3 months for the first year and every 6 months until the 3rd year (17). Other standard testing included yearly CT scans as well as surveillance colonoscopy until the 3rd year (17).

Radiomic Features Extraction
Using ITK-Snap software (available at www.itk-snap.org), a radiologist manually delineated the regions of interest (ROIs) within the tumor (about 1 mm from the tumor margin) (Figure 2) to define the three-dimensional (3D) volume in PVP images (18). Calcification, blood vessels and cystic areas within the tumor were excluded. Texture characteristics were extracted and analyzed for all patients using A.K. software (Analysis-Kit; GE Healthcare). A total of 396 texture features of six kinds were extracted: gray level histogram, Haralick features, gray level co-occurrence matrix (GLCM), gray level size zone matrix, run-length matrix, and form factor features. The histogram described the intensity of the pixels, and the run-length matrix described pixel runs in particular directions (19). The Haralick features and co-occurrence matrix provided information on the distribution of gray level values of pixel pairs in all directions (20). The gray level size zone matrix was effective for characterizing texture homogeneity, or a speckle-like texture (32,33). Using the histogram features, we extracted the parameters of the radiomic features and produced a qualitative or quantitative description. A series of matrix transformations, such as gray level co-occurrence matrices and run-length matrices, reflected the higher-order texture of the ROI. We analyzed the ROI characteristics across various resolution levels with the wavelet transformation. With the filter-transform texture, we obtained a set of target features by using various filter types such as log transformation and Gaussian transformation (21).

Model-building and evaluation
Because of the large number of texture features and the relatively limited size of the cohort, redundant texture features needed to be eliminated. Therefore, we analyzed the correlation between every pair of radiomic features, and features were considered redundant if the linear correlation coefficient was above 0.6 (22). To avoid overfitting in the high-dimensional data analysis, we adopted least absolute shrinkage and selection operator (LASSO) regression (22). Optimizing the tuning parameter (λ) in the LASSO regression shrank the coefficients of most features to zero and selected the remaining features with non-zero coefficients (22). The most distinctive features were thereby identified. The stability of these features was assessed by calculating the intraclass correlation coefficient (ICC) from 30 patients (23). Features with ICC > 0.75 were included. Each radiomic feature was evaluated by univariate logistic regression. To avoid missing important features, the threshold for statistical significance was set at 0.2 (24). The features that were statistically significant in univariate logistic regression analysis were then evaluated for model building with multivariate logistic regression analysis. A radiomics signature was then created in the form of a radiomics score (Rad-score), and a combined model incorporating the preoperative clinical risk factors and the Rad-score was created to predict LN status. Finally, we established a nomogram for model visualization, and performed receiver operating characteristic (ROC) curve analysis to evaluate the diagnostic performance for predicting LN metastasis of RC. We also plotted Kaplan-Meier (K-M) survival curves to investigate the association of the prediction models with 3-year OS of RC patients.
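The redundancy step can be sketched as follows. This is an illustrative sketch only: the text specifies the 0.6 threshold on the linear correlation coefficient but not which member of a correlated pair is retained, so the greedy "keep the first feature seen" rule below is an assumption, as are the feature names and values.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.6):
    """Keep a feature only if |r| with every already kept feature is <= threshold.

    Assumption: features are visited in insertion order and the first of a
    correlated pair is retained; the paper does not specify this rule.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature values for six patients.
features = {
    "feat_a": [1, 2, 3, 4, 5, 6],
    "feat_b": [2, 4, 6, 8, 10, 13],  # nearly proportional to feat_a
    "feat_c": [5, 1, 4, 2, 6, 3],    # weakly correlated with feat_a
}

print(drop_correlated(features))
```

The surviving features would then be passed to the LASSO, ICC, and logistic regression steps described above.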

Statistical analysis
All data analyses were performed using SPSS version 22 (
A nomogram was established to visualize the combined model (Figure 4). To use the nomogram, the points corresponding to the value of each variable are read from the respective axis and added together. Then, by drawing a line from the total points axis to the probability axis, the risk of LN metastasis can be assessed. A higher total score corresponded to a higher risk of LN metastasis. The contributions of these factors to the Rad-score and the combined model were measured by the standardized logistic regression coefficient. The contribution of "Compactness2" to the Rad-score was greater than that of the other feature (Figure 5), and that of "cN stage" to the combined model was the highest (Figure 5).
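A nomogram of this kind is a graphical form of a logistic regression model: the total points correspond to the linear predictor, which the probability axis maps through the logistic function. The sketch below shows that mapping under stated assumptions. The intercept, the coefficients, and the numeric coding of cT and cN stage are hypothetical placeholders, not the fitted values of this study.

```python
import math


def predicted_probability(intercept, coefficients, values):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear_predictor = intercept + sum(
        coefficients[name] * values[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical coefficients, chosen only to illustrate the calculation.
hypothetical_intercept = -3.0
hypothetical_coefficients = {"cT_stage": 0.5, "cN_stage": 1.2, "rad_score": 0.8}

# Two hypothetical patients; the stage coding (integers) is an assumption.
low = predicted_probability(
    hypothetical_intercept,
    hypothetical_coefficients,
    {"cT_stage": 2, "cN_stage": 0, "rad_score": -0.5},
)
high = predicted_probability(
    hypothetical_intercept,
    hypothetical_coefficients,
    {"cT_stage": 3, "cN_stage": 1, "rad_score": 0.5},
)
print(f"low-risk example: {low:.3f}, high-risk example: {high:.3f}")
```

With positive coefficients, a higher value of any predictor raises the linear predictor and therefore the predicted probability, which matches the statement that a higher total score corresponds to a higher risk.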

Model for LN status prediction
If only the radiomics signature (Rad-score) was considered, the AUC was 0.674 (95% CI, 0.575-0.762) (Figure 6 & Table 2). In order to improve predictive performance for LN status, the radiomic features could be [...] Table 2), respectively. The clinical variables contributed more to the combined model than the Rad-score, which corresponded to the higher standardized logistic regression coefficient of preoperative cN stage than that of the Rad-score (Figure 5). Although the clinical model showed a higher AUC value than the Rad-score, the Rad-score helped to increase the AUC value from 0.818 to 0.847.

Combined model performance for OS
The combined model stratified patients into low-risk and high-risk groups. Significant differences existed in OS between the predicted low-risk and high-risk groups in both the training (p=0.026) and validation (p=0.034) cohorts (Figure 7).
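The survival curves compared between risk groups rest on the Kaplan-Meier product-limit estimator, which can be sketched as below. The follow-up times and event indicators are hypothetical, and the sketch covers only the estimator for a single group; it does not reproduce the test used to obtain the p-values reported above.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of survival.

    times  : follow-up time for each patient (e.g. months)
    events : 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve


# Hypothetical follow-up data for one risk group (months, event flag).
months = [6, 12, 12, 20, 30, 36]
died = [1, 1, 0, 1, 0, 0]

km_curve = kaplan_meier(months, died)
for t, s in km_curve:
    print(t, round(s, 4))
```

Applying the same function separately to the high-risk and low-risk groups would give the two curves that are compared in a figure such as Figure 7.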

Discussion
Preoperative LN stage is essential to therapeutic strategy, and is one of the major predictive risk factors for RC patients without distant metastasis. Unfortunately, it remains difficult to predict LN metastasis with existing methods (26). We therefore constructed a combined clinical-radiomic model to preoperatively predict LN status, and to classify patients into high- and low-risk classes for 3-year OS analysis. Our results suggested that the combined model could improve diagnostic performance compared to the model with clinical risk factors alone, and that 3-year OS differed significantly between high- and low-risk groups in both the training and validation cohorts.
Compared with traditional imaging methods that assess anatomical changes of LNs, the advantage of radiomics is its use of large numbers of quantitative features to reflect the inherent heterogeneity of the lesion (27,28), which is invisible to the human eye. Recently, radiomic analysis has emerged as a powerful method for developing decision-making models. Radiomic-based predictive models for advanced nasopharyngeal carcinoma, early-stage non-small cell lung cancer, and rectal cancer (14)(15)(16) have been reported. Our study demonstrated that 2 robust radiomic features were associated with LN status and prognosis for RC patients.
Regarding the clinical features, preoperative cT and cN stage had high standardized logistic regression coefficients in the combined model, which is consistent with the general understanding that cT and cN stage correlate significantly with pathological LN metastasis. Other clinical risk factors were not significantly associated with LN metastasis or 3-year OS.
In this study, the AUC value of the Rad-score was lower than that of the clinical risk factors. This might be because we included RC patients at various stages, and the disparity in clinical features was greater than that in radiomic features among RC at various stages. Even so, the contribution of the Rad-score to the combined model was greater than that of cT stage. Moreover, adding the Rad-score helped to improve the AUC value from 0.818 to 0.847 in the training cohort and from 0.739 to 0.782 in the validation cohort.
Compared with some previous articles on a similar topic (21,24), our combined model showed better predictive value. The reason for this may be the better data processing flow and control of heterogeneity. First, the LASSO method not only surpasses the method of selecting predictors based on the strength of their univariable association with outcome, but also allows the panel of selected characteristics to be combined into a radiomic signature. Furthermore, all CT images used in our research were obtained from the same CT scanner, and patients were treated only with surgery. Besides better predictive value, our research associated the combined model with prognosis.
However, there were several limitations in this study. Firstly, as this was a retrospective study, there could be unavoidable selection bias; therefore, prospective and external validation studies are necessary. Secondly, this research was conducted at a single institution, so multicenter validation is needed to establish the generalizability of the results. Thirdly, this study utilized only one imaging modality, which limited the number of extracted radiomic features. If more imaging modalities (such as MRI and PET-CT) are integrated, the feature pool could be extended effectively to provide more useful radiomic information.

Conclusions
This clinical-radiomic model could be conveniently used to promote the preoperative prediction of LN metastasis and prognosis in RC patients.

Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author at wangziqiang@scu.edu.cn on reasonable request.

Competing interests
The authors declare that they have no competing interests.

Figure 1: The flow diagram for patient enrollment.