An End-to-End Deep Learning Proportional Hazard Regression Model for Prognosis in Patients with Colorectal Cancer: A Multi-modal and Multi-center Study

DOI: https://doi.org/10.21203/rs.3.rs-1786757/v1

Abstract

Purpose

Prognostic survival analysis models are highly valuable for colorectal cancer (CRC) patients because they can guide personalized precision medicine. Current survival models require extensive feature engineering that depends on heuristic expertise, and they are not well suited to multi-modal and multi-center data. We developed an end-to-end deep learning proportional hazards regression model (Deep-Surv) based on 3D convolutional neural networks (CNN) and an attention mechanism for predicting the prognosis of CRC patients.

Method

This study included 564 CT exams of 282 patients with stage II-III CRC from two independent centers who underwent non-enhanced and enhanced CT before total resection between November 2011 and February 2019. Exams from Center1 (n = 294) formed the training dataset and exams from Center2 (n = 270) formed the validation dataset. We also developed clinical and radiomics models for comparison (clinical composite model: CS; radiomics model: Radiomics-Surv). Furthermore, we compared non-enhanced CT, enhanced CT, and their combination as model inputs for disease-free survival (DFS) prediction.

Results

The concordance index (C-index) of Deep-Surv was higher than those of Radiomics-Surv and CS (training, 0.84 vs 0.7 vs 0.63; validation, 0.76 vs 0.67 vs 0.62). In the training dataset, Deep-Surv showed better risk stratification than CS and Radiomics-Surv, and this was confirmed in the validation dataset (training: hazard ratio [HR], 5.83 [95% confidence interval {CI}, 3.532–9.692], P < 0.0001; validation: HR, 3.63 [95% CI, 2.302–5.709], P < 0.0001).

Conclusions

Our study demonstrated that Deep-Surv is a powerful model for prognosis assessment and meets the need for a multi-modal, multi-center, standardized risk stratification system that can advance noninvasive precision medicine.

1. Introduction

Colorectal cancer (CRC) is the third most common cancer worldwide and a major threat to human health (Siegel et al. 2018). CRC is treated comprehensively, with surgery as the mainstay, and stage II-III disease accounts for most CRC patients. The overall 5-year survival rate for CRC is approximately 60%, and 30–40% of patients experience recurrence after treatment (Sargent et al. 2005; Jeffery et al. 2019). Improving the prognosis of stage II-III disease is therefore key to better disease-free survival (DFS) in CRC. Patients with stage II CRC generally have a good prognosis after surgery, but approximately one-quarter will have a recurrence within 5 years. Stage III patients have a higher risk of recurrence and metastasis and benefit more from postoperative adjuvant chemotherapy, so their prognosis is assessed with the American Joint Committee on Cancer (AJCC) tumor-node-metastasis (TNM) staging system (Amin et al. 2017). However, prognosis does not increase monotonically across TNM stages for CRC: stage IIB/C (T4a/b N0) has a significantly worse prognosis than stage IIIA (T1-2 N1), which reduces the system's accuracy and reliability in clinical application (Li et al. 2014). Clinical practice therefore needs accurate adjunctive tools that improve on the existing TNM staging system and help clinicians predict the prognosis of CRC patients.

The most widely used model in clinical survival analysis is the Cox proportional hazards model (CoxPH), a semi-parametric model that assumes the covariates act multiplicatively on an unspecified baseline hazard, so that hazard ratios are constant over time, and that the log-hazard is a linear function of the covariates (Cox 1972; Lin and Zelterman 2012). Assuming a linear log-risk function may be too simplistic in many applications, such as providing personalized treatment recommendations. CoxPH has commonly been used for survival analysis of radiomics biomarkers (Lubner et al. 2015; Chee et al. 2017; Negreros-Osuna et al. 2020). Radiomics biomarkers are handcrafted features, such as intensity histograms, shape attributes, and textures, computed from CT images with human-defined and curated quantitative formulas, and they may therefore be biased (Chalkidou et al. 2015; Hosny et al. 2019).
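For reference, the standard textbook form of the Cox model and of the partial likelihood used to fit it is shown below, assuming no tied event times; the notation is ours rather than the authors'. Here \(h_0(t)\) is the baseline hazard, \(\mathbf{x}_i\) the covariates of patient \(i\), \(\delta_i\) the event indicator (1 = event, 0 = censored), and \(R(t_i) = \{j : t_j \ge t_i\}\) the risk set at time \(t_i\).

```latex
h(t \mid \mathbf{x}_i) = h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_i\right),
\qquad
\frac{h(t \mid \mathbf{x}_i)}{h(t \mid \mathbf{x}_j)}
  = \exp\!\left(\boldsymbol{\beta}^{\top}(\mathbf{x}_i - \mathbf{x}_j)\right)

L(\boldsymbol{\beta}) = \prod_{i:\,\delta_i = 1}
  \frac{\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_i\right)}
       {\sum_{j \in R(t_i)} \exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_j\right)}
```

The hazard ratio on the right of the first line does not depend on \(t\), which is the proportional hazards assumption, and \(\boldsymbol{\beta}^{\top}\mathbf{x}\) is linear in the covariates. Deep proportional hazards models generally keep this likelihood but replace \(\boldsymbol{\beta}^{\top}\mathbf{x}\) with the output of a neural network.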

Deep learning models have distinct advantages over radiomics approaches because they learn features automatically without human intervention, and they can make predictions on unseen data by mining data patterns rather than assuming data distributions. Previous research has also shown that deep learning has great potential for survival prediction in patients with lung, gastric, and colorectal cancers (Hirasawa et al. 2018; Kather et al. 2019; Xu et al. 2019; Kim et al. 2020; Zhang et al. 2020). Although these methods have shown promising performance, deep learning for predicting DFS in CRC patients from CT images remains largely unexplored. In addition, most of these methods transfer poorly and cannot be adapted to multi-center and multi-modal data.

The primary aim of our study was to propose a deep learning-based survival analysis model. To address the challenges above, we included 282 patients who underwent total resection of stage II-III CRC at two centers (multi-center), of whom 147 formed the training dataset and 135 the validation dataset, and we designed a deep learning architecture that combines a convolutional neural network with an attention mechanism. In addition, we built a clinical survival model using clinical characteristics as a baseline and a radiomics model using radiomics biomarkers. Figure 1 gives an overview of the study, showing the deep learning model together with the clinical and radiomics models.

2. Methods

2.1 Study patients

This was a multi-center retrospective study. The datasets comprised patients with stage II-III CRC who underwent surgery between November 2011 and February 2019 at the Affiliated Hospital of Jiangnan University (Center1) and the Wuxi No.4 People's Hospital (Center2). Use of the non-enhanced and enhanced CT images of human CRC in this study was approved by the Committee for the Ethical Review of Research, the Affiliated Hospital of Jiangnan University, China, and by the Wuxi No.4 People's Hospital, China. The institutional review boards waived the requirement for written informed consent because of the retrospective design. Inclusion criteria were: colon or rectal cancer resection, postoperative diagnosis of colorectal adenocarcinoma, complete clinical and pathological data, and no distant organ metastases confirmed preoperatively or intraoperatively. Exclusion criteria were: concurrent other malignant tumors, preoperative radiotherapy, palliative tumor resection, and loss to follow-up (Fig. 2). In total, 294 exams of 147 eligible patients from Center1 formed the training dataset, and 270 exams of 135 patients from Center2 formed the validation dataset.

2.2 Study End Point

The endpoint of this study was DFS, defined as the time from the CT scan to local recurrence or distant metastasis confirmed by imaging or biological evidence, or to death from CRC; patients without an event were censored at the last follow-up (Xie et al. 2017). Local recurrence was defined as reappearance of the tumor at the primary site or anastomotic area, confirmed by pathological examination. The minimum follow-up for DFS was 36 months after the first CT study and the maximum was 98 months (median, 44 months). Patients were followed up every 3 months until an endpoint event occurred, and all surviving patients were followed for at least 5 years.
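The DFS definition above reduces each patient to a (time, event) pair, which is what every survival model and metric in this study consumes. The sketch below shows one way to encode it, assuming per-patient dates are available and that months are approximated as 30.44 days; the function name and the conversion factor are our own illustrative choices, not taken from the study.

```python
from datetime import date
from typing import Optional, Tuple

# Assumption: a month is approximated by the mean Gregorian month length in days.
DAYS_PER_MONTH = 30.44


def dfs_time_event(
    ct_date: date, event_date: Optional[date], last_followup: date
) -> Tuple[float, int]:
    """Return (DFS time in months, event indicator) for one patient.

    event_date is the date of local recurrence, distant metastasis, or death
    from CRC, whichever came first; None means no event was observed, so the
    patient is censored at the last follow-up (event indicator 0).
    """
    end = event_date if event_date is not None else last_followup
    if end < ct_date:
        raise ValueError("end date precedes the CT scan")
    months = (end - ct_date).days / DAYS_PER_MONTH
    return months, int(event_date is not None)


# Example: recurrence one year after the CT scan gives roughly 12 months, event = 1.
print(dfs_time_event(date(2015, 1, 1), date(2016, 1, 1), date(2019, 1, 1)))
```

Censored patients keep their full follow-up time with event = 0, so they still contribute information up to the last visit.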

2.3 CT Image Acquisition and Volume of Interest (VOI) Delineation

All patients underwent a non-enhanced CT scan followed by an enhanced scan. Scans were performed on a Philips Brilliance 16-slice spiral CT, a Somatom Sensation 64 (Siemens Medical Solutions, Forchheim, Germany), or a GE Optima CT 660 scanner (GE Medical Systems, Milwaukee, Wisconsin), with a slice thickness and spacing of 5 mm, tube voltage of 120 kV, tube current of 120 mA, matrix of 512×512, and pitch of 0.984. After the non-enhanced images were acquired, 90 ml of the contrast agent ioversol (320 mg/ml) was injected through a median cubital vein with an automatic high-pressure injector at 2.5 ml/s. Portal venous phase images were then acquired with a 70 s delay after injection.

The VOI was delineated by three radiologists, each with 10 years of experience in diagnosing digestive tract tumors, who manually drew a rectangular box slice by slice on the CT sequence so as to include the whole tumor as far as possible. The VOI was then downsampled to a uniform size of 64 × 64 × 20 and used as the input of Deep-Surv.
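The crop-and-resize step can be sketched as follows. The text does not state the interpolation method, so this assumes a simple index-mapping (nearest-neighbour style, no interpolation) resampler and reads "64 × 64 × 20" as 20 slices of 64 × 64 pixels; the helper names are hypothetical.

```python
from typing import List, Tuple

# A volume is indexed as volume[z][y][x] (slice, row, column).
Volume = List[List[List[float]]]


def crop_box(volume: Volume, z0: int, z1: int, y0: int, y1: int, x0: int, x1: int) -> Volume:
    """Cut a rectangular box (half-open index ranges) out of a CT volume."""
    return [[row[x0:x1] for row in sl[y0:y1]] for sl in volume[z0:z1]]


def _map_index(i: int, n_in: int, n_out: int) -> int:
    """Map output index i onto the input grid by scaling and flooring."""
    return min(int(i * n_in / n_out), n_in - 1)


def resample_nearest(volume: Volume, out_shape: Tuple[int, int, int] = (20, 64, 64)) -> Volume:
    """Resample to out_shape = (slices, rows, cols) by index mapping (no interpolation)."""
    in_d, in_h, in_w = len(volume), len(volume[0]), len(volume[0][0])
    out_d, out_h, out_w = out_shape
    zs = [_map_index(z, in_d, out_d) for z in range(out_d)]
    ys = [_map_index(y, in_h, out_h) for y in range(out_h)]
    xs = [_map_index(x, in_w, out_w) for x in range(out_w)]
    return [[[volume[z][y][x] for x in xs] for y in ys] for z in zs]
```

Because the bounding box is drawn per patient, its size varies, and resampling to a fixed shape is what lets every VOI share one network input layer; a trilinear resampler would be a smoother alternative.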

2.4 Model structure of Deep-Surv

The Deep-Surv model is built on a three-dimensional convolutional neural network. It consists of three convolutional layers, a correlation attention (CA) layer, and a sum fusion layer (Fig. 3). The input to the model is the VOI, which passes through two convolutional layers to produce high-dimensional features that are fed to the CA module. The CA layer is based on the attention mechanism in computer vision and aims to capture the correlation between enhanced and non-enhanced CT, exploiting and enhancing their interdependencies (Fu et al. 2018; Zhou et al. 2020). The sum fusion layer fuses the two high-dimensional feature maps channel-wise. Finally, the risk score is obtained through one convolutional layer and a fully connected layer with dropout. All convolution filters in the network have a size of \(3\times 3\times 3\), and each convolutional layer is followed by batch normalization, a ReLU activation function, and max-pooling. Dropout is a regularization technique that helps avoid overfitting. Further details of Deep-Surv are given in Part S1.
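The exact CA layer is only described in Part S1, so the following is an illustrative sketch rather than the authors' implementation. It assumes a DANet-style position attention (Fu et al. 2018) applied across the two CT phases: each phase's feature positions attend to all positions of the other phase via dot-product similarity, the attended features are added back as a residual, and the two refined maps are summed. Learned query/key/value projections are omitted and the residual scale `gamma` is fixed (DANet learns it); all names are hypothetical.

```python
import math
from typing import List

# A feature map flattened to N spatial positions, each a C-dimensional vector.
Features = List[List[float]]


def softmax(values: List[float]) -> List[float]:
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_phase_attention(query: Features, context: Features, gamma: float = 1.0) -> Features:
    """Refine `query` features with attention over all positions of `context`."""
    refined = []
    for q in query:
        # Dot-product similarity between this position and every context position.
        weights = softmax([sum(a * b for a, b in zip(q, k)) for k in context])
        # Attention-weighted average of the context feature vectors.
        attended = [sum(w * k[c] for w, k in zip(weights, context)) for c in range(len(q))]
        refined.append([qc + gamma * ac for qc, ac in zip(q, attended)])
    return refined


def sum_fusion(a: Features, b: Features) -> Features:
    """Element-wise sum of two feature maps with the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse_phases(non_enhanced: Features, enhanced: Features, gamma: float = 1.0) -> Features:
    """Each phase attends to the other, then the two refined maps are summed."""
    ne_refined = cross_phase_attention(non_enhanced, enhanced, gamma)
    ce_refined = cross_phase_attention(enhanced, non_enhanced, gamma)
    return sum_fusion(ne_refined, ce_refined)
```

With `gamma = 0` the block reduces to plain sum fusion of the two phases, which makes the contribution of the cross-phase attention easy to isolate in an ablation.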

2.5 Assessment of Deep-Surv

To evaluate the effectiveness of the deep learning models, we designed comparison models.

(1) Radiomics features for survival analysis (Radiomics-Surv): we extracted radiomics features with PyRadiomics¹, the open-source toolkit referenced in the Image Biomarker Standardisation Initiative (IBSI) guideline (van Griethuysen et al. 2017; Zwanenburg et al. 2020). The features included first-order statistics, 3D shape-based features, and features from the gray level co-occurrence matrix (GLCM), gray level size zone matrix (GLSZM), gray level run length matrix (GLRLM), neighbouring gray tone difference matrix (NGTDM), and gray level dependence matrix (GLDM) (van Griethuysen et al. 2017; Zwanenburg et al. 2020). Each feature was independently normalized to zero mean and unit variance. To remove redundant features, we applied LASSO feature selection with tenfold cross-validation. Finally, 16 features were retained as radiomics markers and used to fit a Cox proportional hazards model to predict DFS. Further details of Radiomics-Surv are given in Part S2. The output of the radiomics model is a risk score (Fig. 3).

(2) Clinical features for survival analysis: the clinical characteristics of patients in the training dataset were screened by univariate log-rank analysis to identify potential risk factors, and confounding factors were then excluded by multivariate logistic regression. Each screened clinical characteristic was fitted separately to a Cox proportional hazards model to construct several clinical models (Fig. 3). We then constructed a composite model (CS) that integrated all screened clinical characteristics.
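Both comparison models in items (1) and (2) end in a Cox linear predictor computed from normalized features. A minimal sketch of that final step follows, assuming normalization statistics are estimated on the training set only and reused on the validation set; the coefficients stand in for the output of the LASSO-penalized or clinical Cox fit and are hypothetical, as are the function names.

```python
import statistics
from typing import List, Sequence, Tuple


def fit_zscore(train_rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature mean and population standard deviation from training rows only."""
    columns = list(zip(*train_rows))
    means = [statistics.fmean(col) for col in columns]
    # Guard against constant features, whose standard deviation is zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_zscore(rows: Sequence[Sequence[float]], means: List[float], stds: List[float]) -> List[List[float]]:
    """Normalize any cohort with the training statistics (no refitting)."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def linear_risk_score(row: Sequence[float], coefficients: Sequence[float]) -> float:
    """Cox linear predictor beta^T x; larger values mean higher predicted hazard."""
    return sum(b * x for b, x in zip(coefficients, row))
```

Reusing the training means and standard deviations on the external cohort keeps the validation data from leaking into the model, which matters for a two-center design.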

Each survival model was applied to the training and validation datasets to obtain predicted risk scores, which were compared with actual survival using the C-index. The median risk score of the training dataset was used as a cut-off to compare DFS between high- and low-risk groups. Finally, to assess the value of each CT type as model input, we also built deep learning models using a single CT type for comparison.
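The C-index used above measures how often the model ranks patients in the right order. A minimal sketch of Harrell's C for right-censored data is shown below; it is a textbook version for illustration, not necessarily the implementation the authors used, and for simplicity it skips pairs with tied observed times.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int], risks: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair is comparable when the patient with the shorter observed time had
    an event; it is concordant when that patient also has the higher risk
    score. Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[j] < times[i]:
            i, j = j, i  # make i the patient with the shorter observed time
        if times[i] == times[j] or not events[i]:
            continue  # tied times, or the earlier patient was censored
        comparable += 1
        if risks[i] > risks[j]:
            concordant += 1.0
        elif risks[i] == risks[j]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the 0.84 (training) and 0.76 (validation) of Deep-Surv are reported.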

2.6 Statistical analysis

Statistical analyses were performed with R software² (version 4.1.1) and IBM SPSS Statistics 26³. Continuous variables were compared with the Mann–Whitney U test and categorical variables with Pearson's chi-squared test. Univariate analysis was performed with the log-rank test, and significant factors were entered into logistic regression for multivariate analysis. The difference between the high- and low-risk groups was assessed with a weighted log-rank test (the G-rho test with rho = 1) (Buyske et al. 2000); p < 0.05 was considered statistically significant.
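The G-rho weighted log-rank test can be sketched as follows. This follows the standard Harrington-Fleming construction, weighting each event time by the pooled Kaplan-Meier estimate just before that time raised to the power rho; rho = 0 gives the ordinary log-rank test and rho = 1 emphasizes early differences. In R, `survival::survdiff` exposes the same family through its `rho` argument. The Python function below is our own illustration, not the code used in the study.

```python
import math
from typing import Sequence, Tuple


def g_rho_test(
    times: Sequence[float], events: Sequence[int], groups: Sequence[int], rho: float = 1.0
) -> Tuple[float, float]:
    """Two-sample G-rho test; groups holds 0/1 labels. Returns (chi-square, p), 1 df."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    surv = 1.0  # pooled Kaplan-Meier estimate just before the current event time
    score = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for tt in times if tt >= t)
        n1 = sum(1 for tt, g in zip(times, groups) if tt >= t and g == 1)
        d = sum(1 for tt, e in zip(times, events) if tt == t and e)
        d1 = sum(1 for tt, e, g in zip(times, events, groups) if tt == t and e and g == 1)
        w = surv ** rho
        score += w * (d1 - d * n1 / n)  # weighted observed minus expected in group 1
        if n > 1:
            variance += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        surv *= 1 - d / n
    if variance == 0:
        raise ValueError("statistic undefined: zero variance")
    chi2 = score * score / variance
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```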

1 https://pyradiomics.readthedocs.io/en/latest

2 https://www.r-project.org

3 https://www.ibm.com/analytics/spss-statistics-software

3. Results

3.1 Patient characteristics

A total of 282 patients were included in this study: 147 patients (age range, 29–89 years; 84 males) in the training dataset and 135 patients (age range, 28–89 years; 76 males) in the validation dataset. The median DFS was 46 months (range, 9–106 months) in the training dataset and 57 months (range, 9–107 months) in the validation dataset. Patient characteristics did not differ significantly between the training and validation datasets (Table 1).

3.2 Univariate and multivariate Analysis

Univariate log-rank analysis was performed on all patients' clinical characteristics and on the risk scores from Radiomics-Surv and Deep-Surv; variables with p < 0.05 were considered potential risk factors for predicting DFS (Table 2). The potential risk factors identified by univariate analysis were age, T stage, N stage, differentiation, perineural invasion (PNI), and lymph node ratio (LNR). These factors were then entered into multivariate analysis, and with a threshold of p < 0.1, age, T stage, N stage, differentiation, PNI, and LNR were taken as independent prognostic factors for colorectal cancer. We built the clinical models and CS from these independent features. Moreover, the outputs of both Radiomics-Surv and Deep-Surv can serve as independent prognostic factors for CRC.

Table 1

Patient characteristics

| Characteristic | Training dataset (n = 147) | Validation dataset (n = 135) | P value** |
|---|---|---|---|
| Age (years)* | 29–89 (65 ± 11) | 28–85 (57 ± 21) | 0.36 |
| Gender | | | 0.79 |
| Male | 84 (57.1) | 75 (55.6) | |
| Female | 63 (42.9) | 60 (44.4) | |
| CEA (ng/mL)* | 1–456.4 (12.7 ± 44.5) | 0.7–227 (7.7 ± 22.3) | 0.23 |
| Location | | | 0.21 |
| Right colon | 25 (17) | 31 (23) | |
| Left colon | 122 (83) | 104 (77) | |
| T stage | | | 0.47 |
| T1-T2 | 11 (8.5) | 14 (10.4) | |
| T3 | 58 (39.5) | 46 (34.1) | |
| T4a | 78 (53) | 75 (55.5) | |
| N stage | | | 0.16 |
| N0 | 64 (43.5) | 56 (41.5) | |
| N1a-1c | 54 (36.7) | 61 (45.2) | |
| N2a-2b | 29 (19.7) | 18 (13.3) | |
| Differentiation | | | 0.5 |
| High | 8 (5.4) | 12 (8.9) | |
| Medium | 136 (92.5) | 121 (89.6) | |
| Low | 3 (2.1) | 2 (1.5) | |
| PNI | | | 0.78 |
| Negative | 111 (75.5) | 100 (74.1) | |
| Positive | 36 (24.5) | 35 (25.9) | |
| LVI | | | 0.81 |
| Negative | 100 (68) | 90 (66.7) | |
| Positive | 47 (32) | 45 (33.3) | |
| LNR | | | 0.16 |
| ≤ 50% | 130 (88.4) | 126 (93.3) | |
| > 50% | 17 (11.6) | 9 (6.7) | |

Values in parentheses are percentages (%).
* Values are range (mean ± SD).
** Continuous variables were tested by the Mann–Whitney U test and categorical variables by Pearson's chi-squared test.
CEA, carcinoembryonic antigen; PNI, perineural invasion; LVI, lymphatic vascular invasion; LNR, lymph node ratio; DFS, disease-free survival.

 

3.3 Performance of models

We evaluated the ability of all survival models to discriminate DFS in the training and validation datasets. Deep-Surv performed best, with a mean C-index of 0.84 in the training dataset and 0.76 in the validation dataset (Fig. 4A), outperforming all clinical models, the CS model, and Radiomics-Surv. The C-index of Radiomics-Surv was higher than that of CS (training, 0.7 vs 0.63; validation, 0.67 vs 0.62), suggesting that CT images provide more prognostic information than clinical characteristics. In addition, we compared the time-dependent receiver operating characteristic (ROC) curves (5 years, > 5 years) of the three models (Figure S1); Deep-Surv likewise outperformed the other models in both datasets. The 5-year area under the curve (AUC) of Deep-Surv was 0.82 in the training dataset and 0.77 in the validation dataset. Figure S2 shows the ROC curves of all clinical models for comparison.
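A time-dependent AUC at a fixed horizon (for example 60 months) compares patients who had an event by that horizon with patients still event-free after it. The sketch below is a simplified cumulative/dynamic AUC that drops patients censored before the horizon and applies no inverse-probability-of-censoring weights; the study does not state which estimator it used, so treat this as an illustration only.

```python
from typing import Sequence


def cumulative_dynamic_auc(
    times: Sequence[float], events: Sequence[int], risks: Sequence[float], horizon: float
) -> float:
    """Naive time-dependent AUC at `horizon` (no censoring weights).

    Cases: event at or before the horizon. Controls: observed beyond the
    horizon. Patients censored before the horizon are excluded.
    """
    cases = [r for t, e, r in zip(times, events, risks) if t <= horizon and e]
    controls = [r for t, r in zip(times, risks) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    score = 0.0
    for rc in cases:
        for rn in controls:
            if rc > rn:
                score += 1.0
            elif rc == rn:
                score += 0.5  # ties count as half
    return score / (len(cases) * len(controls))
```

With many patients censored before 5 years, weighting schemes such as inverse probability of censoring weighting are usually preferred over this unweighted version.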

Figure 4C shows box plots of the risk scores predicted by the three models. In the training dataset, the Deep-Surv risk scores separated patients into two subgroups, with the low-risk group more tightly concentrated. Furthermore, the Pearson correlation coefficient between predicted risk scores and actual survival was significantly higher for Deep-Surv (Fig. 4B); the closer the coefficient is to 1 or -1, the stronger the correlation, and the closer it is to 0, the weaker the correlation. The outlying negative correlation between age and PNI in the validation dataset is likely due to data imbalance between the validation and training datasets.

Table 2

Univariable and Multivariable Analyses of DFS

| Parameter | Univariable P value | Univariable HR (95% CI) | Multivariable P value | Multivariable HR (95% CI) |
|---|---|---|---|---|
| Age | < 0.01 | 1.04 (1.02–1.07) | < 0.01 | 1.05 (1.02–1.09) |
| Female | 0.97 | 0.99 (0.58–1.7) | | |
| CEA | 0.94 | 1.0 (0.9–1.1) | | |
| Left colon | 0.89 | 1.05 (0.51–2.15) | | |
| T stage | | | | |
| T3 | 0.73 | 0.85 (0.33–2.18) | 0.89 | 1.09 (0.28–4.39) |
| T4a | < 0.01 | 0.37 (0.19–0.7) | 0.02 | 0.44 (0.19–1.03) |
| N stage | | | | |
| N1a-1c | < 0.01 | 0.27 (0.13–0.55) | 0.03 | 0.34 (0.12–0.95) |
| N2a-2b | 0.1 | 0.6 (0.32–1.13) | 0.6 | 0.65 (0.27–1.58) |
| Differentiation | | | | |
| Medium | 0.04 | 0.14 (0.01–1.0) | 0.07 | 0.15 (0.02–1.2) |
| Low | 0.04 | 0.23 (0.06–0.96) | 0.03 | 0.18 (0.04–0.91) |
| Negative PNI | 0.02 | 0.53 (0.3–0.92) | 0.04 | 0.39 (0.16–0.97) |
| Negative LVI | 0.27 | 0.73 (0.42–1.28) | | |
| LNR ≤ 50% | < 0.01 | 0.36 (0.19–0.69) | 0.44 | 0.67 (0.23–1.86) |
| Radiomics-Surv output | < 0.01 | 3.09 (1.71–5.56) | < 0.01 | 0.3 (0.21–0.36) |
| Deep-Surv output | < 0.01 | 1.68 (1.53–1.84) | < 0.01 | 1.3 (0.3–4.6) |

Univariable P values and hazard ratios were obtained with the log-rank test, and multivariable P values and hazard ratios with logistic regression.
CEA, carcinoembryonic antigen; HR, hazard ratio; CI, confidence intervals; PNI, perineural invasion; LVI, lymphatic vascular invasion; LNR, lymph node ratio.

 

3.4 Survival analysis of models

The outputs of all our survival models are risk scores. We used each of the three models (CS, Radiomics-Surv, Deep-Surv) to assign risk scores to patients and took the median risk score of the training dataset as the threshold for classifying all patients into high- and low-risk groups. In both the training and validation datasets, Deep-Surv stratified patients more effectively than the other models (training: hazard ratio [HR], 5.83 [95% confidence interval {CI}, 3.532–9.692], P < 0.0001; validation: HR, 3.63 [95% CI, 2.302–5.709], P < 0.0001). Kaplan-Meier (KM) curves for the three models are shown in Fig. 5. In addition, we stratified the validation dataset by age, degree of differentiation, T stage, N stage, PNI, and LNR, and used Deep-Surv to divide the patients in each subgroup into high- and low-risk groups (Fig. 6). This stratification analysis showed that Deep-Surv also performed well across subgroups, suggesting that it robustly captures complex relationships between DFS and patient characteristics. Figure S3 shows the KM curves of all clinical models for comparison.
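The stratification step above can be sketched as follows: the cut-off is fixed on the training cohort, applied unchanged to the validation cohort, and a Kaplan-Meier curve is estimated per group. Placing scores exactly at the cut-off in the low-risk group is our assumption (the text does not say), and the function names and example numbers are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def median_threshold(train_risks: Sequence[float]) -> float:
    """Cut-off taken from the training cohort only."""
    return statistics.median(train_risks)


def assign_groups(risks: Sequence[float], threshold: float) -> List[str]:
    # Assumption: scores exactly at the cut-off go to the low-risk group.
    return ["high" if r > threshold else "low" for r in risks]


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as (time, S(t)) at each distinct event time."""
    curve = []
    surv = 1.0
    for t in sorted({tt for tt, e in zip(times, events) if e}):
        n_at_risk = sum(1 for tt in times if tt >= t)
        n_events = sum(1 for tt, e in zip(times, events) if tt == t and e)
        surv *= 1 - n_events / n_at_risk
        curve.append((t, surv))
    return curve


# Usage sketch with made-up scores: the training median is reused on validation.
cut = median_threshold([0.2, 1.4, 0.7, 2.1])
val_groups = assign_groups([0.5, 1.8, 0.9], cut)
```

Fixing the threshold on the training data, rather than recomputing a median in the validation cohort, is what makes the validation hazard ratio an honest test of the stratification rule.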

Decision curve analysis (DCA) evaluates prediction models in terms of clinical utility, which traditional performance metrics do not consider, and can incorporate patient or decision-maker preferences into the analysis. We plotted DCA curves for the three models based on risk scores (Figure S4A) and the calibration curve for Deep-Surv (Figure S4B). The DCA curves showed that at threshold probabilities above 10%, using Deep-Surv to predict DFS provided more net benefit than Radiomics-Surv and CS. The calibration curve of Deep-Surv showed good agreement between predicted and observed DFS in the validation dataset.
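A decision curve plots net benefit against the threshold probability. The sketch below computes the standard net benefit, TP/n minus FP/n weighted by pt/(1 − pt), under simplifying assumptions: each patient's risk has already been converted into a predicted probability of an event by a fixed horizon, and the outcome at that horizon is known (no censoring adjustment). The study does not state how risk scores were mapped to probabilities, and the function names are hypothetical.

```python
from typing import Sequence


def net_benefit(probs: Sequence[float], outcomes: Sequence[int], threshold: float) -> float:
    """Net benefit of treating patients whose predicted probability >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(probs)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and not y)
    # False positives are penalized by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


def treat_all_net_benefit(outcomes: Sequence[int], threshold: float) -> float:
    """Reference strategy: treat every patient regardless of the model."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

A model adds value at a given threshold when its net benefit exceeds both the treat-all curve and zero (treat none); the text's claim concerns thresholds above 10%.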

3.5 Values of non-enhanced CT and enhanced CT for DFS prediction

To compare the influence of non-enhanced and enhanced CT as model inputs for DFS prediction, we trained our model with non-enhanced CT, enhanced CT, and non-enhanced + enhanced CT as inputs, respectively. As shown in Fig. 4D, the non-enhanced + enhanced CT input performed best in both the training dataset (C-index 0.85, 95% CI 0.81–0.88) and the validation dataset (C-index 0.78, 95% CI 0.73–0.82), differing significantly from the single-input models (both P < 0.01). This supports the effectiveness of the attention mechanism in our model, which learns the features most relevant to CRC prognosis from both CT types.
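The text reports 95% CIs for the C-index without naming the method. One common choice is a percentile bootstrap over patients, sketched below as an assumption; the function accepts any metric with the signature `metric(times, events, risks)`, so a C-index implementation can be passed in directly. Names and defaults are hypothetical.

```python
import random
from typing import Callable, Sequence, Tuple

Metric = Callable[[Sequence[float], Sequence[int], Sequence[float]], float]


def bootstrap_ci(
    metric: Metric,
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Point estimate and percentile bootstrap CI, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(times)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            estimates.append(metric([times[k] for k in idx], [events[k] for k in idx], [risks[k] for k in idx]))
        except (ValueError, ZeroDivisionError):
            continue  # skip degenerate resamples, e.g. no comparable pairs
    if not estimates:
        raise ValueError("all bootstrap resamples were degenerate")
    estimates.sort()
    lower = estimates[int(alpha / 2 * len(estimates))]
    upper = estimates[min(int((1 - alpha / 2) * len(estimates)), len(estimates) - 1)]
    return metric(times, events, risks), lower, upper
```

Testing whether two input configurations differ significantly would additionally require a paired comparison on the same resamples, which this sketch does not attempt.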

4. Discussion

In this study, we developed an end-to-end deep learning proportional hazards regression model (Deep-Surv) based on CT images for predicting survival after surgical resection of stage II-III CRC. The training and validation datasets were constructed from two centers. We quantitatively evaluated the ability of clinical features, radiomics features, and deep learning features to predict DFS (Table 2). Univariate analysis (Table 2) suggested that age, T stage, N stage, differentiation, PNI, LNR, and the outputs of Radiomics-Surv and Deep-Surv could all serve as independent factors, and these were further validated in the multivariate analysis. Survival models built on independent prognostic factors and evaluated with multiple criteria on the training and validation datasets confirmed that Deep-Surv improved prognostic prediction compared with Radiomics-Surv and CS (C-index: training, 0.84 vs 0.7 vs 0.63; validation, 0.76 vs 0.67 vs 0.62; AUC: training, 0.82 vs 0.69 vs 0.61; validation, 0.77 vs 0.62 vs 0.56). This result also suggests that the deep learning method can extract more informative features than semantic phenotypic features and can handle more complex relationships between features. In a secondary analysis, survival analysis was framed as a classification problem that partitioned the training and validation datasets into high- and low-risk groups, to increase the generalizability of the results. The deep learning-based classifier performed better on both the training and validation datasets (training: HR, 5.83 [95% CI, 3.532–9.692], P < 0.0001; validation: HR, 3.63 [95% CI, 2.302–5.709], P < 0.0001). In addition, the ability to separate high- and low-risk patients persisted in subgroups defined by clinical characteristics (Fig. 6), with significant partitioning in the T stage (T1-T3: HR, 3.106 [95% CI, 1.37–7.039], P < 0.05; T4a: HR, 2.309 [95% CI, 1.052–5.069], P < 0.05) and LNR (< 50%: HR, 2.24 [95% CI, 1.227–4.087], P < 0.05) subgroups. These results further support the effectiveness and robustness of Deep-Surv's output as an independent prognostic factor.

TNM is the most commonly used staging system for CRC and the current benchmark for treatment decisions in patients with CRC. For personalized treatment, however, studies have identified drawbacks: TNM is based mainly on expert opinion and uses a limited set of features, which makes its staging performance controversial (Li et al. 2018). In particular, the anomaly that stage IIB/C (T4a/b N0) CRC has a significantly worse prognosis than stage IIIA (T1-2 N1) CRC reduces its accuracy and reliability in clinical application (Li et al. 2014). Our study demonstrated that Deep-Surv was an independent prognostic factor able to stratify risk within staging subgroups (T and N stage). Deep-Surv could therefore serve as a reference indicator to complement TNM: clinicians would first determine the T and N stage and then use Deep-Surv to further stratify risk within each stage, improving clinical decision-making.

Previous radiomics studies have shown that CT imaging features can predict disease survival (Ji et al. 2019; Dong et al. 2019). Because radiomics features are computed with human-defined quantitative formulas, they are susceptible to human bias and information redundancy (Aerts et al. 2014; Berenguer et al. 2018). Extracting radiomics features also relies on a precise delineation of the lesion by a radiologist, which is costly in practice. Deep-Surv requires only a rough bounding box rather than a precise tumor outline, and the neural network learns prognosis-related features by itself, saving labor. Survival analysis of radiomics features commonly uses the KM method, the Cox model, and similar approaches. These methods have two drawbacks: first, the effects of prognostic factors on tumor prognosis may vary over time, and second, linear models handle nonlinear features poorly. As a result, these methods may lose some prognostic information and reduce accuracy.

First, our model is based on deep learning, which makes no assumptions about temporal dependencies. Time is input to the model as a one-dimensional vector, and Deep-Surv learns the complex relationships between features autonomously through 3D convolution and the loss function. Many studies have validated the role of deep learning in survival prediction. Kather et al. (Kather et al. 2019) showed that a CNN can assess the human tumor microenvironment and predict prognosis directly from CRC histopathological images. Kim et al. (Kim et al. 2020) presented a deep learning model for chest CT that predicted disease-free survival for patients undergoing lung surgery. Zhang et al. (Zhang et al. 2020) developed a deep learning risk prediction model for overall survival in patients with gastric cancer. Our study further supports the effectiveness of deep learning on CRC CT images. Second, our model uses two classes of CT input, non-enhanced and enhanced CT. Enhanced CT requires contrast injection and can visualize blood flow in diseased tissue; combined with non-enhanced CT, it provides more accurate information about the lesion. Because the two classes of CT emphasize different aspects of disease, clinicians combine them in clinical practice to diagnose patients. Third, CT images contain a large amount of information, not all of which is relevant to survival prediction. The attention mechanism can automatically select the CT features related to prognosis, and attention mechanisms have been shown to be effective at selecting relevant features in previous studies (Saillard et al. 2020).
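The loss function is not specified in this section. Deep proportional hazards models are commonly trained by minimizing the negative log Cox partial likelihood, in which event times enter only through the risk sets; the sketch below shows that loss under this assumption, using the Breslow convention for ties and a log-sum-exp for numerical stability. The function name is hypothetical.

```python
import math
from typing import Sequence


def cox_neg_log_partial_likelihood(
    risks: Sequence[float], times: Sequence[float], events: Sequence[int]
) -> float:
    """Mean negative log Cox partial likelihood (Breslow convention for ties).

    risks[i] is the model output for patient i (its log-risk); the risk set of
    an event at time t contains every patient whose observed time is >= t.
    """
    total = 0.0
    n_events = 0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored patients only appear inside risk sets
        risk_set = [risks[j] for j, t_j in enumerate(times) if t_j >= t_i]
        m = max(risk_set)  # log-sum-exp with max subtraction
        log_denominator = m + math.log(sum(math.exp(r - m) for r in risk_set))
        total += risks[i] - log_denominator
        n_events += 1
    if n_events == 0:
        raise ValueError("at least one event is required")
    return -total / n_events
```

In a real training loop the same computation would be written with tensor operations so that gradients flow back into the network; the pure-Python version only shows what is being optimized.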

Our study also has limitations. First, although the model accounts for differences between classes of CT images, it could be further combined with other radiological images, such as MRI. Second, our model was validated on a dataset of patients with similar characteristics, but the dataset is small, which may limit the statistical power to distinguish performance between models. Third, the 3D convolutional network used in our model is often described as a “black box” (Nicholson Price 2018); lack of interpretability has long been a drawback of deep learning models.

5. Conclusion

In conclusion, Deep-Surv is an effective survival prediction method with great discriminatory power. With further research, Deep-Surv has the potential to become a standard method to assist physicians in clinical decision-making and personalized medicine.

Declarations

Acknowledgments

We thank all patients who participated in this study. The authors thank the School of Artificial Intelligence and Computer Science, Jiangnan University for providing the instrumentation and technical support. This work is supported in part by the National Key R&D Program of China under [grant numbers 2018YFA0701700, 2017YFC0109402], and is supported by National Natural Science Foundation of China [grant numbers 61602007, 61731008], Zhejiang Provincial Natural Science Foundation of China [grant number LZ15F010001], the University of Macau [grant numbers FHS-CRDA-029-002-2017 and MYRG2018-00071-FHS], and the Science and Technology Development Fund, Macau SAR [File no. 0004/2019/AFJ and 0011/2019/AKP].

Ethics standards

Use of the non-enhanced and enhanced CT images of human CRC in this study was approved by the Committee for the Ethical Review of Research, the Affiliated Hospital of Jiangnan University, China, and by the Wuxi No.4 People's Hospital, China.

Informed consent

The institutional review boards waived the requirement for written informed consent because of the retrospective design.

Declaration of Competing Interests

The authors have declared that no competing interest exists.

Authors’ contributions

Xiang Pan designed the study and edited the manuscript. He Cong performed the image and statistical analyses and drafted the manuscript. Xiaolei Wang and Heng Zhang preprocessed the images. Yuxi Ge and Shudong Hu supervised the study and edited and approved the manuscript.

References

  1. Aerts HJWL, Velazquez ER, Leijenaar RTH, et al (2014) Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nature Communications 5:1–9. https://doi.org/10.1038/NCOMMS5006
  2. Amin MB, Greene FL, et al (2017) The Eighth Edition AJCC Cancer Staging Manual: Continuing to build a bridge from a population-based to a more “personalized” approach to cancer staging. CA: A Cancer Journal for Clinicians 67:93–99. https://doi.org/10.3322/CAAC.21388
  3. Berenguer R, del Rosario Pastor-Juan M, Canales-Vázquez J, et al (2018) Radiomics of CT features may be nonreproducible and redundant: Influence of CT acquisition parameters. Radiology 288:407–415. https://doi.org/10.1148/RADIOL.2018172361
  4. Buyske S, Fagerstrom R, Ying Z (2000) A Class of Weighted Log-Rank Tests for Survival Data When the Event is Rare. J Am Stat Assoc 95:249–258. https://doi.org/10.1080/01621459.2000.10473918
  5. Chalkidou A, O’Doherty MJ, Marsden PK (2015) False Discovery Rates in PET and CT Studies with Texture Features: A Systematic Review. PLOS ONE 10:e0124165. https://doi.org/10.1371/JOURNAL.PONE.0124165
  6. Chee CG, Kim YH, Lee KH, et al (2017) CT texture analysis in patients with locally advanced rectal cancer treated with neoadjuvant chemoradiotherapy: A potential imaging biomarker for treatment response and prognosis. PLOS ONE 12:e0182883. https://doi.org/10.1371/JOURNAL.PONE.0182883
  7. Cox DR (1972) Regression Models and Life-Tables. Journal of the Royal Statistical Society: Series B (Methodological) 34:187–202. https://doi.org/10.1111/J.2517-6161.1972.TB00899.X
  8. Dong D, Tang L, Li ZY, et al (2019) Development and validation of an individualized nomogram to identify occult peritoneal metastasis in patients with advanced gastric cancer. Annals of Oncology 30:431–438. https://doi.org/10.1093/ANNONC/MDZ001
  9. Fu J, Liu J, Tian H, et al (2018) Dual Attention Network for Scene Segmentation. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2019-June:3141–3149. https://doi.org/10.48550/arxiv.1809.02983
  10. Hirasawa T, Aoyama K, Tanimoto T, et al (2018) Application of artificial intelligence using a convolutional neural network for detecting gastric cancer in endoscopic images. Gastric Cancer 21:653–660. https://doi.org/10.1007/S10120-018-0793-2
  11. Hosny A, Aerts HJ, Mak RH (2019) Handcrafted versus deep learning radiomics for prediction of cancer therapy response. Lancet Digit Health 1:e106–e107. https://doi.org/10.1016/S2589-7500(19)30062-7
  12. Jeffery M, Hickey BE, Hider PN (2019) Follow-up strategies for patients treated for non-metastatic colorectal cancer. Cochrane Database of Systematic Reviews 2019:CD002200. https://doi.org/10.1002/14651858.CD002200.PUB4
  13. Ji GW, Zhang YD, Zhang H, et al (2019) Biliary tract cancer at CT: A radiomics-based model to predict lymph node metastasis and survival outcomes. Radiology 290:90–98. https://doi.org/10.1148/RADIOL.2018181408
  14. Kather JN, Krisam J, Charoentong P, et al (2019) Predicting survival from colorectal cancer histology slides using deep learning: A retrospective multicenter study. PLOS Medicine 16:e1002730. https://doi.org/10.1371/JOURNAL.PMED.1002730
  15. Kim H, Goo JM, Lee KH, et al (2020) Preoperative CT-based deep learning model for predicting disease-free survival in patients with lung adenocarcinomas. Radiology 296:216–224. https://doi.org/10.1148/RADIOL.2020192764
  16. Li J, Guo BC, Sun LR, et al (2014) TNM staging of colorectal cancer should be reconsidered by T stage weighting. World Journal of Gastroenterology: WJG 20:5104. https://doi.org/10.3748/WJG.V20.I17.5104
  17. Li W, Zhang L, Tian C, et al (2018) Prognostic value of computed tomography radiomics features in patients with gastric cancer following curative resection. European Radiology 29:3079–3089. https://doi.org/10.1007/S00330-018-5861-9
  18. Lin H, Zelterman D (2002) Modeling Survival Data: Extending the Cox Model. Technometrics 44:85–86. https://doi.org/10.1198/TECH.2002.S656
  19. Lubner MG, Stabo N, Lubner SJ, et al (2015) CT textural analysis of hepatic metastatic colorectal cancer: pre-treatment tumor heterogeneity correlates with pathology and clinical outcomes. Abdominal Imaging 40:2331–2337. https://doi.org/10.1007/S00261-015-0438-4
  20. Negreros-Osuna AA, Parakh A, Corcoran RB, et al (2020) Radiomics texture features in advanced colorectal cancer: Correlation with BRAF mutation and 5-year overall survival. Radiology: Imaging Cancer 2:e190084. https://doi.org/10.1148/RYCAN.2020190084
  21. Price WN (2018) Big data and black-box medical algorithms. Science Translational Medicine 10:eaao5333. https://doi.org/10.1126/SCITRANSLMED.AAO5333
  22. Saillard C, Schmauch B, Laifa O, et al (2020) Predicting Survival After Hepatocellular Carcinoma Resection Using Deep Learning on Histological Slides. Hepatology 72:2000–2013. https://doi.org/10.1002/HEP.31207
  23. Sargent DJ, Wieand HS, Haller DG, et al (2005) Disease-Free Survival Versus Overall Survival As a Primary End Point for Adjuvant Colon Cancer Studies: Individual Patient Data From 20,898 Patients on 18 Randomized Trials. Journal of Clinical Oncology 23:8664–8670. https://doi.org/10.1200/JCO.2005.01.6071
  24. Siegel RL, Miller KD, Jemal A (2018) Cancer statistics, 2018. CA: A Cancer Journal for Clinicians 68:7–30. https://doi.org/10.3322/caac.21442
  25. van Griethuysen JJM, Fedorov A, Parmar C, et al (2017) Computational radiomics system to decode the radiographic phenotype. Cancer Research 77:e104–e107. https://doi.org/10.1158/0008-5472.CAN-17-0339
  26. Xie W, Regan MM, Buyse M, et al (2017) Metastasis-Free Survival Is a Strong Surrogate of Overall Survival in Localized Prostate Cancer. Journal of Clinical Oncology 35:3097. https://doi.org/10.1200/JCO.2017.73.9987
  27. Xu Y, Hosny A, Zeleznik R, et al (2019) Deep learning predicts lung cancer treatment response from serial medical imaging. Clinical Cancer Research 25:3266–3275. https://doi.org/10.1158/1078-0432.CCR-18-2495
  28. Zhang L, Dong D, Zhang W, et al (2020) A deep learning risk prediction model for overall survival in patients with gastric cancer: A multicenter study. Radiotherapy and Oncology 150:73–80. https://doi.org/10.1016/J.RADONC.2020.06.010
  29. Zhou J, Roy SK, Fang P, et al (2020) Cross-Correlated Attention Networks for Person Re-Identification. Image and Vision Computing 100:103931. https://doi.org/10.1016/J.IMAVIS.2020.103931
  30. Zwanenburg A, Vallières M, Abdalah MA, et al (2020) The image biomarker standardization initiative: Standardized quantitative radiomics for high-throughput image-based phenotyping. Radiology 295:328–338. https://doi.org/10.1148/RADIOL.2020191145