Patients’ clinicopathological characteristics
The clinical characteristics of the CRC patients in the primary and validation cohorts are given in Table 1. The detection rate of PM was slightly higher in the primary cohort than in the validation cohort (16.7% vs. 15.7%), but the difference was not significant (P > 0.05). In both cohorts, PM was significantly associated with age, tumour diameter, tumour location, type, T stage, CEA, CA19-9, and CA72-4.
Table 1
Clinicopathological characteristics of patients in the training, internal validation and external validation cohorts (N(%)).
| Characteristics | Groups | Training cohort: non-metastasis (N = 1640) | Training cohort: metastasis (N = 309) | P | Internal validation cohort: non-metastasis (N = 569) | Internal validation cohort: metastasis (N = 135) | P | External validation cohort: non-metastasis (N = 697) | External validation cohort: metastasis (N = 130) | P |
|---|---|---|---|---|---|---|---|---|---|---|
| Gender (%) | Male | 1105 (67.38) | 201 (65.05) | 0.4636 | 379 (66.61) | 77 (57.04) | 0.0463 | 476 (68.29) | 73 (56.15) | 0.0096 |
| | Female | 535 (32.62) | 108 (34.95) | | 190 (33.39) | 58 (42.96) | | 221 (31.71) | 57 (43.85) | |
| Age (%) | < 50 | 438 (26.71) | 101 (32.69) | 0.037 | 137 (24.08) | 43 (31.85) | 0.0798 | 173 (24.82) | 47 (36.15) | 0.01 |
| | ≥ 50 | 1202 (73.29) | 208 (67.31) | | 432 (75.92) | 92 (68.15) | | 524 (75.18) | 83 (63.85) | |
| Smoke (%) | No | 1218 (74.27) | 237 (76.70) | 0.4067 | 439 (77.15) | 104 (77.04) | 1 | 522 (74.89) | 103 (79.23) | 0.3443 |
| | Yes | 422 (25.73) | 72 (23.30) | | 130 (22.85) | 31 (22.96) | | 175 (25.11) | 27 (20.77) | |
| Size (%) | < 20 mm | 389 (23.72) | 13 (4.21) | < 0.0001 | 151 (26.54) | 8 (5.93) | < 0.0001 | 150 (21.52) | 15 (11.54) | 0.0126 |
| | ≥ 20 mm | 1251 (76.28) | 296 (95.79) | | 418 (73.46) | 127 (94.07) | | 547 (78.48) | 115 (88.46) | |
| Differentiation (%) | Unknown | 926 (56.46) | 166 (53.72) | 0.1003 | 335 (58.88) | 73 (54.07) | 0.4095 | 405 (58.11) | 71 (54.62) | 0.2406 |
| | Low | 17 (1.04) | 1 (0.32) | | 8 (1.41) | 1 (0.74) | | 13 (1.87) | 1 (0.77) | |
| | Moderate | 43 (2.62) | 3 (0.97) | | 16 (2.81) | 2 (1.48) | | 22 (3.16) | 1 (0.77) | |
| | High | 654 (39.88) | 139 (44.98) | | 210 (36.91) | 59 (43.70) | | 257 (36.87) | 57 (43.85) | |
| Type (%) | Borrmann I | 384 (23.41) | 31 (10.03) | < 0.0001 | 142 (24.96) | 18 (13.33) | < 0.0001 | 153 (21.95) | 12 (9.23) | < 0.0001 |
| | Borrmann II | 1055 (64.33) | 181 (58.58) | | 355 (62.39) | 75 (55.56) | | 447 (64.13) | 84 (64.62) | |
| | Borrmann III | 114 (6.95) | 88 (28.48) | | 47 (8.26) | 38 (28.15) | | 55 (7.89) | 32 (24.62) | |
| | Borrmann IV | 87 (5.30) | 9 (2.91) | | 25 (4.39) | 4 (2.96) | | 42 (6.03) | 2 (1.54) | |
| Location (%) | Upper | 329 (20.06) | 23 (7.44) | < 0.0001 | 125 (21.97) | 11 (8.15) | < 0.0001 | 144 (20.66) | 10 (7.69) | < 0.0001 |
| | Medium | 464 (28.29) | 142 (45.95) | | 132 (23.20) | 63 (46.67) | | 174 (24.96) | 53 (40.77) | |
| | Lower | 847 (51.65) | 144 (46.60) | | 312 (54.83) | 61 (45.19) | | 379 (54.38) | 67 (51.54) | |
| T stage (%) | Unknown | 418 (25.49) | 104 (33.66) | < 0.0001 | 146 (25.66) | 34 (25.19) | < 0.0001 | 178 (25.54) | 33 (25.38) | < 0.0001 |
| | T1 | 215 (13.11) | 8 (2.59) | | 86 (15.11) | 3 (2.22) | | 107 (15.35) | 2 (1.54) | |
| | T2 | 329 (20.06) | 23 (7.44) | | 119 (20.91) | 15 (11.11) | | 126 (18.08) | 8 (6.15) | |
| | T3 | 678 (41.34) | 174 (56.31) | | 218 (38.31) | 83 (61.48) | | 286 (41.03) | 87 (66.92) | |
| CEA (%) | Negative | 1365 (83.23) | 206 (66.67) | < 0.0001 | 456 (80.14) | 87 (64.44) | 0.0002 | 569 (81.64) | 78 (60.00) | < 0.0001 |
| | Positive | 275 (16.77) | 103 (33.33) | | 113 (19.86) | 48 (35.56) | | 128 (18.36) | 52 (40.00) | |
| CA19-9 (%) | Negative | 1279 (77.99) | 200 (64.72) | < 0.0001 | 439 (77.15) | 74 (54.81) | < 0.0001 | 547 (78.48) | 72 (55.38) | < 0.0001 |
| | Positive | 361 (22.01) | 109 (35.28) | | 130 (22.85) | 61 (45.19) | | 150 (21.52) | 58 (44.62) | |
| CA72-4 (%) | Negative | 1230 (75.00) | 157 (50.81) | < 0.0001 | 453 (79.61) | 62 (45.93) | < 0.0001 | 524 (75.18) | 66 (50.77) | < 0.0001 |
| | Positive | 410 (25.00) | 152 (49.19) | | 116 (20.39) | 73 (54.07) | | 173 (24.82) | 64 (49.23) | |

*Fisher
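The P values for the two-level rows of Table 1 can be checked by hand. The sketch below assumes a Pearson chi-square test with Yates continuity correction; the section does not name the test used (the table footnote mentions only Fisher), so this choice is an assumption. The function name `yates_chi_square_2x2` is illustrative. Under that assumption, the gender counts in the training cohort give a P value close to the reported 0.4636, and the smoking counts in the internal validation cohort give the reported P of 1.

```python
import math

def yates_chi_square_2x2(a, b, c, d):
    """Yates-corrected Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value). With 1 degree of freedom the chi-square
    survival function equals erfc(sqrt(x / 2)), so only the stdlib is needed.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Continuity correction: shrink |ad - bc| by n/2, but never below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    statistic = n * diff ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Rows = male/female, columns = non-metastasis/metastasis (training cohort).
gender_stat, gender_p = yates_chi_square_2x2(1105, 201, 535, 108)
# Rows = non-smoker/smoker (internal validation cohort).
smoke_stat, smoke_p = yates_chi_square_2x2(439, 104, 130, 31)
print(f"gender: chi2 = {gender_stat:.3f}, P = {gender_p:.4f}")
print(f"smoke:  chi2 = {smoke_stat:.3f}, P = {smoke_p:.4f}")
```

Rows with more than two levels (type, location, T stage) need a general r × c test and are not covered by this sketch.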
Development of the prediction model based on simplified clinicopathological features
Among the eight simplified clinicopathological features, four variables were selected as the best subset of risk factors to develop the prediction model: tumour diameter, type, tumour location, and T stage (Table 2). The regression coefficients of the multivariate logistic regression model were used to weight each feature. We developed a risk score formula to predict PM status: risk score = -5.919 + 1.695 (if tumour size ≥ 20 mm) + (0.505, if tumour type is Borrmann II; 1.863, if Borrmann III; 0.183, if Borrmann IV) + (1.399, if primary location is medium; 0.899, if primary location is lower) + 1.391 (if T stage is T3). Predicted risk = 1/(1 + e^(−risk score)). The model incorporating these predictors is presented as a nomogram (Fig. 2).
Table 2
Logistic regression Model 1 and Model 2.
| Variables | Level | Model 1: β | Model 1: OR | Model 1: 95% CI | Model 1: P | Model 2: β | Model 2: OR | Model 2: 95% CI | Model 2: P |
|---|---|---|---|---|---|---|---|---|---|
| Intercept | | -5.919 | | | < 0.0001 | -6.259 | | | < 0.0001 |
| Size | < 20 mm | reference | 1 | | | reference | 1 | | |
| | ≥ 20 mm | 1.695 | 5.447 | 3.167–10.222 | < 0.0001 | 1.539 | 4.662 | 2.692–8.793 | < 0.0001 |
| Type | Borrmann I | reference | | | | reference | 1 | | |
| | Borrmann II | 0.505 | 1.656 | 1.109–2.545 | 0.0169 | 0.488 | 1.629 | 1.081–2.523 | 0.0235 |
| | Borrmann III | 1.863 | 6.443 | 4.013–10.561 | < 0.0001 | 1.87 | 6.486 | 3.982–10.778 | < 0.0001 |
| | Borrmann IV | 0.183 | 1.201 | 0.510–2.595 | 0.6554 | 0.183 | 1.200 | 0.502–2.638 | 0.663 |
| Location | Upper | reference | 1 | | | reference | 1 | | |
| | Medium | 1.399 | 4.049 | 2.544–6.706 | < 0.0001 | 1.475 | 4.370 | 2.712–7.321 | < 0.0001 |
| | Lower | 0.899 | 2.456 | 1.554–4.043 | 0.0002 | 0.954 | 2.597 | 1.626–4.317 | 0.0001 |
| T stage | T1 | reference | 1 | | | reference | 1 | | |
| | T2 | 0.395 | 1.484 | 0.663–3.663 | 0.3591 | 0.236 | 1.267 | 0.558–3.156 | 0.5882 |
| | T3 | 1.391 | 4.017 | 2.020–9.184 | 0.0003 | 1.205 | 3.335 | 1.661–7.674 | 0.0018 |
| | Unknown | 1.349 | 3.852 | 1.907–8.898 | 0.0005 | 1.273 | 3.571 | 1.750–8.306 | 0.0012 |
| CEA | | | | | (2.092,7.935) | 0.784 | 2.191 | 1.602–2.989 | < 0.0001 |
| CA19-9 | | | | | (4.792,16.872) | 0.223 | 1.250 | 0.925–1.680 | 0.1422 |
| CA72-4 | | | | | (0.851,3.685) | 0.866 | 2.378 | 1.799–3.145 | < 0.0001 |
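As a worked illustration of the Model 1 risk score, the following is a minimal sketch that evaluates the formula exactly as it is written in the text. The function and constant names are hypothetical. Note that the written formula carries a T stage term only for T3, while Table 2 also lists coefficients for T2 and unknown T stage; the sketch follows the written formula and leaves those levels at zero.

```python
import math

# Model 1 coefficients as they appear in the written risk score formula.
INTERCEPT = -5.919
SIZE_GE_20MM = 1.695
BORRMANN = {"I": 0.0, "II": 0.505, "III": 1.863, "IV": 0.183}
LOCATION = {"upper": 0.0, "medium": 1.399, "lower": 0.899}
T_STAGE_T3 = 1.391

def model1_risk(size_ge_20mm, borrmann, location, is_t3):
    """Return (risk score, predicted PM risk) for one patient."""
    score = INTERCEPT
    score += SIZE_GE_20MM if size_ge_20mm else 0.0
    score += BORRMANN[borrmann]
    score += LOCATION[location]
    score += T_STAGE_T3 if is_t3 else 0.0
    # Logistic transform: predicted risk = 1 / (1 + e^(-risk score)).
    return score, 1.0 / (1.0 + math.exp(-score))

# Hypothetical patient: tumour >= 20 mm, Borrmann III, medium location, T3.
high_score, high_risk = model1_risk(True, "III", "medium", True)
# Hypothetical patient with every predictor at its reference level.
base_score, base_risk = model1_risk(False, "I", "upper", False)
print(f"high-risk profile: score = {high_score:.3f}, risk = {high_risk:.3f}")
print(f"reference profile: score = {base_score:.3f}, risk = {base_risk:.4f}")
```

The first profile sums to a score of 0.429, which the logistic transform maps to a predicted risk of roughly 0.61; the reference profile stays well below 1%.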
Evaluation and External Validation of the PM Prediction Nomogram
The AUC values of the nomogram for the prediction of PM were 0.762 in the training cohort, 0.772 in the internal validation cohort, and 0.758 in the external validation cohort (Fig. 3). The calibration curve of the nomogram for the probability of PM showed good agreement between prediction and observation in the primary cohort (Fig. 4).
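For readers unfamiliar with the AUC, it equals the probability that a randomly chosen patient with PM receives a higher predicted risk than a randomly chosen patient without PM. The sketch below computes it by direct pairwise comparison; the function name `auc` and the four-patient data set are hypothetical and are not taken from the study.

```python
def auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = PM) and predicted risks for four patients.
labels = [0, 0, 1, 1]
risks = [0.10, 0.40, 0.35, 0.80]
toy_auc = auc(labels, risks)
print(f"AUC = {toy_auc:.2f}")
```

Three of the four positive/negative pairs are ordered correctly here, so the toy AUC is 0.75. The pairwise loop is quadratic in cohort size; a rank-based implementation would be preferable for cohorts as large as those in Table 1.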
Incremental predictive value of STMs to the above model
To evaluate the additional predictive value of STMs, three STMs (CEA, CA19-9, and CA72-4) were used together with the simplified clinicopathological features to develop a PM prediction model. Seven variables were selected as the best subset of risk factors: tumour diameter, tumour location, T stage, type, CEA, CA19-9, and CA72-4 (Table 2). The risk score formula of the combined model was as follows: risk score = -6.259 + 1.539 (if tumour size ≥ 20 mm) + (0.488, if tumour type is Borrmann II; 1.87, if Borrmann III; 0.183, if Borrmann IV) + (1.475, if primary location is medium; 0.954, if primary location is lower) + 1.205 (if T stage is T3) + 0.784 (if CEA is positive) + 0.223 (if CA19-9 is positive) + 0.866 (if CA72-4 is positive). Predicted risk = 1/(1 + e^(−risk score)).
The model incorporating these predictors is presented as a nomogram (Fig. 5).
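To make the incremental effect of the STMs concrete, the sketch below evaluates the combined (Model 2) risk score for one hypothetical patient with and without positive markers. It assumes that each marker term applies when the marker is positive, which is consistent with the positive coefficients in Table 2 and with the higher proportion of marker-positive patients in the metastasis groups of Table 1. All names are illustrative, and as in the written formula only T3 carries a T stage term.

```python
import math

# Model 2 coefficients as they appear in the written risk score formula.
INTERCEPT = -6.259
SIZE_GE_20MM = 1.539
BORRMANN = {"I": 0.0, "II": 0.488, "III": 1.87, "IV": 0.183}
LOCATION = {"upper": 0.0, "medium": 1.475, "lower": 0.954}
T_STAGE_T3 = 1.205
# Assumption: each marker term is added when the marker is positive.
MARKERS = {"CEA": 0.784, "CA19-9": 0.223, "CA72-4": 0.866}

def model2_risk(size_ge_20mm, borrmann, location, is_t3, positive_markers=()):
    """Return (risk score, predicted PM risk) for the combined model."""
    score = INTERCEPT
    score += SIZE_GE_20MM if size_ge_20mm else 0.0
    score += BORRMANN[borrmann] + LOCATION[location]
    score += T_STAGE_T3 if is_t3 else 0.0
    score += sum(MARKERS[name] for name in positive_markers)
    return score, 1.0 / (1.0 + math.exp(-score))

# Same hypothetical patient, first marker-negative, then all markers positive.
neg_score, neg_risk = model2_risk(True, "III", "medium", True)
pos_score, pos_risk = model2_risk(True, "III", "medium", True,
                                  positive_markers=("CEA", "CA19-9", "CA72-4"))
print(f"markers negative: score = {neg_score:.3f}, risk = {neg_risk:.3f}")
print(f"markers positive: score = {pos_score:.3f}, risk = {pos_risk:.3f}")
```

For this profile the predicted risk rises from about 0.46 to about 0.85 when all three markers are positive, which illustrates how much weight the marker terms carry relative to the pathological features.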
The calibration curve for the probability of PM demonstrated good agreement between prediction and observation in the primary cohort (P = 0.998) and the validation cohort (P = 0.888) (Fig. 4A and Fig. 4B). After the addition of CEA, CA19-9, and CA72-4, the discrimination ability of the pathology-based model improved significantly in the primary cohort (AUC: 0.806; 95% CI, 0.780 to 0.831; Fig. 3A), the validation cohort (AUC: 0.839; 95% CI, 0.804 to 0.874; Fig. 3B), and the independent validation cohort (AUC: 0.801; 95% CI, 0.761 to 0.842; P < 0.001; Fig. 3C).
Clinical Value of the Nomogram
Decision curve analysis (DCA) is a novel strategy for evaluating alternative predictive and treatment methods, and it has advantages over the AUROC for assessing clinical value. The DCA curves for the developed nomogram in the training, internal validation, and external validation cohorts are presented in Fig. 6.
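The quantity plotted on a decision curve is the net benefit at each threshold probability. A minimal sketch of the standard definition follows: net benefit = TP/N − (FP/N) × pt/(1 − pt), where pt is the threshold probability. The function names and the four-patient data set are hypothetical; the section does not describe how the curves in Fig. 6 were computed.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Hypothetical outcomes (1 = PM) and predicted risks for four patients.
outcomes = [1, 0, 1, 0]
risks = [0.9, 0.2, 0.6, 0.4]
for pt in (0.3, 0.5):
    model_nb = net_benefit(outcomes, risks, pt)
    all_nb = net_benefit_treat_all(outcomes, pt)
    print(f"pt = {pt}: model = {model_nb:.3f}, treat all = {all_nb:.3f}")
```

A model is clinically useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero by definition.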