Characteristics of the training and validation cohorts
The clinical characteristics of the training and validation sets are summarized in Table 1. The training set comprised 577 patients and the validation set 248 patients, for a total of 825 enrolled patients. The rate of lymph node metastasis (LNM) was 10.1% in the training cohort and 9.3% in the validation cohort, with no statistically significant difference (p=0.731). In the training set, the mean age was 58.5 years, and the male-to-female ratio was 55.5:44.5. Of the 577 lesions, 85 (14.7%) had a depressed-type endoscopic appearance, and 42 (7.3%) showed submucosal invasion of less than 1000 μm. Tumor budding grade BD1 was observed in 514 (89.1%) tumors, and 545 (94.5%) lesions had no lymphovascular invasion (LVI). The clinicopathological characteristics were comparable between the training and validation sets.
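The reported p-value for the difference in LNM rates between cohorts can be checked with a Pearson χ² test on a 2×2 table. The sketch below uses only the Python standard library and makes two assumptions. First, it applies no continuity correction. Second, it uses positive counts of 58 and 23 that were back-calculated from the reported percentages. The exact counts are given only in Table 1, although 23 matches the 21 + 2 LNM-positive validation lesions reported below. Under these assumptions the test gives p ≈ 0.73, close to the reported value.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test without continuity correction for a 2x2 table.

    Rows are cohorts; columns are (LNM-positive, LNM-negative):
        [[a, b],
         [c, d]]
    Returns (statistic, p-value with 1 degree of freedom).
    """
    n = a + b + c + d
    observed = [a, b, c, d]
    row_totals = [a + b, a + b, c + d, c + d]
    col_totals = [a + c, b + d, a + c, b + d]
    stat = 0.0
    for o, r, col in zip(observed, row_totals, col_totals):
        expected = r * col / n
        stat += (o - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# ASSUMPTION: counts back-calculated from the reported rates
# (10.1% of 577 -> 58 positive; 9.3% of 248 -> 23 positive).
train_pos, train_n = 58, 577
valid_pos, valid_n = 23, 248

stat, p = chi_square_2x2(train_pos, train_n - train_pos,
                         valid_pos, valid_n - valid_pos)
print(f"LNM rate: training {train_pos / train_n:.1%}, "
      f"validation {valid_pos / valid_n:.1%}")
print(f"chi-square = {stat:.3f}, p = {p:.3f}")
```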
Risk factors for LNM and development of a predictive model
The results of the univariate analysis of the training set are presented in Table 2. Consistent with established guidelines, LVI, tumor budding (TB; BD2/BD3), and tumor grade (TG; G3) were identified as risk factors for LNM. Interestingly, a depth of submucosal invasion (DSI) of ≥ 1000 μm was not a significant risk factor on its own, but it reached significance when analyzed in combination with G2/G3 tumor grade (DSI-TG). A logistic regression prediction model was constructed using the six variables significantly associated with LNM in Table 2. TG was excluded from the regression model because of multicollinearity: it was less specific than, and highly correlated with, DSI-TG, which could have reduced the statistical significance of the model. Consequently, the model comprised five independent factors as LNM predictors: depressed endoscopic gross appearance, sex, DSI-TG, LVI, and TB. LVI emerged as the most potent predictor of LNM, increasing the odds of metastasis approximately tenfold (OR, 10.369; 95% CI, 4.60-23.28). The remaining predictors were depressed endoscopic gross appearance (OR, 2.820; 95% CI, 1.41-5.48), sex (OR, 0.564; 95% CI, 0.31-1.00), DSI-TG (OR, 1.960; 95% CI, 0.99-4.20), and TB (OR, 1.387; 95% CI, 0.61-2.94). In addition, a guideline-combination model was developed incorporating the four guideline factors: DSI, LVI, TG, and TB.
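In a logistic regression model, each reported odds ratio corresponds to a coefficient β = ln(OR) on the log-odds scale. If the model has no interaction terms, the odds ratios of co-occurring factors multiply. That is an assumption here, because the model specification is not given in this section. The sketch below converts the reported ORs to coefficients and flags which 95% CIs exclude 1. The CI bounds are reported to two decimals, so borderline intervals cannot be resolved exactly from these values. Examples are sex (upper bound 1.00) and DSI-TG (lower bound 0.99).

```python
import math

# Reported odds ratios with 95% CIs for the five-factor model (training set):
# name -> (OR, lower bound, upper bound)
model_terms = {
    "LVI": (10.369, 4.60, 23.28),
    "depressed gross appearance": (2.820, 1.41, 5.48),
    "sex": (0.564, 0.31, 1.00),
    "DSI-TG": (1.960, 0.99, 4.20),
    "TB": (1.387, 0.61, 2.94),
}

# Logistic regression coefficient on the log-odds scale: beta = ln(OR).
coefficients = {name: math.log(or_) for name, (or_, _, _) in model_terms.items()}

# A 95% CI lying entirely above or below 1 corresponds to p < 0.05.
significant = [
    name
    for name, (_, lower, upper) in model_terms.items()
    if lower > 1.0 or upper < 1.0
]

# ASSUMPTION: no interaction terms, so log-odds add and odds ratios multiply.
# A lesion with both LVI and a depressed appearance has its odds scaled by this.
combined_or = math.exp(coefficients["LVI"] + coefficients["depressed gross appearance"])

for name, beta in coefficients.items():
    print(f"{name:28s} beta = {beta:+.3f}")
print("CI excludes 1:", significant)
print(f"Combined OR (LVI + depressed): {combined_or:.2f}")
```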
Beyond the guideline factors, sex and a depressed endoscopic gross appearance were identified as additional risk factors for LNM. Female sex was a risk factor for LNM in T1 colorectal cancer. However, the effect of sex was not consistent between early-onset and late-onset populations. In T1 late-onset colorectal cancer (LOCRC), the LNM rate was higher in female patients (12.8%) than in male patients (7.2%; p=0.049). In contrast, sex had no significant effect on LNM in patients with T1 early-onset colorectal cancer (EOCRC) (p=0.640), as detailed in Table 3. Supplementary Table 1 provides the clinicopathological characteristics of the EOCRC and LOCRC groups.
A depressed endoscopic gross appearance was the other additional risk factor for LNM. Lesions with a depressed appearance had a more superficial DSI, as outlined in Supplementary Table 2.
Overall performance of the prediction model
Model calibration was assessed with the Hosmer-Lemeshow test, which gave a statistic of 2.379 (p = 0.795), indicating no significant lack of fit. When the risk factors from current guidelines were applied to the validation set, 231 of 248 lesions were classified as high risk and 17 as low risk (Figure 2). LNM was present in 21 (9.1%) of the high-risk lesions and in 2 (11.8%) of the low-risk lesions. The guideline risk factors therefore did not accurately discriminate lesions with LNM. The guideline-combination model, intended to avoid an all-or-nothing decision, assigned 202 patients to the low-risk group. However, this group contained 10 of the 23 LNM-positive patients (43.5%), who were therefore misclassified. With the prediction model developed in this study, 121 patients in the validation set were classified as high risk, of whom 21 (17.4%) had LNM. In contrast, only 2 (1.6%) of the 127 patients in the low-risk group had LNM. Importantly, only 8.7% (2 of 23) of LNM-positive patients were classified as low risk by this prediction model. This is the same proportion as under the current guidelines, even though the model placed 127 rather than 17 patients in the low-risk group. These findings suggest that our model offers better LNM risk stratification for patients with T1 colorectal cancer (T1CRC).
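The validation-set counts above can be summarized as standard diagnostic metrics by treating a high-risk classification as a positive test. This is a minimal sketch. Sensitivity, specificity, and negative predictive value are not reported in this section; they are derived here from the stated counts. The guideline-combination high-risk counts (46 lesions, 13 with LNM) are obtained by subtracting the low-risk counts from the 248 lesions and 23 LNM-positive patients.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Stratification:
    """High/low-risk split of the validation set (248 lesions, 23 with LNM)."""
    name: str
    high_n: int    # lesions classified as high risk
    high_pos: int  # LNM-positive lesions among them
    low_n: int     # lesions classified as low risk
    low_pos: int   # LNM-positive lesions among them

    def metrics(self) -> dict[str, float]:
        # Treat "high risk" as a positive test result.
        positives = self.high_pos + self.low_pos
        negatives = (self.high_n - self.high_pos) + (self.low_n - self.low_pos)
        return {
            "LNM rate, high risk": self.high_pos / self.high_n,
            "LNM rate, low risk": self.low_pos / self.low_n,
            "sensitivity": self.high_pos / positives,
            "specificity": (self.low_n - self.low_pos) / negatives,
            "negative predictive value": (self.low_n - self.low_pos) / self.low_n,
        }


# Counts as reported for the validation set; the guideline-combination
# high-risk counts are DERIVED (248 - 202 lesions, 23 - 10 positives).
strategies = [
    Stratification("current guidelines", 231, 21, 17, 2),
    Stratification("guideline-combination model", 46, 13, 202, 10),
    Stratification("prediction model", 121, 21, 127, 2),
]

for s in strategies:
    print(s.name)
    for label, value in s.metrics().items():
        print(f"  {label:27s} {value:.1%}")
```

For the prediction model this gives a sensitivity of 21/23 and a negative predictive value of 125/127. The corresponding figures for the current guidelines are 21/23 and 15/17.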
Table 1 Clinicopathologic characteristics of the training and validation data sets
*Student's t test (age and tumor size) or χ² test (categorical variables).
R, rectum; D, descending colon; S, sigmoid colon; T, transverse colon; A, ascending colon; C, cecum.
Table 2 Univariate analysis of selected risk factors and logistic regression model for predicting LNM in T1 colorectal cancer (training data set)
LNM, lymph node metastasis; CI, confidence interval; R, rectum; D, descending colon; S, sigmoid colon; T, transverse colon; A, ascending colon; C, cecum; DSI, depth of submucosal invasion; TG, tumor grade; DSI-TG, depth of submucosal invasion combined with tumor grade.
Table 3 Univariable analysis of sex as a predictor of LNM in EOCRC and LOCRC
LNM, lymph node metastasis; EOCRC, early-onset colorectal cancer; LOCRC, late-onset colorectal cancer.