Patient characteristics
A total of 201 eligible patients were analyzed: 145 in the training cohort and 56 in the validation cohort. The median follow-up was 29.0 months (interquartile range (IQR): 12.0–64.0) in the training cohort and 32.5 months (IQR: 11.0–60.75) in the validation cohort. The 1-, 3-, and 5-year OS rates were 75.2%, 46.9%, and 31.7% in the training cohort and 73.2%, 42.9%, and 26.8% in the validation cohort.
The optimal cut-off values for the continuous variables were as follows: age (40 years), BMI (22.3 kg/m²), tumor size (4.0 cm), WBC (10.8 × 10⁹/L), N (8.1 × 10⁹/L), L (1.74 × 10⁹/L), PLT (163.0 × 10⁹/L), NLR (2.7), PLR (108.6), ALB (42.5 g/L), ALT (13.7 U/L), AST (32.2 U/L), SLR (1.5), ALP (69.6 U/L), APOA (1.2 g/L), APOB (1.0 g/L), ABR (0.8), CRP (6.2 mg/L), CAR (0.16), LDH (230.3 U/L), GGT (44.2 U/L), TBIL (15.4 μmol/L), DBIL (3.0 μmol/L), and PNI (48.1). The patients' clinical characteristics and serological markers are listed in Table 1. Apart from ALB, PLR, HBeAg, HBeAb, and HBcAb, no clinical or serological parameter differed significantly in distribution between the training and validation cohorts.
Construction of the multi-parametric prognostic model based on clinical and serological markers
To select prognostic clinical and serological markers, we fitted a Lasso regression model for OS in the training cohort. Figure 1A shows the coefficient trajectory of each marker across values of λ. Ten-fold cross-validation was used to select λ, and the confidence interval at each λ is presented in Fig. 1B. The optimal λ was 0.046, and the model at this value was selected as the final model. It retained 10 of the 34 markers as weighted prognostic factors: age, BMI, tumor size, PLT, PLR, ALT, GGT, LDH, TBIL, and APOA. The coefficients of the 10 predictors are presented in Fig. 1C. A multi-parametric prognostic model based on these clinical and serological markers was then constructed from the Lasso coefficients, and each patient's risk score was calculated from his or her levels of the 10 predictors using the following formula:
Prognostic model risk score = 0.679 − (0.148 × age) − (0.193 × BMI) + (0.101 × tumor size) − (0.554 × PLT) + (0.197 × PLR) − (0.199 × ALT) + (0.186 × GGT) + (1.248 × LDH) − (0.137 × TBIL) − (0.194 × APOA).
In this formula, each predictor is coded as 0 or 1: 0 when the marker is less than or equal to its cut-off value, and 1 otherwise.
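The risk-score formula can be sketched in Python as a minimal illustration, assuming the cut-offs listed under Patient characteristics and the coefficients printed above. The dictionary keys, the function names `dichotomize` and `risk_score`, and the example patient are hypothetical and are not part of the original analysis code.

```python
# Cut-off values for the 10 Lasso-selected predictors, taken from the list
# under "Patient characteristics" (units as reported there).
CUTOFFS = {
    "age": 40.0,         # years
    "bmi": 22.3,         # kg/m^2
    "tumor_size": 4.0,   # cm
    "plt": 163.0,        # x 10^9/L
    "plr": 108.6,
    "alt": 13.7,         # U/L
    "ggt": 44.2,         # U/L
    "ldh": 230.3,        # U/L
    "tbil": 15.4,        # umol/L
    "apoa": 1.2,         # g/L
}

# Intercept and Lasso coefficients as printed in the risk-score formula.
INTERCEPT = 0.679
COEFS = {
    "age": -0.148, "bmi": -0.193, "tumor_size": 0.101, "plt": -0.554,
    "plr": 0.197, "alt": -0.199, "ggt": 0.186, "ldh": 1.248,
    "tbil": -0.137, "apoa": -0.194,
}


def dichotomize(patient):
    """Code each predictor as 0 if its value is <= the cut-off, otherwise 1."""
    return {name: int(patient[name] > cut) for name, cut in CUTOFFS.items()}


def risk_score(patient):
    """Intercept plus the sum of coefficient x binary indicator over all predictors."""
    coded = dichotomize(patient)
    return INTERCEPT + sum(COEFS[name] * coded[name] for name in COEFS)


# Hypothetical patient (values invented for illustration only).
example = {"age": 65, "bmi": 21.0, "tumor_size": 5.2, "plt": 150, "plr": 180,
           "alt": 20, "ggt": 30, "ldh": 260, "tbil": 10, "apoa": 1.0}
print(round(risk_score(example), 3))  # 1.878
```

In this hypothetical patient, age, tumor size, PLR, ALT, and LDH exceed their cut-offs, giving 0.679 − 0.148 + 0.101 + 0.197 − 0.199 + 1.248 = 1.878. Under the −0.12 cut-off used for risk stratification, this patient would fall in the high-risk group.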
Assessment and validation of prognostic model performance
The C-index was used to compare the discriminative performance of the prognostic model with that of TNM staging and clinical treatment; the results are listed in Table 2. In the training cohort, the C-index of the prognostic model was 0.769 (95% confidence interval (CI): 0.721–0.817), higher than that of TNM staging (0.710, 95% CI: 0.661–0.758, P = 0.079) and significantly higher than that of clinical treatment (0.694, 95% CI: 0.643–0.746, P = 0.017). In the validation cohort, the prognostic model likewise had a higher C-index than either TNM staging or clinical treatment.
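Harrell's C-index, the most common form of the C-index, can be sketched as below, assuming the study used Harrell's definition. This simplified pure-Python version skips pairs with tied follow-up times, counts tied risk scores as half-concordant, and does not reproduce the P values in Table 2, which require a variance estimate for the difference between two C-indexes. The function name and toy data are illustrative only.

```python
def harrell_c_index(times, events, scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i had the event and a strictly
    shorter follow-up time than patient j. The pair is concordant when
    patient i also has the higher risk score; tied scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [5, 10, 15, 20]      # months of follow-up
events = [1, 1, 0, 1]        # 1 = death observed, 0 = censored
scores = [0.9, 0.1, 0.3, 0.5]
print(harrell_c_index(times, events, scores))  # 0.6
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ranking of patients by risk.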
The prognostic accuracy of the prognostic model, TNM staging, and clinical treatment in both cohorts was also assessed by time-dependent ROC (tdROC) analysis (Fig. 2). In the training cohort, the area under the ROC curve (AUC) of the prognostic model was 0.857, 0.845, and 0.879 for 1-, 3-, and 5-year survival, respectively. The corresponding AUCs were 0.787, 0.798, and 0.771 for TNM staging and 0.771, 0.799, and 0.753 for clinical treatment. These results indicate that the prognostic model predicted survival outcomes more accurately than TNM staging or clinical treatment. Similar results were observed in the validation cohort.
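The cumulative/dynamic AUC at a fixed horizon (12, 36, or 60 months for 1-, 3-, and 5-year survival) can be sketched as follows. This is a deliberately naive version that simply drops patients censored before the horizon; tdROC software typically corrects for censoring with inverse-probability-of-censoring weights, so this sketch would not reproduce the reported AUCs exactly. The function name and data are hypothetical.

```python
def cumulative_dynamic_auc(times, events, scores, horizon):
    """Naive cumulative/dynamic AUC at a fixed horizon (no censoring weights).

    Cases: patients with an event at or before `horizon`.
    Controls: patients still under follow-up after `horizon`.
    Patients censored before the horizon are dropped (a simplification).
    """
    cases = [s for t, e, s in zip(times, events, scores) if t <= horizon and e]
    controls = [s for t, s in zip(times, scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0   # case ranked above control
            elif c == k:
                wins += 0.5   # tie counts half
    return wins / (len(cases) * len(controls))


times = [6, 10, 14, 30, 40, 50]        # months
events = [1, 1, 0, 1, 0, 0]
scores = [2.0, 1.5, 0.2, 0.8, 1.0, -0.3]
for months in (12, 36):
    print(months, round(cumulative_dynamic_auc(times, events, scores, months), 3))
# 12 1.0
# 36 0.833
```

Because cases and controls are redefined at each horizon, the same risk score can yield different AUCs at 1, 3, and 5 years, as in Fig. 2.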
In addition, decision curve analysis (Fig. 3) showed that the prognostic model provided a higher overall net benefit than traditional TNM staging and clinical treatment across most of the range of reasonable threshold probabilities in both the training and validation cohorts.
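Net benefit, the quantity plotted in a decision curve, weighs true positives against false positives at each threshold probability p_t: NB = TP/n − (FP/n) × p_t/(1 − p_t). The sketch below assumes a binary outcome (death by a fixed horizon) with complete follow-up; survival decision curves usually estimate these proportions with Kaplan–Meier methods to handle censoring. The function names and data are illustrative.

```python
def net_benefit(predicted_risk, died, threshold):
    """Net benefit of flagging patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be in (0, 1)")
    n = len(predicted_risk)
    tp = sum(1 for p, y in zip(predicted_risk, died) if p >= threshold and y)
    fp = sum(1 for p, y in zip(predicted_risk, died) if p >= threshold and not y)
    odds = threshold / (1.0 - threshold)  # harm-to-benefit weight implied by p_t
    return tp / n - fp / n * odds


def net_benefit_treat_all(died, threshold):
    """Reference strategy that flags every patient."""
    prevalence = sum(died) / len(died)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


risks = [0.9, 0.7, 0.4, 0.2]   # predicted probability of death by the horizon
died = [1, 1, 0, 0]
for pt in (0.3, 0.5):
    print(pt, round(net_benefit(risks, died, pt), 3), round(net_benefit_treat_all(died, pt), 3))
# 0.3 0.393 0.286
# 0.5 0.5 0.0
```

The "treat none" strategy always has a net benefit of 0, so a model is useful at a given threshold when its curve lies above both reference strategies.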
Building and validating a predictive nomogram
We built a nomogram consisting of the prognostic model risk score, TNM staging, and clinical treatment to predict 1-, 3-, and 5-year OS in the training and validation cohorts (Fig. 4A). Each level of each variable was assigned a number of points. To use the nomogram, locate the patient's model risk score and draw a line straight upward to the "Points" axis to determine the points associated with that score. Repeat this for each variable, sum the points, and locate the sum on the "Total Point" axis. Finally, draw a line straight down to read the patient's probability of OS at 1, 3, and 5 years. Calibration plots were used to assess the agreement between predicted and observed 1-, 3-, and 5-year OS (Fig. 4B, 4C, 4D). The 45° line represents ideal prediction, and the solid dark red line represents the nomogram's predicted OS probability. The two lines overlapped closely at all three time points, indicating good agreement between the nomogram's predictions and the actual observations.
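The arithmetic behind reading a Cox-model nomogram can be sketched as follows, assuming the nomogram was derived from a Cox model in the usual way: the variable whose contribution to the linear predictor spans the widest range is scaled to 0–100 points, the other variables share that scale, and total points are converted back to a linear predictor lp to give S(t) = S0(t)^exp(lp). The fitted coefficients and baseline survival S0(t) are not reported in the text, so every number below except the risk-score range (the minimum and maximum attainable from the published formula, −0.746 and 2.411) is a hypothetical placeholder, and the treatment labels are invented.

```python
import math

# HYPOTHETICAL inputs: the nomogram's fitted coefficients and baseline survival
# are not reported in the text, so these numbers only illustrate the mechanics.
RISK_SCORE_BETA = 0.8
RISK_SCORE_RANGE = (-0.746, 2.411)  # min/max attainable from the risk-score formula
STAGE_CONTRIB = {"I": 0.0, "II": 0.4, "III": 0.9, "IV": 1.5}
TREATMENT_CONTRIB = {"treatment_1": 0.0, "treatment_2": 0.6, "treatment_3": 1.2}
BASELINE_SURVIVAL = {12: 0.90, 36: 0.70, 60: 0.55}  # S0(t) at linear predictor 0

LO_RS, HI_RS = RISK_SCORE_RANGE
# Span of each variable's contribution to the linear predictor; the widest
# span is mapped onto the 0-100 "Points" axis.
SPANS = {
    "risk_score": RISK_SCORE_BETA * (HI_RS - LO_RS),
    "stage": max(STAGE_CONTRIB.values()) - min(STAGE_CONTRIB.values()),
    "treatment": max(TREATMENT_CONTRIB.values()) - min(TREATMENT_CONTRIB.values()),
}
POINTS_PER_UNIT = 100.0 / max(SPANS.values())
# Linear predictor of a patient who scores 0 points on every axis.
LP_AT_ZERO_POINTS = (RISK_SCORE_BETA * LO_RS + min(STAGE_CONTRIB.values())
                     + min(TREATMENT_CONTRIB.values()))


def total_points(risk_score, stage, treatment):
    """Sum the per-variable points, as done by reading the nomogram axes."""
    rs_points = RISK_SCORE_BETA * (risk_score - LO_RS) * POINTS_PER_UNIT
    stage_points = (STAGE_CONTRIB[stage] - min(STAGE_CONTRIB.values())) * POINTS_PER_UNIT
    tx_points = (TREATMENT_CONTRIB[treatment]
                 - min(TREATMENT_CONTRIB.values())) * POINTS_PER_UNIT
    return rs_points + stage_points + tx_points


def survival_probability(points, months):
    """Convert total points back to lp, then S(t) = S0(t) ** exp(lp)."""
    lp = LP_AT_ZERO_POINTS + points / POINTS_PER_UNIT
    return BASELINE_SURVIVAL[months] ** math.exp(lp)


pts = total_points(0.5, "III", "treatment_2")
for months in (12, 36, 60):
    print(months, round(survival_probability(pts, months), 3))
```

Reading the "Total Point" axis down to the probability axes performs exactly this inverse mapping graphically, which is why more points always correspond to a lower predicted OS probability.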
Performance of the prognostic model risk score in stratifying patient risk
The optimal cut-off value of the risk score, determined with the R package “survminer”, was −0.12 (Fig. 5). Using this cut-off, we divided the patients into two subgroups (Table 3): a low-risk group (risk score ≤ −0.12) and a high-risk group (risk score > −0.12). In the training cohort, the high-risk group had a median OS of 15 months (IQR: 7.0–40.0 months) and 1-, 3-, and 5-year survival probabilities of 59.3%, 26.7%, and 11.6%, respectively. The low-risk group had a median OS of 63 months (IQR: 38.0–74.0 months) and 1-, 3-, and 5-year survival probabilities of 98.1%, 76.3%, and 61.0%, respectively. In the validation cohort, the low-risk group also had higher 1-, 3-, and 5-year survival probabilities than the high-risk group. We then performed Kaplan–Meier survival analysis by risk group (Fig. 6A). In the training cohort, the Kaplan–Meier curves showed significantly different survival distributions between the two risk groups, and similar results were found in the validation cohort.
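survminer's `surv_cutpoint()` selects a cut-off using maximally selected rank statistics from the maxstat package. A simplified pure-Python version of the idea is sketched below, assuming a standardized two-sample log-rank statistic and a minimum group proportion; unlike maxstat, it computes no adjusted P value, so it only illustrates how a value such as −0.12 is located. The function names, the `min_prop` parameter, and the toy data are hypothetical.

```python
import math


def logrank_z(times, events, in_group):
    """Standardized two-sample log-rank statistic for the group flagged True."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0
    var = 0.0
    for t in event_times:
        n = sum(1 for ti in times if ti >= t)                        # at risk, all
        n1 = sum(1 for ti, g in zip(times, in_group) if ti >= t and g)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)  # deaths, all
        d1 = sum(1 for ti, e, g in zip(times, events, in_group) if ti == t and e and g)
        o_minus_e += d1 - d * n1 / n        # observed minus expected deaths
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e / math.sqrt(var) if var > 0 else 0.0


def max_logrank_cutpoint(times, events, scores, min_prop=0.1):
    """Cut-off c maximizing |log-rank Z| for the split score <= c vs score > c."""
    n = len(scores)
    best_cut, best_stat = None, -1.0
    for c in sorted(set(scores)):
        high = [s > c for s in scores]
        n_high = sum(high)
        if min(n_high, n - n_high) < min_prop * n:
            continue  # skip splits that leave either group too small
        stat = abs(logrank_z(times, events, high))
        if stat > best_stat:
            best_cut, best_stat = c, stat
    return best_cut, best_stat


times = [60, 55, 50, 48, 45, 12, 10, 8, 6, 4]   # months
events = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]
scores = [-0.5, -0.4, -0.3, -0.2, -0.15, 0.3, 0.5, 0.8, 1.0, 1.2]
cut, stat = max_logrank_cutpoint(times, events, scores)
print(cut, round(stat, 2))  # 0.5 3.35
```

On this toy data the search selects 0.5, an unbalanced split with three patients in the high-risk group, which is why such searches restrict the minimum group size.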
Furthermore, we performed analyses stratified by stage (I/II and III/IV) in HBV (+) NSCLC patients (Fig. 6B, 6C). In the training cohort, stratification by the prognostic model risk score produced significant differences in Kaplan–Meier OS curves within each stage group. In the validation cohort, this stratification also produced significant differences in OS, except in patients with stage I/II disease.
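The Kaplan–Meier curves in Fig. 6 rest on the product-limit estimator, sketched below for a single group; a stratified analysis applies it separately within each risk group or stage group. Comparing curves additionally requires a log-rank test, which this sketch omits. The function names and toy data are illustrative.

```python
def kaplan_meier(times, events):
    """Product-limit estimator: list of (event time, S(t)) pairs."""
    curve = []
    surv = 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        surv *= 1.0 - deaths / at_risk   # conditional survival past time t
        curve.append((t, surv))
    return curve


def survival_at(curve, t):
    """Step-function lookup: S at the last event time <= t (1.0 before any event)."""
    s = 1.0
    for ti, si in curve:
        if ti <= t:
            s = si
        else:
            break
    return s


times = [3, 5, 5, 8, 12, 15]   # months
events = [1, 1, 0, 1, 0, 1]    # 1 = death, 0 = censored
curve = kaplan_meier(times, events)
print(round(survival_at(curve, 12), 3))  # 0.444
```

Censored patients leave the risk set without lowering the curve, which is how the 1-, 3-, and 5-year survival probabilities in Table 3 can be read off a Kaplan–Meier curve despite incomplete follow-up.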
The correlation between the prognostic model and TNM staging or clinical treatment
Figure 7 shows the correlations between the prognostic model and TNM staging or clinical treatment in the training cohort (A) and the validation cohort (B). In this plot, blue indicates positive correlations and red indicates negative correlations; the color intensity and circle size are proportional to the correlation coefficients. The numbers in the graph are the Pearson correlation coefficients (PCC) between variables. The prognostic model was positively correlated with TNM staging (PCC: 0.48 in the training cohort and 0.42 in the validation cohort) and with clinical treatment (PCC: 0.44 and 0.29, respectively).
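Pearson's correlation coefficient, the statistic shown in Fig. 7, can be computed as below. The sketch assumes TNM stage has been coded as an ordinal number (for example, 1 to 4) so that it can be correlated with the continuous risk score; the coding, function name, and toy values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


stage = [1, 2, 3, 4]                 # hypothetical ordinal coding of TNM stage
risk_scores = [-0.5, 1.5, 0.5, 2.5]  # hypothetical risk scores
print(round(pearson_r(stage, risk_scores), 2))  # 0.8
```

Because Pearson's coefficient measures linear association, its value for a categorical variable such as clinical treatment depends on the numeric coding chosen for the categories.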