Demographic and clinical characteristics
After applying the inclusion and exclusion criteria, 51529 patients were included in the study. We randomly divided them into a training set and a test set at a ratio of 7:3, giving 36070 patients in the training set and 15459 in the test set. The patient selection process is shown in Fig. 1. The demographic and clinical characteristics of the patients with non-small cell lung cancer are listed in Table 1. In the training set, most patients (81.49%) were white. Adenocarcinoma accounted for 54.04% of patients and squamous cell carcinoma for 35.31%. Bone was the most common metastatic site, with bone metastases in 8.59% of patients. Surgery was performed in 46.10% of patients, 57.45% received radiation, and 57.69% received chemotherapy.
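The 7:3 allocation can be illustrated with a minimal sketch. The sequential patient IDs, the seed, and the function name `split_train_test` are assumptions made for illustration; the software used for the actual allocation is not described here.

```python
import random

def split_train_test(patient_ids, train_fraction=0.7, seed=0):
    """Shuffle the IDs reproducibly and cut them into training and test sets."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_fraction)  # round down to a whole patient
    return ids[:n_train], ids[n_train:]

# Hypothetical sequential IDs standing in for the 51529 included patients.
train_ids, test_ids = split_train_test(range(51529))
print(len(train_ids), len(test_ids))  # 36070 15459
```

Rounding 70% of 51529 down to a whole patient gives the set sizes reported above.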
The training and test sets were formed by random allocation. The P values comparing the two sets were below 0.001 for every variable except tumor size (P = .006) (Table 1). P values this small indicate statistically significant differences between the two sets rather than negligible ones, so the baseline characteristics of the two sets should be compared using the percentages in Table 1.
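As an illustration of how a between-set P value can be obtained for one row of Table 1, the sketch below applies a Pearson chi-square test to the bone metastasis counts. The choice of test is an assumption, since the test behind the P values in Table 1 is not named, and `chi_square_2x2` is a hypothetical helper.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table
    [[a, b], [c, d]], with its P value at one degree of freedom."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Bone metastasis counts from Table 1: (no, yes) in the training and test sets.
chi2, p_value = chi_square_2x2(32969, 3099, 12565, 2894)
print(f"chi-square = {chi2:.1f}, P < 0.001: {p_value < 0.001}")
```

At one degree of freedom, a statistic above 10.83 corresponds to P < 0.001, meaning that the two proportions differ by more than chance alone would explain.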
Table 1. Patient baseline information
| Variable | Category | Training set, n | Training set, % | Test set, n | Test set, % | P value |
|---|---|---|---|---|---|---|
| Sex | Female | 17110 | 47.44% | 7183 | 47.39% | <.001 |
| | Male | 18958 | 52.56% | 8000 | 52.78% | |
| Race | White | 29393 | 81.49% | 12682 | 83.67% | <.001 |
| | Black | 3375 | 9.36% | 2034 | 13.42% | |
| | Other | 3300 | 9.15% | 743 | 4.90% | |
| Age | 60-79 | 22835 | 63.31% | 9484 | 62.57% | <.001 |
| | 80+ | 13233 | 36.69% | 5975 | 39.42% | |
| Marital status | No | 15991 | 44.33% | 7515 | 49.58% | <.001 |
| | Yes | 20077 | 55.66% | 7944 | 52.41% | |
| Tumor size | <3cm | 10131 | 28.09% | 2825 | 18.64% | .006 |
| | <7cm | 12343 | 34.22% | 4439 | 29.28% | |
| | >7cm | 13595 | 37.69% | 8194 | 54.06% | |
| Mets at bone | No | 32969 | 91.41% | 12565 | 82.89% | <.001 |
| | Yes | 3099 | 8.59% | 2894 | 19.09% | |
| Mets at brain | No | 33804 | 93.72% | 13499 | 89.06% | <.001 |
| | Yes | 2264 | 6.28% | 1960 | 12.93% | |
| Mets at liver | No | 34844 | 96.60% | 14097 | 93.00% | <.001 |
| | Yes | 1224 | 3.39% | 1362 | 8.99% | |
| AJCC M, 7th | M0 | 26140 | 72.47% | 7835 | 51.69% | <.001 |
| | M1a | 3409 | 9.45% | 2114 | 13.95% | |
| | M1b | 6519 | 18.07% | 5510 | 36.35% | |
| AJCC N, 7th | N0 | 19809 | 54.92% | 6294 | 41.52% | <.001 |
| | N1 | 3705 | 10.27% | 1486 | 9.80% | |
| | N2 | 9748 | 27.03% | 5798 | 38.25% | |
| | N3 | 2806 | 7.78% | 1881 | 12.41% | |
| AJCC T, 7th | T1 | 10130 | 28.09% | 2826 | 18.64% | <.001 |
| | T2 | 12343 | 34.22% | 4439 | 29.28% | |
| | T3 | 7289 | 20.21% | 3900 | 25.73% | |
| | T4 | 6306 | 17.48% | 4294 | 28.33% | |
| Grade | I | 4225 | 11.71% | 1156 | 7.63% | <.001 |
| | II | 13796 | 38.25% | 4903 | 32.35% | |
| | III | 17360 | 48.13% | 9032 | 59.59% | |
| | IV | 687 | 1.90% | 368 | 2.43% | |
| Histologic type | Adenocarcinoma | 19493 | 54.04% | 7347 | 48.47% | <.001 |
| | Squamous cell carcinoma | 12737 | 35.31% | 5757 | 37.98% | |
| | Large cell carcinoma | 631 | 1.75% | 369 | 2.43% | |
| | Other | 3207 | 8.89% | 1986 | 13.10% | |
| Surg Prim Site | No | 19441 | 53.90% | 11337 | 74.79% | <.001 |
| | Yes | 16627 | 46.10% | 4122 | 27.19% | |
| Radiation | No | 12844 | 42.55% | 7590 | 50.07% | <.001 |
| | Yes | 16583 | 57.45% | 7577 | 49.99% | |
| Chemotherapy | No | 12987 | 42.31% | 8120 | 43.57% | <.001 |
| | Yes | 16431 | 57.69% | 8706 | 57.44% | |
Identifying independent factors for early death
Cox proportional hazards regression was used to analyze the baseline information of the patients and to estimate the risk of death within 1 year (early death) and within 5 years (late death) for each baseline characteristic. The results are shown in Table 2. For both early and late death, tumor size, treatment, metastasis, and AJCC M stage had a significant effect on survival. Patients with a tumor diameter of 7 cm or more had a markedly higher risk of death than patients with smaller tumors, and the risk of death was much lower in patients who had undergone surgery, chemotherapy, or radiotherapy. Patients with metastatic cancer also had a significantly higher risk of death. The effects of age, sex, race, and grade were more pronounced for late death than for early death. In the 1-year (early death) model, grade and AJCC N stage had much weaker effects than in the 5-year model, and grade II did not differ significantly from grade I (P = 0.665). Histologic type, age, and marital status also influenced the risk of death.
Multivariate Cox proportional hazards regression was applied to the SEER cohort to identify the risk factors associated with early death in non-small cell lung cancer (Table 2). The models showed that age, race, sex, tumor size, AJCC M stage, metastatic site, treatment, and histologic type were associated with early death.
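The ratio and 95% CI columns of Table 2 can be related to the underlying model coefficients with a short sketch. It assumes Wald intervals of the form exp(beta ± 1.96 × SE), which is the usual convention for Cox models but is not stated in the text; the helper names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def log_scale_summary(ratio, lower, upper):
    """Recover the log coefficient and its standard error from a reported
    ratio and 95% CI, assuming a Wald interval: exp(beta +/- Z_95 * se)."""
    beta = math.log(ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * Z_95)
    return beta, se

def wald_interval(beta, se):
    """Rebuild the ratio and its 95% CI from beta and se."""
    return (math.exp(beta),
            math.exp(beta - Z_95 * se),
            math.exp(beta + Z_95 * se))

# Age 80+ versus 60-79 in the 5-year model of Table 2.
beta, se = log_scale_summary(1.235, 1.209, 1.261)
ratio, lower, upper = wald_interval(beta, se)
print(f"beta = {beta:.3f}, se = {se:.4f}")
print(f"ratio = {ratio:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

A ratio above 1 corresponds to a positive coefficient and a higher risk of death than in the reference level; a ratio below 1 corresponds to a lower risk.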
Table 2. Multivariate Cox proportional hazards regression for early and late death in patients with non-small cell lung cancer
| Variable | Level | 5-year HR | 5-year 95% CI lower limit | 5-year 95% CI upper limit | 5-year P value | 1-year HR | 1-year 95% CI lower limit | 1-year 95% CI upper limit | 1-year P value |
|---|---|---|---|---|---|---|---|---|---|
| Age (years) | 60-79 | Ref | | | | Ref | | | |
| | 80+ | 1.235 | 1.209 | 1.261 | <0.001 | 1.084 | 1.055 | 1.114 | <0.001 |
| Sex | Female | Ref | | | | Ref | | | |
| | Male | 1.267 | 1.241 | 1.294 | <0.001 | 1.086 | 1.052 | 1.144 | <0.001 |
| Race | White | Ref | | | | Ref | | | |
| | Black | 0.941 | 0.911 | 0.972 | <0.001 | 0.959 | 0.920 | 0.999 | 0.045 |
| | Other | 0.782 | 0.752 | 0.813 | <0.001 | 0.901 | 0.855 | 0.950 | <0.001 |
| Marital status | No | Ref | | | | Ref | | | |
| | Yes | 0.892 | 0.874 | 0.911 | <0.001 | 0.912 | 0.864 | 0.952 | <0.001 |
| Tumor size | <3cm | Ref | | | | Ref | | | |
| | <7cm | 1.470 | 1.425 | 1.515 | <0.001 | 1.320 | 1.287 | 1.455 | <0.001 |
| | >7cm | 1.878 | 1.819 | 1.938 | <0.001 | 1.673 | 1.535 | 1.761 | <0.001 |
| Mets at bone | No | Ref | | | | Ref | | | |
| | Yes | 1.211 | 1.167 | 1.256 | <0.001 | 1.180 | 1.133 | 1.230 | <0.001 |
| Mets at brain | No | Ref | | | | Ref | | | |
| | Yes | 1.406 | 1.350 | 1.464 | <0.001 | 1.292 | 1.235 | 1.353 | <0.001 |
| Mets at liver | No | Ref | | | | Ref | | | |
| | Yes | 1.262 | 1.207 | 1.319 | <0.001 | 1.161 | 1.107 | 1.218 | <0.001 |
| AJCC M, 7th | M0 | Ref | | | | Ref | | | |
| | M1a | 1.585 | 1.527 | 1.646 | <0.001 | 1.288 | 1.236 | 1.342 | <0.001 |
| | M1b | 1.797 | 1.729 | 1.868 | <0.001 | 1.294 | 1.238 | 1.353 | <0.001 |
| AJCC N, 7th | N0 | Ref | | | | Ref | | | |
| | N1 | 1.491 | 1.439 | 1.546 | <0.001 | 1.072 | 1.022 | 1.125 | 0.005 |
| | N2 | 1.627 | 1.583 | 1.671 | <0.001 | 1.159 | 1.122 | 1.197 | <0.001 |
| | N3 | 1.615 | 1.555 | 1.677 | <0.001 | 1.125 | 1.077 | 1.176 | <0.001 |
| Grade | I | Ref | | | | Ref | | | |
| | II | 1.286 | 1.233 | 1.340 | <0.001 | 0.986 | 0.926 | 1.050 | 0.665 |
| | III | 1.466 | 1.406 | 1.528 | <0.001 | 1.079 | 1.014 | 1.147 | 0.016 |
| | IV | 1.498 | 1.379 | 1.628 | <0.001 | 1.138 | 1.026 | 1.263 | 0.015 |
| Histologic type | Adenocarcinoma | Ref | | | | Ref | | | |
| | Squamous cell carcinoma | 1.213 | 1.185 | 1.241 | <0.001 | 1.083 | 1.050 | 1.116 | <0.001 |
| | Large cell carcinoma | 1.355 | 1.255 | 1.462 | <0.001 | 1.051 | 0.960 | 1.151 | 0.281 |
| | Other | 1.292 | 1.249 | 1.337 | <0.001 | 1.150 | 1.104 | 1.199 | <0.001 |
| Radiation | No | Ref | | | | Ref | | | |
| | Yes | 0.814 | 0.795 | 0.834 | <0.001 | 0.777 | 0.754 | 0.800 | <0.001 |
| Chemotherapy | No | Ref | | | | Ref | | | |
| | Yes | 0.537 | 0.525 | 0.550 | <0.001 | 0.505 | 0.491 | 0.520 | <0.001 |
| Surg Prim Site | No | Ref | | | | Ref | | | |
| | Yes | 0.345 | 0.334 | 0.356 | <0.001 | 0.782 | 0.750 | 0.816 | <0.001 |

HR, hazard ratio from the multivariate Cox proportional hazards model; CI, confidence interval; Ref, reference level.
KM survival curves
Kaplan-Meier (KM) survival curves were used to analyze the influence of histologic type, metastatic site, and treatment on patient survival. Figure 2a shows the KM analysis of patients with different histologic subtypes, where 1, 2, 3, and 4 represent adenocarcinoma, squamous cell carcinoma, large cell carcinoma, and other types of non-small cell lung cancer, respectively. The KM curves show that the survival rate of squamous cell carcinoma was the highest. In the early stage, the risk of death from large cell carcinoma and other types of non-small cell lung cancer was markedly lower than that from squamous cell carcinoma and adenocarcinoma. Apart from adenocarcinoma, the risk of death was essentially the same across the other histologic types.
Figure 2b shows the KM analysis of patients with non-small cell lung cancer according to metastatic status, where 0 represents no metastasis; 1, 2, and 3 represent bone, brain, and liver metastasis, respectively; and 4 represents multiple metastases. The risk of death increased markedly once the cancer had metastasized: the average survival time was 23 months for patients without metastasis and 6 months for patients with metastasis. In the early stage, the survival rate of patients with bone or brain metastases was slightly higher than that of patients with liver metastases, and patients with multiple metastases had the highest risk of death. In the late stage, the risk of death in patients with metastases was extremely high.
Figure 2c shows the KM analysis of patients by treatment, where 0 represents no treatment and 1, 2, and 3 represent surgery, radiotherapy, and chemotherapy, respectively. The KM curves show that surgery at the primary site markedly improved patient survival. Chemotherapy and radiotherapy also improved survival, and their effects were essentially the same in the early stage. Over time, the effect of chemotherapy became slightly greater than that of radiotherapy, and radiotherapy added little benefit for advanced patients.
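For reference, the product-limit estimator behind a KM curve can be sketched as follows. The five follow-up times are hypothetical and are not study data, and `kaplan_meier` is an illustrative helper rather than the software used to draw Figure 2.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct event time.

    durations: follow-up time per patient; events: 1 = death, 0 = censored.
    """
    survival = 1.0
    curve = []
    at_risk = len(durations)
    for t in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        leaving = sum(1 for d in durations if d == t)
        if deaths:
            # Multiply by the share of at-risk patients who survive time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve

# Hypothetical follow-up in months for five patients (not study data).
months = [2, 3, 3, 5, 8]
died = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(months, died):
    print(f"S({t}) = {s:.2f}")
```

Censored patients contribute to the number at risk until they leave follow-up, which is why the curve steps down only at observed deaths.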
Nomogram construction
Nomograms are often used to build clinical prediction models. We used clinical data from the training set of patients with non-small cell lung cancer to construct a nomogram for assessing and predicting the risk of early death (Fig. 3). First, we identified risk factors for mortality through Cox proportional hazards regression and KM survival curve analysis. Tumor size, treatment, AJCC stage, and metastasis had a significant effect on early death in patients with non-small cell lung cancer. Age, race, sex, and histologic subtype also affected mortality. We therefore selected these factors and performed Cox proportional hazards regression in R to construct a nomogram for predicting the risk of early death in patients with non-small cell lung cancer, which could help doctors and patients better estimate the probability of death.
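The way a nomogram turns regression coefficients into points can be sketched as below. This is a simplified illustration, assuming the common convention that the strongest predictor is scaled to 100 points; the points in Fig. 3 may be scaled differently. The ratios are the 1-year values from Table 2 for a few risk-increasing levels, and `nomogram_points` is a hypothetical helper.

```python
import math

# 1-year ratios from Table 2 for a few risk-increasing levels (vs reference).
one_year_ratios = {
    "tumor size >7cm": 1.673,
    "AJCC M1b": 1.294,
    "brain metastasis": 1.292,
    "bone metastasis": 1.180,
    "age 80+": 1.084,
}

def nomogram_points(ratios):
    """Scale log ratios so the strongest predictor scores 100 points."""
    log_effects = {name: math.log(r) for name, r in ratios.items()}
    largest = max(log_effects.values())
    return {name: 100.0 * b / largest for name, b in log_effects.items()}

points = nomogram_points(one_year_ratios)
for name, value in sorted(points.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {value:5.1f}")

# Total points for a hypothetical patient aged 80+ with a >7 cm tumor.
total = points["age 80+"] + points["tumor size >7cm"]
print(f"total points: {total:.1f}")
```

On a full nomogram, the total points are read against a probability axis derived from the baseline survival of the fitted model, which this sketch does not reproduce.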
Nomogram validation
The nomogram was validated in both the training set and the test set. For the test set, the predicted probabilities of early death were computed with the nomogram built on the training set. Figure 4 shows the ROC curves: 4a and 4c are the ROC curves for predicting 1-year death in the training and test sets, respectively, and 4b and 4d are the ROC curves for predicting 5-year death in the training and test sets, respectively. In the training set, the areas under the ROC curve (AUC) of the nomogram for predicting death within 1 year and 5 years were 0.781 (95% CI: 0.771–0.804) and 0.740 (95% CI: 0.721–0.732), respectively. In the test set, the corresponding AUCs were 0.77 (95% CI: 0.761–0.821) and 0.727 (95% CI: 0.702–0.845). The higher AUCs at 1 year suggest that the model is better suited to predicting early death in non-small cell lung cancer. Figure 5 shows the calibration curves of the model under different conditions; calibration was good in both the training and test sets. Figure 6 shows the decision curve analysis (DCA) of the model, which indicates that the model has good clinical value.
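The AUC has a direct interpretation that a short sketch makes concrete: it is the probability that a randomly chosen patient who died has a higher predicted risk than a randomly chosen patient who survived. The predicted risks below are hypothetical, and `roc_auc` is an illustrative helper, not the routine used to produce Figure 4.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen patient who died
    has a higher predicted risk than one who survived (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one patient in each outcome group")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted risks and outcomes (1 = died within 1 year).
risk = [0.10, 0.40, 0.35, 0.80]
died = [0, 0, 1, 1]
print(roc_auc(risk, died))  # 0.75
```

An AUC of 0.5 corresponds to ranking patients no better than chance, and 1.0 to perfect separation of the two outcome groups.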