Prognostic analysis and clinical characteristics of dual primary lung cancer: a population study based on the Surveillance, Epidemiology, and End Results (SEER) database

The distinctive nature of dual primary lung cancer (DPLC) and the lack of clinical studies have led to limited knowledge about its clinical characteristics and prognosis. This retrospective study assessed the prognostic factors and clinical characteristics of DPLC. A total of 1419 DPLC patients from SEER were analyzed by univariate and multivariable Cox regression. The independent prognostic factors were used to establish a nomogram. The accuracy and reliability of the prognostic model were evaluated by C indexes, calibration plots, receiver operating characteristic (ROC) curves, decision curve analyses (DCA) and integrated discrimination improvement (IDI) scores. The chi-square test was used to assess the differences between DPLC and single primary lung cancer (SPLC) and between synchronous DPLC (sDPLC) and metachronous DPLC (mDPLC). Cox regression analysis showed that age, sex, histological type, stage, lymph node (LN) metastasis, surgery and chemotherapy were independent prognostic factors, and these factors were included in the nomogram. In the training cohort, the C index was 0.690, and the areas under the curve (AUC) for 3- and 5-year survival were 0.720 and 0.723. The calibration plots in the training cohort and validation cohort showed excellent agreement. DCA and IDI showed that the novel prognostic model predicted better than the model based on the 8th AJCC TNM system. The chi-square test indicated that DPLC and SPLC differed significantly in pathological and clinical features. The clinical and pathological characteristics of DPLC differ from those of SPLC. The nomogram could provide accurate and individualized survival predictions for DPLC.


Introduction
Lung cancer has high incidence and mortality and is a leading cause of cancer death worldwide [1]. The incidence of dual primary lung cancer (DPLC) has also increased owing to improvements in medical care and the longer survival of lung cancer patients [2]. DPLC accounts for the majority of multiple primary lung cancer (MPLC). DPLC has two subtypes: synchronous DPLC (occurring at the same time) and metachronous DPLC (occurring at different times) [5,6]. The incidence of DPLC is approximately 0.8-14.5% per patient per year among non-small cell lung cancer patients who receive treatment [3,4]. Nevertheless, few studies have focused on DPLC, and clinicians still lack a clear picture of its clinical features and prognosis. Although the TNM staging system is widely used to evaluate prognosis, it is more applicable to single primary lung cancer (SPLC). Using the TNM staging system to assess DPLC has limitations, because DPLC differs from SPLC in clinical features and pathological characteristics. It is necessary to study the differences between SPLC and DPLC, and a dedicated clinical prognostic model for DPLC is urgently needed. A nomogram is a good choice for evaluating the prognosis of DPLC patients owing to its convenience and accuracy [8,9].
Given the paucity of data on the prognosis and optimal management of DPLC, the aim of this study was to establish an accurate and reliable prognostic model that predicts the overall survival (OS) of DPLC patients and to explore their clinical features. We performed a retrospective analysis based on the complete clinical information of DPLC and SPLC patients from the Surveillance, Epidemiology, and End Results (SEER) database and Provincial Hospital Affiliated to Shandong University. A validated nomogram, compared against the TNM staging system, was used to predict OS, and the chi-square test was used to examine the clinical features of DPLC.
Guanghui Wang and Yukai Zeng contributed equally to this work.
* Jiajun Du dujiajun@sdu.edu.cn

Data source
The relevant clinical information of DPLC and SPLC patients was extracted from the SEER database for the years 2004 to 2015 using SEER*Stat version 8.3.5 (http://seer.cancer.gov/seerstat/). In addition, information on MPLC patients was collected from Provincial Hospital Affiliated to Shandong University.

Patient and variable selection
The following criteria were used to select the DPLC patients in this study: (1) the year of diagnosis was between 2004 and 2015; (2) site and morphology (site recoded ICD-O-3/WHO 2008) were lung and bronchus; (3) site and morphology (behavior recode for analysis) were malignant; (4) the lung cancers were primary based on 'Sequence number', 'First Malignant Primary Indicator' and 'primary by International rules' from the SEER database; (5) patients whose IDs were registered twice were defined as DPLC, and patients whose IDs were registered once were defined as SPLC. Patients with one or two primary lung cancers were included in this study if the information on status, survival time, AJCC stage, AJCC N and grade was clear. A total of 1419 DPLC patients and 70,198 SPLC patients with pathological confirmation were selected from the SEER database, and 173 MPLC patients and 173 SPLC patients were selected from Provincial Hospital Affiliated to Shandong University. The variables included age at diagnosis, sex, race record, primary site label, laterality, grade, ICD-O-3 Hist/behav, malignancy, the time interval (months from the first tumor to the second tumor of the DPLC), AJCC TNM system, COD to site recode, Rx Summ-Surg Prim Site (1998+), radiation record, chemotherapy record, smoking, chronic disease, familial hereditary disease and pre-cancer symptoms. For each DPLC patient, the poorer differentiation grade and the more advanced stage of the first and second primary lung cancers were taken as the final grade and stage. Stage information recorded under the 6th AJCC system was converted to the 8th AJCC system.

Statistical analysis
All statistical analyses were performed with SPSS (version 24.0; SPSS Inc., Chicago, IL) and R software (4.0.5). Survival time was counted from the date of diagnosis (of the second tumor for DPLC) to the date of death. Simple random sampling was performed in SPSS. Univariate and multivariate Cox analyses were used to explore the independent prognostic factors of SPLC and DPLC. OS was analyzed by the Kaplan-Meier method. Propensity score matching (PSM) was used to eliminate the differences between SPLC and DPLC. The chi-square test was used to compare the variables between groups. X-tile software was used to calculate the cut-off value of age. Results were reported as hazard ratios (HR) with 95% confidence intervals (CI). All p values were two-sided, and p values less than 0.05 were considered statistically significant.
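As an illustration of the Kaplan-Meier method mentioned above, the following is a minimal sketch of the product-limit estimator in plain Python. It is not the SPSS or R procedure used in this study; the function name `kaplan_meier` and the toy follow-up data are hypothetical and serve only to show how censored observations leave the risk set without counting as deaths.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times  -- follow-up time of each patient (e.g. months)
    events -- 1 if death was observed, 0 if the patient was censored
    Returns a list of (time, survival probability) at each death time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)          # deaths and censorings leaving the risk set
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # multiply by the conditional probability of surviving past t
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data, not taken from SEER
times = [5, 8, 8, 12, 20, 20, 30]
events = [1, 1, 0, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t:>2}  S(t)={s:.3f}")
```

Censored patients reduce the number at risk at later times but never lower the curve themselves, which is why the estimate only steps down at observed death times.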

Univariate and multivariate Cox analysis for overall survival
According to the univariate analysis of 1419 DPLC patients, age, race, sex, histological type, grade, stage, LN metastasis, surgery and chemotherapy were associated with the overall survival of DPLC patients. Cox multivariate analysis was then performed to examine the effect of each clinical variable on OS, and it showed that age, sex, histological type, stage, LN metastasis, surgery and chemotherapy were significant prognostic factors (log-rank test, all p values < 0.05; Table 1). Age ≥ 72 years (HR 1.531; 95% CI 1.271-1.844; p < 0.001) was associated with worse OS. Female patients had a better prognosis (HR 0.654; 95% CI 0.566-0.755; p < 0.001). The prognosis of patients (Fig. 1).

Propensity score matching of SPLC and DPLC
In our study, the OS rates of DPLC and SPLC patients were compared by the Kaplan-Meier method. Because of confounding factors, the unadjusted comparison gave a counterintuitive result: SPLC patients had a poorer prognosis than DPLC patients (Fig. 2A, p < 0.001). To perform PSM analysis, we took 900 DPLC patients who underwent surgery and 900 SPLC patients who underwent surgery, the latter drawn by random sampling from the 70,198 SPLC patients. Grade, LN metastasis and stage were included in the PSM analysis. The differences in these factors between the two groups were eliminated by PSM (Table supplement 1). The Kaplan-Meier method was then used to explore the OS of the matched data, and the result showed that DPLC patients had the poorer prognosis (Fig. 2B).
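The matching step of PSM can be sketched as follows. This is a simplified illustration under stated assumptions: the propensity scores are taken as already estimated (for example from a logistic model on grade, LN metastasis and stage), the matching rule is greedy 1:1 nearest-neighbour matching without replacement, and the caliper of 0.05 is arbitrary. The source does not state which matching algorithm or caliper was used, and the patient IDs and scores below are hypothetical.

```python
def greedy_match(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated, control -- dicts mapping patient ID to propensity score
    caliper          -- maximum allowed score difference for a valid pair
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(control)       # controls not yet used
    pairs = []
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # closest remaining control by absolute score difference
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]     # each control is matched at most once
    return pairs


# Hypothetical propensity scores, for illustration only
dplc = {"D1": 0.30, "D2": 0.55, "D3": 0.90}
splc = {"S1": 0.32, "S2": 0.50, "S3": 0.58, "S4": 0.10}
pairs = greedy_match(dplc, splc)
print(pairs)
```

In this toy example D3 stays unmatched because no control lies within the caliper, which illustrates why a matched sample can be smaller than the original groups.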

Clinical and pathological differences between DPLC and SPLC
A total of 1419 SPLC patients were drawn from the 70,198 SPLC patients by random sampling. The pathological differences among the single primary lung tumor, the 1st tumor of DPLC and the 2nd tumor of DPLC were examined by the chi-square test (Table 2). According to Table 2, the main histologic type of DPLC was lung adenocarcinoma, the main stage of DPLC was stage I, and distant metastasis of DPLC was rare.
The clinical characteristics were analyzed using patient information from Provincial Hospital Affiliated to Shandong University. A total of 173 SPLC patients and 173 MPLC patients were included in the chi-square test (Table 3). The proportions of male and female patients with DPLC were approximately the same, whereas male patients made up the majority of SPLC patients. Most MPLC patients did not smoke and did not have chronic diseases before the occurrence of cancer.
These results also indicated that the pathogenesis and etiology might differ between MPLC and SPLC.
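The chi-square comparison of a categorical variable between two groups can be sketched as below. The SPLC row (130 male, 43 female) follows from the male share of 75.14% reported for the 173 SPLC patients; the MPLC row (87 male, 86 female) is a hypothetical split chosen only to represent "approximately the same" proportions, since the exact counts are in Table 3 and not in the text. The sketch uses the Pearson statistic without continuity correction, which may differ from the SPSS output.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_1df(stat):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Rows: SPLC, MPLC; columns: male, female.
# The second row is HYPOTHETICAL (see the text above).
sex_table = [[130, 43], [87, 86]]
stat, dof = chi_square(sex_table)
p = p_value_1df(stat)
print(f"chi2={stat:.3f}, dof={dof}, p={p:.2e}")
```

The closed-form p value holds only for a 2 x 2 table (1 degree of freedom); larger tables need the general chi-square distribution.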

Prognostic nomogram for DPLC
A total of 1419 patients from the SEER database were randomly divided into a training cohort and a validation cohort, with the training cohort accounting for 70% of the cases. To predict the survival time of DPLC patients, the training cohort was used to establish a nomogram including the aforementioned significant factors (Fig. 3). The nomogram showed that surgery contributed the most to the OS of DPLC, followed by age, sex, histological type, stage and LN metastasis. The predicted 3- and 5-year OS rates are shown in Fig. 3. Each category of each factor was assigned a score on the point scale, and the total score was calculated by adding the scores of all factors. A higher total score was associated with a worse survival prognosis. The nomogram could be used to predict each patient's survival according to their own condition.
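The point-scale logic described above can be sketched as follows, assuming the common convention that the factor with the widest range of log hazard ratios spans 0 to 100 points and all other factors are scaled proportionally. The hazard ratios for age and sex are those reported in the text; the hazard ratio for surgery is HYPOTHETICAL and only stands in for the factor with the largest effect. The published nomogram in Fig. 3 contains more factors and its exact point values are not reproduced here.

```python
import math

# Hazard ratios relative to each factor's reference level.
hazard_ratios = {
    "age": {"<72": 1.0, ">=72": 1.531},      # from the text
    "sex": {"male": 1.0, "female": 0.654},   # from the text
    "surgery": {"yes": 1.0, "no": 3.0},      # HYPOTHETICAL value
}


def build_point_table(hrs):
    """Scale log hazard ratios so the widest factor spans 0-100 points."""
    betas = {f: {lvl: math.log(hr) for lvl, hr in levels.items()}
             for f, levels in hrs.items()}
    widest = max(max(b.values()) - min(b.values()) for b in betas.values())
    table = {}
    for factor, b in betas.items():
        lowest = min(b.values())   # lowest-risk level of this factor gets 0
        table[factor] = {lvl: round(100 * (beta - lowest) / widest, 1)
                         for lvl, beta in b.items()}
    return table


def total_points(table, patient):
    """Sum the points of a patient's level on every factor."""
    return sum(table[factor][level] for factor, level in patient.items())


points = build_point_table(hazard_ratios)
high_risk = total_points(points, {"age": ">=72", "sex": "male", "surgery": "no"})
low_risk = total_points(points, {"age": "<72", "sex": "female", "surgery": "yes"})
print(points)
print(high_risk, low_risk)
```

A higher total corresponds to a higher predicted hazard, matching the statement that a higher score indicates a worse prognosis; mapping the total to 3- and 5-year OS additionally requires the baseline survival of the fitted Cox model, which is not sketched here.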

Validation of predictive model's accuracy
The accuracy of the nomogram was examined by the C index, the ROC curve and the calibration plot. The C index was 0.690 (95% CI 0.678-0.702) in the training cohort and 0.681 (95% CI 0.663-0.699) in the validation cohort, both indicating good predictive value. The 3- and 5-year AUC values of the ROC curves were 0.720 and 0.723 in the training cohort (Fig. 4A) and 0.696 and 0.755 in the validation cohort (Figure supplement 1A). Additionally, the calibration plots showed good consistency between predicted and observed 3- and 5-year OS rates in the training cohort and validation cohort (Fig. 4B and Figure supplement 1B). All calibration curves were close to the ideal 45° dotted line.
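For readers unfamiliar with the C index, the following minimal sketch computes Harrell's concordance index in plain Python. It is an illustrative O(n²) implementation, not the R routine used in this study, and the toy data are hypothetical. A pair of patients is treated as comparable when the one with the shorter follow-up had an observed death; the pair is concordant when that patient also has the higher predicted risk.

```python
def concordance_index(times, events, risk):
    """Harrell's C index for right-censored survival data.

    times  -- follow-up times
    events -- 1 if death observed, 0 if censored
    risk   -- predicted risk score (higher means worse prognosis)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue                      # i must have an observed death
        for j in range(n):
            if times[i] < times[j]:       # i died before j's last follow-up
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5     # ties in risk count as half
    return concordant / comparable


# Hypothetical toy data
times = [5, 10, 15, 20, 25]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.6, 0.3, 0.1]
c_index = concordance_index(times, events, risk)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which places the reported values of 0.690 and 0.681 between the two.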
To verify the predictive ability of the nomogram, DCA was used to compare the predictive model with the AJCC TNM system. The 3- and 5-year DCA curves of the training cohort and validation cohort are shown in Fig. 4 and Figure supplement 1.

Discussion
In this study, we found that age, sex, histological type, stage, LN metastasis, surgery and chemotherapy were significant factors associated with the prognosis of DPLC patients. These results were broadly consistent with other studies, although Cong-kuan Song et al. found that race and grade also had an impact on the prognosis of DPLC [10].
Surgical resection is a main therapeutic method that prolongs OS in most DPLC patients [11]. Our study also showed that patients who underwent two operations survived longer than patients without surgery. Previous studies showed that radiation and chemotherapy are effective alternatives for patients in whom surgery is contraindicated or who do not wish to undergo surgery [12,13]. However, we found that only chemotherapy prolonged the survival time of DPLC patients; radiotherapy was not an independent prognostic factor for DPLC patients.
Unexpectedly, the OS of SPLC was shorter than that of DPLC (Fig. 2A). We speculated that the clinical and pathological features of SPLC and DPLC were so different that confounding factors distorted the comparison of survival time. To account for this, we restricted the analysis to SPLC and DPLC patients who underwent surgery and used the key prognostic factors (grade, LN metastasis and stage) in the PSM analysis. After PSM, the Kaplan-Meier curve showed a more plausible outcome, with a lower OS rate for DPLC (Fig. 2B). This may be because the second tumor in DPLC patients often cannot be treated by lobectomy, and only limited resection such as wide wedge resection is possible.
A previous study proposed a diagnostic criterion by which DPLC is divided into synchronous DPLC and metachronous DPLC according to the interval (6 months) between the 1st and 2nd tumors of DPLC [14]. Martini and Melamed proposed an interval of 2 years for mMPLC based on 50 cases in 1975 [16]. The American College of Chest Physicians, however, put forward diagnostic criteria with an interval of 4 years in 2007 [6]. Our study adopted the 6-month interval. Puzzlingly, the interval was not a significant prognostic factor when it was included in the Cox regression analysis. Considering the influence of confounding factors, we performed PSM to eliminate the differences between sDPLC and mDPLC; however, some bias remained after PSM. The Kaplan-Meier curve showed no difference between their OS rates after PSM (p > 0.05). Cong-kuan Song et al. reported the same finding [10]. There are two possible reasons for this: first, the statistical limitations of this study due to the small sample size; second, the interval used as the diagnostic criterion may be unreasonable. These results suggest that further research with more data could define a new interval with greater prognostic significance.
Fig. 3 The nomogram of DPLC. To predict the survival time of DPLC patients, the patients' information of training cohort was used to establish a nomogram including aforementioned significant factors
Fig. 4 Validation of predictive model's accuracy. Accuracy of the nomogram was examined by the C index, the ROC curve, the calibration chart according to training cohort
The chi-square test was used to examine the differences between SPLC and DPLC. We used data from the SEER database to analyze the pathological differences and data from Provincial Hospital Affiliated to Shandong University to analyze the clinical differences (Tables 2 and 3).
The pathological differences between SPLC and DPLC were mainly reflected in laterality, histologic type, grade and TNM stage. The histologic type of DPLC was mainly adenocarcinoma, which is consistent with previous research [15]. The differentiation grade of DPLC was mainly II and III, and early-stage patients made up the majority.
The clinical differences between DPLC and SPLC were mainly in sex, smoking, chronic disease and pre-cancer symptoms. In the SPLC cohort, male patients accounted for 75.14% and female patients for 24.86%, and the 119 patients with a smoking habit accounted for 68.79%. In the DPLC cohort, the proportions of male and female patients were about the same, and the majority of patients did not have a smoking habit. Most MPLC patients did not have chronic diseases or pre-cancer symptoms, which makes timely diagnosis difficult.
It is noteworthy that there are still no specific treatment guidelines or plans for DPLC. Nomograms are widely used to predict cancer prognosis, primarily because of their ability to combine statistical predictive factors into a single multivariable graphical tool [7]. Considering these advantages, we included the aforementioned independent prognostic factors to establish a nomogram for predicting the OS of patients. Its effectiveness was assessed by the C index, ROC curve and calibration plot, and the results indicated that the nomogram had good accuracy and predictive ability. Clinicians use the traditional 8th AJCC TNM system to assess the progression of cancer, but the system cannot predict the OS of individual patients or provide clinical guidance, and its assessment lacks specificity for cancer subtypes. The nomogram made up for these shortcomings of the 8th AJCC TNM system because it also included clinical factors and treatment methods. DCA curves confirmed that the new model predicted better than the 8th AJCC TNM system in the training and validation cohorts (Fig. 4 and Figure supplement 1). The IDI values also supported these results.
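The quantity plotted on a DCA curve is the net benefit at a given threshold probability. The sketch below uses the standard formula for a binary outcome, net benefit = TP/n - (FP/n) × pt/(1 - pt), and compares a model with the "treat all" strategy. It is a simplification under stated assumptions: censoring is ignored, whereas DCA for 3- and 5-year survival as in this study requires a survival-adapted version. The predicted probabilities and outcomes are hypothetical.

```python
def net_benefit(probs, outcomes, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold.

    probs     -- predicted probability of the event (e.g. death by 3 years)
    outcomes  -- 1 if the event occurred, 0 otherwise (no censoring assumed)
    threshold -- threshold probability pt, with 0 < pt < 1
    """
    n = len(probs)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Net benefit when every patient is treated as high risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical predictions and outcomes
outcomes = [1, 1, 1, 0, 0, 0, 0, 1]
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.7, 0.35]
nb_model = net_benefit(probs, outcomes, 0.5)
nb_all = net_benefit_treat_all(outcomes, 0.5)
print(nb_model, nb_all)
```

Repeating the calculation over a range of thresholds for two models yields the curves that are compared in a DCA plot; the model with the higher net benefit over the clinically relevant thresholds is preferred.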
Our study has the following advantages. First, it established a nomogram to predict the survival time of DPLC patients, which could be used to improve prediction. Second, it was the first attempt to perform chi-square analysis and PSM analysis to compare DPLC and SPLC. This work expanded our knowledge of DPLC. There are also some shortcomings in our research. We did not clearly explore the differences between sDPLC and mDPLC because of the lack of data. In addition, the number of patients from Provincial Hospital Affiliated to Shandong University and the SEER database was too small to explore the clinical features and pathological characteristics thoroughly. Considering these shortcomings, further prospective studies are recommended.

Conclusion
The clinical and pathological characteristics of DPLC were different from those of SPLC. Therefore, the AJCC TNM system was not suitable for prognosis assessment of DPLC patients. The nomogram based on significant factors could provide accurate and individualized survival predictions for DPLC.

Funding
This study was supported by the Jinan Science and Technology Plan (grant no. 202019058) and the Natural Science Foundation of Shandong Province (grant no. ZR2019MH026).

Conflict of interests
The authors declare that they have no competing interests.

Ethics approval and consent to participate
Not applicable.

Consent for publication
The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. Investigators obtained informed consent before enrolling participants in clinical trials. The consent was for publication of their details under the Creative Commons Attribution License 4.0 (such that they will be freely available on the internet).