Development and Validation of a Nomogram for Predicting the Overall Survival of Patients With Pancreatic Ductal Adenocarcinoma of the Head of the Pancreas

Background: The purpose of this study was to develop and validate a nomogram to predict the overall survival (OS) of patients with pancreatic ductal adenocarcinoma of the head of the pancreas (PDAC-HP). Methods: Using the Surveillance, Epidemiology, and End Results (SEER) database, we identified patients diagnosed with PDAC-HP in the United States between 2004 and 2015. Patients were randomly divided into a training set and a validating set at a ratio of 7:3. The training set was used to develop a nomogram for predicting OS. The C index, the area under the curve (AUC) of the receiver operating characteristic (ROC), calibration plots, the net reclassification improvement (NRI), and the integrated discrimination improvement (IDI) were used to evaluate the prediction accuracy of the nomogram. Results: A total of 33,893 patients over 20 years old who were diagnosed with PDAC-HP between 2004 and 2015 were collected from the SEER database. Using multivariable Cox regression analysis, we identified eight risk factors associated with OS: age at diagnosis, sex, marital status at diagnosis, race, AJCC staging, surgery, radiotherapy, and chemotherapy. A nomogram was constructed based on these variables. Compared with the AJCC staging system, the nomogram had a higher C index and AUC in both the training set and the validating set. The calibration plots indicated that the nomogram accurately predicted the OS of patients with PDAC-HP at 1, 3, and 5 years. Conclusions: We developed and validated a nomogram that predicts the OS of patients with PDAC-HP at 1, 3, and 5 years. The nomogram performed better than the AJCC staging system and could serve as an effective tool for prognostic evaluation of patients with PDAC-HP.

have already developed tumor metastasis after the diagnosis of the disease. [5,8] Although surgery can significantly prolong the survival time of PDAC patients, the 5-year survival rate after surgery is only about 20% because of the high postoperative recurrence rate. [9][10][11] At present, the tumor staging and prognosis of PDAC patients are mainly based on the AJCC staging system. [12] However, the AJCC staging system does not take into account the influence of age, gender, race, surgery, and other factors on tumor prognosis. [13,14] Therefore, constructing a prognostic prediction model with high accuracy and specificity is still of great significance for improving the prognosis of PDAC patients and guiding their treatment.
A nomogram is a predictive model that can combine multiple risk factors. [14] In recent years, nomograms have been widely used to predict the overall survival (OS) of patients with different types of cancer. [15][16][17] At present, few studies have used a nomogram to study the OS of patients with pancreatic ductal adenocarcinoma of the head of the pancreas (PDAC-HP). Therefore, the purpose of this study is to explore the potential risk factors that affect the OS of patients with PDAC-HP and to construct a nomogram model to predict the OS of these patients.

Method
Data source
In this study, the data of patients with PDAC-HP in the United States from 2004 to 2015 were derived from the Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute. The SEER database collects a variety of cancer data for about 30% of the population of the United States, including demographics, tumor characteristics, and survival data. [18] It provides relevant data to all researchers free of charge.
We obtained the license to use the SEER database by signing the SEER Research Data Agreement. We screened for independent risk factors that affect the OS of patients with PDAC-HP using stepwise Cox proportional hazards regression analysis. The Akaike information criterion (AIC) was used to select the final predictor variables and build the nomogram model. The discrimination of the nomogram was evaluated by Harrell's C index (C statistic), and bootstrap resampling with 1000 iterations was applied to verify the accuracy of the nomogram. The C index of the constructed nomogram was compared with that of the AJCC staging system (sixth edition). The area under the curve (AUC) of the receiver operating characteristic (ROC) was used to evaluate the accuracy of the nomogram in 1-, 3-, and 5-year survival predictions, and calibration plots were used to evaluate the agreement between predicted and observed survival.
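To make the discrimination measure concrete, the following is a minimal Python sketch of Harrell's C index for right-censored data. It is an illustration only, not the code used in this study (the analyses were run in R): the function name `harrell_c_index` and the toy data are hypothetical, and pairs with tied follow-up times are simply skipped, which is a simplification.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's C index for right-censored survival data (simplified sketch).

    times       -- observed follow-up time of each patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output; a higher score means a higher predicted risk
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        if times[a] == times[b]:
            continue  # tied times are skipped in this simplified sketch
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        usable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # the patient who died earlier had the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half concordant
    if usable == 0:
        raise ValueError("no comparable pairs")
    return concordant / usable


# Hypothetical toy data: four patients, the third one censored at 15 months.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.4, 0.6, 0.1]
print(harrell_c_index(toy_times, toy_events, toy_risk))  # 0.8
```

In the toy data, five pairs are comparable and four of them are ordered correctly by the risk score, giving 0.8. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering.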
Using the net reclassification improvement (NRI) and the integrated discrimination improvement (IDI), we compared the accuracy of the nomogram with that of the AJCC staging system. Finally, decision-curve analysis (DCA) was used to evaluate the clinical validity of the nomogram. All statistical analyses were performed using R software (version 3.6.3) (http://www.r-project.org/). All tests were two-sided, and P < 0.05 was considered statistically significant.
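The quantity plotted in a decision curve is the net benefit at a given threshold probability. The Python sketch below illustrates the standard net-benefit formula for a binary outcome (for example, death within a fixed horizon). It is an assumption-laden illustration rather than the procedure used in this study: it ignores censoring, and the function names and toy data are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on patients with predicted risk >= threshold.

    y_true    -- 1 if the event occurred within the horizon, else 0
    y_prob    -- predicted probability of the event from the model
    threshold -- threshold probability, strictly between 0 and 1
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - false_pos / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: intervene on every patient regardless of risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data: eight patients, three events.
toy_outcome = [1, 1, 0, 0, 1, 0, 0, 0]
toy_risk = [0.9, 0.7, 0.6, 0.2, 0.8, 0.1, 0.3, 0.4]
print(net_benefit(toy_outcome, toy_risk, 0.5))     # 0.25
print(net_benefit_treat_all(toy_outcome, 0.5))     # -0.25
```

A decision curve repeats this calculation over a range of thresholds and compares each model with the "treat all" and "treat none" (net benefit of zero) strategies. A model with the higher curve offers more clinical benefit at that threshold.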

Results
Patient basic characteristics
Table 1 shows the basic characteristics of the patients with PDAC-HP. In this study, we collected 33,893 patients over 20 years old who were diagnosed with PDAC-HP between 2004 and 2015. These patients were randomly divided into a training set of 23,725 patients and a validating set of 10,138 patients. In the training set, the average age of the patients was 68.4 ± 11.6 years, and the median OS was 11.2 months. In the validating set, the average age of the patients was 68.8 ± 11.7 years, and the median OS was 11.2 months.
The hazard ratio (HR) and 95% confidence interval (CI) of each factor related to the OS of patients with PDAC-HP are shown in Table 2.

Nomogram development
We used the training set to construct the survival prediction nomogram for patients with PDAC-HP from the selected predictors. The C index of the nomogram was 0.736, which was higher than that of the AJCC staging system (0.625). The nomogram was used to calculate the OS of patients at 1, 3, and 5 years after diagnosis (Fig. 1).
Finally, we constructed calibration plots to verify the predictive ability of the nomogram. The calibration curves showed that the 1-, 3-, and 5-year survival predictions of the nomogram were very close to the actual survival (Fig. 3).

Nomogram validation
We used the validating set to verify the nomogram. Its C index was 0.732, which was higher than that of the AJCC staging system (0.623). We then compared the AUCs of the two models in the validating set for predicting the OS of patients at 1, 3, and 5 years after diagnosis. The AUCs of the nomogram at 1, 3, and 5 years after diagnosis were 0.792 (Fig. 4A), 0.792 (Fig. 4B), and 0.780 (Fig. 4C), respectively. These were higher than the corresponding AUCs of the AJCC staging system, which were 0.670 (Fig. 4A), 0.680 (Fig. 4B), and 0.675 (Fig. 4C), respectively.
In addition, we constructed calibration plots for the nomogram in the validating set at 1, 3, and 5 years after diagnosis. The results show that the nomogram's 1-, 3-, and 5-year survival predictions were very close to the actual survival (Fig. 5).
Finally, we further analyzed the accuracy of the nomogram in predicting the OS of patients with PDAC-HP. Compared with the AJCC staging system, the nomogram had better NRI and IDI values. For the nomogram, the NRIs at 1, 3, and 5 years were 0.578 (95% CI = 0.551-0.598), 0.398 (95% CI = 0.364-0.435), and 0.386 (95% CI = 0.338-0.451), respectively; the IDIs at 1, 3, and 5 years were 0.011 (P = 0.003), 0.014 (P < 0.001), and 0.014 (P < 0.001), respectively. The DCA curves of the nomogram for the validating set at 1, 3, and 5 years after diagnosis are shown in Fig. 6. The results show that, compared with the AJCC staging system, the nomogram has better accuracy and clinical validity in predicting the OS of patients with PDAC-HP.
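For readers unfamiliar with these two indices, the Python sketch below shows how the IDI and the category-free (continuous) NRI are defined for a binary outcome when an old model is compared with a new one. It is a simplified illustration under stated assumptions, not the survival-time versions reported above: censoring is ignored, and the function names and toy probabilities are hypothetical.

```python
def _mean(values):
    return sum(values) / len(values)


def idi(y_true, p_old, p_new):
    """Integrated discrimination improvement of the new model over the old one.

    IDI = (mean change in predicted risk among events)
        - (mean change in predicted risk among non-events)
    """
    change_events = _mean([n - o for y, o, n in zip(y_true, p_old, p_new) if y == 1])
    change_nonevents = _mean([n - o for y, o, n in zip(y_true, p_old, p_new) if y == 0])
    return change_events - change_nonevents


def continuous_nri(y_true, p_old, p_new):
    """Category-free net reclassification improvement.

    Events should move up in predicted risk; non-events should move down.
    """
    events = [(o, n) for y, o, n in zip(y_true, p_old, p_new) if y == 1]
    nonevents = [(o, n) for y, o, n in zip(y_true, p_old, p_new) if y == 0]
    up_e = sum(1 for o, n in events if n > o)
    down_e = sum(1 for o, n in events if n < o)
    up_ne = sum(1 for o, n in nonevents if n > o)
    down_ne = sum(1 for o, n in nonevents if n < o)
    return (up_e - down_e) / len(events) + (down_ne - up_ne) / len(nonevents)


# Hypothetical toy data: four events followed by four non-events.
toy_outcome = [1, 1, 1, 1, 0, 0, 0, 0]
toy_old = [0.5] * 8
toy_new = [0.75, 0.75, 0.75, 0.25, 0.25, 0.25, 0.75, 0.5]
print(idi(toy_outcome, toy_old, toy_new))             # 0.1875
print(continuous_nri(toy_outcome, toy_old, toy_new))  # 0.75
```

Positive values of either index indicate that the new model assigns higher risks to patients who have the event and lower risks to those who do not, relative to the old model.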

Discussion
In recent years, there have been about 18 million new cancer cases worldwide each year, and about 9.6 million patients have died of cancer. [1] There are approximately 460,000 new cases of pancreatic cancer each year, accounting for 2.5% of all tumors and ranking 14th among new cancers worldwide. [1] Pancreatic cancer, which includes pancreatic ductal adenocarcinoma (PDAC) and other types, is the digestive system cancer with the highest mortality rate, with a 5-year survival rate of about 8%. [19,20] Because PDAC is prone to metastasis, treatments such as surgery, radiotherapy, and chemotherapy have limited effect, which often has a serious impact on the survival of patients. [21,22] Therefore, early detection and effective treatment can improve patient outcomes. At present, the TNM staging system is the most widely used tool for evaluating the prognosis of cancer patients, but because of its limitations, it cannot make individualized predictions for cancer patients. [13,23,24] Therefore, the development of an individualized predictive model that integrates multiple predictive factors is of great significance for improving the treatment effect and prolonging the survival of PDAC patients.
A nomogram is a type of prediction model that can estimate the probability of a specific outcome. [14] It can combine a variety of predictive factors, such as demographic and tumor characteristics, and graphically display the survival rate of each patient. [25] Accumulating studies have shown that nomograms have better predictive ability than the AJCC staging system. [26,27] At present, nomograms are widely used to predict the survival outcome of patients with various tumors, such as lung cancer, breast cancer, and liver cancer. [28][29][30] In this study, we constructed a nomogram of the survival outcomes of patients with PDAC-HP based on 33,893 American patients with PDAC-HP in the SEER database. The results showed that eight variables were independent prognostic factors: age at diagnosis, sex, race, marital status at diagnosis, AJCC staging, surgery, radiotherapy, and chemotherapy. Using the AIC, we determined these eight variables as the predictors of the final nomogram model. In the training set, the C index of the nomogram was 0.736, which was higher than that of the AJCC staging system (0.625). Compared with the AJCC staging system, the nomogram had higher AUCs for the OS of patients at 1, 3, and 5 years. The calibration plots also showed that the survival rate predicted by the nomogram for patients with PDAC-HP was very close to the actual survival rate. These results show that the nomogram can predict the survival outcome of patients with PDAC-HP well and that its predictive ability is better than that of the AJCC staging system.
In the validating set, we validated the nomogram for the survival of patients with PDAC-HP. The C index of the nomogram in the validating set (0.732) was similar to that in the training set and was higher than that of the AJCC staging system (0.623). The AUCs and calibration curves showed that the nomogram could also predict the survival outcome of patients with PDAC-HP well in the validating set. Then, to further evaluate the predictive ability and clinical significance of the nomogram, we analyzed its NRI, IDI, and DCA. The NRI and IDI are evaluation indicators of model effectiveness. [31][32][33] Compared with the AJCC staging system, the NRI and IDI of the nomogram were higher at 1, 3, and 5 years after diagnosis. DCA is generally considered to be useful for verifying the benefits and clinical effectiveness of a model. [34][35][36] In our study, the nomogram had better DCA results than the AJCC staging system at 1, 3, and 5 years after diagnosis. This shows that, compared with the AJCC staging system, the nomogram is more clinically effective and accurate in predicting the OS of patients with PDAC-HP. In short, the nomogram we constructed is better than the AJCC staging system at predicting the OS of patients with PDAC-HP and provides a reference for patient treatment strategies.
This study has some limitations that should be noted. First, it is a retrospective study based on the SEER database. Some factors that may affect the OS of patients are not included in the nomogram, such as religious beliefs, education level, lymphovascular invasion, and drug treatments. Second, retrospective research has its own limitations, such as selection bias and information bias in the selection of the study population. In addition, the nomogram only includes some predictors, and there may be some deviations when doctors use it to predict the OS of patients. Finally, the data of this study only included the PDAC-HP population in some parts of the United States, so the nomogram should be verified in additional large independent sets before it is applied more widely.

Conclusion
In summary, we screened and identified eight predictors related to the OS of patients with PDAC-HP: age at diagnosis, sex, race, marital status at diagnosis, AJCC staging, surgery, radiotherapy, and chemotherapy. Based on a large research set, we established a nomogram to predict the OS of patients with PDAC-HP at 1, 3, and 5 years after diagnosis. The nomogram performs better than the AJCC staging system and can predict the OS of patients with PDAC-HP well.

Declarations
Ethical approval: The data of this study come from the SEER database. The SEER database is a tumor-related database developed by the National Cancer Institute of the United States, providing research data for researchers free of charge. All patients included in the study were covered by the ethical approval sought by the National Cancer Institute. Informed consent was obtained from all patients or, if patients were under 18, from a parent and/or legal guardian.

Consent for publication: Consent for publication was obtained from all participants.
Availability of data and materials: We obtained permission to access the database after signing and submitting the SEER Research Data Agreement form via email. The data that support the findings of this study are available from the SEER database, but restrictions apply to the availability of these data, which were used under license for the current study and so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of the SEER database.
Data sharing: The datasets generated and analyzed during the current study are available in the SEER database repository (https://seer.cancer.gov).

Conflicts of Interest: None
Funding: None.
Author Contributions: