Development and Validation of A Nomogram for Predicting the Overall Survival of Patients with Testicular Cancer


Abstract

Background
The purpose of this study was to develop and validate a nomogram containing multiple predictors for the survival of testicular cancer patients.

Methods
Testicular cancer patients diagnosed between 2004 and 2015 were selected from the Surveillance, Epidemiology, and End Results (SEER) database. Random sampling was used to divide patients into training and validation cohorts, which accounted for 70% and 30% of the total sample, respectively. The nomogram was developed using the training cohort and evaluated using the C index, calibration charts, and the area under the receiver operating characteristic curve (AUC). The same method was applied to the validation cohort to verify the nomogram.

Results
Seven risk factors that affect the survival of testicular cancer patients (AJCC stage, marital status, age at diagnosis, race, SEER historic stage A, surgery status, and origin) were identified using Cox proportional hazard regression analysis. The training cohort was used to construct a nomogram, which had a higher C index (0.897) and AUC than the AJCC staging system. In the validation cohort, the nomogram likewise had a higher C index and AUC than the AJCC staging system. The calibration charts show that the survival predicted by the nomogram at 3, 5, and 10 years after diagnosis is very close to the actual survival.

Conclusions
We developed and validated a nomogram for predicting the survival rate of testicular cancer patients at 3, 5, and 10 years after diagnosis. This nomogram has better discrimination, calibration, and clinical validity than the AJCC staging system. This indicates that the nomogram can be used to predict the survival of testicular cancer patients effectively, and provide a reference for patient treatment strategies.

Background
Testicular cancer accounts for about 0.4% of all cancers. [1,2] There were about 71,000 new testicular cancer patients and 9,500 deaths globally in 2018, mostly in developed regions such as Europe, Australia, and North America. [2] There are approximately 9,300 new testicular cancer patients in the US every year, making it the 18th most common tumor in males. [3,4] Testicular cancer is the second most common cancer of the male reproductive system after prostate cancer. [5,6] It is most prevalent in males younger than 45 years, and mainly consists of germ cell tumors (more than 90% of cases). [7][8][9] Testicular cancer is caused by multiple factors, such as genetics, diet, and lifestyle. [10][11][12] Its incidence reportedly differs between ethnic groups, being highest in whites. [13,14] Cryptorchidism is also a risk factor for testicular cancer. [15] The incidence among patients with cryptorchidism is 10 times that among those without it. [16] Testicular cancer is mainly treated through surgical removal of the tumor combined with radiotherapy and chemotherapy, and most patients have a good prognosis. [17] However, some patients respond poorly to treatment.
At present, the TNM staging system is often used to predict the survival outcome of testicular cancer patients. [18] However, the TNM staging system only accounts for the impact of some tumor characteristics, such as size, invasion status, and metastasis, on the survival of patients, and does not consider individual differences such as sex, age, and race when estimating prognosis. [19] With the discovery of additional cancer risk factors and with scientific and technological developments, it is necessary to develop a new method for predicting patient survival that includes tumor characteristics, demographics, and treatment methods. A nomogram is a graphical representation of a model that combines a variety of risk factors, and has been widely used to predict the overall survival of patients with different types of cancer. [19,20] This study explored risk factors that potentially affect the prognosis of testicular cancer patients based on the Surveillance, Epidemiology, and End Results (SEER) 18 database, and developed a nomogram to predict their overall survival.

Data source
In this study, the data of testicular cancer patients between 2004 and 2015 were extracted from the SEER 18 database. The SEER database covers approximately 30% of the US population and contains a large amount of cancer research data. The SEER 18 database collected data related to various patient tumors between 1973 and 2015, including demographics, tumor characteristics, treatment methods, and survival data. Signing the SEER Research Data Agreement provided us with access to the SEER database.

Study population and inclusion criteria
The SEER*Stat software (version 8.3.6) was used to identify testicular cancer cases in the SEER 18 database by their codes in the International Classification of Diseases for Oncology, third edition (ICD-O-3). The primary site codes of testicular cancer are C62.0-C62.9, classified according to location as undescended testis (C62.0), descended testis (C62.1), and testis, not otherwise specified (C62.9). Testicular cancer patients with incomplete data on race, AJCC stage, or survival, as well as those younger than 20 years, were excluded. We extracted data on 25,468 testicular cancer patients from the SEER 18 database, who were divided into a training cohort (70%) and a validation cohort (30%) using random sampling. The patient screening process is shown in eFigure 1 in the Supplement. The data collected from all patients included race, age at diagnosis, AJCC stage, tumor size, year of diagnosis, surgery status, chemotherapy status, radiotherapy status, marital status, survival time, and tumor extension.
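The 70%/30% random split described above can be illustrated with a short sketch. It is not the authors' code (the analysis was done in R); the function name `split_cohort`, the seed, and the use of integer patient IDs are assumptions made only for illustration.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=2020):
    """Randomly split patient IDs into training and validation cohorts."""
    rng = random.Random(seed)  # arbitrary fixed seed so the split is reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)  # round down to whole patients
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical IDs 0..25467 stand in for the 25,468 extracted patients.
train_ids, valid_ids = split_cohort(range(25468))
print(len(train_ids), len(valid_ids))  # 17827 7641
```

With 25,468 patients and a 0.7 fraction rounded down, the cohort sizes match the 17,827 and 7,641 patients reported for the training and validation cohorts.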

Statistical analysis
We used univariable and multivariable Cox proportional hazard regression analyses to calculate the hazard ratio (HR) and 95% confidence interval (CI) of each risk factor for testicular cancer. The Akaike information criterion (AIC) was used as the selection criterion: the variables in the model with the lowest AIC were retained as the final predictor variables. Random sampling was used to divide patients into training and validation cohorts, of which the training cohort accounted for 70% of the sample. After screening the significant predictor variables, we used the training cohort to construct a nomogram and predict the survival of each patient at 3, 5, and 10 years after diagnosis. The C index of the nomogram was compared internally with that of the AJCC staging system (sixth edition), and the accuracy of the nomogram was assessed using 1,000 iterations of bootstrap resampling. We evaluated the accuracy of the 3-, 5-, and 10-year survival predictions using the area under the receiver operating characteristic curve (AUC), and evaluated the calibration of the nomogram using calibration graphs. The same method was applied to the validation cohort to validate the nomogram. The accuracy of the nomogram was compared with that of the AJCC staging system through the net reclassification improvement (NRI) and the integrated discrimination improvement (IDI). The clinical validity of the nomogram was evaluated by decision-curve analysis (DCA). All statistical analyses were performed using R software (version 3.6.3). The significance level was set to P<0.05.
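To make the C index concrete, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is an illustration under stated assumptions, not the authors' R implementation: the function name `concordance_index` and the toy data are hypothetical, and pairs with tied survival times are simply skipped.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable patient pairs ranked correctly by risk.

    A pair is comparable when the patient with the shorter follow-up time
    had an observed death (event == 1). Ties in risk count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:  # patient i died before j was last seen
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy data (hypothetical): follow-up in months, death indicator, predicted risk.
times = [5, 10, 12, 20, 30]
events = [1, 0, 1, 1, 0]
risks = [0.9, 0.4, 0.2, 0.3, 0.1]
c_index = concordance_index(times, events, risks)
print(round(c_index, 3))  # 0.857
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The double loop is quadratic in the number of patients, so a production analysis would normally rely on an optimized library routine.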

Basic characteristics of patients
This study included 25,468 testicular cancer patients older than 20 years who were diagnosed between 2004 and 2015. The patients were randomly divided into the training cohort (17,827 patients) and the validation cohort (7,641 patients). The patients were aged 37.2 ± 13.0 and 37.0 ± 12.9 years (mean ± SD) in the training and validation cohorts, respectively, and were mostly white (91.6% and 92.5%), at AJCC stage I (73.2% and 73.4%), and at the localized stage (66.2% and 66.4%). The basic characteristics of the patients are listed in Table 1.

Nomogram development
Based on the seven screened predictors, we used the training cohort to construct the survival prediction nomogram for testicular cancer patients, which had a C index of 0.898. Based on the nomogram, we calculated the overall survival of testicular cancer patients at 3, 5, and 10 years after diagnosis (Fig. 1). From the characteristics of each testicular cancer patient, we can calculate the score of each predictor and then estimate the survival of the patient at 3, 5, and 10 years after diagnosis from the total score of all predictors. We compared the discrimination of the nomogram and the AJCC staging system through 1,000 iterations of bootstrap resampling. We found that the C index was higher for the nomogram (0.898) than for the AJCC staging system (0.834).
In addition, we compared the AUCs of the nomogram and the AJCC staging system for predicting the overall survival of testicular cancer patients at 3, 5, and 10 years after diagnosis. For the nomogram, the AUCs for overall survival at 3, 5, and 10 years after diagnosis were 0.914 (Fig. 2A), 0.909 (Fig. 2B), and 0.920 (Fig. 2C), respectively. For the AJCC staging system, the corresponding AUCs were 0.850 (Fig. 2A), 0.826 (Fig. 2B), and 0.817 (Fig. 2C), respectively. The AUCs of the nomogram and the AJCC staging system for the training cohort are shown in Fig. 2.
Finally, we constructed calibration charts for the nomogram based on patients in the training cohort at 3, 5, and 10 years after diagnosis to verify the similarity between the survival predicted by the nomogram and the actual survival of patients (Fig. 3). The results show that the nomogram's 3-, 5-, and 10-year survival predictions were very close to the actual survival.
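The scoring step can be sketched as follows, assuming the usual construction in which a Cox-model nomogram assigns points proportional to each predictor's regression coefficient and maps the total back to a survival probability through the baseline survival. All numbers below (the log hazard ratios, the baseline survival values, and the three predictors chosen) are hypothetical placeholders and are not the values fitted in this study.

```python
import math

# Hypothetical log hazard ratios per predictor level (reference level = 0.0).
# Illustrative placeholders only, NOT the coefficients fitted in this study.
LOG_HAZARD_RATIOS = {
    "ajcc_stage": {"I": 0.0, "II": 0.6, "III": 1.8},
    "surgery": {"yes": 0.0, "no": 0.9},
    "marital_status": {"married": 0.0, "unmarried": 0.4},
}
# Hypothetical baseline survival S0(t) for a patient at every reference level.
BASELINE_SURVIVAL = {3: 0.990, 5: 0.985, 10: 0.970}

# Scale so that the most influential predictor spans 0 to 100 points.
LARGEST_EFFECT = max(max(levels.values()) for levels in LOG_HAZARD_RATIOS.values())
POINTS_PER_UNIT = 100.0 / LARGEST_EFFECT


def predictor_points(patient):
    """Points contributed by each predictor for one patient."""
    return {name: levels[patient[name]] * POINTS_PER_UNIT
            for name, levels in LOG_HAZARD_RATIOS.items()}


def predicted_survival(patient, years):
    """Map total points back to S(t | x) = S0(t) ** exp(linear predictor)."""
    total_points = sum(predictor_points(patient).values())
    linear_predictor = total_points / POINTS_PER_UNIT
    return BASELINE_SURVIVAL[years] ** math.exp(linear_predictor)


reference = {"ajcc_stage": "I", "surgery": "yes", "marital_status": "married"}
high_risk = {"ajcc_stage": "III", "surgery": "no", "marital_status": "unmarried"}
for years in (3, 5, 10):
    print(years, predicted_survival(reference, years), predicted_survival(high_risk, years))
```

Reading a printed nomogram performs the same arithmetic graphically: each predictor axis gives its points, the points are summed, and the total-points axis is aligned with the 3-, 5-, and 10-year survival axes.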
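A simplified sketch of an AUC at a fixed time horizon is given below. It treats patients who died by the horizon as cases and patients still under follow-up beyond it as controls, and it drops patients censored before the horizon. This is an assumption made for illustration: the text does not state which time-dependent ROC estimator was used, and dedicated methods typically weight for censoring rather than discarding those patients. The function name and toy data are hypothetical.

```python
def auc_at_horizon(times, events, risk_scores, horizon):
    """AUC for 'died by horizon' versus 'alive beyond horizon'.

    Patients censored before the horizon have unknown status and are dropped
    (a simplification; censoring-weighted estimators keep their information).
    """
    cases = [r for t, e, r in zip(times, events, risk_scores) if e and t <= horizon]
    controls = [r for t, r in zip(times, risk_scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Probability that a random case has a higher risk score than a random control.
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Toy data (hypothetical): follow-up in months, death indicator, predicted risk.
times = [12, 30, 40, 70, 80, 90]
events = [1, 1, 0, 1, 0, 0]
risks = [0.8, 0.3, 0.5, 0.6, 0.2, 0.4]
auc_5y = auc_at_horizon(times, events, risks, horizon=60)
print(round(auc_5y, 3))  # 0.667
```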
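The idea behind a calibration chart can be sketched as follows: patients are grouped by predicted survival, and within each group the mean prediction is compared with the observed survival estimated by the Kaplan-Meier method. This is a minimal illustration with hypothetical function names and toy data, and it omits the bootstrap resampling mentioned in the text.

```python
def kaplan_meier_survival(times, events, horizon):
    """Kaplan-Meier estimate of the survival probability at `horizon`."""
    survival = 1.0
    at_risk = len(times)
    for t in sorted(set(times)):
        if t > horizon:
            break
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1.0 - deaths / at_risk
        at_risk -= leaving
    return survival


def calibration_table(times, events, predicted, horizon, n_groups=2):
    """Rows of (mean predicted survival, Kaplan-Meier observed survival)."""
    order = sorted(range(len(times)), key=lambda i: predicted[i])
    size = len(order) // n_groups
    rows = []
    for g in range(n_groups):
        last = g == n_groups - 1
        idx = order[g * size:] if last else order[g * size:(g + 1) * size]
        mean_predicted = sum(predicted[i] for i in idx) / len(idx)
        observed = kaplan_meier_survival(
            [times[i] for i in idx], [events[i] for i in idx], horizon)
        rows.append((mean_predicted, observed))
    return rows


# Toy data (hypothetical): follow-up in years, death indicator, predicted survival.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [0.2, 0.3, 0.4, 0.5, 0.9, 0.9, 0.95, 0.95]
rows = calibration_table(times, events, predicted, horizon=5)
for mean_predicted, observed in rows:
    print(round(mean_predicted, 3), round(observed, 3))
```

Points lying close to the diagonal (predicted equal to observed) indicate good calibration.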

Nomogram validation
We used the validation cohort data to verify the nomogram, using the same method we used to construct the model. We compared the discrimination of the nomogram and the AJCC staging system in the validation cohort through 1,000 iterations of bootstrap resampling. The results show that the nomogram has a higher C index in the validation cohort (0.872) than the AJCC staging system (0.797). We then compared the AUCs of both models in the validation cohort for predicting the overall survival of testicular cancer patients at 3, 5, and 10 years after diagnosis. In the validation cohort, the AUCs for overall survival at 3, 5, and 10 years after diagnosis (see Fig. 4A, 4B, and 4C, respectively) were 0.891, 0.881, and 0.875 for the nomogram, and 0.809, 0.794, and 0.759 for the AJCC staging system.
The AUCs of the nomogram and the AJCC staging system for the validation cohort are shown in Fig. 4.
We constructed calibration plots of the nomogram for patients in the validation cohort at 3, 5, and 10 years after diagnosis. The results show that the 3-, 5-, and 10-year survival predictions of the nomogram are very close to the actual survival rates (Fig. 5).
Finally, we compared the accuracy of the nomogram with that of the AJCC staging system for the validation cohort. At 3, 5, and 10 years after diagnosis, the NRI values were 0.379 (95% CI = 0.276-0.491), 0.383 (95% CI = 0.263-0.489), and 0.422 (95% CI = 0.307-0.507), respectively, while the IDI values were 0.066 (P < 0.001), 0.078 (P < 0.001), and 0.088 (P < 0.001). The DCA curves of the nomogram for the validation cohort at 3, 5, and 10 years after diagnosis are shown in Fig. 6. The results show that the nomogram is more clinically effective and accurate at predicting the survival of testicular cancer patients than the AJCC staging system.
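For readers unfamiliar with these two measures, the sketch below computes a category-free NRI and the IDI for a binary outcome at a fixed time point. This is an assumption made for illustration: the text does not specify which NRI variant or which survival-data extension was used, and the function name and toy data are hypothetical. Here `risk_old` plays the role of the AJCC-based prediction and `risk_new` that of the nomogram.

```python
def nri_and_idi(outcomes, risk_old, risk_new):
    """Category-free NRI and IDI of a new model versus an old one.

    outcomes: 1 if the patient died by the time point, 0 otherwise.
    """
    events = [i for i, y in enumerate(outcomes) if y == 1]
    nonevents = [i for i, y in enumerate(outcomes) if y == 0]

    def net_upward(indices):
        # Share of patients whose risk moved up minus share whose risk moved down.
        up = sum(risk_new[i] > risk_old[i] for i in indices)
        down = sum(risk_new[i] < risk_old[i] for i in indices)
        return (up - down) / len(indices)

    def mean_gain(indices):
        return sum(risk_new[i] - risk_old[i] for i in indices) / len(indices)

    # Good reclassification: risk goes up for events and down for non-events.
    nri = net_upward(events) - net_upward(nonevents)
    idi = mean_gain(events) - mean_gain(nonevents)
    return nri, idi


# Toy data (hypothetical).
outcomes = [1, 1, 0, 0, 0]
risk_old = [0.3, 0.5, 0.4, 0.2, 0.3]
risk_new = [0.6, 0.4, 0.2, 0.1, 0.4]
nri, idi = nri_and_idi(outcomes, risk_old, risk_new)
print(round(nri, 3), round(idi, 3))  # 0.333 0.167
```

Positive values of both measures favor the new model, which is the direction of the results reported above.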

Discussion
Testicular cancer is a cancer of the reproductive system that is common in young males and is the 29th most common new type of cancer in the world. [2] There are approximately 9,300 new testicular cancer patients in the US each year, making it the 18th most common new cancer for male patients. [4] Testicular cancers mostly consist of germ cell tumors, which often have a serious impact on health. [21,22] Therefore, early detection and treatment are of great importance in improving therapeutic effects and prolonging survival of patients with testicular cancer. [23] At present, the AJCC staging system is the most common tool used by clinicians in predicting the survival of cancer patients. It cannot predict the survival of the individual as it only contains the relevant characteristics of the tumor. [24] Therefore, the development of a new method that individualizes survival predictions is of great importance for testicular cancer patients.
Due to the limitations of the AJCC staging system, nomograms have become a new method for predicting the overall survival of cancer patients in recent years. [19] A nomogram is a predictive model that includes a variety of predictors, such as tumor and demographic characteristics, and types of therapy. [25][26][27] It displays the predicted survival of patients in a graphical manner based on a complex mathematical formula. Using the nomogram, we can calculate the score of each predictor variable and the cumulative score, which is matched against the outcome scale to predict the survival of each patient. [28,29] Given that nomograms can contain diverse predictors and provide accurate predictions, they have been widely used to predict survival in many other cancers, such as lung, breast, liver, stomach, and prostate cancer. [30][31][32][33]
The present study constructed a nomogram for predicting the survival of testicular cancer patients. We first extracted data on 25,468 testicular cancer patients from the SEER database, and analyzed the risk factors that affect their survival using multivariable Cox regression analysis. We identified the seven predictors most relevant to survival (P<0.05) based on the AIC, which were AJCC stage, race, SEER historic stage A, age at diagnosis, surgery status, marital status, and origin. We next clarified the impact of these factors on the long-term survival of testicular cancer patients using multivariable Cox regression analysis, and therefore decided to include them in the final nomogram. We then constructed a nomogram for the training cohort based on the selected predictors. The nomogram for the training cohort has a higher C index (0.898) than the AJCC staging system (0.834). The nomogram also has a higher AUC than the AJCC staging system at 3, 5, and 10 years after diagnosis.
According to the results of the calibration curve of the training cohort at 3, 5, and 10 years after diagnosis, we found that the nomogram's predictions on testicular cancer survival are very close to the actual survival. This indicates that the nomogram is more accurate in predicting the survival of testicular cancer patients than the AJCC staging system.
We used the validation cohort to validate the nomogram for testicular cancer patient survival. The C index of the nomogram for the validation cohort (0.872) was similar to that for the training cohort, and higher than that of the AJCC staging system (0.797). The AUC values of the nomogram in the validation cohort at 3, 5, and 10 years after diagnosis are similar to those in the training cohort. This indicates that the nomogram predicts testicular cancer survival in the validation cohort about as well as in the training cohort. We also constructed calibration curves for the validation cohort at 3, 5, and 10 years after diagnosis, which confirmed this conclusion. We then evaluated the clinical significance of the nomogram for predicting the survival of testicular cancer patients. We found that the NRI and IDI values at 3, 5, and 10 years after diagnosis favored the nomogram over the AJCC staging system. This indicates that the nomogram predicts the overall survival of testicular cancer patients more accurately. DCA is often considered to be useful for verifying the benefits and clinical validity of a model. [19,34,35] In our research, the nomogram had better DCA results than the AJCC staging system at 3, 5, and 10 years after diagnosis. This indicates that, compared to the AJCC staging system, the nomogram is more clinically effective and accurate in predicting the survival of testicular cancer patients. In summary, the nomogram we constructed is better than the AJCC staging system at predicting the survival of testicular cancer patients, and provides a reference for patient treatment strategies.
Our study had several limitations. First, the research data come from the SEER database, which lacks some information, such as underlying disease status, education level, drug treatments, religious beliefs, and family history, that may have an impact on the survival of testicular cancer patients. Second, cohort studies have inherent limitations, such as possible selection and information bias. Third, there are inherent limitations for any nomogram, such as the assumption that the data collected and analyzed are static in time, and there are no recognized reporting standards for performance. [19] In addition, our study only included testicular cancer patients in some regions of the US, so external validation is needed before the nomogram is applied to other regions. Future studies should include testicular cancer patients from other countries or regions to further verify the nomogram.
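The quantity plotted in a decision curve is the net benefit at each threshold probability. The sketch below uses the standard definition for a binary outcome at a fixed time point: true positives per patient minus false positives per patient, the latter weighted by the odds of the threshold. The function names and toy data are hypothetical, and the handling of censoring that a survival analysis would require is omitted.

```python
def net_benefit(outcomes, risks, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    n = len(outcomes)
    true_pos = sum(1 for y, r in zip(outcomes, risks) if r >= threshold and y == 1)
    false_pos = sum(1 for y, r in zip(outcomes, risks) if r >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: treat every patient as high risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (hypothetical): death by the time point and predicted risk.
outcomes = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
risks = [0.7, 0.4, 0.5, 0.1, 0.1, 0.2, 0.05, 0.1, 0.3, 0.1]
nb_model = net_benefit(outcomes, risks, threshold=0.25)
nb_all = net_benefit_treat_all(outcomes, threshold=0.25)
print(round(nb_model, 3), round(nb_all, 3))  # 0.133 -0.067
```

A model is considered clinically useful over the range of thresholds where its net benefit exceeds both the treat-all strategy and the treat-none strategy (net benefit of zero).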

Conclusions
We screened and identified seven predictors that were most relevant to the survival of testicular cancer patients, which were AJCC stage, race, SEER historic stage A, age at diagnosis, surgery status, marital status, and origin. We developed and validated a nomogram based on these predictors, and used it to predict the survival rate of testicular cancer patients at 3, 5, and 10 years after diagnosis. The nomogram we constructed has improved discrimination, calibration, and clinical validity compared to the AJCC staging system. This indicates that the nomogram can be used to predict the overall survival of testicular cancer patients accurately, and can provide a reference for patient treatment strategies.

Declarations
Ethical approval: The SEER database is a tumor-related database developed by the National Cancer Institute of the United States, providing research data for researchers free of charge. All patients participating in the study received the ethical approval sought by the National Cancer Institute.

Consent for publication:
Consent for publication was obtained from all participants.
Availability of data and materials: We obtained permission to access the database after signing and submitting the SEER Research Data Agreement form via email.
Data sharing:

Figure 1
Nomogram predicting the 3-year, 5-year and 10-year overall survival of testicular cancer patients.

Figure 2
Comparison of the AUC of nomogram and AJCC staging system in the training set.

Figure 3
The calibration of the nomogram using the training set.

Figure 4
Comparison of the AUC of nomogram and AJCC staging system in the validation set.

Figure 5
The calibration of the nomogram using the validation set.

Figure 6
The decision curve analysis of the nomogram and the AJCC staging system in the validation set.
