Prognostic models for predicting overall survival and cancer-specific mortality in hepatocellular carcinoma: A competing risk analysis

Background: Mortality due to hepatocellular carcinoma (HCC), which is the most common liver cancer, is often overestimated because of deaths from other causes. This study was conducted to estimate the probability of cancer-specific mortality (CSM) in patients with HCC and to establish a competing risk nomogram for predicting CSM in these patients using a large population-based cohort. Methods: Patients diagnosed with HCC between 2004 and 2015 were identified from the Surveillance, Epidemiology, and End Results Program. CSM and overall survival (OS) were the endpoints of the study. A competing risk nomogram for predicting CSM was built using the Fine and Gray regression model, the nomogram for predicting OS was constructed with the Cox proportional hazard regression model, and 10-fold cross-validation was performed on the entire set. Results: A total of 34,957 patients were included in the study and randomly divided into a training set and a validation set at a ratio of 7:3. Multivariate analysis identified age, race, surgical therapy, chemotherapy, radiotherapy, tumour diameter, and tumour staging as independent predictive factors of CSM. In addition to these factors, sex and marital status were identified as independent predictive factors of OS. Using these factors, corresponding nomograms were constructed for CSM and OS. In the validation set, the 5-year concordance indices of the two nomogram models were estimated as 0.746 and 0.74. Calibration curves revealed good consistency between model predictions and observed outcomes. Furthermore, based on the results of cumulative incidence function analysis and Kaplan-Meier analysis, patients were categorised into four distinct risk subgroups, supporting the predictive performance of the models. Conclusions: In this population-based analysis, we developed and validated nomograms for individualized prediction of CSM and OS in patients with HCC.

of Cancer Control and Population Science, National Cancer Institute.

Patients
Patients with HCC were identified based on the International Classification of Diseases for Oncology, third edition, primary site code C22.0 and histologic type codes 8170-8175. We excluded patients with missing data, which were recorded as 'blank' in the database. Additionally, patients aged < 18 years, those with a history of other primary malignancies, those with invalid follow-up data, and those with undefined data recorded as 'unknown' in the database were excluded from the study. The detailed inclusion and exclusion criteria are shown in Figure 1.

Variable selection
Information on demographic factors (race, age, sex, and marital status), tumour diameter, tumour stage according to the American Joint Committee on Cancer (AJCC) staging system, therapeutic factors (surgery, chemotherapy, and radiotherapy), and follow-up data was collected from the SEER database.
Marital status was recorded as single (never married), separated, divorced, widowed, unmarried or domestic partner, and married (including common law) in the SEER database. We grouped single (never married), separated, divorced, widowed, and unmarried or domestic partner into the unmarried category. No information was available on tumour staging according to the eighth edition of the AJCC staging system, and only patients diagnosed after 2010 had staging information according to the seventh edition in the SEER database. To enrol as many patients as possible, the sixth edition of the AJCC staging system was used for further analysis. Based on the Surgery Codes of the SEER program, we divided surgical procedures into four categories: no surgery, local tumour destruction (e.g., heat-radio-frequency ablation and percutaneous ethanol injection), resection, and transplantation.
The endpoints of the study were CSM and OS. The specific cause of death was determined based on the 'SEER cause-specific death classification' code in the SEER database. OS was calculated from the date of diagnosis to the date of death due to any cause or the most recent follow-up. The time to CSM was defined as the interval between the date of diagnosis and the date of death caused by HCC alone or the most recent follow-up. The median follow-up time was calculated by the reverse Kaplan-Meier method.
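The reverse Kaplan-Meier method mentioned above can be illustrated with a minimal sketch. It swaps the roles of death and censoring, so that censoring becomes the "event", and reads the median off the resulting curve. The function names `km_curve` and `median_follow_up` and the toy data in the usage note are assumptions made for illustration; they are not taken from the study.

```python
def km_curve(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) at each event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = 0        # events observed at time t
        leaving = 0  # subjects leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            leaving += 1
            i += 1
        if d > 0:
            surv *= 1.0 - d / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving
    return curve


def median_follow_up(times, died):
    """Reverse Kaplan-Meier: censoring is treated as the event of interest."""
    reversed_events = [1 - d for d in died]
    for t, s in km_curve(times, reversed_events):
        if s <= 0.5:
            return t
    return None  # median not reached (e.g. no censored observations)
```

For example, `median_follow_up([2, 4, 6, 8, 10, 12], [1, 0, 1, 0, 0, 1])` uses six hypothetical patients with follow-up in months, where `1` marks a death and `0` a censored observation.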

Statistical analysis
Demographic and clinical variables were summarized using descriptive statistics. Categorical variables were expressed as percentages (%). The entire set was randomly divided into training and validation sets at a ratio of 7:3. Categorical variables were compared between the training and validation sets using the chi-square test. CSM and death from other causes were regarded as two competing endpoint events, and the associations between variables and the risk of CSM were evaluated by Fine and Gray's competing risk analysis [6]. The corresponding cumulative incidences of CSM in different groups were estimated with the cumulative incidence function (CIF) and compared using Gray's test [6,8,13]. Variables with p < 0.05 in univariate analysis, or considered clinically relevant, were then evaluated in multivariate analysis based on Fine and Gray's proportional sub-distribution hazard model. The independent predictive factors in the multivariate analysis were incorporated into the nomogram model to predict the 1-, 3-, and 5-year CSM probabilities [7]. In addition, to compare competing risk analysis with traditional survival analysis, we also performed traditional Kaplan-Meier and Cox proportional hazard regression analyses for CSM. In these traditional analyses, the competing event was treated as censored data.
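The difference between the CIF and the traditional approach of censoring competing events can be made concrete with a small sketch. It assumes a nonparametric (Aalen-Johansen type) CIF estimator and status codes `0` = censored, `1` = death from HCC, `2` = death from other causes; the function names and the coding are illustrative assumptions, not the authors' implementation.

```python
def cumulative_incidence(times, status, cause, horizon):
    """Nonparametric CIF for `cause` at `horizon` with competing risks."""
    data = sorted(zip(times, status))
    n_at_risk = len(data)
    event_free = 1.0  # probability of no event of any cause before time t
    cif = 0.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        d_cause = d_any = leaving = 0
        while i < len(data) and data[i][0] == t:
            s = data[i][1]
            d_cause += (s == cause)
            d_any += (s != 0)
            leaving += 1
            i += 1
        cif += event_free * d_cause / n_at_risk
        event_free *= 1.0 - d_any / n_at_risk
        n_at_risk -= leaving
    return cif


def one_minus_km(times, status, cause, horizon):
    """Traditional estimate: competing events are treated as censored."""
    data = sorted(zip(times, status))
    n_at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        d = leaving = 0
        while i < len(data) and data[i][0] == t:
            d += (data[i][1] == cause)
            leaving += 1
            i += 1
        surv *= 1.0 - d / n_at_risk
        n_at_risk -= leaving
    return 1.0 - surv
```

On any data set in which competing deaths occur before the last event of interest, `one_minus_km` returns a value at least as large as `cumulative_incidence`, because patients who died of other causes are implicitly assumed to remain at risk of the event of interest.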
For OS, the independent risk factors were identified by univariate and multivariate Cox proportional hazard regression analyses, and the corresponding nomogram model was then constructed to predict 1-, 3-, and 5-year OS probabilities.
The predictive performance of the nomogram models was analysed in terms of discrimination and calibration. The discriminative ability of the models was tested using the concordance index (C-index), and the calibration of the models was tested by calibration curves [14,15]. In addition, 10-fold cross-validation was conducted on the entire set, which was randomly partitioned into 10 equal-sized subsamples [11]. Furthermore, CIF curves with Gray's test and Kaplan-Meier curves with the log-rank test were used to assess the performance of the models. The risk groups were defined according to previously recommended cut-points for predictive models (16th, 50th, and 84th percentiles) [16], which classified patients into good, fairly good, fairly poor, and poor risk groups based on their personalized total points determined using the nomogram models.
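As an illustration of the discrimination measure described above, the following is a minimal sketch of Harrell's C-index for right-censored data without competing risks (the OS setting). It is an assumption that this variant matches the one used in the study; the competing risk setting generally calls for an adapted definition, which is not shown here. The function name `harrell_c_index` is hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs ranked correctly.

    A pair (i, j) is comparable when subject i had the event and
    times[i] < times[j]. It is concordant when the subject with the
    shorter time has the higher risk score; ties in score count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. In a nomogram setting, the total points would typically serve as the risk score.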

Basic characteristics of patients
According to the inclusion and exclusion criteria, 34,957 patients diagnosed with HCC between 2004 and 2015 were included for analysis (Figure 1). The entire cohort was then randomly divided into a training set (n = 24,469) and a validation set (n = 10,488) at a ratio of 7:3. The basic clinicopathological features of the entire cohort as well as those of the training and validation sets are shown in Table 1. In the entire cohort, most patients were Caucasian (67.9%) and male (77.2%), and 46.0% were younger than 60 years. In terms of therapy for HCC, 67.3% of patients did not undergo surgery for the primary nodules, 13.2% were treated by local tumour destruction, 11.8% underwent liver resection, and 7.7% underwent liver transplantation. Regarding tumour characteristics, tumours with a diameter smaller than 3 cm (33.3%) and stage I disease per AJCC criteria (41.9%) were most common. There were no significant differences in clinicopathological features between the training and validation sets.

Follow-up and survival analysis of patients
The median follow-up time was 63 months (range: 1-155 months). Of the 34,957 patients, 9,840 survived throughout follow-up, 21,044 died from HCC, and 4,073 died from other causes. In the training set, the 5-year rates of OS, CSM, and death due to other causes were 24.2%, 63.9%, and 11.9%, respectively. The 3- and 5-year cumulative incidences of mortality and the CIF curves corresponding to each clinicopathological variable are shown in Table 2 and Figure 2.
In the traditional survival analysis for CSM, the cumulative incidence of CSM estimated by the Kaplan-Meier function was higher than that estimated by the CIF; the 5-year cumulative incidence of CSM estimated by Kaplan-Meier analysis in the training set was 69.1% (Table S1).

Identification of risk factors and construction of nomograms
Univariate and multivariate analyses were performed in the training set to identify independent risk factors associated with CSM and OS (Tables 3 and 4 and Table S3).
For CSM, multivariate analysis based on the Fine-Gray competing risk model showed that age, race, surgical therapy, chemotherapy, radiotherapy, tumour diameter, and tumour stage were independent risk factors. Specifically, old age, white race, absence of surgical therapy, absence of radiotherapy, absence of chemotherapy, larger tumour diameter, and advanced tumour stage were associated with an increased probability of CSM. In the multivariate traditional Cox regression analysis, these seven variables were again identified as independent risk factors for CSM. Moreover, sex and marital status were identified as additional independent risk factors.
Multivariate Cox regression analysis showed that all the aforementioned nine variables were independent risk factors for OS.
Specifically, old age, male sex, white race, unmarried status, absence of surgical therapy, absence of radiotherapy, absence of chemotherapy, larger tumour diameter, and more advanced tumour stage were associated with poorer OS.
Based on the independent risk factors identified in the corresponding multivariate analyses, the nomogram for predicting OS and the competing risk nomogram for predicting CSM were constructed (Figure 3). A nomogram, which integrates various prognostic factors, is an easy-to-apply graphical tool for personalized prediction of the survival probability of patients. In the nomogram, each variable is assigned points, which are read from the points scale. Adding the points of all variables gives the total points, which can be used to estimate the probability of the event by drawing a line downward from the location of the total points to the survival axes.
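The point-adding procedure described above can be sketched in code. The example assumes a Cox-type model in which the predicted survival is the baseline survival raised to the power exp(linear predictor), and it scales points so that the largest single effect is worth 100 points, which is a common nomogram convention. Every number below (the coefficients in `COEFS`, the baseline survival, the subset of variables and their levels) is hypothetical and chosen only for illustration; none of them is taken from the study.

```python
import math

# HYPOTHETICAL log-hazard coefficients relative to each reference level.
COEFS = {
    "age": {"<60": 0.0, "60-69": 0.1, ">=70": 0.3},
    "surgery": {"transplantation": 0.0, "resection": 0.4,
                "local destruction": 0.8, "none": 1.6},
    "stage": {"I": 0.0, "II": 0.2, "III": 0.7, "IV": 1.2},
}
BASELINE_SURVIVAL_5Y = 0.80  # HYPOTHETICAL 5-year survival at reference levels

# The largest single effect defines the 100-point end of the points scale.
MAX_EFFECT = max(max(levels.values()) for levels in COEFS.values())


def points(variable, level):
    """Points for one variable, proportional to its coefficient."""
    return 100.0 * COEFS[variable][level] / MAX_EFFECT


def total_points(patient):
    """Sum of points over all variables of one patient (dict of levels)."""
    return sum(points(variable, level) for variable, level in patient.items())


def survival_probability(total):
    """Map total points back to a 5-year survival probability."""
    linear_predictor = total * MAX_EFFECT / 100.0
    return BASELINE_SURVIVAL_5Y ** math.exp(linear_predictor)
```

For instance, `survival_probability(total_points({"age": ">=70", "surgery": "none", "stage": "IV"}))` follows the same steps as reading a printed nomogram. For a Fine-Gray model the analogous mapping would act on the baseline cumulative incidence, using 1 - (1 - CIF0)^exp(linear predictor).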
Predictive performance of nomogram models
The predictive performance of the nomogram models was verified via the C-index and calibration curves in the training and validation sets (Table S2). Calibration curves for 3- and 5-year predictions were well matched with the standard lines (Figure 4 and Figure S1).
Based on the nomograms, each patient was assigned corresponding total points for the probability of CSM and OS. The median total points calculated for CSM and OS were 148 (range: 7-254) and 169 (range:  in the training set and 148 (range: 7-254) and 169 (range: 16-296) in the validation set. Based on previously reported cut-off points (the 16th, 50th, and 84th percentiles of total points in the training set) [16], patients were categorised into four distinct risk groups. CIF and Kaplan-Meier analyses showed that the curves corresponding to the four risk groups were clearly separated in the training and validation sets (both p < 0.001), further supporting the good predictive performance of the nomogram models (Figure 5).
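The risk grouping described above can be sketched as follows. Cut-offs are taken at the 16th, 50th, and 84th percentiles of total points in the training set and then applied unchanged to any patient. The linear-interpolation percentile rule and the handling of values that fall exactly on a cut-off are assumptions, since the text does not specify them, and the function names are hypothetical.

```python
LABELS = ["good", "fairly good", "fairly poor", "poor"]


def percentile(sorted_values, q):
    """Percentile with linear interpolation between order statistics."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def risk_cutoffs(training_points):
    """Cut-offs at the 16th, 50th, and 84th percentiles of the training set."""
    ordered = sorted(training_points)
    return [percentile(ordered, q) for q in (16, 50, 84)]


def risk_group(total, cutoffs):
    """Assign a group; a value equal to a cut-off stays in the lower group."""
    index = sum(total > c for c in cutoffs)
    return LABELS[index]
```

Deriving the cut-offs from the training set only, and reusing them in the validation set, keeps the validation step independent of the data used to define the groups.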

Discussion
Cancer-specific death and death from other causes are mutually exclusive events in oncology research. The traditional Kaplan-Meier and Cox regression methods, which treat competing events as censored data, tend to overestimate the incidence of CSM [17,18]. As a result, clinicians' prognostic assessments may be biased, creating a substantial psychological burden on patients and negatively affecting their lives. In the present study, we conducted a real-world study based on the SEER database to identify independent predictive factors of CSM in patients diagnosed with HCC using the competing risk method and established a competing risk nomogram model for individualized prediction of CSM. We also constructed a model for predicting OS. Both models achieved good predictive performance, which could potentially help clinicians assess prognosis more accurately.
In the present study, we also compared traditional Kaplan-Meier and Cox regression analyses with CIF and Fine-Gray competing risk regression analyses. To our knowledge, this is the first study to compare the two approaches in the prognostic analysis of HCC. Although there was no obvious difference in the predictive performance of the models, some issues deserve attention. First, in the presence of competing events, traditional survival analysis treats competing events as censored data, which may overestimate the incidence of the event of interest. Our results showed that the cumulative incidence of CSM estimated by the Kaplan-Meier method was higher than that estimated by the CIF, which is consistent with previous research on other malignancies [19,20]. Similarly, the Cox regression model overestimated the incidence of CSM, and the degree of overestimation became more pronounced as follow-up lengthened (Figure S2). Overestimating the incidence of CSM can place a heavy psychological burden on patients and affect their lives. Therefore, when competing events are present, the competing risk model is a better choice for accurately predicting the incidence of the event of interest, especially for elderly patients or those with early-stage tumours, in whom competing events occur more frequently. In addition, the results of multivariate analysis differed between the Fine-Gray analysis and the Cox regression analysis. Sex and marital status were identified as independent risk factors for CSM in the Cox regression model; however, these two variables were not significant in the Fine-Gray analysis.
Together, the two models constructed for prediction of CSM and OS included nine parameters: age, race, sex, marital status, surgical therapy, chemotherapy, radiotherapy, tumour diameter, and AJCC staging. Marital status and sex were the only differences between the two models. Previous research showed that marital status is one factor affecting prognosis [21,22]; this may be because a close and cohesive family increases the likelihood of adherence, and psychological and economic support from spouses may contribute to improved survival in married patients [23-25]. Furthermore, several studies based on the SEER database indicated that married patients with HCC had a better prognosis [26-28]. However, our competing risk analysis showed that marital status was significantly associated with OS but not with CSM. Therefore, in HCC, marital status appears to be mainly associated with death from other causes and has little association with cancer-specific death.
Although using population-based data from SEER can reduce the selection or treatment biases associated with small sample sizes or single-centre data analysis, this study has several limitations. First, this was a retrospective study. Second, although we included a large multicentre cohort, all patients were from the United States. As there may be differences in the treatment and management of HCC among countries, international multicentre studies are needed to accurately estimate and generalise the predictive performance of the models. Third, not all previously reported factors, such as the aetiology of HCC and genomic data, are recorded in the SEER database. Considering the heterogeneity of HCC, the inclusion of these variables may improve the predictive power of the models.

Conclusion
Overall, in this population-based study, we developed and validated nomogram models for individualized prediction of CSM and OS in patients with HCC. These simple tools can help clinicians identify high-risk groups and can guide clinical decision making. For patients, the models can help answer questions raised during consultation and provide personalized prognostic assessments.

Abbreviations
HCC, hepatocellular carcinoma; CSS, cancer-specific survival; OS, overall survival; CSM, cancer-specific mortality; SEER, Surveillance, Epidemiology, and End Results; AJCC, American Joint Committee on Cancer; CIF, cumulative incidence function; OCSS, other cause-specific survival.

Declarations
Ethics approval and consent to participate: As all the data of this study were derived from the SEER database, institutional review board approval and consent to participate were not required.

Consent for publication:
Not applicable.
Availability of data and materials: All the data of this study were derived from the SEER database (www.seer.cancer.gov).
Competing Interests: The authors declare no conflict of interest.

Supporting Information
Figure S1.