Nomograms Predicting Long-Term Overall Survival and Cancer-Specific Survival in Metastatic Hepatocellular Carcinoma Patients: A SEER Population-Based Study

Background: This study aims to evaluate the clinicopathological characteristics of metastatic hepatocellular carcinoma (HCC) patients and to develop nomograms to predict their long-term overall survival (OS) and cancer-specific survival (CSS). Methods: Information on metastatic HCC from 2010 to 2015 was retrieved from the Surveillance, Epidemiology and End Results (SEER) program of the National Cancer Institute. The metastatic HCC patients were divided into a long-term survival (LTS) group and a short-term survival (STS) group with 1 year selected as the cut-off value. Then, we compared the demographic and clinicopathological features between the two groups. Next, all patients were randomly divided into a training group and a validation group at a 7:3 ratio. Univariate and multivariate Cox regression analyses were used to identify potential predictors for OS and CSS in the training group, and nomograms of OS and CSS were established. These predictive models were further validated in the validation group. Results: A total of 2163 patients were included in the current study according to the inclusion and exclusion criteria. Patients with characteristics including lower T stage and N stage; treatment with surgery, radiation or chemotherapy; no lung metastasis; and AFP-negative status showed better survival. The concordance index (C-index) of the OS nomogram was 0.72 based on 9 variables. The C-index of the CSS nomogram was 0.71 based on 8 variables. Conclusions: These nomograms may help clinicians make better treatment recommendations for metastatic HCC patients.


Introduction
Primary liver cancer includes hepatocellular carcinoma (HCC), intrahepatic cholangiocarcinoma (ICC), and combined HCC-ICC. HCC accounts for 75%-85% of primary liver cancers (1). Primary liver cancer is the sixth most common cancer and the fourth leading cause of cancer-related death globally, with approximately 841,000 new cases and 782,000 deaths annually (1,2). According to the annual forecast of the World Health Organization, more than one million patients will die of liver cancer in 2030 (3). Great progress has been made in the treatment of primary liver cancer, including novel surgical techniques, ablation treatment, transcatheter arterial chemoembolization (TACE), and liver transplantation, and patient survival has improved substantially. However, for metastatic HCC patients, the 5-year survival rate is less than 3.1% (4). At present, there is no satisfactory clinical model to predict the prognosis of these patients.
Hepatitis B virus (HBV), hepatitis C virus (HCV), aflatoxin-contaminated food, alcoholism, obesity, smoking and type 2 diabetes are major risk factors for HCC (1,5). Measuring cancer-specific survival (CSS) rather than overall survival (OS) will more accurately describe the survival of metastatic HCC patients.
An increasing number of researchers have used nomograms to predict the prognosis of cancer. Compared with the AJCC staging system, a good nomogram can predict patient survival with better accuracy (6)(7)(8). Nomograms are useful scoring and visualization tools, with which multivariate Cox or competing risk models can be transformed into a single score sheet. Based on a series of clinical factors and patient characteristics, nomograms have been used to assist surgeons in developing treatment and follow-up strategies for a variety of cancers, including breast cancer, adenoid cystic cancer and gastric cancer (9)(10)(11). Nomogram development is also included in the NCCN guidelines (12,13). The purpose of this study was to use the Surveillance, Epidemiology, and End Results (SEER) database to develop nomograms to predict OS and CSS in metastatic HCC patients to improve patients' treatment and follow-up strategies.

Data source
The Surveillance, Epidemiology, and End Results (SEER) database is a United States population-based cancer registry that is maintained by both the National Cancer Institute and Centers for Disease Control and Prevention. The SEER database contains information from a total of 18 population-based cancer registries in the United States, which represent approximately 28% of the US population. Since there was no detailed information about specific metastasis sites in the SEER database before 2010, we selected metastatic HCC patients from 2010 to 2015. A series of exclusion criteria was applied to filter the data: (1) patients with M0 or unknown status in the M stage; (2) patients with unknown status of bone metastasis, brain metastasis, lung metastasis, and distant lymph node metastasis; (3) patients with unknown race, marital status, tumor size and alpha fetoprotein (AFP); (4) patients under 18 years of age; (5) patients with inactive follow-up; and (6) patients with information collected from autopsy or death certificates.
A total of 15 variables, including age, race, gender, marital status, T stage, N stage, surgery information, radiation therapy, chemotherapy, tumor size, distant lymph node metastasis, bone metastasis, brain metastasis, lung metastasis and AFP, were selected to construct prediction models for the OS and CSS of patients. Then, the patients were divided into two groups, the short-term survival (STS) group and the long-term survival (LTS) group, with 1 year selected as the cutoff value. STS was defined as a survival time less than 1 year. LTS was defined as a survival time equal to or longer than 1 year. SEER*Stat software (version 8.5.6) was used to extract the data. In our study, OS was defined as the time from diagnosis to death for any reason. CSS was defined as the time from diagnosis to death from HCC. The survival time of HCC patients with distant metastasis ranged from 0 to 82 months in the SEER database.
The following formula shows how survival time was calculated:
Survival months = FLOOR((endpoint - date of diagnosis) / days in a month)
The FLOOR function always rounds down, e.g., FLOOR(1.68) = 1. Days in a month is set to 365.24/12. According to clinical diagnostic criteria, AFP less than 10 ng/ml was defined as negative, and AFP greater than 10 ng/ml was defined as positive.
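The definitions above can be illustrated with a short sketch. The study extracted these fields with SEER*Stat and analyzed them in R; the Python below is only a stand-in, and the function names, the 12-month reading of the 1-year cut-off and the handling of an AFP value of exactly 10 ng/ml (which the text does not specify) are assumptions made for illustration.

```python
import math
from datetime import date

# Value stated in the text: days in a month is set to 365.24 / 12.
DAYS_PER_MONTH = 365.24 / 12


def survival_months(diagnosis: date, endpoint: date) -> int:
    """Whole months from diagnosis to endpoint, always rounded down."""
    elapsed_days = (endpoint - diagnosis).days
    return math.floor(elapsed_days / DAYS_PER_MONTH)


def survival_group(months: int) -> str:
    """LTS if survival is at least 1 year (assumed here to mean 12 months)."""
    return "LTS" if months >= 12 else "STS"


def afp_status(afp_ng_per_ml: float) -> str:
    """Negative below 10 ng/ml, positive above 10 ng/ml.

    The text does not say how a value of exactly 10 is classified,
    so this sketch reports it as "undefined" (an assumption).
    """
    if afp_ng_per_ml < 10:
        return "negative"
    if afp_ng_per_ml > 10:
        return "positive"
    return "undefined"


if __name__ == "__main__":
    months = survival_months(date(2010, 1, 1), date(2010, 2, 20))
    print(months, survival_group(months), afp_status(4.2))
```

Because the quotient is rounded down, an interval of exactly 365 days yields 11 months under this formula, so such a patient would fall in the STS group under the 12-month assumption.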
All eligible patients were randomly divided into a training group (70%) and a validation group (30%) by using the 'createDataPartition()' function from the 'caret' package in R to minimize differences between the two groups.
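The idea behind a balanced 7:3 split can be sketched as follows. This is not the 'caret' implementation used in the study; it is a minimal Python stand-in that samples 70% within each level of a label so that both groups keep similar proportions. The function name, the choice of LTS/STS as the stratification label and the toy group sizes are assumptions for illustration, since the text does not state which variable the split was balanced on.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_indices, validation_indices).

    Sampling is done separately within each label so that the label
    proportions are roughly the same in both groups.
    """
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    indices_by_label = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_label[label].append(index)

    train, validation = [], []
    for indices in indices_by_label.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * train_fraction)
        train.extend(indices[:n_train])
        validation.extend(indices[n_train:])
    return sorted(train), sorted(validation)


if __name__ == "__main__":
    # Toy cohort: 80 short-term and 20 long-term survivors (invented sizes).
    toy_labels = ["STS"] * 80 + ["LTS"] * 20
    train_idx, valid_idx = stratified_split(toy_labels)
    print(len(train_idx), len(valid_idx))
```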

Statistical analysis
Univariate Cox regression analysis was used to screen the prognostic indicators related to OS and CSS in the training group. Then, potential prognostic indicators with P values less than 0.2 were selected to develop a multivariate Cox regression model (14). Variables with P values less than 0.05 were identified as cancer-related prognostic indicators. Nomograms were established to predict and visualize the 1- and 2-year OS and CSS of patients. The C-index and calibration plot were used to evaluate the nomograms. The C-index estimates the probability that the predicted outcome is concordant with the observed outcome; the higher the C-index is, the better the prediction model. The calibration plot is composed of two curves, an irregular curve and a straight line. The closer the two lines are, the higher the accuracy of the model prediction (15); that is, the smaller the distance from the irregular curve to the straight line, the more accurate the nomogram. One thousand bootstrap resamples were performed for calibration. ROC curves were plotted, and then AUCs were calculated to evaluate the prediction accuracy of the model. Kaplan-Meier curves were employed to analyze survival differences. All computations were conducted in R version 3.5.2 software (packages: rms, foreign, survival, survivalROC, tables, ggplot2, survminer, and caret) (16,17).
A total of 2163 patients were included according to the inclusion and exclusion criteria, and the demographic and clinicopathological characteristics of the LTS and STS groups are listed in Table 1. A total of 1515 patients were assigned to the training group, and 648 patients were assigned to the validation group. The demographic and clinicopathological characteristics of the patients in the training group and validation group are listed in Table 2. All patients were divided into three age groups (19-59, 60-69, and over 70 years old). Race was classified into white, black and other (American Indian/AK Native, Asian/Pacific Islander). Marital status was classified as married or unmarried. The patients were divided into three groups according to their tumor sizes (less than 3 cm, 3 cm-5 cm and over 5 cm).
The median age of the patients was 73 years (ranging from 19 to 85). The median follow-up time of the patients was 90 days (ranging from 1 to 2460). Of all patients,  patients were treated with surgery, radiotherapy, and chemotherapy, respectively. In the SEER database, four extrahepatic metastasis sites were recorded, including bone, brain, lung, and distant lymph nodes. Therefore, we also analyzed metastasis at these four sites; among them, there were 867 (40.1%) patients with lung metastasis, which was the most reported metastatic site. The brain was the least reported metastatic site, with only 43 (2.0%) cases.

LTS and STS
In Table 1, the median age of the patients was 61 years and 62 years in the STS group and LTS group, respectively. There was no significant difference in age, race, gender, marital status, distant lymph node metastasis, bone metastasis or brain metastasis between the two groups. The LTS group had a higher proportion of patients at the T1-T2 stage (46.7% vs 33.4%, P < 0.001) and a lower proportion of patients at the N1 stage than the STS group (19.0% vs 28.8%, P < 0.001). For treatment modalities, fewer patients in the STS group underwent surgery (2.7% vs 15.0%, P < 0.001), radiation therapy (16.3% vs 29.2%, P < 0.001) and chemotherapy (39.0% vs 69.7%, P < 0.001). For AFP, the proportion of negative patients in the LTS group was significantly higher than that in the STS group (25.5% vs 13.1%, P < 0.001).
The distribution of site-specific metastasis was different in the two groups. There were 83 (30.3%) and 624 (33%) patients with distant lymph node metastasis, 79 (28.8%) and 528 (28.0%) patients with bone metastasis, 5 (1.8%) and 38 (2%) patients with brain metastasis, and 76 (27.7%) and 791 (41.9%) patients with lung metastasis in the LTS group and STS group, respectively. The results showed that patients in the STS group had a higher rate of lung metastasis than those in the LTS group (P < 0.001).

Prognostic analyses: development of nomograms to predict patients' OS and CSS
In OS analysis, twelve variables, including gender, marital status, T stage, N stage, surgery, radiation, chemotherapy, tumor size, brain metastasis, bone metastasis, lung metastasis and AFP, were identified as potential predictors by univariate Cox regression analysis (P < 0.2). These variables were further included in multivariate Cox regression analysis to develop nomograms. In CSS analysis, eight variables were identified as potential predictors by univariate Cox regression analysis, including gender, T stage, surgery, radiation, chemotherapy, tumor size, lung metastasis and AFP. Finally, 9 variables were identified as independent prognostic factors for OS, and 8 variables were identified as independent prognostic factors for CSS in the multivariable Cox regression analyses. Gender, T stage, surgery, radiation, chemotherapy, tumor size, lung metastasis and AFP were predictors in both the OS and CSS models, while N stage was a predictor only in the OS model (Table 3). These variables were selected to develop nomograms for predicting the 1-year and 2-year OS and CSS for all patients (Figs. 1 and 2). The C-index was 0.72 for the OS nomogram and 0.71 for the CSS nomogram, and the calibration of the nomograms is shown in Figs. 3 and 4. The AUCs to predict the 1-year and 2-year OS rates were 0.756 and 0.770, and those for the CSS rates were 0.738 and 0.757, respectively (Fig. 5).
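The concordance index used to evaluate these nomograms can be illustrated with a small sketch. The study computed it in R; the Python below is a stand-in that implements Harrell's C-index directly from its definition, and the function name and the five toy patients are invented for illustration only. Pairs with tied survival times are simply skipped in this sketch, which is a simplification.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is usable when patient i died (events[i] == 1) before
    the follow-up time of patient j. The pair is concordant when the
    patient who died earlier has the higher risk score; tied scores
    count as half a concordant pair.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier death
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


if __name__ == "__main__":
    # Toy data: months of follow-up, death indicator, nomogram-style risk.
    toy_times = [2, 5, 8, 12, 20]
    toy_events = [1, 1, 0, 1, 0]
    toy_risk = [0.9, 0.7, 0.6, 0.2, 0.3]
    print(concordance_index(toy_times, toy_events, toy_risk))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported AUCs and C-indexes should be read.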

Risk stratification
The risk scores for every patient were calculated by using the corresponding nomogram. The patients were then divided into high- and low-risk groups accordingly. The Kaplan-Meier method was used to plot the survival curves of OS and CSS in the training and validation groups. The low-risk group showed a better prognosis than the high-risk group (Fig. 6).
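The stratification step can be sketched as follows. The text does not state which cut-off separated the high- and low-risk groups, so the median split below is an assumption, as are the function names and the toy data; the study itself used R. The second function is a plain Kaplan-Meier product-limit estimator written from its definition.

```python
from statistics import median


def split_by_median(risk_scores):
    """Label each patient 'high' or 'low' risk relative to the median score.

    The median cut-off is an assumption; the study does not report it.
    """
    cutoff = median(risk_scores)
    return ["high" if score > cutoff else "low" for score in risk_scores]


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with a death.

    events[i] is 1 for an observed death and 0 for censoring.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    print(split_by_median([0.1, 0.5, 0.9, 0.3]))
    for time, probability in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(time, round(probability, 3))
```

In practice one curve would be estimated per risk group and per cohort (training and validation), and the curves compared, which is what the survival plots in the text summarize.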

Discussion
Contrary to the declining trend of mortality in other common cancers, the mortality of primary liver cancer increased by nearly 3% annually from 2010 to 2014, especially in patients with metastasis, whose 5-year relative survival rate was 3.1% (1,18). Primary liver cancer with distant metastasis is a heterogeneous disease, and its clinical outcomes are highly variable, depending on potential tumor biology and patient characteristics (19,20). Currently, nomograms as decision tools are becoming increasingly popular for use in predicting cancer risk factors and are also being widely used for predicting the survival outcomes of different cancers (21)(22)(23). Nomograms quantify risk by combining and illustrating the relative importance of various prognostic factors, and they have been used in clinical oncology assessment. Well-validated nomograms incorporate numerous key risk predictors to help clinicians come to reasonable conclusions when assessing the prognosis of patients, and they also show outstanding predictive ability for individual survival outcomes. Therefore, it is necessary to construct nomograms for predicting OS and CSS in metastatic HCC patients to achieve better patient management.
In the current study, we compared the clinicopathological characteristics between patients with long-term survival and those with short-term survival and found that patients with the following characteristics were more likely to have better survival: lower T stage and N stage; treatment with surgery, radiation or chemotherapy; no lung metastasis; and AFP-negative status.
A study showed that male and female cancer patients might have different survival lengths and survival outcomes (24). Male patients showed a worse prognosis than female patients in liver cancer (25). According to different reports, gender is an important prognostic factor in different cancers (26)(27)(28). This finding may be related to the greater proportion of men who smoke, drink and have a poor lifestyle (29,30). According to relevant statistics, from a global perspective, the ratio of smoking and drinking in women is significantly lower than that in men (31). At the same time, smoking and drinking are important risk factors for the incidence of liver cancer (32). Researchers pointed out that in liver cancer patients, the ratio of androgen and estrogen would change accordingly, and the rise of estrogen could promote the proliferation of human liver cells (33). Therefore, the occurrence of HCC might also be related to estrogen (34,35). In the current study, univariate and multivariate Cox regression analyses also showed that gender was an important prognostic factor for metastatic HCC. In short, the survival of female patients was significantly better than that of males, which might be due to differences in smoking, alcohol consumption and estrogen levels between males and females.
It was reported that the most common metastatic sites of liver cancer were lung, portal vein and portal vein lymph nodes (36). Studies have indicated that the brain is the least common distant metastatic organ in HCC patients (37,38). In our study, we found that the most common site of metastasis was the lung, followed by the distant lymph nodes and bone. Brain metastasis was relatively rare in HCC patients. This finding is consistent with those of previous studies (39,40). The survival rate of patients with brain metastases is very low, and the median survival time is only 2.4 months (41). However, in our study, the rate of brain metastasis did not differ between the LTS group and STS group, and brain metastasis was excluded following the univariate and multivariate Cox regression analyses of OS and CSS because only 43 out of 2163 HCC patients had brain metastases. It might be helpful to construct another nomogram in the future if there are sufficient HCC patients with brain metastases.
From the univariate and multivariate Cox regression analyses, surgical treatment, radiotherapy and chemotherapy were important prognostic factors for the OS and CSS of metastatic HCC patients. Compared with the STS group, more patients in the LTS group received treatment with surgery, radiation and chemotherapy. According to relevant studies, even in patients with advanced liver cancer, adequate treatment with surgery, radiotherapy and chemotherapy could improve prognosis (42)(43)(44)(45). However, one relevant study pointed out that advanced patients were not suitable for surgical treatment (46). Radiation therapy, which used to be a palliative option for HCC treatment, has now been proven to be effective for selected advanced HCC patients (47,48). Therefore, our study suggests that active radiotherapy or chemotherapy could improve the survival of these patients when surgical treatment is no longer suitable. Moreover, according to reports, TACE/TAE is a very common and effective treatment for HCC patients (49). If it could be included in our research, the prediction ability of the model would likely be improved. Unfortunately, the SEER database does not contain information about TACE/TAE, so this variable could not be included in the models.
Our results showed that the nomograms could estimate the impact of all independent factors on OS and CSS in HCC patients with metastases. The current study has several advantages. First, all the data we selected came from the SEER database, which supports the accuracy of the data and the applicability of the results. Second, we evaluated our models in a variety of ways (C-index, calibration plots, ROC curves and risk stratification), and the results indicated that our nomograms could be applied to practical work with less bias and better accuracy. Finally, we divided patients into the LTS and STS groups and used univariate and multivariate Cox regression analyses to better compare the prognostic value of different factors in patients.
Although our research was based on a large-scale population from the SEER database, some limitations need to be mentioned. First, this was a retrospective study with inevitable selection bias. Second, the SEER database did not include clinical information on several valuable predictors, such as serum tumor markers or biological and genetic characteristics. Third, detailed information on chemotherapy, radiotherapy and surgery was not reported in the SEER database. Therefore, we could not incorporate the detailed treatment information into the nomogram. Finally, the Model for End-Stage Liver Disease (MELD) score or Child-Turcotte-Pugh (CTP) score is important for HCC patients, and whether patients could receive treatment depends on the degree of liver fibrosis. However, there was no information about the MELD score or CTP score in the SEER database, so we could not relate the prognosis of patients to these scores, which may be a shortcoming of the model.

Conclusions
In summary, patients with the following characteristics were more likely to have longer survival: lower T stage and N stage; treatment with surgery, radiation or chemotherapy; no lung metastasis; and AFP-negative status. We successfully developed nomograms for predicting the 1-year and 2-year OS and CSS of metastatic HCC patients. The prognostic model could improve the ability of clinicians to predict individual survival rates and make better treatment recommendations.

Consent for publication
Not applicable.

Availability of data and materials
All data can be retrieved from the SEER public database with approval from the SEER program. The datasets analyzed during the current study are not publicly available due to SEER program restrictions but are available from the corresponding author on reasonable request.

Authors' contributions
JS conceived the design of the current study, and CC carried out preliminary data screening and prepared this manuscript. YD, RL, HY, PW and DW collected and checked the data. JZ, JS revised the manuscript and all

The range of the AUC is generally between 0.5 and 1. Usually, the larger the value of the AUC, the more accurate the prediction of the model is.