Incidence of Early-onset Colorectal Cancer, and Construction and Validation of a Nomogram to Predict Distant Liver Metastasis and Overall Survival

Objective: This study aimed to use the Surveillance, Epidemiology, and End Results (SEER) database to investigate the incidence of early-onset colorectal cancer (EO-CRC) and its associated factors, to construct nomograms based on prognosis-related variables that predict the risk of liver metastasis and the overall survival (OS) of patients with EO-CRC, and thereby to guide individualized treatment, support the management of EO-CRC, and improve survival.
Methods: Data on patients diagnosed with CRC between 2010 and 2016 were retrieved from the SEER database, and age-standardized incidence rates were calculated by age group, sex, and site of distant metastasis (bone, brain, liver, and lung). Patients with EO-CRC were selected for further study and randomly divided in a 7:3 ratio into training and validation cohorts; the validation cohort was used for internal verification. Logistic regression analyses were used to examine the risk factors for liver metastasis, and the multivariate results were used to construct a nomogram predicting the risk of liver metastasis in EO-CRC. Cox regression analysis identified statistically significant prognostic variables, which were used to construct a nomogram predicting the OS of EO-CRC patients. The performance of the nomograms was assessed with receiver operating characteristic (ROC) curves and calibration curves. Patients were classified into high-risk and low-risk groups according to the optimal cutoff of the prognostic index (PI), and the Kaplan-Meier method was used to compare survival between the groups. Risk stratification effectively avoids the survival paradox.
Results: The incidence of CRC decreased annually from 2010 to 2016 and increased with age, rising continuously from 35 years of age. The incidence of CRC by sex and by distant metastasis was stable, and the incidence in men was higher than in women. The most common site of distant metastasis was the liver. Logistic regression analysis revealed that grade, N stage, treatment (surgery, radiotherapy, chemotherapy), bone metastasis, CEA, tumor deposits, and perineural invasion were significantly related to liver metastasis of EO-CRC. The optimal cutoff, specificity, and sensitivity of the


Introduction
Colorectal cancer (CRC) is one of the four most common malignant tumors, and the fight against CRC still faces many challenges. Data from the World Health Organization show that CRC mainly affects people over 50 years of age. However, recent studies have shown that the incidence of CRC in young people is increasing. Intergenerational differences in diet, environmental exposures, and lifestyle factors may be driving this rapid increase, but the global pattern of incidence is still not clear [1]. Patients younger than 50 years old are often referred to as having early-onset CRC (EO-CRC) [2]. From 2000-2013, the widespread adoption of fecal occult blood testing and colonoscopy rapidly reduced the incidence among people aged 65 years and over, but the incidence among people under 50 years old increased at a rate of 2% per year, and the mortality rate increased by 1%. According to the age composition and growth forecast of the world's population, by 2030, the incidence of colon cancer and rectal cancer will increase by 90 and 124%, respectively, in the 20-34-year-old population, and by 27.7 and 46.0%, respectively, in the 35-49-year-old population [3][4][5].
Young people have no obvious high-risk factors and are not specifically screened. The general lack of awareness of colon cancer and its symptoms means that symptomatic patients are often not diagnosed in time, which leads to advanced disease. Approximately 20-25% of CRC patients have stage IV disease or distant organ metastasis at the first diagnosis. This proportion has remained stable for the past two decades. More than 50% of CRC patients will develop metastasis as the disease progresses. Local recurrence and distant organ metastasis are the leading causes of the high overall mortality [6,7].
Despite the rapid development of treatment strategies such as immunotherapy and targeted treatment, the prognosis of CRC remains very poor [8][9][10]. Therefore, there is an urgent need for statistical analysis of metastatic CRC to help clinicians understand the distant metastasis of CRC and take appropriate medical interventions. Using the TNM staging system of the American Joint Committee on Cancer (AJCC) to guide treatment and assess the prognosis of patients with CRC has certain limitations; in clinical practice, various prognostic variables have been applied to assist in the prognosis, monitoring, and treatment of the disease [11][12][13][14]. Further research is therefore required to identify factors that may affect patient prognosis, to inform individualized treatment plans, and to use a nomogram to stratify patients and predict their survival, creating a reliable tool that supports monitoring, treatment decision-making, clinical prognosis evaluation, tailored screening, and clinical management [15]. This study reviews the epidemiology and risk factors of EO-CRC to better understand the disease and to identify individuals who may benefit from early detection and follow-up monitoring.
Demographic characteristics, clinicopathological characteristics, treatment methods, distant metastasis (bone, brain, liver, and lung), and survival follow-up data of patients diagnosed with CRC from 2010-2016 were obtained from the Surveillance, Epidemiology, and End Results (SEER) database and included in the analysis, and the age-adjusted incidence rate was calculated. According to the International Classification of Diseases for Oncology, third edition (ICD-O-3), the primary tumor site was divided into the proximal colon (C180, C182-C184), distal colon (C185-C188), and rectum (C199, C209). Surgery information, tumor size, tumor deposits, perineural invasion, and TNM staging information were also downloaded from the SEER database. The endpoint was defined as OS. All the data used in this research were retrieved from the public data of the SEER database, so medical ethics review and approval were not required.

Statistical Analysis
The incidence of CRC from 2010-2016 was calculated by age group, sex, and site of distant metastasis. The patients were divided into 18 age groups, each spanning 5 years. The incidence rate was age-standardized to the 2000 US standard population and is expressed per 100,000 person-years. CRC patients under 50 years old were selected for further analysis and randomly divided in a 7:3 ratio into the training and validation cohorts. The relationship between the risk of liver metastasis in EO-CRC and the candidate variables was analyzed by univariate and multivariate logistic regression to determine the odds ratio (OR) and 95% confidence interval (95% CI) and to screen out statistically significant variables for constructing a nomogram. Univariate and multivariate Cox regression analyses were used to assess the relationship between overall survival (OS) and prognostic variables, to determine the hazard ratio (HR) and 95% CI, and to screen out statistically significant variables for constructing a nomogram. The sensitivity and specificity of the nomograms were evaluated using the receiver operating characteristic (ROC) curve and its area under the curve (AUC). The optimal cutoff of the prognostic index (PI) for EO-CRC was calculated to predict OS. Using the optimal cutoff, all patients were stratified into high- and low-risk groups, and Kaplan-Meier curves were drawn. The data were downloaded, and the incidence rate was calculated, using SEER*Stat software (version 8.3.9). All other statistical analyses were performed using R software (version 4.0.4). Statistical significance was set at P < 0.05.
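As an illustration of the age standardization described above, the following minimal sketch computes a directly standardized rate as a weighted average of age-specific rates. It is not the SEER*Stat implementation; the function name, the three toy age groups, and the weights are hypothetical and chosen only to keep the arithmetic easy to follow (the study itself used 18 age groups and the 2000 US standard population).

```python
def age_standardized_rate(cases, person_years, std_weights, per=100_000):
    """Directly standardized rate: weighted mean of age-specific rates.

    cases, person_years and std_weights hold one entry per age group.
    The weights are normalized here, so they need not sum to 1.
    """
    if not (len(cases) == len(person_years) == len(std_weights)):
        raise ValueError("all inputs must have one entry per age group")
    total_weight = sum(std_weights)
    rate = 0.0
    for d, n, w in zip(cases, person_years, std_weights):
        rate += (d / n) * (w / total_weight)
    return rate * per


# Hypothetical toy data: three age groups instead of the study's 18.
cases = [10, 50, 200]
person_years = [100_000, 100_000, 100_000]
weights = [0.5, 0.3, 0.2]  # assumed standard-population shares

asr = age_standardized_rate(cases, person_years, weights)
print(f"age-standardized rate: {asr:.1f} per 100,000 person-years")
```

Because the weights come from a fixed reference population, rates standardized this way can be compared across years even when the age structure of the underlying population shifts.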

Incidence of CRC
As shown in Figure 1, the incidence of CRC decreased year-on-year from 2010-2016 (Figure 1A), and the incidence in both men and women remained stable. During the same period, the incidence in men was approximately 10 per 100,000 higher than that in women (Figure 1B). The incidence of distant metastasis of CRC remained stable from 2010-2016. The most common metastatic site was the liver, followed by the lungs (Figure 1C). The incidence gradually increased with age. Before the age of 50 years, the incidence rate doubled every 5 years. After 50 years of age, the increase slowed to about 30% every 5 years (Figure 1A). The pattern of distant metastasis in EO-CRC was the same as that in CRC overall (Figure 1D).
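To put the two age gradients on a common scale, the 5-year multipliers can be converted into an equivalent increase per single year of age. The sketch below assumes constant compounding within each 5-year band, which is a simplification introduced here and not a claim made by the study; the function name is hypothetical.

```python
def annualized_growth(multiplier, years):
    """Constant yearly growth rate that compounds to `multiplier` over `years`."""
    if multiplier <= 0 or years <= 0:
        raise ValueError("multiplier and years must be positive")
    return multiplier ** (1 / years) - 1


before_50 = annualized_growth(2.0, 5)  # incidence doubles per 5-year band
after_50 = annualized_growth(1.3, 5)   # incidence rises about 30% per band

print(f"before 50: {before_50:.1%} per year of age")
print(f"after 50:  {after_50:.1%} per year of age")
```

Under this assumption, doubling every 5 years corresponds to roughly 15% per year of age, and a 30% rise every 5 years to roughly 5% per year of age.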

Patient Characteristics
Data on a total of 29,459 patients with EO-CRC from 2010-2016 were retrieved from the SEER database. After excluding patients with missing follow-up or unknown data, the study finally included 16,915 patients with EO-CRC. The patients were randomly divided in a 7:3 ratio into the training (11,840 cases) and validation (5,075 cases) sets. Table 1 shows the patient characteristics.
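A random 7:3 split of this kind can be sketched as follows. This is an illustration only: the study performed the split in R, and the seed, the function name, and the use of integer patient indices are assumptions made here for reproducibility of the example.

```python
import random


def split_cohort(patient_ids, train_parts=7, total_parts=10, seed=0):
    """Shuffle the ids and split them train_parts : (total_parts - train_parts)."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_train = len(ids) * train_parts // total_parts  # integer arithmetic, rounds down
    return ids[:n_train], ids[n_train:]


# 16,915 is the number of included EO-CRC patients reported in the text.
train, valid = split_cohort(range(16_915))
print(len(train), len(valid))
```

With 16,915 patients, this rule yields cohort sizes equal to the 11,840 and 5,075 cases reported for the training and validation sets.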

Determining the Risk Factors for Liver Metastasis
Univariate and multivariate logistic regression analyses were used to identify the risk factors for liver metastasis in EO-CRC (Table 2). The results showed that grade, N stage, treatment (primary tumor resection, radiotherapy, chemotherapy), distant metastasis (bone, brain), CEA, tumor deposits, and perineural invasion were independent risk factors for liver metastasis in EO-CRC.
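For readers less familiar with how an OR and its 95% CI follow from a fitted logistic model, the sketch below applies the standard Wald formulas, OR = exp(beta) and CI = exp(beta ± 1.96 × SE). The coefficient and standard error are hypothetical values, not results from Table 2, and the function name is introduced only for this example. The same transformation applied to a Cox regression coefficient gives a hazard ratio.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Wald point estimate and 95% CI for an odds ratio.

    beta: log-odds coefficient from a logistic regression
    se:   standard error of beta
    """
    point = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return point, lower, upper


# Hypothetical coefficient, for illustration only.
or_, lo, hi = odds_ratio_with_ci(beta=0.693, se=0.10)
significant = not (lo <= 1.0 <= hi)  # CI excluding 1 corresponds to P < 0.05
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f}), significant: {significant}")
```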

Construction and Validation of Predictive Nomograms for Liver Metastasis
The variables identified by multivariate logistic regression as related to the risk of liver metastasis in EO-CRC were used to construct a nomogram predicting that risk (Figure 2). The optimal cutoff, specificity, and sensitivity of the total score of the risk nomogram in the training cohort were -1.627, 0.801, and 0.754, respectively (Figure 3A). The optimal cutoff, specificity, and sensitivity in the validation cohort were -1.903, 0.763, and 0.763, respectively (Figure 3B). The AUCs of the ROC curves of the training and validation sets were 0.848 and 0.839, respectively. There was no significant deviation between the calibration curve and the ideal curve for either the training or the validation set (Figure 4A, B).
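The text does not state which criterion defined the optimal cutoff. One common choice is Youden's index (sensitivity + specificity - 1), and the sketch below uses it purely as an assumption, together with a pairwise (Mann-Whitney) estimate of the AUC. The scores and labels are hypothetical toy data on a log-odds-like scale, not study data.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A patient is predicted positive when score >= cutoff.
    labels: 1 = liver metastasis, 0 = none.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sens, spec = tp / positives, tn / negatives
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1:]


def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data.
scores = [-3.0, -2.5, -2.0, -1.5, -1.0, -0.5]
labels = [0, 0, 0, 1, 0, 1]

cutoff, sens, spec = youden_cutoff(scores, labels)
print(cutoff, sens, spec, auc(scores, labels))
```

A cutoff chosen this way weights sensitivity and specificity equally; a screening setting that prefers fewer missed metastases would shift the threshold lower.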

Univariate and Multivariate Analyses of Effects of Factors on OS
Univariate and multivariate Cox regression analyses were used to analyze the relationship between the OS of EO-CRC patients and prognostic variables (Table 3). The results showed that race, sex, primary tumor location, grade, N stage, M stage, primary tumor resection, chemotherapy, tumor size, distant metastasis (bone, liver, lung), CEA level, tumor deposits, and perineural invasion were significantly correlated with the OS of EO-CRC patients.

Construction and Validation of the OS nomogram for EO-CRC
All the independent risk factors with a significant impact on OS were included in the nomogram for predicting 1-, 3-, and 5-year OS in the training set (Figure 5). Adding up the scores that correspond to a patient's values for each variable gives the predicted survival probability for that individual. The ROC curves showed that the AUCs at 1, 3, and 5 years were 0.739, 0.745, and 0.739 in the training cohort (Figure 6A) and 0.766, 0.745, and 0.739 in the validation cohort (Figure 6B), respectively. According to the optimal cutoff of the prognostic index, the patients were further divided into low-risk and high-risk groups. The prognostic difference between the two groups was statistically significant (Figure 8). The optimal cutoffs of the prognostic index at 1, 3, and 5 years were 0.47, 0.312, and 0.154 in the training cohort and 1.0, 0.604, and 0.304 in the validation cohort, respectively. Calibration curves of the nomogram were also established (Figure 7A-F). In the training and validation sets, the 1-, 3-, and 5-year calibration curves showed that the survival rates predicted by the nomogram were in good agreement with the actual survival rates.
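The risk stratification step can be illustrated with a minimal Kaplan-Meier (product-limit) estimator. The sketch assumes that a prognostic index above the cutoff marks the high-risk group, which is a convention adopted here and not stated in the text; the follow-up times, event flags, and function names are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: [(time, survival)] at each distinct death time.

    events[i] is 1 if the patient died at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients leave the risk set
    return curve


def risk_group(prognostic_index, cutoff):
    """Assumed convention: index above the cutoff means high risk."""
    return "high" if prognostic_index > cutoff else "low"


# Hypothetical follow-up in months for five patients.
times = [6, 12, 12, 24, 36]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print(curve)
print(risk_group(0.60, cutoff=0.47), risk_group(0.20, cutoff=0.47))
```

In practice, one curve would be estimated per risk group and the two compared, for example with a log-rank test.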

Discussion
The SEER database is a public cancer registry database funded by the US federal government. This database records information such as epidemiology, clinicopathological characteristics, and survival outcomes and can be used to study the current status of CRC [3,19] . Analysis of the incidence of CRC regarding distant metastases, age group, and sex from 2010-2016 found that the incidence of CRC is decreasing year by year, and the incidence of CRC patients between 35-50 years old almost doubles every 5 years. The incidence rate gradually slows after the age of 50 years, and the incidence rate increases by approximately 30% every 5 years.
The liver is the most common site of CRC metastasis, followed by the lungs, and CRC is more common in men. Men and women have different risk factors. The top three risk factors for men are alcohol, a low-calcium diet, and smoking, and the top three risk factors for women are a low-calcium diet, low dairy product intake, and low fiber intake [20]. This study was based on accurate and effective big data to describe the incidence of CRC. The results of the study are consistent with global cancer statistics [3,21-23]. At present, only 1/5 to 1/3 of countries provide high-quality incidence data. This study updated the epidemiological information on CRC with distant metastasis. Some patients may not have had a complete systemic assessment, which may lead to underestimation of distant metastasis.
The increase in the incidence of EO-CRC deserves attention, especially given that CRC screening is widespread in the elderly and that the overall incidence tends to stabilize or decline. Greater awareness of the significant increase in the incidence of EO-CRC may encourage detailed assessment of family histories of cancer and follow-up of symptomatic young people. There are few studies on the risk factors that lead to an increase in the incidence of bowel cancer in young people. Lowering the screening age is currently one of the primary screening strategies. However, this strategy creates a heavy financial burden; in countries where per capita colonoscopy resources are scarce, investigating the risk factors of young people is the most important solution to this problem [24].
Age is the most important risk factor for CRC. Multiple independent calculation models show that CRC screening provides more benefit when started from the age of 45 years, and it is recommended that individuals with a family history undergo screening from the age of 40 [25,26]. EO-CRC is usually poorly differentiated, and the risk of recurrence and distant metastasis is high [27]. Approximately 20-30% of CRC patients have liver metastases at the first diagnosis, and as the disease progresses, approximately 50% develop liver metastases; the median survival time of patients with distant liver metastases from CRC is only 6-8 months [21,28,29]. In this study, we identified some risk factors related to the occurrence of liver metastasis in EO-CRC. We also developed a nomogram to predict the probability of liver metastasis in EO-CRC. This intuitive statistical prediction tool can quantitatively inform clinicians and patients of the risk of metastatic disease, provide a reference for related imaging examinations, and assist in making appropriate medical decisions. With the continuous improvement of the guidelines for the diagnosis and treatment of CRC, there is an urgent need for more scientific and standardized treatment. Precise treatment has a better curative effect and fewer adverse effects.
Univariate and multivariate Cox regression analyses were conducted in this study, and a series of prognostic factors were found to increase the risk of death in patients with EO-CRC, including Black race, a primary site in the proximal colon, N2 stage, undifferentiated tumors, not receiving chemotherapy, tumor size > 5 cm, distant metastasis, CEA, tumor deposits, and perineural invasion. Therefore, clinicians should focus more on EO-CRC patients with these risk factors. Because of considerable differences in demographics and clinicopathological characteristics, the survival prognosis of patients with the same TNM stage varies greatly, and TNM staging alone cannot meet clinical needs. Therefore, the AJCC believes that it is necessary to develop a model that can predict the probability of individual risks [30].
The nomogram in this study integrates common and widely recognized independent prognostic risk factors, such as sex, primary tumor location, and metastasis to other organs. These independent prognostic factors are easy to obtain and do not increase the additional costs of promotion and application. The ROC and calibration curves indicate that the nomogram established in this study has an excellent predictive ability.
Early removal of adenomas and detection of precancerous and early lesions all reduce mortality. Owing to systemic supportive treatment and removal of the primary tumor and metastases, the 5-year survival rate of patients with stage IV CRC has increased from 4 to 12% [31]. The median survival time of all patients with stage IV CRC increased from 7 to 12 months, mainly due to the improved survival of patients with lung and liver metastases [21].
Individualized medicine has further improved the efficacy of systemic therapy, and it is expected that the survival rate will be further improved. Therefore, it is necessary to optimize personalized treatment and to follow up on treatment effects [32]. The study of colorectal cancer incidence trends and related risk factors is essential for developing better risk prediction models and provides more information for research on new treatment methods.
This study had some limitations. First, this was a retrospective study. Although the SEER database consists of 18 population-based registries, coding errors and incomplete and inaccurate data cannot be avoided. Second, the SEER database failed to provide patients' family histories, or histories of smoking and drinking, life-threatening chronic diseases, adverse reactions to treatment, chemotherapy regimens, molecular genetics, and immunology information.
Despite the above limitations, given the breadth of demographic information in this study and the availability of long-term follow-up data, our study still contributes valuable information to the understanding of CRC.

Conclusion
This study analyzes the relevant epidemiological information and clinicopathological and molecular characteristics of EO-CRC and uses a nomogram to stratify the risk of patients with EO-CRC, which will help clinicians manage patients and formulate more precise individualized treatment strategies.

Availability of data and materials
The data that support the findings of this study are available from the SEER database, but restrictions apply to the availability of these data, which were used under license for the current study (ID: 14423-Nov2020), and so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of the SEER database.

Competing interests
The authors declare that they have no conflict of interest.

Authors' contributions
Peishan Yao analyzed and interpreted the patient data and was a major contributor in writing the manuscript. Xinlian Cai carried out data analysis. Songda Chen and Binchao Ling participated in study design and data collection. All authors read and approved the final manuscript.

Footnotes
Peishan Yao was a major contributor to this work.

Figure: Nomogram for predicting liver metastasis in patients with early-onset colorectal cancer.

Figure: Nomogram for the prediction of 1-, 3-, and 5-year overall survival (OS) in early-onset colorectal cancer.