Predicting Survival Outcome of Patients With Colorectal Cancer and Only Lung Metastasis: A Population-Based Real-World Study

Background: Patients with metastatic colorectal cancer (mCRC) have a poor prognosis, but those with lung metastasis (LM) generally have a relatively favorable survival outcome. However, clinicians have had few tools for estimating the probability of survival in patients with colorectal cancer (CRC) and only LM (OLM). The present study aimed to develop nomograms estimating survival probability for patients with CRC and OLM. Methods: Data on patients with CRC diagnosed between 2010 and 2014 were retrieved from the Surveillance, Epidemiology, and End Results (SEER) database for retrospective analysis. Patients with OLM diagnosed between 2010 and 2014, excluding 2012 (n = 1,118), were used in multivariate Cox analysis to identify independent prognostic factors. Nomograms estimating 1- and 3-year overall survival (OS) and cancer-specific survival (CSS) were developed. The nomograms were internally validated using the concordance index (C-index), calibration plots, and receiver operating characteristic (ROC) curves, and were externally validated with independent patients diagnosed in 2012 (n = 261). Results: Age, marital status, tumor location, tumor size, T and N stage, CEA, tumor deposit, histological grade, primary or metastatic tumor surgery, chemotherapy, radiotherapy, and income were found to be independently associated with OS and/or CSS. The nomograms were constructed based on these prognostic factors. The C-index values were high in both internal validation (0.736 for OS and 0.741 for CSS) and external validation (0.656 for OS and 0.663 for CSS). Internal and external calibration plots and ROC curves demonstrated good agreement between actual observation and nomogram prediction. Conclusions: The nomograms predict OS and CSS of individual patients with CRC and OLM and could aid in personalized prognostic evaluation and clinical decision-making.


Introduction
Colorectal cancer (CRC) is the second leading cause of cancer death and the third most commonly diagnosed cancer worldwide, and its incidence tends to rise with social development [1].
With multidisciplinary treatment, including surgery for primary and metastatic tumors, systemic therapy, and radiotherapy (mainly for rectal cancer), the 5-year relative survival rate is 65% for patients with CRC, but declines to 12% for metastatic CRC (mCRC) [2].
In terms of clinicopathological factors, old age, advanced tumor-node-metastasis (TNM) stage, and high histological grade are generally considered to be poor prognostic factors for CRC. Moreover, primary site, tumor size, tumor deposit, carcinoembryonic antigen (CEA), treatment, and some demographic factors can also influence the prognosis of CRC [3]. Patients with mCRC have a poor prognosis, but those with only lung metastasis (OLM) generally have a relatively favorable prognosis [4]. Why patients with OLM have a better prognosis, and how to accurately predict their prognosis, are of interest to clinicians. However, clinicians have had few tools for estimating the probability of survival in patients with CRC and OLM.
The nomogram, a statistical prediction tool that incorporates and quantifies selected prognostic factors to estimate the survival probability of an individual patient, has been widely applied in cancers including CRC, lung cancer, and gastric cancer [5][6][7]. However, to our knowledge, nomograms predicting overall survival (OS) and cancer-specific survival (CSS) in patients with CRC and OLM have not been reported. Established in 1973, the Surveillance, Epidemiology, and End Results (SEER) program provides a wide range of clinicopathological and follow-up information on cancer patients, covering about 28% of the population in the United States [8]. Using the SEER database, we constructed nomograms estimating survival probability for patients with CRC and OLM. Furthermore, internal and external validation of the nomograms was performed.

Data source and selection
Patient data were retrieved from the SEER database using the SEER*Stat software, version 8.3.6 (https://seer.cancer.gov/seerstat/). The flow chart of the case selection is shown in Fig. 1. The inclusion criteria were as follows: (1) stage IV colorectal cancer; (2) positive histological confirmation; (3) diagnosis between 2010 and 2014. The last criterion was applied because detailed information about distant metastatic sites was not available before 2010 and the follow-up information was updated to November 2018, so this restriction ensured a minimal follow-up length of three years. The exclusion criteria were as follows: (1) multiple tumors in the patient's lifetime; (2) unknown values for important information that is easily accessible in clinical practice, including age at diagnosis, race, and marital status; (3) unknown survival time; (4) unknown or no metastasis in the specific metastatic sites (bone, brain, liver, and lung). All the included patients were used to analyze the risk factors for lung metastasis. After that, patients with only bone, brain, liver, or lung metastasis were grouped by site to explore the effect of specific metastatic sites on OS. Furthermore, patients with OLM were divided into training and validation cohorts based on year of diagnosis for further analysis.

Study variables
Clinical variables of CRC patients were extracted, including age, race, sex, marital status, TNM stage, tumor location, tumor size, specific metastatic sites (bone, brain, liver, and lung), CEA, tumor deposit, perineural invasion, histological grade, treatment, vital status, survival time, corresponding causes of death, and the education and income status of the county the patients came from. In our study, races other than White and Black were recorded as "Other". "Married (including common law)" was recorded as "Married", and any other marital status was recorded as "Single/unmarried". Tumor location was defined as "Right colorectal", "Left colorectal", or "Unknown". "Right colorectal" included International Classification of Diseases 10th Revision (ICD-10) codes C180 to C184, and "Left colorectal" included ICD-10 codes C185 to C187, C199, and C209. "Unknown" included ICD-10 codes C188 and C189. Because it is difficult to distinguish T0-3 stages in patients without primary tumor resection, T0-3 stages were merged into one category. The status of education and income was defined as either "Low" or "High", meaning that patients came from counties with lower or higher than median education or income level. Because precise survival days were not available, a survival time of 0 months was recorded as 0.5 months for patients who died within one month of diagnosis [9]. OS was defined as the time from diagnosis to death from any cause or last follow-up, while CSS was defined as the time from diagnosis to death caused by CRC.
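The recoding rules above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the code used in the study: the function names (`tumor_location`, `marital_group`, `survival_months`) are hypothetical, and the site codes are written without a dot, as in the text.

```python
# Site codes exactly as listed in the text (no dot between "C18" and the digit).
RIGHT_CODES = {"C180", "C181", "C182", "C183", "C184"}
LEFT_CODES = {"C185", "C186", "C187", "C199", "C209"}
UNKNOWN_CODES = {"C188", "C189"}


def tumor_location(site_code: str) -> str:
    """Map a primary-site code to the three location groups described in the text."""
    # Accept both "C18.2" and "C182" spellings (an assumption for convenience).
    code = site_code.strip().upper().replace(".", "")
    if code in RIGHT_CODES:
        return "Right colorectal"
    if code in LEFT_CODES:
        return "Left colorectal"
    if code in UNKNOWN_CODES:
        return "Unknown"
    raise ValueError(f"site code not covered by the study definition: {site_code!r}")


def marital_group(status: str) -> str:
    """Only 'Married (including common law)' counts as 'Married'."""
    if status == "Married (including common law)":
        return "Married"
    return "Single/unmarried"


def survival_months(months: int) -> float:
    """Record a survival time of 0 months as 0.5 months; keep other values."""
    return 0.5 if months == 0 else float(months)


print(tumor_location("C182"), marital_group("Divorced"), survival_months(0))
```

Raising an error for unlisted codes, rather than silently returning "Unknown", is a design choice of this sketch so that unexpected site codes are noticed early.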

Statistical analysis
Chi-square tests were used to compare the clinicopathological characteristics between the training and validation cohorts and between patients with and without lung metastasis. Multivariate binary logistic regression analyses were performed to identify risk factors for lung metastasis. Differences in OS and CSS between patients with specific metastatic sites were assessed with Kaplan-Meier (KM) analyses and log-rank tests. Univariate and multivariate Cox proportional hazards regression models were used to estimate the hazard ratios (HR) with corresponding 95% confidence intervals (CI) of variables associated with OS and CSS. Based on multivariate Cox analyses in the training cohort, nomograms were developed and evaluated by the concordance index (C-index), receiver operating characteristic (ROC) curves, and calibration curves. The C-index was used to evaluate the discrimination ability of the overall nomogram: the higher the C-index, the better the model's discrimination ability. ROC curves are similar to the C-index, but are considered less suitable for use with censored data. The calibration curves were used to evaluate the accuracy of the nomogram-predicted survival probability compared with the actual survival probability. The closer the points lie to the reference line, the higher the prediction accuracy. To verify the general applicability of the model, the nomograms were both internally and externally validated. All statistical analyses were performed in R version 3.6.1 (http://www.r-project.org/) or SPSS software version 21.0 (IBM Corporation). For all analyses, a two-tailed p value < 0.05 was defined as significant.
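To make the C-index concrete, the following is a minimal Python sketch of Harrell's concordance index for right-censored data. It is not the implementation used in the study (the analyses were run in R and SPSS); the function name `harrell_c_index` is hypothetical, and pairs with tied survival times are simply skipped, which is a simplification.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Fraction of comparable patient pairs in which the patient who died
    earlier also had the higher predicted risk (ties in risk count as 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            # Pair (i, j) is comparable when i died and j was followed longer.
            if times[j] > times[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: survival in months, event flag (1 = death, 0 = censored), model risk.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
print(harrell_c_index(times, events, [0.9, 0.7, 0.4, 0.2]))  # perfectly ordered risks
print(harrell_c_index(times, events, [0.5, 0.5, 0.5, 0.5]))  # uninformative risks
```

A value of 0.5 corresponds to a model that cannot rank patients at all, and 1.0 to perfect ranking, which is why a higher C-index indicates better discrimination.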

Patient characteristics
As shown in Fig. 1, a total of 992,325 patients with diagnoses of CRC were registered in the SEER database from 1969 to 2017. Among the 19,161 patients who met the selection criteria, 14,360 were diagnosed with CRC and only bone, brain, liver, or lung metastasis. Furthermore, patients with OLM (n = 1,379) were divided into two cohorts based on year of diagnosis for further analysis: patients diagnosed in 2012 formed the validation cohort (n = 261), and the remaining patients formed the training cohort (n = 1,118). The demographic and clinicopathologic characteristics of patients with OLM are listed in Table 1. Chi-square tests showed no significant difference in patients' demographic and clinicopathologic characteristics between the training and validation cohorts.
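The split by year of diagnosis can be illustrated with a short Python sketch. The record layout and the field name `year_of_diagnosis` are assumptions made for this example; only the rule itself (2012 for validation, all other years for training) comes from the text.

```python
from typing import Dict, List, Tuple

Patient = Dict[str, int]


def split_by_year(
    patients: List[Patient], validation_year: int = 2012
) -> Tuple[List[Patient], List[Patient]]:
    """Hold out one diagnosis year for validation; train on all other years."""
    training = [p for p in patients if p["year_of_diagnosis"] != validation_year]
    validation = [p for p in patients if p["year_of_diagnosis"] == validation_year]
    return training, validation


# Toy records (not SEER data): one dict per patient.
toy = [{"id": i, "year_of_diagnosis": y}
       for i, y in enumerate([2010, 2011, 2012, 2012, 2013, 2014])]
train, valid = split_by_year(toy)
print(len(train), len(valid))
```

Holding out a whole year keeps the two cohorts disjoint without any random assignment, so the split is reproducible from the data alone.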

Factors associated with lung metastasis
As shown in Table 2, chi-square tests indicated that lung metastasis (LM) was significantly associated with age, race, marital status, tumor location, tumor size, T and N stage, metastases to other organs (bone, brain, and liver), serum CEA level, tumor deposit, perineural invasion, and histological grade. Significant factors identified by chi-square tests were further explored in multivariate analysis, which showed that old age, non-white descent, unmarried status, left colorectal tumor, large tumor size, advanced T and N stage, liver metastasis, positive serum CEA, positive tumor deposit, and high histological grade were independent risk factors for LM.

Differences in OS and CSS between patients with specific metastatic sites
To confirm that different metastatic sites affect the survival outcome of CRC patients, we further grouped the patients according to the site of metastasis. In this process, we only included patients with CRC and only bone (n = 225), brain (n = 91), liver (n = 12,665), or lung metastasis (n = 1,379).
According to the Kaplan-Meier curves and log-rank analyses of OS and CSS (Fig. 2), among patients with CRC and only bone, brain, liver, or lung metastasis, those with OLM had the best prognosis.
Conversely, CRC patients with bone or brain metastasis had a poorer prognosis in terms of both OS and CSS, which is consistent with previous studies [4].

Establishment of the nomogram
Based on the independent prognostic factors identified by multivariate Cox analysis in the training cohort, a nomogram for OS was established (Fig. 3A). With the same procedure, a nomogram for CSS was established (Table 4 and Fig. 3B).
Table footnotes: e, "No" and "Unknown" cannot be distinguished; f, patients came from counties with lower/higher than median education or income level.
The point scale at the top of each nomogram is first used to assign a score to each prognostic factor, and the scores of all variables are then added up. Finally, based on the total score, the point scale at the bottom of each nomogram is used to predict the survival probability of an individual patient.
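The reading procedure can be sketched in Python as follows. All point values and the mapping from total points to survival probability below are HYPOTHETICAL placeholders chosen for illustration; they are not taken from Fig. 3 and must not be used for any clinical estimate. The names `POINTS`, `BOTTOM_SCALE`, `total_points`, and `survival_probability` are likewise assumptions of this sketch.

```python
from bisect import bisect_right
from typing import Dict

# HYPOTHETICAL points per category (top scale of a nomogram), not from the study.
POINTS: Dict[str, Dict[str, float]] = {
    "chemotherapy": {"Yes": 0, "No/Unknown": 100},
    "primary_surgery": {"Yes": 0, "No": 60},
    "cea": {"Negative": 0, "Positive": 35},
}

# HYPOTHETICAL bottom scale: (total points, 1-year survival probability).
BOTTOM_SCALE = [(0, 0.95), (50, 0.85), (100, 0.70), (150, 0.50), (200, 0.30)]


def total_points(patient: Dict[str, str]) -> float:
    """Step 1 and 2: score every prognostic factor and add the scores up."""
    return sum(POINTS[var][patient[var]] for var in POINTS)


def survival_probability(total: float) -> float:
    """Step 3: read the probability off the bottom scale by linear interpolation."""
    xs = [x for x, _ in BOTTOM_SCALE]
    if total <= xs[0]:
        return BOTTOM_SCALE[0][1]
    if total >= xs[-1]:
        return BOTTOM_SCALE[-1][1]
    k = bisect_right(xs, total)  # index of the first scale mark above `total`
    (x0, y0), (x1, y1) = BOTTOM_SCALE[k - 1], BOTTOM_SCALE[k]
    return y0 + (y1 - y0) * (total - x0) / (x1 - x0)


patient = {"chemotherapy": "No/Unknown", "primary_surgery": "Yes", "cea": "Positive"}
score = total_points(patient)
print(score, round(survival_probability(score), 2))
```

In a real nomogram the points and the bottom scale are derived from the fitted Cox model, so a faithful implementation would take them from the model coefficients rather than from a hand-written table.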

Nomogram validation
The OS and CSS nomograms were validated both by internal and external cohorts. In the internal validation, the C-index, an indicator of a model's discrimination ability, was 0.736 (

Discussion
Effective and precise prognostic prediction in patients with malignancy is important for clinical decision-making and scientific research. Previous studies have reported that factors such as age, sex, CEA, tumor size, T and N stage, metastatic sites, and treatment are significant predictors of survival outcome in mCRC patients [10]. Clinicians need intuitive and convenient prognostic tools, and a more individualized prognostic model that incorporates more information is likely to predict patient prognosis more accurately.
Using the SEER database, which covers about 28% of the population in the United States, we were able to collect sufficient samples and therefore developed and validated nomograms predicting 1- and 3-year OS and CSS of patients with CRC and OLM. In the current study, we found that age, marital status, tumor location, T and N stage, serum CEA level, tumor deposit, histological grade, primary tumor surgery, chemotherapy, radiotherapy, and income were significantly associated with OS. As for CSS, metastatic tumor surgery was a significant prognostic factor, but income was not. The nomograms were created based on these significant prognostic factors. For high-risk OLM patients, more aggressive treatment is recommended, and the follow-up interval should be shortened appropriately to detect endpoint events early.
Unsurprisingly, our results suggested that chemotherapy and surgical treatment were the main prognostic factors in CRC patients with OLM, and that the primary site was also significantly associated with prognosis. Indeed, it has been widely recognized that chemotherapy can improve the prognosis of mCRC patients, and that left-sided tumor location is a predictor of better prognosis [11]. Moreover, surgery is still the superior option for CRC patients who are able to undergo it, even for patients with mCRC. A comprehensive review including 26 clinical studies reported that the survival outcome of CRC patients with limited metastatic disease was significantly improved by R0 resection of metastases [12].
Likewise, several retrospective studies and meta-analyses have demonstrated an association between primary tumor resection and improved OS in CRC patients with metastatic disease [13][14][15][16]. However, the data have not been entirely consistent across the published studies [17][18][19]. Of note, the correlation between radiotherapy and prognosis should be interpreted in the context of rectal cancer; for the sake of simplicity, no distinction was made between left colon cancer and rectal cancer in the present study. Consistent with previous findings, our results suggested that sex, race, and tumor size had no effect on long-term survival for patients with LM from CRC [20]. Although age is clearly a prognostic factor in mCRC patients [21], whether sex, race, and tumor size affect the prognosis of patients with mCRC remains controversial. Some studies reported that white descent, male sex, and larger primary tumor size predict poorer survival [10,22,23].
In addition to prognostic nomograms, the present study identified several risk factors associated with LM from CRC. Combined with published nomograms for predicting the risk of LM from CRC [10,20], these factors can help clinicians and researchers identify patients at high risk of LM and monitor them closely with follow-up computed tomography. Moreover, we confirmed that different metastatic sites affect the prognosis of CRC patients. Further exploration of why CRC patients with OLM have a better prognosis is warranted.
There were some limitations in our study. First, the nomograms were developed on the basis of retrospective data, so prospective validation is needed. Second, some important information, such as chemotherapy regimens, targeted therapy, molecular pathology, and genetic testing, was not recorded in the database. Third, the timing of other information, such as CEA levels, was not available either. Finally, because of the limitations of our data, we did not explore the effect of factors such as histological type and the number of positive lymph nodes on the prognosis of mCRC patients. Further prospective studies with more critical information are warranted for model improvement and independent validation.

Conclusion
In summary, we developed nomograms predicting OS and CSS of individual patients with CRC and OLM. The nomograms showed good accuracy and applicability. Therefore, we recommend these simple and clear nomograms as a convenient and effective tool for clinicians to evaluate the prognosis of individual CRC patients with OLM and to determine the treatment strategy.

Declarations
Funding