Development and Validation of Nomograms to Predict Lung Metastases in Patients with Ovarian Cancer: A Large Cohort Study

Background: Lung metastasis, an independent risk factor affecting the prognosis of patients with ovarian cancer, is associated with poor survival. We aimed to develop and validate a nomogram to predict the risk of lung metastases in newly diagnosed patients with ovarian cancer.
Methods: Patients diagnosed with ovarian cancer between 2010 and 2015 were retrospectively collected from the Surveillance, Epidemiology, and End Results (SEER) database. The nomogram was built based on logistic regression. The concordance index (C-index) was used to evaluate the discrimination of the lung metastasis nomogram. Calibration plots were drawn to analyze the consistency between the observed and predicted probabilities of lung metastases in patients with ovarian cancer. The Kaplan-Meier method was used to estimate the overall survival rate, and the influencing factors (P < 0.05) were included in a multivariate Cox regression to identify the independent prognostic factors for patients with lung metastases.
Results: A total of 16,059 eligible patients were randomly divided into a training cohort (n = 11242) and a validation cohort (n = 4817). AJCC T stage, N stage, bone metastases, brain metastases and liver metastases were identified as predictors of lung metastases, and a nomogram was constructed from them. The nomogram based on these independent predictors was well calibrated and showed good discriminative ability. The C-index was 0.761 (0.736-0.787) for the training cohort and 0.757 (0.718-0.795) for the validation cohort. The overall survival rate of ovarian cancer patients with lung metastases was reduced. Mixed histological type, chemotherapy and primary site surgery were factors affecting the overall survival of ovarian cancer patients with lung metastases.
Conclusion: The clinical prediction model had high accuracy and can be used to predict the risk of lung metastasis in newly diagnosed patients with ovarian cancer, which can guide the treatment of patients with lung metastases.


Introduction
Ovarian cancer is one of the most common malignant tumors of the female reproductive system and the fifth most common cause of cancer-related death among American women. In 2018, there were an estimated 14,070 deaths in the United States 1. Because the symptoms are vague and there is currently no effective screening method, most patients are already at an advanced stage (III or IV) at diagnosis, often with distant metastases 2.
The lung is the third most common site of distant metastasis in ovarian cancer, accounting for 28.42% of distant metastatic sites. The location of distant metastases is an independent prognostic factor for overall survival 3. Previous studies have shown that the important risk factors for distant metastases are stage, grade, and lymph node involvement 4. However, the sample size of that study was small. There are few studies on the risk factors for lung metastases, and most of them are case reports 5 6.
The median interval between diagnosis of ovarian cancer and recording of metastatic disease was 44 months 4. Identifying the risk factors for lung metastases can help ensure that high-risk patients are thoroughly investigated and, where possible, treated as early as possible or offered appropriate preventive treatment. Large study samples and real-world evidence are still needed to determine the risk factors for lung metastases in patients with ovarian cancer.
The purpose of this study was to use the Surveillance, Epidemiology, and End Results (SEER) database to characterize the prevalence, associated factors, and prognostic factors of lung metastases in patients with ovarian cancer. We also established a nomogram based on clinical factors to predict the risk of lung metastases, which may guide screening for lung metastases.

Study population
Data were obtained from the Surveillance, Epidemiology, and End Results (SEER) database, accessed with SEER*Stat 8.3.5 software (https://seer.cancer.gov/data/). The site code was restricted to ovary. Because details of metastases were not recorded before 2010, we analyzed patients with primary ovarian cancer who were aged ≥ 18 years at diagnosis and were diagnosed between 2010 and 2015. Patients were excluded for (1) unknown grade; (2) unknown AJCC T or N stage, or AJCC T0 stage; (3) unknown metastasis information; (4) unknown tumor size; (5) unknown laterality; or (6) unknown therapy information. The flowchart of patient selection is shown in Fig. 1. After applying the inclusion and exclusion criteria, 16059 patients with ovarian cancer were finally enrolled in our study. We randomly divided the patients in a 7-to-3 ratio into a training cohort (n = 11242) for nomogram construction and a validation cohort (n = 4817) for internal verification. Data on clinical characteristics, including age, race, marital status, insurance status, year of diagnosis, household income at the time of diagnosis, histological type, grade, laterality, clinical AJCC T and N stage, tumor size, metastatic status, and therapy information, were collected from the SEER database.
Because all information in the SEER database has been de-identified and no personal identifying information was used in this analysis, informed consent was not required for use of the SEER data. The present study complied with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.
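The 7-to-3 random split described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the seed, the function name `split_cohort`, and the choice to round the training size up are assumptions, chosen because rounding 0.7 × 16059 up reproduces the reported cohort sizes.

```python
import random


def split_cohort(patient_ids, train_tenths=7, seed=2024):
    """Randomly split patient IDs into training and validation cohorts.

    train_tenths=7 gives a 7-to-3 split. The seed is a hypothetical value
    used only to make the shuffle reproducible.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    # Integer ceiling of len(ids) * train_tenths / 10 (assumed rounding rule).
    n_train = (len(ids) * train_tenths + 9) // 10
    return ids[:n_train], ids[n_train:]


# Placeholder IDs 0..16058 stand in for the 16059 enrolled patients.
train_ids, validation_ids = split_cohort(range(16059))
print(len(train_ids), len(validation_ids))  # 11242 4817
```

Fixing the seed keeps the two cohorts identical across reruns, which matters when the nomogram is refitted or re-evaluated.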

Statistical analysis
Statistical analysis was performed using SPSS 21 software. Categorical data were presented as frequency (%) and analyzed by the chi-squared test. The Kolmogorov-Smirnov test was used to verify the normality of variables. Normally distributed variables were expressed as mean ± standard deviation, while non-normally distributed variables were expressed as median (interquartile range). In addition, 95% confidence intervals (CIs) and hazard ratios were calculated. Univariable and multivariable logistic regression analyses were used to determine the risk factors for lung metastases in patients with ovarian cancer. Factors with a P-value less than 0.05 in the univariable analysis were incorporated into the multivariable logistic regression model.
A lung metastases nomogram was formulated based on the results of the multivariable logistic analysis using the rms package in R version 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria; www.r-project.org). Receiver operating characteristic (ROC) curves were drawn. We evaluated the stability of the prognostic nomogram and the lung metastases nomogram by internal validation with 1000 bootstrap samples. The nomograms were validated both internally and externally. The C-index (Harrell's concordance index) was used to assess the predictive accuracy of the nomograms. Calibration plots were drawn to analyze the consistency between observed and predicted probabilities. Overall survival was estimated by the Kaplan-Meier method, and differences between groups were compared using the log-rank test. A multivariable Cox regression model incorporating the factors that were significant in the Kaplan-Meier analysis (P < 0.05) was used to identify the independent prognostic factors for patients with lung metastases.
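For a binary outcome such as the presence of lung metastases, the C-index is the proportion of (patient with metastasis, patient without metastasis) pairs in which the patient with metastasis received the higher predicted risk, which equals the area under the ROC curve. The sketch below illustrates that definition on made-up toy values; it is not the rms implementation used in the study, and the function name `c_index` is hypothetical.

```python
def c_index(predicted_risk, outcome):
    """Concordance index for a binary outcome (equal to the ROC AUC).

    Counts every (event, non-event) pair; a pair is concordant when the
    event has the higher predicted risk. Ties count as half.
    """
    events = [p for p, y in zip(predicted_risk, outcome) if y == 1]
    non_events = [p for p, y in zip(predicted_risk, outcome) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Toy data, invented for illustration only (1 = lung metastasis present).
toy_risk = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
toy_outcome = [1, 1, 0, 0, 1, 0]
toy_c = c_index(toy_risk, toy_outcome)  # 7 concordant pairs out of 9
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.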

Patients' Basic Information
According to the inclusion and exclusion criteria, 16,059 of the 35,333 patients with ovarian cancer registered between 2010 and 2015 were collected from the SEER database. The patients were divided into a training group (n = 11242) and a verification group (n = 4817). The basic information of the patients is listed in Table 1. The median age of these patients was 59 years. Among these patients, there were 13223 (82.3%) whites, 1057 (6.6%) blacks and 1711 (10.7%) of other races. In terms of marital status, 3377 (21.0%) patients were unmarried, 8549 (53.2%) were married, and 3486 (21.7%) were separated. The numbers of uninsured and insured patients were 861 (3.5%) and 15337 (95.5%), respectively. The median household income was 6255.
Multivariable logistic regression analysis showed that higher T stage, higher N stage, and the presence of bone, liver, and brain metastases were significantly associated with the early development of lung metastases (Table 2).

Nomogram development
A nomogram to predict lung metastases in patients with ovarian cancer was developed in the training cohort. The risk factors determined by multivariable logistic regression analysis, namely higher T stage, higher N stage, and the presence of bone, liver, and brain metastases, were incorporated into the final nomogram (Figure 2).
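A nomogram of this kind is a graphical rescaling of the fitted logistic model: each predictor level receives points proportional to its log-odds coefficient, the predictor with the widest effect spans 0 to 100 points, and the total maps back to a probability. The sketch below shows that arithmetic only. Every coefficient and the intercept are HYPOTHETICAL values invented for illustration; they are not the estimates in Table 2 or Figure 2, and the T and N categories are simplified.

```python
import math

# HYPOTHETICAL log-odds coefficients, invented for illustration only.
INTERCEPT = -5.0
COEFFICIENTS = {
    "t_stage": {"T1": 0.0, "T2": 0.5, "T3": 1.0},
    "n_stage": {"N0": 0.0, "N1": 0.6},
    "bone_metastases": {"no": 0.0, "yes": 2.0},
    "liver_metastases": {"no": 0.0, "yes": 1.5},
    "brain_metastases": {"no": 0.0, "yes": 1.8},
}

# The predictor with the widest coefficient range defines the 0-100 scale.
MAX_RANGE = max(
    max(levels.values()) - min(levels.values())
    for levels in COEFFICIENTS.values()
)


def points(predictor, level):
    """Points for one predictor level on the 0-100 nomogram scale."""
    levels = COEFFICIENTS[predictor]
    return 100.0 * (levels[level] - min(levels.values())) / MAX_RANGE


def total_points(patient):
    """Sum of points over all predictors for one patient."""
    return sum(points(name, level) for name, level in patient.items())


def predicted_risk(patient):
    """Convert total points back to a probability via the logistic function."""
    baseline = INTERCEPT + sum(
        min(levels.values()) for levels in COEFFICIENTS.values()
    )
    linear_predictor = baseline + total_points(patient) * MAX_RANGE / 100.0
    return 1.0 / (1.0 + math.exp(-linear_predictor))


example_patient = {
    "t_stage": "T3",
    "n_stage": "N1",
    "bone_metastases": "no",
    "liver_metastases": "yes",
    "brain_metastases": "no",
}
example_points = total_points(example_patient)
example_risk = predicted_risk(example_patient)
```

Reading a printed nomogram performs the same three steps by hand: look up the points for each factor, add them, and read the risk off the total-points axis.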

ROC curves analysis and prediction value evaluation
ROC curves were drawn to assess the predictive value of the lung metastases nomogram in the training and validation cohorts (Figs. 3a and c). We verified the nomogram internally and externally, using the C index to evaluate its prediction accuracy. In the internal verification (Fig. 3b), the C index was 0.761 (0.736-0.787). In the external verification on the validation cohort (Fig. 3d), the C index was 0.757 (0.718-0.795).
Verification of the nomogram showed good agreement between the observed outcomes and the predicted values.
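Agreement between predicted and observed values can be checked numerically by grouping patients by predicted risk and comparing the mean predicted probability with the observed event rate in each group. The sketch below uses simple equal-size grouping, which is a cruder stand-in for the calibration plots described in the methods; the function name and the toy values are assumptions made for illustration.

```python
def calibration_table(predicted_risk, outcome, n_groups=2):
    """Group patients by predicted risk and compare predicted with observed.

    Returns one (mean_predicted, observed_rate, group_size) tuple per group,
    ordered from lowest to highest predicted risk.
    """
    pairs = sorted(zip(predicted_risk, outcome))
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue  # more groups requested than patients available
        mean_predicted = sum(p for p, _ in group) / len(group)
        observed_rate = sum(y for _, y in group) / len(group)
        rows.append((mean_predicted, observed_rate, len(group)))
    return rows


# Toy data, invented for illustration only (1 = lung metastasis present).
toy_risk = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
toy_outcome = [0, 0, 1, 1, 1, 0]
toy_rows = calibration_table(toy_risk, toy_outcome, n_groups=2)
```

In a well-calibrated model the two numbers in each row are close, so the points fall near the 45-degree line of a calibration plot.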

Discussion
Ovarian cancer is the seventh most common cancer among women and the eighth most common cause of cancer death in the world, with a 5-year overall survival rate of < 50% 7. Two-thirds of patients already have advanced disease (stage III/IV) at the time of diagnosis 8. When the lungs are affected by ovarian cancer, the main route of spread is from the pleura. Lung metastasis usually represents visceral pleural involvement and contiguous infiltration. Occasionally, isolated parenchymal lesions are found, and invasion of lymphatic vessels and blood vessels is also involved 9. The latency period from diagnosis of ovarian cancer to the development of lung metastases can be as long as 108 months 10. Early detection of lung metastases gives patients an earlier chance of receiving treatment more aggressive than standard chemotherapy alone, which may lead to better survival 3. Active chemotherapy can significantly reduce the tumor load and metastasis of ovarian cancer 11. Surgical removal of isolated lung metastatic lesions is reasonable 12. Targeted therapy is also a promising treatment for metastatic ovarian cancer 13. Routine imaging studies, such as computed tomography (CT) or magnetic resonance imaging (MRI), have not shown high sensitivity and specificity for diagnosing micrometastases < 1 cm 14. Therefore, there is a need for a non-invasive method that can predict the likelihood of lung metastases in ovarian cancer patients. In this study, we used the SEER data set to develop and validate a predictive nomogram, which demonstrated good discrimination and calibration and provides a personalized estimate of the likelihood of lung metastases in ovarian cancer patients.
To address this problem, this study is the first to generate a risk model based on clinical and tumor characteristics from the population-based Surveillance, Epidemiology, and End Results database to predict the risk of lung metastases in newly diagnosed ovarian cancer patients. We found that the higher the AJCC T and N stage, the more likely the tumor is to metastasize to the lung, which is consistent with research on bone metastasis of ovarian cancer and on metastasis of other tumor types 15 16 17. Previous studies have shown that poor differentiation and lymph node involvement are risk factors for distant metastasis 3. We found that liver metastases, brain metastases and bone metastases are risk factors for lung metastases. If distant metastasis has occurred at other sites, cancer cells have already spread 18, and the possibility of lung metastases is higher.
We constructed a nomogram of lung metastases from ovarian cancer and verified it internally and externally. It can predict the risk of lung metastases in patients with ovarian cancer. The nomogram included 5 factors: AJCC T stage, AJCC N stage, and the presence of bone metastases, liver metastases and brain metastases. In the verification, the nomogram showed good agreement between the predicted and observed results. In addition, the C indexes of the internal and external verification of the nomogram were 0.761 (0.736-0.787) and 0.757 (0.718-0.795), respectively, indicating good predictive accuracy. For patients whom this model predicts to be at higher risk of metastases, imaging examination should be performed promptly so that lung metastases are diagnosed at the earliest opportunity and clinical management can be better guided.
Determining the prognostic factors related to lung metastases in patients with ovarian cancer may help doctors provide targeted treatment strategies for patients at different risk levels and improve patient survival and quality of life. Previous studies have shown that lung metastases can significantly worsen the prognosis of patients 19. The median survival time after diagnosis of distant disease is 12 months 4. In this study, the 3-year and 5-year survival rates for 411 patients with newly diagnosed lung metastases were 33.8% and 22.8%, respectively, similar to other studies 20 21. We found that primary site surgery and chemotherapy can improve overall survival. Therefore, for patients with ovarian cancer and lung metastases, active surgery and chemotherapy are encouraged. At the same time, the mixed histological type is a high-risk factor for death, and physicians should pay close attention to it.
However, this study has several limitations that should be noted. The main limitation is that the variables used to construct the nomogram were limited to certain clinicopathological features, because the SEER database does not include important tumor biomarkers. Another limitation is that, although the established nomogram shows good discrimination and calibration, it still requires further verification in large-scale external cohorts. Third, only patients with lung metastases at initial diagnosis were analyzed. Lung metastases that occurred later in the course of the disease could not be analyzed, because they may not be recorded in the Surveillance, Epidemiology, and End Results database.

Declarations
Conflict of Interest: The authors declare that they have no conflicts of interest.
Ethics approval: Since the data collected from the Surveillance, Epidemiology, and End Results database were anonymized and de-identified prior to release, informed patient consent was not required in our study. (Table 3).