Risk Factors for the Morbidity and Prognosis of Lung Metastases in Newly Diagnosed Ovarian Carcinoma: A Large Population-based Study

Background: To evaluate risk factors for the morbidity and prognosis of lung metastases (LM) in patients with newly diagnosed ovarian carcinoma (OC).
Methods: Based on the Surveillance, Epidemiology, and End Results (SEER) dataset, OC patients from 2010 to 2016 were retrospectively analyzed. Risk factors for the morbidity of LM in OC patients were assessed by logistic regression analysis, and survival was assessed by the Kaplan-Meier and Gray methods. Cox regression analysis was performed to identify risk factors for the prognosis of OC patients with LM, and their prognostic potential was further validated by two established nomograms.
Results: A total of 27,123 eligible OC patients were enrolled in the study, and the morbidity of LM was 5.61% (1,521/27,123). Logistic regression models showed that T3 stage [odds ratio (OR)=2.74, 95%CI=2.09-3.66, P<0.01], advanced N stage (OR=1.86, 95%CI=1.62-2.14, P<0.01), and the presence of bone metastasis (OR=3.78, 95%CI=2.79-5.11, P<0.01), brain metastasis (OR=4.67, 95%CI=2.50-8.63, P<0.01) and liver metastasis (OR=3.60, 95%CI=3.14-4.12, P<0.01) were all significantly correlated with the morbidity of LM in OC patients. Median survival for OC patients with LM was 11 months (interquartile range, 3 to 25 months). Cox regression analyses showed that age over 80 years [hazard ratio (HR)=2.52, 95%CI=2.33-2.72, P<0.01] and positive expression of cancer antigen 125 (CA-125; HR=1.63, 95%CI=1.47-1.82, P<0.01) were significantly correlated with higher mortality in patients with LM, whereas chemotherapy (HR=0.62, 95%CI=0.59-0.65, P<0.01) was significantly correlated with lower mortality. Two nomograms were established and evaluated by the concordance index (C-index), calibration curves, the area under the curve (AUC), decision curve analyses (DCAs) and clinical impact curves (CICs), which supported the prognostic potential of the identified risk factors in OC patients with LM.
Conclusion: This population-based cohort study provides a reference for guiding clinical screening and individualized treatment of OC patients with LM.


Introduction
Ovarian carcinoma (OC) is the leading cause of death among malignancies of the female genital system. The American Cancer Society reported 21,750 new cases of OC and 13,940 deaths in 2020 [1].
Approximately 60% of OC patients are diagnosed at an advanced stage. Previous studies reported that the invasion of OC cells mainly depends on hematogenous circulation and lymphatic channels [2], and pulmonary metastases in OC are classified as a worse outcome category that leads to a notably poor prognosis [3,4]. The lung is the second most common distant metastatic site, and the morbidity of lung metastasis (LM) ranges from 6% to 16% [5-8]. Although surgery is preferred as the curative treatment for metastatic malignancies, many affected patients cannot undergo surgery because of the strict indications [9].
OC patients with LM can also benefit from systemic treatments, including chemotherapy, radiotherapy, targeted therapy, and immunotherapy [10-12]. Although these treatments do prolong progression-free survival, most patients ultimately suffer from relapse or resistance [13].
Meanwhile, patients also face a heavy economic burden. It is therefore necessary to explore risk factors for the morbidity and prognosis of OC patients with LM in order to improve survival outcomes.
A review of the literature shows that risk factors and survival estimates of OC patients with LM have not been extensively analyzed. Therefore, it is essential to construct predictive models for designing prophylactic treatments and attentive nursing care for OC patients at a high risk of LM. This study aims to investigate risk factors for the morbidity and prognosis of newly diagnosed OC patients with LM and to validate them by establishing nomograms.
Participant inclusion and exclusion are shown in Figure 1. Ethical approval was not required for this study because the clinical data of the included OC patients were collected from open-access, anonymized data in the public SEER dataset.

Nomogram construction and validation
In the cohort, categorical variables were expressed as numbers and percentages (N, %). Follow-up analyses were conducted to assess independent risk factors for the prognosis of OC with LM. Univariable and multivariable logistic regression models were used to identify risk factors for the morbidity of LM in newly diagnosed OC patients, and unadjusted and adjusted proportional hazards models were used to identify prognostic factors for OC with LM. Afterwards, two nomograms were established to quantify predictive capacity. Based on the risk scores for overall survival (OS) in the nomogram, patients were categorized into low-risk and high-risk subgroups, and differences between the two subgroups were assessed with the clinical effect curve. Furthermore, Kaplan-Meier survival curves were plotted to assess the OS of OC patients with LM. To account for deaths from other critical illnesses, cancer-specific survival was analyzed with the cumulative incidence function. The accuracy of the nomograms was assessed with calibration plots. In addition, DCAs and CICs were used to calculate the net benefit at each risk threshold probability.
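The net benefit that a decision curve plots at each threshold probability can be illustrated with a minimal Python sketch (the study itself used R). It assumes the standard decision-curve definition, net benefit = TP/N - (FP/N) x pt/(1 - pt), which the text does not spell out; the function name `net_benefit` and the ten toy predictions are hypothetical and are not taken from the study.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of treating every patient whose predicted risk is >= threshold.

    Assumed standard definition (not stated in the paper):
        NB = TP/N - (FP/N) * threshold / (1 - threshold)
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # True positives: flagged as high risk and actually developed the event.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    # False positives: flagged as high risk but did not develop the event.
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = lung metastasis present, 0 = absent.
outcomes = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
predicted = [0.80, 0.10, 0.35, 0.60, 0.05, 0.20, 0.15, 0.25, 0.40, 0.02]

for pt in (0.1, 0.2, 0.3):
    print(f"threshold={pt:.1f}  net benefit={net_benefit(outcomes, predicted, pt):.3f}")
```

Plotting this quantity over a grid of thresholds, together with the "treat all" and "treat none" strategies, gives the decision curve; the clinical impact curve instead plots, at each threshold, the number of patients flagged as high risk and the number of those who actually have the event.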

Statistical analysis
R software (version 3.6.1, https://www.r-project.org/) was used for statistical analyses. Categorical data were compared using Fisher's exact test or the Chi-square test. Nomograms based on regression models, calibration curves and survival-related curves were drawn with R packages including rms, foreign, survival and cmprsk, among other software (https://www.mskcc.org/departments/epidemiologybiostatistics/health-outcomes/tutorial-r). A two-tailed P value <0.05 was considered statistically significant (*P<0.05, **P<0.01).
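As an illustration of the Chi-square comparison of categorical data, the following minimal Python sketch computes the Pearson statistic for a 2x2 table. It is not the authors' R code: it omits the Yates continuity correction that R's `chisq.test` applies to 2x2 tables by default, and the function name `chi_square_2x2` and the counts are hypothetical.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, two-sided p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column total must be positive")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: rows = exposure yes/no, columns = event yes/no.
stat, p = chi_square_2x2(30, 70, 10, 90)
print(f"chi-square = {stat:.2f}, P = {p:.4f}")
```

When expected cell counts are small, Fisher's exact test is the usual alternative, as the text notes.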

Baseline characteristics of OC patients
As shown in Table 1, a total of 27,123 eligible OC patients were included; their mean age was 60.39±14.90 years and their median survival time was 22 months (interquartile range, 9-43 months).
Among them, 5.61% (N=1,521) developed LM, with a mean age of 65.37±13.74 years and a median survival time of 11 months (interquartile range, 3-25 months). Other demographic and clinical characteristics of the included OC patients are also presented. Significant differences were found in age, race, marital status, histology, grade, tumor size, the number of regional nodes examined, radiotherapy, chemotherapy, surgery scope, T stage, N stage, CA-125, bone metastasis, brain metastasis and liver metastasis.

Independent risk factors for the morbidity of LM in OC patients and nomogram establishment
Unadjusted and adjusted logistic regression analyses were applied to assess independent risk factors for the morbidity of LM in newly diagnosed OC patients. Histology, grade, the number of regional nodes examined, treatment strategies such as chemotherapy and surgery, T and N stage, CA-125, and the presence of other distant metastases were correlated with the morbidity of LM in OC patients (Table 2). The morbidity of LM in OC patients with the serous histological subtype was significantly lower than in those with non-serous adenocarcinoma (OR=0.80, 95%CI=0.70-0.92, P<0.01). Concerning tumor grade, patients with poorly differentiated (OR=2.71, 95%CI=1.53-5.34, P<0.01) and undifferentiated OC (OR=2.84, 95%CI=1.60-5.61, P<0.01) had a significantly higher risk of developing LM than those with well-differentiated tumors. In addition, OC patients with more than 10 examined lymph nodes had a significantly lower risk of LM than those without lymph node examination (OR=0.41, 95%CI=0.33 [...]
Subsequently, we established a nomogram to display the score assignments and predicted probabilities for the risk factors (Figure 2A). The calibration curve, with a C-index of 0.807, suggested high consistency between actual observations and predicted probabilities (Figure 2B). DCAs and CICs illustrated that threshold probabilities of 0-0.3 were the most favorable range for predicting LM with our nomogram model (Figure 2C-D).
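The odds ratios and confidence intervals reported for the logistic models relate to the underlying regression coefficients through the exponential function. The minimal Python sketch below assumes the reported intervals are Wald intervals (exp(beta ± 1.96 x SE)), which the text does not state; the helper names `odds_ratio_ci` and `coef_from_reported` are hypothetical. It uses the liver metastasis estimate from the abstract (OR=3.60, 95%CI=3.14-4.12) as a worked example.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta: float, se: float) -> tuple:
    """Convert a logistic coefficient and its standard error to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - Z_95 * se), math.exp(beta + Z_95 * se)


def coef_from_reported(lower: float, upper: float) -> tuple:
    """Recover (beta, se) on the log-odds scale from a reported 95% Wald interval."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    return (log_lo + log_hi) / 2.0, (log_hi - log_lo) / (2.0 * Z_95)


# Liver metastasis as reported in the abstract: OR=3.60, 95%CI=3.14-4.12.
beta, se = coef_from_reported(3.14, 4.12)
or_, lo, hi = odds_ratio_ci(beta, se)
print(f"beta={beta:.3f}, SE={se:.3f}, OR={or_:.2f} ({lo:.2f}-{hi:.2f})")
```

Under the Wald assumption, the midpoint of the log-transformed interval reproduces the reported point estimate, which is a quick consistency check for any OR in Table 2.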

Survival analyses of OC patients with LM
The Kaplan-Meier method was used to assess the influence of LM on the outcome of OC patients. As shown in Figure 3A, OS curves revealed that LM development was significantly correlated with the prognosis of OC (HR=1.36, 95%CI=1.27-1.45, P<0.01). OS was significantly worse in OC patients over 80 years of age (Figure 3B, P<0.01) and in those with poorly differentiated or undifferentiated neoplasms (Figure 3C, P<0.01), bone metastasis (Figure 3D, P<0.01), brain metastasis (Figure 3E, P<0.01) or liver metastasis (Figure 3F, P<0.01) than in controls. Meanwhile, using Gray's method, we found that LM was significantly correlated with death from OC rather than from other diseases [sub-distribution hazard ratio (SHR)=3.08, 95%CI=2.89-3.28, P<0.01] (Figure 3G).
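The product-limit calculation behind Kaplan-Meier curves and median survival estimates can be illustrated with a minimal Python sketch. The study performed these analyses in R; the function names `kaplan_meier` and `median_survival` and the ten follow-up records below are hypothetical and serve only to show the mechanics.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return [(event_time, survival_probability)] from the product-limit estimator.

    events[i] is 1 if death was observed at times[i] and 0 if the patient was censored.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    survival = 1.0
    at_risk = len(times)
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """Smallest time at which estimated survival is 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up in months; 1 = death observed, 0 = censored.
months = [3, 5, 5, 8, 11, 11, 14, 20, 25, 30]
died = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1]

curve = kaplan_meier(months, died)
for t, s in curve:
    print(f"t={t:>2} months  S(t)={s:.3f}")
print("median survival:", median_survival(curve))
```

Censored patients leave the risk set without lowering the curve, which is why the estimate differs from a simple proportion of survivors. Gray's method addresses a different question, comparing cumulative incidence of cause-specific death when other causes of death compete.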

Prognostic factors for OC and nomogram establishment
Prognostic factors for OC patients were analyzed using the Cox regression model [...]. According to the results of the Cox regression analysis, significant risk factors for the prognosis of OC were used to establish a nomogram for estimating the 3-year and 5-year survival rates (Figure 4A). Stratified by the median score from the nomogram, the clinical effect curve revealed that the survival probability of the low-risk subgroup was significantly higher than that of the high-risk subgroup (Figure 5, HR=1.06, 95%CI=1.03-1.10, P<0.01). Furthermore, the calculated 3-year and 5-year AUCs (0.812 and 0.818, respectively; Figure 4B) and the solid lines lying close to the diagonal (Figure 4C) both indicated excellent predictive accuracy.

Discussion
Ovarian carcinoma is regarded as the leading cause of mortality among gynecological malignancies because of its high rates of advanced-stage diagnosis and recurrence.
Although risk factors for the prognosis of metastatic OC have been widely explored, shortcomings remain.
Previous studies have identified risk factors for the morbidity and prognosis of OC with distant metastases, but a visual tool for predicting these probabilities is lacking [14,15]. Nomograms, as a novel form of data visualization, have attracted considerable attention for improving efficiency in clinical applications. Yuan et al. [16] reported that advanced T and N stage and other distant metastases were risk factors for the morbidity of LM in OC patients, and that treatment strategies such as active surgery and chemotherapy served as protective factors. However, the inclusion and exclusion criteria were inconsistent, and it remains unclear whether complications significantly influenced the OS of the recruited patients. Although survival rates were predicted, the results had limited significance because not all significant risk factors were introduced into the multivariate Cox regression model. As a result, the predicted results were not representative of the actual HR. Therefore, we identified risk factors for the morbidity of LM and the prognosis of OC patients with LM by logistic regression analyses and proportional hazards analyses, followed by direct visualization of the probabilities of LM development in OC patients and of their OS through nomograms.
CA-125 is a large membrane glycoprotein belonging to the mucin family. Thirty years after its discovery, CA-125 is still recommended as a vital tumor marker for OC and is measured to detect residual disease or recurrence in OC patients after first-line therapy [17]. A rising serum CA-125 level within the normal range has been shown to be strongly associated with the risk of recurrence and with survival in OC [18], suggesting that fluctuations in the CA-125 level are valuable for predicting the prognosis of OC.
In this study, we not only assessed the influence of CA-125 on the prognosis of OC patients with LM, but also tried to eliminate the interference of other major diseases by calculating cancer-specific survival. Moreover, multiple methods were adopted to evaluate the usefulness of the nomograms.
In this cohort, 5.61% of the included OC patients were diagnosed with LM, and their median survival was 11 months. We found that LM was more likely to develop in OC patients with a high tumor grade, non-serous adenocarcinoma, treatment with radiotherapy and chemotherapy, advanced T and N stage, and other organ metastases. Older age, poorly differentiated and undifferentiated grade, lack of regional lymph node examination, radiotherapy, elevated CA-125, advanced T and N stage and metastases at other sites were significantly correlated with a poor prognosis of OC with LM. The C-index, calibration plots and AUC values demonstrated the high agreement of the nomogram predictions, and the clinical effect curve showed the discrimination ability of the models. The survival probability in the low-risk subgroup was markedly higher than that in high-risk OC patients, indicating that identifying risk factors is instructive for guiding prophylactic clinical treatment and improving the prognosis of OC patients.
According to a previously established nomogram on the morbidity of LM in OC patients, serous adenocarcinoma is considered the most aggressive subtype [19]. In contrast, our results revealed that patients with non-serous OC, including mucinous, endometrioid and other subtypes, had a markedly higher probability of developing LM. A growing number of studies have reported that smoking increases the risk of non-serous carcinomas, especially mucinous tumors, whereas evidence for a clear association with serous subtypes is scant [20,21]. Besides, it is widely accepted that smoke exposure increases the number of lung metastases [22,23], which might explain the inverse distribution in this study. A previous study suggested that higher tumor grade and T stage were crucial risk factors for the prognosis of gynecological cancer patients with distant metastases [24]. Similarly, we found that undifferentiated and poorly differentiated grade, advanced T and N stage and lack of regional node examination were significantly correlated with the risk of OC with LM.
As for the OS nomogram, the prognosis of the youngest OC patients, aged 18-49 years, was better than that of older patients, consistent with previous findings [25,26]. An elevated CA-125 level has been reported to indicate ineffective treatment [27]. Likewise, our study found that an elevated CA-125 level was associated with worse survival outcomes and can be regarded as an effective determinant of the prognosis of OC with LM. Notably, chemotherapy showed opposite effects in the two nomograms. Current data demonstrate that chemotherapy resistance of OC cells contributes to recurrence and metastasis [28]. On the other hand, evidence also supports that chemotherapy is feasible for partial cytoreduction and prolongs survival [29,30]. Undeniably, chemotherapy helps improve clinical response and outcome [31,32]. For cancer patients with regional lymph node involvement, surgery and chemotherapy are favorable factors for OS [33], which was also supported by our Cox regression analyses. However, for patients who relapse or die from the disease, the assessment of risk factors and cellular-level biomarkers of chemotherapy response should be emphasized in the future [34].
Our study has several limitations. First, this population-based retrospective investigation lacked some pivotal clinical data, such as detailed assessment of pulmonary metastatic tumors and more information on individual treatments. Second, the observed morbidity of LM might be subject to regional bias because the data were registered in the United States. Third, our results likely underestimate the morbidity of LM because the large number of OC patients who developed LM later were not recorded. Last but not least, the models constructed from the SEER database were not verified with external data and should be continually refined based on clinical practice.

Conclusion
This retrospective study represents the largest dataset on LM development in OC patients and provides valuable nomograms on the epidemiological characteristics and prognosis of advanced OC. Moreover, our findings showed strong reliability across multiple statistical approaches to calibration and discrimination. Hence, they have the potential to guide clinical diagnosis and individualized treatment of OC with LM. In the future, laboratory investigations and large-sample prospective clinical trials are needed to further evaluate the molecular characteristics and treatment decisions for OC patients with LM.

Figure 1 The flow diagram of participant inclusion and exclusion.

Figure 2 The nomogram and its calibration and verification curves for predicting LM morbidity in OC patients. A total of twelve factors were included in the predictive nomogram for LM incidence (A). The calibration curve (B), with a C-index of 0.807, was plotted to verify the validity of the prediction. The decision curve (C) and clinical impact curve (D) were plotted to show the event occurrence of patients at high risk.

Figure 4 The nomogram and its calibration and verification curves for predicting prognosis in OC patients. A total of fifteen prognostic factors were included in the 3- and 5-year survival nomogram (A). Calibration curves (B-C) with the AUC values (3-year AUC=0.812, 5-year AUC=0.818) were plotted to verify the effectiveness of the prediction.