A Novel Clinical Nomogram for Predicting Lymph Node Metastasis in Ovarian Cancer: A SEER Analysis and External Validation in a Tertiary Center

The aim of this study was to investigate the risk factors for lymph node metastasis (LNM) in patients with presumed early-stage ovarian carcinoma (OC). Data on patients diagnosed with OC in 2018 were obtained from the SEER database, and 104 OC patients treated at the General Hospital of Northern Theatre Command were enrolled for external validation. Logistic regression was used to identify independent predictors of LNM, which were then used to build a nomogram. The reliability of the nomogram was evaluated with receiver operating characteristic (ROC) curve analysis, calibration curves, and decision curves. Age, histology type, histology grade, and preoperative serum CA125 level were all significant predictors of LNM. The nomogram built from these variables performed well and may be useful in clinical practice.


Introduction
Worldwide, about 230 000 women are diagnosed with ovarian carcinoma (OC) and 150 000 die of the disease each year. Although OC ranks third in incidence among gynecological malignancies, its mortality rate is the highest. In recent years, the incidence of OC has gradually increased [1,2,3]. Because of late detection, 70% of OC patients present with advanced-stage disease at diagnosis [4]. OC can spread by the intraperitoneal, lymphatic, and hematogenous routes. Previous studies showed that up to 15% of OC cases have lymph node metastases (LNM), which significantly affect survival [5].
However, if lymph node dissection is performed routinely in presumed early-stage OC, it is unnecessary in about 80% of cases [6,7,8]. Moreover, lymph node dissection increases the risk of complications, including infection and lymphocyst formation, which may significantly impair quality of life [9,10,11]. Consequently, identifying OC cases with LNM would help oncologists make treatment decisions that benefit the prognosis of OC.
The aim of this study was to investigate the risk factors for LNM in presumed early-stage OC. We used logistic regression to construct a nomogram for predicting LNM in OC based on the SEER database, with external validation in a gynecological oncology center.

Study Population and Data collection in SEER database
Data on patients diagnosed with OC in 2018 were obtained from the SEER database using SEER*Stat software. Signed authorization was obtained to access the SEER data. The inclusion criteria were: (1) site recode ICD-O-3/WHO 2008: ovary; (2) year of diagnosis: 2018. The exclusion criteria were: (1) missing information on LNM, tumor size, race, marital status, histology, or tumor grade; (2) the tumor was not the first primary tumor.
Following the processing flowchart shown in Figure 1, 921 patients with OC were enrolled in our study. The 921 cases were randomly divided at a ratio of 7:3 into a training cohort (n = 644) and a validation cohort (n = 277). The variables collected were age, race, insurance status, marital status, laterality, histology type, histology grade, tumor size, preoperative serum CA125 level, and positive lymph nodes.
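The 7:3 random split can be sketched as follows. This is a minimal illustration and not the authors' procedure. The seed value and the helper name `split_train_validation` are hypothetical, and the paper does not state whether the split was stratified. Rounding the training size down (921 × 7 // 10 = 644) reproduces the reported cohort sizes.

```python
import random
from typing import List, Sequence, Tuple


def split_train_validation(
    patient_ids: Sequence[int], seed: int = 2018
) -> Tuple[List[int], List[int]]:
    """Randomly split patients 7:3 into training and validation cohorts.

    Hypothetical helper for illustration; the seed is arbitrary.
    """
    rng = random.Random(seed)  # local RNG so the split is reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10  # integer floor: 921 -> 644
    return shuffled[:n_train], shuffled[n_train:]


train_ids, valid_ids = split_train_validation(range(921))
print(len(train_ids), len(valid_ids))  # 644 277
```

A separate `random.Random` instance is used so that the split does not depend on, or disturb, any global random state.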

External Validation data
The clinical data of 104 OC patients were extracted from the electronic database of the General Hospital of Northern Theatre Command. This study was approved by the Institutional Review Board of the General Hospital of Northern Theatre Command (ID: 2020016). Because the study was retrospective and observational, the board waived the requirement for informed consent. Inclusion criteria: (1) primary OC diagnosed by postoperative pathology; (2) no preoperative biological therapy or chemoradiotherapy; (3) complete clinical data.

Statistical analysis
Categorical and continuous data were expressed as percentages and mean ± SD, respectively. For continuous variables, the t test or Mann-Whitney U test was used to compare groups, while for categorical variables the chi-square test or Fisher's exact test was used. To develop a reliable nomogram for predicting the risk of LNM, we built the model in the training cohort of 644 patients, validated it internally in the 277 patients of the validation cohort, and validated it externally in the 104 patients from the General Hospital of Northern Theatre Command. Multicollinearity between clinical variables was checked with the variance inflation factor (VIF) and tolerance. Logistic regression was used to identify independent predictors of LNM, which were then used to build the nomogram. To evaluate the reliability and net benefit of the nomogram, we performed receiver operating characteristic (ROC) curve analysis and plotted calibration curves and decision curves. A p value less than 0.05 was considered statistically significant. Data were analyzed with R (The R Foundation; http://www.r-project.org; version 3.4.3) and Empower (R) (www.empowerstats.com, X&Y Solutions, Inc., Boston, Massachusetts).
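The multicollinearity check can be sketched with the usual definitions. For predictor j, tolerance_j = 1 − R²_j, where R²_j comes from regressing predictor j on all other predictors, and VIF_j = 1 / tolerance_j. The minimal pure-Python sketch below assumes the predictors are already numerically encoded (for example, dummy variables for histology). The function names and toy data are hypothetical; the study's own computation was done in R/Empower.

```python
from typing import List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y: Sequence[float], xs: Sequence[Sequence[float]]) -> float:
    """R^2 of an ordinary least-squares fit of y on the columns xs plus an intercept."""
    rows = [[1.0] + [col[i] for col in xs] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = _solve(xtx, xty)  # normal equations
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def tolerance_and_vif(columns: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Return (tolerance, VIF) for each predictor column."""
    results = []
    for j, target in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        tol = 1.0 - r_squared(target, others)
        results.append((tol, 1.0 / tol))
    return results


# Toy, invented predictor columns (not study data).
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
for tol, vif in tolerance_and_vif([x1, x2]):
    # Rule of thumb used in the text: tolerance > 0.1 and VIF < 10.
    print(f"tolerance={tol:.3f} VIF={vif:.2f} ok={tol > 0.1 and vif < 10}")
```

Perfectly collinear predictors make the normal equations singular, and the sketch would then raise a division error. Statistical packages handle this case more gracefully.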

Demographic characteristics
In our study, 644 and 277 cases were enrolled in the training cohort and the validation cohort, respectively. There was no significant difference in any indicator between the two cohorts (P > 0.05, Table 1). Most patients were white (76.9%), the most common histological type was serous carcinoma (44.4%), the most common histological grade was G3 (43.8%), 66.6% of patients had positive serum CA125, and 77.6% had positive LNM.
The multivariate logistic regression showed that age, histology type, histology grade and preoperative serum CA125 level were significant predictors of LNM. The tolerance was >0.1 and the VIF was <10 for all predictors, suggesting no collinearity among these independent variables (Supplementary Table 1). Based on these risk factors, we established the nomogram for predicting LNM.
The AUCs of the model in the training and validation cohorts were 0.78 (Figure 3A) and 0.79 (Figure 3B), respectively, indicating favorable discrimination. The calibration curves showed that the predicted outcomes fitted the observed outcomes well in the training cohort (p = 0.825, Figure 3D) and the validation cohort (p = 0.503, Figure 3E). The decision curves showed that the nomogram provided more benefit than the All or None schemes at threshold probabilities of >50% and <100% in both the training and validation cohorts (Figure 3G, H; Table 2). In the external validation cohort, the AUC was 0.76 (Figure 3C), indicating favorable discrimination. The calibration curve showed that the predicted outcomes fitted the observed outcomes well (p = 0.108, Figure 3F). The decision curve showed more benefit than the All or None schemes at threshold probabilities of >30% and <90% (Figure 3I).
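The decision-curve comparison with the "All" and "None" schemes is usually based on the standard net-benefit formula, NB(p_t) = TP/n − (FP/n) · p_t / (1 − p_t). Here a patient is classified as LNM-positive when the predicted risk is at least the threshold p_t. "Treat all" classifies every patient as positive, and "treat none" has a net benefit of zero. The sketch below assumes this standard definition; the toy predictions are invented and the function names are hypothetical.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # weight given to each false positive
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit when every patient is treated as LNM-positive."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


NET_BENEFIT_TREAT_NONE = 0.0  # treating nobody yields no true or false positives


# Toy data (invented): 1 = LNM present, 0 = absent.
y = [1, 1, 0, 0]
p = [0.9, 0.6, 0.4, 0.1]
for pt in (0.3, 0.5):
    model = net_benefit(y, p, pt)
    better = model > max(net_benefit_treat_all(y, pt), NET_BENEFIT_TREAT_NONE)
    print(f"p_t={pt}: model NB={model:.3f}, beats All/None={better}")
```

A decision curve is this calculation repeated over a grid of thresholds. The threshold ranges reported above are the ranges where the model's curve lies above both reference curves.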

Discussion
More than 70% of OC patients are diagnosed at a late stage because of insidious progression and the absence of obvious early symptoms, so the 5-year survival rate is only 30% to 40% [4]. LNM in OC is of particular interest to gynecological oncologists worldwide because it is common and affects prognosis [1,12,13,14]. Nasioudis et al. found that the rate of LNM was about 3.3% to 14% in early OC and as high as 40% to 73.7% in late OC [15,16]. Moreover, many cases diagnosed as presumed early-stage OC already have LNM, and these mis-staged patients are at risk of a poor long-term prognosis. Given the influence of LNM on prognosis, most surgeons perform routine lymph node resection in early OC. However, routine lymph node resection may lead to overtreatment in a significant number of cases and increases complications, including poor wound healing, infection, lymphocyst formation, and chronic lymphedema of the lower extremities, which impair patients' daily quality of life [5,7,17]. Consequently, lymph node removal remains controversial in OC. In addition, with the development of minimally invasive surgery and the recognized complications of lymph node resection, more and more gynecological oncologists are focusing on appropriate, reasonable, and accurate lymphadenectomy in OC. Therefore, identifying cases with LNM would avoid unnecessary systematic lymph node resection and allow oncologists to select patients better. This would not only preserve patient outcomes but also reduce the incidence of complications.
Several researchers have used different methods to assess the likelihood of LNM.
Signorelli et al. used positron emission computed tomography (PECT) to detect potentially positive lymph nodes and found a detection rate of about 83.3%. They concluded that PECT was safe and reliable for detecting potentially positive lymph nodes and could help avoid systematic lymph node dissection [10]. Sentinel node detection is another promising method for identifying LNM in OC. Although it is used in breast cancer and cervical cancer, sentinel node detection is still under evaluation in OC and is not yet part of the standard therapeutic protocol [7,18,19,20]. Bogani et al. developed a nomogram to identify LNM and found that high-grade serous histology was the strongest predictor of LNM [5,21]. Zhou et al. found that poor differentiation, serous histology, and higher CA125 values may be associated with LNM [22]. In our study, endometrioid carcinoma, a lower degree of differentiation, and positive serum CA125 were all associated with a higher occurrence of LNM, which is similar to the conclusion published by Hengeveld et al. in 2019. Hengeveld et al. also found that older age and postmenopausal status were significantly associated with LNM [23]. However, in our study, older age was negatively associated with LNM.

Conclusions
The multivariate logistic regression showed that age, histology type, histology grade and preoperative serum CA125 level were all significant predictors of LNM. After internal and external validation, the nomogram built from these variables showed good performance for clinical application.
The nomogram to predict the probability of LNM in patients with OC. Based on the selected risk factors, we developed a nomogram to predict the probability of LNM from the logistic model.
Nomogram validation. The AUCs of the model in the training, internal validation, and external validation cohorts were 0.78 (Figure 3A), 0.79 (Figure 3B), and 0.76 (Figure 3C), respectively, indicating favorable discrimination. The calibration curves showed that the predicted outcomes fitted the observed outcomes well in the training cohort (p = 0.825, Figure 3D), the internal validation cohort (p = 0.503, Figure 3E), and the external validation cohort (p = 0.108, Figure 3F). The decision curves showed that the nomogram provided more benefit than the All or None schemes at threshold probabilities of >50% and <100% in the training and internal validation cohorts (Figure 3G, H), and of >30% and <90% in the external validation cohort (Figure 3I).
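A logistic model can be turned into nomogram points using a common convention, sketched below. Each predictor's contribution β·x is scaled so that the predictor with the widest range of effect spans 0 to 100 points. The total points then map back to the linear predictor and, through the logistic function, to the probability of LNM. All coefficients below are invented placeholders, not the fitted values from this study. The level groupings (for example, the age bands) are also hypothetical.

```python
import math
from typing import Dict

# Hypothetical model for illustration only; NOT the coefficients estimated in this study.
INTERCEPT = -1.0
# For each predictor, the contribution beta * x of each level to the linear predictor.
EFFECTS: Dict[str, Dict[str, float]] = {
    "age": {"<50": 0.0, "50-69": -0.4, ">=70": -0.8},
    "grade": {"G1": 0.0, "G2": 0.5, "G3": 1.0},
    "ca125": {"negative": 0.0, "positive": 0.6},
}

# Lowest contribution of each predictor (assigned 0 points) and the widest
# effect range across predictors (assigned 100 points).
MINS = {k: min(v.values()) for k, v in EFFECTS.items()}
MAX_RANGE = max(max(v.values()) - min(v.values()) for v in EFFECTS.values())


def points(predictor: str, level: str) -> float:
    """Nomogram points for one predictor level (0 = lowest contribution)."""
    return 100.0 * (EFFECTS[predictor][level] - MINS[predictor]) / MAX_RANGE


def probability_from_points(total_points: float) -> float:
    """Convert total points back to the linear predictor, then to P(LNM)."""
    lp = INTERCEPT + sum(MINS.values()) + total_points * MAX_RANGE / 100.0
    return 1.0 / (1.0 + math.exp(-lp))


patient = {"age": ">=70", "grade": "G3", "ca125": "positive"}
total = sum(points(k, v) for k, v in patient.items())
print(f"total points = {total:.0f}, predicted P(LNM) = {probability_from_points(total):.3f}")
```

The point scale is only a linear relabelling of the linear predictor. Reading the probability off the nomogram therefore gives the same value as evaluating the logistic model directly.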