Establishment of a Risk Factor and Clinical Prediction Model for Distal Metastasis and Prognosis of Ewing Sarcoma: A SEER-Based Study

Background: The purpose of this study was to establish and validate a clinical prediction model for the risk of metastasis and the survival of patients with Ewing sarcoma (ES). Methods: Data on patients diagnosed with ES from 2004 to 2015 were collected from the Surveillance, Epidemiology, and End Results (SEER) database. After cases with missing data were excluded, the remaining cases were divided by a random number method into a training set (n = 2520) and a verification set (n = 1076). Logistic regression analysis was used to identify risk factors related to tumor metastasis, and Kaplan-Meier curves and the Cox proportional hazards model were used to explore risk factors affecting patient survival. Nomograms based on these analyses were developed and verified to predict ES tumor metastasis and the 3- and 5-year survival probability of patients. Results: In the analysis of tumor metastasis, age over 60 years, surgery, tumor volume, radiotherapy and chemotherapy were found to be risk factors for tumor metastasis. The ROC curve of the Nomogram established to predict the risk of tumor metastasis showed that the model had good predictive ability (AUC = 0.744, 95% CI 0.721-0.768). In the survival analysis of the training set, age (20-29, 30-60, > 60), tumor location (axial bone, facial bone), mode of operation (local excision, radical resection), number of primary tumors (multiple), M stage (M1), stage group (II, III, IV, UNK stage), tumor volume (5-10 cm, greater than 10 cm), spread range (local metastasis, distal metastasis), radiotherapy and chemotherapy were independent prognostic factors. The Nomogram established to predict 3-year and 5-year survival was verified with the verification set and showed good consistency (C-index = 0.747, 95% CI 0.696-0.797).
Conclusion: Nomograms with good predictive ability were developed to predict the risk of tumor metastasis and the 3- and 5-year survival rates of patients with ES. External data validation is still needed in future clinical applications, especially outside the United States.


Introduction
Ewing sarcoma (ES) was first described in detail by the pathologist James Ewing in 1922 [1]. It is a common primary bone tumor in children and adolescents, and its incidence is second only to that of osteosarcoma [2]. Previous studies have shown that ES favors the tubular bones and the pelvis. The most common site is the limbs (46%), with the lower limbs affected more often than the upper limbs, followed by the pelvis (25%), the trunk including ribs and spine (22%), and other sites including soft tissue Ewing sarcoma (6%) [3][4][5]. With the development of multiple treatment modes, including chemotherapy, surgery and radiotherapy, the overall 5-year survival rate of localized disease has increased from about 10% to 55-65% [3,5,6]. The incidence of ES is relatively low [2], so it is difficult to include a sufficient number of ES patients in a study cohort. A sufficient number of cases are registered in the SEER database, which consists of 18 cancer registries covering about 30% of the total population of the United States [7]. The Nomogram is a reliable and convenient tool for estimating tumor prognosis [8,9]. Through a visual graphic, it yields a prediction easily and quickly. By combining a variety of important factors, the chart provides a personalized estimate of the probability of an event, such as disease recurrence or death. Nomograms have become a reliable tool for predicting the clinical outcome of many tumor diseases [10,11]. In this study, the researchers screened and extracted the data of ES patients in the SEER database with comprehensive clinicopathological and treatment information from 2004 to 2015, analyzed the extracted data, and then created and verified Nomograms containing important and reliable variables to predict the metastatic risk of ES tumors and the 3- and 5-year survival rates of patients.

Case screening
All data were obtained from the SEER database. Because SEER data can be used without patient consent, SEER*Stat software (version 8.3.5) was used to extract ES patient data. The inclusion criteria were as follows: (1) ICD-O-3/WHO 2008 morphological code 9260 (ES); (2) complete clinical data, including age, sex, race, primary location, tumor volume, tumor extension, distant metastasis and related information; cases diagnosed from 2004 to 2015, for which the data are more complete, were therefore selected; (3) complete follow-up. The exclusion criterion was incomplete or unusable clinicopathological or survival data (blanks or NA).

Variable screening
The extracted data included age, race, sex, location and number of primary tumors, use of primary tumor surgery, tumor stage (T, N, M, stage group), tumor volume, radiotherapy, chemotherapy and survival time. Age was divided into five groups: < 10 years, 10-19 years, 20-29 years, 30-60 years and > 60 years. The SEER database does not record the exact location of the bones. As a result, the researchers classified patients whose Primary Site was "face, pelvis, spine, ribs or scapula" as axial and facial bones (Axial skeleton, Facial bones), and the rest as Soft tissue. Surgical treatment was classified as No surgery, Local resection and Radical excision. The number of primary tumors (Sequence number) was divided into Only one and More. Tumor volume was divided into three groups: smaller than 5 cm, 5-10 cm and larger than 10 cm. SEER tumor spread staging (Extension) was divided into Localized, Regional and Distant. According to the 2010 edition of the tumor staging system of the American Joint Committee on Cancer (AJCC) and the Union for International Cancer Control (UICC), tumor stage was divided into T (T1, T2, T3), N (N0, N1, NX), M (M0, M1) and stage group (I, II, III, IV and UNK stage) [12]. Radiotherapy was recorded as No or Yes, and chemotherapy as No/unknown or Yes. The demographic and clinical characteristics of the patients are shown in Table 1.
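The age and tumor-size grouping described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the smallest size band is assumed to be < 5 cm (consistent with the 5-10 cm band), and the handling of values that fall exactly on a boundary is an assumption, since the text does not specify it.

```python
def age_group(age_years: float) -> str:
    """Map age at diagnosis to the five age bands used in the text."""
    if age_years < 10:
        return "<10"
    if age_years < 20:
        return "10-19"
    if age_years < 30:
        return "20-29"
    if age_years <= 60:
        return "30-60"
    return ">60"


def size_group(diameter_cm: float) -> str:
    """Map tumor size to three bands (boundary handling is an assumption)."""
    if diameter_cm < 5:
        return "<5 cm"
    if diameter_cm <= 10:
        return "5-10 cm"
    return ">10 cm"


# Toy (age, tumor size) pairs, not SEER records.
patients = [(8, 3.2), (17, 7.5), (45, 10.0), (72, 14.1)]
for age, size in patients:
    print(age_group(age), "|", size_group(size))
```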

Establishment and verification of data analysis and prediction model
The selected cases were divided into a training set (N = 2520) and a verification set (N = 1076) by random number. Univariate and multivariate logistic regression analyses were used to determine the factors associated with tumor metastasis. After the metastatic risk factors were determined, a Nomogram for predicting the risk of tumor metastasis was constructed, and the ROC curve was plotted and the area under the curve (AUC) calculated to evaluate the discriminative ability of the Nomogram.
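A minimal sketch of the random split and of the AUC calculation is given below. It is not the authors' SPSS/R workflow: the function names, the seed and the toy scores are hypothetical, and the AUC is computed by direct pairwise comparison of cases with and without metastasis, which is equivalent to the area under the empirical ROC curve.

```python
import random


def split_cohort(ids, n_train, seed=0):
    """Shuffle case identifiers and split them into training/verification sets."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# 3596 screened cases split into 2520 training and 1076 verification cases.
train_ids, verify_ids = split_cohort(range(3596), n_train=2520, seed=42)
print(len(train_ids), len(verify_ids))

# Toy predicted risks and metastasis labels (1 = metastasis), not study data.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
print(auc(toy_scores, toy_labels))
```

The pairwise form is quadratic in the number of cases, which is acceptable for a cohort of a few thousand patients. A rank-based formula gives the same value more efficiently.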
Using GraphPad Prism software (version 8.0), survival rates were calculated by the Kaplan-Meier method and survival curves were drawn to show the influence of various factors on the survival of ES patients. The log-rank test was used to evaluate differences between groups. Univariate and multivariate Cox proportional hazards models were fitted to the training set to calculate hazard ratios (HR) and 95% confidence intervals (CI) and to determine the relationship between prognostic factors and survival. The Nomogram for predicting the 3-year and 5-year survival rates of ES patients was constructed from the training set; its consistency was tested by the C-index, and it was then tested on the verification set. SPSS (version 22.0), GraphPad Prism (version 8.0) and R (version 3.0.1) were used for statistical analysis. P < 0.05 was considered statistically significant.
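For readers unfamiliar with the two survival statistics named above, the following sketch shows a Kaplan-Meier estimator and Harrell's C-index in plain Python. It is illustrative only and does not reproduce the GraphPad Prism or R computations used in the study. The function names and toy data are hypothetical, and the C-index shown is the simple form that skips pairs with tied follow-up times.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time per patient (e.g. months)
    events -- 1 if death was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def c_index(times, events, risk):
    """Harrell's C-index: a higher risk score should mean shorter survival."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i died before time j.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy follow-up data, not SEER records.
toy_times = [6, 12, 12, 20, 30, 36]
toy_events = [1, 1, 0, 1, 0, 1]
toy_risk = [0.9, 0.3, 0.4, 0.5, 0.2, 0.1]
for t, s in kaplan_meier(toy_times, toy_events):
    print(t, round(s, 3))
print(c_index(toy_times, toy_events, toy_risk))
```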

Establishment and validation of a prediction model for the risk of tumor metastasis
The training set was analyzed by univariate and multivariate logistic regression. The results of the logistic regression model (Table 2) were used to construct a Nomogram (Fig. 1) that included age, primary site of tumor, tumor volume, surgery, radiotherapy and chemotherapy as predictors. By adding up the scores for each selected variable, the probability of tumor metastasis in an individual patient can be easily calculated. The ROC curve of the Nomogram is shown in Fig. 2. The AUC was 0.744 (95% CI: 0.721-0.768), suggesting that the combined model has good predictive ability for tumor metastasis.
In the treatment and management of cancer patients, the ultimate goal is to improve patient survival. The researchers used Kaplan-Meier survival curves to analyze the training set, and the results showed that patient survival was correlated with various factors (Fig. 3). The log-rank test was used to compare differences between groups. Kaplan-Meier survival curves show the factors affecting prognosis intuitively. For example, the survival curves for race in Fig. 3 show no significant separation (P = 0.369), so race has no significant effect on the survival of ES patients. Cox regression analysis was used to further analyze the prognosis.
Univariate Cox regression was used to analyze the training set, and the results ( affecting the survival of patients [13][14][15][16]. However, there is no internationally recognized risk classification scheme for ES patients. For clinicians, prognostic judgment is critical to guide treatment decisions and provide patients with more effective and systematic treatment. At present, the long-term survival rate of patients with non-metastatic ES has increased from 10-15% to 60-70% through the application of various treatment methods such as surgery, radiotherapy and chemotherapy [17,18]. However, ES is invasive: the most common site of first metastasis is the lung (70-80%), followed by bone (40-45%) [19]. Only 20% of patients with metastatic ES have good survival [13,20,21]. Successful treatment of ES patients requires systemic chemotherapy combined with surgery, radiation therapy or both to achieve local tumor control [22]. The prognosis depends on the size and location of the tumor, the presence or absence of metastasis, the tumor response to treatment, age and disease recurrence, and patients with distant metastasis have the worst prognosis [22]. Therefore, prognostic tools are urgently needed to accurately predict the risk of ES metastasis and patient survival. The Nomogram is widely used in medicine to predict the occurrence of specific events and to estimate prognosis. It generates individualized probabilities of clinical events by integrating different predictive variables. Its visual and quantitative nature also makes it practical in clinical use.
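How a logistic-regression Nomogram integrates several variables into one individualized probability can be sketched as follows. All coefficients, the intercept and the variable names below are hypothetical placeholders chosen for illustration. They are not the fitted values of this study. The sketch assumes the common convention in which the variable with the largest coefficient is assigned 100 points and the other variables are scaled proportionally.

```python
import math

# Hypothetical log-odds coefficients, NOT the fitted values of this study.
COEFFS = {
    "age_over_60": 0.40,
    "tumor_5_to_10cm": 0.60,
    "tumor_over_10cm": 1.10,
    "axial_primary_site": 0.80,
}
INTERCEPT = -2.0  # hypothetical baseline log-odds


def points_per_unit(coeffs):
    """Scale so that the largest absolute coefficient is worth 100 points."""
    return 100.0 / max(abs(b) for b in coeffs.values())


def total_points(present_factors, coeffs):
    """Add up the Nomogram points of every factor present in the patient."""
    scale = points_per_unit(coeffs)
    return sum(coeffs[name] * scale for name in present_factors)


def probability_from_points(points, coeffs, intercept):
    """Convert total points to a linear predictor, then to a probability."""
    linear_predictor = intercept + points / points_per_unit(coeffs)
    return 1.0 / (1.0 + math.exp(-linear_predictor))


patient = ["age_over_60", "tumor_over_10cm", "axial_primary_site"]
pts = total_points(patient, COEFFS)
prob = probability_from_points(pts, COEFFS, INTERCEPT)
print(round(pts, 1), round(prob, 3))
```

Reading the total on the points axis and then the probability on the bottom axis of a printed Nomogram performs the same two steps graphically.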
Drawing on prior studies and data analysis, the authors identified several independent prognostic factors for ES patients and established two Nomograms to predict tumor metastasis risk and survival effectively and intuitively. The models include not only demographic data but also pathological staging, surgical treatment and other clinical parameters readily available in clinical practice. As a data source, the SEER database is of great value: it includes 18 different regions, represents 26% of the U.S. population and reflects the racial, economic and social diversity of the United States [13,23].
In this study, most patients were younger than 30 years, accounting for about 70% of the training and verification sets, which is consistent with the typical age of onset of ES in previous studies [20]. Although the incidence of ES is highest in the population under 30 years old, the younger the age of onset, the better the prognosis, and the older the age of onset, the worse the prognosis [21,24]. This is consistent with the results of this study (Figure 3). Older patients also have more comorbidities, including diabetes, high blood pressure and other cancers, and tolerate treatment less well, so clinicians tend to choose more conservative treatment strategies for them [25]. Age also has an effect on ES metastasis. One study showed that the sites of primary and metastatic tumors varied significantly with age [26]. Whether the presence of metastases is related to tumor size is controversial. The results of this study suggest that larger tumors are associated with a higher risk of metastasis, but further research into the mechanism is needed. Table 1 shows that the majority of patients were white, which is consistent with the population structure of the United States. Kaplan-Meier survival analysis by race (Figure 3, race) gave P > 0.05, and the curves for the different races showed no obvious separation, so race had no significant influence on the prognosis of ES patients in this study.
Both of the most commonly used staging systems for Ewing sarcoma were designed for bone tumors. The first was created by Enneking in 1980 [27]. The second was created by the American Joint Committee on Cancer (AJCC) based on its systematic classification of cancers, which relies on TNM: tumor size, lymph nodes and metastasis [28]. The biggest advantage of TNM staging lies in its simplicity and speed, but its biggest problem is that its predictions are not accurate enough and fall short of the expectations of clinicians. A Nomogram that combines TNM staging with other clinical variables is as easy to use as the TNM staging model but more accurate than the TNM staging model alone.
The treatment of ES is multidisciplinary, including chemotherapy, surgery and radiation therapy. Surgery and radiation therapy play an important role in improving patient outcomes, but survival has improved greatly with the addition of systemic chemotherapy. Before systemic treatment was available, almost 80-90% of patients developed distant metastases despite aggressive local control measures, including radical surgery such as amputation [29]. After chemotherapy for ES patients began in the 1960s, the combination regimen of vincristine, adriamycin, cyclophosphamide and actinomycin increased the survival rate of ES patients [30][31][32]. Some studies have found that chemotherapy greatly improves the survival rate of patients with localized ES, from about 10% to 70-80% [33]. Unlike other common primary bone sarcomas, ES is very sensitive to radiation [34,35]. In the data analysis, radiotherapy and chemotherapy were therefore included as important treatment-related prognostic factors. However, the SEER database does not record specific chemotherapy regimens, only whether chemotherapy was given (Yes) or not (No/unknown). For radiotherapy, the authors simplified the data into patients who had received radiotherapy and those who had not. In clinical practice, detailed treatment data are very complex and it is difficult to include every factor in a prognostic analysis, so the researchers could only analyze these patients as a general population and make adjustments based on the actual situation.
Studies have reported that the prognosis of ES with a diameter > 8 cm is poor, and that recurrent ES tumors are more likely to be larger than before no matter how they are measured and treated [21,36]. Larger tumor volume and axial primary tumors may often be associated with metastatic disease, and both have been shown to be risk factors for reduced survival [37][38][39]. In terms of surgical treatment, studies have shown that patients with primary ES who undergo surgical resection may have a higher survival rate [40].
For patients with metastatic ES, surgical treatment also has a significant effect on survival [41]. This is consistent with the results of this study, although the effect of tumor site and size on prognosis is complex. However, this study found that the choice between limited resection and radical resection had a limited impact on the survival of ES patients (Figure 3, surgery), which may be because clinicians choose a more thorough surgical plan when the degree of tumor invasion is higher.
The authors of this study, as surgeons, envisage a scenario for the clinical application of the model: the visual nature of the Nomogram is an advantage when communicating with ES patients and explaining treatment options, so that patients with no medical knowledge can better understand why doctors choose a given treatment. For example, when communicating with a female ES patient over 60 years of age whose tumor is located in the spine and is estimated from imaging to be larger than 10 cm, the Nomogram gives a 3-year survival rate of about 55% without surgical treatment but more than 80% with surgical treatment. The ultimate goal of the clinical prediction model is to help clinicians make medical decisions and to improve patient prognosis and the cost of care.
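The two survival figures in this example can be related through the proportional hazards assumption of the Cox model, under which S_treated(t) = S_reference(t) ** HR. The sketch below back-calculates the hazard ratio implied by 55% versus 80% at 3 years. It is an arithmetic illustration only: the function names are hypothetical, 80% is used as a conservative stand-in for "more than 80%", and the result is not the hazard ratio reported by the fitted model.

```python
import math


def implied_hazard_ratio(surv_reference, surv_treated):
    """Hazard ratio implied by two survival probabilities at the same time,
    assuming proportional hazards: S_treated = S_reference ** HR."""
    return math.log(surv_treated) / math.log(surv_reference)


def survival_under_hr(surv_reference, hazard_ratio):
    """Survival probability after applying a hazard ratio to a reference group."""
    return surv_reference ** hazard_ratio


# 3-year survival from the example: about 55% without surgery, 80% with surgery.
hr = implied_hazard_ratio(0.55, 0.80)
print(round(hr, 2))  # about 0.37
print(round(survival_under_hr(0.55, hr), 2))
```

A hazard ratio well below 1 corresponds to the large gap between the two predicted survival rates. Because the comparison comes from observational data, it describes an association and not a proven treatment effect.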
The limitations of this study are as follows. First, although the authors randomly divided the data into two sets, one for modeling and one for validation, both sets came from the same source, the SEER database. If validation of the model can be extended to a data set from another source, the application value of the model will be greatly expanded. Second, with the progress of imaging, more and more scholars recognize its value. Combining a large number of imaging parameters from color Doppler ultrasound, CT, MR and PET with clinical characteristics could further improve the accuracy of the prediction model. Unfortunately, imaging data are not included in the SEER database. Third, because this study is retrospective, some patient data were inevitably lost. This may have reduced the number of eligible cases and may introduce selection bias. Despite these limitations, the Nomogram is an important and effective model for predicting individual survival outcomes in ES patients.

Conclusion
This study included factors related to ES metastasis and patient prognosis, including demographic data, clinicopathological characteristics and treatment. The risk factors for ES metastasis were age over 60 years, surgery, tumor volume, radiotherapy and chemotherapy. Age, tumor site, surgical method, number of primary tumors, tumor stage, tumor volume, spread range, radiotherapy and chemotherapy were all correlated with patient survival. The clinical prediction model established in this paper from SEER data shows good consistency with actual observed outcomes and can help doctors predict the prognosis of an individual patient in clinical practice. It can help guide treatment and follow-up and make treatment more accurate and individualized. However, in future clinical applications, especially outside North America, external data validation is still required.