Predicting Prognosis In Primary Pulmonary Sarcoma Using A Clinical Nomogram Based On SEER Database

Background: Primary pulmonary sarcoma (PPS) accounts for less than 1.1% of all pulmonary tumors, and few outcome data have been reported. This study aims to clarify the predictive value of clinicopathologic features for overall survival (OS) in PPS patients. Methods: Patients with PPS were identified in the Surveillance, Epidemiology, and End Results (SEER) database (2000 to 2015) and randomly divided into training and validation cohorts at a ratio of 1:1. Univariate Cox analysis and the least absolute shrinkage and selection operator (LASSO) were used to identify prognostic factors related to OS. We then performed multivariate Cox regression to establish a prognostic signature. Kaplan-Meier (K-M) survival curves and time-dependent receiver operating characteristic (ROC) curves were plotted to estimate the prognostic power of the signature. In addition, the independent prognostic factors identified by multivariate Cox regression were used to construct a nomogram. Results: Patients in the training group were divided into low- and high-risk groups based on risk score, and the high-risk group had a shorter survival time; the validation group showed the same result (P<0.001). On multivariate analysis of the training cohort, the independent factors for survival were marriage, age, sex, grade, operation, metastasis and tumor size, all of which were included in the nomogram. The calibration curves and ROC plots for 3-year and 5-year survival showed agreement between nomogram prediction and actual observation. The C-index of the nomogram for predicting survival was 0.77 (95% CI, 0.74 to 0.80, P<0.05), which was statistically significant. Conclusion: We constructed a prognostic risk model based on PPS patients from the SEER database. The nomogram offers an additional tool to inform clinical treatment.


Introduction
Primary pulmonary sarcomas (PPS) are a group of rare non-epithelial malignancies that originate in the mesenchymal tissue of the lung. They develop from mesenchymal elements of the bronchial wall, vessels or pulmonary stroma [1][2][3].
PPS is a rare and aggressive tumor that accounts for only 0.4% to 1.1% of all lung malignancies. Apart from individual case reports, few cohort studies of primary sarcoma of the lung have been reported. Most reports indicate a survival benefit from surgical resection. However, no treatment guideline for PPS is yet available. Because of their low incidence and detection rate, primary sarcomas of the lung remain a diagnostic and therapeutic challenge [4][5][6][7][8][9].
Therefore, the aim of this study was to examine a large cohort of PPS patients to better characterize the clinicopathologic prognostic factors that affect survival. Furthermore, a prognostic model was established to predict the probability of survival and to provide more comprehensive evidence for early diagnosis and clinical treatment.

Patient Selection and Data Retrieval
Patient information was collected from the SEER database of the National Cancer Institute (http://seer.cancer.gov/). After consulting the CS Schema v0204+, we extracted data on 1094 patients with primary pulmonary sarcoma diagnosed from 2000 to 2015 using SEER*Stat software (version 8.3.8). The inclusion criteria were as follows: (a) primary lung sarcoma was confirmed by pathology in the SEER database; (b) all patients had tumor-specific death; (c) complete information was available on marriage, sex, age, race, site, grade, laterality, stage, operation method, operation, radiotherapy, chemotherapy, neoplasm invasiveness, lymph node status, metastasis and tumor size. Patients lacking any of this information were excluded, leaving 247 patients in the study cohort.

Construction and Validation of Prognostic Model
A total of 247 patients with primary pulmonary sarcoma (PPS) were randomly divided into a training group (N=124) and a test group (N=123) at a ratio of 1:1. All preliminary analyses were performed in the training group to construct a signature based on prognostic factors, which was then validated in the test group. Univariate Cox proportional hazards analysis was used to screen for prognostic factors associated with overall survival, with P<0.05 considered statistically significant. LASSO is a regression method suited to high-dimensional data that eliminates highly correlated variables and helps prevent overfitting. LASSO analysis was used to select the critical factors from those identified by univariate Cox regression, using the R package glmnet (Version 3.0-2, https://CRAN.R-project.org/package=glmnet). Multivariate Cox regression with stepwise selection was then performed to reduce dimensionality and establish a risk score formula weighted by the corresponding coefficients. The univariate and multivariate Cox regression analyses were performed using the survival package (Version 2.41-1, http://bioconductor.org/packages/survivalr/) in R. A risk score was calculated for each patient in the training group based on this formula. Patients in the training and test groups were each classified as high-risk or low-risk based on the median risk score of the training group. The difference in five-year survival between the low-risk and high-risk groups was assessed by Kaplan-Meier (K-M) survival analysis using the R survival package (Version 2.41-1, http://bioconductor.org/packages/survivalr/). To evaluate the predictive performance for five-year survival, a time-dependent receiver operating characteristic (ROC) curve was plotted using the R package survivalROC (Version 1.0.3, https://CRAN.R-project.org/package=survivalROC).
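For readers less familiar with LASSO in the Cox setting, the penalized objective can be written as below. This is the standard textbook form rather than a formula taken from the original analysis: the symbols (covariate vector x_i, event indicator δ_i, risk set R(t_i), penalty weight λ, number of candidate factors p) are notation introduced here, and software such as glmnet may scale the likelihood term differently.

```latex
\hat{\beta} = \arg\min_{\beta} \left\{ -\ell(\beta) + \lambda \sum_{j=1}^{p} |\beta_j| \right\},
\qquad
\ell(\beta) = \sum_{i:\,\delta_i = 1} \left[ x_i^{\top}\beta - \log \sum_{k \in R(t_i)} \exp\!\left(x_k^{\top}\beta\right) \right]
```

Larger values of λ shrink more coefficients exactly to zero, which is how the method discards redundant or highly correlated factors.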
K-M survival and ROC analyses were likewise performed in the test group, using the same cutoff value, to evaluate the predictive accuracy of the prognostic factors. The area under the curve (AUC) served as the evaluation criterion.
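The random 1:1 split and the median-based risk grouping described above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the authors' R code: the function names, the seed, and the rule that scores strictly above the training median count as high-risk are all hypothetical choices.

```python
import random
from statistics import median


def split_cohort(patient_ids, seed=0):
    """Randomly split patient IDs into training and test groups (about 1:1)."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = (len(ids) + 1) // 2  # with an odd total, the extra patient goes to training
    return ids[:n_train], ids[n_train:]


def assign_risk_groups(train_scores, scores):
    """Label each score 'high' or 'low' using the training-group median as cutoff."""
    cutoff = median(train_scores)
    # Assumption: scores strictly above the cutoff are treated as high-risk.
    labels = ["high" if s > cutoff else "low" for s in scores]
    return cutoff, labels


train_ids, test_ids = split_cohort(range(247), seed=42)
print(len(train_ids), len(test_ids))  # 124 123
```

Passing the training scores as the reference in both calls keeps the cutoff fixed, so the test group is classified with the threshold learned from the training group.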

Construction of nomogram
Based on the results of multivariate Cox regression analysis, we constructed a nomogram for individualized prediction of 1-, 3-, and 5-year overall survival using the R package rms (Version 5.1-4, https://CRAN.R-project.org/package=rms). The performance of the nomogram was measured by the concordance index (C-index) and assessed by ROC and calibration curves. The larger the C-index, the more accurate the survival prediction. During validation of the nomogram, the total points of each patient in the cohort were calculated according to the established nomogram; Cox regression was then performed in the cohort using the total points as a factor; and finally, the C-index and calibration curve were derived from the regression analysis. P<0.05 was considered statistically significant.
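To make the C-index concrete, the sketch below computes Harrell's concordance index by direct pair counting. It is an illustrative pure-Python version, not the rms implementation used in the study; the function name is hypothetical, and pairs with tied survival times are simply skipped, which is a simplifying assumption.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C: share of comparable patient pairs that are ordered correctly.

    A pair (i, j) is comparable when patient i has the shorter follow-up time
    and an observed event (events[i] == 1). The pair is concordant when the
    patient with the shorter survival also has the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up): higher score always goes with shorter survival.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
scores = [0.9, 0.7, 0.4, 0.1]
print(harrell_c_index(times, events, scores))  # 1.0
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the sense in which a larger C-index indicates a more accurate prediction.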

Identification and selection of independent prognostic factors
First, we randomly divided the 247 patients with PPS into a training group and a test group. Cox regression analysis was performed to identify significant prognostic factors correlated with overall survival (OS). As shown in Table 2, univariate regression analysis of the training group revealed that clinicopathological characteristics including age, grade, stage, lymph node status and metastasis were prognostic risk factors (P<0.01), whereas marriage, gender and operation were prognostic protective factors (P<0.001). To improve the accuracy of the prediction model, we used LASSO regression to eliminate highly correlated variables and further reduce the number of variables. As a result, 7 key prognostic factors were selected from the 11 clinicopathological characteristics that were significant in the univariate regression analysis (P<0.05) (Figure 2A, B). Next, multivariate Cox regression analysis with stepwise selection was performed to determine the best risk score formula for predicting overall survival (Table 2). The factors retained were marriage, gender, age, grade, operation, metastasis and tumor size.
A risk score formula for predicting prognosis was obtained from the coefficients (β-values) of the prognostic factors: Risk score = 0.230 × age + 0.471 × grade + 0.695 × metastasis + 0.155 × tumor size - 0.710 × marriage - 0.553 × gender - 0.772 × operation, where each variable name denotes the patient's coded value for that factor and each multiplier is the corresponding β-value. Among these characteristics, age, grade, metastasis and tumor size were negatively correlated with overall survival in PPS patients, whereas marriage, gender and operation were positively correlated (Figure 2C).
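As a worked illustration of the risk score, the sketch below applies the seven reported coefficients to one patient. It assumes the score is the usual Cox linear predictor (coefficient times coded covariate value, summed over factors); the 0/1 coding of the example patient, the dictionary keys and the function name are hypothetical, since the coding scheme is not given in the text.

```python
# Coefficients as reported in the text for the multivariate Cox model.
COEFFICIENTS = {
    "age": 0.230,
    "grade": 0.471,
    "metastasis": 0.695,
    "tumor_size": 0.155,
    "marriage": -0.710,
    "gender": -0.553,
    "operation": -0.772,
}


def risk_score(patient):
    """Sum of coefficient x coded covariate value over the seven factors."""
    missing = set(COEFFICIENTS) - set(patient)
    if missing:
        raise KeyError(f"missing covariates: {sorted(missing)}")
    return sum(COEFFICIENTS[name] * patient[name] for name in COEFFICIENTS)


# Hypothetical patient; this 0/1 coding is an assumption made for illustration.
example = {
    "age": 1, "grade": 1, "metastasis": 0, "tumor_size": 1,
    "marriage": 1, "gender": 0, "operation": 1,
}
print(round(risk_score(example), 3))  # -0.626
```

Positive coefficients raise the score and negative ones lower it, which matches the description of risk factors and protective factors above.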

Survival analysis
To assess the predictive performance of the selected prognostic factors, patients in the training group were divided into high-risk (N=62) and low-risk (N=62) groups using the median risk score of the training group as the threshold.
The distribution of risk scores, the survival status of each patient, and a heat map of the seven prognostic factors were plotted for the training group and likewise validated in the test group. K-M survival curves showed that patients in the high-risk group had shorter OS than those in the low-risk group (P<0.001), and the AUC was 0.847 (Figure 3A). A similar result was obtained in the test group, where the AUC was 0.813 (Figure 3B). These results demonstrate that the prognostic factors have good sensitivity and accuracy in assessing the overall survival of PPS patients.
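The K-M curves compared above rest on the product-limit estimator, which can be sketched as follows. This is a minimal pure-Python illustration with made-up data, not the survival package used in the study; the function name is hypothetical, and in practice one curve would be estimated per risk group.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: [(time, survival probability)] at each event time.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve


# Toy data (made up): five patients, two of them censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print([(t, round(s, 3)) for t, s in curve])  # [(1, 0.8), (2, 0.6), (3, 0.3)]
```

Censored patients contribute to the number at risk up to their last follow-up but never trigger a drop in the curve, which is why the estimate only changes at observed death times.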

Construction and validation of nomogram
According to the results of the multivariate analysis, marriage, gender, age, grade, operation, metastasis and tumor size were significantly related to the overall survival of patients with PPS. Using these independent and stable prognostic factors, we constructed a nomogram predicting the 1-, 3-, and 5-year survival rates (Figure 3C). The nomogram quantifies each factor's contribution to survival, and the concordance index and calibration plots indicated good predictive capacity. The C-index of the nomogram for predicting OS was 0.77 (95% CI, 0.74 to 0.80, P<0.05), and the 3-year and 5-year survival AUCs were both 0.873 (Figure 4A, B). Finally, the calibration curves showed good agreement between prediction and observation for the probability of 3-year and 5-year survival (Figure 4C, D). Together, these results show that our prognostic model predicts overall survival in PPS patients with good stability and reliability.

Discussion
Primary pulmonary sarcoma (PPS) is a rare malignancy accounting for less than 1% of all lung neoplasms [1][2][3]. Given the low incidence and detection rate of PPS, we were interested in which clinicopathologic factors are significantly correlated with survival in these patients and sought to develop a predictive nomogram for this cohort. We therefore conducted a series of analyses of PPS patients from the SEER database. We collected 247 PPS patients with complete clinical information for survival analysis. After univariate and multivariate Cox regression, we developed a nomogram containing 7 significant independent prognostic factors: marriage, gender, age, grade, operation, metastasis and tumor size. All of these prognostic factors are easy to acquire. In addition, a key conclusion is that patient gender, marriage and metastasis are identified here for the first time as key prognostic factors in primary pulmonary sarcoma, providing additional targets for clinical prognostic analysis.
Previous studies have identified various significant prognostic factors for patients with PPS, such as age, tumor size, tumor grade, histologic form and treatment type [1][10]. Similarly, our study found that age, grade and tumor size were significant risk factors, while operation was an important protective factor. Matthew Koshy et al. reported that PPS is associated with a higher rate of nodal metastasis than soft-tissue extremity sarcomas [1]. Thus, it is recommended that pulmonary sarcoma patients undergo a thorough mediastinal nodal evaluation to rule out locoregional metastasis. In our study, lymph node status was a significant prognostic factor in univariate Cox regression analysis. We excluded this factor from the subsequent analysis to avoid highly correlated variables among the clinicopathologic factors and thereby improve the accuracy of OS prediction. Although reports on PPS are not abundant, most studies have shown that complete surgical resection is important for cure, which our results further confirm [11][12][13][14]. Surgery may therefore be a good choice for PPS patients to a certain degree. Another interesting point is that we identified three prognostic factors (gender, marriage and metastasis) that differ from those in previous studies, which broadens the scope of clinical prognostic analysis and allows physicians to consider outcomes from multiple dimensions.
However, previous studies have not reported that the prognosis of PPS patients is worse in males and unmarried patients than in females and married patients. Further studies are needed to explore the reasons behind this.
Besides identifying new prognostic factors, our survival model also showed good predictive performance. The C-index, AUC and calibration curves all indicated that the model has high accuracy and stability for 3-year and 5-year OS (the C-index of the nomogram for predicting OS was 0.77, and the 3-year and 5-year survival AUCs were both 0.873). This is one of the strongest aspects of our model. Overall, we used 247 samples from the SEER database to construct the first nomogram for PPS survival. We not only confirmed the common prognostic factors mentioned in previous studies but also identified several additional factors, laying a more solid foundation for the prognostic analysis of PPS patients. Moreover, the good predictive performance supports the reliability of the randomly grouped data. If the results of our study are confirmed in further clinical practice, they may have a positive impact on the formulation of policies for the diagnosis and prognostic assessment of PPS patients.
Although our model performed well in predicting survival, it has several limitations. First, this was a retrospective study, and the effective sample was a medium-sized cohort from the SEER database. Second, some significant prognostic factors, such as certain physical indices, resection margin and CEA level, were not available in the SEER database, which limits our study. In addition, many patients in this study were classified as having a primary sarcoma not otherwise specified, which reflects the difficulty of tissue diagnosis in PPS, as noted by Etienne-Mastroianni et al. [12]. Rapidly evolving diagnostic techniques may allow future studies to further clarify the relationship between PPS histology and behavior. Finally, further large prospective studies are needed to confirm the validity of our prognostic model.
In conclusion, we constructed and validated, for the first time, a nomogram with the potential to accurately predict the OS of PPS patients, which could provide some insight into clinical therapy. This result may assist clinicians in delivering personalized treatment and more accurate prognosis estimation.

Declarations
Ethics approval and consent to participate: Not applicable.

Consent for publication:
Not applicable.
Availability of data and materials: Not applicable.

Competing interests:
The authors declare that they have no competing interests.

Authors' contributions:
SP prepared and wrote the manuscript and searched the literature; ZY planned and designed the study; ZS, TJ and YX collected and interpreted the data; SP, ZY, and CW analyzed the data and organized the logic of the paper. All authors read and approved the final manuscript.

Figure caption (fragment): probability of overall survival is plotted on the x-axis; actual overall survival is plotted on the y-axis.