Development and Validation of Nomograms to Predict Prognosis of Oral and Oropharyngeal Mucoepidermoid Carcinoma: A SEER-Based Study


 Background: Because of the low incidence of mucoepidermoid carcinoma, studies for determining the optimal treatment and predicting prognosis are lacking. The purpose of this study was to develop prognostic nomograms to predict overall survival (OS) and disease-specific survival (DSS) of patients with oral and oropharyngeal mucoepidermoid carcinoma, using the National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) database. Methods: Clinicopathological and follow-up data of patients diagnosed with oral and oropharyngeal mucoepidermoid carcinoma between 2004 and 2017 were collected from the SEER database. The Kaplan-Meier method with the log-rank test was used to identify candidate prognostic factors in univariate analysis. Multivariate Cox regression was used to identify independent prognostic factors. The C-index, the area under the ROC curve (AUC) and calibration curves were used to assess the performance of the prognostic nomograms. Results: A total of 1230 patients with oral and oropharyngeal mucoepidermoid carcinoma were enrolled in the present study. After multivariate Cox regression analysis, age, sex, tumor subsite, T stage, N stage, M stage, grade and surgery were identified as independent prognostic factors for OS. T stage, N stage, M stage, grade and surgery were identified as independent prognostic factors for DSS. Nomograms were constructed to predict OS and DSS based on these independent prognostic factors. The fitted nomograms showed excellent prediction accuracy, with a C-index of 0.899 for OS prediction and 0.893 for DSS prediction. Internal validation with bootstrap calibration plots and the validation set indicated excellent performance of the nomograms. 
Conclusion: The prognostic nomograms developed in the present study, based on individual clinicopathological characteristics, accurately predicted the OS and DSS of patients with oral and oropharyngeal mucoepidermoid carcinoma.


INTRODUCTION
Mucoepidermoid carcinoma (MEC) is the most common histologic subtype of minor salivary gland malignancy. It arises mainly from the minor salivary glands of the upper aerodigestive tract, and the oral cavity and oropharynx are the most common anatomic locations [1][2][3]. Because of the low incidence of this tumor, studies for determining the optimal treatment and predicting prognosis are lacking.
The Surveillance, Epidemiology, and End Results (SEER) database (https://seer.cancer.gov/statfacts/) is a publicly available resource for studies of cancer epidemiology, TNM staging, treatment and survival. It consists of 18 cancer registries covering about 30% of the US population, which makes it a valuable tool for the analysis of rare cancers.
A nomogram is a graphical prediction tool that has been widely used to predict the prognosis of cancer patients [4][5][6][7]. It is valuable for clinical decision-making and patient counseling.
However, no prognostic nomogram for MEC arising from the oral cavity and oropharynx has yet been developed. In this study, we aimed to establish and validate prognostic nomograms to predict the OS and DSS of oral and oropharyngeal MEC using data extracted from the SEER database. We expect that our study will improve the understanding of oral and oropharyngeal MEC and support clinical decision-making and patient counseling.

Data Source and Study Population
All data in this study were obtained from the SEER database using SEER*Stat software version 8.3.6. We selected all cases of MEC (International Classification of Diseases for Oncology, 3rd edition [ICD-O-3] histology code 8430) arising from the oral cavity or oropharynx and diagnosed from 2004 to 2017. Inclusion criteria were as follows: the primary tumor was located in the oral cavity or oropharynx; the diagnosis of MEC was histologically confirmed; clinicopathological characteristics, including age, race, sex, TNM stage, tumor subsite and surgery, were known; and follow-up data, including survival time, vital status and cause-specific death classification, were available. Patients with missing or unknown clinicopathological data, or with incomplete follow-up information, were excluded from the analysis.
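The selection rules above amount to a complete-case filter. The Python sketch below shows one way to express them. It assumes the SEER*Stat case listing has already been exported to a list of dictionaries. The field names (`histology`, `site_group`, `histologically_confirmed`, and so on) and the set of codes treated as missing are hypothetical, and they would need to be mapped to the actual export variables.

```python
# Minimal sketch of the cohort selection as a complete-case filter.
# Field names and "missing" codes are hypothetical placeholders, not SEER*Stat variable names.

MEC_HISTOLOGY = 8430  # ICD-O-3 histology code for mucoepidermoid carcinoma
ELIGIBLE_SITES = {"oral cavity", "oropharynx"}
MISSING_VALUES = {None, "", "Unknown", "TX", "NX", "MX"}  # assumed codes for unknown values
REQUIRED_FIELDS = (
    "age", "race", "sex", "t_stage", "n_stage", "m_stage", "subsite", "surgery",
    "survival_months", "vital_status", "cause_specific_death",
)


def is_eligible(case: dict) -> bool:
    """Return True if a record meets the inclusion criteria and has no missing data."""
    if case.get("histology") != MEC_HISTOLOGY:
        return False
    if case.get("site_group") not in ELIGIBLE_SITES:
        return False
    if case.get("histologically_confirmed") is not True:
        return False
    # Exclude records with any missing or unknown clinicopathological or follow-up field.
    return all(case.get(field) not in MISSING_VALUES for field in REQUIRED_FIELDS)


def select_cohort(cases: list) -> list:
    """Keep only the eligible, complete records."""
    return [case for case in cases if is_eligible(case)]
```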
Demographic and clinicopathological variables extracted from the SEER database included age, race, sex, pathologic TNM stage (AJCC Staging Manual, 7th edition), tumor subsite, histologic grade, surgery, survival time, vital status and cause-specific death classification. The primary outcomes were overall survival (OS) and disease-specific survival (DSS). OS was defined as the interval between the date of diagnosis and the date of death from any cause, or the date of censoring. DSS was defined as the interval from diagnosis to death attributable to MEC.
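To make the two endpoints concrete, the sketch below derives a `(time, event)` pair for OS and for DSS from a single record. It assumes survival time is recorded in months. The status labels (`"dead"`, `"dead_of_this_cancer"`) are hypothetical, and SEER's own cause-specific death classification uses different wording. Under this convention, a death from another cause counts as an event for OS but is censored for DSS.

```python
# Hypothetical labels; map them to the categories in the actual SEER export.
DEAD = "dead"
DEAD_OF_THIS_CANCER = "dead_of_this_cancer"


def os_endpoint(case: dict) -> tuple:
    """Overall survival: event = death from any cause, otherwise censored at last follow-up."""
    event = 1 if case["vital_status"] == DEAD else 0
    return case["survival_months"], event


def dss_endpoint(case: dict) -> tuple:
    """Disease-specific survival: event = death attributed to MEC.

    Deaths from other causes are treated as censored observations.
    """
    event = 1 if case["cause_specific_death"] == DEAD_OF_THIS_CANCER else 0
    return case["survival_months"], event
```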

Development and Validation of Prognostic Nomograms
The dataset extracted from the SEER database was randomly split into a training set (70%) and a validation set (30%). The training set was used to identify independent prognostic factors and establish the nomograms, and the validation set was used to test their performance. First, survival data from the training set were analyzed with Kaplan-Meier survival curves to identify factors significantly associated with survival outcomes in univariate analysis. Variables with p-values less than 0.1 by the log-rank test were then included in a multivariate Cox proportional hazards regression to select the best model for predicting the OS and DSS of MEC. Subsequently, a prognostic risk score was calculated for each patient by summing the values of the prognostic factors weighted by their regression coefficients from the multivariate Cox model. Patients were divided into high-risk and low-risk groups according to this score, and Kaplan-Meier curves were plotted to compare the OS and DSS of the two groups. Finally, nomograms for predicting the OS and DSS of oral and oropharyngeal MEC were constructed using the independent prognostic factors selected by multivariate Cox regression.
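The split and the risk score can be sketched as follows. The seed, the dummy-coded covariate names in any usage, and the cutoff rule are assumptions for illustration. The text does not state how the high/low threshold was chosen, so the sketch defaults to the median score. Here the risk score is the Cox linear predictor: the sum of each covariate value multiplied by its regression coefficient.

```python
import random
from statistics import median


def split_train_validation(cases, train_fraction=0.7, seed=2021):
    """Randomly split cases into training and validation sets (the seed is arbitrary)."""
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def risk_score(covariates: dict, coefficients: dict) -> float:
    """Cox linear predictor: sum of beta_j * x_j; covariates absent from the dict count as 0."""
    return sum(beta * covariates.get(name, 0.0) for name, beta in coefficients.items())


def assign_risk_groups(scores, cutoff=None):
    """Label each score 'high' or 'low'; by default the cutoff is the median score (an assumption)."""
    if cutoff is None:
        cutoff = median(scores)
    return ["high" if score > cutoff else "low" for score in scores]
```

With 1230 patients, this split yields 861 training and 369 validation cases. The median cutoff is only one convenient choice. An optimal-cutpoint search would be another option, at the cost of some optimism.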
The C-index and the area under the ROC curve (AUC) were calculated to assess the discriminative ability of the prognostic models. In addition, calibration curves based on bootstrap resampling were drawn to assess the agreement between the survival probabilities predicted by the nomograms and the observed survival.
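Harrell's C-index, one common definition of the concordance index, is the fraction of usable patient pairs in which the patient with the higher predicted risk experiences the event first. The plain-Python sketch below follows that definition. Pairs with tied survival times are skipped, and tied risk scores count as one half. This is one convention among several, so the result may differ slightly from the value reported by R packages.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is usable when patient i has an observed event and
    patient j was still under follow-up afterwards (times[j] > times[i]).
    """
    concordant = 0.0
    usable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a usable pair
        for j in range(len(times)):
            if times[j] > times[i]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # higher predicted risk failed first: concordant
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The reported values of 0.899 (OS) and 0.893 (DSS) are on this scale.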

Statistical Analysis
All statistical analyses were performed with R software (version 4.0.2). The Kaplan-Meier method was used to identify candidate prognostic factors of survival outcomes in univariate analysis. Variables with a p-value less than 0.1 by the log-rank test were included in a multivariate Cox proportional hazards regression to screen for independent prognostic factors. Prognostic nomograms for predicting OS and DSS were then developed using the "rms" package. The predictive capacity of the nomograms was estimated using calibration curves, the concordance index (C-index) and the area under the time-dependent receiver operating characteristic (ROC) curve (AUC). Time-dependent ROC curves were constructed with the "survivalROC" package, and calibration curves were plotted using the "rms", "foreign" and "survival" packages. All significance tests were two-sided, and P < 0.05 was considered significant.
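The log-rank screening step can be illustrated with a two-group version of the test written in plain Python. The analysis itself used R. At each distinct event time, the sketch compares the observed events in group 1 with the events expected under equal hazards. It then refers the resulting statistic to a chi-square distribution with one degree of freedom, whose upper tail equals `erfc(sqrt(x / 2))`. Comparisons across more than two categories, such as T stage, would need the k-group generalization, which is not shown.

```python
import math


def logrank_test(times, events, groups):
    """Two-sample log-rank test. groups holds 0/1 labels; returns (chi2, p_value)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [g for tt, g in zip(times, groups) if tt >= t]
        n = len(at_risk)
        n1 = sum(1 for g in at_risk if g == 1)
        d = sum(1 for tt, e in zip(times, events) if tt == t and e)
        d1 = sum(1 for tt, e, g in zip(times, events, groups) if tt == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in group 1
        if n > 1:
            # hypergeometric variance of d1 at this event time
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero")
    chi2 = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value
```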

Characteristics of the Study Cohort
We identified 1230 patients with oral and oropharyngeal MEC who met our inclusion criteria. The median follow-up time for the cohort was 66 months, and the median age at diagnosis was 53 years. Among the 1230 patients, 524 (42.6%) were men and 706 (57.4%) were women. With respect to race, 935 (76.0%) patients were white, 156 [...] Table 1. In addition, the chi-square test showed no significant differences in the distribution of variables between the training and validation sets.
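A chi-square test of homogeneity, like the one used to compare the training and validation sets, can be sketched for one categorical variable as below. Rows are the two sets, and columns are the categories of the variable. The upper-tail probability is computed from the regularized incomplete gamma function using the recurrence Q(a+1, y) = Q(a, y) + y^a e^(-y) / Γ(a+1). The recursion starts from Q(1, y) = e^(-y) for even degrees of freedom or from Q(1/2, y) = erfc(√y) for odd ones. This is a sketch: categories with zero total counts must be dropped first, and the usual caveat about small expected counts applies.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with df degrees of freedom."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)            # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)  # step Q(a, y) -> Q(a + 1, y)
        a += 1.0
    return q


def chi_square_homogeneity(table):
    """Chi-square test on a contingency table (rows: sets, columns: categories).

    Returns (statistic, degrees_of_freedom, p_value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)
```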

Survival Analysis and Prognostic Factor Identification
We used Kaplan-Meier analysis to identify candidate prognostic factors in the training set. The 3-year, 5-year and 10-year OS rates were 90.8%, 87.2% and 80.8%, respectively, and the 3-year, 5-year and 10-year DSS rates were 93.8%, 92.4% and 90.2%, respectively. We investigated the influence of the following factors on the OS and DSS of oral and oropharyngeal MEC: age, sex, race, AJCC TNM stage, tumor subsite, histologic grade and treatment. For OS, all of these factors yielded a p-value less than 0.1 by the log-rank test (Fig. 1). For DSS, all of these factors except race yielded a p-value less than 0.1 by the log-rank test (Fig. 2). The factors with a p-value less than 0.1 were then entered into the multivariate analysis. The forest plots of the multivariate Cox regression for OS and DSS are shown in Fig. 3A and Fig. 4A, respectively. After multivariate analysis, age, sex, race, TNM stage, tumor subsite, grade and surgery were identified as independent prognostic factors for OS (p < 0.05). For DSS, only T stage, N stage, M stage, grade and surgery remained independent factors. Based on the prognostic model score, MEC patients in the training cohort were then stratified into high-risk and low-risk groups. Kaplan-Meier survival curves were plotted to compare the OS and DSS of the model-predicted risk groups. Patients in the high-risk group had significantly poorer OS and DSS than patients in the low-risk group (p < 0.0001) (Fig. 5).
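The 3-, 5- and 10-year rates above are read off Kaplan-Meier curves. The sketch below is a plain-Python product-limit estimator with a helper that evaluates the resulting step function at a given time. It assumes survival times in months, so the three landmarks correspond to 36, 60 and 120 months. It is illustrative only and does not compute confidence intervals.

```python
def kaplan_meier(times, events):
    """Product-limit estimator. Returns [(event_time, survival)] at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all records sharing this time (events and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            steps.append((t, survival))
        n_at_risk -= removed  # both deaths and censored cases leave the risk set
    return steps


def survival_at(steps, t):
    """Value of the Kaplan-Meier step function at time t."""
    value = 1.0
    for time, s in steps:
        if time > t:
            break
        value = s
    return value
```

For example, `[survival_at(steps, m) for m in (36, 60, 120)]` would give the 3-, 5- and 10-year estimates, assuming times in months.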

Prognostic Nomogram Development and Validation
In the current study, two prognostic nomograms were developed using the independent prognostic factors identified by multivariate Cox regression (the OS nomogram is shown in Fig. 3B and the DSS nomogram in Fig. 4B). Based on the total score generated by adding the scores obtained from individual characteristics, [...] (Fig. 8). Third, calibration curves based on bootstrap resampling were plotted. Compared with the ideal model, the calibration curves for 3-year, 5-year and 10-year OS and DSS showed close agreement between predicted and observed survival (Fig. 9). Finally, all patients in the validation set were classified into high-risk and low-risk groups according to the risk scores calculated with the OS and DSS prediction models. As in the training set, survival analysis of the validation set showed that patients in the high-risk group had worse OS and DSS than those in the low-risk group (Fig. 10, p < 0.0001).
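One way to see how a nomogram turns covariates into a prediction is the points scaling commonly used for Cox nomograms. The sketch below makes simplifying assumptions. Each predictor's contribution β·x is shifted so that its lowest level scores 0. The predictor with the widest range of contributions is scaled to span 0 to 100 points, and total points are mapped back to the linear predictor. Predicted survival at time t is then S0(t) raised to the power exp(linear predictor). Any effect sizes and baseline survival passed in are placeholders, not the study's estimates. Treating S0(t) as the survival at a linear predictor of 0 is also an assumption about how the model is centered.

```python
import math


def nomogram_points(effects: dict):
    """Convert per-level Cox effects (beta * x) into nomogram points.

    effects maps each predictor to {level: beta * x}. Returns the points table
    and the widest effect range, which defines the 100-point span.
    """
    widest = max(max(levels.values()) - min(levels.values()) for levels in effects.values())
    points = {
        name: {level: 100.0 * (eff - min(levels.values())) / widest for level, eff in levels.items()}
        for name, levels in effects.items()
    }
    return points, widest


def total_points(patient: dict, points: dict) -> float:
    """Add up the points for one patient's level of each predictor."""
    return sum(points[name][level] for name, level in patient.items())


def predicted_survival(total, effects, widest, baseline_survival):
    """Map total points back to the linear predictor and apply S(t) = S0(t) ** exp(lp)."""
    offset = sum(min(levels.values()) for levels in effects.values())
    lp = offset + total * widest / 100.0
    return baseline_survival ** math.exp(lp)
```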

DISCUSSION
In the current study, we developed and validated two prognostic nomograms, one for [...] There are several advantages to the present study. First, our nomograms focus on a single histological subtype of minor salivary gland carcinoma arising from the oral cavity or oropharynx, which may improve their accuracy and reliability. Specifically, our prognostic nomograms yielded a high C-index (0.899 for OS prediction and 0.893 for DSS prediction) and AUC, indicating better prognostic prediction ability than previous models [3][13][16]. Finally, one independent validation set and three validation methods were used in the current study to support the stability and reliability of our prognostic nomograms.
Several limitations of this study should be considered. First, the SEER database does not contain certain important clinicopathological information, such as depth of invasion, surgical margins, perineural invasion and lymphovascular invasion. Second, although three different methods were used for the internal validation of our prognostic nomograms, an appropriate cohort for external validation was not available. An external validation set will be needed to confirm our findings. Third, the current study is a retrospective analysis, and further prospective research in larger cohorts will be necessary to obtain a more precise and reliable prediction model.

CONCLUSIONS
In conclusion, we developed and validated two prognostic nomograms to predict OS and DSS for patients with oral and oropharyngeal mucoepidermoid carcinoma. These prognostic nomograms may be valuable for clinical decision-making and patient counseling.