The model was developed using patients from the Surveillance, Epidemiology, and End Results (SEER) Program, which is supported by the Surveillance Research Program (SRP) in NCI's Division of Cancer Control and Population Sciences (DCCPS). As shown in Table 1, a total of 134,962 patients were diagnosed with thyroid carcinoma from 2004 to 2015, of whom 32,783 were male and 102,179 were female. Each variable was stratified by gender, and the reported percentages exclude cases with missing (N/A) values. T0 cases were excluded because only 290 were identified, too few to be statistically meaningful. TX, T NOS, NX, N NOS and MX cases were also excluded because these unassessable or unspecified categories were not informative for our study. All results and graphs were produced with R 3.6.1, Empower Stats 2.20 and IBM SPSS Statistics 23.
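The exclusion rules and the gender-stratified percentages described above can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout and the field names (`sex`, `t`, `n`, `m`, `histology`) are hypothetical and do not correspond to actual SEER variable names, and missing values are assumed to be stored as `None`.

```python
from collections import Counter

# Hypothetical record layout: one dict per patient. Field names are illustrative
# assumptions, not SEER column names; missing values are assumed to be None.
EXCLUDED_T = {"T0", "TX", "T NOS"}
EXCLUDED_N = {"NX", "N NOS"}
EXCLUDED_M = {"MX"}


def apply_exclusions(records):
    """Drop patients whose T, N or M category belongs to an excluded group."""
    return [
        r for r in records
        if r["t"] not in EXCLUDED_T
        and r["n"] not in EXCLUDED_N
        and r["m"] not in EXCLUDED_M
    ]


def percentages_by_gender(records, variable):
    """Percentage distribution of `variable` within each gender, ignoring missing values."""
    result = {}
    for sex in sorted({r["sex"] for r in records}):
        values = [r[variable] for r in records if r["sex"] == sex and r[variable] is not None]
        counts = Counter(values)
        total = len(values)
        result[sex] = {k: 100.0 * v / total for k, v in sorted(counts.items())}
    return result


# Toy example (fabricated records for illustration only).
patients = [
    {"sex": "Female", "t": "T1a", "n": "N0", "m": "M0", "histology": "PTC"},
    {"sex": "Female", "t": "TX", "n": "N0", "m": "M0", "histology": "PTC"},
    {"sex": "Male", "t": "T3a", "n": "N1b", "m": "M0", "histology": None},
    {"sex": "Male", "t": "T2", "n": "N0", "m": "M0", "histology": "FTC"},
]
kept = apply_exclusions(patients)
print(len(kept))  # -> 3 (the TX patient is dropped)
print(percentages_by_gender(kept, "histology"))  # -> {'Female': {'PTC': 100.0}, 'Male': {'FTC': 100.0}}
```

Because missing values are filtered before counting, each gender's percentages sum to 100% over the non-missing cases only, which matches the convention stated for Table 1.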
Variable selection and univariate analysis
Gender is considered a risk factor for the outcome of thyroid carcinoma, and it is widely recognized that women have a better chance of survival than men. Its impact nevertheless remains controversial: some researchers report a statistically significant difference in survival between men and women(8), whereas others find that the difference appears only when gender is analyzed on its own and vanishes in multivariate analysis(9). We therefore analyzed the influence of gender in both univariate and multivariate models to determine whether it is an independent predictive variable.
Age is incorporated as a risk factor in the 8th edition of the AJCC Cancer Staging Manual, which divides patients into two groups using 55 years as the threshold instead of the 45 years used in the 7th edition(10). This change matters most for patients between 45 and 55 years of age, as it prevents over-staging of low-risk patients and, in turn, over-aggressive treatment(11).
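A small sketch makes the effect of the threshold change concrete. The function names below are hypothetical; the cut-offs are the 45-year (7th edition) and 55-year (8th edition) thresholds stated above, with the older group defined as age at or above the cut-off.

```python
def age_group(age, edition):
    """Dichotomize age at diagnosis with the AJCC cut-off of the given edition (7 or 8)."""
    cutoff = {7: 45, 8: 55}[edition]
    return f"<{cutoff}" if age < cutoff else f">={cutoff}"


def reclassified(age):
    """True if a patient is in the older group under the 7th edition but the younger one under the 8th."""
    return age_group(age, 7) == ">=45" and age_group(age, 8) == "<55"


# Only ages 45 through 54 change group; these are the patients protected from over-staging.
print([a for a in range(40, 60) if reclassified(a)])
```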
Among the four main types of thyroid carcinoma, ATC (anaplastic thyroid carcinoma) has the lowest incidence, yet because of its highly malignant behavior it accounts for the majority of deaths from thyroid carcinoma(12). By contrast, PTC (papillary thyroid carcinoma) is the most common type, occurs especially in women, and has an excellent prognosis (survival rates of >95% at 25 years)(13). FTC (follicular thyroid carcinoma) is another, less common type of well-differentiated thyroid carcinoma. MTC (medullary thyroid carcinoma) is an aggressive form that causes about 8% to 15% of all thyroid cancer-related deaths(14). Because prognosis differs by histologic type, we included histology in the univariate analysis to assess its contribution.
The latest, 8th edition of the AJCC Cancer Staging Manual introduced several changes. For PTC, FTC and ATC, T3a is a new category referring to a tumor >4 cm in greatest dimension limited to the thyroid gland (≥4 cm for MTC), and T3b is a new category defined as a tumor of any size with gross extrathyroidal extension invading only the strap muscles (sternohyoid, sternothyroid, thyrohyoid or omohyoid). In addition, level VII lymph nodes were added to N1a, and MTC was moved into a separate chapter(15). Because of these changes, and because the latest release of the SEER Program does not provide 8th-edition staging, we restaged all selected patients from the 6th and 7th editions to the 8th edition using IBM SPSS.
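The size and extension logic behind the 8th-edition T categories for PTC, FTC and ATC can be sketched as below. This is a simplified illustration, not the SPSS restaging syntax used in this study: it omits MTC (whose cut-off differs) and does not split T4 into T4a/T4b, and the extension codes `"none"`, `"strap"` and `"beyond"` are hypothetical placeholders for whatever the source data actually record.

```python
def t_category_8th(size_cm, extension):
    """
    Simplified AJCC 8th-edition T category for PTC/FTC/ATC (sketch only).

    size_cm:   greatest tumor dimension in centimetres
    extension: hypothetical codes chosen for this sketch:
               "none"   - tumor limited to the thyroid gland
               "strap"  - gross extrathyroidal extension invading only strap muscles
               "beyond" - gross extension beyond the strap muscles (T4, not subdivided)
    """
    if extension == "beyond":
        return "T4"
    if extension == "strap":
        return "T3b"  # any tumor size
    if extension != "none":
        raise ValueError(f"unknown extension code: {extension!r}")
    if size_cm <= 1:
        return "T1a"
    if size_cm <= 2:
        return "T1b"
    if size_cm <= 4:
        return "T2"
    return "T3a"  # >4 cm, limited to the thyroid gland


print(t_category_8th(4.5, "none"))   # T3a
print(t_category_8th(1.5, "strap"))  # T3b
```

Checking extension before size mirrors the definitions above: T3b applies regardless of tumor size, so size only matters once gross extrathyroidal extension has been ruled out.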
There are five main treatment strategies for patients with DTC (differentiated thyroid carcinoma): TSH-suppressive therapy; ¹³¹I therapy; locoregional and adjuvant/adjunctive treatments (such as surgery, radiotherapy, thermal or ethanol ablation, cryoablation, or embolization); targeted treatment; and re-differentiation and other novel therapeutic approaches(16). ATC does not take up iodine and is usually resistant to chemotherapy, so surgery is the preferred strategy according to the American Thyroid Association (ATA) guidelines(17). For patients with unresectable primary tumors, the role of surgery is to create favorable conditions for subsequent palliative treatment(18). Because different strategies may lead to different prognoses, we selected three treatment factors (chemotherapy, ¹³¹I therapy and surgical method) to explore whether they can be used as predictive variables.
Because all of the factors above are associated with the prognosis of thyroid carcinoma, we evaluated their influence using univariate Cox regression and Kaplan-Meier analysis.
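For readers unfamiliar with the Kaplan-Meier method, the product-limit estimator can be written in a few lines. This is a minimal pure-Python sketch for illustration; the study itself used R, Empower Stats and SPSS, and the function name and input format here are assumptions.

```python
def kaplan_meier(times, events):
    """
    Kaplan-Meier (product-limit) survival estimate.

    times:  follow-up time for each patient (e.g. months)
    events: 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) pairs at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing this time; patients censored at t still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # deaths and censorings both leave the risk set after t
    return curve


# Toy data: 5 patients, two censored (events == 0).
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

In a univariate analysis, one such curve would be estimated per level of a factor (for example, per gender) and the curves compared.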
Multivariate analysis and variable screening
To determine whether each variable remained statistically significant in the presence of the others, we entered all variables into a Cox regression model for multivariate analysis. The Cox model, also known as the proportional hazards model, is widely used in medical research to analyze the influence of multiple risk factors(19). In this step, we discarded variables that were statistically significant in univariate analysis but not in the multivariate Cox analysis. The coefficients of the resulting Cox model were later used to develop the nomogram.
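The following pure-Python sketch shows what fitting a multivariate Cox model involves: maximizing the partial log-likelihood by Newton-Raphson, then deriving standard errors and two-sided Wald p-values from the observed information. It is an illustration only, not the software used in this study; it uses the Breslow approximation for tied event times, and all function names are hypothetical. A real analysis would use an established implementation such as `coxph` from R's survival package.

```python
import math


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def score_and_hessian(beta, X, times, events):
    """Gradient and Hessian of the Cox partial log-likelihood (Breslow handling of ties)."""
    n, p = len(X), len(beta)
    w = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
    grad = [0.0] * p
    hess = [[0.0] * p for _ in range(p)]
    for i in range(n):
        if not events[i]:
            continue  # censored patients contribute only through the risk sets
        risk = [j for j in range(n) if times[j] >= times[i]]  # still under observation
        s0 = sum(w[j] for j in risk)
        s1 = [sum(w[j] * X[j][k] for j in risk) for k in range(p)]
        for k in range(p):
            grad[k] += X[i][k] - s1[k] / s0
            for l in range(p):
                s2 = sum(w[j] * X[j][k] * X[j][l] for j in risk)
                hess[k][l] -= s2 / s0 - s1[k] * s1[l] / s0 ** 2
    return grad, hess


def fit_cox(X, times, events, max_iter=50, tol=1e-10):
    """Newton-Raphson fit; returns (coefficients, standard errors, two-sided Wald p-values)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad, hess = score_and_hessian(beta, X, times, events)
        step = solve(hess, [-g for g in grad])  # solves H * step = -grad
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, hess = score_and_hessian(beta, X, times, events)
    # Covariance = inverse of the observed information (-H); only its diagonal is needed.
    neg_h = [[-h for h in row] for row in hess]
    se = [math.sqrt(solve(neg_h, [1.0 if r == k else 0.0 for r in range(p)])[k]) for k in range(p)]
    pvals = [math.erfc(abs(b / s) / math.sqrt(2)) for b, s in zip(beta, se)]
    return beta, se, pvals
```

Each row of `X` holds one patient's covariates, so passing several columns at once gives the adjusted (multivariate) estimates, while a single column reproduces a univariate Cox model. The p-values are what a screening step like the one described would inspect; the significance level is not stated in this section. On a toy dataset with `X = [[1], [0], [1], [0]]`, times 1 to 4 and all events observed, the maximizer satisfies exp(β) = (1 + √17)/2, roughly 2.56.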
Test of clinical use
Conventionally, diagnostic test indicators such as sensitivity, specificity and AUC, as demonstrated below, are used to assess a prediction model; however, these indicators measure only its diagnostic accuracy and do not consider its clinical usefulness. DCA (decision curve analysis) is a novel tool for evaluating whether a prediction model is clinically useful by calculating its net benefit across a range of threshold probabilities(20, 21). Net benefit is obtained by weighing the expected benefit against the expected harm associated with each proposed testing or treatment strategy(22). We used DCA to assess the clinical usefulness of the final model.
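The net-benefit calculation behind a decision curve can be sketched as follows, using the standard definition NB = TP/n − FP/n × pt/(1 − pt), where pt is the threshold probability. This is a simplified illustration that assumes a binary outcome (for example, death by a fixed time point) and ignores censoring; the function names are hypothetical and this is not the exact procedure used in the study.

```python
def net_benefit(y_true, risk, threshold):
    """
    Net benefit of treating patients whose predicted risk >= threshold:
    NB = TP/n - FP/n * threshold / (1 - threshold).
    Assumes a binary outcome (1 = event); censoring is ignored in this sketch.
    """
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def decision_curve(y_true, risk, thresholds):
    """Rows of (threshold, model NB, treat-all NB, treat-none NB) for each threshold."""
    prevalence = sum(y_true) / len(y_true)
    rows = []
    for pt in thresholds:
        # Treating everyone: every event is a true positive, every non-event a false positive.
        treat_all = prevalence - (1 - prevalence) * pt / (1 - pt)
        rows.append((pt, net_benefit(y_true, risk, pt), treat_all, 0.0))
    return rows


print(decision_curve([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7], [0.5]))
```

A model is considered clinically useful over the range of thresholds where its net benefit exceeds both the treat-all and treat-none reference strategies.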
Design and validation of predictive nomogram model
Based on the final results of the Cox model (the coefficients of all variables), we used the R package rms to plot a nomogram that estimates 1-year, 3-year and 5-year survival probabilities with line segments(23). To test the accuracy of this model, we randomly divided all patients into two datasets: the training set (94,474 cases, about 70%) was used to build the nomogram, and the validation set (40,488 cases, about 30%) was used for internal validation. The accuracy of the nomogram can be evaluated by the AUC, the C-index (Harrell's concordance index) and calibration plots(24, 25). We used the model to predict each patient's 1-year, 3-year and 5-year survival probabilities and calculated the AUC, C-index and calibration performance at each time point in each dataset.
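Harrell's C-index measures how often, among usable patient pairs, the patient assigned the higher risk actually dies first. The sketch below is a simplified pure-Python version for illustration only: a pair is counted as usable when patient i died and was followed for less time than patient j, pairs with tied follow-up times are skipped, and tied risk scores count as half-concordant. The function name is hypothetical; established implementations (for example, in R's survival or rms packages) handle ties more carefully.

```python
def harrell_c_index(times, events, risk_scores):
    """
    Harrell's concordance index (simplified).
    Higher risk_scores should mean shorter survival, e.g. the Cox linear predictor
    or nomogram total points. Pairs with tied follow-up times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier death in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Risk scores that perfectly order the deaths give C = 1.0; reversed scores give 0.0.
print(harrell_c_index([1, 2, 3, 4], [1, 1, 0, 1], [4, 3, 2, 1]))
```

A C-index of 0.5 corresponds to random ordering, so values well above 0.5 in both the training and validation sets indicate that the nomogram discriminates between patients with shorter and longer survival.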