Identification of reliable DE-MTGs
We conducted the research according to the flowchart shown in Figure 1. Eventually, four independent datasets (GSE29265, GSE33630[33], GSE35570[34], and GSE60542[35]) with 134 PTC tumor samples and 146 normal thyroid samples were enrolled. Especially for dataset GSE35570, PTC samples derived from Chernobyl radiation-exposed patients were excluded. In all, 587, 851, 1716, and 777 DEGs were classified from GSE29265, GSE33630, GSE35570, and GSE60542, respectively PTC versus normal thyroid samples. 702 DEGs, including 349 up- and 353 down-regulated, were identified with the RRA (Supplementary Table 1). The information of GEO datasets was listed in Table 1. The top 20 upregulated and downregulated DEGs as shown in Figure 2A. We downloaded a list including 1938 experimentally supported MTGs from HCMDB (Supplementary Table 2) to intersect with DEGs. Finally, 155 reliable DE-MTGs were identified, among which 98 were upregulated, and 57 were down-regulated (Figure 2B, Supplementary Table 3).
Functional enrichment analysis
We analyzed the potential function and pathway enrichment of the 155 DE-MTGs (Figure 3A-D). In terms of BPs, the 155 DE-MTGs were mainly enriched in the cellular matrix organization, extracellular structure organization, and cell-substrate adhesion (Figure 3A). In terms of CCs, the DE-MTGs identified were significantly enriched in the extracellular matrix, focal adhesion, and membrane raft (Figure 3B). In terms of MFs, the DE-MTGs identified were significantly enriched in receptor-ligand activity, signaling receptor activity, and so on. Pathway analysis further revealed that the 155 DE-MTGs were mainly enriched in the proteoglycans in cancer, PI3K-Akt signaling pathways, and so on (Figure 3D).
Identification and establishment of a 10-gene signature
After excluding cases in which PFI was ≤ 30 days, 488 PTC samples with complete follow-up information were finally included in the analysis. The baseline clinical characteristics are as shown in Table 2. Twenty-eight PFI-related DE-MTGs were identified in training set using the Cox proportional-hazards model. Forest plots of the logfc, P-value and hazard ratio of each item are shown in Figure 4A. The potential protein-protein interaction network of the 28 items was explored using the STRING database as shown in Figure 4B[36]. A novel gene signature consisting of the following 10 DE-MTGs was constructed and listed in Table 4. The formula to calculate the risk score was as follows: β1 × gene-one's expression + β2 × gene-two's expression + ····· + βn × gene-N's expression, where β is the corresponding correlation coefficient. The patients with the high-risk have shorter PFIs by using the Log-rank analysis with the training, testing, and total (P < 0.05; Figure 5 A-C). The correlations between risk scores and recurrences are presented in Figure 5 D-F. In the training set, the AUCs for PFI prediction was 0.799 (1-year), 0.781 (2-year), and 0.737 (3-year), respectively (Figure 5G). The C-index was 0.752. In the testing set, the AUCs were 0.850, 0.750, and 0.649 (Figure 5H). The C-index was 0.750. In the total set, AUCs were 0.812, 0.772, and 0.717 (Figure 5I). The C-index was 0.748. After optimizing the training set by enrolling all the 488 cases, thirty-one PFI-related DE-MTGs were identified, then the optimized 10-gene signature was generated as shown in and Figure 6 and Table 4. In the total set, the AUCs for PFI prediction was 0.836 (1-year), 0.775 (2-year), and 0.736 (3-year), respectively (Figure 6D). The C-index was 0.756. Collectively, our results indicated that the 10-gene signature functions well in the PFI forecast for PTC.
GSEA
To seek the potential alteration underlying the 10-gene signature, we conducted GSEA in PTC from the TCGA-THCA (Figure 7A–C). For KEGG pathways, the molecular alterations in the high-risk group samples were related to the homologous recombination, cell cycle, and DNA replication. For the oncological signatures, including the AKT_UP.V1_DN, MTOR_UP.VI_DN, and CYCLIN_D1_UP.VI_UP, were related. GSEA results were presented in Supplementary Table 4.
RT-qPCR analysis of MTGs based signatures in PTC cell lines
The relative 10 gene expression level of MTGs signature in B-CPAP and KTC-1 cell were generated through RT-qPCR quantification. In 10 genes of MTGs signature, the expression level of KISS1R, EZH2, and FAM3B were higher in KTC-1 than B-CPAP, and the expression level of FBLN5 and SDPR were higher in B-CPAP than KTC-1, the differences were statistically significant (P<0.05). The differences of TGFBR3, ARHGDIB, SDC2, SOD3 and CCL14 were not statistically significant (P>0.05), as shown in Figure 8A. The MTGs signature scores of KTC-1 were higher than B-CPAP, the difference was statistically significant (P<0.05), as shown in Figure 8B.
Clinical correlation of the MTGs based signature
In groups divided by a series of pivotal clinical and pathological characters, patients who were with mutated BRAF status, advanced extra-invasion existence, advanced T stage had higher risk-scores, as shown in Figure 9B-E. Patients who were with lymph nodes metastasis, residual tumor, uni-focality, tall cell histological type and advanced disease stages also had higher risk-scores, as shown in Figure 9F-J, the differences were statistically significant (P<0.05).
Verification of the novel 10-gene signature in external datasets
We validated the pattern of 10-gene signature in six external GEO datasets and compared the risk-scores between ATC/PDTC/PTC/FTC. For 10-gene signature, in the datasets GSE 29265 and GSE 33630, risk-scores were higher in ATC samples compared to PTC samples (p < 0.01, 0.0001, respectively), as shown in Figure 10A, B. In the dataset GSE 76039, risk-scores were higher in ATC samples than PDTC samples (p < 0.01), as shown in Figure 10C. In GSE82208, the difference of risk-scores between follicular adenoma and FTC was statistically significant (P<0.0001), as shown in Figure 10D. Also, in GSE58545 and GSE5364, the difference of risk-scores between normal thyroid samples and PTC samples was statistically significant (P<0.0001), as shown in Figure 10E, F. The AUCs of diagnostic ROC for differentiating sample’s type were also presented.
Prognosis-associated factors of the PFI in PTC
488 TCGA PTC Cases were enrolled to identify prognostic factors. Univariate analysis showed that age, T stage, M stage, AJCC stage, neoplasm size, histological type and risk-scores were significantly PFI-related (p < 0.05) (as shown in Table 3). A total of 408 cases with complete information were then enrolled in multivariate analysis, and the risk-score (p < 0.001) was the only independent factor for prognosis, as shown in Table 3.
Construction and validation of the predictive nomogram
A nomogram for predicting the AUCs of PTC was built with a stepwise Cox regression model (Figure 11A). Parameters including risk-score, RAS mutation, neoplasm size, residual tumor, age, TNM stage and histological type were incorporated in the nomogram. For the histological type, the tall cell variant was defined as aggressive[37]. The calibration curve and DCA showed that the nomogram functions well in predicting the PFIs (Figure 11B, C). The AUCs for 1-, 2-, and 3-year PFIs were 0.940, 0.793 and 0.779, respectively (Figure 11D).