Analysis Process
The analysis process of this study is shown in Supplementary Fig. 1. The transcriptome RNA-seq data from 570 samples and clinical information of the THCA patients were downloaded from TCGA database and used to estimate immune and stromal components. DEGs were extracted by dividing into high- and low-score groups based on the median stromal and immune scores. DEGs were used to constructed PPI network and univariate COX regression analysis. We performed intersection analysis between PPI networks core nodes and the top significant factors gained from the analysis of univariate COX regression. Our data showed the presence of only one intersection gene, the CFB. Subsequently, we evaluated the role of the CFB in the patient survival, as well as its clinical relevance.
Characterization of THCA patients and clinical correlation with stromal or immune scores
A total of 570 THCA samples were obtained from the TCGA database. Stromal scores, immune scores and ESTIMATE scores in the TME were generated based on the ESTIMATE algorithm. Our univariate analysis demonstrated that the clinicopathological characteristics were significantly associated with the stromal scores and immune scores (Fig. 1). Besides, the Stromal score of the M0 stage was higher than that of the M1 stage (Fig. 1B, p = 0.027). On the other hand, the immune score differed significantly between the N0 and the N1 stage (Fig. 1C, p = 0.00014). The estimate score varied across the different M stages (Fig. 1A, p = 0.035). However, there were no statistically significant differences between stromal score, immune score and estimate score in T2 patients from that in T1 or T3 patients. Furthermore, whereas there was significant difference between the scores in stage Ⅱ and stage Ⅰ,Ⅲ or Ⅳ, other clinical stages did not show any significant differences. Besides, no statistically significant differences were observed in the stromal score, immune score and estimate score between the patients’ age and gender (Fig. 1A, 1B, 1C). However, there were significant differences in the N stage (Fig. 1A, 1B, 1C; p = 0.0034, 0.00014, 0.0003). These data showed that the proportion of immune and stromal cells was associated with the growth and progression of THCA.
DEGs and Functional Enrichment Analysis
To identify the exact alterations in gene expression in the TME, the THCA patients were categorized into high- and low-score groups based on the stromal/immune cells. After performing the differential expression analysis of the groups as categorized by the stromal scores, we obtained a total of 1031 DEGs. Among them, 79 genes were down-regulated while 952 genes were up-regulated (Figure 2A, 2B). Similarly, differential analysis of the high- and low- immune score patients showed 1294 DEGs consisting of 318 downregulated genes and 976 upregulated genes. In addition, the Venn diagrams, drawn from an intersection of the DEGs between stromal and immune score groups, showed a total of 72 downregulated genes and 818 upregulated genes (Figure 2C, 2D). The DEGs might have been responsible for the TME status. All the 890 DEGs were used to conduct GO and KEGG functional enrichment analysis via the clusterProfiler R package in R language. GO enrichment data showed that the DEGs were significantly mapped into T cell activation and lymphocyte differentiation (Figure 2E, 2F, 2G). In addition, the KEGG enrichment analysis suggested that most of the pathways were associated with immune responses, such as cytokine–cytokine receptor interaction and chemokine signaling pathway (Figure 2H, 2I, 2J).
PPI Network Analysis and Univariate COX Regression Analysis
To further explore the relationship between the DEGs, we used STRING online tool and Cytoscape software (version 3.8.2) to perform a PPI network (Figure 3A, 3B). The top 30 genes with maximum number of nodes in the PPI network were presented as bar plots (Figure 3C). Univariate COX regression analysis of the 890 immune-related DEGs showed that 4 genes were potential prognostic or risk factors for THCA (Figure 3D). Furthermore, an intersection analysis of the top 100 genes in PPI network and the 4 prognostic genes showed CFB as the only one core target gene (Figure 3E).
CFB Expression Levels, Survival and Clinical Correlation Analysis
To evaluate the role of CFB expression in clinical disease manifestations, we profiled the expression of CFB in normal and tumor tissues using “limma” package in R language (Wilcoxon rank sum test, P<0.001). The CFB expression in normal tissues was significantly lower than in tumor tissues (Figure 4A). Furthermore, the CFB expression in paired tumor tissues was significantly higher than the expression level in adjacent normal tissues (Figure 4B). To perform survival analysis, all the THCA cases were divided into CFB high and CFB low-expression groups based on the CFB median expression value. The survival analysis indicated that THCA patients with higher CFB expression had a longer survival compared to those with lower CFB expression (Figure 4C). Besides, our clinical correlation analysis of CFB showed that, except for the low CFB expression in stage Ⅱ compared to stage I or Ⅲ, there was no significant difference in the other clinical stages (Figure 4D). In addition, the expression level in N1 was higher than that in N0 (Figure 4G). M0 and M1 did not show any significant differences in the CFB protein expression (Figure 4F). Furthermore, our data showed that there was significant difference between the T2 and T3 or T4 groups (Figure 4E).
GSEA
To determine significantly enriched signaling pathways, we performed GSEA in the high-expression and low-expression groups and then used FDR q-value, nominal p-value and FWER p-value for data interpretation. Our KEGG enrichment data showed that the CFB gene was highly expressed in signaling pathways such as antigen processing and presentation, autoimmune thyroid disease, cell adhesion molecules cams, intestinal immune network for IGA production, leishmania infection, natural killer cell mediated cytotoxicity, Escherichia coli infection, systemic lupus erythematosus, type 1 diabetes mellitus or viral myocarditis (Figure 5A). On the other hand, the GO enrichment analysis showed upregulated CFB expression in pathways such as adaptive immune response, interferon gamma mediated signaling pathway, positive regulation of T cell mediated cytotoxicity, positive regulation of T cell mediated immunity, regulation of leukocyte mediated cytotoxicity, regulation of syncytium formation by plasma membrane fusion, regulation of T cell mediated cytotoxicity, regulation of T cell mediated immunity, T cell activation involved in immune response or T cell mediated immunity (Figure 5B). These results indicated that CFB may be a potential indicator for the TME status.
Tumor-infiltrating immune subsets in the TME
CIBERSORT algorithm be used to calculate the proportion of the tumor-infiltrating immune cells in the TME among the THCA cases. Our data suggested that the proportion of the 21 immune cells in every case, as well as the sum of various immune cells was 1 (Figure 6A). Besides, the correlation analyses indicated that CD8 T cells displayed a strong positive correlation with Plasma cells (Cor=0.57) (Figure 6B). On the contrary, plasma cells were negatively correlated with Macrophages M0 (Cor=-0.58).
Effect of the expression of CFB on TIC levels and the Correlation between the CFB expression and the TICs
To explore the potential functions of CFB in the TME, we classified the patients into high- and low-expression groups based on the expression of CFB. Our analysis showed that a total of 10 TICs were perturbed by the expression of CFB (Figure 7A). B cells naïve (p=0.034), plasma cells (p<0.001), T cells CD4 memory activated (p<0.001), T cells follicular helper (p=0.004), T cells regulatory (p=0.019) and Macrophages M1 (p<0.001) were positively correlated with high CFB expression. However, Macrophages M0 (p<0.001), Macrophages M0 (p=0.002), Mast cells resting (p=0.003) and Eosinophils (p=0.030) were significantly upregulated in the low-expression group.
In addition, the data indicated a correlation between the CFB expression and TICs in the THCA patients (Figure 7B). The CFB gene was positively correlated with T cells CD8, Macrophages M1, Plasma cells, T cells CD4 memory activated, T cells follicular helper or T cells regulatory (Tregs), but negatively correlated with Eosinophils, Macrophages M0, Macrophages M2, Mast cells resting or T cells CD4 memory resting.
Consequently, to identify the most closely related immune cells, we performed the intersection analysis based on both the difference analysis and correlation tests with Venn diagram. Our data showed positive correlation between the Plasma cells, T cells CD4 memory activated, T cells follicular helper, T cells regulatory (Tregs), Macrophages M0, Macrophages M1, Macrophages M2, Mast cells resting or Eosinophils with CFB expression (Figure 7C).
GEO database verification
We downloaded four datasets from GEO to further verify the accuracy of the previous results from TCGA analysis. The expression of CFB was significantly higher in PTC compared with than that in normal thyroid samples in GSE33630, GSE29265, GSE35570 and GSE27155 (Figure 8, P < 0.05). The results of GEO datasets were essentially in agreement with the findings of the TCGA database. The survival information of CFB in PTC could unfortunately not be obtained from GEO datasets.