Data preprocessing and screening of DEGs
In this study, the GSE55235 and GSE12021 datasets were batch-corrected using the sva package and the DEGs of the combined datasets were identified using the limma package. A total of 390 DEGs were screened out, including 184 upregulated genes and 206 downregulated genes, according to the critical criteria of |log2FC|\textgreater1 and p<0.05. The DEGs of the combined dataset are displayed with a volcano plot in Figure 1B; the top 30 upregulated and downregulated DEGs are displayed on the heat map in Figure 1A.
Building coexpression modules through WGCNA
The expression matrix of the 39 patients in the combined dataset was read in R. Weighted gene coexpression analysis was used for the two groups of samples, and the samples were clustered using the class average method (average) in the hclust function. The clustering results are shown in Figure 2A. The disease grouping data of the patients were imported, and the relationship between the sample grouping and the sample dendrogram was visualized, as shown in Figure 2B. The construction of the weighted gene coexpression network required a soft threshold β to calculate the adjacency matrix weight parameter, and the optimal soft threshold was 6 using the pickSoftThreshold function; the selection process is shown in Figures 2C, D, and E. Further construction of the weighted gene network was based on TOM, the minimum module size was set to 50, and a total of 30 gene modules were detected. The module identification results are shown in Figure 2F. Module-trait correlation analysis showed that, in terms of disease characteristics, the yellow module (765) showed the strongest negative correlation (cor=-0.91, 8e-16), and the midnight-blue module (312) showed the strongest positive correlation (cor=0.78, 4e-09); the results are shown in Figure 2G.
Acquisition of hub genes and GO functional enrichment analysis and KEGG pathway enrichment analysis
A total of 1,077 key module genes were intersected with 390 DEGs to obtain 161 hub genes, and a Venn diagram was drawn. As shown in Figure 3A, KEGG pathway enrichment analysis (Table 1) and GO function enrichment analysis (Table 2) were performed on the hub genes, which enriched 810 biological processes, 62 molecular functions, 21 cellular components, and 42 pathways (p<0.05). For BPs, the top 10 enrichment items were mainly involved in the response to peptides, cellular response to external stimuli, negative regulation of immune system processes, muscle cell proliferation, and other biological processes (Figure 3B). The top 10 enrichment items for CCs and MFs are shown in Figure 3B. According to the findings of the KEGG pathway enrichment study, the hub genes were primarily involved in cellular metabolic processes, including osteoblast differentiation and the signaling pathways of tumor necrosis factor (TNF), interleukin (IL)-17, p53, and JAK-STAT (Figure 3C).
Screening for signature biomarkers through a comprehensive strategy
First, we applied the LASSO regression model for variable screening. As shown in Figure 4A, each color curve in the first plot represents the trajectory of each independent variable coefficient. The later the coefficient is compressed to zero, the more important the variable becomes as the lambda value changes. The vertical coordinate in the second plot is the binomial deviance (dichotomous anomaly), which represents the magnitude of the error of the model. Our goal was to select the model with the lowest possible lambda characteristics and the smallest possible error. Lambda.min obtained the smallest mean value of the target covariates among all lambda values. The optimal lambda was determined to be 0.022 after 10-fold cross-validation. Fifteen genes were identified as feature genes of OA among the 161 target genes (Figure 4A, B; Table 3). In the SVM-RFE algorithm, the classifier error was minimized when the number of features was 15 (Figure 4C; Table 4). In addition, 161 target genes were used to construct the RF model (Figure 4D; Table 5), and the top 30 genes were ranked according to their relative importance and identified as characteristic genes. Figure 4C shows the performance of the random forest algorithm. By applying the gene set, we built the random forest model separately according to the different number of trees and calculated the error rate of this model. The number of trees corresponding to the point with the lowest error rate was adopted. The genes obtained by the three algorithms were intersected and four final diagnostic signature genes were obtained: FKBP5, EPYC, KLF9, and PDZRN4 (Figure 4E).
Diagnostic column line graphs and ROC curves
A nomogram (Figure 5A) was constructed as a diagnostic tool for OA progression by combining four diagnostic signature genes (FKBP5, EPYC, KLF9, and PDZRN4). In the nomogram, a score was given to the expression of each gene, and a total score could be calculated for each patient. This total score corresponded to the probability of OA occurrence in each patient. Calibration curves and DCA curves were applied to the dataset (GSE55235, GSE12021, and GSE55457) to construct the predictive ability of the nomogram. The calibration curves (Figure 5B) indicated that the nomogram could predict the occurrence of OA more accurately. DCA curves showed that the patients diagnosed with OA could benefit from the clinical treatment (Figure 5C). The ROC curves of the genes (Figure 5D) indicated that they all have the potential to be new diagnostic markers with AUC values of 0.989 (EPYC), 0.916 (FKBP5), 0.992 (KLF9), and 0.874 (PDZRN4). The validation set (GSE55457) could be similarly confirmed (Figure 5E), with AUC values of 0.970 (EPYC), 0.940 (FKBP5), 0.980 (KLF9), and 0.930 (PDZRN4).
Signaling pathways associated with signature genes
We divided the dataset into high- and low-expression groups according to the median value of signature gene expression and performed GSEA on four signature genes. The results revealed that several signature gene high-expression groups were significantly enriched in immune response-related pathways, the IL-17 signaling pathway, allograft rejection, and the TNF signaling pathway (Supplementary Figure 1).
Hallmark Gene Set with ssGSEA
Further study of phenotypic differences between OA and normal samples using ssGSEA showed that in the set of all genes with significant differences, the OA group scored higher than controls in peroxisomes, and the OA group scored lower than controls in the gene sets for IL-2 STAT5 signaling, UV response up, p53 pathway, estrogen response early, apoptosis, TGF-β signaling, cholesterol homeostasis, hypoxia and TNF-α signaling via NF-κ (Figure 6A). EPYC was significantly negatively correlated with UV response up, TNF-α signaling via NF-κ, TGF-β signaling, the p53 pathway, hypoxia, early estrogen response, and apoptosis, and significantly positively correlated with peroxisome and interferon alpha responses. FKBP5 was significantly positively associated with UV response up, unfolded protein response, MYC targets V2, hypoxia, cholesterol homeostasis, and androgen response. KLF9 was significantly positively associated with UV response up, unfolded protein response, TNF-α signaling via NF-κ, P53 pathway, MYC target V2, MYC target V1, hypoxia, early estrogen response, cholesterol homeostasis, apoptosis, androgen response, and significantly negatively correlated with peroxisomal, notch signaling, and interferon alpha responses (Figure 6B).
Diagnostic signature gene correlation and expression validation
The expression of these four genes in OA was verified using the pooled dataset of GSE55235 and GSE12021 (Figure 7A). FKBP5 and KLF9 were expressed at significantly lower levels in patients with OA, and EPYC and PDZRN4 were significantly highly expressed in patients with OA, which was confirmed in the GSE55457 dataset (Figure 7B). Additionally, we calculated the Pearson correlation coefficient between diagnostic feature genes. As shown in Figure 7C, there were significant interactions between the diagnostic features. Except for the significant positive correlations between FKBP5 and KLF9, EPYC, and PDZRN4, there were significant negative correlations between the remaining genes, suggesting that the four genes may not have mutually reinforcing relationships.
Immunoinfiltration analysis
The CIBERSORT algorithm was used to examine the fraction of 22 immune cells in the tumor microenvironment, and the results are shown in Figure 8A. We did not detect any appreciable infiltration of activated dendritic cells into the tumor microenvironment. Differential and correlation analyses of immune cells were performed to determine the correlation between OA and immune cell infiltration. In the differential analysis, seven immune cell types were statistically significant: memory B cells, plasma cells, resting memory CD4+ T cells, regulatory T cells (Tregs), activated natural-killer (NK) cells, resting mast cells, and activated mast cells (Figure 8C). The heat map (Figure 8B) demonstrates the Pearson correlation between immune cells, which showed that in patients with OA, naïve CD4+ T cells were significantly and positively correlated with activated memory CD4+ T cells; resting NK cells were significantly and positively correlated with naïve CD4+ T cells; activated mast cells were positively correlated with resting memory CD4+ T cells; eosinophils were positively correlated with activated NK cells; activated mast cells were negatively correlated with resting mast cells; naïve B cells were negatively correlated with memory B cells; and resting mast cells were negatively correlated with resting memory CD4+ T cells, which confirmed the feasibility of correlation analysis to predict biomarkers.
Analysis of the correlation between diagnostic signature genes and immune infiltrating cells
In the correlation analysis (Figure 9), EPYC was positively correlated with resting mast cells, plasma cells, memory B cells, and Tregs and negatively correlated with activated NK cells, resting memory CD4+ T cells, and activated mast cells. FKBP5 was positively correlated with activated mast cells, resting memory CD4+ T cells, and eosinophils and negatively correlated with Tregs and resting mast cells. KLF9 was positively correlated with activated mast cells, resting memory CD4+ T cells, activated NK cells, and eosinophils, and negatively correlated with plasma cells, quiescent mast cells, and Tregs. PDZRN4 was positively correlated with resting mast cells, memory B cells, plasma cells, CD8+ T cells, and Tregs and negatively correlated with resting memory CD4+ T cells and activated mast cells.
Expression of FKBP5 in normal and tumor tissues
To further explore the interactions between diagnostic signature genes, we used Genemania to construct PPI networks. FKBP5 had the highest degree value; therefore, we included FKBP5 in a deeper analysis. As shown in Supplemental Figure 2, the PPI network centered on FKBP5 was significantly enriched in NF-κB signaling, toll-like receptor signaling pathway, threonine kinase activity, and pattern recognition receptor signaling pathways, all of which play an important role in tumorigenesis and chemoresistance; therefore, FKBP5 was included in further analysis. Taking the TCGA database as an example (Figure 10A), among all 13 statistically significant tumor types, only two types of tumors (glioblastoma multiforme [GBM] and liver hepatocellular carcinoma [LIHC]) had significantly high expression of the FKBP5 gene. After merging the TCGA database with the GTEx database (Figure 10B), 18 of the 21 tumors had low FKBP5 expression, and the remaining three tumors, including brain lower grade glioma (LGG), GBM, and LIHC, had high FKBP5 expression. The expression of FKBP5 in these cell lines is shown in Figure 10C.
FKBP5 is associated with multiple tumor stages
The correlation between FKBP5 and pathological staging of different tumors was calculated using the online tool GEPIA. Lymph node diffuse large B-cell lymphoma (DLBC), head and neck squamous cell carcinoma (HNSC), kidney chromophobe, lung adenocarcinoma (LUAD), stomach adenocarcinoma (STAD), and uterine carcinosarcoma (UCS) were found to correlate significantly with tumor staging (Figure 11), and p<0.05 was considered statistically significant.
The relationship between FKBP5 and tumor prognosis
To assess the predictive link between gene expression and prognosis in each tumor, as illustrated in Figure 12, we created a Cox proportional hazards regression model. Prognostic significance was determined using a statistical test with the log-rank test. Finally, high expression in eight tumor types (GBMLGG, LGG, acute myeloid leukemia, LUAD, GBM, bladder urothelial carcinoma [BLCA], uveal melanoma [UVM], and STAD) was associated with a poor prognosis. Low expression in four tumor types (skin cutaneous melanoma [SKCM], skin cutaneous melanoma-metastasis, READ, and acute lymphoblastic leukemia) was associated with poor prognosis. Among these, FKBP5 had the highest risk for UVM. Patients were subsequently dichotomized according to the best cut-off values to obtain survival differences between the high and low FKBP5 expression groups in different cancer types. The results showed that the difference in survival was significant in all OS-related cancer types, while FKBP5 expression was not significantly correlated with OS in all TCGA tumor types (hazard ratio=0.94).
As shown in Figure 13, the correlation between gene expression and disease-free survival (DFS) in each tumor was analyzed by building a Cox proportional hazards regression model, and statistical tests were performed using the log-rank test. Unlike OS, there was a significant risk effect of FKBP5 expression in stomach and esophageal carcinoma, pheochromocytoma and paraganglioma, prostate adenocarcinoma (PRAD), and UCS, and FKBP5 expression levels were associated with DFS in a variety of tumors.
Correlation between FKBP5 expression and immune infiltration
We used the TIMER database to calculate the association between FKBP5 expression and multiple immune cell infiltration to determine whether gene expression affects the tumor immune environment. We found that infiltration scores for six immune cell types (B cells, CD4+ T cells, CD8+ T cells, neutrophils, macrophages, and dendritic cells) in the TIMER database were significantly associated with FKBP5 expression in multiple malignancies (Figure 14A) and validated these using the algorithms xCELL (Figure 14B), quanTIseq (Figure 14C), MCPcounter (Figure 14D), and EPIC (Figure 14E).
Correlation of FKBP5 expression with cancer immune check-point gene expression
Several genes have been identified to be closely associated with immune responses and, therefore, act as immune check-points. We used cancer expression data from the TCGA database to examine the association between FKBP5 and immune check-points. In multiple cancer types, FKBP5 was significantly associated with multiple genes, including CD86, CD274, PDCD1LG2, CD28, CD48, CTLA4, TNFSF4, and bladder urothelial carcinoma CD200, FKBP5 was found to be coexpressed with more immune check-point genes in BLCA, SKCM, STAD, and testicular germ cell tumors (TGCT), which may indicate that FKBP5 is involved in the tumor immune response through the above immune checkpoints in BLCA, SKCM, STAD, and TGCT (Figure 15A).
FKBP5 correlates with TMB and MSI in some cancers
In UCS, uterine corpus endometrial carcinoma (UCEC), and colon adenocarcinoma (COAD), FKBP5 expression was positively correlated with TMB, while in thymoma, thyroid carcinoma, STAD, PRAD, pancreatic adenocarcinoma, lung squamous cell carcinoma, and esophageal carcinoma (ESCA), it was negatively correlated with TMB. In UCEC, kidney renal papillary cell carcinoma, and COAD, FKBP5 was positively correlated with MSI, but in TGCT, STAD, PRAD, ovarian serous cystadenocarcinoma, HNSC, ESCA, DLBC, and breast invasive carcinoma, FKBP5 expression was negatively correlated with MSI (Figure 15B and C).
FKBP5 and drug reactions
FKBP5 expression was positively associated with drug response in patients treated with hydroxyurea, fenretinide, fostamatinib, DMAPT, RH1, 8-chloroadenosine, chlorambucil, 3 -bromopyruvate (acid), imexon, nitrogen mustard, uracil mustard, parthenolide, chelerythrine, and vorinostat. Notably, there was a negative correlation between FKBP5 expression and kahalide F, a decapping peptide from the sea slug Elysia rufescens that targets alterations in lysosomal membrane function and is currently undergoing clinical trials for a variety of oncology treatments. FKBP5 and the expected drug responses are shown in Figure 16.