Gene Co-expression Networks of Osteomyelitis
We adopted the WGCNA function to screen the OM-related gene from GSE6269 and GSE16129. Firstly, no outliers were detected by the Hclust function, and 12401 candidate genes were involved in the subsequent analysis. When a power of β = 5 was selected as the soft-threshold to ensure a scale-free network, it was implemented to measure the co-expression network (Figure 1). Thirty-one significant gene modules (Figure 2) then were assigned to different colors. After calculating each module by WGCNA, the correlations between gene expression and osteomyelitis (Figure 3) signiﬁcantly demonstrated in the brown module (R2 = 0.57, P<0.001), which was defined as candidate genes (Figure 4). There were 918 genes in the brown module for the following analyses.
Biological Functions and Pathways Involving the Candidate Module
The GO enrichment analysis of the candidate genes showed that the cellular component (CC) of genes concentrated on the cell nucleus and particular extracellular region part. Besides, the candidate genes were mainly involved in three biological processes (BP), including protein binding, poly (A) RNA binding and ATP binding. The top three of molecular function (MF) were RNA polymerase II promoter, regulation of transcription from RNA polymerase II promoter and negative regulation of transcription (Figure 5). For the KEGG pathways, the candidate genes were significantly related to pathways including metabolic pathways, pathways in cancer, human papillomavirus infection, RNA transport, thermogenesis, PI3K-Akt signaling pathway, Epstein-Barr virus infection, Human T-cell leukemia virus 1 infection, FoxO signaling pathway and AMPK signaling pathway (Figure 6).
Recursive feature elimination (RFE) for Identification of feature genes
To narrow the range of candidate genes, we adopted the RFE algorithm to select the feature genes. The prediction accuracy was the highest (81%) when selected 19 genes from the brown module (Figure 7). These genes were involved as the feature genes to construct the Random forest model. These feature genes included Cytochrome P450 2E1 (CYP2E1), Bardet–Biedl syndrome 10 (BBS10), Actin Related Protein 2/3 Complex Subunit 5 Like (ARPC5L), GTPase Activating Protein And VPS9 Domains 1 (GAPVD1), Purine Rich Element Binding Protein A (PURA), RNA Binding Motif Single Stranded Interacting Protein 1 (RBMS1), Butyrophilin subfamily 2 member A2 (BTN2A2), Exosome component 8 (EXOSC8), Methyltransferase Like 8 (METTL8), FYVE And Coiled-Coil Domain Autophagy Adaptor 1 (FYCO1), Ketohexokinase (KHK), Pre-mRNA Processing Factor 38B (PRPF38B), Cluster of Differentiation 72 (CD72), C2 Calcium Dependent Domain Containing 5 (C2CD5), Alpha/beta-Hydrolase domain containing 6 (ABHD6), Cluster of Differentiation 200 (CD200), Family With Sequence Similarity 53 Member C (FAM53C), HLA Complex P5 (HCP5) and Elongator Complex Protein 1 (ELP1).
The construction of the classification model by the random forest algorithm
To validate the accuracy of feature genes, the feature genes were involved in a random forest algorithm for constructing classification models. After repeating the CV 5 times, the mean value of AUC was 0.85 ± 0.09 in the ROC curve (Figure 8), and the average accuracy of the model was 85.7% (95%CI: 0.7587-0.9265). The external dataset validated the efficiency of the random forest model. The value of AUC was 0.82 (Figure 9), and the accuracy of external validation was 78.6%. Besides that, the positive predictive value was 100%, and the negative predictive value was 66.7% that the confusion matrix visualized these results (Figure 10).
Hub modules screening of the PPI network
The PPI network of feature genes, with 38 nodes and 89 edges, was established by the STRING website and the Cytoscape. The top ten proteins were ACTR3, ARPC1A, ARPC2, ARPC3, SKIV2L2, ARPC5L, EXOSC5, EXOSC7, EXOSC8 and GAPVD1 (Figure 11). To screen the cluster of proteins, we selected two modules from these proteins using MCODE. The protein function of modules enriched in the catalytic component of the RNA exosome complex and regulation of actin polymerization, that these two modules were defined as module1 and module 2 respectively. The 4 nodes, Superkiller viralicidic activity 2-like 2 (SKIV2L2), Exosome component 5 (EXOSC5), Exosome component 7 (EXOSC7), Exosome component 8 (EXOSC8), belonged to module 1. Five other nodes in module 2 were Actin Related Protein 3 (ACTR3), Actin Related Protein 2/3 Complex Subunit 1A (ARPC1A), Actin-related protein 2/3 complex subunit 2 (ARPC2), Actin Related Protein 2/3 Complex Subunit 3 (ARPC3), Actin-related protein 2/3 complex subunit 5-like protein (ARPC5L) and GTPase Activating Protein And VPS9 Domains 1 (GAPVD1).