Subjects data collection and statement. Data from the Gene Expression Omnibus (GEO) database were retrieved from GSE59446, which including 100 controls samples and 100 KBD samples (Table S1). Clinically relevant samples were collected from the patient’s peripheral blood and conducted in accordance with the declaration of Helsinki. Our study was approved by The Human Ethics Committee of the First Affiliated Hospital of Harbin Medical University (Harbin, China) and all subjects gave informed consent to be treated by our hospital (EC-HS-2020112). Patients were diagnosed according to the WS/T 207-2010 criteria, and those presenting any other osteoarthritis-related diseases were excluded. Collected blood samples from 5 healthy laboratory staff volunteers who had not been diagnosed with chronic inflammation or other diseases before, as a reference for the clinical cohort. Finally, a total of 10 subjects were included in the study, including 5 healthy controls and 5 KBD patients. All subjects provided written informed consent.
Differentially expressed genes (DEG) screening. The GSE59446 dataset original data was read and the RMA algorithm was applied for background correction and data normalization. DEGs were filtered using the "limma" R package [13]. The "ggplot2" R package was used to draw the DEGs’ volcano map [14]. Also, the “gplots" R package was used to create a DEGs’ heatmap [15]. DEGs with p < 0.05 and |log2FC| > 1 were considered statistically significant.
WGCNA. After ranking genes from the largest to the smallest according to the median absolute deviation, we used the "WGCNA" R package to list all genes in WGCNA [5, 16]. Then, we used the "pickSoftThreshold" function (WGCNA software package) to filter out power parameters from 1 to 20. We choose a minimum power value of 0.85 for the independence degree and a soft threshold value of 4. Finally, we selected 7 modules. To visualize the generated gene network, we used the difference between the topological overlap matrix and its cluster dendrogram.
Key modules identification. A heatmap was used to display the correlation between gene features in modules and clinical features, which allowed the identification of key modules that were significantly related to clinical features. The key modules of synchronous frequency hopping are closely related to the status of synchronous frequency hopping. Thus, to verify the exact relationship between modules and characters, we compared the correlation between general service and management. All correlation analyses were conducted using Pearson correlations by the 'WGCNA' R package [5].
Protein-protein interaction (PPI) network analysis. The Cytoscape software (v. 3.7.1) was used to construct the PPI network with the above key modules genes.
Key modules functional enrichment analyses. To evaluate the key modular genes and DEGs biological functions, the annotation, visualization, and comprehensive discovery database tool, DAVID (version 6.8) [17, 18], was used to perform gene ontology (GO) functional enrichment analyses. GO terms with p < 0.05 were considered significant. For the genes in key modules, the functional enrichment of the top 10 GO pathways, sorted by gene count, was visualized. For DEGs between control and KBD patients, all the important functionally enriched GO pathways were visualized.
Screening of diagnostic. To perform feature selection and filter OA diagnostic markers, we used the least absolute shrinkage and selection operator (LASSO) logic back (Regression Shrinkage and Selection Via the Lasso) and support vector machine recursive feature elimination (SVM-RFE) [19]. The GSE59446 dataset expression matrix was subjected to quality control to establish a new dataset. Then, an independent dataset was used to verify the diagnostic efficiency of the obtained markers. The LASSO algorithm was used in conjunction with the “glmnet” R package [20]. Additionally, SVM-RFE is a general machine learning technique based on support vector machines in which feature vectors generated by SVM are removed to find the best variables [21]. SVM modules were created using the "e1071" R package to better identify the diagnostic value of these biomarkers. Then, we combined the genes selected by LASSO and SVM-RFE algorithms to further analyze them. A double-sided p < 0.05 was considered statistically significant.
Key genes clinical relevance. Use the "ggpubr" software package to analyze the relationship between key genes and age, stage, gender, and experimental groupings.
Blood sampling and RNA isolation for RT-PCR. To analyze each subject’s gene expression, 3 mL of peripheral blood was collected into heparinized vacuum tubes (Suzhou Bidi Medical Equipment Co., Ltd. China). An SYSMEXXE-2100 machine was used to determine the white blood cells count (Sysmex Corporation, Japan). Blood was centrifuged at 1,500 x g for 20 min to separate PBMC from plasma. Cell pellets were resuspended in Hanks Balanced Salt Solution (Solarbio). A 15 mL Falcon tube was filled with 5 mL Lympholyte-H (Solarbio) and centrifuged at 1,500 × g for 40 min. Then, they were rinsed with cold Hank balanced salt solution and stored in TRIZOLTM (ThermoFisher SCIENTIFIC). The RNA was extracted by the TRIZOL method. In the RT-PCR experiment, gene expression relative values were calculated using 2-ΔΔct.