Integrated bioinformatics analysis of cartilage for osteoarthritis

Background Osteoarthritis (OA) is a joint disease which threatens the health of older people around the world. However, its pathological mechanism remains unclear. In the current study, we aimed to identify the key genes and underlying pathological mechanisms associated with OA. Methods Firstly, GSE117999 was obtained from the Gene Expression Omnibus (GEO) database. We then obtained the DEGs, hub genes and key genes of the dataset using bioinformatics analysis, after which we analyzed these items using the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO). Furthermore, the target miRNAs of key genes were predicted by miRDB. The Drug Gene Interaction database (DGIdb) was used to predict target drugs for key genes. Receiver operating characteristic (ROC) analysis was employed to evaluate the effectiveness of key genes. Results: A total of 974 DEGs were identified. Furthermore, we found 19 hub genes and 10 key genes. We then predicted that hsa-mir-3613-3p, hsa-mir-548e-5p and hsa-mir-5692a would be closely related to key genes through web tools, and that two drugs, etoposide and everolimus, were highly specific to some key genes. Finally, we analyzed the receiver operating characteristic (ROC) of key genes in the samples and speculated that ESR1 may be a biomarker of OA. Conclusions This study deepens our understanding of OA and may be beneficial for further explorations of the pathological mechanism of the disease.


Introduction
Osteoarthritis (OA), a common degenerative joint disease, is characterized by destruction of articular cartilage and subchondral bone. Incidences of OA are increasing, and the disease seriously affects patients' quality of life while increasing their social and medical 3 burden [1,2] . Early and accurate diagnosis of OA is of great importance for its treatment.
However, it is difficult to ascertain via the current mainstream imaging examination. We therefore need a new method to improve diagnosis efficiency.
Progress has recently been made in the treatment of OA, although efficacy is still not optimal [3,4] . Nonsteroidal anti-inflammatory drugs are widely used for treating OA, but their many side effects limit their long-term use [5,6] , meaning a new drug is needed for the targeted treatment of OA. In recent years, many studies have been devoted to understanding the pathological process of OA, but no significant breakthrough has been made. It is therefore essential that we further explore the pathogenesis of OA to assist its early diagnosis and intervention. Cartilage, mainly composed of aggrecan and collagen II, plays a key role in maintaining the normal structure and function of joints [7,8] . In view of the role of cartilage in OA, we shifted the focus of research on the mechanism of OA to cartilage pathology.
In recent years, more scholars have made some progress on exploring differentially expressed genes (DEGs) as a potential breakthrough for exploring the pathological mechanism of OA [9][10][11] . However, this area needs further exploration. The Gene Expression Omnibus (GEO) database is established and maintained by the National Center for Biotechnology Information which stores a large amount of gene expression data.
Through bioinformatics analysis, we can select suitable data sets from the GEO database for further analysis which will obtain valuable information [12] .
In the current study, we identified DEGs, hub genes and key genes by bioinformatics analysis and analyzed their potential biological functions as well as possible pathways in OA. We also predicted the microRNA(miRNA) that may play an important role in OA and potential drugs for treatment. This study will help us further explore OA's pathological 4 mechanism.

Microarray data
We downloaded the microarray data (GSE117999) uploaded by Rai MF et al. (2018) from the GEO database, which was an analysis of transcription level in the cartilage of patients with OA and without OA. The platform used to process data was Agilent-072363 SurePrint G3 Human GE v3 8x60K Microarray 039494 (GPL20844).

Recognition of DEGs
GEO2R is an interactive network tool used for comparing differences between two groups under the same experimental conditions [13] . Using GEO2R, we analyzed DEGs of cartilage between OA and Non-OA groups using the default setting. Subsequently, we used adjusted P value <0.05 and |logFC| > 0.5 as indicators to screen DEGs through R software. We also used ggplot2, RColorBrewer and gplots of R software to create a volcano plot of all genes and a heatmap of DEGs.

Enrichment analysis for DEGs
We used the network tool Metascape to carry out Gene Ontology (GO) and a Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis of DEGs according to the default parameters. We also carried out a Gene Set Enrichment Analysis (GSEA) of DEGs through miRWalk [14,15] and made bubble graphs about enrichment analysis through package ggplot2 of R software.

Construction and analysis of protein-protein interaction (PPI) network
STRING is commonly used in the study of protein-protein interaction [16] . We constructed a PPI network using STRING. We then used the plug-in CentiScape of Cytoscape to measure the centrality of DEGs in the PPI network with degree centrality, betweenness and closeness centrality [17,18] . Furthermore, we used centrality degree ≥ 10 as the criterion for identifying hub genes. We also constructed a PPI network of hub genes based on STRING and Cytoscape and then used plug-in MCODE of Cytoscape, with a degree cutoff value of five, and other default algorithms to identify key genes [19] .

GO and KEGG analysis for hub genes and key genes
We performed a GO and KEGG analysis of hub genes and a miRWalk analysis of key genes.
We also made bubble graphs through ggplot2 package of R software to visually display enrichment analysis results.

Construction of miRNA-protein regulatory network
The target miRNAs of key genes were predicted by miRDB [20] . We also selected the most valuable miRNAs for further research, with Target Score (≥90) as the boundary. Based on this, we used Cytoscape to build the miRNA-protein regulatory network.

Construction of protein-drug network
We firstly retrieved commonly used drugs for treating OA, in a proven state, through the Therapeutic Target Database (TTD) [21] . We then used the Drug Gene Interaction database (DGIdb) to predict the related genes of the common drugs obtained in the previous step.
We also analyzed whether related genes were the same as hub genes, key genes and DEGs before using DGIdb to predict target drugs for key genes. We then displayed these 6 intuitively through Cytoscape.

Receiver operating characteristic (ROC) analysis
ROC analysis was carried out using package pROC of R software to find valuable key genes as potential biomarkers. A p value of < 0.05 indicated statistically significant differences.
Youden's index was used to show the most diagnostic threshold for each gene. The area under curve (AUC) was used to evaluate the effectiveness of these genes as biomarkers of OA [22,23] .

Microarray data
In the GSE117999 data set, OA group samples came from patients undergoing total knee arthroplasty due to end-stage OA, while Non-OA group samples came from the non-weight bearing site of the medial intercondylar notch; that is, patients who underwent arthroscopic partial meniscectomy but who did not have OA. The overall design of the data set was to identify differentially expressed genes using transcriptional analysis of 12 OA patients and 12 Non-OA patients. However, two samples from each group were excluded from the analysis for reasons which were not mentioned in the database.

Screening DEGs
A total of 974 DEGs were obtained. Expression information for 20 samples is displayed intuitively in the volcano plot ( Figure 1) and the heatmap ( Figure 2).

Enrichment analysis for DEGs
By analyzing enrichment analysis data of Metascape and combining results from previous studies, we focused on the following items. For GO, we focused on defense response to fungus, positive regulation of mRNA processing, cellular metabolic compound salvage, magnesium ion homeostasis and cellular sodium ion homeostasis. For KEGG, we focused on signaling by BMP and signaling by ERBB4 (Figure 3.A). We also expanded our focus by interpreting GSEA enrichment analysis data. For GO, we also focused on protein tyrosine kinase activity, calcium-activated potassium channel activity, enzyme binding, ubiquitin protein ligase activity, protein-arginine N-methyltransferase activity, killing of cells of other organisms, defense response to fungus, serine-type endopeptidase activity, ATPase activity, antimicrobial humoral response, antimicrobial humoral immune response and potassium ion transport. For KEGG, we also focused on Endocrine resistance (Figure 3.B).

Construction and analysis of PPI network
By analyzing the PPI network of DEGs, we found that 19 genes had higher levels, meaning they may have more biological significance. We therefore defined these as hub genes ( Figure 4.B). Further, by analyzing the PPI network of hub genes, MDM2, RB1, EGFR, ESR1, UBE2E3, WWP1, BCL2, OAS2, TYMS and MSH2 were defined as key genes which may play crucial roles in OA's pathological process (Figure 4.B).

GO and KEGG analysis for hub genes and key genes
The GO and KEGG analyses of hub genes were mainly enriched in 14 items ( Figure 5.A).
Additionally, the enrichment analysis of key genes tended to focus on 13 items. In line with relevant literature from recent years, we focused on protein destabilization, the protein ubiquitination involved in ubiquitin-dependent protein catabolic process, transcription factor binding, enzyme binding, ubiquitin protein ligase activity, ubiquitin protein ligase binding, protein tyrosine kinase activity and p53 binding in GO as well as endocrine resistance in KEGG ( Figure 5.B).

Construction of miRNA-protein regulatory network
A total of 165 microRNAs were predicted, of which we focused on hsa-miR-3613-3p, hsa-miR-548e-5p and hsa-miR-5692a due to their close relationship with several key genes ( Figure 6).

Construction of protein-drug network
Following data analysis, we found 15 commonly used drugs for treating OA, which were closely related to 81 genes (Figure 7.A). Surprisingly, PLAT is the only DEG among these related genes, although the hub and key genes were not included in this group. We therefore obtained 355 potentially related drugs for key genes through analysis. Among these, as many as 21 drugs were closely related to two key genes, while etoposide and everolimus were closely related to three key genes. This may mean that these two drugs were the most specific to key genes (Figure 7.B).

ROC analysis
The AUC of all key genes was greater than 0.800, and their P values were less than 0.05.
Among these, the AUC of BCL2 (0.960), EGFR (0.940), ESR1 (1.000), OAS2 (0.960), UBE2E3 (0.920) and WWP1 (0.920) were greater than 0.900, while ESR1 had the maximum AUC ( Figure 8.A-J). Therefore, ESR1 may have the highest value for OA diagnosis in this data set. 9 It has previously been reported that 18% of elderly women and 9.6% of elderly men suffer from symptomatic OA [24] . Many scholars have explored the pathological mechanism of OA, yet it remains a mystery. We set out to find the key molecules and their potential molecular mechanisms in the occurrence and development of OA.

Discussion
GSE117999 is a valuable data set for OA research, as human articular cartilage is critical for joint function and hard to collect, especially in patients without OA. An in-depth analysis of this data will help elucidate OA's underlying pathological mechanism.
Firstly, authors of previous studies have proven that many items in the GO analysis of selected DEGs are closely related to OA. For example, some scholars have found that fungal infection incidence is low, but once it occurs, it will have a serious impact on the joint [25][26][27] . Since fungal infection is prone to misdiagnosis and missed diagnosis in clinic, special attention should be paid to it. The defense response to fungus of articular cartilage is key for maintaining normal joint function. Authors of previous studies have suggested that magnesium ion can increase alkaline phosphatase activity in osteoblasts while promoting osteogenic differentiation, meaning that maintaining the homeostasis of magnesium ion can help maintain normal joint function [28][29][30] . Scholars have studied the effect of sodium ion in OA and proposed that non-selective sodium channel blockers can effectively treat pain caused by OA in animal models and patients, confirming the important role of sodium ion homeostasis in joint function [31][32][33] .
The development of normal cartilage and growth plate requires regulation of the protein tyrosine kinase-mediated signaling pathway. The abnormal function of protein tyrosine kinase can lead to abnormality of chondrocyte, which in turn leads to joint damage [34] . It has been found that calcium-activated potassium channels, including BKα or KCNMA1, are highly expressed in chondrocytes, which suggests that maintaining typical calcium-activated potassium channel activity is essential for maintaining normal joint function [35] .
Scholars have shown that Smurf2, as an E3 ubiquitin ligase, is up-regulated in the cartilage of OA, while over-expression of Smurf2 in chondrocyte can cause cartilage damage [36] . ATPase can hydrolyze phosphomonoesterase, regulating the mineralization process and providing treatment for pathological mineralization, including OA. Potassium transport is associated with pain perception, which should be considered when treating OA [37] . Other GO entries have not yet been found to be significantly related to OA, although further study may be required here.
Secondly, use of DEGs enrichment analysis has shown that KEGG was mainly enriched in signaling by bone morphogenetic protein (BMP) and signaling by ERBB4. Authors of recent studies have found that BMP signaling is essential for the maturation of joint structure.
Blocking the BMP signaling pathway leads to the loss of extracellular matrix components in articular cartilage, which can then lead to OA [38] . Interestingly, it has been found that over-expressed BMP signals can harm joint formation [39,40] . Authors have demonstrated that ERBB4 is closely related to gamma-aminobutyric acid transmission and synaptic plasticity [41][42][43] . Gamma-aminobutyric acid receptor B is also involved in the regulation of acute and chronic pain in rats, including chronic OA pain [44][45][46] . However, whether ERBB4 plays a role in the process of OA remains unconfirmed.
We identified hub genes and key genes by constructing a PPI network. Key genes may play an important role in the process of OA. Some have been confirmed in existing research.
Authors of previous studies have found that RB1 can combat inflammation and apoptosis of chondrocytes induced by IL-1β or hydrogen peroxide [47,48] . Studies exist which also show that the epidermal growth factor receptor(EGFR) plays a role in cartilage 11 development and OA [49,50] . Previously, an increased expression of Estrogen receptor alpha (ESR1) was detected in chondrocytes and osteoblasts, suggesting it may be involved in OA's pathological process [51] . It has been reported that BCL2 and p53 play a key role in regulating the growth and apoptosis of chondrocytes [52] . Some scholars have demonstrated that MDM2 can be inhibited to improve joint inflammation as well as cartilage damage [53] . 2'-5'-oligoadenylate synthetase 2(OAS2) is a component of the IFNβ signaling pathway, which is related to RA. Similarly, TYMS has been found to be associated with the process of RA [54,55] .
Next, we analyzed both hub and key genes, using enrichment analysis. We found that as well as the entries enriched in DEGs, these genes also enriched other entries. Authors of previous in vitro model studies have shown that the expression of Forkhead box M1, a transcription factor, was up-regulated in OA chondrocytes induced by IL-1beta. Meanwhile, a rescue test of Forkhead box M1 was used to show that cell viability and inflammatory response improved accordingly [56] . P53 can inhibit DNA replication and stop cell replication as well as to promote cell apoptosis and chondrocyte degeneration throughout the course of OA [57] .
In recent years, more attention has been paid to miRNAs. Many scholars have found that molecules such as miR-145 and miR-140 play a significant role in OA's process [58,59] . We predicted that several microRNAs, such as hsa-miR-3613-3p, hsa-miR-548e-5p and hsa-miR-5692a, would be closely related to key genes by bioinformatics, suggesting they may play a key role in OA pathogenesis. However, the specific mechanism of its role in OA needs further exploration to be confirmed.
Additionally, we searched for 15 commonly used OA drugs. Interestingly, the target genes of these drugs were not hub or key genes, but only PLAT in DEGs. Therefore, both the specificity and efficacy of these drugs, which are commonly used to treat OA, are questionable. We then obtained two valuable drugs, etoposide and everolimus, by predicting targeted drugs for key genes, which may provide inspiration for the discovery of new therapeutic regimens specific to OA.
Finally, through ROC analysis, we predicted that ESR1, which had the largest AUC value, might be a marker for the diagnosis and treatment of OA. However, clinical data is needed to confirm this.
A strength of this study was that we used the integrated bioinformatics analysis method to analyze public databases, laying a foundation for further exploring the pathological mechanisms of OA. However, our research had some limitations. Firstly, due to the limited sample size in public databases, it is difficult to effectively avoid random errors. In future research, we need to ensure a sufficient sample size to enhance statistical effectiveness.
Additionally, when conducting the bioinformatics analysis, due to the limitations of experimental conditions, we failed to carry out relevant experimental verification. Relying solely on bioinformatics algorithms may lead to deviations. Despite these limitations, the results of this study provide guidance for exploring the pathological mechanism of OA, but need further verification.

Conclusions
In this study, cartilage was the object of study. Potential key genes related to OA as well as their biological functions were identified, potential related microRNAs and drugs were

Availability of data and materials
The microarray data GSE117999 was downloaded from the GEO database (http://www.ncbi.nlm.nih.gov/geo/).

Authors' contributions
WB and LHD participated in the design of this study. WB and ZJL performed the statistical analysis. WB, ZJL, LN and CYF collected important background information. WB drafted the manuscript. All authors read and approved the final manuscript.

Ethics approval and consent to participate
Not applicable.

Consent for publication
All authors read and approved the final manuscript and consented to publication.  The protein-miRNA network. The orange V-shape indicates key genes, the green triangle indicates hub genes, the blue/green circle shows target miRNAs of key genes, and the purple diamond shows core miRNAs.