Identication and Functional Enrichment Analysis of Differentially Expressed Genes in Osteoarthritis

Background Osteoarthritis (OA) is a chronic, progressive, inammatory, degenerative disease, which has become an osteoarthropathy that seriously affects physical health and quality of life of elderly people. However, the etiology and pathogenesis of OA remains unclear. Therefore, the study purposed to utilize bioinformatics technology to perform identication and functional enrichment analysis of differentially expressed genes in osteoarthritis. Method The main methods of this study consist of access to microarray data (GSE82107 and GSE55235), identication of differently expressed genes (DEGs) by GEO2R between OA and normal synovium samples, enrichment analysis of Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) by Gene Set Enrichment Analysis (GSEA), construction and analysis of protein-protein interaction (PPI) network, signicant module and hub genes. Result total of 300 DEGs were consisting of and 11 genes in samples compared Gene set some major cellular response TF, and


Background
Osteoarthritis is a chronic joint disease characterized by cartilage degeneration, synovial in ammation, formation of osteophytes and subchondral bone sclerosis. OA most affects such joints as knees, hands, hips and spine, with pain, transient morning stiffness, and limited function as the main symptoms [1,2].
The disease is the most prevalent form of arthritis in humans, affecting the majority of individuals over 65 years of age and is a leading cause of disability in the adult population [2,3]. What's more, the combined effects of ageing and increasing obesity globally have led to this already burdensome syndrome becoming more prevalent, with worldwide estimates suggesting that 250 million people are currently affected [1]. However till date, available pain therapies are limited in e cacy and have associated toxicities, such as paracetamol and non-steroid anti-in ammation drugs (NSAIDs) which are most often recommended. And there is no e cacious structure-modifying agent having been approved by any regulatory agency [3]. As for patients who have not responded appropriately to invasive procedures surgery should be reserved. Prevention and disease modi cation are the targeted areas for various research endeavours, indicating great potential to date [1]. New insights on pathogenesis holds that OA is a disorder of the whole joint with a wide range of underlying pathways which remain not much clear but lead to similar outcomes of joint destruction, as OA is typically described as a heterogeneous syndrome [4]. Therefore, emphasis should be given to gaining more insights into the molecular circuitry that promote the onset of this disease to further investigate novel and speci c biomarkers and targets.
Epidemiological studies have demonstrated that OA is a complex polygenic disorder with environmental and genetic risk factors involved [5]. It is a widely held view that OA has a strong genetic component and recent evidence suggests that at least 30% of the risk of it is genetically determined [6][7][8]. In the past 20 years, the search for susceptible sites of OA has been intensively investigated. Genomewide association studies (GWAS) are able to discover potential genetic variants that could be exploited as biomarkers for early diagnosis and targeted therapy [9]. Despite this, further work is needed to understand the genetic contribution to OA, as associations can vary depending on the nature of the condition, such as site, a history of trauma and gender [10]. In recent years, high-throughput sequencing and gene expression pro ling have been widely used in life sciences [11,12]. Gene expression analysis based on bioinformatics methods can nd a number of differentially expressed genes (DEGs) that play pivotal roles in the initiation and progression of tumors. Some of the DEGs are even considered as potential molecular targets and diagnostic biomarkers.
Recently, studies about DEGs and gene signatures of OA have been growingly reported. And aberrantly signi cant DEGs and pathways in OA have been brought to attention based on these results. For example, using Gene Expression Omnibus (GEO) datasets (https://www.ncbi.nlm.nih.gov/geo/), Li et al identi ed a total of 258 DEGs including 161 upregulated and 97 downregulated genes [13]. Pathway enrichment analysis illustrated that upregulated DEGs were mainly involved in the TNF signaling pathway and osteoclast differentiation, while downregulated DEGs were enriched in cytokine-cytokine receptor interaction and glycosphingolipid biosynthesis-globoseries. In another similar study, Sun et al discovered 1377 DEGs (869 upregulated and 508 downregulated) from the raw data [14]. However, noises, errors as well as outliers in DEG results may be obtained due to the limited sample quantities and not robust analysis methods. Gene Set Enrichment Analysis (GSEA) sequences all genes of the two groups needed to be analyzed and indicates the changing trend of expression level of genes. GSEA analyzes whether all genes of a priori de ned set are enriched at the top or bottom of this sequence list [15].
In this study, we have downloaded GSE82107 and GSE55235 from GEO database and utilized GEO2R online tool to identify the DEGs between OA and normal tissues, with the purpose to elucidate the potential key candidate genes and pathways in OA. They. Additionally, GO and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis (https://www.kegg.jp/) of DEGs by GSEA and protein-protein interaction (PPI) network construction were used for analyzing these data. We sought to gain a deeper understanding of the molecular mechanism in OA and investigate more potential target genes for OA treatment.

Identi cation of DEGs in OA
The difference between control and OA synovium tissues could be presented in the volcano plots (Fig. 1A, B) after analysis of the datasets (GSE82107 and GSE55235) with GEO2R. The analysis of GSE82107 and GSE55235 identi ed 3404 and 1523 DEGs, respectively (Fig. 1C). The Venn diagram showed that the common part among the 2 datasets included 300 DEGs.
2.2. GO and KEGG pathway enrichment analysis of DEGs in OA using GSEA GSEA was used to perform GO and KEGG analysis to explore the function and pathways of DEGs in GSE82107.
GO enrichment analysis indicated that 2745/4374 gene sets were upregulated in OA compared to normal synovium samples, while 259 gene sets were signi cantly enriched (P < 0.05), and at nominal p-value < 0.01 there were 57 gene sets. In addition, 1629/4374 gene sets are downregulated in OA, 55 gene sets were signi cantly enriched at nominal p-value < 0.05, and 7 gene sets were signi cantly enriched at nominal p-value < 0.01. The most signi cant enrichments for both up-and down-regulated gene sets in the signi cant order (size of normalized enrichment score) are listed in Table 1. Six signi cant enrichment plots are shown in Fig. 2 (Fig. 2), such as "GO_CELLULAR_RESPONSE_TO_HYDROGEN_PEROXIDE", "GO_POSITIVE_REGULATION_OF_RNA_SPLICING", "GO_GOLGI_VESICLE_TRANSPORT", "GO_CGMP_BINDING", "GO_PLASMA_LIPOPROTEIN_PARTICLE_CLEARANCE", "GO_ALDEHYDE_DEHYDROGENASE_NAD_ACTIVITY" and so on. GO enrichment analysis revealed that upregulated gene sets in OA were mainly associated with cellular response to hydrogen peroxide, RNA splicing, and golgi vesicle transport, while downregulated gene sets frequently correlates with cGMP binding, lipoprotein and so on.  Table 2. Six signi cant enrichment plots are shown in Fig. 3, such as "KEGG_P53_SIGNALING_PATHWAY", "KEGG_N_GLYCAN_BIOSYNTHESIS", "KEGG_SPHINGOLIPID_METABOLISM", "KEGG_ASCORBATE_AND_ALDARATE_METABOLISM", "KEGG_GLYCEROLIPID_METABOLISM", "KEGG_METABOLISM_OF_XENOBIOTICS_BY_CYTOCHROME_P450", and so on. According to KEGG pathway enrichment analysis, upregulated gene sets in OA were generally involved in the pathway of P53, glycan, and sphingolipid, and downregulated gene sets participated in ascorbate, glycerolipid, and xenobiotics by cytochrome P450.  (Fig. 4), and 49 edges and 17 nodes in the signi cant module (Fig. 5). Degrees ≥ 10 was considered as the criterion of judgment. Ten genes were identi ed as hub genes with Cytoscape: CYR61, PENK, GOLM1, DUSP1, ATF3, STC2, FOSB, PRSS23, TF, and TNC (Fig. 6). Heat maps showed that the OA synovium samples could be basically differentiated from the normal synovium samples by hub genes (Fig. 7A, B).

Discussion
OA is a leading osteoarticular disease with declining joint functions [2,3]. Historically, the interest in disease modi cation has concentrated on cartilage, which unfortunately has little direct improvement on with symptom experience [1]. The manifestations in the joint are well characterized by progressive loss and calci cation of articular cartilage, subchondral bone remodeling, osteophyte formation and mild to moderate in ammation of the synovial lining [3]. It's worth noting that the synovium may show signi cant changes, including mononuclear cells in ltration, in ammatory cytokines production and thickening of the synovial lining layer, even before visible cartilage degeneration has occurred. In addition, a high prevalence of synovial in ammation in all stages of OA has been con rmed by the combination of highsensitivity imaging modalities and tissue examination, with increasing studies demonstrating that synovitis is correlated with joint pain, functional impairment and may even be an independent driver of radiographic OA onset and structural progression [16]. Thus, new insights into the important roles that the synovium have in disease pathogenesis may hold great promise for future therapeutic advances.
The synovial tissues lining the diarthrodial joints surfaces include various types of cells, such as synovial broblasts and macrophages [17]. That these cells are involved in the pathogenesis of arthritis by producing proin ammatory cytokines, tumor necrosis factor and MMPs is widely recongnized [17,18]. However, previous research into OA synovium was mainly based on single gene expression analysis [19][20][21]. The genome-wide RNA expression pro ling could be useful in revealing gene functions in the disease course of such a complex syndrome. Especially pathway and network analysis would be more effective to unravel the pathogenesis of OA and even provide clues for treatment.
The expression pro ling analysis via microarray can provide information about the expression differences of thousands of genes in human genome. In this study, we used it to predict and nd the key or potential genes for OA. Two microarray datasets were screened (GSE82107 and GSE55235) and 300 shared DEGs obtained via bioinformatics analysis. Then, we submitted the expression matrix of all genes and phenotypes of the samples in GSE82107 to the GSEA software to perform the GO and KEGG enrichment analysis and to determine whether the DEGs show statistically signi cant difference between the OA and normal tissues. The GO enrichment analysis revealed that DEGs were signi cantly involved in cellular response to hydrogen peroxide, RNA splicing, golgi vesicle transport, cGMP binding, and lipoprotein. The KEGG enrichment analysis indicated that DEGs were generally associated with the pathway of P53, glycan, sphingolipid, ascorbate, glycerolipid, and xenobiotics by cytochrome P450.
The results obtained indicate that the cellular response to hydrogen peroxide (H 2 O 2 ) of synovium membrane is involved in the disease process of OA. Besides, a previous study has suggested that H 2 O 2 could modify the metabolism of synovial broblasts derived from both rheumatoid and osteoarthritic knee joints, and that the effects produced depending on the concentration of H 2 O 2 to which these cells are exposed. A biphasic response was observed in osteoarthritic cell lines. A signi cant stimulation on hyaluronic acid (HA) synthesis in the presence of low concentrations of H 2 O 2 (< 10 µmol/L) occurred, whereas an inhibitory effect of synthesis was noted at higher concentrations (> 10 µmol/L) [22]. During the course of OA, the synovial uid undergoes degradation manifesting as a decrease in the amount and the average molecular weight of HA, which is correlated with pain and dysfunction [23,24]. Notwithstanding the in vitro ndings cannot be extrapolated to the human clinical situation, high molecular weight HA is an alternative treatment for knee OA that articular injection of HA acts to restore intraarticular lubrication, consequently improving joint biomechanics [25,26]. Recent meta-analysis also suggests that HA injections are a safe and effective alternative in treating patients with symptomatic knee OA [25][26][27][28][29].
When present in excess, H 2 O 2 is thought to be a key mediator in various processes that damage cells.
Due to such reactive oxygen species (ROS) attacks, as growing evidence suggests the serious damage together with the alterations caused to physiological cell signalling lie in the core of the etiology and pathogenesis of several age-related diseases [30][31][32]. In OA, the augmented production of ROS and the signi cant reduction of antioxidant enzymes have lately been reported [33][34][35][36][37]. Accordingly, ROS may be an important contributor to the development of OA and a major risk factor in this degenerative disease besides aging [38].
In our PPI network constructed based on the DEGs, top 10 hub genes were identi ed: CYR61, PENK, GOLM1, DUSP1, ATF3, STC2, FOSB, PRSS23, TF, and TNC. Among them, CYR61, with the highest node degree as shown in Fig. 6 has gained our further focus. Cysteine-rich protein 61 (Cyr61/CCN1) is a product of an immediate early gene and a component of the extracellular matrix, playing a role in endothelial cell proliferation, differentiation, adhesion and migration [39,40]. Despite this, previous studies discovered that during the progression of rheumatoid arthritis (RA), Cyr61 dramatically enhanced interleukin-17 (IL-17) expression in broblast-like synoviocytes and promoted the production of the proin ammatory cytokine IL-1β [41,42]. Recently, Liu et al found that Cyr61 stimulated VEGF expression in osteoblasts and endothelial progenitor cells (EPCs)-primed angiogenesis, whereas Cyr61 knockdown inhibited angiogenesis in both in vitro and in vivo models. Furthermore, suppression of CYR61 ameliorated articular swelling and cartilage erosion in the ankle joint of murine collagen-induced arthritis (CIA) model [43]. Lin et al reported that Cyr61 produced from RA broblast-like synoviocytes (RA-FLS) initiated proliferation and migration of FLS, regulated IL-6 production in FLS, and induced differentiation of Th17 cell [44]. Another recent study suggested that Cyr61 promoted the production of IL-1β in FLS and induced osteoclastogenesis in RA [42]. In addition, FLS treated with Cyr61 increased IL-8 production and neutrophil in ux in RA [45]. Together, these studies indicate that Cyr61 acts as a nodal effector molecule in the pathogenesis and progression of RA. As for its role in OA, Cyr61 has been reported to induce chondrogenesis and angiogenesis during embryogenesis of mice, and regulate chondrocyte maturation during cartilage development [46,47]. But currently, the role of Cyr61 in OA synoviocytes has not been studied clearly yet. OA is generally considered as a form of chronic systemic low-grade in ammation, but in some cases the intensi cation of in ammatory lesions resembles changes seen in RA, which even hinders the differential diagnosis by means of imaging examinations [48][49][50][51]. This may have signi cant clinical implications. Therefore, the function of Cyr61 in OA might be somewhat alike but mostly different from that in RA, which needs further experimental study to con rm and analysis. The present study demonstrated that Cyr61 was differentially expressed in patients with OA and in normal group(as shown in Fig. 7), which might represent a starting point for subsequent researches into Cyr61 in OA synovium.
Compared with other studies of OA, the innovativeness of our present study is that GSEA was rst used to conduct GO and KEGG analysis to explore the functions and pathways of DEGs. Moreover, the robust DEGs based on integrated bioinformatics analysis including GO and KEGG pathway enrichment, PPI network, and module analysis, may provide reliable molecular biomarkers from a novel perspectives to help us gain a better understanding of the disease pathogenesis. However, there existed several limitations. We need further study on large sample size to validate the results, and molecular experiments are required, especially for Cyr61.
In conclusion, DEGs associated with OA were screened through the GEO database, and integrated bioinformatics analysis was performed. As a result, a total of 300 DEGs and 10 hub genes were picked out, and we chose cellular response to H 2 O 2 of synoviocytes and CYR61 to further discuss. Based on the effect of H 2 O 2 a treatment option for knee OA patients is intra-articular injection of HA. But whether Cyr61 and its related pathways may be a candidate drug target for the treatment of OA, and its underlying mechanisms need further investigation.

Access to microarray data
Two gene expression pro les, GSE82107 and GSE55235, were download from the GEO database. The array data of GSE82107 consists of an mRNA expression pro le of 10 osteoarthritis (OA) samples and 7 control samples. This gene expression pro les were generated by using Affymetrix HG-U133_plus_2 chip (Platform GPL570). The array data of GSE55235 consists of an mRNA expression pro le of 10 OA samples and 10 control samples. This gene expression pro les were generated by using Affymetrix HG-U133A chip (Platform GPL96). All OA samples were obtained from osteoarthritis synovium, while control samples were collected from normal synovium tissues of healthy individuals without any joint disease.

Identi cation of DEGs
GEO2R (https://www.ncbi.nlm.nih.gov/geo/geo2r/) is an interactive online tool from GEO database, allowing users to compare the two groups of GEO series or more sets of samples to identify DEGs [52]. Therefore we use it to distinguish DEGs between control synovium tissues and OA synovium tissue samples. If one probe set doesn't have the homologous gene, or if one gene has numerous probe sets, the data will be removed. The cut-off criteria were a P-value < 0.05 and a log (FC) ≥ 1 or log (FC) ≤ -1.

Enrichment analysis of GO and KEGG by GSEA
Gene Ontology (GO) is an ontology widely used in bioinformatics, which covers three aspects of biology, including cellular component, molecular function, and biological process [53]. Kyoto Encyclopedia of Genes and Genomes (KEGG) is a collection of databases, manually annotated by experts based on published experimental data. It constructs network to connect genes to interacted cellular molecules and aims to understand advanced functions and biological systems [54]. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis were performed for identi ed DEGs using GSEA analysis. GSEA analysis was conducted according to algorithm after importing all gene data including gene annotation les and reference function sets of both OA synovium tissues and normal synovium tissues in GSE82107 to GSEA software,. Thus a gene sequence list is obtained, then GSEA will analyze position of all genes in the list and accumulate them to get the enrichment score (ES), and after standardizing the ES, a comprehensive understanding of biological function of genes through enrichment of function sets will be obtained. P < 0.05 was set as the cut-off criterion.

Construction and analysis of Protein-protein interaction (PPI) network and signi cant module
Search Tool for the Retrieval of Interacting Genes (STRING; http://string-db.org) (version 10.5) could predict and trace out the PPI network after importing the common DEGs to the online toolbox [55]. Then we applied Cytoscape (version 3.6.1), a free visualization software to construct the PPI networks [56]. The Molecular Complex Detection (MCODE) (version 1.5.1), a plug-in of Cytoscape, was able to diccover the tightly coupled region based on the topology principles. It was performed to screen modules of PPI network with degree cut-off = 6, node score cut-off = 0.2 [57]. And the hub genes were excavated when set as degrees ≥ 10. Two heatmaps of hub genes were developed by R language software. The datasets used in this study was downloaded from the public website GEO database. And all institutional and national guidelines for the use and care of participates were followed.

Consent for publication
Not applicable.

Availability of data and materials
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests
All authors declare no competing nancial interests.   The PPI network of the common DEGs was constructed using Cytoscape.

Figure 5
The most signi cant module was obtained from PPI network of DEGs using MCODE, including 17 nodes and 49 edges.