Mining biomarkers for the occurrence, development and therapeutic response to iniximab of rheumatoid arthritis based on weighted gene co-expression network analysis

Rheumatoid arthritis (RA) is a chronic systemic autoimmune and recurrent disease characterized by joint lesion with high disability rate and poor prognosis. Although anti-TNF therapy (iniximab) has been validated to be effective in controlling the symptoms of RA, more than one-third of patients cannot benet from anti-TNF therapy. Therefore, it is critical to explore novel biomarkers associated with the occurrence and development of the disease, and therapeutic response of iniximab treatment. In this study, we rst analyzed differentially expressed genes (DEGs) between RA patients and healthy controls. Subsequently, WGCNA was used to analyze the co-expression module (tan module) associated with the therapeutic response of iniximab treatment in RA, which further found that the genes (a total of 102) in the module tan were also the DEGs between RA patients and healthy controls (conrming that the close correlation of these genes with the occurrence and development of RA). Subsequently, enrichment analysis revealed that these 102 genes were mainly related to the body’s inammatory immune response and the functions of NOD-like receptor signaling pathway as well as miR-146a-5p. Moreover, protein-protein interaction (PPI) was constructed on the 102 genes, followed by mining the key genes in the network (a total of 14 genes) by topological analysis. Finally, we conrmed that 10 genes (OAS1, STAT1, IFIT3, IFIT2, IFI6, RSAD2, OASL, GBP2, IRF7 and IFIT1) were potential biomarkers for the occurrence and development, as well as the therapeutic response against iniximab treatment in RA by analyzing the differences in the expressions of these genes between RA and healthy control samples, as well as drug response and non-response samples. Therefore, this study can provide guidance on medication selection and prognostic prediction of RA patients for clinicians, which can also identify relevant genes to provide therapeutic targets for precise medicine.


Introduction
Rheumatoid arthritis (RA) is the most prevalent autoimmune-in ammatory arthritis, which affects up to 1% of the global population (1). RA is pathologically characterized by the chronic in ltration of immune cells in the synovial membrane, thereby causing progressively destructive joint cartilage and bone (2,3).
Tumor Necrosis Factor inhibitors (in iximab) has been considered as the most notable medication to treat RA (4), which has signi cantly altered the prognosis of massive RA patients, provided notably alleviated clinical signs and symptoms, promoted quality of life and protected the synovial joint integrity in the long term. However, 30-40% RA patients undergoing in iximab treatment fail to show signi cantly improved clinical outcomes (5,6). In addition, the underlying biological mechanisms of the above differential response to in iximab remains rarely determined. Therefore, a more comprehensive understanding of a lack of response could not only assist in personalized treatment, but also shed novel light on molecular heterogeneity of RA. In addition, it also helps to provide novel, potential therapeutic targets for patients with poor therapeutic response to in iximab.
To date, great efforts have been made to explore the relevant biomarkers for therapeutic response in rheumatic diseases, aiming to provide further bene t for personalized therapy as well as curative outcomes, including proteins, microRNAs, genes, and etc (7). Recent high-capacity genomics and transcriptomics technologies, such as gene microarrays which could comprehensively and simultaneously assess the expression pro les of massive genes, have been critically involved in the identi cation of biomarkers for disease behavior as well as prediction of therapeutic response (8,9). Thus, a small number of genome-wide transcriptomic researches have concentrated on the RA synovium or bone tissue, aiming at the investigation of the biological processes related to anti-TNF response (10)(11)(12)(13)(14). Nevertheless, the gene expression signatures from these studies are moderately related to the therapeutic outcomes, thereby leading to relatively low overlapping of differentially expressed genes (DGEs) among these studies, which indicates the presence of highly biological variability among RA population (15). Therefore, the identi cation of the biological variation source would be a major advance toward personalized treatment in RA (16). In addition, samples of bone and synovium are not easily obtained compared to blood sample in clinical practice. Therefore, it is critical to nd markers of blood samples associated with the occurrence and development of the disease, as well as therapeutic response of RA.
In this study, weighted gene co-expression network analysis (WGCNA) was used to mine biomarkers of blood samples associated with the therapeutic response against in iximab in RA patients and to analyze the differences of these genes between RA patients and healthy control samples. Our ndings could provide guidance for medication selection and prognostic prediction on RA patients for clinicians, which could also identify relevant genes to provide therapeutic targets for precise medicine.

Materials And Methods
Pre-processing and data analysis process of preliminary sample data We searched the GEO database using the keywords "rheumatoid arthritis" and "blood". As a result, the whole blood sample dataset GSE93272 (17) with the largest sample of RA patients and healthy controls were selected, with the platform of Affymetrix Human Genome U133 Plus 2.0 Array, including 43 healthy controls and 232 RA samples (shown in Tables S1). Meanwhile, the whole blood sample dataset GSE78068 with the largest sample size targeting the drug response of in iximab was selected, with the platform of Agilent-014850 Whole Human Genome Microarray 4x44K G4112F, containing 98 samples with unresponsiveness against in iximab treatment and 42 samples with responsiveness against in iximab treatment (shown in Table S2). Subsequently, the expression data was pre-processed. The probes correspond to the genes, where median was used when multiple probes corresponded to one gene, while a speci c probe was removed when the probe corresponded to multiple genes.
WGCNA analysis of gene co-expression modules related to in iximab response in RA patients The expression pro ling data for in iximab response were obtained from GSE78068 (containing a total of 140 samples), followed by mining co-expression modules using the WGCNA co-expression algorithm.
First, low-abundance genes or genes with low fold change were removed, as described in the following: 1. Calculate the median of gene expression in each sample; 2. Calculate the Median Absolute Deviation (MAD) of gene expression in each sample; 3. Select genes within both top 75% of median and top 75% MAD ordering. Pearson correlation coe cient was further used to calculate the distance between each gene, and the R-package WGCNA (18) was used to construct the weight co-expression network (threshold was set at 12) to screen the co-expression modules. As a result, the co-expression network was shown to conform to the scale-free network, that was to say, the logarithm log(k) of the node with the connection degree k was negatively correlated with the logarithm log(P(k)) of the probability of the node, and the correlation coe cient should be greater than 0.8. Afterwards, the expression matrix was converted into adjacency matrix, which was further converted into topological matrix. Based on TOM, the averagelinkage hierarchical clustering method was used for gene clustering, and the number of genes in the gene network module was set at least 30 according to the criteria of the hybrid dynamic cut tree. After determination of the gene module using the dynamic shear method, we sequentially calculated the eigengenes of each module, then performed clustering analysis on modules, where closer modules were merged into new modules (height=0.25, deepSplit = 2, minModuleSize = 30). The obtained gene modules were used to calculate the correlation of its eigengenes with the therapeutic response of in iximab.

Construction of gene interaction network and gene function analysis
The genes were mapped into the String database, and gene-gene interactions were obtained at a threshold of >0.4, followed by visualization using Cytoscape software and calculation of the topological parameters of the network nodes using cytohubba (19). Moreover, enrichment analysis of GO, KEGG, reactom pathway and miRNA was performed by using g:pro ler package (20) to investigate the signaling pathways, miRNAs and biological processes affected by the gene groups.

Results
GSEA enrichment analysis on DEGs of blood samples between healthy controls and RA patients R package limma(21) was used to analyze the DEGs between RA and healthy controls. In order to include more DEGs, FDR <0.05 was selected as the threshold and the volcano map as shown in Fig.1A. Moreover, the fold differences of gene expression were used as the rank ordering, and GSEA was used for enrichment analysis of DEGs. As shown in Fig1B and C, compared with the healthy control sample, the up-regulated DEGs were mainly concentrated in the region with large difference folds, while the downregulated DEGs were mainly concentrated in the region with small difference folds.
Identi cation of in iximab therapeutic response-related gene co-expression modules for RA To further explore the biomarkers associated with the in iximab response, the expression pro le data of GSE78068 was obtained (containing 140 samples). Afterwards, 7692 DEGsbetween samples with unresponsiveness and responsiveness against in iximab treatment were extracted and R software package WGCNA was subsequently used to screen the co-expression module. As a result, the coexpression network was shown to conform to the scale-free network, that was to say, the logarithm log(k) of the node with the connection degree k was negatively correlated with the logarithm log(P(k)) of the probability of the node, and the correlation coe cient should be greater than 0.8. To ensure that the network was a scale-free network, β=12 was set ( Fig. 2A and B). A total of 19 modules were obtained (Fig.  2C), where the grey module was a collection of genes that cannot be aggregated into other modules. The transcript statistics of each module was shown in Table 1, showing that 4198 transcripts were assigned to 18 co-expression modules. We subsequently calculated the correlation between the eigenvector of the 19 modules and the therapeutic response of in iximab. As shown in Fig. 2D, the tan module exhibited the most signi cantly negative correlation with the therapeutic response of in iximab, suggesting that the genes in the tan module might be closely related to the therapeutic response of in iximab. To further investigate the correlation between the genes in the tan module and the therapeutic response of in iximab, we analyzed the difference of the eigenvector of the tan module between in iximabresponsive samples and in iximab-unresponsive samples. As shown in Fig.3A, the eigenvector of the tan module was signi cantly higher in the in iximab-unresponsive samples than that in the in iximabresponsive samples, suggesting the expression levels of genes in the tan module may be more active in in iximab-unresponsive samples. We subsequently analyzed the difference of the levels of 102 genes of the tan module between in iximab-responsive samples and in iximab-unresponsive samples. As shown in Fig.3B, the gene expression levels were signi cantly higher in the in iximab-unresponsive samples than that in the in iximab-responsive samples, consistent with our expectation. Finally, we analyzed the differential expression of these genes in RA patients and healthy control samples. As shown in Fig.3C, the expression levels of these genes in the blood samples of RA patients were signi cantly higher than those of healthy controls, suggesting that the 102 genes in the tan module were not only related to the therapeutic response of in iximab in RA patients, but also critically involved in the occurrence and development of RA.
Functional enrichment analysis of the 102 genes in tan module To investigate the gene functions in the tan module, the genes in the tan module were mapped to the g:pro ler online analysis platform for enrichment analysis, with threshold FDR<0.05. As a result, these genes could be enriched into multiple GO terms, including 91 biological processes, mainly related to immune responses and stimuli defense (as shown in Fig.4A); ve cellular components, mainly associated with cytoplasma and in ammasome (as shown in Fig.4B); ve molecular functions, mainly related to protein and RNA binding (as shown in Fig.4C). Subsequently, these 102 genes were mainly enriched into seven KEGG pathways, mainly associated with in ammatory infection, NOD-like receptor signaling pathway (as shown in Fig.4D); and 13 reactom pathways, mainly correlated with immunity and the interferon signaling pathway (as shown in Fig.4E). Finally, the miRNAs enrichment analysis of these genes showed that 16 out of the 102 genes could be enriched into hsa-miR-146a-5p (as shown in Fig.4F). hsa-miR-146a-5p has been validated to be closely related to the occurrence and development of various complex diseases (22,23). The serum level of hsa-miR-146a-5p has been reported as a diagnostic marker for RA and predictive marker for clinical response to TNF inhibitor (24,25). As we all known, RA is a chronic systemic autoimmune disease characterized by joint lesions. In our study, the genes in the gene co-expression module tan, which was most signi cantly correlated with the therapeutic response of in iximab, are involved in the multiple immunological processes and relevant signaling transduction.
Mining and validation of key genes associated with the occurrence, development and in iximab response of RA diseases In order to identify key gene markers, genes in the tan module were mapped to the String database, and combine score>0.4 was selected, which nally gave rise to 330 interactions of 63 genes (as shown in last.ppi.cm0.4.txt), followed by Cytoscope visualization (as shown in Fig.5A). The statistics of the degree of genes in the network were shown in Fig.5B, revealing relatively high degree of most genes, with an average of 10.47, which suggested the relatively close interaction among these genes. The degree distribution of the network was shown in Fig.5C, showing that the degree of most genes was relatively small, while the degree of a few genes was relatively large, which indicated that the close gene-gene interactions in the network may be dominated by a small number of genes with large degree. The Cytoscope plug-in cytohubba was subsequently used to analyze the topological properties of each node, and genes simultaneously within the top 20 of the degree, top 20 of the Closeness, and top 20 of the Betweenness were selected (as shown in lst.gene.txt) as key genes correlated with the occurrence, development and in iximab response in RA.
Afterwards, to con rm the relationship between these 14 genes and the therapeutic response of in iximab, the differential expression of these 14 genes between the in iximab-responsive samples and the in iximab-unresponsive samples of RA patients was analyzed. As shown in Fig.6, there were signi cant difference of 10 genes between the two groups, and the expression levels of the 10 genes (OAS1, STAT1, IFIT3, IFIT2, IFI6, RSAD2, OASL, GBP2, IRF7 and IFIT1)were signi cantly lower in the in iximab-responsive samples than those in the in iximab-unresponsive samples. This nding indicated that these genes were associated with the therapeutic response of in iximab, which could be used as a blood biomarker for individualized treatment of in iximab (even anti-TNF therapy) in RA patients.
Finally, in order to investigate the relationship of these 14 genes with the occurrence and development of RA, the expression difference of the 14 genes were analyzed between RA samples and healthy control samples. As shown in Fig. 7, there were signi cant difference of the 14 genes between the two groups, and the expression levels of these genes were signi cantly higher in RA samples than those in healthy controls, suggesting that these 14 genes were associated with the occurrence and development of RA, which could be used as a blood diagnostic marker or new therapeutic targets for RA.
As a summary (Fig.8), these 14 genes play an important role in the occurrence and development of RA, and further high-expression of 10 out of them will lead to a decreased responsiveness of in iximab in RA patients.

Discussion
RA is a chronic systemic autoimmune disease characterized by joint lesion. RA is mainly clinically manifested as joint swelling and pain caused by the synovial membrane of the facet joint, followed by cartilage destruction, narrowing of the joint space, and joint stiffness, deformity and dysfunction due to severe bone destruction and absorption in the advanced stage. The prevalence of RA in China is 0.24-0.5%, with more female patients than male patients, with a ratio of 2-3:1. RA can occur at any age, and is most commonly seen in patients with 20-50 years old (26). RA is mostly a recurrent disease with high disability rate and poor prognosis (27). To date, anti-TNF therapy has been validated as an effective approach to treat RA. Nevertheless, relevant studies have suggested the responsive heterogeneity of in iximab among different RA patient, the identi cation of biomarkers correlated with the occurrence, development and in iximab response might play critical role in the identi cation of patients with relatively low responsive rate to in iximab, which could thereby, render the timely selection of alternative medication without delaying any effective therapy.
To implement individualized therapy and to enhance the therapeutic outcomes in patients with RA, great efforts have been made in the identi cation of various predictors for therapeutic response, including miRNAs, proteins, genes and single nucleotide polymorphisms based on miRNomics, proteomics, transcriptomics and genomics, respectively (28,29). Nevertheless, researches only based on the differential expression analysis of "omics" data is unlikely to be su cient to completely illustrate all alterations throughout the progression and therapeutic response of RA. To this end, in this study, we systematically integrated the differential expression based on expression pro le data, constructed gene co-expression modules, and analyzed the topological characteristics of gene-gene interaction network. To begin with, our study renders the su cient usage of gene co-expression information based on microarray data, thereby providing more impressive information compared to expression changes of individual genes for target gene identi cation. Secondly, network analysis is considered as a potent method for a more comprehensive and in-depth understanding of the pathogenesis, progression and therapeutic response of multiple disease. The integration of the topological characteristics of biological network renders the addition of previously missing data in analyzing differential expression, thereby identifying the potential gene biomarkers associated with the occurrence, development and the therapeutic response of in iximab in RA.
The above bioinformatics analysis revealed 14 potential biomarkers from blood samples. These genes are signi cantly up-regulated in the development of RA, and moreover, the expression levels of 10 genes (OAS1, STAT1, IFIT3, IFIT2, IFI6, RSAD2, OASL, GBP2, IRF7 and IFIT1) are also negatively associated with the therapeutic response of in iximab in RA patients. Most of these genes (nine out of tengenes) have been reported to be involved in the occurrence, development and therapeutic response of RA (30)(31)(32)(33)(34), and physio pathological processes of the immune system(35), con rming the results of bioinformatics mining based on the RA dataset from the GEO database are highly credible and accurate. However, the correlation of the one remaining genes (GBP2), which have the third most signi cant differential expression between the in iximab-responsive samples and in iximab-unresponsive sampleswith RA, has not been con rmed by basic and clinical studies and attracted us to carry out a further study.
In this study, WGCNA is used to mine biomarkers from blood samples associated with the therapeutic response of in iximab in RA patients, followed by analysis on the difference of these genes in RA patients and healthy controls. Our present outcomes could provide guidance on medication selection and prognostic prediction of RA patients for clinicians, which can also identify relevant genes to provide therapeutic targets for precise medicine.

Declarations
Supplemental Information Note Additional le1 Table S1.xlsx, 43 healthy controls and 232 RA samples, which were selected from the platform of Affymetrix Human Genome U133 Plus 2.0 Array.
Additional le2 Table S2.xlsx, 98 samples with unresponsiveness against in iximab treatment and 42 samples with responsiveness against in iximab treatment, which were selected from the platform of Agilent-014850 Whole Human Genome Microarray 4x44K G4112F.    The differential expression of the 14 hub genes between the in ixi-mab-responsive and unresponsive samples of RA patients. The expression difference of the 14 hub genes between RA samples and healthy control samples.