DOI: https://doi.org/10.21203/rs.3.rs-19490/v1
Rheumatoid arthritis (RA) is the most prevalent autoimmune-inflammatory arthritis, which affects up to 1% of the global population(1). RA is pathologically characterized by the chronic infiltration of immune cells in the synovial membrane, thereby causing progressively destructive joint cartilage and bone(2, 3). Tumor Necrosis Factor inhibitors (infliximab) has been considered as the most notable medication to treat RA(4), which has significantly altered the prognosis of massive RA patients, provided notably alleviated clinical signs and symptoms, promoted quality of life and protected the synovial joint integrity in the long term. However, 30–40% RA patients undergoing infliximab treatment fail to show significantly improved clinical outcomes(5, 6). In addition, the underlying biological mechanisms of the above differential response to infliximab remains rarely determined. Therefore, a more comprehensive understanding of a lack of response could not only assist in personalized treatment, but also shed novel light on molecular heterogeneity of RA. In addition, it also helps to provide novel, potential therapeutic targets for patients with poor therapeutic response to infliximab.
To date, great efforts have been made to explore the relevant biomarkers for therapeutic response in rheumatic diseases, aiming to provide further benefit for personalized therapy as well as curative outcomes, including proteins, microRNAs, genes, and etc(7). Recent high-capacity genomics and transcriptomics technologies, such as gene microarrays which could comprehensively and simultaneously assess the expression profiles of massive genes, have been critically involved in the identification of biomarkers for disease behavior as well as prediction of therapeutic response(8, 9). Thus, a small number of genome-wide transcriptomic researches have concentrated on the RA synovium or bone tissue, aiming at the investigation of the biological processes related to anti-TNF response(10–14). Nevertheless, the gene expression signatures from these studies are moderately related to the therapeutic outcomes, thereby leading to relatively low overlapping of differentially expressed genes (DGEs) among these studies, which indicates the presence of highly biological variability among RA population(15). Therefore, the identification of the biological variation source would be a major advance toward personalized treatment in RA(16). In addition, samples of bone and synovium are not easily obtained compared to blood sample in clinical practice. Therefore, it is critical to find markers of blood samples associated with the occurrence and development of the disease, as well as therapeutic response of RA.
In this study, weighted gene co-expression network analysis (WGCNA) was used to mine biomarkers of blood samples associated with the therapeutic response against infliximab in RA patients and to analyze the differences of these genes between RA patients and healthy control samples. Our findings could provide guidance for medication selection and prognostic prediction on RA patients for clinicians, which could also identify relevant genes to provide therapeutic targets for precise medicine.
Pre-processing and data analysis process of preliminary sample data
We searched the GEO database using the keywords “rheumatoid arthritis” and “blood”. As a result, the whole blood sample dataset GSE93272(17) with the largest sample of RA patients and healthy controls were selected, with the platform of Affymetrix Human Genome U133 Plus 2.0 Array, including 43 healthy controls and 232 RA samples (shown in Tables S1). Meanwhile, the whole blood sample dataset GSE78068 with the largest sample size targeting the drug response of infliximab was selected, with the platform of Agilent-014850 Whole Human Genome Microarray 4x44K G4112F, containing 98 samples with unresponsiveness against infliximab treatment and 42 samples with responsiveness against infliximab treatment (shown in Table S2). Subsequently, the expression data was pre-processed. The probes correspond to the genes, where median was used when multiple probes corresponded to one gene, while a specific probe was removed when the probe corresponded to multiple genes.
WGCNA analysis of gene co-expression modules related to infliximab response in RA patients
The expression profiling data for infliximab response were obtained from GSE78068 (containing a total of 140 samples), followed by mining co-expression modules using the WGCNA co-expression algorithm. First, low-abundance genes or genes with low fold change were removed, as described in the following: 1. Calculate the median of gene expression in each sample; 2. Calculate the Median Absolute Deviation (MAD) of gene expression in each sample; 3. Select genes within both top 75% of median and top 75% MAD ordering. Pearson correlation coefficient was further used to calculate the distance between each gene, and the R-package WGCNA(18) was used to construct the weight co-expression network (threshold was set at 12) to screen the co-expression modules. As a result, the co-expression network was shown to conform to the scale-free network, that was to say, the logarithm log(k) of the node with the connection degree k was negatively correlated with the logarithm log(P(k)) of the probability of the node, and the correlation coefficient should be greater than 0.8. Afterwards, the expression matrix was converted into adjacency matrix, which was further converted into topological matrix. Based on TOM, the average-linkage hierarchical clustering method was used for gene clustering, and the number of genes in the gene network module was set at least 30 according to the criteria of the hybrid dynamic cut tree. After determination of the gene module using the dynamic shear method, we sequentially calculated the eigengenes of each module, then performed clustering analysis on modules, where closer modules were merged into new modules (height=0.25, deepSplit = 2, minModuleSize = 30). The obtained gene modules were used to calculate the correlation of its eigengenes with the therapeutic response of infliximab.
Construction of gene interaction network and gene function analysis
The genes were mapped into the String database, and gene-gene interactions were obtained at a threshold of >0.4, followed by visualization using Cytoscape software and calculation of the topological parameters of the network nodes using cytohubba(19). Moreover, enrichment analysis of GO, KEGG, reactom pathway and miRNA was performed by using g:profiler package(20) to investigate the signaling pathways, miRNAs and biological processes affected by the gene groups.
GSEA enrichment analysis on DEGs of blood samples between healthy controls and RA patients
R package limma(21) was used to analyze the DEGs between RA and healthy controls. In order to include more DEGs, FDR <0.05 was selected as the threshold and the volcano map as shown in Fig.1A. Moreover, the fold differences of gene expression were used as the rank ordering, and GSEA was used for enrichment analysis of DEGs. As shown in Fig1B and C, compared with the healthy control sample, the up-regulated DEGs were mainly concentrated in the region with large difference folds, while the down-regulated DEGs were mainly concentrated in the region with small difference folds.
Identification of infliximab therapeutic response-related gene co-expression modules for RA
To further explore the biomarkers associated with the infliximab response, the expression profile data of GSE78068 was obtained (containing 140 samples). Afterwards, 7692 DEGsbetween samples with unresponsiveness and responsiveness against infliximab treatment were extracted and R software package WGCNA was subsequently used to screen the co-expression module. As a result, the co-expression network was shown to conform to the scale-free network, that was to say, the logarithm log(k) of the node with the connection degree k was negatively correlated with the logarithm log(P(k)) of the probability of the node, and the correlation coefficient should be greater than 0.8. To ensure that the network was a scale-free network, β=12 was set (Fig. 2A and B). A total of 19 modules were obtained (Fig. 2C), where the grey module was a collection of genes that cannot be aggregated into other modules. The transcript statistics of each module was shown in Table 1, showing that 4198 transcripts were assigned to 18 co-expression modules. We subsequently calculated the correlation between the eigenvector of the 19 modules and the therapeutic response of infliximab. As shown in Fig. 2D, the tan module exhibited the most significantly negative correlation with the therapeutic response of infliximab, suggesting that the genes in the tan module might be closely related to the therapeutic response of infliximab.
Table1:Statistics on the number of genes corresponding to each module
Module |
Count |
black |
219 |
blue |
444 |
brown |
442 |
cyan |
69 |
green |
380 |
greenyellow |
120 |
grey |
3494 |
grey60 |
47 |
lightcyan |
62 |
lightgreen |
43 |
magenta |
181 |
midnightblue |
67 |
pink |
202 |
purple |
130 |
red |
317 |
salmon |
75 |
tan |
102 |
turquoise |
869 |
yellow |
429 |
To further investigate the correlation between the genes in the tan module and the therapeutic response of infliximab, we analyzed the difference of the eigenvector of the tan module between infliximab-responsive samples and infliximab-unresponsive samples. As shown in Fig.3A, the eigenvector of the tan module was significantly higher in the infliximab-unresponsive samples than that in the infliximab-responsive samples, suggesting the expression levels of genes in the tan module may be more active in infliximab- unresponsive samples. We subsequently analyzed the difference of the levels of 102 genes of the tan module between infliximab-responsive samples and infliximab-unresponsive samples. As shown in Fig.3B, the gene expression levels were significantly higher in the infliximab-unresponsive samples than that in the infliximab-responsive samples, consistent with our expectation. Finally, we analyzed the differential expression of these genes in RA patients and healthy control samples. As shown in Fig.3C, the expression levels of these genes in the blood samples of RA patients were significantly higher than those of healthy controls, suggesting that the 102 genes in the tan module were not only related to the therapeutic response of infliximab in RA patients, but also critically involved in the occurrence and development of RA.
Functional enrichment analysis of the 102 genes in tan module
To investigate the gene functions in the tan module, the genes in the tan module were mapped to the g:profiler online analysis platform for enrichment analysis, with threshold FDR<0.05. As a result, these genes could be enriched into multiple GO terms, including 91 biological processes, mainly related to immune responses and stimuli defense (as shown in Fig.4A); five cellular components, mainly associated with cytoplasma and inflammasome (as shown in Fig.4B); five molecular functions, mainly related to protein and RNA binding (as shown in Fig.4C). Subsequently, these 102 genes were mainly enriched into seven KEGG pathways, mainly associated with inflammatory infection, NOD-like receptor signaling pathway (as shown in Fig.4D); and 13 reactom pathways, mainly correlated with immunity and the interferon signaling pathway (as shown in Fig.4E). Finally, the miRNAs enrichment analysis of these genes showed that 16 out of the 102 genes could be enriched into hsa-miR-146a-5p (as shown in Fig.4F). hsa-miR-146a-5p has been validated to be closely related to the occurrence and development of various complex diseases(22, 23). The serum level of hsa-miR-146a-5p has been reported as a diagnostic marker for RA and predictive marker for clinical response to TNF inhibitor(24, 25). As we all known, RA is a chronic systemic autoimmune disease characterized by joint lesions. In our study, the genes in the gene co-expression module tan, which was most significantly correlated with the therapeutic response of infliximab, are involved in the multiple immunological processes and relevant signaling transduction.
Mining and validation of key genes associated with the occurrence, development and infliximab response of RA diseases
In order to identify key gene markers, genes in the tan module were mapped to the String database, and combine score>0.4 was selected, which finally gave rise to 330 interactions of 63 genes (as shown in last.ppi.cm0.4.txt), followed by Cytoscope visualization (as shown in Fig.5A). The statistics of the degree of genes in the network were shown in Fig.5B, revealing relatively high degree of most genes, with an average of 10.47, which suggested the relatively close interaction among these genes. The degree distribution of the network was shown in Fig.5C, showing that the degree of most genes was relatively small, while the degree of a few genes was relatively large, which indicated that the close gene-gene interactions in the network may be dominated by a small number of genes with large degree. The Cytoscope plug-in cytohubba was subsequently used to analyze the topological properties of each node, and genes simultaneously within the top 20 of the degree, top 20 of the Closeness, and top 20 of the Betweenness were selected (as shown in lst.gene.txt) as key genes correlated with the occurrence, development and infliximab response in RA.
Afterwards, to confirm the relationship between these 14 genes and the therapeutic response of infliximab, the differential expression of these 14 genes between the infliximab-responsive samples and the infliximab-unresponsive samples of RA patients was analyzed. As shown in Fig.6, there were significant difference of 10 genes between the two groups, and the expression levels of the 10 genes (OAS1, STAT1, IFIT3, IFIT2, IFI6, RSAD2, OASL, GBP2, IRF7 and IFIT1)were significantly lower in the infliximab-responsive samples than those in the infliximab-unresponsive samples. This finding indicated that these genes were associated with the therapeutic response of infliximab, which could be used as a blood biomarker for individualized treatment of infliximab (even anti-TNF therapy) in RA patients.
Finally, in order to investigate the relationship of these 14 genes with the occurrence and development of RA, the expression difference of the 14 genes were analyzed between RA samples and healthy control samples. As shown in Fig. 7, there were significant difference of the 14 genes between the two groups, and the expression levels of these genes were significantly higher in RA samples than those in healthy controls, suggesting that these 14 genes were associated with the occurrence and development of RA, which could be used as a blood diagnostic marker or new therapeutic targets for RA.
As a summary (Fig.8), these 14 genes play an important role in the occurrence and development of RA, and further high-expression of 10 out of them will lead to a decreased responsiveness of infliximab in RA patients.
RA is a chronic systemic autoimmune disease characterized by joint lesion. RA is mainly clinically manifested as joint swelling and pain caused by the synovial membrane of the facet joint, followed by cartilage destruction, narrowing of the joint space, and joint stiffness, deformity and dysfunction due to severe bone destruction and absorption in the advanced stage. The prevalence of RA in China is 0.24–0.5%, with more female patients than male patients, with a ratio of 2–3:1. RA can occur at any age, and is most commonly seen in patients with 20–50 years old (26). RA is mostly a recurrent disease with high disability rate and poor prognosis (27). To date, anti-TNF therapy has been validated as an effective approach to treat RA. Nevertheless, relevant studies have suggested the responsive heterogeneity of infliximab among different RA patient, the identification of biomarkers correlated with the occurrence, development and infliximab response might play critical role in the identification of patients with relatively low responsive rate to infliximab, which could thereby, render the timely selection of alternative medication without delaying any effective therapy.
To implement individualized therapy and to enhance the therapeutic outcomes in patients with RA, great efforts have been made in the identification of various predictors for therapeutic response, including miRNAs, proteins, genes and single nucleotide polymorphisms based on miRNomics, proteomics, transcriptomics and genomics, respectively(28, 29). Nevertheless, researches only based on the differential expression analysis of “omics” data is unlikely to be sufficient to completely illustrate all alterations throughout the progression and therapeutic response of RA. To this end, in this study, we systematically integrated the differential expression based on expression profile data, constructed gene co-expression modules, and analyzed the topological characteristics of gene-gene interaction network. To begin with, our study renders the sufficient usage of gene co-expression information based on microarray data, thereby providing more impressive information compared to expression changes of individual genes for target gene identification. Secondly, network analysis is considered as a potent method for a more comprehensive and in-depth understanding of the pathogenesis, progression and therapeutic response of multiple disease. The integration of the topological characteristics of biological network renders the addition of previously missing data in analyzing differential expression, thereby identifying the potential gene biomarkers associated with the occurrence, development and the therapeutic response of infliximab in RA.
The above bioinformatics analysis revealed 14 potential biomarkers from blood samples. These genes are significantly up-regulated in the development of RA, and moreover, the expression levels of 10 genes (OAS1, STAT1, IFIT3, IFIT2, IFI6, RSAD2, OASL, GBP2, IRF7 and IFIT1) are also negatively associated with the therapeutic response of infliximab in RA patients. Most of these genes (nine out of tengenes) have been reported to be involved in the occurrence, development and therapeutic response of RA(30–34), and physio pathological processes of the immune system(35), confirming the results of bioinformatics mining based on the RA dataset from the GEO database are highly credible and accurate. However, the correlation of the one remaining genes (GBP2), which have the third most significant differential expression between the infliximab-responsive samples and infliximab-unresponsive sampleswith RA, has not been confirmed by basic and clinical studies and attracted us to carry out a further study.
In this study, WGCNA is used to mine biomarkers from blood samples associated with the therapeutic response of infliximab in RA patients, followed by analysis on the difference of these genes in RA patients and healthy controls. Our present outcomes could provide guidance on medication selection and prognostic prediction of RA patients for clinicians, which can also identify relevant genes to provide therapeutic targets for precise medicine.
Acknowledgements
Not applicable.
Funding.
No
Availability of data and materials
The datasets used during the present study are available from the corresponding author upon reasonable request.
Authors’ contributions
J.W. and N.Z. designed the experiment. J.W., D.C. and N.Z. collected and analyzed the data. J.W., Q.F., Z.J.W. and H.Z wrote the manuscript. All authors discussed the results and contributed to the manuscript.
Ethics approval and consent to participate
Not applicable.
Patient consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional file1:Table S1.xlsx, 43 healthy controls and 232 RA samples, which were selected from the platform of Affymetrix Human Genome U133 Plus 2.0 Array.
Additional file2:Table S2.xlsx, 98 samples with unresponsiveness against infliximab treatment and 42 samples with responsiveness against infliximab treatment, which were selected from the platform of Agilent-014850 Whole Human Genome Microarray 4x44K G4112F.