Identi cation of RRM2 as a Potential Biomarker for the Diagnosis and Prognosis of Rheumatoid Arthritis

Wu Biao Fifth A liated Hospital of Sun Yat-sen University https://orcid.org/0000-0003-4279-4921 Yufeng Chen Fifth A liated Hospital of Sun Yat-sen University Junlong Zhong Fifth A liated Hospital of Sun Yat-sen University Shuping Zhong Fifth A liated Hospital of Sun Yat-sen University Bin Wang Fifth A liated Hospital of Sun Yat-sen University Xuegang Li Fifth A liated Hospital of Sun Yat-sen University Ning Jiang Fifth A liated Hospital of Sun Yat-sen University Jie Shang Fifth A liated Hospital of Sun Yat-sen University Xianghe Xu Fifth A liated Hospital of Sun Yat-sen University Huading Lu (  johnniehuading@163.com ) Fifth A liated Hospital of Sun Yat-sen University


Introduction
Rheumatoid arthritis (RA) is a common chronic, in ammatory, autoimmune disease which can occur at any age [1]. It is characterized by a progressive synovitis that is initiated and maintained by a complex interplay among different immune cells as well as tissue cells [2,3]. The average prevalence of RA ranges from 0.5-1.0% globally [4]. Once treatment is inadequate or delayed, RA will lead to accumulating cartilage destruction, bone erosion, irreversible physical challenges and systemic complications [1,5].
The disease is complex and is known to be prominently associated with some risk factors, including sex, genetic component and environmental factors [1]. The females are generally two to three folds more likely to be affected with RA than males [6]. The females with higher prevalence of RA is partly attributed to the dual (stimulatory and inhibitory) effects of estrogen on the immune system [7]. With the development of Genome-Wide Associated Study, plenty of genetic association studies show that RA have a strong genetic element which can account for 60% of the risk of developing RA [1,8] and more than 100 RA risk genetics loci have been identi ed, such as HLA-DRB1, CD28, STAT4, PADI4, PTPN22 as well as epigenetic modi cation and so on [4,9,10]. Other risk factors are consist of environmental factors, including inhaled pollutants, nutritional habits, chronic infection, microbiota and geographic or ethnic differences [11,12].
However, studies for some of these risk factors are not robust enough. And plenty of RA-associated susceptibility genes still need to be further explored.
In the treatment strategies of RA, early and precise diagnosis is pivotal to prevent damage from occurring, since timely treatment can protect as many as 90% patients with early RA from disease progression and subsequent joint damage [2,3]. However, among most RA patients, when the clinical manifestations such as swelling, morning stiffness and tenderness of joints are appeared evidently, the pathogenesis has begun for many years [1,13]. Moreover, there is still no ideal diagnostic criteria for RA [13]. Although the current ACR/EULAR 2010 classi cation criteria has a higher sensitivity for early recognition of RA than the older ACR 1987 classi cation system [13][14][15], it has only a speci city of 61% [2]. Currently there are some ways for clinicians to evaluate a patient with possible early RA, including inquiry symptoms, medical history, physical examination, imaging test and laboratory blood tests [16]. Yet the imaging tests, i.e., magnetic resonance imaging (MRI), ultrasonography or X-ray are very sensitive for detecting synovitis and pannus before bone erosion and cartilage degradation, but they are some limitations in speci city or routine application [3,13,17,18]. Additionally, the positive predictive value of the biomarkers (anti-CCP, RF) is moderate. The sensitivity is 67% and 70% for anti-CCP and RF, respectively [19], which remain challenges facing rheumatologists. Therefore, the development of novel diagnostic biomarkers for RA remains an unmet need.
In the present study, to identify the novel RA-associated susceptibility genes and test their potential as new biomarkers for the diagnosis and prognosis of RA, we collected PBMCs from 40 healthy human donors and 47 RA patients. Partial samples were used for RNA-seq analyses to identify the differentially expressed genes (DEGs) between RA patients and healthy donors. The PBMCs-mRNA in DEGs were further subjected to Gene Ontology (GO) enrichment analysis and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. Furthermore, the hub genes and key modules associated with RA were screened by bioinformatics analyses and made a functional exploration by GO and KEGG. Then, the expression of hub genes in RA were assessed and screened in GSE93272 dataset which included 23 healthy control subjects and 136 patients with RA. Next, real time-quantitative PCR analyses were performed to further con rm the expression of the hub genes from the PBMCs that collected from 47 RA patients and 40 healthy volunteers. To assess whether the identi ed candidate mRNAs can be biomarkers for the diagnosis and prognosis of RA, we further investigated correlation between candidate mRNAs with clinical characteristics of RA patients (Fig. 1A).

Study population and data resources
For RNA-seq analysis as well as hub genes expression validation and clinical evaluation, a total of 84 participants (47 RA patients and 40 healthy individuals) were recruited in 2019. All the RA patients met the classi cation criteria of ACR/EULAR 2010 and were recruited in The Fifth A liated Hospital of Sun-Yat-Sen University. And all the healthy controls (HCs) without renal failure, heart failure, or autoimmune disease, and free from other in ammatory conditions, were recruited from the same hospital. All the participants had examinations and evaluations of swollen and tender joints, clinical disease activity index (CDAI) and disease activity score 28 (DAS28). Laboratory tests included anti-CCP, C-reactive protein (CRP), erythrocyte sedimentation rate (ESR) and blood routine parameters. The characteristics of healthy controls and RA patients were showed in Table 1. All study protocols were approved by the ethics committee of The Fifth A liated Hospital of Sun-Yat-Sen University. All participants in this study were informed and signed written consent. For testing the expression of hub genes between RA and HCs, we are in searched of the Gene Expression Omnibus(GEO)database [20] https://www.ncbi.nlm.nih.gov/geo/ for microarray datasets using the keyword "rheumatoid arthritis". Datasets were included if they met all of the following criteria: (1) were from humans; (2) included expression data from blood mRNA of both RA and HCs; (3) the number of rows in each platform was > 50,000; (4) the number of RA samples was ≥ 20, and the number of HCs samples was ≥ 20. Finally, one dataset GSE93272 was selected [21].

Preparation of Peripheral Blood Samples and Isolation of plasma and RNA
For whole-blood transcriptome analysis, peripheral blood samples (8ml) were collected from each RA patient and healthy control into EDTA-2K-containing tubes. Peripheral blood mononuclear cells (PBMCs) were extracted by using the Histopaque-1077 (Sigma-Aldrich, UK) from blood samples. And the plasma was isolated and stored at -80℃ for Elisa assay. Follow-up total RNA was extracted from PBMCs using the Total RNA Kit I (Omega Bio-tek, USA) according to manufacturer's instructions. The concentrations of the RNA were quanti ed by NanoDrop ND-1000 spectrophotometer (ThermoFisher Scienti c, USA) and assessed via absorbance ratios of A260/A280 nm > 1.8 and RNA Integrity Number > 7.

RNA-seq analysis
Before RNA-sequencing, the quality of RNA was tested by Agilent 2100 Bioanalyzer (Agilent Technology).
2 µg of RNA sample was taken for RNA-sequencing. RNase R digested and rRNA depleted RNAs were taken to generated the sequencing libraries by using Total RNA-seq (H/M/R) Library Prep Kit for Illumina (Vazyme Biotech) according to manufacturer's instructions. The library preparations were sequenced on Hiseq X Ten (Illumina).

Identi cation of differentially expressed genes
For identi cation of the differentially expressed mRNA in all DEGs between RA patients and healthy control samples, the limma package (version 3.46.0) was performed with the prede ned criterion (|log2FoldChange| >1 and P < 0.05) [22].

Functional exploration for DEGs and key modules
To explore and visualize the potential functions of the identi ed mRNAs in DEGs and key modules, clusterPro ler package [23] in R was utilized for Gene Ontology (GO) enrichment analysis and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment.

Protein-protein interaction (PPI) network analysis
To establish the interactions of up-or down-regulated RA-related mRNAs in DEGs, we constructed PPI network by STRING V11.0 database (https://string-db.org/) [24], and subsequently visualized it by Cytoscape software v3.8.2 [25]. Typically, the most widely applied gene centrality measures are degree, closeness and betweenness. Degree depicts the overall connectivity of a node with other nodes in the network. Closeness represents the average distance from a node to all other nodes in the network. The betweenness counts the number of times a node acts as a bridge along the shortest path between two other nodes in the network.

Identi cation of hub genes and key modules
For screening hub genes, CytoHubba v0.1, a Cytoscape plug-in which can predict and explore important nodes/hubs by topological algorithms, which was used based on the genes in PPI networks [26]. Meanwhile, the Cytoscape plug-in Molecular Complex Detection v2.0.0 (MCODE) was employed to analyze the RA-associated key modules with MCODE score ≥ 5 and nodes ≥ 5 [27].

Real time-quantitative PCR (RT-qPCR) analysis
Firstly, RNA integrity was assessed prior to cDNA synthesis by agarose gel electrophoresis followed by ethidium bromide staining. Then we synthesized rst strand cDNA from total RNA templates using RevertAid Master Mix (Thermo Scienti c, Lithuania). Brie y, the reverse transcription reactions were incubated for 30min at 42℃, 5min at 95℃ and hold at 4℃. Last, real time-quantitative PCR (RT-qPCR) was performed to measure the relative genes expression on a CFX-96 Touch™ (BIO-RAD, USA) using Forget-Me-Not™ EvaGreen® qPCR Master Mix (Biotium, USA) according to the product' protocol. The primer sequences were listed in Supplementary Table S1. The expression was determined by the threshold cycle (Ct), and relative expression levels were calculated using the 2 −ΔΔCt method with GAPDH as an internal control.

ELISA
To determine the concentration of in ammatory cytokines in all plasma samples, such as IL-1α, IL-1β, IL-6, IL-8, TNF-α and IFN-γ, enzyme-linked immunosorbent assay (ELISA) was performed by corresponding ELISA Kit (Saicheng Bio-tek, China) according to the manufacturer's instruction.

Statistical Analysis
Data was tabulated using Microsoft Excel 2019 and analyzed using SPSS software v24 (IBM, USA) and Graphpad Prism v8.0.1 (GraphPad Software, Inc, San Diego, CA). Student's t-test was employed to assess the gene expression between RA cases and healthy controls in GSE93272 dataset which was normally distributed parameters. Mann-Whitney's U test was applied for skewed distribution including genes expression in RA course and states. Pearson's analysis or Spearman method was performed for testing the correlation between mRNA expression levels and the clinical variables including CDAI, DAS28, counts of swollen/tender joints and in ammatory factors, as appropriate. Receiver Operating Characteristic (ROC) Curves were conducted to evaluate sensitivity and speci city of candidate mRNAs. P < 0.05 was considered to be statistically signi cant threshold.

Identi cation of DEGs
To identify the different expression genes between control and RA, we collected whole blood samples from 4 healthy individuals and 4 RA patients,and then performed RNA-seq analyses. Analysis of the dataset revealed 178 mRNAs that were differentially expressed between the healthy controls and the rheumatoid arthritis patients (|log 2 FoldChange| >1 and P < 0.05). Among the 178 mRNAs identi ed, 124 mRNAs were upregulated and 54 mRNAs were downregulated. Their distribution was presented using a volcano plot (Fig. 1B).

Functional exploration
To further understand the functions of the mRNAs that were differentially expressed and connections among them, GO and KEGG pathway enrichment analyses were performed on the up or down-regulated mRNAs by the DAVID functional annotation clustering tool, respectively. The analysis results showed that under the "biological processes" category,up-regulated mRNAs were signi cantly enriched in nucleosome assembly, mitotic nuclear division, cell division, cellular protein metabolic process and gene silencing by RNA. On the other hand, down-regulated mRNAs were enriched in in ammatory response, immune response, cell adhesion, proteolysis, and bicarbonate transport (Supplementary Table S2 Table S2).

Protein-protein interaction (PPI) network construction and key module analysis
To further narrow and target the key regulatory mRNAs, a PPI network analysis was conducted by using STRING (Fig. 2A). The top 10 hub mRNAs with the highest degrees of interaction were identi ed with the cytoHubba in Cytoscape. These hub genes were CCNB1, CDC6, TOP2A, BIRC5, KIF2C, RRM2, CDC20, ASPM, KIF23 and DTL. Their degree, closeness and betweenness were showed in Table 2.  Table S3. All genes in the three modules were utilized to perform GO term and KEGG pathway enrichment analyses, and these terms based on their adjusted P-values are illustrated in Fig. 3. The top ve biological process terms for GO analysis were nuclear division, organelle ssion, mitotic nuclear division, sister chromatid segregation, and chromosome segregation. Interestingly, in the cellular component terms, the top ve GO terms were all enriched in condensed chromosome. In addition, the top ve molecular function terms for GO analysis were DNA replication origin binding, chemokine activity, cyclin-dependent protein serine/threonine kinase regulator activity, microtubule binding, and chemokine receptor binding. For KEGG pathway analysis, cell cycle, complement and coagulation cascades, oocyte meiosis, staphylococcus aureus infection, and progesterone-mediated oocyte maturation were mostly enriched.

Validation of the Expression of Hub Genes
All of these 10 hub genes underwent expression validation in GSE93272 dataset which included 23 healthy control patients and 136 RA patients. The results, as shown in Fig. 4A, indicated that CCNB1, CDC6, RRM2, ASPM and DTL were signi cantly up-regulated in RA samples compared to healthy control samples (P < 0.05, 0.01, 0.001, or 0.0001). However, the ve genes have been little studied on the associations with RA. To further con rm their roles in rheumatoid arthritis, we collected peripheral blood from 47 patients with RA and 40 healthy donors. Then RT-PCR analysis was performed to detect the expression of the ve mRNAs (Fig. 4B-F). Eventually, we found that ASPM, DLT and RMM2 were signi cantly up-regulated in RA group compared to healthy controls.
Clinical evaluation for the 3 candidate mRNAs Next, we assessed whether the expression of examined mRNAs could distinguish early RA patients from established RA patients and distinguish RA patients in remissive stages from that in active stages. Consistent with the validation data above, the expression of all the three mRNAs showed great difference between RA and HCs. But the expression of ASPM, DTL and RRM2 have no signi cant difference between early RA and established RA (Fig. 5A). And only RRM2 showed great capacity in discriminating between remissive RA and active RA (Fig. 5B).

Discussion
Here, we performed RNA-seq analyses of cohorts of patients with RA and HCs. By using bioinformatics analysis, this study comprehensively identi ed PBMCs differentially expressed mRNAs associated with RA and provided a novel diagnostic mRNA biomarker for rheumatoid arthritis. RRM2, as supported by both GEO dataset and clinical samples validations, importantly, had high discriminative power with the AUC of 0.990 ( sensitivity = 1.000; speci city = 0.975). Moreover, we observed that PBMCs-mRNA-RRM2 levels in RA patients was raised following exacerbation of clinical symptoms and disease activity, including CDAI, DAS28-ESR, tender joints, swollen joints and a transformation of disease state from remission to activity. These results suggested that the potential role of RRM2 as a novel biomarker for diagnosing disease and monitoring therapy outcome. As we know, the predictive value of the existing biomarkers (anti-CCP and RF) is moderate [19]. The PBMCs-RRM2 level could be used as an supplement biomarker for RA patients who have possibility of missing diagnosis due to the limitations of laboratory test results.
Early diagnosis of RA is an important step to realize a more effective prevention of the progression and subsequent damage [28]. The RF and anti-CCP are the well-known serological biomarkers for RA diagnosis. Raised serum titer of RF is associated with disease activity, longer disease duration and extraarticular manifestations [29][30][31]. And circulating anti-CCP can be detected even 10 years before the rst symptoms onset [32,33]. Therefore, the presence of RF and anti-CCP are commonly used as diagnostic biomarkers as well as a prognostic biomarkers [34]. However, the sensitivity of anti-CCP and RF is just 67% and 70%, respectively [19], which means that negative results do not exclude RA [13], which remains challenges facing rheumatologists. With the development of sequencing technology and bioinformatics analysis, many scientists try to nd out a novel highly sensitive and speci c biomarker for the diagnosis of disease. Mounting data have suggested that non-coding RNAs (ncRNAs) are playing pivotal role in regulating in ammation and autoimmunity, which can be regarded as diagnostic biomarkers for RA [35]. For example, miR-146a, miR-150 and miR-223 have be found to be high expressed in peripheral blood and joint tissues, and can be served as promising biomarkers for RA [36]. Another candidate Long Non-Coding RNA biomarker is GAPLINC, which is participated in proliferation, invasion and migration of broblast-like synoviocytes, and whose expression was increased in the peripheral blood, T cell, and synovial tissues of RA patients [37]. Intriguingly, the ncRNAs were participated in the pathogenesis of RA through the lncRNA/circRNA-miRNA-mRNA network [35]. And only a few of them have been employed in clinical diagnosis. Yet it's worth noting that many of these biomarkers only can detected from tissue, such as synovium tissues, obtained by biopsies. By contrast, those can be detectable in peripheral blood through non-invasion are more useful [38].
Ribonucleotide reductase M2 (RRM2) acts as a subunit of Ribonucleotide reductase (RR) which is important for DNA replication and damage repair via providing deoxy-ribonucleoside triphosphates (dNTPs) [39,40]. Many studied has showed that RRM2 plays a key role in tumor cell proliferation, apoptosis, DNA damage and epithelial mesenchymal [41][42][43]. Therefore, RRM2 is also considered as an important gene in tumor metastasis, progression and a promising biomarker for many diseases [44,45]. Meanwhile, previous studies indicated that both mRNA and protein of RRM2 were responsible for chemotherapy sensitivity and resistance [46], such as imatinib-based therapy resistance in chronic myeloid leukemia [47], adriamycin resistance in breast cancer [48], gemcitabine resistance in advanced lung squamous cell carcinoma [49] and so on. Interestingly, only one report about RRM2 was associated with RA, using liposome-polycation-dNA (LPd) complex loaded with RRM2 small interfering RNA, which suggested that suppressed RRM2 gene may cause the downregulation of the levels of proin ammatory cytokines TNF-α and IL-6 in RA-broblast-like synoviocytes (RA-FLSs) via increasing the levels of apoptosis and inhibiting the proliferation of RA-FLSs [50]. Evidence from this study elucidated that RRM2 could play a critical role in the pathogenesis of RA. However, we could not nd positive correlations between RRM2 and in ammatory factors including TNF-α and IL-6 in plasma. These inconsistent results indicated that RRM2 has a cell and tissue-speci c expression pattern in RA. In the light of the complex biological role of RRM2, more studies are needed to explore the mechanism and function of RRM2 in RA.
DTL (Denticleless E3 Ubiquitin Protein Ligase Homolog), also known as CDT2, is a substrate receptor for CRL4 ubiquitin ligases [51]. DTL is important in degrading some proteins that regulating DNA replication and repair and cell cycle through ubiquitin-dependent pathway [52,53]. In this study, we found DTL have a high diagnosis e ciency for RA. DLT also have signi cant correlations with IL-8 and TNF-α. However, DTL have no correlation with clinical symptoms and disease activity. Whether DTL can serve as a diagnostic biomarker for RA, the further study needs to do.
Our study has several potential limitations. The rst weakness was the population of cohorts used for high-throughput sequencing analyses was small. Secondly, the individual samples we collected and that in GSE93272 dataset used to validate the hub genes and evaluate their correlation with clinical characters were all Japanese and Chinese; more samples from other countries are needed to verify these results. Thirdly, the speci c mechanism by which RRM2 cause the pathogenesis of RA remains to be further studied. Lastly, further research is also required to explore the value of RRM2 for differential diagnosis with other autoimmune diseases, such as osteoarthritis and ankylosing spondylitis.

Conclusion
In summary, this study identi ed 3 newly discovered mRNA from PBMCs and their expression levels signi cantly increased in RA. With the validations in vivo and vitro, RRM2 showed high diagnosis power with extremely high levels of sensitivity (100%) and speci city (97.5%). And RRM2 might be used as a new biomarker for evaluating disease activity. The ndings provided a novel diagnostic and prognosis biomarker for RA.  Encyclopedia of Genes and Genomes; PPI, protein protein interaction; ROC, Receiver operating characteristic curves. CDAI, Crohn's Disease Activity Index; DAS28, Disease Activity Score 28; (B) Volcano plots showing the DEGs in the two comparison groups. The red dots represent up-regulated genes (log2FC > 1), and the blue dots represent down-regulated genes (log2FC < -1).    Veri cation of the 10 hub genes by GSE93272 dataset and patients' samples. (A). the expression of 10 hub genes in GSE93272 dataset. (B). expression of the hub genes in RA patients. Only hub genes that had signi cantly differentially expression between RA and normal control groups were listed. The green box indicated normal control group, and the orange box indicated the RA group. Mann Whitney test was performed to compare the means of two groups. *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001. ns, no signi cance. Figure 5