Gene-regulatory network study of rheumatoid arthritis in single-cell chromatin landscapes of peripheral blood mononuclear cells

Rheumatoid arthritis is a chronic autoinammatory disease with an elusive etiology. Assays for transposase-accessible chromatin with single-cell sequencing (scATAC-seq) contribute to the progress in epigenetic studies. However, the impact of epigenetic technology on autoimmune diseases has not been objectively analyzed. Therefore, scATAC-seq was performed to generate a high-resolution map of accessible loci in peripheral blood mononuclear cells (PBMCs) of RA patients at the single-cell level. The purpose of our project was to discover the transcription factors (TFs) that were involved in the pathogenesis of RA at single-cell resolution. In our research, we obtained 22 accessible chromatin patterns. Then, 10 key TFs were involved in the RA pathogenesis by regulating the activity of MAP kinase. Consequently, two genes (PTPRC, SPAG9) regulated by 10 key TFs were found that may be associated with RA disease pathogenesis and these TFs were obviously enriched in RA patients (p<0.05, FC>1.2). With further qPCR validation on PTPRC and SPAG9 in monocytes, we found differential expression of these two genes, which were regulated by eight TFs (ZNF384, HNF1B, DMRTA2, MEF2A, NFE2L1, CREB3L4 (var. 2), FOSL2::JUNB (var. 2), MEF2B). What is more, the eight TFs showed highly accessible binding sites in RA patients. These ndings demonstrate the value of using scATAC-seq to reveal transcriptional regulatory variation in RA-derived PBMCs, providing insights on therapy from an epigenetic perspective. Chromatin specialization via the scATAC-seq (a) workow of the assay transposase accessible chromatin in single-cell sequencing in human PBMCs; (b) clustering of single nuclei accessibility proles of the ve cell groups in healthy controls and rheumatoid arthritis patients; (c) Open chromatin signals around marker genes of four clusters, including natural killer cells (NK), T lymphocytes (T cell), B lymphocytes (B-cell) and monocytes.


Introduction
Rheumatoid arthritis (RA) is a chronic autoin ammatory disease distinguished by autoantibodies to citrullinated proteins and affecting approximately 1% of the world's population. Without su cient treatment, RA can contribute to disabling joint damage and systemic disorders due to chronic synovial in ammation with synovial cell proliferation and pannus formation [1,2] .
Mounting evidence indicates that epigenetic modi cations play signi cant roles in regulating RA pathogenesis [3] . We currently know that more than 100 loci are associated with RA risks, such as HLA RB1, TRAF1, PSORS1C1, and microRNA 146a, which are variously related to joint damage [4] . Most recently, Christopher Loh successfully applied an assay for transposase-accessible chromatin by highthroughput sequencing (ATAC-seq) to discover a novel mechanism in which some motifs in approximately 80 genes were more accessible, which facilitated sustained chromatin activation by tumor necrosis factor (TNF), leading to continuous uninterrupted synovitis in RA [5] . Simultaneously, using ATACseq, Krishna V con rmed that JQ1 can inhibit the proliferation and activation of broblast-like synoviocytes (FLS) from RA patients in vitro, which indicated an alternative treatment for RA [6] .
To better understand RA etiology, we need to study the gene regulatory network of RA, which will reveal which genes and cell types play roles in RA [4] . There are some loose regions in chromatin in eukaryotic genomes, including those containing active regulatory elements, which control gene activity and are relatively accessible, thereby playing roles in disease [7] . By analyzing active regulatory elements, such as enhancers, promoters, and other regulatory sequences, we can pro le a map of open, active DNA. In Raw Data of scATAC-seq Processing All protocols for data processing were available on the following internet site: https://support.10xgenomics.com/single-cell-atac/software/pipelines/latest/algorithms/overview. The major steps were as follows: Droplet-based single-cell ATAC-seq (scATAC-seq) Nuclear suspensions were incubated in a transposition mix that included Tn5 transposase, which enabled fragmentation of DNA in open regions of the chromatin and marking them with adapter sequences. Then, GEMs were produced by mixing barcoded gel beads, transposed nuclei, a master mix, and partitioning oil on Chromium Chip E (10x Genomics, CG000168). The nuclei were delivered at limiting dilution, such that the majority (90-99%) of generated GEMs contained no nuclei, to obtain-single nucleus resolution, while the remainder largely contained a single nucleus. Then, silane magnetic beads were utilized to remove excess biochemical reagents from the post-GEM reaction mixture. Solid phase reversible immobilization (SPRI) beads were used to remove unutilized barcodes from the sample. During material ampli cation by PCR, the libraries were indexed by P7 (a sample index) and Read 2 (Read 2N) sequences. Therefore, the nal libraries contained the P5 and P7 primers used in Illumina® bridge ampli cation.

Barcode processing and genome alignment
The occasional sequencing error in barcodes obtained from the 'I2' index read was xed to improve data quality. For example, if barcodes outside the whitelist had a Hamming distance less than two and the probability that they are real barcodes was higher than 90%, they were corrected to be whitelist barcodes 14 . The Cell Ranger ATAC pipeline performs reference-based analysis and requires adapter and primer oligo sequences to be trimmed off before con dent mapping. We utilized the cutadapt [35] tool to identify the reverse complement of the primer sequence, which could exist at the end of a read sequence. We trimmed it from the read before alignment. Then, the trimmed read pairs were aligned to a speci ed reference using BWA-MEM [36] with default parameters. Then, duplicate reads were identi ed as groups of read pairs across all barcodes, where the 5' ends of both R1 and R2 had identical mapping positions as the reference, while other read pairs were detected only as fragments.
Peak Analysis, Peak-Barcode Matrix and t-SNE projection As each fragment end was indicative of regions of open chromatin, we analyzed the combined signals from these fragments to determine regions of the genome enriched for open chromatin and hence have putative regulatory and functional signi cance. Peak calling was performed through followed method: First, we counted the number of transposition events at each base pair along the genome; second, a smoothed pro le of these events with a 401 bp moving window around each base pair was produced and combined with a ZINBA-like mixture model, which includes a binarization algorithm; third, a region is identi ed as a peak signal (enriched for open chromatin) or noise, which depends on the signal threshold that is set at an odds ratio of 1/5. Consequently, peaks 500 bp apart are combined to generate a peak-containing BED le for position classi cation.
According to the number of peak areas with overlapping fragments, we separated the signal and noise for each barcode. To simulate and identify whitelist contamination, we subtracted the xed count relative to a depth from all the bar code counts. Setting an odds ratio of 1000, we separated the barcodes that corresponded to real cells from noncell barcodes. Subsequently, a raw peak barcode matrix containing each barcode of chromatin open richness was produced and then used for subsequent analysis, such as dimensional reduction, clustering, and visualization.
For binarization of the scATAC-seq data, all cells merge the peak calling which was performed on reads, if at least one reading with overlapping peaks, marks the peak '1' (open), or for the '0' (closed) by rst by term frequency-inverse document frequency (TF -IDF) to a normalized binary counter matrix, and then to perform Singular-Value Decomposition (SVD) to form the Latent semantic indexing analysis only after the SVD of 2-50 d was passed to the t-SNE visualization. We also provided graph-based clustering and visualization via t-SNE. We normalized the data to the unit norm before performing graph-based clustering and t-SNE projection.

Peak-Related Gene Identi cation by GO Enrichment and KEGG Enrichment Analyses
As an gene functional classi cation system, Gene Ontology (GO) reports annotations in three categories: molecular function, cellular component, and biological process. GO enrichment analysis reports all GO terms that are signi cantly enriched in peak-related genes compared to their levels in the genome backgrounds and lters the peak-related genes that correspond to biological functions. First, all peakrelated genes were mapped to GO terms in the Gene Ontology database (http://www.geneontology.org/), gene numbers were calculated for every term, and signi cantly enriched GO terms in peak-related genes compared to the genome background were de ned by hypergeometric testing. To study gene biological functions, we also performed KEGG enrichment analysis to identify vital signal transduction pathways with peak-related genes.

TF Motif Enrichment Analysis and Differential Accessibility Analysis
As transcription factors (TFs) bind through speci c motifs, we performed enrichment analysis for these motifs in the peaks. To identify their accessibility and speci c TF motifs for each cluster, Cell Ranger ATAC determines whether the average within a cluster is different than the average outside the cluster for each motif and each cluster. To nd differentially accessible motifs between cell groups, Cell Ranger ATAC uses the fast-asymptotic beta test, which is utilized in edgeR. For each cluster, the algorithm ran on the cluster with all other cells, producing a list of genes that are differentially expressed in the cluster relative to the remainder of the sample.

Validation by qPCR
Three healthy donors (male: female=1:2, mean age 28±3 years) and three RA patients (male: female=1:2, mean age 56±11 years) were involved in the qPCR validation experiments. Then, we collected 10 ml of peripheral blood from each donor for isolating PBMCs. After that, using CD14 microbeads (microbeads conjugated to monoclonal anti-human CD14 antibodies, Miltenyi), monocytes were isolated from PBMCs. Using RNAiso Plus (TAKARA 9109), chloroform, and isopropanol, the RNA was separated from monocytes. Then, total RNA was reversed transcribed into cDNA. Then, the cDNA was detected by real-time PCR. The relative expression of genes in monocytes was used by Student's t-test.

Results
Quality Control of scATAC-seq RA group and NC group were acquired and analyzed by scATAC-seq as the work ow shown. After quality control, 16,407 cells and 5,344 unique fragments were retained. At single-cell resolution, RA_PBMC enriched TSS fragments four times in the proximal region relative to the distal region, and NC_PBMC enriched TSS fragments ve times, which re ected that the proportion of fragments captured in the open chromatin was higher than that in the closed chromatin. Besides, we noted distribution characteristics of nucleosome fragment between the RA group and NC group. The recovered library from RA_PBMC was sequenced to an average depth of 22,743 raw reads per cell, generating chromatin accessibility pro les for 8014 cells with a median of 5,345 unique fragments per cell. Simultaneously, the library of NC_PBMC generates pro les of 8,393 cells with 23,035 original reads per cell and 5,344 unique fragments per cell.

Identi cation of primary cell types among PBMCs
The RA group and NC group were acquired and analyzed by scATAC-seq as shown in the work ow ( Figure  1a). Quality control of the scATAC-seq pro le was showed in gure 1b and gure 1c. In detail, we observed the signal distribution map around TTS after normalization of NC_PBMCs and RA_PBMCs library and the length distribution of corresponding inserts for each sample. Here, we used the activity of known cell marker genes to identify and annotate 9 clusters (Figure 1b), including T lymphocytes (T cells; clusters 0, 1, 3, and 5), natural killer cells (NK cells; cluster 2), monocytes (cluster 4), B lymphocytes (B cells; cluster 6) and dendritic cells (DCs; cluster 8). More speci cally, T cells can be identi ed by CD3D, CD3G, LEF1, CD8A and IL2RA [10,11] ; NK cells by GZMB, NKG7, and KLRD1 [10,12] ; monocytes by CD68, CD14, and ITGAM [10,13] ; B cells by CD79A, CD19 and MS4A1 [10,12] ; and DCs by IL3RA [10] . Select marker genes and the corresponding expression quantities are shown in Figure 1c.
In addition, scATAC-seq enables cell type identi cation by cell type-speci c transcription factor motifs [14] . For the fragments that overlapped the list of TF motifs, annotations of cell-type speci city were created for the most signi cantly enriched TF motifs (P-value < 0.01, fold change value, FC>1.5) in each cluster with no apparent difference between the RA_PBMC and NC_PBMC groups to serve as an alternative identifying feature for each cluster. Notably, more than 20 TF motifs in B cells were identi ed. Thus, we selected the top 10 TF motifs: POU2F3, IRF1, POU5F1B, STAT1:: STAT2, POU2F, POU1F1, POU2F2, POU3F3, POU5F, and POU3F1 ( Figure 2a). In addition, we found 13 TF motifs in NK cells, including EOMES, TBX2, TBR1, TBX20, and TBX21 ( Figure Table 2).

Differential chromosome accessibility analysis in rheumatoid arthritis
The distribution of immune cells and their regulatory functions are variously related to the occurrence of autoimmune diseases. Comparing the ratios of immune cells in the PMBC_RA with PBMC_NC groups (Table 3, Figure 2c), a signi cant difference was observed between B cells and T cells (Chi-square test, P-value < 0.01, FDR<0.05). Notably, the ratio of T cells in the PBMC_RA group was lower than that in the PBMC_NC group, implying a possible mechanism by which fewer T cells curb in ammation in the RA environment, a nding that is consistent with the hypothesis suggesting that regulatory CD8+ T cells play a role in preventing the development of autoimmune diseases [15] . In contrast, there were more B cells in the RA group than in the NC group (P-value < 0.01, FDR<0.05), suggesting that B cells in peripheral blood can promote RA pathogenesis by changing the number of cells. When calculating the different peaks (P-value < 0.05, FC>1.2) in the PBMC_NC and PBMC_RA groups, we found no different peaks in T cells, 51 in NK cells, 11 in monocytes, 149 in B cells, and 665 in DCs (Figure 2d). Moreover, the differential accessibility of TF motifs (P-value < 0.05, FC>1.2) for each immune cell type was selected. Consequently, we identi ed 56 TF motifs in RA, including three motifs (CEBPG (var. 2), NFE2L1, MAF:: NFE2) in monocytes, six motifs (NRF1, PHOX2A, TCFL5, ZBTB14, PHOX2B, PROP1) in B cells, one motif (TCFL5) in T cells and 46 motifs in NK cells (Figure 2e). However, no differential TF motifs were observed in DCs, indicating that the transcriptome signatures of these cells did not change in the RA environment.
We identi ed a total of 22 subclusters among B cells, DCs, monocytes, NK cells, and T cells (Figure 3a) with further cluster analysis. The results show that the cell number ratios of subcluster 0 B cells (B-0), subcluster 3 NK cells (NK-3), and subcluster 3 T cells (T-3) in the PBMC_RA group were more eight-fold greater than those in the PBMC_NC group. In addition, the cell number ratios in B-2, monocyte-1, NK-1, and T-5 in the PBMC_RA group were also higher than in the PBMC_NC group. However, the cell number ratio of subcluster 3 B cells (B-3), subcluster three monocyte cells (monocyte-3), subcluster 0 natural killer cells (NK-0), subcluster four natural killer cells (NK-4), subcluster 0 T cells (T-0), subcluster 4 T cells (T-4) and subcluster 6 T cells (T-6) in the PBMC_RA group was smaller than that in the PBMC_NC group, especially for NK-0 and NK-4, in which it was decreased by more than three-fold. Furthermore, no obvious difference in the number of cells was observed in the remaining subclusters in the PBMC_RA and PBMC_NC groups. We pro led a gure for the cell ratio, as mentioned above (Figure 3b). Furthermore, the differential accessibility of TF motifs (P-value < 0.05, FC>1.2) in each subgroup of PBMCs is shown in Figure 3c.

Functional analysis of signi cantly enriched peaks in the PBMC_RA group
According to the epigenomic analysis, we selected peaks with an obvious fold change value higher than 1.2 for the GO and KEGG analysis. Based on the KEGG analysis, NKs participate and mediate RA progression through virus infection-related pathways, such as human CMV, human immunode ciency virus 1, Kaposi sarcoma-associated herpesvirus, Epstein-Barr virus, and human T-cell leukemia virus (Figure 3d). In addition, NKs were also active in Th17 cell differentiation. This result was related to another study showing that helps T (Th) cells participate in RA pathogenesis, especially Th17 cells, which produce IL-21, IL-22, and TNFα [16] . Similarly, we found that DCs are regulated through human cytomegalovirus (CMV) infection ( Figure 3e), a nding consistent with research showing that RA patients usually have CMV [17] .
Similarly, after further clustering of ve major types of PBMCs, we found cell-type-speci c functions through GO enrichments considering peak-related genes in subclusters, except for the B-2 and T-1 subclusters. Here, the B-0 and B-3 subcluster genes were found to play roles in T cell activation. At the same time, B-1 genes are involved in B cell differentiation and activation. In addition to T cell activation, B-3 genes also participate in regulating cell−cell adhesion, negative regulation of protein phosphorylation, regulation of MAP kinase activity, and T cell activation regulation. In monocytes, M-0 genes are involved in the negative regulation of phosphorylation, regulation of MAP kinase activity, neutrophil degranulation, activation, and mediation of immunity. M-1 genes are regulated through DNA-binding transcription activator activity; RNA polymerase II-speci c. M-2 genes are active in histone modi cation, T cell activation, and differentiation.
Similarly, the M-3 subcluster genes participate in dephosphorylation, regulation of GTPase activity, regulation of small GTPase-mediated signal transduction, positive regulation of cell adhesion, and T cell activation. In NK cells, peak-related genes in the NK-0 subcluster are mainly involved in the endosome membrane; NK-1 genes participate in the regulation of mitotic cell cycle phase transition and kinase regulator activity; NK-2 genes take part in Ras protein signal transduction, regulation of small GTPasemediated signal transduction, and T cell activation; NK-3 genes have effects on the cellular response to UV; NK-4 genes are mediated through T cell activation, regulation of lymphocyte activation, positive regulation of cytokine production and regulation of GTPase activity. With regard to subtypes of T cells, genes are abundant in the ubiquitin ligase complex in the T-0 subcluster, focal adhesion in T-2, and the Cul4−RING E3 ubiquitin ligase complex in T-3. In addition, considering biological process, the T-2 subcluster genes are active in cell−substrate adherens junction; T-4 in autophagy; T-5 in the regulation of cellular amide metabolic process and p38 MAPK cascade; T-6 in T cell activation and positive regulation of cytokine production; and T-7 in I−kappaB kinase/NF−kappaB signaling (supplemental gure 4g). The seminal discovery of TFs in the PBMC_RA group A total of 434 signi cantly TF-enriched TF motifs were identi ed in the RA_PBMC group (P-value < 0.05, FC>1.2) through further cell clustering based on the scATAC-seq analysis. We discovered two (PTPRC and SPAG9) of the 5 intersecting genes with differential accessible TF-binding sites in B-3 and M-0 subclusters of RA patients compared to healthy controls (P-value < 0.05, FC>1.2). In addition, PTPRC related to RA was identi ed from Gene Data Mining to Disease Genome Sequence Analysis (www.genecards.org), and the relevance scores were higher than 9. More speci cally, 71 TFs could be involved in regulating two genes (PTPRC and SPAG9), and only 10 TFs had highly accessible binding sites (GATA6, IRF2, ZNF384, HNF1B, DMRTA2, MEF2A, NFE2L1, CREB3L4 (var. 2), FOSL2::JUNB (var. 2), and MEF2B). Therefore, 2 TFs (GATA6 and IRF2) in B-3 and the 8 TFs in M-0 (ZNF384, HNF1B, DMRTA2, MEF2A, NFE2L1, CREB3L4 (var. 2), FOSL2::JUNB (var. 2), and MEF2B) contributed for RA pathogenesis by regulating corresponding target genes and MAP kinase activity (Figure 4c and 4d). To gure out the cell traits of these subcluster, we further identi ed the enriched motifs in these subcategories in PBMC_(p<0.05, FC>1.2) ( Table 4).
To validate the expression of two genes in monocytes (PTPRC, SPAG9), we took the peripheral blood of another three RA patients and healthy donors. Using these samples, monocytes were isolated and used for extracting their RNA for qPCR experiments. Compared to healthy controls, differential expression of two genes (SPAG9 and PTPRC) was observed in RA patients (SPAG9, 1.673 ± 0.8945 vs. 1.013 ± 0.1673, p = 0.0045; PTPRC, 0.5878 ± 0.3841 vs. 1.007 ± 0.1203, p =0.0066) (Figure 4e).

Discussion
With the rise of single-cell sequencing technology, single-cell epigenomics sequencing has attracted more and more attention, and scATAC-seq is also the absolute new favorite in the eld of epigenetics. The research identi ed 85 accessible chromatin patterns and 400,000 regulatory elements from 13 mouse tissues by scATAC-seq and found the cell type re ected by chromatin accessibility varies according to anatomical coordinates [18] . More biological information can be revealed at the cell level by studying single-cell heterogeneity. And scATAC-seq can be used to study migratory behavior in cancer biology. Meanwhile, this technology on autoimmune disease has not been profoundly applied.
To better understand RA epigenetics pathogenesis, we performed scATAC-seq on peripheral blood mononuclear cells (PBMCs) from RA subjects to pro le an epigenetic landscape of active regulatory DNA, which can be used to distinguish different cell subtypes, identify cell-type-speci c TF enrichment, and reveal a regulatory pattern. In this way, T cells, B cells, NK cells, monocytes, and DCs were identi ed without antibodies. Furthermore, as part of the epigenetic signature identi es the aforementioned PBMC types, cell-type-speci c TF motifs were also identi ed. The discovery of TF motifs (P-value < 0.01, FC>1.5) may promote parsing of cellular heterogeneity by identifying immune cell types.
As above description ( gure 3d and 3e), NK cells and DC cells both played a role in RA progression, which was associated with some viruses. We hypothesized that as extrinsic antigens these viruses worked through molecular simulation mechanisms, inducing an autoimmune response in RA patients. And this hypothesis was consistent with other research. In this way, we con rmed the importance of using the single-cell sequencing technique and its reliability to reveal the traits of immune cells in RA patients.
With further cell reclustering, we identi ed a total of 22 subclusters among the ve main cell types. In this way, we obtained more detailed and differential data compared with the ve central immune cells. For example, little information could be obtained from T cells through the GO enrichment analysis of peakrelated genes (P-value < 0.05, | FC |>1.2). In contrast, GO analysis showed that T cell subtypes showed varied functions, biological processes, and cellular components. Furthermore, no TF motifs were found in T cells with an FC value higher than 1.2, but we observed 89 and 11 enriched motifs (P-value < 0.05, FC>1.2) in the T-6 and T-7 subclusters, respectively. Single-cell sequencing has been shown to provide a higher resolution of cellular differences and enables a more in-depth exploration of the function of an individual cell [19] .
Speci cally, according to GO analysis, we found that 32 genes in the T-2 subcluster cells showed an effect on focal adhesion, cell-substrate junctions, and cell−substrate adherens junctions. Considering previous research results [20,21] , these three pathways contribute to the migration of broblast-like synoviocytes (FLSs) in RA. In addition, T-4 subcluster cells participate in autophagy with 20 genes associated with various autoimmune diseases, including RA [22] . Dysregulation of the autophagy pathway plays a vital role in RA pathogenesis through peptide citrullination, osteoclast activation, T/B cell activation, and RA FLS survival [22,23] . If we better understand the autophagy pathway, we would be able to apply compounds that modulate the autophagy pathway to in vitro tests and develop new drugs. In addition, 12 genes in the T-5 subcluster cells were involved in regulating cellular amide metabolic processes, which may facilitate the metabolic regulation of the T cell response in RA [24] . Furthermore, we observed that the pathway of the p38 MAPK cascade was enriched by 12 genes in T-5. Moreover, the p38 mitogen-activated protein kinase (MAPK) pathway has been strongly suggested to be associated with the pathology of RA [20] . In the T-6 subcluster, a reduced cell number ratio was evident, and 59 genes and 54 genes participated in the T cell activation pathway and positive cytokine production pathway, respectively. Therefore, we speculated that T-6 promotes the development of in ammation in RA in these two ways.
In the subtypes of B cells, we found that B-0 and B-3 both played a role in T cell activation with 26 genes and 46 genes, consistent with a previous study showing that the T cell activation in RA was B cell dependent [21] . However, the cell number changed in an opposite trend, which showed B cell dysregulation. While B-0 and B-2 cells were obviously increased in the RA group compared to the NC group, B-3 cells were more profoundly decreased. Our results may indicate that protective (B-3) and pathogenic (B-0) B cell populations exist in vivo. Hence, past research has shown that the depletion of B-3 cells and the proliferation of B-0 cells both promote the development of RA [25] . Given this nding, anti-CD20 drugs may not be an advisable choice for use in treating RA, as they may not be able to distinguish protective B cells.
In monocytes, the M-0 subcluster genes were mainly embodied in immune-in ammatory pathways (e.g. neutrophil degranulation, neutrophil activation involved in immune response, neutrophil activation, neutrophil-mediated immunity, and regulation of MAP kinase activity). According to the literature, M-0 is interconnected with other cells by producing proin ammatory cytokines [26] . In addition, this nding suggests that M-0 genes may mediate the activation and degranulation of neutrophils to play a role in RA pathological processes. In the M-2 subcluster, the biological processes identi ed were mainly T cell differentiation, lymphocyte differentiation, T cell activation, and histone modi cation. This nding suggests that M-2 cells have more signi cant local chromatin accessibility changes through histone modi cation than other RA cells, contributing to additional, different peaks.
Furthermore, immune cells can undergo epigenetic changes through histone modi cation [3] . M-3 subcluster cells participated in immune-in ammatory pathways (e.g. regulation of GTPase activity, regulation of small GTPase-mediated signal transduction, positive regulation of cell adhesion, T cell activation, and dephosphorylation). As reported by Amanda Chan, Rac small GTPases mediate JNK activation to facilitate FLS proliferation and invasion, similar to tumor cells [27] , which are associated with M-3.
According to KEGG analysis, our ndings focused on virus infection-related pathways and tumorassociated pathways in NK cells of RA patients. NK-0 showed activity in proteoglycan pathways in cancer and Salmonella infection. In addition, NK-1 participated in Epstein−Barr virus infection, human immunode ciency virus 1 infection, human cytomegalovirus infection, and colorectal cancer. Infections and a high risk of virus-associated cancers increased RA susceptibility because most of the infectious agent proteins are similar to human polypeptides, as indicated by the molecular mimicry hypothesis [28] . In addition, our results showed that NK-1 was involved in apoptosis and the p53 signaling pathway, which was consistent with a study showing that antiapoptotic molecules, including the transcriptional regulator (p53), facilitate resident synoviocyte hyperplasia and proliferation [29] .
Furthermore, NK-2 and NK-4 both played a role in Th17 cell differentiation and Th1 and Th2 cell differentiation; NK-4 was also active in two in ammatory pathways, leukocyte transendothelial migration, and the TNF signaling pathway. These ndings support the idea that T cells are critical to the pathogenesis of RA [16] .
Finally, B-3 and M-0 subcluster genes both participated in regulating MAP kinase activity. Mitogenactivated protein kinase (MAPK) signaling pathways are critical for regulating the production of proin ammatory cytokines, ultimately culminating in joint in ammation and destruction [30,31] . Thus, ve genes that participate in this pathway are in both of these two subclusters (MAP3K3, SPAG9, PTPRC, PTPN1, and SRC), which may be key elements of RA disease. In addition, we discovered 10 TFs with highly accessible binding sites (GATA6, IRF2, ZNF384, HNF1B, DMRTA2, MEF2A, NFE2L1, CREB3L4 (var. 2), FOSL2::JUNB (var. 2), MEF2B) near two of these genes (PTPRC, SPAG9). The literature indicated that PTPRC has a strong association with the response to TNF inhibitors in RA [32] , meanwhile, European studies also supported that PTPRC was RA risk gene locus [4] . This result con rmed the reality of our ndings. However, SPAG9 was found to be related to prostate cancer [33] and abnormally regulate the protein expressed in systemic sclerosis. To validate the expression of these genes in RA patients, using qPCR, we observed the existence of differential expression in monocytes. Thus, if the accessibility of these transcription factor binding sites (ZNF384, HNF1B, DMRTA2, MEF2A, NFE2L1, CREB3L4 (var. 2), FOSL2::JUNB (var. 2), MEF2B) were transformed, they could regulate PTPRC and SPAG9 and mediate MAP kinase activity. These results enhance our understanding of gene transcription regulatory roles played by different TF motifs. These cell type-speci c regulatory patterns can drive cell identity and function. Moreover, these ndings provide insights into epigenetic changes in RA_PBMCs, the pathogenesis of RA, and the identity of potential therapeutic targets.
In the future, single-cell RNA sequencing for further study of RA and potential drug molecules targeted to RA pathogenesis will be possible. In other words, single-cell multimodal omics is a suitable method for further study to reveal a complicated relationship in different dimensions.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.