Diversity of Immunoglobulin Heavy Chain Repertoire in Patients With Rheumatoid Arthritis

Objectives Rheumatoid Arthritis (RA) is associated with polymorphism in major histocompatibility complex class II genes and dysregulations of CD4 + T cells which cause abnormalities in immune repertoire (iR) expression and intracellular signaling. We monitored nucleotide sequence changes in iR of immunoglobulin heavy chain (IGH), particularly complementarity determining region 3 (CDR3) during the course of treatments in RA patients using massively parallel sequencing technology. Methods CDR3 sequencing was carried out on clinical blood samples from RA patients for disease progress monitoring. The iR of each sample was measured using next generation sequencing (NGS) pipeline. Data analysis was done with a web-based iRweb server. Principal components analysis (PCA) was completed with commercial statistical pipeline. Results Datasets from 14 patients covered VDJ regions of IGH gene. D50 stayed low for all cases (mean D50 = 6.5). A pattern of shared CDR3 sequences was conrmed by a clustering pattern using PCA. Shared prole of 608 CDR3 sequences unique to the disease baseline was identied. D50 analyses revealed clonal diversity would remain low throughout the disease course even after treatment (mean D50 = 11.7 & 8.2 for csDMARD & bDMARD groups respectively) regardless of uctuated disease activity. PCA has provided a correlation of change in immune diversity along the whole course of RA. Conclusion We have successfully constructed the experimental design, data acquisition, processing, and analysis pipeline of a high throughput massively parallel CDR3 sequences detection to be used to correlate RA disease activity and IGH CDR3 iR during disease progression with or without treatments. of RA. PCA is used to perform dimension reduction to make sense of CDR3 sequences variations, and the BCR CDR3 sequences do have a clustering pattern upon PCA representation which is an indication of specic clonal expansion in RA. We demonstrated that BCR CD3 iR sequencing can be a good tool to follow up on the disease progression of individual RA patients. Low BCR repertoire diversity was noted in treatment naïve RA patients.


Introduction
Rheumatoid arthritis (RA) is a chronic, destructive autoimmune in ammatory disease which not only causes joint deformity, variant extra-articular manifestations, and increased mortality in individual patients but also poses socioeconomic burdens 1 .
Ever-increasing awareness of its pathogenesis has elicited developments of various novel therapeutic modalities for RA. Currently, conventional synthetic disease modifying ant-rheumatic drugs (csDMARDs), biologic (b)DMARDs and targeted synthetic (ts)DMARDs are 3 mainstays for the treatments in RA [2][3][4][5][6] . Traditionally, RA has been described as a T-cell mediated autoimmune disease, which develops via self-antigen presentation to T cell receptor (TCR) and subsequent ampli cation of the immune cell activation 7 . However, ever-increasing evidence has suggested that B cells also play essential roles in its pathogenesis 8 .
The immune repertoire (iR) is the summation of T and B cells in a human body at any given moment 9 . It represents the very moment or the chronology of individual's immune functions. Application of the iR sequencing in clinical medicine is cumulative in recent years, including cancer characterization and human T cell subset study in organ transplantation 10,11 . Sequencing the expressed T or B cell genes in iR may enable an accurate evaluation of disease activity, treatment response, and disease prognosis for RA. Previous studies have demonstrated the effects of bDMARDs on different auto-reactive T cell subtypes and TCR repertoire in RA [11][12][13][14][15][16] . We have used high-throughput sequencing approach to identify TCR iRs in RA patients receiving different types of bDMARDs 12 . Although, no signi cant difference has been found among various biological treatment strategies, an inverse tendency between TCR repertoire diversity and RA disease activity was found, suggesting that TCR iR can be a potential response biomarker for bDMARDs-treated RA.
It is known that RA patients express a skewed repertoire of polyclonal, hypo-mutated B cell receptor (BCR) 17 . Additionally, the dominant B cell clones in blood can predict onset of arthritis in individuals at risk for RA 18 . However, less little studies have been focused on BCR iR in RA, either the relationship between RA disease per se and BCR iR or the changes upon its treatments. In this study, we monitored the changes of iR in immunoglobulin heavy chain (IGH) of BCR, particularly complementarity determining region 3 (CDR3) sequences using massively parallel sequencing technology. We aimed to identify the correlation between disease activity and changes in BCR iR and its diversity in RA patients before and after treatment with csDMARDs or bDMARDs.

Page 3/16
Methods Patient selection, treatment protocol, and traditional disease markers The whole study protocol was veri ed by the institutional review board of Buddhist Hualian Tzu Chi Hospital (IRB-105-65-A). Seventeen patients were initially enrolled after they had signed the informed consent form, but three were lost followed up due to personal reasons. All procedures were performed after informed consent was signed, and all methods were performed in accordance with the relevant guidelines and regulations. The 16 samples for 14 patients were further categorized into those with newly diagnosed RA who were treatment naïve (Group 1, sample numbers: 8) and those with established RA who had been treated with csDMARDs but were bDMARD naïve (Group 2, sample numbers: 8). Two patients (1st patient designated as sample No. 4 and sample No. 14; 2nd patient designated as sample No. 8 and sample No. 16) participated in both groups (initially in Group 1 and then being extended into Group 2). The major demographics, and the treatment history including csDMARDs, and bDMARDs are listed in Table 1  14 are the same patient and No. 8 and No. 16 are the same patient. csDMARD= conventional synthetic disease modifying anti-rheumatic drug; bDMARD = biological disease modifying anti-rheumatic drug = biologics, which included * tocilizumab, ** abatacept, # adalimumab, ф certolizumab pegol and ◊ golbumab; *** csDMARD doses are represented with [M (methotrexate mg/wk )+ S (sulfasalazine gm/d) + H (hydroxychloroquine X10 2 mg/d)+ L (le uonomide X 10 mg/d)].

Next generation sequencing (NGS)
In order to evaluate iR at the disease baseline by CDR3 sequencing in BCR heavy chain, all specimens are collected from each patient with minimal required treatments at the beginning of study. Disease progression after treatments was evaluated by CDR3 sequencing of BCR heavy chain using longitudinal specimen series of four samples from each patient as mentioned above. A methodology overview contained the following four items, i.e. experimental design, RNA extraction, iRepertoire library preparations, and MiSeq® (Illumina, San Diego, CA) sequencing.

Data analysis
Raw datasets fastq les were uploaded into the server maintained by iRepertoire® for initial data processing. Data analysis was performed by iRweb software pipeline (iRepertoire, Huntsville, AL) (https://irweb.irepertoire.com/nir/). Sequence diversity of the iR is represented by D50 which is calculated by the following formula: where D50 is de ned as the diversity of an iR of total number of CDR3s including S distinct CDR3s in a ranked dominance con guration, and "r" stands for the amount of most abundant CDR3s, r1 is the amount of the 20 most abundant CDR3s, r2 is the amount of the second most abundant CDR3, and so on (i.e., r 1 >r 2 >r 3 >…ri>r i+1 >…r n ), X is the number of distinct CDR3s of the speci c sample, C is the minimum number of distinct CDR3s amounting to ≥50% of total sequencing reads 21 . The Diversity Index (DI) is calculated by the following formula, which is de ned mathematically as follows: assume that the frequency of the individual unique CDR3 are r 1 > r 2 > r 3 > · · · r k > r k+1 > · · · > r n where k stands for the k-th sample, r i is the frequency of CDR3 in the i-th sample, r k is the frequency of CDR3 in the k-th sample and n is the total number of unique CDR3s.
Principal component analysis (PCA) for baseline RA disease activity before treatment A very convenient feature of the iRweb software is CDR3 algebra, which allows the comparison of the CDR3 sequences from one data set to another data sets to identify shared CDR3s pro le. This allows for a comparison amongst samples of different time points during treatment. All CDR3 frequencies were arti cially scaled to 10 million reads to account for differences in read depth among samples, making comparisons between samples easier with this normalization step. A shared pro le of RA disease baseline was obtained using this function. We constructed observations/variables table based on this pro le taking different samples as well as different observations and different CDR3 clone sequences as different variables. We then applied the PCA function of a commercial module of Excel (Microsoft), namely xlstat (Addinsoft), to compute the PCA results on this set of baseline samples. Finally, observation chart with PC1 & PC2 for dimension reduced CDR3 listings of all samples was plotted for evaluation and observations based on factor scores.

Principal component analysis (PCA) for RA disease progression
Raw data in the form of CDR3 listings from serial detection of each patient were processed with the same pipeline of iRweb and analyzed using the PCA function of xlstat module in a PC windows environment. Both groups of patients undergoing csDMARD/bDMARD treatments were processed separately. . csDMARD= conventional synthetic disease modifying anti-rheumatic drug; bDMARD = biological disease modifying anti-rheumatic drug = biologics, which included * tocilizumab, ** abatacept, # adalimumab, ф certolizumab pegol and ◊ golbumab.

CDR3 detection, D50 and DI analysis
For the disease baseline study, the D50 values of most samples were below the value 10, suggesting low diversity of B cells in these samples. There were 1,035,476 average CDR3 reads (130,395 average unique CDR3) detected for all specimens with an average D50 value of 6.4 (DI 18.0 and Entropy 11.1). DI and entropy are alternate measurement values for diversity. In addition to the above de nition," DI" could alternatively de ned as 100 minus the area under the curve between percentage of total reads and percentage of unique CDR3s, when the frequencies of unique CDR3s are accumulated from most frequent to least frequent. On the other hand, "entropy" is de ned as that of the Shannon Entropy 22 . There was a shared pro le containing 608 BCR heavy chain CDR3 sequences which was exhibited across the samples as listed in supplementary Table  2. For the disease progression study, the results were presented in two groups, i.e. datasets obtained from 8 csDMARD-naïve patients undergoing csDMARD treatment and those from 8 bDMARD-naïve patients undergoing bDMARD treatment. The average D50 values from the 8 csDMARD-treated patients were 11.7, suggesting low diversity of B cells in them (Table 3). In average, there were 945,519 CDR3 reads (128,715 unique CDR3) detected for each sample. The average D50 values from the 8 bDMARD-treated patients were 8.2, suggesting also low diversity of B cells in them (Table 4). In average, there were  The CDR3 region is of particularly interest to us as the antigen speci city is highly correlated with this region of the BCR. The D50 is a quantitative measure of the degree of diversity of B cells within a sample. It is the percentage of dominant and unique B cell clones that account for the cumulative 50% of the total CDR3s counted in the sample. The more diverse a library, the closer the value will be to 50. Low diversity values are associated with decreased diversity. BCR heavy chain datasets and CDR3 listing in patients were obtained for disease baseline evaluation and IGH datasets from patients were obtained for disease progression evaluation.
Principal component analysis (PCA) for baseline disease activity of RA Apart from shared CDR3 analysis, we also performed a PCA on the disease baseline datasets and tried to analyze numerical data (CDR3 listings) structured in observations/ many variables table as described previously. We analyzed the CDR3   (Table 2 & Table 3), i.e., while improving D50 level, there was no improvement in terms of DAS28. Another patient (No.5) revealed the same trend as that in bDMARD group, which were 4.48, 4.71, 3.49 and 3.19 of D50 in sequence along the timeline, exhibiting a decreasing trend in disease activity. However, in contrary to increase in clonal diversity, the last DAS28 value on both cases was still above 3.2, indicating no signi cant improvement. Beside possible confounders such as undercurrent infection, this may conversely suggest D50 level is more sensitive than traditional DAS28 value to re ect disease activity. After all, pain or tenderness is so subjective that varies among patients, irrelevant to actual in ammatory status. Since tender joint count is one of the components of DAS28, the disease activity can also not be so reliably assessed by DAS28. Instead, D50 can be a better companion indicator of disease improvement trend, superior to DAS28 score. For a long time, lacking reliable biomarkers for prompt strati cation of individual patients to t the most appropriate and effective medications has rendered current treatment consensus for RA so challenging 24 . D50 level may potentially serve as an immediate biomarker to overcome this predicament.
Regarding the PCA, both "speci c treatment" naïve samples did show a clustering effect, implying that speci c clonal expansion pattern is present for RA. We have performed shared pro le analysis on patient datasets of disease baseline to identify shared dominant CDR3 sequences. We identi ed a shared pro le of 608 CDR3 clone sequences, i.e. clone expansion sequences which were present in at least two patients. Sequence "ARDLDY" (A=alanine R=arginine D=aspartic acid L=leucine Y=tyrosine) is the most signi cant CDR3 expansion sequence in these RA patients. Although currently, we still could not identify speci c peptide that contains this sequence, it is expected that it could eventually be clari ed when more data are accumulated.
We have also performed shared pro le analysis on patient datasets of disease progression to identify shared dominant CDR3 sequences in patients with csDMARD/ bDMARD treatment. PCA analysis on CDR3 shared pro le revealed a divergent tendency is some patients both in csDMARD/ bDMARD groups ( detections during treatment course also exhibit clustering effect, supporting our hypothesis that speci c clonal expansion pattern in individual patient may occur in response to some unique exogenous stimuli such as csDMARD/ bDMARD treatments within a same kind of underlying disease such as RA, regardless of disease improvement or deterioration.
Oral and gut microbiota have been thought to be one of the environmental factors that may enhance the development of RA [25][26][27][28] . For example, periodontitis caused by microorganisms might alter post-transcriptional regulation and citrullinated self-protein, leading to autoantibody production and local in ammation, and nally triggering the onset of RA. The change of healthy microbiota and pathogen-host immune system interaction may integrate the reduced immune diversity and BCR CD3 sequences clustering effect. Other environmental risk factor, especially cigarette smoking may also result in same clustering effect, eventually precipitating RA development [29][30][31] . On the contrary, treatments such as csDMARD/bDMARD break down the vicious cycle and lead to increased immune diversity. Clinically, autoantibodies such as RF and anti-CCP, though with very high speci city for disease entity, can be detected up to a decade before the development of RA 32 . Hence, no one can predict the disease onset of RA. CDR3 sharing pro le analysis may potentially predict the development of RA and guide physicians to a timely and appropriate intervention. Clustering effect may represent inchoate of disease and serves as a predictor biomarker; however, more data are needed to con rm this hypothesis.
In spite of the present promising ndings, our investigation holds some limitations. Firstly, this study was restrained by a small sample size, which may limit the statistical power for detecting the variation of the B-cell repertoire in clinical assessments. Yet, we still observed a trend toward a decrease of BCR iR diversity in RA patients prior to treatment and an increase in it after either csDMARDs or biologics, and a larger cohort study is required for solid validation. Secondly, our study only focuses on peripheral B cell repertoire in RA patients without B-cell and/ or plasma-cell clones in synovial tissue, and we will plan to extend our study in the future. Finally, we measured and analyzed the BCR IGH CD3 iR sequencing to monitor the disease activity, and further functional investigation will be conducted to elucidate the potential roles in the pathogenesis of RA.
In conclusion, we have successfully established a pipeline of experimental design, data acquisition, processing, and analysis to be used in monitoring the disease progression of RA. PCA is used to perform dimension reduction to make sense of CDR3 sequences variations, and the BCR CDR3 sequences do have a clustering pattern upon PCA representation which is an indication of speci c clonal expansion in RA. We demonstrated that BCR CD3 iR sequencing can be a good tool to follow up on the disease progression of individual RA patients. Low BCR repertoire diversity was noted in treatment naïve RA patients.
Conversely, immune diversity expands upon recovery or remission from active disease after csDMARDs and biologics treatment.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.