Molecular Biomarkers of Anti-TNF Response in Patients with Rheumatoid Arthritis


 BackgroundAdvances in immunotherapy by blocking TNF have remarkably improved treatment outcomes for rheumatoid arthritis (RA) patients. Although treatment specifically targets TNF-α, the downstream mechanisms of immune suppression are not completely understood, and the reason for the reduced efficacy in a significant fraction of patients remains unclear. Hence this study was designed to detect biomarkers and expression signatures of response to TNF inhibition.MethodsIn this study, we included 39 female patients diagnosed with RA who were non-responders to methotrexate treatment. The blood samples were collected before anti-TNF treatment initiation, and three months into treatment. The clinical evaluations were performed based on European League Against Rheumatism (EULAR) and classified 23 patients as responders and 16 as non-responders after three months following the initiation of anti-TNF treatment. We investigated differences in gene expression in peripheral blood mononuclear cells (PBMCs), the proportion of cell types and cell phenotypes in peripheral blood using flow cytometry, the level of proteins in serum, as well as clinical and demographic factors.ResultsWe performed analyses to identify differences between responders and non-responders at both time points (before and after treatment initiation) as well as to detect the changes induced during the treatment using transcriptomics, flow cytometry and proteomics data. The gene expression analysis before treatment revealed notably a higher expression of EPPK1 and BCL6-AS1 in future responders. We further detected suppression of genes and proteins during treatment, most notably a suppression of expression of the gene, T-cell inhibitor CHI3L1 and its protein YKL-40 measured from flow cytometry. We identified an increase in the proportion of T- and B cells, whereas the proportion of granulocytes was suppressed during treatment in responders. Finally, our machine learning models mainly based on transcriptomics data showed high predictive utility (ROC AUC ± SEM: 0.81 ± 0.17) in classifying response before anti-TNF treatment initiation.ConclusionsOur comprehensive analyses resulted in several useful insights regarding the transcriptional and translational regulations of anti-TNF treatment in RA patients. The study reports first transcriptomics analysis using RNA sequencing of isolated PBMCs from anti-TNF naïve and anti-TNF treated RA patients to study biomarkers and predict anti-TNF response.


Abstract Background
Advances in immunotherapy by blocking TNF have remarkably improved treatment outcomes for rheumatoid arthritis (RA) patients. Although treatment speci cally targets TNF-α, the downstream mechanisms of immune suppression are not completely understood, and the reason for the reduced e cacy in a signi cant fraction of patients remains unclear. Hence this study was designed to detect biomarkers and expression signatures of response to TNF inhibition.

Methods
In this study, we included 39 female patients diagnosed with RA who were non-responders to methotrexate treatment. The blood samples were collected before anti-TNF treatment initiation, and three months into treatment. The clinical evaluations were performed based on European League Against Rheumatism (EULAR) and classi ed 23 patients as responders and 16 as non-responders after three months following the initiation of anti-TNF treatment. We investigated differences in gene expression in peripheral blood mononuclear cells (PBMCs), the proportion of cell types and cell phenotypes in peripheral blood using ow cytometry, the level of proteins in serum, as well as clinical and demographic factors.

Results
We performed analyses to identify differences between responders and non-responders at both time points (before and after treatment initiation) as well as to detect the changes induced during the treatment using transcriptomics, ow cytometry and proteomics data. The gene expression analysis before treatment revealed notably a higher expression of EPPK1 and BCL6-AS1 in future responders. We further detected suppression of genes and proteins during treatment, most notably a suppression of expression of the gene, T-cell inhibitor CHI3L1 and its protein YKL-40 measured from ow cytometry. We identi ed an increase in the proportion of T-and B cells, whereas the proportion of granulocytes was suppressed during treatment in responders. Finally, our machine learning models mainly based on transcriptomics data showed high predictive utility (ROC AUC ± SEM: 0.81 ± 0.17) in classifying response before anti-TNF treatment initiation.

Conclusions
Our comprehensive analyses resulted in several useful insights regarding the transcriptional and translational regulations of anti-TNF treatment in RA patients. The study reports rst transcriptomics analysis using RNA sequencing of isolated PBMCs from anti-TNF naïve and anti-TNF treated RA patients to study biomarkers and predict anti-TNF response.

Page 4/29
Background Rheumatoid arthritis (RA) is a systemic autoimmune disease characterized by chronic in ammation in symmetric joints that without effective treatment leads to additional pain and eventually bone destruction. RA is one of the most common autoimmune diseases which affects approximately 0.5 -1 % of the world's population [1]. Currently there is no cure for RA, but several disease modifying antirheumatic drugs (DMARDs) are used to treat patients with the disease. During a successful treatment course, in ammation in the joints decreases, resulting in disease remission or low disease activity [2,3]. Methotrexate (MTX) is the DMARD recommended as the rst line treatment of early RA, however at least 30 % of patients do not respond to the treatment where signi cant disease activity remains [4,5]. The patients who do not respond to rst line treatment are recommended for additional treatment, in most cases with drugs that inhibit tumor necrosis factor (TNFi). TNF is a pro-in ammatory cytokine secreted mainly by monocytes and macrophages, but also by other immune and non-immune cells, including broblasts and endothelial cells, involved in systemic in ammation.
The anti-TNF treatment is recommended due to its e cacy in blocking synovial in ammation, and preventing radiological progression [6]. Anti-TNF therapy has been used to treat RA for two decades, but in many contexts one third of treated patients do not respond or have poor responses [7]. The ability to predict which patients will respond to any treatment is limited. Prediction of e cacy of treatments at baseline would help patients to start effective treatment and hence reduce delay in effective treatment of the disease and decrease numbers of adverse effects. This could also help to reduce the cost of starting different biologic therapies and most importantly improve effective health care.
RA affects women more frequently than men, and the response rate for various RA therapies including DMARDs and anti-TNF has in some studies been reported to be lower in women compared to men [8][9][10].
Sex differences in immune responses are in uenced by both the age and sex. Moreover, sex chromosome genes and sex hormones contribute to the difference in immune responses in males and females [11][12][13][14].
In the current study, we performed analyses to identify biomarkers of anti-TNF response, as well as signatures of response, in a Swedish cohort of RA patients. The study cohort is well-characterized and allowed us to investigate gene expression differences in peripheral blood mononuclear cells (PBMCs), the proportion of different cell types and cell phenotypes in peripheral blood using ow cytometry, the level of several proteins in serum, as well as clinical and demographic factors.
The PBMCs are mainly composed of different cell types (monocytes, T cells, B cells, NK cells, and dendritic cells) that are important for immune responses and in ammation. Gene expression in PBMCs is a good source to study pathophysiological mechanisms and has emerged as a potential source for the identi cation of biomarkers [15,16]. Transcriptomics studies have been used to identify potential biomarkers that might help to predict treatment response [17]. Our study reports the rst transcriptomics analysis using RNA sequencing of isolated PBMCs from RA patients with the aim to nd biomarkers and predict anti-TNF response. Further, we studied the association of data from ow cytometry, from protein measurements and from clinical information with anti-TNF response and employed machine learning models to predict anti-TNF response.

Patient Cohort
In this study, patient samples are obtained from the COMBINE cohort which includes 239 patient samples [18]. This includes individuals treated with methotrexate (89 patients), treated with anti-TNF drugs (after failing to respond to methotrexate treatment, 59 patients), or treated with second biologic agent (after failing to respond to anti-TNF treatment, 31 patients) and healthy controls, 60 samples). Healthy controls were recruited from the Swedish Blood Centre service in Uppsala with age and sex as closely matched with patient groups as possible. The patients who did not respond adequately to methotrexate treatment according to the local physician's judgement were prescribed for anti-TNF treatment in combination with methotrexate. Patients included in this study donated blood at the clinic before starting anti-TNF therapy, and at the follow up visit after approximately three months. Clinical assessments and routine blood sampling were made at both visits, and are used to calculate the disease activity score based on 28 joints, DAS28 CRP [19]. Of the 59 patients who started anti-TNF therapy, three patients dropped out prior to the scheduled three month visit and therefore lack clinical assessment. Previous studies have indicated that males and females respond differently to RA treatments with biologic agents, therefore the low number of male non-responders in our study cohort indicates a very low power to detect whether males and females potentially have different biomarkers or different treatment effects. In order to avoid larger heterogeneity, we therefore decided to include only female patients (n = 39) in our study (Additional le 1: Fig. S1). The clinical and demographic variables at baseline are shown in Table 1.   Table 1 : HAQ = Health Assessment questionnaire disability index; ACPA = anti-citrullinated peptide antibody; DAS28 = Disease Activity Score in 28 joints; CRP = C-reactive protein.

Response measures
We used European League Against Rheumatism (EULAR) response criteria to classify patient response to treatment [20,21]. In our analysis we consider Good and Moderate EULAR responders as "responders", and compare these to the EULAR "non-responders".

Flow cytometry
We measured the concentration of leukocytes, neutrophils, eosinophils, basophils and monocytes per liter of peripheral blood using XE Sysmex ow cytometry-based analysis. In addition, other immune cell phenotypes were measured by Gallios ow cytometry. Stainings were performed in combinations of antibody panels, one panel focusing on T cell stainings of PBMC, another on B cell stainings of PBMC, a third on NK and monocyte stainings of PBMC and a fourth on staining whole lysed blood, where granulocytes were identi ed by size and granularity (forward and side scattering properties). PBMCs were isolated by density gradient centrifugation and whole blood lysed using Serotec Erythrolyse buffer. Cells were stained freshly using the following antibodies Beckton Dickinson, NKG2D (1D11) from eBioscience. Only the HLA-DR staining was controlled using an isotype control antibody from Biolegend, while the staining of NKG2A, NKG2D, IL21R and TREM-1 were controlled by absence of added antibody (FMO, uorescence minus one). The data was analyzed using FlowJo (TreeStar Inc, Ashland, OR, USA). In total, we included 422 ow cytometry variables in our analysis. For the correlation analysis, we used the Pearson method.

Protein measurements
The protein measurements were performed as previously described [18]. A log transformation of protein levels was applied for association analysis to negate highly skewed protein levels. We considered only proteins with a minimum of at least 8 different non-zero values, resulting in the analysis of 51 proteins, where 32 proteins were measured with two separate technologies ( HumanMAP assay (Myriad RBM) and VectraDA (Crescendo)).

Statistical analysis
Gene expression analysis was performed using the DESeq2 Bioconductor package [23]. In the two cross sectional analyses, we adjusted for current smoking (whether patient was smoking when they started treatment), presence of HLA-DRB1 shared epitope (SE) alleles, presence of anti-citrullinated protein antibody (ACPA), presence of bone erosions (only available at baseline), and whether the patient was prescribed prednisolone when blood was drawn. To account for the potential differences in cell type composition of PBMCs, we also included cell type proportions of the samples (proportion of B cells, T cells, NK cells and monocytes of total PBMCs). Gene expression analysis was performed using the model: gene expression ~ response + bone erosion + shared epitope + current_smoker + ACPA + prednisolone + prop.

B cells + prop.T cells + prop.NK cells + prop.Monocytes
In the longitudinal analysis we investigated changes in gene expression during treatment, and thus used only paired samples (25 patients). To nd the effect of treatment on gene expression, we adjusted the model for prednisolone prescription: To detect the effect of anti-TNF treatment in responders and non-responders separately, we used a model with interaction terms as described in DESeq2 [23].
gene expression ~ response + patid:response + visit:response Gene set enrichment analysis was performed using the preranked gene set enrichment method using the R package fgsea [24]. We included 1329 canonical pathways (8904 genes) curated from the following online databases BioCarta, KEGG, Matrisome Project, Pathway Interaction Database, Reactome, SigmaAldrich, Signal Transduction KE, Signalling Gateway and SuperArray SABiosciences, collected by MSigDB. We considered pathways with a FDR threshold (FDR < 0.01 or FDR < 0.1) for gene expression contrasts based on the number of signi cant gene sets.
For protein expression and ow cytometry data, we analyzed the association to EULAR response at two time points using logistic regression. We used the below model for the protein data analysis: response ~ measurement + ethnicity + erosion + current smoker + shared epitope + ACPA + prednisolone For the longitudinal analysis of protein, ow cytometry, we used a mixed linear model (lme from nlme R package) : where patid is a random effect. We computed the estimated marginal means (EMMs) for two contrasts from this model: for the treatment effect in responders and for the treatment effect in non-responders.

Prediction model
We built three classi cation models of the anti-TNF response data: a linear model (with L1 and L2 regularization, using the glmnet R library), a non-linear model (using the randomForest library in R), and a kernel-based method (SVM with an RBF kernel, using the smvRadial library in R). We used 10 repeats of 5-fold cross-validation, where in each repeat 5 randomly sampled steps of hyperparameter estimation were employed. We built separate predictive models for clinical variables, gene expression, ow cytometry and for protein measurements, where the covariates mentioned above were included as additional features. For the models based on clinical data, we included CRP, ESR, Swollen joints, Tender joints, Pain (VAS mm), Physical function (HAQ), Patient assessment of global status, Health professional assessment of global status, DAS28, DAS28-CRP, Eosinophils (10*9 /l) and Monocytes (10*9 /l). The prediction models were built on measurements of samples from naive to anti-TNF treated patients (before treatment) and also on measurements taken after three months of anti-TNF therapy (treated). For the gene expression data, we removed genes with all zero counts and ncRNAs (miRNA, snoRNA, piRNA, tRNAs, rRNA and siRNA) that resulted in dataset of 22,628 transcripts (only protein coding and long noncoding transcripts are included) and used transcript per million (TPM) normalized read counts in the model.

Association of baseline gene expression signatures and biomarkers with response to anti-TNF treatment
In order to decrease the heterogeneity between analyzed groups for all the presented analyses, we included only female patients diagnosed with RA in our study (18 responders and 10 non-responders).
The gene expression analysis was performed for the baseline data between the group of future responders compared to the group of non-responders. The analysis identi ed 59 differentially expressed genes and most of these genes achieved signi cance (FDR ≤ 0.1) due to a single outlier sample showing different gene expression pro le (Additional le 1: Fig. S2A-B). Therefore, we decided to exclude this patient sample (both anti-TNF naïve and treated) from the subsequent analyses. The further analysis (without the outlier sample) yielded 192 differentially expressed genes (FDR ≤ 0.1), including 103 genes with higher expression in the group of responders and 89 genes with higher expression in the group of non-responders. The top differentially expressed genes are represented in Fig. 1A and listed in Table 2 (also see Additional le 2: Table S1).
Many differentially expressed genes showed signi cant differences between responders and nonresponders. However, this was found to be due to outlier expression in one or more patient samples. In order to assess possible heterogeneity and to detect the genes with stable association to response, we employed a leave-one-out (LOO) approach, where we removed one patient sample in each iteration and repeated the association analysis. The genes are considered in each iteration if the gene meets statistical signi cance with a false discovery rate less than 0.1. After performing the LOO approach, only two genes, Epiplakin (EPPK1) and FosB Proto-Oncogene, AP-1 Transcription Factor Subunit (FOSB) with higher expression in future responders showed signi cance in all possible 28 iterations (FDR < 0.1 across all 28 LOO iterations, Table 2). And additional ve genes, EGR1, EGR2, BCL6-AS1, IGLV10-54 and IGKV1D-39 showed signi cance (FDR < 0.1) in 27 LOO iterations. The genes EGR1, EGR2 and BCL6-AS1 were expressed higher in future responders, whereas immunoglobulin light chain genes IGLV10-54 and IKV1D-39 were expressed lower in future responders. We plotted the normalized expression counts of these seven differentially expressed genes and found only genes EPPK1 and BCL6-AS1 showed clear difference in expression between responders and non-responders (Fig. 1B) whereas FOSB, EGR1 and EGR2 did not show clear differences (Additional le 1: Fig. S2C). Also we noticed 6 immunoglobulin light chain genes (IGLV10-54, IGKV1D-39, IGKV3-20, IGLV3-1, IGKV1-17, IGKV2-24) and 1 heavy chain gene (IGHV5-10-1) showing lower expression in group of future responders compared to group of non-responders, however these genes did not pass the LOO analysis (FDR < 0.1) (Fig. 1A). The number of differentially expressed genes that showed signi cance across the 28 LOO iterations varied from 15 to 1617, indicating that the statistical analyses in DESeq2 were sensitive to outlier samples.  To understand the characteristics of responders and non-responders, we performed gene set enrichment analysis (GSEA) and identi ed a total of 127 regulated pathways (FDR ≤ 0.01, Additional le 2: Table S2).
We found positive and negative enrichment characteristics for responders compared to non-responders.
Notably, response to therapy was preferentially characterized by higher expression of genes involved in graft versus host disease, antigen processing and presentation, and neutrophil degranulation.  Table S2).
We further studied the association of immune phenotypes measured by ow cytometry and also the association of blood plasma protein levels to clinically de ned response before anti-TNF treatment.
Neither ow cytometry measurements nor protein measurements showed any signi cant difference between responders and non-responders at baseline (data not shown).

Association of gene expression signatures and response upon 3 months of anti-TNF treatment
With the aim to identify gene expression signatures associated with response upon therapy with TNFblockade, we performed differential expression analysis in PBMC between groups of responders and nonresponders after 3 months of the treatment. This analysis identi ed signi cant differences in expression for 19 genes (FDR ≤ 0.1) in PBMC (Additional le 1: Fig. S4A). We investigated the effects of outliers on differential gene expression among the follow up samples using a LOO approach. None of the previously found 19 genes remained signi cantly differentially expressed in all 32 iterations. However, three genes, BRDOS (Additional le 1: Fig. S4B), C2orf42 and HBA2 showed signi cance (FDR ≤ 0.1) in 31 iterations.
All three genes are expressed lower in the group of responders compared to the group of non-responders. Additionally, ve genes, EPHB3, MKS1, NCK1-AS1 (Additional le 1: Fig. S4B), SLC25A39, FBXO7 showed signi cance in 30 iterations. The list of differentially expressed genes and number of iterations where each gene showed signi cance are presented in Additional le 2: Table S3.
The gene set enrichment analysis was performed using differentially expressed genes, sorted based on log2 fold change and the analysis identi ed 27 regulated pathways. Interestingly, all the 27 pathways that showed signi cance had a lower expression in the group of responders. The regulated pathways were predominantly enriched for metabolism of RNA, metabolism of proteins and metabolism of amino acids and derivatives (Additional le 1: Fig. S3B; Additional le 2: Table S4) suggesting overall downregulation of biosynthesis in PBMCs of patients who have responded to anti-TNF treatment.
Further, our association analysis of immune phenotypes and plasma protein levels to clinically de ned response of anti-TNF treatment did not show any signi cant association between responders and nonresponders after three months (data not shown).

Effect on gene expression in PBMCs during anti-TNF treatment
Since biological and technical variations between individuals may signi cantly affect the analyses, paired samples of PBMC from the same patient before and after treatment is the most preferable approach for addressing changes related to the treatment. We analyzed the effect of anti-TNF treatment on gene expression using paired PBMC samples from all 25 RA patients, not considering response. The analysis identi ed 25 genes that showed signi cant treatment effect on gene expression (FDR ≤ 0.1).
The expression of 14 genes were suppressed and 11 genes showed a slight increase in expression compared to baseline with 3 months of treatment with TNF blockade (Additional le 2: Table S5). The transcription factor BHLHE40 that controls cytokine production by T cells and Chitinase-6-like protein, CHI3L1were suppressed during treatment whereas the B cell novel protein 1 (alias FAM129C) and Tetratricopeptide Repeat Domain 21A (TTC21A) were induced by treatment ( Fig. 2A). The pathway analysis of differentially expressed genes did not show any enrichment of gene sets (FDR ≤ 0.01), but when using a less conservative threshold of FDR ≤ 0.1, we detected the regulation of 114 gene sets during treatment across all patients (Additional le 2: Table S6). Induced genes are signi cantly enriched for genes involved in the RNA and protein metabolism, and interferon signaling, whereas suppressed genes are predominantly enriched for genes involved in neutrophil degranulation, hemostasis and signaling by GPCR (Fig. 2B).

Effect of anti-TNF treatment on gene expression in PBMCs in relation to treatment response
The trajectory of gene expression changes may correlate with measured clinical outcomes. Therefore, we induced during anti-TNF treatment. The gene expression plots for all ve genes using normalized counts shows a clear difference in responders before and after anti-TNF treatment (Fig. 3A). We observed regulation of these genes in the same direction for non-responders, but with less signi cance (Additional le 1: Fig. S5).
Further, we performed gene set enrichment analysis of differentially expressed genes in responders and found 78 pathways that are signi cantly enriched (Additional le 2: Table S7). In responders, we noticed induction of pathways involved in regulation of cell cycle mitotic and protein metabolism, whereas genes involved in extracellular matrix organization, neutrophil degranulation, signaling by GPCR, signaling by interleukins, hemostasis and immune responses such as Toll like receptor cascades were downregulated ( Fig. 3B; Additional le 2: Table S7).

Changes in cell phenotypes during anti-TNF treatment
Using a linear model we studied the changes in 422 immune phenotypes measured by ow cytometry during treatment with anti-TNF. When analyzing the effect of treatment in responders and nonresponders, we observed differences in seven cell phenotypes in responders and no signi cant differences in non-responders (Additional le 2: Table S8). After treatment in responders, we detected a strong suppression of the proportion of granulocytes among leukocytes (de ned as CD45 + cells), as well as a decrease in overall concentration of neutrophils in whole blood. The proportion of T cells (de ned as

CD3 + cells) and B cells (CD3-CD19+) among leukocytes was instead up-regulated during anti-TNF
treatment among responders, along with the proportion of NKG2A + NKp44 + NK cells out of all NKp44 + NK cells (Fig. 3C). We compared beta coe cient values of responders and non-responders for the selected 58 cell phenotypes (p-value < 0.05) and we observed a positive correlation between response groups (r = 0.46, p value = 0.0008) indicating similar directionality of regulation for the most cell phenotypes.

Effect of anti-TNF treatment on levels of different proteins blood plasma
To identify the proteins that are regulated during treatment in responders and non-responders separately, we performed longitudinal analysis on paired patient samples. In responders, we identi ed regulation of 12 proteins in plasma compared to one protein in non-responders. Proteins were predominantly downregulated by anti-TNF treatment in responders, including CRP, IL-6, MMP-1, MMP-3, SAA, TNF-RI, VEGF, TNFR2, MIG, MIP-1 beta and YKL-40 (Fig. 3D). Interestingly, we observed downregulation of protein matrix metalloproteinase-3 (MMP-3) measured using two different methods. The protein adiponectin is induced during anti-TNF treatment among non-responders, as well as with less signi cance in responders (FDR: 0.11). The list of proteins that were regulated during anti-TNF treatment is shown in Additional le 2: Table S9.
Classi er performance for anti-TNF response data We investigated the utility of machine learning algorithms to predict anti-TNF response using clinical data, ow cytometry measurements, protein measurements and transcriptomic data. At baseline, the model based on transcriptomics data predicted response fairly accurately with linear model (ROC AUC ± SEM : 0.81 ± 0.17) (Fig. 4) whereas the models based on clinical data, ow cytometry data and protein data showed limited predictive utility. The kernel method at baseline predicted response with ROC AUC: 0.73 ± 0.17 for clinical data, ROC AUC: 0.72 ± 0.18 for ow cytometry and ROC AUC: 0.72 ± 0.15 for protein data (Fig. 4). We further studied the classi er performance of the models based on all four data types at three months (after treatment) and we observed limited classi er utility of models based on ow cytometry, protein as well as transcriptomics data. In contrast, the linear model based on clinical data showed good classi er performance with ROC AUC: 0.85 ± 0.15 at three months. For the FACS data, we found maximum classi er utility of ROC AUC: 0.68 ± 0.17 using linear model, whereas the models based on proteins and transcriptomics data showed maximum classi er utility using non-linear method with ROC AUC: 0.73 ± 0.15 and ROC AUC: 0.72 ± 0.18 respectively.

Discussion
This study represents a comprehensive analysis of transcriptomics, proteomics and ow cytometry data analysis of female RA patients treated with anti-TNF. Here, we used data from multiple data types to infer differences between groups of responders and non-responders before and after treatment. Our analysis was directed towards nding transcriptional and translational regulations during anti-TNF treatment using paired samples collected at two time points. We identi ed genes that are differentially regulated between responders and non-responders both at baseline and after anti-TNF treatment. We further reported changes in gene expression, protein measurements and cell phenotypes during the course of anti-TNF treatment. Interestingly, our integrated studies revealed that CHI3L1 transcript and its protein product YKL-40 were suppressed upon anti-TNF treatment. The anti-TNF treatment in responders also resulted in an increased proportion of B cells, T cells and NK cells, whereas the proportion of granulocytes was strongly suppressed in responders.
The identi cation of potential biomarkers with prognostic value for response to a given therapy is challenging as RA is a very heterogeneous disease by its clinical characteristics and pathological processes. Previously, various attempts have been made to nd biomarkers for anti-TNF treatment, but with limited success [25][26][27][28]. This may be due to the high clinical heterogeneity in RA patient samples and strong confounding (covariates) effects. For example, variation in cell subsets is one of the strong covariates of gene expression [29]. For the current study, we used a well characterized cohort which allowed us to adjust for the important covariates, such as clinical measurements and proportions of T cells, B cells, NK cells and monocytes measured from ow cytometry. For RA, the response measure is calculated based on changes in DAS28 scores as de ned by the EULAR response criteria [30]. In addition, RNA sequencing experiments can often generate outlier read counts in one or more RNA samples. The presence of these outliers in the data considerably limits the power of differential testing and therefore we performed extensive transcriptomics analysis using leave-one-out approach to address possible heterogeneity and nd genes with a stable association to response. To the best of our knowledge, this is the rst study that integrates RNA sequencing of PBMCs, broad ow cytometry measurements, and measurements of protein levels in serum and plasma to nd biomarkers in RA patients treated with TNF inhibitors.
In the analysis using baseline samples, we report two genes EPPK1 and BCL6-AS1 that showed stronger association with response groups. The cell adhesion gene epiplakin 1, EPPK1 and BCL6-AS1 showed higher expression in future responders compared to future non-responders. The in ltration of in ammatory cells into the synovial lining is achieved by deregulation of cell adhesion molecules and the studies have previously reported the association of cell adhesion genes and suggested a role of these molecules in the pathogenesis of rheumatoid arthritis and hemophilic arthropathy [31,32]. We also found the expression of the EPPK1 gene to be positively correlated with the proportion of NK cells and proportion of monocytes. The gene BCL6-AS1, a long non-coding RNA was shown to be correlated with the BCL6 translocation zone that promotes chromosomal breaks in immunoglobulin heavy chain (IgH) switch regions through convergent transcription [33,34]. We found the expression of BCL6-AS1 to be positively correlated (r2: 0.89) with the expression of BCL6 gene which is important for the formation of both Tfh cells and germinal B cells that enhances humoral responses [35][36][37]. Importantly, the differences we observed in genes, EPPK1(log2 fold change:1.95, FDR:5.5E-03) and BCL6-AS1(log2 fold change:1.51, FDR:8.16E-02) were maintained between responders and non-responders following the adjustment to CRP measurements in the gene expression model. Interestingly, future non-responders showed an increase in cell cycle pathways mainly cell cycle mitotic activity, cell cycle checkpoints. This may partly explain the hyperproliferation of cells that leads to the accumulation of pro-in ammatory cytokines in the in amed joints [38,39].
Our analysis of differences in gene expression patterns between responders and non-responders in samples obtained at 3 months after anti-TNF treatment showed signi cant differences in two genes, BRD3OS was expressed higher in non-responders as compared to responders and NCK-AS1 was expressed higher in responders as compared to non-responders. Interestingly, pathway analysis suggested downregulation of pathways involved in metabolism of RNA, metabolism of amino acids and derivatives and metabolism of proteins in responders. This inactivation of pathways involved in metabolism among responders corroborates with the previous ndings that proliferation and rapid activation of immune and stromal cells requires a metabolically highly active state. Such a high metabolic state induces overproduction of enzymes that lead to degradation of cartilage and bones and the production of cytokines that promote immune cell in ltration [40][41][42].
The longitudinal studies using paired samples of PBMCs from the same patient before and after treatment removes inter-individual variations, and therefore provides better possibility to detect changes induced by treatment. We analyzed the treatment effects in RA patients using two different approaches, i) changes in gene expression patterns in samples from all RA patients not considering response status; ii) treatment effect separately in responders and non-responders in order to understand to which extent anti-TNF treatment regulates gene expression differently in responders and non-responders. In the rst approach, our study revealed twenty-ve genes that were differentially expressed in all RA patients, whereas in the second approach we found ve genes that were differentially expressed in responders and no genes that were differentially expressed in non-responders. In both these approaches, we found four common genes CXCR2, MPO and MYADM that were downregulated, whereas FCGR2B was upregulated upon treatment. We also observed that the TNFAIP6 gene (TNF Alpha Induced Protein 6, also called tumor necrosis factor stimulated gene-6, TSG-6) was suppressed by anti-TNF treatment in responders. Previous studies have shown that expression of TSG-6 has a strong correlation with disease severity and is a potential biomarker of in ammation [43][44][45]. Higher expression of TSG-6 has been found in synovial uid of patients with osteoarthritis and rheumatoid arthritis and the protein coded by this gene is secreted by cells of articular joints. TSG-6 has been reported to play a key role in remodeling of extracellular matrix, regulation of leukocyte migration and stimulation of cell proliferation during in ammation [46,47]. These previous ndings correspond well with our pathway analysis, which suggests that treatment with anti-TNF results in downregulation of genes involved in extracellular matrix organization and signaling by interleukins, and an upregulation of genes involved in regulation of mitotic cell cycle.
Interestingly, the differentially expressed genes that are regulated upon treatment in responders showed regulation in the same direction for non-responders. Thus, the differences that we observed in responders and not in non-responders could be due to the result of different powers of the analysis in the two groups.
Extended future studies with large sample size are warranted to validate these results.
We studied changes in cell phenotypes in paired blood samples and observed changes in seven cell populations in responders to anti-TNF treatment, indicating a marked change in major cell type proportions during treatment (Fig. 3C). The proportions of B cells and T cells among leukocytes were increased during treatment in responders whereas there was a strong reduction in the proportions of granulocytes. There was also a decrease in overall concentration of neutrophils in whole blood in responders as well as in non-responders. The reduction of neutrophils seen by ow cytometry was also associated with signi cant reduction in peripheral blood neutrophil count seen by routine blood analysis, leading to 19% of patients becoming neutropenic by a clinical judgement. This nding corroborates a recent study which showed that patients treated with anti-TNF in combination with methotrexate decreased the blood neutrophil counts regardless of their clinical response to therapy [48,49]. Additionally, our results also corroborate with earlier studies showing an induction of B cells and NK cells in whole blood following anti-TNF treatment [50]. Along with these changes we also see an upregulation of the proportion of activated NK cells (NKp44+) in responders that express the inhibitory protein NKG2A.
The longitudinal analysis of protein data showed changes in plasma protein levels differently in responders as compared to non-responders. The expression of the protein YKL-40 was reduced upon treatment (beta: -0.26) and the gene encoding for YKL-40, CHI3L1, was also down-regulated ( Fig. 2A,  Fig. 3D). In the analysis including only responders, the gene CHI3L1 showed strong suppression (log2 fold change: -1.07) upon treatment, however with lower statistical signi cance (p-value < 1.73E-04; FDR < 0.21). The CHI3L1 gene may negatively regulate T cell activation and may also inhibit Th1 differentiation via the IFNγ-STAT1 pathway [51]. We also showed that treatment with TNF inhibitors leads to a larger proportion of T cells among responders (beta: 8.16).
Interestingly, we also found two proteins, soluble TNFR2 and adiponectin which plasma levels are increased after treatment both in responders and non-responders. The TNFR2 protein is predominantly produced by certain T cells, mainly regulatory T cells, endothelial cells, thymocytes, mesenchymal stem cells and cells from the nervous system [52]. Various studies have pointed out that the increase in TNFR2 signaling may lead to the activation and proliferation of Tregs and promote tissue regeneration [53][54][55].
The protein adiponectin is considered as a pro-in ammatory mediator, and a similar induction of adiponectin as seen here for TNF-blockade has been shown previously after treatment with anti-IL-6R, Tocilizumab [56]. Importantly, it has been shown that adiponectin has a multifaceted role in RA with both pro-in ammatory and anti-in ammatory functions [57]. The studies on immune phenotypes and protein levels both before and after treatment did not show any signi cant differences between responders and non-responders. As for other ndings in the present study, this lack of differences may be due to both insu cient power in our analysis and to an insu cient choice of protein markers that we investigated.
Further studies on proteins as biomarkers for response therapies, including TNF-inhibition, in RA are thus warranted.
We studied clinical variables, cell phenotypes, protein measurements and transcriptomics data to assess their ability to predict response to anti-TNF treatment. We found high predictive utility for response using different data types and algorithms before treatment. The linear model based on transcriptomics data at baseline found a suggestive good predictability using the presently applied gene expression classi er. We also illustrated that models with transcriptomics data alone predict response with higher accuracy compared to models with clinical data alone. Interestingly, we also found clinical variables add less further utility to the transcriptomics based predictive models at baseline (data not shown). Previously, a similar outcome was observed in a study where models using transcriptomics data alone predicted brous cap thickness response to statin treatment with better accuracy compared to the models with clinical data or transcriptomics plus clinical variables [58]. We caution that our prediction model ndings need to be validated in independent cohorts, before its potential use in clinical settings.
Overall, we used a well-characterized cohort of RA patients and reported gene expression, ow cytometry and proteomics pro ling in response to anti-TNF treatment. We identi ed gene expression differences in response groups at baseline and more signi cant regulations during the treatment. The anti-TNF treatment causes a major regulation of cell subsets that corroborates with the previous reports, which are also mirrored in the protein analysis. We also report strong decrease in expression of genes, proteins and cell subsets in responders upon treatment. Also, our study highlights machine learning predictive utility of the anti-TNF treatment response using different biological measurements at baseline. The limitation of our study is that we have done pro ling of only female RA patients and that relatively few patients were included. Our ndings also need to be validated in independent cohorts and to be extended in larger studies with more patient samples.

Conclusion
In summary, our ndings resulted in several insights regarding the relationships between a set of biomarkers and response to anti-TNF therapy in female RA patients. Our integrative and multi-omics approach identi ed gene expression signatures, changes in protein concentration as well as cell phenotypes in responders. Our machine learning model based on transcriptomics data showed high predictive utility in stratifying patients and/or predict response before anti-TNF treatment initiation. We envision that machine learning algorithms based on multi-omics data will help to support rheumatologist's decision towards personalized treatment of RA patients.    Statistical machine learning models to predict response (evaluated after three months) at baseline and after anti-TNF treatment using clinical variables, ow cytometry measurements, protein measurements and gene expression data. The Y-axis represents the area under the receiver operating characteristic (ROC) curves (AUCs) calculated for estimating the predicted performance.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.