Association of Toll-Like Receptor 9 Gene Polymorphisms With Remission in Patients With Rheumatoid Arthritis Receiving Tumor Necrosis Factor Alpha Inhibitors and Development of Machine Learning Models

Studies investigating the association between toll-like receptor (TLR)-4 or TLR9 gene polymorphisms and disease remission in rheumatoid arthritis (RA) patients taking tumor necrosis factor alpha (TNF-α) inhibitors have yet to be conducted. In this context, this study was designed to investigate the effects of polymorphisms in TLR4 and TLR9 on response to TNF-α inhibitors and to develop various machine learning approaches to predict remission. A total of six single nucleotide polymorphisms (SNPs) were investigated. Logistic regression analysis was used to investigate the association between genetic polymorphisms and response to treatment. Various machine learning methods were utilized for prediction of remission. After adjusting for covariates, the rate of remission of T-allele carriers of TLR9 rs352139 was about 5 times that of the CC-genotype carriers (95% confidence interval (CI) 1.325–19.231, p = 0.018). Among machine learning algorithms, multivariate logistic regression and elastic net showed the best prediction, with an AUROC value of 0.71 (95% CI 0.597–0.823 for both models). This study showed an association between a TLR9 polymorphism (rs352139) and treatment response in RA patients receiving TNF-α inhibitors. Moreover, this study developed various machine learning methods for prediction, among which the elastic net provided the best model for remission prediction.


Introduction
Rheumatoid arthritis (RA) is a severe chronic inflammatory reaction that occurs in the synovium of joints.
Mortality hazards are 60-70% higher in patients with RA than in those without the disease [1]. Although the exact etiology of RA is still under investigation, several genetic studies have suggested a role of genetic factors [2,3]. The most well-known genetic risk factors for RA are variations in the human leukocyte antigen (HLA) genes, especially the HLA-DRB1 gene [4]. However, many other genes with potential links to RA remain to be investigated in order to discover further genetic risk factors and therapeutic variations for RA.
Tumor necrosis factor alpha (TNF-α) inhibitors play important roles in inflammatory states, including RA [5]. There are five TNF-α inhibitors available for RA treatment (adalimumab, certolizumab, etanercept, golimumab, and infliximab), and their clinical efficacy in RA is known to be similar [6].
Patients with advanced RA are treated with TNF-α inhibitors; however, the efficacy of these treatments is still questionable, as several studies have reported that only one-third of the patients benefit from the treatment [7,8].
Toll-like receptors (TLRs) play vital roles in both innate and acquired immune systems [9], and several studies have shown their association with the development of RA [10][11][12]. Notably, TLRs are known as inducers of TNF-α transcription [13]. Triad3A is an E3 ubiquitin-protein ligase that induces degradation of TLR4 and TLR9 [14]. Hence, reduction in endogenous Triad3A results in TLR activation. Since Triad3A acts specifically on TLR4 and TLR9 among the 13 members of the TLR family, the genes encoding TLR4 and TLR9 are important for understanding RA pathogenesis and potential therapeutic intervention [15][16][17][18].
Nuclear factor-kappaB (NFkB) is associated with the response to TNF-α inhibitors in autoimmune diseases [19]. Because of this role of NFkB, several transcription factors activating NFkB have been discovered and investigated, including TLRs. As TLRs activate pro-inflammatory cytokines including TNF-α and transcription factors such as NFkB, their polymorphisms may potentially affect treatment outcomes [20]. Therefore, this study aimed to examine the effects of TLR4 and TLR9 polymorphisms on drug response in RA patients receiving TNF-α inhibitors. Moreover, we developed machine learning algorithms to assess the ability of the obtained factors to predict remission in patients with RA.

Study patients
This prospective observational two-center study enrolled 105 patients who were prescribed TNF-α inhibitors (adalimumab, etanercept, golimumab, or infliximab) at Ajou University Hospital and Chungbuk National University Hospital between July 2017 and December 2019. Data on sex, age, weight, height, duration of RA, autoantibodies (rheumatoid factor and anti-cyclic citrullinated peptide antibody), concomitant medications, and comorbidities were collected from electronic medical records. Additionally, baseline data on disease activity score (DAS)-28 and its subcomponents, which included tender joint count (TJC)-28, swollen joint count (SJC)-28, global health (GH), and erythrocyte sedimentation rate (ESR) or C-reactive protein levels, were collected.
A good clinical response to anti-TNF therapy was defined on the basis of the DAS-28 score. Patients with a DAS-28 score of less than 2.6 after 6 months of TNF-α inhibitor therapy were considered to be in remission [21]. DAS-28 was calculated as 0.56 × √(TJC28) + 0.28 × √(SJC28) + 0.70 × ln(ESR) + 0.014 × GH [21]. This study was approved by the Institutional Review Boards of the Ajou University Hospital (approval number: AJIRB-BMR-OBS-17-153) and Chungbuk National University Hospital (approval number: 2017-06-011-004). All patients provided written informed consent for participation. This study was conducted according to the principles of the Declaration of Helsinki (2013).
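The DAS-28 formula and the remission cutoff given above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names, the input checks, the assumed units (ESR in mm/h, GH on a 0 to 100 scale), and the patient values are all hypothetical.

```python
import math

REMISSION_CUTOFF = 2.6  # DAS-28 threshold for remission stated in the text


def das28_esr(tjc28: int, sjc28: int, esr: float, gh: float) -> float:
    """DAS-28 from tender/swollen joint counts, ESR, and global health.

    Assumed units (not stated in the source): ESR in mm/h, GH on 0-100.
    """
    if not (0 <= tjc28 <= 28 and 0 <= sjc28 <= 28):
        raise ValueError("joint counts must lie between 0 and 28")
    if esr <= 0:
        raise ValueError("ESR must be positive to take its logarithm")
    return (0.56 * math.sqrt(tjc28)
            + 0.28 * math.sqrt(sjc28)
            + 0.70 * math.log(esr)
            + 0.014 * gh)


def in_remission(score: float) -> bool:
    """Remission is a DAS-28 score strictly below the cutoff."""
    return score < REMISSION_CUTOFF


# Hypothetical patient values, for illustration only
score = das28_esr(tjc28=4, sjc28=1, esr=10.0, gh=20.0)
print(round(score, 2), in_remission(score))
```

In the study, the score would be evaluated after 6 months of TNF-α inhibitor therapy to assign each patient to the remission or non-remission group.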
Genomic DNA of the patients was isolated from ethylenediaminetetraacetic acid-blood samples using the QIAamp DNA Blood Mini Kit (Qiagen GmbH, Hilden, Germany) according to the manufacturer's protocol. Genotyping was performed using a single-base primer extension assay with SNaPShot multiplex kits (ABI, Foster City, CA, USA) or TaqMan genotyping assay in a real-time PCR system (ABI 7300, ABI), according to the manufacturer's recommendations.

Statistical analysis and machine learning methods
Student's t-test was used to compare continuous variables between patients who showed good clinical response (remission) and those who did not. Chi-square test or Fisher's exact test was used to compare categorical variables between the two groups. Multivariable logistic regression analysis was used to examine independent factors for remission; factors with a p-value less than 0.05 in univariate analysis along with clinically relevant confounders were included in multivariable analysis. The Hosmer-Lemeshow test was performed to confirm the model's goodness of fit.
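As a simplified illustration of how a genotype effect is expressed as an odds ratio with a 95% CI, the sketch below computes an unadjusted odds ratio and its Wald interval from a 2×2 table. It is not the covariate-adjusted multivariable model used in this study, and the function name and the counts are hypothetical.

```python
import math


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio and Wald CI from a 2x2 table.

    a, b: remission / no remission among allele carriers
    c, d: remission / no remission among non-carriers
    z:    normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only (not the study data)
or_, lo, hi = odds_ratio_wald(a=20, b=30, c=5, d=40)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

Because the Wald interval is symmetric on the log scale, the point estimate is the geometric mean of the two limits, which offers a quick consistency check on reported intervals.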
This study employed a random forest-based classification approach to analyze the importance of different variables that affect remission. To prevent over-fitting, we selected the seven most important features. Various machine learning methods such as multivariate logistic regression, elastic net, random forest, and support vector machine (SVM) were utilized for prediction of remission. All the methods were implemented with the caret R package. The area under the receiver-operating curve (AUROC), used to assess the ability of each model to predict remission, and its 95% confidence interval (CI) were reported for each machine learning prediction model. A p-value of less than 0.05 was considered statistically significant. Univariate statistical analysis was conducted using IBM SPSS statistics, version 20 software (International Business Machines Corp., New York, USA). All other analyses were performed using R software version 3.6.0 (R Foundation for Statistical Computing, Vienna, Austria).
To measure the performance of each machine learning model, internal validation was performed. The dataset was randomly divided for model development and evaluation in the prediction process. After partitioning the data sample into five subsets, one subset was selected for model validation while the remaining subsets were used to establish machine learning models. Each five-fold cross-validation iteration was repeated 100 times to evaluate the power of the machine learning models.
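The repeated five-fold procedure described above can be sketched in plain Python as follows. This is not the caret implementation used in the study: the toy model (remission rate per genotype group), the pooling of out-of-fold scores into one AUROC per repeat, all function names, and the data are assumptions made for illustration.

```python
import random
from statistics import mean


def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def repeated_cv_auroc(X, y, fit, predict, k=5, repeats=100, seed=0):
    """One pooled out-of-fold AUROC per repeat of k-fold cross-validation."""
    rng = random.Random(seed)
    results = []
    for _ in range(repeats):
        oof = [0.0] * len(y)  # out-of-fold score for every sample
        for fold in k_fold_indices(len(y), k, rng):
            held_out = set(fold)
            train = [i for i in range(len(y)) if i not in held_out]
            model = fit([X[i] for i in train], [y[i] for i in train])
            for i in fold:
                oof[i] = predict(model, X[i])
        results.append(auroc(y, oof))
    return results


def fit_rate(X_train, y_train):
    """Toy model (hypothetical): remission rate per value of one binary feature."""
    rates = {}
    for value in (0, 1):
        ys = [t for x, t in zip(X_train, y_train) if x == value]
        rates[value] = mean(ys) if ys else 0.5
    return rates


def predict_rate(model, x):
    return model[x]


# Hypothetical data: 1 = allele carrier, outcome 1 = remission (not the study data)
X = [1] * 50 + [0] * 48
y = [1] * 20 + [0] * 30 + [1] * 5 + [0] * 43
aucs = repeated_cv_auroc(X, y, fit_rate, predict_rate)
print(len(aucs), round(mean(aucs), 3))
```

Pooling the out-of-fold scores avoids undefined AUROC values when a small validation fold happens to contain only one outcome class; averaging per-fold AUROC values is an equally common alternative.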

Results
Among the 105 patients enrolled in this study, 7 patients were excluded due to incomplete medical data.
The data from 98 patients receiving TNF-α inhibitors were analyzed. The mean age of the included patients was 53 years (range: 20-82 years), and there were 79 (80.6%) females. The mean duration of RA was 9 years, and 29 patients reached remission. To determine the possible effect of the disease status on response to TNF-α inhibitors, baseline DAS-28 and its subcomponents were examined. Baseline DAS-28 and its subcomponents did not differ significantly between the remission and non-remission groups (Table 1). Marginal significance was found for sex (p = 0.059) and hypertension (p = 0.060).
As shown in Table 2, statistically significant associations between genotypes and RA remission were found for both TLR9 SNPs: T-allele carriers of rs352139 and rs352140 experienced approximately 3.3 and 4.5 times more frequent remission than patients with the CC genotype, respectively.
As shown in Figure 1, after feature selection using a five-fold cross-validated random forest approach, four important variables (rs352139, body mass index (BMI), sulfasalazine, and anti-citrullinated protein/peptide antibody (ACPA)) were included in the machine learning models.
The average AUROC values across 100 random iterations of the five-fold cross-validated multivariate logistic regression, elastic net, random forest, and SVM models are shown in Table 4. The AUROC values for multivariate logistic regression, elastic net, and random forest indicated good model performance: 0.71, 0.71, and 0.70, respectively (95% CI 0.594–0.827 for the multivariate logistic regression and elastic net models, and 0.584–0.821 for random forest). Linear kernel SVM and radial kernel SVM showed sub-optimal performance, with AUROC values of 0.60 and 0.67, respectively (95% CI 0.416–0.782 and 0.53–0.813, respectively). Figure 2 shows the AUROC curves of the three models that exhibited good interpretability and prediction rate.

Discussion
The main finding of this study is that rs352139 of TLR9 was associated with treatment response to TNF-α inhibitors in RA patients. The remission rate in T-allele carriers of rs352139 was about 5 times that in patients with the CC genotype. Multivariate logistic regression and elastic net were the best-performing methods for predicting remission in patients with RA, with AUROC values of 0.71 (95% CI 0.594–0.827 for both models).
TNF-α is a pro-inflammatory cytokine involved in the innate immune response [27]. It is involved in the pathogenesis of several inflammatory conditions, especially RA. As the TNF-α level is elevated in patients with RA, TNF-α inhibitors have frequently been used to treat RA. Unlike other agents for RA therapy, TNF-α inhibitors target cytokines and are used to treat patients with advanced RA.
Damage-associated molecular patterns (DAMPs) are endogenous danger molecules that activate the innate immune system by interacting with TLRs. This evokes innate immune responses, including induction of inflammatory cytokines [28]. DAMPs play an important role in the initiation of inflammation during tissue injury without infection and may also be involved in chronic inflammation, including autoimmune diseases [12]. Once DAMPs are released during tissue injury, TLRs are activated, and the inflammatory cycle is initiated. The binding of TLRs to DAMPs activates the receptors, up-regulating pro-inflammatory mediators including cytokines and resulting in various inflammatory conditions and chronic inflammation [12].
Ligand-bound TLRs interact with elements on the surface of pathogens and activate the MyD88-related pathways [28], resulting in NFkB activation and cytokine gene expression [10]. This ultimately leads to the induction of molecules associated with inflammation and release of pro-inflammatory components such as TNF-α [29]. TLRs are known as inducers of TNF-α transcription [13]. Several studies have shown an increased expression of TLR4 on RA synovial fluid macrophages and RA synovial fibroblasts [30,31] and of TLR9 in RA synovial tissue fibroblasts and RA peripheral blood monocytes [18,32].
Our results revealed that a TLR9 polymorphism was associated with the remission rate of RA patients taking TNF-α inhibitors. The T-allele carriers of rs352139 had a significantly higher remission rate than patients with the CC genotype. TLR9 is expressed by B cells and functions with the B cell receptor complex, resulting in the release of rheumatoid factor [33]. Previously, Bank et al. [19] reported an association of the GG genotype of rs352139 with nonresponse to TNF-α inhibitors in inflammatory bowel disease patients, which is in line with our findings. This association is possibly attributable to alteration in TLR function; however, further research is required to validate our results, as there are no published mechanistic studies on the association between this polymorphism and TNF-α inhibition or treatment response in advanced RA patients.
The utilization of machine learning approaches to predict remission in patients with RA receiving TNF-α inhibitors is novel in clinical research. In clinical settings, these models can be helpful in the decision-making process. To overcome over-fitting, this study utilized random forest, an ensemble method of bootstrap aggregated binary classification trees [34], for feature selection. We also demonstrated that multivariate logistic regression and elastic net, a penalized linear regression model that combines the penalties of the lasso and ridge methods [35], outperformed the other models. Hence, these models may be useful in predicting remission in patients on TNF-α inhibitor treatment.
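For reference, the way elastic net combines the lasso and ridge penalties can be written out for a binary outcome such as remission. The form below assumes the widely used glmnet-style parameterization; the source does not state which parameterization or which values of the tuning parameters λ and α were used.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\;
  -\frac{1}{n}\sum_{i=1}^{n}
  \Big[\, y_i\,(\beta_0 + x_i^{\top}\beta)
        - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
  + \lambda \Big[\, \alpha\,\lVert \beta \rVert_1
        + \frac{1-\alpha}{2}\,\lVert \beta \rVert_2^2 \Big]
```

Here y_i is 1 for remission and 0 otherwise, x_i is the feature vector of patient i, λ ≥ 0 sets the overall penalty strength, and α in [0, 1] mixes the two penalties: α = 1 gives the lasso and α = 0 gives ridge regression.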
The limitations of our study are its small sample size and a lack of a detailed mechanism. Nevertheless, to our knowledge, this is the first study to investigate the effects of genetic variations in the TLR4 and TLR9 genes on favorable response rates to RA treatment in patients taking TNF-α inhibitors. Moreover, this study provides important features and prediction models based on machine learning algorithms including logistic regression, elastic net, random forest and SVM for remission in patients with RA receiving TNF-α inhibitors. Thus, our current results could be used as preliminary data to design individually tailored TNF-α inhibitor treatments for RA patients.

Declarations