Study patients
This prospective, observational, two-center study enrolled 105 patients who were prescribed TNF-α inhibitors (adalimumab, etanercept, golimumab, or infliximab) at Ajou University Hospital and Chungbuk National University Hospital between July 2017 and December 2019. Data on sex, age, weight, height, duration of RA, autoantibody status (rheumatoid factor and anti-cyclic citrullinated peptide antibody), concomitant medications, and comorbidities were collected from electronic medical records. Baseline disease activity score (DAS)-28 and its subcomponents, namely tender joint count (TJC)-28, swollen joint count (SJC)-28, global health (GH), and erythrocyte sedimentation rate (ESR) or C-reactive protein level, were also collected.
A good clinical response to anti-TNF therapy was defined on the basis of DAS-28 scores. Patients with a DAS-28 score of less than 2.6 after 6 months of TNF-α inhibitor therapy were considered to be in remission [21]. DAS-28 was calculated as 0.56 × √(TJC28) + 0.28 × √(SJC28) + 0.70 × ln(ESR) + 0.014 × GH [21].
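The DAS-28 formula and the remission cutoff above can be illustrated with a minimal Python sketch. The function names, the input checks, and the units (ESR in mm/h, GH on a 0 to 100 scale) are assumptions made for illustration; they are not stated in the text and this is not the study's own code.

```python
import math

REMISSION_THRESHOLD = 2.6  # DAS-28 cutoff for remission given in the text


def das28_esr(tjc28, sjc28, esr, gh):
    """Compute DAS-28 (ESR version) with the coefficients given in the text.

    tjc28, sjc28: tender and swollen joint counts (0 to 28)
    esr: erythrocyte sedimentation rate (assumed mm/h; must be > 0 for ln)
    gh: global health score (assumed 0 to 100 scale)
    """
    if not (0 <= tjc28 <= 28 and 0 <= sjc28 <= 28):
        raise ValueError("joint counts must be between 0 and 28")
    if esr <= 0:
        raise ValueError("ESR must be positive to take its logarithm")
    return (0.56 * math.sqrt(tjc28)
            + 0.28 * math.sqrt(sjc28)
            + 0.70 * math.log(esr)
            + 0.014 * gh)


def in_remission(score):
    """Remission is a DAS-28 score strictly below the 2.6 cutoff."""
    return score < REMISSION_THRESHOLD
```

For example, `das28_esr(4, 1, 1, 50)` evaluates to 0.56 × 2 + 0.28 × 1 + 0.70 × 0 + 0.014 × 50 = 2.10, which `in_remission` classifies as remission.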
This study was approved by the Institutional Review Boards of Ajou University Hospital (approval number: AJIRB-BMR-OBS-17-153) and Chungbuk National University Hospital (approval number: 2017-06-011-004). All patients provided written informed consent for participation. This study was conducted according to the principles of the Declaration of Helsinki (2013).
Genotyping methods
To select single nucleotide polymorphisms (SNPs) of TLR4 and TLR9 that might be associated with RA remission, genetic information on TLR4 and TLR9 was obtained from the PharmGKB database, HaploReg 4.1, the NCBI Database of SNPs (dbSNP), and previous studies [19, 22-26]. A total of six SNPs were selected: four in TLR4 (rs11536889, rs1927907, rs1927911, and rs2149356) and two in TLR9 (rs352139 and rs352140).
Genomic DNA was isolated from the patients' ethylenediaminetetraacetic acid–blood samples using the QIAamp DNA Blood Mini Kit (Qiagen GmbH, Hilden, Germany) according to the manufacturer’s protocol. Genotyping was performed using a single-base primer extension assay with SNaPshot multiplex kits (ABI, Foster City, CA, USA) or a TaqMan genotyping assay in a real-time PCR system (ABI 7300, ABI), according to the manufacturer’s recommendations.
Statistical analysis and machine learning methods
Student’s t-test was used to compare continuous variables between patients who showed good clinical response (remission) and those who did not. Chi-square test or Fisher’s exact test was used to compare categorical variables between the two groups. Multivariable logistic regression analysis was used to examine independent factors for remission; factors with a p-value less than 0.05 in univariate analysis along with clinically relevant confounders were included in multivariable analysis. The Hosmer–Lemeshow test was performed to confirm the model’s goodness of fit.
A random forest–based classification approach was used to rank the importance of variables affecting remission. To prevent overfitting, the seven most important features were selected. Multivariable logistic regression, elastic net, random forest, and support vector machine (SVM) models were then used to predict remission. All methods were implemented with the caret R package. The area under the receiver operating characteristic curve (AUROC) and its 95% confidence interval (CI) were reported for each machine learning model to assess its ability to predict remission. A p-value of less than 0.05 was considered statistically significant. Univariate statistical analysis was conducted using IBM SPSS Statistics, version 20 (International Business Machines Corp., New York, USA). All other analyses were performed using R software version 3.6.0 (R Foundation for Statistical Computing, Vienna, Austria).
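As a reminder of what the AUROC measures, the following minimal Python sketch computes it as the probability that a randomly chosen patient in remission receives a higher predicted score than a randomly chosen patient not in remission, with ties counted as one half. The function name `auroc` and the 0/1 label coding are assumptions for illustration; the analyses in this study were run in R, and this sketch does not reproduce the confidence interval calculation.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative cases.

    labels: iterable of 1 (remission) or 0 (no remission); coding is assumed
    scores: predicted scores, higher meaning more likely remission
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties contribute one half
    return wins / (len(pos) * len(neg))
```

With labels `[1, 1, 0, 0]` and scores `[0.9, 0.3, 0.4, 0.1]`, three of the four positive-negative pairs are ordered correctly, so the function returns 0.75.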
Internal validation with five-fold cross-validation was performed to measure the performance of each machine learning model. The dataset was randomly partitioned into five subsets; one subset was used for model validation while the remaining four were used to develop the machine learning models. The five-fold cross-validation was repeated 100 times to evaluate the predictive power of the machine learning models.
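The partitioning scheme described above can be sketched in Python as follows. This is an illustration only, not the caret implementation used in the study: the function name, the seed, and the simple unstratified shuffling are assumptions.

```python
import random


def repeated_kfold_indices(n_samples, n_folds=5, n_repeats=100, seed=0):
    """Yield (repeat, fold, train_idx, valid_idx) for repeated k-fold CV.

    Each repeat reshuffles the sample indices and deals them into n_folds
    near-equal subsets; each subset serves once as the validation set.
    No stratification by outcome is applied in this sketch.
    """
    rng = random.Random(seed)  # hypothetical fixed seed for reproducibility
    for repeat in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        folds = [order[i::n_folds] for i in range(n_folds)]
        for fold, valid_idx in enumerate(folds):
            # Training set is the union of all folds except the current one.
            train_idx = [i for j, f in enumerate(folds) if j != fold for i in f]
            yield repeat, fold, train_idx, valid_idx
```

For 105 patients, each repeat produces five splits of 84 training and 21 validation samples, giving 500 train/validation splits over 100 repeats. A model would be fitted on `train_idx` and scored on `valid_idx` for each split, and the scores then summarized across splits.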