Explainable artificial intelligence for predicting remission in patients with rheumatoid arthritis treated with biologics


 Background: We developed a model to predict remissions in patients treated with biologic disease-modifying anti-rheumatic drugs (bDMARDs) and to identify important clinical features associated with remission using explainable artificial intelligence (XAI).Methods: We gathered the follow-up data of 1204 patients treated with bDMARDs (etanercept, adalimumab, golimumab, infliximab, abatacept, and tocilizumab) from the Korean College of Rheumatology Biologics and Targeted Therapy Registry. Remission was predicted at one-year follow-up using baseline clinical data obtained at the time of enrollment. Machine learning methods (e.g., lasso, ridge, support vector machine, random forest, and XGBoost) were used for the predictions. The Shapley additive explanation (SHAP) value was used for interpretability of the predictions.Results: The ranges for accuracy and area under the receiver operating characteristic of the newly developed machine learning model for predicting remission were 52.8%–72.9% and 0.511–0.694, respectively. The Shapley plot in XAI showed that the impacts of the variables on predicting remission differed for each bDMARD. The most important features were age for adalimumab, rheumatoid factor for etanercept, erythrocyte sedimentation rate for infliximab and golimumab, disease duration for abatacept, and C-reactive protein for tocilizumab, with mean SHAP values of -0.250, -0.234, -0.514, -0.227, -0.804, and 0.135, respectively.Conclusions: Our proposed machine learning model successfully identified clinical features that were predictive of remission in each of the bDMARDs. This approach may be useful for improving treatment outcomes by identifying clinical information related to remissions in patients with rheumatoid arthritis.


Introduction
Rheumatoid arthritis (RA) is a chronic in ammatory disease that affects the synovial tissues in multiple joints. Biologics are often considered as a promising line of treatment for patients who have high disease activities despite treatment with conventional disease-modifying anti-rheumatic drugs (cDMARDs). [1][2][3] Biologics are generally prescribed by referring to clinical practice recommendations after considering factors such as disease activity, adverse events, and cost effectiveness. [1][2][3] However, treatment with biologics may be unsuccessful because of differences in the physiological and pathological characteristics among individuals. Indeed, clinical trials have frequently shown that approximately 30-40% of patients do not respond to treatment with biologics, and their response rates decrease with subsequent biologics. [4,5] Treatment failure due to ineffective biologics not only increases the pain experienced by the individual but also increases their cost of healthcare. [6] Therefore, it is necessary to develop good predictors that can identify the e cacies of different biologics for such individuals.
Several clinical, genetic, and proteomic studies have used statistical methods to identify biomarkers to predict responses to biologics in patients with RA. [7][8][9] Machine learning approaches that can complement statistical methods are able to incorporate such information for making accurate predictions. Furthermore, machine learning can be generalized across a broader array of data types and can produce results with complex situations as well. [10,11] However, it is often di cult for users to understand the processes by which machine learning predicts outcomes from relationships among numerous variables. Accordingly, several methods have been proposed to improve the interpretability of machine learning methods while maintaining the prediction accuracies of complex models. Explainable arti cial intelligence (XAI), which presents the reasons for a prediction in a manner that can be understood, suggests the relationships between several variables necessary for predicting outcomes. [12] In this study, we established a machine learning model using data from the Korean College of Rheumatology Biologics and Targeted Therapy Registry (KOBIO), [13] and show its application to identifying clinical variables predictive of remission in RA patients treated with biologics using the concept of XAI.

Study population
This study used data from the KOBIO registry, which is a nationwide multicenter cohort in Korea that was established to evaluate the effectiveness and side effects of biologic therapies in patients with RA. [13] Patients in the registry were recruited from 38 hospitals since 2012, and their demographics, medications, comorbidities, extra-articular manifestations, disease activities, radiographic ndings, and laboratory ndings have been recorded by investigators from each of the participating hospitals. The data from patients who were followed up annually were recorded on the KOBIO website (http://www.kobio.or.kr/kobio/), and these patients provided informed consent prior to registration.
Ethical approval of the KOBIO-RA was obtained from the institutional review boards of all 38 participating institutions, including the Institutional Review Board of Inje University Seoul Paik Hospital (PAIK 2018-11-005). Figure 1 shows the owchart for patient selection. From December 2012 to June 2019, a total of 2,122 patients who were treated with biologic disease-modifying anti-rheumatic drugs (bDMARDs) and targeted synthetic diseasemodifying anti-rheumatic drugs (tsDMARDs) were registered. The baseline data were obtained at the time of initial enrollment for prescription of the bDMARDs, and follow-up data were registered annually or when the bDMARDs were switched or stopped. Patients treated with tsDMARDs, such as tofacitinib (N = 33), were excluded from the analysis because the aim of the study was to predict patients who responded to bDMARDs. Among the bDMARDs, rituximab was excluded from the analysis owing to the small sample group (N = 2).

Data collection
After excluding data from subjects whose follow-up durations were less than 3 months from the baseline, a total of 1,204 baseline data and 1,397 follow-up data were obtained and used for the analysis in this study.

Machine learning methodology for predicting remission
The disease activity scores in 28 joints using the erythrocyte sedimentation rate (DAS28-ESR) were measured at baseline establishment and during follow-up. The outcome of the prediction model evaluating the responses to bDMARDs was "remission", which was de ned as DAS28-ESR ≤ 2.6 at follow-up. To prevent overestimation of remission by prednisolone treatment, another prediction model was constructed with the outcome of "remission without increasing prednisolone dose". An overview of the study ow is presented in Fig. 2.
Five machine learning models were used to predict remission in subjects receiving bDMARDs, tumor necrosis factor (TNF) inhibitors, non-TNF inhibitors, and each bDMARD; the models included lasso and ridge based on linear relationships, [14] support vector machine using kernel methods, [15] tree-based random forest, [16] and Xgboost [17]. To avoid over tting problems, the training and test sets were divided in a 7:3 ratio, and the models were trained with the training set; then, the prediction results were veri ed using the test set. For the training dataset, a 5-fold cross validation was performed to tune the hyperparameters determined as outside models. In this procedure, a grid search was conducted to evaluate all possible combinations of the hyperparameters. The grid search found optimal hyperparameters with the objective function of determining the area under the receiver operating characteristics (AUROC) in each model. Bootstrapping (random sampling with replacement) was also performed to obtain a median value for the AUROC curve and to determine the accuracy for reducing measurement variances caused by small samples when dividing between the training and test sets.
Dimension reduction was performed to avoid the "curse of dimensionality" caused by a large number of variables compared with the size of the data. Among a total of 64 variables, 15 variables known to be of clinical importance were preselected (i.e., sex, age, baseline DAS28-ESR, methotrexate dose, steroid dose, erythrocyte sedimentation rate [ESR], C-reactive protein [CRP], rheumatoid factor [RF], anti-cyclic citrullinated peptide antibody [ACPA], anti-nuclear antibody [ANA], and ve comorbidities). Subsequently, 20 variables that were highly correlated with the drug response of each bDMARD were added for model training.

XAI for identifying important clinical features associated with responses to bDMARDs
Using the Xgboost machine learning model, the important variables and their impact on predicting remissions using XAI were determined. By focusing on improving the performance and accuracy, machine learning models have become complex, and their interpretability has declined. Although there are some feature importance measures in random forest and Xgboost, these models provide inconsistent results depending on the tree structure; in addition, they only show the overall importance and not the direction of the effect of the independent variables. [12] To overcome these issues, the Shapley additive explanations (SHAP) method was developed, [18] which approximates a complex model to a linear model and interprets the feature importance in the linear model to demonstrate the amount by which a given feature changes the prediction. This methodology satis es three conditions: (1) the approximated linear model has a similar accuracy to the original model in the local domain, (2) meaningless variables have no impact on the explanatory power of the model, and (3) feature importance is consistent in the model structure. Accordingly, the SHAP method demonstrates consistent feature importance regardless of model structure and direction of effect of the predictive variables, thereby allowing clinicians to acquire insights into achieving remission and to nd potential variables affecting the selection of appropriate bDMARDs.

Statistical analysis
All data are shown as mean (standard deviation [SD]) or percentage values. To evaluate the machine learning performance, the accuracy and AUROC curve were analyzed. The no information rate, which is the largest proportion of the observed classes, was used as a baseline to determine the overall distribution of the classi cation and to compare with those of the machine learning models. Statistical analyses were performed using R software version 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria), and model training was performed using the caret package and SHAPforxgboost package in R.

Results
Clinical characteristics of the patients Table 1 shows the clinical characteristics of the 1,204 patients included in the study. The mean (SD) age at baseline was 54.0 (12.8) years and the majority (82.6%) of the patients were female. The mean (SD) disease duration was 7.1 (7.2) years. The rheumatoid factor (RF) and ACPA positivity were 83.2% and 73.4%, respectively, and the mean (SD) DAS28-ESR values at baseline and follow-up were 5.6 (1.0) and 4.3 (1.3), respectively. History of cardiovascular diseases (%) 3.9 History of lung diseases (%) 6.1 History of hemato-oncologic diseases (%)

Important features for remission in XAI
The SHAP method for remission was used to determine the in uences of the variables that contributed to remission in the prediction model. The interpretations of the feature importance with the Shapley plot are shown in Fig. 3 Table 3 shows the order of the average of the SHAP value ranks from the Shapley plot in the bDMARDs. The degree and direction of contribution of the variables to remission were different for each bDMARD. Among the variables, the ESR had the highest average ranking (4.0) and had a negative association with remission for all bDMARDs. Hemoglobin had the second-highest average ranking (5.3) and had a positive association with remission for all bDMARDs except for golimumab (no association). The age and DAS28-ESR factors had the third and fourth highest average rankings (6.67 and 7.83, respectively) and had negative associations with remissions for adalimumab, etanercept, in iximab, golimumab, and tocilizumab. For abatacept, DAS28-ESR was positively associated with remission, and age had a nonlinear relationship with remission.  Notes: Asterisk denotes a non-linear relationship such as quadratic effect or mixed effect between drugs and variables; The average ranking was obtained by averaging the rankings of the 6 bDMARDs.
The relationship between RF levels and remission was different for each bDMARD. RF was positively associated with remission in adalimumab, but negatively associated with remission in etanercept and tocilizumab. RF had a nonlinear relationship with remission in in iximab and abatacept, and no association with remission in golimumab.
In terms of ACPA, low ACPA was associated with remission in all bDMARDs (negative association) except tocilizumab (positive association). The elevation of CRP was associated with remission in bDMARDs (positive association) except for adalimumab and golimumab (no association). In most bDMARDs, disease duration, platelet, cholesterol, and triglyceride showed negative associations with remission. ANA had a negative association with remission for in iximab and abatacept.

Predictive model for remission without increasing prednisolone dose
We built a machine learning model for remission without increasing prednisolone dose as another outcome (Additional le: Table S1). Among the 1,397 follow-ups, 537 were classi ed as remission without increasing prednisolone dose. The ranges of accuracy and AUROC were 54.1%-72.9% and 0.517-0.698, respectively, which were similar to the results for predicting remission. In the Shapley plots, compared with the outcomes for predicting remission, the SHAP scores had slight uctuations while the order of the variables was similar (Additional le: Figure  S1). In the ranking of important features, there was a slight change due to the difference in the SHAP value of each variable (Additional le: Table S2). However, important features such as ESR, DAS28-ESR, CRP, age, hemoglobin, RF, and ACPA retained high rankings for predicting remission.

Discussion
In the prediction models of remission with all machine learning methods, the ranges of accuracy and AUROC were 52.8-72.9% and 0.512-0.694, respectively. Notably, based on the machine learning models for predicting remission, we identi ed important clinical features that were associated with remission after treatment with bDMARDs.
Although it is possible to estimate the importance of each variable by linear regression analysis, we determined that machine learning would complement statistical methods due to the complex relationships between the variables and remission. Some signi cant clinical features for remission prediction were commonly identi ed in the bDMARDs, albeit with slight differences in the impact between the bDMARDs.
As opposed to how machine learning models, such as those using deep learning are "black-box" models to explain the reason for prediction, the XAI machine learning method provides reasons for prediction in a manner that lay users can easily understand. For interpretability, the SHAP method, which is based on the Shapley values, was recently presented. [18] The Shapley value employs a method based on game theory, which was introduced to suggest how to fairly distribute payout among the features. [19] This theory has in uenced various elds and has recently been used in machine learning to improve the interpretability of complex models. The SHAP method is a uni ed framework that improves interpretability while maintaining predictability of complex models with machine learning. Using the SHAP method, we identi ed the characteristics of the important variables that contribute to prediction of remission in a complex cohort dataset. In the use of healthcare data, it is important to discern the relationships between the variables in clinical, genomic, and other types of healthcare data. This proposed method is expected to provide insights for nding the relationships among numerous variables in the integration of large-scale healthcare data.
Among various baseline variables, several laboratory ndings were identi ed as important features that ranked highly for predicting remission. For all the bDMARDs, the most important feature was the ESR, which is a variable predictive of disease activity. [20,21] Hemoglobin was the second most important feature in our study; considering how anemia is associated with disease activity [22] and erosion progression, [23] hemoglobin may be an important factor indicating disease states in patients with RA. The RF and ACPA were also important for prediction of remission and contributed to determination of the direction of treatment in patients with RA. [24,25] However, several studies have shown con icting results on the relationship between RF and ACPA in response to TNF inhibitors. [26,27] In our study, the RF and ACPA were associated with remission in various degrees of impact and direction for each bDMARD. Moreover, RF was not linearly related to remission for in iximab and abatacept.
This study has some obvious limitations. First, the dosage intervals and doses of biologics were not considered. Second, this study did not distinguish between the primary response failure (i.e., failure of clinical improvement) and secondary response failure (i.e., loss of response after clinical improvement). [27] Third, this study did not provide evidence as to how the important features of each bDMARD were related to their mechanisms of action. Fourth, because this study focused on the important features of the variables needed for prediction in the machine learning model, the performance of machine learning was not con rmed using external test sets from other RA cohort data.

Conclusion
We successfully developed machine learning models to predict remission as a response to different bDMARDs in active RA patients based on their clinical pro les. Using these models, the XAI was able to identify important clinical features associated with remission according to the biologics used. We noted that some important features were more strongly associated with remission, albeit the order of their relative importance was different for each biologic.
Our results suggest that an advanced machine learning approach may be helpful for supporting clinical decisions to improve treatment outcomes with biologics in RA patients.