We developed a machine learning model that predicts radiographic progression using EMR data from patients with AS accumulated over 18 years. The RF model trained on data from the first visit predicted radiographic progression with an accuracy of 73.73% and an AUC of 0.7959, the best performance among the three machine learning models. We also found that accuracy and AUC decreased when the models were trained with data from the second and third visits. These results suggest that data accumulated over a longer period did not improve the performance of the three machine learning models, and that data from the first visit in AS may contain important predictors of radiographic progression.
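As an illustration of the two metrics reported above, the following minimal sketch computes accuracy and AUC from predicted scores using only the Python standard library. The labels, scores, the 0.5 threshold, and the function names are hypothetical and are not taken from the study; the sketch shows how the metrics are defined, not how the reported values were obtained.

```python
def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of cases whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in y_score]
    correct = sum(1 for p, t in zip(predictions, y_true) if p == t)
    return correct / len(y_true)


def roc_auc(y_true, y_score):
    """AUC as the probability that a randomly chosen progressor receives a
    higher score than a randomly chosen non-progressor (ties count as half)."""
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = radiographic progression) and model scores,
# invented for illustration only.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7, 0.8, 0.1]

print("accuracy:", accuracy(labels, scores))
print("AUC:", roc_auc(labels, scores))
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two are reported together.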
Joo et al. predicted radiographic progression using machine learning with a training set (n = 253) and a test set (n = 173) [9]. In their results, balanced accuracy in the test set exceeded 65% in all models and was highest in RF, at 69.3%. In addition, the generalized linear model and support vector machine showed the best performance, with an AUC of over 0.78. Their study resembles ours in predicting radiographic progression but differs significantly in detail. First, we examined machine learning based prediction models for radiographic progression at each visit using three time-point datasets containing EMR data accumulated over 18 years. We used a larger amount of time-series data than they did and were able to identify clinical characteristics at each time point that could affect radiographic progression. These results provided various insights into radiographic progression in AS studies. In addition, the accuracy and AUC achieved in our study were higher than theirs. This difference in predictive power may be related to differences in the amount of data and in the variables used, such as limited features for bone marrow density and syndesmophyte score and additional laboratory findings.
We used time-series EMR data accumulated from the first, second, and third visits to predict radiographic progression at subsequent visits. Interestingly, predictive performance was poorer when the model included clinical datasets from the second or third visit than when it used only the clinical dataset from the first visit. This could be explained by the fact that baseline data from the first visit contained important information for predicting radiographic progression. In addition, because treatment with NSAIDs was started at the first visit, disease activity indices such as the Bath Ankylosing Spondylitis Disease Activity Index, CRP, and ESR subsequently decreased. Because high disease activity is what leads to an increase in mSASSS [2–4], this decrease may have reduced the differences in important features between individuals. As a result, prediction performance may have deteriorated for the datasets of the second and third visits.
EMR data on AS accumulate over time and span many variables. Such data are important in AS because radiographic progression advances slowly over a long period and can be related to many variables. Above all, radiographic progression may reflect the delayed effects of various clinical variables, because inflammation begins, the lesion ossifies and progresses to a syndesmophyte, and only then can the change be confirmed on radiographs. Although disease activity, as measured by CRP or the ankylosing spondylitis disease activity score, is known to be an important predictor of radiographic progression [25, 26], it is not an absolute long-term determinant of radiographic progression. For example, radiographic progression continues even when recurrent transient inflammation is actively controlled [27]. This evidence suggests that various factors influence radiographic progression. Therefore, it might be difficult to identify factors related to radiographic progression with existing statistical analyses when the EMR data comprise numerous variables, as ours do. Unlike the many analyses of statistical associations that have already been conducted, this study aims to provide insight into the timing and factors important for the prediction of radiographic progression.
Several machine learning models using large datasets have been useful for diagnosing axial spondyloarthritis [12]. Such approaches can help with early diagnosis and reduce the social burden of disease. Using a claims dataset, Deodhar et al. suggested that machine learning models have a positive predictive value of 6.24%, compared with 1.29% for the Assessment of SpondyloArthritis International Society classification criteria [7]. In addition, machine learning models with EMR datasets have shown good performance in predicting the diagnosis of axial spondyloarthritis, with accuracies ranging from 82.6% to 91.8% [10, 11, 13]. Because images such as radiographs are important in the diagnosis of AS, a machine learning model that combines image data and text data could be used for early diagnosis of AS. The detection of sacroiliitis on X-ray, computed tomography, and magnetic resonance imaging using machine learning methods has recently been performed with excellent results in the screening of patients with AS [6, 8, 28]. Therefore, developing a machine learning model that combines image, life-log, and clinical information is essential to improve diagnostic accuracy, and extending this approach to the prediction of radiographic progression in patients with AS is a worthwhile future challenge. Furthermore, an important task is assembling a representative and diverse dataset to meet the demands of high-performance machine learning models [29].
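To make the positive predictive values cited from Deodhar et al. concrete, the sketch below computes PPV as the share of flagged patients who truly have the disease and compares the two figures. The patient counts are hypothetical, chosen only so that they reproduce the quoted percentages; they are not the counts from the cited study.

```python
def positive_predictive_value(true_positives, false_positives):
    """PPV = TP / (TP + FP): the share of flagged patients who have the disease."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no patients were flagged")
    return true_positives / flagged


# Hypothetical counts per 10,000 flagged patients, chosen only to
# reproduce the percentages quoted in the text (6.24% and 1.29%).
ml_ppv = positive_predictive_value(true_positives=624, false_positives=9376)
criteria_ppv = positive_predictive_value(true_positives=129, false_positives=9871)

# How many times more often a flagged patient is a true case under the model.
fold_improvement = ml_ppv / criteria_ppv

print(f"model PPV: {ml_ppv:.2%}")
print(f"criteria PPV: {criteria_ppv:.2%}")
print(f"fold improvement: {fold_improvement:.2f}")
```

Under these figures the model's PPV is roughly 4.8 times that of the classification criteria, although both values are low in absolute terms, as expected when screening for a rare condition in a claims population.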
Machine learning methods may be useful even in areas with insufficient data [30]. Statistical methods employ a validation system that determines statistical significance based on the random sampling of data. By contrast, because machine learning models use big data, they rely on training data as a convenience sample rather than a random sample. Therefore, cross-validation or external validation is essential. Training on all the data in the population would solve this problem, but such an undertaking is infeasible. Nevertheless, machine learning models can be valuable because, once trained, a model can be readily adapted through "transfer learning" into one that reflects the characteristics of other hospitals using only a small amount of data. However, caution is still required when interpreting the results inferred from these models.
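The cross-validation mentioned above can be sketched as follows. This is a generic k-fold split written with the Python standard library, shown only to illustrate the idea that every patient is held out exactly once; it is not necessarily the validation scheme used in this study, and the function name, sample size, and fold count are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Indices are shuffled once, then dealt into k folds; each fold serves
    as the held-out test set exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Hypothetical example: 10 patients split into 5 folds.
splits = list(k_fold_indices(n_samples=10, k=5))
for train_idx, test_idx in splits:
    print("train:", sorted(train_idx), "test:", sorted(test_idx))
```

In practice a model would be fitted on each training split and scored on the corresponding held-out split, and the scores averaged. External validation differs in that the held-out data come from a different center rather than from a partition of the same sample.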
There are some limitations to our study. First, we applied three machine learning models to predict individual radiographic progression and identified the importance of the features that contribute to their predictions. The importance of these features can be interpreted because previous statistical studies have identified the factors related to radiographic progression. Therefore, machine learning methods may help complement statistical methods. However, additional statistical validation is needed to generalize important but previously unknown features that contribute to radiographic progression. Second, we used EMR data from a single center. Validation using EMR data from various centers is required. Third, we built our machine learning model on EMR data from diagnosis and initial treatment. Therefore, this model can predict radiographic progression only when a patient first visits the hospital. In the future, it will be necessary to develop more advanced machine learning models that can predict radiographic progression at various time points. Fourth, there may be algorithms that perform better than the machine learning models developed in this study. An artificial neural network might yield a better model, but it may be more difficult to apply clinically owing to the limitations of the "black box" model.
Among the datasets from the first, second, and third visits, predicting radiographic progression at the second visit using the first-visit dataset gave the best performance, with the highest accuracy and AUC. Therefore, the clinical features of the first visit are likely to contain important information for predicting radiographic progression. In terms of feature importance, mSASSS, age, ALP, and CRP ranked high. In addition to EMR data, other types of data, such as images and life-logs, may be required to increase accuracy.