Principal Results
We describe a pilot validation study of what is, to our knowledge, the first multilingual DDSS developed for people who suspect they have a rheumatic disease. To our knowledge, this is also the first multicenter validation study of a rheumatic DDSS. In this pilot study, we tested the performance of Rheumatic? in 175 research individuals and patients from three university centers. We observed a high performance of this online tool in differentiating individuals and patients with immune-mediated rheumatic conditions from those without, both when using the total score and when applying the currently implemented thresholds. We tested Rheumatic? in the setting of individuals with musculoskeletal complaints, of whom only a subset developed arthritis. Here the AUC-ROC was 75%. We furthermore tested the tool in the setting of patients with unclassified arthritis, where a subset had gout or osteoarthritis instead of an immune-mediated rheumatic disease. Here we also observed good performance in terms of AUC-ROC. The results were less convincing in the final data set of the LUMC, with an AUC-ROC of 54%.
Rheumatic? currently has expert-based thresholds for several rheumatic diseases: passing threshold 1 for any one of the diseases triggers the advice to visit a general practitioner, and passing threshold 2 triggers the advice to visit a rheumatologist. We observed that threshold 2 is highly specific for immune-mediated conditions. However, in our setting it lacked sensitivity, identifying far too few patients with immune-mediated conditions. The sensitivity of threshold 1 was much better, though not optimal, ranging from 0.61 to 0.67. The results of our pilot study suggest that the thresholds might need to be optimized by defining different cut-off scores, and perhaps by reviewing the scoring of specific questions.
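The trade-off between the two thresholds, and its relation to the AUC-ROC of the total score, can be illustrated with a minimal sketch. The scores, disease labels, and cut-off values below are hypothetical toy numbers chosen for illustration only; they are not Rheumatic? data or the thresholds implemented in the tool, and the function names are our own.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], has_disease: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Treat a total score >= threshold as a positive result."""
    pairs = list(zip(scores, has_disease))
    tp = sum(1 for s, d in pairs if d and s >= threshold)
    fn = sum(1 for s, d in pairs if d and s < threshold)
    tn = sum(1 for s, d in pairs if not d and s < threshold)
    fp = sum(1 for s, d in pairs if not d and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc_roc(scores: Sequence[float], has_disease: Sequence[bool]) -> float:
    """Probability that a random case with disease scores higher than a
    random case without disease (ties count as 0.5)."""
    pos = [s for s, d in zip(scores, has_disease) if d]
    neg = [s for s, d in zip(scores, has_disease) if not d]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four cases with and four without an
# immune-mediated condition (NOT study data).
scores = [10, 6, 4, 3, 7, 2, 5, 1]
has_disease = [True, True, True, True, False, False, False, False]

# Hypothetical cut-offs standing in for "threshold 1" and "threshold 2".
THRESHOLD_1 = 4
THRESHOLD_2 = 8

for name, cutoff in (("threshold 1", THRESHOLD_1), ("threshold 2", THRESHOLD_2)):
    sens, spec = sensitivity_specificity(scores, has_disease, cutoff)
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
print(f"AUC-ROC of total score: {auc_roc(scores, has_disease):.4f}")
```

In this toy example the higher cut-off is perfectly specific but detects only one of four cases, whereas the lower cut-off detects three of four at the cost of more false positives. This is the same qualitative pattern described above for thresholds 2 and 1.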
The comparison between the data of the three centers provides some interesting insights. We used data from existing cohorts, each of which had its own approach to selecting research individuals and patients. We cannot draw firm conclusions about the differences between the inclusion methods, as they were applied in different centers. However, the observed differences seem to support the validity of Rheumatic?. For instance, the higher scores in KI compared to LUMC could be explained by the selection of individuals with ACPA positivity and high risk for rheumatoid arthritis in KI. Similarly, the overall highest scores at the LUMC, as well as the much higher minimum scores there, could be explained by the selection of patients with immune-mediated arthritis. This selection might also explain the lack of discriminative ability of Rheumatic? in the LUMC data: this dataset contains only patients in whom the rheumatologist also considered an immune-mediated condition, and both the rheumatologist and Rheumatic? use the same questions to come to this conclusion.
Limitations
Our study has several limitations. First, due to the enrichment of non-immune-mediated rheumatic conditions, the datasets in setting B did not reflect the true prevalences of the individual diseases, and therefore we cannot calculate the positive and negative predictive values. Second, retrospectively entering symptoms described by patients (setting B) introduces a DDSS usage bias. Finally, the initial aim of the tool was to identify patients at risk of developing rheumatic diseases. Our datasets contained patients who had already been selected by experts as being at risk, excluding those with nonspecific signs of arthralgia or other musculoskeletal problems. Nevertheless, Rheumatic? was able to further differentiate within this group.
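The dependence of the predictive values on prevalence, which underlies the first limitation, follows from Bayes' theorem. As a general reminder (standard definitions, not a result of this study), with sensitivity $Se$, specificity $Sp$, and disease prevalence $p$ in the population to which the tool is applied:

```latex
\begin{align}
\mathrm{PPV} &= \frac{Se \cdot p}{Se \cdot p + (1 - Sp)\,(1 - p)}, \\
\mathrm{NPV} &= \frac{Sp\,(1 - p)}{Sp\,(1 - p) + (1 - Se)\,p}.
\end{align}
```

Because $p$ enters both expressions directly, predictive values computed in a sample whose disease mix differs from that of the target population would not carry over to that population, even if sensitivity and specificity were estimated correctly.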
To further optimize the performance, we need larger sample sizes than in the current study. We would also need to include patients without a rheumatic condition and to have patients fill out the questionnaire on their own. An additional way of optimizing the sensitivity is to link the use of the DDSS to genetic and immunological blood testing for specific biomarkers, as is currently under development in our rheumatology units.
Two recently started studies, a prospective multicenter study and a population-wide study, will further help to improve the performance of Rheumatic?.
Comparison with Prior Work
In 1991, Moens et al. concluded that rheumatology is a suitable domain for computer-assisted diagnosis(11). Despite the elapsed time, such systems are still not part of standard rheumatology care. Alder et al. concluded in a review in 2014 that the validation process of rheumatology DDSS was in general underappreciated and none of the systems seemed to have succeeded in daily practice(12).
Patients and rheumatologists, however, still seem to believe in the potential positive impact of patient-facing digital diagnostic decision support systems. A recent study showed that 80% of physicians agreed that an app that could diagnose symptoms of rheumatic diseases in patients would be helpful(22). Furthermore, in a recent survey study, 50% of RMD patients stated that they would be interested in using an app for symptom decision support(1).
An RMD DDSS based on a fuzzy cognitive map technique showed a diagnostic accuracy of 87% in a validation study with 15 vignette cases(22). In a prospective pilot study, 34 patients completed an NHS and a WebMD symptom checker. Only 4 out of 21 patients with immune-mediated arthritis were given a first diagnosis of rheumatoid arthritis or psoriatic arthritis(23). Of the people suspecting axial spondyloarthritis (axial SpA) who used an online self-referral tool, 19.4% were diagnosed with axial SpA(24). This proportion is significantly higher than the assumed 5% prevalence of axial SpA in patients with chronic back pain(25).
Besides patient-facing DDSS and symptom checkers, DDSS can also be a valuable tool for physicians to improve their diagnostic skills. DDSS might be especially attractive for young physicians, given their limited work experience. McCrea et al. showed that using a decision tree led to improved accuracy in medical students diagnosing 10 rheumatology cases (81% compared to 68%). Furthermore, using physician-based DDSS could accelerate rare disease diagnosis(26, 27).
Current research suggests that the diagnostic accuracy of DDSS is user-dependent(28). The effectiveness of DDSS could also depend on the patient’s eHealth literacy, which seems to be limited in RMD patients(1).
A major strength of our study lies in its multicenter approach and its large validation sample size compared to previous studies(22, 23). The risk-averse retrospective validation scenario was deliberately chosen. The majority of currently available symptom checkers seem to skip this validation process and therefore provide little scientific evidence. The DDSS that have been evaluated often used patient vignettes instead of real patient data(15, 27, 29). We believe that focusing on improving an international, overarching DDSS could boost quality standards in rheumatology by increasing transparency and objectivity and by reducing redundant single-center efforts(13).