Rheumatic? - A Digital Diagnostic Decision Support Tool for Individuals Suspecting Rheumatic Diseases: A Multicenter Validation Study

Background Digital diagnostic decision support tools promise to accelerate diagnosis and increase health care efficiency in rheumatology. Rheumatic? is an online tool developed by specialists in rheumatology and general medicine together with patients and patient organizations. It calculates a risk score for several rheumatic diseases. In the current pilot study, we retrospectively test Rheumatic? for its ability to differentiate symptoms of immune-mediated diseases from other rheumatic and musculoskeletal complaints and disorders in patients visiting rheumatology clinics. Methods

Rheumatic? is a web-based patient-centered multilingual diagnostic tool capable of differentiating immune-mediated rheumatic conditions from other musculoskeletal problems. The scoring system might be further optimized, for which we will perform a prospective study.

Background
Despite generally increasing digitalization in rheumatology (1)(2)(3)(4), the decision for referral of new patients suspecting rheumatic diseases is mostly analogue with few exceptions and has not changed in the last decades (5,6).
Diagnostic delays do not seem to improve significantly (7,8) and often inhibit early, and therefore effective, therapy. Up to 60% of new referrals to rheumatologists end up having no immune-mediated rheumatic disease (9). In contrast to emergency medicine (10), rheumatology has not yet developed objective and transparent triage standards, further complicating patient referral. Incomplete, illegible and not importable paper-based referral forms seem to be outdated and a bottleneck in current clinical care.
Digital diagnostic decision support systems (DDSS) and in particular online self-referral (OSR) systems and symptom checkers (SC) promise to accelerate diagnosis in rheumatic diseases (11)(12)(13) and improve health care efficiency. Currently more than 100 SCs exist (14) and are increasingly used by patients. Only a minority of these SCs showed transparent, published, promising evidence before being publicly available (15,16,17). The inclusion of clinical experts and patients in the development process has been recommended by various rheumatology societies (18,19).
Rheumatic? is such a web-based screening tool, available in Swedish, English, German and Dutch (13). It was developed by designers, engineers, clinical experts and patients. This screening tool was designed to capture patients at risk for developing a rheumatic disease. The initial scoring was done by experts in the respective rheumatic diseases included in the screening tool, and needs further validation and improvement. The aim of this pilot study was to test this multilingual, comprehensive DDSS for people suspecting a rheumatic disease and to validate its discriminatory ability in patients with and without immune-mediated rheumatic problems.

Methods
Rheumatic? a web-based screening tool
Rheumatic? is a web-based screening tool to identify individuals with early signs of or at high risk for developing Rheumatoid Arthritis, Ankylosing Spondylitis, Systemic Lupus Erythematosus, Myositis, Systemic Sclerosis, or Sjögren's Syndrome (in this paper called immune-mediated rheumatic diseases) (13). A team of designers, engineers, clinical experts, patients, and at-risk individuals worked together to build a test that was both medically correct and effortless to use for an average person. The weights and thresholds for scoring Rheumatic? were defined as follows. In short, a first draft of the questions was made, which was then reviewed and adapted in a collaborative workshop with several experts, to make sure the questions covered the most important symptoms of each diagnosis while still keeping the test as short and relevant as possible. In this workshop, a first draft of the scoring was made by having the experts approximate how significant each question and option was for each diagnosis. These scorings were then implemented as an interactive prototype in which different combinations of answers could be tested, and were iteratively improved and revised with input from the experts in several meetings during 2018.
The tool was constructed as part of the JPAST project, supported by the EU EIT Health program. Versions in Dutch, English, German and Swedish are at the moment accessible for researchers.
Identifying questions and setting thresholds: experienced clinicians provided prognostic questions for rheumatic diseases, based on their clinical and scientific knowledge. In the next step these experts weighted the importance, the sensitivity and the specificity of each question for a specific disease. The weights of the questions were then used to set threshold-1 for any of the diseases, with the advice to visit a general physician, and threshold-2 for any disease, with the advice to visit a rheumatologist (13).
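The scoring logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the question identifiers, the weights, and the threshold values below are hypothetical placeholders, since the actual expert-derived values are not given in this text, and the real tool may aggregate answers differently.

```python
from typing import Dict, Iterable, Tuple

Answer = Tuple[str, str]  # (question id, chosen option); both hypothetical

# Hypothetical expert weights: points each answer contributes per disease.
WEIGHTS: Dict[Answer, Dict[str, int]] = {
    ("morning_stiffness", "over_1_hour"): {"RA": 30, "AS": 20},
    ("joint_swelling", "yes"): {"RA": 60, "SLE": 15},
    ("dry_eyes", "yes"): {"Sjogren": 40},
}

# Hypothetical (threshold 1, threshold 2) values per disease.
THRESHOLDS: Dict[str, Tuple[int, int]] = {
    "RA": (50, 80), "AS": (50, 80), "SLE": (50, 80), "Sjogren": (50, 80),
}


def score_diseases(answers: Iterable[Answer]) -> Dict[str, int]:
    """Sum the weights of all given answers separately for each disease."""
    scores = {disease: 0 for disease in THRESHOLDS}
    for answer in answers:
        for disease, points in WEIGHTS.get(answer, {}).items():
            scores[disease] += points
    return scores


def advice(scores: Dict[str, int]) -> str:
    """Passing threshold 2 for any disease outranks passing threshold 1."""
    if any(s >= THRESHOLDS[d][1] for d, s in scores.items()):
        return "visit a rheumatologist"
    if any(s >= THRESHOLDS[d][0] for d, s in scores.items()):
        return "visit a general physician"
    return "no referral advice"


if __name__ == "__main__":
    answers = [("morning_stiffness", "over_1_hour"), ("joint_swelling", "yes")]
    print(advice(score_diseases(answers)))
```

Because the advice depends on the highest per-disease score rather than on the total, a respondent with a low overall score but a high score for a single disease is still directed to care.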

Study Design
This pilot study utilized retrospective data collection, where research patient records were used to fill out the questionnaire. In this pilot phase, we aimed to test the ability to differentiate between immune-mediated and other (osteoarthritis (OA) and gout) rheumatic diseases. The retrospective design allowed us to select a sufficient number of patients with different rheumatic diagnoses.
We investigated two clinical settings covering two different stages of disease development. For this we used data from three different clinical centers.
Setting A concerned Risk-RA individuals with anti-citrullinated protein antibody (ACPA) positivity and musculoskeletal complaints (without arthritis), signifying a high risk for future rheumatic diseases. Here the outcome was the development of arthritis, which served as the gold standard in the analyses. Setting B concerned patients with early unclassified arthritis without a clear diagnosis. Here the final diagnosis of different rheumatic diseases, fulfilling classification criteria, served as the endpoint.

Setting A
Dataset 1 - Karolinska Institutet
We analyzed 50 individuals from the Karolinska Risk-RA prospective cohort study (20) with at least two years of follow-up. The patients were further selected so that about half of them developed arthritis during follow-up, to ensure sufficient power for our analysis.
Patients in this cohort were all referred by a non-rheumatologist specialist (in most cases a primary care doctor) because of suspected rheumatic disease. They had musculoskeletal complaints and ACPA positivity without clinical or ultrasound-based arthritis at the first visit to Karolinska University Hospital or the Center for Rheumatology in Stockholm, Sweden.

Setting B
Dataset 2 - Erlangen University Medical Centre (EUMC)
Patients from a prospective cohort study who were referred by a non-rheumatologist specialist to the rheumatology clinic of the Erlangen University Medical Centre (EUMC) with unclassified arthritis or joint swelling were included. We selected 51 patients with a follow-up of at least one year and a final diagnosis from this cohort, again aiming for a variety of diagnoses and at least 50% patients with non-immune-mediated rheumatic disease (osteoarthritis and gout). Here, we grouped patients by disease and randomly selected patients from the disease groups.

Dataset 3 - Leiden University Medical Centre (LUMC)
Patients visiting the rheumatology clinic of the Leiden University Medical Centre (LUMC) with an initial diagnosis of unclassified arthritis were recruited. These patients with a (not yet classifiable) inflammatory rheumatic disease are included in the Leiden Early Arthritis Clinic (21). After one year of follow-up the final diagnosis was registered. We selected 72 patients from this cohort, aiming for a variety of diagnoses given at the end of the one-year observation period, with at least 30% patients with osteoarthritis or gout and 70% with the immune-mediated diseases subject to identification with Rheumatic?. Here, we grouped patients by the disease diagnosed one year after inclusion and randomly selected patients from the different disease groups, with fixed proportions of patients with the different rheumatic diseases.

Statistical Analysis
Rheumatic? gives a total score, which is built from the individual scores for each of the six diseases. For each disease a participant can pass a first or second threshold, which will inform the participant about the likelihood of having a rheumatic disease. We tested the performance of:
a. the total score using Wilcoxon rank test and the area-under-the-receiver-operating-curve (AUC-ROC).
b. the sensitivity and specificity for having an immune-mediated rheumatic disease when passing threshold 1.
c. the sensitivity and specificity for having an immune-mediated rheumatic disease when passing threshold 2.
All analyses were performed using R. AUC-ROC were calculated using the pROC library.
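To make the AUC-ROC in step (a) concrete, the following Python sketch computes it directly from its rank-based definition: the probability that a randomly chosen patient with an immune-mediated outcome has a higher total score than a randomly chosen patient without one, with ties counted as one half. This is the same quantity as the Mann-Whitney U statistic divided by the number of pairs. It is not the authors' R/pROC code, and the scores below are made-up toy values, not study data.

```python
from typing import Sequence


def auc_from_scores(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A pair counts 1 if the positive case scores higher, 0.5 if tied, 0 otherwise.
    """
    if not pos or not neg:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy total scores (hypothetical, for illustration only).
    with_outcome = [300, 250, 400, 180]
    without_outcome = [150, 200, 250, 100]
    print(f"AUC = {auc_from_scores(with_outcome, without_outcome):.3f}")
```

An AUC of 0.5 corresponds to no discrimination between the groups and 1.0 to perfect separation, which is how the values reported per center can be read.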

Results
Characteristics
By design, immune-mediated and other rheumatic outcomes were well distributed in each cohort. In KI, 42% of the Risk-RA individuals developed arthritis during the two-year follow-up. Of the patients with unclassified arthritis in the EUMC and LUMC cohorts, 55% and 69%, respectively, were diagnosed with an immune-mediated rheumatic disease after one year of follow-up. Table 1 describes the research individuals and patients in more detail.
* The immune-mediated outcome was the development of inflammatory arthritis in setting A and the development of an immune-mediated rheumatic disease in setting B. RS3PE = remitting seronegative symmetrical synovitis with pitting edema.
Discriminatory ability of the total score
In each cohort there was a wide variety in scores. Overall, patients who developed an immune-mediated disease after one or two years had a significantly higher Rheumatic? score at recruitment than those who did not (P < 0.0001 in all centers; Table 2, Fig. 1).
[...] developed an immune-mediated rheumatic disease (imRD) and those who did not, but this difference was smaller than in the other centers (212 versus 262, P < 0.0001). Here too, the maximum score differed between the two groups: 459 versus 536.

Performance of Rheumatic? threshold
Though the overall score is informative, the value of Rheumatic? is to support patients in seeking adequate care when they are at risk of any of the rheumatic diseases. Patients with a low overall score but a high score for one disease should still be advised to seek medical care.
In setting A, consisting of individuals presenting without arthritis but with ACPA positivity and musculoskeletal complaints, the individuals who developed arthritis passed threshold 1 substantially more often than those who did not develop arthritis (28% versus 16%, Table 3). This difference was less clear for threshold 2, which only a small minority reached (1 patient (2%) versus 0 patients (0%)). This is expected by design, as individuals in setting A do not have any rheumatic disease. This resulted in a sensitivity and specificity of 67% and 72% for threshold 1, and 5% and 100% for threshold 2.
[...] of times in EUMC and LUMC, and threshold 2 in 4% versus 2% and 10% versus 3% respectively. This resulted in a sensitivity of 0.61 and 0.67 for threshold 1 and 0.07 and 0.14 for threshold 2 in EUMC and LUMC respectively. The specificity was 0.87 and 0.23 for threshold 1 and 0.96 and 0.91 for threshold 2.
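For setting A, the reported sensitivity and specificity can be reproduced from counts, as sketched below in Python. The counts are not stated in the text; they are back-calculated here under the assumption that the reported percentages (42%, 28%, 16%, 2%, 0%) are expressed relative to all 50 individuals in the Karolinska dataset. Table 3 would be the authoritative source.

```python
from typing import Tuple


def sensitivity_specificity(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Assumed counts, back-calculated from the reported percentages of n = 50.
N = 50
DEVELOPED = 21                 # 42% of 50 developed arthritis
NOT_DEVELOPED = N - DEVELOPED  # the remaining 29 did not

# Threshold 1: 28% of 50 = 14 passed and developed; 16% of 50 = 8 passed and did not.
SENS_T1, SPEC_T1 = sensitivity_specificity(
    tp=14, fn=DEVELOPED - 14, fp=8, tn=NOT_DEVELOPED - 8
)

# Threshold 2: 1 individual passed and developed; none passed without developing.
SENS_T2, SPEC_T2 = sensitivity_specificity(
    tp=1, fn=DEVELOPED - 1, fp=0, tn=NOT_DEVELOPED
)

if __name__ == "__main__":
    print(f"threshold 1: sensitivity {SENS_T1:.0%}, specificity {SPEC_T1:.0%}")
    print(f"threshold 2: sensitivity {SENS_T2:.0%}, specificity {SPEC_T2:.0%}")
```

Under these assumed counts the results round to 67% and 72% for threshold 1 and to 5% and 100% for threshold 2, matching the values reported for setting A.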

Discussion
Principal Results
We describe a pilot validation study of, to our knowledge, the first multilingual DDSS developed for people suspecting a rheumatic disease. To our knowledge, this is also the first multicenter validation study of a rheumatic DDSS. In our pilot study we tested the performance of Rheumatic? in 175 research individuals and patients from three different university centers. We observed a high performance of this online tool in differentiating individuals and patients with immune-mediated rheumatic conditions versus those without, both when using the total score and when applying the currently implemented thresholds. We tested Rheumatic? in the setting of individuals with musculoskeletal complaints where only a subset developed arthritis. Here the AUC-ROC was 75%. We furthermore tested the tool in the setting of patients with unclassified arthritis, where a subset had gout or osteoarthritis instead of an immune-mediated rheumatic disease. Here we observed laudable performance with AUC-ROC. These results were less convincing in the final data set of the LUMC, with an AUC-ROC of 54%.
Rheumatic? currently has expert-based thresholds for several rheumatic diseases, where passing threshold 1 for any one of the diseases gives the advice to visit a general doctor. Passing threshold 2 triggers the advice to visit a rheumatologist. Here we observed that threshold 2 is highly specific for immune-mediated conditions. However, in our setting it lacked sensitivity, identifying far too few patients with immune-mediated conditions. The sensitivity of threshold 1 was much better, though not optimal, at 0.61 to 0.67. The results of our pilot study suggest that the thresholds might need to be optimized by defining different scores for the thresholds, and perhaps by reviewing the scoring of the specific questions.
The comparison between the data of the three different centers provides some interesting insights. We used data from existing cohorts, which all had their own approaches for selecting research individuals and patients. We cannot draw hard conclusions about the differences between the inclusion methods as they are applied in different centers. However, the observed differences seem to support the validity of Rheumatic?. For instance, the higher scores in KI compared to LUMC could be explained by the selection of individuals with ACPA positivity and high risk for rheumatoid arthritis in KI. Similarly, the overall highest scores at the LUMC, as well as the much higher minimum scores there, could be explained by the selection of patients with immune-mediated arthritis. This might also explain the lack of discriminative ability of Rheumatic? in the LUMC data. This dataset contains patients for whom the rheumatologist also considered an immune-mediated condition. Both the rheumatologist and Rheumatic? use the same questions to come to this conclusion.

Limitations
Our study has several limitations. First, due to the enrichment of non-immune-mediated rheumatic conditions, the datasets in Setting B did not reflect the true prevalences of the individual diseases, and therefore we cannot calculate the positive and negative predictive values. Secondly, retrospectively entering symptoms (setting B) described by patients introduces a DDSS usage bias. Finally, the initial aim of the tool was to identify patients at risk for developing rheumatic diseases. Our datasets contained patients who were already selected by experts for being at risk, excluding those with unspecific signs of arthralgia or other musculoskeletal problems. Nevertheless, Rheumatic? was able to further differentiate within this group.
To further optimize the performance we need larger sample sizes than in the current study. Also, we would need to include patients without a rheumatic condition and have patients fill out the questionnaire on their own. An additional way of optimizing the sensitivity is to link the usage of the DDSS tool to genetic and immunological blood testing for specific biomarkers, as currently under development in our rheumatology units.
Two recently started studies, a prospective multicenter study and a population-wide study, will further help to improve the performance of Rheumatic?.
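The first limitation follows from the fact that predictive values, unlike sensitivity and specificity, depend on disease prevalence. The Python sketch below applies Bayes' rule to the threshold-1 sensitivity and specificity reported for EUMC (0.61 and 0.87). The two prevalence values are illustrative assumptions only: 0.55 mirrors the enriched EUMC sample, and 0.10 is a hypothetical lower prevalence, not an estimate from this study.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value: P(disease | test positive)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value: P(no disease | test negative)."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    SENS, SPEC = 0.61, 0.87  # threshold 1, EUMC, as reported in the Results
    for prevalence in (0.55, 0.10):  # assumed prevalences, for illustration
        print(
            f"prevalence {prevalence:.2f}: "
            f"PPV {ppv(SENS, SPEC, prevalence):.2f}, "
            f"NPV {npv(SENS, SPEC, prevalence):.2f}"
        )
```

With identical sensitivity and specificity, the positive predictive value drops sharply as prevalence falls, which is why values computed on an enriched sample would not transfer to routine referral populations.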

Comparison with Prior Work
In 1991, Moens et al. concluded that rheumatology is a suitable domain for computer-assisted diagnosis (11). Despite the elapsed time, such systems are still not part of standard rheumatology care. Alder et al. concluded in a review in 2014 that the validation process of rheumatology DDSS was in general underappreciated and none of the systems seemed to have succeeded in daily practice (12).
Patients and rheumatologists, however, still seem to believe in the positive potential impact of patient-facing digital diagnostic decision support systems. A recent study showed that 80% of physicians agreed that an app that could diagnose symptoms of rheumatic diseases in patients would be helpful (22).
Furthermore, in a recent survey study 50% of RMD patients stated that they would be interested in using an app for symptom decision support (1).
An RMD DDSS based on a fuzzy cognitive map technique showed a diagnostic accuracy of 87% in a validation study with 15 vignette cases (22). In a prospective pilot study, 34 patients completed an NHS and WebMD symptom checker. Only 4 out of 21 patients with immune-mediated arthritis were given a first diagnosis of rheumatoid arthritis or psoriatic arthritis (23). Of the people suspecting axial spondyloarthritis (axial SpA) who used an online self-referral tool, 19.4% were diagnosed with axial SpA (24). This proportion is significantly higher than the assumed 5% prevalence of axial SpA in patients with chronic back pain (25).
Besides patient-facing DDSS / symptom checkers, DDSS also represent a great tool for physicians to improve their diagnostic skills. DDSS might be especially attractive for young physicians, due to their limited work experience. McCrea et al. showed that using a decision tree led to improved accuracy in medical students diagnosing 10 rheumatology cases (81% compared to 68%). Furthermore, using physician-based DDSS could accelerate rare disease diagnosis (26, 27).
Current research suggests that the diagnostic accuracy of DDSS is user dependent (28). The effectiveness of DDSS could also depend on the patient's eHealth literacy, which seems to be limited in RMD patients (1).
A major strength of our study lies in its multicenter approach and large validation sample size compared to previous studies (22,23). The risk-averse retrospective validation scenario was deliberately chosen. The majority of currently available symptom checkers seem to skip this validation process, providing little scientific evidence. The DDSS that were evaluated often used patient vignettes instead of true data (15,27,29). We believe that focusing on improving an international overarching DDSS could boost quality standards in rheumatology by increasing transparency and objectivity and by decreasing redundant single-center efforts (13).

Conclusions
By incorporating input from RMD patients, rheumatologists and general practitioners from multiple countries, a multilingual DDSS for people suspecting a rheumatic disease was created. This first validation shows that Rheumatic? is capable of differentiating immune-mediated rheumatic conditions from other musculoskeletal problems. Future prospective, patient-led, and independent studies will provide further validation and improvement.