Our study was a pragmatic, cluster randomized, open label, controlled clinical trial. The methods for this study were previously published and the statistical analysis plan (SAP) is available in Supplement 1. General practitioners (GPs) were invited to participate in the study through the clinical laboratories with which they collaborated, and all GPs provided written consent to participate. They were rewarded for enrolling patients and for trial-related tasks, but they were not rewarded for using the intervention. Patients provided written consent before enrolment.
Study design and patients
From December 2017 to June 2018, GPs enrolled patients aged ≥18 years with a laboratory test order for at least one of 17 indications: cardiovascular disease follow-up or screening, hypertension, check-up, chronic kidney disease (CKD), thyroid disease, type 2 diabetes mellitus, fatigue, anemia, liver disease, gout, suspicion of acute coronary syndrome (ACS), suspicion of lung embolism, rheumatoid arthritis, sexually transmitted infections (STI), acute diarrhea, chronic diarrhea, and follow-up of medication. The combination of tests ordered together for one or more of the above indications at one given time is further referred to as a laboratory panel. All tests were analyzed by one of three different ambulatory clinical laboratories.
The CDSS was integrated into a computerized physician order entry (CPOE) in the form of evidence-based order sets that suggested appropriate tests based on the indication provided by the GP. When starting the order entry within the CPOE, GPs first chose a presenting concern or chronic condition. GPs with access to the CDSS then received a list of suggested tests based on the order sets developed for each of the chosen indications. The CDSS included order sets for presenting complaints and for chronic conditions. The order sets were developed to include multiple clinical presentations for specific indications, such as screening, diagnosis, or follow-up. They were based on clinical practice guidelines developed by the Flemish College of Family Physicians[26,27] and tailored to the different laboratory workflows. The CDSS allowed the GP to change, add or delete proposed tests prior to confirming the laboratory test order. Control GPs equally recorded the indications for laboratory test ordering in the CPOE but did not receive suggestions from the CDSS. In order to be able to identify tests that were ordered for indications other than the 17 study indications, GPs flagged panels that included additional indications and were prompted to describe these additional indications in a free text field.
Randomization and procedures
GPs were randomized to a control group who ordered laboratory tests as usual through a CPOE or to an intervention group who had access to the CPOE with integrated CDSS. The intervention was aimed at the GP, and many GPs worked together in a primary care practice (further referred to as practice), hence we chose to randomize on the level of the practice rather than on the level of the patient. This clustering avoided contamination between GPs and ensured that patients could not be managed by GPs in both intervention and control arms. All practices were allocated prior to patient enrolment using an electronic random number generator in a 1:1 ratio by an independent statistician. We aimed to stratify practices based on their prior experience with a CPOE, but post-hoc we chose to stratify based on the clinical laboratory with which practices were affiliated. Of the three participating laboratories, one had previously implemented a CPOE and two others had only recently started the implementation, hence experience with a CPOE was associated with the affiliated laboratory.
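As an illustration of allocation at the practice level, the following minimal Python sketch assigns practices 1:1 to the two arms and lets every GP inherit the arm of their practice. The shuffle-and-split procedure, the function names and the seed are assumptions made for this example; the trial itself only specifies an electronic random number generator operated by an independent statistician.

```python
import random


def allocate_practices(practice_ids, seed=None):
    """Allocate practices 1:1 to intervention or control (illustrative only)."""
    rng = random.Random(seed)
    shuffled = sorted(practice_ids)  # fixed starting order for reproducibility
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # The first half of the shuffled list goes to the intervention arm.
    return {
        practice: ("intervention" if i < half else "control")
        for i, practice in enumerate(shuffled)
    }


def arm_for_gp(gp_to_practice, allocation):
    """Each GP inherits the arm of their practice, so no practice spans both arms."""
    return {gp: allocation[practice] for gp, practice in gp_to_practice.items()}
```

Because the arm is attached to the practice and not to the GP or the patient, a patient seen by several GPs in the same practice always remains in a single arm.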
All practices received a one-hour training in the use of the CPOE (with or without CDSS) by qualified personnel. Neither practices nor patients were blinded to the intervention. All involved researchers, including data managers, statisticians and monitors, were blinded to the allocations until all data were collected, cleaned, and analyzed.
The primary outcome of the ELMO study was the proportion of appropriate tests over the total number of ordered tests and inappropriately not-requested tests. For the definition of the primary outcome, three numbers were relevant:
- (a) the number of tests ordered appropriately,
- (b) the number of tests ordered inappropriately, and
- (c) the number of inappropriately not-requested tests. This number was only relevant for diabetes mellitus, CKD, rheumatoid arthritis and thyroid disease.
Per patient, aggregated over panels if multiple panels were available, the primary outcome was defined as the ratio a/(a+b+c), where a, b and c denote the three numbers listed above, in that order. This is further referred to as the proportion of appropriate tests. Appropriateness was defined restrictively: a test with no clear indication was considered inappropriate. In addition, recommended tests not ordered for a specific indication (underutilization) were also considered inappropriate. Appropriateness per indication was defined prior to data analysis and was based on the recommendations from the clinical practice guidelines used to develop the intervention. Hence appropriateness reflected the tests suggested by the CDSS (appropriate and inappropriate under-utilized tests per indication are available in Supplement 1). GPs tagged panels that included so-called “piggyback” tests, that is, tests that were ordered for an indication other than one of the 17 study indications. This allowed separate analyses of panels that did not include any piggyback tests.
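The per-patient calculation can be sketched as follows, reading a, b and c as the three numbers defined above (appropriately ordered, inappropriately ordered, and inappropriately not-requested tests). The class and field names are hypothetical, and returning `None` for a patient without any counted tests is an assumption of this sketch, not a rule stated in the SAP.

```python
from dataclasses import dataclass


@dataclass
class PanelCounts:
    """Hypothetical per-panel counts; field names are illustrative."""
    appropriate: int    # (a) tests ordered appropriately
    inappropriate: int  # (b) tests ordered inappropriately
    not_requested: int  # (c) inappropriately not-requested tests


def proportion_appropriate(panels):
    """Aggregate counts over a patient's panels and return a / (a + b + c)."""
    a = sum(p.appropriate for p in panels)
    b = sum(p.inappropriate for p in panels)
    c = sum(p.not_requested for p in panels)
    total = a + b + c
    if total == 0:
        return None  # assumption: proportion left undefined without any tests
    return a / total
```

For a patient with two panels counted as (6, 2, 0) and (3, 0, 1), the aggregated counts are a = 9, b = 2 and c = 1, so the proportion of appropriate tests is 9/12 = 0.75.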
Secondary outcomes of the ELMO study included diagnostic error, test volume and cascade activities. For the assessment of diagnostic error, all new diagnoses were extracted from the EHR using a semi-automated clinical report form. All new diagnoses were evaluated for diagnostic error in relation to the indications for which the laboratory tests were ordered. Diagnostic error was assessed independently by two academic clinicians (ND, VP, BV or GVP) who were blinded to the allocation. Disagreements were resolved by consensus. Laboratory test volume was assessed as the number of tests per laboratory panel.
The planned statistical analyses were described in the published protocol and are available in Supplement 1. All analyses were performed using SAS® Enterprise Guide version 8.2 software. For the primary outcome, a sample of 35 GPs and 7305 tests would have been sufficient to detect a 10% difference in appropriateness (significance level of 5%, corrected for clustering). However, we aimed to recruit 300 GPs and enroll 12 600 patients based on the power calculations for our secondary outcome (80% power to demonstrate non-inferiority with a margin of 1% in the incidence of diagnostic error, using a significance level of 5% and correcting for clustering). We were able to recruit 288 GPs from 72 practices, who included 10 665 patients; hence the trial was over-powered for the primary outcome but slightly under-powered for the secondary outcome.
To assess differences between the allocated groups in the proportion of appropriate tests, a logistic generalized estimating equation (GEE) model was used, where the marginal proportions were of interest and not the proportions at the patient, GP or practice level. The logistic GEE model included the allocated group and laboratory as factors and practice as the clustering variable. The effect of the intervention was expressed as the difference in proportions with associated 95% confidence intervals. The proportions of appropriate tests in the two allocated groups were also estimated from the GEE model and presented with their 95% confidence intervals.
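One way to write down the marginal model described above is given below. The notation is introduced here for illustration only and does not come from the SAP: the index k for an observation within practice i, the indicator x_i for allocation to the intervention, and the laboratory term are all assumed symbols, and the working correlation structure is not stated in the text.

```latex
\operatorname{logit}\bigl(\pi_{ik}\bigr) = \beta_0 + \beta_1\, x_i + \gamma_{\ell(i)},
\qquad
\Delta = \hat{\pi}_{\text{intervention}} - \hat{\pi}_{\text{control}}
```

Here \pi_{ik} is the marginal probability of an appropriate test for observation k in practice i, x_i equals 1 for intervention practices and 0 for control practices, \gamma_{\ell(i)} is the effect of the laboratory with which practice i is affiliated, and \Delta is the difference in model-estimated proportions that is reported with its 95% confidence interval.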
The proportion of patients with a missed diagnosis was analyzed by means of a logistic GEE model that included the allocation and laboratory as factors and used the practice as the clustering variable. The proportion of patients with a missed diagnosis and associated 95% confidence intervals were estimated from the model. The non-inferiority limit for missed diagnoses was 1%, hence the intervention was deemed non-inferior if the difference between the allocated groups (intervention – control) was shown to be less than 1%.
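The non-inferiority decision can be sketched as a check on the confidence interval of the difference (intervention minus control). Reading "shown to be less than 1%" as the upper confidence bound lying below the margin is an assumption of this sketch, as are the function name and the example values.

```python
def is_non_inferior(ci_upper, margin=0.01):
    """Return True if the upper confidence bound of the difference in
    missed-diagnosis proportions (intervention - control) is below the margin.

    Both arguments are proportions, so a margin of 1% is written as 0.01.
    """
    return ci_upper < margin
```

With hypothetical upper bounds of 0.004 and 0.012, the first would be judged non-inferior and the second would not.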
We conducted post-hoc sensitivity analyses to investigate potential sources of bias. To assess the effect of the age difference between both groups, the planned analysis for the primary outcome was also performed on subgroups of patients stratified by age categories. The analysis was also performed on a subset of the total population where practices with extreme age differences were omitted. To assess potential documentation bias, a comparison of several signal tests was made between subgroups in both arms. For instance, the mean TSH value was compared between the subgroups of thyroid disease patients in both arms, allowing us to evaluate whether both subgroups were comparable. We judged that potential documentation bias would have been most probable in the subgroup of patients for whom tests were ordered for a general check-up. Differences in patient characteristics may have been influenced by more accurate clinical coding of indications by GPs in the intervention group. Omitting patients with general check-up as an indication leaves only patients with clearly documented indications. We therefore also analyzed appropriateness in the subgroup of patients without tests ordered for a general check-up.