An AI-Enabled Predictive Analytics Dashboard for Acute Neurosurgical Referrals

Healthcare dashboards make key information about service and clinical outcomes available to staff in an easy-to-understand format. Most dashboards are limited to providing insights based on group-level inference, rather than individual prediction. Here, we evaluate a dashboard which could analyze and forecast acute neurosurgical referrals based on 10,033 referrals made to a large-volume tertiary neurosciences center in central London, U.K., from the start of the Covid-19 pandemic lockdown period until October 2021. As anticipated, referral volumes significantly increased in this period, largely due to an increase in spinal referrals. Applying a range of validated time-series forecasting methods, we found that referrals were projected to increase beyond this time-point. Using a mixed-methods approach, we determined that the dashboard was usable, feasible, and acceptable to key stakeholders. Dashboards provide an effective way of visualizing acute surgical referral data and of predicting future volume without the need for data-science expertise.


Introduction
Healthcare dashboards provide a visual, interactive presentation of clinical and service data that has been gathered in a way that facilitates interpretation and decision-making 1. They offer staff an efficient means to audit data without requiring much technical ability and can function as a catalyst for quality improvement initiatives. Dashboards have been used in various surgical departments to measure and improve key in-hospital patient outcomes, theater utilization and intraoperative performance 2-5. Common to all of these examples is the ability to visually describe data, permitting group-level inference but not individual-level prediction.
Alongside data curation, dashboards have a clear opportunity to incorporate more predictive capabilities. Advanced, real-time predictive visualizations are currently being employed to aid with the Covid-19 pandemic response effort (https://covid19.healthdata.org 6), but to the best of our knowledge no comparable examples exist within surgical healthcare. If such technologies were available, they would be well placed to offer surgical decision-makers not only a snapshot of historical data, but also a contemporaneous, accessible forecast of resource availability and demand. Staffing, bed and theater availability, emergency admissions and referrals represent predictive metrics of relevance. Given the current growing backlog of surgical procedures due to the Covid-19 pandemic 7, as well as rising acute referral volumes before and after the pandemic 8,9,10, the need for such tools has become more pressing. They would enable surgical departments to evaluate and anticipate workload, allocate resources more dynamically and ensure timely access to treatment 11.
To that end, we present an interactive dashboard that not only offers important audit insights into historical surgical data but can also make robust machine learning forecasts about future surgical demand. In this study, we use large-scale data from an electronic referral system (ERS), which is a barometer of acute service demand. Acute referrals greatly contribute to the throughput of specialities like neurosurgery in which a considerable proportion of patients present with life or limb threatening injury, and require prompt transfer and emergency intervention. Now ubiquitous among U.K. neurosurgical centres 12 , ERS can help with on-call triage by transmitting salient patient information between the referring site and neurosurgical center. ERS also provide a formal record of referrals that can be evaluated, permitting exploration into their volume, type, timing and geographical distribution as well as the potential to make data-driven predictions regarding future referral volume. A lack of data science expertise among clinicians, however, limits this type of analysis, providing an incentive for the usage of the software discussed in this work.
As a test case, we evaluated acute neurosurgical referrals in a large-volume tertiary neurosciences center in central London, U.K., from the start of the Covid-19 pandemic lockdown period to October, 2021. We anticipated that referrals would significantly increase during this time period as surgical services returned to pre-Covid levels of capacity. We also determined whether the dashboard was usable, feasible and acceptable for typical users using a mixed-methods approach.

Ethics and regulations
Our retrospective study and use of anonymised referral data was approved by the institutional review board (National Hospital for Neurology and Neurosurgery, London, UK) as a service evaluation (121-202021-CA) with the requirement for informed consent being waived. All methods were conducted in accordance with local and national guidelines and regulations.

Data collection
A series of data acquisition and processing steps were followed and have been outlined graphically in Figure 1. Data processing and analysis were performed in Python 3.8.6 on a MacBook Pro (2017, 2.9 GHz, 16 GB RAM). Raw referral data from the center's cloud-based referral platform (referapatient.org) were securely extracted in comma-separated values format and downloaded to a hospital workstation, where they were fully de-identified before being transferred to the aforementioned machine. referapatient is used in the overwhelming majority of neurosurgical centers in the U.K.

Data analysis and visualization
Data was cleaned and pre-processed to remove erroneous and duplicated entries using numpy (v=1.19.

Statistics and evaluation of forecasting algorithm performance
Statistical comparisons of weekly referral volumes were implemented through scipy (v=1.6.2). Normality was assessed using the Kolmogorov-Smirnov test. If the data were normally distributed, an independent-samples t-test was applied; otherwise a Mann-Whitney U-test was performed. The choice of forecasting algorithm was limited to options which would: (i) train rapidly while the user is interacting with the dashboard, (ii) automatically parameterise and tune the algorithm without user input and (iii) handle the seasonality anticipated in the data following exploratory time-series analysis (Supplementary Results). Three algorithms were felt to meet these criteria: an automated pipeline which combined Seasonal and Trend decomposition using Loess (STL) with an automated autoregressive integrated moving average (Auto-ARIMA) model, a Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) network 13,14 and Prophet 15 (see Supplementary Methods for model implementation).
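The normality check followed by a choice of two-sample test, as described above, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the study's code: the function name `compare_weekly_volumes`, the 0.05 threshold, and the decision to test each sample against a normal distribution fitted to that sample are all assumptions. Only the use of scipy and the three named tests come from the text.

```python
import numpy as np
from scipy import stats


def compare_weekly_volumes(period_a, period_b, alpha=0.05):
    """Compare two samples of weekly referral counts.

    Each sample is screened with a Kolmogorov-Smirnov test against a normal
    distribution fitted to that sample (an assumption of this sketch). If
    neither sample rejects normality, an independent-samples t-test is used;
    otherwise a two-sided Mann-Whitney U-test is used.
    """
    a = np.asarray(period_a, dtype=float)
    b = np.asarray(period_b, dtype=float)

    def looks_normal(x):
        # KS test against N(mean, sd) with parameters estimated from x.
        _, p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
        return p > alpha

    if looks_normal(a) and looks_normal(b):
        name = "t-test"
        statistic, p_value = stats.ttest_ind(a, b)
    else:
        name = "Mann-Whitney U"
        statistic, p_value = stats.mannwhitneyu(a, b, alternative="two-sided")
    return name, float(statistic), float(p_value)
```

Calling `compare_weekly_volumes(weeks_2020, weeks_2021)` with two lists of weekly counts returns the name of the test that was applied together with its statistic and p-value, so the branch taken is always visible to the reader of the result.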
Forecasting algorithms were evaluated using two methods, with evaluations subdivided to forecast volumes for one-week, four-week and twelve-week periods. First, a 'blocked' cross-validation was performed using all available training data (June, 2014 to July, 2021), randomly divided into 5 folds with 15-month timeframes, each with a 12-month training window and a 3-month validation window. Second, a train-test approach was used, with a three-month period (August to October, 2021) withheld from the outset as a testing sample and the year preceding this, i.e. the post-pandemic period, used as training data. Mean absolute error (MAE), mean percentage error (MPE) and root mean squared error (RMSE) were used as scoring metrics, in addition to the computing time taken to run the algorithm.
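The scoring metrics and the blocked fold layout described above can be sketched in plain Python. The function names, the week-based window lengths (52 and 13 weeks standing in for 12 and 3 months), the random placement of block starts, and the sign convention for MPE are assumptions of this sketch; the text does not specify them.

```python
import math
import random


def forecast_errors(actual, predicted):
    """Return MAE, MPE (in %) and RMSE for paired weekly referral counts."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n
    # Assumed sign convention: a positive MPE means the forecast ran low.
    # Weeks with zero actual referrals would need separate handling.
    mpe = 100.0 * sum(r / a for r, a in zip(residuals, actual)) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    return {"MAE": mae, "MPE": mpe, "RMSE": rmse}


def blocked_folds(n_weeks, train_weeks=52, valid_weeks=13, n_folds=5, seed=0):
    """Draw contiguous blocks and split each, in time order, into a training
    window followed by a validation window.

    Block start weeks are sampled at random, so blocks may overlap.
    """
    block = train_weeks + valid_weeks
    n_starts = n_weeks - block + 1
    if n_starts < n_folds:
        raise ValueError("series too short for the requested folds")
    starts = sorted(random.Random(seed).sample(range(n_starts), n_folds))
    return [
        (range(s, s + train_weeks), range(s + train_weeks, s + block))
        for s in starts
    ]
```

Each fold keeps the validation weeks strictly after the training weeks, which is the property that distinguishes a blocked scheme from shuffled k-fold cross-validation on time-series data.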

User experience and implementation
In line with other studies in the field of health informatics 16, we tested user experience using a mixed-methods approach with semi-structured interviews and an electronic questionnaire which incorporated the System Usability Scale (SUS) 17: a validated tool for usability testing of systems including healthcare dashboards 18,19. In addition, we gauged information relating to user implementation through the acceptability of intervention measure (AIM) and feasibility of intervention measure (FIM) 20.
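For readers unfamiliar with the SUS, its standard scoring rule can be sketched as below. The function name `sus_score` is illustrative, and the sketch assumes responses are recorded as integers from 1 (strongly disagree) to 5 (strongly agree) in questionnaire order. It is not taken from the study's code.

```python
def sus_score(responses):
    """Convert ten 1-5 Likert responses into a 0-100 SUS score.

    Odd-numbered items are positively worded and contribute (response - 1);
    even-numbered items are negatively worded and contribute (5 - response).
    The summed contributions (0-40) are multiplied by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("the SUS has exactly ten items")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1
        else:
            total += 5 - response
    return total * 2.5
```

A participant answering 3 to every item scores 50, while the most favourable pattern (5 on odd items, 1 on even items) scores 100. Group means of these per-participant scores are what would be compared against a usability threshold.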

Results
All figures presented in this section are available as interactive and adjustable graphs within the dashboard platform and can be trialed with the web application, which uses synthetic data (Supplementary material).

A summary of acute neurosurgical referrals in the post-lockdown period
10,033 acute referrals were made to our neurosurgical center between March, 2020 and October, 2021 (female = 4938, mean age [SD] = 61.1 years [18.8]). As would be expected, age and gender distribution varied widely according to diagnosis (Figure 2). For example, patients with a subdural hemorrhage presented with a mean age of 76.8 years [13.6] and male bias (male = 68.3%) as compared to patients classified as being suspected of cauda equina syndrome (mean age = 53.5 years [17.9], male = 43.4%).
The majority of referrals were classified as a brain tumor, degenerative spine or neurovascular diagnosis (Figure 3A) in line with the center's main subspecialties. 96% of referrals were stated as an 'emergency' or 'urgent' by the referring team (Figure 3B) and 79% of referrals were made by a junior registrar or intern (Figure 3C). In terms of how referrals were triaged, 9.5% of referrals were accepted for immediate hospital transfer, 1% were placed on a transfer wait-list and 6.3% were assigned to outpatient review. 36.4% of referrals required additional clinical or imaging information from the referrer in order to make a triage decision. 32.1% of referrals were completed by triaging to conservative treatment or by offering only advice (Figure 4).
Weekly referral timing was found to be concentrated between 2-6PM on weekdays (Figure 5A), particularly for brain tumor referrals. Referrals for other high-volume categories such as degenerative spine and neurovascular diagnoses were more distributed but still significantly fewer over the weekend (Figure 5B, Supplementary Table 1).
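A weekday-by-hour tabulation of the kind summarized above can be sketched with the standard library alone. The function names and the assumption that referral times are available as ISO 8601 strings are illustrative. The structure of the real export is not described in the text.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def weekday_hour_counts(timestamps):
    """Count referrals per (weekday name, hour of day) from ISO 8601 strings."""
    counts = Counter()
    for stamp in timestamps:
        when = datetime.fromisoformat(stamp)
        counts[(DAY_NAMES[when.weekday()], when.hour)] += 1
    return counts


def weekend_share(counts):
    """Fraction of all counted referrals that fall on Saturday or Sunday."""
    total = sum(counts.values())
    weekend = sum(n for (day, _), n in counts.items()
                  if day in ("Saturday", "Sunday"))
    return weekend / total if total else 0.0
```

The resulting counter maps directly onto a 7 x 24 heatmap grid, and `weekend_share` gives a single number for comparing diagnostic categories.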
During the aforementioned time period, referrals were received from 116 hospital sites and clinical institutions from across the U.K. (Figure 6A). Five hospitals in the Greater London catchment area accounted for more than 70% of overall referral volume (Figure 6B).

Choice of forecasting algorithm
Prioritizing computational time and test performance, Prophet was selected as the dashboard forecasting algorithm of choice (Figure 7). Although the CNN-LSTM algorithm demonstrated better performance across cross-validation scoring metrics, it was found to require a longer computational training and fitting time, making it unsuitable under real-world computational constraints. ARIMA models are often considered as a benchmark model in forecasting 21. Here, with the addition of STL and an auto-hyperparameter tuning function, the cross-validation performance was comparable with Prophet; however, its test performance was worse across time periods and it was also marginally slower.
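The trade-off between fitting time and test error described above can be illustrated with a small selection helper. This is a hypothetical sketch: the text does not state a numeric time budget or a formal selection rule, so the function `pick_forecaster`, its `fit_budget_s` parameter, and the rule "lowest test error among models that fit within budget" are all assumptions made for illustration.

```python
import time


def pick_forecaster(candidates, fit_budget_s):
    """Time each candidate and return the lowest-error one within budget.

    `candidates` maps a model name to a zero-argument callable that fits the
    model and returns its error on a withheld test sample (lower is better).
    Returns (best_name, results), where results maps name -> (seconds, error).
    best_name is None if no candidate finishes within the budget.
    """
    results = {}
    for name, fit_and_score in candidates.items():
        started = time.perf_counter()
        error = fit_and_score()
        results[name] = (time.perf_counter() - started, error)
    within_budget = {n: r for n, r in results.items() if r[0] <= fit_budget_s}
    if not within_budget:
        return None, results
    best = min(within_budget, key=lambda n: within_budget[n][1])
    return best, results
```

Under this rule a model with the best error can still be passed over when its fitting time exceeds what an interactive dashboard can tolerate, which mirrors the reasoning given for not adopting the CNN-LSTM network.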

Change in referral volumes
Weekly referral volumes in the first 6 months after the announcement of the U.K. Covid-19 pandemic lockdown 22 were compared with those in the same time frame one year later. There was a significant increase between these periods, mainly driven by an increase in spinal referrals (Table 1), which include spinal trauma, suspected cauda equina syndrome and degenerative diagnoses. Out-of-sample forecasting by all three time-series algorithms using all available training data demonstrated a consistent increasing long-term referral trend (Figure 7B, Supplementary Figure 1).

Table 1. Median weekly volumes and group-wise differences between the first 6 months of lockdown (March to August, 2020) and the corresponding months one year later.

Usability, feasibility and acceptability
Twenty participants were recruited for feasibility testing, including 5 neurosurgical consultants, 12 registrars and 3 members of management or administration staff. All were blinded to the development of the dashboard. Table 2 lists the average SUS, AIM and FIM scores among participants. An SUS score of 70 or above has previously been defined as a threshold for good usability 16. In this study, all user groups had mean SUS scores above this benchmark and high mean acceptability (AIM) and feasibility (FIM) scores were also recorded. Table 2). In brief, users highlighted the figures and interactivity as particularly useful features and felt that the dashboard would be useful to explore referral data, identify current areas for service improvement and suggest future directions for research. The use of time-series forecasting was noted as useful in anticipating service demand. In contrast, users expressed concerns regarding how the dashboard would be hosted and wished for additional functionality to review the data in more detail.

Discussion
In this study, we present and evaluate an acute neurosurgical referral dashboard that allows users with little to no data-science experience to audit referral data using highly interactive features and visualizations. Referrals can be filtered according to diagnosis, time-frame, age, sex, geographical location, decision-making, referrer information and timing. Even more novel is the dashboard's ability to estimate future referral volumes nearly instantly using novel time-series forecasting techniques. While preliminary, these findings demonstrate our dashboard's versatility and functionality for drawing critical insights into an acute neurosurgical service as well as suggesting various avenues for future quality improvement and research. We confirmed, using a mixed-methods approach, that key stakeholders who could benefit from this software deemed it usable, acceptable and feasible.
We found that, as anticipated, referral volumes significantly increased between the first 6 months after the Covid-19 pandemic began and a corresponding time period one year later. This was consistent with the findings of ElGhamry et al., who discovered increased referral and operative volumes were apparent in the post-wave period as services were restored to pre-pandemic levels 23. Changes in referral volume at our center were mostly related to an increase in spinal referrals such as for degenerative spine and suspected cauda equina syndrome. Spinal activity, in particular, was previously shown to be reduced during the initial pandemic period. This is most likely due to patient avoidance of healthcare services, governmental self-isolation advice and lower vehicular traffic 24,23; however, lower referrals did not necessarily translate into fewer spinal procedures 25.
Beyond the dates of our testing set (from October, 2021), we forecasted that weekly referral volumes would continue to rise over the next year. To evaluate our forecasting functionality, we compared 'standard' models of time-series forecasting (STL + ARIMA) against novel methods, including deep learning algorithms, with the stipulation that they could be fitted quickly while the dashboard was in use. Although the combination of CNN and LSTM methods produced the best cross-validation results, they were found to be computationally intensive to fit. Similar to previous work 26, we found that Prophet was the most time-efficient and had cross-validation scores comparable to those of STL + ARIMA. Prophet also outperformed both other methods in the withheld testing sample. A shortcoming shared by all forecasting methods was that tuning was performed over a narrow parameter space, implying that there was scope to improve the models with a wider grid search, or via other techniques such as ensemble methods 27. Nevertheless, our intention was to improve generalisability by reducing hyperparameter optimization time and making the software 'plug and play'.
Although there is no shortage of time-series forecasting examples within the recent surgical literature 27,28, there are few that precisely describe acute referral numbers. Chandrabalan et al. trained a forecasting model using Prophet to estimate the pandemic-related deficit in colorectal cancer referrals and found that their predictions overestimated referral volumes as compared to actual data in the early post-pandemic period 29. The fact that all algorithms examined in our work had 12-week mean percentage error scores of less than 10% and 15% for testing and cross-validation respectively, and that similar trajectories and volumes were forecasted out-of-sample, lends credibility to our predictions (Supplementary Figure 1).
Low-code and no-code development platforms have recently been gaining traction, permitting users to build, test and share applications with minimal expertise 30. Data dashboards are an example of a low-code platform with modular components that allow people to engage with big-data via an easy-to-use graphical user interface. They facilitate audit and exploratory data analysis without the need for programming or spreadsheets. For developers, each module can be iteratively configured with ease to meet the needs of the user-base, allowing for an almost limitless scope for innovation.
Although dashboards have previously been used to explore healthcare big-data 31,32, we believe our work is the first to combine surgical large-scale data with machine learning (ML) methods within a dashboard platform. A key strength of this design is that it considerably reduces the gap between the requisite level of technical ability needed to understand ML methods and their actual implementation. Indeed, as two study participants commented: "the new AI tool was really user friendly [R5]" and "it was impressive that AI could be implemented and used [in the dashboard] so easily [R8]".
Despite this, our study has limitations. Accuracy of referral data is highly reliant on the referring doctor and on-call neurosurgical team. Often, there is a change in referral status that is not represented in the record (for example, a pending scan may have been received but the status was not altered) or the specialist working diagnosis was incorrect. Still, some of these inaccuracies are likely to be compensated for once the data has been aggregated. Although a number of design features were implemented to make the model and software generalisable, further testing on other referral data-sets is required to confirm the tool's validity. In addition, user feedback from other neurosurgical centers is needed. Furthermore, because our center lacked an existing method for auditing large-volume referral data, there was no control to compare the dashboard's user experience, making it difficult to gauge the relative utility of our software.
Our focus in this study was to showcase an analytics dashboard that could efficiently audit and predict acute neurosurgical referral volume. In addition to the prospective work aforementioned, there are several opportunities for future development. Setting up a pipeline that can accept referral data, fit models and make predictions contemporaneously would contribute toward a widely-held objective of a dynamic, flexible surgical service 33. Delaying this are a number of obstacles including practical concerns regarding streamlining access to multiple information pools and data-regulation issues about how and where this type of clinical dashboard would be hosted. Nevertheless, having a dashboard 'front-end' for big-data sets that can both describe as well as predict would expand accessibility and stimulate improvements in the quality of patient care.

Declarations
Code and data availability
The supplementary material contains key portions of the code used. Full code and synthetic data set are available on reasonable request. A version of the dashboard using synthetic data is available on https://referralsdash.herokuapp.com. Clinical data cannot be shared without first obtaining relevant information governance permissions.