Establishment of the AI model
Based on deep learning and deep neural networks, SCMC and the YI TU Technology Company jointly developed a personalized consultation and automatic diagnosis algorithm that imitates the process of a doctor’s consultation. At the same time, the medical records are structured through natural language processing. Following automatic diagnosis based on the structured medical records, the corresponding examination or test items are generated.
We selected 59,041 high-quality medical records hand-annotated by a team of professional doctors and informatics experts. The AI model was then used to learn the internal relationships between patients’ chief complaints/past history and the tests/examinations that needed to be ordered. Patient features were captured and marked for prediction, and appropriate clinical test/examination decisions were then made. In a previous validation, the average accuracy was 0.92.
Considering guardians’ acceptance, although the AI could theoretically generate most of the tests/examinations, our final model only considered certain kinds, namely those that were non-invasive (or minimally invasive) and low-cost. Thus, XIAO YI only recommended common items to patients. For example, if a 12-year-old child developed hematuria with lumbago for one day, the initial diagnosis might be kidney stones. Based on the consultation, XIAO YI determined that the child needed a blood routine, a urine routine, and a urinary B-ultrasound. For some kidney stones, doctors might also order a CT scan; however, CT is more expensive, and B-ultrasound is sufficient for a preliminary diagnosis of kidney stones. In the performance test, most errors were missing items (85%). This was the result of our deliberate choice, as we did not require XIAO YI to order all tests/examinations for patients; we only needed it to issue the simplest and most common ones. The remaining complex, invasive items were left to professional doctors.
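The restriction to non-invasive, low-cost items can be thought of as a post-filter over the model’s candidate recommendations. The sketch below illustrates this idea only; the item catalog, attribute flags, and function name are hypothetical and do not reflect the actual implementation.

```python
# Hypothetical catalog: each candidate item carries invasiveness and cost flags.
CATALOG = {
    "blood routine":        {"invasive": False, "low_cost": True},
    "urine routine":        {"invasive": False, "low_cost": True},
    "urinary B-ultrasound": {"invasive": False, "low_cost": True},
    "CT scan":              {"invasive": False, "low_cost": False},
    "biopsy":               {"invasive": True,  "low_cost": False},
}

def filter_recommendations(items):
    """Keep only non-invasive, low-cost items; unknown items are
    treated as invasive and deferred to the doctors."""
    return [
        i for i in items
        if CATALOG.get(i, {}).get("invasive", True) is False
        and CATALOG.get(i, {}).get("low_cost", False)
    ]
```

Under this filter, the kidney-stone example above would yield the blood routine, urine routine, and B-ultrasound, while a CT scan would be left to the reviewing doctor.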
At the same time, dedicated doctors were responsible for reviewing every item ordered by XIAO YI. If the doctors considered the tests/examinations insufficient, they manually added more. Only after the doctors’ approval could patients pay for and complete the tests/examinations.
Procedure of the AI-assisted outpatient service
We explain the standard outpatient procedure and the AI-based modifications to it. A patient needs to be registered first. After registration, the patient waits in the waiting area. When it is their turn, they are called to the consulting room to see a doctor. In most cases, a lab test and an imaging examination are ordered to confirm the diagnosis. The patient pays for these and then goes to the appropriate place to be examined or tested. After receiving the report, the patient has to wait again to see the doctor to ascertain the diagnosis, based on which another examination/test or medicines may be recommended. In this study, we focus on the steps from registration to the examination or test.
The first step in the AI-assisted outpatient service remains the same: the patient is registered. In the next step, the patient opens the WeChat application (a messaging and social application widely used in China) on their mobile phone. The patient’s unique outpatient number is linked to a WeChat-based mini program named XIAO YI. XIAO YI is the software implementation of the algorithms discussed above and has clients on both mobile phones and doctors’ work computers. XIAO YI can automatically read the registration information of the patient. Depending on the symptoms, XIAO YI asks the patient a series of questions, as a real doctor would. The next question is decided intelligently based on the answer to the previous question. When the AI believes it has gathered enough information, the inquiry ends. XIAO YI then orders any tests and examinations needed to help the doctors make the clinical diagnosis. The tests and examinations “prescribed” by XIAO YI are basic, non-invasive, and relatively inexpensive (e.g., blood routine). The patient then pays for these tests and heads to the testing room. If the patient disagrees with the recommended items, they must go through the traditional procedure of waiting in line to see the doctor. When the test or examination is completed and the report is obtained, the patient waits to be called to the doctor’s office for consultation. The traditional and AI-assisted workflows are shown in Fig. 1.
Selection of subjects
SCMC is one of the largest specialized pediatric hospitals in Shanghai and is affiliated with Shanghai Jiao Tong University School of Medicine. We collected patients’ registration information from August 1, 2019 to January 31, 2020. The dataset included patients from the internal, gastroenterology, and respiratory departments who visited SCMC during that period. It included their gender, age (on the day of registration), registration code, registration time, time of meeting the doctor, time of examination/testing, time of prescription by the doctor, and time of receiving the medicines, among others. We ensured patients’ privacy: in the dataset that we extracted and used for analysis, researchers could not see a patient’s name or outpatient number. The outpatient number was recoded into a registration code, mainly because a patient would sometimes register multiple times in one day, so the number needed to be recoded to make it unique. This recoding also guaranteed the information security of patients.
During this period, uniformly trained volunteers and nurses publicized XIAO YI to the guardians of children in the internal, gastroenterology, and respiratory departments, and taught them how to use it. Patients were thus categorized into two groups, namely, the conventional outpatient group and the AI-assisted group (AI group), depending on the type of medical procedure they chose. Because there were far more patients in the conventional group than in the AI group, we conducted a 1:1 matched case–control study. The two groups of patients were matched according to registration time, mainly because the time of registration may be the most influential factor affecting an outpatient’s waiting time. Generally, there are more patients on holidays than on weekdays, and more patients in the morning than in the afternoon. Moreover, weather, traffic jams, and other external factors (e.g., the COVID-19 outbreak) could influence the time outpatients spent in the hospital. This complication can be resolved by matching on registration time to pair patients who visited the hospital at almost the same time. We employed propensity score matching (PSM) to pair the patients [18].
We found that using only the paired dataset was insufficient. In our conceptual scenario, patients first registered and then queued in the waiting area to see the doctor. In practice, however, some patients did not wait to see the doctor after registering if they perceived that the waiting time would be long due to the number of patients. Some of these patients (i.e., children accompanied by their guardians) returned to the waiting area after some time with a fresh registration. As a result, these patients spent far more time waiting than others. In addition, some patients took advantage of the features of the system to make an appointment, especially in the AI group, as it was more convenient to make an appointment through the AI system. For example, if a patient registered at 8 a.m. but was not available until 2 p.m., the patient would ask the nurse to schedule the appointment for 2 p.m. This would greatly overestimate the time spent in the hospital.
To avoid these issues, we cleaned the data according to the following criteria. We excluded patients who did not have a lab test, because the main function of the AI was to order a lab test before the patient’s consultation with the doctor. Patients who spent more than five hours from registration to consultation were also excluded, as were those who spent more than eight hours from registration to obtaining their medicines. According to the experience of many doctors in the hospital, such long waiting times usually occurred because the patients either had an appointment or were late for one. Patients who spent less than five minutes waiting were also excluded, as such records were likely errors made by the AI system when reading the data.
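The exclusion criteria above amount to a simple filtering step. The following sketch expresses them with pandas; the column names (`registration_time`, `consult_time`, `medicine_time`, `had_lab_test`) are hypothetical placeholders, since the actual cleaning was done in Stata.

```python
import pandas as pd

def clean_records(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the study's exclusion criteria (column names hypothetical)."""
    hours_to_consult = (
        df["consult_time"] - df["registration_time"]
    ).dt.total_seconds() / 3600
    hours_to_medicine = (
        df["medicine_time"] - df["registration_time"]
    ).dt.total_seconds() / 3600
    keep = (
        df["had_lab_test"]                 # must have a lab test
        & (hours_to_consult <= 5)          # <= 5 h from registration to consultation
        & (hours_to_medicine <= 8)         # <= 8 h from registration to medicines
        & (hours_to_consult * 60 >= 5)     # >= 5 min waiting (drop AI read errors)
    )
    return df[keep]
```

Each condition maps one-to-one onto a criterion in the paragraph above, which makes the cleaning rules auditable.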
Outcomes
The primary outcome was the time spent by the patient from registration to taking the laboratory test or examination, defined as the waiting time. The secondary outcome was the expenses incurred by the patient in the hospital. Thus, we evaluated the performance of the AI system along two dimensions.
Statistical analysis
Stata 15 was used for statistical analysis and PSM. Continuous variables were expressed as means ± standard deviation (SD) or as medians and interquartile ranges (IQR). Categorical variables were summarized as counts and percentages. Missing data were not imputed; records with missing values were deleted. All analyses were two-sided, and P values < 0.05 were considered significant. The skewness/kurtosis test for normality was used to test the assumption of normal distribution. When normally distributed, continuous variables were expressed as mean ± SD and compared using a paired Student’s t-test. If not, as was the case with almost all continuous variables, we used the nonparametric Wilcoxon signed-rank test.
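The test-selection logic above (normality check, then paired t-test or Wilcoxon signed-rank) can be sketched as follows. This is an illustrative SciPy version, not the Stata code the authors used; `scipy.stats.normaltest` is the D’Agostino–Pearson test, which combines skewness and kurtosis as in the paper’s normality check.

```python
import numpy as np
from scipy import stats

def compare_paired(x, y, alpha: float = 0.05):
    """Choose the paired test as described in the text: test the paired
    differences for normality (skewness/kurtosis based), then use a
    paired t-test if normal, otherwise the Wilcoxon signed-rank test."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    diff = x - y
    _, p_norm = stats.normaltest(diff)  # D'Agostino-Pearson skew/kurtosis test
    if p_norm > alpha:
        stat, p = stats.ttest_rel(x, y)
        return "paired t-test", stat, p
    stat, p = stats.wilcoxon(x, y)
    return "Wilcoxon signed-rank", stat, p
```

Because the normality test is run on the paired differences rather than on each group separately, the choice of test matches the paired design of the matched case–control comparison.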
Propensity scores were estimated using logistic regression, with time of registration as the covariate. This covariate was selected because it may affect the time that the patient spent in the hospital. The time from registration to taking the test or examination was entered into the regression model as the dependent variable, and the group was defined as the independent variable. A 1:1 “nearest neighbor” case–control match without replacement was used [19]. Stata was used to test the balance between the two groups after PSM; p > 0.05 indicated that the difference in registration time was not statistically significant. The chi-square test was used to compare the sex ratio and the ratio of visits in each department between the two groups.
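A minimal sketch of the matching procedure, assuming registration time expressed as minutes since midnight as the single covariate: fit a logistic regression of group membership on the covariate, take the predicted probabilities as propensity scores, and greedily pair each AI-group patient with the unmatched conventional-group patient whose score is closest. This illustrates 1:1 nearest-neighbor matching without replacement only; the authors’ actual analysis was done in Stata, and the function and variable names here are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def match_1to1(reg_time_minutes: np.ndarray, is_ai_group: np.ndarray):
    """Greedy 1:1 nearest-neighbor propensity-score matching without
    replacement. reg_time_minutes: registration time as minutes since
    midnight (single covariate); is_ai_group: 1 = AI group, 0 = conventional.
    Returns a list of (ai_index, conventional_index) pairs."""
    X = np.asarray(reg_time_minutes, float).reshape(-1, 1)
    ps = LogisticRegression().fit(X, is_ai_group).predict_proba(X)[:, 1]
    treated = np.where(is_ai_group == 1)[0]
    controls = list(np.where(is_ai_group == 0)[0])
    pairs = []
    for t in treated:
        if not controls:
            break
        # Closest remaining control by propensity score; pop = no replacement.
        j = min(range(len(controls)), key=lambda k: abs(ps[t] - ps[controls[k]]))
        pairs.append((t, controls.pop(j)))
    return pairs
```

Matching without replacement means each conventional-group patient can appear in at most one pair, which is what makes the paired Wilcoxon signed-rank comparison valid downstream.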