Of the 2,011,813 visits included in the study, 537,422 were no-show visits and 1,474,391 were show visits; cancelled appointments were excluded. The overall proportion of no-shows across all outpatient clinics was therefore 26.71%. Each record contains 20 variables, summarized in Table 1. As Table 1 shows, male patients were less likely to miss their appointments than female patients, new patients were the most likely to miss their appointments, and patients with follow-up appointments were the second most likely.
Table 1
Descriptive characteristics of the dataset (N = 2,011,813)
Features | No-Show, N (%) | Show, N (%) | Total (N) |
Gender |
Male | 213,729 (10.62%) | 564,024 (28.04%) | 777,753 |
Female | 323,693 (16.09%) | 910,367 (45.25%) | 1,234,060 |
Age Group |
0–5 | 66,118 (3.29%) | 150,748 (7.49%) | 216,866 |
6–10 | 39,234 (1.95%) | 95,566 (4.75%) | 134,800 |
11–15 | 32,949 (1.64%) | 87,269 (4.34%) | 120,218 |
16–20 | 32,440 (1.61%) | 87,115 (4.33%) | 119,555 |
21–25 | 40,968 (2.04%) | 104,714 (5.20%) | 145,682 |
26–30 | 44,580 (2.22%) | 118,409 (5.89%) | 162,989 |
31–35 | 45,776 (2.28%) | 123,426 (6.14%) | 169,202 |
36–40 | 40,400 (2.01%) | 114,836 (5.71%) | 155,236 |
41–45 | 32,026 (1.59%) | 95,960 (4.77%) | 127,986 |
46–50 | 31,599 (1.57%) | 97,445 (4.84%) | 129,044 |
51–55 | 30,485 (1.52%) | 94,975 (4.72%) | 125,460 |
56–60 | 27,602 (1.37%) | 88,763 (4.41%) | 116,365 |
61–65 | 23,159 (1.15%) | 74,015 (3.68%) | 97,174 |
66–70 | 16,748 (0.83%) | 51,621 (2.57%) | 68,369 |
71–75 | 13,857 (0.69%) | 39,771 (1.98%) | 53,628 |
76–80 | 10,223 (0.51%) | 27,134 (1.35%) | 37,357 |
81–85 | 5,464 (0.27%) | 13,272 (0.66%) | 18,736 |
> 85 | 3,794 (0.19%) | 9,352 (0.46%) | 13,146 |
Nationality | | | |
Saudi | 530,112 (26.35%) | 1,451,144 (72.13%) | 1,981,256 |
Non-Saudi | 6,650 (0.33%) | 21,807 (1.08%) | 28,457 |
Unknown | 660 (0.03%) | 1,440 (0.07%) | 2,100 |
Appointment type |
New Patient (NP) | 243,158 (12.09%) | 890,110 (44.24%) | 1,133,268 |
First visit (FV) | 271,466 (13.50%) | 517,688 (25.73%) | 789,154 |
Follow up (FU) | 22,798 (1.13%) | 66,593 (3.31%) | 89,391 |
Reservation type |
Scheduled | 516,300 (25.66%) | 1,278,602 (63.55%) | 1,794,902 |
Walk-in | 21,122 (1.05%) | 195,789 (9.73%) | 216,911 |
Patient type |
Patient Service | 530,923 (26.40%) | 1,460,373 (72.59%) | 1,991,296 |
Business Center | 2,625 (0.13%) | 6,634 (0.33%) | 9,259 |
VIP | 3,874 (0.19%) | 7,384 (0.37%) | 11,258 |
Distance (km) |
distance ≤ 100 | 517,591 (25.73%) | 1,421,338 (70.65%) | 1,938,929 |
101 ≤ distance ≤ 399 | 10,251 (0.51%) | 28,962 (1.44%) | 39,213 |
400 ≤ distance ≤ 799 | 7,462 (0.37%) | 19,229 (0.96%) | 26,691 |
distance ≥ 800 | 2,118 (0.11%) | 4,862 (0.24%) | 6,980 |
Outpatient Clinics | | | |
Health Care Specialty Clinic | 236,668 (11.76%) | 656,152 (32.61%) | 892,820 |
National Guard Comprehensive Specialized Clinic | 102,428 (5.09%) | 264,979 (13.17%) | 367,407 |
King Abdulaziz City Housing | 106,727 (5.31%) | 304,584 (15.14%) | 411,311 |
King Saud city Housing | 81,487 (4.05%) | 215,946 (10.73%) | 297,433 |
Prince Bader Housing City Clinic | 10,112 (0.50%) | 32,730 (1.63%) | 42,842 |
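As a sanity check, the overall and within-group proportions quoted above can be recovered directly from the raw counts in Table 1. A minimal Python sketch (the reservation-type rows are used as the example; any row of the table works the same way):

```python
# Recompute the headline proportions in Table 1 from the raw visit counts.
TOTAL_VISITS = 2_011_813
NO_SHOW_VISITS = 537_422
SHOW_VISITS = 1_474_391

overall_no_show_rate = NO_SHOW_VISITS / TOTAL_VISITS
print(f"Overall no-show rate: {overall_no_show_rate:.2%}")  # 26.71%

# Within-group no-show rates, e.g. by reservation type (counts from Table 1).
reservation = {"Scheduled": (516_300, 1_794_902), "Walk-in": (21_122, 216_911)}
for kind, (missed, total) in reservation.items():
    print(f"{kind}: {missed / total:.2%} of visits were no-shows")
```

Note that the N% columns in Table 1 are percentages of all 2,011,813 visits, while the within-group rate divides by the group's own total.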
The feature-importance analysis identified the top four predictors as the number of no-show appointments, medical department, lead time, and number of show appointments. The next four most important predictors were appointment type, patient type, outpatient clinic, and appointment month. Appointment year, distance, gender, reservation type, and nationality were not important predictors and were therefore removed from the models. The remaining factors, such as the number of scheduled appointments, number of walk-in appointments, appointment time, and age, had less influence on no-shows. Factors related to the patients had more impact on no-shows than factors related to the appointments. The prediction models were developed using only 14 factors; the predictors are ranked by importance in Fig. 2.
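The selection step described above can be sketched as ranking features by an importance score and dropping those below a cutoff. The scores and cutoff below are hypothetical placeholders for illustration only (the paper does not report the numeric importances); only the feature names come from the text:

```python
# Hypothetical importance scores -- placeholders, NOT the study's values.
importances = {
    "num_no_show_appointments": 0.22,
    "medical_department": 0.18,
    "lead_time": 0.15,
    "num_show_appointments": 0.12,
    "appointment_type": 0.08,
    "appointment_year": 0.004,
    "distance": 0.003,
    "gender": 0.002,
    "reservation_type": 0.002,
    "nationality": 0.001,
}

CUTOFF = 0.01  # features scoring below this are removed from the models

ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
kept = [name for name, score in ranked if score >= CUTOFF]
dropped = [name for name, score in ranked if score < CUTOFF]
print("kept:", kept)
print("dropped:", dropped)
```

With these placeholder scores, the five features named in the text as unimportant fall below the cutoff and are dropped, mirroring the paper's reduction to 14 predictors.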
Table 2 reports the results of the experiments carried out to assess the performance of Spark using five machine learning algorithms on the same large dataset. We evaluated the effectiveness of all classifiers in terms of the time to train and evaluate the models, accuracy, precision, recall, F-measure, and area under the ROC curve. MLP and RF classified visits well: their metrics are comparable across the board, with MLP showing a larger improvement in F-measure than RF. LR and SVM have similar ROC performance, but LR is preferable to SVM because it produces better results on all metrics with less computation. SVM likely performs poorly because of a kernel limitation in MLlib: only a linear kernel is available for the SVM algorithm. GB performed best, raising accuracy and ROC area to 79% and 81%, respectively.
Table 2
Evaluation metrics shown by different models on predicting outpatients no-show
Model | Accuracy | Precision | Recall | F-measure | ROC Area |
Random Forest | 0.76 | 0.76 | 0.76 | 0.68 | 0.77 |
Gradient Boosting | 0.79 | 0.77 | 0.79 | 0.76 | 0.81 |
Logistic Regression | 0.75 | 0.73 | 0.75 | 0.70 | 0.73 |
SVM | 0.73 | 0.70 | 0.73 | 0.62 | 0.73 |
Multilayer Perceptron | 0.77 | 0.75 | 0.77 | 0.72 | 0.78 |
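The metrics in Table 2 follow the standard definitions derived from a confusion matrix. A minimal sketch with made-up counts (not the study's data):

```python
# Hypothetical confusion-matrix counts for a binary no-show classifier:
# tp = no-shows predicted as no-shows, fp = shows predicted as no-shows,
# fn = no-shows predicted as shows,    tn = shows predicted as shows.
tp, fp, fn, tn = 80, 20, 30, 170

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} F-measure={f_measure:.2f}")
```

This also shows why F-measure can lag accuracy, as it does for RF and SVM in Table 2: F-measure is the harmonic mean of precision and recall, so it is pulled down by whichever of the two is weaker.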
To better understand model performance, Fig. 2 presents the ROC curves of the five models. From the plot, Gradient Boosting is clearly the best model (area = 0.81). SVM with a linear kernel and Logistic Regression returned comparable classification results; since MLlib currently supports linear SVMs only, a non-linear kernel might allow SVM to outperform Logistic Regression.
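The ROC area reported above has a useful probabilistic reading: it is the probability that a randomly chosen no-show visit receives a higher predicted risk score than a randomly chosen show visit. A self-contained sketch with hypothetical scores and labels (1 = no-show, 0 = show):

```python
def roc_auc(scores, labels):
    """ROC AUC via the rank (Mann-Whitney) formulation: the fraction of
    positive/negative pairs where the positive scores higher (ties 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted no-show probabilities and true outcomes.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
print(f"AUC = {roc_auc(scores, labels):.3f}")
```

Under this reading, Gradient Boosting's 0.81 means it ranks a random no-show above a random show about 81% of the time.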
As an additional evaluation criterion, we measured the overall training and evaluation time (in seconds) for all five algorithms. Table 3 compares the times obtained. Unlike the other metrics, the times differ substantially between algorithms, with very large differences in training time. GB is around 15× slower to train than MLP, although it achieved the best results. SVM, whose performance is close to LR's, takes about 68× as long as LR to train. Logistic Regression is roughly 4× faster than the next most accurate algorithms, MLP and RF, with comparable performance. For very large datasets, time becomes a factor favouring the quicker algorithms, bearing in mind that the reported times depend on the choice of algorithm parameters.
Table 3
Evaluation of time value for each machine learning model (seconds)
Model | Training Time (s) | Evaluation Time (s) |
Random Forest | 41.289 | 31.118 |
Gradient Boosting | 668.882 | 27.287 |
Logistic Regression | 10.033 | 24.962 |
SVM | 685.782 | 23.081 |
MLP | 42.444 | 23.458 |
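The speed-up ratios quoted in the text can be recomputed directly from the training times in Table 3:

```python
# Training times in seconds, taken from Table 3.
train_time = {
    "Random Forest": 41.289,
    "Gradient Boosting": 668.882,
    "Logistic Regression": 10.033,
    "SVM": 685.782,
    "MLP": 42.444,
}

gb_vs_mlp = train_time["Gradient Boosting"] / train_time["MLP"]
svm_vs_lr = train_time["SVM"] / train_time["Logistic Regression"]
mlp_vs_lr = train_time["MLP"] / train_time["Logistic Regression"]

print(f"GB vs MLP:  {gb_vs_mlp:.1f}x")   # ~15.8x
print(f"SVM vs LR:  {svm_vs_lr:.1f}x")   # ~68.4x
print(f"MLP vs LR:  {mlp_vs_lr:.1f}x")   # ~4.2x
```

These ratios match the "around 15×", "about 68×", and "roughly 4×" figures discussed above.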