In the present study, we evaluated the ability of ML models built with five algorithms to discriminate between PV disease and non-PV disease. The five algorithms were the commonly used RF, AB, GB, SVM, and LR methods, and the results suggest their potential to support the diagnosis of vestibular disease. Furthermore, our approach of combining all five ML classifier models was expected to improve on the prediction performance of each individual model.
All five models produced relatively good results after the algorithms were tuned and the best parameters were chosen using GridSearchCV. Among the five models, the results of SVM in Table 4 appeared to be superior to those of the other models. Varpa et al. applied one-vs-one and one-vs-all classifiers with the k-nearest neighbor method and SVM to the classification of vertigo data and reported that using multiple binary classifiers improved the classification accuracy for disease classes compared with a single multi-class classifier49. Masankaran et al.50 used four classifier models (RFC, SVC, k-nearest neighbor, and Naïve Bayes) with the Dizziness Handicap Inventory questionnaire to distinguish benign paroxysmal positioning vertigo types, with a best accuracy of 73.91%. Priesol et al.11 applied five classifier models (DT, RF, LR, AB, and SVM) and reported an overall accuracy of 76%. Compared with these reports, our best classifier achieved a higher accuracy of 79%. To further improve performance, we devised a method that combines the predictions of all five models (Fig. 4). When the predictions of the five models all matched on PV, the correct answer rate was 83%, and when they all matched on non-PV, the correct answer rate was 85%. Both rates were higher than the accuracy of SVM alone. However, when the models disagreed, so that PV and non-PV predictions were presented simultaneously, the accuracy of SVM was superior. Therefore, the combination of SVM with our new ML approach has the potential to diagnose PV disease and distinguish it from non-PV disease.
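The combination rule described above can be illustrated with a minimal Python sketch. It reflects one possible reading of the text, not the exact procedure behind Fig. 4: when all five models agree, the shared label is returned, and when they disagree, the SVM prediction is used as a fallback. The function name `combine_predictions`, the dictionary layout, and the example predictions are assumptions introduced only for illustration.

```python
def combine_predictions(preds, fallback="SVM"):
    """Combine per-model labels for one patient.

    preds: dict mapping a model name (e.g. "RF", "SVM") to its predicted
           label, "PV" or "non-PV". Layout is a hypothetical choice.
    Returns (label, rule), where rule records which branch was used.
    """
    labels = set(preds.values())
    if len(labels) == 1:
        # All models agree: report the shared label.
        return labels.pop(), "unanimous"
    # Models disagree: fall back to a single model (SVM is assumed here).
    return preds[fallback], "fallback"


# Illustrative, made-up predictions for two patients.
agree = {"RF": "PV", "AB": "PV", "GB": "PV", "SVM": "PV", "LR": "PV"}
mixed = {"RF": "PV", "AB": "non-PV", "GB": "PV", "SVM": "non-PV", "LR": "PV"}

print(combine_predictions(agree))
print(combine_predictions(mixed))
```

Returning the rule alongside the label makes it possible to report accuracy separately for unanimous and non-unanimous cases, as is done in the text.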
For otolaryngologists, it is important to reliably detect PV disease in patients with chaotic symptoms of vertigo/dizziness. However, the non-PV group included various diseases of cerebral etiology, such as brain tumor, brain infarction, spinocerebellar degeneration, vertebrobasilar insufficiency, and others, for which a delayed diagnosis might lead to life-threatening consequences. Thus, ML should have a high predictive ability not only for PV diseases but also for non-PV diseases. This balance of predictive performance can be evaluated using precision, recall, and the F1-score. The F1-score is the harmonic mean of precision and recall and therefore summarizes both in a single value. As shown in Table 4, the average precision of the five models was better for non-PV than for PV. However, the average F1-scores of the five models were 0.77 for PV and 0.78 for non-PV, which indicates that our models perform comparably well for both groups. Furthermore, the F1-scores of SVM were the best, at 0.78 for PV and 0.79 for non-PV. Thus, SVM appears to be a useful classifier for discriminating both disease groups.
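As a brief illustration of how these per-class metrics are obtained, the following Python sketch computes precision, recall, and the F1-score for one class treated as the positive label. The helper name `precision_recall_f1` and the label lists are hypothetical and are not taken from the study data.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1-score for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels for six patients, for illustration only.
y_true = ["PV", "PV", "PV", "non-PV", "non-PV", "non-PV"]
y_pred = ["PV", "PV", "non-PV", "non-PV", "non-PV", "PV"]

for cls in ("PV", "non-PV"):
    print(cls, precision_recall_f1(y_true, y_pred, positive=cls))
```

Evaluating each class in turn as the positive label mirrors the way the results are reported separately for PV and non-PV.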
Our dataset was established from the clinical data of patients who were diagnosed using 16 different types of equilibrium function tests. In contrast, previous studies usually used the most commonly performed vestibular tests, such as the caloric test and the vestibulo-ocular reflex derived from the rotation test11, or used head impulse, gaze-evoked nystagmus, or test of skew to differentiate vestibular stroke from peripheral acute vestibular syndrome9. In Fig. 3, features related to the caloric test were the most important, but the optokinetic nystagmus test, eye tracking test, Schellong test, pendular sinusoidal rotation test, and stabilometry also ranked in the top 10. Thus, combining multiple kinds of equilibrium examinations might help to increase the variety of features and improve the quality of the training dataset for ML. However, not all features in our dataset are equally important. Determining which features yield the most predictive power is another crucial step in the model-building process.
This study has some important limitations, including the characteristics and size of the dataset and the optimization of the models. In this study, ML was used to classify PV disease and non-PV disease, each of which includes a wide range of diseases. Further studies using synthetic models to classify PV disease against a particular disease are needed to improve the diagnostic ability of ML. In addition, the number of study subjects was relatively modest, and other ML algorithms using advanced analytics techniques will be necessary to enhance the results. Furthermore, an extensive testing battery such as the one presented here is not suited to clinical decision making for acutely dizzy patients in an emergency setting or in an outpatient center without examination equipment. Finally, even though ML can assist in making good predictions, it does not completely replace the physician. Especially for diseases that require patient-physician interaction and critical thinking, the physician needs to make the final diagnosis.