Background
Early diagnosis of Parkinson's disease (PD) is crucial for personalized medicine and improved patient outcomes. Traditional methods often lack transparency, raising concerns about reliability. This study proposes interpretable Machine Learning (ML) models that leverage Explainable Artificial Intelligence (XAI) techniques. Vocal biomarkers from a PD dataset are used to train these models for early PD prediction. The approach aims to empower healthcare professionals by explaining the "why" behind model predictions, fostering trust, and identifying potential voice biomarkers for PD.
Methods
We analyzed vocal features extracted from the PD dataset and created visualizations to uncover distribution patterns and relationships. We experimented with ensemble ML algorithms (Random Forest, Gradient Boosting, and AdaBoost) as well as established methods such as Logistic Regression, Support Vector Machines (SVM), and K-Nearest Neighbors (KNN). We also included a Multi-Layer Perceptron (MLP) for non-linear modeling. The XAI techniques SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) were used to understand model predictions and build trust in their application.
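As a rough illustration of how one of the listed classifiers can be scored under k-fold cross-validation, the following minimal sketch uses only the Python standard library. It is not the study's pipeline: the data are synthetic stand-ins for two hypothetical vocal features, and the helper names (`k_fold_indices`, `knn_predict`, `cross_validate`) and the choice of k = 3 neighbors are assumptions made for the example.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Shuffle sample indices and split them into n_folds nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k nearest training samples."""
    by_distance = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], query)),
    )
    votes = Counter(train_y[i] for i in by_distance[:k])
    return votes.most_common(1)[0][0]


def cross_validate(X, y, n_folds=10, k=3, seed=0):
    """Return the per-fold accuracy of KNN under n_folds cross-validation."""
    scores = []
    for held_out in k_fold_indices(len(X), n_folds, seed):
        held_out_set = set(held_out)
        # Train on every sample that is not in the held-out fold.
        train_idx = [i for i in range(len(X)) if i not in held_out_set]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in held_out
        )
        scores.append(correct / len(held_out))
    return scores


# Synthetic stand-in data (NOT the study's dataset): two hypothetical
# features per sample, with label 0 = healthy control and 1 = PD.
rng = random.Random(42)
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(50)]
X += [[rng.gauss(4.0, 1.0), rng.gauss(4.0, 1.0)] for _ in range(50)]
y = [0] * 50 + [1] * 50

fold_scores = cross_validate(X, y)
mean_accuracy = sum(fold_scores) / len(fold_scores)
print(f"mean 10-fold accuracy: {mean_accuracy:.3f}")
```

In practice a library implementation with stratified folds and feature scaling would likely be preferable; the sketch only shows that each sample is held out exactly once and that the reported figure is the mean over folds.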
Results
Models were evaluated with ten-fold cross-validation, with accuracy ranging from 0.95 to 1.0. AdaBoost emerged as the most efficient algorithm (accuracy: 1.0, training time: 0.0036 seconds, prediction time: 0.0016 seconds), outperforming the others. SVM (accuracy: 0.82) and KNN (accuracy: 0.85) showed lower accuracy and limitations in PD classification. MLP achieved good accuracy (around 0.87) but a lower AUC-ROC score. Notably, Random Forest demonstrated superiority on the test dataset. SHAP and LIME provided insights into model decisions, identifying specific vocal characteristics indicative of PD. Exploratory data analysis revealed significant differences in vocal features between PD patients and healthy controls, with features such as jitter and shimmer showing strong positive correlations with PD status.
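To make the notion of a feature's correlation with PD status concrete, the sketch below computes a Pearson correlation between a continuous feature and a binary label (equivalent to the point-biserial correlation). The jitter values are invented for illustration and are not taken from the study's dataset; `pearson_correlation` is a hypothetical helper name.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented example values (NOT from the study): a jitter-like feature for
# four healthy controls (status 0) followed by four PD patients (status 1).
jitter = [0.003, 0.004, 0.002, 0.003, 0.008, 0.010, 0.007, 0.009]
status = [0, 0, 0, 0, 1, 1, 1, 1]

r = pearson_correlation(jitter, status)
print(f"correlation between jitter and PD status: {r:.3f}")
```

A coefficient close to 1 indicates that higher feature values tend to accompany the PD label; it describes association in the sample and does not by itself establish diagnostic value.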
Conclusion
This study demonstrates the effectiveness of XAI techniques for understanding model reasoning, fostering trust, and providing insights into potential voice biomarkers for PD. Comparing a diverse range of machine learning algorithms supports robust and accurate PD prediction. The findings highlight the importance of balancing model accuracy with interpretability, contributing to the development of more transparent and trustworthy diagnostic tools for clinical settings.