Background Machine learning models have shown great potential in preventive medicine but require large datasets, which are difficult to assemble under the strict privacy regulations of the healthcare sector. Federated learning enables collaboration between institutions while preserving data privacy. Current research focuses mainly on federated learning methods for artificial neural networks. In this study, we aimed to contribute federated learning methods for random forests, with the use case of predicting delirium in hospitalised patients using data from multiple hospitals.
Methods We collected data from 11 hospitals, comprising 29,479 patients and 627 features. We trained a random forest model on each hospital’s data and a general model on all hospitals’ data combined. We developed federated learning models by averaging the predictions of the individual hospital models, with different weighting schemes based on the number of samples, positive cases, minority cases and maximum possible diversity. We evaluated the models using the area under the receiver operating characteristic curve (AUROC) as the performance measure.
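The prediction-averaging step described above can be illustrated with a minimal sketch. It assumes each hospital model has already produced a predicted probability for every patient, and that a scheme's weights are simply per-hospital counts (for example, the number of training samples or of positive cases). The function name `federated_average` and its arguments are hypothetical and are not taken from the study's code. The maximum possible diversity scheme is not sketched because its definition is not given here.

```python
from __future__ import annotations

from typing import Sequence


def federated_average(
    hospital_probs: Sequence[Sequence[float]],
    weights: Sequence[float] | None = None,
) -> list[float]:
    """Combine per-hospital predicted probabilities into one value per patient.

    hospital_probs[h][i] is the probability predicted by hospital model h
    for patient i. weights[h] is the weight given to model h (assumed here
    to be a count such as training samples or positive cases). Passing
    weights=None gives unweighted averaging.
    """
    n_models = len(hospital_probs)
    if n_models == 0:
        raise ValueError("need at least one hospital model")
    if weights is None:
        weights = [1.0] * n_models  # unweighted: every model counts equally
    if len(weights) != n_models:
        raise ValueError("one weight per hospital model is required")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_patients = len(hospital_probs[0])
    if any(len(probs) != n_patients for probs in hospital_probs):
        raise ValueError("every model must predict for the same patients")

    # Weighted mean of the model outputs, computed patient by patient.
    return [
        sum(w * probs[i] for w, probs in zip(weights, hospital_probs)) / total
        for i in range(n_patients)
    ]


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    probs = [[0.2, 0.8], [0.4, 0.6], [0.9, 0.1]]  # 3 models, 2 patients
    n_positive = [100, 150, 50]                   # hypothetical positive-case counts
    print(federated_average(probs))               # unweighted averaging
    print(federated_average(probs, n_positive))   # weighted by positive cases
```

Only model outputs and per-hospital counts are combined in this sketch, so no patient-level records need to leave a hospital. The averaged probabilities can then be scored with AUROC in the same way as the output of any single model.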
Results The general model outperformed all the other models with an AUROC of 0.854 [0.849-0.860]. Models trained on data from single hospitals varied in performance, with AUROCs from 0.626 to 0.828. Models from hospitals with large datasets performed better than those from small hospitals. Federated learning models performed better than individual models. Unweighted averaging performed worst with an AUROC of 0.793 [0.782-0.805]. Among the weighted averaging schemes, weighting by the number of positive cases performed best with an AUROC of 0.843 [0.838-0.846], followed by minority class (AUROC=0.840 [0.836-0.845]), maximum possible diversity (AUROC=0.836 [0.830-0.841]) and number of samples (AUROC=0.830 [0.819-0.841]).
Conclusions The results suggest that federated learning models can perform better than hospital-specific models in some cases, especially for hospitals with limited data. When datasets differ in size, we suggest weighted averaging based on the number of samples. If the datasets are class-imbalanced, maximum possible diversity should also be considered. Additionally, federated learning models are more consistent and stable in performance than hospital-specific models.