A Multimodal Large Language Modelling Deep Learning Framework for the Future Pandemic

doi:10.21203/rs.3.rs-2777372/v2

Article

https://doi.org/10.21203/rs.3.rs-2777372/v2

This work is licensed under a CC BY 4.0 License

Version 2

Deep neural networks have been integrated into the entire clinical decision procedure, where they can improve diagnostic efficiency and alleviate the heavy workload of physicians. Typical applications include 1) medical report generation, 2) disease classification, and 3) survival prediction. Since most neural networks are supervised, their quality depends heavily on the volume and quality of available labels. However, for rare diseases, e.g., new pandemics, few labels exist. In addition, collecting sufficient labels for training is time-consuming, and such labels are typically unavailable at the early stage. In this paper, we propose a multimodal large language model, the Unsupervised Learning from Unlabelled Medical Images and Text (ULUMIT) framework, for radiograph representation learning, which can learn broad medical knowledge (e.g., image understanding, text semantics, and clinical phenotypes) from unlabelled data. As a result, when a rare disease is encountered, our framework can be rapidly deployed and easily adapted to it with limited labels. Furthermore, ULUMIT supports medical data across the visual modality (e.g., chest X-ray and CT) and the textual modality (e.g., medical reports and free-text clinical notes); therefore, it can be used for clinical tasks that involve both visual and textual medical data. We demonstrate the effectiveness of ULUMIT by showing how it would perform using the COVID-19 pandemic "in replay". In particular, in the retrospective setting, we test the model on early COVID-19 datasets, and in the prospective setting, we test the model on the new COVID-19 variant, Omicron. The experiments cover 1) three kinds of input medical data: image-only, text-only, and image-text; 2) three kinds of downstream tasks: medical reporting, diagnosis, and prognosis; 3) five public COVID-19 datasets; and 4) three languages: English, Chinese, and Spanish. All experiments consistently show that our framework can perform COVID-19 decision-support tasks accurately and robustly with little labelled data. Besides COVID-19, our framework can be applied to identify 14 common thorax diseases and tuberculosis across five additional public datasets, demonstrating robust generalization and transferability. In brief, our framework achieves state-of-the-art performance on ten datasets.

Physical sciences/Mathematics and computing/Computer science

Health sciences/Health care/Medical imaging/Radiography

