Multimodal Biomedical Data Fusion Using Sparse Canonical Correlation Analysis and Cooperative Learning: A Cohort Study on COVID-19

doi:10.21203/rs.3.rs-3569833/v1

Article

This work is licensed under a CC BY 4.0 License

Journal Publication: published 07 May, 2024 in npj Digital Medicine.

You are reading the latest preprint version.

Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we present our approach for analyzing multimodal data using unsupervised and supervised sparse linear methods in a COVID-19 patient cohort. This prospective cohort study of 149 adult patients was conducted in a tertiary care academic center. First, we used sparse canonical correlation analysis (CCA) to identify and quantify relationships across different data modalities, including viral genome sequencing, imaging, clinical data, and laboratory results. Then, we used cooperative learning to predict the clinical outcome of COVID-19 patients. We show that serum biomarkers representing severe disease and acute phase response correlate with original and wavelet radiomics features in the LLL frequency channel (corr(Xu_1, Zv_1) = 0.596, p-value < 0.001). Among radiomics features, histogram-based first-order features reporting skewness, kurtosis, and uniformity have the lowest (negative) coefficients, whereas entropy-related features have the highest (positive) coefficients. Moreover, unsupervised analysis of clinical data and laboratory results gives insights into distinct clinical phenotypes. Leveraging the availability of global viral genome databases, we demonstrate that the Word2Vec natural language processing model can be used for viral genome encoding. It not only separates major SARS-CoV-2 variants but also preserves the phylogenetic relationships among them. Our quadruple model using Word2Vec encoding achieves better prediction results in the supervised task, yielding an area under the curve (AUC) of 0.87 and an accuracy of 0.77.
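
To make the quantity corr(Xu_1, Zv_1) concrete, the following is a minimal sketch of a rank-1 sparse CCA on synthetic data. It is not the study's implementation: it assumes a simplified penalized-matrix-decomposition style update with fixed soft-thresholding levels (`delta_u`, `delta_v`), and the function names, matrix sizes, and simulated "modalities" are all hypothetical.

```python
import numpy as np

def soft_threshold(a, delta):
    """Shrink each entry toward zero by delta; entries below delta become exactly 0."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)

def l2_normalize(w):
    """Scale w to unit Euclidean norm (left unchanged if it is all zeros)."""
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w

def sparse_cca_rank1(X, Z, delta_u=0.3, delta_v=0.3, n_iter=100):
    """First pair of sparse canonical vectors (u, v) and corr(Xu, Zv).

    Assumption: fixed thresholds stand in for the L1-norm constraints
    that a full sparse CCA would tune, e.g. by cross-validation.
    """
    # Standardize every feature so X.T @ Z / n is a cross-correlation matrix.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    C = X.T @ Z / X.shape[0]
    # Initialize v with the leading right singular vector of C.
    v = np.linalg.svd(C, full_matrices=False)[2][0]
    for _ in range(n_iter):
        u = l2_normalize(soft_threshold(C @ v, delta_u))
        v = l2_normalize(soft_threshold(C.T @ u, delta_v))
    corr = np.corrcoef(X @ u, Z @ v)[0, 1]
    return u, v, corr

# Synthetic stand-ins for two modalities that share one latent factor.
rng = np.random.default_rng(0)
n = 149                                   # cohort size, used only to size the toy data
latent = rng.normal(size=n)
X = rng.normal(size=(n, 30))              # e.g. hypothetical laboratory features
Z = rng.normal(size=(n, 20))              # e.g. hypothetical radiomics features
X[:, :3] = latent[:, None] + 0.5 * rng.normal(size=(n, 3))
Z[:, :2] = latent[:, None] + 0.5 * rng.normal(size=(n, 2))

u, v, corr = sparse_cca_rank1(X, Z)
print("corr(Xu_1, Zv_1):", round(float(corr), 3))
print("nonzero entries in u:", np.flatnonzero(u))
print("nonzero entries in v:", np.flatnonzero(v))
```

The nonzero entries of `u` and `v` play the role of the selected features in each modality, and their signs correspond to the positive and negative coefficients discussed above.
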
Our study illustrates that sparse CCA and cooperative learning are powerful techniques for handling high-dimensional, multimodal data to investigate multivariate associations in unsupervised and supervised tasks.
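
The supervised step can be illustrated in a similarly hedged way. The sketch below assumes the two-view, squared-error form of cooperative learning, in which the fit term ||y - Xθx - Zθz||² is combined with an agreement penalty ρ||Xθx - Zθz||² and an L1 penalty, and solves it as a lasso on augmented data with a plain proximal-gradient (ISTA) loop. The study itself uses four modalities and a clinical outcome; the data, `rho`, `lam`, and all function names here are hypothetical.

```python
import numpy as np

def soft_threshold(a, delta):
    """Shrink each entry toward zero by delta; entries below delta become exactly 0."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)

def lasso_ista(A, b, lam, n_iter=2000):
    """Minimize 0.5 * ||A w - b||^2 + lam * ||w||_1 by proximal gradient descent."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1 / Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        w = soft_threshold(w - step * (A.T @ (A @ w - b)), step * lam)
    return w

def cooperative_fit(X, Z, y, rho=0.5, lam=30.0):
    """Two-view cooperative regression solved as one lasso on augmented data.

    Stacking [X, Z] on top of [-sqrt(rho) X, sqrt(rho) Z] and padding y
    with zeros reproduces the fit term plus the agreement penalty.
    """
    s = np.sqrt(rho)
    A = np.vstack([np.hstack([X, Z]), np.hstack([-s * X, s * Z])])
    b = np.concatenate([y, np.zeros(len(y))])
    theta = lasso_ista(A, b, lam)
    return theta[: X.shape[1]], theta[X.shape[1]:]

# Synthetic two-view data: one informative feature per view, the rest noise.
rng = np.random.default_rng(1)
n = 149                                   # cohort size, used only to size the toy data
latent = rng.normal(size=n)
X = rng.normal(size=(n, 10))
Z = rng.normal(size=(n, 8))
X[:, 0] = latent + 0.5 * rng.normal(size=n)
Z[:, 0] = latent + 0.5 * rng.normal(size=n)
y = 2.0 * latent + 0.3 * rng.normal(size=n)
y = y - y.mean()                          # centered, so no intercept is fitted

theta_x, theta_z = cooperative_fit(X, Z, y)
pred = X @ theta_x + Z @ theta_z
pred_corr = np.corrcoef(pred, y)[0, 1]
print("coefficients, view X:", np.round(theta_x, 2))
print("coefficients, view Z:", np.round(theta_z, 2))
print("corr(prediction, y):", round(float(pred_corr), 3))
```

Setting `rho=0` reduces this to an ordinary lasso on the concatenated features (early fusion), while larger values push the per-view predictions `X @ theta_x` and `Z @ theta_z` to agree with each other.
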

Health sciences/Medical research

Health sciences/Diseases

There is a conflict of interest: Ahmet Gorkem Er acknowledges financial support for this project from the Fulbright Foreign Student Program, sponsored by the U.S. Department of State and the Turkish Fulbright Commission.

Editorial decision: revise (12 Dec, 2023)
Review #2 received at journal (04 Dec, 2023)
Review #3 received at journal (27 Nov, 2023)
Review #1 received at journal (26 Nov, 2023)
Reviewer #3 agreed at journal (22 Nov, 2023)
Reviewer #2 agreed at journal (20 Nov, 2023)
Reviewer #1 agreed at journal (17 Nov, 2023)
Reviewers invited by journal (17 Nov, 2023)
Editor assigned by journal (07 Nov, 2023)
Submission checks completed at journal (07 Nov, 2023)
First submitted to journal (06 Nov, 2023)
