Discovering cancer subtypes through unsupervised clustering helps provide precise diagnoses, guide treatment, and improve patient prognoses. Compared with single-omics data, multi-omics data can improve clustering performance because they offer a more comprehensive view of biological systems and mechanisms. However, heterogeneous data from multiple sources introduce high complexity and different kinds of noise, which hinder the extraction of clustering information.
We propose an end-to-end deep learning-based method, Multi-omics Clustering Variational Autoencoders (MCluster-VAEs), that extracts cluster-friendly representations from multi-omics data. First, a unified network architecture with an attention mechanism is developed to model multi-omics data precisely. Then, using a novel objective function derived with the variational Bayes technique, the model is trained to effectively estimate the posterior of the clustering assignments.
Compared with twelve other state-of-the-art multi-omics clustering methods, MCluster-VAEs achieved outstanding performance on benchmark datasets from the TCGA database. On the Pan Cancer dataset, MCluster-VAEs achieved an adjusted Rand index of around 0.78 for cancer category recognition, an increase of more than 18% over the other methods. Furthermore, survival analysis and clinical parameter enrichment tests on ten cancer datasets demonstrate that MCluster-VAEs delivered results comparable to, or even better than, those of many typical integrative methods.
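For readers unfamiliar with the adjusted Rand index, the following is a minimal sketch of the standard pair-counting formula in plain Python. It is illustrative only and is not the evaluation code used in this work; the function name `adjusted_rand_index` and the toy label lists are assumptions made for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat partitions (hypothetical helper)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        # With fewer than two samples there are no pairs to compare.
        return 1.0

    # Contingency counts: samples sharing a (true label, predicted label) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of sample pairs grouped together in both, or in each, partition.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2

    if max_index == expected:
        # Degenerate case, e.g. both partitions place all samples in one cluster.
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)


# Toy example: the same grouping under different label names.
score = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
```

The index is invariant to how clusters are named, so the toy example above scores 1.0, while chance-level agreement scores close to 0.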
These results demonstrate that MCluster-VAEs is a powerful new tool for dissecting complex multi-omics relationships and providing new insights for cancer subtype discovery.