Background: The amount of available biological data has exploded since the emergence of high-throughput technologies, which is not only revolutionizing the way we recognize molecules and diseases but also bringing novel analytical challenges to bioinformatics. In the last decade, deep learning has become a dominant technique in data science. However, classification accuracy is plagued by domain discrepancy. Notably, in the presence of multiple batches, domain discrepancy typically arises between individual batches. The recently proposed pair-wise adaptation approach may be suboptimal, as it fails both to eliminate the external factors across multiple batches and to take the classification task into account simultaneously.
Results: We propose a joint deep learning framework that integrates batch effect removal and classification for various omics data. We validate it on two private metabolomics (MALDI MS) datasets and one public transcriptomics (scRNA-seq) dataset. On the former in particular, it achieves the highest diagnostic accuracy (ACC), with a notable ~10% improvement over state-of-the-art methods. Overall, these results indicate that our approach removes batch effects more effectively than conventional methods and yields more accurate classification results for smart diagnosis.