Multi-modality medical image classification aims to combine information from different modalities or devices to produce comprehensive and accurate diagnostic results. Existing methods overlook two characteristics of medical images acquired in different phases: the highly redundant background and the potentially low differentiation between phases. Drawing on the idea of disentangled representation learning, we introduce a dual-branch network that disentangles images into shared features and modality-specific features. Based on the properties of the two feature types, we propose a prototypical loss and a similar prototypical loss to constrain them, respectively. Our approach achieves strong performance in classification on the LLD-MMRI dataset and in fusion on the ANNLIB dataset, and extensive ablation studies validate the contribution of each component of our framework.
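To make the disentanglement idea concrete, the sketch below shows one possible reading of the framework: a dual-branch encoder that maps each phase image to shared and modality-specific features, plus a prototype-based loss that pulls shared features toward their class prototype. The backbone architecture, feature dimension, cosine-similarity formulation, and the helper names (`DualBranchEncoder`, `prototypical_loss`) are illustrative assumptions, not the paper's exact design; the "similar prototypical" term on modality-specific features would be defined analogously over per-modality prototypes.

```python
# Hedged sketch (not the authors' code): dual-branch disentanglement with a
# prototype-based loss on shared features. All layer sizes and the exact loss
# form are assumptions made only for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualBranchEncoder(nn.Module):
    """Encodes one phase image into (shared, modality-specific) feature vectors."""
    def __init__(self, in_ch=1, feat_dim=128):
        super().__init__()
        def backbone():
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim))
        self.shared_branch = backbone()    # features expected to agree across phases
        self.specific_branch = backbone()  # features expected to differ per phase

    def forward(self, x):
        return self.shared_branch(x), self.specific_branch(x)

def prototypical_loss(shared_feats, labels):
    """Pull each shared feature toward the mean (prototype) of its class."""
    classes = labels.unique()
    loss = shared_feats.new_zeros(())
    for c in classes:
        feats = shared_feats[labels == c]
        proto = feats.mean(dim=0, keepdim=True)
        loss = loss + (1 - F.cosine_similarity(feats, proto)).mean()
    return loss / len(classes)

# Usage on a toy batch of two phases of the same cases:
encoder = DualBranchEncoder()
x_phase_a, x_phase_b = torch.randn(4, 1, 64, 64), torch.randn(4, 1, 64, 64)
labels = torch.tensor([0, 1, 0, 1])
shared_a, _ = encoder(x_phase_a)
shared_b, _ = encoder(x_phase_b)
loss = prototypical_loss(torch.cat([shared_a, shared_b]), labels.repeat(2))
```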