Convolutional neural networks have been widely used for medical disease detection. However, most existing work relies only on the feature maps of the final convolutional layer, and the benefits of earlier layers are not considered. Extracting effective features from a limited medical image dataset is key to improving disease classification. In this paper, we propose a novel deep learning-based multilayer multimodal fusion model that extracts features from different layers of the model and fuses them, so that our disease detection model draws on discriminative information from each layer. Furthermore, to fuse the different-sized feature maps of these layers, we propose a novel feature map transform module known as the fusion of different-size feature maps (FDSFM). The proposed model achieves significantly higher accuracies of 97.21% and 99.60% for three-class and two-class classification, respectively. The proposed model can be extended to other disease classifications from chest X-ray images.
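To make the multilayer fusion idea concrete, the following is a minimal NumPy sketch of one plausible way to combine feature maps of different spatial sizes: pool each layer's map to the smallest spatial resolution, concatenate along the channel axis, and reduce to a feature vector. The function names (`adaptive_avg_pool`, `fuse_feature_maps`), the choice of average pooling, and the example map shapes are illustrative assumptions, not the paper's actual FDSFM module.

```python
import numpy as np

def adaptive_avg_pool(fmap, out_h, out_w):
    """Average-pool a (C, H, W) feature map to (C, out_h, out_w).

    Hypothetical helper: each output cell averages over its
    corresponding input region, as in adaptive average pooling.
    """
    c, h, w = fmap.shape
    out = np.empty((c, out_h, out_w), dtype=fmap.dtype)
    for i in range(out_h):
        h0, h1 = (i * h) // out_h, -(-((i + 1) * h) // out_h)
        for j in range(out_w):
            w0, w1 = (j * w) // out_w, -(-((j + 1) * w) // out_w)
            out[:, i, j] = fmap[:, h0:h1, w0:w1].mean(axis=(1, 2))
    return out

def fuse_feature_maps(maps):
    """Fuse differently sized (C, H, W) maps from several layers:
    pool all maps to the smallest spatial size, concatenate along
    channels, then global-average-pool to one feature vector."""
    th = min(m.shape[1] for m in maps)
    tw = min(m.shape[2] for m in maps)
    pooled = [adaptive_avg_pool(m, th, tw) for m in maps]
    fused = np.concatenate(pooled, axis=0)  # (C1 + C2 + ..., th, tw)
    return fused.mean(axis=(1, 2))          # per-channel feature vector

# Example: three layers with shrinking spatial size and growing channels,
# as is typical in a CNN backbone (shapes chosen for illustration only).
rng = np.random.default_rng(0)
maps = [rng.standard_normal((c, s, s)) for c, s in [(8, 56), (16, 28), (32, 14)]]
vec = fuse_feature_maps(maps)
print(vec.shape)  # (56,) = 8 + 16 + 32 fused channels
```

A classifier head (e.g. a small fully connected layer) could then operate on `vec`, so the decision uses information from every layer rather than only the last one.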