Background: The global incidence of skin cancers, particularly melanoma, the deadliest among them, has risen in recent decades. Artificial Intelligence (AI) has yielded remarkable results across diverse domains, including healthcare. However, most AI models for skin cancer prediction focus on a single data type or modality, such as images. Emerging research highlights the potential of multimodal approaches to enhance predictive performance and interpretability in clinical tasks.
Methods: In this work, the effectiveness of multimodal fusion methods, including early and late fusion, is assessed for melanoma detection. Three public skin image datasets (ISIC, PH2, and Derm7pt) were considered, and two modalities were used: dermoscopy images and tabular data comprising metadata, image features, and image embeddings. The study employed AI-based image classification and extended feature extraction to capture the color and texture characteristics of skin lesions, aiming to identify factors associated with melanoma.
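To make the two fusion strategies named above concrete, the sketch below contrasts early fusion (concatenating image embeddings with tabular features before training a single classifier) and late fusion (averaging the probabilities of modality-specific classifiers). It is a minimal illustration only, not the study's implementation: the embedding dimension, classifier choice, and synthetic data are assumptions.

```python
# Minimal early- vs late-fusion sketch on synthetic data (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
image_embeddings = rng.normal(size=(n, 512))   # e.g. pooled CNN features per lesion image (assumed size)
tabular_features = rng.normal(size=(n, 20))    # metadata + color/texture image features (assumed size)
labels = rng.integers(0, 2, size=n)            # 1 = melanoma, 0 = benign (synthetic)

# Early fusion: concatenate modalities into one feature vector per sample.
fused = np.concatenate([image_embeddings, tabular_features], axis=1)
X_tr, X_te, y_tr, y_te = train_test_split(fused, labels, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("Early-fusion AUCROC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))

# Late fusion: train one model per modality and average their predicted probabilities.
img_clf = LogisticRegression(max_iter=1000).fit(X_tr[:, :512], y_tr)   # image-only model
tab_clf = LogisticRegression(max_iter=1000).fit(X_tr[:, 512:], y_tr)   # tabular-only model
late_prob = (img_clf.predict_proba(X_te[:, :512])[:, 1]
             + tab_clf.predict_proba(X_te[:, 512:])[:, 1]) / 2
print("Late-fusion AUCROC:", roc_auc_score(y_te, late_prob))
```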
Results: Experimental findings show that the proposed multimodal fusion methods outperform single-modality models. Early fusion achieves the best classification performance, with an overall AUCROC of 0.873, 0.963, and 0.857 for ISIC, PH2, and Derm7pt, respectively. Compared with models trained on single-modality data, the fusion approaches yield notable AUCROC improvements of 6.67% (ISIC), 7% (PH2), and 8.27% (Derm7pt). Two post-hoc explainability techniques were used to detect salient regions in dermoscopy images and to identify the image features most relevant to melanoma diagnosis.
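The abstract does not name the two post-hoc techniques; Grad-CAM-style saliency maps for images and permutation importance for tabular features are common choices for this kind of analysis, and the sketch below uses permutation importance purely as an assumed example of how feature relevance can be recovered from a trained classifier.

```python
# Hypothetical post-hoc feature-relevance sketch (not the study's actual technique).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)
X_tab = rng.normal(size=(300, 20))                       # synthetic tabular lesion features
y = (X_tab[:, 3] + 0.5 * X_tab[:, 7] > 0).astype(int)    # labels driven by features 3 and 7

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tab, y)
imp = permutation_importance(model, X_tab, y, scoring="roc_auc",
                             n_repeats=10, random_state=0)
top = np.argsort(imp.importances_mean)[::-1][:5]
print("Most relevant feature indices:", top)              # indices 3 and 7 should rank highest
```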
Conclusions: These results hold promise for early skin lesion detection, offering valuable support to dermatologists in their daily practice.