Recent advancements in Artificial Intelligence (AI), notably the development of Large Language Models (LLMs) and text-to-image diffusion models, have facilitated the creation of realistic textual content and images. Specifically, platforms like ChatGPT and Midjourney have simplified the creation of high-quality text and visuals with minimal expertise and cost. The increasing sophistication of Generative AI challenges the integrity of news, media, and information quality, making it increasingly difficult to distinguish between real and artificially generated textual and visual content. Our work addresses this problem in two ways. First, using ChatGPT and Midjourney, we create a comprehensive novel multimodal news corpus named SyN24News, based on the N24News corpus, on which we evaluate our model. Second, we develop a novel explainable synthetic news detector for discriminating between real and synthetic news articles. We leverage a Neural Additive Model (NAM)-like network structure that ensures effect separation by handling input data in separate subnetworks. Complex structures and patterns in unstructured data, i.e., images and text, are captured as deep features by fine-tuned VGG and DistilBERT subnetworks. We ensure further explainability by individually processing carefully chosen handcrafted text and image features in simple Multilayer Perceptrons (MLPs), allowing for graphical interpretation of the corresponding structured effects. Our findings indicate that textual information is the main driver of the decision process. Structured textual effects, particularly Flesch-Kincaid reading ease and sentiment, have a much higher influence on the classification outcome than visual features such as dissimilarity and homogeneity.
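To make the effect-separation idea concrete, the following is a minimal PyTorch sketch of a NAM-like detector in the spirit of the architecture described above. It is not the authors' implementation: all class names, layer sizes, and the sigmoid fusion are illustrative assumptions, and the text and image encoders stand in for the fine-tuned DistilBERT and VGG subnetworks.

```python
# Minimal sketch (assumptions: PyTorch, illustrative layer sizes) of a
# NAM-like detector. Each modality / handcrafted feature is processed by
# its own subnetwork, and the per-subnetwork logits are summed additively,
# so each feature's contribution can be inspected in isolation.
import torch
import torch.nn as nn


class FeatureMLP(nn.Module):
    """Small MLP for one handcrafted scalar feature (e.g., reading ease)."""

    def __init__(self, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, 1)
        return self.net(x)


class NAMLikeDetector(nn.Module):
    def __init__(self, text_encoder: nn.Module, image_encoder: nn.Module,
                 n_handcrafted: int):
        super().__init__()
        # Stand-ins for fine-tuned DistilBERT / VGG, each with a linear
        # head producing a single logit of shape (batch, 1).
        self.text_encoder = text_encoder
        self.image_encoder = image_encoder
        # One independent MLP per handcrafted feature (effect separation).
        self.feature_nets = nn.ModuleList(
            FeatureMLP() for _ in range(n_handcrafted)
        )

    def forward(self, text_inputs, image, handcrafted):
        # handcrafted: (batch, n_handcrafted); each column gets its own MLP.
        contributions = [self.text_encoder(text_inputs),
                         self.image_encoder(image)]
        contributions += [
            net(handcrafted[:, j:j + 1])
            for j, net in enumerate(self.feature_nets)
        ]
        # Additive fusion: summing separate logits keeps each subnetwork's
        # effect individually interpretable (and plottable per feature).
        logit = torch.stack(contributions, dim=0).sum(dim=0)
        return torch.sigmoid(logit)  # probability the article is synthetic
```

Because the final logit is a plain sum, plotting each `FeatureMLP`'s output against its input yields the graphical interpretation of structured effects mentioned above; the deep subnetworks contribute a single scalar each, which is what allows the comparison between textual and visual drivers of the classification.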