Multimodal Metadata Assignment for Cultural Heritage Artifacts

doi:10.21203/rs.3.rs-1708875/v1

Research Article

This work is licensed under a CC BY 4.0 License

You are reading this latest preprint version

We develop a multimodal classifier for the cultural heritage domain using a late fusion approach and introduce a novel dataset. The three modalities are image, text, and tabular data. The image classifier is based on a ResNet convolutional neural network architecture and the text classifier on a multilingual transformer architecture (XLM-RoBERTa). Both are trained as multitask classifiers and use focal loss to handle class imbalance. Tabular data and late fusion are handled by gradient tree boosting. We also show how we leveraged specific data models and a taxonomy in a knowledge graph to create the dataset and to store classification results. All individual classifiers accurately predict missing properties in the digitized silk artifacts, with the multimodal approach providing the best results.
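
As an illustration of the focal loss mentioned above, the following is a minimal sketch for a single example of a single classification task, written in plain Python. It is not the authors' implementation: the function names and the focusing parameter value `gamma=2.0` are assumptions, and the per-task summation of a multitask setup is omitted. The loss scales the usual cross-entropy term by `(1 - p_t) ** gamma`, so well-classified examples contribute less and rare, hard examples dominate the gradient.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    `gamma` is a hypothetical default; the abstract does not state the
    value used. With gamma = 0 this reduces to plain cross-entropy.
    """
    p_t = softmax(logits)[target]  # probability assigned to the true class
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def cross_entropy(logits, target):
    """Standard cross-entropy, expressed as focal loss with gamma = 0."""
    return focal_loss(logits, target, gamma=0.0)
```

For a confidently correct prediction, `focal_loss` is much smaller than `cross_entropy`, while for a misclassified example the two stay close, which is the behavior that helps with imbalanced classes.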

Cultural Heritage

Multimodal

Deep Learning

Multilingual Text Classification

Image Classification

Transformer

Convolutional Neural Networks

No competing interests reported.

Reviews received at journal
07 Nov, 2022
Reviewers agreed at journal
27 Oct, 2022
Reviewers invited by journal
27 Oct, 2022
Submission checks completed at journal
08 Sep, 2022
First submitted to journal
04 Sep, 2022

Status:

Version 1
