Title: EUFCC-340K: A Faceted Hierarchical Dataset for Metadata Annotation in GLAM Collections

Jun 04, 2024

Francesc Net, Marc Folia, Pep Casals, Andrew D. Bagdanov, Lluis Gomez

Abstract: In this paper, we address the challenges of automatic metadata annotation in the domain of Galleries, Libraries, Archives, and Museums (GLAMs) by introducing a novel dataset, EUFCC340K, collected from the Europeana portal. Comprising over 340,000 images, the EUFCC340K dataset is organized across multiple facets (Materials, Object Types, Disciplines, and Subjects), following a hierarchical structure based on the Art & Architecture Thesaurus (AAT). We developed several baseline models, incorporating multiple heads on a ConvNeXT backbone for multi-label image tagging on these facets, and fine-tuning a CLIP model with our image-text pairs. Our experiments, which evaluate model robustness and generalization capabilities in two different test scenarios, demonstrate the utility of the dataset in improving multi-label classification tools that have the potential to alleviate cataloging tasks in the cultural heritage sector.

* 23 pages, 13 figures
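
To make the faceted, hierarchical labelling described in the abstract more concrete, here is a minimal sketch of how per-facet multi-label targets might be built. Everything in it is an assumption made for illustration: the toy hierarchy, the label names, and the helpers `with_ancestors` and `multi_hot` are hypothetical and are not taken from the dataset or the authors' code. It also assumes that tagging a node implies tagging its ancestors, which the abstract does not state.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical toy hierarchy per facet, stored as child -> parent
# (None marks a root). Placeholder labels, not real dataset entries.
PARENTS: Dict[str, Dict[str, Optional[str]]] = {
    "materials": {
        "material": None,
        "metal": "material",
        "bronze": "metal",
        "wood": "material",
    },
    "object_types": {
        "object": None,
        "vessel": "object",
        "vase": "vessel",
    },
}


def with_ancestors(facet: str, labels: Iterable[str]) -> Set[str]:
    """Return the given labels plus all of their ancestors in the facet tree."""
    parents = PARENTS[facet]
    expanded: Set[str] = set()
    for label in labels:
        node: Optional[str] = label
        # Walk up the tree until we step past the root.
        while node is not None:
            expanded.add(node)
            node = parents[node]
    return expanded


def multi_hot(facet: str, labels: Iterable[str]) -> List[int]:
    """Encode labels (with ancestors) as a 0/1 vector over the sorted facet vocabulary."""
    vocab = sorted(PARENTS[facet])
    active = with_ancestors(facet, labels)
    return [1 if name in active else 0 for name in vocab]


if __name__ == "__main__":
    # One target vector per facet, as a multi-head classifier would consume them.
    print(multi_hot("materials", ["bronze"]))
    print(multi_hot("object_types", ["vase"]))
```

In a multi-head setup like the one the abstract mentions, each facet would get its own target vector of this kind, with one classification head per facet sharing a common image backbone.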
