Abstract:Archaeological catalogs, containing key elements such as artifact images, morphological descriptions, and excavation information, are essential for studying artifact evolution and cultural inheritance. These data are widely scattered across publications, requiring automated collection methods. However, existing Large Vision-Language Models (VLMs) and their derivative data collection methods face challenges in accurate image detection and modal matching when processing archaeological catalogs, making automated collection difficult. To address these issues, we propose a novel archaeological catalog collection method based on Large Vision-Language Models that follows an approach comprising three modules: document localization, block comprehension and block matching. Through practical data collection from the Dabagou and Miaozigou pottery catalogs and comparison experiments, we demonstrate the effectiveness of our approach, providing a reliable solution for automated collection of archaeological catalogs.
Abstract:In real-world applications across specialized domains, addressing complex out-of-distribution (OOD) challenges is a common and significant concern. In this study, we concentrate on the task of fine-grained bronze ware dating, a critical aspect in the study of ancient Chinese history, and developed a benchmark dataset named ShiftedBronzes. By extensively expanding the bronze Ding dataset, ShiftedBronzes incorporates two types of bronze ware data and seven types of OOD data, which exhibit distribution shifts commonly encountered in bronze ware dating scenarios. We conduct benchmarking experiments on ShiftedBronzes and five commonly used general OOD datasets, employing a variety of widely adopted post-hoc, pre-trained Vision Large Model (VLM)-based and generation-based OOD detection methods. Through analysis of the experimental results, we validate previous conclusions regarding post-hoc, VLM-based, and generation-based methods, while also highlighting their distinct behaviors on specialized datasets. These findings underscore the unique challenges of applying general OOD detection methods to domain-specific tasks such as bronze ware dating. We hope that the ShiftedBronzes benchmark provides valuable insights into both the field of bronze ware dating and the and the development of OOD detection methods. The dataset and associated code will be available later.
Abstract:In this paper, we propose a learning-based image fragment pair-searching and -matching approach to solve the challenging restoration problem. Existing works use rule-based methods to match similar contour shapes or textures, which are always difficult to tune hyperparameters for extensive data and computationally time-consuming. Therefore, we propose a neural network that can effectively utilize neighbor textures with contour shape information to fundamentally improve performance. First, we employ a graph-based network to extract the local contour and texture features of fragments. Then, for the pair-searching task, we adopt a linear transformer-based module to integrate these local features and use contrastive loss to encode the global features of each fragment. For the pair-matching task, we design a weighted fusion module to dynamically fuse extracted local contour and texture features, and formulate a similarity matrix for each pair of fragments to calculate the matching score and infer the adjacent segment of contours. To faithfully evaluate our proposed network, we created a new image fragment dataset through an algorithm we designed that tears complete images into irregular fragments. The experimental results show that our proposed network achieves excellent pair-searching accuracy, reduces matching errors, and significantly reduces computational time. Details, sourcecode, and data are available in our supplementary material.