Jamie Mahowald

Retrieval-Augmented Search for Large-Scale Map Collections with ColPali

Oct 29, 2025

Jamie Mahowald, Benjamin Charles Germain Lee

Abstract: Multimodal approaches have shown great promise for searching and navigating digital collections held by libraries, archives, and museums. In this paper, we introduce map-RAS: a retrieval-augmented search system for historic maps. In addition to introducing our framework, we detail our publicly hosted demo for searching 101,233 map images held by the Library of Congress. With our system, users can multimodally query the map collection via ColPali, summarize search results using Llama 3.2, and upload their own collections to perform inter-collection search. We articulate potential use cases for archivists, curators, and end-users, as well as future work with our system in both machine learning and the digital humanities. Our demo can be viewed at: http://www.mapras.com.

* 5 pages, 5 figures
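
The abstract above does not describe how map-RAS scores a query against a map. ColPali-style retrievers are generally described as using late interaction: each query token and each image patch gets its own embedding, and a document's score is the sum, over query tokens, of the best-matching patch similarity. The following is only a minimal sketch of that scoring rule on toy vectors. The function names (`maxsim_score`, `rank`) and the data are hypothetical and are not taken from the paper or its code.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs, patch_vecs):
    """Late-interaction score: for each query vector, take the highest
    similarity to any patch vector, then sum those maxima."""
    return sum(max(dot(q, p) for p in patch_vecs) for q in query_vecs)


def rank(query_vecs, collection):
    """Return document ids sorted from best to worst score.
    `collection` maps a document id to its list of patch vectors."""
    return sorted(
        collection,
        key=lambda doc_id: maxsim_score(query_vecs, collection[doc_id]),
        reverse=True,
    )


# Toy data (assumed for illustration): two query-token vectors, two "maps".
query = [[1.0, 0.0], [0.0, 1.0]]
maps = {
    "map_1": [[1.0, 0.0], [0.0, 1.0]],  # has a good patch for both tokens
    "map_2": [[1.0, 0.0], [1.0, 0.0]],  # matches only the first token
}
ranking = rank(query, maps)
```

Here `map_1` outranks `map_2` because every query vector finds a matching patch in it, whereas `map_2` matches only one of the two.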

Integrating Visual and Textual Inputs for Searching Large-Scale Map Collections with CLIP

Oct 02, 2024

Jamie Mahowald, Benjamin Charles Germain Lee

Abstract: Despite the prevalence and historical importance of maps in digital collections, current methods of navigating and exploring map collections are largely restricted to catalog records and structured metadata. In this paper, we explore the potential for interactively searching large-scale map collections using natural language inputs ("maps with sea monsters"), visual inputs (i.e., reverse image search), and multimodal inputs (an example map + "more grayscale"). As a case study, we adopt 562,842 images of maps publicly accessible via the Library of Congress's API. To accomplish this, we use the multimodal Contrastive Language-Image Pre-training (CLIP) machine learning model to generate embeddings for these maps, and we develop code to implement exploratory search capabilities with these input strategies. We present results for example searches created in consultation with staff in the Library of Congress's Geography and Map Division and describe the strengths, weaknesses, and possibilities for these search queries. Moreover, we introduce a fine-tuning dataset of 10,504 map-caption pairs, along with an architecture for fine-tuning a CLIP model on this dataset. To facilitate re-use, we provide all of our code in documented, interactive Jupyter notebooks and place all code into the public domain. Lastly, we discuss the opportunities and challenges for applying these approaches across both digitized and born-digital collections held by galleries, libraries, archives, and museums.

* 18 pages, 7 figures, accepted at the Computational Humanities Research Conference (CHR 2024)
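
The three input strategies named in the abstract (text, image, and image plus text) can all be treated as nearest-neighbor search in a shared embedding space, since CLIP places text and images in the same vector space. The sketch below illustrates that idea with toy three-dimensional vectors standing in for real CLIP embeddings. The abstract does not say how the authors combine an example map with a text modifier, so the weighted sum in `combine`, the `text_weight` parameter, and all names and data here are assumptions for illustration only.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def combine(image_vec, text_vec, text_weight=0.5):
    """Assumed multimodal query: weighted sum of the unit-length image
    and text embeddings, re-normalized. Not the paper's stated method."""
    img, txt = normalize(image_vec), normalize(text_vec)
    mixed = [(1 - text_weight) * i + text_weight * t for i, t in zip(img, txt)]
    return normalize(mixed)


def search(query_vec, index, k=3):
    """Return the k (map_id, score) pairs most similar to the query."""
    scored = [(cosine(query_vec, vec), map_id) for map_id, vec in index.items()]
    scored.sort(reverse=True)
    return [(map_id, score) for score, map_id in scored[:k]]


# Toy index standing in for precomputed map embeddings.
index = {
    "map_a": [1.0, 0.0, 0.0],
    "map_b": [0.0, 1.0, 0.0],
    "map_c": [0.7, 0.7, 0.0],
}
example_map = [1.0, 0.0, 0.0]   # stand-in for an image embedding
modifier = [0.0, 1.0, 0.0]      # stand-in for a text embedding
image_only = search(example_map, index, k=1)
multimodal = search(combine(example_map, modifier), index, k=1)
```

With these toy values, the image-only query retrieves `map_a`, while adding the text modifier shifts the top result to `map_c`, which lies between the two input directions.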
