Katherine McDonough

MapReader: A Computer Vision Pipeline for the Semantic Exploration of Maps at Scale

Nov 30, 2021

Kasra Hosseini, Daniel C. S. Wilson, Kaspar Beelen, Katherine McDonough

Abstract: We present MapReader, a free, open-source software library written in Python for analyzing large map collections (scanned or born-digital). This library transforms the way historians can use maps by turning extensive, homogeneous map sets into searchable primary sources. MapReader allows users with little or no computer vision expertise to i) retrieve maps via web-servers; ii) preprocess and divide them into patches; iii) annotate patches; iv) train, fine-tune, and evaluate deep neural network models; and v) create structured data about map content. We demonstrate how MapReader enables historians to interpret a collection of ≈16K nineteenth-century Ordnance Survey map sheets (≈30.5M patches), foregrounding the challenge of translating visual markers into machine-readable data. We present a case study focusing on British rail infrastructure and buildings as depicted on these maps. We also show how the outputs from the MapReader pipeline can be linked to other, external datasets, which we use to evaluate as well as enrich and interpret the results. We release ≈62K manually annotated patches used here for training and evaluating the models.

* 13 pages, 9 figures
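
Step ii) of the abstract above, dividing a map sheet into patches, can be illustrated with a minimal sketch. This is not MapReader's own API: the function name `patch_boxes`, the fixed square patch size, and the choice to clip edge patches are all assumptions made for illustration.

```python
from typing import List, Tuple

# (left, top, right, bottom) in pixels; a hypothetical type for this sketch.
Box = Tuple[int, int, int, int]


def patch_boxes(width: int, height: int, patch_size: int) -> List[Box]:
    """Return bounding boxes that tile a width x height image.

    Patches on the right and bottom edges are clipped to the image,
    so they may be smaller than patch_size (an assumption of this sketch).
    """
    if width <= 0 or height <= 0 or patch_size <= 0:
        raise ValueError("width, height and patch_size must be positive")
    boxes: List[Box] = []
    # Walk the image row by row, then column by column.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            right = min(left + patch_size, width)
            bottom = min(top + patch_size, height)
            boxes.append((left, top, right, bottom))
    return boxes


if __name__ == "__main__":
    # A hypothetical 250 x 120 pixel sheet cut into 100-pixel patches.
    for box in patch_boxes(250, 120, 100):
        print(box)
```

Each box could then be cropped from the scanned sheet and passed on to annotation or classification; how MapReader itself handles edge patches is not stated in the abstract.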

A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching

Sep 22, 2020

Mariona Coll Ardanuy, Kasra Hosseini, Katherine McDonough, Amrey Krause, Daniel van Strien, Federico Nanni

Abstract: Recognizing toponyms and resolving them to their real-world referents is required for providing advanced semantic access to textual data. This process is often hindered by the high degree of variation in toponyms. Candidate selection is the task of identifying the potential entities that can be referred to by a toponym previously recognized. While it has traditionally received little attention in the research community, it has been shown that candidate selection has a significant impact on downstream tasks (i.e. entity resolution), especially in noisy or non-standard text. In this paper, we introduce a flexible deep learning method for candidate selection through toponym matching, using state-of-the-art neural network architectures. We perform an intrinsic toponym matching evaluation based on several new realistic datasets, which cover various challenging scenarios (cross-lingual and regional variations, as well as OCR errors). We report its performance on candidate selection in the context of the downstream task of toponym resolution, both on existing datasets and on a new manually-annotated resource of nineteenth-century English OCR'd text.

* 10 pages, 1 figure
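
The candidate selection task defined in the abstract above can be sketched with a simple baseline. Note that this is not the paper's deep learning method: it swaps in character-level similarity from Python's standard `difflib` module as a stand-in matcher. The function name `select_candidates`, the tiny gazetteer, and the `top_k` and `min_score` parameters are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, List, Tuple


def select_candidates(
    toponym: str,
    gazetteer: Iterable[str],
    top_k: int = 3,
    min_score: float = 0.6,
) -> List[Tuple[str, float]]:
    """Rank gazetteer names by string similarity to a recognized toponym.

    Returns at most top_k (name, score) pairs with score >= min_score,
    best match first. Comparison is case-insensitive.
    """
    query = toponym.casefold()
    scored: List[Tuple[str, float]] = []
    for name in gazetteer:
        score = SequenceMatcher(None, query, name.casefold()).ratio()
        if score >= min_score:
            scored.append((name, score))
    # Highest similarity first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # "Manchefter" imitates an OCR error (long s read as f).
    names = ["Manchester", "Winchester", "London"]
    for name, score in select_candidates("Manchefter", names):
        print(f"{name}: {score:.2f}")
```

A learned matcher, as the paper proposes, would replace the `SequenceMatcher` score; the surrounding ranking and thresholding logic would stay broadly the same.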

Living Machines: A study of atypical animacy

May 22, 2020

Mariona Coll Ardanuy, Federico Nanni, Kaspar Beelen, Kasra Hosseini, Ruth Ahnert, Jon Lawrence, Katherine McDonough, Giorgia Tolfo, Daniel CS Wilson, Barbara McGillivray

Abstract: This paper proposes a new approach to animacy detection, the task of determining whether an entity is represented as animate in a text. In particular, this work is focused on atypical animacy and examines the scenario in which typically inanimate objects, specifically machines, are given animate attributes. To address it, we have created the first dataset for atypical animacy detection, based on nineteenth-century sentences in English, with machines represented as either animate or inanimate. Our method builds upon recent innovations in language modeling, specifically BERT contextualized word embeddings, to better capture fine-grained contextual properties of words. We present a fully unsupervised pipeline, which can be easily adapted to different contexts, and report its performance on an established animacy dataset and our newly introduced resource. We show that our method provides a substantially more accurate characterization of atypical animacy, especially when applied to highly complex forms of language use.

* 13 pages, 2 figures
