Abstract:The Norman Conquest of 1066 C.E. brought profound transformations to England's administrative, societal, and linguistic practices. The DEEDS (Documents of Early England Data Set) database offers a unique opportunity to explore these changes by examining shifts in word meanings within a vast collection of Medieval Latin charters. While computational linguistics typically relies on vector representations of words like static and contextual embeddings to analyze semantic changes, existing embeddings for scarce and historical Medieval Latin are limited and may not be well-suited for this task. This paper presents the first computational analysis of semantic change pre- and post-Norman Conquest and the first systematic comparison of static and contextual embeddings in a scarce historical data set. Our findings confirm that, consistent with existing studies, contextual embeddings outperform static word embeddings in capturing semantic change within a scarce historical corpus.
Abstract:We outline an unsupervised method for temporal rank ordering of sets of historical documents, namely American State of the Union Addresses and DEEDS, a corpus of medieval English property transfer documents. Our method relies upon effectively capturing the gradual change in word usage via a bandwidth estimate for the non-parametric Generalized Linear Models (Fan, Heckman, and Wand, 1995). The number of possible rank orders needed to search through possible cost functions related to the bandwidth can be quite large, even for a small set of documents. We tackle this problem of combinatorial optimization using the Simulated Annealing algorithm, which allows us to obtain the optimal document temporal orders. Our rank ordering method significantly improved the temporal sequencing of both corpora compared to a randomly sequenced baseline. This unsupervised approach should enable the temporal ordering of undated document sets.
Abstract:Deeds, or charters, dealing with property rights, provide a continuous documentation which can be used by historians to study the evolution of social, economic and political changes. This study is concerned with charters (written in Latin) dating from the tenth through early fourteenth centuries in England. Of these, at least one million were left undated, largely due to administrative changes introduced by William the Conqueror in 1066. Correctly dating such charters is of vital importance in the study of English medieval history. This paper is concerned with computer-automated statistical methods for dating such document collections, with the goal of reducing the considerable efforts required to date them manually and of improving the accuracy of assigned dates. Proposed methods are based on such data as the variation over time of word and phrase usage, and on measures of distance between documents. The extensive (and dated) Documents of Early England Data Set (DEEDS) maintained at the University of Toronto was used for this purpose.