Abstract: A key function of the lexicon is to express novel concepts as they emerge over time through a process known as lexicalization. The most common lexicalization strategies are the reuse and combination of existing words, but they have typically been studied separately in the areas of word meaning extension and word formation. Here we offer an information-theoretic account of how both strategies are constrained by a fundamental tradeoff between competing communicative pressures: word reuse tends to preserve the average length of word forms at the cost of precision, while word combination tends to produce more informative words at the cost of greater word length. We test our proposal against a large dataset of reuse items and compounds that appeared in English, French and Finnish over the past century. We find that these historically emerging items achieve higher levels of communicative efficiency than hypothetical ways of constructing the lexicon, and that both literal reuse items and literal compounds tend to be more efficient than their non-literal counterparts. These results suggest that reuse and combination are both consistent with a unified account of lexicalization grounded in the theory of efficient communication.
Abstract: Humans possess the unique ability to communicate emotions through language. Although concepts like anger or awe are abstract, there is a shared consensus about what these English emotion words mean. This consensus may give the impression that their meaning is static, but we propose this is not the case. We cannot travel back to earlier periods to study emotion concepts directly, but we can examine text corpora, which have partially preserved the meaning of emotion words. Using natural language processing of historical text, we found evidence for semantic change in emotion words over the past century, and we found that varying rates of change were predicted in part by an emotion concept's prototypicality, that is, how representative it is of the broader category of "emotion". Prototypicality correlated negatively with historical rates of emotion semantic change obtained from text-based word embeddings, beyond more established variables including usage frequency in English and a second comparison language, French. This effect of prototypicality did not consistently extend to the semantic category of birds, suggesting that its relevance for predicting semantic change may be category-dependent. Our results suggest that emotion semantics are evolving over time, with prototypical emotion words remaining semantically stable while other emotion words evolve more freely.
Abstract: We present a methodological framework for inferring symmetry of verb predicates in natural language. Empirical work on predicate symmetry has taken two main approaches. The feature-based approach focuses on linguistic features pertaining to symmetry. The context-based approach denies the existence of absolute symmetry and instead argues that such inference is context-dependent. We develop methods that formalize these approaches and evaluate them against a novel symmetry inference sentence (SIS) dataset comprising 400 naturalistic usages of literature-informed verbs spanning the spectrum from symmetry to asymmetry. Our results show that a hybrid transfer learning model that integrates linguistic features with contextualized language models most faithfully predicts the empirical data. Our work integrates existing approaches to symmetry in natural language and suggests how symmetry inference can improve systematicity in state-of-the-art language models.