Abstract:In this paper, we launch a new Universal Dependencies treebank for an endangered language from Amazonia: Kakataibo, a Panoan language spoken in Peru. We first discuss the collaborative methodology implemented, which proved effective for creating a treebank in the context of a Computational Linguistics course for undergraduates. Then, we describe the general details of the treebank and the language-specific considerations implemented for the proposed annotation. Finally, we conduct experiments on part-of-speech tagging and syntactic dependency parsing. We focus on monolingual and transfer learning settings, where we study the impact of a treebank for Shipibo-Konibo, another Panoan language.
Abstract:Zipf's law establishes a scaling behavior for word-frequencies in large text corpora. The appearance of Zipfian properties in human language has been previously explained as an optimization problem for the interests of speakers and hearers. On the other hand, human-like vocabularies can be viewed as bipartite graphs. The aim here is double: within a bipartite-graph approach to human vocabularies, to propose a decentralized language game model for the formation of Zipfian properties. To do this, we define a language game, in which a population of artificial agents is involved in idealized linguistic interactions. Numerical simulations show the appearance of a phase transition from an initially disordered state to three possible phases for language formation. Our results suggest that Zipfian properties in language seem to arise partly from decentralized linguistic interactions between agents endowed with bipartite word-meaning mappings.
Abstract:Background/Introduction: Zipf's law states that if the words of a (large) text are ordered by decreasing frequency, the frequency decreases with rank as a power law with exponent close to -1. Previous work has stressed that this pattern arises from a conflict of interests between the participants in communication: speakers and hearers. Methods: The challenge here is to define a computational language game on a population of agents, whose games are mainly governed by a parameter that measures the relative interests of the participants. Results: Numerical simulations suggest that at critical values of the parameter a human-like vocabulary, exhibiting scaling properties, appears. Conclusions: The appearance of an intermediate distribution of frequencies at some critical values of the parameter suggests that, in a population of artificial agents, the emergence of scaling arises partly as a self-organized process driven only by local interactions between agents.
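As an illustration of the rank-frequency relation stated in the abstract above, the following minimal sketch estimates the power-law exponent with a least-squares fit in log-log space. The function name `zipf_exponent` and the synthetic corpus are assumptions made for this example; they are not taken from the paper.

```python
import math
from collections import Counter


def zipf_exponent(tokens):
    """Least-squares slope of log(frequency) against log(rank).

    Assumes the corpus contains at least two distinct word types.
    """
    # Frequencies sorted in decreasing order: position i holds rank i + 1.
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic corpus (an assumption): the word of rank r occurs about 1000 / r times.
tokens = []
for r in range(1, 51):
    tokens.extend([f"w{r}"] * round(1000 / r))

# The fitted slope should be close to -1 for this idealized corpus.
print(zipf_exponent(tokens))
```

On real text the fit is sensitive to the rank range used, so the slope obtained this way is only a rough indicator of Zipfian behavior.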
Abstract:Traditionally, the formation of vocabularies has been studied with agent-based models (especially the Naming Game) in which random pairs of agents negotiate word-meaning associations at each discrete time step. This paper proposes a first approximation to a novel question: to what extent is the negotiation of word-meaning associations influenced by the order in which the individuals interact? Automata Networks provide an adequate mathematical framework to explore this question. Computer simulations suggest that on two-dimensional lattices the typical features of the formation of word-meaning associations are recovered under random schemes that update small fractions of the population at the same time.
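The pairwise negotiation mentioned above can be sketched with the minimal Naming Game, in which a random speaker-hearer pair interacts at each step. This is a sketch of the classical random-pair dynamics on a fully connected population, not of the Automata Networks model the paper proposes; `naming_game` and its parameters are hypothetical names chosen for the example.

```python
import random


def naming_game(n_agents=20, max_steps=100_000, seed=0):
    """Minimal Naming Game for a single object on a fully connected population.

    Returns (inventories, steps); steps is None if no consensus was reached
    within max_steps interactions.
    """
    rng = random.Random(seed)
    inventories = [set() for _ in range(n_agents)]
    next_word = 0  # counter used to invent brand-new words
    for step in range(1, max_steps + 1):
        speaker, hearer = rng.sample(range(n_agents), 2)
        if not inventories[speaker]:
            # A speaker with an empty inventory invents a word.
            inventories[speaker].add(next_word)
            next_word += 1
        word = rng.choice(sorted(inventories[speaker]))
        if word in inventories[hearer]:
            # Success: both agents keep only the agreed word.
            inventories[speaker] = {word}
            inventories[hearer] = {word}
        else:
            # Failure: the hearer learns the word.
            inventories[hearer].add(word)
        first = inventories[0]
        if len(first) == 1 and all(inv == first for inv in inventories):
            return inventories, step
    return inventories, None


inventories, steps = naming_game()
print("consensus reached after", steps, "interactions")
```

Replacing the random pair selection with a fixed or partially parallel order of interactions is the kind of variation the abstract asks about.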
Abstract:Can artificial communities of agents develop language with scaling relations close to Zipf's law? As a preliminary answer to this question, we propose an Automata Networks model of the formation of a vocabulary in a population of individuals, under two strategies that are, in principle, opposite: alignment and the least-effort principle. Building on previous accounts of the emergence of linguistic conventions (especially the Naming Game), we focus on modeling speaker and hearer efforts as actions over their vocabularies, and we study the impact of these actions on the formation of a shared language. The numerical simulations are essentially based on an energy function that measures the amount of local agreement between the vocabularies. The results suggest that on one-dimensional lattices the best strategy for the formation of shared languages is the one that minimizes the efforts of speakers in communicative tasks.
Abstract:The Naming Game has been studied to explore the role of self-organization in the development and negotiation of linguistic conventions. In this paper, we define an automata networks approach to the Naming Game. Two problems are addressed: (1) the definition of an automata network for multi-party communicative interactions; and (2) the proof of convergence for three different orders in which the individuals are updated (updating schemes). Finally, computer simulations on two-dimensional lattices are used to recover the main features of the Naming Game and to describe the dynamics under different updating schemes.
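The notion of an updating scheme can be illustrated with a generic sketch: a scheme decides which nodes apply the local rule at a given step, and all selected nodes read the same old configuration. The helpers `update_block` and `random_fraction_step`, the ring topology, and the `spread` rule are placeholders assumed for illustration; they are not the Naming Game rule or the three schemes analyzed in the paper.

```python
import random


def update_block(states, neighbors, rule, nodes):
    """Apply `rule` to the chosen nodes simultaneously (all read the old states)."""
    new_states = list(states)
    for i in nodes:
        new_states[i] = rule(states[i], [states[j] for j in neighbors[i]])
    return new_states


def random_fraction_step(states, neighbors, rule, fraction, rng):
    """Update a random fraction of the nodes at the same time."""
    k = max(1, round(fraction * len(states)))
    nodes = rng.sample(range(len(states)), k)
    return update_block(states, neighbors, rule, nodes)


def spread(own, neighbor_states):
    """Placeholder local rule: take the maximum state in the neighborhood."""
    return max([own] + neighbor_states)


# Ring of four nodes: each node sees its left and right neighbor.
n = 4
neighbors = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
states = [0, 0, 1, 0]

# Fully parallel scheme: every node is updated at once.
print(update_block(states, neighbors, spread, range(n)))  # [0, 1, 1, 1]

# Updating a single node leaves the others untouched.
print(update_block(states, neighbors, spread, [1]))  # [0, 1, 1, 0]

# Random scheme that updates half of the population per step.
print(random_fraction_step(states, neighbors, spread, 0.5, random.Random(0)))
```

With `fraction=1.0` the random scheme coincides with the fully parallel one, while a small fraction approaches one-node-at-a-time updating.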
Abstract:This work develops a computational model (based on Automata Networks) of phonological similarity effects involved in the formation of word-meaning associations in artificial populations of speakers. Classical studies show that in recall experiments memory performance is impaired for phonologically similar words compared with dissimilar ones. Here, the individuals confuse phonologically similar words according to a predefined parameter. The main hypothesis is that there is a critical range of the parameter, and hence of working-memory mechanisms, that implies drastic changes in the final consensus of the entire population. Theoretical results present proofs of convergence for a particular case of the model within a worst-case complexity framework. Computer simulations describe the evolution of an energy function that measures the amount of local agreement between individuals. The main finding is the appearance of sudden changes in the energy function at critical values of the parameter.
Abstract:This work attempts to give new theoretical insights into the absence of intermediate stages in the evolution of language. In particular, an automata networks approach is developed to address a crucial question: how can a population of language users reach agreement on a linguistic convention? To describe the appearance of sharp transitions in the self-organization of language, an extremely simple model of (working) memory is adopted. At each time step, language users simply lose part of their word-memories. In computer simulations on low-dimensional lattices, sharp transitions appear at critical values that depend on the size of the neighborhoods of the individuals.