Morten H. Christiansen

The Danish Gigaword Project

May 08, 2020

Leon Strømberg-Derczynski, Rebekah Baglini, Morten H. Christiansen, Manuel R. Ciosici, Jacob Aarup Dalsgaard, Riccardo Fusaroli, Peter Juel Henrichsen, Rasmus Hvingelby, Andreas Kirkedal, Alex Speed Kjeldsen (+5 more)

Abstract: Danish is a North Germanic/Scandinavian language spoken primarily in Denmark, a country with a tradition of technological and scientific innovation. However, from a technological perspective, the Danish language has received relatively little attention and, as a result, Danish language technology is hard to develop, in part due to a lack of large or broad-coverage Danish corpora. This paper describes the Danish Gigaword project, which aims to construct a freely available one-billion-word corpus of Danish text that represents the breadth of the written language.

Memory limitations are hidden in grammar

Aug 19, 2019

Carlos Gómez-Rodríguez, Morten H. Christiansen, Ramon Ferrer-i-Cancho

Abstract: The ability to produce and understand an unlimited number of different sentences is a hallmark of human language. Linguists have sought to define the essence of this generative capacity using formal grammars that describe the syntactic dependencies between constituents, independent of the computational limitations of the human brain. Here, we evaluate this independence assumption by sampling sentences uniformly from the space of possible syntactic structures. We find that the average dependency distance between syntactically related words, a proxy for memory limitations, is less than expected by chance in a collection of state-of-the-art classes of dependency grammars. Our findings indicate that memory limitations have permeated grammatical descriptions, suggesting that it may be theoretically impossible to capture human linguistic productivity independent of cognitive constraints.
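
For readers unfamiliar with the measure named in the abstract, the following is a minimal sketch of how an average dependency distance could be computed for a single sentence. It assumes a head-vector encoding (each word stores the 1-based position of its head, with 0 marking the root); that encoding, the function name `mean_dependency_distance`, and the example sentence are illustrative assumptions and are not taken from the paper. The paper's uniform sampling over classes of dependency structures is not reproduced here.

```python
from typing import Sequence


def mean_dependency_distance(heads: Sequence[int]) -> float:
    """Average linear distance between each word and its syntactic head.

    heads[i] is the 1-based position of the head of word i + 1;
    0 marks the root, which has no incoming dependency.
    (This encoding is an assumption made for the sketch.)
    """
    distances = [
        abs((position + 1) - head)
        for position, head in enumerate(heads)
        if head != 0  # skip the root
    ]
    if not distances:
        raise ValueError("sentence has no dependencies")
    return sum(distances) / len(distances)


# Hypothetical analysis of "She read the book":
# She -> read, read = root, the -> book, book -> read
example_heads = [2, 0, 4, 2]
print(mean_dependency_distance(example_heads))
```

Under this encoding, a lower value means that syntactically related words tend to sit closer together in the sentence, which is the sense in which the abstract treats the measure as a proxy for memory limitations.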
