Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shandy Darma

Crowdsourcing Lexical Diversity

Oct 30, 2024

Hadi Khalilia, Jahna Otterbacher, Gabor Bella, Rusma Noortyani, Shandy Darma, Fausto Giunchiglia

Figure 1 for Crowdsourcing Lexical Diversity

Figure 2 for Crowdsourcing Lexical Diversity

Figure 3 for Crowdsourcing Lexical Diversity

Figure 4 for Crowdsourcing Lexical Diversity

Abstract:Lexical-semantic resources (LSRs), such as online lexicons or wordnets, are fundamental for natural language processing applications. In many languages, however, such resources suffer from quality issues: incorrect entries, incompleteness, but also, the rarely addressed issue of bias towards the English language and Anglo-Saxon culture. Such bias manifests itself in the absence of concepts specific to the language or culture at hand, the presence of foreign (Anglo-Saxon) concepts, as well as in the lack of an explicit indication of untranslatability, also known as cross-lingual \emph{lexical gaps}, when a term has no equivalent in another language. This paper proposes a novel crowdsourcing methodology for reducing bias in LSRs. Crowd workers compare lexemes from two languages, focusing on domains rich in lexical diversity, such as kinship or food. Our LingoGap crowdsourcing tool facilitates comparisons through microtasks identifying equivalent terms, language-specific terms, and lexical gaps across languages. We validated our method by applying it to two case studies focused on food-related terminology: (1) English and Arabic, and (2) Standard Indonesian and Banjarese. These experiments identified 2,140 lexical gaps in the first case study and 951 in the second. The success of these experiments confirmed the usability of our method and tool for future large-scale lexicon enrichment tasks.

Via

Access Paper or Ask Questions

Lexical Diversity in Kinship Across Languages and Dialects

Aug 24, 2023

Hadi Khalilia, Gábor Bella, Abed Alhakim Freihat, Shandy Darma, Fausto Giunchiglia

Figure 1 for Lexical Diversity in Kinship Across Languages and Dialects

Figure 2 for Lexical Diversity in Kinship Across Languages and Dialects

Figure 3 for Lexical Diversity in Kinship Across Languages and Dialects

Figure 4 for Lexical Diversity in Kinship Across Languages and Dialects

Abstract:Languages are known to describe the world in diverse ways. Across lexicons, diversity is pervasive, appearing through phenomena such as lexical gaps and untranslatability. However, in computational resources, such as multilingual lexical databases, diversity is hardly ever represented. In this paper, we introduce a method to enrich computational lexicons with content relating to linguistic diversity. The method is verified through two large-scale case studies on kinship terminology, a domain known to be diverse across languages and cultures: one case study deals with seven Arabic dialects, while the other one with three Indonesian languages. Our results, made available as browseable and downloadable computational resources, extend prior linguistics research on kinship terminology, and provide insight into the extent of diversity even within linguistically and culturally close communities.

Via

Access Paper or Ask Questions