Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Hadi Khalilia

Crowdsourcing Lexical Diversity

Oct 30, 2024

Hadi Khalilia, Jahna Otterbacher, Gabor Bella, Rusma Noortyani, Shandy Darma, Fausto Giunchiglia

Figure 1 for Crowdsourcing Lexical Diversity

Figure 2 for Crowdsourcing Lexical Diversity

Figure 3 for Crowdsourcing Lexical Diversity

Figure 4 for Crowdsourcing Lexical Diversity

Abstract:Lexical-semantic resources (LSRs), such as online lexicons or wordnets, are fundamental for natural language processing applications. In many languages, however, such resources suffer from quality issues: incorrect entries, incompleteness, but also, the rarely addressed issue of bias towards the English language and Anglo-Saxon culture. Such bias manifests itself in the absence of concepts specific to the language or culture at hand, the presence of foreign (Anglo-Saxon) concepts, as well as in the lack of an explicit indication of untranslatability, also known as cross-lingual \emph{lexical gaps}, when a term has no equivalent in another language. This paper proposes a novel crowdsourcing methodology for reducing bias in LSRs. Crowd workers compare lexemes from two languages, focusing on domains rich in lexical diversity, such as kinship or food. Our LingoGap crowdsourcing tool facilitates comparisons through microtasks identifying equivalent terms, language-specific terms, and lexical gaps across languages. We validated our method by applying it to two case studies focused on food-related terminology: (1) English and Arabic, and (2) Standard Indonesian and Banjarese. These experiments identified 2,140 lexical gaps in the first case study and 951 in the second. The success of these experiments confirmed the usability of our method and tool for future large-scale lexicon enrichment tasks.

Via

Access Paper or Ask Questions

Advancing the Arabic WordNet: Elevating Content Quality

Mar 29, 2024

Abed Alhakim Freihat, Hadi Khalilia, Gábor Bella, Fausto Giunchiglia

Figure 1 for Advancing the Arabic WordNet: Elevating Content Quality

Figure 2 for Advancing the Arabic WordNet: Elevating Content Quality

Figure 3 for Advancing the Arabic WordNet: Elevating Content Quality

Figure 4 for Advancing the Arabic WordNet: Elevating Content Quality

Abstract:High-quality WordNets are crucial for achieving high-quality results in NLP applications that rely on such resources. However, the wordnets of most languages suffer from serious issues of correctness and completeness with respect to the words and word meanings they define, such as incorrect lemmas, missing glosses and example sentences, or an inadequate, Western-centric representation of the morphology and the semantics of the language. Previous efforts have largely focused on increasing lexical coverage while ignoring other qualitative aspects. In this paper, we focus on the Arabic language and introduce a major revision of the Arabic WordNet that addresses multiple dimensions of lexico-semantic resource quality. As a result, we updated more than 58% of the synsets of the existing Arabic WordNet by adding missing information and correcting errors. In order to address issues of language diversity and untranslatability, we also extended the wordnet structure by new elements: phrasets and lexical gaps.

Via

Access Paper or Ask Questions

Lexical Diversity in Kinship Across Languages and Dialects

Aug 24, 2023

Hadi Khalilia, Gábor Bella, Abed Alhakim Freihat, Shandy Darma, Fausto Giunchiglia

Figure 1 for Lexical Diversity in Kinship Across Languages and Dialects

Figure 2 for Lexical Diversity in Kinship Across Languages and Dialects

Figure 3 for Lexical Diversity in Kinship Across Languages and Dialects

Figure 4 for Lexical Diversity in Kinship Across Languages and Dialects

Abstract:Languages are known to describe the world in diverse ways. Across lexicons, diversity is pervasive, appearing through phenomena such as lexical gaps and untranslatability. However, in computational resources, such as multilingual lexical databases, diversity is hardly ever represented. In this paper, we introduce a method to enrich computational lexicons with content relating to linguistic diversity. The method is verified through two large-scale case studies on kinship terminology, a domain known to be diverse across languages and cultures: one case study deals with seven Arabic dialects, while the other one with three Indonesian languages. Our results, made available as browseable and downloadable computational resources, extend prior linguistics research on kinship terminology, and provide insight into the extent of diversity even within linguistically and culturally close communities.

Via

Access Paper or Ask Questions

Using Linguistic Typology to Enrich Multilingual Lexicons: the Case of Lexical Gaps in Kinship

Apr 11, 2022

Temuulen Khishigsuren, Gábor Bella, Khuyagbaatar Batsuren, Abed Alhakim Freihat, Nandu Chandran Nair, Amarsanaa Ganbold, Hadi Khalilia, Yamini Chandrashekar, Fausto Giunchiglia

Figure 1 for Using Linguistic Typology to Enrich Multilingual Lexicons: the Case of Lexical Gaps in Kinship

Figure 2 for Using Linguistic Typology to Enrich Multilingual Lexicons: the Case of Lexical Gaps in Kinship

Figure 3 for Using Linguistic Typology to Enrich Multilingual Lexicons: the Case of Lexical Gaps in Kinship

Figure 4 for Using Linguistic Typology to Enrich Multilingual Lexicons: the Case of Lexical Gaps in Kinship

Abstract:This paper describes a method to enrich lexical resources with content relating to linguistic diversity, based on knowledge from the field of lexical typology. We capture the phenomenon of diversity through the notions of lexical gap and language-specific word and use a systematic method to infer gaps semi-automatically on a large scale. As a first result obtained for the domain of kinship terminology, known to be very diverse throughout the world, we publish a lexico-semantic resource consisting of 198 domain concepts, 1,911 words, and 37,370 gaps covering 699 languages. We see potential in the use of resources such as ours for the improvement of a variety of cross-lingual NLP tasks, which we demonstrate through a downstream application for the evaluation of machine translation systems.

* LREC 2022

Via

Access Paper or Ask Questions