Diwa Koirala

Nowa Lab

Generative AI for Named Entity Recognition in Low-Resource Language Nepali

Mar 12, 2025

Sameer Neupane, Jeevan Chapagain, Nobal B. Niraula, Diwa Koirala

Abstract: Generative Artificial Intelligence (GenAI), particularly Large Language Models (LLMs), has significantly advanced Natural Language Processing (NLP) tasks, such as Named Entity Recognition (NER), which involves identifying entities like person, location, and organization names in text. LLMs are especially promising for low-resource languages due to their ability to learn from limited data. However, the performance of GenAI models for Nepali, a low-resource language, has not been thoroughly evaluated. This paper investigates the application of state-of-the-art LLMs for Nepali NER, conducting experiments with various prompting techniques to assess their effectiveness. Our results provide insights into the challenges and opportunities of using LLMs for NER in low-resource settings and offer valuable contributions to the advancement of NLP research in languages like Nepali.

* This paper has been accepted in the FLAIRS Conference 2025

Linguistic Taboos and Euphemisms in Nepali

Jul 27, 2020

Nobal B. Niraula, Saurab Dulal, Diwa Koirala

Abstract: Languages across the world have words, phrases, and behaviors -- the taboos -- that are avoided in public communication because they are considered obscene or disturbing to the social, religious, and ethical values of society. However, people deliberately use these linguistic taboos and other language constructs to make hurtful, derogatory, and obscene comments. It is nearly impossible to construct a universal set of offensive or taboo terms because offensiveness is determined entirely by different factors such as socio-physical setting, speaker-listener relationship, and word choices. In this paper, we present a detailed corpus-based study of offensive language in Nepali. We identify and describe more than 18 different categories of linguistic offenses including politics, religion, race, and sex. We discuss 12 common euphemisms such as synonym, metaphor, and circumlocution. In addition, we introduce a manually constructed data set of over 1000 offensive and taboo terms popular among contemporary speakers. This in-depth study of offensive language and resource will provide a foundation for several downstream tasks such as offensive language detection and language learning.

* 10 pages, 3 tables
