Anna Feldman

Turkish Delights: a Dataset on Turkish Euphemisms

Jul 17, 2024

Hasan Can Biyik, Patrick Lee, Anna Feldman

Abstract: Euphemisms are a form of figurative language that is relatively understudied in natural language processing. This research extends current computational work on potentially euphemistic terms (PETs) to Turkish. We introduce the Turkish PET dataset, the first available dataset of its kind. By creating a list of euphemisms in Turkish, collecting example contexts, and annotating them, we provide both euphemistic and non-euphemistic examples of PETs in Turkish. We describe the dataset and methodologies, and we also experiment with transformer-based models for Turkish euphemism detection, using our dataset for binary classification. We compare performance across models using F1, accuracy, and precision as evaluation metrics.

* In Proceedings of the First SIGTURK Workshop, co-located with ACL 2024: https://sigturk.github.io/workshop/
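The abstract names F1, accuracy, and precision as the evaluation metrics for binary euphemism classification. As a minimal sketch, assuming label 1 marks a euphemistic use of a PET and label 0 a literal use, the scores can be computed from the confusion counts as follows. The labels and the function name `binary_metrics` are hypothetical; the paper may report a different F1 variant (for example, macro-averaged over both classes).

```python
def binary_metrics(gold, pred, positive=1):
    """Accuracy, precision, recall and F1 for a single positive class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    accuracy = correct / len(gold) if gold else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall (0 when both are 0).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels: 1 = euphemistic use of a PET, 0 = literal use.
gold = [1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 1, 0]
print(binary_metrics(gold, pred))
```

Here the classifier gets 2 true positives, 1 false positive, and 1 false negative, so accuracy, precision, recall, and F1 all equal 2/3. Recall is included because F1 cannot be computed without it.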

Evaluating Embeddings for One-Shot Classification of Doctor-AI Consultations

Feb 06, 2024

Olumide Ebenezer Ojo, Olaronke Oluwayemisi Adebanji, Alexander Gelbukh, Hiram Calvo, Anna Feldman

Abstract: Effective communication between healthcare providers and patients is crucial to providing high-quality patient care. In this work, we investigate how doctor-written and AI-generated texts in healthcare consultations can be classified using state-of-the-art embeddings and one-shot classification systems. By analyzing embeddings such as bag-of-words, character n-grams, Word2Vec, GloVe, fastText, and GPT-2 embeddings, we examine how well our one-shot classification systems capture semantic information within medical consultations. Results show that the embeddings capture semantic features of the text reliably and adaptably. Overall, Word2Vec, GloVe, and character n-gram embeddings performed well, indicating their suitability for this task. GPT-2 embeddings also show notable performance, indicating that they are suitable for models tailored to this task as well. Our machine learning architectures significantly improved the quality of health conversations when training data are scarce, improving communication between patients and healthcare providers.
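The abstract does not describe how the one-shot systems are built. As a minimal sketch, assume a nearest-neighbor rule over character n-gram counts, one of the representations the abstract lists. Each class (doctor or AI) is represented by a single labeled example, and a new text takes the label of the most cosine-similar example. The support texts and the names `char_ngrams`, `cosine`, and `one_shot_classify` are hypothetical.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Bag of character n-grams, lowercased and padded with spaces at both ends."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing keys count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def one_shot_classify(text, support, n=3):
    """Return the label whose single support example is most similar to `text`."""
    query = char_ngrams(text, n)
    return max(support, key=lambda label: cosine(query, char_ngrams(support[label], n)))

# Hypothetical support set: exactly one labeled example per class.
support = {
    "doctor": "Take the tablets twice daily and come back if the pain persists.",
    "ai": "As an AI language model, I cannot diagnose you, but here are some general tips.",
}
print(one_shot_classify("As an AI, I recommend you consult a professional.", support))
```

Swapping `char_ngrams` for averaged Word2Vec, GloVe, or GPT-2 vectors would keep the same nearest-neighbor rule. Only the representation changes.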

MEDs for PETs: Multilingual Euphemism Disambiguation for Potentially Euphemistic Terms

Jan 25, 2024

Patrick Lee, Alain Chirino Trujillo, Diana Cuevas Plancarte, Olumide Ebenezer Ojo, Xinyi Liu, Iyanuoluwa Shode, Yuan Zhao, Jing Peng, Anna Feldman

Abstract: This study investigates the computational processing of euphemisms, a universal linguistic phenomenon, across multiple languages. We train a multilingual transformer model (XLM-RoBERTa) to disambiguate potentially euphemistic terms (PETs) in multilingual and cross-lingual settings. In line with current trends, we demonstrate that zero-shot learning takes place across languages. We also show cases where multilingual models outperform monolingual models on the task by a statistically significant margin, indicating that multilingual data gives models additional opportunities to learn about the cross-lingual, computational properties of euphemisms. In a follow-up analysis, we focus on universal euphemistic "categories" such as death and bodily functions, among others. We test whether cross-lingual data from the same domain is more important than within-language data from other domains, to further understand the nature of the cross-lingual transfer.

MedAI Dialog Corpus (MEDIC): Zero-Shot Classification of Doctor and AI Responses in Health Consultations

Oct 20, 2023

Olumide E. Ojo, Olaronke O. Adebanji, Alexander Gelbukh, Hiram Calvo, Anna Feldman

Abstract: Zero-shot classification enables text to be classified into classes not seen during training. In this research, we investigate how effectively pre-trained language models can classify responses from doctors and AI in health consultations through zero-shot learning. Our study aims to determine whether these models can detect if a text originates from a human or an AI model without corpus-specific training. We collect doctors' responses to patient inquiries about their health and pose the same questions to AI models. While zero-shot language models show a good general understanding of language, they have limitations in classifying doctor and AI responses in healthcare consultations. This research lays the groundwork for further work on medical text classification, informing the development of more effective approaches to distinguishing doctor-generated from AI-generated text in health consultations.

Legend at ArAIEval Shared Task: Persuasion Technique Detection using a Language-Agnostic Text Representation Model

Oct 14, 2023

Olumide E. Ojo, Olaronke O. Adebanji, Hiram Calvo, Damian O. Dieke, Olumuyiwa E. Ojo, Seye E. Akinsanya, Tolulope O. Abiola, Anna Feldman

Abstract: In this paper, we share our best-performing submission to the Arabic AI Tasks Evaluation Challenge (ArAIEval) at ArabicNLP 2023. Our focus was on Task 1, which involves identifying persuasion techniques in excerpts from tweets and news articles. We detected persuasion techniques in the Arabic texts with a training loop that fine-tunes XLM-RoBERTa, a language-agnostic text representation model. This approach, which leverages fine-tuning of a multilingual language model, proved effective. On the test set, we achieved a micro F1 score of 0.64 for subtask A of the competition.
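Micro F1, the score reported above, pools true positives, false positives, and false negatives over all labels and examples before computing precision and recall. The abstract does not spell out the label structure of subtask A. The sketch below therefore shows the general multi-label form, where each excerpt carries a set of technique labels; the same pooling applies when each excerpt carries exactly one label. The technique names and the function `micro_f1` are hypothetical.

```python
def micro_f1(gold_sets, pred_sets):
    """Micro-averaged F1: pool TP, FP and FN over every label of every example."""
    if len(gold_sets) != len(pred_sets):
        raise ValueError("gold_sets and pred_sets must have the same length")
    tp = fp = fn = 0
    for gold, pred in zip(gold_sets, pred_sets):
        tp += len(gold & pred)   # labels predicted and present
        fp += len(pred - gold)   # labels predicted but absent
        fn += len(gold - pred)   # labels present but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical technique labels for three excerpts.
gold = [{"technique_a"}, {"technique_b", "technique_c"}, set()]
pred = [{"technique_a"}, {"technique_b"}, {"technique_c"}]
print(round(micro_f1(gold, pred), 3))
```

With 2 pooled true positives, 1 false positive, and 1 false negative, micro F1 equals 2TP / (2TP + FP + FN) = 4/6, so the example prints 0.667.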

FEED PETs: Further Experimentation and Expansion on the Disambiguation of Potentially Euphemistic Terms

Jun 06, 2023

Patrick Lee, Iyanuoluwa Shode, Alain Chirino Trujillo, Yuan Zhao, Olumide Ebenezer Ojo, Diana Cuevas Plancarte, Anna Feldman, Jing Peng

Abstract: Transformers have been shown to work well for the task of English euphemism disambiguation, in which a potentially euphemistic term (PET) is classified as euphemistic or non-euphemistic in a particular context. In this study, we expand on the task in two ways. First, we annotate PETs for vagueness, a linguistic property associated with euphemisms, and find that transformers are generally better at classifying vague PETs, suggesting linguistic differences in the data that impact performance. Second, we present novel euphemism corpora in three different languages: Yoruba, Spanish, and Mandarin Chinese. We perform euphemism disambiguation experiments in each language using multilingual transformer models mBERT and XLM-RoBERTa, establishing preliminary results from which to launch future work.
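Comparing performance on vague and non-vague PETs amounts to scoring predictions separately within each annotated subset. The abstract does not say which metric the comparison uses. The minimal sketch below assumes plain accuracy and uses made-up (vagueness, gold, predicted) triples; the function name `accuracy_by_group` is hypothetical.

```python
from collections import defaultdict

def accuracy_by_group(examples):
    """Accuracy within each group, e.g. vague vs. non-vague PETs."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, gold, pred in examples:
        total[group] += 1
        correct[group] += int(gold == pred)
    return {group: correct[group] / total[group] for group in total}

# Hypothetical (vagueness annotation, gold label, predicted label) triples;
# label 1 = euphemistic, 0 = non-euphemistic.
examples = [
    ("vague", 1, 1), ("vague", 0, 0), ("vague", 1, 1), ("vague", 1, 0),
    ("non-vague", 1, 0), ("non-vague", 0, 0), ("non-vague", 1, 1), ("non-vague", 0, 1),
]
print(accuracy_by_group(examples))
```

On these toy triples, the vague subset scores 0.75 and the non-vague subset 0.5. Any per-class metric, such as F1, could be swapped in the same way.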

NollySenti: Leveraging Transfer Learning and Machine Translation for Nigerian Movie Sentiment Classification

May 18, 2023

Iyanuoluwa Shode, David Ifeoluwa Adelani, Jing Peng, Anna Feldman

Abstract: Africa has over 2,000 indigenous languages, but they are under-represented in NLP research due to a lack of datasets. In recent years, there has been progress in developing labeled corpora for African languages. However, these corpora are often available in a single domain and may not generalize to other domains. In this paper, we focus on the task of sentiment classification for cross-domain adaptation. We create a new dataset, NollySenti, based on Nollywood movie reviews in five languages widely spoken in Nigeria (English, Hausa, Igbo, Nigerian Pidgin, and Yoruba). We provide an extensive empirical evaluation using classical machine learning methods and pre-trained language models. Leveraging transfer learning, we compare the performance of cross-domain adaptation from the Twitter domain and cross-lingual adaptation from English. Our evaluation shows that transfer from English in the same target domain leads to more than 5% improvement in accuracy compared to transfer from Twitter in the same language. To further mitigate the domain difference, we leverage machine translation (MT) from English to other Nigerian languages, which leads to a further improvement of 7% over cross-lingual evaluation. While MT to low-resource languages is often of low quality, we show through human evaluation that most of the translated sentences preserve the sentiment of the original English reviews.

A Report on the Euphemisms Detection Shared Task

Dec 03, 2022

Patrick Lee, Anna Feldman, Jing Peng

Abstract: This paper presents the Shared Task on Euphemism Detection for the Third Workshop on Figurative Language Processing (FigLang 2022), held in conjunction with EMNLP 2022. Participants were invited to investigate the euphemism detection task: given an input text, identify whether it contains a euphemism. The input data is a corpus of sentences containing potentially euphemistic terms (PETs) collected from the GloWbE corpus (Davies and Fuchs, 2015), each human-annotated as containing either a euphemistic or a literal usage of a PET. In this paper, we present the results and analyze the common themes, methods, and findings of the participating teams.

Searching for PETs: Using Distributional and Sentiment-Based Methods to Find Potentially Euphemistic Terms

May 20, 2022

Patrick Lee, Martha Gavidia, Anna Feldman, Jing Peng

Abstract: This paper presents a linguistically driven proof of concept for finding potentially euphemistic terms, or PETs. Acknowledging that PETs tend to be commonly used expressions for a certain range of sensitive topics, we make use of distributional similarities to select and filter phrase candidates from a sentence and rank them using a set of simple sentiment-based metrics. We present the results of our approach tested on a corpus of sentences containing euphemisms, demonstrating its efficacy for detecting single and multi-word PETs from a broad range of topics. We also discuss future potential for sentiment-based methods on this task.

* Proceedings of UnImplicit: The Second Workshop on Understanding Implicit and Underspecified Language, NAACL 2022, Seattle
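The abstract describes a two-stage pipeline: distributional similarity selects and filters phrase candidates from a sentence, then sentiment-based metrics rank them. It does not give the vectors, the similarity threshold, or the exact metrics. The toy sketch below makes several assumptions, all made up for illustration: the 3-d vectors, the lexicon scores, the 0.5 threshold, and the ranking direction (milder wording first). It enumerates short word n-grams, keeps those whose vectors are cosine-close to a sensitive-topic vector, and orders the survivors by sentiment score. The function names are hypothetical.

```python
import math

def phrase_candidates(sentence, max_len=2):
    """All contiguous word n-grams of length 1..max_len, lowercased."""
    words = sentence.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(words) - n + 1)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_pet_candidates(candidates, vectors, topic_vector, sentiment, threshold=0.5):
    """Keep candidates distributionally close to the topic, then rank by sentiment."""
    kept = [p for p in candidates
            if p in vectors and cosine(vectors[p], topic_vector) >= threshold]
    # Assumed metric: a higher lexicon score means milder wording, ranked first.
    return sorted(kept, key=lambda p: sentiment.get(p, 0.0), reverse=True)

# Toy vectors and sentiment scores (hypothetical values).
vectors = {
    "passed away": [0.9, 0.1, 0.0],
    "died": [0.8, 0.2, 0.1],
    "went shopping": [0.0, 0.1, 0.9],
}
topic_death = [1.0, 0.0, 0.0]
sentiment = {"passed away": 0.2, "died": -0.6, "went shopping": 0.3}

print(rank_pet_candidates(list(vectors), vectors, topic_death, sentiment))
print(rank_pet_candidates(phrase_candidates("Grandpa passed away last week"),
                          vectors, topic_death, sentiment))
```

"went shopping" is filtered out because its vector is orthogonal to the topic vector. Among the survivors, "passed away" outranks "died" because its sentiment score is higher.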

CATs are Fuzzy PETs: A Corpus and Analysis of Potentially Euphemistic Terms

May 05, 2022

Martha Gavidia, Patrick Lee, Anna Feldman, Jing Peng

Abstract: Euphemisms have not received much attention in natural language processing, despite being an important element of polite and figurative language. Euphemisms prove to be a difficult topic, not only because they are subject to language change, but also because humans may not agree on what is and is not a euphemism. Nevertheless, the first step toward tackling the issue is to collect and analyze examples of euphemisms. We present a corpus of potentially euphemistic terms (PETs) along with example texts from the GloWbE corpus. Additionally, we present a subcorpus of texts where these PETs are not used euphemistically, which may be useful for future applications. We also discuss the results of multiple analyses run on the corpus. First, we find that sentiment analysis of the euphemistic texts supports the claim that PETs generally decrease negative and offensive sentiment. Second, we observe cases of disagreement in an annotation task in which humans were asked to label PETs in a subset of our corpus examples as euphemistic or not. We attribute the disagreement to a variety of potential reasons, including whether the PET was a commonly accepted term (CAT).

* Proceedings of LREC 2022
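The abstract reports annotator disagreement but does not name an agreement statistic. Cohen's kappa is one standard way to quantify agreement between two annotators beyond what chance would produce. The sketch below assumes two annotators labeling the same items as euphemistic ("euph") or literal ("lit"); the labels and the function name `cohens_kappa` are hypothetical.

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability both annotators pick the same label independently.
    expected = sum(count_a[label] * count_b[label] for label in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # convention: both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical annotations of six PET instances.
annotator_1 = ["euph", "euph", "lit", "euph", "lit", "euph"]
annotator_2 = ["euph", "lit", "lit", "euph", "lit", "euph"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))
```

The annotators agree on 5 of 6 items (observed 5/6), and chance agreement is (4·3 + 2·3)/36 = 0.5. Kappa is therefore (5/6 − 0.5)/(1 − 0.5) = 2/3, well below the raw 83% agreement.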
