Rafal Rzepka

Speciesism in Natural Language Processing Research

Oct 18, 2024

Masashi Takeshita, Rafal Rzepka

Abstract: Natural Language Processing (NLP) research on AI Safety and social bias in AI has focused on safety for humans and social bias against human minorities. However, some AI ethicists have argued that the moral significance of nonhuman animals has been ignored in AI research. Therefore, the purpose of this study is to investigate whether there is speciesism, i.e., discrimination against nonhuman animals, in NLP research. First, we explain why nonhuman animals are relevant in NLP research. Next, we survey the findings of existing research on speciesism in NLP researchers, data, and models and further investigate this problem in this study. The findings of this study suggest that speciesism exists among researchers, in data, and in models. Specifically, our survey and experiments show that (a) NLP researchers, even those who study social bias in AI, do not recognize speciesism or speciesist bias; (b) speciesist bias is inherent in the annotated data of the datasets used to evaluate NLP models; (c) OpenAI GPTs, recent NLP models, exhibit speciesist bias by default. Finally, we discuss how we can reduce speciesism in NLP research.

* This article is a preprint and has not been peer-reviewed. The postprint has been accepted for publication in AI and Ethics. Please cite the final version of the article once it is published.

LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

Jul 04, 2024

LLM-jp: Akiko Aizawa, Eiji Aramaki, Bowen Chen, Fei Cheng, Hiroyuki Deguchi, Rintaro Enomoto, Kazuki Fujii, Kensuke Fukumoto (+72 more)

Abstract: This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop strong, open-source Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together for this purpose. This paper presents the background of the establishment of LLM-jp, summaries of its activities, and technical reports on the LLMs developed by LLM-jp. For the latest activities, visit https://llm-jp.nii.ac.jp/en/.

Speciesist Language and Nonhuman Animal Bias in English Masked Language Models

Mar 15, 2022

Masashi Takeshita, Rafal Rzepka, Kenji Araki

Abstract: Various existing studies have analyzed what social biases are inherited by NLP models. These biases may directly or indirectly harm people; therefore, previous studies have focused only on human attributes. If the social biases in NLP models can be indirectly harmful to the humans involved, then the models can also indirectly harm nonhuman animals. However, until recently no research on social biases in NLP regarding nonhumans existed. In this paper, we analyze biases against nonhuman animals, i.e., speciesist bias, inherent in English Masked Language Models. We analyze this bias using template-based and corpus-extracted sentences that contain speciesist (or non-speciesist) language, to show that these models tend to associate harmful words with nonhuman animals. Our code for reproducing the experiments will be made available on GitHub.

* An anonymous previous version of this paper is accessible at (https://openreview.net/forum?id=dfqMpjZOgv4). Our code will be available at (https://github.com/Language-Media-Lab/speciesist-language).

In the Service of Online Order: Tackling Cyber-Bullying with Machine Learning and Affect Analysis

Mar 04, 2022

Michal Ptaszynski, Pawel Dybala, Tatsuaki Matsuba, Fumito Masui, Rafal Rzepka, Kenji Araki, Yoshio Momouchi

Abstract: One of the burning problems in Japan lately has been cyber-bullying, that is, slandering and bullying people online. The problem has been especially noticeable on unofficial Web sites of Japanese schools. Volunteers consisting of school personnel and PTA (Parent-Teacher Association) members have started Online Patrol to spot malicious content within Web forums and blogs. In practice, Online Patrol involves reading through the entire Web content, a task that is difficult to perform manually. In this paper we introduce research intended to help PTA members perform Online Patrol more efficiently. We aim to develop a set of tools that can automatically detect malicious entries and report them to PTA members. First, we collected cyber-bullying data from unofficial school Web sites. Then we analysed this data in two ways. Firstly, we analysed the entries with a multifaceted affect analysis system in order to find distinctive features of cyber-bullying and apply them to a machine learning classifier. Secondly, we applied an SVM-based machine learning method to train a classifier for the detection of cyber-bullying. The system was able to classify cyber-bullying entries with a balanced F-score of 88.2%.

* International Journal of Computational Linguistics Research, Vol. 1, Issue 3, pp. 135-154, 2010
* 12 pages, 11 tables, 6 figures

Summarizing Utterances from Japanese Assembly Minutes using Political Sentence-BERT-based Method for QA Lab-PoliInfo-2 Task of NTCIR-15

Oct 22, 2020

Daiki Shirafuji, Hiromichi Kameya, Rafal Rzepka, Kenji Araki

Abstract: Many discussions are held during political meetings, and their transcripts include a large number of utterances on various topics. We need to read all of them if we want to follow speakers' intentions or opinions about a given topic. To avoid such a costly and time-consuming process of grasping often lengthy discussions, NLP researchers work on generating concise summaries of utterances. The summarization subtask of the QA Lab-PoliInfo-2 task at NTCIR-15 addresses this problem for Japanese utterances in assembly minutes, and our team (SKRA) participated in this subtask. As a first step towards summarizing utterances, we created a new pre-trained sentence embedding model, i.e., the Japanese Political Sentence-BERT. With this model, we summarize utterances without labelled data. This paper describes our approach to solving the task and discusses its results.

* 8 pages, 1 figure, 8 tables, NTCIR-15 conference
