Mika Hämäläinen

A Comparative Analysis of Ethical and Safety Gaps in LLMs using Relative Danger Coefficient

May 06, 2025

Yehor Tereshchenko, Mika Hämäläinen

Abstract: Artificial Intelligence (AI) and Large Language Models (LLMs) have evolved rapidly in recent years, showcasing remarkable capabilities in natural language understanding and generation. However, these advancements also raise critical ethical questions regarding safety, potential misuse, discrimination and overall societal impact. This article provides a comparative analysis of the ethical performance of various AI models, including the brand new DeepSeek-V3 (R1 with reasoning and without), various GPT variants (4o, 3.5 Turbo, 4 Turbo, o1/o3 mini) and Gemini (1.5 flash, 2.0 flash and 2.0 flash exp), and highlights the need for robust human oversight, especially in high-stakes situations. Furthermore, we present a new metric for calculating harm in LLMs called the Relative Danger Coefficient (RDC).

* Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities, 2025

On Psychology of AI -- Does Primacy Effect Affect ChatGPT and Other LLMs?

Apr 29, 2025

Mika Hämäläinen

Abstract: We study the primacy effect in three commercial LLMs: ChatGPT, Gemini and Claude. We do this by repurposing the famous experiment that Asch (1946) conducted with human subjects. The experiment is simple: given two candidates with equal descriptions, which one is preferred if one description lists positive adjectives before negative ones and the other lists negative adjectives before positive ones? We test this in two experiments. In the first experiment, LLMs are given both candidates simultaneously in the same prompt, and in the second, LLMs are given the candidates separately. We test all the models with 200 candidate pairs. We found that, in the first experiment, ChatGPT preferred the candidate with positive adjectives listed first, while Gemini preferred both equally often. Claude refused to make a choice. In the second experiment, ChatGPT and Claude were most likely to rank both candidates equally. In the cases where they did not give an equal rating, both showed a clear preference for the candidate that had negative adjectives listed first. Gemini was most likely to prefer the candidate with negative adjectives listed first.
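
As a rough illustration of the setup described in this abstract, the sketch below builds one candidate pair whose descriptions share the same adjectives in opposite order, then forms a joint prompt (first experiment) and two separate prompts (second experiment). The function names, prompt wording and example adjectives are assumptions made for illustration only and are not taken from the paper.

```python
def describe(name, first, second):
    """Describe a candidate, listing the `first` adjectives before the `second` ones."""
    return f"{name} is {', '.join(first + second)}."


def build_pair(positive, negative):
    """Both candidates get the same adjectives; only the order differs."""
    a = describe("Candidate A", positive, negative)  # positive adjectives first
    b = describe("Candidate B", negative, positive)  # negative adjectives first
    return a, b


def joint_prompt(a, b):
    """First experiment: both candidates appear in the same prompt (hypothetical wording)."""
    return f"{a}\n{b}\nWhich candidate do you prefer?"


def separate_prompts(a, b):
    """Second experiment: each candidate is rated in its own prompt (hypothetical wording)."""
    return [f"{d}\nRate this candidate." for d in (a, b)]


# Illustrative adjectives only; the paper's actual word lists are not given here.
a, b = build_pair(["intelligent", "industrious"], ["stubborn", "envious"])
print(joint_prompt(a, b))
```

Repeating this over many pairs and counting which candidate each model prefers would give the kind of tallies the abstract reports.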

Threefold model for AI Readiness: A Case Study with Finnish Healthcare SMEs

Mar 15, 2025

Mohammed Alnajjar, Khalid Alnajjar, Mika Hämäläinen

Abstract: This study examines AI adoption among Finnish healthcare SMEs through semi-structured interviews with six health-tech companies. We identify three AI engagement categories: AI-curious (exploring AI), AI-embracing (integrating AI), and AI-catering (providing AI solutions). Our proposed threefold model highlights key adoption barriers, including regulatory complexities, technical expertise gaps, and financial constraints. While SMEs recognize AI's potential, most remain in early adoption stages. We provide actionable recommendations to accelerate AI integration, focusing on regulatory reforms, talent development, and inter-company collaboration, offering valuable insights for healthcare organizations, policymakers, and researchers.

Analyzing Pokémon and Mario Streamers' Twitch Chat with LLM-based User Embeddings

Nov 17, 2024

Mika Hämäläinen, Jack Rueter, Khalid Alnajjar

Abstract: We present a novel digital humanities method for representing our Twitch chatters as user embeddings created by a large language model (LLM). We cluster these embeddings automatically using affinity propagation and further narrow this clustering down through manual analysis. We analyze the chat of one stream by each of three Twitch streamers: SmallAnt, DougDoug and PointCrow. Our findings suggest that each streamer has their own type of chatters; however, two categories emerge for all of the streamers: supportive viewers, and emoji and reaction senders. Repetitive message spammers are a shared chatter category for two of the streamers.

* NLP4DH 2024

Leveraging Transformer-Based Models for Predicting Inflection Classes of Words in an Endangered Sami Language

Nov 04, 2024

Khalid Alnajjar, Mika Hämäläinen, Jack Rueter

Abstract: This paper presents a methodology for training a transformer-based model to classify lexical and morphosyntactic features of Skolt Sami, an endangered Uralic language characterized by complex morphology. The goal of our approach is to create an effective system for understanding and analyzing Skolt Sami, given the limited data availability and linguistic intricacies inherent to the language. Our end-to-end pipeline includes data extraction, augmentation, and training a transformer-based model capable of predicting inflection classes. The motivation behind this work is to support language preservation and revitalization efforts for minority languages like Skolt Sami. Accurate classification not only helps improve the state of Finite-State Transducers (FSTs) by providing greater lexical coverage but also contributes to systematic linguistic documentation for researchers working with newly discovered words from literature and native speakers. Our model achieves an average weighted F1 score of 1.00 for POS classification and 0.81 for inflection class classification. The trained model and code will be released publicly to facilitate future research in endangered-language NLP.

* IWCLUL 2024

DAG: Dictionary-Augmented Generation for Disambiguation of Sentences in Endangered Uralic Languages using ChatGPT

Nov 03, 2024

Mika Hämäläinen

Abstract: We showcase that ChatGPT can be used to disambiguate lemmas in two endangered languages that ChatGPT is not proficient in, namely Erzya and Skolt Sami. We augment our prompt by providing dictionary translations of the candidate lemmas to a majority language - Finnish in our case. This dictionary-augmented generation approach results in 50% accuracy for Skolt Sami and 41% accuracy for Erzya. On closer inspection, many of the errors were of the kind that even an untrained human annotator would make.

* IWCLUL 2024
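
The prompt augmentation described in this abstract might be sketched as follows: each candidate lemma is listed together with its dictionary translations into the majority language before the model is asked to choose. The function name, prompt wording and placeholder lemmas below are hypothetical; the paper's actual prompt template is not shown on this page.

```python
def build_prompt(sentence, token, candidates):
    """Build a disambiguation prompt.

    `candidates` maps each candidate lemma to a list of its dictionary
    translations (Finnish in the abstract's setup).
    """
    lines = [
        f"Sentence: {sentence}",
        f"Ambiguous word: {token}",
        "Candidate lemmas with Finnish translations:",
    ]
    # Number the candidates so the answer can be a single index.
    for i, (lemma, translations) in enumerate(candidates.items(), start=1):
        lines.append(f"{i}. {lemma}: {', '.join(translations)}")
    lines.append("Answer with the number of the lemma that fits the sentence.")
    return "\n".join(lines)


# Placeholder inputs; real use would supply an Erzya or Skolt Sami sentence
# and lemmas looked up in a dictionary.
prompt = build_prompt(
    "<sentence>",
    "<word>",
    {"lemma_a": ["talo"], "lemma_b": ["kala", "kalastaa"]},
)
print(prompt)
```

The resulting string would then be sent to the chat model, and the returned index compared against a gold lemma to compute accuracy.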

Predicting Sustainable Development Goals Using Course Descriptions -- from LLMs to Conventional Foundation Models

Feb 26, 2024

Lev Kharlashkin, Melany Macias, Leo Huovinen, Mika Hämäläinen

Abstract: We present our work on predicting United Nations sustainable development goals (SDGs) for university courses. We use an LLM named PaLM 2 to generate training data given a noisy human-authored course description as input. We use this data to train several different smaller language models to predict SDGs for university courses. This work contributes to better university-level adaptation of SDGs. The best performing model in our experiments was BART with an F1-score of 0.786.

* 3 figures, 2 tables

Sentiment Analysis Using Aligned Word Embeddings for Uralic Languages

May 24, 2023

Khalid Alnajjar, Mika Hämäläinen, Jack Rueter

Abstract: In this paper, we present an approach for translating word embeddings from a majority language into 4 minority languages: Erzya, Moksha, Udmurt and Komi-Zyrian. Furthermore, we align these word embeddings and present a novel neural network model that is trained on English data to conduct sentiment analysis and then applied to endangered language data through the aligned word embeddings. To test our model, we annotated a small sentiment analysis corpus for the 4 endangered languages and Finnish. Our method reached at least 56% accuracy for each endangered language. The models and the sentiment corpus will be released together with this paper. Our research shows that state-of-the-art neural models can be used with endangered languages, with the only requirement being a dictionary between the endangered language and a majority language.

* Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023)

Ring That Bell: A Corpus and Method for Multimodal Metaphor Detection in Videos

Dec 15, 2022

Khalid Alnajjar, Mika Hämäläinen, Shuo Zhang

Abstract: We present the first openly available multimodal metaphor-annotated corpus. The corpus consists of videos, including audio and subtitles, that have been annotated by experts. Furthermore, we present a method for detecting metaphors in the new dataset based on the textual content of the videos. The method achieves a high F1-score (62%) for metaphorical labels. We also experiment with other modalities and multimodal methods; however, these methods did not outperform the text-based model. In our error analysis, we do identify cases where video could help in disambiguating metaphors, but the visual cues are too subtle for our model to capture. The data is available on Zenodo.

* Figlang 2022

Modern French Poetry Generation with RoBERTa and GPT-2

Dec 06, 2022

Mika Hämäläinen, Khalid Alnajjar, Thierry Poibeau

Abstract: We present a novel neural model for modern poetry generation in French. The model consists of two pretrained neural models that are fine-tuned for the poem generation task. The encoder of the model is RoBERTa-based, while the decoder is based on GPT-2. This way the model can benefit from the superior natural language understanding performance of RoBERTa and the good natural language generation performance of GPT-2. Our evaluation shows that the model can create French poetry successfully. On a 5-point scale, the lowest score of 3.57 was given by human judges to typicality and emotionality of the output poetry, while the best score of 3.79 was given to understandability.

* ICCC 2022
