Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Elisabeth André

UNIA

CarMem: Enhancing Long-Term Memory in LLM Voice Assistants through Category-Bounding

Jan 16, 2025

Johannes Kirmayr, Lukas Stappen, Phillip Schneider, Florian Matthes, Elisabeth André

Figure 1 for CarMem: Enhancing Long-Term Memory in LLM Voice Assistants through Category-Bounding

Figure 2 for CarMem: Enhancing Long-Term Memory in LLM Voice Assistants through Category-Bounding

Figure 3 for CarMem: Enhancing Long-Term Memory in LLM Voice Assistants through Category-Bounding

Figure 4 for CarMem: Enhancing Long-Term Memory in LLM Voice Assistants through Category-Bounding

Abstract:In today's assistant landscape, personalisation enhances interactions, fosters long-term relationships, and deepens engagement. However, many systems struggle with retaining user preferences, leading to repetitive user requests and disengagement. Furthermore, the unregulated and opaque extraction of user preferences in industry applications raises significant concerns about privacy and trust, especially in regions with stringent regulations like Europe. In response to these challenges, we propose a long-term memory system for voice assistants, structured around predefined categories. This approach leverages Large Language Models to efficiently extract, store, and retrieve preferences within these categories, ensuring both personalisation and transparency. We also introduce a synthetic multi-turn, multi-session conversation dataset (CarMem), grounded in real industry data, tailored to an in-car voice assistant setting. Benchmarked on the dataset, our system achieves an F1-score of .78 to .95 in preference extraction, depending on category granularity. Our maintenance strategy reduces redundant preferences by 95% and contradictory ones by 92%, while the accuracy of optimal retrieval is at .87. Collectively, the results demonstrate the system's suitability for industrial applications.

* Accepted for presentation at the International Conference on Computational Linguistics (COLING 2025)

Via

Access Paper or Ask Questions

A Pre-Trained Graph-Based Model for Adaptive Sequencing of Educational Documents

Nov 18, 2024

Jean Vassoyan, Anan Schütt, Jill-Jênn Vie, Arun-Balajiee Lekshmi-Narayanan, Elisabeth André, Nicolas Vayatis

Abstract:Massive Open Online Courses (MOOCs) have greatly contributed to making education more accessible. However, many MOOCs maintain a rigid, one-size-fits-all structure that fails to address the diverse needs and backgrounds of individual learners. Learning path personalization aims to address this limitation, by tailoring sequences of educational content to optimize individual student learning outcomes. Existing approaches, however, often require either massive student interaction data or extensive expert annotation, limiting their broad application. In this study, we introduce a novel data-efficient framework for learning path personalization that operates without expert annotation. Our method employs a flexible recommender system pre-trained with reinforcement learning on a dataset of raw course materials. Through experiments on semi-synthetic data, we show that this pre-training stage substantially improves data-efficiency in a range of adaptive learning scenarios featuring new educational materials. This opens up new perspectives for the design of foundation models for adaptive learning.

* NeurIPS 2024 Workshop on Large Foundation Models for Educational Assessment (FM-Assess), Dec 2024, Vancouver, Canada

Via

Access Paper or Ask Questions

VoiceX: A Text-To-Speech Framework for Custom Voices

Aug 22, 2024

Silvan Mertes, Daksitha Withanage Don, Otto Grothe, Johanna Kuch, Ruben Schlagowski, Elisabeth André

Abstract:Modern TTS systems are capable of creating highly realistic and natural-sounding speech. Despite these developments, the process of customizing TTS voices remains a complex task, mostly requiring the expertise of specialists within the field. One reason for this is the utilization of deep learning models, which are characterized by their expansive, non-interpretable parameter spaces, restricting the feasibility of manual customization. In this paper, we present a novel human-in-the-loop paradigm based on an evolutionary algorithm for directly interacting with the parameter space of a neural TTS model. We integrated our approach into a user-friendly graphical user interface that allows users to efficiently create original voices. Those voices can then be used with the backbone TTS model, for which we provide a Python API. Further, we present the results of a user study exploring the capabilities of VoiceX. We show that VoiceX is an appropriate tool for creating individual, custom voices.

Via

Access Paper or Ask Questions

Recognizing Emotion Regulation Strategies from Human Behavior with Large Language Models

Aug 08, 2024

Philipp Müller, Alexander Heimerl, Sayed Muddashir Hossain, Lea Siegel, Jan Alexandersson, Patrick Gebhard, Elisabeth André, Tanja Schneeberger

Abstract:Human emotions are often not expressed directly, but regulated according to internal processes and social display rules. For affective computing systems, an understanding of how users regulate their emotions can be highly useful, for example to provide feedback in job interview training, or in psychotherapeutic scenarios. However, at present no method to automatically classify different emotion regulation strategies in a cross-user scenario exists. At the same time, recent studies showed that instruction-tuned Large Language Models (LLMs) can reach impressive performance across a variety of affect recognition tasks such as categorical emotion recognition or sentiment analysis. While these results are promising, it remains unclear to what extent the representational power of LLMs can be utilized in the more subtle task of classifying users' internal emotion regulation strategy. To close this gap, we make use of the recently introduced \textsc{Deep} corpus for modeling the social display of the emotion shame, where each point in time is annotated with one of seven different emotion regulation classes. We fine-tune Llama2-7B as well as the recently introduced Gemma model using Low-rank Optimization on prompts generated from different sources of information on the \textsc{Deep} corpus. These include verbal and nonverbal behavior, person factors, as well as the results of an in-depth interview after the interaction. Our results show, that a fine-tuned Llama2-7B LLM is able to classify the utilized emotion regulation strategy with high accuracy (0.84) without needing access to data from post-interaction interviews. This represents a significant improvement over previous approaches based on Bayesian Networks and highlights the importance of modeling verbal behavior in emotion regulation.

* Accepted to ACII'24

Via

Access Paper or Ask Questions

DISCOVER: A Data-driven Interactive System for Comprehensive Observation, Visualization, and ExploRation of Human Behaviour

Jul 18, 2024

Dominik Schiller, Tobias Hallmen, Daksitha Withanage Don, Elisabeth André, Tobias Baur

Abstract:Understanding human behavior is a fundamental goal of social sciences, yet its analysis presents significant challenges. Conventional methodologies employed for the study of behavior, characterized by labor-intensive data collection processes and intricate analyses, frequently hinder comprehensive exploration due to their time and resource demands. In response to these challenges, computational models have proven to be promising tools that help researchers analyze large amounts of data by automatically identifying important behavioral indicators, such as social signals. However, the widespread adoption of such state-of-the-art computational models is impeded by their inherent complexity and the substantial computational resources necessary to run them, thereby constraining accessibility for researchers without technical expertise and adequate equipment. To address these barriers, we introduce DISCOVER -- a modular and flexible, yet user-friendly software framework specifically developed to streamline computational-driven data exploration for human behavior analysis. Our primary objective is to democratize access to advanced computational methodologies, thereby enabling researchers across disciplines to engage in detailed behavioral analysis without the need for extensive technical proficiency. In this paper, we demonstrate the capabilities of DISCOVER using four exemplary data exploration workflows that build on each other: Interactive Semantic Content Exploration, Visual Inspection, Aided Annotation, and Multimodal Scene Search. By illustrating these workflows, we aim to emphasize the versatility and accessibility of DISCOVER as a comprehensive framework and propose a set of blueprints that can serve as a general starting point for exploratory data analysis.

Via

Access Paper or Ask Questions

Socially Interactive Agents for Robotic Neurorehabilitation Training: Conceptualization and Proof-of-concept Study

Jun 17, 2024

Rhythm Arora, Pooja Prajod, Matteo Lavit Nicora, Daniele Panzeri, Giovanni Tauro, Rocco Vertechy, Matteo Malosio, Elisabeth André, Patrick Gebhard

Abstract:Individuals with diverse motor abilities often benefit from intensive and specialized rehabilitation therapies aimed at enhancing their functional recovery. Nevertheless, the challenge lies in the restricted availability of neurorehabilitation professionals, hindering the effective delivery of the necessary level of care. Robotic devices hold great potential in reducing the dependence on medical personnel during therapy but, at the same time, they generally lack the crucial human interaction and motivation that traditional in-person sessions provide. To bridge this gap, we introduce an AI-based system aimed at delivering personalized, out-of-hospital assistance during neurorehabilitation training. This system includes a rehabilitation training device, affective signal classification models, training exercises, and a socially interactive agent as the user interface. With the assistance of a professional, the envisioned system is designed to be tailored to accommodate the unique rehabilitation requirements of an individual patient. Conceptually, after a preliminary setup and instruction phase, the patient is equipped to continue their rehabilitation regimen autonomously in the comfort of their home, facilitated by a socially interactive agent functioning as a virtual coaching assistant. Our approach involves the integration of an interactive socially-aware virtual agent into a neurorehabilitation robotic framework, with the primary objective of recreating the social aspects inherent to in-person rehabilitation sessions. We also conducted a feasibility study to test the framework with healthy patients. The results of our preliminary investigation indicate that participants demonstrated a propensity to adapt to the system. Notably, the presence of the interactive agent during the proposed exercises did not act as a source of distraction; instead, it positively impacted users' engagement.

Via

Access Paper or Ask Questions

Faces of Experimental Pain: Transferability of Deep Learned Heat Pain Features to Electrical Pain

Jun 17, 2024

Pooja Prajod, Dominik Schiller, Daksitha Withanage Don, Elisabeth André

Figure 1 for Faces of Experimental Pain: Transferability of Deep Learned Heat Pain Features to Electrical Pain

Figure 2 for Faces of Experimental Pain: Transferability of Deep Learned Heat Pain Features to Electrical Pain

Figure 3 for Faces of Experimental Pain: Transferability of Deep Learned Heat Pain Features to Electrical Pain

Figure 4 for Faces of Experimental Pain: Transferability of Deep Learned Heat Pain Features to Electrical Pain

Abstract:The limited size of pain datasets are a challenge in developing robust deep learning models for pain recognition. Transfer learning approaches are often employed in these scenarios. In this study, we investigate whether deep learned feature representation for one type of experimentally induced pain can be transferred to another. Participating in the AI4Pain challenge, our goal is to classify three levels of pain (No-Pain, Low-Pain, High-Pain). The challenge dataset contains data collected from 65 participants undergoing varying intensities of electrical pain. We utilize the video recording from the dataset to investigate the transferability of deep learned heat pain model to electrical pain. In our proposed approach, we leverage an existing heat pain convolutional neural network (CNN) - trained on BioVid dataset - as a feature extractor. The images from the challenge dataset are inputted to the pre-trained heat pain CNN to obtain feature vectors. These feature vectors are used to train two machine learning models: a simple feed-forward neural network and a long short-term memory (LSTM) network. Our approach was tested using the dataset's predefined training, validation, and testing splits. Our models outperformed the baseline of the challenge on both the validation and tests sets, highlighting the potential of models trained on other pain datasets for reliable feature extraction.

Via

Access Paper or Ask Questions

Relevant Irrelevance: Generating Alterfactual Explanations for Image Classifiers

May 08, 2024

Silvan Mertes, Tobias Huber, Christina Karle, Katharina Weitz, Ruben Schlagowski, Cristina Conati, Elisabeth André

Figure 1 for Relevant Irrelevance: Generating Alterfactual Explanations for Image Classifiers

Figure 2 for Relevant Irrelevance: Generating Alterfactual Explanations for Image Classifiers

Figure 3 for Relevant Irrelevance: Generating Alterfactual Explanations for Image Classifiers

Figure 4 for Relevant Irrelevance: Generating Alterfactual Explanations for Image Classifiers

Abstract:In this paper, we demonstrate the feasibility of alterfactual explanations for black box image classifiers. Traditional explanation mechanisms from the field of Counterfactual Thinking are a widely-used paradigm for Explainable Artificial Intelligence (XAI), as they follow a natural way of reasoning that humans are familiar with. However, most common approaches from this field are based on communicating information about features or characteristics that are especially important for an AI's decision. However, to fully understand a decision, not only knowledge about relevant features is needed, but the awareness of irrelevant information also highly contributes to the creation of a user's mental model of an AI system. To this end, a novel approach for explaining AI systems called alterfactual explanations was recently proposed on a conceptual level. It is based on showing an alternative reality where irrelevant features of an AI's input are altered. By doing so, the user directly sees which input data characteristics can change arbitrarily without influencing the AI's decision. In this paper, we show for the first time that it is possible to apply this idea to black box models based on neural networks. To this end, we present a GAN-based approach to generate these alterfactual explanations for binary image classifiers. Further, we present a user study that gives interesting insights on how alterfactual explanations can complement counterfactual explanations.

* Accepted at IJCAI 2024. arXiv admin note: text overlap with arXiv:2207.09374

Via

Access Paper or Ask Questions

Stressor Type Matters! -- Exploring Factors Influencing Cross-Dataset Generalizability of Physiological Stress Detection

May 06, 2024

Pooja Prajod, Bhargavi Mahesh, Elisabeth André

Figure 1 for Stressor Type Matters! -- Exploring Factors Influencing Cross-Dataset Generalizability of Physiological Stress Detection

Figure 2 for Stressor Type Matters! -- Exploring Factors Influencing Cross-Dataset Generalizability of Physiological Stress Detection

Figure 3 for Stressor Type Matters! -- Exploring Factors Influencing Cross-Dataset Generalizability of Physiological Stress Detection

Figure 4 for Stressor Type Matters! -- Exploring Factors Influencing Cross-Dataset Generalizability of Physiological Stress Detection

Abstract:Automatic stress detection using heart rate variability (HRV) features has gained significant traction as it utilizes unobtrusive wearable sensors measuring signals like electrocardiogram (ECG) or blood volume pulse (BVP). However, detecting stress through such physiological signals presents a considerable challenge owing to the variations in recorded signals influenced by factors, such as perceived stress intensity and measurement devices. Consequently, stress detection models developed on one dataset may perform poorly on unseen data collected under different conditions. To address this challenge, this study explores the generalizability of machine learning models trained on HRV features for binary stress detection. Our goal extends beyond evaluating generalization performance; we aim to identify the characteristics of datasets that have the most significant influence on generalizability. We leverage four publicly available stress datasets (WESAD, SWELL-KW, ForDigitStress, VerBIO) that vary in at least one of the characteristics such as stress elicitation techniques, stress intensity, and sensor devices. Employing a cross-dataset evaluation approach, we explore which of these characteristics strongly influence model generalizability. Our findings reveal a crucial factor affecting model generalizability: stressor type. Models achieved good performance across datasets when the type of stressor (e.g., social stress in our case) remains consistent. Factors like stress intensity or brand of the measurement device had minimal impact on cross-dataset performance. Based on our findings, we recommend matching the stressor type when deploying HRV-based stress models in new environments. To the best of our knowledge, this is the first study to systematically investigate factors influencing the cross-dataset applicability of HRV-based stress models.

Via

Access Paper or Ask Questions

Unimodal Multi-Task Fusion for Emotional Mimicry Prediction

Mar 22, 2024

Tobias Hallmen, Fabian Deuser, Norbert Oswald, Elisabeth André

Abstract:In this study, we propose a methodology for the Emotional Mimicry Intensity (EMI) Estimation task within the context of the 6th Workshop and Competition on Affective Behavior Analysis in-the-wild. Our approach leverages the Wav2Vec 2.0 framework, pre-trained on a comprehensive podcast dataset, to extract a broad range of audio features encompassing both linguistic and paralinguistic elements. We enhance feature representation through a fusion technique that integrates individual features with a global mean vector, introducing global contextual insights into our analysis. Additionally, we incorporate a pre-trained valence-arousal-dominance (VAD) module from the Wav2Vec 2.0 model. Our fusion employs a Long Short-Term Memory (LSTM) architecture for efficient temporal analysis of audio data. Utilizing only the provided audio data, our approach demonstrates significant improvements over the established baseline.

Via

Access Paper or Ask Questions