Theresa Pekarek Rosin

A Framework for Adapting Human-Robot Interaction to Diverse User Groups

Oct 15, 2024

Theresa Pekarek Rosin, Vanessa Hassouna, Xiaowen Sun, Luca Krohm, Henri-Leon Kordt, Michael Beetz, Stefan Wermter

Abstract: To facilitate natural and intuitive interactions with diverse user groups in real-world settings, social robots must be capable of addressing the varying requirements and expectations of these groups while adapting their behavior based on user feedback. While previous research often focuses on specific demographics, we present a novel framework for adaptive Human-Robot Interaction (HRI) that tailors interactions to different user groups and enables individual users to modulate interactions through both minor and major interruptions. Our primary contributions include the development of an adaptive, ROS-based HRI framework with an open-source code base. This framework supports natural interactions through advanced speech recognition and voice activity detection, and leverages a large language model (LLM) as a dialogue bridge. We validate the efficiency of our framework through module tests and system trials, demonstrating its high accuracy in age recognition and its robustness to repeated user inputs and plan changes.

* Accepted at the 16th International Conference on Social Robotics (ICSR) 2024

Bring the Noise: Introducing Noise Robustness to Pretrained Automatic Speech Recognition

Sep 05, 2023

Patrick Eickhoff, Matthias Möller, Theresa Pekarek Rosin, Johannes Twiefel, Stefan Wermter

Abstract: In recent research in the domain of speech processing, large End-to-End (E2E) systems for Automatic Speech Recognition (ASR) have reported state-of-the-art performance on various benchmarks. These systems intrinsically learn how to handle and remove noise conditions from speech. Previous research has shown that it is possible to extract the denoising capabilities of these models into a preprocessor network, which can be used as a frontend for downstream ASR models. However, the proposed methods were limited to specific fully convolutional architectures. In this work, we propose a novel method to extract the denoising capabilities that can be applied to any encoder-decoder architecture. We propose the Cleancoder preprocessor architecture that extracts hidden activations from the Conformer ASR model and feeds them to a decoder to predict denoised spectrograms. We train our preprocessor on the Noisy Speech Database (NSD) to reconstruct denoised spectrograms from noisy inputs. Then, we evaluate our model as a frontend to a pretrained Conformer ASR model as well as a frontend to train smaller Conformer ASR models from scratch. We show that the Cleancoder is able to filter noise from speech and that it improves the total Word Error Rate (WER) of the downstream model in noisy conditions for both applications.

* Submitted and accepted for ICANN 2023 (32nd International Conference on Artificial Neural Networks)
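
For readers unfamiliar with the Word Error Rate (WER) metric mentioned in the abstract above, the following is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the paper's own evaluation code and text normalization are not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name); tokenization is a plain
    whitespace split, which is an assumption, not the paper's setup.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution over four reference words gives a WER of 0.25.
example_wer = word_error_rate("turn on the light", "turn off the light")
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, since the denominator counts only reference words.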

Replay to Remember: Continual Layer-Specific Fine-tuning for German Speech Recognition

Jul 14, 2023

Theresa Pekarek Rosin, Stefan Wermter

Abstract: While Automatic Speech Recognition (ASR) models have shown significant advances with the introduction of unsupervised or self-supervised training techniques, these improvements are still only limited to a subsection of languages and speakers. Transfer learning enables the adaptation of large-scale multilingual models to not only low-resource languages but also to more specific speaker groups. However, fine-tuning on data from new domains is usually accompanied by a decrease in performance on the original domain. Therefore, in our experiments, we examine how well the performance of large-scale ASR models can be approximated for smaller domains, with our own dataset of German Senior Voice Commands (SVC-de), and how much of the general speech recognition performance can be preserved by selectively freezing parts of the model during training. To further increase the robustness of the ASR model to vocabulary and speakers outside of the fine-tuned domain, we apply Experience Replay for continual learning. By adding only a fraction of data from the original domain, we are able to reach Word-Error-Rates (WERs) below 5% on the new domain, while stabilizing performance for general speech recognition at acceptable WERs.

* 13 pages, 7 figures, submitted to ICANN 2023
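
The Experience Replay idea in the abstract above, fine-tuning on new-domain data while mixing in a fraction of original-domain data, can be sketched roughly as follows. This is an illustration of the general technique only, not the authors' pipeline: the function name `mix_with_replay`, the `replay_fraction` parameter, its interpretation as a share of the original-domain set, and uniform random sampling are all assumptions.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mix_with_replay(
    new_domain: Sequence[T],
    original_domain: Sequence[T],
    replay_fraction: float,
    seed: int = 0,
) -> List[T]:
    """Build a fine-tuning set from new-domain data plus replayed samples.

    Hypothetical helper: `replay_fraction` is assumed to be the share of
    the original-domain set that is sampled uniformly and replayed.
    """
    if not 0.0 <= replay_fraction <= 1.0:
        raise ValueError("replay_fraction must lie in [0, 1]")

    rng = random.Random(seed)
    n_replay = round(replay_fraction * len(original_domain))
    # Sample without replacement from the original domain.
    replayed = rng.sample(list(original_domain), n_replay)

    # Shuffle so replayed samples are interleaved with new-domain samples.
    mixed = list(new_domain) + replayed
    rng.shuffle(mixed)
    return mixed


# Four new-domain items plus a quarter of eight original-domain items.
new_items = [f"new{i}" for i in range(4)]
old_items = [f"old{i}" for i in range(8)]
training_set = mix_with_replay(new_items, old_items, replay_fraction=0.25)
```

In an actual ASR setup the items would be audio-transcript pairs, and the mixing would typically be combined with the selective layer freezing the abstract mentions; neither is modeled in this sketch.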
