Thomas Weber

NVP-HRI: Zero Shot Natural Voice and Posture-based Human-Robot Interaction via Large Language Model

Mar 12, 2025

Yuzhi Lai, Shenghai Yuan, Youssef Nassar, Mingyu Fan, Thomas Weber, Matthias Rätsch

Abstract: Effective Human-Robot Interaction (HRI) is crucial for future service robots in aging societies. Existing solutions are biased toward only well-trained objects, creating a gap when dealing with new objects. Currently, HRI systems using predefined gestures or language tokens for pretrained objects pose challenges for all individuals, especially elderly ones. These challenges include difficulties in recalling commands, memorizing hand gestures, and learning new names. This paper introduces NVP-HRI, an intuitive multi-modal HRI paradigm that combines voice commands and deictic posture. NVP-HRI utilizes the Segment Anything Model (SAM) to analyze visual cues and depth data, enabling precise structural object representation. Through a pre-trained SAM network, NVP-HRI allows interaction with new objects via zero-shot prediction, even without prior knowledge. NVP-HRI also integrates with a large language model (LLM) for multimodal commands, coordinating them with object selection and scene distribution in real time for collision-free trajectory solutions. We also regulate the action sequence with the essential control syntax to reduce LLM hallucination risks. The evaluation of diverse real-world tasks using a Universal Robot showcased up to 59.2% efficiency improvement over traditional gesture control, as illustrated in the video https://youtu.be/EbC7al2wiAc. Our code and design will be openly available at https://github.com/laiyuzhi/NVP-HRI.git.

* This work has been accepted for publication in ESWA @ 2025 Elsevier. Personal use of this material is permitted. Permission from Elsevier must be obtained for all other uses, including reprinting/redistribution, creating new works, or reuse of any copyrighted components of this work in other media

A Realistic Collimated X-Ray Image Simulation Pipeline

Nov 15, 2024

Benjamin El-Zein, Dominik Eckert, Thomas Weber, Maximilian Rohleder, Ludwig Ritschl, Steffen Kappler, Andreas Maier

Abstract: Collimator detection remains a challenging task in X-ray systems with unreliable or unavailable information about the detector's position relative to the source. This paper presents a physically motivated image processing pipeline for simulating the characteristics of collimator shadows in X-ray images. By generating randomized labels for collimator shapes and locations, incorporating scattered radiation simulation, and including Poisson noise, the pipeline enables the expansion of limited datasets for training deep neural networks. We validate the proposed pipeline by a qualitative and quantitative comparison against real collimator shadows. Furthermore, it is demonstrated that utilizing simulated data within our deep learning framework not only serves as a suitable substitute for actual collimators but also enhances the generalization performance when applied to real-world data.

Ethically aligned Deep Learning: Unbiased Facial Aesthetic Prediction

Nov 09, 2021

Michael Danner, Thomas Weber, Leping Peng, Tobias Gerlach, Xueping Su, Matthias Rätsch

Abstract: Facial beauty prediction (FBP) aims to develop a machine that automatically assesses facial attractiveness. In the past, those results were highly correlated with human ratings, and therefore also with the annotators' bias. As artificial intelligence can have racist and discriminatory tendencies, the cause of skews in the data must be identified. Development of training data and AI algorithms that are robust against biased information is a new challenge for scientists. As aesthetic judgement usually is biased, we want to take it one step further and propose an Unbiased Convolutional Neural Network for FBP. While it is possible to create network models that can rate attractiveness of faces on a high level, from an ethical point of view, it is equally important to make sure the model is unbiased. In this work, we introduce AestheticNet, a state-of-the-art attractiveness prediction network, which significantly outperforms competitors with a Pearson Correlation of 0.9601. Additionally, we propose a new approach for generating a bias-free CNN to improve fairness in machine learning.

* Peer reviewed and accepted at CEPE/IACAP 2021 as Extended Abstract
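
The abstract above reports model quality as a Pearson correlation between predicted and human attractiveness ratings. As a minimal sketch of how that metric is computed, the function below implements the standard sample Pearson coefficient in plain Python. The function name and the toy score lists are illustrative assumptions; this is not the authors' evaluation code, and the numbers are not from the paper.

```python
import math


def pearson_correlation(xs, ys):
    """Return the sample Pearson correlation coefficient of two sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances share the same 1/(n-1) factor, so it cancels.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: human ratings vs. model predictions.
human_ratings = [1.0, 2.0, 3.0, 4.0, 5.0]
model_scores = [1.2, 1.9, 3.2, 3.8, 5.1]
r = pearson_correlation(human_ratings, model_scores)
```

A value near 1 means the predictions rank and scale faces much like the human raters did, which is also why a high score can inherit the raters' bias.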

Discovering key topics from short, real-world medical inquiries via natural language processing and unsupervised learning

Dec 08, 2020

Angelo Ziletti, Christoph Berns, Oliver Treichel, Thomas Weber, Jennifer Liang, Stephanie Kammerath, Marion Schwaerzler, Jagatheswari Virayah, David Ruau, Xin Ma(+1 more)

Abstract: Millions of unsolicited medical inquiries are received by pharmaceutical companies every year. It has been hypothesized that these inquiries represent a treasure trove of information, potentially giving insight into matters regarding medicinal products and the associated medical treatments. However, due to the large volume and specialized nature of the inquiries, it is difficult to perform timely, recurrent, and comprehensive analyses. Here, we propose a machine learning approach based on natural language processing and unsupervised learning to automatically discover key topics in real-world medical inquiries from customers. This approach requires neither ontologies nor annotations. The discovered topics are meaningful and medically relevant, as judged by medical information specialists, thus demonstrating that unsolicited medical inquiries are a source of valuable customer insights. Our work paves the way for the machine-learning-driven analysis of medical inquiries in the pharmaceutical industry, which ultimately aims at improving patient care.

Metapath- and Entity-aware Graph Neural Network for Recommendation

Oct 22, 2020

Zhiwei Han, Muhammad Umer Anwaar, Shyam Arumugaswamy, Thomas Weber, Tianming Qiu, Hao Shen, Yuanting Liu, Martin Kleinsteuber

Abstract: Due to their shallow structure, classic graph neural networks (GNNs) fail to model high-order graph structures that deliver critical insights into task-relevant relations. Neglecting those insights leads to insufficient distillation of collaborative signals in recommender systems. In this paper, we propose PEAGNN, a unified GNN framework tailored for recommendation tasks, which is capable of exploiting the rich semantics in metapaths. PEAGNN trains multilayer GNNs to perform metapath-aware information aggregation on collaborative subgraphs, $h$-hop subgraphs around the target user-item pairs. After the attentive fusion of aggregated information from different metapaths, a graph-level representation is then extracted for matching score prediction. To leverage the local structure of collaborative subgraphs, we present entity-awareness that regularizes node embedding with the presence of features in a contrastive manner. Moreover, PEAGNN is compatible with mainstream GNN structures such as GCN, GAT and GraphSage. The empirical analysis on three public datasets demonstrates that our model outperforms or is at least on par with other competitive baselines. Further analysis indicates that trained PEAGNN automatically derives meaningful metapath combinations from the given metapaths.
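
The abstract above describes collaborative subgraphs as $h$-hop subgraphs around a target user-item pair. As a rough illustration of that idea only, the sketch below collects every node within `h` hops of a seed pair using a plain breadth-first search. The adjacency-dict representation, the function name `h_hop_subgraph`, and the toy graph are assumptions made for this example; the abstract does not state how PEAGNN actually constructs its subgraphs.

```python
from collections import deque


def h_hop_subgraph(adjacency, seeds, h):
    """Return (nodes, edges) of the subgraph within h hops of any seed node.

    adjacency: dict mapping each node to an iterable of neighbouring nodes.
    seeds:     iterable of start nodes, e.g. a (user, item) pair.
    h:         maximum hop distance from the nearest seed.
    """
    distance = {seed: 0 for seed in seeds}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        if distance[node] == h:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    nodes = set(distance)
    # Keep only edges whose endpoints both lie inside the subgraph.
    edges = {(u, v) for u in nodes for v in adjacency.get(u, ()) if v in nodes}
    return nodes, edges


# Hypothetical toy user-item interaction graph (undirected, stored both ways).
toy_graph = {
    "u1": ["i1", "i2"],
    "u2": ["i1", "i3"],
    "u4": ["i3"],
    "i1": ["u1", "u2"],
    "i2": ["u1"],
    "i3": ["u2", "u4"],
}
one_hop_nodes, one_hop_edges = h_hop_subgraph(toy_graph, ("u1", "i1"), 1)
two_hop_nodes, _ = h_hop_subgraph(toy_graph, ("u1", "i1"), 2)
```

With `h = 1` the toy subgraph holds the pair plus their direct neighbours; raising `h` pulls in more distant users and items, which is the kind of higher-order structure the abstract says shallow GNNs miss.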

Encoder-decoder semantic segmentation models for electroluminescence images of thin-film photovoltaic modules

Oct 15, 2020

Evgenii Sovetkin, Elbert Jan Achterberg, Thomas Weber, Bart E. Pieters

Abstract: We consider a series of image segmentation methods based on deep neural networks to perform semantic segmentation of electroluminescence (EL) images of thin-film modules. We utilize the encoder-decoder deep neural network architecture. The framework is general, so it can easily be extended to other types of images (e.g. thermography) or solar cell technologies (e.g. crystalline silicon modules). The networks are trained and tested on a sample of images from a database with 6000 EL images of Copper Indium Gallium Diselenide (CIGS) thin-film modules. We selected two types of features to extract: shunts and so-called "droplets". The latter feature is often observed in the set of images. Several models are tested using various combinations of encoder-decoder layers, and a procedure is proposed to select the best model. We show exemplary results with the best selected model. Furthermore, we applied the best model to the full set of 6000 images and demonstrate that the automated segmentation of EL images can reveal many subtle features which cannot be inferred from studying a small sample of images. We believe these features can contribute to process optimization and quality control.

Methodology to analyze the accuracy of 3D objects reconstructed with collaborative robot based monocular LSD-SLAM

Mar 06, 2018

Sergey Triputen, Atmaraaj Gopal, Thomas Weber, Christian Hofert, Kristiaan Schreve, Matthias Ratsch

Abstract: SLAM systems are mainly applied for robot navigation, while research on the feasibility of motion planning with SLAM for tasks like bin-picking is scarce. Accurate 3D reconstruction of objects and environments is important for planning motion and computing the optimal gripper pose to grasp objects. In this work, we propose methods to analyze the accuracy of a 3D environment reconstructed using an LSD-SLAM system with a monocular camera mounted onto the gripper of a collaborative robot. We discuss and propose a solution to the pose space conversion problem. Finally, we present several criteria to analyze the 3D reconstruction accuracy. These could be used as guidelines to improve the accuracy of 3D reconstructions with monocular LSD-SLAM and other SLAM-based solutions.

* 5 pages, 5 figures, 2018 International Conference on Intelligent Autonomous Systems (ICoIAS 2018)
