Eckehard Steinbach

Technical Report for Egocentric Mistake Detection for the HoloAssist Challenge

Jun 06, 2025

Constantin Patsch, Marsil Zakour, Yuankai Wu, Eckehard Steinbach

Abstract: In this report, we address the task of online mistake detection, which is vital in domains such as industrial automation and education, where real-time video analysis allows human operators to correct errors as they occur. While previous work focuses on procedural errors involving action order, real-world use requires handling broader error types. We introduce an online mistake detection framework that handles both procedural and execution errors (e.g., motor slips or tool misuse). Upon detecting an error, we use a large language model (LLM) to generate explanatory feedback. Experiments on the HoloAssist benchmark confirm the effectiveness of our approach, which placed second in the mistake detection task.

FARE: A Deep Learning-Based Framework for Radar-based Face Recognition and Out-of-distribution Detection

Jan 14, 2025

Sabri Mustafa Kahya, Boran Hamdi Sivrikaya, Muhammet Sami Yavuz, Eckehard Steinbach

Abstract: In this work, we propose a novel pipeline for face recognition and out-of-distribution (OOD) detection using short-range FMCW radar. The proposed system uses Range-Doppler images and micro Range-Doppler images. The architecture features a primary path (PP) responsible for classifying in-distribution (ID) faces, complemented by intermediate paths (IPs) dedicated to OOD detection. The network is trained in two stages: first, the PP is trained with a triplet loss to optimize ID face classification; second, the PP is frozen and the IPs, which consist of simple linear autoencoder networks, are trained specifically for OOD detection. On our dataset, collected with a 60 GHz FMCW radar, our method achieves an ID classification accuracy of 99.30% and an OOD detection AUROC of 96.91%.

* Accepted at ICASSP 2025
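
As a rough illustration of FARE's first training stage, the sketch below implements the standard hinge-form triplet loss on embedding vectors. The Euclidean distance, the margin of 1.0 and the function names are assumptions: the abstract does not state which distance or margin FARE uses, and a real implementation would operate on batches of network embeddings rather than Python lists.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,  # assumed value, not taken from the paper
) -> float:
    """Hinge triplet loss max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative (a different face) is at least `margin`
    farther from the anchor than the positive (the same face).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Negative already far enough away: no loss.
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))  # 0.0
# Negative too close to the anchor: a positive loss pushes it away.
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 1.5]))  # 0.5
```

Minimizing this loss pulls embeddings of the same face together and pushes different faces apart, which is what lets a frozen primary path serve as a fixed feature extractor in the second stage.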

FERT: Real-Time Facial Expression Recognition with Short-Range FMCW Radar

Nov 18, 2024

Sabri Mustafa Kahya, Muhammet Sami Yavuz, Eckehard Steinbach

Abstract: This study proposes a novel approach for real-time facial expression recognition using a short-range Frequency-Modulated Continuous-Wave (FMCW) radar equipped with one transmit (Tx) and three receive (Rx) antennas. The system leverages four distinct modalities simultaneously: Range-Doppler images (RDIs), micro range-Doppler images (micro-RDIs), range-azimuth images (RAIs), and range-elevation images (REIs). Our architecture integrates feature extractor blocks, intermediate feature extractor blocks, and a ResNet block to classify facial expressions into smile, anger, neutral, and no-face classes. Our model achieves an average classification accuracy of 98.91% on the dataset collected with a 60 GHz short-range FMCW radar. The proposed solution operates in real time and in a person-independent manner, which demonstrates the potential of low-cost FMCW radars for facial expression recognition in various applications.

* Accepted at IEEE SENSORS 2024

LEMON: Localized Editing with Mesh Optimization and Neural Shaders

Sep 18, 2024

Furkan Mert Algan, Umut Yazgan, Driton Salihu, Cem Eteke, Eckehard Steinbach

Abstract: In practical use cases, editing an existing polygonal mesh can be faster than generating a new one, but it can still be challenging and time-consuming for users. Existing solutions to this problem tend to focus on a single task, either geometry or novel view synthesis, which often leads to disjointed results between the mesh and the rendered views. In this work, we propose LEMON, a mesh editing pipeline that combines neural deferred shading with localized mesh optimization. Our approach begins by identifying the most important vertices in the mesh for editing, using a segmentation model to focus on these key regions. Given multi-view images of an object, we optimize a neural shader and a polygonal mesh while extracting the normal map and the rendered image from each view. Using these outputs as conditioning data, we edit the input images with a text-to-image diffusion model and iteratively update our dataset while deforming the mesh. This process yields a polygonal mesh edited according to the given text instruction, preserving the geometric characteristics of the initial mesh while focusing on the most significant areas. We evaluate our pipeline on the DTU dataset, demonstrating that it generates finely edited meshes more rapidly than current state-of-the-art methods. We include our code and additional results in the supplementary material.

FOOD: Facial Authentication and Out-of-Distribution Detection with Short-Range FMCW Radar

Jun 06, 2024

Sabri Mustafa Kahya, Boran Hamdi Sivrikaya, Muhammet Sami Yavuz, Eckehard Steinbach

Abstract: This paper proposes a short-range FMCW radar-based facial authentication and out-of-distribution (OOD) detection framework. Our pipeline jointly estimates the correct classes of in-distribution (ID) samples and detects OOD samples to avoid inaccurate predictions on them. Our reconstruction-based architecture consists of a main convolutional block with a one-encoder, multi-decoder configuration and intermediate linear encoder-decoder parts. Together, these elements form an accurate human face classifier and a robust OOD detector. On our dataset, gathered with a 60 GHz short-range FMCW radar, our network achieves an average classification accuracy of 98.07% in identifying in-distribution human faces. As an OOD detector, it achieves an average area under the receiver operating characteristic curve (AUROC) of 98.50% and an average false positive rate at 95% true positive rate (FPR95) of 6.20%. Our extensive experiments also show that the proposed approach outperforms previous OOD detectors on common OOD detection metrics.

* Accepted at ICIP 2024
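
The two OOD metrics reported for FOOD (and for FARE and HAROOD) can be made concrete with a small sketch. It assumes each sample has a scalar "ID-ness" score where higher means more likely in-distribution (for a reconstruction-based detector, the negative reconstruction error could play that role; this is an assumption, not the paper's stated scoring rule), and it treats ID samples as the positive class, which is one common convention. The function names are hypothetical.

```python
from typing import Sequence


def auroc(id_scores: Sequence[float], ood_scores: Sequence[float]) -> float:
    """AUROC with in-distribution (ID) as the positive class.

    Mann-Whitney formulation: the fraction of (ID, OOD) pairs in which the
    ID sample scores higher, counting ties as half.
    """
    wins = 0.0
    for i in id_scores:
        for o in ood_scores:
            if i > o:
                wins += 1.0
            elif i == o:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


def fpr_at_95_tpr(id_scores: Sequence[float], ood_scores: Sequence[float]) -> float:
    """Fraction of OOD samples accepted as ID at a threshold keeping >= 95% of ID samples."""
    ranked = sorted(id_scores)
    # Taking the k-th smallest ID score with k = floor(0.05 * n) keeps at least
    # 95% of ID samples at or above the threshold (integer math avoids rounding).
    threshold = ranked[(5 * len(ranked)) // 100]
    accepted = sum(1 for o in ood_scores if o >= threshold)
    return accepted / len(ood_scores)


id_scores = [0.9, 0.8, 0.7, 0.6]
ood_scores = [0.1, 0.2, 0.65]
print(round(auroc(id_scores, ood_scores), 4))          # 0.9167
print(round(fpr_at_95_tpr(id_scores, ood_scores), 4))  # 0.3333
```

The pairwise AUROC loop is quadratic and only suitable for illustration; library implementations sort the scores instead. FPR95 also has interpolating variants, so exact values can differ slightly between toolkits on small datasets.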

ADL4D: Towards A Contextually Rich Dataset for 4D Activities of Daily Living

Feb 27, 2024

Marsil Zakour, Partha Pratim Nath, Ludwig Lohmer, Emre Faik Gökçe, Martin Piccolrovazzi, Constantin Patsch, Yuankai Wu, Rahul Chaudhari, Eckehard Steinbach

Abstract: Hand-Object Interactions (HOIs) are conditioned on spatial and temporal contexts such as surrounding objects, previous actions, and future intents (for example, grasping and handover actions vary greatly depending on object proximity and trajectory obstruction). However, existing datasets for 4D HOI (3D HOI over time) are limited to a single subject interacting with a single object. This restricts the generalization of learning-based HOI methods trained on those datasets. We introduce ADL4D, a dataset of up to two subjects interacting with different sets of objects while performing Activities of Daily Living (ADL) such as breakfast or lunch preparation. Transitioning between multiple objects to complete a task over time introduces a unique context that is lacking in existing datasets. Our dataset consists of 75 sequences with a total of 1.1M RGB-D frames, hand and object poses, and per-hand fine-grained action annotations. We develop an automatic system for multi-view, multi-hand 3D pose annotation capable of tracking hand poses over time. We integrate and test it against publicly available datasets. Finally, we evaluate our dataset on the tasks of Hand Mesh Recovery (HMR) and Hand Action Segmentation (HAS).

Adapting Learned Image Codecs to Screen Content via Adjustable Transformations

Feb 27, 2024

H. Burak Dogaroglu, A. Burakhan Koyuncu, Atanas Boev, Elena Alshina, Eckehard Steinbach

Abstract: As learned image codecs (LICs) become more prevalent, their low coding efficiency on out-of-distribution data becomes a bottleneck for some applications. To improve the performance of LICs on screen content (SC) images without breaking backward compatibility, we propose introducing parameterized, invertible linear transformations into the coding pipeline without changing the operation flow of the underlying baseline codec. We design two neural networks that act as prefilters and postfilters in our setup to increase coding efficiency and help recover from coding artifacts. Our end-to-end trained solution achieves up to 10% bitrate savings on SC compression compared to the baseline LICs while introducing only 1% extra parameters.

* 7 pages, 6 figures, 2 tables
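
The property that preserves backward compatibility is invertibility: whatever the prefilter side applies, the postfilter side can undo, so the baseline codec itself never changes. The sketch below illustrates that idea with the simplest invertible linear map, a per-channel scaling. `ChannelScaling`, `code_with_transform` and `identity_codec` are hypothetical names, the identity codec stands in for a real LIC, and the paper additionally uses neural prefilter and postfilter networks trained end to end, which this sketch does not attempt.

```python
import math
from typing import Callable, List

Pixel = List[float]  # one pixel as a list of channel values
Image = List[Pixel]  # a flattened image: list of pixels


class ChannelScaling:
    """Hypothetical invertible linear transform: per-channel scaling y_c = s_c * x_c.

    Scales are parameterized as s_c = exp(log_s_c), so they stay strictly
    positive and the inverse x_c = y_c / s_c always exists.
    """

    def __init__(self, log_scales: List[float]) -> None:
        self.scales = [math.exp(v) for v in log_scales]

    def forward(self, pixel: Pixel) -> Pixel:
        return [s * x for s, x in zip(self.scales, pixel)]

    def inverse(self, pixel: Pixel) -> Pixel:
        return [y / s for s, y in zip(self.scales, pixel)]


def code_with_transform(
    image: Image,
    transform: ChannelScaling,
    codec_roundtrip: Callable[[Image], Image],
) -> Image:
    """Wrap an unchanged codec: transform, encode and decode, then invert the transform."""
    transformed = [transform.forward(p) for p in image]
    decoded = codec_roundtrip(transformed)  # the baseline codec is not modified
    return [transform.inverse(p) for p in decoded]


def identity_codec(img: Image) -> Image:
    """Stand-in for a learned codec (hypothetical): returns a copy of its input."""
    return [list(p) for p in img]


image = [[0.2, 0.5, 0.9], [1.0, 0.0, 0.3]]
t = ChannelScaling([0.5, -0.25, 0.0])
restored = code_with_transform(image, t, identity_codec)
print(all(abs(a - b) < 1e-12 for p, q in zip(image, restored) for a, b in zip(p, q)))  # True
```

With a lossy codec in place of `identity_codec`, the restored image would only approximate the input; the point of the sketch is that the transform adds no loss of its own.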

HAROOD: Human Activity Classification and Out-of-Distribution Detection with Short-Range FMCW Radar

Dec 14, 2023

Sabri Mustafa Kahya, Muhammet Sami Yavuz, Eckehard Steinbach

Abstract: We propose HAROOD, a short-range FMCW radar-based human activity classifier and out-of-distribution (OOD) detector. It aims to classify human sitting, standing, and walking activities and to detect any other moving or stationary object as OOD. We introduce a two-stage network. The first stage is trained with a novel loss function that combines an intermediate reconstruction loss, an intermediate contrastive loss, and a triplet loss. The second stage takes the first stage's output as input and is trained with a cross-entropy loss; it serves as a simple classifier that performs the activity classification. On our dataset, collected with a 60 GHz short-range FMCW radar, we achieve an average classification accuracy of 96.51% and, as an OOD detector, an average AUROC of 95.04%. Additionally, our extensive evaluations demonstrate the superiority of HAROOD over state-of-the-art OOD detection methods in terms of standard OOD detection metrics.

* Accepted at ICASSP 2024

Enabling Acoustic Audience Feedback in Large Virtual Events

Oct 27, 2023

Tamay Aykut, Markus Hofbauer, Christopher Kuhn, Eckehard Steinbach, Bernd Girod

Abstract: The COVID-19 pandemic shifted many events in our daily lives into the virtual domain. While virtual conference systems provide an alternative to physical meetings, larger events require a muted audience to avoid an accumulation of background noise and distorted audio. However, performing artists rely strongly on feedback from their audience. We propose a concept for a virtual audience framework that provides all participants with the ambience of a real audience. Audience feedback is collected locally, allowing users to express enthusiasm or discontent by selecting reactions such as clapping, whistling, booing, and laughter. This feedback is sent as abstract information to a virtual audience server. We broadcast the combined virtual audience feedback information to all participants, and each client can synthesize it into a single acoustic feedback signal. The synthesis can be done by turning the collective audience feedback into a prompt that is fed to state-of-the-art models such as AudioGen. This way, each user hears a single acoustic feedback sound for the entire virtual event, without needing to unmute or risking distorted, unsynchronized feedback.

* 4 pages, 2 figures
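
The feedback path described above (local reactions sent as abstract events, combined on a server, turned into a text prompt) can be sketched as follows. Only the four reaction types come from the abstract; the event format, the intensity thresholds, the prompt wording and the function names are hypothetical, and the actual call to a text-to-audio model such as AudioGen is deliberately not shown.

```python
from collections import Counter
from typing import Iterable

# Reaction types named in the abstract; everything else here is hypothetical.
REACTIONS = ("clapping", "whistling", "booing", "laughter")


def aggregate_feedback(events: Iterable[str]) -> Counter:
    """Server side: count the abstract feedback events received from all clients."""
    return Counter(e for e in events if e in REACTIONS)


def feedback_to_prompt(counts: Counter, audience_size: int) -> str:
    """Client side: turn aggregated counts into a text prompt for a text-to-audio model."""
    if audience_size <= 0 or not counts:
        return "a quiet audience"
    parts = []
    for reaction, n in counts.most_common():
        share = n / audience_size
        # Illustrative thresholds mapping the share of reacting users to an intensity.
        if share >= 0.5:
            intensity = "loud"
        elif share >= 0.2:
            intensity = "moderate"
        else:
            intensity = "faint"
        parts.append(f"{intensity} {reaction}")
    return "a large audience with " + ", ".join(parts)


events = ["clapping"] * 6 + ["laughter"] * 2 + ["cheering"]  # "cheering" is ignored
print(feedback_to_prompt(aggregate_feedback(events), audience_size=10))
# a large audience with loud clapping, moderate laughter
```

Sending counts rather than audio keeps the server's broadcast small and independent of the number of participants, which matches the abstract's point that only abstract information travels over the network.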

Care3D: An Active 3D Object Detection Dataset of Real Robotic-Care Environments

Oct 09, 2023

Michael G. Adam, Sebastian Eger, Martin Piccolrovazzi, Maged Iskandar, Joern Vogel, Alexander Dietrich, Seongjien Bien, Jon Skerlj, Abdeldjallil Naceri, Eckehard Steinbach (+3 more)

Abstract: As labor shortages in the health sector increase, the demand for assistive robotics grows. However, the test data needed to develop these robots is scarce, especially for active 3D object detection, for which no real data exists at all. This short paper addresses this gap by introducing such an annotated dataset of real environments. The captured environments represent areas that are already in use in robotic health care research. We further provide ground truth data within one room for assessing SLAM algorithms running directly on a health care robot.
