Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Dongheui Lee

Multimodal Anomaly Detection with a Mixture-of-Experts

Jun 23, 2025

Christoph Willibald, Daniel Sliwowski, Dongheui Lee

Abstract:With a growing number of robots being deployed across diverse applications, robust multimodal anomaly detection becomes increasingly important. In robotic manipulation, failures typically arise from (1) robot-driven anomalies due to an insufficient task model or hardware limitations, and (2) environment-driven anomalies caused by dynamic environmental changes or external interferences. Conventional anomaly detection methods focus either on the first by low-level statistical modeling of proprioceptive signals or the second by deep learning-based visual environment observation, each with different computational and training data requirements. To effectively capture anomalies from both sources, we propose a mixture-of-experts framework that integrates the complementary detection mechanisms with a visual-language model for environment monitoring and a Gaussian-mixture regression-based detector for tracking deviations in interaction forces and robot motions. We introduce a confidence-based fusion mechanism that dynamically selects the most reliable detector for each situation. We evaluate our approach on both household and industrial tasks using two robotic systems, demonstrating a 60% reduction in detection delay while improving frame-wise anomaly detection performance compared to individual detectors.

* 8 pages, 5 figures, 1 table, the paper has been accepted for publication in the Proceedings of the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)

Via

Access Paper or Ask Questions

Hierarchical Task Decomposition for Execution Monitoring and Error Recovery: Understanding the Rationale Behind Task Demonstrations

May 07, 2025

Christoph Willibald, Dongheui Lee

Abstract:Multi-step manipulation tasks where robots interact with their environment and must apply process forces based on the perceived situation remain challenging to learn and prone to execution errors. Accurately simulating these tasks is also difficult. Hence, it is crucial for robust task performance to learn how to coordinate end-effector pose and applied force, monitor execution, and react to deviations. To address these challenges, we propose a learning approach that directly infers both low- and high-level task representations from user demonstrations on the real system. We developed an unsupervised task segmentation algorithm that combines intention recognition and feature clustering to infer the skills of a task. We leverage the inferred characteristic features of each skill in a novel unsupervised anomaly detection approach to identify deviations from the intended task execution. Together, these components form a comprehensive framework capable of incrementally learning task decisions and new behaviors as new situations arise. Compared to state-of-the-art learning techniques, our approach significantly reduces the required amount of training data and computational complexity while efficiently learning complex in-contact behaviors and recovery strategies. Our proposed task segmentation and anomaly detection approaches outperform state-of-the-art methods on force-based tasks evaluated on two different robotic systems.

* The paper has been accepted for publication by the International Journal of Robotics Research (IJRR), 26 pages

Via

Access Paper or Ask Questions

M2R2: MulitModal Robotic Representation for Temporal Action Segmentation

Apr 25, 2025

Daniel Sliwowski, Dongheui Lee

Abstract:Temporal action segmentation (TAS) has long been a key area of research in both robotics and computer vision. In robotics, algorithms have primarily focused on leveraging proprioceptive information to determine skill boundaries, with recent approaches in surgical robotics incorporating vision. In contrast, computer vision typically relies on exteroceptive sensors, such as cameras. Existing multimodal TAS models in robotics integrate feature fusion within the model, making it difficult to reuse learned features across different models. Meanwhile, pretrained vision-only feature extractors commonly used in computer vision struggle in scenarios with limited object visibility. In this work, we address these challenges by proposing M2R2, a multimodal feature extractor tailored for TAS, which combines information from both proprioceptive and exteroceptive sensors. We introduce a novel pretraining strategy that enables the reuse of learned features across multiple TAS models. Our method achieves state-of-the-art performance on the REASSEMBLE dataset, a challenging multimodal robotic assembly dataset, outperforming existing robotic action segmentation models by 46.6%. Additionally, we conduct an extensive ablation study to evaluate the contribution of different modalities in robotic TAS tasks.

* 8 pages, 6 figures, 2 tables

Via

Access Paper or Ask Questions

Variable Stiffness for Robust Locomotion through Reinforcement Learning

Feb 13, 2025

Dario Spoljaric, Yashuai Yan, Dongheui Lee

Abstract:Reinforcement-learned locomotion enables legged robots to perform highly dynamic motions but often accompanies time-consuming manual tuning of joint stiffness. This paper introduces a novel control paradigm that integrates variable stiffness into the action space alongside joint positions, enabling grouped stiffness control such as per-joint stiffness (PJS), per-leg stiffness (PLS) and hybrid joint-leg stiffness (HJLS). We show that variable stiffness policies, with grouping in per-leg stiffness (PLS), outperform position-based control in velocity tracking and push recovery. In contrast, HJLS excels in energy efficiency. Furthermore, our method showcases robust walking behaviour on diverse outdoor terrains by sim-to-real transfer, although the policy is sorely trained on a flat floor. Our approach simplifies design by eliminating per-joint stiffness tuning while keeping competitive results with various metrics.

* submitted to IFAC Joint Symposia on Mechatronics & Robotics

Via

Access Paper or Ask Questions

REASSEMBLE: A Multimodal Dataset for Contact-rich Robotic Assembly and Disassembly

Feb 07, 2025

Daniel Sliwowski, Shail Jadav, Sergej Stanovcic, Jedrzej Orbik, Johannes Heidersberger, Dongheui Lee

Abstract:Robotic manipulation remains a core challenge in robotics, particularly for contact-rich tasks such as industrial assembly and disassembly. Existing datasets have significantly advanced learning in manipulation but are primarily focused on simpler tasks like object rearrangement, falling short of capturing the complexity and physical dynamics involved in assembly and disassembly. To bridge this gap, we present REASSEMBLE (Robotic assEmbly disASSEMBLy datasEt), a new dataset designed specifically for contact-rich manipulation tasks. Built around the NIST Assembly Task Board 1 benchmark, REASSEMBLE includes four actions (pick, insert, remove, and place) involving 17 objects. The dataset contains 4,551 demonstrations, of which 4,035 were successful, spanning a total of 781 minutes. Our dataset features multi-modal sensor data including event cameras, force-torque sensors, microphones, and multi-view RGB cameras. This diverse dataset supports research in areas such as learning contact-rich manipulation, task condition identification, action segmentation, and more. We believe REASSEMBLE will be a valuable resource for advancing robotic manipulation in complex, real-world scenarios. The dataset is publicly available on our project website: https://dsliwowski1.github.io/REASSEMBLE_page.

* 16 pages, 12 figures, 1 table

Via

Access Paper or Ask Questions

ConditionNET: Learning Preconditions and Effects for Execution Monitoring

Feb 03, 2025

Daniel Sliwowski, Dongheui Lee

Abstract:The introduction of robots into everyday scenarios necessitates algorithms capable of monitoring the execution of tasks. In this paper, we propose ConditionNET, an approach for learning the preconditions and effects of actions in a fully data-driven manner. We develop an efficient vision-language model and introduce additional optimization objectives during training to optimize for consistent feature representations. ConditionNET explicitly models the dependencies between actions, preconditions, and effects, leading to improved performance. We evaluate our model on two robotic datasets, one of which we collected for this paper, containing 406 successful and 138 failed teleoperated demonstrations of a Franka Emika Panda robot performing tasks like pouring and cleaning the counter. We show in our experiments that ConditionNET outperforms all baselines on both anomaly detection and phase prediction tasks. Furthermore, we implement an action monitoring system on a real robot to demonstrate the practical applicability of the learned preconditions and effects. Our results highlight the potential of ConditionNET for enhancing the reliability and adaptability of robots in real-world environments. The data is available on the project website: https://dsliwowski1.github.io/ConditionNET_page.

* in IEEE Robotics and Automation Letters, vol. 10, no. 2, pp. 1337-1344, Feb. 2025
* 9 pages, 5 figures, 3 tables

Via

Access Paper or Ask Questions

Enhancing Model-Based Step Adaptation for Push Recovery through Reinforcement Learning of Step Timing and Region

Nov 01, 2024

Tobias Egle, Yashuai Yan, Dongheui Lee, Christian Ott

Figure 1 for Enhancing Model-Based Step Adaptation for Push Recovery through Reinforcement Learning of Step Timing and Region

Figure 2 for Enhancing Model-Based Step Adaptation for Push Recovery through Reinforcement Learning of Step Timing and Region

Figure 3 for Enhancing Model-Based Step Adaptation for Push Recovery through Reinforcement Learning of Step Timing and Region

Figure 4 for Enhancing Model-Based Step Adaptation for Push Recovery through Reinforcement Learning of Step Timing and Region

Abstract:This paper introduces a new approach to enhance the robustness of humanoid walking under strong perturbations, such as substantial pushes. Effective recovery from external disturbances requires bipedal robots to dynamically adjust their stepping strategies, including footstep positions and timing. Unlike most advanced walking controllers that restrict footstep locations to a predefined convex region, substantially limiting recoverable disturbances, our method leverages reinforcement learning to dynamically adjust the permissible footstep region, expanding it to a larger, effectively non-convex area and allowing cross-over stepping, which is crucial for counteracting large lateral pushes. Additionally, our method adapts footstep timing in real time to further extend the range of recoverable disturbances. Based on these adjustments, feasible footstep positions and DCM trajectory are planned by solving a QP. Finally, we employ a DCM controller and an inverse dynamics whole-body control framework to ensure the robot effectively follows the trajectory.

Via

Access Paper or Ask Questions

Know your limits! Optimize the robot's behavior through self-awareness

Sep 16, 2024

Esteve Valls Mascaro, Dongheui Lee

Abstract:As humanoid robots transition from labs to real-world environments, it is essential to democratize robot control for non-expert users. Recent human-robot imitation algorithms focus on following a reference human motion with high precision, but they are susceptible to the quality of the reference motion and require the human operator to simplify its movements to match the robot's capabilities. Instead, we consider that the robot should understand and adapt the reference motion to its own abilities, facilitating the operator's task. For that, we introduce a deep-learning model that anticipates the robot's performance when imitating a given reference. Then, our system can generate multiple references given a high-level task command, assign a score to each of them, and select the best reference to achieve the desired robot behavior. Our Self-AWare model (SAW) ranks potential robot behaviors based on various criteria, such as fall likelihood, adherence to the reference motion, and smoothness. We integrate advanced motion generation, robot control, and SAW in one unique system, ensuring optimal robot behavior for any task command. For instance, SAW can anticipate falls with 99.29% accuracy. For more information check our project page: https://evm7.github.io/Self-AWare

* Accepted to Humanoids 2024 and HFR 2024. Project Page: https://evm7.github.io/Self-AWare

Via

Access Paper or Ask Questions

I-CTRL: Imitation to Control Humanoid Robots Through Constrained Reinforcement Learning

May 14, 2024

Yashuai Yan, Esteve Valls Mascaro, Tobias Egle, Dongheui Lee

Abstract:This paper addresses the critical need for refining robot motions that, despite achieving a high visual similarity through human-to-humanoid retargeting methods, fall short of practical execution in the physical realm. Existing techniques in the graphics community often prioritize visual fidelity over physics-based feasibility, posing a significant challenge for deploying bipedal systems in practical applications. Our research introduces a constrained reinforcement learning algorithm to produce physics-based high-quality motion imitation onto legged humanoid robots that enhance motion resemblance while successfully following the reference human trajectory. We name our framework: I-CTRL. By reformulating the motion imitation problem as a constrained refinement over non-physics-based retargeted motions, our framework excels in motion imitation with simple and unique rewards that generalize across four robots. Moreover, our framework can follow large-scale motion datasets with a unique RL agent. The proposed approach signifies a crucial step forward in advancing the control of bipedal robots, emphasizing the importance of aligning visual and physical realism for successful motion imitation.

Via

Access Paper or Ask Questions

Shared Autonomy via Variable Impedance Control and Virtual Potential Fields for Encoding Human Demonstration

Mar 19, 2024

Shail Jadav, Johannes Heidersberger, Christian Ott, Dongheui Lee

Abstract:This article introduces a framework for complex human-robot collaboration tasks, such as the co-manufacturing of furniture. For these tasks, it is essential to encode tasks from human demonstration and reproduce these skills in a compliant and safe manner. Therefore, two key components are addressed in this work: motion generation and shared autonomy. We propose a motion generator based on a time-invariant potential field, capable of encoding wrench profiles, complex and closed-loop trajectories, and additionally incorporates obstacle avoidance. Additionally, the paper addresses shared autonomy (SA) which enables synergetic collaboration between human operators and robots by dynamically allocating authority. Variable impedance control (VIC) and force control are employed, where impedance and wrench are adapted based on the human-robot autonomy factor derived from interaction forces. System passivity is ensured by an energy-tank based task passivation strategy. The framework's efficacy is validated through simulations and an experimental study employing a Franka Emika Research 3 robot.

* Accepted to ICRA 2024

Via

Access Paper or Ask Questions