Mederic Fourmy

Multi-step manipulation task and motion planning guided by video demonstration

May 13, 2025

Kateryna Zorina, David Kovar, Mederic Fourmy, Florent Lamiraux, Nicolas Mansard, Justin Carpentier, Josef Sivic, Vladimir Petrik

Abstract: This work aims to leverage instructional video to solve complex multi-step task-and-motion planning tasks in robotics. Towards this goal, we propose an extension of the well-established Rapidly-Exploring Random Tree (RRT) planner, which simultaneously grows multiple trees around grasp and release states extracted from the guiding video. Our key novelty lies in combining contact states and 3D object poses extracted from the guiding video with a traditional planning algorithm, which allows us to solve tasks with sequential dependencies, for example, when an object needs to be placed at a specific location to be grasped later. We also investigate the generalization capabilities of our approach beyond the scene depicted in the instructional video. To demonstrate the benefits of the proposed video-guided planning approach, we design a new benchmark with three challenging tasks: (i) 3D re-arrangement of multiple objects between a table and a shelf, (ii) multi-step transfer of an object through a tunnel, and (iii) transferring objects using a tray, similar to how a waiter transfers dishes. We demonstrate the effectiveness of our planning algorithm on several robots, including the Franka Emika Panda and the KUKA KMR iiwa. For a seamless transfer of the obtained plans to the real robot, we develop a trajectory refinement approach formulated as an optimal control problem (OCP).

BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation

Apr 03, 2025

Van Nguyen Nguyen, Stephen Tyree, Andrew Guo, Mederic Fourmy, Anas Gouda, Taeyeop Lee, Sungphill Moon, Hyeontae Son, Lukas Ranftl, Jonathan Tremblay(+9 more)

Abstract: We present the evaluation methodology, datasets and results of the BOP Challenge 2024, the sixth in a series of public competitions organized to capture the state of the art in 6D object pose estimation and related tasks. In 2024, our goal was to transition BOP from lab-like setups to real-world scenarios. First, we introduced new model-free tasks, where no 3D object models are available and methods need to onboard objects only from provided reference videos. Second, we defined a new, more practical 6D object detection task where identities of objects visible in a test image are not provided as input. Third, we introduced new BOP-H3 datasets recorded with high-resolution sensors and AR/VR headsets, closely resembling real-world scenarios. BOP-H3 includes 3D models and onboarding videos to support both model-based and model-free tasks. Participants competed on seven challenge tracks, each defined by a task, object onboarding setup, and dataset group. Notably, the best 2024 method for model-based 6D localization of unseen objects (FreeZeV2.1) achieves 22% higher accuracy on BOP-Classic-Core than the best 2023 method (GenFlow), and is only 4% behind the best 2023 method for seen objects (GPose2023), although it is significantly slower (24.9 s vs. 2.7 s per image). A more practical 2024 method for this task is Co-op, which takes only 0.8 s per image and is 25x faster and 13% more accurate than GenFlow. Methods have a similar ranking on 6D detection as on 6D localization but higher run time. On model-based 2D detection of unseen objects, the best 2024 method (MUSE) achieves 21% relative improvement compared to the best 2023 method (CNOS). However, the 2D detection accuracy for unseen objects is still noticeably (-53%) behind the accuracy for seen objects (GDet2023). The online evaluation system stays open and is available at http://bop.felk.cvut.cz/

* arXiv admin note: text overlap with arXiv:2403.09799

Visually Guided Model Predictive Robot Control via 6D Object Pose Localization and Tracking

Nov 09, 2023

Mederic Fourmy, Vojtech Priban, Jan Kristof Behrens, Nicolas Mansard, Josef Sivic, Vladimir Petrik

Abstract: The objective of this work is to enable manipulation tasks with respect to the 6D pose of a dynamically moving object using a camera mounted on a robot. Examples include maintaining a constant relative 6D pose of the robot arm with respect to the object, grasping the dynamically moving object, or co-manipulating the object together with a human. Fast and accurate 6D pose estimation is crucial to achieve smooth and stable robot control in such situations. The contributions of this work are threefold. First, we propose a new visual perception module that asynchronously combines an accurate learning-based 6D object pose localizer and a high-rate model-based 6D pose tracker. The outcome is a low-latency, accurate, and temporally consistent 6D object pose estimation from the input video stream at up to 120 Hz. Second, we develop a visually guided robot arm controller that combines the new visual perception module with a torque-based model predictive control algorithm. Asynchronous combination of the visual and robot proprioception signals at their corresponding frequencies results in stable and robust 6D object pose guided robot arm control. Third, we experimentally validate the proposed approach on a challenging 6D pose estimation benchmark and demonstrate 6D object pose-guided control with dynamically moving objects on a real 7 DoF Franka Emika Panda robot.
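
The asynchronous combination of a slow, accurate localizer with a fast tracker described in this abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class `AsyncPoseFusion`, its method names, and the reduction of a 6D pose to a single float with additive per-frame increments are all assumptions made for illustration. The idea shown is that the tracker updates the estimate on every frame, and when a delayed localizer result for an older frame arrives, the buffered tracker increments for the later frames are replayed on top of it.

```python
from collections import deque


class AsyncPoseFusion:
    """Toy 1-D stand-in (hypothetical): the pose is a float, the tracker
    reports a per-frame increment, the localizer reports an absolute pose."""

    def __init__(self, initial_pose=0.0, buffer_size=256):
        self.pose = initial_pose
        # Recent (frame_id, increment) pairs kept for replay.
        self.increments = deque(maxlen=buffer_size)

    def on_tracker(self, frame_id, increment):
        """High-rate path: integrate the tracker increment for this frame."""
        self.increments.append((frame_id, increment))
        self.pose += increment
        return self.pose

    def on_localizer(self, frame_id, absolute_pose):
        """Low-rate path: a delayed absolute pose computed on an older frame.

        Increments from frames newer than `frame_id` are replayed so the
        corrected estimate refers to the latest frame. Frames already
        evicted from the buffer cannot be replayed.
        """
        pose = absolute_pose
        for fid, inc in self.increments:
            if fid > frame_id:
                pose += inc
        self.pose = pose
        return self.pose


# The true motion is 1.0 per frame, but the tracker drifts (reports 1.1).
fusion = AsyncPoseFusion()
for frame in range(5):
    drifted = fusion.on_tracker(frame, 1.1)  # about 5.5 after 5 frames

# The localizer result for frame 2 arrives late with the true pose 3.0;
# the increments of frames 3 and 4 are replayed on top of it.
corrected = fusion.on_localizer(2, 3.0)  # about 5.2, closer to the true 5.0
```

In this toy setting the correction removes the drift accumulated up to the localized frame, while the estimate keeps the low latency of the tracker.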

WOLF: A modular estimation framework for robotics based on factor graphs

Oct 25, 2021

Joan Sola, Joan Vallve-Navarro, Joaquim Casals, Jeremie Deray, Mederic Fourmy, Dinesh Atchuthan, Juan Andrade-Cetto

Abstract: This paper introduces WOLF, a C++ estimation framework based on factor graphs and targeted at mobile robotics. WOLF extends the applications of factor graphs from the typical problems of SLAM and odometry to a general estimation framework able to handle self-calibration, model identification, or the observation of dynamic quantities other than localization. WOLF produces high-throughput estimates at sensor rates up to the kHz range, which can be used for feedback control of highly dynamic robots such as humanoids, quadrupeds or aerial manipulators. Departing from the factor graph paradigm, the architecture of WOLF allows for a modular yet tightly-coupled estimator. Modularity is based on plugins that are loaded at runtime. Integration is then achieved simply through YAML files, allowing users to configure a wide range of applications without needing to write or compile code. Synchronization of incoming data and their processing into a unique factor graph is achieved through a decentralized strategy of frame creation and joining. Most algorithmic assets are coded as abstract algorithms in base classes with varying levels of specialization. Overall, these assets allow for coherent processing and favor code reusability and scalability. WOLF can be interfaced with different solvers, and we provide a wrapper to Google Ceres. Likewise, we offer ROS integration, providing a generic ROS node and specialized packages with subscribers and publishers. WOLF is made publicly available and open to collaboration.

* 8 pages, 12 figures. v1: removed repository link
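
For readers unfamiliar with the factor graph formulation mentioned in the WOLF abstract, the following is a minimal, generic sketch of a linear factor graph solved as weighted least squares. It does not use or imitate WOLF's API: the function `solve_linear_factor_graph` and the factor representation (coefficients, measurement, sigma) are hypothetical, the states are scalars, and the normal equations are solved with plain Gaussian elimination rather than a solver such as Ceres.

```python
def solve_linear_factor_graph(num_states, factors):
    """Solve a scalar-state linear factor graph by weighted least squares.

    Each factor is (coeffs, measurement, sigma), where coeffs maps a state
    index to its coefficient, so the residual is
    (sum_i coeffs[i] * x[i] - measurement) / sigma.
    """
    n = num_states
    H = [[0.0] * n for _ in range(n)]  # information matrix A^T W A
    g = [0.0] * n                      # right-hand side A^T W z
    for coeffs, z, sigma in factors:
        w = 1.0 / (sigma * sigma)
        for i, ai in coeffs.items():
            g[i] += w * ai * z
            for j, aj in coeffs.items():
                H[i][j] += w * ai * aj

    # Gaussian elimination with partial pivoting on H x = g.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(H[r][col]))
        H[col], H[pivot] = H[pivot], H[col]
        g[col], g[pivot] = g[pivot], g[col]
        for row in range(col + 1, n):
            f = H[row][col] / H[col][col]
            for k in range(col, n):
                H[row][k] -= f * H[col][k]
            g[row] -= f * g[col]

    # Back substitution.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(H[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (g[row] - s) / H[row][row]
    return x


# Three 1-D positions linked by a prior, two odometry factors, and one
# absolute measurement that disagrees with the odometry by 0.2.
factors = [
    ({0: 1.0}, 0.0, 1.0),            # prior: x0 = 0
    ({0: -1.0, 1: 1.0}, 1.0, 1.0),   # odometry: x1 - x0 = 1
    ({1: -1.0, 2: 1.0}, 1.0, 1.0),   # odometry: x2 - x1 = 1
    ({2: 1.0}, 2.2, 1.0),            # absolute measurement: x2 = 2.2
]
estimate = solve_linear_factor_graph(3, factors)
```

With equal sigmas, the 0.2 disagreement is spread evenly over the four factors, giving an estimate close to x0 = 0.05, x1 = 1.1, x2 = 2.15.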
