Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Serena Ivaldi

LARSEN

Extremum Flow Matching for Offline Goal Conditioned Reinforcement Learning

May 26, 2025

Quentin Rouxel, Clemente Donoso, Fei Chen, Serena Ivaldi, Jean-Baptiste Mouret

Abstract:Imitation learning is a promising approach for enabling generalist capabilities in humanoid robots, but its scaling is fundamentally constrained by the scarcity of high-quality expert demonstrations. This limitation can be mitigated by leveraging suboptimal, open-ended play data, often easier to collect and offering greater diversity. This work builds upon recent advances in generative modeling, specifically Flow Matching, an alternative to Diffusion models. We introduce a method for estimating the extremum of the learned distribution by leveraging the unique properties of Flow Matching, namely, deterministic transport and support for arbitrary source distributions. We apply this method to develop several goal-conditioned imitation and reinforcement learning algorithms based on Flow Matching, where policies are conditioned on both current and goal observations. We explore and compare different architectural configurations by combining core components, such as critic, planner, actor, or world model, in various ways. We evaluated our agents on the OGBench benchmark and analyzed how different demonstration behaviors during data collection affect performance in a 2D non-prehensile pushing task. Furthermore, we validated our approach on real hardware by deploying it on the Talos humanoid robot to perform complex manipulation tasks based on high-dimensional image observations, featuring a sequence of pick-and-place and articulated object manipulation in a realistic kitchen environment. Experimental videos and code are available at: https://hucebot.github.io/extremum_flow_matching_website/

Via

Access Paper or Ask Questions

From Vocal Instructions to Household Tasks: The Inria Tiago++ in the euROBIN Service Robots Coopetition

Dec 20, 2024

Fabio Amadio, Clemente Donoso, Dionis Totsila, Raphael Lorenzo, Quentin Rouxel, Olivier Rochel, Enrico Mingo Hoffman, Jean-Baptiste Mouret, Serena Ivaldi

Abstract:This paper describes the Inria team's integrated robotics system used in the 1st euROBIN coopetition, during which service robots performed voice-activated household tasks in a kitchen setting.The team developed a modified Tiago++ platform that leverages a whole-body control stack for autonomous and teleoperated modes, and an LLM-based pipeline for instruction understanding and task planning. The key contributions (opens-sourced) are the integration of these components and the design of custom teleoperation devices, addressing practical challenges in the deployment of service robots.

Via

Access Paper or Ask Questions

Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models

Jul 19, 2024

Dionis Totsila, Quentin Rouxel, Jean-Baptiste Mouret, Serena Ivaldi

Figure 1 for Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models

Figure 2 for Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models

Figure 3 for Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models

Figure 4 for Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models

Abstract:This paper presents Words2Contact, a language-guided multi-contact placement pipeline leveraging large language models and vision language models. Our method is a key component for language-assisted teleoperation and human-robot cooperation, where human operators can instruct the robots where to place their support contacts before whole-body reaching or manipulation using natural language. Words2Contact transforms the verbal instructions of a human operator into contact placement predictions; it also deals with iterative corrections, until the human is satisfied with the contact location identified in the robot's field of view. We benchmark state-of-the-art LLMs and VLMs for size and performance in contact prediction. We demonstrate the effectiveness of the iterative correction process, showing that users, even naive, quickly learn how to instruct the system to obtain accurate locations. Finally, we validate Words2Contact in real-world experiments with the Talos humanoid robot, instructed by human operators to place support contacts on different locations and surfaces to avoid falling when reaching for distant objects.

Via

Access Paper or Ask Questions

Flow Matching Imitation Learning for Multi-Support Manipulation

Jul 17, 2024

Quentin Rouxel, Andrea Ferrari, Serena Ivaldi, Jean-Baptiste Mouret

Figure 1 for Flow Matching Imitation Learning for Multi-Support Manipulation

Figure 2 for Flow Matching Imitation Learning for Multi-Support Manipulation

Figure 3 for Flow Matching Imitation Learning for Multi-Support Manipulation

Figure 4 for Flow Matching Imitation Learning for Multi-Support Manipulation

Abstract:Humanoid robots could benefit from using their upper bodies for support contacts, enhancing their workspace, stability, and ability to perform contact-rich and pushing tasks. In this paper, we propose a unified approach that combines an optimization-based multi-contact whole-body controller with Flow Matching, a recently introduced method capable of generating multi-modal trajectory distributions for imitation learning. In simulation, we show that Flow Matching is more appropriate for robotics than Diffusion and traditional behavior cloning. On a real full-size humanoid robot (Talos), we demonstrate that our approach can learn a whole-body non-prehensile box-pushing task and that the robot can close dishwasher drawers by adding contacts with its free hand when needed for balance. We also introduce a shared autonomy mode for assisted teleoperation, providing automatic contact placement for tasks not covered in the demonstrations. Full experimental videos are available at: https://hucebot.github.io/flow_multisupport_website/

Via

Access Paper or Ask Questions

Multi-Contact Whole Body Force Control for Position-Controlled Robots

Jan 16, 2024

Quentin Rouxel, Serena Ivaldi, Jean-Baptiste Mouret

Abstract:Many humanoid and multi-legged robots are controlled in positions rather than in torques, preventing direct control of contact forces, and hampering their ability to create multiple contacts to enhance their balance, such as placing a hand on a wall or a handrail. This paper introduces the SEIKO (Sequential Equilibrium Inverse Kinematic Optimization) pipeline, drawing inspiration from flexibility models used in serial elastic actuators to indirectly control contact forces on traditional position-controlled robots. SEIKO formulates whole-body retargeting from Cartesian commands and admittance control using two quadratic programs solved in real time. We validated our pipeline with experiments on the real, full-scale humanoid robot Talos in various multicontact scenarios, including pushing tasks, far-reaching tasks, stair climbing, and stepping on sloped surfaces. This work opens the possibility of stable, contact-rich behaviors while getting around many of the challenges of torque-controlled robots. Code and videos are available at https://hucebot.github.io/seiko_controller_website/ .

Via

Access Paper or Ask Questions

Analysis and Perspectives on the ANA Avatar XPRIZE Competition

Jan 10, 2024

Kris Hauser, Eleanor Watson, Joonbum Bae, Josh Bankston, Sven Behnke, Bill Borgia, Manuel G. Catalano, Stefano Dafarra, Jan B. F. van Erp, Thomas Ferris(+18 more)

Abstract:The ANA Avatar XPRIZE was a four-year competition to develop a robotic "avatar" system to allow a human operator to sense, communicate, and act in a remote environment as though physically present. The competition featured a unique requirement that judges would operate the avatars after less than one hour of training on the human-machine interfaces, and avatar systems were judged on both objective and subjective scoring metrics. This paper presents a unified summary and analysis of the competition from technical, judging, and organizational perspectives. We study the use of telerobotics technologies and innovations pursued by the competing teams in their avatar systems, and correlate the use of these technologies with judges' task performance and subjective survey ratings. It also summarizes perspectives from team leads, judges, and organizers about the competition's execution and impact to inform the future development of telerobotics and telepresence.

* 26 pages, preprint of article appearing in International Journal of Social Robotics

Via

Access Paper or Ask Questions

Feasibility Retargeting for Multi-contact Teleoperation and Physical Interaction

Aug 07, 2023

Quentin Rouxel, Ruoshi Wen, Zhibin Li, Carlo Tiseo, Jean-Baptiste Mouret, Serena Ivaldi

Figure 1 for Feasibility Retargeting for Multi-contact Teleoperation and Physical Interaction

Figure 2 for Feasibility Retargeting for Multi-contact Teleoperation and Physical Interaction

Figure 3 for Feasibility Retargeting for Multi-contact Teleoperation and Physical Interaction

Figure 4 for Feasibility Retargeting for Multi-contact Teleoperation and Physical Interaction

Abstract:This short paper outlines two recent works on multi-contact teleoperation and the development of the SEIKO (Sequential Equilibrium Inverse Kinematic Optimization) framework. SEIKO adapts commands from the operator in real-time and ensures that the reference configuration sent to the underlying controller is feasible. Additionally, an admittance scheme is used to implement physical interaction, which is then combined with the operator's command and retargeted. SEIKO has been applied in simulations on various robots, including humanoid and quadruped robots designed for loco-manipulation. Furthermore, SEIKO has been tested on real hardware for bimanual heavy object carrying tasks.

* 2nd Workshop Toward Robot Avatars, 2023 IEEE International Conference on Robotics and Automation (ICRA), Jun 2023, London, United Kingdom

Via

Access Paper or Ask Questions

Simultaneous Action Recognition and Human Whole-Body Motion and Dynamics Prediction from Wearable Sensors

Mar 14, 2023

Kourosh Darvish, Serena Ivaldi, Daniele Pucci

Abstract:This paper presents a novel approach to solve simultaneously the problems of human activity recognition and whole-body motion and dynamics prediction for real-time applications. Starting from the dynamics of human motion and motor system theory, the notion of mixture of experts from deep learning has been extended to address this problem. In the proposed approach, experts are modelled as a sequence-to-sequence recurrent neural networks (RNN) architecture. Experiments show the results of 66-DoF real-world human motion prediction and action recognition during different tasks like walking and rotating. The code associated with this paper is available at: \url{github.com/ami-iit/paper_darvish_2022_humanoids_action-kindyn-predicition}

Via

Access Paper or Ask Questions

Teleoperation of Humanoid Robots: A Survey

Jan 11, 2023

Kourosh Darvish, Luigi Penco, Joao Ramos, Rafael Cisneros, Jerry Pratt, Eiichi Yoshida, Serena Ivaldi, Daniele Pucci

Figure 1 for Teleoperation of Humanoid Robots: A Survey

Figure 2 for Teleoperation of Humanoid Robots: A Survey

Figure 3 for Teleoperation of Humanoid Robots: A Survey

Figure 4 for Teleoperation of Humanoid Robots: A Survey

Abstract:Teleoperation of humanoid robots enables the integration of the cognitive skills and domain expertise of humans with the physical capabilities of humanoid robots. The operational versatility of humanoid robots makes them the ideal platform for a wide range of applications when teleoperating in a remote environment. However, the complexity of humanoid robots imposes challenges for teleoperation, particularly in unstructured dynamic environments with limited communication. Many advancements have been achieved in the last decades in this area, but a comprehensive overview is still missing. This survey paper gives an extensive overview of humanoid robot teleoperation, presenting the general architecture of a teleoperation system and analyzing the different components. We also discuss different aspects of the topic, including technological and methodological advances, as well as potential applications. A web-based version of the paper can be found at https://humanoid-teleoperation.github.io/.

Via

Access Paper or Ask Questions

Data-efficient learning of object-centric grasp preferences

Mar 01, 2022

Yoann Fleytoux, Anji Ma, Serena Ivaldi, Jean-Baptiste Mouret

Figure 1 for Data-efficient learning of object-centric grasp preferences

Figure 2 for Data-efficient learning of object-centric grasp preferences

Figure 3 for Data-efficient learning of object-centric grasp preferences

Figure 4 for Data-efficient learning of object-centric grasp preferences

Abstract:Grasping made impressive progress during the last few years thanks to deep learning. However, there are many objects for which it is not possible to choose a grasp by only looking at an RGB-D image, might it be for physical reasons (e.g., a hammer with uneven mass distribution) or task constraints (e.g., food that should not be spoiled). In such situations, the preferences of experts need to be taken into account. In this paper, we introduce a data-efficient grasping pipeline (Latent Space GP Selector -- LGPS) that learns grasp preferences with only a few labels per object (typically 1 to 4) and generalizes to new views of this object. Our pipeline is based on learning a latent space of grasps with a dataset generated with any state-of-the-art grasp generator (e.g., Dex-Net). This latent space is then used as a low-dimensional input for a Gaussian process classifier that selects the preferred grasp among those proposed by the generator. The results show that our method outperforms both GR-ConvNet and GG-CNN (two state-of-the-art methods that are also based on labeled grasps) on the Cornell dataset, especially when only a few labels are used: only 80 labels are enough to correctly choose 80% of the grasps (885 scenes, 244 objects). Results are similar on our dataset (91 scenes, 28 objects).

* Video: https://youtu.be/dJ1fkcught4

Via

Access Paper or Ask Questions