Zhanpeng He

Meta-World+: An Improved, Standardized, RL Benchmark

May 16, 2025

Reginald McLean, Evangelos Chatzaroulas, Luc McCutcheon, Frank Röder, Tianhe Yu, Zhanpeng He, K. R. Zentner, Ryan Julian, J K Terry, Isaac Woungang (+2 more)

Abstract: Meta-World is widely used for evaluating multi-task and meta-reinforcement learning agents, which are challenged to master diverse skills simultaneously. Since its introduction, however, there have been numerous undocumented changes which inhibit a fair comparison of algorithms. This work strives to disambiguate these results from the literature, while also leveraging the past versions of Meta-World to provide insights into multi-task and meta-reinforcement learning benchmark design. Through this process we release a new open-source version of Meta-World (https://github.com/Farama-Foundation/Metaworld/) that has full reproducibility of past results, is more technically ergonomic, and gives users more control over the tasks that are included in a task set.

VibeCheck: Using Active Acoustic Tactile Sensing for Contact-Rich Manipulation

Apr 22, 2025

Kaidi Zhang, Do-Gon Kim, Eric T. Chang, Hua-Hsuan Liang, Zhanpeng He, Kathryn Lampo, Philippe Wu, Ioannis Kymissis, Matei Ciocarlie

Abstract: The acoustic response of an object can reveal a lot about its global state, for example its material properties or the extrinsic contacts it is making with the world. In this work, we build an active acoustic sensing gripper equipped with two piezoelectric fingers: one for generating signals, the other for receiving them. By sending an acoustic vibration from one finger to the other through an object, we gain insight into an object's acoustic properties and contact state. We use this system to classify objects, estimate grasping position, estimate poses of internal structures, and classify the types of extrinsic contacts an object is making with the environment. Using our contact type classification model, we tackle a standard long-horizon manipulation problem: peg insertion. We use a simple simulated transition model based on the performance of our sensor to train an imitation learning policy that is robust to imperfect predictions from the classifier. We finally demonstrate the policy on a UR5 robot with active acoustic sensing as the only feedback.

* 8 pages, 7 figures

Task-Based Design and Policy Co-Optimization for Tendon-driven Underactuated Kinematic Chains

May 23, 2024

Sharfin Islam, Zhanpeng He, Matei Ciocarlie

Abstract: Underactuated manipulators reduce the number of bulky motors, thereby enabling compact and mechanically robust designs. However, fewer actuators than joints means that the manipulator can only access a specific manifold within the joint space, which is particular to a given hardware configuration and can be low-dimensional and/or discontinuous. Determining an appropriate set of hardware parameters for this class of mechanisms is therefore difficult, even for traditional task-based co-optimization methods. In this paper, our goal is to implement a task-based design and policy co-optimization method for underactuated, tendon-driven manipulators. We first formulate a general model for an underactuated, tendon-driven transmission. We then use this model to co-optimize a three-link, two-actuator kinematic chain using reinforcement learning. We demonstrate that our optimized tendon transmission and control policy can be transferred reliably to physical hardware with real-world reaching experiments.

MORPH: Design Co-optimization with Reinforcement Learning via a Differentiable Hardware Model Proxy

Sep 29, 2023

Zhanpeng He, Matei Ciocarlie

Abstract: We introduce MORPH, a method for co-optimization of hardware design parameters and control policies in simulation using reinforcement learning. Like most co-optimization methods, MORPH relies on a model of the hardware being optimized, usually simulated based on the laws of physics. However, such a model is often difficult to integrate into an effective optimization routine. To address this, we introduce a proxy hardware model, which is always differentiable and enables efficient co-optimization alongside a long-horizon control policy using RL. MORPH is designed to ensure that the optimized hardware proxy remains as close as possible to its realistic counterpart, while still enabling task completion. We demonstrate our approach on simulated 2D reaching and 3D multi-fingered manipulation tasks.

Pick2Place: Task-aware 6DoF Grasp Estimation via Object-Centric Perspective Affordance

Apr 08, 2023

Zhanpeng He, Nikhil Chavan-Dafle, Jinwook Huh, Shuran Song, Volkan Isler

Abstract: The choice of a grasp plays a critical role in the success of downstream manipulation tasks. Consider a task of placing an object in a cluttered scene; the majority of possible grasps may not be suitable for the desired placement. In this paper, we study the synergy between the picking and placing of an object in a cluttered scene to develop an algorithm for task-aware grasp estimation. We present an object-centric action space that encodes the relationship between the geometry of the placement scene and the object to be placed in order to provide placement affordance maps directly from perspective views of the placement scene. This action space enables the computation of a one-to-one mapping between the placement and picking actions, allowing the robot to generate a diverse set of pick-and-place proposals and to optimize for a grasp under other task constraints such as robot kinematics and collision avoidance. With experiments both in simulation and on a real robot, we demonstrate that with our method the robot is able to successfully complete the task of placement-aware grasping with over 89% accuracy in such a way that generalizes to novel objects and scenes.

* IEEE International Conference on Robotics and Automation 2023

Decision Making for Human-in-the-loop Robotic Agents via Uncertainty-Aware Reinforcement Learning

Mar 14, 2023

Siddharth Singi, Zhanpeng He, Alvin Pan, Sandip Patel, Gunnar A. Sigurdsson, Robinson Piramuthu, Shuran Song, Matei Ciocarlie

Abstract: In a Human-in-the-Loop paradigm, a robotic agent is able to act mostly autonomously in solving a task, but can request help from an external expert when needed. However, knowing when to request such assistance is critical: too few requests can lead to the robot making mistakes, but too many requests can overload the expert. In this paper, we present a Reinforcement Learning based approach to this problem, where a semi-autonomous agent asks for external assistance when it has low confidence in the eventual success of the task. The confidence level is computed by estimating the variance of the return from the current state. We show that this estimate can be iteratively improved during training using a Bellman-like recursion. On discrete navigation problems with both fully- and partially-observable state information, we show that our method makes effective use of a limited budget of expert calls at run-time, despite having no access to the expert at training time.
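
The abstract above mentions estimating the variance of the return with a Bellman-like recursion. As a rough illustration only, the sketch below applies the standard second-moment recursion to a small tabular Markov reward process. It is not the paper's implementation: the function name `return_mean_and_variance`, the toy transition matrix, and the help-request threshold of 0.1 are all assumptions made for this example.

```python
import numpy as np

def return_mean_and_variance(P, r, gamma, iters=200):
    """Iterate Bellman-style updates for the mean and second moment of the return.

    P is an (n, n) state transition matrix under a fixed policy, r is the
    per-state reward. With G = r(s) + gamma * G', the moments satisfy
        V(s) = r(s) + gamma * E[V(s')]
        M(s) = r(s)^2 + 2 * gamma * r(s) * E[V(s')] + gamma^2 * E[M(s')]
    and the variance of the return is M(s) - V(s)^2.
    """
    n = len(r)
    V = np.zeros(n)  # estimate of E[G | s]
    M = np.zeros(n)  # estimate of E[G^2 | s]
    for _ in range(iters):
        expected_next_V = P @ V
        V_new = r + gamma * expected_next_V
        M_new = r**2 + 2 * gamma * r * expected_next_V + gamma**2 * (P @ M)
        V, M = V_new, M_new
    return V, M - V**2

# Toy example (hypothetical): state 0 branches to a rewarding state 1 or an
# unrewarding state 2 with equal probability; state 3 is absorbing.
P = np.array([
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
])
r = np.array([0.0, 1.0, 0.0, 0.0])

mean, var = return_mean_and_variance(P, r, gamma=1.0)

# Assumed decision rule: request help wherever the return variance is high.
ask_for_help = var > 0.1
```

In this toy chain only state 0 has an uncertain outcome, so it is the only state where the assumed rule would request help.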

Discovering Synergies for Robot Manipulation with Multi-Task Reinforcement Learning

Oct 04, 2021

Zhanpeng He, Matei Ciocarlie

Abstract: Controlling robotic manipulators with high-dimensional action spaces for dexterous tasks is a challenging problem. Inspired by human manipulation, researchers have studied generating and using postural synergies for robot hands to accomplish manipulation tasks, leveraging the lower dimensional nature of synergistic action spaces. However, many of these works require pre-collected data from an existing controller in order to derive such a subspace by means of dimensionality reduction. In this paper, we present a framework that simultaneously discovers a synergy space and a multi-task policy that operates on this low-dimensional action space to accomplish diverse manipulation tasks. We demonstrate that our end-to-end method is able to perform multiple tasks using few synergies, and outperforms sequential methods that apply dimensionality reduction to independently collected data. We also show that deriving synergies using multiple tasks can lead to a subspace that enables robots to efficiently learn new manipulation tasks and interactions with new objects.
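
For context, the sequential baseline that the abstract above contrasts against (dimensionality reduction on pre-collected data) can be sketched with plain PCA. This shows that baseline idea only, not the paper's end-to-end method. The helper names `fit_synergies`, `encode`, and `decode`, the 16-joint hand, and the synthetic data are hypothetical.

```python
import numpy as np

def fit_synergies(joint_poses, k):
    """Fit k linear synergies to an (N, d) array of joint poses via PCA."""
    mean = joint_poses.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(joint_poses - mean, full_matrices=False)
    return mean, vt[:k]

def encode(joint_poses, mean, basis):
    """Project full joint poses onto the low-dimensional synergy coordinates."""
    return (joint_poses - mean) @ basis.T

def decode(z, mean, basis):
    """Map synergy coordinates back to full joint-space commands."""
    return mean + z @ basis

# Synthetic data (assumed): 16 joints driven by 2 hidden latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 16))
poses = latent @ mixing + 0.3

mean, basis = fit_synergies(poses, k=2)
z = encode(poses, mean, basis)
recon = decode(z, mean, basis)
```

A policy acting in the synergy space would output `z` and pass it through `decode` to obtain joint commands; here the data is exactly two-dimensional, so two synergies reconstruct it up to floating-point error.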

UMPNet: Universal Manipulation Policy Network for Articulated Objects

Sep 19, 2021

Zhenjia Xu, Zhanpeng He, Shuran Song

Abstract: We introduce the Universal Manipulation Policy Network (UMPNet) -- a single image-based policy network that infers closed-loop action sequences for manipulating arbitrary articulated objects. To infer a wide range of action trajectories, the policy supports 6DoF action representation and varying trajectory length. To handle a diverse set of objects, the policy learns from objects with different articulation structures and generalizes to unseen objects or categories. The policy is trained with self-guided exploration without any human demonstrations, scripted policy, or pre-defined goal conditions. To support effective multi-step interaction, we introduce a novel Arrow-of-Time action attribute that indicates whether an action will change the object state back to the past or forward into the future. With the Arrow-of-Time inference at each interaction step, the learned policy is able to select actions that consistently lead towards or away from a given state, thereby enabling both effective state exploration and goal-conditioned manipulation. Video is available at https://youtu.be/KqlvcL9RqKM

Learning 3D Dynamic Scene Representations for Robot Manipulation

Nov 03, 2020

Zhenjia Xu, Zhanpeng He, Jiajun Wu, Shuran Song

Abstract: 3D scene representation for robot manipulation should capture three key object properties: permanency -- objects that become occluded over time continue to exist; amodal completeness -- objects have 3D occupancy, even if only partial observations are available; spatiotemporal continuity -- the movement of each object is continuous over space and time. In this paper, we introduce 3D Dynamic Scene Representation (DSR), a 3D volumetric scene representation that simultaneously discovers, tracks, reconstructs objects, and predicts their dynamics while capturing all three properties. We further propose DSR-Net, which learns to aggregate visual observations over multiple interactions to gradually build and refine DSR. Our model achieves state-of-the-art performance in modeling 3D scene dynamics with DSR on both simulated and real data. Combined with model predictive control, DSR-Net enables accurate planning in downstream robotic manipulation tasks such as planar pushing. Video is available at https://youtu.be/GQjYG3nQJ80.

* CoRL 2020. The first two authors contributed equally to this paper. Project page: https://dsr-net.cs.columbia.edu/

Hardware as Policy: Mechanical and Computational Co-Optimization using Deep Reinforcement Learning

Aug 11, 2020

Tianjian Chen, Zhanpeng He, Matei Ciocarlie

Abstract: Deep Reinforcement Learning (RL) has shown great success in learning complex control policies for a variety of applications in robotics. However, in most such cases, the hardware of the robot has been considered immutable, modeled as part of the environment. In this study, we explore the problem of learning hardware and control parameters together in a unified RL framework. To achieve this, we propose to model aspects of the robot's hardware as a "mechanical policy", analogous to and optimized jointly with its computational counterpart. We show that, by modeling such mechanical policies as auto-differentiable computational graphs, the ensuing optimization problem can be solved efficiently by gradient-based algorithms from the Policy Optimization family. We present two such design examples: a toy mass-spring problem, and a real-world problem of designing an underactuated hand. We compare our method against traditional co-optimization approaches, and also demonstrate its effectiveness by building a physical prototype based on the learned hardware parameters.

* Submitted to Conference on Robot Learning (CoRL) 2020
