Abstract: Learning from Demonstrations, particularly from biological experts such as humans and animals, often encounters significant data acquisition challenges. While recent approaches leverage internet videos for learning, they require complex, task-specific pipelines to extract and retarget motion data for the agent. In this work, we introduce a language-model-assisted bi-level programming framework that enables a reinforcement learning agent to learn its reward directly from internet videos, bypassing dedicated data preparation. The framework has two levels: an upper level, where a vision-language model (VLM) provides feedback by comparing the learner's behavior with expert videos, and a lower level, where a large language model (LLM) translates this feedback into reward updates. The VLM and LLM collaborate within this bi-level framework, using a "chain rule" approach to derive a valid search direction for reward learning. We validate the method on reward learning from YouTube videos; the results show that it enables efficient reward design from expert videos of biological agents for complex behavior synthesis.
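To make the bi-level structure concrete, the following is a minimal sketch of the outer loop, assuming hypothetical stand-ins query_vlm_feedback and query_llm_reward_update for the VLM and LLM calls and a toy 1-D learner in place of the RL agent; it only illustrates how upper-level feedback and lower-level reward updates interleave, not the paper's actual pipeline.

```python
# Minimal sketch of the bi-level reward-learning loop (illustrative stubs only).
import numpy as np

def rollout(reward_weights, horizon=50):
    """Toy learner: greedy 1-D agent whose behavior depends on reward weights."""
    x, traj = 0.0, []
    for _ in range(horizon):
        u = np.clip(reward_weights @ np.array([1.0, -x]), -1.0, 1.0)
        x = x + 0.1 * u
        traj.append(x)
    return np.array(traj)

def query_vlm_feedback(learner_traj, expert_traj):
    """Hypothetical VLM call (upper level): compare learner vs. expert behavior
    and return coarse feedback about how the learner should change."""
    gap = expert_traj - learner_traj
    return {"too_slow": float(np.mean(gap) > 0), "gap": float(np.mean(gap))}

def query_llm_reward_update(feedback, reward_weights, step=0.5):
    """Hypothetical LLM call (lower level): turn feedback into a reward update,
    i.e., a search direction in reward-parameter space."""
    direction = np.array([1.0, 0.0]) if feedback["too_slow"] else np.array([-1.0, 0.0])
    return reward_weights + step * abs(feedback["gap"]) * direction

expert = rollout(np.array([1.0, 0.8]))          # stands in for the expert video
theta = np.array([0.1, 0.8])                    # initial reward parameters
for it in range(20):                            # bi-level outer loop
    learner = rollout(theta)                    # inner RL step (toy)
    fb = query_vlm_feedback(learner, expert)    # upper level: VLM feedback
    theta = query_llm_reward_update(fb, theta)  # lower level: LLM reward update
print("learned reward weights:", theta)
```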
Abstract: In this paper, we propose ContactSDF, a method that uses signed distance functions (SDFs) to approximate multi-contact models, including both collision detection and time-stepping routines. ContactSDF first establishes an SDF using the supporting-plane representation of an object for collision detection, and then uses the generated contact dual cones to build a second SDF for time-stepping prediction of the next state. These two SDFs form a differentiable, closed-form multi-contact dynamics model for state prediction, enabling efficient model learning and optimization for contact-rich manipulation. We perform extensive simulation experiments to show the effectiveness of ContactSDF for model learning and real-time control of dexterous manipulation. We further evaluate ContactSDF on a hardware Allegro hand for on-palm reorientation tasks. Results show that with around 2 minutes of learning on hardware, ContactSDF achieves high-quality dexterous manipulation at a frequency of 30-60 Hz.
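As a rough illustration of the collision-detection ingredient, the sketch below builds a closed-form, differentiable signed distance from a supporting-plane (half-space) representation of an object; the plane normals and offsets describe a unit cube and are illustrative only, not the ContactSDF implementation.

```python
# Supporting-plane SDF sketch: object as intersection of half-spaces
# {x : a_i^T x <= b_i}, with phi(x) = max_i (a_i^T x - b_i) as the signed distance.
import numpy as np

normals = np.array([[ 1, 0, 0], [-1, 0, 0],
                    [ 0, 1, 0], [ 0,-1, 0],
                    [ 0, 0, 1], [ 0, 0,-1]], dtype=float)   # unit face normals
offsets = 0.5 * np.ones(6)                                   # cube half-extent

def supporting_plane_sdf(point):
    """Signed distance approximation: positive outside, negative inside."""
    return np.max(normals @ point - offsets)

print(supporting_plane_sdf(np.array([0.0, 0.0, 0.0])))   # -0.5 (inside)
print(supporting_plane_sdf(np.array([1.0, 0.0, 0.0])))   #  0.5 (outside, 0.5 away)
```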
Abstract: A significant barrier preventing model-based methods from matching the high performance of reinforcement learning in dexterous manipulation is the inherent complexity of multi-contact dynamics. Traditionally formulated using complementarity models, multi-contact dynamics introduces combinatorial complexity and non-smoothness, complicating contact-rich planning and control. In this paper, we circumvent these challenges by introducing a novel, simplified multi-contact model. Our new model, derived from the duality of optimization-based contact models, dispenses with the complementarity constructs entirely, providing computational advantages such as explicit time stepping, differentiability, automatic satisfaction of Coulomb friction law, and minimal hyperparameter tuning. We demonstrate the effectiveness and efficiency of the model for planning and control in a range of challenging dexterous manipulation tasks, including fingertip 3D in-air manipulation, TriFinger in-hand manipulation, and Allegro hand on-palm reorientation, all with diverse objects. Our method consistently achieves state-of-the-art results: (I) a 96.5% average success rate across tasks, (II) high manipulation accuracy with an average reorientation error of 11° and position error of 7.8 mm, and (III) model predictive control running at 50-100 Hz for all tested dexterous manipulation tasks. These results are achieved with minimal hyperparameter tuning.
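One standard building block such duality-based contact models lean on is keeping contact impulses inside the Coulomb friction cone in closed form, without complementarity constraints. The sketch below is the textbook second-order-cone projection with illustrative values, shown as a generic ingredient rather than the paper's model.

```python
# Closed-form Euclidean projection onto the Coulomb friction cone
# {(lam_n, lam_t) : ||lam_t|| <= mu * lam_n}.
import numpy as np

def project_friction_cone(lam_n, lam_t, mu):
    """Project (lam_n, lam_t) onto the second-order cone ||lam_t|| <= mu*lam_n."""
    s = np.linalg.norm(lam_t)
    if s <= mu * lam_n:                      # already inside the cone
        return lam_n, lam_t
    if mu * s <= -lam_n:                     # inside the polar cone -> project to apex
        return 0.0, np.zeros_like(lam_t)
    n = (lam_n + mu * s) / (1.0 + mu ** 2)   # boundary projection (closed form)
    return n, mu * n * lam_t / s

print(project_friction_cone(1.0, np.array([2.0, 0.0]), mu=0.5))  # (1.6, [0.8, 0.])
```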
Abstract: A differential dynamic programming (DDP)-based framework for inverse reinforcement learning (IRL) is introduced to recover the parameters in the cost function, system dynamics, and constraints from demonstrations. Different from existing work, where DDP was used for the inner forward problem with inequality constraints, our proposed framework uses it for efficient computation of the gradient required in the outer inverse problem with equality and inequality constraints. The equivalence between the proposed method and existing methods based on Pontryagin's Maximum Principle (PMP) is established. More importantly, building on this DDP-based IRL with an open-loop loss function, a closed-loop IRL framework is presented, in which a loss function is proposed to capture the closed-loop nature of demonstrations and is shown to outperform the commonly used open-loop loss. We show that the closed-loop IRL framework reduces to a constrained inverse optimal control problem under certain assumptions. Under these assumptions and a rank condition, it is proven that the unknown parameters can be recovered from the demonstration data. The proposed framework is extensively evaluated through four numerical robot examples and one real-world quadrotor system. The experiments validate the theoretical results and illustrate the practical relevance of the approach.
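As a small numeric illustration of the bi-level structure, the sketch below solves a scalar LQR instance: the inner forward problem computes the feedback gain for a candidate cost weight, and the outer inverse problem updates the weight by matching closed-loop trajectories to a demonstration. The finite-difference gradient is a stand-in for the DDP-based analytic gradient, and all system values are illustrative.

```python
# Toy bi-level IRL on a scalar LQR problem: recover the cost weight q from a
# demonstration generated with the true weight q* = 2.
import numpy as np

a, b, horizon = 1.0, 0.5, 30

def lqr_gain(q, iters=200):
    """Inner forward problem: scalar discrete Riccati iteration, cost q*x^2 + u^2."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (1.0 + b * b * p)
    return a * b * p / (1.0 + b * b * p)     # feedback gain, u = -K x

def rollout(q, x0=1.0):
    """Closed-loop trajectory under the feedback policy induced by q."""
    k, x, xs = lqr_gain(q), x0, []
    for _ in range(horizon):
        x = a * x + b * (-k * x)
        xs.append(x)
    return np.array(xs)

demo = rollout(q=2.0)                          # demonstration from true weight q* = 2
loss = lambda qq: np.sum((rollout(qq) - demo) ** 2)   # closed-loop trajectory loss
q, lr, eps = 0.5, 2.0, 1e-4
for _ in range(100):                           # outer inverse problem
    grad = (loss(q + eps) - loss(q - eps)) / (2 * eps)
    q -= lr * grad
print("recovered cost weight:", round(q, 3))   # approaches 2.0
```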
Abstract: In safety-critical robot planning and control, manually specifying safety constraints, or learning them from demonstrations, can be challenging. In this paper, we propose a certifiable alignment method for a robot to learn a safety constraint in its model predictive control (MPC) policy from online human directional feedback. To our knowledge, it is the first method to learn safety constraints from human feedback. The proposed method is based on an empirical observation: human directional feedback, when available, tends to guide the robot toward safer regions. The method only requires the direction of human feedback to update the learning hypothesis space. It is certifiable: it either provides an upper bound on the total number of human feedback signals needed for successful learning of the safety constraint, or declares misspecification of the hypothesis space, i.e., that the true implicit safety constraint cannot be found within the specified hypothesis space. We evaluated the proposed method using numerical examples and user studies in two simulation games we developed. Additionally, we implemented and tested the proposed method on a real-world Franka robot arm performing mobile water-pouring tasks in a user study. The simulation and experimental results demonstrate the efficacy and efficiency of our method, showing that it enables a robot to successfully learn safety constraints with only tens of human directional corrections.
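The sketch below illustrates the hypothesis-space-shrinking idea in its simplest form: candidate constraint parameters are sampled, and each directional correction is turned into a half-space cut that removes inconsistent candidates, with misspecification declared if nothing remains. The specific cut rule (the feedback direction should decrease the candidate constraint function) is a simplified illustration, not the paper's exact update.

```python
# Hypothesis-space pruning from directional feedback (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
# hypothesis space: linear safety constraints g_theta(x) = theta . x <= 1
candidates = rng.uniform(-1, 1, size=(5000, 2))

def apply_feedback(candidates, d):
    """Human direction d points toward safer states, i.e., it should decrease
    g_theta; keep only hypotheses whose gradient agrees: theta . d < 0."""
    return candidates[candidates @ d < 0.0]

# two illustrative directional corrections (only the direction matters)
for d in (np.array([-1.0, 0.2]), np.array([-0.3, -1.0])):
    candidates = apply_feedback(candidates, d)
    if len(candidates) == 0:
        print("hypothesis space misspecified: no constraint is consistent")
        break
print("consistent hypotheses remaining:", len(candidates))
```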
Abstract: Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems, improving future policies using feedback. However, RL algorithms may require extensive trial-and-error interactions to collect useful feedback for improvement. On the other hand, recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement for planning tasks, lacking the ability to autonomously refine their responses based on feedback. Therefore, in this paper, we study how the policy prior provided by an LLM can enhance the sample efficiency of RL algorithms. Specifically, we develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning, particularly when the difference between the ideal policy and the LLM-informed policy is small; in that case the initial policy is close to optimal and little further exploration is needed. Additionally, we present a practical algorithm, SLINVIT, that simplifies the construction of the value function and employs subgoals to reduce the search complexity. Our experiments across three interactive environments, ALFWorld, InterCode, and BlocksWorld, demonstrate that our method achieves state-of-the-art success rates and also surpasses previous RL and LLM approaches in terms of sample efficiency. Our code is available at https://github.com/agentification/Language-Integrated-VI.
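To show generically how a policy prior can act as a regularizer in value-based RL, the sketch below runs KL-regularized value iteration on a small random tabular MDP, with a hand-written prior standing in for the LLM-suggested policy; it illustrates the regularization idea only and is not the LINVIT/SLINVIT algorithm itself.

```python
# KL-regularized value iteration: the backup
#   V(s) = lam * log sum_a prior(a|s) * exp(Q(s,a)/lam)
# softly biases the solution toward the prior policy.
import numpy as np

n_states, n_actions, gamma, lam = 4, 2, 0.9, 0.5
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(0, 1, size=(n_states, n_actions))                 # rewards
prior = np.zeros((n_states, n_actions))
prior[:, 0], prior[:, 1] = 0.8, 0.2                                # "LLM" prior prefers action 0

V = np.zeros(n_states)
for _ in range(200):                                   # KL-regularized value iteration
    Q = R + gamma * P @ V                              # Q[s, a]
    V = lam * np.log(np.sum(prior * np.exp(Q / lam), axis=1))

policy = prior * np.exp(Q / lam)
policy /= policy.sum(axis=1, keepdims=True)            # KL-regularized optimal policy
print("V:", V.round(2))
print("policy:\n", policy.round(2))
```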
Abstract: The hybrid nature of multi-contact robotic systems, due to making and breaking contact with the environment, creates significant challenges for high-quality control. Existing model-based methods typically rely either on good prior knowledge of the multi-contact model or on significant offline model-tuning effort, resulting in low adaptability and robustness. In this paper, we propose a real-time adaptive multi-contact model predictive control framework, which enables online adaptation of the hybrid multi-contact model and continuous improvement of control performance for contact-rich tasks. The framework includes an adaptation module, which continuously learns a residual of the hybrid model to minimize the gap between the prior model and reality, and a real-time multi-contact MPC controller. We demonstrate the effectiveness of the framework on synthetic examples, and apply it on hardware to solve contact-rich manipulation tasks in which a robot uses its end-effector to roll different unknown objects on a table to track given paths. The hardware experiments show that, starting from a rough prior model, the multi-contact MPC controller adapts itself on the fly at an adaptation rate of around 20 Hz and successfully manipulates previously unknown objects with non-smooth surface geometries.
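A minimal sketch of the online residual-adaptation idea: keep a rough prior model, and learn a linear residual from streaming (state, input, next state) data with recursive least squares so that the combined model tracks reality. The toy 1-D system and residual below are illustrative; the paper's residual is learned over a hybrid multi-contact model.

```python
# Online residual learning with recursive least squares (RLS), toy 1-D system.
import numpy as np

def f_prior(x, u):                      # rough prior model
    return x + 0.1 * u

def f_true(x, u):                       # reality = prior + unknown residual
    return f_prior(x, u) - 0.3 * x + 0.05 * u

theta = np.zeros(2)                     # residual parameters for features [x, u]
P = 100.0 * np.eye(2)                   # RLS covariance
rng = np.random.default_rng(0)
x = 1.0
for t in range(200):                    # streaming adaptation (~per control step)
    u = rng.uniform(-1, 1)
    x_next = f_true(x, u)
    phi = np.array([x, u])              # residual features
    err = x_next - f_prior(x, u) - theta @ phi
    K = P @ phi / (1.0 + phi @ P @ phi) # RLS gain
    theta = theta + K * err
    P = P - np.outer(K, phi @ P)
    x = x_next
print("learned residual params:", theta.round(3))   # approaches [-0.3, 0.05]
```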
Abstract: The driving style of an Autonomous Vehicle (AV) refers to how it behaves and interacts with other AVs. In a multi-vehicle autonomous driving system, an AV capable of identifying the driving styles of nearby AVs can reliably evaluate the risk of collisions and make more reasonable driving decisions. However, there has been no consistent definition of driving style for an AV in the literature, although it is generally considered that the driving style is encoded in the AV's trajectories and can be identified, as a cost function, using Maximum Entropy Inverse Reinforcement Learning (ME-IRL) methods. Nevertheless, an important indicator of driving style, namely how an AV reacts to its nearby AVs, is not fully incorporated in the feature design of previous ME-IRL methods. In this paper, we describe the driving style as a cost function over a series of weighted features. We design additional novel features to capture the AV's reaction-aware characteristics. We then identify the driving styles from demonstration trajectories generated by Stochastic Model Predictive Control (SMPC), using a modified ME-IRL method with our newly proposed features. The proposed method is validated using MATLAB simulation and an off-the-shelf experiment.
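The core ME-IRL update can be sketched over a finite set of candidate trajectories: with the driving style modeled as a cost w · f(tau) and trajectories distributed as P(tau) proportional to exp(-w · f(tau)), gradient ascent on the demonstration log-likelihood matches the model's feature expectations to the expert's. The features (speed, headway, a "reaction" term) and numbers below are illustrative only.

```python
# Maximum-entropy IRL weight update over a finite trajectory set (illustrative).
import numpy as np

# feature vectors f(tau) for 4 candidate trajectories: [speed, headway, reaction]
F = np.array([[1.0, 0.2, 0.9],
              [0.6, 0.8, 0.1],
              [0.9, 0.5, 0.4],
              [0.3, 0.9, 0.0]])
f_expert = F[1]                          # pretend the expert demonstrated trajectory 1

w = np.zeros(3)
for _ in range(500):                     # MaxEnt IRL: match feature expectations
    p = np.exp(-F @ w); p /= p.sum()     # P(tau) proportional to exp(-cost)
    grad = p @ F - f_expert              # E_w[f(tau)] - f(expert)
    w += 0.5 * grad
print("learned cost weights:", w.round(2))
print("trajectory distribution under w:", p.round(3))  # concentrates on the demo
```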
Abstract: In contact-rich tasks like dexterous manipulation, the hybrid nature of making and breaking contact creates challenges for model representation and control. For example, choosing and sequencing contact locations for in-hand manipulation, where there are thousands of potential hybrid modes, is not generally tractable. In this paper, we are inspired by the observation that far fewer modes are actually necessary to accomplish many tasks. Building on our prior work on learning hybrid models, represented as linear complementarity systems, we find a reduced-order hybrid model requiring only a limited number of task-relevant modes. This simplified representation, in combination with model predictive control, enables real-time control yet is sufficient for achieving high performance. We demonstrate the proposed method first on synthetic hybrid systems, reducing the mode count by multiple orders of magnitude while incurring a task performance loss of less than 5%. We also apply the proposed method to a three-fingered robotic hand manipulating a previously unknown object. With no prior knowledge, we achieve state-of-the-art closed-loop performance in less than five minutes of online learning.
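For reference, the sketch below simulates a tiny linear complementarity system, the hybrid model class mentioned above, using a simple projected Gauss-Seidel sweep to resolve the complementarity problem at each step; the matrices are illustrative and not a learned model of any task.

```python
# Linear complementarity system (LCS) step:
#   x+ = A x + D lam,   0 <= lam  complementary to  E x + F lam + c >= 0.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])
D = np.array([[0.0, 0.0], [0.5, -0.5]])
E = np.array([[1.0, 0.0], [-1.0, 0.0]])
F = np.eye(2) + 0.1                          # symmetric positive definite coupling
c = np.array([0.1, 0.1])

def solve_lcp(q, F, iters=100):
    """Find lam >= 0 with F lam + q >= 0 and lam . (F lam + q) = 0 (PGS sweeps)."""
    lam = np.zeros(len(q))
    for _ in range(iters):
        for i in range(len(q)):
            lam[i] = max(0.0, lam[i] - (q[i] + F[i] @ lam) / F[i, i])
    return lam

def lcs_step(x):
    lam = solve_lcp(E @ x + c, F)
    return A @ x + D @ lam

x = np.array([0.2, -0.1])
for _ in range(5):
    x = lcs_step(x)
    print(x.round(4))
```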
Abstract: It is challenging to ensure the safety of reinforcement learning (RL) agents in an unknown, stochastic environment under hard constraints that require the system state never to reach certain specified unsafe regions. Many popular safe RL methods, such as those based on the Constrained Markov Decision Process (CMDP) paradigm, formulate safety violations as a cost function and constrain the expected cumulative cost under a threshold. However, it is often difficult to effectively capture and enforce hard reachability-based safety constraints indirectly through such constraints on safety-violation costs. In this work, we leverage the notion of barrier functions to explicitly encode the hard safety constraints and, given that the environment is unknown, relax them to our design of generative-model-based soft barrier functions. Based on such soft barriers, we propose a safe RL approach that can jointly learn the environment model and optimize the control policy, while effectively avoiding unsafe regions via safety probability optimization. Experiments on a set of examples demonstrate that our approach can effectively enforce hard safety constraints and significantly outperforms CMDP-based baseline methods in terms of system safety rate measured via simulations.
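A minimal sketch of the soft-barrier idea: a barrier function B(x) >= 0 encodes the safe set, the hard condition B(x') >= alpha * B(x) is relaxed through a smooth softplus penalty, and the expectation is taken over samples from a learned generative model of the unknown environment (here a stand-in Gaussian sampler). The barrier, model, and constants are illustrative only.

```python
# Soft relaxation of a hard barrier condition under a generative model (sketch).
import numpy as np

alpha = 0.9

def barrier(x):                               # B(x) >= 0  <=>  x is in the safe set
    return 1.0 - np.sum(x ** 2, axis=-1)      # unit-ball safe region

def generative_model(x, u, n_samples=256):
    """Stand-in for a learned stochastic model: sample next states given (x, u)."""
    rng = np.random.default_rng(0)
    mean = x + 0.1 * u
    return mean + 0.05 * rng.standard_normal((n_samples, x.size))

def soft_barrier_penalty(x, u):
    """Smooth surrogate for violating B(x') >= alpha * B(x) under the model."""
    x_next = generative_model(x, u)
    violation = alpha * barrier(x) - barrier(x_next)      # > 0 means violation
    return np.mean(np.logaddexp(0.0, 10.0 * violation))   # softplus relaxation

x = np.array([0.6, 0.5])
for u in (np.array([1.0, 1.0]), np.array([-1.0, -1.0])):  # unsafe vs. safe action
    print(u, "penalty:", round(float(soft_barrier_penalty(x, u)), 3))
```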