Abstract: Learning policies from high-dimensional visual inputs, such as pixels and point clouds, is crucial in various applications. Visual reinforcement learning (Visual RL) is a promising approach that directly trains policies from visual observations, although it faces challenges in sample efficiency and computational cost. This study conducts an empirical comparison of State-to-Visual DAgger, a two-stage framework that first trains a state policy and then uses online imitation to learn a visual policy, against Visual RL across a diverse set of tasks. We evaluate both methods on 16 tasks from three benchmarks, focusing on their asymptotic performance, sample efficiency, and computational cost. Surprisingly, our findings reveal that State-to-Visual DAgger does not universally outperform Visual RL, but it shows significant advantages on challenging tasks and offers more consistent performance. In contrast, its benefits in sample efficiency are less pronounced, although it often reduces the overall wall-clock time required for training. Based on our findings, we provide recommendations for practitioners and hope that our results contribute valuable perspectives for future research in visual policy learning.
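The second stage of this framework can be sketched as a DAgger-style distillation loop: the visual student is rolled out to collect observations, the pre-trained state expert relabels those states with actions, and the student regresses onto the aggregated labels. The following is a minimal, illustrative Python sketch, assuming a Gym-style environment whose observations expose both an "image" and a "state" field; the function and variable names (`state_policy`, `student`, etc.) are placeholders, not the authors' implementation.

```python
import random
import torch
import torch.nn.functional as F

def state_to_visual_dagger(env, state_policy, student, optimizer,
                           num_iters=50, rollout_steps=500,
                           epochs=5, batch_size=256):
    """Stage 2 sketch: distill a pre-trained state expert into a visual policy
    via DAgger-style online imitation."""
    buffer = []  # aggregated (image, expert_action) pairs
    for _ in range(num_iters):
        # Roll out the *student* so the data matches its own visitation distribution.
        obs = env.reset()
        for _ in range(rollout_steps):
            image = torch.as_tensor(obs["image"]).float()
            state = torch.as_tensor(obs["state"]).float()
            with torch.no_grad():
                expert_action = state_policy(state)                   # relabel with the expert
                student_action = student(image.unsqueeze(0)).squeeze(0)
            buffer.append((image, expert_action))
            obs, _, done, _ = env.step(student_action.numpy())
            if done:
                obs = env.reset()
        # Supervised regression of the visual policy onto the expert labels.
        for _ in range(epochs):
            batch = random.sample(buffer, min(batch_size, len(buffer)))
            images = torch.stack([b[0] for b in batch])
            actions = torch.stack([b[1] for b in batch])
            loss = F.mse_loss(student(images), actions)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```

Because the expert labels come from a cheap-to-query state policy rather than a human, the visual policy can be trained purely from online rollouts without additional reward signals.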
Abstract: Recent advancements in robot learning have used imitation learning with large models and extensive demonstrations to develop effective policies. However, these models are often limited by the quantity, quality, and diversity of demonstrations. This paper explores improving offline-trained imitation learning models through online interactions with the environment. We introduce Policy Decorator, which uses a model-agnostic residual policy to refine large imitation learning models during online interactions. By implementing controlled exploration strategies, Policy Decorator enables stable, sample-efficient online learning. Our evaluation spans eight tasks across two benchmarks (ManiSkill and Adroit) and involves two state-of-the-art imitation learning models (Behavior Transformer and Diffusion Policy). The results show that Policy Decorator effectively improves the offline-trained policies and preserves the smooth motion of imitation learning models, avoiding the erratic behaviors of pure RL policies. See our project page (https://policydecorator.github.io) for videos.
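A residual policy of this kind can be illustrated as a small learnable head that adds a bounded correction to the frozen base model's action, with the correction applied progressively more often as online training proceeds. The sketch below is an assumption-laden illustration of that idea, not the paper's implementation: the architecture, the `residual_scale` bound, and the `progress`-based schedule are placeholders.

```python
import torch
import torch.nn as nn

class PolicyDecoratorSketch(nn.Module):
    """Illustrative residual wrapper around a frozen, offline-trained base policy."""

    def __init__(self, base_policy, obs_dim, act_dim, residual_scale=0.1):
        super().__init__()
        self.base_policy = base_policy                 # frozen imitation learning model
        for p in self.base_policy.parameters():
            p.requires_grad_(False)
        self.residual = nn.Sequential(                 # small learnable correction head
            nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
            nn.Linear(256, act_dim), nn.Tanh(),
        )
        self.residual_scale = residual_scale           # bounds the correction magnitude

    def forward(self, obs, progress):
        with torch.no_grad():
            base_action = self.base_policy(obs)
        # Controlled exploration: apply the residual with growing probability so
        # early online training stays close to the base policy's behavior.
        if torch.rand(()) < min(1.0, progress):
            delta = self.residual(torch.cat([obs, base_action], dim=-1))
            return base_action + self.residual_scale * delta
        return base_action
```

Only the residual head is trained online (e.g., with an off-the-shelf RL algorithm), which is what makes the approach model-agnostic with respect to the base imitation policy.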
Abstract: Imitation learning presents a promising approach for learning generalizable and complex robotic skills. The recently proposed Diffusion Policy generates robot action sequences through a conditional denoising diffusion process, achieving state-of-the-art performance compared to other imitation learning methods. This paper examines five key components of Diffusion Policy: 1) observation sequence input; 2) action sequence execution; 3) receding horizon; 4) U-Net or Transformer network architecture; and 5) FiLM conditioning. By conducting experiments across the ManiSkill and Adroit benchmarks, this study aims to elucidate the contribution of each component to the success of Diffusion Policy in various scenarios. We hope our findings will provide valuable insights for applying Diffusion Policy in future research and industry.
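The first three components can be made concrete with a control-loop sketch: the policy conditions on a short history of observations, predicts a longer action sequence, but only executes a prefix of it before re-planning. The Python sketch below assumes an older Gym-style `env.step` API and treats `policy.predict` as a stand-in for one conditional denoising pass; the horizon lengths are illustrative defaults, not the values studied in the paper.

```python
import collections
import numpy as np

def receding_horizon_rollout(env, policy, n_obs=2, n_act=8, max_steps=300):
    """Sketch of components 1-3: observation sequence input, action sequence
    execution, and receding-horizon re-planning."""
    obs = env.reset()
    obs_deque = collections.deque([obs] * n_obs, maxlen=n_obs)  # 1) observation sequence input
    steps, info = 0, {}
    while steps < max_steps:
        obs_seq = np.stack(obs_deque)
        action_seq = policy.predict(obs_seq)       # predicted sequence, shape (pred_horizon, act_dim)
        for action in action_seq[:n_act]:          # 2) execute only a prefix of the sequence
            obs, reward, done, info = env.step(action)
            obs_deque.append(obs)
            steps += 1
            if done or steps >= max_steps:
                return info
        # 3) receding horizon: re-plan from the latest observations after n_act steps
    return info
```

Components 4 and 5 concern the denoising network itself: the U-Net or Transformer backbone predicts the noise to remove at each diffusion step, and FiLM layers modulate its intermediate features with the encoded observation sequence.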
Abstract: Generalization in Deep Reinforcement Learning (DRL) across unseen environment variations often requires training over a diverse set of scenarios. Many existing DRL algorithms struggle with efficiency when handling numerous variations. The Generalist-Specialist Learning (GSL) framework addresses this by first training a generalist model on all variations, then creating specialists from the generalist's weights, each focusing on a subset of variations. The generalist then refines its learning with assistance from the specialists. However, random task partitioning in GSL can impede performance by assigning vastly different variations to the same specialist; in practice, this often leads to each specialist handling only a single variation, which raises computational costs. To improve this, we propose Generalist-Specialist Learning with Point Cloud Feature-based Task Partitioning (GSL-PCD). Our approach clusters environment variations based on features extracted from object point clouds and uses balanced clustering with a greedy algorithm to assign similar variations to the same specialist. Evaluations on robotic manipulation tasks from the ManiSkill benchmark demonstrate that point cloud feature-based partitioning outperforms vanilla partitioning by 9.4% with a fixed number of specialists, and reduces computational and sample requirements by 50% to achieve comparable performance.
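The task-partitioning step can be sketched as balanced, greedy assignment on top of clustered point cloud features: cluster the per-variation features, then fill each specialist's subset up to an equal capacity in order of distance to the cluster centroids. The sketch below assumes the features (e.g., from a pre-trained point cloud encoder) are already computed as an array of shape (num_variations, feature_dim); it is an illustrative reconstruction of the balanced greedy idea, not the paper's exact algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

def balanced_greedy_partition(features, num_specialists):
    """Assign each environment variation to one specialist, keeping subsets
    balanced and grouping variations with similar point cloud features."""
    features = np.asarray(features, dtype=np.float64)
    n = len(features)
    capacity = int(np.ceil(n / num_specialists))                  # equal-sized subsets
    centroids = KMeans(n_clusters=num_specialists, n_init=10).fit(features).cluster_centers_
    # Distance of every variation to every specialist centroid.
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
    assignment = -np.ones(n, dtype=int)
    counts = np.zeros(num_specialists, dtype=int)
    # Greedy pass: take the globally closest (variation, specialist) pairs first,
    # skipping specialists that have already reached capacity.
    order = np.column_stack(np.unravel_index(np.argsort(dists, axis=None), dists.shape))
    for var_idx, spec_idx in order:
        if assignment[var_idx] == -1 and counts[spec_idx] < capacity:
            assignment[var_idx] = spec_idx
            counts[spec_idx] += 1
    return assignment
```

Because the subsets are capped at equal size, no specialist collapses to a single variation, while the greedy ordering still keeps geometrically similar objects together.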