Abstract:Reward shaping is effective in addressing the sparse-reward challenge in reinforcement learning by providing immediate feedback through auxiliary informative rewards. Based on the reward shaping strategy, we propose a novel multi-task reinforcement learning framework, that integrates a centralized reward agent (CRA) and multiple distributed policy agents. The CRA functions as a knowledge pool, which aims to distill knowledge from various tasks and distribute it to individual policy agents to improve learning efficiency. Specifically, the shaped rewards serve as a straightforward metric to encode knowledge. This framework not only enhances knowledge sharing across established tasks but also adapts to new tasks by transferring valuable reward signals. We validate the proposed method on both discrete and continuous domains, demonstrating its robustness in multi-task sparse-reward settings and its effective transferability to unseen tasks.
Abstract:Reward shaping addresses the challenge of sparse rewards in reinforcement learning by constructing denser and more informative reward signals. To achieve self-adaptive and highly efficient reward shaping, we propose a novel method that incorporates success rates derived from historical experiences into shaped rewards. Our approach utilizes success rates sampled from Beta distributions, which dynamically evolve from uncertain to reliable values as more data is collected. Initially, the self-adaptive success rates exhibit more randomness to encourage exploration. Over time, they become more certain to enhance exploitation, thus achieving a better balance between exploration and exploitation. We employ Kernel Density Estimation (KDE) combined with Random Fourier Features (RFF) to derive the Beta distributions, resulting in a computationally efficient implementation in high-dimensional continuous state spaces. This method provides a non-parametric and learning-free approach. The proposed method is evaluated on a wide range of continuous control tasks with sparse and delayed rewards, demonstrating significant improvements in sample efficiency and convergence stability compared to relevant baselines.
Abstract:Designing motion control and planning algorithms for multilift systems remains challenging due to the complexities of dynamics, collision avoidance, actuator limits, and scalability. Existing methods that use optimization and distributed techniques effectively address these constraints and scalability issues. However, they often require substantial manual tuning, leading to suboptimal performance. This paper proposes Auto-Multilift, a novel framework that automates the tuning of model predictive controllers (MPCs) for multilift systems. We model the MPC cost functions with deep neural networks (DNNs), enabling fast online adaptation to various scenarios. We develop a distributed policy gradient algorithm to train these DNNs efficiently in a closed-loop manner. Central to our algorithm is distributed sensitivity propagation, which parallelizes gradient computation across quadrotors, focusing on actual system state sensitivities relative to key MPC parameters. We also provide theoretical guarantees for the convergence of this algorithm. Extensive simulations show rapid convergence and favorable scalability to a large number of quadrotors. Our method outperforms a state-of-the-art open-loop MPC tuning approach by effectively learning adaptive MPCs from trajectory tracking errors and handling the unique dynamics couplings within the multilift system. Additionally, our framework can learn an adaptive reference for reconfigurating the system when traversing through multiple narrow slots.
Abstract:In this paper, we present a novel method for mobile manipulators to perform multiple contact-rich manipulation tasks. While learning-based methods have the potential to generate actions in an end-to-end manner, they often suffer from insufficient action accuracy and robustness against noise. On the other hand, classical control-based methods can enhance system robustness, but at the cost of extensive parameter tuning. To address these challenges, we present MOMA-Force, a visual-force imitation method that seamlessly combines representation learning for perception, imitation learning for complex motion generation, and admittance whole-body control for system robustness and controllability. MOMA-Force enables a mobile manipulator to learn multiple complex contact-rich tasks with high success rates and small contact forces. In a real household setting, our method outperforms baseline methods in terms of task success rates. Moreover, our method achieves smaller contact forces and smaller force variances compared to baseline methods without force imitation. Overall, we offer a promising approach for efficient and robust mobile manipulation in the real world. Videos and more details can be found on \url{https://visual-force-imitation.github.io}