Abstract: Safety has been recognized as a central obstacle preventing the use of reinforcement learning (RL) in real-world applications. Different methods have been developed to deal with safety concerns in RL. However, learning reliable RL-based solutions usually requires a large number of interactions with the environment. Moreover, how to improve learning efficiency, and specifically how to utilize transfer learning for safe reinforcement learning, has not been well studied. In this work, we propose an adaptive aggregation framework for safety-critical control. Our method comprises two key techniques: 1) we learn to transfer safety knowledge by aggregating multiple source tasks and a target task through an attention network; 2) we separate the goals of improving task performance and reducing constraint violations by utilizing a safeguard. Experimental results demonstrate that our algorithm achieves fewer safety violations while showing better data efficiency than several baselines.
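The abstract describes two components: attention-based aggregation of source-task knowledge and a safeguard that overrides unsafe actions. The sketch below is only an illustration of that idea, not the paper's implementation; the class and function names (`AttentiveAggregator`, `safeguard`, `cost_critic`, `backup_action`) and the linear stand-ins for pretrained source policies are all assumptions.

```python
# Minimal sketch (hypothetical names): attention over source-task policies
# plus a cost-critic safeguard, loosely following the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveAggregator(nn.Module):
    """Weights K source-task policies with attention scores computed from the target state."""
    def __init__(self, state_dim, action_dim, num_sources, embed_dim=64):
        super().__init__()
        self.query = nn.Linear(state_dim, embed_dim)                    # target-task query
        self.keys = nn.Parameter(torch.randn(num_sources, embed_dim))   # one key per source task
        self.source_policies = nn.ModuleList(
            [nn.Linear(state_dim, action_dim) for _ in range(num_sources)]
        )  # stand-ins for pretrained source-task policies

    def forward(self, state):
        q = self.query(state)                                           # (B, E)
        attn = F.softmax(q @ self.keys.t(), dim=-1)                     # (B, K) aggregation weights
        actions = torch.stack([pi(state) for pi in self.source_policies], dim=1)  # (B, K, A)
        return (attn.unsqueeze(-1) * actions).sum(dim=1)                # attention-weighted action

def safeguard(action, state, cost_critic, backup_action, threshold=0.1):
    """Override the proposed action when the predicted constraint cost is too high."""
    with torch.no_grad():
        predicted_cost = cost_critic(torch.cat([state, action], dim=-1))  # (B, 1)
    unsafe = predicted_cost > threshold
    return torch.where(unsafe, backup_action, action)
```

In this reading, the aggregator is responsible for task performance while the safeguard handles constraint violations, which is one way to realize the separation of goals the abstract mentions.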
Abstract: The building sector has been recognized as one of the primary sectors of worldwide energy consumption. Improving the energy efficiency of buildings can help reduce operating costs and greenhouse gas emissions. The energy management system (EMS) monitors and controls the operation of built-in appliances in buildings, so an efficient EMS is crucial for improving building operation efficiency and maintaining safe operation. With the growing penetration of renewable energy and electrical appliances, increasing attention has been paid to the development of intelligent building EMS. Recently, reinforcement learning (RL) has been applied to building EMS and has shown promising potential. However, most current RL-based EMS solutions need a large amount of data to learn a reliable control policy, which limits their applicability in the real world. In this work, we propose MetaEMS, which achieves better energy management performance by combining the benefits of RL and meta-learning. Experimental results show that MetaEMS adapts faster to environment changes and performs better in most situations compared with other baselines.
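The abstract combines RL with meta-learning for faster adaptation across buildings. The sketch below shows one generic way such a combination could look, a first-order MAML-style loop over per-building tasks; the function names, the supervised surrogate loss, and the support/query split are assumptions, not the MetaEMS algorithm itself.

```python
# Minimal sketch (hypothetical names): first-order MAML-style meta-learning
# for a building EMS control policy; the actual MetaEMS procedure may differ.
import copy
import torch
import torch.nn as nn

def inner_adapt(policy, support_batch, inner_lr=1e-2, steps=1):
    """Adapt a copy of the meta-policy to one building's recent operation data."""
    adapted = copy.deepcopy(policy)
    opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    for _ in range(steps):
        states, target_actions = support_batch        # e.g. behavior-cloning style targets
        loss = nn.functional.mse_loss(adapted(states), target_actions)
        opt.zero_grad(); loss.backward(); opt.step()
    return adapted

def meta_update(policy, tasks, meta_lr=1e-3):
    """Accumulate post-adaptation gradients across buildings (FOMAML approximation)."""
    meta_opt = torch.optim.SGD(policy.parameters(), lr=meta_lr)
    meta_opt.zero_grad()
    for support_batch, query_batch in tasks:          # each task = one building's data
        adapted = inner_adapt(policy, support_batch)
        states, target_actions = query_batch
        loss = nn.functional.mse_loss(adapted(states), target_actions)
        grads = torch.autograd.grad(loss, adapted.parameters())
        for p, g in zip(policy.parameters(), grads):  # first-order: reuse adapted grads
            p.grad = g if p.grad is None else p.grad + g
    meta_opt.step()
```

The intuition this illustrates is that the meta-policy encodes control knowledge shared across buildings, so only a few inner-loop steps on a new building's data are needed, which is what underlies the claimed data efficiency.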
Abstract: Vision-Language Navigation requires the agent to follow natural language instructions to reach a specific target. The large discrepancy between seen and unseen environments makes it challenging for the agent to generalize well. Previous studies propose data augmentation methods that mitigate the data bias explicitly or implicitly and provide improvements in generalization. However, these methods tend to memorize the augmented trajectories and ignore distribution shifts in unseen environments at test time. In this paper, we propose an Unseen Discrepancy Anticipating Vision and Language Navigation framework (DAVIS) that learns to generalize to unseen environments by encouraging test-time visual consistency. Specifically, we devise: 1) a semi-supervised framework, DAVIS, that leverages visual consistency signals across similar semantic observations; 2) a two-stage learning procedure that encourages adaptation to the test-time distribution. The framework enhances the basic mixture of imitation and reinforcement learning with Momentum Contrast to encourage stable decision-making on similar observations under a joint training stage and a test-time adaptation stage. Extensive experiments show that DAVIS achieves model-agnostic improvements over previous state-of-the-art VLN baselines on the R2R and RxR benchmarks. Our source code and data are included in the supplemental materials.
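The abstract names Momentum Contrast as the mechanism behind the visual consistency signal. The sketch below is a generic MoCo-style momentum update and InfoNCE consistency loss between two views of the same observation, given only to illustrate the mechanism; the function names, the two-view setup, and the temperature value are assumptions rather than DAVIS's exact formulation.

```python
# Minimal sketch (hypothetical names): momentum-contrast style consistency
# between two views of the same observation, as used conceptually in DAVIS.
import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m=0.999):
    """Exponential moving average of the query encoder into the momentum (key) encoder."""
    for pq, pk in zip(encoder_q.parameters(), encoder_k.parameters()):
        pk.data.mul_(m).add_(pq.data, alpha=1.0 - m)

def consistency_loss(encoder_q, encoder_k, view_a, view_b, temperature=0.07):
    """InfoNCE-style loss pulling together embeddings of two views of one observation."""
    q = F.normalize(encoder_q(view_a), dim=-1)           # (B, D) query embeddings
    with torch.no_grad():
        k = F.normalize(encoder_k(view_b), dim=-1)       # (B, D) key embeddings, no gradient
    logits = q @ k.t() / temperature                     # (B, B) similarity matrix
    labels = torch.arange(q.size(0), device=q.device)    # positives on the diagonal
    return F.cross_entropy(logits, labels)
```

Under this reading, the same consistency objective can be optimized during joint training alongside imitation and RL losses, and again at test time on unlabeled observations from the unseen environment, matching the two-stage procedure described in the abstract.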