Woo Kyung Kim

Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents

Dec 16, 2024

Wonje Choi, Woo Kyung Kim, SeungHyun Kim, Honguk Woo

Abstract: For embodied reinforcement learning (RL) agents interacting with the environment, it is desirable to have rapid policy adaptation to unseen visual observations, but achieving zero-shot adaptation capability is considered a challenging problem in the RL context. To address the problem, we present a novel contrastive prompt ensemble (ConPE) framework which utilizes a pretrained vision-language model and a set of visual prompts, thus enabling efficient policy learning and adaptation across a wide range of environmental and physical changes encountered by embodied agents. Specifically, we devise a guided-attention-based ensemble approach with multiple visual prompts on the vision-language model to construct robust state representations. Each prompt is contrastively learned in terms of an individual domain factor that significantly affects the agent's egocentric perception and observation. For a given task, the attention-based ensemble and policy are jointly learned so that the resulting state representations not only generalize to various domains but are also optimized for learning the task. Through experiments, we show that ConPE outperforms other state-of-the-art algorithms for several embodied agent tasks including navigation in AI2THOR, manipulation in egocentric-Metaworld, and autonomous driving in CARLA, while also improving the sample efficiency of policy learning and adaptation.

* Accepted at NeurIPS 2023
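
The attention-based ensemble over prompt embeddings mentioned in the abstract can be illustrated with a generic softmax-attention sketch. This is not the ConPE implementation: the function names, the scaled dot-product scoring, and the use of a single query vector as the "guidance" signal are all assumptions made for illustration, and the abstract does not specify how the guided attention is actually computed.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_ensemble(query, prompt_embeddings):
    """Combine per-prompt embeddings into one state vector.

    query: hypothetical guidance vector (assumption, not from the paper).
    prompt_embeddings: one embedding per visual prompt, same length as query.
    Returns the weighted-sum state vector and the attention weights.
    """
    scale = math.sqrt(len(query))
    # Scaled dot-product score between the query and each prompt embedding.
    scores = [sum(q * e for q, e in zip(query, emb)) / scale
              for emb in prompt_embeddings]
    weights = softmax(scores)
    # Weighted sum of the prompt embeddings, dimension by dimension.
    state = [sum(w * emb[i] for w, emb in zip(weights, prompt_embeddings))
             for i in range(len(query))]
    return state, weights

if __name__ == "__main__":
    embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    state, weights = attention_ensemble([2.0, 0.0], embeddings)
    print(state, weights)
```

In this sketch the embedding most aligned with the query receives the largest weight; in a jointly trained system the scoring function would be learned together with the policy rather than fixed.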

Embodied CoT Distillation From LLM To Off-the-shelf Agents

Dec 16, 2024

Wonje Choi, Woo Kyung Kim, Minjong Yoo, Honguk Woo

Abstract: We address the challenge of utilizing large language models (LLMs) for complex embodied tasks in environments where decision-making systems must operate in a timely manner on capacity-limited, off-the-shelf devices. We present DeDer, a framework for decomposing and distilling the embodied reasoning capabilities from LLMs to efficient, small language model (sLM)-based policies. In DeDer, the decision-making process of LLM-based strategies is restructured into a hierarchy with a reasoning-policy and a planning-policy. The reasoning-policy is distilled from data generated through the embodied in-context learning and self-verification of an LLM, so it can produce effective rationales. The planning-policy, guided by the rationales, can render optimized plans efficiently. In turn, DeDer allows for adopting sLMs for both policies, deployed on off-the-shelf devices. Furthermore, to enhance the quality of intermediate rationales specific to embodied tasks, we devise the embodied knowledge graph, and to generate multiple rationales in a timely manner through a single inference, we also use the contrastively prompted attention model. Our experiments with the ALFRED benchmark demonstrate that DeDer surpasses leading language planning and distillation approaches, indicating the applicability and efficiency of sLM-based embodied policies derived through DeDer.

* Accepted at ICML 2024

Incremental Learning of Retrievable Skills For Efficient Continual Task Adaptation

Oct 30, 2024

Daehee Lee, Minjong Yoo, Woo Kyung Kim, Wonje Choi, Honguk Woo

Abstract: Continual Imitation Learning (CiL) involves extracting and accumulating task knowledge from demonstrations across multiple stages and tasks to achieve a multi-task policy. With recent advancements in foundation models, there has been a growing interest in adapter-based CiL approaches, where adapters are established parameter-efficiently for newly demonstrated tasks. While these approaches isolate parameters for specific tasks and tend to mitigate catastrophic forgetting, they limit knowledge sharing among different demonstrations. We introduce IsCiL, an adapter-based CiL framework that addresses this limitation of knowledge sharing by incrementally learning shareable skills from different demonstrations, thus enabling sample-efficient task adaptation using these skills, particularly in non-stationary CiL environments. In IsCiL, demonstrations are mapped into the state embedding space, where proper skills can be retrieved for input states through prototype-based memory. These retrievable skills are incrementally learned on their corresponding adapters. Our CiL experiments with complex tasks in Franka-Kitchen and Meta-World demonstrate robust performance of IsCiL in both task adaptation and sample efficiency. We also show a simple extension of IsCiL for task unlearning scenarios.
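
The idea of retrieving a skill for an input state through prototype-based memory can be sketched as a nearest-prototype lookup. This is a minimal illustration, not the IsCiL implementation: the `PrototypeMemory` class, the Euclidean distance metric, and the string skill identifiers are assumptions, and the abstract does not describe how prototypes are formed or matched.

```python
import math

class PrototypeMemory:
    """Hypothetical memory mapping state-embedding prototypes to skill ids."""

    def __init__(self):
        self.entries = []  # list of (prototype_vector, skill_id)

    def add(self, prototype, skill_id):
        """Register a prototype; new skills can be appended incrementally."""
        self.entries.append((list(prototype), skill_id))

    def retrieve(self, state_embedding):
        """Return the skill id whose prototype is closest to the state."""
        if not self.entries:
            raise LookupError("no prototypes stored")
        # Euclidean distance is an assumption; any metric could be used.
        _, skill_id = min(
            self.entries,
            key=lambda entry: math.dist(entry[0], state_embedding),
        )
        return skill_id

if __name__ == "__main__":
    memory = PrototypeMemory()
    memory.add([0.0, 0.0], "open_drawer")
    memory.add([1.0, 1.0], "turn_knob")
    print(memory.retrieve([0.9, 0.8]))
```

In an adapter-based setting, the returned skill id would select which adapter to apply; here it is simply a label.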

Pareto Inverse Reinforcement Learning for Diverse Expert Policy Generation

Aug 22, 2024

Woo Kyung Kim, Minjong Yoo, Honguk Woo

Abstract: Data-driven offline reinforcement learning and imitation learning approaches have been gaining popularity in addressing sequential decision-making problems. Yet, these approaches rarely consider learning Pareto-optimal policies from a limited pool of expert datasets. This gap is particularly pronounced given the practical limitations in obtaining comprehensive datasets for all preferences, where multiple conflicting objectives exist and each expert might hold a unique optimization preference for these objectives. In this paper, we adapt inverse reinforcement learning (IRL) by using reward distance estimates for regularizing the discriminator. This enables progressive generation of a set of policies that accommodate diverse preferences on the multiple objectives, while using only two distinct datasets, each associated with a different expert preference. In doing so, we present a Pareto IRL framework (ParIRL) that establishes a Pareto policy set from these limited datasets. In the framework, the Pareto policy set is then distilled into a single, preference-conditioned diffusion model, thus allowing users to immediately specify which expert's patterns they prefer. Through experiments, we show that ParIRL outperforms other IRL algorithms for various multi-objective control tasks, achieving a dense approximation of the Pareto frontier. We also demonstrate the applicability of ParIRL with autonomous driving in CARLA.

* 13 pages, 7 figures; Accepted for International Joint Conference on Artificial Intelligence (IJCAI) 2024; Published version
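
For readers unfamiliar with the term, a Pareto policy set contains only policies that no other policy beats on every objective at once. The sketch below shows the standard non-dominated filter on made-up return vectors, assuming every objective is to be maximized. It illustrates the definition only and is not part of ParIRL; the function names and the example numbers are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(points):
    """Keep the points that no other point dominates (maximization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

if __name__ == "__main__":
    # Hypothetical (objective_1, objective_2) returns for five policies.
    returns = [(1, 5), (3, 3), (5, 1), (2, 2), (1, 1)]
    print(pareto_front(returns))
```

A dense approximation of the frontier, in this vocabulary, means that the surviving points cover the trade-off curve between the objectives without large gaps.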

Robust Policy Learning via Offline Skill Diffusion

Mar 05, 2024

Woo Kyung Kim, Minjong Yoo, Honguk Woo

Abstract: Skill-based reinforcement learning (RL) approaches have shown considerable promise, especially in solving long-horizon tasks via hierarchical structures. These skills, learned task-agnostically from offline datasets, can accelerate the policy learning process for new tasks. Yet, the application of these skills in different domains remains restricted due to their inherent dependency on the datasets, which poses a challenge when attempting to learn a skill-based policy via RL for a target domain different from the datasets' domains. In this paper, we present a novel offline skill learning framework, DuSkill, which employs a guided Diffusion model to generate versatile skills extended from the limited skills in datasets, thereby enhancing the robustness of policy learning for tasks in different domains. Specifically, we devise a guided diffusion-based skill decoder in conjunction with hierarchical encoding to disentangle the skill embedding space into two distinct representations, one for encapsulating domain-invariant behaviors and the other for delineating the factors that induce domain variations in the behaviors. Our DuSkill framework enhances the diversity of skills learned offline, thus accelerating the learning of high-level policies for different domains. Through experiments, we show that DuSkill outperforms other skill-based imitation learning and RL algorithms for several long-horizon tasks, demonstrating its benefits in few-shot imitation and online RL.

* Accepted for AAAI 2024

One-shot Imitation in a Non-Stationary Environment via Multi-Modal Skill

Feb 13, 2024

Sangwoo Shin, Daehee Lee, Minjong Yoo, Woo Kyung Kim, Honguk Woo

Abstract: One-shot imitation aims to learn a new task from a single demonstration, yet it is challenging to apply it to complex tasks with the high domain diversity inherent in a non-stationary environment. To tackle the problem, we explore the compositionality of complex tasks, and present a novel skill-based imitation learning framework enabling one-shot imitation and zero-shot adaptation; from a single demonstration for a complex unseen task, a semantic skill sequence is inferred and then each skill in the sequence is converted into an action sequence optimized for environmental hidden dynamics that can vary over time. Specifically, we leverage a vision-language model to learn a semantic skill set from offline video datasets, where each skill is represented in the vision-language embedding space, and adapt meta-learning with dynamics inference to enable zero-shot skill adaptation. We evaluate our framework with various one-shot imitation scenarios for extended multi-stage Meta-world tasks, showing its superiority over other baselines in learning complex tasks, generalizing to dynamics changes, and extending to different demonstration conditions and modalities.

* ICML-2023 Camera Ready Version
