Daesol Cho

Diversify & Conquer: Outcome-directed Curriculum RL via Out-of-Distribution Disagreement

Oct 30, 2023

Daesol Cho, Seungjae Lee, H. Jin Kim

Abstract: Reinforcement learning (RL) often faces uninformed search problems, in which the agent must explore without access to domain knowledge such as characteristics of the environment or external rewards. To tackle these challenges, this work proposes a new approach for curriculum RL called Diversify for Disagreement & Conquer (D2C). Unlike previous curriculum learning methods, D2C requires only a few examples of desired outcomes and works in any environment, regardless of its geometry or the distribution of the desired outcome examples. The proposed method diversifies goal-conditional classifiers to identify similarities between visited and desired outcome states and ensures that the classifiers disagree on out-of-distribution states, which makes it possible to quantify the unexplored region and to design an arbitrary goal-conditioned intrinsic reward signal in a simple and intuitive way. The proposed method then employs bipartite matching to define a curriculum learning objective that produces a sequence of well-adjusted intermediate goals, which enable the agent to automatically explore and conquer the unexplored region. We present experimental results demonstrating that D2C outperforms prior curriculum RL methods both quantitatively and qualitatively, even with arbitrarily distributed desired outcome examples.
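
To make the idea of classifier disagreement concrete, here is a minimal sketch and not the authors' implementation. It assumes each state has already been scored by several classifiers, each returning a probability that the state resembles a desired outcome, and it uses the population variance of those scores as a hypothetical disagreement measure. The state names and scores are invented for illustration.

```python
from statistics import pvariance

def disagreement(probs):
    """Population variance of the classifiers' probabilities for one state."""
    return pvariance(probs)

def rank_by_disagreement(predictions):
    """Return state names sorted from most to least disputed."""
    return sorted(
        predictions,
        key=lambda state: disagreement(predictions[state]),
        reverse=True,
    )

# Hypothetical scores from three classifiers (illustrative values only).
predictions = {
    "visited_state": [0.05, 0.06, 0.04],     # classifiers agree: not an outcome
    "desired_outcome": [0.95, 0.97, 0.96],   # classifiers agree: an outcome
    "unexplored_state": [0.10, 0.90, 0.50],  # classifiers disagree
}
ranking = rank_by_disagreement(predictions)
```

In this toy example the state on which the classifiers disagree most is ranked first, which is the kind of signal the abstract describes using to quantify the unexplored region.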

CQM: Curriculum Reinforcement Learning with a Quantized World Model

Oct 26, 2023

Seungjae Lee, Daesol Cho, Jonghae Park, H. Jin Kim

Abstract: Recent curriculum Reinforcement Learning (RL) has shown notable progress in solving complex tasks by proposing sequences of surrogate tasks. However, previous approaches often face challenges when they generate curriculum goals in a high-dimensional space, so they usually rely on manually specified goal spaces. To alleviate this limitation and improve the scalability of the curriculum, we propose a novel curriculum method that automatically defines a semantic goal space containing vital information for the curriculum process and suggests curriculum goals over it. To define the semantic goal space, our method discretizes continuous observations via vector quantized-variational autoencoders (VQ-VAE) and restores the temporal relations between the discretized observations with a graph. Concurrently, our method suggests uncertainty and temporal distance-aware curriculum goals that converge to the final goals over the automatically composed goal space. We demonstrate that the proposed method allows efficient exploration in an uninformed environment with raw goal examples only. It also outperforms state-of-the-art curriculum RL methods in data efficiency and performance on various goal-reaching tasks, even with ego-centric visual inputs.

* Accepted to NeurIPS 2023
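
The discretize-then-connect idea in the abstract above can be sketched in a few lines. This is an illustration under stated assumptions, not the CQM implementation: a real VQ-VAE learns both an encoder and its codebook, whereas here the codebook and the embedded trajectory are hand-written, hypothetical values, and only the nearest-code lookup and the recording of observed code-to-code transitions are shown.

```python
def nearest_code(vec, codebook):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))

def build_transition_graph(trajectory, codebook):
    """Quantize each embedding and collect directed edges between
    consecutive, distinct codes seen along the trajectory."""
    codes = [nearest_code(v, codebook) for v in trajectory]
    edges = set()
    for a, b in zip(codes, codes[1:]):
        if a != b:
            edges.add((a, b))
    return codes, edges

# Hypothetical 2-D codebook and embedded trajectory (illustrative values only).
codebook = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
trajectory = [(0.1, 0.0), (0.2, 0.1), (0.9, 0.1), (1.0, 0.8)]
codes, edges = build_transition_graph(trajectory, codebook)
```

The resulting edge set is a small directed graph over discrete codes. A graph of this kind is one way to recover how far apart two discretized observations are in time, for example by counting edges on a shortest path.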

Demonstration-free Autonomous Reinforcement Learning via Implicit and Bidirectional Curriculum

May 17, 2023

Jigang Kim, Daesol Cho, H. Jin Kim

Abstract: While reinforcement learning (RL) has achieved great success in acquiring complex skills solely from environmental interactions, it assumes that resets to the initial state are readily available at the end of each episode. Such an assumption hinders the autonomous learning of embodied agents due to the time-consuming and cumbersome workarounds for resetting in the physical world. Hence, there has been a growing interest in autonomous RL (ARL) methods that are capable of learning from non-episodic interactions. However, existing works on ARL are limited by their reliance on prior data and are unable to learn in environments where task-relevant interactions are sparse. In contrast, we propose a demonstration-free ARL algorithm via Implicit and Bi-directional Curriculum (IBC). With an auxiliary agent that is conditionally activated upon learning progress and a bidirectional goal curriculum based on optimal transport, our method outperforms previous methods, even the ones that leverage demonstrations.

* Accepted to ICML 2023 (poster)

Outcome-directed Reinforcement Learning by Uncertainty & Temporal Distance-Aware Curriculum Goal Generation

Jan 27, 2023

Daesol Cho, Seungjae Lee, H. Jin Kim

Abstract: Current reinforcement learning (RL) often suffers when solving a challenging exploration problem where the desired outcomes or high rewards are rarely observed. Even though curriculum RL, a framework that solves complex tasks by proposing a sequence of surrogate tasks, shows reasonable results, most previous works still have difficulty proposing a curriculum because they lack a mechanism for obtaining calibrated guidance to the desired outcome state without any prior domain knowledge. To address this, we propose an uncertainty & temporal distance-aware curriculum goal generation method for outcome-directed RL that solves a bipartite matching problem. It not only provides precisely calibrated guidance of the curriculum to the desired outcome states but also brings much better sample efficiency and a geometry-agnostic curriculum goal proposal capability compared to previous curriculum RL methods. We demonstrate that our algorithm significantly outperforms these prior methods, both quantitatively and qualitatively, in a variety of challenging navigation tasks and robotic manipulation tasks.

* ICLR 2023 Spotlight. First two authors contributed equally
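
As a rough illustration of choosing curriculum goals by bipartite matching, the sketch below assigns each desired outcome to a distinct candidate goal so that the total cost is minimal. It is not the paper's algorithm: the cost matrix is a hypothetical stand-in for whatever uncertainty and temporal-distance terms a real method would compute, and the brute-force search is only suitable for tiny instances (a polynomial-time assignment solver such as `scipy.optimize.linear_sum_assignment` would normally be used instead).

```python
from itertools import permutations

def min_cost_matching(cost):
    """Brute-force minimum-cost assignment.

    cost[r][c] is the cost of pairing candidate goal r with desired
    outcome c. Each outcome gets a distinct candidate, so the number of
    rows must be at least the number of columns.
    """
    n_rows, n_cols = len(cost), len(cost[0])
    best_rows, best_total = None, float("inf")
    for rows in permutations(range(n_rows), n_cols):
        total = sum(cost[r][c] for c, r in enumerate(rows))
        if total < best_total:
            best_rows, best_total = rows, total
    return list(best_rows), best_total

# Hypothetical costs: 3 candidate goals (rows) x 2 desired outcomes (columns).
cost = [
    [0.9, 0.2],
    [0.1, 0.8],
    [0.5, 0.5],
]
chosen, total = min_cost_matching(cost)
```

Here `chosen[c]` is the candidate matched to desired outcome `c`. In this toy matrix the first outcome is paired with candidate 1 and the second with candidate 0, and candidate 2 is left unmatched.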

S2P: State-conditioned Image Synthesis for Data Augmentation in Offline Reinforcement Learning

Sep 30, 2022

Daesol Cho, Dongseok Shim, H. Jin Kim

Abstract: Offline reinforcement learning (Offline RL) suffers from innate distributional shift, as it cannot interact with the physical environment during training. To alleviate this limitation, state-based offline RL leverages a dynamics model learned from the logged experience and augments the data with predicted state transitions to extend the data distribution. To bring this benefit to image-based RL as well, we first propose a generative model, S2P (State2Pixel), which synthesizes the raw pixels of the agent from its corresponding state. It bridges the gap between the state and the image domain in RL algorithms and enables virtual exploration of unseen image distributions via model-based transitions in the state space. Through experiments, we confirm that our S2P-based image synthesis not only improves image-based offline RL performance but also shows powerful generalization capability on unseen tasks.

* NeurIPS 2022, first two authors contributed equally

Unsupervised Reinforcement Learning for Transferable Manipulation Skill Discovery

Apr 29, 2022

Daesol Cho, Jigang Kim, H. Jin Kim

Abstract: Current reinforcement learning (RL) in robotics often has difficulty generalizing to new downstream tasks due to its innate task-specific training paradigm. To address this, unsupervised RL, a framework that pre-trains the agent in a task-agnostic manner without access to the task-specific reward, leverages active exploration to distill diverse experience into essential skills or reusable knowledge. To bring these benefits to robotic manipulation as well, we propose an unsupervised method for transferable manipulation skill discovery that ties structured exploration toward interacting behavior and transferable skill learning. It enables the agent not only to learn interaction behavior, the key aspect of robotic manipulation learning, without access to the environment reward, but also to generalize to arbitrary downstream manipulation tasks with the learned task-agnostic skills. Through comparative experiments, we show that our approach achieves the most diverse interacting behavior and significantly improves sample efficiency in downstream tasks, including the extension to multi-object, multitask problems.

* 8 pages, 9 figures; accepted for publication in the IEEE Robotics and Automation Letters (RA-L); supplementary video available at https://www.youtube.com/watch?v=bF3Y4WXfM7c&t=9s

Automating Reinforcement Learning with Example-based Resets

Apr 06, 2022

Jigang Kim, J. hyeon Park, Daesol Cho, H. Jin Kim

Abstract: Deep reinforcement learning has enabled robots to learn motor skills from environmental interactions with minimal to no prior knowledge. However, existing reinforcement learning algorithms assume an episodic setting, in which the agent resets to a fixed initial state distribution at the end of each episode, to successfully train the agents from repeated trials. Such a reset mechanism, while trivial for simulated tasks, can be challenging to provide for real-world robotics tasks. Resets in robotic systems often require extensive human supervision and task-specific workarounds, which contradicts the goal of autonomous robot learning. In this paper, we propose an extension to conventional reinforcement learning towards greater autonomy by introducing an additional agent that learns to reset in a self-supervised manner. The reset agent preemptively triggers a reset to prevent manual resets and implicitly imposes a curriculum for the forward agent. We apply our method to learn from scratch on a suite of simulated and real-world continuous control tasks and demonstrate that the reset agent successfully learns to reduce manual resets whilst also allowing the forward policy to improve gradually over time.

* 8 pages, 6 figures; accepted for publication in the IEEE Robotics and Automation Letters (RA-L); source code available at https://github.com/jigangkim/autoreset_rl ; supplementary video available at https://youtu.be/himd0Z5b64A
