Seungyul Han

Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning

Jun 26, 2025

Jaebak Hwang, Sanghyeon Lee, Jeongmo Kim, Seungyul Han

Abstract: Long-horizon goal-conditioned tasks pose fundamental challenges for reinforcement learning (RL), particularly when goals are distant and rewards are sparse. While hierarchical and graph-based methods offer partial solutions, they often suffer from subgoal infeasibility and inefficient planning. We introduce Strict Subgoal Execution (SSE), a graph-based hierarchical RL framework that enforces single-step subgoal reachability by structurally constraining high-level decision-making. To enhance exploration, SSE employs a decoupled exploration policy that systematically traverses underexplored regions of the goal space. Furthermore, SSE introduces a failure-aware path refinement that dynamically adjusts graph edge costs according to observed low-level success rates, thereby improving subgoal reliability. Experimental results across diverse long-horizon benchmarks demonstrate that SSE consistently outperforms existing goal-conditioned RL and hierarchical RL approaches in both efficiency and success rate.

* 9 technical pages followed by references and appendix
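
The failure-aware path refinement can be pictured as shortest-path planning over a subgoal graph whose edge costs grow with the low-level policy's observed failure rate. The sketch below uses Dijkstra's algorithm; the cost rule `distance * (1 + penalty * (1 - success_rate))`, the Laplace smoothing, and the names `refined_cost` and `plan` are illustrative assumptions, not the SSE authors' formulation.

```python
import heapq
import math


def refined_cost(distance, successes, attempts, penalty=1.0, prior=1.0):
    """Hypothetical edge cost: base distance inflated by the estimated failure rate."""
    # Laplace-smoothed success estimate; an edge with no attempts starts at 0.5.
    success_rate = (successes + prior) / (attempts + 2.0 * prior)
    return distance * (1.0 + penalty * (1.0 - success_rate))


def plan(graph, stats, start, goal, penalty=1.0):
    """Dijkstra over a subgoal graph using failure-adjusted edge costs.

    graph: {node: {neighbor: base_distance}}
    stats: {(node, neighbor): (low_level_successes, attempts)}
    Returns (path, total_cost), or (None, inf) if the goal is unreachable.
    """
    dist = {start: 0.0}
    prev = {}
    frontier = [(0.0, start)]
    done = set()
    while frontier:
        d, u = heapq.heappop(frontier)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        if u == goal:
            break
        for v, base in graph.get(u, {}).items():
            successes, attempts = stats.get((u, v), (0, 0))
            nd = d + refined_cost(base, successes, attempts, penalty)
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(frontier, (nd, v))
    if goal not in dist:
        return None, math.inf
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[goal]


if __name__ == "__main__":
    graph = {"A": {"B": 1.0, "C": 2.0}, "B": {"C": 1.5}}
    # Hypothetical statistics: the direct edge A->C failed 9 of 10 times.
    stats = {("A", "C"): (1, 10), ("A", "B"): (10, 10), ("B", "C"): (10, 10)}
    print(plan(graph, {}, "A", "C"))     # no failure data yet
    print(plan(graph, stats, "A", "C"))  # failure-aware refinement
```

With these made-up numbers, the planner takes the short direct edge A-to-C while no failures are recorded, then reroutes through B once that edge proves unreliable. A multiplicative penalty keeps costs on the same scale as the original distances.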

Center of Gravity-Guided Focusing Influence Mechanism for Multi-Agent Reinforcement Learning

Jun 24, 2025

Yisak Park, Sunwoo Lee, Seungyul Han

Abstract: Cooperative multi-agent reinforcement learning (MARL) under sparse rewards presents a fundamental challenge due to limited exploration and insufficient coordinated attention among agents. In this work, we propose the Focusing Influence Mechanism (FIM), a novel framework that enhances cooperation by directing agent influence toward task-critical elements, referred to as Center of Gravity (CoG) state dimensions, inspired by Clausewitz's military theory. FIM consists of three core components: (1) identifying CoG state dimensions based on their stability under agent behavior, (2) designing counterfactual intrinsic rewards to promote meaningful influence on these dimensions, and (3) encouraging persistent and synchronized focus through eligibility-trace-based credit accumulation. These mechanisms enable agents to induce more targeted and effective state transitions, facilitating robust cooperation even in extremely sparse reward settings. Empirical evaluations across diverse MARL benchmarks demonstrate that the proposed FIM significantly improves cooperative performance compared to baselines.

* 9 technical pages followed by references and appendix
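
Component (3) can be illustrated with a standard accumulating eligibility trace, e_t = λ·e_{t-1} + x_t: an agent that keeps influencing the CoG dimensions over consecutive steps builds up more credit than one that does so once. The per-step influence signal, the decay value, and the name `accumulate_credit` are assumptions for illustration; the abstract does not give FIM's exact update.

```python
def accumulate_credit(influence, decay=0.9):
    """Eligibility-trace style credit accumulation (illustrative sketch).

    influence[t][i] is a hypothetical scalar measuring how strongly agent i
    affected the CoG state dimensions at step t (for example, the size of a
    counterfactual intrinsic reward). Returns the trace vector after each step.
    """
    if not influence:
        return []
    trace = [0.0] * len(influence[0])
    history = []
    for step in influence:
        # Decay previously accumulated credit, then add this step's influence.
        trace = [decay * e + x for e, x in zip(trace, step)]
        history.append(list(trace))
    return history


if __name__ == "__main__":
    # Agent 0 influences the CoG dimension for two steps in a row; agent 1 only once.
    print(accumulate_credit([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], decay=0.5))
```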

PRISM: A Robust Framework for Skill-based Meta-Reinforcement Learning with Noisy Demonstrations

Feb 06, 2025

Sanghyeon Lee, Sangjun Bae, Yisak Park, Seungyul Han

Abstract: Meta-reinforcement learning (Meta-RL) facilitates rapid adaptation to unseen tasks but faces challenges in long-horizon environments. Skill-based approaches tackle this by decomposing state-action sequences into reusable skills and employing hierarchical decision-making. However, these methods are highly susceptible to noisy offline demonstrations, resulting in unstable skill learning and degraded performance. To overcome this, we propose Prioritized Refinement for Skill-Based Meta-RL (PRISM), a robust framework that integrates exploration near noisy data to generate online trajectories and combines them with offline data. Through prioritization, PRISM extracts high-quality data to learn task-relevant skills effectively. By addressing the impact of noise, our method ensures stable skill learning and achieves superior performance in long-horizon tasks, even with noisy and sub-optimal data.

* 8 pages main, 19 pages appendix with references. Submitted to ICML 2025

Task-Aware Virtual Training: Enhancing Generalization in Meta-Reinforcement Learning for Out-of-Distribution Tasks

Feb 05, 2025

Jeongmo Kim, Yisak Park, Minung Kim, Seungyul Han

Abstract: Meta-reinforcement learning aims to develop policies that generalize to unseen tasks sampled from a task distribution. While context-based meta-RL methods improve task representation using task latents, they often struggle with out-of-distribution (OOD) tasks. To address this, we propose Task-Aware Virtual Training (TAVT), a novel algorithm that accurately captures task characteristics for both training and OOD scenarios using metric-based representation learning. Our method successfully preserves task characteristics in virtual tasks and employs a state regularization technique to mitigate overestimation errors in state-varying environments. Numerical results demonstrate that TAVT significantly enhances generalization to OOD tasks across various MuJoCo and MetaWorld environments.

* 8 pages main paper, 19 pages appendices with references. Submitted to ICML 2025
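
As a rough picture of what "metric-based representation learning" can mean, the sketch below scores a set of task embeddings by how well distances in latent space match distances between the underlying task parameters. This generic metric-matching loss and the name `metric_loss` are assumptions for illustration, not TAVT's actual objective.

```python
import itertools
import math


def metric_loss(latents, task_params):
    """Hypothetical metric-learning objective.

    latents[i] is the embedding of task i and task_params[i] its ground-truth
    parameters (e.g. a target velocity). The loss is the mean squared mismatch
    between pairwise latent distances and pairwise task distances.
    """
    pairs = list(itertools.combinations(range(len(latents)), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        latent_gap = math.dist(latents[i], latents[j])
        task_gap = math.dist(task_params[i], task_params[j])
        total += (latent_gap - task_gap) ** 2
    return total / len(pairs)
```

Driving this loss toward zero keeps nearby tasks nearby in latent space, which is the kind of structure that lets an embedding extrapolate sensibly to tasks outside the training set.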

Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning

Feb 05, 2025

Sunwoo Lee, Jaebak Hwang, Yonghyeon Jo, Seungyul Han

Abstract: Traditional robust methods in multi-agent reinforcement learning (MARL) often struggle against coordinated adversarial attacks in cooperative scenarios. To address this limitation, we propose the Wolfpack Adversarial Attack framework, inspired by wolf hunting strategies, which targets an initial agent and its assisting agents to disrupt cooperation. Additionally, we introduce the Wolfpack-Adversarial Learning for MARL (WALL) framework, which trains robust MARL policies to defend against the proposed Wolfpack attack by fostering system-wide collaboration. Experimental results underscore the devastating impact of the Wolfpack attack and the significant robustness improvements achieved by WALL.

* 8 pages main, 21 pages appendix with references. Submitted to ICML 2025

Domain-Invariant Per-Frame Feature Extraction for Cross-Domain Imitation Learning with Visual Observations

Feb 05, 2025

Minung Kim, Kawon Lee, Jungmo Kim, Sungho Choi, Seungyul Han

Abstract: Imitation learning (IL) enables agents to mimic expert behavior without reward signals but faces challenges in cross-domain scenarios with high-dimensional, noisy, and incomplete visual observations. To address this, we propose Domain-Invariant Per-Frame Feature Extraction for Imitation Learning (DIFF-IL), a novel IL method that extracts domain-invariant features from individual frames and adapts them into sequences to isolate and replicate expert behaviors. We also introduce a frame-wise time labeling technique to segment expert behaviors by timesteps and assign rewards aligned with temporal contexts, enhancing task performance. Experiments across diverse visual environments demonstrate the effectiveness of DIFF-IL in addressing complex visual tasks.

* 8 pages main, 19 pages appendix with references. Submitted to ICML 2025

Exclusively Penalized Q-learning for Offline Reinforcement Learning

May 23, 2024

Junghyuk Yeom, Yonghyeon Jo, Jungmo Kim, Sanghyeon Lee, Seungyul Han

Abstract: Constraint-based offline reinforcement learning (RL) imposes policy constraints or penalties on the value function to mitigate overestimation errors caused by distributional shift. This paper focuses on a limitation of existing offline RL methods that penalize the value function: the penalty can introduce unnecessary bias, leading to underestimation. To address this concern, we propose Exclusively Penalized Q-learning (EPQ), which reduces estimation bias in the value function by selectively penalizing states that are prone to inducing estimation errors. Numerical results show that our method significantly reduces underestimation bias and improves performance in various offline control tasks compared to other offline RL methods.

* 9 technical pages followed by references and appendix
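
The idea of penalizing only where it is needed can be contrasted with a uniform penalty using a small sketch: a CQL-style gap (Q at the policy's action minus Q at the dataset action) is applied only to samples where the behavior policy rarely takes the policy's action. The linear weighting rule, the `threshold` and `alpha` parameters, and the name `selective_penalty` are assumptions for illustration, not EPQ's actual penalty.

```python
def selective_penalty(q_policy, q_data, behavior_prob, threshold=0.1, alpha=1.0):
    """Hypothetical exclusive penalty over a batch of samples.

    q_policy[k]:      Q(s_k, a_pi) at the learned policy's action
    q_data[k]:        Q(s_k, a_k) at the dataset action
    behavior_prob[k]: estimated probability that the behavior policy takes a_pi in s_k
    """
    total = 0.0
    for qp, qd, p in zip(q_policy, q_data, behavior_prob):
        # Full weight when the action is unsupported by the data, zero once
        # its behavior probability reaches the threshold.
        weight = max(0.0, 1.0 - p / threshold)
        total += weight * (qp - qd)
    return alpha * total / max(len(q_policy), 1)
```

A uniform penalty would push down Q-values on every sample, including well-covered ones where no overestimation occurs; weighting by data coverage is one way to avoid that source of underestimation.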

Domain Adaptive Imitation Learning with Visual Observation

Dec 01, 2023

Sungho Choi, Seungyul Han, Woojun Kim, Jongseong Chae, Whiyoung Jung, Youngchul Sung

Abstract: In this paper, we consider domain-adaptive imitation learning with visual observation, where an agent in a target domain learns to perform a task by observing expert demonstrations in a source domain. Domain-adaptive imitation learning arises in practical scenarios where a robot, receiving visual sensory data, needs to mimic movements by visually observing other robots from different angles or observing robots of different shapes. To overcome the domain shift in cross-domain imitation learning with visual observation, we propose a novel framework for extracting domain-independent behavioral features from input observations that can be used to train the learner, based on dual feature extraction and image reconstruction. Empirical results demonstrate that our approach outperforms previous algorithms for imitation learning from visual observation with domain shift.

* Accepted to NeurIPS 2023

FoX: Formation-aware exploration in multi-agent reinforcement learning

Aug 22, 2023

Yonghyeon Jo, Sunwoo Lee, Junghyuk Yum, Seungyul Han

Abstract: Recently, deep multi-agent reinforcement learning (MARL) has gained significant popularity due to its success in various cooperative multi-agent tasks. However, exploration remains a challenging problem in MARL due to the partial observability of the agents and an exploration space that can grow exponentially as the number of agents increases. First, to address the scalability issue of the exploration space, we define a formation-based equivalence relation on the exploration space and aim to reduce the search space by exploring only meaningful states in different formations. Then, we propose a novel formation-aware exploration (FoX) framework that encourages partially observable agents to visit states in diverse formations by guiding them to be well aware of their current formation solely based on their own observations. Numerical results show that the proposed FoX framework significantly outperforms state-of-the-art MARL algorithms on Google Research Football (GRF) and sparse StarCraft II Multi-Agent Challenge (SMAC) tasks.

* 7 pages main, 5 pages appendix with references. 10 figures, submitted to AAAI
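
To see how an equivalence relation can shrink the exploration space, the sketch below treats two joint states as the same "formation" when the agents' positions match after translating the team centroid to the origin, discretizing offsets to a grid, and ignoring agent order. It then pairs this with a simple count-based novelty bonus. Both the relation and the bonus are assumptions chosen for illustration. FoX's actual formulation guides agents to be aware of their formation from their own observations, which this global-state sketch does not model.

```python
import math
from collections import Counter


def formation_key(positions, cell=1.0):
    """Hypothetical formation equivalence class for 2-D agent positions.

    Translation-invariant (offsets from the centroid), coarse (grid cells of
    size `cell`), and permutation-invariant (offsets are sorted).
    """
    n = len(positions)
    cx = sum(x for x, _ in positions) / n
    cy = sum(y for _, y in positions) / n
    cells = [(math.floor((x - cx) / cell), math.floor((y - cy) / cell)) for x, y in positions]
    return tuple(sorted(cells))


class FormationCounter:
    """Count-based bonus over formation classes (a swapped-in technique)."""

    def __init__(self, cell=1.0):
        self.cell = cell
        self.counts = Counter()

    def bonus(self, positions):
        key = formation_key(positions, self.cell)
        self.counts[key] += 1
        # Rarely visited formations earn a larger bonus.
        return 1.0 / math.sqrt(self.counts[key])
```

Because many raw joint states collapse onto one formation key, the number of distinct classes to explore grows far more slowly than the raw joint state space.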

Robust Imitation Learning against Variations in Environment Dynamics

Jun 19, 2022

Jongseong Chae, Seungyul Han, Whiyoung Jung, Myungsik Cho, Sungho Choi, Youngchul Sung

Abstract: In this paper, we propose a robust imitation learning (IL) framework that improves the robustness of IL when environment dynamics are perturbed. Existing IL frameworks trained in a single environment can fail catastrophically under perturbations in environment dynamics because they do not account for the possibility that the underlying dynamics may change. Our framework handles environments with varying dynamics by imitating multiple experts in sampled environment dynamics, enhancing robustness to general variations in environment dynamics. To robustly imitate the multiple sampled experts, we minimize the risk with respect to the Jensen-Shannon divergence between the agent's policy and each of the sampled experts. Numerical results show that our algorithm significantly improves robustness against dynamics perturbations compared to conventional IL baselines.

* Accepted to ICML 2022
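
For discrete action distributions at a single state, the Jensen-Shannon divergence is JS(p, q) = ½ KL(p ‖ m) + ½ KL(q ‖ m) with m = (p + q)/2, bounded above by ln 2. The sketch below computes it and, as one possible reading of "risk with respect to each of the sampled experts", takes the worst case over experts. The max-over-experts aggregation and the function names are assumptions, not necessarily the paper's risk measure.

```python
import math


def kl(p, q):
    """KL(p || q) in nats for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence; m_i >= p_i / 2, so every log argument is finite."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def worst_case_js(agent, experts):
    """Hypothetical risk: the largest JS divergence to any sampled expert."""
    return max(js_divergence(agent, e) for e in experts)
```

Minimizing the worst case discourages the agent from matching only the experts whose dynamics are easy to imitate while drifting far from the others.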
