Abstract: Developing autonomous agents for mobile devices can significantly enhance user interactions by offering increased efficiency and accessibility. However, despite the growing interest in mobile device control agents, the absence of a commonly adopted benchmark makes it challenging to quantify scientific progress in this area. In this work, we introduce B-MoCA: a novel benchmark designed specifically for evaluating mobile device control agents. To create a realistic benchmark, we develop B-MoCA based on the Android operating system and define 60 common daily tasks. Importantly, we incorporate a randomization feature that changes various aspects of mobile devices, including user interface layouts and language settings, to assess generalization performance. We benchmark diverse agents, including agents employing large language models (LLMs) or multi-modal LLMs, as well as agents trained from scratch using human expert demonstrations. While these agents demonstrate proficiency in executing straightforward tasks, their poor performance on complex tasks highlights significant opportunities for future research to enhance their effectiveness. Our source code is publicly available at https://b-moca.github.io.
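A minimal sketch of the kind of per-episode device randomization the abstract describes (UI layout and language settings varied to test generalization). The option lists and the `sample_device_config` helper are illustrative assumptions, not the actual B-MoCA API:

```python
import random

# Hypothetical illustration: each evaluation episode samples a device
# configuration so agents are tested under varied UI and language settings.
# All names and option values below are assumptions, not B-MoCA's interface.
LANGUAGES = ["en", "ko", "es", "de"]
FONT_SCALES = [0.85, 1.0, 1.15, 1.3]
WALLPAPERS = ["default", "dark", "photo_01"]
ICON_GRIDS = [(4, 5), (5, 5), (5, 6)]

def sample_device_config(seed=None):
    """Draw one randomized device configuration for an evaluation episode."""
    rng = random.Random(seed)
    return {
        "language": rng.choice(LANGUAGES),
        "font_scale": rng.choice(FONT_SCALES),
        "wallpaper": rng.choice(WALLPAPERS),
        "icon_grid": rng.choice(ICON_GRIDS),
    }

if __name__ == "__main__":
    print(sample_device_config(seed=0))
```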
Abstract: Developing an agent capable of adapting to unseen environments remains a difficult challenge in imitation learning. In this work, we present Adaptive Return-conditioned Policy (ARP), an efficient framework designed to enhance the agent's generalization ability using natural language task descriptions and pre-trained multimodal encoders. Our key idea is to compute the similarity between visual observations and natural language instructions in a pre-trained multimodal embedding space (such as CLIP) and use it as a reward signal. We then train a return-conditioned policy using expert demonstrations labeled with these multimodal rewards. Because the multimodal rewards provide adaptive signals at each timestep, ARP effectively mitigates goal misgeneralization. This results in superior generalization performance, even when faced with unseen text instructions, compared to existing text-conditioned policies. To improve the quality of the rewards, we also introduce a fine-tuning method for the pre-trained multimodal encoders, further enhancing performance. Video demonstrations and source code are available on the project website: https://sites.google.com/view/2023arp.
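A minimal sketch of the multimodal-reward idea, using an open-source CLIP checkpoint via Hugging Face Transformers: the per-timestep reward is the cosine similarity between the current frame and the instruction in the joint embedding space. The function name, checkpoint choice, and lack of any reward normalization are assumptions; ARP's exact reward computation may differ:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Sketch: cosine similarity between an observation frame and the task
# instruction in CLIP's joint embedding space, used as a reward signal.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def multimodal_reward(frame: Image.Image, instruction: str) -> float:
    inputs = processor(text=[instruction], images=frame,
                       return_tensors="pt", padding=True)
    image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                       attention_mask=inputs["attention_mask"])
    # Normalize, then take the dot product (cosine similarity).
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    return (image_emb * text_emb).sum(dim=-1).item()
```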
Abstract: Preference-based reinforcement learning (RL) provides a framework for training agents using human preferences between two behaviors. However, preference-based RL has been challenging to scale because it requires a large amount of human feedback to learn a reward function aligned with human intent. In this paper, we present Preference Transformer, a neural architecture that models human preferences using transformers. Unlike prior approaches, which assume human judgment is based on Markovian rewards that contribute equally to the decision, we introduce a new preference model based on a weighted sum of non-Markovian rewards. We then design the proposed preference model using a transformer architecture that stacks causal and bidirectional self-attention layers. We demonstrate that Preference Transformer can solve a variety of control tasks using real human preferences, whereas prior approaches fail to work. We also show that Preference Transformer can induce a well-specified reward and attend to critical events in the trajectory by automatically capturing the temporal dependencies in human decision-making. Code is available on the project website: https://sites.google.com/view/preference-transformer.
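A short sketch of the weighted-sum preference model described above: each trajectory segment is scored by a weighted sum of per-timestep rewards, and the two scores are compared with a Bradley-Terry-style softmax. The tensor shapes and the standalone function are illustrative assumptions; in the paper both the rewards and the weights would come from the transformer itself:

```python
import torch

def preference_prob(rewards_0: torch.Tensor, weights_0: torch.Tensor,
                    rewards_1: torch.Tensor, weights_1: torch.Tensor) -> torch.Tensor:
    """Probability of preferring segment 1 over segment 0.

    Sketch of the non-Markovian preference model: each segment's score is a
    weighted sum of per-timestep rewards (shape (T,) assumed), and preference
    probability is a softmax over the two segment scores.
    """
    score_0 = (weights_0 * rewards_0).sum()
    score_1 = (weights_1 * rewards_1).sum()
    return torch.softmax(torch.stack([score_0, score_1]), dim=0)[1]
```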
Abstract: Successful sequential recommendation systems rely on accurately capturing users' short-term and long-term interests. Although Transformer-based models have achieved state-of-the-art performance in the sequential recommendation task, they generally require memory and time quadratic in the sequence length, making it difficult to extract the long-term interests of users. On the other hand, Multi-Layer Perceptron (MLP)-based models, known for their linear memory and time complexity, have recently shown results competitive with Transformers on various tasks. Given the availability of massive amounts of user behavior history, the linear memory and time complexity of MLP-based models make them a promising alternative to explore for sequential recommendation. To this end, we adopted MLP-based models for sequential recommendation but consistently observed that they underperform Transformer-based methods despite their computational benefits. From experiments, we observed that introducing explicit high-order interactions into the MLP layers mitigates this performance gap. In response, we propose the Multi-Order Interaction (MOI) layer, which can express an arbitrary order of interactions among the inputs while maintaining the memory and time complexity of the MLP layer. By replacing the MLP layer with the MOI layer, our model achieves performance comparable to Transformer-based models while retaining the computational benefits of MLP-based models.
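One common way to add explicit k-th order interactions to an MLP block without leaving linear time and memory in the sequence length is to take element-wise products of several linear projections. The sketch below illustrates that general idea under those assumptions; it is not the paper's exact MOI formulation:

```python
import torch
import torch.nn as nn

class MultiOrderInteractionSketch(nn.Module):
    """Illustrative layer adding explicit k-th order feature interactions.

    Element-wise products of k linear projections yield k-th order
    interactions while keeping the parameter and time cost of a standard
    MLP layer. A sketch of the idea, not the exact MOI layer from the paper.
    """

    def __init__(self, dim: int, order: int = 2):
        super().__init__()
        self.projs = nn.ModuleList(nn.Linear(dim, dim) for _ in range(order))
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.projs[0](x)
        for proj in self.projs[1:]:
            z = z * proj(x)  # each product raises the interaction order by one
        return self.out(z)
```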
Abstract: Whether to give rights to artificial intelligence (AI) and robots has been a sensitive topic since the European Parliament proposed that advanced robots could be granted "electronic personalities." Numerous scholars who favor or disfavor its feasibility have participated in the debate. This paper presents an experiment (N=1270) that 1) collects online users' first impressions of 11 possible rights that could be granted to autonomous electronic agents of the future and 2) examines whether debunking common misconceptions about the proposal modifies one's stance toward the issue. The results indicate that even though online users mainly disfavor AI and robot rights, they are supportive of protecting electronic agents from cruelty (i.e., they favor the right against cruel treatment). Furthermore, people's perceptions became more positive when given information about rights-bearing non-human entities or myth-refuting statements. The style used to introduce AI and robot rights significantly affected how the participants perceived the proposal, similar to the way metaphors function in creating laws. For robustness, we repeated the experiment with a more representative sample of U.S. residents (N=164) and found that the perceptions gathered from online users and those of the general population are similar.