Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zoya Volovikova

AmbiK: Dataset of Ambiguous Tasks in Kitchen Environment

Jun 04, 2025

Anastasiia Ivanova, Eva Bakaeva, Zoya Volovikova, Alexey K. Kovalev, Aleksandr I. Panov

Abstract:As a part of an embodied agent, Large Language Models (LLMs) are typically used for behavior planning given natural language instructions from the user. However, dealing with ambiguous instructions in real-world environments remains a challenge for LLMs. Various methods for task ambiguity detection have been proposed. However, it is difficult to compare them because they are tested on different datasets and there is no universal benchmark. For this reason, we propose AmbiK (Ambiguous Tasks in Kitchen Environment), the fully textual dataset of ambiguous instructions addressed to a robot in a kitchen environment. AmbiK was collected with the assistance of LLMs and is human-validated. It comprises 1000 pairs of ambiguous tasks and their unambiguous counterparts, categorized by ambiguity type (Human Preferences, Common Sense Knowledge, Safety), with environment descriptions, clarifying questions and answers, user intents, and task plans, for a total of 2000 tasks. We hope that AmbiK will enable researchers to perform a unified comparison of ambiguity detection methods. AmbiK is available at https://github.com/cog-model/AmbiK-dataset.

* ACL 2025 (Main Conference)

Via

Access Paper or Ask Questions

CrafText Benchmark: Advancing Instruction Following in Complex Multimodal Open-Ended World

May 17, 2025

Zoya Volovikova, Gregory Gorbov, Petr Kuderov, Aleksandr I. Panov, Alexey Skrynnik

Abstract:Following instructions in real-world conditions requires the ability to adapt to the world's volatility and entanglement: the environment is dynamic and unpredictable, instructions can be linguistically complex with diverse vocabulary, and the number of possible goals an agent may encounter is vast. Despite extensive research in this area, most studies are conducted in static environments with simple instructions and a limited vocabulary, making it difficult to assess agent performance in more diverse and challenging settings. To address this gap, we introduce CrafText, a benchmark for evaluating instruction following in a multimodal environment with diverse instructions and dynamic interactions. CrafText includes 3,924 instructions with 3,423 unique words, covering Localization, Conditional, Building, and Achievement tasks. Additionally, we propose an evaluation protocol that measures an agent's ability to generalize to novel instruction formulations and dynamically evolving task configurations, providing a rigorous test of both linguistic understanding and adaptive decision-making.

Via

Access Paper or Ask Questions

Instruction Following with Goal-Conditioned Reinforcement Learning in Virtual Environments

Jul 12, 2024

Zoya Volovikova, Alexey Skrynnik, Petr Kuderov, Aleksandr I. Panov

Figure 1 for Instruction Following with Goal-Conditioned Reinforcement Learning in Virtual Environments

Figure 2 for Instruction Following with Goal-Conditioned Reinforcement Learning in Virtual Environments

Figure 3 for Instruction Following with Goal-Conditioned Reinforcement Learning in Virtual Environments

Figure 4 for Instruction Following with Goal-Conditioned Reinforcement Learning in Virtual Environments

Abstract:In this study, we address the issue of enabling an artificial intelligence agent to execute complex language instructions within virtual environments. In our framework, we assume that these instructions involve intricate linguistic structures and multiple interdependent tasks that must be navigated successfully to achieve the desired outcomes. To effectively manage these complexities, we propose a hierarchical framework that combines the deep language comprehension of large language models with the adaptive action-execution capabilities of reinforcement learning agents. The language module (based on LLM) translates the language instruction into a high-level action plan, which is then executed by a pre-trained reinforcement learning agent. We have demonstrated the effectiveness of our approach in two different environments: in IGLU, where agents are instructed to build structures, and in Crafter, where agents perform tasks and interact with objects in the surrounding environment according to language commands.

Via

Access Paper or Ask Questions

Learning to Solve Voxel Building Embodied Tasks from Pixels and Natural Language Instructions

Nov 01, 2022

Alexey Skrynnik, Zoya Volovikova, Marc-Alexandre Côté, Anton Voronov, Artem Zholus, Negar Arabzadeh, Shrestha Mohanty, Milagro Teruel, Ahmed Awadallah, Aleksandr Panov(+2 more)

Abstract:The adoption of pre-trained language models to generate action plans for embodied agents is a promising research strategy. However, execution of instructions in real or simulated environments requires verification of the feasibility of actions as well as their relevance to the completion of a goal. We propose a new method that combines a language model and reinforcement learning for the task of building objects in a Minecraft-like environment according to the natural language instructions. Our method first generates a set of consistently achievable sub-goals from the instructions and then completes associated sub-tasks with a pre-trained RL policy. The proposed method formed the RL baseline at the IGLU 2022 competition.

* 6 pages, 3 figures

Via

Access Paper or Ask Questions

IGLU Gridworld: Simple and Fast Environment for Embodied Dialog Agents

May 31, 2022

Artem Zholus, Alexey Skrynnik, Shrestha Mohanty, Zoya Volovikova, Julia Kiseleva, Artur Szlam, Marc-Alexandre Coté, Aleksandr I. Panov

Figure 1 for IGLU Gridworld: Simple and Fast Environment for Embodied Dialog Agents

Figure 2 for IGLU Gridworld: Simple and Fast Environment for Embodied Dialog Agents

Figure 3 for IGLU Gridworld: Simple and Fast Environment for Embodied Dialog Agents

Abstract:We present the IGLU Gridworld: a reinforcement learning environment for building and evaluating language conditioned embodied agents in a scalable way. The environment features visual agent embodiment, interactive learning through collaboration, language conditioned RL, and combinatorically hard task (3d blocks building) space.

Via

Access Paper or Ask Questions

IGLU 2022: Interactive Grounded Language Understanding in a Collaborative Environment at NeurIPS 2022

May 27, 2022

Julia Kiseleva, Alexey Skrynnik, Artem Zholus, Shrestha Mohanty, Negar Arabzadeh, Marc-Alexandre Côté, Mohammad Aliannejadi, Milagro Teruel, Ziming Li, Mikhail Burtsev(+7 more)

Figure 1 for IGLU 2022: Interactive Grounded Language Understanding in a Collaborative Environment at NeurIPS 2022

Figure 2 for IGLU 2022: Interactive Grounded Language Understanding in a Collaborative Environment at NeurIPS 2022

Figure 3 for IGLU 2022: Interactive Grounded Language Understanding in a Collaborative Environment at NeurIPS 2022

Figure 4 for IGLU 2022: Interactive Grounded Language Understanding in a Collaborative Environment at NeurIPS 2022

Abstract:Human intelligence has the remarkable ability to adapt to new tasks and environments quickly. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research in this direction, we propose IGLU: Interactive Grounded Language Understanding in a Collaborative Environment. The primary goal of the competition is to approach the problem of how to develop interactive embodied agents that learn to solve a task while provided with grounded natural language instructions in a collaborative environment. Understanding the complexity of the challenge, we split it into sub-tasks to make it feasible for participants. This research challenge is naturally related, but not limited, to two fields of study that are highly relevant to the NeurIPS community: Natural Language Understanding and Generation (NLU/G) and Reinforcement Learning (RL). Therefore, the suggested challenge can bring two communities together to approach one of the crucial challenges in AI. Another critical aspect of the challenge is the dedication to perform a human-in-the-loop evaluation as a final evaluation for the agents developed by contestants.

* arXiv admin note: text overlap with arXiv:2110.06536

Via

Access Paper or Ask Questions