Alexey K. Kovalev

AmbiK: Dataset of Ambiguous Tasks in Kitchen Environment

Jun 04, 2025

Anastasiia Ivanova, Eva Bakaeva, Zoya Volovikova, Alexey K. Kovalev, Aleksandr I. Panov

Abstract: As part of an embodied agent, Large Language Models (LLMs) are typically used for behavior planning given natural language instructions from the user. However, dealing with ambiguous instructions in real-world environments remains a challenge for LLMs. Various methods for task ambiguity detection have been proposed, but they are difficult to compare because they are tested on different datasets and there is no universal benchmark. For this reason, we propose AmbiK (Ambiguous Tasks in Kitchen Environment), a fully textual dataset of ambiguous instructions addressed to a robot in a kitchen environment. AmbiK was collected with the assistance of LLMs and is human-validated. It comprises 1000 pairs of ambiguous tasks and their unambiguous counterparts, categorized by ambiguity type (Human Preferences, Common Sense Knowledge, Safety), with environment descriptions, clarifying questions and answers, user intents, and task plans, for a total of 2000 tasks. We hope that AmbiK will enable researchers to perform a unified comparison of ambiguity detection methods. AmbiK is available at https://github.com/cog-model/AmbiK-dataset.

* ACL 2025 (Main Conference)

Symbolic Disentangled Representations for Images

Dec 25, 2024

Alexandr Korchemnyi, Alexey K. Kovalev, Aleksandr I. Panov

Abstract: The idea of disentangled representations is to reduce the data to a set of generative factors that produce it. Typically, such representations are vectors in latent space, where each coordinate corresponds to one of the generative factors. The object can then be modified by changing the value of a particular coordinate, but it is necessary to determine which coordinate corresponds to the desired generative factor, which is a difficult task if the vector representation has a high dimension. In this article, we propose ArSyD (Architecture for Symbolic Disentanglement), which represents each generative factor as a vector of the same dimension as the resulting representation. In ArSyD, the object representation is obtained as a superposition of the generative factor vector representations. We call such a representation a symbolic disentangled representation. We use the principles of Hyperdimensional Computing (also known as Vector Symbolic Architectures), where symbols are represented as hypervectors, allowing vector operations on them. Disentanglement is achieved by construction, no additional assumptions about the underlying distributions are made during training, and the model is only trained to reconstruct images in a weakly supervised manner. We study ArSyD on the dSprites and CLEVR datasets and provide a comprehensive analysis of the learned symbolic disentangled representations. We also propose new disentanglement metrics that allow comparison of methods using latent representations of different dimensions. ArSyD allows object properties to be edited in a controlled and interpretable way, and the dimensionality of the object property representation coincides with the dimensionality of the object representation itself.

* 14 pages, 14 figures
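
The superposition idea in the abstract above can be illustrated with a generic Hyperdimensional Computing sketch. This is not the ArSyD architecture, which learns its vectors from images; it only assumes random bipolar hypervectors, element-wise multiplication as binding, and summation as superposition. The names `roles`, `values`, `encode`, `decode`, and `edit`, and the dimensionality `DIM`, are hypothetical choices made for this illustration.

```python
import random

DIM = 2048  # hypervector dimensionality (illustrative assumption)


def random_hv(rng):
    """Draw a random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bind(a, b):
    """Bind two hypervectors by element-wise multiplication."""
    return [x * y for x, y in zip(a, b)]


def bundle(*vectors):
    """Superpose hypervectors by element-wise summation."""
    return [sum(xs) for xs in zip(*vectors)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


rng = random.Random(0)
# Hypothetical generative factors (roles) and their possible values.
roles = {name: random_hv(rng) for name in ("shape", "color")}
values = {name: random_hv(rng) for name in ("square", "heart", "red", "blue")}


def encode(shape, color):
    """Object vector = superposition of role-value bindings, same dimension as each factor."""
    return bundle(bind(roles["shape"], values[shape]),
                  bind(roles["color"], values[color]))


def decode(obj, role):
    """Unbind a role (bipolar binding is its own inverse) and pick the closest value."""
    probe = bind(obj, roles[role])
    return max(values, key=lambda name: dot(probe, values[name]))


def edit(obj, role, old, new):
    """Swap one factor's value without touching the other factors."""
    old_term = bind(roles[role], values[old])
    new_term = bind(roles[role], values[new])
    return [o - a + b for o, a, b in zip(obj, old_term, new_term)]


obj = encode("square", "red")
edited = edit(obj, "color", "red", "blue")
```

In this toy setting, editing one factor yields exactly the vector that would have been produced by encoding the edited object directly, which is one way to read the "controlled and interpretable" editing described in the abstract.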

Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation

Dec 09, 2024

Egor Cherepanov, Nikita Kachaev, Artem Zholus, Alexey K. Kovalev, Aleksandr I. Panov

Abstract: The incorporation of memory into agents is essential for numerous tasks within the domain of Reinforcement Learning (RL). In particular, memory is paramount for tasks that require the utilization of past information, adaptation to novel environments, and improved sample efficiency. However, the term "memory" encompasses a wide range of concepts, which, coupled with the lack of a unified methodology for validating an agent's memory, leads to erroneous judgments about agents' memory capabilities and prevents objective comparison with other memory-enhanced agents. This paper aims to streamline the concept of memory in RL by providing practical, precise definitions of agent memory types, such as long-term versus short-term memory and declarative versus procedural memory, inspired by cognitive science. Using these definitions, we categorize different classes of agent memory, propose a robust experimental methodology for evaluating the memory capabilities of RL agents, and standardize evaluations. Furthermore, by conducting experiments with different RL agents, we empirically demonstrate the importance of adhering to the proposed methodology when evaluating different types of agent memory, and show what its violation leads to.

* 18 pages, 6 figures

Object-Centric Learning with Slot Mixture Module

Nov 08, 2023

Daniil Kirilenko, Vitaliy Vorobyov, Alexey K. Kovalev, Aleksandr I. Panov

Abstract: Object-centric architectures usually apply a differentiable module to the entire feature map to decompose it into sets of entity representations called slots. Some of these methods structurally resemble clustering algorithms, where the cluster's center in latent space serves as a slot representation. Slot Attention is an example of such a method, acting as a learnable analog of the soft k-means algorithm. Our work employs a learnable clustering method based on the Gaussian Mixture Model. Unlike other approaches, we represent slots not only as centers of clusters but also incorporate information about the distance between clusters and assigned vectors, leading to more expressive slot representations. Our experiments demonstrate that using this approach instead of Slot Attention improves performance in object-centric scenarios, achieving state-of-the-art results in the set property prediction task.

* 17 pages, 6 figures
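
To make the contrast with soft k-means concrete, here is a minimal sketch of plain expectation-maximization for a diagonal-covariance Gaussian mixture. It is not the Slot Mixture Module itself, which is a learnable module operating on feature maps; the function name `gmm_em_step`, the toy points, and the choice to form a slot by concatenating a cluster's mean and variance are assumptions made for illustration.

```python
import math


def gmm_em_step(points, means, variances, weights):
    """One EM iteration for a Gaussian mixture with diagonal covariances."""
    k, d = len(means), len(points[0])
    # E-step: soft assignment (responsibility) of every point to every cluster.
    resp = []
    for x in points:
        log_p = []
        for j in range(k):
            ll = math.log(weights[j])
            for t in range(d):
                ll -= 0.5 * (math.log(2 * math.pi * variances[j][t])
                             + (x[t] - means[j][t]) ** 2 / variances[j][t])
            log_p.append(ll)
        top = max(log_p)  # subtract the maximum for numerical stability
        unnorm = [math.exp(v - top) for v in log_p]
        total = sum(unnorm)
        resp.append([u / total for u in unnorm])
    # M-step: re-estimate means, variances, and mixing weights.
    new_means, new_vars, new_weights = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp) + 1e-12  # guard against empty clusters
        mu = [sum(r[j] * x[t] for r, x in zip(resp, points)) / nj
              for t in range(d)]
        var = [sum(r[j] * (x[t] - mu[t]) ** 2 for r, x in zip(resp, points)) / nj
               + 1e-6  # variance floor keeps the next E-step well defined
               for t in range(d)]
        new_means.append(mu)
        new_vars.append(var)
        new_weights.append(nj / len(points))
    return new_means, new_vars, new_weights


# Toy 2D "feature vectors" forming two well-separated groups.
points = [(0.0, 0.0), (0.2, -0.1), (-0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
means = [[0.5, 0.5], [4.0, 4.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
weights = [0.5, 0.5]
for _ in range(5):
    means, variances, weights = gmm_em_step(points, means, variances, weights)

# A slot here carries the cluster center plus the spread of its assigned vectors.
slots = [mu + var for mu, var in zip(means, variances)]
```

Soft k-means would keep only `means`; the variance half of each slot is what carries the extra information about how far assigned vectors lie from the cluster center.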

Recurrent Memory Decision Transformer

Jul 05, 2023

Arkadii Bessonov, Alexey Staroverov, Huzhenyu Zhang, Alexey K. Kovalev, Dmitry Yudin, Aleksandr I. Panov

Abstract: Originally developed for natural language problems, transformer models have recently been widely used in offline reinforcement learning tasks. This is because the agent's history can be represented as a sequence, and the whole task can be reduced to the sequence modeling task. However, the quadratic complexity of the transformer operation limits the potential increase in context. Therefore, different versions of the memory mechanism are used to work with long sequences in natural language. This paper proposes the Recurrent Memory Decision Transformer (RMDT), a model that uses a recurrent memory mechanism for reinforcement learning problems. We conduct thorough experiments on Atari games and MuJoCo control problems and show that our proposed model is significantly superior to its counterparts without the recurrent memory mechanism on Atari games. We also carefully study the effect of memory on the performance of the proposed model. These findings shed light on the potential of incorporating recurrent memory mechanisms to improve the performance of large-scale transformer models in offline reinforcement learning tasks. The Recurrent Memory Decision Transformer code is publicly available in the repository https://anonymous.4open.science/r/RMDT-4FE4.
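
The quadratic-cost argument in the abstract above can be made concrete with a rough cost model. This is a back-of-the-envelope sketch, not RMDT's actual computation: it assumes self-attention cost is proportional to the number of query-key pairs, that a long sequence is cut into fixed-length segments, and that each segment is extended by a fixed number of memory tokens counted once. The function names and the numbers are hypothetical.

```python
def full_attention_pairs(n_tokens):
    """Query-key pairs when attending over the whole sequence at once."""
    return n_tokens * n_tokens


def segmented_attention_pairs(n_tokens, segment_len, n_memory):
    """Query-key pairs when processing segment by segment with memory tokens.

    Each segment attends only within itself plus n_memory memory tokens
    that carry information over from the previous segments.
    """
    total = 0
    remaining = n_tokens
    while remaining > 0:
        current = min(segment_len, remaining)
        total += (current + n_memory) ** 2
        remaining -= current
    return total


full = full_attention_pairs(3000)
segmented = segmented_attention_pairs(3000, segment_len=300, n_memory=10)
print(full, segmented)  # 9000000 961000
```

Under these assumptions the segmented cost grows linearly with the number of segments, while full attention grows quadratically with sequence length.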
