Amogh Raut

STEVE-Audio: Expanding the Goal Conditioning Modalities of Embodied Agents in Minecraft

Dec 01, 2024

Nicholas Lenzen, Amogh Raut, Andrew Melnik

Abstract: The STEVE-1 approach was recently introduced as a method for training generative agents to follow instructions given as latent CLIP embeddings. In this work, we present a methodology for extending the control modalities by learning a mapping from new input modalities to the latent goal space of the agent. We apply our approach to the challenging Minecraft domain and extend the goal conditioning to include the audio modality. The resulting audio-conditioned agent performs at a level comparable to the original text-conditioned and visual-conditioned agents. Specifically, we create an Audio-Video CLIP foundation model for Minecraft and an audio prior network, which together map audio samples to the latent goal space of the STEVE-1 policy. Additionally, we highlight the tradeoffs that occur when conditioning on different modalities. Our training code, evaluation code, and Audio-Video CLIP foundation model for Minecraft are open-source to help foster further research into multi-modal generalist sequential decision-making agents.

* Accepted at CoRL 2024: Workshop on Lifelong Learning for Home Robots

Towards Solving Fuzzy Tasks with Human Feedback: A Retrospective of the MineRL BASALT 2022 Competition

Mar 23, 2023

Stephanie Milani, Anssi Kanervisto, Karolis Ramanauskas, Sander Schulhoff, Brandon Houghton, Sharada Mohanty, Byron Galbraith, Ke Chen, Yan Song, Tianze Zhou (+20 more)

Abstract: To facilitate research on fine-tuning foundation models from human feedback, we held the MineRL BASALT Competition on Fine-Tuning from Human Feedback at NeurIPS 2022. The BASALT challenge asks teams to develop algorithms that solve tasks with hard-to-specify reward functions in Minecraft. Through this competition, we aimed to promote the development of algorithms that use human feedback as a channel for learning the desired behavior. We describe the competition and provide an overview of the top solutions. We conclude by discussing the impact of the competition and future directions for improvement.

Behavioral Cloning via Search in Video PreTraining Latent Space

Dec 27, 2022

Federico Malato, Florian Leopold, Amogh Raut, Ville Hautamäki, Andrew Melnik

Abstract: Our aim is to build autonomous agents that can solve tasks in environments like Minecraft. To do so, we use an imitation learning-based approach. We formulate the control problem as a search problem over a dataset of expert demonstrations, where the agent copies actions from a similar demonstration trajectory of image-action pairs. We perform a proximity search over the BASALT MineRL dataset in the latent representation of a Video PreTraining model. The agent copies the actions from the expert trajectory as long as the distance between the state representations of the agent and the selected expert trajectory does not diverge. When it does, the proximity search is repeated. Our approach can effectively recover meaningful demonstration trajectories and shows human-like agent behavior in the Minecraft environment.
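
The search-and-copy loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean distance, the fixed `threshold`, the plain-tuple latents, and the names `nearest` and `search_policy` are all assumptions made for illustration, and the agent's latents are passed in as a precomputed list, whereas a real agent would obtain each one from the environment after acting.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length latent vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(latent, dataset):
    """Return (trajectory index, step index) of the expert state closest to `latent`."""
    best, best_dist = None, math.inf
    for ti, trajectory in enumerate(dataset):
        for si, (expert_latent, _action) in enumerate(trajectory):
            dist = l2(latent, expert_latent)
            if dist < best_dist:
                best, best_dist = (ti, si), dist
    return best


def search_policy(agent_latents, dataset, threshold):
    """Copy expert actions step by step, re-searching when the agent diverges.

    `dataset` is a list of trajectories; each trajectory is a list of
    (latent, action) pairs. `threshold` is a hypothetical divergence limit.
    """
    actions = []
    ti = si = None
    for z in agent_latents:
        need_search = (
            ti is None                                  # nothing selected yet
            or si >= len(dataset[ti])                   # expert trajectory ended
            or l2(z, dataset[ti][si][0]) > threshold    # agent drifted too far
        )
        if need_search:
            ti, si = nearest(z, dataset)
        actions.append(dataset[ti][si][1])
        si += 1  # follow the selected trajectory on the next step
    return actions


# Toy data: two expert trajectories in a 2-D latent space.
demo_dataset = [
    [((0.0, 0.0), "forward"), ((1.0, 0.0), "forward"), ((2.0, 0.0), "jump")],
    [((0.0, 5.0), "left"), ((0.0, 6.0), "attack")],
]
demo_latents = [(0.1, 0.0), (1.1, 0.0), (0.0, 5.9), (0.0, 6.0)]
demo_actions = search_policy(demo_latents, demo_dataset, threshold=0.5)
print(demo_actions)
```

In the toy run, the agent follows the first trajectory for two steps, then drifts far from it, which triggers a new search that selects the second trajectory.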

Implementation of Neural Network and feature extraction to classify ECG signals

Feb 17, 2018

R Karthik, Dhruv Tyagi, Amogh Raut, Soumya Saxena, Rajesh Kumar M

Abstract: This paper presents a suitable and efficient implementation of a feature extraction algorithm (the Pan-Tompkins algorithm) on electrocardiography (ECG) signals for the detection and classification of four cardiac conditions: Sleep Apnea, Arrhythmia, Supraventricular Arrhythmia, and Long-Term Atrial Fibrillation (AF). These are differentiated from the normal heartbeat using Pan-Tompkins RR detection followed by feature extraction for classification. The paper also presents a new approach to signal classification using existing neural network classifiers.

* SPRINGER LNEE
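
As a rough illustration of the step that follows RR detection in the abstract above, the sketch below turns a list of R-peak sample indices into a few RR-interval features. The abstract does not say which features the paper uses, so the feature set here (mean RR, SDNN, RMSSD, mean heart rate) is an assumption based on common heart-rate-variability measures, and the function name `rr_features` is hypothetical. R-peak detection itself (the Pan-Tompkins stage) is not shown; the peak indices are taken as given.

```python
import math
import statistics


def rr_features(r_peaks, fs):
    """Compute simple RR-interval features from R-peak sample indices.

    r_peaks: increasing sample indices of detected R peaks (assumed input).
    fs: sampling frequency in Hz.
    """
    # RR intervals in seconds between consecutive R peaks.
    rr = [(b - a) / fs for a, b in zip(r_peaks, r_peaks[1:])]
    if len(rr) < 2:
        raise ValueError("need at least three R peaks")
    # Successive differences between neighbouring RR intervals.
    diffs = [b - a for a, b in zip(rr, rr[1:])]
    mean_rr = statistics.fmean(rr)
    return {
        "mean_rr": mean_rr,                       # average beat-to-beat interval (s)
        "sdnn": statistics.stdev(rr),             # sample standard deviation of RR (s)
        "rmssd": math.sqrt(statistics.fmean([d * d for d in diffs])),
        "mean_hr_bpm": 60.0 / mean_rr,            # average heart rate (beats/min)
    }


# Perfectly regular beats every 200 samples at 250 Hz (0.8 s apart).
print(rr_features([0, 200, 400, 600], fs=250))
```

A feature dictionary like this could then be flattened into a vector and passed to a classifier; the choice of classifier is outside the scope of this sketch.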
