Harsh Agrawal

From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons

Dec 11, 2024
Via arXiv

DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models

Dec 11, 2024
Via arXiv

Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

Oct 24, 2024
Via arXiv

Grounding Multimodal Large Language Models in Actions

Jun 12, 2024
Via arXiv

Large Language Models as Generalizable Policies for Embodied Tasks

Oct 26, 2023
Via arXiv

Housekeep: Tidying Virtual Households using Commonsense Reasoning

May 22, 2022
Via arXiv

Simple and Effective Synthesis of Indoor 3D Scenes

Apr 06, 2022
Via arXiv

SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation

Oct 27, 2021
Via arXiv

The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation

Aug 26, 2021
Via arXiv

Contrast and Classify: Alternate Training for Robust VQA

Oct 13, 2020
Via arXiv