Harsh Agrawal

DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models

Dec 11, 2024

From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons

Dec 11, 2024

Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

Oct 24, 2024

Grounding Multimodal Large Language Models in Actions

Jun 12, 2024

Large Language Models as Generalizable Policies for Embodied Tasks

Oct 26, 2023

Housekeep: Tidying Virtual Households using Commonsense Reasoning

May 22, 2022

Simple and Effective Synthesis of Indoor 3D Scenes

Apr 06, 2022

SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation

Oct 27, 2021

The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation

Aug 26, 2021

Contrast and Classify: Alternate Training for Robust VQA

Oct 13, 2020