Tanmay Gupta

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Sep 25, 2024

CodeNav: Beyond tool-use to using real-world codebases with LLM agents

Jun 18, 2024

Task Me Anything

Jun 17, 2024

m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks

Mar 21, 2024

Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning

Feb 23, 2024

Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World

Dec 05, 2023

OBJECT 3DIT: Language-guided 3D-aware Image Editing

Jul 20, 2023

Visual Programming: Compositional visual reasoning without training

Nov 18, 2022

GRIT: General Robust Image Task Benchmark

May 02, 2022

Webly Supervised Concept Expansion for General Purpose Vision Models

Feb 04, 2022