Tanmay Gupta

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Sep 25, 2024

CodeNav: Beyond tool-use to using real-world codebases with LLM agents

Jun 18, 2024

Task Me Anything

Jun 17, 2024

m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks

Mar 21, 2024

Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning

Feb 23, 2024

Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World

Dec 05, 2023

OBJECT 3DIT: Language-guided 3D-aware Image Editing

Jul 20, 2023

Visual Programming: Compositional visual reasoning without training

Nov 18, 2022

GRIT: General Robust Image Task Benchmark

May 02, 2022

Webly Supervised Concept Expansion for General Purpose Vision Models

Feb 04, 2022