Mandar Joshi

BAGEL: Bootstrapping Agents by Guiding Exploration with Language

Mar 12, 2024
Via arXiv

Efficient End-to-End Visual Document Understanding with Rationale Distillation

Nov 16, 2023
Via arXiv

From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces

May 31, 2023
Via arXiv

PaLI-X: On Scaling up a Multilingual Vision and Language Model

May 29, 2023
Via arXiv

Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities

Feb 24, 2023
Via arXiv

DePlot: One-shot visual language reasoning by plot-to-table translation

Dec 20, 2022
Via arXiv

MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering

Dec 19, 2022
Via arXiv

Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding

Oct 07, 2022
Via arXiv

Few-shot Mining of Naturally Occurring Inputs and Outputs

May 09, 2022
Via arXiv

Improving Passage Retrieval with Zero-Shot Question Generation

Apr 15, 2022
Via arXiv