Vijay Kumar BG

Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement
Apr 06, 2024

Exploring Question Decomposition for Zero-Shot VQA
Oct 25, 2023

Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!
Jun 06, 2023

Single-Stream Multi-Level Alignment for Vision-Language Pretraining
Mar 30, 2022

Large Scale Multimodal Classification Using an Ensemble of Transformer Models and Co-Attention
Nov 23, 2020

Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue
Jul 29, 2016