Zaid Khan

DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback

Oct 08, 2024

Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering

Apr 16, 2024

Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement

Apr 06, 2024

Exploring Question Decomposition for Zero-Shot VQA

Oct 25, 2023

Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!

Jun 06, 2023

Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning

Mar 21, 2023

Single-Stream Multi-Level Alignment for Vision-Language Pretraining

Mar 30, 2022

Exploiting BERT For Multimodal Target Sentiment Classification Through Input Space Translation

Aug 05, 2021

One Label, One Billion Faces: Usage and Consistency of Racial Categories in Computer Vision

Feb 03, 2021

Families In Wild Multimedia: A Multi-Modal Database for Recognizing Kinship

Jul 28, 2020