Aishwarya Agrawal

CTRL-O: Language-Controllable Object-Centric Visual Representation Learning

Mar 27, 2025
Via arXiv

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction

Mar 19, 2025
Via arXiv

Assessing and Learning Alignment of Unimodal Vision and Language Models

Dec 05, 2024
Via arXiv

VisMin: Visual Minimal-Change Understanding

Jul 23, 2024
Via arXiv

Benchmarking Vision Language Models for Cultural Understanding

Jul 15, 2024
Via arXiv

Decompose and Compare Consistency: Measuring VLMs' Answer Reliability via Task-Decomposition Consistency Comparison

Jul 10, 2024
Via arXiv

An Introduction to Vision-Language Modeling

May 27, 2024
Via arXiv

Improving Text-to-Image Consistency via Automatic Prompt Optimization

Mar 26, 2024
Via arXiv

MoqaGPT : Zero-Shot Multi-modal Open-domain Question Answering with Large Language Model

Oct 20, 2023
Via arXiv

Improving Automatic VQA Evaluation Using Large Language Models

Oct 04, 2023
Via arXiv