Picture for Sai Rajeswar

Sai Rajeswar

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction

Add code
Mar 19, 2025
Viaarxiv icon

PairBench: A Systematic Framework for Selecting Reliable Judge VLMs

Add code
Feb 21, 2025
Viaarxiv icon

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding

Add code
Feb 03, 2025
Figure 1 for AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding
Figure 2 for AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding
Figure 3 for AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding
Figure 4 for AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding
Viaarxiv icon

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks

Add code
Dec 05, 2024
Figure 1 for BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks
Figure 2 for BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks
Figure 3 for BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks
Figure 4 for BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks
Viaarxiv icon

Representing Positional Information in Generative World Models for Object Manipulation

Add code
Sep 19, 2024
Viaarxiv icon

Multimodal foundation world models for generalist embodied agents

Add code
Jun 26, 2024
Viaarxiv icon

RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content

Add code
Jun 17, 2024
Viaarxiv icon

VCR: Visual Caption Restoration

Add code
Jun 10, 2024
Figure 1 for VCR: Visual Caption Restoration
Figure 2 for VCR: Visual Caption Restoration
Figure 3 for VCR: Visual Caption Restoration
Figure 4 for VCR: Visual Caption Restoration
Viaarxiv icon

Capture the Flag: Uncovering Data Insights with Large Language Models

Add code
Dec 21, 2023
Viaarxiv icon

Equivariant Adaptation of Large Pre-Trained Models

Add code
Oct 02, 2023
Viaarxiv icon