Vibhav Vineet

What MLLMs Learn about When they Learn about Multimodal Reasoning: Perception, Reasoning, or their Integration?

Oct 02, 2025

Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness

Oct 02, 2025

Out of Sight, Not Out of Context? Egocentric Spatial Reasoning in VLMs Across Disjoint Frames

May 30, 2025

Grounding Task Assistance with Multimodal Cues from a Single Demonstration

May 02, 2025

TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action

May 02, 2025

Phi-4-reasoning Technical Report

Apr 30, 2025

A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning

Apr 08, 2025

Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead

Mar 31, 2025

HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding

Mar 11, 2025

MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation

Jan 07, 2025