Vibhav Vineet

Out of Sight, Not Out of Context? Egocentric Spatial Reasoning in VLMs Across Disjoint Frames

May 30, 2025

Grounding Task Assistance with Multimodal Cues from a Single Demonstration

May 02, 2025

TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action

May 02, 2025

Phi-4-reasoning Technical Report

Apr 30, 2025

A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning

Apr 08, 2025

Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead

Mar 31, 2025

HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding

Mar 11, 2025

MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation

Jan 07, 2025

RiTTA: Modeling Event Relations in Text-to-Audio Generation

Dec 20, 2024

On Occlusions in Video Action Detection: Benchmark Datasets And Training Recipes

Oct 25, 2024