Abhinav Shrivastava

MMOU: A Massive Multi-Task Omni Understanding and Reasoning Benchmark for Long and Complex Real-World Videos

Mar 14, 2026
Via arXiv

Scale Space Diffusion

Mar 09, 2026
Via arXiv

OmniRet: Efficient and High-Fidelity Omni Modality Retrieval

Mar 02, 2026
Via arXiv

TeCoNeRV: Leveraging Temporal Coherence for Compressible Neural Representations for Videos

Feb 18, 2026
Via arXiv

All-in-One Conditioning for Text-to-Image Synthesis

Feb 09, 2026
Via arXiv

UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders

Jan 25, 2026
Via arXiv

Towards Understanding Best Practices for Quantization of Vision-Language Models

Jan 21, 2026
Via arXiv

Characterizing Motion Encoding in Video Diffusion Timesteps

Dec 18, 2025
Via arXiv

Growing Visual Generative Capacity for Pre-Trained MLLMs

Oct 02, 2025
Via arXiv

Towards Multimodal Understanding via Stable Diffusion as a Task-Aware Feature Extractor

Jul 09, 2025
Via arXiv