Dinesh Manocha

Wid3R: Wide Field-of-View 3D Reconstruction via Camera Model Conditioning

Feb 05, 2026

MemCtrl: Using MLLMs as Active Memory Controllers on Embodied Agents

Jan 28, 2026

AMUSE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding

Dec 18, 2025

DR. Nav: Semantic-Geometric Representations for Proactive Dead-End Recovery and Navigation

Nov 16, 2025

Music Flamingo: Scaling Music Understanding in Audio Language Models

Nov 13, 2025

SPUR: A Plug-and-Play Framework for Integrating Spatial Audio Understanding and Reasoning into Large Audio-Language Models

Nov 13, 2025

Structured Uncertainty guided Clarification for LLM Agents

Nov 11, 2025

MoRe: Monocular Geometry Refinement via Graph Optimization for Cross-View Consistency

Oct 08, 2025

NavMoE: Hybrid Model- and Learning-based Traversability Estimation for Local Navigation via Mixture of Experts

Sep 16, 2025

A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality

Jul 09, 2025