Dinesh Manocha

Vi-LAD: Vision-Language Attention Distillation for Socially-Aware Robot Navigation in Dynamic Environments

Mar 12, 2025

AutoSpatial: Visual-Language Reasoning for Social Robot Navigation through Efficient Spatial Reasoning Learning

Mar 10, 2025

ProSE: Diffusion Priors for Speech Enhancement

Mar 09, 2025

Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities

Mar 06, 2025

EDM: Equirectangular Projection-Oriented Dense Kernelized Feature Matching

Feb 28, 2025

Towards Optimal Multi-draft Speculative Decoding

Feb 26, 2025

IM360: Textured Mesh Reconstruction for Large-scale Indoor Mapping with 360° Cameras

Feb 19, 2025

V-Trans4Style: Visual Transition Recommendation for Video Production Style Adaptation

Jan 14, 2025

AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs

Jan 03, 2025

TSPE: Task-Specific Prompt Ensemble for Improved Zero-Shot Audio Classification

Dec 31, 2024