Picture for Dinesh Manocha

Dinesh Manocha

V-Trans4Style: Visual Transition Recommendation for Video Production Style Adaptation

Add code
Jan 14, 2025
Viaarxiv icon

AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs

Add code
Jan 03, 2025
Viaarxiv icon

TSPE: Task-Specific Prompt Ensemble for Improved Zero-Shot Audio Classification

Add code
Dec 31, 2024
Viaarxiv icon

HALLUCINOGEN: A Benchmark for Evaluating Object Hallucination in Large Visual-Language Models

Add code
Dec 29, 2024
Viaarxiv icon

DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments

Add code
Dec 28, 2024
Figure 1 for DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments
Figure 2 for DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments
Figure 3 for DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments
Figure 4 for DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments
Viaarxiv icon

VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation

Add code
Dec 14, 2024
Viaarxiv icon

SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation

Add code
Dec 13, 2024
Viaarxiv icon

PromptRefine: Enhancing Few-Shot Performance on Low-Resource Indic Languages with Example Selection from Related Example Banks

Add code
Dec 07, 2024
Viaarxiv icon

Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment

Add code
Nov 27, 2024
Figure 1 for Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
Figure 2 for Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
Figure 3 for Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
Figure 4 for Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
Viaarxiv icon

MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark

Add code
Oct 24, 2024
Figure 1 for MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
Figure 2 for MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
Figure 3 for MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
Figure 4 for MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
Viaarxiv icon