Picture for Dinesh Manocha

Dinesh Manocha

SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation

Add code
Dec 13, 2024
Viaarxiv icon

PromptRefine: Enhancing Few-Shot Performance on Low-Resource Indic Languages with Example Selection from Related Example Banks

Add code
Dec 07, 2024
Viaarxiv icon

Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment

Add code
Nov 27, 2024
Figure 1 for Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
Figure 2 for Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
Figure 3 for Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
Figure 4 for Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
Viaarxiv icon

MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark

Add code
Oct 24, 2024
Viaarxiv icon

DocEdit-v2: Document Structure Editing Via Multimodal LLM Grounding

Add code
Oct 21, 2024
Figure 1 for DocEdit-v2: Document Structure Editing Via Multimodal LLM Grounding
Figure 2 for DocEdit-v2: Document Structure Editing Via Multimodal LLM Grounding
Figure 3 for DocEdit-v2: Document Structure Editing Via Multimodal LLM Grounding
Figure 4 for DocEdit-v2: Document Structure Editing Via Multimodal LLM Grounding
Viaarxiv icon

Do Audio-Language Models Understand Linguistic Variations?

Add code
Oct 21, 2024
Figure 1 for Do Audio-Language Models Understand Linguistic Variations?
Figure 2 for Do Audio-Language Models Understand Linguistic Variations?
Figure 3 for Do Audio-Language Models Understand Linguistic Variations?
Figure 4 for Do Audio-Language Models Understand Linguistic Variations?
Viaarxiv icon

PAT: Parameter-Free Audio-Text Aligner to Boost Zero-Shot Audio Classification

Add code
Oct 19, 2024
Viaarxiv icon

EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning

Add code
Oct 17, 2024
Figure 1 for EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning
Figure 2 for EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning
Figure 3 for EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning
Figure 4 for EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning
Viaarxiv icon

Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation

Add code
Oct 17, 2024
Figure 1 for Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation
Figure 2 for Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation
Figure 3 for Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation
Figure 4 for Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation
Viaarxiv icon

ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion From Monocular Camera

Add code
Oct 14, 2024
Figure 1 for ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion From Monocular Camera
Figure 2 for ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion From Monocular Camera
Figure 3 for ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion From Monocular Camera
Figure 4 for ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion From Monocular Camera
Viaarxiv icon