Picture for Dinesh Manocha

Dinesh Manocha

MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark

Add code
Oct 24, 2024
Viaarxiv icon

Do Audio-Language Models Understand Linguistic Variations?

Add code
Oct 21, 2024
Viaarxiv icon

DocEdit-v2: Document Structure Editing Via Multimodal LLM Grounding

Add code
Oct 21, 2024
Figure 1 for DocEdit-v2: Document Structure Editing Via Multimodal LLM Grounding
Figure 2 for DocEdit-v2: Document Structure Editing Via Multimodal LLM Grounding
Figure 3 for DocEdit-v2: Document Structure Editing Via Multimodal LLM Grounding
Figure 4 for DocEdit-v2: Document Structure Editing Via Multimodal LLM Grounding
Viaarxiv icon

PAT: Parameter-Free Audio-Text Aligner to Boost Zero-Shot Audio Classification

Add code
Oct 19, 2024
Viaarxiv icon

EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning

Add code
Oct 17, 2024
Figure 1 for EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning
Figure 2 for EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning
Figure 3 for EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning
Figure 4 for EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning
Viaarxiv icon

Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation

Add code
Oct 17, 2024
Figure 1 for Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation
Figure 2 for Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation
Figure 3 for Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation
Figure 4 for Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation
Viaarxiv icon

ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion From Monocular Camera

Add code
Oct 14, 2024
Figure 1 for ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion From Monocular Camera
Figure 2 for ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion From Monocular Camera
Figure 3 for ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion From Monocular Camera
Figure 4 for ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion From Monocular Camera
Viaarxiv icon

MeshGS: Adaptive Mesh-Aligned Gaussian Splatting for High-Quality Rendering

Add code
Oct 11, 2024
Viaarxiv icon

Mode-GS: Monocular Depth Guided Anchored 3D Gaussian Splatting for Robust Ground-View Scene Rendering

Add code
Oct 06, 2024
Viaarxiv icon

AIME: AI System Optimization via Multiple LLM Evaluators

Add code
Oct 04, 2024
Viaarxiv icon