Sonal Kumar

MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark

Oct 24, 2024
Via arXiv

Do Audio-Language Models Understand Linguistic Variations?

Oct 21, 2024
Via arXiv

PAT: Parameter-Free Audio-Text Aligner to Boost Zero-Shot Audio Classification

Oct 19, 2024
Via arXiv

EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning

Oct 17, 2024
Via arXiv

Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data

Oct 02, 2024
Via arXiv

ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds

Sep 13, 2024
Via arXiv

GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities

Jun 17, 2024
Via arXiv

LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition

Jun 06, 2024
Via arXiv

ABEX: Data Augmentation for Low-Resource NLU via Expanding Abstract Descriptions

Jun 06, 2024
Via arXiv

VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap

May 24, 2024
Via arXiv