Ashish Seth

TSPE: Task-Specific Prompt Ensemble for Improved Zero-Shot Audio Classification

Dec 31, 2024

HALLUCINOGEN: A Benchmark for Evaluating Object Hallucination in Large Visual-Language Models

Dec 29, 2024

MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark

Oct 24, 2024

Do Audio-Language Models Understand Linguistic Variations?

Oct 21, 2024

PAT: Parameter-Free Audio-Text Aligner to Boost Zero-Shot Audio Classification

Oct 19, 2024

EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning

Oct 17, 2024

GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities

Jun 17, 2024

LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition

Jun 06, 2024

FusDom: Combining In-Domain and Out-of-Domain Knowledge for Continuous Self-Supervised Learning

Dec 20, 2023

Stable Distillation: Regularizing Continued Pre-training for Low-Resource Automatic Speech Recognition

Dec 20, 2023