Utkarsh Tyagi

Audio MultiChallenge: A Multi-Turn Evaluation of Spoken Dialogue Systems on Natural Human Interaction

Dec 16, 2025

Beyond Seeing: Evaluating Multimodal LLMs on Tool-Enabled Image Perception, Transformation, and Reasoning

Oct 14, 2025

ProSE: Diffusion Priors for Speech Enhancement

Mar 09, 2025

MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark

Oct 24, 2024

GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities

Jun 17, 2024

ABEX: Data Augmentation for Low-Resource NLU via Expanding Abstract Descriptions

Jun 06, 2024

LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition

Jun 06, 2024

VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap

May 24, 2024

Do Vision-Language Models Understand Compound Nouns?

Mar 30, 2024

CoDa: Constrained Generation based Data Augmentation for Low-Resource NLP

Mar 30, 2024