Soham Deshmukh

Microsoft

Mellow: a small audio language model for reasoning

Mar 11, 2025

ADIFF: Explaining audio difference using natural language

Feb 06, 2025

MACE: Leveraging Audio for Evaluating Audio Captioning Systems

Nov 05, 2024

Audio Entailment: Assessing Deductive Reasoning for Audio Understanding

Jul 25, 2024

SELM: Enhancing Speech Emotion Recognition for Out-of-Domain Scenarios

Jul 22, 2024

Domain Adaptation for Contrastive Audio-Language Models

Feb 14, 2024

PAM: Prompting Audio-Language Models for Audio Quality Assessment

Feb 01, 2024

Prompting Audios Using Acoustic Properties For Emotion Representation

Oct 05, 2023

LoFT: Local Proxy Fine-tuning For Improving Transferability Of Adversarial Attacks Against Large Language Model

Oct 02, 2023

Training Audio Captioning Models without Audio

Sep 14, 2023