Satvik Dixit

Vision Language Models Are Few-Shot Audio Spectrogram Classifiers

Nov 18, 2024

MACE: Leveraging Audio for Evaluating Audio Captioning Systems

Nov 05, 2024

Improving Speaker Representations Using Contrastive Losses on Multi-scale Features

Oct 07, 2024

Explaining Deep Learning Embeddings for Speech Emotion Recognition by Predicting Interpretable Acoustic Features

Sep 14, 2024