Bhiksha Raj

Language Technologies Institute, Carnegie Mellon University; Mohamed bin Zayed University of Artificial Intelligence

MACE: Leveraging Audio for Evaluating Audio Captioning Systems

Nov 05, 2024

FLAASH: Flow-Attention Adaptive Semantic Hierarchical Fusion for Multi-Modal Tobacco Content Analysis

Oct 25, 2024

On the Diversity of Synthetic Data and its Impact on Training Large Language Models

Oct 19, 2024

What Do Speech Foundation Models Not Learn About Speech?

Oct 16, 2024

RelUNet: Relative Channel Fusion U-Net for Multichannel Speech Enhancement

Oct 07, 2024

Improving Speaker Representations Using Contrastive Losses on Multi-scale Features

Oct 07, 2024

Did You Hear That? Introducing AADG: A Framework for Generating Benchmark Data in Audio Anomaly Detection

Oct 04, 2024

ImageFolder: Autoregressive Image Generation with Folded Tokens

Oct 02, 2024

ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech

Sep 24, 2024

Revisiting Acoustic Features for Robust ASR

Sep 24, 2024