Bhiksha Raj

Language Technologies Institute, Carnegie Mellon University, Mohammed bin Zayed University of AI

On the Robust Approximation of ASR Metrics

Feb 18, 2025
Via arXiv

Lost in Transcription, Found in Distribution Shift: Demystifying Hallucination in Speech Foundation Models

Feb 18, 2025
Via arXiv

ADIFF: Explaining audio difference using natural language

Feb 06, 2025
Via arXiv

Masked Autoencoders Are Effective Tokenizers for Diffusion Models

Feb 05, 2025
Via arXiv

Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D Reconstruction from Noisy Video

Jan 24, 2025
Via arXiv

Tessellated Linear Model for Age Prediction from Voice

Jan 16, 2025
Via arXiv

SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer

Dec 14, 2024
Via arXiv

XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation

Dec 02, 2024
Via arXiv

Perturbation Ontology based Graph Attention Networks

Nov 27, 2024
Via arXiv

MACE: Leveraging Audio for Evaluating Audio Captioning Systems

Nov 05, 2024
Via arXiv