Bhiksha Raj

Language Technologies Institute, Carnegie Mellon University; Mohamed bin Zayed University of Artificial Intelligence

CAARMA: Class Augmentation with Adversarial Mixup Regularization

Mar 20, 2025
Via arXiv

Robust Latent Matters: Boosting Image Generation with Sampling Error

Mar 11, 2025
Via arXiv

Mellow: a small audio language model for reasoning

Mar 11, 2025
Via arXiv

On the Robust Approximation of ASR Metrics

Feb 18, 2025
Via arXiv

Lost in Transcription, Found in Distribution Shift: Demystifying Hallucination in Speech Foundation Models

Feb 18, 2025
Via arXiv

ADIFF: Explaining audio difference using natural language

Feb 06, 2025
Via arXiv

Masked Autoencoders Are Effective Tokenizers for Diffusion Models

Feb 05, 2025
Via arXiv

Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D Reconstruction from Noisy Video

Jan 24, 2025
Via arXiv

Tessellated Linear Model for Age Prediction from Voice

Jan 16, 2025
Via arXiv

SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer

Dec 14, 2024
Via arXiv