Picture for Bhiksha Raj

Bhiksha Raj

Language Technologies Institute, Carnegie Mellon University, Mohammed bin Zayed University of AI

XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation

Add code
Dec 02, 2024
Viaarxiv icon

Perturbation Ontology based Graph Attention Networks

Add code
Nov 27, 2024
Viaarxiv icon

MACE: Leveraging Audio for Evaluating Audio Captioning Systems

Add code
Nov 05, 2024
Figure 1 for MACE: Leveraging Audio for Evaluating Audio Captioning Systems
Figure 2 for MACE: Leveraging Audio for Evaluating Audio Captioning Systems
Figure 3 for MACE: Leveraging Audio for Evaluating Audio Captioning Systems
Figure 4 for MACE: Leveraging Audio for Evaluating Audio Captioning Systems
Viaarxiv icon

FLAASH: Flow-Attention Adaptive Semantic Hierarchical Fusion for Multi-Modal Tobacco Content Analysis

Add code
Oct 25, 2024
Viaarxiv icon

On the Diversity of Synthetic Data and its Impact on Training Large Language Models

Add code
Oct 19, 2024
Viaarxiv icon

What Do Speech Foundation Models Not Learn About Speech?

Add code
Oct 16, 2024
Viaarxiv icon

RelUNet: Relative Channel Fusion U-Net for Multichannel Speech Enhancement

Add code
Oct 07, 2024
Viaarxiv icon

Improving Speaker Representations Using Contrastive Losses on Multi-scale Features

Add code
Oct 07, 2024
Viaarxiv icon

Did You Hear That? Introducing AADG: A Framework for Generating Benchmark Data in Audio Anomaly Detection

Add code
Oct 04, 2024
Viaarxiv icon

ImageFolder: Autoregressive Image Generation with Folded Tokens

Add code
Oct 02, 2024
Figure 1 for ImageFolder: Autoregressive Image Generation with Folded Tokens
Figure 2 for ImageFolder: Autoregressive Image Generation with Folded Tokens
Figure 3 for ImageFolder: Autoregressive Image Generation with Folded Tokens
Figure 4 for ImageFolder: Autoregressive Image Generation with Folded Tokens
Viaarxiv icon