Rafael Valle

A2SB: Audio-to-Audio Schrodinger Bridges

Jan 20, 2025

TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization

Dec 30, 2024

ETTA: Elucidating the Design Space of Text-to-Audio Models

Dec 26, 2024

OMCAT: Omni Context Aware Transformer

Oct 15, 2024

Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data

Oct 02, 2024

Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment

Jun 25, 2024

Improving Text-To-Audio Models with Synthetic Captions

Jun 18, 2024

Audio Dialogues: Dialogues dataset for audio and music understanding

Apr 11, 2024

Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities

Feb 02, 2024

Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages

Jan 29, 2024