Anton Ragni

Beyond Two-stage Diffusion TTS: Joint Structure and Content Refinement via Jump Diffusion

Mar 14, 2026
Via arXiv

Beyond the Utterance: An Empirical Study of Very Long Context Speech Recognition

Feb 04, 2026
Via arXiv

Decoding Order Matters in Autoregressive Speech Synthesis

Jan 13, 2026
Via arXiv

How I Built ASR for Endangered Languages with a Spoken Dictionary

Oct 06, 2025
Via arXiv

Score-Based Training for Energy-Based TTS Models

May 19, 2025
Via arXiv

VisualSpeech: Enhance Prosody with Visual Context in TTS

Jan 31, 2025
Via arXiv

What happens to diffusion model likelihood when your model is conditional?

Sep 10, 2024
Via arXiv

Foundation Models for Music: A Survey

Aug 27, 2024
Via arXiv

Self-Train Before You Transcribe

Jun 17, 2024
Via arXiv

Training Data Augmentation for Dysarthric Automatic Speech Recognition by Text-to-Dysarthric-Speech Synthesis

Jun 12, 2024
Via arXiv