Sara Papi

Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs

Dec 24, 2025
Via arXiv

Simulstream: Open-Source Toolkit for Evaluation and Demonstration of Streaming Speech-to-Text Translation Systems

Dec 19, 2025
Via arXiv

The Warmup Dilemma: How Learning Rate Strategies Impact Speech-to-Text Model Convergence

May 29, 2025
Via arXiv

FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian

May 28, 2025
Via arXiv

Granary: Speech Recognition and Translation Dataset in 25 European Languages

May 19, 2025
Via arXiv

NUTSHELL: A Dataset for Abstract Generation from Scientific Talks

Feb 24, 2025
Via arXiv

Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison

Jan 04, 2025
Via arXiv

How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System?

Dec 24, 2024
Via arXiv

Findings of the IWSLT 2024 Evaluation Campaign

Nov 07, 2024
Via arXiv

MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages

Oct 01, 2024
Via arXiv