Marco Gaido

Speech Foundation Models and Crowdsourcing for Efficient, High-Quality Data Collection

Dec 16, 2024
Via arXiv

SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation

Nov 03, 2024
Via arXiv

MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages

Oct 01, 2024
Via arXiv

How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not

Sep 25, 2024
Via arXiv

Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond

Aug 07, 2024
Via arXiv

SimulSeamless: FBK at IWSLT 2024 Simultaneous Speech Translation

Jun 20, 2024
Via arXiv

StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection

Jun 10, 2024
Via arXiv

SBAAM! Eliminating Transcript Dependency in Automatic Subtitling

May 17, 2024
Via arXiv

How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena

Feb 20, 2024
Via arXiv

Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?

Feb 19, 2024
Via arXiv