Mauro Cettolo

SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation

Nov 03, 2024, via arXiv

MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages

Oct 01, 2024, via arXiv

SBAAM! Eliminating Transcript Dependency in Automatic Subtitling

May 17, 2024, via arXiv

Integrating Language Models into Direct Speech Translation: An Inference-Time Solution to Control Gender Inflection

Oct 24, 2023, via arXiv

No Pitch Left Behind: Addressing Gender Unbalance in Automatic Speech Recognition through Pitch Manipulation

Oct 10, 2023, via arXiv

Direct Speech Translation for Automatic Subtitling

Sep 27, 2022, via arXiv

Evaluating Subtitle Segmentation for End-to-end Generation Systems

May 19, 2022, via arXiv

Cascade versus Direct Speech Translation: Do the Differences Still Make a Difference?

Jun 02, 2021, via arXiv

Beyond Voice Activity Detection: Hybrid Audio Segmentation for Direct Speech Translation

Apr 23, 2021, via arXiv

CTC-based Compression for Direct Speech Translation

Feb 02, 2021, via arXiv