
Andrei Andrusenko

Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter

Jun 11, 2024

SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation

Oct 13, 2023

Uconv-Conformer: High Reduction of Input Sequence Length for End-to-End Speech Recognition

Aug 16, 2022

LT-LM: a novel non-autoregressive language model for single-shot lattice rescoring

Apr 06, 2021

Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition

Mar 12, 2021

Exploration of End-to-End ASR for OpenSTT -- Russian Open Speech-to-Text Dataset

Jun 15, 2020

Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario

May 14, 2020

You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation

May 14, 2020

Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription

Apr 24, 2020