Title: Multi-sequence Intermediate Conditioning for CTC-based ASR

Apr 01, 2022

Yusuke Fujita, Tatsuya Komatsu, Yusuke Kida

Figures 1-4 for Multi-sequence Intermediate Conditioning for CTC-based ASR

Abstract: End-to-end automatic speech recognition (ASR) directly maps input speech to a character sequence without using pronunciation lexica. However, in languages with thousands of characters, such as Japanese and Mandarin, modeling all of these characters is problematic because of data scarcity. To alleviate the problem, we propose a multi-task learning model with explicit interaction between characters and syllables, built on the Self-conditioned connectionist temporal classification (CTC) technique. While the original Self-conditioned CTC estimates character-level intermediate predictions by applying auxiliary CTC losses to a set of intermediate layers, the proposed method additionally estimates syllable-level intermediate predictions in another set of intermediate layers. The character-level and syllable-level predictions are used alternately as conditioning features to handle the mutual dependency between characters and syllables. Experimental results on Japanese and Mandarin datasets show that the proposed multi-sequence intermediate conditioning outperforms the conventional multi-task-based and Self-conditioned CTC-based methods.
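The alternating conditioning described in the abstract can be illustrated with a small forward-pass sketch. This is not the authors' implementation: the encoder blocks are replaced by a toy `tanh` layer, the auxiliary CTC losses are omitted, and every name here (`encode`, `schedule`, `heads`, the layer indices and vocabulary sizes) is a hypothetical choice made for illustration only. It shows just the data flow: after selected intermediate layers, a posterior over characters or syllables is computed and projected back into the hidden features that feed the next layer.

```python
import math
import random


def matmul(a, b):
    """Multiply a (T x D) matrix by a (D x K) matrix, both nested lists."""
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax_rows(m):
    """Row-wise softmax, giving one distribution per frame."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]


def encode(frames, layers, heads, schedule):
    """Run a toy encoder with multi-sequence intermediate conditioning.

    frames:   T x D input features
    layers:   list of D x D weights (toy stand-ins for encoder blocks)
    heads:    name -> (D x V projection to vocab, V x D projection back)
    schedule: layer index -> head name used after that layer (assumed layout)
    """
    hidden = frames
    intermediate = []
    for idx, weight in enumerate(layers):
        # Toy encoder block (assumption: a real model would use
        # Transformer- or Conformer-style blocks here).
        hidden = [[math.tanh(v) for v in row]
                  for row in matmul(hidden, weight)]
        name = schedule.get(idx)
        if name is None:
            continue
        to_vocab, from_vocab = heads[name]
        # Intermediate prediction; an auxiliary CTC loss would be
        # applied to this posterior during training.
        posterior = softmax_rows(matmul(hidden, to_vocab))
        intermediate.append((name, posterior))
        # Conditioning: add the projected posterior to the hidden
        # features passed to the next layer.
        feedback = matmul(posterior, from_vocab)
        hidden = [[h + f for h, f in zip(h_row, f_row)]
                  for h_row, f_row in zip(hidden, feedback)]
    return hidden, intermediate


rng = random.Random(0)
T, D = 5, 4                            # frames, feature size (toy values)
vocab = {"char": 6, "syllable": 3}     # hypothetical vocabulary sizes
frames = rand_matrix(T, D, rng, 1.0)
layers = [rand_matrix(D, D, rng, 1.0) for _ in range(6)]
heads = {name: (rand_matrix(D, size, rng), rand_matrix(size, D, rng))
         for name, size in vocab.items()}
# Alternate syllable- and character-level conditioning (assumed layout).
schedule = {0: "syllable", 2: "char", 4: "syllable"}
final_hidden, intermediate = encode(frames, layers, heads, schedule)
```

In this sketch each head has its own projection matrices. Sharing the vocabulary projection with the final output layer, and choosing which layers carry which sequence type, are design decisions the sketch does not try to reproduce.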

* This paper was submitted to INTERSPEECH 2022
