Yosuke Higuchi

End-to-End Speech Recognition with Pre-trained Masked Language Model

Oct 01, 2024
Via arXiv

Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems

Sep 30, 2024
Via arXiv

Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference

Oct 01, 2023
Via arXiv

Harnessing the Zero-Shot Power of Instruction-Tuned Large Language Model in End-to-End Speech Recognition

Sep 19, 2023
Via arXiv

Mask-CTC-based Encoder Pre-training for Streaming End-to-End Speech Recognition

Sep 09, 2023
Via arXiv

A Study on the Integration of Pre-trained SSL, ASR, LM and SLU Models for Spoken Language Understanding

Nov 10, 2022
Via arXiv

BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder

Nov 02, 2022
Via arXiv

InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss

Nov 02, 2022
Via arXiv

BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model

Oct 29, 2022
Via arXiv

CTC Alignments Improve Autoregressive Translation

Oct 11, 2022
Via arXiv