Picture for Ryo Masumura

Ryo Masumura

Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding

Add code
Sep 30, 2024
Figure 1 for Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding
Figure 2 for Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding
Figure 3 for Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding
Figure 4 for Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding
Viaarxiv icon

Alignment-Free Training for Transducer-based Multi-Talker ASR

Add code
Sep 30, 2024
Figure 1 for Alignment-Free Training for Transducer-based Multi-Talker ASR
Figure 2 for Alignment-Free Training for Transducer-based Multi-Talker ASR
Figure 3 for Alignment-Free Training for Transducer-based Multi-Talker ASR
Figure 4 for Alignment-Free Training for Transducer-based Multi-Talker ASR
Viaarxiv icon

Factor-Conditioned Speaking-Style Captioning

Add code
Jun 27, 2024
Figure 1 for Factor-Conditioned Speaking-Style Captioning
Figure 2 for Factor-Conditioned Speaking-Style Captioning
Figure 3 for Factor-Conditioned Speaking-Style Captioning
Figure 4 for Factor-Conditioned Speaking-Style Captioning
Viaarxiv icon

Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff

Add code
Aug 31, 2023
Viaarxiv icon

End-to-End Joint Target and Non-Target Speakers ASR

Add code
Jun 04, 2023
Viaarxiv icon

Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data

Add code
May 25, 2023
Viaarxiv icon

Improving Scheduled Sampling for Neural Transducer-based ASR

Add code
May 25, 2023
Viaarxiv icon

Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss

Add code
May 24, 2023
Viaarxiv icon

Leveraging Large Text Corpora for End-to-End Speech Summarization

Add code
Mar 02, 2023
Viaarxiv icon

On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis

Add code
Oct 28, 2022
Viaarxiv icon