Takafumi Moriya

Guided Speaker Embedding

Oct 16, 2024

Investigation of Speaker Representation for Target-Speaker Speech Processing

Oct 15, 2024

Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding

Sep 30, 2024

Alignment-Free Training for Transducer-based Multi-Talker ASR

Sep 30, 2024

NTT Multi-Speaker ASR System for the DASR Task of CHiME-8 Challenge

Sep 09, 2024

Recursive Attentive Pooling for Extracting Speaker Embeddings from Multi-Speaker Recordings

Aug 30, 2024

Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation

Aug 01, 2024

SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling

Jul 01, 2024

Factor-Conditioned Speaking-Style Captioning

Jun 27, 2024

Applying LLMs for Rescoring N-best ASR Hypotheses of Casual Conversations: Effects of Domain Adaptation and Context Carry-over

Jun 27, 2024