Tomohiro Nakatani

Loose coupling of spectral and spatial models for multi-channel diarization and enhancement of meetings in dynamic environments

Jan 22, 2026

Reference Microphone Selection for Guided Source Separation based on the Normalized L-p Norm

Oct 31, 2025

MOVER: Combining Multiple Meeting Recognition Systems

Aug 07, 2025

Description and Discussion on DCASE 2025 Challenge Task 4: Spatial Semantic Segmentation of Sound Scenes

Jun 12, 2025

Microphone Array Signal Processing and Deep Learning for Speech Enhancement

Jan 13, 2025

NTT Multi-Speaker ASR System for the DASR Task of CHiME-8 Challenge

Sep 09, 2024

Interaural time difference loss for binaural target sound extraction

Aug 01, 2024

Array Geometry-Robust Attention-Based Neural Beamformer for Moving Speakers

Feb 05, 2024

Neural network-based virtual microphone estimation with virtual microphone and beamformer-level multi-task loss

Nov 20, 2023

Target Speech Extraction with Conditional Diffusion Model

Aug 17, 2023