Shinji Watanabe

CLSP

Discrete Speech Unit Extraction via Independent Component Analysis

Jan 11, 2025

Enhancing Audiovisual Speech Recognition through Bifocal Preference Optimization

Dec 26, 2024

VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music

Dec 23, 2024

Deep Speech Synthesis from Multimodal Articulatory Representations

Dec 17, 2024

SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR

Dec 07, 2024

Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition

Nov 27, 2024

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks

Nov 08, 2024

Findings of the IWSLT 2024 Evaluation Campaign

Nov 07, 2024

VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning

Oct 23, 2024

FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model

Oct 03, 2024