Picture for Shinji Watanabe

Shinji Watanabe

Carnegie Mellon University

CALM: Joint Contextual Acoustic-Linguistic Modeling for Personalization of Multi-Speaker ASR

Add code
Jan 30, 2026
Viaarxiv icon

Optimizing Conversational Quality in Spoken Dialogue Systems with Reinforcement Learning from AI Feedback

Add code
Jan 27, 2026
Viaarxiv icon

The CMU-AIST submission for the ICME 2025 Audio Encoder Challenge

Add code
Jan 22, 2026
Viaarxiv icon

ICASSP 2026 URGENT Speech Enhancement Challenge

Add code
Jan 20, 2026
Viaarxiv icon

PRiSM: Benchmarking Phone Realization in Speech Models

Add code
Jan 20, 2026
Viaarxiv icon

SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition

Add code
Jan 18, 2026
Viaarxiv icon

Do Neural Codecs Generalize? A Controlled Study Across Unseen Languages and Non-Speech Tasks

Add code
Jan 18, 2026
Viaarxiv icon

Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception

Add code
Jan 14, 2026
Viaarxiv icon

TAVID: Text-Driven Audio-Visual Interactive Dialogue Generation

Add code
Dec 23, 2025
Viaarxiv icon

BSCodec: A Band-Split Neural Codec for High-Quality Universal Audio Reconstruction

Add code
Nov 08, 2025
Figure 1 for BSCodec: A Band-Split Neural Codec for High-Quality Universal Audio Reconstruction
Figure 2 for BSCodec: A Band-Split Neural Codec for High-Quality Universal Audio Reconstruction
Figure 3 for BSCodec: A Band-Split Neural Codec for High-Quality Universal Audio Reconstruction
Figure 4 for BSCodec: A Band-Split Neural Codec for High-Quality Universal Audio Reconstruction
Viaarxiv icon