Picture for Tomoki Toda

Tomoki Toda

MOS-Bench: Benchmarking Generalization Abilities of Subjective Speech Quality Assessment Models

Add code
Nov 06, 2024
Viaarxiv icon

Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions

Add code
Sep 29, 2024
Figure 1 for Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions
Figure 2 for Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions
Figure 3 for Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions
Figure 4 for Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions
Viaarxiv icon

Improved Architecture for High-resolution Piano Transcription to Efficiently Capture Acoustic Characteristics of Music Signals

Add code
Sep 29, 2024
Viaarxiv icon

Improvements of Discriminative Feature Space Training for Anomalous Sound Detection in Unlabeled Conditions

Add code
Sep 14, 2024
Viaarxiv icon

The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction

Add code
Sep 11, 2024
Viaarxiv icon

SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge

Add code
Aug 28, 2024
Viaarxiv icon

Quantifying the effect of speech pathology on automatic and human speaker verification

Add code
Jun 10, 2024
Viaarxiv icon

2DP-2MRC: 2-Dimensional Pointer-based Machine Reading Comprehension Method for Multimodal Moment Retrieval

Add code
Jun 10, 2024
Viaarxiv icon

CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection

Add code
Jun 04, 2024
Figure 1 for CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection
Figure 2 for CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection
Figure 3 for CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection
Figure 4 for CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection
Viaarxiv icon

Multi-speaker Text-to-speech Training with Speaker Anonymized Data

Add code
May 20, 2024
Viaarxiv icon