Jiatong Shi

Aligning Text-to-Music Evaluation with Human Preferences

Mar 20, 2025
Via arXiv

ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems

Mar 11, 2025
Via arXiv

Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM

Feb 24, 2025
Via arXiv

ESPnet-SpeechLM: An Open Speech Language Model Toolkit

Feb 21, 2025
Via arXiv

VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music

Dec 23, 2024
Via arXiv

How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario

Nov 27, 2024
Via arXiv

Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition

Nov 27, 2024
Via arXiv

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks

Nov 08, 2024
Via arXiv

Findings of the IWSLT 2024 Evaluation Campaign

Nov 07, 2024
Via arXiv

Exploiting Longitudinal Speech Sessions via Voice Assistant Systems for Early Detection of Cognitive Decline

Oct 16, 2024
Via arXiv