
David Harwath

How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario

Nov 27, 2024

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks

Nov 08, 2024

SyllableLM: Learning Coarse Semantic Units for Speech Language Models

Oct 05, 2024

Self-supervised Speech Models for Word-Level Stuttered Speech Detection

Sep 16, 2024

Interface Design for Self-Supervised Speech Models

Jun 18, 2024

Multimodal Contextualized Semantic Parsing from Speech

Jun 10, 2024

SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos

Apr 08, 2024

VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild

Mar 25, 2024

SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data

Feb 10, 2024

Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model

Feb 08, 2024