Zakaria Aldeneh

Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis

Nov 26, 2024
Via arXiv

Learning Spatially-Aware Language and Audio Embedding

Sep 17, 2024
Via arXiv

Towards Automatic Assessment of Self-Supervised Speech Models using Rank

Sep 16, 2024
Via arXiv

Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels

Sep 16, 2024
Via arXiv

Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models

Sep 16, 2024
Via arXiv

dMel: Speech Tokenization made Simple

Jul 22, 2024
Via arXiv

Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?

Feb 01, 2024
Via arXiv

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models

Jan 30, 2024
Via arXiv

Spatial LibriSpeech: An Augmented Dataset for Spatial Audio Learning

Aug 18, 2023
Via arXiv

Naturalistic Head Motion Generation from Speech

Oct 26, 2022
Via arXiv