Midia Yousefi

Isochrony-Controlled Speech-to-Text Translation: A study on translating from Sino-Tibetan to Indo-European Languages

Nov 11, 2024

Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation

Sep 06, 2024

TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

May 28, 2024

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

Apr 10, 2024

Profile-Error-Tolerant Target-Speaker Voice Activity Detection

Sep 21, 2023

Single-channel speech separation using Soft-minimum Permutation Invariant Training

Nov 16, 2021

Speaker conditioning of acoustic models using affine transformation for multi-speaker speech recognition

Oct 30, 2021

Real-time Speaker counting in a cocktail party scenario using Attention-guided Convolutional Neural Network

Oct 30, 2021

Probabilistic Permutation Invariant Training for Speech Separation

Aug 04, 2019