Bin Ma

Univ. Western Ontario

HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution

Jan 17, 2025

Conditional Latent Diffusion-Based Speech Enhancement Via Dual Context Learning

Jan 17, 2025

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction

Jan 10, 2025

Speech Separation using Neural Audio Codecs with Embedding Loss

Nov 27, 2024

PatentAgent: Intelligent Agent for Automated Pharmaceutical Patent Analysis

Oct 25, 2024

Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions

Sep 25, 2024

Dynamic Prompting of Frozen Text-to-Image Diffusion Models for Panoptic Narrative Grounding

Sep 12, 2024

Towards Audio Codec-based Speech Separation

Jun 18, 2024

Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis

Jun 04, 2024

MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation

Dec 19, 2023