Bin Ma

Univ. Western Ontario

Beyond Lips: Integrating Gesture and Lip Cues for Robust Audio-visual Speaker Extraction

Jan 27, 2026

LuSeeL: Language-queried Binaural Universal Sound Event Extraction and Localization

Jan 27, 2026

E2E-AEC: Implementing an end-to-end neural network learning approach for acoustic echo cancellation

Jan 23, 2026

FlowSE-GRPO: Training Flow Matching Speech Enhancement via Online Reinforcement Learning

Jan 23, 2026

Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation

Dec 29, 2025

FunAudio-ASR Technical Report

Sep 15, 2025

Insight Rumors: A Novel Textual Rumor Locating and Marking Model Leveraging Att_BiMamba2 Network

Aug 18, 2025

ClearerVoice-Studio: Bridging Advanced Speech Processing Research and Practical Deployment

Jun 24, 2025

Plug-and-Play Co-Occurring Face Attention for Robust Audio-Visual Speaker Extraction

May 27, 2025

ZenFlow: Enabling Stall-Free Offloading Training via Asynchronous Updates

May 18, 2025