Kai Yu

Sherman

The Interspeech 2026 Audio Reasoning Challenge: Evaluating Reasoning Process Quality for Audio Reasoning Models and Agents

Feb 15, 2026

TC-BiMamba: Trans-Chunk bidirectionally within BiMamba for unified streaming and non-streaming ASR

Feb 12, 2026

Detect, Attend and Extract: Keyword Guided Target Speaker Extraction

Feb 08, 2026

PACER: Blockwise Pre-verification for Speculative Decoding with Adaptive Length

Feb 01, 2026

CMANet: Channel-Masked Attention Network for Cooperative Multi-Base-Station 3D Positioning

Jan 31, 2026

Fronthaul-Efficient Distributed Cooperative 3D Positioning with Quantized Latent CSI Embeddings

Jan 31, 2026

Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis

Jan 20, 2026

PaperGuide: Making Small Language-Model Paper-Reading Agents More Efficient

Jan 19, 2026

Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders

Jan 15, 2026

SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing

Jan 14, 2026