Yang Xiao

GIVE: Grounding Human Gestures in Vision-Language-Action Models

Jun 11, 2026
Via arXiv

RAIL: Rethinking Auditory Intelligence in Large Audio-Language Models with a CHC-Grounded Benchmark

Jun 09, 2026
Via arXiv

Titans-as-a-Layer: Test-Time Memory for Conversational Speech Emotion Recognition

Jun 07, 2026
Via arXiv

UNIVID: Unified Vision-Language Model for Video Moderation

Jun 04, 2026
Via arXiv

Benchmarking Multimodal LLMs on Code Generation for Complex Interactive Webpages

May 29, 2026
Via arXiv

Learning When to Think While Listening in Large Audio-Language Models

May 26, 2026
Via arXiv

Why Can't They Remember? Uncovering Representation and Retrieval Bottlenecks in Multi-Turn Acoustic Memory

May 26, 2026
Via arXiv

Rethinking Continual Learning for Speech and Audio: A Representation-Centric Taxonomy and Open Problems

May 24, 2026
Via arXiv

Knowledge Visualization: A Benchmark and Method for Knowledge-Intensive Text-to-Image Generation

Apr 24, 2026
Via arXiv

RSGMamba: Reliability-Aware Self-Gated State Space Model for Multimodal Semantic Segmentation

Apr 14, 2026
Via arXiv