Yu Sun

Sherman

MoE Adapter for Large Audio Language Models: Sparsity, Disentanglement, and Gradient-Conflict-Free

Jan 08, 2026
Via arXiv

FROST-Drive: Scalable and Efficient End-to-End Driving with a Frozen Vision Encoder

Jan 06, 2026
Via arXiv

End-to-End Test-Time Training for Long Context

Dec 31, 2025
Via arXiv

From Visual Perception to Deep Empathy: An Automated Assessment Framework for House-Tree-Person Drawings Using Multimodal LLMs and Multi-Agent Collaboration

Dec 23, 2025
Via arXiv

PSI3D: Plug-and-Play 3D Stochastic Inference with Slice-wise Latent Diffusion Prior

Dec 20, 2025
Via arXiv

Distillation of Discrete Diffusion by Exact Conditional Distribution Matching

Dec 15, 2025
Via arXiv

Blink: Dynamic Visual Token Resolution for Enhanced Multimodal Understanding

Dec 11, 2025
Via arXiv

PRISM: Probabilistic and Robust Inverse Solver with Measurement-Conditioned Diffusion Prior for Blind Inverse Problems

Sep 19, 2025
Via arXiv

Dataset and Benchmark for Enhancing Critical Retained Foreign Object Detection

Jul 09, 2025
Via arXiv

ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning

Jun 11, 2025
Via arXiv