Song Han

University of Connecticut

SparDA: Sparse Decoupled Attention for Efficient Long-Context LLM Inference

Jun 03, 2026
Via arXiv

LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation

Jun 01, 2026
Via arXiv

Cosmos 3: Omnimodal World Models for Physical AI

Jun 01, 2026
Via arXiv

SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer

May 28, 2026
Via arXiv

Grounded 3D-Aware Spatial Vision-Language Modeling

May 28, 2026
Via arXiv

JetViT: Efficient High-Resolution Vision Transformer with Post-Training Attention Search

May 26, 2026
Via arXiv

Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving

May 25, 2026
Via arXiv

Hide to Guide: Learning via Semantic Masking

May 24, 2026
Via arXiv

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

May 19, 2026
Via arXiv

SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer

May 14, 2026
Via arXiv