Gao Huang

InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation

May 14, 2026
Via arXiv

Steering Visual Generation in Unified Multimodal Models with Understanding Supervision

May 07, 2026
Via arXiv

Linear-Time Global Visual Modeling without Explicit Attention

May 06, 2026
Via arXiv

Linearizing Vision Transformer with Test-Time Training

May 04, 2026
Via arXiv

Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models

Apr 28, 2026
Via arXiv

Bridging the RGB-IR Gap: Consensus and Discrepancy Modeling for Text-Guided Multispectral Detection

Apr 13, 2026
Via arXiv

MAG-3D: Multi-Agent Grounded Reasoning for 3D Understanding

Apr 10, 2026
Via arXiv

HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning

Mar 19, 2026
Via arXiv

UltraStar: Semantic-Aware Star Graph Modeling for Echocardiography Navigation

Mar 02, 2026
Via arXiv

TwinRL-VLA: Digital Twin-Driven Reinforcement Learning for Real-World Robotic Manipulation

Feb 09, 2026
Via arXiv