Hanwang Zhang

DiT as Real-Time Rerenderer: Streaming Video Stylization with Autoregressive Diffusion Transformer

Apr 15, 2026
Via arXiv

Efficient Matrix Implementation for Rotary Position Embedding

Apr 10, 2026
Via arXiv

Adapting Point Cloud Analysis via Multimodal Bayesian Distribution Learning

Mar 23, 2026
Via arXiv

Principled Steering via Null-space Projection for Jailbreak Defense in Vision-Language Models

Mar 23, 2026
Via arXiv

Scene Graph-guided SegCaptioning Transformer with Fine-grained Alignment for Controllable Video Segmentation and Captioning

Mar 21, 2026
Via arXiv

MuSteerNet: Human Reaction Generation from Videos via Observation-Reaction Mutual Steering

Mar 20, 2026
Via arXiv

Modeling Cross-vision Synergy for Unified Large Vision Model

Mar 03, 2026
Via arXiv

Thinking with Images as Continuous Actions: Numerical Visual Chain-of-Thought

Feb 27, 2026
Via arXiv

Look Carefully: Adaptive Visual Reinforcements in Multimodal Large Language Models for Hallucination Mitigation

Feb 27, 2026
Via arXiv

Reducing Class-Wise Performance Disparity via Margin Regularization

Jan 30, 2026
Via arXiv