Saining Xie

RepFusion: Leveraging Multimodal Priors for Denoising in Representation Space

Jun 12, 2026
Via arXiv

Benchmarking Visual State Tracking in Multimodal Video Understanding

Jun 02, 2026
Via arXiv

PaintBench: Deterministic Evaluation of Precise Visual Editing

May 29, 2026
Via arXiv

Cambrian-P: Pose-Grounded Video Understanding

May 21, 2026
Via arXiv

Improved Baselines with Representation Autoencoders

May 18, 2026
Via arXiv

Image Generators are Generalist Vision Learners

Apr 22, 2026
Via arXiv

Repurposing Geometric Foundation Models for Multi-view Diffusion

Mar 23, 2026
Via arXiv

Beyond Language Modeling: An Exploration of Multimodal Pretraining

Mar 03, 2026
Via arXiv

Solaris: Building a Multiplayer Video World Model in Minecraft

Feb 26, 2026
Via arXiv

Self-Refining Video Sampling

Jan 26, 2026
Via arXiv