Saining Xie

REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers

Apr 14, 2025
Via arXiv

Transfer between Modalities with MetaQueries

Apr 08, 2025
Via arXiv

Scaling Language-Free Visual Representation Learning

Apr 01, 2025
Via arXiv

PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop

Mar 12, 2025
Via arXiv

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Jan 28, 2025
Via arXiv

Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps

Jan 16, 2025
Via arXiv

MetaMorph: Multimodal Understanding and Generation via Instruction Tuning

Dec 18, 2024
Via arXiv

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Dec 18, 2024
Via arXiv

Altogether: Image Captioning via Re-aligning Alt-text

Oct 22, 2024
Via arXiv

Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

Oct 09, 2024
Via arXiv