Saining Xie

PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop

Mar 12, 2025

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Jan 28, 2025

Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps

Jan 16, 2025

MetaMorph: Multimodal Understanding and Generation via Instruction Tuning

Dec 18, 2024

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Dec 18, 2024

Altogether: Image Captioning via Re-aligning Alt-text

Oct 22, 2024

Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

Oct 09, 2024

DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing

Oct 08, 2024

AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

Oct 04, 2024

Fast Encoding and Decoding for Implicit Video Representation

Sep 28, 2024