Xichen Pan

Exploring MLLM-Diffusion Information Transfer with MetaCanvas

Dec 12, 2025

Think Then Embed: Generative Context Improves Multimodal Embedding

Oct 06, 2025

Pisces: An Auto-regressive Foundation Model for Image Understanding and Generation

Jun 12, 2025

Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis

May 15, 2025

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

May 14, 2025

Transfer between Modalities with MetaQueries

Apr 08, 2025

PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop

Mar 12, 2025

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Jun 24, 2024

Image Sculpting: Precise Object Editing with 3D Geometry Control

Jan 02, 2024

Kosmos-G: Generating Images in Context with Multimodal Large Language Models

Oct 04, 2023