Xichen Pan

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Jun 24, 2024
Via arXiv

Image Sculpting: Precise Object Editing with 3D Geometry Control

Jan 02, 2024
Via arXiv

Kosmos-G: Generating Images in Context with Multimodal Large Language Models

Oct 04, 2023
Via arXiv

Learning Temporal Distribution and Spatial Correlation for Universal Moving Object Segmentation

Apr 19, 2023
Via arXiv

Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models

Nov 20, 2022
Via arXiv

Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition

Mar 26, 2022
Via arXiv