Zeyuan Chen

xLAM: A Family of Large Action Models to Empower AI Agent Systems

Sep 05, 2024
Via arXiv

xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

Aug 22, 2024
Via arXiv

Towards Boosting LLMs-driven Relevance Modeling with Progressive Retrieved Behavior-augmented Prompting

Aug 18, 2024
Via arXiv

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Aug 16, 2024
Via arXiv

OmniControlNet: Dual-stage Integration for Conditional Image Generation

Jun 09, 2024
Via arXiv

SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant

Mar 17, 2024
Via arXiv

Bayesian Diffusion Models for 3D Shape Reconstruction

Mar 11, 2024
Via arXiv

Pattern-wise Transparent Sequential Recommendation

Feb 29, 2024
Via arXiv

Dolfin: Diffusion Layout Transformers without Autoencoder

Oct 25, 2023
Via arXiv

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

Aug 19, 2023
Via arXiv