Picture for Yinfei Yang

Yinfei Yang

SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding

Add code
Mar 27, 2025
Viaarxiv icon

MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs

Add code
Mar 17, 2025
Viaarxiv icon

UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing

Add code
Mar 16, 2025
Viaarxiv icon

DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation

Add code
Mar 13, 2025
Viaarxiv icon

CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling

Add code
Feb 03, 2025
Figure 1 for CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling
Figure 2 for CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling
Figure 3 for CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling
Figure 4 for CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling
Viaarxiv icon

STIV: Scalable Text and Image Conditioned Video Generation

Add code
Dec 10, 2024
Viaarxiv icon

Multimodal Autoregressive Pre-training of Large Vision Encoders

Add code
Nov 21, 2024
Figure 1 for Multimodal Autoregressive Pre-training of Large Vision Encoders
Figure 2 for Multimodal Autoregressive Pre-training of Large Vision Encoders
Figure 3 for Multimodal Autoregressive Pre-training of Large Vision Encoders
Figure 4 for Multimodal Autoregressive Pre-training of Large Vision Encoders
Viaarxiv icon

Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

Add code
Oct 24, 2024
Viaarxiv icon

Improve Vision Language Model Chain-of-thought Reasoning

Add code
Oct 21, 2024
Figure 1 for Improve Vision Language Model Chain-of-thought Reasoning
Figure 2 for Improve Vision Language Model Chain-of-thought Reasoning
Figure 3 for Improve Vision Language Model Chain-of-thought Reasoning
Figure 4 for Improve Vision Language Model Chain-of-thought Reasoning
Viaarxiv icon

MM-Ego: Towards Building Egocentric Multimodal LLMs

Add code
Oct 09, 2024
Figure 1 for MM-Ego: Towards Building Egocentric Multimodal LLMs
Figure 2 for MM-Ego: Towards Building Egocentric Multimodal LLMs
Figure 3 for MM-Ego: Towards Building Egocentric Multimodal LLMs
Figure 4 for MM-Ego: Towards Building Egocentric Multimodal LLMs
Viaarxiv icon