Jiasen Lu

STIV: Scalable Text and Image Conditioned Video Generation

Dec 10, 2024

One Diffusion to Generate Them All

Nov 25, 2024

The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities

Nov 07, 2024

MM-Ego: Towards Building Egocentric Multimodal LLMs

Oct 09, 2024

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Sep 25, 2024

SoupLM: Model Integration in Large Language and Multi-Modal Models

Jul 11, 2024

Preserving Identity with Variational Score for General-purpose 3D Editing

Jun 13, 2024

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

Dec 28, 2023

Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks

Jun 17, 2022

ASC me to Do Anything: Multi-task Training for Embodied AI

Feb 14, 2022