Jiasen Lu

STIV: Scalable Text and Image Conditioned Video Generation

Dec 10, 2024

One Diffusion to Generate Them All

Nov 25, 2024

The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities

Nov 07, 2024

MM-Ego: Towards Building Egocentric Multimodal LLMs

Oct 09, 2024

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Sep 25, 2024

SoupLM: Model Integration in Large Language and Multi-Modal Models

Jul 11, 2024

Preserving Identity with Variational Score for General-purpose 3D Editing

Jun 13, 2024

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

Dec 28, 2023

Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks

Jun 17, 2022

ASC me to Do Anything: Multi-task Training for Embodied AI

Feb 14, 2022