
Jiasen Lu

The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities

Nov 07, 2024

MM-Ego: Towards Building Egocentric Multimodal LLMs

Oct 09, 2024

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Sep 25, 2024

SoupLM: Model Integration in Large Language and Multi-Modal Models

Jul 11, 2024

Preserving Identity with Variational Score for General-purpose 3D Editing

Jun 13, 2024

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

Dec 28, 2023

Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks

Jun 17, 2022

ASC me to Do Anything: Multi-task Training for Embodied AI

Feb 14, 2022

MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound

Jan 07, 2022

A Simple Long-Tailed Recognition Baseline via Vision-Language Model

Nov 29, 2021