Christopher Clark

One Diffusion to Generate Them All

Nov 25, 2024
Via arXiv

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Sep 25, 2024
Via arXiv

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

Dec 28, 2023
Via arXiv

Holodeck: Language Guided Generation of 3D Embodied AI Environments

Dec 14, 2023
Via arXiv

Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models

Mar 28, 2023
Via arXiv

I Can't Believe There's No Images! Learning Visual Tasks Using only Language Data

Dec 01, 2022
Via arXiv

Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks

Jun 17, 2022
Via arXiv

A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge

Jun 03, 2022
Via arXiv

Webly Supervised Concept Expansion for General Purpose Vision Models

Feb 04, 2022
Via arXiv

Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text

Dec 01, 2021
Via arXiv