
Jae Sung Park

BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions

Nov 12, 2024

ActionAtlas: A VideoQA Benchmark for Domain-specialized Action Recognition

Oct 08, 2024

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Sep 25, 2024

Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness

Jul 02, 2024

Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass

May 29, 2024

Agent AI: Surveying the Horizons of Multimodal Interaction

Jan 07, 2024

Localized Symbolic Knowledge Distillation for Visual Commonsense Models

Dec 12, 2023

ArK: Augmented Reality with Knowledge Interactive Emergent Ability

May 01, 2023

The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning

Feb 10, 2022

MERLOT: Multimodal Neural Script Knowledge Models

Jun 10, 2021