Dahun Kim

Whats in a Video: Factorized Autoregressive Decoding for Online Dense Video Captioning

Nov 22, 2024, via arXiv

Learning Visual Grounding from Generative Vision and Language Model

Jul 18, 2024, via arXiv

OmniBind: Teach to Build Unequal-Scale Modality Interaction for Omni-Bind of All

May 25, 2024, via arXiv

Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities

Nov 13, 2023, via arXiv

Detection-Oriented Image-Text Pretraining for Open-Vocabulary Detection

Sep 29, 2023, via arXiv

Contrastive Feature Masking Open-Vocabulary Vision Transformer

Sep 02, 2023, via arXiv

Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation

Aug 03, 2023, via arXiv

Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers

May 11, 2023, via arXiv

RECLIP: Resource-efficient CLIP by Training with Small Images

Apr 12, 2023, via arXiv

Neural Image-based Avatars: Generalizable Radiance Fields for Human Avatar Modeling

Apr 10, 2023, via arXiv