Anelia Angelova

Whats in a Video: Factorized Autoregressive Decoding for Online Dense Video Captioning

Nov 22, 2024

3D Open-Vocabulary Panoptic Segmentation with 2D-3D Vision-Language Distillation

Jan 04, 2024

Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities

Nov 13, 2023

Detection-Oriented Image-Text Pretraining for Open-Vocabulary Detection

Sep 29, 2023

Contrastive Feature Masking Open-Vocabulary Vision Transformer

Sep 02, 2023

Diversifying Joint Vision-Language Tokenization Learning

Jun 15, 2023

Joint Adaptive Representations for Image-Language Learning

Jun 01, 2023

PaLI-X: On Scaling up a Multilingual Vision and Language Model

May 29, 2023

Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers

May 11, 2023

MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks

Mar 30, 2023