Filip Pavetic

LocCa: Visual Pretraining with Location-aware Captioners

Mar 28, 2024

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Mar 08, 2024

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023

PaLI-3 Vision Language Models: Smaller, Faster, Stronger

Oct 17, 2023

PaLI-X: On Scaling up a Multilingual Vision and Language Model

May 29, 2023

A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision

Mar 30, 2023

FlexiViT: One Model for All Patch Sizes

Dec 15, 2022