Picture for Andreas Steiner

Andreas Steiner

PaliGemma: A versatile 3B VLM for transfer

Add code
Jul 10, 2024
Viaarxiv icon

LocCa: Visual Pretraining with Location-aware Captioners

Add code
Mar 28, 2024
Figure 1 for LocCa: Visual Pretraining with Location-aware Captioners
Figure 2 for LocCa: Visual Pretraining with Location-aware Captioners
Figure 3 for LocCa: Visual Pretraining with Location-aware Captioners
Figure 4 for LocCa: Visual Pretraining with Location-aware Captioners
Viaarxiv icon

CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?

Add code
Mar 07, 2024
Figure 1 for CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?
Figure 2 for CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?
Figure 3 for CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?
Figure 4 for CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?
Viaarxiv icon

Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution

Add code
Jul 12, 2023
Viaarxiv icon

Image Captioners Are Scalable Vision Learners Too

Add code
Jun 13, 2023
Viaarxiv icon

Three Towers: Flexible Contrastive Learning with Pretrained Image Models

Add code
May 29, 2023
Figure 1 for Three Towers: Flexible Contrastive Learning with Pretrained Image Models
Figure 2 for Three Towers: Flexible Contrastive Learning with Pretrained Image Models
Figure 3 for Three Towers: Flexible Contrastive Learning with Pretrained Image Models
Figure 4 for Three Towers: Flexible Contrastive Learning with Pretrained Image Models
Viaarxiv icon

A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision

Add code
Mar 30, 2023
Viaarxiv icon

Scaling Vision Transformers to 22 Billion Parameters

Add code
Feb 10, 2023
Viaarxiv icon

PaLI: A Jointly-Scaled Multilingual Language-Image Model

Add code
Sep 16, 2022
Figure 1 for PaLI: A Jointly-Scaled Multilingual Language-Image Model
Figure 2 for PaLI: A Jointly-Scaled Multilingual Language-Image Model
Figure 3 for PaLI: A Jointly-Scaled Multilingual Language-Image Model
Figure 4 for PaLI: A Jointly-Scaled Multilingual Language-Image Model
Viaarxiv icon

LiT: Zero-Shot Transfer with Locked-image Text Tuning

Add code
Nov 15, 2021
Figure 1 for LiT: Zero-Shot Transfer with Locked-image Text Tuning
Figure 2 for LiT: Zero-Shot Transfer with Locked-image Text Tuning
Figure 3 for LiT: Zero-Shot Transfer with Locked-image Text Tuning
Figure 4 for LiT: Zero-Shot Transfer with Locked-image Text Tuning
Viaarxiv icon