Michael Tschannen

PaliGemma: A versatile 3B VLM for transfer

Jul 10, 2024
Via arXiv

LocCa: Visual Pretraining with Location-aware Captioners

Mar 28, 2024
Via arXiv

Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers

Jan 03, 2024
Via arXiv

GIVT: Generative Infinite-Vocabulary Transformers

Dec 04, 2023
Via arXiv

Finite Scalar Quantization: VQ-VAE Made Simple

Oct 12, 2023
Via arXiv

Image Captioners Are Scalable Vision Learners Too

Jun 13, 2023
Via arXiv

PaLI-X: On Scaling up a Multilingual Vision and Language Model

May 29, 2023
Via arXiv

M2T: Masking Transformers Twice for Faster Decoding

Apr 14, 2023
Via arXiv

Scaling Vision Transformers to 22 Billion Parameters

Feb 10, 2023
Via arXiv

FlexiViT: One Model for All Patch Sizes

Dec 15, 2022
Via arXiv