Michael Tschannen

PaliGemma 2: A Family of Versatile VLMs for Transfer

Dec 04, 2024
Via arXiv

JetFormer: An Autoregressive Generative Model of Raw Images and Text

Nov 29, 2024
Via arXiv

PaliGemma: A versatile 3B VLM for transfer

Jul 10, 2024
Via arXiv

LocCa: Visual Pretraining with Location-aware Captioners

Mar 28, 2024
Via arXiv

Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers

Jan 03, 2024
Via arXiv

GIVT: Generative Infinite-Vocabulary Transformers

Dec 04, 2023
Via arXiv

Finite Scalar Quantization: VQ-VAE Made Simple

Oct 12, 2023
Via arXiv

Image Captioners Are Scalable Vision Learners Too

Jun 13, 2023
Via arXiv

PaLI-X: On Scaling up a Multilingual Vision and Language Model

May 29, 2023
Via arXiv

M2T: Masking Transformers Twice for Faster Decoding

Apr 14, 2023
Via arXiv