Matthias Minderer

PaliGemma 2: A Family of Versatile VLMs for Transfer

Dec 04, 2024
Via arXiv

PaliGemma: A versatile 3B VLM for transfer

Jul 10, 2024
Via arXiv

Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection

Mar 21, 2024
Via arXiv

Improving fine-grained understanding in image-text pre-training

Jan 18, 2024
Via arXiv

Video OWL-ViT: Temporally-consistent open-world localization in video

Aug 22, 2023
Via arXiv

Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution

Jul 12, 2023
Via arXiv

Scaling Open-Vocabulary Object Detection

Jun 16, 2023
Via arXiv

PaLI-X: On Scaling up a Multilingual Vision and Language Model

May 29, 2023
Via arXiv

Scaling Vision Transformers to 22 Billion Parameters

Feb 10, 2023
Via arXiv

FlexiViT: One Model for All Patch Sizes

Dec 15, 2022
Via arXiv