Matthias Minderer

PaliGemma: A versatile 3B VLM for transfer

Jul 10, 2024
Via arXiv

Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection

Mar 21, 2024
Via arXiv

Improving fine-grained understanding in image-text pre-training

Jan 18, 2024
Via arXiv

Video OWL-ViT: Temporally-consistent open-world localization in video

Aug 22, 2023
Via arXiv

Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution

Jul 12, 2023
Via arXiv

Scaling Open-Vocabulary Object Detection

Jun 16, 2023
Via arXiv

PaLI-X: On Scaling up a Multilingual Vision and Language Model

May 29, 2023
Via arXiv

Scaling Vision Transformers to 22 Billion Parameters

Feb 10, 2023
Via arXiv

FlexiViT: One Model for All Patch Sizes

Dec 15, 2022
Via arXiv

Decoder Denoising Pretraining for Semantic Segmentation

May 23, 2022
Via arXiv