Picture for Lucas Beyer

Lucas Beyer

Dima

Gemma 3 Technical Report

Add code
Mar 25, 2025
Viaarxiv icon

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Add code
Feb 20, 2025
Viaarxiv icon

PaliGemma 2: A Family of Versatile VLMs for Transfer

Add code
Dec 04, 2024
Figure 1 for PaliGemma 2: A Family of Versatile VLMs for Transfer
Figure 2 for PaliGemma 2: A Family of Versatile VLMs for Transfer
Figure 3 for PaliGemma 2: A Family of Versatile VLMs for Transfer
Figure 4 for PaliGemma 2: A Family of Versatile VLMs for Transfer
Viaarxiv icon

PaliGemma: A versatile 3B VLM for transfer

Add code
Jul 10, 2024
Figure 1 for PaliGemma: A versatile 3B VLM for transfer
Figure 2 for PaliGemma: A versatile 3B VLM for transfer
Figure 3 for PaliGemma: A versatile 3B VLM for transfer
Figure 4 for PaliGemma: A versatile 3B VLM for transfer
Viaarxiv icon

No Filter: Cultural and Socioeconomic Diversityin Contrastive Vision-Language Models

Add code
May 22, 2024
Viaarxiv icon

LocCa: Visual Pretraining with Location-aware Captioners

Add code
Mar 28, 2024
Figure 1 for LocCa: Visual Pretraining with Location-aware Captioners
Figure 2 for LocCa: Visual Pretraining with Location-aware Captioners
Figure 3 for LocCa: Visual Pretraining with Location-aware Captioners
Figure 4 for LocCa: Visual Pretraining with Location-aware Captioners
Viaarxiv icon

PaLI-3 Vision Language Models: Smaller, Faster, Stronger

Add code
Oct 17, 2023
Figure 1 for PaLI-3 Vision Language Models: Smaller, Faster, Stronger
Figure 2 for PaLI-3 Vision Language Models: Smaller, Faster, Stronger
Figure 3 for PaLI-3 Vision Language Models: Smaller, Faster, Stronger
Figure 4 for PaLI-3 Vision Language Models: Smaller, Faster, Stronger
Viaarxiv icon

Image Captioners Are Scalable Vision Learners Too

Add code
Jun 13, 2023
Viaarxiv icon

PaLI-X: On Scaling up a Multilingual Vision and Language Model

Add code
May 29, 2023
Figure 1 for PaLI-X: On Scaling up a Multilingual Vision and Language Model
Figure 2 for PaLI-X: On Scaling up a Multilingual Vision and Language Model
Figure 3 for PaLI-X: On Scaling up a Multilingual Vision and Language Model
Figure 4 for PaLI-X: On Scaling up a Multilingual Vision and Language Model
Viaarxiv icon

Three Towers: Flexible Contrastive Learning with Pretrained Image Models

Add code
May 29, 2023
Figure 1 for Three Towers: Flexible Contrastive Learning with Pretrained Image Models
Figure 2 for Three Towers: Flexible Contrastive Learning with Pretrained Image Models
Figure 3 for Three Towers: Flexible Contrastive Learning with Pretrained Image Models
Figure 4 for Three Towers: Flexible Contrastive Learning with Pretrained Image Models
Viaarxiv icon