Ibrahim Alabdulmohsin

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Feb 20, 2025
Via arXiv

A Tale of Two Structures: Do LLMs Capture the Fractal Complexity of Language?

Feb 19, 2025
Via arXiv

Harnessing Language's Fractal Geometry with Recursive Inference Scaling

Feb 11, 2025
Via arXiv

Scaling Pre-training to One Hundred Billion Data for Vision Language Models

Feb 11, 2025
Via arXiv

PaliGemma 2: A Family of Versatile VLMs for Transfer

Dec 04, 2024
Via arXiv

PaliGemma: A versatile 3B VLM for transfer

Jul 10, 2024
Via arXiv

No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models

May 22, 2024
Via arXiv

LocCa: Visual Pretraining with Location-aware Captioners

Mar 28, 2024
Via arXiv

CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?

Mar 07, 2024
Via arXiv

Fractal Patterns May Unravel the Intelligence in Next-Token Prediction

Feb 02, 2024
Via arXiv