Mustafa Shukor

Action100M: A Large-scale Video Action Dataset

Jan 15, 2026

VL-JEPA: Joint Embedding Predictive Architecture for Vision-language

Dec 11, 2025

Scaling Laws for Native Multimodal Models

Apr 10, 2025

Analyzing Fine-tuning Representation Shift for Multimodal LLMs Steering alignment

Jan 06, 2025

Multimodal Autoregressive Pre-training of Large Vision Encoders

Nov 21, 2024

Skipping Computations in Multimodal LLMs

Oct 12, 2024

A Concept-Based Explainability Framework for Large Multimodal Models

Jun 12, 2024

Zero-Shot Image Segmentation via Recursive Normalized Cut on Diffusion Features

Jun 05, 2024

Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs

May 26, 2024

What Makes Multimodal In-Context Learning Work?

Apr 25, 2024