Afshin Dehghan

SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding

Mar 27, 2025
Via arXiv

MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs

Mar 17, 2025
Via arXiv

FlexTok: Resampling Images into 1D Token Sequences of Flexible Length

Feb 19, 2025
Via arXiv

Cubify Anything: Scaling Indoor 3D Object Detection

Dec 05, 2024
Via arXiv

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning

Sep 30, 2024
Via arXiv

SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

Jul 22, 2024
Via arXiv

Understanding Alignment in Multimodal LLMs: A Comprehensive Study

Jul 02, 2024
Via arXiv

4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

Jun 14, 2024
Via arXiv

4M: Massively Multimodal Masked Modeling

Dec 11, 2023
Via arXiv

GAUDI: A Neural Architect for Immersive 3D Scene Generation

Jul 27, 2022
Via arXiv