Salman Khan

Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models

Feb 03, 2025
Via arXiv

GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing

Jan 23, 2025
Via arXiv

LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs

Jan 10, 2025
Via arXiv

Discriminative Image Generation with Diffusion Models for Zero-Shot Learning

Dec 23, 2024
Via arXiv

EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues

Dec 19, 2024
Via arXiv

UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities

Dec 13, 2024
Via arXiv

Diffusion-Enhanced Test-time Adaptation with Text and Image Augmentation

Dec 12, 2024
Via arXiv

BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities

Dec 10, 2024
Via arXiv

GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks

Nov 28, 2024
Via arXiv

All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages

Nov 25, 2024
Via arXiv