Salman Khan

Promptception: How Sensitive Are Large Multimodal Models to Prompts?

Sep 04, 2025
Via arXiv

Beyond Simple Edits: Composed Video Retrieval with Dense Modifications

Aug 19, 2025
Via arXiv

Hierarchical Visual Prompt Learning for Continual Video Instance Segmentation

Aug 12, 2025
Via arXiv

AI in Agriculture: A Survey of Deep Learning Techniques for Crops, Fisheries and Livestock

Jul 29, 2025
Via arXiv

TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models

Jun 13, 2025
Via arXiv

Text to Image for Multi-Label Image Recognition with Joint Prompt-Adapter Learning

Jun 12, 2025
Via arXiv

MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks

Jun 08, 2025
Via arXiv

A Culturally-diverse Multilingual Multimodal Video Benchmark & Model

Jun 08, 2025
Via arXiv

TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation

Jun 06, 2025
Via arXiv

VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos

Jun 05, 2025
Via arXiv