Fahad Shahbaz Khan

RainDiff: End-to-end Precipitation Nowcasting Via Token-wise Attention Diffusion

Oct 16, 2025
Via arXiv

MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning

Oct 09, 2025
Via arXiv

How Good are Foundation Models in Step-by-Step Embodied Reasoning?

Sep 18, 2025
Via arXiv

Beyond Simple Edits: Composed Video Retrieval with Dense Modifications

Aug 19, 2025
Via arXiv

Hierarchical Visual Prompt Learning for Continual Video Instance Segmentation

Aug 12, 2025
Via arXiv

RAGNet: Large-scale Reasoning-based Affordance Segmentation Benchmark towards General Grasping

Jul 31, 2025
Via arXiv

AI in Agriculture: A Survey of Deep Learning Techniques for Crops, Fisheries and Livestock

Jul 29, 2025
Via arXiv

TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models

Jun 13, 2025
Via arXiv

A Culturally-diverse Multilingual Multimodal Video Benchmark & Model

Jun 08, 2025
Via arXiv

TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation

Jun 06, 2025
Via arXiv