Salman Khan

VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos

Nov 07, 2024

ROAD-Waymo: Action Awareness at Scale for Autonomous Driving

Nov 03, 2024

COSNet: A Novel Semantic Segmentation Network using Enhanced Boundaries in Cluttered Scenes

Oct 31, 2024

CAMEL-Bench: A Comprehensive Arabic LMM Benchmark

Oct 24, 2024

How to Continually Adapt Text-to-Image Diffusion Models for Flexible Customization?

Oct 23, 2024

Frontiers in Intelligent Colonoscopy

Oct 22, 2024

AgriCLIP: Adapting CLIP for Agriculture and Livestock via Domain-Specialized Cross-Model Alignment

Oct 02, 2024

Open3DTrack: Towards Open-Vocabulary 3D Multi-Object Tracking

Oct 02, 2024

CDChat: A Large Multimodal Model for Remote Sensing Change Description

Sep 24, 2024

Efficient Localized Adaptation of Neural Weather Forecasting: A Case Study in the MENA Region

Sep 11, 2024