Mohamed Fazli Imam

Can Multimodal LLMs do Visual Temporal Understanding and Reasoning? The answer is No!

Jan 18, 2025
Via arXiv

CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections

Nov 28, 2024
Via arXiv