Fahad Shahbaz Khan

Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model

Mar 27, 2025

Tracking Meets Large Multimodal Models for Driving Scenario Understanding

Mar 18, 2025

DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding

Mar 13, 2025

Hierarchical Self-Supervised Adversarial Training for Robust Vision Models in Histopathology

Mar 13, 2025

LLM Post-Training: A Deep Dive into Reasoning Large Language Models

Feb 28, 2025

C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation

Feb 27, 2025

AirCast: Improving Air Pollution Forecasting Through Multi-Variable Data Alignment

Feb 25, 2025

Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts

Feb 20, 2025

InterLCM: Low-Quality Images as Intermediate States of Latent Consistency Models for Effective Blind Face Restoration

Feb 04, 2025

LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs

Jan 10, 2025