Salman Khan

Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs

Mar 29, 2025
Via arXiv

Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model

Mar 27, 2025
Via arXiv

Tracking Meets Large Multimodal Models for Driving Scenario Understanding

Mar 18, 2025
Via arXiv

How Good is my Histopathology Vision-Language Foundation Model? A Holistic Benchmark

Mar 17, 2025
Via arXiv

O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models

Mar 15, 2025
Via arXiv

DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding

Mar 13, 2025
Via arXiv

Hierarchical Self-Supervised Adversarial Training for Robust Vision Models in Histopathology

Mar 13, 2025
Via arXiv

Handwritten Digit Recognition: An Ensemble-Based Approach for Superior Performance

Mar 08, 2025
Via arXiv

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

Mar 06, 2025
Via arXiv

LLM Post-Training: A Deep Dive into Reasoning Large Language Models

Feb 28, 2025
Via arXiv