Picture for Rao Muhammad Anwer

Rao Muhammad Anwer

Tracking Meets Large Multimodal Models for Driving Scenario Understanding

Add code
Mar 18, 2025
Viaarxiv icon

DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding

Add code
Mar 13, 2025
Viaarxiv icon

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

Add code
Mar 06, 2025
Viaarxiv icon

LLM Post-Training: A Deep Dive into Reasoning Large Language Models

Add code
Feb 28, 2025
Viaarxiv icon

CLIMB-3D: Continual Learning for Imbalanced 3D Instance Segmentation

Add code
Feb 24, 2025
Viaarxiv icon

Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts

Add code
Feb 20, 2025
Viaarxiv icon

LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs

Add code
Jan 10, 2025
Figure 1 for LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
Figure 2 for LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
Figure 3 for LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
Figure 4 for LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
Viaarxiv icon

All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages

Add code
Nov 25, 2024
Figure 1 for All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
Figure 2 for All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
Figure 3 for All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
Figure 4 for All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
Viaarxiv icon

AgroGPT: Efficient Agricultural Vision-Language Model with Expert Tuning

Add code
Oct 10, 2024
Viaarxiv icon

DB-SAM: Delving into High Quality Universal Medical Image Segmentation

Add code
Oct 05, 2024
Viaarxiv icon