Omkar Thawakar

Thinking Beyond Labels: Vocabulary-Free Fine-Grained Recognition using Reasoning-Augmented LMMs

Dec 21, 2025
Via arXiv

How Good are Foundation Models in Step-by-Step Embodied Reasoning?

Sep 18, 2025
Via arXiv

Beyond Simple Edits: Composed Video Retrieval with Dense Modifications

Aug 19, 2025
Via arXiv

Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs

May 26, 2025
Via arXiv

ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark

May 22, 2025
Via arXiv

DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding

Mar 13, 2025
Via arXiv

LLM Post-Training: A Deep Dive into Reasoning Large Language Models

Feb 28, 2025
Via arXiv

Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts

Feb 20, 2025
Via arXiv

LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs

Jan 10, 2025
Via arXiv

All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages

Nov 25, 2024
Via arXiv