Omkar Thawakar

DuwatBench: Bridging Language and Visual Heritage through an Arabic Calligraphy Benchmark for Multimodal Understanding

Jan 27, 2026
Via arXiv

Thinking Beyond Labels: Vocabulary-Free Fine-Grained Recognition using Reasoning-Augmented LMMs

Dec 21, 2025
Via arXiv

How Good are Foundation Models in Step-by-Step Embodied Reasoning?

Sep 18, 2025
Via arXiv

Beyond Simple Edits: Composed Video Retrieval with Dense Modifications

Aug 19, 2025
Via arXiv

Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs

May 26, 2025
Via arXiv

ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark

May 22, 2025
Via arXiv

DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding

Mar 13, 2025
Via arXiv

LLM Post-Training: A Deep Dive into Reasoning Large Language Models

Feb 28, 2025
Via arXiv

Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts

Feb 20, 2025
Via arXiv

LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs

Jan 10, 2025
Via arXiv