Title: GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks

Nov 28, 2024

Muhammad Sohail Danish, Muhammad Akhtar Munir, Syed Roshaan Ali Shah, Kartik Kuckreja, Fahad Shahbaz Khan, Paolo Fraccaro, Alexandre Lacoste, Salman Khan

Abstract: While numerous recent benchmarks focus on evaluating generic Vision-Language Models (VLMs), they fall short in addressing the unique demands of geospatial applications. Generic VLM benchmarks are not designed to handle the complexities of geospatial data, which is critical for applications such as environmental monitoring, urban planning, and disaster management. Unique challenges in the geospatial domain include temporal analysis of changes, counting objects in large quantities, detecting tiny objects, and understanding relationships between entities in remote sensing imagery. To address this gap, we present GEOBench-VLM, a comprehensive benchmark specifically designed to evaluate VLMs on geospatial tasks, including scene understanding, object counting, localization, fine-grained categorization, and temporal analysis. Our benchmark features over 10,000 manually verified instructions and covers a diverse set of variations in visual conditions, object type, and scale. We evaluate several state-of-the-art VLMs to assess their accuracy within the geospatial context. The results indicate that although existing VLMs demonstrate potential, they face challenges when dealing with geospatial-specific examples, highlighting room for further improvement. Specifically, the best-performing model, GPT4o, achieves only 40% accuracy on MCQs, which is only double the random guess performance. Our benchmark is publicly available at https://github.com/The-AI-Alliance/GEO-Bench-VLM.
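
The comparison between 40% MCQ accuracy and random guessing can be made concrete with a small sketch. It assumes five answer options per question, which is only an inference from the abstract's statement that 40% is double the random-guess rate; the abstract does not state the option count. The function names and the toy predictions below are hypothetical and are not taken from the benchmark or its repository.

```python
from fractions import Fraction


def random_guess_accuracy(num_options: int) -> Fraction:
    """Expected accuracy of uniform random guessing on an MCQ with num_options choices."""
    if num_options < 1:
        raise ValueError("num_options must be at least 1")
    return Fraction(1, num_options)


def mcq_accuracy(predictions: list[str], answers: list[str]) -> Fraction:
    """Fraction of questions where the predicted option matches the answer key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return Fraction(correct, len(answers))


# Toy data, invented for illustration only (not benchmark results).
answers = ["A", "C", "B", "E", "D"]
predictions = ["A", "C", "D", "B", "A"]

accuracy = mcq_accuracy(predictions, answers)
# ASSUMPTION: five options per question, inferred from "double the random guess".
baseline = random_guess_accuracy(5)
ratio = accuracy / baseline

print(f"accuracy={float(accuracy):.0%}, baseline={float(baseline):.0%}, ratio={float(ratio):.1f}x")
```

Reporting accuracy as a multiple of the chance baseline, rather than as a raw percentage, makes scores comparable across question sets with different numbers of options.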
