Abstract:While numerous recent benchmarks focus on evaluating generic Vision-Language Models (VLMs), they fall short in addressing the unique demands of geospatial applications. Generic VLM benchmarks are not designed to handle the complexities of geospatial data, which is critical for applications such as environmental monitoring, urban planning, and disaster management. Some of the unique challenges in geospatial domain include temporal analysis for changes, counting objects in large quantities, detecting tiny objects, and understanding relationships between entities occurring in Remote Sensing imagery. To address this gap in the geospatial domain, we present GEOBench-VLM, a comprehensive benchmark specifically designed to evaluate VLMs on geospatial tasks, including scene understanding, object counting, localization, fine-grained categorization, and temporal analysis. Our benchmark features over 10,000 manually verified instructions and covers a diverse set of variations in visual conditions, object type, and scale. We evaluate several state-of-the-art VLMs to assess their accuracy within the geospatial context. The results indicate that although existing VLMs demonstrate potential, they face challenges when dealing with geospatial-specific examples, highlighting the room for further improvements. Specifically, the best-performing GPT4o achieves only 40\% accuracy on MCQs, which is only double the random guess performance. Our benchmark is publicly available at https://github.com/The-AI-Alliance/GEO-Bench-VLM .
Abstract:This study explores the effectiveness of multi-temporal satellite imagery for better functional field boundary delineation using deep learning semantic segmentation architecture on two distinct geographical and multi-scale farming systems of Netherlands and Pakistan. Multidate images of April, August and October 2022 were acquired for PlanetScope and Sentinel-2 in sub regions of Netherlands and November 2022, February and March 2023 for selected area of Dunyapur in Pakistan. For Netherlands, Basic registration crop parcels (BRP) vector layer was used as labeled training data. while self-crafted field boundary vector data were utilized for Pakistan. Four deep learning models with UNET architecture were evaluated using different combinations of multi-date images and NDVI stacks in the Netherlands subregions. A comparative analysis of IoU scores assessed the effectiveness of the proposed multi-date NDVI stack approach. These findings were then applied for transfer learning, using pre-trained models from the Netherlands on the selected area in Pakistan. Additionally, separate models were trained using self-crafted field boundary data for Pakistan, and combined models were developed using data from both the Netherlands and Pakistan. Results indicate that multi-date NDVI stacks provide additional temporal context, reflecting crop growth over different times of the season. The study underscores the critical role of multi-scale ground information from diverse geographical areas in developing robust and universally applicable models for field boundary delineation. The results also highlight the importance of fine spatial resolution for extraction of field boundaries in regions with small scale framing. The findings can be extended to multi-scale implementations for improved automatic field boundary delineation in heterogeneous agricultural environments.