Abstract:Acquiring, processing, and visualizing geospatial data requires significant computing resources, especially for large spatio-temporal domains. This challenge hinders the rapid discovery of predictive features, which is essential for advancing geospatial modeling. To address this, we developed Similarity Search (Sims), a no-code web tool that allows users to visualize, compare, cluster, and perform similarity search over defined regions of interest using Google Earth Engine as a backend. Sims is designed to complement existing modeling tools by focusing on feature exploration rather than model creation. We demonstrate the utility of Sims through a case study analyzing simulated maize yield data in Rwanda, where we evaluate how different combinations of soil, weather, and agronomic features affect the clustering of yield response zones. Sims is open source and available at https://github.com/microsoft/Sims
Abstract:Rare object detection is a fundamental task in applied geospatial machine learning, however is often challenging due to large amounts of high-resolution satellite or aerial imagery and few or no labeled positive samples to start with. This paper addresses the problem of bootstrapping such a rare object detection task assuming there is no labeled data and no spatial prior over the area of interest. We propose novel offline and online cluster-based approaches for sampling patches that are significantly more efficient, in terms of exposing positive samples to a human annotator, than random sampling. We apply our methods for identifying bomas, or small enclosures for herd animals, in the Serengeti Mara region of Kenya and Tanzania. We demonstrate a significant enhancement in detection efficiency, achieving a positive sampling rate increase from 2% (random) to 30%. This advancement enables effective machine learning mapping even with minimal labeling budgets, exemplified by an F1 score on the boma detection task of 0.51 with a budget of 300 total patches.