Abstract:This technical report presents Prithvi-EO-2.0, a new geospatial foundation model that offers significant improvements over its predecessor, Prithvi-EO-1.0. Trained on 4.2M global time series samples from NASA's Harmonized Landsat and Sentinel-2 data archive at 30m resolution, the new 300M and 600M parameter models incorporate temporal and location embeddings for enhanced performance across various geospatial tasks. Through extensive benchmarking with GEO-Bench, the 600M version outperforms the previous Prithvi-EO model by 8\% across a range of tasks. It also outperforms six other geospatial foundation models when benchmarked on remote sensing tasks from different domains and resolutions (i.e. from 0.1m to 15m). The results demonstrate the versatility of the model in both classical earth observation and high-resolution applications. Early involvement of end-users and subject matter experts (SMEs) are among the key factors that contributed to the project's success. In particular, SME involvement allowed for constant feedback on model and dataset design, as well as successful customization for diverse SME-led applications in disaster response, land use and crop mapping, and ecosystem dynamics monitoring. Prithvi-EO-2.0 is available on Hugging Face and IBM terratorch, with additional resources on GitHub. The project exemplifies the Trusted Open Science approach embraced by all involved organizations.
Abstract:This study introduces a novel approach to terrain feature classification by incorporating spatial point pattern statistics into deep learning models. Inspired by the concept of location encoding, which aims to capture location characteristics to enhance GeoAI decision-making capabilities, we improve the GeoAI model by a knowledge driven approach to integrate both first-order and second-order effects of point patterns. This paper investigates how these spatial contexts impact the accuracy of terrain feature predictions. The results show that incorporating spatial point pattern statistics notably enhances model performance by leveraging different representations of spatial relationships.
Abstract:Retrieval and recommendation are two essential tasks in modern search tools. This paper introduces a novel retrieval-reranking framework leveraging Large Language Models (LLMs) to enhance the spatiotemporal and semantic associated mining and recommendation of relevant unusual climate and environmental events described in news articles and web posts. This framework uses advanced natural language processing techniques to address the limitations of traditional manual curation methods in terms of high labor cost and lack of scalability. Specifically, we explore an optimized solution to employ cutting-edge embedding models for semantically analyzing spatiotemporal events (news) and propose a Geo-Time Re-ranking (GT-R) strategy that integrates multi-faceted criteria including spatial proximity, temporal association, semantic similarity, and category-instructed similarity to rank and identify similar spatiotemporal events. We apply the proposed framework to a dataset of four thousand Local Environmental Observer (LEO) Network events, achieving top performance in recommending similar events among multiple cutting-edge dense retrieval models. The search and recommendation pipeline can be applied to a wide range of similar data search tasks dealing with geospatial and temporal data. We hope that by linking relevant events, we can better aid the general public to gain an enhanced understanding of climate change and its impact on different communities.
Abstract:In the past few years, Artificial Intelligence (AI)-based weather forecasting methods have widely demonstrated strong competitiveness among the weather forecasting systems. However, these methods are insufficient for high-spatial-resolution short-term nowcasting within 6 hours, which is crucial for warning short-duration, mesoscale and small-scale weather events. Geostationary satellite remote sensing provides detailed, high spatio-temporal and all-day observations, which can address the above limitations of existing methods. Therefore, this paper proposed an advanced data-driven thermal infrared cloud images forecasting model, "DaYu." Unlike existing data-driven weather forecasting models, DaYu is specifically designed for geostationary satellite observations, with a temporal resolution of 0.5 hours and a spatial resolution of ${0.05}^\circ$ $\times$ ${0.05}^\circ$. DaYu is based on a large-scale transformer architecture, which enables it to capture fine-grained cloud structures and learn fast-changing spatio-temporal evolution features effectively. Moreover, its attention mechanism design achieves a balance in computational complexity, making it practical for applications. DaYu not only achieves accurate forecasts up to 3 hours with a correlation coefficient higher than 0.9, 6 hours higher than 0.8, and 12 hours higher than 0.7, but also detects short-duration, mesoscale, and small-scale weather events with enhanced detail, effectively addressing the shortcomings of existing methods in providing detailed short-term nowcasting within 6 hours. Furthermore, DaYu has significant potential in short-term climate disaster prevention and mitigation.
Abstract:Geospatial Knowledge Graphs (GeoKGs) model geoentities (e.g., places and natural features) and spatial relationships in an interconnected manner, providing strong knowledge support for geographic applications, including data retrieval, question-answering, and spatial reasoning. However, existing methods for mining and reasoning from GeoKGs, such as popular knowledge graph embedding (KGE) techniques, lack geographic awareness. This study aims to enhance general-purpose KGE by developing new strategies and integrating geometric features of spatial relations, including topology, direction, and distance, to infuse the embedding process with geographic intuition. The new model is tested on downstream link prediction tasks, and the results show that the inclusion of geometric features, particularly topology and direction, improves prediction accuracy for both geoentities and spatial relations. Our research offers new perspectives for integrating spatial concepts and principles into the GeoKG mining process, providing customized GeoAI solutions for geospatial challenges.
Abstract:KnowWhereGraph is one of the largest fully publicly available geospatial knowledge graphs. It includes data from 30 layers on natural hazards (e.g., hurricanes, wildfires), climate variables (e.g., air temperature, precipitation), soil properties, crop and land-cover types, demographics, and human health, various place and region identifiers, among other themes. These have been leveraged through the graph by a variety of applications to address challenges in food security and agricultural supply chains; sustainability related to soil conservation practices and farm labor; and delivery of emergency humanitarian aid following a disaster. In this paper, we introduce the ontology that acts as the schema for KnowWhereGraph. This broad overview provides insight into the requirements and design specifications for the graph and its schema, including the development methodology (modular ontology modeling) and the resources utilized to implement, materialize, and deploy KnowWhereGraph with its end-user interfaces and public query SPARQL endpoint.
Abstract:Research on geospatial foundation models (GFMs) has become a trending topic in geospatial artificial intelligence (AI) research due to their potential for achieving high generalizability and domain adaptability, reducing model training costs for individual researchers. Unlike large language models, such as ChatGPT, constructing visual foundation models for image analysis, particularly in remote sensing, encountered significant challenges such as formulating diverse vision tasks into a general problem framework. This paper evaluates the recently released NASA-IBM GFM Prithvi for its predictive performance on high-level image analysis tasks across multiple benchmark datasets. Prithvi was selected because it is one of the first open-source GFMs trained on time-series of high-resolution remote sensing imagery. A series of experiments were designed to assess Prithvi's performance as compared to other pre-trained task-specific AI models in geospatial image analysis. New strategies, including band adaptation, multi-scale feature generation, and fine-tuning techniques, are introduced and integrated into an image analysis pipeline to enhance Prithvi's domain adaptation capability and improve model performance. In-depth analyses reveal Prithvi's strengths and weaknesses, offering insights for both improving Prithvi and developing future visual foundation models for geospatial tasks.
Abstract:As a potential non-invasive biomarker for ischaemic stroke, intracranial arterial calcification (IAC) could be used for stroke risk assessment on CT head scans routinely acquired for other reasons (e.g. trauma, confusion). Artificial intelligence methods can support IAC scoring, but they have not yet been developed for clinical imaging. Large heterogeneous clinical CT datasets are necessary for the training of such methods, but they exhibit expected and unexpected data anomalies. Using CTs from a large clinical trial, the third International Stroke Trial (IST-3), we propose a pipeline that uses as input non-enhanced CT scans to output regions of interest capturing selected large intracranial arteries for IAC scoring. Our method uses co-registration with templates. We focus on quality control, using information presence along the z-axis of the imaging to group and apply similarity measures (structural similarity index measure) to triage assessment of individual image series. Additionally, we propose superimposing thresholded binary masks of the series to inspect large quantities of data in parallel. We identify and exclude unrecoverable samples and registration failures. In total, our pipeline processes 10,659 CT series, rejecting 4,322 (41%) in the entire process, 1,450 (14% of the total) during quality control, and outputting 6,337 series. Our pipeline enables effective and efficient region of interest localisation for targeted IAC segmentation.
Abstract:Dielectrics are materials with widespread applications in flash memory, central processing units, photovoltaics, capacitors, etc. However, the availability of public dielectric data remains limited, hindering research and development efforts. Previously, machine learning models focused on predicting dielectric constants as scalars, overlooking the importance of dielectric tensors in understanding material properties under directional electric fields for material design and simulation. This study demonstrates the value of common equivariant structural embedding features derived from a universal neural network potential in enhancing the prediction of dielectric properties. To integrate channel information from various-rank latent features while preserving the desired SE(3) equivariance to the second-rank dielectric tensors, we design an equivariant readout decoder to predict the total, electronic, and ionic dielectric tensors individually, and compare our model with the state-of-the-art models. Finally, we evaluate our model by conducting virtual screening on thermodynamical stable structure candidates in Materials Project. The material Ba\textsubscript{2}SmTaO\textsubscript{6} with large band gaps ($E_g=3.36 \mathrm{eV}$) and dielectric constants ($\epsilon=93.81$) is successfully identified out of the 14k candidate set. The results show that our methods give good accuracy on predicting dielectric tensors of inorganic materials, emphasizing their potential in contributing to the discovery of novel dielectrics.
Abstract:Flood inundation mapping is a critical task for responding to the increasing risk of flooding linked to global warming. Significant advancements of deep learning in recent years have triggered its extensive applications, including flood inundation mapping. To cope with the time-consuming and labor-intensive data labeling process in supervised learning, deep active learning strategies are one of the feasible approaches. However, there remains limited exploration into the interpretability of how deep active learning strategies operate, with a specific focus on flood inundation mapping in the field of remote sensing. In this study, we introduce a novel framework of Interpretable Deep Active Learning for Flood inundation Mapping (IDAL-FIM), specifically in terms of class ambiguity of multi-spectral satellite images. In the experiments, we utilize Sen1Floods11 dataset, and adopt U-Net with MC-dropout. In addition, we employ five acquisition functions, which are the random, K-means, BALD, entropy, and margin acquisition functions. Based on the experimental results, we demonstrate that two proposed class ambiguity indices are effective variables to interpret the deep active learning by establishing statistically significant correlation with the predictive uncertainty of the deep learning model at the tile level. Then, we illustrate the behaviors of deep active learning through visualizing two-dimensional density plots and providing interpretations regarding the operation of deep active learning, in flood inundation mapping.