Marc Rußwurm

AirCast: Improving Air Pollution Forecasting Through Multi-Variable Data Alignment

Feb 25, 2025

Vishal Nedungadi, Muhammad Akhtar Munir, Marc Rußwurm, Ron Sarafian, Ioannis N. Athanasiadis, Yinon Rudich, Fahad Shahbaz Khan, Salman Khan

Abstract: Air pollution remains a leading global health risk, exacerbated by rapid industrialization and urbanization, contributing significantly to morbidity and mortality rates. In this paper, we introduce AirCast, a novel multi-variable air pollution forecasting model that combines weather and air quality variables. AirCast employs a multi-task head architecture that simultaneously forecasts atmospheric conditions and pollutant concentrations, improving its understanding of how weather patterns affect air quality. Predicting extreme pollution events is challenging due to their rare occurrence in historic data, resulting in a heavy-tailed distribution of pollution levels. To address this, we propose a novel Frequency-weighted Mean Absolute Error (fMAE) loss, adapted from the class-balanced loss for regression tasks. Informed by domain knowledge, we investigate the selection of key variables known to influence pollution levels. Additionally, we align existing weather and chemical datasets across spatial and temporal dimensions. AirCast's integrated approach, combining multi-task learning, a frequency-weighted loss, and domain-informed variable selection, enables more accurate pollution forecasts. Our source code and models are made public here (https://github.com/vishalned/AirCast.git).
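
The idea of a frequency-weighted MAE can be pictured with a minimal sketch. This is an illustrative assumption, not the AirCast implementation: targets are bucketed by hypothetical `bin_edges`, and each sample is weighted by the inverse "effective number" of samples in its bin, (1 - beta) / (1 - beta^n), which is the weighting used by class-balanced losses. The function name, the binning scheme, and the `beta` value are all hypothetical.

```python
from collections import Counter


def frequency_weighted_mae(y_true, y_pred, bin_edges, beta=0.999):
    """MAE in which rare target ranges receive larger weights.

    Hypothetical sketch: weights follow the class-balanced formula
    (1 - beta) / (1 - beta ** n), with n the number of samples in a bin.
    """
    def bin_index(value):
        # Index of the first edge above the value; larger values share the last bin.
        for i, edge in enumerate(bin_edges):
            if value < edge:
                return i
        return len(bin_edges)

    bins = [bin_index(v) for v in y_true]
    counts = Counter(bins)
    weights = [(1.0 - beta) / (1.0 - beta ** counts[b]) for b in bins]
    weighted_errors = sum(w * abs(t - p) for w, t, p in zip(weights, y_true, y_pred))
    # Normalise by the total weight so the result stays on the scale of the targets.
    return weighted_errors / sum(weights)


# Four common low values and one rare extreme value that is badly predicted.
y_true = [1.0, 1.0, 1.0, 1.0, 100.0]
y_pred = [1.0, 1.0, 1.0, 1.0, 50.0]
plain_mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
weighted_mae = frequency_weighted_mae(y_true, y_pred, bin_edges=[10.0], beta=0.9)
```

In this toy case the error on the single extreme sample counts for more under the weighted loss than under the plain MAE, which is the behaviour the abstract motivates for heavy-tailed pollution data.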

Imbalance-aware Presence-only Loss Function for Species Distribution Modeling

Mar 12, 2024

Robin Zbinden, Nina van Tiel, Marc Rußwurm, Devis Tuia

Abstract: In the face of significant biodiversity decline, species distribution models (SDMs) are essential for understanding the impact of climate change on species habitats by connecting environmental conditions to species occurrences. Traditionally limited by a scarcity of species observations, these models have significantly improved in performance through the integration of larger datasets provided by citizen science initiatives. However, they still suffer from the strong class imbalance between species within these datasets, often resulting in the penalization of rare species, which are those most critical for conservation efforts. To tackle this issue, this study assesses the effectiveness of training deep learning models using a balanced presence-only loss function on large citizen science-based datasets. We demonstrate that this imbalance-aware loss function outperforms traditional loss functions across various datasets and tasks, particularly in accurately modeling rare species with limited observations.

* Tackling Climate Change with Machine Learning at ICLR 2024

Data-Centric Machine Learning for Geospatial Remote Sensing Data

Dec 08, 2023

Ribana Roscher, Marc Rußwurm, Caroline Gevaert, Michael Kampffmeyer, Jefersson A. dos Santos, Maria Vakalopoulou, Ronny Hänsch, Stine Hansen, Keiller Nogueira, Jonathan Prexl (+1 more)

Abstract: Recent developments and research in modern machine learning have led to substantial improvements in the geospatial field. Although numerous deep learning models have been proposed, the majority of them have been developed on benchmark datasets that lack strong real-world relevance. Furthermore, the performance of many methods has already saturated on these datasets. We argue that shifting the focus towards a complementary data-centric perspective is necessary to achieve further improvements in accuracy, generalization ability, and real impact in end-user applications. This work presents a definition and precise categorization of automated data-centric learning approaches for geospatial data. It highlights the complementary role of data-centric learning with respect to model-centric learning in the larger machine learning deployment cycle. We review papers across the entire geospatial field and categorize them into different groups. A set of representative experiments shows concrete implementation examples. These examples provide concrete steps to act on geospatial data with data-centric machine learning approaches.

SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery

Nov 30, 2023

Konstantin Klemmer, Esther Rolf, Caleb Robinson, Lester Mackey, Marc Rußwurm

Abstract: Geographic location is essential for modeling tasks in fields ranging from ecology to epidemiology to the Earth system sciences. However, extracting relevant and meaningful characteristics of a location can be challenging, often entailing expensive data fusion or data distillation from global imagery datasets. To address this challenge, we introduce Satellite Contrastive Location-Image Pretraining (SatCLIP), a global, general-purpose geographic location encoder that learns an implicit representation of locations from openly available satellite imagery. Trained location encoders provide vector embeddings summarizing the characteristics of any given location for convenient usage in diverse downstream tasks. We show that SatCLIP embeddings, pretrained on globally sampled multi-spectral Sentinel-2 satellite data, can be used in various predictive tasks that depend on location information but not necessarily satellite imagery, including temperature prediction, animal recognition in imagery, and population density estimation. Across tasks, SatCLIP embeddings consistently outperform embeddings from existing pretrained location encoders, ranging from models trained on natural images to models trained on semantic context. SatCLIP embeddings also help to improve geographic generalization. This demonstrates the potential of general-purpose location encoders and opens the door to learning meaningful representations of our planet from the vast, varied, and largely untapped modalities of geospatial data.
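
Contrastive location-image pretraining can be pictured with a CLIP-style symmetric loss, sketched below. The abstract does not spell out the objective, so the loss form, the `temperature` value, and all function names here are assumptions for illustration, not the SatCLIP code. The sketch pairs the i-th location embedding with the i-th image embedding and penalises mismatched pairs.

```python
import math


def _normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class of row i is column i."""
    loss = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)
        log_sum_exp = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        loss += log_sum_exp - row[i]
    return loss / len(logits)


def clip_style_loss(location_embeddings, image_embeddings, temperature=0.07):
    """Symmetric contrastive loss over matched (location, image) pairs."""
    locations = [_normalize(v) for v in location_embeddings]
    images = [_normalize(v) for v in image_embeddings]
    logits = [
        [sum(a * b for a, b in zip(loc, img)) / temperature for img in images]
        for loc in locations
    ]
    transposed = [list(column) for column in zip(*logits)]
    # Average the location-to-image and image-to-location directions.
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(transposed))


# Toy embeddings: aligned pairs give a low loss, swapped pairs a high one.
aligned_loss = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```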

Geographic Location Encoding with Spherical Harmonics and Sinusoidal Representation Networks

Oct 10, 2023

Marc Rußwurm, Konstantin Klemmer, Esther Rolf, Robin Zbinden, Devis Tuia

Abstract: Learning feature representations of geographical space is vital for any machine learning model that integrates geolocated data, spanning application domains such as remote sensing, ecology, or epidemiology. Recent work mostly embeds coordinates using sine and cosine projections based on Double Fourier Sphere (DFS) features -- these embeddings assume a rectangular data domain even on global data, which can lead to artifacts, especially at the poles. At the same time, relatively little attention has been paid to the exact design of the neural network architectures these functional embeddings are combined with. This work proposes a novel location encoder for globally distributed geographic data that combines spherical harmonic basis functions, natively defined on spherical surfaces, with sinusoidal representation networks (SirenNets) that can be interpreted as learned Double Fourier Sphere embeddings. We systematically evaluate the cross-product of positional embeddings and neural network architectures across various classification and regression benchmarks and synthetic evaluation datasets. In contrast to previous approaches that require the combination of both positional encoding and neural networks to learn meaningful representations, we show that both spherical harmonics and sinusoidal representation networks are competitive on their own but set state-of-the-art performances across tasks when combined. We provide source code at www.github.com/marccoru/locationencoder
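
The pole artifact mentioned in the abstract can be illustrated with a small sketch. It is not taken from the paper's code base: `grid_sincos_features` is a plain sine/cosine encoding of longitude and latitude that stands in for grid-style embeddings, and `spherical_harmonic_features` evaluates only the real spherical harmonics of degree 0 and 1 (up to sign convention), which are constants times 1, y, z, and x of the unit vector. Both function names are hypothetical.

```python
import math


def grid_sincos_features(lon_deg, lat_deg):
    """Sine/cosine encoding that treats longitude and latitude as a flat grid."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return [math.sin(lon), math.cos(lon), math.sin(lat), math.cos(lat)]


def spherical_harmonic_features(lon_deg, lat_deg):
    """Real spherical harmonics of degree 0 and 1 evaluated on the unit sphere."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    # Cartesian coordinates of the point on the unit sphere.
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    c0 = 0.5 * math.sqrt(1.0 / math.pi)
    c1 = math.sqrt(3.0 / (4.0 * math.pi))
    return [c0, c1 * y, c1 * z, c1 * x]


# The North Pole is a single point, whatever longitude is used to describe it.
pole_grid_a = grid_sincos_features(0.0, 90.0)
pole_grid_b = grid_sincos_features(120.0, 90.0)
pole_sh_a = spherical_harmonic_features(0.0, 90.0)
pole_sh_b = spherical_harmonic_features(120.0, 90.0)
```

The grid-style features differ between the two descriptions of the same pole, while the spherical harmonic features agree up to floating-point error, because they are defined on the sphere itself.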

Large-scale Detection of Marine Debris in Coastal Areas with Sentinel-2

Jul 05, 2023

Marc Rußwurm, Sushen Jilla Venkatesa, Devis Tuia

Abstract: Detecting and quantifying marine pollution and macro-plastics is an increasingly pressing ecological issue that directly impacts ecosystems and human health. Efforts to quantify marine pollution are often conducted with sparse and expensive beach surveys, which are difficult to conduct on a large scale. Here, remote sensing can provide reliable estimates of plastic pollution by regularly monitoring and detecting marine debris in coastal areas. Medium-resolution satellite data of coastal areas is readily available and can be leveraged to detect aggregations of marine debris containing plastic litter. In this work, we present a detector for marine debris built on a deep segmentation model that outputs a probability for marine debris at the pixel level. We train this detector with a combination of annotated datasets of marine debris and evaluate it on specifically selected test sites where it is highly probable that plastic pollution is present in the detected marine debris. We demonstrate quantitatively and qualitatively that a deep learning model trained on this dataset, compiled from multiple sources, outperforms existing detection models trained on previous datasets by a large margin. Our experiments show, consistent with the principles of data-centric AI, that this performance is due to our particular dataset design with extensive sampling of negative examples and label refinements rather than to the particular deep learning model. We hope to accelerate advances in the large-scale automated detection of marine debris, which is a step towards quantifying and monitoring marine litter with remote sensing at global scales, and release the model weights and training source code at https://github.com/marccoru/marinedebrisdetector

* in review

Meta-Learning for Few-Shot Land Cover Classification

Apr 28, 2020

Marc Rußwurm, Sherrie Wang, Marco Körner, David Lobell

Abstract: The representations of the Earth's surface vary from one geographic region to another. For instance, the appearance of urban areas differs between continents, and seasonality influences the appearance of vegetation. Capturing the diversity within a single category, such as urban or vegetation, requires a large model capacity and, consequently, large datasets. In this work, we propose a different perspective and view this diversity as an inductive transfer learning problem where few data samples from one region allow a model to adapt to an unseen region. We evaluate the model-agnostic meta-learning (MAML) algorithm on classification and segmentation tasks using globally and regionally distributed datasets. We find that few-shot model adaptation outperforms pre-training with regular gradient descent and fine-tuning on (1) the Sen12MS dataset and (2) DeepGlobe data when the source domain and target domain differ. This indicates that model optimization with meta-learning may benefit tasks in the Earth sciences whose data show a high degree of diversity from region to region, while traditional gradient-based supervised learning remains suitable in the absence of a feature or label shift.
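
The MAML update can be shown on a deliberately tiny problem. The sketch below is an assumption-laden toy, not the paper's setup: the "model" is a single slope `w` in y = w * x, each hypothetical task stands for a region with its own slope, and the meta-gradient is written out by hand, including the second-order term that comes from differentiating through the inner gradient step.

```python
def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return 2.0 * sum(x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def maml_meta_gradient(w, support, query, inner_lr):
    """Gradient of the post-adaptation query loss with respect to the initial w."""
    xs, ys = support
    xq, yq = query
    adapted = w - inner_lr * mse_grad(w, xs, ys)  # one inner gradient step
    # d(adapted)/dw = 1 - inner_lr * d2L/dw2, and d2L/dw2 = 2 * mean(x ** 2).
    d_adapted_dw = 1.0 - inner_lr * 2.0 * sum(x * x for x in xs) / len(xs)
    return mse_grad(adapted, xq, yq) * d_adapted_dw


def meta_train(tasks, w=0.0, inner_lr=0.1, outer_lr=0.1, steps=50):
    """Plain gradient descent on the average meta-gradient over all tasks."""
    for _ in range(steps):
        grad = sum(maml_meta_gradient(w, s, q, inner_lr) for s, q in tasks) / len(tasks)
        w -= outer_lr * grad
    return w


def make_task(slope):
    """Hypothetical 'region': support and query samples from y = slope * x."""
    support_x, query_x = [1.0, 2.0], [3.0]
    support = (support_x, [slope * x for x in support_x])
    query = (query_x, [slope * x for x in query_x])
    return support, query


meta_w = meta_train([make_task(1.0), make_task(3.0)])
```

With these two tasks the meta-learned initialisation settles midway between the task slopes, the point from which one inner step adapts equally well to either region.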

* accepted to the CVPR 2020 EarthVision Workshop

Self-Attention for Raw Optical Satellite Time Series Classification

Oct 23, 2019

Marc Rußwurm, Marco Körner

Abstract: Deep learning methods have received increasing interest from the remote sensing community for multi-temporal land cover classification in recent years. Convolutional neural networks that elementwise compare a time series with learned kernels, and recurrent neural networks that sequentially process temporal data, have dominated the state-of-the-art in the classification of vegetation from satellite time series. Self-attention allows a neural network to selectively extract features from specific times in the input sequence, thus suppressing information that is not relevant to classification. Today, self-attention based neural networks dominate the state-of-the-art in natural language processing but are hardly explored and tested in the remote sensing context. In this work, we embed self-attention in the canon of deep learning mechanisms for satellite time series classification for vegetation modeling and crop type identification. We compare it quantitatively to convolution and recurrence and test four models that each rely exclusively on one of these mechanisms. The models are trained to identify the type of vegetation on crop parcels using raw and preprocessed Sentinel-2 time series over one entire year. To obtain an objective measure, we find the best possible performance for each of the models by a large-scale hyperparameter search with more than 2400 validation runs. Beyond the quantitative comparison, we qualitatively analyze the models with an easy-to-implement yet effective feature importance analysis based on gradient back-propagation that exploits the differentiable nature of deep learning models. Finally, we look into the self-attention transformer model and visualize attention scores as bipartite graphs in the context of the input time series and a low-dimensional representation of internal hidden states using t-distributed stochastic neighbor embedding (t-SNE).
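
How self-attention selects specific times in a sequence can be seen in a minimal scaled dot-product sketch. It is a simplification made for illustration: queries, keys, and values are the raw time steps themselves (no learned projections, no multiple heads), which is an assumption and not the architecture evaluated in the paper.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    row_max = max(row)
    exps = [math.exp(v - row_max) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence):
    """Scaled dot-product self-attention with identity projections.

    `sequence` is a list of time steps, each a list of feature values.
    Returns the attended outputs and the attention weights per time step.
    """
    dim = len(sequence[0])
    scores = [
        [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in sequence]
        for query in sequence
    ]
    weights = [softmax(row) for row in scores]
    outputs = [
        [sum(w * step[j] for w, step in zip(row, sequence)) for j in range(dim)]
        for row in weights
    ]
    return outputs, weights


# Three time steps; the first and third carry the same signal.
outputs, weights = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
```

Each row of `weights` sums to one, and the first time step attends more strongly to the similar third step than to the dissimilar second one. These per-time-step weights are the kind of attention scores the abstract describes visualizing.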

Early Classification for Agricultural Monitoring from Satellite Time Series

Aug 27, 2019

Marc Rußwurm, Romain Tavenard, Sébastien Lefèvre, Marco Körner

Abstract: In this work, we introduce a recently developed early classification mechanism to satellite-based agricultural monitoring. It augments existing classification models with an additional stopping probability based on the previously seen information. This mechanism is end-to-end trainable and derives its stopping decision solely from the observed satellite data. We show results on field parcels in central Europe where sufficient ground truth data is available for an empirical evaluation of the results with local phenological information obtained from authorities. We observe that the recurrent neural network outfitted with this early classification mechanism was able to distinguish many of the crop types before the end of the vegetative period. Further, we associated these stopping times with evaluated ground truth information and saw that the times of classification were related to characteristic events of the observed plants' phenology.
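
One way to read a per-step stopping probability is as a distribution over stopping times, sketched below. The abstract does not give the formula, so this construction is an assumption: the chance of stopping at step t is the step's stopping probability times the chance of not having stopped earlier, and any remaining mass is assigned to the final step, where a decision is forced. The function name and the example values are hypothetical.

```python
def stopping_time_distribution(stop_probs):
    """Turn per-step stopping probabilities into a distribution over stop times.

    `stop_probs[t]` is the probability of stopping at step t given that the
    model has not stopped before; the last step absorbs all remaining mass.
    """
    distribution = []
    still_running = 1.0  # probability that no stop has happened so far
    for delta in stop_probs[:-1]:
        distribution.append(still_running * delta)
        still_running *= 1.0 - delta
    distribution.append(still_running)  # forced decision at the end of the series
    return distribution


# Hypothetical stopping probabilities for three satellite observations.
stop_distribution = stopping_time_distribution([0.1, 0.5, 0.2])
expected_stop_index = sum(t * p for t, p in enumerate(stop_distribution))
```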

* Appeared at the International Conference on Machine Learning AI for Social Good Workshop, Long Beach, United States, 2019

BreizhCrops: A Satellite Time Series Dataset for Crop Type Identification

May 28, 2019

Marc Rußwurm, Sébastien Lefèvre, Marco Körner

Abstract: This dataset challenges the time series community with the task of satellite-based vegetation identification on a large-scale real-world dataset of satellite data acquired during one entire year. It consists of time series data with associated crop types from 580k field parcels in Brittany, France (Breizh in the local language). Along with this dataset, we provide results and code of a Long Short-Term Memory network and a Transformer network as baselines. We release the dataset, along with preprocessing scripts and baseline models, at https://github.com/TUM-LMF/BreizhCrops and encourage methodical researchers to benchmark and develop novel methods applied to satellite-based crop monitoring.

* Accepted to the Time Series Workshop of the 36th International Conference on Machine Learning (ICML), Long Beach, California
