Fabian Gieseke

University of Copenhagen, Denmark

DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications

Feb 24, 2025

Ibrahim Fayad, Max Zimmer, Martin Schwartz, Philippe Ciais, Fabian Gieseke, Gabriel Belouze, Sarah Brood, Aurelien De Truchis, Alexandre d'Aspremont

Abstract:Significant efforts have been directed towards adapting self-supervised multimodal learning for Earth observation applications. However, existing methods produce coarse patch-sized embeddings, limiting their effectiveness and integration with other modalities like LiDAR. To close this gap, we present DUNIA, an approach to learn pixel-sized embeddings through cross-modal alignment between images and full-waveform LiDAR data. As the model is trained in a contrastive manner, the embeddings can be directly leveraged in the context of a variety of environmental monitoring tasks in a zero-shot setting. In our experiments, we demonstrate the effectiveness of the embeddings for seven such tasks (canopy height mapping, fractional canopy cover, land cover mapping, tree species identification, plant area index, crop type classification, and per-pixel waveform-based vertical structure mapping). The results show that the embeddings, along with zero-shot classifiers, often outperform specialized supervised models, even in low data regimes. In the fine-tuning setting, we show strong low-shot capabilities with performances near or better than state-of-the-art on five out of six tasks.

* 26 pages, 8 figures

Capturing Temporal Dynamics in Large-Scale Canopy Tree Height Estimation

Jan 31, 2025

Jan Pauls, Max Zimmer, Berkant Turan, Sassan Saatchi, Philippe Ciais, Sebastian Pokutta, Fabian Gieseke

Abstract:With the rise in global greenhouse gas emissions, accurate large-scale tree canopy height maps are essential for understanding forest structure, estimating above-ground biomass, and monitoring ecological disruptions. To this end, we present a novel approach to generate large-scale, high-resolution canopy height maps over time. Our model accurately predicts canopy height over multiple years given Sentinel-2 time series satellite data. Using GEDI LiDAR data as the ground truth for training the model, we present the first 10m resolution temporal canopy height map of the European continent for the period 2019-2022. As part of this product, we also offer a detailed canopy height map for 2020, providing more precise estimates than previous studies. Our pipeline and the resulting temporal height map are publicly available, enabling comprehensive large-scale monitoring of forests and, hence, facilitating future research and ecological analyses. For an interactive viewer, see https://europetreemap.projects.earthengine.app/view/temporalcanopyheight.

* 9 pages main paper, 5 pages references and appendix, 8 figures, 5 tables

CLIP-Branches: Interactive Fine-Tuning for Text-Image Retrieval

Jun 19, 2024

Christian Lülf, Denis Mayr Lima Martins, Marcos Antonio Vaz Salles, Yongluan Zhou, Fabian Gieseke

Abstract:The advent of text-image models, most notably CLIP, has significantly transformed the landscape of information retrieval. These models enable the fusion of various modalities, such as text and images. One significant outcome of CLIP is its capability to allow users to search for images using text as a query, as well as vice versa. This is achieved via a joint embedding of images and text data that can, for instance, be used to search for similar items. Despite efficient query processing techniques such as approximate nearest neighbor search, the results may lack precision and completeness. We introduce CLIP-Branches, a novel text-image search engine built upon the CLIP architecture. Our approach enhances traditional text-image search engines by incorporating an interactive fine-tuning phase, which allows the user to further concretize the search query by iteratively defining positive and negative examples. Our framework involves training a classification model given the additional user feedback and essentially outputs all positively classified instances of the entire data catalog. By building upon recent techniques, this inference phase, however, is not implemented by scanning the entire data catalog, but by employing efficient index structures pre-built for the data. Our results show that the fine-tuned results can improve the initial search outputs in terms of relevance and accuracy while maintaining swift response times.
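
The joint-embedding search mentioned in this abstract can be illustrated with a minimal sketch. It assumes that query and catalog items are already embedded as vectors in one shared space and ranks items by cosine similarity with an exhaustive scan. The toy vectors, the item names, and the helper `rank_by_cosine` are hypothetical and are not taken from CLIP-Branches or from any CLIP API; the index structures and the interactive fine-tuning phase described in the abstract are not modelled here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_cosine(query, catalog):
    """Return (name, score) pairs sorted from most to least similar.

    This is a brute-force scan over the whole catalog, used only for
    illustration.
    """
    scored = [(name, cosine_similarity(query, vec)) for name, vec in catalog.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical 3-dimensional embeddings standing in for image embeddings.
toy_catalog = {
    "img_forest": [0.9, 0.1, 0.0],
    "img_city": [0.1, 0.9, 0.2],
    "img_lake": [0.5, 0.0, 0.8],
}
# Hypothetical embedding standing in for a text query.
toy_query = [1.0, 0.0, 0.1]

ranking = rank_by_cosine(toy_query, toy_catalog)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy vectors the forest image ranks first, followed by the lake and then the city image.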

Estimating Canopy Height at Scale

Jun 03, 2024

Jan Pauls, Max Zimmer, Una M. Kelly, Martin Schwartz, Sassan Saatchi, Philippe Ciais, Sebastian Pokutta, Martin Brandt, Fabian Gieseke

Abstract:We propose a framework for global-scale canopy height estimation based on satellite data. Our model leverages advanced data preprocessing techniques, resorts to a novel loss function designed to counter geolocation inaccuracies inherent in the ground-truth height measurements, and employs data from the Shuttle Radar Topography Mission to effectively filter out erroneous labels in mountainous regions, enhancing the reliability of our predictions in those areas. A comparison between predictions and ground-truth labels yields an MAE / RMSE of 2.43 / 4.73 (meters) overall and 4.45 / 6.72 (meters) for trees taller than five meters, which represents a substantial improvement compared to existing global-scale maps. The resulting height map as well as the underlying framework will facilitate and enhance ecological analyses at a global scale, including, but not limited to, large-scale forest and biomass monitoring.

* ICML Camera-Ready, 17 pages, 14 figures, 7 tables
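
The two error measures quoted in this abstract, MAE and RMSE, can be sketched as follows. The height values are made up for illustration and are not results from the paper. Selecting the "taller than five meters" subset by thresholding the ground-truth height is an assumption, since the abstract does not say how that subset is defined; the helper `metrics_above` is hypothetical.

```python
import math

def mae(pred, truth):
    """Mean absolute error between predictions and ground truth."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)

def rmse(pred, truth):
    """Root mean squared error between predictions and ground truth."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))

def metrics_above(pred, truth, min_height):
    """MAE and RMSE restricted to samples whose ground truth exceeds min_height.

    Filtering on the ground-truth value is an assumption made for this sketch.
    """
    pairs = [(p, t) for p, t in zip(pred, truth) if t > min_height]
    sub_pred = [p for p, _ in pairs]
    sub_truth = [t for _, t in pairs]
    return mae(sub_pred, sub_truth), rmse(sub_pred, sub_truth)

# Made-up canopy heights in meters.
toy_truth = [2.0, 6.0, 10.0, 20.0]
toy_pred = [3.0, 5.0, 12.0, 16.0]

overall = (mae(toy_pred, toy_truth), rmse(toy_pred, toy_truth))
tall_only = metrics_above(toy_pred, toy_truth, min_height=5.0)
print(f"overall MAE / RMSE: {overall[0]:.2f} / {overall[1]:.2f}")
print(f">5 m    MAE / RMSE: {tall_only[0]:.2f} / {tall_only[1]:.2f}")
```

RMSE squares each error before averaging, so it is never smaller than MAE and grows faster when a few predictions are far off.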

Tree Counting by Bridging 3D Point Clouds with Imagery

Mar 12, 2024

Lei Li, Tianfang Zhang, Zhongyu Jiang, Cheng-Yen Yang, Jenq-Neng Hwang, Stefan Oehmcke, Dimitri Pierre Johannes Gominski, Fabian Gieseke, Christian Igel

Abstract:Accurate and consistent methods for counting trees based on remote sensing data are needed to support sustainable forest management, assess climate change mitigation strategies, and build trust in tree carbon credits. Two-dimensional remote sensing imagery primarily shows the overstory canopy and does not allow for easy separation of individual trees in areas where the canopy is dense. We leverage the fusion of three-dimensional LiDAR measurements and 2D imagery to facilitate the accurate counting of trees. We evaluate a deep learning approach to counting trees in forests using 3D airborne LiDAR data and 2D imagery. The approach is compared with state-of-the-art algorithms operating on 3D point clouds and 2D imagery. We empirically evaluate the different methods on the NeonTreeCount data set, which we use to define a tree-counting benchmark. The experiments show that FuseCountNet yields more accurate tree counts.

* need more experiments

End-to-End Neural Network Training for Hyperbox-Based Classification

Aug 01, 2023

Denis Mayr Lima Martins, Christian Lülf, Fabian Gieseke

Abstract:Hyperbox-based classification has been seen as a promising technique in which decisions on the data are represented as a series of orthogonal, multidimensional boxes (i.e., hyperboxes) that are often interpretable and human-readable. However, existing methods are no longer capable of efficiently handling the increasing volume of data many application domains face nowadays. We address this gap by proposing a novel, fully differentiable framework for hyperbox-based classification via neural networks. In contrast to previous work, our hyperbox models can be efficiently trained in an end-to-end fashion, which leads to significantly reduced training times and superior classification results.

* 6 pages, accepted for poster presentation at ESANN 2023

BuildSeg: A General Framework for the Segmentation of Buildings

Jan 15, 2023

Lei Li, Tianfang Zhang, Stefan Oehmcke, Fabian Gieseke, Christian Igel

Abstract:Building segmentation from aerial images and 3D laser scanning (LiDAR) is a challenging task due to the diversity of backgrounds, building textures, and image quality. While current research using different types of convolutional and transformer networks has considerably improved the performance on this task, even more accurate segmentation methods for buildings are desirable for applications such as automatic mapping. In this study, we propose a general framework termed BuildSeg employing a generic approach that can be quickly applied to segment buildings. Different data sources were combined to increase generalization performance. The approach yields good results for different data sources as shown by experiments on high-resolution multi-spectral and LiDAR imagery of cities in Norway, Denmark and France. We applied ConvNeXt and SegFormer based models on the high resolution aerial image dataset from the MapAI-competition. The methods achieved an IOU of 0.7902 and a boundary IOU of 0.6185. We used post-processing to account for the rectangular shape of the objects. This increased the boundary IOU from 0.6185 to 0.6189.
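
The IOU score reported in this abstract is the intersection over union of a predicted and a reference mask. A minimal sketch on binary masks stored as nested lists follows. The toy masks are made up, and returning 1.0 when both masks are empty is a convention chosen for this sketch. The boundary IOU, which restricts the comparison to pixels near the mask contours, is not covered here.

```python
def iou(pred_mask, true_mask):
    """Intersection over union of two binary masks of equal shape.

    Masks are nested lists of 0/1 values. If both masks are empty the
    union is zero; this sketch returns 1.0 in that case by convention.
    """
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    return intersection / union if union else 1.0

# Made-up 2x3 masks: 1 marks a building pixel, 0 marks background.
toy_pred = [
    [1, 1, 0],
    [0, 1, 0],
]
toy_true = [
    [1, 0, 0],
    [0, 1, 1],
]

score = iou(toy_pred, toy_true)
print(f"IoU: {score:.2f}")
```

Here two pixels are marked in both masks and four pixels are marked in at least one, giving an IoU of 0.5.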

Mask-FPAN: Semi-Supervised Face Parsing in the Wild With De-Occlusion and UV GAN

Dec 18, 2022

Lei Li, Tianfang Zhang, Stefan Oehmcke, Fabian Gieseke, Christian Igel

Abstract:Fine-grained semantic segmentation of a person's face and head, including facial parts and head components, has progressed a great deal in recent years. However, it remains a challenging task, in which ambiguous occlusions and large pose variations are particularly difficult to handle. To overcome these difficulties, we propose a novel framework termed Mask-FPAN. It uses a de-occlusion module that learns to parse occluded faces in a semi-supervised way. In particular, face landmark localization, face occlusion estimations, and detected head poses are taken into account. A 3D morphable face model combined with the UV GAN improves the robustness of 2D face parsing. In addition, we introduce two new datasets named FaceOccMask-HQ and CelebAMaskOcc-HQ for face parsing work. The proposed Mask-FPAN framework addresses the face parsing problem in the wild and shows significant performance improvements with MIOU from 0.7353 to 0.9013 compared to the state-of-the-art on challenging face datasets.

* 9 pages

LR-CSNet: Low-Rank Deep Unfolding Network for Image Compressive Sensing

Dec 18, 2022

Tianfang Zhang, Lei Li, Christian Igel, Stefan Oehmcke, Fabian Gieseke, Zhenming Peng

Abstract:Deep unfolding networks (DUNs) have proven to be a viable approach to compressive sensing (CS). In this work, we propose a DUN called low-rank CS network (LR-CSNet) for natural image CS. Real-world image patches are often well-represented by low-rank approximations. LR-CSNet exploits this property by adding a low-rank prior to the CS optimization task. We derive a corresponding iterative optimization procedure using variable splitting, which is then translated to a new DUN architecture. The architecture uses low-rank generation modules (LRGMs), which learn low-rank matrix factorizations, as well as gradient descent and proximal mappings (GDPMs), which are proposed to extract high-frequency features to refine image details. In addition, the deep features generated at each reconstruction stage in the DUN are transferred between stages to boost the performance. Our extensive experiments on three widely considered datasets demonstrate the promising performance of LR-CSNet compared to state-of-the-art methods in natural image CS.

Deep Learning Based 3D Point Cloud Regression for Estimating Forest Biomass

Dec 22, 2021

Stefan Oehmcke, Lei Li, Jaime Revenga, Thomas Nord-Larsen, Katerina Trepekli, Fabian Gieseke, Christian Igel

Abstract:Knowledge of forest biomass stocks and their development is important for implementing effective climate change mitigation measures. It is needed for studying the processes driving af-, re-, and deforestation and is a prerequisite for carbon-accounting. Remote sensing using airborne LiDAR can be used to measure vegetation biomass at large scale. We present deep learning systems for predicting wood volume, above-ground biomass (AGB), and subsequently carbon directly from 3D LiDAR point cloud data. We devise different neural network architectures for point cloud regression and evaluate them on remote sensing data of areas for which AGB estimates have been obtained from field measurements in a national forest inventory. Our adaptation of Minkowski convolutional neural networks for regression gave the best results. The deep neural networks produced significantly more accurate wood volume, AGB, and carbon estimates compared to state-of-the-art approaches operating on basic statistics of the point clouds, and we expect this finding to have a strong impact on LiDAR-based analyses of terrestrial ecosystem dynamics.
