Abstract:Earth's forests play an important role in the fight against climate change, and are in turn negatively affected by it. Effective monitoring of different tree species is essential to understanding and improving the health and biodiversity of forests. In this work, we address the challenge of tree species identification by performing semantic segmentation of trees using an aerial image dataset spanning over a year. We compare models trained on single images versus those trained on time series to assess the impact of tree phenology on segmentation performances. We also introduce a simple convolutional block for extracting spatio-temporal features from image time series, enabling the use of popular pretrained backbones and methods. We leverage the hierarchical structure of tree species taxonomy by incorporating a custom loss function that refines predictions at three levels: species, genus, and higher-level taxa. Our findings demonstrate the superiority of our methodology in exploiting the time series modality and confirm that enriching labels using taxonomic information improves the semantic segmentation performance.
Abstract:Forests are an essential part of Earth's ecosystems and natural systems, as well as providing services on which humanity depends, yet they are rapidly changing as a result of land use decisions and climate change. Understanding and mitigating negative effects requires parsing data on forests at global scale from a broad array of sensory modalities, and recently many such problems have been approached using machine learning algorithms for remote sensing. To date, forest-monitoring problems have largely been approached in isolation. Inspired by the rise of foundation models for computer vision and remote sensing, we here present the first unified Forest Monitoring Benchmark (FoMo-Bench). FoMo-Bench consists of 15 diverse datasets encompassing satellite, aerial, and inventory data, covering a variety of geographical regions, and including multispectral, red-green-blue, synthetic aperture radar (SAR) and LiDAR data with various temporal, spatial and spectral resolutions. FoMo-Bench includes multiple types of forest-monitoring tasks, spanning classification, segmentation, and object detection. To further enhance the diversity of tasks and geographies represented in FoMo-Bench, we introduce a novel global dataset, TalloS, combining satellite imagery with ground-based annotations for tree species classification, spanning 1,000+ hierarchical taxonomic levels (species, genus, family). Finally, we propose FoMo-Net, a foundation model baseline designed for forest monitoring with the flexibility to process any combination of commonly used sensors in remote sensing. This work aims to inspire research collaborations between machine learning and forest biology researchers in exploring scalable multi-modal and multi-task models for forest monitoring. All code and data will be made publicly available.
Abstract:Forests play a crucial role in Earth's system processes and provide a suite of social and economic ecosystem services, but are significantly impacted by human activities, leading to a pronounced disruption of the equilibrium within ecosystems. Advancing forest monitoring worldwide offers advantages in mitigating human impacts and enhancing our comprehension of forest composition, alongside the effects of climate change. While statistical modeling has traditionally found applications in forest biology, recent strides in machine learning and computer vision have reached important milestones using remote sensing data, such as tree species identification, tree crown segmentation and forest biomass assessments. For this, the significance of open access data remains essential in enhancing such data-driven algorithms and methodologies. Here, we provide a comprehensive and extensive overview of 86 open access forest datasets across spatial scales, encompassing inventories, ground-based, aerial-based, satellite-based recordings, and country or world maps. These datasets are grouped in OpenForest, a dynamic catalogue open to contributions that strives to reference all available open access forest datasets. Moreover, in the context of these datasets, we aim to inspire research in machine learning applied to forest biology by establishing connections between contemporary topics, perspectives and challenges inherent in both domains. We hope to encourage collaborations among scientists, fostering the sharing and exploration of diverse datasets through the application of machine learning methods for large-scale forest monitoring. OpenForest is available at https://github.com/RolnickLab/OpenForest .
Abstract:Autonomous driving requires a detailed understanding of complex driving scenes. The redundancy and complementarity of the vehicle's sensors provide an accurate and robust comprehension of the environment, thereby increasing the level of performance and safety. This thesis focuses the on automotive RADAR, which is a low-cost active sensor measuring properties of surrounding objects, including their relative speed, and has the key advantage of not being impacted by adverse weather conditions. With the rapid progress of deep learning and the availability of public driving datasets, the perception ability of vision-based driving systems has considerably improved. The RADAR sensor is seldom used for scene understanding due to its poor angular resolution, the size, noise, and complexity of RADAR raw data as well as the lack of available datasets. This thesis proposes an extensive study of RADAR scene understanding, from the construction of an annotated dataset to the conception of adapted deep learning architectures. First, this thesis details approaches to tackle the current lack of data. A simple simulation as well as generative methods for creating annotated data will be presented. It will also describe the CARRADA dataset, composed of synchronised camera and RADAR data with a semi-automatic annotation method. This thesis then present a proposed set of deep learning architectures with their associated loss functions for RADAR semantic segmentation. It also introduces a method to open up research into the fusion of LiDAR and RADAR sensors for scene understanding. Finally, this thesis exposes a collaborative contribution, the RADIal dataset with synchronised High-Definition (HD) RADAR, LiDAR and camera. A deep learning architecture is also proposed to estimate the RADAR signal processing pipeline while performing multitask learning for object detection and free driving space segmentation.
Abstract:With their robustness to adverse weather conditions and ability to measure speeds, radar sensors have been part of the automotive landscape for more than two decades. Recent progress toward High Definition (HD) Imaging radar has driven the angular resolution below the degree, thus approaching laser scanning performance. However, the amount of data a HD radar delivers and the computational cost to estimate the angular positions remain a challenge. In this paper, we propose a novel HD radar sensing model, FFT-RadNet, that eliminates the overhead of computing the Range-Azimuth-Doppler 3D tensor, learning instead to recover angles from a Range-Doppler spectrum. FFT-RadNet is trained both to detect vehicles and to segment free driving space. On both tasks, it competes with the most recent radar-based models while requiring less compute and memory. Also, we collected and annotated 2-hour worth of raw data from synchronized automotive-grade sensors (camera, laser, HD radar) in various environments (city street, highway, countryside road). This unique dataset, nick-named RADIal for "Radar, Lidar et al.", is available at https://github.com/valeoai/RADIal.
Abstract:Understanding the scene around the ego-vehicle is key to assisted and autonomous driving. Nowadays, this is mostly conducted using cameras and laser scanners, despite their reduced performances in adverse weather conditions. Automotive radars are low-cost active sensors that measure properties of surrounding objects, including their relative speed, and have the key advantage of not being impacted by rain, snow or fog. However, they are seldom used for scene understanding due to the size and complexity of radar raw data and the lack of annotated datasets. Fortunately, recent open-sourced datasets have opened up research on classification, object detection and semantic segmentation with raw radar signals using end-to-end trainable models. In this work, we propose several novel architectures, and their associated losses, which analyse multiple "views" of the range-angle-Doppler radar tensor to segment it semantically. Experiments conducted on the recent CARRADA dataset demonstrate that our best model outperforms alternative models, derived either from the semantic segmentation of natural images or from radar scene understanding, while requiring significantly fewer parameters. Both our code and trained models will be released.