Abstract:In many research fields, the sizes of the existing datasets vary widely. Hence, there is a need for machine learning techniques which are well-suited for these different datasets. One possible technique is the self-organizing map (SOM), a type of artificial neural network which is, so far, weakly represented in the field of machine learning. The SOM's unique characteristic is the neighborhood relationship of the output neurons. This relationship improves the ability of generalization on small datasets. SOMs are mostly applied in unsupervised learning and few studies focus on using SOMs as supervised learning approach. Furthermore, no appropriate SOM package is available with respect to machine learning standards and in the widely used programming language Python. In this paper, we introduce the freely available SUpervised Self-organIzing maps (SUSI) Python package which performs supervised regression and classification. The implementation of SUSI is described with respect to the underlying mathematics. Then, we present first evaluations of the SOM for regression and classification datasets from two different domains of geospatial image analysis. Despite the early stage of its development, the SUSI framework performs well and is characterized by only small performance differences between the training and the test datasets. A comparison of the SUSI framework with existing Python and R packages demonstrates the importance of the SUSI framework. In future work, the SUSI framework will be extended, optimized and upgraded e.g. with tools to better understand and visualize the input data as well as the handling of missing and incomplete data.
Abstract:Soil texture is important for many environmental processes. In this paper, we study the classification of soil texture based on hyperspectral data. We develop and implement three 1-dimensional (1D) convolutional neural networks (CNN): the LucasCNN, the LucasResNet which contains an identity block as residual network, and the LucasCoordConv with an additional coordinates layer. Furthermore, we modify two existing 1D CNN approaches for the presented classification task. The code of all five CNN approaches is available on GitHub (Riese, 2019). We evaluate the performance of the CNN approaches and compare them to a random forest classifier. Thereby, we rely on the freely available LUCAS topsoil dataset. The CNN approach with the least depth turns out to be the best performing classifier. The LucasCoordConv achieves the best performance regarding the average accuracy. In future work, we can further enhance the introduced LucasCNN, LucasResNet and LucasCoordConv and include additional variables of the rich LUCAS dataset.
Abstract:In this contribution, we investigate the potential of hyperspectral data combined with either simulated ground penetrating radar (GPR) or simulated (sensor-like) soil-moisture data to estimate soil moisture. We propose two simulation approaches to extend a given multi-sensor dataset which contains sparse GPR data. In the first approach, simulated GPR data is generated either by an interpolation along the time axis or by a machine learning model. The second approach includes the simulation of soil-moisture along the GPR profile. The soil-moisture estimation is improved significantly by the fusion of hyperspectral and GPR data. In contrast, the combination of simulated, sensor-like soil-moisture values and hyperspectral data achieves the worst regression performance. In conclusion, the estimation of soil moisture with hyperspectral and GPR data engages further investigations.
Abstract:In this paper, we investigate the potential of estimating the soil-moisture content based on VNIR hyperspectral data combined with LWIR data. Measurements from a multi-sensor field campaign represent the benchmark dataset which contains measured hyperspectral, LWIR, and soil-moisture data conducted on grassland site. We introduce a regression framework with three steps consisting of feature selection, preprocessing, and well-chosen regression models. The latter are mainly supervised machine learning models. An exception are the self-organizing maps which combine unsupervised and supervised learning. We analyze the impact of the distinct preprocessing methods on the regression results. Of all regression models, the extremely randomized trees model without preprocessing provides the best estimation performance. Our results reveal the potential of the respective regression framework combined with the VNIR hyperspectral data to estimate soil moisture measured under real-world conditions. In conclusion, the results of this paper provide a basis for further improvements in different research directions.