Abstract:Gaussian Processes (GPs) has experienced tremendous success in geoscience in general and for bio-geophysical parameter retrieval in the last years. GPs constitute a solid Bayesian framework to formulate many function approximation problems consistently. This paper reviews the main theoretical GP developments in the field. We review new algorithms that respect the signal and noise characteristics, that provide feature rankings automatically, and that allow applicability of associated uncertainty intervals to transport GP models in space and time. All these developments are illustrated in the field of geoscience and remote sensing at a local and global scales through a set of illustrative examples.
Abstract:Hyperspectral acquisitions have proven to be the most informative Earth observation data source for the estimation of nitrogen (N) content, which is the main limiting nutrient for plant growth and thus agricultural production. In the past, empirical algorithms have been widely employed to retrieve information on this biochemical plant component from canopy reflectance. However, these approaches do not seek for a cause-effect relationship based on physical laws. Moreover, most studies solely relied on the correlation of chlorophyll content with nitrogen, and thus neglected the fact that most N is bound in proteins. Our study presents a hybrid retrieval method using a physically-based approach combined with machine learning regression to estimate crop N content. Within the workflow, the leaf optical properties model PROSPECT-PRO including the newly calibrated specific absorption coefficients (SAC) of proteins, was coupled with the canopy reflectance model 4SAIL to PROSAIL-PRO. The latter was then employed to generate a training database to be used for advanced probabilistic machine learning methods: a standard homoscedastic Gaussian process (GP) and a heteroscedastic GP regression that accounts for signal-to-noise relations. Both GP models have the property of providing confidence intervals for the estimates, which sets them apart from other machine learners. GP-based band analysis identified optimal spectral settings with ten bands mainly situated in the shortwave infrared (SWIR) spectral region. Use of well-known protein absorption bands from the literature showed comparative results. Finally, the heteroscedastic GP model was successfully applied on airborne hyperspectral data for N mapping. We conclude that GP algorithms, and in particular the heteroscedastic GP, should be implemented for global agricultural monitoring of aboveground N from future imaging spectroscopy data.
Abstract:Computationally expensive Radiative Transfer Models (RTMs) are widely used} to realistically reproduce the light interaction with the Earth surface and atmosphere. Because these models take long processing time, the common practice is to first generate a sparse look-up table (LUT) and then make use of interpolation methods to sample the multi-dimensional LUT input variable space. However, the question arise whether common interpolation methods perform most accurate. As an alternative to interpolation, this work proposes to use emulation, i.e., approximating the RTM output by means of statistical learning. Two experiments were conducted to assess the accuracy in delivering spectral outputs using interpolation and emulation: (1) at canopy level, using PROSAIL; and (2) at top-of-atmosphere level, using MODTRAN. Various interpolation (nearest-neighbour, inverse distance weighting, piece-wice linear) and emulation (Gaussian process regression (GPR), kernel ridge regression, neural networks) methods were evaluated against a dense reference LUT. In all experiments, the emulation methods clearly produced more accurate output spectra than classical interpolation methods. GPR emulation performed up to ten times more accurately than the best performing interpolation method, and this with a speed that is competitive with the faster interpolation methods. It is concluded that emulation can function as a fast and more accurate alternative to commonly used interpolation methods for reconstructing RTM spectral data.
Abstract:With current and upcoming imaging spectrometers, automated band analysis techniques are needed to enable efficient identification of most informative bands to facilitate optimized processing of spectral data into estimates of biophysical variables. This paper introduces an automated spectral band analysis tool (BAT) based on Gaussian processes regression (GPR) for the spectral analysis of vegetation properties. The GPR-BAT procedure sequentially backwards removes the least contributing band in the regression model for a given variable until only one band is kept. GPR-BAT is implemented within the framework of the free ARTMO's MLRA (machine learning regression algorithms) toolbox, which is dedicated to the transforming of optical remote sensing images into biophysical products. GPR-BAT allows (1) to identify the most informative bands in relating spectral data to a biophysical variable, and (2) to find the least number of bands that preserve optimized accurate predictions. This study concludes that a wise band selection of hyperspectral data is strictly required for optimal vegetation properties mapping.
Abstract:The availability of satellite optical information is often hampered by the natural presence of clouds, which can be problematic for many applications. Persistent clouds over agricultural fields can mask key stages of crop growth, leading to unreliable yield predictions. Synthetic Aperture Radar (SAR) provides all-weather imagery which can potentially overcome this limitation, but given its high and distinct sensitivity to different surface properties, the fusion of SAR and optical data still remains an open challenge. In this work, we propose the use of Multi-Output Gaussian Process (MOGP) regression, a machine learning technique that learns automatically the statistical relationships among multisensor time series, to detect vegetated areas over which the synergy between SAR-optical imageries is profitable. For this purpose, we use the Sentinel-1 Radar Vegetation Index (RVI) and Sentinel-2 Leaf Area Index (LAI) time series over a study area in north west of the Iberian peninsula. Through a physical interpretation of MOGP trained models, we show its ability to provide estimations of LAI even over cloudy periods using the information shared with RVI, which guarantees the solution keeps always tied to real measurements. Results demonstrate the advantage of MOGP especially for long data gaps, where optical-based methods notoriously fail. The leave-one-image-out assessment technique applied to the whole vegetation cover shows MOGP predictions improve standard GP estimations over short-time gaps (R$^2$ of 74\% vs 68\%, RMSE of 0.4 vs 0.44 $[m^2m^{-2}]$) and especially over long-time gaps (R$^2$ of 33\% vs 12\%, RMSE of 0.5 vs 1.09 $[m^2m^{-2}]$).