Abstract:Accurate yield forecasting is essential for making informed policies and long-term decisions for food security. Earth Observation (EO) data and machine learning algorithms play a key role in providing a comprehensive and timely view of crop conditions from field to national scales. However, machine learning algorithms' prediction accuracy is often harmed by spatial heterogeneity caused by exogenous factors not reflected in remote sensing data, such as differences in crop management strategies. In this paper, we propose and investigate a simple technique called state-wise additive bias to explicitly address the cross-region yield heterogeneity in Kazakhstan. Compared to baseline machine learning models (Random Forest, CatBoost, XGBoost), our method reduces the overall RMSE by 8.9\% and the highest state-wise RMSE by 28.37\%. The effectiveness of state-wise additive bias indicates machine learning's performance can be significantly improved by explicitly addressing the spatial heterogeneity, motivating future work on spatial-aware machine learning algorithms for yield forecasts as well as for general geospatial forecasting problems.
Abstract:Crop type classification using satellite observations is an important tool for providing insights about planted area and enabling estimates of crop condition and yield, especially within the growing season when uncertainties around these quantities are highest. As the climate changes and extreme weather events become more frequent, these methods must be resilient to changes in domain shifts that may occur, for example, due to shifts in planting timelines. In this work, we present an approach for within-season crop type classification using moderate spatial resolution (30 m) satellite data that addresses domain shift related to planting timelines by normalizing inputs by crop growth stage. We use a neural network leveraging both convolutional and recurrent layers to predict if a pixel contains corn, soybeans, or another crop or land cover type. We evaluated this method for the 2019 growing season in the midwestern US, during which planting was delayed by as much as 1-2 months due to extreme weather that caused record flooding. We show that our approach using growth stage-normalized time series outperforms fixed-date time series, and achieves overall classification accuracy of 85.4% prior to harvest (September-November) and 82.8% by mid-season (July-September).