Abstract: Desert locust swarms present a major threat to agriculture and food security. To address this challenge, our study develops an operationally ready model for predicting locust breeding grounds, which could enhance early warning systems and targeted control measures. We curated a dataset from the United Nations Food and Agriculture Organization's (UN-FAO) locust observation records and analyzed it using two types of spatio-temporal input features: remotely sensed environmental and climate data, and multi-spectral earth observation images. Our approach employed custom deep learning models (three-dimensional convolutional networks and LSTM-based recurrent convolutional networks), along with the geospatial foundation model Prithvi, recently released by Jakubik et al. (2023). These models notably outperformed existing baselines. The Prithvi-based model, fine-tuned on multi-spectral images from NASA's Harmonized Landsat and Sentinel-2 (HLS) dataset, achieved the highest accuracy, F1 and ROC-AUC scores (83.03%, 81.53% and 87.69%, respectively). A significant finding of our research is that multi-spectral earth observation images alone are sufficient for effective locust breeding ground prediction, without the need to explicitly incorporate climatic or environmental features.
Abstract: Desert locust outbreaks threaten the food security of a large part of Africa and have affected the livelihoods of millions of people over the years. Machine learning (ML) has been shown to be an effective approach to locust distribution modelling that could assist in early warning. However, training ML models requires a significant amount of labelled data. Most publicly available labelled data on locusts are presence-only: they record only the locations where locusts were sighted. Prior work using ML has therefore resorted to pseudo-absence generation methods to circumvent this issue. The most commonly used approach is to randomly sample points in a region of interest while ensuring that each sampled pseudo-absence point lies at least a specified distance from every true presence point. In this paper, we compare this random sampling approach with more advanced pseudo-absence generation methods, such as environmental profiling and optimal background extent limitation, specifically for predicting desert locust breeding grounds in Africa. Interestingly, we find that among the algorithms we tested, namely logistic regression (LR), gradient boosting, random forests and maximum entropy, all popular in prior work, LR performed significantly better than the more sophisticated ensemble methods in terms of both prediction accuracy and F1 score. Although background extent limitation combined with random sampling boosted performance for the ensemble methods, it did not do so for LR; instead, LR improved significantly when environmental profiling was used. In light of this, we conclude that a simpler ML approach such as LR, combined with a more advanced pseudo-absence generation method, specifically environmental profiling, can be a sensible and effective approach to predicting locust breeding grounds across Africa.
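The random sampling baseline for pseudo-absence generation described above can be sketched as rejection sampling. This is a minimal illustration, not the authors' implementation: it assumes presence records given as (lat, lon) pairs in degrees, a rectangular bounding box standing in for the region of interest, great-circle (haversine) distance in kilometres, and hypothetical names such as `sample_pseudo_absences` and `min_dist_km`. The distance threshold and sample counts in the usage line are illustrative, not values from the study.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed spherical Earth)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sample_pseudo_absences(presences, bbox, n, min_dist_km, seed=0, max_tries=100_000):
    """Hypothetical helper: draw n pseudo-absence points inside bbox, each at
    least min_dist_km away from every presence point.

    presences: list of (lat, lon) tuples in degrees
    bbox: (lat_min, lat_max, lon_min, lon_max) in degrees
    Candidates are drawn uniformly in lat/lon (a simplification: this is not
    area-uniform on the sphere) and rejected if too close to any presence.
    """
    rng = random.Random(seed)
    lat_min, lat_max, lon_min, lon_max = bbox
    absences = []
    tries = 0
    while len(absences) < n:
        if tries >= max_tries:
            raise RuntimeError("could not place enough pseudo-absences; "
                               "relax min_dist_km or enlarge bbox")
        tries += 1
        lat = rng.uniform(lat_min, lat_max)
        lon = rng.uniform(lon_min, lon_max)
        # Keep the candidate only if it respects the buffer around all presences.
        if all(haversine_km(lat, lon, plat, plon) >= min_dist_km
               for plat, plon in presences):
            absences.append((lat, lon))
    return absences
```

For example, `sample_pseudo_absences([(15.0, 30.0), (18.5, 35.2)], (10.0, 25.0, 20.0, 40.0), n=50, min_dist_km=100.0)` returns 50 points in the box, none within 100 km of either presence. The more advanced methods compared in the paper would change where candidates may be drawn from (for example, environmental profiling restricts sampling to environmentally unsuitable areas) rather than this buffer check itself.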