Abstract: Precise crop yield prediction is essential for improving agricultural practices and ensuring crop resilience under varying climates. Integrating weather data across the growing season, especially for different crop varieties, is crucial for understanding their adaptability in the face of climate change. In the MLCAS2021 Crop Yield Prediction Challenge, we used a dataset of 93,028 training records to forecast yields for 10,337 test records, covering 159 locations across 28 U.S. states and Canadian provinces over 13 years (2003-2015). The dataset included details on 5,838 distinct genotypes and daily weather data for a 214-day growing season, enabling comprehensive analysis. As one of the winning teams, we developed two novel convolutional neural network (CNN) architectures: the CNN-DNN model, which combines a CNN with fully connected networks, and the CNN-LSTM-DNN model, which adds an LSTM layer for the weather variables. Using the Generalized Ensemble Method (GEM), we determined optimal model weights, and the resulting ensemble outperformed the baseline models. On the test data, the GEM model achieved 5.55% to 39.88% lower RMSE, 5.34% to 43.76% lower MAE, and 1.1% to 10.79% higher correlation coefficients than the baselines. We applied the CNN-DNN model to identify top-performing genotypes for various locations and weather conditions, aiding genotype selection based on weather variables. Our data-driven approach is valuable for scenarios with limited testing years. Additionally, a feature importance analysis based on the change in RMSE highlighted the significance of location, MG, year, and genotype, along with the weather variables MDNI and AP.
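The weighting step of a GEM-style ensemble can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the classic closed-form solution in which the weights sum to one and minimize the mean squared error of the combined prediction. The function name `gem_weights` and the synthetic data are hypothetical, and the abstract does not say whether the weights were additionally constrained (for example, to be non-negative).

```python
import numpy as np

def gem_weights(preds, y):
    """Closed-form ensemble weights that sum to one and minimize MSE.

    preds: array of shape (n_models, n_samples) with each model's predictions.
    y:     array of shape (n_samples,) with the observed targets.
    """
    errors = preds - y                        # per-model residuals
    C = errors @ errors.T / errors.shape[1]   # error cross-moment matrix
    w = np.linalg.solve(C, np.ones(C.shape[0]))
    return w / w.sum()                        # normalize so weights sum to one

# Synthetic stand-in for validation data (assumption, not challenge data).
rng = np.random.default_rng(0)
y = rng.normal(size=500)
preds = np.stack([y + rng.normal(scale=s, size=500) for s in (0.5, 1.0, 2.0)])

w = gem_weights(preds, y)
ensemble = w @ preds

def mse(p):
    return float(np.mean((p - y) ** 2))
```

Because each single model corresponds to a feasible weight vector, the combined prediction can never have a higher MSE than the best individual model on the data used to fit the weights. In practice the weights would be estimated on held-out validation predictions rather than on the training set.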
Abstract: Experimental corn hybrids are created in plant breeding programs by crossing two parents, called the inbred and the tester. Identifying the best parent combinations is challenging because the number of possible crosses is large and limited time and budget make it impractical to test them all. In the 2020 Syngenta Crop Challenge, Syngenta released several large datasets recording the historical yield performance of around 4% of the total cross combinations of 593 inbreds with 496 testers, planted in 280 locations between 2016 and 2018. Participants were asked to predict the yield performance of inbred-tester combinations that had not been planted, based on the historical yield data collected from crossing other inbreds and testers. In this paper, we present a collaborative filtering method, an ensemble of matrix factorization and neural networks, to solve this problem. Our computational results suggest that the proposed model significantly outperformed other models such as LASSO, random forest (RF), and neural networks. The presented method and results were produced within the 2020 Syngenta Crop Challenge.
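To make the collaborative filtering idea concrete, the sketch below factorizes a sparsely observed inbred-by-tester yield matrix with regularized alternating least squares. It covers only a generic matrix factorization component, not the authors' ensemble with neural networks. The function names, rank, regularization strength, and synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

def factorize(rows, cols, vals, n_rows, n_cols, rank=2, reg=0.1, n_iters=20, seed=0):
    """Fit yield ~ mu + P[inbred] . Q[tester] on the observed crosses only."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_rows, rank))   # inbred factors
    Q = rng.normal(scale=0.1, size=(n_cols, rank))   # tester factors
    mu = vals.mean()
    resid = vals - mu
    ridge = reg * np.eye(rank)
    for _ in range(n_iters):
        for i in range(n_rows):                      # update each inbred
            m = rows == i
            if m.any():
                A = Q[cols[m]]
                P[i] = np.linalg.solve(A.T @ A + ridge, A.T @ resid[m])
        for j in range(n_cols):                      # update each tester
            m = cols == j
            if m.any():
                A = P[rows[m]]
                Q[j] = np.linalg.solve(A.T @ A + ridge, A.T @ resid[m])
    return mu, P, Q

def predict(mu, P, Q, rows, cols):
    return mu + np.sum(P[rows] * Q[cols], axis=1)

# Synthetic low-rank "yield" matrix with about 40% of crosses observed.
rng = np.random.default_rng(1)
n_inbreds, n_testers = 30, 25
truth = rng.normal(size=(n_inbreds, 2)) @ rng.normal(size=(2, n_testers)) + 100.0
mask = rng.random((n_inbreds, n_testers)) < 0.4
rows, cols = np.nonzero(mask)
vals = truth[rows, cols]

mu, P, Q = factorize(rows, cols, vals, n_inbreds, n_testers)
train_rmse = float(np.sqrt(np.mean((predict(mu, P, Q, rows, cols) - vals) ** 2)))
baseline_rmse = float(np.std(vals))   # RMSE of always predicting the mean

# Predict every unobserved cross.
new_rows, new_cols = np.nonzero(~mask)
new_pred = predict(mu, P, Q, new_rows, new_cols)
```

With only about 4% of crosses observed, as in the challenge data, many inbreds or testers would have very few observations, so the choice of rank and regularization would matter far more than in this toy setting.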
Abstract: Environmental stresses such as drought and heat can cause substantial yield loss in agriculture. Hybrid crops that are tolerant to drought and heat stress would therefore produce more consistent yields than hybrids that are not. In the 2019 Syngenta Crop Challenge, Syngenta released several large datasets recording the yield performance of 2,452 corn hybrids planted in 1,560 locations between 2008 and 2017, and asked participants to classify the corn hybrids as either tolerant or susceptible to drought stress, heat stress, and combined drought and heat stress. As one of the winning teams, we designed a two-step approach that solves this problem in an unsupervised way, since no data was provided that classified any set of hybrids as tolerant or susceptible to any type of stress. First, we designed a deep convolutional neural network (CNN) that took advantage of state-of-the-art modeling and solution techniques to extract stress metrics for each type of stress. The CNN model successfully distinguished between low- and high-stress environments by considering multiple factors such as planting and harvest dates, daily weather, and soil conditions. Second, we regressed the yield of each hybrid linearly against each stress metric and classified the hybrid based on the slope of the regression line, since the slope shows how sensitive a hybrid is to a specific environmental stress. Our results suggested that only 14% of the corn hybrids were tolerant to at least one type of stress.
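The second step, classifying a hybrid by the slope of yield against a stress metric, can be sketched as below. The abstract does not state the decision rule, so the cutoff `slope_threshold`, the function names, and the toy data are hypothetical; the stress metric is also assumed to be already available (in the paper it comes from the CNN).

```python
import numpy as np

def stress_slope(stress, yields):
    """Slope of the least-squares line of yield against the stress metric."""
    slope, _intercept = np.polyfit(stress, yields, deg=1)
    return float(slope)

def classify(stress, yields, slope_threshold=-1.0):
    """Label a hybrid 'tolerant' when yield drops slowly as stress increases.

    slope_threshold is an illustrative cutoff, not a value from the paper.
    """
    if stress_slope(stress, yields) >= slope_threshold:
        return "tolerant"
    return "susceptible"

# Toy data: one stress value per environment, yields for two hybrids.
stress = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
steady = np.array([10.0, 9.9, 10.1, 9.8, 10.0])   # almost flat response
sensitive = 12.0 - 2.0 * stress                   # loses 2 units per stress unit

labels = {
    "steady": classify(stress, steady),
    "sensitive": classify(stress, sensitive),
}
```

A more negative slope means a larger yield loss per unit of stress, which is why the slope serves as a sensitivity measure for each hybrid.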