Abstract:Italian ryegrass is a grass weed commonly found in winter wheat fields that are competitive with winter wheat for moisture and nutrients. Ryegrass can cause substantial reductions in yield and grain quality if not properly controlled with the use of herbicides. To control the cost and environmental impact we detect weeds in drone and satellite imagery. Satellite imagery is too coarse to be used for precision spraying, but can aid in planning drone flights and treatments. Drone images on the other hand have sufficiently good resolution for precision spraying. However, ryegrass is hard to distinguish from the crop and annotation requires expert knowledge. We used the Python segmentation models library to test more than 600 different neural network architectures for weed segmentation in drone images and we map accuracy versus the cost of the model prediction for these. Our best system applies herbicides to over 99% of the weeds while only spraying an area 30% larger than the annotated weed area. These models yield large savings if the weed covers a small part of the field.
Abstract:Large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding across various domains, including healthcare and finance. For some tasks, LLMs achieve similar or better performance than trained human beings, therefore it is reasonable to employ human exams (e.g., certification tests) to assess the performance of LLMs. We present a comprehensive evaluation of popular LLMs, such as Llama 2 and GPT, on their ability to answer agriculture-related questions. In our evaluation, we also employ RAG (Retrieval-Augmented Generation) and ER (Ensemble Refinement) techniques, which combine information retrieval, generation capabilities, and prompting strategies to improve the LLMs' performance. To demonstrate the capabilities of LLMs, we selected agriculture exams and benchmark datasets from three of the largest agriculture producer countries: Brazil, India, and the USA. Our analysis highlights GPT-4's ability to achieve a passing score on exams to earn credits for renewing agronomist certifications, answering 93% of the questions correctly and outperforming earlier general-purpose models, which achieved 88% accuracy. On one of our experiments, GPT-4 obtained the highest performance when compared to human subjects. This performance suggests that GPT-4 could potentially pass on major graduate education admission tests or even earn credits for renewing agronomy certificates. We also explore the models' capacity to address general agriculture-related questions and generate crop management guidelines for Brazilian and Indian farmers, utilizing robust datasets from the Brazilian Agency of Agriculture (Embrapa) and graduate program exams from India. The results suggest that GPT-4, ER, and RAG can contribute meaningfully to agricultural education, assessment, and crop management practice, offering valuable insights to farmers and agricultural professionals.