Efficient and scalable crop growth simulations using standard big data and distributed computing technologies

Rob Knapen*, Allard de Wit, Eliya Buyukkaya, Petros Petrou, Dilli Paudel, Sander Janssen, Ioannis Athanasiadis

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

The digitization of agriculture has led to an explosion of highly detailed data, offering opportunities for further optimizing resource use in food production systems. However, managing and processing these growing data volumes presents significant challenges. This study investigates the suitability of standard big data and distributed computing technologies with a crop yield forecasting case study, and benchmarks the performance and scalability of storage and compute. To that end, a prototype system leveraging the Apache Spark big data analytics framework and using the WISS-WOFOST crop growth simulation model is assembled and evaluated for its efficiency and scalability when running large numbers of simulations using distributed computing on commonly available infrastructure. Existing data for maize and winter wheat, as typical summer and winter crops, is prepared for distributed storage and processing and used to measure the performance of the system on clusters of increasing sizes, from small Kubernetes Cloud deployments to large HPC configurations. Specific attention is paid to the aggregation of the grid-based simulation results to larger administrative regions for follow-up analysis and reporting. Our results demonstrate that the selected standard big data and distributed computing technology simplifies the application of distributed processing and storage, making the related trade-off between runtime and costs easier to manage. By increasing the distribution of our system 64 times and the total number of cores used 45 times compared to the baseline, we obtained a 99% reduction in simulation processing time and a 95% decrease in the aggregation time of the simulation results, making detailed forecasting for large areas more tractable. However, distributed implementations remain inherently more complex than conventional ones. As such, the construction and use of distributed systems will continue to be a challenge for agronomists and agricultural data scientists.
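The reported percentage reductions can be restated as speedup factors, which are easier to compare against the 45-fold increase in cores. The short Python sketch below does only that arithmetic. The function names are illustrative and not taken from the article, and because the published percentages are rounded, the resulting factors are indicative only.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert a percentage reduction in runtime into a speedup factor.

    A reduction of r percent leaves (100 - r) percent of the original
    runtime, so the speedup is 100 / (100 - r).
    """
    if not 0.0 <= reduction_pct < 100.0:
        raise ValueError("reduction_pct must be in the range [0, 100)")
    return 100.0 / (100.0 - reduction_pct)


def speedup_per_core_factor(speedup: float, core_factor: float) -> float:
    """Speedup divided by the factor by which the core count was increased."""
    return speedup / core_factor


# Figures quoted in the abstract (rounded in the source).
CORE_FACTOR = 45.0          # total cores relative to the baseline
SIMULATION_REDUCTION = 99.0  # percent reduction in simulation time
AGGREGATION_REDUCTION = 95.0  # percent reduction in aggregation time

simulation_speedup = speedup_from_reduction(SIMULATION_REDUCTION)
aggregation_speedup = speedup_from_reduction(AGGREGATION_REDUCTION)

if __name__ == "__main__":
    for label, s in [("simulation", simulation_speedup),
                     ("aggregation", aggregation_speedup)]:
        ratio = speedup_per_core_factor(s, CORE_FACTOR)
        print(f"{label}: about {s:.0f}x faster, "
              f"{ratio:.2f}x speedup per unit of core increase")
```

Under these rounded figures the simulation step comes out at roughly 100 times faster and the aggregation step at roughly 20 times faster. A ratio above 1 for the simulation step should not be read as proof of superlinear scaling: the baseline and the scaled runs may differ in more than core count (for example in hardware or in how the data is partitioned), and a reduction reported as 99% could correspond to a noticeably different exact speedup.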

Original language: English
Article number: 110392
Journal: Computers and Electronics in Agriculture
Volume: 236
Publication status: Published - Sept 2025

Keywords

  • Apache Spark
  • Benchmarking
  • Crop yield forecasting
  • Distributed computing
  • HPC
  • Kubernetes
  • WOFOST crop growth model
