Coupling Machine Learning and Crop Modeling Improves Crop Yield Prediction in the US Corn Belt (2008.04060v2)

Published 28 Jul 2020 in q-bio.QM, cs.LG, and q-bio.PE

Abstract: This study investigates whether coupling crop modeling and ML improves corn yield predictions in the US Corn Belt. The main objectives are to explore whether a hybrid approach (crop modeling + ML) would result in better predictions, investigate which combinations of hybrid models provide the most accurate predictions, and determine the features from the crop modeling that are most effective to be integrated with ML for corn yield prediction. Five ML models (linear regression, LASSO, LightGBM, random forest, and XGBoost) and six ensemble models have been designed to address the research question. The results suggest that adding simulation crop model variables (APSIM) as input features to ML models can decrease yield prediction root mean squared error (RMSE) from 7 to 20%. Furthermore, we investigated partial inclusion of APSIM features in the ML prediction models and we found soil moisture related APSIM variables are most influential on the ML predictions followed by crop-related and phenology-related variables. Finally, based on feature importance measure, it has been observed that simulated APSIM average drought stress and average water table depth during the growing season are the most important APSIM inputs to ML. This result indicates that weather information alone is not sufficient and ML models need more hydrological inputs to make improved yield predictions.

Citations (200)

Summary

  • The paper demonstrates that adding APSIM simulation outputs as input features to five ML models reduces RMSE by 7-20% relative to the same models without APSIM features.
  • It employs an ensemble of machine learning techniques with detailed environmental and hydrological data to boost prediction accuracy.
  • The study emphasizes practical benefits for agricultural advisories and resilient crop forecasting under extreme weather conditions.

Improving Crop Yield Prediction through Coupled Machine Learning and Simulation Modeling

Integrating traditional crop modeling with ML techniques to improve crop yield prediction has received limited attention despite its significant potential. The paper by Shahhosseini et al. addresses this gap by investigating whether combining the two approaches improves corn yield prediction in the US Corn Belt. Unlike earlier studies, which predominantly employed basic statistical models, this research combines a variety of machine learning models with the Agricultural Production Systems sIMulator (APSIM) and assesses how well they perform together.

Methodology and Findings

The research uses five ML models (linear regression, LASSO, LightGBM, random forest, and XGBoost), complemented by six ensemble models, to build the prediction framework. The baseline was a benchmark model using only environmental, management, and historical yield data, which the authors then extended with 22 APSIM simulation outputs as additional model inputs. Including the APSIM variables reduced prediction error, with root mean squared error (RMSE) decreasing by 7% to 20%.
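The reported improvement is a relative reduction in RMSE between a model trained without APSIM features and the same model trained with them. A minimal sketch of that arithmetic follows; the function names and all numbers are hypothetical illustrations, not code or values from the paper.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def relative_rmse_reduction(rmse_baseline, rmse_hybrid):
    """Percentage by which the hybrid RMSE falls below the baseline RMSE."""
    return 100.0 * (rmse_baseline - rmse_hybrid) / rmse_baseline


# Hypothetical yields (t/ha) for four county-years; not data from the paper.
observed = [10.0, 11.0, 9.0, 12.0]
baseline_pred = [11.0, 10.0, 10.0, 11.0]  # stand-in for ML without APSIM features
hybrid_pred = [10.8, 10.2, 9.8, 11.2]     # stand-in for ML with APSIM features

reduction = relative_rmse_reduction(
    rmse(observed, baseline_pred), rmse(observed, hybrid_pred)
)
print(f"RMSE reduction: {reduction:.1f}%")  # prints "RMSE reduction: 20.0%"
```

Expressing the gain as a percentage of the baseline RMSE makes results comparable across models whose absolute errors differ.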

Furthermore, APSIM-derived features such as average drought stress and average depth to the water table during the growing season were shown to drive this improvement, emphasizing the importance of incorporating detailed hydrological data in predictive models. This demonstrates the shortfall of relying solely on weather data and highlights the need for more comprehensive environmental inputs in yield prediction tasks.
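One common way to rank inputs like these is permutation importance: shuffle a single feature column and measure how much the prediction error grows. The sketch below is a generic illustration, not the feature importance measure used in the paper; the toy predictor, the feature names, and all values are hypothetical.

```python
import math
import random


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def permutation_importance(predict, rows, observed, feature, repeats=5, seed=0):
    """Mean increase in RMSE after shuffling one feature column."""
    rng = random.Random(seed)
    base = rmse(observed, [predict(r) for r in rows])
    increases = []
    for _ in range(repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        # Copy each row with the shuffled value substituted for this feature.
        shuffled = [dict(r, **{feature: v}) for r, v in zip(rows, column)]
        increases.append(rmse(observed, [predict(r) for r in shuffled]) - base)
    return sum(increases) / repeats


# Hypothetical stand-in for a fitted model: yield falls with drought stress,
# and the "noise" column is ignored entirely.
def toy_predict(row):
    return 12.0 - 4.0 * row["drought_stress"]


rows = [
    {"drought_stress": s, "noise": n}
    for s, n in [(0.1, 5), (0.3, 2), (0.5, 9), (0.7, 1), (0.9, 4), (0.2, 7)]
]
observed = [toy_predict(r) for r in rows]  # synthetic targets, so baseline RMSE is 0

for name in ("drought_stress", "noise"):
    score = permutation_importance(toy_predict, rows, observed, name)
    print(name, round(score, 3))
```

A feature the model never uses scores exactly zero, while a feature the model depends on produces a positive increase in error.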

In challenging conditions, such as the extreme drought of 2012, the hybrid models outperformed the ML-only models, although prediction accuracy decreased overall, revealing the inherent difficulty of forecasting under extreme weather. Nevertheless, this finding underscores the resilience of hybrid models, which consistently outperform standalone approaches even in adverse conditions.
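Checking behavior in a year such as 2012 amounts to holding that year out of training and scoring it separately. A minimal sketch of such a split follows; the record layout, the `past_only` option, and all values are hypothetical, and the paper's actual train/test protocol may differ.

```python
def split_by_year(records, test_year, past_only=False):
    """Return (train, test), where test holds every record from test_year.

    With past_only=True, training keeps only years before test_year, which
    avoids using information from the future when forecasting.
    """
    test = [r for r in records if r["year"] == test_year]
    if past_only:
        train = [r for r in records if r["year"] < test_year]
    else:
        train = [r for r in records if r["year"] != test_year]
    return train, test


# Hypothetical county-year records; not data from the paper.
records = [
    {"year": 2010, "county": "A", "yield_t_ha": 10.4},
    {"year": 2011, "county": "A", "yield_t_ha": 10.9},
    {"year": 2012, "county": "A", "yield_t_ha": 7.1},
    {"year": 2012, "county": "B", "yield_t_ha": 6.8},
    {"year": 2013, "county": "A", "yield_t_ha": 10.7},
]

train, test = split_by_year(records, 2012)
past_train, _ = split_by_year(records, 2012, past_only=True)
print(len(train), len(test), len(past_train))  # prints "3 2 2"
```

Scoring the held-out year on its own, rather than pooling it with typical years, is what exposes the accuracy drop under extreme conditions.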

Discussion and Implications

The paper presents a compelling case for integrating APSIM outputs with machine learning models to advance crop yield predictions. The improved performance of hybrid models suggests potential pathways for optimizing agricultural forecasting tools, with particular emphasis on the inclusion of detailed soil and hydrological metrics.

One notable observation is that average bias in predictions was significantly reduced when APSIM data were incorporated, indicating that coupled models provide not only more accurate but also more reliable outputs. The exploration of variable importance further reinforces the critical role of particular APSIM inputs in enhancing predictive capability, which could guide model improvements and feature selection in future studies.
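Bias here refers to systematic over- or under-prediction, which RMSE alone does not reveal. A minimal sketch of mean bias error follows, assuming the predicted-minus-observed sign convention; the function name and all numbers are hypothetical.

```python
def mean_bias_error(observed, predicted):
    """Average signed error; positive values mean over-prediction."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical yields (t/ha); not data from the paper.
observed = [10.0, 11.0, 9.0, 12.0]
always_high = [11.0, 12.0, 10.0, 13.0]  # every prediction is 1 t/ha too high
mixed = [11.0, 10.0, 10.0, 11.0]        # errors of +1 and -1 cancel out

print(mean_bias_error(observed, always_high))  # prints 1.0
print(mean_bias_error(observed, mixed))        # prints 0.0
```

Both prediction sets have the same RMSE of 1 t/ha, yet only the first is biased, which is why bias and RMSE are worth reporting together.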

From a practical perspective, these findings have substantial implications for farmers and policy-makers, suggesting a route for developing more precise and timely agricultural advisories. From a theoretical standpoint, the research enriches the corpus of knowledge in the field of agronomy by providing empirical evidence on the superiority of hybrid models, thus paving the way for further application of similar methodologies across different crops and regions.

Future Directions

The research opens several avenues for future exploration. Extending this framework to include remote sensing data could further bolster the predictive performance by providing timely and dynamic environmental inputs. Additionally, the challenge of predicting under extreme weather conditions remains an open research question, emphasizing the need for innovations that enhance model robustness during such events.

The relevance of simulation models operating in forecast mode, as opposed to retrospective analysis using full-year data, also warrants further study to better understand the limitations and opportunities this presents for real-world applications.

In conclusion, the combination of machine learning and simulation-based approaches holds significant promise for improving crop yield predictions, providing a rich field of study for future research in sustainable agriculture and food security.