Feasibility of machine learning-based rice yield prediction in India at the district level using climate reanalysis data (2403.07967v1)
Abstract: Yield forecasting, the science of predicting agricultural productivity before the crop harvest occurs, helps a wide range of stakeholders make better decisions around agricultural planning. This study investigates whether machine learning-based yield prediction models can predict Kharif season rice yields at the district level in India several months before the rice harvest takes place. The methodology involved training 19 machine learning models, including CatBoost, LightGBM, Orthogonal Matching Pursuit, and Extremely Randomized Trees, on 20 years of climate, satellite, and rice yield data across 247 of India's rice-producing districts. In addition to model-building, a dynamic dashboard was built to understand how the reliability of rice yield predictions varies across districts. The results of the proof-of-concept machine learning pipeline demonstrated that rice yields can be predicted with a reasonable degree of accuracy, with out-of-sample R², MAE, and MAPE performance of up to 0.82, 0.29, and 0.16 respectively. These results outperformed test set performance reported in related literature on rice yield modeling in other contexts and countries. In addition, SHAP value analysis was conducted to infer both the importance and directional impact of the climate and remote sensing variables included in the model. Important features driving rice yields included temperature, soil water volume, and leaf area index. In particular, higher temperatures in August correlate with increased rice yields, especially when the leaf area index in August is also high. Building on the results, a proof-of-concept dashboard was developed to allow users to easily explore which districts may experience a rise or fall in yield relative to the previous year.
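For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how R², MAE, and MAPE are conventionally computed on held-out data. It is not the authors' evaluation code. The function names and the yield values are hypothetical and chosen only for illustration, and the sketch assumes MAPE is expressed as a fraction (so 0.16 would correspond to 16%), which the abstract does not state explicitly.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the yields."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def mape(y_true, y_pred):
    """Mean absolute percentage error, returned as a fraction."""
    return mean(abs((t - p) / t) for t, p in zip(y_true, y_pred))


# Hypothetical out-of-sample yields for five districts (made-up numbers,
# not taken from the paper).
observed = [2.0, 2.5, 3.0, 3.5, 4.0]
predicted = [2.2, 2.4, 3.1, 3.3, 4.1]

print("R2  :", round(r2_score(observed, predicted), 3))
print("MAE :", round(mae(observed, predicted), 3))
print("MAPE:", round(mape(observed, predicted), 3))
```

R² is unitless and compares the model against a predict-the-mean baseline, MAE carries the units of the yield variable, and MAPE normalizes each error by the observed yield, so the three figures are best read together rather than in isolation.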