Deep Neural Networks in Crop Yield Prediction

Updated 29 October 2025

Deep neural networks for crop yield prediction are models that integrate diverse agricultural data, including remote sensing, weather, and soil measurements, to forecast yields accurately.
These methods use architectures such as CNNs, RNNs/LSTMs, and hybrid models to capture spatial, temporal, and multimodal interactions in complex crop systems.
They support agricultural decision-making through multi-modal data fusion, feature engineering, and explainability techniques.

Deep neural networks for crop yield prediction constitute a rapidly advancing field that integrates high-capacity learning algorithms with a diverse array of agricultural data modalities, including remote sensing, weather, soil, management, and genetic variables. Driven by the complexity and nonlinear interactions intrinsic to crop production systems, modern approaches frequently exploit multi-modal data fusion, advanced temporal and spatial modeling, and attention or feature selection mechanisms to optimize out-of-sample predictive performance while providing varying levels of interpretability and actionable domain insight.

1. Architectures and Data Modalities

Crop yield prediction has seen the application of a variety of deep neural network (DNN) architectures, each specialized for different data types and prediction goals:

Feedforward DNNs: Classic multilayer perceptrons serve as universal approximators for tabular agro-environmental data, e.g., climate, soil, management, and categorical genotype variables (Islam et al., 2021, Ramesh et al., 2020). These networks are scalable (demonstrated on datasets exceeding 300,000 instances) and are trained end to end with backpropagation for yield regression and crop selection tasks. In the cited studies, DNNs outperformed logistic regression and support vector machines on high-dimensional, multi-factor agronomic datasets.
Convolutional Neural Networks (CNNs): Used both for satellite/remote sensing inputs and Euclidean/tabular data formatted as images or histograms, CNNs extract spatial features (e.g., from Sentinel-1/2 or hyperspectral UAV imagery) and fuse on-ground signals (e.g., topography, nitrogen rate) for spatially explicit predictions (Morales et al., 2021, Moghimi et al., 2019). Some architectures, such as Hyper3DNetReg, directly produce 2D rasters of yield using 3D convolutions and separable 2D filters, enabling early-season spatial yield maps from March inputs.
Recurrent Neural Networks (RNNs) and LSTMs: Temporal dependencies in crop development are modeled via RNNs/LSTMs, which are used to ingest time series weather, management, and spectral data (Shook et al., 2020, Khaki et al., 2019, Alhnaity et al., 2019, Najjar et al., 11 Jul 2024). These architectures address non-Markovian effects relevant for greenhouse production, variable climate, and developmental stage importance.
CNN-RNN and Hybrid Designs: Integrated CNN-RNN hybrids combine CNN modules (for spatial structure or multi-scale time series) with RNN/LSTM layers (for long-range temporal dependencies) (Khaki et al., 2019, Khalilzadeh et al., 2023). Such systems capture both localized environmental effects and seasonal or multi-year cycles (e.g., genetic gain, crop improvement).
Multi-modal Attention-Augmented Models: Recently, attention mechanisms and cross-modal fusion (including transformer variants) enable selective focus on informative inputs, with architectures such as MMST-ViT (Lin et al., 10 Jun 2024) and MTMS-YieldNet (Dangi et al., 19 Sep 2025) achieving state-of-the-art performance on terabyte-scale remote sensing and climate datasets. Attention blocks enable dynamic weighting of spectral, spatial, and temporal signals and facilitate explainability downstream.
Spatial and Functional Neural Networks: DSNet (Park et al., 16 Jun 2025) applies a spatially varying functional index model, mapping daily temperature curves and scalar covariates through MLPs with explicit spatial basis expansions, spatial random effects, and nonstationary, location-specific weights.
Multi-task Learning and Transfer: Simultaneous prediction for multiple crops, as in YieldNet (Khaki et al., 2020), employs shared CNN backbones and specialized multi-target loss functions to efficiently transfer features and balance error between target crops, enhancing sample efficiency and generalization.

Architecture	Data Type(s)	Prediction Domain
Feedforward DNN	Tabular	Zonal yield, crop selection
CNN	Imagery, raster	Spatial raster, disease
RNN/LSTM	Sequential/tabular	Temporal yields, stages
CNN-RNN Hybrid	Both	Complex, large-scale
Attention/Transformer	Multi-modal	Multi-scale, cross-modal
Spatial/Functional	Functional curves	Geospatial, nonstationary
Multi-task/Transfer	Any	Multi-crop, shared features
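
As an illustration of the dynamic weighting performed by the attention blocks described above, the following is a minimal sketch of scaled dot-product attention pooling over a sequence of per-week hidden states. It is not taken from any of the cited architectures; the names `attention_pool`, `weekly_states`, and `query`, as well as the toy values, are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Pool T hidden vectors into one context vector.

    hidden_states: list of T vectors (e.g., one per week), each of length d.
    query: vector of length d (learned in a real model, fixed in this sketch).
    Returns the context vector and the per-step attention weights.
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(h_i * q_i for h_i, q_i in zip(h, query)) / scale
        for h in hidden_states
    ]
    weights = softmax(scores)
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(len(query))
    ]
    return context, weights


# Hypothetical hidden states for three weeks (illustrative values only).
weekly_states = [[0.1, 0.0], [0.9, 0.4], [0.2, 0.1]]
query = [1.0, 1.0]
context, weights = attention_pool(weekly_states, query)
print(weights)  # the second week receives the largest weight
```

The returned weights sum to one, which is what allows them to be read as a relative ranking of time steps in the explainability analyses discussed later.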

2. Data Fusion and Feature Engineering

High-dimensionality and the heterogeneity of agricultural datasets are addressed through:

Explicit Feature Engineering: For tabular approaches, high-dimensional input spaces (up to 46 features: weather, soil, fertilizer, land) are directly normalized, one-hot encoded (for region/season), and, where needed, subject to feature selection via measures such as Kendall correlation or backprop-based attribution (Islam et al., 2021, Olisah et al., 8 Jan 2024).
Remote Sensing Integration: UAV or satellite imagery is radiometrically calibrated, geo-masked, and segmented via algorithmic indices (e.g., NDPSI for senescence/noise removal (Moghimi et al., 2019)). Object-based approaches extract features at the sub-plot, plot, or pixel level—mean and variance per band, fractional endmember unmixing, and spatial-hierarchical features.
Multi-modal Fusion: YieldNet and MMST-ViT integrate high-temporal-resolution remote imagery, weather (hourly to yearly), soil, and sometimes genotype/management variables via attention or multi-branch fusion modules (Khaki et al., 2020, Lin et al., 10 Jun 2024).
Functional Data Representations: DSNet and similar models use basis expansions (e.g., Fourier, splines) for functional predictors (daily temperature curves) and for spatial components, with low-rank remodeling to mitigate dimensionality (Park et al., 16 Jun 2025).
Time-Window/Stage Aggregation: LSTM-based approaches aggregate daily weather to weekly or phenological intervals, aligning with plant development (as per BBCH stages). This structuring is crucial for explainability and for matching critical agro-developmental phases (Najjar et al., 11 Jul 2024, Shook et al., 2020).
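
Three of the preprocessing steps listed above (normalization, one-hot encoding, and aggregation of daily weather to weekly windows) can be sketched in a few lines. This is a minimal illustration under simplifying assumptions: min-max scaling is used as one common normalization choice, the window is a fixed seven days rather than a phenological stage, and the function names and sample values are hypothetical rather than drawn from the cited studies.

```python
def min_max_scale(values):
    """Rescale a numeric feature to [0, 1]; constant features map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(labels):
    """Encode categorical labels (e.g., region or season) as 0/1 vectors."""
    categories = sorted(set(labels))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for label in labels:
        row = [0] * len(categories)
        row[index[label]] = 1
        rows.append(row)
    return categories, rows


def weekly_means(daily_values, window=7):
    """Average a daily series over consecutive fixed-length windows."""
    out = []
    for start in range(0, len(daily_values), window):
        chunk = daily_values[start:start + window]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical inputs (illustrative values only).
rainfall = [0.0, 5.0, 10.0]
seasons = ["kharif", "rabi", "kharif"]
daily_temp = [float(d) for d in range(14)]

scaled = min_max_scale(rainfall)
categories, encoded = one_hot(seasons)
weekly = weekly_means(daily_temp)
print(scaled, categories, encoded, weekly)
```

Replacing the fixed window with stage boundaries from a crop calendar would give the phenology-aligned aggregation described in the last item, at the cost of requiring per-crop stage dates.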

3. Training, Evaluation, and Benchmarking

Training Protocols: Deep models are typically trained with early stopping on validation loss, dropout for regularization, and adaptive optimizers (Adam, Adadelta) (Ramesh et al., 2020, Moghimi et al., 2019). Cross-validation uses train/test/holdout or sliding windows across years, environments, or districts.
Metrics: Performance is consistently measured with root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination ($R^2$), with increasing use of task-specific metrics such as MAPE, SMAPE, RMSLE, and newly defined metrics like ARSE (average of absolute root squared error) (Olisah et al., 8 Jan 2024).
Benchmark Comparisons: DNNs, LSTMs, and attention-based models typically outperform SVM, logistic regression, and unregularized decision trees on large, multi-feature tasks, as reported for Bangladesh (Islam et al., 2021). Random forests and XGBoost remain competitive at moderate sample sizes, where XGBoost sometimes outperforms deep models on feature-based representations (Huber et al., 2022). With sufficient samples and expressive architectures, DNNs can surpass tree-based methods, especially in generalization and in nonstationary or climate-variant contexts (Park et al., 16 Jun 2025).
Generality and Robustness: Generalization is tested via "unseen" and "unforeseen" data splits: DNNs are more responsive to genuine input changes and more robust across perturbed climatic or edaphic variables, while decision trees tend to bias toward high-cardinality categorical features (Olisah et al., 8 Jan 2024).
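
For reference, the standard regression metrics named above can be computed as follows. This is a minimal sketch using their usual textbook definitions; SMAPE in particular has several variants in the literature, and the form below (absolute error divided by the mean of absolute true and predicted values) is only one of them. ARSE is omitted because its exact formula is not given here. The sample values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (requires nonzero targets)."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def smape(y_true, y_pred):
    """Symmetric MAPE, in percent (one common variant)."""
    return 100.0 * sum(
        2.0 * abs(t - p) / (abs(t) + abs(p)) for t, p in zip(y_true, y_pred)
    ) / len(y_true)


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error (requires values > -1)."""
    return math.sqrt(
        sum((math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred))
        / len(y_true)
    )


# Hypothetical yields in t/ha (illustrative values only).
y_true = [2.0, 4.0, 6.0]
y_pred = [2.5, 3.5, 6.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
print(mape(y_true, y_pred), smape(y_true, y_pred), rmsle(y_true, y_pred))
```

RMSE and MAE carry the units of yield, whereas $R^2$, MAPE, and SMAPE are unitless, which is why results reported with different metrics are not directly comparable across studies.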

4. Explainability and Feature Attribution

Recent research integrates various strategies for interpretability:

Temporal and Spectral Attention: Models with temporal attention (e.g., "attention as explanation") weight critical weeks or stages, revealing when in the season (e.g., late season, reproductive phase) and which environmental signals (e.g., minimum surface temperature, direct normal irradiance) most strongly influence yield (Shook et al., 2020, Najjar et al., 11 Jul 2024, Khalilzadeh et al., 2023).
Feature Importance Methods: Permutation, guided backpropagation, and Shapley Value Sampling (SVS) yield rankings of input variables or time windows; these analyses align with agronomic knowledge, highlighting, e.g., silt/precipitation interplay (Olisah et al., 8 Jan 2024), or nitrogen/terrain roles for wheat (Najjar et al., 11 Jul 2024).
Spatial Explainability: Overlapping-patch CNN regression with output smoothing reveals spatial correspondence of predicted yield to management or environmental zones, and object-based feature mapping links local yield variability to plot marginals (Morales et al., 2021, Moghimi et al., 2019).
Model Transparency: Functional, spatial, and temporal indices (as in DSNet) localize heterogeneous effects and enable decomposition of model output into interpretable spatial and covariate components (Park et al., 16 Jun 2025).
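
Of the feature importance methods listed above, permutation importance is the simplest to state: shuffle one input column, re-evaluate the model, and record how much the error grows. The sketch below is a generic, model-agnostic version and not the procedure of any cited study; `toy_model` is a hypothetical stand-in for a trained network, and the data are synthetic.

```python
import random


def mse(model, X, y):
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the dataset with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical 'trained model': yield depends on feature 0 only."""
    return 3.0 * row[0]


# Synthetic data: feature 0 drives the target, feature 1 is irrelevant.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [toy_model(row) for row in X]
scores = permutation_importance(toy_model, X, y)
print(scores)  # feature 0 has positive importance, feature 1 has none
```

Because the procedure only needs model predictions, it applies equally to DNNs and tree ensembles, although correlated inputs (common among weather variables) can make individual scores misleading.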

5. Applications and Deployment

Operational Decision Support: Systems such as CYPUR-NN and IntelliFarm provide web/mobile interfaces for instant (<10s) yield/disease predictions from images or sensor readings, supporting resource and risk management (Ramesh et al., 2020, Olisah et al., 8 Jan 2024).
Pre-Season and In-Season Forecasting: LSTM- and CNN-path architectures enable predictions well before harvest, crucial for logistics and market signaling (Cunha et al., 2020). Dynamic input timelines, guided by crop calendars, advance usability in operational planning.
High-Throughput Phenotyping: UAV/DNN frameworks allow breeders to phenotype hundreds to thousands of lines by sub-plot prediction, accelerating breeding cycles and enabling remote screening for uniform/high-yielding genotypes (Moghimi et al., 2019).
High-Resolution Forecasts: Weakly supervised models disaggregate regional (county/province) labels to high-resolution subregion forecasts without local ground truth, employing aggregation layers and loss normalization to leverage satellite/model data (Paudel et al., 2022).
Climate Change Adaptation: Integration of long-term weather with multi-sensor imagery, spatially varying coefficients, and climate embeddings supports resilient cropping strategies and "what-if" stress testing for variety/management selection (Lin et al., 10 Jun 2024, Park et al., 16 Jun 2025, Shook et al., 2020).
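
The weak supervision idea in the high-resolution forecasting item can be illustrated with a small sketch: sub-region predictions are aggregated to the regional level, and the loss is computed only against the regional label. The area-weighted mean and the division by the label below are assumptions chosen for illustration; they are not claimed to be the aggregation layer or loss normalization used by Paudel et al. (2022), and all names and values are hypothetical.

```python
def aggregate_yield(sub_yields, sub_areas):
    """Area-weighted mean of sub-region yield predictions."""
    total_area = sum(sub_areas)
    return sum(y * a for y, a in zip(sub_yields, sub_areas)) / total_area


def regional_loss(sub_yields, sub_areas, regional_label):
    """Squared relative error between aggregated prediction and regional label.

    Dividing by the label is one simple way to keep high- and low-yield
    regions on a comparable scale (an assumption of this sketch).
    """
    predicted = aggregate_yield(sub_yields, sub_areas)
    return ((predicted - regional_label) / regional_label) ** 2


# Hypothetical sub-region predictions (t/ha) and crop areas (ha).
sub_yields = [4.0, 6.0, 5.0]
sub_areas = [100.0, 300.0, 100.0]
regional_label = 5.0  # reported regional yield (t/ha)

aggregated = aggregate_yield(sub_yields, sub_areas)
loss = regional_loss(sub_yields, sub_areas, regional_label)
print(aggregated, loss)
```

In a trainable model the sub-region predictions would come from a network, and gradients of this regional loss would flow back through the aggregation step, so no sub-region ground truth is needed.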

6. Limitations, Open Challenges, and Future Directions

Data Limitations: Generalizability is constrained by training data diversity—small or non-representative datasets can limit transfer to new climates/crops/regions. Disease detection is often constrained to prominent, visually obvious cases (Ramesh et al., 2020). Expanding datasets in space, time, and modalities is a recurring priority.
Model Complexity & Interpretability: Some recent architectures, especially attention-based and functional spatial DNNs, gain expressiveness at the cost of interpretability and increased computational demand (Park et al., 16 Jun 2025, Dangi et al., 19 Sep 2025). Continued development of domain-appropriate explainability remains critical.
Multi-modality and Transfer: Simultaneous modeling of genotype, weather, soil, and imagery—alongside transfer between crops and regions—necessitates further advances in multi-task learning, feature alignment, and interpretability (as pursued in DeepG2P (Sharma et al., 2022) and YieldNet (Khaki et al., 2020)).
Operational Integration: Real-time decision support, integration with IoT/sensor networks, and actionable guidance (e.g., disease diagnosis, input recommendation) require continued engineering effort and validation in diverse farm/market contexts (Ramesh et al., 2020, Olisah et al., 8 Jan 2024).
Model Robustness: Adversarial and climate-variant stress testing, as well as uncertainty quantification in prediction, are increasingly important, especially as climate extremes become more frequent (Park et al., 16 Jun 2025, Olisah et al., 8 Jan 2024).

7. Representative Results

Model/System	Metric	Value	Setting/Nature
CYPUR-NN (CNN) (Ramesh et al., 2020)	CNN Acc. (train/test)	86.37% / 83.87%	Paddy yield+image
DNN Bangladesh (Islam et al., 2021)	MSE (potato, test)	2.7%	46-feature DNN
DNN Nigeria (Olisah et al., 8 Jan 2024)	ARSE (DNNR16)	0.0146 t/ha	Soil-weather input
Hyper3DNetReg (Morales et al., 2021)	RMSE (yield raster)	Comparable or lower than ensembles	Early-season SAR
CNN-RNN Corn Belt (Khaki et al., 2019)	RMSE (corn, 2018)	11.48 bu/ac (~7%)	Weather, soil, management
DSNet Midwest (Park et al., 16 Jun 2025)	MSPE	186.4 (best)	Spatial-temporal
MTMS-YieldNet (Dangi et al., 19 Sep 2025)	MAPE (Sentinel-2)	0.331	SICKLE, India

Performance varies by context. In the studies summarized above, deep models often exceed prior statistical/ML methods (SVR, XGB, GBDT, Lasso, shallow NN, RF) in precision, robustness, and generalization, subject to data availability and proper fusion of modalities.

Deep neural networks for crop yield prediction thus represent a highly active, multidimensional research domain, characterized by rapid adoption of advances from machine vision, multi-modal learning, attention-based architectures, and spatial statistics. Continued advances in data acquisition, model interpretability, and operational integration are expected to accelerate their impact across global agricultural systems.