
MMST-ViT: Climate Change-aware Crop Yield Prediction via Multi-Modal Spatial-Temporal Vision Transformer (2309.09067v2)

Published 16 Sep 2023 in cs.CV

Abstract: Precise crop yield prediction provides valuable information for agricultural planning and decision-making processes. However, predicting crop yields in a timely manner remains challenging, as crop growth is sensitive to growing season weather variation and climate change. In this work, we develop a deep learning-based solution, namely Multi-Modal Spatial-Temporal Vision Transformer (MMST-ViT), for predicting crop yields at the county level across the United States, by considering the effects of short-term meteorological variations during the growing season and the long-term climate change on crops. Specifically, our MMST-ViT consists of a Multi-Modal Transformer, a Spatial Transformer, and a Temporal Transformer. The Multi-Modal Transformer leverages both visual remote sensing data and short-term meteorological data for modeling the effect of growing season weather variations on crop growth. The Spatial Transformer learns the high-resolution spatial dependency among counties for accurate agricultural tracking. The Temporal Transformer captures the long-range temporal dependency for learning the impact of long-term climate change on crops. Meanwhile, we also devise a novel multi-modal contrastive learning technique to pre-train our model without extensive human supervision. Hence, our MMST-ViT captures the impacts of both short-term weather variations and long-term climate change on crops by leveraging both satellite images and meteorological data. We have conducted extensive experiments on over 200 counties in the United States, with the experimental results showing that our MMST-ViT outperforms its counterparts under three performance metrics of interest.


Summary

  • The paper introduces MMST-ViT, a novel vision transformer that integrates multi-modal data for county-level crop yield prediction in a climate change context.
  • It employs a three-component architecture—multi-modal, spatial, and temporal transformers—to capture both short-term weather and long-term climate effects.
  • In experiments on US soybean data, the model achieved an RMSE of 3.9, an R² of 0.843, and a correlation coefficient of 0.918, the best values among the compared methods.

MMST-ViT: A Vision Transformer Approach for Climate Change-aware Crop Yield Prediction

Introduction

Predicting crop yields accurately is vital for effective agricultural planning, economic decisions, and ensuring global food security. Such predictions are notably challenging due to the sensitivity of crop growth to weather variations and climate change. This paper introduces the Multi-Modal Spatial-Temporal Vision Transformer (MMST-ViT), a novel approach that integrates both visual remote sensing data and meteorological data to predict crop yields at the county level across the United States. The model specifically aims to capture the effects of short-term meteorological variations during the growing season and long-term climate change.

Methodology

The proposed MMST-ViT model comprises three key components: (1) a Multi-Modal Transformer that processes both visual and short-term meteorological data to model the impact of growing season weather variations; (2) a Spatial Transformer to learn high-resolution spatial dependencies for accurate agricultural tracking; and (3) a Temporal Transformer that captures the long-term temporal dependence, representing the effect of climate change on crop growth.
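
To make the order of the three stages concrete, the following is a minimal sketch of the data flow only, not the paper's implementation. It assumes each county is split into grid cells, that every grid cell and time step already has an image feature vector and a weather feature vector, and it replaces each full Transformer with a single-query attention pooling step. All function names, the choice of queries, and the toy inputs are hypothetical.

```python
import math


def attention_pool(vectors, query):
    """Single-query scaled dot-product attention over a list of vectors."""
    dim = len(query)
    scores = [sum(q * v for q, v in zip(query, vec)) / math.sqrt(dim)
              for vec in vectors]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of the input vectors, one output value per dimension.
    return [sum(w * vec[i] for w, vec in zip(weights, vectors))
            for i in range(dim)]


def multi_modal_stage(image_feat, weather_feat):
    """Fuse one grid cell's image features with its short-term weather."""
    # Assumption: the weather vector serves as the query.
    return attention_pool([image_feat, weather_feat], query=weather_feat)


def spatial_stage(grid_feats):
    """Aggregate the fused features of all grid cells at one time step."""
    # Assumption: the mean feature vector serves as the query.
    query = [sum(col) / len(grid_feats) for col in zip(*grid_feats)]
    return attention_pool(grid_feats, query)


def temporal_stage(step_feats):
    """Aggregate the spatial summaries across time steps."""
    # Assumption: the most recent time step serves as the query.
    return attention_pool(step_feats, query=step_feats[-1])


def predict_yield(images, weather, head_weights):
    """images[t][g] and weather[t][g] are feature vectors for step t, cell g."""
    per_step = []
    for img_t, wx_t in zip(images, weather):
        fused = [multi_modal_stage(i, w) for i, w in zip(img_t, wx_t)]
        per_step.append(spatial_stage(fused))
    county_feat = temporal_stage(per_step)
    # Linear regression head on the final county representation.
    return sum(w * x for w, x in zip(head_weights, county_feat))


# Hypothetical toy input: 2 time steps, 3 grid cells, 2-dimensional features.
images = [[[0.2, 0.1], [0.4, 0.3], [0.1, 0.5]],
          [[0.3, 0.2], [0.5, 0.1], [0.2, 0.6]]]
weather = [[[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]],
           [[0.6, 0.4], [0.5, 0.5], [0.4, 0.6]]]
estimate = predict_yield(images, weather, head_weights=[1.0, 1.0])
print(estimate)
```

The nesting is the point of the sketch: modalities are fused per grid cell first, grid cells are then combined per time step, and time steps are combined last, so the final regression head sees a single vector per county.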

A notable contribution of this work is the development of a novel multi-modal contrastive learning technique. This technique enables efficient pre-training of the model without extensive labeling effort, leveraging the multi-modal nature of the input data (satellite images and meteorological parameters).
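
The summary does not give the form of the paper's contrastive objective. As a rough illustration of the general idea, the sketch below implements a generic symmetric InfoNCE loss of the kind used in CLIP-style pre-training: the image embedding and the weather embedding of the same sample form a positive pair, and all other pairings in the batch act as negatives. The function names, the temperature value, and the toy embeddings are assumptions, and the paper's actual loss may differ.

```python
import math


def normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax_at(row, index):
    """Log-softmax of row evaluated at one index, computed stably."""
    top = max(row)
    log_total = top + math.log(sum(math.exp(x - top) for x in row))
    return row[index] - log_total


def contrastive_loss(image_embs, weather_embs, temperature=0.1):
    """Symmetric InfoNCE loss; sample i in each list forms a positive pair."""
    img = [normalize(v) for v in image_embs]
    wx = [normalize(v) for v in weather_embs]
    n = len(img)
    # Cosine similarity between every image and every weather embedding.
    sims = [[sum(a * b for a, b in zip(img[i], wx[j])) / temperature
             for j in range(n)] for i in range(n)]
    # Image-to-weather direction: the matching weather embedding is the target.
    img_to_wx = -sum(log_softmax_at(sims[i], i) for i in range(n)) / n
    # Weather-to-image direction uses the transposed similarity matrix.
    sims_t = [list(col) for col in zip(*sims)]
    wx_to_img = -sum(log_softmax_at(sims_t[j], j) for j in range(n)) / n
    return 0.5 * (img_to_wx + wx_to_img)


# Hypothetical embeddings: aligned pairs versus deliberately swapped pairs.
image_embs = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = contrastive_loss(image_embs, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = contrastive_loss(image_embs, [[0.0, 1.0], [1.0, 0.0]])
print(aligned_loss < swapped_loss)
```

Because the positive pairs come from the data itself (an image and the weather record for the same place and time), no yield labels are needed at this stage, which is what makes this kind of pre-training possible without extensive supervision.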

Results and Discussion

Extensive experiments were conducted on data from over 200 counties in the United States, showing MMST-ViT's superior performance over existing approaches. For instance, on soybean yield prediction, the MMST-ViT model achieved the lowest RMSE of 3.9, the highest R-squared value of 0.843, and the best correlation coefficient of 0.918, indicating its robust predictive capability.
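
For readers unfamiliar with the three metrics, the sketch below computes them from scratch using their standard definitions. The summary does not state which definition of R-squared the paper uses: the reported values are consistent with the squared correlation coefficient (0.918² ≈ 0.843), so both that variant and the coefficient of determination are shown. The yield values in the example are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
                     / len(y_true))


def pearson_corr(y_true, y_pred):
    """Pearson correlation coefficient between observations and predictions."""
    mean_t = sum(y_true) / len(y_true)
    mean_p = sum(y_pred) / len(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def squared_corr(y_true, y_pred):
    """Alternative R-squared: the square of the Pearson correlation."""
    return pearson_corr(y_true, y_pred) ** 2


# Made-up yields: every prediction overshoots the observation by 0.5.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
print(rmse(observed, predicted))
print(r_squared(observed, predicted))
print(squared_corr(observed, predicted))
```

The two R-squared variants agree only in special cases. In the example, a constant offset leaves the correlation perfect while lowering the coefficient of determination, so it matters which definition a reported figure uses.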

Furthermore, visualizations of prediction errors across multiple U.S. states indicate that the model's accuracy is consistent across geographical regions. Early-prediction experiments, in which yields are predicted one year ahead, also showed promising results, suggesting that the model is robust and practically useful for agricultural planning and decision-making.

Implications and Future Developments

The integration of multi-modal data (visual remote sensing and meteorological data) represents a significant step forward in crop yield prediction. It not only improves prediction accuracy but also provides insight into the interplay between crop growth, weather variations, and climate change.

Looking ahead, this work opens several avenues for future research. One direction is to explore how additional data modalities (such as soil moisture and nutrient levels) affect prediction accuracy. Another is to develop more sophisticated multi-modal pre-training techniques that further improve the model's performance and help it generalize across different crop types and geographic regions.

The MMST-ViT framework offers a promising way to apply advanced machine learning techniques to the challenges that climate change poses for agriculture. Continued refinement and extension of this work could benefit food security, agricultural sustainability, and economic planning on a global scale.