MMST-ViT: Climate Change-aware Crop Yield Prediction via Multi-Modal Spatial-Temporal Vision Transformer (2309.09067v2)
Abstract: Precise crop yield prediction provides valuable information for agricultural planning and decision-making. However, timely crop yield prediction remains challenging because crop growth is sensitive to growing-season weather variation and climate change. In this work, we develop a deep learning-based solution, the Multi-Modal Spatial-Temporal Vision Transformer (MMST-ViT), for predicting crop yields at the county level across the United States by considering the effects of both short-term meteorological variations during the growing season and long-term climate change on crops. Specifically, MMST-ViT consists of a Multi-Modal Transformer, a Spatial Transformer, and a Temporal Transformer. The Multi-Modal Transformer leverages both visual remote sensing data and short-term meteorological data to model the effect of growing-season weather variations on crop growth. The Spatial Transformer learns the high-resolution spatial dependency among counties for accurate agricultural tracking. The Temporal Transformer captures the long-range temporal dependency to learn the impact of long-term climate change on crops. We also devise a novel multi-modal contrastive learning technique to pre-train our model without extensive human supervision. Hence, MMST-ViT captures the impacts of both short-term weather variations and long-term climate change on crops by leveraging both satellite images and meteorological data. We have conducted extensive experiments on over 200 counties in the United States, and the results show that MMST-ViT outperforms its counterparts on three performance metrics.
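The abstract does not spell out the contrastive objective used for pre-training. As a rough illustration only, the sketch below assumes a generic symmetric InfoNCE-style loss between paired satellite-image embeddings and meteorological embeddings. The function names, the `temperature` value, and the toy embeddings are all hypothetical and are not taken from the paper; the actual MMST-ViT loss may differ.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes the vector is nonzero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def multimodal_contrastive_loss(image_emb, weather_emb, temperature=0.1):
    """Symmetric InfoNCE-style loss (an assumption, not the paper's exact loss).

    image_emb[i] and weather_emb[i] are treated as a positive pair, for
    example the same county and time window; every other pairing in the
    batch is treated as a negative.
    """
    img = [l2_normalize(v) for v in image_emb]
    wx = [l2_normalize(v) for v in weather_emb]
    # Cosine similarity between every image and every weather embedding.
    logits = [
        [sum(a * b for a, b in zip(u, w)) / temperature for w in wx]
        for u in img
    ]
    # Transposed logits give the weather-to-image direction.
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


# Toy embeddings: the two modalities agree in `aligned`, disagree in `mismatched`.
image_emb = [[1.0, 0.0], [0.0, 1.0]]
aligned = multimodal_contrastive_loss(image_emb, [[2.0, 0.0], [0.0, 3.0]])
mismatched = multimodal_contrastive_loss(image_emb, [[0.0, 3.0], [2.0, 0.0]])
print(aligned < mismatched)
```

Under this assumed objective, the loss is small when each image embedding is closest to its own weather embedding and large when the pairs are swapped, which is the behavior that lets the two modalities be aligned without yield labels.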