In the Search for Optimal Multi-view Learning Models for Crop Classification with Global Remote Sensing Data (2403.16582v2)
Abstract: Studying and analyzing cropland is difficult because of its dynamic and heterogeneous growth behavior, and diverse data sources are usually collected to estimate it. Although deep learning models excel at crop classification, they face substantial challenges when dealing with multiple inputs, a setting known as Multi-View Learning (MVL). Methods for the MVL scenario can be organized by encoder architecture, fusion strategy, and optimization technique. The literature has focused primarily on specific encoder architectures applied to local regions and has not explored the other components of the MVL methodology in depth. In contrast, we investigate the simultaneous selection of the fusion strategy and the encoder architecture, assessing global-scale cropland and crop-type classification. We consider five fusion strategies (Input, Feature, Decision, Ensemble, Hybrid) and five temporal encoders (LSTM, GRU, TempCNN, TAE, L-TAE) as possible configurations of the MVL method. We validate on the CropHarvest dataset, which provides optical, radar, and weather time series together with topographic information as input data. We found that, in scenarios with a limited number of labeled samples, no single configuration is sufficient for all cases. Instead, a specialized combination of encoder and fusion strategy should be carefully sought. To streamline this search, we suggest first identifying the optimal encoder architecture for a particular fusion strategy and then determining the most suitable fusion strategy for the classification task. We provide a methodological framework for researchers exploring crop classification through an MVL methodology.
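The two-step search suggested in the abstract can be sketched as follows. This is a minimal illustration of one possible reading (pick the best encoder for each fusion strategy, then compare fusion strategies), not the authors' implementation. The function `two_stage_search` and the `evaluate` callable are hypothetical names introduced here; `evaluate` is assumed to train one configuration and return a validation score where higher is better.

```python
from typing import Callable, Dict, Sequence, Tuple

# Configuration space named in the abstract.
FUSIONS = ["Input", "Feature", "Decision", "Ensemble", "Hybrid"]
ENCODERS = ["LSTM", "GRU", "TempCNN", "TAE", "L-TAE"]


def two_stage_search(
    evaluate: Callable[[str, str], float],
    fusions: Sequence[str] = FUSIONS,
    encoders: Sequence[str] = ENCODERS,
) -> Tuple[str, str, float, Dict[str, Tuple[str, float]]]:
    """Select an (encoder, fusion) pair in two steps.

    `evaluate(fusion, encoder)` is a hypothetical user-supplied function
    (an assumption of this sketch) that returns a validation score.
    """
    # Step 1: for each fusion strategy, keep its best-scoring encoder.
    best_per_fusion: Dict[str, Tuple[str, float]] = {}
    for fusion in fusions:
        scores = {enc: evaluate(fusion, enc) for enc in encoders}
        best_enc = max(scores, key=scores.get)
        best_per_fusion[fusion] = (best_enc, scores[best_enc])

    # Step 2: compare fusion strategies, each paired with its own best encoder.
    best_fusion = max(best_per_fusion, key=lambda f: best_per_fusion[f][1])
    best_enc, best_score = best_per_fusion[best_fusion]
    return best_fusion, best_enc, best_score, best_per_fusion
```

A call such as `two_stage_search(my_evaluate)` returns the selected fusion strategy, its encoder, the score, and the per-fusion winners, so the intermediate step-1 choices remain available for inspection. Under this reading every pair is still evaluated once; a cheaper variant would fix a single fusion strategy in step 1 and reuse its best encoder across all fusion strategies in step 2.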