GeoViT: A Versatile Vision Transformer Architecture for Geospatial Image Analysis (2311.14301v1)
Abstract: Greenhouse gases (GHGs) are pivotal drivers of climate change, necessitating precise quantification and source identification to inform mitigation strategies. We introduce GeoViT, a compact vision transformer model adept at processing satellite imagery for multimodal segmentation, classification, and regression tasks targeting CO2 and NO2 emissions. With GeoViT, we attain superior accuracy in estimating power generation rate, fuel type, and plume coverage for CO2, as well as in high-resolution NO2 concentration mapping, surpassing previous state-of-the-art models while significantly reducing model size. GeoViT demonstrates the efficacy of vision transformer architectures in harnessing satellite-derived data for enhanced GHG emission insights, advancing climate change monitoring and emission regulation efforts globally.
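To make the multi-task design concrete, below is a minimal PyTorch sketch of a compact vision transformer with segmentation, classification, and regression heads in the spirit of GeoViT. All specifics here are illustrative assumptions, not the paper's actual configuration: the band count (12, as in Sentinel-2 imagery), input size (120x120, as in BigEarthNet patches), patch size, depth, and head layouts are placeholders, since the abstract does not specify them.

```python
# Hypothetical multi-task ViT sketch; hyperparameters and head designs are
# assumptions, not GeoViT's published configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskViT(nn.Module):
    def __init__(self, in_channels=12, img_size=120, patch_size=8,
                 dim=256, depth=6, heads=8, num_classes=2):
        super().__init__()
        self.patch_size = patch_size
        self.grid = img_size // patch_size  # tokens per spatial axis
        # Non-overlapping ViT-style patch embedding via a strided conv.
        self.patch_embed = nn.Conv2d(in_channels, dim,
                                     kernel_size=patch_size, stride=patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, self.grid ** 2, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Task heads matching the abstract's task set: plume segmentation,
        # fuel-type classification, power-generation-rate regression.
        self.seg_head = nn.Conv2d(dim, 1, kernel_size=1)
        self.cls_head = nn.Linear(dim, num_classes)
        self.reg_head = nn.Linear(dim, 1)

    def forward(self, x):
        B = x.shape[0]
        # (B, C, H, W) -> (B, N, dim) token sequence.
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)
        tokens = self.encoder(tokens + self.pos_embed)
        # Segmentation: fold tokens back to a grid, upsample to input size.
        grid_feats = tokens.transpose(1, 2).reshape(B, -1, self.grid, self.grid)
        seg = F.interpolate(self.seg_head(grid_feats),
                            scale_factor=self.patch_size,
                            mode="bilinear", align_corners=False)
        pooled = tokens.mean(dim=1)  # global average over tokens
        return seg, self.cls_head(pooled), self.reg_head(pooled)

if __name__ == "__main__":
    model = MultiTaskViT()
    seg, cls_logits, rate = model(torch.randn(2, 12, 120, 120))
    print(seg.shape, cls_logits.shape, rate.shape)
    # torch.Size([2, 1, 120, 120]) torch.Size([2, 2]) torch.Size([2, 1])
```

Sharing one transformer encoder across all three heads is what keeps such a model compact: the heads add only a few thousand parameters each on top of the shared backbone, and a combined loss (e.g., a weighted sum of per-task losses) trains all tasks jointly.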