DiffusionSat: A Generative Foundation Model for Satellite Imagery (2312.03606v2)
Abstract: Diffusion models have achieved state-of-the-art results on many modalities, including images, speech, and video. However, existing models are not tailored to support remote sensing data, which is widely used in important applications including environmental monitoring and crop-yield prediction. Satellite images differ significantly from natural images (they can be multi-spectral and irregularly sampled across time), and existing diffusion models trained on images from the Web do not support them. Furthermore, remote sensing data is inherently spatio-temporal, requiring conditional generation tasks not supported by traditional methods based on captions or images. In this paper, we present DiffusionSat, the largest generative foundation model to date, trained on a collection of large, publicly available, high-resolution remote sensing datasets. As text-based captions are sparsely available for satellite images, we incorporate the associated metadata, such as geolocation, as conditioning information. Our method produces realistic samples and can be used to solve multiple generative tasks, including temporal generation, super-resolution given multi-spectral inputs, and in-painting. Our method outperforms previous state-of-the-art methods for satellite image generation and is the first large-scale generative foundation model for satellite imagery. The project website can be found here: https://samar-khanna.github.io/DiffusionSat/
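The abstract says DiffusionSat uses metadata such as geolocation as conditioning information because text captions are sparse for satellite images. The sketch below shows one plausible way to turn numeric metadata into a conditioning vector: each field is normalized, encoded with the sinusoidal scheme commonly used for diffusion timesteps, projected, and added to the timestep embedding. The field list (`lat`, `lon`, `gsd`, `cloud_cover`, `year`), the normalization ranges, the embedding sizes, and the random projection weights are all hypothetical choices for illustration, not the paper's exact configuration.

```python
import numpy as np

# Hypothetical metadata fields and normalization ranges (assumptions for
# illustration; not taken from the DiffusionSat paper).
RANGES = {
    "lat": (-90.0, 90.0),          # degrees
    "lon": (-180.0, 180.0),        # degrees
    "gsd": (0.0, 100.0),           # ground sampling distance, metres/pixel
    "cloud_cover": (0.0, 1.0),     # fraction of the image
    "year": (1980.0, 2030.0),
}


def sinusoidal_embedding(value, dim=8, max_period=10000.0):
    """Encode a scalar as `dim` sin/cos features, as is common for diffusion timesteps."""
    half = dim // 2
    # Frequencies decay geometrically from 1 to 1/max_period.
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    args = float(value) * freqs
    return np.concatenate([np.sin(args), np.cos(args)])


def normalize(name, value, scale=1000.0):
    """Map a raw metadata value to [0, scale] so it spans a timestep-like range."""
    lo, hi = RANGES[name]
    return scale * (value - lo) / (hi - lo)


def metadata_conditioning(meta, t, dim=8, d_model=32, seed=0):
    """Return timestep embedding + projected metadata embedding, shape (d_model,)."""
    rng = np.random.default_rng(seed)
    # One sinusoidal embedding per field, concatenated in a fixed order.
    meta_vec = np.concatenate(
        [sinusoidal_embedding(normalize(k, meta[k]), dim) for k in RANGES]
    )
    # Random matrices stand in for the learned projections of a trained model.
    w_time = rng.standard_normal((dim, d_model)) / np.sqrt(dim)
    w_meta = rng.standard_normal((meta_vec.size, d_model)) / np.sqrt(meta_vec.size)
    t_emb = sinusoidal_embedding(t, dim) @ w_time
    return t_emb + meta_vec @ w_meta


meta = {"lat": 37.4, "lon": -122.1, "gsd": 0.5, "cloud_cover": 0.1, "year": 2021}
cond = metadata_conditioning(meta, t=500)
print(cond.shape)  # (32,)
```

Adding the metadata embedding to the timestep embedding means that every network block which already receives the timestep signal also sees the metadata; in a trained model both projections would be learned rather than drawn at random.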
Authors:
- Samar Khanna
- Patrick Liu
- Linqi Zhou
- Chenlin Meng
- Robin Rombach
- Marshall Burke
- David Lobell
- Stefano Ermon