
SkyScenes: A Synthetic Dataset for Aerial Scene Understanding (2312.06719v2)

Published 11 Dec 2023 in cs.CV

Abstract: Real-world aerial scene understanding is limited by a lack of datasets that contain densely annotated images curated under a diverse set of conditions. Due to inherent challenges in obtaining such images in controlled real-world settings, we present SkyScenes, a synthetic dataset of densely annotated aerial images captured from Unmanned Aerial Vehicle (UAV) perspectives. We carefully curate SkyScenes images from CARLA to comprehensively capture diversity across layouts (urban and rural maps), weather conditions, times of day, pitch angles and altitudes with corresponding semantic, instance and depth annotations. Through our experiments using SkyScenes, we show that (1) models trained on SkyScenes generalize well to different real-world scenarios, (2) augmenting training on real images with SkyScenes data can improve real-world performance, (3) controlled variations in SkyScenes can offer insights into how models respond to changes in viewpoint conditions (height and pitch), weather and time of day, and (4) incorporating additional sensor modalities (depth) can improve aerial scene understanding. Our dataset and associated generation code are publicly available at: https://hoffman-group.github.io/SkyScenes/

