Semantics from Space: Satellite-Guided Thermal Semantic Segmentation Annotation for Aerial Field Robots (2403.14056v1)
Abstract: We present a new method to automatically generate semantic segmentation annotations for thermal imagery captured from an aerial vehicle by combining satellite-derived data products with onboard global positioning and attitude estimates. This capability addresses a central obstacle to developing thermal semantic perception algorithms for field robots: the lack of annotated thermal field datasets and the time and cost of manual annotation. It enables precise, rapid annotation of thermal data from field collection efforts at a massively parallelizable scale. By incorporating a thermal-conditioned refinement step with visual foundation models, our approach produces highly precise semantic segmentation labels from low-resolution satellite land cover data at little to no cost. It achieves 98.5% of the performance obtained with costly high-resolution alternatives and demonstrates a 70-160% improvement over popular zero-shot semantic segmentation methods based on large vision-language models currently used to generate annotations for RGB imagery. Code will be available at: https://github.com/connorlee77/aerial-auto-segment.
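The core geometric step the abstract describes, projecting georeferenced land-cover labels into the aerial camera's view using pose estimates, can be sketched as a standard pinhole projection. The function below is an illustrative reconstruction, not the authors' implementation: all names and the data layout (3D ground points lifted from a land-cover raster onto a terrain model, a camera intrinsics matrix `K`, and a world-to-camera pose `R`, `t`) are assumptions.

```python
import numpy as np

def project_landcover_labels(points_world, labels, K, R, t, image_hw):
    """Project georeferenced 3D ground points (e.g. land-cover raster cells
    lifted onto an elevation model) into the camera and rasterize their
    class labels into a label image. Illustrative sketch only.

    points_world: (N, 3) array of world coordinates
    labels:       (N,) integer class labels per point
    K:            (3, 3) camera intrinsics
    R, t:         world-to-camera rotation (3, 3) and translation (3,)
    image_hw:     (height, width) of the output label image
    """
    h, w = image_hw
    # World -> camera coordinates.
    pts_cam = (R @ points_world.T + t.reshape(3, 1)).T
    # Keep only points in front of the camera.
    front = pts_cam[:, 2] > 0
    pts_cam, labels = pts_cam[front], labels[front]
    # Perspective projection with intrinsics K.
    uvw = (K @ pts_cam.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    # Discard projections that fall outside the image bounds.
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    label_img = np.full((h, w), -1, dtype=int)  # -1 marks unlabeled pixels
    label_img[v[inside], u[inside]] = labels[inside]
    return label_img
```

In practice such a raw projection inherits the coarse resolution of the satellite land cover product, which is why the paper's thermal-conditioned refinement step with visual foundation models is needed to sharpen label boundaries against the actual thermal image content.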