Forest Inspection Dataset for Aerial Semantic Segmentation and Depth Estimation (2403.06621v1)
Abstract: UAVs are widely used to monitor changes in forest environments because they are lightweight and can collect a large variety of surveillance data. However, raw aerial imagery alone does not provide enough detail to understand the scene, which is needed to assess the degree of deforestation. Deep learning algorithms must be trained on large amounts of data to produce accurate interpretations, yet ground-truth annotated forest imagery is scarce. To address this problem, we introduce a new large-scale aerial dataset for forest inspection that contains both real-world and virtual recordings of natural environments, with densely annotated semantic segmentation labels and depth maps, captured under different illumination conditions and at various altitudes and recording angles. We evaluate two multi-scale neural networks on the semantic segmentation task (HRNet and PointFlow), studying the impact of the various acquisition conditions and the capabilities of transfer learning from virtual to real data. Our results show that the best performance is obtained when training on a dataset covering a large variety of scenarios, rather than splitting the data into specific categories. We also develop a framework to assess the degree of deforestation of an area.
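The deforestation-assessment idea described in the abstract can be illustrated with a minimal sketch: given a per-pixel class map produced by a segmentation network, compute the fraction of land pixels not covered by forest. The class indices and the helper name `deforestation_degree` below are hypothetical illustrations, not the paper's actual label map or implementation.

```python
import numpy as np

# Hypothetical class indices; the dataset's actual label map may differ.
TREE, GRASS, GROUND = 1, 2, 3

def deforestation_degree(mask: np.ndarray, forest_classes=(TREE,)) -> float:
    """Fraction of land pixels NOT covered by forest classes.

    mask: 2-D array of per-pixel class indices from a segmentation network.
    Returns a value in [0, 1]; 0 = fully forested, 1 = no forest cover.
    """
    # Pixels that count as assessable land (forest + non-forest vegetation/soil).
    relevant = np.isin(mask, forest_classes + (GRASS, GROUND))
    if not relevant.any():
        return 0.0
    forest = np.isin(mask, forest_classes)
    return 1.0 - forest.sum() / relevant.sum()

# Example: a 4x4 mask where half the pixels are trees, half bare ground.
mask = np.array([[TREE, TREE, GROUND, GROUND]] * 4)
print(deforestation_degree(mask))  # → 0.5
```

In practice such a score would be aggregated over tiles of a georeferenced orthomosaic so that the degree of deforestation can be localized, not just averaged over the whole survey area.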
- Y. Tian, K. Liu, K. Ok, L. Tran, D. Allen, N. Roy, and J. P. How, “Search and rescue under the forest canopy using multiple UAVs,” The International Journal of Robotics Research, vol. 39, no. 10-11, pp. 1201–1221, 2020.
- S. W. Chen, G. V. Nardari, E. S. Lee, C. Qu, X. Liu, R. A. F. Romero, and V. Kumar, “Sloam: Semantic lidar odometry and mapping for forest inventory,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 612–619, 2020.
- A. Kukko, R. Kaijaluoto, H. Kaartinen, V. V. Lehtola, A. Jaakkola, and J. Hyyppä, “Graph SLAM correction for single scanner MLS forest data under boreal forest canopy,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 132, pp. 199–209, 2017.
- TUGRAZ. (2019) Semantic Drone Dataset. [Online]. Available: https://www.tugraz.at/index.php?id=22387
- Y. Lyu, G. Vosselman, G.-S. Xia, A. Yilmaz, and M. Y. Yang, “UAVid: A semantic segmentation dataset for UAV imagery,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 165, pp. 108–119, 2020.
- A. Marcu, D. Costea, V. Licaret, and M. Leordeanu, “Towards Automatic Annotation for Semantic Segmentation in Drone Videos,” arXiv preprint arXiv:1910.10026, 2019.
- N. P. Koenig and A. G. Howard, “Design and use paradigms for Gazebo, an open-source multi-robot simulator,” in 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), vol. 3, 2004, pp. 2149–2154.
- M. Müller, V. Casser, J. Lahoud, N. Smith, and B. Ghanem, “Sim4CV: A Photo-Realistic Simulator for Computer Vision Applications,” International Journal of Computer Vision, vol. 126, no. 9, pp. 902–919, Sep. 2018.
- S. Shah, D. Dey, C. Lovett, and A. Kapoor, “AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles,” in Field and Service Robotics (FSR), 2017.
- A. Dosovitskiy, G. Ros, F. Codevilla, A. López, and V. Koltun, “CARLA: An Open Urban Driving Simulator,” in Conference on Robot Learning (CoRL), 2017.
- M. Fonder and M. Van Droogenbroeck, “Mid-Air: A multi-modal dataset for extremely low altitude drone flights,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019.
- L. Chen, F. Liu, Y. Zhao, W. Wang, X. Yuan, and J. Zhu, “Valid: A comprehensive virtual aerial image dataset,” in 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020, pp. 2009–2016.
- M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, “The Cityscapes dataset for semantic urban scene understanding,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3213–3223.
- B.-C.-Z. Blaga and S. Nedevschi, “Semantic segmentation learning for autonomous uavs using simulators and real data,” in 2019 IEEE 15th International Conference on Intelligent Computer Communication and Processing (ICCP). IEEE, 2019, pp. 303–310.
- B. Zhao, X. Zhang, Z. Li, and X. Hu, “A multi-scale strategy for deep semantic segmentation with convolutional neural networks,” Neurocomputing, vol. 365, pp. 273–284, 2019.
- L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834–848, 2017.
- S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. H. Torr, “Conditional random fields as recurrent neural networks,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1529–1537.
- F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” arXiv preprint arXiv:1511.07122, 2015.
- N. Audebert, B. Le Saux, and S. Lefèvre, “Semantic segmentation of earth observation data using multimodal and multi-scale deep networks,” in Asian Conference on Computer Vision. Springer, 2016, pp. 180–196.
- C. Peng, Y. Li, L. Jiao, Y. Chen, and R. Shang, “Densely based multi-scale and multi-modal fully convolutional networks for high-resolution remote-sensing image semantic segmentation,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 12, no. 8, pp. 2612–2626, 2019.
- H. Fang and F. Lafarge, “Pyramid scene parsing network in 3D: Improving semantic segmentation of point clouds with multi-scale contextual information,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 154, pp. 246–258, 2019.
- L. Ding, J. Zhang, and L. Bruzzone, “Semantic Segmentation of Large-Size VHR Remote Sensing Images Using a Two-Stage Multiscale Training Architecture,” IEEE Transactions on Geoscience and Remote Sensing, 2020.
- J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
- O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention. Springer, 2015, pp. 234–241.
- H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid scene parsing network,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881–2890.
- V. Badrinarayanan, A. Kendall, and R. Cipolla, “Segnet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481–2495, 2017.
- L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 801–818.
- D. Lin, Y. Ji, D. Lischinski, D. Cohen-Or, and H. Huang, “Multi-scale context intertwining for semantic segmentation,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 603–619.
- J. He, Z. Deng, and Y. Qiao, “Dynamic multi-scale filters for semantic segmentation,” in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 3562–3572.
- T. Pohlen, A. Hermans, M. Mathias, and B. Leibe, “Full-resolution residual networks for semantic segmentation in street scenes,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4151–4160.
- M. Yang, K. Yu, C. Zhang, Z. Li, and K. Yang, “Denseaspp for semantic segmentation in street scenes,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3684–3692.
- J. Wang, K. Sun, T. Cheng, B. Jiang, C. Deng, Y. Zhao, D. Liu, Y. Mu, M. Tan, X. Wang et al., “Deep high-resolution representation learning for visual recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
- L. Mou, Y. Hua, and X. X. Zhu, “Relation matters: Relational context-aware fully convolutional network for semantic segmentation of high-resolution aerial images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 58, no. 11, pp. 7557–7569, 2020.
- A. Li, L. Jiao, H. Zhu, L. Li, and F. Liu, “Multitask semantic boundary awareness network for remote sensing image segmentation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–14, 2021.
- G. Deng, Z. Wu, C. Wang, M. Xu, and Y. Zhong, “CCANet: Class-constraint coarse-to-fine attentional deep network for subdecimeter aerial image semantic segmentation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–20, 2021.
- R. Niu, X. Sun, Y. Tian, W. Diao, K. Chen, and K. Fu, “Hybrid multiple attention network for semantic segmentation in aerial images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–18, 2021.
- K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
- C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, “Inception-v4, inception-resnet and the impact of residual connections on learning,” in Thirty-first AAAI Conference on Artificial Intelligence, 2017.
- A. Kirillov, R. Girshick, K. He, and P. Dollár, “Panoptic feature pyramid networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 6399–6408.
- A. Kirillov, Y. Wu, K. He, and R. Girshick, “Pointrend: Image segmentation as rendering,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 9799–9808.
- L. Mi and Z. Chen, “Superpixel-enhanced deep neural forest for remote sensing image semantic segmentation,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 159, pp. 140–152, 2020.
- M. Zhen, J. Wang, L. Zhou, S. Li, T. Shen, J. Shang, T. Fang, and L. Quan, “Joint semantic segmentation and boundary detection using iterative pyramid contexts,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 13666–13675.
- X. Sun, A. Shi, H. Huang, and H. Mayer, “BASNet: Boundary-aware semi-supervised semantic segmentation network for very high resolution remote sensing images,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 13, pp. 5398–5413, 2020.
- Y. Feng, W. Diao, X. Sun, J. Li, K. Chen, K. Fu, and X. Gao, “NPALoss: Neighboring pixel affinity loss for semantic segmentation in high-resolution aerial imagery,” ISPRS Annals of Photogrammetry, Remote Sensing & Spatial Information Sciences, vol. 5, no. 2, 2020.
- X. Li, H. He, X. Li, D. Li, G. Cheng, J. Shi, L. Weng, Y. Tong, and Z. Lin, “Pointflow: Flowing semantics through points for aerial image segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 4217–4226.
- Y. Liu, Q. Zhu, F. Cao, J. Chen, and G. Lu, “High-Resolution Remote Sensing Image Segmentation Framework Based on Attention Mechanism and Adaptive Weighting,” ISPRS International Journal of Geo-Information, vol. 10, no. 4, p. 241, 2021.
- H. Florea, V.-C. Miclea, and S. Nedevschi, “WildUAV: Monocular UAV Dataset for Depth Estimation Tasks,” in 2021 IEEE 17th International Conference on Intelligent Computer Communication and Processing (ICCP). IEEE, 2021, pp. 291–298.
- D. Environments. (2016) European Forest. [Online]. Available: https://www.unrealengine.com/marketplace/en-US/product/european-forest
- S. S. Blueprints. (2019) Vehicle Variety Pack. [Online]. Available: https://www.unrealengine.com/marketplace/en-US/product/bbcb90a03f844edbb20c8b89ee16ea32
- B. Alvey, D. T. Anderson, A. Buck, M. Deardorff, G. Scott, and J. M. Keller, “Simulated photorealistic deep learning framework and workflows to accelerate computer vision and unmanned aerial vehicle research,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 3889–3898.