TreeFormer: a Semi-Supervised Transformer-based Framework for Tree Counting from a Single High Resolution Image (2307.06118v1)

Published 12 Jul 2023 in cs.CV and cs.AI

Abstract: Automatic tree density estimation and counting from single aerial and satellite images is a challenging task in photogrammetry and remote sensing, yet it plays an important role in forest management. In this paper, we propose the first semi-supervised transformer-based framework for tree counting, which reduces the need for expensive tree annotations in remote sensing images. Our method, termed TreeFormer, first develops a pyramid tree representation module based on transformer blocks to extract multi-scale features during the encoding stage. Contextual attention-based feature fusion and tree density regressor modules are further designed to use the robust encoder features to estimate tree density maps in the decoder. Moreover, we propose a pyramid learning strategy that includes local tree density consistency and local tree count ranking losses to incorporate unlabeled images into the training process. Finally, a tree counter token is introduced to regulate the network by computing global tree counts for both labeled and unlabeled images. Our model was evaluated on two benchmark tree counting datasets, Jiangsu and Yosemite, as well as a new dataset, KCL-London, that we created. TreeFormer outperforms state-of-the-art semi-supervised methods under the same setting and exceeds fully supervised methods using the same number of labeled images. The code and datasets are available at https://github.com/HAAClassic/TreeFormer.
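
The two pyramid-learning losses that let TreeFormer exploit unlabeled images can be illustrated compactly. The PyTorch-style sketch below is a minimal illustration, not the paper's exact formulation: it assumes a network that outputs per-pixel tree density maps of shape (B, 1, H, W), and the function names, crop scheme, and margin are hypothetical.

```python
import torch
import torch.nn.functional as F

def local_count_ranking_loss(density_outer: torch.Tensor,
                             density_inner: torch.Tensor,
                             margin: float = 0.0) -> torch.Tensor:
    """Ranking constraint on an unlabeled image (hedged sketch): a crop
    nested inside a larger crop cannot contain more trees, so its
    integrated count should not exceed the outer crop's count."""
    count_outer = density_outer.sum(dim=(1, 2, 3))  # integrate density map
    count_inner = density_inner.sum(dim=(1, 2, 3))
    # Hinge penalty: only active when the nested crop's count is larger.
    return F.relu(count_inner - count_outer + margin).mean()

def local_density_consistency_loss(density_a: torch.Tensor,
                                   density_b: torch.Tensor) -> torch.Tensor:
    """Consistency constraint (hedged sketch): density predictions for the
    same local region, taken from two pyramid levels, should agree after
    being resized to a common resolution."""
    density_b = F.interpolate(density_b, size=density_a.shape[-2:],
                              mode="bilinear", align_corners=False)
    return F.mse_loss(density_a, density_b)
```

In a training loop, both terms would be computed from nested crops of each unlabeled image and added, with weighting coefficients, to the supervised density regression loss on the labeled images; the weighting scheme is an assumption here, since the abstract does not specify how the losses are balanced.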
