NWPU-MOC: A Benchmark for Fine-grained Multi-category Object Counting in Aerial Images (2401.10530v1)

Published 19 Jan 2024 in cs.CV

Abstract: Object counting is a hot topic in computer vision, which aims to estimate the number of objects in a given image. However, most methods count objects of only a single category per image, so they cannot be applied to scenes that require counting objects of multiple categories simultaneously, especially aerial scenes. To this end, this paper introduces a Multi-category Object Counting (MOC) task to estimate the numbers of different objects (cars, buildings, ships, etc.) in an aerial image. Considering the absence of a dataset for this task, a large-scale dataset (NWPU-MOC) is collected, consisting of 3,416 scenes at a resolution of 1024 $\times$ 1024 pixels, annotated with 14 fine-grained object categories. In addition, each scene contains both RGB and Near Infrared (NIR) images, where the NIR spectrum provides richer characterization information than the RGB spectrum alone. Based on NWPU-MOC, the paper presents a multi-spectrum, multi-category object counting framework, which employs a dual-attention module to fuse RGB and NIR features and subsequently regresses multi-channel density maps, one channel per object category. Furthermore, to model the dependency between the density-map channels of different object categories, a spatial contrast loss is designed as a penalty for overlapping predictions at the same spatial position. Experimental results demonstrate that the proposed method achieves state-of-the-art performance compared with mainstream counting algorithms. The dataset, code and models are publicly available at https://github.com/lyongo/NWPU-MOC.
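The framework described above regresses a multi-channel density map (one channel per category), reads off per-category counts by summing each channel, and adds a spatial contrast loss that penalizes overlapping predictions at the same spatial position. A minimal NumPy sketch of those two ideas follows; the function names and the exact form of the penalty are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def count_from_density(density):
    """Per-category counts from a (C, H, W) multi-channel density map:
    the count for category c is the sum of channel c."""
    return density.sum(axis=(1, 2))

def spatial_contrast_penalty(density):
    """Hypothetical overlap penalty (sketch, not the paper's exact loss):
    at each pixel, sum the products of every pair of distinct category
    channels, so density mass placed at the same location by two
    categories is penalized while single-category mass is not."""
    total = density.sum(axis=0)         # (H, W): per-pixel sum over channels
    sq = (density ** 2).sum(axis=0)     # (H, W): per-pixel sum of squares
    pairwise = 0.5 * (total ** 2 - sq)  # sum_{i<j} d_i * d_j at each pixel
    return pairwise.mean()
```

With this formulation, a map whose categories occupy disjoint pixels incurs zero penalty, while any spatial overlap between channels contributes a positive term, which is the qualitative behavior the abstract ascribes to the spatial contrast loss.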

