Semi-supervised Counting via Pixel-by-pixel Density Distribution Modelling (2402.15297v1)

Published 23 Feb 2024 in cs.CV and cs.LG

Abstract: This paper focuses on semi-supervised crowd counting, where only a small portion of the training data is labeled. We formulate the pixel-wise density value to be regressed as a probability distribution rather than a single deterministic value. On this basis, we propose a semi-supervised crowd-counting model. First, we design a pixel-wise distribution matching loss to measure the differences between the predicted and ground-truth pixel-wise density distributions; second, we enhance the transformer decoder by using density tokens to specialize the forward passes of the decoder with respect to different density intervals; third, we design an interleaving consistency self-supervised learning mechanism to learn from unlabeled data efficiently. Extensive experiments on four datasets show that our method clearly outperforms the competitors by a large margin under various labeled-ratio settings. Code will be released at https://github.com/LoraLinH/Semi-supervised-Counting-via-Pixel-by-pixel-Density-Distribution-Modelling.
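The pixel-wise distribution matching idea can be illustrated with a short sketch. Below is a minimal, hypothetical PyTorch implementation that assumes each pixel's density is modelled as a discrete distribution over K density bins and that the matching loss is the L1 distance between the per-pixel cumulative distributions (the 1-Wasserstein distance for 1-D discrete distributions on a unit bin grid). The function name, bin layout, and tensor shapes are illustrative assumptions; the authors' released code is the authoritative implementation.

```python
# Minimal sketch (assumptions, not the authors' released code): each pixel's
# density is treated as a discrete distribution over K density bins, and the
# matching loss is the L1 distance between per-pixel CDFs.
import torch
import torch.nn.functional as F

def pixelwise_distribution_matching_loss(pred_logits: torch.Tensor,
                                         gt_probs: torch.Tensor) -> torch.Tensor:
    """pred_logits: (B, K, H, W) unnormalized per-pixel scores over K bins.
    gt_probs:     (B, K, H, W) per-pixel ground-truth bin probabilities.
    Returns the mean over pixels of the area between the two per-pixel CDFs."""
    pred_probs = F.softmax(pred_logits, dim=1)          # normalize to a distribution
    pred_cdf = torch.cumsum(pred_probs, dim=1)          # per-pixel CDF, (B, K, H, W)
    gt_cdf = torch.cumsum(gt_probs, dim=1)
    return (pred_cdf - gt_cdf).abs().sum(dim=1).mean()  # sum over bins, mean over pixels

# Shape-only usage example with random tensors.
if __name__ == "__main__":
    B, K, H, W = 2, 8, 64, 64
    pred = torch.randn(B, K, H, W)
    gt = F.softmax(torch.randn(B, K, H, W), dim=1)
    print(pixelwise_distribution_matching_loss(pred, gt).item())
```

For the semi-supervised part, a loss of this form could also be applied between a student's prediction and a teacher's pseudo-distribution on unlabeled images, in line with the consistency mechanism described in the abstract; that pairing is an assumption here, not a statement of the paper's exact training recipe.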

Authors (7)
  1. Hui Lin (54 papers)
  2. Zhiheng Ma (21 papers)
  3. Rongrong Ji (315 papers)
  4. Yaowei Wang (149 papers)
  5. Zhou Su (51 papers)
  6. Xiaopeng Hong (59 papers)
  7. Deyu Meng (182 papers)
Citations (1)
