Semi-supervised Counting via Pixel-by-pixel Density Distribution Modelling (2402.15297v1)
Abstract: This paper focuses on semi-supervised crowd counting, where only a small portion of the training data is labeled. We formulate the pixel-wise density value to be regressed as a probability distribution rather than a single deterministic value. On this basis, we propose a semi-supervised crowd-counting model. First, we design a pixel-wise distribution matching loss to measure the difference between the predicted and ground-truth pixel-wise density distributions. Second, we enhance the transformer decoder with density tokens that specialize the decoder's forward passes for different density intervals. Third, we design an interleaving consistency self-supervised learning mechanism to learn efficiently from unlabeled data. Extensive experiments on four datasets show that our method outperforms competing methods by a large margin under various labeled-ratio settings. Code will be released at https://github.com/LoraLinH/Semi-supervised-Counting-via-Pixel-by-pixel-Density-Distribution-Modelling.
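To make the idea of comparing per-pixel density distributions concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's loss: the abstract does not give the exact formulation, so this sketch assumes each pixel's density is discretized into K equally spaced bins and compares prediction and ground truth by the area between their cumulative distribution functions, which equals the 1-Wasserstein distance in one dimension. The function name `pixelwise_cdf_distance` and the `bin_width` parameter are hypothetical.

```python
import numpy as np


def pixelwise_cdf_distance(pred_probs, gt_probs, bin_width=1.0):
    """Mean per-pixel CDF distance between two discretized density distributions.

    Assumption (not from the paper): both inputs have shape (H, W, K), and the
    last axis is a probability distribution over K equally spaced density bins.
    """
    # Cumulative distributions along the density-bin axis.
    pred_cdf = np.cumsum(pred_probs, axis=-1)
    gt_cdf = np.cumsum(gt_probs, axis=-1)
    # Area between the two CDFs at each pixel (1-D 1-Wasserstein distance).
    per_pixel = np.abs(pred_cdf - gt_cdf).sum(axis=-1) * bin_width
    return per_pixel.mean()


# Toy example: a 1x2 "image" with K = 4 density bins.
K = 4
gt = np.zeros((1, 2, K))
pred = np.zeros((1, 2, K))
gt[0, 0, 1] = 1.0    # pixel 0: ground truth in bin 1
pred[0, 0, 1] = 1.0  # pixel 0: prediction matches exactly
gt[0, 1, 3] = 1.0    # pixel 1: ground truth in bin 3
pred[0, 1, 0] = 1.0  # pixel 1: prediction is three bins away

loss = pixelwise_cdf_distance(pred, gt)
print(loss)
```

In the toy example the first pixel contributes 0 and the second contributes 3 (the prediction is three bins away), so the mean is 1.5. Unlike a per-bin cross-entropy, a CDF-based distance grows with how far the predicted mass sits from the target bin, which is one plausible reason to prefer it for ordered density intervals.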
Authors:
- Hui Lin
- Zhiheng Ma
- Rongrong Ji
- Yaowei Wang
- Zhou Su
- Xiaopeng Hong
- Deyu Meng