
Dynamic Proxy Domain Generalizes the Crowd Localization by Better Binary Segmentation (2404.13992v2)

Published 22 Apr 2024 in cs.CV

Abstract: Crowd localization aims to predict the precise location of each instance in an image. Current advanced methods cast this congested prediction task as pixel-wise binary classification, in which learned pixel-level thresholds binarize the confidence that a pixel belongs to a pedestrian head. Since crowd scenes vary drastically in content, count, and scale, the confidence-threshold learner is fragile and generalizes poorly under domain shift. Moreover, the target domain is usually unknown during training. It is therefore imperative to enhance the generalization of the confidence-threshold locator to the latent target domain. In this paper, we propose a Dynamic Proxy Domain (DPD) method to generalize the learner under domain shift. Concretely, starting from a theoretical analysis of the upper bound on the generalization error risk of a binary classifier on the latent target domain, we propose introducing a generated proxy domain to facilitate generalization. Guided by this theory, we design the DPD algorithm, composed of a training paradigm and a proxy domain generator, to enhance the domain generalization of the confidence-threshold learner. We evaluate our method on five kinds of domain shift scenarios, demonstrating its effectiveness in generalizing crowd localization. Our code will be available at https://github.com/zhangda1018/DPD.
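The pixel-wise binarization the abstract describes can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: the function name, array shapes, and the uniform threshold value are all assumptions for demonstration.

```python
import numpy as np

def binarize_confidence(conf_map, thr_map):
    """Binarize a per-pixel head-confidence map with per-pixel thresholds.

    conf_map: (H, W) float array in [0, 1], predicted confidence that each
              pixel belongs to a pedestrian head.
    thr_map:  (H, W) float array of learned pixel-level thresholds.
    Returns an (H, W) uint8 mask where 1 marks predicted head pixels.
    """
    return (conf_map >= thr_map).astype(np.uint8)

# Toy example: a 2x2 confidence map against a uniform 0.5 threshold.
conf = np.array([[0.9, 0.2],
                 [0.6, 0.4]])
thr = np.full((2, 2), 0.5)
mask = binarize_confidence(conf, thr)  # [[1, 0], [1, 0]]
```

Because the thresholds themselves are learned, a distribution shift in the confidence maps (new scene content, count, or scale) can invalidate them, which is the fragility the paper targets.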

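The DPD training paradigm pairs each source batch with a batch from a generated proxy domain. The sketch below conveys only the overall loop structure; the stand-in generator (photometric jitter), the loss weighting, and all names are assumptions for illustration. The paper's actual proxy domain generator and training objective are defined in the full text.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_proxy_batch(src_batch, rng):
    """Stand-in proxy-domain generator: random photometric gain per image.
    (The actual DPD generator differs; this is only a placeholder.)"""
    gain = rng.uniform(0.8, 1.2, size=(src_batch.shape[0], 1, 1, 1))
    return np.clip(src_batch * gain, 0.0, 1.0)

def training_step(model_update, src_batch, src_labels, rng, proxy_weight=0.5):
    """One DPD-style step: update on the source batch, then on a generated
    proxy batch that reuses the source labels, down-weighted by proxy_weight."""
    loss_src = model_update(src_batch, src_labels)
    proxy_batch = make_proxy_batch(src_batch, rng)
    loss_proxy = model_update(proxy_batch, src_labels)
    return loss_src + proxy_weight * loss_proxy

def dummy_update(batch, labels):
    # Placeholder "model update": returns a scalar loss for demonstration.
    return float(np.mean((batch - labels) ** 2))

src = rng.uniform(size=(2, 1, 4, 4))      # 2 images, 1 channel, 4x4 pixels
labels = np.zeros_like(src)               # dummy binary segmentation targets
total_loss = training_step(dummy_update, src, labels, rng)
```

The intent, per the abstract's theory, is that training against a dynamically generated proxy domain tightens the generalization error bound of the binary classifier on the unseen target domain.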
