
Single Domain Generalization for Crowd Counting (2403.09124v2)

Published 14 Mar 2024 in cs.CV

Abstract: Due to its promising results, density map regression has been widely employed for image-based crowd counting. The approach, however, often suffers from severe performance degradation when tested on data from unseen scenarios, the so-called "domain shift" problem. To address the problem, we investigate in this work single domain generalization (SDG) for crowd counting. The existing SDG approaches are mainly for image classification and segmentation, and can hardly be extended to our case due to its regression nature and label ambiguity (i.e., ambiguous pixel-level ground truths). We propose MPCount, a novel effective SDG approach even for narrow source distribution. MPCount stores diverse density values for density map regression and reconstructs domain-invariant features by means of only one memory bank, a content error mask and attention consistency loss. By partitioning the image into grids, it employs patch-wise classification as an auxiliary task to mitigate label ambiguity. Through extensive experiments on different datasets, MPCount is shown to significantly improve counting accuracy compared to the state of the art under diverse scenarios unobserved in the training data characterized by narrow source distribution. Code is available at https://github.com/Shimmer93/MPCount.
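
The grid partitioning mentioned in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that a patch is labelled positive when the density summed over it exceeds a threshold; the abstract does not state the labelling rule. The function name `patch_labels` and its parameters `patch` and `threshold` are hypothetical and are not taken from the MPCount code.

```python
from typing import List


def patch_labels(
    density: List[List[float]], patch: int, threshold: float = 0.0
) -> List[List[int]]:
    """Derive a coarse binary label per grid cell from a density map.

    Assumption (not from the paper): a cell is labelled 1 when the density
    summed over the cell exceeds `threshold`, otherwise 0.
    """
    if patch <= 0:
        raise ValueError("patch size must be positive")
    height = len(density)
    width = len(density[0]) if height else 0
    if height % patch or width % patch:
        raise ValueError("density map size must be divisible by the patch size")

    labels: List[List[int]] = []
    for top in range(0, height, patch):
        row: List[int] = []
        for left in range(0, width, patch):
            # Sum the density inside the current patch x patch cell.
            total = sum(
                density[y][x]
                for y in range(top, top + patch)
                for x in range(left, left + patch)
            )
            row.append(1 if total > threshold else 0)
        labels.append(row)
    return labels


# Usage: a 4x4 density map with a single head annotated in the top-left cell.
example = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(patch_labels(example, patch=2))
```

Coarse per-patch labels of this kind are less sensitive to the exact pixel at which a head is annotated than pixel-level density values, which is one plausible reading of how an auxiliary classification task could mitigate label ambiguity.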

