VMambaCC: A Visual State Space Model for Crowd Counting

Published 7 May 2024 in cs.CV | (2405.03978v1)

Abstract: As a deep learning model, Visual Mamba (VMamba) has low computational complexity and a global receptive field, and has been successfully applied to image classification and detection. To extend its applications, we apply VMamba to crowd counting and propose a novel VMambaCC (VMamba Crowd Counting) model. VMambaCC inherits the merits of VMamba, namely global modeling of images and low computational cost. Additionally, we design a Multi-head High-level Feature (MHF) attention mechanism for VMambaCC. MHF is a new attention mechanism that uses high-level semantic features to augment low-level semantic features, thereby enhancing spatial feature representation with greater precision. Building upon MHF, we further present a High-level Semantic Supervised Feature Pyramid Network (HS2PFN) that progressively integrates and enhances high-level semantic information with low-level semantic information. Extensive experimental results on five public datasets validate the efficacy of our approach. For example, our method achieves a mean absolute error of 51.87 and a mean squared error of 81.3 on the ShangHaiTech_PartA dataset. Our code is coming soon.
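
The abstract reports a mean absolute error and a mean squared error but does not define them. As a minimal sketch, assuming the convention common in crowd counting work (errors are computed on per-image total counts, and the value reported as "MSE" is the square root of the mean squared count error), the two metrics could be computed as below. The function name `counting_errors` and the sample counts are hypothetical and are not taken from the paper.

```python
import math


def counting_errors(predicted, ground_truth):
    """Return (MAE, root of mean squared error) over per-image crowd counts.

    Assumption: one predicted count and one ground-truth count per image.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need two non-empty count lists of equal length")
    diffs = [p - g for p, g in zip(predicted, ground_truth)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    # Assumed convention: the square root is taken before reporting.
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse


# Hypothetical counts for three test images (illustrative only).
pred = [102.0, 48.0, 310.0]
gt = [100.0, 50.0, 300.0]
mae, rmse = counting_errors(pred, gt)
print(f"MAE={mae:.2f}, MSE (root form)={rmse:.2f}")
```

Because the squared term weights large misses more heavily, the second metric is never smaller than the first, which matches the ordering of the figures quoted in the abstract.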
