
AFreeCA: Annotation-Free Counting for All (2403.04943v2)

Published 7 Mar 2024 in cs.CV

Abstract: Object counting methods typically rely on manually annotated datasets. The cost of creating such datasets has restricted these networks to counting objects from specific classes (such as humans or penguins), and counting objects from diverse categories remains a challenge. The availability of robust text-to-image latent diffusion models (LDMs) raises the question of whether these models can be used to generate counting datasets. LDMs struggle to create images with an exact number of objects based solely on text prompts, but they can offer a dependable sorting signal by adding and removing objects within an image. Leveraging this data, we first introduce an unsupervised sorting methodology to learn object-related features, which are subsequently refined and anchored for counting using counting data generated by LDMs. Further, we present a density classifier-guided method for dividing an image into patches containing objects that can be reliably counted. Consequently, we can generate counting data for any type of object and count them in an unsupervised manner. Our approach outperforms other unsupervised and few-shot alternatives and is not restricted to specific object classes for which counting data is available. Code to be released upon acceptance.
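
The sorting signal described in the abstract can be illustrated with a generic pairwise ranking objective. The sketch below is not the authors' implementation, since the abstract does not specify their loss. It assumes a hypothetical setup in which a model assigns one scalar score per image, and the images are listed in order of increasing object count (for example, after objects were progressively added). The function name `pairwise_ranking_loss` and the `margin` parameter are illustrative assumptions.

```python
from itertools import combinations


def pairwise_ranking_loss(scores, margin=1.0):
    """Mean hinge loss over all ordered pairs of scores.

    Assumption: scores[i] belongs to an image with fewer objects than
    scores[j] whenever i < j, so scores[j] should exceed scores[i] by
    at least `margin`. No exact counts are needed, only the ordering.
    """
    pairs = list(combinations(range(len(scores)), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        # Penalize pairs whose score gap is smaller than the margin.
        total += max(0.0, margin - (scores[j] - scores[i]))
    return total / len(pairs)


# Scores that respect the ordering incur no loss.
well_sorted = pairwise_ranking_loss([0.0, 2.0, 4.0])
# Scores in the reverse order are penalized on every pair.
mis_sorted = pairwise_ranking_loss([4.0, 2.0, 0.0])
```

A loss of this kind only constrains relative order, which is why the abstract describes a later step that anchors the learned features to actual counts.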

Authors (3)
  1. Adriano D'Alessandro (3 papers)
  2. Ali Mahdavi-Amiri (31 papers)
  3. Ghassan Hamarneh (64 papers)
