A Unified Object Counting Network with Object Occupation Prior (2212.14193v3)

Published 29 Dec 2022 in cs.CV

Abstract: The counting task, which plays a fundamental role in numerous applications (e.g., crowd counting, traffic statistics), aims to predict the number of objects at various densities. Existing object counting tasks are designed for a single object class. However, encountering newly arriving data with new classes is inevitable in the real world. We name this scenario "evolving object counting". In this paper, we build the first evolving object counting dataset and propose a unified object counting network as a first attempt to address this task. The proposed model consists of two key components: a class-agnostic mask module and a class-incremental module. The class-agnostic mask module learns a generic object occupation prior by predicting a class-agnostic binary mask (i.e., 1 denotes that an object exists at the corresponding position in an image and 0 otherwise). The class-incremental module handles newly arriving classes and provides discriminative class guidance for density map prediction. The combined outputs of the class-agnostic mask module and the image feature extractor are used to predict the final density map. When new classes arrive, we first add new neural nodes to the last regression and classification layers of the class-incremental module. Then, instead of retraining the model from scratch, we use knowledge distillation to help the model remember what it has already learned about previous object classes. We also employ a support sample bank that stores a small number of typical training samples of each class, which prevents the model from forgetting key information about old data. With this design, our model can efficiently and effectively adapt to newly arriving classes while maintaining good performance on already seen data, without large-scale retraining. Extensive experiments on the collected dataset demonstrate the model's favorable performance.
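The class-incremental mechanism the abstract describes — widening the final layer for a new class, then distilling from the frozen old model so outputs on previous classes stay stable — can be sketched in a few lines. This is an illustrative toy, not the authors' implementation; the function names, the plain linear head, and the MSE distillation term are all assumptions for exposition.

```python
def expand_output_layer(weights, n_new, init=0.0):
    # Widen the head: append `n_new` output nodes (weight rows) for new classes.
    n_inputs = len(weights[0])
    return weights + [[init] * n_inputs for _ in range(n_new)]

def forward(weights, features):
    # Plain linear layer: one score per (old or new) class.
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def distillation_loss(teacher_scores, student_scores):
    # Anti-forgetting term: MSE between the frozen old model's outputs and
    # the expanded model's outputs, restricted to the old classes.
    k = len(teacher_scores)
    return sum((t - s) ** 2 for t, s in zip(teacher_scores, student_scores[:k])) / k

# Example: a 2-class head over 3-dim features gains one new class.
old_w = [[0.2, -0.1, 0.5], [0.3, 0.4, -0.2]]
new_w = expand_output_layer(old_w, n_new=1)
feats = [1.0, 2.0, 3.0]

teacher = forward(old_w, feats)  # frozen old model
student = forward(new_w, feats)  # expanded model, before any training
print(len(new_w))                            # 3 output nodes now
print(distillation_loss(teacher, student))   # 0.0 before any update
```

In the actual paper this distillation term would be combined with the new-class counting loss and with replay from the support sample bank, so the expanded model learns the new class while the penalty above keeps its old-class behavior close to the frozen teacher.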
