Diffusion-based Data Augmentation for Object Counting Problems (2401.13992v1)
Abstract: Crowd counting is an important problem in computer vision because of its wide range of applications in image understanding. The problem is currently addressed mainly with deep learning approaches such as Convolutional Neural Networks (CNNs) and Transformers. However, deep networks are data-driven and prone to overfitting, especially when the available labeled crowd data is limited. To overcome this limitation, we design a pipeline that uses a diffusion model to generate extensive training data. We are the first to generate images conditioned on a location dot map (a binary dot map that specifies the locations of human heads) with a diffusion model, and the first to use such diverse synthetic data to augment the training of crowd counting models. Our proposed smoothed density map input for ControlNet significantly improves ControlNet's ability to generate crowds in the correct locations. Our proposed counting loss for the diffusion model effectively minimizes the discrepancies between the location dot map and the generated crowd images. In addition, our guidance sampling steers the diffusion process toward regions where the generated crowd images align most accurately with the location dot map. Together, these components enhance ControlNet's ability to generate specified objects from a location dot map, and the framework can be readily adapted to data augmentation for other counting problems. Extensive experiments show that our framework improves counting performance on the ShanghaiTech, NWPU-Crowd, UCF-QNRF, and TRANCOS datasets.
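The smoothed density map mentioned in the abstract can be illustrated with a minimal sketch. It assumes the smoothing is a Gaussian blur of the binary dot map, which the abstract does not specify; the function names `gaussian_kernel_1d` and `smooth_dot_map`, the value of `sigma`, and the example head locations are all hypothetical and are not taken from the paper.

```python
import numpy as np


def gaussian_kernel_1d(sigma: float, truncate: float = 4.0) -> np.ndarray:
    """Return a 1-D Gaussian kernel normalized to sum to 1."""
    radius = max(1, int(truncate * sigma + 0.5))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def smooth_dot_map(dot_map: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    """Blur a binary dot map with a separable Gaussian (zero padding).

    Assumption: each image side is at least as long as the kernel,
    so np.convolve(..., mode="same") preserves the image shape.
    """
    kernel = gaussian_kernel_1d(sigma)
    density = dot_map.astype(np.float64)
    # Convolve every row, then every column (separable 2-D Gaussian).
    density = np.apply_along_axis(np.convolve, 1, density, kernel, mode="same")
    density = np.apply_along_axis(np.convolve, 0, density, kernel, mode="same")
    return density


# Hypothetical example: three annotated heads in a 64x64 image.
dot_map = np.zeros((64, 64), dtype=np.uint8)
for row, col in [(10, 12), (30, 40), (50, 20)]:
    dot_map[row, col] = 1

density = smooth_dot_map(dot_map, sigma=2.0)
print(density.shape, round(float(density.sum()), 6))
```

Because the kernel is normalized and the example dots sit far enough from the border, the smoothed map still sums to the number of annotated heads, so the count is preserved while each head spreads over a small neighborhood instead of a single pixel.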
Authors: Zhen Wang, Yuelei Li, Jia Wan, Nuno Vasconcelos