
Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model (2403.19600v1)

Published 28 Mar 2024 in cs.CV

Abstract: Text-to-image (T2I) generative models have recently emerged as a powerful tool, enabling the creation of photo-realistic images and giving rise to a multitude of applications. However, the effective integration of T2I models into fundamental image classification tasks remains an open question. A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models. In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques. Our analysis reveals that these methods struggle to produce images that are both faithful (in terms of foreground objects) and diverse (in terms of background contexts) for domain-specific concepts. To tackle this challenge, we introduce an innovative inter-class data augmentation method known as Diff-Mix (https://github.com/Zhicaiwww/Diff-Mix), which enriches the dataset by performing image translations between classes. Our empirical results demonstrate that Diff-Mix achieves a better balance between faithfulness and diversity, leading to a marked improvement in performance across diverse image classification scenarios, including few-shot, conventional, and long-tail classifications for domain-specific datasets.
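The core recipe the abstract describes can be summarized as: take a training image of one class, translate it part of the way toward another class with a diffusion model, and label the result softly between the two classes. The sketch below illustrates only the label-mixing half of that idea in plain Python; the diffusion edit itself is stubbed out as a comment, and names such as `strength`, `src`, and `tgt` are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch of the inter-class "translate + soft label" idea.
# The actual image translation (source image -> target-class image via a
# text-to-image diffusion model, e.g. an img2img edit) is not shown here.

def soft_label(num_classes: int, src: int, tgt: int, strength: float) -> list[float]:
    """Mixup-style label for a translated image: the stronger the edit
    toward the target class, the more weight its one-hot label receives."""
    assert 0.0 <= strength <= 1.0, "strength is the fraction of translation"
    label = [0.0] * num_classes
    label[src] += 1.0 - strength  # residual weight on the source class
    label[tgt] += strength        # weight transferred to the target class
    return label

# Example: an image of class 0 translated 70% of the way toward class 3.
# augmented_image = diffusion_translate(image, prompt_for(tgt), strength=0.7)  # stub
label = soft_label(5, src=0, tgt=3, strength=0.7)
```

A soft label like this can be consumed directly by a cross-entropy loss that accepts probability targets, which is how mixup-style augmentations are usually trained.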

Authors (8)
  1. Zhicai Wang (10 papers)
  2. Longhui Wei (40 papers)
  3. Tan Wang (18 papers)
  4. Heyu Chen (2 papers)
  5. Yanbin Hao (31 papers)
  6. Xiang Wang (279 papers)
  7. Xiangnan He (200 papers)
  8. Qi Tian (314 papers)
Citations (6)