Feedback-guided Data Synthesis for Imbalanced Classification (2310.00158v2)

Published 29 Sep 2023 in cs.CV, cs.AI, and cs.LG

Abstract: The current status quo in machine learning is to train on static datasets of real images, which often come from long-tailed distributions. With recent advances in generative models, researchers have started augmenting these static datasets with synthetic data, reporting moderate performance improvements on classification tasks. We hypothesize that these gains are limited by the lack of feedback from the classifier to the generative model, which would promote the usefulness of the generated samples for improving the classifier's performance. In this work, we introduce a framework for augmenting static datasets with useful synthetic samples, which leverages one-shot feedback from the classifier to drive the sampling of the generative model. For the framework to be effective, we find that the samples must be close to the support of the real data of the task at hand and be sufficiently diverse. We validate three feedback criteria on a long-tailed dataset (ImageNet-LT) as well as a group-imbalanced dataset (NICO++). On ImageNet-LT, we achieve state-of-the-art results, with over 4 percent improvement on underrepresented classes while being twice as efficient in terms of the number of generated synthetic samples. NICO++ also enjoys marked boosts of over 5 percent in worst-group accuracy. With these results, our framework paves the way towards effectively leveraging state-of-the-art text-to-image models as data sources that can be queried to improve downstream applications.
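
The abstract describes a loop in which the classifier's feedback steers which synthetic samples are kept for the underrepresented classes. Below is a minimal, hypothetical PyTorch sketch of one such feedback-guided selection step. The `generate_candidates` stub, the entropy-plus-loss scoring criterion, and all names and constants are illustrative assumptions introduced here, not the paper's actual implementation, which queries a text-to-image diffusion model and uses the specific feedback criteria validated in the paper.

```python
# Hypothetical sketch of feedback-guided synthetic-sample selection.
# All interfaces are illustrative stand-ins, not the paper's code.
import torch
import torch.nn.functional as F


def generate_candidates(prompt: str, n: int) -> torch.Tensor:
    """Stand-in for querying a text-to-image model (e.g. a latent diffusion
    model) with a class-name prompt. Returns random tensors so the sketch
    runs end to end at a toy 64x64 resolution."""
    return torch.rand(n, 3, 64, 64)


@torch.no_grad()
def feedback_score(classifier: torch.nn.Module,
                   images: torch.Tensor,
                   target_class: int) -> torch.Tensor:
    """One-shot feedback from the current classifier: rank each candidate by
    predictive entropy plus the loss on the intended label, so samples the
    classifier still finds hard score higher. This criterion is an assumed
    example; the paper validates its own feedback criteria."""
    logits = classifier(images)
    probs = logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    targets = torch.full((images.shape[0],), target_class,
                         dtype=torch.long, device=logits.device)
    loss = F.cross_entropy(logits, targets, reduction="none")
    return entropy + loss


def augment_tail_class(classifier: torch.nn.Module, prompt: str,
                       target_class: int, n_candidates: int = 64,
                       n_keep: int = 16) -> torch.Tensor:
    """Generate candidates for an underrepresented class and keep only those
    the classifier's feedback marks as most useful."""
    candidates = generate_candidates(prompt, n_candidates)
    scores = feedback_score(classifier, candidates, target_class)
    keep = scores.topk(n_keep).indices
    return candidates[keep]


if __name__ == "__main__":
    # Toy linear classifier over 64x64 RGB images with 1000 classes.
    clf = torch.nn.Sequential(torch.nn.Flatten(),
                              torch.nn.Linear(3 * 64 * 64, 1000))
    # target_class is an arbitrary index for the toy classifier.
    synthetic = augment_tail_class(clf, "a photo of a snow leopard",
                                   target_class=289)
    print(synthetic.shape)  # torch.Size([16, 3, 64, 64])
```

The selected samples would then be added to the training set for the tail class and the classifier retrained; in practice the number of candidates per class and the keep ratio are the knobs that trade off generation cost against coverage.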

Authors (5)
  1. Reyhane Askari Hemmat (8 papers)
  2. Mohammad Pezeshki (20 papers)
  3. Florian Bordes (20 papers)
  4. Michal Drozdzal (45 papers)
  5. Adriana Romero-Soriano (30 papers)
Citations (12)