GeNIe: Generative Hard Negative Images Through Diffusion (2312.02548v3)
Abstract: Data augmentation is crucial in training deep models, preventing them from overfitting to limited data. Recent advances in generative AI, e.g., diffusion models, have enabled more sophisticated augmentation techniques that produce data resembling natural images. We introduce GeNIe, a novel augmentation method that leverages a latent diffusion model conditioned on a text prompt to combine two contrasting data points (an image from the source category and a text prompt from the target category) to generate challenging augmentations. To achieve this, we adjust the noise level (equivalently, the number of diffusion iterations) so that the generated image retains low-level and background features from the source image while representing the target category, resulting in a hard negative sample for the source category. We further automate and enhance GeNIe by adaptively selecting the noise level on a per-image basis (coined GeNIe-Ada), leading to further performance improvements. Our extensive experiments, in both few-shot and long-tail distribution settings, demonstrate the effectiveness of our novel augmentation method and its superior performance over the prior art. Our code is available at: https://github.com/UCDvision/GeNIe
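The noise-level mechanism the abstract describes (partially noising a source image, then denoising it under a target-category prompt) can be sketched as follows. This is an illustrative simplification, not the paper's implementation: `genie_mix` applies the standard DDPM forward process to a source latent, and the linear strength-to-timestep mapping is a common img2img convention assumed here for clarity.

```python
import numpy as np

def genie_mix(source_latent, noise, strength):
    """Illustrative sketch of partial forward diffusion (DDPM):
    x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps.
    In GeNIe's setting, the noised latent would then be denoised while
    conditioned on the target-category text prompt. Low strength keeps
    the source's low-level/background features; high strength hands
    control to the prompt. The schedule below (alpha_bar as a function
    of strength) is a hypothetical choice for illustration only.
    """
    alpha_bar = (1.0 - strength) ** 2
    return np.sqrt(alpha_bar) * source_latent + np.sqrt(1.0 - alpha_bar) * noise

def num_denoising_steps(strength, num_inference_steps=50):
    """Assumed linear mapping from noise level to diffusion iterations:
    higher strength means starting from a noisier timestep, so more
    denoising steps are run under the target prompt.
    """
    return int(round(strength * num_inference_steps))
```

With `strength = 0` the source latent is returned unchanged, and with `strength = 1` it is replaced entirely by noise; GeNIe's hard negatives live in between, and GeNIe-Ada would pick this value per image rather than using one global setting.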
Authors: Soroush Abbasi Koohpayegani, Anuj Singh, K L Navaneet, Hadi Jamali-Rad, Hamed Pirsiavash