Ali-AUG: Innovative Approaches to Labeled Data Augmentation using One-Step Diffusion Model
Abstract: This paper introduces Ali-AUG, a single-step diffusion model for efficient labeled data augmentation in industrial applications. The method addresses the scarcity of labeled data by generating synthetic, labeled images with precisely placed features. Ali-AUG builds on a Stable Diffusion architecture, extended with skip connections and LoRA modules, to integrate masks and images efficiently, so that features are inserted accurately without altering unrelated image content. Experiments on several industrial datasets show that Ali-AUG generates high-quality, defect-enhanced images while retaining fast single-step inference. Because it offers precise control over feature insertion and requires fewer training steps, the technique improves the performance of deep learning models in scenarios with limited labeled data. It is particularly suited to generating images of defective products, which are used to train AI-based models that detect defects in manufacturing processes. Using different data preparation strategies, including Classification Accuracy Score (CAS) and Naive Augmentation Score (NAS), we show that Ali-AUG improves model performance by 31% compared to other augmentation methods and by 45% compared to models trained without data augmentation. Ali-AUG also reduces training time by 32% and supports both paired and unpaired datasets, which adds flexibility in data preparation.
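The abstract mentions LoRA modules without describing them. As a minimal sketch of the general low-rank adaptation idea (not Ali-AUG's actual implementation), the pure-Python example below adds a trainable low-rank update `B @ A` to a frozen weight matrix `W`. The class name `LoRALinear`, the helper functions, and the chosen sizes, rank, and `alpha` are all illustrative assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


class LoRALinear:
    """Frozen linear layer plus a low-rank update (hypothetical, illustrative only)."""

    def __init__(self, weight, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        out_features, in_features = len(weight), len(weight[0])
        self.weight = weight          # frozen pretrained weight W (out x in)
        self.scale = alpha / rank     # usual LoRA scaling factor
        # A starts small and random, B starts at zero,
        # so the layer initially behaves exactly like the frozen one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_features)]
                  for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_features)]

    def forward(self, x):
        base = matmul(x, transpose(self.weight))
        update = matmul(matmul(x, transpose(self.A)), transpose(self.B))
        return [[b + self.scale * u for b, u in zip(b_row, u_row)]
                for b_row, u_row in zip(base, update)]

    def trainable_fraction(self):
        """Ratio of trainable (A, B) parameters to frozen W parameters."""
        rank = len(self.A)
        out_features, in_features = len(self.weight), len(self.weight[0])
        return rank * (in_features + out_features) / (out_features * in_features)


rng = random.Random(1)
W = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(6)]
layer = LoRALinear(W, rank=2)
x = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(3)]
y = layer.forward(x)
```

Only `A` and `B` would be updated during fine-tuning. For realistic layer sizes the rank is much smaller than the layer dimensions, so the trainable fraction is small, which is the usual motivation for using LoRA to adapt a large pretrained model cheaply.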