This paper introduces DFMGAN (Defect-aware Feature Manipulation GAN), a novel method for generating realistic and diverse defect images when only a few examples of defects are available, alongside a larger set of defect-free images. This addresses a critical data insufficiency problem in industrial defect inspection, where obtaining numerous defect images is often impractical.
The core idea is to leverage a pre-trained GAN (StyleGAN2) trained on defect-free images and adapt it to generate specific defects by manipulating features only in targeted regions. This is achieved through a two-stage training process:
- Stage 1: Backbone Pretraining: A data-efficient StyleGAN2 (using StyleGAN2-ADA) is trained on hundreds of defect-free images of a specific object or texture category. This backbone generator learns to produce high-quality, diverse images of the defect-free items. It consists of a mapping network ($z \rightarrow w$) and a synthesis network.
- Stage 2: Defect Transfer: The pre-trained backbone generator's weights are frozen. New "defect-aware residual blocks" and a separate "defect mapping network" ($z_d \rightarrow w_d$) are added. These new components are trained using only the few available defect images (e.g., 10-25 images); a minimal setup sketch follows this list.
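A minimal PyTorch sketch of the Stage 2 setup, under stated assumptions: the placeholder modules, learning rate, and Adam betas below are illustrative and not taken from the DFMGAN codebase.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the new Stage 2 components; the real modules
# are sketched later in this summary.
defect_mapping = nn.Sequential(nn.Linear(512, 512), nn.LeakyReLU(0.2),
                               nn.Linear(512, 512))              # z_d -> w_d
defect_residual_blocks = nn.ModuleList([nn.Conv2d(512, 512, 3, padding=1)])

def setup_stage2(backbone: nn.Module) -> torch.optim.Optimizer:
    # Freeze the pretrained StyleGAN2 backbone so the defect-free
    # generation it learned in Stage 1 stays untouched.
    for p in backbone.parameters():
        p.requires_grad = False
    # Optimize only the newly attached defect branch.
    new_params = (list(defect_mapping.parameters())
                  + list(defect_residual_blocks.parameters()))
    return torch.optim.Adam(new_params, lr=2e-3, betas=(0.0, 0.99))
```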
Implementation Details:
- Defect-Aware Residual Blocks: These blocks are attached to the backbone's synthesis network, starting at a resolution of 64x64. They take intermediate feature maps ($F_{res-1}$) from the backbone as input.
- They generate a defect residual feature map ($F^R_{res}$). A ToMask module generates a single-channel defect mask ($M$) from $F^R_{res}$.
- The original feature map from the backbone ($F^S_{res}$) is manipulated using the residual map and the mask:

$$F^{manip}_{res}(i,j) = \begin{cases} F^S_{res}(i,j) + F^R_{res}(i,j) & \text{if } M(i,j) \ge 0 \\ F^S_{res}(i,j) & \text{otherwise} \end{cases}$$

- This ensures manipulation only occurs within the predicted defect region. The mask $M$ is upsampled for higher resolutions. A sketch of one such block follows.
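A hedged PyTorch sketch of one defect-aware residual block. The layer shapes and the two-convolution residual path are assumptions, and modulation by $w_d$ is omitted for brevity; this is not the exact DFMGAN implementation.

```python
import torch
import torch.nn as nn

class DefectAwareResidualBlock(nn.Module):
    """Sketch: produces a residual feature map F^R and a single-channel
    mask M, then edits the backbone feature F^S only where M >= 0."""

    def __init__(self, channels: int):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # "ToMask": project residual features to a one-channel mask,
        # analogous to StyleGAN2's ToRGB heads.
        self.to_mask = nn.Conv2d(channels, 1, 1)

    def forward(self, f_backbone: torch.Tensor):
        f_res = self.residual(f_backbone)   # F^R, defect residual features
        mask = self.to_mask(f_res)          # M, unbounded mask logits
        # Hard indicator of the defect region; a soft or straight-through
        # variant may be needed in practice so gradients reach to_mask.
        inside = (mask >= 0).float()
        # F_manip(i,j) = F^S(i,j) + F^R(i,j) where M(i,j) >= 0, else F^S.
        f_manip = f_backbone + inside * f_res
        return f_manip, mask
```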
- Defect Mapping Network: Similar to the backbone's mapping network, it takes a random defect code $z_d$ and generates modulation weights $w_d$ for the defect-aware residual blocks, controlling the appearance and type of the generated defect independently of the object's appearance (controlled by $z \rightarrow w$); a sketch follows.
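A sketch of such a mapping network in the style of StyleGAN2's; the depth, widths, and input normalization here are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DefectMappingNetwork(nn.Module):
    """Sketch: an MLP turning a random defect code z_d into modulation
    weights w_d for the defect-aware residual blocks."""

    def __init__(self, z_dim: int = 512, w_dim: int = 512, depth: int = 8):
        super().__init__()
        layers = []
        for i in range(depth):
            layers += [nn.Linear(z_dim if i == 0 else w_dim, w_dim),
                       nn.LeakyReLU(0.2)]
        self.net = nn.Sequential(*layers)

    def forward(self, z_d: torch.Tensor) -> torch.Tensor:
        # Normalizing z_d first mirrors StyleGAN2's mapping network.
        z_d = z_d / z_d.norm(dim=1, keepdim=True).clamp_min(1e-8)
        return self.net(z_d)  # w_d
```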
- Two Discriminators:
- Image Discriminator ($D$): A standard StyleGAN2 discriminator, fine-tuned from Stage 1, which judges the overall realism of the generated defect image.
- Matching Discriminator ($D_{match}$): A smaller, separate discriminator trained to judge whether a generated image and its corresponding generated mask $M$ form a realistic pair. It takes the concatenated image-mask pair as input, ensuring the generated mask accurately outlines the defect in the image. Both discriminators use Wasserstein loss with R1 regularization; an illustrative $D_{match}$ update is sketched below.
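An illustrative critic update for $D_{match}$, following the loss choices stated above (Wasserstein loss plus R1 on real pairs). The function signature and the R1 weight are assumptions; this is a sketch, not DFMGAN's code.

```python
import torch

def d_match_step(d_match, real_img, real_mask, fake_img, fake_mask,
                 r1_gamma=10.0):
    # The image and its mask are concatenated along the channel axis,
    # so D_match scores the pair jointly.
    real_pair = torch.cat([real_img, real_mask], dim=1).requires_grad_(True)
    fake_pair = torch.cat([fake_img.detach(), fake_mask.detach()], dim=1)

    real_score = d_match(real_pair)
    fake_score = d_match(fake_pair)
    loss = fake_score.mean() - real_score.mean()  # Wasserstein critic loss

    # R1 regularization: penalize the gradient norm on real pairs.
    grad, = torch.autograd.grad(real_score.sum(), real_pair,
                                create_graph=True)
    r1 = grad.pow(2).sum(dim=[1, 2, 3]).mean()
    return loss + 0.5 * r1_gamma * r1
```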
- Mode Seeking Loss ($\mathcal{L}_{ms}$): To increase defect diversity for a given object appearance, a mode seeking loss is applied during Stage 2. It encourages larger differences in the generated masks ($\lVert M_1 - M_2 \rVert_1$) when different defect modulation weights ($w_{d1}, w_{d2}$) are used, while keeping the object code $z$ (and thus $w$) fixed. The loss is formulated as:
$$\mathcal{L}_{ms} = \frac{\lVert w_{d1} - w_{d2} \rVert_1}{\lVert M_1 - M_2 \rVert_1}$$

- Minimizing $\mathcal{L}_{ms}$ therefore pushes the two masks apart relative to the distance between the defect modulation weights. A sketch follows.
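In code, the ratio might look like this; the epsilon for numerical stability and the mean-reduced L1 distances are assumptions.

```python
import torch

def mode_seeking_loss(mask1, mask2, wd1, wd2, eps=1e-8):
    # Small when the two masks differ a lot for a given gap between the
    # defect modulation weights; minimizing it rewards mask diversity.
    mask_dist = (mask1 - mask2).abs().mean()
    w_dist = (wd1 - wd2).abs().mean()
    return w_dist / (mask_dist + eps)
```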
- Overall Objective: The generator $G$ and discriminators $D$, $D_{match}$ are trained by optimizing:

$$\mathcal{L}(G, D, D_{match}) = \mathcal{L}_{StyleGAN}(G, D) + \mathcal{L}_{match}(G, D_{match}) + \lambda \mathcal{L}_{ms}(G)$$
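How the three terms might combine in one generator update, built on the helpers sketched above; the Wasserstein-style generator terms and the default $\lambda$ are assumptions.

```python
import torch

def generator_loss(d, d_match, img1, mask1, img2, mask2, wd1, wd2, lam=1.0):
    # Adversarial realism term for the image discriminator D.
    l_stylegan = -d(img1).mean()
    # Matching term: the generated image-mask pair should fool D_match.
    l_match = -d_match(torch.cat([img1, mask1], dim=1)).mean()
    # Diversity term, reusing mode_seeking_loss() defined earlier.
    l_ms = mode_seeking_loss(mask1, mask2, wd1, wd2)
    return l_stylegan + l_match + lam * l_ms
```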
Practical Application and Results:
- Dataset: Experiments were performed on the MVTec AD dataset, which contains various object/texture categories with few defect samples per defect type but more defect-free samples. Images were resized to 256x256.
- Evaluation: DFMGAN was compared against generic few-shot GANs (Finetune, DiffAug, CDC), previous defect generation GANs (SDGAN, Defect-GAN), and a non-generative method (CropPaste) using KID (lower is better) and clustered LPIPS (higher is better) metrics.
- Performance: DFMGAN significantly outperformed baseline methods on both metrics across various defect types (e.g., crack, cut, hole, print on hazelnuts), demonstrating its ability to generate higher quality and more diverse defect images from limited data. Qualitative results show DFMGAN avoids common pitfalls like overfitting (Finetune, DiffAug), unrealistic outputs (CDC, SDGAN, Defect-GAN), or lack of novelty (CropPaste). A key advantage is the generation of paired defect-free images, defect images, and corresponding pixel-level defect masks.
- Downstream Task: Generated images from DFMGAN were used as data augmentation for a few-shot defect classification task (training a ResNet-34). DFMGAN-augmented training data led to significantly higher classification accuracy (around 10% improvement over the next best method) on unseen real defect images compared to data generated by other methods. This highlights its practical value in improving defect inspection systems.
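A minimal sketch of such an augmentation setup; the dataset objects, ImageNet initialization, and hyperparameters here are hypothetical, not the paper's protocol.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import models

def train_classifier(real_ds, generated_ds, num_classes, epochs=20):
    # Mix the few real defect images with generated ones.
    loader = DataLoader(ConcatDataset([real_ds, generated_ds]),
                        batch_size=32, shuffle=True)
    model = models.resnet34(weights="IMAGENET1K_V1")
    model.fc = torch.nn.Linear(model.fc.in_features, num_classes)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
    return model
```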
Key Contributions:
- First method designed specifically for few-shot defect image generation.
- Novel idea of transferring knowledge by manipulating features only within learned defect regions, rather than adapting the entire image generation process.
- DFMGAN architecture incorporating defect-aware residual blocks, a separate defect mapping network, a matching discriminator, and mode-seeking loss.
- Demonstrated state-of-the-art results in generating realistic, diverse defect images with corresponding masks from very few samples, and showed significant benefits for downstream defect classification.
The code is available at: https://github.com/Ldhlwh/DFMGAN