This paper introduces DFMGAN (Defect-aware Feature Manipulation GAN), a novel method for generating realistic and diverse defect images when only a few examples of defects are available, alongside a larger set of defect-free images. This addresses a critical data insufficiency problem in industrial defect inspection, where obtaining numerous defect images is often impractical.
The core idea is to leverage a pre-trained GAN (StyleGAN2) trained on defect-free images and adapt it to generate specific defects by manipulating features only in targeted regions. This is achieved through a two-stage training process:
- Stage 1: Backbone Pretraining: A data-efficient StyleGAN2 (using StyleGAN2-ADA) is trained on hundreds of defect-free images of a specific object or texture category. This backbone generator learns to produce high-quality, diverse images of the defect-free items. It consists of a mapping network ($z \rightarrow w$) and a synthesis network.
- Stage 2: Defect Transfer: The pre-trained backbone generator's weights are frozen. New "defect-aware residual blocks" and a separate "defect mapping network" ($z_d \rightarrow w_d$) are added. These new components are trained using only the few available defect images (e.g., 10-25 images); a minimal setup sketch follows this list.
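A minimal PyTorch sketch of the Stage 2 setup, under stated assumptions: the placeholder modules, learning rate, and Adam betas below are illustrative and not taken from the DFMGAN codebase.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the new Stage 2 components; the real modules
# are sketched later in this summary.
defect_mapping = nn.Sequential(nn.Linear(512, 512), nn.LeakyReLU(0.2),
                               nn.Linear(512, 512))              # z_d -> w_d
defect_residual_blocks = nn.ModuleList([nn.Conv2d(512, 512, 3, padding=1)])

def setup_stage2(backbone: nn.Module) -> torch.optim.Optimizer:
    # Freeze the pretrained StyleGAN2 backbone so the defect-free
    # generation it learned in Stage 1 stays untouched.
    for p in backbone.parameters():
        p.requires_grad = False
    # Optimize only the newly attached defect branch.
    new_params = (list(defect_mapping.parameters())
                  + list(defect_residual_blocks.parameters()))
    return torch.optim.Adam(new_params, lr=2e-3, betas=(0.0, 0.99))
```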
Implementation Details:
- Defect-Aware Residual Blocks: These blocks are attached to the backbone's synthesis network, starting at a resolution of 64x64. They take intermediate feature maps ($F_{res-1}$) from the backbone as input.
- They generate a defect residual feature map ($F^R_{res}$). A ToMask module generates a single-channel defect mask ($M$) from $F^R_{res}$.
- The original feature map from the backbone ($F^S_{res}$) is manipulated using the residual map and the mask:

$$F^{manip}_{res}(i,j) = \begin{cases} F^S_{res}(i,j) + F^R_{res}(i,j) & \text{if } M(i,j) \ge 0 \\ F^S_{res}(i,j) & \text{otherwise} \end{cases}$$

- This ensures manipulation only occurs within the predicted defect region. The mask $M$ is upsampled for higher resolutions. A sketch of one such block follows.
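A hedged PyTorch sketch of one defect-aware residual block. The layer shapes and the two-convolution residual path are assumptions, and modulation by $w_d$ is omitted for brevity; this is not the exact DFMGAN implementation.

```python
import torch
import torch.nn as nn

class DefectAwareResidualBlock(nn.Module):
    """Sketch: produces a residual feature map F^R and a single-channel
    mask M, then edits the backbone feature F^S only where M >= 0."""

    def __init__(self, channels: int):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # "ToMask": project residual features to a one-channel mask,
        # analogous to StyleGAN2's ToRGB heads.
        self.to_mask = nn.Conv2d(channels, 1, 1)

    def forward(self, f_backbone: torch.Tensor):
        f_res = self.residual(f_backbone)   # F^R, defect residual features
        mask = self.to_mask(f_res)          # M, unbounded mask logits
        # Hard indicator of the defect region; a soft or straight-through
        # variant may be needed in practice so gradients reach to_mask.
        inside = (mask >= 0).float()
        # F_manip(i,j) = F^S(i,j) + F^R(i,j) where M(i,j) >= 0, else F^S.
        f_manip = f_backbone + inside * f_res
        return f_manip, mask
```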
- Defect Mapping Network: Similar to the backbone's mapping network, it takes a random defect code $z_d$ and generates modulation weights $w_d$ for the defect-aware residual blocks, controlling the appearance and type of the generated defect independently of the object's appearance (controlled by $z \rightarrow w$); a sketch follows.
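A sketch of such a mapping network in the style of StyleGAN2's; the depth, widths, and input normalization here are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DefectMappingNetwork(nn.Module):
    """Sketch: an MLP turning a random defect code z_d into modulation
    weights w_d for the defect-aware residual blocks."""

    def __init__(self, z_dim: int = 512, w_dim: int = 512, depth: int = 8):
        super().__init__()
        layers = []
        for i in range(depth):
            layers += [nn.Linear(z_dim if i == 0 else w_dim, w_dim),
                       nn.LeakyReLU(0.2)]
        self.net = nn.Sequential(*layers)

    def forward(self, z_d: torch.Tensor) -> torch.Tensor:
        # Normalizing z_d first mirrors StyleGAN2's mapping network.
        z_d = z_d / z_d.norm(dim=1, keepdim=True).clamp_min(1e-8)
        return self.net(z_d)  # w_d
```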
- Two Discriminators:
- Image Discriminator ($D$): A standard StyleGAN2 discriminator, fine-tuned from Stage 1, which judges the overall realism of the generated defect image.
- Matching Discriminator ($D_{match}$): A smaller, separate discriminator trained to judge whether a generated image and its corresponding generated mask $M$ form a realistic pair. It takes the concatenated image-mask pair as input, ensuring the generated mask accurately outlines the defect in the image. Both discriminators use Wasserstein loss with R1 regularization; an illustrative $D_{match}$ update is sketched below.
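An illustrative critic update for $D_{match}$, following the loss choices stated above (Wasserstein loss plus R1 on real pairs). The function signature and the R1 weight are assumptions; this is a sketch, not DFMGAN's code.

```python
import torch

def d_match_step(d_match, real_img, real_mask, fake_img, fake_mask,
                 r1_gamma=10.0):
    # The image and its mask are concatenated along the channel axis,
    # so D_match scores the pair jointly.
    real_pair = torch.cat([real_img, real_mask], dim=1).requires_grad_(True)
    fake_pair = torch.cat([fake_img.detach(), fake_mask.detach()], dim=1)

    real_score = d_match(real_pair)
    fake_score = d_match(fake_pair)
    loss = fake_score.mean() - real_score.mean()  # Wasserstein critic loss

    # R1 regularization: penalize the gradient norm on real pairs.
    grad, = torch.autograd.grad(real_score.sum(), real_pair,
                                create_graph=True)
    r1 = grad.pow(2).sum(dim=[1, 2, 3]).mean()
    return loss + 0.5 * r1_gamma * r1
```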
- Mode Seeking Loss ($\mathcal{L}_{ms}$): To increase defect diversity for a given object appearance, a mode seeking loss is applied during Stage 2. It encourages larger differences in the generated masks ($\lVert M_1 - M_2 \rVert_1$) when different defect modulation weights ($w_{d1}, w_{d2}$) are used, while keeping the object code $z$ (and thus $w$) fixed. The loss is formulated as:
$$\mathcal{L}_{ms} = \frac{\lVert w_{d1} - w_{d2} \rVert_1}{\lVert M_1 - M_2 \rVert_1}$$

- Minimizing $\mathcal{L}_{ms}$ therefore pushes the two masks apart relative to the distance between the defect modulation weights. A sketch follows.
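In code, the ratio might look like this; the epsilon for numerical stability and the mean-reduced L1 distances are assumptions.

```python
import torch

def mode_seeking_loss(mask1, mask2, wd1, wd2, eps=1e-8):
    # Small when the two masks differ a lot for a given gap between the
    # defect modulation weights; minimizing it rewards mask diversity.
    mask_dist = (mask1 - mask2).abs().mean()
    w_dist = (wd1 - wd2).abs().mean()
    return w_dist / (mask_dist + eps)
```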
- Overall Objective: The generator $G$ and discriminators $D$, $D_{match}$ are trained by optimizing:

$$\mathcal{L}(G, D, D_{match}) = \mathcal{L}_{StyleGAN}(G, D) + \mathcal{L}_{match}(G, D_{match}) + \lambda \mathcal{L}_{ms}(G)$$
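How the three terms might combine in one generator update, built on the helpers sketched above; the Wasserstein-style generator terms and the default $\lambda$ are assumptions.

```python
import torch

def generator_loss(d, d_match, img1, mask1, img2, mask2, wd1, wd2, lam=1.0):
    # Adversarial realism term for the image discriminator D.
    l_stylegan = -d(img1).mean()
    # Matching term: the generated image-mask pair should fool D_match.
    l_match = -d_match(torch.cat([img1, mask1], dim=1)).mean()
    # Diversity term, reusing mode_seeking_loss() defined earlier.
    l_ms = mode_seeking_loss(mask1, mask2, wd1, wd2)
    return l_stylegan + l_match + lam * l_ms
```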
Practical Application and Results:
- Dataset: Experiments were performed on the MVTec AD dataset, which contains various object/texture categories with few defect samples per defect type but more defect-free samples. Images were resized to 256x256.
- Evaluation: DFMGAN was compared against generic few-shot GANs (Finetune, DiffAug, CDC), previous defect generation GANs (SDGAN, Defect-GAN), and a non-generative method (CropPaste) using KID (lower is better) and clustered LPIPS (higher is better) metrics.
- Performance: DFMGAN significantly outperformed baseline methods on both metrics across various defect types (e.g., crack, cut, hole, print on hazelnuts), demonstrating its ability to generate higher quality and more diverse defect images from limited data. Qualitative results show DFMGAN avoids common pitfalls like overfitting (Finetune, DiffAug), unrealistic outputs (CDC, SDGAN, Defect-GAN), or lack of novelty (CropPaste). A key advantage is the generation of paired defect-free images, defect images, and corresponding pixel-level defect masks.
- Downstream Task: Generated images from DFMGAN were used as data augmentation for a few-shot defect classification task (training a ResNet-34). DFMGAN-augmented training data led to significantly higher classification accuracy (around 10% improvement over the next best method) on unseen real defect images compared to data generated by other methods. This highlights its practical value in improving defect inspection systems.
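A minimal sketch of such an augmentation setup; the dataset objects, ImageNet initialization, and hyperparameters here are hypothetical, not the paper's protocol.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import models

def train_classifier(real_ds, generated_ds, num_classes, epochs=20):
    # Mix the few real defect images with generated ones.
    loader = DataLoader(ConcatDataset([real_ds, generated_ds]),
                        batch_size=32, shuffle=True)
    model = models.resnet34(weights="IMAGENET1K_V1")
    model.fc = torch.nn.Linear(model.fc.in_features, num_classes)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
    return model
```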
Key Contributions:
- First method designed specifically for few-shot defect image generation.
- Novel idea of transferring knowledge by manipulating features only within learned defect regions, rather than adapting the entire image generation process.
- DFMGAN architecture incorporating defect-aware residual blocks, a separate defect mapping network, a matching discriminator, and mode-seeking loss.
- Demonstrated state-of-the-art results in generating realistic, diverse defect images with corresponding masks from very few samples, and showed significant benefits for downstream defect classification.
The code is available at: https://github.com/Ldhlwh/DFMGAN