VISION Datasets: A Benchmark for Vision-based InduStrial InspectiON (2306.07890v2)

Published 13 Jun 2023 in cs.CV and cs.LG

Abstract: Despite progress in vision-based inspection algorithms, real-world industrial challenges -- specifically in data availability, quality, and complex production requirements -- often remain under-addressed. We introduce the VISION Datasets, a diverse collection of 14 industrial inspection datasets, uniquely poised to meet these challenges. Unlike previous datasets, VISION brings versatility to defect detection, offering annotation masks across all splits and catering to various detection methodologies. Our datasets also feature instance-segmentation annotation, enabling precise defect identification. With a total of 18k images encompassing 44 defect types, VISION strives to mirror a wide range of real-world production scenarios. By supporting two ongoing challenge competitions on the VISION Datasets, we hope to foster further advancements in vision-based industrial inspection.

Citations (16)

Summary

  • The paper presents a comprehensive benchmark with 14 datasets, comprising over 18,000 images and 44 defect types for practical industrial inspection.
  • It employs refined instance segmentation and rigorous deduplication methods to ensure high-quality data splits and precise defect localization.
  • It organizes two challenges—data-efficient detection and synthetic data generation—that highlight effective ensembling and augmentation techniques.

The paper "VISION Datasets: A Benchmark for Vision-based Industrial Inspection" (2306.07890) introduces a new collection of datasets and associated challenges specifically designed to bridge the gap between academic research and the practical realities of vision-based industrial inspection. The authors highlight that existing datasets often fail to capture the unique challenges faced in real-world industrial settings, particularly concerning data availability, data quality, and complex production requirements.

The VISION Datasets consist of 14 diverse industrial inspection datasets, totaling over 18,000 images and encompassing 44 distinct defect types across various manufacturing processes, materials, and industries. A key distinction from prior benchmarks like MVTec AD is the provision of annotation masks across all training, validation, and testing splits, facilitating benchmarking for a wide range of detection methodologies, including unsupervised, weakly-supervised, semi-supervised, and fully supervised approaches. Furthermore, the datasets feature instance-segmentation annotation, which allows for the identification and localization of individual defects, even when multiple instances of the same defect type appear in a single image. This is crucial for practical applications where defect count and size are often important metrics.
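
Because each defect instance carries its own mask, per-image defect counts and sizes can be computed directly from the annotations. The sketch below assumes the polygon annotations are packaged as COCO-style JSON, a common convention for instance segmentation; the file path and exact schema here are illustrative rather than confirmed by the paper.

```python
# Minimal sketch: per-image defect counts from instance-segmentation masks.
# Assumes COCO-style polygon annotations; path and schema are hypothetical.
from collections import Counter

from pycocotools.coco import COCO

coco = COCO("train/_annotations.coco.json")  # hypothetical path

for img_id in coco.getImgIds():
    anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))
    # Instance-level masks let us count defects even when several
    # instances of the same defect type appear in one image.
    counts = Counter(coco.loadCats(a["category_id"])[0]["name"] for a in anns)
    if counts:
        print(img_id, dict(counts))

# A binary mask per instance also makes defect size a direct measurement:
anns = coco.loadAnns(coco.getAnnIds())
if anns:
    mask = coco.annToMask(anns[0])  # HxW array, 1 inside the defect polygon
    print("defect area (pixels):", int(mask.sum()))
```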

The Curation Process involved screening over 1800 manufacturing datasets from Roboflow. A team with industrial experience selected the top 14 datasets that reflected realistic production challenges. To ensure high-quality annotations, an experienced team provided refined instance segmentation masks based on the original bounding boxes, spending significant time on annotation and quality control.

Building robust Dataset Splits was a critical step due to the nature of industrial data (e.g., large images, small defects, highly aligned normal samples). A careful process was implemented to minimize data leakage across splits. This involved byte-level deduplication for images without defects and a defect-level similarity model for images with annotations, combined with manual checks for unit-level leakage (e.g., images from the same product serial number). The VISION V1 dataset, used for the challenges, strategically limits the number of labeled samples in the training split (around 4k annotated images out of 18k total images) to simulate data-scarce scenarios common in industry and encourage research into data-efficient methods.
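
One ingredient of that pipeline, byte-level deduplication, amounts to hashing raw file contents and keeping one file per digest. The sketch below shows that step under a hypothetical directory layout; the defect-level similarity model and the manual unit-level checks are not reproduced here.

```python
# Minimal sketch: byte-level deduplication of defect-free images.
# Exact duplicates (identical bytes) collapse to one retained file.
import hashlib
from pathlib import Path


def dedup_by_bytes(image_dir: str) -> list[Path]:
    """Return one path per unique file content under image_dir."""
    seen: set[str] = set()
    kept: list[Path] = []
    for path in sorted(Path(image_dir).rglob("*.png")):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(path)
    return kept


unique_normals = dedup_by_bytes("vision/normal_images")  # hypothetical path
print(f"{len(unique_normals)} unique defect-free images retained")
```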

To stimulate research and development, two VISION Challenges were organized:

  1. Track 1: Data-Efficient Defect Detection: This track evaluates algorithms that must perform well with limited labeled data, measuring how effectively models detect diverse defects across different products and surface types. The evaluation metric is a composite score that weights mAP (mean Average Precision) and mAR (mean Average Recall) equally, emphasizing both precise localization and the ability to detect all defects; high recall is vital in production to minimize escapes (missed defects). A minimal scoring sketch appears after this list.
  2. Track 2: Data-Generation for Defect Detection: This track addresses the challenges of data imbalance (rare defects, many normal samples) and long-tailed defect distributions by focusing on techniques to improve the quality and quantity of training data. Participants were challenged to use data cleaning, augmentation, and synthetic data generation methods (like generative models or rendering) to enhance a fixed initial dataset for a fixed detection model. The evaluation is based on how well the generated dataset improves the performance of the downstream detection model on a validation set.
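
As referenced under Track 1, equal weighting of mAP and mAR means a model cannot buy precision by sacrificing recall. A minimal sketch of the composite score as described follows; the official evaluation may differ in details such as IoU thresholds and per-dataset averaging.

```python
# Minimal sketch: Track 1 composite score with equal mAP/mAR weighting.
# Details of the official evaluation (IoU thresholds, averaging) may differ.
def composite_score(mAP: float, mAR: float) -> float:
    return 0.5 * mAP + 0.5 * mAR


# A high-precision, low-recall model scores no better than a balanced one:
print(composite_score(mAP=0.62, mAR=0.48))  # 0.55
print(composite_score(mAP=0.55, mAR=0.55))  # 0.55
```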

Insights from Competition Winners revealed several practical strategies:

  • Track 1: Top teams relied heavily on model ensembling, combining state-of-the-art object detection and instance segmentation architectures (e.g., EVA (2211.07636), MaskDino (2206.02777), HTC [Chen2019HybridTC], Mask2Former [cheng2021mask2former]), often with powerful backbones such as Swin Transformer [liu2021Swin] and CBNetV2 [9932281]. Training on all 14 datasets simultaneously benefited some teams, suggesting positive transfer-learning potential. Data augmentation techniques like cut-paste were common (a minimal sketch follows this list), and Test Time Augmentation (TTA) was also employed to boost performance.
  • Track 2: Winning solutions demonstrated the effectiveness of both traditional data augmentation and generative models (StyleGANv2 [karras2020analyzing], ControlNet [zhang2023adding], StyleGAN2-ADA [karras2020training]) for generating synthetic defects or defect-free backgrounds. Some teams combined generated data with augmented real data and used data selection strategies to curate the most beneficial training sets.
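
As referenced in the Track 1 item, cut-paste augmentation crops a defect out of one image using its instance mask and pastes it onto a defect-free image. The sketch below is a bare-bones version under assumed array shapes; the winning teams' variants (blending, placement policies, scaling) are not specified in detail.

```python
# Minimal sketch: cut-paste augmentation for defect detection.
# Assumes HxWx3 uint8 images and an HxW binary instance mask; the pasted
# patch must fit inside the target image at the chosen (top, left) offset.
import numpy as np


def cut_paste(defect_img: np.ndarray, defect_mask: np.ndarray,
              target_img: np.ndarray, top: int, left: int) -> np.ndarray:
    """Copy the masked defect pixels from defect_img into target_img."""
    ys, xs = np.nonzero(defect_mask)
    y0, y1 = ys.min(), ys.max() + 1  # tight bounding box around the defect
    x0, x1 = xs.min(), xs.max() + 1
    patch = defect_img[y0:y1, x0:x1]
    patch_mask = defect_mask[y0:y1, x0:x1].astype(bool)

    out = target_img.copy()
    h, w = patch.shape[:2]
    region = out[top:top + h, left:left + w]  # view into the copy
    region[patch_mask] = patch[patch_mask]    # overwrite only defect pixels
    return out
```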

The VISION Datasets and Challenges aim to foster research in areas such as algorithms for data/annotation limitations (self-supervised, semi-supervised, few-shot, weak supervision, transfer learning), data generation techniques (VAE, GAN, Diffusion models, 3D rendering, inverse rendering), and data-centric tools for collection, curation, automatic labeling, and quality assessment.

The authors acknowledge limitations, including the inability to capture all real-world variability (lighting, angles, scale) or to exhaustively cover all possible defect types. Future work is encouraged to build more comprehensive datasets and to investigate transfer learning and domain adaptation to improve generalization.

The VISION Datasets are available at https://huggingface.co/datasets/VISION-Workshop/VISION-Datasets under a CC BY-NC 4.0 License for the polygon annotations, while original dataset assets retain their respective licenses.
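
To obtain the data locally, a generic Hub snapshot download works regardless of the repository's file layout; whether datasets.load_dataset can consume the repo directly is not confirmed here.

```python
# Minimal sketch: fetch the VISION Datasets repository from the Hub.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="VISION-Workshop/VISION-Datasets",
    repo_type="dataset",
)
print("Downloaded to:", local_dir)
```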