- The paper demonstrates a deep learning approach that integrates object detection, segmentation, and anomaly detection, eliminating the need for manual feature engineering.
- It employs CNN architectures such as SSD, MobileNet, and DeepLab V3+ with transfer learning to achieve high precision and improved recall, even with limited training data.
- An unsupervised GAN-based anomaly detection method is shown to match human performance while delivering rapid fault evaluation, paving the way for autonomous quality control.
Deep Learning Models for Visual Inspection in Automotive Assembly
Introduction
The integration of computer vision systems (CVSs) into automotive manufacturing for visual inspection has been hindered primarily by the inflexibility and environmental sensitivity of traditional feature-engineered methods. The paper "Deep Learning Models for Visual Inspection on Automotive Assembling Line" (arXiv:2007.01857) presents a comprehensive empirical study of deep learning-based end-to-end models, covering object detection, semantic segmentation, and anomaly detection, deployed on a real automotive production line with a focus on minimal disruption to the factory cycle and environmental setup.
Feature Engineering vs. Feature Learning in Manufacturing Vision Tasks
The authors juxtapose conventional vision pipelines, which entail distinct stages for image acquisition, preprocessing, manual feature extraction, and separate classification, with the deep learning paradigm where feature learning is achieved via data-driven end-to-end optimization. The deep hierarchical structures in CNNs, such as MobileNet and SSD for detection and DeepLab V3+ for segmentation, are shown to abstract representations more flexibly than hand-crafted features. This eliminates the critical bottleneck of requiring domain-specific knowledge and extensive manual labor for system adaptation when new product variants are introduced—a principal challenge in flexible manufacturing systems.
Object Detection: Architecture, Industrial Case Studies, and Results
The object detection system combines SSD with a MobileNet backbone, leveraging transfer learning to compensate for limited domain-specific labeled data. The deployment targeted the classification and localization of brake callipers and discs across six categories (three types of each). Because image variability is low in the controlled manufacturing environment, training used only 20 images per class, with 15 images per class held out for evaluation.
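The transfer-learning setup described above can be illustrated with a deliberately simplified sketch: a frozen "backbone" produces features, and only a small classification head is trained on the few labeled samples available. The backbone, dataset, and hyperparameters below are all hypothetical stand-ins for illustration, not the paper's actual SSD/MobileNet pipeline.

```python
import math
import random

random.seed(0)

def backbone(x):
    # Frozen feature extractor: a fixed nonlinear projection of the input.
    # In practice this would be a pretrained CNN whose weights are not updated.
    return [math.tanh(x[0] + x[1]), math.tanh(x[0] - x[1])]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny synthetic dataset, 20 samples per class as in the paper's setup:
# class 0 clusters near (0, 0), class 1 near (2, 2).
data = [([random.gauss(0, 0.3), random.gauss(0, 0.3)], 0) for _ in range(20)]
data += [([random.gauss(2, 0.3), random.gauss(2, 0.3)], 1) for _ in range(20)]

# Train only the head (logistic regression on the frozen features).
w, b, lr = [0.0, 0.0], 0.0, 0.5
for _ in range(200):
    for x, y in data:
        f = backbone(x)                        # backbone is never updated
        p = sigmoid(w[0] * f[0] + w[1] * f[1] + b)
        g = p - y                              # gradient of log-loss w.r.t. logit
        w = [w[0] - lr * g * f[0], w[1] - lr * g * f[1]]
        b -= lr * g

accuracy = sum(
    (sigmoid(w[0] * backbone(x)[0] + w[1] * backbone(x)[1] + b) > 0.5) == bool(y)
    for x, y in data
) / len(data)
print(round(accuracy, 2))
```

Freezing the feature extractor is what makes 20 images per class viable: only the small head's parameters are fit to the scarce data.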
Recognition was assessed using IoU, precision, and recall. Evaluation on 321 new images, with a 90% confidence threshold for detection, yielded perfect precision (1.00) for all disc and calliper types but recall values in the range 0.68–0.91, primarily due to missed detections caused by partial occlusions and improper camera triggering. A notable intervention was majority voting over multiple video frames, which eliminated false-negative (FN) errors and produced 100% final accuracy over aggregated video sequences for all part types. This demonstrates that robust triggering and temporal aggregation can substantially enhance system reliability in industrial settings.
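The two evaluation mechanisms above, thresholded precision/recall and frame-level majority voting, can be sketched as follows. The data records and part labels are illustrative only, not the paper's results.

```python
from collections import Counter

def precision_recall(predictions, threshold=0.90):
    """Compute precision/recall from (confidence, predicted, true) records.

    A detection counts only if its confidence meets the threshold;
    below-threshold detections of a present part become false negatives,
    mirroring the missed detections from occlusion and mistimed triggering.
    """
    tp = fp = fn = 0
    for score, pred, true in predictions:
        if score >= threshold:
            if pred == true:
                tp += 1
            else:
                fp += 1
        else:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def vote_over_frames(frame_labels):
    """Aggregate per-frame detections into one decision per video sequence."""
    return Counter(frame_labels).most_common(1)[0][0]

# Hypothetical sequence: one frame yields no detection, but voting over the
# remaining frames still recovers the correct part type for the sequence.
frames = ["disc_A", "disc_A", None, "disc_A", "disc_A"]
print(vote_over_frames([f for f in frames if f is not None]))  # disc_A
```

A single missed frame lowers per-image recall, but as long as most frames in a sequence are detected correctly, the voted sequence-level decision is correct, which is why aggregation removed the FN errors.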
Semantic Segmentation: Encoder-Decoder Architectures for Region Labeling
The semantic segmentation task was addressed with DeepLab V3+ using an Xception-65 encoder backbone. Cylinder head machining was the primary use case, focusing on segmenting regions of interest for defect inspection (machined surface, holes, background, defects). Despite a limited dataset (10 high-resolution images split into 288 training and 72 test patches), the system achieved a mean IoU of 0.7894. While the model effectively segmented major regions, it failed to detect small or subtle defects, which the authors attribute to overfitting and to information loss from aggressive downsampling. The proof of concept confirms applicability for operator assistance but indicates that training-data quantity and curation are critical for defect localization performance.
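The mean IoU metric reported above can be computed as the per-class intersection-over-union averaged across classes present in either mask. The toy "mask" below is illustrative; real evaluation runs over full prediction and ground-truth label maps.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes for flat label lists.

    For each class, IoU = |pred ∩ target| / |pred ∪ target|; classes absent
    from both masks are skipped so they do not distort the average.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy 8-pixel "mask": 0 = background, 1 = machined surface, 2 = hole.
pred   = [0, 0, 1, 1, 1, 2, 2, 0]
target = [0, 0, 1, 1, 2, 2, 2, 0]
print(round(mean_iou(pred, target, 3), 3))  # 0.778
```

Note how a single mislabeled boundary pixel drags down the IoU of two classes at once; this sensitivity is why small defects, which occupy few pixels, are penalized so heavily in the reported score.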
Anomaly Detection: Application of GANs and Unsupervised Models
Addressing unsupervised fault detection, the study implements AnoGAN for anomaly localization in both cylinder head and brake kit inspection. Models were trained exclusively on normal samples (16,000 images for cylinder heads, 900 for each brake kit type), using the adversarial paradigm to learn generative and discriminative mappings.
In cylinder head evaluation, the system was able to highlight defects exceeding 5 mm² in size, with limitations in detecting smaller anomalies—primarily due to input resolution and sensor scale. Notably, the deep learning system matched human supervisors in all tested error cases while offering faster evaluation (seconds per image) and obviating the need for domain expertise in configuration.
In brake kit inspection, anomaly detection was chained with object detection to localize and extract the kit region prior to GAN inference. Thresholds were set empirically, and the system reliably flagged out-of-conformity cases, such as the introduction of a new calliper model, which had not been included in the normal training set.
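The AnoGAN-style scoring used in both use cases can be sketched in miniature: a generator trained only on normal samples is inverted by searching the latent space for the closest reproduction of a query image, and the query is scored by a weighted sum of residual and discrimination losses against an empirical threshold. The generator, discriminator stand-in, 4-pixel "images", and threshold below are all toy assumptions, not the paper's trained models.

```python
import random

random.seed(1)

NORMAL = [0.2, 0.2, 0.8, 0.8]          # prototype of a conforming part

def G(z):
    # Toy generator: can only produce scaled versions of the normal pattern,
    # so defective inputs cannot be reconstructed well from any latent z.
    return [z * v for v in NORMAL]

def D_feature_loss(x, gx):
    # Stand-in for the discrimination (feature-matching) loss of AnoGAN.
    return sum(abs(a - b) for a, b in zip(x, gx)) / len(x)

def anomaly_score(x, lam=0.1, steps=200, lr=0.5):
    # Latent search: gradient descent on the residual loss sum((x_i - z*v_i)^2).
    z = random.uniform(0.5, 1.5)
    for _ in range(steps):
        grad = sum(-2 * v * (xi - z * v) for xi, v in zip(x, NORMAL))
        z -= lr * grad
    gx = G(z)
    residual = sum(abs(a - b) for a, b in zip(x, gx))
    return (1 - lam) * residual + lam * D_feature_loss(x, gx)

normal_part = [0.21, 0.19, 0.80, 0.81]
defective   = [0.90, 0.20, 0.80, 0.10]  # out-of-conformity pattern
threshold = 0.5                         # set empirically, as in the paper
print(anomaly_score(normal_part) < threshold,
      anomaly_score(defective) > threshold)  # True True
```

Because the generator only ever learned normal appearance, a large residual after the latent search is itself the anomaly signal; no labeled fault data enters the procedure at any point, which is what makes the approach attractive when defects are rare.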
Implications, Limitations, and Future Work
The empirical findings strongly support the assertion that deep learning enables robust, flexible, and low-touch visual inspection in real-world automotive assembly lines, with clear superiority over traditional vision systems in adaptability and deployment speed. However, limitations arise from training data volume, input resolution, and hardware requirements, especially for real-time execution.
From a theoretical standpoint, the unsupervised anomaly detection approach bypasses the need for labeled fault data, which are often scarce in manufacturing, but is limited in spatial granularity for small-scale defects. Object detection and semantic segmentation architectures are shown to generalize acceptably using small datasets in highly controlled industrial imagery, but scalability to more variable production lines requires further validation.
Practically, the study suggests that widespread adoption of deep learning inspection will demand investment in camera and compute hardware, or infrastructure for distributed/cloud inference, to meet real-time constraints. The elimination of feature engineering dramatically reduces the need for machine learning specialists during system adaptation, democratizing system maintenance and scaling.
Future developments could include integration with distributed learning frameworks to leverage factory-level compute [35], cross-modal learning for image-based documentation and operator monitoring [36, 37], and the inclusion of reinforcement learning to model operator behavior and optimize human-in-the-loop inspection [10].
Conclusion
This paper provides rigorous evidence supporting deep learning-based end-to-end visual inspection as a viable, efficient, and flexible solution in automotive manufacturing. The strong performance across object detection, segmentation, and anomaly detection in realistic factory contexts demonstrates the maturity of these technologies for industrial adoption, contingent on sufficient data engineering and infrastructure support. The research opens pathways for further exploration in unsupervised and reinforcement learning for increasingly autonomous and resilient manufacturing quality control.