Deep Learning Models for Visual Inspection on Automotive Assembling Line (2007.01857v1)

Published 2 Jul 2020 in cs.CV and eess.IV

Abstract: Automotive manufacturing assembly tasks are built upon visual inspections such as scratch identification on machined surfaces, part identification and selection, etc., which guarantee product and process quality. These tasks can be related to more than one type of vehicle that is produced within the same manufacturing line. Visual inspection was essentially human-led but has recently been supplemented by the artificial perception provided by computer vision systems (CVSs). Despite their relevance, the accuracy of CVSs varies according to environmental settings such as lighting, enclosure and quality of image acquisition. These issues entail costly solutions and override part of the benefits introduced by computer vision systems, mainly when it interferes with the operating cycle time of the factory. In this sense, this paper proposes the use of deep learning-based methodologies to assist in visual inspection tasks while leaving very little footprint in the manufacturing environment and exploring it as an end-to-end tool to ease CVSs setup. The proposed approach is illustrated by four proofs of concept in a real automotive assembly line based on models for object detection, semantic segmentation, and anomaly detection.

Summary

  • The paper shows that deep learning models can replace manual, parameter-tuned computer vision systems for end-to-end inspection in automotive manufacturing.
  • It details proofs of concept for three methodologies: object detection with SSD and MobileNet reaching 100% accuracy via temporal voting, semantic segmentation reaching an mIoU of 0.7894, and anomaly detection with AnoGAN applied in two case studies.
  • The study highlights benefits such as a reduced need for specialized expertise and environmental control, while noting that the computational power required for real-time inference remains a challenge.

Visual inspection is a critical task in automotive manufacturing assembly lines, ensuring product and process quality. Traditionally it has been human-led, which is costly and hard to scale to flexible lines that produce several vehicle types on the same line. Traditional computer vision systems (CVSs) supplement human inspection but face their own limitations in flexible environments: their accuracy depends on settings such as lighting, enclosure, and image acquisition quality, and they require significant parameter tuning for variations in texture, luminosity, and new product templates. This paper proposes using deep learning (DL) models as end-to-end tools to overcome these challenges, requiring less environmental control and less specialized expertise for setup and adaptation.

The core difference highlighted between traditional CVS and deep learning is the shift from manual feature engineering to automated feature learning. Traditional methods rely on handcrafted features and separate steps for pre-processing, feature extraction, and classification, which are problem-specific and require highly skilled personnel. Deep learning models, with their deep hierarchical structure and end-to-end optimization, learn features directly from raw data through multi-level nonlinear operations, making them more generalizable and adaptable to various data types without requiring specialized domain knowledge for feature extraction.

The paper demonstrates the practical application of deep learning for automotive visual inspection through proofs of concept implemented on a real automotive assembly line at Renault do Brasil, covering three methodologies: object detection, semantic segmentation, and anomaly detection (the last with two case studies, for four proofs of concept in total). The goal was to achieve effective inspection with minimal changes to the existing manufacturing environment and cycle time.

Object Detection for Part Conformity

  • Problem: Ensuring the correct brake disc and calliper types are assembled into kits loaded onto Automatic Guided Vehicles (AGVs). Manual loading is error-prone, potentially mixing parts for different vehicle models.
  • Approach: Train a deep learning model to detect and classify the brake disc and calliper components within images of the assembly kit.
  • Implementation:
    • The system uses the TensorFlow Object Detection API [21].
    • The chosen architecture is SSD (Single Shot MultiBox Detector) [19] combined with MobileNet [23] as the feature extraction backbone. This combination was selected for its balance of speed and accuracy, making it suitable for applications with hardware limitations and real-time requirements [22].
    • SSD predicts objects from a set of default bounding boxes of different sizes and aspect ratios tiled across the image, classifying the objects within these boxes. MobileNet replaces the traditional VGG-16 [24] backbone, significantly reducing computational cost and model size via depthwise separable convolutions [23] (a cost comparison is reproduced after this list).
    • Training involved supervised learning on a small dataset: 20 images (400x400 pixels) per class (three types of discs, three types of callipers, total six classes). Testing used 15 images per class.
    • Evaluation metrics included Intersection over Union (IoU) (Eq. 1), Precision (Eq. 2), and Recall (Eq. 3); their standard definitions are reproduced after this list.
    • A key challenge was False Negatives (FN) caused by images captured from video frames where parts were partially occluded at the beginning or end of their path in front of the camera.
    • This limitation was addressed by implementing a temporal voting mechanism (sketched in code after this list). Instead of deciding from a single image, the system analyzed a sequence of frames as the part moved: a trigger started counting detections at the first successful detection, the process stopped after 100 consecutive frames without a detection, and the final classification was the majority vote across the frames with detections.
  • Results & Practical Implications:
    • Initial evaluation on single images showed trade-offs between Precision and Recall based on the detection probability threshold (Figure 5). Higher thresholds increased precision but decreased recall (more FNs).
    • Implementing the temporal voting mechanism using video frames dramatically improved robustness. Tests on nine videos (three for each kit type) showed 100% accuracy in correctly identifying the brake disc and calliper types (Table 2).
    • This demonstrates that accurate object detection for process quality control is feasible in real-time on an assembly line using relatively efficient models, even with limited training data, by leveraging temporal information from video streams. The system could verify kit conformity without halting the production line.
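
As background for the MobileNet choice above, the efficiency gain from depthwise separable convolutions can be quantified. These cost expressions come from the MobileNet paper [23], not from this paper; D_K is the kernel size, M and N the input and output channel counts, and D_F the feature map size:

```latex
\underbrace{D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F}_{\text{standard convolution}}
\quad\text{vs.}\quad
\underbrace{D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F}_{\text{depthwise + pointwise}}
```

The reduction factor is 1/N + 1/D_K^2, roughly an 8-9x saving for the 3x3 kernels MobileNet uses.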
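
The paper's Eqs. 1-3 are the standard detection metrics; their usual definitions, with B_p the predicted box, B_gt the ground-truth box, and TP/FP/FN counted at a fixed IoU threshold, are:

```latex
\mathrm{IoU} = \frac{\operatorname{area}(B_p \cap B_{gt})}{\operatorname{area}(B_p \cup B_{gt})},
\qquad
\mathrm{Precision} = \frac{TP}{TP + FP},
\qquad
\mathrm{Recall} = \frac{TP}{TP + FN}
```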
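
A minimal sketch of the temporal voting logic, assuming a per-frame detector that yields a class label or None; the paper publishes no code, so the function and its interface are illustrative:

```python
from collections import Counter

def classify_by_voting(frame_detections, patience=100):
    """Majority-vote a part's class over a stream of per-frame detections.

    frame_detections yields a class label (str) when a frame produces a
    successful detection, or None otherwise; patience is the number of
    consecutive empty frames after which counting stops (100 in the paper).
    """
    votes = Counter()
    triggered = False   # counting starts at the first successful detection
    misses = 0          # consecutive frames without a detection
    for label in frame_detections:
        if label is not None:
            triggered, misses = True, 0
            votes[label] += 1
        elif triggered:
            misses += 1
            if misses >= patience:  # part has left the camera's view
                break
    return votes.most_common(1)[0][0] if votes else None
```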

Semantic Segmentation for Region-Specific Inspection

  • Problem: Inspecting specific machined regions on parts, such as the face of a cylinder head, for defects like scratches or porosity. This requires precise identification of these regions.
  • Approach: Use semantic segmentation to classify each pixel in the image, creating masks that highlight different regions (e.g., machined surface, holes, background).
  • Implementation:
    • The system uses the TensorFlow Semantic Segmentation API [32].
    • The model architecture is DeepLab V3+ [26], which employs an encoder-decoder structure and Atrous Spatial Pyramid Pooling (ASPP) to capture multi-scale contextual information. Atrous convolution allows adjusting the filter's field of view to incorporate information at different scales [26]; its definition is reproduced after this list.
    • The base feature extraction network used is Xception-65.
    • Training is supervised, requiring manually created label maps (ground truth) where different regions are colored with specific class labels (e.g., black for background, yellow for machined surface, red for defects) (Figure 11b). The paper notes that non-experts can perform this annotation task, which is laborious but reduces the need for inspection engineers.
    • A small dataset was used for this proof of concept: 10 manually annotated images (4128x3096 pixels). Each was divided into 36 patches (688x516 pixels) to preserve detail when resizing to the model's input dimensions (321x321), yielding 288 training and 72 testing patches (a patching sketch follows this list). Synthetic defects were added to scrap parts to create defect examples (Figure 11b).
    • Training was performed for 1000 steps with a mini-batch size of 2 due to GPU memory limitations (a GeForce GT 540M with 2 GB).
    • Evaluation uses mean Intersection over Union (mIoU) across all image pixels.
  • Results & Practical Implications:
    • The model achieved an mIoU of 0.7894.
    • While the system could effectively create masks for different regions, it was unable to detect the synthetically generated defects in the test images (Figure 12). This failure was attributed to the very small training dataset and potential information loss during image resizing/patching.
    • Despite not detecting defects in this specific PoC, the results demonstrated the feasibility of using semantic segmentation to automatically segment critical inspection regions on complex parts like cylinder heads. This can assist operators by focusing their attention on specific areas or automating the initial masking step.
    • Practical considerations include the labor involved in manual annotation and the need for sufficient computational resources or strategies (like optimized patching/resizing) to handle high-resolution images without losing critical detail.
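
For reference, the atrous convolution underlying ASPP has the standard one-dimensional form below (from the DeepLab papers, not notation introduced by this paper); the rate r inserts gaps between kernel taps, enlarging the field of view without extra parameters, and r = 1 recovers ordinary convolution:

```latex
y[i] = \sum_{k=1}^{K} x[i + r \cdot k] \, w[k]
```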
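
A sketch of the patching step; the non-overlapping 6x6 grid is inferred from the reported sizes (4128/688 = 3096/516 = 6), and the helper name is mine:

```python
import numpy as np

def split_into_patches(image, rows=6, cols=6):
    """Split an H x W (x C) array into a rows x cols grid of patches.

    For the paper's 4128x3096 images, a 6x6 grid gives the reported
    36 patches of 688x516 pixels each.
    """
    ph, pw = image.shape[0] // rows, image.shape[1] // cols
    return [image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
            for r in range(rows) for c in range(cols)]

# Each patch is then resized to the model's 321x321 input, e.g. with
# tf.image.resize, before training or inference.
```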

Anomaly Detection for Product Quality

  • Problem: Identifying parts that deviate from the normal expected appearance, which could indicate various types of defects or assembly errors. This is challenging because anomalies are rare and diverse, making it difficult to collect labeled examples of all possible defects.
  • Approach: Utilize an unsupervised or semi-supervised approach that learns the distribution of 'normal' parts and flags any input that deviates significantly.
  • Implementation:
    • The paper employs the AnoGAN architecture [14], which is based on Generative Adversarial Networks (GANs) [34].
    • A GAN consists of a Generator (G) that learns to produce images resembling the training data (normal parts) and a Discriminator (D) that learns to distinguish between real training images and generated images (Figure 13).
    • The AnoGAN process involves training the GAN solely on images of normal, healthy parts. During inference, a query image (potentially anomalous) is input, and the system searches the generator's latent space for a point that produces an image most similar to the query [14] (a search sketch follows this section).
    • An anomaly score A(x) is calculated for the query image x, combining a residual loss L_R(z) (Eq. 5), which measures the visual difference between x and the generated image G(z) (reconstruction error), and a discrimination loss L_D(z) (Eq. 6), based on how well the discriminator assesses G(z) [14]. A high anomaly score indicates a deviation from the learned distribution of normal parts; the equations are reproduced after this section.
    • A threshold for the anomaly score is empirically determined by analyzing a set of normal images.
    • To aid visual inspection, a residual image (Eq. 7) is generated, showing the difference between the original and reconstructed image, often visualized with a colormap (jet) to highlight potential anomalous regions [14].
    • The architecture details for the Discriminator and Generator models are provided (Figures 15 and 16), using layers like Conv2D, LeakyReLU, MaxPooling2D, Flatten, Dense, BatchNormalization, and Conv2DTranspose. Specific Adam optimizer parameters and learning rates are given, based on [14]. Input images are resized to 200x200 pixels.
    • Training is unsupervised (on normal data only) and requires a substantial dataset of normal parts.
  • Case Studies:
    • Cylinder Head: Implemented on the machining line, synchronized with a robot arm handling the part to capture images of the valve guides using a Basler acA3800-10gc camera. Trained on 16,000 images of normal cylinder heads.
    • Brake Kit: Chained with the object detection system. The object detection model first identifies the brake kit, crops the relevant region (using the bounds of detected discs and callipers), and this cropped image is then fed into a specialized AnoGAN model trained for that specific brake kit type. Trained specialist models for each of the three brake kit types (900 images of normal parts per type).
  • Results & Practical Implications:
    • Cylinder Head: The system identified all errors pointed out by human supervisors during a side-by-side comparison (Figure 18). Its main advantage was speed, analyzing images in seconds compared to human inspection. Limitations included an inability to detect very small defects (< 5 mm²) due to image scale and resolution. The colormap visualization assisted operators in locating potential issues.
    • Brake Kit: The chained system successfully detected an assembled kit that included an incorrect calliper type (meant for a new car model and not among the trained 'normal' types) (Figure 21). This demonstrated the benefit of chaining models for a multi-stage inspection process, allowing specialized anomaly analysis for each component type. As in the cylinder head case, detecting small surface defects was limited by camera resolution and positioning.
    • Overall, AnoGAN proved effective for detecting deviations from normal, addressing the difficulty of obtaining defect data. It was less intrusive on the production line compared to conventional systems requiring specialized lighting, although camera synchronization might necessitate minor robot path adjustments. The primary requirement is a large dataset of normal parts.
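
For reference, the losses behind Eqs. 5-7 in the standard AnoGAN formulation [14], which this paper follows; f(·) denotes an intermediate discriminator feature layer (the feature-matching variant recommended in [14]), z_Γ the latent point after Γ search steps, and λ a weighting (0.1 in [14]):

```latex
L_R(z_\gamma) = \sum \lvert x - G(z_\gamma) \rvert                  % Eq. 5: residual loss
L_D(z_\gamma) = \sum \lvert f(x) - f(G(z_\gamma)) \rvert            % Eq. 6: discrimination loss
A(x) = (1 - \lambda)\, L_R(z_\Gamma) + \lambda\, L_D(z_\Gamma)      % anomaly score
x_R = \lvert x - G(z_\Gamma) \rvert                                 % Eq. 7: residual image
```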
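
And a minimal sketch of the latent-space search at inference time; the generator, discriminator feature extractor, and latent dimension are assumed given, and the step count and learning rate are placeholders rather than the paper's values:

```python
import tensorflow as tf

def anogan_score(x, generator, disc_features, latent_dim,
                 steps=500, lam=0.1, lr=0.1):
    """Find the latent code whose generated image best matches x.

    x: query image batch (1, H, W, C), preprocessed like the training data.
    generator / disc_features: frozen models from the trained GAN;
    disc_features maps an image to an intermediate discriminator layer.
    """
    z = tf.Variable(tf.random.normal([1, latent_dim]))
    opt = tf.keras.optimizers.Adam(learning_rate=lr)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            g = generator(z, training=False)
            loss_r = tf.reduce_sum(tf.abs(x - g))              # residual loss (Eq. 5)
            loss_d = tf.reduce_sum(tf.abs(disc_features(x)
                                          - disc_features(g))) # discrimination loss (Eq. 6)
            loss = (1.0 - lam) * loss_r + lam * loss_d         # anomaly score
        opt.apply_gradients([(tape.gradient(loss, z), z)])     # only z is updated
    residual = tf.abs(x - generator(z, training=False))        # Eq. 7: residual image
    return float(loss), residual

# Images whose score exceeds the empirically chosen threshold are flagged,
# and the residual image (jet colormap) localizes the suspect region.
```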

Chaining Methodologies and Overall Implications

The paper demonstrates how chaining different DL methodologies can create a more comprehensive inspection system. For example, object detection can localize a part and determine its type, which then triggers a specific anomaly detection model trained for that part type. This multi-stage approach allows for more detailed and tailored analysis.
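
A sketch of that chaining for the brake-kit case; the detector and per-kit anomaly-model interfaces are hypothetical stand-ins for the SSD+MobileNet and specialist AnoGAN models described above:

```python
from collections import Counter

def inspect_brake_kit(image, detector, anomaly_models, thresholds):
    """Stage 1: detect disc/calliper parts and infer the kit type.
    Stage 2: crop the kit region, run the matching specialist AnoGAN.

    detector(image) -> list of (kit_label, box, score), box = (x0, y0, x1, y1).
    anomaly_models / thresholds: one scorer and empirical threshold per
    brake-kit type, as in the paper.
    """
    detections = detector(image)
    if not detections:
        return "no part detected"
    # Kit type inferred from the most common detected part label (simplified).
    kit_type = Counter(lbl for lbl, _, _ in detections).most_common(1)[0][0]
    # Crop the union of all detected boxes, as the paper describes.
    x0 = min(b[0] for _, b, _ in detections)
    y0 = min(b[1] for _, b, _ in detections)
    x1 = max(b[2] for _, b, _ in detections)
    y1 = max(b[3] for _, b, _ in detections)
    score, _ = anomaly_models[kit_type](image[y0:y1, x0:x1])
    return "anomalous" if score > thresholds[kit_type] else "ok"
```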

A significant practical benefit highlighted across all applications is the "end-to-end" nature of deep learning models. Once trained, these systems require minimal parameter adjustment and can be applied by non-experts, easing setup and adaptation in flexible manufacturing environments.

However, a major consideration for implementing these systems in real-world industrial settings is the computational power required for real-time inference. Deep learning models, especially those processing high-resolution images or complex architectures like GANs, demand robust hardware (GPUs, powerful CPUs). While cloud processing or local servers are alternatives to edge deployment with resource-constrained devices, they introduce potential latency and infrastructure complexity. The authors mention this as a key disadvantage and area for future work, suggesting exploration of distributed deep learning [35].

Future research directions include integration with distributed deep learning systems [35], applying other DL tasks such as image retrieval and captioning [36] or biometrics for operator safety [37], and using reinforcement learning [10] to mimic dynamic operator behavior in inspection or assembly tasks.

In conclusion, the paper provides practical evidence that deep learning models for object detection, semantic segmentation, and anomaly detection are viable and beneficial solutions for automating visual inspection on automotive assembly lines. They offer flexibility, require less environmental control than traditional methods, and can be implemented as end-to-end tools. While data collection (especially for normal states in anomaly detection and annotations for supervised tasks) and computational infrastructure remain important considerations, the demonstrated proofs of concept pave the way for more adaptable and efficient automated quality control in flexible manufacturing.