
Multimodal Industrial Anomaly Detection via Hybrid Fusion

Published 1 Mar 2023 in cs.CV (arXiv:2303.00601v2)

Abstract: 2D-based Industrial Anomaly Detection has been widely discussed, however, multimodal industrial anomaly detection based on 3D point clouds and RGB images still has many untouched fields. Existing multimodal industrial anomaly detection methods directly concatenate the multimodal features, which leads to a strong disturbance between features and harms the detection performance. In this paper, we propose Multi-3D-Memory (M3DM), a novel multimodal anomaly detection method with hybrid fusion scheme: firstly, we design an unsupervised feature fusion with patch-wise contrastive learning to encourage the interaction of different modal features; secondly, we use a decision layer fusion with multiple memory banks to avoid loss of information and additional novelty classifiers to make the final decision. We further propose a point feature alignment operation to better align the point cloud and RGB features. Extensive experiments show that our multimodal industrial anomaly detection model outperforms the state-of-the-art (SOTA) methods on both detection and segmentation precision on MVTec-3D AD dataset. Code is available at https://github.com/nomewang/M3DM.

Citations (62)

Summary

  • The paper introduces M3DM, a hybrid fusion approach that integrates 3D point cloud and RGB image data to overcome interference issues in feature fusion.
  • The methodology employs unsupervised feature fusion with patch-wise contrastive learning and point feature alignment to enhance multimodal feature integration.
  • Experimental results on the MVTec-3D AD dataset show superior image-level anomaly detection and segmentation precision, validating the model’s effectiveness.


This paper presents a method for industrial anomaly detection that leverages multimodal data, specifically 3D point clouds and RGB images, to enhance detection accuracy. The authors introduce Multi-3D-Memory (M3DM), a hybrid fusion approach aimed at leveraging the advantages of both 3D and 2D data by effectively fusing and processing their respective features.

Key Contributions

The paper identifies a significant limitation in existing multimodal anomaly detection methods, which often concatenate features from different modalities, leading to undesirable interference and reduced detection efficacy. To address these issues, the authors propose a hybrid fusion model with multiple novel components:

  1. Unsupervised Feature Fusion (UFF): Patch-wise contrastive learning encourages interaction between features from different modalities, pulling corresponding RGB and point-cloud patch features together in a shared space. This alignment sharpens the model's ability to detect anomalies that only surface in the interaction between modalities.
  2. Decision Layer Fusion (DLF): The paper proposes using multiple memory banks to store features separately from RGB, 3D, and fused modalities, preserving individual modal information and enabling robust anomaly predictions.
  3. Point Feature Alignment (PFA): The paper introduces a method to better align and integrate 3D point cloud features with 2D images, facilitating a coherent and unified representation for improved anomaly detection.
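The patch-wise contrastive objective behind UFF can be sketched as a symmetric InfoNCE loss over matched patch pairs. The following is a minimal NumPy illustration, not the authors' implementation; it assumes row i of each feature matrix describes the same spatial patch, so the diagonal of the similarity matrix holds the positive pairs.

```python
# Minimal sketch of patch-wise contrastive fusion (not the authors' code).
# Assumption: row i of rgb_feats and pc_feats describes the same patch.
import numpy as np

def _xent_diag(logits):
    # Cross-entropy with the diagonal entry as the target class of each row.
    shifted = logits - logits.max(axis=1, keepdims=True)
    logp = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.diag(logp).mean()

def patchwise_contrastive_loss(rgb_feats, pc_feats, temperature=0.07):
    rgb = rgb_feats / np.linalg.norm(rgb_feats, axis=1, keepdims=True)
    pc = pc_feats / np.linalg.norm(pc_feats, axis=1, keepdims=True)
    logits = rgb @ pc.T / temperature  # (N, N) scaled cosine similarities
    # Symmetric InfoNCE: matching patches attract in both directions.
    return 0.5 * (_xent_diag(logits) + _xent_diag(logits.T))
```

Minimizing this loss drives corresponding RGB and point-cloud patches toward high similarity relative to all non-matching patches in the batch, which is the interaction the paper's fusion module is designed to encourage.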
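The DLF idea can likewise be sketched at a high level: keep one memory bank of normal features per modality, score a test patch by its distance to the nearest stored feature, and combine the per-bank scores. This is a simplification under stated assumptions; the paper trains additional novelty classifiers on the per-bank scores, for which a plain maximum stands in here.

```python
# Simplified decision layer fusion (an illustration, not the paper's exact
# classifiers): one memory bank per modality, nearest-neighbour scoring,
# and a maximum standing in for the learned novelty classifiers.
import numpy as np

class MemoryBank:
    def __init__(self, normal_feats):
        self.bank = np.asarray(normal_feats, dtype=float)

    def score(self, feats):
        # Distance from each query patch to its nearest normal feature.
        d = np.linalg.norm(feats[:, None, :] - self.bank[None, :, :], axis=2)
        return d.min(axis=1)

def decision_layer_fusion(rgb_scores, pc_scores, fused_scores):
    # The paper feeds per-bank scores to extra novelty classifiers;
    # taking the worst (highest) score is a simple stand-in.
    return np.maximum.reduce([rgb_scores, pc_scores, fused_scores])
```

Keeping separate banks for RGB, 3D, and fused features is what preserves modality-specific information: an anomaly visible in only one modality still produces a large score in that modality's bank, and the fusion step lets it dominate the final decision.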

Experimental Results

Through rigorous testing on the MVTec-3D AD dataset, the proposed approach outperforms state-of-the-art (SOTA) methods in both detection accuracy and segmentation precision. M3DM achieves notably higher scores in image-level anomaly detection (I-AUROC) and anomaly segmentation (AUPRO) than competing methods, reflecting the benefit of multimodal inputs for industrial anomaly detection tasks.
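For readers unfamiliar with the headline metric, I-AUROC can be computed directly from the ranks of the image-level anomaly scores. The sketch below uses the Mann-Whitney U formulation (ignoring ties): it equals the probability that a randomly chosen anomalous image receives a higher score than a randomly chosen normal one.

```python
# Rank-based I-AUROC sketch (Mann-Whitney U form, ties ignored).
import numpy as np

def image_auroc(labels, scores):
    """labels: 1 for anomalous images, 0 for normal; scores: anomaly scores."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos, n_neg = labels.sum(), (~labels).sum()
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

A score of 1.0 means every anomalous image outranks every normal image; 0.5 corresponds to chance-level ordering.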

Theoretical and Practical Implications

Theoretically, this work advances the understanding of multimodal feature fusion in unsupervised settings, enabling a more nuanced approach to industrial monitoring where defects are subtle or span multiple feature spaces. Practically, the method has clear potential for real-world quality assurance, pharmaceutical inspection, and other domains requiring meticulous examination of complex products.

Future Directions

This research opens several avenues for further exploration. Future investigations could focus on extending this framework to other types of multimodal data and exploring deeper integration with self-supervised learning techniques. Efficiency improvements, such as reducing computational overhead while preserving detection capacity, would also be critical for real-time applications. Moreover, expanding the model's adaptability to unseen anomaly types without retraining could significantly enhance its applicability.

In conclusion, the paper provides a well-founded contribution to the field of multimodal anomaly detection by addressing previous limitations with a novel hybrid fusion approach. Its strong experimental validation showcases the model's potential impact in industrial applications, highlighting the importance of sophisticated feature fusion techniques in leveraging the full spectrum of available sensory data.
