Papers
Topics
Authors
Recent
2000 character limit reached

FaultGPT: Industrial Fault Diagnosis Question Answering System by Vision Language Models

Published 21 Feb 2025 in cs.ET and eess.SP | (2502.15481v1)

Abstract: Recently, employing single-modality LLMs based on mechanical vibration signals as Tuning Predictors has introduced new perspectives in intelligent fault diagnosis. However, the potential of these methods to leverage multimodal data remains underexploited, particularly in complex mechanical systems where relying on a single data source often fails to capture comprehensive fault information. In this paper, we present FaultGPT, a novel model that generates fault diagnosis reports directly from raw vibration signals. By leveraging large vision-LLMs (LVLM) and text-based supervision, FaultGPT performs end-to-end fault diagnosis question answering (FDQA), distinguishing itself from traditional classification or regression approaches. Specifically, we construct a large-scale FDQA instruction dataset for instruction tuning of LVLM. This dataset includes vibration time-frequency image-text label pairs and human instruction-ground truth pairs. To enhance the capability in generating high-quality fault diagnosis reports, we design a multi-scale cross-modal image decoder to extract fine-grained fault semantics and conducted instruction tuning without introducing additional training parameters into the LVLM. Extensive experiments, including fault diagnosis report generation, few-shot and zero-shot evaluation across multiple datasets, validate the superior performance and adaptability of FaultGPT in diverse industrial scenarios.

Summary

  • The paper introduces FaultGPT, a model that leverages vision-language models to fuse vibration time-frequency images and text for industrial fault diagnosis.
  • The methodology integrates a visual encoder based on CLIP, a multi-scale cross-modal image decoder (MCID), and a prompt learner to enhance diagnostic accuracy.
  • Experimental evaluations using few-shot and zero-shot tests on benchmark datasets demonstrate robust performance and practical industrial applicability.

FaultGPT: Industrial Fault Diagnosis Question Answering System by Vision LLMs

Introduction

The paper presents FaultGPT, a novel model leveraging large vision-LLMs (LVLM) to automate industrial fault diagnosis through question answering. This approach addresses limitations in traditional methods, such as reliance on classification confidence scores and unimodal data sources, by integrating multimodal data for deeper semantic understanding. FaultGPT utilizes a large-scale instruction dataset featuring vibration time-frequency image-text label pairs and human instruction-ground truth pairs, significantly enhancing fault diagnosis capabilities in complex mechanical systems. Figure 1

Figure 1: Inference process of FaultGPT compared to traditional fault diagnosis methods.

Methodology

FaultGPT is designed with a visual encoder, a multi-scale cross-modal image decoder (MCID), and a prompt learner. The visual encoder employs a pre-trained CLIP model with adapter modules for efficient multimodal fusion, projecting vibration signal features onto a semantic embedding space compatible with LLMs. MCID extracts fine-grained fault semantics, capturing localized fault information by leveraging cross-attention mechanisms on visual inputs. The prompt learner aligns extracted visual features with language generation prompts, enhancing the accuracy of fault diagnosis reports. Figure 2

Figure 2: The overall training framework of the proposed FaultGPT. \ding{172}: Visual encoder, \ding{173}: MCID, \ding{174}: Prompt learner.

FDQA Instruction-Following Dataset

FaultGPT's instruction dataset was compiled from three major bearing fault datasets: CWRU, SCUT-FD, and Ottawa. This comprehensive dataset includes descriptions of time-frequency images, specifying fault types and characteristics, facilitating accurate fault detection. The instruction-following format enables LLMs to process multimodal inputs and generate relevant responses, optimizing fault diagnosis workflows. Figure 3

Figure 3: Example of fault diagnosis instruction data.

Experimental Evaluation

FaultGPT was evaluated against several open-source LVLMs across datasets, yielding superior results in generating fault diagnosis reports. Few-shot and zero-shot evaluations demonstrated the model's robustness and adaptability to unseen scenarios, with detailed ablation studies confirming the efficacy of core components like MCID and prompt learner in enhancing performance.

Key performance metrics include:

  • Accuracy, BLEU, ROUGE-L, CIDEr-D, and match scores, which collectively assess the model's ability to generate concise, relevant, and accurate diagnostic reports. Figure 4

    Figure 4: Few-shot and Zero-shot performance on SCUT-FD dataset. IT denotes instruction tuning.

Ablation Studies

The ablation study confirmed the significance of instruction tuning and the effectiveness of various loss functions (cross-entropy, focal, and dice) in training. The choice of wavelet basis in time-frequency transformations was assessed, demonstrating robust performance across different bases, with Morlet selected for primary experiments due to its stability. Figure 5

Figure 5: Ablation Study of Instruction Tuning on CWRU Dataset.

User Interface Design

The FaultGPT system is equipped with a user-friendly interface enabling real-time interaction and fault diagnosis for non-expert users, showcasing its practical application in industry. Users can input time-frequency images and receive detailed diagnostic reports based on the model's analysis. Figure 6

Figure 6: System demo showcasing an outer ring 2mm crack fault. The user interface is divided into four main sections: \ding{172} input area, \ding{173} user instruction area, \ding{174} report generation area, and \ding{175} MCID feature maps.

Conclusion

FaultGPT introduces a transformative approach to industrial fault diagnosis, harnessing LVLMs to perform detailed fault assessments beyond conventional methods. Future research will focus on expanding its application to compound fault diagnosis and other industrial domains, such as predicting remaining useful life, enhancing its versatility and impact across manufacturing sectors. Figure 7

Figure 7: Mean Loss and Mean Token Accuracy for the training process.

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Collections

Sign up for free to add this paper to one or more collections.