FD-LLM: Large Language Model for Fault Diagnosis of Machines

Published 2 Dec 2024 in cs.AI and cs.LG | (2412.01218v1)

Abstract: LLMs are effective at capturing complex, valuable conceptual representations from textual data for a wide range of real-world applications. However, in fields like Intelligent Fault Diagnosis (IFD), incorporating additional sensor data, such as vibration signals, temperature readings, and operational metrics, is essential, but such sensor information is challenging to capture within traditional text corpora. This study introduces a novel IFD approach by effectively adapting LLMs to numerical data inputs for identifying various machine faults from time-series sensor data. We propose FD-LLM, an LLM framework specifically designed for fault diagnosis by formulating the training of the LLM as a multi-class classification problem. We explore two methods for encoding vibration signals: the first uses a string-based tokenization technique to encode vibration signals into text representations, while the second extracts statistical features from both the time and frequency domains as statistical summaries of each signal. We assess the fault diagnosis capabilities of four open-source LLMs based on the FD-LLM framework, and evaluate the models' adaptability and generalizability under various operational conditions and machine components, namely in traditional fault diagnosis, cross-operational-condition, and cross-machine-component settings. Our results show that LLMs such as Llama3 and Llama3-instruct demonstrate strong fault detection capabilities and significant adaptability across different operational conditions, outperforming state-of-the-art deep learning (DL) approaches in many cases.

Summary

  • The paper introduces FD-LLM, a framework that adapts LLMs for intelligent fault diagnosis using novel FFT and statistical encoding methods.
  • The paper demonstrates that Llama3-based models achieve ≥99% accuracy on FFT-encoded vibration signals, outperforming traditional ML/DL models.
  • The paper highlights the role of contextual prompts and LoRA fine-tuning in enhancing adaptability, while noting challenges in cross-component generalization.

FD-LLM: LLM for Fault Diagnosis of Machines

Introduction

The FD-LLM framework introduces a paradigm for leveraging LLMs in intelligent fault diagnosis (IFD) of industrial machinery, specifically targeting the challenge of integrating time-series sensor data—such as vibration signals—into LLM-based classification systems. Traditional ML and DL approaches for IFD are limited by their dependence on extensive retraining, lack of interpretability, and poor generalization across operational conditions and machine components. FD-LLM addresses these limitations by adapting LLMs to process numerical sensor data through text-based encoding, enabling multi-class fault classification with enhanced adaptability and generalizability.

FD-LLM Framework and Data Encoding

FD-LLM is structured around three core stages: data pre-processing, instruction fine-tuning, and post-processing. The framework supports two distinct data encoding pipelines for vibration signals:

  1. FFT-Based String Tokenization: Vibration signals are segmented, transformed via FFT, and the resulting magnitudes are quantized and encoded as text strings. This approach ensures compatibility with LLM tokenizers and preserves frequency-domain information critical for fault discrimination.
  2. Statistical Feature Summarization: Time-domain and frequency-domain statistical features are extracted and serialized into text paragraphs, providing compact, high-level representations of signal characteristics. Both encodings are sketched below.
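As a rough illustration of the two pipelines, the sketch below shows how each encoding could be produced from a single vibration segment. The bin count, numeric precision, sampling rate, and feature set are assumed placeholder choices, not the paper's exact preprocessing.

```python
import numpy as np

def encode_fft_string(segment, n_bins=128, decimals=2):
    # FFT-based string tokenization (illustrative): magnitude spectrum,
    # normalized and serialized as space-separated decimal text.
    spectrum = np.abs(np.fft.rfft(segment))[:n_bins]
    spectrum = spectrum / (spectrum.max() + 1e-12)
    return " ".join(f"{m:.{decimals}f}" for m in spectrum)

def encode_stat_summary(segment, fs=12_000):
    # Statistical feature summarization (illustrative): a few time- and
    # frequency-domain statistics serialized into a short text description.
    spectrum = np.abs(np.fft.rfft(segment))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)
    rms = np.sqrt(np.mean(segment ** 2))
    feats = {
        "rms": rms,
        "kurtosis": np.mean((segment - segment.mean()) ** 4) / segment.std() ** 4,
        "crest_factor": np.max(np.abs(segment)) / rms,
        "spectral_centroid_hz": float((freqs * spectrum).sum() / spectrum.sum()),
    }
    return "; ".join(f"{name} = {value:.3f}" for name, value in feats.items())

segment = np.random.randn(2048)          # stand-in for one vibration segment
print(encode_fft_string(segment)[:60])   # e.g. "0.21 0.05 0.33 ..."
print(encode_stat_summary(segment))
```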

Both pipelines incorporate contextual information—machine specifications and operational conditions—into the input prompts, allowing the LLM to leverage domain knowledge during prediction (Figure 1).

Figure 1: An illustration of the FD-LLM framework using both FFT-processed and statistically processed data pipelines.
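To make the prompt construction concrete, a hypothetical instruction-tuning sample is sketched below. The field names, label set, wording, and example values are assumptions for illustration, not the paper's exact template.

```python
def build_sample(encoded_signal, machine_spec, operating_condition, label=None):
    # Assemble one instruction sample; wording and field names are illustrative.
    instruction = (
        "You are a fault-diagnosis assistant. Given the machine context and the "
        "encoded vibration signal, answer with exactly one fault label from: "
        "normal, inner_race, outer_race, ball."
    )
    context = (
        f"Machine: {machine_spec}. Operating condition: {operating_condition}. "
        f"Signal: {encoded_signal}"
    )
    sample = {"instruction": instruction, "input": context}
    if label is not None:   # supplied during fine-tuning, omitted at inference
        sample["output"] = label
    return sample

sample = build_sample(
    encoded_signal="0.21 0.05 0.33 ...",
    machine_spec="drive end bearing, 12 kHz sampling",
    operating_condition="1 hp load",
    label="inner_race",
)
```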

Model Training and Fine-Tuning

FD-LLM formulates fault diagnosis as a multi-class classification problem, where each input consists of a processed vibration signal and a contextual prompt. Instruction fine-tuning is performed using LoRA, which introduces low-rank trainable matrices into the LLM architecture, enabling efficient domain adaptation with minimal computational overhead. The fine-tuned LLMs are trained to map each input to a fault category, and the generated text is post-processed into discrete fault labels so that standard classification metrics can be computed.
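A minimal sketch of a LoRA setup with the Hugging Face transformers/peft libraries, assuming Llama3-8B as the base model; the rank, scaling factor, and target modules below are placeholder values, not the paper's reported hyperparameters.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

lora_cfg = LoraConfig(
    r=16,                                 # rank of the low-rank updates (assumed)
    lora_alpha=32,                        # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()        # only the low-rank adapters are trainable
```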

Experimental Design

Experiments utilize the Case Western Reserve University (CWRU) bearing dataset, encompassing multiple fault types and operational conditions. Three evaluation tasks are defined:

  • Task 1: Traditional Fault Diagnosis—Models are trained and evaluated on combined datasets from the drive and fan ends.
  • Task 2: Cross-Dataset Generalization—Models are trained on one operational condition or component and evaluated on others to assess adaptability (a minimal split sketch follows this list).
  • Task 3: Overall Evaluation—Models are trained and evaluated on the full dataset to measure robustness.
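The Task 2 split could look like the sketch below, assuming samples tagged with the usual CWRU motor-load conditions (0 to 3 hp); the load_hp field name is hypothetical.

```python
def split_cross_condition(samples, train_load_hp=0):
    # Train on one operating condition, evaluate on the remaining ones
    # for the same machine component (hypothetical field names).
    train = [s for s in samples if s["load_hp"] == train_load_hp]
    test = [s for s in samples if s["load_hp"] != train_load_hp]
    return train, test
```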

Baselines span classical ML (SVM) and DL (WDCNN), compared against four LLMs (Llama3-8B, Llama3-8B-instruct, Qwen1.5-7B, and Mistral-7B-v0.2).

Results and Analysis

Traditional Fault Diagnosis

  • FFT-Processed Data: Llama3 and Llama3-instruct achieve near-perfect accuracy and F1-scores (≥99.7%) on both drive and fan end datasets, outperforming WDCNN and other LLMs. The FFT encoding provides rich frequency-domain information, facilitating superior fault discrimination.
  • Statistical Data: Llama3-instruct and SVM perform well on drive end data (accuracy ≈95%), but all models show reduced performance on fan end data, indicating that statistical summaries provide weaker fault discrimination for fan end signals.

Cross-Dataset Generalization

  • Statistical Features: All models, including LLMs and SVM, exhibit significant performance degradation when evaluated on unseen operational conditions or machine components, with accuracy dropping below 80%.
  • FFT Features: Llama3 and Llama3-instruct maintain high accuracy (≥93%) across operational conditions within the same component, demonstrating strong adaptability. However, performance declines sharply when tested on different machine components, highlighting a persistent challenge in cross-component generalization.

Overall Evaluation

  • Llama3 and Llama3-instruct consistently outperform other models on both FFT and statistical data, with accuracy and F1-scores exceeding 99% on FFT data and 95% on statistical data. WDCNN remains competitive on FFT data but lacks the contextual integration capabilities of LLMs.

Ablation Studies

  • Machine Specifications in Prompts: Including machine specifications in input prompts yields substantial accuracy improvements (up to 20%) for statistical data, but marginal gains for FFT data, where performance is already saturated.
  • Label Granularity: Detailed fault size labeling does not significantly affect LLM performance, indicating robustness to label configuration.

Implementation Considerations

  • Computational Requirements: LoRA-based fine-tuning enables efficient adaptation of LLMs with reduced memory and compute overhead, making deployment feasible on modern GPU hardware (e.g., NVIDIA A10).
  • Input Length Constraints: FFT segment length and quantization must be chosen to fit within LLM context windows (e.g., 8k tokens for Llama3); a quick token-budget check is sketched after this list.
  • Tokenization Strategy: Careful string encoding of numerical data is essential to avoid token fragmentation and ensure reliable LLM interpretation.
  • Generalization Limitations: While LLMs excel in cross-condition adaptation, cross-component generalization remains a bottleneck, necessitating further research into domain adaptation and reasoning-augmented approaches.
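As a quick sanity check on the input-length constraint, the snippet below counts the tokens consumed by one encoded segment, assuming the Llama3 tokenizer and a two-decimal FFT encoding; the segment content is synthetic.

```python
from transformers import AutoTokenizer

# Count how many tokens one encoded segment consumes in an 8192-token window.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
encoded = " ".join(f"{x:.2f}" for x in [0.03, 0.11, 0.87] * 300)  # ~900 synthetic values
n_tokens = len(tokenizer(encoded)["input_ids"])
print(f"{n_tokens} tokens of an 8192-token window (leave room for the prompt and answer)")
```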

Implications and Future Directions

FD-LLM demonstrates that LLMs, when equipped with appropriate data encoding and contextual prompts, can surpass state-of-the-art DL and ML models in fault diagnosis tasks, particularly in scenarios requiring adaptability to new operational conditions. The integration of textual and numerical modalities enables richer representation learning and contextual reasoning.

However, the observed limitations in cross-component generalization suggest that further advances are needed. Incorporating reasoning intelligence—such as chain-of-thought (CoT) or process-supervised reward models—may enhance the LLM's ability to systematically analyze sensor data and improve diagnostic robustness. Additionally, exploring multimodal fusion strategies and transfer learning across heterogeneous machinery could further extend the applicability of LLM-based fault diagnosis systems.

Conclusion

FD-LLM establishes a robust framework for adapting LLMs to intelligent fault diagnosis, leveraging both FFT-based and statistical feature encoding pipelines. Extensive empirical results validate the superiority of Llama3 and Llama3-instruct in fault classification accuracy and adaptability, especially with FFT-processed data. The framework's ability to integrate contextual information and numerical data positions LLMs as promising candidates for next-generation industrial diagnostic systems. Future work should focus on enhancing cross-component generalization and integrating advanced reasoning mechanisms to further improve diagnostic reliability and interpretability.
