
Few-Shot Fault Time-Series Generation

Updated 26 December 2025
  • The paper introduces a diffusion-based generative framework using DDPM and adapter architectures that accurately mimic real fault time series.
  • It leverages dataset token conditioning and dynamic convolution layers to enable effective few-shot adaptation to scarce fault data.
  • Empirical evaluations show that the method achieves low Context-FID and high downstream accuracy, enhancing predictive maintenance and anomaly detection.

Few-shot fault time-series generation addresses the synthesis of realistic and diverse temporal data segments corresponding to rare fault events, given only a handful of observed fault examples. This area is central to industrial diagnostics, predictive maintenance, and robust forecasting, where ground-truth fault traces are typically scarce due to the rare and heterogeneous nature of real-world failures. Recent advances leverage diffusion-based generative models to produce synthetic fault time series whose statistical and structural properties mimic those of true fault data under data-scarce regimes (Gonen et al., 26 May 2025, Xu et al., 19 Nov 2025).

1. Diffusion-Based Generative Methodologies

Diffusion models underpin state-of-the-art approaches for time-series synthesis under scarcity. The foundational setup employs a discrete Denoising Diffusion Probabilistic Model (DDPM) framework. Given a clean sequence $x_0$ (a $\tau \times d$ multivariate segment of $\tau$ time points and $d$ channels), a diffusion process adds Gaussian noise over $T$ steps:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right)$$

with a pre-defined variance schedule $\{\beta_t\}$. The reverse process is parameterized by a neural network trained to denoise and reconstruct $x_0$ from the noisy $x_t$ via

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$$

where $\mu_\theta$, typically expressed via a noise-predicting score network, and an optionally fixed $\Sigma_\theta$ guide sampling during inference. EDM preconditioning further modulates noise scales to enhance high-fidelity synthesis (Gonen et al., 26 May 2025).
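
The following minimal PyTorch sketch illustrates these two equations: closed-form forward noising and one reverse denoising step. The linear $\beta$ schedule, the fixed-variance choice $\Sigma_\theta = \beta_t I$, and the noise-predicting interface `eps_model(x, t)` are illustrative assumptions, not the exact configuration of either cited paper.

```python
import torch

# Illustrative linear variance schedule (the papers use their own schedules).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def q_sample(x0, t, noise):
    """Forward process: draw x_t ~ q(x_t | x_0) in closed form."""
    ab = alpha_bars[t].view(-1, 1, 1)            # broadcast over (batch, tau, d)
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise

@torch.no_grad()
def p_sample(eps_model, x_t, t):
    """One reverse step x_t -> x_{t-1} with the fixed-variance DDPM parameterization."""
    beta_t, alpha_t, ab_t = betas[t], alphas[t], alpha_bars[t]
    eps_hat = eps_model(x_t, torch.full((x_t.shape[0],), t))   # predicted noise
    mean = (x_t - beta_t / (1.0 - ab_t).sqrt() * eps_hat) / alpha_t.sqrt()
    if t == 0:
        return mean
    return mean + beta_t.sqrt() * torch.randn_like(x_t)
```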

FaultDiffusion introduces a positive–negative difference adapter architecture, modeling the fault distribution as $p_f(x) = p_n(x) + \Delta_\phi(x)$, where $p_n$ is a pretrained normal-data diffusion backbone and $\Delta_\phi$ is a compact, trainable adapter responsible for modeling the domain shift (Xu et al., 19 Nov 2025).
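
A schematic sketch of the positive–negative difference idea, realized here, for illustration only, as an additive correction to the backbone's noise prediction; the module names and injection point are assumptions rather than the released FaultDiffusion implementation.

```python
import torch.nn as nn

class DifferenceAdaptedDenoiser(nn.Module):
    """Frozen normal-data backbone p_n plus a small trainable residual adapter Δ_φ."""
    def __init__(self, backbone: nn.Module, adapter: nn.Module):
        super().__init__()
        self.backbone = backbone
        self.adapter = adapter
        for p in self.backbone.parameters():     # the normal-data backbone stays fixed
            p.requires_grad_(False)

    def forward(self, x_t, t):
        base = self.backbone(x_t, t)             # normal-domain prediction
        return base + self.adapter(x_t, t)       # + learned difference Δ_φ
```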

2. Architectural Innovations for Domain Adaptation

Two principal architectural mechanisms enable few-shot adaptation:

  • Dynamic Convolutional Layers (DyConv): Dynamic convolution interpolates a canonical kernel to match arbitrary input/output channel dimensions at runtime. This mechanism provides a unified model interface over heterogeneous domains without channel-padding inefficiencies (Gonen et al., 26 May 2025).
  • Dataset Token Conditioning: Each domain, including new few-shot datasets, is assigned a dedicated learnable token (embedding $y \in \mathbb{R}^d$), injected into normalization layers via affine transformations. This token conditions the denoiser on domain identity, promoting domain-aware synthesis (Gonen et al., 26 May 2025); a minimal sketch of this conditioning follows the list.
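
The sketch below shows one way a learnable dataset token can modulate a normalization layer through an affine (FiLM-style) transformation. The class name, `token_dim`, and the linear mapping to scale/shift are assumptions; the paper only states that the token is injected into normalization layers via affine transformations.

```python
import torch
import torch.nn as nn

class TokenConditionedNorm(nn.Module):
    """LayerNorm whose scale/shift are predicted from a learnable dataset token.
    One token per domain; a new few-shot domain adds a single embedding row."""
    def __init__(self, dim: int, num_domains: int, token_dim: int = 64):
        super().__init__()
        self.tokens = nn.Embedding(num_domains, token_dim)    # y in R^d, one per domain
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.to_scale_shift = nn.Linear(token_dim, 2 * dim)

    def forward(self, h, domain_id):             # h: (batch, tau, dim), domain_id: (batch,)
        y = self.tokens(domain_id)                # (batch, token_dim)
        scale, shift = self.to_scale_shift(y).chunk(2, dim=-1)
        return self.norm(h) * (1 + scale[:, None, :]) + shift[:, None, :]
```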

FaultDiffusion employs an adapter-based approach, freezing all backbone parameters and introducing sliding-window local attention adapters into each decoder layer. Only adapter parameters ($\approx 5\%$–$10\%$ of the total) are fine-tuned in the few-shot regime, which constrains overfitting and localizes capacity to domain discrepancies (Xu et al., 19 Nov 2025).
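
A minimal sketch of a sliding-window local attention adapter, assuming a residual placement after a frozen decoder layer; the window size, head count, and masking scheme are illustrative choices rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class LocalAttentionAdapter(nn.Module):
    """Sliding-window self-attention adapter added residually to frozen decoder features."""
    def __init__(self, dim: int, window: int = 8, heads: int = 4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, h):                         # h: (batch, tau, dim)
        tau = h.size(1)
        idx = torch.arange(tau)
        # boolean mask: disallow attention between positions more than `window` steps apart
        mask = (idx[None, :] - idx[:, None]).abs() > self.window
        out, _ = self.attn(self.norm(h), self.norm(h), self.norm(h), attn_mask=mask)
        return h + out                            # residual: frozen features + adapter output
```

In the few-shot regime, only such adapter modules would be registered with the optimizer; backbone weights keep `requires_grad=False`, as in the earlier sketch, so the trainable fraction stays small.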

3. Training Protocols and Objectives

Pre-training

Unified diffusion models are pre-trained on large, diverse time-series corpora (e.g., over 300k series spanning stocks, energy, physiological, and industrial datasets). Training employs AdamW optimization with high-capacity configurations—batch sizes of $2048$, 1000 epochs, and log-normal noise schedules—with no explicit domain-generalization loss beyond dataset token conditioning (Gonen et al., 26 May 2025).
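
An illustrative configuration mirroring the reported setup; only the optimizer, batch size, epoch count, and log-normal noise sampling come from the text, and the EDM-default log-normal parameters are assumptions.

```python
import torch

# Pre-training configuration as described in the text; other values are assumed.
config = {
    "optimizer": "AdamW",
    "batch_size": 2048,
    "epochs": 1000,
    "noise_schedule": "log-normal",   # EDM-style sigma sampling
}

def sample_sigma(batch_size, p_mean=-1.2, p_std=1.2):
    """Log-normal noise-level sampling; p_mean/p_std are the EDM defaults, assumed here."""
    return torch.exp(p_mean + p_std * torch.randn(batch_size))
```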

Few-Shot Adaptation

For fault time-series domains, adaptation typically proceeds as follows (a minimal sketch of the adaptation loop appears after the list):

  • Collect $5$–$10$ fault sequences.
  • Assign a new learnable domain token or initiate difference adapters.
  • Apply the same noise schedule as in pre-training; optionally extend $\sigma_{\max}$ for sharper anomalies.
  • Fine-tune only the new token or adapter over a few thousand steps on the limited data.
  • Generate synthetic sequences by running the reverse diffusion conditioned on the domain-specific mechanism.
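
A minimal sketch of the adaptation loop, assuming the new token or adapter parameters are the only entries passed to the optimizer; `model.denoising_loss` and the loader interface are assumed names, not an API from either paper.

```python
import torch

def adapt_to_fault_domain(model, new_params, fault_loader, steps=2000, lr=1e-4):
    """Few-shot adaptation: optimize only the freshly added token/adapter parameters.
    The noise schedule is the pre-training one (optionally with a larger sigma_max)."""
    opt = torch.optim.AdamW(new_params, lr=lr)
    for step in range(steps):
        batch = next(fault_loader)                # a handful of fault sequences, resampled
        loss = model.denoising_loss(batch)        # assumed interface
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```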

Training objectives combine a base denoising loss (weighted MSE or $\ell_1$) with auxiliary regularizers:

| Objective | Formula/Description | Role |
| --- | --- | --- |
| EDM-style loss | $\mathbb{E}\big[\lambda(\sigma)\,\lVert \epsilon - N_\theta(x_\sigma,\sigma,y)\rVert_2^2\big]$ | Noise prediction |
| Diversity loss | $\mathbb{E}\big[\lVert s_1 - s_2\rVert_2^2\big]$ | Prevents mode collapse |
| Full FaultDiffusion | $\mathcal{L}(\theta,\phi) = \mathcal{L}_{\rm base} + \lambda\,\mathcal{L}_{\rm diversity}$ | Balances fidelity/variety |

$\lambda$ controls the trade-off between diversity and fidelity (empirically $\lambda \sim 0.1$–$1$) (Xu et al., 19 Nov 2025).
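
A sketch of combining the base denoising loss with a pairwise diversity term. Since minimizing the total loss should push generated samples apart, the pairwise distance is negated here; that sign convention, and the within-batch pairing, are assumptions on top of the paper's stated $\mathbb{E}[\lVert s_1 - s_2\rVert_2^2]$ term.

```python
import torch

def diversity_penalty(samples):
    """Negative mean pairwise squared distance between generated samples.
    Minimizing it encourages diversity; requires a batch of at least two samples."""
    n = samples.size(0) // 2
    s1, s2 = samples[:n], samples[n:2 * n]        # pair up samples within the batch
    return -((s1 - s2) ** 2).flatten(1).mean()

def total_loss(base_denoising_loss, samples, lam=0.5):
    """L = L_base + lambda * L_diversity, with lambda ~ 0.1-1 per the paper."""
    return base_denoising_loss + lam * diversity_penalty(samples)
```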

4. Empirical Evaluations and Benchmarks

Quantitative evaluations employ several metrics:

  • Discriminative Score: Classifier accuracy on real-vs-synthetic discrimination (lower is better).
  • Predictive Score: Next-step forecasting error for models trained on synthetic and tested on real.
  • Context-FID (c-FID): Fréchet distance between encoded feature distributions (a sketch of this computation follows the list).
  • Correlational Score: $L_1$ error on cross-correlation matrices.
  • Diversity: Pairwise ACF distance or inter-sample variability.
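
The Fréchet-distance part of Context-FID can be computed from encoded features of real and synthetic series as below; the contextual time-series encoder that produces those features is left out and assumed to be available.

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_fake):
    """Fréchet distance between Gaussian fits to two feature matrices (rows = samples).
    Features would come from a pretrained time-series encoder (not shown here)."""
    mu_r, mu_f = feats_real.mean(0), feats_fake.mean(0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_r @ cov_f, disp=False)
    covmean = covmean.real                        # drop tiny imaginary parts from sqrtm
    return float(((mu_r - mu_f) ** 2).sum() + np.trace(cov_r + cov_f - 2 * covmean))
```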

Key results indicate that unified diffusion models and dedicated adapter-based fault frameworks achieve state-of-the-art performance in few-shot regimes across metrics and real-world datasets (including custom industrial benchmarks, Tennessee Eastman, and DAMADICS):

| Model | Context-FID | Discriminative | Predictive | Downstream Accuracy |
| --- | --- | --- | --- | --- |
| FaultDiffusion | $\approx 6.1$ | $0.42$ | $0.13$ | $0.89$ |
| Best GAN Baseline | $7.03$ | $0.45$–$0.50$ | $0.14$–$0.24$ | $0.66$–$0.74$ |

Ablation results confirm that both adapter mechanisms and diversity regularization are critical in achieving low Context-FID and high diversity. Removing either component degrades generative performance (Xu et al., 19 Nov 2025).

5. Pipeline Implementation and Practical Guidelines

A unified, pre-train/fine-tune pipeline streamlines adaptation to fault time-series domains.

Training & Inference Pipeline:

```text
Algorithm: Unified Pre-train & Fault Fine-tune
Inputs: Multidomain datasets {Dᵐ}, pre-trained denoiser Nθ, dataset tokens Tok
1. Pre-training: optimize θ, Tok on pooled normal/fault-free time-series
2. Few-shot Adaptation: allocate and fine-tune new token/adapters y* with scarce fault traces
3. Inference: sample random noise, run reverse diffusion, and reconstruct time series
```
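
A minimal sketch of the inference stage: reverse diffusion from pure noise, conditioned on the fault domain's token. The conditioned interface `eps_model(x, t, y)` and the fixed-variance DDPM update are assumptions consistent with the earlier equations.

```python
import torch

@torch.no_grad()
def generate_fault_series(eps_model, domain_token, shape, betas):
    """Stage 3: sample synthetic fault series by reverse diffusion with domain conditioning."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                                    # (batch, tau, d) pure noise
    for t in reversed(range(len(betas))):
        t_batch = torch.full((shape[0],), t)
        eps_hat = eps_model(x, t_batch, domain_token)         # domain-aware noise prediction
        mean = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps_hat) / alphas[t].sqrt()
        x = mean if t == 0 else mean + betas[t].sqrt() * torch.randn_like(x)
    return x
```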

Concrete pseudocode and detailed algorithmic steps for both the unified and FaultDiffusion frameworks are provided in (Gonen et al., 26 May 2025) and (Xu et al., 19 Nov 2025), specifying AdamW hyperparameters, batch sizing, noise scheduling, and adapter placement.

6. Limitations and Research Directions

Key limitations are:

  • Dependence on large-scale normal-data pre-training; performance degrades with insufficient normal data or out-of-distribution faults (Xu et al., 19 Nov 2025).
  • Adapter-based specialization may be insufficient for extreme domain shifts or highly novel anomalies.
  • Diversity regularization entails a trade-off: high $\lambda$ can reduce sample fidelity by introducing spurious variability (Xu et al., 19 Nov 2025).
  • Handling variable-length or streaming sequences remains nontrivial.
  • Current methodologies generally assume fixed sequence length (e.g., $\tau = 24$ in industrial settings).

Ongoing directions include conditional generation (by fault type/severity), continuous-time diffusions for irregular sampling, richer diversity regularizers (contrastive/score matching), meta-learning for adapter initialization, and relaxing backbone freezing to allow limited end-to-end fine-tuning (Xu et al., 19 Nov 2025). A plausible implication is that meta-learned initialization and conditional mechanisms will increase robustness to ultra-rare or previously unseen fault modalities.

7. Applications and Impact

Few-shot fault time-series generation directly benefits domains requiring robust anomaly detection, imbalanced classification, and data augmentation for industrial forecasting systems. Generated synthetic faults provide additional training data for fault diagnosis, aid the predictive maintenance of critical equipment, and support benchmarking for anomaly detection algorithms. The ability to produce high-fidelity, diverse fault time series from minimal data is expected to improve the resilience and accuracy of data-driven industrial and cyber-physical systems (Gonen et al., 26 May 2025, Xu et al., 19 Nov 2025).
