Few-Shot Fault Time-Series Generation Framework
- Recent work introduces few-shot fault time-series generation frameworks that leverage diffusion models, token conditioning, and LLM-driven synthesis to overcome fault-data scarcity.
- These frameworks employ domain adaptation techniques, including adapter fine-tuning and explicit diversity losses, to accurately synthesize fault features from predominantly normal data.
- Empirical results show significant improvements in authenticity, diversity, and predictive performance compared to traditional GAN and VAE methods on industrial benchmarks.
Few-shot fault time-series generation frameworks address the challenge of synthesizing realistic and diverse fault-condition time series from very limited real-world fault data. Such methods are central to industrial monitoring and predictive maintenance, where fault events are rare and annotated fault data are expensive to acquire. The recent wave of generative architectures, especially diffusion models, large-scale pre-training approaches, and language-model-based pipelines, has enabled significant progress, overcoming previous limitations in authenticity, diversity, and robustness. Below, core methodologies, system architectures, theoretical foundations, experimental outcomes, and ongoing challenges are surveyed, with technical detail as established in (Xu et al., 19 Nov 2025; Gonen et al., 26 May 2025; Rousseau et al., 21 May 2025).
1. Problem Setting and Motivation
Few-shot fault time-series generation targets the synthesis of fault data given an abundant normal (non-fault) time-series collection $\mathcal{D}_N$ and a tiny set of labeled fault events $\mathcal{D}_F$. The generative model must be pretrained on $\mathcal{D}_N$ (normal data) and efficiently adapted to $\mathcal{D}_F$ (fault data), handling both the wide domain gap (characterized by abrupt, often nonstationary fault signatures) and the pronounced intra-class variability among fault instances (Xu et al., 19 Nov 2025).
Traditional generative time-series models, including GAN and VAE variants, are empirically deficient in this regime: they either memorize the few anomalies ("mode collapse") or revert to unrealistic, smoothed signals due to insufficient characterization of the fault manifold. Few-shot frameworks—such as "FaultDiffusion" (Xu et al., 19 Nov 2025), ImagenFew (Gonen et al., 26 May 2025), and SDForger (Rousseau et al., 21 May 2025)—leverage domain adaptation, conditional architectures, and explicit diversity control to bridge this gap.
2. Model Architectures for Few-Shot Generation
Three dominant framework families have emerged:
a) Diffusion Model Backbones
FaultDiffusion, as introduced in (Xu et al., 19 Nov 2025), operates with a Denoising Diffusion Probabilistic Model (DDPM) backbone. The forward process applies Gaussian noise to the input series $x_0$ at each step $t$, parameterized by a variance schedule $\{\beta_t\}_{t=1}^{T}$, yielding
$$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I),$$
with $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$. The reverse process employs a deep Transformer encoder-decoder with sliding-window masked self-attention, generating both trend and seasonal components.
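To make the forward process concrete, the following minimal PyTorch sketch samples $x_t$ directly from $x_0$ using the closed-form expression above; the linear variance schedule and tensor shapes are illustrative assumptions, not FaultDiffusion's exact configuration.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)        # assumed linear variance schedule beta_t
alphas = 1.0 - betas                         # alpha_t = 1 - beta_t
alpha_bars = torch.cumprod(alphas, dim=0)    # bar(alpha)_t = prod_{s<=t} alpha_s

def forward_noise(x0, t):
    """Sample x_t ~ q(x_t | x_0) in closed form.
    x0: clean series, shape (batch, length, channels); t: integer steps, shape (batch,)."""
    eps = torch.randn_like(x0)                               # epsilon ~ N(0, I)
    ab = alpha_bars[t].view(-1, 1, 1)                        # broadcast bar(alpha)_t
    x_t = torch.sqrt(ab) * x0 + torch.sqrt(1.0 - ab) * eps   # closed-form noising
    return x_t, eps

# usage: noise a batch of 8 series of length 24 with 3 sensor channels
x_t, eps = forward_noise(torch.randn(8, 24, 3), torch.randint(0, T, (8,)))
```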
b) Cross-Domain Pretrained Diffusion with Token Conditioning
The unified framework of (Gonen et al., 26 May 2025) introduces large-scale cross-domain diffusion pre-training, using UNet backbones with dynamic convolutional layers ("DyConv") and dataset-token conditioning. Here, every data domain (such as fault, weather, or biomedical) is represented by a distinct learnable token, injected via adaptive group normalization (AdaGN). Few-shot adaptation involves allocating a new token for the fault domain and jointly fine-tuning the token embedding and the backbone, maintaining performance with only a handful of fault instances.
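A minimal sketch of dataset-token conditioning through adaptive group normalization, under stated assumptions: the embedding size, group count, and layer placement are illustrative, not the exact ImagenFew architecture.

```python
import torch
import torch.nn as nn

class AdaGN(nn.Module):
    """Group norm whose scale/shift are predicted from a learnable dataset-token embedding."""
    def __init__(self, channels, n_datasets, emb_dim=64, groups=8):
        super().__init__()
        self.norm = nn.GroupNorm(groups, channels, affine=False)
        self.token_emb = nn.Embedding(n_datasets, emb_dim)      # one token per data domain
        self.to_scale_shift = nn.Linear(emb_dim, 2 * channels)

    def forward(self, h, dataset_id):
        # h: (batch, channels, length); dataset_id: (batch,) integer domain indices
        emb = self.token_emb(dataset_id)
        scale, shift = self.to_scale_shift(emb).chunk(2, dim=-1)
        h = self.norm(h)
        return h * (1 + scale.unsqueeze(-1)) + shift.unsqueeze(-1)

# few-shot adaptation: append a fresh row to token_emb for the new fault domain and
# fine-tune that row (optionally jointly with the backbone) on the few fault samples.
```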
c) LLM-Driven Synthesis
SDForger (Rousseau et al., 21 May 2025) initiates from a functional tabular embedding of the multivariate time series, encodes these as textual fill-in-the-middle prompts, and fine-tunes an autoregressive LLM (e.g., GPT-like) to generate synthetic samples. The approach leverages token-wise text completion to synthesize tabular representations that reconstruct back to continuous time series, with explicit channels for fault presence/type and sampling controls for embedding diversity and anomaly strengths.
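The following simplified sketch illustrates the general idea rather than SDForger's actual pipeline: each series is compressed into a few basis coefficients (a plain FFT here, an assumption), the coefficients plus fault metadata are serialized as a text row for an autoregressive LLM to complete, and generated rows are decoded back into series. The field names and prompt format are hypothetical.

```python
import numpy as np

def embed(series, k=8):
    """Compress a univariate series into its first k complex Fourier coefficients."""
    return np.fft.rfft(series)[:k]

def decode(coeffs, length):
    """Reconstruct an approximate series from the truncated coefficients."""
    full = np.zeros(length // 2 + 1, dtype=complex)
    full[:len(coeffs)] = coeffs
    return np.fft.irfft(full, n=length)

def to_prompt(coeffs, fault_type):
    """Serialize the embedding plus fault metadata as a text row the LLM learns to complete."""
    fields = ", ".join(f"c{i}={c.real:.3f}{c.imag:+.3f}j" for i, c in enumerate(coeffs))
    return f"fault_type={fault_type}, {fields}"

series = np.sin(np.linspace(0, 6 * np.pi, 96)) + 0.1 * np.random.randn(96)
prompt = to_prompt(embed(series), fault_type="valve_stiction")
# An LLM fine-tuned on such rows generates new rows; parsing the numbers and
# calling decode(...) yields synthetic (fault-conditioned) time series.
```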
3. Domain Adaptation and Diversity Mechanisms
Positive-Negative Difference Adapter
FaultDiffusion decouples normal and fault synthesis via a positive-negative difference adapter: the fault-domain noise prediction is expressed as
$$\epsilon_{\text{fault}}(x_t, t) = \epsilon_{\theta}(x_t, t) + \Delta_{\phi}(x_t, t),$$
where $\epsilon_{\theta}$ models normal data and only the difference term $\Delta_{\phi}$ is fine-tuned using $\mathcal{D}_F$. The adapter integrates local temporal context at each diffusion denoising step through multihead sliding-window attention, preventing catastrophic forgetting and focusing adaptation (Xu et al., 19 Nov 2025).
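A minimal sketch of the adapter idea under stated assumptions: the pretrained noise predictor is frozen and a small residual network supplies the difference term, so only that term receives gradients from the few fault samples. The convolutional difference network is a simplified stand-in for the paper's sliding-window attention adapter.

```python
import torch
import torch.nn as nn

class DifferenceAdapter(nn.Module):
    """Predict fault noise as (frozen normal prediction) + (trainable difference)."""
    def __init__(self, normal_eps_model, channels, hidden=64):
        super().__init__()
        self.normal = normal_eps_model
        for p in self.normal.parameters():          # freeze the normal-data backbone
            p.requires_grad = False
        self.delta = nn.Sequential(                  # small trainable difference network
            nn.Conv1d(channels, hidden, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv1d(hidden, channels, kernel_size=3, padding=1),
        )

    def forward(self, x_t, t):
        # x_t: (batch, channels, length); t: (batch,) diffusion steps
        eps_normal = self.normal(x_t, t)             # frozen prediction from the pretrained model
        return eps_normal + self.delta(x_t)          # only self.delta is updated on the fault set
```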
Diversity Loss and Mode Collapse Prevention
To counteract mode collapse, a diversity loss is introduced:
$$\mathcal{L}_{\text{div}} = -\frac{1}{K(K-1)} \sum_{i \neq j} \left\| \hat{\epsilon}_i - \hat{\epsilon}_j \right\|_2^2,$$
where $\hat{\epsilon}_i, \hat{\epsilon}_j$ are independent noise predictions for the same context. The composite objective
$$\mathcal{L} = \mathcal{L}_{\text{denoise}} + \lambda \, \mathcal{L}_{\text{div}}$$
(with $\mathcal{L}_{\text{denoise}}$ as the standard denoising loss) encourages sample dispersion in the synthetic fault space, empirically confirmed by autocorrelation feature (ACF) diversity and t-SNE analyses (Xu et al., 19 Nov 2025).
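A sketch of one way to implement such a composite objective: draw several independent noise predictions for the same context and penalize their pairwise similarity. The pairwise form below only illustrates the structure; FaultDiffusion's exact diversity term may differ.

```python
import torch

def diversity_loss(eps_preds):
    """eps_preds: (K, batch, ...) independent noise predictions for the same context.
    Returns the negative mean pairwise distance, so minimizing it spreads predictions apart."""
    K = eps_preds.shape[0]
    flat = eps_preds.reshape(K, -1)
    dists = torch.cdist(flat, flat, p=2)         # (K, K) pairwise L2 distances
    return -dists.sum() / (K * (K - 1))          # mean over i != j (diagonal is zero)

def composite_loss(eps_true, eps_preds, lam=0.1):
    denoise = ((eps_preds - eps_true.unsqueeze(0)) ** 2).mean()   # standard denoising MSE
    return denoise + lam * diversity_loss(eps_preds)
```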
Dynamic Convolution and Token Conditioning
DyConv in (Gonen et al., 26 May 2025) enables a single UNet backbone to process multivariate series with dynamic sensor counts, crucial in industrial systems with heterogeneous instrumentation. Dataset-token conditioning enables fast adaptation and structured interpolation (e.g., for prognostic progression between normal and fault states).
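A minimal sketch of one simple way to make a temporal layer agnostic to the number of sensor channels (shared per-channel weights plus channel pooling); this is an assumption-level stand-in for DyConv, whose actual mechanism may differ.

```python
import torch
import torch.nn as nn

class ChannelAgnosticConv(nn.Module):
    """Apply the same temporal convolution to every sensor channel, so one layer
    serves datasets with different channel counts; a mask can hide padded channels."""
    def __init__(self, hidden=32, kernel_size=5):
        super().__init__()
        self.conv = nn.Conv1d(1, hidden, kernel_size, padding=kernel_size // 2)

    def forward(self, x, channel_mask=None):
        # x: (batch, channels, length) with a dataset-dependent channel count
        b, c, L = x.shape
        h = self.conv(x.reshape(b * c, 1, L)).reshape(b, c, -1, L)   # per-channel features
        if channel_mask is not None:                                 # (batch, channels); 1 = real sensor
            h = h * channel_mask.view(b, c, 1, 1)
        return h.mean(dim=1)                                         # pool over channels -> (batch, hidden, length)

# the same module handles a 3-sensor and a 12-sensor dataset without re-instantiation
layer = ChannelAgnosticConv()
out3, out12 = layer(torch.randn(2, 3, 24)), layer(torch.randn(2, 12, 24))
```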
LLM-Based Prompt and Sampling Strategies
SDForger (Rousseau et al., 21 May 2025) injects fault-type, severity, and binary event indicators as additional pseudo-channels or conditioning text. Sampling controls (temperature, prior bias) and basis set augmentation (e.g., adding wavelet-based anomaly-sensitive atoms) further guide the LLM to explore rare, fault-representative signal regions.
4. Training and Inference Procedures
All frameworks rely on explicit pretraining phases, either solely on normal data (FaultDiffusion) or via multi-domain corpora (ImagenFew). Few-shot adaptation occurs via:
- Adapter-only fine-tuning, with frozen backbone parameters and targeted update of the difference function (FaultDiffusion).
- Joint token and network update (ImagenFew), with regularization techniques (EMA, weight decay, runtime padding masks).
- Partial LLM fine-tuning, typically adjusting only the last layers or fill-in heads to reduce overfitting risk and memory consumption (SDForger).
Inference follows the standard diffusion sampling (starting from noise, iteratively denoising via the learned scheduler) or involves autoregressive text completion followed by embedding decoding and validity filtering.
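A minimal sketch of the standard ancestral DDPM sampling loop referenced above; the adapted frameworks additionally inject conditioning (dataset token, difference adapter) inside eps_model at every step.

```python
import torch

@torch.no_grad()
def sample(eps_model, shape, betas):
    """Start from pure noise and iteratively denoise with the learned noise predictor."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                                        # x_T ~ N(0, I)
    for t in reversed(range(len(betas))):
        eps = eps_model(x, torch.full((shape[0],), t))            # predicted noise at step t
        coef = (1 - alphas[t]) / torch.sqrt(1 - alpha_bars[t])
        x = (x - coef * eps) / torch.sqrt(alphas[t])              # posterior mean
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)    # add noise except at the final step
    return x
```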
A summary of key procedural elements appears below:
| Framework | Pretraining Corpus | Few-Shot Adaptation Mechanism | Diversity Regularization |
|---|---|---|---|
| FaultDiffusion | Normal time series only | Adapter fine-tuning (frozen backbone) | Explicit diversity loss $\mathcal{L}_{\text{div}}$ |
| ImagenFew (unified diffusion) | Multi-domain time-series corpus | Token and backbone joint fine-tuning | Structural (DyConv / token conditioning) |
| SDForger | Not specified (task-specific) | LLM prompt and head fine-tuning | Sampling temperature, prompt design |
5. Evaluation Metrics and Empirical Results
All frameworks adopt multiple authenticity, diversity, and utility metrics to assess performance. Core metrics include:
- Context-FID: Fréchet distance of synthetic and real series in feature space.
- Correlational Score: difference in empirical autocorrelation matrices.
- Discriminative Score: accuracy of an LSTM (or other) classifier at distinguishing real from synthetic samples (0.5 indicates indistinguishability, i.e., perfect realism).
- Predictive Score (TSTR): train-on-synthetic, test-on-real MSE, i.e., the forecasting error of a model trained only on synthetic data and evaluated on real data.
- Diversity (ACF): average pairwise difference in autocorrelation features of synthetic samples (see the sketch after this list).
- Feature statistics (MDD, ACD, SD, KD): Differences in marginal distributions, autocorrelations, skewness, and kurtosis between real and synthetic sets (Rousseau et al., 21 May 2025).
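A minimal sketch of the ACF-based diversity metric, assuming the average pairwise L1 distance between autocorrelation feature vectors; the exact normalization used in the cited papers may differ.

```python
import numpy as np

def acf(series, max_lag=10):
    """Empirical autocorrelation of a univariate series up to max_lag."""
    x = series - series.mean()
    denom = (x * x).sum()
    return np.array([(x[:-k] * x[k:]).sum() / denom for k in range(1, max_lag + 1)])

def acf_diversity(samples, max_lag=10):
    """Average pairwise L1 difference between ACF features; samples: (n_samples, length)."""
    feats = np.stack([acf(s, max_lag) for s in samples])
    n = len(feats)
    total = sum(np.abs(feats[i] - feats[j]).mean()
                for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)

# higher values indicate greater spread among synthetic fault samples
score = acf_diversity(np.random.randn(16, 24))
```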
On custom industrial benchmarks (sequence length 24, only a few fault samples per class), FaultDiffusion achieves the best context-FID (12.141 vs. 15.884–20.081 for GAN/VAE baselines), Correlational Score (89.25), Discriminative Score (0.382), and predictive MSE (0.139), outperforming Cot-GAN, TimeGAN, TimeVAE, and even unadapted Diffusion-TS. Strong results hold for the TEP and DAMADICS datasets (Xu et al., 19 Nov 2025). SDForger and ImagenFew likewise demonstrate >50% improvements in discriminative and context-FID metrics over existing methods, even in the few-shot regime (Gonen et al., 26 May 2025; Rousseau et al., 21 May 2025).
Ablation studies confirm the necessity of adaptation and explicit diversity regularization; removing either component markedly degrades generation fidelity and downstream utility (Xu et al., 19 Nov 2025).
6. Specialization to Fault Detection and Prognostics
Extensions to fault diagnosis include:
- Incorporation of fault severity tokens to enable conditional generation.
- Fault-classification auxiliary heads on diffusion bottlenecks to encourage generation of signals salient for diagnosis (Gonen et al., 26 May 2025).
- Use of domain tests and masking losses to emphasize accurate synthesis within annotated fault intervals (Gonen et al., 26 May 2025, Rousseau et al., 21 May 2025).
- Techniques for synthesizing gradual fault progression via interpolation between normal and fault tokens (see the sketch after this list).
- Explicit prompt conditioning and pseudo-channel augmentation for structured, metadata-aware synthesis (Rousseau et al., 21 May 2025).
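A minimal sketch of how such progression could be realized with the dataset-token conditioning of Section 2; linear interpolation of token embeddings is an illustrative assumption, not necessarily the mechanism used in the cited work.

```python
import torch

def interpolated_token(normal_emb, fault_emb, severity):
    """Blend the normal-domain and fault-domain token embeddings.
    severity in [0, 1]: 0 = fully normal conditioning, 1 = fully faulty conditioning."""
    return (1.0 - severity) * normal_emb + severity * fault_emb

# conditioning the sampler on a ramp of severities yields a gradual fault progression
normal_emb, fault_emb = torch.randn(64), torch.randn(64)
trajectory = [interpolated_token(normal_emb, fault_emb, s) for s in torch.linspace(0, 1, 5)]
```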
Such techniques allow these frameworks to support downstream anomaly detection, making them useful not only for synthetic data augmentation but also for robust prognostics and scenario simulation.
7. Limitations and Future Directions
Current few-shot fault time-series generation methods have several constraints:
- All approaches depend critically on rich normal-data pretraining; performance degrades if normal/fault domains are highly mismatched (e.g., new sensor types, previously unseen modalities) (Xu et al., 19 Nov 2025).
- Diversity-promoting losses can be computationally expensive due to pairwise computations. Scalable alternatives based on contrastive learning or variational objectives are sought (Xu et al., 19 Nov 2025).
- Most frameworks model unconditional generation. Conditional synthesis (e.g., by fault attributes, severity, or run-length) and physics-informed modeling remain largely unexplored (Xu et al., 19 Nov 2025, Gonen et al., 26 May 2025).
- Integration of domain knowledge (e.g., physics-based constraints, transfer learning across equipment) is an open research direction.
- For LLM-based methods, further experiments are needed to characterize limits in generalization and control.
Overall, few-shot fault time-series generation frameworks—spanning diffusion, cross-domain token conditioning, and LLM-driven synthesis—have rapidly established state-of-the-art performance in scarce-data regimes, enabling realistic, diverse augmentation of industrial fault datasets and laying the technical foundation for future research in scalable, condition-aware synthetic time series generation (Xu et al., 19 Nov 2025, Gonen et al., 26 May 2025, Rousseau et al., 21 May 2025).