Fourier Bias Projection (FBP) for Time Series
- Fourier Bias Projection is a modular technique that injects learnable frequency-domain inductive bias into time series imputation pipelines.
- It employs series decomposition, spectral transforms (DFT, STFT, FrSST), and sine/cosine projections to recover long-term trends in high-missingness regimes.
- FBP integrates trainable frequency projection layers within diffusion models to enhance denoising performance, generalization, and computational efficiency.
Fourier Bias Projection (FBP) is a modular technique for injecting precisely controlled, learnable frequency-domain inductive bias into machine learning pipelines, particularly denoising diffusion models for time series imputation. As introduced and formalized in the context of FADTI (Fourier and Attention Driven Diffusion for Multivariate Time Series Imputation), FBP enables models to incorporate spectral priors that are adaptive to missing data, while remaining computationally lightweight and compatible with standard architectures (Li et al., 17 Dec 2025).
1. Motivation and Rationale
The core motivation for Fourier Bias Projection is to address limitations inherent to time-domain neural denoisers, especially under high missingness or in the presence of structured gaps in multivariate time series data. Standard time-domain models, such as pure Transformer or convolutional architectures, often fail to recover long-term trends or periodicities when observed data are sparse or missing in contiguous regions. A naive application of the Discrete Fourier Transform (DFT) to such masked inputs yields severely aliased or distorted spectra, undermining recovery of the global structure.
FBP offers a solution by providing a learnable, frequency-filtered modulation of features, specifically targeting low-frequency bands where meaningful temporal dependencies are concentrated. This injects an explicit spectral bias which not only enables better recovery but also enhances generalization under distribution shifts and high missingness (Li et al., 17 Dec 2025).
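As a quick, self-contained illustration of this distortion (a sketch, not code from the paper), the following compares the magnitude spectrum of a clean sinusoid with that of a zero-filled masked copy; the signal and mask location are arbitrary:

```python
import torch

# Synthetic low-frequency signal: 3 cycles over 256 steps.
t = torch.arange(256, dtype=torch.float32)
x = torch.sin(2 * torch.pi * 3 * t / 256)

# Contiguous gap: zero-fill ~40% of the series, as naive masking would.
x_masked = x.clone()
x_masked[100:200] = 0.0

spec_full = torch.fft.rfft(x).abs()
spec_masked = torch.fft.rfft(x_masked).abs()

# The clean spectrum peaks only at bin 3; the masked spectrum leaks
# substantial energy into distant bins (spectral distortion).
print(spec_full.argmax().item())       # 3
print(spec_masked[10:].max().item())   # clearly nonzero leakage
```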
2. Mathematical Formulation and Pipeline
The FBP module consists of several sequential stages: decomposition, spectral transformation, projection, and learnable re-projection.
2.1 Series Decomposition
Given an input tensor $X \in \mathbb{R}^{B \times V \times C \times T}$, where $B$ is the batch size, $V$ the number of variables, $C$ the embedding channels, and $T$ the number of time steps, FBP first splits $X$ into a smooth trend and a high-frequency residual:

$$X_{\text{trend}} = \operatorname{AvgPool}_{K_d}(X), \qquad X_{\text{res}} = X - X_{\text{trend}}.$$

The averaging kernel size $K_d$ is hyperparameterized to fit the data’s dominant trend length scale.
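A minimal PyTorch sketch of this decomposition follows; the replicate ("edge") padding keeps the trend at the original length $T$, and the shapes and kernel size shown are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def decompose(x, k_d):
    """Split a (..., T) tensor into moving-average trend and residual."""
    shape = x.shape
    flat = x.reshape(-1, 1, shape[-1])          # (N, 1, T) for avg_pool1d
    pad_l = (k_d - 1) // 2
    padded = F.pad(flat, (pad_l, k_d - 1 - pad_l), mode="replicate")
    trend = F.avg_pool1d(padded, kernel_size=k_d, stride=1).reshape(shape)
    return trend, x - trend

x = torch.randn(8, 7, 64, 96)                   # (B, V, C, T), arbitrary sizes
trend, res = decompose(x, k_d=25)
assert trend.shape == x.shape and torch.allclose(trend + res, x)
```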
2.2 Frequency Transform
FBP supports multiple choices of spectral decomposition:
- Global DFT: For each channel $c$, compute
$$\hat{X}_c[k] = \frac{1}{\sqrt{T}} \sum_{t=0}^{T-1} X_c[t]\, e^{-2\pi i k t / T}, \qquad k = 0, \dots, \lfloor T/2 \rfloor,$$
with normalization factor $1/\sqrt{T}$.
- STFT: Windowed local DFT with window length $L$, stride $s$, and Hann window $w$, energy-normalized.
- FrSST: Synchrosqueezed STFT, which first computes the STFT then reallocates energies based on instantaneous frequency estimates, yielding enhanced time-frequency localization.
The number of frequency bins $F$ and time frames $M$ is determined by the transform configuration.
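Assuming standard PyTorch spectral ops, the DFT and STFT variants can be sketched as below; FrSST has no torch built-in and would post-process the STFT by reallocating energy along instantaneous-frequency estimates:

```python
import torch

T = 96
x = torch.randn(4, T)                       # flattened (batch*channels, T)

# Global DFT with orthonormal 1/sqrt(T) scaling; F = T//2 + 1 bins.
X_dft = torch.fft.rfft(x, dim=-1, norm="ortho")

# STFT: window length L, stride s, Hann window, normalized spectra.
L, s = 32, 16
X_stft = torch.stft(x, n_fft=L, hop_length=s, window=torch.hann_window(L),
                    return_complex=True, normalized=True)

print(X_dft.shape, X_stft.shape)            # (4, 49) and (4, 17, 7): (F,) and (F, M)
```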
2.3 Projection onto Sine/Cosine Bases
For any spectral representation $\hat{X} \in \mathbb{C}^{F \times M}$ (per channel), FBP projects onto real sine and cosine bases:
$$Z = \operatorname{Re}(\hat{X}) \odot b_{\cos} + \operatorname{Im}(\hat{X}) \odot b_{\sin},$$
where $b_{\cos}$ and $b_{\sin}$ are the cosine and sine basis weights and $\odot$ denotes elementwise multiplication.
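In code the projection is a weighted combination of the real and imaginary parts; here b_cos and b_sin are fixed to ones purely for illustration (plain Re/Im extraction), since the exact basis weighting is an implementation choice:

```python
import torch

X_fft = torch.fft.rfft(torch.randn(4, 96), norm="ortho")  # (4, F) complex
b_cos = torch.ones(X_fft.shape[-1])     # cosine weights (illustrative)
b_sin = torch.ones(X_fft.shape[-1])     # sine weights (illustrative)
Z = X_fft.real * b_cos + X_fft.imag * b_sin                # real-valued (4, F)
```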
2.4 Learnable Frequency Bias and Projection
The aggregated real-valued tensor $Z$ is flattened and mapped via dropout and a linear transformation:
$$F_{\text{branch}} = W \operatorname{Dropout}(\operatorname{Flatten}(Z)) + b.$$
Separate parameters $(W, b)$ are maintained for the trend and residual branches; the resulting outputs are summed or concatenated and injected into the main temporal model pipeline.
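A per-branch projection head might look as follows; the class name, dimensions, and dropout rate are illustrative assumptions rather than values from the paper:

```python
import torch
import torch.nn as nn

class BranchProjection(nn.Module):
    """Dropout + linear map from flattened spectral features to model width."""
    def __init__(self, in_dim, out_dim, p=0.2):
        super().__init__()
        self.drop = nn.Dropout(p)
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, z):
        return self.proj(self.drop(z.flatten(start_dim=-2)))  # flatten (F, M)

z = torch.randn(8, 49, 7)                   # (batch, F, M) real spectral features
head_trend = BranchProjection(49 * 7, 128)  # one instance per branch keeps
head_res = BranchProjection(49 * 7, 128)    # trend/residual weights separate
out = head_trend(z)                         # (8, 128)
```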
3. Integration within Diffusion Models
Within diffusion-based imputation frameworks, FBP is applied at every denoising block and each reverse diffusion step $t$:
- The masked, noisy candidate series undergoes FBP, which splits it into trend and residual branches.
- Frequency-domain projections are fused with the noisy signal as additional channels.
- The concatenated input proceeds to a temporal encoder (attention or convolution based), followed by denoising prediction and standard DDPM updates.
The pseudocode structure is as follows:
```python
def FBP(X, K_d, K_f, F, L, w):
    X_trend = AvgPool1D(PadEdge(X), K_d)                  # smooth trend via moving average
    X_res = X - X_trend                                   # high-frequency residual
    outputs = []
    for branch in [X_trend, X_res]:
        X_fft = FrequencyTransform(branch, K_f, F, L, w)  # DFT, STFT, or FrSST
        X_fft = NormalizeSpectra(X_fft)
        Z = Re(X_fft) * b_cos + Im(X_fft) * b_sin         # sine/cosine projection
        Z_flat = Flatten(Z)
        outputs.append(Linear(Dropout(Z_flat)))           # per-branch projection weights
    F_trend, F_res = outputs
    return F_trend + F_res
```
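For concreteness, here is a minimal self-contained PyTorch realization of the above using the global-DFT variant; the band count, output width, channel fusion, and initialization of the learnable weights are sketch assumptions, not the paper's exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FBP(nn.Module):
    """Sketch of a DFT-variant FBP block: decompose, transform, project."""

    def __init__(self, seq_len, n_freq, d_out, k_d=25, p_drop=0.2):
        super().__init__()
        self.k_d = k_d
        self.n_freq = min(n_freq, seq_len // 2 + 1)       # first low-freq bands only
        # Learnable sine/cosine weights, initialized to 1 (an assumption).
        self.b_cos = nn.Parameter(torch.ones(self.n_freq))
        self.b_sin = nn.Parameter(torch.ones(self.n_freq))
        self.drop = nn.Dropout(p_drop)
        # Separate projection weights for the trend and residual branches.
        self.proj_trend = nn.Linear(self.n_freq, d_out)
        self.proj_res = nn.Linear(self.n_freq, d_out)

    def forward(self, x):
        # x: (B, C, T) masked, noisy candidate series.
        pad_l = (self.k_d - 1) // 2
        padded = F.pad(x, (pad_l, self.k_d - 1 - pad_l), mode="replicate")
        trend = F.avg_pool1d(padded, self.k_d, stride=1)
        res = x - trend
        outs = []
        for branch, proj in [(trend, self.proj_trend), (res, self.proj_res)]:
            spec = torch.fft.rfft(branch, dim=-1, norm="ortho")[..., :self.n_freq]
            z = spec.real * self.b_cos + spec.imag * self.b_sin
            outs.append(proj(self.drop(z)))
        return outs[0] + outs[1]                          # (B, C, d_out)


# Fusion at one denoising step: FBP features appended as extra channels
# before the temporal encoder (d_out == T so they concatenate cleanly).
x_t = torch.randn(8, 64, 96)                              # (B, C, T)
fused = torch.cat([x_t, FBP(96, 16, 96)(x_t)], dim=1)     # (B, 2C, T)
```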
4. Empirical Effects and Ablation
Ablation studies confirm that FBP is the principal driver of gains in FADTI. Across several datasets (including ETT, Weather, METR-LA, and synthetic biological signals) and missingness regimes (random, contiguous, 10%-50% rates), variants lacking FBP (“None-Attn”/“None-Conv”) perform consistently worse than those with any Fourier module. On long, quasi-stationary datasets, DFT-Attn outperforms alternatives, while for non-stationary or high-missingness scenarios, STFT-Conv and FrSST variants are superior.
Empirically, the bias imposed by low-frequency Fourier projection enables strong recovery of long trends and smooth periodic structure, yielding lower MAE and better robustness to data gaps than fixed (non-learnable) frequency encodings or pure time-domain processing (Li et al., 17 Dec 2025).
5. Adaptivity, Inductive Bias, and Implementation Details
FBP’s adaptivity arises from its trainable projection weights, which allow the module to compensate dynamically for distortions due to missingness. Unlike fixed DFT encodings, FBP learns to recover spectra closest to the true underlying signal even when many observations are absent. By explicitly restricting the projection to the first $F$ frequency bands, FBP enforces recoverable, smooth periodic patterns, operating as a structural prior that is particularly effective in the face of adversarial masking.
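The effect of this band restriction can be seen in a short sketch (illustrative, not from the paper): retaining only the first $F$ bins of an orthonormal DFT and inverting yields the smooth low-pass component that the prior favors:

```python
import torch

x = torch.randn(96)                          # arbitrary series
spec = torch.fft.rfft(x, norm="ortho")
F_bands = 8
spec_lp = torch.zeros_like(spec)
spec_lp[:F_bands] = spec[:F_bands]           # keep only the first F bands
x_smooth = torch.fft.irfft(spec_lp, n=96, norm="ortho")
# x_smooth is the smooth, slowly varying part of x; everything FBP
# projects through survives this low-frequency bottleneck.
```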
FBP can be implemented with parameter-free transforms (DFT, STFT, FrSST). The only trainable elements are the projection layers for the trend and residual branches. Hyperparameters include the trend window $K_d$, the frequency resolution $F$, and the dropout rate (typically $0.1$–$0.3$) on the projected features. FBP incurs only the spectral transform cost and a minimal additional parameter footprint (Li et al., 17 Dec 2025).
6. Comparison to Other Spectral Approaches
Whereas previous approaches have used hand-crafted DFT or fixed spectral transforms as input augmentations, FBP distinguishes itself via:
- End-to-end differentiability, allowing spectral representations and projections to be tuned toward imputation objectives.
- Compatibility with multiple spectral bases, unified via a common interface.
- Modular insertion within each denoising step rather than single-time feature augmentation.
- Empirically validated superiority (in ablation) over both fixed DFT and non-Fourier models across diverse data and missingness patterns.
This positions FBP as a principal mechanism for encoding adaptive, learnable inductive biases in the frequency domain for time series imputation and related sequential modeling.
7. Impact and Applicability
FBP enables robust reconstruction and sample-efficient learning in tasks characterized by large-scale missingness, nonstationarity, or distributional drift. The structural prior induced by low-frequency projection leads to improved long-gap imputation, better resistance to spectral aliasing, and increased robustness to noise and mask artifacts. In the specific context of FADTI, models using FBP outperform several Transformer- and diffusion-based baselines—often maintaining accuracy when run with as few as two reverse diffusion steps, whereas prior diffusion models require an order of magnitude more steps for similar performance.
In summary, FBP provides a computationally efficient, parallelizable, and rigorously ablated approach for embedding frequency-aware structure priors in neural time series pipelines, with demonstrable gains in empirical accuracy, noise-robustness, and generality (Li et al., 17 Dec 2025).