Non-Decimated Wavelet Packet Transform
- NDWPT is a multiscale discrete analysis method that retains full-length coefficient sequences without downsampling to achieve translation-equivariant decomposition.
- It constructs a full binary packet tree yielding an information-rich, redundant coefficient library ideal for refined feature extraction in forecasting.
- The transform employs an efficient shifted pyramidal algorithm for online, causal computation, making it suitable for real-time signal analysis.
The non-decimated wavelet packet transform (NDWPT) is a multiscale discrete analysis providing a translation-equivariant expansion for time series, based on compactly supported wavelet packet functions without downsampling at any scale or node. By preserving all possible even- and odd-shifted outputs of the filter banks, the NDWPT produces, at every node in the full packet tree, coefficient sequences of the same length as the original signal—enabling refined feature extraction, multiscale analysis, and inherent translation-equivariance. This structure stands in contrast to the standard decimated wavelet packet transform (DWPT) and the non-decimated wavelet transform (NDWT), offering a much more redundant and information-rich coefficient library, particularly suitable for signal representation and forecasting contexts where feature translation-invariance is pivotal (Nason et al., 2024).
1. Theoretical Foundation
The NDWPT generalizes the construction of discrete wavelet packets for signals (), using a father (scaling) function and a mother (wavelet) function , each associated with a finite impulse response (FIR) filter: coefficients and for filter length . The wavelet packet tree expands by applying, recursively at each scale (), the two-channel filter bank:
0
At each node, packet functions 1 span the multiresolution decomposition. In the decimated setting (DWPT), each application is followed by downsampling by two, but in the NDWPT, both even and odd outputs are retained, yielding, per node and per scale, coefficient vectors matching the full input length 2. The result is a full binary packet tree comprising 3 nodes (Nason et al., 2024).
The NDWPT is inherently translation-equivariant: A unit circular shift of the input signal induces a corresponding shift in every coefficient vector across all nodes, which is not the case for DWPT.
2. Mathematical Specification
Let 4 denote the NDWPT packet coefficient at scale 5, packet-index 6, and time 7. The recursive decomposition proceeds from the finest scale (8, 9) with 0, downward as:
1
2
where the shift index 3 accommodates the absence of downsampling. For 4, the constant-end extension is invoked: 5 for all 6. All 7 possess the same temporal extent 8.
Reconstruction (inverse transform) involves upsampling (here trivial), then filtering with dual filter sets 9 and 0, though in feature extraction for predictive modeling, exact inversion is seldom needed.
The translation-equivariance is formalized: For input 1, the output 2 for all 3, indexing circularly.
3. Online Computation via Shifted Pyramidal Algorithm
The NDWPT can be computed efficiently in an online manner—suitable for streaming or sequential data—using a shifted pyramidal algorithm. For each new point 4, all coefficients 5 are updated as follows:
9
At each time 6, the filter window is shifted to ensure only past and present values are used. The constant-end extension ensures that indices before the range are handled without peeking ahead. This guarantees causal, online coefficient computation, crucial for forecasting applications where future data must not influence present features (Nason et al., 2024).
4. Structural and Computational Analysis
| Transform | Nodes (7 to 8) | Coefficient Shape per Node | Translation-invariant? |
|---|---|---|---|
| DWPT | 9 | 0 | No |
| NDWT | 1 (scaling-only) | 2 | Partial |
| NDWPT | 3 | 4 | Yes |
NDWPT produces 5 nodes, each with 6 coefficients, resulting in a substantial redundancy and correspondingly increased memory requirements—7. For typical 8–14 (9), this is manageable. Computational cost per basis selection is 0, since each level 1 covers 2 nodes and each node’s filter application is 3. This is asymptotically 4 for a fixed wavelet width 5 (Nason et al., 2024).
Trade-offs exist: The NDWPT’s packet-branching architecture yields an exponentially richer dictionary of localized features compared to the NDWT and DWPT. However, this demands downstream feature-selection or dimensionality reduction (e.g., via ridge regression screening or principal component analysis), as only a subset of coefficients is typically informative for a given forecasting or classification task. In contrast, decimated transforms are less memory-intensive but lack translation invariance, introducing artifact time-jitter in extracted features.
5. Wavelet and Decomposition Parameter Selection
Optimal performance requires careful tuning of wavelet family (vanishing moments, 6) and decomposition depth (7):
- Empirical results indicate that lower-order Daubechies wavelets (8–4, including Haar) often provide superior predictive features when used in non-temporal, one-step forecasting tasks, outperforming more complex wavelet families in cross-validation. Specifically, 9 was modal for NDWPT in non-temporal model settings.
- Decomposition depth should generally be set by 0, or to cover the largest periodicity of interest in the data. Practical experiments with 1 found 2–14 to suffice. Joint selection of 3 and 4 with cross-validation is effective when computationally feasible; otherwise, 5 should at least span the dominant seasonal or cyclical signal structure (Nason et al., 2024).
6. Empirical Forecasting Performance and Applications
The NDWPT has been systematically evaluated for univariate time series forecasting across a broad array of statistical and machine learning architectures:
- For non-temporal regressors (ridge regression, SVM, random forest, XGBoost, MLP), NDWPT-based features (with 3000 coefficients selected) replaced massive lagged feature sets (up to 3000 lags), reducing SMAPE by up to 31% for MLP and approximately 11% for XGBoost (baseline SMAPE 36–53% over 90 series).
- For temporal deep learning models (RNN, GRU, LSTM, TCN, Transformer variants), augmenting the input with 14 multiscale NDWPT “channels” yielded modest improvements for 7 out of 9 architectures. The largest improvement was observed for GRU+NDWPT (ca. 4% absolute SMAPE gain). State-of-the-art transformer models (Temporal Fusion Transformer, Informer, Autoformer, PatchTST) showed mixed results, indicating that NDWPT features are particularly beneficial for non-temporal models and simpler recurrent architectures (Nason et al., 2024).
A plausible implication is that NDWPT’s redundancy and translation invariance primarily benefit regression models where fixed-range lags are otherwise required, whereas strong temporal models may only capitalize modestly on packet-derived features.
7. Summary and Implementation Guidance
The NDWPT provides a flexible, translation-invariant framework for multiscale feature extraction across the entire frequency–time packet tree, producing rich, redundant coefficients of full time resolution. It offers demonstrable and often substantial gains in regression-based time series forecasting when replacing high-order lagged features, and modest empirical improvements for temporal deep networks in some settings. The primary limitation is memory and computational resource usage, as the coefficient library grows exponentially with depth 6. To exploit NDWPT effectively, use low- to moderate-order Daubechies wavelets, limit 7 consistent with signal length and periodicity, and apply screening or dimensionality-reduction prior to regression or classification. For most practical applications and modern hardware, these trade-offs remain favorable up to moderate tree depths (8–14) (Nason et al., 2024).