Wavelet-Based Positional Representation

Updated 22 April 2026

Wavelet-based positional representation is a multi-scale encoding method that employs wavelet transforms and amplitude–phase decompositions to capture spatial and temporal information with fine localization.
It adapts to various domains by transforming signal coefficients into localized position embeddings via discrete transforms, graph spectral kernels, and wavelet space attention modules in neural architectures.
This approach offers interpretability, shiftability, and adaptivity, and has been reported to improve extrapolation and efficiency in transformer models and other deep learning systems.

A wavelet-based positional representation encodes spatial or temporal position information using wavelet transforms and their associated multiresolution bases. Rather than relying on single-scale, global, or non-adaptive encodings, this approach provides localized, multi-scale, and inherently frequency-aware parameterizations of position. Wavelet-based positional representations have emerged in mathematical analysis, deep learning (notably transformers and graph neural networks), numerical physics, machine learning on manifolds, and structured position-coding patterns, with reported advantages in localization, extrapolation, adaptivity, and interpretability.

1. Mathematical Foundations: Wavelet Transform and Amplitude–Phase Positional Coding

At the core of wavelet-based positional representation is the wavelet transform and its family of scale- and translation-parameterized basis functions. For a 1D signal $f(x)$ and a suitable mother wavelet $\psi$, the family $\{ \psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k) \}$ forms a multi-scale, localized basis. Each basis function is indexed by scale $j$ (frequency/timescale) and position $k$ (translation), so the coefficients $c_{j,k}$ simultaneously encode local position and frequency content. The complete wavelet expansion is:

$f(x) = \sum_{j,k} c_{j,k} \psi_{j,k}(x)$
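
As a concrete illustration of the expansion, the following minimal sketch uses the orthonormal Haar wavelet, chosen here only because it is the simplest case (an assumption, not a choice made by the source). The helper names `haar_dwt` and `haar_idwt` are hypothetical. A finite-depth transform keeps one coarse approximation term in addition to the detail coefficients $c_{j,k}$.

```python
import numpy as np

def haar_dwt(signal, levels):
    """Orthonormal Haar DWT; returns [approx, detail_coarsest, ..., detail_finest]."""
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail coefficients at this scale
        approx = (even + odd) / np.sqrt(2.0)         # coarser approximation
    return [approx] + details[::-1]

def haar_idwt(coeffs):
    """Inverse of haar_dwt: rebuilds the signal from coarse to fine."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + detail) / np.sqrt(2.0)
        out[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = out
    return approx

f = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
coeffs = haar_dwt(f, levels=3)       # band sizes: 1 (approx), 1, 2, 4 (details)
reconstructed = haar_idwt(coeffs)    # equals f up to floating-point error
```

Each entry of a detail band is tied to one scale and one translation, which is what lets the coefficients carry positional information.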

The dual-tree complex wavelet transform (DT-CWT) and its amplitude–phase representation further refine this by leveraging the group of fractional Hilbert transforms (fHT) to interpret complex coefficients in terms of spatial shifts. For each index pair $(j,k)$:

Amplitude $A_{j,k} = |c_{j,k}|$ measures local signal intensity
Phase $\phi_{j,k} = \arg(c_{j,k})$ encodes local translation ("position") within the wavelet window

The reconstruction formula becomes:

$f(x) = \sum_{j,k} A_{j,k} \left[ H_{\phi_{j,k}/\pi}\{\psi_{j,k}\}(x) \right]$

where $H_{\tau}$ is the fHT operator, applied here with $\tau = \phi_{j,k}/\pi$. For Gabor-like wavelets, the phase directly determines the local shift of oscillatory patterns within their window, establishing a direct link between phase and local positional displacement (0908.3855, 0908.3383).
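
The link between phase and local displacement can be checked numerically. The sketch below is a simplified stand-in for the DT-CWT setting: it correlates a shifted real Gabor pattern with a single complex Gabor wavelet and reads the shift back from the coefficient phase. The envelope width `SIGMA`, carrier frequency `OMEGA`, and the function name `gabor_coefficient` are assumptions made for illustration.

```python
import numpy as np

SIGMA, OMEGA = 1.0, 5.0               # assumed envelope width and carrier frequency
x = np.linspace(-10.0, 10.0, 4001)    # integration grid
dx = x[1] - x[0]

def gabor_coefficient(shift):
    """Inner product of a shifted real Gabor pattern with a complex Gabor wavelet at the origin."""
    envelope = np.exp(-(x - shift) ** 2 / (2 * SIGMA ** 2))
    pattern = envelope * np.cos(OMEGA * (x - shift))
    wavelet = np.exp(-x ** 2 / (2 * SIGMA ** 2)) * np.exp(1j * OMEGA * x)
    return np.sum(pattern * np.conj(wavelet)) * dx

shifts = np.array([0.0, 0.1, 0.2])
coeffs = np.array([gabor_coefficient(s) for s in shifts])
amplitudes = np.abs(coeffs)           # local intensity, decays slowly as the pattern leaves the window
phases = np.angle(coeffs)             # approximately -OMEGA * shift for small shifts
recovered_shifts = -phases / OMEGA
```

The recovery is only unambiguous while `OMEGA * shift` stays inside $(-\pi, \pi)$, which is why the phase encodes position within the wavelet window rather than globally.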

In higher dimensions, directional wavelets and their associated directional Hilbert transforms generalize positional coding to encode location and orientation-dependent structure.

2. Discrete and Graph-based Wavelet Positional Encodings

Discrete wavelet transforms (DWT) enable efficient representation of positions in sampled data. For time series or sequence models, the DWT decomposes each sequence into approximation and detail coefficients at several dyadic scales, which can be mapped to trainable or signal-dependent embedding vectors. Precisely,

Each $j$-th-level coefficient $c_{j,k}$ is linked to a learnable or signal-conditioned embedding vector
The inverse DWT reconstructs a sequence of per-position embeddings wherein each embedding carries localized, multi-scale context of the original series (Irani et al., 18 Sep 2025, Irani et al., 12 Feb 2026).
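
A minimal sketch of these two steps, assuming a Haar DWT and one embedding vector per scale band, is shown below. It follows the description above only loosely and is not the cited authors' implementation; `signal_conditioned_positions` and the randomly initialised `scale_embeddings` (which would be learnable parameters in a model) are hypothetical.

```python
import numpy as np

def haar_dwt(signal, levels):
    """Orthonormal Haar DWT; returns [approx, detail_coarsest, ..., detail_finest]."""
    approx = np.asarray(signal, dtype=float)
    bands = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        bands.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return [approx] + bands[::-1]

def haar_idwt(bands):
    """Inverse Haar DWT along axis 0; trailing axes (embedding channels) are carried along."""
    approx = bands[0]
    for detail in bands[1:]:
        out = np.empty((2 * approx.shape[0],) + approx.shape[1:])
        out[0::2] = (approx + detail) / np.sqrt(2.0)
        out[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = out
    return approx

def signal_conditioned_positions(signal, scale_embeddings):
    """Gate one embedding per band with the coefficients, then invert the DWT per channel."""
    levels = len(scale_embeddings) - 1
    bands = haar_dwt(signal, levels)
    gated = [band[:, None] * emb[None, :] for band, emb in zip(bands, scale_embeddings)]
    return haar_idwt(gated)           # shape (seq_len, d_model)

rng = np.random.default_rng(0)
seq_len, d_model, levels = 16, 4, 3
series = np.sin(np.linspace(0.0, 4.0 * np.pi, seq_len))
scale_embeddings = rng.normal(size=(levels + 1, d_model))   # stand-in for learned parameters
pos = signal_conditioned_positions(series, scale_embeddings)
```

Because the transform is linear, setting every scale embedding to ones returns the original series in each channel, which is a convenient sanity check.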

In graphs, spectral graph wavelets defined via diffusion or heat kernels on the normalized Laplacian produce for every node $v$ a set of multi-scale diffusion vectors $\{\psi_{s,v}\}$ indexed by scale $s$. Collecting these across scales and collapsing as needed (e.g., via equivariant networks), one builds per-node positional codes with provable localization in both spatial and spectral domains (Ngo et al., 2023). This offers precise encoding of node position relative to the hierarchical multiscale topology of the graph.
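
The heat-kernel construction can be sketched with a dense eigendecomposition, assuming a small undirected graph with no isolated nodes. The function name `heat_kernel_codes`, the scale values, and the use of the kernel diagonal as a permutation-invariant summary (in place of the equivariant network mentioned above) are illustrative assumptions.

```python
import numpy as np

def heat_kernel_codes(adjacency, scales):
    """Return exp(-s * L) for each scale s, with L the normalized graph Laplacian."""
    A = np.asarray(adjacency, dtype=float)
    deg = A.sum(axis=1)                       # assumes every node has degree > 0
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)      # L is symmetric, so eigh applies
    # Column v of each kernel is the diffusion vector centred at node v.
    return np.stack([(eigvecs * np.exp(-s * eigvals)) @ eigvecs.T for s in scales])

# Path graph on 6 nodes.
n = 6
path = np.zeros((n, n))
idx = np.arange(n - 1)
path[idx, idx + 1] = path[idx + 1, idx] = 1.0

scales = [0.0, 0.5, 5.0]
codes = heat_kernel_codes(path, scales)                 # shape (3, 6, 6)
signature = np.diagonal(codes, axis1=1, axis2=2).T      # shape (6, 3): one value per node and scale
```

Small scales keep each vector concentrated on its own node, while larger scales spread mass along the graph, which is the multi-scale localization the text refers to.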

The following table summarizes three core formulations:

Domain	Basis Elements	Position Encoded As
1D sequence	$\psi_{j,k}(x)$	$(j,k)$: scale and step
Graph	$\psi_{s,v}$	$(v,s)$: node and scale
2D/Directional	$\psi_{j,k,\theta}$, orientations $\theta$	$(j,k,\theta)$: scale, shift, angle

3. Wavelet-based Positional Representation in Deep Neural Architectures

Wavelet-based positional representation has been applied in several transformer and attention-based models:

Wavelet Space Attention (WavSpA): A module that replaces or augments internal positional encodings in transformers by projecting sequences via forward wavelet transforms, performing attention in the wavelet (coefficient) space, and reconstructing via inverse wavelet transforms. This inherently embeds positional and frequency content without explicit positional vectors, supporting better long-sequence modeling and extrapolation (Zhuang et al., 2022).
Dynamic Wavelet Positional Encoding (DyWPE): For time series transformers, the signal itself is transformed via DWT, and position embeddings are dynamically generated by gating per-scale embeddings with the local wavelet coefficients. This is both signal-aware and multi-scale, allowing the embedding to reflect local structure and offering improved predictive power and adaptability over purely index-based schemes (Irani et al., 18 Sep 2025, Irani et al., 12 Feb 2026).
Wavelet Relative Position Based Attention: In the relative position embedding framework for LLMs, multi-scale wavelet functions (e.g., Ricker wavelets) generate a $d$-dimensional positional bias between any query-key pair as a function of their relative index, with multiple scales and shifts spanning diverse context windows. This approach enables robust extrapolation to much longer contexts than seen during training, which is a known weakness of single-scale (e.g., RoPE) or window-restricted (e.g., ALiBi) alternatives (Oka et al., 4 Feb 2025).
Visual Representations: In continuous visual domains, wavelet primitives parameterize position in high-dimensional spaces (e.g., $\mathbb{R}^N$) using learnable, localized, frequency-modulated wavelet functions. By directly representing spatial position and local frequency, these primitives achieve high-fidelity, wide-band representations efficiently (Zhang et al., 18 Aug 2025).
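
To make the relative-position variant concrete, the sketch below evaluates an unnormalized Ricker wavelet at the relative index $i - j$ for several scales and shifts, giving one feature per (scale, shift) pair. The specific scales, shifts, and the name `wavelet_relative_features` are assumptions for illustration and not the parameterization of the cited work.

```python
import numpy as np

def ricker(t):
    """Unnormalized Ricker (Mexican hat) wavelet."""
    return (1.0 - t ** 2) * np.exp(-0.5 * t ** 2)

def wavelet_relative_features(seq_len, scales, shifts):
    """Features of shape (seq_len, seq_len, d) with d = len(scales) * len(shifts)."""
    positions = np.arange(seq_len)
    rel = positions[:, None] - positions[None, :]          # relative index i - j
    feats = [ricker((rel - b) / a) / np.sqrt(a) for a in scales for b in shifts]
    return np.stack(feats, axis=-1)

scales = [1.0, 2.0, 4.0]     # assumed dyadic scales
shifts = [0.0, 4.0]          # assumed shifts
short = wavelet_relative_features(8, scales, shifts)     # shape (8, 8, 6)
longer = wavelet_relative_features(16, scales, shifts)   # same functions, longer context
```

Since every feature is a fixed function of the relative index, the same definition applies unchanged to sequences longer than those seen during training.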

4. Properties, Interpretability, and Theoretical Insights

Wavelet-based positional representations are characterized by:

Multiresolution Locality: Positions are encoded at many scales, supporting both fine and coarse localization. This is crucial for non-stationary or hierarchical data (e.g., language, biological signals, graphs) (Zhuang et al., 2022, Oka et al., 4 Feb 2025, Ngo et al., 2023).
Shiftability and Invariance: The fractional Hilbert transform and amplitude–phase parameterization reveal that positional phase encodings naturally correspond to local translations (shifts) of the wavelet function, rather than only periodic phase (as in sinusoidal encodings). The group structure (composition law $H_{\tau_1} H_{\tau_2} = H_{\tau_1 + \tau_2}$) underpins improved shift-invariance (0908.3855, 0908.3383).
Parseval/Energy Preservation: In certain analytic constructions (e.g., Taylor-wavelet expansions), biorthogonal wavelet decompositions yield exact energy theorems—analogous to Parseval's identity—ensuring the decomposition preserves signal norm and localized moment contributions (Oliveira et al., 2015).
Signal Adaptivity and Extrapolation: Signal-aware wavelet representations permit positional embeddings to adapt dynamically to each input, in contrast to static index-based or learned-vector positional encodings. This supports robust generalization to unseen data scales, especially in non-stationary signals or long-context evaluation (Irani et al., 18 Sep 2025, Oka et al., 4 Feb 2025).
Spectral and Spatial Localization: For graph, manifold, and directional domains, wavelets balance localization in both the spectral (frequency/eigenmode) and spatial (node/location) domains, which is critical for capturing hierarchical and geometric structure (Ngo et al., 2023, McEwen et al., 2015).
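
The shiftability property can be illustrated with a discrete fractional Hilbert transform implemented as a Fourier multiplier. The sketch assumes the convention $H_{\tau} = \cos(\pi\tau) I - \sin(\pi\tau) H$, which gives the multiplier $e^{i\pi\tau\,\mathrm{sign}(\omega)}$; other sign conventions exist, and `fractional_hilbert` is a hypothetical helper. It checks the composition law $H_{\tau_1} H_{\tau_2} = H_{\tau_1 + \tau_2}$ and that $H_{\tau}$ advances the phase of a cosine by $\pi\tau$.

```python
import numpy as np

def fractional_hilbert(signal, tau):
    """Discrete fHT via FFT under the assumed convention exp(i * pi * tau * sign(freq))."""
    spectrum = np.fft.fft(signal)
    sign = np.sign(np.fft.fftfreq(len(signal)))   # +1, -1, and 0 at DC
    return np.fft.ifft(spectrum * np.exp(1j * np.pi * tau * sign))

n = np.arange(64)
theta = 2.0 * np.pi * 4 * n / 64
carrier = np.cos(theta)
shifted = fractional_hilbert(carrier, 0.25).real   # matches cos(theta + pi / 4)

rng = np.random.default_rng(1)
sig = rng.normal(size=64)
composed = fractional_hilbert(fractional_hilbert(sig, 0.3), 0.45)
direct = fractional_hilbert(sig, 0.75)             # same result as composed
```

For an oscillation under a slowly varying envelope, a phase advance of this kind looks like a small translation of the pattern inside its window.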

5. Practical Implementations and Applications

Recent research demonstrates that wavelet-based positional representations are effective and computationally tractable across domains:

Linear Complexity: Fast wavelet transforms and their inverses (e.g., via Mallat's algorithm or lifting schemes) operate in $O(N)$ time for a length-$N$ input, and are compatible with end-to-end backpropagation (Zhuang et al., 2022, Irani et al., 18 Sep 2025, Irani et al., 12 Feb 2026).
Learnability: Both fixed and adaptive wavelet filter parameters (scaling/wavelet filters, gating weights, scale embeddings) can be optimized during model training, supporting flexible adaptation to data.
Position Coding and Error Correction: In digital communication and labeling, binary wavelet transforms are employed for subarray-unique block-based position codes, guaranteeing perfect reconstruction and unique decodability per subarray (0706.0869).
Visual Rendering and Fast Splatting: Wavelet-based visual primitives offer spectrum-aware, highly localized, and fast-decodable representations for high-dimensional spatial signals, outperforming classic Fourier-feature and Gaussian-based encodings in rendering fidelity and efficiency (Zhang et al., 18 Aug 2025).
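
The linear cost of lifting-based transforms can be seen in a small integer Haar example. This is a sketch under simplifying assumptions: the input length is a power of two, and the integer rounding used here makes the transform exactly invertible but not differentiable, so trainable models would use a floating-point variant. The function names are hypothetical.

```python
def lifting_haar_forward(values):
    """Multi-level integer Haar transform via lifting; len(values) must be a power of two."""
    approx = list(values)
    details = []            # finest level first
    pair_updates = 0        # counts predict/update pairs to expose the linear cost
    while len(approx) > 1:
        even, odd = approx[0::2], approx[1::2]
        detail = [o - e for e, o in zip(even, odd)]             # predict step
        approx = [e + (d >> 1) for e, d in zip(even, detail)]   # update step (floor of d / 2)
        details.append(detail)
        pair_updates += len(detail)
    return approx[0], details, pair_updates

def lifting_haar_inverse(coarse, details):
    """Undo the lifting steps in reverse order, from coarsest to finest level."""
    approx = [coarse]
    for detail in reversed(details):
        even = [s - (d >> 1) for s, d in zip(approx, detail)]
        odd = [d + e for d, e in zip(detail, even)]
        approx = [v for pair in zip(even, odd) for v in pair]   # interleave even and odd
    return approx

data = [3, 7, 1, 8, 2, 9, 4, 6]
coarse, details, pair_updates = lifting_haar_forward(data)
restored = lifting_haar_inverse(coarse, details)
```

The number of pair updates is N/2 + N/4 + ... + 1 = N - 1, so the total work grows linearly with the input length.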

6. Limitations and Open Directions

Despite the strengths of wavelet-based positional representations, several challenges and areas for further development are noted:

Fixed vs. Learned Parameters: Existing implementations often use hand-chosen scales and shifts. A plausible direction is to learn or adaptively tune these parameters, potentially via neural parameterization (Oka et al., 4 Feb 2025).
Computational Overheads: Relative-position embeddings derived from wavelet representations can incur $O(N^2)$ cost in the sequence length $N$ if not optimized with memory-reduction techniques (e.g., indexed lookups, scatter tricks) (Oka et al., 4 Feb 2025).
Wavelet Selection: The empirical utility depends on the choice of wavelet; Ricker and Gabor often outperform classic Haar, but further adaptation or neural approaches may improve performance (Oka et al., 4 Feb 2025).
Distributions and Convergence: Purely local, distributional (e.g., delta-derivative-based) representations provide closed-form positional decompositions (as with Taylor-wavelet analysis), but do not possess the scale/frequency localization of $L^2$-admissible wavelets and suffer from limited convergence zones (Oliveira et al., 2015).
Noise and Error Propagation: Error-resilience and robustness require careful wavelet design, particularly in digital or communication applications where subarray uniqueness and error correction are paramount (0706.0869).
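
The indexed-lookup idea behind the memory-reduction point can be sketched as follows. Instead of materializing a per-pair embedding tensor of shape (L, L, d), one stores embeddings only for the 2L - 1 distinct relative offsets, multiplies queries against that table, and gathers the needed entries. This is a generic version of the trick under assumed shapes, with a random table standing in for wavelet features; it is not the cited implementation, and the function names are hypothetical.

```python
import numpy as np

def relative_index(seq_len):
    """Index of offset i - j in a table ordered from -(L - 1) to L - 1."""
    positions = np.arange(seq_len)
    return positions[:, None] - positions[None, :] + (seq_len - 1)

def naive_bias(queries, offset_table):
    """Materializes an (L, L, d) tensor of per-pair embeddings before contracting."""
    rel = relative_index(queries.shape[0])
    pair_embeddings = offset_table[rel]                 # (L, L, d)
    return np.einsum("id,ijd->ij", queries, pair_embeddings)

def indexed_bias(queries, offset_table):
    """Contracts against the (2L - 1, d) table first, then gathers per pair."""
    rel = relative_index(queries.shape[0])
    scores = queries @ offset_table.T                   # (L, 2L - 1)
    return np.take_along_axis(scores, rel, axis=1)      # (L, L)

rng = np.random.default_rng(2)
seq_len, d = 10, 4
queries = rng.normal(size=(seq_len, d))
offset_table = rng.normal(size=(2 * seq_len - 1, d))    # stand-in for wavelet features per offset
bias_naive = naive_bias(queries, offset_table)
bias_indexed = indexed_bias(queries, offset_table)
```

Both routes produce the same (L, L) bias matrix; the second avoids the intermediate tensor whose size grows with both the squared sequence length and the embedding dimension.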

7. Summary Table: Core Wavelet-Based Positional Methods

Application Domain	Wavelet Mechanism	Key Positional Feature	Reference
1D/2D Analysis	DT-CWT + fHT	Phase-shifted Gabor	(0908.3855)
Sequence Models	DWT, DyWPE	Multi-scale embedding	(Irani et al., 18 Sep 2025)
Transformers	WavSpA, RPE	Wavelet-bias, attention in wavelet space	(Zhuang et al., 2022, Oka et al., 4 Feb 2025)
Graphs	Spectral Heat Kernel	Node+scale-local code	(Ngo et al., 2023)
Visual (N-D)	Parametric Morlet/Gaussian	Localized visual primitives	(Zhang et al., 18 Aug 2025)
Digital/Labeling	Binary wavelets	Subarray-unique blocks	(0706.0869)

Wavelet-based positional representations form a mathematically principled, multi-scale, and structurally adaptive foundation for encoding spatial or temporal position across signal processing, machine learning, vision, natural language processing, and beyond. Their effectiveness stems from the ability to reconcile locality, scale, shiftability, and frequency content within a single, compact representational machinery.