
Wavelet-Based Positional Encoding

Updated 12 November 2025
  • Wavelet-based positional representation is a technique that encodes position and scale using localized wavelet transforms for joint spatial and frequency analysis.
  • It is applied in neural implicit representations, transformer models, and graph learning to overcome limitations of global or scale-invariant encodings.
  • Empirical results demonstrate improved robustness and extrapolation performance in tasks like time series analysis, image rendering, and language modeling.

A wavelet-based positional representation refers to any mapping, embedding, or coding scheme in which spatial or index positions are encoded using wavelet transforms, wavelet atoms, or wavelet-inspired basis functions, enabling joint localization in both position (space, time, or graph structure) and scale/frequency domains. This approach has been adopted across signal processing, neural implicit representations, transformers, graph learning, position coding, and quantum dynamical simulation to address fundamental limitations of global or scale-invariant positional schemes. It leverages the unique time–frequency and spatial–spectral localization properties of wavelets to enable multiresolution positional descriptions, overcome denoising and extrapolation barriers, and attain improved robustness or expressiveness in downstream models and algorithms.

1. Mathematical Foundations of Wavelet-Based Positional Coding

Wavelet analysis centers on the decomposition of a signal $x(t)$ (or a vector-valued feature signal) into components parameterized by scale $a > 0$ and shift $b \in \mathbb{R}$:

$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right)$$

with $\psi$ a mother wavelet. The continuous wavelet transform (CWT) yields

$$W(a,b) = \int_{-\infty}^{\infty} x(t)\,\overline{\psi_{a,b}(t)}\,dt$$

and in the discrete setting, sequences ($x[n]$, $n = 0, \dots, N-1$) are mapped via basis families

$$\psi_{j,k}[n] = 2^{-j/2}\,\psi\!\left[2^{-j}n - k\right]$$

with $j$ (scale) and $k$ (shift) defining a multiresolution grid. The key property is joint localization:

  • Spatial/time windowing via $b$ or $k$ (localized support, compact or rapidly decaying)
  • Frequency or scale targeting via dilation $a$ or $j$ (narrowband, band-pass, multiscale)

The representation $x[n] = \sum_{j,k} W_{j,k}\,\psi_{j,k}[n]$ allows positional information (encoded by the shift $b$/$k$ and scale $a$/$j$) to be embedded directly in the coefficients $W_{j,k}$ or in coordinate mappings parameterized by wavelet atoms.
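The mechanics are easy to see with the PyWavelets library; the following minimal sketch (signal, wavelet, and decomposition depth are illustrative choices, not taken from any cited paper) shows how DWT coefficients index a signal by scale and shift and can be inverted without loss:

```python
import numpy as np
import pywt

# Illustrative signal: a smooth ramp plus one sharp, localized edge.
n = np.arange(256)
x = n / 256.0 + (n > 100).astype(float)

# Multilevel DWT: coefficients W_{j,k} indexed by scale j and shift k.
coeffs = pywt.wavedec(x, "db2", level=4)
for j, c in enumerate(coeffs):
    band = "approximation" if j == 0 else f"detail level {j}"
    print(f"{band}: {c.shape[0]} shift positions k")

# The edge near n = 100 appears as a large-magnitude detail coefficient
# whose shift index k localizes it within the finest scale.
k_max = np.argmax(np.abs(coeffs[-1]))
print("finest-scale modulus maximum near n ≈", 2 * k_max)

# Positional information in the coefficients is lossless: the inverse
# DWT reconstructs the signal exactly (up to boundary padding).
x_rec = pywt.waverec(coeffs, "db2")
assert np.allclose(x, x_rec[: x.size])
```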

Specialized wavelet positional representations have also been constructed over $\mathrm{GF}(2)$ for error-free position coding in 2D arrays (“binary wavelet codes” (0706.0869)), via spectral decomposition for graphs (“graph wavelets” (Ngo et al., 2023)), using analytic or modulated wavelets in neural network architectures (Roddenberry et al., 2023, Oka et al., 4 Feb 2025), and through complex/coherent phase modulation for amplitude–phase shift encoding (0908.3383, 0908.3855).

2. Neural and Algorithmic Implementations

Implicit Neural Representations (INRs)

When using MLPs to model a function $f_\theta : \mathbb{R}^d \to \mathbb{R}$ or $\mathbb{C}$, the first layer applies a positional encoding $\psi$ to the input coordinates, followed by nonlinear processing. The choice of $\psi$ determines the inductive bias:

  • Sinusoidal (“Fourier features”): global, infinite support, delta localization in frequency
  • Wavelet-based (“complex wavelet activations”): functions of the form

$$\psi(t;\omega,\sigma) = e^{-t^2/(2\sigma^2)}\, e^{-i 2\pi \omega t}$$

with local support in both space ($\sigma$) and frequency ($\omega$).

In practice, a set of $F_1$ wavelet channels is parameterized via dilations, translations, and modulations:

$$z^{(0)}_t(r) = \psi\!\big((\langle w_t, r \rangle - u_t)/s_t\big)$$

Both real and imaginary parts can be used as separate feature channels. Higher MLP layers recursively mix and amplify these atoms, generating harmonics and progressively higher-frequency components; this algebraic decoupling naturally separates low-frequency structure (scaling atoms) from high-frequency detail (wavelet atoms) (Roddenberry et al., 2023).
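As a concrete illustration, such a first layer can be written as a complex Gabor feature map over projected coordinates; this NumPy sketch uses hypothetical parameter shapes and random values, not the trained architecture of (Roddenberry et al., 2023):

```python
import numpy as np

def gabor_positional_encoding(r, W, u, s, omega):
    """Complex wavelet features z_t(r) = psi((<w_t, r> - u_t) / s_t) with
    psi(t) = exp(-t^2/2) * exp(-i*2*pi*omega*t) (sigma absorbed into s_t).

    r:     (batch, d)  input coordinates
    W:     (F, d)      projection directions w_t
    u, s:  (F,)        translations and dilations
    omega: (F,)        modulation frequencies
    Returns (batch, 2F) real features: real and imaginary parts stacked.
    """
    t = (r @ W.T - u) / s                                   # (batch, F)
    z = np.exp(-0.5 * t**2) * np.exp(-1j * 2 * np.pi * omega * t)
    return np.concatenate([z.real, z.imag], axis=-1)

# Toy usage: encode 2-D positions with 32 wavelet channels.
rng = np.random.default_rng(0)
F, d = 32, 2
feats = gabor_positional_encoding(
    rng.uniform(-1, 1, size=(16, d)),
    W=rng.standard_normal((F, d)),
    u=rng.uniform(-1, 1, F),
    s=rng.uniform(0.1, 1.0, F),
    omega=rng.uniform(0.0, 8.0, F),
)
assert feats.shape == (16, 2 * F)
```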

Initialization can exploit wavelet modulus maxima (WMM): first-layer wavelet “nodes” are placed at the coordinates and scales where the modulus of the continuous wavelet transform of the target signal is maximized, particularly accelerating convergence on signals with sharp edges or localized features.
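A simplified version of this placement rule can be sketched with PyWavelets; taking the globally largest modulus values is a stand-in for proper modulus-maxima chaining, and the wavelet and scale grid are assumptions:

```python
import numpy as np
import pywt

def wmm_init_points(x, scales, wavelet="cmor1.5-1.0", n_nodes=8):
    """Return n_nodes (shift, scale) pairs where |CWT(x)| is largest,
    as candidate placements for first-layer wavelet nodes."""
    coefs, _ = pywt.cwt(x, scales, wavelet)     # (n_scales, n_samples), complex
    mod = np.abs(coefs)
    top = np.argsort(mod.ravel())[::-1][:n_nodes]
    scale_idx, shift_idx = np.unravel_index(top, mod.shape)
    return [(int(b), float(scales[j])) for b, j in zip(shift_idx, scale_idx)]

# A step edge at n = 200 dominates the fine-scale modulus maxima.
x = np.concatenate([np.zeros(200), np.ones(200)])
print(wmm_init_points(x, scales=np.arange(1, 33)))
```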

Attention and Transformers

Wavelet-based positional representations in transformer settings use discrete wavelet transforms (DWT) to project sequences into multi-resolution bases, capturing position via both shift (time index) and scale (frequency band) (Zhuang et al., 2022); a code sketch follows the steps below:

  1. Forward DWT: $X \mapsto \{C_{\varphi}, C_{\psi_1}, \dots, C_{\psi_J}\}$, transforming each head or channel into approximation/detail coefficients.
  2. Wavelet-space Attention: Self-attention (full/linear/local) is learned directly in the wavelet coefficient space, enabling position- and scale-aware attention mechanisms.
  3. Inverse DWT: The output is reconstructed in the original domain, preserving information and allowing multiresolution composition.
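A minimal NumPy/PyWavelets sketch of steps 1–3, with random projection matrices standing in for learned attention parameters (an illustration of the pipeline shape, not the implementation of Zhuang et al., 2022):

```python
import numpy as np
import pywt

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def wavelet_space_attention(X, wavelet="db4", level=3, seed=0):
    """Forward DWT -> self-attention over coefficients -> inverse DWT.
    X: (seq_len, d_model) feature sequence."""
    rng = np.random.default_rng(seed)
    # 1. Forward DWT per channel: X -> {C_phi, C_psi_1, ..., C_psi_J}.
    coeffs = pywt.wavedec(X, wavelet, level=level, axis=0)
    lengths = [c.shape[0] for c in coeffs]
    Z = np.concatenate(coeffs, axis=0)           # stack bands along "time"
    # 2. Single-head self-attention in wavelet-coefficient space.
    d = Z.shape[1]
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = Z @ Wq, Z @ Wk, Z @ Wv
    Z_out = softmax(Q @ K.T / np.sqrt(d)) @ V
    # 3. Inverse DWT: split bands back and reconstruct the original domain.
    out_coeffs = np.split(Z_out, np.cumsum(lengths)[:-1], axis=0)
    Y = pywt.waverec(out_coeffs, wavelet, axis=0)
    return Y[: X.shape[0]]                       # trim boundary padding

Y = wavelet_space_attention(np.random.randn(128, 16))
assert Y.shape == (128, 16)
```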

Adaptive and fixed wavelet bases may be used; adaptive schemes (e.g., learned filters, lattice parameterization, or wavelet lifting) endow the system with greater modeling flexibility. Empirical results show improved performance on tasks requiring long-range dependency modeling and accurate local-global pattern coupling versus sinusoidal or purely Fourier-based positional encodings.

Dynamic variants, such as DyWPE, modulate scale embeddings by the wavelet coefficients associated with the actual content of the signal, re-injecting dynamically updated, signal-aware positional codes (Irani et al., 18 Sep 2025).
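Schematically, a signal-aware positional code of this flavor gates learned per-band embeddings by the magnitudes of the input's own DWT coefficients; everything below (names, shapes, the gating rule) is a hypothetical illustration, and the actual DyWPE construction differs in detail (Irani et al., 18 Sep 2025):

```python
import numpy as np
import pywt

def signal_aware_pe(x, scale_emb, wavelet="db4"):
    """Hypothetical signal-aware positional encoding.

    x:         (seq_len,)            input signal
    scale_emb: (levels + 1, d_model) learned embedding per wavelet band
    Each position receives a mixture of band embeddings weighted by the
    magnitude of its wavelet coefficients in that band.
    """
    levels = scale_emb.shape[0] - 1
    coeffs = pywt.wavedec(x, wavelet, level=levels)
    pe = np.zeros((x.size, scale_emb.shape[1]))
    for band, c in enumerate(coeffs):
        # Upsample the band's coefficient magnitudes to sequence length.
        w = np.repeat(np.abs(c), int(np.ceil(x.size / c.size)))[: x.size]
        pe += w[:, None] * scale_emb[band]
    return pe

rng = np.random.default_rng(0)
pe = signal_aware_pe(rng.standard_normal(128), rng.standard_normal((5, 32)))
assert pe.shape == (128, 32)
```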

Graph-Structured Data

Wavelet positional encoding in graphs operates in the graph spectral domain. Given the normalized Laplacian $L = I_n - D^{-1/2} A D^{-1/2}$ and its eigendecomposition $L = U \Lambda U^T$, wavelet (band-pass) and scaling (low-pass) filters are applied to the spectrum to generate node-specific, scale-indexed features:

$$\Psi_s = U\,\mathrm{diag}\big(g(s\lambda_1), \dots, g(s\lambda_n)\big)\,U^T$$

Permutation-equivariant maps (tensor contractions and MLPs) compress the stack $\{\Psi_{s_i}\}$ into an $n \times k$ positional encoding matrix, guaranteeing joint spectral and spatial localization and retaining equivariance (Ngo et al., 2023).
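A direct NumPy rendering of this construction, using $g(x) = x\,e^{-x}$ as one common illustrative band-pass kernel:

```python
import numpy as np

def graph_wavelets(A, scales, g=lambda x: x * np.exp(-x)):
    """Spectral graph wavelets Psi_s = U diag(g(s * lambda)) U^T.

    A: (n, n) symmetric adjacency matrix; g: band-pass kernel on the
    spectrum (exp(-x) would give the low-pass scaling counterpart).
    Returns an array of shape (len(scales), n, n).
    """
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    lam, U = np.linalg.eigh(L)                   # L = U diag(lam) U^T
    return np.stack([(U * g(s * lam)) @ U.T for s in scales])

# Toy usage on a 5-node cycle graph; row i of Psi_s is a feature vector
# localized around node i at diffusion scale s.
n = 5
A = np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)
Psi = graph_wavelets(A, scales=[0.5, 1.0, 2.0])
print(Psi.shape)  # (3, 5, 5)
```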

Position Coding via Binary Wavelets

Unique position identification in large discrete arrays (e.g., for pen computing or location tracking) is achievable by encoding $(x, y)$ coordinates into invertible 4×4 binary “tiles” via a binary wavelet transform. The coefficient-to-block mapping is exact and invertible over $\mathrm{GF}(2)$, with efficient $O(1)$ complexity for both encoding and decoding (0706.0869).
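The exact invertibility is easy to see in a one-level Haar-like lifting step over $\mathrm{GF}(2)$, where XOR replaces addition and subtraction; the sketch below is a generic illustration of the principle, not the specific tile construction of (0706.0869):

```python
import numpy as np

def binary_dwt_step(block):
    """One lifting step of a Haar-like binary wavelet transform over GF(2).
    block: 1-D uint8 array of even length with entries in {0, 1}."""
    a, b = block[0::2], block[1::2]
    return a, a ^ b          # approximation = even samples, detail = XOR

def binary_idwt_step(approx, detail):
    block = np.empty(2 * approx.size, dtype=approx.dtype)
    block[0::2] = approx
    block[1::2] = approx ^ detail    # b = a XOR (a XOR b), no rounding error
    return block

bits = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)
a, d = binary_dwt_step(bits)
assert np.array_equal(binary_idwt_step(a, d), bits)  # exact over GF(2)
```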

3. Amplitude–Phase and Shift-Invariant Encoding

In wavelet transforms with complex (analytic) basis functions, the representation admits an amplitude–phase decomposition:

$$c_{j,k} = a_{j,k}\, e^{i\phi_{j,k}}$$

The phase $\phi_{j,k}$ encodes the sub-sample shift of the wavelet atom at scale $j$ and location $k$ relative to the underlying signal, achieved via the action of fractional Hilbert transforms:

$$H^\alpha f(x) = \cos(\pi\alpha)\, f(x) - \sin(\pi\alpha)\, H f(x)$$

with $H$ the standard Hilbert transform. This action shifts a cosine carrier by $\pi\alpha$, directly parameterizing local displacements.
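The phase-shift property can be checked numerically: computing $Hf$ from the analytic signal and forming $H^\alpha f$ shifts a cosine carrier by exactly $\pi\alpha$ (a minimal SciPy sketch):

```python
import numpy as np
from scipy.signal import hilbert

def fractional_hilbert(f, alpha):
    """H^alpha f = cos(pi*alpha) f - sin(pi*alpha) Hf, where Hf is the
    imaginary part of the analytic signal returned by scipy's hilbert."""
    Hf = np.imag(hilbert(f))
    return np.cos(np.pi * alpha) * f - np.sin(np.pi * alpha) * Hf

# A cosine carrier is phase-shifted by pi * alpha, as stated above.
n = np.arange(256)
carrier = np.cos(2 * np.pi * 8 * n / 256)
shifted = fractional_hilbert(carrier, alpha=0.25)
expected = np.cos(2 * np.pi * 8 * n / 256 + np.pi * 0.25)
assert np.allclose(shifted, expected, atol=1e-9)
```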

The dual-tree complex wavelet transform (DT-CWT) leverages this structure to provide shiftable and nearly shift-invariant representations; the phase channels can be interpreted as encoding local position continuously within each scale and orientation (0908.3383, 0908.3855).

In the context of self-attention, certain relative or rotary position embeddings (e.g., RoPE) can be algebraically reduced to single-scale Haar-like wavelet transforms, exposing their limitations in extrapolation due to lack of multiscale coverage (Oka et al., 4 Feb 2025).
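For reference, standard RoPE rotates each feature pair by one fixed frequency per pair, so each pair carries a single global scale rather than a multiscale stack; a minimal sketch of the standard construction (not the Haar reinterpretation itself):

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Rotary position embedding: rotate each pair (x_{2i}, x_{2i+1}) by
    angle position * base^(-2i/d). One fixed frequency per pair is the
    single-scale coverage the wavelet view identifies as limiting.
    x: (seq_len, d) with d even."""
    d = x.shape[1]
    freqs = base ** (-np.arange(0, d, 2) / d)        # (d/2,) fixed frequencies
    ang = positions[:, None] * freqs[None, :]        # (seq_len, d/2) angles
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

q = rope(np.random.randn(64, 16), np.arange(64, dtype=float))
assert q.shape == (64, 16)
```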

4. Multiscale Adaptivity and Practical Benefits

Wavelet-based schemes inherently support multiresolution analysis (MRA): each positional code, atom, or coefficient (indexed by scale and position) encodes information at a tunable spatial and frequency granularity. In LLMs, this supports robust extrapolation to much longer contexts than seen during training, as all positions, even those far beyond $L_\mathrm{train}$, receive valid, interpretable coefficients for both coarse and fine semantic dynamics.

In vision and graphics pipelines such as WIPES (Zhang et al., 18 Aug 2025), Morlet-style wavelet primitives, parameterized by center, scale (covariance), and frequency vector, realize continuous spatial–frequency decompositions. By tuning these parameters, both the broad “forest” and the detailed “trees” can be synthesized efficiently, bridging the gap between global and local structure with fast differentiable rasterization and analytic gradients.
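A generic atom of this kind, a Gaussian envelope modulated by a plane-wave carrier, can be evaluated as below; the parameterization is an illustrative stand-in, not the WIPES rasterizer:

```python
import numpy as np

def morlet_primitive_2d(xy, center, cov, freq):
    """2-D Morlet-style atom: Gaussian envelope times a cosine carrier.

    xy:     (n, 2) query points
    center: (2,)   atom position
    cov:    (2, 2) envelope covariance (spatial scale)
    freq:   (2,)   carrier frequency vector (cycles per unit length)
    """
    d = xy - center
    quad = np.einsum("ni,ij,nj->n", d, np.linalg.inv(cov), d)
    return np.exp(-0.5 * quad) * np.cos(2 * np.pi * d @ freq)

# Broad low-frequency atom ("forest") vs. small high-frequency one ("trees").
g = np.linspace(0.0, 1.0, 64)
grid = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
broad = morlet_primitive_2d(grid, np.array([0.5, 0.5]), 0.10 * np.eye(2), np.array([1.0, 0.0]))
fine = morlet_primitive_2d(grid, np.array([0.5, 0.5]), 0.005 * np.eye(2), np.array([12.0, 0.0]))
```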

In time series or dynamic signals, wavelet-based positional encodings using DWT (including dynamic, signal-aware variants) allow transformers to adapt their temporal representations according to the inherent multi-scale structure of the data, as demonstrated by significant accuracy gains over sinusoidal PE (Irani et al., 18 Sep 2025).

5. Empirical Performance, Trade-offs, and Limitations

Across domains, wavelet-based positional representations demonstrate key advantages:

  • Superior expressiveness: outperforming sinusoidal/absolute and relative/fixed-bias schemes on extrapolative and hierarchical tasks (e.g., +9.1% accuracy on biomedical time series (Irani et al., 18 Sep 2025), higher rendering quality and speed for visual primitives (Zhang et al., 18 Aug 2025), lower perplexity at scale in language modeling (Oka et al., 4 Feb 2025)).
  • Spatial and frequency localization: Robustness to non-stationary or locally-structured signals, improved shift-invariance (DT-CWT, amplitude–phase decomposition).
  • Multiresolution control: Adaptable window sizes and context sensitivity, controlled via scale parameterization.

Implementation complexity can increase: choosing and learning adaptive wavelet bases, initializing at signal singularities or WMM points, and handling inverse transforms and scale alignment all require careful basis selection, initialization strategies, and efficient DWT/IDWT routines.

A plausible implication is that wavelet-based positional encodings will become increasingly central as models are scaled to operate on longer contexts, highly structured graphs, or multi-modal domains where both absolute and hierarchical position must be encoded robustly.

6. Application-Specific Design and Algorithmic Recipes

Successful deployment of wavelet-based positional representations hinges on task-specific calibration:

  • INRs and implicit functions: Use split architectures for low-pass (scaling) and high-pass (complex wavelet) components, initialize wavelet nodes at WMM/edges for rapid convergence, preserve analytic or polynomial nonlinearity in complex-valued layers (Roddenberry et al., 2023).
  • Transformer and sequence models: Select the mother wavelet type (Daubechies, symlet, or Coiflet for fixed bases; Ricker, Gaussian, or Morlet for adaptive ones), control the number of levels ($J$) according to sequence length, and adjust for implementation caveats in batch and sequence alignment (Zhuang et al., 2022).
  • Graph learning: Employ permutation-equivariant, scale-stacked contraction networks atop heat kernel or other spectral wavelets, balance number of scales and contraction depth for performance and efficiency (Ngo et al., 2023).
  • Position coding: For discrete arrays, invertible binary wavelet transforms efficiently provide unique positional identification with $O(1)$ per-block complexity (0706.0869).

Empirical ablations confirm that multiple scales and bands are essential—single-scale or global-only approaches (Fourier, RoPE as Haar transform) are markedly inferior for tasks involving non-stationary, local detail, or context extrapolation.

7. Connections, Generalizations, and Future Directions

Wavelet-based positional representations unify time/frequency-space embedding, shift-invariance, and hierarchical feature support within a common algebraic and computational framework. They generalize and subsume many existing positional encoding schemes, and find applications ranging from explicit position coding (0706.0869), through graph wavelets (Ngo et al., 2023), to continuous visual and neural signal representation (Roddenberry et al., 2023, Zhang et al., 18 Aug 2025).

Potential directions include: extending adaptivity through fully learnable and signal-responsive wavelet dictionaries, deeper theoretical integration with equivariant and spectral graph architectures, low-latency hardware realization for real-time rendering, and interpretability analyses via explicit amplitude–phase (fractional Hilbert transform) decompositions.

Wavelet-based positional representations thus constitute a mathematically principled, empirically validated, and highly versatile toolkit for encoding position in machine learning and signal-processing systems, with demonstrable advantages in spatial–spectral localization, extrapolation, and robustness.
