
Normalized Base-2 Encoding (NB2E)

Updated 13 December 2025
  • Normalized Base-2 Encoding (NB2E) is a deterministic method that converts continuous scalar values into fixed-length binary vectors via hierarchical dyadic step functions.
  • NB2E outperforms traditional Fourier and raw encodings by preserving phase and amplitude during extrapolation beyond the training domain.
  • Its implementation leverages normalized inputs and bit-level precision, enabling standard MLPs to reliably extend periodic function modeling outside observed ranges.

Normalized Base-2 Encoding (NB2E) is a deterministic method for encoding continuous scalar values as fixed-length binary vectors, designed to enable vanilla multi-layer perceptrons (MLPs) to extrapolate periodic functions well beyond their training range without requiring prior knowledge of functional form. NB2E leverages hierarchical, power-of-two “step” basis functions and bit-wise positional encoding—transforming real scalars into bit vectors whose structure aligns with the phase of periodic signals. This methodology has demonstrated robust out-of-distribution extrapolation on a diverse set of periodic functions, outperforming established encodings such as Fourier features in terms of phase and amplitude preservation outside the training interval (Powell et al., 11 Dec 2025).

1. Mathematical Formulation

Let $x$ denote a real scalar input. NB2E first normalizes $x$ by a positive constant $z$:

$$x' = \frac{x}{z}, \qquad x' \in [0,1)$$

Given an integer precision parameter $N$, the encoding $\gamma(x)$ is a length-$N$ binary vector $[B_1, B_2, \dots, B_N]$, where

$$x' = \sum_{i=1}^{N} B_i 2^{-i} \qquad\Longleftrightarrow\qquad B_i = \left\lfloor 2^i x' \right\rfloor - 2 \left\lfloor 2^{i-1} x' \right\rfloor, \quad i = 1, \ldots, N$$

As $N \to \infty$, NB2E encodes every point in the unit interval exactly; practical use cases employ $N = 32$–$48$ for double-precision resolution. Each $B_i \in \{0,1\}$ is the $i$-th bit in the binary expansion of $x'$, with bit $B_i$ switching value at dyadic intervals of width $2^{-i}$.
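
As a concrete illustration, here is a minimal scalar implementation of this bit extraction in Python; the function and argument names (`nb2e_bits`, `z`, `n_bits`) are illustrative, not taken from the paper.

```python
import math

def nb2e_bits(x: float, z: float, n_bits: int = 32) -> list:
    """Return the first n_bits bits of the binary expansion of x' = x / z."""
    x_prime = x / z  # normalize into [0, 1); assumes 0 <= x < z
    return [math.floor(2**i * x_prime) - 2 * math.floor(2**(i - 1) * x_prime)
            for i in range(1, n_bits + 1)]

# x' = 0.625 = 0.101 in binary, so the leading bits are 1, 0, 1, then zeros.
print(nb2e_bits(6.25, 10.0, n_bits=8))  # [1, 0, 1, 0, 0, 0, 0, 0]
```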

2. Motivations and Theoretical Rationale

NB2E is motivated by limitations in traditional positional and Fourier encodings. Classical approaches such as

$$\gamma_{\rm FFE}(x) = [\sin(2\pi x),\, \cos(2\pi x),\, \sin(4\pi x),\, \dots]$$

decompose $x$ across continuous frequencies but fail to support extrapolation of unknown periodicities outside the training window. In contrast, NB2E provides a hierarchical, dyadic decomposition in which each bit $B_i$ is a step function alternating on dyadic intervals of width $2^{-i}$, localizing phase information at all scales. The normalization to $[0,1)$ is necessary to maintain consistent bit semantics across the domain and to ensure each bit attains both values during training. This structure enables neural networks to recognize and generalize repeating bit patterns far beyond the observed data, as the sketch below illustrates.
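
A small demonstration of this dyadic step structure (illustrative, not from the paper): bit $i$, computed as $\lfloor 2^i x' \rfloor \bmod 2$, which is equivalent to the floor-difference formula above, alternates on intervals of width $2^{-i}$ and repeats its pattern in every window of width $2^{-(i-1)}$.

```python
import math

def bit(i: int, x_prime: float) -> int:
    # floor(2^i x') - 2 floor(2^(i-1) x') equals floor(2^i x') mod 2
    return math.floor(2**i * x_prime) % 2

for i in (1, 2, 3):
    print(f"bit {i}:", [bit(i, k / 16) for k in range(16)])
# bit 1: eight 0s then eight 1s          (switches once, at x' = 0.5)
# bit 2: 0,0,0,0,1,1,1,1 repeated twice  (switches every 0.25)
# bit 3: 0,0,1,1 repeated four times     (switches every 0.125)
```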

3. Algorithmic Description and Implementation

The NB2E encoding algorithm is as follows. Given input $(x, z, N)$:

  1. Compute $x' \leftarrow x / z$
  2. For each $i = 1$ to $N$:
    • $u_i \leftarrow \left\lfloor 2^i x' \right\rfloor$
    • $u_{i-1} \leftarrow \left\lfloor 2^{i-1} x' \right\rfloor$
    • $B_i \leftarrow u_i - 2u_{i-1}$
  3. Return $\gamma(x) = [B_1, \ldots, B_N]$

This process is compactly expressed as

$$\gamma(x)_i = \left\lfloor 2^i (x/z) \right\rfloor - 2 \left\lfloor 2^{i-1} (x/z) \right\rfloor, \quad i = 1, \ldots, N$$
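
A vectorized NumPy sketch of this compact form (an implementation choice; the paper does not prescribe a library):

```python
import numpy as np

def nb2e(x, z: float, n_bits: int = 32) -> np.ndarray:
    """Map an array of scalars to a (len(x), n_bits) binary feature matrix."""
    i = np.arange(1, n_bits + 1)                        # bit indices 1..N
    x_prime = (np.asarray(x, dtype=np.float64) / z)[:, None]
    return np.floor(x_prime * 2.0**i) - 2.0 * np.floor(x_prime * 2.0**(i - 1))

print(nb2e([6.25], z=10.0, n_bits=8)[0])  # x' = 0.625 -> [1. 0. 1. 0. 0. 0. 0. 0.]
```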

4. Empirical Behavior and Quantitative Results

NB2E has been evaluated on a suite of periodic functions:

  • $f(x) = \sin(x)$
  • $f(x) = \sin(x) + 2.5\sin(1.4x + 3.7)$
  • $f(x) = 2\left(\frac{x}{3.1} - \left\lfloor \frac{x}{3.1} + 0.5 \right\rfloor\right) + 4\left|\frac{x}{5} - \left\lfloor \frac{x}{5} + 0.5 \right\rfloor\right| - 1$ (a sawtooth plus triangle composite; see the code sketch after this list)
  • A composite of beats, exponential decay, and square wave (see Appendix A of (Powell et al., 11 Dec 2025))
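
The first three benchmark targets above, written out in NumPy for reference (the fourth composite is defined in Appendix A of the paper and omitted here):

```python
import numpy as np

def f1(x):
    return np.sin(x)

def f2(x):
    return np.sin(x) + 2.5 * np.sin(1.4 * x + 3.7)

def f3(x):
    # sawtooth of period 3.1 plus triangle wave of period 5, shifted down by 1
    saw = x / 3.1 - np.floor(x / 3.1 + 0.5)
    tri = np.abs(x / 5 - np.floor(x / 5 + 0.5))
    return 2 * saw + 4 * tri - 1
```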

Experimental setup: $10^4$ i.i.d. samples $x \sim \mathcal{U}[0, X_{\max}]$, with training on $x' \le 0.7$ (normalized) and testing on $0.7 < x' < 1$ (pure extrapolation); a sketch of this split follows the results table. Several encoding/architecture baselines were assessed:

| Input Encoding | In-Domain MAE | Extrapolation MAE | Behavior Outside Training |
|---|---|---|---|
| Raw $x'$ | $\gg 1$ | N/A | Fails on both train and test |
| Fixed Fourier (FFE) | $\sim 10^{-3}$ | $\sim 50\%$ drift | Loses phase and amplitude alignment |
| NB2E | $\sim 10^{-3}$ | $10^{-3}$–$10^{-2}$ | Preserves phase and amplitude |
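
A sketch of the sampling and split described above, reusing the `nb2e` helper from Section 3; the seed, $X_{\max}$ value, and choice of target are illustrative assumptions.

```python
import numpy as np

def nb2e(x, z, n_bits=32):  # as sketched in Section 3
    i = np.arange(1, n_bits + 1)
    x_prime = (np.asarray(x, dtype=np.float64) / z)[:, None]
    return np.floor(x_prime * 2.0**i) - 2.0 * np.floor(x_prime * 2.0**(i - 1))

rng = np.random.default_rng(0)
X_MAX = 20 * np.pi                            # assumed span covering several periods
x = rng.uniform(0.0, X_MAX, size=10_000)      # 10^4 i.i.d. samples on [0, X_max]
x_prime = x / X_MAX                           # normalized with z = X_max

f = np.sin                                    # one of the benchmark targets
train, test = x_prime <= 0.7, x_prime > 0.7   # pure extrapolation past 0.7
X_train, y_train = nb2e(x[train], X_MAX), f(x[train])
X_test, y_test = nb2e(x[test], X_MAX), f(x[test])
```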

NB2E-MLP configurations use five dense hidden layers (each with 512 ELU units and $L_2$ regularization $\lambda = 10^{-4}$), the AdamW optimizer with cosine annealing, batch size $1000$, and $4000$ epochs. NB2E MLPs maintained signal period and amplitude well beyond the training interval, where other encodings drifted significantly or failed outright.
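
A PyTorch sketch matching the quoted configuration; the learning rate is an assumption, and AdamW's decoupled weight decay stands in for the paper's $L_2$ term.

```python
import torch
import torch.nn as nn

n_bits = 32
layers, in_dim = [], n_bits
for _ in range(5):                              # five hidden layers, 512 ELU units each
    layers += [nn.Linear(in_dim, 512), nn.ELU()]
    in_dim = 512
layers.append(nn.Linear(512, 1))                # scalar regression head
model = nn.Sequential(*layers)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3,  # lr assumed
                              weight_decay=1e-4)            # lambda = 1e-4
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=4000)
loss_fn = nn.L1Loss()                           # MAE, matching the reported metric
```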

5. Internal Representation and Bit-Phase Dynamics

Analysis of internal activations was conducted by recording the 512-dimensional hidden representations of the NB2E-MLP over a full period of $\sin(x)$. Dimensionality reduction via UMAP followed by DBSCAN clustering revealed that hidden representations form clusters corresponding to both:

  • The phase of the periodic target function.
  • Local bit patterns (e.g., “bits 5–7”).

Each cluster spans a narrow phase interval but is distributed over the full positional axis; beyond $x' > 0.7$, the model continues to assign samples to the correct bit-phase clusters, reconstructing $\sin(x)$ in proper phase. For complex composite signals, the network’s representations form higher-dimensional “chart” embeddings, effectively factorizing phase and bit pattern for each periodic component.
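
A sketch of such an analysis using the umap-learn and scikit-learn packages; this is not the authors' exact pipeline, and the activation file and clustering parameters are placeholders.

```python
import numpy as np
import umap                           # from the umap-learn package
from sklearn.cluster import DBSCAN

# hidden: (n_samples, 512) activations of the last hidden layer, recorded over
# one period of sin(x); capturing them (e.g. via a forward hook) is omitted.
hidden = np.load("hidden_activations.npy")  # placeholder path

embedding = umap.UMAP(n_components=2, random_state=0).fit_transform(hidden)
labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(embedding)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)  # -1 marks noise
print(f"{n_clusters} bit-phase clusters found")
```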

6. Limitations and Extensions

Key limitations:

  • Training must encompass several complete periods (at least 5–7 cycles) for robust learning of bit-phase dynamics.
  • The greatest normalized training input, $x'_{\max}$, must exceed $0.5$ (for bit 1), $0.75$ (for bit 2), and so on; otherwise some bits never switch during training, producing phase resets at unseen dyadic boundaries (see the check after this list).
  • Extrapolation is constrained to dyadic boundaries up to $x' < 1$; structure outside this interval is not guaranteed.
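
The bit-coverage condition in the second limitation can be checked directly; a minimal illustration (not from the paper):

```python
import math

def bit(i: int, x_prime: float) -> int:
    return math.floor(2**i * x_prime) % 2

# With training confined to x' < 0.5, bit 1 is constantly 0: the network
# never observes it switch, so the 0 -> 1 transition at x' = 0.5 is unseen.
train_xs = [k / 100 for k in range(50)]            # x' in [0, 0.5)
print({bit(1, xp) for xp in train_xs})             # {0}
print({bit(1, xp) for xp in train_xs + [0.6]})     # {0, 1} once x'_max > 0.5
```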

Potential extensions identified include:

  • Learning relative bit-differences (similar to relative positional encodings).
  • Integrating NB2E into sequence modeling architectures (e.g., RNNs, transformers), especially for irregular time series.
  • Extending to multi-dimensional input spaces via interleaved binary grids.

7. Comparison to Other Encodings and Model Classes

NB2E differs from other input encodings in several key respects:

  • Fourier encodings provide the same power-of-two frequency scales but, being continuous, lack the discrete positional landmarks needed for out-of-domain phase matching.
  • Random Fourier features create kernel approximations but are not suited for deterministic phase extrapolation.
  • Symbolic regression and physics-informed neural networks (PINNs) demand either known functional forms or explicit structure search.
  • SIREN (sinusoidal activations) can be combined with NB2E, a combination yielding lower mean absolute error (MAE $\sim 10^{-4}$) in preliminary testing, though comprehensive analysis is pending.

NB2E’s transformation of scalars into a structured binary hierarchy allows standard MLPs to “factor” phase information, enabling interpolation and projection beyond the observed domain. In contrast, continuous and Fourier encodings do not induce the necessary discrete synchrony for extrapolatory phase alignment (Powell et al., 11 Dec 2025).

References

  1. Powell et al., 11 Dec 2025.
