PRISM: Phase-Resonant Intelligent Spectral Model
- PRISM is a neural sequence model that uses complex harmonic embeddings to decouple semantic memory from reasoning, enabling efficient global context modeling.
- It replaces quadratic self-attention with linearithmic Gated Harmonic Convolutions using FFT-based spectral filtering for lower computational complexity.
- Experimental results on WMT14 demonstrate near-lossless plasticity with rapid adaptation and minimal catastrophic forgetting when novel mappings are introduced.
The Phase-Resonant Intelligent Spectral Model (PRISM) is a neural sequence model that introduces a complex-domain harmonic embedding for token representations and replaces standard quadratic self-attention with linearithmic Gated Harmonic Convolutions (GHC). PRISM is designed to structurally decouple semantic memory from reasoning, addressing the plasticity–stability dilemma endemic to transformer architectures. By encoding semantic identity as resonant frequencies in complex space and operating entirely in the frequency domain, PRISM achieves efficient global context modeling and demonstrates superior adaptation to novel concepts with minimal catastrophic forgetting (Yıldırım et al., 1 Dec 2025).
1. Harmonic Embedding of Semantic Identity
In PRISM, tokens are embedded as oscillators in $\mathbb{C}^d$, not as real-valued static vectors in $\mathbb{R}^d$. Each token's complex harmonic embedding is given by

$$\Psi_t(p) = a_t \odot e^{i\omega p},$$

where $a_t \in \mathbb{R}^d$ is a learnable amplitude vector specific to each token $t$, $\omega \in \mathbb{R}^d$ is a fixed frequency vector (geometrically spaced across the embedding dimensions), $p$ is the token’s position, and “$\odot$” denotes element-wise multiplication. For the $k$-th dimension:

$$\Psi_{t,k}(p) = a_{t,k}\, e^{i\omega_k p}.$$

This encoding ensures shift-invariance; the relative displacement $\Delta$ between positions appears as a phase difference:

$$\Psi_{t,k}(p+\Delta) = \Psi_{t,k}(p)\, e^{i\omega_k \Delta}.$$
No additional positional encodings or learned biases are required, and semantic identity is innately tied to resonant frequency.
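As a concrete illustration, the following is a minimal PyTorch-style sketch of the harmonic embedding and its shift-invariance property; the function name, geometric frequency schedule, and tensor shapes are illustrative assumptions rather than the paper's reference implementation.

```python
import torch

def harmonic_embedding(amplitudes, freqs, positions):
    """Complex harmonic embedding Psi_t(p) = a_t ⊙ exp(i·omega·p).

    amplitudes: (n, d) real amplitude vectors a_t for each token in the sequence
    freqs:      (d,)   fixed frequency vector omega (geometrically spaced)
    positions:  (n,)   integer positions p of the tokens
    returns:    (n, d) complex embeddings
    """
    phase = positions.to(torch.float32)[:, None] * freqs[None, :]  # (n, d) phases omega_k * p
    return amplitudes * torch.exp(1j * phase)

d, vocab, n = 64, 1000, 8
freqs = 1.0 / (10_000.0 ** (torch.arange(d, dtype=torch.float32) / d))  # geometric spacing (assumed schedule)
amps = torch.rand(vocab, d)                      # learnable per-token amplitudes
token_ids = torch.randint(0, vocab, (n,))
pos = torch.arange(n)

emb = harmonic_embedding(amps[token_ids], freqs, pos)

# Shift invariance: displacing every position by Delta multiplies each
# dimension k by exp(i * omega_k * Delta) -- a pure phase difference.
delta = 3
shifted = harmonic_embedding(amps[token_ids], freqs, pos + delta)
assert torch.allclose(shifted, emb * torch.exp(1j * freqs * delta), atol=1e-5)
```

The final assertion checks the phase-difference property stated above: shifting all positions by $\Delta$ only multiplies each dimension by $e^{i\omega_k \Delta}$.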
2. Gated Harmonic Convolutions and Spectral Processing
PRISM replaces explicit attention with Gated Harmonic Convolutions (GHC), achieving $O(n \log n)$ complexity via Fast Fourier Transform (FFT)-based convolutions. The GHC pipeline comprises:
- ModReLU Activation: To preserve phase, PRISM utilizes ModReLU, defined for a complex input $z$ as
$$\mathrm{ModReLU}(z) = \mathrm{ReLU}(|z| + b)\,\frac{z}{|z|},$$
where $b$ is a learned bias, scaling the amplitude without altering the phase.
- Spectral Gating: The complex input $x$ is concatenated into its real and imaginary parts $[\Re(x); \Im(x)]$ and passed through a real linear layer and sigmoid gating. The resulting gate $g = \sigma\!\big(W[\Re(x); \Im(x)]\big)$ modulates the channels: $x \leftarrow g \odot x$.
- Spectral Convolution: A global spectral filter $K$ is applied in Fourier space:
$$y = \mathcal{F}^{-1}\!\big(K \odot \mathcal{F}(x)\big),$$
where $\mathcal{F}$ ($\mathcal{F}^{-1}$) denotes the (inverse) FFT.
Complexity per layer is $O(n \log n \cdot d)$, whereas self-attention is $O(n^2 d)$. This spectral approach provides a global receptive field while remaining computationally efficient.
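A minimal sketch of the three GHC components follows, assuming a PyTorch implementation over (sequence × channel) tensors; the helper names, the gate parameterization, and the numerical-stability epsilon are assumptions of this sketch, not the paper's code.

```python
import torch
import torch.nn.functional as F

def modrelu(z, bias):
    """ModReLU: rescales |z| by ReLU(|z| + b) while keeping the phase z/|z| intact."""
    mag = torch.abs(z)
    return F.relu(mag + bias) * z / (mag + 1e-8)

def spectral_gate(x, w):
    """Sigmoid gate from a real linear layer on [Re(x); Im(x)]; gates the complex channels."""
    g = torch.sigmoid(torch.cat([x.real, x.imag], dim=-1) @ w)        # (n, d) real-valued gate
    return g * x

def spectral_conv(x, kernel):
    """Global convolution as pointwise multiplication with a spectral filter K in Fourier space."""
    return torch.fft.ifft(kernel * torch.fft.fft(x, dim=0), dim=0)    # FFT along the sequence axis

n, d = 128, 64
x = torch.randn(n, d, dtype=torch.cfloat)
w = torch.randn(2 * d, d) / d**0.5
kernel = torch.randn(n, d, dtype=torch.cfloat)

y = modrelu(spectral_conv(spectral_gate(x, w), kernel), bias=torch.zeros(d))
print(y.shape)   # torch.Size([128, 64]); the FFTs give the O(n log n · d) per-layer cost
```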
3. Phase-Resonant Spectral Attention Block
A single PRISM layer, called a PR-Attention (GHC) block, proceeds as follows:
- Harmonic embedding: $x = \Psi_t(p) = a_t \odot e^{i\omega p}$
- Spectral gating: $\tilde{x} = \sigma\!\big(W[\Re(x); \Im(x)]\big) \odot x$
- FFT: $X = \mathcal{F}(\tilde{x})$
- Spectral filtering: $Y = K \odot X$
- Inverse FFT: $y = \mathcal{F}^{-1}(Y)$
- ModReLU nonlinearity: $z = \mathrm{ModReLU}(y)$
- (Optional) Feed-forward and residual: $h = z + \mathrm{FFN}(z)$
This sequence unifies global receptive field modeling with phase-locked harmonic representations, obviating the need for token-pair attention scores.
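Putting the steps together, a hedged sketch of a full PR-Attention (GHC) block might look as follows; how the optional feed-forward is applied to complex channels (here, on concatenated real/imaginary parts) and the kernel initialization are assumptions of this sketch, not details from the paper.

```python
import torch
import torch.nn as nn

class PRAttentionBlock(nn.Module):
    """One PR-Attention (GHC) block: gating -> FFT -> spectral filter -> iFFT -> ModReLU -> FFN."""

    def __init__(self, n_max, d):
        super().__init__()
        self.gate = nn.Linear(2 * d, d)                               # real linear layer on [Re; Im]
        self.kernel = nn.Parameter(torch.randn(n_max, d, dtype=torch.cfloat) * 0.02)
        self.mod_bias = nn.Parameter(torch.zeros(d))
        self.ffn = nn.Sequential(nn.Linear(2 * d, 4 * d), nn.GELU(), nn.Linear(4 * d, 2 * d))

    def forward(self, x):                                             # x: (n, d) complex harmonic embeddings
        n, d = x.shape
        g = torch.sigmoid(self.gate(torch.cat([x.real, x.imag], dim=-1)))
        xg = g * x                                                    # spectral gating
        y = torch.fft.ifft(self.kernel[:n] * torch.fft.fft(xg, dim=0), dim=0)  # global spectral filtering
        mag = torch.abs(y)
        z = torch.relu(mag + self.mod_bias) * y / (mag + 1e-8)        # ModReLU, phase-preserving
        zr = torch.cat([z.real, z.imag], dim=-1)
        h = zr + self.ffn(zr)                                         # feed-forward + residual (assumed real-valued)
        return torch.complex(h[..., :d], h[..., d:])                  # back to complex channels

block = PRAttentionBlock(n_max=512, d=64)
out = block(torch.randn(128, 64, dtype=torch.cfloat))
print(out.shape)   # torch.Size([128, 64])
```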
4. Semantic Alignment Tax and Diffusive Learning
Standard transformers require extensive optimization to organize random initializations into coherent semantic maps, incurring a "Semantic Alignment Tax." This phase is characterized by slow, diffusive gradient processes that impose a fixed geometric barrier not alleviated by increased model depth. The Iterative Semantic Map Refinement (ISMR) protocol isolates this cost:
- Train a vanilla Transformer for a fixed number of steps to obtain semantically aligned token embeddings
- Reset the encoder/decoder weights, retaining only the aligned embeddings
- Resume training from this geometrically pre-aligned initialization
ISMR shows that initialization geometry, not depth, determines the alignment tax. All depths benefit equally (e.g., BLEU 0.06 at 200 steps), and this logarithmic refinement is unobtainable without geometric pre-alignment (Yıldırım et al., 1 Dec 2025). This suggests inherent limitations in Euclidean semantic representations and highlights the need for a non-diffusive encoding.
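A minimal sketch of the ISMR protocol, assuming the model exposes its token-embedding module via get_input_embeddings() (as in the HuggingFace transformers convention); make_fresh_model and train_steps are hypothetical helpers standing in for the actual model constructor and training loop.

```python
import copy

def ismr(make_fresh_model, train_steps):
    """Iterative Semantic Map Refinement (sketch): isolate the embedding-alignment cost.

    make_fresh_model: hypothetical constructor returning a randomly initialized Transformer
    train_steps:      hypothetical helper running a fixed number of training steps in place
    """
    model = make_fresh_model()
    train_steps(model)                                       # phase 1: train until embeddings align

    aligned = copy.deepcopy(model.get_input_embeddings().state_dict())

    fresh = make_fresh_model()                               # phase 2: re-initialize encoder/decoder
    fresh.get_input_embeddings().load_state_dict(aligned)    # ...but keep the aligned embeddings

    train_steps(fresh)                                       # phase 3: resume from pre-aligned geometry
    return fresh
```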
5. Experimental Evaluation: WMT14 and Plasticity–Stability Tradeoff
PRISM was validated on the WMT14 De→En translation benchmark. Both PRISM and baseline Transformers used comparable decoder architectures; PRISM's encoder consisted of GHC layers and untied complex embeddings (95.6M params versus the Transformer's 73.9M).
- Marathon (Generalization):
- At 50K steps, the Transformer reached 23.88 BLEU, while PRISM achieved 21.40 BLEU.
- Sprint (Plasticity–Stability Stress Test):
- Models were injected with 5 novel compound mappings and fine-tuned on 25 samples for 10 steps (learning rate varied per configuration; see table).
- Acquisition = number of correct translations of the injected mappings in context (max 25); ΔBLEU = drop in BLEU on the original WMT14 test set.
Table: Comparative Results on Injection
| Model | Updates | Acquisition (%) | Post-injection BLEU | ΔBLEU |
|---|---|---|---|---|
| Transformer (low LR) | 10 | 12 (3/25) | 10.06 | –13.80 |
| Transformer (high LR) | 5 | 60 (15/25) | 13.31 | –10.55 |
| PRISM | 10 | 96 (24/25) | 21.54 | –0.84 |
Transformers exhibited either “inertia” (failure to acquire) or “collapse” (catastrophic forgetting), while PRISM achieved near-lossless plasticity (96% acquisition with under 1 BLEU point of degradation).
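For concreteness, the Sprint evaluation loop could be sketched as below; model.loss, model.translate, and wmt14_bleu are assumed interfaces, and the learning rate is supplied by the caller since the paper's exact values are not reproduced here.

```python
import torch

def sprint_stress_test(model, injected_pairs, wmt14_bleu, lr, steps=10):
    """Plasticity-stability stress test (sketch).

    injected_pairs: the 25 fine-tuning samples covering the 5 novel compound mappings
    wmt14_bleu:     callable returning BLEU of `model` on the original WMT14 test set
    lr:             caller-supplied learning rate (the paper compares low and high settings)
    model.loss / model.translate are assumed interfaces of the seq2seq model
    """
    bleu_before = wmt14_bleu(model)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):                                   # 10 gradient updates on the injected samples
        opt.zero_grad()
        model.loss(injected_pairs).backward()
        opt.step()

    acquired = sum(model.translate(src) == tgt for src, tgt in injected_pairs)
    return {
        "acquisition": acquired / len(injected_pairs),       # e.g. 24/25 = 96% for PRISM
        "delta_bleu": wmt14_bleu(model) - bleu_before,       # forgetting on the original benchmark
    }
```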
6. Theoretical Decoupling: Memory and Reasoning
In Euclidean architectures, new concepts must move embedding vectors from random starting points to their proper positions, but the low learning rates necessitated by the global reasoning weights slow this process, creating a mismatch with the faster adaptation of the attention weights. This results in catastrophic forgetting, as the attention weights must compensate for the mislocalized embeddings.
In PRISM, semantic identity is encoded by the fixed, mutually orthogonal frequencies $\omega$. Introducing a new concept $t_{\mathrm{new}}$ requires only updating its amplitude $a_{t_{\mathrm{new}}}$ in

$$\Psi_{t_{\mathrm{new}}}(p) = a_{t_{\mathrm{new}}} \odot e^{i\omega p},$$

with all frequency coordinates orthogonal and phases fixed. The formal result is that updates to the spectral kernel $K$ (the reasoning parameters) remain negligible, and cross-terms between the new concept and existing ones vanish by orthogonality. This framework yields instantaneous map alignment, stable reasoning weights, and perfect decoupling of memory and reasoning.
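The decoupling argument can be illustrated numerically: because a token's identity enters only through its own amplitude vector, a gradient step on a newly injected concept's amplitude cannot perturb a held-fixed spectral kernel or any other token's representation. The sketch below uses illustrative shapes and a frozen kernel; it is a toy demonstration, not the paper's training setup.

```python
import torch

# Frozen "reasoning" parameters and fixed frequencies; only amplitudes act as memory.
d, n = 16, 32
freqs = 1.0 / (10_000.0 ** (torch.arange(d, dtype=torch.float32) / d))
pos = torch.arange(n, dtype=torch.float32)
kernel = torch.randn(n, d, dtype=torch.cfloat)

def forward(amp):
    """One token repeated across the sequence: harmonic embedding -> spectral filter."""
    emb = amp * torch.exp(1j * (pos[:, None] * freqs))       # Psi(p) = a ⊙ exp(i·omega·p)
    return torch.fft.ifft(kernel * torch.fft.fft(emb, dim=0), dim=0)

a_old = torch.rand(d)                                        # existing concept (left untouched)
a_new = torch.rand(d, requires_grad=True)                    # newly injected concept
out_old_before = forward(a_old)

# "Acquire" the new concept via a gradient step on its amplitude only
# (toy stand-in objective for learning the new mapping).
toy_loss = forward(a_new).abs().mean()
toy_loss.backward()
with torch.no_grad():
    a_new -= 0.1 * a_new.grad

# The kernel and every other token's representation are unchanged.
assert torch.allclose(forward(a_old), out_old_before)
```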
7. Implications and Related Work
PRISM offers a structural resolution to the plasticity–stability dilemma by separating knowledge storage (memory) from logical composition (reasoning). Harmonic embeddings and global spectral filtering provide an alternative inductive bias to self-attention, contrasting with rotary positional encoding schemes (Yıldırım et al., 1 Dec 2025). The results suggest that certain forms of catastrophic rigidity in traditional transformers are rooted in the limitations of diffusive, Euclidean learning and that spectral representations in are promising for scenarios demanding rapid adaptation without global parameter reconfiguration. A plausible implication is that phase-resonant encoding could inform neural models beyond NLP, including domains requiring continual incremental learning.