Learned Modular Addition Circuits
- Learned modular addition circuits are neural architectures that compute (x+y) mod p using Fourier techniques and explicit geometric embeddings.
- They implement exact modular arithmetic via closed-form MLP solutions, sinusoidal activations, and attention mechanisms that align to discrete frequencies.
- Training these networks reveals grokking dynamics and emergent sparse Fourier features, validated by metrics like Fourier IPR and topological analyses.
A learned modular addition circuit is a neural architecture—typically a multi-layer perceptron (MLP), Transformer, or recurrent model—that computes $(x + y) \bmod p$ for integers $x, y \in \mathbb{Z}_p$, with weights directly inferred through optimization or analytically constructed to guarantee exact computation. Modular addition serves as the canonical benchmark for mechanistically interpretable circuits in modern deep learning, offering a controlled context to study solution structure, training dynamics (including grokking), expressivity, generalization, and phase transitions between representational regimes.
1. Exact Analytic Modular Addition Circuits in MLPs
The modular addition function admits a family of closed-form two-layer MLP solutions that generalize perfectly for any modulus $p$. The standard architecture encodes the input pair $(x, y)$ as concatenated one-hot vectors in $\mathbb{R}^{2p}$, passes them through a hidden layer of width $N$, applies a pointwise activation, computes logits via a linear readout layer $W^{(2)}$, and predicts the class $z$ with the maximal logit.
With first-layer weights that give each hidden unit $k$ (for a quadratic activation) the pre-activation $\cos\!\big(\tfrac{2\pi \sigma(k)}{p} x + \varphi_k\big) + \cos\!\big(\tfrac{2\pi \sigma(k)}{p} y + \varphi_k\big)$ and readout weights $W^{(2)}_{z,k} \propto \cos\!\big(\tfrac{2\pi \sigma(k)}{p} z + 2\varphi_k\big)$, where $\sigma$ is a permutation of the frequencies $\{1, \dots, p-1\}$ and the $\varphi_k$ are i.i.d. random phases, the network implements exact modular addition in the infinite-width limit. The forward computation collapses by random-phase averaging to a Kronecker delta $\delta_{z,\,(x+y) \bmod p}$, correctly selecting the output class (Doshi et al., 2024).
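The collapse to a Kronecker delta can be checked numerically. The sketch below is a minimal idealization: instead of averaging over random phases at large width, it sums the frequency filters $\cos\!\big(\tfrac{2\pi f}{p}(x+y-z)\big)$ over all $p$ frequencies, for which the delta is exact; the small modulus is purely illustrative.

```python
import numpy as np

p = 7  # modulus (small, for illustration only)

def logits(x, y):
    # One "neuron" per frequency f computes cos(2*pi*f*(x+y)/p); the readout
    # correlates against cos(2*pi*f*z/p). Summing over all frequencies gives
    # sum_f cos(2*pi*f*(x+y-z)/p) = p * delta[z == (x+y) mod p].
    f = np.arange(p)
    z = np.arange(p)
    phase = 2 * np.pi * np.outer(f, (x + y) - z) / p   # shape (p, p)
    return np.cos(phase).sum(axis=0)                    # shape (p,)

for x in range(p):
    for y in range(p):
        out = logits(x, y)
        assert np.argmax(out) == (x + y) % p
        assert np.isclose(out.max(), p)  # exact Kronecker-delta peak
```

The geometric-series identity $\sum_{f=0}^{p-1} \cos\!\big(\tfrac{2\pi f m}{p}\big) = p\,\delta_{m \equiv 0 \pmod p}$ is what makes the peak exact here; random phases at finite width approximate the same cancellation.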
Empirical experiments validate that, under strong regularization and partial data exposure, real networks trained on this task recover the analytic (Fourier) solution structure after a prolonged grokking phase, as measured by a sharp rise of the Fourier inverse participation ratio (IPR) from near zero to nearly one at test accuracy onset.
2. Fourier Circuits: Universality Across Architectures
Fourier-based representations underlie all observed general solutions to modular addition across MLPs, Transformers, and RNNs. Each input $x$ is mapped, via learned or explicit embeddings, to the corresponding point $e^{2\pi i f x / p}$ on the complex unit circle for a small set of dominant frequencies $f$. Linear or bilinear operations propagate these features so that the circuit computes, across several nonzero frequencies $f$,
$$\mathrm{logit}(z) \;\propto\; \sum_{f} \cos\!\Big(\tfrac{2\pi f}{p}\,(x + y - z)\Big)$$
for each candidate output $z$, selecting the $z$ for which this sum is maximized. This renders modular addition a discrete Fourier filtering operation, with neurons or attention heads sharply aligned to specific frequencies (Li et al., 2024, Furuta et al., 2024, Rangamani, 28 Mar 2025).
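The frequency-filtering readout can be verified directly. The sketch below uses a single illustrative frequency $f$ coprime to $p$ (both values chosen arbitrarily), showing that even one dominant mode suffices for correct argmax decoding:

```python
import numpy as np

p = 97   # prime modulus (illustrative)
f = 5    # single dominant frequency; gcd(f, p) = 1

def logit(x, y, z):
    # Frequency-f filter: equal to 1 exactly when z == (x+y) mod p, because
    # f*(x+y-z) is a multiple of p only at the correct class when gcd(f,p)=1.
    return np.cos(2 * np.pi * f * (x + y - z) / p)

x, y = 12, 90
scores = np.array([logit(x, y, z) for z in range(p)])
assert scores.argmax() == (x + y) % p
```

With $\gcd(f, p) = 1$ the map $z \mapsto f(x+y-z) \bmod p$ is a bijection, so the cosine attains its maximum of 1 at a unique class; trained circuits typically sum a few such filters, which sharpens the peak.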
In width-bounded settings, MLPs with polynomial activations and sufficient width (growing with the number of input summands) provably attain the maximum margin and exactly implement this Fourier circuit. Analogous constructions and empirical observations hold for one-layer Transformers, whose attention matrices and projection weights align to discrete Fourier structure, and for RNNs, where only a handful of singular frequencies dominate the low-rank solution subspace.
3. Geometric and Topological Structure of Learned Representations
Learned modular addition circuits operate by embedding each input as a point in a learned geometric space (e.g., a plane or a torus), with two universal configuration types:
- Grid/lattice: Embeddings organize so that the image of the pair $(x, y)$, for $x, y \in \mathbb{Z}_p$, fills a two-dimensional grid; each region is class-specific and separable via linear decision boundaries ("clock" regime).
- Circle/toroidal: Embeddings form a ring so that each input $x$ points radially outward at angle $2\pi x / p$; decoding is performed via "pizza-slice" ReLU or periodic functions ("pizza" regime).
Topology analysis confirms these regimes: grid and pizza circuits correspond to 2D discs or tori in the representation manifold, all homeomorphic under linear mappings. Betti number computation (persistent homology) and principal component concentration provide robust empirical confirmation that these geometries capture the full variability post-training, with disc/tori projecting to circles at deeper network layers (Moisescu-Pareja et al., 31 Dec 2025, Musat, 2024).
Gradient descent, with or without explicit attention bias, almost invariably drives networks to collapse these geometric varieties to low-dimensional, discriminative manifolds. Uniform and trainable attention architectures empirically converge to the same underlying solution manifold, regardless of the presence or absence of explicit phase correlation.
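A minimal sketch of decoding on the circular embedding (the embedding and the angle-rounding rule are an idealization of the "clock" mechanism, not weights extracted from a trained network):

```python
import numpy as np

p = 59  # illustrative modulus

# Circular embedding: residue x sits at angle 2*pi*x/p on the unit circle.
angles = 2 * np.pi * np.arange(p) / p
emb = np.stack([np.cos(angles), np.sin(angles)], axis=1)

def add_on_circle(x, y):
    # Recover each point's angle, add the angles (addition of residues
    # becomes addition on the circle), and round back to the nearest class.
    theta = np.arctan2(emb[x, 1], emb[x, 0]) + np.arctan2(emb[y, 1], emb[y, 0])
    return int(np.round(theta * p / (2 * np.pi))) % p

assert all(add_on_circle(x, y) == (x + y) % p
           for x in range(p) for y in range(p))
```

Projecting a learned grid or torus representation onto its leading principal components typically recovers exactly this circle, which is why the two regimes admit a common topological description.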
4. Training Dynamics and Grokking
The emergence of modular addition circuits in real neural networks is governed by empirical grokking dynamics. Early in training, networks memorize the training set without true generalization. After an extended plateau, they abruptly transition to perfect held-out accuracy while the Fourier IPR metric climbs sharply, indicating that periodic solution structure has been discovered (Doshi et al., 2024, Furuta et al., 2024).
Weight decay is an essential regularizer: it prevents trivial overfitting, promotes low-norm, periodic weights, and, in geometric models, ensures that embeddings arrange smoothly into grid or circle attractors. Optimal test accuracy requires the hidden dimension to scale with the number of input summands; the minimum width grows exponentially with input arity, owing to the need to cancel all non-aligning cross-terms in the expanded activation.
Only partial dataset exposure reliably reveals grokking and circuit formation; exposure to all cases typically yields immediate interpolation without formation of universal geometric solutions.
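A minimal sketch of the partial-exposure split used in such experiments (the train fraction and seed here are illustrative choices, not values from the cited studies):

```python
import numpy as np

p = 97
train_frac = 0.4   # illustrative; grokking is studied at fractions well below 1
rng = np.random.default_rng(0)

# All p^2 triples (x, y, (x+y) mod p), followed by a random partial split.
pairs = np.array([(x, y, (x + y) % p) for x in range(p) for y in range(p)])
perm = rng.permutation(len(pairs))
n_train = int(train_frac * len(pairs))
train, test = pairs[perm[:n_train]], pairs[perm[n_train:]]

assert len(train) + len(test) == p * p
assert len(set(map(tuple, train)) & set(map(tuple, test))) == 0  # disjoint
```

Training on `train` with weight decay and evaluating on the held-out `test` set is the setting in which the delayed generalization transition is observed.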
5. Activation Functions and Circuit Expressivity
Activation choice induces a fundamental separation in the efficiency of learnable modular addition circuits. Two-layer MLPs with sinusoidal activation (sine networks) admit a width-2 exact realization for fixed input length and, with a bias term, a width-2 solution for all input lengths. Sine networks thus achieve population accuracy 1 by encoding the modular sum as an angle and predicting via a cosine-majorization output layer. All weights and intermediate computations remain in a size-independent, periodic subspace.
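A sketch of such a width-2 sine network for two summands at frequency 1 (an illustrative construction under the angle-encoding idea, not the paper's exact parameterization; the phase-shift bias turns one sine unit into a cosine unit):

```python
import numpy as np

p = 31  # illustrative modulus
theta = 2 * np.pi / p

def sine_net(x, y):
    # Hidden layer: exactly two sinusoidal units on the linear form x + y.
    h1 = np.sin(theta * (x + y))               # sine component
    h2 = np.sin(theta * (x + y) + np.pi / 2)   # cosine component via bias
    # Readout: logit_z = cos(theta*(x+y-z)), expanded by the angle-sum
    # identity, so it is linear in (h1, h2).
    z = np.arange(p)
    return h2 * np.cos(theta * z) + h1 * np.sin(theta * z)

assert all(np.argmax(sine_net(x, y)) == (x + y) % p
           for x in range(p) for y in range(p))
```

The logit $\cos(\theta(x+y-z))$ attains its maximum of 1 only at $z \equiv x + y \pmod p$, so the argmax is exact at width 2, independent of $p$.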
In contrast, ReLU MLPs provably require width scaling linearly with the input length and cannot, for fixed width, interpolate across inputs of two lengths that are incongruent modulo $p$. Sample complexity for sine MLPs is optimal up to logarithmic factors, whereas ReLU sample complexity and generalization margins degrade exponentially with input length (Huang et al., 28 Nov 2025).
Transformer variants equipped with sine activations in their feedforward sublayers outperform standard activations in both sample efficiency and out-of-domain length generalization.
6. Universality and Limitations
All studied neural architectures—MLPs, RNNs, Transformers—converge to equivalent modular addition circuits governed by Fourier structure. The computation decomposes into: input embedding on the complex circle; summation or combination via trigonometric identities; and readout via frequency filtering and maximization. Frequency sparsity is universal, with pretrained or trained solutions assigning almost all spectral energy to a small subset of nonzero frequencies determined by the problem instance (Li et al., 2024, Rangamani, 28 Mar 2025).
Limitations arise in the scalability of these circuits to more general modular polynomials: the hidden dimension must increase exponentially with the number of summands or degree of the modular polynomial, and random phase approximations become imperfect for small model sizes, degrading test accuracy (Doshi et al., 2024). Some modular polynomials are empirically unlearnable by neural architectures due to limits in circuit expressivity and the complexity of the underlying modular operation (Doshi et al., 2024, Furuta et al., 2024).
7. Empirical Verification and Measurement
Discovery and analysis of learned modular addition circuits employ a suite of quantitative metrics:
- Fourier Inverse Participation Ratio (IPR) quantifies the alignment of weight spectra to pure Fourier modes; analytic solutions produce IPR $\approx 1$.
- Fourier Frequency Sparsity (FFS) and Fourier Coefficient Ratio (FCR) track solution purity during grokking, dropping rapidly and stabilizing as only a few dominant frequencies survive.
- Betti numbers and principal component analysis provide geometric and topological signatures of the underlying representation manifold.
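One common way to compute an IPR-style metric from a weight row's spectrum is sketched below (the normalization over nonnegative frequencies is an assumption; the cited works may define the metric slightly differently):

```python
import numpy as np

def fourier_ipr(w):
    # Power spectrum over nonnegative frequencies (rfft of a real vector;
    # negative frequencies are redundant conjugates), normalized to sum to 1.
    power = np.abs(np.fft.rfft(w)) ** 2
    power /= power.sum()
    # IPR = sum of squared spectral weights: ~1 for a pure Fourier mode,
    # ~1/(number of bins) for a spectrally flat vector.
    return float((power ** 2).sum())

p = 97
pure = np.cos(2 * np.pi * 5 * np.arange(p) / p)        # single Fourier mode
noisy = np.random.default_rng(1).standard_normal(p)    # spectrally spread

assert fourier_ipr(pure) > 0.99
assert fourier_ipr(noisy) < 0.2
```

Tracking this quantity per hidden unit over training is how the sharp rise of IPR at the grokking transition is measured.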
Empirical protocols validate that ablation of dominant frequencies in RNN or Transformer weights causes monotonic collapse in accuracy, establishing the indispensable role of sparse Fourier features (Rangamani, 28 Mar 2025). Systematic phase-alignment distribution and manifold statistics confirm that alternative architectures and attention mixings yield fundamentally indistinguishable circuits for modular addition (Moisescu-Pareja et al., 31 Dec 2025).
Learned modular addition circuits provide a canonical mechanistically interpretable class of solutions for algebraic tasks in neural networks, underpinned by Fourier analysis, geometric manifold embeddings, and strong regularization. This topic continues to drive technical advances in the understanding of neural computation, generalization, and the limits of expressivity in functionally structured learning systems.