
Adaptive Ternary Multi-step Neuron (ATMN)

Updated 5 February 2026
  • ATMN is a spiking neuron module that employs ternary-valued spikes and adaptive thresholds to enhance information density and computational efficiency in LLMs.
  • It integrates within the MAR framework to convert dense FFN activations into sparse, event-driven signals, reducing energy consumption and latency.
  • Empirical studies show that ATMN significantly improves zero-shot reasoning accuracy while facilitating low-power, scalable inference in transformer-based architectures.

The Adaptive Ternary Multi-step Neuron (ATMN) is a spiking neuron module designed to address the principal challenges of integrating bio-inspired spiking neural networks (SNNs) within LLMs, specifically under the Module-aware Architecture Refinement (MAR) framework. ATMN increases the information density of event-driven neural computation by emitting ternary-valued spikes, while using adaptive thresholds to maintain sparsity and expressive capacity. This construct enables efficient, low-energy inference in LLMs by sparsifying dense feed-forward activations and replacing multiply–accumulate operations with accumulator-based (AC) computation (Cai et al., 29 Jan 2026).

1. Architectural Motivation and Context

Quadratic self-attention and dense, wide feed-forward networks (FFNs) are the primary contributors to the energy and latency overhead of transformer-based LLMs. While previous work targeted either attention complexity or FFN density in isolation, the MAR framework addresses both by replacing attention layers with state-space models (SSMs) for linear-time processing and by introducing spike-based modules, including the ATMN, to enforce activation sparsity in FFN layers. The use of SNNs raises concerns about low information density and temporal mismatch with dense teacher counterparts. ATMN is specifically designed to mitigate these constraints by providing rich, signed, event-driven representations while remaining computationally efficient (Cai et al., 29 Jan 2026).

2. Formal Description of ATMN Dynamics

ATMN is an augmentation of the classical Leaky Integrate-and-Fire (LIF) neuron. It differs from standard binary spiking neurons by supporting ternary outputs $s_t \in \{-1, 0, +1\}$ and by using a dynamically learned threshold $V_\mathrm{adaptive}$ for spike emission, thereby capturing both inhibitory and excitatory signaling. The per-step update equations are:

  1. Charge: $h_t = I_t \cdot \delta_{t,0} + \frac{1}{\tau}\, u_{t-1}$, where $I_t$ is the input current, injected only at $t = 0$ ($\delta_{t,0}$ is the Kronecker delta), $\tau$ is the membrane time constant, and $u_{t-1}$ is the prior membrane potential.
  2. Fire: $s_t = \begin{cases} +1 & \text{if } h_t \geq V_\mathrm{adaptive} \\ -1 & \text{if } h_t \leq -V_\mathrm{adaptive} \\ 0 & \text{otherwise} \end{cases}$
  3. Reset: $u_t = h_t - s_t \cdot V_\mathrm{adaptive}$
  4. Threshold: $V_\mathrm{adaptive} = \exp(a)$, where $a$ is a trainable scalar; the exponential parameterization keeps the threshold strictly positive.

This architecture allows the neuron to admit negative outputs (inhibitory events), thereby approximately doubling information capacity over binary neurons, while maintaining event sparsity due to the thresholding mechanism (Cai et al., 29 Jan 2026).
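The charge/fire/reset dynamics above can be sketched in a few lines of NumPy. The time constant `tau`, step count `T`, and the input values in the example are illustrative choices, not values reported in the source:

```python
import numpy as np

def atmn_forward(I, a, tau=2.0, T=4):
    """Simulate ATMN dynamics over T steps for an input current vector I.

    Implements the charge/fire/reset equations above; `tau` and `T`
    are illustrative values, not taken from the source.
    """
    V = np.exp(a)                      # adaptive threshold, always positive
    u = np.zeros_like(I, dtype=float)  # membrane potential
    spikes = []
    for t in range(T):
        # Input is injected only at t = 0 (the Kronecker delta in eq. 1).
        h = (I if t == 0 else 0.0) + u / tau
        s = np.where(h >= V, 1.0, np.where(h <= -V, -1.0, 0.0))
        u = h - s * V                  # soft reset by the signed threshold
        spikes.append(s)
    return np.stack(spikes)            # shape (T, ...), values in {-1, 0, +1}

# Example: with a = 0 the threshold is exp(0) = 1, so an input of 1.5 fires
# a +1 spike, -1.5 fires a -1 spike, and 0.1 stays silent at t = 0.
train = atmn_forward(np.array([1.5, -1.5, 0.1]), a=0.0)
# train[0] == [1., -1., 0.]
```

Note how the signed reset in step 3 subtracts $s_t \cdot V_\mathrm{adaptive}$ rather than zeroing the potential, so residual charge carries over to later steps.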

3. Integration in LLM Pipelines

Within the MAR pipeline, ATMNs are inserted directly before each linear FFN projection, transforming otherwise dense real-valued activations into sparse, ternary spike trains for subsequent processing. In practice, each activation is streamed through the ATMN over $T$ steps, converting multiply–accumulate (MAC) operations into sparse accumulator-based (AC) computations and yielding large net energy savings. This architectural choice is critical, as FFNs remain a major energy and throughput bottleneck even after replacing attention with SSMs (Cai et al., 29 Jan 2026).
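A minimal sketch of why ternary spikes permit accumulator-only projections, under assumptions: the weight matrix and spike train below are fabricated for shape (in MAR the spikes would come from the ATMN), and each nonzero spike simply adds or subtracts a weight row:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))     # FFN projection weights (illustrative)

# A fabricated ternary spike train over T = 4 steps; in MAR this would be
# the ATMN's output for a dense activation vector of width 8.
spikes = rng.choice([-1.0, 0.0, 1.0], size=(4, 8), p=[0.1, 0.8, 0.1])

# Accumulator-only projection: every nonzero spike adds or subtracts the
# corresponding weight row; no activation-weight multiplications occur.
y = np.zeros(4)
for t in range(spikes.shape[0]):
    for i in np.flatnonzero(spikes[t]):
        y += spikes[t, i] * W[i]    # sign selection, not a general multiply

# The result matches the dense matmul applied to the summed spike train.
assert np.allclose(y, spikes.sum(axis=0) @ W)
```

The sparser the spike train (here 80% zeros), the fewer row accumulations are performed, which is the source of the energy savings claimed for AC over MAC computation.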

4. Advantages Over Binary Spiking Neurons

Standard SNN modules often use binary spikes $\{0, 1\}$, which restricts information transmission. ATMN's ternary scheme ($\{-1, 0, +1\}$) and adaptive thresholding offer two crucial advantages: higher per-neuron capacity and dynamic adjustment to input statistics. Experimental ablation studies confirm this: substituting binary spikes with ATMN in the MAR framework improves mean zero-shot accuracy from 46.3% to 55.2% across six reasoning benchmarks, a substantial gain attributable to the richer and more flexible signaling of the ternary neuron (Cai et al., 29 Jan 2026).
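The per-neuron capacity advantage can be made concrete with a small entropy calculation. The event rate below is an illustrative value, not from the source; the point is that at equal sparsity, signed events carry strictly more information per step:

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

rate = 0.2  # illustrative nonzero-spike (event) rate, not from the source
h_binary = entropy([1 - rate, rate])                  # symbols {0, 1}
h_ternary = entropy([1 - rate, rate / 2, rate / 2])   # symbols {-1, 0, +1}

# With the nonzero mass split evenly between +1 and -1, the ternary code
# carries exactly `rate` extra bits per step: h_ternary - h_binary == rate.
```

This is a capacity upper bound for independent symbols, not a measured gain; the empirical benefit in MAR is the accuracy improvement cited above.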

5. Empirical Performance and Energy Efficiency

MAR models incorporating ATMN, distilled from an SSM-based Llamba-1B teacher, demonstrate high fidelity to dense-model performance under strict energy constraints. On average, MAR achieves 57.20% zero-shot accuracy across benchmarks (PIQA, BoolQ, WinoGrande, HellaSwag, ARC-Easy, ARC-Challenge), compared to 61.88% for the 1.4B-parameter dense Llamba model and 52.48% for the 7B-parameter SpikeLLM baseline. Energy profiling shows that inference with ATMN modules in MAR requires approximately 40–50% less energy at $N = 2000$ tokens, with the margin increasing for longer contexts, owing to the event-driven, accumulator-based nature of the computation (Cai et al., 29 Jan 2026).

6. Role in Distillation and Alignment

ATMN’s non-linearity and dynamical sparsity pose challenges for direct knowledge distillation from dense teacher models. To overcome this, ATMN is paired with the Spike-aware Bidirectional Distillation Strategy (SBDS), which operates at both logit and feature levels. At the logit level, a bidirectional Kullback–Leibler divergence (combining forward and reverse KL with coefficients α=0.2\alpha=0.2, β=0.7\beta=0.7) is applied between teacher and student token distributions. At the feature level, Pre-Norm alignment is enforced immediately after the first RMSNorm in each layer, using Euclidean distance. Integrating ATMN with SBDS recovers semantic fidelity and mitigates performance gaps from temporal or sparsity-induced mismatches between teacher and student (Cai et al., 29 Jan 2026).
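The logit-level term of SBDS can be sketched as follows. The coefficients $\alpha = 0.2$ and $\beta = 0.7$ come from the text, but the softmax handling, the mean reduction, and which direction counts as "forward" are assumptions of this sketch:

```python
import numpy as np

def _softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def _kl(p, q, eps=1e-12):
    # KL(p || q), summed over the vocabulary axis
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def bidirectional_kl_loss(teacher_logits, student_logits, alpha=0.2, beta=0.7):
    """Logit-level SBDS term: alpha * forward KL + beta * reverse KL.

    alpha/beta match the coefficients in the text; the exact reduction
    and direction convention are assumptions of this sketch.
    """
    p_t = _softmax(teacher_logits)
    p_s = _softmax(student_logits)
    forward = _kl(p_t, p_s)   # teacher -> student
    reverse = _kl(p_s, p_t)   # student -> teacher
    return float(np.mean(alpha * forward + beta * reverse))
```

Combining both KL directions penalizes the student both for missing teacher modes (forward) and for placing mass where the teacher has none (reverse), which is why a weighted blend is used rather than a single direction.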

7. Implications and Applications

The inclusion of ATMN in the MAR framework demonstrates the feasibility of scalable, efficient, event-driven inference for LLMs under tight resource budgets. This approach enables long-context, low-power inference on both edge hardware and data-center deployments, which is crucial for sustainable large-scale language processing. The ATMN and its integration protocols represent an advance in the joint application of bio-inspired sparse computation and sequence modeling, with empirical evidence demonstrating competitive or superior performance relative to denser or much larger SNN-based models (Cai et al., 29 Jan 2026).
