
Signed-Zero Ternary Quantization

Updated 6 January 2026
  • Signed-Zero Ternary is a deterministic 2-bit quantization scheme that uses four codewords—including a signed zero—to enhance gradient propagation during training.
  • It improves training stability and information density, and its deterministic straight-through estimator enables bit-exact reproduction of training trajectories.
  • SZT supports efficient integration with standard GEMM hardware and specialized in-memory computing, reducing memory footprint while accelerating convergence.

Signed-Zero Ternary (SZT) quantization is a compact, deterministic 2-bit quantizer designed to improve the training stability and information density of large neural networks—particularly transformer-based models—without introducing architectural or computational burdens on inference hardware. SZT extends the conventional ternary quantization scheme by introducing an explicit sign bit for zero, allowing for more informed gradient propagation and a strictly higher entropy per parameter, all while remaining compatible with standard general matrix-matrix multiply (GEMM) hardware (Uhlmann, 8 Aug 2025). The approach also has implications for in-memory computing substrates designed for ultra-low precision deep neural networks (Thakuria et al., 2024).

1. Quantizer Definition and Properties

SZT operates on a latent real-valued weight $w \in \mathbb{R}$ using a fixed, symmetric threshold $\Delta > 0$. The SZT encoding function maps $w$ deterministically to one of four codewords $\{-1, -0, +0, +1\}$:

$$\operatorname{szt}(w) = \begin{cases} +1 & \text{if } w > \Delta \\ +0 & \text{if } 0 < w \leq \Delta \\ -0 & \text{if } -\Delta \leq w < 0 \\ -1 & \text{if } w < -\Delta \end{cases}$$

In the forward pass, the decode function maps both signed-zero states to numeric zero:

$$v(q) = \begin{cases} +1 & q = +1 \\ 0 & q \in \{+0, -0\} \\ -1 & q = -1 \end{cases}$$

This mapping ensures that all subsequent matrix-multiply operations can use existing ternary (three-value) kernels without modification. Only encode (bit-packing) and decode (unpacking) routines differ, as the signed-zero information is embedded solely at storage time.
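As a concrete illustration, the encode/decode pair above can be sketched in pure Python. The string codewords and the tie-handling at $w = 0$ are implementation conventions chosen here, not details taken from the paper:

```python
def szt_encode(w: float, delta: float) -> str:
    """Deterministically map a latent weight to one of four SZT codewords."""
    if w > delta:
        return "+1"
    if w > 0:
        return "+0"
    if w >= -delta:
        return "-0"  # note: w == 0 lands here by convention
    return "-1"

def szt_decode(q: str) -> int:
    """Forward-pass value: both signed zeros decode to numeric 0,
    so standard ternary GEMM kernels apply unchanged."""
    return {"+1": 1, "+0": 0, "-0": 0, "-1": -1}[q]
```

Because `szt_decode` collapses `+0` and `-0` to the same numeric value, the sign of zero exists only in storage and in the backward pass.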

Compared to classical balanced ternary quantization, which uses three codewords $\{-1, 0, +1\}$, SZT’s four codewords enable the sign of zero to be maintained through the training process, informing learning dynamics without altering the inference signal.

2. Gradient Propagation and the Deterministic Straight-Through Estimator (STE)

The SZT quantizer is inherently non-differentiable. To provide meaningful gradients during backpropagation, SZT adopts a deterministic STE that leverages the retained sign bit for sub-threshold (dead-zone) weights:

Let $\ell$ be the scalar loss and $\nabla_q \ell$ the upstream gradient with respect to the quantized variable. The surrogate gradient for $w$ is

$$\widehat{\nabla}_w \ell = \begin{cases} \nabla_q \ell & |w| > \Delta \\ \operatorname{sgn}(q)\,\nabla_q \ell & |w| \leq \Delta \end{cases}$$

where $\operatorname{sgn}(q) = +1$ for $q \in \{+1, +0\}$ and $-1$ for $q \in \{-0, -1\}$.

Key distinctions:

  • Outside the dead zone ($|w| > \Delta$): gradients are passed unaltered.
  • Inside the dead zone ($|w| \leq \Delta$): the gradient is multiplied by the stored sign, producing a deterministic "push" toward escaping zero in the direction of the latent $w$.

In contrast, vanilla ternary STEs nullify the gradient in the dead zone, and stochastic rounders inject noise. SZT achieves zero extra variance, strictly lower STE-induced MSE than balanced ternary or stochastic rounding (for $|w| < \Delta/2$), and bit-exact, fully deterministic reproduction of training trajectories (Uhlmann, 8 Aug 2025).
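A minimal sketch of this surrogate gradient, reusing string codewords as a stand-in for the stored 2-bit state (an illustrative assumption, not the paper's storage format):

```python
def szt_ste_grad(grad_q: float, w: float, q: str, delta: float) -> float:
    """Deterministic SZT STE: pass-through outside the dead zone,
    sign-weighted pass-through inside it."""
    if abs(w) > delta:
        return grad_q                          # unaltered gradient
    sign = 1.0 if q in ("+1", "+0") else -1.0  # sgn(q) from the stored sign bit
    return sign * grad_q                       # deterministic, zero extra variance
```

Unlike stochastic rounding, two runs with the same inputs always produce the same surrogate gradients.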

3. Information Density and Computational Cost

Storage:

  • All ternary and SZT schemes use 2 bits per parameter, while FP32 requires 32 and 4-bit quantization uses 4 bits.
  • With a memory budget $B$ (bits), SZT supports 16× more parameters than FP32 and 2× more than 4-bit quantization under the same $B$.

Shannon Entropy:

Let $P_0 = \Pr(|w| \leq \Delta)$ and $P_+ = P_- = (1 - P_0)/2$ by symmetry. Balanced ternary and SZT have entropies

$$H_\text{BT} = -\left[ P_- \log_2 P_- + P_0 \log_2 P_0 + P_+ \log_2 P_+ \right]$$

$$H_\text{SZT} = -\left[ P_- \log_2 P_- + 2\,(P_0/2) \log_2 (P_0/2) + P_+ \log_2 P_+ \right]$$

So $H_\text{SZT} - H_\text{BT} = P_0$ bits per parameter, a gain of 0.2–0.3 bits per parameter in typical transformer layers.
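The entropy gain can be checked numerically; the dead-zone mass used below is an illustrative value, not one from the paper:

```python
import math

def entropy_bt(p0: float) -> float:
    """Balanced-ternary entropy under the symmetric model P_+ = P_- = (1 - P_0)/2."""
    pp = (1.0 - p0) / 2.0
    return -(2.0 * pp * math.log2(pp) + p0 * math.log2(p0))

def entropy_szt(p0: float) -> float:
    """SZT entropy: the zero bin is split evenly into +0 and -0."""
    pp = (1.0 - p0) / 2.0
    half = p0 / 2.0
    return -(2.0 * pp * math.log2(pp) + 2.0 * half * math.log2(half))

p0 = 0.25  # illustrative dead-zone probability
gain = entropy_szt(p0) - entropy_bt(p0)  # equals p0, up to float error
```

Algebraically, the gain reduces to exactly $P_0$ bits for any dead-zone mass, since splitting a bin of mass $P_0$ into two equal halves adds $P_0 \log_2 2 = P_0$ bits.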

Computation:

  • Forward and backward GEMM costs are unchanged relative to balanced ternary since all zeros (signed or otherwise) are numerically identical.
  • Encode/decode overhead is limited to a 2-bit lookup per weight on (un)packing, negligible relative to GEMM.
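A sketch of the 2-bit packing such an encode/decode layer would perform; the codeword-to-bit assignment here is an arbitrary illustrative choice, not the paper's layout:

```python
def pack2(codes: list) -> bytes:
    """Pack 2-bit codes (integers 0..3) four per byte, lowest bits first."""
    out = bytearray()
    for i in range(0, len(codes), 4):
        b = 0
        for j, c in enumerate(codes[i:i + 4]):
            b |= (c & 0b11) << (2 * j)
        out.append(b)
    return bytes(out)

def unpack2(data: bytes, n: int) -> list:
    """Recover n 2-bit codes from the packed buffer."""
    return [(data[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]
```

The round trip is lossless, and at 2 bits per parameter the packed buffer is one sixteenth the size of an FP32 tensor.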

4. Application to Neural Networks and Training Behavior

No large-scale empirical benchmarks are currently provided, but SZT analysis predicts:

  • Improved stability (reduced incidence of “stuck” weights)
  • More rapid convergence, as the mean first-passage time (MFPT) out of the dead zone is shortened from exponential (in balanced ternary) to linear in $1/\kappa$ under a simple Ornstein–Uhlenbeck stochastic differential equation approximation
  • No additional inference error relative to balanced ternary; all quantization loss is inherited from restricting weights to $\{-1, 0, +1\}$ (Uhlmann, 8 Aug 2025)

A plausible implication is that SZT quantization maintains or exceeds balanced ternary performance on language and vision benchmarks at significantly lower memory footprints.

5. Implementation Details and Practical Integration

SZT is designed for efficient Quantization-Aware Training (QAT):

  1. Encode $q_t = \operatorname{szt}(w_t)$
  2. Decode $\tilde{w}_t = v(q_t) \in \{-1, 0, +1\}$
  3. Compute forward pass and loss $\ell$
  4. Backpropagate via deterministic STE for surrogate gradient
  5. Optimizer update

Threshold selection: $\Delta = \sigma$ (the per-layer standard deviation), which is MSE-optimal for Laplace-like weight distributions and nearly so for Gaussian.

SZT requires no modifications to matrix-multiply or sparse-GEMM kernels; all logic is in encode/decode layers. Determinism enables exact reproduction of training trajectories across runs with fixed seeds—distinguishing SZT from methods that rely on stochastic rounding.

Channel-wise or per-group thresholds ($\Delta_c$) are straightforward extensions, and mixed-precision schemes (e.g., SZT for attention layers, higher precision elsewhere) are possible within a given memory/computational envelope.

6. Theoretical Insights and Generalization Potential

Mean First-Passage Time (MFPT):

Under $dW_t = -\kappa W_t\,dt + \sigma\,dB_t$, escape from the dead zone $[-\Delta, \Delta]$ takes exponentially long in balanced ternary (scaling as $(\sqrt{\pi}/2\kappa)\, e^{(\kappa\Delta/\sigma)^2}$), whereas in SZT the sign feedback yields an escape time linear in $1/\kappa$. The dead zone is thus no longer absorbing in SZT.
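A small Monte Carlo experiment makes the contrast concrete. The constants, the step size, and the modeling of the SZT sign feedback as a constant sign-directed drift are all illustrative assumptions:

```python
import random

def mean_escape_steps(delta, kappa, sigma, push, trials=200, dt=0.01, seed=0):
    """Average number of steps to leave the dead zone [-delta, delta] under
    discretized OU dynamics dW = -kappa*W dt + sigma dB, optionally with an
    SZT-style deterministic drift of magnitude `push` toward the stored sign."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        w, steps = 1e-3, 0  # start just inside the dead zone, sign '+'
        while abs(w) <= delta:
            drift = -kappa * w
            if push:
                drift += push * (1.0 if w >= 0 else -1.0)
            w += drift * dt + sigma * dt ** 0.5 * rng.gauss(0.0, 1.0)
            steps += 1
        total += steps
    return total / trials

bt = mean_escape_steps(delta=0.1, kappa=1.0, sigma=0.1, push=0.0)   # no feedback (balanced ternary)
szt = mean_escape_steps(delta=0.1, kappa=1.0, sigma=0.1, push=1.0)  # sign-directed push (SZT)
```

In this toy model the sign-directed push shortens the mean escape time by a large factor, consistent with the exponential-versus-linear MFPT scaling.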

PAC-Bayes Bound Tightening:

Splitting the zero bin reduces the Kullback–Leibler (KL) divergence term by $d P_0 \ln 2$ nats, corresponding to a risk tightening of
$$\Delta_{\text{Risk}} \leq \sqrt{\frac{d P_0 \ln 2}{2(N-1)}}$$
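Plugging illustrative numbers into this bound; the dimension $d$, dead-zone mass $P_0$, and sample count $N$ below are assumptions for the sake of the example, not values from the paper:

```python
import math

def risk_tightening(d: int, p0: float, n: int) -> float:
    """Evaluate the PAC-Bayes risk-tightening term sqrt(d * P0 * ln 2 / (2 * (N - 1)))."""
    return math.sqrt(d * p0 * math.log(2.0) / (2.0 * (n - 1)))

bound = risk_tightening(d=1000, p0=0.25, n=1_000_000)  # roughly 0.009
```

The bound shrinks with more data ($N$) and grows with the number of parameters in the dead zone ($d P_0$).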

Activation-Side SZT:

Post-ReLU activations encoded with signed zeros preserve sub-threshold sign information; all analytic SZT benefits (STE, entropy, sensitivity, MFPT) transfer with minor adjustments.

Variants:

SZT generalizes to mixed-precision and codeword repurposing, though extension to full quaternary alphabets may compromise determinism and hardware simplicity.

7. Hardware and System-Level Implementation

SZT’s compatibility with conventional matrix-multiply hardware enables software adoption without hardware changes. For computing-in-memory implementations, related work on SiTe CiM (Thakuria et al., 2024) demonstrates physical realization of signed-zero ternary in cross-coupled SRAM, eDRAM, and FeMFET arrays:

  • SiTe CiM I uses two extra transistors per cell (for fast per-cell cross-coupling), with 18–34% area overhead, yielding up to 88% MAC latency reduction and 78% array-level MAC energy savings.
  • SiTe CiM II optimizes for area, using four extra transistors per 16 cells, attaining only a 6% overhead but still achieving 61–63% energy savings and high compute throughput.
  • Accelerator-level integration provides up to 7× throughput increases and 2.5× energy reduction with negligible accuracy loss when compared to near-memory ternary accelerators.

This suggests SZT and signed-zero ternary codes are well-suited for hardware deployments where minimizing storage and energy per operation is critical.

References

  • "The Fourth State: Signed-Zero Ternary for Stable LLM Quantization (and More)" (Uhlmann, 8 Aug 2025)
  • "SiTe CiM: Signed Ternary Computing-in-Memory for Ultra-Low Precision Deep Neural Networks" (Thakuria et al., 2024)
