Residue Number System (RNS)
- RNS is a carry-free number system defined over pairwise-coprime moduli, enabling parallel, component-wise arithmetic operations.
- It leverages modular arithmetic in each residue channel to eliminate carry propagation, leading to constant-time addition, subtraction, and multiplication.
- The choice of moduli directly determines bit efficiency and dynamic range; together with its carry-free arithmetic, this makes RNS attractive for digital signal processing, cryptographic accelerators, and quantum computing.
A residue number system (RNS) is a non-weighted, carry-free number representation defined over a set of pairwise-coprime moduli, with component-wise, parallelizable arithmetic operations. Formally, each integer $X$ within a dynamic range $[0, M)$, where $M = \prod_{i=1}^{n} m_i$ and $\{m_1, \ldots, m_n\}$ is a set of pairwise-coprime moduli, is mapped to its residue vector $(x_1, \ldots, x_n)$, where $x_i = X \bmod m_i$ (Liu et al., 2020, Dutta et al., 2012). This structure enables highly parallel hardware implementations for addition, subtraction, and multiplication and underpins advanced architectures in cryptography, digital signal processing, deep neural network acceleration, quantum computing, and photonic/analog computing.
1. Formal Definition, Representation, and Reconstruction
Given pairwise-coprime moduli $m_1, m_2, \ldots, m_n$ with $M = \prod_{i=1}^{n} m_i$, any integer $X \in [0, M)$ is uniquely encoded as
$$X \mapsto (x_1, x_2, \ldots, x_n), \qquad x_i = X \bmod m_i.$$
Reconstruction employs the Chinese Remainder Theorem (CRT):
$$X = \left( \sum_{i=1}^{n} x_i \, M_i \left( M_i^{-1} \bmod m_i \right) \right) \bmod M, \qquad M_i = M / m_i.$$
This map is bijective due to the coprimality constraint, and arithmetic operations on residue vectors are performed element-wise modulo each $m_i$, ensuring no inter-channel carry propagation (Demirkiran et al., 2023, Dutta et al., 2012, 0901.1123).
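A minimal Python sketch of this encode/decode round trip, using the small illustrative moduli set $\{7, 8, 9\}$ (chosen for readability, not drawn from the cited papers):

```python
from math import prod

def to_rns(x, moduli):
    """Encode integer x as its residue vector (x mod m for each modulus)."""
    return [x % m for m in moduli]

def from_rns(residues, moduli):
    """Reconstruct x from its residues via the Chinese Remainder Theorem."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)   # pow(Mi, -1, m): inverse of Mi mod m
    return x % M

moduli = [7, 8, 9]                     # pairwise coprime; M = 504
x = 226                                # any integer in [0, 504)
assert from_rns(to_rns(x, moduli), moduli) == x
```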
2. Carry-Free Arithmetic: Basic Operations and Advantages
Component-wise modular arithmetic is the central strength of RNS (a runnable sketch appears at the end of this section):
- Addition: $z_i = (x_i + y_i) \bmod m_i$
- Subtraction: $z_i = (x_i - y_i) \bmod m_i$
- Multiplication: $z_i = (x_i \cdot y_i) \bmod m_i$
This enables:
- Complete elimination of carry chains, the bottleneck of wide binary and BCD datapaths (Mousavi et al., 10 Aug 2024).
- Massive parallelism in hardware, as each residue channel is independent (Gorodecky et al., 2018, Dutta et al., 2012).
- Uniform, deterministic arithmetic latency across addition, subtraction, and multiplication (Demirkiran et al., 2023, Mousavi et al., 10 Aug 2024, Liu et al., 2020).
Component-wise modularity is leveraged in DNN hardware (Liu et al., 2020), cryptographic accelerators (Garg et al., 2016), and high-throughput DSP blocks (Dutta et al., 2012).
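A minimal sketch of these component-wise operations, again over the illustrative moduli $\{7, 8, 9\}$; note that each channel computes entirely in isolation:

```python
moduli = [7, 8, 9]                     # pairwise coprime; M = 504

def rns_add(a, b):
    # Each channel adds independently; no carry ever crosses channels.
    return [(ai + bi) % m for ai, bi, m in zip(a, b, moduli)]

def rns_sub(a, b):
    return [(ai - bi) % m for ai, bi, m in zip(a, b, moduli)]

def rns_mul(a, b):
    return [(ai * bi) % m for ai, bi, m in zip(a, b, moduli)]

a = [30 % m for m in moduli]           # residue vector of 30
b = [45 % m for m in moduli]           # residue vector of 45
assert rns_add(a, b) == [(30 + 45) % m for m in moduli]
assert rns_mul(a, b) == [(30 * 45) % m for m in moduli]
```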
3. Moduli Selection, Bit Efficiency, and Dynamic Range
The choice of moduli directly determines RNS bit efficiency, area, frequency, and dynamic range:
- Bit efficiency: For a moduli set $\{m_1, \ldots, m_n\}$ with range $M = \prod_i m_i$, the efficiency is $\eta = \log_2 M \,/\, \sum_i \lceil \log_2 m_i \rceil$, the ratio of usable range bits to stored residue bits (Dutta et al., 2012); a sketch at the end of this section computes it.
- Canonical sets: $\{2^n - 1,\ 2^n,\ 2^n + 1\}$ and four-moduli extensions such as $\{2^n - 1,\ 2^n,\ 2^n + 1,\ 2^{2n} + 1\}$ offer high dynamic ranges with efficient reverse converters (0901.1123, Dutta et al., 2012).
- Bit-efficient construction: Start with the core three-moduli set and append the smallest coprime values as needed; total slice utilization and critical path improve over classical methods (Dutta et al., 2012).
| Moduli Set | Dynamic Range | Typical Use |
|---|---|---|
| $\{2^n - 1,\ 2^n,\ 2^n + 1\}$ | $\approx 2^{3n}$ | DSP, DNN, Crypto |
| $\{2^n - 1,\ 2^n,\ 2^n + 1,\ 2^{2n} + 1\}$ | $\approx 2^{5n}$ | High-DR systems |
| Custom coprime sets | Custom (product of moduli) | Fault tolerance |
Increasing the number of moduli grows the dynamic range multiplicatively, enabling sub-word residues and low-precision arithmetic for large-range computations (Demirkiran et al., 2023).
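A short sketch computing the efficiency metric and dynamic range of the classic three-moduli set, assuming the $\eta$ definition given above:

```python
from math import prod, log2, ceil

def bit_efficiency(moduli):
    """Ratio of usable range bits (log2 M) to bits spent storing residues."""
    M = prod(moduli)
    residue_bits = sum(ceil(log2(m)) for m in moduli)  # channel widths
    return log2(M) / residue_bits

n = 8
classic = [2**n - 1, 2**n, 2**n + 1]      # {255, 256, 257}
print(prod(classic))                       # dynamic range 16776960 ~ 2^(3n)
print(round(bit_efficiency(classic), 3))   # ~0.96: 24 range bits in 25 stored bits
```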
4. Hardware, Boolean, Photonic, and Quantum Realizations
Hardware Boolean Minimization:
- Modular reduction and multiplication circuits synthesized using truth-table decomposition, SOP minimization, and combinational AND-OR/XOR gates achieve order-of-magnitude improvements in area and speed over standard EDA flows (Gorodecky et al., 2018).
- Specialized residue generators, e.g., mod $2^n + 1$ with Diminished-1 representation, further reduce area and latency, and extend to the conjugate moduli pair ($2^n - 1$, $2^n + 1$), yielding bi-residue generators with shared hardware (Piestrak et al., 17 May 2025).
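A sketch of diminished-1 addition for a single mod $2^n + 1$ channel, assuming the standard inverted end-around-carry formulation; real designs also carry a zero flag, omitted here for brevity:

```python
N = 8                                  # channel width; modulus is 2**N + 1
MOD = 2**N + 1

def dim1(x):
    """Diminished-1 code of a nonzero channel value: store x - 1 in N bits."""
    assert 1 <= x <= 2**N              # zero needs a separate flag (omitted)
    return x - 1

def dim1_add(a_star, b_star):
    """Mod (2^N + 1) addition of diminished-1 operands via the inverted
    end-around carry: t = a* + b* on N+1 bits, then add back NOT(carry)."""
    t = a_star + b_star
    carry = t >> N
    return (t & (2**N - 1)) + (1 - carry)

assert dim1_add(dim1(10), dim1(20)) == dim1((10 + 20) % MOD)      # no carry
assert dim1_add(dim1(200), dim1(150)) == dim1((200 + 150) % MOD)  # carry wraps
```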
Photonic and Optical RNS:
- RNS digit-wise shifting mapped to spatial routing in hybrid photonic–plasmonic (HPP) switch networks; each modulus bank spatially realizes residuals via one-hot encoded waveguides. Arithmetic uses cascaded 2×2 HPP switches under static voltage controls (Peng et al., 2017).
- Wavelength-division multiplexing enables O(100) parallel RNS computations, with CRT post-processing performed off-chip, achieving sub-20 ps per operation at fJ-per-operation energy.
Quantum RNS:
- RNS arithmetic is mapped into quantum circuits by distributing per-modulus operations to independent quantum processing units or jobs (Gaur et al., 7 Jun 2024, Gaur et al., 21 Jun 2025).
- Quantum Diminished-1 adder/multiplier primitives for mod $2^n + 1$ operations yield lower Toffoli depth, reducing quantum noise accumulation.
- The distributed paradigm enables quantum circuit depth and T-count reductions of up to 46.02% and 86.25%, respectively, for multipliers with 16-qubit outputs, and noise-resilience improvements of up to 133.2% for adders (Gaur et al., 7 Jun 2024, Gaur et al., 21 Jun 2025).
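The decomposition pattern behind this distribution can be illustrated classically. In the sketch below, a thread pool stands in for independent QPUs or quantum jobs, and `channel_mul` (a hypothetical name) stands in for the per-modulus quantum multiplier circuit the cited works construct:

```python
from concurrent.futures import ThreadPoolExecutor
from math import prod

def channel_mul(args):
    """Classical stand-in for one per-modulus quantum job: a mod-m multiplier."""
    xi, yi, m = args
    return (xi * yi) % m

def crt(residues, moduli):
    """Classical post-processing: CRT reconstruction of the product."""
    M = prod(moduli)
    return sum(r * (M // m) * pow(M // m, -1, m)
               for r, m in zip(residues, moduli)) % M

moduli = [7, 8, 9]
x, y = 30, 45
jobs = [(x % m, y % m, m) for m in moduli]

# Channels share no state, so the three jobs could run on separate QPUs
# (or as separate circuit executions) with no communication until CRT.
with ThreadPoolExecutor() as pool:
    residues = list(pool.map(channel_mul, jobs))

assert crt(residues, moduli) == (x * y) % prod(moduli)
```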
5. Advanced Methods: Multilayer, Redundant, and Fault-Tolerant RNS
Recursive (Multi-layer) RNS:
- Constructs arbitrary-precision systems by recursively stacking virtual RNS layers, using carry-free Montgomery reduction at each level. The algorithm supports modular operations on RSA-scale (2048+ bit) moduli using only small-modulus arithmetic (e.g., 8 bits) at the hardware level (Hollmann et al., 2018).
- Layered base extension, pseudo-residue handling, and redundancy ensure correctness and resistance to side-channel attacks.
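For reference, a plain-integer sketch of one Montgomery reduction step (REDC), the primitive named above; the cited multi-layer construction performs this reduction itself with RNS arithmetic over small moduli, which this single-word sketch does not capture:

```python
def montgomery_redc(t, m, r_bits, m_prime):
    """One Montgomery reduction step: returns t * R^(-1) mod m for R = 2**r_bits,
    using only shifts, masks, and multiplies (no trial division)."""
    R_mask = (1 << r_bits) - 1
    u = (t * m_prime) & R_mask         # u = -t * m^(-1) mod R
    s = (t + u * m) >> r_bits          # t + u*m is exactly divisible by R
    return s - m if s >= m else s      # at most one conditional subtraction

m = 97                                 # odd channel modulus (illustrative)
r_bits = 8                             # R = 256 > m
m_prime = (-pow(m, -1, 1 << r_bits)) % (1 << r_bits)
a, b = 55, 73
t = a * b                              # t < m * R, as REDC requires
assert montgomery_redc(t, m, r_bits, m_prime) == (a * b * pow(256, -1, m)) % m
```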
Redundant RNS (R-RNS) and Error Correction:
- Generalizes RNS by adding redundant moduli ($n - k$ redundant channels alongside $k$ information channels in an RRNS(n, k) code), enabling error detection/correction through majority-voting reconstructions and per-channel correction (Demirkiran et al., 2023); see the sketch after the comparison table below.
- Digit-level redundancy, such as Signed-Digit SD-RNS, provides per-channel, per-digit carry-free addition and multiplication, with constant-time performance for additions and improvements in mixed operations. SD-RNS achieves 1.27× speedup over pure RNS and 2.25× over binary, with energy reductions up to 60% for DNN inference benchmarks (Mousavi et al., 10 Aug 2024).
| Number System | Add Time | Mul Time | Energy | Best Application Scenario |
|---|---|---|---|---|
| BNS | Highest | Highest | Highest | None |
| RNS | Lowest | Higher | Lower | Addition-dominated |
| SD-RNS | Low | Low | Lowest | Mixed ops, DNN |
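A brute-force sketch of the majority-vote decoding idea for an RRNS(5, 3) code with illustrative moduli; production designs detect and locate errors far more cheaply (e.g., via base extension), so this is only a conceptual demonstration:

```python
from itertools import combinations
from math import prod
from collections import Counter

def crt(residues, moduli):
    M = prod(moduli)
    return sum(r * (M // m) * pow(M // m, -1, m)
               for r, m in zip(residues, moduli)) % M

def rrns_decode(word, moduli, k):
    """Reconstruct from every k-subset of channels and majority-vote.
    A single corrupted residue taints only the subsets containing it,
    so the clean subsets all agree on the true value."""
    legal = prod(moduli[:k])              # information dynamic range
    votes = Counter()
    for idx in combinations(range(len(moduli)), k):
        x = crt([word[i] for i in idx], [moduli[i] for i in idx])
        if x < legal:                     # discard out-of-range candidates
            votes[x] += 1
    return votes.most_common(1)[0][0]

moduli = [7, 8, 9, 11, 13]                # k = 3 information + 2 redundant
x = 400                                   # within information range 7*8*9 = 504
word = [x % m for m in moduli]
word[1] = (word[1] + 3) % moduli[1]       # inject a single-channel fault
assert rrns_decode(word, moduli, k=3) == x
```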
6. RNS in Contemporary Computing: DNN Acceleration, High-Dimensional Methods, and Applications
Deep Learning Acceleration:
- Large-tile Winograd convolution layers in quantized DNNs are accelerated by performing entire Winograd transformations in RNS, with all intermediate arithmetic in 8 or 16 bits and no loss in accuracy (Liu et al., 2020).
- Analog/photonic DNN accelerators leverage RNS to decompose high-precision dot products into multiple concurrent low-precision MAC arrays, eliminating energy-prohibitive high-precision ADCs, achieving ≥99% of FP32 accuracy with 6-bit ADCs and 10²–10⁶× energy reduction (Demirkiran et al., 2023).
- Photonic tensor cores use RNS to realize modular arithmetic directly in phase, enabling high-speed (10 GHz) dot-products with only 5–6 bit conversion, achieving up to 23.8× throughput and 32.1× energy-delay-product gains over CMOS systolic arrays (Demirkiran et al., 2023).
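A toy sketch of this dot-product decomposition, with illustrative 8-bit moduli rather than the parameters of the cited accelerators; each channel only ever needs its partial sum modulo its own modulus:

```python
import numpy as np
from math import prod

moduli = [251, 241, 239]                  # pairwise-coprime primes, 8-bit each
M = prod(moduli)                          # dynamic range ~ 2^23.8

rng = np.random.default_rng(0)
x = rng.integers(0, 128, size=64)         # toy quantized activations
w = rng.integers(0, 128, size=64)         # toy quantized weights; dot < M

# One independent low-precision channel per modulus. Hardware keeps all
# values below m (narrow MACs / low-bit ADCs); the full-width accumulation
# in np.dot below is only a software convenience.
channel_sums = [int(np.dot(x % m, w % m)) % m for m in moduli]

def crt(residues, moduli):
    return sum(r * (M // m) * pow(M // m, -1, m)
               for r, m in zip(residues, moduli)) % M

assert crt(channel_sums, moduli) == int(np.dot(x, w))
```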
Hyperdimensional and Neuromorphic Computing:
- RNS is mapped into high-dimensional phasor/complex vector representations, supporting additive and multiplicative binding as Hadamard/phasor operators. Decoding employs resonator networks exploiting the RNS factor structure, yielding exponential dynamic range versus memory (Kymn et al., 2023).
- These frameworks replicate grid-cell–like coding, solve NP-hard problems such as subset sum, and provide robust, noise-tolerant representations for machine learning tasks.
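A single-component sketch of the phasor encoding; the cited framework uses high-dimensional random phasor vectors and resonator networks for robust decoding, so this only shows why binding phasors adds residues:

```python
import numpy as np

def phasor_encode(x, m):
    """Map residue (x mod m) onto the m-th roots of unity."""
    return np.exp(2j * np.pi * (x % m) / m)

def phasor_decode(z, m):
    """Read the residue back off the phase angle."""
    return int(round(np.angle(z) * m / (2 * np.pi))) % m

m = 9
a, b = 4, 7
# Multiplicative binding of phasors adds phases, i.e., adds residues mod m.
bound = phasor_encode(a, m) * phasor_encode(b, m)
assert phasor_decode(bound, m) == (a + b) % m
```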
7. Theoretical and Practical Implications
- RNS arithmetic eliminates the carry chain, facilitating massive hardware parallelism and constant-time arithmetic at all bit-widths.
- Hardware realizations can achieve up to 30× higher speed and 15× area reduction versus standard synthesis, and substantially improved fault tolerance when enhanced with digit- or modulus-level redundancy (Gorodecky et al., 2018, Mousavi et al., 10 Aug 2024).
- In analog and photonic systems, RNS decouples per-channel converter precision from overall accuracy, allowing precise DNN training and inference at minimal data-converter energy (Demirkiran et al., 2023).
- In quantum computing, parallel distribution of RNS residue operations reduces circuit depth and enhances resilience to noise, offering a practical path to scalable quantum arithmetic in the NISQ era (Gaur et al., 7 Jun 2024, Gaur et al., 21 Jun 2025).
A plausible implication is that RNS, especially when combined with redundancy or implemented in non-traditional substrates, is positioned as a foundational mechanism for future highly parallel, energy-efficient, and noise-resilient arithmetic across digital, analog, photonic, and quantum computing platforms.