Modulo-$(2^{2n}+1)$ Arithmetic via Two Parallel n-bit Residue Channels (2404.08228v2)
Abstract: Augmenting the balanced residue number system moduli set $\{m_1=2^n, m_2=2^n-1, m_3=2^n+1\}$ with the co-prime modulus $m_4=2^{2n}+1$ increases the dynamic range (DR) by around 70%. The Mersenne form of the product $m_2 m_3 m_4=2^{4n}-1$ in the moduli set $\{m_1,m_2,m_3,m_4\}$ leads to a very efficient reverse converter based on the New Chinese Remainder Theorem. However, the double bit-width of the $m_4$ residue channel is counterproductive and jeopardizes the speed balance in $\{m_1,m_2,m_3\}$. Therefore, we decompose $m_4$ into two complex-number $n$-bit moduli $2^n\pm\sqrt{-1}$, which preserves the DR and the co-primality across the augmented moduli set. The required forward conversion from modulo-$(2^{2n}+1)$ to moduli-$(2^n\pm\sqrt{-1})$, and the reverse conversion, are immediate and cost-free. The proposed unified moduli-$(2^n\pm\sqrt{-1})$ adder and multiplier are tested and synthesized on a Spartan 7S100 FPGA. The 6-bit look-up tables (LUTs) therein promote the LUT realizations of adders and multipliers for $n=5$, where the DR equals $2^{25}-2^5$. However, the undertaken experiments show that, to cover all the 32-bit numbers, the power-of-two channel $m_1$ can be as wide as 12 bits with no harm to the speed balance across the five moduli. The results also show that the moduli-$(2^5\pm\sqrt{-1})$ add and multiply operations are advantageous vs. moduli-$(2^5\pm1)$ in speed, cost, and energy measures, and collectively better than those of modulo-$(2^{10}+1)$.
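The identities behind the decomposition can be checked numerically. The following is a minimal Python sketch, not the paper's hardware design: the abstract does not spell out the channel encoding, so the helper names (`moduli`, `gauss_mul`, `split`, `join`) and the choice to represent a residue as a low/high pair are assumptions made purely for illustration. It relies only on the fact that $(2^n+\sqrt{-1})(2^n-\sqrt{-1})=2^{2n}+1$, so that $2^n$ behaves as a square root of $-1$ modulo $2^{2n}+1$.

```python
from math import gcd
from itertools import combinations


def moduli(n):
    """Return (m1, m2, m3, m4) = (2^n, 2^n - 1, 2^n + 1, 2^(2n) + 1)."""
    t = 2**n
    return t, t - 1, t + 1, t * t + 1


def gauss_mul(a, b):
    """Multiply two Gaussian integers given as (real, imag) integer pairs."""
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])


def split(x, n):
    """Write x = lo + hi * 2^n and return (lo, hi) (illustrative encoding)."""
    return (x % 2**n, x // 2**n)


def join(pair, n):
    """Map (re, im) back to an integer mod 2^(2n)+1, substituting 2^n for sqrt(-1)."""
    return (pair[0] + pair[1] * 2**n) % (2 ** (2 * n) + 1)


n = 5
m1, m2, m3, m4 = moduli(n)

# (2^n + i)(2^n - i) = 2^(2n) + 1, with no imaginary part left over.
assert gauss_mul((2**n, 1), (2**n, -1)) == (m4, 0)

# Mersenne form of the product and the dynamic range quoted for n = 5.
assert m2 * m3 * m4 == 2 ** (4 * n) - 1
assert m1 * m2 * m3 * m4 == 2**25 - 2**5

# The four integer moduli are pairwise co-prime.
assert all(gcd(p, q) == 1 for p, q in combinations((m1, m2, m3, m4), 2))

# A modulo-(2^(2n)+1) product computed on (lo, hi) pairs with complex-style
# multiplication agrees with ordinary modular multiplication.
for a, b in [(1000, 777), (1024, 1024), (3, 512)]:
    pair_product = gauss_mul(split(a, n), split(b, n))
    assert join(pair_product, n) == (a * b) % m4
```

Each component of the pair product involves only operands of roughly $n$ bits, which is the intuition behind replacing one $2n$-bit channel with narrower ones; how the paper actually bounds and reduces these components is not stated in the abstract.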