
Digital SPUs with Low-Discrepancy Generators

Updated 14 April 2026
  • Digital SPUs with low-discrepancy generators are architectures that use specifically constructed digital sequences over finite fields to achieve uniform sample distributions.
  • They employ robust mathematical frameworks and optimal discrepancy bounds to enable high-throughput and energy-efficient stochastic computations.
  • Hardware implementations leverage bitwise operations, matrix parameterization, and parallel pipeline strategies to significantly improve area, power, and latency metrics.

Digital Stochastic Processing Units (SPUs) with low-discrepancy generators constitute a fundamental technique at the intersection of computational hardware design and quasi-Monte Carlo (QMC) methods. These systems leverage explicitly constructed digital sequences—often over finite fields—to achieve optimized uniformity of sample points, enabling applications from high-throughput sampling to energy-efficient stochastic computing. This article surveys the mathematical frameworks, optimality results, construction methods, sequence variants, hardware mapping, and empirical performance guarantees established for digital SPUs using low-discrepancy generators.

1. Mathematical Foundations of Digital Low-Discrepancy Sequences

Low-discrepancy sequences are specifically structured to minimize the deviation between the empirical distribution of a set of points and the uniform measure over $[0,1)^s$. For an $N$-point set $P = \{\mathbf{x}_0, \dots, \mathbf{x}_{N-1}\}$ in $[0,1)^s$, the star discrepancy $D^*_N(P)$ and the $\mathcal{L}_2$-discrepancy $\mathcal{L}_{2,N}(P)$ are defined as:

$$D^*_N(P) = \sup_{\boldsymbol y\in[0,1]^s} \left| \frac1N\sum_{n=0}^{N-1}\mathbf{1}_{[\mathbf{0},\boldsymbol y)}(\mathbf{x}_n) - \prod_{i=1}^s y_i \right|,$$

$$\mathcal{L}_{2,N}(P) = \left( \int_{[0,1]^s} |\Delta_P(\mathbf{t})|^2 \, d\mathbf{t} \right)^{1/2},$$

where $\Delta_P(\mathbf{t}) = \frac1N\sum_{n=0}^{N-1}\mathbf{1}_{[\mathbf{0},\mathbf{t})}(\mathbf{x}_n) - \prod_{i=1}^s t_i$ denotes the local discrepancy function, so that $D^*_N(P) = \sup_{\mathbf t}|\Delta_P(\mathbf t)|$.
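Both definitions can be checked numerically on small point sets. The sketch below, assuming points are given as Python lists of floats, computes $\mathcal{L}_{2,N}$ with Warnock's closed-form expression for anchored boxes $[\mathbf 0,\mathbf t)$ (a standard identity, not taken from the works cited here) and the exact one-dimensional star discrepancy via the sorted-points formula $D^*_N = \frac{1}{2N} + \max_i \left|x_{(i)} - \frac{2i-1}{2N}\right|$. The function names are illustrative.

```python
import math
from typing import Sequence


def l2_star_discrepancy(points: Sequence[Sequence[float]]) -> float:
    """L2 star discrepancy of points in [0,1)^s via Warnock's formula.

    L2^2 = 3^-s - (2^(1-s)/N) * sum_n prod_i (1 - x_{n,i}^2)
               + (1/N^2) * sum_{n,m} prod_i (1 - max(x_{n,i}, x_{m,i}))
    """
    n = len(points)
    s = len(points[0])
    term1 = 3.0 ** (-s)
    term2 = 2.0 ** (1 - s) / n * sum(
        math.prod(1.0 - xi * xi for xi in x) for x in points
    )
    term3 = sum(
        math.prod(1.0 - max(a, b) for a, b in zip(x, y))
        for x in points
        for y in points
    ) / (n * n)
    # Clamp tiny negative values caused by floating-point cancellation.
    return math.sqrt(max(term1 - term2 + term3, 0.0))


def star_discrepancy_1d(xs: Sequence[float]) -> float:
    """Exact star discrepancy in one dimension (anchored intervals [0, y))."""
    n = len(xs)
    ordered = sorted(xs)
    return 1.0 / (2 * n) + max(
        abs(x - (2 * i + 1) / (2 * n)) for i, x in enumerate(ordered)
    )


# First four van der Corput points in base 2.
vdc4 = [0.0, 0.5, 0.25, 0.75]
print(star_discrepancy_1d(vdc4))                        # 0.25
print(l2_star_discrepancy([[0.5]]), math.sqrt(1 / 12))  # both about 0.2887
```

The quadratic double sum in Warnock's formula is fine for checking small sets; it is not meant as a production discrepancy evaluator.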

Digital $(t,s)$-sequences, which originate from digital nets over finite fields, form the backbone of such constructions. Over $\mathbb{F}_2$, an $s$-dimensional higher-order digital sequence is parameterized by $s$ infinite binary matrices $C_1, \dots, C_s$, with each coordinate of the $n$th point in $[0,1)^s$ given by a truncated binary linear transformation of the base-2 digit vector of $n$ (Dick et al., 2012).

Variants such as van der Corput–Kronecker and hybrid digital sequences are constructed over $\mathbb{F}_q$ (prime $q$), frequently exploiting special classes of Laurent series and properties derived from Diophantine approximation, including results on the $p(t)$-adic Littlewood Conjecture (Robertson, 2024; Hofer, 20 Jan 2025).

2. Optimal Discrepancy Bounds and Theoretical Rates

The optimality of digital sequences is formalized by matching upper and lower bounds on discrepancy. For $s$-dimensional higher-order digital sequences over $\mathbb{F}_2$, it has been proven (Dick et al., 2012) that

$$\mathcal{L}_{2,N}(P) \le C_{s,t}\,\frac{(\log N)^{s/2}}{N} \quad \text{for all } N \ge 2,$$

and, for the corresponding finite point sets (higher-order digital nets),

$$\mathcal{L}_{2,N}(P) \le C_{s,t}\,\frac{(\log N)^{(s-1)/2}}{N}.$$

Here, $C_{s,t}$ is a constant dependent only on $s$ and the sequence quality parameter $t$, but not on $N$. These rates are best possible due to lower bounds established by Roth and Proinov, which assert that for all $N$-point sets $P$ in $[0,1)^s$,

$$\mathcal{L}_{2,N}(P) \ge c_s\,\frac{(\log N)^{(s-1)/2}}{N},$$

and, for sequences, for infinitely many $N$,

$$\mathcal{L}_{2,N}(P) \ge c_s\,\frac{(\log N)^{s/2}}{N}.$$

For digital van der Corput–Kronecker sequences of dimension $s$ over $\mathbb{F}_q$, the star discrepancy satisfies a bound of the form

$$D^*_N \le C\,\frac{(\log N)^{s}}{N} \quad \text{for all } N \ge 2,$$

with a positive constant $C$ depending on $s$, $q$, and the deficiency parameter, providing the $(\log N)^s/N$ rate (Hofer, 20 Jan 2025). For hybrid digital sequences of Kronecker and van der Corput type in dimension $2$, the bound becomes $D^*_N = O\big((\log N)^2/N\big)$ under suitable conditions on the underlying Laurent series and polynomial (Robertson, 2024).

3. Sequence Construction: Methods and Parameterization

Digital Sequences over $\mathbb{F}_2$ and Higher-Order Generalizations

Construction begins with $s$ infinite binary matrices $C_1, \dots, C_s$, each truncated as necessary based on the required output precision. Given $n$ in binary ($n = n_0 + n_1 2 + n_2 2^2 + \cdots$), the point $\mathbf{x}_n = (x_{n,1}, \dots, x_{n,s})$ is computed as follows:

  • Form the infinite binary digit vector $\vec{n} = (n_0, n_1, n_2, \dots)^\top$.
  • For each $j = 1, \dots, s$, compute $\vec{y}_j = C_j \vec{n} = (y_{j,1}, y_{j,2}, \dots)^\top$ in $\mathbb{F}_2^{\mathbb{N}}$.
  • The $j$th coordinate is $x_{n,j} = \sum_{k \ge 1} y_{j,k}\, 2^{-k}$ (Dick et al., 2012).
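The three steps map directly onto bit operations. In the sketch below (an illustration, not code from Dick et al.), each truncated matrix $C_j$ is stored as a list of column bitmasks, with row $k$ (weight $2^{-(k+1)}$) at bit position `precision - 1 - k`; the product $C_j\vec n$ over $\mathbb{F}_2$ is then the XOR of the columns selected by the set bits of $n$. The example matrices are assumptions chosen because the result is easy to check: the identity (van der Corput) and the Pascal matrix mod 2, which together generate a classical $(0,2)$-sequence in base 2.

```python
from math import comb
from typing import List, Sequence, Tuple


def identity_columns(m: int, precision: int) -> List[int]:
    """Columns of the precision x m identity matrix (van der Corput)."""
    return [1 << (precision - 1 - j) for j in range(m)]


def pascal_columns(m: int, precision: int) -> List[int]:
    """Columns of the Pascal matrix mod 2: entry (k, j) = C(j, k) mod 2."""
    cols = []
    for j in range(m):
        col = 0
        for k in range(min(j + 1, precision)):
            if comb(j, k) % 2:
                col |= 1 << (precision - 1 - k)
        cols.append(col)
    return cols


def digital_point(n: int, matrices: Sequence[List[int]], precision: int) -> Tuple[float, ...]:
    """x_{n,j} = sum_k y_{j,k} 2^-k with y_j = C_j * digits(n) over F_2.

    Requires n < 2**len(columns); larger n would need more matrix columns.
    """
    coords = []
    for cols in matrices:
        y, j, rest = 0, 0, n
        while rest:
            if rest & 1:
                y ^= cols[j]  # F_2 addition of the selected column
            rest >>= 1
            j += 1
        coords.append(y / (1 << precision))
    return tuple(coords)


m = precision = 8
mats = [identity_columns(m, precision), pascal_columns(m, precision)]
print([digital_point(n, mats, precision) for n in range(4)])
# [(0.0, 0.0), (0.5, 0.5), (0.25, 0.75), (0.75, 0.25)]
```

Storing columns as integers lets one XOR per set index bit replace an explicit matrix-vector product, which is the same structure a hardware generator exploits.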

A typical explicit choice for $C_1, \dots, C_s$ is to start from a digital $(t, ds)$-sequence (such as a Sobol' or Niederreiter sequence) and apply a digit-interlacing map of factor $d$, which merges the digits of $d$ consecutive coordinates into a single coordinate; different values of $d$ are used for infinite sequences and for optimal finite point sets (Dick et al., 2012).
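Digit interlacing itself is a simple bit shuffle. A minimal sketch, assuming each of the $d$ input coordinates is available as a `precision`-bit integer (most significant bit = weight $2^{-1}$): the output takes the first digit of every input, then the second digit of every input, and so on, yielding a coordinate with $d \cdot$ `precision` bits. Because interlacing is linear over $\mathbb{F}_2$, applying it to the outputs of $d$ generator matrices is equivalent to interlacing the rows of those matrices. The function name is hypothetical.

```python
from typing import Sequence


def interlace(coords: Sequence[int], precision: int) -> float:
    """Interlace d coordinates given as precision-bit integers.

    Digit order of the result: x_1 digit 1, x_2 digit 1, ..., x_d digit 1,
    x_1 digit 2, ... (digit k has weight 2^-k in its own coordinate).
    """
    d = len(coords)
    out = 0
    for k in range(precision):        # digit position within each input
        for c in coords:              # cycle through the d inputs
            bit = (c >> (precision - 1 - k)) & 1
            out = (out << 1) | bit
    return out / (1 << (d * precision))


# d = 2: 0.10 (binary) and 0.01 (binary) interlace to 0.1001 (binary) = 0.5625
print(interlace([0b10, 0b01], precision=2))
```

With $d = 1$ the function is the identity on the input coordinate, which is a quick sanity check.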

Kronecker–van der Corput and Hybrid Constructions

In $\mathbb{F}_q[t]$, consider irreducible polynomials $p(t)$ and Laurent series $\Theta(t) \in \mathbb{F}_q((t^{-1}))$ for sequence construction. For index $n$, the digital van der Corput value is defined through polynomial digit expansion and modular arithmetic; the Kronecker value is produced by a formal convolution of the digits of $n$ with the coefficients of $\Theta$, interpreting the result as a radical-inverse sum (Robertson, 2024). Combining both yields two-dimensional hybrid sequences with proven low-discrepancy bounds.

The digital van der Corput–Kronecker sequence of dimension $s$ uses $C_1 = I$ (the identity) and $C_i = H_i$ (Hankel matrices of finite-deficiency Laurent series) for $i = 2, \dots, s$ (Hofer, 20 Jan 2025).
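The Hankel structure can be made concrete with a small sketch. If $\Theta = \sum_{k \ge 1} a_k t^{-k}$ and $n$ has base-$q$ digits $n_0, n_1, \dots$, the fractional part of $n(t)\Theta(t)$ has digits $y_i = \sum_j a_{i+j+1} n_j \pmod q$, i.e., the generating matrix has entries $a_{i+j+1}$ (indices from 0) and is constant along anti-diagonals. The code assumes prime $q$ and a truncated coefficient list; the example coefficients are arbitrary placeholders, not a finite-deficiency series, so they illustrate the mechanics only and carry none of the discrepancy guarantees. All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def base_q_digits(n: int, q: int) -> List[int]:
    """Base-q digits of n, least significant first."""
    digits = []
    while n:
        n, r = divmod(n, q)
        digits.append(r)
    return digits


def van_der_corput(n: int, q: int, precision: int) -> float:
    """First coordinate: identity matrix C_1 = I, i.e. the radical inverse."""
    digits = base_q_digits(n, q)[:precision]
    return sum(d * q ** -(i + 1) for i, d in enumerate(digits))


def kronecker(n: int, theta: Sequence[int], q: int, precision: int) -> float:
    """Hankel matrix H[i][j] = a_{i+j+1}; theta[k] holds a_{k+1}.

    Coefficients beyond len(theta) are treated as 0 (truncated series).
    """
    digits = base_q_digits(n, q)
    x = 0.0
    for i in range(precision):
        y = sum(nj * theta[i + j] for j, nj in enumerate(digits) if i + j < len(theta))
        x += (y % q) * q ** -(i + 1)
    return x


def vdc_kronecker_point(n: int, theta: Sequence[int], q: int, precision: int) -> Tuple[float, float]:
    """Two-dimensional van der Corput-Kronecker point (C_1 = I, C_2 = H)."""
    return van_der_corput(n, q, precision), kronecker(n, theta, q, precision)


theta = [1, 1, 0, 1]  # hypothetical a_1..a_4 over F_2
print([vdc_kronecker_point(n, theta, 2, 4) for n in range(1, 4)])
# [(0.5, 0.8125), (0.25, 0.625), (0.75, 0.4375)]
```

For $q = 2$ the Kronecker coordinate is linear in the index digits, so the 4-bit outputs for $n = 1$ and $n = 2$ XOR to the output for $n = 3$.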

Powers-of-2 Low-Discrepancy Generator (P2LSG)

P2LSG adapts the van der Corput (VDC) sequence to bases $b = 2^k$. For an $m$-bit binary counter value $n$ (with $k$ dividing $m$), the bits are partitioned into $m/k$ groups of $k$ bits and the order of the groups is reversed. Each output is $x_n = \sum_{j=0}^{m/k-1} a_j\, 2^{-k(j+1)}$, where $a_j \in \{0, \dots, 2^k - 1\}$ is the value of the $j$th least significant group of $n$; this is exactly the base-$2^k$ radical inverse of $n$. For $k = 1$ this is the classical VDC sequence; other choices of $k$ can be matched to the byte granularity of the stochastic number generator (SNG) units (Moghadam et al., 2023).
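Because the group reversal is exactly the base-$2^k$ radical inverse, it can be checked in software before being hard-wired. The sketch below (function names hypothetical; Python standing in for fixed wiring) reverses the $k$-bit groups of an $m$-bit counter and can be compared against a generic radical inverse.

```python
def p2lsg(n: int, m: int, k: int) -> float:
    """Reverse the order of the k-bit groups of an m-bit counter value n."""
    if m % k:
        raise ValueError("m must be a multiple of k")
    mask = (1 << k) - 1
    out = 0
    for _ in range(m // k):
        out = (out << k) | (n & mask)  # least significant group becomes most significant
        n >>= k
    return out / (1 << m)


def radical_inverse(n: int, base: int) -> float:
    """Reference van der Corput radical inverse in an arbitrary base."""
    x, scale = 0.0, 1.0 / base
    while n:
        n, digit = divmod(n, base)
        x += digit * scale
        scale /= base
    return x


# Base 4 (k = 2) on a 4-bit counter.
print([p2lsg(n, m=4, k=2) for n in range(6)])
# [0.0, 0.25, 0.5, 0.75, 0.0625, 0.3125]
```

In hardware the loop disappears: each output bit is simply wired to a fixed counter bit, which is why no arithmetic or table storage is needed.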

4. Hardware Implementation in Digital SPUs

Digital SPUs implement these constructions using highly parallel, bitwise logic and memory-efficient data structures.

  • Arithmetic over $\mathbb{F}_2$ or $\mathbb{F}_q$ exploits hardware-level XOR (addition), AND (multiplication in $\mathbb{F}_2$), and table-based multiplication (for general $q$).
  • Memory requirements are set by the size of the truncated generator matrices: $s \times r \times m$ bits, where $r$ is the number of rows (determined by the interlacing factor, the net quality, and the required number of points $N$) and $m$ is the bit-depth.
  • Bit-group reversal in P2LSG is realized as fixed hard-wiring in the data path, achieving area and latency reductions relative to Sobol or Halton generators (Moghadam et al., 2023).
  • Pipeline stages for hybrid sequences include base-$q$ digit extraction, LFSR polynomial division for remainder calculation, SIMD-accelerated digit operations, and radical-inverse assembly. Specialized stages are mapped to SIMD lanes with wide registers for polynomial or Laurent-series arithmetic (Robertson, 2024).
  • Parallelization is achieved by allocating distinct index bits to different SPU lanes or SIMD threads, yielding linear throughput scaling with minimal area or power overhead (Moghadam et al., 2023).
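The XOR/AND arithmetic and the lane partitioning fit together because $n \mapsto C\vec n$ is linear over $\mathbb{F}_2$: if $n = 2^b h + \ell$ with lane index $\ell < 2^b$, then $C\vec n = C\vec u \oplus C\vec \ell$ with $u = 2^b h$, so each lane only XORs a fixed offset onto a shared high-part stream. This is a software sketch of one possible partitioning with hypothetical names; it is not the specific scheme of Moghadam et al. (2023) or Robertson (2024).

```python
from typing import List


def f2_matvec(columns: List[int], n: int) -> int:
    """C * digits(n) over F_2: each bit of n selects a column (AND), XOR adds them."""
    y, j = 0, 0
    while n:
        if n & 1:
            y ^= columns[j]
        n >>= 1
        j += 1
    return y


def lane_streams(columns: List[int], total_bits: int, lane_bits: int) -> List[List[int]]:
    """Split indices n = (h << lane_bits) | lane across 2**lane_bits lanes.

    Lane `lane` emits, in order of h, the outputs for indices h * 2**lane_bits + lane.
    """
    high_part = [f2_matvec(columns, h << lane_bits)
                 for h in range(1 << (total_bits - lane_bits))]
    streams = []
    for lane in range(1 << lane_bits):
        offset = f2_matvec(columns, lane)  # constant per lane
        streams.append([y ^ offset for y in high_part])
    return streams


# 8-bit van der Corput columns, 4 lanes.
cols = [1 << (7 - j) for j in range(8)]
lanes = lane_streams(cols, total_bits=8, lane_bits=2)
print([y / 256 for y in lanes[1][:3]])  # indices 1, 5, 9 -> [0.5, 0.625, 0.5625]
```

Sharing `high_part` is a software convenience; in a parallel design each lane could instead run its own counter over the high bits.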

A comparison of gate-level metrics for P2LSG, Sobol, and Halton generators in 45nm CMOS is given below:

| Generator | Area (µm²) | Power (µW) | Critical-path latency (ns) |
|---|---|---|---|
| Sobol #2/#3 | 2 × 781 | 2 × 45.15 | 0.68 |
| Halton #1/#2 | 130 + 450 | 15.15 + 35.3 | 1.06 |
| P2LSG-4/-16 | 163 | 16.05 | 0.49 |

Relative to the pair of Sobol generators, P2LSG needs about 10% of the area (163 vs. 1562 µm²) and about 18% of the power (16.05 vs. 90.3 µW), with about 28% lower critical-path latency (0.49 vs. 0.68 ns) (Moghadam et al., 2023).

5. Practical Performance and Case Studies

Empirical comparisons in stochastic computing (SC) and image/video processing show that P2LSG and digital-sequence-based SPUs offer the following:

  • Error rates: For stochastic multiplication, P2LSG achieves a mean absolute error (MAE) below 0.1% at the evaluated bit-stream length, within 10% of Sobol and better than other low-discrepancy and pseudo-random generators. For scaled addition (MUX-based), P2LSG attains 0% MAE at the reported stream length.
  • Image processing: In 2× up-scaling and scene merging, P2LSG-driven SPUs yield PSNR and SSIM on par with or superior to Sobol-based implementations, with area and energy reductions of 50–85%.
  • Throughput: On a modern 2 GHz SPU, pipelined digital van der Corput–Kronecker hybrids generate billions of points per second per lane (Robertson, 2024).
  • Energy efficiency: Compared to Sobol-based units, P2LSG reduces area by 55–73%, energy per operation by 68–90%, and latency by 14–23% (Moghadam et al., 2023).
  • Distribution quality: Bit-streams from low-discrepancy sources exhibit uniformly spread "1" positions, minimizing internal clustering and variance in SC tasks.
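To illustrate why evenly spread "1" positions matter, the sketch below performs unipolar stochastic multiplication: each operand $a$ becomes a bit-stream whose $n$th bit is $[x_n < a]$, and the two streams are combined with a bitwise AND. As an assumption for illustration, the two streams are driven by the identity/Pascal pair over $\mathbb{F}_2$ (a Sobol'-type $(0,2)$-sequence), not by P2LSG. For $N = 256$ and operands that are multiples of $1/16$, the first 256 points form a $(0,8,2)$-net, so every $1/16 \times 1/16$ cell holds exactly one point and the product comes out exact.

```python
from math import comb
from typing import List, Tuple

M = 8        # index bits; stream length N = 2**M
N = 1 << M


def columns_identity(m: int) -> List[int]:
    return [1 << (m - 1 - j) for j in range(m)]


def columns_pascal(m: int) -> List[int]:
    # Entry (k, j) = C(j, k) mod 2, row k stored at bit m - 1 - k.
    return [sum(1 << (m - 1 - k) for k in range(j + 1) if comb(j, k) % 2)
            for j in range(m)]


def coordinate(n: int, cols: List[int], m: int) -> float:
    """Digital-sequence coordinate: XOR of the columns selected by the bits of n."""
    y, j = 0, 0
    while n:
        if n & 1:
            y ^= cols[j]
        n >>= 1
        j += 1
    return y / (1 << m)


# Precompute the two low-discrepancy sources that feed the SNG comparators.
POINTS: List[Tuple[float, float]] = [
    (coordinate(n, columns_identity(M), M), coordinate(n, columns_pascal(M), M))
    for n in range(N)
]


def sc_multiply(a: float, b: float) -> float:
    """Unipolar SC product: AND of bit-streams [x_n < a] and [y_n < b]."""
    ones = sum(1 for x, y in POINTS if x < a and y < b)
    return ones / N


print(sc_multiply(0.25, 0.75))  # 0.1875 = 0.25 * 0.75
```

Operands that are not multiples of $1/16$ are no longer exact, but the error stays bounded by the star discrepancy of the 256-point net.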

6. Design Guidelines and Parameter Selection

  • Field size and base: Prefer $q = 2$ for hardware alignment (bitwise logic, native field arithmetic).
  • Generator degree: Select an irreducible $p(t)$ of low degree (e.g., 16) to balance discrepancy constants against hardware cost.
  • Interlacing factor $d$ (for higher-order digital sequences): A sufficiently large $d$ is needed to reach the optimal logarithmic factor in the discrepancy bound, at the expense of larger shift-register and matrix dimensions.
  • Quality parameter $t$: Controlled via the choice of the underlying net (digital $(t, m, s)$-net); smaller $t$ (and, for Kronecker-type constructions, smaller deficiency) yields better constants but requires careful selection from known constructions.
  • Stream length $N$: Hofer (20 Jan 2025) recommends specific parameter ranges for practical implementations; for SC, the stream lengths evaluated by Moghadam et al. (2023) suffice for sub-0.1% error.
  • Parallel throughput scaling: Partition index bits for parallel stream generation, exploiting SIMD hardware.
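  • Memory overhead: Moderate, scaling with the size of the truncated generator matrices for matrix-based generators and essentially with the counter width for the hard-wired P2LSG.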

7. Applications and Broader Implications

Digital SPUs with low-discrepancy generators are central to QMC methods, energy-efficient stochastic computing, and large-scale Monte Carlo simulations. Their deployment in hardware accelerators—including FPGAs, ASICs, and SIMD SPUs—enables deterministic, high-throughput, uniform sampling with provable rate-optimal coverage of the unit cube. This supports modern applications in image and signal processing, machine learning accelerators, and statistical emulation of randomness, where uniformity and streaming throughput are paramount (Dick et al., 2012, Moghadam et al., 2023, Robertson, 2024, Hofer, 20 Jan 2025).

A plausible implication is that as higher-dimensional and higher-order sequences over $\mathbb{F}_2$ or $\mathbb{F}_q$ continue to be analyzed, and as more explicit constructions tied to number-theoretic conjectures (e.g., the $p(t)$-adic Littlewood Conjecture) emerge, further improvements in both rate constants and hardware integration may be achieved. Nevertheless, current constructions with finite-deficiency matrices and hybrid digital architectures already attain the best-known discrepancy orders while remaining efficient in hardware.
