All-Transistor Probabilistic Hardware
- All-transistor probabilistic hardware is defined by networks of tunable p-bits that use transistor noise and analog nonlinearity for robust stochastic computation.
- Its circuit architectures integrate CMOS-only, hybrid CMOS-nanodevice, and novel transynapse elements to achieve massive parallelism and energy-efficient inference.
- Hardware-algorithm co-design enables in-situ learning, accelerated probabilistic inference, and optimized performance in combinatorial optimization and generative modeling tasks.
All-transistor probabilistic hardware refers to the class of physical computation architectures that employ networks of transistors (often augmented by CMOS-compatible nanodevice elements) to implement stochastic, tunable, binary logic primitives ("p-bits") for hardware-accelerated probabilistic inference, randomized algorithms, and energy-based modeling. These systems exploit intrinsic or engineered device fluctuations, combined with programmable biasing, to perform sampling and inference directly in hardware, with substantial gains in efficiency and scalability over conventional deterministic digital logic.
1. Physical Foundations and Device Models
All-transistor probabilistic hardware is based on the p-bit, a binary stochastic entity whose output rapidly fluctuates between two states ($-1$ or $+1$), with the probability of each state controlled by an input bias. The underlying device implementations fall into three broad categories:
- CMOS-only (transistor-based): P-bits realized entirely using transistor noise (e.g., shot noise in subthreshold MOSFETs), digital pseudo-random number generators (LFSR), and thresholding logic. Examples include current-mode neuron update circuits (Jhonsa et al., 18 Apr 2025), analog subthreshold sampling cells for energy-based models (Jelinčič et al., 28 Oct 2025), and standard cell-based probabilistic networks.
- Hybrid CMOS + nanodevice (e.g., MTJ): Integration of stochastic magnetic tunnel junctions (MTJs) with one or more transistors, leveraging thermal fluctuations in low-barrier magnets for true random number generation. The archetype is the 1T/1MTJ cell, wherein a transistor sets the bias and the MTJ supplies the fluctuating resistance (Camsari et al., 2018, Daniel et al., 2023, Hassan et al., 2018).
- Device-physics-inspired elements ("transynapse"): Nanoscale devices engineered to provide gain, directionality, and stochastic transfer functions, explicitly modeled with Langevin or stochastic LLG equations (Behin-Aein et al., 2016).
Mathematically, hardware p-bits implement the binary stochastic neuron model
$$ m_i = \mathrm{sgn}\!\left[\tanh(I_i) + r\right], \qquad r \sim \mathrm{Uniform}(-1, +1), $$
where $I_i$ is the input (computed as a weighted sum of the states of other p-bits plus a bias), $r$ is uniformly distributed noise, and $\mathrm{sgn}(\cdot)$ denotes sign thresholding. In analog realizations, the input-to-output transfer may be implemented using transistor subthreshold nonlinearity, programmable current summation, and analog comparators (Jhonsa et al., 18 Apr 2025, Jelinčič et al., 28 Oct 2025).
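To make this update rule concrete, the following is a minimal behavioral sketch in Python, not a circuit model; the function name `pbit_update` and the ±1 state convention are illustrative assumptions rather than anything prescribed by the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

def pbit_update(I_i: float) -> int:
    """Behavioral p-bit model: sgn(tanh(I_i) + r), with r ~ Uniform(-1, 1).

    Large positive I_i pins the output near +1, large negative near -1,
    and I_i = 0 yields an unbiased coin flip.
    """
    r = rng.uniform(-1.0, 1.0)
    return 1 if np.tanh(I_i) + r >= 0 else -1

# The empirical mean over many updates approaches tanh(I_i),
# which is the tunable randomness the hardware cells provide physically.
samples = [pbit_update(0.5) for _ in range(10_000)]
print(np.mean(samples), np.tanh(0.5))  # both ≈ 0.46
```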
2. Circuit-Level Architectures and Interconnection
Probabilistic hardware systems instantiate large-scale networks of p-bits, each with local tunability and stochastic update, interconnected via physical circuits implementing weighted sums ("synapses"). Representative synapse realizations include:
- NeuMOS-like capacitive adders: Integer weights realized by combinatorial banks of capacitors, as in the voltage-driven PSL building block (Hassan et al., 2018).
- Current-mode analog summation: Multiplication and addition of weights via differential current mirrors, followed by analog winner-take-all circuits for activation (Jhonsa et al., 18 Apr 2025).
- Digital and mixed-signal summing: For transistor-only approaches, matrix-vector products (MVP) can be executed using parallel adders or analog crossbar arrays (Chowdhury et al., 2023).
All-transistor arrays facilitate massive parallelism: probabilistic logic gates, adder circuits, energy-based sampling steps, and chromatic Gibbs sampling (graph coloring) are realized via spatially modular, parallel update of hardware cells (Jelinčič et al., 28 Oct 2025, Jhonsa et al., 18 Apr 2025). The absence of global clocks allows asynchronous, autonomous operation with physical sampling rates determined only by intrinsic device delays and local connectivity.
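As a software-level illustration of how synaptic summation and p-bit updates compose into a network, the sketch below performs one asynchronous sweep over a coupled p-bit array. It is a behavioral sketch only; the names `sweep`, `J` (coupling matrix), and `h` (bias vector) are assumptions, not circuit elements from the cited designs.

```python
import numpy as np

rng = np.random.default_rng(1)

def sweep(J: np.ndarray, h: np.ndarray, m: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """One asynchronous sweep over a p-bit network.

    Each p-bit sees the synaptic input I_i = beta * (sum_j J[i, j] * m[j] + h[i])
    (the weighted sum a capacitive or current-mode synapse would compute),
    then resamples its state with the stochastic sign rule.
    """
    n = len(m)
    for i in rng.permutation(n):          # random update order, no global clock
        I_i = beta * (J[i] @ m + h[i])
        m[i] = 1 if np.tanh(I_i) + rng.uniform(-1, 1) >= 0 else -1
    return m

# Toy example: 8 p-bits with random symmetric couplings.
n = 8
J = rng.normal(size=(n, n)); J = (J + J.T) / 2; np.fill_diagonal(J, 0)
h = np.zeros(n)
m = rng.choice([-1, 1], size=n)
for _ in range(100):
    m = sweep(J, h, m)
print(m)
```

In hardware, each step of the inner loop maps to a physical operation: the weighted sum is produced by the capacitive or current-mode synapse, and the stochastic thresholding by the p-bit cell itself, with no sequencer imposing the update order.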
3. Hardware-Algorithm Co-design and Expressivity
Hardware limitations in weight granularity, process variation, and analog mismatch have spurred algorithmic innovations:
- Chain-of-EBMs (diffusion-like generative modeling): Instead of a single monolithic energy-based model (EBM), denoising thermodynamic models (DTMs) chain simpler, rapidly mixing EBMs in hardware (Jelinčič et al., 28 Oct 2025). Each DTM step samples a conditional Boltzmann transition of the form $p_\theta(x_{t-1} \mid x_t) \propto \exp\!\left[-E_\theta(x_{t-1}; x_t)\right]$, so the overall generative process is a sequence of easy local sampling problems rather than one hard global one. This modular chain-of-EBM architecture mitigates the mixing/expressivity tradeoff inherent in hardware sampling.
- Hardware-aware learning: In-situ contrastive divergence is performed directly on hardware, using actual noisy cell responses to correct for device-level mismatch (e.g., tanh curve shifts, bias currents), producing robust networks (Jhonsa et al., 18 Apr 2025).
- Chromatic and autonomous sampling: Graph coloring is exploited to maximize parallel hardware updates (chromatic Gibbs sampling), and device stochasticity enables fully autonomous (sequencerless) operation at scale, characterized by the "flips per second" metric (Sutton et al., 2019, Jelinčič et al., 28 Oct 2025); a minimal sketch of color-parallel updates appears after this list.
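The chromatic update strategy can be sketched in a few lines: p-bits that share no coupling are grouped into color classes and updated simultaneously, since they are conditionally independent given the rest of the network. The helper names (`greedy_coloring`, `chromatic_sweep`) and the greedy coloring heuristic are illustrative assumptions, not the specific scheme of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(2)

def greedy_coloring(J: np.ndarray) -> np.ndarray:
    """Assign each p-bit a color so that no two coupled p-bits share a color."""
    n = J.shape[0]
    colors = -np.ones(n, dtype=int)
    for i in range(n):
        taken = {colors[j] for j in range(n) if J[i, j] != 0 and colors[j] >= 0}
        c = 0
        while c in taken:
            c += 1
        colors[i] = c
    return colors

def chromatic_sweep(J, h, m, colors, beta=1.0):
    """Chromatic Gibbs sweep: all p-bits of one color update in parallel,
    since they are conditionally independent given the other colors."""
    for c in range(colors.max() + 1):
        idx = np.where(colors == c)[0]
        I = beta * (J[idx] @ m + h[idx])   # synaptic inputs for the whole color class
        r = rng.uniform(-1, 1, size=len(idx))
        m[idx] = np.where(np.tanh(I) + r >= 0, 1, -1)
    return m

# Toy sparse coupling graph (a ring of 10 p-bits), which is 2-colorable.
n = 10
J = np.zeros((n, n))
for i in range(n):
    J[i, (i + 1) % n] = J[(i + 1) % n, i] = 1.0
colors = greedy_coloring(J)
m = rng.choice([-1, 1], size=n)
for _ in range(200):
    m = chromatic_sweep(J, np.zeros(n), m, colors)
print(colors, m)
```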
4. Applications in Probabilistic Inference, Optimization, and Generative Modeling
All-transistor probabilistic computers have demonstrated practical utility across a range of probabilistic workloads:
- Combinatorial optimization: Hardware full adders, MaxCut, spin-glass/Ising solvers, subset-sum problem instances, and simulated annealing are performed via stochastic search in hardware networks (Hassan et al., 2018, Jhonsa et al., 18 Apr 2025, Sutton et al., 2019); a minimal MaxCut annealing sketch follows this list.
- Probabilistic logic and Bayesian inference: Modeling of logic gates, invertible Boolean functions, and Bayesian networks is implemented directly at the physical layer, with correct conditional distributions obtained in sequencer-free operation by careful device time-scale engineering (Faria et al., 2020, Behin-Aein et al., 2016).
- Generative modeling/diffusion models: Denoising thermodynamic models sampled in hardware achieve parity with GPU-based variational/generative algorithms (e.g., DDPM/VAE/GAN) at orders-of-magnitude lower energy consumption (~10,000× less) (Jelinčič et al., 28 Oct 2025).
- Monte Carlo and quantum emulation: Large-scale p-computer arrays accelerate standard Monte Carlo, Markov Chain Monte Carlo, and quantum-inspired algorithms (energy-based quantum many-body learning, Suzuki-Trotter transformations) compared to digital simulation (Kaiser et al., 2021, Chowdhury et al., 2023).
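As an example of the optimization workloads above, the sketch below maps MaxCut onto a p-bit network with antiferromagnetic couplings on graph edges and ramps the inverse temperature (simulated annealing). The Ising mapping is standard, but the schedule, the function name `maxcut_anneal`, and the toy instance are illustrative assumptions rather than a reproduction of any cited experiment.

```python
import numpy as np

rng = np.random.default_rng(3)

def maxcut_anneal(edges, n, sweeps=500, beta_max=5.0):
    """Approximate MaxCut with a p-bit network and a simulated-annealing schedule.

    Energy convention: E(m) = -(1/2) * sum_ij J[i, j] * m_i * m_j with J[i, j] = -1
    on graph edges, so low energy corresponds to many edges crossing the cut.
    """
    J = np.zeros((n, n))
    for i, j in edges:
        J[i, j] = J[j, i] = -1.0
    m = rng.choice([-1, 1], size=n)
    for t in range(sweeps):
        beta = beta_max * (t + 1) / sweeps        # linear inverse-temperature ramp
        for i in rng.permutation(n):
            I_i = beta * (J[i] @ m)
            m[i] = 1 if np.tanh(I_i) + rng.uniform(-1, 1) >= 0 else -1
    cut = sum((1 - m[i] * m[j]) // 2 for i, j in edges)
    return m, cut

# Toy instance: a 5-cycle, whose optimal cut is 4.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
m, cut = maxcut_anneal(edges, n=5)
print(m, cut)
```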
5. Energy, Speed, and Scalability Metrics
Fundamental performance advances of all-transistor probabilistic hardware are quantified in energy per probabilistic bit flip and sampling throughput:
| Platform (Hardware) | Neurons | Flips/sec | Energy/flip |
|---|---|---|---|
| CMOS-only (current-mode) | 440 | | |
| sMTJ + CMOS (projected) | | | |
| All-transistor DTM (image generation) | | GPU-parity throughput | ~10,000× lower than GPU |
| FPGA p-computer | | | |
For image generation (e.g., Fashion-MNIST), DTM hardware projections put energy per sample roughly four orders of magnitude below comparable GPU implementations (Jelinčič et al., 28 Oct 2025). Sequencerless, autonomous architectures (enabled by device-level physical stochasticity) avoid clocking bottlenecks and are projected to attain the petaflip/sec regime at scale (Sutton et al., 2019).
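To make the flips-per-second and energy-per-flip metrics concrete, the arithmetic below shows how aggregate throughput and sustained power scale with p-bit count, per-cell update time, and flip energy. Every numerical value here is an illustrative placeholder assumption, not a measured or projected figure from the cited works.

```python
# Illustrative back-of-the-envelope scaling (all numbers are assumptions):
n_pbits = 1_000_000        # p-bits updating autonomously in parallel
t_update = 1e-9            # seconds per stochastic update of one cell
e_flip = 1e-14             # joules per flip

flips_per_sec = n_pbits / t_update          # aggregate sampling throughput
power_watts = flips_per_sec * e_flip        # sustained power at that throughput

print(f"{flips_per_sec:.1e} flips/s, {power_watts:.2f} W")
# 1.0e+15 flips/s (petaflip regime), 10.00 W
```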
6. Principal Limitations, Controversies, and Future Directions
Despite strong empirical and simulated results, all-transistor probabilistic hardware exhibits limitations and open problems:
- Analog weight granularity and synaptic precision: Integer-only weight implementations constrain model expressivity; full floating-point synapses require larger/costlier circuitry (Hassan et al., 2018).
- Hardware learning and adaptation: Dynamic reconfigurability and in-hardware online learning remain challenging; most implementations require added crossbar switches or multiplexers for programmable weights (Hassan et al., 2018, Chowdhury et al., 2023).
- Scaling effects: As network size increases, leakage, interconnect delay, signal degradation, and process variation must be mitigated, typically by error-tolerant algorithm design and in-situ hardware correction schemes (Jhonsa et al., 18 Apr 2025).
- Physical noise sources: CMOS-only approaches based on LFSRs produce pseudo-random rather than truly physical randomness, and transistor-noise (e.g., shot-noise) cells can fall short of the randomness quality of hybrid designs employing stochastic MTJs (Camsari et al., 2018, Kaiser et al., 2021).
- Expressivity vs mixing tradeoff: Monolithic EBMs mix slowly as expressivity grows; the DTM architecture circumvents this but relies on co-design between training algorithms and circuit topology (Jelinčič et al., 28 Oct 2025).
- Lack of standardized metrics: "Flips per second" has emerged as a substrate-agnostic benchmark (Sutton et al., 2019), but universal hardware performance standards are still being defined.
This suggests ongoing research is likely to converge around standardized, energy-proportional stochastic hardware primitives (CMOS or hybrid), modular network architectures (chain-of-EBM, graph-colored sampling), and tightly-coupled hardware-algorithm learning frameworks.
7. Schematic Table: Core Concepts and Hardware Equivalents
| Probabilistic Model Concept | Hardware Primitive | Key Implementation Detail |
|---|---|---|
| Binary stochastic neuron (p-bit) | MTJ + inverter or CMOS sampling cell | Stochastic sign/tanh activation with tunable input bias |
| Weighted sum (synapse) | Capacitor bank, Gilbert multiplier, membrane | Physical addition or current summing |
| Autonomous sampling | No clock, sequencerless network | Device time scales govern update |
| Denoising diffusion (DTM chain) | Modular Boltzmann hardware layers | Sequential, locally mixing EBMs |
| Hardware-aware learning | In-situ gradient adaptation | Mismatch corrected by response stats |
References to Key Papers
Principal results and technical details are found in (Hassan et al., 2018, Camsari et al., 2018, Sutton et al., 2019, Kaiser et al., 2021, Jelinčič et al., 28 Oct 2025, Jhonsa et al., 18 Apr 2025, Faria et al., 2020, Chowdhury et al., 2023, Behin-Aein et al., 2016, Daniel et al., 2023). Each work delineates explicit device/circuit models, mathematical update rules, SPICE and hardware-level simulations, application benchmarks, and device/circuit design guidelines.
All-transistor probabilistic hardware, whether CMOS-only or hybrid transistor/nanodevice, provides a scalable, manufacturable, and highly energy-efficient physical substrate for probabilistic computation. Advances in device design (e.g., stochastic MTJ integration, analog subthreshold sampling, current-mode circuits), autonomous parallelism, and system-level co-design enable high-throughput sampling, probabilistic inference, and optimization in AI and scientific computing workloads, with an emerging consensus around chain-of-EBM architectures and flips-per-second performance metrics.