
Probabilistic Circuits Overview

Updated 24 December 2025
  • Probabilistic circuits are tractable models that use alternating sum and product nodes to represent complex, high-dimensional probability distributions.
  • Variants such as monotone, squared, and Inception circuits offer distinct trade-offs in expressive efficiency and computational complexity, with Inception circuits unifying and extending the others.
  • Empirical evaluations on image and tabular datasets show that these models support efficient inference, with Inception circuits exhibiting superior performance in practice.

Probabilistic circuits (PCs) are a general class of tractable probabilistic models that represent unnormalized or normalized functions—typically probability mass or density functions—through computation graphs composed of alternating weighted sum and product nodes over leaf distributions. Their primary utility lies in expressing and learning complex, high-dimensional distributions with structure that ensures efficient (often linear or polynomial time) computation of key probabilistic queries, including marginals, conditionals, most-probable explanations, and moments. The field encompasses purely nonnegative models (monotone circuits), models permitting negative or complex parameters (such as squared and Inception circuits), and a range of architectures for improved expressivity, representation learning, regularization, and efficient inference.

1. Mathematical Formalism of Probabilistic Circuits

A probabilistic circuit over variables $V = \{X_1, \ldots, X_d\}$ is a rooted directed acyclic graph (DAG) whose internal nodes alternate between sum nodes and product nodes, and whose leaves each compute a tractable distribution or indicator function $f_n(x_W)$ (typically univariate) over some scope $W \subseteq V$.

  • For a sum node $n$, the operation is:

$$f_n(x) = \sum_i w_{n,i}\, f_{n_i}(x)$$

where the $w_{n,i}$ are weights (typically nonnegative, but possibly real or complex in generalized settings).

  • For a product node $n$:

$$f_n(x) = \prod_i f_{n_i}(x)$$

  • The value at the root, $f_C(x)$, defines the unnormalized, normalized, or functionally transformed value of the model at input $x$.

Smoothness holds if the children of every sum node share the same scope; decomposability holds if the children of every product node have disjoint scopes. These properties underlie the tractability of marginals and other inference queries: the normalization constant

$$Z = \sum_x f_C(x)$$

is computable in $O(|C|)$ time, i.e., linear in the number of edges of the circuit, via a bottom-up dynamic program.

When product nodes that share the same scope decompose it in the same way, the circuit is said to be structured-decomposable, a property necessary for efficient exact circuit operations such as multiplication and squaring (Wang et al., 1 Aug 2024).
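
The bottom-up evaluation underlying these tractability results can be made concrete on a toy circuit. The following sketch (our own illustrative Python, not from any referenced implementation) evaluates the simplest smooth and decomposable PC—a single sum node over product nodes, i.e., a mixture of factorized Bernoullis—and computes marginals by setting marginalized leaves to 1:

```python
# Minimal sketch of bottom-up evaluation in a probabilistic circuit.
# Illustrative only: all names (bernoulli_leaf, evaluate, ...) are ours.
import numpy as np

weights = np.array([0.3, 0.7])            # sum-node weights w_{n,i}
theta = np.array([[0.9, 0.2],             # component 0: P(X1=1), P(X2=1)
                  [0.1, 0.8]])            # component 1

def bernoulli_leaf(p1, x):
    """Leaf value f(x); x=None marks a marginalized variable (leaf -> 1)."""
    if x is None:
        return 1.0                        # sums the leaf over its support
    return p1 if x == 1 else 1.0 - p1

def evaluate(x1, x2):
    """One bottom-up pass: leaf values, products, then the weighted sum."""
    products = np.array([bernoulli_leaf(theta[k, 0], x1) *
                         bernoulli_leaf(theta[k, 1], x2)
                         for k in range(len(weights))])
    return float(weights @ products)      # root value f_C(x)

print(evaluate(1, 0))        # joint probability p(X1=1, X2=0) = 0.230
print(evaluate(1, None))     # marginal p(X1=1) in the same single pass
print(evaluate(None, None))  # normalization constant Z = 1 here
```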

2. Expressive Power: Monotone, Squared, and Inception Circuits

2.1 Monotone and Squared Circuits

  • Monotone PCs: All weights and leaves are nonnegative; $f_C(x) \ge 0$ for all $x$, so

$$p_1(x) = \frac{f_C(x)}{Z}$$

can represent any nonnegative distribution compatible with the structure. All standard probabilistic queries are tractable.

  • Squared PCs: Permit real (potentially negative) weights. The normalized density is

$$p_2(x) = \frac{f_C(x)^2}{\sum_x f_C(x)^2},$$

allowing certain distributions to be represented far more compactly than is possible with monotone circuits. The squared function $f_C(x)^2$ can itself be represented as a new structured-decomposable circuit of size $O(|C|^2)$. A brute-force illustration follows.
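
The sketch below (illustrative Python; in practice the normalizer $\sum_x f_C(x)^2$ is computed exactly on the $O(|C|^2)$ squared circuit rather than by enumeration) shows how negative weights still yield a valid density after squaring:

```python
# Illustrative sketch of a squared PC over two binary variables.
# f_C may go negative (real weights); squaring yields a valid density.
import itertools
import numpy as np

weights = np.array([1.0, -0.6])           # real, possibly negative
theta = np.array([[0.9, 0.2],
                  [0.1, 0.8]])

def f_C(x1, x2):
    leaves = np.array([(theta[k, 0] if x1 else 1 - theta[k, 0]) *
                       (theta[k, 1] if x2 else 1 - theta[k, 1])
                       for k in range(2)])
    return float(weights @ leaves)        # can be negative

# brute-force normalization over the 2^2 domain (for clarity only)
Z2 = sum(f_C(*x) ** 2 for x in itertools.product([0, 1], repeat=2))
p2 = {x: f_C(*x) ** 2 / Z2 for x in itertools.product([0, 1], repeat=2)}
print(p2)                                 # nonnegative, sums to 1
```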

Incomparability Theorems: Recent results established mutual exponential separations in expressive efficiency. There exist distributions that admit polynomial-size squared PCs but require exponential-size monotone PCs, and conversely, distributions for which monotone PCs are exponentially more succinct than squared PCs (Wang et al., 1 Aug 2024). Hence, monotone and squared PC classes are provably incomparable in their succinctness for representing general distributions.

2.2 Inception Probabilistic Circuits

Inception PCs generalize both monotone and squared PCs by introducing two independent latent variables per sum node, enabling both pre- and post-squaring marginalizations within the computation graph, with parameters permitted to be complex. The resulting family encompasses both monotone (by setting $K_W = 1$) and squared ($K_U = 1$) circuits as special cases. The explicit distribution is

$$p_{\mathrm{Inception}}(x) = \frac{\sum_u \left|\sum_w f_{C_{\mathrm{aug}}}(x,u,w)\right|^2}{\sum_{x,u} \left|\sum_w f_{C_{\mathrm{aug}}}(x,u,w)\right|^2}$$

where $C_{\mathrm{aug}}$ denotes the circuit augmented with the two latent variables $u$ and $w$, of cardinalities $K_U$ and $K_W$.

Inception PCs can be compiled into smooth, structured-decomposable circuits of size $O(|C|^2)$. All inference operations remain tractable, and empirical evidence shows they consistently outperform both monotone and squared circuits across standard datasets, particularly with complex-valued weights and moderate latent cardinalities $K_U, K_W$ (Wang et al., 1 Aug 2024).
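
The following toy sketch brute-forces the Inception density on a two-variable binary domain; the parameterization of $f_{C_{\mathrm{aug}}}$ below is a hypothetical stand-in chosen only to make the sum-then-square structure explicit:

```python
# Toy numerical sketch of the Inception PC density
#   p(x) ∝ sum_u | sum_w f_aug(x, u, w) |^2
# with complex parameters, brute-forced over a small binary domain.
import itertools
import numpy as np

rng = np.random.default_rng(0)
K_U, K_W, d = 2, 2, 2                     # latent cardinalities, #variables
# hypothetical complex leaf parameters: amp[u, w, i, v] is the leaf
# value for X_i = v under latent assignment (u, w)
amp = (rng.normal(size=(K_U, K_W, d, 2)) +
       1j * rng.normal(size=(K_U, K_W, d, 2)))

def f_aug(x, u, w):
    """Augmented-circuit value: a product of complex leaf values."""
    return np.prod([amp[u, w, i, x[i]] for i in range(d)])

def unnormalized(x):
    # post-squaring sum over u, pre-squaring sum over w
    return sum(abs(sum(f_aug(x, u, w) for w in range(K_W))) ** 2
               for u in range(K_U))

domain = list(itertools.product([0, 1], repeat=d))
Z = sum(unnormalized(x) for x in domain)
p = {x: unnormalized(x) / Z for x in domain}
print(p)                                  # valid distribution: >= 0, sums to 1
```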

3. Parameter Learning and Inference Algorithms

Parameter learning in PCs typically minimizes the negative log-likelihood over a dataset $\mathcal{D}$:

$$-\sum_{x \in \mathcal{D}} \log p(x)$$

For monotone, squared, and Inception PCs alike, stochastic gradient descent is generally effective: differentiable parameterizations support optimizers such as Adam, and Wirtinger derivatives handle complex parameters.
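
A minimal gradient-based training sketch, assuming PyTorch is available and reusing the toy mixture circuit from Section 1 (softmax/sigmoid reparameterizations keep the circuit monotone; all names are ours):

```python
# Fit the sum-node weights and Bernoulli leaves of a small monotone PC
# by minimizing negative log-likelihood with autograd.
import torch

data = torch.tensor([[1., 0.], [1., 1.], [0., 0.], [1., 0.]])  # toy dataset
logits_w = torch.zeros(2, requires_grad=True)       # sum-node weight logits
logits_t = torch.zeros(2, 2, requires_grad=True)    # leaf parameter logits
opt = torch.optim.Adam([logits_w, logits_t], lr=0.1)

for step in range(200):
    w = torch.softmax(logits_w, dim=0)              # nonnegative, sums to 1
    t = torch.sigmoid(logits_t)                     # Bernoulli means in (0,1)
    # leaf likelihoods per (example, component, variable)
    leaf = data.unsqueeze(1) * t + (1 - data.unsqueeze(1)) * (1 - t)
    p = (w * leaf.prod(dim=2)).sum(dim=1)           # normalized f_C(x)
    nll = -torch.log(p).mean()
    opt.zero_grad()
    nll.backward()
    opt.step()

print(float(nll))                                   # final training NLL
```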

Alternatively, the Expectation-Maximization (EM) framework provides efficient updates by treating sum-node choices as latent variables. The surrogate EM objective aligns with a mirror-descent step on the log-likelihood with a Kullback–Leibler regularization on joint parameter changes. For large datasets, a mini-batch EM variant with a theoretically grounded objective compensates for the incomplete view of the data by upweighting the regularization term, resulting in a robust and empirically superior parameter-update strategy (Liu et al., 26 May 2025).
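
For intuition, a single full-batch EM step for the same toy monotone circuit looks as follows; this is the classical closed-form update with sum-node choices as latents, not the mini-batch variant of Liu et al.:

```python
# One EM iteration for a toy monotone PC (mixture of factorized
# Bernoullis), treating the sum-node choice as the latent variable.
import numpy as np

data = np.array([[1, 0], [1, 1], [0, 0], [1, 0]], dtype=float)
w = np.array([0.5, 0.5])                    # sum-node weights
theta = np.array([[0.9, 0.2], [0.1, 0.8]])  # Bernoulli means per component

def em_step(w, theta):
    # E-step: posterior responsibility of each sum-node child per example
    leaf = data[:, None, :] * theta + (1 - data[:, None, :]) * (1 - theta)
    joint = w * leaf.prod(axis=2)           # shape (N, K)
    resp = joint / joint.sum(axis=1, keepdims=True)
    # M-step: closed-form re-estimation from expected counts
    w_new = resp.mean(axis=0)
    theta_new = (resp[:, :, None] * data[:, None, :]).sum(0) \
                / resp.sum(0)[:, None]
    return w_new, theta_new

w, theta = em_step(w, theta)
print(w, theta)
```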

For models with complex latents (Inception PCs), future research aims at explicit EM formulations handling both sets of latent variables.

Inference (marginalization, conditioning, MAP) in monotone, squared, and Inception circuits reduces to a bottom-up pass and an optional top-down pass through the circuit. The overall complexity is $O(|C|)$ for monotone PCs and $O(|C|^2)$ for squared and Inception PCs. Complex weights are handled by propagating complex arithmetic in a log-modulus/argument representation for numerical stability (Wang et al., 1 Aug 2024).
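
A sketch of the log-modulus/argument trick: each complex value $z$ is carried as $(\log|z|, \arg z)$, and a sum node combines its children with a logsumexp-style rescaling (illustrative code, not taken from the cited work):

```python
# Numerically stable summation of complex values held in
# (log-modulus, phase) form, as a complex sum node would require.
import numpy as np

def complex_logsumexp(log_mod, phase):
    """Return (log|s|, arg s) for s = sum_i exp(log_mod_i + i*phase_i)."""
    m = np.max(log_mod)                         # factor out the largest modulus
    s = np.sum(np.exp(log_mod - m) * np.exp(1j * phase))
    return m + np.log(np.abs(s)), np.angle(s)

# sanity check against direct summation (which can overflow in general)
z = np.array([2 + 1j, -1 + 0.5j, 0.3 - 2j])
lm, ph = complex_logsumexp(np.log(np.abs(z)), np.angle(z))
print(np.exp(lm) * np.exp(1j * ph), z.sum())    # the two should match
```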

4. Empirical Performance and Specialization

Extensive experimental results on image (MNIST, FashionMNIST, EMNIST) and tabular datasets demonstrate the following test-set bits-per-dimension (lower is better):

| Dataset | Monotone PC | Squared PC (real) | Squared PC (complex) | Inception PC ($K_U=2$) | Inception PC ($K_U=8$) |
|---|---|---|---|---|---|
| MNIST | 1.305 | 1.296 | 1.253 | 1.247 | 1.245 |
| EMNIST Letters | 1.908 | 1.881 | 1.868 | 1.854 | 1.853 |
| FashionMNIST | 3.562 | 3.580 | 3.501 | 3.470 | 3.464 |
The trend is consistent: complex squared PCs outperform their real counterparts, and Inception PCs achieve the best results, with the advantage plateauing beyond moderate latent cardinality (Wang et al., 1 Aug 2024).
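
For reference, bits-per-dimension is the mean negative log-likelihood converted from nats to bits and divided by the data dimensionality (e.g., 784 pixels for a 28x28 image); a quick sketch of the conversion with illustrative numbers:

```python
# bpd = mean NLL (nats) / (num_dims * ln 2); numbers here are made up.
import numpy as np

def bits_per_dimension(mean_nll_nats, num_dims):
    return mean_nll_nats / (num_dims * np.log(2.0))

print(bits_per_dimension(708.9, 784))   # ~1.304 bpd for a 28x28 image
```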

This superiority is attributed to (i) smoother optimization landscapes under complex parameters and (ii) the strict generality of the Inception PC construction, which interleaves sums and squares.

5. Theoretical Significance, Extensions, and Outlook

The discovery of mutual incomparability between monotone and squared probabilistic circuits fundamentally redefines the understanding of the expressive landscape for tractable probabilistic modeling. Inception PCs, unifying both paradigms and enabling richer compositional mixtures (including complex-valued and tensorized parameterizations), open the field to new directions:

  • Rich latent structure: Multiple distinct layers of latent summations and squaring yield a strictly more expressive architecture.
  • Algorithmic development: EM-style methods for multiple interacting sets of latents, and more efficient circuit compilation strategies, are now emerging research targets.
  • Quantum models: The connection between Inception PCs and mixed-state tensor networks introduces avenues linking tractable probabilistic modeling and quantum graphical models.
  • Practical implementations: Tractable learning and inference are demonstrably preserved (quadratic time) while enabling superlinear expressivity gains, readily transferable to large-scale applications (Wang et al., 1 Aug 2024).

In summary, probabilistic circuits encompass a rich spectrum of tractable, compositional probabilistic models whose extensions—both in expressive power and algorithmic machinery—are advancing mathematical and statistical modeling. Inception probabilistic circuits now play a central unifying role, empirically and theoretically, and set the foundation for further integration with complex, latent-variable, and quantum-inspired modeling (Wang et al., 1 Aug 2024).
