
Probabilistic Neural Circuits (2403.06235v1)

Published 10 Mar 2024 in cs.LG, cs.AI, cs.NE, and stat.ML

Abstract: Probabilistic circuits (PCs) have gained prominence in recent years as a versatile framework for discussing probabilistic models that support tractable queries and are yet expressive enough to model complex probability distributions. Nevertheless, tractability comes at a cost: PCs are less expressive than neural networks. In this paper we introduce probabilistic neural circuits (PNCs), which strike a balance between PCs and neural nets in terms of tractability and expressive power. Theoretically, we show that PNCs can be interpreted as deep mixtures of Bayesian networks. Experimentally, we demonstrate that PNCs constitute powerful function approximators.

References (37)
  1. EMNIST: Extending MNIST to handwritten letters. In 2017 international joint conference on neural networks.
  2. Continuous mixtures of tractable probabilistic models. In AAAI Conference on Artificial Intelligence.
  3. Juice: A julia package for logic and probabilistic circuits. In AAAI Conference on Artificial Intelligence.
  4. Sparse probabilistic circuits via pruning and growing. In Advances in Neural Information Processing Systems.
  5. Darwiche, A. 2001. Decomposable negation normal form. Journal of the ACM (JACM), 48(4): 608–647.
  6. Darwiche, A. 2003. A differential approach to inference in Bayesian networks. Journal of the ACM, 50(3): 280–305.
  7. Darwiche, A. 2011. SDD: A new canonical representation of propositional knowledge bases. In Twenty-Second International Joint Conference on Artificial Intelligence.
  8. Shallow vs. deep sum-product networks. Advances in neural information processing systems, 24.
  9. Deng, L. 2012. The mnist database of handwritten digit images for machine learning research. IEEE signal processing magazine.
  10. Random probabilistic circuits. In Uncertainty in Artificial Intelligence.
  11. Hanson, S. J. 1990. A stochastic version of the delta rule. Physica D: Nonlinear Phenomena, 42(1-3): 265–272.
  12. Integer discrete flows and lossless compression. In Advances in Neural Information Processing Systems.
  13. On the expressive power of deep polynomial neural networks. Advances in neural information processing systems, 32.
  14. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  15. Bit-swap: Recursive bits-back coding for lossless compression with hierarchical latent variables. In International Conference on Machine Learning.
  16. Probabilistic sentential decision diagrams. In Fourteenth International Conference on the Principles of Knowledge Representation and Reasoning.
  17. Elevating perceptual sample quality in PCs through differentiable sampling. In NeurIPS 2021 workshop on pre-registration in machine learning. PMLR.
  18. Learning logistic circuits. In AAAI Conference on Artificial Intelligence.
  19. Lossless Compression with Probabilistic Circuits. In International Conference on Learning Representations.
  20. Tractable regularization of probabilistic circuits. In Advances in Neural Information Processing Systems.
  21. Fixing weight decay regularization in adam.
  22. On the expressive efficiency of sum product networks. arXiv preprint arXiv:1411.7717.
  23. Einsum networks: Fast and scalable learning of tractable probabilistic circuits. In International Conference on Machine Learning.
  24. On theoretical properties of sum-product networks. In Artificial Intelligence and Statistics.
  25. Random sum-product networks: A simple and effective approach to probabilistic deep learning. In Uncertainty in Artificial Intelligence.
  26. Sum-product networks: A new deep architecture. In 2011 IEEE International Conference on Computer Vision Workshops. IEEE.
  27. Conditional sum-product networks: Modular probabilistic circuits via gate functions. International Journal of Approximate Reasoning, 140: 298–313.
  28. Sum-product-quotient networks. In International Conference on Artificial Intelligence and Statistics.
  29. Hyperspns: Compact and expressive probabilistic circuits. Advances in Neural Information Processing Systems.
  30. Training and Inference on Any-Order Autoregressive Models the Right Way. Advances in Neural Information Processing Systems.
  31. A note on the evaluation of generative models. In International Conference on Learning Representations.
  32. A deep and tractable density estimator. In International Conference on Machine Learning.
  33. Conditional image generation with pixelcnn decoders. Advances in neural information processing systems, 29.
  34. A compositional atlas of tractable circuit operations for probabilistic inference. In Advances in Neural Information Processing Systems.
  35. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747.
  36. Probabilistic generating circuits. In International Conference on Machine Learning.
  37. A unified approach for learning the parameters of sum-product networks. Advances in neural information processing systems.

Summary

  • The paper introduces PNCs that merge the tractability of probabilistic circuits with the expressiveness of neural networks by relaxing decomposability constraints.
  • It presents a layered architecture using conditional probabilistic circuits and convolutional neural sum layers to efficiently approximate complex dependencies.
  • Experimental results demonstrate superior density estimation on MNIST and competitive classification performance, underscoring their practical potential.

This paper introduces Probabilistic Neural Circuits (PNCs), a class of models designed to bridge the gap between the tractability of Probabilistic Circuits (PCs) and the expressive power of neural networks. PCs, while allowing for efficient probabilistic inference like marginalization, suffer from limited expressiveness due to structural constraints (smoothness and decomposability). PNCs aim to relax these constraints selectively to gain expressiveness while retaining some tractability.

Core Concepts and Implementation:

  1. Conditional Probabilistic Circuits (CPCs): The foundation for PNCs. CPCs generalize PCs by introducing a partial order ($\sqsubset$) over the random variables, inspired by Bayesian networks. Computations within a CPC unit are conditioned on the values of parent variables according to this order.
    • A CPC unit $k$ computes $p_k(\mathcal{X}_n \mid \mathcal{X}_{pa(n)})$, where $\mathcal{X}_n$ is the scope of the unit and $\mathcal{X}_{pa(n)}$ are the parents according to the partial order.
    • CPCs maintain conditional versions of smoothness and decomposability.
    • Theoretically, valid CPCs (conditionally smooth, conditionally decomposable, with normalized sum weights) represent deep mixtures of Bayesian networks.
  2. Probabilistic Neural Circuits (PNCs): PNCs are introduced as a tractable approximation of CPCs. The key challenge in CPCs is that sum units require evaluating potentially exponentially many conditional distributions based on parent instantiations. PNCs approximate this:
    • The sum unit computation is modified:

      $$p_k(\mathcal{X}_n \mid \mathcal{X}_{pa(n)}) = \sum_{j\in\mathcal{I}(k)} \phi_{kj}(\mathcal{X}_{an(n)})\, p_j(\mathcal{X}_n)$$

      Here, $\phi_{kj}$ is a neural network that takes the values of the ancestors $\mathcal{X}_{an(n)}$ as input and outputs normalized weights ($\sum_j \phi_{kj} = 1$). The $p_j(\mathcal{X}_n)$ are the outputs of the child units, without the explicit conditioning seen in CPCs.

    • This neural network $\phi_{kj}$ replaces the complex conditional probability ratio derived from Bayes' rule, making the computation depend only on the ancestor values and the unconditioned child distributions.

  3. Layered PNC Construction: A practical method for building PNCs is proposed, based on layered PC structures like those of Shih et al. (HyperSPNs), but with added neural dependencies.
    • The structure alternates sum and product operations layer-wise, operating on partitions of variables.
    • Neural Sum Layer: The core implementation detail. This layer computes:

      $$\kappa_{l,p,2,c} = \sum_{c'=1}^{N_C} \phi_{l,p,c,c'}(\mathcal{K}_{ancestors}) \times \kappa_{l,p,1,c'}$$

      where $\kappa$ represents the value of a computational component (indexed by layer $l$, partition $p$, input/output $i$, and component $c$). The neural network $\phi_{l,p,c,c'}$ computes the weights for component $c$ in partition $p$ based on the outputs ($\mathcal{K}_{ancestors}$) of components in preceding partitions ($p-\nu_{l,p}$ to $p-1$) within the same layer $l$. The hyperparameter $\nu$ determines the range of this dependency.

    • These intra-layer dependencies via $\phi$ induce the partial order ($\sqsubset$) on the variables.

  4. Convolutional Implementation: The NeuralSumLayer is efficiently implemented using Convolutional Neural Networks (CNNs).
    • Partitions are treated as the spatial dimension(s).
    • Components ($N_C$) are treated as channels.
    • "Half kernels" (masked convolutions) are used to ensure that the computation for a partition only depends on preceding partitions, respecting the induced variable order. This is crucial for maintaining tractability.
    • A final softmax activation ensures that the neural network outputs ($\phi$) sum to 1 for each partition's summation.
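The neural sum layer above can be sketched in a few lines of NumPy. This is an illustrative approximation, not the paper's implementation (which uses masked convolutions in PyTorch): the one-layer weight network (`W`, `b`), the zero-padding at the boundary partitions, and the single dependency range `nu` are all simplifying assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def neural_sum_layer(kappa_in, W, b, nu=1):
    """Toy neural sum layer.

    kappa_in : (P, NC) array of child-unit values, one row per partition.
    W, b     : parameters of a (hypothetical) one-layer network mapping the
               flattened values of the `nu` preceding partitions to NC*NC
               unnormalized mixing weights.
    Returns kappa_out of shape (P, NC), with
        kappa_out[p, c] = sum_{c'} phi[p, c, c'] * kappa_in[p, c'].
    """
    P, NC = kappa_in.shape
    kappa_out = np.empty_like(kappa_in)
    for p in range(P):
        # "half kernel": only partitions p-nu .. p-1 feed the weight network
        lo = max(0, p - nu)
        ctx = kappa_in[lo:p].flatten()
        ctx = np.pad(ctx, (nu * NC - ctx.size, 0))  # left-pad at the boundary
        logits = (W @ ctx + b).reshape(NC, NC)
        phi = softmax(logits, axis=1)               # each row sums to 1
        kappa_out[p] = phi @ kappa_in[p]
    return kappa_out
```

One sanity check follows directly from the softmax normalization: if every child value is 1 (as after marginalizing all variables below), every output is 1 as well, regardless of the network parameters.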

Tractability:

  • Density Evaluation: Computing $p(\mathbf{x})$ for a full evidence vector $\mathbf{x}$ is linear in the size of the circuit, just like for standard PCs.
  • Ordered Marginals/Conditionals: Computing $p(\mathcal{X}_e)$ or $p(\mathcal{X}_o \mid \mathcal{X}_e)$ is tractable (polynomial time) if the variables being marginalized out ($\mathcal{X}_m$) come after the evidence/query variables in the partial order induced by the neural dependencies (i.e., $\mathcal{X}_e \sqsubset \mathcal{X}_m$ and $\mathcal{X}_o \sqsubset \mathcal{X}_m$).
    • The marginalization algorithm works by setting the leaf nodes of marginalized variables to 1 and performing a forward pass. Conditional decomposability and the structure of the neural sum units (whose normalized weights integrate to 1) allow integrals/summations to be pushed down correctly, as in standard PCs, provided the order is respected. Arbitrary marginalization is generally not tractable.
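The push-down of summation can be verified on a toy two-variable circuit with order $X_1 \sqsubset X_2$. All numbers below are made up for illustration, and the lookup-table `phi` stands in for the neural weight function:

```python
import numpy as np

# Toy PNC over two binary variables, X1 before X2 in the order:
#   p(x1, x2) = p1(x1) * sum_j phi_j(x1) * q_j(x2)
p1  = np.array([0.3, 0.7])                   # leaf p1(x1) for x1 in {0, 1}
phi = np.array([[0.8, 0.2], [0.4, 0.6]])     # phi_j(x1): rows sum to 1
q   = np.array([[0.9, 0.1], [0.2, 0.8]])     # leaf q_j(x2): row j, column x2

def density(x1, x2):
    return p1[x1] * phi[x1] @ q[:, x2]

def marginal_x1(x1):
    # ordered marginalization of X2: set its leaves to 1 and re-evaluate
    return p1[x1] * phi[x1] @ np.ones(2)

for x1 in (0, 1):
    brute = sum(density(x1, x2) for x2 in (0, 1))
    assert np.isclose(marginal_x1(x1), brute)
    assert np.isclose(marginal_x1(x1), p1[x1])  # normalized weights vanish
```

Marginalizing $X_1$ instead would not work this way: the weights $\phi_j(x_1)$ depend on $X_1$, so the summation cannot be pushed past them, which is exactly why only order-respecting marginals are tractable.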

Experimental Results and Applications:

  • Density Estimation: PNCs achieve state-of-the-art or competitive performance (measured in bits per dimension) on MNIST variants, outperforming standard PCs (PSCs), their implementation of SPQNs (PQCs), and other PC methods like HCLT and RAT-SPN. This demonstrates the practical benefit of increased expressiveness from the neural components.
  • Classification: PNCs were adapted for discriminative tasks by training one circuit per class and using a cross-entropy loss. They outperformed PQCs but lagged behind Logistic Circuits and RAT-SPNs. The paper suggests this might be due to suboptimal regularization, as PNCs achieved very high training accuracy, indicating potential overfitting.
  • Implementation Details: Experiments used PyTorch Lightning on Nvidia V100 GPUs. Models had roughly 2.6-2.8M parameters for the PNC/PQC/PSC comparison. A specific layered architecture alternating row/column merges with 12 components per partition was used for image tasks.
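The discriminative setup described above (one circuit per class, cross-entropy loss) can be sketched as follows; the `log_density` function is a hypothetical placeholder for a per-class PNC forward pass, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, dim = 3, 4
theta = rng.standard_normal((n_classes, dim))  # toy per-class parameters

def log_density(x, k):
    # placeholder for log p_k(x) as computed by class k's circuit
    return -0.5 * np.sum((x - theta[k]) ** 2)

def cross_entropy(x, y):
    # class posterior via softmax over the per-class log-densities
    logp = np.array([log_density(x, k) for k in range(n_classes)])
    logp -= logp.max()                             # numerical stability
    log_post = logp - np.log(np.exp(logp).sum())   # log softmax
    return -log_post[y]

x = rng.standard_normal(dim)
loss = cross_entropy(x, y=1)
```

Training then minimizes this loss over the parameters of all class circuits jointly, rather than maximizing per-class likelihoods separately.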

Limitations and Future Work:

  • Tractability: Only ordered marginalization is guaranteed to be tractable.
  • Structure Learning: The paper uses a predefined structure; learning optimal PNC structures remains an open question.
  • Regularization: Effective regularization techniques for discriminative training with PNCs need further investigation.
  • Future Directions: Exploring sampling, applying PNCs to tabular data, lossless compression, and potentially linking to any-order autoregressive models.

Comparison to Related Work:

  • SPQNs: PNCs are presented as a generalization and simplification. They achieve the goal of relaxing decomposability using neural networks within sum units, rather than introducing explicit quotient units. The paper shows CMO-SPQNs are a restricted form of PNCs.
  • Conditional SPNs (Shao et al.): These condition sum weights on external variables, modelling a single conditional distribution. PNCs condition internally based on ancestor variables within the circuit, modelling mixtures of Bayesian networks.

In summary, PNCs offer a practical way to enhance the expressiveness of probabilistic circuits using neural networks embedded within their sum units. This is achieved by approximating conditional dependencies, leading to models that perform well on density estimation tasks while retaining tractability for ordered inference queries. The convolutional implementation provides an efficient pathway for applying PNCs to structured data like images.
