
Discrete Probabilistic Programs Overview

Updated 22 December 2025
  • Discrete probabilistic programs are computational models that encode stochastic processes using discrete distributions, standard control structures, and observation conditioning.
  • They utilize formal semantics based on probability generating functions and weighted automata to enable exact inference and quantitative analysis.
  • Challenges include state explosion and loop handling, with advanced methods like weighted model counting and symbolic calculi addressing these limitations.

A discrete probabilistic program is a computational model that encodes stochastic processes and manipulates integer- or finite-domain-valued variables using standard control structures (assignment, sequencing, conditionals, and potentially loops) plus primitives for sampling from discrete distributions and for conditioning on observed events. These models are the foundation for representing and analyzing a wide range of stochastic systems in machine learning, formal verification, and quantitative analysis. Exact inference, the task of calculating the program's output distribution conditioned on observations, is one of the primary challenges in the area, both theoretically and algorithmically.
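As a concrete illustration of the task (a minimal sketch, not taken from any cited system), consider a program that flips two fair coins, conditions on seeing at least one heads, and returns their sum. Because the sample space is finite, the exact posterior can be computed by direct enumeration:

```python
from fractions import Fraction
from itertools import product

# Tiny discrete probabilistic program, evaluated by exhaustive enumeration:
#   x ~ Bernoulli(1/2); y ~ Bernoulli(1/2); observe(x + y >= 1); return x + y
# Exact inference = normalize the weight of each execution that survives the observation.
half = Fraction(1, 2)
posterior = {}
for x, y in product([0, 1], repeat=2):
    weight = half * half          # probability of this pair of samples
    if x + y >= 1:                # observe(x + y >= 1): discard violating runs
        s = x + y
        posterior[s] = posterior.get(s, Fraction(0)) + weight

total = sum(posterior.values())   # normalizing constant (probability of the evidence)
posterior = {s: w / total for s, w in posterior.items()}
print(posterior)                  # sum = 1 with probability 2/3, sum = 2 with probability 1/3
```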

1. Formal Structure of Discrete Probabilistic Programs

A discrete probabilistic program is generally defined in an imperative or functional syntax encompassing:

  • Variables $x \in \mathrm{Var}$ ranging over $\mathbb{N}$ (or finite domains, e.g., $\{0,1\}$ for Booleans)
  • Assignments $x := n$ (deterministic) and $x \sim D$ (sampling from a discrete distribution $D$, e.g., Bernoulli, geometric)
  • Conditionals with "rectangular" guards $x = k$ or $x < n$ (no inter-variable comparisons)
  • Sequencing $S_1 ; S_2$
  • Observations $\mathrm{observe}(x = k)$ or more general Boolean predicates on the state
  • Optional: Loops $\mathtt{while}\; G\; \mathtt{do}\; S$ or finite for-loops

For example, the language ReDiP admits only loop-free programs with rectangular guards, while other frameworks (e.g., cpGCL or extensions in (Bagnall et al., 2022)) include unbounded loops with hard conditioning and recursion (Geißler et al., 15 Dec 2025, Geißler et al., 18 Sep 2025, Chen et al., 2022, Klinkenberg et al., 2023).

Semantics are specified by a measure transformer, a probability generating function (PGF) transformer, or a denotational semantics in terms of sub-distributions over state spaces, sometimes augmented by transition systems or operational Markov chains (Geißler et al., 15 Dec 2025, Chen et al., 2022, Klinkenberg et al., 2023).

2. Algebraic and Automata-Theoretic Models

2.1. Probability Generating Functions

A central formal device is encoding sub-distributions $\mu : \mathbb{N}^k \to [0,1]$ by their probability generating function (PGF)

$$G_\mu(X_1, \dots, X_k) = \sum_{n \in \mathbb{N}^k} \mu(n)\, X_1^{n_1} \cdots X_k^{n_k}$$

This formal power series algebraically encodes the full behavior of the program as a mapping on state spaces. Program constructs correspond to PGF operations: mixture (addition), convolution, monomial shift (assignment), and projection (marginalization) (Geißler et al., 15 Dec 2025, Geißler et al., 18 Sep 2025, Klinkenberg et al., 2023, Chen et al., 2022, Klinkenberg et al., 2020).
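The correspondence between program constructs and PGF operations can be sketched with a computer algebra system. The snippet below uses sympy purely for illustration (it is not the CAS backend of any cited tool) to combine PGFs by mixture, monomial shift, and convolution, and to read probabilities off as coefficients:

```python
import sympy as sp

X = sp.symbols('X')

# PGF of x ~ Bernoulli(1/3): G(X) = 2/3 + 1/3 * X
g_x = sp.Rational(2, 3) + sp.Rational(1, 3) * X

# Assignment x := x + 1 shifts the distribution, i.e., multiplies the PGF by X (monomial shift)
g_shifted = sp.expand(X * g_x)

# Probabilistic choice (mixture): with probability 1/2 set x := 1, else x := 2
g_mix = sp.Rational(1, 2) * X + sp.Rational(1, 2) * X**2

# Sum of independent variables: convolution of distributions = product of PGFs
g_sum = sp.expand(g_x * g_mix)

# Setting an indeterminate to 1 sums out that variable; total mass of a proper distribution is 1
assert g_sum.subs(X, 1) == 1

# Read off P(sum = 2) as the coefficient of X**2
print(g_sum.coeff(X, 2))   # 1/2
```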

2.2. Weighted Automata

For finite or countably supported discrete programs, the distribution can be captured by weighted automata where states abstract program configurations, transitions represent possible updates with probabilistic weights, and accepted labels encode variable increments:

A weighted automaton $A = (Q, \Sigma, \Delta, \alpha, \beta)$ consists of states $Q$, an alphabet $\Sigma$ (usually monomials over program variables), transition weights $\Delta$, initial state weights $\alpha$, and final weights $\beta$. The program's semantics is then specified as the automaton's weight function

$$A(w) = \sum_{q_0, \dots, q_n \in Q} \alpha(q_0) \left(\prod_{i=1}^n \Delta(q_{i-1}, \sigma_i, q_i)\right) \beta(q_n)$$

for a sequence of labels $w = \sigma_1 \cdots \sigma_n$ (Geißler et al., 15 Dec 2025, Geißler et al., 18 Sep 2025).
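One way to read this weight function concretely (a schematic sketch, not the construction used in the cited papers) is to view each label as a weight matrix, so that $A(w)$ becomes a matrix product sandwiched between the initial and final weight vectors:

```python
import numpy as np

# A two-state weighted automaton over labels {'a', 'b'}.
alpha = np.array([1.0, 0.0])               # initial state weights
beta  = np.array([0.0, 1.0])               # final state weights
delta = {                                   # per-label transition weight matrices
    'a': np.array([[0.5, 0.5],
                   [0.0, 1.0]]),
    'b': np.array([[1.0, 0.0],
                   [0.0, 0.5]]),
}

def weight(word):
    """A(w) = alpha^T · Δ(σ1) · ... · Δ(σn) · beta, summing over all runs."""
    v = alpha
    for sigma in word:
        v = v @ delta[sigma]
    return float(v @ beta)

print(weight(['a', 'b']))   # 0.25: total weight of all accepting runs on the word 'ab'
```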

Each program statement transforms a weighted automaton through a corresponding algebraic operation: concatenation for sequencing, union for probabilistic choice, projection for assignment, and intersection with regular languages for conditioning.

3. Exact Inference: Computation and Algorithms

The principal task is to compute or represent the posterior distribution induced by a program and a prior, conditioned on any observations. This is approached by recursive traversal and composition of algebraic or automata-theoretic transformers associated with each program fragment.

3.1. Loop-Free Programs

For loop-free discrete probabilistic programs, recursive structural compilation to weighted automata or explicit algebraic PGF transformers yields a fully symbolic, exact computation of the posterior distribution (Geißler et al., 15 Dec 2025, Geißler et al., 18 Sep 2025). The process is as follows:

  1. Initialize from the prior (usually a Dirac at the initial store).
  2. Apply statement transformers according to program syntax, using automata operations or PGF algebra.
  3. At each conditional, branch and project the automaton/PGF to the relevant support sets.
  4. For each observe/conditioning, intersect (in automata) or restrict (in PGFs) to satisfying states, then renormalize.
  5. Return the normalized automaton or rational function representing the output distribution.

The construction is sound with respect to the operational Markov-chain semantics (Geißler et al., 18 Sep 2025), and is aligned with weakest-pre-expectation calculi in the finite-state case (Schröer et al., 1 Dec 2024).
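The five steps can be mirrored directly on finite sub-distributions represented as dictionaries. The following sketch uses hypothetical helper names (dirac, assign, sample, observe) rather than the interface of any published tool, and assumes the observed event has nonzero mass:

```python
from fractions import Fraction

# A sub-distribution over stores is a dict {store (as frozenset of items): mass}.
def dirac(store):
    return {frozenset(store.items()): Fraction(1)}

def assign(dist, var, value):
    """x := n : move each store's mass to the updated store."""
    out = {}
    for store, mass in dist.items():
        s = dict(store); s[var] = value
        key = frozenset(s.items())
        out[key] = out.get(key, Fraction(0)) + mass
    return out

def sample(dist, var, pmf):
    """x ~ D : split each store's mass according to the pmf of D."""
    out = {}
    for store, mass in dist.items():
        for value, p in pmf.items():
            s = dict(store); s[var] = value
            key = frozenset(s.items())
            out[key] = out.get(key, Fraction(0)) + mass * p
    return out

def observe(dist, pred):
    """observe(...) : restrict to satisfying stores, then renormalize (evidence mass must be nonzero)."""
    kept = {store: mass for store, mass in dist.items() if pred(dict(store))}
    total = sum(kept.values())
    return {store: mass / total for store, mass in kept.items()}

# x ~ Bernoulli(1/2); y := 1; observe(x = 1)
d = dirac({})
d = sample(d, 'x', {0: Fraction(1, 2), 1: Fraction(1, 2)})
d = assign(d, 'y', 1)
d = observe(d, lambda s: s['x'] == 1)
print(d)   # all posterior mass on the store {x: 1, y: 1}
```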

3.2. Programs With Loops

For programs with loops or unbounded recursion, semantics requires solving fixed-point equations in the space of PGFs or automata. For rectangular (variable–constant) guarded loops, this can be achieved by:

  • Fixed-point induction over PGF transformers, via Park induction, yielding least or greatest fixed points (sub-distributions or invariants) (Chen et al., 2022, Klinkenberg et al., 2023, Klinkenberg et al., 2020).
  • Explicit invariants: To verify that a loop generates a specified distribution, finding closed-form PGFs as invariants suffices for soundness and completeness—decidability follows by reducing to rational function equivalence (Chen et al., 2022).
  • Dynamic programming approaches: Factored sum-product networks (FSPNs) encode recursive dependencies as systems of equations, solved by fixed-point iteration in strongly connected component order (Stuhlmüller et al., 2012).

For general systems, the automata or PGF representations may grow exponentially, and neither termination nor exactness is guaranteed without further structure.
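As a small illustration of the fixed-point view (Kleene-style iteration on sub-distributions, a simplification of the symbolic treatment in the cited works), consider the loop `while (c = 1) { c ~ Bernoulli(1/2); n := n + 1 }` started with c = 1 and n = 0; the iterates accumulate the geometric output distribution from below:

```python
from fractions import Fraction

def iterate(k, p=Fraction(1, 2)):
    """Exit mass per final value of n after k unrollings of the loop body
    (Kleene iteration starting from the empty sub-distribution)."""
    exited = {}                       # n -> accumulated exit mass
    alive = {(1, 0): Fraction(1)}     # (c, n) -> mass still inside the loop
    for _ in range(k):
        next_alive = {}
        for (c, n), mass in alive.items():
            # loop body: c ~ Bernoulli(p); n := n + 1
            for c2, q in ((1, p), (0, 1 - p)):
                if c2 == 1:           # guard still true: stay in the loop
                    next_alive[(1, n + 1)] = next_alive.get((1, n + 1), Fraction(0)) + mass * q
                else:                 # guard now false: this mass exits with the updated n
                    exited[n + 1] = exited.get(n + 1, Fraction(0)) + mass * q
        alive = next_alive
    return exited

print(iterate(4))   # exit masses 1/2, 1/4, 1/8, 1/16 for n = 1..4: the geometric least fixed point emerges
```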

4. Symbolic and Scalable Inference

4.1. Weighted Model Counting

A prominent approach for practical inference is reducing discrete probabilistic programs to weighted model counting (WMC) or knowledge compilation (Cao et al., 2023, Holtzen et al., 2020, Holtzen et al., 2019). Here, the program is compiled into a weighted Boolean (or bitvector) circuit, with weights tracking discrete probabilities of random choices. Inference queries (marginals, conditionals) become ratio computations of WMC on these circuits—efficient if the compiled structure (BDD, d-DNNF, SDD) exploits conditional and context-specific independence.

  • Integer Distributions: Binary encoding of integers (bit-blasting) yields compact circuit representations for arithmetic and comparison operations, as well as bounded loops (Cao et al., 2023, Garg et al., 2023).
  • Practical scalability: Orders of magnitude improvements over path enumeration and exact enumeration approaches (e.g., Psi, WebPPL's enumerator) (Cao et al., 2023, Holtzen et al., 2020).
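The reduction can be illustrated with a toy weighted model counter that enumerates assignments of a Boolean formula; real systems compile to BDDs, SDDs, or d-DNNF rather than enumerating, so this sketch only shows how an inference query becomes a ratio of two weighted counts:

```python
from itertools import product

def wmc(variables, weights, phi):
    """Weighted model count: sum, over satisfying assignments of phi, of the
    product of per-literal weights (weights[v] if true, 1 - weights[v] if false)."""
    total = 0.0
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if phi(assignment):
            w = 1.0
            for v in variables:
                w *= weights[v] if assignment[v] else 1.0 - weights[v]
            total += w
    return total

# Program: f1 ~ Bernoulli(0.6); f2 ~ Bernoulli(0.3); observe(f1 or f2); query f1
variables = ['f1', 'f2']
weights = {'f1': 0.6, 'f2': 0.3}

def evidence(a):
    return a['f1'] or a['f2']

def query(a):
    return a['f1'] and evidence(a)

posterior = wmc(variables, weights, query) / wmc(variables, weights, evidence)
print(posterior)   # P(f1 = 1 | f1 or f2) = 0.6 / 0.72 ≈ 0.833
```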

4.2. Symbolic Calculi

For information-theoretic or quantitative analysis (e.g., entropy, mutual information, KL divergence), weakest-pre-expectation calculi enable closed-form symbolic computation of quantities of interest for finite discrete programs (Schröer et al., 1 Dec 2024). All key measures are expressible exactly in this algebraic domain.
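A weakest-pre-expectation transformer for a loop-free fragment is straightforward to sketch symbolically. The snippet below is a minimal illustration in sympy (it covers only expected values, not the entropy-style quantities of the cited calculus) that pushes a post-expectation backwards through an assignment and a Bernoulli sample:

```python
import sympy as sp

x, y = sp.symbols('x y')

def wp_assign(post, var, expr):
    """wp[var := expr](f) = f[var := expr]."""
    return post.subs(var, expr)

def wp_flip(post, var, p):
    """wp[var ~ Bernoulli(p)](f) = p * f[var := 1] + (1 - p) * f[var := 0]."""
    return sp.simplify(p * post.subs(var, 1) + (1 - p) * post.subs(var, 0))

# Program: x ~ Bernoulli(1/3); y := x + 2.  Post-expectation: y.
post = y
post = wp_assign(post, y, x + 2)            # process the last statement first
post = wp_flip(post, x, sp.Rational(1, 3))
print(post)   # 7/3: the exact expected value of y after the program
```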

5. Extensions, Limitations, and Verification

5.1. Extensions

  • Continuous and Mixed Distributions: While the automata/PGF frameworks extend theoretically to continuous support (via hybrid automata or formal power series over real-valued indeterminates), tractable exact inference is generally unattainable; approximate techniques or measure-theoretic generalizations are then required (Wu et al., 2018, Garg et al., 2023).
  • Richer Control Flow: Handling non-rectangular guards, recursion with complex dependencies, or advanced data types requires moving to more powerful automata (e.g., pushdown, multitape), or richer algebraic frameworks (Geißler et al., 15 Dec 2025).
  • Formal verification: Decidability of output distribution equivalence and synthesis of invariants is established for rectangular-guarded, rational-distribution programs (Chen et al., 2022). This leads to certified tools able to verify complex behaviors automatically (Klinkenberg et al., 2023).

5.2. Computational and Expressiveness Limitations

  • State Explosion: Automata and BDD-based methods can become infeasible for high-dimensional or poorly structured programs.
  • Loop Handling: For unbounded or mutually recursive loops, synthesis of loop invariants or fixed points may not terminate, or may fail to be exact, unless supported by further theory (e.g., eventually geometric distributions with contraction invariants) (Zaiser et al., 15 Nov 2024).
  • Expressiveness: Most exact approaches support either only loop-free programs or looped programs with severe syntactic/semantic restrictions (rectangularity, finite variables, or bounded arity).

6. Representative Tools and Empirical Insights

Several implemented systems reflect the state-of-the-art:

| Tool | Underlying Method | Key Features and Applicability |
|------|-------------------|--------------------------------|
| Dice | Weighted model counting over BDDs | Scalable exact inference for large discrete models with arithmetic (Holtzen et al., 2020) |
| Prodigy | Denotational PGF semantics + CAS | Exact inference and verification for loopy (rectangular) programs (Klinkenberg et al., 2023, Chen et al., 2022) |
| Diabolo | Residual-mass and geometric bounds | Automated bounding of posteriors for loopy programs with convergence guarantees (Zaiser et al., 15 Nov 2024) |
| Zar | Verified coin-flip samplers | Certified sampling for discrete probabilistic programs with unbounded loops in the random-bit model (Bagnall et al., 2022) |

Empirical results across these tools show orders-of-magnitude efficiency improvements compared to direct enumeration and reliability on programs with thousands to hundreds of thousands of variables, provided the program structure aligns with the respective method's strengths.

