Discrete Probabilistic Programs Overview
- Discrete probabilistic programs are computational models that encode stochastic processes using discrete distributions, standard control structures, and observation conditioning.
- They utilize formal semantics based on probability generating functions and weighted automata to enable exact inference and quantitative analysis.
- Challenges include state explosion and loop handling, with advanced methods like weighted model counting and symbolic calculi addressing these limitations.
A discrete probabilistic program is a computational model that encodes stochastic processes and manipulates integer- or finite-domain-valued variables using standard control structures (assignment, sequencing, conditionals, and potentially loops) plus primitives for sampling from discrete distributions and for conditioning on observed events. These models are the foundation for representing and analyzing a wide range of stochastic systems in machine learning, formal verification, and quantitative analysis. Exact inference, the task of calculating the program's output distribution conditioned on observations, is one of the primary challenges in the area, both theoretically and algorithmically.
1. Formal Structure of Discrete Probabilistic Programs
A discrete probabilistic program is generally defined in an imperative or functional syntax encompassing:
- Variables ranging over ℕ (or finite domains, e.g., {0, 1} for Booleans)
- Assignments x := e (deterministic) and x := sample(D) (sampling from a discrete distribution D, e.g., Bernoulli, geometric)
- Conditionals with "rectangular" guards of the form x < c or x = c for a constant c (no inter-variable comparisons)
- Sequencing P₁ ; P₂
- Observations observe(φ), where φ is a guard or a more general Boolean predicate on the state
- Optional: unbounded while-loops or finite for-loops
For example, the language ReDiP admits only loop-free programs with rectangular guards, while other frameworks (e.g., cpGCL or extensions in (Bagnall et al., 2022)) include unbounded loops with hard conditioning and recursion (Geißler et al., 15 Dec 2025, Geißler et al., 18 Sep 2025, Chen et al., 2022, Klinkenberg et al., 2023).
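As a concrete illustration, the sketch below encodes a tiny loop-free program — two fair coin flips followed by an observation — as plain Python, approximating the conditioned output by rejection sampling. The helper `run_once` and its structure are illustrative, not the syntax of ReDiP or any of the cited languages.

```python
import random

random.seed(0)

def run_once():
    """One execution of: x ~ Bernoulli(1/2); y ~ Bernoulli(1/2); observe(x + y >= 1)."""
    x = random.randint(0, 1)   # x := sample(Bernoulli(1/2))
    y = random.randint(0, 1)   # y := sample(Bernoulli(1/2))
    if x + y < 1:              # observe(x + y >= 1): reject violating runs
        return None
    return x

# Monte Carlo estimate of the posterior P(x = 1 | x + y >= 1); the exact value is 2/3
samples = [r for r in (run_once() for _ in range(100_000)) if r is not None]
posterior_x1 = sum(samples) / len(samples)
```

Exact inference, discussed below, replaces this sampling loop with a symbolic computation of the same posterior.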
Semantics are specified by a measure transformer, a probability generating function (PGF) transformer, or a denotational semantics in terms of sub-distributions over state spaces, sometimes augmented by transition systems or operational Markov chains (Geißler et al., 15 Dec 2025, Chen et al., 2022, Klinkenberg et al., 2023).
2. Algebraic and Automata-Theoretic Models
2.1. Probability Generating Functions
A central formal device is encoding a sub-distribution μ over variable valuations ℕᵏ by its probability generating function (PGF)

G_μ(x₁, …, x_k) = Σ_{n ∈ ℕᵏ} μ(n) · x₁^{n₁} ⋯ x_k^{n_k}.

This formal power series algebraically encodes the full behavior of the program as a mapping on state spaces. Program constructs correspond to PGF operations: mixture (addition), convolution, monomial shift (assignment), and projection (marginalization) (Geißler et al., 15 Dec 2025, Geißler et al., 18 Sep 2025, Klinkenberg et al., 2023, Chen et al., 2022, Klinkenberg et al., 2020).
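These PGF operations can be sketched on finitely supported univariate distributions, represented as dicts from exponents to coefficients (a simplified stand-in for the formal power series; the helper names `mix`, `convolve`, and `shift` are my own):

```python
def mix(p, g1, g2):
    """Probabilistic choice: p * G1 + (1 - p) * G2 (mixture = addition of PGFs)."""
    out = {}
    for n, w in g1.items():
        out[n] = out.get(n, 0.0) + p * w
    for n, w in g2.items():
        out[n] = out.get(n, 0.0) + (1 - p) * w
    return out

def convolve(g1, g2):
    """Sum of independent variables: G1 * G2 (product of PGFs)."""
    out = {}
    for n, w in g1.items():
        for m, v in g2.items():
            out[n + m] = out.get(n + m, 0.0) + w * v
    return out

def shift(g, c):
    """Assignment x := x + c: multiply the PGF by the monomial x^c."""
    return {n + c: w for n, w in g.items()}

bern = {0: 0.5, 1: 0.5}          # PGF of Bernoulli(1/2): 1/2 + 1/2 * x
binom2 = convolve(bern, bern)    # Binomial(2, 1/2): 1/4 + 1/2 * x + 1/4 * x^2
```

Evaluating a PGF at x = 1 (summing the coefficients) recovers the total probability mass, which is 1 for a proper distribution.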
2.2. Weighted Automata
For finite or countably supported discrete programs, the distribution can be captured by weighted automata, where states abstract program configurations, transitions represent possible updates with probabilistic weights, and accepted labels encode variable increments.

A weighted automaton A = (Q, Σ, Δ, ι, φ) consists of a set of states Q, an alphabet Σ (usually monomials over program variables), transition weights Δ : Q × Σ × Q → [0, 1], initial state weights ι : Q → [0, 1], and final weights φ : Q → [0, 1]. The program's semantics is then specified as the automaton's weight function

⟦A⟧(w) = Σ_{q₀, …, qₙ ∈ Q} ι(q₀) · ∏_{i=1}^{n} Δ(q_{i−1}, aᵢ, qᵢ) · φ(qₙ)

for a sequence of labels w = a₁ ⋯ aₙ (Geißler et al., 15 Dec 2025, Geißler et al., 18 Sep 2025).
Each program statement transforms a weighted automaton through a corresponding algebraic operation: concatenation for sequencing, union for probabilistic choice, projection for assignment, and intersection with regular languages for conditioning.
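A minimal sketch of the weight function above, with transitions stored as a map from (state, label) pairs to weighted successor lists (this data layout is my own choice, not the cited papers'):

```python
def weight(word, init, trans, final):
    """Sum over all runs of initial weight * transition weights * final weight."""
    cur = dict(init)                      # state -> accumulated mass of partial runs
    for a in word:
        nxt = {}
        for q, w in cur.items():
            for q2, p in trans.get((q, a), []):
                nxt[q2] = nxt.get(q2, 0.0) + w * p
        cur = nxt
    return sum(w * final.get(q, 0.0) for q, w in cur.items())

# A two-state automaton whose weight on the label sequence x^n is (1/2)^n for n >= 1,
# i.e. a geometric distribution over increments of one program variable:
init = {0: 1.0}
final = {1: 1.0}
trans = {(0, 'x'): [(0, 0.5), (1, 0.5)]}
```

The forward pass over `cur` computes all runs simultaneously, which is exactly the sum-over-paths semantics of the weight function.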
3. Exact Inference: Computation and Algorithms
The principal task is to compute or represent the posterior distribution induced by a program and a prior, conditioned on any observations. This is approached by recursive traversal and composition of algebraic or automata-theoretic transformers associated with each program fragment.
3.1. Loop-Free Programs
For loop-free discrete probabilistic programs, recursive structural compilation to weighted automata or explicit algebraic PGF transformers yields a fully symbolic, exact computation of the posterior distribution (Geißler et al., 15 Dec 2025, Geißler et al., 18 Sep 2025). The process is as follows:
- Initialize from the prior (usually a Dirac at the initial store).
- Apply statement transformers according to program syntax, using automata operations or PGF algebra.
- At each conditional, branch and project the automaton/PGF to the relevant support sets.
- For each observe/conditioning, intersect (in automata) or restrict (in PGFs) to satisfying states, then renormalize.
- Return the normalized automaton or rational function representing the output distribution.
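The steps above can be sketched as transformers on finitely supported sub-distributions over program stores, using exact rational arithmetic; states are tuples of variable bindings and the helpers `sample`, `observe`, and `normalize` are illustrative, not a real tool's API:

```python
from fractions import Fraction

def sample(dist, var, spec):
    """x := sample(spec), where spec maps values to probabilities."""
    out = {}
    for state, p in dist.items():
        for v, q in spec.items():
            s = dict(state)
            s[var] = v
            key = tuple(sorted(s.items()))
            out[key] = out.get(key, Fraction(0)) + p * q
    return out

def observe(dist, pred):
    """Conditioning: keep satisfying states (yields an unnormalized sub-distribution)."""
    return {s: p for s, p in dist.items() if pred(dict(s))}

def normalize(dist):
    z = sum(dist.values())
    return {s: p / z for s, p in dist.items()}

bern = {0: Fraction(1, 2), 1: Fraction(1, 2)}
d = {(): Fraction(1)}                         # Dirac prior at the empty store
d = sample(d, 'x', bern)
d = sample(d, 'y', bern)
d = normalize(observe(d, lambda s: s['x'] + s['y'] >= 1))
posterior_x1 = sum(p for s, p in d.items() if dict(s)['x'] == 1)  # exactly 2/3
```

Because every weight is a `Fraction`, the posterior is computed exactly rather than in floating point, mirroring the fully symbolic character of the automata/PGF constructions.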
The construction is sound with respect to the operational Markov-chain semantics (Geißler et al., 18 Sep 2025), and is aligned with weakest-pre-expectation calculi in the finite-state case (Schröer et al., 1 Dec 2024).
3.2. Programs With Loops
For programs with loops or unbounded recursion, semantics requires solving fixed-point equations in the space of PGFs or automata. For rectangular (variable–constant) guarded loops, this can be achieved by:
- Fixed-point induction over PGF transformers, via Park induction, yielding least or greatest fixed points (sub-distributions or invariants) (Chen et al., 2022, Klinkenberg et al., 2023, Klinkenberg et al., 2020).
- Explicit invariants: To verify that a loop generates a specified distribution, finding closed-form PGFs as invariants suffices for soundness and completeness—decidability follows by reducing to rational function equivalence (Chen et al., 2022).
- Dynamic programming approaches: Factored sum-product networks (FSPNs) encode recursive dependencies as systems of equations, solved by fixed-point iteration in strongly connected component order (Stuhlmüller et al., 2012).
For general systems, the automata or PGF representations may grow exponentially, and neither termination nor exactness is guaranteed without further structure.
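As a numerical illustration of Kleene-style fixed-point iteration, the loop `while flip(1/2): c := c + 1` can be unrolled step by step, accumulating the mass that exits the loop at each iteration. This is a sketch of the iteration idea only, not the symbolic Park-induction or invariant-synthesis procedures of the cited works:

```python
def iterate_loop(p_continue, steps):
    """Approximate the least fixed point of the loop transformer by unrolling."""
    inside = {0: 1.0}   # sub-distribution over c for runs still inside the loop
    out = {}            # accumulated mass of terminated runs
    for _ in range(steps):
        nxt = {}
        for c, m in inside.items():
            out[c] = out.get(c, 0.0) + (1 - p_continue) * m    # guard fails: exit
            nxt[c + 1] = nxt.get(c + 1, 0.0) + p_continue * m  # body: c := c + 1
        inside = nxt
    return out

# Converges to the geometric distribution P(c = n) = (1/2)^(n+1)
geo = iterate_loop(0.5, 40)
```

The residual mass still "inside" the loop after n steps is (1/2)^n, so the iterates form a monotonically increasing chain of sub-distributions whose supremum is the loop's exact semantics.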
4. Symbolic and Scalable Inference
4.1. Weighted Model Counting
A prominent approach for practical inference is reducing discrete probabilistic programs to weighted model counting (WMC) or knowledge compilation (Cao et al., 2023, Holtzen et al., 2020, Holtzen et al., 2019). Here, the program is compiled into a weighted Boolean (or bitvector) circuit, with weights tracking discrete probabilities of random choices. Inference queries (marginals, conditionals) become ratio computations of WMC on these circuits—efficient if the compiled structure (BDD, d-DNNF, SDD) exploits conditional and context-specific independence.
- Integer Distributions: Binary encoding for integers (bit-blasting) enables compact circuit representations for arithmetic operations such as addition, comparison, or bounded loops (Cao et al., 2023, Garg et al., 2023).
- Practical scalability: Orders of magnitude improvements over path enumeration and exact enumeration approaches (e.g., Psi, WebPPL's enumerator) (Cao et al., 2023, Holtzen et al., 2020).
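The WMC reduction can be illustrated with a brute-force model counter over weighted Boolean variables; production systems compile to BDDs/SDDs instead of enumerating assignments, but the query-as-ratio structure is the same (the function and variable names here are mine):

```python
from itertools import product

def wmc(formula, weights):
    """Weighted model count: sum of literal-weight products over satisfying assignments."""
    names = list(weights)
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        env = dict(zip(names, bits))
        if formula(env):
            w = 1.0
            for v, b in env.items():
                w *= weights[v][b]
            total += w
    return total

# Two fair flips a, b; the query P(a | a or b) is a ratio of two model counts:
weights = {'a': {True: 0.5, False: 0.5}, 'b': {True: 0.5, False: 0.5}}
evidence = lambda e: e['a'] or e['b']
posterior = wmc(lambda e: e['a'] and evidence(e), weights) / wmc(evidence, weights)
```

Knowledge compilation pays off precisely when the compiled circuit is much smaller than the 2^n assignments this sketch enumerates.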
4.2. Symbolic Calculi
For information-theoretic or quantitative analysis (e.g., entropy, mutual information, KL divergence), weakest-pre-expectation calculi enable closed-form symbolic computation of quantities of interest for finite discrete programs (Schröer et al., 1 Dec 2024). All key measures are expressible exactly in this algebraic domain.
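For a finite discrete program, these information-theoretic quantities can be read off directly from the (exactly computed) output distribution; the sketch below evaluates them numerically from a posterior dict rather than via a weakest-pre-expectation calculus:

```python
from math import log2

def entropy(dist):
    """Shannon entropy in bits of a finitely supported distribution."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def kl_divergence(p, q):
    """KL divergence D(p || q) in bits; assumes support(p) is contained in support(q)."""
    return sum(pv * log2(pv / q[k]) for k, pv in p.items() if pv > 0)

posterior = {0: 1/3, 1: 2/3}     # e.g. P(x | x + y >= 1) for two fair coin flips
uniform = {0: 1/2, 1: 1/2}
h = entropy(posterior)           # log2(3) - 2/3, roughly 0.918 bits
d = kl_divergence(posterior, uniform)
```

For a two-point distribution, D(p || uniform) = 1 − H(p), which makes for an easy sanity check on the two functions.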
5. Extensions, Limitations, and Verification
5.1. Extensions
- Continuous and Mixed Distributions: While the automata/PGF frameworks extend theoretically to continuous support (via hybrid automata or formal power series over real-valued indeterminates), tractable exact inference is generally unattainable; approximate techniques or measure-theoretic generalizations are then required (Wu et al., 2018, Garg et al., 2023).
- Richer Control Flow: Handling non-rectangular guards, recursion with complex dependencies, or advanced data types requires moving to more powerful automata (e.g., pushdown, multitape), or richer algebraic frameworks (Geißler et al., 15 Dec 2025).
- Formal verification: Decidability of output distribution equivalence and synthesis of invariants is established for rectangular-guarded, rational-distribution programs (Chen et al., 2022). This leads to certified tools able to verify complex behaviors automatically (Klinkenberg et al., 2023).
5.2. Computational and Expressiveness Limitations
- State Explosion: Automata and BDD-based methods can become infeasible for high-dimensional or poorly structured programs.
- Loop Handling: For unbounded or mutually recursive loops, synthesis of loop invariants or fixed points may fail to terminate or may sacrifice exactness unless supported by further theory (e.g., eventually geometric distributions with contraction invariants) (Zaiser et al., 15 Nov 2024).
- Expressiveness: Most exact approaches support either only loop-free programs or looped programs with severe syntactic/semantic restrictions (rectangularity, finite variables, or bounded arity).
6. Representative Tools and Empirical Insights
Several implemented systems reflect the state-of-the-art:
| Tool | Underlying Method | Key Features and Applicability |
|---|---|---|
| Dice | Weighted model counting over BDDs | Scalable exact inference for large discrete models with arithmetic (Holtzen et al., 2020) |
| Prodigy | Denotational PGF semantics + CAS | Exact inference & verification for loopy (rectangular) programs (Klinkenberg et al., 2023, Chen et al., 2022) |
| Diabolo | Residual-mass and geometric bounds | Automated bounding of posteriors for loopy programs with convergence guarantees (Zaiser et al., 15 Nov 2024) |
| Zar | Verified coin-flip samplers | Certified sampling for discrete p.p.s with unbounded loops in the random-bit model (Bagnall et al., 2022) |
Empirical results across these tools show orders-of-magnitude efficiency improvements compared to direct enumeration and reliability on programs with thousands to hundreds of thousands of variables, provided the program structure aligns with the respective method's strengths.
References:
- (Geißler et al., 15 Dec 2025): "Probabilistic Programming Meets Automata Theory: Exact Inference using Weighted Automata"
- (Geißler et al., 18 Sep 2025): "Weighted Automata for Exact Inference in Discrete Probabilistic Programs"
- (Schröer et al., 1 Dec 2024): "Symbolic Quantitative Information Flow for Probabilistic Programs"
- (Cao et al., 2023): "Scaling Integer Arithmetic in Probabilistic Programs"
- (Chen et al., 2022): "Does a Program Yield the Right Distribution? Verifying Probabilistic Programs via Generating Functions"
- (Klinkenberg et al., 2023): "Exact Bayesian Inference for Loopy Probabilistic Programs using Generating Functions"
- (Zaiser et al., 15 Nov 2024): "Guaranteed Bounds on Posterior Distributions of Discrete Probabilistic Programs with Loops"
- (Klinkenberg et al., 2020): "Generating Functions for Probabilistic Programs"
- (Holtzen et al., 2020): "Scaling Exact Inference for Discrete Probabilistic Programs"
- (Holtzen et al., 2019): "Symbolic Exact Inference for Discrete Probabilistic Programs"
- (Bagnall et al., 2022): "Formally Verified Samplers From Probabilistic Programs With Loops and Conditioning"
- (Stuhlmüller et al., 2012): "A Dynamic Programming Algorithm for Inference in Recursive Probabilistic Programs"