Bayesian Program Learning

Updated 26 February 2026

Bayesian Program Learning is a probabilistic framework that infers generative program models directly from data by combining Bayesian inference with program synthesis.
It employs stochastic grammars, approximate Bayesian computation, and neural-guided DSL induction to balance model complexity with data fit.
The approach enables automatic abstraction by merging shared substructures, promoting interpretable, sample-efficient models across diverse domains.

Bayesian Program Learning is a probabilistic framework for inferring source code representations of generative models directly from data. It synthesizes elements of Bayesian inference, probabilistic programming, and program synthesis to yield programs whose executions match observed data or behaviors. This paradigm supports model discovery in domains ranging from symbolic rule induction and logical programming to the synthesis of sampling algorithms, neural-guided program synthesis, and the incremental construction of high-level interpretable abstractions. The approach is unified by viewing learning as posterior inference over a joint space of program structures and, in some cases, associated library fragments or parameters. The following sections detail the foundational principles, core methodologies, representative systems, comparative distinctions, and empirical outcomes documented across major research contributions.

1. Probabilistic Formulation and Core Concepts

Bayesian Program Learning (BPL) formalizes model learning as probabilistic inference in a space of possible programs given a dataset. This involves specifying:

A generative prior over program structures: Typically instantiated as a stochastic grammar, a probabilistic context-free grammar, or a parameterized library of primitives with associated usage weights. This prior can encode domain biases such as brevity, compositionality, and reusability.
A data likelihood model: For deterministic synthesis tasks, the likelihood is an indicator function that is unity when a candidate program yields correct outputs for the observed specification. For stochastic or generative domains, the likelihood is the (possibly intractable) marginal probability that the program produces the observed data or closely matches its statistics.
A posterior over programs: $P(\mathrm{prog} \mid D) \propto P(D \mid \mathrm{prog})\, P(\mathrm{prog})$.
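
To make the three ingredients concrete, here is a minimal sketch of the deterministic case, where the likelihood is an indicator of consistency with input/output examples. The four-program hypothesis space and its prior weights are hypothetical values chosen for illustration; they do not come from any of the cited systems.

```python
# Hypothetical toy program space: each entry is (function, prior probability).
# The prior favors shorter programs; the weights are illustrative assumptions.
PROGRAMS = {
    "x + 1": (lambda x: x + 1, 0.4),
    "x * 2": (lambda x: x * 2, 0.3),
    "x * x": (lambda x: x * x, 0.2),
    "x * x + x - 2": (lambda x: x * x + x - 2, 0.1),
}


def posterior(examples):
    """Return P(prog | D) using an indicator likelihood over (input, output) pairs."""
    unnormalized = {}
    for name, (fn, prior) in PROGRAMS.items():
        consistent = all(fn(x) == y for x, y in examples)
        unnormalized[name] = prior * (1.0 if consistent else 0.0)
    evidence = sum(unnormalized.values())
    if evidence == 0.0:
        raise ValueError("no program in the space explains the data")
    return {name: weight / evidence for name, weight in unnormalized.items()}


# One example leaves three programs consistent; the prior ranks them.
one_example = posterior([(2, 4)])
# A second example rules out everything except "x * x".
two_examples = posterior([(2, 4), (3, 9)])
```

With a single example the posterior mass is shared among the consistent programs in proportion to their priors; the second example concentrates it on one program. Real systems cannot enumerate the space this way, which is why the search strategies in Section 3 are needed.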

In the grammar-based approach for sampler synthesis (Perov, 2016), the prior $P(\mathrm{prog})$ is given by the product of production probabilities along the program's abstract syntax tree. Likelihoods are evaluated using Approximate Bayesian Computation (ABC), where the closeness in distributional statistics between program-generated samples and the observed dataset is measured using a kernel over summary statistics.
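
The two quantities in this paragraph can be sketched as follows. This is an illustrative reconstruction, not the Anglican implementation: the one-nonterminal grammar, its production probabilities, the choice of mean and variance as summary statistics, and the Gaussian kernel bandwidth are all assumptions made for the example.

```python
import math
import random

# Hypothetical grammar for sampler expressions: (right-hand side, probability).
GRAMMAR = {
    "E": [(("uniform",), 0.5), (("add", "E", "E"), 0.3), (("scale", "E"), 0.2)],
}


def log_prior(tree):
    """Sum of log production probabilities over the abstract syntax tree."""
    head, *children = tree
    for rhs, prob in GRAMMAR["E"]:
        if rhs[0] == head and len(rhs) - 1 == len(children):
            return math.log(prob) + sum(log_prior(child) for child in children)
    raise ValueError(f"no production for {head!r}")


def run(tree, rng):
    """Execute a sampler program once and return the value it draws."""
    head, *children = tree
    if head == "uniform":
        return rng.random()
    if head == "add":
        return run(children[0], rng) + run(children[1], rng)
    if head == "scale":
        return 2.0 * run(children[0], rng)
    raise ValueError(f"unknown node {head!r}")


def summary(xs):
    """Summary statistics used by the ABC kernel: (mean, variance)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, var


def abc_log_likelihood(tree, observed, rng, n=2000, bandwidth=0.1):
    """Gaussian kernel on the distance between simulated and observed summaries."""
    simulated = [run(tree, rng) for _ in range(n)]
    dist_sq = sum((a - b) ** 2 for a, b in zip(summary(simulated), summary(observed)))
    return -dist_sq / (2.0 * bandwidth ** 2)


rng = random.Random(0)
target = ("add", ("uniform",), ("uniform",))
observed = [run(target, rng) for _ in range(2000)]
score_target = log_prior(target) + abc_log_likelihood(target, observed, rng)
score_simple = log_prior(("uniform",)) + abc_log_likelihood(("uniform",), observed, rng)
```

The simpler program has the higher prior, but its summary statistics are far from the observed ones, so the kernel term dominates and the unnormalized posterior score favors the two-uniform sum.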

In the inductive logic programming context (Sharma et al., 8 Aug 2025), hypotheses are collections of Horn clauses, combined with probabilistic wrapper rules, and the posterior balances the program's message length (complexity) against how well it fits positive and negative examples.

For neural-guided synthesis with DSL learning (Ellis et al., 2020, Palmarini et al., 2023), the prior over programs is parameterized by a library and grammar (e.g., a bigram or unigram distribution over typed functional combinators), and the likelihood is task-dependent.
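
A simplified sketch of a library-parameterized unigram prior is shown below. It treats a program as a flat sequence of primitive tokens and ignores types, whereas the cited systems condition on the requested type; the primitive names and the uniform weights are hypothetical.

```python
import math


def unigram_log_prior(program, weights):
    """Log prior of a token sequence when each primitive is drawn
    independently with probability proportional to its library weight."""
    total = sum(weights.values())
    return sum(math.log(weights[token] / total) for token in program)


# Hypothetical base library with uniform usage weights.
base_library = {"map": 1.0, "fold": 1.0, "+": 1.0, "1": 1.0}
increment_all = ["map", "+", "1"]

# The same library extended with a hypothetical learned primitive "inc" = (+ 1).
extended_library = dict(base_library, inc=1.0)
increment_all_short = ["map", "inc"]

before = unigram_log_prior(increment_all, base_library)            # log(1/64)
after = unigram_log_prior(increment_all_short, extended_library)   # log(1/25)
```

Adding a primitive spreads probability over a larger vocabulary, which slightly lowers the probability of each individual token, but programs that reuse the new primitive become shorter and therefore more probable overall. This is the trade-off that library learning exploits.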

2. Model Classes, Representations, and Abstractions

BPL frameworks have instantiated several families of generative models:

Logic Programs: Programs are definite Horn clause sets, augmented with probabilistic wrapper rules (with ProbLog semantics), supporting noise modeling in ILP. Each ground atom's probability depends on entailment by the learned clause-set and associated θ parameters (Sharma et al., 8 Aug 2025).
Functional Programs over Algebraic Data Types: As in Church-like or Lisp dialects, primitives include control flow constructs, random choice operators, and higher-order procedures; probabilistic programs are distributions over structured data such as nested lists or trees (Hwang et al., 2011).
Typed Lambda Calculus Expressions: Used in DreamCoder and DreamDecompiler, where a library of higher-order typed functions is grown compositionally, forming multi-layered DSLs capable of expressing increasingly abstract and reusable sub-routines (Ellis et al., 2020, Palmarini et al., 2023).
Sampler Synthesizers: Grammar-based probabilistic (sampler) programs, e.g. in Anglican, capable of rediscovering analytic sampling procedures for distributions such as Bernoulli or Normal (Perov, 2016).

The choice of representation affects not only expressive power but also tractability of search and inference, as well as the kinds of inductive bias available via the prior.

3. Inference Algorithms and Search Strategies

Efficient inference in program space is challenging because the hypothesis space is combinatorially large and discrete, and the posterior over it is typically highly multimodal. The principal methodologies include:

Markov Chain Monte Carlo in Program Space: As in Perov (2016), Metropolis–Hastings explores the posterior by proposing local program rewrites via grammar rules and accepting or rejecting each proposal based on its prior and its likelihood (as evaluated by ABC).
Beam or Greedy Search with Program Transformations: Bayesian program merging (Hwang et al., 2011) maintains a beam of programs, applying abstracting (anti-unification) and “deargumentation” moves to promote shared structure and succinctness.
Wake–Sleep Algorithms: DreamCoder/DreamDecompiler (Ellis et al., 2020, Palmarini et al., 2023) employ alternating inference and abstraction cycles. The wake phase uses a neural recognition model to propose high-posterior solutions. Sleep phases retrain the neural guide (amortization) and perform DSL induction by factoring out repeated subexpressions into new library primitives.
MAP Optimization with Combinatorial Solvers: In Minimum Message Length ILP (Sharma et al., 8 Aug 2025), a hybrid approach precomputes rule costs and statistics, then uses integer/CP-SAT optimization to seek a ruleset minimizing the total message length.
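
As an illustration of the first strategy, the following is a minimal Metropolis–Hastings sketch over expression trees. It deliberately simplifies the approach described above: proposals are whole programs drawn independently from the grammar prior rather than local rewrites, and the likelihood is a Gaussian noise model rather than ABC. The grammar, its probabilities, and the noise scale are assumptions for the example.

```python
import math
import random


def sample_program(rng, depth=0):
    """Draw an expression tree from a small stochastic grammar (the prior)."""
    r = rng.random()
    if depth >= 3 or r < 0.4:
        return ("x",)
    if r < 0.6:
        return ("const", rng.randint(1, 3))
    op = "add" if r < 0.8 else "mul"
    return (op, sample_program(rng, depth + 1), sample_program(rng, depth + 1))


def evaluate(tree, x):
    """Evaluate an expression tree at input x."""
    head = tree[0]
    if head == "x":
        return x
    if head == "const":
        return tree[1]
    left, right = evaluate(tree[1], x), evaluate(tree[2], x)
    return left + right if head == "add" else left * right


def log_likelihood(tree, data, noise=1.0):
    """Gaussian noise model on outputs (an assumption for this sketch)."""
    return sum(-((evaluate(tree, x) - y) ** 2) / (2.0 * noise ** 2) for x, y in data)


def metropolis_hastings(data, steps, rng):
    """Return the final chain state and the highest-likelihood state visited."""
    current = sample_program(rng)
    current_ll = log_likelihood(current, data)
    best, best_ll = current, current_ll
    for _ in range(steps):
        proposal = sample_program(rng)  # independence proposal from the prior
        proposal_ll = log_likelihood(proposal, data)
        # Because proposals come from the prior, the prior terms cancel and
        # the acceptance ratio reduces to the likelihood ratio.
        accept_prob = math.exp(min(0.0, proposal_ll - current_ll))
        if rng.random() < accept_prob:
            current, current_ll = proposal, proposal_ll
        if current_ll > best_ll:
            best, best_ll = current, current_ll
    return current, best


data = [(x, 2 * x) for x in range(5)]
final_state, best_program = metropolis_hastings(data, steps=2000, rng=random.Random(0))
```

Independence proposals keep the acceptance rule short and easy to verify, but they scale poorly; local rewrites of a subtree, as used in grammar-based MCMC, reuse most of the current program and need a proposal-ratio correction in the acceptance probability.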

A table summarizing search/inference strategies:

Framework	Program search / inference	Library / abstraction learning
Bayesian program merging	Beam search + program transformations	Anti-unification, deargumentation
DreamCoder/DreamDecompiler	Wake–sleep with amortized neural guidance	DSL abstraction (chunking)
MML-ILP	Random/approximate + CP-SAT optimization	None (fixed predicate set)
Anglican sampler synthesis	MCMC (Metropolis–Hastings over grammar rewrites)	None (fixed grammar)

4. Abstraction and Library Learning

A hallmark of BPL approaches is their ability to automatically induce abstractions, leading to more compact, general, and interpretable models:

Sub-program Refactoring: Repeated or similar sub-structures across program solutions are identified and factored out as new procedures or library primitives (Hwang et al., 2011, Ellis et al., 2020).
Program Merging and Compositionality: By anti-unifying shared subtrees, the system discovers generalizable patterns, e.g., parameterized functions over tree branches or recursive patterns in list processing (Hwang et al., 2011).
Probabilistic Chunking via Decompiling the Inference Network: DreamDecompiler leverages the neural recognition model to guide which fragments are likely beneficial to promote as library entries, using an explicit scoring based on usage and uncertainty reduction (Palmarini et al., 2023). This closes the feedback loop between amortized search and abstraction induction.
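
The anti-unification step behind program merging can be sketched as below. This is a textbook first-order version written for illustration, not code from the cited systems; the tuple encoding of trees and the `?v0`-style variable names are assumptions.

```python
from itertools import count


def anti_unify(a, b, table=None, fresh=None):
    """Least general generalization of two expression trees.

    Trees are tuples (head, child, ...) or atoms. Mismatched subtrees are
    replaced by variables, and the same mismatched pair always maps to the
    same variable. Returns (generalized tree, substitution table).
    """
    if table is None:
        table, fresh = {}, count()
    if a == b:
        return a, table
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and len(a) == len(b) and a[0] == b[0]):
        children = [anti_unify(x, y, table, fresh)[0] for x, y in zip(a[1:], b[1:])]
        return (a[0], *children), table
    if (a, b) not in table:
        table[(a, b)] = f"?v{next(fresh)}"
    return table[(a, b)], table


# Two hypothetical "colored tree" programs that differ only in color.
red_tree = ("node", ("color", "red"), ("leaf", "red"), ("leaf", "red"))
blue_tree = ("node", ("color", "blue"), ("leaf", "blue"), ("leaf", "blue"))

template, bindings = anti_unify(red_tree, blue_tree)
# template == ("node", ("color", "?v0"), ("leaf", "?v0"), ("leaf", "?v0"))
# bindings == {("red", "blue"): "?v0"}
```

The resulting template is the body of a candidate abstraction with one parameter, and each original program becomes a call to it with a different argument. Reusing one variable for the repeated mismatch is what captures the constraint that all parts of the tree share a color.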

5. Objective Functions: Complexity versus Data Fit

BPL systems employ Bayesian objectives that balance model complexity with accuracy:

Explicit Two-Part Codes: MML-ILP uses $L(H,D) = -\log_2 P(H) - \log_2 P(D \mid H)$, where $P(H)$ is a structured prior penalizing long and over-general programs, and $P(D \mid H)$ is a likelihood penalizing both misclassification and over-entailment (Sharma et al., 8 Aug 2025). MDL-style baselines fail to penalize over-generalization in unbalanced regimes, a deficiency the Bayesian form corrects.
Length Priors and Description Length Savings: In Bayesian program merging and DreamCoder-style abstraction, the prior penalizes symbol count or program length, with prior probability decaying exponentially in length, while the likelihood measures empirical fit to the dataset (Hwang et al., 2011, Ellis et al., 2020).
Variational Lower Bounds: With neural amortization, evidence lower bounds (ELBO) incorporate both expected data likelihood and the KL-divergence between the recognition network and prior, enabling joint optimization of inference and generative model parameters (Palmarini et al., 2023).
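
The two-part code can be illustrated with a small worked comparison. The sketch below uses a plain independent-error likelihood and made-up prior probabilities; it is not the structured prior or likelihood of MML-ILP, only an instance of the same $-\log_2 P(H) - \log_2 P(D \mid H)$ form.

```python
import math


def message_length(prior_prob, likelihood):
    """Two-part message length in bits: -log2 P(H) - log2 P(D | H)."""
    return -math.log2(prior_prob) - math.log2(likelihood)


def independent_error_likelihood(n_correct, n_wrong, error_rate=0.1):
    """P(D | H) when each label is wrong independently with probability error_rate."""
    return (1.0 - error_rate) ** n_correct * error_rate ** n_wrong


# Hypothetical hypotheses: a short rule (4 bits) that misclassifies a quarter
# of the examples, and a long rule (12 bits) that classifies all of them.
simple_prior, complex_prior = 2.0 ** -4, 2.0 ** -12

# With 8 examples the short rule gives the shorter total message.
simple_small = message_length(simple_prior, independent_error_likelihood(6, 2))
complex_small = message_length(complex_prior, independent_error_likelihood(8, 0))

# With 32 examples the extra structure pays for itself.
simple_large = message_length(simple_prior, independent_error_likelihood(24, 8))
complex_large = message_length(complex_prior, independent_error_likelihood(32, 0))
```

The preferred hypothesis flips as the dataset grows: the cost of stating the longer rule is paid once, while the cost of each misclassification is paid per example.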

6. Empirical Results, Comparative Analyses, and Key Findings

BPL systems have demonstrated data efficiency, generalization, and robustness in diverse experimental settings:

Logical Program Induction: MML-based ILP outperforms MDL-style baselines by 4–7% in balanced accuracy in few-shot regimes and up to 50% in unbalanced scenarios. Its performance holds under high-noise and positive-only training conditions (Sharma et al., 8 Aug 2025).
Probabilistic Program Merging: In synthetic domains (e.g., colored trees), repeated application of abstraction and deargumentation shrinks program sizes by over 50% while preserving or enhancing posterior probability. Induced programs generalize structural motifs to generate novel but appropriate outputs (Hwang et al., 2011).
Neural-Guided DSL Induction: DreamCoder, through interleaved abstraction and recognition learning, achieves near-complete coverage of held-out tasks in domains ranging from list manipulation to physics discovery. Ablations indicate that removing DSL learning degrades final accuracy by up to 40 points, underlining the necessity of compositional abstraction (Ellis et al., 2020).
Amortized Library Learning: DreamDecompiler accelerates generalization, achieving 10–17% higher test task coverage at mid-training compared to standard DreamCoder in multiple domains. The greatest gains are observed in early training and few-shot settings (Palmarini et al., 2023).
Sampler Program Synthesis: The grammar-MCMC approach in Anglican rediscovers canonical implementations (e.g., the Box–Muller Normal sampler, Bernoulli rejection samplers) and matches or approaches the efficiency of genetic programming in terms of solution quality per evaluation run, while providing principled Bayesian uncertainty quantification (Perov, 2016).
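
For reference, the Box–Muller transform mentioned in the last item has the following textbook form. This is the standard algorithm written out by hand, not the program synthesized in the cited work.

```python
import math
import random


def box_muller(rng):
    """Return one standard Normal draw built from two Uniform(0, 1) draws."""
    u1 = 1.0 - rng.random()  # shift from [0, 1) to (0, 1] so log(u1) is defined
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    return radius * math.cos(2.0 * math.pi * u2)


rng = random.Random(0)
draws = [box_muller(rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)
```

The program uses only uniform draws, a logarithm, a square root, and a cosine, which is the kind of short composition of primitives that a grammar over arithmetic and random-choice operators can express.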

7. Relationship to Other Learning Paradigms and Limitations

Bayesian Program Learning contrasts with classic program synthesis and MDL-based learning by placing explicit probabilistic priors, and performing inference, over full program structure and (optionally) parameter hierarchies. MDL-style objectives often omit explicit hypothesis priors and, in the comparison reported for logical rule learning, fail to penalize over-generalization in data-imbalanced regimes (Sharma et al., 8 Aug 2025). Classical neural program synthesis approaches, such as DeepCoder, either fix the DSL or lack hierarchical abstraction induction (Ellis et al., 2020). BPL frameworks provide distributions over hypotheses, supporting interpretability and sample-efficient generalization, especially in limited and noisy data contexts.

Current limitations include the computational cost of search in unrestricted program spaces, the separation of candidate generation and guide-based chunking in some abstraction induction pipelines, and reliance on hand-specified grammar rules in sampler synthesis. A plausible implication is that joint optimization of recognition and generative components, richer neural architectures exploiting both breadth and depth, and more integrated abstraction induction mechanisms could further increase scalability and generalization (Palmarini et al., 2023).

In summary, Bayesian Program Learning constitutes a unified and extensible set of methodologies for inferring interpretable, generalizable programmatic models directly from data, leveraging Bayesian inference principles, abstract representations, and hybrid search strategies to combine the strengths of symbolic and statistical machine learning (Sharma et al., 8 Aug 2025, Hwang et al., 2011, Ellis et al., 2020, Palmarini et al., 2023, Perov, 2016).

References

Perov (2016). Applications of Probabilistic Programming (Master's thesis, 2015).

Sharma et al. (2025). Learning Logical Rules using Minimum Message Length.

Ellis et al. (2020). DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning.

Palmarini et al. (2023). Bayesian Program Learning by Decompiling Amortized Knowledge.

Hwang et al. (2011). Inducing Probabilistic Programs by Bayesian Program Merging.
