Program-as-Weights (PAW): Dual Inference Paradigms

Updated 3 July 2026

PAW is a paradigm that transforms program specifications into weight objects, enabling both neural fuzzy-function execution and automata-theoretic inference.
The neural PAW approach compiles natural-language-specified functions into compact artifacts that reach 73.78% exact-match accuracy on FuzzyBench with a 0.6B interpreter, using roughly 1/50th the inference memory of prompting Qwen3-32B.
The automata-theoretic framework converts probabilistic programs into weighted automata, allowing for exact inference via formal power series semantics.

Program-as-Weights (PAW) refers to two distinct paradigms within machine learning and probabilistic programming: (1) the neural “fuzzy-function” programming approach for compiling natural-language-specified functions into parameter-efficient neural artifacts (Zhang et al., 2 Jul 2026), and (2) the automata-theoretic compilation of probabilistic programs into weighted automata for exact inference (Geißler et al., 18 Sep 2025). Both frameworks share a unifying principle: programs are represented and executed as weight objects—either as neural parameters or as weighted transitions—enabling novel regimes of programmability and inference.

1. Fuzzy-Function Programming and the PAW Paradigm

In the fuzzy-function programming paradigm, a fuzzy function $f: X \to Y$ is any function for which specification by crisp symbolic code is impractical or impossible. Typical examples include log-line alerting, intent-based ranking, and data repair, which are more naturally described by natural language, labeled examples, or vague constraints.

Traditional LLM usage invokes a high-capacity foundation model per input, raising issues for locality, reproducibility, and cost. PAW reframes model invocation: a function specification $s$ is compiled once into a small, locally executable "program" $p$, a neural artifact encoding the function. Subsequent executions run a lightweight neural interpreter on the compiled program and input $x$, yielding output $\hat{y} \approx f(x)$. This enables offline, verifiable, and resource-efficient deployment (Zhang et al., 2 Jul 2026).

In the automata-theoretic context, PAW refers to the translation of imperative probabilistic programs into weighted automata whose formal power series semantics represent distributions over possible executions. Every statement in a discrete program corresponds to a compositional automata transformation, so that the resulting automaton encodes posterior distributions under conditioning (Geißler et al., 18 Sep 2025).

2. Formal Definitions and Core Equations

Neural PAW (Fuzzy-Function)

Compiler abstraction:

$p = \text{Compiler}(s), \qquad \hat{y} = \text{Interpreter}(p, x) \approx f(x)$
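The compile-once, execute-many pattern behind this abstraction can be illustrated with a minimal sketch. Everything here is a stand-in: `Program`, `compile_spec`, and `interpret` are hypothetical names, and the keyword rule only mimics what a neural interpreter would do. The point is the call pattern, not the model.

```python
from dataclasses import dataclass
from functools import lru_cache

COMPILE_CALLS = 0  # counts how often the (expensive) compiler stand-in runs


@dataclass(frozen=True)
class Program:
    """Hypothetical compiled artifact p = (p_discrete, p_continuous)."""
    discrete: str      # pseudo-program text
    continuous: tuple  # placeholder for adapter weights (empty in this toy)


@lru_cache(maxsize=None)
def compile_spec(spec: str) -> Program:
    """Stand-in for Compiler(s): executed once per distinct specification."""
    global COMPILE_CALLS
    COMPILE_CALLS += 1
    return Program(discrete="TASK: " + spec.strip(), continuous=())


def interpret(program: Program, x: str) -> str:
    """Stand-in for Interpreter(p, x); a real interpreter would be a small LM."""
    keyword = program.discrete.split()[-1].lower()  # toy rule, assumption only
    return "yes" if keyword in x.lower() else "no"


spec = "flag log lines that mention timeout"
lines = ["GET /health 200", "upstream TIMEOUT after 30s"]
outputs = [interpret(compile_spec(spec), line) for line in lines]
print(outputs, COMPILE_CALLS)
```

The specification is compiled a single time even though the function is evaluated on several inputs, which is the cost structure the abstraction is meant to capture.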

Program structure: $p = (p_\mathrm{discrete}, p_\mathrm{continuous})$
- $p_\mathrm{discrete}$: a pseudo-program (a textual restatement plus I/O examples)
- $p_\mathrm{continuous}$: a parameter-efficient adapter (LoRA), computed from mean-pooled compiler hidden states and mapped to adapters injected into a frozen interpreter

Optimization objective (supervised fine-tuning):

$\mathcal{L}(\theta) = \mathbb{E}_{(s,x,y)\sim \text{FuzzyBench}}[-\log P_{\phi}(y \mid p_\mathrm{discrete}, p_\mathrm{LoRA}(\theta; s, p_\mathrm{discrete}), x)]$

where $\phi$ is the frozen interpreter’s parameter set and $\theta$ collects the trainable parameters of the LoRA compiler and mapping.
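As a rough illustration of how such an objective is evaluated in practice, the sketch below computes the negative log-likelihood of a target sequence from per-token probabilities and averages it over a mini-batch. It assumes the usual autoregressive factorization of the interpreter's output probability; the probability values are made up, and `sequence_nll` and `batch_loss` are hypothetical helper names.

```python
import math


def sequence_nll(token_probs):
    """-log P(y | ...) when P(y | ...) factorizes into per-token probabilities."""
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("each probability must lie in (0, 1]")
    return -sum(math.log(p) for p in token_probs)


def batch_loss(batch):
    """Monte Carlo estimate of the expectation: mean NLL over sampled triplets."""
    if not batch:
        raise ValueError("batch must not be empty")
    return sum(sequence_nll(probs) for probs in batch) / len(batch)


# Two hypothetical targets: probabilities the interpreter assigned to each gold token.
batch = [[0.9, 0.8], [0.5, 0.5, 1.0]]
loss = batch_loss(batch)
print(round(loss, 4))
```

In training, gradients of this quantity would flow only into the compiler-side parameters, since the interpreter is frozen.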

Automata-Theoretic PAW (Probabilistic Programs)

Weighted automata: a weighted automaton over a commutative alphabet, whose semantics is a formal power series assigning each word the total weight of its accepting runs.

Program translation: A loop-free imperative program over integer variables is compiled into automata operations (label substitution, concatenation, product with DFA, weighted superposition), so that applying the translation to a prior automaton yields a weighted automaton for the posterior distribution, matching the operational semantics exactly.

Soundness theorem: For a normalized prior PGA and a loop-free program, normalizing the translated automaton yields the exact conditional output distribution.
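To make the formal power series reading concrete, here is a minimal sketch of how a weighted automaton assigns a weight to a word: sum, over all runs, the product of the initial weight, the transition weights, and the final weight. The data layout and the one-state example (a geometric distribution over the number of `x` symbols) are illustrative assumptions, not the construction from the cited paper.

```python
from fractions import Fraction


def word_weight(initial, transitions, final, word):
    """Total weight of all runs on `word` (one coefficient of the power series)."""
    current = dict(initial)  # state -> weight accumulated so far
    for symbol in word:
        nxt = {}
        for state, weight in current.items():
            for target, w in transitions.get((state, symbol), []):
                nxt[target] = nxt.get(target, Fraction(0)) + weight * w
        current = nxt
    return sum((w * final.get(q, Fraction(0)) for q, w in current.items()),
               Fraction(0))


# One state: read another "x" with weight 1/2, or stop with weight 1/2.
initial = {"q0": Fraction(1)}
transitions = {("q0", "x"): [("q0", Fraction(1, 2))]}
final = {"q0": Fraction(1, 2)}

weights = [word_weight(initial, transitions, final, "x" * n) for n in range(4)]
print(weights)
```

The coefficient of the word with n copies of `x` is (1/2)^(n+1), so the series sums to 1 and can be read as a probability distribution over a counter value.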

3. System Architecture and Pipeline (Fuzzy-Function PAW)

The PAW compilation-execution pipeline comprises two stages (Zhang et al., 2 Jul 2026):

Step A: Compilation (Cloud, one-time per function)

- Pseudo-compiler (frozen Qwen3-4B-Instruct) produces $p_\mathrm{discrete}$ from $s$
- LoRA compiler (trainable Qwen3-4B-Instruct) consumes $(s, p_\mathrm{discrete})$ and yields mean-pooled hidden states, which are mapped into LoRA weights ($p_\mathrm{continuous}$)
- Output: a program artifact (text plus a LoRA adapter on the order of 23 MB)
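One way to picture the step from mean-pooled hidden states to LoRA weights is a mapper that turns the pooled vector into mixing coefficients over a few shared low-rank bases. The sketch below is an assumption-laden toy in pure Python: the linear mapper, the reuse of the same coefficients for both factors, and all shapes and values are hypothetical and not taken from the paper.

```python
def mean_pool(hidden_states):
    """Average a list of token vectors (T x d) into a single d-vector."""
    count = len(hidden_states)
    return [sum(column) / count for column in zip(*hidden_states)]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def mix(bases, coeffs):
    """Weighted sum of equally shaped matrices (the shared bases)."""
    rows, cols = len(bases[0]), len(bases[0][0])
    return [[sum(c * base[i][j] for c, base in zip(coeffs, bases))
             for j in range(cols)] for i in range(rows)]


def compile_adapter(hidden_states, mapper, bases_a, bases_b):
    """Pooled state -> mixing coefficients -> LoRA factors A (r x d) and B (d x r)."""
    coeffs = matvec(mapper, mean_pool(hidden_states))
    return mix(bases_a, coeffs), mix(bases_b, coeffs)


hidden_states = [[1.0, 3.0], [3.0, 5.0]]      # T = 2 tokens, d = 2
mapper = [[1.0, 0.0], [0.0, 0.5]]             # hypothetical trained mapping
bases_a = [[[1.0, 0.0]], [[0.0, 1.0]]]        # two shared bases, rank 1
bases_b = [[[1.0], [0.0]], [[0.0], [1.0]]]
A, B = compile_adapter(hidden_states, mapper, bases_a, bases_b)
print(A, B)
```

Sharing bases keeps the mapper's output small: it predicts a handful of coefficients per target module rather than every adapter entry.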

Step B: Local Execution (Per-query, offline)

- Load the frozen Qwen3-0.6B interpreter
- On input $x$: attach the LoRA adapter, prepend $p_\mathrm{discrete}$ to $x$, and run autoregressive generation to produce the output $\hat{y}$
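The two local operations in this step, attaching a low-rank adapter to a frozen weight and prepending the pseudo-program to the input, can be sketched as follows. This uses the standard LoRA forward rule y = Wx + scale * B(Ax); the prompt template, the `scale` parameter default, and the toy matrices are assumptions for illustration only.

```python
def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, scale=1.0):
    """y = W x + scale * B (A x); the frozen weight W is never modified."""
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    return [b + scale * delta for b, delta in zip(base, low_rank)]


def build_prompt(pseudo_program, x):
    """Hypothetical template: pseudo-program first, then the query input."""
    return pseudo_program + "\n\nInput: " + x + "\nOutput:"


W = [[1.0, 0.0], [0.0, 1.0]]   # frozen interpreter weight (toy)
A = [[1.0, 1.0]]               # rank-1 down-projection from the artifact
B = [[1.0], [2.0]]             # rank-1 up-projection from the artifact
y = lora_forward(W, A, B, [1.0, 2.0])
prompt = build_prompt("TASK: flag lines that mention timeout", "upstream timeout")
print(y)
print(prompt)
```

Keeping W untouched means one interpreter can serve many compiled functions by swapping adapters and pseudo-programs.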

Component features:

- Low-rank LoRA adapters built from shared low-rank bases per target module
- Quantized deployment supports sub-GB memory and roughly 30 tokens/s on commodity laptops
- Compilation is performed once; subsequent function evaluation is fast and resource-light

4. Dataset Construction and Training (FuzzyBench & Optimization)

The FuzzyBench dataset underpins fuzzy-function PAW training, comprising 10 million $(s, x, y)$ triplets spanning seven families and their sub-categories. Specifications are generated programmatically with GPT-5.2, and each comes with a set of labeled I/O examples. The test split (10%) is spec-disjoint, with a verified subset requiring agreement between two strong models for label disambiguation.
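A spec-disjoint split means that no specification contributes examples to both train and test. One simple way to obtain this, shown here as a sketch and not as the procedure actually used for FuzzyBench, is to assign each specification to a split from a stable hash of its text, so every triplet for that specification lands on the same side. The helper names and the synthetic triplets are hypothetical.

```python
import hashlib


def split_of(spec, test_fraction=0.10):
    """Assign a specification to 'train' or 'test' from a stable hash of its text."""
    digest = hashlib.sha256(spec.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return "test" if bucket < test_fraction else "train"


def spec_disjoint_split(triplets, test_fraction=0.10):
    """Route each (spec, input, output) triplet by its specification only."""
    train, test = [], []
    for s, x, y in triplets:
        target = test if split_of(s, test_fraction) == "test" else train
        target.append((s, x, y))
    return train, test


# Synthetic data: 200 specifications with 3 examples each.
triplets = [(f"spec {i}", f"input {j}", f"output {j}")
            for i in range(200) for j in range(3)]
train, test = spec_disjoint_split(triplets)
print(len(train), len(test))
```

Because the split depends only on the specification text, it is reproducible across runs and the realized test share is only approximately the requested fraction.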

Optimization is direct supervised fine-tuning of the LoRA compiler and mapping, using negative log-likelihood of interpreter predictions. No reinforcement learning or policy gradients are used. Quantization strategies (Q6_K, IQ4_XS) minimize memory overhead with minimal accuracy tradeoff.
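A back-of-the-envelope estimate shows why quantization brings a 0.6B-parameter interpreter under 1 GB. The sketch below counts weight storage only (no KV cache, activations, or runtime overhead). The bits-per-weight figures for Q6_K and IQ4_XS are the nominal values commonly quoted for these GGUF formats and should be treated as assumptions, as should the round parameter count.

```python
# Nominal storage cost per weight, in bits (assumed values, see lead-in).
BITS_PER_WEIGHT = {"bf16": 16.0, "Q6_K": 6.5625, "IQ4_XS": 4.25}


def weight_memory_gb(n_params, fmt):
    """Weights-only footprint in GB (10^9 bytes) for a given storage format."""
    return n_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


N_PARAMS = 0.6e9  # round figure for a 0.6B-parameter interpreter
for fmt in BITS_PER_WEIGHT:
    print(fmt, round(weight_memory_gb(N_PARAMS, fmt), 3), "GB")
```

Under these assumptions the bf16 estimate matches the roughly 1.2 GB scale of an unquantized 0.6B model, while both quantized formats land well below 1 GB.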

5. Comparative Performance and Ablation Studies

Extensive quantitative benchmarks on FuzzyBench and public NLP datasets demonstrate that PAW using a 0.6B interpreter with LoRA achieves 73.78% exact-match accuracy on FuzzyBench, outperforming local prompting of Qwen3-32B (68.7%) while using roughly 1/50th the inference memory. Comparable trends hold across various real-world datasets, with the 0.6B+PAW combination consistently matching or surpassing much larger models for key function invocation workloads.
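Exact-match accuracy, the metric quoted above, counts a prediction as correct only if it equals the reference string. A minimal sketch follows; stripping surrounding whitespace before comparison is an assumption, since the exact normalization used in the benchmark is not stated here.

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions identical to their reference after stripping whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one example is required")
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


predictions = ["yes", "no ", "maybe"]
references = ["yes", "no", "no"]
accuracy = exact_match_accuracy(predictions, references)
print(f"{accuracy:.2%}")
```

Because there is no partial credit, the metric is strict for free-form outputs, which makes the comparison between a 0.6B interpreter and a 32B prompted model a demanding one.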

Ablations reveal that:

- Compiler-generated adapters outperform fixed or fully fine-tuned LoRA on the same backbone by at least 15 percentage points
- The hybrid design (pseudo-program + LoRA) improves robustness to input noise, with the discrete component acting as a denoiser
- Simpler shared-basis LoRA mappers outperform per-position and per-layer variations

Method	FuzzyBench Accuracy (Exact Match)	Inference Memory	Throughput
Qwen3-32B Prompting	68.7%	~60 GB (bf16)	--
Qwen3-0.6B+PAW (LoRA)	73.78%	~1.2 GB (bf16)	~30 tokens/s (Mac)
GPT-2 124M+PAW (LoRA)	54.39%	<1 GB	--

6. Program-as-Weights in Probabilistic Programming

The automata-based PAW formalism encodes discrete probabilistic programs as objects whose states and transitions are weighted, and whose semantics as formal power series directly correspond to the prior-to-posterior distribution transformer. Each primitive construct (increment, assignment, conditional, observation) is mapped to an explicit automata transformation—resulting in sound and exact computation of posterior probabilities for all reachable variable valuations, without resorting to sampling or approximation (Geißler et al., 18 Sep 2025). For loop-free programs and bounded variable ranges, automata size grows polynomially or moderately exponentially, and minimization can be applied.

A concrete example demonstrates compilation of a two-variable coin-flip program with observation, resulting in an automaton whose formal power series yields exact joint and conditional probabilities without loss.
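The flavor of such an example can be conveyed with a small sketch that computes the same kind of exact posterior by enumerating weighted variable valuations with rational arithmetic. This is a plain enumeration standing in for the automaton semantics, not the automaton construction itself, and the program (two fair coin flips followed by an observation that at least one came up 1) is a hypothetical choice.

```python
from fractions import Fraction


def flip(dist, var, p):
    """var := 1 with probability p, else 0 (weighted superposition of two branches)."""
    out = {}
    for state, weight in dist.items():
        for value, q in ((1, p), (0, 1 - p)):
            new_state = tuple(sorted({**dict(state), var: value}.items()))
            out[new_state] = out.get(new_state, Fraction(0)) + weight * q
    return out


def observe(dist, predicate):
    """Keep only the valuations that satisfy the observation."""
    return {s: w for s, w in dist.items() if predicate(dict(s))}


def normalize(dist):
    """Rescale weights so they sum to 1, giving the conditional distribution."""
    total = sum(dist.values(), Fraction(0))
    if total == 0:
        raise ValueError("observation has probability zero")
    return {s: w / total for s, w in dist.items()}


prior = {(): Fraction(1)}
joint = flip(flip(prior, "x", Fraction(1, 2)), "y", Fraction(1, 2))
posterior = normalize(observe(joint, lambda v: v["x"] + v["y"] >= 1))
p_x1 = sum(w for s, w in posterior.items() if dict(s)["x"] == 1)
print(p_x1)
```

All weights are exact fractions, so the conditional probability of `x = 1` comes out as exactly 2/3, with no sampling error.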

7. Applications, Limitations, and Theoretical Significance

Representative Applications

- Real-time local log alerting, on-device repair, and intent-based navigation without external APIs
- Modular tool pipelines for semantic search and multi-function routing (93% on ToolCall-15)
- Integration in interactive fuzzy games and safety verification routines

Limitations

- Compiler-interpreter architectural coupling: changing the backbone requires retraining the compiler
- Primarily validated for single-step program functions; extension to multi-step reasoning or a more expressive grammar is pending
- Continuous/fuzzy program (“weight”) artifacts are opaque; neural program inspection remains an open area
- The training data is synthetic, so further external validation is needed

Theoretical Context and Impact

PAW marks a shift from per-input, API-based LLM calls to the compilation of persistent, efficient, and portable neural function artifacts. In the probabilistic programming domain, it establishes a correspondence between program semantics and automata theory, enabling exact inference via algebraic manipulation of weights rather than iterative sampling or computationally expensive Markov chain methods.

References

"Program-as-Weights: A Programming Paradigm for Fuzzy Functions" (Zhang et al., 2 Jul 2026)
"Weighted Automata for Exact Inference in Discrete Probabilistic Programs" (Geißler et al., 18 Sep 2025)
