PerProb Framework Overview
- PerProb is a name shared by several distinct frameworks spanning probabilistic resource analysis, expert-system reasoning, probabilistic separation logic, and privacy evaluation in machine learning.
- The resource-analysis line employs program transformation and symbolic manipulation to derive closed-form probability distributions for resource usage and computational metrics.
- The other lines contribute belief revision for managing uncertainty in expert systems, modular verification of probabilistic programs, and empirical assessment of memorization risks in LLMs.
PerProb is a designation shared by multiple distinct but influential frameworks in probabilistic program analysis, resource modeling, expert-system reasoning, and privacy evaluation for machine learning. In academic literature, the name is associated with four main contributions: a program transformation approach to probabilistic resource analysis (Kirkeby et al., 2016), a non-monotonic reasoning controller for probabilistic assumptions in expert systems (Cohen, 2013), a semantics-driven separation logic for imperative probabilistic programs (Jereb et al., 2 Jun 2025), and a model-agnostic, label-free methodology for quantifying memorization in LLMs (Liao et al., 16 Dec 2025). These frameworks are independent in methods and application domains but are unified by their emphasis on probabilistic semantics, symbolic analysis, and model- or inference-level rigor.
1. PerProb for Probabilistic Resource Analysis by Program Transformation
PerProb, as introduced in (Kirkeby et al., 2016), is a fully automated, multi-phase program analysis framework that symbolically derives output probability distributions for program resource usage, given input distributions. Its purpose is to replace average- or worst-case analysis with the computation of the full probability distribution of quantities such as step-counts, memory accesses, or energy usage.
The architecture consists of five key phases:
- Instrumentation: Source C code is annotated with synthetic counters at every relevant operation, producing an executable instrumented program.
- Front-end Translation: Slicing isolates statements that interact with counters and transforms nested loops into primitive-recursive functions in a first-order intermediate representation.
- Create Phase (Probability-Program Generation): The output probability distribution of the target function f is expressed as P_out(z) = Σ_y P_in(y) · χ(f(y) = z), where χ is the characteristic predicate (1 when the condition holds, 0 otherwise).
- Separate Phase (Call-Elimination): Recursion is unfolded into explicit sum-product expressions using argument-development constructs.
- Simplify Phase (Closed-Form Transformation): Algebraic rewrites (symbolic summation, over-approximation rules, integration with Mathematica for symbolic manipulation) reduce expressions to closed form or tight upper bounds.
Key techniques include removal of conditionals, recursion unfolding, summation elimination via closed-form series (power sums up to degree ten are supported), and product over-approximation using P-box-style reasoning.
PerProb operates on discrete input spaces with PMFs and supports both independent and dependent input structures. The output—typically a succinct closed-form or over-approximated distribution formula—enables precise average-case and probabilistic analysis of C programs.
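The Create-phase computation can be illustrated with a toy sketch (not the framework's own pipeline, which works symbolically on C programs): an instrumented program's step counter is pushed through a discrete input PMF, summing input probabilities where the characteristic predicate holds.

```python
from fractions import Fraction
from collections import defaultdict

def step_count(x):
    # Instrumented toy program: count loop iterations of repeated halving.
    steps = 0
    while x > 0:
        x //= 2
        steps += 1
    return steps

# Discrete input PMF: x uniform over {1, ..., 8}.
input_pmf = {x: Fraction(1, 8) for x in range(1, 9)}

# P_out(z) = sum over inputs y with step_count(y) = z of P_in(y),
# i.e. the input distribution pushed through the characteristic predicate.
output_pmf = defaultdict(Fraction)
for y, p in input_pmf.items():
    output_pmf[step_count(y)] += p

print(dict(output_pmf))  # exact PMF of the step count
```

Where this enumeration is exhaustive and concrete, PerProb instead derives the same distribution symbolically, so the result is a closed-form formula valid for all input sizes.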
2. PerProb for Non-Monotonic Probabilistic Reasoning (NMP)
As outlined in (Cohen, 2013), PerProb (therein, the Non-Monotonic Probabilist, NMP) is a hybrid reasoning system for expert systems, synthesizing Shafer-style belief functions with non-monotonic revision logic. The framework addresses the requirement for dynamic revision of probabilistic assumptions (e.g., independence, model selection) in the presence of empirical conflict.
Essential constructs include:
- Basic Probability Assignments (Belief Functions): m : 2^Θ → [0, 1] with m(∅) = 0 and Σ_{A ⊆ Θ} m(A) = 1, encoding degrees of belief over sets of hypotheses.
- Dempster’s Rule of Combination: Aggregates evidence sources by orthogonal sum, with normalization to discard conflicting mass.
- Fuzzy Measures of Conflict: Given a dichotomy {A, ¬A}, conflict is measured by the combination mass falling on the empty intersection (the normalization term in Dempster's rule), graded as a fuzzy degree rather than a hard threshold.
- Assumption Support and In/Out Membership: Assumptions are tracked through fuzzy support lists, quantifying the contribution of each to a conclusion and the degree to which they are “in play.”
- Iterative Revision Algorithm: On significant conflict, the system identifies and weakens/retracts the most culpable assumption, using foundation/support degree calculations and recombination by Dempster’s rule, halting when further revision is unwarranted.
PerProb thus provides a granular, explainable mechanism for dynamically revoking or discounting assumptions driving probabilistic conclusions, bridging belief-function and non-monotonic TMS approaches.
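The belief-function machinery above can be sketched numerically. The following is a minimal, illustrative implementation of Dempster's rule of combination (standard Dempster-Shafer theory, not NMP's full revision loop), exposing the conflict mass that the fuzzy conflict measure monitors:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two basic probability assignments (dicts mapping frozenset
    focal elements to mass) via Dempster's rule: orthogonal sum, with the
    mass on empty intersections (conflict) discarded by renormalization."""
    combined = {}
    conflict = 0.0
    for (b, p), (c, q) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + p * q
        else:
            conflict += p * q  # mass on the empty set: the measure of conflict
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}, conflict

# Dichotomy {A, notA}; Theta is the whole frame of discernment.
A, notA = frozenset({"A"}), frozenset({"notA"})
theta = A | notA
m1 = {A: 0.6, theta: 0.4}      # evidence partially supporting A
m2 = {notA: 0.5, theta: 0.5}   # evidence partially supporting not-A
m12, K = dempster_combine(m1, m2)
print(m12, K)  # combined beliefs and conflict mass K = 0.3
```

In NMP, a large K for a dichotomy is the trigger for the iterative revision algorithm, which then weakens or retracts the most culpable assumption and recombines.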
3. PerProb in Probabilistic Separation Logic and the Frame Rule
In (Jereb et al., 2 Jun 2025), PerProb is the designation for a formal semantic framework supporting probabilistic separation logic with a principled, side-condition-free frame rule. This work advances program verification under probabilistic imperative semantics (the pwhile language), focusing on modularity and independence reasoning.
Salient features:
- Probabilistic States: Random states are modeled as full-support distributions over countable spaces of variable assignments; independence and conditional independence are rigorously defined at both the variable and assertion levels.
- Specifications and Safety: Hoare-style triples require almost sure non-faulting (safety) and post-condition satisfaction on termination.
- Relative Tightness: A proven property asserting that the final state’s relevant variables depend only on the initial state’s relevant variables. Formally, for a partially correct triple, the projection of the post-state onto the postcondition’s footprint is conditionally independent of the entire initial state given the projection of the initial state onto the precondition’s footprint.
- Separating Conjunction: Extended to probabilistic contexts; a random state satisfies P ∗ Q iff P and Q hold on footprints that are probabilistically independent in that state.
- Frame Rule: If {P} C {Q} holds and the footprint of a frame assertion R is disjoint from the program’s modified variables, then {P ∗ R} C {Q ∗ R} holds, with no auxiliary side conditions.
Canonical examples (e.g., independent coin flips, overlaying deterministic fragments) illustrate correctness and modular compositionality in verifying independence properties under program transformations.
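The semantic condition behind the separating conjunction can be made concrete with a small sketch (an illustrative finite model, not the paper's pwhile semantics): a distribution over states satisfies an independence-separated assertion only if the joint distribution over the two footprints factorizes into its marginals.

```python
from itertools import product
from fractions import Fraction

def marginal(dist, vars_):
    """Project a distribution over states (tuples of (var, value) pairs)
    onto the given footprint of variables."""
    out = {}
    for state, p in dist.items():
        key = tuple((v, dict(state)[v]) for v in vars_)
        out[key] = out.get(key, Fraction(0)) + p
    return out

def independent(dist, xs, ys):
    """Check whether footprints xs and ys are independent in dist: the
    semantic condition under which assertions about xs and ys separate (*)."""
    mx, my, mxy = marginal(dist, xs), marginal(dist, ys), marginal(dist, xs + ys)
    return all(mxy.get(kx + ky, Fraction(0)) == px * py
               for (kx, px), (ky, py) in product(mx.items(), my.items()))

half = Fraction(1, 2)
# Two fair, independent coin flips x and y: footprints {x} and {y} separate.
coins = {(("x", a), ("y", b)): half * half for a in (0, 1) for b in (0, 1)}
# A correlated pair (y always equals x): the footprints do not separate.
correlated = {(("x", a), ("y", a)): half for a in (0, 1)}
print(independent(coins, ["x"], ["y"]), independent(correlated, ["x"], ["y"]))
```

This is exactly the independent-coin-flips example in discrete form: the frame rule may attach an assertion about y to a proof about x in the first distribution but not the second.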
4. PerProb for Memorization Assessment in LLMs
In (Liao et al., 16 Dec 2025), PerProb designates a unified, label-free empirical framework for assessing memorization and privacy risk in LLMs, especially in settings lacking access to ground-truth member/non-member labels.
The key procedural and analytic steps:
- Label-Free Memorization Metric: For a sample x, use the gap ℓ_{M_V}(x) − ℓ_{M_A}(x), where M_V and M_A are the victim and adversary models, respectively, and ℓ denotes the mean (per-token) log-likelihood.
- Experimentation across Four Attack Patterns: From classic black-box (shadow model, no parameter sharing) to full white-box (parameter transfer or partial data leaks), all attack scenarios use the same PerProb metrics.
- Evaluation Protocol: Generation tasks involve paired synthetic-sample production and t-statistic assessment; classification tasks use output log-probs as features for standard classifiers (RF, MLP).
- Mitigation Analysis: Effectiveness of knowledge distillation (KL minimization with temperature), early stopping (validation monitoring), and differential privacy (Laplace noise on logits) is quantified empirically.
- Findings: Robust detection of memorization in mid-scale LLMs (e.g., GPT-2, GPT-Neo), with substantial F1 improvements over chance for MIAs, and successful reduction of leakage by applied mitigation techniques.
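The label-free metric can be sketched with toy stand-ins for the two models. This assumes the metric takes the form of a mean log-likelihood gap, as the surrounding definitions suggest; the unigram "models" here are illustrative placeholders, not LLMs.

```python
import math

def mean_log_likelihood(model, tokens):
    """Mean per-token log-likelihood under a toy unigram model (a dict
    mapping token -> probability); stands in for an LLM's scoring pass."""
    return sum(math.log(model[t]) for t in tokens) / len(tokens)

def perprob_score(victim, adversary, tokens):
    # Label-free signal: a victim model that memorized the sample assigns
    # it a higher mean log-likelihood than an adversary/reference model,
    # so no ground-truth member/non-member labels are needed.
    return mean_log_likelihood(victim, tokens) - mean_log_likelihood(adversary, tokens)

sample = ["the", "secret", "phrase"]
victim = {"the": 0.5, "secret": 0.3, "phrase": 0.2}      # sharper on the sample
adversary = {"the": 0.4, "secret": 0.1, "phrase": 0.05}  # generic reference model
print(perprob_score(victim, adversary, sample) > 0)      # positive gap: likely memorized
```

In the actual protocol, such scores feed either the paired statistical tests (generation tasks) or the feature vectors handed to RF/MLP classifiers (classification tasks).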
PerProb in this setting delivers a model- and task-agnostic framework for quantifying and mitigating privacy risk due to memorization in generative neural architectures, with applicability in both open- and closed-source contexts.
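Of the mitigations evaluated, Laplace noise on logits is the simplest to sketch. The following is an illustrative stand-in (the paper's exact noise scale and injection point are not specified in this summary); it perturbs released logits so the log-probability signal exploited by membership inference is blunted.

```python
import math
import random

def laplace_noise(scale, rng):
    # Inverse-CDF sampling of Laplace(0, scale); the stdlib `random` module
    # has no Laplace sampler, so we derive one from a uniform draw.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(max(1e-12, 1.0 - 2.0 * abs(u)))

def privatize_logits(logits, scale=0.5, seed=0):
    """Add Laplace noise to each logit before exposing output scores.
    Larger `scale` means stronger privacy but noisier (less useful) outputs."""
    rng = random.Random(seed)
    return [z + laplace_noise(scale, rng) for z in logits]

print(privatize_logits([2.0, -1.0, 0.5]))  # perturbed logits
```

The privacy/utility trade-off is then quantified empirically, as with the distillation and early-stopping mitigations: the same PerProb metric is recomputed on the perturbed outputs to measure residual leakage.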
5. Key Methodological and Theoretical Contributions
The PerProb frameworks collectively demonstrate the power of symbolic and semantic program transforms, rigorous uncertainty management, and inference auditing in computational systems:
- Symbolic Probability Propagation: Algebraic manipulation and recursion unfolding achieve analytical tractability in resource analysis (Kirkeby et al., 2016).
- Integrated Uncertainty Reasoning: Merging belief-function calculus with assumption-revision provides robust conflict resolution in probabilistic inference (Cohen, 2013).
- Compositionality and Modularity: Purified semantic models and frame rules extend classical ideas from separation logic to probabilistic domains, supporting scalable verification (Jereb et al., 2 Jun 2025).
- Empirical Security Assessment: Unified, indirect metrics operationalize privacy risk quantification without explicit membership data, expanding practical relevance for privacy analysis (Liao et al., 16 Dec 2025).
These methodologies foster transparent, modular, and mathematically sound analysis across domains where probabilistic behavior is central.
6. Limitations, Open Problems, and Future Directions
Despite broad utility, each PerProb instantiation exhibits particular technical and empirical limitations:
- In program-transformation-based resource analysis, complexity increases rapidly for programs with deep or non-linear recursion, and some summation series are not automatically reduced (Kirkeby et al., 2016).
- The NMP controller lacks a formal convergence guarantee, and the grading of assumption strength relies on heuristic membership calculations (Cohen, 2013).
- The semantic separation logic framework presumes discrete, countable state spaces and does not address continuous or dynamically allocated structures (Jereb et al., 2 Jun 2025).
- Memorization assessment has been demonstrated only on mid-scale open-source LLMs; real-world prompt drift and proprietary architectures remain practical challenges (Liao et al., 16 Dec 2025).
Open research directions include extending methods to richer programming or state models, formalizing convergence criteria, adapting to continuous distributions or heap-allocated data, and applying black-box memorization analysis to commercial LLM APIs.
A plausible implication is that the evolution of PerProb frameworks across disparate subfields signals a wider movement toward unified, semantically principled probabilistic systems analysis. Further integration of symbolic, semantic, and empirical approaches is anticipated to deepen understanding and scalability in resource modeling, uncertainty reasoning, and privacy assessment.