EnCompass: Python Agent Framework
- EnCompass is a Python framework that formalizes agent programs via probabilistic angelic nondeterminism, separating core workflow from inference-time search strategies.
- It compiles annotated Python functions into explicit search spaces, enabling integration with diverse algorithms like DFS, BFS, beam search, and MCTS.
- The framework reduces code overhead and LLM calls, as evidenced by empirical gains in speed and scalability across code translation and hypothesis search tasks.
EnCompass is a Python framework for agent programming that operationalizes "probabilistic angelic nondeterminism" (PAN), enabling the principled separation of agent workflow logic from inference-time search strategies. The framework empowers researchers and practitioners to efficiently develop, experiment with, and deploy LLM-based agents by compiling annotated Python functions into explicit search spaces over execution paths, allowing for seamless workflow specification and flexible, modular search algorithm integration (Li et al., 3 Dec 2025).
1. Foundations: Probabilistic Angelic Nondeterminism
Probabilistic angelic nondeterminism (PAN) formalizes the agent program as a nondeterministic, probabilistic process, where execution is partitioned at explicitly marked "branchpoints" to define decision steps within the agent's workflow. The program state $s$ is defined by a program location and a memory snapshot (including locals and shared variables). Between marked points, transitions are deterministic except for LLM calls or other random oracles, modeled as one-step probabilistic transitions $s \to s'$ occurring with probability $P(s' \mid s)$.
A complete program run is a path $\tau = (s_0, s_1, \ldots, s_T)$, with probability:

$$P(\tau) = \prod_{t=0}^{T-1} P(s_{t+1} \mid s_t)$$
Unlike demonic nondeterminism (worst-case branching), PAN enables "angelic" search: guided exploration over all possible execution paths to maximize a user-defined score $R(\tau)$, yielding the optimal trajectory $\tau^* = \arg\max_{\tau} R(\tau)$.
The search process constructs and explores a tree over program states, with search algorithms selecting which leaf (checkpoint) to expand next. This approach provides a rigorous, modular foundation for agent search and evaluation.
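To make the PAN objective concrete, the toy sketch below (hypothetical, not the EnCompass API) enumerates the paths of a two-step branching process: each path's probability is the product of its one-step transition probabilities, and "angelic" search selects the path maximizing a user-defined score $R(\tau)$. The choice labels, probabilities, and score function are all illustrative.

```python
import itertools

# Toy two-step branching process. STEP1 gives P(s1 | s0) for each first
# choice; STEP2 gives P(s2 | s1) for each second choice.
STEP1 = {"a": 0.7, "b": 0.3}
STEP2 = {"x": 0.5, "y": 0.5}

def score(path):
    # Hypothetical score function R(tau) over complete paths.
    return {"a": 1, "b": 2}[path[0]] + {"x": 0, "y": 3}[path[1]]

paths = list(itertools.product(STEP1, STEP2))
probs = {p: STEP1[p[0]] * STEP2[p[1]] for p in paths}
best = max(paths, key=score)   # the "angelic" optimal trajectory

print(best)         # ('b', 'y') -- the trajectory maximizing R
print(probs[best])  # its probability under the program (0.3 * 0.5)
```

Note that the optimal path need not be the most probable one; the search is guided by the score, not by the program's own sampling distribution.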
2. Separation of Workflow Specification and Search Strategy
Traditional agent programming often entangles workflow logic—such as loops, accumulators, and scoring heuristics—with inference-time search (e.g., varying N in best-of-N sampling, implementing beam search). EnCompass, through PAN, enforces separation by providing:
- Workflow layer: Authored as single-threaded code, with designated primitive calls marking unreliability (e.g., branchpoint) and evaluation (e.g., record_score).
- Search layer: Realized as an external engine operating over the tree of checkpoints, parameterized independently of the workflow.
This separation avoids code refactoring when experimenting with alternative search algorithms; adding branchpoints or modifying search parameters suffices. Compilation via the EnCompass decorator transforms routines into explicit search spaces for exploration.
The following pseudocode excerpt illustrates the compilation process:
```python
@encompass.compile
def agent(x):
    branchpoint()
    y = LLM(x)
    record_score(evaluate(y))
    return y

# After compilation (CPS-transformed sketch):
def cps_agent(frame, resume):
    frame['y'] = LLM(frame['x'])
    resume1 = lambda fr: resume(fr)
    return (frame, resume1)
```
3. Framework Implementation and API
The core of EnCompass is a Python decorator, @encompass.compile, which processes the abstract syntax tree (AST) of the agent function, applies continuation-passing style (CPS) transformation and tail-call optimization, then exposes enriched runtime objects supporting the search space API.
After decoration, functions become SearchSpace objects supporting:
| Method | Description | Return Type |
|---|---|---|
| `start()` | Initializes to the first program state (checkpoint) | `Checkpoint` |
| `search(algo, **kwargs)` | Executes the specified search to find the best return | best value |
| `search_multiple(...)` | Runs search to return a list of (value, score) pairs | List of tuples |
Primitives within decorated functions include:
- `branchpoint(name=…)`, `branchpoint_choose(choices)`
- `record_score(v)`, `record_score(evaluator, target, label=…)`
- `record_costs(api_cost=…)`, `early_stop_search()`, `optional_return(v)`
- `protect(expr, Exception)`, `searchover(fn(...))` (for nested searches)
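The stand-ins below (hypothetical; not the EnCompass implementation) sketch the semantics of two of these primitives: `branchpoint_choose` enumerates explicit alternatives, and `protect` turns a raised exception into a dead (pruned) branch instead of a crash. Real EnCompass primitives interact with the search engine; these are plain-Python approximations for intuition only.

```python
def branchpoint_choose(choices):
    """In a real search each choice would spawn a child checkpoint;
    here we simply enumerate the alternatives."""
    return list(choices)

def protect(thunk, exc_type):
    """Evaluate thunk(); on exc_type, signal a failed branch with None
    (a stand-in for pruning the branch from the search tree)."""
    try:
        return thunk()
    except exc_type:
        return None

options = branchpoint_choose(["plan_a", "plan_b"])
failed = protect(lambda: 1 / 0, ZeroDivisionError)

print(options)  # ['plan_a', 'plan_b']
print(failed)   # None -- the branch is pruned, not crashed
```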
Example usage:

```python
@encompass.compile
def my_agent(inp):
    branchpoint()
    out = LLM(inp)
    record_score(quality(out))
    return out

best = my_agent(x).search("dfs", default_branching=10)
```
For multi-level beam search:
```python
@encompass.compile
def translate_file(repo):
    branchpoint(name="file_step")
    skeleton = llm_stub(repo)
    record_score(verify_stub(skeleton))
    for fn in skeleton.methods:
        branchpoint(name="method_step")
        code = llm_translate(fn)
        record_score(verify_translation(code))
        skeleton.add(code)
    return skeleton

result = translate_file(r).search("beam", beam_width=2, default_branching=3)
```
4. Supported Search Algorithms and Execution Strategies
All strategies within EnCompass operate by repeatedly invoking step() on chosen Checkpoint objects. The following algorithms are supported natively:
- Depth-First Search (DFS)
- Breadth-First Search (BFS)
- Best-First Search (BeFS)
- Beam Search
- Monte Carlo Tree Search (MCTS)
Two novel variants are introduced:
- Re-expand Best-First: Permits re-expansion of nodes to incorporate updated scores.
- Explorative Re-expand Best-First: Augments node selection with an upper-confidence-bound bonus, prioritizing nodes by
$$\hat{v}(n) + c \sqrt{\frac{\ln N}{N(n)}}$$
where $\hat{v}(n)$ is the score estimate, $N$ the total number of node expansions, and $N(n)$ the expansion count for node $n$.
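This selection rule can be sketched in a few lines; the function name, node statistics, and exploration constant below are illustrative, not the framework's internals.

```python
import math

# Priority of a node n: v_hat(n) + c * sqrt(ln(N) / N(n)), where v_hat
# is the node's score estimate, N the total number of expansions, and
# N(n) the node's own expansion count.

def ucb_priority(v_hat, total_expansions, node_count, c=1.0):
    return v_hat + c * math.sqrt(math.log(total_expansions) / node_count)

# A lower-scoring but rarely expanded node can win selection:
nodes = {"a": (0.80, 9), "b": (0.60, 1)}       # name -> (score est., count)
N = sum(count for _, count in nodes.values())  # total expansions = 10

chosen = max(nodes, key=lambda n: ucb_priority(nodes[n][0], N, nodes[n][1]))
print(chosen)  # 'b' -- the exploration bonus dominates its lower score
```

The bonus shrinks as a node is expanded more often, so selection gradually shifts from exploration back toward the highest score estimates.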
Beam search is illustrated below:
```
initialize beam = [initial_checkpoint]
repeat until cost budget is exhausted:
    candidates = []
    for ckpt in beam:
        for _ in range(branching_factor):
            candidates.append(ckpt.step())
    sort candidates by .score descending
    beam = candidates[:beam_width]
return beam[0].return_value
```
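The beam-search loop above can be exercised end to end with a toy checkpoint. The `Checkpoint` class below is a hypothetical stand-in (not the EnCompass `Checkpoint`): `step()` expands a node into a child with a perturbed score, and a fixed depth stands in for the cost budget.

```python
import random

class Checkpoint:
    def __init__(self, score=0.0, depth=0):
        self.score = score
        self.depth = depth
        self.return_value = score  # stand-in for the agent's return value

    def step(self):
        # Expanding a checkpoint yields a child with a perturbed score.
        return Checkpoint(self.score + random.random(), self.depth + 1)

def beam_search(root, beam_width=2, branching_factor=3, max_depth=4):
    beam = [root]
    for _ in range(max_depth):  # stand-in for a cost budget
        candidates = []
        for ckpt in beam:
            for _ in range(branching_factor):
                candidates.append(ckpt.step())
        candidates.sort(key=lambda c: c.score, reverse=True)
        beam = candidates[:beam_width]
    return beam[0].return_value

random.seed(0)
print(beam_search(Checkpoint()))  # best score reached within the budget
```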
This catalogue of strategies enables granular cost-performance tuning and broad experimental flexibility without altering core workflow logic.
5. Empirical Evaluation and Case Studies
Performance and reliability gains are demonstrated in three agent domains:
5.1 Code-Repository Translation (Syzygy-style agent)
- Five branchpoints added to ∼600 LOC base; rapid experimentation across global BoN, local BoN, and hierarchical beam search.
- Achievement: Beam search (file=2, method=3) attains near-perfect self-validation as cost increases, outperforming simpler strategies.
- Across additional assignments (ps1–ps4, 5,756 LOC), coarse+fine beam search consistently outperforms pure sampling at equivalent cost.
- Code modifications required: ∼400 LOC in plain Python versus ∼80 with EnCompass (5× reduction).
5.2 Hypothesis Search (ARC-AGI)
- Baseline: two-step agent yields 4.3% GPT-3.5 accuracy.
- One branchpoint + global BoN: 11.7%.
- Two branchpoints + parallel BFS (branch=8): 15%.
- Matches or surpasses contemporary ADAS meta-search at comparable cost.
- Code overhead: +21 LOC plain, +8 LOC with EnCompass.
5.3 Reflexion (Iterative Refinement)
- Foundation: Reflexion on LeetCodeHard, ∼35% pass rate using its refinement loop.
- Enhancement: Branchpoints at initiation and per-loop, reexpand-BeFS strategy.
- Result: ∼36% pass rate with reduced LLM cost; superior scaling over naive loop increments.
- Code overhead: +27 LOC plain, +9 LOC EnCompass.
6. Performance Scaling Laws and Practical Implications
Empirical studies consistently indicate a "log-linear" relationship between performance and inference cost:
$$\text{Performance} \approx a + b \cdot \log(\text{Cost})$$
Structured search methods (beam, MCTS, reexpand-BeFS) yield improved scaling coefficients relative to simple random sampling or best-of-N selection. For example, in code translation tasks (ps0), the slope for hierarchical beam strategies is significantly greater than for basic search. On LeetCodeHard with Reflexion, reexpand-BeFS matches unconstrained loop-based top performance at a 30–40% reduced LLM budget.
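As a worked illustration of the scaling-law form (synthetic data, not the paper's measurements), an ordinary-least-squares fit of performance against log(cost) recovers the coefficients $a$ and $b$; a larger $b$ means better returns per unit of additional inference budget.

```python
import math

def fit_log_linear(costs, perfs):
    """OLS fit of Performance ≈ a + b*log(Cost)."""
    xs = [math.log(c) for c in costs]
    mx = sum(xs) / len(xs)
    my = sum(perfs) / len(perfs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, perfs)) \
        / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Synthetic curve generated exactly from a=0.1, b=0.2, so the fit
# recovers those coefficients.
costs = [1, 2, 4, 8, 16]
perfs = [0.1 + 0.2 * math.log(c) for c in costs]
a, b = fit_log_linear(costs, perfs)
print(round(a, 6), round(b, 6))  # 0.1 0.2
```

Comparing fitted slopes across strategies at matched budgets is one simple way to quantify the scaling advantage the section describes.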
Summary of empirical gains:
- 3–6× reduction in code overhead when implementing or switching strategies.
- Over 2× decrease in LLM calls required to achieve target performance.
- Explicit modular separation: branchpoints and `.search(...)` parameterization isolate search configuration from workflow.
This suggests that EnCompass offers a rigorous, scalable, and low-overhead pathway for LLM-based agent development, particularly where reliability, rapid prototyping, and systematic search experimentation are prioritized (Li et al., 3 Dec 2025).