EnCompass: Python Agent Framework
- EnCompass is a Python framework that formalizes agent programs via probabilistic angelic nondeterminism, separating core workflow from inference-time search strategies.
- It compiles annotated Python functions into explicit search spaces, enabling integration with diverse algorithms like DFS, BFS, beam search, and MCTS.
- The framework reduces code overhead and LLM calls, as evidenced by empirical gains in speed and scalability across code translation and hypothesis search tasks.
EnCompass is a Python framework for agent programming that operationalizes "probabilistic angelic nondeterminism" (PAN), enabling the principled separation of agent workflow logic from inference-time search strategies. The framework empowers researchers and practitioners to efficiently develop, experiment with, and deploy LLM-based agents by compiling annotated Python functions into explicit search spaces over execution paths, allowing for seamless workflow specification and flexible, modular search algorithm integration (Li et al., 3 Dec 2025).
1. Foundations: Probabilistic Angelic Nondeterminism
Probabilistic angelic nondeterminism (PAN) formalizes the agent program as a nondeterministic, probabilistic process, where execution is partitioned at explicitly marked "branchpoints" to define decision steps within the agent's workflow. The program state $s$ is defined by a program location and a memory snapshot (including locals and shared variables). Between marked points, transitions are deterministic except for LLM calls or other random oracles, modeled as one-step probabilistic transitions $s \to s'$ occurring with probability $P(s' \mid s)$.
A complete program run is a path $\tau = (s_0, s_1, \ldots, s_T)$, with probability:

$$P(\tau) = \prod_{t=0}^{T-1} P(s_{t+1} \mid s_t)$$
Unlike demonic nondeterminism (worst-case branching), PAN enables "angelic" search: guided exploration over all possible execution paths to maximize a user-defined score $R(\tau)$, yielding the optimal trajectory $\tau^* = \arg\max_{\tau} R(\tau)$.
The search process constructs and explores a tree over program states, with search algorithms selecting which leaf (checkpoint) to expand next. This approach provides a rigorous, modular foundation for agent search and evaluation.
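To make the PAN objective concrete, the toy sketch below (hypothetical, not the EnCompass API) enumerates the paths of a two-step branching process: each path's probability is the product of its one-step transition probabilities, and "angelic" search selects the path maximizing a user-defined score $R(\tau)$. The choice labels, probabilities, and score function are all illustrative.

```python
import itertools

# Toy two-step branching process. STEP1 gives P(s1 | s0) for each first
# choice; STEP2 gives P(s2 | s1) for each second choice.
STEP1 = {"a": 0.7, "b": 0.3}
STEP2 = {"x": 0.5, "y": 0.5}

def score(path):
    # Hypothetical score function R(tau) over complete paths.
    return {"a": 1, "b": 2}[path[0]] + {"x": 0, "y": 3}[path[1]]

paths = list(itertools.product(STEP1, STEP2))
probs = {p: STEP1[p[0]] * STEP2[p[1]] for p in paths}
best = max(paths, key=score)   # the "angelic" optimal trajectory

print(best)         # ('b', 'y') -- the trajectory maximizing R
print(probs[best])  # its probability under the program (0.3 * 0.5)
```

Note that the optimal path need not be the most probable one; the search is guided by the score, not by the program's own sampling distribution.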
2. Separation of Workflow Specification and Search Strategy
Traditional agent programming often entangles workflow logic—such as loops, accumulators, and scoring heuristics—with inference-time search (e.g., varying N in best-of-N sampling, implementing beam search). EnCompass, through PAN, enforces separation by providing:
- Workflow layer: Authored as single-threaded code, with designated primitive calls marking unreliability (e.g., branchpoint) and evaluation (e.g., record_score).
- Search layer: Realized as an external engine operating over the tree of checkpoints, parameterized independently of the workflow.
This separation avoids code refactoring when experimenting with alternative search algorithms; adding branchpoints or modifying search parameters suffices. Compilation via the EnCompass decorator transforms routines into explicit search spaces for exploration.
The following pseudocode excerpt illustrates the compilation process:
```python
@encompass.compile
def agent(x):
    branchpoint()
    y = LLM(x)
    record_score(evaluate(y))
    return y

# After compilation (CPS-transformed sketch):
def cps_agent(frame, resume):
    frame['y'] = LLM(frame['x'])
    resume1 = lambda fr: resume(fr)
    return (frame, resume1)
```
3. Framework Implementation and API
The core of EnCompass is a Python decorator, @encompass.compile, which processes the abstract syntax tree (AST) of the agent function, applies continuation-passing style (CPS) transformation and tail-call optimization, then exposes enriched runtime objects supporting the search space API.
After decoration, functions become SearchSpace objects supporting:
| Method | Description | Return Type |
|---|---|---|
| `start()` | Initializes to the first program state (checkpoint) | `Checkpoint` |
| `search(algo, **kwargs)` | Executes the specified search to find the best return | best value |
| `search_multiple(...)` | Runs search to return a list of (value, score) pairs | List of tuples |
Primitives within decorated functions include:
- `branchpoint(name=…)`, `branchpoint_choose(choices)`
- `record_score(v)`, `record_score(evaluator, target, label=…)`
- `record_costs(api_cost=…)`, `early_stop_search()`, `optional_return(v)`
- `protect(expr, Exception)`, `searchover(fn(...))` (for nested searches)
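The stand-ins below (hypothetical; not the EnCompass implementation) sketch the semantics of two of these primitives: `branchpoint_choose` enumerates explicit alternatives, and `protect` turns a raised exception into a dead (pruned) branch instead of a crash. Real EnCompass primitives interact with the search engine; these are plain-Python approximations for intuition only.

```python
def branchpoint_choose(choices):
    """In a real search each choice would spawn a child checkpoint;
    here we simply enumerate the alternatives."""
    return list(choices)

def protect(thunk, exc_type):
    """Evaluate thunk(); on exc_type, signal a failed branch with None
    (a stand-in for pruning the branch from the search tree)."""
    try:
        return thunk()
    except exc_type:
        return None

options = branchpoint_choose(["plan_a", "plan_b"])
failed = protect(lambda: 1 / 0, ZeroDivisionError)

print(options)  # ['plan_a', 'plan_b']
print(failed)   # None -- the branch is pruned, not crashed
```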
Example usage:

```python
@encompass.compile
def my_agent(inp):
    branchpoint()
    out = LLM(inp)
    record_score(quality(out))
    return out

best = my_agent(x).search("dfs", default_branching=10)
```
For multi-level beam search:
```python
@encompass.compile
def translate_file(repo):
    branchpoint(name="file_step")
    skeleton = llm_stub(repo)
    record_score(verify_stub(skeleton))
    for fn in skeleton.methods:
        branchpoint(name="method_step")
        code = llm_translate(fn)
        record_score(verify_translation(code))
        skeleton.add(code)
    return skeleton

result = translate_file(r).search("beam", beam_width=2, default_branching=3)
```
4. Supported Search Algorithms and Execution Strategies
All strategies within EnCompass operate by repeatedly invoking step() on chosen Checkpoint objects. The following algorithms are supported natively:
- Depth-First Search (DFS)
- Breadth-First Search (BFS)
- Best-First Search (BeFS)
- Beam Search
- Monte Carlo Tree Search (MCTS)
Two novel variants are introduced:
- Re-expand Best-First: Permits re-expansion of nodes to incorporate updated scores.
- Explorative Re-expand Best-First: Augments node selection with an upper-confidence-bound bonus, prioritizing nodes by
$$\hat{v}(n) + c \sqrt{\frac{\ln N}{N(n)}}$$
where $\hat{v}(n)$ is the score estimate, $N$ the total number of node expansions, and $N(n)$ the expansion count for node $n$.
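This selection rule can be sketched in a few lines; the function name, node statistics, and exploration constant below are illustrative, not the framework's internals.

```python
import math

# Priority of a node n: v_hat(n) + c * sqrt(ln(N) / N(n)), where v_hat
# is the node's score estimate, N the total number of expansions, and
# N(n) the node's own expansion count.

def ucb_priority(v_hat, total_expansions, node_count, c=1.0):
    return v_hat + c * math.sqrt(math.log(total_expansions) / node_count)

# A lower-scoring but rarely expanded node can win selection:
nodes = {"a": (0.80, 9), "b": (0.60, 1)}       # name -> (score est., count)
N = sum(count for _, count in nodes.values())  # total expansions = 10

chosen = max(nodes, key=lambda n: ucb_priority(nodes[n][0], N, nodes[n][1]))
print(chosen)  # 'b' -- the exploration bonus dominates its lower score
```

The bonus shrinks as a node is expanded more often, so selection gradually shifts from exploration back toward the highest score estimates.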
Beam search is illustrated below:
```
initialize beam = [initial_checkpoint]
repeat until cost budget is exhausted:
    candidates = []
    for ckpt in beam:
        for _ in range(branching_factor):
            candidates.append(ckpt.step())
    sort candidates by .score descending
    beam = candidates[:beam_width]
return beam[0].return_value
```
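The beam-search loop above can be exercised end to end with a toy checkpoint. The `Checkpoint` class below is a hypothetical stand-in (not the EnCompass `Checkpoint`): `step()` expands a node into a child with a perturbed score, and a fixed depth stands in for the cost budget.

```python
import random

class Checkpoint:
    def __init__(self, score=0.0, depth=0):
        self.score = score
        self.depth = depth
        self.return_value = score  # stand-in for the agent's return value

    def step(self):
        # Expanding a checkpoint yields a child with a perturbed score.
        return Checkpoint(self.score + random.random(), self.depth + 1)

def beam_search(root, beam_width=2, branching_factor=3, max_depth=4):
    beam = [root]
    for _ in range(max_depth):  # stand-in for a cost budget
        candidates = []
        for ckpt in beam:
            for _ in range(branching_factor):
                candidates.append(ckpt.step())
        candidates.sort(key=lambda c: c.score, reverse=True)
        beam = candidates[:beam_width]
    return beam[0].return_value

random.seed(0)
print(beam_search(Checkpoint()))  # best score reached within the budget
```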
This catalogue of strategies enables granular cost-performance tuning and broad experimental flexibility without altering core workflow logic.
5. Empirical Evaluation and Case Studies
Performance and reliability gains are demonstrated in three agent domains:
5.1 Code-Repository Translation (Syzygy-style agent)
- Five branchpoints added to ∼600 LOC base; rapid experimentation across global BoN, local BoN, and hierarchical beam search.
- Achievement: Beam search (file=2, method=3) attains near-perfect self-validation as cost increases, outperforming simpler strategies.
- Across additional assignments (ps1–ps4, 5,756 LOC), coarse+fine beam search consistently outperforms pure sampling at equivalent cost.
- Code modifications required: ∼400 LOC in plain Python versus ∼80 with EnCompass (5× reduction).
5.2 Hypothesis Search (ARC-AGI)
- Baseline: two-step agent yields 4.3% GPT-3.5 accuracy.
- One branchpoint + global BoN: 11.7%.
- Two branchpoints + parallel BFS (branch=8): 15%.
- Matches or surpasses contemporary ADAS meta-search at comparable cost.
- Code overhead: +21 LOC plain, +8 LOC with EnCompass.
5.3 Reflexion (Iterative Refinement)
- Foundation: Reflexion on LeetCodeHard, ∼35% pass rate using its refinement loop.
- Enhancement: Branchpoints at initiation and per-loop, reexpand-BeFS strategy.
- Result: ∼36% pass rate with reduced LLM cost; superior scaling over naive loop increments.
- Code overhead: +27 LOC plain, +9 LOC EnCompass.
6. Performance Scaling Laws and Practical Implications
Empirical studies consistently indicate a "log-linear" relationship between performance and inference cost:
$$\text{Performance} \approx a + b \cdot \log(\text{Cost})$$
Structured search methods (beam, MCTS, reexpand-BeFS) yield improved scaling coefficients relative to simple random sampling or best-of-N selection. For example, in code translation tasks (ps0), the slope for hierarchical beam strategies is significantly greater than for basic search. On LeetCodeHard with Reflexion, reexpand-BeFS matches unconstrained loop-based top performance at a 30–40% reduced LLM budget.
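As a worked illustration of the scaling-law form (synthetic data, not the paper's measurements), an ordinary-least-squares fit of performance against log(cost) recovers the coefficients $a$ and $b$; a larger $b$ means better returns per unit of additional inference budget.

```python
import math

def fit_log_linear(costs, perfs):
    """OLS fit of Performance ≈ a + b*log(Cost)."""
    xs = [math.log(c) for c in costs]
    mx = sum(xs) / len(xs)
    my = sum(perfs) / len(perfs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, perfs)) \
        / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Synthetic curve generated exactly from a=0.1, b=0.2, so the fit
# recovers those coefficients.
costs = [1, 2, 4, 8, 16]
perfs = [0.1 + 0.2 * math.log(c) for c in costs]
a, b = fit_log_linear(costs, perfs)
print(round(a, 6), round(b, 6))  # 0.1 0.2
```

Comparing fitted slopes across strategies at matched budgets is one simple way to quantify the scaling advantage the section describes.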
Summary of empirical gains:
- 3–6× reduction in code overhead when implementing or switching strategies.
- Over 2× decrease in LLM calls required to achieve target performance.
- Explicit modular separation: branchpoints and `.search(...)` parameterization isolate search configuration from workflow.
This suggests that EnCompass offers a rigorous, scalable, and low-overhead pathway for LLM-based agent development, particularly where reliability, rapid prototyping, and systematic search experimentation are prioritized (Li et al., 3 Dec 2025).