Levin Tree Search (LTS)
- Levin Tree Search (LTS) is a policy-guided, best-first search algorithm that prioritizes node expansions with a probability-derived cost function.
- It expands nodes in increasing order of the Levin cost c(s) = g(s)/π(s), where g(s) is the depth of s and π(s) its cumulative path probability, which yields a provable bound on the number of expansions performed before reaching a terminal state.
- Extensions such as √LTS (rerooted LTS) and LTS+CM incorporate learnable rerooters and context models, yielding exponential speedups in tasks such as planning and program synthesis.
Levin Tree Search (LTS) is a best-first, policy-guided search algorithm designed to provide rigorous guarantees on the number of node expansions performed before reaching a goal. LTS leverages a probabilistic policy over a tree’s edges to prioritize paths that efficiently balance depth and probability, offering principled guidance for deterministic planning, program synthesis, theorem proving, and language-model (LM)-guided reasoning under resource constraints. Extensions such as √LTS (rerooted LTS) and integration with learnable context models further broaden its applicability while maintaining formal expansion bounds (Pendurkar et al., 7 Jan 2026, Orseau et al., 2024, Orseau et al., 2023).
1. Algorithmic Foundations of LTS
LTS operates on a (possibly infinite) rooted, directed tree structure. Given a start node s₀ and a policy π that defines a probability distribution over each node's children, LTS seeks a terminal node in a goal set H while minimizing the expected number of node expansions under π.
- Policy-driven expansion: Each node s is characterized by its depth g(s) and cumulative path probability π(s), the product of the conditional probabilities π(v|u) along the path from s₀ to s.
- Levin cost: The algorithm assigns each node the cost c(s) = g(s)/π(s), which measures how efficiently s can be reached when exploration is guided by π.
- Best-first selection: At each iteration, LTS expands the node in the frontier with the lowest Levin cost, and continues until a terminal node is reached.
Pseudocode outline:
Algorithm: Levin Tree Search
Input: policy π, root s₀, terminal set H
g(s₀) ← 0; π(s₀) ← 1
Initialize frontier 𝓕 ← {s₀}
while 𝓕 ≠ ∅ do
    s ← argmin_{u∈𝓕} g(u)/π(u)
    remove s from 𝓕
    if s ∈ H then
        return path(s₀→s)
    for each successor v of s do
        g(v) ← g(s)+1; π(v) ← π(s)·π(v|s)
        add v to 𝓕
return failure
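A minimal runnable sketch of this loop in Python, assuming a `policy(s)` callable that enumerates `(child, p)` pairs with p = π(child|s) and an `is_goal(s)` predicate; both names, and the omission of path reconstruction, are illustrative simplifications:

```python
import heapq

def levin_tree_search(policy, root, is_goal, max_expansions=10**6):
    """Expand nodes in increasing Levin cost g(s)/pi(s) until a goal is popped."""
    counter = 0                                # tie-breaker for equal costs
    frontier = [(0.0, counter, root, 0, 1.0)]  # (cost, tie, node, depth g, path prob pi)
    expansions = 0
    while frontier and expansions < max_expansions:
        _, _, s, g, pi = heapq.heappop(frontier)
        expansions += 1
        if is_goal(s):
            return s, expansions               # path recovery via parent pointers omitted
        for child, p in policy(s):
            if p <= 0.0:
                continue                       # zero-probability edges are never expanded
            counter += 1
            g_c, pi_c = g + 1, pi * p
            heapq.heappush(frontier, (g_c / pi_c, counter, child, g_c, pi_c))
    return None, expansions                    # frontier exhausted or budget hit
```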
LTS expands nodes in increasing order of g(s)/π(s), guaranteeing that the number of expansions performed before reaching any terminal s_g ∈ H is at most g(s_g)/π(s_g) (Pendurkar et al., 7 Jan 2026, Orseau et al., 2023).
2. Theoretical Guarantees and Loss Formulation
The distinguishing feature of LTS is its expansion bound: N(s_g) ≤ g(s_g)/π(s_g), where N(s_g) is the number of nodes expanded before reaching a terminal s_g ∈ H. For example, a goal at depth 10 whose path probability under π is 10⁻³ is found after at most 10/10⁻³ = 10⁴ expansions.
This guarantee enables a loss-based view: for a set of solution nodes s₁, …, s_K with depths g(s₁), …, g(s_K), define the LTS loss L(π) = Σ_k g(s_k)/π(s_k). When the policy is parameterized (e.g., via neural nets or context models), this "LTS loss" is differentiable and, under context model parameterizations, convex in the parameters, supporting effective and provable learning of search policies with online convex optimization (OCO) (Orseau et al., 2023).
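A direct transcription of this loss, under the assumption (illustrative, not from the papers) that each solved instance supplies its solution depth and the per-step log-probabilities of its solution path:

```python
import math

def lts_loss(solutions):
    """LTS loss: sum of g(s)/pi(s) over solution nodes.

    `solutions` is a list of (depth, step_log_probs) pairs, where
    step_log_probs holds log pi(a_t | s_t) along the solution path,
    so log pi(s) is their sum.  Minimizing this sum minimizes the
    upper bound on total node expansions across the instances.
    """
    loss = 0.0
    for depth, step_log_probs in solutions:
        log_pi = sum(step_log_probs)
        loss += depth * math.exp(-log_pi)  # g(s) / pi(s)
    return loss
```

Gradients with respect to policy parameters follow by differentiating the per-step log-probabilities, so the same expression serves as a training objective for neural or context-model policies.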
3. Policy Parameterization: Context Models
In the context-model variant of LTS, termed LTS+CM, the policy is modeled as a product of categorical predictors ("contexts"):
- At each node s, the active set C(s) specifies which contexts are relevant.
- Each context c defines a categorical distribution over actions, parameterized with logits θ_c.
- The overall policy is constructed via product-mixing, π(a|s) ∝ Π_{c∈C(s)} π_c(a), normalized over the actions available at s (see the sketch after this list).
- This parameterization guarantees that the LTS loss is convex, supporting provable regret and convergence bounds with standard OCO methods (Orseau et al., 2023).
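A schematic Python rendering of such a product-of-contexts policy, with each π_c(a) ∝ exp(θ_{c,a}); the exact mixing scheme of LTS+CM may differ in detail, so this is a sketch of the parameterization idea rather than the paper's implementation:

```python
import math

def context_policy(theta, active_contexts, n_actions):
    """Product-of-contexts policy: pi(a|s) ∝ prod_{c in C(s)} pi_c(a).

    theta[c] is a logit vector over actions for context c;
    active_contexts is the set C(s) of contexts matching state s.
    """
    # With pi_c(a) ∝ exp(theta[c][a]), the product reduces to a softmax
    # over summed logits (per-context normalizers are constant in a).
    scores = [sum(theta[c][a] for c in active_contexts) for a in range(n_actions)]
    m = max(scores)
    weights = [math.exp(v - m) for v in scores]  # numerically stable softmax
    z = sum(weights)
    return [w / z for w in weights]
```

Under this exponential-family reading, −log π(a|s) is a log-sum-exp minus a linear term, hence convex in θ, and g(s)/π(s) = g(s)·exp(−log π(s)) is an increasing convex function of a convex function; this is the mechanism behind the convexity of the LTS loss.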
Empirically, LTS+CM has been shown to outperform neural policy parameterizations on domains such as the 24-puzzle and Rubik’s cube, requiring fewer expansions and finding solutions faster.
| Domain | Alg | Solved (%) | Avg. Expansions |
|---|---|---|---|
| Boxoban | LTS+CM | 100 | 2132 |
| Boxoban | LTS+NN | 100 | 2640 |
| 24-Puzzle | LTS+CM | 100 | 5667 |
| 24-Puzzle | LTS+NN | 0.9 | 39005 |
4. Rerooted LTS (√LTS) and Subtask Decomposition
√LTS generalizes LTS by running multiple best-first searches rerooted at various nodes, each weighted by a rerooter σ. The search cost is composed as the minimal cost over rerooted subtrees, where each subtree's contribution depends on its σ-weight and on the slenderness of the subtree rooted at the corresponding reroot point. The main theoretical result is that, for k reroot points, √LTS requires on the order of the square root of the LTS expansion count (up to factors in k), hence the name; this provides exponential speedups for well-chosen subtask decompositions. Both π and σ can be learned from data (Orseau et al., 2024).
√LTS is particularly effective in problems with natural subtask decompositions or “clues,” such as Sokoban or adversarial reward chain domains, where it can interpolate between uninformed BFS and ideal subtask-wise search.
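An illustrative sketch of the rerooting idea only, not the actual √LTS algorithm (which selects and schedules reroot points with additional machinery to obtain its formal bound): several searches, each discounted by a hypothetical rerooter weight σ(r), compete within one shared Levin-cost queue:

```python
import heapq

def rerooted_lts(policy, root, is_goal, rerooter, max_expansions=10**6):
    """Interleave best-first searches from several reroot points r,
    ranking nodes by g_r(u) / (sigma(r) * pi_r(u)).

    `rerooter(root)` yields (r, sigma) pairs; all names are illustrative.
    """
    counter, frontier = 0, []
    for r, sigma in rerooter(root):
        heapq.heappush(frontier, (0.0, counter, r, 0, sigma))  # depth 0, weight sigma*1
        counter += 1
    expansions = 0
    while frontier and expansions < max_expansions:
        _, _, s, g, w = heapq.heappop(frontier)  # w = sigma(r) * pi_r(s)
        expansions += 1
        if is_goal(s):
            return s, expansions
        for child, p in policy(s):
            if p > 0.0:
                counter += 1
                heapq.heappush(frontier, ((g + 1) / (w * p), counter, child, g + 1, w * p))
    return None, expansions
```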
5. LTS in Tree-of-Thoughts and LM-guided Search
LTS is directly adaptable to LM-guided tree search by treating each "thought" as a candidate expansion and using the LM's output probabilities as the policy: π(v|s) ∝ exp(ℓ_v/τ), a temperature-scaled softmax over the LM logits ℓ_v with temperature τ (a sketch follows the list below).
- Branching: At each step, up to k candidate thoughts are sampled.
- Expansion upper bound: For the pruned subtree T′ induced by sampling, LTS expands at most min_{s_g∈H∩T′} g(s_g)/π(s_g) nodes, where H∩T′ is the set of goals found in T′.
- Temperature sensitivity: As τ increases, the distribution flattens, decreasing π(s_g) and increasing search cost; an intermediate τ balances exploitation and exploration (Pendurkar et al., 7 Jan 2026).
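A minimal sketch of such a policy in Python, assuming raw candidate-thought logits are available (function and parameter names are illustrative):

```python
import math

def lm_thought_policy(logits, tau=1.0, k=5):
    """Temperature-scaled softmax over candidate-thought logits,
    truncated to the top-k thoughts sampled at this node.

    Returns (thought_index, probability) pairs usable as LTS edge
    probabilities.  Probabilities are deliberately not renormalized
    after truncation, matching the pruned-subtree bound above.
    """
    scaled = [l / tau for l in logits]
    m = max(scaled)
    weights = [math.exp(v - m) for v in scaled]  # numerically stable softmax
    z = sum(weights)
    ranked = sorted(((w / z, i) for i, w in enumerate(weights)), reverse=True)
    return [(i, p) for p, i in ranked[:k]]
```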
Empirical studies across Blocksworld, PrOntoQA, and Array Sorting demonstrate that LTS consistently matches or outperforms DFS and Beam Search for any fixed LM-query budget, especially in the low-budget regime.
| Domain | LM | Budget (thoughts) | LTS Accuracy | DFS Accuracy | Beam Accuracy |
|---|---|---|---|---|---|
| Blocksworld | Llama 3.2 3B | 15 | 78% | 60% | 65% |
| Sort | Llama 3.2 3B | 10 | 21% | 18% | – |
(Pendurkar et al., 7 Jan 2026)
6. Trade-Offs, Practical Aspects, and Recommendations
- Computational efficiency: LTS spends LM queries only on sampling candidate thoughts at each expansion and needs no extra self-evaluation calls, yielding strictly fewer LM queries than DFS or Beam Search for equivalent search effort.
- Budget sensitivity: Low temperatures (τ → 0) yield greedy, high-confidence expansion (risking early commitment), while high temperatures (τ → ∞) approach breadth-first search (inefficient for small budgets). Intermediate temperatures reliably offer the best performance within computational constraints (see the toy sweep after this list).
- Learning policies and rerooters: Both the search policy π and the rerooter σ in √LTS can be learned using convex or neural objectives to optimize search efficiency for specific problem distributions.
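A toy numeric illustration of the temperature trade-off (the logits and solution path below are synthetic): the Levin-cost bound g/π for a fixed solution path grows as τ flattens the per-step softmax:

```python
import math

def levin_bound(step_logits, goal_path, tau):
    """Levin-cost bound g/pi for one root-to-goal path at temperature tau.

    step_logits[t] are the (synthetic) LM logits at step t and
    goal_path[t] the index of the thought on the solution path.
    """
    log_pi = 0.0
    for logits, a in zip(step_logits, goal_path):
        scaled = [l / tau for l in logits]
        m = max(scaled)
        log_pi += (scaled[a] - m) - math.log(sum(math.exp(v - m) for v in scaled))
    return len(goal_path) * math.exp(-log_pi)  # g(s_g) / pi(s_g)

# Tiny tau concentrates pi on the argmax (cheap only if the greedy path is
# right); large tau flattens pi and the bound approaches g * b^g for
# branching factor b, i.e. breadth-first cost.
for tau in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(tau, levin_bound([[2.0, 1.0, 0.0]] * 4, [0, 0, 0, 0], tau))
```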
A plausible implication is that LTS and its variants enable flexible, learnable integration of policy and external hints (landmarks, rewards, clues) with guaranteed search-efficiency improvements, positioning this family as a foundational approach in resource-bounded, policy-guided heuristic search (Orseau et al., 2024, Pendurkar et al., 7 Jan 2026, Orseau et al., 2023).