Levin Tree Search (LTS)
- Levin Tree Search (LTS) is a policy-guided, best-first search algorithm that prioritizes node expansions with a probability-derived cost function.
- It expands nodes in increasing order of the Levin cost c(s) = g(s)/π(s), where g(s) is the depth of s and π(s) its cumulative path probability, which yields a provable bound on the number of expansions performed before reaching a terminal state.
- Extensions such as √LTS (rerooted LTS) and LTS+CM incorporate learnable rerooters and context models, yielding exponential speedups in tasks such as planning and program synthesis.
Levin Tree Search (LTS) is a best-first, policy-guided search algorithm designed to provide rigorous guarantees on the number of node expansions performed before reaching a goal. LTS leverages a probabilistic policy over a tree’s edges to prioritize paths that efficiently balance depth and probability, offering principled guidance for deterministic planning, program synthesis, theorem proving, and language-model (LM)-guided reasoning under resource constraints. Extensions such as √LTS (rerooted LTS) and integration with learnable context models further broaden its applicability while maintaining formal expansion bounds (Pendurkar et al., 7 Jan 2026, Orseau et al., 2024, Orseau et al., 2023).
1. Algorithmic Foundations of LTS
LTS operates on a (possibly infinite) rooted, directed tree structure. Given a start node s₀ and a policy π that defines a probability distribution over each node's children, LTS seeks a terminal node in a goal set H while minimizing the expected number of node expansions under π.
- Policy-driven expansion: Each node s is characterized by its depth g(s) and cumulative path probability π(s), the product of the conditional probabilities π(v|u) along the path from s₀ to s.
- Levin cost: The algorithm assigns each node the cost c(s) = g(s)/π(s), which measures how efficiently s can be reached when exploration is guided by π.
- Best-first selection: At each iteration, LTS expands the node in the frontier with the lowest Levin cost, and continues until a terminal node is reached.
Pseudocode outline:
Algorithm: Levin Tree Search
Input: policy π, root s₀, terminal set H
g(s₀) ← 0; π(s₀) ← 1
Initialize frontier 𝓕 ← {s₀}
while 𝓕 ≠ ∅ do
    s ← argmin_{u∈𝓕} g(u)/π(u)
    remove s from 𝓕
    if s ∈ H then
        return path(s₀→s)
    for each successor v of s do
        g(v) ← g(s)+1; π(v) ← π(s)·π(v|s)
        add v to 𝓕
return failure
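A minimal runnable sketch of this loop in Python, assuming a `policy(s)` callable that enumerates `(child, p)` pairs with p = π(child|s) and an `is_goal(s)` predicate; both names, and the omission of path reconstruction, are illustrative simplifications:

```python
import heapq

def levin_tree_search(policy, root, is_goal, max_expansions=10**6):
    """Expand nodes in increasing Levin cost g(s)/pi(s) until a goal is popped."""
    counter = 0                                # tie-breaker for equal costs
    frontier = [(0.0, counter, root, 0, 1.0)]  # (cost, tie, node, depth g, path prob pi)
    expansions = 0
    while frontier and expansions < max_expansions:
        _, _, s, g, pi = heapq.heappop(frontier)
        expansions += 1
        if is_goal(s):
            return s, expansions               # path recovery via parent pointers omitted
        for child, p in policy(s):
            if p <= 0.0:
                continue                       # zero-probability edges are never expanded
            counter += 1
            g_c, pi_c = g + 1, pi * p
            heapq.heappush(frontier, (g_c / pi_c, counter, child, g_c, pi_c))
    return None, expansions                    # frontier exhausted or budget hit
```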
LTS expands nodes in increasing order of g(s)/π(s), guaranteeing that the number of expansions performed before reaching any terminal s_g ∈ H is at most g(s_g)/π(s_g) (Pendurkar et al., 7 Jan 2026, Orseau et al., 2023).
2. Theoretical Guarantees and Loss Formulation
The distinguishing feature of LTS is its expansion bound: N(s_g) ≤ g(s_g)/π(s_g), where N(s_g) is the number of nodes expanded before reaching a terminal s_g ∈ H. For example, a goal at depth 10 whose path probability under π is 10⁻³ is found after at most 10/10⁻³ = 10⁴ expansions.
This guarantee enables a loss-based view: for a set of solution nodes s₁, …, s_K with depths g(s₁), …, g(s_K), define the LTS loss L(π) = Σ_k g(s_k)/π(s_k). When the policy is parameterized (e.g., via neural nets or context models), this "LTS loss" is differentiable and, under context model parameterizations, convex in the parameters, supporting effective and provable learning of search policies with online convex optimization (OCO) (Orseau et al., 2023).
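A direct transcription of this loss, under the assumption (illustrative, not from the papers) that each solved instance supplies its solution depth and the per-step log-probabilities of its solution path:

```python
import math

def lts_loss(solutions):
    """LTS loss: sum of g(s)/pi(s) over solution nodes.

    `solutions` is a list of (depth, step_log_probs) pairs, where
    step_log_probs holds log pi(a_t | s_t) along the solution path,
    so log pi(s) is their sum.  Minimizing this sum minimizes the
    upper bound on total node expansions across the instances.
    """
    loss = 0.0
    for depth, step_log_probs in solutions:
        log_pi = sum(step_log_probs)
        loss += depth * math.exp(-log_pi)  # g(s) / pi(s)
    return loss
```

Gradients with respect to policy parameters follow by differentiating the per-step log-probabilities, so the same expression serves as a training objective for neural or context-model policies.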
3. Policy Parameterization: Context Models
In the context-model variant of LTS, termed LTS+CM, the policy is modeled as a product of categorical predictors ("contexts"):
- At each node s, the active set C(s) specifies which contexts are relevant.
- Each context c defines a categorical distribution over actions, parameterized with logits θ_c.
- The overall policy is constructed via product-mixing, π(a|s) ∝ Π_{c∈C(s)} π_c(a), normalized over the actions available at s (see the sketch after this list).
- This parameterization guarantees that the LTS loss is convex, supporting provable regret and convergence bounds with standard OCO methods (Orseau et al., 2023).
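A schematic Python rendering of such a product-of-contexts policy, with each π_c(a) ∝ exp(θ_{c,a}); the exact mixing scheme of LTS+CM may differ in detail, so this is a sketch of the parameterization idea rather than the paper's implementation:

```python
import math

def context_policy(theta, active_contexts, n_actions):
    """Product-of-contexts policy: pi(a|s) ∝ prod_{c in C(s)} pi_c(a).

    theta[c] is a logit vector over actions for context c;
    active_contexts is the set C(s) of contexts matching state s.
    """
    # With pi_c(a) ∝ exp(theta[c][a]), the product reduces to a softmax
    # over summed logits (per-context normalizers are constant in a).
    scores = [sum(theta[c][a] for c in active_contexts) for a in range(n_actions)]
    m = max(scores)
    weights = [math.exp(v - m) for v in scores]  # numerically stable softmax
    z = sum(weights)
    return [w / z for w in weights]
```

Under this exponential-family reading, −log π(a|s) is a log-sum-exp minus a linear term, hence convex in θ, and g(s)/π(s) = g(s)·exp(−log π(s)) is an increasing convex function of a convex function; this is the mechanism behind the convexity of the LTS loss.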
Empirically, LTS+CM has been shown to outperform neural policy parameterizations on domains such as the 24-puzzle and Rubik’s cube, requiring fewer expansions and finding solutions faster.
| Domain | Alg | Solved (%) | Avg. Expansions |
|---|---|---|---|
| Boxoban | LTS+CM | 100 | 2132 |
| Boxoban | LTS+NN | 100 | 2640 |
| 24-Puzzle | LTS+CM | 100 | 5667 |
| 24-Puzzle | LTS+NN | 0.9 | 39005 |
4. Rerooted LTS (√LTS) and Subtask Decomposition
√LTS generalizes LTS by running multiple best-first searches rerooted at various nodes, each weighted by a rerooter σ. The search cost is composed as the minimal cost over rerooted subtrees, where each subtree's contribution depends on its σ-weight and on the slenderness of the subtree rooted at the corresponding reroot point. The main theoretical result is that, for k reroot points, √LTS requires on the order of the square root of the LTS expansion count (up to factors in k), hence the name; this provides exponential speedups for well-chosen subtask decompositions. Both π and σ can be learned from data (Orseau et al., 2024).
√LTS is particularly effective in problems with natural subtask decompositions or “clues,” such as Sokoban or adversarial reward chain domains, where it can interpolate between uninformed BFS and ideal subtask-wise search.
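An illustrative sketch of the rerooting idea only, not the actual √LTS algorithm (which selects and schedules reroot points with additional machinery to obtain its formal bound): several searches, each discounted by a hypothetical rerooter weight σ(r), compete within one shared Levin-cost queue:

```python
import heapq

def rerooted_lts(policy, root, is_goal, rerooter, max_expansions=10**6):
    """Interleave best-first searches from several reroot points r,
    ranking nodes by g_r(u) / (sigma(r) * pi_r(u)).

    `rerooter(root)` yields (r, sigma) pairs; all names are illustrative.
    """
    counter, frontier = 0, []
    for r, sigma in rerooter(root):
        heapq.heappush(frontier, (0.0, counter, r, 0, sigma))  # depth 0, weight sigma*1
        counter += 1
    expansions = 0
    while frontier and expansions < max_expansions:
        _, _, s, g, w = heapq.heappop(frontier)  # w = sigma(r) * pi_r(s)
        expansions += 1
        if is_goal(s):
            return s, expansions
        for child, p in policy(s):
            if p > 0.0:
                counter += 1
                heapq.heappush(frontier, ((g + 1) / (w * p), counter, child, g + 1, w * p))
    return None, expansions
```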
5. LTS in Tree-of-Thoughts and LM-guided Search
LTS is directly adaptable to LM-guided tree search by treating each "thought" as a candidate expansion and using the LM's output probabilities as the policy: π(v|s) ∝ exp(ℓ_v/τ), a temperature-scaled softmax over the LM logits ℓ_v with temperature τ (a sketch follows the list below).
- Branching: At each step, up to k candidate thoughts are sampled.
- Expansion upper bound: For the pruned subtree T′ induced by sampling, LTS expands at most min_{s_g∈H∩T′} g(s_g)/π(s_g) nodes, where H∩T′ is the set of goals found in T′.
- Temperature sensitivity: As τ increases, the distribution flattens, decreasing π(s_g) and increasing search cost; an intermediate τ balances exploitation and exploration (Pendurkar et al., 7 Jan 2026).
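A minimal sketch of such a policy in Python, assuming raw candidate-thought logits are available (function and parameter names are illustrative):

```python
import math

def lm_thought_policy(logits, tau=1.0, k=5):
    """Temperature-scaled softmax over candidate-thought logits,
    truncated to the top-k thoughts sampled at this node.

    Returns (thought_index, probability) pairs usable as LTS edge
    probabilities.  Probabilities are deliberately not renormalized
    after truncation, matching the pruned-subtree bound above.
    """
    scaled = [l / tau for l in logits]
    m = max(scaled)
    weights = [math.exp(v - m) for v in scaled]  # numerically stable softmax
    z = sum(weights)
    ranked = sorted(((w / z, i) for i, w in enumerate(weights)), reverse=True)
    return [(i, p) for p, i in ranked[:k]]
```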
Empirical studies across Blocksworld, PrOntoQA, and Array Sorting demonstrate that LTS consistently matches or outperforms DFS and Beam Search for any fixed LM-query budget, especially in the low-budget regime.
| Domain | LM | Budget (thoughts) | LTS Accuracy | DFS Accuracy | Beam Accuracy |
|---|---|---|---|---|---|
| Blocksworld | Llama 3.2 3B | 15 | 78% | 60% | 65% |
| Sort | Llama 3.2 3B | 10 | 21% | 18% | – |
(Pendurkar et al., 7 Jan 2026)
6. Trade-Offs, Practical Aspects, and Recommendations
- Computational efficiency: LTS spends LM queries only on sampling candidate thoughts at each expansion and needs no extra self-evaluation calls, yielding strictly fewer LM queries than DFS or Beam Search for equivalent search effort.
- Budget sensitivity: Low temperatures (τ → 0) yield greedy, high-confidence expansion (risking early commitment), while high temperatures (τ → ∞) approach breadth-first search (inefficient for small budgets). Intermediate temperatures reliably offer the best performance within computational constraints (see the toy sweep after this list).
- Learning policies and rerooters: Both the search policy π and the rerooter σ in √LTS can be learned using convex or neural objectives to optimize search efficiency for specific problem distributions.
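A toy numeric illustration of the temperature trade-off (the logits and solution path below are synthetic): the Levin-cost bound g/π for a fixed solution path grows as τ flattens the per-step softmax:

```python
import math

def levin_bound(step_logits, goal_path, tau):
    """Levin-cost bound g/pi for one root-to-goal path at temperature tau.

    step_logits[t] are the (synthetic) LM logits at step t and
    goal_path[t] the index of the thought on the solution path.
    """
    log_pi = 0.0
    for logits, a in zip(step_logits, goal_path):
        scaled = [l / tau for l in logits]
        m = max(scaled)
        log_pi += (scaled[a] - m) - math.log(sum(math.exp(v - m) for v in scaled))
    return len(goal_path) * math.exp(-log_pi)  # g(s_g) / pi(s_g)

# Tiny tau concentrates pi on the argmax (cheap only if the greedy path is
# right); large tau flattens pi and the bound approaches g * b^g for
# branching factor b, i.e. breadth-first cost.
for tau in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(tau, levin_bound([[2.0, 1.0, 0.0]] * 4, [0, 0, 0, 0], tau))
```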
A plausible implication is that LTS and its variants enable flexible, learnable integration of policy and external hints (landmarks, rewards, clues) with guaranteed search-efficiency improvements, positioning this family as a foundational approach in resource-bounded, policy-guided heuristic search (Orseau et al., 2024, Pendurkar et al., 7 Jan 2026, Orseau et al., 2023).