
Levin Tree Search (LTS)

Updated 14 January 2026
  • Levin Tree Search (LTS) is a policy-guided, best-first search algorithm that prioritizes node expansions using a cost derived from a probabilistic policy.
  • It expands nodes in order of the cost c(s) = g(s)/π(s), which guarantees that the number of node expansions required to reach a terminal state is bounded.
  • Extensions like VLTS and LTS+CM incorporate learnable policies and context models, yielding exponential speedups in tasks such as planning and program synthesis.

Levin Tree Search (LTS) is a best-first, policy-guided search algorithm designed to provide rigorous guarantees on the number of node expansions performed before reaching a goal. LTS leverages a probabilistic policy over a tree’s edges to prioritize paths that efficiently balance depth and probability, offering principled guidance for deterministic planning, program synthesis, theorem proving, and large language model (LM)-guided reasoning under resource constraints. Extensions such as VLTS (rerooted LTS) and integration with learnable context models further broaden its applicability while maintaining formal expansion bounds (Pendurkar et al., 7 Jan 2026, Orseau et al., 2024, Orseau et al., 2023).

1. Algorithmic Foundations of LTS

LTS operates on a (possibly infinite) rooted, directed tree structure. Given a start node $s_0$ and a policy $\pi$ that defines a probability distribution $\pi(\cdot \mid s)$ over each node's children, LTS seeks a terminal node $h \in H$ with minimal expected node expansions under $\pi$.

  • Policy-driven expansion: Each state $s$ is characterized by its depth $g(s)$ and cumulative path probability $\pi(s) = \prod_{i=1}^{g(s)} \pi(s_i \mid s_0 \ldots s_{i-1})$.
  • Levin cost: The algorithm assigns a cost $c(s) = g(s)/\pi(s)$ to each node, evaluating the efficiency of reaching $s$ under $\pi$.
  • Best-first selection: At each iteration, LTS expands the node in the frontier $\mathcal{F}$ with the lowest Levin cost, and continues until a terminal node is reached.

Pseudocode outline:

Algorithm: Levin Tree Search
Input: policy π, root s₀, terminal set H
Initialize frontier 𝓕 ← {s₀}, with π(s₀) ← 1, g(s₀) ← 0
while 𝓕 ≠ ∅ do
    s ← argmin_{u∈𝓕} g(u)/π(u)
    remove s from 𝓕
    if s ∈ H then return path(s₀ → s)
    for each successor v of s do
        π(v) ← π(s) · π(v|s)
        g(v) ← g(s) + 1
        add v to 𝓕

LTS expands nodes in increasing order of $c(s)$, guaranteeing that the number of expansions needed to reach any $h \in H$ is at most $g(h)/\pi(h)$ (Pendurkar et al., 7 Jan 2026, Orseau et al., 2023).
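
The following is a minimal Python sketch of the search loop above, not a reference implementation: `successors(state)` and `is_goal(state)` are hypothetical callables supplied by the user, with `successors` yielding (child, probability) pairs under the policy π.

```python
import heapq
import itertools
import math

def levin_tree_search(root, successors, is_goal):
    """Best-first search ordered by the Levin cost c(s) = g(s) / pi(s).

    Assumed interface: successors(state) yields (child, prob) pairs, where
    prob is the policy probability pi(child | state); is_goal(state) marks
    terminal nodes.  Log-probabilities are kept for numerical stability.
    """
    tie = itertools.count()  # break cost ties without comparing states
    # Frontier entries: (levin_cost, tiebreak, depth, log_prob, state, path)
    frontier = [(0.0, next(tie), 0, 0.0, root, [root])]
    while frontier:
        cost, _, depth, log_p, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path, cost                      # expansions <= g(h)/pi(h)
        for child, prob in successors(state):
            if prob <= 0.0:
                continue                           # unreachable under pi
            c_log_p = log_p + math.log(prob)       # log pi(child)
            c_depth = depth + 1                    # g(child)
            c_cost = c_depth * math.exp(-c_log_p)  # Levin cost g / pi
            heapq.heappush(frontier, (c_cost, next(tie), c_depth,
                                      c_log_p, child, path + [child]))
    return None, math.inf                          # no terminal node reachable
```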

2. Theoretical Guarantees and Loss Formulation

The distinguishing feature of LTS is its expansion bound: $N(\mathcal{T}, H) \le \min_{h \in H} \frac{g(h)}{\pi(h)}$, where $N(\mathcal{T}, H)$ is the number of nodes expanded before reaching a terminal node $h$.

This guarantee enables a loss-based view: $L(N') = \sum_{n\in N'} \frac{d(n)}{\pi(n)}$ for a set of solution nodes $N'$ with depths $d(n)$. When the policy $\pi$ is parameterized (e.g., via neural networks or context models), this "LTS loss" is differentiable and, under context-model parameterizations, convex in the parameters, supporting effective, provable learning of search policies with online convex optimization (Orseau et al., 2023).
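
As a small illustration of the loss itself (not of the optimization procedure), the sketch below computes $L(N')$ from per-step policy probabilities along each solution path; the trajectory format is an assumption for the example.

```python
import math

def lts_loss(solutions):
    """LTS loss L(N') = sum over solution nodes n of d(n) / pi(n).

    Assumed input: a list of solution trajectories, each given as the
    per-step policy probabilities pi(a_i | n_{i-1}) along the path to a
    solution node; d(n) is the path length.
    """
    loss = 0.0
    for step_probs in solutions:
        depth = len(step_probs)
        log_pi = sum(math.log(p) for p in step_probs)  # log pi(n)
        loss += depth * math.exp(-log_pi)              # d(n) / pi(n)
    return loss

# A depth-3 solution whose steps each have probability 0.5 contributes
# 3 / 0.125 = 24 to the loss.
print(lts_loss([[0.5, 0.5, 0.5]]))  # 24.0
```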

3. Policy Parameterization: Context Models

In the context-model variant of LTS, termed LTS+CM, the policy is modeled as a product of categorical predictors ("contexts"):

  • At each node $n$, the active set $Q(n)$ specifies which contexts are relevant.
  • Each context $c$ defines a distribution $p_c(a; \beta)$ over actions, parameterized by logits $\beta_{c,a}$.
  • The overall policy is constructed via product-mixing (illustrated in the sketch after this list): $\pi(a \mid n; \beta) = \frac{\prod_{c \in Q(n)} p_c(a; \beta)}{\sum_{a'} \prod_{c \in Q(n)} p_c(a'; \beta)}$
  • This parameterization guarantees that the LTS loss is convex, supporting provable regret and convergence bounds with standard online convex optimization methods (Orseau et al., 2023).
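
A minimal sketch of the product-mixing rule above, assuming a hypothetical nested-dict layout for the logits $\beta_{c,a}$: because each $p_c(\cdot;\beta)$ is a softmax of its own logits, the normalized product reduces to a softmax over the summed logits of the active contexts.

```python
import math

def context_model_policy(beta, active_contexts, actions):
    """Product-of-contexts policy pi(a | n; beta).

    beta[c][a] is the logit of action a under context c (an assumed layout);
    active_contexts is the set Q(n) of contexts active at node n.
    """
    scores = {a: sum(beta[c][a] for c in active_contexts) for a in actions}
    max_s = max(scores.values())                      # stabilize the softmax
    exp_s = {a: math.exp(s - max_s) for a, s in scores.items()}
    z = sum(exp_s.values())
    return {a: v / z for a, v in exp_s.items()}

# Two active contexts voting over three actions
beta = {"c1": {"up": 1.0, "down": 0.0, "left": -1.0},
        "c2": {"up": 0.5, "down": 0.5, "left": 0.0}}
print(context_model_policy(beta, {"c1", "c2"}, ["up", "down", "left"]))
```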

Empirically, LTS+CM has been shown to outperform neural policy parameterizations on domains such as the 24-puzzle and Rubik’s cube, achieving fewer expansions and faster solution times.

| Domain | Algorithm | Solved (%) | Avg. Expansions |
|---|---|---|---|
| Boxoban | LTS+CM | 100 | 2132 |
| Boxoban | LTS+NN | 100 | 2640 |
| 24-Puzzle | LTS+CM | 100 | 5667 |
| 24-Puzzle | LTS+NN | 0.9 | 39005 |

(Orseau et al., 2023)

4. Rerooted (VLTS) and Subtask Decomposition

VLTS (√LTS) generalizes LTS by running multiple best-first searches rerooted at various nodes, each weighted by a rerooter $w(\cdot)$. The search cost is composed as the minimal cost over rerooted subtrees: $c(n) = \min_{s:\, n_s < n} \frac{c_s(n)}{w_s}$, where $c_s(n)$ is the Levin cost of $n$ computed within the subtree rooted at $n_s$. The main theoretical result is that for $q$ reroot points, VLTS requires $O(q\,T^{1/q})$ expansions, where $T$ is the LTS expansion count. This yields exponential speedups for well-chosen subtask decompositions. Both $\pi$ and $w$ can be learned from data (Orseau et al., 2024).
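
A tiny sketch of the cost-combination rule as described above, under the assumption that for a node $n$ we already have the pair $(c_s(n), w_s)$ for each reroot point $s$ on its root path; the input format is illustrative, not the paper's data structure.

```python
def vlts_cost(reroot_costs):
    """VLTS node cost c(n) = min over reroot points s of c_s(n) / w_s.

    Assumed input: a list of (c_s, w_s) pairs, one per reroot point on the
    path from the root to n, where c_s is the Levin cost of n inside the
    subtree rooted at s and w_s is that reroot point's rerooter weight.
    """
    return min(c_s / w_s for c_s, w_s in reroot_costs if w_s > 0)

# A node whose nearest reroot point gives a cheap local cost wins even if
# its cost measured from the global root is large.
print(vlts_cost([(400.0, 0.5), (12.0, 0.25)]))  # 48.0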

VLTS is particularly effective in problems with natural subtask decompositions or “clues,” such as Sokoban or adversarial reward chain domains, where it can interpolate between uninformed BFS and ideal subtask-wise search.

5. LTS for LM-Guided Tree Search

LTS is directly adaptable to LM-guided tree search by treating each "thought" as a candidate expansion and using the LM's output probabilities as a policy: $\pi(c \mid s) = p_\tau(t \mid s), \quad \pi(c) = \pi(s) \cdot p_\tau(t \mid s), \quad g(c) = g(s) + 1$, where $p_\tau$ applies a temperature-scaled softmax over LM logits (see the sketch after the list below).

  • Branching: At each step, up to $b_{\max}$ candidate thoughts are sampled.
  • Expansion upper bound: For a pruned subtree $T$, LTS expands at most $b_{\max}\,\min_{s \in H'} g(s)/\pi(s)$ nodes, with $H'$ the set of goal nodes contained in $T$.
  • Temperature sensitivity: As $\tau$ increases, the distribution flattens, decreasing $\pi(s)$ and increasing search cost; an optimal $\tau \in [0.5, 1.5]$ balances exploitation and exploration (Pendurkar et al., 7 Jan 2026).
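
The sketch below shows how LM logits over candidate thoughts could be turned into LTS children under the update rule above; the logit-dictionary interface is an assumption, and for simplicity it keeps the top-$b_{\max}$ candidates rather than sampling them.

```python
import math

def expand_with_lm(parent_log_pi, parent_depth, thought_logits, tau=1.0, b_max=4):
    """Convert LM logits over candidate thoughts into LTS children.

    thought_logits maps each candidate thought to its LM logit (an assumed
    interface); tau is the softmax temperature.  Each child c gets
    pi(c) = pi(s) * p_tau(t | s) and g(c) = g(s) + 1.
    """
    # Temperature-scaled softmax over the candidate thoughts
    scaled = {t: l / tau for t, l in thought_logits.items()}
    max_s = max(scaled.values())
    exp_s = {t: math.exp(s - max_s) for t, s in scaled.items()}
    z = sum(exp_s.values())
    # Keep at most b_max highest-probability thoughts as candidate expansions
    top = sorted(exp_s.items(), key=lambda kv: kv[1], reverse=True)[:b_max]
    children = []
    for thought, unnorm in top:
        p = unnorm / z                               # p_tau(t | s)
        log_pi = parent_log_pi + math.log(p)         # log pi(c)
        depth = parent_depth + 1                     # g(c)
        cost = depth * math.exp(-log_pi)             # Levin cost g(c)/pi(c)
        children.append((cost, thought, log_pi, depth))
    return children

# Three hypothetical candidate thoughts scored at the root (g = 0, log pi = 0)
print(expand_with_lm(0.0, 0, {"move A to table": 2.1,
                              "move B onto A": 1.3,
                              "noop": -0.5}, tau=1.0, b_max=2))
```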

Empirical studies across Blocksworld, PrOntoQA, and Array Sorting demonstrate that LTS consistently matches or outperforms DFS and Beam Search for any fixed LM-query budget, especially in the low-budget regime.

| Domain | LM | Budget (thoughts) | LTS Accuracy | DFS Accuracy | Beam Accuracy |
|---|---|---|---|---|---|
| Blocksworld | Llama 3.2 3B | 15 | 78% | 60% | 65% |
| Sort | Llama 3.2 3B | 10 | 21% | 18% | — |

(Pendurkar et al., 7 Jan 2026)

6. Trade-Offs, Practical Aspects, and Recommendations

  • Computational efficiency: LTS requires only LM sampling calls per expansion and no additional self-evaluation queries, yielding strictly fewer LM calls than DFS or Beam Search for equivalent search effort.
  • Budget sensitivity: Low temperatures ($\tau \to 0$) yield greedy, high-confidence expansion (risking early commitment to a wrong branch), while high temperatures ($\tau \gg 1$) approach breadth-first search (inefficient for small budgets). Intermediate temperatures reliably offer the best performance within computational constraints (a toy illustration follows this list).
  • Learning policies and rerooters: Both the search policy $\pi$ and the rerooter $w(\cdot)$ in VLTS can be learned using convex or neural objectives to optimize search efficiency for a specific problem distribution.
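
The toy sketch below illustrates only the bound side of the budget-sensitivity point: under an assumed two-candidate model where the correct thought beats its alternative by a fixed logit gap at every step, the expansion bound $g(h)/\pi(h)$ grows as the temperature flattens the policy. It does not model the early-commitment risk of very low temperatures.

```python
import math

def expansion_bound(step_logit_gaps, tau):
    """Upper bound g(h)/pi(h) for a solution path as a function of
    temperature, assuming a two-way softmax at each step in which the
    correct thought leads by a fixed logit gap (a toy assumption)."""
    log_pi = 0.0
    for gap in step_logit_gaps:
        p = 1.0 / (1.0 + math.exp(-gap / tau))  # p_tau(correct thought)
        log_pi += math.log(p)
    depth = len(step_logit_gaps)
    return depth * math.exp(-log_pi)            # g(h) / pi(h)

gaps = [2.0] * 6  # the correct thought is favored by 2 logits at each step
for tau in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(tau, round(expansion_bound(gaps, tau), 1))
```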

A plausible implication is that LTS and its variants enable flexible, learnable integration of policy and external hints (landmarks, rewards, clues) with guaranteed search-efficiency improvements, positioning this family as a foundational approach in resource-bounded, policy-guided heuristic search (Orseau et al., 2024, Pendurkar et al., 7 Jan 2026, Orseau et al., 2023).
