Tree of Attacks with Pruning (TAP)

Updated 18 June 2026

Tree of Attacks with Pruning (TAP) is an automated adversarial prompting framework that uses breadth-first tree search and chain-of-thought refinements to generate effective jailbreak prompts for LLMs.
It systematically expands candidate prompts and prunes off-topic or low-scoring ones using evaluator LLMs, significantly reducing query requirements while boosting success rates.
Empirical results demonstrate TAP’s high efficiency, achieving up to 90% jailbreak success on models like GPT-4 with notably fewer queries compared to previous methods.

Tree of Attacks with Pruning (TAP) is an automated adversarial prompting framework designed to jailbreak LLMs using only black-box access. TAP systematically explores the prompt space via breadth-first tree search, injecting chain-of-thought candidate refinements, and applies targeted pruning to maximize efficiency and attack success rates. The TAP methodology and its derivatives have established new empirical state-of-the-art results for automated black-box jailbreaks, and provide a foundation for efficient red-teaming and adversarial evaluation of LLM safety mechanisms (Mehrotra et al., 2023).

1. Algorithmic Structure of the TAP Framework

TAP instantiates a breadth-first “tree-of-thought” attack search, where each node in the search tree corresponds to a candidate attack prompt or partial dialogue, and the edges correspond to prompt refinements or dialogue continuations supplied by an attacker LLM. For single-turn jailbreaks, TAP proceeds as follows:

Branch Expansion: At each tree depth $i$ (up to max depth $d$), each current leaf prompt is expanded into $b$ new prompts using an attacker LLM $A$, each representing a single-step chain-of-thought refinement.
Prune I (Off-topic Pruning): Candidate prompts are discarded if deemed off-topic relative to the original goal $G$, as determined by an evaluator LLM $E$.
Query and Assessment: Surviving candidates are sent to the target LLM $T$; responses $R$ are collected and scored via a Judge function (also implemented by $E$).
Termination and Prune II (Width Control): If a score indicates a successful jailbreak, the process halts with success. If not, and more than $w$ candidate leaves remain, only the $w$ highest-scoring leaves are retained.

The process repeats until a jailbreak is achieved or the maximum tree depth is reached. TAP's key parameters are tree depth $d$, branching factor $b$, and maximum width $w$; the principal experiments found $d = 10$, $b = 4$, $w = 10$ effective (Mehrotra et al., 2023).
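The four steps above can be sketched as a short search loop. This is a minimal illustration, not the authors' implementation: the callables `refine`, `on_topic`, `query_target`, and `judge` are hypothetical stand-ins for the attacker, evaluator, and target LLM calls, and the `success_score` threshold is an assumption about how the Judge signals success.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    prompt: str
    score: int = 0  # Judge score for the target's response to this prompt


def tap_search(
    goal: str,
    refine: Callable[[str, str], str],      # attacker A: (parent prompt, goal) -> refined prompt
    on_topic: Callable[[str, str], bool],   # evaluator E: (prompt, goal) -> still on topic?
    query_target: Callable[[str], str],     # target T: prompt -> response
    judge: Callable[[str, str], int],       # evaluator E: (response, goal) -> score
    d: int = 10,
    b: int = 4,
    w: int = 10,
    success_score: int = 10,                # assumed threshold, not taken from the text
) -> Optional[str]:
    leaves: List[Node] = [Node(prompt=goal)]
    for _depth in range(d):
        # Branch expansion: b refinements per current leaf.
        candidates = [Node(refine(leaf.prompt, goal)) for leaf in leaves for _ in range(b)]
        # Prune I: drop off-topic candidates before spending target queries.
        candidates = [c for c in candidates if on_topic(c.prompt, goal)]
        if not candidates:
            return None
        # Query and assessment, with early termination on success.
        for c in candidates:
            c.score = judge(query_target(c.prompt), goal)
            if c.score >= success_score:
                return c.prompt
        # Prune II: keep only the w highest-scoring leaves.
        candidates.sort(key=lambda n: n.score, reverse=True)
        leaves = candidates[:w]
    return None
```

With toy stubs (for example, a `refine` that appends a marker character and a `judge` that counts markers), the loop returns as soon as one candidate reaches the threshold and returns `None` if the depth budget runs out first.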

For multi-turn jailbreaks and complex dialogue settings, as in DialTree-RPO, TAP generalizes to dialogue trees: nodes correspond to interleaved attacker-target utterances, and branching, evaluation, and pruning proceed at each turn (Guo et al., 2 Oct 2025).

2. Candidate-Prompt and Dialogue Refinement

Prompt refinement within TAP is attacker-driven and informed by chain-of-thought analysis. For each node, the attacker LLM $A$ processes the conversation history and outputs a JSON object of the form:

{"improvement": "<diagnosis of why prior prompts failed>", "prompt": "<refined candidate prompt>"}

The improvement field is a natural-language diagnosis of why prior prompts failed, while the prompt field is an evolved prompt crafted to evade safety filters. Candidate prompts seek to maximize expected jailbreak success, subject to meaning-preservation and topicality:

$$P^{*} = \arg\max_{P \in \mathcal{M},\ \neg\mathrm{OffTopic}(P, G)} \ \mathbb{E}_{R \sim T(P)}\left[\mathrm{Judge}(R, G)\right]$$

where $\mathcal{M}$ denotes meaningful prompts and $\mathrm{Judge}(R, G)$ measures whether the target output $R$ constitutes a jailbreak for goal $G$ (Mehrotra et al., 2023).
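Because the attacker's output is free-form model text, a pipeline built on this scheme would typically validate the JSON before using it. The sketch below assumes the two fields are named `improvement` and `prompt`; the function name `parse_attacker_output` is hypothetical, and malformed outputs are simply rejected.

```python
import json
from typing import Optional, Tuple


def parse_attacker_output(raw: str) -> Optional[Tuple[str, str]]:
    """Return (improvement, prompt) if raw is a well-formed attacker reply, else None."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    improvement = obj.get("improvement")
    prompt = obj.get("prompt")
    # Both fields must be strings, and the refined prompt must be non-empty.
    if not isinstance(improvement, str) or not isinstance(prompt, str):
        return None
    if not prompt.strip():
        return None
    return improvement, prompt
```

Rejecting a malformed reply (rather than trying to repair it) keeps the branch count honest: a discarded candidate costs one attacker call but no target query.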

In multi-turn scenarios such as DialTree-RPO, each node encompasses a full dialogue history. The attacker policy $\pi_\theta$ samples $n$ dialogue continuations per context, enabling exploration of complex strategies over multiple turns (Guo et al., 2 Oct 2025).
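One way to picture the dialogue-tree generalization is a node that stores the whole conversation so far and expands by sampling several continuations. The sketch below is illustrative only and is not DialTree-RPO's implementation: `DialogueNode`, `sample_attacker`, and `query_target` are hypothetical names, and evaluation and pruning are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (attacker utterance, target reply)


@dataclass
class DialogueNode:
    history: List[Turn] = field(default_factory=list)

    def expand(
        self,
        sample_attacker: Callable[[List[Turn]], str],    # attacker policy: history -> next utterance
        query_target: Callable[[List[Turn], str], str],  # target: (history, utterance) -> reply
        n: int,
    ) -> List["DialogueNode"]:
        """Sample n continuations; each child holds the extended dialogue history."""
        children = []
        for _ in range(n):
            utterance = sample_attacker(self.history)
            reply = query_target(self.history, utterance)
            # Build a new list so siblings and the parent never share state.
            children.append(DialogueNode(self.history + [(utterance, reply)]))
        return children
```

Copying the history into each child costs memory proportional to dialogue length, but it keeps branches independent, which matters once siblings are scored and pruned separately.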

3. Pruning Strategies

TAP deploys two primary pruning mechanisms to manage the combinatorial growth of the search tree and focus exploration:

Phase I: Off-topic Pruning. OffTopic$(P, G)$ is a binary predicate implemented by evaluator LLM $E$ using an explicit prompt (“Does $P$ request the same information as $G$? YES/NO.”). Any $P$ flagged as off-topic is discarded prior to querying the target.
Phase II: Width Control / Top-$w$ Pruning. After scoring candidate prompts or partial dialogues, if more than $w$ survivors remain, only the $w$ with the largest Judge scores are retained.

No continuous scoring threshold is used beyond off-topic filtering and score-based ranking. In multi-turn or RL-based variants (e.g., DialTree-RPO), pruning also includes format validation and, optionally, NLI topic entailment checks and stochastic subsampling to maintain bounded width at each tree level (Guo et al., 2 Oct 2025).
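The two pruning phases can be sketched as two small functions. This is a minimal sketch under stated assumptions: `ask_evaluator` is a hypothetical callable wrapping the evaluator LLM, the question template paraphrases the YES/NO prompt quoted above, and any answer not starting with "YES" is treated as off-topic.

```python
from typing import Callable, List, Tuple


def prune_off_topic(
    goal: str,
    prompts: List[str],
    ask_evaluator: Callable[[str], str],  # hypothetical wrapper around the evaluator LLM
) -> List[str]:
    """Phase I: keep only prompts the evaluator says still target the goal."""
    kept = []
    for p in prompts:
        question = (
            "Does the prompt request the same information as the goal? "
            f"Answer YES or NO.\nGoal: {goal}\nPrompt: {p}"
        )
        if ask_evaluator(question).strip().upper().startswith("YES"):
            kept.append(p)
    return kept


def prune_to_width(scored: List[Tuple[str, int]], w: int) -> List[Tuple[str, int]]:
    """Phase II: if more than w (prompt, score) pairs remain, keep the w best."""
    if len(scored) <= w:
        return list(scored)
    # sorted() is stable, so ties keep their original relative order.
    return sorted(scored, key=lambda item: item[1], reverse=True)[:w]
```

Phase I runs before any target query, so it saves queries directly; Phase II runs after scoring, so it only limits how many leaves are expanded at the next depth.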

4. Query Efficiency and Theoretical Bounds

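Without off-topic pruning or early termination, each of the $d$ levels sends at most $w \cdot b$ candidates to the target (at most $w$ retained leaves, each expanded into $b$ children), so the total number of black-box queries incurred by TAP is bounded by:

$$N_{\text{queries}} \le d \cdot w \cdot b$$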
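As a rough sanity check on the worst case, the queries can be counted level by level. The sketch assumes the loop described in Section 1 (each retained leaf is expanded into $b$ children and every child is sent to the target); the helper `max_queries` is hypothetical, and passing `w=None` disables width control to show how fast an uncapped tree grows.

```python
from typing import Optional


def max_queries(d: int, b: int, w: Optional[int] = None) -> int:
    """Worst-case target queries with no off-topic pruning and no early stop."""
    leaves, total = 1, 0  # the search starts from a single root prompt
    for _ in range(d):
        candidates = leaves * b          # every leaf is expanded into b children
        total += candidates              # every child costs one target query
        leaves = candidates if w is None else min(candidates, w)
    return total


print(max_queries(d=10, b=4, w=10))  # 340, within d * w * b = 400
print(max_queries(d=10, b=4))        # 1398100 without width control
```

The gap between the two figures shows why width control matters on its own, before off-topic pruning and early stopping reduce the count further.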

Empirically, aggressive off-topic pruning (∼50% per layer) and early stopping (on jailbreak success) reduce query requirements significantly relative to prior work (Mehrotra et al., 2023). On the AdvBench Subset with GPT-4 as the target, TAP required an average of ∼28.8 queries per jailbreak, improving over the sequential PAIR baseline (∼39.6 queries), while achieving substantially higher jailbreak success rates (∼90% on GPT-4 with TAP vs. ∼60% for PAIR). When extended to multi-turn dialogue, DialTree-RPO achieved even higher attack success rates with fewer queries on most model targets (Guo et al., 2 Oct 2025).

5. Empirical Results and Comparative Evaluation

TAP was evaluated on standardized adversarial benchmarks, including the AdvBench Subset (50 goals, 32 categories) and held-out sets. Target models spanned open-source (Vicuna-v1.5, Llama-7B), closed-source (GPT-3.5, GPT-4, GPT-4-Turbo, PaLM-2, Gemini-Pro), and protected variants wrapped with LlamaGuard.

Method	GPT-4 ASR	GPT-4-Turbo ASR	Avg. queries (GPT-4)
TAP	90%	84%	28.8
PAIR	60%	44%	39.6
GCG (white-box)	–	–	–
DialTree-RPO	85.3%*	–	∼3*

* DialTree-RPO achieves 85.3% ASR on average across 10 models using ∼3 queries per attack, outperforming TAP’s 42.6% average success. GCG requires hundreds of thousands of queries (open-source only) (Mehrotra et al., 2023, Guo et al., 2 Oct 2025).

TAP surpasses prior black-box methods in both efficiency and efficacy, and remains effective against targets protected by state-of-the-art guardrails such as LlamaGuard.

6. Extensions, Limitations, and Prospects

Limitations

Evaluator LLM Dependence: TAP’s pruning and success evaluation are bottlenecked by the strength of the evaluator LLM. Substituting weaker LLMs (e.g., GPT-3.5) or heuristics leads to pronounced performance drops.
Dataset Generalizability: TAP’s empirical gains are validated on established harm benchmarks; transferability to unseen or orthogonal goal types (privacy, bias) is unproven.
Black-box Constraints: Only a fixed-length prefix of the target output (its first tokens, up to a set limit) is observed, limiting visibility into streaming or filtered output modes.
Static Attacker Policy: TAP’s attacker is not adapted online; learning-based or fine-tuned attackers could improve exploration.

Extensions and Future Directions

Specialized Evaluators: Fine-tuning small LLMs for harm-specific evaluation could replace the current reliance on large proprietary evaluators.
Multi-Prompt and Dialogue Attacks: Extending TAP to sequences of adaptive prompts or multi-turn dialogues has demonstrated further gains (as in DialTree-RPO) (Guo et al., 2 Oct 2025).
Adversarial Red-Teaming: TAP-generated adversarial prompts offer valuable data for proactive defense and continual robustification of LLM guardrails.
Alternative Branching/Selection Mechanisms: Learning- or UCB-based subtree selection methods could potentially further improve search efficiency.
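To make the last item concrete, the sketch below shows the standard UCB1 rule applied to choosing which subtree to expand next. It is not part of TAP as described above: the function names are hypothetical, and mean Judge scores are assumed to be rescaled to $[0, 1]$.

```python
import math
from typing import List


def ucb1_scores(mean_scores: List[float], visits: List[int], c: float = math.sqrt(2)) -> List[float]:
    """UCB1 value per subtree: mean reward plus an exploration bonus."""
    total = sum(visits)
    scores = []
    for mean, n in zip(mean_scores, visits):
        if n == 0:
            scores.append(math.inf)  # unvisited subtrees are tried first
        else:
            scores.append(mean + c * math.sqrt(math.log(total) / n))
    return scores


def select_subtree(mean_scores: List[float], visits: List[int], c: float = math.sqrt(2)) -> int:
    """Index of the subtree with the highest UCB1 value (first one on ties)."""
    scores = ucb1_scores(mean_scores, visits, c)
    return max(range(len(scores)), key=scores.__getitem__)
```

Unlike top-$w$ pruning, which ranks by score alone, this rule also favors rarely expanded subtrees, so a branch with a slightly lower mean but few visits can still be selected.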

7. Significance and Synthesis

TAP unifies breadth-first tree search, chain-of-thought guided prompt evolution, and targeted pruning into an automated black-box jailbreak discovery framework. By balancing exploration and efficiency, TAP identifies diverse, interpretable prompts that defeat robust LLM safety mechanisms at high rates with modest query budgets. Multi-turn extensions such as DialTree-RPO highlight the continued vulnerability of LLMs to sophisticated adversarial prompting and motivate further work on defense-oriented evaluation and mitigation strategies (Mehrotra et al., 2023, Guo et al., 2 Oct 2025).

References

Mehrotra et al. (2023). Tree of Attacks: Jailbreaking Black-Box LLMs Automatically.

Guo et al. (2025). Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks.
