Rank Anything First (RAF)
- RAF is a set of adversarial techniques that manipulate LLM rerankers through naturalistic token-level prompt modifications.
- It employs a two-stage token optimization process that balances ranking effectiveness with linguistic naturalness.
- RAF demonstrates robust transferability across models, revealing vulnerabilities in current LLM-powered retrieval systems.
Rank Anything First (RAF) refers to a set of algorithmic and adversarial techniques designed to steer LLMs used as rerankers in information retrieval toward consistently promoting a target item to the top of a generated ranking, using naturalistic and contextually plausible textual perturbations. The RAF framework, introduced in "Are LLMs Reliable Rankers? Rank Manipulation via Two-Stage Token Optimization" (Xing et al., 8 Oct 2025), demonstrates robust manipulation by optimizing candidate prompts against token-level objectives for both ranking effectiveness and linguistic naturalness, revealing critical vulnerabilities in current LLM reranking pipelines.
1. Underlying Motivation and Problem Setting
Recent advances have led to widespread deployment of LLMs as reranking engines in search, recommendation, and retrieval systems. In this context, input prompts and contextual queries serve as soft signals, steering the relative positioning of candidate items in ranked outputs. Although these systems are often presumed robust to minor variations in prompt formulation, RAF shows that targeted, human-readable perturbations can algorithmically and consistently elevate a designated candidate while evading detection mechanisms.
The threat model assumes black-box (query-only) access to the LLM reranker, with the attacker seeking to construct a concise textual intervention that maximizes the probability of a target item appearing at the top rank. An additional constraint is imposed: the manipulated prompt must preserve naturalness and fluency, minimizing the presence of detectable or synthetic artifacts.
2. Two-Stage Token Optimization Pipeline
The RAF technique operates by constructing prompt perturbations in an iterative, token-by-token manner, comprising two discrete stages at each token position:
| Stage | Algorithmic Principle | Objective |
|---|---|---|
| Stage 1 | Greedy Coordinate Gradient | Shortlists the top-B tokens whose gradients most reduce the combined ranking and readability loss |
| Stage 2 | Entropy-Based Dynamic Weighting & Temperature Sampling | Selects the final token by sampling over the exact, entropy-weighted combination of ranking and readability losses |
Stage 1 – Greedy Coordinate Gradient:
For each position in the prompt context, RAF computes token-level gradients of the ranking loss (cross-entropy indicating the target item’s position) and a readability loss (negative log-likelihood of the token under current context). The gradients are linearly combined with a fixed tradeoff parameter (w₁) to create a shortlist of top-B candidate tokens expected to offer the best compromise between manipulation efficacy and fluency.
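A minimal sketch of this Stage-1 shortlisting step, assuming a HuggingFace-style causal LM whose input-embedding matrix is exposed; `rank_loss_fn` and `read_loss_fn` are hypothetical callables returning the two differentiable losses, and `w1`/`B` are illustrative hyperparameters, not values from the paper:

```python
import torch

def stage1_shortlist(model, embed_matrix, input_ids, pos,
                     rank_loss_fn, read_loss_fn, w1=0.5, B=32):
    """Greedy Coordinate Gradient shortlist for one prompt position.

    Uses the gradient of the combined loss with respect to a one-hot
    token encoding as a first-order estimate of how each vocabulary
    token, substituted at `pos`, would change the loss.
    """
    vocab_size = embed_matrix.shape[0]
    # Differentiable one-hot encoding of the current prompt tokens.
    one_hot = torch.nn.functional.one_hot(input_ids, vocab_size).float()
    one_hot.requires_grad_(True)
    inputs_embeds = one_hot @ embed_matrix            # (seq_len, hidden)

    # Combined objective: ranking loss + fixed-weight readability loss.
    loss = rank_loss_fn(model, inputs_embeds) + w1 * read_loss_fn(model, inputs_embeds)
    loss.backward()

    # Large negative gradient entries promise the biggest loss decrease
    # when that vocabulary token replaces the current one at `pos`.
    grad_at_pos = one_hot.grad[pos]                   # (vocab_size,)
    return torch.topk(-grad_at_pos, B).indices        # top-B candidate token ids
```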
Stage 2 – Entropy-Based Weighting and Temperature-Controlled Sampling:
The shortlisted tokens are then evaluated using their true ranking and readability losses. RAF incorporates a dynamic weight for readability, $w_{\text{read}}$, computed from the entropy $H(p_{\text{read}})$ of the next-token distribution:

$$w_{\text{read}} = 1 - \frac{H(p_{\text{read}})}{\log |V|},$$

where $|V|$ is the vocabulary size. At low entropy (high model confidence), more weight is given to readability. The final selection is made by forming a temperature-scaled softmax over the combined loss:

$$p(t_i) = \frac{\exp\left(-\mathcal{L}(t_i)/\tau\right)}{\sum_{j=1}^{B} \exp\left(-\mathcal{L}(t_j)/\tau\right)},$$

where $\mathcal{L}(t) = \mathcal{L}_{\text{rank}}(t) + w_{\text{read}}\,\mathcal{L}_{\text{read}}(t)$ is the weighted sum of ranking and readability losses and $\tau$ is the sampling temperature.
This process alternates greedy shortlisting and dynamic loss-based sampling for each token until the full prompt stabilizes on its top candidate.
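A companion sketch of Stage 2 under the formulas above; `rank_loss_exact` is a hypothetical callable that runs a forward pass and returns the exact ranking loss with a candidate token substituted in, and the entropy normalization by $\log|V|$ follows the reconstruction given earlier:

```python
import torch

def stage2_select(model, context_ids, shortlist, rank_loss_exact, tau=0.7):
    """Entropy-weighted scoring and temperature-controlled sampling (Stage 2)."""
    with torch.no_grad():
        # Next-token distribution given the tokens preceding the edited position.
        logits = model(context_ids.unsqueeze(0)).logits[0, -1]
        p_read = torch.softmax(logits, dim=-1)

        # Dynamic readability weight: close to 1 when entropy is low
        # (the model is confident), so fluency is weighted more heavily.
        entropy = -(p_read * torch.log(p_read + 1e-12)).sum()
        w_read = 1.0 - entropy / torch.log(torch.tensor(float(p_read.numel())))

        # Exact combined loss for each shortlisted token:
        # L = L_rank + w_read * L_read, with L_read = -log p_read(token).
        combined = torch.stack([
            rank_loss_exact(model, context_ids, tok) - w_read * torch.log(p_read[tok])
            for tok in shortlist
        ])

        # Temperature-scaled softmax over negated losses: lower loss, higher probability.
        probs = torch.softmax(-combined / tau, dim=0)
        return shortlist[torch.multinomial(probs, 1).item()]
```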
3. Dual Objective: Ranking Maximization and Naturalness
RAF’s token optimization is explicitly guided by the conjoined objectives:
- Ranking effectiveness ($\mathcal{L}_{\text{rank}}$): measurably improves the rank position of the target item.
- Linguistic naturalness ($\mathcal{L}_{\text{read}}$): ensures modified prompts do not introduce overt synthetic indicators likely to be flagged.
Output prompts are contextually embedded rather than overtly abnormal; for example:

"It won the Pulitzer Prize for Fiction in the year it was published, making history itself already noteworthy. The Lost Expedition, recommended for ages thirteen and up."

Such perturbations are designed both to promote the target and to evade human or automated scrutiny.
Experimental ablations verify that optimizing both objectives is necessary for reliable manipulation; optimizing ranking alone leads to unnatural outputs, while readability alone fails to reliably promote targeted items.
4. Empirical Results on LLM Reranking
RAF was evaluated across several open-source LLM rerankers (Llama-3.1-8B, Mistral-7B, DeepSeek-7B, Vicuna-7B) on datasets spanning books, cameras, and coffee machines. RAF consistently achieved better (lower-numbered) average rank positions for target items, outperforming prior adversarial methods (Strategic Text Sequence [STS], StealthRank Prompt [SRP]) in both manipulation efficacy and text fluency (lower perplexity, comparable rates of "bad word" flags).
Experiments were randomized over candidate orderings to remove positional bias. Cross-model transferability was demonstrated: prompts crafted for one LLM retained strong manipulation power when applied to other rerankers with only minor performance drop.
This suggests adversarial perturbations generalize across model architectures—raising the issue of persistent vulnerability in deployed LLM-powered retrieval systems.
5. Technical Characterization and Core Formulas
RAF’s core optimization leverages:
- Target (ranking) loss: $\mathcal{L}_{\text{rank}}(x) = -\log p_{\text{LLM}}(y^{\star} \mid x)$, where $y^{\star}$ denotes the target sequence for ranking (the output placing the target item first).
- Readability loss: $\mathcal{L}_{\text{read}}(t_i) = -\log p_{\text{LLM}}(t_i \mid t_{<i})$, the negative log-likelihood of each token under its context.
- Combined loss for sampling: $\mathcal{L}(t) = \mathcal{L}_{\text{rank}}(t) + w_{\text{read}}\,\mathcal{L}_{\text{read}}(t)$, with temperature $\tau$ scaling the softmax used for selection.
The optimization is performed incrementally for each token, using outer-loop convergence to halt when the top candidate stabilizes for a sufficiently long sequence window.
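Under these definitions, the outer loop can be sketched as follows, reusing the hypothetical `stage1_shortlist` and `stage2_select` helpers from Section 2; the stabilization window `patience` and sweep limit are assumed hyperparameters:

```python
def optimize_prompt(model, embed_matrix, prompt_ids, editable_positions,
                    rank_loss_fn, read_loss_fn, rank_loss_exact,
                    max_sweeps=100, patience=5):
    """Alternate Stage-1 shortlisting and Stage-2 sampling over the editable
    positions until no token changes for `patience` consecutive sweeps."""
    stable_sweeps = 0
    for _ in range(max_sweeps):
        changed = False
        for pos in editable_positions:
            shortlist = stage1_shortlist(model, embed_matrix, prompt_ids, pos,
                                         rank_loss_fn, read_loss_fn)
            new_tok = stage2_select(model, prompt_ids[:pos], shortlist,
                                    rank_loss_exact)
            if int(new_tok) != int(prompt_ids[pos]):
                prompt_ids[pos] = new_tok
                changed = True
        stable_sweeps = 0 if changed else stable_sweeps + 1
        if stable_sweeps >= patience:   # top candidate has stabilized
            break
    return prompt_ids
```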
6. Security Implications and Systemic Vulnerability
RAF exposes systemic susceptibilities in LLM-based reranking pipelines; small, contextually appropriate prompt modifications can result in substantial, difficult-to-detect manipulation of ranking outputs. Importantly, the dual-use nature of RAF's output (simultaneously effective, transferable, and fluent) suggests adversaries can automate attacks with scale and subtlety.
The findings compel developers of ranking pipelines to incorporate robust audit and detection mechanisms for adversarial perturbations, acknowledging that prompt engineering in the wild may stealthily bias outputs.
7. Code Availability and Reproducibility
The implementation of RAF is publicly released, facilitating reproducibility and follow-up research.
A plausible implication is a wave of follow-on work designing defensive algorithms, deeper robustness evaluation metrics, and attack-detection strategies fit for real-time use in deployed LLM ranking services.
Rank Anything First (RAF) introduces an efficient, model-agnostic framework to manipulate LLM rerankers through naturalistic token-level prompt engineering, demonstrating reliable transferability across models and significant elevation of target items in returned rankings. These results articulate a clear need for structural improvements in model trust, robustness, and auditability in modern information retrieval systems.