
SPS: Search, Optimization & Retrieval

Updated 28 December 2025
  • Similar Prompts Searching (SPS) is a framework that identifies semantically similar prompts using structured search, graph models, and statistical techniques.
  • Core methods include search-based approaches like beam search and random walk, as well as retrieval techniques utilizing token-level LSH and KL divergence minimization.
  • Empirical findings demonstrate that SPS improves model efficiency, inference speed, and task performance across diverse NLP and multimodal applications.

Similar Prompts Searching (SPS) refers to algorithmic and statistical approaches for identifying, evaluating, and leveraging prompts that are semantically or functionally similar within large language and multimodal model systems. SPS is a foundational component in prompt optimization, transfer, retrieval-augmentation, KVCache reuse, and creative prompt evolution. It encompasses both discrete and continuous methods, spanning text, visual, and architectural prompt modalities.

1. Formal Frameworks for SPS: Graphs, Search, and Similarity

SPS methodologies model the prompt space, denoted $\mathcal{P}$, as a structured domain supporting a spectrum of search and retrieval operations. In "Prompt Optimization as a State-Space Search Problem" (Taneja, 23 Nov 2025), the prompt space is constructed as a directed graph $G=(V,E)$ where each node $p\in \mathcal{P}$ encodes a prompt (represented by a PromptNode including the prompt string, parent, generating operator, and evaluation score). Edges correspond to transformation operators $O:\mathcal{P}\times \mathcal{I}\times \mathcal{D}_{\mathrm{train}}\rightarrow\mathcal{P}$ mapping one prompt to another via explicit operations such as shortening, adding demonstrations, or reordering content.
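A minimal sketch of this graph representation; the expand helper, the evaluate(prompt, data) signature, and the operator dictionary are illustrative assumptions, not the paper's API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PromptNode:
    """A node in the prompt search graph G = (V, E)."""
    prompt: str                      # the prompt string itself
    parent: Optional["PromptNode"]   # node this prompt was derived from
    operator: Optional[str]          # name of the operator that produced it
    score: float                     # evaluation score Eval(p, D)

def expand(node, operators, instruction, train_data, evaluate):
    """Apply each operator O: P x I x D_train -> P to produce child nodes."""
    children = []
    for name, op in operators.items():
        q = op(node.prompt, instruction, train_data)
        children.append(PromptNode(q, node, name, evaluate(q, train_data)))
    return children
```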

For retrieval paradigms such as SemShareKV (Zhao et al., 29 Sep 2025), the notion of similarity is operationalized via token-level and prompt-level locality-sensitive hashing (LSH) on semantically and positionally augmented embeddings—enabling efficient sublinear retrieval of semantically proximate prompts even under significant lexical or structural variation. Visual and cross-modal variants, such as DualCap (Li et al., 28 Oct 2025), extend SPS to dual retrieval (image-to-image and image-to-text) and feature fusion over token or patch-level embeddings.

2. Core Algorithms: Search, Optimization, and Retrieval

Search-based SPS:

Popular SPS techniques include random walk and beam search. Random walk applies randomly selected transformation operators iteratively, updating best-so-far according to a prompt evaluation heuristic:

```python
import random

# Random-walk prompt search: keep the best-scoring prompt seen so far.
best = current = seed_prompt
best_score = evaluate(best, D)
for t in range(N):
    m = random.choice(M)           # choose an operator in M uniformly
    current = m.apply(current, T)
    score = evaluate(current, D)
    if score > best_score:
        best, best_score = current, score
```
Beam search expands the top-$k$ nodes at each depth, systematically exploring transformation combinations and pruning by evaluation score, thereby maximizing the likelihood of locating functionally robust prompts. Formally, at depth $\ell$, the beam $B_\ell$ is updated as

$$B_{\ell+1} = \operatorname{top}_k\{\, O(p, i, T) \mid p \in B_\ell,\ O \in M \,\}$$

with scoring $h(p)=\mathrm{Eval}(p, D)$ (Taneja, 23 Nov 2025).
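A compact beam-search sketch consistent with this update rule, reusing the operator and evaluation interfaces assumed in the random-walk snippet (the k and depth defaults are illustrative):

```python
def beam_search(seed_prompt, operators, T, D, evaluate, k=4, depth=3):
    """Expand the top-k prompts at each depth, pruning by h(p) = Eval(p, D)."""
    beam = [(evaluate(seed_prompt, D), seed_prompt)]
    for _ in range(depth):
        candidates = []
        for _, p in beam:
            for op in operators:            # O in M
                child = op.apply(p, T)      # O(p, i, T)
                candidates.append((evaluate(child, D), child))
        # B_{l+1} = top_k of all expansions
        beam = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    return max(beam, key=lambda c: c[0])[1]
```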

Retrieval-based SPS:

Token-level LSH, as formalized in SemShareKV, maps positionally encoded and normalized token embeddings $x\in\mathbb{R}^d$ through $L$ hash tables, each based on $k$ random projections and bucket width $w$. The hash function per table is

$$g(x)=\left(h_1(x),\dots,h_k(x)\right), \quad h_j(x)=\left\lfloor \frac{r_j\cdot x + b}{w}\right\rfloor$$

This fuzzy-matching infrastructure efficiently indexes and queries candidate tokens or prompt segments, yielding high recall and precision even for paraphrased or reordered prompts (Zhao et al., 29 Sep 2025).
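A minimal NumPy sketch of the per-table hash function; the L, k, w values mirror the settings reported in Section 4, while the embedding dimension and interface are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, L, w = 768, 10, 8, 0.6  # embedding dim (assumed) and LSH parameters

# One matrix of k random projections r_j and one offset vector b per table.
tables = [(rng.standard_normal((k, d)), rng.uniform(0, w, size=k)) for _ in range(L)]

def lsh_keys(x: np.ndarray) -> list[tuple[int, ...]]:
    """g(x) = (h_1(x), ..., h_k(x)) with h_j(x) = floor((r_j . x + b) / w)."""
    return [tuple(np.floor((R @ x + b) / w).astype(int)) for R, b in tables]
```

Tokens whose keys collide in any of the L tables become fuzzy-match candidates.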

Functional Similarity Search:

"Prompts have evil twins" (Melamed et al., 2023) frames SPS as discrete maximum likelihood estimation, seeking prompts p^\hat{p} that approximate the output distribution P(p)P(\cdot|p^*) of a reference prompt pp^* using greedy coordinate gradient (GCG) optimization over the KL divergence between induced distributions: dkl(pp)=KL[P(p)P(p)]d_{\rm kl}(p^*\Vert p)=\mathrm{KL}\left[P(\cdot|p^*)\Vert P(\cdot|p)\right]

Task-level Prompt Selection:

In Vision In-Context Learning (VICL), SPS reduces to identifying a prompt subset $\mathcal{P}^*$ minimizing aggregate task loss

$$\mathcal{P}^* = \arg\min_{\mathcal{P}\subseteq\mathcal{S}} \sum_{(x_q,y_q)\in\mathcal{D}} \mathcal{L}(f(\mathcal{P},x_q),y_q)$$

using top-$K$ or greedy search strategies optimized over validation data (Zhu et al., 15 Jan 2025).
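A greedy-selection sketch of this objective; the candidate pool, loss_fn signature, and budget parameter are assumed names:

```python
def greedy_prompt_selection(candidates, val_set, loss_fn, budget):
    """Grow P* one prompt at a time, adding whichever candidate most
    reduces the aggregate validation loss sum L(f(P, x_q), y_q)."""
    selected = []
    for _ in range(budget):
        scored = [
            (sum(loss_fn(selected + [p], x, y) for x, y in val_set), p)
            for p in candidates if p not in selected
        ]
        _, best = min(scored, key=lambda s: s[0])
        selected.append(best)
    return selected
```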

3. Transformation Operators and Mutation

Prompt optimization frameworks concretize SPS by defining operator sets $\mathcal{M}$ acting over prompt substrings; two of these operators are sketched in code after the table:

Operator Description
make_concise Shorten and clarify prompt text
add_examples Extend with few-shot input–output pairs
reorder Change segment order (e.g., swap 'Instruction'/'Format')
make_verbose Expand with additional guidance/detail
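As a hedged illustration, two of these operators might be realized as follows; llm_rewrite is a hypothetical helper wrapping a rewriting-LLM call:

```python
def make_concise(prompt, instruction, train_data):
    # Ask a rewriting LLM to shorten and clarify the prompt text.
    return llm_rewrite(f"Shorten and clarify this prompt:\n{prompt}")

def add_examples(prompt, instruction, train_data, n=2):
    # Append n few-shot input-output demonstrations from the training split.
    demos = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in train_data[:n])
    return f"{prompt}\n\nExamples:\n{demos}"

M = {"make_concise": make_concise, "add_examples": add_examples}
```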

Empirical analysis reveals that make_concise dominates successful search trajectories, with add_examples and reorder being contextually important; make_verbose is rarely selected, indicating that brevity benefits instruction execution (Taneja, 23 Nov 2025).

Visual SPS, as in DualCap (Li et al., 28 Oct 2025), employs retrieval and chunk-based keyword distillation (POS-chunked nouns, verbs, adjectives) as operators that inject explicit scene semantics into vision-LLMs.

SCAPE (Lim et al., 31 Jan 2024) treats architectural prompt genes as mutable attributes (style, site, color, lighting, shape, material), leveraging human-guided selection, GPT-4-driven mutation/crossover, and stochastic attribute re-sampling to explore the conceptual prompt space.

4. Evaluation Metrics and Empirical Findings

Prompt candidates are assessed using the following metrics; a minimal scoring sketch follows the list:

  • String-match accuracy: $s_{\mathrm{str}}(p,x,y) = \mathbf{1}[f_p(x)=y]$
  • Critic LM evaluation: $s_{\mathrm{crit}}(p,x,y)=\mathbf{1}[\mathcal{C}(f_p(x),y)=\mathrm{true}]$, where $\mathcal{C}$ is a large LM judge.
  • Task loss: e.g., mIOU for segmentation/detection, MSE for regression tasks.
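A minimal sketch of the first two metrics; the critic is a hypothetical callable returning a "true"/"false" string:

```python
def s_str(model_output: str, y: str) -> int:
    """String-match accuracy: 1 iff the prompted model's output equals y."""
    return int(model_output.strip() == y.strip())

def s_crit(critic, model_output: str, y: str) -> int:
    """Critic-LM evaluation: 1 iff a judge model deems the output correct."""
    verdict = critic(f"Reference: {y}\nAnswer: {model_output}\nCorrect? true/false")
    return int(verdict.strip().lower() == "true")
```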

For retrieval methods, match precision/recall is measured relative to brute-force nearest-neighbor search. Token-level LSH yields $>92\%$ recall with $L=8$, $k=10$, $w=0.6$, and manageable index sizes (Zhao et al., 29 Sep 2025).

Prompt search impact:

In NLP prompt search (Taneja, 23 Nov 2025), beam search lifts development accuracy on reasoning from 0.40 → 0.80, but test-set improvements are more modest (0.20 → 0.50), indicating path-specific overfitting (see the table below).

Task (split)       Seed   One-Shot   Random Walk   Beam Search
reasoning (dev)    0.40   0.20       0.60          0.80
reasoning (test)   0.20   0.30       0.50          0.50

Prompt operator frequency in beam-best paths:

Operator       Frequency
make_concise   4
add_examples   2
reorder        2
make_verbose   0

SCAPE yields +67% novelty over basic DALL-E (Lim et al., 31 Jan 2024). In image captioning, DualCap’s SPS pipeline boosts CIDEr from 119.7 → 123.6 and SPICE from 21.3 → 22.0 (Li et al., 28 Oct 2025). SemShareKV achieves up to $6.25\times$ LLM inference speedup with 42% lower GPU memory at comparable output fidelity (Zhao et al., 29 Sep 2025). SPT achieves up to 90% improvement in response diversity (DIST-2) in dialog generation (Huang et al., 26 Jun 2024).

5. Architectural and Retrieval Design Variants

Dense vs. Sparse Retrieval:

SPT (Huang et al., 26 Jun 2024) implements a trainable dense retriever with context-prompt contrastive learning, mapping queries to the most relevant soft prompt for each conversational turn, using cosine similarity and softmax-normalized selection. Context diversity is enforced using contrastive regularization, ensuring prompt pool coverage and non-collapse.
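A sketch of the selection step under these definitions; the temperature tau and tensor shapes are assumptions, and the contrastive training loop is omitted:

```python
import torch.nn.functional as F

def select_soft_prompt(query_emb, prompt_keys, tau=0.1):
    """Cosine-similarity retrieval over a soft-prompt pool.

    query_emb: (d,) context encoding; prompt_keys: (pool_size, d).
    Returns the argmax prompt index and the softmax selection weights.
    """
    sims = F.cosine_similarity(query_emb.unsqueeze(0), prompt_keys, dim=-1)
    weights = F.softmax(sims / tau, dim=-1)
    return int(weights.argmax()), weights
```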

Chunked Retrieval:

SemShareKV builds a global LSH index over overlapping token window embeddings, supporting prompt-level similarity search by tallying per-token match frequencies and scoring via a softmax-weighted token proximity function:

$$\mathrm{Sim}(Q,P) = \frac{1}{|Q|}\sum_{q\in Q}\exp\left(-\alpha \|x_q-x_{\mathrm{match}(q)}\|^2\right)$$

This approach scales to massive prompt libraries via sublinear indexing and sharding (Zhao et al., 29 Sep 2025).
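The scoring function itself is straightforward to sketch in NumPy; alpha and the pre-matched embedding arrays are assumptions about the interface:

```python
import numpy as np

def prompt_similarity(query_embs, matched_embs, alpha=1.0):
    """Sim(Q, P) = (1/|Q|) * sum_q exp(-alpha * ||x_q - x_match(q)||^2).

    query_embs[i] is a query-token embedding and matched_embs[i] its
    LSH-matched counterpart in the cached prompt P.
    """
    sq_dists = np.sum((query_embs - matched_embs) ** 2, axis=-1)
    return float(np.mean(np.exp(-alpha * sq_dists)))
```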

Creative/Evolutionary Search:

SCAPE iterates over a population of attribute-vectored prompts guided by human selectors, GPT-4-driven mutation/crossover, and explicit memory (a history of taboo and encouraged features). Mutation probability per attribute is $P_{\mathrm{mutate}}(a)=0.5$ if unrated, and crossover selection is weighted according to user ratings (Lim et al., 31 Jan 2024).
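A loose sketch of one generation step; how user ratings attach to individual attributes, and the sample_value helper standing in for GPT-4-driven rewriting, are assumptions:

```python
import random

ATTRIBUTES = ["style", "site", "color", "lighting", "shape", "material"]

def mutate(genes, rated_attributes, sample_value):
    """Re-sample each unrated attribute with probability P_mutate = 0.5."""
    child = dict(genes)
    for a in ATTRIBUTES:
        if a not in rated_attributes and random.random() < 0.5:
            child[a] = sample_value(a)  # stochastic attribute re-sampling
    return child

def crossover(parents, ratings):
    """Draw each attribute from a parent chosen proportionally to its rating."""
    return {a: random.choices(parents, weights=ratings)[0][a] for a in ATTRIBUTES}
```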

6. Limitations, Transferability, and Future Directions

Limitations of SPS frameworks center on transferability and generalization.

Transferability is empirical: evil twin prompts transfer between LLMs and across model sizes, though forward compatibility is not guaranteed (Melamed et al., 2023). SPS is extensible to multi-hop and multi-modal settings, with numerous directions for enhancement, such as learned phrase mining, deeper visual-language fusion layers, and hybrid soft/hard prompt search (Li et al., 28 Oct 2025, Zhao et al., 29 Sep 2025).

7. Applications and Practical Guidelines

Applications of SPS span prompt optimization and transfer, KVCache reuse for efficient LLM inference, retrieval-augmented captioning, dialog generation, and creative/evolutionary design exploration.

In practice, a salient trend is the growing emphasis on principled, semantically aware, and resource-efficient SPS, underpinned by empirical evidence of improved model performance, generation diversity, and search efficiency across diverse domains.
