Recycling Search Experience (RSE)

Updated 5 February 2026

RSE is a methodology framework that transforms isolated search processes into cumulative workflows by systematically recycling intermediate insights.
It leverages positive recycling to build on verified sub-solutions and negative recycling to avoid redundant failure patterns, enhancing efficiency in reasoning and data querying.
Empirical evaluations show that RSE achieves significant performance gains in LLM reasoning and harmonized survey discovery by reducing redundancy and scaling search efficiency.

Recycling Search Experience (RSE) is a family of methodologies that transform otherwise isolated or “memoryless” search and retrieval processes, whether in LLM-based inference or cross-survey variable discovery, into cumulative, self-guided workflows by systematically capturing and reusing intermediate insights. The central focus is on positive recycling (leveraging verified sub-solutions or harmonized mappings across trials or data partitions) and negative recycling (avoiding the repetition of failure patterns or known “dead ends”). RSE strategies can be applied both to scaling reasoning in large models at inference time and to efficient querying of harmonized datasets. They typically require no additional training or supervision; instead, they restructure batch or iterative workflows into an incremental discovery pipeline (Wang et al., 29 Jan 2026; Tu et al., 2022).

1. Motivation and Conceptual Framework

Modern search and inference strategies—such as test-time scaling for LLMs or exploration of harmonized survey repositories—are commonly structured around repeated, independent trials. In the test-time scaling regime, this takes the form of parallel rollouts or iterative refinement, while in data harmonization it involves consecutive variable mappings and conditional queries. This “memoryless” paradigm discards valuable intermediate facts or insights, which leads to redundancy, inefficiency, and, as task complexity scales, rapidly diminishing marginal returns.

RSE is explicitly designed to overcome these limitations by (a) continuously accumulating actionable “experience” in a global repository (positive and negative sub-results, failure signatures, harmonized mappings) and (b) conditioning future search or inference on this bank. This enables order-of-magnitude efficiency improvements, both theoretically and empirically, particularly in settings where exhaustive search is infeasible or combinatorial explosion is a concern (Wang et al., 29 Jan 2026).

2. Algorithmic Instantiation: LLM Reasoning

The canonical RSE implementation in LLM-based mathematical reasoning operates as follows (Wang et al., 29 Jan 2026):

Experience Bank. After each batch of problem-solving rollouts by an LLM $T$, the raw trajectories $w$ are distilled into:

$E^+$: Positive Experience. Verified intermediate facts, lemmas, or subclaims found anywhere within rollouts.
$E^-$: Negative Experience. Discoverable dead-end signatures, recurring failure patterns, or premise–outcome mappings that predict unproductive search areas.

These are aggregated into a shared bank $E = E^+ \cup E^-$.

Iterative Search. For $R$ rounds, with $K_r$ rollouts per round $r$:

Serialize the prompt by combining the original problem $x$ with all accumulated $E^+, E^-$.
Generate $K_r$ trajectories in parallel with $T$.
Distill new experience from the outputs, and deduplicate additions using semantic similarity (e.g., thresholded cosine similarity on embeddings).
Update the bank $E$ accordingly; ensure only semantically novel experience joins the bank.
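The round structure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure from Wang et al.: the names `ExperienceBank`, `rse_search`, `rollout_fn`, and `distill_fn` are hypothetical, the bag-of-words `embed` function stands in for a real sentence encoder, the 0.8 threshold is an arbitrary placeholder, and rollouts run sequentially here rather than in parallel.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text):
    """Toy bag-of-words 'embedding' (assumption: stands in for a real encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class ExperienceBank:
    threshold: float = 0.8  # hypothetical deduplication threshold
    positive: list = field(default_factory=list)  # verified facts (E+)
    negative: list = field(default_factory=list)  # dead-end signatures (E-)

    def add(self, text, is_positive):
        """Store an item only if it is not a near-duplicate of anything in the bank."""
        vec = embed(text)
        for existing in self.positive + self.negative:
            if cosine(vec, embed(existing)) >= self.threshold:
                return False
        (self.positive if is_positive else self.negative).append(text)
        return True

    def serialize(self, problem):
        """Combine the original problem with all accumulated experience."""
        lines = [f"Problem: {problem}"]
        if self.positive:
            lines.append("Verified facts you may reuse:")
            lines += [f"- {p}" for p in self.positive]
        if self.negative:
            lines.append("Avoid these dead ends:")
            lines += [f"- {n}" for n in self.negative]
        return "\n".join(lines)


def rse_search(problem, rollout_fn, distill_fn, rollouts_per_round):
    """Run one round per entry of rollouts_per_round, recycling experience between rounds."""
    bank = ExperienceBank()
    for k_r in rollouts_per_round:
        prompt = bank.serialize(problem)
        trajectories = [rollout_fn(prompt) for _ in range(k_r)]  # sequential stand-in
        for trajectory in trajectories:
            for text, is_positive in distill_fn(trajectory):
                bank.add(text, is_positive)
    return bank


# Stub model and distiller, purely for demonstration.
def fake_rollout(prompt):
    return "lemma: n is even | dead end: induction on n fails"


def fake_distill(trajectory):
    pos, neg = trajectory.split(" | ")
    return [(pos, True), (neg, False)]


bank = rse_search("Show that n^2 is even.", fake_rollout, fake_distill, [3, 3])
print(bank.serialize("Show that n^2 is even."))
```

Because the stub returns the same trajectory every time, the bank ends up with one positive and one negative item despite six rollouts, which is the deduplication step doing its job. In a real setting `rollout_fn` would call the model and `distill_fn` would need some form of verification before marking an item as positive.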

Pseudocode for the full procedure is provided in the source paper (Wang et al., 29 Jan 2026). Distinct from naive concatenation or simple majority voting, RSE-guided prompts enable not only the direct reuse of discovered sub-proofs (“By Lemma $i$ from bank…”) but also pruning of fruitless solution spaces (“Avoid these dead ends: …”).

Mathematical Formulation. Under an abstract oracle formalism, the outcome of any rollout is a function of the experience set injected into it. Under monotonicity (enlarging the injected experience set never worsens a rollout's outcome), RSE’s incremental accumulation strictly dominates independent sampling in terms of probability of full solution coverage within a fixed number of rollouts.

3. Theoretical Properties and Performance Bounds

The distribution-free dominance of RSE over memoryless rollouts is established via monotonicity and persistence assumptions: the accumulated experience set $E_r$ at round $r$ always subsumes the outcomes of the same number of independent rollouts without experience injection, guaranteeing that $\Pr[\text{RSE succeeds}] \ge \Pr[\text{independent succeeds}]$ for any fixed rollout budget.

Under a toy additive coverage model where each required item appears in a rollout with probability $p$:

Baseline: the required number of rollouts grows exponentially in the number of required items $m$.
RSE: the required number of rollouts grows linearly in $m$.
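As a worked illustration of where such a gap can come from, consider one possible reading of the toy model. The assumptions here are ours, not necessarily those of the source paper: there are $m$ required items, each appears independently in any given rollout with probability $p$, a memoryless rollout succeeds only if it contains all $m$ items at once, and RSE keeps every item once it has been seen. The symbols $N$ and $G_i$ are illustrative notation.

```latex
% Memoryless baseline: a single rollout must contain all m items at once,
% so the number of rollouts until success is Geometric(p^m).
\Pr[\text{single rollout succeeds}] = p^{m},
\qquad
\mathbb{E}[N_{\text{baseline}}] = p^{-m}.

% RSE with persistence: item i is first seen after G_i ~ Geometric(p) rollouts,
% and the bank is complete once the slowest item has appeared.
\mathbb{E}[N_{\text{RSE}}]
  = \mathbb{E}\Big[\max_{1 \le i \le m} G_i\Big]
  \le \sum_{i=1}^{m} \mathbb{E}[G_i]
  = \frac{m}{p}.
```

For example, with $m = 5$ and $p = 0.5$, the baseline needs $2^{5} = 32$ rollouts in expectation, while the bound for the recycling strategy is $5 / 0.5 = 10$.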

This exponential gap is critical for scaling to complex, multi-step reasoning tasks.
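A small Monte Carlo experiment can make the gap tangible. The sketch below assumes a simplified version of the toy model (each of `m` items appears independently in a rollout with probability `p`; the baseline needs all items in one rollout; the recycling variant keeps items across rollouts). The function names and parameter values are hypothetical choices for illustration.

```python
import random


def rollout_items(m, p, rng):
    """Items surfaced by one rollout: each of m items appears independently w.p. p."""
    return {i for i in range(m) if rng.random() < p}


def baseline_rollouts(m, p, rng):
    """Memoryless: count rollouts until a single rollout contains all m items."""
    n = 0
    while True:
        n += 1
        if len(rollout_items(m, p, rng)) == m:
            return n


def rse_rollouts(m, p, rng):
    """Recycling: items persist in a bank; count rollouts until the bank holds all m."""
    bank, n = set(), 0
    while len(bank) < m:
        n += 1
        bank |= rollout_items(m, p, rng)
    return n


def mean_rollouts(fn, m, p, trials, seed):
    """Average rollout count over repeated trials with a fixed seed."""
    rng = random.Random(seed)
    return sum(fn(m, p, rng) for _ in range(trials)) / trials


m, p = 6, 0.5
baseline_mean = mean_rollouts(baseline_rollouts, m, p, trials=2000, seed=0)
rse_mean = mean_rollouts(rse_rollouts, m, p, trials=2000, seed=0)
print(f"baseline ~ {baseline_mean:.1f} rollouts, recycling ~ {rse_mean:.1f} rollouts")
```

Increasing `m` while holding `p` fixed makes the baseline count blow up quickly, whereas the recycling count grows slowly. Real rollouts are neither independent nor this cleanly decomposable, so the simulation only illustrates the shape of the argument.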

4. Application in Harmonized Data Querying

RSE principles also underpin large-scale survey harmonization and variable discovery frameworks, such as SDRQuerier (Tu et al., 2022). The processing pipeline is structured into interconnected modules that reflect RSE’s cumulative, feedback-driven philosophy:

Query-by-Question (QBQ): BERT-based classifier and embedding recommender for mapping natural-language research questions or keywords to harmonized target variables. Soft (clustered) recommendations are generated using t-SNE on fine-tuned [CLS] embeddings, achieving clustering quality of AMI > 0.75.

Query-by-Condition (QBC): Computes variable-wise and joint coverage profiles (“separate” and “joint” availability) across country-year, guiding researchers toward samples adequate for statistical analysis. Visualization is provided via streamgraphs that encode presence/absence or overlap of selections over time and geography.
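The distinction between “separate” and “joint” availability can be illustrated with plain set operations. This is a hedged sketch, not SDRQuerier's implementation: the variable names, the country-year data, and the helper functions are all invented for illustration.

```python
# Hypothetical availability data: variable -> set of (country, year) pairs
# in which that harmonized variable was measured.
availability = {
    "trust_parliament": {("PL", 2002), ("PL", 2004), ("DE", 2002), ("DE", 2004)},
    "voted_last_election": {("PL", 2004), ("DE", 2002), ("DE", 2004), ("FR", 2004)},
}


def separate_coverage(availability, selected):
    """Number of country-years in which each selected variable is available on its own."""
    return {var: len(availability[var]) for var in selected}


def joint_coverage(availability, selected):
    """Country-years in which all selected variables are available together."""
    sets = [availability[var] for var in selected]
    return set.intersection(*sets) if sets else set()


def joint_by_year(availability, selected):
    """Joint coverage aggregated per year, e.g. as input for a time-based chart."""
    counts = {}
    for _, year in joint_coverage(availability, selected):
        counts[year] = counts.get(year, 0) + 1
    return dict(sorted(counts.items()))


selected = ["trust_parliament", "voted_last_election"]
print(separate_coverage(availability, selected))
print(sorted(joint_coverage(availability, selected)))
print(joint_by_year(availability, selected))
```

Each variable is separately available in four country-years, but only three of those overlap, which is the kind of shrinkage a researcher needs to see before committing to a multi-variable analysis.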

Query-by-Relation (QBR): On-the-fly computation of pairwise correlations, $R^2$, p-values, and network graphs, surfacing relational patterns among substantive and methodological variables for regression model construction. Selection events in one view propagate through linked scatterplots, bar charts, and network diagrams for seamless exploration.
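A relational view of this kind can be approximated by computing pairwise correlations and keeping the strong ones as edges of a graph. The sketch below is an assumption-laden illustration rather than SDRQuerier's code: the column names, the data, the `min_abs_r` cutoff, and the use of the squared Pearson correlation as an effect-size measure are all hypothetical, and p-values are left out because they would normally come from a statistics library.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_edges(columns, min_abs_r=0.5):
    """Return (var_a, var_b, r, r_squared) for every pair with |r| >= min_abs_r."""
    edges = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= min_abs_r:
            edges.append((a, b, round(r, 3), round(r * r, 3)))
    return edges


# Hypothetical substantive and methodological variables.
columns = {
    "trust": [1, 2, 3, 4, 5],
    "turnout": [2, 4, 6, 8, 10],
    "response_rate": [5, 3, 4, 1, 2],
}

edges = correlation_edges(columns)
for a, b, r, r2 in edges:
    print(f"{a} -- {b}: r={r}, r^2={r2}")
```

The resulting edge list can be handed to any graph-drawing tool; raising `min_abs_r` thins the network so that only the strongest relations remain visible.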

This suggests that the concept of “experience recycling” generalizes beyond reasoning chains and is productive for structuring user intent, coverage discovery, and relational hypothesis validation in multi-source, harmonized data environments.

5. Empirical Evaluation and Ablations

In mathematical reasoning, RSE has been benchmarked on HMMT 2024/2025, IMO-Bench, and HLE-Math datasets using QWEN3-30B-THINKING, QWEN3-4B-THINKING, PHI-4-REASONING, and DEEPSEEK-V3.2 models. RSE consistently outperforms baselines (majority voting, standard batched sampling, Self-Refine, PaCoRe concat) under matched FLOP and rollout budgets. For example:

On HMMT25 with QWEN3-30B-THINKING, pass@1 rises from 69.0% (iteration 0) to 83.9% (iteration 3).
On HLE with DEEPSEEK-V3.2, RSE yields a +13.5 pp gain over base at iteration 2.

The deduplication threshold balances redundancy and diversity. Using only positive or negative recycling individually yields ≈+15–16 pp above base, while full RSE gains ≈+17 pp. Analyses confirm maintenance of chain-of-thought diversity and active reasoning (rollout entropy ≈ 0.33), with PaCoRe showing collapse to a verification-only regime (entropy ≈ 0.06) (Wang et al., 29 Jan 2026).

In harmonized survey discovery, RSE implementations in SDRQuerier cut task-completion time by >60% and increase embedding-based cluster purity by 0.3 AMI compared to standard BERT, while also stabilizing dimensionality reduction convergence (variance ↓50%) (Tu et al., 2022).

6. Integration, Limitations, and Generalization

RSE operates as a training-free, inference-time wrapper and is plug-compatible with standard LLM test-time scaling loops: after each batch, update the experience bank and use it for prompt serialization in future rounds. No retraining or auxiliary reward signals are required.

Compared to prior rollout-based paradigms:

Parallel Sampling: No cross-trial memory or sharing.
Sequential Refinement: Reuse only within a single chain, limited by context window.
Hybrid (lookahead + reward): Often dependent on reward models and unable to share intermediate facts across search branches.

RSE unifies the advantages of breadth and depth by supporting full parallelism while incrementally constructing a structured, reusable memory across rounds and chains.

In harmonized data exploration, the RSE blueprint comprises (a) natural-language retrieval backbones (fine-tuned BERT or RoBERTa) for variable discovery, (b) multi-scale availability tracers for coverage visualization, and (c) integrated, interactive relational summarization for robust hypothesis and model specification. Essential guidelines include precomputing embeddings for scalability, user-centric visual encoding, and provenance/quality linking. Caution is warranted to avoid excessive visual complexity or misleading graphical encodings; clarity of legend and deliberate stepwise guidance are necessary for effective user experience (Tu et al., 2022).

A plausible implication is that RSE approaches may become a core primitive for both LLM-driven scientific inference and the next generation of interactive, cross-dataset analytics systems, offering principled efficiency and methodological robustness across domains.


References

Wang et al. (2026). Do Not Waste Your Rollouts: Recycling Search Experience for Efficient Test-Time Scaling.

Tu et al. (2022). SDRQuerier: A Visual Querying Framework for Cross-national Survey Data Recycling.
