SEAL-RAG Controller
- SEAL-RAG Controller is a mechanism for fixed-budget evidence assembly that replaces low-utility passages with higher scoring candidates to enhance answer precision.
- It runs an iterative loop that combines entity-led extraction with passage replacement under a strict evidence cardinality constraint, improving multi-hop retrieval accuracy; the related swap-rounding design for experiments reduces estimator variance under an analogous budget.
- Empirical results show gains of +3–19 percentage points in answer correctness and +12–70 percentage points in evidence precision over baseline methods.
Fixed-budget evidence assembly refers to a rigorously constrained framework in which the selection and orchestration of evidence (e.g., in causal inference or multi-hop retrieval-augmented question answering) is optimized under a strict cardinality or resource budget. This paradigm arises in both experimental design, where treatment assignments must not exceed a fixed budget, and multi-hop retrieval, where the context set presented to an LLM or estimator is bounded in size. Two prominent domains exemplifying recent advances are variance-optimal treatment assignment via dependent randomized rounding (Yamin et al., 15 Jun 2025) and context optimization in Retrieval-Augmented Generation (RAG) via entity-aware, replacement-based controllers (Lahmy et al., 11 Dec 2025). In both settings, budget-constrained assembly is not merely pruning or greedy selection; it relies on mechanisms that optimize statistical, inference, or answer-correctness metrics under hard resource limits.
1. Formal Problem Statements in Fixed-Budget Evidence Assembly
In experimental design, let $n$ be the number of candidate units, each associated with a target treatment probability $p_i \in [0, 1]$, such that $\sum_{i=1}^{n} p_i = k$, where $k$ is the (integer-valued) total treatment budget. The joint assignment vector $Z \in \{0, 1\}^n$ must satisfy:
- $\sum_{i=1}^{n} Z_i = k$ with probability 1 (exact-budget constraint),
- $\Pr(Z_i = 1) = p_i$ for all $i$ (marginal constraint),
- $\mathrm{Var}(\hat\tau)$ is minimized for an estimator $\hat\tau$ of the treatment effect.
In fixed-$k$ RAG, the evidence set $E \subseteq \mathcal{C}$ (drawn from a large corpus $\mathcal{C}$), with $|E| = k$, is optimized so that the probability of correct answer generation is maximized. The problem is:
$$E^{*} = \arg\max_{E \subseteq \mathcal{C},\; |E| = k} \Pr(\text{answer is correct} \mid q, E),$$
where $q$ is the question.
Expanding beyond $k$ passages induces "context dilution," wherein superfluous or noisy evidence degrades model performance even if recall increases. The formalism enforces a strict cardinality constraint and frames the optimization as an active, iterative, set-repair process (Lahmy et al., 11 Dec 2025).
2. Algorithmic Approaches: Dependent Randomized Rounding and Replacement Loops
In causal experimental design, dependent randomized rounding—specifically, swap rounding—transforms the fractional allocation into an integral assignment such that:
- Budget is precisely matched at each step,
- Marginals are preserved,
- Negative correlations are induced between assigned units, thereby minimizing estimator variance.
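The swap rounding algorithm identifies two fractional coordinates at each iteration and executes a probabilistic swap that maintains budget feasibility and marginality. Each swap makes at least one of the two coordinates integral, so the procedure terminates after a number of swaps linear in the number of units, with linear time and space when pairs are chosen arbitrarily (Yamin et al., 15 Jun 2025).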
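The pairwise step described above can be sketched in a few lines. This is a minimal illustration of generic pairwise dependent rounding, not the authors' code; the function name `swap_round`, the tolerance `tol`, and the last-two-indices pair selection are assumptions made for the example.

```python
import random

def swap_round(p, rng=None, tol=1e-9):
    """Round probabilities p (summing to an integer k) to a 0/1 vector.

    Each step shifts mass between two fractional coordinates so that their
    sum is unchanged and at least one of them becomes 0 or 1.
    """
    rng = rng or random.Random()
    z = list(p)
    frac = [i for i, v in enumerate(z) if tol < v < 1 - tol]
    while len(frac) >= 2:
        i, j = frac.pop(), frac.pop()      # arbitrary pair selection
        up = min(1 - z[i], z[j])           # largest shift of mass from j to i
        down = min(z[i], 1 - z[j])         # largest shift of mass from i to j
        # Direction is chosen so that E[z_i] and E[z_j] are unchanged.
        if rng.random() < down / (up + down):
            z[i], z[j] = z[i] + up, z[j] - up
        else:
            z[i], z[j] = z[i] - down, z[j] + down
        # At most one of the two coordinates is still fractional.
        for t in (i, j):
            if tol < z[t] < 1 - tol:
                frac.append(t)
    return [int(round(v)) for v in z]
```

Because every step conserves the total, the returned vector always contains exactly `k` ones, and each unit is treated with its target probability in expectation.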
In multi-hop RAG, the SEAL controller operationalizes "replace, don’t expand" via:
- State $S_t = (E_t, L_t, B_t)$ at iteration $t$, with $E_t$ the current evidence, $L_t$ the entity ledger extracted from $E_t$, and $B_t$ a blocklist of ineffective queries.
- A loop, Search → Extract → Assess → Loop, in which entity-anchored extraction yields gap specifications (missing entities or relations), targeted micro-queries are issued, entity-first utility scores are computed, and the least useful evidence passage is replaced with the highest-utility candidate if a threshold is surpassed.
- The loop preserves $|E_t| = k$ strictly at every iteration, yielding both cost predictability and defense against dilution (Lahmy et al., 11 Dec 2025).
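A toy version of the replace-don't-expand loop may make the control flow concrete. It is a sketch under strong simplifying assumptions: passages are modeled as sets of entity strings, "search" is a membership lookup in an in-memory dictionary, and the names `seal_repair`, `gain`, and `keep_value` are hypothetical rather than taken from the paper.

```python
def seal_repair(evidence, corpus, required, k, max_iters=5, threshold=0):
    """Toy fixed-k repair loop over passages modeled as entity sets."""
    evidence = list(evidence)
    assert len(evidence) == k, "start from exactly k passages"
    blocklist = set()

    def entities(pids):
        found = set()
        for pid in pids:
            found |= corpus[pid]
        return found

    for _ in range(max_iters):
        gaps = required - entities(evidence)          # Extract + Assess
        queries = [g for g in sorted(gaps) if g not in blocklist]
        if not queries:                               # sufficient, or stuck
            break
        query = queries[0]                            # one atomic micro-query
        candidates = [pid for pid in corpus           # Search
                      if query in corpus[pid] and pid not in evidence]
        if not candidates:
            blocklist.add(query)
            continue

        def gain(pid):        # open gaps a candidate would close
            return len(corpus[pid] & gaps)

        def keep_value(pid):  # required entities only this passage supplies
            rest = entities(p for p in evidence if p != pid)
            return len((corpus[pid] & required) - rest)

        best = max(candidates, key=gain)
        worst = min(evidence, key=keep_value)
        if gain(best) - keep_value(worst) > threshold:
            evidence[evidence.index(worst)] = best    # replace: size stays k
        else:
            blocklist.add(query)
    return evidence
```

The evidence list is only ever modified by in-place substitution, so its length cannot drift from `k`, which is the property the controller relies on for cost predictability.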
3. Theoretical Guarantees and Variance/Evidence Optimization
For dependent randomized rounding (swap rounding):
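- The IPW estimator is unbiased, $\mathbb{E}[\hat\tau] = \tau$, because the marginals are preserved: $\mathbb{E}[Z_i] = p_i$ (with $Z_i \in \{0, 1\}$ the assignment of unit $i$ and $p_i$ its target probability).
- Variance is decomposed as:
$$\mathrm{Var}(\hat\tau) = \sum_{i} a_i^2\,\mathrm{Var}(Z_i) + \sum_{i \neq j} a_i a_j\,\mathrm{Cov}(Z_i, Z_j),$$
where $a_i$ is the weight on $Z_i$ when $\hat\tau$ is written as a linear function of the assignment, $\mathrm{Var}(Z_i) = p_i(1 - p_i)$, and all pairwise covariances induced by swaps are non-positive:
$$\mathrm{Cov}(Z_i, Z_j) \le 0 \quad \text{for } i \neq j,$$
yielding lower variance than independent Bernoulli assignment whenever the products $a_i a_j$ are positive.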
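A two-unit worked example illustrates how negative covariance lowers variance. It assumes a generic linear estimator $\hat\tau = a_1 Z_1 + a_2 Z_2$ with fixed weights $a_1, a_2$ (notation introduced here for illustration), target probabilities $p_1 = p_2 = \tfrac12$, and budget $k = 1$, so that dependent rounding returns $(1, 0)$ or $(0, 1)$ with probability $\tfrac12$ each.

```latex
\begin{aligned}
\mathrm{Var}(Z_1) &= \mathrm{Var}(Z_2) = \tfrac12\left(1 - \tfrac12\right) = \tfrac14, \\
\mathrm{Cov}(Z_1, Z_2) &= \mathbb{E}[Z_1 Z_2] - \mathbb{E}[Z_1]\,\mathbb{E}[Z_2] = 0 - \tfrac14 = -\tfrac14, \\
\mathrm{Var}_{\text{dependent}}(\hat\tau) &= \tfrac14 a_1^2 + \tfrac14 a_2^2 + 2 a_1 a_2 \left(-\tfrac14\right) = \tfrac14 (a_1 - a_2)^2, \\
\mathrm{Var}_{\text{Bernoulli}}(\hat\tau) &= \tfrac14 a_1^2 + \tfrac14 a_2^2, \\
\mathrm{Var}_{\text{Bernoulli}}(\hat\tau) - \mathrm{Var}_{\text{dependent}}(\hat\tau) &= \tfrac12 a_1 a_2 .
\end{aligned}
```

The saving is positive exactly when $a_1$ and $a_2$ share a sign, and the dependent design has zero variance when $a_1 = a_2$.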
For SEAL-RAG:
- Utility scoring combines explicit terms: gap coverage, corroboration, and novelty, offset by a redundancy penalty.
- Sufficiency gating is a function of LLM-generated signals (Coverage, Corroboration, Contradiction, Answerability), halting repair when gaps are fully closed.
- Cost is predictable: the evidence set is held at $k$ passages, so generator token cost grows with $k$ only (Lahmy et al., 11 Dec 2025).
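One simple way to combine the four scoring terms is a weighted sum with a subtracted redundancy penalty. The sketch below assumes that form; the exact functional form and weights used by SEAL-RAG are not given here, so `Signals`, `utility`, `should_replace`, and all default weights are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Signals:
    gap_coverage: float   # share of open gaps the passage addresses
    corroboration: float  # agreement with facts already in the ledger
    novelty: float        # new entities or relations contributed
    redundancy: float     # overlap with passages already in the set

def utility(s, w_gap=1.0, w_corr=0.5, w_nov=0.5, w_red=0.5):
    """Assumed linear utility: reward three terms, penalize redundancy."""
    return (w_gap * s.gap_coverage + w_corr * s.corroboration
            + w_nov * s.novelty - w_red * s.redundancy)

def should_replace(candidate, incumbent, threshold=0.1):
    """Swap only when the candidate beats the incumbent by a margin."""
    return utility(candidate) - utility(incumbent) > threshold
```

Putting the largest weight on gap coverage reflects the entity-first emphasis described above, and the margin in `should_replace` guards against swapping on negligible differences.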
4. Practical Implementation Considerations
Swap rounding implementation requires only vector storage for the probability vector $p$ and a list of fractional indices. Pair selection can be arbitrary, but a covariate-ordered variant, which orders units via a TSP-style tour in covariate space and preferentially swaps adjacent pairs, yields stronger local negative correlation, further reducing estimator variance when outcome and propensity assignment are smooth in covariates. For very large numbers of units $n$, block-wise application is practical and maintains strong negative dependence globally (Yamin et al., 15 Jun 2025).
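The covariate-ordered variant can be illustrated in one dimension, where a tour through covariate space reduces to sorting. The sketch below makes that simplifying assumption; `ordered_swap_round` and the scalar covariate `x` are hypothetical names, and a real multi-dimensional design would need an actual tour heuristic.

```python
import random

def ordered_swap_round(p, x, rng=None, tol=1e-9):
    """Dependent rounding that pairs units adjacent in covariate order."""
    rng = rng or random.Random()
    z = list(p)
    order = sorted(range(len(p)), key=lambda i: x[i])
    carry = None                      # unresolved fractional unit, if any
    for i in order:
        if not (tol < z[i] < 1 - tol):
            continue                  # already integral
        if carry is None:
            carry = i
            continue
        j = carry                     # nearest unresolved neighbor in the order
        up = min(1 - z[i], z[j])
        down = min(z[i], 1 - z[j])
        if rng.random() < down / (up + down):
            z[i], z[j] = z[i] + up, z[j] - up
        else:
            z[i], z[j] = z[i] - down, z[j] + down
        # Carry forward whichever coordinate is still fractional.
        carry = next((t for t in (i, j) if tol < z[t] < 1 - tol), None)
    return [int(round(v)) for v in z]
```

With all probabilities equal to one half, this pairs neighbors in covariate order and treats exactly one unit of each pair, which is the local balance that the ordered variant aims for.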
SEAL-RAG implementation entails:
- Dense embedder-based retrieval with OpenAI embeddings and fixed corpus segmentation (e.g., Wikipedia pages).
- Open-IE extraction and entity-ledger construction for every evidence set $E_t$.
- Targeted, atomic micro-queries derived directly from missing facts, filtered by a blocklist to avoid unproductive cycles.
- Entity-first replacement with utility-thresholded swaps.
- All baselines share the same retriever, index, and LLM setup to control for modeling or environment confounds (Lahmy et al., 11 Dec 2025).
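The micro-query step in the list above can be sketched as follows. The representation of a gap as an `(entity, relation)` pair, the query string format, and the function name `micro_queries` are assumptions for illustration; the source does not specify how queries are rendered.

```python
def micro_queries(gaps, blocklist):
    """Turn missing (entity, relation) facts into short, deduplicated queries.

    Queries already known to be unproductive (in `blocklist`) are skipped so
    that the loop does not reissue them.
    """
    queries = []
    for entity, relation in gaps:
        query = f"{entity} {relation}".strip().lower()
        if query not in blocklist and query not in queries:
            queries.append(query)
    return queries
```

Keeping each query to a single missing fact is what makes it "atomic": a failed query can be blocklisted without discarding other, still useful, lines of search.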
5. Empirical Results and Quantitative Gains
Empirical studies in swap rounding demonstrate:
- Covariate-ordered swap rounding achieves 10–50% variance reduction in IPW estimators over standard approaches (e.g., repeated Bernoulli, uniform selection, Morgan-Rubin rerandomization) at moderate sample sizes.
- In RCT-based semi-synthetic tasks, swap rounding is the top unbiased performer and competitive with biased low-variance estimators.
- On heterogeneous real-world data (public housing), vanilla (unordered) swap rounding remains optimal where treatment/outcome heterogeneity is dominated by assignment probability (Yamin et al., 15 Jun 2025).
SEAL-RAG achieves:
- On HotpotQA, Judge-EM increases from 71% (Self-RAG) to 77% (+6 percentage points), and evidence precision from 76% to 89% (+13 pp), both reported as statistically significant.
- On 2WikiMultiHopQA, accuracy increases from 66.5% (Adaptive-$k$ buffer) to 74.5% (+8 pp), and precision@5 from 26% to 96% (+70 pp).
- Across datasets, strictly enforcing a fixed evidence budget and performing active replacement yields consistent, statistically significant improvements (+3–19 pp for correctness, +12–70 pp for evidence precision), robustly countering context dilution (Lahmy et al., 11 Dec 2025).
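Evidence precision is commonly computed as the fraction of passages in the final evidence set that are gold supporting passages. The helper below assumes that definition, which may differ in detail from the one used in the evaluation; the passage identifiers are made up for the example.

```python
def evidence_precision(evidence, gold):
    """Fraction of passages in the evidence set that are gold passages."""
    if not evidence:
        return 0.0
    return sum(1 for pid in evidence if pid in gold) / len(evidence)

gold = {"g1", "g2"}
repaired = ["g1", "g2"]                    # fixed budget, noise replaced
expanded = ["g1", "n1", "n2", "g2", "n3"]  # same gold passages, plus noise
```

Both sets contain every gold passage, so recall is identical, yet the expanded set scores 0.4 against 1.0 for the repaired one. That gap is the context dilution effect the fixed budget is meant to prevent.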
6. Limitations, Open Questions, and Future Directions
Current frameworks are limited in the following ways:
- Swap rounding is not immediately extensible to multi-arm interventions, continuous treatments, or block/cluster randomization; these generalizations are open questions.
- For massive data, more scalable or distributed dependent rounding methods are needed.
- SEAL-RAG presently assumes discrete, atomic passage retrieval; adaptation to hierarchical, structured, or joint passage-entity retrieval regimes remains unaddressed.
- The interaction between fixed-budget assembly and covariate-adaptive or fully sequential experiment designs is not fully explored.
- Entity-first replacement assumes reliable extraction and entity-linking; failure in extraction may impede robust gap closure or sufficiency gating.
A plausible implication is that fixed-budget principles—negative dependence in treatment assignment and context-optimized iterative repair in RAG—may generalize to broader constrained evidence management problems, provided effective gap detection and negative correlation can be reliably achieved.
7. Comparison of Fixed-Budget Assembly Paradigms
| Domain | Constraint | Optimization Principle | Algorithmic Core | Empirical Result |
|---|---|---|---|---|
| Causal Experiment | Exact treatment budget ($\sum_i Z_i = k$) | Variance minimization (IPW and similar estimators) | Swap rounding (dependent) | 10–50% variance reduction over baselines |
| Multi-hop RAG | Fixed evidence-set size ($\lvert E \rvert = k$) | Answer correctness / evidence precision | SEAL loop (entity-aware replace) | +3–19 pp correctness, +12–70 pp precision over baselines |
Both paradigms demonstrate that under fixed budgets, careful dependency-inducing or utility-aware replacement approaches can dramatically improve the efficiency and quality of inference relative to naïve greedy, independent, or expansion-based baselines (Yamin et al., 15 Jun 2025, Lahmy et al., 11 Dec 2025).