Iterative Information Collector (IIC)
- Iterative Information Collector (IIC) is a neural framework that recasts exemplar selection as a sequential decision process using reinforcement learning.
- It employs a GRU-based state encoder atop a frozen dense retriever and uses stratified sampling to balance exploration and exploitation during retrieval.
- Empirical results show IIC’s robust transferability across diverse datasets and LLM families, significantly outperforming standard single-pass top-K retrieval methods.
The Iterative Information Collector (IIC), also referred to as the iterative retriever, is a neural framework that recasts in-context exemplar selection for LLMs as a sequential decision process driven by reinforcement learning. Unlike standard k-shot retrieval—where the top-K examples are selected in a single pass by similarity—IIC constructs the exemplar set iteratively, accounting for dependencies and interactions among exemplars. This approach is formalized as a Markov decision process that optimizes a retrieval policy for downstream LLM performance via log-probability feedback from the target model. IIC introduces a lightweight, trainable state encoder atop a frozen dense retriever, and achieves superior retrieval for semantic parsing ICL tasks, with robust generalization across datasets and LLM families (2406.14739).
1. Formulation as a Combinatorial Optimization Problem
IIC addresses the challenge of selecting $K$ exemplars from a dataset $\mathcal{D}$ to maximize the conditional likelihood $p_{\mathrm{LM}}(y \mid e_1, \dots, e_K, x)$ of the gold output, a combinatorial optimization problem that is NP-hard due to the exponential number of candidate sets:

$$E^* = \arg\max_{E \subseteq \mathcal{D},\, |E| = K} \; p_{\mathrm{LM}}(y \mid E, x)$$
Traditional one-shot retrievers deploy a similarity heuristic, scoring each candidate independently of the others:

$$E = \underset{e \in \mathcal{D}}{\operatorname{top-}K} \; \mathrm{sim}(x, e)$$
IIC instead models retrieval as a Markov decision process (MDP), selecting exemplars sequentially. The state $s_t$ is a vector encoding the sequence of exemplars chosen so far, actions correspond to selecting a new exemplar $e_{t+1} \in \mathcal{D}$, and transitions are realized by a GRU-based state encoder. The retrieval policy computes a query vector $q_t$ from $s_t$, and candidate exemplars are scored by dot product with embeddings from a frozen text encoder $f(\cdot)$. The MDP objective maximizes the expected cumulative reward $\mathbb{E}\left[\sum_{t=1}^{K} r_t\right]$, where rewards are derived from LLM feedback (Section 2).
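A toy numpy sketch of one such MDP step, with illustrative shapes only; `W_policy` and `gru_step` are hypothetical stand-ins for the trained policy head and GRU state encoder, not the paper's exact parameterization:

```python
import numpy as np

def select_step(state, W_policy, cand_emb, gru_step):
    """One MDP step: query from state, dot-product scoring, state update.

    state:    current state vector s_{t-1}
    W_policy: stand-in weight matrix for the policy head
    cand_emb: frozen embeddings f(e) of all candidate exemplars, one per row
    gru_step: stand-in for the recurrent transition s_t = GRU(s_{t-1}, f(e_t))
    """
    q = np.tanh(W_policy @ state)             # query vector from the policy
    scores = cand_emb @ q                     # dot-product scores over the pool
    a = int(np.argmax(scores))                # greedy action: best-scoring exemplar
    new_state = gru_step(state, cand_emb[a])  # recurrent state transition
    return a, new_state
```

The key contrast with one-shot retrieval is that `state` (and hence the query) changes after every selection, so later choices depend on earlier ones.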
2. Reinforcement Learning Framework and Reward Shaping
The IIC training procedure is grounded in reinforcement learning with LLM-driven reward shaping. For a given test query $x$ and gold output $y$, the stepwise reward $r_t$ reflects the incremental gain in the LLM's log-probability of the correct output upon adding an exemplar:

$$r_t = \log p_{\mathrm{LM}}(y \mid e_1, \dots, e_t, x) - \log p_{\mathrm{LM}}(y \mid e_1, \dots, e_{t-1}, x)$$
This decomposition provides a per-step, dense signal capturing the marginal utility of each chosen exemplar.
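As a minimal sketch, this decomposition can be computed from a log-probability oracle; `llm_logprob` is a hypothetical stand-in for querying the target LLM with a prompt built from the exemplars:

```python
def stepwise_rewards(exemplars, query, gold, llm_logprob):
    """Reward at step t = gain in log p(gold) from adding exemplar e_t.

    `llm_logprob(exemplars, query, gold)` is assumed to return the target
    LLM's log-probability of the gold output given the prompt.
    """
    rewards = []
    prev = llm_logprob([], query, gold)  # log-prob with an exemplar-free prompt
    for t in range(1, len(exemplars) + 1):
        cur = llm_logprob(exemplars[:t], query, gold)
        rewards.append(cur - prev)       # marginal utility of exemplar e_t
        prev = cur
    return rewards
```

Note that the rewards telescope: their sum equals the total log-probability improvement over the exemplar-free prompt, so the dense per-step signal stays consistent with the episode-level objective.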
Policy optimization is conducted via Proximal Policy Optimization (PPO) in an actor-critic setting:
- The policy $\pi_\theta$ (actor) determines the candidate selection probabilities.
- The value function $V(s_t)$ (critic) estimates expected returns from a given state.
- The advantage estimator $\hat{A}_t$ is computed via generalized advantage estimation (GAE):

$$\hat{A}_t = \sum_{l \ge 0} (\gamma \lambda)^l \delta_{t+l}, \qquad \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$$
Full optimization minimizes the PPO clipped surrogate loss, value head MSE loss, and includes an entropy bonus for exploration.
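A generic numpy sketch of two of these components, GAE and the PPO clipped surrogate; this illustrates the standard formulas, not the authors' exact implementation or hyperparameters:

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one K-step episode.

    `values` has length K+1 (a bootstrap value is appended at the end).
    """
    advantages = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lam * running                  # discounted sum
        advantages[t] = running
    return advantages

def ppo_loss(logp_new, logp_old, adv, eps=0.2):
    """Clipped surrogate objective (returned as a loss, to be minimized)."""
    ratio = np.exp(logp_new - logp_old)
    return -np.mean(np.minimum(ratio * adv,
                               np.clip(ratio, 1 - eps, 1 + eps) * adv))
```

In the full objective these are combined with the value head's MSE loss and an entropy bonus, as described above.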
3. State Encoding and Network Architecture
IIC's architecture augments a frozen dense retriever (Contriever) with a lightweight, trainable state encoder:
- The text encoder $f(\cdot)$ is frozen and initialized from Contriever, mapping text to dense embedding vectors.
- A trainable GRU encodes the history of chosen exemplars; each transition updates the state via $s_t = \mathrm{GRU}(s_{t-1}, f(e_t))$.
- The policy head is a one-layer MLP mapping the state $s_t$ to the query vector $q_t$.
- Value estimation uses a linear head: $V(s_t) = w^\top s_t + b$.
This design introduces approximately 4 million additional parameters atop the 110M of Contriever.
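As a rough consistency check, the ~4M figure is in line with sizing the trainable head to match Contriever's 768-dimensional embeddings; the hidden size used below is an assumption for illustration, not a value stated above:

```python
def gru_params(d_in, h):
    # PyTorch-style GRU parameter count: 3 gates, each with an input weight
    # matrix, a recurrent weight matrix, and two bias vectors.
    return 3 * (d_in * h + h * h + 2 * h)

def head_params(d=768, h=768):
    # Assumed sizes: hidden size h = embedding dim d = 768 (hypothetical).
    gru = gru_params(d, h)
    policy_mlp = h * d + d   # one-layer MLP mapping state -> query vector
    value_head = h + 1       # linear head producing a scalar value
    return gru + policy_mlp + value_head
```

Under these assumptions the trainable head comes to roughly 4.1 million parameters, consistent with the "approximately 4 million" reported above.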
4. Iterative Retrieval Procedure
At inference, IIC retrieves exemplars as follows (no gradients):
- Initialize the state $s_0$.
- For each $t$ in $1..K$:
  - Compute the query vector $q_t$ from the current state.
  - Score all candidates via inner product: $P(e) \propto \exp\!\left(q_t^\top f(e) / \tau\right)$, where $\tau$ is a temperature hyperparameter.
  - Use "STRATIFIED_SAMPLE": select the top-scoring candidates, partition the remainder into strata, draw equally from each stratum, then renormalize and (optionally) sample from the resulting distribution.
  - Select the exemplar $e_t$ (greedily or by sampling), update the state $s_t = \mathrm{GRU}(s_{t-1}, f(e_t))$, and append $e_t$ to the retrieved set.
The stratified sampling mechanism balances exploitation of high-scoring exemplars and exploration of diverse candidates.
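One possible reading of the STRATIFIED_SAMPLE step in numpy; the function name and split sizes (`n_top`, `n_strata`, `per_stratum`) are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def stratified_sample(scores, n_top=4, n_strata=4, per_stratum=1, rng=None):
    """Keep the top-scoring candidates, then draw extras from score strata.

    Returns the retained candidate indices and a renormalized softmax
    distribution over them, from which an action can be sampled.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    order = np.argsort(scores)[::-1]         # candidates, best first
    keep = list(order[:n_top])               # exploit: highest-scoring candidates
    rest = order[n_top:]
    strata = np.array_split(rest, n_strata)  # explore: score-ordered strata
    for stratum in strata:
        if len(stratum) > 0:
            keep.extend(rng.choice(stratum,
                                   size=min(per_stratum, len(stratum)),
                                   replace=False))
    keep = np.array(keep)
    probs = np.exp(scores[keep] - scores[keep].max())  # stable softmax
    return keep, probs / probs.sum()         # renormalized distribution
```

Drawing from every stratum guarantees that some probability mass always reaches lower-scoring, potentially diverse candidates, which is the exploration half of the trade-off described above.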
5. Generalization, Evaluation, and Empirical Results
IIC is trained with a smaller LLM (Llama-2-7b) as an environment simulator; at inference, the fixed retriever policy is deployed with different or larger LLMs (e.g., Llama-2-70b, CodeLlama-70b, Mistral-7b). Empirically, policies trained on one LLM achieve high transferability: in 75% of test-LM×dataset configurations, IIC surpasses strong baselines by at least 1 EM@1 point and remains competitive elsewhere.
Key evaluation settings include:
- Datasets: SMCalFlow (dialogue-to-AMR), TreeDST (dialogue state tracking), MTOP-EN (multilingual parsing).
- Baselines: BM25, Contriever, EPR (contrastive fine-tuning), CEIL (diversity via DPP).
- Metrics: Exact Match @k (EM@1, EM@3), SMatch F1 (AMR-style partial match).
In all cases, IIC (or "ITERR") outperforms competing methods. For instance, on SMCalFlow with 10 exemplars, EM@1 rises from 44.0 (Contriever) to 54.1 (ITERR), and SMatch F1 from 67.6 to 77.3. Similar gains are observed on TreeDST and MTOP.
Ablation studies reveal that EPR initialization, GRU-based state encoding, and stratified sampling are all critical: removing EPR initialization drops EM@1 by nearly 9 points, replacing the GRU with a Transformer decoder causes training instability, and omitting stratified sampling degrades retrieval quality.
6. Significance and Implications
IIC transforms k-shot in-context retrieval into a stateful, sequential decision paradigm that explicitly models exemplar interactions. Its reinforcement learning strategy, driven by incremental log-probability improvements in the LM, enables end-to-end retrieval policies that are robust to variations in both tasks and downstream LLMs. With minimal parameter overhead, it achieves substantial performance gains, establishing a new retrieval framework for downstream LM tasks where the choice and order of exemplars are inherently non-i.i.d. and interaction-dependent (2406.14739).