
Exploratory Retrieval-Augmented Planning (ExRAP)

Updated 17 September 2025
  • ExRAP is a framework for continual instruction following, integrating explicit temporal memory and exploration-based planning to adapt to dynamic environments.
  • It employs a Temporal Embodied Knowledge Graph to continuously update and query environmental state, ensuring context-grounded decision making.
  • Benchmark tests show that ExRAP improves task success rates and reduces execution delays compared to other LLM-based planners in non-stationary settings.

Exploratory Retrieval-Augmented Planning (ExRAP) is a framework for continual instruction following by embodied agents in dynamic, non-stationary environments. The primary aim is to equip LLM-based agents with the ability to efficiently explore and update an explicit, temporally-aware environmental context memory while executing multiple ongoing instructions. By integrating information-based exploration directly into the planning process and employing memory-augmented query evaluation, ExRAP supports robust, context-grounded decision making and demonstrates superior performance in environments with time-varying state and instruction sets.

1. Architectural Foundation and Problem Scope

ExRAP is architected for continual instruction following, where multiple tasks and instructions arrive and must be serviced simultaneously in environments whose state evolves over time. The core architecture fuses two interlocked modules:

  • Environmental Knowledge Integration Module: Maintains an external, temporally explicit memory—the Temporal Embodied Knowledge Graph (TEKG)—that records environmental state updates over time and supports rich, query-based retrieval.
  • Exploration-Integrated Planning Module: Balances exploitation (task execution given current knowledge) and exploration (actively acquiring missing environmental information) using value-based LLM planning.

Planning is formulated as an iterative process where, at each decision step, the agent must select an action that both advances task objectives and yields information gain about the current environment. This dual objective is necessary in non-stationary environments, where a static memory or policy would quickly become unreliable.

2. Environmental Context Memory: Temporal Embodied Knowledge Graph (TEKG)

The environmental context memory is formalized as a Temporal Embodied Knowledge Graph. Every observation is encoded as a quadruple:

\tau = (\text{source\_entity},\ \text{relation},\ \text{target\_entity},\ t)

where $t$ marks the timestep of acquisition. The memory at time $t$ is the set:

G_t = \{\tau_1, \tau_2, \ldots, \tau_n\}, \quad \forall\, t_i \leq t

TEKG is updated continuously using an update function $\mu$, which merges new sensory observations $o_{t+1}$ with existing context, filtering out outdated or conflicting tuples. This mechanism ensures that queries and downstream decisions use a memory that closely tracks real-world dynamics, which is crucial for embodied agents operating in non-stationary domains.
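As a minimal sketch of this memory, the TEKG can be held as a set of timestamped quadruples with an update step that replaces conflicting facts. The conflict rule here (a new observation with the same source and relation supersedes older ones) is an illustrative assumption, not the paper's exact $\mu$:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Quadruple:
    """One TEKG fact: (source_entity, relation, target_entity, t)."""
    source: str
    relation: str
    target: str
    t: int

class TEKG:
    """Toy temporal embodied knowledge graph: a list of timestamped facts."""
    def __init__(self):
        self.facts: list[Quadruple] = []

    def update(self, observations, t):
        """Merge new observations o_{t+1}; here a new (source, relation)
        pair is assumed to supersede any earlier conflicting fact."""
        keys = {(src, rel) for src, rel, _ in observations}
        self.facts = [f for f in self.facts
                      if (f.source, f.relation) not in keys]
        for src, rel, tgt in observations:
            self.facts.append(Quadruple(src, rel, tgt, t))

    def query(self, source=None, relation=None):
        """Retrieve all facts matching the given (partial) pattern."""
        return [f for f in self.facts
                if (source is None or f.source == source)
                and (relation is None or f.relation == relation)]

g = TEKG()
g.update([("mug", "on", "table")], t=1)
g.update([("mug", "on", "counter")], t=5)  # supersedes the t=1 fact
```

After the second update, querying for the mug returns only the fresher fact, so downstream evaluation never sees the stale `t=1` tuple.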

3. Task Decomposition, Memory-Augmented Querying, and Execution

In ExRAP, each instruction is decomposed into:

  • Query $q_j$: A statement about the environment, typically conditional (e.g., “Is the mug on the counter?”).
  • Execution $e_j$: An associated high-level action to perform if the query condition holds.
  • Mapping $C$: Specifies which query controls the initiation of which execution.

Formally, the instruction interpreter $\Phi_I$ outputs:

\Phi_I(\mathcal{I}) = (\mathcal{Q} = \{q_1, \ldots, q_M\},\ \mathcal{E} = \{e_1, \ldots, e_M\},\ C)
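A concrete, hypothetical instance of this decomposition, with the interpreter's output written out by hand and the threshold-triggered scheduling rule from below (the threshold value and the identifiers `Q`, `E`, `C` are illustrative assumptions):

```python
# Hypothetical interpreter output Phi_I(I) = (Q, E, C) for one instruction.
instruction = "If the mug is on the counter, put it in the dishwasher."

Q = {"q1": "Is the mug on the counter?"}       # queries about the environment
E = {"e1": "put the mug in the dishwasher"}    # high-level executions
C = {"q1": "e1"}                               # mapping: q1 gates e1

# Scheduling rule: run E[C[q]] once P(q | G_t) crosses a threshold.
threshold = 0.8          # assumed value, not from the paper
p = 0.9                  # example estimate from the query evaluator
scheduled = E[C["q1"]] if p > threshold else None
```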

The probability that a query $q$ is satisfied is estimated by the memory-augmented query evaluator $\Phi_M$. For each query, the evaluation is:

P(q \mid G_t) = \begin{cases} R(q \mid G_{t-1}) & \text{if } \hat{G}_{1:t} \text{ unchanged} \\ \Phi_{\text{LLM}}(q, t, \hat{G}_{1:t}, R(q \mid G_{t-1})) & \text{otherwise} \end{cases}

Here, $R(q \mid G_{t-1})$ is a prior response computed from historical memory and LLM queries, serving both as a filter and a baseline for temporal refinement (see Section 5). Tasks whose associated queries cross a probability threshold are scheduled for immediate execution.
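The caching behaviour of this case split can be sketched as follows: if the retrieved memory relevant to $q$ is unchanged, the cached prior response is reused; otherwise an LLM call (here a stand-in function `llm_fn`, with an assumed default prior of 0.5) refines it. This is a simplified illustration, not the paper's exact evaluator:

```python
def evaluate_query(q, memory_snapshot, cache, llm_fn):
    """Memory-augmented query evaluation with a cache keyed on the query.

    Reuses the cached response R(q | G_{t-1}) when the memory snapshot is
    unchanged; otherwise re-queries the LLM, seeded with the prior response.
    """
    snapshot_hash = hash(frozenset(memory_snapshot))
    if q in cache and cache[q][0] == snapshot_hash:
        return cache[q][1]                  # unchanged memory: cached prior
    prior = cache.get(q, (None, 0.5))[1]    # 0.5 = assumed uninformative prior
    p = llm_fn(q, memory_snapshot, prior)
    cache[q] = (snapshot_hash, p)
    return p

# Stand-in LLM with a call counter, to show the cache actually short-circuits.
calls = {"n": 0}
def stub_llm(q, mem, prior):
    calls["n"] += 1
    return 0.9 if ("mug", "on", "counter") in mem else 0.1

cache = {}
mem = frozenset({("mug", "on", "counter")})
p1 = evaluate_query("Is the mug on the counter?", mem, cache, stub_llm)
p2 = evaluate_query("Is the mug on the counter?", mem, cache, stub_llm)  # hit
```

On the second call the snapshot hash matches, so the stub is never invoked again.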

4. Exploration-Integrated Planning and Control

ExRAP explicitly balances task execution (exploitation) with targeted exploration:

  • Exploitation Value $v_T(G_t, z)$: Assesses the value of executing skill $z$ using context, in-context demonstrations, and task-relevance filtering:

v_T(G_t, z) = \Phi_{\text{LLM}}(\mathcal{E}_t, \Phi_R(G_t, \mathcal{E}_t), D, z)

where $D$ is the set of relevant demonstrations.

  • Exploration Value $v_R(G_t, z)$: Estimates the skill’s value for reducing uncertainty about unfulfilled queries:

v_R(G_t, z) = \sum_{q \in \mathcal{Q}} H(P(q \mid G_t)) \cdot \left[1 - \frac{d(\Phi_R(G_t^z, \{q\}))}{d(\Phi_R(G_t, \{q\}))}\right]

where $d(\cdot)$ is a distance metric over retrieved context and $G_t^z$ is the hypothetical next memory after executing $z$.
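The exploration value can be sketched numerically: uncertain queries (high entropy of $P(q \mid G_t)$) whose retrieval distance a skill would shrink contribute most. `dist_after(z, q)` and `dist_now(q)` below are stand-ins for $d(\Phi_R(G_t^z, \{q\}))$ and $d(\Phi_R(G_t, \{q\}))$; the concrete numbers are illustrative assumptions:

```python
import math

def binary_entropy(p):
    """H(p) in nats; zero at fully certain responses."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def exploration_value(skill, queries, q_prob, dist_after, dist_now):
    """Sketch of v_R(G_t, z): sum over queries of entropy-weighted
    relative reduction in retrieval distance that the skill would buy."""
    total = 0.0
    for q in queries:
        h = binary_entropy(q_prob[q])
        total += h * (1.0 - dist_after(skill, q) / dist_now(q))
    return total

q_prob = {"q1": 0.5, "q2": 0.99}            # q1 is maximally uncertain
dist_now = lambda q: 1.0
dist_after = lambda z, q: 0.5 if z == "scan" else 1.0  # scanning halves d
v_scan = exploration_value("scan", ["q1", "q2"], q_prob, dist_after, dist_now)
v_noop = exploration_value("noop", ["q1", "q2"], q_prob, dist_after, dist_now)
```

A skill that leaves the retrieval distances untouched earns zero exploration value, while one that halves them is rewarded in proportion to the remaining query uncertainty.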

The next action is determined by

z_t = \underset{z \in Z}{\arg\max}\ \left[ w_T\, v_T(G_t, z) + w_R\, v_R(G_t, z) \right]

where the weights $w_T$ and $w_R$ tune the trade-off between task utility and information gain.
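The selection rule above reduces to a weighted argmax over the skill set. A minimal sketch, with `v_task` and `v_explore` as stand-ins for the LLM-scored $v_T$ and $v_R$ and the weight values chosen purely for illustration:

```python
def select_skill(skills, v_task, v_explore, w_t=1.0, w_r=0.5):
    """z_t = argmax_z [w_T * v_T(G_t, z) + w_R * v_R(G_t, z)]."""
    return max(skills, key=lambda z: w_t * v_task(z) + w_r * v_explore(z))

# Toy values: "clean_table" scores high on the task, "scan_room" on exploration.
v_task = {"clean_table": 0.8, "scan_room": 0.1}.get
v_explore = {"clean_table": 0.0, "scan_room": 0.9}.get

choice_low = select_skill(["clean_table", "scan_room"], v_task, v_explore)
# Raising the exploration weight flips the decision toward information gain:
choice_high = select_skill(["clean_table", "scan_room"], v_task, v_explore,
                           w_r=2.0)
```

With $w_R = 0.5$ the agent exploits (cleans); with $w_R = 2.0$ the same values make it explore (scan), which is exactly the trade-off the weights are meant to expose.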

5. Temporal Consistency Refinement

TEKG’s reliability degrades as the environment evolves. ExRAP applies temporal consistency refinement by maintaining entropy constraints over query responses. When assessing a new response, ExRAP enforces

H\big(R(q \mid G_{t-1})\big) > H\big(P(q \mid G_{t-1})\big)

to ensure uncertainty increases in the absence of new evidence. If this constraint is violated (e.g., the new response appears “overconfident” relative to outdated memory), it is ignored or flagged for re-exploration. This systematic decay prevents the agent from relying on stale or likely invalid memory.
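The entropy constraint amounts to a simple acceptance test on candidate responses. A sketch, assuming binary query responses so that $H$ is the binary entropy (the probability values are illustrative):

```python
import math

def entropy(p):
    """Binary entropy in nats; zero at fully certain responses."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def accept_refinement(r_new, p_prev):
    """Temporal-consistency check: without fresh evidence, the refined
    response must be *less* certain (higher entropy) than the previous
    estimate; overconfident refinements are rejected."""
    return entropy(r_new) > entropy(p_prev)

ok = accept_refinement(0.6, 0.9)        # uncertainty grew -> accepted
bad = accept_refinement(0.95, 0.7)      # overconfident -> rejected
```

Rejected responses would then be flagged for re-exploration, so the agent goes and re-observes rather than trusting a confident answer grounded in stale memory.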

6. Experimental Validation and Performance

ExRAP was benchmarked across VirtualHome, ALFRED, and CARLA—standard simulation platforms for embodied agents—under varying instruction scales, types, and levels of environmental non-stationarity. Key metrics were:

  • Task Success Rate (SR): Fraction of tasks completed upon satisfaction of their conditions.
  • Pending Steps (PS): Average delay between satisfaction of conditions and completion of execution.

ExRAP consistently outperformed strong LLM-based planners such as ZSP, SayCan, ProgPrompt, and LLM-Planner, achieving up to 18–20 points higher SR and reducing PS by several steps under high non-stationarity. These results underscore the benefit of integrating exploration-based methods and context memory maintenance.

7. Applications, Implications, and Future Directions

ExRAP’s framework is especially relevant for agents expected to operate over long horizons in open, evolving physical spaces. Applications include:

  • Smart Homes: Robots that must monitor, react, and update environmental models while executing overlapping cleaning, monitoring, or social tasks.
  • Autonomous Vehicles: Continuous adaptation to dynamic road conditions and traffic while following complex, ongoing directives.

The explicit separation and integration of retrieval-augmented temporal memory, exploration-based planning, and temporal refinement suggest future directions such as:

  • Enhanced retrieval architectures incorporating uncertainty-aware querying.
  • Adaptive exploration weights based on real-time coverage and event frequency.
  • Expanding context modeling to include richer forms of uncertainty quantification and hybrid symbolic-neural representations.

ExRAP establishes that robust continual instruction following in non-stationary embodied environments requires co-optimization of information gathering (exploration), grounded memory, and temporal consistency, with planning mechanisms designed to actively reconcile these competing pressures. This approach is a pivotal step towards embodied agents capable of scalable, efficient, and context-sensitive autonomy in the real world (Yoo et al., 10 Sep 2025).

