Exploratory Retrieval-Augmented Planning (ExRAP)
- ExRAP is a framework for continual instruction following, integrating explicit temporal memory and exploration-based planning to adapt to dynamic environments.
- It employs a Temporal Embodied Knowledge Graph to continuously update and query environmental state, ensuring context-grounded decision making.
- Benchmark tests show that ExRAP improves task success rates and reduces execution delays compared to other LLM-based planners in non-stationary settings.
Exploratory Retrieval-Augmented Planning (ExRAP) is a framework for continual instruction following by embodied agents in dynamic, non-stationary environments. The primary aim is to equip LLM-based agents with the ability to efficiently explore and update an explicit, temporally-aware environmental context memory while executing multiple ongoing instructions. By integrating information-based exploration directly into the planning process and employing memory-augmented query evaluation, ExRAP supports robust, context-grounded decision making and demonstrates superior performance in environments with time-varying state and instruction sets.
1. Architectural Foundation and Problem Scope
ExRAP is architected for continual instruction following, where multiple tasks and instructions arrive and must be serviced simultaneously in environments whose state evolves over time. The core architecture fuses two interlocked modules:
- Environmental Knowledge Integration Module: Maintains an external, temporally explicit memory—the Temporal Embodied Knowledge Graph (TEKG)—that records environmental state updates over time and supports rich, query-based retrieval.
- Exploration-Integrated Planning Module: Balances exploitation (task execution given current knowledge) and exploration (actively acquiring missing environmental information) using value-based LLM planning.
Planning is formulated as an iterative process: at each decision step, the agent selects an action that both pursues task objectives and increases its information about the current environment. This dual objective is necessary in non-stationary environments, where a static memory or policy would quickly become unreliable.
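To make the two-module structure concrete, the following minimal Python sketch shows a skeletal version of this loop. The class names `TemporalKG` and `Planner` and the stubbed policy are illustrative assumptions, not ExRAP's actual interfaces.

```python
# Skeletal sketch of the dual-module loop described above (module internals are
# stubbed; names like TemporalKG and Planner are hypothetical, not ExRAP's API).

class TemporalKG:
    def __init__(self):
        self.facts = set()          # timestamped (head, relation, tail, t) tuples
    def update(self, observations, t):
        self.facts |= {(h, r, o, t) for (h, r, o) in observations}

class Planner:
    def act(self, memory, instructions):
        # Placeholder policy: explore when the memory is empty,
        # otherwise exploit the first pending instruction.
        return "explore" if not memory.facts else f"execute:{instructions[0]}"

memory, planner = TemporalKG(), Planner()
for t, obs in enumerate([[], [("mug_1", "on", "counter")]]):
    memory.update(obs, t)
    print(planner.act(memory, instructions=["wash mug if on counter"]))
# -> explore, then execute:wash mug if on counter
```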
2. Environmental Context Memory: Temporal Embodied Knowledge Graph (TEKG)
The environmental context memory is formalized as a Temporal Embodied Knowledge Graph. Every observation is encoded as a timestamped quadruple

$$(h, r, o, t),$$

where $h$, $r$, and $o$ denote the head entity, relation, and tail entity, and $t$ marks the timestep of acquisition. The memory at time $t$ is the set

$$\mathcal{M}_t = \{(h_i, r_i, o_i, t_i) \mid t_i \le t\}.$$

The TEKG is updated continuously using an update function $f_{\text{update}}$, with $\mathcal{M}_{t+1} = f_{\text{update}}(\mathcal{M}_t, z_{t+1})$ for a new observation $z_{t+1}$, which merges new sensory observations with existing context, filtering outdated or conflicting tuples. This mechanism ensures that queries and downstream decisions use a memory that closely tracks real-world dynamics, which is crucial for embodied agents operating in non-stationary domains.
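A minimal sketch of such a memory is shown below, assuming a simple conflict rule in which a newer fact about the same (head, relation) pair supersedes older ones; the paper's update function may apply richer filtering.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    head: str      # entity, e.g. "mug_1"
    relation: str  # e.g. "on"
    tail: str      # entity or literal, e.g. "counter"
    t: int         # timestep at which the fact was observed

def update_memory(memory: set, observations: list, t: int) -> set:
    """Merge new observations into the TEKG.

    Assumed conflict rule: a newer fact about the same (head, relation)
    pair supersedes older ones; the paper's update may filter more finely.
    """
    new_facts = {Fact(h, r, o, t) for (h, r, o) in observations}
    updated_keys = {(f.head, f.relation) for f in new_facts}
    kept = {f for f in memory if (f.head, f.relation) not in updated_keys}
    return kept | new_facts

# Example: the mug moves from the counter to the sink between t=0 and t=5.
M = update_memory(set(), [("mug_1", "on", "counter")], t=0)
M = update_memory(M, [("mug_1", "on", "sink")], t=5)
print(M)  # only the t=5 fact about ("mug_1", "on") remains
```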
3. Task Decomposition, Memory-Augmented Querying, and Execution
In ExRAP, each instruction is decomposed into:
- Query $q_k$: A statement about the environment, typically conditional (e.g., “Is the mug on the counter?”).
- Execution $e_k$: An associated high-level action to perform if the query condition holds.
- Mapping $q_k \mapsto e_k$: Specifies which query controls the initiation of which execution.
Formally, the instruction interpreter maps the instruction set to query–execution pairs $\{(q_k, e_k)\}_{k=1}^{K}$.
The probability that a query is satisfied is estimated by the memory-augmented query evaluator $\Phi$. For each query, the evaluation is

$$\Phi(q_k \mid \mathcal{M}_t) = \mathrm{LLM}\big(q_k,\ \mathrm{retrieve}(\mathcal{M}_t, q_k),\ \tilde{y}_k\big),$$

where $\tilde{y}_k$ is a prior response computed from historical memory and LLM queries, serving both as a filter and a baseline for temporal refinement (see Section 5). Tasks whose associated queries cross a probability threshold are scheduled for immediate execution.
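The sketch below illustrates this evaluation pattern with a toy retrieval heuristic standing in for the LLM call; the blending weight, the prior value, and the scheduling threshold are assumptions chosen for illustration, not the paper's settings.

```python
def retrieve(memory, terms):
    """Return (head, relation, tail, t) facts that mention any of the query terms."""
    return [f for f in memory if any(term in f[:3] for term in terms)]

def evaluate_query(memory, terms, expected_tail, prior=0.5, weight=0.7):
    """Estimate P(query holds) by blending the freshest retrieved evidence with a prior response."""
    facts = retrieve(memory, terms)
    if not facts:
        return prior                           # no evidence: fall back to the prior response
    latest = max(facts, key=lambda f: f[3])    # most recently acquired matching fact
    observed = 1.0 if latest[2] == expected_tail else 0.0
    return weight * observed + (1.0 - weight) * prior

memory = {("mug_1", "on", "counter", 0), ("mug_1", "on", "sink", 5)}
p = evaluate_query(memory, terms=["mug_1"], expected_tail="counter")
if p > 0.8:                                    # the threshold value is an assumption
    print("schedule execution: wash the mug on the counter")
else:
    print(f"query probability {p:.2f} below threshold; keep monitoring")
```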
4. Exploration-Integrated Planning and Control
ExRAP explicitly balances task execution (exploitation) with targeted exploration:
- Exploitation Value $V_{\text{exploit}}$: Assesses the value of executing skill $a$ using context, in-context demonstrations, and task-relevance filtering:

  $$V_{\text{exploit}}(a) = \mathrm{LLM}\big(a,\ \mathcal{M}_t,\ \mathcal{D}_a\big),$$

  where $\mathcal{D}_a$ is the set of relevant demonstrations.
- Exploration Value $V_{\text{explore}}$: Estimates the skill’s value for reducing uncertainty about unfulfilled queries:

  $$V_{\text{explore}}(a) = \sum_{k} d\big(\mathrm{retrieve}(\mathcal{M}_t, q_k),\ \mathrm{retrieve}(\hat{\mathcal{M}}^{a}_{t+1}, q_k)\big),$$

  where $d$ is a distance metric over retrieved context and $\hat{\mathcal{M}}^{a}_{t+1}$ is the hypothetical next memory after executing $a$.

The next action is determined by

$$a^{*} = \arg\max_{a}\,\big[\,V_{\text{exploit}}(a) + \lambda\, V_{\text{explore}}(a)\,\big],$$

where the coefficient $\lambda$ tunes the trade-off between action utility and information gain.
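A toy version of this selection rule is sketched below. The scoring functions are simple placeholders for the LLM-based value estimates (staleness of the inspected entity stands in for the memory-distance term), and `lambda_` plays the role of the trade-off weight; all names and values are illustrative assumptions.

```python
def exploitation_value(skill, query_probs):
    """Task value: higher when the skill's triggering query is likely satisfied."""
    return query_probs.get(skill.get("query"), 0.0)

def exploration_value(skill, last_seen, now):
    """Information value: higher when the skill refreshes knowledge that has gone stale."""
    target = skill.get("inspects")
    if target is None:
        return 0.0
    # Staleness stands in for the distance between the current memory and the
    # hypothetical memory after executing the skill.
    return (now - last_seen.get(target, now)) / 10.0

def select_action(skills, query_probs, last_seen, now, lambda_=0.5):
    """Pick the skill maximizing exploitation value + lambda_ * exploration value."""
    return max(skills, key=lambda s: exploitation_value(s, query_probs)
                                     + lambda_ * exploration_value(s, last_seen, now))

skills = [
    {"name": "wash_mug", "query": "mug_on_counter"},
    {"name": "inspect_kitchen", "inspects": "kitchen"},
]
query_probs = {"mug_on_counter": 0.2}   # current belief that the trigger condition holds
last_seen = {"kitchen": 3}              # timestep of the last kitchen observation
print(select_action(skills, query_probs, last_seen, now=20)["name"])  # -> inspect_kitchen
```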
5. Temporal Consistency Refinement
TEKG’s reliability degrades as the environment evolves. ExRAP applies temporal consistency refinement by maintaining entropy constraints over query responses. When assessing a new response, ExRAP enforces

$$H\big(\Phi(q_k \mid \mathcal{M}_{t'})\big) \;\ge\; H\big(\Phi(q_k \mid \mathcal{M}_{t})\big) \quad \text{for } t' > t \text{ without new evidence for } q_k,$$

where $H(\cdot)$ denotes the entropy of the query response, to ensure uncertainty increases in the absence of new evidence. If this constraint is violated (e.g., the new response appears “overconfident” relative to outdated memory), it is ignored or flagged for re-exploration. This systematic decay prevents the agent from relying on stale or likely invalid memory.
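The following sketch shows one way such an entropy constraint could be checked for a binary query belief; the Bernoulli entropy and the acceptance rule are illustrative assumptions rather than the paper's exact mechanism.

```python
import math

def bernoulli_entropy(p: float) -> float:
    """Entropy (bits) of a binary query belief."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def accept_response(p_old: float, p_new: float, new_evidence: bool) -> bool:
    """Reject responses that grow more confident while the memory only ages."""
    if new_evidence:
        return True
    return bernoulli_entropy(p_new) >= bernoulli_entropy(p_old)

# Without fresh observations, a stale belief of 0.9 may not sharpen to 0.99;
# such a response would be ignored or flagged for re-exploration.
print(accept_response(p_old=0.9, p_new=0.99, new_evidence=False))  # False
print(accept_response(p_old=0.9, p_new=0.75, new_evidence=False))  # True
```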
6. Experimental Validation and Performance
ExRAP was benchmarked across VirtualHome, ALFRED, and CARLA—standard simulation platforms for embodied agents—under varying instruction scales, types, and levels of environmental non-stationarity. Key metrics were:
- Task Success Rate (SR): Fraction of tasks completed upon satisfaction of their conditions.
- Pending Steps (PS): Average delay between satisfaction of conditions and completion of execution.
ExRAP consistently outperformed strong LLM-based planners such as ZSP, SayCan, ProgPrompt, and LLM-Planner, achieving up to 18–20 points higher SR and reducing PS by several steps under high non-stationarity. These results underscore the benefit of integrating exploration-based methods and context memory maintenance.
7. Applications, Implications, and Future Directions
ExRAP’s framework is especially relevant for agents expected to operate over long horizons in open, evolving physical spaces. Applications include:
- Smart Homes: Robots that must monitor, react, and update environmental models while executing overlapping cleaning, monitoring, or social tasks.
- Autonomous Vehicles: Continuous adaptation to dynamic road conditions and traffic while following complex, ongoing directives.
The explicit separation and integration of retrieval-augmented temporal memory, exploration-based planning, and temporal refinement suggest future directions such as:
- Enhanced retrieval architectures incorporating uncertainty-aware querying.
- Adaptive exploration weights based on real-time coverage and event frequency.
- Expanding context modeling to include richer forms of uncertainty quantification and hybrid symbolic-neural representations.
ExRAP establishes that robust continual instruction following in non-stationary embodied environments requires co-optimization of information gathering (exploration), grounded memory, and temporal consistency, with planning mechanisms designed to actively reconcile these competing pressures. This approach is a pivotal step towards embodied agents capable of scalable, efficient, and context-sensitive autonomy in the real world (Yoo et al., 10 Sep 2025).