LTLCrit: Temporal Logic LLM Critic
- LTLCrit is a temporal logic-based critic for LLM planners that supervises high-level decision-making by enforcing formal safety and efficiency constraints.
- It features a modular actor-critic architecture with an online loop for rapid action verification and an offline loop for inducing and refining logic-based constraints.
- Empirical evaluations in Minecraft benchmarks show significant improvements, reducing unsafe actions from 23% to 4.5% and increasing task efficiency.
LTLCrit denotes a temporal logic-based LLM critic architecture in which a logic-based critic supervises and improves LLM-planned trajectories for embodied, long-horizon decision-making tasks. It provides formal guarantees of safety and efficiency, decouples high-level planning from constraint refinement, and supports modular integration with any LLM-based planner (Gokhale et al., 4 Jul 2025).
1. Modular Actor-Critic Architecture
The LTLCrit framework is based on a hierarchical actor-critic paradigm, comprising distinct online actor and offline critic loops:
- Online loop: The LLM actor receives a comprehensive natural language description of the current environment state and selects a high-level action from a fixed set. The candidate action is then checked against the current pool of linear temporal logic (LTL) constraints via a formal verification protocol. The verifier, instantiated as a Büchi automaton, examines whether the abstract state together with the candidate action satisfies both the safety and the efficiency constraints. Invalid actions prompt replanning; valid actions are delegated to a low-level controller.
- Offline loop: LTLCrit analyzes full observed trajectories to identify failures (unsafe episodes) or inefficient paths. It then induces new or refined LTL constraints which are injected back into the online system, updating the verifier's rule set for future action selection.
This modular separation enables rapid, reactive planning in the online actor, while the offline critic drives global improvements in safety and efficiency.
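As an illustration of the online loop, the following Python sketch shows how a proposed action might be verified before execution. The `llm_actor`, `verifier`, and `controller` interfaces are hypothetical placeholders introduced here for clarity, and the actual verifier operates on a compiled Büchi automaton rather than the simple membership check implied below.

```python
# Minimal sketch of the online actor loop. The interfaces `llm_actor`,
# `verifier`, and `controller` are assumed names, not the paper's API.

def online_step(state, llm_actor, verifier, controller, max_retries=3):
    """Propose, verify, and execute one high-level action."""
    rejected = []  # actions the verifier has already ruled out this step
    for _ in range(max_retries):
        # 1. The LLM actor selects a high-level action from a natural-language
        #    description of the current abstract state.
        action = llm_actor.propose(state, forbidden=rejected)

        # 2. The verifier checks the (state, action) pair against the current
        #    pool of LTL safety and efficiency constraints.
        if verifier.is_allowed(state, action):
            # 3. Valid actions are delegated to the low-level controller.
            return controller.execute(action)

        # Invalid actions trigger replanning, with the rejection fed back
        # to the actor so it avoids the same proposal.
        rejected.append(action)
    raise RuntimeError("No constraint-satisfying action found; replanning failed")
```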
2. Temporal Logic Constraint Formulation
Communication and supervision between critic and actor occur via LTL constraints of the form:

$$\mathbf{G}\,(\varphi \rightarrow \psi)$$

where $\mathbf{G}$ denotes the "globally" temporal operator, $\varphi$ is a Boolean condition over symbolic state features, and $\psi$ is a Boolean condition over actions. Constraints can encode policy-level restrictions (e.g., resource nonduplication, subgoal ordering), representing hard safety rules or soft efficiency guidelines. For instance,

$$\mathbf{G}\,(\text{have\_wooden\_pickaxe} \rightarrow \neg\,\text{craft\_wooden\_pickaxe})$$

prevents crafting duplicate wooden pickaxes by barring redundant tool production as soon as the agent possesses one.
All constraints are compiled into automata for machine verification and, owing to LTL's canonical structure, remain human-interpretable and editable.
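For the restricted fragment $\mathbf{G}(\varphi \rightarrow \psi)$ used above, verification reduces to checking the implication at every step. The sketch below illustrates this with plain Python predicates; the `Constraint` class and the pickaxe predicate names are illustrative assumptions, not the paper's implementation, which compiles constraints into automata.

```python
# Step-wise checking for constraints of the form G(phi -> psi), where phi is
# a predicate over the symbolic state and psi a predicate over the proposed
# action. All names below are assumptions for illustration.

from dataclasses import dataclass
from typing import Callable, Dict, Iterable

State = Dict[str, bool]   # symbolic state features, e.g. {"have_wooden_pickaxe": True}
Action = str              # high-level action label, e.g. "craft_wooden_pickaxe"

@dataclass
class Constraint:
    name: str
    state_cond: Callable[[State], bool]    # phi: condition on the symbolic state
    action_cond: Callable[[Action], bool]  # psi: condition on the action

    def allows(self, state: State, action: Action) -> bool:
        # G(phi -> psi): whenever phi holds, the chosen action must satisfy psi.
        return (not self.state_cond(state)) or self.action_cond(action)

def is_allowed(state: State, action: Action, constraints: Iterable[Constraint]) -> bool:
    return all(c.allows(state, action) for c in constraints)

# Example mirroring the wooden-pickaxe rule from the text (predicate names assumed):
no_duplicate_pickaxe = Constraint(
    name="no_duplicate_wooden_pickaxe",
    state_cond=lambda s: s.get("have_wooden_pickaxe", False),
    action_cond=lambda a: a != "craft_wooden_pickaxe",
)
```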
3. Safety and Efficiency Assurance
LTLCrit enforces two categories of constraints:
- Safety (hard constraints): Hand-authored rules (e.g., requiring baseline equipment before engaging in risky actions) are injected by domain experts to prevent catastrophic failures. An example is $\mathbf{G}\,(\neg\,\text{have\_iron\_pickaxe} \rightarrow \neg\,\text{mine\_diamond})$, disallowing diamond mining without proper equipment.
- Adaptive efficiency (soft constraints): The critic automatically analyzes trajectories for suboptimal behavior such as loops, redundant actions, or avoidable delays. Using a graph-traversal formalism (actions as edges, each with unit step cost), the critic induces new constraints that prune inefficient branches from the exploration tree. Over-constrained, deadlocked states are detected and resolved by relaxing the constraint set as needed.
Constraint induction is driven by failure feedback, environmental rewards, and detection of inefficient state transitions.
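As a rough illustration of how soft constraints could be induced from an inefficient trajectory, the sketch below flags actions that return the agent to a previously visited abstract state and proposes candidate $\mathbf{G}(\varphi \rightarrow \neg a)$ rules. The function name and trajectory representation are assumptions; the paper's graph-based analysis is more general than this loop check.

```python
# Illustrative sketch of offline soft-constraint induction, assuming a
# trajectory is a list of (abstract_state, action) pairs with hashable
# abstract states. Actions that lead back to an already-visited state are
# treated as zero-progress loops and proposed as G(state -> not action) rules.

from typing import Hashable, List, Tuple

def induce_loop_constraints(
    trajectory: List[Tuple[Hashable, str]]
) -> List[Tuple[Hashable, str]]:
    """Return (state, action) pairs to forbid, i.e. candidate G(state -> !action) rules."""
    proposed = []
    visited = set()
    for i, (state, action) in enumerate(trajectory[:-1]):
        visited.add(state)
        next_state = trajectory[i + 1][0]
        # If this action led back to an already-visited abstract state, it
        # made no progress; propose pruning it whenever that state recurs.
        if next_state in visited:
            proposed.append((state, action))
    return proposed
```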
4. Model-Agnostic Integration
LTLCrit is designed as a symbolic wrapper, agnostic to the underlying LLM actor. It has been demonstrated on planners such as SayCan and InnerMonologue, operating independently of their internal mechanics. The only requirement is that the actor expose symbolically tractable representations of state and action for verification. This modularity allows LTLCrit to generalize to various embodied agent architectures and planning domains.
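The sketch below captures the minimal interface such a wrapper assumes of an actor; the `SymbolicActor` protocol and its method names are illustrative assumptions rather than the paper's API.

```python
# Sketch of the model-agnostic interface LTLCrit assumes of an actor: any
# planner that can expose a symbolic abstract state and propose a symbolic
# action can be wrapped. Names below are assumptions for illustration.

from typing import Protocol, Dict, List

class SymbolicActor(Protocol):
    def abstract_state(self, observation: object) -> Dict[str, bool]:
        """Map a raw observation to Boolean symbolic features."""
        ...

    def propose(self, state: Dict[str, bool], forbidden: List[str]) -> str:
        """Return a high-level action label, avoiding rejected candidates."""
        ...

# Any LLM-based planner (e.g., a SayCan- or InnerMonologue-style actor) that
# implements this interface can be supervised by the same verifier and critic,
# since verification operates only on the symbolic (state, action) pair.
```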
5. Empirical Evaluation
Empirical assessment is conducted on the Minecraft diamond-mining benchmark. Key findings:
- Task completion: Augmenting LLM planners with LTLCrit yields a 100% task completion rate, outperforming baseline planners, which fail to reach the goal on their own.
- Efficiency: LTLCrit reduces the mean number of actions to reach key subgoals (e.g., average diamond-mining steps drop from approximately $45.5$ to $35.8$).
- Safety: Introduction of logic-based supervision reduces failed (unsafe) actions from 23% to 4.5%.
These results demonstrate that logic-guided constraint supervision substantially enhances both reliability and efficiency in long-horizon planning.
6. Broader Implications and Future Directions
LTLCrit provides a formal bridge between statistical LLM reasoning and symbolic control/verification, positioning LLMs for deployment in safety-critical systems such as robotics, autonomous vehicles, and healthcare. The interpretability and editability of constraints support regulatory requirements, operator supervision, and domain adaptation.
Future work is directed towards automating the discovery of atomic propositions (features for constraint construction) and expanding critic capabilities to multi-agent settings. Online versions of the critic loop—with continuous constraint refinement—promise tighter coupling of learning and supervision.
7. Summary Table: LTLCrit Components
| Component | Function | Communication Interface |
|---|---|---|
| LLM Actor | High-level action selection from natural language | Abstract symbolic state/action |
| LTLCrit Critic | Trajectory analysis and constraint induction | LTL constraint set |
| Verifier | Checks compliance of current action/state pair | Büchi automaton |
This architecture exemplifies how LLM reasoning can be systematically coupled with symbolic constraint-based guidance, leveraging the strengths of both paradigms for safe, robust, and efficient autonomous decision making.