SpectRL: RL Task Specification Logic
- SpectRL is a compositional task specification logic that precisely defines temporally extended RL tasks with complex objectives and safety constraints.
- It compiles specifications into finite-state monitors and augmented MDPs to enable robust reward shaping and improved policy training.
- Extensions like AutoSpec automatically refine under-specified tasks via graph-based transformations to significantly boost success rates.
SpectRL is a compositional task specification logic and reward compilation framework for reinforcement learning (RL) that enables precise, modular description of temporally extended tasks with complex objectives and safety constraints. Designed to bridge the expressive gap between temporal logic and RL reward engineering, SpectRL provides a small yet powerful grammar for task specification, quantitative and Boolean semantics over trajectories, and a reward-shaping compilation to finite-state monitors and augmented MDPs. Extensions such as AutoSpec further exploit SpectRL’s compositional abstract-graph representation to refine under-specified tasks via data-driven transformations, supporting robust and scalable automated specification repair.
1. Formal Syntax and Semantics
SpectRL task specifications are inductively generated over a set of atomic state predicates, yielding a compositional temporal logic with four principal operators. Let $S$ denote the set of environment states and $\mathcal{P}$ the set of Boolean predicates over $S$.
The core grammar is
$$\varphi \;::=\; \text{achieve}\ b \;\mid\; \varphi\ \text{ensuring}\ b \;\mid\; \varphi_1;\, \varphi_2 \;\mid\; \varphi_1\ \text{or}\ \varphi_2$$
where $b \in \mathcal{P}$.
- $\text{achieve}\ b$: Requires the agent to eventually reach a state where $b$ holds.
- $\varphi\ \text{ensuring}\ b$: Requires satisfaction of $\varphi$ while continually maintaining $b$.
- $\varphi_1;\, \varphi_2$: Sequential composition; achieve $\varphi_1$, then $\varphi_2$.
- $\varphi_1\ \text{or}\ \varphi_2$: Disjunction; satisfy at least one of the sub-specifications.
Boolean (crisp) semantics of satisfaction over a finite trajectory $\zeta = s_0 s_1 \cdots s_t$:
- $\zeta \models \text{achieve}\ b$ iff $\exists\, i \le t$ such that $b(s_i)$ holds
- $\zeta \models \varphi\ \text{ensuring}\ b$ iff $\zeta \models \varphi$ and $b(s_i)$ holds for all $i \le t$
- $\zeta \models \varphi_1;\, \varphi_2$ iff $\exists\, i < t$ such that $s_0 \cdots s_i \models \varphi_1$ and $s_{i+1} \cdots s_t \models \varphi_2$
- $\zeta \models \varphi_1\ \text{or}\ \varphi_2$ iff $\zeta \models \varphi_1$ or $\zeta \models \varphi_2$
Quantitative (robustness) semantics extend predicates to real-valued functions $\llbracket b \rrbracket : S \to \mathbb{R}$ such that $b$ holds in $s$ iff $\llbracket b \rrbracket(s) > 0$, and propagate satisfaction scores via $\min$/$\max$ combinations over rollouts (Jothimurugan et al., 2020, Ambadkar et al., 30 Nov 2025).
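The two semantics can be made concrete with a short interpreter over finite trajectories. This is a minimal sketch, not the authors' implementation: the dataclass encoding of formulas, the 2-D state type, and the `near` predicate shape are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = Tuple[float, float]          # illustrative 2-D states
Pred = Callable[[State], float]      # quantitative predicate: value > 0 means "holds"

@dataclass
class Achieve:                       # achieve b
    b: Pred

@dataclass
class Ensuring:                      # phi ensuring b
    phi: object
    b: Pred

@dataclass
class Seq:                           # phi1 ; phi2
    phi1: object
    phi2: object

@dataclass
class Or:                            # phi1 or phi2
    phi1: object
    phi2: object

def sat(phi, traj: List[State]) -> bool:
    """Boolean satisfaction of phi over a finite trajectory s_0 ... s_t."""
    if isinstance(phi, Achieve):
        return any(phi.b(s) > 0 for s in traj)
    if isinstance(phi, Ensuring):
        return sat(phi.phi, traj) and all(phi.b(s) > 0 for s in traj)
    if isinstance(phi, Seq):
        # some split point: a prefix satisfies phi1, the remaining suffix phi2
        return any(sat(phi.phi1, traj[:i]) and sat(phi.phi2, traj[i:])
                   for i in range(1, len(traj)))
    if isinstance(phi, Or):
        return sat(phi.phi1, traj) or sat(phi.phi2, traj)
    raise TypeError(f"unknown operator: {phi!r}")

def rob(phi, traj: List[State]) -> float:
    """Quantitative robustness: rob(phi, traj) > 0 iff sat(phi, traj)."""
    if isinstance(phi, Achieve):
        return max(phi.b(s) for s in traj)
    if isinstance(phi, Ensuring):
        return min(rob(phi.phi, traj), min(phi.b(s) for s in traj))
    if isinstance(phi, Seq):
        return max(min(rob(phi.phi1, traj[:i]), rob(phi.phi2, traj[i:]))
                   for i in range(1, len(traj)))
    if isinstance(phi, Or):
        return max(rob(phi.phi1, traj), rob(phi.phi2, traj))
    raise TypeError(f"unknown operator: {phi!r}")

def near(target: State, r: float = 1.0) -> Pred:
    """Ball predicate: positive within Chebyshev radius r of target."""
    return lambda s: r - max(abs(s[0] - target[0]), abs(s[1] - target[1]))

traj = [(0.0, 0.0), (2.0, 2.0), (5.0, 5.0), (7.0, 3.0), (9.0, 0.0)]
spec = Seq(Achieve(near((5.0, 5.0))), Achieve(near((9.0, 0.0))))
```

Here `sat(spec, traj)` holds and `rob(spec, traj)` is positive, while swapping the two subgoals fails both semantics, reflecting the sequential split in the `;` clause.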
2. Compositional Graph Structure
Every SpectRL formula $\varphi$ is translated into an abstract directed acyclic graph (DAG), formalized as
$$G = (V, E, \beta, v_0, v_f)$$
where
- $V$: nodes representing subtasks,
- $E \subseteq V \times V$: edges representing transitions or requirements,
- $\beta$: predicate labeling for state/transition constraints,
- $v_0, v_f$: initial and terminal nodes.
Each operator corresponds to a graph transformation:
- Sequential composition ($\varphi_1;\, \varphi_2$): Connects $\varphi_1$'s terminal node to $\varphi_2$'s initial node.
- Disjunction ($\varphi_1\ \text{or}\ \varphi_2$): Unions the start/end nodes and forms parallel subgraphs.
- Safety ($\varphi\ \text{ensuring}\ b$): Annotates all edges in the subgraph with the safety predicate $b$.
- Atomic reach ($\text{achieve}\ b$): Single edge from $v_0$ to $v_f$ labeled with $b$.
This compositionality supports modular specification construction and enables algorithms to target subtasks and transitions separately (Ambadkar et al., 30 Nov 2025).
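The four graph transformations above can be sketched as a small recursive compiler. This is an illustrative encoding, not the paper's data structures: formulas are nested tuples, predicates are opaque labels, and node identifiers are freshly generated integers.

```python
import itertools

_fresh = itertools.count()

def compile_graph(phi):
    """Compile a SpecTRL formula into an abstract graph.

    phi is one of: ('achieve', b), ('ensuring', sub, b),
                   ('seq', p1, p2), ('or', p1, p2).
    Returns (edges, v0, vf) with edge = [src, dst, reach_pred, safety_preds].
    """
    op = phi[0]
    if op == 'achieve':
        # atomic reach: a single labeled edge between fresh nodes
        v0, vf = next(_fresh), next(_fresh)
        return [[v0, vf, phi[1], []]], v0, vf
    if op == 'ensuring':
        # safety: annotate every edge of the subgraph with the safety predicate
        edges, v0, vf = compile_graph(phi[1])
        for e in edges:
            e[3].append(phi[2])
        return edges, v0, vf
    if op == 'seq':
        # sequencing: glue phi1's terminal node to phi2's initial node
        e1, a0, a1 = compile_graph(phi[1])
        e2, b0, b1 = compile_graph(phi[2])
        for e in e2:
            if e[0] == b0: e[0] = a1
            if e[1] == b0: e[1] = a1
        return e1 + e2, a0, b1
    if op == 'or':
        # disjunction: share start and end nodes across parallel branches
        e1, a0, a1 = compile_graph(phi[1])
        e2, b0, b1 = compile_graph(phi[2])
        for e in e2:
            if e[0] == b0: e[0] = a0
            if e[1] == b1: e[1] = a1
        return e1 + e2, a0, a1
    raise ValueError(f"unknown operator: {op}")

spec = ('ensuring', ('seq', ('achieve', 'at_S1'), ('achieve', 'at_S2')), 'safe')
edges, v0, vf = compile_graph(spec)
# yields two chained edges, both annotated with the 'safe' predicate
```

Because `ensuring` distributes over the whole subgraph, the safety predicate ends up on every edge of the sequenced chain, matching the annotation rule above.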
3. Reward Compilation and Shaping
Given an MDP $\mathcal{M}$ and a specification $\varphi$, SpectRL compiles $\varphi$ into a task-monitor automaton $M_\varphi = (Q, X, \Delta, \rho)$, analogous to LTL-to-automaton translation:
- $Q$: finite monitor states,
- $X$: finite set of registers,
- $\Delta$: transition relation over monitor states and register updates,
- $\rho$: terminal reward as a function of monitor state and registers.
The original MDP $\mathcal{M}$ is then augmented into a product MDP $\tilde{\mathcal{M}}$ with:
- augmented states pairing environment states with monitor states and register valuations,
- transition dynamics and rewards as defined by the automaton logic.
The shaped reward augments intermediate states with a potential based on distance to, and progress within, the monitor, and strictly preserves the optimality ordering over trajectories. Standard RL solvers on $\tilde{\mathcal{M}}$ produce policies over $\mathcal{M}$ that maximize the probability of satisfying $\varphi$ (Jothimurugan et al., 2020).
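A toy product-MDP rollout illustrates the shaping idea. The 1-D chain environment, the two-subgoal monitor, the potential values, and the terminal bonus are illustrative choices under potential-based shaping, not the exact compilation from the paper.

```python
GAMMA = 0.99

def monitor_delta(q: int, s: float, goals) -> int:
    """Advance the finite-state monitor when the current subgoal predicate holds."""
    if q < len(goals) and goals[q](s):
        return q + 1
    return q

def shaped_reward(q: int, q_next: int, base_r: float, potential) -> float:
    """Potential-based shaping over monitor states: adds gamma*Phi(q') - Phi(q),
    which preserves the ordering of trajectory returns."""
    return base_r + GAMMA * potential[q_next] - potential[q]

# Toy task: walk right along a 1-D chain; subgoals "reach 4" then "reach 8".
goals = [lambda s: s >= 4, lambda s: s >= 8]
potential = [0.0, 1.0, 2.0]          # one potential value per monitor state

def rollout_return(states) -> float:
    """Accumulate shaped reward along a state sequence in the product MDP."""
    q, total = 0, 0.0
    for s in states:
        q_next = monitor_delta(q, s, goals)
        # sparse base reward: paid once, when the monitor reaches its final state
        base_r = 1.0 if q_next == len(goals) and q != q_next else 0.0
        total += shaped_reward(q, q_next, base_r, potential)
        q = q_next
    return total

success = rollout_return(range(11))   # walks 0..10, hits both subgoals
failure = rollout_return(range(4))    # stalls before the first subgoal
```

The successful rollout earns strictly more shaped return than the failing one, and the intermediate potential bumps at each monitor transition give the learner gradient long before the terminal bonus.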
4. Specification Refinement: The AutoSpec Framework
AutoSpec extends SpectRL’s compositional structure to automate the refinement of coarse or underperforming specifications. It operates on the specification’s DAG representation, identifies bottlenecks (e.g., low-success edges), and refines them via four strategies:
- SeqRefine (Predicate Refinement): Tightens the reach or avoid regions on nodes/edges by restricting to successful trajectory subsets or excluding failure regions.
- AddRefine (Waypoint Addition): Introduces new intermediate nodes at midpoints of successful trajectories, dividing hard transitions into easier subgoals.
- PastRefine (Source Partition): Partitions the source region, often via separating hyperplanes, to restrict starting configurations based on success/failure outcomes.
- OrRefine (Alternative-Path Addition): Adds new parallel paths in the graph, broadening options for reaching subgoals.
Every refinement produces a new specification $\varphi'$ that is sound with respect to the original (any trajectory satisfying $\varphi'$ also satisfies $\varphi$), and tightens or augments the local RL reward structures accordingly (Ambadkar et al., 30 Nov 2025).
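As one illustration, an AddRefine-style edge split can be sketched as follows. The midpoint heuristic, the fixed tolerance radius, and the edge encoding are assumptions for illustration, not AutoSpec's actual procedure.

```python
import statistics
from itertools import count

_fresh = count()

def add_refine(edge, successful_trajs):
    """AddRefine sketch: split one hard edge into two by inserting a waypoint
    predicate centred on the midpoints of successful demonstration trajectories."""
    src, dst, reach_pred, safety_preds = edge
    mids = [traj[len(traj) // 2] for traj in successful_trajs]
    cx = statistics.mean(p[0] for p in mids)
    cy = statistics.mean(p[1] for p in mids)
    radius = 1.0                                   # assumed waypoint tolerance
    waypoint = lambda s: radius - max(abs(s[0] - cx), abs(s[1] - cy))
    mid_node = next(_fresh)
    # the original safety constraints carry over to both new edges,
    # so any trajectory through the waypoint still satisfies the original edge
    return [(src, mid_node, waypoint, safety_preds),
            (mid_node, dst, reach_pred, safety_preds)]

# Two successful demonstrations for a hard edge from region A to region B
trajs = [[(0, 0), (2, 2), (5, 5), (8, 8), (10, 10)],
         [(0, 0), (3, 3), (5, 6), (7, 8), (10, 10)]]
refined = add_refine(('A', 'B', lambda s: 1.0, []), trajs)
```

The hard A-to-B transition becomes two easier subgoals chained through a waypoint near the demonstrations' midpoint, which is the intent of the waypoint-addition strategy.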
5. Theoretical Guarantees
SpectRL provides several formal properties:
- Monitor Correctness: For any trajectory $\zeta$ in $\mathcal{M}$, $\zeta \models \varphi$ if and only if the corresponding augmented rollout in $\tilde{\mathcal{M}}$ reaches an accepting monitor state with positive terminal reward.
- Order-Preservation of Shaping: For any pair of rollouts, the shaped reward preserves the order of the unshaped objective and strictly orders progress among nonfinal states.
- Refinement Soundness (AutoSpec): All refinement operators are backward-sound; any trajectory satisfying the refined specification also satisfies the original (Ambadkar et al., 30 Nov 2025).
- In deterministic domains, optimality with respect to the shaped reward is preserved with respect to the original specification objective (Jothimurugan et al., 2020).
6. Practical Examples and Empirical Impact
Example Specifications
| Task Description | SPECTRL Specification |
|---|---|
| Reach $S_1$, then $S_2$, avoiding obstacle $O$, maintaining $b$ | $(\text{achieve}\ S_1;\ \text{achieve}\ S_2)\ \text{ensuring}\ (\neg O \wedge b)$ |
| Bring cart-pole to $x_1$ then to $x_2$ while balancing pole | $(\text{achieve}\ (x = x_1);\ \text{achieve}\ (x = x_2))\ \text{ensuring balanced}$ |
In navigation experiments, policies trained on SpectRL-shaped rewards exceeded 98% success, while prior TLTL-based or unshaped baselines required an order of magnitude more rollouts and failed on non-Markovian or sequential tasks (Jothimurugan et al., 2020). In a 9-rooms maze, applying AutoSpec’s SeqRefine substantially raised end-to-end success, with further refinements yielding additional gains. On a 100-room branching grid, AutoSpec enabled tasks previously intractable under unrefined specifications (Ambadkar et al., 30 Nov 2025).
7. Significance and Extensions
SpectRL formalizes a compact, temporal-logic-inspired language for RL task description, provides a compositional compilation framework for robust reward shaping, and admits algorithmic refinement for complex or underperforming tasks. By bridging specification logics, monitor compilation, and automatic refinement, SpectRL and its extensions offer a rigorous and modular foundation for scalable, robust RL under multi-objective, safety-critical, and non-Markovian settings (Jothimurugan et al., 2020, Ambadkar et al., 30 Nov 2025). The empirical and theoretical results suggest broad applicability to hierarchical RL, safe RL, and automated specification repair.