
SpectRL: RL Task Specification Logic

Updated 7 December 2025
  • SpectRL is a compositional task specification logic that precisely defines temporally extended RL tasks with complex objectives and safety constraints.
  • It compiles specifications into finite-state monitors and augmented MDPs to enable robust reward shaping and improved policy training.
  • Extensions like AutoSpec automatically refine under-specified tasks via graph-based transformations to significantly boost success rates.

SpectRL is a compositional task specification logic and reward compilation framework for reinforcement learning (RL) that enables precise, modular description of temporally extended tasks with complex objectives and safety constraints. Designed to bridge the expressive gap between temporal logic and RL reward engineering, SpectRL provides a small yet powerful grammar for task specification, quantitative and Boolean semantics over trajectories, and a reward-shaping compilation to finite-state monitors and augmented MDPs. Extensions such as AutoSpec further exploit SpectRL’s compositional abstract-graph representation to refine under-specified tasks via data-driven transformations, supporting robust and scalable automated specification repair.

1. Formal Syntax and Semantics

SpectRL task specifications are inductively generated over a set of atomic state predicates, yielding a compositional temporal logic with four principal operators. Let $S$ denote the set of environment states and $\mathcal{B}(S)$ the set of Boolean predicates $b : S \to \{\text{true},\text{false}\}$ over $S$.

The core grammar is

$$\phi ::= \mathsf{achieve}\;b \;\mid\; \phi_1\;\mathsf{ensuring}\;b \;\mid\; \phi_1;\phi_2 \;\mid\; \phi_1\;\mathsf{or}\;\phi_2$$

where $b \in \mathcal{B}(S)$.

  • $\mathsf{achieve}\;b$: Requires the agent to eventually reach a state where $b$ holds.
  • $\phi_1\;\mathsf{ensuring}\;b$: Requires satisfaction of $\phi_1$ while continually maintaining $b$.
  • $\phi_1;\phi_2$: Sequential composition; first satisfy $\phi_1$, then $\phi_2$.
  • $\phi_1\;\mathsf{or}\;\phi_2$: Disjunction; satisfy at least one of the sub-specifications.
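The grammar above can be transcribed into a small AST. The sketch below is a hypothetical encoding (the class names, the string-state convention, and predicates such as `at_q` are illustrative assumptions, not the paper's API):

```python
from dataclasses import dataclass
from typing import Callable, Union

# b : S -> {true, false}; states are opaque objects here.
Predicate = Callable[[object], bool]

@dataclass
class Achieve:          # achieve b
    b: Predicate

@dataclass
class Ensuring:         # phi ensuring b
    phi: "Spec"
    b: Predicate

@dataclass
class Seq:              # phi1 ; phi2
    phi1: "Spec"
    phi2: "Spec"

@dataclass
class Or:               # phi1 or phi2
    phi1: "Spec"
    phi2: "Spec"

Spec = Union[Achieve, Ensuring, Seq, Or]

# "Reach q, then p, while avoiding x" as a nested AST.
at_q = lambda s: s == "q"
at_p = lambda s: s == "p"
safe = lambda s: s != "x"
spec = Ensuring(Seq(Achieve(at_q), Achieve(at_p)), safe)
```

Because every operator is a node wrapping sub-specifications, larger tasks compose by plain value construction, mirroring the graph compositionality discussed below.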

Boolean (crisp) satisfaction semantics over a finite trajectory $\zeta = s_0, a_0, \ldots, s_T$:

  • $\zeta \models \mathsf{achieve}\;b$ iff $\exists\,i:\,b(s_i)=\text{true}$
  • $\zeta \models \phi_1\;\mathsf{ensuring}\;b$ iff $\zeta\models\phi_1$ and $b(s_j)=\text{true}$ for all $0\leq j\leq k$, where $k$ is the smallest index such that $\zeta_{0:k}\models\phi_1$
  • $\zeta \models \phi_1;\phi_2$ iff $\exists\,k:\,\zeta_{0:k}\models\phi_1$ and $\zeta_{k:}\models\phi_2$
  • $\zeta \models \phi_1\;\mathsf{or}\;\phi_2$ iff $\zeta\models\phi_1$ or $\zeta\models\phi_2$
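A minimal recursive checker for these Boolean semantics, using a tagged-tuple spec encoding; this is an illustrative sketch, not the authors' implementation:

```python
# Specs are tagged tuples: ("achieve", b), ("ensuring", phi, b),
# ("seq", phi1, phi2), ("or", phi1, phi2); b maps a state to bool.

def sat(spec, traj):
    """Does the state trajectory traj = [s_0, ..., s_T] satisfy spec?"""
    tag = spec[0]
    if tag == "achieve":                  # exists i: b(s_i)
        return any(spec[1](s) for s in traj)
    if tag == "ensuring":                 # b holds up to the first
        _, phi, b = spec                  # prefix satisfying phi
        ks = [k for k in range(1, len(traj) + 1) if sat(phi, traj[:k])]
        return bool(ks) and all(b(s) for s in traj[:ks[0]])
    if tag == "seq":                      # exists a split point k
        _, p1, p2 = spec
        return any(sat(p1, traj[:k + 1]) and sat(p2, traj[k:])
                   for k in range(len(traj)))
    if tag == "or":
        _, p1, p2 = spec
        return sat(p1, traj) or sat(p2, traj)
    raise ValueError(tag)

# Reach "q" then "p", while never visiting "x".
spec = ("ensuring",
        ("seq", ("achieve", lambda s: s == "q"),
                ("achieve", lambda s: s == "p")),
        lambda s: s != "x")
```

Note the naive prefix enumeration is exponential in nesting depth; it is meant only to make the semantics above concrete.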

Quantitative (robustness) semantics extend each predicate $p$ to a real-valued function $p_q:S\rightarrow\mathbb{R}$ such that $p_q(s)>0 \Leftrightarrow p(s)=\text{true}$, and propagate satisfaction scores via $\max$ and $\min$ operators over rollouts (Jothimurugan et al., 2020, Ambadkar et al., 30 Nov 2025).
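The robustness propagation can be sketched under the same tagged-tuple encoding; for brevity this illustrative version conservatively requires the `ensuring` margin over the entire trajectory rather than only up to the satisfaction index:

```python
# Quantitative predicates map a state to a real margin; > 0 means true.

def score(spec, traj):
    """Robustness of a tagged-tuple spec over traj; positive iff satisfied."""
    tag = spec[0]
    if tag == "achieve":                  # best margin at any step
        return max(spec[1](s) for s in traj)
    if tag == "or":                       # best of the alternatives
        return max(score(spec[1], traj), score(spec[2], traj))
    if tag == "seq":                      # best split, weakest half
        return max(min(score(spec[1], traj[:k + 1]),
                       score(spec[2], traj[k:]))
                   for k in range(len(traj)))
    if tag == "ensuring":                 # weakest safety margin
        return min(score(spec[1], traj),
                   min(spec[2](s) for s in traj))
    raise ValueError(tag)
```

For example, `score(("achieve", lambda s: 1.0 - abs(s - 5)), [0, 3, 5, 7])` is the best margin `1.0`, attained at the step where the state hits 5.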

2. Compositional Graph Structure

Every SpectRL formula is translated into an abstract directed acyclic graph (DAG), formalized as

$$G = (V, E, \beta, s, t)$$

where

  • $V$: nodes representing subtasks,
  • $E\subseteq V\times V$: edges representing transitions or requirements,
  • $\beta:V\cup E\rightarrow\mathcal{B}(S)$: predicate labeling for state/transition constraints,
  • $s,t\in V$: initial and terminal nodes.

Each operator corresponds to a graph transformation:

  • Sequential composition ($;$): Connects $G_{\phi_1}$'s terminal node to $G_{\phi_2}$'s initial node.
  • Disjunction ($\mathsf{or}$): Unions the start/end nodes and forms parallel subgraphs.
  • Safety ($\mathsf{ensuring}$): Annotates all edges in the subgraph with the safety predicate.
  • Atomic reach ($\mathsf{achieve}\;b$): A single edge from $s$ to $t$ labeled with $b$.

This compositionality supports modular specification construction and enables algorithms to target subtasks and transitions separately (Ambadkar et al., 30 Nov 2025).
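These transformations can be sketched compositionally. In the illustrative version below, sequential composition links the two subgraphs with an always-true edge rather than merging nodes, and edge labels are stored as a dict from edges to predicates; none of this is the paper's exact construction:

```python
import itertools

_fresh = itertools.count()   # fresh node ids

def true_pred(state):
    return True

def graph(spec):
    """Compile a tagged-tuple spec into (nodes, {edge: predicate}, src, dst)."""
    tag = spec[0]
    if tag == "achieve":                      # single edge labeled b
        s, t = next(_fresh), next(_fresh)
        return {s, t}, {(s, t): spec[1]}, s, t
    if tag == "seq":                          # link t1 -> s2
        n1, e1, s1, t1 = graph(spec[1])
        n2, e2, s2, t2 = graph(spec[2])
        return n1 | n2, {**e1, **e2, (t1, s2): true_pred}, s1, t2
    if tag == "or":                           # parallel branches
        n1, e1, s1, t1 = graph(spec[1])
        n2, e2, s2, t2 = graph(spec[2])
        s, t = next(_fresh), next(_fresh)
        fan = {(s, s1): true_pred, (s, s2): true_pred,
               (t1, t): true_pred, (t2, t): true_pred}
        return n1 | n2 | {s, t}, {**e1, **e2, **fan}, s, t
    if tag == "ensuring":                     # conjoin b onto every edge
        n, e, s, t = graph(spec[1])
        b = spec[2]
        e = {uv: (lambda st, p=p, b=b: p(st) and b(st))
             for uv, p in e.items()}
        return n, e, s, t
    raise ValueError(tag)
```

Because each operator only touches its own subgraph, refinement algorithms (Section 4) can rewrite one edge or node without disturbing the rest of the graph.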

3. Reward Compilation and Shaping

Given an MDP $M = (S,D,A,P,T)$ and a specification $\phi$, SpectRL compiles $\phi$ into a task-monitor automaton $M_{\phi} = (Q,X,\Sigma,U,\Delta,q_0,v_0,F,r)$, analogous to LTL-to-automaton translation:

  • $Q$: finite set of monitor states,
  • $X$: finite set of registers,
  • $\Delta\subseteq Q\times\Sigma\times U\times Q$: transition relation,
  • $r:S\times Q\times \mathbb{R}^X\to\mathbb{R}$: terminal reward as a function of state and register values.

The original MDP is then augmented into $\widetilde{M} = (\widetilde{S}, \widetilde{A}, \widetilde{P}, \widetilde{R}_s, T)$ with:

  • $\widetilde{S} = S\times Q\times \mathbb{R}^X$
  • $\widetilde{A} = A\times\Delta$
  • $\widetilde{P}$ and $\widetilde{R}_s$ as defined by the automaton logic

The shaped reward $\widetilde{R}_s$ augments intermediate states with a potential based on distance to, and progress through, the monitor, and strictly preserves the optimality ordering over trajectories. Standard RL solvers on $\widetilde{M}$ produce policies over $S$ that maximize the probability of satisfying $\phi$ (Jothimurugan et al., 2020).
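The monitor-plus-shaping idea can be illustrated with a toy two-step monitor for "achieve q; achieve p", where intermediate reward is the potential difference as the monitor advances; this is a drastically simplified stand-in for the register-based automaton $M_\phi$, with the transition table and potentials chosen purely for illustration:

```python
# (monitor_state, observed_symbol) -> next monitor state;
# missing entries mean the monitor stays put. State 2 is accepting.
MONITOR = {
    (0, "q"): 1,   # waiting for q
    (1, "p"): 2,   # waiting for p
}

def step_monitor(q_mon, env_state):
    return MONITOR.get((q_mon, env_state), q_mon)

def shaped_return(traj, potential=(0.0, 0.5, 1.0)):
    """Sum of potential differences as the monitor advances along traj.

    Returns (total shaped reward, whether the run is accepting). The
    total telescopes to potential[final] - potential[0], so ordering of
    trajectories by progress is preserved.
    """
    q_mon, total = 0, 0.0
    for s in traj:
        nxt = step_monitor(q_mon, s)
        total += potential[nxt] - potential[q_mon]
        q_mon = nxt
    return total, q_mon == 2
```

A trajectory that visits "q" and then "p" collects the full potential 1.0; one that sees "p" before "q" never advances the monitor and collects nothing, matching the sequential semantics.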

4. Specification Refinement: The AutoSpec Framework

AutoSpec extends SpectRL’s compositional structure to automate the refinement of coarse or underperforming specifications. It operates on the specification’s abstract DAG, identifying bottlenecks (e.g., low-success edges) and refining them via four strategies:

  1. SeqRefine (Predicate Refinement): Tightens the reach or avoid regions on nodes/edges by restricting to successful trajectory subsets or excluding failure regions.
  2. AddRefine (Waypoint Addition): Introduces new intermediate nodes at midpoints of successful trajectories, dividing hard transitions into easier subgoals.
  3. PastRefine (Source Partition): Partitions the source region, often via separating hyperplanes, to restrict starting configurations based on success/failure outcomes.
  4. OrRefine (Alternative-Path Addition): Adds new parallel paths in the graph, broadening options for reaching subgoals.

Every refinement produces a new specification $\phi_r$ satisfying $\forall\,\zeta:\,\zeta\models\phi_r\implies\zeta\models\phi$ (soundness), and tightens or augments the local RL reward structures accordingly (Ambadkar et al., 30 Nov 2025).
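A toy sketch in the spirit of AddRefine: choose an intermediate region around where successful rollouts sit at their halfway point, then split the hard transition into two subgoals through it. The ball-shaped region, the `radius` parameter, and the 2-D tuple states are all illustrative assumptions, not the paper's algorithm:

```python
import statistics

def add_waypoint(successful_trajs, radius=0.5):
    """Derive a waypoint predicate from successful rollouts.

    Takes the mean midpoint of the successful 2-D trajectories and
    returns (center, predicate) where the predicate marks a ball of
    the given radius around that center.
    """
    mids = [traj[len(traj) // 2] for traj in successful_trajs]
    cx = statistics.fmean(p[0] for p in mids)
    cy = statistics.fmean(p[1] for p in mids)

    def near_waypoint(s):
        return (s[0] - cx) ** 2 + (s[1] - cy) ** 2 <= radius ** 2

    return (cx, cy), near_waypoint

# The refined spec would replace "achieve goal" by
# "achieve near_waypoint ; achieve goal" on the bottleneck edge.
```

Soundness in the sense above does not hold for this naive split (a trajectory could reach the waypoint without ever reaching the goal), which is why the actual operators constrain refinements so that satisfying the refined specification still implies the original.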

5. Theoretical Guarantees

SpectRL provides several formal properties:

  • Monitor Correctness: For any trajectory $\zeta$, $\zeta\models\phi$ if and only if some augmented rollout $\tilde{\zeta}$ with $\tilde{\zeta}|_S = \zeta$ satisfies $r(s_T,q_T,v_T) > 0$.
  • Order-Preservation of Shaping: For any pair of rollouts, the shaped reward preserves the order of the unshaped objective and strictly orders progress among nonfinal states.
  • Refinement Soundness (AutoSpec): All refinement operators are backward-sound; any trajectory satisfying the refined specification also satisfies the original (Ambadkar et al., 30 Nov 2025).
  • In deterministic domains, optimality with respect to the shaped reward is preserved with respect to the original specification objective (Jothimurugan et al., 2020).

6. Practical Examples and Empirical Impact

Example Specifications

  • Reach $q$, then $p$, avoiding obstacle $O$, maintaining $\mathit{fuel} > 0$:
    $\phi = (\mathsf{achieve}\;\mathsf{reach}_q;\;\mathsf{achieve}\;\mathsf{reach}_p)\;\mathsf{ensuring}\;(\mathsf{avoid}_O \wedge \mathit{fuel} > 0)$
  • Bring cart-pole to $x=0.5$ then to $x=0.0$ while balancing the pole:
    $\phi = (\mathsf{achieve}\;\mathsf{reach}_{0.5};\;\mathsf{achieve}\;\mathsf{reach}_{0.0})\;\mathsf{ensuring}\;\mathit{balance}$
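The cart-pole example can be written in a tagged-tuple encoding of the grammar; `eps`, `max_angle`, and the dict state representation are illustrative parameters and assumptions, not values from the paper:

```python
# Tolerances for "reached x" and "pole balanced" (illustrative values).
eps, max_angle = 0.05, 0.2

def near(target):
    """Predicate: cart position within eps of target."""
    return lambda s: abs(s["x"] - target) <= eps

def balanced(s):
    """Predicate: pole angle within the balance threshold."""
    return abs(s["theta"]) <= max_angle

# (achieve reach_0.5 ; achieve reach_0.0) ensuring balance
cartpole_spec = ("ensuring",
                 ("seq", ("achieve", near(0.5)),
                         ("achieve", near(0.0))),
                 balanced)
```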

In navigation experiments, policies trained on SpectRL-shaped rewards exceeded 98% success within $5\times10^3$ rollouts, with prior TLTL-based or unshaped baselines taking an order of magnitude longer and failing on non-Markovian or sequential tasks (Jothimurugan et al., 2020). In a 9-rooms maze, applying AutoSpec’s SeqRefine raised end-to-end success from $\approx 15\%$ to $\approx 85\%$, and further refinements produced over $90\%$ success. On a 100-room branching grid, AutoSpec enabled tasks previously intractable under unrefined specifications, raising terminal success from $\approx 20\%$ to $\approx 60\%$ (Ambadkar et al., 30 Nov 2025).

7. Significance and Extensions

SpectRL formalizes a compact, temporal-logic-inspired language for RL task description, provides a compositional compilation framework for robust reward shaping, and admits algorithmic refinement for complex or underperforming tasks. By bridging specification logics, monitor compilation, and automatic refinement, SpectRL and its extensions offer a rigorous and modular foundation for scalable, robust RL under multi-objective, safety-critical, and non-Markovian settings (Jothimurugan et al., 2020, Ambadkar et al., 30 Nov 2025). The empirical and theoretical results suggest broad applicability to hierarchical RL, safe RL, and automated specification repair.
