
SpectRL: RL Task Specification Logic

Updated 7 December 2025
  • SpectRL is a compositional task specification logic that precisely defines temporally extended RL tasks with complex objectives and safety constraints.
  • It compiles specifications into finite-state monitors and augmented MDPs to enable robust reward shaping and improved policy training.
  • Extensions like AutoSpec automatically refine under-specified tasks via graph-based transformations to significantly boost success rates.

SpectRL is a compositional task specification logic and reward compilation framework for reinforcement learning (RL) that enables precise, modular description of temporally extended tasks with complex objectives and safety constraints. Designed to bridge the expressive gap between temporal logic and RL reward engineering, SpectRL provides a small yet powerful grammar for task specification, quantitative and Boolean semantics over trajectories, and a reward-shaping compilation to finite-state monitors and augmented MDPs. Extensions such as AutoSpec further exploit SpectRL’s compositional abstract-graph representation to refine under-specified tasks via data-driven transformations, supporting robust and scalable automated specification repair.

1. Formal Syntax and Semantics

SpectRL task specifications are inductively generated over a set of atomic state predicates, yielding a compositional temporal logic with four principal operators. Let $S$ denote the set of environment states and $\mathcal{B}(S)$ the set of Boolean predicates $b : S \to \{\text{true},\text{false}\}$ over $S$.

The core grammar is

$$\phi ::= \mathsf{achieve}\;b \;\mid\; \phi_1\;\mathsf{ensuring}\;b \;\mid\; \phi_1;\phi_2 \;\mid\; \phi_1\;\mathsf{or}\;\phi_2$$

where $b \in \mathcal{B}(S)$.

  • $\mathsf{achieve}\;b$: Requires the agent to eventually reach a state where $b$ holds.
  • $\phi_1\;\mathsf{ensuring}\;b$: Requires satisfaction of $\phi_1$ while continually maintaining $b$.
  • $\phi_1;\phi_2$: Sequential composition; first satisfy $\phi_1$, then $\phi_2$.
  • $\phi_1\;\mathsf{or}\;\phi_2$: Disjunction; satisfy at least one of the sub-specifications.
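The grammar above can be transcribed into a small AST. The sketch below is a hypothetical encoding (the class names, the string-state convention, and predicates such as `at_q` are illustrative assumptions, not the paper's API):

```python
from dataclasses import dataclass
from typing import Callable, Union

# b : S -> {true, false}; states are opaque objects here.
Predicate = Callable[[object], bool]

@dataclass
class Achieve:          # achieve b
    b: Predicate

@dataclass
class Ensuring:         # phi ensuring b
    phi: "Spec"
    b: Predicate

@dataclass
class Seq:              # phi1 ; phi2
    phi1: "Spec"
    phi2: "Spec"

@dataclass
class Or:               # phi1 or phi2
    phi1: "Spec"
    phi2: "Spec"

Spec = Union[Achieve, Ensuring, Seq, Or]

# "Reach q, then p, while avoiding x" as a nested AST.
at_q = lambda s: s == "q"
at_p = lambda s: s == "p"
safe = lambda s: s != "x"
spec = Ensuring(Seq(Achieve(at_q), Achieve(at_p)), safe)
```

Because every operator is a node wrapping sub-specifications, larger tasks compose by plain value construction, mirroring the graph compositionality discussed below.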

Boolean (crisp) satisfaction semantics over a finite trajectory $\zeta = s_0, a_0, \ldots, s_T$:

  • $\zeta \models \mathsf{achieve}\;b$ iff $\exists\,i:\,b(s_i)=\text{true}$
  • $\zeta \models \phi_1\;\mathsf{ensuring}\;b$ iff $\zeta\models\phi_1$ and $b(s_j)=\text{true}$ for all $0\leq j\leq k$, where $k$ is the smallest index such that $\zeta_{0:k}\models\phi_1$
  • $\zeta \models \phi_1;\phi_2$ iff $\exists\,k:\,\zeta_{0:k}\models\phi_1$ and $\zeta_{k:}\models\phi_2$
  • $\zeta \models \phi_1\;\mathsf{or}\;\phi_2$ iff $\zeta\models\phi_1$ or $\zeta\models\phi_2$
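A minimal recursive checker for these Boolean semantics, using a tagged-tuple spec encoding; this is an illustrative sketch, not the authors' implementation:

```python
# Specs are tagged tuples: ("achieve", b), ("ensuring", phi, b),
# ("seq", phi1, phi2), ("or", phi1, phi2); b maps a state to bool.

def sat(spec, traj):
    """Does the state trajectory traj = [s_0, ..., s_T] satisfy spec?"""
    tag = spec[0]
    if tag == "achieve":                  # exists i: b(s_i)
        return any(spec[1](s) for s in traj)
    if tag == "ensuring":                 # b holds up to the first
        _, phi, b = spec                  # prefix satisfying phi
        ks = [k for k in range(1, len(traj) + 1) if sat(phi, traj[:k])]
        return bool(ks) and all(b(s) for s in traj[:ks[0]])
    if tag == "seq":                      # exists a split point k
        _, p1, p2 = spec
        return any(sat(p1, traj[:k + 1]) and sat(p2, traj[k:])
                   for k in range(len(traj)))
    if tag == "or":
        _, p1, p2 = spec
        return sat(p1, traj) or sat(p2, traj)
    raise ValueError(tag)

# Reach "q" then "p", while never visiting "x".
spec = ("ensuring",
        ("seq", ("achieve", lambda s: s == "q"),
                ("achieve", lambda s: s == "p")),
        lambda s: s != "x")
```

Note the naive prefix enumeration is exponential in nesting depth; it is meant only to make the semantics above concrete.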

Quantitative (robustness) semantics extend each predicate $p$ to a real-valued function $p_q:S\rightarrow\mathbb{R}$ such that $p_q(s)>0 \Leftrightarrow p(s)=\text{true}$, and propagate satisfaction scores via $\max$ and $\min$ operators over rollouts (Jothimurugan et al., 2020, Ambadkar et al., 30 Nov 2025).
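The robustness propagation can be sketched under the same tagged-tuple encoding; for brevity this illustrative version conservatively requires the `ensuring` margin over the entire trajectory rather than only up to the satisfaction index:

```python
# Quantitative predicates map a state to a real margin; > 0 means true.

def score(spec, traj):
    """Robustness of a tagged-tuple spec over traj; positive iff satisfied."""
    tag = spec[0]
    if tag == "achieve":                  # best margin at any step
        return max(spec[1](s) for s in traj)
    if tag == "or":                       # best of the alternatives
        return max(score(spec[1], traj), score(spec[2], traj))
    if tag == "seq":                      # best split, weakest half
        return max(min(score(spec[1], traj[:k + 1]),
                       score(spec[2], traj[k:]))
                   for k in range(len(traj)))
    if tag == "ensuring":                 # weakest safety margin
        return min(score(spec[1], traj),
                   min(spec[2](s) for s in traj))
    raise ValueError(tag)
```

For example, `score(("achieve", lambda s: 1.0 - abs(s - 5)), [0, 3, 5, 7])` is the best margin `1.0`, attained at the step where the state hits 5.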

2. Compositional Graph Structure

Every SpectRL formula is translated into an abstract directed acyclic graph (DAG), formalized as

$$G = (V, E, \beta, s, t)$$

where

  • $V$: nodes representing subtasks,
  • $E\subseteq V\times V$: edges representing transitions or requirements,
  • $\beta:V\cup E\rightarrow\mathcal{B}(S)$: predicate labeling for state/transition constraints,
  • $s,t\in V$: initial and terminal nodes.

Each operator corresponds to a graph transformation:

  • Sequential composition ($;$): Connects $G_{\phi_1}$'s terminal node to $G_{\phi_2}$'s initial node.
  • Disjunction ($\mathsf{or}$): Unions the start/end nodes and forms parallel subgraphs.
  • Safety ($\mathsf{ensuring}$): Annotates all edges in the subgraph with the safety predicate.
  • Atomic reach ($\mathsf{achieve}\;b$): A single edge from $s$ to $t$ labeled with $b$.

This compositionality supports modular specification construction and enables algorithms to target subtasks and transitions separately (Ambadkar et al., 30 Nov 2025).
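These transformations can be sketched compositionally. In the illustrative version below, sequential composition links the two subgraphs with an always-true edge rather than merging nodes, and edge labels are stored as a dict from edges to predicates; none of this is the paper's exact construction:

```python
import itertools

_fresh = itertools.count()   # fresh node ids

def true_pred(state):
    return True

def graph(spec):
    """Compile a tagged-tuple spec into (nodes, {edge: predicate}, src, dst)."""
    tag = spec[0]
    if tag == "achieve":                      # single edge labeled b
        s, t = next(_fresh), next(_fresh)
        return {s, t}, {(s, t): spec[1]}, s, t
    if tag == "seq":                          # link t1 -> s2
        n1, e1, s1, t1 = graph(spec[1])
        n2, e2, s2, t2 = graph(spec[2])
        return n1 | n2, {**e1, **e2, (t1, s2): true_pred}, s1, t2
    if tag == "or":                           # parallel branches
        n1, e1, s1, t1 = graph(spec[1])
        n2, e2, s2, t2 = graph(spec[2])
        s, t = next(_fresh), next(_fresh)
        fan = {(s, s1): true_pred, (s, s2): true_pred,
               (t1, t): true_pred, (t2, t): true_pred}
        return n1 | n2 | {s, t}, {**e1, **e2, **fan}, s, t
    if tag == "ensuring":                     # conjoin b onto every edge
        n, e, s, t = graph(spec[1])
        b = spec[2]
        e = {uv: (lambda st, p=p, b=b: p(st) and b(st))
             for uv, p in e.items()}
        return n, e, s, t
    raise ValueError(tag)
```

Because each operator only touches its own subgraph, refinement algorithms (Section 4) can rewrite one edge or node without disturbing the rest of the graph.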

3. Reward Compilation and Shaping

Given an MDP $M = (S,D,A,P,T)$ and a specification $\phi$, SpectRL compiles $\phi$ into a task-monitor automaton $M_{\phi} = (Q,X,\Sigma,U,\Delta,q_0,v_0,F,r)$, analogous to LTL-to-automaton translation:

  • $Q$: finite set of monitor states,
  • $X$: finite set of registers,
  • $\Delta\subseteq Q\times\Sigma\times U\times Q$: transition relation,
  • $r:S\times Q\times \mathbb{R}^X\to\mathbb{R}$: terminal reward as a function of state and register values.

The original MDP is then augmented into $\widetilde{M} = (\widetilde{S}, \widetilde{A}, \widetilde{P}, \widetilde{R}_s, T)$ with:

  • $\widetilde{S} = S\times Q\times \mathbb{R}^X$
  • $\widetilde{A} = A\times\Delta$
  • $\widetilde{P}$ and $\widetilde{R}_s$ as defined by the automaton logic

The shaped reward $\widetilde{R}_s$ augments intermediate states with a potential based on distance to, and progress through, the monitor, and strictly preserves the optimality ordering over trajectories. Standard RL solvers on $\widetilde{M}$ produce policies over $S$ that maximize the probability of satisfying $\phi$ (Jothimurugan et al., 2020).
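The monitor-plus-shaping idea can be illustrated with a toy two-step monitor for "achieve q; achieve p", where intermediate reward is the potential difference as the monitor advances; this is a drastically simplified stand-in for the register-based automaton $M_\phi$, with the transition table and potentials chosen purely for illustration:

```python
# (monitor_state, observed_symbol) -> next monitor state;
# missing entries mean the monitor stays put. State 2 is accepting.
MONITOR = {
    (0, "q"): 1,   # waiting for q
    (1, "p"): 2,   # waiting for p
}

def step_monitor(q_mon, env_state):
    return MONITOR.get((q_mon, env_state), q_mon)

def shaped_return(traj, potential=(0.0, 0.5, 1.0)):
    """Sum of potential differences as the monitor advances along traj.

    Returns (total shaped reward, whether the run is accepting). The
    total telescopes to potential[final] - potential[0], so ordering of
    trajectories by progress is preserved.
    """
    q_mon, total = 0, 0.0
    for s in traj:
        nxt = step_monitor(q_mon, s)
        total += potential[nxt] - potential[q_mon]
        q_mon = nxt
    return total, q_mon == 2
```

A trajectory that visits "q" and then "p" collects the full potential 1.0; one that sees "p" before "q" never advances the monitor and collects nothing, matching the sequential semantics.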

4. Specification Refinement: The AutoSpec Framework

AutoSpec extends SpectRL’s compositional structure to automate the refinement of coarse or underperforming specifications. It operates on the specification’s abstract DAG, identifying bottlenecks (e.g., low-success edges) and refining them via four strategies:

  1. SeqRefine (Predicate Refinement): Tightens the reach or avoid regions on nodes/edges by restricting to successful trajectory subsets or excluding failure regions.
  2. AddRefine (Waypoint Addition): Introduces new intermediate nodes at midpoints of successful trajectories, dividing hard transitions into easier subgoals.
  3. PastRefine (Source Partition): Partitions the source region, often via separating hyperplanes, to restrict starting configurations based on success/failure outcomes.
  4. OrRefine (Alternative-Path Addition): Adds new parallel paths in the graph, broadening options for reaching subgoals.

Every refinement produces a new specification $\phi_r$ satisfying $\forall\,\zeta:\,\zeta\models\phi_r\implies\zeta\models\phi$ (soundness), and tightens or augments the local RL reward structures accordingly (Ambadkar et al., 30 Nov 2025).
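A toy sketch in the spirit of AddRefine: choose an intermediate region around where successful rollouts sit at their halfway point, then split the hard transition into two subgoals through it. The ball-shaped region, the `radius` parameter, and the 2-D tuple states are all illustrative assumptions, not the paper's algorithm:

```python
import statistics

def add_waypoint(successful_trajs, radius=0.5):
    """Derive a waypoint predicate from successful rollouts.

    Takes the mean midpoint of the successful 2-D trajectories and
    returns (center, predicate) where the predicate marks a ball of
    the given radius around that center.
    """
    mids = [traj[len(traj) // 2] for traj in successful_trajs]
    cx = statistics.fmean(p[0] for p in mids)
    cy = statistics.fmean(p[1] for p in mids)

    def near_waypoint(s):
        return (s[0] - cx) ** 2 + (s[1] - cy) ** 2 <= radius ** 2

    return (cx, cy), near_waypoint

# The refined spec would replace "achieve goal" by
# "achieve near_waypoint ; achieve goal" on the bottleneck edge.
```

Soundness in the sense above does not hold for this naive split (a trajectory could reach the waypoint without ever reaching the goal), which is why the actual operators constrain refinements so that satisfying the refined specification still implies the original.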

5. Theoretical Guarantees

SpectRL provides several formal properties:

  • Monitor Correctness: For any trajectory $\zeta$, $\zeta\models\phi$ if and only if some augmented rollout $\tilde{\zeta}$ with $\tilde{\zeta}|_S = \zeta$ satisfies $r(s_T,q_T,v_T) > 0$.
  • Order-Preservation of Shaping: For any pair of rollouts, the shaped reward preserves the order of the unshaped objective and strictly orders progress among nonfinal states.
  • Refinement Soundness (AutoSpec): All refinement operators are backward-sound; any trajectory satisfying the refined specification also satisfies the original (Ambadkar et al., 30 Nov 2025).
  • In deterministic domains, optimality with respect to the shaped reward is preserved with respect to the original specification objective (Jothimurugan et al., 2020).

6. Practical Examples and Empirical Impact

Example Specifications

  • Reach $q$, then $p$, avoiding obstacle $O$, maintaining $\mathit{fuel} > 0$:
    $\phi = (\mathsf{achieve}\;\mathsf{reach}_q;\;\mathsf{achieve}\;\mathsf{reach}_p)\;\mathsf{ensuring}\;(\mathsf{avoid}_O \wedge \mathit{fuel} > 0)$
  • Bring cart-pole to $x=0.5$ then to $x=0.0$ while balancing the pole:
    $\phi = (\mathsf{achieve}\;\mathsf{reach}_{0.5};\;\mathsf{achieve}\;\mathsf{reach}_{0.0})\;\mathsf{ensuring}\;\mathit{balance}$
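The cart-pole example can be written in a tagged-tuple encoding of the grammar; `eps`, `max_angle`, and the dict state representation are illustrative parameters and assumptions, not values from the paper:

```python
# Tolerances for "reached x" and "pole balanced" (illustrative values).
eps, max_angle = 0.05, 0.2

def near(target):
    """Predicate: cart position within eps of target."""
    return lambda s: abs(s["x"] - target) <= eps

def balanced(s):
    """Predicate: pole angle within the balance threshold."""
    return abs(s["theta"]) <= max_angle

# (achieve reach_0.5 ; achieve reach_0.0) ensuring balance
cartpole_spec = ("ensuring",
                 ("seq", ("achieve", near(0.5)),
                         ("achieve", near(0.0))),
                 balanced)
```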

In navigation experiments, policies trained on SpectRL-shaped rewards exceeded 98% success within $5\times10^3$ rollouts, with prior TLTL-based or unshaped baselines taking an order of magnitude longer and failing on non-Markovian or sequential tasks (Jothimurugan et al., 2020). In a 9-rooms maze, applying AutoSpec’s SeqRefine raised end-to-end success from $\approx 15\%$ to $\approx 85\%$, and further refinements produced over $90\%$ success. On a 100-room branching grid, AutoSpec enabled tasks previously intractable under unrefined specifications, raising terminal success from $\approx 20\%$ to $\approx 60\%$ (Ambadkar et al., 30 Nov 2025).

7. Significance and Extensions

SpectRL formalizes a compact, temporal-logic-inspired language for RL task description, provides a compositional compilation framework for robust reward shaping, and admits algorithmic refinement for complex or underperforming tasks. By bridging specification logics, monitor compilation, and automatic refinement, SpectRL and its extensions offer a rigorous and modular foundation for scalable, robust RL under multi-objective, safety-critical, and non-Markovian settings (Jothimurugan et al., 2020, Ambadkar et al., 30 Nov 2025). The empirical and theoretical results suggest broad applicability to hierarchical RL, safe RL, and automated specification repair.
