Timed-LDGBA for Timed Control Synthesis

Updated 7 January 2026

Timed-LDGBA is a formal automata model that combines limit-deterministic structure with real-valued clocks to enforce explicit time bounds in temporal logic specifications.
These automata synchronize with MDP and POMDP frameworks, enabling reinforcement learning under strict time constraints and probabilistic environments.
MITL formulas are systematically translated into Timed-LDGBA, ensuring that all designated accepting sets are visited infinitely often to satisfy temporal obligations.

A Timed Limit-Deterministic Generalized Büchi Automaton (Timed-LDGBA) is an automaton model for representing time-bounded temporal logic specifications in control synthesis over stochastic environments. It combines the structural restrictions of limit-deterministic Büchi automata (LDBA) with real-valued clocks, so that sequences of events constrained by explicit time intervals can be specified and monitored. Timed-LDGBA are synchronized with Markov Decision Processes (MDPs) and Partially Observable Markov Decision Processes (POMDPs), which supports reinforcement learning in environments with strict time-bounded requirements (Wang et al., 31 Dec 2025).

1. Formal Definition and Structure

A Timed-LDGBA is a tuple $A = (Q, \Sigma, C, E, \text{Inv}, q_0, \mathcal{F})$ comprising:

$Q$ : finite set of locations partitioned as $Q = Q_N \cup Q_D$ , where
- $Q_N$ (nondeterministic part): admits only $\epsilon$ -transitions and contains no accepting locations.
- $Q_D$ (deterministic part): contains all accepting locations and deterministic transitions only.
$\Sigma$ : finite alphabet ( $2^\Pi$ for atomic propositions $\Pi$ ).
$C$ : finite set of real-valued clocks.
$E \subseteq Q \times (\Sigma \cup \{\epsilon\}) \times \mathbb{B}(C) \times 2^C \times Q$ : set of edges. Each edge is $(q, a, g, r, q')$ where:
- $a \in \Sigma \cup \{\epsilon\}$ : transition label.
- $g \in \mathbb{B}(C)$ : clock guard, a conjunction of atomic constraints of the form $x \preceq c$ or $x \succeq c$ .
- $r \subseteq C$ : clocks to reset upon transition.
$\text{Inv}: Q \to \mathbb{B}(C)$ : invariant conditions at each location (conjunctions of constraints).
$q_0 \in Q_N$ : initial location.
$\mathcal{F} = \{F_1, \ldots, F_k\}$ : accepting sets, each $F_i \subseteq Q_D$ .

Limit-determinism is enforced by confining all nondeterministic branching ( $\epsilon$ -moves) to $Q_N$ , which is acyclic and is never re-entered once a run has moved into $Q_D$ . All acceptance monitoring in $Q_D$ is strictly deterministic.
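The tuple above can be mirrored in a small data structure. The following is a minimal sketch, not the authors' implementation: the class names `Edge` and `TimedLDGBA`, the encoding of $\epsilon$ as `None`, and the method `structural_problems` are assumptions made for illustration. It checks only the partition-level conditions stated above (disjoint parts, $q_0 \in Q_N$ , $F_i \subseteq Q_D$ , no return from $Q_D$ to $Q_N$ ); determinism of $Q_D$ would additionally require checking that guards on equally labelled edges do not overlap, which is omitted here.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple

Constraint = Tuple[str, str, int]  # (clock, op, bound) with op in {"<=", ">="}


@dataclass(frozen=True)
class Edge:
    src: str
    label: Optional[FrozenSet[str]]  # a letter of 2^Pi, or None for an epsilon-move
    guard: Tuple[Constraint, ...]    # conjunction of atomic clock constraints
    resets: FrozenSet[str]           # clocks set to zero when the edge is taken
    dst: str


@dataclass
class TimedLDGBA:
    q_n: FrozenSet[str]                      # nondeterministic part Q_N
    q_d: FrozenSet[str]                      # deterministic part Q_D
    clocks: FrozenSet[str]                   # clock set C
    edges: List[Edge]                        # edge set E
    inv: Dict[str, Tuple[Constraint, ...]]   # invariant per location
    q0: str                                  # initial location
    accepting: List[FrozenSet[str]]          # accepting sets F_1..F_k

    def structural_problems(self) -> List[str]:
        """Return violated partition-level conditions (empty list if none)."""
        problems = []
        if self.q_n & self.q_d:
            problems.append("Q_N and Q_D overlap")
        if self.q0 not in self.q_n:
            problems.append("q0 is not in Q_N")
        for i, f in enumerate(self.accepting, start=1):
            if not f <= self.q_d:
                problems.append(f"F_{i} is not a subset of Q_D")
        for e in self.edges:
            if e.src in self.q_d and e.dst in self.q_n:
                problems.append(f"edge {e.src} -> {e.dst} re-enters Q_N")
            if e.src in self.q_d and e.label is None:
                problems.append(f"epsilon-move leaves {e.src} in Q_D")
        return problems


# Hypothetical two-location automaton used only to exercise the check.
a = frozenset({"a"})
automaton = TimedLDGBA(
    q_n=frozenset({"q0"}),
    q_d=frozenset({"q1"}),
    clocks=frozenset({"x"}),
    edges=[
        Edge("q0", None, (), frozenset(), "q1"),
        Edge("q1", a, (("x", "<=", 3),), frozenset({"x"}), "q1"),
    ],
    inv={"q0": (), "q1": (("x", "<=", 3),)},
    q0="q0",
    accepting=[frozenset({"q1"})],
)
print(automaton.structural_problems())  # prints []
```

Returning a list of problems rather than raising on the first one makes it easier to inspect an automaton produced by a translation pipeline.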

2. Generalized Büchi Acceptance Condition

Timed-LDGBA utilize a generalized Büchi acceptance mechanism to formalize satisfaction of temporal goals. For an infinite run $\sigma = (q_0, v_0) \xrightarrow{t_0, a_0} (q_1, v_1) \xrightarrow{t_1, a_1} (q_2, v_2) \ldots$ reading a timed word $(a_i, t_i)$ , acceptance requires:

$\text{Run is accepted} \iff \forall i \in \{1,\dots,k\},\ \{ n \mid q_n \in F_i \} \text{ is infinite}$

This ensures every generalized Büchi set $F_i$ in $\mathcal{F}$ is visited infinitely often by the corresponding sequence of automaton states, encoding persistent timed obligations tied to the original temporal logic specification.
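For an ultimately periodic ("lasso") run, whose location sequence is a finite prefix followed by a cycle repeated forever, the acceptance condition can be checked directly: the locations visited infinitely often are exactly those on the cycle. The sketch below assumes this lasso representation and ignores clock valuations and time divergence; the function name `accepts_lasso` is hypothetical.

```python
from typing import FrozenSet, Iterable, Sequence


def accepts_lasso(
    prefix: Sequence[str],
    cycle: Sequence[str],
    accepting_sets: Iterable[FrozenSet[str]],
) -> bool:
    """Generalized Buchi acceptance for the run prefix . cycle^omega.

    The prefix is visited only finitely often, so it cannot contribute
    to acceptance; every F_i must intersect the cycle.
    """
    if not cycle:
        raise ValueError("an infinite run needs a non-empty cycle")
    recurring = set(cycle)
    return all(recurring & f for f in accepting_sets)


# Hypothetical locations and accepting sets F_1 = {q1}, F_2 = {q2}.
F = [frozenset({"q1"}), frozenset({"q2"})]
print(accepts_lasso(["q0"], ["q1", "q2"], F))  # prints True
print(accepts_lasso(["q0", "q2"], ["q1"], F))  # prints False
```

The second call is rejected even though `q2` appears in the prefix, because a single visit does not satisfy the "infinitely often" requirement for $F_2$ .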

3. Clocks, Guards, Invariants, and Resets

Clocks provide the quantitative dimension necessary for time-bounded semantics:

Clock set $C$ : $C = \{x_1, x_2, \ldots, x_m\}$ , each $x_j$ real-valued.
Valuation $v: C \to \mathbb{R}_0^+$ : tracks elapsed time since last reset for each clock.
Guards $g$ : conjunctions of atomic constraints, $x \preceq c$ or $x \succeq c$ ( $c \in \mathbb{N}$ ).
Invariants $\text{Inv}(q)$ : conjunctions $x \preceq c$ constraining the allowable time in location $q$ as time elapses.
Resets $r \subseteq C$ : upon taking an edge, clocks in $r$ are set to zero; the valuation updates as $v' = v[r := 0]$ .
Time elapse: at any location $q$ , duration $d \geq 0$ is permissible as long as $\text{Inv}(q)$ is true at each intermediate valuation $v + \delta$ , $\delta \in [0, d]$ .
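The three clock operations listed above (guard evaluation, reset, and time elapse under an invariant) can be sketched as plain functions over a dictionary-based valuation. The function names and the tuple encoding of constraints are assumptions for illustration. Because guards and invariants are conjunctions of bounds on individual clocks, the set of delays that satisfy an invariant is an interval, so checking the two endpoints $v$ and $v + d$ is enough to cover every intermediate valuation.

```python
from typing import Dict, Iterable, Optional, Tuple

Valuation = Dict[str, float]           # clock name -> time since last reset
Constraint = Tuple[str, str, float]    # (clock, op, bound)

_OPS = {
    "<": lambda t, c: t < c,
    "<=": lambda t, c: t <= c,
    ">": lambda t, c: t > c,
    ">=": lambda t, c: t >= c,
}


def satisfies(v: Valuation, constraints: Iterable[Constraint]) -> bool:
    """True if the valuation satisfies every atomic constraint."""
    return all(_OPS[op](v[x], c) for x, op, c in constraints)


def reset(v: Valuation, r: Iterable[str]) -> Valuation:
    """v[r := 0]: clocks in r are set to zero, the others are unchanged."""
    r = set(r)
    return {x: 0.0 if x in r else t for x, t in v.items()}


def elapse(
    v: Valuation, d: float, invariant: Iterable[Constraint]
) -> Optional[Valuation]:
    """Let d time units pass; return v + d, or None if the invariant breaks.

    The invariant is a conjunction of per-clock bounds, hence convex in the
    delay, so holding at both endpoints implies holding in between.
    """
    if d < 0:
        raise ValueError("delay must be non-negative")
    invariant = list(invariant)
    later = {x: t + d for x, t in v.items()}
    if satisfies(v, invariant) and satisfies(later, invariant):
        return later
    return None


v = {"x": 0.5}
inv = [("x", "<=", 3)]
print(elapse(v, 2.0, inv))   # prints {'x': 2.5}
print(elapse(v, 3.0, inv))   # prints None
print(reset({"x": 2.5, "y": 1.0}, {"x"}))  # prints {'x': 0.0, 'y': 1.0}
```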

4. Translation from MITL to Timed-LDGBA

Metric Interval Temporal Logic (MITL) formulas $\varphi$ are systematically compiled into Timed-LDGBAs:

Negation Normal Form & Interval Normalization: MITL formulas are normalized for transition monitoring.
Monitor Construction: For each subformula $\psi$ , a "timed monitor" automaton $A_\psi$ is built, typically with a single clock:
- Example for $F_{[a,b]} \pi$ : The automaton contains:
  - Initial state $q_0$ with invariant $x \leq b$ .
  - On letter $\pi$ and $a \leq x \leq b$ , reset $x$ and transition to $q_{accept}$ .
  - Sink state $q_{sink}$ if $x > b$ before $\pi$ .
  - Accepting set $F = \{q_{accept}\}$ .
Synchronous Product: Monitors $A_\psi$ are composed in product, tracking all clocks simultaneously.
Limit-Determinization: All nondeterminism is grouped into initial states $Q_N$ , then collapsed into deterministic $Q_D$ with acceptance sets corresponding to fulfilled obligations.
Pruning Unreachable States: Ensures model compactness.

Construction Example:

For $\varphi = F_{[1,3]} a$ :

$C = \{x\}$ .
$Q = \{q_0, q_{accept}, q_{sink}\}$ .
$\text{Inv}(q_0) = x \leq 3$ ; $\text{Inv}(q_{accept}) = \text{true}$ ; $\text{Inv}(q_{sink}) = \text{true}$ .
Edges include:
- $q_0 \xrightarrow{a, 1 \leq x \leq 3, \{x\}} q_{accept}$
- $q_0 \xrightarrow{any, x > 3, \varnothing} q_{sink}$
$q_0 \in Q_N$ , $q_{accept}, q_{sink} \in Q_D$ .
$\mathcal{F} = \{\{q_{accept}\}\}$ .
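To make the example concrete, the monitor for $F_{[1,3]} a$ can be simulated on a finite timed word. This is a hedged sketch under several assumptions: the function name `run_monitor` is hypothetical, the timed word is given as pairs of a letter (a set of atomic propositions) and an absolute timestamp, letters that do not match any listed edge leave the monitor in $q_0$ , and expiry of the upper bound is treated as a move to $q_{sink}$ when the next letter is read.

```python
from typing import Iterable, Set, Tuple


def run_monitor(
    timed_word: Iterable[Tuple[Set[str], float]],
    lo: float = 1.0,
    hi: float = 3.0,
    target: str = "a",
) -> str:
    """Simulate the single-clock monitor for F_[lo,hi] target.

    Returns the location reached after reading the finite timed word:
    "q0" (still waiting), "q_accept", or "q_sink".
    """
    loc, x, now = "q0", 0.0, 0.0
    for letters, t in timed_word:
        x += t - now          # time elapses until the next letter
        now = t
        if loc != "q0":
            continue          # the example lists no edges out of q_accept or q_sink
        if x > hi:
            loc = "q_sink"    # deadline passed before the target was seen
        elif target in letters and lo <= x <= hi:
            loc, x = "q_accept", 0.0   # guard holds: reset x and accept
    return loc


print(run_monitor([({"b"}, 0.5), ({"a"}, 2.0)]))  # prints q_accept
print(run_monitor([({"a"}, 0.5), ({"b"}, 3.5)]))  # prints q_sink
print(run_monitor([({"a"}, 0.5)]))                # prints q0
```

In the second word the letter $a$ arrives at time 0.5, before the lower bound 1, so it does not discharge the obligation, and the deadline 3 then passes.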

5. Synchronization with MDPs and POMDPs

Timed-LDGBA are synchronized with stochastic environment models to facilitate policy synthesis:

MDP: $M = (S, A, T, s_0, R, \Pi, L)$ .
POMDP: $M = (S, A, T, s_0, R, O, \Omega, \Pi, L)$ .

A product timed model $M \times A_\varphi$ is constructed:

States: $S^\times = S \times Q \times V$ , where $V$ is the (discretized) space of clock valuations.
Actions: $A^\times = A \cup \{\epsilon\}$ .
Transitions:
- $(s, q, v) \xrightarrow{a} (s', q', v')$ where $s \xrightarrow{a} s'$ via $T$ , the edge $(q, L(s'), g, r, q')$ is enabled by $v+1$ (the valuation after one time unit has elapsed), and $v' = (v+1)[r := 0]$ .
- If no edge is enabled, transition to a global sink state.
- For $\epsilon$ in $Q_N$ : $s' = s$ , $a = \epsilon$ , and $T^\times = 1$ for the chosen $\epsilon$ -move.
Reward: $R^\times(s, q, v) = R(s)$ only upon entering an accepting $q' \in F_i$ ; otherwise zero.
Observations (POMDP): $\Omega^\times((s', q', v'), a, o) = \Omega(s', a, o)$ if the automaton edge is enabled; else $0$.
Acceptance: A path is accepting iff its $Q$ -component visits each $F_i$ infinitely often.

Crucially, the automaton state $q$ and clock valuation $v$ are perfectly tracked and can augment the input to Q-learning or belief trackers; in POMDPs, these quantities remain fully observable, while the base state $s$ is inferred by belief $b_t$ .
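The automaton half of a product transition can be sketched as follows, assuming integer-valued clocks that advance by one unit per environment step. The names `automaton_step` and `product_step`, the global sink label `"sink"`, and the saturation bound `cap` are assumptions for illustration; capping clocks just above the largest constant in any guard is a common way to keep the valuation space $V$ finite, and is not taken from the source. The environment successor $s'$ is passed in as an argument, so sampling from $T$ is left to the caller.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Constraint = Tuple[str, str, int]   # (clock, op, bound), op in {"<=", ">="}
Edge = Tuple[str, FrozenSet[str], Tuple[Constraint, ...], FrozenSet[str], str]
Valuation = Dict[str, int]
SINK = "sink"                       # hypothetical name for the global sink


def holds(v: Valuation, guard: Tuple[Constraint, ...]) -> bool:
    return all(v[x] <= c if op == "<=" else v[x] >= c for x, op, c in guard)


def automaton_step(
    q: str, v: Valuation, label: FrozenSet[str], edges: List[Edge], cap: int
) -> Tuple[str, Valuation]:
    """Advance clocks by one unit, then take the enabled edge labelled `label`."""
    ticked = {x: min(t + 1, cap) for x, t in v.items()}   # v + 1, saturated
    for src, a, guard, resets, dst in edges:
        if src == q and a == label and holds(ticked, guard):
            return dst, {x: 0 if x in resets else t for x, t in ticked.items()}
    return SINK, ticked             # no enabled edge: global sink


def product_step(
    state: Tuple[str, str, Valuation],
    s_next: str,
    labelling: Callable[[str], FrozenSet[str]],
    edges: List[Edge],
    cap: int,
) -> Tuple[str, str, Valuation]:
    """Combine an environment move s -> s_next with the automaton move on L(s_next)."""
    _, q, v = state
    q_next, v_next = automaton_step(q, v, labelling(s_next), edges, cap)
    return s_next, q_next, v_next


# Edges of the F_[1,3] a monitor, plus a hypothetical waiting self-loop.
A, EMPTY = frozenset({"a"}), frozenset()
edges = [
    ("q0", A, (("x", ">=", 1), ("x", "<=", 3)), frozenset({"x"}), "q_accept"),
    ("q0", EMPTY, (("x", "<=", 3),), frozenset(), "q0"),
]
labels = {"s_goal": A, "s_free": EMPTY}   # hypothetical labelling function L
print(product_step(("s_free", "q0", {"x": 0}), "s_goal", labels.get, edges, cap=4))
# prints ('s_goal', 'q_accept', {'x': 0})
```

Since $q$ and $v$ are updated deterministically from the observed label, they can be appended to the agent's observation or belief state without additional inference.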

6. Application in Reinforcement Learning under Timed Constraints

MITL specifications are offline-compiled into Timed-LDGBA and synchronized online with MDP/POMDP models:

The reward structure enforces temporal correctness via positive reward on accepting set entry, optionally combined with performance objectives.
Standard RL algorithms (Q-learning, DQN) operate on the product model, learning policies to satisfy all time-bounded constraints or maximizing acceptance probability under stochasticity.
Evaluations in grid-world and robotics scenarios demonstrate scalability, robustness to partial observability, and faithful satisfaction of MITL constraints in learned policies (Wang et al., 31 Dec 2025).

This framework enables reliable policy synthesis in dynamic, uncertain environments where temporal obligations are explicit and time-critical.
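As an illustration of learning on a product model, the sketch below runs tabular Q-learning on a toy deterministic corridor with the specification $F_{[1,3]} a$ . Everything about the environment is hypothetical (three cells, the goal cell labelled $a$ , actions `stay` and `right`, one time unit per step), as are the hyperparameters; it is not the experimental setup of the cited work. The reward follows the scheme described above: 1 on entering the accepting location, 0 otherwise.

```python
import random
from collections import defaultdict

ACTIONS = ("stay", "right")
GOAL, LO, HI = 2, 1, 3   # cell labelled "a" and the time window [1, 3]


def step(pos, clock, action):
    """One product transition: move, advance the clock, update the monitor."""
    pos = min(pos + 1, GOAL) if action == "right" else pos
    clock += 1
    if clock > HI:
        return pos, clock, "q_sink", 0.0      # deadline missed
    if pos == GOAL and LO <= clock <= HI:
        return pos, clock, "q_accept", 1.0    # reward on accepting entry
    return pos, clock, "q0", 0.0


def train(episodes=2000, alpha=0.5, gamma=0.9, eps=0.3, seed=0):
    """Epsilon-greedy tabular Q-learning over product states (pos, clock)."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        pos, clock, q = 0, 0, "q0"
        while q == "q0":                      # episode ends on accept or sink
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: Q[(pos, clock, b)])
            npos, nclock, q, r = step(pos, clock, a)
            best_next = 0.0
            if q == "q0":
                best_next = max(Q[(npos, nclock, b)] for b in ACTIONS)
            Q[(pos, clock, a)] += alpha * (r + gamma * best_next - Q[(pos, clock, a)])
            pos, clock = npos, nclock
    return Q


def greedy_rollout(Q):
    """Follow the greedy policy from the start; return (final location, time)."""
    pos, clock, q = 0, 0, "q0"
    while q == "q0":
        a = max(ACTIONS, key=lambda b: Q[(pos, clock, b)])
        pos, clock, q, _ = step(pos, clock, a)
    return q, clock


Q = train()
print(greedy_rollout(Q))
```

Because the clock value is part of the state key, the learned policy can distinguish "at cell 1 with time to spare" from "at cell 1 with the deadline imminent", which is the practical benefit of augmenting the agent's input with $q$ and $v$ .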

References

Wang et al. (2025). Reinforcement learning with timed constraints for robotics motion planning.
