Timed-LDGBA for Timed Control Synthesis
- Timed-LDGBA is a formal automata model that combines limit-deterministic structure with real-valued clocks to enforce explicit time bounds in temporal logic specifications.
- These automata are synchronized with MDP and POMDP models, enabling reinforcement learning under strict time constraints in probabilistic environments.
- MITL formulas are systematically translated into Timed-LDGBA whose accepting runs visit every designated accepting set infinitely often, which is how the temporal obligations are satisfied.
A Timed Limit-Deterministic Generalized Büchi Automaton (Timed-LDGBA) is an automaton model for representing time-bounded temporal logic specifications in control synthesis over stochastic environments. The construction combines the structural restrictions of limit-deterministic Büchi automata (LDBA) with real-valued clocks, so that sequences of events constrained by explicit time intervals can be both specified and monitored. Timed-LDGBA are used to synchronize temporal logic specifications with Markov Decision Processes (MDPs) and Partially Observable Markov Decision Processes (POMDPs), which supports reinforcement learning in environments with strict time-bounded requirements (Wang et al., 31 Dec 2025).
1. Formal Definition and Structure
A Timed-LDGBA is a tuple $\mathcal{A} = (Q, \Sigma, X, E, \mathrm{Inv}, q_0, \mathcal{F})$ comprising:
- $Q$: finite set of locations partitioned as $Q = Q_N \cup Q_D$, where
  - $Q_N$ (nondeterministic part): admits only $\epsilon$-transitions and contains no accepting locations.
  - $Q_D$ (deterministic part): contains all accepting locations and deterministic transitions only.
- $\Sigma$: finite alphabet ($\Sigma = 2^{AP}$ for atomic propositions $AP$).
- $X$: finite set of real-valued clocks.
- $E$: set of edges. Each edge is $(q, a, g, r, q')$ with source location $q$ and target location $q'$, where:
  - $a \in \Sigma \cup \{\epsilon\}$: transition label.
  - $g$: clock guard as conjunctions of $x \le c$ or $x \ge c$.
  - $r \subseteq X$: clocks to reset upon transition.
- $\mathrm{Inv}$: invariant conditions at each location (conjunctions of constraints).
- $q_0 \in Q$: initial location.
- $\mathcal{F} = \{F_1, \dots, F_k\}$: accepting sets, each $F_i \subseteq Q_D$.
Limit-determinism is enforced such that all nondeterministic branching ($\epsilon$-moves) occurs in $Q_N$, which is acyclic and never revisited once $Q_D$ is entered. All acceptance monitoring in $Q_D$ is strictly deterministic.
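To make the tuple concrete, the following is a minimal sketch of how such an automaton could be stored and sanity-checked in Python. The class names, field names, and the `"eps"` label are illustrative assumptions, not notation from the source. The check is purely syntactic: it rejects duplicate edges with identical label and guard, but it does not detect overlapping guards and does not verify that the nondeterministic part is acyclic.

```python
from dataclasses import dataclass

EPSILON = "eps"  # hypothetical label standing in for epsilon-moves


@dataclass(frozen=True)
class Edge:
    source: str
    label: str          # a letter of the alphabet, or EPSILON
    guard: tuple        # conjunction of (clock, op, bound), op in {"<=", ">="}
    resets: frozenset   # clocks set to zero when the edge fires
    target: str


@dataclass
class TimedLDGBA:
    nondet: set         # nondeterministic part
    det: set            # deterministic part
    clocks: set
    edges: list
    invariants: dict    # location -> conjunction of (clock, op, bound)
    initial: str
    accepting: list     # list of accepting sets

    def is_limit_deterministic(self) -> bool:
        """Syntactic check of the structural restrictions described above."""
        # Every accepting set must lie inside the deterministic part.
        if any(not f <= self.det for f in self.accepting):
            return False
        seen = set()
        for e in self.edges:
            if e.source in self.nondet and e.label != EPSILON:
                return False      # the nondeterministic part admits only epsilon-moves
            if e.source in self.det:
                if e.label == EPSILON or e.target not in self.det:
                    return False  # the deterministic part is never left
                key = (e.source, e.label, e.guard)
                if key in seen:
                    return False  # two edges share source, label and guard
                seen.add(key)
        return True


aut = TimedLDGBA(
    nondet={"n0"}, det={"d0", "d1"}, clocks={"x"},
    edges=[
        Edge("n0", EPSILON, (), frozenset(), "d0"),
        Edge("d0", "p", (("x", "<=", 5.0),), frozenset({"x"}), "d1"),
        Edge("d1", "p", (), frozenset(), "d1"),
    ],
    invariants={"d0": (("x", "<=", 5.0),)},
    initial="n0",
    accepting=[{"d1"}],
)
```

Calling `aut.is_limit_deterministic()` on this toy automaton passes the check; appending an edge that leads from `d1` back to `n0` makes it fail.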
2. Generalized Büchi Acceptance Condition
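Timed-LDGBA utilize a generalized Büchi acceptance mechanism to formalize satisfaction of temporal goals. For an infinite run $\rho$ reading a timed word $w = (\sigma_0, \tau_0)(\sigma_1, \tau_1)\dots$, acceptance requires:
$$\inf(\rho) \cap F_i \neq \emptyset \quad \text{for every accepting set } F_i \in \mathcal{F},$$
where $\inf(\rho)$ denotes the set of locations that occur infinitely often along $\rho$.
This ensures every generalized Büchi set in $\mathcal{F}$ is visited infinitely often by the corresponding sequence of automaton states, encoding persistent timed obligations tied to the original temporal logic specification.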
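For runs that are ultimately periodic (a finite prefix followed by a cycle repeated forever), the acceptance condition reduces to a simple set check, because the locations visited infinitely often are exactly those on the cycle. The sketch below assumes that lasso representation; the function name is hypothetical.

```python
def accepts_lasso(prefix, cycle, accepting_sets):
    """Generalized Büchi check for the run prefix followed by cycle repeated forever."""
    if not cycle:
        raise ValueError("the cycle must contain at least one location")
    recurring = set(cycle)  # exactly the locations visited infinitely often
    # Every accepting set must share at least one location with the cycle.
    return all(recurring & f for f in accepting_sets)
```

Locations that appear only in `prefix` are visited finitely often and therefore never contribute to acceptance.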
3. Clocks, Guards, Invariants, and Resets
Clocks provide the quantitative dimension necessary for time-bounded semantics:
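- Clock set $X$: $X = \{x_1, \dots, x_n\}$, each real-valued.
- Valuation $\nu: X \to \mathbb{R}_{\ge 0}$: tracks elapsed time since last reset for each clock.
- Guards $g$: conjunctions of atomic constraints, $x \le c$ or $x \ge c$ (with $x \in X$ and a constant bound $c \ge 0$).
- Invariants $\mathrm{Inv}(q)$: conjunctions constraining the allowable time in location $q$ as time elapses.
- Resets $r \subseteq X$: upon taking an edge, clocks in $r$ are set to zero; the valuation updates as $\nu' = \nu[r := 0]$.
- Time elapse: at any location $q$, a delay $d \ge 0$ is permissible as long as $\mathrm{Inv}(q)$ is true at each intermediate valuation $\nu + t$, $0 \le t \le d$.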
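The clock operations listed above can be sketched as a few small functions. This is an illustrative sketch, assuming valuations are stored as dictionaries and constraints as `(clock, op, bound)` triples; all names are hypothetical. Because guards and invariants are conjunctions of upper and lower bounds, the set of valuations they admit is convex, so checking the two endpoints of a delay is enough to cover every intermediate valuation.

```python
import operator

OPS = {"<=": operator.le, ">=": operator.ge}


def satisfies(valuation, constraints):
    """True if the valuation meets every (clock, op, bound) constraint."""
    return all(OPS[op](valuation[clock], bound) for clock, op, bound in constraints)


def elapse(valuation, delay):
    """Advance all clocks uniformly by the given delay."""
    return {clock: value + delay for clock, value in valuation.items()}


def reset(valuation, clocks):
    """Set the listed clocks to zero and leave the others unchanged."""
    return {clock: 0.0 if clock in clocks else value
            for clock, value in valuation.items()}


def can_delay(valuation, delay, invariant):
    """Check the invariant over the whole delay via its two endpoints (convexity)."""
    return satisfies(valuation, invariant) and satisfies(elapse(valuation, delay), invariant)
```

For example, with `v = {"x": 0.0, "y": 1.5}` and the invariant `[("x", "<=", 3.0)]`, a delay of 3.0 is allowed while a delay of 4.0 is not.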
4. Translation from MITL to Timed-LDGBA
Metric Interval Temporal Logic (MITL) formulas are systematically compiled into Timed-LDGBAs:
- Negation Normal Form & Interval Normalization: MITL formulas are normalized for transition monitoring.
- Monitor Construction: For each timed subformula $\varphi$, a "timed monitor" automaton is built, typically with a single clock $x$:
  - Example for $\Diamond_{[a,b]}\, p$: The automaton contains:
    - Initial state $q_0$ with invariant $x \le b$.
    - On letter $p$ and $x \ge a$, reset $x$ and transition to $q_1$.
    - Sink state if $x > b$ before $p$.
    - Accepting set $\{q_1\}$.
- Synchronous Product: Monitors are composed in product, tracking all clocks simultaneously.
- Limit-Determinization: All nondeterminism is grouped into the initial states $Q_N$, then collapsed into the deterministic part $Q_D$ with acceptance sets corresponding to fulfilled obligations.
- Pruning Unreachable States: Ensures model compactness.
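As an illustration of the monitor-construction step only, the sketch below runs a one-clock monitor over a finite timed word. It assumes a bounded-eventually obligation ("`prop` holds at some time within `[lower, upper]`"), which is chosen here for illustration; the location names `wait`, `acc`, and `sink` are hypothetical. It does not cover the synchronous product, limit-determinization, or pruning steps.

```python
def monitor_eventually(timed_word, prop, lower, upper):
    """Run a single-clock monitor for 'prop holds at some time in [lower, upper]'.

    timed_word is a list of (set_of_propositions, absolute_timestamp) pairs
    with non-decreasing timestamps. Returns the final monitor location.
    """
    location, x, now = "wait", 0.0, 0.0
    for letters, timestamp in timed_word:
        if location != "wait":
            break                      # verdict already reached
        x += timestamp - now           # let the clock advance
        now = timestamp
        if x > upper:
            location = "sink"          # deadline passed before prop was seen
        elif prop in letters and x >= lower:
            location, x = "acc", 0.0   # obligation met: reset clock, accept
    return location
```

A result of `"wait"` means the finite word ended before the monitor could reach a verdict.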
Construction Example:
For :
- .
- .
- ; ; .
- Edges include:
- , .
- .
5. Synchronization with MDPs and POMDPs
Timed-LDGBA are synchronized with stochastic environment models to facilitate policy synthesis:
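- MDP: $\mathcal{M} = (S, A, P, s_0, AP, L)$, with states $S$, actions $A$, transition kernel $P$, initial state $s_0$, and labeling function $L: S \to 2^{AP}$.
- POMDP: $\mathcal{P} = (S, A, P, s_0, AP, L, O, Z)$, which adds an observation set $O$ and an observation function $Z$.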
A product timed model is constructed:
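- States: $S \times Q \times V$, where $S$ is the environment state space, $Q$ the automaton locations, and $V$ the (discretized) space of clock valuations.
- Actions: $A \cup \{\epsilon\}$, the environment actions plus the automaton's $\epsilon$-moves.
- Transitions:
  - $(s, q, \nu) \xrightarrow{a} (s', q', \nu')$ where $s' \sim P(\cdot \mid s, a)$, $q \to q'$ via the label $L(s')$, and the edge is enabled by $\nu \models g$.
  - If no edge is enabled, transition to a global sink state.
  - For $\epsilon$ in $Q_N$: $s' = s$, $\nu' = \nu$, and $q'$ is the target of the chosen $\epsilon$-move.
- Reward: a positive reward is given only upon entering an accepting set $F_i$; otherwise zero.
- Observations (POMDP): the observation probability $Z(o \mid s', a)$ of the base POMDP if the automaton edge passes; else $0$.
- Acceptance: A path is accepting iff its $q$-component visits each $F_i$ infinitely often.
Crucially, the automaton state $q$ and clock valuation $\nu$ are perfectly tracked and can augment the input to Q-learning or belief trackers; in POMDPs, these quantities remain fully observable, while the base state $s$ is inferred by belief $b(s)$.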
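One synchronized move of the product model can be sketched as follows. This is a simplified illustration under several assumptions that are not taken from the source: every environment action takes a fixed duration `dt`, the environment step and labeling are passed in as plain functions, edges are `(source, letter, guard, resets, target)` tuples, and invariants and $\epsilon$-moves are omitted. The `SINK` constant is a hypothetical encoding of the global sink state.

```python
import operator

OPS = {"<=": operator.le, ">=": operator.ge}
SINK = ("sink", "sink", ())   # hypothetical encoding of the global sink state


def holds(valuation, constraints):
    """True if the valuation meets every (clock, op, bound) constraint."""
    return all(OPS[op](valuation[c], bound) for c, op, bound in constraints)


def product_step(state, action, env_step, label, edges, dt=1.0):
    """One synchronized move of environment, automaton location and clocks."""
    if state == SINK:
        return SINK
    s, q, clocks = state
    s_next = env_step(s, action)                 # environment transition
    valuation = {c: v + dt for c, v in clocks}   # time elapses by dt
    letter = label(s_next)                       # label read by the automaton
    for src, edge_letter, guard, resets, dst in edges:
        if src == q and edge_letter == letter and holds(valuation, guard):
            # Apply the resets of the enabled edge, then move to its target.
            valuation = {c: 0.0 if c in resets else v
                         for c, v in valuation.items()}
            return (s_next, dst, tuple(sorted(valuation.items())))
    return SINK                                  # no enabled edge
```

Storing the clock valuation as a sorted tuple of pairs keeps product states hashable, so they can be used directly as keys in a tabular learner.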
6. Application in Reinforcement Learning under Timed Constraints
MITL specifications are offline-compiled into Timed-LDGBA and synchronized online with MDP/POMDP models:
- The reward structure enforces temporal correctness via positive reward on accepting set entry, optionally combined with performance objectives.
- Standard RL algorithms (Q-learning, DQN) operate on the product model, learning policies that satisfy all time-bounded constraints or, under stochasticity, maximize the probability of acceptance.
- Evaluations in grid-world and robotics scenarios demonstrate scalability, robustness to partial observability, and faithful satisfaction of MITL constraints in learned policies (Wang et al., 31 Dec 2025).
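The reward shaping and learning step described above can be sketched with tabular Q-learning over product states. This is a minimal sketch under stated assumptions: the reward is granted whenever the automaton's next location belongs to some accepting set, the `bonus`, `alpha`, and `gamma` values are arbitrary defaults, and the function names are hypothetical. It shows the standard Q-learning update rule and is not claimed to match the training setup of the cited work.

```python
from collections import defaultdict


def acceptance_reward(q_next, accepting_sets, bonus=1.0):
    """Positive reward when the next automaton location lies in some accepting set."""
    return bonus if any(q_next in f for f in accepting_sets) else 0.0


def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.99):
    """Standard tabular Q-learning update; states are hashable product states."""
    best_next = max(Q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]


Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
```

Here a product state would be a hashable triple of environment state, automaton location, and clock valuation, so the learned values depend on how much time remains before a deadline.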
This framework enables reliable policy synthesis in dynamic, uncertain environments where temporal obligations are explicit and time-critical.