Verification of Markov Decision Processes using Learning Algorithms (1402.2967v3)

Published 10 Feb 2014 in cs.LO

Abstract: We present a general framework for applying machine-learning algorithms to the verification of Markov decision processes (MDPs). The primary goal of these techniques is to improve performance by avoiding an exhaustive exploration of the state space. Our framework focuses on probabilistic reachability, which is a core property for verification, and is illustrated through two distinct instantiations. The first assumes that full knowledge of the MDP is available, and performs a heuristic-driven partial exploration of the model, yielding precise lower and upper bounds on the required probability. The second tackles the case where we may only sample the MDP, and yields probabilistic guarantees, again in terms of both the lower and upper bounds, which provides efficient stopping criteria for the approximation. The latter is the first extension of statistical model-checking for unbounded properties in MDPs. In contrast with other related approaches, we do not restrict our attention to time-bounded (finite-horizon) or discounted properties, nor assume any particular properties of the MDP. We also show how our techniques extend to LTL objectives. We present experimental results showing the performance of our framework on several examples.

Authors (8)

Tomáš Brázdil (40 papers)
Krishnendu Chatterjee (214 papers)
Martin Chmelík (16 papers)
Vojtěch Forejt (16 papers)
Marta Kwiatkowska (98 papers)
David Parker (58 papers)
Mateusz Ujma (4 papers)
Jan Křetínský (54 papers)

Citations (195)


Summary

The paper introduces a framework leveraging BRTDP and DQL to verify probabilistic reachability in MDPs while reducing exhaustive state space exploration.
BRTDP computes both lower and upper bounds using complete knowledge of the MDP, and converges almost surely for MDPs without end components.
DQL gives PAC (Probably Approximately Correct) guarantees when the MDP can only be sampled, by simulating trajectories and collapsing end components.

Verification of Markov Decision Processes using Learning Algorithms

The paper presents a framework for using learning algorithms in the verification of Markov Decision Processes (MDPs). The primary focus is on probabilistic reachability, a key property in the verification context. The framework improves verification performance by circumventing exhaustive state space exploration through two different algorithmic approaches.
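
For reference, probabilistic reachability can be stated as follows. This is the standard textbook formulation, not notation quoted from the paper: the symbols $T$ (target set), $A(s)$ (actions enabled in $s$), $\Delta$ (transition function), and $\sigma$ (strategy) are assumptions introduced here for illustration.

```latex
V(s) \;=\; \sup_{\sigma} \Pr\nolimits_{s}^{\sigma}(\Diamond T),
\qquad
V(s) \;=\;
\begin{cases}
1 & \text{if } s \in T,\\[2pt]
\max_{a \in A(s)} \sum_{s'} \Delta(s,a)(s')\, V(s') & \text{otherwise.}
\end{cases}
```

The value $V$ is the least fixed point of the second equation. Iterating from 0 therefore approaches $V$ from below, whereas iterating from 1 can stall at a larger fixed point when the MDP contains end components, which is why end components need separate treatment when upper bounds are computed.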

Key Contributions and Algorithmic Details

The authors introduce two algorithmic instantiations exemplifying applications of their framework:

BRTDP (Bounded Real-Time Dynamic Programming): This algorithm requires complete knowledge of the MDP and performs a heuristic-driven partial exploration, producing both lower and upper bounds on the reachability probability. It adapts RTDP by maintaining both bounds, so the gap between them can serve as a stopping criterion and the result is precise up to a chosen tolerance. The authors show that BRTDP converges almost surely for MDPs without end components (ECs).
DQL (Delayed Q-Learning): This approach applies when the MDP can only be sampled, and is the first extension of statistical model checking to unbounded properties of MDPs. It gives PAC (Probably Approximately Correct) guarantees on both bounds by sampling trajectories. For MDPs that contain ECs, the authors describe transformations that collapse the ECs so that the bounds still converge to suitable approximations.
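
The BRTDP idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the MDP encoding (a dict from state to action to successor distribution), the function name `brtdp`, the single `target` and `sink` states, and the `EXAMPLE_MDP` model are all hypothetical. As a simplification, the sketch stores one bound per state and backs up over all actions, and it assumes the MDP has no end components other than the absorbing target and sink.

```python
import random

# Hypothetical example MDP: state -> action -> {successor: probability}.
# State 1 is the target, state 2 is a sink from which the target is unreachable.
EXAMPLE_MDP = {
    0: {"a": {1: 0.5, 2: 0.5}, "b": {3: 1.0}},
    3: {"c": {1: 0.9, 2: 0.1}},
}

def brtdp(mdp, init, target, sink, eps=1e-6, max_episodes=100_000,
          max_steps=10_000, seed=0):
    """Return (lower, upper) bounds on the max probability of reaching target."""
    rng = random.Random(seed)
    lower, upper = {}, {}  # bounds are stored only for states actually visited

    def U(s):  # upper bound; unexplored states default to 1
        return 1.0 if s == target else 0.0 if s == sink else upper.get(s, 1.0)

    def L(s):  # lower bound; unexplored states default to 0
        return 1.0 if s == target else lower.get(s, 0.0)

    def q(bound, s, a):  # expected value of `bound` after playing a in s
        return sum(p * bound(t) for t, p in mdp[s][a].items())

    for _ in range(max_episodes):
        if U(init) - L(init) < eps:
            break  # the gap between the bounds is the stopping criterion
        # Explore: follow the action with the best upper bound, sample a successor.
        path, s = [], init
        while s != target and s != sink and len(path) < max_steps:
            a = max(mdp[s], key=lambda act: q(U, s, act))
            path.append(s)
            succs = list(mdp[s][a].items())
            s = rng.choices([t for t, _ in succs],
                            weights=[p for _, p in succs])[0]
        # Update: Bellman backups of both bounds along the path, in reverse.
        for s in reversed(path):
            upper[s] = max(q(U, s, a) for a in mdp[s])
            lower[s] = max(q(L, s, a) for a in mdp[s])
    return L(init), U(init)
```

Calling `brtdp(EXAMPLE_MDP, init=0, target=1, sink=2)` on this toy model should give lower and upper bounds that both sit at approximately 0.9, the value obtained by playing action `b` in state 0. Only states along sampled paths ever receive stored bounds, which is the source of the savings over exhaustive exploration.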

Empirical and Theoretical Implications

In experimental evaluations using the PRISM tool, the authors report significant efficiency improvements over conventional verification methods such as value iteration. BRTDP in particular achieves substantial speed-ups because it explores only a fraction of the state space while still constructing optimal strategies.

On the theoretical side, the results underline the versatility of learning algorithms in handling the nondeterministic and stochastic behavior inherent in MDPs. The approach also extends to Linear Temporal Logic (LTL) objectives, which are reduced to reachability analysis via Rabin acceptance conditions.
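
To illustrate how a Rabin acceptance condition turns into a reachability target, the sketch below checks whether an end component is accepting. It is a generic illustration, not code from the paper: the function name `is_accepting_ec` and the pair convention (the first set of each pair must be visited only finitely often, the second infinitely often) are assumptions, and conventions differ between texts. States are assumed to be states of the product of the MDP with a deterministic Rabin automaton.

```python
def is_accepting_ec(ec_states, rabin_pairs):
    """Check a Rabin condition on an end component of the product MDP.

    ec_states:   iterable of product states forming the end component.
    rabin_pairs: iterable of (avoid, visit) sets; assumed convention:
                 `avoid` is seen finitely often, `visit` infinitely often.
    """
    ec = set(ec_states)
    # A strategy can stay inside an EC and visit all of its states infinitely
    # often, so the EC is accepting if some pair has no `avoid` state in it
    # and at least one `visit` state in it.
    return any(
        ec.isdisjoint(avoid) and not ec.isdisjoint(visit)
        for avoid, visit in rabin_pairs
    )
```

Under this convention, the LTL question becomes the maximum probability of reaching any state that lies in an accepting end component, which is a reachability objective of the kind the framework targets.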

Broader Impact and Future Directions

The practical significance of the paper lies in enabling more scalable, resource-efficient verification of the large MDPs that arise in real systems such as network protocols and robotics. Future research could explore:

Integration with Symbolic Methods: Combining learning algorithms with symbolic model-checking could yield further efficiency gains, leveraging compact state representations.
Rare Event Handling: Addressing challenges in simulating rare but critical events could refine the applicability of learning-based verification methods.
Advanced Verification Objectives: Extending the framework to more complex objectives beyond reachability could broaden its applicability.

Overall, this research sets a foundation for leveraging contemporary learning techniques in formal verification, providing a pathway toward more adaptive and efficient verification methodologies for complex systems exhibiting stochastic dynamics.
