Rethinking Out-of-Distribution Detection for Reinforcement Learning: Advancing Methods for Evaluation and Detection (2404.07099v1)
Abstract: While reinforcement learning (RL) algorithms have been successfully applied across numerous sequential decision-making problems, their generalization to unforeseen testing environments remains a significant concern. In this paper, we study the problem of out-of-distribution (OOD) detection in RL, which focuses on identifying situations at test time that RL agents have not encountered in their training environments. We first propose a clarification of terminology for OOD detection in RL, which aligns it with the literature from other machine learning domains. We then present new benchmark scenarios for OOD detection, which introduce anomalies with temporal autocorrelation into different components of the agent-environment loop. We argue that such scenarios have been understudied in the current literature, despite their relevance to real-world situations. Confirming our theoretical predictions, our experimental results suggest that state-of-the-art OOD detectors are not able to identify such anomalies. To address this problem, we propose a novel method for OOD detection, which we call DEXTER (Detection via Extraction of Time Series Representations). By treating environment observations as time series data, DEXTER extracts salient time series features, and then leverages an ensemble of isolation forest algorithms to detect anomalies. We find that DEXTER can reliably identify anomalies across benchmark scenarios, exhibiting superior performance compared to both state-of-the-art OOD detectors and high-dimensional changepoint detectors adopted from statistics.
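The abstract's description of DEXTER — treat observations as time series, extract salient features per sliding window, then score the features with an ensemble of isolation forests — can be illustrated with a minimal sketch. This is not the authors' implementation: the hand-rolled features (mean, standard deviation, lag-1 autocorrelation) stand in for the much richer feature library the paper uses, and the class name `DexterLikeDetector` and window/ensemble sizes are illustrative choices.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def extract_features(window: np.ndarray) -> np.ndarray:
    """Per-dimension time-series features (stand-ins for a full feature
    library): mean, standard deviation, and lag-1 autocorrelation."""
    feats = []
    for dim in window.T:  # iterate over observation dimensions
        mean, std = dim.mean(), dim.std()
        centered = dim - mean
        denom = (centered ** 2).sum()
        ac1 = (centered[:-1] * centered[1:]).sum() / denom if denom > 0 else 0.0
        feats.extend([mean, std, ac1])
    return np.asarray(feats)


class DexterLikeDetector:
    """Fits an ensemble of isolation forests on features extracted from
    sliding windows of in-distribution observations."""

    def __init__(self, n_forests: int = 5, window: int = 20):
        self.window = window
        self.forests = [IsolationForest(random_state=i) for i in range(n_forests)]

    def _windows(self, obs: np.ndarray) -> np.ndarray:
        # obs has shape (timesteps, obs_dim); one feature vector per window
        return np.array([
            extract_features(obs[t:t + self.window])
            for t in range(len(obs) - self.window + 1)
        ])

    def fit(self, obs: np.ndarray) -> "DexterLikeDetector":
        X = self._windows(obs)
        for forest in self.forests:
            forest.fit(X)
        return self

    def score(self, obs: np.ndarray) -> np.ndarray:
        """Mean anomaly score per window; lower means more anomalous
        (following sklearn's score_samples convention)."""
        X = self._windows(obs)
        return np.mean([f.score_samples(X) for f in self.forests], axis=0)
```

In use, one would fit the detector on observation traces from the training environment and flag test-time windows whose ensemble score falls below a calibrated threshold; an anomaly with temporal autocorrelation (e.g. a random-walk perturbation) shifts the window features even when each individual observation stays in range, which is the property the paper's benchmark scenarios probe.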