Rejecting Hallucinated State Targets during Planning (2410.07096v8)
Abstract: Generative models can be used in planning to propose targets corresponding to states that agents deem either likely or advantageous to experience. However, imperfections, which are common in learned models, lead to infeasible hallucinated targets that can cause delusional behaviors and thus raise safety concerns. This work first categorizes and investigates the properties of several kinds of infeasible targets. Then, we devise a strategy to reject infeasible targets with a generic target evaluator, which trains alongside planning agents as an add-on, without changing either the behavior or the architecture of the agent (or of the generative model) it is attached to. We highlight that, without proper design, the evaluator can produce delusional estimates, rendering the strategy futile. Thus, to learn correct evaluations of infeasible targets, we propose a combination of a learning rule, an architecture, and two assistive hindsight relabeling strategies. Our experiments show significant reductions in delusional behaviors and performance improvements for several kinds of existing planning agents.
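The pipeline described in the abstract can be illustrated in a toy tabular setting. A generator proposes targets, an evaluator scores how reachable each one is, low-scoring targets are rejected, and hindsight relabeling supplies the evaluator's training data. This is a minimal sketch under stated assumptions and not the paper's learning rule or architecture. The names `reach_label`, `relabel`, `TabularEvaluator`, and `reject_infeasible` are all hypothetical, as are the rejection threshold, the Monte Carlo labels (γ^k if the target is first reached k steps later, else 0), and the optimistic default returned for unseen (source, target) pairs. That default stands in for an evaluator that has never been trained on infeasible targets.

```python
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence, Tuple

State = Hashable
Triple = Tuple[State, State, float]


def reach_label(traj: Sequence[State], i: int, target: State, gamma: float) -> float:
    """Hypothetical Monte Carlo label: gamma**k if `target` first appears k steps
    after position i in the trajectory, else 0.0 (not reached in this episode)."""
    for k in range(1, len(traj) - i):
        if traj[i + k] == target:
            return gamma ** k
    return 0.0


def relabel(traj: Sequence[State], generated: Sequence[State], gamma: float = 0.9) -> List[Triple]:
    """Build (source, target, label) triples for training the evaluator.

    Two hypothetical relabeling sources:
      * hindsight: states visited later in the trajectory (reachable, label > 0);
      * generated: targets proposed by the generator, possibly hallucinated;
        if never reached they get label 0.0, giving the evaluator negatives.
    """
    triples: List[Triple] = []
    for i, source in enumerate(traj[:-1]):
        for target in set(traj[i + 1:]):
            triples.append((source, target, reach_label(traj, i, target, gamma)))
        for target in generated:
            triples.append((source, target, reach_label(traj, i, target, gamma)))
    return triples


class TabularEvaluator:
    """Averages labels per (source, target) pair. Unseen pairs return an
    optimistic `default` (assumption), mimicking an untrained estimate."""

    def __init__(self, default: float = 1.0) -> None:
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)
        self.default = default

    def update(self, triples: List[Triple]) -> None:
        for source, target, label in triples:
            self.sums[(source, target)] += label
            self.counts[(source, target)] += 1

    def __call__(self, source: State, target: State) -> float:
        n = self.counts.get((source, target), 0)
        return self.sums[(source, target)] / n if n else self.default


def reject_infeasible(
    current: State,
    candidates: Sequence[State],
    evaluator: Callable[[State, State], float],
    threshold: float = 0.1,
) -> List[State]:
    """Keep only candidate targets whose estimated reachability from `current`
    meets the (hypothetical) threshold; the rest are treated as hallucinated."""
    return [t for t in candidates if evaluator(current, t) >= threshold]


traj = ["A", "B", "C", "D"]  # one toy episode; "Z" is never reachable

naive = TabularEvaluator()
naive.update(relabel(traj, generated=[]))  # hindsight targets only
print(reject_infeasible("A", ["C", "Z"], naive))  # ['C', 'Z']

fixed = TabularEvaluator()
fixed.update(relabel(traj, generated=["Z"]))  # also relabel with generated targets
print(reject_infeasible("A", ["C", "Z"], fixed))  # ['C']
```

When relabeling uses only states the agent actually visited, the evaluator never sees a negative example, so the hallucinated target `"Z"` keeps its optimistic default and passes the filter. Adding generator-proposed targets to the relabeling pool lets the evaluator learn a label of 0 for `"Z"`, which is then rejected. This is a miniature version of why the abstract stresses assistive relabeling. Note that a label of 0 here only means "not reached in this episode under this behavior", which is a Monte Carlo estimate and not a proof of infeasibility.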