Epistemic Exploration for Generalizable Planning and Learning in Non-Stationary Settings (2402.08145v2)

Published 13 Feb 2024 in cs.AI

Abstract: This paper introduces a new approach for continual planning and model learning in relational, non-stationary stochastic environments. Such capabilities are essential for the deployment of sequential decision-making systems in the uncertain and constantly evolving real world. Working in such practical settings with unknown (and non-stationary) transition systems and changing tasks, the proposed framework models gaps in the agent's current state of knowledge and uses them to conduct focused, investigative explorations. Data collected using these explorations is used for learning generalizable probabilistic models for solving the current task despite continual changes in the environment dynamics. Empirical evaluations on several non-stationary benchmark domains show that this approach significantly outperforms planning and RL baselines in terms of sample complexity. Theoretical results show that the system exhibits desirable convergence properties when stationarity holds.
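To make the exploration-and-learning loop described in the abstract more concrete, here is a minimal, hypothetical Python sketch; it is not the authors' algorithm or code. It keeps empirical transition counts, treats under-sampled state-action pairs as knowledge gaps, directs exploration toward those gaps before exploiting the learned model, and discards stale statistics when the dynamics appear to have changed. The class name, gap threshold, and random fallback policy are illustrative assumptions.

```python
import random
from collections import defaultdict

class EpistemicExplorationAgent:
    """Illustrative loop (assumed structure): track transition counts, flag
    under-explored (state, action) pairs as knowledge gaps, and direct
    exploration at them before exploiting the learned model."""

    def __init__(self, actions, gap_threshold=5):
        self.actions = actions
        self.gap_threshold = gap_threshold  # min samples before a pair counts as "known"
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}

    def record(self, state, action, next_state):
        # Update the empirical transition model with one observed transition.
        self.counts[(state, action)][next_state] += 1

    def is_gap(self, state, action):
        # A knowledge gap: too few samples to trust the estimated dynamics.
        return sum(self.counts[(state, action)].values()) < self.gap_threshold

    def transition_probs(self, state, action):
        # Maximum-likelihood estimate of P(s' | s, a) from collected data.
        outcomes = self.counts[(state, action)]
        total = sum(outcomes.values())
        return {s2: c / total for s2, c in outcomes.items()} if total else {}

    def choose_action(self, state):
        # Prefer investigative actions that close knowledge gaps; otherwise
        # fall back to a placeholder choice standing in for planning
        # with the learned model.
        gaps = [a for a in self.actions if self.is_gap(state, a)]
        if gaps:
            return random.choice(gaps)
        return random.choice(self.actions)

    def reset_model(self):
        # On detecting non-stationarity (e.g. predicted and observed outcomes
        # diverge), stale statistics can be discarded and re-collected.
        self.counts.clear()
```

In this toy version, non-stationarity handling is reduced to clearing the counts; the paper's framework instead learns generalizable relational probabilistic models and replans for the current task as the environment changes.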
