
Continuously evolving rewards in an open-ended environment (2405.01261v1)

Published 2 May 2024 in cs.LG and cs.NE

Abstract: Unambiguous identification of the rewards driving the behaviours of entities operating in complex, open-ended, real-world environments is difficult, partly because goals and associated behaviours emerge endogenously and are dynamically updated as environments change. Reproducing such dynamics in models would be useful in many domains, particularly where fixed reward functions limit the adaptive capabilities of agents. The simulation experiments described here assess a candidate algorithm for the dynamic updating of rewards, RULE: Reward Updating through Learning and Expectation. The approach is tested in a simplified ecosystem-like setting where experiments challenge the entities' survival, calling for significant behavioural change. The population of entities successfully demonstrates the abandonment of an initially rewarded but ultimately detrimental behaviour, the amplification of beneficial behaviour, and appropriate responses to novel items added to the environment. These adjustments happen through endogenous modification of the entities' underlying reward function, during continuous learning, without external intervention.
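
The abstract does not spell out RULE's update mechanism, but the core idea it describes, entities endogenously re-weighting their own rewards as observed outcomes diverge from expectation, can be sketched. The following Python snippet is a minimal illustrative sketch under assumed details, not the authors' implementation: the class name RuleAgent, the per-behaviour reward weights, the moving-average expectation, and the rates eta and tau are all hypothetical.

```python
import numpy as np

# Hypothetical sketch of expectation-driven reward updating, loosely
# following the abstract's description of RULE. The update rule, names,
# and hyperparameters are illustrative assumptions, not the paper's method.

class RuleAgent:
    def __init__(self, n_behaviours, eta=0.05, tau=0.01):
        # One internal reward weight per behaviour; these are what the
        # agent modifies endogenously during learning.
        self.reward_weights = np.zeros(n_behaviours)
        # Running expectation of the outcome (e.g. an energy or survival
        # signal) associated with each behaviour.
        self.expectation = np.zeros(n_behaviours)
        self.eta = eta  # reward-update rate (assumed)
        self.tau = tau  # expectation-tracking rate (assumed)

    def reward(self, behaviour):
        """Internal reward the agent currently assigns to a behaviour."""
        return self.reward_weights[behaviour]

    def update(self, behaviour, outcome):
        """Adjust the reward for `behaviour` given an observed `outcome`.

        Outcomes that persistently exceed expectation grow the reward
        (amplification); outcomes that persistently fall short shrink it,
        eventually turning it negative (abandonment).
        """
        surprise = outcome - self.expectation[behaviour]
        self.reward_weights[behaviour] += self.eta * surprise
        # Slowly move the expectation toward observed outcomes, so the
        # weights stabilise once the environment stops changing.
        self.expectation[behaviour] += self.tau * surprise


# Toy usage: a behaviour that starts beneficial then turns detrimental.
agent = RuleAgent(n_behaviours=2)
for step in range(2000):
    outcome = 1.0 if step < 1000 else -1.0  # environment changes mid-run
    agent.update(behaviour=0, outcome=outcome)
print(agent.reward(0))  # negative: the behaviour has been "abandoned"
```

Under this toy rule, an initially rewarded behaviour whose outcomes turn detrimental sees its internal reward decay and go negative without any external intervention, qualitatively matching the abandonment and amplification effects the abstract reports.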


