Colour versus Shape Goal Misgeneralization in Reinforcement Learning: A Case Study (2312.03762v1)

Published 5 Dec 2023 in cs.LG and cs.AI

Abstract: We explore colour versus shape goal misgeneralization originally demonstrated by Di Langosco et al. (2022) in the Procgen Maze environment, where, given an ambiguous choice, the agents seem to prefer generalization based on colour rather than shape. After training over 1,000 agents in a simplified version of the environment and evaluating them on over 10 million episodes, we conclude that the behaviour can be attributed to the agents learning to detect the goal object through a specific colour channel. This choice is arbitrary. Additionally, we show how, due to underspecification, the preferences can change when retraining the agents using exactly the same procedure except for using a different random seed for the training run. Finally, we demonstrate the existence of outliers in out-of-distribution behaviour based on training random seed alone.
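
A minimal illustrative sketch of the seed-sweep analysis the abstract describes: many agents trained by an identical procedure except for the random seed, each evaluated on a large batch of ambiguous episodes, with per-seed colour-preference rates aggregated and outlier seeds flagged. The episode outcomes below are simulated stand-ins (the counts, preference distribution, and z-score threshold are illustrative assumptions, not the authors' values); in the paper's setting they would come from rollouts in the simplified Procgen Maze environment.

    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in data: for each training seed, simulate whether the agent
    # navigated to the colour-matching (rather than shape-matching) object
    # in each ambiguous out-of-distribution episode.
    n_seeds, n_episodes = 50, 10_000
    pref = rng.beta(8, 2, size=n_seeds)                      # most seeds lean colour
    pref[rng.choice(n_seeds, size=3, replace=False)] = 0.1   # a few outlier seeds
    went_to_colour = rng.random((n_seeds, n_episodes)) < pref[:, None]

    # Aggregate a colour-preference rate per seed, then flag outlier seeds
    # whose out-of-distribution behaviour deviates strongly from the rest.
    rates = went_to_colour.mean(axis=1)
    z = (rates - rates.mean()) / rates.std()
    outliers = np.flatnonzero(np.abs(z) > 2.5)

    print(f"mean colour-preference rate: {rates.mean():.3f}")
    print(f"outlier seeds (|z| > 2.5): {outliers.tolist()}")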

References (19)
  1. Goal misgeneralization in deep reinforcement learning. In International Conference on Machine Learning, pages 12004–12019. PMLR, 2022.
  2. Goal misgeneralization: Why correct specifications aren't enough for correct goals. arXiv preprint arXiv:2210.01790, 2022.
  3. S. Russell. Human Compatible: Artificial Intelligence and the Problem of Control. Penguin, 2019.
  4. Which shortcut cues will DNNs choose? A study from the parameter-space perspective. arXiv preprint arXiv:2110.03095, 2021.
  5. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
  6. IMPALA: Scalable distributed deep-RL with importance weighted actor-learner architectures. In International Conference on Machine Learning, pages 1407–1416. PMLR, 2018.
  7. Leveraging procedural generation to benchmark reinforcement learning. In International Conference on Machine Learning, pages 2048–2056. PMLR, 2020.
  8. Human trichromacy revisited. Proceedings of the National Academy of Sciences, 110(3):E260–E269, 2013.
  9. L. Chittka. The Mind of a Bee. Princeton University Press, 2022.
  10. S. Bloch and C. Martinoya. Specialization of visual functions for different retinal areas in the pigeon. Advances in Vertebrate Neuroethology, pages 359–368, 1983.
  11. Understanding RL vision. Distill, 5(11):e29, 2020.
  12. Acquisition of chess knowledge in AlphaZero. Proceedings of the National Academy of Sciences, 119(47):e2206625119, 2022.
  13. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419):1140–1144, 2018.
  14. Exploratory not explanatory: Counterfactual analysis of saliency maps for deep reinforcement learning. arXiv preprint arXiv:1912.05743, 2019.
  15. Understanding and controlling a maze-solving policy network, 2023. URL https://www.alignmentforum.org/posts/cAC4AXiNC5ig6jQnc/understanding-and-controlling-a-maze-solving-policy-network.
  16. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016.
  17. Automated classification of skin lesions: From pixels to practice. Journal of Investigative Dermatology, 138(10):2108–2110, 2018.
  18. Underspecification presents challenges for credibility in modern machine learning. The Journal of Machine Learning Research, 23(1):10237–10297, 2022.
  19. The MultiBERTs: BERT reproductions for robustness analysis. arXiv preprint arXiv:2106.16163, 2021.
