Symmetry-Breaking Augmentations for Ad Hoc Teamwork (2402.09984v2)

Published 15 Feb 2024 in cs.LG and cs.AI

Abstract: In dynamic collaborative settings, for AI agents to better align with humans, they must adapt to novel teammates who utilise unforeseen strategies. While adaptation is often simple for humans, it can be challenging for AI agents. Our work introduces symmetry-breaking augmentations (SBA) as a novel approach to this challenge. By applying a symmetry-flipping operation to increase behavioural diversity among training teammates, SBA encourages agents to learn robust responses to unknown strategies, highlighting how social conventions impact human-AI alignment. We demonstrate this experimentally in two settings, showing that our approach outperforms previous ad hoc teamwork results in the challenging card game Hanabi. In addition, we propose a general metric for estimating symmetry dependency amongst a given set of policies. Our findings provide insights into how AI systems can better adapt to diverse human conventions and the core mechanics of alignment.
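The abstract describes SBA as applying a symmetry-flipping operation to training teammates to increase behavioural diversity. As a minimal sketch of that idea (not the paper's implementation), the code below assumes a Hanabi-like game in which the card colours are mutually symmetric, so relabelling them yields a strategically equivalent teammate that appears to follow a different convention. All names here (`SymmetryFlippedTeammate`, `sample_colour_permutation`, the token-list observation format) are hypothetical.

```python
import random

# Hypothetical sketch of a symmetry-flipping augmentation. It assumes the
# five Hanabi card colours are interchangeable: permuting their labels
# changes how a policy "looks" to a partner without changing the game's
# underlying structure. Not the paper's code.

COLOURS = ("red", "yellow", "green", "white", "blue")

def sample_colour_permutation():
    """Sample a random relabelling of the symmetric colours."""
    shuffled = random.sample(COLOURS, k=len(COLOURS))
    return dict(zip(COLOURS, shuffled))

def relabel(tokens, perm):
    """Apply the permutation to colour tokens, leaving other tokens as-is."""
    return [perm.get(t, t) for t in tokens]

class SymmetryFlippedTeammate:
    """Wrap a fixed training teammate so that, from the learner's point of
    view, it follows a freshly sampled colour convention each episode."""

    def __init__(self, base_policy):
        self.base_policy = base_policy  # callable: observation tokens -> action tokens
        self.perm = sample_colour_permutation()
        self.inv = {v: k for k, v in self.perm.items()}

    def act(self, observation):
        # Map the learner's view back into the teammate's original frame,
        # query the unmodified policy, then flip its action forward again.
        action = self.base_policy(relabel(observation, self.inv))
        return relabel(action, self.perm)
```

Training against many such wrappers of a single base policy exposes the learner to partners that differ only in which arbitrary convention they adopted, which is one plausible reading of the behavioural diversity the abstract attributes to SBA.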
