Toward Human-AI Alignment in Large-Scale Multi-Player Games (2402.03575v2)

Published 5 Feb 2024 in cs.AI and cs.HC

Abstract: Achieving human-AI alignment in complex multi-agent games is crucial for creating trustworthy AI agents that enhance gameplay. We propose a method to evaluate this alignment using an interpretable task-sets framework, focusing on high-level behavioral tasks instead of low-level policies. Our approach has three components. First, we analyze extensive human gameplay data from Xbox's Bleeding Edge (100K+ games), uncovering behavioral patterns in a complex task space. This task space serves as a basis set for a behavior manifold capturing interpretable axes: fight-flight, explore-exploit, and solo-multi-agent. Second, we train an AI agent to play Bleeding Edge using a Generative Pretrained Causal Transformer and measure its behavior. Third, we project human and AI gameplay to the proposed behavior manifold to compare and contrast. This allows us to interpret differences in policy as higher-level behavioral concepts, e.g., we find that while human players exhibit variability in fight-flight and explore-exploit behavior, AI players tend towards uniformity. Furthermore, AI agents predominantly engage in solo play, while humans often engage in cooperative and competitive multi-agent patterns. These stark differences underscore the need for interpretable evaluation, design, and integration of AI in human-aligned applications. Our study advances the alignment discussion in AI and especially generative AI research, offering a measurable framework for interpretable human-agent alignment in multiplayer gaming.
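The sketch below is a minimal, hypothetical illustration (not the authors' code) of the comparison the abstract describes: summarizing each player's gameplay as a task-occupancy vector, projecting human and AI players onto a shared low-dimensional behavior manifold, and comparing their spread along each axis. PCA stands in for the paper's manifold construction, and the Dirichlet-sampled data, feature columns, and axis labels are illustrative assumptions only.

```python
# Hypothetical sketch: compare human vs. AI behavioral variability by projecting
# per-player task-occupancy vectors onto a low-dimensional behavior manifold.
# PCA is used here as a stand-in for the paper's manifold construction; the data
# below are synthetic placeholders for task statistics extracted from telemetry.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic task-occupancy features (fractions of time per high-level task).
# Illustrative columns: [fight, flight, explore, exploit, solo, multi-agent]
human = rng.dirichlet(alpha=[2, 2, 2, 2, 2, 2], size=500)      # varied behavior
ai    = rng.dirichlet(alpha=[20, 2, 20, 2, 20, 2], size=500)   # more uniform, solo-leaning

# Fit the manifold on human data so its axes reflect human behavioral variation,
# then project both populations into the same space.
manifold = PCA(n_components=3).fit(human)
h_proj = manifold.transform(human)
a_proj = manifold.transform(ai)

# Lower AI variance along an axis indicates behavioral uniformity relative to
# humans on that axis (the paper interprets such axes as fight-flight,
# explore-exploit, and solo-multi-agent; here they are unlabeled components).
for axis in range(3):
    print(f"axis {axis}: human var={h_proj[:, axis].var():.4f}  "
          f"ai var={a_proj[:, axis].var():.4f}")
```

In this setup, a systematically smaller AI variance on an axis would be read as the kind of behavioral uniformity the abstract reports, while a shifted AI mean would indicate a preference difference (e.g., toward solo play).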

