
Concept Alignment (2401.08672v1)

Published 9 Jan 2024 in cs.LG, cs.AI, and q-bio.NC

Abstract: Discussion of AI alignment (alignment between humans and AI systems) has focused on value alignment, broadly referring to creating AI systems that share human values. We argue that before we can even attempt to align values, it is imperative that AI systems and humans align the concepts they use to understand the world. We integrate ideas from philosophy, cognitive science, and deep learning to explain the need for concept alignment, not just value alignment, between humans and machines. We summarize existing accounts of how humans and machines currently learn concepts, and we outline opportunities and challenges in the path towards shared concepts. Finally, we explain how we can leverage the tools already being developed in cognitive science and AI research to accelerate progress towards concept alignment.

Understanding Concept Alignment in AI

Introduction to Concept Alignment

The field of AI aims to create systems that harmonize with human perspectives and goals. Traditionally, discussions of AI alignment have concentrated on value alignment, which broadly deals with developing AI that reflects human ethics. However, the concept of ‘value’ is intricate and varies across people, which complicates teaching AI systems to share those values. At the same time, humans largely hold similar conceptual frameworks that shape how they perceive the world. The paper therefore introduces the notion of concept alignment, positing that AI must first understand the world through human-like concepts before it can meaningfully share human values.

The Importance of Conceptual Frameworks

The history of science is replete with examples where conceptual misalignments led to significant paradigm shifts. For instance, the conceptual frameworks of Aristotelian and Newtonian physics were so disparate that dialogue between proponents of each was difficult. Similarly, adults and children can interpret the same scenario differently: children who lack the concept of volume may err in judging the quantity of liquid in differently shaped containers. This underscores how hard it is to align concepts even among humans, and why concept alignment must come before expecting AI systems to align with complex human values.

Comparing Human and AI Concept Learning

Humans learn concepts and language in an intertwined process in which placeholder words are gradually filled with meaning through exposure and experience. The crux is that human concept learning emerges from a rich tapestry of sensory, social, and cognitive experiences. In contrast, AI systems such as neural networks learn concepts as representations in high-dimensional spaces, which do not map directly onto human understanding. Ensuring that AI systems develop concepts in a human-like way requires methods for measuring and refining these representations.
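One established cognitive-science tool for this kind of measurement is representational similarity analysis (RSA), which compares the geometry of a model's representations against human similarity judgments. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' method; the item set, embeddings, and human ratings are all made up.

```python
import numpy as np
from scipy.stats import spearmanr

def rdm(embeddings):
    """Representational dissimilarity matrix: 1 - cosine similarity between items."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return 1.0 - normed @ normed.T

# Hypothetical data: model embeddings and averaged human similarity ratings
# for the same five items (say "apple", "pear", "car", "truck", "dog").
model_embeddings = np.random.default_rng(0).normal(size=(5, 128))
human_similarity = np.array([
    [1.0, 0.8, 0.1, 0.1, 0.3],
    [0.8, 1.0, 0.1, 0.1, 0.3],
    [0.1, 0.1, 1.0, 0.9, 0.2],
    [0.1, 0.1, 0.9, 1.0, 0.2],
    [0.3, 0.3, 0.2, 0.2, 1.0],
])

# Compare the upper triangles of the two dissimilarity matrices (standard RSA practice).
iu = np.triu_indices(5, k=1)
rho, _ = spearmanr(rdm(model_embeddings)[iu], (1.0 - human_similarity)[iu])
print(f"Model-human representational alignment (Spearman rho): {rho:.2f}")
```

In practice the human matrix would come from behavioral similarity ratings or neural data, and the model matrix from whichever network layer is being probed.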

Towards Shared Conceptual Understanding

For humans to trust AI systems, the latter must demonstrate a shared conceptual understanding that goes beyond mere language processing. But how can AI systems gain a human-aligned conceptual grasp? Multimodal learning has emerged as a promising direction: models like Imagen generate visual representations from textual descriptions, adding sensory grounding to language processing. Robots equipped with such capabilities can enrich their conceptual understanding and potentially align better with human expectations.
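As a concrete picture of how multimodal training grounds language in perception, the snippet below sketches a CLIP-style contrastive alignment step, where paired images and captions are pulled into a shared embedding space. It is a toy sketch under stated assumptions: the linear "encoders", feature dimensions, and random batch are placeholders, not the architecture of Imagen or any model discussed in the paper.

```python
import torch
import torch.nn.functional as F

# Stand-in encoders: in a real system these would be an image backbone and a
# text transformer; here they are plain linear projections for illustration.
image_encoder = torch.nn.Linear(2048, 512)  # maps image features -> shared space
text_encoder = torch.nn.Linear(768, 512)    # maps text features  -> shared space

def contrastive_alignment_loss(image_feats, text_feats, temperature=0.07):
    """CLIP-style loss: matching image/caption pairs should have high similarity."""
    img = F.normalize(image_encoder(image_feats), dim=-1)
    txt = F.normalize(text_encoder(text_feats), dim=-1)
    logits = img @ txt.T / temperature       # pairwise similarities across the batch
    targets = torch.arange(len(img))         # the i-th image matches the i-th caption
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

# Hypothetical batch of 8 paired image/caption feature vectors.
loss = contrastive_alignment_loss(torch.randn(8, 2048), torch.randn(8, 768))
loss.backward()
```

Text-to-image models such as Imagen go further by generating images conditioned on text, but this contrastive setup illustrates the basic move of tying language and perception to a common representational space.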

The Road to Concept Alignment

The path to robust concept alignment between humans and machines demands an iteratively developed standard informed by cognitive science and refined through empirical research and engineering across modalities. It also requires incorporating interactive learning, where AI adapts and fine-tunes its conceptual knowledge through human interaction. The development of such sophisticated AI capabilities will pave the way for more dependable and harmonious integration of AI into human society, ensuring that when AI speaks of "apples," it conjures the same sensory-rich concept we humans do.
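To make the interactive-learning idea concrete, here is a deliberately simple, hypothetical sketch (not the paper's proposal) of an agent that maintains a prototype for a concept and nudges it toward or away from examples based on yes/no human feedback.

```python
import numpy as np

class InteractiveConcept:
    """Toy prototype model of a concept, refined by human feedback (illustrative only)."""

    def __init__(self, dim, learning_rate=0.1):
        self.prototype = np.zeros(dim)
        self.lr = learning_rate

    def update(self, example, human_says_yes):
        # Move the prototype toward confirmed examples and away from rejected ones.
        direction = 1.0 if human_says_yes else -1.0
        self.prototype += direction * self.lr * (example - self.prototype)

# Hypothetical loop: the system proposes items and a human confirms or rejects them.
rng = np.random.default_rng(0)
apple_concept = InteractiveConcept(dim=16)
for _ in range(20):
    candidate = rng.normal(size=16)   # stand-in for an item's feature vector
    feedback = candidate[0] > 0       # stand-in for a human yes/no judgment
    apple_concept.update(candidate, feedback)
```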

In summary, concept alignment is not merely a technical challenge but an interdisciplinary pursuit that stands to revolutionize the way AI systems interact and operate in our world, bridging the gap between artificial intelligence and human cognition.

The authors acknowledge support from the Diverse Intelligences Summer Institute and the Templeton World Charity Foundation. The journey toward truly human-aligned AI is complex, but concept alignment marks a tangible step toward deeper, more meaningful integration.

Authors (5)
  1. Sunayana Rane (8 papers)
  2. Polyphony J. Bruna (1 paper)
  3. Ilia Sucholutsky (45 papers)
  4. Christopher Kello (2 papers)
  5. Thomas L. Griffiths (150 papers)
Citations (7)