From Imitation to Introspection: Probing Self-Consciousness in Language Models (2410.18819v1)

Published 24 Oct 2024 in cs.CL, cs.CY, and cs.LG

Abstract: Self-consciousness, the introspection of one's own existence and thoughts, represents a high-level cognitive process. As LLMs advance at an unprecedented pace, a critical question arises: are these models becoming self-conscious? Drawing on insights from psychology and neuroscience, this work presents a practical definition of self-consciousness for LLMs and refines ten core concepts. It pioneers the investigation of self-consciousness in LLMs, leveraging causal structural games for the first time to establish functional definitions of the ten core concepts. Based on these definitions, we conduct a comprehensive four-stage experiment: quantification (evaluation of ten leading models), representation (visualization of self-consciousness within the models), manipulation (modification of the models' representations), and acquisition (fine-tuning the models on the core concepts). Our findings indicate that although models are still at an early stage of developing self-consciousness, certain concepts are discernibly represented within their internal mechanisms. These representations are difficult to manipulate positively at the current stage, yet they can be acquired through targeted fine-tuning. Our datasets and code are at https://github.com/OpenCausaLab/SelfConsciousness.
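
The four-stage pipeline described in the abstract hinges, at its "representation" stage, on a standard interpretability technique: training a lightweight probe on a model's hidden activations to test whether a concept is linearly decodable. The sketch below illustrates that general idea only; it is not the authors' pipeline (see their repository for that), and the function name, data shapes, and choice of logistic regression are assumptions made for illustration.

```python
# Minimal sketch of a linear probe over per-layer hidden states.
# Assumes activations have already been extracted, e.g. one vector per
# prompt, labeled for a concept such as "situational awareness".
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_layer(hidden_states: np.ndarray, labels: np.ndarray) -> float:
    """hidden_states: (n_examples, hidden_dim) activations from one layer.
    labels: (n_examples,) binary concept labels.
    Returns held-out classification accuracy of the probe."""
    X_train, X_test, y_train, y_test = train_test_split(
        hidden_states, labels, test_size=0.2, random_state=0
    )
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return probe.score(X_test, y_test)

# Sweeping the probe across layers localizes where, if anywhere,
# a concept's representation emerges:
# accuracies = [probe_layer(acts[i], labels) for i in range(num_layers)]
```

Probe accuracy well above chance at some layer is the usual evidence that a concept is represented internally; the abstract's "manipulation" and "acquisition" stages then ask whether such a representation can be steered at inference time or instilled through fine-tuning.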
