
Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT

(arXiv:2310.07582)
Published Oct 11, 2023 in cs.LG and cs.AI

Abstract

Foundation models exhibit significant capabilities in decision-making and logical deduction. Nonetheless, debate persists over whether they genuinely understand the world or merely engage in stochastic mimicry. This paper examines a simple transformer trained to play Othello, extending prior research to deepen understanding of Othello-GPT's emergent world model. The investigation reveals that Othello-GPT encodes a linear representation of opposing pieces, one that causally steers its decision-making process. The paper further elucidates the interplay between this linear world representation and causal decision-making, and their dependence on layer depth and model complexity. We have made the code public.
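The standard technique behind claims of this kind is linear probing of internal activations. Below is a minimal sketch of the idea, not the authors' released code: one multinomial logistic probe per board square is fit on residual-stream activations to decode that square's state relative to the current player (empty / mine / yours). All shapes, names, and the synthetic stand-in data are illustrative assumptions; in the real experiment the activations would come from a trained Othello-GPT at a chosen layer.

```python
# Minimal linear-probe sketch (illustrative assumptions, not the paper's code):
# fit one linear classifier per Othello square on residual-stream activations.
# Near-perfect held-out accuracy from a purely linear map would indicate a
# linearly decodable ("linear emergent") world model.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

D_MODEL = 512      # residual-stream width (assumed)
N_SQUARES = 64     # 8x8 Othello board
N_SAMPLES = 2000   # sampled (game, move) positions

rng = np.random.default_rng(0)

# Stand-ins for real data: acts[i] would be the layer-L activation at
# position i; board[i, s] in {0, 1, 2} is the true state of square s
# (empty / current player's piece / opponent's piece).
acts = rng.normal(size=(N_SAMPLES, D_MODEL)).astype(np.float32)
board = rng.integers(0, 3, size=(N_SAMPLES, N_SQUARES))

# Fit one probe per square; track held-out accuracy.
accs = []
for s in range(N_SQUARES):
    X_tr, X_te, y_tr, y_te = train_test_split(
        acts, board[:, s], test_size=0.2, random_state=0
    )
    probe = LogisticRegression(max_iter=200)  # purely linear decision rule
    probe.fit(X_tr, y_tr)
    accs.append(probe.score(X_te, y_te))

print(f"mean per-square probe accuracy: {np.mean(accs):.3f}")
# On random stand-in data this hovers near chance (~1/3); on real
# Othello-GPT activations it would approach 1.0 at the right layer.
```

The causal claim goes one step further than decodability: intervening on the residual stream along a probe-derived direction (adding or subtracting it to flip a square's decoded state) changes which moves the model predicts as legal, which is what "causally steers its decision-making" refers to.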
