- The paper establishes that transformers model the SHO via the matrix exponential method, supported by strong intermediate encoding and causal intervention experiments.
- The researchers developed a framework based on four criteria—predictability, performance correlation, variance explanation, and interventions—to assess transformer computations.
- The findings pave the way for more transparent and reliable AI models by linking interpretable numerical methods to the internal representations in transformer architectures.
The paper presents an insightful analysis of how transformers model physical systems, specifically focusing on the simple harmonic oscillator (SHO). It aims to determine whether transformers use interpretable numerical methods or create complex, human-indecipherable models ("alien physics"). The researchers develop a framework to investigate the intermediates transformers encode when modeling physics, laying out four criteria in the context of in-context linear regression. This framework is then applied to probe the methods transformers use to model the SHO.
Criteria Development Through Linear Regression
The researchers first develop four criteria for assessing whether a transformer uses a method g, working within the simpler in-context linear regression setting (a minimal probing sketch follows the list):
- Intermediate Predictability: Can the intermediate be predicted from hidden states?
- Correlation with Model Performance: Is the intermediate's encoding quality correlated with model performance?
- Variance Explanation: Can the majority of variance in hidden states be explained by the intermediate?
- Interventions: Can interventions on hidden states produce predictable outcomes?
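A minimal sketch of how the first and third criteria might be operationalized, assuming hypothetical arrays `hidden_states` (examples × d_model) and `intermediates` (examples × k); this is an illustration of the probing idea, not the paper's implementation:

```python
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def probe_intermediate(hidden_states, intermediates, alpha=1.0):
    """Criterion 1: can the intermediate be (linearly) predicted from hidden states?"""
    H_tr, H_te, g_tr, g_te = train_test_split(hidden_states, intermediates,
                                              test_size=0.2, random_state=0)
    probe = Ridge(alpha=alpha).fit(H_tr, g_tr)
    return r2_score(g_te, probe.predict(H_te))       # probe R^2 = encoding quality

def variance_explained(hidden_states, intermediates, alpha=1.0):
    """Criterion 3: how much hidden-state variance does the intermediate account for?"""
    H_tr, H_te, g_tr, g_te = train_test_split(hidden_states, intermediates,
                                              test_size=0.2, random_state=0)
    reverse = Ridge(alpha=alpha).fit(g_tr, H_tr)      # intermediate -> hidden state
    return r2_score(H_te, reverse.predict(g_te))      # fraction of hidden-state variance explained
```

A high probe R² speaks to the first criterion; a high reverse-direction R² indicates the intermediate accounts for most of the variance in the hidden states (the third criterion).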
The application of these criteria to linear regression reveals that transformers can encode the linear regression coefficients w linearly, nonlinearly, or not at all, with larger models encoding w more faithfully. This encoding quality is also correlated with improved model performance. Intervention experiments show that transforming w within the hidden states produces the expected changes in the output, providing both weak and strong causal evidence that w is actively used in the transformer's computation.
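The intervention can be sketched as editing a hidden state so that it encodes a transformed w; a hedged illustration, where `decode_w` and `encode_w` stand for trained probe maps (hypothetical names, not the paper's code):

```python
def intervene_on_w(hidden, decode_w, encode_w, transform=lambda w: -w):
    """Patch a hidden state so it encodes transform(w) instead of w.

    decode_w: map estimating the regression weights w from a hidden state (e.g., a trained probe).
    encode_w: map reconstructing the w-carrying component of a hidden state from w.
    Both maps are assumed interfaces for this sketch.
    """
    w_hat = decode_w(hidden)                                    # decoded regression weights
    patched = hidden - encode_w(w_hat) + encode_w(transform(w_hat))
    return patched                                              # run the remaining layers on this state
```

If the model genuinely uses w, running the remaining layers on the patched state should shift the prediction for a query x from roughly w·x toward transform(w)·x; how precisely the output tracks the transformed w distinguishes weak from strong causal evidence.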
Application to the Simple Harmonic Oscillator
The main focus of the paper is then directed toward understanding how transformers model the SHO described by $\ddot{x} + 2\gamma\dot{x} + \omega_0^2 x = 0$, particularly the undamped case where $\gamma = 0$. Several potential numerical methods to model the SHO are considered, including linear multistep, Taylor expansion, and matrix exponential methods. Each method is associated with unique intermediates (e.g., the matrix exponential method uses $e^{A\Delta t}$).
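For concreteness, the matrix exponential method advances the state $(x, \dot{x})$ by multiplying it with the propagator $e^{A\Delta t}$; a minimal numerical sketch of that update (parameter values are arbitrary illustrations):

```python
import numpy as np
from scipy.linalg import expm

def sho_step(x, v, omega0, gamma, dt):
    """Advance the SHO state (x, v) by one step of size dt via the matrix exponential."""
    A = np.array([[0.0, 1.0],
                  [-omega0 ** 2, -2.0 * gamma]])
    propagator = expm(A * dt)                    # the intermediate e^{A dt}
    x_next, v_next = propagator @ np.array([x, v])
    return x_next, v_next

# Undamped example (gamma = 0); for a linear ODE this update is exact for any step size.
x, v = 1.0, 0.0
for _ in range(10):
    x, v = sho_step(x, v, omega0=2.0, gamma=0.0, dt=0.1)
```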
Evaluation Summary
The paper analyzes in depth the transformers trained to predict SHO trajectories, presenting the results systematically against the four criteria:
- Intermediate Encoding: Intermediates from all three candidate methods could be decoded from the model's hidden states, with the matrix exponential intermediates encoded with the highest quality.
- Correlation with Performance: A strong correlation was found between model performance and the quality of intermediate encoding across all methods, particularly for the matrix exponential approach.
- Variance Explanation: The matrix exponential method's intermediates explained the most variance in hidden states, significantly more than the others.
- Intervention Outcomes: Interventions replacing hidden states with synthetic states generated from intermediates demonstrated that the model's predictive behavior was aligned with the matrix exponential method.
The combined evidence strongly supports the conclusion that transformers model the SHO using the matrix exponential method, providing clear correlational and causal evidence for this numerical approach over others.
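A hedged sketch of the synthetic-state intervention described above, with `run_from_layer` standing in for an assumed hook that runs the remaining transformer layers on patched hidden states:

```python
import numpy as np
from sklearn.linear_model import Ridge

def synthetic_state_intervention(hidden_states, intermediates, run_from_layer, targets):
    """Replace hidden states with states generated only from a candidate method's intermediates.

    run_from_layer: assumed callable mapping patched hidden states to model predictions.
    targets: ground-truth next-step trajectory values.
    """
    generator = Ridge(alpha=1.0).fit(intermediates, hidden_states)   # intermediate -> hidden state
    synthetic = generator.predict(intermediates)                     # states carrying only the intermediate
    preds = run_from_layer(synthetic)
    return float(np.mean((preds - targets) ** 2))                    # low error => the intermediate suffices causally
```

Comparing this error across the candidate methods' intermediates is one way to read the paper's result: predictions survive the substitution best when the synthetic states are built from the matrix exponential intermediates.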
Broader Implications and Future Work
The findings have significant implications for mechanistic interpretability in transformers. The established framework provides a robust means to analyze the internal workings of transformers modeling various physical systems. It extends naturally to more complex and higher-dimensional linear systems, and potentially to certain nonlinear systems.
Future work can aim to refine the understanding of how transformers model more complex damped oscillators or hybrid systems with noise and nonlinearities. Understanding these settings could lead to more transparent models with better performance and fewer of the risks associated with the "black-box" nature of current AI systems.
Conclusion
This paper contributes significantly to mechanistic interpretability by showing that transformers use known numerical methods, specifically the matrix exponential method, to model simple harmonic oscillators. The robust framework developed for investigating intermediates in linear regression proves effective in discerning the methods used by transformers in more complex physical tasks. This structured approach, and the insights derived from it, pave the way for deeper understanding and further research into how AI models internalize and compute physical laws. While limitations exist, particularly with damped oscillators, the foundation laid is critical for future explorations in aligning transformers’ computations with human-understandable physics models.