
Spectral Journey: How Transformers Predict the Shortest Path

Published 12 Feb 2025 in cs.LG (arXiv:2502.08794v1)

Abstract: Decoder-only transformers have led to a step-change in the capability of LLMs. However, opinions are mixed as to whether they are really planning or reasoning. One path to progress on this question is to study a model's behavior in a setting with carefully controlled data, and then to interpret the learned representations and reverse-engineer the computation performed internally. We study decoder-only transformer LLMs trained from scratch to predict shortest paths on simple, connected and undirected graphs. In this setting, the representations and dynamics learned by the model are interpretable. We present three major results: (1) Two-layer decoder-only LLMs can learn to predict shortest paths on simple, connected graphs containing up to 10 nodes. (2) Models learn a graph embedding that is correlated with the spectral decomposition of the line graph. (3) Following these insights, we discover a novel approximate path-finding algorithm, Spectral Line Navigator (SLN), that finds shortest paths by greedily selecting nodes in the spectral embedding space of the line graph.

Summary

  • The paper demonstrates that two-layer decoder-only transformers can learn to predict shortest paths on small graphs and analyzes their internal mechanisms.
  • The authors find that models learn a graph embedding in which relationships between edge embeddings correlate with the spectral decomposition of the graph's line graph.
  • Based on these findings, they develop a novel approximate shortest-path algorithm, Spectral Line Navigator (SLN), which operates on the spectral decomposition of the line graph.

This paper explores how decoder-only transformer LLMs learn to predict shortest paths on simple, connected, and undirected graphs. The authors train GPT-style transformers from scratch on this task and then analyze the learned representations and attention mechanisms to understand the underlying algorithms.

The core findings are:

  1. Two-layer transformers can learn shortest paths: Two-layer decoder-only transformers can successfully predict shortest paths on graphs with up to 10 nodes. Models with more attention heads learn the task faster, although even a single-head model can perform the task.
  2. Spectral embedding: The models learn a graph embedding in which the relationships between edge embeddings correlate with the spectral decomposition of the line graph of the input graph. Specifically, the principal components of the edge embeddings correlate with the eigenvector coefficients of the line graph Laplacian.
  3. Novel shortest path algorithm: Based on these findings, the authors develop a novel approximate shortest-path algorithm, Spectral Line Navigator (SLN). The algorithm uses the spectral decomposition of the line graph to embed edges, then greedily selects nodes to construct the shortest path based on distances in this embedding space (see the sketch after this list).
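
To make the greedy procedure concrete, here is a minimal sketch of an SLN-style path finder. It is not the authors' exact pseudocode: the number of spectral coordinates k and the specific greedy criterion (follow the incident edge whose embedding is closest to an edge incident to the target) are assumptions made for illustration.

```python
# Minimal sketch of a Spectral-Line-Navigator-style greedy path finder.
# Assumptions (not taken from the paper's pseudocode): edges are embedded via
# eigenvectors of the normalized Laplacian of the line graph, and each step
# moves along the incident edge whose embedding is closest to an edge
# incident to the target node. The parameter k is illustrative.
import networkx as nx
import numpy as np

def line_graph_embedding(G, k=4):
    L = nx.line_graph(G)                         # nodes of L are edges of G
    edges = list(L.nodes())
    lap = nx.normalized_laplacian_matrix(L, nodelist=edges).toarray()
    _, vecs = np.linalg.eigh(lap)                # eigenvectors by eigenvalue
    emb = {e: vecs[i, 1:k + 1] for i, e in enumerate(edges)}  # skip trivial
    emb.update({(v, u): x for (u, v), x in list(emb.items())})  # undirected
    return emb

def sln_path(G, source, target, k=4):
    emb = line_graph_embedding(G, k)
    target_embs = [emb[e] for e in G.edges(target)]
    path, current = [source], source
    for _ in range(G.number_of_nodes()):
        if current == target:
            return path
        # Greedy step: follow the incident edge closest, in embedding space,
        # to any edge touching the target.
        nxt = min(G.edges(current),
                  key=lambda e: min(np.linalg.norm(emb[e] - t)
                                    for t in target_embs))
        current = nxt[1]
        path.append(current)
    return path                                  # may be incomplete

# Example usage:
# print(sln_path(nx.cycle_graph(5), 0, 2))
```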

Key experiments and analysis:

  • Training setup: The models are trained on graphs represented as sequences of edges and nodes, with control tokens delineating the different parts of the input. The data includes graphs with 3 to 10 nodes, and the training and test sets contain disjoint graphs.
  • Accuracy measurement: The paper evaluates the probability of generating shortest paths, considering that multiple shortest paths may exist. They also analyze failure cases, linking them to the presence of many near-optimal paths.
  • Attention analysis: They identify attention heads in the second layer that attend to edges containing the current node (h_current) and the target node (h_target) during path generation.
  • Line graph Laplacian correlation: The paper demonstrates a strong correlation between the principal components of the edge control token embeddings and the eigenvector coefficients of the normalized Laplacian of the line graph (a sketch of such a check follows this list).
  • Algorithm implementation: They implement and evaluate Spectral Line Navigator (SLN) on the test set, achieving high accuracy.
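
As an illustration of the correlation check described above, the following sketch compares principal components of a hypothetical matrix of edge-token embeddings against eigenvectors of the normalized line-graph Laplacian. The function name, input format, and number of components are assumptions, not the paper's actual analysis pipeline.

```python
# Hypothetical illustration of the correlation analysis: edge_embeddings is
# assumed to be a (num_edges, d) array with one row per line-graph node,
# in list(nx.line_graph(G).nodes()) order.
import networkx as nx
import numpy as np

def pc_laplacian_correlation(G, edge_embeddings, n_components=3):
    L = nx.line_graph(G)
    lap = nx.normalized_laplacian_matrix(L).toarray()
    _, eigvecs = np.linalg.eigh(lap)             # columns ordered by eigenvalue

    # Principal components of the centered embeddings via SVD.
    X = edge_embeddings - edge_embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    pcs = X @ vt[:n_components].T                # PC scores, one row per edge

    # |Pearson correlation| between each PC and each nontrivial eigenvector.
    return np.array([[abs(np.corrcoef(pcs[:, i], eigvecs[:, j + 1])[0, 1])
                      for j in range(n_components)]
                     for i in range(n_components)])
```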

Overall, the paper provides insights into how transformers can learn to solve graph problems and proposes a novel path-finding algorithm grounded in spectral graph theory and mechanistic interpretability. The work combines experimental results with detailed mechanistic analysis to reverse-engineer the computations performed by the models. The paper also includes ablation studies confirming the importance of the hidden dimension and maximum edge count for learning the task.
