- The paper introduces EK-FAC as a scalable method to compute inverse-Hessian-vector products for influence functions on LLMs.
- It employs TF-IDF filtering and query batching to efficiently manage gradient computations across vast training data.
- It attributes influence at the layer and token level, revealing increasingly abstract generalization and cross-lingual knowledge transfer as models scale.
Overview of the Paper "Studying LLM Generalization with Influence Functions"
The paper "Studying LLM Generalization with Influence Functions" investigates the use and scaling of influence functions to analyze the generalization behavior of LLMs. Influence functions are employed as a method to elucidate which training data significantly impact a model's behavior. Despite their previous success with smaller models, they present challenges when scaled to LLMs primarily due to the computational complexity involved in computing inverse-Hessian-vector products (IHVPs). The authors propose a new approach using the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation that enables the application of influence functions to LLMs with up to 52 billion parameters.
Key Contributions and Techniques
- Use of EK-FAC: The Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation to the Gauss-Newton Hessian makes IHVP computation scalable, removing the main computational bottleneck of influence functions at this scale. The authors show it matches the accuracy of traditional iterative IHVP methods at a fraction of the computational cost (a sketch of the per-layer factorization follows this list).
- Gradient Computation Strategies: To keep the cost of computing training-data gradients manageable, two strategies are proposed: TF-IDF filtering, which shrinks the candidate set by ranking training sequences by their token-level similarity to the query, and query batching, which stores query gradients in low-rank form so that a single pass over the candidate sequences can serve many queries at once (a schematic pipeline follows this list).
- Layerwise and Tokenwise Attribution: The paper further decomposes influence over individual network layers and over individual tokens within a sequence, showing how influence is distributed across the model's architecture and revealing different generalization patterns at different depths.
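The per-layer factorization behind the first bullet can be summarized as follows. This is a paraphrase of EK-FAC as applied to a layer's weight matrix; vec-ordering and Kronecker conventions may differ from the paper's exact equations.

```latex
% EK-FAC structure for one layer's weight matrix W_l (paraphrase).
% A = E[a a^T] is the covariance of layer inputs, S = E[g g^T] the covariance of
% pre-activation pseudo-gradients, with eigendecompositions A = Q_A L_A Q_A^T
% and S = Q_S L_S Q_S^T.
\[
  G_\ell \;\approx\; (Q_A \otimes Q_S)\,\operatorname{diag}(\Lambda)\,(Q_A \otimes Q_S)^{\top},
  \qquad
  \Lambda_{ii} \;=\; \mathbb{E}\!\left[\big((Q_A \otimes Q_S)^{\top}\operatorname{vec}(\nabla_{W_\ell} L)\big)_i^{2}\right],
\]
so that the damped IHVP needed for influence scores reduces to cheap per-layer operations:
\[
  (G_\ell + \lambda I)^{-1} v
  \;\approx\;
  (Q_A \otimes Q_S)\,\operatorname{diag}\!\big((\Lambda_{ii} + \lambda)^{-1}\big)\,(Q_A \otimes Q_S)^{\top} v .
\]
```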
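The end-to-end pipeline implied by these bullets can be sketched as follows. This is a minimal, hypothetical illustration: `ekfac.inverse_hvp` is a placeholder for whatever EK-FAC implementation is available, and the loss-function arguments are assumptions, not APIs from the paper.

```python
# Minimal sketch of the influence pipeline: TF-IDF candidate filtering, one
# EK-FAC IHVP per query, then a gradient dot product per candidate sequence.
# `ekfac.inverse_hvp` is a placeholder for an approximation of
# (G + damping * I)^{-1} v, not an API from the paper.
import torch
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def tfidf_filter(query_text, corpus_texts, top_k=10_000):
    """Step 1: shrink the candidate set to the sequences most similar to the query."""
    vec = TfidfVectorizer()
    corpus_mat = vec.fit_transform(corpus_texts)
    query_vec = vec.transform([query_text])
    sims = cosine_similarity(query_vec, corpus_mat).ravel()
    return sims.argsort()[::-1][:top_k]  # indices of the most similar sequences


def flat_grad(model, loss):
    """Flattened parameter gradient of a scalar loss."""
    model.zero_grad()
    loss.backward()
    return torch.cat([p.grad.reshape(-1) for p in model.parameters()
                      if p.grad is not None])


def influence_scores(model, ekfac, query_loss_fn, train_loss_fn, query, candidates):
    """Steps 2-3: a single IHVP for the query, then one dot product per candidate."""
    query_ihvp = ekfac.inverse_hvp(flat_grad(model, query_loss_fn(model, query)))
    return [torch.dot(query_ihvp, flat_grad(model, train_loss_fn(model, c))).item()
            for c in candidates]
```

Query batching changes only the last step: the preconditioned gradients for a whole batch of queries are stored in low-rank form, so a single pass over the candidate gradients serves all of them.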
Empirical Findings
- Sparsity of Influences: Influence over training sequences is heavy-tailed, roughly following a power law: a small number of sequences contribute far more to a given prediction than the rest. Even so, no single sequence accounts for most of the influence; it is spread across many sequences, suggesting generalization rather than mere memorization (a small diagnostic sketch follows this list).
- Patterns of Generalization: Larger models show more abstract generalization, with their most influential training sequences relating to the query at a conceptual or thematic level, whereas in smaller models influence largely tracks surface-level token overlap. This pattern holds across diverse tasks, including role-playing behavior, math, and coding.
- Cross-lingual Generalization: In larger models, training sequences in one language noticeably influence queries in other languages, showing that cross-lingual knowledge transfer strengthens with scale.
- Layerwise Influence Patterns: Although influence is, on average, spread fairly evenly across layers, it concentrates differently for different queries: layers near the input and output tend to capture surface-level, token-specific patterns, while middle layers capture more abstract, semantic relationships, consistent with findings in the mechanistic interpretability literature.
- Sensitivity to Word Order: A surprising limitation uncovered by the analysis is sensitivity to word order: training sequences are highly influential only when key phrases appear in the same order as in the query, and flipping that order largely eliminates the influence, indicating limited generalization across word-order changes.
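To make the heavy-tailed claim in the first bullet concrete, here is a small, hypothetical diagnostic one might run over a vector of per-sequence influence scores; it is illustrative only, not an analysis from the paper.

```python
# Illustrative diagnostics for a heavy-tailed influence distribution
# (hypothetical; not code from the paper).
import numpy as np


def top_share(scores, top_frac=0.01):
    """Fraction of total positive influence held by the top `top_frac` of sequences."""
    s = np.sort(np.clip(np.asarray(scores, dtype=float), 0.0, None))[::-1]
    k = max(1, int(len(s) * top_frac))
    return s[:k].sum() / s.sum()


def loglog_slope(scores):
    """Slope of log(score) vs. log(rank); an approximately linear relationship
    on this scale is consistent with a power-law tail."""
    s = np.sort(np.asarray(scores, dtype=float))[::-1]
    s = s[s > 0]
    ranks = np.arange(1, len(s) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(s), 1)
    return slope
```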
Practical and Theoretical Implications
The advancements presented in this research enhance our ability to understand LLMs' generalization properties, offering a valuable tool for both theoretical exploration of model behaviors and practical applications such as debugging model outputs or identifying potential biases emerging from specific training data. By identifying which sequences significantly influence model outputs and exposing limitations such as sensitivity to word order, practitioners can better align LLMs with desired behaviors, especially in safety-critical applications.
Future Directions
The methods and findings point toward several future research directions, such as applying influence functions to fine-tuned models, extending the word-order sensitivity analysis to broader generalization scenarios, and covering the full fine-tuning process used to align AI systems with human values. Combining token- and layer-level attributions with mechanistic interpretability methods could also lead to a more holistic understanding of how LLMs work.
In conclusion, the paper significantly advances the practical implementation of influence functions for LLMs, offering both a scalable computational approach and meaningful insights into the generalization behaviors of these models, paving the way for more transparent and aligned AI systems.