- The paper introduces TRAK, a method that uses Taylor expansion and random projection to efficiently attribute deep model predictions to training data.
- It combines a linear approximation of the model, based on the empirical neural tangent kernel, with a one-step Newton approximation to compute influence scores.
- Empirical tests on image classifiers and language models demonstrate TRAK's state-of-the-art trade-offs between efficacy and computational efficiency.
Attributing Model Behavior with TRAK: A Computationally Tractable Approach
The paper "TRAK: Attributing Model Behavior at Scale" by Park et al. introduces an innovative method called TRAK (Tracing with the Randomly-projected After Kernel) for the task of model behavior attribution. The core goal of this method is to effectively trace model predictions back to their training data, a challenge that has persisted due to the trade-offs between computational feasibility and efficacy in existing approaches.
Key Contributions
TRAK provides a significant advancement by offering a method that is both computationally efficient and effective, particularly for large-scale, differentiable models such as deep neural networks. By exploiting a linear approximation of the model's behavior through the empirical neural tangent kernel (eNTK) and utilizing random projections, the authors craft an attribution method that circumvents the need for training a prohibitively large number of models.
Methodology
The TRAK methodology involves several sophisticated elements:
- Linearization: The method begins by linearizing the model output function with a first-order Taylor expansion around the final model parameters. This replaces the non-linear model with a linear approximation in gradient space, in line with the empirical neural tangent kernel view of the network.
- Dimensionality Reduction: To keep the linear approximation computationally manageable, the per-example gradients are multiplied by a random projection matrix; by the Johnson-Lindenstrauss lemma, this preserves the relevant inner products while drastically reducing dimensionality.
- Influence Estimation: TRAK adapts the one-step Newton approximation to compute influence scores, which estimate how removing a particular training example would change the model's predictions.
- Ensembling: The scores are averaged over several models, each trained on a random subset of the data, which improves robustness to the stochasticity of training.
- Sparsification: Finally, a soft-thresholding step zeroes out small scores so that the attributions concentrate on the most influential training points (a minimal code sketch of these steps follows this list).
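As a concrete illustration of the first two steps, the sketch below turns per-example gradients into low-dimensional linearized features via a random Gaussian projection. This is a minimal numpy sketch under stated assumptions, not the authors' implementation: it assumes the per-example gradients of the model output function have already been computed and flattened into a matrix, and the function name and the choice of a Gaussian projection are illustrative.

```python
import numpy as np

def project_gradients(grads: np.ndarray, proj_dim: int, seed: int = 0) -> np.ndarray:
    """Reduce per-example gradients of shape (n, p) to features of shape (n, proj_dim).

    A random Gaussian projection approximately preserves inner products
    (Johnson-Lindenstrauss), so the projected gradients can stand in for the
    full gradients in the linearized, eNTK-style view of the model.
    """
    n, p = grads.shape
    rng = np.random.default_rng(seed)
    proj = rng.normal(size=(p, proj_dim)) / np.sqrt(proj_dim)
    return grads @ proj
```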
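The remaining steps, the one-step-Newton-style influence estimate, ensembling over models trained on random subsets, and soft-thresholding, can then be sketched as below. This is again a simplified illustration rather than the paper's exact estimator: the small ridge term, the averaging of per-model scores (rather than averaging the estimator's factors separately), and all function names are assumptions made for readability.

```python
def trak_scores_single_model(train_feats, test_feats, correct_probs):
    """Attribution scores for one trained model, shape (n_test, n_train).

    train_feats   : (n_train, k) projected training gradients
    test_feats    : (n_test, k)  projected test gradients
    correct_probs : (n_train,) predicted probability of the correct label;
                    (1 - p) acts as the one-step-Newton-style reweighting
    """
    k = train_feats.shape[1]
    # (Phi^T Phi)^-1, with a small ridge term added here for numerical stability
    xtx_inv = np.linalg.inv(train_feats.T @ train_feats + 1e-6 * np.eye(k))
    return (test_feats @ xtx_inv @ train_feats.T) * (1.0 - correct_probs)[None, :]


def trak_scores(per_model_inputs, threshold=0.0):
    """Average scores over models trained on random data subsets, then
    optionally soft-threshold to sparsify the attributions."""
    scores = np.mean(
        [trak_scores_single_model(*inputs) for inputs in per_model_inputs],
        axis=0,
    )
    if threshold > 0:
        # soft-thresholding: shrink toward zero and clip small scores to exactly zero
        scores = np.sign(scores) * np.maximum(np.abs(scores) - threshold, 0.0)
    return scores
```

Averaging over independently trained models is what makes the cheap linear approximation usable in practice: any single model's scores are noisy, but the noise largely cancels in the ensemble.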
Empirical Validation
The authors validate TRAK across a range of models and tasks, including image classifiers trained on CIFAR and ImageNet, language models (BERT and mT5), and vision-language models (CLIP). The results demonstrate that TRAK attains state-of-the-art trade-offs between efficacy and computational efficiency, achieving attribution quality comparable to far more computationally intensive methods such as datamodels while requiring only a fraction of the computational resources.
Implications and Future Directions
The TRAK methodology holds several implications for research and practice. In practical terms, it facilitates more interpretable AI systems by providing a scalable means to understand the influences of training data on model behavior. Theoretically, it opens avenues for further exploration into the utility of approximations provided by kernel methods in deep learning settings, particularly in the context of empirical neural tangent kernels.
Future investigations might extend TRAK to more complex model architectures and apply it in domains requiring high explainability, such as healthcare and finance. Additionally, the robustness of the linear approximation, especially in non-convex and highly dynamic learning environments, remains an open question.
In conclusion, TRAK represents a significant stride in data attribution capabilities, providing a pathway to interpretable and computationally feasible model insight, while also prompting new questions and research directions in the field of model interpretability and data-driven AI ethics.