Fast Training Dataset Attribution via In-Context Learning (2408.11852v2)
Abstract: We investigate the use of in-context learning and prompt engineering to estimate the contributions of training data to the outputs of instruction-tuned LLMs. We propose two novel approaches: (1) a similarity-based approach that measures the difference between LLM outputs with and without the provided context, and (2) a mixture distribution model approach that frames the problem of identifying contribution scores as a matrix factorization task. Our empirical comparison demonstrates that the mixture model approach is more robust to retrieval noise in in-context learning, providing a more reliable estimation of data contributions.
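The abstract only sketches the two approaches at a high level, so the following is a minimal illustrative sketch, not the paper's actual method. It assumes (1) that the similarity-based approach can be approximated by comparing output embeddings with and without each candidate document in context, and (2) that the mixture model reduces to recovering convex weights over per-document output distributions; all function and variable names (`similarity_scores`, `mixture_weights`, `D`, `p`) are hypothetical.

```python
# Hypothetical sketch of the two attribution approaches described in the
# abstract; names, shapes, and the optimization routine are assumptions.
import numpy as np


def similarity_scores(base_output_emb, contextual_output_embs):
    """Approach 1 (similarity-based): score each candidate training document
    by how the model's output shifts when that document is supplied as
    in-context evidence, measured here as cosine similarity between the
    no-context output embedding and each with-context output embedding.

    base_output_emb: (d,) embedding of the output generated without context.
    contextual_output_embs: (n, d) embeddings, one per candidate document.
    """
    base = base_output_emb / np.linalg.norm(base_output_emb)
    ctx = contextual_output_embs / np.linalg.norm(
        contextual_output_embs, axis=1, keepdims=True
    )
    # One of several plausible scoring conventions: a larger shift away from
    # the no-context output suggests a larger contribution of that document.
    return 1.0 - ctx @ base


def mixture_weights(p, D, iters=500, lr=0.1):
    """Approach 2 (mixture model): treat the observed output distribution p
    (length v, over the vocabulary) as a convex mixture of per-document
    distributions D (n x v), and recover the mixture weights w by projected
    gradient descent on ||w @ D - p||^2 over the probability simplex.
    The paper frames this as matrix factorization; this per-query
    least-squares decomposition is a simplified stand-in.
    """
    n = D.shape[0]
    w = np.full(n, 1.0 / n)
    for _ in range(iters):
        grad = 2.0 * (w @ D - p) @ D.T  # gradient of the squared error
        w = np.clip(w - lr * grad, 0.0, None)
        w /= w.sum()  # crude renormalization, not an exact simplex projection
    return w  # w[i] ~ estimated contribution score of training document i
```

In this reading, the mixture weights are re-estimated jointly across many queries in the factorization view, which is plausibly what lends the approach its robustness to retrieval noise: a spuriously retrieved document receives a near-zero weight rather than distorting a pairwise similarity score.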