Fast Training Dataset Attribution via In-Context Learning (2408.11852v2)
Abstract: We investigate the use of in-context learning and prompt engineering to estimate the contributions of training data to the outputs of instruction-tuned LLMs. We propose two novel approaches: (1) a similarity-based approach that measures the difference between LLM outputs with and without the provided context, and (2) a mixture distribution model approach that frames the problem of identifying contribution scores as a matrix factorization task. Our empirical comparison demonstrates that the mixture model approach is more robust to retrieval noise in in-context learning, providing a more reliable estimation of data contributions.
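The abstract only sketches the two approaches at a high level, so the following is a minimal illustrative sketch, not the paper's actual method. It assumes (1) that the similarity-based approach can be approximated by comparing output embeddings with and without each candidate document in context, and (2) that the mixture model reduces to recovering convex weights over per-document output distributions; all function and variable names (`similarity_scores`, `mixture_weights`, `D`, `p`) are hypothetical.

```python
# Hypothetical sketch of the two attribution approaches described in the
# abstract; names, shapes, and the optimization routine are assumptions.
import numpy as np


def similarity_scores(base_output_emb, contextual_output_embs):
    """Approach 1 (similarity-based): score each candidate training document
    by how the model's output shifts when that document is supplied as
    in-context evidence, measured here as cosine similarity between the
    no-context output embedding and each with-context output embedding.

    base_output_emb: (d,) embedding of the output generated without context.
    contextual_output_embs: (n, d) embeddings, one per candidate document.
    """
    base = base_output_emb / np.linalg.norm(base_output_emb)
    ctx = contextual_output_embs / np.linalg.norm(
        contextual_output_embs, axis=1, keepdims=True
    )
    # One of several plausible scoring conventions: a larger shift away from
    # the no-context output suggests a larger contribution of that document.
    return 1.0 - ctx @ base


def mixture_weights(p, D, iters=500, lr=0.1):
    """Approach 2 (mixture model): treat the observed output distribution p
    (length v, over the vocabulary) as a convex mixture of per-document
    distributions D (n x v), and recover the mixture weights w by projected
    gradient descent on ||w @ D - p||^2 over the probability simplex.
    The paper frames this as matrix factorization; this per-query
    least-squares decomposition is a simplified stand-in.
    """
    n = D.shape[0]
    w = np.full(n, 1.0 / n)
    for _ in range(iters):
        grad = 2.0 * (w @ D - p) @ D.T  # gradient of the squared error
        w = np.clip(w - lr * grad, 0.0, None)
        w /= w.sum()  # crude renormalization, not an exact simplex projection
    return w  # w[i] ~ estimated contribution score of training document i
```

In this reading, the mixture weights are re-estimated jointly across many queries in the factorization view, which is plausibly what lends the approach its robustness to retrieval noise: a spuriously retrieved document receives a near-zero weight rather than distorting a pairwise similarity score.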