LLM Attributor: Interactive Visual Attribution for LLM Generation (2404.01361v1)
Abstract: While LLMs have shown remarkable capability to generate convincing text across diverse domains, concerns about their potential risks have highlighted the importance of understanding the rationale behind text generation. We present LLM Attributor, a Python library that provides interactive visualizations for training data attribution of an LLM's text generation. Our library offers a new way to quickly attribute an LLM's text generation to training data points to inspect model behaviors, enhance its trustworthiness, and compare model-generated text with user-provided text. We describe the visual and interactive design of our tool and highlight usage scenarios for LLaMA2 models fine-tuned with two different datasets: online articles about recent disasters and finance-related question-answer pairs. Thanks to LLM Attributor's broad support for computational notebooks, users can easily integrate it into their workflow to interactively visualize attributions of their models. For easier access and extensibility, we open-source LLM Attributor at https://github.com/poloclub/LLM-Attribution. The video demo is available at https://youtu.be/mIG2MDQKQxM.
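To make the idea of training data attribution concrete, the sketch below shows a generic gradient-similarity scoring scheme (in the spirit of TracIn-style influence estimation): each training example is scored by how well its loss gradient aligns with the gradient at the point being explained. This is a minimal illustration only, using a toy PyTorch model and made-up data; it does not reflect LLM Attributor's actual API, and all names here are illustrative assumptions.

```python
# Minimal, generic sketch of gradient-similarity training-data attribution.
# NOT LLM Attributor's API: the toy model, data, and helpers are assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

# A tiny regression model and random "training set" stand in for a
# fine-tuned LLM and its fine-tuning corpus; the attribution idea is the same.
X_train = torch.randn(8, 4)
y_train = torch.randn(8, 1)
model = nn.Linear(4, 1)
loss_fn = nn.MSELoss()

def loss_grad(x, y):
    """Flattened gradient of the loss at a single example."""
    loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
    grads = torch.autograd.grad(loss, model.parameters())
    return torch.cat([g.reshape(-1) for g in grads])

# The "generated output" to explain plays the role of a test point.
x_test = torch.randn(4)
y_test = torch.randn(1)
g_test = loss_grad(x_test, y_test)

# Score each training example by the dot product between its loss gradient
# and the test-point gradient; large positive scores suggest examples that
# pushed the model toward this output.
scores = torch.tensor([g_test @ loss_grad(x, y) for x, y in zip(X_train, y_train)])
topk = torch.topk(scores, k=3)
for rank, (idx, s) in enumerate(zip(topk.indices.tolist(), topk.values.tolist()), 1):
    print(f"#{rank}: training example {idx}, influence score {s:.4f}")
```

In practice, attribution tools apply this kind of scoring to per-example gradients of a fine-tuned model and then surface the top-ranked training documents for interactive inspection.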