Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention (2312.15033v1)

Published 22 Dec 2023 in cs.CL and cs.AI

Abstract: LLMs have achieved unprecedented breakthroughs in various natural language processing domains. However, the enigmatic "black-box" nature of LLMs remains a significant challenge for interpretability, hampering transparent and accountable applications. While past approaches, such as attention visualization, pivotal subnetwork extraction, and concept-based analyses, offer some insight, they often focus on either local or global explanations within a single dimension, occasionally falling short in providing comprehensive clarity. In response, we propose a novel methodology anchored in sparsity-guided techniques, aiming to provide a holistic interpretation of LLMs. Our framework, termed SparseCBM, innovatively integrates sparsity to elucidate three intertwined layers of interpretation: input, subnetwork, and concept levels. In addition, the newly introduced dimension of interpretable inference-time intervention facilitates dynamic adjustments to the model during deployment. Through rigorous empirical evaluations on real-world datasets, we demonstrate that SparseCBM delivers a profound understanding of LLM behaviors, setting it apart in both interpreting and ameliorating model inaccuracies. Codes are provided in supplements.
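
To make the abstract's high-level description concrete, below is a minimal, illustrative sketch (not the authors' released SparseCBM code) of a concept-bottleneck classification head with an L1 sparsity penalty and a simple inference-time intervention that clamps a chosen concept activation. The encoder features, dimensions, concept index, and helper names are placeholder assumptions for illustration only.

```python
# A minimal sketch (NOT the authors' SparseCBM implementation) of the idea in the
# abstract: a concept-bottleneck head on top of pooled encoder features, a sparsity
# penalty on the concept projection, and an inference-time intervention that clamps
# a chosen concept activation. All dimensions and indices are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConceptBottleneckHead(nn.Module):
    def __init__(self, hidden_dim: int, num_concepts: int, num_labels: int):
        super().__init__()
        self.to_concepts = nn.Linear(hidden_dim, num_concepts)  # features -> concept logits
        self.to_labels = nn.Linear(num_concepts, num_labels)    # concepts -> task logits

    def forward(self, pooled, concept_override=None):
        concept_acts = torch.sigmoid(self.to_concepts(pooled))
        if concept_override:  # inference-time intervention on concept activations
            concept_acts = concept_acts.clone()
            for idx, value in concept_override.items():
                concept_acts[:, idx] = value
        return self.to_labels(concept_acts), concept_acts


def l1_sparsity(head: ConceptBottleneckHead, lam: float = 1e-4) -> torch.Tensor:
    # One common way to inject sparsity: an L1 penalty on the concept projection,
    # which prunes feature-to-concept connections so the mapping is easier to read.
    # The paper's actual sparsity mechanism may differ in detail.
    return lam * head.to_concepts.weight.abs().sum()


# Toy usage: random features stand in for pooled LLM encoder outputs.
head = ConceptBottleneckHead(hidden_dim=768, num_concepts=16, num_labels=2)
pooled = torch.randn(4, 768)
labels = torch.randint(0, 2, (4,))

logits, concepts = head(pooled)
loss = F.cross_entropy(logits, labels) + l1_sparsity(head)
loss.backward()

# Intervene at deployment time: force concept 3 fully "on" and re-check the prediction.
with torch.no_grad():
    fixed_logits, _ = head(pooled, concept_override={3: 1.0})
```

The last step mirrors the abstract's notion of interpretable inference-time intervention: once concepts are human-readable, a deployed model's prediction can be adjusted by editing concept activations rather than retraining.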

Authors (4)
  1. Zhen Tan
  2. Tianlong Chen
  3. Zhenyu Zhang
  4. Huan Liu
Citations (12)