
Data Science with LLMs and Interpretable Models (2402.14474v1)

Published 22 Feb 2024 in cs.LG and cs.CL

Abstract: Recent years have seen important advances in the building of interpretable models, machine learning models that are designed to be easily understood by humans. In this work, we show that LLMs are remarkably good at working with interpretable models, too. In particular, we show that LLMs can describe, interpret, and debug Generalized Additive Models (GAMs). Combining the flexibility of LLMs with the breadth of statistical patterns accurately described by GAMs enables dataset summarization, question answering, and model critique. LLMs can also improve the interaction between domain experts and interpretable models, and generate hypotheses about the underlying phenomenon. We release https://github.com/interpretml/TalkToEBM as an open-source LLM-GAM interface.
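A GAM models the target as a sum of per-feature shape functions, g(E[y]) = beta_0 + f_1(x_1) + ... + f_k(x_k), which is what makes the LLM pairing practical: each fitted f_i can be serialized as a small table of (bin, contribution) pairs and handed to the model as text. The sketch below illustrates that idea only; it is not the released TalkToEBM API. It fits an Explainable Boosting Machine with the interpret package and builds a text prompt from one shape function. The helper shape_function_as_text and the prompt wording are assumptions for illustration, and the exact interpret accessor names (explain_global().data(i), term_names_) may vary across library versions.

import json
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.datasets import load_breast_cancer

# Fit an Explainable Boosting Machine, a tree-based GAM of the form
# g(E[y]) = beta_0 + f_1(x_1) + ... + f_k(x_k).
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
ebm = ExplainableBoostingClassifier()
ebm.fit(X, y)

def shape_function_as_text(ebm, term_index, max_bins=16):
    """Serialize one fitted shape function f_i(x_i) as a compact JSON table.
    Illustrative helper, not part of TalkToEBM."""
    data = ebm.explain_global().data(term_index)  # per-term bins and contributions
    # Pair each bin label (or left bin edge) with its contribution to the log-odds.
    pairs = [[str(name), float(score)]
             for name, score in zip(data["names"], data["scores"])][:max_bins]
    return json.dumps({"feature": ebm.term_names_[term_index], "bins": pairs})

prompt = (
    "Below is one component of a Generalized Additive Model, given as "
    "(bin, contribution to log-odds) pairs. Describe the pattern, point out "
    "anything surprising, and suggest a hypothesis about the underlying data.\n\n"
    + shape_function_as_text(ebm, term_index=0)
)
# `prompt` can now be sent to any chat-completion API; the paper's released
# interface packages this kind of graph-to-text serialization and prompting.
print(prompt)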
