Interpretable Cross-Examination Technique (ICE-T): Using highly informative features to boost LLM performance (2405.06703v1)
Abstract: In this paper, we introduce the Interpretable Cross-Examination Technique (ICE-T), a novel approach that leverages structured multi-prompt techniques with LLMs to improve classification performance over zero-shot and few-shot methods. In domains where interpretability is crucial, such as medicine and law, standard models often fall short due to their "black-box" nature. ICE-T addresses these limitations with a series of generated prompts that let an LLM approach the problem from multiple directions. The LLM's responses are then converted into numerical feature vectors and processed by a traditional classifier. This method not only maintains high interpretability but also allows smaller, less capable models to match or exceed the performance of larger, more advanced models under zero-shot conditions. We demonstrate the effectiveness of ICE-T across a diverse set of data sources, including medical records and legal documents, consistently surpassing the zero-shot baseline on classification metrics such as the F1 score. Our results indicate that ICE-T can improve both the performance and transparency of AI applications in complex decision-making environments.
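The pipeline the abstract describes (probe prompts → LLM answers → numerical feature vectors → traditional classifier) can be made concrete with a short sketch. The following is a minimal, hypothetical illustration, not the paper's implementation: the probe questions, the `ask` callable, and the choice of a shallow decision tree are all placeholder assumptions standing in for whatever prompt generation, LLM API, and classifier the method actually uses.

```python
from typing import Callable
from sklearn.tree import DecisionTreeClassifier

# Hypothetical yes/no probe questions; the paper generates its prompts
# automatically, so treat these as illustrative placeholders.
PROBES = [
    "Does the record mention a history of diabetes? Answer yes or no.",
    "Does the record indicate the patient is over 18? Answer yes or no.",
    "Does the record mention current medication use? Answer yes or no.",
]

def featurize(document: str, ask: Callable[[str], str]) -> list[int]:
    """Turn one document into an interpretable binary feature vector.

    `ask` is any function that sends a prompt to an LLM and returns its
    reply, e.g. a thin wrapper around a chat-completion API. The exact
    prompt format and answer parsing here are assumptions.
    """
    answers = [ask(f"{q}\n\nDocument:\n{document}") for q in PROBES]
    return [1 if a.strip().lower().startswith("yes") else 0 for a in answers]

def train_icet(documents: list[str], labels: list[int],
               ask: Callable[[str], str]) -> DecisionTreeClassifier:
    # A small decision tree keeps the final decision auditable: every
    # split corresponds to a human-readable yes/no probe answer.
    X = [featurize(d, ask) for d in documents]
    clf = DecisionTreeClassifier(max_depth=3, random_state=0)
    clf.fit(X, labels)
    return clf
```

Because each feature maps to a human-readable question, a prediction can be traced back to specific LLM answers, which is the interpretability property the abstract emphasizes; the same feature vectors could feed any conventional classifier.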
Authors: Goran Muric, Ben Delay, Steven Minton