ELAD: Explanation-Guided Large Language Models Active Distillation (2402.13098v1)
Abstract: The deployment and application of LLMs is hindered by their memory inefficiency, computational demands, and the high costs of API inference. Traditional distillation methods, which transfer the capabilities of LLMs to smaller models, often fail to determine whether the knowledge has been sufficiently transferred, potentially resulting in high costs or incomplete distillation. In this paper, we propose an Explanation-Guided LLMs Active Distillation (ELAD) framework that employs an active learning strategy to optimize the balance between annotation costs and model performance. To improve sample selection efficiency, we introduce an explanation-guided sample selection method that identifies samples that challenge the student model's reasoning by exploiting uncertainties in its explanation steps. Additionally, we present a customized LLM-annotated explanation revision technique in which the teacher model detects and corrects flaws in the student model's reasoning. Our experiments across various reasoning datasets demonstrate that our framework significantly enhances the efficiency of LLM knowledge distillation.
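The abstract's core selection idea (flagging samples whose explanation steps are uncertain, then sending only those to the teacher for annotation) can be illustrated with a minimal sketch. This is not the paper's actual algorithm: the function names and the use of answer-entropy across sampled reasoning chains as the uncertainty signal are illustrative assumptions here.

```python
import math
from collections import Counter

def explanation_uncertainty(answers):
    """Entropy over final answers from several sampled reasoning chains.

    High entropy means the student's step-by-step explanations disagree,
    marking the sample as hard and worth sending to the teacher LLM.
    (Illustrative proxy, not ELAD's exact uncertainty measure.)"""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def select_for_annotation(pool, k):
    """pool: list of (sample_id, sampled_answers); return the k most uncertain."""
    scored = sorted(pool, key=lambda p: explanation_uncertainty(p[1]), reverse=True)
    return [sample_id for sample_id, _ in scored[:k]]

pool = [
    ("q1", ["7", "7", "7", "7"]),        # consistent chains -> low uncertainty
    ("q2", ["3", "5", "3", "8"]),        # divergent chains  -> high uncertainty
    ("q3", ["yes", "no", "yes", "no"]),  # split chains      -> medium uncertainty
]
print(select_for_annotation(pool, 2))  # -> ['q2', 'q3']
```

Only the selected samples would then be annotated by the teacher model, which is what lets the framework trade annotation cost against student performance.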
- Yifei Zhang
- Bo Pan
- Chen Ling
- Yuntong Hu
- Liang Zhao