Tuning-Free Accountable Intervention for LLM Deployment -- A Metacognitive Approach (2403.05636v1)
Abstract: LLMs have catalyzed transformative advances across a spectrum of natural language processing tasks through few-shot or zero-shot prompting, bypassing the need for parameter tuning. While convenient, this modus operandi aggravates ``hallucination'' concerns, particularly given the enigmatic ``black-box'' nature behind their gigantic model sizes. Such concerns are exacerbated in high-stakes applications (e.g., healthcare), where unaccountable decision errors can lead to devastating consequences. In contrast, human decision-making relies on nuanced cognitive processes, such as the ability to sense and adaptively correct misjudgments through conceptual understanding. Drawing inspiration from human cognition, we propose an innovative \textit{metacognitive} approach, dubbed \textbf{CLEAR}, to equip LLMs with capabilities for self-aware error identification and correction. Our framework facilitates the construction of concept-specific sparse subnetworks that illuminate transparent decision pathways. This provides a novel interface for model \textit{intervention} after deployment. Our intervention offers compelling advantages: (\textit{i})~at deployment or inference time, our metacognitive LLMs can self-consciously identify potential mispredictions with minimum human involvement, (\textit{ii})~the model has the capability to self-correct its errors efficiently, obviating the need for additional tuning, and (\textit{iii})~the rectification procedure is not only self-explanatory but also user-friendly, enhancing the interpretability and accessibility of the model. By integrating these metacognitive features, our approach pioneers a new path toward engendering greater trustworthiness and accountability in the deployment of LLMs.
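To make the intervention interface described in the abstract more concrete, below is a minimal, hypothetical sketch (not the authors' CLEAR implementation) of a concept-bottleneck-style classifier in which uncertain concept predictions are flagged at inference time and can be overwritten by a user before the final label is recomputed, without any parameter tuning. The class and function names (`ConceptBottleneckClassifier`, `intervene_and_repredict`) and the `confidence_margin` threshold are illustrative assumptions, not details from the paper.

```python
# Illustrative sketch only: concept-level intervention at inference time,
# assuming a concept-bottleneck-style head on top of a frozen encoder.
import torch
import torch.nn as nn


class ConceptBottleneckClassifier(nn.Module):
    def __init__(self, encoder_dim: int, num_concepts: int, num_labels: int):
        super().__init__()
        self.concept_head = nn.Linear(encoder_dim, num_concepts)  # concept scores
        self.label_head = nn.Linear(num_concepts, num_labels)     # label from concepts

    def forward(self, hidden: torch.Tensor):
        concept_probs = torch.sigmoid(self.concept_head(hidden))
        logits = self.label_head(concept_probs)
        return concept_probs, logits


def intervene_and_repredict(model, hidden, confidence_margin=0.2, corrections=None):
    """Flag concepts whose probability is close to 0.5 (uncertain) and, if
    corrections are supplied (e.g., by a user), overwrite those entries before
    recomputing the label. No gradient updates or fine-tuning are involved."""
    with torch.no_grad():
        concept_probs, _ = model(hidden)
        uncertain = (concept_probs - 0.5).abs() < confidence_margin  # boolean mask
        if corrections is not None:
            concept_probs = torch.where(uncertain, corrections, concept_probs)
        logits = model.label_head(concept_probs)
    return uncertain, logits.argmax(dim=-1)
```

The point of the sketch is the workflow, not the architecture: the model exposes human-readable concept activations, signals which of them it is unsure about, and accepts corrections to those activations alone, so errors can be rectified after deployment while the underlying weights stay fixed.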
Authors:
- Zhen Tan
- Jie Peng
- Tianlong Chen
- Huan Liu