Internalized Self-Correction for Large Language Models

Published 21 Dec 2024 in cs.AI | (2412.16653v1)

Abstract: In this article, we introduce 'Internalized Self-Correction' (InSeC) for LLMs. While many approaches exist for self-reflection at inference time, we propose a novel method that combines ideas from negative sampling with self-reflection during both training and inference. InSeC allows LLMs to correct themselves by introducing mistakes and their corresponding corrections during training. This turns the learning process into a true supervised learning task with both positive and negative examples. The approach can be extended to improve instruction following and to correct hallucinations or incorrect sentences generated by LLMs.
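To make the training-data idea concrete, here is a minimal sketch of how a single training example might be built by injecting one mistake followed by its correction. This is an illustration under stated assumptions, not the authors' method. The abstract does not specify the granularity, the corruption procedure, the correction signal, or how the loss treats the negative span. In this sketch, `Segment`, `build_self_correction_example`, `segment_labels`, and the `<correct>` marker are all hypothetical. Corruption happens at the sentence level, and each example contains exactly one mistake.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical marker: the abstract does not say how a correction is signalled.
CORRECTION_MARKER = "<correct>"


@dataclass
class Segment:
    text: str
    is_mistake: bool  # True marks a negative example (the injected error)


def build_self_correction_example(
    sentences: List[str],
    corrupt: Callable[[str], str],
    rng: Optional[random.Random] = None,
) -> List[Segment]:
    """Inject one corrupted sentence, then a correction marker and the fix.

    Assumption: one mistake per example, chosen uniformly at random.
    """
    if not sentences:
        return []
    if rng is None:
        rng = random.Random()
    i = rng.randrange(len(sentences))  # position of the injected mistake

    segments = [Segment(s, False) for s in sentences[:i]]
    segments.append(Segment(corrupt(sentences[i]), True))  # negative example
    segments.append(Segment(CORRECTION_MARKER, False))     # signals a fix follows
    segments.append(Segment(sentences[i], False))          # the correction
    segments.extend(Segment(s, False) for s in sentences[i + 1:])
    return segments


def segment_labels(segments: List[Segment]) -> List[int]:
    """Label each segment 1 (positive) or 0 (negative).

    How a loss should use the negative label (masking it, applying an
    unlikelihood term, etc.) is left open here, as the abstract does not say.
    """
    return [0 if seg.is_mistake else 1 for seg in segments]


if __name__ == "__main__":
    segs = build_self_correction_example(
        ["Paris is the capital of France.", "It lies on the Seine."],
        corrupt=lambda s: s.replace("Paris", "Lyon").replace("Seine", "Rhone"),
        rng=random.Random(0),
    )
    for seg, label in zip(segs, segment_labels(segs)):
        print(label, seg.text)
```

Keeping the mistake and its correction in the same sequence is what lets a single example carry both a negative and a positive signal. A real implementation would work on token spans rather than whole sentences.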