Self-Reflection Makes Large Language Models Safer, Less Biased, and Ideologically Neutral (2406.10400v2)

Published 14 Jun 2024 in cs.CL

Abstract: Previous studies proposed that the reasoning capabilities of LLMs can be improved through self-reflection, i.e., letting LLMs reflect on their own output to identify and correct mistakes in the initial responses. However, earlier experiments offer mixed results when it comes to the benefits of self-reflection. Furthermore, prior studies on self-reflection are predominantly concerned with the reasoning capabilities of models, ignoring the potential for self-reflection in safety, bias, and ideological leaning. Here, by conducting a series of experiments testing LLMs' self-reflection capability in various tasks using a variety of prompts and different LLMs, we make several contributions to the literature. First, we reconcile conflicting findings regarding the benefit of self-reflection by demonstrating that the outcome of self-reflection is sensitive to prompt wording, both in the original prompts used to elicit an initial answer and in the subsequent prompts used to self-reflect. Specifically, although self-reflection may improve the reasoning capability of LLMs when the initial response is simple, the technique cannot improve upon the state-of-the-art chain-of-thought (CoT) prompting. Second, we show that self-reflection can lead to safer (75.8% reduction in toxic responses while preserving 97.8% of non-toxic ones), less biased (77% reduction in gender-biased responses while preserving 94.3% of unbiased ones), and more ideologically neutral responses (100% reduction in partisan-leaning responses while preserving 87.7% of non-partisan ones). The paper concludes by discussing the implications of our findings on the deployment of LLMs. We release our experiments at https://github.com/Michael98Liu/self-reflection.
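
The two-step procedure the abstract describes (elicit an initial answer, then prompt the model to reflect on it) can be sketched roughly as below. This is a minimal illustration and not the authors' code: `generate` stands in for any LLM call, the reflection prompt wording is hypothetical (the abstract notes that results are sensitive to this wording), and `reduction_and_preservation` is one assumed reading of the "reduction" and "preserving" figures, which the paper may define differently.

```python
from typing import Callable

# Hypothetical reflection wording; the paper reports that outcomes depend on it.
DEFAULT_REFLECTION = "Review the initial response and correct it if needed."


def self_reflect(generate: Callable[[str], str], question: str,
                 reflection_prompt: str = DEFAULT_REFLECTION) -> dict:
    """Ask once, then feed the answer back with a reflection prompt."""
    initial = generate(question)
    followup = (f"{question}\n\nInitial response:\n{initial}\n\n"
                f"{reflection_prompt}")
    revised = generate(followup)
    return {"initial": initial, "revised": revised}


def reduction_and_preservation(before: list[bool],
                               after: list[bool]) -> tuple[float, float]:
    """Assumed metric definitions; True means a response is flagged (e.g. toxic).

    reduction    = share of initially flagged responses no longer flagged
    preservation = share of initially unflagged responses still unflagged
    """
    flagged = [a for b, a in zip(before, after) if b]
    clean = [a for b, a in zip(before, after) if not b]
    if not flagged or not clean:
        raise ValueError("need at least one flagged and one unflagged response")
    reduction = sum(not a for a in flagged) / len(flagged)
    preservation = sum(not a for a in clean) / len(clean)
    return reduction, preservation


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM so the sketch runs without any API."""
    return "revised answer" if "Initial response:" in prompt else "first answer"


if __name__ == "__main__":
    print(self_reflect(toy_model, "Is this statement safe to post?"))
    print(reduction_and_preservation([True, True, False, False],
                                     [False, True, False, False]))
```

Swapping `toy_model` for a real model client leaves the rest unchanged, since the sketch only requires a function from prompt string to response string.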
