Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
41 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Developing Safe and Responsible Large Language Model : Can We Balance Bias Reduction and Language Understanding in Large Language Models? (2404.01399v4)

Published 1 Apr 2024 in cs.CL

Abstract: LLMs have advanced various NLP tasks, such as text generation and translation, among others. However, these models often generate text that can perpetuate biases. Existing approaches to mitigate these biases usually compromise knowledge retention. This study explores whether LLMs can produce safe, unbiased outputs without sacrificing knowledge or comprehension. We introduce the Safe and Responsible LLM (\textbf{SR}${\text{LLM}}$), which has been instruction fine-tuned atop an inherently safe fine-tuned LLM to reduce biases in generated texts. We developed a specialized dataset with examples of unsafe and corresponding safe variations to train \textbf{SR}${\text{LLM}}$ to identify and correct biased text. Experiments on our specialized dataset and out-of-distribution test sets reveal that \textbf{SR}$_{\text{LLM}}$ effectively reduces biases while preserving knowledge integrity. This performance surpasses that of traditional fine-tuning of smaller LLMs and base LLMs that merely reply on prompting techniques. Our findings indicate that instruction fine-tuning is an effective strategy for minimizing bias in LLMs while retaining knowledge. The code and dataset are accessible at \href{https://github.com/shainarazavi/Safe-Responsible-LLM}{SR-LLM}.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (6)
  1. Shaina Raza (53 papers)
  2. Oluwanifemi Bamgbose (8 papers)
  3. Shardul Ghuge (4 papers)
  4. Deepak John Reji (5 papers)
  5. Fatemeh Tavakol (1 paper)
  6. Syed Raza Bashir (12 papers)
X Twitter Logo Streamline Icon: https://streamlinehq.com

Tweets

Youtube Logo Streamline Icon: https://streamlinehq.com