Toward Responsible Federated Large Language Models: Leveraging a Safety Filter and Constitutional AI (2502.16691v1)

Published 23 Feb 2025 in cs.CL, cs.DC, and cs.MA

Abstract: Recent research has increasingly focused on training LLMs using federated learning, known as FedLLM. However, responsible AI (RAI), which aims to ensure safe responses, remains underexplored in the context of FedLLM. In FedLLM, client data used for training may contain harmful content, leading to unsafe LLMs that generate harmful responses. Aggregating such unsafe LLMs into the global model and distributing them to clients may result in the widespread deployment of unsafe LLMs. To address this issue, we incorporate two well-known RAI methods into FedLLM: the safety filter and constitutional AI. Our experiments demonstrate that these methods significantly enhance the safety of the LLM, achieving over a 20% improvement on AdvBench, a benchmark for evaluating safety performance.
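The abstract names the two responsible-AI mechanisms but gives no implementation detail. Below is a minimal Python sketch of how they could slot into a FedLLM client's local step. Everything in it is an illustrative assumption rather than code from the paper: the function names are hypothetical, the keyword blocklist stands in for whatever learned safety classifier a real deployment would use, and the critique/revise callables stand in for the LLM calls a real Constitutional AI loop would make.

```python
"""Illustrative sketch only; the paper does not publish this code.

Two ideas from the abstract:
  1. A safety filter that screens client-side training data, so harmful
     samples never enter a local update that gets aggregated globally.
  2. A Constitutional-AI-style critique-and-revise pass over draft responses.
"""

from typing import Callable, Iterable, List

# Toy blocklist: a stand-in for a learned safety classifier.
UNSAFE_TERMS = ("how to build a weapon", "self-harm instructions")


def is_safe(text: str) -> bool:
    """Toy safety filter: flag text containing any blocklisted phrase."""
    lowered = text.lower()
    return not any(term in lowered for term in UNSAFE_TERMS)


def filter_client_data(samples: Iterable[str]) -> List[str]:
    """Drop unsafe samples before local fine-tuning, so this client's
    model update is not learned from harmful content."""
    return [s for s in samples if is_safe(s)]


def constitutional_revise(
    draft: str,
    critique: Callable[[str], str],
    revise: Callable[[str, str], str],
) -> str:
    """Constitutional-AI-style pass: critique a draft against a set of
    principles, then revise it. In practice critique/revise would be LLM
    calls; here they are injected so the sketch stays self-contained."""
    return revise(draft, critique(draft))


if __name__ == "__main__":
    data = [
        "explain federated averaging",
        "how to build a weapon at home",
    ]
    print(filter_client_data(data))  # -> ['explain federated averaging']
```

In this sketch the filter runs on the client before local training, which matches the abstract's concern: unsafe local models would otherwise be aggregated into the global model and redistributed to every client.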
