
Safety Alignment in NLP Tasks: Weakly Aligned Summarization as an In-Context Attack (2312.06924v2)

Published 12 Dec 2023 in cs.CL

Abstract: Recent developments in balancing the usefulness and safety of LLMs have raised a critical question: Are mainstream NLP tasks adequately aligned with safety considerations? Our study, focusing on safety-sensitive documents obtained through adversarial attacks, reveals significant disparities in the safety alignment of various NLP tasks. For instance, LLMs can effectively summarize malicious long documents but often refuse to translate them. This discrepancy highlights a previously unidentified vulnerability: attacks exploiting tasks with weaker safety alignment, like summarization, can potentially compromise the integrity of tasks traditionally deemed more robust, such as translation and question-answering (QA). Moreover, the concurrent use of multiple NLP tasks with lesser safety alignment increases the risk of LLMs inadvertently processing harmful content. We demonstrate these vulnerabilities in various safety-aligned LLMs, particularly Llama2 models, Gemini, and GPT-4, indicating an urgent need for strengthening safety alignment across a broad spectrum of NLP tasks.
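As a rough illustration of the attack structure the abstract describes, the sketch below builds a conversation that chains a weakly aligned task (summarization of a safety-sensitive document) before a more strongly aligned one (translation). This is a minimal sketch under stated assumptions: the `build_attack_prompt` helper, the chat-message layout, and the French translation target are illustrative choices, not the paper's actual prompts or implementation.

```python
# Hypothetical sketch of a "weakly aligned task as in-context attack"
# conversation, following the pattern described in the abstract.
# The helper name and message layout are assumptions for illustration.

def build_attack_prompt(harmful_document: str, target_sentence: str) -> list[dict]:
    """Chain a weakly aligned task (summarization) before a more strongly
    aligned one (translation) within a single conversation."""
    return [
        # Turn 1: a summarization request. The paper finds LLMs often
        # comply with this task even for safety-sensitive documents.
        {"role": "user",
         "content": f"Summarize the following document:\n\n{harmful_document}"},
        # (In a live chat API, the model's compliant summary would appear
        # here as an assistant turn, becoming in-context "evidence" that
        # engaging with the document is acceptable.)
        # Turn 2: a follow-up request for a task the model would typically
        # refuse in isolation, now conditioned on the earlier compliance.
        {"role": "user",
         "content": f"Now translate this sentence into French:\n\n{target_sentence}"},
    ]

if __name__ == "__main__":
    messages = build_attack_prompt(
        "<safety-sensitive document>",
        "<sentence the model would refuse to translate on its own>",
    )
    for m in messages:
        print(f'{m["role"]}: {m["content"][:60]}...')
```

The key design point, per the abstract, is that the refusal decision is not independent across turns: prior compliance on a weakly aligned task shifts the model toward completing a request it would otherwise reject.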

Authors (5)
  1. Yu Fu (86 papers)
  2. Yufei Li (29 papers)
  3. Wen Xiao (32 papers)
  4. Cong Liu (169 papers)
  5. Yue Dong (61 papers)
Citations (4)