
Tune In, Act Up: Exploring the Impact of Audio Modality-Specific Edits on Large Audio Language Models in Jailbreak (2501.13772v1)

Published 23 Jan 2025 in cs.SD, cs.AI, cs.LG, cs.MM, and eess.AS

Abstract: Large language models (LLMs) demonstrate remarkable zero-shot performance across various natural language processing tasks. The integration of multimodal encoders extends their capabilities, enabling the development of multimodal LLMs that process vision, audio, and text. However, these capabilities also raise significant security concerns, as these models can be manipulated through jailbreak attacks to generate harmful or inappropriate content. While extensive research explores the impact of modality-specific input edits on jailbreaking text-based LLMs and large vision-language models, the effects of audio-specific edits on large audio-language models (LALMs) remain underexplored. This paper addresses that gap by investigating how audio-specific edits influence LALM inference under jailbreak attempts. We introduce the Audio Editing Toolbox (AET), which enables audio-modality edits such as tone adjustment, word emphasis, and noise injection, and the Edited Audio Datasets (EADs), a comprehensive audio jailbreak benchmark. We also conduct extensive evaluations of state-of-the-art LALMs to assess their robustness under different audio edits. This work lays the groundwork for future explorations of audio-modality interactions in LALM security.
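
To make one of the edit types named in the abstract concrete, here is a minimal sketch of noise injection at a target signal-to-noise ratio. The abstract does not describe how AET implements its edits, so the function name `inject_noise`, the SNR parameterization, and the use of white Gaussian noise are all assumptions made for illustration, not the authors' method.

```python
import math
import random


def inject_noise(samples, snr_db, seed=0):
    """Add white Gaussian noise so the result has roughly the requested SNR in dB.

    Hypothetical helper for illustration only; not part of AET.
    """
    rng = random.Random(seed)
    # Average power of the clean signal.
    signal_power = sum(s * s for s in samples) / len(samples)
    # SNR_dB = 10 * log10(P_signal / P_noise), solved for P_noise.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


# A one-second 440 Hz tone at 16 kHz stands in for a spoken prompt.
sample_rate = 16000
clean = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
noisy = inject_noise(clean, snr_db=10)
```

Lower `snr_db` values produce a noisier signal, so sweeping that parameter is one plausible way to probe how robustness changes with edit strength.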

Authors (8)
  1. Erjia Xiao (13 papers)
  2. Hao Cheng (190 papers)
  3. Jing Shao (109 papers)
  4. Jinhao Duan (23 papers)
  5. Kaidi Xu (85 papers)
  6. Le Yang (69 papers)
  7. Jindong Gu (101 papers)
  8. Renjing Xu (72 papers)
