$\mathcal{A}LLM4ADD$: Unlocking the Capabilities of Audio Large Language Models for Audio Deepfake Detection (2505.11079v1)
Abstract: Audio deepfake detection (ADD) has grown increasingly important due to the rise of high-fidelity audio generative models and their potential for misuse. Given that audio LLMs (ALLMs) have made significant progress in various audio processing tasks, a natural question arises: Can ALLMs be leveraged to solve ADD? In this paper, we first conduct a comprehensive zero-shot evaluation of ALLMs on ADD, revealing their ineffectiveness in detecting fake audio. To enhance their performance, we propose $\mathcal{A}LLM4ADD$, an ALLM-driven framework for ADD. Specifically, we reformulate the ADD task as an audio question answering problem, prompting the model with the question: "Is this audio fake or real?". We then perform supervised fine-tuning to enable the ALLM to assess the authenticity of query audio. Extensive experiments demonstrate that our ALLM-based method achieves superior performance in fake audio detection, particularly in data-scarce scenarios. As a pioneering study, we anticipate that this work will inspire the research community to leverage ALLMs to develop more effective ADD systems.
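To make the audio-question-answering reformulation concrete, here is a minimal inference sketch that poses the paper's prompt ("Is this audio fake or real?") to an instruction-tuned audio LLM. The abstract does not specify which ALLM backbone is used, so the choice of Qwen2-Audio via Hugging Face `transformers`, the file path, and the 16 kHz loading are illustrative assumptions; the supervised fine-tuning step described in the paper is not shown here.

```python
# Hypothetical sketch: query an audio LLM with the ADD prompt from the paper.
# Assumes the Qwen2-Audio-Instruct checkpoint as the ALLM backbone (not confirmed by the abstract).
import librosa
from transformers import AutoProcessor, Qwen2AudioForConditionalGeneration

MODEL_ID = "Qwen/Qwen2-Audio-7B-Instruct"  # assumed backbone, for illustration only
AUDIO_PATH = "query_audio.wav"             # hypothetical input file

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = Qwen2AudioForConditionalGeneration.from_pretrained(MODEL_ID, device_map="auto")

# Build a chat-style prompt pairing the query audio with the ADD question.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "audio", "audio_url": AUDIO_PATH},
            {"type": "text", "text": "Is this audio fake or real?"},
        ],
    }
]
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)

# Load the waveform at the sampling rate expected by the audio encoder.
waveform, _ = librosa.load(AUDIO_PATH, sr=processor.feature_extractor.sampling_rate)
inputs = processor(text=text, audios=[waveform], return_tensors="pt", padding=True).to(model.device)

# Generate a short answer; after supervised fine-tuning the model would be
# expected to respond with "fake" or "real".
generated = model.generate(**inputs, max_new_tokens=16)
generated = generated[:, inputs.input_ids.size(1):]
answer = processor.batch_decode(generated, skip_special_tokens=True)[0]
print(answer)
```

In this framing, fine-tuning amounts to standard supervised training on (audio, question, "fake"/"real" answer) triples, so the detector inherits the ALLM's audio understanding rather than learning a classifier head from scratch, which is consistent with the data-scarce advantage the abstract reports.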
- Hao Gu (27 papers)
- Jiangyan Yi (77 papers)
- Chenglong Wang (80 papers)
- Jianhua Tao (139 papers)
- Zheng Lian (51 papers)
- Jiayi He (20 papers)
- Yong Ren (65 papers)
- Yujie Chen (46 papers)
- Zhengqi Wen (69 papers)