BADTV: Unveiling Backdoor Threats in Third-Party Task Vectors (2501.02373v2)

Published 4 Jan 2025 in cs.LG and cs.CR

Abstract: Task arithmetic in large-scale pre-trained models enables agile adaptation to diverse downstream tasks without extensive retraining. By leveraging task vectors (TVs), users can perform modular updates through simple arithmetic operations like addition and subtraction. Yet, this flexibility presents new security challenges. In this paper, we investigate how TVs are vulnerable to backdoor attacks, revealing how malicious actors can exploit them to compromise model integrity. By creating composite backdoors that are designed asymmetrically, we introduce BadTV, a backdoor attack specifically crafted to remain effective simultaneously under task learning, forgetting, and analogy operations. Extensive experiments show that BadTV achieves near-perfect attack success rates across diverse scenarios, posing a serious threat to models relying on task arithmetic. We also evaluate current defenses, finding they fail to detect or mitigate BadTV. Our results highlight the urgent need for robust countermeasures to secure TVs in real-world deployments.
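To make the arithmetic in the abstract concrete, here is a minimal sketch of the learning and forgetting operations, assuming the usual task-arithmetic definitions: a task vector is the fine-tuned weights minus the pre-trained weights, learning adds it to the pre-trained weights, and forgetting subtracts it. The helper names (`task_vector`, `apply_vectors`, `negate`), the `scale` parameter, and the toy two-element weights are hypothetical and chosen for illustration. This does not reproduce BadTV's attack or any real model; it only shows the operations a backdoored vector would have to survive. Analogy, which combines several task vectors, is omitted.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def task_vector(pretrained: Weights, finetuned: Weights) -> Weights:
    """Task vector = fine-tuned weights minus pre-trained weights."""
    return {
        name: [f - p for p, f in zip(pretrained[name], finetuned[name])]
        for name in pretrained
    }


def negate(tv: Weights) -> Weights:
    """Negated task vector, used for forgetting a task."""
    return {name: [-d for d in deltas] for name, deltas in tv.items()}


def apply_vectors(pretrained: Weights, vectors: List[Weights], scale: float = 1.0) -> Weights:
    """Add scaled task vectors to the pre-trained weights."""
    merged = {name: list(values) for name, values in pretrained.items()}
    for tv in vectors:
        for name, deltas in tv.items():
            merged[name] = [w + scale * d for w, d in zip(merged[name], deltas)]
    return merged


# Hypothetical toy weights (not taken from the paper).
pretrained = {"w": [1.0, 2.0]}
finetuned = {"w": [1.5, 1.0]}

tv = task_vector(pretrained, finetuned)              # {"w": [0.5, -1.0]}
learned = apply_vectors(pretrained, [tv])            # learning: add the vector
forgotten = apply_vectors(pretrained, [negate(tv)])  # forgetting: subtract it
```

A user who downloads `tv` from a third party applies it without seeing the data or training run behind it, which is the trust gap the abstract points to.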

Authors (9)
