Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 74 tok/s
Gemini 2.5 Pro 55 tok/s Pro
GPT-5 Medium 19 tok/s Pro
GPT-5 High 24 tok/s Pro
GPT-4o 109 tok/s Pro
Kimi K2 212 tok/s Pro
GPT OSS 120B 464 tok/s Pro
Claude Sonnet 4 37 tok/s Pro
2000 character limit reached

SABER: Model-agnostic Backdoor Attack on Chain-of-Thought in Neural Code Generation (2412.05829v2)

Published 8 Dec 2024 in cs.SE

Abstract: Recent studies have proposed integrating Chain-of-Thought (CoT) reasoning to further enhance the reliability of Code LLMs (CLMs) in generating code, a step-by-step approach that breaks down complex programming tasks into manageable sub-problems. Advances in this area have introduced CoT models, specifically designed to integrate CoT reasoning effectively into LLMs, achieving notable improvements in code generation. Despite these advancements, the security of CoT models has not been systematically studied. In this study, we aim to fill this gap by investigating the vulnerability of CoT models to backdoor injection in code generation tasks. To address this, we propose a model-agnostic backdoor attack method SABER (Self-Attention-BasEd backdooR) based on the self-attention mechanism. SABER begins by selecting a malicious output as the backdoor using code mutation operations. It then identifies the tokens most relevant to poisoned content by analyzing self-attention scores in the CodeBERT model. Finally, it mimicks user behavior to generate adaptive and natural triggers. Our experiments on HumanEval-CoT and OpenEval-CoT test sets demonstrate that CoT models are susceptible to backdoor attacks via data poisoning. Taking the HumanEval-CoT dataset as an example, SABER achieves an ASR of 80.95%, representing an improvement of 33.33% over RIPPLe and a substantial 4.76% enhancement compared to BadPre. Further evaluations using ONION for automated detection and human studies reveal that SABER is stealthier and harder to detect, bypassing 61.90% of automated detection, with a human detection rate of just 3.17%. Our findings reveal that backdoors can be injected into CoT models to manipulate downstream code generation tasks. This highlights the urgent need for further research to understand and mitigate the security vulnerabilities in CoT models.

Summary

We haven't generated a summary for this paper yet.

Lightbulb On Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Don't miss out on important new AI/ML research

See which papers are being discussed right now on X, Reddit, and more:

“Emergent Mind helps me see which AI papers have caught fire online.”

Philip

Philip

Creator, AI Explained on YouTube