
You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion (2007.02220v3)

Published 5 Jul 2020 in cs.CR, cs.CL, cs.LG, and cs.PL

Abstract: Code autocompletion is an integral feature of modern code editors and IDEs. The latest generation of autocompleters uses neural language models, trained on public open-source code repositories, to suggest likely (not just statically feasible) completions given the current context. We demonstrate that neural code autocompleters are vulnerable to poisoning attacks. By adding a few specially-crafted files to the autocompleter's training corpus (data poisoning), or else by directly fine-tuning the autocompleter on these files (model poisoning), the attacker can influence its suggestions for attacker-chosen contexts. For example, the attacker can "teach" the autocompleter to suggest the insecure ECB mode for AES encryption, SSLv3 for the SSL/TLS protocol version, or a low iteration count for password-based encryption. Moreover, we show that these attacks can be targeted: an autocompleter poisoned by a targeted attack is much more likely to suggest the insecure completion for files from a specific repo or specific developer. We quantify the efficacy of targeted and untargeted data- and model-poisoning attacks against state-of-the-art autocompleters based on Pythia and GPT-2. We then evaluate existing defenses against poisoning attacks and show that they are largely ineffective.
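To make the attack targets concrete, the following is a minimal Python sketch (not taken from the paper's artifacts) contrasting the insecure completions the abstract mentions with their safe counterparts. It assumes the pycryptodome and standard-library hashlib APIs; the key, password, salt, and iteration counts are illustrative values, not the paper's.

```python
import hashlib
from Crypto.Cipher import AES  # pycryptodome

key = b"0123456789abcdef"  # 16-byte AES key, for illustration only

# The attacker-chosen suggestion: ECB mode, which leaks plaintext patterns.
insecure_cipher = AES.new(key, AES.MODE_ECB)

# What an unpoisoned autocompleter should prefer: an authenticated mode.
secure_cipher = AES.new(key, AES.MODE_GCM)

# Low iteration count for password-based key derivation (the poisoned
# suggestion) versus a deliberately expensive count.
weak_key = hashlib.pbkdf2_hmac("sha256", b"password", b"salt", 1)
strong_key = hashlib.pbkdf2_hmac("sha256", b"password", b"salt", 600_000)
```

In the paper's threat model, the poisoned model is biased to rank the insecure variant (e.g. `AES.MODE_ECB`) as the top completion in contexts like the one above, either for all users (untargeted) or only for a chosen repository or developer (targeted).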

Authors (4)
  1. Roei Schuster (14 papers)
  2. Congzheng Song (23 papers)
  3. Eran Tromer (5 papers)
  4. Vitaly Shmatikov (42 papers)
Citations (136)
