Adversarial Examples for Models of Code (1910.07517v5)

Published 15 Oct 2019 in cs.LG and cs.PL

Abstract: Neural models of code have shown impressive results when performing tasks such as predicting method names and identifying certain kinds of bugs. We show that these models are vulnerable to adversarial examples, and introduce a novel approach for attacking trained models of code using adversarial examples. The main idea of our approach is to force a given trained model to make an incorrect prediction, as specified by the adversary, by introducing small perturbations that do not change the program's semantics, thereby creating an adversarial example. To find such perturbations, we present a new technique for Discrete Adversarial Manipulation of Programs (DAMP). DAMP works by deriving the desired prediction with respect to the model's inputs, while holding the model weights constant, and following the gradients to slightly modify the input code. We show that our DAMP attack is effective across three neural architectures: code2vec, GGNN, and GNN-FiLM, in both Java and C#. Our evaluations demonstrate that DAMP has up to 89% success rate in changing a prediction to the adversary's choice (a targeted attack) and a success rate of up to 94% in changing a given prediction to any incorrect prediction (a non-targeted attack). To defend a model against such attacks, we empirically examine a variety of possible defenses and discuss their trade-offs. We show that some of these defenses can dramatically drop the success rate of the attacker, with a minor penalty of 2% relative degradation in accuracy when they are not performing under attack. Our code, data, and trained models are available at https://github.com/tech-srl/adversarial-examples .

Authors (3)
  1. Noam Yefet (1 paper)
  2. Uri Alon (40 papers)
  3. Eran Yahav (21 papers)
Citations (146)

Summary

Adversarial Examples for Models of Code: An Evaluation

The landscape of machine learning has expanded to encompass models that understand and generate code, achieving noteworthy results in tasks such as code summarization, code generation, and bug detection. However, the vulnerability of these models to adversarial examples raises concerns about their robustness and reliability. The paper by Yefet, Alon, and Yahav examines the susceptibility of neural code models to such attacks and presents a novel approach for crafting adversarial examples that target them.

Overview of the Approach

The paper introduces Discrete Adversarial Manipulation of Programs (DAMP), a technique for attacking trained neural models of code with adversarial examples. These examples are produced by applying small, semantics-preserving perturbations to the input program, such as renaming a variable, that nevertheless change the model's prediction. DAMP derives the gradient of the desired (adversarial) prediction with respect to the model's inputs while holding the weights constant, then follows that gradient over the discrete space of program changes to find minimal edits that steer the model toward an incorrect prediction. The attack is shown to be effective across three neural architectures, code2vec, GGNN, and GNN-FiLM, on programs written in Java and C#.
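To make the gradient-over-discrete-inputs idea concrete, here is a minimal sketch of one DAMP-style step against a toy surrogate classifier. The model, vocabulary, and labels are illustrative stand-ins rather than the authors' code; the point is only that the gradient of a targeted loss with respect to a one-hot variable-name vector suggests which replacement name to try (a non-targeted variant would instead maximize the loss on the originally predicted label).

```python
# Minimal sketch of a DAMP-style discrete gradient step (PyTorch assumed).
# SurrogateModel, VOCAB, and the labels are hypothetical stand-ins for a
# trained model of code such as code2vec; this is not the authors' implementation.
import torch
import torch.nn.functional as F

VOCAB = ["i", "count", "result", "items", "done"]  # toy variable-name vocabulary
NUM_LABELS = 4                                     # toy prediction labels (e.g., method names)


class SurrogateModel(torch.nn.Module):
    """Embeds a one-hot variable name and predicts a label."""

    def __init__(self, embed_dim: int = 8):
        super().__init__()
        self.embed = torch.nn.Linear(len(VOCAB), embed_dim, bias=False)
        self.classify = torch.nn.Linear(embed_dim, NUM_LABELS)

    def forward(self, one_hot_name: torch.Tensor) -> torch.Tensor:
        return self.classify(self.embed(one_hot_name))


def damp_step(model: torch.nn.Module, current_name_idx: int, target_label: int) -> int:
    """One targeted DAMP-style step: differentiate the loss of the adversary's
    desired label with respect to the one-hot name vector (weights held fixed)
    and return the vocabulary index of the most promising replacement name."""
    one_hot = F.one_hot(torch.tensor(current_name_idx), len(VOCAB)).float()
    one_hot.requires_grad_(True)
    logits = model(one_hot)
    loss = F.cross_entropy(logits.unsqueeze(0), torch.tensor([target_label]))
    loss.backward()
    grad = one_hot.grad.clone()
    grad[current_name_idx] = float("inf")  # never "replace" the name with itself
    # First-order reasoning: moving the one-hot mass to the index with the most
    # negative gradient decreases the targeted loss the most.
    return int(torch.argmin(grad))


model = SurrogateModel()
new_idx = damp_step(model, current_name_idx=0, target_label=2)
print(f"try renaming '{VOCAB[0]}' -> '{VOCAB[new_idx]}'")
```

In a full attack one would re-evaluate the real model after each rename (which leaves program semantics untouched) and iterate until the prediction changes or a step budget is exhausted.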

Numerical Evidence and Effectiveness

The DAMP technique proves effective for both targeted and non-targeted attacks. Changing nothing more than a single variable name achieves a success rate of up to 89% for targeted attacks (forcing a specific, adversary-chosen prediction) and up to 94% for non-targeted attacks (forcing any incorrect prediction). That such minimal changes suffice underscores the fragility of current code models in the face of adversarial inputs. The paper also compares a range of defense mechanisms, each with different trade-offs between accuracy and resilience.

Implications and Future Directions

These findings indicate the need to reassess the robustness of models used in software engineering and code analysis. Because adversarial examples can have severe consequences, such as malware being misclassified as benign software, improving model resilience has direct practical value for security. On a theoretical level, DAMP opens avenues for studying adversarial robustness in discrete input domains, whereas prior work has focused largely on continuous inputs such as images.

Defenses and their Trade-offs

Empirical analysis of several defenses, such as training models without variable names or training on adversarial examples, shows substantial gains in robustness at the cost of a small drop in clean performance (the paper reports roughly 2% relative accuracy degradation when not under attack). Modular defenses, which act as preprocessing filters in front of an unmodified model, offer an alternative that preserves accuracy while adding a layer of protection against adversarial perturbations.
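As a rough illustration of the preprocessing-filter idea, the sketch below anonymizes declared variable names before the code reaches the model, so that an adversarial rename carries no signal. It is a simplified, regex-based approximation with assumed inputs, not the defense tooling from the paper; a real filter would rename identifiers via a proper parser.

```python
# Minimal sketch of a "modular" preprocessing defense: normalize variable names
# before the downstream model ever sees them. Illustrative only; a production
# defense would use a real parser rather than whole-word regex substitution.
import re


def anonymize_variables(source: str, declared_names: list[str]) -> str:
    """Replace declared variable names with placeholder tokens (VAR_0, VAR_1, ...),
    so adversarially chosen names cannot influence the model's prediction."""
    renamed = source
    for i, name in enumerate(declared_names):
        # Whole-word replacement only; avoids touching substrings of other identifiers.
        renamed = re.sub(rf"\b{re.escape(name)}\b", f"VAR_{i}", renamed)
    return renamed


snippet = "int total = 0; for (int i = 0; i < n; i++) total += arr[i];"
print(anonymize_variables(snippet, ["total", "i"]))
# -> int VAR_0 = 0; for (int VAR_1 = 0; VAR_1 < n; VAR_1++) VAR_0 += arr[VAR_1];
```

A filter like this trades some information (descriptive names can carry useful signal) for robustness, which mirrors the accuracy-versus-resilience trade-off the paper discusses.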

Conclusion

The paper offers a critical examination of the vulnerabilities of neural models of code and underscores the need for robust defenses against adversarial examples. As academic and industrial efforts continue to integrate AI into programming languages and software development tools, this work highlights essential considerations for building reliable and secure AI-driven solutions. Going forward, research into stronger adversarial defenses and into the transferability of attacks across models will be pivotal to advancing secure and intelligent software systems.
