A Black-Box Attack on Code Models via Representation Nearest Neighbor Search (2305.05896v3)
Abstract: Existing methods for generating adversarial code examples face several challenges: limited availability of substitute variables, high verification costs for these substitutes, and the creation of adversarial samples with noticeable perturbations. To address these concerns, our proposed approach, RNNS, uses a search seed based on historical attacks to find potential adversarial substitutes. Rather than searching over the discrete substitutes directly, RNNS maps variable names into a continuous vector space using a pre-trained variable name encoder. Based on this vector representation, RNNS predicts and selects better substitutes for attacks. We evaluated the performance of RNNS across six coding tasks spanning three programming languages: Java, Python, and C. We employed three pre-trained code models (CodeBERT, GraphCodeBERT, and CodeT5), yielding 18 victim models in total. The results demonstrate that RNNS outperforms the baselines in terms of attack success rate (ASR) and query times (QT). Furthermore, the perturbation introduced by RNNS is smaller than that of the baselines, measured by both the number of replaced variables and the change in variable length. Lastly, our experiments indicate that RNNS is effective at attacking defended models and can be employed for adversarial training.
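The core idea sketched in the abstract can be illustrated with a minimal nearest-neighbor substitute search over variable-name embeddings. This is a hedged sketch, not the paper's implementation: `embed` is a hypothetical, deterministic stand-in for the paper's pre-trained variable name encoder, the candidate pool and function names are invented, and NumPy is assumed.

```python
import hashlib
import numpy as np

def embed(name, dim=8):
    """Hypothetical stand-in for a pre-trained variable-name encoder:
    a deterministic hash-seeded random embedding, normalized to unit length."""
    seed = int.from_bytes(hashlib.sha256(name.encode()).digest()[:4], "big")
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def nearest_substitutes(seed_name, candidates, k=3):
    """Rank candidate variable names by cosine distance to the search
    seed's embedding and return the k closest ones (the seed itself,
    if present in the pool, is excluded)."""
    seed_vec = embed(seed_name)
    scored = [(1.0 - float(seed_vec @ embed(c)), c)
              for c in candidates if c != seed_name]
    scored.sort()  # smallest cosine distance first
    return [name for _, name in scored[:k]]

pool = ["count", "total", "idx", "buffer", "tmp", "sum_val"]
print(nearest_substitutes("counter", pool, k=3))
```

In the full attack, the search seed would be updated from the history of previous attack attempts, and each selected substitute would be verified against the victim model's output; this sketch only shows the representation-space ranking step.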