A Closer Look into Transformer-Based Code Intelligence Through Code Transformation: Challenges and Opportunities (2207.04285v1)

Published 9 Jul 2022 in cs.SE

Abstract: Transformer-based models have demonstrated state-of-the-art performance in many intelligent coding tasks such as code comment generation and code completion. Previous studies show that deep learning models are sensitive to input variations, but few have systematically studied the robustness of Transformers under perturbed input code. In this work, we empirically study the effect of semantic-preserving code transformations on the performance of Transformer models. Specifically, 24 and 27 code transformation strategies are implemented for two popular programming languages, Java and Python, respectively. To facilitate analysis, the strategies are grouped into five categories: block transformation, insertion/deletion transformation, grammatical statement transformation, grammatical token transformation, and identifier transformation. Experiments on three popular code intelligence tasks, including code completion, code summarization, and code search, demonstrate that insertion/deletion transformations and identifier transformations have the greatest impact on Transformer performance. Our results also suggest that a Transformer based on abstract syntax trees (ASTs) is more robust under most code transformations than a model based only on the code sequence. In addition, the design of positional encoding can affect the robustness of the Transformer under code transformation. Based on our findings, we distill insights about the challenges and opportunities for Transformer-based code intelligence.
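For intuition, the sketch below illustrates one of the five categories named in the abstract: an identifier transformation that renames local variables while leaving program behavior unchanged. It is a minimal illustration using Python's standard `ast` module, not the paper's implementation; the class and variable names are hypothetical.

```python
import ast


class IdentifierRenamer(ast.NodeTransformer):
    """Rename identifiers according to a mapping; semantics are preserved."""

    def __init__(self, mapping):
        self.mapping = mapping  # e.g. {"total": "var_0"}

    def visit_Name(self, node):
        # Rewrite any identifier found in the mapping (both loads and stores).
        if node.id in self.mapping:
            node.id = self.mapping[node.id]
        return node


source = """
def add_all(numbers):
    total = 0
    for n in numbers:
        total += n
    return total
"""

tree = ast.parse(source)
renamed = IdentifierRenamer({"total": "var_0", "n": "var_1"}).visit(tree)
# Prints a semantically equivalent function with renamed identifiers,
# the kind of perturbed input used to probe model robustness.
print(ast.unparse(renamed))
```

A robustness study of this kind feeds such transformed but equivalent programs to the model and compares task performance (e.g., completion or summarization quality) against the original inputs.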

Authors (7)
  1. Yaoxian Li (3 papers)
  2. Shiyi Qi (9 papers)
  3. Cuiyun Gao (97 papers)
  4. Yun Peng (24 papers)
  5. David Lo (229 papers)
  6. Zenglin Xu (145 papers)
  7. Michael R. Lyu (176 papers)
Citations (4)