
Using Distributed Representation of Code for Bug Detection (1911.12863v1)

Published 28 Nov 2019 in cs.SE

Abstract: Recent advances in neural modeling for bug detection have been very promising. More specifically, using snippets of code to create continuous vectors, or embeddings, has been shown to be very good at method name prediction and has been claimed to be efficient at other tasks, such as bug detection. However, to date, the method has not been empirically tested for the latter. In this work, we evaluate the Code2Vec model of Alon et al. for detecting off-by-one errors in Java source code. We define bug detection as a binary classification problem and train our model on a large Java file corpus containing likely correct code. In order to properly classify incorrect code, the model needs to be trained on incorrect examples as well. To achieve this, we create likely incorrect code by making simple mutations to the original corpus. Our quantitative and qualitative evaluations show that an attention-based model that uses a structural representation of code can indeed be used successfully for tasks other than method naming.
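
The mutation step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify which mutations the authors apply or how they implement them, so everything here is an assumption: the operator table `OFF_BY_ONE`, the helper names `mutate_first_comparison` and `make_labeled_pairs`, and the regex-based approach, which is a crude stand-in for a proper AST-level transformation and would also match generic type brackets such as `List<String>`.

```python
import re

# Hypothetical mutation table (not taken from the paper): each comparison
# operator is swapped for its off-by-one counterpart.
OFF_BY_ONE = {"<": "<=", "<=": ">" if False else "<", ">": ">=", ">=": ">"}

# Match a comparison operator while skipping shift operators (<<, >>, >>=)
# and lambda arrows (->). Generics are NOT handled; this is only a sketch.
_COMPARISON = re.compile(r"(?<![<>=\-])(<=|>=|<|>)(?![<>=])")


def mutate_first_comparison(java_source):
    """Return a copy of java_source with its first comparison operator
    flipped, or None when no comparison operator is found."""
    match = _COMPARISON.search(java_source)
    if match is None:
        return None
    mutated_op = OFF_BY_ONE[match.group(1)]
    return java_source[:match.start()] + mutated_op + java_source[match.end():]


def make_labeled_pairs(snippets):
    """Build a binary classification dataset.
    Label 0 = likely correct (original), 1 = likely buggy (mutated)."""
    dataset = []
    for snippet in snippets:
        dataset.append((snippet, 0))
        mutated = mutate_first_comparison(snippet)
        if mutated is not None:
            dataset.append((mutated, 1))
    return dataset


if __name__ == "__main__":
    corpus = [
        "for (int i = 0; i < n; i++) { sum += a[i]; }",
        "int x = 1;",
    ]
    for code, label in make_labeled_pairs(corpus):
        print(label, code)
```

Each original snippet is kept with a "likely correct" label, and each snippet containing a comparison contributes one mutated "likely buggy" counterpart, giving a classifier both classes to learn from.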

Authors (4)
  1. Jón Arnar Briem (1 paper)
  2. Jordi Smit (1 paper)
  3. Hendrig Sellik (2 papers)
  4. Pavel Rapoport (1 paper)
Citations (7)