
Augmenting Decompiler Output with Learned Variable Names and Types (2108.06363v1)

Published 13 Aug 2021 in cs.SE and cs.PL

Abstract: A common tool used by security professionals for reverse-engineering binaries found in the wild is the decompiler. A decompiler attempts to reverse compilation, transforming a binary to a higher-level language such as C. High-level languages ease reasoning about programs by providing useful abstractions such as loops, typed variables, and comments, but these abstractions are lost during compilation. Decompilers are able to deterministically reconstruct structural properties of code, but comments, variable names, and custom variable types are technically impossible to recover. In this paper we present DIRTY (DecompIled variable ReTYper), a novel technique for improving the quality of decompiler output that automatically generates meaningful variable names and types. Empirical evaluation on a novel dataset of C code mined from GitHub shows that DIRTY outperforms prior approaches by a sizable margin, recovering the original names written by developers 66.4% of the time and the original types 75.8% of the time.

Authors (6)
  1. Qibin Chen (11 papers)
  2. Jeremy Lacomis (4 papers)
  3. Edward J. Schwartz (7 papers)
  4. Claire Le Goues (34 papers)
  5. Graham Neubig (342 papers)
  6. Bogdan Vasilescu (22 papers)
Citations (39)
